openlayer.InferencePipeline.update_data

InferencePipeline.update_data(*args, **kwargs)

Updates values for data already on the Openlayer platform.

This method is frequently used to upload the ground truths for production data that was published without them. This is useful when the ground truths are not available at inference time and must be added later to enable performance metrics.

Parameters:
df : pd.DataFrame

Dataframe containing ground truths.

The df must contain a column with the inference IDs, and another column with the ground truths.

ground_truth_column_name : Optional[str]

Name of the column containing the ground truths. Optional, defaults to None.

inference_id_column_name : str

Name of the column containing the inference IDs. The inference IDs are used to match the ground truths with the production data already published.

Examples

Related guide: How to set up monitoring.

Let’s say you have a batch of production data already published to the Openlayer platform (with the method publish_batch_data). Now, you want to update the ground truths of this batch.

First, instantiate the client and retrieve an existing inference pipeline:

>>> import openlayer
>>>
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
>>>
>>> project = client.load_project(name="Churn prediction")
>>>
>>> inference_pipeline = project.load_inference_pipeline(
...     name="XGBoost model inference pipeline",
... )

If your df with the ground truths looks like the following:

>>> df
            inference_id  label
0             d56d2b2c      0
1             3b0b2521      1
2             8c294a3a      0
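
A minimal sketch of how such a df might be assembled with pandas, assuming the ground truths were collected after inference and are keyed by inference ID. The `ground_truths` mapping and its values are illustrative only and simply mirror the rows shown above:

```python
import pandas as pd

# Hypothetical ground truths gathered after inference, keyed by inference ID.
# These IDs must match the ones used when the production data was published.
ground_truths = {
    "d56d2b2c": 0,
    "3b0b2521": 1,
    "8c294a3a": 0,
}

# One column for the inference IDs and one for the ground truths,
# as required by the `df` parameter.
df = pd.DataFrame(
    {
        "inference_id": list(ground_truths.keys()),
        "label": list(ground_truths.values()),
    }
)
```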

You can then update the ground truths with:

>>> inference_pipeline.update_data(
...     df=df,
...     inference_id_column_name='inference_id',
...     ground_truth_column_name='label',
... )
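
Because the ground truths are matched to published rows by inference ID, it may help to sanity-check the df before calling update_data. The sketch below uses only pandas. The helper `check_ground_truth_df` is hypothetical and not part of the openlayer library, and this page does not state how the platform treats missing or duplicate IDs, so the checks reflect an assumption about what a clean df looks like:

```python
import pandas as pd


def check_ground_truth_df(df, inference_id_column_name, ground_truth_column_name):
    """Hypothetical pre-flight check; not part of the openlayer library."""
    # Both columns named in the update_data call must exist in the df.
    for column in (inference_id_column_name, ground_truth_column_name):
        if column not in df.columns:
            raise ValueError(f"Missing column: {column!r}")

    ids = df[inference_id_column_name]
    # Assumption: a row without an inference ID cannot be matched to anything.
    if ids.isna().any():
        raise ValueError("Some rows have no inference ID")
    # Assumption: each inference ID should appear at most once.
    if ids.duplicated().any():
        raise ValueError("Duplicate inference IDs found")


df = pd.DataFrame(
    {
        "inference_id": ["d56d2b2c", "3b0b2521", "8c294a3a"],
        "label": [0, 1, 0],
    }
)

# Raises ValueError if the df is malformed; returns None otherwise.
check_ground_truth_df(df, "inference_id", "label")
```

If the check passes, the same column names can be passed to update_data as in the example above.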