openlayer.InferencePipeline.upload_reference_dataframe
- InferencePipeline.upload_reference_dataframe(*args, **kwargs)
Uploads a reference dataset (a pandas dataframe) to an inference pipeline.
The reference dataset is used to measure drift in the inference pipeline. The different types of drift are measured by comparing the production data published to the platform with the reference dataset.
Ideally, the reference dataset should be a representative sample of the training set used to train the deployed model.
- Parameters:
- dataset_df (pd.DataFrame)
Dataframe containing the reference dataset.
- dataset_config (Dict[str, any], optional)
Dictionary containing the dataset configuration. This is not needed if dataset_config_file_path is provided.
What’s in the dataset config?
The dataset configuration depends on the TaskType. Refer to the How to write dataset configs guides for details.
- dataset_config_file_path (str, optional)
Path to the dataset configuration YAML file. This is not needed if dataset_config is provided.
What’s in the dataset config file?
The dataset configuration YAML depends on the TaskType. Refer to the How to write dataset configs guides for details. A sketch of the YAML-file option follows this parameter list.
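For illustration, here is a minimal sketch of the YAML-file alternative, reusing the df and inference_pipeline objects from the Examples below. It assumes a tabular classification task with the same config keys as the dict-based example, and a hypothetical file name dataset_config.yaml; the exact keys depend on your TaskType.

>>> import yaml  # PyYAML
>>>
>>> # Same config as the dict-based example below (tabular classification).
>>> dataset_config = {
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
>>> # Write the config to a YAML file and pass the path instead of the dict.
>>> with open('dataset_config.yaml', 'w') as f:
...     yaml.safe_dump(dataset_config, f)
>>> inference_pipeline.upload_reference_dataframe(
...     dataset_df=df,
...     dataset_config_file_path='dataset_config.yaml',
... )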
Notes
If your dataset is in a CSV file, you can use the upload_reference_dataset method instead; a hedged sketch follows.
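This is a sketch rather than the full reference for that method: it assumes upload_reference_dataset mirrors this method's parameters, taking a file path (here the hypothetical reference_dataset.csv) in place of a dataframe.

>>> inference_pipeline.upload_reference_dataset(
...     file_path='reference_dataset.csv',  # hypothetical CSV path
...     dataset_config=dataset_config,
... )

Examples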
Related guide: How to set up monitoring.
First, instantiate the client and retrieve an existing inference pipeline:
>>> import openlayer
>>>
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
>>>
>>> project = client.load_project(name="Churn prediction")
>>>
>>> inference_pipeline = project.load_inference_pipeline(
...     name="XGBoost model inference pipeline",
... )
With the InferencePipeline object retrieved, you are able to upload a reference dataset.

For example, if your project’s task type is tabular classification, your dataset looks like the following (stored in a pandas dataframe called df):

>>> df
   CreditScore Geography    Balance  Churned
0          618    France     321.92        1
1          714   Germany  102001.22        0
2          604     Spain   12333.15        0
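To reproduce this example locally, the dataframe above can be constructed directly with pandas:

>>> import pandas as pd
>>>
>>> df = pd.DataFrame({
...     'CreditScore': [618, 714, 604],
...     'Geography': ['France', 'Germany', 'Spain'],
...     'Balance': [321.92, 102001.22, 12333.15],
...     'Churned': [1, 0, 0],
... })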
Important
The labels in your dataframe must be integers that correctly index into the class_names array that you define (as shown below), e.g., 0 => ‘Retained’, 1 => ‘Churned’.

Prepare the dataset config:
>>> dataset_config = {
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
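As an optional sanity check (not part of the upload API), you can verify that the integer labels decode correctly through classNames:

>>> class_names = dataset_config['classNames']
>>> # Map each integer label to its class name: 0 => 'Retained', 1 => 'Churned'.
>>> df['Churned'].map(dict(enumerate(class_names)))
0     Churned
1    Retained
2    Retained
Name: Churned, dtype: object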
You can now upload this reference dataset to your inference pipeline with:
>>> inference_pipeline.upload_reference_dataframe(
...     dataset_df=df,
...     dataset_config=dataset_config,
... )