openlayer.InferencePipeline.upload_reference_dataset
- InferencePipeline.upload_reference_dataset(*args, **kwargs)
Uploads a reference dataset saved as a csv file to an inference pipeline.
The reference dataset is used to measure drift in the inference pipeline. The different types of drift are measured by comparing the production data published to the platform with the reference dataset.
Ideally, the reference dataset should be a representative sample of the training set used to train the deployed model.
- Parameters:
- file_path : str
Path to the csv file containing the reference dataset.
- dataset_config : Dict[str, any], optional
Dictionary containing the dataset configuration. This is not needed if dataset_config_file_path is provided.
What’s in the dataset config?
The dataset configuration depends on the TaskType. Refer to the How to write dataset configs guides for details.
- dataset_config_file_path : str, optional
Path to the dataset configuration YAML file. This is not needed if dataset_config is provided. (A minimal sketch of such a file follows this list.)
What’s in the dataset config file?
The dataset configuration YAML depends on the TaskType. Refer to the How to write dataset configs guides for details.
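For orientation, here is a minimal sketch of producing such a YAML file for the tabular classification example used below. The field names are assumed to mirror the keys of the dataset_config dict shown in the Examples section; the exact fields for your project depend on the TaskType.

>>> import yaml
>>>
>>> # Assumed to mirror the dataset_config dict from the Examples below;
>>> # the exact fields depend on your TaskType.
>>> dataset_config = {
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
>>> with open('dataset_config.yaml', 'w') as f:
...     yaml.safe_dump(dataset_config, f)

You could then pass dataset_config_file_path='dataset_config.yaml' to upload_reference_dataset instead of the dataset_config dict.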
Notes
Is your dataset in a pandas dataframe? You can use the upload_reference_dataframe method instead (see the sketch at the end of the Examples section).
Examples
Related guide: How to set up monitoring.
First, instantiate the client and retrieve an existing inference pipeline:
>>> import openlayer
>>>
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
>>>
>>> project = client.load_project(name="Churn prediction")
>>>
>>> inference_pipeline = project.load_inference_pipeline(
...     name="XGBoost model inference pipeline",
... )
With the InferencePipeline object retrieved, you are able to upload a reference dataset.
For example, if your project’s task type is tabular classification and your dataset looks like the following:
CreditScore | Geography | Balance   | Churned
----------- | --------- | --------- | -------
618         | France    | 321.92    | 1
714         | Germany   | 102001.22 | 0
604         | Spain     | 12333.15  | 0
Important
The labels in your csv must be integers that correctly index into the class_names array that you define (as shown below), e.g., 0 => ‘Retained’, 1 => ‘Churned’.
Prepare the dataset config:
>>> dataset_config = {
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
You can now upload this reference dataset to your inference pipeline with:
>>> inference_pipeline.upload_reference_dataset(
...     file_path='/path/to/dataset.csv',
...     dataset_config=dataset_config,
... )
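If your reference data is already in memory as a pandas dataframe, the upload_reference_dataframe method mentioned in the Notes skips the intermediate csv file. A minimal sketch, assuming the method accepts the dataframe as dataset_df alongside the same dataset_config options:

>>> import pandas as pd
>>>
>>> # Any dataframe with the same columns as the csv example above.
>>> df = pd.read_csv('/path/to/dataset.csv')
>>> inference_pipeline.upload_reference_dataframe(
...     dataset_df=df,
...     dataset_config=dataset_config,
... )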