openlayer.InferencePipeline.upload_reference_dataset
- InferencePipeline.upload_reference_dataset(*args, **kwargs)
Uploads a reference dataset saved as a csv file to an inference pipeline.
The reference dataset is used to measure drift in the inference pipeline. The different types of drift are measured by comparing the production data published to the platform with the reference dataset.
Ideally, the reference dataset should be a representative sample of the training set used to train the deployed model.
- Parameters:
  - file_path : str
    Path to the csv file containing the reference dataset.
  - dataset_config : Dict[str, any], optional
    Dictionary containing the dataset configuration. This is not needed if dataset_config_file_path is provided. The dataset configuration depends on the TaskType. Refer to the How to write dataset configs guides for details.
  - dataset_config_file_path : str, optional
    Path to the dataset configuration YAML file (a sketch of this variant follows the list). This is not needed if dataset_config is provided. The dataset configuration YAML depends on the TaskType. Refer to the How to write dataset configs guides for details.
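If you prefer keeping the configuration in a file, the same keys used in the Examples below can be written to YAML and passed via dataset_config_file_path. A minimal sketch, assuming PyYAML is installed and using placeholder file paths:

>>> import yaml
>>>
>>> dataset_config = {
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
>>> with open('dataset_config.yaml', 'w') as f:  # placeholder path
...     yaml.safe_dump(dataset_config, f)
>>> inference_pipeline.upload_reference_dataset(
...     file_path='/path/to/dataset.csv',
...     dataset_config_file_path='dataset_config.yaml',
... )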
Notes
Is your dataset in a pandas dataframe? You can use the upload_reference_dataframe method instead (see the sketch below).
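A minimal sketch of that variant, assuming the inference_pipeline object and dataset_config dictionary from the Examples below, and assuming upload_reference_dataframe accepts the dataframe through a dataset_df argument (the argument name is an assumption; check that method's reference page):

>>> import pandas as pd
>>>
>>> df = pd.read_csv('/path/to/dataset.csv')
>>> inference_pipeline.upload_reference_dataframe(
...     dataset_df=df,  # argument name is an assumption
...     dataset_config=dataset_config,
... )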
Examples
Related guide: How to set up monitoring.
First, instantiate the client and retrieve an existing inference pipeline:
>>> import openlayer
>>>
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
>>>
>>> project = client.load_project(name="Churn prediction")
>>>
>>> inference_pipeline = project.load_inference_pipeline(
...     name="XGBoost model inference pipeline",
... )
With the InferencePipeline object retrieved, you can upload a reference dataset.
For example, if your project's task type is tabular classification and your dataset looks like the following:
| CreditScore | Geography | Balance   | Churned |
| ----------- | --------- | --------- | ------- |
| 618         | France    | 321.92    | 1       |
| 714         | Germany   | 102001.22 | 0       |
| 604         | Spain     | 12333.15  | 0       |
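For illustration, the sample rows above could be assembled with pandas and written to the csv that gets uploaded (the file path is a placeholder):

>>> import pandas as pd
>>>
>>> df = pd.DataFrame({
...     'CreditScore': [618, 714, 604],
...     'Geography': ['France', 'Germany', 'Spain'],
...     'Balance': [321.92, 102001.22, 12333.15],
...     'Churned': [1, 0, 0],
... })
>>> df.to_csv('/path/to/dataset.csv', index=False)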
Important
The labels in your csv must be integers that correctly index into the class_names array that you define (as shown below), e.g., 0 => 'Retained', 1 => 'Churned'. If your raw labels are strings, see the mapping sketch below.
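A minimal sketch of that integer coding, assuming string labels in a pandas column (the column and class names follow the example above; adapt to your data):

>>> class_names = ['Retained', 'Churned']
>>> label_to_index = {name: i for i, name in enumerate(class_names)}  # {'Retained': 0, 'Churned': 1}
>>> # Hypothetical dataframe whose labels are strings instead of integers
>>> raw_df = pd.DataFrame({'Churned': ['Churned', 'Retained', 'Retained']})
>>> raw_df['Churned'] = raw_df['Churned'].map(label_to_index)

Prepare the dataset config: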
>>> dataset_config = {
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
You can now upload this reference dataset to your project with:
>>> inference_pipeline.upload_reference_dataset(
...     file_path='/path/to/dataset.csv',
...     dataset_config=dataset_config,
... )