openlayer.OpenlayerClient.add_dataset
- OpenlayerClient.add_dataset(file_path, task_type, dataset_config=None, dataset_config_file_path=None, project_id=None, force=False)
Adds a dataset (from a CSV file) to a project’s staging area.
- Parameters
- file_path: str
Path to the CSV file containing the dataset.
- dataset_config: Dict[str, any]
Dictionary containing the dataset configuration. This is not needed if dataset_config_file_path is provided.
What’s in the dataset config?
The dataset configuration depends on the TaskType. Refer to the documentation for examples.
- dataset_config_file_path: str
Path to the dataset configuration YAML file. This is not needed if dataset_config is provided.
What’s in the dataset config file?
The dataset configuration YAML depends on the TaskType. Refer to the documentation for examples.
- force: bool
Controls what happens if add_dataset is called when there is already a dataset of the same type in the staging area. When force=True, the existing staged dataset is overwritten by the new one. When force=False, the user is prompted to confirm the overwrite.
Notes
Please ensure your input features are strings, ints or floats.
Please ensure your label column name is not contained in feature_names.
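The two notes above can be checked locally before staging a dataset. The following is a minimal sketch, not part of the Openlayer API: the helper name find_config_problems and its list-of-dicts input are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Types the notes above allow for input features.
ALLOWED_TYPES = (str, int, float)


def find_config_problems(
    rows: List[Dict[str, Any]],
    feature_names: List[str],
    label_column_name: str,
) -> List[str]:
    """Hypothetical helper: return readable problems, or an empty list if none."""
    problems = []
    # The label column must not also be listed as an input feature.
    if label_column_name in feature_names:
        problems.append(f"label column {label_column_name!r} is listed as a feature")
    # Every feature value should be a string, int or float.
    for i, row in enumerate(rows):
        for name in feature_names:
            value = row.get(name)
            if not isinstance(value, ALLOWED_TYPES):
                problems.append(
                    f"row {i}, column {name!r}: unsupported type {type(value).__name__}"
                )
    return problems


rows = [
    {"CreditScore": 618, "Geography": "France", "Balance": 321.92, "Churned": 1},
    {"CreditScore": 714, "Geography": "Germany", "Balance": 102001.22, "Churned": 0},
]
features = ["CreditScore", "Geography", "Balance"]
print(find_config_problems(rows, features, "Churned"))  # prints []
```

An empty list means both checks passed. Missing values show up as NoneType entries, which you may want to handle before writing the CSV.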
Examples
First, instantiate the client:
>>> import openlayer
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
Create a project if you don’t have one:
>>> from openlayer.tasks import TaskType
>>> project = client.create_project(
...     name="Churn Prediction",
...     task_type=TaskType.TabularClassification,
...     description="My first project!",
... )
If you already have a project created on the platform:
>>> project = client.load_project(name="Your project name")
If your project’s task type is tabular classification…
Let’s say your dataset looks like the following:
CreditScore  Geography  Balance    Churned
618          France     321.92     1
714          Germany    102001.22  0
604          Spain      12333.15   0
Important
The labels in your CSV must be integers that correctly index into the class_names array that you define (as shown below). E.g. 0 => ‘Retained’, 1 => ‘Churned’
Write the dataset config YAML file with the variables needed by Openlayer:
>>> import yaml
>>>
>>> dataset_config = {
...     'columnNames': ['CreditScore', 'Geography', 'Balance', 'Churned'],
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'label': 'training',  # or 'validation'
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
>>>
>>> with open('/path/to/dataset_config.yaml', 'w') as f:
...     yaml.dump(dataset_config, f)
You can now add this dataset to your project with:
>>> project.add_dataset(
...     file_path='/path/to/dataset.csv',
...     dataset_config_file_path='/path/to/dataset_config.yaml',
... )
After adding the dataset to the project, it is staged, waiting to be committed and pushed to the platform. You can check what’s in your staging area with status. If you want to push the dataset right away with a commit message, you can use the commit and push methods:
>>> project.commit("Initial dataset commit.")
>>> project.push()
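The label requirement from the Important note (integer labels that index into the class names) can be checked before the dataset is staged. This is a minimal sketch using only the standard library. The helper find_bad_labels is hypothetical and not part of the Openlayer client, and the CSV is passed as text so the example is self-contained.

```python
import csv
import io
from typing import List


def find_bad_labels(csv_text: str, label_column_name: str, class_names: List[str]) -> List[str]:
    """Hypothetical helper: return label values that do not index into class_names."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row[label_column_name]
        # A valid label is an integer i with 0 <= i < len(class_names).
        try:
            ok = 0 <= int(raw) < len(class_names)
        except ValueError:
            ok = False
        if not ok:
            bad.append(raw)
    return bad


sample = """CreditScore,Geography,Balance,Churned
618,France,321.92,1
714,Germany,102001.22,0
604,Spain,12333.15,0
"""
class_names = ["Retained", "Churned"]
print(find_bad_labels(sample, "Churned", class_names))  # prints []
```

To check a file on disk, read it into a string first (for example with pathlib.Path.read_text) and pass the same label column name and class names that appear in your dataset config.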
If your task type is text classification…
Let’s say your dataset looks like the following:
Text                           Sentiment
I have had a long weekend      0
I’m in a fantastic mood today  1
Things are looking up          1
Write the dataset config YAML file with the variables needed by Openlayer:
>>> import yaml
>>>
>>> dataset_config = {
...     'columnNames': ['Text', 'Sentiment'],
...     'classNames': ['Negative', 'Positive'],
...     'labelColumnName': 'Sentiment',
...     'label': 'training',  # or 'validation'
...     'textColumnName': 'Text',
... }
>>>
>>> with open('/path/to/dataset_config.yaml', 'w') as f:
...     yaml.dump(dataset_config, f)
You can now add this dataset to your project with:
>>> project.add_dataset(
...     file_path='/path/to/dataset.csv',
...     dataset_config_file_path='/path/to/dataset_config.yaml',
... )
After adding the dataset to the project, it is staged, waiting to be committed and pushed to the platform. You can check what’s in your staging area with status. If you want to push the dataset right away with a commit message, you can use the commit and push methods:
>>> project.commit("Initial dataset commit.")
>>> project.push()
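If your raw data stores class names as strings rather than integer labels (a case this page does not cover, so treat it as an assumption), one way to produce labels that index into the class names is sketched below. The helper encode_labels is hypothetical and not part of the Openlayer client.

```python
from typing import List


def encode_labels(labels: List[str], class_names: List[str]) -> List[int]:
    """Hypothetical helper: map each label string to its position in class_names."""
    index_of = {name: i for i, name in enumerate(class_names)}
    # An unknown label raises KeyError, so typos are caught early.
    return [index_of[label] for label in labels]


class_names = ["Negative", "Positive"]
print(encode_labels(["Negative", "Positive", "Positive"], class_names))  # prints [0, 1, 1]
```

The resulting integers can then be written to the label column of the CSV, with the same class_names list used for classNames in the dataset config.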