openlayer.Project.add_dataset
- Project.add_dataset(*args, **kwargs)
Adds a dataset (csv file) to a project’s staging area.
- Parameters:
- file_path : str
Path to the dataset CSV file.
- dataset_config : Dict[str, Any]
Dictionary containing the dataset configuration. This is not needed if dataset_config_file_path is provided.
What’s in the dataset config?
The dataset configuration depends on the project’s tasks.TaskType. Refer to the How to write dataset configs guides for details.
- dataset_config_file_path : str
Path to the dataset configuration YAML file. This is not needed if dataset_config is provided.
What’s in the dataset config file?
The dataset configuration YAML depends on the project’s tasks.TaskType. Refer to the How to write dataset configs guides for details.
- force : bool
If add_dataset is called when a dataset of the same type is already in the staging area, force=True overwrites the existing staged dataset with the new one, while force=False prompts the user to confirm the overwrite first.
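When you pass dataset_config_file_path instead of dataset_config, the configuration lives in a YAML file. As a minimal sketch, that file can be produced from a Python dict with only the standard library. The helper name dataset_config_to_yaml and the file name dataset_config.yaml are hypothetical, and the helper assumes a flat config of scalars and lists of scalars, like the one in the examples. It uses json.dumps for each scalar because JSON-style double-quoted strings, numbers, true/false and null are all valid YAML scalars.

```python
import json
from pathlib import Path


def dataset_config_to_yaml(config: dict) -> str:
    """Serialize a flat dataset config (scalars and lists of scalars) to YAML text.

    Hypothetical helper: it does not handle nested dicts.
    """
    lines = []
    for key, value in config.items():
        if isinstance(value, (list, tuple)):
            if not value:
                # An empty list must be explicit; a bare "key:" would parse as null.
                lines.append(f"{key}: []")
                continue
            lines.append(f"{key}:")
            # One block-sequence item per element, each as a JSON-quoted scalar.
            lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Same keys as the tabular classification example config.
dataset_config = {
    "classNames": ["Retained", "Churned"],
    "labelColumnName": "Churned",
    "label": "training",
    "featureNames": ["CreditScore", "Geography", "Balance"],
    "categoricalFeatureNames": ["Geography"],
}

config_path = Path("dataset_config.yaml")  # hypothetical output location
config_path.write_text(dataset_config_to_yaml(dataset_config))
```

The resulting path can then be passed instead of the dict, e.g. project.add_dataset(file_path='/path/to/dataset.csv', dataset_config_file_path='dataset_config.yaml').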
Notes
If your dataset is in a pandas DataFrame, you can use the add_dataframe method instead.
Examples
Related guide: How to upload datasets and models for development.
First, instantiate the client:
>>> import openlayer
>>>
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
Create a project if you don’t have one:
>>> from openlayer.tasks import TaskType
>>>
>>> project = client.create_project(
...     name="Churn Prediction",
...     task_type=TaskType.TabularClassification,
...     description="My first project!",
... )
If you already have a project created on the platform:
>>> project = client.load_project(name="Your project name")
Let’s say you have a tabular classification project and your dataset looks like the following:
CreditScore  Geography  Balance    Churned
618          France     321.92     1
714          Germany    102001.22  0
604          Spain      12333.15   0
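To follow along locally, you could write the three rows from the table above to a CSV file with the standard-library csv module, as in this minimal sketch. The file name dataset.csv is an assumption. The header row holds the column names that the dataset config refers to.

```python
import csv
from pathlib import Path

# Column names and rows copied from the example table.
columns = ["CreditScore", "Geography", "Balance", "Churned"]
rows = [
    [618, "France", 321.92, 1],
    [714, "Germany", 102001.22, 0],
    [604, "Spain", 12333.15, 0],
]

dataset_path = Path("dataset.csv")  # hypothetical location
# newline="" lets the csv module control line endings, as its docs recommend.
with dataset_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)  # header row
    writer.writerows(rows)
```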
Prepare the dataset config:
>>> dataset_config = {
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'label': 'training',  # or 'validation'
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
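Because the config refers to columns by name, a quick local check against the CSV can catch typos before anything is staged. The helper check_dataset_config below is hypothetical and not part of the openlayer library. It assumes the file has a header row and that the label column holds integer indices into classNames (here 0 for 'Retained' and 1 for 'Churned', matching the example data). Openlayer's own validation may check more, or different, things.

```python
import csv
import io


def check_dataset_config(csv_text: str, config: dict) -> list:
    """Return a list of problems; an empty list means none were detected."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = []

    # Every referenced column must exist in the CSV header.
    label_column = config["labelColumnName"]
    for column in [label_column, *config["featureNames"]]:
        if column not in header:
            problems.append(f"column {column!r} is missing from the CSV header")

    # Categorical features should be a subset of the feature names.
    for column in config.get("categoricalFeatureNames", []):
        if column not in config["featureNames"]:
            problems.append(f"categorical feature {column!r} is not listed in featureNames")

    # Assumption: labels are integer indices into classNames.
    if label_column in header:
        n_classes = len(config["classNames"])
        # Line 1 is the header, so data rows start at line 2 (no multi-line fields assumed).
        for line_no, row in enumerate(reader, start=2):
            value = row[label_column] or ""  # short rows yield None
            if not value.isdigit() or int(value) >= n_classes:
                problems.append(f"line {line_no}: label {value!r} is not a valid class index")
    return problems


csv_text = """CreditScore,Geography,Balance,Churned
618,France,321.92,1
714,Germany,102001.22,0
604,Spain,12333.15,0
"""
dataset_config = {
    "classNames": ["Retained", "Churned"],
    "labelColumnName": "Churned",
    "label": "training",
    "featureNames": ["CreditScore", "Geography", "Balance"],
    "categoricalFeatureNames": ["Geography"],
}
print(check_dataset_config(csv_text, dataset_config) or "no problems found")
```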
What’s in the dataset config?
The dataset configuration depends on the project’s tasks.TaskType. Refer to the How to write dataset configs guides for details.
You can now add this dataset to your project with:
>>> project.add_dataset(
...     file_path='/path/to/dataset.csv',
...     dataset_config=dataset_config,
... )
After adding the dataset to the project, it is staged, waiting to be committed and pushed to the platform.
You can check what’s in your staging area with status. If you want to push the dataset right away with a commit message, you can use the commit and push methods:
>>> project.commit("Initial dataset commit.")
>>> project.push()