openlayer.Project.add_dataframe
- Project.add_dataframe(*args, **kwargs)
Adds a dataset (Pandas dataframe) to a project’s staging area.
- Parameters:
- dataset_df: pd.DataFrame
Dataframe with your dataset.
- dataset_config: Dict[str, any]
Dictionary containing the dataset configuration. This is not needed if dataset_config_file_path is provided.
What’s in the dataset config?
The dataset configuration depends on the project’s tasks.TaskType. Refer to the How to write dataset configs guides for details.
- dataset_config_file_path: str
Path to the dataset configuration YAML file. This is not needed if dataset_config is provided.
What’s in the dataset config file?
The dataset configuration YAML depends on the project’s tasks.TaskType. Refer to the How to write dataset configs guides for details.
- force: bool
Controls what happens when add_dataframe is called while a dataset of the same type is already in the staging area. When force=True, the existing staged dataset is overwritten by the new one. When force=False, the user is prompted to confirm the overwrite first.
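Since dataset_config and dataset_config_file_path carry the same information, one can be derived from the other. The following is a minimal sketch, not part of the openlayer API: write_dataset_config is a hypothetical helper, and it assumes the YAML loader accepts JSON-style syntax (YAML 1.2 is designed as a superset of JSON), so the standard library alone is enough to produce the file.

```python
import json
import tempfile
from pathlib import Path


def write_dataset_config(dataset_config: dict, path: Path) -> Path:
    """Hypothetical helper: save the config as JSON text, which is also valid YAML."""
    path.write_text(json.dumps(dataset_config, indent=2), encoding="utf-8")
    return path


dataset_config = {
    "classNames": ["Retained", "Churned"],
    "labelColumnName": "Churned",
    "label": "training",
    "featureNames": ["CreditScore", "Geography", "Balance"],
    "categoricalFeatureNames": ["Geography"],
}

config_path = write_dataset_config(
    dataset_config, Path(tempfile.gettempdir()) / "dataset_config.yaml"
)

# Only one of the two arguments is needed (calls shown as comments because
# they require a project and a dataframe):
# project.add_dataframe(dataset_df=df, dataset_config=dataset_config)
# project.add_dataframe(dataset_df=df, dataset_config_file_path=str(config_path))
```

Keeping the configuration in a file is convenient when the same dataset settings are shared across scripts; passing the dictionary directly avoids the extra file for one-off uploads.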
Notes
Is your dataset in a CSV file? You can use the add_dataset method instead.
Examples
Related guide: How to upload datasets and models for development.
First, instantiate the client:
>>> import openlayer
>>>
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
Create a project if you don’t have one:
>>> from openlayer.tasks import TaskType
>>>
>>> project = client.create_project(
...     name="Churn Prediction",
...     task_type=TaskType.TabularClassification,
...     description="My first project!",
... )
If you already have a project created on the platform:
>>> project = client.load_project(name="Your project name")
Let’s say you have a tabular classification project and your dataset looks like the following:
>>> df
   CreditScore Geography    Balance  Churned
0          618    France     321.92        1
1          714   Germany  102001.22        0
2          604     Spain   12333.15        0
Prepare the dataset config:
>>> dataset_config = {
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'label': 'training',  # or 'validation'
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
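Before staging, it can help to confirm that the column names in the config actually match the dataframe. The sketch below is illustrative only: check_dataset_config is a hypothetical helper, not an openlayer function, and the rule that categorical features must also appear in featureNames is an assumption. It works on a plain list of column names, such as list(df.columns).

```python
def check_dataset_config(columns, dataset_config):
    """Hypothetical check: return a list of problems; an empty list means none were found."""
    problems = []
    cols = set(columns)

    label = dataset_config.get("labelColumnName")
    if label not in cols:
        problems.append(f"label column {label!r} is not in the dataframe")

    features = dataset_config.get("featureNames", [])
    for name in features:
        if name not in cols:
            problems.append(f"feature {name!r} is not in the dataframe")

    # Assumption: every categorical feature should also be listed as a feature.
    for name in dataset_config.get("categoricalFeatureNames", []):
        if name not in features:
            problems.append(f"categorical feature {name!r} is not in featureNames")

    return problems


columns = ["CreditScore", "Geography", "Balance", "Churned"]  # e.g. list(df.columns)
dataset_config = {
    "classNames": ["Retained", "Churned"],
    "labelColumnName": "Churned",
    "label": "training",
    "featureNames": ["CreditScore", "Geography", "Balance"],
    "categoricalFeatureNames": ["Geography"],
}

problems = check_dataset_config(columns, dataset_config)
```

For the example dataset, the returned list is empty, so the config and the dataframe agree on column names.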
What’s in the dataset config?
The dataset configuration depends on the project’s tasks.TaskType. Refer to the How to write dataset configs guides for details.
You can now add this dataset to your project with:
>>> project.add_dataframe(
...     dataset_df=df,
...     dataset_config=dataset_config,
... )
After adding the dataset to the project, it is staged, waiting to be committed and pushed to the platform.
You can check what’s in your staging area with status. If you want to push the dataset right away with a commit message, you can use the commit and push methods:
>>> project.commit("Initial dataset commit.")
>>> project.push()