openlayer.Project.add_dataframe
- Project.add_dataframe(*args, **kwargs)
Adds a dataset (Pandas dataframe) to a project’s staging area.
- Parameters:
- dataset_df: pd.DataFrame
Dataframe with your dataset.
- dataset_config: Dict[str, any]
Dictionary containing the dataset configuration. This is not needed if
dataset_config_file_path is provided.
What’s in the dataset config?
The dataset configuration depends on the project’s
tasks.TaskType. Refer to the How to write dataset configs guides for details.
- dataset_config_file_path: str
Path to the dataset configuration YAML file. This is not needed if
dataset_config is provided.
What’s in the dataset config file?
The dataset configuration YAML depends on the project’s
tasks.TaskType. Refer to the How to write dataset configs guides for details.
- force: bool
If add_dataframe is called when a dataset of the same type is already in the staging area, force=True overwrites the existing staged dataset with the new one. With force=False, the user is prompted to confirm the overwrite first.
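The overwrite rule described for force can be pictured with a small toy model. This is only an illustration of the documented behaviour, not openlayer's implementation: the stage_dataset function, the dictionary standing in for the staging area, and the confirm callback are all hypothetical, and it assumes that "type" means the dataset's label (for example training or validation).

```python
from typing import Callable, Dict


def stage_dataset(
    staging_area: Dict[str, str],
    dataset_type: str,
    dataset_name: str,
    force: bool = False,
    confirm: Callable[[str], bool] = lambda prompt: False,
) -> bool:
    """Toy model of the overwrite rule; returns True if the dataset was staged."""
    if dataset_type in staging_area and not force:
        # force=False: ask before replacing the dataset that is already staged.
        if not confirm(f"Overwrite the staged {dataset_type} dataset?"):
            return False
    # Either nothing was staged, force=True, or the user confirmed.
    staging_area[dataset_type] = dataset_name
    return True


staging: Dict[str, str] = {}
first = stage_dataset(staging, "training", "v1")
declined = stage_dataset(staging, "training", "v2")  # default confirm answers "no"
forced = stage_dataset(staging, "training", "v3", force=True)
```

In this sketch the second call leaves the staged dataset untouched because the confirmation is declined, while the third call replaces it without asking.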
Notes
Is your dataset in a CSV file? You can use the
add_dataset method instead.
Examples
Related guide: How to upload datasets and models for development.
First, instantiate the client:
>>> import openlayer
>>>
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
Create a project if you don’t have one:
>>> from openlayer.tasks import TaskType
>>>
>>> project = client.create_project(
...     name="Churn Prediction",
...     task_type=TaskType.TabularClassification,
...     description="My first project!",
... )
If you already have a project created on the platform:
>>> project = client.load_project(name="Your project name")
Let’s say you have a tabular classification project and your dataset looks like the following:
>>> df
   CreditScore Geography    Balance  Churned
0          618    France     321.92        1
1          714   Germany  102001.22        0
2          604     Spain   12333.15        0
Prepare the dataset config:
>>> dataset_config = {
...     'classNames': ['Retained', 'Churned'],
...     'labelColumnName': 'Churned',
...     'label': 'training',  # or 'validation'
...     'featureNames': ['CreditScore', 'Geography', 'Balance'],
...     'categoricalFeatureNames': ['Geography'],
... }
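Before staging, it can help to check that the config agrees with the dataframe's columns. The helper below is a hypothetical sketch and not part of openlayer. It assumes only the keys shown in the example config, and it treats 'training' and 'validation' as the accepted label values because those are the two the example mentions.

```python
from typing import Any, Dict, List


def check_dataset_config(columns: List[str], config: Dict[str, Any]) -> List[str]:
    """Return a list of readable problems; an empty list means none were found."""
    problems: List[str] = []

    label_column = config.get("labelColumnName")
    if label_column not in columns:
        problems.append(f"label column {label_column!r} is not in the dataframe")

    features = config.get("featureNames", [])
    for name in features:
        if name not in columns:
            problems.append(f"feature {name!r} is not in the dataframe")

    # Every categorical feature should also be declared as a feature.
    for name in config.get("categoricalFeatureNames", []):
        if name not in features:
            problems.append(f"categorical feature {name!r} is not in featureNames")

    if config.get("label") not in ("training", "validation"):
        problems.append("label should be 'training' or 'validation'")

    return problems


# With a real dataframe, `columns` would be list(df.columns).
columns = ["CreditScore", "Geography", "Balance", "Churned"]
dataset_config = {
    "classNames": ["Retained", "Churned"],
    "labelColumnName": "Churned",
    "label": "training",
    "featureNames": ["CreditScore", "Geography", "Balance"],
    "categoricalFeatureNames": ["Geography"],
}
problems = check_dataset_config(columns, dataset_config)
```

An empty problems list only means these basic checks passed; the platform may apply further validation of its own.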
What’s in the dataset config?
The dataset configuration depends on the project’s
tasks.TaskType. Refer to the How to write dataset configs guides for details.
You can now add this dataset to your project with:
>>> project.add_dataframe(
...     dataset_df=df,
...     dataset_config=dataset_config,
... )
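The same configuration can instead live in a YAML file and be passed through dataset_config_file_path. The sketch below writes such a file using only the standard library. It assumes the YAML keys mirror the dictionary keys from the example; the file name and temporary directory are arbitrary choices for illustration.

```python
import tempfile
from pathlib import Path

# YAML rendering of the example config (assumed to use the same key names).
CONFIG_YAML = """\
classNames:
  - Retained
  - Churned
labelColumnName: Churned
label: training
featureNames:
  - CreditScore
  - Geography
  - Balance
categoricalFeatureNames:
  - Geography
"""

config_path = Path(tempfile.mkdtemp()) / "dataset_config.yaml"
config_path.write_text(CONFIG_YAML, encoding="utf-8")

# With `project` and `df` from the steps above, the call would then be:
# project.add_dataframe(
#     dataset_df=df,
#     dataset_config_file_path=str(config_path),
# )
```

Only one of dataset_config and dataset_config_file_path needs to be provided.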
After adding the dataset to the project, it is staged, waiting to be committed and pushed to the platform.
You can check what’s in your staging area with
status. If you want to push the dataset right away with a commit message, you can use the commit and push methods:
>>> project.commit("Initial dataset commit.")
>>> project.push()