openlayer.OpenlayerClient.add_dataset

OpenlayerClient.add_dataset(task_type, file_path, class_names, label_column_name, feature_names=[], text_column_name=None, categorical_feature_names=[], tag_column_name=None, language='en', sep=',', commit_message=None, project_id=None)

Uploads a dataset to the Openlayer platform (from a csv).

Parameters
file_path : str

Path to the csv file containing the dataset.

class_names : List[str]

List of class names, indexed by the integer labels in the dataset. E.g. ['negative', 'positive'] when your label column contains 0 and 1.

label_column_name : str

Column header in the csv containing the labels.

Important

The labels in this column must be zero-indexed integer values.

feature_names : List[str], default []

List of input feature names. Only applicable if your task_type is TaskType.TabularClassification or TaskType.TabularRegression.

text_column_name : str, default None

Column header in the csv containing the input text. Only applicable if your task_type is TaskType.TextClassification.

categorical_feature_names : List[str], default []

A list containing the names of all categorical features in the dataset. E.g. ["Gender", "Geography"]. Only applicable if your task_type is TaskType.TabularClassification or TaskType.TabularRegression.

tag_column_name : str, default None

Column header in the csv containing tags you want pre-populated in Openlayer.

Important

Each cell in this column must be either empty or contain a list of strings.

Tags
['sample']
['tag_one', 'tag_two']

language : str, default 'en'

The language of the dataset in ISO 639-1 (alpha-2 code) format.

sep : str, default ','

Delimiter used in the csv file. E.g. '\t'.

commit_message : str, default None

Commit message for this version.

Returns
Dataset

An object containing information about your uploaded dataset.

Notes

  • Please ensure your input features are strings, ints or floats.

  • Please ensure your label column name is not contained in feature_names.
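The constraints listed in the parameters and notes above (integer labels that index into class_names, a label column that is not in feature_names, tag cells that are empty or a list of strings) can be checked locally before uploading. The following is a minimal sketch using only the standard library. The helper check_dataset and the column name Tags are hypothetical and not part of the Openlayer API, and it assumes tag cells are written as Python list literals, as in the Tags example above.

```python
import ast
import csv
import io


def check_dataset(csv_text, class_names, label_column_name,
                  feature_names=(), tag_column_name=None, sep=","):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if label_column_name in feature_names:
        problems.append("label column is listed in feature_names")

    reader = csv.DictReader(io.StringIO(csv_text), delimiter=sep)
    for row_number, row in enumerate(reader, start=1):
        # Labels must be non-negative integers that index into class_names.
        label = row.get(label_column_name) or ""
        is_int = label.isascii() and label.isdigit()
        if not (is_int and int(label) < len(class_names)):
            problems.append(f"row {row_number}: bad label {label!r}")

        # Tag cells must be empty or a list of strings.
        # Assumption: lists are stored as Python literals, e.g. ['sample'].
        cell = (row.get(tag_column_name) or "") if tag_column_name else ""
        if cell.strip():
            try:
                tags = ast.literal_eval(cell)
            except (ValueError, SyntaxError, TypeError):
                tags = None
            if not (isinstance(tags, list)
                    and all(isinstance(tag, str) for tag in tags)):
                problems.append(f"row {row_number}: bad tags")
    return problems


sample = (
    "CreditScore,Geography,Balance,Churned,Tags\n"
    "618,France,321.92,1,\"['sample']\"\n"
    "714,Germany,102001.22,0,\n"
    "604,Spain,12333.15,2,not a list\n"
)
problems = check_dataset(
    sample,
    class_names=["Retained", "Churned"],
    label_column_name="Churned",
    feature_names=["CreditScore", "Geography", "Balance"],
    tag_column_name="Tags",
)
print(problems)
```

Here only the third row is reported: its label 2 has no entry in class_names, and its tag cell is not a list.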

Examples

First, instantiate the client:

>>> import openlayer
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')

Create a project if you don’t have one:

>>> from openlayer.tasks import TaskType
>>> project = client.create_project(
...     name="Churn Prediction",
...     task_type=TaskType.TabularClassification,
...     description="My first project!",
... )

If you already have a project created on the platform:

>>> project = client.load_project(name="Your project name")

If your project’s task type is tabular classification…

Let’s say your dataset looks like the following:

CreditScore  Geography  Balance    Churned
618          France     321.92     1
714          Germany    102001.22  0
604          Spain      12333.15   0

Important

The labels in your csv must be integers that correctly index into the class_names array that you define (as shown below). E.g. 0 => 'Retained', 1 => 'Churned'.
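If the label column currently holds class names rather than integers, it has to be converted before upload. A minimal sketch of that conversion with the standard library follows. The helper encode_labels is hypothetical and not part of the Openlayer API, and the raw rows are illustrative.

```python
import csv
import io


def encode_labels(csv_text, label_column_name, class_names, sep=","):
    """Hypothetical helper: replace class-name labels with their index."""
    # Position in class_names becomes the label: Retained -> 0, Churned -> 1.
    label_to_index = {name: index for index, name in enumerate(class_names)}

    reader = csv.DictReader(io.StringIO(csv_text), delimiter=sep)
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames,
                            delimiter=sep, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Raises KeyError for a label that is missing from class_names.
        row[label_column_name] = label_to_index[row[label_column_name]]
        writer.writerow(row)
    return output.getvalue()


raw = (
    "CreditScore,Geography,Balance,Churned\n"
    "618,France,321.92,Churned\n"
    "714,Germany,102001.22,Retained\n"
)
encoded = encode_labels(raw, "Churned", ["Retained", "Churned"])
print(encoded)
```

The order of class_names is what fixes the mapping, so the same list should be passed to add_dataset.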

The variables needed by Openlayer are:

>>> class_names = ['Retained', 'Churned']
>>> feature_names = ['CreditScore', 'Geography', 'Balance']
>>> label_column_name = 'Churned'
>>> categorical_feature_names = ['Geography']

You can now upload this dataset to Openlayer:

>>> dataset = client.add_dataset(
...     task_type=TaskType.TabularClassification,
...     file_path='/path/to/dataset.csv',
...     commit_message="First commit!",
...     class_names=class_names,
...     label_column_name=label_column_name,
...     feature_names=feature_names,
...     categorical_feature_names=categorical_feature_names,
... )
>>> dataset.to_dict()

If your task type is text classification…

Let’s say your dataset looks like the following:

Text                           Sentiment
I have had a long weekend      0
I’m in a fantastic mood today  1
Things are looking up          1

The variables needed by Openlayer are:

>>> class_names = ['Negative', 'Positive']
>>> text_column_name = 'Text'
>>> label_column_name = 'Sentiment'

You can now upload this dataset to Openlayer:

>>> dataset = client.add_dataset(
...     task_type=TaskType.TextClassification,
...     file_path='/path/to/dataset.csv',
...     commit_message="First commit!",
...     class_names=class_names,
...     label_column_name=label_column_name,
...     text_column_name=text_column_name,
... )
>>> dataset.to_dict()
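Free text often contains commas, so a tab-delimited file can be easier to inspect by hand. The sketch below writes the text classification rows shown earlier with a tab delimiter using Python's csv module. The helper to_delimited is hypothetical and not part of the Openlayer API.

```python
import csv
import io

rows = [
    {"Text": "I have had a long weekend", "Sentiment": 0},
    {"Text": "I'm in a fantastic mood today", "Sentiment": 1},
    {"Text": "Things are looking up", "Sentiment": 1},
]


def to_delimited(rows, fieldnames, sep="\t"):
    """Hypothetical helper: serialize rows as delimited text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter=sep, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = to_delimited(rows, ["Text", "Sentiment"])
print(tsv_text)
```

After saving tsv_text to a file, the matching delimiter would be passed to add_dataset as sep='\t'.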