openlayer.OpenlayerClient.add_dataframe

OpenlayerClient.add_dataframe(task_type, df, class_names, label_column_name, feature_names=[], text_column_name=None, categorical_feature_names=[], commit_message=None, tag_column_name=None, language='en', project_id=None)

Uploads a dataset to the Openlayer platform (from a pandas DataFrame).

Parameters
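task_type : TaskType

Type of ML task. E.g. TaskType.TabularClassification, TaskType.TabularRegression or TaskType.TextClassification.

df : pd.DataFrame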

Dataframe containing your dataset.

class_names : List[str]

List of class names, indexed by the integer labels in the dataset. E.g. ["negative", "positive"] when your label column contains 0 and 1.

label_column_name : str

Column header in the dataframe containing the labels.

Important

The labels in this column must be zero-indexed integer values.
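
As a minimal sketch (not part of the openlayer package), string labels can be turned into zero-indexed integers that line up with class_names by building a pandas Categorical whose categories are fixed to class_names. The dataframe and the Sentiment column below are hypothetical.

```python
import pandas as pd

# Hypothetical dataset whose labels are still strings.
df = pd.DataFrame(
    {
        "Text": ["I have had a long weekend", "Things are looking up"],
        "Sentiment": ["negative", "positive"],
    }
)
class_names = ["negative", "positive"]

# Fixing the category order to class_names makes each code the position of
# the label in class_names; labels not found in class_names get the code -1.
codes = pd.Categorical(df["Sentiment"], categories=class_names).codes
if (codes == -1).any():
    raise ValueError("some labels are not listed in class_names")

df["Sentiment"] = codes.astype(int)
print(df["Sentiment"].tolist())  # [0, 1]
```

Using class_names as the category list, rather than letting pandas sort the categories itself, is what guarantees that label i means class_names[i].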

feature_names : List[str], default []

List of input feature names. Only applicable if your task_type is TaskType.TabularClassification or TaskType.TabularRegression.

text_column_name : str, default None

Column header in the dataframe containing the input text. Only applicable if your task_type is TaskType.TextClassification.

categorical_feature_names : List[str], default []

A list containing the names of all categorical features in the dataframe. E.g. ["Gender", "Geography"]. Only applicable if your task_type is TaskType.TabularClassification or TaskType.TabularRegression.

commit_message : str, default None

Commit message for this version.

tag_column_name : str, default None

Column header in the dataframe containing tags you want pre-populated in Openlayer.

Important

Each cell in this column must be either empty or contain a list of strings.

For example, a tag column could look like this:

Tags
['sample']
['tag_one', 'tag_two']
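
A minimal sketch of building such a column with pandas Series.apply, which returns one list of strings per row. The tagging rule, the high_balance tag and the Tags column name are hypothetical, and representing an untagged row as an empty list is an assumption about what "empty" means here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "CreditScore": [618, 714, 604],
        "Balance": [321.92, 102001.22, 12333.15],
    }
)

# Hypothetical rule: tag rows with a balance above 10,000; other rows get no tags.
# An empty list stands in for an "empty" cell (an assumption).
df["Tags"] = df["Balance"].apply(lambda b: ["high_balance"] if b > 10_000 else [])

# Every cell must be empty or a list of strings.
assert all(
    isinstance(tags, list) and all(isinstance(t, str) for t in tags)
    for tags in df["Tags"]
)
print(df["Tags"].tolist())  # [[], ['high_balance'], ['high_balance']]
```

The column name would then be passed as tag_column_name="Tags".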

language : str, default 'en'

The language of the dataset in ISO 639-1 (alpha-2 code) format.

Returns
Dataset

An object containing information about your uploaded dataset.

Notes

  • Please ensure your input features are strings, ints or floats.

  • Please ensure your label column name is not contained in feature_names.
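
The Notes above, together with the label requirements from the parameter descriptions, can be checked locally before uploading. The find_upload_problems function below is a hypothetical helper written for this sketch, not part of the openlayer package, and it only covers the conditions stated on this page.

```python
import numbers

import pandas as pd


def find_upload_problems(df, feature_names, label_column_name, class_names):
    """Return a list of problems; an empty list means none were detected.

    Hypothetical helper, not part of the openlayer package.
    """
    problems = []
    if label_column_name in feature_names:
        problems.append(f"{label_column_name!r} is listed in feature_names")
    for name in feature_names:
        if name not in df.columns:
            problems.append(f"feature {name!r} is not a dataframe column")
            continue
        # numbers.Real covers both Python and NumPy ints and floats.
        bad = [v for v in df[name] if not isinstance(v, (str, numbers.Real))]
        if bad:
            problems.append(f"feature {name!r} has a value of type {type(bad[0]).__name__}")
    if label_column_name not in df.columns:
        problems.append(f"label column {label_column_name!r} is missing")
    else:
        valid = range(len(class_names))
        labels = df[label_column_name]
        if not all(isinstance(v, numbers.Integral) and v in valid for v in labels):
            problems.append(f"labels must be integers from 0 to {len(class_names) - 1}")
    return problems


df = pd.DataFrame(
    {
        "CreditScore": [618, 714, 604],
        "Geography": ["France", "Germany", "Spain"],
        "Balance": [321.92, 102001.22, 12333.15],
        "Churned": [1, 0, 0],
    }
)
class_names = ["Retained", "Churned"]

print(find_upload_problems(df, ["CreditScore", "Geography", "Balance"], "Churned", class_names))
# []
print(find_upload_problems(df, ["CreditScore", "Churned"], "Churned", class_names))
# ["'Churned' is listed in feature_names"]
```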

Examples

First, instantiate the client:

>>> import openlayer
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')

Create a project if you don’t have one:

>>> from openlayer.tasks import TaskType
>>> project = client.create_project(
...     name="Churn Prediction",
...     task_type=TaskType.TabularClassification,
...     description="My first project!",
... )

If you already have a project created on the platform:

>>> project = client.load_project(name="Your project name")

If your project’s task type is tabular classification…

Let’s say your dataframe looks like the following:

>>> df
    CreditScore  Geography    Balance  Churned
0           618     France     321.92        1
1           714    Germany  102001.22        0
2           604      Spain   12333.15        0

Important

The labels in your dataframe must be integers that correctly index into the class_names list you define (as shown below). E.g. 0 => 'Retained', 1 => 'Churned'.

The variables needed by Openlayer are:

>>> class_names = ['Retained', 'Churned']
>>> feature_names = ['CreditScore', 'Geography', 'Balance']
>>> label_column_name = 'Churned'
>>> categorical_feature_names = ['Geography']
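
To try this example locally, the dataframe above can be built by hand with pandas. Decoding the labels through class_names is a quick way to confirm the mapping described in the note above. This is a minimal sketch that only needs pandas.

```python
import pandas as pd

# The example dataframe from above, built by hand.
df = pd.DataFrame(
    {
        "CreditScore": [618, 714, 604],
        "Geography": ["France", "Germany", "Spain"],
        "Balance": [321.92, 102001.22, 12333.15],
        "Churned": [1, 0, 0],
    }
)
class_names = ["Retained", "Churned"]
feature_names = ["CreditScore", "Geography", "Balance"]
label_column_name = "Churned"
categorical_feature_names = ["Geography"]

# Each integer label is a position in class_names: 0 -> "Retained", 1 -> "Churned".
decoded = [class_names[label] for label in df[label_column_name]]
print(decoded)  # ['Churned', 'Retained', 'Retained']

# Every column named in the variables above should exist in the dataframe.
missing = set(feature_names + categorical_feature_names + [label_column_name]) - set(df.columns)
print(missing)  # set()
```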

You can now upload this dataset to Openlayer:

>>> from openlayer.tasks import TaskType
>>> dataset = client.add_dataframe(
...     task_type=TaskType.TabularClassification,
...     df=df,
...     commit_message="First commit!",
...     class_names=class_names,
...     feature_names=feature_names,
...     label_column_name=label_column_name,
...     categorical_feature_names=categorical_feature_names,
... )
>>> dataset.to_dict()

If your task type is text classification…

Let’s say your dataset looks like the following:

>>> df
                              Text  Sentiment
0    I have had a long weekend              0
1    I'm in a fantastic mood today          1
2    Things are looking up                  1

The variables needed by Openlayer are:

>>> class_names = ['Negative', 'Positive']
>>> text_column_name = 'Text'
>>> label_column_name = 'Sentiment'

You can now upload this dataset to Openlayer:

>>> from openlayer.tasks import TaskType
>>> dataset = client.add_dataframe(
...     task_type=TaskType.TextClassification,
...     df=df,
...     commit_message="First commit!",
...     class_names=class_names,
...     text_column_name=text_column_name,
...     label_column_name=label_column_name,
... )
>>> dataset.to_dict()