openlayer.OpenlayerClient.add_model

OpenlayerClient.add_model(name, task_type, function, model, model_type, class_names, requirements_txt_file, feature_names=[], categorical_feature_names=[], train_sample_df=None, train_sample_label_column_name=None, setup_script=None, custom_model_code=None, dependent_dir=None, commit_message=None, project_id=None, **kwargs)

Uploads a model to the Openlayer platform.

Parameters
name : str

Name of your model.

Important

Versioning models on the Openlayer platform happens via the name argument. If add_model is called with a name that does not yet exist inside the project, Openlayer treats it as the first version of a new model lineage. If a model with the specified name already exists inside the project, Openlayer treats it as a new version of that existing model lineage.

function

Prediction function object in the expected format. See the examples below.

Note

On the Openlayer platform, running inference with the model corresponds to calling function. Therefore, expect model calls on the platform to have a latency similar to that of calling function on a CPU. Writing function so that it processes a whole batch of inputs at once, rather than one row at a time, can improve latency.
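To illustrate the batching advice, here is a minimal sketch that compares a row-by-row prediction function with a vectorized one. The toy logistic "model" (a hypothetical dict holding `weights` and `bias`) is an assumption made only for this illustration; the point is that the batched version does one NumPy operation over the whole 2D input instead of one Python-level iteration per row, while returning the same probabilities.

```python
import numpy as np

# Hypothetical toy model: a binary logistic regression stored as plain arrays.
toy_model = {"weights": np.array([0.8, -0.5]), "bias": 0.1}


def predict_proba_rowwise(model, input_features: np.ndarray, **kwargs):
    # Slow: scores one row at a time in a Python loop.
    rows = []
    for row in input_features:
        p = 1.0 / (1.0 + np.exp(-(row @ model["weights"] + model["bias"])))
        rows.append([1.0 - p, p])
    return np.array(rows)


def predict_proba_batched(model, input_features: np.ndarray, **kwargs):
    # Fast: scores the whole 2D batch with one matrix-vector product.
    logits = input_features @ model["weights"] + model["bias"]
    p = 1.0 / (1.0 + np.exp(-logits))
    return np.column_stack([1.0 - p, p])


batch = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
# Both versions agree; only the batched one scales well to large batches.
assert np.allclose(predict_proba_rowwise(toy_model, batch),
                   predict_proba_batched(toy_model, batch))
```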

model

The Python object for your model, loaded into memory. It is pickled when you call add_model, and later loaded and passed to function (your predict_proba function) to compute run reports and test reports, or to conduct what-if analysis.
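The pickle round trip described above can be approximated with the standard-library `pickle` module. This is only a sketch: how Openlayer actually stores and reloads the object is not specified here, and `ToyModel` is a hypothetical stand-in for your trained model. It shows why the model must be picklable and why function must accept the reloaded object as its first argument.

```python
import pickle

import numpy as np


class ToyModel:
    """Hypothetical stand-in for a trained model with a predict_proba method."""

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        # Constant probabilities, enough to exercise the round trip.
        return np.tile([0.25, 0.75], (len(X), 1))


def predict_proba(model, input_features: np.ndarray, **kwargs):
    return model.predict_proba(input_features)


# At upload time the model object is serialized...
payload = pickle.dumps(ToyModel())

# ...and later deserialized and passed to the prediction function.
restored = pickle.loads(payload)
preds = predict_proba(restored, np.zeros((4, 3)))
print(preds.shape)  # (4, 2)
```

Note that `pickle` stores classes by reference, so any custom class used by the model must be importable wherever the object is unpickled.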

model_type : ModelType

Model framework. E.g. ModelType.sklearn.

class_names : List[str]

List of class names corresponding to the outputs of function, in the same order as the columns of the probabilities it returns. E.g. ['positive', 'negative'].
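A minimal sketch of how this ordering is used, assuming (as in the examples below) that column i of the array returned by function holds the probability of class_names[i]. The probability values are made up for illustration.

```python
import numpy as np

class_names = ["Retained", "Churned"]

# Illustrative output of a predict_proba function: one row per input,
# one column per class, in the same order as class_names.
preds = np.array([
    [0.2, 0.8],
    [0.7, 0.3],
])

# Map each row's most likely column index back to its class name.
predicted_labels = [class_names[i] for i in preds.argmax(axis=1)]
print(predicted_labels)  # ['Churned', 'Retained']
```

For scikit-learn classifiers, the columns of `predict_proba` follow the order of `model.classes_`, so with integer labels 0 and 1 the name for label 0 goes first.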

requirements_txt_file : str, default None

Path to a requirements.txt file containing Python dependencies needed by your predict function.

feature_names : List[str], default []

List of input feature names. Only applicable if your task_type is TaskType.TabularClassification or TaskType.TabularRegression.

categorical_feature_names : List[str], default []

A list containing the names of all categorical features used by the model. E.g. ["Gender", "Geography"]. Only applicable if your task_type is TaskType.TabularClassification or TaskType.TabularRegression.

train_sample_df : pd.DataFrame, default None

A random sample of >= 100 rows from your training dataset. This is used to support explainability features. Only applicable if your task_type is TaskType.TabularClassification or TaskType.TabularRegression.
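One way to build such a sample with pandas, as a sketch: `training_df` below is hypothetical synthetic data standing in for your real training set. Capping the sample size at the number of available rows avoids the `ValueError` that `DataFrame.sample` raises when asked for more rows than exist without replacement, and a fixed `random_state` keeps the sample reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical training data standing in for your real training set.
rng = np.random.default_rng(0)
training_df = pd.DataFrame({
    "CreditScore": rng.integers(350, 850, size=1000),
    "Geography": rng.choice(["France", "Germany", "Spain"], size=1000),
    "Balance": rng.uniform(0, 250_000, size=1000).round(2),
    "Churned": rng.integers(0, 2, size=1000),
})

# Draw a random sample, capped at the dataset size.
n_rows = min(len(training_df), 500)
train_sample_df = training_df.sample(n=n_rows, random_state=42)

assert len(train_sample_df) >= 100
assert "Churned" in train_sample_df.columns  # the label column stays in the sample
```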

train_sample_label_column_name : str, default None

Column header in train_sample_df containing the labels. Only applicable if your task_type is TaskType.TabularClassification or TaskType.TabularRegression.

setup_script : str, default None

Path to a bash script containing any commands that must run before the model is loaded. The script runs after the Python requirements are installed.

Note

This is useful for installing custom libraries, downloading NLTK corpora, etc.

custom_model_code : str, default None

Code needed to initialize the model. The model argument must be None in this case. Required if, and only applicable if, your model_type is ModelType.custom.

dependent_dir : str, default None

Path to a directory of files needed to load the model. Required if your model_type is ModelType.custom.

commit_message : str, default None

Commit message for this version.

**kwargs

Any additional keyword args you would like to pass to your predict_proba function.

Note

If you include a tokenizer among the kwargs of your predict_proba function, Openlayer's explainability techniques will also use it.

Returns
Model

An object containing information about your uploaded model.

Examples

See also

Our sample notebooks and tutorials.

First, instantiate the client:

>>> import openlayer
>>> client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')

Create a project if you don’t have one:

>>> from openlayer.tasks import TaskType
>>> project = client.create_project(
...     name="Churn Prediction",
...     task_type=TaskType.TabularClassification,
...     description="My first project!",
... )

If you already have a project created on the platform:

>>> project = client.load_project(name="Your project name")

If your project’s task type is tabular classification…

Let’s say your dataset looks like the following:

>>> df
    CreditScore  Geography    Balance  Churned
0           618     France     321.92        1
1           714    Germany  102001.22        0
2           604      Spain   12333.15        0
..          ...        ...        ...      ...

The first variables needed by Openlayer are:

>>> from openlayer import TaskType
>>>
>>> class_names = ['Retained', 'Churned']
>>> feature_names = ['CreditScore', 'Geography', 'Balance']
>>> categorical_feature_names = ['Geography']

Now let’s say you’ve trained a simple scikit-learn model on data that looks like the above.

Next, define a predict_proba function with the following signature:

>>> import numpy as np
>>>
>>> def predict_proba(model, input_features: np.ndarray, **kwargs):
...     # Optional pre-processing of input_features
...     preds = model.predict_proba(input_features)
...     # Optional re-weighting of preds
...     return preds

The model arg must be the actual trained model object, and the input_features arg must be a 2D numpy array containing a batch of features that will be passed to the model as inputs.

You can optionally include other kwargs in the function, such as variables or encoders. Pass those same kwargs to the project.add_model call when you upload the model.
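As an illustration of that kwargs mechanism, the sketch below encodes the categorical Geography column inside the prediction function using a lookup table passed as a keyword argument. Everything here is hypothetical: `category_codes` is an illustrative kwarg name (not something Openlayer defines), `ThresholdModel` stands in for a real model that only accepts numbers, and the function is named `predict_proba_encoded` to keep it distinct from the example on this page. When uploading, the same object would be passed to project.add_model as `category_codes=category_codes`.

```python
import numpy as np

# Hypothetical encoder: maps each Geography string to an integer code.
category_codes = {"France": 0, "Germany": 1, "Spain": 2}


class ThresholdModel:
    """Hypothetical model that only understands numeric inputs."""

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        # Toy rule: churn probability grows with the encoded Geography code.
        p = np.clip(0.2 + 0.3 * X[:, 1], 0.0, 1.0)
        return np.column_stack([1.0 - p, p])


def predict_proba_encoded(model, input_features: np.ndarray, category_codes=None, **kwargs):
    if category_codes is None:
        raise ValueError("category_codes must be passed as a keyword argument")
    # Pre-process: replace the categorical column (index 1) with integer codes.
    features = input_features.copy()
    features[:, 1] = [category_codes[value] for value in features[:, 1]]
    return model.predict_proba(features.astype(float))


input_features = np.array(
    [[618, "France", 321.92], [714, "Germany", 102001.22]], dtype=object
)
preds = predict_proba_encoded(ThresholdModel(), input_features, category_codes=category_codes)
```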

Here’s an example of the predict_proba function in action:

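>>> from sklearn.compose import ColumnTransformer
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import OneHotEncoder
>>> x_train = df[feature_names]
>>> y_train = df['Churned']
>>> input_features = x_train.to_numpy()
>>> input_features
array([[618, 'France', 321.92],
       [714, 'Germany', 102001.22],
       [604, 'Spain', 12333.15], ...], dtype=object)
>>> # LogisticRegression cannot fit raw strings, so one-hot encode
>>> # 'Geography' (column 1) and pass the numeric columns through
>>> sklearn_model = Pipeline([
...     ("encode", ColumnTransformer(
...         [("onehot", OneHotEncoder(handle_unknown="ignore"), [1])],
...         remainder="passthrough",
...         sparse_threshold=0,
...     )),
...     ("lr", LogisticRegression(random_state=1300)),
... ])
>>> sklearn_model.fit(input_features, y_train)
>>> predict_proba(sklearn_model, input_features)
array([[0.21735231, 0.78264769],
       [0.66502929, 0.33497071],
       [0.81455616, 0.18544384], ...])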
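Before uploading, it can help to sanity-check the prediction function locally. The helper below is a hypothetical sketch, not part of the Openlayer client. It checks one row of output per input row and one column per entry in class_names, plus an assumed requirement that each row is a probability distribution. A toy function is used so the snippet runs on its own; with the example on this page you would call `check_predict_proba(predict_proba, sklearn_model, input_features, class_names)`.

```python
import numpy as np


def check_predict_proba(function, model, input_features, class_names, **kwargs):
    """Hypothetical pre-upload check for a predict_proba-style function."""
    preds = np.asarray(function(model, input_features, **kwargs), dtype=float)
    n_rows = len(input_features)
    # One row per input and one column per class name.
    if preds.shape != (n_rows, len(class_names)):
        raise ValueError(
            f"expected shape {(n_rows, len(class_names))}, got {preds.shape}"
        )
    # Assumption: each row should be a probability distribution.
    if (preds < 0).any() or not np.allclose(preds.sum(axis=1), 1.0):
        raise ValueError("each row must be non-negative and sum to 1")
    return preds


def toy_predict_proba(model, input_features, **kwargs):
    # Hypothetical function: always predicts 30% / 70%.
    return np.tile([0.3, 0.7], (len(input_features), 1))


checked = check_predict_proba(
    toy_predict_proba, None, np.zeros((5, 3)), ["Retained", "Churned"]
)
```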

The other model-specific variables needed by Openlayer are:

>>> from openlayer import ModelType
>>>
>>> model_type = ModelType.sklearn
>>> train_sample_df = df.sample(5000)
>>> train_sample_label_column_name = 'Churned'
>>> requirements_txt_file = "requirements.txt"  # path to requirements.txt

Important

For tabular classification models, Openlayer needs a representative sample of your training dataset, so it can effectively explain your model’s predictions.

You can now upload this model to Openlayer:

>>> model = project.add_model(
...     name='Linear classifier',
...     commit_message='First iteration of vanilla logistic regression',
...     function=predict_proba,
...     model=sklearn_model,
...     model_type=model_type,
...     class_names=class_names,
...     feature_names=feature_names,
...     categorical_feature_names=categorical_feature_names,
...     train_sample_df=train_sample_df,
...     train_sample_label_column_name=train_sample_label_column_name,
...     requirements_txt_file=requirements_txt_file,
... )
>>> model.to_dict()

If your task type is text classification…

Let’s say your dataset looks like the following:

>>> df
                              Text  Sentiment
0    I have had a long weekend              0
1    I'm in a fantastic mood today          1
2    Things are looking up                  1
..                             ...        ...

The first variable needed by Openlayer is:

>>> class_names = ['Negative', 'Positive']

Now let’s say you’ve trained a simple scikit-learn model on data that looks like the above.

Next, define a predict_proba function with the following signature:

>>> from typing import List
>>>
>>> def predict_proba(model, text_list: List[str], **kwargs):
...     # Optional pre-processing of text_list
...     preds = model.predict_proba(text_list)
...     # Optional re-weighting of preds
...     return preds

The model arg must be the actual trained model object, and the text_list arg must be a list of strings.

You can optionally include other kwargs in the function, such as tokenizers, variables, or encoders. Pass those same kwargs to the project.add_model call when you upload the model.

Here’s an example of the predict_proba function in action:

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.pipeline import Pipeline
>>>
>>> x_train = df['Text']
>>> y_train = df['Sentiment']
>>> sklearn_model = Pipeline(
...     [
...         (
...             "count_vect",
...             CountVectorizer(min_df=100, ngram_range=(1, 2), stop_words="english"),
...         ),
...         ("lr", LogisticRegression()),
...     ]
... )
>>> sklearn_model.fit(x_train, y_train)
>>> text_list = ['good', 'bad']
>>> predict_proba(sklearn_model, text_list)
array([[0.30857194, 0.69142806],
       [0.71900947, 0.28099053]])
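To show how a tokenizer can be supplied through kwargs, here is a hypothetical sketch. `simple_tokenizer` and the word-counting `KeywordModel` are illustrative stand-ins for your own tokenizer and trained model, and the function is named `predict_proba_tokenized` to keep it distinct from the example above. The kwarg is called `tokenizer` because, per the note on **kwargs, a tokenizer passed that way is also used by Openlayer's explainability techniques; the exact interface those techniques expect from it is not described on this page.

```python
import re
from typing import Callable, List

import numpy as np


def simple_tokenizer(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase and split on non-letter characters.
    return [tok for tok in re.split(r"[^a-z']+", text.lower()) if tok]


class KeywordModel:
    """Hypothetical model that scores token lists by counting positive words."""

    positive = {"fantastic", "good", "up"}

    def predict_proba(self, token_lists: List[List[str]]) -> np.ndarray:
        hits = np.array([sum(t in self.positive for t in toks) for toks in token_lists])
        p = np.clip(0.5 + 0.2 * hits, 0.0, 1.0)
        # Column order follows class_names = ['Negative', 'Positive'].
        return np.column_stack([1.0 - p, p])


def predict_proba_tokenized(model, text_list: List[str],
                            tokenizer: Callable = simple_tokenizer, **kwargs):
    # Pre-process: tokenize every string in the batch before scoring.
    token_lists = [tokenizer(text) for text in text_list]
    return model.predict_proba(token_lists)


preds = predict_proba_tokenized(
    KeywordModel(),
    ["Things are looking up", "I have had a long weekend"],
    tokenizer=simple_tokenizer,
)
```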

The other model-specific variables needed by Openlayer are:

>>> from openlayer import ModelType
>>>
>>> model_type = ModelType.sklearn
>>> requirements_txt_file = "requirements.txt"  # path to requirements.txt

You can now upload this model to Openlayer:

>>> model = project.add_model(
...     name='Linear classifier',
...     commit_message='First iteration of vanilla logistic regression',
...     function=predict_proba,
...     model=sklearn_model,
...     model_type=model_type,
...     class_names=class_names,
...     requirements_txt_file=requirements_txt_file,
... )
>>> model.to_dict()

Note

If add_model is called inside the given project with name='Linear classifier' for the first time, a new model lineage named Linear classifier is created, and commit_message becomes the first commit on that new lineage. Later, to commit a new version to the same lineage, call add_model with name='Linear classifier' again and pass the new commit message as commit_message. To start a new, separate lineage inside the project, call add_model with a different name, e.g., name='Nonlinear classifier'.
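The lineage rule can be summarized with a small in-memory model. `LineageRegistry` is purely hypothetical and does not talk to Openlayer. It only mirrors the behavior described above: the first call with a given name starts a lineage, later calls with the same name append versions with their commit messages, and a different name starts a separate lineage. The second and third commit messages are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


class LineageRegistry:
    """Hypothetical stand-in for one project's model lineages."""

    def __init__(self) -> None:
        # Maps model name -> ordered list of commit messages (one per version).
        self.lineages: Dict[str, List[str]] = defaultdict(list)

    def add_model(self, name: str, commit_message: str) -> int:
        # A new name starts a lineage; an existing name appends a version.
        self.lineages[name].append(commit_message)
        # Position of this version in its lineage (this sketch's own numbering).
        return len(self.lineages[name])


project = LineageRegistry()
project.add_model("Linear classifier", "First iteration of vanilla logistic regression")
project.add_model("Linear classifier", "Added L2 regularization")
project.add_model("Nonlinear classifier", "Gradient boosting baseline")
print({name: len(versions) for name, versions in project.lineages.items()})
# {'Linear classifier': 2, 'Nonlinear classifier': 1}
```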