Integrate with scikit-learn

Comet integrates with scikit-learn.

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Log automatically

Comet can automatically log the following items from scikit-learn, without requiring you to instrument your code manually:

  • Hyperparameters

Configure Comet for scikit-learn

You can control what is automatically logged by Comet through an experiment parameter, environment variable, or configuration setting:

Item             Experiment Parameter   Environment Setting         Configuration Setting
hyperparameters  auto_param_logging     COMET_AUTO_LOG_PARAMETERS   comet.auto_log.parameters

For more information about using environment variables in Comet, see Configure Comet.
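
As a rough sketch of the environment-variable route, the variable can be set from Python before any experiment is created. The value "false" and the project name sklearn-demos are assumptions made for illustration; see Configure Comet for the values Comet actually accepts.

```python
import os

# Assumption: Comet reads this variable when the Experiment is created,
# and "false" is an accepted way to switch the setting off.
os.environ["COMET_AUTO_LOG_PARAMETERS"] = "false"


def create_experiment():
    """Create an experiment with automatic hyperparameter logging disabled."""
    import comet_ml  # imported here so the sketch loads without Comet installed

    # The same switch, expressed through the auto_param_logging
    # experiment parameter instead of the environment variable.
    return comet_ml.Experiment(
        project_name="sklearn-demos",
        auto_param_logging=False,
    )
```

Setting only one of the two switches should be enough in practice; both appear here to show where each one goes.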

End-to-end example

Here is a scikit-learn example.

For more examples using scikit-learn, see our examples GitHub repository.

import comet_ml

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix
random_state = 42

def evaluate(y_test, y_pred):
  return {
      'f1': f1_score(y_test, y_pred),
      'precision': precision_score(y_test, y_pred),
      'recall': recall_score(y_test, y_pred)
  }

experiment = comet_ml.Experiment(
    api_key="<Your API Key>",
    project_name="<Your Project Name>"
)

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data,
    cancer.target,
    stratify=cancer.target,
    random_state=random_state)

clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Log Training Metrics
y_train_pred = clf.predict(X_train)
with experiment.train():
  metrics = evaluate(y_train, y_train_pred)
  experiment.log_metrics(metrics)

# Log Test Metrics
y_test_pred = clf.predict(X_test)

with experiment.test():
  metrics = evaluate(y_test, y_test_pred)
  experiment.log_metrics(metrics)

Note

There are alternatives to setting the API key programmatically. See more here.

This example shows you how to search across parameter combinations using grid search:

from comet_ml import Experiment
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svr = svm.SVC()
clf = GridSearchCV(svr, parameters)
clf.fit(iris.data, iris.target)

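# Create one experiment per parameter combination tried by the grid search
for i in range(len(clf.cv_results_['params'])):
    exp = Experiment(workspace="your workspace",
                     project_name="grid_search_example")
    for k, v in clf.cv_results_.items():
        if k == "params":
            # Log the hyperparameters of this combination
            exp.log_parameters(v[i])
        else:
            # Log every other cv_results_ entry for this combination as a metric
            exp.log_metric(k, v[i])
    # Close this experiment before the next one is created
    exp.end()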
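
Note that cv_results_ also holds non-numeric columns such as param_kernel, which do not make meaningful metrics. One possible way to separate the two is sketched below. The helper split_candidate is hypothetical (it is not part of Comet or scikit-learn), and toy_results is a hand-written stand-in for clf.cv_results_ with made-up values.

```python
from numbers import Real


def split_candidate(cv_results, index):
    """Return (params, metrics) for one grid-search candidate.

    Only real-valued entries are kept as metrics, and the per-parameter
    'param_*' columns are skipped because 'params' already covers them.
    """
    params = dict(cv_results["params"][index])
    metrics = {}
    for key, values in cv_results.items():
        if key == "params" or key.startswith("param_"):
            continue
        value = values[index]
        if isinstance(value, Real) and not isinstance(value, bool):
            metrics[key] = float(value)
    return params, metrics


# Toy stand-in for clf.cv_results_ (values are made up for illustration).
toy_results = {
    "params": [{"C": 1, "kernel": "linear"}, {"C": 10, "kernel": "rbf"}],
    "param_C": [1, 10],
    "param_kernel": ["linear", "rbf"],
    "mean_test_score": [0.98, 0.97],
    "rank_test_score": [1, 2],
}

params, metrics = split_candidate(toy_results, 1)
```

With a real grid search, you would pass clf.cv_results_ instead of toy_results and hand the two results to exp.log_parameters(params) and exp.log_metrics(metrics).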

Try it out!

Try our example for using Comet with scikit-learn.

Open In Colab

Scikit-Learn model saving and loading

Comet provides helpers that let you save your models and load them back.

Saving a model

To save a Scikit-Learn model, you can use the comet_ml.integration.sklearn.log_model helper like this:

import pickle
from comet_ml import Experiment
from comet_ml.integration.sklearn import log_model
from sklearn import svm
from sklearn import datasets

experiment = Experiment()

iris = datasets.load_iris()

model = svm.SVC()
model.fit(iris.data, iris.target)

# Save the model
log_model(
    experiment,
    "my-model",
    model,
    persistence_module=pickle,
)

The model file will be saved as an Experiment Model which is visible in the Experiment assets tab. From there you will be able to register it in the Model Registry.

comet_ml.integration.sklearn.log_model supports the pickle, cloudpickle, and joblib persistence modules.
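
The three modules are interchangeable here because they expose the same dump and load functions. The sketch below illustrates that shared interface with an in-memory round trip. It assumes, without confirmation from the reference documentation, that this interface is what log_model relies on. The helper round_trip and the fake_model dict are hypothetical, and no Comet call is made.

```python
import io
import pickle


def round_trip(obj, persistence_module=pickle):
    """Serialize obj to an in-memory buffer and load it back."""
    buffer = io.BytesIO()
    persistence_module.dump(obj, buffer)
    buffer.seek(0)
    return persistence_module.load(buffer)


# A plain dict stands in for a fitted model to keep the sketch dependency-free.
fake_model = {"kernel": "rbf", "C": 1.0}
restored = round_trip(fake_model)
```

Passing persistence_module=joblib or persistence_module=cloudpickle to round_trip should behave the same way, provided those packages are installed.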

Check out the reference documentation for more details.

Loading a model

Once you have saved a model using comet_ml.integration.sklearn.log_model, you can load it back with its counterpart comet_ml.integration.sklearn.load_model.

Here is how you can load a model from the Model Registry:

from comet_ml.integration.sklearn import load_model

# Load the model from Comet Registry
model = load_model("registry://WORKSPACE/my-model:1.2.4")

prediction = model.predict(...)

You can load a scikit-learn model from various sources:

  • file://data/my-model loads the model from the file path data/my-model (relative path).
  • file:///path/to/my-model loads the model from the file path /path/to/my-model (absolute path).
  • registry://<workspace>/<registry_name> loads the latest version of the model from the Model Registry, identified by the workspace and registry name.
  • registry://<workspace>/<registry_name>:version loads the model from the Model Registry, identified by the workspace, registry name, and explicit version.
  • experiment://<experiment_key>/<model_name> loads the model from an Experiment, identified by the Experiment key and the model_name.
  • experiment://<workspace>/<project_name>/<experiment_name>/<model_name> loads the model from an Experiment, identified by the workspace name, project name, experiment name, and the model_name.
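
To make the URI shapes above concrete, here is a minimal sketch that splits each form into its named parts using only the Python standard library. The helper describe_model_uri is hypothetical and not part of comet_ml; it mirrors the list above and says nothing about how load_model parses URIs internally.

```python
from urllib.parse import urlsplit


def describe_model_uri(uri):
    """Split a model URI into its scheme and the parts listed above."""
    parts = urlsplit(uri)
    # netloc holds the first segment after '//', path holds the rest.
    location = parts.netloc + parts.path

    if parts.scheme == "file":
        return {"source": "file", "path": location}

    if parts.scheme == "registry":
        workspace, _, name = location.partition("/")
        name, _, version = name.partition(":")
        return {
            "source": "registry",
            "workspace": workspace,
            "registry_name": name,
            "version": version or None,  # None means "use the latest version"
        }

    if parts.scheme == "experiment":
        segments = location.split("/")
        if len(segments) == 2:
            keys = ("experiment_key", "model_name")
        elif len(segments) == 4:
            keys = ("workspace", "project_name", "experiment_name", "model_name")
        else:
            raise ValueError(f"Unexpected experiment URI: {uri}")
        return {"source": "experiment", **dict(zip(keys, segments))}

    raise ValueError(f"Unsupported scheme in URI: {uri}")


info = describe_model_uri("registry://WORKSPACE/my-model:1.2.4")
```

Here info holds the workspace, registry name, and version as separate fields; omitting the ":1.2.4" suffix leaves version as None.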

Check out the reference documentation for more details.

Jul. 9, 2024