
Integrate with Ludwig Toolbox

Comet integrates with Ludwig Toolbox.

Ludwig is a TensorFlow-based toolbox that lets you train and test deep learning models without writing code. By offering a well-defined, codeless deep learning pipeline from beginning to end, Ludwig enables practitioners and researchers alike to quickly train and test their models and obtain strong baselines to compare experiments against. Ludwig offers CLI commands for preprocessing data, training, issuing predictions, and creating visualizations.

Install Ludwig

Install Ludwig for Python, along with the spaCy English model, which this example needs because it uses text features. The following examples have been tested with Python 3.6 and Ludwig 0.2.

$ pip install ludwig
$ python -m spacy download en

If you encounter problems installing gmpy, install libgmp or gmp:

  • On Debian-based Linux distributions: sudo apt-get install libgmp3-dev
  • On MacOS: brew install gmp

Install Comet

  1. If you haven't already, install Comet:
$ pip install comet_ml
  2. Log on to Comet.

  3. Make sure to set up your Comet credentials. Get your API key on the Settings page.

  4. Make your API key available to Ludwig and set which Comet project you’d like the Ludwig experiment details to report to. Replace each ... below with the appropriate value:

$ export COMET_API_KEY="..."
$ export COMET_PROJECT_NAME="..."
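
Before training, it can help to confirm that both variables are visible to the process that will launch Ludwig. The following is a minimal sketch, not part of Ludwig or Comet: the helper name missing_comet_settings is hypothetical, and only the two variable names come from the commands above.

```python
import os

# The two variable names come from the export commands above.
REQUIRED_VARS = ("COMET_API_KEY", "COMET_PROJECT_NAME")


def missing_comet_settings(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_comet_settings()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("Comet environment variables are set.")
```

Run it in the same shell session in which you exported the variables, since exports do not carry over to other terminals.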

We recommend that you create a new directory for each Ludwig experiment.

Some background: every time you want to create a new model and train it, you will use one of two commands:

  • ludwig train
  • ludwig experiment

When you run either command with the --comet flag, a .comet.config file is created in the current directory. This file pulls your API key and Comet project name from the environment variables you set above and creates an Experiment key for use in this directory.

To run another Experiment, we recommend creating a new directory and running Ludwig from there, so that a separate Experiment is created.
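
The one-directory-per-Experiment recommendation can be enforced with a small guard. This is a sketch under the assumption that the presence of a .comet.config file is a good enough signal that a directory has already been used; the function name prepare_experiment_dir is hypothetical.

```python
from pathlib import Path


def prepare_experiment_dir(path):
    """Create `path` if needed and return it as a Path.

    Refuses to reuse a directory that already holds a .comet.config,
    since that file ties the directory to an existing Experiment.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    if (directory / ".comet.config").exists():
        raise FileExistsError(
            f"{directory} already has a .comet.config; use a new directory"
        )
    return directory
```

For example, prepare_experiment_dir("reuters_run_1") returns a directory you can change into before running ludwig train or ludwig experiment. The directory name here is only an illustration.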

Download the dataset

For this example, we will work on a text classification problem using Reuters-21578, a well-known newswire dataset. It contains only 21,578 newswire documents grouped into six categories. Two are 'big' categories (many positive documents), two are 'medium' categories, and two are 'small' categories (few positive documents).

  • Small categories: heat.csv, housing.csv
  • Medium categories: coffee.csv, gold.csv
  • Big categories: acq.csv, earn.csv

To get the dataset, we use the curl command-line program:

$ curl http://boston.lti.cs.cmu.edu/classes/95-865-K/HW/HW2/reuters-allcats-6.zip \
    -o reuters-allcats-6.zip
$ unzip reuters-allcats-6.zip

You can also just download the file and place it in this directory.
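
If curl or unzip is not available, the same two steps can be approximated with the Python standard library. This is a sketch only: the function names download and unzip are hypothetical, and the URL is the one used in the curl command above.

```python
import urllib.request
import zipfile
from pathlib import Path

# Same URL as in the curl command above.
URL = "http://boston.lti.cs.cmu.edu/classes/95-865-K/HW/HW2/reuters-allcats-6.zip"


def download(url, dest):
    """Fetch `url` into `dest` unless the file is already present."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def unzip(archive, target_dir="."):
    """Extract every member of `archive` and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

Calling unzip(download(URL, "reuters-allcats-6.zip")) mirrors the curl and unzip commands and returns the names of the extracted files.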

Define the model

Define the model you wish to build with the input and output features you want. Create a file named model_definition.yaml with these contents:

input_features:
    -
        name: text
        type: text
        level: word
        encoder: parallel_cnn

output_features:
    -
        name: class
        type: category
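
Each feature name in the model definition refers to a column of the training CSV, so a quick header check can catch mismatches before training starts. The sketch below assumes the CSV has a header row and is readable as UTF-8; the helper names missing_columns and class_counts are hypothetical, and the default column names text and class are taken from the definition above.

```python
import csv
from collections import Counter


def missing_columns(csv_path, required=("text", "class")):
    """Return the required column names that the CSV header lacks."""
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as f:
        header = next(csv.reader(f), [])
    return [name for name in required if name not in header]


def class_counts(csv_path, label_column="class"):
    """Count how many rows carry each value of the label column."""
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as f:
        return Counter(row[label_column] for row in csv.DictReader(f))
```

An empty list from missing_columns means the input and output feature names both match columns in the file. class_counts gives a rough view of how unbalanced the categories are.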

Train the model

Train the model with the new --comet flag:

$ ludwig experiment --comet --data_csv reuters-allcats.csv \
    --model_definition_file model_definition.yaml

Once you run this, a Comet experiment is created. Check the command output for the Comet experiment URL.
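
When the training command is launched from a script rather than typed by hand, it can be assembled as an argument list. The sketch below uses only the flags shown in the command above; the function names build_experiment_command and run_experiment are hypothetical, and running it assumes the ludwig executable is on your PATH.

```python
import subprocess


def build_experiment_command(data_csv, model_definition_file, use_comet=True):
    """Assemble the `ludwig experiment` command as an argument list."""
    cmd = ["ludwig", "experiment"]
    if use_comet:
        cmd.append("--comet")
    cmd += [
        "--data_csv", data_csv,
        "--model_definition_file", model_definition_file,
    ]
    return cmd


def run_experiment(data_csv, model_definition_file):
    """Run the command and raise CalledProcessError if Ludwig fails."""
    return subprocess.run(
        build_experiment_command(data_csv, model_definition_file),
        check=True,
    )
```

Passing a list to subprocess.run avoids shell quoting problems with file names that contain spaces.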

Analysis

In Comet (even while the above Experiment is being run), you’ll be able to see:

  • Your model metrics in real time on the Charts tab.
  • The bash command you ran to train your Experiment, along with any run arguments, in the Code tab.
  • The hyperparameters that Ludwig is using (the defaults) in the Hyperparameter tab, and much more!

If you choose to make any visualizations with Ludwig, it’s also possible to upload these visualizations to Comet’s Image tab by running:

$ ludwig visualize --comet \
    --visualization learning_curves \
    --training_statistics \
    ./results/experiment_run_0/training_statistics.json
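
If the results directory holds several runs, the statistics files can be located with a glob and turned into one visualize command each. This is a sketch that assumes every run directory under ./results stores its statistics in a file named training_statistics.json, as in the path above; the function names are hypothetical.

```python
from pathlib import Path


def find_training_statistics(results_dir="results"):
    """Return training_statistics.json paths, one per run directory."""
    return sorted(Path(results_dir).glob("*/training_statistics.json"))


def build_visualize_command(statistics_path):
    """Assemble the `ludwig visualize` command shown above for one file."""
    return [
        "ludwig", "visualize", "--comet",
        "--visualization", "learning_curves",
        "--training_statistics", str(statistics_path),
    ]
```

Each returned list can be passed to subprocess.run from the directory that contains the matching .comet.config.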

To keep up to date with Ludwig, consider these resources:

Jul. 9, 2024