Use Artifacts
Frequently, the data used or created by an Experiment or model must be tracked. In many cases, machine learning teams must:
- Reuse data produced in their experimentation pipeline, and allow it to be tracked, versioned, consumed, and analyzed in a managed way.
- Track and reproduce complex multi-Experiment scenarios, where the output of one Experiment is used as the input of another Experiment.
- Iterate on their datasets over time, track which model used which version of the dataset, and schedule model re-training.
Comet Artifacts is a tool that provides a convenient way to log, version, and browse data from all parts of the experimentation pipeline.
An Artifact is a versioned object, where each version is an immutable snapshot of files and assets arranged in a folder-like logical structure. Each snapshot can also include metadata, tags, and aliases to make it easier to query. An Artifact also tracks which Experiment produced it and which Experiments consumed it, giving full data lineage capabilities.
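To make this structure concrete, here is a minimal, purely conceptual sketch in plain Python. It is a hypothetical toy model (`ToyArtifact` and `ArtifactVersion` are invented names), not Comet's implementation or API: each version is a frozen snapshot that maps logical paths to file contents and carries metadata, tags, and aliases, while the Artifact records which Experiment produced each version and which Experiments consumed it.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, FrozenSet, Iterable, List, Mapping, Optional, Set


@dataclass(frozen=True)
class ArtifactVersion:
    """One immutable snapshot of a hypothetical Artifact."""
    version: int
    files: Mapping[str, bytes]          # logical path -> file contents
    metadata: Mapping[str, object]
    tags: FrozenSet[str]
    aliases: FrozenSet[str]
    produced_by: str                    # ID of the Experiment that logged it


class ToyArtifact:
    """Hypothetical in-memory model of a versioned Artifact (not Comet's code)."""

    def __init__(self, name: str, artifact_type: str) -> None:
        self.name = name
        self.artifact_type = artifact_type
        self.versions: List[ArtifactVersion] = []
        self.consumed_by: Dict[int, Set[str]] = {}

    def add_version(
        self,
        files: Mapping[str, bytes],
        produced_by: str,
        metadata: Optional[Mapping[str, object]] = None,
        tags: Iterable[str] = (),
        aliases: Iterable[str] = (),
    ) -> ArtifactVersion:
        # Copy the inputs into read-only views so that later changes to the
        # caller's dictionaries cannot alter a snapshot already recorded.
        snapshot = ArtifactVersion(
            version=len(self.versions) + 1,
            files=MappingProxyType(dict(files)),
            metadata=MappingProxyType(dict(metadata or {})),
            tags=frozenset(tags),
            aliases=frozenset(aliases),
            produced_by=produced_by,
        )
        self.versions.append(snapshot)
        self.consumed_by[snapshot.version] = set()
        return snapshot

    def consume(self, version: int, experiment_id: str) -> ArtifactVersion:
        # Record lineage: remember which Experiment read this version.
        if not 1 <= version <= len(self.versions):
            raise KeyError(version)
        self.consumed_by[version].add(experiment_id)
        return self.versions[version - 1]
```

Copying the inputs into read-only views is what makes each snapshot immutable in this toy model: editing the caller's dictionary after a version is recorded does not change that version.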
Artifacts live at the Workspace level and can be accessed across Projects. Logged Artifacts can be viewed in the Artifacts tab of your Workspace.
Artifact versions make it easy to fetch a snapshot of data from a particular point in time, or to reproduce an Experiment based on its specific inputs.
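Fetching a particular snapshot amounts to resolving a requested version or alias to exactly one version. The sketch below assumes semantic version strings such as `"1.2.0"` and uses a hypothetical `resolve` helper that falls back to the newest version when nothing is requested; it illustrates the idea and is not Comet's lookup logic.

```python
from typing import Dict, Optional, Set, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    # "1.10.0" -> (1, 10, 0), so versions compare numerically, not as text
    return tuple(int(part) for part in version.split("."))


def resolve(versions: Dict[str, Set[str]], version_or_alias: Optional[str] = None) -> str:
    """Map a version string or an alias to exactly one version string.

    `versions` maps each version string to the set of aliases attached to it.
    """
    if not versions:
        raise LookupError("artifact has no versions")
    if version_or_alias is None:
        # Nothing requested: fall back to the newest version
        return max(versions, key=parse_version)
    if version_or_alias in versions:
        return version_or_alias
    matches = [v for v, aliases in versions.items() if version_or_alias in aliases]
    if len(matches) != 1:
        raise LookupError(f"no unique version for {version_or_alias!r}")
    return matches[0]
```

Comparing versions as integer tuples rather than as strings matters here: as text, `"1.10.0"` would sort before `"1.2.0"`.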
Grow your ML Projects with Comet Artifacts
Using Comet Artifacts as part of your training runs provides several benefits:
- You can version your dataset and cleanly separate dataset preparation from model training, while maintaining a clear link between them.
- Then, with Comet Artifact Lineage, you will be able to quickly identify which version of your dataset was used to train a given model. Artifacts become especially valuable when you run multi-stage pipelines and need to track the inputs and outputs of each Experiment in the process.
- As the number of Experiments increases and your datasets grow, you are likely to start working with cloud storage services. That's when Comet's remote Artifact Assets features become especially powerful.
- Finally, Comet lets you fully sync with remote Artifact Assets on AWS S3 and GCS.
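Lineage questions such as "which dataset version trained this model?" become a graph walk once each Experiment's inputs and outputs are recorded. The following hypothetical sketch (`LineageGraph` and the `name:version` identifiers are invented for illustration, not how Comet stores lineage) models a two-stage pipeline in which a preparation Experiment produces a cleaned dataset that a training Experiment then consumes.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set


class LineageGraph:
    """Hypothetical record of which Experiment consumed and produced which artifact version."""

    def __init__(self) -> None:
        self.producer: Dict[str, str] = {}                          # artifact version -> producing Experiment
        self.inputs: DefaultDict[str, Set[str]] = defaultdict(set)  # Experiment -> consumed versions

    def record(self, experiment: str, consumed: Iterable[str] = (), produced: Iterable[str] = ()) -> None:
        self.inputs[experiment].update(consumed)
        for artifact in produced:
            self.producer[artifact] = experiment

    def upstream(self, artifact: str) -> Set[str]:
        """Return every artifact version `artifact` was derived from, directly or indirectly."""
        seen: Set[str] = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            experiment = self.producer.get(current)
            if experiment is None:
                continue  # no recorded producer: treat it as a pipeline input
            for parent in self.inputs[experiment]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = LineageGraph()
graph.record("prepare-exp", consumed=["raw-data:1.0.0"], produced=["clean-data:2.0.0"])
graph.record("train-exp", consumed=["clean-data:2.0.0"], produced=["model:1.0.0"])
print(sorted(graph.upstream("model:1.0.0")))  # ['clean-data:2.0.0', 'raw-data:1.0.0']
```

The walk uses an explicit stack and a `seen` set, so shared inputs are visited once even when several stages consume the same dataset version.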
Create an Artifact
It takes only a few lines of code to register an Artifact of any size to Comet.
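```python
from comet_ml import Experiment, Artifact

# Create an Experiment to log the Artifact from
experiment = Experiment(
    api_key="<Your API Key>",
    project_name="<Your Project Name>"
)

# Define the Artifact and add a local file to it
artifact = Artifact(name="artifact-name", artifact_type="dataset")
artifact.add("path/to/my/file.csv")

# Upload the Artifact; this Experiment is recorded as its producer
experiment.log_artifact(artifact)
experiment.end()
```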
Download an Artifact
The process of using an Artifact is similar to the one used to create it.
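```python
from comet_ml import Experiment

experiment = Experiment(
    api_key="<Your API Key>",
    project_name="<Your Project Name>"
)

# Fetch the Artifact; this Experiment is recorded as one of its consumers
logged_artifact = experiment.get_artifact("artifact-name")

# Download the Artifact's files into the ./data directory
local_artifact = logged_artifact.download("./data")

experiment.end()
```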
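Because each snapshot arranges its files in a folder-like logical structure, downloading amounts to writing every file under the target directory at its logical path. The sketch below is a hypothetical, stdlib-only illustration of that idea (`materialize` is an invented name and does not call Comet); it assumes that logical paths map directly to relative file paths.

```python
from pathlib import Path
from typing import List, Mapping


def materialize(files: Mapping[str, bytes], root: str) -> List[Path]:
    """Write each logical path under `root` and return the files written."""
    root_path = Path(root).resolve()
    written: List[Path] = []
    for logical_path, data in files.items():
        target = (root_path / logical_path).resolve()
        # Reject logical paths such as "../x" or "/etc/x" that escape `root`
        if root_path not in target.parents:
            raise ValueError(f"unsafe logical path: {logical_path!r}")
        target.parent.mkdir(parents=True, exist_ok=True)  # recreate the folder layout
        target.write_bytes(data)
        written.append(target)
    return written
```

The containment check is a common safeguard whenever file names come from stored data rather than from the local user.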
Using Artifacts with Snowflake
Comet's integration with Snowflake, the scalable cloud data platform, lets developers building machine learning applications track and version their Snowflake queries and datasets, giving them a comprehensive overview of their model development process.
Visit our Snowflake integration page for more information and examples.
Learn more
- Remote Artifacts: using Remote Artifacts to track data stored outside of Comet.
- Comet Artifact Lineage: an interactive way to better understand and visualize the datasets that your team is using for machine learning.
- End-to-end example: Artifact basics in action.