Version control and Git

Version control is a foundational practice in software development: it tracks changes, keeps projects stable, and makes collaboration possible. In machine learning operations (MLOps), versioning extends beyond code to include data, models, and the environments they run in.

Why versioning matters in MLOps

There are several reasons why you should use versioning in MLOps:

Reproducibility

Reproducibility is a cornerstone of scientific and engineering disciplines. In MLOps, the ability to reproduce results is essential for validating models and their predictions. Versioning code and data ensures that experiments are repeatable, helping teams understand which changes influenced model performance and ensuring that models can be audited or debugged.
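As a minimal illustration of versioning code and data together, the sketch below records a code version and a content hash of the training data in one manifest. It is generic Python, not a Domino API; the function names and manifest fields are hypothetical and chosen only for this example.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_manifest(code_version, data_path, params):
    """Bundle the identifiers needed to repeat an experiment.

    code_version is any string that pins the code, for example a Git
    commit hash. The field names here are illustrative assumptions.
    """
    return {
        "code_version": code_version,
        "data_file": Path(data_path).name,
        "data_sha256": file_sha256(data_path),
        "params": dict(params),
    }


def write_manifest(manifest, out_path):
    """Write the manifest as sorted JSON so that diffs stay readable."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

If two runs produce identical manifests, they used the same code version, the same data bytes, and the same parameters. A differing `data_sha256` shows that the data changed between runs.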

To learn more about reproducibility in Domino, see Domino Reproducibility Engine (DRE).

Collaboration

Versioning makes collaboration within teams smoother. Data scientists often work in diverse teams where different members build on and iterate on each other's experiments. Versioning lets these teams track who changed what, merge contributions efficiently, and revert changes if something goes wrong.

To learn more about collaboration, see Collaborative data science in Domino.

Compliance

In many industries, regulatory compliance requires detailed records of the data and model versions used to make decisions. Versioning provides the audit trail needed for compliance and monitoring.

To learn more about compliance in Domino, see Security and compliance.

Continuous improvement via versioning

In MLOps, models are continuously updated with new data. Versioning lets teams integrate new data and models into production environments, supporting continuous improvement and deployment practices. This is crucial for applications where model performance can degrade over time as patterns in data change. Domino supports versioning of both data and code and builds these capabilities into its platform.

Versioning code via Git integration

Domino integrates with Git repositories, supporting the code versioning practices that software developers already know. With this integration, users can track changes in their code, collaborate through pull requests, and maintain a history of modifications. As a result, the code behind each stage of model development, from data preparation to model training and evaluation, is versioned and traceable.
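One way to make a training run traceable to its code is to record the checked-out commit when the run starts. The sketch below is generic Python that calls the standard `git` command line (`git rev-parse HEAD` and `git status --porcelain`). It does not use any Domino API, and the helper names are hypothetical.

```python
import subprocess


def _run_git(args, repo_dir):
    """Run a git command and return its stdout, or None on any failure."""
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, repo_dir does not exist,
        # or repo_dir is not inside a repository with commits
        return None
    return result.stdout


def current_git_commit(repo_dir="."):
    """Return the checked-out commit hash, or None if it cannot be determined."""
    output = _run_git(["rev-parse", "HEAD"], repo_dir)
    return output.strip() if output is not None else None


def working_tree_is_clean(repo_dir="."):
    """Return True if there are no uncommitted changes, None if unknown."""
    output = _run_git(["status", "--porcelain"], repo_dir)
    return None if output is None else output.strip() == ""
```

Checking `working_tree_is_clean()` before a run is a design choice worth considering: a commit hash identifies the code exactly only when there are no uncommitted edits on top of it.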

To learn more, see working with Git integration.

Versioning data via Domino Snapshots

Domino allows users to create snapshots of datasets at various points in their projects. These snapshots capture the state of the dataset at a specific time, providing a versioned history that can be used to train models at different stages of the project. This is particularly useful for tracking how changes in data affect model performance over time.
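To make the idea of a snapshot concrete, the sketch below keeps numbered, point-in-time copies of a dataset directory. This is only a conceptual illustration in plain Python. It is not how Domino implements Snapshots, and the function names and folder layout are assumptions made for the example.

```python
import shutil
from pathlib import Path


def create_snapshot(dataset_dir, snapshots_root):
    """Copy dataset_dir into the next numbered folder under snapshots_root.

    snapshots_root must live outside dataset_dir, otherwise the copy
    would include earlier snapshots.
    """
    root = Path(snapshots_root)
    root.mkdir(parents=True, exist_ok=True)
    existing = [
        int(p.name) for p in root.iterdir() if p.is_dir() and p.name.isdigit()
    ]
    version = max(existing, default=0) + 1
    shutil.copytree(dataset_dir, root / str(version))
    return version


def snapshot_path(snapshots_root, version):
    """Return the folder that holds a given snapshot version."""
    return Path(snapshots_root) / str(version)
```

Training against `snapshot_path(root, 1)` and later against `snapshot_path(root, 2)` keeps each model tied to the exact data it saw, even after the working copy of the dataset has changed.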

To learn more, see version data with Snapshots.

Versioning tools and libraries via Domino Environments

Domino has built-in support for versioning Environments. Environments are based on Docker containers that hold the tools and libraries that data scientists and machine learning (ML) engineers need to train and deploy models. Installing these tools and libraries can be challenging, because versions of different open-source libraries may not be compatible with each other. When you add a tool or library to an Environment, Domino creates a new version of that Environment. Other team members or auditors can then use that Environment version to test changes or review the model and software for potential risks.
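The value of a versioned environment is easier to see when the installed library versions are written down and compared. The sketch below uses Python's standard `importlib.metadata` to list installed packages, render them as pinned requirement lines, and diff two such listings. It is independent of Domino, and the helper names are hypothetical.

```python
from importlib import metadata


def installed_packages():
    """Return {distribution name: version} for the current Python environment."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:  # skip entries with incomplete metadata
            packages[name] = version
    return packages


def as_requirements(packages):
    """Render pinned 'name==version' lines, sorted for stable diffs."""
    ordered = sorted(packages.items(), key=lambda item: item[0].lower())
    return "\n".join(f"{name}=={version}" for name, version in ordered)


def changed_packages(before, after):
    """Return {name: (old, new)} for packages added, removed, or re-versioned."""
    names = set(before) | set(after)
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }
```

Saving the output of `as_requirements(installed_packages())` alongside each environment version gives reviewers a plain-text record, and `changed_packages` shows what differs between two of those records.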

To learn more, see working with Domino Environments.

Unified workflow

By supporting both data and code versioning, Domino provides a unified workflow that maintains the integrity of ML projects. This approach reduces errors, simplifies debugging, and makes it easier for data science teams to collaborate.

To learn more, see getting started with Projects in Domino.

Next steps

Orchestrate MLOps with CI/CD.
