Version control is a foundational practice in the world of software development, crucial for tracking changes, maintaining stability, and facilitating collaboration. In machine learning operations (MLOps), versioning extends beyond code to include data, models, and more.
There are several reasons why you should use versioning in MLOps:
Reproducibility
Reproducibility is a cornerstone of scientific and engineering disciplines. In MLOps, the ability to reproduce results is essential for validating models and their predictions. Versioning code and data ensures that experiments are repeatable, helping teams understand which changes influenced model performance and ensuring that models can be audited or debugged.
To learn more about reproducibility in Domino, see Domino Reproducibility Engine (DRE).
Collaboration
Versioning facilitates smoother collaboration within teams. Data scientists often work in diverse teams where experiments are built upon and iterated by different team members. Versioning allows these teams to track who made what changes, merge contributions efficiently, and revert changes if something goes wrong.
To learn more about collaboration, see Collaborative data science in Domino.
Compliance
For many industries, compliance with regulatory standards requires detailed records of data and model versions used to make decisions. Versioning provides an audit trail that can be invaluable for compliance and monitoring purposes.
To learn more about compliance in Domino, see Security and compliance.
In MLOps, models are continuously updated with new data. Versioning allows for the seamless integration of new data and models into production environments, supporting continuous improvement and deployment practices. This is crucial for applications where model performance can degrade over time as patterns in data change. Domino provides robust support to version both data and code, integrating these capabilities into its platform to enhance MLOps workflows.
Versioning code via Git integration
Domino integrates seamlessly with Git repositories, supporting code versioning practices that are familiar to software developers. This integration allows users to track changes in their code, collaborate through pull requests, and maintain a history of modifications. This linkage ensures that all aspects of model development, from data preparation to model training and evaluation, are versioned and traceable.
To learn more, see working with Git integration.
Versioning data via Domino Snapshots
Domino allows users to create snapshots of datasets at various points in their projects. These snapshots capture the state of the dataset at a specific time, providing a versioned history that can be used to train models at different stages of the project. This is particularly useful for tracking how changes in data affect model performance over time.
To learn more, see version data with Snapshots.
Versioning tools and libraries via Domino Environments
Domino has built-in support for versioning Environments, which are based on Docker containers that contain tools and libraries that data scientists and machine learning (ML) engineers need to train and deploy great models. It can often be challenging to install these tools and libraries, as versions of the different open-source libraries may not be compatible with each other. Domino makes it easy to add new tools or libraries to an Environment, which then creates a new version of that Environment. Other team members or auditors can then use that Environment to test changes or review the model and software for potential risks.
To learn more, see working with Domino Environments.
Unified workflow
By supporting both data and code versioning, Domino provides a unified workflow that maintains the integrity of ML projects. This approach reduces errors, simplifies debugging, and enhances the collaborative efforts of data science teams.
To learn more, see getting started with Projects in Domino.