Domino 5.0.0 (December 2021)¶
The following versions have been validated with Domino 5.0. Other versions may also be compatible but are not guaranteed.
Ray - 1.6.0
Spark - 3.1.2
Dask - 2021.10.0
Many of these new features require packages that are specific to the 5.0 Domino Standard Environment (DSE). To use these features, do one of the following:
If you use the latest 5.0 DSE, all packages come pre-installed.
If you are switching from a custom environment to the 5.0 DSE, be sure to re-apply the same customizations afterward.
If you keep a custom environment, or one that doesn't include the latest packages, add the packages using these instructions.
Seamless experience to enable continuous monitoring of Model APIs¶
Domino’s Model APIs now have a quick and easy set of steps to prepare your model for continuous monitoring and proactive alerting on drift and model quality.
You no longer need to build a separate data pipeline to collect and store a model’s production data for use in monitoring. Using our data capture libraries, you can add simple instrumentation to your training and inference code that is seamlessly integrated with the workbench and deployment platform. Domino automatically analyzes the prediction and training data to generate drift and model quality monitoring metrics and alerting. Our integrated workflows unlock remediation steps via our new workspace reproducibility feature. We provide you with an environment to debug your code and access prediction data in near real time in the form of a Domino Dataset for deeper analysis of drift and model quality results.
New mechanism for ingesting prediction data
New client libraries in Python and R allow you to instrument your prediction logic to capture prediction data that is then automatically streamed to the Model Monitor for analysis so it can be applied in data drift and model quality monitoring. The data is also made available in Domino Datasets within your project.
Domino automatically infers the schema of the model from the registered Training Set so you do not need to configure the model separately for monitoring.
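Domino's actual Python and R capture libraries are not reproduced in this note. As a minimal sketch of the instrumentation pattern they enable (all class and method names below are hypothetical stand-ins, not Domino's API), a capture client records each prediction together with its feature values, an event ID, and a timestamp, which is the shape of data the Model Monitor needs for drift and model quality analysis:

```python
import uuid
from datetime import datetime, timezone

class PredictionCapture:
    """Stand-in for a prediction data capture client (illustrative only)."""

    def __init__(self, feature_names, target_name):
        self.feature_names = feature_names
        self.target_name = target_name
        self.records = []  # the real client streams these to the Model Monitor

    def capture(self, feature_values, prediction, event_id=None):
        """Record one prediction; returns the event ID for later ground-truth matching."""
        event_id = event_id or str(uuid.uuid4())
        self.records.append({
            "features": dict(zip(self.feature_names, feature_values)),
            self.target_name: prediction,
            "event_id": event_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return event_id

# Inference code is instrumented at the point a prediction is returned:
capture = PredictionCapture(["age", "income"], "approved")

def predict(age, income):
    result = income > 50000          # placeholder model logic
    capture.capture([age, income], result)
    return result

predict(35, 62000)
```

The returned event ID is what lets ground truth be joined back to predictions later for model quality metrics.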
A new ‘Monitoring’ tab in the Model API section provides detailed drift and model quality information on a per-feature basis. Within a simple configuration dialog, you can enable continuous monitoring of your model.
In addition, the new UI allows configurations to adjust test types, thresholds, schedules, alert notifications, and more.
Signals for drift and model quality monitoring are now integrated into the Model API overview page to get a holistic view of the quality of all your models.
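The exact statistical tests the Model Monitor runs are configurable and not specified in this note. As an illustration of what a drift signal measures, here is a sketch of the population stability index (PSI), one common drift statistic, and not necessarily Domino's implementation:

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a training (expected) and production (actual) sample
    of one numeric feature: sum_i (a_i - e_i) * ln(a_i / e_i), where e_i
    and a_i are the fraction of each sample in bin i. A common rule of
    thumb: PSI < 0.1 means little drift, > 0.25 means significant drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    eps = 1e-6  # floor for empty bins, avoids log(0) and division by zero

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)
            counts[max(i, 0)] += 1  # clamp values outside the training range
        return [max(c / len(sample), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

train = [i / 100 for i in range(1000)]   # uniform on [0, 10)
same = train[:]                          # no drift
shifted = [x + 5 for x in train]         # heavy drift
```

A production feature whose distribution matches training scores near zero; the shifted sample scores well above the 0.25 alerting rule of thumb.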
Model remediation and reproducibility
With a new ‘Open in Workspace’ feature, Domino establishes a “closed-loop” environment to access historical predictions for deeper analysis of issues found during monitoring, reproduce a development environment, and re-deploy a more accurate model faster.
The Cohort Analysis feature gives you the details you need in order to take remedial action when the Model Monitor detects data drift or model quality issues. It identifies underperforming cohorts of data, determines what differentiates this data, and shows you specific features of the data that require examination.
See Cohort Analysis for details and configuration steps.
Support for Parquet files
Parquet files are now supported in training, prediction, and ground truth datasets in the Model Monitor. With this change, both CSV and Parquet formats are supported for datasets in monitoring.
See Model Monitoring for complete details.
Install-time updates to Domino¶
With the model monitoring integration, there are a few changes to the Domino deployment and installation process:
The minimum node requirements for Domino are now increased by 2 nodes (1 platform node + 1 compute node).
If the platform cluster does not support autoscaling, the extra nodes must be made available before upgrading to 5.0. If it does support autoscaling, no changes are needed.
Install options for the Model Monitor
In 5.0, there is no standalone Model Monitor installation and no standalone Domino installation; the two are always installed together.
Install-time configuration options to enable / disable the Model Monitor no longer exist.
On-demand cluster auto-scaling
Domino can now auto-scale Spark, Ray, and Dask on-demand clusters to dynamically grow and shrink to optimize resource utilization and reduce cost, especially with bursty workloads. This allows you to start with a small cluster which then automatically scales up and down in response to the resource consumption of your workload.
Cluster auto-scaling is not enabled by default. See the Administration Guide for details about enabling this feature. See Spark cluster auto-scaling, Ray cluster auto-scaling, and Dask cluster auto-scaling for more details and configuration options.
The preferred method of connecting to a Ray cluster has changed to use a modified version of ray.init(). For more information, including the proper way to connect to a Ray cluster depending on the version, see the relevant section of the Ray documentation.
Domino Data API¶
This new Python library provides consistent access patterns for tabular and file-based data, including SQL-based access to tabular data. With this method, there is no need to restart a workload to install drivers: connectors can be queried on the fly, and results are available as dataframe abstractions for popular libraries.
This is a preview feature, subject to change in future releases.
In Domino 5.0, this new data access method is supported for Snowflake, Redshift, and S3, with additional data sources planned for future releases.
See the Domino Data API documentation.
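The Data API's client classes are not shown in this note. To illustrate the access pattern it describes (query a configured connector with SQL, no driver installs, tabular results back), here is a stand-in sketch using the stdlib sqlite3 module in place of a real connector; the Datasource class and its methods are hypothetical, not Domino's API:

```python
import sqlite3

# A local sqlite3 database stands in for a configured data source
# such as Snowflake or Redshift.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 19.99), (2, 5.00), (3, 42.50)])

class Datasource:
    """Stand-in for a configured connector object (illustrative only)."""
    def __init__(self, conn):
        self._conn = conn

    def query(self, sql):
        cur = self._conn.execute(sql)
        cols = [d[0] for d in cur.description]
        # Rows returned as dicts; the real library returns dataframe
        # abstractions for popular libraries instead.
        return [dict(zip(cols, row)) for row in cur.fetchall()]

ds = Datasource(conn)
rows = ds.query("SELECT id, amount FROM orders WHERE amount > 10")
```

The point of the pattern is that the query runs against an already-configured connector, so no driver installation or workload restart is needed.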
Training sets persist dataframes for model training or analysis. You can store or load multiple versions of dataframes, produced by data source queries or constructed by any other method you like. Training sets are stored alongside projects, using the same permissions as the project. In addition to training data, training sets capture metadata to establish a baseline for integrated model monitoring. Performance is optimized by caching expensive retrieval operations.
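The store-and-load versioning behavior described above can be sketched as follows; the class and method names are illustrative stand-ins, not Domino's training set API:

```python
class TrainingSetStore:
    """Stand-in for versioned training-set storage (illustrative only).

    Each store() call appends a new immutable version; load() returns
    the latest version by default, or a specific one on request.
    """

    def __init__(self):
        self._versions = {}  # training-set name -> list of (data, metadata)

    def store(self, name, data, metadata=None):
        """Persist a new version; returns its 1-based version number."""
        self._versions.setdefault(name, []).append((data, metadata or {}))
        return len(self._versions[name])

    def load(self, name, version=None):
        """Load a specific version, or the latest when version is None."""
        versions = self._versions[name]
        index = (version if version is not None else len(versions)) - 1
        return versions[index][0]

store = TrainingSetStore()
store.store("churn-train", [[1, 0], [0, 1]], {"target": "churned"})
store.store("churn-train", [[1, 0], [0, 1], [1, 1]])
```

The attached metadata (such as the target column) is what lets monitoring use a stored version as a drift baseline.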
Vault integration for secret storage
Domino now stores environment variables, user API keys, and data source access secrets at rest, in encrypted form, inside a pre-configured internal HashiCorp Vault.
Improvements to data sources
Data connectors (UI, SDK, and API) provide access to popular data sources in a structured, self-service, and highly secure manner.
Now when you have a configured data source, you can copy a code snippet to paste into your query code for easy access to the data.
Credentials are now stored securely.
Data access credentials are now stored, encrypted, in an internal HashiCorp Vault.
You can share data sources with other users and organizations.
You can query multiple data sources.
Git integration improvements
This release includes smoother integration for Git-based projects.
Switch Git branches in your Domino workspace.
Users now have full access to Git branches during model development for seamless collaboration.
Resolve merge conflicts directly in your Domino workspace.
Domino 5.0 improves on existing capabilities by simplifying the process for resolving conflicts when merging code.
Checkpoints for durable workspaces
Durable workspaces, introduced in Domino 4.4, now allow you to create “checkpoints” – reproducible commit points that you can return to at any point to review the history of your work or branch experiments in new directions. A new Open in Workspace button in the Model API interface lets you open a model at any checkpoint. You can also review the workspace session history. See Recreate A Workspace From A Previous Commit.
Only models published in 5.0 and up can be opened this way. Models originally published using an earlier release will have the Open in Workspace button grayed out. Additionally, commits made outside of Domino cannot be used as checkpoints.
Compute cluster dedicated hardware tiers
A new Restrict to compute cluster setting specifies the compute cluster types (such as Spark, Ray, Dask, and so on) that should exclusively use a given Hardware Tier. See the Administration Guide for details.
Improvements to Domino Environments
Stability and security are improved in the Domino Environments for 5.0. Here are the key changes in the 5.0 Domino Environment images:
Fixed the R kernel in Jupyter.
Added the Spark Cluster and Spark Compute Environments by default to new deployments, to support the Cohort Analysis feature.
Fixed Jupyter notebook subdomain support.
Added more Ray packages to the Ray Compute Environment.
The Model Monitor is now compatible with Istio.
The Domino CLI installer no longer requires administrator privileges on Windows or Linux.
On Mac OS X, the installer prompts you for administrator credentials. If you do not supply them, you can still install and use the CLI by updating your PATH. See Installing the Domino Command Line (CLI).
The default values of several configuration keys have been doubled in order to support workspace reproducibility.
The GP3 volume type is now supported and used by default when deploying in AWS.
GP3 is now the default storage type for all platform and compute nodes in new deployments in AWS. In existing deployments, compute nodes can leverage GP3 volume types. Switching storage volume types for platform nodes to GP3 is not supported.
To use GP3 volumes for compute nodes in 5.0.0:
- Create a new GP3 storage class.
- Set the com.cerebro.domino.computegrid.kubernetes.volume.storageClass central configuration value to the name of the new storage class.
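The GP3 storage class created in the first step can look like this minimal sketch; the gp3-storage name is a hypothetical choice, and the provisioner depends on whether your cluster uses the EBS CSI driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-storage            # hypothetical name; use it as the config value
provisioner: ebs.csi.aws.com   # or kubernetes.io/aws-ebs on older clusters
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
```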
The values for known configuration keys are now validated in the central config editor UI.
New configuration keys:
metrics_server.install determines whether the Kubernetes metrics-server component is installed as part of the installation/upgrade process. The default is false; you only need to set this to true in EKS or any other Kubernetes cluster type where the metrics-server component is not provided out-of-the-box.
vault.enabled specifies whether the internal vault is enabled. In 5.0, this must be set to true.
The helm.version configuration key has been removed. Helm 3 is now the only supported version.
Domino can now import environment variables from projects that are owned by the organization.
Known issues¶
These are issues known to exist in the 5.0.0 release that will be fixed in an upcoming release.
If you created a new DSE-based environment or rebuilt an existing one between January 14-21, those environments may contain outdated packages that prevent new 5.0 features from working correctly. You can resolve this issue by rebuilding those environments or by downloading the latest packages from PyPI at https://pypi.org/project/dominodatalab-data/.
Email notifications from the Model Monitor only work over SMTP+STARTTLS, which is usually deployed on port 587. SMTPS, which is usually deployed on port 465, is not supported. If you are not receiving emails from the Model Monitor, and the Model Monitor is configured to send emails over port 465, try changing the email settings to use port 587 instead. You can configure the SMTP port for the Model Monitor email service at Model Monitor > Settings > Notification Channels > SMTP Port.
When you export a Domino model to Amazon Sagemaker and create an endpoint from the exported model, the Sagemaker endpoint may fail with the error
The primary container for production variant variant-name-1 did not pass the ping health check.
Follow these steps to work around the issue:
Add USER root to the environment's Dockerfile instructions.
Publish the Model API from the environment with the new instructions.
Export the same image to Sagemaker and create an endpoint.
When using a Ray workspace to run a GPU workload that exercises the filelock._unix submodule, you may encounter the error ModuleNotFoundError: No module named 'filelock._unix'; 'filelock' is not a package. This can occur in environments based on the quay.io/domino/ray-environment:ubuntu18-py3.8-r4.1-ray1.6.0-domino5.0 image tag. To work around this issue, update the workspace's version of the filelock module with this Dockerfile instruction:
RUN pip install filelock==3.0.12 --user
When building Model API images with the V2 Image Builder (Forge) on deployments with Istio enabled, Repocloner is not supported. This means that on deployments with Istio enabled, the ShortLived.RepoclonerImageBuilds configuration key must be set to "false" if the ShortLived.ImageBuilderV2 key is set to "true".