Domino 5.8.0 (October 2023)

Validated frameworks

The following versions have been validated with Domino 5.8.0. Other versions might be compatible but are not guaranteed.

New features

AI Hub (preview)

The new AI Hub lets you quickly build AI applications from prebuilt, enterprise-ready solutions curated from the best of open source. You and your team can discover and reuse templates for common ML use cases and industry-specific patterns, with access to best practices and “art of the possible” inspiration with Domino.

Screenshot of the landing page of the AI Hub

FinOps GA

Gain insight and control over your cloud infrastructure costs with FinOps.

  • Track - FinOps can integrate directly with cloud provider billing APIs, so you can track your actual cloud bills, including any discounts or special agreements you may have.

  • Aggregate - Group costs by important dimensions like user, project, organization, and hardware tiers to identify cost contributors and allocate expenses.

  • Control - Set budgets and alerts for Projects and Organizations to proactively control costs.

Feature store GA

Domino’s feature store is now Generally Available (GA). The centralized feature store lets you store, catalog, search, share, and re-use features across your organization, enabling you to develop and deploy your models faster and more consistently. Access features in both the online store for real-time inferencing, and in the offline store for model training and batch inferencing.

Model registry and governance updates

You now have more flexibility and tighter control over model governance review processes.

  • Custom review stages - Define custom model review stages to fit the review process to your needs. For example, add stages like "Pre-production" and "Staging-2" to align with your development workflows.

  • Model discoverability settings - Decide who in your organization gets to see your models and protect sensitive information with enhanced discoverability settings.

  • Model activity log - Audit model review activity with a new log containing model stage transitions, review requests, and review responses.

Data Source support for Databricks, Trino, and Starburst JDBC

Access more of your data with new Data Source connectors for Databricks, Trino, and Starburst JDBC.

Connect to any Starburst-supported JDBC data entities including:

  • ClickHouse

  • Druid

  • Greenplum

  • MariaDB

  • Ignite

  • SingleStore (MemSQL)

  • Synapse

  • Vertica

  • Generic JDBC connectivity

Improvements

  • Audit log information is now available from the model card, Workspaces, Jobs, and Experiments results pages, so you can see who accessed Data Sources and when.

  • Enhanced Grafana alerting now includes real-time alerts with links to troubleshooting runbooks to improve your MTTR (Mean Time to Resolution).

    • Real-time health notifications: Receive real-time alerts from the cluster at the onset of any anomaly or issue for even faster response times.

    • Guided troubleshooting runbooks: Alerts now come with a detailed runbook offering step-by-step solutions to quickly resolve issues.

    • Revamped Grafana dashboards: The Grafana dashboard experience has been overhauled to give a clearer view into the health of your Domino deployment.

      Domino Workloads Alert

  • Experiment manager usability and quality of life improvements:

    • MLflow upgraded to version 2.6.0 (previously on 2.3.2).

    • Improved charting experience, including persisting customizations on the user level.

    • Search for parameters and metrics in the column selector.

    • Edit experiments and run names in the UI.

    • Archive and restore experiments in the UI.

    • Run details now include lineage to the registered model and data used.

  • You can now configure Zendesk support for your Domino instance during installation. See Configuration reference for more information.

  • The terraform-aws-eks module has undergone usability and stability enhancements. It now sets up infrastructure, cluster, and nodes independently, each with its own state, for more precise control, lower risk of disruptions, and faster iteration. We’ve also introduced a script to manage multiple modules, simplifying Terraform commands across sectors.

  • To improve stability, Keycloak pod replicas no longer share jar files. Because sharing the same file across multiple Keycloak replicas could lead to unexpected behavior, you must now upload jar files with custom JavaScript providers to each Keycloak pod replica individually. See Single sign-on (SSO) configuration for more information.

  • The hardware tier dropdown is now sorted alphabetically. It also supports keyword search and shows the number of GPUs when relevant.

  • You can now hide the Reset Password button in the Admin user management page by setting the Central Configuration key com.cerebro.domino.userManagement.passwordResetEnabled to false.

Non-root executions

For increased security, Domino now enforces non-root constraints on an execution’s containers. Some of these constraints are always-on, and some are turned on by setting com.cerebro.domino.computegrid.kubernetes.nonRootExecutions.enabled to true in Central Configuration.

Admins should be aware of the following always-on constraints:

  • Privileged capabilities have been removed from non-user-code containers and the execution pod spec.

  • Mounted volumes are configured to use the well-known non-root gid 12574, via Kubernetes fsGroup.

  • The reverse-proxy container inside the run pod listens on port 8765 instead of port 80.

  • The keys com.cerebro.domino.computegrid.dominoGroupId and com.cerebro.domino.computegrid.dominoUserId have been deprecated and removed from Central Configuration.

When com.cerebro.domino.computegrid.kubernetes.nonRootExecutions.enabled is set to true, admins should be aware of the following additional constraints:

  • Privileged capabilities have been removed from the user code ("run") container.

  • Environment images have different requirements. See Manually create an Environment with a pre-built image for more details.

  • The Domino user has been removed from the sudoers file; therefore the sudo command can no longer be used in Workspaces. Domino recommends building environment images with the necessary packages instead of installing them at runtime for better reproducibility and increased security.
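
The identity constraints above can be spot-checked from inside a running execution. The following is a minimal Python sketch, not an official Domino utility. It assumes a Unix-like container and that Kubernetes exposes the fsGroup gid 12574 to the process as a supplementary group. The function name check_non_root is hypothetical.

```python
import os

# gid that these release notes name for mounted volumes (Kubernetes fsGroup).
DOMINO_FS_GROUP = 12574


def check_non_root(uid: int, group_ids: list[int]) -> dict[str, bool]:
    """Summarize whether a process identity matches the non-root constraints."""
    return {
        # uid 0 is root; any other uid is a non-root user.
        "runs_as_non_root": uid != 0,
        # Assumption: fsGroup shows up among the process's supplementary groups.
        "has_volume_group": DOMINO_FS_GROUP in group_ids,
    }


if __name__ == "__main__":
    # os.getuid() and os.getgroups() are available on Unix-like systems only.
    print(check_non_root(os.getuid(), os.getgroups()))
```

Keeping the check as a pure function of uid and group IDs makes it easy to exercise outside a cluster; only the final call reads the live process identity.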

API changes

  • The paths under /u/{ownerUsername}/{projectName}/scheduledruns, which were deprecated in Domino 5.5.0, have been removed.
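
To find client code that still calls the removed paths, a simple text scan can help. The sketch below is illustrative only: the regular expression is an assumption about how the paths appear in your scripts, and the helper name find_removed_paths and the host name in the sample are hypothetical.

```python
import re

# Matches /u/<ownerUsername>/<projectName>/scheduledruns and anything below it.
REMOVED_PATH = re.compile(r"/u/[^/\s]+/[^/\s]+/scheduledruns(?:/|\b)")


def find_removed_paths(lines):
    """Return (line_number, text) pairs for lines that reference the removed paths."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if REMOVED_PATH.search(line)
    ]


# Hypothetical sample input; in practice, read the lines from your own scripts.
sample = [
    "GET https://domino.example.com/u/alice/demo/scheduledruns/123",
    "GET https://domino.example.com/u/alice/demo/runs",
]
print(find_removed_paths(sample))
```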

Bug fixes

  • You can now see the Status, Active Version, and Owner columns in the Model API list.

  • App authors can now pass their Domino username in the header for Apps.

  • Users can see raw files whose size is ≤ 5 MB (com.cerebro.domino.frontend.defaultMaxFileSizeToRenderInBytes) when they click on the "View Latest Raw File" button in the code file browser, even if their S3 buckets don’t have CORS enabled.

Known issues

  • In Azure Blob Store deployments, projects with many files may fail to sync through the Domino CLI. To work around this issue, do not disable file locking when prompted by Domino.

  • You cannot view the latest raw file if you click View Latest Raw File. To work around this issue, go to Files in the navigation pane and click a file to view its details.

  • When uploading a large file to the Azure blob store by syncing a Workspace, you may encounter a Java Out of Memory error from Azure if the file/blob already exists. To work around this issue, use the Domino CLI to upload the file to the project.

  • Model Monitoring data sources aren’t validated. If you enter an invalid bucket name and attempt to save, the entry will go through. However, you won’t be able to see metrics for that entry because the name points to an invalid bucket.

  • Domino instances that make use of Azure Blob Storage may experience stalled Jobs within projects with many large files.

  • If you attach a Git repository to a DFS project that points to a tagged release, the tag won’t be honored when building a Model API in that project. The build log will show an error similar to the following, and the model will be built using the default branch of your Git repository instead of the tagged branch:

    Jul 05 2023 14:36:27 -0500 #10 6.481 WARN [d.r.d.GitRepoUpdater] could not parse ref: v1.3.0 checking out default branch correlationId="iA2qWrYSLQ" thread="main"

    To work around this issue, use the branch name when building Model APIs instead of the release tag.

  • If an admin resets a user’s password, it invalidates all the user’s authentication tokens, including tokens used for long-running tasks like Jobs, Workspaces, or Apps. The user must create a new password, log back into Domino, and restart all executions. This also applies to CLI authentication; the user must re-login to their Domino CLI.

  • In Domino 5.6, the cost analyzer pod (inactive unless Kubecost is enabled) defaults to a different storageClass compared to Domino 5.7. As a result, the pod won’t run after upgrading to 5.7, breaking Kubecost functionality. However, data will continue to persist in Prometheus (or custom storage if using Kubecost Enterprise).

    To prevent this issue while still in Domino 5.6, override the default storageClass gp2 with the one expected in 5.7, dominodisk, by setting release_overrides.cost-analyzer.chart_values.persistentVolume.storageClass to dominodisk in the agent YAML configuration file before installing Kubecost.

    If you’ve already installed Kubecost on Domino 5.6, avoid the upgrade error by setting release_overrides.cost-analyzer.chart_values.persistentVolume.storageClass to gp2 in the agent YAML configuration file before upgrading to 5.7.

  • The button to rename a dataset’s file is not available when you navigate to the dataset from the global dataset page.

    To work around this issue, navigate to the dataset from the project’s page.

  • The sample script for making asynchronous Model API requests contains an extra / at the end of the DOMINO_URL variable. As a result, running the script shows an error similar to the following:

    {'requestId': 'key not found: HandlerDef', 'errors': ['java.util.NoSuchElementException: key not found: HandlerDef']}

    To work around this issue, remove the trailing / at the end of the DOMINO_URL variable.

  • The Jobs REST API uses GitRefV1 to reference git objects (commits, branches, and tags). Not all examples in the API spec worked, so they’ve been updated to reflect the actual valid values. This change doesn’t affect API functionality; it’s just a fix to the documentation.

  • Links to Stack Trace and CPU Flame Graph in the Ray Cluster UI’s Cluster tab are broken because Ray 2.4 does not support links when hosted behind a reverse proxy. This problem is specific to the Cluster tab; links function correctly in other tabs. The issue is fixed in Ray 2.7 and will be updated in future Domino Ray image releases.

  • The Model API Management API documentation is missing 3 new attributes for the endpoint Publish a new model. You can publish a Model API using the registered name and model version by using the following attributes:

    • modelSource: Can be either File (default) or Registry. Use Registry for models registered in MLflow.

    • registeredModelName: The name of the model registered in MLflow (required if modelSource is set to Registry).

    • registeredModelVersion: The model’s version registered in MLflow (required if modelSource is set to Registry).

  • Deleting all R variables from memory using rm(list = ls(all = TRUE)) also deletes variables that Domino uses for internal processes. To safely delete variables, use rm(list = ls(all = TRUE)[!grepl("^.domino", ls(all = TRUE))]) instead.

  • When restarting a Workspace through the Update Settings modal, External Data Volumes are not mounted in the new Workspace. To work around this issue, follow the steps to mount External Data Volumes. This issue is fixed in Domino 5.9.0.

  • Downloading single files from Datasets using the Download Selected Items button will fail if the filename contains special characters, including + and &. As a workaround, you can download these types of files via the action menu, located to the right of the filename. This issue is fixed in Domino 5.10.0.

  • Anonymous users cannot run launchers or view public GBP projects due to the migration of Git credentials to Vault. This issue is fixed in Domino 5.9.1.

  • Spaces in ADLS filenames are not allowed when getting and putting objects in Azure Data Sources with DominoDataR. As a workaround, upgrade to DominoDataR version 0.2.4. This issue is fixed in Domino 5.10.0.

  • Viewing dataset files in an Azure-based Domino cluster may lock files, preventing them from being deleted or modified. Restarting Nucleus frontend pods will release the lock. This issue is fixed in Domino 5.11.1.
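
For the asynchronous Model API sample script issue listed above, the trailing slash can also be stripped in code rather than by hand. The following is a minimal Python sketch, assuming DOMINO_URL is a plain string. The host name and the helper names normalize_base_url and join_url are hypothetical and not part of the sample script.

```python
def normalize_base_url(url: str) -> str:
    """Remove surrounding whitespace and any trailing slashes from a base URL."""
    return url.strip().rstrip("/")


def join_url(base_url: str, path: str) -> str:
    """Join a base URL and a path with exactly one slash between them."""
    return f"{normalize_base_url(base_url)}/{path.lstrip('/')}"


# Hypothetical host; replace with the URL of your own Domino deployment.
DOMINO_URL = normalize_base_url("https://domino.example.com/")
print(DOMINO_URL)
```

Normalizing once where the variable is defined means later string concatenation cannot produce a double slash in the request path.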

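For the undocumented Publish a new model attributes listed above, the sketch below assembles and validates the three fields before they are merged into a request body. It is an illustration under stated assumptions: the helper name model_source_fields is hypothetical, the rest of the request body is omitted because it is not described here, and the version value is passed through without assuming a type.

```python
def model_source_fields(model_source="File", registered_model_name=None,
                        registered_model_version=None):
    """Build the modelSource-related fields for a publish request body."""
    if model_source not in ("File", "Registry"):
        raise ValueError("modelSource must be 'File' or 'Registry'")

    fields = {"modelSource": model_source}
    if model_source == "Registry":
        # Both registry attributes are required when modelSource is Registry.
        if not registered_model_name or registered_model_version is None:
            raise ValueError(
                "registeredModelName and registeredModelVersion are required "
                "when modelSource is 'Registry'"
            )
        fields["registeredModelName"] = registered_model_name
        fields["registeredModelVersion"] = registered_model_version
    return fields


# Hypothetical model name and version, for illustration only.
print(model_source_fields("Registry", "churn-model", 3))
```
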
Upgrade notes

  • GKE users that provisioned their infrastructure with Domino’s terraform-gcp-gke module must apply the changes introduced for 5.7.0 as of terraform-gcp-gke v2.5.0 when upgrading to ensure firewall rules work properly.

  • VPN support from within executions is now disabled by default. To enable it, set the global config value com.cerebro.domino.computegrid.executions.allowVpn to true.

  • MongoDB is no longer the authoritative source of truth for User Roles; Keycloak now is. User Groups in Keycloak now correspond to Domino Global Roles, and a user’s membership status in these groups defines their Domino roles. The Central Config key authentication.oidc.externalRolesEnabled has been retired and no longer has any effect. Any edits made to roles in MongoDB will be overridden by the data from Keycloak.

  • Domino recommends that EKS users update the AWS VPC CNI settings to enable ANNOTATE_POD_IP, which prevents execution timeout errors when an image pull takes longer than 10 minutes. To bypass the validation check during an upgrade, pass --warn-only as a command-line option to the installer.

  • Domino CLI clients version 1.x (released in 2017 or earlier) are no longer supported. Domino recommends upgrading to Domino CLI version 6.0.