Domino Analytics Distribution




Overview

Each run and workspace in Domino operates in its own Docker container. These Docker containers are defined by Domino compute environments. Environments can be shared and customized, and they are automatically versioned by Domino.

New installations of Domino come with a standard set of environments (Domino Analytics Distribution) and associated Docker images. Periodically, Domino publishes a new set of standard environments with updated libraries and packages. These environments include many of the common packages and libraries pre-configured.

We also make available a set of “minimal” environments (Domino Minimal Distribution), which include only the packages required to work in Domino. These are an appropriate option for users who want to build an environment from scratch.




Domino Analytics Distribution (DAD)

The Domino Analytics Distributions are designed to handle most of what a typical data science workflow needs out of the box. They include the most common Python and R packages, along with an installation of CUDA, which is required to use GPU machines.

You can view the contents of any environment by reviewing its Dockerfile, which shows all of the included packages and their version numbers.

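You can review the available Dockerfiles and descriptions here: https://github.com/dominodatalab/Domino_Base_Images/

The actual images are hosted on Docker Hub here: https://hub.docker.com/r/dominodatalab/base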

Domino Minimal Distribution (DMD)

While the DAD includes most of what a data scientist needs to do their work (e.g. Pandas, Scikit-learn, Dplyr), the DMD includes only the bare necessities required to work in Domino.

Specifically, the objective for the DMD is to provide an image that allows you to:

  • Open Jupyter, JupyterLab, VS Code, and RStudio workspaces
  • Run Python and R batch jobs
  • Host a Shiny web app
  • Publish Python and R Model APIs
  • Use Domino’s Git integration
  • SSH into a run’s container (Domino versions <4.0)
  • Install pip and R packages, as needed

You can shrink the DMD further, as needed, by removing any workspaces you won’t be using, or by removing either Python or R.

You can review the available Dockerfiles and descriptions here: https://github.com/dominodatalab/Domino_Base_Images/

The actual images are hosted on Docker Hub here: https://hub.docker.com/r/dominodatalab/base




Example of Implementing a New Environment

  1. Select an environment from the available options by choosing the Python and R versions you need. You’ll typically want to choose the latest environment.
  • Note: Environments tagged “_legacy” are designed to work with Domino versions <4. The only difference between a regular and a legacy environment is the way they handle CUDA, given the switch to nvidia-docker2 in Domino version 4.0.
  1. Find the appropriate name, description, image URI, and “Pluggable Properties” for your environment.

Title: DAD Py3.7 R3.6

URI: dominodatalab/base:DAD_py3.7_r3.6_2019q4

Description:

Ubuntu 18.04
Miniconda 4.7.12.1
Python 3.7.4
R 3.6.2
Jupyter, JupyterLab, VSCode, RStudio
CUDA 10.0
https://github.com/dominodatalab/Domino_Base_Images/tree/master/Domino_Analytics_Distribution/2019_q4_py3.7_r3.6

Pluggable Workspace Tools

jupyter:
  title: "Jupyter (Python, R, Julia)"
  iconUrl: "/assets/images/workspace-logos/Jupyter.svg"
  start: [ "/var/opt/workspaces/jupyter/start" ]
  httpProxy:
    port: 8888
    rewrite: false
    internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
  supportedFileExtensions: [ ".ipynb" ]
jupyterlab:
  title: "JupyterLab"
  iconUrl: "/assets/images/workspace-logos/jupyterlab.svg"
  httpProxy:
    internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
    port: 8888
    rewrite: false
    requireSubdomain: false
vscode:
  title: "vscode"
  iconUrl: "/assets/images/workspace-logos/vscode.svg"
  start: [ "/var/opt/workspaces/vscode/start" ]
  httpProxy:
    port: 8888
    requireSubdomain: false
rstudio:
  title: "RStudio"
  iconUrl: "/assets/images/workspace-logos/Rstudio.svg"
  start: [ "/var/opt/workspaces/rstudio/start" ]
  httpProxy:
    port: 8888
    requireSubdomain: false
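
The internalPath values above are templates: placeholders in double braces are filled in per session, and the {{#if pathToOpen}} block only contributes when a file path is supplied. The following Python sketch illustrates that expansion, assuming Handlebars-style semantics. It is not Domino's own renderer; render_internal_path is a hypothetical helper, and the sample values (alice, quick-start, and so on) are made up for illustration.

```python
import re

# Template copied from the jupyter entry's httpProxy.internalPath above.
INTERNAL_PATH = (
    "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/"
    "{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
)


def render_internal_path(template, values):
    """Expand {{#if name}}...{{/if}} blocks, then plain {{name}} placeholders.

    Hypothetical helper: assumes Handlebars-like behaviour, where an #if
    block is kept only when the named value is present and non-empty.
    """
    def expand_if(match):
        name, body = match.group(1), match.group(2)
        return body if values.get(name) else ""

    without_ifs = re.sub(
        r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", expand_if, template
    )
    # Remaining placeholders must all be present in `values`.
    return re.sub(
        r"\{\{(\w+)\}\}", lambda m: str(values[m.group(1)]), without_ifs
    )


# Made-up session values, for illustration only.
session = {
    "ownerUsername": "alice",
    "projectName": "quick-start",
    "sessionPathComponent": "notebookSession",
    "runId": "abc123",
}

print(render_internal_path(INTERNAL_PATH, session))
print(render_internal_path(INTERNAL_PATH, {**session, "pathToOpen": "analysis.ipynb"}))
```

With no pathToOpen the rendered path ends at the run ID; when a file is supplied, the tree/ segment and the file path are appended.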
  1. Create a new Domino compute environment.
  1. Update your Domino AMI (not required for non-cloud deployments).
  • Once you’ve created a compute environment with a new base image, work with your admin to cache the new image in your Domino AMI (or, if you are not on AWS, the GCP or Azure equivalent). Domino spins executors up and down, and if your new image is not in the AMI, each new executor must pull the image the first time it starts. This can delay workspace startup on new executors by about 10 minutes. See here for the procedure to snap and update your AMI.
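
Step 1 and its note can be sketched in code. The Python example below parses image tags of the form shown in the URI above (DAD_py3.7_r3.6_2019q4) and picks the most recent release for a given Domino major version. Apart from that one tag, everything here is an assumption for illustration: the tag list, the DMD_ prefix, the position of the “_legacy” suffix at the end of the tag, and the helper names parse_tag and latest_tag.

```python
import re
from typing import NamedTuple


class BaseImageTag(NamedTuple):
    distribution: str  # "DAD" or "DMD" (the DMD prefix is assumed)
    python: tuple      # e.g. (3, 7)
    r: tuple           # e.g. (3, 6)
    release: tuple     # (year, quarter), e.g. (2019, 4)
    legacy: bool       # True when the tag ends in "_legacy" (assumed position)


TAG_RE = re.compile(
    r"^(?P<dist>DAD|DMD)_py(?P<py>\d+\.\d+)_r(?P<r>\d+\.\d+)"
    r"_(?P<year>\d{4})q(?P<quarter>[1-4])(?P<legacy>_legacy)?$"
)


def _version(text):
    """Turn "3.7" into (3, 7) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_tag(tag):
    """Return a BaseImageTag, or None if the tag does not fit the pattern."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    return BaseImageTag(
        distribution=m["dist"],
        python=_version(m["py"]),
        r=_version(m["r"]),
        release=(int(m["year"]), int(m["quarter"])),
        legacy=m["legacy"] is not None,
    )


def latest_tag(tags, distribution="DAD", domino_major=4):
    """Pick the newest release; legacy tags only for Domino versions <4."""
    want_legacy = domino_major < 4
    candidates = []
    for tag in tags:
        parsed = parse_tag(tag)
        if parsed is None or parsed.distribution != distribution:
            continue
        if parsed.legacy == want_legacy:
            candidates.append((parsed.release, parsed.python, parsed.r, tag))
    return max(candidates)[3] if candidates else None


# Only the first tag appears in this article; the others are hypothetical.
tags = [
    "DAD_py3.7_r3.6_2019q4",
    "DAD_py3.6_r3.6_2019q3",
    "DAD_py3.7_r3.6_2019q4_legacy",
    "latest",
]
print(latest_tag(tags))
print(latest_tag(tags, domino_major=3))
```

Comparing the release as a (year, quarter) tuple avoids string-ordering surprises, and tags that do not match the pattern are ignored rather than guessed at.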

FAQ

  1. How can I tell which image I’m currently using?
    The URI for the image is listed on your compute environment’s overview page. If your environment is built on top of another environment, you may need to click through to the parent environment to see the underlying Docker image.
  2. I have a third-party Docker image. Can I use it in Domino?
    Maybe, but not likely without some customization. The DAD and DMD are tested and configured to meet the Domino platform requirements and conventions. For example, by convention Domino uses /mnt as the default working directory. By and large, these requirements are best understood by reviewing the DMD Dockerfiles.

    If you have a Dockerfile you’d like to use within Domino, we recommend adding its instructions to either the DMD or DAD rather than starting from scratch.

  3. How can I learn about new versions of the DAD and make feature requests?
    Check out the Domino community forum for news and updates.
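
To illustrate the answer to the first question above: finding the underlying image amounts to following parent links until an environment names a Docker image directly. The Python sketch below assumes a hypothetical in-memory mapping of environments; it does not reflect Domino's actual data model or API, and the environment names are invented.

```python
def resolve_base_image(env_name, environments):
    """Follow parent links until an environment with its own image is found.

    `environments` is a hypothetical mapping: each entry has either an
    "image" key (a Docker image URI) or a "parent" key (another entry's name).
    """
    seen = set()
    while True:
        if env_name in seen:
            raise ValueError(f"cycle in parent chain at {env_name!r}")
        seen.add(env_name)
        env = environments[env_name]
        if "image" in env:
            return env["image"]
        env_name = env["parent"]


# Invented environments; only the image URI comes from this article.
environments = {
    "DAD Py3.7 R3.6": {"image": "dominodatalab/base:DAD_py3.7_r3.6_2019q4"},
    "team-default": {"parent": "DAD Py3.7 R3.6"},
    "my-project-env": {"parent": "team-default"},
}
print(resolve_base_image("my-project-env", environments))
```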