Domino Environments have a Dockerfile Instructions section. Type commands in this section to install packages directly into your Environment.
Preloading packages can save time because it prevents repeated installation in Workspaces and Jobs. After the initial build, installation is not required again because Domino Environments are cached. However, the Environment is still loaded with each new execution, so cache only packages that are used frequently. Preloading too many unnecessary packages increases the size of the cached Environment, which can significantly increase load times.
Dockerfile instructions can install specific versions of packages and libraries, which helps with reproducibility and collaboration. It's a best practice to pin package versions in your Dockerfile instructions.
An alternative to caching packages in a Python project is to use a requirements.txt file. See requirements.txt in Domino for more details.
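For reference, a requirements.txt is just a list of pinned packages, one per line. The packages and versions below are illustrative only, not a recommended set:

s3fs==2021.6.1
scipy==1.7.1
numpy==1.21.2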
Consult the official Docker documentation to learn more about Dockerfiles.
Note
Domino's Docker images must be run by the ubuntu user.
Do not start your Dockerfile instructions in Domino with a FROM line. Domino includes the FROM line for you, pointing to the base image specified when setting up the Environment.
The most common Dockerfile instructions you'll use are RUN, ENV, and ARG.

RUN commands execute lines of bash.
For example:
# Switch to root so software can be installed system-wide
USER root
RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
RUN tar xvzf spark-1.5.1-bin-hadoop2.6.tgz
RUN mv spark-1.5.1-bin-hadoop2.6 /opt
RUN rm spark-1.5.1-bin-hadoop2.6.tgz
RUN pip install s3fs==2021.6.1 scipy==1.7.1
RUN R --no-save -e "install.packages('remotes')"
# Switch back to the default ubuntu user
USER ubuntu
ARG commands set build-time variables, and ENV commands set container bash Environment variables. These variables will be accessible from runs that use this Environment.
For example:
ENV SPARK_HOME /opt/spark-1.5.1-bin-hadoop2.6
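Any process in a run that uses this Environment can then read the variable. For instance, from a shell inside a Workspace (a quick check, not part of the original example):

echo $SPARK_HOME    # prints /opt/spark-1.5.1-bin-hadoop2.6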
If you set variables in the Environment variables section of your Environment definition, you can reference them with an ARG statement:

ARG SPARK_HOME

This makes the variable available during the build step. If you want the variable to also be available in the final compute Environment, you must add an ENV statement referencing the argument name:

ENV SPARK_HOME=$SPARK_HOME
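Putting the two statements together, a minimal sketch (SPARK_HOME stands in for whatever variable you defined in the Environment variables section):

# Pull in the value defined in the Environment variables section at build time
ARG SPARK_HOME
# Re-export it so it is also set in the final compute Environment
ENV SPARK_HOME=$SPARK_HOME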
- When you edit your Environment, click R Package or Python Package to insert a line that installs packages.
- Enter the names of the packages.
- You can also add the commands yourself, as in the following examples.
- R Package Installation, for example with the devtools package:

USER root
RUN R --no-save -e "install.packages('devtools')"
RUN chown -R ubuntu:ubuntu /usr/local/lib/R/site-library  # ensure ubuntu user can update installed packages
USER ubuntu

- Python Package Installation with pip, for example with the numpy package:

USER root
RUN pip install numpy
USER ubuntu
- Docker optimizes its build process by keeping track of commands it has run and aggressively caching the results. If it sees the same set of commands as in a previous build, it assumes it can use the cached version. A single new command invalidates the caching of all subsequent commands.
- A Docker image can have up to 127 layers, or commands. To work around this limit, you can use && to combine several commands into a single command:

RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz && \
    tar xvzf spark-1.5.1-bin-hadoop2.6.tgz && \
    mv spark-1.5.1-bin-hadoop2.6 /opt && \
    rm spark-1.5.1-bin-hadoop2.6.tgz
- If you are installing multiple Python packages via pip, it's almost always best to use a single pip install command. This ensures that dependencies and package versions are properly resolved. If you install via separate commands, you may inadvertently override a package with the wrong version, due to a dependency specified by a later installation. For example:

RUN pip install luigi nolearn lasagne
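For contrast, a sketch of the pattern to avoid: the same packages installed in separate commands, where a later install can pull in a dependency that overrides a version installed earlier.

# Risky: each install resolves dependencies in isolation
RUN pip install luigi
RUN pip install nolearn
RUN pip install lasagne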