Domino Environments have a Dockerfile Instructions section. Type commands in this section to install packages directly into your Environment.
Preloading packages saves time because they do not have to be installed again in each Workspace or Job. Installation is not required after the initial build, as Domino Environments are cached. However, the Environment is still loaded with each new execution, so only frequently used packages should be preloaded. Preloading too many unnecessary packages increases the size of the cached image, which can significantly increase load times.
Dockerfile instructions can install specific versions of packages and libraries, which helps with reproducibility and collaboration. It’s a best practice to pin package versions in the Dockerfile instructions.
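As a minimal sketch of version pinning, the instructions below install exact releases with pip and with R. The pip versions are the ones used elsewhere on this page. The R line assumes the remotes package and its install_version() function, and the devtools version shown is only an illustration, not a recommendation.

```dockerfile
USER root
# Pin exact versions so every rebuild installs the same packages
RUN pip install s3fs==2021.6.1 scipy==1.7.1
# Illustrative R equivalent: install a specific CRAN release with remotes::install_version()
# (the devtools version number here is an assumption chosen for the example)
RUN R --no-save -e "install.packages('remotes'); remotes::install_version('devtools', version = '2.4.2')"
USER ubuntu
```

Without the == specifier, pip installs whatever release is newest at build time, so two builds of the same instructions can produce different Environments.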
An alternative to caching packages in a Python project is to use a requirements.txt file. See requirements.txt in Domino for more details.
Consult the official Docker documentation to learn more about Dockerfiles:
Do not start your Dockerfile instructions in Domino with a FROM line. Domino includes the FROM line for you, pointing to the base image specified when setting up the Environment.
The most common Dockerfile instructions you’ll use are RUN, ARG, and ENV.
RUN commands execute lines of bash.
USER root
RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
RUN tar xvzf spark-1.5.1-bin-hadoop2.6.tgz
RUN mv spark-1.5.1-bin-hadoop2.6 /opt
RUN rm spark-1.5.1-bin-hadoop2.6.tgz
RUN pip install s3fs==2021.6.1 scipy==1.7.1
RUN R --no-save -e "install.packages('remotes')"
USER ubuntu
ARG commands set build-time variables, and ENV commands set environment variables in the container. Variables set with ENV are accessible from runs that use this Environment.
ENV SPARK_HOME=/opt/spark-1.5.1-bin-hadoop2.6
If you set variables in the Environment variables section of your definition, you can use an ARG statement to declare them in your Dockerfile instructions. The variable is then available for the build step.
If you want the variable to be available in the final compute Environment, you must also add an ENV statement referencing the argument name.
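The ARG and ENV pattern described above might look like the following sketch. MY_BUILD_VAR is a hypothetical variable name, assumed to be defined in the Environment variables section of the definition. Substitute your own name.

```dockerfile
# MY_BUILD_VAR is a hypothetical name, assumed to be set in the
# Environment variables section of the Environment definition
ARG MY_BUILD_VAR

# The ARG value can be used by later build steps
RUN echo "Building with MY_BUILD_VAR=${MY_BUILD_VAR}"

# Copy the build-time value into an ENV variable so it persists
# in the final compute Environment
ENV MY_BUILD_VAR=${MY_BUILD_VAR}
```

Without the ENV line, the value exists only while the image is being built and is not set in runs that use the Environment.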
When you edit your Environment, click R Package or Python Package to insert a line to install packages.
Enter the names of the packages.
You can also add the commands, as in the following examples:
R Package Installation, example with the devtools package.
USER root
RUN R --no-save -e "install.packages('devtools')"
# ensure ubuntu user can update installed packages
RUN chown -R ubuntu:ubuntu /usr/local/lib/R/site-library
USER ubuntu
Python Package Installation with Pip, example with the numpy package.
USER root
RUN pip install numpy
USER ubuntu
Docker optimizes its build process by keeping track of commands it has run and aggressively caching the results. This means that if it sees the same set of commands as in a previous build, it will assume that it can use the cached version. A single new command will invalidate the caching of all subsequent commands.
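One way to take advantage of this caching behavior is to order instructions from least to most frequently changed. The sketch below reuses commands from this page. Which step counts as "stable" is an assumption made for illustration and depends on your own workflow.

```dockerfile
USER root

# Stable, slow step first: as long as this line and everything above it
# are unchanged, Docker reuses the cached result on later builds
RUN R --no-save -e "install.packages('devtools')"

# Frequently edited step last: changing this line invalidates only
# this command and anything after it
RUN pip install numpy

USER ubuntu
```

If the order were reversed, every change to the pip line would also force the R installation to run again.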
A Docker image can have up to 127 layers, or commands. To work around this limit, you can use && to combine several commands into a single command.
RUN \
  wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz && \
  tar xvzf spark-1.5.1-bin-hadoop2.6.tgz && \
  mv spark-1.5.1-bin-hadoop2.6 /opt && \
  rm spark-1.5.1-bin-hadoop2.6.tgz
If you are installing multiple Python packages via pip, it’s almost always best to use a single pip install command. This ensures that dependencies and package versions are properly resolved together. If you install via separate commands, you may inadvertently override a package with the wrong version, due to a dependency specified by a later installation. For example:
RUN pip install luigi nolearn lasagne