Domino Environments have a Dockerfile Instructions section. Type commands in this section to install packages directly into your Environment.
Preloading packages can save time because it prevents repeated installation in Workspaces and Jobs. Installation is not required after the initial build because Domino Environments are cached. However, the Environment is still loaded with each new execution, so only packages that are used frequently should be cached. Preloading too many unnecessary packages increases the size of the cached image, which can significantly increase load times.
Dockerfile instructions can install specific versions of packages and libraries, which helps with reproducibility and collaboration. It's a best practice to pin package versions in your Dockerfile instructions.
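For example, pinned installations might look like the following (the package names and versions here are only illustrative):
RUN pip install pandas==1.5.3 scikit-learn==1.3.0
RUN R --no-save -e "install.packages('remotes'); remotes::install_version('data.table', version = '1.14.8')"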
An alternative to caching packages in a Python project is to use a requirements.txt file. See requirements.txt in Domino for more details.
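As a sketch, a requirements.txt file lists one pinned package per line, for example (the packages shown are illustrative):
s3fs==2021.6.1
scipy==1.7.1
pandas==1.3.2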
Consult the official Docker documentation to learn more about Dockerfiles.
Do not start your Dockerfile instructions in Domino with a FROM line. Domino includes the FROM line for you, pointing to the base image specified when setting up the Environment.
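Because the FROM line is supplied automatically, your instructions can begin directly with a RUN line. A minimal sketch, assuming a Debian/Ubuntu-based base image (the package installed here is only illustrative):
RUN apt-get update && apt-get install -y curl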
The most common Dockerfile instructions you'll use are RUN, ENV, and ARG.
RUN commands execute lines of bash.
For example:
RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
RUN tar xvzf spark-1.5.1-bin-hadoop2.6.tgz
RUN mv spark-1.5.1-bin-hadoop2.6 /opt
RUN rm spark-1.5.1-bin-hadoop2.6.tgz
RUN pip install s3fs==2021.6.1 scipy==1.7.1
RUN R --no-save -e "install.packages('remotes')"
ARG commands set build-time variables, and ENV commands set bash environment variables in the container. These variables will be accessible from runs that use this Environment.
For example:
ENV SPARK_HOME /opt/spark-1.5.1-bin-hadoop2.6
If you set variables in the Environment variables section of your definition, you can use an ARG statement:
ARG SPARK_HOME
This will be available for the build step.
If you want the variable to be available in the final compute Environment, you must add an ENV statement referencing the argument name:
ENV SPARK_HOME=$SPARK_HOME
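Tying these together, a minimal sketch of the full pattern, assuming SPARK_HOME is defined in the Environment variables section of the definition (the echo line is only an illustrative build-time use of the value):
ARG SPARK_HOME
RUN echo "Spark will live at ${SPARK_HOME}"
ENV SPARK_HOME=$SPARK_HOME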
- When you edit your Environment, click R Package or Python Package to insert a line to install packages.
- Enter the names of the packages.
- You can also add the commands yourself, as in the following examples:
- R package installation, for example with the devtools package:
RUN R --no-save -e "install.packages('devtools')"
- Python package installation with pip, for example with the numpy package:
RUN pip install numpy
- Docker optimizes its build process by keeping track of the commands it has run and aggressively caching the results. This means that if it sees the same set of commands as in a previous build, it assumes that it can use the cached version. A single new command will invalidate the cache for all subsequent commands (see the ordering sketch after this list).
- A Docker image can have up to 127 layers, or commands. To work around this limit, you can use && to combine several commands into a single command:
RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz && \
    tar xvzf spark-1.5.1-bin-hadoop2.6.tgz && \
    mv spark-1.5.1-bin-hadoop2.6 /opt && \
    rm spark-1.5.1-bin-hadoop2.6.tgz
- If you are installing multiple Python packages via pip, it's almost always best to use a single pip install command. This ensures that dependencies and package versions are properly resolved. If you install packages with separate commands, you may inadvertently override a package with the wrong version due to a dependency specified by a later installation. For example:
RUN pip install luigi nolearn lasagne
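To work with the build cache described above, order instructions so that the ones you change most often come last; editing only the final line then leaves the earlier cached layers intact. A minimal sketch, with illustrative package names and versions:
# Rarely-changed system setup: cached across most rebuilds
RUN apt-get update && apt-get install -y build-essential
# Stable, pinned Python dependencies
RUN pip install numpy==1.24.4 scipy==1.7.1
# The line you expect to edit often goes last, so only this layer is rebuilt
RUN pip install plotly==5.15.0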