In a shared Spark cluster, it can be challenging for teams to manage their dependencies (e.g. Python packages or JARs). Installing every dependency that a Spark application may need before it runs and dealing with version conflicts can be complex and time-consuming.
Domino allows you to package and manage dependencies as part of your Spark-enabled compute environments. This gives you the flexibility to manage dependencies for individual projects or workloads without dealing with the complexity of a shared cluster.
To add a new dependency, add the appropriate statements to the Docker Instructions section of the relevant Spark and execution compute environments.
For example, to add numpy you would include the following:
USER root
### Optionally specify a version if desired
RUN pip install numpy
USER ubuntu
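JAR dependencies can be added the same way by downloading them into the image. The sketch below is an illustration only: it fetches the PostgreSQL JDBC driver from Maven Central, and both the version shown and the /opt/spark/jars destination are assumptions — adjust the path to wherever Spark's jars directory lives in your base image, and confirm that curl is available there.

USER root
### Example: add the PostgreSQL JDBC driver JAR
### (version and /opt/spark/jars path are assumptions; adjust for your image)
RUN curl -fsSL -o /opt/spark/jars/postgresql-42.7.3.jar \
    https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.3/postgresql-42.7.3.jar
USER ubuntu

If the same dependency is needed at both driver and executor, add the instructions to both the Spark environment and the execution environment so the versions stay in sync.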