- NVIDIA Driver
-
Your server administrator must configure the NVIDIA driver at the host level. Use the configuration guide to identify the correct NVIDIA driver for your host. See the DGX Systems Documentation for more information.
- CUDA Version
-
The CUDA version that a development framework requires is documented on that framework's website. For example, TensorFlow 2.1 requires CUDA 10.1 plus additional software packages such as cuDNN.
- CUDA and NVIDIA Driver Compatibility
-
After you identify the correct CUDA version, consult the CUDA-NVIDIA Driver Compatibility Table.
In the TensorFlow 2.1 example, the CUDA 10.1 requirement means that the host must run an NVIDIA driver that supports CUDA 10.1 or later; on Linux, that is driver version 418.39 or later. Table 1 in the compatibility table lists the matching CUDA and NVIDIA driver versions.
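The lookup described here can be sketched in Python. The table is a small, illustrative subset of minimum Linux driver versions, and the names `MIN_LINUX_DRIVER`, `parse_version`, and `driver_supports` are hypothetical helpers, not part of Domino or any NVIDIA tool. Confirm the values against NVIDIA's compatibility table for your exact toolkit release.

```python
# Illustrative subset of minimum Linux driver versions per CUDA release.
# Assumption: values are for the initial toolkit releases; always confirm
# them against NVIDIA's compatibility table before relying on them.
MIN_LINUX_DRIVER = {
    "10.0": "410.48",
    "10.1": "418.39",
    "10.2": "440.33",
}


def parse_version(text):
    """Turn a dotted version string such as '418.39' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(cuda_version, host_driver):
    """Return True if host_driver meets the minimum for cuda_version."""
    minimum = MIN_LINUX_DRIVER[cuda_version]
    return parse_version(host_driver) >= parse_version(minimum)


if __name__ == "__main__":
    print(driver_supports("10.1", "418.87.01"))  # True
    print(driver_supports("10.1", "410.48"))     # False
```

Comparing versions as integer tuples avoids the pitfalls of string comparison, where "418.9" would sort after "418.39".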
Next, configure the Domino Compute Environment to use the CUDA version that your application requires.
The host does not need to match that version exactly: NVIDIA drivers are backward compatible, so the CUDA version on the host can be greater than or equal to the version specified in your Compute Environment.
Attempting to install an exact CUDA version, including the patch version, often produces unexpected results. The fastest route to a working configuration is typically to install the latest available minor release of your required CUDA major version, and then to create a Docker environment variable (ENV) in your Compute Environment that constrains the compatible sets of CUDA versions, GPU generations, and NVIDIA drivers.
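One way to express such a constraint, assuming the host uses the NVIDIA container runtime, is the `NVIDIA_REQUIRE_CUDA` environment variable, which that runtime checks when a container starts. The sketch below builds the corresponding Dockerfile `ENV` line. The helper name `require_cuda_env` and the example versions are hypothetical, and the constraint your deployment needs may differ.

```python
def require_cuda_env(min_cuda, min_driver=None):
    """Build a Dockerfile ENV line for NVIDIA_REQUIRE_CUDA.

    Assumption: the host runs the NVIDIA container runtime, which reads this
    variable. Comma-separated constraints must all hold.
    """
    constraints = ["cuda>=" + min_cuda]
    if min_driver is not None:
        constraints.append("driver>=" + min_driver)
    return 'ENV NVIDIA_REQUIRE_CUDA="' + ",".join(constraints) + '"'


if __name__ == "__main__":
    # Example values only; substitute the versions your framework requires.
    print(require_cuda_env("10.1"))
    print(require_cuda_env("10.1", "418"))
```

The generated line would go in the Dockerfile instructions of the Compute Environment. A container whose host cannot satisfy the constraint should then fail at startup rather than later at run time.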
- Need Additional Assistance?
-
Consult your Domino customer success engineer for guidance on your specific needs. Domino can provide sample configurations that simplify the setup process.