In this guide, we will walk through building a SAS Data Science Docker container image that will be integrated with Domino Data Lab. Before we dive in, we will answer a few common questions and provide additional resources.
Before getting started, you will need a few things. Setting these up is outside the scope of this guide.
A Docker Registry will be used to store the final SAS Data Science image before it can be consumed in Domino. There are many options for Docker Registry providers and software. If you are not comfortable setting up a Docker Registry to store the Docker images for your Domino Data Lab environment, contact your Domino Customer Success Manager (CSM) or Technical Account Manager (TAM).
This installation does require that you have a valid SAS Data Science license, which is provided to you by SAS Institute Inc. As part of the license, you should have a file called SAS_Viya_deployment_data.zip that will contain all of your license information and will be used to download the appropriate software.
All of the instructions in this guide are written for Red Hat Enterprise Linux variants. The instructions are primarily for CentOS 7, but can be adapted to support Red Hat Enterprise Linux, SUSE Linux Enterprise Server, or Oracle Linux, which are all supported by the SAS Data Science platform.
See the following page for Linux 64-bit operating systems that SAS Data Science (Viya family) supports: SAS Supported Operating Systems.
The instructions we follow for building the base SAS Data Science image are based on the SAS Container Recipes, which are available on GitHub. Consult the directions in the following GitHub repository for exact instructions for your situation: SAS Container Recipes.
In this guide, we will be building a SAS Data Science image with a CentOS 7 base. This will be a single Viya container instead of the full-blown Viya platform across multiple containers.
The general build instructions are as follows:
Clone the GitHub repository for SAS Container Recipes
cp PATHTO/SAS_Viya_deployment_data.zip .
Replace PATHTO above with the directory that contains your SAS_Viya_deployment_data.zip file.
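The copy step can be guarded so that a wrong path fails early instead of surfacing later in the build. This is a minimal sketch: the helper name `stage_deployment_zip` is hypothetical and not part of SAS Container Recipes; only the file name comes from this guide.

```shell
# Hypothetical helper (not part of SAS Container Recipes): copy the license
# archive from a source directory into a destination directory, failing
# early with a message if the archive is missing.
stage_deployment_zip() {
  src_dir="$1"
  dest_dir="${2:-.}"   # defaults to the current directory
  zip_name="SAS_Viya_deployment_data.zip"

  if [ ! -f "$src_dir/$zip_name" ]; then
    echo "error: $zip_name not found in $src_dir" >&2
    return 1
  fi
  cp "$src_dir/$zip_name" "$dest_dir/"
}
```

From inside the cloned repository, it could be called as `stage_deployment_zip PATHTO .`, with PATHTO replaced as described above.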
Build the SAS Viya image using the build.sh utility provided
Shell Command (cont)
./build.sh --base-image centos --base-tag 7 --type single --zip ./SAS_Viya_deployment_data.zip
At the end of this process, you should have a SAS Data Science Docker image locally.
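One way to confirm that the image is present locally is to ask Docker to inspect it. The sketch below assumes you already know the image's NAME:TAG (for example from the output of `docker images`); the function name `image_exists` is illustrative only.

```shell
# Return success if a Docker image with the given NAME:TAG exists locally.
# `docker image inspect` exits non-zero when the image cannot be found.
image_exists() {
  docker image inspect "$1" >/dev/null 2>&1
}
```

For example, `image_exists NAME:TAG && echo "image found"` prints a confirmation only when the image is available.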
If you run into any issues, contact your SAS Institute Inc. representative for support in resolving the issues.
Although it is outside the scope of this document, if you need to install additional components such as SAS/ACCESS modules or database drivers, consult your SAS representatives. These additional components can be layered on top of your base SAS Data Science Docker image.
We will now switch over to the Domino GitHub repository for the SAS Data Science image build. The Domino repository contains all the files necessary to finalize the build of the SAS Data Science container image to make it integrated with Domino.
Follow the README instructions on the Domino repository for more information about the individual files.
These are the steps you will need to follow to complete the build process:
Clone the Domino GitHub repository
Modify the Dockerfile’s FROM instruction to use the SAS Data Science image you built in the prior steps
sed -Ei.bak "s#SASDS_DOCKER_TAG#$SASDS_DOCKER_TAG#g" Dockerfile
Before running this command, set the SASDS_DOCKER_TAG variable to the NAME:TAG of the Docker image that was created in the Creating a SAS Data Science Docker Image step.
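To see what the substitution does before touching the real file, the same sed expression can be run against a throwaway Dockerfile. This is only a sketch: the one-line Dockerfile and the tag `sas-viya-single:example` are made up for illustration, and the Dockerfile in the Domino repository contains more than a FROM line.

```shell
# Work in a scratch directory so the real Dockerfile is untouched.
workdir="$(mktemp -d)"

# Stand-in Dockerfile holding the placeholder that the sed command replaces.
printf 'FROM SASDS_DOCKER_TAG\n' > "$workdir/Dockerfile"

# Illustrative value only; use the tag of the image you built earlier.
SASDS_DOCKER_TAG="sas-viya-single:example"

# Same expression as in the guide: -E enables extended regular expressions,
# -i.bak edits the file in place and keeps the original as Dockerfile.bak.
sed -Ei.bak "s#SASDS_DOCKER_TAG#$SASDS_DOCKER_TAG#g" "$workdir/Dockerfile"

cat "$workdir/Dockerfile"       # FROM line now names the image
cat "$workdir/Dockerfile.bak"   # backup still holds the placeholder
```

The `#` delimiter is used because image references often contain `/`, which would otherwise need escaping.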
Build the Docker image
Shell Command (cont)
docker build . -t $DOMINO_SASDS_DOCKER_TAG
Set the DOMINO_SASDS_DOCKER_TAG variable to the NAME:TAG of your final Docker Registry image. This is the Docker image that will later be used inside a Domino Compute Environment.
Before pushing the Docker image to your Docker Registry, test it locally first. There are two modes to test:
When you launch the interactive SAS Studio, you should see the message "SAS Studio is now running" after a couple of minutes. At that point you can visit http://localhost/SASStudio/start.html in your web browser to test SAS Studio.
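Rather than watching the output by hand, the readiness message can be detected with a small filter. This sketch assumes the message is written to the container's standard output or standard error, so that it shows up in `docker logs`; the function name and the CONTAINER placeholder are illustrative.

```shell
# Read log lines from stdin and succeed as soon as the readiness message
# quoted in this guide appears; fail if the input ends without it.
wait_for_sas_studio() {
  grep -q "SAS Studio is now running"
}
```

A possible use is `docker logs -f CONTAINER 2>&1 | wait_for_sas_studio && echo "ready"`, where CONTAINER is the name or ID of the running test container.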
The final step is to push the Domino-integrated SAS Data Science Docker image to a Docker Registry. This Docker Registry will be later used to pull the Docker image into your Domino Data Lab environment.
Replace NAME:TAG with the Docker Registry tag you used in the Integrate the SAS Data Science Docker Image with Domino step.
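The push itself is a standard `docker push`. As a hedged sketch, the hypothetical wrapper below refuses to run when the image reference is empty or still the literal NAME:TAG placeholder, which is an easy mistake when copying commands from a guide.

```shell
# Hypothetical wrapper around `docker push` that rejects an empty or
# placeholder image reference before contacting the registry.
push_image() {
  image_ref="$1"
  case "$image_ref" in
    ""|"NAME:TAG")
      echo "error: set a real image reference instead of '$image_ref'" >&2
      return 1
      ;;
  esac
  docker push "$image_ref"
}
```

It would be called as `push_image "$DOMINO_SASDS_DOCKER_TAG"`, assuming that variable holds the registry-qualified name and tag used for the build.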
Work with your Domino Data Lab technical account team on the best method to pull the Docker image into your Domino Data Lab environment.
The last step is to configure your Compute Environment in your Domino Data Lab environment.
In your Domino Data Lab environment, go to the Domino Compute Environments page and create a new Compute Environment.
Set the Custom Image location to your Docker Registry image. For the Custom Image URL, use the Docker Registry image URL that you created in the Push the Domino-Integrated SAS Data Science Docker Image to a Docker Registry step.
Create a Pluggable Workspace for SAS Studio in your Compute Environment
Pluggable Workspace
title: "SAS Data Science"
start: [ "/var/opt/workspaces/sasds/start" ]
When you are done defining the Pluggable Workspace, click Build to finalize your SAS Data Science configuration for Domino Data Lab.
The easiest way to keep your SAS Data Science image up to date is to repeat the steps in this guide whenever a new release of SAS Data Science is available. Repeat the same process when you need to update a license file during renewals.
Repeating this process ensures that you stay current with the latest version of the SAS Data Science software.
By default, SAS Studio logs the user out after 30 minutes. After that, no further development can be done in that session, and changes not written to the filesystem cannot be saved.
We recommend setting the timeout to a high value, for example 24 hours.
In the SAS Data Science Compute Environment in Domino, set the following in the Dockerfile:
This is a Spring Boot 2.0 property rather than a Studio property; append 'm' to the value to specify minutes, or give a bare number to specify seconds.
NB: this will be baked into the initial image build in future releases.
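As an illustration only, the sketch below shows how such a property could be appended to a properties file from a Dockerfile `RUN` step. It rests on two assumptions that this guide does not confirm: that the property in question is Spring Boot's `server.servlet.session.timeout`, and that SAS Studio reads it from the file you pass in. The helper name and the file path are hypothetical.

```shell
# Hypothetical helper: append a session-timeout property to a properties file.
# Assumption: the property is Spring Boot 2.x's server.servlet.session.timeout;
# the target file is whichever properties file your SAS Studio actually reads.
set_session_timeout() {
  props_file="$1"
  timeout="$2"    # e.g. 1440m for 24 hours, or a bare number for seconds
  printf 'server.servlet.session.timeout=%s\n' "$timeout" >> "$props_file"
}
```

For a 24-hour timeout, the call would look like `set_session_timeout /path/to/application.properties 1440m`, since 24 hours is 1440 minutes.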
Ensure that the LD_LIBRARY_PATH is set first, before individual ODBC libraries, as per the example below:
Errors can be generated when trying to read Parquet files if the LD_LIBRARY_PATH has not been set correctly: see Configure ODBC connections above.