Creating a SAS Data Science Workspace Environment

In this guide, we will walk through building a SAS Data Science Docker container image that will be integrated with Domino Data Lab. Before we dive in, we will answer a few common questions and provide additional resources.

Preparation

Before getting started, you will need the items listed below. Setting them up is outside the scope of this guide.

Internet Access

Although you are not required to run the completed SAS Data Science container image on a Domino Data Lab environment that has Internet access, you will need Internet access to download the appropriate tools that are used to build the SAS Data Science container image.

Docker Client

We will be using a Docker CLI client to build the SAS Data Science container image. Although all of the commands shown can be copy/pasted, it is good to have some familiarity with the Docker CLI tools.

Docker Registry

A Docker Registry will be used to store the final SAS Data Science image before it can be consumed in Domino. There are many options for Docker Registry providers and software. If you do not feel comfortable setting up a Docker Registry to store the Docker images for your Domino Data Lab environment, please contact your Domino Customer Success Manager (CSM) or Technical Account Manager (TAM).

Git Client

We will be checking out a SAS Container Recipes Git repository. Although there are other ways to download this repository from the Internet, Git CLI will be used in this guide.

SAS Data Science License

This installation does require that you have a valid SAS Data Science license, which is provided to you by SAS Institute Inc. As part of the license, you should have a file called SAS_Viya_deployment_data.zip that will contain all of your license information and will be used to download the appropriate software.

Comfort with Linux Command-Line Utilities

All of the instructions in this guide are written for Red Hat Enterprise Linux variants. The instructions are primarily for CentOS 7, but can easily be adapted to Red Hat Enterprise Linux, SUSE Linux Enterprise Server, or Oracle Linux, which are all supported by the SAS Data Science platform.

Please see the following page for Linux 64-bit operating systems that SAS Data Science (Viya family) supports: SAS Supported Operating Systems.
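
The prerequisites above can be checked before starting the build. The following is a minimal sketch rather than part of the official instructions: missing_tools and have_license_zip are hypothetical helper names, and the default ZIP location is an assumption.

```shell
# Preflight sketch: report which prerequisites from this guide are missing.
# Helper names and the default ZIP path are illustrative assumptions.

# Print the name of every command (passed as arguments) not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed only if the SAS deployment ZIP exists and is non-empty.
have_license_zip() {
  [ -s "${1:-./SAS_Viya_deployment_data.zip}" ]
}

# Example usage (uncomment to run):
# missing_tools docker git
# have_license_zip ./SAS_Viya_deployment_data.zip || echo "license ZIP not found"
```

An empty result from missing_tools means both CLI clients are available on the PATH.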

Creating a SAS Data Science Docker Image

The instructions for building the base SAS Data Science image are based on the SAS Container Recipes, which are available on GitHub. Please consult that repository for the exact instructions for your situation: SAS Container Recipes.

In this guide, we will be building a SAS Data Science image with a CentOS 7 base. This will be a single Viya container instead of the full-blown Viya platform across multiple containers.

The general build instructions are as follows:

  1. Clone the GitHub repository for SAS Container Recipes

Shell Command

    git clone https://github.com/sassoftware/sas-container-recipes.git
    cd sas-container-recipes
    cp PATHTO/SAS_Viya_deployment_data.zip .

Replace PATHTO above with the directory that contains your SAS_Viya_deployment_data.zip file.

  2. Build the SAS Viya image using the provided build.sh utility

Shell Command (cont)

    ./build.sh --base-image centos --base-tag 7 --type single --zip ./SAS_Viya_deployment_data.zip

At the end of this process, you should have a SAS Data Science Docker image locally.
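
To confirm that the build actually left an image on the local machine, a check along these lines may help. It is a sketch only: image_exists is a hypothetical helper, and NAME:TAG stands for whatever image name and tag your build reported, which this guide does not specify.

```shell
# Return success if a Docker image with the given NAME:TAG exists locally.
# image_exists is an illustrative helper, not part of SAS Container Recipes.
image_exists() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Example usage (replace NAME:TAG with the tag reported by your build):
# image_exists NAME:TAG && echo "image found" || echo "image missing"
```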

If you run into any issues, please contact your SAS Institute Inc. representative for support in resolving the issues.

Adding Additional Licensed SAS Software

Installing additional components, such as SAS/ACCESS modules or database drivers, is outside the scope of this document. If you need them, please consult your SAS representatives. These additional components can be layered on top of your base SAS Data Science Docker image.

Integrating the SAS Data Science Docker Image with Domino

We will now switch over to the Domino GitHub repository for the SAS Data Science image build. The Domino repository contains all of the files necessary to finalize the build of the SAS Data Science container image to make it integrated with Domino.

Please follow the README instructions on the Domino repository for more information about the individual files.

These are the steps you will need to follow to complete the build process:

  1. Clone the Domino GitHub repository

Shell Command

    git clone https://github.com/imarchenko/sas-data-science.git
    cd sas-data-science

  2. Modify the Dockerfile’s FROM instruction to use the SAS Data Science image you built in the prior steps

Shell Command (cont)

    SASDS_DOCKER_TAG=NAME:TAG
    sed -Ei.bak "s#SASDS_DOCKER_TAG#$SASDS_DOCKER_TAG#g" Dockerfile

Please change NAME:TAG above to the Docker image tag that was created in the Creating a SAS Data Science Docker Image step.

  3. Build the Docker image

Shell Command (cont)

    DOMINO_SASDS_DOCKER_TAG=NAME:TAG
    docker build . -t "$DOMINO_SASDS_DOCKER_TAG"

Please change NAME:TAG above to your final Docker Registry image name and tag. This is the Docker image that will be later used inside of a Domino Compute Environment.
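
The sed substitution from the FROM instruction step can be tried safely on a throwaway file first. The sketch below assumes the Dockerfile contains the literal placeholder SASDS_DOCKER_TAG on its FROM line. The sample Dockerfile contents and the image name example/sas-viya:latest are hypothetical.

```shell
# Demonstrate the placeholder substitution on a throwaway Dockerfile.
# The file contents and the image name below are illustrative assumptions.
workdir=$(mktemp -d)
printf 'FROM SASDS_DOCKER_TAG\nUSER root\n' > "$workdir/Dockerfile"

SASDS_DOCKER_TAG=example/sas-viya:latest
sed -Ei.bak "s#SASDS_DOCKER_TAG#$SASDS_DOCKER_TAG#g" "$workdir/Dockerfile"

# The FROM line now names the image; the original is kept as Dockerfile.bak.
first_line=$(head -n 1 "$workdir/Dockerfile")
echo "$first_line"   # prints: FROM example/sas-viya:latest
```

The expression uses # as its delimiter, so slashes in a registry path such as registry.example.com/team/image need no escaping.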

Testing the Docker Image Locally

Before pushing the Docker image to your Docker Registry, it is a good idea to test it locally first. There are two modes to test:

Interactive (SAS Studio)

Shell Command

    docker run -p 80:8888 -u domino:domino -w /mnt -v "$PWD/tests:/mnt" -it "$DOMINO_SASDS_DOCKER_TAG" /var/opt/workspaces/sasds/start

A couple of minutes after you launch the interactive SAS Studio container, you should see the message “SAS Studio is now running”. You can then visit http://localhost/SASStudio/start.html in your web browser to test SAS Studio.
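
Instead of watching the container output, you could poll the URL from a second terminal until it responds. This is a sketch under stated assumptions: wait_for_url is a hypothetical helper, the retry count and delay are arbitrary defaults, and it treats any non-error HTTP response from the page as "ready".

```shell
# Poll a URL until curl gets a non-error response or the attempts run out.
# Arguments: URL [attempts] [delay-in-seconds]. Defaults are arbitrary.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors (status 400 and above).
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example usage (uncomment to run):
# wait_for_url http://localhost/SASStudio/start.html && echo "SAS Studio is reachable"
```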

Batch

Shell Command

    SAS_BATCH_PROGRAM=PROGRAM.SAS
    docker run -u domino:domino -w /mnt -v "$PWD/tests:/mnt" -it "$DOMINO_SASDS_DOCKER_TAG" run_sas.sh "$SAS_BATCH_PROGRAM"

Replace PROGRAM.SAS above with the name of your test SAS program.
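
If you do not have a test program at hand, a trivial one can be generated. The sketch below writes a minimal SAS program into the tests directory that the docker run command mounts at /mnt. The helper name write_smoke_test and the file name smoke_test.sas are hypothetical, and how run_sas.sh resolves the program path is an assumption.

```shell
# Write a minimal SAS program into the given directory (default: ./tests).
# write_smoke_test and smoke_test.sas are illustrative names.
write_smoke_test() {
  dir=${1:-./tests}
  mkdir -p "$dir"
  cat > "$dir/smoke_test.sas" <<'EOF'
/* Minimal smoke test: write one line to the SAS log. */
data _null_;
  put "Hello from SAS";
run;
EOF
}

# Example usage (uncomment to run):
# write_smoke_test ./tests
# SAS_BATCH_PROGRAM=smoke_test.sas
```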

Push the Domino-Integrated SAS Data Science Docker Image to a Docker Registry

The final step is to push the Domino-integrated SAS Data Science Docker image to a Docker Registry. This Docker Registry will be later used to pull the Docker image into your Domino Data Lab environment.

Shell Command

    docker push "$DOMINO_SASDS_DOCKER_TAG"

This command reuses the DOMINO_SASDS_DOCKER_TAG variable, so it must still hold the NAME:TAG value you set in the Integrating the SAS Data Science Docker Image with Domino step.

Please work with your Domino Data Lab technical account team on the best method to pull the Docker image into your Domino Data Lab environment.
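
Because the push depends on a shell variable set in an earlier step, a small guard can catch the case where the push runs in a new shell session. This is a sketch: push_domino_image is a hypothetical wrapper, and it assumes you have already authenticated to your registry.

```shell
# Push the Domino-integrated image, refusing to run if the tag variable is empty.
# push_domino_image is an illustrative wrapper around "docker push".
push_domino_image() {
  if [ -z "${DOMINO_SASDS_DOCKER_TAG:-}" ]; then
    echo "DOMINO_SASDS_DOCKER_TAG is not set; assign NAME:TAG first" >&2
    return 1
  fi
  docker push "$DOMINO_SASDS_DOCKER_TAG"
}

# Example usage (uncomment to run):
# DOMINO_SASDS_DOCKER_TAG=NAME:TAG
# push_domino_image
```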

Configuring the SAS Data Science Compute Environment in Domino

Congratulations, you are near the end of the installation process. The last step is to configure your Compute Environment in your Domino Data Lab environment.

  1. In your Domino Data Lab environment, navigate to the Domino Compute Environments page and create a new Compute Environment.
  2. Set the “Custom Image” location to your Docker Registry image. For the Custom Image URL, use the Docker Registry image URL that you created in the Push the Domino-Integrated SAS Data Science Docker Image to a Docker Registry step.
  3. Create a Pluggable Workspace for SAS Studio in your Compute Environment

Pluggable Workspace

    sasds:
      title: "SAS Data Science"
      iconUrl: "https://upload.wikimedia.org/wikipedia/commons/1/10/SAS_logo_horiz.svg"
      start: [ "/var/opt/workspaces/sasds/start" ]
      httpProxy:
        internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/start.html"
        port: 8888
        rewrite: false
        requireSubdomain: false

  4. When you are done defining the Pluggable Workspace, click the Build button at the bottom of the Compute Environment page to finalize your SAS Data Science configuration for Domino Data Lab.

Maintenance and License Updates

The easiest way to keep your SAS Data Science image up to date is to repeat the steps in this guide whenever a new release of SAS Data Science becomes available. Repeat the same process whenever you need to update the license file, for example at renewal.

Repeating this process will ensure that you are staying current with the latest version of the SAS Data Science software.

Troubleshooting

SAS Studio Timeout

By default, SAS Studio logs the user out after 30 minutes. After that, no further development can be done in that session, and changes that were not written to the filesystem cannot be saved.

We recommend setting the timeout to a high value, for example 24 hours.

In the SAS Data Science Compute Environment in Domino, set the following in the Dockerfile:

File: /opt/sas/viya/config/etc/sysconfig/sasstudio.conf

Setting:

export java_global_option_server_servlet_session_timeout="-Dserver.servlet.session.timeout=1440m"

This is a Spring Boot 2.0 property rather than a SAS Studio property. Append ‘m’ to the value to specify minutes; a number on its own is interpreted as seconds. The value 1440m above corresponds to 24 hours (24 × 60 minutes).

NB: this will be baked into the initial image build in future releases.
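
One way to apply the setting is to append the export line to the configuration file during the image build. The following is a sketch only: append_once is a hypothetical helper, and running it from a RUN instruction in the Compute Environment's Dockerfile (with write access to the file) is an assumption about your setup.

```shell
# File and setting taken from the section above.
SASSTUDIO_CONF=${SASSTUDIO_CONF:-/opt/sas/viya/config/etc/sysconfig/sasstudio.conf}
TIMEOUT_LINE='export java_global_option_server_servlet_session_timeout="-Dserver.servlet.session.timeout=1440m"'

# Append a line to a file only if that exact line is not already present,
# so that repeated builds do not duplicate the setting.
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example usage (requires write access to the file, for example as root):
# append_once "$SASSTUDIO_CONF" "$TIMEOUT_LINE"
```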

SAS Studio Tabs Lost after Session Timeout

To prevent tabs from being lost when the connection drops, configure the following option in Preferences.

[Screenshot: SAS Studio Preferences option (image1.png)]

Configuring ODBC connections

Ensure that the LD_LIBRARY_PATH is set first, before individual ODBC libraries, as per the example below:

    export SASINSIDE=/sasinside/odbc
    export ODBCINI=/sasinside/odbc.ini
    export ODBCINST=/sasinside/odbcinst.ini
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SASINSIDE}/lib:${SASINSIDE}/lib/snowflake_odbc/lib
    export SIMBAINI=/sasinside/odbc/lib/snowflake_odbc/lib/simba.snowflake.ini
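
A quick way to spot a mistyped entry is to list the LD_LIBRARY_PATH directories that do not exist inside the container. This is a minimal sketch. missing_lib_dirs is a hypothetical helper, and it checks only that each directory exists, not that the right libraries are in it.

```shell
# Print every directory in a colon-separated path list that does not exist.
# The function body runs in a subshell, so IFS and globbing changes stay local.
missing_lib_dirs() (
  IFS=:
  set -f   # disable globbing while the list is split on ':'
  for dir in $1; do
    [ -z "$dir" ] || [ -d "$dir" ] || printf '%s\n' "$dir"
  done
)

# Example usage (uncomment to run inside the container):
# missing_lib_dirs "$LD_LIBRARY_PATH"
```

No output means every listed directory is present.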

ERROR: Failed to load the Apache Parquet support extension

This error can occur when reading Parquet files if LD_LIBRARY_PATH has not been set correctly. Please see Configuring ODBC connections above.