Connect to a MapR cluster from Domino

Domino supports connecting to a MapR cluster through the addition of cluster-specific binaries and configuration files to your Domino environment.

At a high level, the process is as follows:

1. Connect to a MapR Edge node and gather the required binaries and configuration files, then download them to your local machine.
2. Upload the gathered files into a Domino project to allow access by the Domino environment builder.
3. Create a new Domino environment that uses the uploaded files to enable connections to your cluster.
4. Enable YARN integration for the Domino projects that you want to use with the MapR cluster.

Domino supports the following types of connections to a MapR cluster:

Gathering the required binaries and configuration files

You will find most of the necessary files for setting up your Domino environment on your https://mapr.com/docs/52/AdvancedInstallation/PlanningtheCluster-node-types.html[MapR Edge node^]. To get started, connect to the Edge node via SSH, then follow the steps below.

Create a directory named hadoop-binaries-configs at /tmp.
```
mkdir /tmp/hadoop-binaries-configs
```
Copy hive-site.xml from /opt/mapr/spark/spark-<version>/conf to /tmp/hadoop-binaries-configs/. Replace the <version> string in the command below with the number that matches the folder name on your edge node.
```
cp /opt/mapr/spark/spark-<version>/conf/hive-site.xml /tmp/hadoop-binaries-configs/
```
Copy the ssl_truststore file from /opt/mapr/conf to /tmp/hadoop-binaries-configs/.
```
cp /opt/mapr/conf/ssl_truststore /tmp/hadoop-binaries-configs/
```
Once you’ve copied the above files into /tmp/hadoop-binaries-configs, pack the directory into a compressed archive for transfer to your local machine.
```
cd /tmp
tar -zcf hadoop-binaries-configs.tar.gz hadoop-binaries-configs
```
Then use SCP from your local machine to download the zipped archive. After transfer, extract the files to your local filesystem and keep them handy for a future step where they will be uploaded to Domino.
On the MapR edge node, run the following command to identify the version of Java running on the cluster.
```
java -version
```
You must then download a JDK .tar file from the Oracle downloads page that matches that version. The filename will have a pattern like the following.

jdk-8u211-linux-x64.tar.gz

Keep this JDK handy for use in a future step.
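
The copy and archive steps above can also be collected into one small function, which may be convenient if you repeat them for several clusters. This is only a sketch: gather_mapr_files is a hypothetical helper name, not a MapR or Domino tool, and it takes the source and output directories as arguments so that nothing about your edge node layout is assumed.

```shell
#!/bin/bash
# Sketch: collect hive-site.xml and ssl_truststore into one archive.
# gather_mapr_files is a hypothetical helper, not part of MapR or Domino.
# Arguments:
#   $1  Spark conf directory, e.g. /opt/mapr/spark/spark-<version>/conf
#   $2  MapR conf directory, e.g. /opt/mapr/conf
#   $3  parent directory for the staging folder and archive, e.g. /tmp
gather_mapr_files() {
  local spark_conf="$1" mapr_conf="$2" out_parent="$3"
  local staging="$out_parent/hadoop-binaries-configs"
  local f

  # Stop early if either source file is missing.
  for f in "$spark_conf/hive-site.xml" "$mapr_conf/ssl_truststore"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done

  mkdir -p "$staging" || return 1
  cp "$spark_conf/hive-site.xml" "$mapr_conf/ssl_truststore" "$staging/" || return 1

  # Same archive name and layout as the manual steps.
  tar -zcf "$out_parent/hadoop-binaries-configs.tar.gz" -C "$out_parent" hadoop-binaries-configs
}
```

On the edge node you would call it as gather_mapr_files /opt/mapr/spark/spark-<version>/conf /opt/mapr/conf /tmp. From your local machine, something like scp <user>@<edge-node>:/tmp/hadoop-binaries-configs.tar.gz . followed by tar -zxf hadoop-binaries-configs.tar.gz would then fetch and unpack the archive; the user and host placeholders are yours to fill in.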

Uploading the binaries and configuration files to Domino

Use the following procedure to upload the files you retrieved in the previous step to a public Domino project. This will make the files available to the Domino environment builder.

Log in to Domino, then create a new public project.
Open the Files page for the new project, then click to browse for files and select the files you downloaded from the MapR edge node, and the JDK you downloaded from Oracle. Then click Upload.
From the Files page of your project, click New File. Name the file run-client.sh, and in its contents you must construct an invocation of the MapR configure.sh script that is valid for setting up a client to connect to your cluster. A full explanation of how to invoke this script is beyond the scope of this document. Read the full documentation on the script from MapR, and consider the following example.

run-client.sh
```
#!/bin/bash
/opt/mapr/server/configure.sh -N <clustername> -c -secure -C <host1>:7222,<host2>:7222,<host3>:7222 -HS <historyServer>
```
Once your project contains the files from the MapR edge node, the correct JDK, and a run-client.sh script that wraps the MapR configuration script, click the gear menu next to each of those files, then right-click Download and click Copy Link Address. Save these URLs in your notes, as you will need them in the next step.

Once you have recorded the download URLs of all four files, you’re ready to build a Domino environment for connecting to MapR.
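
If your cluster has several CLDB hosts, it can help to generate the configure.sh line in run-client.sh rather than typing it by hand. The following is a minimal sketch, assuming the same flags as the run-client.sh example above and port 7222 on every CLDB host. build_configure_cmd is a hypothetical helper name, and the flags should still be checked against the MapR configure.sh documentation for your cluster.

```shell
#!/bin/bash
# Sketch: print a configure.sh invocation like the run-client.sh example.
# build_configure_cmd is a hypothetical helper. The flags mirror the example
# in this document; confirm them against the MapR configure.sh documentation.
# Arguments:
#   $1   cluster name
#   $2   history server host
#   $3+  CLDB hosts (port 7222 is appended to each, as in the example)
build_configure_cmd() {
  local cluster="$1" history="$2"
  shift 2
  local cldb="" host
  for host in "$@"; do
    # Join hosts with commas, adding the separator only after the first entry.
    cldb="${cldb:+$cldb,}${host}:7222"
  done
  printf '/opt/mapr/server/configure.sh -N %s -c -secure -C %s -HS %s\n' \
    "$cluster" "$cldb" "$history"
}
```

To produce the file, you could run { echo '#!/bin/bash'; build_configure_cmd <clustername> <historyServer> <host1> <host2> <host3>; } > run-client.sh and then upload the result to your project.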

Creating a Domino environment for connecting to MapR

Click Environments from the Domino main menu, then click Create Environment.
Give the environment an informative name, then choose a base environment that includes the version of Python that is installed on the nodes of your MapR cluster. Most Linux distributions ship with Python 2.7 by default, so you will see the Domino Analytics Distribution for Python 2.7 used as the base image in the following examples. Click Create when finished.

After creating the environment, click Edit Definition. Copy the below example into your Dockerfile Instructions, then be sure to edit it wherever necessary with values specific to your deployment and cluster.

In this Dockerfile, wherever you see a hyphenated instruction enclosed in angle brackets like <paste-your-domino-download-url-here>, be sure to replace it with the corresponding value you recorded in previous steps. You may also need to edit the commands that follow to match the filenames you downloaded.

# Base Image: quay.io/domino/base:Ubuntu16_DAD_Py2.7_R3.4-20180727
USER root

# Give the ubuntu user ability to sudo as any user including root in the compute environment
RUN echo "ubuntu ALL=(ALL:ALL) NOPASSWD: ALL" >> /etc/sudoers

# Set up directories
RUN mkdir /tmp/mapr-cluster-downloads && \
   mkdir /usr/jdk64

# Create a mapr user and group
RUN groupadd -g 5000 mapr
RUN useradd -u 5000 -g mapr mapr
RUN usermod -s /bin/bash mapr

# Use the following wget commands to download the four files you added to Domino in the previous section.
# You should have copied down the URLs to download a JDK .tar, the two files from the edge node, and the run-client.sh script you created.
# The example below will use a JDK file named jdk-8u112-linux-x64.tar.gz. If you're using a different version or have a different filename, replace it wherever it occurs.
RUN cd /tmp/mapr-cluster-downloads && \
   wget <paste-your-run-client-dot-sh-download-url-here> -O /tmp/mapr-cluster-downloads/run-client.sh.gz && \
   wget <paste-your-hive-site-dot-xml-download-url-here> -O /tmp/mapr-cluster-downloads/hive-site.xml.gz && \
   wget <paste-your-jdk-tar-download-url-here> -O /tmp/mapr-cluster-downloads/jdk-8u112-linux-x64.tar.gz && \
   wget <paste-your-ssl-truststore-download-url-here> -O /tmp/mapr-cluster-downloads/ssl_truststore.gz && \
   gunzip run-client.sh.gz && \
   gunzip hive-site.xml.gz && \
   gunzip jdk-8u112-linux-x64.tar.gz && \
   gunzip ssl_truststore.gz && \
   cd ~

# Install Java from the JDK
RUN tar xvf /tmp/mapr-cluster-downloads/jdk-8u112-linux-x64.tar -C /usr/jdk64 && \
   ln -s /usr/jdk64/jdk1.8.0_112 /usr/jdk64/default
ENV JAVA_HOME=/usr/jdk64/default
RUN echo "export JAVA_HOME=/usr/jdk64/default" >> /home/ubuntu/.domino-defaults && \
   echo "export PATH=$JAVA_HOME/bin:$PATH" >> /home/ubuntu/.domino-defaults

# Install mapr-client and Spark binaries from the MapR ubuntu repository.
# These examples are for MapR 6.1.0.
# If you are using a different version of MapR, replace these URLs with the correct versions from http://archive.mapr.com/releases/.
RUN echo "deb https://package.mapr.com/releases/v6.1.0/ubuntu binary trusty" >> /etc/apt/sources.list
RUN echo "deb https://package.mapr.com/releases/MEP/MEP-6.0.0/ubuntu binary trusty" >> /etc/apt/sources.list
RUN wget -O - https://package.mapr.com/releases/pub/maprgpg.key | sudo apt-key add -
RUN apt-get update
RUN apt-get -y install mapr-client mapr-spark mapr-hive

# Copy the ssl_truststore file from the /opt/mapr/conf directory on the cluster to the /opt/mapr/conf directory on the client
RUN cp /tmp/mapr-cluster-downloads/ssl_truststore /opt/mapr/conf/

# Make your customized script from the previous section executable
RUN chmod +x /tmp/mapr-cluster-downloads/run-client.sh

# Update SPARK and HADOOP environment variables.
# Make sure the Spark and Hadoop version numbers match what is installed on your cluster
# The examples below show Spark 2.3.1 and Hadoop 2.7.0.
# If you are using different versions, be sure to edit the file and directory names to match.
# Make sure the py4j file name is correct per your edgenode.
ENV SPARK_HOME=/opt/mapr/spark/spark-2.3.1
RUN echo "export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.7.0" >> /home/ubuntu/.domino-defaults && \
   echo "export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop" >> /home/ubuntu/.domino-defaults && \
   echo "export YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop" >> /home/ubuntu/.domino-defaults && \
   echo "export SPARK_HOME=/opt/mapr/spark/spark-2.3.1" >> /home/ubuntu/.domino-defaults && \
   echo "export SPARK_CONF_DIR=/opt/mapr/spark/spark-2.3.1/conf" >> /home/ubuntu/.domino-defaults && \
   echo "export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip" >> /home/ubuntu/.domino-defaults

# Change spark configuration directory permission as a new spark-defaults.conf file gets created by Domino's spark integration
RUN chmod 777 /opt/mapr/spark/spark-2.3.1/conf

# Add symlinks for Spark binaries
RUN ln -s /opt/mapr/spark/spark-2.3.1/bin/pyspark /usr/bin/pyspark
RUN ln -s /opt/mapr/spark/spark-2.3.1/bin/spark-shell /usr/bin/spark-shell
RUN ln -s /opt/mapr/spark/spark-2.3.1/bin/spark-submit /usr/bin/spark-submit

# Update Java path for R
RUN export LD_LIBRARY_PATH=/usr/jdk64/default/jre/lib/amd64/server && R CMD javareconf

# Install Python and R JDBC packages
RUN pip install jaydebeapi
RUN R --no-save -e 'install.packages(c("RJDBC"))'
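
The Spark and Hadoop versions appear in many paths in the instructions above, so editing them by hand is easy to get wrong. One option is to save the instructions to a local file and rewrite the versions with sed before pasting them into Domino. This is a sketch only: retarget_versions is a hypothetical helper, and it assumes the example versions 2.3.1 and 2.7.0 are still present in the text you feed it. The JDK filename and the py4j filename are not touched and still need to be checked manually.

```shell
#!/bin/bash
# Sketch: rewrite the example Spark and Hadoop versions in a saved copy of
# the Dockerfile instructions. retarget_versions is a hypothetical helper;
# it reads the instructions on stdin and writes the result to stdout.
# Arguments:
#   $1  Spark version on your cluster (replaces 2.3.1)
#   $2  Hadoop version on your cluster (replaces 2.7.0)
retarget_versions() {
  local spark="$1" hadoop="$2"
  sed -e "s/spark-2\.3\.1/spark-${spark}/g" \
      -e "s/hadoop-2\.7\.0/hadoop-${hadoop}/g"
}
```

For example, retarget_versions <spark-version> <hadoop-version> < dockerfile-instructions.txt > dockerfile-instructions-edited.txt, where both filenames are placeholders for files on your local machine.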

Scroll down to the Pre Run Script field and add the following lines, being sure to match the Spark version in the directory name to the one being set up by the Dockerfile instructions.

# Configure mapr-client with your customized script.
sudo bash /tmp/mapr-cluster-downloads/run-client.sh

# Copy hive-site.xml to the spark configuration directory
# Be sure to match the Spark version in this folder name to match what you set up above.
cp /tmp/mapr-cluster-downloads/hive-site.xml /opt/mapr/spark/spark-2.3.1/conf

(Optional) If you want to store and access MapR user tickets as Domino environment variables, follow these additional steps.
1. Request a long-running MapR ticket from your cluster administrator, and copy its contents to your local machine. The ticket will be formatted as:
  
  <cluster-name> <token>
2. Add that token as a Domino environment variable to your Domino user account with the name USERTICKET.
3. Add the following lines to the bottom of the Pre Run Script field for the environment you edited previously.

  # Write the MapR ticket from the environment variable to a file at runtime
  echo $USERTICKET > /tmp/maprticket_12574
  chown ubuntu:ubuntu /tmp/maprticket_12574
  chmod 600 /tmp/maprticket_12574

  Note that if you do this, every user that wants to use this environment must set up a USERTICKET environment variable as described in the previous step.
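
Because every user of the environment needs USERTICKET set, a slightly more defensive variant of the ticket lines can make a missing variable obvious instead of silently writing an empty file. The following is a sketch under that assumption. write_mapr_ticket is a hypothetical helper name, and the chown to the ubuntu user from the lines above is left out so that the function can also run without root; add it back in the Pre Run Script if you need it.

```shell
#!/bin/bash
# Sketch: write the ticket held in USERTICKET to a file that only its owner
# can read. write_mapr_ticket is a hypothetical helper.
# Arguments:
#   $1  destination path, e.g. /tmp/maprticket_12574
write_mapr_ticket() {
  local dest="$1"
  if [ -z "${USERTICKET:-}" ]; then
    echo "USERTICKET is not set; skipping ticket setup" >&2
    return 1
  fi
  # Create the file under a restrictive umask, then enforce mode 600 in case
  # the file already existed with looser permissions.
  ( umask 077 && printf '%s\n' "$USERTICKET" > "$dest" ) && chmod 600 "$dest"
}
```

In the Pre Run Script this could be called as write_mapr_ticket /tmp/maprticket_12574 after the function definition.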
Click Build when finished editing the Dockerfile instructions. If the build completes successfully, you are ready to try using the environment.

Configure a Domino project for use with a MapR cluster

This procedure assumes that an environment with the necessary client software has been created according to the instructions above. Ask your Domino admin for access to such an environment.

Open the Domino project you want to use with your MapR cluster, then click Settings from the project menu.
On the Integrations tab, click to select YARN integration from the Apache Spark panel, then click Save. You should not need to edit any of the fields in this section.
On the Hardware & Environment tab, change the project default environment to the one you built earlier with the binaries and configuration files.

You are now ready to start Runs from this project that interact with your MapR cluster.
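
Before launching real work, a first Run can confirm that the pieces set up by the environment are actually present. The sketch below only checks that the expected files and commands exist; it does not contact the cluster. check_mapr_client is a hypothetical helper name, and the two directories are passed as arguments so the version in the Spark path can match your own environment.

```shell
#!/bin/bash
# Sketch: report which client files and commands are present in a Run.
# check_mapr_client is a hypothetical helper. It does not contact the cluster.
# Arguments:
#   $1  MapR conf directory, e.g. /opt/mapr/conf
#   $2  Spark home, e.g. /opt/mapr/spark/spark-2.3.1
check_mapr_client() {
  local mapr_conf="$1" spark_home="$2" missing=0 item

  # Files placed by the Dockerfile instructions and the Pre Run Script.
  for item in "$mapr_conf/ssl_truststore" "$spark_home/conf/hive-site.xml"; do
    if [ -f "$item" ]; then
      echo "ok       $item"
    else
      echo "MISSING  $item"
      missing=1
    fi
  done

  # Commands symlinked into /usr/bin by the Dockerfile instructions.
  for item in pyspark spark-shell spark-submit; do
    if command -v "$item" >/dev/null 2>&1; then
      echo "ok       $item"
    else
      echo "MISSING  $item"
      missing=1
    fi
  done

  return "$missing"
}
```

With the example versions used in this document, the call would be check_mapr_client /opt/mapr/conf /opt/mapr/spark/spark-2.3.1. A non-zero exit status means at least one item was reported as MISSING.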