Connecting to Impala from Domino




Overview

This article describes how to connect to Apache Impala from Domino.

Apache Impala is an open source massively parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop.




Using Impala ODBC Connector for Cloudera Enterprise with pyodbc

Domino recommends using the Impala ODBC Connector for Cloudera Enterprise in concert with the pyodbc library for interacting with Impala from Python.


Environment setup

  1. Visit the Cloudera downloads page to download the Impala ODBC Connector for Cloudera Enterprise to your local machine. For default Domino images of Ubuntu 16.04, you should download the 64-bit Debian package. Keep track of where you save this file, as you will need it in a later step.


  2. Create a new public project in your Domino instance to host the driver files for use in Domino environments.


  3. In the new project, click browse for files and select the driver file you downloaded earlier to queue it for upload. Click Upload to add it to the project.


  4. After the driver file has been added to your project files, click the gear icon next to it in the files list, then right-click Download and click Copy link address. Save this address somewhere handy, as you will need it when setting up your environment.


  5. Add the Dockerfile instructions below to your environment to install the driver and pyodbc, pasting in the URL you copied earlier where indicated on line 5.

    # download the driver from your project
    RUN mkdir /ref_files
    RUN \
    cd /ref_files && \
    wget --no-check-certificate [paste-download-url-from-previous-step-here] && \
    gzip -d clouderaimpalaodbc_2.6.0.1000-2_amd64.deb.gz
    
    # install the driver
    RUN gdebi /ref_files/clouderaimpalaodbc_2.6.0.1000-2_amd64.deb --n
    
    # register the impala driver in /etc/odbcinst.ini
    RUN \
    echo "\n\
    [Cloudera ODBC Driver for Impala] \n \
    Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so \n \
    KrbFQDN=_HOST \n \
    KrbServiceName=impala \n" >> /etc/odbcinst.ini
    
    # set up impala libraries (ENV persists in the image; an export inside RUN does not)
    ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cloudera/impalaodbc/lib/64
    RUN ldd /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
    
    # install pyodbc
    RUN pip install pyodbc
    
    For a basic introduction to modifying Domino environments, watch this tutorial video.
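
After the environment builds, it can be useful to confirm that the driver entry was written to /etc/odbcinst.ini. The following is a minimal sketch, not part of the Domino or Cloudera tooling: registered_drivers and driver_is_registered are hypothetical helper names, and the file path and driver name are the ones used in the Dockerfile instructions above.

```python
def registered_drivers(ini_text):
    """Return the section names found in odbcinst.ini-style text."""
    names = []
    for line in ini_text.splitlines():
        stripped = line.strip()
        # Section headers look like "[Driver Name]"; everything else is skipped.
        if stripped.startswith("[") and stripped.endswith("]"):
            names.append(stripped[1:-1].strip())
    return names


def driver_is_registered(name, path="/etc/odbcinst.ini"):
    """Check whether a driver section called `name` exists in the file at `path`."""
    try:
        with open(path) as handle:
            return name in registered_drivers(handle.read())
    except OSError:
        # A missing or unreadable file is treated as "not registered".
        return False


if __name__ == "__main__":
    print(driver_is_registered("Cloudera ODBC Driver for Impala"))
```

Run from a workspace or job in the new environment, this should print True if the driver section is present. pyodbc also provides pyodbc.drivers(), which lists the driver names known to the ODBC driver manager and can serve as a cross-check.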


Credential setup

There are several environment variables you should set up to store sensitive information about your Impala connection. Set the following as Domino environment variables on your user account:

  • IMPALA_HOST

    Hostname where your Impala service is running. Make sure your Impala service and network firewall are configured to accept connections from Domino.

  • IMPALA_PORT

    The port your Impala service is configured to accept connections on.

  • IMPALA_KERB_HOST

    Hostname of your Kerberos authentication service.

  • IMPALA_KERB_REALM

    The name of the Kerberos realm used by the Impala service.

Read Environment variables for secure credential storage to learn more about Domino environment variables.
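
Before opening a connection, a short check that all four variables are present can save some debugging. This is only an illustrative sketch using the standard library; missing_impala_vars is a hypothetical helper name, and the variable names are the ones listed above.

```python
import os

# The four variables described in this section.
REQUIRED_VARS = (
    "IMPALA_HOST",
    "IMPALA_PORT",
    "IMPALA_KERB_HOST",
    "IMPALA_KERB_REALM",
)


def missing_impala_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_impala_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Impala connection variables are set.")
```

Accepting the environment as a parameter keeps the helper easy to exercise with a plain dictionary instead of the real process environment.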


Usage

Read the pyodbc documentation for detailed information on how to use the package to interact with a database. Below are some examples of how to set up a connection.

import pyodbc
import os

# fetch values from environment variables
hostname = os.environ['IMPALA_HOST']
service_port = os.environ['IMPALA_PORT']
kerb_host = os.environ['IMPALA_KERB_HOST']
kerb_realm = os.environ['IMPALA_KERB_REALM']

# create connection object, substituting the values fetched above
# into the connection string
conn = pyodbc.connect('Host=' + hostname + ';'
                      + 'DRIVER={Cloudera ODBC Driver for Impala};'
                      + 'PORT=' + service_port + ';'
                      + 'KrbRealm=' + kerb_realm + ';'
                      + 'KrbFQDN=' + kerb_host + ';'
                      + 'KrbServiceName=impala;'
                      + 'AUTHMECH=1', autocommit=True)

# if you see:
# 'Error! Filename not specified'
# while querying Impala using the connection object,
# add the following configuration line:
#
# conn.setencoding(encoding='utf-8', ctype=pyodbc.SQL_CHAR)


# if your Impala uses SSL, add SSL=1 to the connection string
# (AllowSelfSignedServerCert=1 is only needed for self-signed certificates)
# conn = pyodbc.connect('Host=' + hostname + ';'
#                       + 'DRIVER={Cloudera ODBC Driver for Impala};'
#                       + 'PORT=' + service_port + ';'
#                       + 'KrbRealm=' + kerb_realm + ';'
#                       + 'KrbFQDN=' + kerb_host + ';'
#                       + 'KrbServiceName=impala;'
#                       + 'AUTHMECH=1;'
#                       + 'SSL=1;'
#                       + 'AllowSelfSignedServerCert=1', autocommit=True)
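
The connection setup above can also be wrapped in small helpers so the connection string is built in one place and a query can be run in a single call. This is a sketch under stated assumptions rather than a Domino-provided API: build_impala_conn_str and run_query are hypothetical names, and the connection-string keys are the same ones used in the example above.

```python
import os


def build_impala_conn_str(environ=os.environ, use_ssl=False):
    """Assemble an ODBC connection string from the Impala environment variables."""
    parts = [
        "DRIVER={Cloudera ODBC Driver for Impala}",
        "Host=" + environ["IMPALA_HOST"],
        "PORT=" + environ["IMPALA_PORT"],
        "KrbRealm=" + environ["IMPALA_KERB_REALM"],
        "KrbFQDN=" + environ["IMPALA_KERB_HOST"],
        "KrbServiceName=impala",
        "AUTHMECH=1",
    ]
    if use_ssl:
        parts.append("SSL=1")
    return ";".join(parts)


def run_query(sql, conn_str):
    """Open a connection, run one statement, and return all rows."""
    # Imported here so the string builder can be used without the driver installed.
    import pyodbc

    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        # Always release the connection, even if the query fails.
        conn.close()
```

With the environment variables set, a call such as run_query("SHOW DATABASES", build_impala_conn_str()) would return the rows as a list. Closing the connection in a finally block avoids leaking sessions on the Impala side when a statement raises an error.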