The simplest and fastest way to connect to and interact with a Snowflake database is a Domino Data Source. Data Sources simplify database setup via configuration records and are optimized for read-heavy, SQL-based usage.
-
From inside your project, click on Data in the Project menu bar. This will open the Data window.
-
On the Data Sources page, click the + Add a Data Source button.
-
In the popup window, select Snowflake from the list.
-
On the Configure window, enter your account and connection credentials, add an optional description, then click Next.
-
On the next Authenticate window, enter your Snowflake username and password, then click Test Credentials.
-
Domino will attempt to connect to the specified database using the credentials you entered. If all goes well, you will see a message confirming that the connection was successful. Click Next.
-
On the next Permissions screen, click Finish Setup.
You are now ready to use the Data Source.
-
Start a Domino Workspace. As the workspace starts up, you will notice a red dot next to Data in the Project menu. Once the Workspace is loaded, click Data.
-
The left pane of the screen will expand to show the Data Source. Click the copy icon to copy the connection code associated with the Snowflake Data Source.
-
Next, on the Launcher page, click on the Python shortcut in the Notebook section to open a Python Notebook.
-
Use Command + V (on Mac) or Ctrl + V (on Windows and Linux) to paste the copied code into the first cell of the notebook. The Data Source functionality in Domino enables users to connect to the database using these templated commands. It will look something like this:
from domino.data_sources import DataSourceClient

# Instantiate a client and fetch the DataSource instance
ds = DataSourceClient().get_datasource("weather_db")

# Create a simple wrapper from the query result
res = ds.query("select * from {{table_reference}}")

# Load the result into a Pandas dataframe
df = res.to_pandas()
-
Modify the third statement (the query) so that it matches one of your database tables:
res = ds.query("select * from state_province")
This will load the list of states from the database table into a result object, which is then converted into a Pandas dataframe with the following code:
df = res.to_pandas()
-
Add the following code to display the first few rows of the table:
df.head()
-
Run the cell to get the following results:
  STATE_CODE            NAME
0         AK          ALASKA
1         AL         ALABAMA
2         AR        ARKANSAS
3         AS  AMERICAN SAMOA
4         AZ         ARIZONA
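Once loaded, the result is an ordinary pandas dataframe, so all the usual pandas operations apply. As a quick illustration, the following sketch uses a stand-in dataframe (copied from the `df.head()` output above, since the full query result depends on your database) to look up a single state:

```python
import pandas as pd

# Stand-in for the dataframe returned by res.to_pandas() above;
# rows copied from the df.head() output shown in this tutorial.
df = pd.DataFrame({
    "STATE_CODE": ["AK", "AL", "AR", "AS", "AZ"],
    "NAME": ["ALASKA", "ALABAMA", "ARKANSAS", "AMERICAN SAMOA", "ARIZONA"],
})

# Ordinary pandas filtering works on the query result as-is
arkansas = df[df["NAME"] == "ARKANSAS"]
```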
-
As another example, run the following code, with a different query, to count how many weather stations are located in each country (the country code is the first two characters of the station identifier):
sfQuery = """
SELECT SUBSTR(STATION_ID, 1, 2) AS COUNTRY_CODE,
       COUNT(STATION_ID) AS STATION_COUNT
FROM WEATHER_STATION
GROUP BY SUBSTR(STATION_ID, 1, 2)
ORDER BY STATION_COUNT DESC;
"""
res = ds.query(sfQuery)
df = res.to_pandas()
df
The following results are returned:
  COUNTRY_CODE  STATION_COUNT
0           GM           1123
1           NL            386
2           SP            207
3           UK            158
4           FR            111
5           IT            104
6           PO             26
7           AU             12
8           SZ             10
9           BE              1
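The same aggregation can also be reproduced client-side with pandas, which is a useful cross-check when experimenting with a query. This sketch uses a handful of made-up station identifiers (the real ones live in the WEATHER_STATION table) to show that grouping on the first two characters yields the same kind of COUNTRY_CODE / STATION_COUNT table:

```python
import pandas as pd

# Hypothetical station identifiers; in the real table these come
# from the WEATHER_STATION.STATION_ID column.
stations = pd.DataFrame(
    {"STATION_ID": ["GM0001", "GM0002", "NL0001", "SP0001", "GM0003"]}
)

counts = (
    stations["STATION_ID"].str[:2]   # country code = first two characters
    .value_counts()                  # count stations per code, descending
    .rename_axis("COUNTRY_CODE")
    .reset_index(name="STATION_COUNT")
)
```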
In summary, the Data Source usage pattern consists of the following steps:
-
Define the query in SQL.
-
Use the Data Source to send the query and collect the results.
-
Convert the results into a Pandas dataframe.
-
Do something productive with the dataframe.
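Putting the four steps together, the "do something productive" part might look like this. The dataframe below is a stand-in with the values copied from the station-count results above (steps 1 through 3 would normally produce it via `ds.query(...).to_pandas()`), and the example computes each country's share of all stations:

```python
import pandas as pd

# Stand-in for the dataframe produced in steps 1-3 via
# ds.query(...).to_pandas(); values copied from the results above.
df = pd.DataFrame({
    "COUNTRY_CODE": ["GM", "NL", "SP", "UK", "FR", "IT", "PO", "AU", "SZ", "BE"],
    "STATION_COUNT": [1123, 386, 207, 158, 111, 104, 26, 12, 10, 1],
})

# Step 4: do something productive with the dataframe --
# e.g. compute each country's share of all stations, in percent.
df["SHARE_PCT"] = (100 * df["STATION_COUNT"] / df["STATION_COUNT"].sum()).round(1)
```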
Next, use Snowflake’s Python driver for read and write operations to create a predictive weather model.