This tutorial gives you hands-on experience with the Snowflake database and Domino. You will follow a basic data collection, engineering, and loading workflow, and then create a model in Python that uses the data in Snowflake.
Domino offers various methods to connect to Snowflake:
- Snowflake SnowSQL.
- Domino Data Source using Snowflake.
- Snowflake Connector for Python.
- Snowflake Snowpark.
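As a rough illustration of the connector-based option: the Snowflake Connector for Python follows the Python DB-API 2.0 (PEP 249) pattern of connection, cursor, execute, and fetch. The sketch below shows that pattern with the standard library's `sqlite3` as a local stand-in, so it runs without Snowflake credentials. The `fetch_rows` helper and the `trips` table are hypothetical names used only for illustration.

```python
import sqlite3
from contextlib import closing


def fetch_rows(conn, sql):
    """Run a query on a DB-API 2.0 connection and return all rows."""
    # closing() releases the cursor even if execute() raises.
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()


# Stand-in database: sqlite3 replaces Snowflake so the sketch runs locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, distance_km REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Oslo", 12.5), ("Lima", 7.0), ("Oslo", 3.5)],
)

# Aggregate in the database and fetch only the result rows.
rows = fetch_rows(
    conn,
    "SELECT city, SUM(distance_km) FROM trips GROUP BY city ORDER BY city",
)
print(rows)
conn.close()
```

With Snowflake, the connection object would come from `snowflake.connector.connect(...)` instead. The two drivers use different parameter placeholder styles by default, which is why this helper takes a complete SQL string rather than bind parameters.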
In this Get Started series, you’ll learn how to work with Domino Data Stores on large datasets through the following workflow:
- Preliminaries – Data Engineering:
  - Find data.
  - Understand the data.
  - Get the data.
  - Wrangle data into a format usable for analysis.
- Analysis:
  - Look at the data – normally using a subset of the complete dataset.
  - Clean the data – deal with missing and errant data.
  - Identify the arguments that you believe matter for your prediction to work.
- Model development:
  - Try out several algorithms to determine which one produces the best results.
  - Save the training function.
- Model training:
  - Run the model training function on the complete dataset.
  - Collect the model.
  - Test again.
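The analysis, model development, and model training stages of this workflow can be sketched in a few lines of standard-library Python. This is a minimal illustration on made-up data, not the tutorial's model: the two candidate "algorithms" (a mean baseline and a one-feature least-squares line), the helper names, and the synthetic dataset are all assumptions chosen to keep the example self-contained.

```python
import random


def clean(rows):
    """Drop rows with a missing feature or target value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def train_mean(rows):
    """Baseline candidate: always predict the mean target."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def train_linear(rows):
    """Candidate: one-feature least-squares line y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, rows):
    """Mean squared error of a model over (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Made-up data: y is roughly 2x + 1, with two rows containing missing values.
rng = random.Random(0)
raw = [(float(x), 2.0 * x + 1.0 + rng.uniform(-0.5, 0.5)) for x in range(200)]
raw[10] = (None, 3.0)
raw[20] = (20.0, None)

# Analysis: look at and clean a subset of the complete dataset.
subset = clean(rng.sample(raw, 50))
split = len(subset) // 2
train, holdout = subset[:split], subset[split:]

# Model development: try several candidates and keep the best training function.
candidates = {"mean": train_mean, "linear": train_linear}
scores = {name: mse(fn(train), holdout) for name, fn in candidates.items()}
best_name = min(scores, key=scores.get)
train_fn = candidates[best_name]

# Model training: run the saved training function on the complete dataset,
# collect the model, and test again.
full = clean(raw)
model = train_fn(full)
print(best_name, round(mse(model, full), 3))
```

The point of keeping `train_fn` separate from the fitted `model` is that the same training function chosen on the subset is rerun unchanged on the complete dataset.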
Before you start, note the following:
- This tutorial is aimed at data science professionals familiar with JupyterLab, Jupyter Notebooks, and the Python language.
- The code is for illustration purposes. It is functional and tested, and it offers a very basic view of using Domino with data in Snowflake.
- Domino offers multiple connectivity modes with Snowflake, primarily:
  - Domino Data Sources - meant for read-oriented exploration.
  - The Snowflake Python library - meant for full-featured database operations in Snowflake.
- Use Domino’s file sync functionality to save your work to the project’s repository throughout the tutorial.
To follow this tutorial, you need:
- Familiarity with Domino Workspaces and Datasets.
- Access permissions (username, password, and authorization) to a Snowflake database.
- The name of your Snowflake warehouse, database, and schema.
- Domino permissions to set up a Snowflake Data Source (if applicable).
- Snowflake’s SnowSQL command line tool for the data engineering and loading sections of this tutorial.
- Familiarity with the SQL language and Pandas library.
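One way to keep the Snowflake details from this list out of your notebooks is to read them from environment variables and fail early if any are missing. The sketch below assumes hypothetical variable names such as `SNOWFLAKE_USER`; it also includes an account identifier, which the list above does not mention but which the Snowflake Connector for Python expects.

```python
import os

# Hypothetical variable names; use whatever your Domino project defines.
REQUIRED = {
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
    "account": "SNOWFLAKE_ACCOUNT",  # assumption: not listed in the prerequisites
    "warehouse": "SNOWFLAKE_WAREHOUSE",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
}


def load_snowflake_params(env=os.environ):
    """Collect connection settings from env, failing early if any is missing."""
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(sorted(missing)))
    return {key: env[var] for key, var in REQUIRED.items()}


# Example with placeholder values instead of real credentials.
example_env = {var: "placeholder-" + key for key, var in REQUIRED.items()}
params = load_snowflake_params(example_env)
print(sorted(params))

# With the connector installed, the settings could be passed as:
#   snowflake.connector.connect(**params)
```

Calling `load_snowflake_params()` with no argument reads from the real process environment, so the same helper works unchanged in a Workspace or a Job.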
The tutorial is designed to be followed in sequence:
- Data engineering - Prepare and load the data into Snowflake.
- Use Snowflake with a Domino Data Source - A simple connectivity example.
- Use Snowflake’s Python driver in Domino - Build a data update service with a Domino Job.
- Snowflake Snowpark - Create a model in Domino and set it up as a Snowflake user-defined function (Video).