Get started with Snowflake (and Python)

This tutorial introduces working with the Snowflake database from Domino. You will follow a basic data collection, engineering, and loading workflow, and then create a model in Python that uses the data in Snowflake.

Domino offers several ways to connect to Snowflake; the two primary ones are listed under Assumptions.

Overview

In this Get Started series, you’ll learn how to work with Domino Data Stores to process large datasets, following this workflow:

  1. Preliminaries – Data Engineering:

    1. Find data.

    2. Understand the data.

    3. Get the data.

    4. Wrangle data into a format usable for analysis.

  2. Analysis:

    1. Look at the data – normally using a subset of the complete dataset.

    2. Clean the data – deal with missing and errant data.

    3. Identify the variables (features) that you believe matter for your prediction.

  3. Model development:

    1. Try out several algorithms to determine which one produces the best results.

    2. Save the training function.

  4. Model training:

    1. Run the model training function on the complete dataset.

    2. Collect the model.

    3. Test again.
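
As a rough illustration of steps 2 through 4 above, the following sketch uses only the Python standard library and synthetic in-memory rows in place of Snowflake data. Every name in it (clean, fit_mean, fit_line, mse, run_workflow) is hypothetical, and the two toy models stand in for whatever algorithms you actually compare. It shows the shape of the workflow under those assumptions and is not the code used later in this series.

```python
import random


def clean(rows):
    """Step 2.2: drop rows with a missing feature or target."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_mean(rows):
    """Candidate A (toy): always predict the mean of the target."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def fit_line(rows):
    """Candidate B (toy): ordinary least-squares line through the data."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, rows):
    """Mean squared error of a fitted model on the given rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def run_workflow(rows, sample_size=50, seed=0):
    rng = random.Random(seed)
    data = clean(rows)
    # Step 2.1: explore and compare on a subset, not the complete dataset.
    subset = rng.sample(data, min(sample_size, len(data)))
    split = len(subset) // 2
    train, holdout = subset[:split], subset[split:]
    # Step 3.1: try several algorithms and keep the one with the lowest error.
    candidates = {"mean": fit_mean, "line": fit_line}
    scores = {name: mse(fit(train), holdout) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    # Step 4.1: rerun the chosen training function on the complete dataset.
    final_model = candidates[best](data)
    return best, final_model


# Synthetic data: y = 2x + 1, plus two rows with missing values.
rows = [(x, 2 * x + 1) for x in range(200)] + [(None, 3.0), (5, None)]
best, model = run_workflow(rows)
print(best, model(10))
```

In a real project the rows would come from a Snowflake query, and the final "test again" step would score the retrained model on data held out from training.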

Assumptions

  • This tutorial is aimed at data science professionals familiar with JupyterLab, Jupyter Notebooks, and the Python language.

  • The code is for illustration purposes. It is functional and tested, but it offers only a very basic view of using Domino with data in Snowflake.

  • Domino offers multiple ways to connect to Snowflake, primarily:

    • Domino Data Sources - meant for read-oriented exploration.

    • The Snowflake Python library - meant for full-featured database operations in Snowflake.

  • Use Domino’s file sync functionality to save your work to the project’s repository throughout the tutorial.

Prerequisites

  • Familiarity with Domino Workspaces and Datasets.

  • Access credentials (username, password, and authorization) for a Snowflake database.

  • The name of your Snowflake warehouse, database, and schema.

  • Domino permissions to set up a Snowflake Data Source (if applicable).

  • Snowflake’s SnowSQL command-line tool, used in the data engineering and loading sections of this tutorial.

  • Familiarity with SQL and the pandas library.
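
The credentials and the warehouse, database, and schema names listed above map onto the connection parameters of the Snowflake Python library. The sketch below is one possible way to collect them, assuming they are stored in environment variables. The variable names (SNOWFLAKE_USER and so on) and the helper names are hypothetical, and the account identifier is an extra assumption: it is not listed above, but the connector needs it to locate your Snowflake account.

```python
import os

# Hypothetical environment variable names; substitute whatever your project defines.
REQUIRED_VARS = {
    "account": "SNOWFLAKE_ACCOUNT",      # assumption: not listed in the prerequisites
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
}


def snowflake_params(env=None):
    """Build connection keyword arguments, failing early if any value is missing."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_VARS.items()}


def connect(env=None):
    """Open a connection using the snowflake-connector-python package."""
    # Imported lazily so the parameter check above works without the package installed.
    import snowflake.connector

    return snowflake.connector.connect(**snowflake_params(env))
```

Checking for missing values before connecting surfaces configuration mistakes as a clear error rather than an authentication failure. Keeping the password in an environment variable, not in notebook code, also avoids committing it when you sync files to the repository.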