Model monitoring ingests and processes a model’s training, prediction, and ground truth data. Before the monitor can read this data, you must set up the data source from which it is read. After you set up a data source, it is available to all users and can be used with any model.
You can link multiple data sources to the model monitor, and create multiple instances of each data source type.
Note: Enable read and list access for each data source.
The Model Monitor supports data from the following sources:

- Amazon S3
- Azure Blob
- Azure Data Lake Gen 1
- Azure Data Lake Gen 2
- Google Cloud Storage
- HDFS
- Snowflake
- If you haven’t set up a data source, go to the Data page (from the navigation pane, go to Model Monitor > Monitoring Data Sources).
- Click Add Data Source.
- Complete the details as needed. The following describes source-specific configurations:
Amazon S3

Required fields:

- Data Source Name
- S3 Bucket Name
- S3 Region

Authentication: If the S3 bucket can be authenticated to using IAM roles, select the Load Credentials from IAM Role attached to the instance checkbox. Otherwise, enter an AWS Access Key and an AWS Secret Key.
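To confirm that the monitor’s credentials have the read and list access the note above calls for, a minimal boto3 sketch might look like the following; the bucket name and region are placeholders:

```python
import boto3

# Placeholder bucket and region; substitute the values you enter in the form.
# On an instance with an attached IAM role, boto3 picks up credentials
# automatically; otherwise it falls back to the configured access/secret keys.
s3 = boto3.client("s3", region_name="us-east-1")

# List access: enumerate a single object in the bucket.
resp = s3.list_objects_v2(Bucket="my-monitoring-bucket", MaxKeys=1)
contents = resp.get("Contents", [])
print("list access ok;", len(contents), "sample object(s)")

# Read access: fetch the sampled object, if the bucket is non-empty.
if contents:
    obj = s3.get_object(Bucket="my-monitoring-bucket", Key=contents[0]["Key"])
    obj["Body"].read(16)
    print("read access ok")
```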
Azure Blob Store

Required fields:

- Data Source Name
- Account Name
- Container

Authentication: Enter an Access Key or a SAS Token.
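A minimal azure-storage-blob sketch, assuming placeholder account and container names, showing how the Access Key or SAS Token string authenticates a list call:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder account and container; use the values from the form.
# The credential can be either the storage Access Key or a SAS token string.
service = BlobServiceClient(
    account_url="https://mystorageaccount.blob.core.windows.net",
    credential="<access-key-or-sas-token>",
)

container = service.get_container_client("monitoring-data")
for blob in container.list_blobs():  # exercises list access
    print("first blob:", blob.name)
    break
```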
Azure Data Lake Gen 1

Required fields:

- Data Source Name
- Container

Authentication: Select the Authentication Type:

- Client Credentials. Then, enter:
  - Token Endpoint
  - Application ID
  - Secret Key
- Managed Identity. Then, you can enter an optional Port Number. This method applies when the Model Monitor is deployed on Azure VMs configured with service identities that can access Azure Data Lake.
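A minimal sketch of the Client Credentials option using the azure-datalake-store package; all values are placeholders, and the tenant ID below corresponds to the tenant segment of the Token Endpoint URL:

```python
from azure.datalake.store import core, lib

# Placeholder values mirroring the Client Credentials fields in the form.
token = lib.auth(
    tenant_id="<tenant-id>",            # from the Token Endpoint URL
    client_id="<application-id>",       # the Application ID field
    client_secret="<secret-key>",       # the Secret Key field
)

adls = core.AzureDLFileSystem(token, store_name="mydatalakestore")
print(adls.ls("/"))  # exercises list access on the store root
```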
Azure Data Lake Gen 2

Required fields:

- Data Source Name
- Account Name
- Container

Authentication: Select the Authentication Type:

- Shared Key.
- Client Credentials. Then, enter:
  - Endpoint
  - Client ID
  - Client Secret
- Username Password. Then, enter:
  - Endpoint
  - Username
  - Password
- Refresh Token. Then, enter:
  - Refresh Token
  - Client ID
- Managed Identity. This method applies when the Model Monitor has been deployed on Azure VMs configured with service identities that can access Azure Data Lake Gen2. Then, you can enter the following optional information:
  - Tenant ID
  - Client ID
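A minimal sketch of the Client Credentials flow against Gen 2, using azure-identity and azure-storage-filedatalake with placeholder account, container, and credential values:

```python
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder values mirroring the Client Credentials fields in the form.
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)

service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential=credential,
)

fs = service.get_file_system_client("monitoring-data")  # the Container field
for path in fs.get_paths():  # exercises list access
    print("first path:", path.name)
    break
```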
Google Cloud Storage

Required fields:

- Data Source Name
- Bucket

Authentication: JSON Key File.
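A minimal google-cloud-storage sketch that authenticates with the JSON key file and exercises list access; the key file path and bucket name are placeholders:

```python
from google.cloud import storage

# Placeholder key file and bucket; substitute your own values.
client = storage.Client.from_service_account_json("service-account-key.json")

bucket = client.bucket("my-monitoring-bucket")
for blob in client.list_blobs(bucket, max_results=1):  # exercises list access
    print("first blob:", blob.name)
```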
HDFS

Required fields:

- Data Source Name
- Host
- Port (optional)
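A minimal pyarrow sketch of connecting with the Host and Port values; the NameNode host is a placeholder, 8020 is a common default port, and a local Hadoop client (libhdfs) is required:

```python
from pyarrow import fs

# Placeholder NameNode host; 8020 is a common default port.
hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020)

# Exercises list access on a placeholder directory.
for info in hdfs.get_file_info(fs.FileSelector("/data", recursive=False)):
    print(info.path)
```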
Snowflake

Required fields: All fields.

Authentication: Enter the following:

- Account URL, which uniquely identifies the Snowflake account in your organization.
- Your Username and Password.
- In Database, enter the name of the Snowflake database that contains the data.
- In Schema, enter the name of the active schema for the session.
- In Warehouse, enter the name of the compute cluster that provides processing resources in Snowflake.
- Select a Role.

Note: All fields must match Snowflake’s requirements for object identifiers.
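A minimal snowflake-connector-python sketch, with placeholder values for each form field, that verifies the session resolves the database, schema, warehouse, and role you entered; the account identifier is the portion of the Account URL before ".snowflakecomputing.com":

```python
import snowflake.connector

# Placeholder values mirroring the Snowflake form fields.
conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="<username>",
    password="<password>",
    database="MONITORING_DB",
    schema="PUBLIC",
    warehouse="MONITORING_WH",
    role="ANALYST",
)

cur = conn.cursor()
cur.execute(
    "SELECT CURRENT_DATABASE(), CURRENT_SCHEMA(), "
    "CURRENT_WAREHOUSE(), CURRENT_ROLE()"
)
print(cur.fetchone())  # confirms the session context matches the form
cur.close()
conn.close()
```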
- Click Add.