Connect a Data Source

Model monitoring ingests and processes a model’s training, prediction, and ground truth data. Before it can read this data, you must set up the data source that holds it. After you set up a data source, it is available to all users and can be used with any model.

You can link multiple data sources to the Model Monitor, including multiple instances of each data source type.

Note

Enable read and list access for each data source.
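
For Amazon S3, read and list access usually corresponds to the s3:GetObject action on the bucket's objects and the s3:ListBucket action on the bucket itself. The following is a minimal sketch that builds such an IAM policy document in Python. The helper name and the bucket name are hypothetical, and this page does not state whether the Model Monitor needs any further permissions. Other data source types use their own permission models.

```python
import json


def read_list_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows listing and reading one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # List access applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Read access applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-monitoring-bucket" is a placeholder, not a real bucket.
    print(json.dumps(read_list_policy("example-monitoring-bucket"), indent=2))
```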

The Model Monitor supports data from the following sources:

  • Amazon S3

  • Generic S3

  • Azure Blob Store

  • Azure Data Lake Gen 1

  • Azure Data Lake Gen 2

  • Google Cloud Storage

  • HDFS

  • Snowflake

Connect a Data Source

  1. If you haven’t set up a data source, go to the Data page (from the navigation pane, go to Model Monitor > Monitoring Data Sources).

  2. Click Add Data Source.

  3. Complete the details as needed. The following describes source-specific configurations:

    Data source | Required fields | Authentication

    Amazon S3

    Data Source Name

    S3 Bucket Name

    S3 Region

    If you can authenticate to the S3 bucket using IAM roles, select the Load Credentials from IAM Role attached to the instance checkbox.

    Otherwise, enter an AWS Access Key and an AWS Secret Key.

    Generic S3

    Data Source Name

    S3 Bucket Name

    S3 Endpoint

    Enter an Access Key and a Secret Key.

    Azure Blob Store

    Data Source Name

    Account Name

    Container

    Access Key

    SAS Token

    Azure Data Lake Gen 1

    Data Source Name

    Container

    Select the Authentication Type:

    1. If you select Client Credentials, enter:

      • Token Endpoint

      • Application ID

      • Secret Key

    2. If you select Managed Identity, you can enter an optional Port Number.

    This method applies when the Model Monitor is deployed on Azure VMs configured with service identities that can access Azure Data Lake.

    Azure Data Lake Gen 2

    Data Source Name

    Account Name

    Container

    Select the Authentication Type:

    1. Shared Key.

    2. Client Credentials. Then, enter:

      • Endpoint

      • Client ID

      • Client Secret

    3. Username Password. Then, enter:

      • Endpoint

      • Username

      • Password

    4. Refresh Token. Then, enter:

      • Refresh Token

      • Client ID

    5. Managed Identity. This method applies when the Model Monitor is deployed on Azure VMs configured with service identities that can access Azure Data Lake Gen 2. You can optionally enter:

      • Tenant ID

      • Client ID

    Google Cloud Storage

    Data Source Name

    Bucket

    JSON Key File

    HDFS

    Data Source Name

    Host

    Port (optional)

    Snowflake

    All fields

    Enter the following:

    1. In Account URL, enter the URL that uniquely identifies the Snowflake account in your organization.

    2. Enter your Username and Password.

    3. In Database, enter the name of the Snowflake database that contains the data.

    4. In Schema, enter the name of the active schema for the session.

    5. In Warehouse, enter the name of the warehouse (the compute cluster) that provides the resources in Snowflake.

    6. Select a Role.

      Note
      All fields must match Snowflake’s requirements for object identifiers.
  4. Click Add.
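
Before you fill in the form, it can help to check that you have every required value for your source type. The following is a minimal sketch of such a check, with the required fields transcribed from step 3. The snake_case field names and both helper functions are hypothetical and are not part of any Model Monitor API. The identifier check covers only Snowflake's rule for unquoted identifiers (starts with a letter or underscore, followed by letters, digits, underscores, or dollar signs). Quoted identifiers allow more characters and are not handled here.

```python
import re

# Required fields per data source type, transcribed from step 3.
# The keys are illustrative names, not identifiers used by the product.
REQUIRED_FIELDS = {
    "amazon_s3": ["data_source_name", "s3_bucket_name", "s3_region"],
    "generic_s3": ["data_source_name", "s3_bucket_name", "s3_endpoint"],
    "azure_blob_store": ["data_source_name", "account_name", "container"],
    "azure_data_lake_gen1": ["data_source_name", "container"],
    "azure_data_lake_gen2": ["data_source_name", "account_name", "container"],
    "google_cloud_storage": ["data_source_name", "bucket"],
    "hdfs": ["data_source_name", "host"],  # port is optional
    "snowflake": [
        "account_url", "username", "password",
        "database", "schema", "warehouse", "role",
    ],
}

# Rule for unquoted Snowflake identifiers only.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def missing_fields(source_type: str, details: dict) -> list:
    """Return the required fields that are absent or blank in details."""
    required = REQUIRED_FIELDS[source_type]
    return [name for name in required if not str(details.get(name) or "").strip()]


def is_unquoted_snowflake_identifier(value: str) -> bool:
    """Check a value against the unquoted-identifier rule (max 255 characters)."""
    return len(value) <= 255 and _UNQUOTED_IDENTIFIER.fullmatch(value) is not None


if __name__ == "__main__":
    # Placeholder values for illustration.
    draft = {"data_source_name": "training-data", "s3_bucket_name": "example-bucket"}
    print(missing_fields("amazon_s3", draft))  # ['s3_region']
    print(is_unquoted_snowflake_identifier("SALES_DB"))  # True
```

A check like this only catches missing or malformed values. It does not verify credentials or confirm that the Model Monitor can reach the data source.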