Ingest data into the Model Monitor

The Model Monitor ingests and processes a model's training, prediction, and ground truth data in order to monitor the model. To do this, it needs a data source from which it can read that data. Data sources set up in your Model Monitor deployment are accessible to all users and can supply data to any model.

The Model Monitor can read data from the following data sources:

Amazon S3
Azure Blob Storage
Azure Data Lake Gen 1
Azure Data Lake Gen 2
Google Cloud Storage
HDFS

Set up data sources

Use the Data Sources section of the Model Monitor to add and configure data sources. You can link multiple data sources to the Model Monitor, including multiple instances of the same data source type. For each data source you connect, make sure the credentials you provide grant read and list access.

To add a data source, go to the Data Sources section and click Add Data Source. Select the type of data source you want to register, give it a name (you will use this name when registering datasets for models), and fill in the details for the selected type.

If a registered data source is misconfigured or cannot be accessed, the corresponding errors are shown on the Ingest Dashboard of the registered model.

Amazon S3

Required configurations:

Bucket
Region

The Model Monitor offers two ways to authenticate to your S3 bucket:

If the bucket can be accessed using an IAM role, select the checkbox to accept credentials from the attached IAM role.
Alternatively, provide an Access Key and a Secret Key.
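Whichever method you choose, the IAM identity behind it needs read and list access to the bucket. As a minimal sketch, assuming you want read-only access to the whole bucket, the following builds an IAM policy document that grants `s3:ListBucket` on the bucket and `s3:GetObject` on its objects. The helper name `read_only_bucket_policy` and the bucket name are hypothetical; you would attach the resulting policy to the IAM role or user whose credentials the Model Monitor uses.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy granting list and read access to one S3 bucket.

    Hypothetical helper: narrow the resources if only some prefixes
    should be readable by the Model Monitor.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is granted on the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # GetObject is granted on the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-monitoring-data" is a placeholder bucket name.
    print(json.dumps(read_only_bucket_policy("my-monitoring-data"), indent=2))
```

The two statements are split because S3 evaluates list permissions against the bucket ARN and read permissions against object ARNs; a single statement with only `bucket/*` would not allow listing.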

Azure Blob Storage

Required configurations:

Container
Account name

The Model Monitor offers two ways to authenticate to your Azure Blob Storage account:

By providing the Account Key
By providing a SAS token for the container
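If you use a SAS token, it must grant read and list access on the container. As a minimal sketch, the following inspects the standard SAS query parameters (`sr` for the signed resource, `sp` for permissions, `se` for expiry) before you paste the token into the form. The helper names are hypothetical, only a few common expiry formats are handled, and tokens that rely on a stored access policy (`si`) keep their permissions server-side, so this check does not apply to them.

```python
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import parse_qs


def _parse_expiry(value: str) -> datetime:
    """Parse a SAS expiry ("se"), which Azure expresses in UTC."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognized SAS expiry format: {value!r}")


def container_sas_problems(token: str, now: Optional[datetime] = None) -> List[str]:
    """List reasons a container SAS token would not allow reading and listing.

    An empty list means no problem was found in the token itself; it does
    not prove the signature is valid or that the token has not been revoked.
    """
    params = parse_qs(token.lstrip("?"))

    def field(name: str) -> str:
        # parse_qs maps each key to a list of values; use the first one.
        return params.get(name, [""])[0]

    problems = []
    if field("sr") != "c":
        problems.append("token is not scoped to a container (sr is not 'c')")
    permissions = field("sp")
    for letter, name in (("r", "read"), ("l", "list")):
        if letter not in permissions:
            problems.append(f"missing {name} permission ('{letter}' not in sp)")
    expiry = field("se")
    if not expiry:
        problems.append("no expiry time (se) in token")
    elif _parse_expiry(expiry) <= (now or datetime.now(timezone.utc)):
        problems.append("token has expired")
    return problems
```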

Azure Data Lake Gen 1

Required configurations:

Container

The Model Monitor offers two ways to authenticate requests to Azure Data Lake Gen 1:

By providing client credentials that include:
1. Token Endpoint
2. Application ID
3. Secret Key
Via a managed service identity. This method applies when the Model Monitor is deployed on Azure VMs that have a managed identity with access to Azure Data Lake Gen 1. It accepts:
1. An optional port number
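The three client-credential values are the inputs to a standard OAuth 2.0 client credentials grant against Azure Active Directory (Microsoft Entra ID). To show how they fit together, the following sketch builds, but does not send, such a token request. It assumes the Azure AD v1 token endpoint form `https://login.microsoftonline.com/<tenant-id>/oauth2/token` and the Data Lake Storage Gen1 resource identifier `https://datalake.azure.net/`; the helper name is hypothetical, and you do not need to run this yourself to configure the data source.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Resource identifier for Azure Data Lake Storage Gen1 (Azure AD v1 endpoint).
ADLS_GEN1_RESOURCE = "https://datalake.azure.net/"


def build_token_request(token_endpoint: str, application_id: str, secret_key: str) -> Request:
    """Build, but do not send, a client-credentials token request.

    Hypothetical helper illustrating how Token Endpoint, Application ID and
    Secret Key are combined in an OAuth 2.0 client credentials grant.
    """
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": application_id,
        "client_secret": secret_key,
        "resource": ADLS_GEN1_RESOURCE,
    }).encode("ascii")
    return Request(
        token_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholders: substitute your tenant ID, application ID and secret.
    req = build_token_request(
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
        "<application-id>",
        "<secret-key>",
    )
    print(req.get_method(), req.full_url)
    print(parse_qs(req.data.decode("ascii")))
```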

Azure Data Lake Gen 2

Required configurations:

Container
Account Name

The Model Monitor offers five ways to authenticate to Azure Data Lake Gen 2:

By providing the shared key
By providing client credentials that include:
1. Token Endpoint
2. Application ID
3. Client Secret
By providing user credentials that include:
1. Token Endpoint
2. Username
3. Password
By providing refresh-token credentials that include:
1. Refresh token
2. Optional client ID
Via a managed service identity. This method applies when the Model Monitor is deployed on Azure VMs that have a managed identity with access to Azure Data Lake Gen 2. It accepts:
1. An optional tenant ID
2. An optional client ID
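These five methods line up with the authentication options of the Apache Hadoop ABFS connector (`hadoop-azure`), which can help if you are translating an existing Spark or Hadoop setup into the Model Monitor form. The sketch below is only an illustration of that correspondence, not a description of how the Model Monitor stores its settings: it maps each method to the ABFS properties that carry the same fields. The method labels, field names, and helper are hypothetical; the property names follow the hadoop-azure ABFS documentation.

```python
from typing import Dict

# Token-provider classes shipped with the hadoop-azure ABFS connector.
_PROVIDER = "org.apache.hadoop.fs.azurebfs.oauth2."
_P = "fs.azure.account.oauth2."

# For each OAuth method: provider class, required fields, optional fields.
# Field names (dict keys) are hypothetical labels for the form fields.
_OAUTH_METHODS = {
    "client_credentials": (
        "ClientCredsTokenProvider",
        {"token_endpoint": _P + "client.endpoint",
         "application_id": _P + "client.id",
         "client_secret": _P + "client.secret"},
        {},
    ),
    "user_password": (
        "UserPasswordTokenProvider",
        {"token_endpoint": _P + "client.endpoint",
         "username": _P + "user.name",
         "password": _P + "user.password"},
        {},
    ),
    "refresh_token": (
        "RefreshTokenBasedTokenProvider",
        {"refresh_token": _P + "refresh.token"},
        {"client_id": _P + "client.id"},
    ),
    "msi": (
        "MsiTokenProvider",
        {},
        {"tenant_id": _P + "msi.tenant", "client_id": _P + "client.id"},
    ),
}


def abfs_properties(method: str, account: str, **fields: str) -> Dict[str, str]:
    """Translate one authentication method into ABFS connector properties.

    Raises KeyError if a required field is missing and ValueError for an
    unknown method. Optional fields are included only when supplied.
    OAuth properties are shown in their account-agnostic form; ABFS also
    accepts them with a ".<account>.dfs.core.windows.net" suffix.
    """
    if method == "shared_key":
        return {
            "fs.azure.account.auth.type": "SharedKey",
            f"fs.azure.account.key.{account}.dfs.core.windows.net": fields["shared_key"],
        }
    if method not in _OAUTH_METHODS:
        raise ValueError(f"unknown authentication method: {method!r}")

    provider, required, optional = _OAUTH_METHODS[method]
    props = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": _PROVIDER + provider,
    }
    for name, key in required.items():
        props[key] = fields[name]  # KeyError if a required field is missing
    for name, key in optional.items():
        if name in fields:
            props[key] = fields[name]
    return props
```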

Google Cloud Storage

Required configurations:

Bucket name
JSON key file
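The JSON key file is a Google Cloud service account key. As a minimal sketch, assuming you want to confirm you picked the right file before uploading it, the following checks that the parsed JSON has the fields a service account key normally contains and that its `type` is `service_account`. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Fields found in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def service_account_key_problems(data: Dict[str, Any]) -> List[str]:
    """Return reasons the parsed JSON does not look like a service account key."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    # A missing "type" is already reported above, so default to the expected value.
    if data.get("type", "service_account") != "service_account":
        problems.append(f"unexpected key type: {data['type']!r}")
    return problems


def load_service_account_key(path: str) -> Dict[str, Any]:
    """Read a JSON key file, raising ValueError if it fails the checks above."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = service_account_key_problems(data)
    if problems:
        raise ValueError("; ".join(problems))
    return data
```

If the checks pass, the service account named in `client_email` still needs read and list access on the bucket, for example through the Storage Object Viewer role (`roles/storage.objectViewer`).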

HDFS

Required configurations:

Host name

Optional configurations:

Port number
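The host name and optional port identify the HDFS NameNode, in the same form HDFS clients use in an `hdfs://` URI. A minimal sketch, assuming the two fields correspond to the host and port of that URI; the helper is hypothetical and rejects IPv6 literals for simplicity.

```python
from typing import Optional


def hdfs_uri(host: str, port: Optional[int] = None) -> str:
    """Build an hdfs:// URI from a NameNode host name and optional RPC port."""
    if not host or "/" in host or ":" in host:
        raise ValueError(f"expected a bare host name, got {host!r}")
    if port is None:
        # Without a port, HDFS clients use the default NameNode RPC port
        # (8020 in Apache Hadoop).
        return f"hdfs://{host}"
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"hdfs://{host}:{port}"
```

Leaving the port empty is usually right when the cluster uses the default NameNode port; set it only when your cluster overrides that default.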