Model monitoring ingests and processes a model’s training, prediction, and ground truth data in order to monitor the model. For this to work, you need to set up a Data Source from which this data can be read. Data sources set up in your Model Monitor deployment are accessible to all users and can be used to feed data to any model.
The Model Monitor supports reading data from the following Data Sources:
- Amazon S3
- Azure Blob
- Azure Data Lake Gen 1
- Azure Data Lake Gen 2
- Google Cloud Storage
- HDFS
The Data Sources section in the Model Monitor enables you to add and configure a new data source. You can have multiple data sources linked to the Model Monitor, and for each data source type you can configure multiple instances as data sources. For each data source you’re connecting, make sure to enable read and list access.
To add a new data source in the Model Monitor, go to the Data Sources section and click Add Data Source. Select the type of data source you want to register, provide a name for the data source (this is the name that you will use when registering datasets for models), and fill in the details relevant to the type you chose.
If a registered data source is misconfigured or has access-related issues, the corresponding errors are shown on the Ingest Dashboard of the registered model.
Amazon S3
Required configurations:
- Bucket
- Region
Model monitoring offers two ways to authenticate to your S3 bucket:
- If the S3 buckets can be authenticated to using IAM roles, you can enable the checkbox that accepts credentials from the Attached IAM role.
- Alternatively, you can provide an Access Key and a Secret Key.
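For S3, read and list access is usually expressed as two IAM actions: `s3:GetObject` on the objects and `s3:ListBucket` on the bucket itself. The following is a minimal sketch, assuming these two actions are what the Model Monitor needs; the helper name `build_read_list_policy` and the bucket name `example-monitoring-data` are placeholders for illustration, not part of the product.

```python
import json


def build_read_list_policy(bucket: str) -> dict:
    """Build an IAM policy document with list access on a bucket and read access on its objects.

    Hypothetical helper for illustration; not part of the Model Monitor.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # s3:GetObject is an object-level action, so it targets the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-monitoring-data" is a placeholder bucket name.
    print(json.dumps(build_read_list_policy("example-monitoring-data"), indent=2))
```

The resulting document can be attached either to the IAM role used by the deployment or to the user that owns the Access Key and Secret Key, depending on which authentication option you choose. Your organization may require a narrower resource prefix or additional conditions.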
Azure Data Lake Gen 1
Required configurations:
- Container
Model monitoring offers users two ways to authenticate requests to Azure Data Lake Gen 1:
- By providing client credentials that include:
  - Token Endpoint
  - Application ID
  - Secret Key
- Via a managed service identity. This method applies when the Model Monitor has been deployed on Azure VMs configured with service identities that can access Azure Data Lake. This includes:
  - An optional port number
Azure Data Lake Gen 2
Required configurations:
- Container
- Account Name
Model monitoring offers users five ways to authenticate to Azure Data Lake Gen 2:
- By providing the shared key
- By providing client credentials that include:
  - Token Endpoint
  - Application ID
  - Client Secret
- By providing username and password credentials that include:
  - Token Endpoint
  - Username
  - Password
- By providing a refresh token. This includes:
  - Refresh token
  - Optional client ID
- Via a managed service identity. This method applies when the Model Monitor has been deployed on Azure VMs configured with service identities that can access Azure Data Lake Gen 2. This includes:
  - An optional tenant ID
  - An optional client ID
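Because each Azure Data Lake Gen 2 authentication method expects a different set of fields, it can help to check a configuration before entering it in the Data Sources section. The sketch below encodes the required and optional fields listed above as a lookup table. All key names (`container`, `token_endpoint`, and so on) and both function names are hypothetical and chosen for illustration; they are not identifiers used by the Model Monitor.

```python
# Required configurations for every Azure Data Lake Gen 2 data source.
# Key names are hypothetical stand-ins for the UI fields Container and Account Name.
REQUIRED_CONFIG = ("container", "account_name")

# Fields per authentication method, following the list in this section.
AUTH_METHODS = {
    "shared_key": {"required": ("shared_key",), "optional": ()},
    "client_credentials": {
        "required": ("token_endpoint", "application_id", "client_secret"),
        "optional": (),
    },
    "username_password": {
        "required": ("token_endpoint", "username", "password"),
        "optional": (),
    },
    "refresh_token": {"required": ("refresh_token",), "optional": ("client_id",)},
    "managed_service_identity": {
        "required": (),
        "optional": ("tenant_id", "client_id"),
    },
}


def missing_fields(auth_method: str, values: dict) -> list:
    """Return required field names that are absent or empty in `values`."""
    if auth_method not in AUTH_METHODS:
        raise ValueError(f"Unknown authentication method: {auth_method}")
    required = REQUIRED_CONFIG + AUTH_METHODS[auth_method]["required"]
    return [name for name in required if not values.get(name)]


def unexpected_fields(auth_method: str, values: dict) -> list:
    """Return field names that the chosen authentication method does not use."""
    if auth_method not in AUTH_METHODS:
        raise ValueError(f"Unknown authentication method: {auth_method}")
    spec = AUTH_METHODS[auth_method]
    allowed = set(REQUIRED_CONFIG) | set(spec["required"]) | set(spec["optional"])
    return sorted(set(values) - allowed)


if __name__ == "__main__":
    # Placeholder values; a real configuration would use your own account details.
    config = {"container": "monitoring", "account_name": "exampleaccount"}
    print(missing_fields("client_credentials", config))
    print(unexpected_fields("shared_key", {**config, "username": "someone"}))
```

A check like this only catches incomplete input. Problems with the credentials themselves still surface on the Ingest Dashboard of the registered model.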