A TrainingSet is a versioned set of data, column information, and other metadata. A TrainingSet is created implicitly when the first TrainingSetVersion with a particular training_set_name is added using the create_training_set_version function.
A TrainingSet can include only versions from the same project. Attempting to add a version from a different project can result in an error.
TrainingSet names are strings containing only alphanumeric characters in the basic Latin alphabet, plus dash and underscore: [-A-Za-z0-9_]
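The naming rule can be checked locally before calling the API. The following is a minimal sketch, not part of the domino package: the helper name is_valid_training_set_name is hypothetical, and it assumes that "alphanumeric" includes the digits 0-9 and that an empty name is invalid.

```python
import re

# Assumption: "alphanumeric" covers A-Z, a-z, and 0-9, plus dash and underscore.
# The "+" quantifier also rejects empty names, which is an assumption as well.
_NAME_PATTERN = re.compile(r"[-A-Za-z0-9_]+")


def is_valid_training_set_name(name: str) -> bool:
    """Hypothetical helper: return True if the whole name matches the allowed characters."""
    return bool(_NAME_PATTERN.fullmatch(name))


print(is_valid_training_set_name("fraud-model_v2"))  # True
print(is_valid_training_set_name("fraud model"))     # False, contains a space
```

Using fullmatch rather than match ensures that every character in the name is checked, not only a leading prefix.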
from domino.training_sets import TrainingSetClient, model

training_set_version = TrainingSetClient.create_training_set_version(
    training_set_name=training_set_name,
    df=my_pandas_dataframe,
    key_columns=["user_id", "transaction_id"],
    target_columns=["is_fraud"],
    exclude_columns=["extra_column1", "extra_column2"],
    monitoring_meta=model.MonitoringMeta(
        timestamp_columns=["ts"],
        categorical_columns=["categorical_column1", "categorical_column2"],
        ordinal_columns=["ordinal_column1"],
    ),
    meta={"year": "2021"},
)
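Because the call above refers to columns by name in several arguments, it can help to confirm that each name exists in the data before creating a version. The sketch below is an optional pre-flight check under stated assumptions: find_missing_columns is a hypothetical helper, not a domino function, and whether the library performs a similar check itself is not stated here.

```python
def find_missing_columns(available, **column_groups):
    """Hypothetical helper: map each argument name to the columns it lists
    that are not present in `available`. Groups with nothing missing are omitted."""
    available_set = set(available)
    missing = {}
    for group, names in column_groups.items():
        absent = [name for name in names if name not in available_set]
        if absent:
            missing[group] = absent
    return missing


# Illustrative column names; with a pandas DataFrame, pass list(df.columns).
columns = ["user_id", "transaction_id", "is_fraud", "ts", "amount"]
problems = find_missing_columns(
    columns,
    key_columns=["user_id", "transaction_id"],
    target_columns=["is_fraud"],
    exclude_columns=["extra_column1"],
)
print(problems)  # {'exclude_columns': ['extra_column1']}
```

An empty result means every listed column name was found.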
Note
To use a TrainingSet for model monitoring, the monitoring_meta keyword argument must have a value for classification models. You can create a TrainingSet without this argument, but it cannot be used for monitoring until the argument has a value. If you try to register a Domino endpoint for monitoring from that endpoint's Grafana Monitoring tab while monitoring_meta is null, the following error is displayed: The selected Feature Set Version cannot currently be used for monitoring because it does not contain a schema definition.