Create TrainingSets

A TrainingSet is a versioned set of data, column information, and other metadata. A TrainingSet is created implicitly when the first TrainingSetVersion with a particular training_set_name is added using the create_training_set_version function.

A TrainingSet can include only versions from the same project. Attempting to add a version from a different project can result in an error.

TrainingSet names are strings that contain only alphanumeric characters from the basic Latin alphabet, plus dash and underscore: [-A-Za-z0-9_]
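A name can be checked locally before it is passed to create_training_set_version. The following is a minimal sketch, not part of the Domino API: the helper name is_valid_training_set_name is hypothetical, and the pattern assumes that digits are allowed (as "alphanumeric" implies) and that an empty name is invalid.

```python
import re

# Assumed pattern: basic Latin letters, digits, dash, and underscore,
# taken from the prose description of valid names. At least one character.
_NAME_PATTERN = re.compile(r"[-A-Za-z0-9_]+")


def is_valid_training_set_name(name: str) -> bool:
    """Return True if name contains only the characters described above."""
    # fullmatch requires the whole string to match, not just a prefix.
    return _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_training_set_name("fraud-model_v2"))  # True
print(is_valid_training_set_name("fraud model"))     # False (contains a space)
```

A check like this only catches obviously malformed names early; the server remains the authority on what it accepts.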

from domino.training_sets import TrainingSetClient, model

# training_set_name (str) and my_pandas_dataframe (pandas DataFrame)
# are assumed to be defined earlier in your code.
training_set_version = TrainingSetClient.create_training_set_version(
    training_set_name=training_set_name,
    df=my_pandas_dataframe,
    key_columns=["user_id", "transaction_id"],
    target_columns=["is_fraud"],
    exclude_columns=["extra_column1", "extra_column2"],
    # Needed if this TrainingSet will be used for model monitoring (see Note).
    monitoring_meta=model.MonitoringMeta(
        timestamp_columns=["ts"],
        categorical_columns=["categorical_column1", "categorical_column2"],
        ordinal_columns=["ordinal_column1"],
    ),
    # Free-form metadata stored with this version.
    meta={"year": "2021"},
)
Note
To use a TrainingSet for monitoring a classification model, the monitoring_meta keyword argument must have a value. You can create a TrainingSet without this argument, but it cannot be used for monitoring until the argument has a value. If you try to register a Model API for monitoring from that Model API's Monitoring tab while monitoring_meta is null, the following error is displayed: The selected Feature Set Version cannot currently be used for monitoring because it does not contain a schema definition.
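One way to catch a missing monitoring_meta before the version is created is to collect the keyword arguments in a dict and check them first. This is a minimal sketch under that assumption: require_monitoring_meta is a hypothetical helper, not a Domino function, and the placeholder value stands in for a real model.MonitoringMeta instance.

```python
from typing import Any, Mapping


def require_monitoring_meta(kwargs: Mapping[str, Any]) -> None:
    """Raise ValueError if monitoring_meta is absent or None.

    Hypothetical pre-flight check; it does not call any Domino API.
    """
    if kwargs.get("monitoring_meta") is None:
        raise ValueError(
            "monitoring_meta must have a value before this TrainingSet "
            "can be used for monitoring"
        )


# Placeholder arguments; in real code monitoring_meta would be a
# model.MonitoringMeta instance rather than a plain dict.
version_kwargs = {
    "training_set_name": "fraud-model",
    "monitoring_meta": {"timestamp_columns": ["ts"]},
}
require_monitoring_meta(version_kwargs)  # passes silently

try:
    require_monitoring_meta({"training_set_name": "fraud-model"})
except ValueError as exc:
    print(f"Rejected: {exc}")
```

After the check passes, the same dict could be unpacked into the call with **version_kwargs.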