Reference Projects

Domino Data Lab provides a collection of open-source solutions called Domino Reference Projects. These projects are freely available to the data science and machine learning community, and were built with the following goals:

  • To educate practitioners about a specific data science topic.

  • To accomplish a specific analytical method or task in the Domino MLOps Platform, including relevant best practices.

  • To provide an easy way to share pre-built assets, where possible, such as Launchers, Scheduled Jobs, Apps, Endpoints, and so on.

  • To facilitate onboarding of new team members by providing end-to-end implementations that they can use to get experience with the platform.

All the projects follow a common pattern: a use case developed in Python or R. The datasets the projects use are freely available collections of data that are either bundled with the reference project or available for download from an external source.

Typically, the projects contain a Jupyter notebook, which provides background and context for the use case. Most of the projects also include the relevant scripts for operationalization (such as model retraining job scripts, Model API scripts, and web applications). The projects and all accompanying assets are available on GitHub.

The following table lists the reference projects that are currently available.

Project Name | Brief description | GitHub Link
Credit Card Fraud Detection | Uses XGBoost to detect credit card transaction fraud | https://github.com/dominodatalab/domino-reference-project-fraud-detection
Named Entity Recognition | Locates and classifies named entities with a BiLSTM-CRF model | https://github.com/dominodatalab/domino-reference-project-ner

The GitHub repositories include instructions about how to use the project assets and how to create a dedicated compute environment, if needed.

Import the relevant GitHub repository to bring the project assets into your Domino installation, or use Git-based projects.

Credit fraud detection reference project

Credit card fraud represents a significant problem for financial institutions, and reliable fraud detection is generally challenging. You can use this project as a template to facilitate training a machine learning model on a real-world credit card fraud dataset. It employs techniques like oversampling and threshold moving to address class imbalance.

The dataset used in this project was collected as part of a research collaboration between Worldline and the Machine Learning Group of Université Libre de Bruxelles. You can download the raw data from Kaggle.

The following assets are included in the project:

  • FraudDetection.ipynb - A notebook that performs exploratory data analysis, data wrangling, hyperparameter optimization, model training, and evaluation. The notebook introduces the use cases and describes the key techniques needed to implement a classification model (such as oversampling and threshold moving).

  • model_train.py - A training script that can be operationalized to retrain the model on demand or on a schedule. You can use the script as a template. The key elements that must be customized for other datasets are:

    • load_data - data ingestion function

    • feature_eng - data wrangling

    • xgboost_search - more specifically, the values in params that define the grid search scope

  • model_api.py - A scoring function that exposes the persisted model as a Model API. The score function accepts all independent parameters of the dataset as arguments and uses the model to compute the fraud probability for an individual transaction.
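The actual script ships with the project; purely as an illustration of that scoring contract, here is a hypothetical sketch that trains a tiny scikit-learn classifier as a stand-in for the persisted XGBoost model (the feature names and data are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in model trained on invented data; the real project loads a
# persisted XGBoost model at startup instead.
X = np.array([[10.0, 0.1], [2000.0, 0.9], [15.0, 0.2], [5000.0, 0.8]])
y = np.array([0, 1, 0, 1])  # 1 = fraudulent transaction
model = LogisticRegression().fit(X, y)

def score(amount, v1):
    """One argument per independent feature; returns the fraud probability."""
    proba = model.predict_proba([[amount, v1]])[0][1]
    return {"fraud_probability": float(proba)}
```

Domino calls the score function once per request, so the expensive step (loading the persisted model) belongs at module level, not inside score.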

Note

This project uses the imblearn and xgboost Python packages, which are not included in the Domino Standard Environments. You can either customize a copy of the Domino Standard Environment or create a new environment using the Dockerfile instructions in the README.md file of the project.
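As an illustration of what the grid-search scope in xgboost_search controls, a hypothetical params dictionary (these values are invented, not the project's) expands into one candidate configuration per combination:

```python
from itertools import product

# Invented hyperparameter grid; the project's actual values differ.
params = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 200],
}

# A grid search trains and evaluates one model per combination in the grid.
grid = [dict(zip(params, combo)) for combo in product(*params.values())]
# 3 * 2 * 2 = 12 candidate configurations
```

Widening any list in params multiplies the number of training runs, so the grid's size is the main lever on retraining cost.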

Named entity recognition

Named Entity Recognition (NER) is a Natural Language Processing (NLP) problem that involves locating and classifying named entities (people, places, organizations, and so on) mentioned in unstructured text. NER is used in many NLP applications, including machine translation, information retrieval, and chatbots. In this project, we fit a BiLSTM-CRF model using Keras and a freely available annotated corpus.

This project uses the Annotated Corpus for Named Entity Recognition dataset. This dataset is based on the GMB (Groningen Meaning Bank) corpus and has been tagged, annotated, and built specifically to train a classifier to predict named entities such as names and locations.

The assets included in the project are:

  • ner.ipynb - A notebook that performs exploratory data analysis, data wrangling, hyperparameter optimization, model training, and evaluation. The notebook introduces the use cases and describes the key techniques needed to implement an NER classification model.

  • model_train.py - A training script that can be operationalized to retrain the model on demand or on a schedule. You can use the script as a template. The key elements that must be customized for other datasets are:

    • load_data - data ingestion function

    • pre_process - data wrangling

      Most of the important parameters are controlled through command-line arguments to the script.

  • model_api.py - A scoring function that exposes the persisted model as a Model API. The score function accepts a string of plain text and outputs the tokenized version of the text with the corresponding IOB tags.
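As a toy illustration of that input/output contract (plain text in, token and IOB-tag pairs out), the sketch below substitutes a dictionary lookup for the trained BiLSTM-CRF model; the entities and tags here are invented:

```python
# Dictionary lookup standing in for the trained BiLSTM-CRF model.
KNOWN_ENTITIES = {"paris": "B-geo", "john": "B-per"}

def score(text):
    """Tokenize the input and attach an IOB tag to every token."""
    tokens = text.split()
    tags = [KNOWN_ENTITIES.get(t.lower().strip(".,"), "O") for t in tokens]
    return {"tokens": tokens, "tags": tags}
```

In IOB tagging, B- marks the beginning of an entity, I- a continuation, and O a token outside any entity.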

This project uses the plot-keras-history and keras-contrib Python packages, which are not included in the Domino Standard Environments. You can either customize a copy of the Domino Standard Environment or create a new environment with the Dockerfile instructions in the README.md file of the project.

Copyright © 2022 Domino Data Lab. All rights reserved.