Auto Model Documentation

Automatically generate professional ML model documentation from code and Experiment Manager artifacts.

Overview

Auto Model Documentation is an LLM-powered tool that automatically generates professional documentation for machine learning models. It analyzes your ML codebase, extracts information from MLflow artifacts, and produces comprehensive Word documents and Jupyter notebooks describing your models, features, training pipelines, and performance metrics.

[Figure: Auto-generate model documentation]

Key Capabilities

| Feature | Description |
| --- | --- |
| Code Analysis | Scans Python source files to understand model architecture, feature engineering, and training logic |
| Experiment Manager Integration | Extracts metrics, parameters, and artifacts from registered models and experiments |
| LLM-Powered Generation | Uses Claude or GPT-4 to create natural language documentation from technical artifacts |
| Word Document Output | Generates professional .docx files with tables, charts, and formatted sections |
| Jupyter Notebook Output | Creates editable notebooks for further customization and sharing |

Two Access Methods

Auto Model Documentation provides two ways to generate documentation:

  • Web UI: A browser-based interface for interactive configuration and job monitoring

  • CLI: A command-line tool for scripting and automation

Getting Started

Prerequisites

Before using Auto Model Documentation, ensure you have:

  • Python 3.10 or higher installed

  • Access to an LLM API (Anthropic Claude or OpenAI GPT-4)

  • MLflow tracking server access (if documenting registered models)

  • Your ML codebase accessible from the local filesystem
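
To sanity-check these prerequisites, something like the following works in a POSIX shell (the last check assumes the mlflow package is already installed):

# Check the Python version (3.10+ required)
python --version

# Confirm at least one LLM API key is set in the environment
echo "anthropic:${ANTHROPIC_API_KEY:+set} openai:${OPENAI_API_KEY:+set}"

# Confirm the MLflow tracking URI resolves
python -c "import mlflow; print(mlflow.get_tracking_uri())"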

Installation

Install the package and its dependencies:

# Clone or download the repository
cd auto_model_docs

# Install in development mode
pip install -e .

# Or install dependencies directly
pip install -r requirements.txt

The package requires the following core dependencies:

| Package | Purpose |
| --- | --- |
| `anthropic` | Anthropic Claude API client |
| `openai` | OpenAI GPT API client |
| `mlflow` | MLflow tracking and model registry access |
| `python-docx` | Word document generation |
| `matplotlib` | Chart and visualization generation |
| `pyyaml` | YAML spec file parsing |
| `rich` | Terminal progress display |

Environment Setup

Set up your API keys and MLflow configuration. Create a .env file in the project root or set environment variables:

# LLM API Keys (set one or both)
ANTHROPIC_API_KEY=your-anthropic-api-key
OPENAI_API_KEY=your-openai-api-key

# Optional: Custom OpenAI-compatible endpoint
OPENAI_BASE_URL=https://api.example.com/v1

# MLflow Configuration (optional)
MLFLOW_TRACKING_URI=https://your-mlflow-server.com

Tip: In Domino environments, the MLflow tracking URI is automatically configured. You can also pass API keys directly via the CLI or Web UI.

Web UI Guide

The Web UI provides an intuitive interface for configuring and running documentation generation jobs.

Accessing the Web Interface

Start the web application:

python web_app.py

The application launches on http://localhost:8000 by default. In Domino, the app is accessible through its App deployment URL.

Configuration Panel

The configuration panel contains the essential settings for document generation.

| Field | Description |
| --- | --- |
| Spec File | Path to the YAML document specification file. Defaults to `doc_spec.yaml` in the project root. |
| Upload Spec | Alternatively, upload a spec file directly from your computer. |
| Code Root | Root directory of the ML codebase to analyze. Defaults to `/mnt/code` in Domino or `.` locally. |
| Output Directory | Where to save generated documents. Defaults to `/mnt/data/{project_name}` in Domino or `./output` locally. |

Options Panel

Configure the LLM provider and model settings.

| Field | Description |
| --- | --- |
| Provider | Select `anthropic` (Claude) or `openai` (GPT-4). Default: `openai` |
| API Key | Your API key for the selected provider. Can also be set via environment variable. |
| Model | Model name override. If blank, uses `claude-sonnet-4-20250514` for Anthropic or `gpt-4o` for OpenAI. |
| Base URL | Custom API endpoint for OpenAI-compatible services (e.g., Azure OpenAI, local models). |

Advanced Options

Fine-tune the generation process with advanced settings.

| Field | Description | Default |
| --- | --- | --- |
| Generation Workers | Number of parallel workers for content generation. Higher values speed up generation but increase API costs. | 4 |
| Planning Workers | Number of parallel workers for section planning. | 4 |
| Max Files | Maximum number of source files to scan from the codebase. | 50 |
| Timeout | Timeout for individual LLM API calls in seconds. | 120 |
| Generate Notebook | Also create an editable Jupyter notebook alongside the Word document. | Off |
| Notebook Path | Custom path for the generated notebook. | `<output>/model_docs_notebook.ipynb` |
| Verbose | Enable detailed logging to show progress of each pipeline step. | Off |

Artifact Filtering

Control which MLflow models and experiments are included in the documentation.

| Field | Description |
| --- | --- |
| Model Names | Comma-separated list of model names to include. Supports wildcards: `*` (any characters) and `?` (single character). Example: `churn*,fraud_detector` |
| Experiment Names | Comma-separated list of experiment names to include. Supports wildcards. Example: `customer_churn*,fraud_detection` |
| Latest Only | When checked, only includes the latest version of each model, ignoring older versions. |

Tip: Use wildcards to match multiple models. For example, `prod_*` matches all models starting with "prod_".

Running a Job

  1. Configure your settings in the form panels

  2. Click Generate Documentation to start the job

  3. Monitor progress in the Logs panel

[Figure: Running a job to generate model documentation]

The progress display shows four phases:

  • Scanning: Analyzing code and extracting MLflow artifacts

  • Planning: LLM planning content blocks for each section

  • Generating: Creating narratives, tables, and charts

  • Building: Assembling the final Word document

Once complete, download links appear for the generated files:

[Figure: Download documentation artifacts]

Tip: Click Stop to cancel a running job. The application will clean up any partial artifacts.

CLI Usage

The command-line interface provides full control over documentation generation for automation and scripting.

Basic Command Structure

python main.py --spec <path-to-spec.yaml> [options]

The --spec argument is required and specifies the YAML document specification file.

Key Arguments

| Argument | Short | Description |
| --- | --- | --- |
| `--spec` | `-s` | Required. Path to YAML document specification file. |
| `--output` | `-o` | Output directory for generated documents. |
| `--code-root` | `-c` | Root directory of codebase to analyze. |
| `--provider` | `-p` | LLM provider: `anthropic` or `openai`. Default: `openai` |
| `--model` | `-m` | Model name override (uses provider default if not set). |
| `--verbose` | `-v` | Enable verbose output with detailed progress. |
| `--notebook` | | Also generate an editable Jupyter notebook. |
| `--notebook-path` | | Custom path for the generated notebook. |
| `--notebook-from-cache` | | Regenerate notebook from cached results (skips the full pipeline). |

Filtering Options

Filter which MLflow models and experiments are documented:

| Argument | Description |
| --- | --- |
| `--experiments` | Comma-separated list of experiment names/patterns. Supports wildcards. |
| `--models` | Comma-separated list of model names/patterns. Supports wildcards. |
| `--latest-only` | Only include the latest version of each model. |
| `--disable-project-filtering` | Disable automatic Domino project filtering (scan all projects). |

Advanced Options

| Argument | Description | Default |
| --- | --- | --- |
| `--generation-workers`, `-w` | Number of parallel content generation workers. | 4 |
| `--planning-workers` | Number of parallel section planning workers. | 4 |
| `--max-files` | Maximum number of source files to scan. | 50 |
| `--timeout` | Timeout for individual LLM API calls (seconds). | 120 |
| `--max-retries` | Maximum retries for failed LLM requests. | 3 |
| `--initial-backoff` | Initial backoff delay for retries (seconds). | 3.0 |
| `--max-backoff` | Maximum backoff delay (seconds). | 30.0 |
| `--backoff-jitter` | Random jitter factor for backoff (0.0-1.0). | 0.2 |
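
For example, a run that needs to tolerate heavy rate limiting might raise the retry ceiling and widen the backoff window (values here are illustrative):

python main.py --spec doc_spec.yaml \
    --max-retries 5 \
    --initial-backoff 5.0 \
    --max-backoff 60.0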

Example Commands

Basic generation with Anthropic Claude:

python main.py --spec doc_spec.yaml --provider anthropic

Generate with verbose output and notebook:

python main.py --spec doc_spec.yaml \
    --provider openai \
    --notebook \
    --verbose

Filter to specific models and experiments:

python main.py --spec doc_spec.yaml \
    --provider anthropic \
    --models "churn_model*,fraud_detector" \
    --experiments "production_*" \
    --latest-only

Custom paths and parallel workers:

python main.py --spec custom_spec.yaml \
    --code-root /path/to/ml/project \
    --output /path/to/output \
    --generation-workers 8 \
    --planning-workers 4 \
    --max-files 100

Regenerate notebook from cached results:

python main.py --spec doc_spec.yaml \
    --notebook-from-cache \
    --notebook-path ./updated_notebook.ipynb

Document Specification (YAML)

The document specification file defines the structure and content of the generated documentation.

Spec File Structure

A spec file contains three main components:

# Document metadata
title: "Machine Learning Model Documentation"
authors: "Data Science Team"

# Sections to include
sections:
  - Executive Summary
  - Data Overview
  - Model Architecture
  - Model Performance
  - Conclusion

# Optional hints for content guidance
hints:
  "Executive Summary": >
    Focus on business impact and key metrics.

Section Configuration

The sections list defines which sections appear in the document and their order:

sections:
  - Executive Summary
  - Data Overview
  - Feature Engineering
  - Model Architecture
  - Training Pipeline
  - Model Performance
  - Validation Results
  - Deployment Considerations
  - Conclusion
  - Appendix

Each section name becomes a heading in the generated document. The LLM generates appropriate content based on the section name and any hints provided.

Per-Model Sections

For multi-model documentation, use the per_model modifier to create separate subsections for each registered model:

sections:
  - Executive Summary
  - Data Overview
  - "Model Performance: per_model"  # Creates subsection for each model
  - Conclusion

This generates sections like:

  • 5. Model Performance

  • 5.1 Model Performance: churn_predictor

  • 5.2 Model Performance: fraud_detector

Hints for Guiding Content

Use hints to guide the LLM on what content to emphasize for each section:

hints:
  "Executive Summary": >
    Focus on business impact, key metrics, and high-level model capabilities.
    Keep it suitable for non-technical stakeholders.

  "Data Overview": >
    Describe data sources, volume, quality considerations, and any data
    preprocessing steps applied before feature engineering.

  "Model Performance": >
    Present key metrics (accuracy, AUC, F1, etc.), confusion matrix
    insights, and performance analysis across different segments.

  "Validation Results": >
    Document the validation approach based on what is found in the code.
    Do not describe validation methods that were not actually implemented.

  "Conclusion": >
    Summarize key model capabilities and performance highlights.
    Do not introduce new technical details - synthesize what was covered.

Tip: Be specific in hints about what to include or exclude. The LLM will follow these guidelines when generating content.

Example Spec File

Complete example specification:

# Document Specification for Auto Model Documentation
title: "Machine Learning Model Documentation"
authors: "Data Science Team"

sections:
  - Executive Summary
  - Data Overview
  - Feature Engineering
  - Model Architecture
  - Training Pipeline
  - "Model Performance: per_model"
  - Validation Results
  - Deployment Considerations
  - Conclusion
  - Appendix

hints:
  "Executive Summary": >
    Focus on business impact, key metrics, and high-level model capabilities.
    Keep it suitable for non-technical stakeholders.

  "Data Overview": >
    Describe data sources, volume, quality considerations, and any data
    preprocessing steps applied before feature engineering.

  "Feature Engineering": >
    Document feature transformations, encoding methods, scaling approaches,
    and feature selection rationale.

  "Model Architecture": >
    Include model type, hyperparameters, training configuration,
    and architectural decisions.

  "Model Performance": >
    Present key metrics (accuracy, AUC, F1, etc.), confusion matrix
    insights, and performance analysis across different segments.

  "Validation Results": >
    Document the validation approach used based on what is found in the code.
    If cross-validation was implemented, cover those results.
    If only holdout/test set validation was used, document that approach.

  "Deployment Considerations": >
    Address inference latency, resource requirements, monitoring strategy,
    and model refresh cadence.

  "Conclusion": >
    Summarize the key model capabilities and performance highlights.
    Describe recommended monitoring approach based on the model type.

Pipeline Phases

Auto Model Documentation executes a 4-phase pipeline to generate documentation.

[Figure: Pipeline phases to generate documentation]

Scanning Phase

The scanning phase extracts information from two sources in parallel:

Code Scanner:

  • Scans Python files in the code root directory

  • Extracts function definitions, class structures, and docstrings

  • Identifies ML-related patterns (model training, feature engineering)

  • Respects max_files limit to control scope

Artifact Scanner:

  • Connects to MLflow tracking server

  • Queries registered models and experiment runs

  • Extracts metrics, parameters, tags, and artifact metadata

  • Applies model/experiment filters if specified

Planning Phase

The planning phase uses the LLM to create a content plan for each section:

  • Analyzes the scanned context (code + artifacts)

  • Determines what content blocks are needed for each section

  • Plans narratives, tables, charts, and other elements

  • Considers hints from the spec file

  • Runs in parallel with configurable worker count

Generation Phase

The generation phase creates actual content for each planned block:

  • Narratives: Natural language descriptions and explanations

  • Tables: Structured data like metrics, parameters, feature lists

  • Charts: Visualizations of metrics, performance trends, distributions

  • Processes blocks in parallel for efficiency

Building Phase

The building phase assembles the final outputs:

  • Creates Word document with proper formatting and styles

  • Inserts tables, charts, and images

  • Generates Jupyter notebook if requested

  • Saves cache file for future notebook regeneration

Output Formats

Word Document (.docx)

The primary output is a professionally formatted Word document:

| Feature | Description |
| --- | --- |
| Structured Sections | Hierarchical headings matching the spec file structure |
| Formatted Tables | Metrics, parameters, and data presented in clean tables |
| Embedded Charts | Matplotlib visualizations inserted as images |
| Consistent Styling | Professional typography and layout |
| Editable | Open in Microsoft Word or Google Docs for further editing |

Default location: <output_dir>/model_documentation.docx

Jupyter Notebook (.ipynb)

When --notebook is enabled, generates an editable notebook:

| Feature | Description |
| --- | --- |
| Markdown Cells | Section content as editable markdown |
| Code Cells | Executable code for reproducing charts and tables |
| Interactive | Run and modify in Jupyter Lab or VS Code |
| Shareable | Export to HTML, PDF, or other formats |

Default location: <output_dir>/model_docs_notebook.ipynb

Cache File

A JSON cache file stores generation results:

| Feature | Description |
| --- | --- |
| Fast Regeneration | Rebuild the notebook without re-running the full pipeline |
| Preserves Content | All generated text, tables, and chart data |
| Use with `--notebook-from-cache` | Skip the scanning, planning, and generation phases |

Location: <output_dir>/.autodoc_cache.json
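
After a run with --notebook enabled, the output directory should therefore contain something like the following (paths reflect the defaults above):

output/
├── model_documentation.docx      # Word document
├── model_docs_notebook.ipynb     # Jupyter notebook
└── .autodoc_cache.json           # Generation cache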

Configuration Reference

Environment Variables

All settings can be configured via environment variables with or without the AUTODOC_ prefix:

| Variable | Description | Default |
| --- | --- | --- |
| `ANTHROPIC_API_KEY` | Anthropic Claude API key | - |
| `OPENAI_API_KEY` | OpenAI GPT API key | - |
| `OPENAI_BASE_URL` | Custom OpenAI-compatible endpoint | - |
| `MLFLOW_TRACKING_URI` | MLflow tracking server URI | - |
| `CODE_ROOT` | Root directory of codebase | `/mnt/code` |
| `OUTPUT_DIR` | Output directory | `/mnt/data/{project}` or `./output` |
| `MAX_FILES` | Maximum files to scan | 50 |
| `PARALLEL_WORKERS` | Content generation workers | 1 |
| `PLANNING_WORKERS` | Section planning workers | 1 |
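
As an illustration, a minimal .env for a local (non-Domino) run might look like:

# Illustrative values only
OPENAI_API_KEY=your-openai-api-key
MLFLOW_TRACKING_URI=https://your-mlflow-server.com
CODE_ROOT=./my_ml_project
OUTPUT_DIR=./output
MAX_FILES=50
PARALLEL_WORKERS=4
PLANNING_WORKERS=4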

LLM Provider Settings

| Variable | Description | Default |
| --- | --- | --- |
| `LLM_PROVIDER` | Provider: `anthropic` or `openai` | `anthropic` |
| `LLM_MODEL` | Model name override | Provider default |
| `LLM_MAX_RETRIES` | Max retries for requests | 3 |
| `LLM_INITIAL_BACKOFF` | Initial retry backoff (seconds) | 3.0 |
| `LLM_MAX_BACKOFF` | Maximum retry backoff (seconds) | 30.0 |
| `LLM_BACKOFF_JITTER` | Jitter factor (0.0-1.0) | 0.2 |

Default models by provider:

| Provider | Default Model |
| --- | --- |
| Anthropic | `claude-sonnet-4-20250514` |
| OpenAI | `gpt-4o` |
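
For example, to pin the provider and model explicitly and allow more patient retries (illustrative values):

LLM_PROVIDER=openai
LLM_MODEL=gpt-4o
LLM_MAX_RETRIES=5
LLM_MAX_BACKOFF=60.0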

MLflow Settings

| Variable | Description |
| --- | --- |
| `MLFLOW_TRACKING_URI` | MLflow tracking server URL |
| `MLFLOW_EXPERIMENT_NAME` | Default experiment to query |

In Domino environments, MLflow is automatically configured. Use filtering options to control which models are documented.
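
Outside Domino, point the tool at your tracking server explicitly, for example (experiment name is illustrative):

export MLFLOW_TRACKING_URI=https://your-mlflow-server.com
export MLFLOW_EXPERIMENT_NAME=customer_churn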

Parallelization Settings

Control parallel execution to balance speed and API costs:

| Setting | Recommended Value | Notes |
| --- | --- | --- |
| Generation Workers | 4-8 | Higher values speed up generation |
| Planning Workers | 4 | Planning is typically faster than generation |
| Max Files | 50-100 | Increase for larger codebases |

Tip: Start with lower worker counts and increase them if you have API rate-limit headroom.

Best Practices

Tips for Better Documentation

  1. Write descriptive docstrings in your ML code. The scanner extracts these for context (see the sketch after this list).

  2. Use meaningful MLflow experiment and model names. These appear in the generated documentation.

  3. Log comprehensive metrics to MLflow. The more metrics available, the richer the documentation.

  4. Organize code logically. Separate feature engineering, model training, and evaluation into distinct modules.

  5. Use clear function and variable names. The LLM uses these to understand code intent.
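
As an illustration, a training function documented like this (a hypothetical snippet from a user codebase, not part of this package) gives the scanner rich context to draw on:

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def train_churn_model(features: pd.DataFrame, labels: pd.Series) -> GradientBoostingClassifier:
    """Train a gradient-boosted classifier for customer churn.

    The classifier consumes the engineered feature set produced by
    the feature pipeline and predicts the churn label per customer.
    """
    # Shallow trees with many boosting rounds suit churn-style tabular data.
    model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    model.fit(features, labels)
    return model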

Recommended YAML Spec Patterns

  1. Start broad, then specific:

    sections:
      - Executive Summary      # High-level overview
      - Technical Deep Dive    # Detailed technical content
      - Appendix               # Supporting details
  2. Use per_model for multi-model projects:

    sections:
      - "Model Performance: per_model"
      - "Deployment Guide: per_model"
  3. Provide detailed hints for critical sections:

    hints:
      "Executive Summary": >
        This section is for executive stakeholders.
        Focus on business value and ROI.
        Avoid technical jargon.

Performance Optimization

  1. Limit file scanning for large codebases:

    --max-files 30
  2. Use model filtering to focus on specific models:

    --models "production_*" --latest-only
  3. Increase workers if you have API rate limit headroom:

    --generation-workers 8 --planning-workers 4
  4. Use notebook caching to iterate on the notebook output without calling the LLM during development:

    # First run: full generation
    python main.py --spec doc_spec.yaml --notebook
    
    # Subsequent runs: fast notebook rebuild
    python main.py --spec doc_spec.yaml --notebook-from-cache
  5. Enable verbose mode for debugging:

    python main.py --spec doc_spec.yaml --verbose

For additional help or to report issues, refer to the project repository or contact your support team.