Metadata Cache

msh maintains a metadata cache in the .msh/ directory for fast AI operations and project analysis.

Location

The metadata cache is stored in the .msh/ directory at the project root:

my-project/
├── .msh/
│   ├── manifest.json      # Compiled manifest of all assets
│   ├── lineage.json       # Lineage graph (edges between assets)
│   ├── schemas.json       # Flattened view of schemas per asset
│   ├── tests.json         # Test definitions and latest statuses
│   ├── versions.json      # Deployment versions/history
│   └── glossary.json      # Cached glossary file
├── msh.yaml
└── models/

Cache Files

manifest.json

Compiled manifest of all assets in the project.

Structure:

{
  "project": {
    "id": "my-project",
    "name": "My Project",
    "warehouse": "postgres",
    "default_schema": "public"
  },
  "assets": [
    {
      "id": "revenue",
      "path": "assets/revenue.msh",
      "blocks": {...},
      "schema": {...}
    }
  ]
}

Purpose:

Fast AI operations (no need to re-parse all files)
Project-level analysis
Asset discovery
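
As a rough illustration of asset discovery, the sketch below reads the manifest and maps each asset id to its source path. It assumes only the fields shown in the structure above; the helper names load_cache_file and asset_paths are hypothetical and not part of msh.

```python
import json
from pathlib import Path


def load_cache_file(name, cache_dir=".msh"):
    """Read one cache file (e.g. "manifest.json") and return the parsed JSON."""
    return json.loads(Path(cache_dir, name).read_text(encoding="utf-8"))


def asset_paths(manifest):
    """Map each asset id to its source path, based on the documented fields."""
    return {asset["id"]: asset["path"] for asset in manifest.get("assets", [])}


# Sample data mirroring the documented structure ("blocks" and "schema" omitted).
sample_manifest = {
    "project": {
        "id": "my-project",
        "name": "My Project",
        "warehouse": "postgres",
        "default_schema": "public",
    },
    "assets": [{"id": "revenue", "path": "assets/revenue.msh"}],
}

print(asset_paths(sample_manifest))
```

In a real project, asset_paths(load_cache_file("manifest.json")) would do the same against the file on disk, assuming the cache has already been generated.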

lineage.json

Lineage graph showing dependencies between assets.

Structure:

{
  "edges": [
    {
      "from": "stg_orders",
      "to": "revenue",
      "type": "transform"
    }
  ],
  "nodes": ["stg_orders", "revenue"]
}

Purpose:

Dependency analysis
Impact analysis
DAG visualization
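
Impact analysis over this file amounts to a graph traversal. The sketch below is one possible approach, assuming edges point from an upstream asset ("from") to a downstream one ("to") as in the structure above; the function name downstream is hypothetical.

```python
from collections import defaultdict, deque


def downstream(lineage, start):
    """Return every asset reachable from `start` by following from -> to edges."""
    children = defaultdict(list)
    for edge in lineage.get("edges", []):
        children[edge["from"]].append(edge["to"])

    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                queue.append(child)
    return seen


# Sample data copied from the documented structure.
sample_lineage = {
    "edges": [{"from": "stg_orders", "to": "revenue", "type": "transform"}],
    "nodes": ["stg_orders", "revenue"],
}

print(sorted(downstream(sample_lineage, "stg_orders")))
```

Swapping the roles of "from" and "to" when building the adjacency map gives the upstream dependencies instead.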

schemas.json

Flattened view of schemas per asset.

Structure:

{
  "revenue": {
    "columns": [
      {"name": "customer_id", "type": "integer"},
      {"name": "month", "type": "date"},
      {"name": "monthly_revenue", "type": "decimal"}
    ]
  }
}

Purpose:

Schema analysis
Type checking
Column discovery
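
Because the file is keyed by asset, column discovery can be a simple scan. The following sketch assumes the structure shown above; assets_with_column and column_types are hypothetical helper names.

```python
def assets_with_column(schemas, column_name):
    """List the assets whose schema contains a column with the given name."""
    return sorted(
        asset
        for asset, schema in schemas.items()
        if any(col["name"] == column_name for col in schema.get("columns", []))
    )


def column_types(schemas, asset):
    """Return a {column name: type} mapping for one asset."""
    return {c["name"]: c["type"] for c in schemas.get(asset, {}).get("columns", [])}


# Sample data copied from the documented structure.
sample_schemas = {
    "revenue": {
        "columns": [
            {"name": "customer_id", "type": "integer"},
            {"name": "month", "type": "date"},
            {"name": "monthly_revenue", "type": "decimal"},
        ]
    }
}

print(assets_with_column(sample_schemas, "customer_id"))
print(column_types(sample_schemas, "revenue"))
```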

tests.json

Test definitions and latest statuses.

Structure:

{
  "revenue": {
    "tests": [
      {"type": "unique", "columns": ["customer_id", "month"]},
      {"type": "assert", "sql": "monthly_revenue > 0"}
    ],
    "last_run": "2024-01-15T10:00:00Z",
    "status": "passed"
  }
}

Purpose:

Test analysis
Quality monitoring
Test coverage
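
A minimal sketch of quality monitoring and coverage checks on this file follows. Only the "passed" status appears in the structure above, so the code treats any other value as needing attention; that rule and the helper names are assumptions, not documented msh behavior.

```python
def needs_attention(tests):
    """Assets whose latest status is anything other than "passed"."""
    return sorted(a for a, entry in tests.items() if entry.get("status") != "passed")


def untested_assets(tests, asset_ids):
    """Assets from `asset_ids` with no test definitions in the cache."""
    return sorted(a for a in asset_ids if not tests.get(a, {}).get("tests"))


# Sample data copied from the documented structure.
sample_tests = {
    "revenue": {
        "tests": [
            {"type": "unique", "columns": ["customer_id", "month"]},
            {"type": "assert", "sql": "monthly_revenue > 0"},
        ],
        "last_run": "2024-01-15T10:00:00Z",
        "status": "passed",
    }
}

print(needs_attention(sample_tests))
print(untested_assets(sample_tests, ["revenue", "stg_orders"]))
```

The list of asset ids passed to untested_assets could come from manifest.json or from the "nodes" array in lineage.json.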

versions.json

Deployment versions and history.

Structure:

{
  "revenue": [
    {
      "version": "a1b2c3d4",
      "deployed_at": "2024-01-15T10:00:00Z",
      "status": "active"
    }
  ]
}

Purpose:

Version tracking
Deployment history
Rollback analysis
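
For version tracking, a reader of this file mostly needs the active version and the deployment order. The sketch below assumes the fields shown above and that "deployed_at" values are UTC ISO 8601 strings in a uniform format, so they sort correctly as text; the helper names are hypothetical.

```python
def active_version(versions, asset_id):
    """Return the version id marked "active" for an asset, or None."""
    for entry in versions.get(asset_id, []):
        if entry.get("status") == "active":
            return entry["version"]
    return None


def deployment_history(versions, asset_id):
    """Return an asset's deployments, newest first."""
    return sorted(
        versions.get(asset_id, []),
        key=lambda entry: entry["deployed_at"],
        reverse=True,
    )


# Sample data copied from the documented structure.
sample_versions = {
    "revenue": [
        {
            "version": "a1b2c3d4",
            "deployed_at": "2024-01-15T10:00:00Z",
            "status": "active",
        }
    ]
}

print(active_version(sample_versions, "revenue"))
```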

glossary.json

Cached glossary file.

Structure:

{
  "terms": [
    {
      "id": "term.customer",
      "name": "Customer",
      "description": "A customer entity"
    }
  ],
  "metrics": [...],
  "dimensions": [...],
  "policies": [...]
}

Purpose:

Business glossary
Term linking
Policy enforcement
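
Term linking typically starts with a lookup by term id. The sketch below indexes the "terms" array from the structure above; the "metrics", "dimensions", and "policies" entries are elided in the documentation, so the sample leaves them empty. The name index_terms is hypothetical.

```python
def index_terms(glossary):
    """Build a {term id: term} lookup from the glossary's "terms" array."""
    return {term["id"]: term for term in glossary.get("terms", [])}


# Sample data mirroring the documented structure; the other arrays are
# elided in the docs and left empty here.
sample_glossary = {
    "terms": [
        {
            "id": "term.customer",
            "name": "Customer",
            "description": "A customer entity",
        }
    ],
    "metrics": [],
    "dimensions": [],
    "policies": [],
}

terms = index_terms(sample_glossary)
print(terms["term.customer"]["description"])
```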

Cache Generation

Initial Generation

Generate the cache with msh manifest:

msh manifest

This creates all cache files in the .msh/ directory.

Incremental Updates

The cache is updated automatically when:

Assets are modified
Glossary is updated
Tests are run

You can also update it manually:

msh manifest --update

Cache Invalidation

The cache is invalidated when:

Asset files are modified
msh.yaml is changed
Glossary is updated

Manual invalidation:

rm -rf .msh/
msh manifest
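
To decide whether a manual rebuild is worthwhile, a script can compare file modification times. This is only an external heuristic, not how msh itself detects invalidation; the is_stale helper and the choice of msh.yaml plus the models/ directory as inputs are assumptions for illustration.

```python
from pathlib import Path


def is_stale(cache_file, source_files):
    """True if the cache file is missing or older than any existing source file."""
    cache = Path(cache_file)
    if not cache.exists():
        return True
    cache_mtime = cache.stat().st_mtime
    return any(
        Path(src).exists() and Path(src).stat().st_mtime > cache_mtime
        for src in source_files
    )


# Compare msh.yaml and every file under models/ against the manifest.
models_dir = Path("models")
sources = [Path("msh.yaml")]
if models_dir.is_dir():
    sources += [p for p in models_dir.rglob("*") if p.is_file()]

print(is_stale(Path(".msh") / "manifest.json", sources))
```

Modification times can be misleading after a fresh checkout, so treat a True result as a hint to run msh manifest rather than proof that the cache is wrong.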

Benefits

Performance

Fast AI operations: No need to re-parse all files
Incremental updates: Only changed files are re-parsed
Shared cache: CLI and engine share the same cache

Consistency

Single source of truth: All metadata in one place
Version controlled: Cache can be committed (optional)
Reproducible: Same cache = same results

AI Optimization

Token limits: Cache is kept compact so it fits within AI token limits
Relevant data: Only necessary metadata included
Structured format: Easy for AI to parse

Best Practices

Version Control

Option 1: Commit cache (recommended for teams)

git add .msh/
git commit -m "Update metadata cache"

Option 2: Ignore cache (regenerate on demand)

# .gitignore
.msh/

Cache Maintenance

Regular updates: Run msh manifest after changes
Clean cache: Remove .msh/ if cache is corrupted
Monitor size: Cache files can grow large in big projects
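
One simple way to monitor cache size is to list the JSON files in .msh/ with their sizes. The sketch below uses only the standard library; cache_sizes is a hypothetical helper, not an msh command.

```python
from pathlib import Path


def cache_sizes(cache_dir=".msh"):
    """Return {file name: size in bytes} for the JSON files in the cache directory."""
    root = Path(cache_dir)
    if not root.is_dir():
        return {}  # no cache generated yet
    return {p.name: p.stat().st_size for p in sorted(root.glob("*.json"))}


for name, size in cache_sizes().items():
    print(f"{name}: {size / 1024:.1f} KiB")
```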

Related Documentation

msh manifest - Generate cache
Context Packs - Uses cache for AI operations
Metadata Concepts - Metadata system overview
