Metadata Cache
msh maintains a metadata cache in the .msh/ directory for fast AI operations and project analysis.
Location
The metadata cache is stored in the .msh/ directory at the project root:
my-project/
├── .msh/
│   ├── manifest.json   # Compiled manifest of all assets
│   ├── lineage.json    # Lineage graph (edges between assets)
│   ├── schemas.json    # Flattened view of schemas per asset
│   ├── tests.json      # Test definitions and latest statuses
│   ├── versions.json   # Deployment versions/history
│   └── glossary.json   # Cached glossary file
├── msh.yaml
└── models/
Cache Files
manifest.json
Compiled manifest of all assets in the project.
Structure:
{
  "project": {
    "id": "my-project",
    "name": "My Project",
    "warehouse": "postgres",
    "default_schema": "public"
  },
  "assets": [
    {
      "id": "revenue",
      "path": "assets/revenue.msh",
      "blocks": {...},
      "schema": {...}
    }
  ]
}
Purpose:
- Fast AI operations (no need to re-parse all files)
- Project-level analysis
- Asset discovery
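As an illustration of asset discovery, here is a minimal Python sketch that reads a cache file and maps asset ids to their source paths. It assumes the manifest follows the structure shown above; the helper names `load_cache_file` and `asset_paths` are hypothetical and not part of msh.

```python
import json
from pathlib import Path


def load_cache_file(name, cache_dir=".msh"):
    """Read one cache file (e.g. "manifest.json") and return the parsed JSON."""
    return json.loads(Path(cache_dir, name).read_text(encoding="utf-8"))


def asset_paths(manifest):
    """Map each asset id to its source path, following the documented structure."""
    return {asset["id"]: asset["path"] for asset in manifest.get("assets", [])}


# Inline sample mirroring the documented structure ("blocks" and "schema" omitted).
sample_manifest = {
    "project": {
        "id": "my-project",
        "name": "My Project",
        "warehouse": "postgres",
        "default_schema": "public",
    },
    "assets": [{"id": "revenue", "path": "assets/revenue.msh"}],
}

print(asset_paths(sample_manifest))
```

Against a real project, `asset_paths(load_cache_file("manifest.json"))` would do the same without re-parsing any asset files.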
lineage.json
Lineage graph showing dependencies between assets.
Structure:
{
  "edges": [
    {
      "from": "stg_orders",
      "to": "revenue",
      "type": "transform"
    }
  ],
  "nodes": ["stg_orders", "revenue"]
}
Purpose:
- Dependency analysis
- Impact analysis
- DAG visualization
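Impact analysis on this file amounts to a graph traversal. The sketch below is one possible approach, assuming edges point from an upstream asset ("from") to a downstream one ("to") as in the sample above; `downstream_assets` is a hypothetical helper, not an msh API.

```python
from collections import defaultdict, deque


def downstream_assets(lineage, start):
    """Return every asset reachable from `start` by following from -> to edges."""
    children = defaultdict(list)
    for edge in lineage.get("edges", []):
        children[edge["from"]].append(edge["to"])

    seen = set()
    queue = deque([start])
    while queue:  # breadth-first walk over the lineage graph
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(start)  # guard in case the graph unexpectedly contains a cycle
    return sorted(seen)


# Inline sample matching the documented structure.
sample_lineage = {
    "edges": [{"from": "stg_orders", "to": "revenue", "type": "transform"}],
    "nodes": ["stg_orders", "revenue"],
}

print(downstream_assets(sample_lineage, "stg_orders"))
```

Reversing the edge direction in `children` would give the upstream dependencies instead.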
schemas.json
Flattened view of schemas per asset.
Structure:
{
  "revenue": {
    "columns": [
      {"name": "customer_id", "type": "integer"},
      {"name": "month", "type": "date"},
      {"name": "monthly_revenue", "type": "decimal"}
    ]
  }
}
Purpose:
- Schema analysis
- Type checking
- Column discovery
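Column discovery can be sketched as a lookup over this flattened view. The example assumes the structure shown above (asset id mapped to a "columns" list); the function names are hypothetical.

```python
def assets_with_column(schemas, column_name):
    """Return the ids of all assets that expose a column with the given name."""
    return sorted(
        asset_id
        for asset_id, schema in schemas.items()
        if any(col["name"] == column_name for col in schema.get("columns", []))
    )


def column_type(schemas, asset_id, column_name):
    """Return the declared type of a column, or None if it is not in the cache."""
    for col in schemas.get(asset_id, {}).get("columns", []):
        if col["name"] == column_name:
            return col["type"]
    return None


# Inline sample matching the documented structure.
sample_schemas = {
    "revenue": {
        "columns": [
            {"name": "customer_id", "type": "integer"},
            {"name": "month", "type": "date"},
            {"name": "monthly_revenue", "type": "decimal"},
        ]
    }
}

print(assets_with_column(sample_schemas, "customer_id"))
print(column_type(sample_schemas, "revenue", "month"))
```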
tests.json
Test definitions and latest statuses.
Structure:
{
  "revenue": {
    "tests": [
      {"type": "unique", "columns": ["customer_id", "month"]},
      {"type": "assert", "sql": "monthly_revenue > 0"}
    ],
    "last_run": "2024-01-15T10:00:00Z",
    "status": "passed"
  }
}
Purpose:
- Test analysis
- Quality monitoring
- Test coverage
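A rough test-coverage report can be derived from this file. The sketch below assumes the structure shown above; only the "passed" status appears in the sample, so any other status value is simply treated as "not passed". `summarize_tests` is a hypothetical helper.

```python
def summarize_tests(tests_cache, asset_ids):
    """Split assets into those without tests and those whose last run did not pass."""
    untested = sorted(
        asset_id
        for asset_id in asset_ids
        if not tests_cache.get(asset_id, {}).get("tests")
    )
    not_passed = sorted(
        asset_id
        for asset_id, entry in tests_cache.items()
        # Assumption: any status other than "passed" needs attention.
        if entry.get("tests") and entry.get("status") != "passed"
    )
    return {"untested": untested, "not_passed": not_passed}


# Inline sample matching the documented structure.
sample_tests = {
    "revenue": {
        "tests": [
            {"type": "unique", "columns": ["customer_id", "month"]},
            {"type": "assert", "sql": "monthly_revenue > 0"},
        ],
        "last_run": "2024-01-15T10:00:00Z",
        "status": "passed",
    }
}

print(summarize_tests(sample_tests, ["revenue", "stg_orders"]))
```

The list of asset ids would typically come from manifest.json, so that assets with no entry in tests.json still show up as untested.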
versions.json
Deployment versions and history.
Structure:
{
  "revenue": [
    {
      "version": "a1b2c3d4",
      "deployed_at": "2024-01-15T10:00:00Z",
      "status": "active"
    }
  ]
}
Purpose:
- Version tracking
- Deployment history
- Rollback analysis
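For rollback analysis, one might look up the active version and the most recent earlier deployment. This is a sketch under assumptions: only the "active" status is documented, so the second sample entry and its "superseded" status are invented for illustration, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp with a trailing "Z" into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def active_version(versions, asset_id):
    """Return the version marked "active" for an asset, or None."""
    for entry in versions.get(asset_id, []):
        if entry.get("status") == "active":
            return entry["version"]
    return None


def previous_version(versions, asset_id):
    """Return the most recently deployed non-active version (a rollback candidate)."""
    inactive = [e for e in versions.get(asset_id, []) if e.get("status") != "active"]
    if not inactive:
        return None
    return max(inactive, key=lambda e: parse_ts(e["deployed_at"]))["version"]


sample_versions = {
    "revenue": [
        {"version": "a1b2c3d4", "deployed_at": "2024-01-15T10:00:00Z", "status": "active"},
        # Hypothetical older entry; the "superseded" status value is invented here.
        {"version": "0f9e8d7c", "deployed_at": "2024-01-10T09:00:00Z", "status": "superseded"},
    ]
}

print(active_version(sample_versions, "revenue"))
print(previous_version(sample_versions, "revenue"))
```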
glossary.json
Cached glossary file.
Structure:
{
  "terms": [
    {
      "id": "term.customer",
      "name": "Customer",
      "description": "A customer entity"
    }
  ],
  "metrics": [...],
  "dimensions": [...],
  "policies": [...]
}
Purpose:
- Business glossary
- Term linking
- Policy enforcement
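Term linking needs a way to resolve a term id to its definition. A minimal sketch, assuming the "terms" structure shown above (the shapes of "metrics", "dimensions", and "policies" are elided in the sample, so they are left out); `index_terms` is a hypothetical helper.

```python
def index_terms(glossary):
    """Build a lookup table from term id to the full term entry."""
    return {term["id"]: term for term in glossary.get("terms", [])}


# Inline sample containing only the documented "terms" list.
sample_glossary = {
    "terms": [
        {
            "id": "term.customer",
            "name": "Customer",
            "description": "A customer entity",
        }
    ]
}

terms_by_id = index_terms(sample_glossary)
print(terms_by_id["term.customer"]["description"])
```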
Cache Generation
Initial Generation
Generate cache using msh manifest:
msh manifest
This creates all cache files in the .msh/ directory.
Incremental Updates
Cache is automatically updated when:
- Assets are modified
- Glossary is updated
- Tests are run
You can also update the cache manually:
msh manifest --update
Cache Invalidation
Cache is invalidated when:
- Asset files are modified
- msh.yaml is changed
- Glossary is updated
Manual invalidation:
rm -rf .msh/
msh manifest
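To decide from a script whether regeneration is needed, one option is to compare file modification times. The sketch below is not how msh itself detects changes. It assumes that msh.yaml and the files under models/ are the relevant sources and that manifest.json's modification time reflects the last cache build; all function names are hypothetical.

```python
from pathlib import Path


def stale_sources(cache_mtime, source_mtimes):
    """Return the source paths modified after the cache was written.

    cache_mtime: modification time of a cache file (seconds since the epoch).
    source_mtimes: mapping of source path -> modification time.
    """
    return sorted(path for path, mtime in source_mtimes.items() if mtime > cache_mtime)


def collect_mtimes(paths):
    """Map each existing path to its modification time; missing paths are skipped."""
    return {str(p): Path(p).stat().st_mtime for p in paths if Path(p).exists()}


def cache_is_stale(project_root="."):
    """Assumed check: is msh.yaml or any file under models/ newer than the manifest?"""
    root = Path(project_root)
    manifest = root / ".msh" / "manifest.json"
    if not manifest.exists():
        return True  # no cache yet, so it has to be generated

    sources = [root / "msh.yaml"]
    models = root / "models"
    if models.is_dir():
        sources.extend(p for p in models.rglob("*") if p.is_file())
    return bool(stale_sources(manifest.stat().st_mtime, collect_mtimes(sources)))
```

If `cache_is_stale()` returns True, running `msh manifest` as shown above would rebuild the cache.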
Benefits
Performance
- Fast AI operations: No need to re-parse all files
- Incremental updates: Only changed files are re-parsed
- Shared cache: CLI and engine share the same cache
Consistency
- Single source of truth: All metadata in one place
- Version controlled: Cache can be committed (optional)
- Reproducible: Same cache = same results
AI Optimization
- Token limits: Cache is kept compact for AI consumption within token limits
- Relevant data: Only necessary metadata included
- Structured format: Easy for AI to parse
Best Practices
Version Control
Option 1: Commit cache (recommended for teams)
git add .msh/
git commit -m "Update metadata cache"
Option 2: Ignore cache (regenerate on demand)
# .gitignore
.msh/
Cache Maintenance
- Regular updates: Run msh manifest after changes
- Clean cache: Remove .msh/ if cache is corrupted
- Monitor size: Cache files can grow large in big projects
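Monitoring cache size can be as simple as listing the JSON files in .msh/ with their sizes. A minimal sketch; the helper names and the threshold in the usage note are hypothetical, not msh settings.

```python
from pathlib import Path


def cache_file_sizes(cache_dir=".msh"):
    """Return {file name: size in bytes} for each JSON file in the cache directory."""
    directory = Path(cache_dir)
    if not directory.is_dir():
        return {}
    return {p.name: p.stat().st_size for p in sorted(directory.glob("*.json"))}


def files_over(sizes, limit_bytes):
    """List (name, size) pairs for files larger than limit_bytes, largest first."""
    big = [(name, size) for name, size in sizes.items() if size > limit_bytes]
    return sorted(big, key=lambda item: item[1], reverse=True)
```

For example, `files_over(cache_file_sizes(), 5_000_000)` would list any cache file above roughly 5 MB.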
Related Documentation
- msh manifest - Generate cache
- Context Packs - Uses cache for AI operations
- Metadata Concepts - Metadata system overview