Freshness Command
Check if your data assets are up-to-date and meet freshness expectations. This command validates that data pipelines are running on schedule and alerts you to stale data.
Overview
The msh freshness command checks the last successful run time for each asset against configured freshness expectations. It queries the msh_state_history table (fast) rather than scanning raw data (slow), making it efficient for monitoring large datasets.
Usage
msh freshness
What it does:
- Reads freshness expectations from
.mshfiles - Queries
msh_state_historytable for last successful run timestamps - Compares age against
warn_afteranderror_afterthresholds - Displays a table with status indicators
Example Output
Data Freshness
┌──────────────┬──────────────┬────────┐
│ Asset │ Last Run │ Status │
├──────────────┼──────────────┼────────┤
│ customers │ 2.3h ago │ OK │
│ orders │ 15.2h ago │ WARN │
│ revenue │ 50.1h ago │ ERROR │
└──────────────┴──────────────┴────────┘
Status Indicators:
OK(green): Data is fresh, withinwarn_afterthresholdWARN(yellow): Data is stale, exceedswarn_afterbut withinerror_afterERROR(red): Data is very stale, exceedserror_afterthresholdMISSING: Asset has never been successfully deployed
Configuration: Freshness Block
Add a freshness block to your .msh file to set expectations:
name: customers
ingest:
type: rest_api
endpoint: https://api.example.com/customers
freshness:
warn_after: 12h # Warn if data is older than 12 hours
error_after: 24h # Error if data is older than 24 hours
transform: |
SELECT * FROM {{ source }}
Freshness Fields
-
warn_after(optional): Duration before warning (default:24h)- Format:
12h,30m,2d,3600s - Examples:
12h,1d,30m,7200s
- Format:
-
error_after(optional): Duration before error (default:48h)- Format: Same as
warn_after - Must be greater than
warn_after
- Format: Same as
Duration Format
Supported time units:
s- secondsm- minutesh- hoursd- days
Examples:
freshness:
warn_after: 6h # 6 hours
error_after: 12h # 12 hours
freshness:
warn_after: 30m # 30 minutes
error_after: 2h # 2 hours
freshness:
warn_after: 1d # 1 day
error_after: 3d # 3 days
How It Works: The Fast Query Trick
Why it's fast: Instead of scanning raw data tables (which can be terabytes), msh freshness queries the lightweight msh_state_history metadata table. This table stores:
- Asset name
- Deployment hash
- Timestamp of last successful run
Query Pattern:
SELECT timestamp FROM msh_state_history
WHERE asset = 'customers'
ORDER BY timestamp DESC
LIMIT 1
This makes freshness checks instant even for large datasets.
Examples
Example 1: High-Frequency Data
For data that should update every few hours:
name: real_time_metrics
ingest:
type: rest_api
endpoint: https://api.example.com/metrics
freshness:
warn_after: 2h # Warn after 2 hours
error_after: 6h # Error after 6 hours
transform: |
SELECT * FROM {{ source }}
Example 2: Daily Batch Data
For daily ETL jobs:
name: daily_sales
ingest:
type: sql_database
credentials: ${SOURCE_DB}
table: sales
freshness:
warn_after: 25h # Warn if not updated within 25 hours
error_after: 48h # Error if not updated within 48 hours
transform: |
SELECT * FROM {{ source }}
Example 3: Weekly Reports
For weekly aggregations:
name: weekly_summary
ingest:
type: rest_api
endpoint: https://api.example.com/weekly
freshness:
warn_after: 8d # Warn after 8 days
error_after: 10d # Error after 10 days
transform: |
SELECT * FROM {{ source }}
Integration with Monitoring
CI/CD Pipelines
Use msh freshness in CI/CD to fail builds if data is stale:
# GitHub Actions example
- name: Check Data Freshness
run: msh freshness
# Exit code 1 if any asset exceeds error_after threshold
Scheduled Monitoring
Set up a cron job to check freshness regularly:
# Check freshness every hour
0 * * * * cd /path/to/project && msh freshness
Alerting
Combine with alerting tools:
#!/bin/bash
msh freshness
if [ $? -ne 0 ]; then
# Send alert (email, Slack, PagerDuty, etc.)
send_alert "Data freshness check failed"
fi
Troubleshooting
"MISSING" Status
If an asset shows MISSING:
│ customers │ Never │ MISSING │
Causes:
- Asset has never been successfully deployed
- Asset name mismatch between
.mshfile and deployment history
Solution:
# Deploy the asset
msh run customers
# Verify deployment
msh status
Assets Not Showing
If assets don't appear in the freshness check:
Cause: Missing freshness block in .msh file
Solution: Add freshness configuration to your .msh file:
freshness:
warn_after: 24h
error_after: 48h
Incorrect Timestamps
If timestamps seem wrong:
Cause: Timezone differences or clock skew
Solution: Ensure your database server and local machine have synchronized clocks.
Best Practices
- Set Realistic Thresholds: Base
warn_afteranderror_afteron your actual update frequency - Monitor in Production: Add freshness checks to production monitoring dashboards
- Alert on Errors: Set up alerts for
ERRORstatus to catch pipeline failures quickly - Document Expectations: Document freshness requirements in your data catalog
Related Commands
msh status: View active versions (doesn't check freshness)msh versions: View deployment history for an assetmsh run: Deploy assets (updates freshness timestamps)
See Also
- Production Deployment: Setting up monitoring and alerts
- Atomic Resilience: Understanding deployment state