Freshness Command

Check if your data assets are up-to-date and meet freshness expectations. This command validates that data pipelines are running on schedule and alerts you to stale data.

Overview

The msh freshness command checks the last successful run time for each asset against configured freshness expectations. It queries the msh_state_history table (fast) rather than scanning raw data (slow), making it efficient for monitoring large datasets.

Usage

msh freshness

What it does:

Reads freshness expectations from .msh files
Queries msh_state_history table for last successful run timestamps
Compares age against warn_after and error_after thresholds
Displays a table with status indicators

Example Output

Data Freshness
┌──────────────┬──────────────┬────────┐
│ Asset        │ Last Run     │ Status │
├──────────────┼──────────────┼────────┤
│ customers    │ 2.3h ago     │ OK     │
│ orders       │ 15.2h ago    │ WARN   │
│ revenue      │ 50.1h ago    │ ERROR  │
└──────────────┴──────────────┴────────┘

Status Indicators:

OK (green): Data is fresh, within warn_after threshold
WARN (yellow): Data is stale, exceeds warn_after but within error_after
ERROR (red): Data is very stale, exceeds error_after threshold
MISSING: Asset has never been successfully deployed

Configuration: Freshness Block

Add a freshness block to your .msh file to set expectations:

name: customers
ingest:
  type: rest_api
  endpoint: https://api.example.com/customers

freshness:
  warn_after: 12h    # Warn if data is older than 12 hours
  error_after: 24h   # Error if data is older than 24 hours

transform: |
  SELECT * FROM {{ source }}

Freshness Fields

warn_after (optional): Duration before warning (default: 24h)
- Format: 12h, 30m, 2d, 3600s
- Examples: 12h, 1d, 30m, 7200s
error_after (optional): Duration before error (default: 48h)
- Format: Same as warn_after
- Must be greater than warn_after

Duration Format

Supported time units:

s - seconds
m - minutes
h - hours
d - days

Examples:

freshness:
  warn_after: 6h      # 6 hours
  error_after: 12h    # 12 hours

freshness:
  warn_after: 30m     # 30 minutes
  error_after: 2h     # 2 hours

freshness:
  warn_after: 1d      # 1 day
  error_after: 3d     # 3 days

How It Works: The Fast Query Trick

Why it's fast: Instead of scanning raw data tables (which can be terabytes), msh freshness queries the lightweight msh_state_history metadata table. This table stores:

Asset name
Deployment hash
Timestamp of last successful run

Query Pattern:

SELECT timestamp FROM msh_state_history 
WHERE asset = 'customers' 
ORDER BY timestamp DESC 
LIMIT 1

This makes freshness checks instant even for large datasets.

Examples

Example 1: High-Frequency Data

For data that should update every few hours:

name: real_time_metrics
ingest:
  type: rest_api
  endpoint: https://api.example.com/metrics

freshness:
  warn_after: 2h     # Warn after 2 hours
  error_after: 6h    # Error after 6 hours

transform: |
  SELECT * FROM {{ source }}

Example 2: Daily Batch Data

For daily ETL jobs:

name: daily_sales
ingest:
  type: sql_database
  credentials: ${SOURCE_DB}
  table: sales

freshness:
  warn_after: 25h    # Warn if not updated within 25 hours
  error_after: 48h   # Error if not updated within 48 hours

transform: |
  SELECT * FROM {{ source }}

Example 3: Weekly Reports

For weekly aggregations:

name: weekly_summary
ingest:
  type: rest_api
  endpoint: https://api.example.com/weekly

freshness:
  warn_after: 8d     # Warn after 8 days
  error_after: 10d  # Error after 10 days

transform: |
  SELECT * FROM {{ source }}

Integration with Monitoring

CI/CD Pipelines

Use msh freshness in CI/CD to fail builds if data is stale:

# GitHub Actions example
- name: Check Data Freshness
  run: msh freshness
  # Exit code 1 if any asset exceeds error_after threshold

Scheduled Monitoring

Set up a cron job to check freshness regularly:

# Check freshness every hour
0 * * * * cd /path/to/project && msh freshness

Alerting

Combine with alerting tools:

#!/bin/bash
msh freshness
if [ $? -ne 0 ]; then
  # Send alert (email, Slack, PagerDuty, etc.)
  send_alert "Data freshness check failed"
fi

Troubleshooting

"MISSING" Status

If an asset shows MISSING:

│ customers    │ Never         │ MISSING │

Causes:

Asset has never been successfully deployed
Asset name mismatch between .msh file and deployment history

Solution:

# Deploy the asset
msh run customers

# Verify deployment
msh status

Assets Not Showing

If assets don't appear in the freshness check:

Cause: Missing freshness block in .msh file

Solution: Add freshness configuration to your .msh file:

freshness:
  warn_after: 24h
  error_after: 48h

Incorrect Timestamps

If timestamps seem wrong:

Cause: Timezone differences or clock skew

Solution: Ensure your database server and local machine have synchronized clocks.

Best Practices

Set Realistic Thresholds: Base warn_after and error_after on your actual update frequency
Monitor in Production: Add freshness checks to production monitoring dashboards
Alert on Errors: Set up alerts for ERROR status to catch pipeline failures quickly
Document Expectations: Document freshness requirements in your data catalog

msh status: View active versions (doesn't check freshness)
msh versions: View deployment history for an asset
msh run: Deploy assets (updates freshness timestamps)

Overview​

Usage​

Example Output​

Configuration: Freshness Block​

Freshness Fields​

Duration Format​

How It Works: The Fast Query Trick​

Examples​

Example 1: High-Frequency Data​

Example 2: Daily Batch Data​

Example 3: Weekly Reports​

Integration with Monitoring​

CI/CD Pipelines​

Scheduled Monitoring​

Alerting​

Troubleshooting​

"MISSING" Status​

Assets Not Showing​

Incorrect Timestamps​

Best Practices​

Related Commands​

See Also​