Atomic Resilience

msh is designed to be safe by default. Traditional ETL tools can leave your data in an inconsistent state when a run fails partway through. msh guarantees that your production data is never touched unless the entire pipeline succeeds.

The Time Machine (msh rollback)

Because msh uses a Blue/Green deployment strategy, every successful run creates a new, immutable version of your data. The previous version is kept online until you decide to remove it (or until it expires based on your retention policy).

This enables the Time Machine: the ability to instantly revert your entire data warehouse to a previous, known-good state.

How to Roll Back

If you discover a bug in your latest deployment, you don't need to revert code and re-run a long pipeline. You simply run:

msh rollback

Output:

[msh] Rolling back to previous state (run_id: b2c3d4e)...
[msh] ✓ Swapped views back to analytics_green_b2c3d4e
[msh] ✓ Rollback complete in 0.4s

This command:

  1. Looks up the previous successful deployment hash in the msh_state_history table.
  2. Executes an atomic CREATE OR REPLACE VIEW to point your production schemas back to that hash.
  3. Marks the "bad" deployment as rolled back.
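
The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not msh's actual implementation: the columns of msh_state_history (run_id, deployed_at, status) and the rollback helper are hypothetical, and the physical table names follow the analytics_green_<run_id> pattern seen in the sample output. SQLite is used so the example runs anywhere; because SQLite has no CREATE OR REPLACE VIEW, a DROP VIEW plus CREATE VIEW inside one transaction stands in for the atomic swap.

```python
import sqlite3


def rollback(conn, view="analytics"):
    """Point `view` back at the previous successful run and return its run_id."""
    # Step 1: look up the two most recent successful deployments.
    rows = conn.execute(
        "SELECT run_id FROM msh_state_history "
        "WHERE status = 'success' ORDER BY deployed_at DESC LIMIT 2"
    ).fetchall()
    if len(rows) < 2:
        raise RuntimeError("no previous successful deployment to roll back to")
    bad_run, good_run = rows[0][0], rows[1][0]

    # Steps 2 and 3 happen in one transaction, so readers see either the
    # old view or the new one, never a half-applied state.
    conn.execute("BEGIN")
    try:
        conn.execute(f'DROP VIEW IF EXISTS "{view}"')
        conn.execute(
            f'CREATE VIEW "{view}" AS SELECT * FROM "{view}_green_{good_run}"'
        )
        conn.execute(
            "UPDATE msh_state_history SET status = 'rolled_back' WHERE run_id = ?",
            (bad_run,),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return good_run


# Demo setup (hypothetical schema): two deployed versions, view on the newest.
conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
conn.executescript("""
    CREATE TABLE msh_state_history (run_id TEXT PRIMARY KEY, deployed_at INTEGER, status TEXT);
    CREATE TABLE analytics_green_b2c3d4e (amount INTEGER);
    CREATE TABLE analytics_green_c3d4e5f (amount INTEGER);
    INSERT INTO analytics_green_b2c3d4e VALUES (100);
    INSERT INTO analytics_green_c3d4e5f VALUES (-1);
    INSERT INTO msh_state_history VALUES ('b2c3d4e', 1, 'success'), ('c3d4e5f', 2, 'success');
    CREATE VIEW analytics AS SELECT * FROM analytics_green_c3d4e5f;
""")

restored = rollback(conn)
print(restored, conn.execute("SELECT amount FROM analytics").fetchone())
```

No data is copied or recomputed in this sketch. Only the view definition and one history row change, which is why a swap of this kind can finish in well under a second.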

Rolling Back Specific Assets

You can also roll back specific assets if you don't want to revert the entire warehouse:

msh rollback models/revenue.msh

Pre-Flight Contracts

Data pipelines often fail because the source data changes unexpectedly (e.g., a column is renamed or data types change). msh allows you to define Contracts that are verified before any data is processed.

The contract Block

You can add a contract block to your .msh file to enforce schema expectations.

name: stripe_payments
ingest:
  type: rest_api
  endpoint: https://api.stripe.com/v1/charges

contract:
  evolution: evolve          # Allow schema evolution (default: "evolve")
  enforce_types: true        # Enforce type consistency
  required_columns:          # Columns that must exist
    - id
    - amount
    - currency
  allow_new_columns: true    # Allow new columns (default: true)

transform: |
  SELECT id, amount, currency FROM {{ source }}

Contract Fields:

  • evolution: Schema evolution mode
    • "evolve" (default): Allows new columns to be added automatically
    • "freeze": Prevents new columns (uses dlt's schema_evolution="freeze")
  • enforce_types: Boolean (default: false)
    • true: Validates that data types match expectations
    • false: Allows type flexibility
  • required_columns: List of column names that must exist
    • Pipeline fails if any required column is missing
    • Empty list means no columns are required
  • allow_new_columns: Boolean (default: true)
    • true: Allows columns not in required_columns list
    • false: Only allows columns specified in required_columns (when evolution: freeze)
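
To make the defaults concrete, here is a minimal Python sketch of how a contract block could be loaded once the YAML has been parsed into a dict. The Contract class and parse_contract function are hypothetical names for illustration and are not part of msh. Only the field names and default values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    # Defaults mirror the documented ones.
    evolution: str = "evolve"
    enforce_types: bool = False
    required_columns: tuple = ()
    allow_new_columns: bool = True

    def __post_init__(self):
        if self.evolution not in ("evolve", "freeze"):
            raise ValueError(f"evolution must be 'evolve' or 'freeze', got {self.evolution!r}")


def parse_contract(block: dict) -> Contract:
    """Build a Contract from an already-parsed `contract:` mapping."""
    known = {"evolution", "enforce_types", "required_columns", "allow_new_columns"}
    unknown = set(block) - known
    if unknown:
        raise ValueError(f"unknown contract fields: {sorted(unknown)}")
    values = dict(block)
    # An omitted or empty list means no columns are required.
    values["required_columns"] = tuple(values.get("required_columns") or ())
    return Contract(**values)


# An empty block falls back to every default.
print(parse_contract({}))
# A strict block overrides only what it names.
print(parse_contract({"evolution": "freeze", "required_columns": ["id", "name"]}))
```

Rejecting unknown keys is a design choice of this sketch: a misspelled field such as enforce_type would otherwise fall back to its default without any warning.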

Fail-Fast Logic

When you run msh run, the Orchestrator checks these contracts against the source schema before launching the ingestion job.

  • If the contract is met: The pipeline proceeds.
  • If the contract is violated: The pipeline fails immediately (Fail-Fast), saving you from processing invalid data or waking up to a broken warehouse.

Failure Example:

[msh] Checking contracts for stripe_payments...
[msh] ✗ Contract Failed: Missing required columns: ['currency']
[msh] Found: ['id', 'amount', 'created']
[msh] Aborting run. No data was changed.
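
The key property of fail-fast is ordering: the contract check runs before ingestion, so a violation means the ingestion step is never called. The sketch below illustrates that control flow in Python. All names (ContractViolation, check_required_columns, run, fake_ingest) are hypothetical and do not reflect the Orchestrator's real internals.

```python
class ContractViolation(Exception):
    """Raised when the source schema does not satisfy the contract."""


def check_required_columns(source_columns, required_columns):
    missing = [c for c in required_columns if c not in source_columns]
    if missing:
        raise ContractViolation(f"Missing required columns: {missing}")


def run(fetch_schema, ingest, required_columns):
    """Check the contract first; call `ingest` only if the check passes."""
    check_required_columns(fetch_schema(), required_columns)
    return ingest()


calls = []  # records whether ingestion was ever started


def fake_ingest():
    calls.append("ingest")
    return "ok"


error = None
try:
    # The source exposes 'created' but not 'currency', as in the failure example.
    run(lambda: ["id", "amount", "created"], fake_ingest, ["id", "amount", "currency"])
except ContractViolation as exc:
    error = str(exc)

print(error)   # the violation message
print(calls)   # stays empty: ingestion never ran
```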

Contract Validation

During pipeline execution, contracts are validated before ingestion through three checks:

  1. Required Columns Check: Verifies all columns in required_columns exist in the source data
  2. Schema Evolution Check: If evolution: freeze, prevents new columns from being added
  3. Type Enforcement: If enforce_types: true, validates data types match expectations

Validation happens in Phase 9 of the pipeline (Pre-Flight Contracts), ensuring failures occur before any data is processed.
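
The three checks can be sketched as one Python function that collects violations. This is an illustration only: validate_contract is a hypothetical name, and the sketch assumes that "new" columns and "expected" types are both judged against a previously recorded schema (previous_schema, a column-to-type mapping). The source does not say where msh takes its type expectations from.

```python
def validate_contract(contract, source_schema, previous_schema):
    """Return a list of violation messages; an empty list means the contract is met.

    contract        -- parsed `contract:` mapping
    source_schema   -- {column: type} observed in the source (assumed input)
    previous_schema -- {column: type} recorded earlier (assumed baseline)
    """
    violations = []

    # 1. Required Columns Check
    missing = [c for c in contract.get("required_columns", []) if c not in source_schema]
    if missing:
        violations.append(f"Missing required columns: {missing}")

    # 2. Schema Evolution Check (only under evolution: freeze)
    if contract.get("evolution", "evolve") == "freeze":
        new = [c for c in source_schema if c not in previous_schema]
        if new:
            violations.append(f"New columns not allowed under freeze: {new}")

    # 3. Type Enforcement (only when enforce_types: true)
    if contract.get("enforce_types", False):
        changed = [
            c for c in source_schema
            if c in previous_schema and source_schema[c] != previous_schema[c]
        ]
        if changed:
            violations.append(f"Type changed for columns: {changed}")

    return violations


previous = {"id": "text", "amount": "bigint", "currency": "text"}
source = {"id": "text", "amount": "text", "created": "timestamp"}
contract = {
    "evolution": "freeze",
    "enforce_types": True,
    "required_columns": ["id", "amount", "currency"],
}

violations = validate_contract(contract, source, previous)
for message in violations:
    print(message)
```

Collecting every violation before aborting, rather than stopping at the first, lets a single failed run report all the schema problems at once.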

Example: Freezing Schema

To prevent schema drift, use evolution: freeze:

name: stable_api_data
ingest:
  type: rest_api
  endpoint: https://api.example.com/data

contract:
  evolution: freeze          # Prevent new columns
  enforce_types: true
  required_columns:
    - id
    - name
    - created_at
  allow_new_columns: false   # Strict: only required columns allowed

transform: |
  SELECT id, name, created_at FROM {{ source }}

With this contract, an unexpected schema change fails the run before any data is processed, so breaking changes are caught early.