Atomic Resilience
msh is designed to be safe by default. Unlike traditional ETL tools, which can leave your data in an inconsistent state when they fail, msh guarantees that your production data is never touched unless the entire pipeline succeeds.
The Time Machine (msh rollback)
Because msh uses a Blue/Green deployment strategy, every successful run creates a new, immutable version of your data. The previous version is kept online until you decide to remove it (or until it expires based on your retention policy).
This enables the Time Machine: the ability to instantly revert your entire data warehouse to a previous, known-good state.
How to Roll Back
If you discover a bug in your latest deployment, you don't need to revert code and re-run a long pipeline. You simply run:
msh rollback
Output:
[msh] Rolling back to previous state (run_id: b2c3d4e)...
[msh] ✓ Swapped views back to analytics_green_b2c3d4e
[msh] ✓ Rollback complete in 0.4s
This command:
- Looks up the previous successful deployment hash in the msh_state_history table.
- Executes an atomic CREATE OR REPLACE VIEW to point your production schemas back to that hash.
- Marks the "bad" deployment as rolled back.
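The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not msh's actual implementation: the `Deployment` class, the `plan_rollback` function, the in-memory `history` list standing in for msh_state_history, and the `analytics` / `analytics_green_<run_id>` schema naming are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deployment:
    """One row of a hypothetical state-history table."""
    run_id: str
    status: str  # "success", "failed", or "rolled_back"


def plan_rollback(history: List[Deployment], views: List[str]) -> List[str]:
    """Return the SQL a rollback would issue and mark the latest run.

    `history` is ordered oldest to newest.
    """
    successful = [d for d in history if d.status == "success"]
    if len(successful) < 2:
        raise RuntimeError("no previous successful deployment to roll back to")

    current, previous = successful[-1], successful[-2]

    # Step 2: repoint each production view at the previous deployment.
    # The schema names here are assumed, not taken from msh.
    statements = [
        f"CREATE OR REPLACE VIEW analytics.{view} AS "
        f"SELECT * FROM analytics_green_{previous.run_id}.{view}"
        for view in views
    ]

    # Step 3: record that the latest deployment was rolled back.
    current.status = "rolled_back"
    return statements
```

Because only view definitions change and no data is copied, a swap like this can finish in well under a second regardless of table size.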
Rolling Back Specific Assets
You can also roll back specific assets if you don't want to revert the entire warehouse:
msh rollback models/revenue.msh
Pre-Flight Contracts
Data pipelines often fail because the source data changes unexpectedly (e.g., a column is renamed or data types change). msh allows you to define Contracts that are verified before any data is processed.
The contract Block
You can add a contract block to your .msh file to enforce schema expectations.
name: stripe_payments
ingest:
  type: rest_api
  endpoint: https://api.stripe.com/v1/charges
contract:
  evolution: evolve          # Allow schema evolution (default: "evolve")
  enforce_types: true        # Enforce type consistency
  required_columns:          # Columns that must exist
    - id
    - amount
    - currency
  allow_new_columns: true    # Allow new columns (default: true)
transform: |
  SELECT id, amount, currency FROM {{ source }}
Contract Fields:
- evolution: Schema evolution mode
  - "evolve" (default): Allows new columns to be added automatically
  - "freeze": Prevents new columns (uses dlt's schema_evolution="freeze")
- enforce_types: Boolean (default: false)
  - true: Validates that data types match expectations
  - false: Allows type flexibility
- required_columns: List of column names that must exist
  - Pipeline fails if any required column is missing
  - Empty list means no columns are required
- allow_new_columns: Boolean (default: true)
  - true: Allows columns not in the required_columns list
  - false: Only allows columns specified in required_columns (when evolution: freeze)
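To make the defaults concrete, here is one way the contract fields could be modeled in Python. The `Contract` dataclass and `contract_from_dict` helper are hypothetical names for this sketch and are not part of msh. Only the field names and default values come from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Contract:
    """Hypothetical in-memory form of a `contract` block."""
    evolution: str = "evolve"
    enforce_types: bool = False
    required_columns: List[str] = field(default_factory=list)
    allow_new_columns: bool = True

    def __post_init__(self) -> None:
        # Only the two documented modes are accepted.
        if self.evolution not in ("evolve", "freeze"):
            raise ValueError(f"unknown evolution mode: {self.evolution!r}")


def contract_from_dict(raw: Dict[str, Any]) -> Contract:
    """Build a Contract from a parsed mapping, filling in defaults."""
    return Contract(**raw)


# A contract that only lists required columns keeps every other default.
minimal = contract_from_dict({"required_columns": ["id", "amount"]})
```

Omitting the contract fields entirely yields the most permissive settings, so strictness is always an explicit opt-in.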
Fail-Fast Logic
When you run msh run, the Orchestrator checks these contracts against the source schema before launching the ingestion job.
- If the contract is met: The pipeline proceeds.
- If the contract is violated: The pipeline fails immediately (Fail-Fast), saving you from processing invalid data or waking up to a broken warehouse.
Failure Example:
[msh] Checking contracts for stripe_payments...
[msh] ✗ Contract Failed: Missing required columns: ['currency']
[msh] Found: ['id', 'amount', 'created']
[msh] Aborting run. No data was changed.
Contract Validation
During pipeline execution, contracts are validated before ingestion starts:
- Required Columns Check: Verifies all columns in required_columns exist in the source data
- Schema Evolution Check: If evolution: freeze, prevents new columns from being added
- Type Enforcement: If enforce_types: true, validates data types match expectations
Validation happens in Phase 9 of the pipeline (Pre-Flight Contracts), ensuring failures occur before any data is processed.
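The three checks can be illustrated with a short Python sketch. It is not msh's validator: `check_contract` is a hypothetical function, and it assumes that "new columns" and "expected types" are judged against a previously recorded schema (`known`), which the text above does not specify. Schemas are plain dictionaries mapping column names to type names.

```python
from typing import Dict, List


def check_contract(contract: dict,
                   source: Dict[str, str],
                   known: Dict[str, str]) -> List[str]:
    """Return a list of violation messages; an empty list means it passed."""
    violations: List[str] = []

    # Required Columns Check
    required = contract.get("required_columns", [])
    missing = [c for c in required if c not in source]
    if missing:
        violations.append(f"Missing required columns: {missing}")

    # Schema Evolution Check (assumed: compared against the known schema)
    if contract.get("evolution", "evolve") == "freeze":
        new = [c for c in source if c not in known]
        if new:
            violations.append(f"New columns not allowed: {new}")

    # Type Enforcement (assumed: compared against the known schema)
    if contract.get("enforce_types", False):
        changed = [c for c in source if c in known and source[c] != known[c]]
        if changed:
            violations.append(f"Type mismatch in columns: {changed}")

    return violations
```

A caller would run this before starting ingestion and abort if the returned list is non-empty, which is what keeps a violation from touching any data.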
Example: Freezing Schema
To prevent schema drift, use evolution: freeze:
name: stable_api_data
ingest:
  type: rest_api
  endpoint: https://api.example.com/data
contract:
  evolution: freeze          # Prevent new columns
  enforce_types: true
  required_columns:
    - id
    - name
    - created_at
  allow_new_columns: false   # Strict: only required columns allowed
transform: |
  SELECT id, name, created_at FROM {{ source }}
This ensures your schema never changes unexpectedly, catching breaking changes early.