Lifecycle Contract
The core of msh's reliability lies in its strict Lifecycle Contract. This contract dictates how state is managed and how changes are applied to your data warehouse.
Remote State (The Brain)
Unlike many CLI tools that store state in a local .json file or SQLite database, msh stores its entire state in the destination database itself, specifically in the msh_state_history table.
The "Docker Amnesia" Problem
In modern data stacks, orchestrators often run in ephemeral containers (Docker, Kubernetes). If state is stored locally, it vanishes when the container dies. This leads to "amnesia"—the tool forgets what it deployed, making rollbacks impossible.
By storing state in the destination:
- Persistence: The state survives container restarts.
- Collaboration: Multiple engineers can run msh against the same environment without conflicts.
- Single Source of Truth: The database knows exactly what version of the code it is running.
State Schema
The msh_state_history table tracks every deployment:
CREATE TABLE msh_state_history (
run_id VARCHAR PRIMARY KEY,
deployed_at TIMESTAMP,
schema_name VARCHAR,
git_commit VARCHAR,
status VARCHAR, -- 'active', 'archived', 'failed'
metadata JSONB
);
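As a rough illustration of how a stateless container could recover its bearings from this table, the sketch below creates the same columns in an in-memory SQLite database and looks up the live deployment. SQLite is only a stand-in for the destination warehouse, and `record_run`, `active_run`, the timestamps, and the commit hashes are hypothetical names and values chosen for the example, not part of msh.

```python
import json
import sqlite3

# SQLite stands in for the destination warehouse; it accepts the column
# type names from the schema above but does not enforce them.
DDL = """
CREATE TABLE msh_state_history (
    run_id      VARCHAR PRIMARY KEY,
    deployed_at TIMESTAMP,
    schema_name VARCHAR,
    git_commit  VARCHAR,
    status      VARCHAR,
    metadata    JSONB
)
"""


def record_run(conn, run_id, deployed_at, schema_name, git_commit, status, metadata=None):
    """Insert one deployment row; metadata is serialized to JSON text."""
    conn.execute(
        "INSERT INTO msh_state_history VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, deployed_at, schema_name, git_commit, status, json.dumps(metadata or {})),
    )


def active_run(conn):
    """Return (run_id, schema_name) of the live deployment, or None."""
    return conn.execute(
        "SELECT run_id, schema_name FROM msh_state_history "
        "WHERE status = 'active' ORDER BY deployed_at DESC LIMIT 1"
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
# Illustrative rows only; the commit hashes and timestamps are made up.
record_run(conn, "9d8e7f6", "2024-01-01T00:00:00", "analytics_archive_9d8e7f6", "abc123", "archived")
record_run(conn, "a3f2b1c", "2024-01-02T00:00:00", "analytics_blue_a3f2b1c", "def456", "active")
print(active_run(conn))  # ('a3f2b1c', 'analytics_blue_a3f2b1c')
```

Because the lookup needs nothing but a database connection, a fresh container can answer "what is deployed right now?" without any local files.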
The Atomic Swap Mechanism
msh employs a Blue/Green Deployment strategy for data.
- Green (Staging): All new data and transformations are first loaded into a temporary "Green" schema (e.g., analytics_green_a3f2b1c).
- Verify: msh runs tests against the Green schema.
- Swap: If all tests pass, msh performs an Atomic Swap.
The Swap Logic
The swap happens using CREATE OR REPLACE VIEW to ensure zero downtime:
BEGIN TRANSACTION;
-- Step 1: Point production views to the new Green schema
CREATE OR REPLACE VIEW analytics.customers AS
SELECT * FROM analytics_green_a3f2b1c.customers;
CREATE OR REPLACE VIEW analytics.revenue AS
SELECT * FROM analytics_green_a3f2b1c.revenue;
-- Step 2: Archive the old Blue schema
ALTER SCHEMA analytics_blue_9d8e7f6
RENAME TO analytics_archive_9d8e7f6;
-- Step 3: Promote Green to Blue
ALTER SCHEMA analytics_green_a3f2b1c
RENAME TO analytics_blue_a3f2b1c;
-- Step 4: Update state
UPDATE msh_state_history
SET status = 'archived'
WHERE run_id = '9d8e7f6';
INSERT INTO msh_state_history (run_id, schema_name, status)
VALUES ('a3f2b1c', 'analytics_blue_a3f2b1c', 'active');
COMMIT;
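This ensures that consumers of your data never see a broken or partial state. Because every step runs inside a single transaction, the swap is atomic on databases with transactional DDL (such as PostgreSQL): it either fully succeeds or fully fails.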
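The statement sequence above follows a fixed pattern, so it can be generated from a list of view names. The following is a minimal sketch under that assumption; `build_swap_statements` is a hypothetical helper written for this example and is not claimed to be how msh builds its SQL internally.

```python
def build_swap_statements(target, views, new_run, old_run):
    """Return the ordered SQL statements for one blue/green swap.

    Identifiers are interpolated directly, so callers must pass trusted,
    already-validated names.
    """
    green = f"{target}_green_{new_run}"
    stmts = ["BEGIN TRANSACTION"]
    # Step 1: repoint every production view at the Green schema.
    for view in views:
        stmts.append(
            f"CREATE OR REPLACE VIEW {target}.{view} AS SELECT * FROM {green}.{view}"
        )
    # Steps 2 and 3: archive the old Blue schema, then promote Green to Blue.
    stmts.append(f"ALTER SCHEMA {target}_blue_{old_run} RENAME TO {target}_archive_{old_run}")
    stmts.append(f"ALTER SCHEMA {green} RENAME TO {target}_blue_{new_run}")
    # Step 4: record the change in the state table.
    stmts.append(f"UPDATE msh_state_history SET status = 'archived' WHERE run_id = '{old_run}'")
    stmts.append(
        "INSERT INTO msh_state_history (run_id, schema_name, status) "
        f"VALUES ('{new_run}', '{target}_blue_{new_run}', 'active')"
    )
    stmts.append("COMMIT")
    return stmts


statements = build_swap_statements("analytics", ["customers", "revenue"], "a3f2b1c", "9d8e7f6")
for stmt in statements:
    print(stmt + ";")
```

Generating the view statements in a loop keeps the swap complete as models are added: a view that is missed in Step 1 would keep pointing at the old schema after the rename.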
Failure Tolerance
If any step in the pipeline fails (Ingest, Transform, or Test):
- The Green schema is immediately dropped.
- The Blue schema remains untouched and live.
- The error is logged to msh_state_history.
You can fix the issue and retry without any manual cleanup.
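The failure path described above can be sketched as a small wrapper around the pipeline steps. This is an illustration only: `run_pipeline`, the `execute` callback, and the shape of the logged row (including the `metadata` payload) are assumptions made for the example, not msh's actual implementation.

```python
import json


def run_pipeline(run_id, steps, execute, target="analytics"):
    """Run steps against a Green schema; clean up and log on failure.

    `steps` is a list of (name, callable) pairs and `execute` is any
    function that sends one SQL string to the warehouse.
    Returns True if every step passed, False otherwise.
    """
    green = f"{target}_green_{run_id}"
    execute(f"CREATE SCHEMA {green}")
    for name, step in steps:
        try:
            step(green)
        except Exception as exc:
            # Drop Green; the Blue schema is never touched on this path.
            execute(f"DROP SCHEMA IF EXISTS {green} CASCADE")
            # Single quotes are doubled so the JSON stays a valid SQL literal.
            payload = json.dumps({"step": name, "error": str(exc)}).replace("'", "''")
            execute(
                "INSERT INTO msh_state_history (run_id, schema_name, status, metadata) "
                f"VALUES ('{run_id}', '{green}', 'failed', '{payload}')"
            )
            return False
    return True


def failing_test(schema):
    raise ValueError("row count mismatch")


sent = []  # collects the SQL instead of sending it to a real database
ok = run_pipeline(
    "a3f2b1c",
    [("ingest", lambda s: None), ("transform", lambda s: None), ("test", failing_test)],
    sent.append,
)
print(ok)
for stmt in sent:
    print(stmt)
```

Passing `execute` in as a callback keeps the sketch testable without a warehouse; in this run it records three statements (create, drop, log) and none that touch production views.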
Rollback Mechanism
Rollback is instant because the archived schema still exists:
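BEGIN TRANSACTION;
-- Repoint every production view to the archived schema
CREATE OR REPLACE VIEW analytics.customers AS
SELECT * FROM analytics_archive_9d8e7f6.customers;
CREATE OR REPLACE VIEW analytics.revenue AS
SELECT * FROM analytics_archive_9d8e7f6.revenue;
-- Update state (demoting the rolled-back run is assumed here so that only one run stays 'active')
UPDATE msh_state_history
SET status = 'archived'
WHERE run_id = 'a3f2b1c';
UPDATE msh_state_history
SET status = 'active'
WHERE run_id = '9d8e7f6';
COMMIT;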
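The state side of a rollback amounts to choosing the most recent archived run and flipping two status values in one transaction. The sketch below shows that bookkeeping against an in-memory SQLite table with a reduced set of columns. `rollback_state` is a hypothetical helper, and the rule "roll back to the newest archived run" is an assumption made for the example.

```python
import sqlite3


def rollback_state(conn):
    """Archive the active run and reactivate the newest archived run.

    Returns the reactivated run_id, or None if there is nothing to roll back to.
    """
    target = conn.execute(
        "SELECT run_id FROM msh_state_history "
        "WHERE status = 'archived' ORDER BY deployed_at DESC LIMIT 1"
    ).fetchone()
    if target is None:
        return None
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE msh_state_history SET status = 'archived' WHERE status = 'active'")
        conn.execute("UPDATE msh_state_history SET status = 'active' WHERE run_id = ?", target)
    return target[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msh_state_history ("
    "run_id VARCHAR PRIMARY KEY, deployed_at TIMESTAMP, status VARCHAR)"
)
# Illustrative rows only; the timestamps are made up.
conn.executemany(
    "INSERT INTO msh_state_history VALUES (?, ?, ?)",
    [
        ("9d8e7f6", "2024-01-01T00:00:00", "archived"),
        ("a3f2b1c", "2024-01-02T00:00:00", "active"),
    ],
)
print(rollback_state(conn))  # 9d8e7f6
print(conn.execute("SELECT run_id, status FROM msh_state_history ORDER BY run_id").fetchall())
```

Selecting the target before opening the write transaction means a database with no archived run is left untouched.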