The Atomic Architecture

msh unifies ingestion, transformation, and deployment into a single atomic workflow. This architecture ensures that your data warehouse is always in a consistent, valid state.

The Pipeline

The following diagram illustrates the lifecycle of a single asset run, from source to production view.

Key Components

1. The Engine (Ingest)

Schema Evolution: Automatically adapts to new columns in the source.
Normalization: Unnests JSON into relational tables.
Incremental Loading: Fetches only new data, using cursors to track progress.
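
The three behaviours above can be sketched in a few lines of Python. This is a minimal illustration, not msh's actual engine: the function names `flatten` and `load_increment`, the double-underscore column naming, and the `updated_at` cursor field are all assumptions made for the example.

```python
from typing import Any, Optional

def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Unnest a JSON-like dict into one level (hypothetical naming: parent__child)."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

def load_increment(records: list[dict[str, Any]], cursor_field: str,
                   last_cursor: Optional[str]):
    """Return flattened rows newer than last_cursor, plus the advanced cursor."""
    fresh = [r for r in records
             if last_cursor is None or r[cursor_field] > last_cursor]
    rows = [flatten(r) for r in fresh]
    new_cursor = max((r[cursor_field] for r in fresh), default=last_cursor)
    return rows, new_cursor

# Illustrative source data; the second record introduces a new nested field.
source = [
    {"id": 1, "updated_at": "2024-01-01", "user": {"name": "ada"}},
    {"id": 2, "updated_at": "2024-01-02", "user": {"name": "bob", "plan": "pro"}},
]

rows, cursor = load_increment(source, "updated_at", last_cursor="2024-01-01")

# Schema evolution: the column set is derived from the data that arrived.
columns = sorted({name for row in rows for name in row})
```

Only the second record is loaded because its cursor value is greater than the stored one, and the new `user__plan` column appears without any schema change being declared up front.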

2. The Transformer

Isolation: The Transformer reads only from the run-specific raw table (Raw Table _hash) created by the Engine, so concurrent runs cannot interfere with one another.
Compilation: Jinja templates are resolved to pure SQL.
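
As a rough sketch of how isolation and compilation fit together, the example below derives a per-run raw table name and substitutes it into a model template. To stay dependency-free it uses Python's `string.Template` as a stand-in for Jinja; the naming scheme, the `run_id` input, and the helper names are assumptions for illustration and are not taken from msh.

```python
import hashlib
from string import Template

def raw_table_name(asset: str, run_id: str) -> str:
    """Hypothetical naming scheme: raw_<asset>_<short hash of the run id>."""
    suffix = hashlib.sha1(run_id.encode()).hexdigest()[:8]
    return f"raw_{asset}_{suffix}"

def compile_sql(template: str, asset: str, run_id: str) -> str:
    """Resolve the template placeholder to this run's raw table, yielding plain SQL."""
    return Template(template).substitute(source=raw_table_name(asset, run_id))

# Stand-in for a Jinja model; $source plays the role of a template expression.
model = "SELECT id, amount FROM $source WHERE amount > 0"

sql_a = compile_sql(model, "orders", "run-a")
sql_b = compile_sql(model, "orders", "run-b")
```

Two runs of the same model compile to different SQL because each one reads from its own raw table, which is what keeps concurrent runs from seeing each other's data.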

3. The Lifecycle Manager

The Lifecycle Manager is the brain of msh. It orchestrates the Blue/Green Deployment in four steps:

Create: A new "Green" table (Model Table _a1b2) is built.
Test: Data quality tests run against this Green table.
Swap: If tests pass, the Production View is atomically updated to point to the Green table.
Cleanup: Old versions are retained for a configurable period for rollback, then dropped.
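
The Create, Test, and Swap steps can be illustrated with SQLite standing in for the warehouse. This is a sketch under stated assumptions, not msh's implementation: the `deploy` function, the table names, and the null-check test are hypothetical, and the Cleanup step (retention and dropping of old versions) is omitted.

```python
import sqlite3

def deploy(conn: sqlite3.Connection, view: str, green_table: str,
           build_sql: str, test_sql: str) -> bool:
    """Build a Green table, test it, and repoint the view only if tests pass."""
    # Create: build the new version next to whatever production points at.
    conn.execute(f"CREATE TABLE {green_table} AS {build_sql}")

    # Test: the query counts failing rows in the Green table.
    failures = conn.execute(test_sql.format(table=green_table)).fetchone()[0]
    if failures:
        conn.execute(f"DROP TABLE {green_table}")
        return False  # production is left untouched

    # Swap: replace the view inside one explicit transaction.
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP VIEW IF EXISTS {view}")
        conn.execute(f"CREATE VIEW {view} AS SELECT * FROM {green_table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return True

# isolation_level=None puts sqlite3 in autocommit mode so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

build = "SELECT id, amount FROM raw_orders"
no_null_amounts = "SELECT COUNT(*) FROM {table} WHERE amount IS NULL"

first = deploy(conn, "orders", "orders_a1b2", build, no_null_amounts)

# A bad row arrives; the next build fails its test and is never swapped in.
conn.execute("INSERT INTO raw_orders VALUES (3, NULL)")
second = deploy(conn, "orders", "orders_c3d4", build, no_null_amounts)

served_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the second, failing run, the `orders` view still serves the two rows from the first build, which is the guarantee the swap step is meant to provide.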

State Management

msh maintains a lightweight state in your destination warehouse, in the msh_state_history table. Because this "Remote State" lives in the warehouse rather than on the machine running msh, the application itself is stateless, which makes it well suited to ephemeral runners such as Kubernetes jobs or AWS Lambda.
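
The idea of remote state can be sketched as follows, again with SQLite standing in for the warehouse. Only the table name `msh_state_history` comes from the text above; its columns and the helper functions are assumptions made for illustration and may not match the real schema.

```python
import sqlite3
from typing import Optional

def ensure_state_table(conn: sqlite3.Connection) -> None:
    """Create the state table if missing (column layout is hypothetical)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS msh_state_history (
               asset TEXT NOT NULL,
               cursor_value TEXT,
               active_table TEXT,
               recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )

def record_run(conn: sqlite3.Connection, asset: str,
               cursor_value: str, active_table: str) -> None:
    """Append one row describing the outcome of a run."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO msh_state_history (asset, cursor_value, active_table) "
            "VALUES (?, ?, ?)",
            (asset, cursor_value, active_table),
        )

def latest_state(conn: sqlite3.Connection, asset: str) -> Optional[tuple]:
    """Return (cursor_value, active_table) from the most recent run, if any."""
    return conn.execute(
        "SELECT cursor_value, active_table FROM msh_state_history "
        "WHERE asset = ? ORDER BY rowid DESC LIMIT 1",
        (asset,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
ensure_state_table(conn)
record_run(conn, "orders", "2024-01-01", "orders_a1b2")
record_run(conn, "orders", "2024-01-02", "orders_c3d4")

state = latest_state(conn, "orders")
```

A fresh runner with no local disk would only need a warehouse connection to recover the last cursor and the currently active table before starting its next run.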
