The Atomic Architecture
msh unifies ingestion, transformation, and deployment into a single atomic workflow: a run is either fully published or leaves production untouched. This architecture ensures that your data warehouse only ever exposes data in a consistent, valid state.
The Pipeline
The following diagram illustrates the lifecycle of a single asset run, from source to production view.
Key Components
1. The Engine (Ingest)
Powered by dlt, the engine extracts data from APIs or databases. It handles:
- Schema Evolution: Automatically adapting to new columns.
- Normalization: Unnesting JSON into relational tables.
- Incremental Loading: Fetching only new data using cursors.
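To make two of these behaviours concrete, here is a minimal pure-Python sketch of cursor-based incremental loading combined with JSON unnesting. It does not use dlt's API; the function names `flatten` and `load_incrementally`, the `updated_at` cursor field, and the double-underscore column naming are illustrative assumptions.

```python
from typing import Any, Optional


def flatten(record: dict[str, Any], parent: str = "") -> dict[str, Any]:
    """Unnest nested dicts into flat column names joined by '__'."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent}__{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, column))
        else:
            flat[column] = value
    return flat


def load_incrementally(
    rows: list[dict[str, Any]], cursor_field: str, last_cursor: Optional[int]
) -> tuple[list[dict[str, Any]], Optional[int]]:
    """Return flattened rows newer than last_cursor, plus the advanced cursor."""
    fresh = [r for r in rows if last_cursor is None or r[cursor_field] > last_cursor]
    new_cursor = max((r[cursor_field] for r in fresh), default=last_cursor)
    return [flatten(r) for r in fresh], new_cursor


# Hypothetical source data: the second record is newer than the stored cursor.
source_rows = [
    {"id": 1, "updated_at": 1, "user": {"name": "ada"}},
    {"id": 2, "updated_at": 5, "user": {"name": "grace"}},
]

loaded, cursor = load_incrementally(source_rows, "updated_at", last_cursor=1)
reloaded, same_cursor = load_incrementally(source_rows, "updated_at", cursor)
```

The first call loads only the record with `updated_at` greater than the stored cursor; the second call, using the advanced cursor, loads nothing.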
2. The Transformer
Powered by dbt, the transformer compiles your SQL logic.
- Isolation: It reads from the specific Raw Table _hash created by the engine, ensuring isolation from other concurrent runs.
- Compilation: Jinja templates are resolved to pure SQL.
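The two ideas can be sketched together in a few lines of Python. This is only an illustration: the naming scheme in `raw_table_for` is a guess at how a per-run hash suffix might be derived, and `render` is a tiny regex-based stand-in for Jinja rendering, not dbt's compiler.

```python
import hashlib
import re


def raw_table_for(asset: str, run_id: str) -> str:
    """Hypothetical naming scheme: asset name plus a short hash of the run id."""
    digest = hashlib.sha256(f"{asset}:{run_id}".encode()).hexdigest()[:8]
    return f"raw_{asset}_{digest}"


def render(template: str, context: dict[str, str]) -> str:
    """Replace {{ name }} placeholders with values from context."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: context[m.group(1)], template)


sql_template = "SELECT id, amount FROM {{ source }} WHERE amount > 0"

# Two concurrent runs of the same asset read from different raw tables.
table_run_1 = raw_table_for("orders", "run-1")
table_run_2 = raw_table_for("orders", "run-2")

compiled = render(sql_template, {"source": table_run_1})
```

Because each run resolves the template against its own raw table, the compiled SQL of one run never touches data loaded by another.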
3. The Lifecycle Manager
The Lifecycle Manager is the brain of msh. It orchestrates the Blue/Green Deployment:
- Create: A new "Green" table (Model Table _a1b2) is built.
- Test: Data quality tests run against this Green table.
- Swap: If tests pass, the Production View is atomically updated to point to the Green table.
- Cleanup: Old versions are retained for a configurable period for rollback, then dropped.
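The Create, Test, and Swap steps can be sketched with Python's built-in sqlite3 module standing in for the warehouse. This is a simplified illustration under stated assumptions, not msh's implementation: the `deploy` function, the table names, and the convention that a test query returns a count of failing rows are all hypothetical, and the Cleanup step is omitted.

```python
import sqlite3


def deploy(conn, view, green_table, select_sql, test_queries):
    """Build a Green table, test it, and repoint the view only on success."""
    # Create: materialise the model into a new, uniquely named table.
    conn.execute(f"CREATE TABLE {green_table} AS {select_sql}")

    # Test: each query returns the number of failing rows in the Green table.
    for query in test_queries:
        failing_rows = conn.execute(query.format(table=green_table)).fetchall()[0][0]
        if failing_rows:
            conn.execute(f"DROP TABLE {green_table}")
            return False  # production view is left untouched

    # Swap: repoint the view inside one transaction so readers never see a gap.
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP VIEW IF EXISTS {view}")
        conn.execute(f"CREATE VIEW {view} AS SELECT * FROM {green_table}")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    return True


# isolation_level=None gives explicit control over BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
not_null_test = "SELECT COUNT(*) FROM {table} WHERE amount IS NULL"

first_ok = deploy(conn, "orders", "orders_a1b2", "SELECT * FROM raw_orders", [not_null_test])

# A bad row arrives; the next run fails its test and is never published.
conn.execute("INSERT INTO raw_orders VALUES (3, NULL)")
second_ok = deploy(conn, "orders", "orders_c3d4", "SELECT * FROM raw_orders", [not_null_test])
```

After the failed second run, the `orders` view still points at `orders_a1b2`, so consumers keep reading the last version that passed its tests.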
State Management
msh keeps a lightweight record of its state in your destination warehouse, in the msh_state_history table. Because of this "Remote State", msh itself holds no state on the application side, which makes it well suited to ephemeral runners such as Kubernetes jobs or AWS Lambda.
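The sketch below shows why remote state suits short-lived runners, using a SQLite file as a stand-in for the warehouse. Only the table name msh_state_history comes from the text above; its columns, and the helpers `record_deploy` and `active_table`, are hypothetical and chosen purely for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: the real columns of msh_state_history are not shown here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS msh_state_history (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    asset TEXT NOT NULL,
    active_table TEXT NOT NULL
)
"""


def record_deploy(conn, asset, table):
    """Append the table that now backs the production view for an asset."""
    conn.execute(SCHEMA)
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO msh_state_history (asset, active_table) VALUES (?, ?)",
            (asset, table),
        )


def active_table(conn, asset):
    """Return the most recently deployed table for an asset, or None."""
    conn.execute(SCHEMA)
    row = conn.execute(
        "SELECT active_table FROM msh_state_history "
        "WHERE asset = ? ORDER BY seq DESC LIMIT 1",
        (asset,),
    ).fetchone()
    return row[0] if row else None


warehouse_path = os.path.join(tempfile.mkdtemp(), "warehouse.db")

# First ephemeral runner deploys, then exits and keeps nothing locally.
runner_one = sqlite3.connect(warehouse_path)
record_deploy(runner_one, "orders", "orders_a1b2")
runner_one.close()

# A fresh runner recovers everything it needs from the warehouse itself.
runner_two = sqlite3.connect(warehouse_path)
resumed = active_table(runner_two, "orders")
runner_two.close()
```

The second runner shares nothing with the first except the warehouse, yet it can tell which table is currently live.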