The Atomic Architecture

msh unifies ingestion, transformation, and deployment into a single atomic workflow. Because the production view is repointed only after a new build passes its tests, consumers of your data warehouse never see a partially loaded or failed run.

The Pipeline

The following diagram illustrates the lifecycle of a single asset run, from source to production view.

[Figure: Pipeline Flow Diagram]

Key Components

1. The Engine (Ingest)

Powered by dlt, the engine extracts data from APIs or databases. It handles:

  • Schema Evolution: Automatically adapting to new columns.
  • Normalization: Unnesting JSON into relational tables.
  • Incremental Loading: Fetching only new data using cursors.
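The last two behaviours can be illustrated without dlt itself. The following is a minimal standard-library sketch, not dlt's implementation: the function names `unnest` and `fetch_incremental`, the `_parent_id` column, and the sample records are all hypothetical, and the `orders__items` child-table name only loosely imitates how nested data is typically split out.

```python
from typing import Any, Optional


def unnest(records: list[dict[str, Any]], parent: str, child_key: str) -> dict[str, list[dict]]:
    """Split nested records into a parent table and one child table.

    Each child row carries a `_parent_id` column (hypothetical name)
    that points back to the `id` of the parent row it came from.
    """
    child = f"{parent}__{child_key}"
    tables: dict[str, list[dict]] = {parent: [], child: []}
    for rec in records:
        # Parent row: every field except the nested list.
        tables[parent].append({k: v for k, v in rec.items() if k != child_key})
        # Child rows: one per nested item, linked to the parent.
        for item in rec.get(child_key, []):
            tables[child].append({"_parent_id": rec["id"], **item})
    return tables


def fetch_incremental(source: list[dict], cursor_field: str, last_value: Optional[str]):
    """Return only records newer than the stored cursor, plus the new cursor."""
    fresh = [r for r in source if last_value is None or r[cursor_field] > last_value]
    new_cursor = max((r[cursor_field] for r in fresh), default=last_value)
    return fresh, new_cursor


# Hypothetical API payload; ISO dates compare correctly as strings.
SOURCE = [
    {"id": 1, "updated_at": "2024-01-01", "items": [{"sku": "A"}, {"sku": "B"}]},
    {"id": 2, "updated_at": "2024-01-02", "items": [{"sku": "C"}]},
]

# The previous run stopped at 2024-01-01, so only record 2 is fetched.
fresh, cursor = fetch_incremental(SOURCE, "updated_at", "2024-01-01")
tables = unnest(fresh, "orders", "items")
```

In a real pipeline the cursor value would be persisted between runs so that the next run starts where this one ended.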

2. The Transformer

Powered by dbt, the transformer compiles your SQL logic.

  • Isolation: It reads only from the hash-suffixed raw table (Raw Table _hash) that the engine created for this run, so concurrent runs do not interfere with each other.
  • Compilation: Jinja templates are resolved to pure SQL.
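To make the two points above concrete, here is a small sketch of template resolution. It deliberately uses a regular expression as a stand-in for Jinja and is not how dbt compiles models; the `compile_sql` helper, the `raw_table` placeholder, and the `raw_orders_a1b2` table name are assumptions chosen for illustration.

```python
import re


def compile_sql(template: str, context: dict[str, str]) -> str:
    """Replace each `{{ name }}` placeholder with its value from `context`.

    A toy substitute for Jinja rendering: it supports plain variable
    placeholders only, and raises KeyError for an unknown name.
    """
    def lookup(match: re.Match) -> str:
        return context[match.group(1)]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lookup, template)


run_hash = "a1b2"  # hypothetical hash identifying this run's raw table
template = "SELECT id, amount FROM {{ raw_table }} WHERE amount > 0"

# Binding the placeholder to the run-specific table is what keeps this
# run from reading another run's data.
sql = compile_sql(template, {"raw_table": f"raw_orders_{run_hash}"})
```

The resulting `sql` string contains no template syntax and can be sent to the warehouse as-is.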

3. The Lifecycle Manager

The Lifecycle Manager is the brain of msh. It orchestrates the Blue/Green deployment in four steps:

  1. Create: A new "Green" table (Model Table _a1b2) is built.
  2. Test: Data quality tests run against this Green table.
  3. Swap: If tests pass, the Production View is atomically updated to point to the Green table.
  4. Cleanup: Old versions are retained for a configurable period for rollback, then dropped.
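The create, test, and swap steps can be sketched against an in-memory SQLite database. This is an illustration under stated assumptions, not msh's actual code: the `deploy` function, the table names, and the convention that a test query returns the count of failing rows are all hypothetical, and the cleanup step is omitted.

```python
import sqlite3


def deploy(conn, model: str, run_hash: str, select_sql: str, test_sql: str) -> bool:
    """Build a Green table, test it, and repoint the production view on success.

    `test_sql` must contain a `{table}` placeholder and return the number
    of failing rows; zero means the test passed.
    """
    green = f"{model}_{run_hash}"

    # 1. Create: build the new version next to whatever is live.
    conn.execute(f"CREATE TABLE {green} AS {select_sql}")

    # 2. Test: check the Green table before anyone can read it.
    failures = conn.execute(test_sql.format(table=green)).fetchone()[0]
    if failures:
        conn.execute(f"DROP TABLE {green}")
        return False  # production view is left untouched

    # 3. Swap: repoint the view inside one transaction.
    conn.execute("BEGIN")
    conn.execute(f"DROP VIEW IF EXISTS {model}")
    conn.execute(f"CREATE VIEW {model} AS SELECT * FROM {green}")
    conn.execute("COMMIT")
    return True


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT above fully controls the swap transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

select_sql = "SELECT id, amount FROM raw_orders"
not_null_test = "SELECT COUNT(*) FROM {table} WHERE amount IS NULL"

ok = deploy(conn, "orders", "a1b2", select_sql, not_null_test)
rows = conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()

# A later run with bad data fails its test and never reaches production.
conn.execute("INSERT INTO raw_orders VALUES (3, NULL)")
bad = deploy(conn, "orders", "c3d4", select_sql, not_null_test)
```

After the failed second run, the `orders` view still points at `orders_a1b2`, which is the property the swap step is meant to protect.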

State Management

msh keeps lightweight state in your destination warehouse, in the msh_state_history table. Because this "Remote State" lives in the warehouse, msh itself holds no state on the application side, which makes it well suited to ephemeral runners such as Kubernetes jobs or AWS Lambda.
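The idea of remote state can be sketched as follows, again with SQLite standing in for the warehouse. Only the table name `msh_state_history` comes from the text above; the columns, the `record_run` and `latest_state` helpers, and the sample values are assumptions, since the real schema is not described here.

```python
import sqlite3
from typing import Optional


def ensure_state_table(conn) -> None:
    """Create the state table if it is missing (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS msh_state_history (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               asset TEXT NOT NULL,
               run_hash TEXT NOT NULL,
               cursor_value TEXT
           )"""
    )


def record_run(conn, asset: str, run_hash: str, cursor_value: str) -> None:
    """Append one row describing a completed run."""
    ensure_state_table(conn)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO msh_state_history (asset, run_hash, cursor_value) VALUES (?, ?, ?)",
            (asset, run_hash, cursor_value),
        )


def latest_state(conn, asset: str) -> Optional[tuple]:
    """Return (run_hash, cursor_value) of the most recent run, or None."""
    ensure_state_table(conn)
    return conn.execute(
        "SELECT run_hash, cursor_value FROM msh_state_history "
        "WHERE asset = ? ORDER BY id DESC LIMIT 1",
        (asset,),
    ).fetchone()


conn = sqlite3.connect(":memory:")  # stand-in for the destination warehouse
record_run(conn, "orders", "a1b2", "2024-01-01")
record_run(conn, "orders", "c3d4", "2024-01-02")

# A fresh runner with no local files can recover everything it needs.
state = latest_state(conn, "orders")
```

Nothing in this sketch touches local disk, so a runner that starts from an empty container can pick up where the previous one finished by querying the warehouse.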