Migrating from dbt
If you are already using dbt, you don't need to rewrite your entire project to start using msh. msh is designed to wrap your existing SQL logic, adding the power of Atomic Data Engineering (Blue/Green deployment, Smart Ingest, and Rollbacks) without changing your transformation code.
The Wrapper Pattern
The easiest way to migrate is to use the "Wrapper Pattern". You keep your complex .sql files exactly as they are and create a lightweight .msh file to manage the lifecycle.
1. Existing dbt Model
Suppose you have a complex revenue model in models/revenue_mart.sql:
-- models/revenue_mart.sql
WITH transactions AS (
    SELECT * FROM {{ source('stripe', 'charges') }}
),
refunds AS (
    SELECT * FROM {{ source('stripe', 'refunds') }}
)
SELECT
    t.id,
    t.amount - COALESCE(r.amount, 0) AS net_amount,
    t.created_at
FROM transactions t
LEFT JOIN refunds r ON t.id = r.charge_id
2. Create the Wrapper
Create a file named models/revenue_mart.msh next to it:
name: revenue_mart
# Define where the data comes from (msh handles the ingest)
ingest:
  type: dlt
  source_name: stripe
  resource: charges
# Point to your existing SQL file
transform_file: revenue_mart.sql
# Add Atomic Lifecycle features
deployment:
  strategy: blue_green
tests:
  - not_null: id
  - unique: id
3. Run msh
msh run
What happens:
- Ingest: msh sees the ingest block and automatically loads the Stripe data into a raw table (e.g., raw_revenue_mart_a1b2).
- Compile: msh reads revenue_mart.sql, replaces {{ source(...) }} with the new raw table, and compiles it into a dbt model.
- Execute: msh runs dbt to build the model in a staging schema.
- Swap: Once tests pass, msh atomically swaps the view to point to the new table.
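To make the Compile step concrete, here is a minimal Python sketch of the kind of substitution it describes. It is not msh's actual implementation: the function name rewrite_sources, the regular expression, and the mapping from (source, table) pairs to raw table names are all assumptions made for illustration.

```python
import re

# Matches dbt-style {{ source('name', 'table') }} calls with either quote style.
SOURCE_CALL = re.compile(
    r"\{\{\s*source\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)


def rewrite_sources(sql: str, raw_tables: dict) -> str:
    """Replace each source() call with its mapped raw table name.

    Calls with no mapping are left untouched so they stay visible.
    """
    def substitute(match):
        key = (match.group(1), match.group(2))
        return raw_tables.get(key, match.group(0))

    return SOURCE_CALL.sub(substitute, sql)


sql = "SELECT * FROM {{ source('stripe', 'charges') }}"
# Hypothetical mapping produced by an ingest step.
compiled = rewrite_sources(sql, {("stripe", "charges"): "raw_revenue_mart_a1b2"})
print(compiled)  # SELECT * FROM raw_revenue_mart_a1b2
```

Leaving unmapped calls in place is a deliberate choice in this sketch: a source that was never ingested then fails loudly at build time instead of being silently rewritten.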
Benefits of Wrapping
- Zero Rewrites: Your SQL logic remains untouched.
- State Management: msh takes over the responsibility of managing table state (Blue/Green), so you never see "table not found" errors during updates.
- Unified Pipeline: Ingest and Transform are now coupled in a single atomic unit. If ingestion fails, transformation doesn't run. If transformation fails, the old data remains live.
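The "old data remains live" guarantee can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module purely as a stand-in warehouse; the functions swap_view and deploy, the table names, and the not-null check are hypothetical and do not reflect how msh is implemented internally.

```python
import sqlite3


def swap_view(conn, view, table):
    """Repoint the view at a new table inside one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP VIEW IF EXISTS {view}")
        conn.execute(f"CREATE VIEW {view} AS SELECT * FROM {table}")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise


def deploy(conn, view, new_table, build_sql, test_sql):
    """Build new_table, test it, and swap the view only if every step succeeds."""
    try:
        conn.execute(f"CREATE TABLE {new_table} AS {build_sql}")
        failures = conn.execute(test_sql.format(table=new_table)).fetchall()[0][0]
        if failures:
            raise ValueError(f"{failures} rows failed the test")
        swap_view(conn, view, new_table)
        return True
    except (sqlite3.Error, ValueError):
        # Discard the failed build; the view still points at the old table.
        conn.execute(f"DROP TABLE IF EXISTS {new_table}")
        return False


# isolation_level=None gives explicit control over BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE raw_charges (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO raw_charges VALUES (?, ?)", [(1, 100), (2, 250)])

BUILD = "SELECT id, amount FROM raw_charges"
NOT_NULL_ID = "SELECT COUNT(*) FROM {table} WHERE id IS NULL"

first = deploy(conn, "revenue_mart", "revenue_mart_blue", BUILD, NOT_NULL_ID)

# A bad row arrives; the next build fails its test and is never swapped in.
conn.execute("INSERT INTO raw_charges VALUES (NULL, 50)")
second = deploy(conn, "revenue_mart", "revenue_mart_green", BUILD, NOT_NULL_ID)

live_rows = conn.execute("SELECT COUNT(*) FROM revenue_mart").fetchall()[0][0]
print(first, second, live_rows)  # True False 2
```

Consumers only ever query the view, so a failed build changes nothing they can observe.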
Source Definitions (dbt-style)
msh supports dbt-style source definitions, allowing you to define sources once in msh.yaml and reference them from individual .msh files. This eliminates credential repetition and follows the DRY principle.
dbt sources.yml → msh.yaml
dbt approach (sources.yml):
# models/sources.yml
sources:
  - name: stripe
    description: Stripe payment data
    tables:
      - name: charges
      - name: customers
  - name: postgres
    description: Production database
    schema: public
    tables:
      - name: orders
      - name: customers
msh approach (msh.yaml):
# msh.yaml
sources:
  - name: stripe_api
    type: rest_api
    endpoint: "https://api.stripe.com"
    credentials:
      api_key: "${STRIPE_API_KEY}"
    resources:
      - name: charges
      - name: customers
  - name: prod_db
    type: sql_database
    credentials: "${DB_PROD_CREDENTIALS}"
    schema: public
    tables:
      - name: orders
      - name: customers
Referencing Sources
dbt approach:
-- models/stg_orders.sql
SELECT * FROM {{ source('postgres', 'orders') }}
msh approach:
# models/stg_orders.msh
ingest:
  source: prod_db
  table: orders
transform: |
  SELECT * FROM {{ source }}
Key Differences
| Feature | dbt | msh |
|---|---|---|
| Source Definition | sources.yml | msh.yaml |
| Credentials | Defined separately (profiles.yml) | Included in source definition |
| Environment Variables | {{ env_var('VAR_NAME') }} Jinja function | ${VAR_NAME} syntax |
| Source Types | Tables already loaded in the warehouse | SQL databases + REST APIs + more |
| Usage | {{ source('name', 'table') }} | source: name in ingest block |
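The ${VAR_NAME} syntax follows the familiar shell-style placeholder convention. As a rough illustration of how such placeholders can be resolved against the environment, the sketch below walks a parsed config and substitutes values using Python's standard string.Template. The function expand_config and the sample key are assumptions for illustration; msh's own resolution rules (defaults, escaping, error messages) are not described here and may differ.

```python
from string import Template


def expand_config(node, env):
    """Recursively substitute ${VAR_NAME} placeholders in a parsed config.

    Raises KeyError if a referenced variable is missing from env.
    """
    if isinstance(node, dict):
        return {key: expand_config(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_config(item, env) for item in node]
    if isinstance(node, str):
        return Template(node).substitute(env)
    return node


# A source entry as it might look after parsing msh.yaml.
source = {
    "name": "stripe_api",
    "type": "rest_api",
    "endpoint": "https://api.stripe.com",
    "credentials": {"api_key": "${STRIPE_API_KEY}"},
}

# In practice env would be os.environ; a plain dict keeps the example reproducible.
env = {"STRIPE_API_KEY": "sk_test_123"}  # placeholder value, not a real key
resolved = expand_config(source, env)
print(resolved["credentials"]["api_key"])  # sk_test_123
```

Failing on a missing variable, rather than substituting an empty string, surfaces a misconfigured environment before any connection attempt is made.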
Migration Path
- Identify Sources: List all sources used in your dbt project
- Create msh.yaml: Convert sources.yml entries to msh format
- Update .msh Files: Replace {{ source(...) }} with source: references
- Test: Verify all sources resolve correctly
- Gradual Migration: Both styles can coexist during migration
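For the first step, a short script can list every source a dbt project references. The sketch below is one possible approach using only the Python standard library; find_sources and inventory are hypothetical helper names, and the regular expression covers only the plain two-argument source('name', 'table') form, not calls built from Jinja variables.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches {{ source('name', 'table') }} with single or double quotes.
SOURCE_CALL = re.compile(
    r"\{\{\s*source\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)


def find_sources(sql):
    """Return the set of (source_name, table_name) pairs referenced in sql."""
    return set(SOURCE_CALL.findall(sql))


def inventory(models_dir):
    """Map each (source, table) pair to the .sql files under models_dir that use it."""
    usage = defaultdict(set)
    for path in sorted(Path(models_dir).rglob("*.sql")):
        for ref in find_sources(path.read_text()):
            usage[ref].add(str(path))
    return dict(usage)


example_sql = """
WITH transactions AS (
    SELECT * FROM {{ source('stripe', 'charges') }}
),
refunds AS (
    SELECT * FROM {{ source('stripe', 'refunds') }}
)
SELECT t.id FROM transactions t LEFT JOIN refunds r ON t.id = r.charge_id
"""
print(sorted(find_sources(example_sql)))
# [('stripe', 'charges'), ('stripe', 'refunds')]
```

Running inventory("models") from the project root would then give the list of sources that need an entry in msh.yaml, together with the models that depend on each one.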
Example: Complete Migration
Before (dbt):
# models/sources.yml
sources:
  - name: stripe
    tables:
      - name: charges
# models/stg_charges.sql
SELECT * FROM {{ source('stripe', 'charges') }}
After (msh):
# msh.yaml
sources:
  - name: stripe_api
    type: rest_api
    endpoint: "https://api.stripe.com"
    credentials:
      api_key: "${STRIPE_API_KEY}"
    resources:
      - name: charges
# models/stg_charges.msh
ingest:
  source: stripe_api
  resource: charges
transform: |
  SELECT * FROM {{ source }}
See msh.yaml Configuration Reference for complete documentation.