Quickstart

Get started with msh in under 10 minutes. This guide will walk you through installing msh, creating your first data pipeline, and viewing the results.

Prerequisites

  • Python 3.9 or higher
  • A destination database (Postgres, Snowflake, or DuckDB)
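
You can confirm your Python version before installing:

python3 --version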

Installation

Install msh via pip:

pip install msh-cli

Verify the installation:

msh --version

Project Setup

Create a new directory for your msh project:

mkdir my-msh-project
cd my-msh-project

Initialize the project:

msh init

What this creates:

my-msh-project/
├── .env # Environment variables (secrets)
├── models/ # Your .msh files go here
└── .gitignore # Pre-configured to exclude .env

Configure Your Destination

Edit the .env file to configure your destination database. For this quickstart, we'll use DuckDB (no setup required):

# .env
DESTINATION__DUCKDB__CREDENTIALS="duckdb:///my_data.duckdb"

For Postgres:

DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:password@localhost:5432/analytics"
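
Snowflake is also supported as a destination and follows the same DESTINATION__<NAME>__CREDENTIALS pattern. The connection-string format below is a sketch based on common Snowflake URL conventions; adjust the account identifier, database, warehouse, and role for your environment:

DESTINATION__SNOWFLAKE__CREDENTIALS="snowflake://user:password@account_identifier/analytics?warehouse=COMPUTE_WH&role=LOADER"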

Optional: Define Sources in msh.yaml

For larger projects, you can define sources once in msh.yaml and reference them from your .msh files:

# msh.yaml
sources:
  - name: jsonplaceholder
    type: rest_api
    endpoint: "https://jsonplaceholder.typicode.com"
    resources:
      - name: users
      - name: posts

Then reference in .msh files:

ingest:
  source: jsonplaceholder
  resource: users

See msh.yaml Configuration Reference for more details.


Create Your First Asset

Instead of writing YAML manually, use msh discover to automatically generate your asset configuration:

msh discover https://jsonplaceholder.typicode.com/users --name my_first_asset

What this does:

  • Probes the REST API endpoint
  • Discovers the schema (columns and data types)
  • Generates a complete .msh file with proper configuration
  • Creates schema contracts automatically

Example Output:

Detected source type: rest_api
Generated .msh configuration:
============================================================
name: my_first_asset
description: Auto-discovered from rest_api
ingest:
  type: rest_api
  endpoint: https://jsonplaceholder.typicode.com/users
  resource: data
contract:
  evolution: evolve
  enforce_types: true
  required_columns:
    - id
    - name
    - email
    - username
transform: |
  SELECT * FROM {{ source }}
============================================================

✓ Written to: models/my_first_asset.msh
You can now run: msh run my_first_asset

Customize the transformation:

Edit models/my_first_asset.msh to add your business logic:

name: my_first_asset
ingest:
  type: rest_api
  endpoint: https://jsonplaceholder.typicode.com/users
  resource: data

contract:
  evolution: evolve
  enforce_types: true
  required_columns:
    - id
    - name
    - email
    - username

transform: |
  SELECT
    id,
    name,
    email,
    UPPER(username) as username_upper
  FROM {{ source }}
  WHERE id <= 5

This asset:

  1. Ingests data from the JSONPlaceholder API (a free test API)
  2. Transforms it using SQL (selecting specific columns and uppercasing the username)
  3. Filters to only the first 5 users

Run Your First Pipeline

Execute the pipeline:

msh run

Expected Output:

msh Run v1.0.0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

[1/4] Initializing Green Schema (analytics_green_a3f2b1c)
[OK] Schema created

[2/4] Ingestion (dlt)
[OK] rest_api.users → raw_rest_api_users (10 rows)

[3/4] Transformation (dbt)
[OK] models/my_first_asset.msh → my_first_asset (5 rows)

[4/4] Blue/Green Deploy
[OK] Swapping analytics_blue ↔ analytics_green_a3f2b1c
[OK] Deployment complete

State saved to msh_state_history (run_id: a3f2b1c)

View the Results

Option 1: Query Directly

If using DuckDB:

duckdb my_data.duckdb
SELECT * FROM analytics.my_first_asset;
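
If you configured Postgres instead, you can run the same query with psql. The connection string below matches the earlier configuration example; the analytics schema name is an assumption and may differ in your setup:

psql "postgresql://user:password@localhost:5432/analytics" -c "SELECT * FROM analytics.my_first_asset;"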

Option 2: Use the Dashboard

Launch the msh UI:

msh ui

Open your browser to http://localhost:3000. You'll see:

  • Active Deployments: Your current pipeline state
  • Asset List: All your models with row counts
  • Lineage Graph: Visual representation of data flow (API → Raw → Model)

What Just Happened?

  1. Ingestion: msh used dlt to fetch data from the JSONPlaceholder API
  2. Smart Ingest: Because your SQL only selected id, name, email, and username, msh only fetched those fields (not all 10+ fields available)
  3. Transformation: Your SQL ran in a temporary "Green" schema
  4. Blue/Green Swap: After validation, msh atomically swapped the production view to point to the new data (see the sketch after this list)
  5. State Tracking: The deployment was recorded in msh_state_history for rollback capability
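
Conceptually, the Blue/Green swap in step 4 boils down to atomically repointing the production object at the freshly built green schema, so readers never see a half-built table. The SQL below is only an illustrative sketch of that idea, using the schema names from the run output above; it is not msh's actual internals:

-- Illustrative only: repoint the production view at the new green schema
CREATE OR REPLACE VIEW analytics.my_first_asset AS
SELECT * FROM analytics_green_a3f2b1c.my_first_asset;

-- The deployment record that enables rollback lives in msh_state_history
SELECT * FROM msh_state_history;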

Next Steps

Add More Assets

Use msh discover to create another asset:

msh discover https://jsonplaceholder.typicode.com/users --name active_users

Then edit models/active_users.msh to customize the transformation:

name: active_users
ingest:
  type: rest_api
  endpoint: https://jsonplaceholder.typicode.com/users
  resource: data

transform: |
  SELECT
    id,
    name,
    email
  FROM {{ source }}
  WHERE email LIKE '%biz'

Run again:

msh run

Test Rollback

Make a breaking change to my_first_asset.msh (e.g., reference a column that doesn't exist), then run:

msh run       # This will fail
msh rollback  # Instantly revert to the previous working state
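
For reference, a breaking transform can be as simple as selecting a column that doesn't exist; the column name below is deliberately invalid:

transform: |
  SELECT
    id,
    column_that_does_not_exist
  FROM {{ source }}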

Explore Commands

msh discover <url>    # Auto-discover and generate .msh files
msh sample <asset>    # Preview data from assets
msh doctor            # Check your environment health
msh plan              # Preview changes without executing
msh lineage           # View the dependency graph

Preview Your Data

Use msh sample to quickly preview data:

# Preview latest data
msh sample my_first_asset

# Check raw data
msh sample my_first_asset --source raw

# Create test dataset
msh sample my_first_asset --size 100

Troubleshooting

"Address already in use" when running msh ui

Port 3000 is already in use. Kill the existing process:

lsof -i :3000
kill -9 <PID>

"No module named 'dlt'"

Install the missing dependency:

pip install dlt

API Connection Errors

Check your internet connection and verify the API endpoint is accessible:

curl https://jsonplaceholder.typicode.com/users

What's Next?