Quickstart
Get started with msh in under 10 minutes. This guide will walk you through installing msh, creating your first data pipeline, and viewing the results.
Prerequisites
- Python 3.9 or higher
- A destination database (Postgres, Snowflake, or DuckDB)
Installation
Install msh via pip:
pip install msh-cli
Verify the installation:
msh --version
Project Setup
Create a new directory for your msh project:
mkdir my-msh-project
cd my-msh-project
Initialize the project:
msh init
What this creates:
my-msh-project/
├── .env # Environment variables (secrets)
├── models/ # Your .msh files go here
└── .gitignore # Pre-configured to exclude .env
Configure Your Destination
Edit the .env file to configure your destination database. For this quickstart, we'll use DuckDB (no setup required):
# .env
DESTINATION__DUCKDB__CREDENTIALS="duckdb:///my_data.duckdb"
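The double-underscore naming is a layered-config convention (as used by dlt-style tools): each `__` separates one level of the configuration path. A minimal sketch of how such a key can be interpreted — illustrative only, not msh's actual config loader:

```python
import os

def parse_layered_key(key: str) -> list[str]:
    """Split a layered env var name like DESTINATION__DUCKDB__CREDENTIALS
    into its lowercase configuration path."""
    return [part.lower() for part in key.split("__")]

# Simulate what the .env file would export
os.environ["DESTINATION__DUCKDB__CREDENTIALS"] = "duckdb:///my_data.duckdb"

path = parse_layered_key("DESTINATION__DUCKDB__CREDENTIALS")
print(path)  # ['destination', 'duckdb', 'credentials']
```

So the variable above reads as: under `destination`, for the `duckdb` backend, set `credentials`.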
For Postgres:
DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:password@localhost:5432/analytics"
Optional: Define Sources in msh.yaml
For larger projects, you can define sources once in msh.yaml and reference them from your .msh files:
# msh.yaml
sources:
  - name: jsonplaceholder
    type: rest_api
    endpoint: "https://jsonplaceholder.typicode.com"
    resources:
      - name: users
      - name: posts
Then reference in .msh files:
ingest:
  source: jsonplaceholder
  resource: users
See msh.yaml Configuration Reference for more details.
Create Your First Asset
Instead of writing YAML manually, use msh discover to automatically generate your asset configuration:
msh discover https://jsonplaceholder.typicode.com/users --name my_first_asset
What this does:
- Probes the REST API endpoint
- Discovers the schema (columns and data types)
- Generates a complete .msh file with proper configuration
- Creates schema contracts automatically
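At its core, discovery means sampling records from the endpoint and inferring a column type for each field. A simplified sketch of that idea — the function and type-mapping rules here are hypothetical, not msh internals:

```python
def infer_schema(records: list[dict]) -> dict[str, str]:
    """Infer a column -> type mapping from sample JSON records."""
    type_names = {bool: "boolean", int: "bigint", float: "double", str: "text"}
    schema: dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            if value is None:
                continue  # can't infer a type from null; wait for a later record
            schema.setdefault(column, type_names.get(type(value), "json"))
    return schema

# A couple of records shaped like JSONPlaceholder /users responses
sample = [
    {"id": 1, "name": "Leanne Graham", "email": "Sincere@april.biz"},
    {"id": 2, "name": "Ervin Howell", "email": "Shanna@melissa.tv"},
]
print(infer_schema(sample))  # {'id': 'bigint', 'name': 'text', 'email': 'text'}
```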
Example Output:
Detected source type: rest_api
Generated .msh configuration:
============================================================
name: my_first_asset
description: Auto-discovered from rest_api
ingest:
  type: rest_api
  endpoint: https://jsonplaceholder.typicode.com/users
  resource: data
contract:
  evolution: evolve
  enforce_types: true
  required_columns:
    - id
    - name
    - email
    - username
transform: |
  SELECT * FROM {{ source }}
============================================================
✓ Written to: models/my_first_asset.msh
You can now run: msh run my_first_asset
Customize the transformation:
Edit models/my_first_asset.msh to add your business logic:
name: my_first_asset
ingest:
  type: rest_api
  endpoint: https://jsonplaceholder.typicode.com/users
  resource: data
contract:
  evolution: evolve
  enforce_types: true
  required_columns:
    - id
    - name
    - email
    - username
transform: |
  SELECT
    id,
    name,
    email,
    UPPER(username) AS username_upper
  FROM {{ source }}
  WHERE id <= 5
This asset:
- Ingests data from the JSONPlaceholder API (a free test API)
- Transforms it using SQL (selecting specific columns and uppercasing the username)
- Filters to only the first 5 users
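Because the transform is plain SQL, you can sanity-check its logic against any engine. Here it runs against an in-memory SQLite database standing in for the ingested users table (the sample rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT, username TEXT)")
# Ten fake users standing in for the JSONPlaceholder response
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(i, f"user{i}", f"user{i}@example.com", f"handle{i}") for i in range(1, 11)],
)

rows = conn.execute(
    "SELECT id, name, email, UPPER(username) AS username_upper "
    "FROM users WHERE id <= 5"
).fetchall()
print(len(rows))  # 5 -- only the first five users survive the filter
```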
Run Your First Pipeline
Execute the pipeline:
msh run
Expected Output:
msh Run v1.0.0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[1/4] Initializing Green Schema (analytics_green_a3f2b1c)
[OK] Schema created
[2/4] Ingestion (dlt)
[OK] rest_api.users → raw_rest_api_users (10 rows)
[3/4] Transformation (dbt)
[OK] models/my_first_asset.msh → my_first_asset (5 rows)
[4/4] Blue/Green Deploy
[OK] Swapping analytics_blue ↔ analytics_green_a3f2b1c
[OK] Deployment complete
State saved to msh_state_history (run_id: a3f2b1c)
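The blue/green pattern builds new data in a staging ("green") schema, validates it, then atomically repoints production at it, so readers never see a half-written table. A toy illustration of the swap using a SQLite view — this shows the idea, not msh's actual mechanism:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "blue" holds the live data; "green" holds the freshly built candidate
conn.execute("CREATE TABLE users_blue (id INTEGER)")
conn.execute("CREATE TABLE users_green (id INTEGER)")
conn.execute("INSERT INTO users_blue VALUES (1)")
conn.executemany("INSERT INTO users_green VALUES (?)", [(1,), (2,)])
# Production reads always go through a view
conn.execute("CREATE VIEW users AS SELECT * FROM users_blue")

def swap_to(conn: sqlite3.Connection, target: str) -> None:
    """Repoint the production view at the target table."""
    conn.executescript(
        f"DROP VIEW users; CREATE VIEW users AS SELECT * FROM {target};"
    )

swap_to(conn, "users_green")
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2 -- production now serves the green data
```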
View the Results
Option 1: Query Directly
If using DuckDB:
duckdb my_data.duckdb
SELECT * FROM analytics.my_first_asset;
Option 2: Use the Dashboard
Launch the msh UI:
msh ui
Open your browser to http://localhost:3000. You'll see:
- Active Deployments: Your current pipeline state
- Asset List: All your models with row counts
- Lineage Graph: Visual representation of data flow (API → Raw → Model)
What Just Happened?
- Ingestion: msh used dlt to fetch data from the JSONPlaceholder API
- Smart Ingest: Because your SQL only selected id, name, email, and username, msh fetched only those fields (not all 10+ fields available)
- Transformation: Your SQL ran in a temporary "Green" schema
- Blue/Green Swap: After validation, msh atomically swapped the production view to point to the new data
- State Tracking: The deployment was recorded in msh_state_history for rollback capability
Next Steps
Add More Assets
Use msh discover to create another asset:
msh discover https://jsonplaceholder.typicode.com/users --name active_users
Then edit models/active_users.msh to customize the transformation:
name: active_users
ingest:
  type: rest_api
  endpoint: https://jsonplaceholder.typicode.com/users
  resource: data
transform: |
  SELECT
    id,
    name,
    email
  FROM {{ source }}
  WHERE email LIKE '%biz'
Run again:
msh run
Test Rollback
Make a breaking change to my_first_asset.msh (e.g., reference a column that doesn't exist), then run:
msh run # This will fail
msh rollback # Instantly revert to the previous working state
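Rollback is possible because every successful deployment is recorded; reverting just repoints production at the previously recorded state. A minimal sketch of that bookkeeping — the class and its shape are hypothetical, not the msh_state_history format:

```python
class StateHistory:
    """Track deployed run ids; rollback returns to the previous one."""

    def __init__(self) -> None:
        self.deployments: list[str] = []

    def deploy(self, run_id: str) -> None:
        self.deployments.append(run_id)

    def rollback(self) -> str:
        if len(self.deployments) < 2:
            raise RuntimeError("no earlier state to roll back to")
        self.deployments.pop()       # discard the newest (broken) state
        return self.deployments[-1]  # production now points here

history = StateHistory()
history.deploy("a3f2b1c")   # the working run from earlier
history.deploy("broken99")  # the breaking change
print(history.rollback())   # a3f2b1c
```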
Explore Commands
msh discover <url> # Auto-discover and generate .msh files
msh sample <asset> # Preview data from assets
msh doctor # Check your environment health
msh plan # Preview changes without executing
msh lineage # View the dependency graph
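A lineage graph is a dependency graph over assets, and execution order falls out of a topological sort. A small sketch using the standard library's `graphlib` (Python 3.9+), with asset names taken from this quickstart; the structure is illustrative:

```python
from graphlib import TopologicalSorter

# Each asset maps to the upstreams it reads from (API -> Raw -> Model)
lineage = {
    "raw_rest_api_users": set(),
    "my_first_asset": {"raw_rest_api_users"},
    "active_users": {"raw_rest_api_users"},
}

order = list(TopologicalSorter(lineage).static_order())
print(order[0])  # raw_rest_api_users -- raw data must land before models run
```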
Preview Your Data
Use msh sample to quickly preview data:
# Preview latest data
msh sample my_first_asset
# Check raw data
msh sample my_first_asset --source raw
# Create test dataset
msh sample my_first_asset --size 100
Troubleshooting
"Address already in use" when running msh ui
Port 3000 is already in use. Kill the existing process:
lsof -i :3000
kill -9 <PID>
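Before killing anything, you can confirm whether the port is actually occupied using only Python's standard library:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.connect_ex((host, port)) == 0

print(port_in_use(3000))
```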
"No module named 'dlt'"
Install the required dependencies:
pip install dlt
API Connection Errors
Check your internet connection and verify the API endpoint is accessible:
curl https://jsonplaceholder.typicode.com/users
What's Next?
- Universal Connectivity: Connect to real data sources (Stripe, Salesforce, Postgres)
- Smart Ingest: Learn how msh optimizes data fetching
- Polyglot Transforms: Mix Python and SQL for advanced transformations
- Production Deployment: Deploy msh in CI/CD