Universal Connectivity
msh is designed to be destination-agnostic. You can switch your underlying data warehouse from Postgres to Snowflake (or vice versa) with a single configuration change.
Secrets Management
All secrets are managed via environment variables or a .env file. msh follows the dlt convention for naming credentials, which means they are automatically injected into the pipeline without additional configuration.
The .env File
Create a .env file in your project root:
# .env
STRIPE_API_KEY=sk_live_abc123xyz
SALESFORCE_USERNAME=user@company.com
SALESFORCE_PASSWORD=secure_password
SALESFORCE_SECURITY_TOKEN=token123
# Destination credentials
DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:pass@localhost:5432/msh_db"
msh automatically loads this file and makes the variables available to both the orchestration layer (for schema swaps) and the ingestion layer (for API calls).
Never commit .env files to version control. Add .env to your .gitignore.
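For example, a minimal .gitignore entry could look like this (the .env.* pattern is an optional addition that also covers environment-specific variants such as .env.dev):
# .gitignore
.env
.env.*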
Destination Configuration
The destination is where your data lands. msh supports any database that dlt and dbt support, including Postgres, Snowflake, BigQuery, Redshift, and DuckDB.
Postgres
Set the following environment variables. The DESTINATION__... variable is read by dlt/msh (ingestion and orchestration), while the POSTGRES_... variables are read by dbt (transformation).
# dlt / msh (Ingestion & Orchestration)
DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:password@host:port/database"
# dbt (Transformation)
POSTGRES_HOST="host"
POSTGRES_USER="user"
POSTGRES_PASSWORD="password"
POSTGRES_PORT="5432"
POSTGRES_DB="database"
Snowflake
For Snowflake, you need both the dlt credentials and the dbt environment variables.
Important Notes:
- Account name should be just the identifier (e.g., xyz123), not the full URL
- Schema names are automatically converted to uppercase (Snowflake convention)
- Schema names are validated and sanitized (max 255 chars, recommended 63)
- In dev mode, git branch suffixes are automatically appended to schema names
# dlt / msh (Ingestion & Orchestration)
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE="ANALYTICS"
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD="secure_password"
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME="MSH_USER"
DESTINATION__SNOWFLAKE__CREDENTIALS__ACCOUNT="xyz123" # or HOST="xyz123.snowflakecomputing.com"
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE="COMPUTE_WH"
DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE="TRANSFORMER"
# dbt (Transformation)
SNOWFLAKE_ACCOUNT="xyz123"
SNOWFLAKE_USER="MSH_USER"
SNOWFLAKE_PASSWORD="secure_password"
SNOWFLAKE_ROLE="TRANSFORMER"
SNOWFLAKE_DATABASE="ANALYTICS"
SNOWFLAKE_WAREHOUSE="COMPUTE_WH"
Connection String Format: msh automatically builds the Snowflake connection string in the format:
snowflake://user:password@account/database?warehouse=warehouse&role=role&schema=schema
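For instance, using the example values above and assuming a default PUBLIC schema, the generated string would be:
snowflake://MSH_USER:secure_password@xyz123/ANALYTICS?warehouse=COMPUTE_WH&role=TRANSFORMER&schema=PUBLIC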
Schema Naming Best Practices:
- Use uppercase names (automatically enforced)
- Keep names under 63 characters (recommended)
- Avoid special characters (automatically sanitized)
- In dev mode, branch names are appended: PUBLIC_feature_branch
Switching Destinations
To switch from Postgres to Snowflake:
- Update your .env file with the new destination credentials.
- Run msh init to initialize the internal state in the new destination.
- Run msh run to deploy your pipeline.
msh handles the dialect differences for you. Standard SQL transformations will work on both platforms without modification.
msh automatically detects the destination type based on the environment variables you provide. You don't need to explicitly set a DESTINATION_TYPE variable.
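As a rough sketch (placeholder values; it assumes the Snowflake variables from the section above are already filled in), the switch might look like this:
# .env — comment out the Postgres destination...
# DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:pass@localhost:5432/msh_db"
# ...and add the Snowflake dlt credentials plus the SNOWFLAKE_* dbt variables shown above
DESTINATION__SNOWFLAKE__CREDENTIALS__ACCOUNT="xyz123"
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME="MSH_USER"
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD="secure_password"
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE="ANALYTICS"
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE="COMPUTE_WH"
DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE="TRANSFORMER"
# Then re-initialize state in the new destination and deploy
msh init
msh run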
Source Configuration
msh supports two approaches for defining sources:
- Source Definitions (Recommended): Define sources once in msh.yaml and reference them from .msh files
- Direct Credentials: Define credentials directly in each .msh file
Both approaches work and can be mixed in the same project. Source definitions follow the DRY principle and are recommended for projects with multiple assets using the same sources.
Source Definitions (Recommended)
Define sources once in msh.yaml and reference them from your .msh files:
Define in msh.yaml:
# msh.yaml
sources:
  - name: prod_db
    type: sql_database
    credentials: "${DB_PROD_CREDENTIALS}"
    schema: public
    tables:
      - name: orders
      - name: customers
  - name: stripe_api
    type: rest_api
    endpoint: "https://api.stripe.com"
    credentials:
      api_key: "${STRIPE_API_KEY}"
    resources:
      - name: customers
      - name: charges
Reference in .msh files:
# models/stg_orders.msh
ingest:
  source: prod_db
  table: orders
transform: |
  SELECT * FROM {{ source }}

# models/stg_stripe_customers.msh
ingest:
  source: stripe_api
  resource: customers
transform: |
  SELECT
    id,
    email,
    created_at
  FROM {{ source }}
  WHERE status = 'active'
Benefits:
- ✅ Change credentials once, update everywhere
- ✅ Clear source catalog in one place
- ✅ Environment variables for sensitive data
- ✅ Eliminates credential repetition
See msh.yaml Configuration Reference for complete documentation.
Direct Credentials (Alternative)
You can also define credentials directly in each .msh file:
API Sources:
For API sources, reference the credentials using ${ENV_VAR} syntax:
# models/stripe_customers.msh
source:
  type: dlt
  source_name: stripe
  resource: customers
  credentials:
    api_key: ${STRIPE_API_KEY}
sql: |
  SELECT
    id,
    email,
    created_at
  FROM source('stripe', 'customers')
  WHERE status = 'active'
Database Sources (DB-to-DB):
For database-to-database replication, use the sql_database source type:
# models/legacy_users.msh
source:
  type: dlt
  source_name: sql_database
  credentials: ${SOURCE__POSTGRES__CREDENTIALS}
  table: users
  schema: public
sql: |
  SELECT
    user_id,
    email,
    signup_date
  FROM source('sql_database', 'users')
Environment Variable:
SOURCE__POSTGRES__CREDENTIALS="postgresql://readonly_user:pass@legacy-db.company.com:5432/production"
When to Use Each Approach
Use Source Definitions (msh.yaml) when:
- You have multiple .msh files using the same source
- You want a centralized source catalog
- You're building a layered project (staging → intermediate → marts)
- You want to follow DRY principles
Use Direct Credentials when:
- You have a single asset using a unique source
- You're prototyping or testing
- You need source-specific configuration that doesn't fit the source definition pattern
Both approaches can coexist in the same project, allowing gradual migration from direct credentials to source definitions.
Troubleshooting
Snowflake-Specific Issues
If you encounter errors with Snowflake, see the Snowflake Troubleshooting Guide for detailed solutions to common problems including:
- Warehouse suspension errors
- Connection timeouts
- Authentication failures
- Schema name issues
- Quota exceeded errors
Quick Checks:
- Verify all required environment variables are set (run env | grep SNOWFLAKE)
- Ensure your warehouse is running (not suspended) in the Snowflake UI
- Check user permissions (CREATE SCHEMA, CREATE TABLE, CREATE VIEW)
- Run with the --debug flag for detailed error messages: msh run --debug
General Connection Issues
Problem: Connection fails immediately
- Check that all required environment variables are set
- Verify credentials are correct (test manually if possible)
- Check network connectivity to the destination
Problem: Connection timeout
- Verify the destination is accessible from your network
- Check firewall rules and IP whitelisting
- For Snowflake, ensure warehouse is running
Problem: Authentication errors
- Double-check username and password (case-sensitive)
- Verify account/database names match exactly
- Check user permissions and role assignments
Best Practices
- Use Read-Only Credentials for Sources: When connecting to production databases, always use a read-only user to prevent accidental writes.
- Separate Environments: Use different .env files for development, staging, and production (e.g., .env.dev, .env.prod); see the sketch after this list.
- Rotate Secrets Regularly: API keys and database passwords should be rotated periodically and updated in your secret management system.
- Test Connections First: Use msh run --dry-run to verify connections before running full pipelines.
- Monitor Warehouse Status: For Snowflake, set up alerts for warehouse suspension and monitor compute usage.
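One common pattern for juggling environment files (a generic convention, not an msh-specific feature) is to keep one file per environment and copy the active one into place before running:
# activate the production credentials, then verify before a full run
cp .env.prod .env
msh run --dry-run
msh run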