Universal Connectivity

msh is designed to be destination-agnostic. You can switch your underlying data warehouse from Postgres to Snowflake (or vice versa) with a single configuration change.

Secrets Management

All secrets are managed via environment variables or a .env file. msh follows the dlt convention for naming credentials, which means they are automatically injected into the pipeline without additional configuration.

The .env File

Create a .env file in your project root:

# .env
STRIPE_API_KEY=sk_live_abc123xyz
SALESFORCE_USERNAME=user@company.com
SALESFORCE_PASSWORD=secure_password
SALESFORCE_SECURITY_TOKEN=token123

# Destination credentials
DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:pass@localhost:5432/msh_db"

msh automatically loads this file and makes the variables available to both the orchestration layer (for schema swaps) and the ingestion layer (for API calls).

info

Never commit .env files to version control. Add .env to your .gitignore.
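
For example, a minimal .gitignore entry (also covering environment-specific variants such as .env.prod, mentioned under Best Practices below):

# .gitignore
.env
.env.*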


Destination Configuration

The destination is where your data lands. msh supports any database that dlt and dbt support, including Postgres, Snowflake, BigQuery, Redshift, and DuckDB.
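
Postgres and Snowflake are covered in detail below. Other destinations follow the same dlt naming convention; for example, a local DuckDB destination would presumably take a single credential string like the one below (an assumption based on dlt's DESTINATION__<NAME>__CREDENTIALS pattern, not an msh-specific guarantee):

# .env (assumed)
DESTINATION__DUCKDB__CREDENTIALS="duckdb:///path/to/local.duckdb"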

Postgres

Set the following environment variables. The DESTINATION__... credential string is read by dlt/msh (ingestion and orchestration), while the POSTGRES_... variables are read by dbt (transformation).

# dlt / msh (Ingestion & Orchestration)
DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:password@host:port/database"

# dbt (Transformation)
POSTGRES_HOST="host"
POSTGRES_USER="user"
POSTGRES_PASSWORD="password"
POSTGRES_PORT="5432"
POSTGRES_DB="database"
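
The POSTGRES_* variables are consumed by dbt during the transformation step. As a rough sketch, they map onto a standard dbt profiles.yml via env_var() roughly like this (illustrative only; the profile name msh, the target schema, and the thread count are assumptions, not necessarily what msh uses internally):

# profiles.yml (sketch)
msh:
  target: prod
  outputs:
    prod:
      type: postgres
      host: "{{ env_var('POSTGRES_HOST') }}"
      user: "{{ env_var('POSTGRES_USER') }}"
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
      port: "{{ env_var('POSTGRES_PORT') | as_number }}"
      dbname: "{{ env_var('POSTGRES_DB') }}"
      schema: public
      threads: 4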

Snowflake

For Snowflake, you need both the dlt credentials and the dbt environment variables.

Important Notes:

  • Account name should be just the identifier (e.g., xyz123), not the full URL
  • Schema names are automatically converted to uppercase (Snowflake convention)
  • Schema names are validated and sanitized (max 255 chars, recommended 63)
  • In dev mode, git branch suffixes are automatically appended to schema names

# dlt / msh (Ingestion & Orchestration)
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE="ANALYTICS"
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD="secure_password"
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME="MSH_USER"
DESTINATION__SNOWFLAKE__CREDENTIALS__ACCOUNT="xyz123" # or HOST="xyz123.snowflakecomputing.com"
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE="COMPUTE_WH"
DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE="TRANSFORMER"

# dbt (Transformation)
SNOWFLAKE_ACCOUNT="xyz123"
SNOWFLAKE_USER="MSH_USER"
SNOWFLAKE_PASSWORD="secure_password"
SNOWFLAKE_ROLE="TRANSFORMER"
SNOWFLAKE_DATABASE="ANALYTICS"
SNOWFLAKE_WAREHOUSE="COMPUTE_WH"

Connection String Format: msh automatically builds the Snowflake connection string in the format:

snowflake://user:password@account/database?warehouse=warehouse&role=role&schema=schema
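
With the example values above, the generated string would be:

snowflake://MSH_USER:secure_password@xyz123/ANALYTICS?warehouse=COMPUTE_WH&role=TRANSFORMER&schema=PUBLIC

(PUBLIC stands in here for whatever schema your pipeline targets.)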

Schema Naming Best Practices:

  • Use uppercase names (automatically enforced)
  • Keep names under 63 characters (recommended)
  • Avoid special characters (automatically sanitized)
  • In dev mode, branch names are appended: PUBLIC_feature_branch

Switching Destinations

To switch from Postgres to Snowflake:

  1. Update your .env file with the new destination credentials.
  2. Run msh init to initialize the internal state in the new destination.
  3. Run msh run to deploy your pipeline.

msh handles the dialect differences for you. Standard SQL transformations will work on both platforms without modification.
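
For example, the .env change amounts to swapping the destination credential block (values taken from the examples above):

# Before: Postgres
# DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:password@host:port/database"

# After: Snowflake
DESTINATION__SNOWFLAKE__CREDENTIALS__ACCOUNT="xyz123"
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME="MSH_USER"
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD="secure_password"
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE="ANALYTICS"
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE="COMPUTE_WH"
DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE="TRANSFORMER"

# Remember to update the dbt-side SNOWFLAKE_* variables as well (see the Snowflake section above).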

note

msh automatically detects the destination type based on the environment variables you provide. You don't need to explicitly set a DESTINATION_TYPE variable.


Source Configuration

msh supports two approaches for defining sources:

  1. Source Definitions (Recommended): Define sources once in msh.yaml and reference them from .msh files
  2. Direct Credentials: Define credentials directly in each .msh file

Both approaches work and can be mixed in the same project. Source definitions follow the DRY principle and are recommended for projects with multiple assets using the same sources.

Source Definitions (Recommended)

Define sources once in msh.yaml and reference them from your .msh files.

Define in msh.yaml:

# msh.yaml
sources:
  - name: prod_db
    type: sql_database
    credentials: "${DB_PROD_CREDENTIALS}"
    schema: public
    tables:
      - name: orders
      - name: customers

  - name: stripe_api
    type: rest_api
    endpoint: "https://api.stripe.com"
    credentials:
      api_key: "${STRIPE_API_KEY}"
    resources:
      - name: customers
      - name: charges

Reference in .msh files:

# models/stg_orders.msh
ingest:
  source: prod_db
  table: orders

transform: |
  SELECT * FROM {{ source }}

# models/stg_stripe_customers.msh
ingest:
  source: stripe_api
  resource: customers

transform: |
  SELECT
    id,
    email,
    created_at
  FROM {{ source }}
  WHERE status = 'active'

Benefits:

  • ✅ Change credentials once, update everywhere
  • ✅ Clear source catalog in one place
  • ✅ Environment variables for sensitive data
  • ✅ Eliminates credential repetition

See msh.yaml Configuration Reference for complete documentation.

Direct Credentials (Alternative)

You can also define credentials directly in each .msh file:

API Sources:

For API sources, reference the credentials using ${ENV_VAR} syntax:

# models/stripe_customers.msh
source:
  type: dlt
  source_name: stripe
  resource: customers
  credentials:
    api_key: ${STRIPE_API_KEY}

sql: |
  SELECT
    id,
    email,
    created_at
  FROM source('stripe', 'customers')
  WHERE status = 'active'

Database Sources (DB-to-DB):

For database-to-database replication, use the sql_database source type:

# models/legacy_users.msh
source:
  type: dlt
  source_name: sql_database
  credentials: ${SOURCE__POSTGRES__CREDENTIALS}
  table: users
  schema: public

sql: |
  SELECT
    user_id,
    email,
    signup_date
  FROM source('sql_database', 'users')

Environment Variable:

SOURCE__POSTGRES__CREDENTIALS="postgresql://readonly_user:pass@legacy-db.company.com:5432/production"

When to Use Each Approach

Use Source Definitions (msh.yaml) when:

  • You have multiple .msh files using the same source
  • You want a centralized source catalog
  • You're building a layered project (staging → intermediate → marts)
  • You want to follow DRY principles

Use Direct Credentials when:

  • You have a single asset using a unique source
  • You're prototyping or testing
  • You need source-specific configuration that doesn't fit the source definition pattern

Both approaches can coexist in the same project, allowing gradual migration from direct credentials to source definitions.


Troubleshooting

Snowflake-Specific Issues

If you encounter errors with Snowflake, see the Snowflake Troubleshooting Guide for detailed solutions to common problems including:

  • Warehouse suspension errors
  • Connection timeouts
  • Authentication failures
  • Schema name issues
  • Quota exceeded errors

Quick Checks:

  1. Verify all required environment variables are set (run env | grep SNOWFLAKE)
  2. Ensure your warehouse is running (not suspended) in Snowflake UI
  3. Check user permissions (CREATE SCHEMA, CREATE TABLE, CREATE VIEW)
  4. Run with the --debug flag for detailed error messages: msh run --debug (combined with check 1 in the snippet below)
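
Checks 1 and 4 combined, for quick copy-paste:

env | grep SNOWFLAKE   # confirm all Snowflake-related variables are set
msh run --debug        # re-run with detailed error output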

General Connection Issues

Problem: Connection fails immediately

  • Check that all required environment variables are set
  • Verify credentials are correct (test manually if possible)
  • Check network connectivity to the destination

Problem: Connection timeout

  • Verify the destination is accessible from your network
  • Check firewall rules and IP whitelisting
  • For Snowflake, ensure warehouse is running

Problem: Authentication errors

  • Double-check username and password (case-sensitive)
  • Verify account/database names match exactly
  • Check user permissions and role assignments

Best Practices

  1. Use Read-Only Credentials for Sources: When connecting to production databases, always use a read-only user to prevent accidental writes.
  2. Separate Environments: Use different .env files for development, staging, and production (e.g., .env.dev, .env.prod); see the sketch after this list.
  3. Rotate Secrets Regularly: API keys and database passwords should be rotated periodically and updated in your secret management system.
  4. Test Connections First: Use msh run --dry-run to verify connections before running full pipelines.
  5. Monitor Warehouse Status: For Snowflake, set up alerts for warehouse suspension and monitor compute usage.
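
Items 2 and 4 above combine naturally into a pre-deployment check. A minimal shell sketch (assuming the environment file contains plain KEY=value lines; copying it to .env, which msh reads automatically, works just as well):

# Export production credentials into the current shell, then verify connections
set -a && source .env.prod && set +a    # or: cp .env.prod .env
msh run --dry-run                        # verify connections without running the full pipeline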