Environment Configuration

msh supports multiple environments (development, staging, production) with different configurations, schema naming, and deployment behaviors. This guide explains how to configure and use environments effectively.

Overview

Environments in msh control:

  • Schema naming: Git-aware schemas in dev, fixed schemas in prod
  • Configuration files: Environment-specific .env files
  • Deployment behavior: Different destinations and credentials per environment

The --env Flag

All msh commands support the --env flag to specify the target environment:

msh run --env dev
msh run --env staging
msh run --env prod

Default: If --env is not specified, msh defaults to dev.

Environment-Specific Configuration

Creating Environment Files

Create separate .env files for each environment:

my-msh-project/
├── .env.dev # Development environment
├── .env.staging # Staging environment
├── .env.production # Production environment
└── .env # Default (used if no --env specified)

Environment File Naming

msh loads environment files using the pattern: .env.<environment>

Examples:

  • msh run --env dev → loads .env.dev
  • msh run --env staging → loads .env.staging
  • msh run --env prod → loads .env.production or .env.prod
  • msh run (no flag) → loads .env (defaults to dev behavior)
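
The lookup order above can be sketched in a few lines of Python. This is a hypothetical helper illustrating the .env.<environment> pattern, not msh's actual loader; in particular, the prod/production fallback order shown is an assumption.

```python
from pathlib import Path
from typing import Optional

def resolve_env_file(env: Optional[str], root: Path = Path(".")) -> Path:
    """Resolve which .env file to load for an environment.

    Tries .env.<environment> first (with .env.production as a fallback
    spelling for prod), then falls back to the plain .env file.
    """
    candidates = []
    if env:
        candidates.append(root / f".env.{env}")
        if env == "prod":
            candidates.append(root / ".env.production")
    for candidate in candidates:
        if candidate.exists():
            return candidate
    # No environment-specific file found (or no --env given): use .env
    return root / ".env"
```

For example, with only .env.dev present, resolve_env_file("dev") picks it, while resolve_env_file("staging") falls back to .env.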

Example Environment Files

.env.dev (Development):

# Development database (local or shared dev instance)
DESTINATION__DUCKDB__CREDENTIALS="duckdb:///dev_data.duckdb"

# Or Postgres dev instance
DESTINATION__POSTGRES__CREDENTIALS="postgresql://dev_user:dev_pass@localhost:5432/analytics_dev"

# Source credentials (can use test/sandbox APIs)
STRIPE_API_KEY="sk_test_..."
SALESFORCE_USERNAME="dev@example.com"

.env.staging (Staging):

# Staging database (shared staging instance)
DESTINATION__POSTGRES__CREDENTIALS="postgresql://staging_user:staging_pass@staging-db:5432/analytics_staging"

# Source credentials (staging/test APIs)
STRIPE_API_KEY="sk_test_..."
SALESFORCE_USERNAME="staging@example.com"

.env.production (Production):

# Production database (Snowflake, Postgres, etc.)
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE="ANALYTICS_PROD"
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME="MSH_PROD_USER"
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD="secure_prod_password"
DESTINATION__SNOWFLAKE__CREDENTIALS__HOST="abc123.snowflakecomputing.com"
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE="COMPUTE_WH"
DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE="TRANSFORMER"

# Production source credentials
STRIPE_API_KEY="sk_live_..."
SALESFORCE_USERNAME="prod@example.com"

Schema Naming by Environment

Development Environment (--env dev)

In development, msh uses Git-aware schemas for automatic isolation:

Behavior:

  • Schema names include git branch suffix
  • Each developer branch gets isolated schemas
  • Prevents conflicts when multiple developers work simultaneously

Examples:

  • Branch feature/new-api → Schema: main_feature_new_api
  • Branch bugfix/issue-123 → Schema: main_bugfix_issue_123
  • Branch main → Schema: main_main
  • No git repo → Schema: main_local (fallback)

Raw Dataset:

  • Branch feature/new-api → Dataset: msh_raw_feature_new_api
  • Production → Dataset: msh_raw (no suffix)
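
The branch-to-schema mapping above amounts to a simple sanitization step. A minimal Python sketch of the convention (illustrative only, not msh's actual implementation):

```python
import re
from typing import Optional

def branch_to_suffix(branch: Optional[str]) -> str:
    """Sanitize a git branch name into a schema suffix.

    Lowercases the branch and collapses every run of non-alphanumeric
    characters into a single underscore, matching the examples above.
    "local" is the documented fallback when no git repo is present.
    """
    if branch is None:
        return "local"
    return re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")

def dev_schema(base: str, branch: Optional[str]) -> str:
    """Git-aware schema name used in --env dev."""
    return f"{base}_{branch_to_suffix(branch)}"
```

For example, dev_schema("main", "feature/new-api") yields "main_feature_new_api", and dev_schema("msh_raw", "bugfix/issue-123") yields "msh_raw_bugfix_issue_123".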

See Git-Aware Schemas for complete details.

Production Environment (--env prod)

In production, msh uses fixed schemas without git suffixes:

Behavior:

  • Schema names are consistent and predictable
  • No git branch suffixes
  • Always uses base schema names

Examples:

  • Schema: main (DuckDB), PUBLIC (Snowflake), or public (Postgres)
  • Raw Dataset: msh_raw

Why: Production deployments should always use the same schema names for consistency and to avoid confusion.

Environment Comparison

Feature              Development (--env dev)           Production (--env prod)
Schema Naming        Git-aware (with branch suffix)    Fixed (no suffix)
Example Schema       main_feature_new_api              main
Raw Dataset          msh_raw_feature_new_api           msh_raw
Use Case             Local development, branches       Production deployments
Isolation            Per-branch isolation              Single production schema
Configuration File   .env.dev                          .env.production

Using Environments in Commands

Run Command

# Development (default)
msh run
msh run --env dev

# Staging
msh run --env staging

# Production
msh run --env prod

Plan Command

# Preview changes in dev
msh plan --env dev

# Preview changes in prod
msh plan --env prod

Rollback Command

# Rollback dev environment
msh rollback --env dev orders

# Rollback production
msh rollback --env prod orders

Status Command

# Check dev status (default environment)
msh status

# Check prod status (if you have prod credentials configured)
msh status --env prod

CI/CD Integration

GitHub Actions Example

name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to Staging
        run: msh run --env staging
        env:
          DESTINATION__POSTGRES__CREDENTIALS: ${{ secrets.STAGING_DB }}

  deploy-production:
    runs-on: ubuntu-latest
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to Production
        run: msh run --env prod
        env:
          DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE: ${{ secrets.PROD_DB }}

Docker Example

FROM python:3.11-slim

WORKDIR /app

# Copy environment-specific config
# (avoid baking real secrets into the image; prefer runtime injection
# via your secret manager — see Best Practices below)
COPY .env.production .env

# Run with production environment
CMD ["msh", "run", "--env", "prod"]

Best Practices

1. Never Commit Environment Files

Add to .gitignore:

.env*
!.env.example

Create .env.example as a template:

# .env.example
DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:pass@host:5432/db"
STRIPE_API_KEY="your_api_key_here"

2. Use Secret Management in Production

Don't:

# ❌ Hardcode in .env file
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD="hardcoded_password"

Do:

# ✅ Use environment variables from secret manager
# Set in CI/CD or orchestration platform
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD="${SNOWFLAKE_PASSWORD}"
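
If you keep ${VAR} references in an env file, they must be expanded from the real environment at some point — many CI/CD platforms substitute the value before the file is ever read. As an illustration of the pattern (not a description of msh's own behavior), the expansion can be done with Python's standard library:

```python
import os

def expand_secret(value: str) -> str:
    """Expand ${VAR} references using the process environment.

    Raises KeyError if a referenced variable is not set, so a missing
    secret fails loudly instead of reaching the database driver.
    """
    expanded = os.path.expandvars(value)
    if "${" in expanded:
        raise KeyError(f"unresolved secret reference in {value!r}")
    return expanded
```

With SNOWFLAKE_PASSWORD set in the environment, expand_secret("${SNOWFLAKE_PASSWORD}") returns the real value; if it is unset, the function raises instead of silently passing the literal placeholder along.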

3. Separate Dev and Prod Databases

Development:

  • Use local databases (DuckDB, local Postgres)
  • Or shared dev instance with isolated schemas (Git-aware)

Production:

  • Use dedicated production database
  • Never mix dev and prod data

4. Test in Staging First

Workflow:

  1. Develop locally (--env dev)
  2. Deploy to staging (--env staging)
  3. Verify in staging
  4. Deploy to production (--env prod)

5. Environment-Specific Asset Configuration

You can also vary the destination and target schema per environment in msh.yaml:

# msh.yaml
project_name: my_project

environments:
  dev:
    destination: duckdb
    target_schema: main
  staging:
    destination: postgres
    target_schema: analytics_staging
  prod:
    destination: snowflake
    target_schema: ANALYTICS_PROD

Then use:

msh run --env dev       # Uses DuckDB
msh run --env staging   # Uses Postgres staging
msh run --env prod      # Uses Snowflake production

Troubleshooting

Wrong Environment File Loaded

Symptom: Command uses wrong credentials or database.

Fix: Verify the environment file name matches:

# Check if file exists
ls -la .env.dev .env.production

# Verify file is being loaded
msh run --env dev --debug

Schema Name Mismatch

Symptom: Tables not found or wrong schema used.

Cause: Environment mismatch (dev vs prod) or git branch changed.

Fix:

# Check current environment
msh status

# Verify git branch (in dev)
git branch

# Use correct environment
msh run --env prod

Credentials Not Found

Symptom: Connection errors or authentication failures.

Fix: Ensure environment file exists and contains correct credentials:

# Check if file exists
cat .env.production

# Verify credentials format
# Should match: DESTINATION__<TYPE>__CREDENTIALS=...
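
As a quick sanity check, a small script can flag DESTINATION keys that don't follow the double-underscore shape. The key pattern below matches the examples in this guide; it is a hypothetical helper, not an official msh validator.

```python
import re

# Shape used throughout this guide: DESTINATION__<TYPE>__CREDENTIALS,
# optionally followed by one __<FIELD> segment (e.g. __PASSWORD)
KEY_RE = re.compile(
    r"^DESTINATION__[A-Z][A-Z0-9]*__CREDENTIALS(__[A-Z][A-Z0-9_]*)?$"
)

def find_malformed(env_text: str) -> list:
    """Return DESTINATION-prefixed keys that do not match the expected shape."""
    bad = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("DESTINATION") and not KEY_RE.match(key):
            bad.append(key)
    return bad
```

A key written with single underscores, such as DESTINATION_POSTGRES_CREDENTIALS, would be reported, while DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD passes.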