Production Deployment
This guide covers deploying msh in production environments using Docker, CI/CD, and orchestration platforms.
Containerization
Dockerfile
Create a Dockerfile in your project root:
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Install msh and dependencies
RUN pip install --no-cache-dir msh-cli dbt-core dbt-postgres dlt
# Copy project files
COPY models/ ./models/
COPY .env.production .env
# Run msh
CMD ["msh", "run"]
Build and Run
# Build the image
docker build -t msh-pipeline:latest .
# Run locally
docker run --env-file .env.production msh-pipeline:latest
# Run with volume mount for development
docker run -v $(pwd)/models:/app/models msh-pipeline:latest
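The Kubernetes examples later in this guide reference the image as msh-pipeline:latest, so the image must be reachable by your cluster. A minimal sketch of pushing it to a registry (the host registry.example.com is a placeholder for your own registry):
# Tag the image for your registry (placeholder host)
docker tag msh-pipeline:latest registry.example.com/msh-pipeline:latest
# Push it so orchestrators can pull it
docker push registry.example.com/msh-pipeline:latest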
CI/CD Integration
GitHub Actions
msh provides a command to generate a GitHub Actions workflow:
msh generate github
This creates .github/workflows/msh-deploy.yml:
name: msh Deploy
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install msh
run: |
pip install msh-cli dbt-core dbt-snowflake dlt
- name: Run msh doctor
run: msh doctor
env:
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
DESTINATION__SNOWFLAKE__CREDENTIALS__HOST: ${{ secrets.SNOWFLAKE_HOST }}
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}
DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE: ${{ secrets.SNOWFLAKE_ROLE }}
- name: Run msh plan (PR only)
if: github.event_name == 'pull_request'
run: msh plan
env:
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
DESTINATION__SNOWFLAKE__CREDENTIALS__HOST: ${{ secrets.SNOWFLAKE_HOST }}
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}
DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE: ${{ secrets.SNOWFLAKE_ROLE }}
- name: Run msh deploy (main only)
if: github.ref == 'refs/heads/main'
run: msh run
env:
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
DESTINATION__SNOWFLAKE__CREDENTIALS__HOST: ${{ secrets.SNOWFLAKE_HOST }}
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}
DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE: ${{ secrets.SNOWFLAKE_ROLE }}
STRIPE_API_KEY: ${{ secrets.STRIPE_API_KEY }}
SALESFORCE_USERNAME: ${{ secrets.SALESFORCE_USERNAME }}
SALESFORCE_PASSWORD: ${{ secrets.SALESFORCE_PASSWORD }}
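The generated workflow repeats the same env block on every step. GitHub Actions also accepts env at the job level, so a trimmed variant could declare the credentials once (a sketch, not the output of msh generate github):
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # Declared once; available to every step below
      DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
      DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
      # ... remaining credentials
    steps:
      - uses: actions/checkout@v3
      - name: Run msh doctor
        run: msh doctor   # no per-step env block needed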
Setting Up GitHub Secrets
- Navigate to your repository → Settings → Secrets and variables → Actions
- Click New repository secret
- Add each environment variable:
Destination Credentials (Snowflake example):
- SNOWFLAKE_DATABASE
- SNOWFLAKE_USERNAME
- SNOWFLAKE_PASSWORD
- SNOWFLAKE_HOST
- SNOWFLAKE_WAREHOUSE
- SNOWFLAKE_ROLE
Source Credentials:
- STRIPE_API_KEY
- SALESFORCE_USERNAME
- SALESFORCE_PASSWORD
- SALESFORCE_SECURITY_TOKEN
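If you prefer the command line, the GitHub CLI can create the same secrets (a sketch; assumes gh is installed and authenticated against the repository):
# Prompt for the value interactively (keeps it out of shell history)
gh secret set SNOWFLAKE_PASSWORD
# Or read the value from a file
gh secret set STRIPE_API_KEY < stripe_key.txt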
GitLab CI
Create .gitlab-ci.yml:
stages:
- validate
- deploy
variables:
PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
cache:
paths:
- .cache/pip
validate:
stage: validate
image: python:3.11-slim
script:
- pip install msh-cli dbt-core dbt-postgres dlt
- msh doctor
- msh plan
only:
- merge_requests
deploy:
stage: deploy
image: python:3.11-slim
script:
- pip install msh-cli dbt-core dbt-postgres dlt
- msh run
only:
- main
environment:
name: production
Add variables in Settings → CI/CD → Variables.
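If you use GitLab's glab CLI, the same variables can also be created from the terminal (a sketch; this assumes glab is installed and authenticated and that your glab version provides the variable set subcommand):
# Create a project-level CI/CD variable
glab variable set SNOWFLAKE_PASSWORD "your-password"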
Multi-Environment Strategy
Environment-Specific Configuration
Create separate .env files for each environment:
.env.dev
.env.staging
.env.production
.env.dev:
DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:pass@localhost:5432/analytics_dev"
.env.staging:
DESTINATION__POSTGRES__CREDENTIALS="postgresql://user:pass@staging-db:5432/analytics_staging"
.env.production:
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE="ANALYTICS_PROD"
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME="MSH_PROD_USER"
# ... other Snowflake credentials
Using the --env Flag
Run msh with a specific environment:
# Development
msh run --env dev
# Staging
msh run --env staging
# Production
msh run --env prod
This loads the corresponding .env.<environment> file.
Orchestration with Airflow
Airflow DAG
Create dags/msh_pipeline.py:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta
default_args = {
'owner': 'data-team',
'depends_on_past': False,
'start_date': datetime(2024, 1, 1),
'email_on_failure': True,
'email_on_retry': False,
'retries': 2,
'retry_delay': timedelta(minutes=5),
}
dag = DAG(
'msh_pipeline',
default_args=default_args,
description='Run msh data pipeline',
schedule_interval='0 2 * * *', # Daily at 2 AM
catchup=False,
)
# Health check
doctor = BashOperator(
task_id='msh_doctor',
bash_command='cd /opt/msh && msh doctor',
dag=dag,
)
# Run pipeline
run = BashOperator(
task_id='msh_run',
bash_command='cd /opt/msh && msh run --env prod',
dag=dag,
)
# Verify deployment
verify = BashOperator(
task_id='verify_deployment',
bash_command='cd /opt/msh && msh ui --verify',
dag=dag,
)
doctor >> run >> verify
Environment Variables in Airflow
Set environment variables in Airflow:
- Airflow UI → Admin → Variables
- Add each credential as a variable
- Reference in your DAG using Variable.get('SNOWFLAKE_PASSWORD')
Or use Connections for database credentials.
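As a hedged sketch extending the DAG above, an Airflow Variable can be passed into the msh process through the BashOperator env parameter (the variable and key names here are assumptions; adapt them to your setup):
from airflow.models import Variable
from airflow.operators.bash import BashOperator

# Pull the credential from Airflow's Variable store
snowflake_password = Variable.get('SNOWFLAKE_PASSWORD')

run = BashOperator(
    task_id='msh_run',
    bash_command='cd /opt/msh && msh run --env prod',
    env={
        # Exposed to msh as an environment variable
        'DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD': snowflake_password,
    },
    dag=dag,
)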
Kubernetes Deployment
Kubernetes Manifests
k8s/configmap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
name: msh-config
data:
MSH_ENV: "production"
k8s/secret.yaml:
apiVersion: v1
kind: Secret
metadata:
name: msh-secrets
type: Opaque
stringData:
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE: "ANALYTICS_PROD"
DESTINATION__SNOWFLAKE__CREDENTIALS__USERNAME: "MSH_USER"
DESTINATION__SNOWFLAKE__CREDENTIALS__PASSWORD: "secure_password"
DESTINATION__SNOWFLAKE__CREDENTIALS__HOST: "abc123.snowflakecomputing.com"
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE: "COMPUTE_WH"
DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE: "TRANSFORMER"
STRIPE_API_KEY: "sk_live_..."
k8s/cronjob.yaml:
apiVersion: batch/v1
kind: CronJob
metadata:
name: msh-pipeline
spec:
schedule: "0 2 * * *" # Daily at 2 AM
jobTemplate:
spec:
template:
spec:
containers:
- name: msh
image: msh-pipeline:latest
command: ["msh", "run", "--env", "prod"]
envFrom:
- configMapRef:
name: msh-config
- secretRef:
name: msh-secrets
restartPolicy: OnFailure
Deploy to Kubernetes
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/cronjob.yaml
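Committing secret.yaml with plaintext credentials is risky. As an alternative sketch, the Secret can be created directly from your production env file, assuming its keys match what the container expects:
# Build the Secret from .env.production instead of a checked-in manifest
kubectl create secret generic msh-secrets --from-env-file=.env.production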
Monitoring and Alerting
Health Checks
Add a health check step to your deployment pipeline:
# In your CI/CD or cron job
msh doctor || exit 1
Logging
msh writes logs to stdout by default. Capture them in your orchestrator:
Docker:
docker logs msh-pipeline > /var/log/msh/pipeline.log
Kubernetes (a CronJob spawns a Job for each run, so tail the logs of the most recent Job):
kubectl get jobs
kubectl logs -f job/<job-name>
Alerting
Set up alerts for:
- Pipeline Failures: Monitor exit codes from msh run
- State Drift: Check msh_state_history for failed deployments
- Performance: Track execution time
Example: Slack Notification on Failure
#!/bin/bash
if ! msh run --env prod; then
  curl -X POST -H 'Content-type: application/json' \
    --data '{"text":"msh pipeline failed!"}' \
    "$SLACK_WEBHOOK_URL"
fi
Performance Tuning
Parallel Execution
For large projects, enable parallel execution:
msh run --threads 4
Resource Limits
In Kubernetes, set resource limits on the msh container (for example, in the container spec of cronjob.yaml):
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "4Gi"
cpu: "2000m"
Incremental Runs
Use incremental execution for large datasets:
# In your .msh file
execution: incremental
incremental:
strategy: merge
primary_key: id
Security Best Practices
- Never Commit Secrets: Use .gitignore to exclude .env files (see the sketch after this list)
- Use Secret Management: Store secrets in GitHub Secrets, AWS Secrets Manager, or HashiCorp Vault
- Least Privilege: Grant database users only necessary permissions
- Rotate Credentials: Regularly rotate API keys and database passwords
- Audit Logs: Enable audit logging in your destination database
- Network Security: Use VPCs and private endpoints for database connections
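A minimal .gitignore sketch covering the environment files used in this guide (extend it with any other local artifacts your project produces):
# Never commit environment files with credentials
.env
.env.*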
Rollback Strategy
If a deployment fails in production:
# Automatic rollback on failure
msh run --env prod --auto-rollback
# Manual rollback
msh rollback --env prod
Example: Complete Production Setup
Directory Structure:
my-msh-project/
├── .github/
│ └── workflows/
│ └── msh-deploy.yml
├── k8s/
│ ├── configmap.yaml
│ ├── secret.yaml
│ └── cronjob.yaml
├── models/
│ ├── customers.msh
│ └── revenue.msh
├── .env.dev
├── .env.staging
├── .env.production
├── Dockerfile
└── .gitignore
Deployment Flow:
- Developer pushes to feature branch → GitHub Actions runs msh plan
- PR is merged to main → GitHub Actions runs msh run --env staging
- Manual approval → Kubernetes CronJob runs msh run --env prod daily
- On failure → Slack alert sent, automatic rollback triggered
Next Steps
- Troubleshooting: Debug production issues
- CLI Reference: Full command documentation
- Lifecycle Contract: Understand Blue/Green deployment