Introduction

msh (The Atomic Data Engine) is a data orchestration tool that brings software engineering rigor to data engineering.

Mission

Our mission is to make data pipelines atomic, reliable, and reversible. We believe that data teams should move with the same confidence and speed as software teams, using tools that enforce safety and best practices by default.

High-Level Architecture

msh orchestrates dlt (Ingestion) and dbt (Transformation) using a Blue/Green Deployment system with Remote State stored directly in your destination database.
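
To make the Blue/Green idea concrete, here is a minimal sketch of the general pattern, not msh's actual implementation. It assumes SQLite as a stand-in destination, and the names `msh_state_demo`, `deploy`, and `active_slot` are hypothetical. Each run builds the inactive copy of a table, then repoints a view and updates a state row kept in the same database, all inside one transaction.

```python
import sqlite3

# Hypothetical state table living in the destination database itself.
STATE_DDL = (
    "CREATE TABLE IF NOT EXISTS msh_state_demo "
    "(name TEXT PRIMARY KEY, active TEXT)"
)


def active_slot(conn, name):
    """Return 'blue', 'green', or None if the asset was never deployed."""
    row = conn.execute(
        "SELECT active FROM msh_state_demo WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def deploy(conn, name, rows):
    """Load rows into the inactive slot, then swap the view atomically."""
    conn.execute(STATE_DDL)
    target = "green" if active_slot(conn, name) == "blue" else "blue"
    table = f"{name}_{target}"
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} (id INTEGER, amount REAL)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
        # Readers query the view, so repointing it is the actual "swap".
        conn.execute(f"DROP VIEW IF EXISTS {name}")
        conn.execute(f"CREATE VIEW {name} AS SELECT * FROM {table}")
        conn.execute(
            "INSERT OR REPLACE INTO msh_state_demo VALUES (?, ?)",
            (name, target),
        )
        conn.execute("COMMIT")
    except Exception:
        # Any failure leaves the previous slot, view, and state untouched.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    return target


if __name__ == "__main__":
    # isolation_level=None lets the sketch control transactions explicitly.
    demo = sqlite3.connect(":memory:", isolation_level=None)
    print(deploy(demo, "orders", [(1, 10.0)]))
    print(deploy(demo, "orders", [(1, 10.0), (2, 5.0)]))
```

Keeping the state row in the destination means the swap and its bookkeeping commit or roll back together, which is the property the architecture description relies on.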

[Figure: High-Level Architecture Diagram]

Key Concepts

  • Smart Ingest: Optimizes costs by fetching only the columns you actually select in your SQL.
  • Universal Connectivity: Connect any API to any database, or one database to another.
  • Atomic Reliability: Every run is a transaction. If it fails, nothing changes.
  • Lineage: Automatic graph generation from your SQL and Python code.
  • Polyglot: Mix SQL and Python (Polars/Pandas) seamlessly.
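
As an illustration of the Smart Ingest idea in the list above, the sketch below derives the set of columns a query needs and keeps only those fields from source records. It is a simplified assumption, not how msh does it: the helpers `selected_columns` and `prune` are hypothetical, and the regex only handles a flat `SELECT ... FROM` list, where a real tool would rely on a proper SQL parser.

```python
import re


def selected_columns(sql):
    """Return the column names in a simple, flat SELECT list."""
    match = re.search(
        r"select\s+(.*?)\s+from\s", sql, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("expected a SELECT ... FROM statement")
    columns = []
    for item in match.group(1).split(","):
        # Drop an optional alias ("amount AS total") ...
        expr = re.split(r"\s+as\s+", item.strip(), flags=re.IGNORECASE)[0]
        # ... and an optional table prefix ("o.amount").
        columns.append(expr.split(".")[-1].strip())
    return columns


def prune(records, columns):
    """Keep only the requested fields; '*' means keep everything."""
    if columns == ["*"]:
        return records
    return [{c: r[c] for c in columns if c in r} for r in records]


if __name__ == "__main__":
    wanted = selected_columns("SELECT id, o.amount AS total FROM orders o")
    source = [{"id": 1, "amount": 9.5, "notes": "large free-text field"}]
    print(prune(source, wanted))
```

The cost saving comes from applying the column list at fetch time, so unused fields such as `notes` are never transferred or stored.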