Glossary Policies

Define and enforce rules and constraints using glossary policies.

Purpose

Policies provide:

  • Data governance: Enforce rules across assets
  • PII protection: Mask or block PII columns
  • Quality assurance: Ensure data quality standards
  • Compliance: Meet regulatory requirements

Policy Types

PII Protection

Protect personally identifiable information:

policies:
  - name: No PII in public assets
    rule: PII columns cannot be in public schema
    pii_columns: [email, ssn, phone]
    applies_to: [public.*]
    enforcement: strict

Enforcement:

  • PII columns are masked in context packs
  • PII columns are blocked in public assets
  • Policy violations are reported
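The strict PII rule can be sketched as a check that runs before an asset is accepted. This is a minimal Python illustration, not msh's implementation. It assumes assets are identified as `schema.table`, that `applies_to` entries are shell-style globs (matched here with `fnmatch`), and that a policy is a plain dict carrying the YAML keys above. The name `pii_violations` is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical policy dict mirroring the YAML example above.
PII_POLICY = {
    "name": "No PII in public assets",
    "pii_columns": ["email", "ssn", "phone"],
    "applies_to": ["public.*"],
    "enforcement": "strict",
}


def pii_violations(policy: dict, asset: str, columns: list[str]) -> list[str]:
    """Return one message per PII column found in an asset the policy covers."""
    # Assumption: applies_to entries are globs matched against "schema.table".
    if not any(fnmatchcase(asset, pattern) for pattern in policy["applies_to"]):
        return []  # asset is outside the policy's scope
    pii = set(policy["pii_columns"])
    return [f"PII column '{col}' cannot be in {asset}" for col in columns if col in pii]


print(pii_violations(PII_POLICY, "public.customers", ["customer_id", "email"]))
# ["PII column 'email' cannot be in public.customers"]
print(pii_violations(PII_POLICY, "private.customers", ["customer_id", "email"]))
# []
```

Under strict enforcement, a non-empty result would block the asset; the same column list in a non-public schema passes.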

Data Quality

Ensure data quality standards:

policies:
  - name: Revenue must be positive
    rule: Revenue columns must be > 0
    applies_to: [revenue, amount]
    enforcement: warning

Enforcement:

  • Warnings shown during AI generation
  • Tests suggested for policy compliance
  • Violations reported in reviews
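A warning-level quality check reports problems without stopping anything. The sketch below is hypothetical. It assumes rows arrive as Python dicts and reads `applies_to` in this example as a list of column names, which is how the `[revenue, amount]` value appears to be used. `quality_warnings` is an illustrative name, not part of msh.

```python
# Hypothetical policy mirroring the "Revenue must be positive" example.
REVENUE_POLICY = {
    "name": "Revenue must be positive",
    "applies_to": ["revenue", "amount"],
    "enforcement": "warning",
}


def quality_warnings(policy: dict, rows: list[dict]) -> list[str]:
    """Collect a warning for every non-positive value in a covered column."""
    warnings = []
    for i, row in enumerate(rows):
        for col in policy["applies_to"]:
            value = row.get(col)
            # Missing columns and NULLs are skipped rather than flagged in this sketch.
            if value is not None and value <= 0:
                warnings.append(f"row {i}: {col}={value} violates '{policy['name']}'")
    return warnings


rows = [{"revenue": 120.0, "amount": 3}, {"revenue": -5.0, "amount": 0}]
for w in quality_warnings(REVENUE_POLICY, rows):
    print("warning:", w)
```

Because the level is `warning`, the caller only prints these messages; nothing is rejected.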

Schema Constraints

Enforce schema constraints:

policies:
  - name: Required columns
    rule: All staging assets must have id and created_at columns
    required_columns: [id, created_at]
    applies_to: [staging.*]
    enforcement: strict
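Checking required columns reduces to a set difference for every asset the policy covers. A minimal sketch, assuming assets are named `schema.table` and `applies_to` entries are shell-style globs; `missing_required` is a hypothetical helper, not part of msh.

```python
from fnmatch import fnmatchcase

# Hypothetical policy dict mirroring the YAML example above.
REQUIRED_POLICY = {
    "name": "Required columns",
    "required_columns": ["id", "created_at"],
    "applies_to": ["staging.*"],
    "enforcement": "strict",
}


def missing_required(policy: dict, asset: str, columns: list[str]) -> list[str]:
    """List required columns absent from a covered asset, in policy order."""
    if not any(fnmatchcase(asset, p) for p in policy["applies_to"]):
        return []  # policy does not cover this asset
    present = set(columns)
    return [c for c in policy["required_columns"] if c not in present]


print(missing_required(REQUIRED_POLICY, "staging.orders", ["id", "amount"]))  # ['created_at']
print(missing_required(REQUIRED_POLICY, "marts.orders", ["amount"]))  # []
```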

Policy Structure

Basic Policy

policies:
  - name: Policy Name
    rule: Policy description
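In code, a policy maps naturally onto a record with two required fields and optional extras. The Python model below is hypothetical, not a published msh schema: its field names are taken from the YAML keys used on this page, and treating every key except `name` and `rule` as optional is an assumption based on the basic example above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Policy:
    """Hypothetical in-memory model of one entry under `policies:`."""
    name: str
    rule: str
    applies_to: list[str] = field(default_factory=list)
    enforcement: Optional[str] = None  # "strict", "warning", or "info" when set
    pii_columns: list[str] = field(default_factory=list)
    required_columns: list[str] = field(default_factory=list)
    condition: Optional[str] = None
    naming_pattern: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> "Policy":
        # name and rule are the only keys every example on this page has.
        missing = [k for k in ("name", "rule") if k not in data]
        if missing:
            raise ValueError(f"policy is missing required keys: {missing}")
        # Unknown keys are ignored in this sketch.
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        return cls(**known)


basic = Policy.from_dict({"name": "Policy Name", "rule": "Policy description"})
print(basic.enforcement)  # None
```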

Policy with Enforcement

policies:
  - name: No PII in public assets
    rule: PII columns cannot be in public schema
    pii_columns: [email, ssn, phone]
    applies_to: [public.*]
    enforcement: strict

Enforcement Levels:

  • strict - Block violations
  • warning - Warn but allow
  • info - Inform only
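The three levels differ only in what happens once a violation is found. A hypothetical sketch of that dispatch; the message formats and the `handle_violation` name are illustrative, not msh output.

```python
from enum import Enum


class Enforcement(Enum):
    STRICT = "strict"
    WARNING = "warning"
    INFO = "info"


def handle_violation(level: str, message: str) -> bool:
    """Report a violation; return True if the operation may proceed."""
    enforcement = Enforcement(level)  # raises ValueError for an unknown level
    if enforcement is Enforcement.STRICT:
        print(f"✗ Policy violation: {message}")
        return False  # block
    if enforcement is Enforcement.WARNING:
        print(f"warning: {message}")
        return True  # warn but allow
    print(f"info: {message}")
    return True  # inform only
```

Rejecting unknown levels up front means a typo such as `enforcement: stict` fails loudly instead of silently behaving like `info`.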

Policy with Conditions

policies:
  - name: Revenue must be positive
    rule: Revenue columns must be > 0
    applies_to: [revenue, amount]
    condition: "column_value > 0"
    enforcement: warning
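The `condition` string reads like a boolean expression over `column_value`. This page does not say how msh evaluates it; the sketch below assumes only a single comparison against a numeric literal and parses it with Python's `ast` module instead of calling `eval`, so arbitrary code in a glossary file cannot run. `check_condition` is a hypothetical name.

```python
import ast
import operator

# Comparison operators this sketch accepts; anything else is rejected.
_OPS = {
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}


def check_condition(condition: str, column_value: float) -> bool:
    """Evaluate a condition such as "column_value > 0" for one value, without eval()."""
    node = ast.parse(condition, mode="eval").body
    # Only a single comparison of the form `column_value <op> <number>` is supported.
    if not isinstance(node, ast.Compare) or len(node.ops) != 1:
        raise ValueError(f"unsupported condition: {condition!r}")
    op = _OPS.get(type(node.ops[0]))
    left, right = node.left, node.comparators[0]
    # Negative literals parse as UnaryOp and are rejected by this sketch.
    if (
        op is None
        or not (isinstance(left, ast.Name) and left.id == "column_value")
        or not (isinstance(right, ast.Constant) and isinstance(right.value, (int, float)))
    ):
        raise ValueError(f"unsupported condition: {condition!r}")
    return op(column_value, right.value)


print(check_condition("column_value > 0", 42.5))  # True
print(check_condition("column_value > 0", 0))     # False
```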

Creating Policies

Using CLI

Policies are not created through the CLI; define them manually in glossary.yaml or msh.yaml.

Manual Creation

Create policies in glossary.yaml:

policies:
  - name: No PII in public assets
    rule: PII columns cannot be in public schema
    pii_columns: [email, ssn, phone]
    applies_to: [public.*]
    enforcement: strict

  - name: Revenue must be positive
    rule: Revenue columns must be > 0
    applies_to: [revenue, amount]
    enforcement: warning

Or in msh.yaml:

glossary:
  policies:
    - name: No PII in public assets
      rule: PII columns cannot be in public schema
      pii_columns: [email, ssn, phone]
      applies_to: [public.*]
      enforcement: strict
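Because policies can live in either file, a tool reading them has to look in two places. This sketch assumes the YAML has already been parsed into Python dicts (with a YAML library, not shown) and only handles the nesting difference. `collect_policies` is a hypothetical helper, and combining both sources by simple concatenation is an assumption, since this page does not say how msh merges or prioritizes them.

```python
from typing import Optional


def collect_policies(glossary_doc: Optional[dict], msh_doc: Optional[dict]) -> list[dict]:
    """Gather policies from glossary.yaml (top level) and msh.yaml (under glossary)."""
    policies: list[dict] = []
    if glossary_doc:
        # glossary.yaml: `policies` is a top-level key.
        policies.extend(glossary_doc.get("policies") or [])
    if msh_doc:
        # msh.yaml: `policies` is nested under the `glossary` key.
        policies.extend((msh_doc.get("glossary") or {}).get("policies") or [])
    return policies


glossary_yaml = {"policies": [{"name": "Revenue must be positive", "rule": "Revenue columns must be > 0"}]}
msh_yaml = {"glossary": {"policies": [{"name": "No PII in public assets", "rule": "PII columns cannot be in public schema"}]}}
print([p["name"] for p in collect_policies(glossary_yaml, msh_yaml)])
# ['Revenue must be positive', 'No PII in public assets']
```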

Policy Enforcement

During AI Generation

Policies are checked during AI generation:

msh ai new --name public_customers
# Description: "Create asset with email column in public schema"

✗ Policy violation: PII column 'email' cannot be in public schema
Suggested fix: Use private schema or mask email column

Generation blocked for policy violation.

During Reviews

Policies are checked during reviews:

msh ai review assets/revenue.msh
# Output includes policy violations

In Context Packs

PII columns are masked in context packs:

{
  "columns": [
    {"name": "customer_id", "type": "integer"},
    {"name": "email", "type": "string", "masked": true}
  ]
}
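Masking could be applied while a context pack is assembled. A minimal sketch, assuming a context pack is a dict shaped like the JSON above and that masking only adds a `"masked": true` flag (the example shows no other change); `mask_context_pack` is a hypothetical name.

```python
import copy


def mask_context_pack(pack: dict, pii_columns: list[str]) -> dict:
    """Return a copy of the pack with PII columns flagged as masked."""
    masked = copy.deepcopy(pack)  # leave the caller's pack untouched
    pii = set(pii_columns)
    for column in masked.get("columns", []):
        if column["name"] in pii:
            column["masked"] = True
    return masked


pack = {"columns": [{"name": "customer_id", "type": "integer"},
                    {"name": "email", "type": "string"}]}
print(mask_context_pack(pack, ["email", "ssn", "phone"]))
# {'columns': [{'name': 'customer_id', 'type': 'integer'}, {'name': 'email', 'type': 'string', 'masked': True}]}
```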

Policy Examples

PII Protection

policies:
  - name: No PII in public assets
    rule: PII columns cannot be in public schema
    pii_columns: [email, ssn, phone, credit_card]
    applies_to: [public.*]
    enforcement: strict

Data Quality

policies:
  - name: Revenue must be positive
    rule: Revenue columns must be > 0
    applies_to: [revenue, amount, total]
    condition: "column_value > 0"
    enforcement: warning

Schema Constraints

policies:
  - name: Required columns
    rule: All staging assets must have id and created_at columns
    required_columns: [id, created_at]
    applies_to: [staging.*]
    enforcement: strict

Naming Conventions

policies:
  - name: Column naming
    rule: Columns must follow naming conventions
    naming_pattern: "^[a-z][a-z0-9_]*$"
    applies_to: ["*"]  # quoted: a bare * starts a YAML alias
    enforcement: warning
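A naming policy can be checked with an ordinary regular expression. This sketch assumes the `naming_pattern` value is a regex compatible with Python's `re` module (true for this pattern, though msh's own regex dialect is not specified here); `bad_column_names` is a hypothetical helper.

```python
import re

NAMING_PATTERN = "^[a-z][a-z0-9_]*$"  # value from the policy above


def bad_column_names(columns: list[str], pattern: str = NAMING_PATTERN) -> list[str]:
    """Return the column names that do not match the naming pattern."""
    regex = re.compile(pattern)
    # fullmatch also rejects a trailing newline, which `$` alone would accept.
    return [c for c in columns if not regex.fullmatch(c)]


print(bad_column_names(["customer_id", "CreatedAt", "2nd_email", "total"]))
# ['CreatedAt', '2nd_email']
```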

Best Practices

Clear Rules

Write clear, actionable rules:

# ✅ Good
rule: PII columns cannot be in public schema

# ❌ Bad
rule: Be careful with PII

Specific Applies To

Be specific about what policies apply to:

# ✅ Good
applies_to: [public.*, staging.*]

# ❌ Bad
applies_to: ["*"]
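The difference in scope is easy to see by matching both `applies_to` lists against a few asset names. This sketch assumes assets are named `schema.table` and that `applies_to` entries behave like shell-style globs; the asset names and the `covered` helper are made up for illustration.

```python
from fnmatch import fnmatchcase

ASSETS = ["public.customers", "staging.orders", "marts.revenue", "private.users"]


def covered(applies_to: list[str], assets: list[str]) -> list[str]:
    """Return the assets matched by at least one applies_to glob."""
    return [a for a in assets if any(fnmatchcase(a, p) for p in applies_to)]


print(covered(["public.*", "staging.*"], ASSETS))  # ['public.customers', 'staging.orders']
print(covered(["*"], ASSETS))  # every asset, including marts and private ones
```

A policy scoped with `*` fires on assets it was never meant for, which turns useful warnings into noise.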

Appropriate Enforcement

Match the enforcement level to the policy's purpose:

# Strict for security
enforcement: strict

# Warning for quality
enforcement: warning

# Info for documentation
enforcement: info