Jan 17, 2026

Safe(r) Deterministic AI Code Generation Through DSL Partitioning

AI can write code faster than you can review it. Deploying AI-generated code straight to production?


Risky business!

The promise is obvious: write business logic, data transformations, API integrations, and workflows at the pace of conversation. The risk is equally obvious: AI-generated code can be insecure, untested, and unpredictable. Worse, it’s non-deterministic. Ask an LLM to implement the same requirements twice and you’ll likely get two structurally different implementations.

This creates a problem: we want the velocity, but can’t afford the risk.

A Solution: Partition Generation with a DSL

Don’t fight AI code generation. Constrain where it operates.

A domain-specific language (DSL) sits between AI-generated orchestration and human-written critical components. The DSL acts as a capability boundary: AI can only express operations the DSL supports, and the DSL only exposes safe, pre-vetted primitives.

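This narrows the interaction surface and makes AI output far more predictable. The model itself is still non-deterministic, but a small vocabulary and a fixed structure leave few ways to express the same requirement, so the same requirements usually produce the same DSL output.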
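In code, the capability boundary can be as simple as an allowlist checked before anything runs. The following is a minimal Python sketch. It assumes the DSL file has already been parsed into dicts and lists, and that each step names its operation with an "operation" key, as the payment workflow later in this post does. ALLOWED_OPERATIONS and both functions are hypothetical.

```python
from typing import Any

# Hypothetical allowlist: the only operations the runtime knows how to execute.
ALLOWED_OPERATIONS = {
    "database.query",
    "database.update",
    "payment.charge",
    "payment.refund",
    "notification.email",
}


def find_disallowed_operations(workflow: dict[str, Any]) -> list[str]:
    """Return each step's operation name that is not on the allowlist."""
    return [
        step.get("operation", "<missing>")
        for step in workflow.get("steps", [])
        if step.get("operation") not in ALLOWED_OPERATIONS
    ]


def check_capabilities(workflow: dict[str, Any]) -> None:
    """Reject the whole workflow before execution if any step leaves the boundary."""
    disallowed = find_disallowed_operations(workflow)
    if disallowed:
        raise PermissionError(f"operations outside the DSL boundary: {disallowed}")


# A generated workflow that tries to reach the operating system is refused up front.
generated = {
    "name": "cleanup",
    "steps": [
        {"name": "load", "operation": "database.query"},
        {"name": "wipe", "operation": "os.system"},
    ],
}
print(find_disallowed_operations(generated))  # ['os.system']
```

The check runs on the parsed DSL rather than on generated Python, so there is no arbitrary code to inspect. An unknown name is simply an error.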

Who Writes What?

Layer	Who Writes	Change Frequency	Testing	Examples
Critical Core	Humans + careful AI	Monthly	Exhaustive	Auth, DB access, encryption
DSL Runtime	Humans	Monthly	Thorough	Parser, executor, validators
Business Logic	AI + oversight	Daily/hourly	Schema + access validation	Workflows, transforms, rules

Note: AI can help with core components, but those PRs require stringent review (more on this later).

Why This is Better

Speed With Reduced Risk

AI iterates on business logic rapidly because it’s working in a high-level DSL. Changes happen in hours, not days.

But the DSL runtime ensures that no matter what the AI generates:

The schema is validated before execution
Access permissions are checked and unauthorized operations are gated
Only approved APIs are called
Data access goes through validated queries
Execution stays within resource limits
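Several of these guarantees can be enforced in a small executor loop. This is a sketch under stated assumptions, and every name in it is hypothetical:

- Handlers are plain callables registered by human-written code.
- Each DSL file carries an explicit set of granted operations.
- Resource limits are approximated by a step budget and a wall-clock deadline, checked between steps. A production sandbox would also cap CPU and memory at the process level.

```python
import time
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], Any]


class ExecutionError(Exception):
    """Raised when a workflow would break one of the runtime's guarantees."""


def execute(
    steps: list[dict[str, Any]],
    handlers: dict[str, Handler],
    granted: set[str],
    max_steps: int = 50,
    timeout_s: float = 5.0,
) -> dict[str, Any]:
    """Run steps in order, enforcing grants, approved handlers, and budgets."""
    if len(steps) > max_steps:
        raise ExecutionError(f"{len(steps)} steps exceeds budget of {max_steps}")
    deadline = time.monotonic() + timeout_s
    results: dict[str, Any] = {}
    for step in steps:
        op = step["operation"]
        if op not in granted:
            # Access control: this DSL file was never granted the capability.
            raise ExecutionError(f"operation {op!r} not granted to this workflow")
        if op not in handlers:
            # Only approved, human-written handlers can ever be called.
            raise ExecutionError(f"no approved handler for {op!r}")
        if time.monotonic() > deadline:
            # Checked between steps; a slow handler is not interrupted mid-call.
            raise ExecutionError("workflow exceeded its time budget")
        results[step["name"]] = handlers[op](step.get("params", {}))
    return results


handlers = {"text.upper": lambda params: params["value"].upper()}
steps = [{"name": "greet", "operation": "text.upper", "params": {"value": "hello"}}]
print(execute(steps, handlers, granted={"text.upper"}))  # {'greet': 'HELLO'}
```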

Determinism Through Constraint

Direct code generation is a mess:

# Generation 1
def process_order(order):
    if order.total > 100:
        discount = order.total * 0.1
        return order.total - discount
    return order.total

# Generation 2: Same requirements, different code
def process_order(order_data):
    base_amount = order_data["total"]
    discount_rate = 0.1 if base_amount >= 100 else 0.0
    final_amount = base_amount * (1 - discount_rate)
    return final_amount

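Both run. But they’re syntactically different, use different variable names and structures, and even expect different inputs (an object with a total attribute in one, a dict in the other). They aren’t even equivalent: Generation 1 discounts only totals above 100, while Generation 2 also discounts a total of exactly 100. Code reviews are tedious. Testing is inconsistent. Git diffs are noisy.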

A DSL largely eliminates this:

# Generation 1
discount_rule:
  threshold: 100
  rate: 0.10

# Generation 2: Identical
discount_rule:
  threshold: 100
  rate: 0.10

Limited vocabulary. Enforced structure. Single canonical representation.
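The YAML can be canonical because its meaning lives in one human-written interpreter. The following is a minimal sketch. It assumes threshold is inclusive; the two Python generations above disagreed on exactly this point. It uses Decimal to avoid float rounding on money. The function name is hypothetical.

```python
from decimal import Decimal


def apply_discount_rule(total: Decimal, rule: dict) -> Decimal:
    """Apply a parsed discount_rule. The threshold is defined here, once, as inclusive."""
    threshold = Decimal(str(rule["threshold"]))
    rate = Decimal(str(rule["rate"]))
    if total >= threshold:
        return total * (1 - rate)
    return total


rule = {"threshold": 100, "rate": 0.10}  # as parsed from the YAML above
print(apply_discount_rule(Decimal("100"), rule))    # 90.0
print(apply_discount_rule(Decimal("99.99"), rule))  # 99.99
```

Whichever run of the model produced the YAML, the behavior at exactly 100 is decided in reviewed code, not re-invented per generation.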

This means:

Predictable code review: Clear, scannable diffs
Consistent testing: Same test works every time
Unified debugging: Errors always surface in the same format
Meaningful version control: Each commit is a semantic change

When AI output is predictable, you can build automation that relies on it: bulk testing, static analysis, automated migrations, documentation generation.
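Here is one example of automation that leans on predictable structure. A few lines can answer "which workflows would a change to payment.charge affect?" across every DSL file. This is a minimal sketch. It assumes the files are already parsed into dicts keyed by workflow name, with steps that use an "operation" key. operation_usage and the sample workflows are hypothetical.

```python
from collections import defaultdict
from typing import Any


def operation_usage(workflows: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Map each operation name to the sorted names of workflows that call it."""
    usage: dict[str, set[str]] = defaultdict(set)
    for workflow_name, workflow in workflows.items():
        for step in workflow.get("steps", []):
            if "operation" in step:
                usage[step["operation"]].add(workflow_name)
    return {op: sorted(names) for op, names in sorted(usage.items())}


# Hypothetical parsed workflows, keyed by name.
workflows = {
    "subscription_renewal": {"steps": [{"operation": "database.query"},
                                       {"operation": "payment.charge"}]},
    "refund_request": {"steps": [{"operation": "payment.refund"},
                                 {"operation": "notification.email"}]},
    "order_ready": {"steps": [{"operation": "notification.email"}]},
}
usage = operation_usage(workflows)
print(usage["notification.email"])  # ['order_ready', 'refund_request']
```

The same inventory is a cheap way to check backward compatibility before a core operation changes.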

Built-In Governance

The DSL is your governance layer:

Version control: DSL files tracked in git
Code review: Even AI-generated changes get PR’d
Schema validation: DSL structure validated before execution
Access control: Permissions checked, unauthorized operations gated
Audit trail: Every execution logged with full context
Rollback: Revert the DSL file
Testing: DSL expressions can be unit tested

Progressive Complexity

Start simple, expand as needed:

Step 1: Basic transforms

transform:
  input: customer_data
  operations:
    - map:
        name: "${first_name} ${last_name}"
        email: "${email.toLowerCase()}"
    - filter: "age >= 18"
  output: validated_customers
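A minimal interpreter for the Step 1 transform follows. It rests on loud assumptions:

- Templates support only ${field} and the single whitelisted method .toLowerCase().
- Filters support only "field <op> number".
- map adds or overwrites fields rather than projecting, which is why the later filter on age still works.

Anything else is rejected rather than evaluated, so no DSL string ever reaches eval. All names are hypothetical.

```python
import operator
import re
from typing import Any, Callable

# Hypothetical mini-grammar: ${field} or ${field.toLowerCase()} in templates,
# and "field <op> number" in filters. Nothing here ever calls eval().
TEMPLATE = re.compile(r"\$\{(\w+)(\.toLowerCase\(\))?\}")
FILTER = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}

Record = dict[str, Any]


def render(template: str, record: Record) -> str:
    """Fill ${...} placeholders from the record; unknown fields raise KeyError."""
    def substitute(match: re.Match) -> str:
        value = str(record[match.group(1)])
        return value.lower() if match.group(2) else value
    return TEMPLATE.sub(substitute, template)


def parse_filter(expr: str) -> Callable[[Record], bool]:
    """Compile 'field op number' into a predicate; reject any other syntax."""
    match = FILTER.match(expr)
    if not match:
        raise ValueError(f"unsupported filter expression: {expr!r}")
    field, op, literal = match.groups()
    return lambda record: OPS[op](record[field], float(literal))


def run_transform(spec: dict[str, Any], records: list[Record]) -> list[Record]:
    """Apply map/filter operations in order (map merges new fields into each record)."""
    for step in spec["operations"]:
        if "map" in step:
            records = [{**r, **{k: render(t, r) for k, t in step["map"].items()}}
                       for r in records]
        elif "filter" in step:
            keep = parse_filter(step["filter"])
            records = [r for r in records if keep(r)]
        else:
            raise ValueError(f"unsupported operation: {step!r}")
    return records


spec = {
    "input": "customer_data",
    "operations": [
        {"map": {"name": "${first_name} ${last_name}", "email": "${email.toLowerCase()}"}},
        {"filter": "age >= 18"},
    ],
    "output": "validated_customers",
}
customers = [
    {"first_name": "Ada", "last_name": "Lovelace", "email": "ADA@Example.com", "age": 36},
    {"first_name": "Tim", "last_name": "Minor", "email": "tim@example.com", "age": 15},
]
result = run_transform(spec, customers)
print([(c["name"], c["email"]) for c in result])  # [('Ada Lovelace', 'ada@example.com')]
```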

Step 2: Multi-step workflows

workflow:
  name: onboard_customer
  steps:
    - validate: customer_schema
    - call: create_account_service
    - send: welcome_email_template
    - log: audit.customer_created

Step 3: Complex rules

decision_tree:
  name: loan_approval
  rules:
    - if: "credit_score >= 700 && debt_ratio < 0.3"
      then: approve
    - if: "credit_score >= 650 && income > 75000"
      then: manual_review
    - default: reject
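Step 3 needs a condition language, but not a general-purpose one. The following is a minimal sketch under stated assumptions:

- Conditions are only &&-joined comparisons of a field against a numeric literal.
- Rules are evaluated top to bottom, and the first match wins.
- default is the fallback.

The parser and function names are hypothetical and deliberately tiny.

```python
import operator
import re
from typing import Any

CLAUSE = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def holds(condition: str, facts: dict[str, float]) -> bool:
    """Evaluate 'a >= 1 && b < 2' without eval(); unsupported syntax raises."""
    for clause in condition.split("&&"):
        match = CLAUSE.match(clause)
        if not match:
            raise ValueError(f"unsupported clause: {clause.strip()!r}")
        field, op, literal = match.groups()
        if not OPS[op](facts[field], float(literal)):
            return False  # short-circuit like &&
    return True


def decide(tree: dict[str, Any], facts: dict[str, float]) -> str:
    """Return the outcome of the first matching rule, else the default."""
    for rule in tree["rules"]:
        if "default" in rule:
            return rule["default"]
        if holds(rule["if"], facts):
            return rule["then"]
    raise ValueError(f"decision tree {tree['name']!r} has no default")


loan_approval = {
    "name": "loan_approval",
    "rules": [
        {"if": "credit_score >= 700 && debt_ratio < 0.3", "then": "approve"},
        {"if": "credit_score >= 650 && income > 75000", "then": "manual_review"},
        {"default": "reject"},
    ],
}
print(decide(loan_approval, {"credit_score": 720, "debt_ratio": 0.25, "income": 50000}))  # approve
print(decide(loan_approval, {"credit_score": 660, "debt_ratio": 0.4, "income": 80000}))   # manual_review
print(decide(loan_approval, {"credit_score": 600, "debt_ratio": 0.1, "income": 90000}))   # reject
```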

Evolving the Core

The DSL pattern doesn’t freeze your core. Business requirements change. New integrations happen. Core components need updates.

The difference is how carefully you handle these changes.

AI Can Help, But With Guardrails

Aspect	DSL Layer	Core Layer
AI role	Generates complete implementation	Assists human developer
Review	1-2 reviewers, standard process	2-3+ senior engineers, stringent
Testing	Schema + access validation + integration	Unit + integration + security + perf
Deployment	Hours	Days/weeks with staged rollout
Approval	Team lead	Architect + security sign-off

Example: Adding SMS Support

DSL team needs SMS:

workflow:
  steps:
    - send_notification:
        type: sms  # Not supported yet!
        recipient: "${customer.phone}"
        message: "Your order is ready"

Engineering ticket:

Title: Add SMS notification to core

Requirements:
- Twilio integration
- Rate limiting (100/min per customer)
- PII compliance (phone encryption)
- Delivery tracking
- Error handling

AI Assistance: Medium
Review: Stringent
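To make "100/min per customer" concrete for review, here is an in-memory sliding-window limiter. It is a stand-in under stated assumptions. The real implementation would keep this state in shared storage such as Redis, as the interface docstring below notes, because a per-process dict is not shared across servers. The class name and the injectable clock are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window_s` seconds for each key."""

    def __init__(self, limit: int = 100, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have slid out of the window.
        while events and now - events[0] >= self.window_s:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


# A fake clock makes the behavior easy to follow.
t = [0.0]
limiter = SlidingWindowLimiter(limit=100, window_s=60.0, clock=lambda: t[0])
results = [limiter.allow("customer-42") for _ in range(101)]
print(results.count(True), results[-1])  # 100 False
t[0] = 60.0
print(limiter.allow("customer-42"))      # True (the window has slid)
```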

Development with AI:

from __future__ import annotations  # NotificationResult is defined elsewhere in the core

# Developer writes interface (human-led)
class NotificationService:
    def send_sms(
        self,
        recipient: str,
        message: str,
        idempotency_key: str,
    ) -> NotificationResult:
        """
        Send SMS with PII protection and rate limiting.

        Security: Phone numbers encrypted at rest
        Rate limit: 100/min per customer via Redis
        Idempotency: Prevents duplicate sends
        """
        raise NotImplementedError  # AI helps implement; humans review

# AI generates implementation
# Developer reviews line-by-line:
# - PII fields encrypted?
# - Rate limiting correct?
# - Errors handled safely?
# - Logs sanitized?

# Multiple iterations with AI
# Human adds edge cases AI missed
# Security team reviews

Review checklist:

Security: No SQL injection, XSS, auth bypass, credential leaks
PII compliance: Encrypted, logged safely, retained correctly
Error handling: All exceptions caught, no data loss
Performance: Load tested, no N+1 queries
Backward compatibility: Doesn’t break existing DSL
Documentation: API docs, security notes, examples
Test coverage: >90%, security tests included
Deployment plan: Staged rollout, monitoring, rollback
On-call runbook: Debugging steps, incident response

After approval, extend DSL:

# dsl/operations.yaml
notification_operations:
  send_sms:
    maps_to: NotificationService.send_sms
    parameters:
      recipient: {type: phone_number, required: true}
      message: {type: string, max_length: 160}
    rate_limit: 100_per_minute
    security_level: pii_handling

Now AI can use it:

workflow:
  name: order_ready_notification
  steps:
    - send_notification:
        type: sms
        recipient: "${customer.phone}"
        message: "Order #${order.id} ready for pickup!"

The Two-Track Model

Fast Track (DSL Layer):

AI generates complete workflows
Quick review (hours to days)
Deploy frequently (multiple times daily)
Low risk (sandboxed)
Team lead approval

Careful Track (Core Layer):

AI assists humans
Thorough review (days to weeks)
Deploy infrequently (weekly to monthly)
High risk (full system access)
Architecture team approval

This maximizes velocity where risk is lower (DSL layer) while enforcing rigor where it’s critical (core layer).

When to Update Core vs DSL

Extend DSL:

Requirements fit existing primitives
New combination of existing operations
Customer-specific logic
Experimental feature

Update Core:

Completely new capability (new API, data source)
Performance optimization at runtime
Security requirements changed
Bug fix in existing component

The DSL keeps most changes on the fast track, roughly 90% by this estimate. Only around 10% need the careful track.

Real-World Example: Payment Workflow

Critical Core (human-written, audited):

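# payments/core.py
from __future__ import annotations  # PaymentResult and RefundResult are defined elsewhere

from decimal import Decimal

class PaymentService: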
    """Secure payment processing - human written"""

    def charge_card(self, customer_id: str, amount: Decimal) -> PaymentResult:
        # Extensive validation
        # PCI-compliant token handling
        # Idempotency checks
        # Fraud detection
        # Comprehensive logging
        ...

    def refund(self, transaction_id: str, amount: Decimal) -> RefundResult:
        ...

DSL Operations:

# dsl/operations.yaml
payment_operations:
  charge:
    maps_to: PaymentService.charge_card
    parameters:
      customer_id: {type: string, required: true}
      amount: {type: decimal, min: 0.01, max: 10000}

  refund:
    maps_to: PaymentService.refund
    parameters:
      transaction_id: {type: string, required: true}
      amount: {type: decimal, min: 0.01}
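Declaring parameters in dsl/operations.yaml lets the runtime reject bad calls before they reach PaymentService.charge_card or NotificationService.send_sms. The following is a minimal validator. It assumes the YAML has been loaded into dicts, and it covers only the constraint keys used in this post: type, required, min, max, and max_length. The phone_number pattern is a hypothetical, deliberately loose E.164-style check. Non-parameter keys such as rate_limit are out of scope here.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Any

PHONE = re.compile(r"^\+[1-9]\d{7,14}$")  # hypothetical, loose E.164-style check


def check_type(kind: str, value: Any) -> Any:
    """Verify (and for decimals, convert) a value; raise ValueError on mismatch."""
    if kind == "string":
        if not isinstance(value, str):
            raise ValueError("expected string")
        return value
    if kind == "phone_number":
        if not isinstance(value, str) or not PHONE.match(value):
            raise ValueError("expected phone number like +15551234567")
        return value
    if kind == "decimal":
        try:
            number = Decimal(str(value))
        except InvalidOperation:
            raise ValueError("expected decimal") from None
        if not number.is_finite():  # reject NaN and Infinity
            raise ValueError("expected decimal")
        return number
    raise ValueError(f"unknown type {kind!r}")


def validate_params(spec: dict[str, dict[str, Any]], params: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    problems = [f"{name}: not a declared parameter"
                for name in sorted(params.keys() - spec.keys())]
    for name, rules in spec.items():
        if name not in params:
            if rules.get("required"):
                problems.append(f"{name}: required")
            continue
        try:
            value = check_type(rules["type"], params[name])
        except ValueError as err:
            problems.append(f"{name}: {err}")
            continue
        if "min" in rules and value < Decimal(str(rules["min"])):
            problems.append(f"{name}: below min {rules['min']}")
        if "max" in rules and value > Decimal(str(rules["max"])):
            problems.append(f"{name}: above max {rules['max']}")
        if "max_length" in rules and len(value) > rules["max_length"]:
            problems.append(f"{name}: longer than {rules['max_length']}")
    return problems


charge_spec = {
    "customer_id": {"type": "string", "required": True},
    "amount": {"type": "decimal", "min": 0.01, "max": 10000},
}
sms_spec = {
    "recipient": {"type": "phone_number", "required": True},
    "message": {"type": "string", "max_length": 160},
}
print(validate_params(charge_spec, {"customer_id": "c_1", "amount": "49.99"}))  # []
print(validate_params(charge_spec, {"amount": 25000}))
# ['customer_id: required', 'amount: above max 10000']
print(validate_params(sms_spec, {"recipient": "555-1234", "message": "x" * 200}))
# ['recipient: expected phone number like +15551234567', 'message: longer than 160']
```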

AI-Generated Workflow:

# workflows/subscription_renewal.yaml
# Generated by AI: "Create subscription renewal with retry logic"

workflow:
  name: subscription_renewal

  steps:
    - name: fetch_subscription
      operation: database.query
      params:
        table: subscriptions
        filter: "next_billing_date <= today()"

    - name: charge_customer
      operation: payment.charge
      params:
        customer_id: "${fetch_subscription.customer_id}"
        amount: "${fetch_subscription.amount}"
      retry:
        max_attempts: 3
        backoff: exponential
      on_error:
        goto: handle_failure

    - name: update_subscription
      operation: database.update
      params:
        table: subscriptions
        id: "${fetch_subscription.id}"
        fields:
          last_charge: "${charge_customer.transaction_id}"
          next_billing_date: "addMonths(today(), 1)"

    - name: send_receipt
      operation: notification.email
      params:
        template: receipt
        recipient: "${fetch_subscription.email}"
        data: "${charge_customer}"

    - name: handle_failure
      condition: "charge_customer.failed"
      operation: notification.email
      params:
        template: payment_failed
        recipient: "${fetch_subscription.email}"
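The retry and on_error keys are safe only because the runtime, not the AI, implements them. The following is a minimal sketch of how a runtime might honor max_attempts: 3 with backoff: exponential. It makes these assumptions:

- The base delay is 0.5 seconds and doubles per attempt.
- The sleep function is injectable, so tests need not wait.
- The failure is returned rather than raised, so the caller can jump to the step named in goto.

The names and delays are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StepOutcome:
    ok: bool
    value: Any = None
    error: Optional[Exception] = None
    attempts: int = 0


def run_with_retry(
    action: Callable[[], Any],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> StepOutcome:
    """Call `action` up to max_attempts times, doubling the delay between tries."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return StepOutcome(ok=True, value=action(), attempts=attempt)
        except Exception as err:  # a real runtime would retry only transient errors
            last_error = err
            if attempt < max_attempts:
                sleep(base_delay_s * 2 ** (attempt - 1))  # 0.5s, then 1.0s
    return StepOutcome(ok=False, error=last_error, attempts=max_attempts)


# A charge that fails twice, then succeeds; record delays instead of sleeping.
calls = {"n": 0}


def flaky_charge():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("gateway timeout")
    return {"transaction_id": "txn_123"}


delays = []
outcome = run_with_retry(flaky_charge, sleep=delays.append)
print(outcome.ok, outcome.attempts, delays)  # True 3 [0.5, 1.0]
next_step = "update_subscription" if outcome.ok else "handle_failure"  # on_error: goto
```

Retrying a charge is reasonable here only because PaymentService.charge_card performs idempotency checks. Without them, a retry after a timeout could charge the customer twice.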

Implementation Patterns

Start with YAML/JSON

Declarative configs are easy to parse, validate, and version control:

# Simple and scannable
workflow:
  name: process_order
  steps:
    - validate: order_schema
    - call: payment_service
    - send: confirmation_email

Pros: Easy to parse, validate, version control
Cons: Limited expressiveness for complex logic
Best for: Workflows, data transforms, simple rules

Embedded DSLs for Complexity

For more sophisticated logic, embed a scripting language that the runtime can sandbox:

-- Lua embedded in runtime (sandboxed)
function calculate_discount(customer)
  if customer.orders > 10 then
    return 0.15
  elseif customer.lifetime_value > 1000 then
    return 0.10
  else
    return 0.05
  end
end

Pros: More expressive, handles complex logic
Cons: Requires sandboxing, more attack surface
Best for: Complex calculations, business rules, pricing

Schema Validation

Define strict schemas:

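{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "workflow": {
      "type": "object",
      "properties": {
        "name": {"type": "string", "pattern": "^[a-z_]+$"},
        "steps": {"type": "array", "minItems": 1, "items": {"type": "object"}}
      },
      "required": ["name", "steps"],
      "additionalProperties": false
    }
  },
  "required": ["workflow"],
  "additionalProperties": false
}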

Testing Infrastructure

Make DSL testing trivial:

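# dsl_test_framework.py
# DSLRuntime, load_dsl, and the Mock* services come from your DSL runtime's test utilities.

def run_dsl_workflow(dsl_file, mock_services):
    """Execute a DSL file against mocked services and return the result.

    Deliberately not named test_*: pytest would collect it as a test and
    try to resolve dsl_file and mock_services as fixtures.
    """
    runtime = DSLRuntime(services=mock_services)
    return runtime.execute(load_dsl(dsl_file))

# Example test
def test_subscription_renewal():
    mocks = {
        'database': MockDatabase(),
        'payment': MockPaymentService(),
        'notification': MockNotificationService()
    }

    result = run_dsl_workflow('subscription_renewal.yaml', mocks)
    assert result.success
    assert mocks['payment'].charge_called
    assert mocks['notification'].receipt_sent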

Security Considerations

Sandbox the Runtime

Resource limits: CPU, memory, execution time
Network isolation: Control external service calls
Data access: Enforce row-level security, field masking
Capabilities: Explicitly grant permissions per DSL file

Audit Everything

audit_log = {
    'timestamp': '2026-01-06T10:30:00Z',
    'dsl_file': 'subscription_renewal.yaml',
    'dsl_version': 'abc123',
    'triggered_by': 'scheduled_job',
    'operations_executed': [
        'database.query',
        'payment.charge',
        'database.update',
        'notification.email'
    ],
    'duration_ms': 245,
    'success': True
}
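One way to make "audit everything" hard to skip is to have the runtime wrap every execution, so the record is written whether the workflow succeeds or fails. The following is a minimal sketch. It assumes JSON Lines output to any writable stream and reuses the field names from the example record above. audited_run and fake_workflow are hypothetical.

```python
import io
import json
import time
from datetime import datetime, timezone
from typing import Any, Callable, TextIO


def audited_run(
    dsl_file: str,
    dsl_version: str,
    triggered_by: str,
    run: Callable[[list[str]], Any],
    sink: TextIO,
) -> Any:
    """Execute `run`, then append one JSON audit line even if it raises."""
    executed: list[str] = []  # the runtime appends each operation name as it runs
    started = time.monotonic()
    record: dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "dsl_file": dsl_file,
        "dsl_version": dsl_version,
        "triggered_by": triggered_by,
    }
    try:
        result = run(executed)
        record["success"] = True
        return result
    except Exception as err:
        record["success"] = False
        record["error"] = type(err).__name__  # type only: messages may contain PII
        raise
    finally:
        record["operations_executed"] = executed
        record["duration_ms"] = round((time.monotonic() - started) * 1000)
        sink.write(json.dumps(record) + "\n")


def fake_workflow(executed: list[str]) -> str:
    for op in ["database.query", "payment.charge"]:
        executed.append(op)
    return "done"


sink = io.StringIO()
audited_run("subscription_renewal.yaml", "abc123", "scheduled_job", fake_workflow, sink)
entry = json.loads(sink.getvalue())
print(entry["operations_executed"], entry["success"])  # ['database.query', 'payment.charge'] True
```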

Review Process

AI generates DSL
Automated schema validation (structure, syntax)
Automated access validation (permissions, authorized operations)
Automated tests run against mocks
Human review approves
Staged rollout: dev → staging → prod

Getting Started

Pick one workflow: Start with a single, non-critical process
Extract the core: Pull critical components into services
Design a minimal DSL: 3-5 operations covering 80% of the use case
Have AI generate it: Use your preferred model
Test rigorously: Validate, test, review
Iterate: Learn and expand

The Bottom Line

AI code generation is too valuable to avoid, too risky to use directly.

DSLs provide the safety partition:

AI moves fast on business logic
Humans maintain security and infrastructure
The DSL enforces boundaries and governance
Everyone wins: productivity, security, velocity

The architecture emerged from real experience integrating AI into enterprise systems. The key insight: don’t fight AI’s tendency to generate code. Just make sure it’s generating the right kind of code.