Safe(r) Deterministic AI Code Generation Through DSL Partitioning


AI can write code faster than you can review it. Deploying AI-generated code straight to production?

Risky business!

The promise is obvious: write business logic, data transformations, API integrations, and workflows at the pace of conversation. The risk is equally obvious: AI-generated code can be insecure, untested, and unpredictable. Worse, it’s non-deterministic: ask an LLM to implement the same requirements twice and you’ll usually get different code.

This creates a problem: we want the velocity, but can’t afford the risk.


A Solution: Partition Generation with a DSL

Don’t fight AI code generation. Constrain where it operates.

A Domain-Specific Language sits between AI-generated orchestration and human-written critical components. The DSL acts as a capability boundary—AI can only express operations the DSL supports, and the DSL only exposes safe, pre-vetted primitives.

This narrows the interaction surface, which makes AI output far more deterministic and predictable. The goal: same requirements, same DSL output.

Who Writes What?

| Layer | Who Writes | Change Frequency | Testing | Examples |
|---|---|---|---|---|
| Critical Core | Humans + careful AI | Monthly | Exhaustive | Auth, DB access, encryption |
| DSL Runtime | Humans | Monthly | Thorough | Parser, executor, validators |
| Business Logic | AI + oversight | Daily/hourly | Schema + access validation | Workflows, transforms, rules |

Note: AI can help with core components, but those PRs require stringent review (more on this later).


Why This is Better

Speed With Reduced Risk

AI iterates on business logic rapidly because it’s working in a high-level DSL. Changes happen in hours, not days.

But the DSL runtime ensures that no matter what the AI generates:

  • The schema is validated before execution
  • Access permissions are checked and unauthorized operations are gated
  • Only approved APIs are called
  • Data access goes through validated queries
  • Execution stays within resource limits
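
As a rough illustration of the first few guarantees, here is a minimal sketch of a pre-execution gate. The `Policy` class, the role names, and the `operation` key on each step are assumptions made for this example, not a prescribed runtime design.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical policy: which operations exist and who may call them."""
    approved_operations: frozenset
    # Maps an operation name to the role required to run it (assumed model).
    required_role: dict = field(default_factory=dict)


def check_workflow(workflow: dict, policy: Policy, caller_roles: set) -> list:
    """Return a list of violations; an empty list means the workflow may run."""
    steps = workflow.get("steps")
    if not isinstance(steps, list):
        return ["workflow.steps must be a list"]
    violations = []
    for index, step in enumerate(steps):
        operation = step.get("operation") if isinstance(step, dict) else None
        if operation not in policy.approved_operations:
            violations.append(f"step {index}: operation {operation!r} is not approved")
            continue
        role = policy.required_role.get(operation)
        if role is not None and role not in caller_roles:
            violations.append(f"step {index}: {operation!r} requires role {role!r}")
    return violations


policy = Policy(
    approved_operations=frozenset({"database.query", "payment.charge"}),
    required_role={"payment.charge": "billing"},
)
workflow = {"steps": [
    {"operation": "database.query"},
    {"operation": "payment.charge"},
    {"operation": "shell.exec"},  # not in the DSL vocabulary at all
]}
problems = check_workflow(workflow, policy, caller_roles={"support"})
```

The gate collects every violation instead of stopping at the first, so a reviewer sees the full picture in one pass. Nothing executes unless the list comes back empty.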

Determinism Through Constraint

Direct code generation is a mess:

# Generation 1
def process_order(order):
    if order.total > 100:
        discount = order.total * 0.1
        return order.total - discount
    return order.total

# Generation 2: Same requirements, different code
def process_order(order_data):
    base_amount = order_data["total"]
    discount_rate = 0.1 if base_amount > 100 else 0.0
    final_amount = base_amount * (1 - discount_rate)
    return final_amount

Both work. But they differ in syntax, variable names, and structure. Code reviews get tedious. Testing gets inconsistent. Git diffs get noisy.

DSLs eliminate this:

# Generation 1
discount_rule:
  threshold: 100
  rate: 0.10

# Generation 2: Identical
discount_rule:
  threshold: 100
  rate: 0.10

Limited vocabulary. Enforced structure. Single canonical representation.

This means:

  • Predictable code review: Clear, scannable diffs
  • Consistent testing: Same test works every time
  • Unified debugging: Same error format always
  • Meaningful version control: Each commit is a semantic change

When AI output is predictable, you can build automation that relies on it—bulk testing, static analysis, automated migrations, documentation generation.
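
One small piece of such automation, sketched under the assumption that DSL files are parsed into plain dictionaries: a content fingerprint that ignores key order. The `fingerprint` helper is hypothetical, but `json.dumps(sort_keys=True)` and `hashlib.sha256` are standard library calls.

```python
import hashlib
import json


def fingerprint(dsl_document: dict) -> str:
    """Hash a parsed DSL document so that key order does not matter."""
    canonical = json.dumps(dsl_document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


generation_1 = {"discount_rule": {"threshold": 100, "rate": 0.10}}
generation_2 = {"discount_rule": {"rate": 0.10, "threshold": 100}}  # keys reordered
changed = {"discount_rule": {"threshold": 100, "rate": 0.15}}

same = fingerprint(generation_1) == fingerprint(generation_2)
different = fingerprint(generation_1) != fingerprint(changed)
```

A CI job could compare fingerprints to skip re-review of regenerated files that are semantically unchanged.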

Built-In Governance

The DSL is your governance layer:

  • Version control: DSL files tracked in git
  • Code review: Even AI-generated changes get PR’d
  • Schema validation: DSL structure validated before execution
  • Access control: Permissions checked, unauthorized operations gated
  • Audit trail: Every execution logged with full context
  • Rollback: Revert the DSL file
  • Testing: DSL expressions can be unit tested

Progressive Complexity

Start simple, expand as needed:

Step 1: Basic transforms

transform:
  input: customer_data
  operations:
    - map:
        name: "${first_name} ${last_name}"
        email: "${email.toLowerCase()}"
    - filter: "age >= 18"
  output: validated_customers

Step 2: Multi-step workflows

workflow:
  name: onboard_customer
  steps:
    - validate: customer_schema
    - call: create_account_service
    - send: welcome_email_template
    - log: audit.customer_created
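
A runtime for this kind of workflow can stay small. The sketch below assumes each step is a single `verb: target` pair and that handlers are registered per verb. The handler registry and the `UnknownStepError` name are illustrative choices.

```python
from typing import Callable, Dict, List


class UnknownStepError(ValueError):
    """Raised when a step uses a verb the DSL does not define."""


def run_workflow(workflow: dict, handlers: Dict[str, Callable[[str], None]]) -> List[str]:
    """Validate every step first, then run them in order."""
    plan = []
    for step in workflow["steps"]:
        if not isinstance(step, dict) or len(step) != 1:
            raise UnknownStepError(f"each step needs exactly one verb, got {step!r}")
        (verb, target), = step.items()
        if verb not in handlers:
            raise UnknownStepError(f"verb {verb!r} is not part of the DSL")
        plan.append((verb, target))
    # Only reached when the whole workflow is valid, so nothing runs partially.
    for verb, target in plan:
        handlers[verb](target)
    return [f"{verb}:{target}" for verb, target in plan]


calls = []
handlers = {
    verb: (lambda target, verb=verb: calls.append((verb, target)))
    for verb in ("validate", "call", "send", "log")
}
workflow = {"name": "onboard_customer", "steps": [
    {"validate": "customer_schema"},
    {"call": "create_account_service"},
    {"send": "welcome_email_template"},
    {"log": "audit.customer_created"},
]}
trace = run_workflow(workflow, handlers)
```

Validating the whole plan before running anything means a bad step near the end cannot leave the workflow half executed.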

Step 3: Complex rules

decision_tree:
  name: loan_approval
  rules:
    - if: "credit_score >= 700 && debt_ratio < 0.3"
      then: approve
    - if: "credit_score >= 650 && income > 75000"
      then: manual_review
    - default: reject
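
Rule strings like these should never reach `eval`. One possible approach, sketched here with Python's `ast` module, is to parse the expression and walk only an allowlist of node types. Translating `&&` and `||` to Python keywords is an assumption about the rule syntax.

```python
import ast
import operator

_COMPARATORS = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def evaluate(expression: str, facts: dict) -> bool:
    """Evaluate a rule such as "a >= 1 && b < 2" against a dict of facts."""
    source = expression.replace("&&", " and ").replace("||", " or ")
    return bool(_eval_node(ast.parse(source, mode="eval").body, facts))


def _eval_node(node, facts):
    if isinstance(node, ast.BoolOp):
        values = [_eval_node(value, facts) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        left = _eval_node(node.left, facts)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _COMPARATORS:
                raise ValueError(f"comparison {type(op).__name__} is not allowed")
            right = _eval_node(comparator, facts)
            if not _COMPARATORS[type(op)](left, right):
                return False
            left = right
        return True
    if isinstance(node, ast.Name):
        if node.id not in facts:
            raise ValueError(f"unknown fact {node.id!r}")
        return facts[node.id]
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    # Calls, attribute access, subscripts, etc. all end up here.
    raise ValueError(f"{type(node).__name__} is not allowed in a rule")


def decide(tree: dict, facts: dict) -> str:
    """Return the outcome of the first matching rule, or the default."""
    for rule in tree["rules"]:
        if "default" in rule:
            return rule["default"]
        if evaluate(rule["if"], facts):
            return rule["then"]
    raise ValueError("no rule matched and no default was given")


loan_approval = {"name": "loan_approval", "rules": [
    {"if": "credit_score >= 700 && debt_ratio < 0.3", "then": "approve"},
    {"if": "credit_score >= 650 && income > 75000", "then": "manual_review"},
    {"default": "reject"},
]}
outcome = decide(loan_approval, {"credit_score": 720, "debt_ratio": 0.2, "income": 50000})
```

Anything outside names, numbers, comparisons, and boolean operators raises an error, so a generated rule cannot smuggle in a function call.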

Evolving the Core

The DSL pattern doesn’t freeze your core. Business requirements change. New integrations happen. Core components need updates.

The difference is how carefully you handle these changes.

AI Can Help, But With Guardrails

| Aspect | DSL Layer | Core Layer |
|---|---|---|
| AI role | Generates complete implementation | Assists human developer |
| Review | 1-2 reviewers, standard process | 2-3+ senior engineers, stringent |
| Testing | Schema + access validation + integration | Unit + integration + security + perf |
| Deployment | Hours | Days/weeks with staged rollout |
| Approval | Team lead | Architect + security sign-off |

Example: Adding SMS Support

DSL team needs SMS:

workflow:
  steps:
    - send_notification:
        type: sms  # Not supported yet!
        recipient: "${customer.phone}"
        message: "Your order is ready"

Engineering ticket:

Title: Add SMS notification to core

Requirements:
- Twilio integration
- Rate limiting (100/min per customer)
- PII compliance (phone encryption)
- Delivery tracking
- Error handling

AI Assistance: Medium
Review: Stringent

Development with AI:

# Developer writes interface (human-led)
class NotificationService:
    def send_sms(
        self,
        recipient: str,
        message: str,
        idempotency_key: str
    ) -> NotificationResult:
        """
        Send SMS with PII protection and rate limiting.

        Security: Phone numbers encrypted at rest
        Rate limit: 100/min per customer via Redis
        Idempotency: Prevents duplicate sends
        """
        pass  # AI helps implement

# AI generates implementation
# Developer reviews line-by-line:
# - PII fields encrypted?
# - Rate limiting correct?
# - Errors handled safely?
# - Logs sanitized?

# Multiple iterations with AI
# Human adds edge cases AI missed
# Security team reviews
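
To make "Rate limiting correct?" concrete, here is one way the 100/min check could look. The docstring above calls for Redis. This sketch swaps in an in-memory sliding window with an injectable clock so it can run and be tested on its own. The class name and parameters are assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` events per `window_seconds` for each key."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._events = defaultdict(deque)  # key -> timestamps of recent events

    def allow(self, key: str) -> bool:
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


# Small limit and a fake clock keep the demonstration deterministic.
fake_now = [0.0]
limiter = SlidingWindowRateLimiter(limit=2, window_seconds=60.0, clock=lambda: fake_now[0])
first, second, third = (limiter.allow("customer-1") for _ in range(3))
fake_now[0] = 61.0
after_window = limiter.allow("customer-1")
```

An in-process limiter like this only holds for a single worker. That is exactly the kind of edge case the human reviewer should raise before the Redis-backed version ships.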

Review checklist:

  • Security: No SQL injection, XSS, auth bypass, credential leaks
  • PII compliance: Encrypted, logged safely, retained correctly
  • Error handling: All exceptions caught, no data loss
  • Performance: Load tested, no N+1 queries
  • Backward compatibility: Doesn’t break existing DSL
  • Documentation: API docs, security notes, examples
  • Test coverage: >90%, security tests included
  • Deployment plan: Staged rollout, monitoring, rollback
  • On-call runbook: Debugging steps, incident response

After approval, extend DSL:

# dsl/operations.yaml
notification_operations:
  send_sms:
    maps_to: NotificationService.send_sms
    parameters:
      recipient: {type: phone_number, required: true}
      message: {type: string, max_length: 160}
    rate_limit: 100_per_minute
    security_level: pii_handling

Now AI can use it:

workflow:
  name: order_ready_notification
  steps:
    - send_notification:
        type: sms
        recipient: "${customer.phone}"
        message: "Order #${order.id} ready for pickup!"

The Two-Track Model

Fast Track (DSL Layer):

  • AI generates complete workflows
  • Quick review (hours to days)
  • Deploy frequently (multiple times daily)
  • Low risk (sandboxed)
  • Team lead approval

Careful Track (Core Layer):

  • AI assists humans
  • Thorough review (days to weeks)
  • Deploy infrequently (weekly to monthly)
  • High risk (full system access)
  • Architecture team approval

This maximizes velocity where risk is lower (DSL layer) while enforcing rigor where it’s critical (core layer).

When to Update Core vs DSL

Extend DSL:

  • Requirements fit existing primitives
  • New combination of existing operations
  • Customer-specific logic
  • Experimental feature

Update Core:

  • Completely new capability (new API, data source)
  • Performance optimization at runtime
  • Security requirements changed
  • Bug fix in existing component

The goal is for the DSL to keep roughly 90% of changes on the fast track, so that only about 10% need the careful track.


Real-World Example: Payment Workflow

Critical Core (human-written, audited):

# payments/core.py
class PaymentService:
    """Secure payment processing - human written"""

    def charge_card(self, customer_id: str, amount: Decimal) -> PaymentResult:
        # Extensive validation
        # PCI-compliant token handling
        # Idempotency checks
        # Fraud detection
        # Comprehensive logging
        ...

    def refund(self, transaction_id: str, amount: Decimal) -> RefundResult:
        ...

DSL Operations:

# dsl/operations.yaml
payment_operations:
  charge:
    maps_to: PaymentService.charge_card
    parameters:
      customer_id: {type: string, required: true}
      amount: {type: decimal, min: 0.01, max: 10000}

  refund:
    maps_to: PaymentService.refund
    parameters:
      transaction_id: {type: string, required: true}
      amount: {type: decimal, min: 0.01}
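
How `maps_to` gets resolved is up to the runtime. One possible shape, sketched below: check the declared bounds, then look the method up on a registry of service instances. `FakePaymentService` stands in for the audited core and `dispatch` is a hypothetical helper.

```python
from decimal import Decimal


class FakePaymentService:
    """Stand-in for the human-written PaymentService; it only records calls."""

    def __init__(self):
        self.charges = []

    def charge_card(self, customer_id, amount):
        self.charges.append((customer_id, amount))
        return f"txn-{len(self.charges)}"


def dispatch(operation: dict, services: dict, params: dict):
    """Check declared bounds, then call the method named by `maps_to`."""
    unknown = set(params) - set(operation["parameters"])
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    checked = dict(params)
    for name, rules in operation["parameters"].items():
        if name not in params:
            if rules.get("required"):
                raise ValueError(f"{name} is required")
            continue
        if rules["type"] == "decimal":
            value = Decimal(str(params[name]))
            if "min" in rules and value < Decimal(str(rules["min"])):
                raise ValueError(f"{name} is below the minimum of {rules['min']}")
            if "max" in rules and value > Decimal(str(rules["max"])):
                raise ValueError(f"{name} is above the maximum of {rules['max']}")
            checked[name] = value
    # `maps_to` comes from the human-reviewed operations file, never from AI output.
    service_name, method_name = operation["maps_to"].split(".")
    return getattr(services[service_name], method_name)(**checked)


charge_op = {
    "maps_to": "PaymentService.charge_card",
    "parameters": {
        "customer_id": {"type": "string", "required": True},
        "amount": {"type": "decimal", "min": 0.01, "max": 10000},
    },
}
payments = FakePaymentService()
services = {"PaymentService": payments}
txn = dispatch(charge_op, services, {"customer_id": "cus_42", "amount": "19.99"})
```

The point of the indirection is that a workflow can only name `payment.charge`; which Python method that reaches is decided in a file the AI does not write.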

AI-Generated Workflow:

# workflows/subscription_renewal.yaml
# Generated by AI: "Create subscription renewal with retry logic"

workflow:
  name: subscription_renewal

  steps:
    - name: fetch_subscription
      operation: database.query
      params:
        table: subscriptions
        filter: "next_billing_date <= today()"

    - name: charge_customer
      operation: payment.charge
      params:
        customer_id: "${fetch_subscription.customer_id}"
        amount: "${fetch_subscription.amount}"
      retry:
        max_attempts: 3
        backoff: exponential
      on_error:
        goto: handle_failure

    - name: update_subscription
      operation: database.update
      params:
        table: subscriptions
        id: "${fetch_subscription.id}"
        fields:
          last_charge: "${charge_customer.transaction_id}"
          next_billing_date: "${addMonths(today(), 1)}"

    - name: send_receipt
      operation: notification.email
      params:
        template: receipt
        recipient: "${fetch_subscription.email}"
        data: "${charge_customer}"

    - name: handle_failure
      condition: "charge_customer.failed"
      operation: notification.email
      params:
        template: payment_failed
        recipient: "${fetch_subscription.email}"
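
The `retry` block above leaves the mechanics to the runtime. A minimal sketch of what `max_attempts: 3` with `backoff: exponential` might mean in practice; the one-second base delay and the `(result, error)` return shape are assumptions.

```python
import time


def run_with_retry(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Returns (result, None) on success or (None, last_error) on failure,
    so the caller can route to its on_error step.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation(), None
        except Exception as error:  # a real runtime would catch narrower errors
            last_error = error
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None, last_error


delays = []
attempts = {"count": 0}


def flaky_charge():
    """Fails twice with a transient error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("gateway timeout")
    return {"transaction_id": "txn-1"}


# Recording delays instead of sleeping keeps the example fast.
result, error = run_with_retry(flaky_charge, sleep=delays.append)
```

Retrying a charge is only safe because the core service performs idempotency checks. Without them, a retry after a timeout could bill the customer twice.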

Implementation Patterns

Start with YAML/JSON

Declarative configs are easy to parse, validate, and version control:

# Simple and scannable
workflow:
  name: process_order
  steps:
    - validate: order_schema
    - call: payment_service
    - send: confirmation_email

Pros: Easy to parse, validate, version control
Cons: Limited expressiveness for complex logic
Best for: Workflows, data transforms, simple rules

Embedded DSLs for Complexity

For sophisticated logic, use embedded DSLs in safe languages:

-- Lua embedded in runtime (sandboxed)
function calculate_discount(customer)
  if customer.orders > 10 then
    return 0.15
  elseif customer.lifetime_value > 1000 then
    return 0.10
  else
    return 0.05
  end
end

Pros: More expressive, handles complex logic
Cons: Requires sandboxing, more attack surface
Best for: Complex calculations, business rules, pricing

Schema Validation

Define strict schemas:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "workflow": {
      "type": "object",
      "properties": {
        "name": {"type": "string", "pattern": "^[a-z_]+$"},
        "steps": {"type": "array"}
      },
      "required": ["name", "steps"]
    }
  },
  "required": ["workflow"]
}
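
In practice a JSON Schema validator library would enforce this. To show what the constraints amount to without adding a dependency, the sketch below hand-codes the same checks in plain Python. Treating a missing `workflow` key as an error is an assumption about intent, and `schema_errors` is a hypothetical helper.

```python
import re

# fullmatch avoids the quirk where "$" also matches before a trailing newline.
NAME_PATTERN = re.compile(r"[a-z_]+")


def schema_errors(document) -> list:
    """Hand-rolled stand-in for a JSON Schema validator, for this schema only."""
    if not isinstance(document, dict):
        return ["document: must be an object"]
    workflow = document.get("workflow")
    if not isinstance(workflow, dict):
        return ["workflow: must be an object"]
    errors = [f"workflow.{key}: required"
              for key in ("name", "steps") if key not in workflow]
    if "name" in workflow:
        name = workflow["name"]
        if not (isinstance(name, str) and NAME_PATTERN.fullmatch(name)):
            errors.append("workflow.name: must match ^[a-z_]+$")
    if "steps" in workflow and not isinstance(workflow["steps"], list):
        errors.append("workflow.steps: must be an array")
    return errors


valid = schema_errors({"workflow": {"name": "process_order", "steps": []}})
invalid = schema_errors({"workflow": {"name": "Process Order"}})
```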

Testing Infrastructure

Make DSL testing trivial:

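# dsl_test_framework.py
# DSLRuntime, load_dsl, and the Mock* classes are assumed to come from
# your own DSL runtime and test-support modules.

def run_dsl_workflow(dsl_file, mock_services):
    """Execute a DSL file against mocked services and return the result.

    Deliberately not named test_*, so pytest does not collect this helper
    as a test and fail on its missing fixtures.
    """
    runtime = DSLRuntime(services=mock_services)
    return runtime.execute(load_dsl(dsl_file))

# Example test
def test_subscription_renewal():
    mocks = {
        'database': MockDatabase(),
        'payment': MockPaymentService(),
        'notification': MockNotificationService()
    }

    result = run_dsl_workflow('subscription_renewal.yaml', mocks)
    assert result.success
    assert mocks['payment'].charge_called
    assert mocks['notification'].receipt_sent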

Security Considerations

Sandbox the Runtime

  • Resource limits: CPU, memory, execution time
  • Network isolation: Control external service calls
  • Data access: Enforce row-level security, field masking
  • Capabilities: Explicitly grant permissions per DSL file
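
Some of these limits can be enforced inside the interpreter itself. The sketch below caps step count and wall-clock time for one workflow run. It does not limit CPU or memory, which need OS-level or container-level controls. `ExecutionBudget` and `BudgetExceeded` are hypothetical names.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a workflow run goes over its step or time allowance."""


class ExecutionBudget:
    """Caps the number of steps and the wall-clock time of one workflow run."""

    def __init__(self, max_steps, max_seconds, clock=time.monotonic):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.clock = clock
        self.started_at = clock()
        self.steps_used = 0

    def charge_step(self):
        """Call before each step; raises once either limit is exceeded."""
        self.steps_used += 1
        if self.steps_used > self.max_steps:
            raise BudgetExceeded(f"more than {self.max_steps} steps")
        if self.clock() - self.started_at > self.max_seconds:
            raise BudgetExceeded(f"ran longer than {self.max_seconds} seconds")


# A fake clock makes the timeout path reproducible.
fake_now = [0.0]
budget = ExecutionBudget(max_steps=3, max_seconds=5.0, clock=lambda: fake_now[0])
budget.charge_step()
budget.charge_step()
fake_now[0] = 6.0  # simulate a slow step
try:
    budget.charge_step()
    timed_out = False
except BudgetExceeded:
    timed_out = True
```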

Audit Everything

audit_log = {
    'timestamp': '2026-01-06T10:30:00Z',
    'dsl_file': 'subscription_renewal.yaml',
    'dsl_version': 'abc123',
    'triggered_by': 'scheduled_job',
    'operations_executed': [
        'database.query',
        'payment.charge',
        'database.update',
        'notification.email'
    ],
    'duration_ms': 245,
    'success': True
}
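
A record like this is easiest to get right when the runtime builds it automatically. One possible approach is a context manager that times the run and writes the entry even when a step raises. The `audited` helper and the list used as a sink are assumptions for illustration.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def audited(dsl_file, dsl_version, triggered_by, sink):
    """Yield a list for executed operations; always append one record to `sink`."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dsl_file": dsl_file,
        "dsl_version": dsl_version,
        "triggered_by": triggered_by,
        "operations_executed": [],
        "success": False,
    }
    started = time.monotonic()
    try:
        yield record["operations_executed"]
        record["success"] = True  # only reached if the body did not raise
    finally:
        record["duration_ms"] = int((time.monotonic() - started) * 1000)
        sink.append(record)


log = []
with audited("subscription_renewal.yaml", "abc123", "scheduled_job", log) as operations:
    operations.append("database.query")
    operations.append("payment.charge")
```

Because the record is written in `finally`, failed runs are logged too, with `success` left as `False`.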

Review Process

  1. AI generates DSL
  2. Automated schema validation (structure, syntax)
  3. Automated access validation (permissions, authorized operations)
  4. Automated tests run against mocks
  5. Human review approves
  6. Staged rollout: dev → staging → prod

Getting Started

  1. Pick one workflow: Start with a single, non-critical process
  2. Extract the core: Pull critical components into services
  3. Design minimal DSL: 3-5 operations covering 80% of use case
  4. Have AI generate it: Use your preferred model
  5. Test rigorously: Validate, test, review
  6. Iterate: Learn and expand

The Bottom Line

AI code generation is too valuable to avoid, too risky to use directly.

DSLs provide the safety partition:

  • AI moves fast on business logic
  • Humans maintain security and infrastructure
  • The DSL enforces boundaries and governance
  • Everyone wins: productivity, security, velocity

The architecture emerged from real experience integrating AI into enterprise systems. The key insight: don’t fight AI’s tendency to generate code—just make sure it’s generating the right kind of code.