Safe(r) Deterministic AI Code Generation Through DSL Partitioning
AI can write code faster than you can review it. Deploying AI-generated code straight to production?

Risky business!
The promise is obvious: write business logic, data transformations, API integrations, and workflows at the pace of conversation. The risk is equally obvious: AI-generated code can be insecure, untested, unpredictable. Worse, it’s non-deterministic: ask an LLM to implement the same requirements twice and you’ll typically get noticeably different code.
This creates a problem: we want the velocity, but can’t afford the risk.
A Solution: Partition Generation with a DSL
Don’t fight AI code generation. Constrain where it operates.
A Domain-Specific Language sits between AI-generated orchestration and human-written critical components. The DSL acts as a capability boundary—AI can only express operations the DSL supports, and the DSL only exposes safe, pre-vetted primitives.
This narrows the interaction surface, which makes AI output far more predictable: with a small vocabulary and a single canonical form, the same requirements should yield the same DSL output.
Who Writes What?
| Layer | Who Writes | Change Frequency | Testing | Examples |
|---|---|---|---|---|
| Critical Core | Humans + careful AI | Monthly | Exhaustive | Auth, DB access, encryption |
| DSL Runtime | Humans | Monthly | Thorough | Parser, executor, validators |
| Business Logic | AI + oversight | Daily/hourly | Schema + access validation | Workflows, transforms, rules |
Note: AI can help with core components, but those PRs require stringent review (more on this later).
Why This is Better
Speed With Reduced Risk
AI iterates on business logic rapidly because it’s working in a high-level DSL. Changes happen in hours, not days.
But the DSL runtime ensures that no matter what the AI generates:
- Schema validated before execution
- Access permissions checked and unauthorized operations gated
- Only approved APIs are called
- Data access goes through validated queries
- Execution stays within resource limits
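The guarantees above amount to a gate that runs before any step executes. Here is a minimal sketch of such a gate, assuming a workflow is a list of steps with an `operation` key. The names `APPROVED_OPERATIONS`, `ROLE_PERMISSIONS`, and `check_workflow` are hypothetical, chosen for illustration only.

```python
# Hypothetical allowlist and role grants; a real runtime would load these
# from reviewed configuration rather than hard-coding them.
APPROVED_OPERATIONS = {"database.query", "payment.charge", "notification.email"}
ROLE_PERMISSIONS = {
    "billing_bot": {"database.query", "payment.charge"},
    "support_bot": {"database.query", "notification.email"},
}


def check_workflow(steps: list[dict], role: str) -> list[str]:
    """Return a list of violations; an empty list means the workflow may run."""
    violations = []
    granted = ROLE_PERMISSIONS.get(role, set())
    for index, step in enumerate(steps):
        operation = step.get("operation")
        if operation is None:
            violations.append(f"step {index}: missing 'operation'")
        elif operation not in APPROVED_OPERATIONS:
            violations.append(f"step {index}: '{operation}' is not an approved operation")
        elif operation not in granted:
            violations.append(f"step {index}: role '{role}' may not call '{operation}'")
    return violations
```

Because the check only reads the parsed DSL, it can run in CI on every pull request as well as at execution time.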
Determinism Through Constraint
Direct code generation is a mess:
# Generation 1
def process_order(order):
    if order.total > 100:
        discount = order.total * 0.1
        return order.total - discount
    return order.total

# Generation 2: Same requirements, different code
def process_order(order_data):
    base_amount = order_data["total"]
    discount_rate = 0.1 if base_amount >= 100 else 0.0
    final_amount = base_amount * (1 - discount_rate)
    return final_amount
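Both run. But they use different variable names, different structures, and different input types (an object versus a dict). They even disagree at the boundary: the first applies the discount only above 100, the second at exactly 100 as well. Code reviews are tedious. Testing is inconsistent. Git diffs are noisy.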
DSLs eliminate this:
# Generation 1
discount_rule:
  threshold: 100
  rate: 0.10

# Generation 2: Identical
discount_rule:
  threshold: 100
  rate: 0.10
Limited vocabulary. Enforced structure. Single canonical representation.
This means:
- Predictable code review: Clear, scannable diffs
- Consistent testing: Same test works every time
- Unified debugging: Same error format always
- Meaningful version control: Each commit is a semantic change
When AI output is predictable, you can build automation that relies on it—bulk testing, static analysis, automated migrations, documentation generation.
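One way automation can rely on a canonical representation is to fingerprint each rule after parsing. The sketch below assumes rules have already been parsed into dicts; `canonical_form` and `fingerprint` are illustrative names, not part of any existing tool.

```python
import hashlib
import json


def canonical_form(rule: dict) -> str:
    """Serialize with sorted keys and fixed separators so key order and whitespace never matter."""
    return json.dumps(rule, sort_keys=True, separators=(",", ":"))


def fingerprint(rule: dict) -> str:
    """Stable hash of the canonical form, usable as a cache key or change detector."""
    return hashlib.sha256(canonical_form(rule).encode("utf-8")).hexdigest()


generation_1 = {"discount_rule": {"threshold": 100, "rate": 0.10}}
generation_2 = {"discount_rule": {"rate": 0.10, "threshold": 100}}  # keys in a different order
```

Two generations that differ only in key order produce the same fingerprint, so a changed fingerprint signals a semantic change rather than formatting noise.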
Built-In Governance
The DSL is your governance layer:
- Version control: DSL files tracked in git
- Code review: Even AI-generated changes get PR’d
- Schema validation: DSL structure validated before execution
- Access control: Permissions checked, unauthorized operations gated
- Audit trail: Every execution logged with full context
- Rollback: Revert the DSL file
- Testing: DSL expressions can be unit tested
Progressive Complexity
Start simple, expand as needed:
Step 1: Basic transforms
transform:
  input: customer_data
  operations:
    - map:
        name: "${first_name} ${last_name}"
        email: "${email.toLowerCase()}"
    - filter: "age >= 18"
  output: validated_customers
Step 2: Multi-step workflows
workflow:
  name: onboard_customer
  steps:
    - validate: customer_schema
    - call: create_account_service
    - send: welcome_email_template
    - log: audit.customer_created
Step 3: Complex rules
decision_tree:
  name: loan_approval
  rules:
    - if: "credit_score >= 700 && debt_ratio < 0.3"
      then: approve
    - if: "credit_score >= 650 && income > 75000"
      then: manual_review
    - default: reject
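Rule conditions like these are strings, so the runtime has to evaluate them without handing the AI a general-purpose interpreter. The following is a sketch of one possible approach, not the article's actual runtime: parse the condition with Python's `ast` module and accept only comparisons, `&&`/`||`, field names, and numbers. The function names and the `facts` dict are assumptions made for illustration.

```python
import ast
import operator

# Only these comparison operators are allowed in rule conditions.
_COMPARATORS = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def _evaluate(node: ast.AST, facts: dict):
    if isinstance(node, ast.BoolOp):
        # Operands are evaluated eagerly; conditions have no side effects.
        values = [_evaluate(value, facts) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        compare = _COMPARATORS.get(type(node.ops[0]))
        if compare is None:
            raise ValueError("comparison operator not allowed")
        return compare(_evaluate(node.left, facts), _evaluate(node.comparators[0], facts))
    if isinstance(node, ast.Name):
        return facts[node.id]  # KeyError for unknown fields
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    # Calls, attribute access, subscripts, etc. are all rejected here.
    raise ValueError(f"expression element not allowed: {type(node).__name__}")


def evaluate_condition(condition: str, facts: dict) -> bool:
    # Assumption: the DSL spells boolean operators as && and ||.
    python_like = condition.replace("&&", " and ").replace("||", " or ")
    tree = ast.parse(python_like, mode="eval")
    return bool(_evaluate(tree.body, facts))


def decide(rules: list[dict], facts: dict) -> str:
    """Return the outcome of the first matching rule, or the default."""
    for rule in rules:
        if "default" in rule:
            return rule["default"]
        if evaluate_condition(rule["if"], facts):
            return rule["then"]
    raise ValueError("no rule matched and no default was given")
```

Anything outside the allowlist, such as a function call or attribute access, raises `ValueError` instead of executing, which is the capability boundary in miniature.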
Evolving the Core
The DSL pattern doesn’t freeze your core. Business requirements change. New integrations happen. Core components need updates.
The difference is how carefully you handle these changes.
AI Can Help, But With Guardrails
| Aspect | DSL Layer | Core Layer |
|---|---|---|
| AI role | Generates complete implementation | Assists human developer |
| Review | 1-2 reviewers, standard process | 2-3+ senior engineers, stringent |
| Testing | Schema + access validation + integration | Unit + integration + security + perf |
| Deployment | Hours | Days/weeks with staged rollout |
| Approval | Team lead | Architect + security sign-off |
Example: Adding SMS Support
DSL team needs SMS:
workflow:
  steps:
    - send_notification:
        type: sms  # Not supported yet!
        recipient: "${customer.phone}"
        message: "Your order is ready"
Engineering ticket:
Title: Add SMS notification to core
Requirements:
- Twilio integration
- Rate limiting (100/min per customer)
- PII compliance (phone encryption)
- Delivery tracking
- Error handling
AI Assistance: Medium
Review: Stringent
Development with AI:
# Defers annotation evaluation, so NotificationResult need not be defined yet
from __future__ import annotations

# Developer writes interface (human-led)
class NotificationService:
    def send_sms(
        self,
        recipient: str,
        message: str,
        idempotency_key: str
    ) -> NotificationResult:
        """
        Send SMS with PII protection and rate limiting.

        Security: Phone numbers encrypted at rest
        Rate limit: 100/min per customer via Redis
        Idempotency: Prevents duplicate sends
        """
        pass  # AI helps implement

# AI generates implementation
# Developer reviews line-by-line:
#   - PII fields encrypted?
#   - Rate limiting correct?
#   - Errors handled safely?
#   - Logs sanitized?
# Multiple iterations with AI
# Human adds edge cases AI missed
# Security team reviews
Review checklist:
- Security: No SQL injection, XSS, auth bypass, credential leaks
- PII compliance: Encrypted, logged safely, retained correctly
- Error handling: All exceptions caught, no data loss
- Performance: Load tested, no N+1 queries
- Backward compatibility: Doesn’t break existing DSL
- Documentation: API docs, security notes, examples
- Test coverage: >90%, security tests included
- Deployment plan: Staged rollout, monitoring, rollback
- On-call runbook: Debugging steps, incident response
After approval, extend DSL:
# dsl/operations.yaml
notification_operations:
  send_sms:
    maps_to: NotificationService.send_sms
    parameters:
      recipient: {type: phone_number, required: true}
      message: {type: string, max_length: 160}
    rate_limit: 100_per_minute
    security_level: pii_handling
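An operation spec like this is only useful if the runtime enforces it before calling into the core. The sketch below shows roughly what that enforcement could look like, assuming the YAML has already been parsed into a dict. The E.164 phone format is an assumption (the article does not say which format `phone_number` means), and `validate_params` is a hypothetical helper.

```python
import re

# Mirrors the send_sms parameters as a plain dict (assumes the YAML is already parsed).
SEND_SMS_SPEC = {
    "recipient": {"type": "phone_number", "required": True},
    "message": {"type": "string", "max_length": 160},
}

# Assumption: phone numbers use E.164 ("+" followed by up to 15 digits).
_E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def validate_params(spec: dict, params: dict) -> list[str]:
    """Check workflow-supplied params against an operation spec; return error strings."""
    errors = []
    for name in params:
        if name not in spec:
            errors.append(f"{name}: unknown parameter")
    for name, rules in spec.items():
        if name not in params:
            if rules.get("required"):
                errors.append(f"{name}: required")
            continue
        value = params[name]
        if not isinstance(value, str):
            errors.append(f"{name}: expected a string")
        elif rules["type"] == "phone_number" and not _E164.match(value):
            errors.append(f"{name}: not a valid phone number")
        elif "max_length" in rules and len(value) > rules["max_length"]:
            errors.append(f"{name}: longer than {rules['max_length']} characters")
    return errors
```

Rejecting unknown parameters matters as much as checking known ones: it keeps AI-generated workflows from smuggling in options the core never agreed to support.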
Now AI can use it:
workflow:
  name: order_ready_notification
  steps:
    - send_notification:
        type: sms
        recipient: "${customer.phone}"
        message: "Order #${order.id} ready for pickup!"
The Two-Track Model
Fast Track (DSL Layer):
- AI generates complete workflows
- Quick review (hours to days)
- Deploy frequently (multiple times daily)
- Low risk (sandboxed)
- Team lead approval
Careful Track (Core Layer):
- AI assists humans
- Thorough review (days to weeks)
- Deploy infrequently (weekly to monthly)
- High risk (full system access)
- Architecture team approval
This maximizes velocity where risk is lower (DSL layer) while enforcing rigor where it’s critical (core layer).
When to Update Core vs DSL
Extend DSL:
- Requirements fit existing primitives
- New combination of existing operations
- Customer-specific logic
- Experimental feature
Update Core:
- Completely new capability (new API, data source)
- Performance optimization at runtime
- Security requirements changed
- Bug fix in existing component
The aim is for the DSL to keep roughly 90% of changes on the fast track, with only about 10% needing the careful track.
Real-World Example: Payment Workflow
Critical Core (human-written, audited):
# payments/core.py
# Defers annotation evaluation, so PaymentResult and RefundResult can be defined elsewhere
from __future__ import annotations

from decimal import Decimal


class PaymentService:
    """Secure payment processing - human written"""

    def charge_card(self, customer_id: str, amount: Decimal) -> PaymentResult:
        # Extensive validation
        # PCI-compliant token handling
        # Idempotency checks
        # Fraud detection
        # Comprehensive logging
        ...

    def refund(self, transaction_id: str, amount: Decimal) -> RefundResult:
        ...
DSL Operations:
# dsl/operations.yaml
payment_operations:
  charge:
    maps_to: PaymentService.charge_card
    parameters:
      customer_id: {type: string, required: true}
      amount: {type: decimal, min: 0.01, max: 10000}
  refund:
    maps_to: PaymentService.refund
    parameters:
      transaction_id: {type: string, required: true}
      amount: {type: decimal, min: 0.01}
AI-Generated Workflow:
# workflows/subscription_renewal.yaml
# Generated by AI: "Create subscription renewal with retry logic"
workflow:
  name: subscription_renewal
  steps:
    - name: fetch_subscription
      operation: database.query
      params:
        table: subscriptions
        filter: "next_billing_date <= today()"
    - name: charge_customer
      operation: payment.charge
      params:
        customer_id: "${fetch_subscription.customer_id}"
        amount: "${fetch_subscription.amount}"
      retry:
        max_attempts: 3
        backoff: exponential
      on_error:
        goto: handle_failure
    - name: update_subscription
      operation: database.update
      params:
        table: subscriptions
        id: "${fetch_subscription.id}"
        fields:
          last_charge: "${charge_customer.transaction_id}"
          next_billing_date: "addMonths(today(), 1)"
    - name: send_receipt
      operation: notification.email
      params:
        template: receipt
        recipient: "${fetch_subscription.email}"
        data: "${charge_customer}"
    - name: handle_failure
      condition: "charge_customer.failed"
      operation: notification.email
      params:
        template: payment_failed
        recipient: "${fetch_subscription.email}"
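The `retry` block in `charge_customer` is declarative; the runtime decides what "exponential" means. Here is a sketch of how a runtime might interpret it, under the assumption that the delay doubles after each failed attempt. `run_with_retry` and `base_delay` are hypothetical names, and the `failed` flag in the result is what a later step's `condition` could inspect.

```python
import time
from typing import Callable


def run_with_retry(
    operation: Callable[[], dict],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `operation` until it succeeds or attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Returns {"failed": False, ...result} or {"failed": True, "error": message}.
    """
    last_error = ""
    for attempt in range(max_attempts):
        try:
            result = operation()
            return {"failed": False, **result}
        except Exception as exc:  # the sketch treats every exception as retryable
            last_error = str(exc)
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return {"failed": True, "error": last_error}


# Usage: a charge that fails twice, then succeeds. Delays are recorded instead of slept.
attempts = []
delays = []


def flaky_charge() -> dict:
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("gateway timeout")
    return {"transaction_id": "txn_demo"}


outcome = run_with_retry(flaky_charge, max_attempts=3, sleep=delays.append)
```

Retrying a charge is only safe because the core performs idempotency checks; the DSL author gets retries without having to reason about double billing.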
Implementation Patterns
Start with YAML/JSON
Declarative configs are easy to parse, validate, and version control:
# Simple and scannable
workflow:
  name: process_order
  steps:
    - validate: order_schema
    - call: payment_service
    - send: confirmation_email
- Pros: Easy to parse, validate, version control
- Cons: Limited expressiveness for complex logic
- Best for: Workflows, data transforms, simple rules
Embedded DSLs for Complexity
For sophisticated logic, use embedded DSLs in safe languages:
-- Lua embedded in runtime (sandboxed)
function calculate_discount(customer)
  if customer.orders > 10 then
    return 0.15
  elseif customer.lifetime_value > 1000 then
    return 0.10
  else
    return 0.05
  end
end
- Pros: More expressive, handles complex logic
- Cons: Requires sandboxing, more attack surface
- Best for: Complex calculations, business rules, pricing
Schema Validation
Define strict schemas:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "workflow": {
      "type": "object",
      "properties": {
        "name": {"type": "string", "pattern": "^[a-z_]+$"},
        "steps": {"type": "array"}
      },
      "required": ["name", "steps"]
    }
  },
  "required": ["workflow"]
}
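In production you would normally hand a schema like this to a JSON Schema validator library. To show what the checks amount to without any dependency, here is a hand-rolled stand-in that uses only the standard library. It treats a missing `workflow` key as an error, and `validate_workflow_document` is a hypothetical name.

```python
import re

_NAME_PATTERN = re.compile(r"[a-z_]+")


def validate_workflow_document(document: object) -> list[str]:
    """Check the same constraints by hand; return a list of problems (empty means valid)."""
    if not isinstance(document, dict):
        return ["document must be an object"]
    workflow = document.get("workflow")
    if not isinstance(workflow, dict):
        return ["workflow: must be an object"]
    problems = []
    name = workflow.get("name")
    # fullmatch anchors the pattern at both ends, like ^...$ in the schema.
    if not isinstance(name, str) or not _NAME_PATTERN.fullmatch(name):
        problems.append("workflow.name: must match ^[a-z_]+$")
    if not isinstance(workflow.get("steps"), list):
        problems.append("workflow.steps: must be an array")
    return problems
```

Returning every problem at once, rather than stopping at the first, gives the AI (or the reviewer) a complete list to fix in one iteration.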
Testing Infrastructure
Make DSL testing trivial:
# dsl_test_framework.py
def run_dsl_workflow(dsl_file, mock_services):
    # Not prefixed with test_, so runners such as pytest do not collect this helper as a test
    runtime = DSLRuntime(services=mock_services)
    result = runtime.execute(load_dsl(dsl_file))
    return result

# Example test
def test_subscription_renewal():
    mocks = {
        'database': MockDatabase(),
        'payment': MockPaymentService(),
        'notification': MockNotificationService()
    }
    result = run_dsl_workflow('subscription_renewal.yaml', mocks)
    assert result.success
    assert mocks['payment'].charge_called
    assert mocks['notification'].receipt_sent
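The test above relies on mock services that record what the workflow asked for. The article does not show them, so the following is only a guess at the shape of one: a `MockPaymentService` whose `charge` method signature and return value are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MockPaymentService:
    """Records charges instead of contacting a payment provider."""

    charges: list = field(default_factory=list)

    def charge(self, customer_id: str, amount: str) -> dict:
        # Store the request so tests can assert on exactly what was charged.
        self.charges.append({"customer_id": customer_id, "amount": amount})
        return {"transaction_id": f"mock_txn_{len(self.charges)}", "failed": False}

    @property
    def charge_called(self) -> bool:
        return bool(self.charges)
```

Keeping the recorded calls, not just a boolean, lets a test also assert on the amount and customer that the workflow passed through.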
Security Considerations
Sandbox the Runtime
- Resource limits: CPU, memory, execution time
- Network isolation: Control external service calls
- Data access: Enforce row-level security, field masking
- Capabilities: Explicitly grant permissions per DSL file
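Of the limits above, execution time and step count are the simplest to enforce inside the interpreter itself. The sketch below shows a cooperative budget that the runtime would check before each step; `ExecutionBudget` and `BudgetExceeded` are hypothetical names. It does not cap CPU or memory, which need OS-level or container-level limits.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a workflow run uses more steps or time than it was granted."""


class ExecutionBudget:
    """Caps the number of steps and the wall-clock time of one workflow run."""

    def __init__(self, max_steps: int, max_seconds: float, clock=time.monotonic):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self._clock = clock
        self._started = clock()
        self.steps_used = 0

    def charge_step(self) -> None:
        """Call once before each step; raises when either limit is exceeded."""
        self.steps_used += 1
        if self.steps_used > self.max_steps:
            raise BudgetExceeded(f"more than {self.max_steps} steps")
        if self._clock() - self._started > self.max_seconds:
            raise BudgetExceeded(f"ran longer than {self.max_seconds}s")


# Usage: a budget of two steps rejects the third.
budget = ExecutionBudget(max_steps=2, max_seconds=60)
budget.charge_step()
budget.charge_step()
try:
    budget.charge_step()
    third_step_allowed = True
except BudgetExceeded:
    third_step_allowed = False
```

A step cap also guards against `goto` loops in generated workflows, which a time limit alone would only catch late.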
Audit Everything
audit_log = {
    'timestamp': '2026-01-06T10:30:00Z',
    'dsl_file': 'subscription_renewal.yaml',
    'dsl_version': 'abc123',
    'triggered_by': 'scheduled_job',
    'operations_executed': [
        'database.query',
        'payment.charge',
        'database.update',
        'notification.email'
    ],
    'duration_ms': 245,
    'success': True
}
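A record like this is easiest to trust when the runtime produces it, not the workflow. Here is a sketch of a wrapper that builds the same fields while executing steps, assuming each step is an `(operation_name, callable)` pair; `run_with_audit` is a hypothetical name.

```python
import time
from datetime import datetime, timezone


def run_with_audit(dsl_file: str, dsl_version: str, triggered_by: str, steps) -> dict:
    """Execute (operation_name, callable) pairs and return an audit record."""
    started = time.monotonic()
    executed = []
    success = True
    for operation_name, call in steps:
        executed.append(operation_name)  # logged before the call, so failures are attributed
        try:
            call()
        except Exception:
            success = False
            break
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "dsl_file": dsl_file,
        "dsl_version": dsl_version,
        "triggered_by": triggered_by,
        "operations_executed": executed,
        "duration_ms": int((time.monotonic() - started) * 1000),
        "success": success,
    }


# Usage: the second step fails, so the third never runs.
def failing_charge():
    raise RuntimeError("card declined")


record = run_with_audit(
    "subscription_renewal.yaml",
    "abc123",
    "scheduled_job",
    [
        ("database.query", lambda: None),
        ("payment.charge", failing_charge),
        ("database.update", lambda: None),
    ],
)
```

The record is produced on failure as well as success, so the audit trail shows exactly which operation a run stopped at.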
Review Process
- AI generates DSL
- Automated schema validation (structure, syntax)
- Automated access validation (permissions, authorized operations)
- Automated tests run against mocks
- Human review approves
- Staged rollout: dev → staging → prod
Getting Started
- Pick one workflow: Start with a single, non-critical process
- Extract the core: Pull critical components into services
- Design minimal DSL: 3-5 operations covering 80% of use case
- Have AI generate it: Use your preferred model
- Test rigorously: Validate, test, review
- Iterate: Learn and expand
The Bottom Line
AI code generation is too valuable to avoid, too risky to use directly.
DSLs provide the safety partition:
- AI moves fast on business logic
- Humans maintain security and infrastructure
- The DSL enforces boundaries and governance
- Everyone wins: productivity, security, velocity
The architecture emerged from real experience integrating AI into enterprise systems. The key insight: don’t fight AI’s tendency to generate code—just make sure it’s generating the right kind of code.