Why do AI agents need safety standards?
What happens when an AI agent runs without boundaries?
AI agents are fundamentally different from traditional software. A web server handles requests within defined parameters. An AI agent decides what to do next — and it does so at machine speed, continuously, across multiple systems simultaneously.
Without explicit boundaries, a single agent can exhaust an API budget in minutes. A $50 cost limit becomes a $2,000 bill. A draft email becomes a sent email. A staging deploy becomes a production deploy. The credential risk is just as concrete: 83% of data breaches in 2025 involved compromised credentials (IBM Cost of a Data Breach Report), and AI agents routinely handle credentials to call external services.
The failure modes compound. An agent that can read files and call APIs can accidentally exfiltrate data. An agent that can write code can introduce vulnerabilities. An agent that can send messages can damage client relationships. Speed amplifies every mistake. What a human does in a day, an agent does in seconds — including the mistakes.
What regulations require AI agent safety controls?
The regulatory landscape for AI agents is crystallising rapidly. The EU AI Act, effective August 2, 2026, mandates human oversight and shutdown capabilities for high-risk AI systems. Article 14 requires that AI systems "can be effectively overseen by natural persons" with the ability to "interrupt, pause or stop the system."
The Colorado AI Act (June 2026) requires impact assessments and transparency for high-risk AI decisions. California's Transparent AI Disclosure Act, the Texas Responsible AI Governance Act, and Illinois HB 3773 all reference "kill switch" or "human override" requirements. At least 14 US states had active AI governance legislation as of January 2026.
Beyond AI-specific laws, existing frameworks apply directly: GDPR requires encryption of personal data — relevant when agents process user information. SOC 2 Type II requires encryption controls — relevant when agents handle credentials. ISO 27001 requires information security management — relevant to every agent that touches a database.
How does the AI Agent Safety Stack prevent incidents?
The Stack applies a principle that's worked in every other engineering discipline: separation of concerns. One file per concern. Each specification is independent — use one or all twelve. They complement each other but don't require each other.
The architecture is defence-in-depth. THROTTLE.md slows the agent down before it hits hard limits. ESCALATE.md requires human approval for high-risk actions. FAILSAFE.md defines safe fallback states. KILLSWITCH.md provides emergency stop. TERMINATE.md handles permanent shutdown when recovery isn't possible. Each layer catches what the previous layer missed.
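As a minimal sketch of that layered ordering, the snippet below models three of the five layers (KILLSWITCH.md, THROTTLE.md, ESCALATE.md) as check functions evaluated in sequence, where the first layer to object stops the action. The check logic and thresholds are hypothetical stand-ins — the real rules live in the spec files themselves.

```python
# Hypothetical defence-in-depth sketch: each layer is a check function,
# and the first non-"allow" verdict wins. Thresholds are illustrative only.

def throttle(action):
    """THROTTLE.md layer: slow the agent down before hard limits are hit."""
    return "delay" if action["cost"] > 10 else "allow"

def escalate(action):
    """ESCALATE.md layer: require human approval for high-risk actions."""
    return "await_approval" if action["risk"] == "high" else "allow"

def killswitch(action, emergency_stop=False):
    """KILLSWITCH.md layer: an emergency stop overrides everything else."""
    return "halt" if emergency_stop else "allow"

def evaluate(action, emergency_stop=False):
    # The killswitch is consulted first so an emergency stop always wins;
    # the remaining layers catch what the previous layer missed.
    layers = (lambda a: killswitch(a, emergency_stop), throttle, escalate)
    for layer in layers:
        verdict = layer(action)
        if verdict != "allow":
            return verdict
    return "allow"
```

For example, `evaluate({"cost": 50, "risk": "low"})` returns `"delay"` because the throttle layer objects before escalation is ever consulted.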
Critically, these specifications are version-controlled, auditable, and co-located with your code. When a regulator asks "what safety controls does your AI agent have?" — you point to the files in your repo. When an auditor asks for evidence of human oversight — you show the git history. One file serves four audiences: the agent (reads it on startup), the engineer (reads it during code review), the compliance team (reads it during audits), and the regulator (reads it if something goes wrong).
| Capability | Safety Stack | Ad-hoc policies | No policy |
|---|---|---|---|
| Version controlled | Yes | Sometimes | No |
| Auditable by regulators | Yes | Partially | No |
| Machine-readable | Yes | No | No |
| Co-located with code | Yes | Rarely | No |
| Standardised format | Yes | No | No |
| EU AI Act compatible | Yes | Depends | No |
What did teams use before these specifications?
Before the AI Agent Safety Stack, safety rules lived in three places — all of them wrong. Hardcoded in the system prompt: invisible to auditors, lost when the prompt changes, and impossible to version-control independently. Buried in config files: scattered across environment variables, YAML configs, and framework-specific settings that no compliance team would ever find. Missing entirely: the most common case, where safety boundaries simply didn't exist.
Some teams documented safety rules in Notion pages, Confluence wikis, or Google Docs. The problem: documentation that isn't co-located with code drifts. The wiki says the spend limit is $100. The actual limit in the code is $500. No one noticed because no one reads the wiki during code review.
Plain-text Markdown in the repository root solves every one of these problems. It's version-controlled (git tracks every change). It's auditable (diff the file to see what changed and when). It's human-readable (any stakeholder can open it). It's machine-readable (the agent parses it on startup). And it's impossible to ignore — it's right there in the project root, next to README.md, visible in every file listing.
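A sketch of what "the agent parses it on startup" can look like in practice: read KILLSWITCH.md from the project root, and fail closed if it's missing. The internal format assumed here (simple `Key: value` lines inside the Markdown) is an illustration only — this page doesn't define the specification's actual schema.

```python
# Hypothetical startup check: load limits from a root-level KILLSWITCH.md.
# The "Key: value" line convention is an assumption for illustration.
from pathlib import Path

def load_limits(repo_root="."):
    """Parse 'Key: value' lines from KILLSWITCH.md, skipping headings."""
    spec = Path(repo_root) / "KILLSWITCH.md"
    if not spec.exists():
        # Fail closed: a missing spec means the agent should not act.
        return {"enabled": False}
    limits = {}
    for line in spec.read_text().splitlines():
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            limits[key.strip().lower()] = value.strip()
    limits["enabled"] = True
    return limits
```

Because the file sits next to README.md, the same parse-on-startup path that gates the agent is also the artefact a reviewer diffs during code review.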
Last Updated: 13 March 2026
The AI Agent Safety Stack
Who builds this?
The AI Agent Safety Stack is maintained as a collection of open-source projects under the MIT licence. Each specification has its own domain, GitHub repository, and community.
The stack was created to address a gap in the AI agent ecosystem: safety rules that are version-controlled, auditable, machine-readable, and co-located with your code. Not buried in wikis. Not hardcoded in prompts. Not missing entirely.
Frequently asked questions
Start with one file. Add more as you grow.
Begin with KILLSWITCH.md for emergency stop boundaries. Add THROTTLE.md for cost control. Add ENCRYPT.md for data protection. The stack grows with your agent.
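An illustrative sketch of what a first KILLSWITCH.md might contain — the headings, conditions, and dollar figure below are invented for illustration, not taken from the specification itself:

```markdown
# KILLSWITCH.md

## Stop conditions
- Spend exceeds $50 in any 24-hour window
- Any write to a production system without human approval

## Stop behaviour
- Halt all in-flight actions immediately
- Notify the on-call engineer
- Require human sign-off before restart
```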
GET STARTED ON GITHUB
Or email directly: [email protected]