# Five Guardrails for Using AI Agents Safely in Engineering Workflows

*By Alex Chen — Published on doany.ai, April 2026*

Last quarter, an agent on one of our test pipelines hardcoded a live API key into a fixture file. The CI passed. The PR looked clean. A human caught it during review only because the key prefix looked familiar. That's not a process. That's luck.

AI agents are already embedded in how engineering teams work. GitHub reported that [Copilot was generating 46% of developers' code](https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/) as of 2023, and the tooling has only gotten more autonomous since — codegen agents, CI bots, PR reviewers that operate across entire repos. The question isn't whether your team is using agents. It's whether you have guardrails around them.

This isn't a fear piece. Agents make teams faster. But "just let it run" isn't a strategy. Here are five guardrails that actually work, drawn from our experience and conversations with a dozen engineering teams.

## 1. Sandbox everything

Agents should never run on bare metal or talk to production infrastructure. This sounds obvious, but most setups I've seen give agents the same access as the developer who triggered them.

Run agents in ephemeral containers with explicit resource limits. Mount filesystems read-only where possible. Block network access to production databases by default.

```yaml
# Example: restrict an agent's container at the orchestrator level
agent_runtime:
  isolation: container
  network_policy: deny-all
  allow_list:
    - registry.internal
    - api.github.com
  filesystem:
    /src: read-only
    /tmp/workspace: read-write
  ttl: 3600  # self-destructs after 1 hour
```

The principle is the same one behind ephemeral dev environments: if an agent's session can't outlive its task, the blast radius shrinks dramatically. Stripe [has talked publicly](https://stripe.com/blog/online-migrations) about running workloads in ephemeral environments that self-destruct — the same idea applies to agent execution.

## 2. Add human-in-the-loop checkpoints (without killing velocity)

Not every agent action needs a human tap on the shoulder; gating everything defeats the point. But destructive actions — deployments, deletions, merges to main — must require explicit approval.

A tiered permission model works well:

| Action type | Agent permission | Example |
|---|---|---|
| Read | Auto-approve | Fetching logs, reading source files |
| Write | Review required | Opening PRs, editing code |
| Deploy/Delete | Explicit approval | Merging to main, modifying infra |

This mirrors how we already think about IAM roles. The agent gets least-privilege access, and the human stays in the loop where the cost of error is highest. GitLab's [2024 AI in DevSecOps report](https://about.gitlab.com/developer-survey/) found that teams with structured approval workflows for AI-generated changes reported fewer production incidents than teams with unrestricted agent access.
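
Expressed as configuration, the tiers can live in the same kind of policy file as the sandbox example above. This is a sketch for a hypothetical orchestrator; the action names and approval keywords are illustrative, not any particular product's schema:

```yaml
# Hypothetical approval policy (illustrative schema, not a real product's)
approval_policy:
  read:
    actions: [read_file, fetch_logs, run_tests]
    approval: auto              # no side effects outside the sandbox
  write:
    actions: [edit_file, open_pr]
    approval: review_required   # changes land as a PR and wait for a human reviewer
  deploy_delete:
    actions: [merge_to_main, apply_infra_change, delete_resource]
    approval: explicit_human    # the agent blocks until a named person approves
```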

## 3. Validate outputs before they land

Agent-generated code should go through the same static analysis and secret scanning as human-written code — but automatically, before it ever reaches a reviewer.

Wire tools like [gitleaks](https://github.com/gitleaks/gitleaks) or [TruffleHog](https://github.com/trufflesecurity/trufflehog) into the CI jobs that gate agent-generated PRs. Run your existing linters and SAST tools as well. This isn't new infrastructure — it's applying what you already have to a new source of input.
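
If you're on GitHub Actions, the gate can be a single job. Here's a minimal sketch using the official gitleaks action; scope the trigger to however your agents open PRs (and note that a `GITLEAKS_LICENSE` secret is only needed for organization accounts):

```yaml
# .github/workflows/secret-scan.yml
name: secret-scan
on: [pull_request]

jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, so gitleaks can scan every commit in the PR
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

A failing scan blocks the merge the same way a failing test does, which is the point: agent output gets no special trust.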

A 2023 Stanford study ([*Do Users Write More Insecure Code with AI Assistants?*](https://arxiv.org/abs/2211.03622), Perry et al.) found that participants using AI assistants produced less secure code and were more likely to believe it was secure. The failure mode isn't that agents write malicious code. It's that humans trust the output and skip scrutiny.

## 4. Build audit trails from day one

Every agent action should be logged: who triggered it, what tools it called, what changes it made, and what the outcome was. This isn't optional overhead — it's a compliance requirement that's coming for every team.

If you're SOC 2 certified or pursuing it, your auditors will ask about AI agent activity. The [NIST AI Risk Management Framework](https://www.nist.gov/artificial-intelligence/executive-order-safe-secure-and-trustworthy-artificial-intelligence) (AI RMF 1.0) explicitly calls for traceability and accountability in AI system operations. Get ahead of this now.

Practically, this means structured logs for every agent session:

```json
{
  "session_id": "agent-a1b2c3",
  "triggered_by": "alex.chen",
  "trigger": "pr_comment",
  "actions": [
    {"tool": "file_edit", "target": "src/auth.ts", "status": "completed"},
    {"tool": "run_tests", "target": "suite:auth", "status": "passed"}
  ],
  "timestamp": "2026-04-15T10:32:00Z"
}
```

## 5. Defend against prompt injection

Agents that ingest external content — issue descriptions, PR comments, documentation — have a real attack surface. A crafted payload in a GitHub issue can hijack an agent's behavior if the architecture doesn't account for it.

Simon Willison has [documented this class of vulnerability extensively](https://simonwillison.net/2023/Apr/14/worst-that-can-happen/): when an LLM treats untrusted user input as instructions, the results range from data exfiltration to unauthorized code changes. The [OWASP Top 10 for LLM Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/) lists prompt injection as the #1 risk for a reason.

Input sanitization helps but isn't sufficient. The stronger pattern is architectural: separate the agent's instruction context from external data, and never let untrusted content flow into a privileged execution path.
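
What this looks like concretely depends on your orchestrator, but the policy you want has roughly the following shape. The schema below is hypothetical (these keys don't belong to any real product); the point is the structure: external content enters as delimited data, and nothing in the data channel can trigger a privileged action on its own.

```yaml
# Hypothetical context policy (illustrative schema only)
context_policy:
  instruction_sources:            # the only inputs treated as instructions
    - system_prompt
    - repo_file: .agent/policy.md
  data_sources:                   # ingested as untrusted, clearly delimited data
    - issue_body
    - pr_comments
    - fetched_web_content
  data_channel_rules:
    may_invoke_tools: false       # content here can never call a tool directly
    privileged_actions: deny      # deploys, merges, deletions never originate here
    instruction_like_content: flag_for_human_review
```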

## Getting started without slowing down

You don't need all five on day one. Start with two:

1. **Sandbox your agent runtime.** Containers, network restrictions, ephemeral sessions. This takes a day to set up and immediately caps your downside.
2. **Turn on secret scanning in CI.** If you're not already running gitleaks or equivalent, you're one agent hallucination away from a leaked credential.

Then layer in the rest. Human-in-the-loop checkpoints where the stakes are highest. Audit logging for compliance. Prompt injection defenses as your agents start consuming external content.

AI agents are genuinely good at making engineering teams faster. The teams that get the most out of them won't be the ones who move fastest — they'll be the ones who don't have to roll something back at 2 AM because an agent went off-script.

---

*Alex is a staff engineer at doany.ai focused on developer experience and infrastructure.*