# Create queue drain runbook

## Properties (Decision Log database)

| Property      | Value |
|---------------|-------|
| Title         | Create queue drain runbook |
| Decision Date | 2026-04-12 |
| Status        | Decided |
| Category      | Process |
| Severity      | SEV-2 |
| Owner         | Carlos Mendez |
| Tags          | runbook, ops, queue-drain, warehouse-routing |
| Due Date      | 2026-04-16 |
| Incident ID   | INC-2026-0412-001 |
| Related Pages | → INC-2026-0412-001 — Warehouse routing outage postmortem |
|               | → Decision: Add prod-scale load profiles to staging |
|               | → Decision: Add memory profiling to CI for route-optimizer |
|               | → Decision: Decouple carrier-api from route-optimizer releases |

---

## Page Body

## Summary

Document the manual queue drain procedure used during the 2026-04-12 outage so any ops engineer can execute it during future incidents without relying on tribal knowledge.

## Context

During the outage, Carlos Mendez manually drained the SFO order queue via the admin tool for ~45 minutes while the hotfix was being developed. This procedure is currently undocumented: had Carlos been unavailable, the ops team could not have unblocked order flow, and the backlog of 340+ stuck orders would have kept growing.

The runbook itself should be created as a separate page in the **Engineering Docs** database (Doc Type: Runbook) once written. This Decision Log entry tracks the decision; the deliverable lives in Engineering Docs.

## Timeline

- 2026-04-12 08:31 — Sofia asks Carlos to manually drain SFO queue via admin tool
- 2026-04-12 08:32–09:15 — Carlos manually drains queue (~45 min of hands-on work)
- 2026-04-12 09:47 — Retro decision: document the procedure as a runbook

## Impact

Without the manual drain, all ~340 orders would have remained stuck for the full 92-minute incident, plus the time needed to process the queue afterward. The drain procedure is a critical incident-response capability that only one person currently knows how to perform.

## Root Cause

No documented procedure exists for manual queue draining; the steps are tribal knowledge held by a single team member.

## Decision

Create a step-by-step runbook for the manual queue drain procedure covering all warehouse hubs (SFO, PDX, etc.). Publish in the Engineering Docs database. Have a second ops engineer validate the runbook by walking through it.

## Alternatives Considered

- **Fully automate the queue drain** — Desirable long-term but not feasible by the April 16 deadline. The runbook is the immediate deliverable; automation can follow as a separate initiative.
- **Just add it to the existing incident response wiki** — Rejected; the procedure is specific enough to warrant its own dedicated runbook with step-by-step instructions and hub-specific details.

## Action Items

- [ ] Write queue drain runbook with step-by-step instructions for all hubs — Owner: Carlos Mendez, Due: 2026-04-16
- [ ] Publish runbook in Engineering Docs database (Doc Type: Runbook, Team: Ops) — Owner: Carlos Mendez, Due: 2026-04-16
- [ ] Have a second ops engineer validate the runbook end-to-end — Owner: Carlos Mendez, Due: 2026-04-16

## Related

- Postmortem: INC-2026-0412-001 — Warehouse routing outage
- Future deliverable: Queue drain runbook (Engineering Docs → Runbook)
