# Decouple carrier-api from route-optimizer releases

## Properties (Decision Log database)

| Property      | Value |
|---------------|-------|
| Title         | Decouple carrier-api from route-optimizer releases |
| Decision Date | 2026-04-12 |
| Status        | Decided |
| Category      | Architecture |
| Severity      | SEV-2 |
| Owner         | Carlos Mendez |
| Stakeholders  | Derek Wu |
| Tags          | carrier-api, route-optimizer, release-process, decoupling |
| Due Date      | 2026-05-03 |
| Incident ID   | INC-2026-0412-001 |
| Source Link   | https://github.com/acme-logistics/route-optimizer/pull/4821 |
| Related Pages | → INC-2026-0412-001 — Warehouse routing outage postmortem |
|               | → INC-2026-0328 — Carrier API timeout cascade |
|               | → Decision: Add prod-scale load profiles to staging |
|               | → Decision: Add memory profiling to CI for route-optimizer |
|               | → Decision: Create queue drain runbook |

---

## Page Body

## Summary

Separate carrier-api auth and route-optimizer into independent release trains so that rolling back one service doesn't break the other. This requires an RFC to design the decoupling.

## Context

During the 2026-04-12 outage, the natural mitigation (a rollback from v2.14.0 to v2.13.2) was blocked because v2.14.0 also contained the carrier-api auth migration. Rolling back would have broken carrier handshakes across all hubs and compounded the outage. The team was forced onto a slower hotfix path: 92 min TTR, versus an estimated ~15 min for a rollback.

This is the **second time** carrier-api coupling has contributed to an incident: INC-2026-0328 (Carrier API timeout cascade) involved the same coupling. The pattern is clear: bundling unrelated changes into one deploy widens the blast radius of each change and makes rollbacks risky.

## Timeline

- 2026-04-12 08:27 — Rollback to v2.13.2 proposed
- 2026-04-12 08:28 — Rollback ruled out due to carrier-api auth coupling
- 2026-04-12 08:31 — Hotfix path chosen instead (slower, but doesn't break carrier auth)
- 2026-04-12 09:47 — Retro decision: decouple via RFC

## Impact

The blocked rollback added ~60 minutes to incident resolution. Without the coupling, a simple rollback could have resolved the outage by ~08:35.

## Root Cause

The carrier-api auth migration and the route-optimizer algorithm changes are bundled in the same release artifact and deploy pipeline. There is no way to roll back one without also reverting the other.

## Decision

Write an RFC to separate carrier-api auth and route-optimizer into independent release trains. Each service/module must be independently deployable and rollback-safe. The RFC should cover the migration path, shared state handling, and CI/CD pipeline changes.
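
One way the RFC could make "rollback-safe" checkable is to model each release artifact as the set of components it carries and ask what else a rollback would revert. The sketch below is illustrative only: the `Release` type, the `rollback_side_effects` helper, the component names, and the `route-optimizer-only` label are assumptions for the example, not part of the existing pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One deployable artifact and the components whose changes it carries (hypothetical model)."""
    version: str
    components: frozenset[str]


def rollback_side_effects(release: Release, target: str) -> set[str]:
    """Return the components, other than `target`, that rolling back `release` would also revert."""
    if target not in release.components:
        raise ValueError(f"{target!r} is not part of release {release.version}")
    return set(release.components - {target})


# Coupled artifact, matching the shape of v2.14.0 described in this page.
coupled = Release("v2.14.0", frozenset({"route-optimizer", "carrier-api-auth"}))
print(rollback_side_effects(coupled, "route-optimizer"))    # {'carrier-api-auth'}

# Decoupled artifact (hypothetical): the rollback touches nothing else.
decoupled = Release("route-optimizer-only", frozenset({"route-optimizer"}))
print(rollback_side_effects(decoupled, "route-optimizer"))  # set()
```

A release train would count as rollback-safe under this model when the result is empty for every component it ships. Shared state, such as auth data, is not captured here and would need separate treatment in the RFC.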

## Alternatives Considered

- **Feature flags on carrier-api auth changes**: Rejected as the primary solution; it adds code complexity and does not address the underlying deploy coupling. Useful as a supplementary technique, but not sufficient alone.
- **Better rollback testing for coupled releases**: Rejected; it does not address the root coupling risk, which has now caused issues in two separate incidents.
- **Monorepo with independent build targets**: Not ruled out; the RFC should evaluate it alongside other options.

## Action Items

- [ ] Draft RFC for decoupled release trains — Owners: Carlos Mendez + Derek Wu, Due: 2026-05-03
- [ ] Review RFC with platform team for infrastructure implications — Owner: Carlos Mendez, Due: 2026-05-03
- [ ] Reference INC-2026-0328 findings in the RFC — Owner: Derek Wu, Due: 2026-05-03

## Open Questions

- What's the migration path for shared auth state during decoupling?
- Do we need a versioned contract between carrier-api and route-optimizer?
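
If the RFC does opt for a versioned contract, one possible shape is a `major.minor` compatibility check run before either service deploys or rolls back. The sketch below assumes a semver-like rule (same major, provider minor at least the required minor). The `ContractVersion` type, `parse`, and `is_compatible` are hypothetical names, and the version numbers are made up for illustration.

```python
from typing import NamedTuple


class ContractVersion(NamedTuple):
    """Hypothetical major.minor version of the carrier-api <-> route-optimizer contract."""
    major: int
    minor: int


def parse(text: str) -> ContractVersion:
    """Parse a 'major.minor' string such as '2.3'."""
    major, minor = text.split(".")
    return ContractVersion(int(major), int(minor))


def is_compatible(provided: ContractVersion, required: ContractVersion) -> bool:
    """Assumed rule: majors must match, and the provider must offer at least the required minor."""
    return provided.major == required.major and provided.minor >= required.minor


# A provider on 2.3 satisfies a consumer that needs 2.1.
print(is_compatible(parse("2.3"), parse("2.1")))  # True
# Rolling the provider back across a major boundary breaks the consumer.
print(is_compatible(parse("1.9"), parse("2.0")))  # False
```

A check like this would turn "would this rollback break the other service?" into a question the pipeline can answer mechanically rather than during an incident.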

## Related

- Postmortem: INC-2026-0412-001 — Warehouse routing outage
- Prior incident: INC-2026-0328 — Carrier API timeout cascade (same carrier-api coupling issue)
- Hotfix PR: https://github.com/acme-logistics/route-optimizer/pull/4821
