# doany.ai Support Ops Recommendation Pack
## Incident Review — 2026-04-14

---

## Executive Snapshot

The weekend launch (Apr 11–12) generated **24 tickets** across email and chat. The data show systemic failures in three areas (deduplication, bot containment, and SLA attainment):

| Metric | Weekend Actual | Target |
|---|---|---|
| Duplicate ticket rate | **42%** (10/24) | < 8% |
| Bot containment rate | **10%** (2/20 sessions) | > 35% billing, > 20% API |
| First-response SLA attainment | **25%** (6/24 met target) | > 95% for critical |
| Enterprise SLA attainment | **0%** (0/8 met target) | 100% critical/high |
| Avg CSAT (where collected) | **2.7 / 5** | ≥ 4.0 |

**Root causes:** no cross-channel deduplication; a bot with no awareness of active incidents or existing duplicate tickets; no deterministic priority scoring; and ambiguous escalation ownership when billing and API issues overlap.

---

## 1. Triage Rules — Deterministic Priority & Routing

### 1a. Replace free-form severity with a point-based score

Every incoming ticket gets auto-scored on three axes:

| Axis | Condition | Points |
|---|---|---|
| **Customer tier** | `enterprise` | +3 |
| | `pro` | +2 |
| | `free` | +1 |
| **Impact** | Cannot pay / cannot use API / active outage | +3 |
| | Degraded feature or delayed processing | +2 |
| | Informational question | +1 |
| **Scope** | Multiple users or teams affected | +2 |
| | Single user | +1 |

**Priority mapping:**

| Score | Priority | SLA First Response |
|---|---|---|
| 7+ | Critical | 10 min |
| 5–6 | High | 30 min |
| 3–4 | Medium | 60 min |
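| ≤ 2 | Low | 120 min |

With the point values above, the lowest possible score is 3 (`free` +1, informational question +1, single user +1) and the highest is 8, so the Low band cannot be reached as the rubric is currently written.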

**Weekend evidence:** T-8421 (enterprise payment failure, critical) waited 57 min for first response, 5.7× the 10-min target. A deterministic score of 8 (enterprise +3, cannot pay +3, multiple users +2) would have auto-paged Billing Ops immediately.
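
The rubric and priority mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the Zendesk configuration: the `Ticket` class, the function names, and the short impact/scope labels (`blocking`, `degraded`, `informational`, `multiple`, `single`) are assumptions introduced here. Only the point values, bands, and SLA targets come from the tables.

```python
from dataclasses import dataclass

# Point values copied from the scoring table in 1a.
TIER_POINTS = {"enterprise": 3, "pro": 2, "free": 1}
# Hypothetical short labels for the three impact and two scope conditions.
IMPACT_POINTS = {"blocking": 3, "degraded": 2, "informational": 1}
SCOPE_POINTS = {"multiple": 2, "single": 1}

# (minimum score, priority, first-response SLA in minutes), highest band first.
PRIORITY_BANDS = [(7, "critical", 10), (5, "high", 30), (3, "medium", 60)]


@dataclass(frozen=True)
class Ticket:
    tier: str
    impact: str
    scope: str


def priority_score(ticket: Ticket) -> int:
    """Sum the three axes of the rubric."""
    return (
        TIER_POINTS[ticket.tier]
        + IMPACT_POINTS[ticket.impact]
        + SCOPE_POINTS[ticket.scope]
    )


def priority_for(score: int) -> tuple[str, int]:
    """Map a score to (priority, SLA minutes) using the mapping table."""
    for floor, name, sla_minutes in PRIORITY_BANDS:
        if score >= floor:
            return name, sla_minutes
    return "low", 120


# T-8421: enterprise, cannot pay, multiple users affected.
t8421 = Ticket(tier="enterprise", impact="blocking", scope="multiple")
score = priority_score(t8421)
print(score, priority_for(score))  # 8 ('critical', 10)
```

Keeping the point values in lookup tables rather than in branching logic means a rubric change is a data edit, and replaying historical tickets against a revised rubric stays cheap.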

### 1b. Routing matrix with single-owner accountability

| Condition | Owner | Watcher | Escalation if unresolved |
|---|---|---|---|
| `billing + payment-failure` | Billing Ops | Support Lead | Support Lead @ 30 min |
| `api + incident\|auth\|timeout` | API On-call | L2 | Engineering Manager @ 30 min |
| `billing + api` mixed intent | Support Lead (triages within 10 min, then assigns one owner) | Assigned by Support Lead at triage | VP Support @ 60 min |
| Enterprise + negative sentiment + unresolved > 30 min | Auto-escalate to Support Lead | — | — |

**Weekend evidence:** T-8403 (enterprise, API auth, critical) was escalated to API On-call, but first response took 22 min against a 10-min target. The ticket reached the right owner, yet no page fired as the SLA deadline approached.
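
One way to keep the matrix single-owner is to evaluate the rows in a fixed order and return the first match. The sketch below assumes tickets carry a set of topic tags and one subtopic; `Route`, `route`, and `needs_lead_escalation` are hypothetical names, and checking mixed intent first is a design assumption rather than something the matrix specifies.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    owner: str
    watcher: Optional[str]
    escalate_to: str
    escalate_after_min: int


# Subtopics from the `api + incident|auth|timeout` row.
API_PAGING_SUBTOPICS = {"incident", "auth", "timeout"}


def route(topics: set[str], subtopic: str) -> Optional[Route]:
    """Return the single owning route, or None if the matrix has no row."""
    # Mixed intent is checked first so a billing + api ticket
    # never ends up with two competing owners.
    if {"billing", "api"} <= topics:
        return Route("Support Lead", None, "VP Support", 60)
    if "billing" in topics and subtopic == "payment-failure":
        return Route("Billing Ops", "Support Lead", "Support Lead", 30)
    if "api" in topics and subtopic in API_PAGING_SUBTOPICS:
        return Route("API On-call", "L2", "Engineering Manager", 30)
    return None  # not covered by the matrix; normal triage applies


def needs_lead_escalation(tier: str, sentiment: str, unresolved_min: float) -> bool:
    """Last matrix row: enterprise + negative sentiment + unresolved > 30 min."""
    return tier == "enterprise" and sentiment == "negative" and unresolved_min > 30


print(route({"api"}, "auth").owner)  # API On-call
```

The enterprise escalation rule is kept as a separate predicate because it applies on top of whichever route already owns the ticket.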

### 1c. Cross-channel duplicate handling

**Fingerprint:** `customer_id + topic + subtopic + 12-hour window`

| Step | Action |
|---|---|
| New ticket arrives | Compute fingerprint, search open tickets |
| Match found | Auto-merge as child of original ticket; preserve transcript link |
| No match | Proceed with normal triage |
| Agent override | Allow manual unlink if fingerprint was wrong (false positive) |
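
The matching step can be sketched as follows. This is an illustration under assumptions: the `Ticket` fields and `find_parent` are hypothetical names, the open tickets are passed in as a plain list rather than queried from Zendesk, and the clock time in the example is a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

WINDOW = timedelta(hours=12)  # the 12-hour window from the fingerprint


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    customer_id: str
    topic: str
    subtopic: str
    created_at: datetime


def find_parent(new: Ticket, open_tickets: Iterable[Ticket]) -> Optional[Ticket]:
    """Return the earliest open ticket sharing the fingerprint, or None."""
    key = (new.customer_id, new.topic, new.subtopic)
    matches = [
        t
        for t in open_tickets
        if t.ticket_id != new.ticket_id
        and (t.customer_id, t.topic, t.subtopic) == key
        and timedelta(0) <= new.created_at - t.created_at <= WINDOW
    ]
    return min(matches, key=lambda t: t.created_at, default=None)


# Placeholder clock time; only the 9-minute gap is taken from the review.
opened = datetime(2026, 4, 11, 9, 0)
t8401 = Ticket("T-8401", "CUS-1142", "billing", "renewal", opened)
t8402 = Ticket("T-8402", "CUS-1142", "billing", "renewal", opened + timedelta(minutes=9))

parent = find_parent(t8402, [t8401])
print(parent.ticket_id if parent else "no match")  # T-8401
```

Returning the earliest match keeps one stable parent when a customer files several follow-ups, so every child links to the same original ticket.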

**Weekend evidence:** 10 of 24 tickets were duplicates. All 10 share the same `customer_id + topic + subtopic` as their parent and arrived within 45 minutes of it, well inside the 12-hour window. This fingerprint logic would have caught every one:

| Parent | Duplicate | Customer | Topic | Gap |
|---|---|---|---|---|
| T-8401 | T-8402 | CUS-1142 | billing/renewal | 9 min |
| T-8403 | T-8404 | CUS-0091 | api/auth | 18 min |
| T-8407 | T-8408 | CUS-2278 | api/rate-limit | 15 min |
| T-8410 | T-8411 | CUS-0188 | billing/credit-note | 45 min |
| T-8412 | T-8413 | CUS-7774 | api/timeout | 27 min |
| T-8414 | T-8415 | CUS-3911 | billing/trial-conversion | 15 min |
| T-8416 | T-8417 | CUS-0027 | api/incident | 13 min |
| T-8418 | T-8419 | CUS-1982 | billing/refund | 22 min |
| T-8421 | T-8422 | CUS-1056 | billing/payment-failure | 17 min |
| T-8423 | T-8424 | CUS-8821 | api/sdk | 25 min |

Eliminating these 10 tickets would have cut queue volume by 42%, freeing L1 capacity for the 18 SLA-breaching tickets.

---

## 2. Automation Recommendations

### 2a. Chatbot guardrails (Intercom Fin)

The bot resolved **2 of 20 sessions** (10%). Seven handoffs were explicitly tagged `duplicate_not_detected`. Three more failed because the bot lacked incident/context awareness.

| Rule | Implementation | Prevents |
|---|---|---|
| **2-strike handoff** | After 2 failed intent clarifications, stop self-service and hand off to agent with full transcript | `user_rephrased_twice`, `customer_rejected_answer` (2 sessions) |
| **Incident-aware banner** | Poll status page or `#support-incidents` Slack; if active incident matches user intent, show banner + auto-create linked incident ticket | `missing_incident_context`, `no_incident_banner_logic`, `high_severity_not_caught` (3 sessions) |
| **Duplicate gate** | Before creating a new ticket, query Zendesk for open tickets with same `customer_id + topic + subtopic`; if found, show status of existing ticket instead | `duplicate_not_detected` (7 sessions) |
| **Enterprise fast-lane** | If `customer_tier = enterprise` AND `bot_confidence < 0.70`, skip FAQ and hand off immediately | `missing_enterprise_path`, `enterprise_urgency_not_detected` (2 sessions) |
| **Critical-score bypass** | If auto-scored priority = critical, bypass bot entirely and route to human | prevents bot suggesting "retry later" during outages |

**Projected impact:** These rules would have resolved or prevented handoff failures in **14 of 18 failed sessions**, lifting containment from 10% to an estimated 30–40%.
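
The five guardrails interact, so the order in which they are evaluated matters. The sketch below shows one possible ordering as a single decision function. The `Session` fields, the `BotAction` names, and the precedence itself are assumptions made for illustration; the table does not rank the rules, and none of this is Intercom Fin's API.

```python
from dataclasses import dataclass
from enum import Enum, auto


class BotAction(Enum):
    BYPASS_TO_HUMAN = auto()
    SHOW_EXISTING_TICKET = auto()
    SHOW_INCIDENT_BANNER = auto()
    HAND_OFF = auto()
    CONTINUE_SELF_SERVICE = auto()


@dataclass(frozen=True)
class Session:
    customer_tier: str
    priority: str                 # output of the priority scoring
    bot_confidence: float
    failed_clarifications: int
    has_open_duplicate: bool      # result of the duplicate gate lookup
    incident_matches_intent: bool


def next_action(s: Session) -> BotAction:
    """Apply the guardrails in an assumed precedence order."""
    if s.priority == "critical":                       # critical-score bypass
        return BotAction.BYPASS_TO_HUMAN
    if s.has_open_duplicate:                           # duplicate gate
        return BotAction.SHOW_EXISTING_TICKET
    if s.incident_matches_intent:                      # incident-aware banner
        return BotAction.SHOW_INCIDENT_BANNER
    if s.customer_tier == "enterprise" and s.bot_confidence < 0.70:
        return BotAction.HAND_OFF                      # enterprise fast-lane
    if s.failed_clarifications >= 2:                   # 2-strike handoff
        return BotAction.HAND_OFF
    return BotAction.CONTINUE_SELF_SERVICE


session = Session("enterprise", "high", 0.55, 0, False, False)
print(next_action(session).name)  # HAND_OFF
```

Putting the critical bypass first reflects its stated purpose: the bot should never reach the FAQ path during an outage.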

### 2b. Ticketing automations (Zendesk)

| Automation | Trigger | Action |
|---|---|---|
| **Auto-tag on create** | Every new ticket | Compute and stamp `priority_score`, `intent_family`, `duplicate_risk`, `escalation_owner` |
| **SLA early-warning** | 80% of first-response target elapsed | Alert assigned agent + team lead in Slack `#support-incidents` |
| **SLA breach alert** | 100% of first-response target elapsed | Page escalation owner per routing matrix; auto-escalate priority by one level |
| **Reopen merge** | Ticket reopened within 72 hours | Attach to original case unless customer explicitly requests a new issue |
| **Stale-pending sweep** | Pending ticket with no update for 4 hours | Notify owner; after 8 hours, escalate to Support Lead |

**Weekend evidence:** 5 tickets are still `pending` (T-8406, T-8410, T-8412, T-8418, T-8421). A stale-pending sweep would have flagged all of them before the incident review.
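
The timing thresholds in the automation table reduce to a few comparisons. The sketch below is a plain-Python illustration of that logic only; the function names and return labels are assumptions, and the Slack and paging integrations are left out.

```python
from typing import Optional

PRIORITY_ORDER = ["low", "medium", "high", "critical"]


def sla_alert(elapsed_min: float, target_min: float) -> str:
    """Classify a ticket against its first-response target."""
    if elapsed_min >= target_min:
        return "breach"          # page escalation owner, bump priority
    if elapsed_min >= 0.8 * target_min:
        return "early_warning"   # alert agent + team lead in Slack
    return "ok"


def bump_priority(priority: str) -> str:
    """Raise priority by one level; critical stays critical."""
    index = PRIORITY_ORDER.index(priority)
    return PRIORITY_ORDER[min(index + 1, len(PRIORITY_ORDER) - 1)]


def stale_pending_action(hours_without_update: float) -> Optional[str]:
    """Stale-pending sweep: notify at 4 hours, escalate at 8 hours."""
    if hours_without_update >= 8:
        return "escalate_to_support_lead"
    if hours_without_update >= 4:
        return "notify_owner"
    return None


print(sla_alert(8, 10))   # early_warning
print(sla_alert(57, 10))  # breach
```

For a critical ticket with a 10-min target, the early warning lands at 8 min, which leaves the assigned agent two minutes to respond before the breach page fires.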

---

## 3. Self-Service Improvements

### 3a. Incident-aware help center blocks

Publish three dynamic FAQ blocks that pull live status data:

| Block | Content | Trigger |
|---|---|---|
| **Payment failure checklist** | Verify card, check bank hold, retry steps, link to Billing Ops contact | `billing/payment-failure` or `billing/renewal` intent |
| **API auth troubleshooting** | Exact 401/403 error codes, key rotation steps, token TTL reference | `api/auth` intent |
| **Timeout troubleshooting** | Status page embed, batch API limits, retry-with-backoff sample code | `api/timeout` or `api/rate-limit` intent |

Each block includes a live status-page widget. If there's an active incident, the block shows the incident banner instead of generic troubleshooting.
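
The timeout block calls for retry-with-backoff sample code. A minimal sketch of what that sample might look like is shown here, using exponential backoff with full jitter. The function name, defaults, and the choice to retry only on `TimeoutError` are assumptions; the published sample would need to match the real SDK's error types and documented limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on TimeoutError, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # The cap doubles each attempt: 0.5 s, 1 s, 2 s, ... up to max_delay_s.
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Full jitter: wait a random time in [0, cap] so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")
```

A caller would wrap the request in a zero-argument function, for example `retry_with_backoff(lambda: client.fetch(job_id))`, where `client.fetch` stands in for whatever call is timing out.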

### 3b. Feedback loop

- Add binary "Was this helpful?" to every bot answer and every help-center article.
- Weekly: export top 10 failed bot intents (by `handoff_reason`) → knowledge-base refresh queue.
- Weekend data shows `article_not_actionable` and `no_dynamic_quota_check` as specific content gaps to close this week.

### 3c. Guided flows for high-volume subtopics

Based on weekend volume, build interactive guided flows for:

1. **Refund request** (billing/refund) — collect order ID, reason, preferred resolution → pre-fill Billing Ops ticket
2. **Invoice correction** (billing/invoice, billing/credit-note) — collect VAT ID, invoice number, correction type → route to Billing Ops with structured data
3. **Webhook setup** (onboarding/setup) — already high-containment (BOT-3206 resolved in bot); formalize as a maintained guided flow

---

## 4. KPI Framework

### 4a. Operational metrics (track weekly)

| KPI | Definition | Weekend Baseline | 30-Day Target |
|---|---|---|---|
| Duplicate ticket rate | `duplicate tickets / total tickets` | **42%** | < 8% |
| Bot containment rate | `resolved_in_bot / total bot sessions` | **10%** | > 35% (billing/account), > 20% (API) |
| First-response SLA attainment | `tickets with first response within SLA / total tickets` | **25%** | > 95% (critical), > 85% (all) |
| Time to correct owner | Minutes from creation to correct-owner assignment | Not tracked | p90 < 10 min |
| Escalation lag | Minutes from creation to owner acknowledgment | Not tracked | p90 < 15 min |
| Pending-ticket age | Hours since the oldest pending ticket was last updated | 48h+ (T-8406) | < 4h |
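
The three ratio KPIs in this table can be recomputed directly from the weekend counts, which is a quick check that the dashboard formulas agree with the baseline. The `rate` helper and dictionary keys below are illustrative names; the counts are the ones reported in the Executive Snapshot.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, rejecting an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


weekend = {
    "duplicate_ticket_rate": rate(10, 24),          # duplicate tickets / total tickets
    "bot_containment_rate": rate(2, 20),            # resolved_in_bot / total bot sessions
    "first_response_sla_attainment": rate(6, 24),   # within SLA / total tickets
}

for name, value in weekend.items():
    print(f"{name}: {value:.0%}")
# duplicate_ticket_rate: 42%
# bot_containment_rate: 10%
# first_response_sla_attainment: 25%
```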

### 4b. Customer-outcome metrics (track weekly)

| KPI | Definition | Weekend Baseline | 30-Day Target |
|---|---|---|---|
| CSAT by topic | Mean CSAT per `topic` category | 2.7 overall | ≥ 4.0 |
| Negative-sentiment rate | % of tickets with `negative` sentiment | **75%** (18/24) | < 30% |
| Reopen rate | Tickets reopened within 72h / total resolved | Not tracked | < 5% |

### 4c. Dashboard requirements

Build a single Zendesk Explore dashboard with:
- Real-time SLA attainment gauge by priority
- Duplicate rate trend (daily)
- Bot containment rate trend (daily)
- CSAT by topic heatmap
- Pending tickets older than 4 hours (alert list)

---

## 5. Validation Checklist

Run these before go-live sign-off. Each test uses historical weekend tickets as inputs.

### Triage & Routing

- [ ] Replay all 24 weekend tickets through the scoring rubric; confirm priority assignments match expected values
- [ ] Confirm 10 known duplicate pairs are auto-linked by fingerprint logic
- [ ] Confirm T-8421 (enterprise, payment-failure, critical) auto-routes to Billing Ops with Support Lead watcher
- [ ] Confirm T-8403 (enterprise, API auth, critical) auto-routes to API On-call with L2 watcher
- [ ] Confirm a synthetic mixed billing/API ticket is held for Support Lead triage within 10 min

### Chatbot

- [ ] Simulate 2 failed intent clarifications → verify bot hands off with full transcript
- [ ] Set an active incident in status page → verify bot shows incident banner for matching intent
- [ ] Submit a ticket for CUS-1142 + billing/renewal while T-8401 is open → verify bot surfaces existing ticket instead of creating duplicate
- [ ] Submit an enterprise ticket with bot confidence < 0.70 → verify immediate handoff (no FAQ loop)
- [ ] Submit a critical-score ticket → verify bot is bypassed entirely

### SLA & Escalation

- [ ] Create a critical ticket in sandbox → verify Slack alert fires at 8 min (80% of 10-min target)
- [ ] Let the critical ticket breach → verify page fires at 10 min to escalation owner
- [ ] Create a pending ticket, leave untouched for 4 hours → verify stale-pending alert fires
- [ ] Verify SLA and duplicate-rate dashboard widgets update with current field mappings

### Self-Service

- [ ] Trigger payment-failure FAQ block while an incident is active → verify incident banner overrides generic content
- [ ] Complete the refund guided flow → verify pre-filled Billing Ops ticket is created with correct fields
- [ ] Click "Was this helpful? No" on a bot answer → verify it appears in the weekly failed-intent export

---

## Rollout Plan (This Week)

| Day | Actions | Owner |
|---|---|---|
| **Tue (Today)** | Finalize scoring rubric, routing matrix, and escalation owners in incident review | Support Lead + Engineering Manager |
| **Wed** | Configure Zendesk: auto-tag rules, priority scoring, duplicate fingerprint trigger, SLA timers | Support Ops |
| **Thu** | Deploy chatbot changes: 2-strike handoff, incident banner, duplicate gate, enterprise fast-lane | Support Ops + Engineering |
| **Fri** | QA: replay weekend tickets through new rules, run full validation checklist | Support Ops + QA |
| **Sat** | Agent training session; go live with new workflows; monitor first 24h metrics | Support Lead |

---

## Appendix: Weekend Ticket Evidence Summary

### Duplicate Pairs (10 tickets that should have been auto-merged)

| Parent → Child | Customer | Topic | Channel cross | Time gap |
|---|---|---|---|---|
| T-8401 → T-8402 | CUS-1142 | billing/renewal | email → chat | 9 min |
| T-8403 → T-8404 | CUS-0091 | api/auth | chat → email | 18 min |
| T-8407 → T-8408 | CUS-2278 | api/rate-limit | chat → email | 15 min |
| T-8410 → T-8411 | CUS-0188 | billing/credit-note | chat → email | 45 min |
| T-8412 → T-8413 | CUS-7774 | api/timeout | chat → email | 27 min |
| T-8414 → T-8415 | CUS-3911 | billing/trial-conversion | chat → email | 15 min |
| T-8416 → T-8417 | CUS-0027 | api/incident | chat → chat | 13 min |
| T-8418 → T-8419 | CUS-1982 | billing/refund | email → chat | 22 min |
| T-8421 → T-8422 | CUS-1056 | billing/payment-failure | email → chat | 17 min |
| T-8423 → T-8424 | CUS-8821 | api/sdk | chat → email | 25 min |

### Critical SLA Breaches

| Ticket | Customer Tier | Topic | SLA Target | Actual | Overshoot |
|---|---|---|---|---|---|
| T-8403 | enterprise | api/auth | 10 min | 22 min | 2.2× |
| T-8416 | enterprise | api/incident | 10 min | 18 min | 1.8× |
| T-8421 | enterprise | billing/payment-failure | 10 min | 57 min | 5.7× |

### Bot Session Failure Breakdown (18 of 20 sessions failed)

| Failure category | Count | Sessions |
|---|---|---|
| Duplicate not detected | 7 | BOT-3205, 3208, 3210, 3212, 3214, 3216, 3219 |
| Missing incident/context awareness | 3 | BOT-3202, 3209, 3213 |
| Enterprise urgency not detected | 2 | BOT-3207, 3218 |
| Content gap (article/check missing) | 3 | BOT-3203, 3204, 3220 |
| Customer rejected / loop | 2 | BOT-3201, 3211 |
| Resolved successfully | 2 | BOT-3206, 3217 |
