# Trace Debugger v2.3: Stop Grepping Logs. Replay What Happened.

Priya Sharma, a senior backend engineer at Fintable, used to spend 45 minutes reconstructing a single failed payment flow. Grepping CloudWatch, correlating timestamps across three services, hoping logs hadn't rolled over.

The first time she used Trace Debugger v2.3 on a production incident, she found the root cause in four minutes.

> "It was a downstream service returning a 200 with an error body — the kind of thing that's invisible in log-based debugging. The replay showed me the exact payload. I literally said 'oh, there it is' out loud." — Priya Sharma, Senior Backend Engineer, Fintable

That's the shift. Debugging used to be archaeology — digging through log fragments, reconstructing timelines, hoping you'd find the right shard before your afternoon disappeared. v2.3 makes it replay. The failed request is right there, frozen in time, with every variable, header, and payload intact.

Trace Debugger v2.3 is now generally available. Here's what shipped and what it actually looks like in practice.

---

## Inline Trace Replay

Click a failed span in the trace panel. The debugger reconstructs the exact state at that point — variables, headers, payloads, downstream responses — without re-running the request. HTTP, gRPC, and WebSocket spans all work. Traces with up to 2,000 spans are supported.

This isn't a log viewer with a better UI. It's a reconstruction of what actually happened at runtime, at the exact moment things went wrong.

Jordan Abebe, a solo founder running an indie SaaS, found a bug that logs alone would never have surfaced:

> "I had a webhook from Stripe that was silently failing. The logs showed a 200 response, so I assumed it was fine. Trace replay showed me that my handler was catching an error, logging it to a file I never check, and returning 200 anyway. I would have never found that from logs alone." — Jordan Abebe, Full-Stack Developer

The pattern keeps showing up: the bugs that survive longest are the ones that look fine in logs. A 200 that should be a 500. A payload that's technically valid but semantically wrong. Trace replay makes those visible because you're looking at the actual state, not a text summary of it.

---

## Snapshot Diff View

Regressions are a specific kind of misery. Something worked yesterday. It doesn't work today. Somewhere in the last 14 commits, something changed, and you're about to spend an hour bisecting.

Snapshot Diff View shortcuts that. Pick a "known good" trace and a broken one, and the diff highlights where they diverge: changed payloads, new error codes, timing anomalies. Green means matching, yellow means changed, red means a span is missing entirely.

Marcus Chen, platform lead at Ridgewell Health, used it on a real regression during the beta:

> "Patient record lookups started timing out after a deploy. I pulled a trace from before the deploy and one from after, ran the diff, and immediately saw that a new middleware was adding 300ms to every downstream call. Without the diff view I would have been bisecting commits for an hour." — Marcus Chen, Platform Lead, Ridgewell Health

Access it from the trace list: right-click any trace, select "Compare with..." and pick the baseline.

---

## Collaborative Trace Sharing

Debugging is often a team sport, but the information is stuck on one person's screen. Trace sharing fixes that: generate a permalink to a frozen snapshot of the trace state. The person who opens it sees exactly what you saw.

The viewer doesn't need a paid seat; read-only access is free. Links stay valid for 30 days. Annotate any span with a comment to point directly at the problem instead of writing a paragraph in Slack explaining where to look.

Dana Ostrowski, engineering manager at Conveyor, measured the impact on her team's incident workflow:

> "The trace sharing feature changed how we do incident response. Before, the person who found the issue had to stay on the call to walk everyone through it. Now they just drop a link in the incident channel and go back to fixing. Our incident calls got shorter because people could self-serve the context." — Dana Ostrowski, Engineering Manager, Conveyor

Priya saw the same thing at Fintable:

> "I sent a trace link to our payments team in Slack. They clicked it, saw exactly what I saw, and we had a fix merged in 20 minutes. Previously that would've been a 45-minute screen share where I walk someone through logs."

---

## Two Lines to Get Started

New auto-instrumentation for Express 4.x/5.x and Fastify 4.x means you don't need to manually create spans. Install the SDK and add one import. Traces start flowing.

```bash
npm install @doany/trace-sdk
```

```js
// Load the auto-instrumentation first, then the framework as usual.
require('@doany/trace-sdk/auto');
const express = require('express');
// That's it. Traces start flowing.
```

No config files. No YAML. No agent sidecar.
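
For a sense of what "two lines" means in a real service, here's a minimal sketch. Only the first `require` is doany-specific; the route and port are placeholder choices, not part of the setup:

```js
// The auto-instrumentation import comes first, as in the snippet above.
require('@doany/trace-sdk/auto');

const express = require('express');
const app = express();

// Nothing below is trace-specific: requests to this route are traced automatically.
app.get('/health', (req, res) => {
  res.json({ ok: true });
});

app.listen(3000);
```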

Marcus's team at Ridgewell Health had traces in staging within 10 minutes of installing the package:

> "We were nervous about adding more instrumentation overhead. But the auto-instrumentation for Express was literally two lines. We had traces flowing in staging within 10 minutes of installing the package. No config, no YAML files, no agent sidecar. That was the thing that got our CTO to approve rolling it out to prod." — Marcus Chen, Platform Lead, Ridgewell Health

Jordan didn't even read the docs:

> "Two lines of code. I didn't read the docs. I just added the import, deployed, and suddenly I could see every request flowing through my app."

If you're running Koa or Hono, auto-instrumentation isn't supported yet — manual instrumentation still works for those frameworks.
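
The manual API itself isn't shown in this post, so treat the sketch below as illustrative only: `startSpan`, `setAttribute`, `recordError`, and the root export are hypothetical names standing in for whatever the SDK actually exposes. Check the docs for the real surface. The shape, though, is the familiar one: open a span per request, record the outcome, always end it.

```js
// Hypothetical manual instrumentation for Koa. The tracer API names here
// (startSpan, setAttribute, recordError) are illustrative, not confirmed.
const tracer = require('@doany/trace-sdk'); // assumed root export
const Koa = require('koa');

const app = new Koa();

app.use(async (ctx, next) => {
  // One span per request, opened and closed by hand.
  const span = tracer.startSpan(`${ctx.method} ${ctx.path}`);
  try {
    await next();
    span.setAttribute('http.status_code', ctx.status);
  } catch (err) {
    span.recordError(err);
    throw err; // re-throw so Koa's own error handling still runs
  } finally {
    span.end();
  }
});

app.listen(3000);
```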

---

## The Beta Was Rough. Then It Wasn't.

We're not going to pretend the beta was smooth. Marcus will tell you himself:

> "Honestly the beta was rough at first. The trace panel kept freezing when we had big traces — like 600+ spans. We almost gave up on it. But the team shipped a fix for that within a week of us reporting it, and after that it was solid." — Marcus Chen, Platform Lead, Ridgewell Health

The trace panel was blocking the UI thread with synchronous rendering on large traces. That's fixed: the panel now renders traces with 500+ spans without freezing.

We also fixed a race condition in span ordering for parallel async handlers, broken WebSocket span context propagation across reconnects, and incorrect latency calculations during clock drift. Full list is in the [release notes](https://doany.ai/docs/release-notes/v2.3).

---

## One Breaking Change Worth Knowing About

`TraceContext.propagate()` now returns `Promise<TraceHeaders>` instead of `TraceHeaders`. If you were calling it synchronously, you'll need to add `await`.
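
The fix is usually a one-word change at each call site. A hedged before/after, where the helper function and the `TraceContext` import path are illustrative and only the return-type change is confirmed:

```js
const { TraceContext } = require('@doany/trace-sdk'); // assumed export name

// Before (v2.2): propagate() returned TraceHeaders synchronously.
// const headers = TraceContext.propagate();

// After (v2.3): propagate() resolves to TraceHeaders, so callers must await it.
async function withTraceHeaders(baseHeaders) {
  const traceHeaders = await TraceContext.propagate();
  return { ...baseHeaders, ...traceHeaders };
}
```

Note that forgetting the `await` in a pattern like this won't throw: spreading a pending Promise contributes no properties, so the trace headers just silently vanish from the outgoing request. That's exactly the silent breakage Tomás describes below.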

Tomás Herrera at Stackframe hit this during the beta:

> "The async change in propagate() bit us. We had a custom middleware that called it synchronously, and it silently broke trace propagation across our gateway service. Took us a day to figure out. The migration guide could have been clearer about this." — Tomás Herrera, DevOps Engineer, Stackframe

Fair feedback. We've updated the [migration guide](https://doany.ai/docs/migration/v2.3) with before/after code examples for this change. If you have custom middleware that calls `propagate()`, check it before upgrading.

The minimum Node.js version is also now 18 (up from 16).

---

## What Teams Are Seeing

The numbers from beta users aren't lab benchmarks — they're from teams running this on real production incidents.

Conveyor's shipping calculations team tracked their mean time to resolution across incident retros. It dropped from about 2 hours to 35 minutes after rolling out v2.3. The other teams at Conveyor saw those numbers and asked to onboard too.

Tomás at Stackframe, whose team already runs Datadog and Jaeger, put it simply:

> "We kept Datadog for metrics and alerting, but for actual debugging, the team reaches for doany first now." — Tomás Herrera, DevOps Engineer, Stackframe

The old workflow: something breaks, check dashboards, grep logs, open Jaeger, squint at waterfalls, maybe find it. The new workflow: something breaks, open the failed trace, hit replay, see what happened.

> "You're not searching for information anymore — it's just there." — Tomás Herrera

---

## What's Still Missing

Trace-to-code mapping — click a span and jump to the exact line in your repo — doesn't exist yet. Dana's team at Conveyor wants it. So do we. It's on the roadmap.

The 30-day expiry on shared trace links is too short for teams that reference incidents in postmortems months later. We're looking at that too.

Koa and Hono auto-instrumentation is coming.

Snapshot Diff View can misalign spans when the two traces come from different SDK versions; make sure both services run `@doany/trace-sdk` v4.1+ to avoid that.
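
One way to hold that line is to pin the dependency range explicitly in both services (standard npm syntax; the version follows the note above):

```bash
# Keep both services on a compatible SDK so diffs align span-for-span.
npm install @doany/trace-sdk@^4.1.0
```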

---

## Get Started

```bash
npm install @doany/trace-sdk
```

```js
require('@doany/trace-sdk/auto');
```

Two lines. No config. Traces start flowing. The [docs](https://doany.ai/docs) have everything else — setup for gRPC and WebSocket instrumentation, self-hosted options, and the migration guide if you're upgrading from v2.2.

If something breaks, open the trace and replay it. That's the whole idea.
