# Onboarding Experiment Results — Exec Brief

**Experiment:** EXP-2025-Q4-ONB-01 | **Run:** Oct 1 – Dec 15, 2025 | **Prepared:** Apr 12, 2026

---

## Bottom Line

**The new 3-step onboarding flow significantly improved 14-day activation.** Treatment users activated at 73.3% vs. 20.0% for control, a +53.3 percentage-point lift (Fisher's exact, p = 0.009). The result holds under sensitivity checks. We recommend shipping the new flow.

---

## Headline Numbers

| Metric | Treatment (3-step) | Control (6-step) | Delta |
|---|---|---|---|
| **14-day activation rate** | **73.3%** (11/15) | **20.0%** (3/15) | **+53.3 pp** |
| Onboarding completion | 100% (15/15) | 80% (12/15) | +20 pp |
| Median onboarding time | **4.0 min** | **9.2 hours** | -99.3% |
| Project created (14d) | 93% (14/15) | 53% (8/15) | +40 pp |
| Invite / share (14d) | 73% (11/15) | 20% (3/15) | +53 pp |
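
The Delta column follows directly from the raw figures. A minimal arithmetic check, with values hard-coded from the table above rather than read from `analyze.py`:

```python
# Derive two Delta entries from the table's raw figures.
print(f"{11/15 - 3/15:+.1%}")       # +53.3% -> +53.3 pp activation lift
t_med, c_med = 4.0, 9.2 * 60        # medians in minutes (4.0 min vs 9.2 h)
print(f"{t_med / c_med - 1:+.1%}")  # -99.3% median onboarding time
```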

## Statistical Rigor

| Test | Value |
|---|---|
| Method | Fisher's exact test (two-sided) |
| Odds ratio | 11.0 |
| p-value | **0.009** |
| 95% CI for rate difference | **+17.5 pp to +76.7 pp** (Agresti-Caffo) |

The entire confidence interval is above zero; even the conservative lower bound implies a meaningful improvement.
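
Both headline statistics can be reproduced in a few lines. A minimal sketch, assuming SciPy is available; counts are hard-coded from the tables above (`analyze.py` remains the source of record):

```python
# Minimal reproduction of the primary test and CI (counts from the brief).
from math import sqrt
from scipy.stats import fisher_exact

t_act, t_n = 11, 15  # treatment: activated / total
c_act, c_n = 3, 15   # control:   activated / total

# Fisher's exact test on the 2x2 activated / not-activated table
odds_ratio, p_value = fisher_exact(
    [[t_act, t_n - t_act], [c_act, c_n - c_act]], alternative="two-sided"
)
print(odds_ratio, round(p_value, 3))  # 11.0 0.009

# Agresti-Caffo interval: add one success and one failure to each arm,
# then apply the Wald interval to the adjusted proportions
p1 = (t_act + 1) / (t_n + 2)
p2 = (c_act + 1) / (c_n + 2)
se = sqrt(p1 * (1 - p1) / (t_n + 2) + p2 * (1 - p2) / (c_n + 2))
lo, hi = (p1 - p2) - 1.96 * se, (p1 - p2) + 1.96 * se
print(f"{lo:+.1%} to {hi:+.1%}")  # +17.5% to +76.7% (read as pp)
```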

## Where the Funnel Breaks in Control

```
              Treatment    Control
Signup        100%         100%
  ↓
Onboarding    100%          80%    ← 3 control users never finished
  ↓
Project        93%          53%    ← biggest drop in control
  ↓
Invite/Share   73%          20%    ← social step is the activation gate
```

The 6-step flow loses users at every stage. The 3-step flow keeps nearly everyone through onboarding, and the faster time-to-value (4 min vs. 9+ hours) translates into dramatically higher project creation and social action downstream.

## Data Quality & Exclusions

| Issue | Action | Impact |
|---|---|---|
| Internal/test accounts (U1033, U1034) | Excluded | 2 users removed |
| Enterprise manual assigns (U1031, U1032) | Excluded (not randomized) | 2 users removed |
| Payment gateway outage (Nov 12-14) | No action needed | Both groups equally affected |
| Mobile rendering bug (Nov 20-22) | Sensitivity analysis below | 2 treatment users in window |

**Final analysis population: 30 users (15 treatment, 15 control)**

## Sensitivity: Mobile Bug (Nov 20-22)

Two treatment-group mobile users (U1019, U1021) signed up during the rendering bug window. Neither activated. Excluding them:

| Scenario | Treatment activation | Control activation | p-value |
|---|---|---|---|
| Bug-window users included | 73.3% (11/15) | 20.0% (3/15) | 0.009 |
| Bug-window users excluded | **84.6% (11/13)** | 20.0% (3/15) | **0.002** |

The result **strengthens** when potentially impacted users are removed; the primary finding is robust.
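
The rerun is a one-table change. A minimal sketch, again assuming SciPy, with the two bug-window users dropped from the treatment arm:

```python
# Sensitivity rerun: drop U1019 and U1021 (treatment, neither activated).
from scipy.stats import fisher_exact

_, p_full = fisher_exact([[11, 4], [3, 12]])  # all 30 users: 11/15 vs 3/15
_, p_excl = fisher_exact([[11, 2], [3, 12]])  # 28 users:     11/13 vs 3/15
print(round(p_full, 3), round(p_excl, 3))     # 0.009 0.002
```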

## Caveats for Leadership

1. **Small sample size (n=30).** The effect is large enough to be statistically significant despite this, but the wide confidence interval (+17 pp to +77 pp) reflects genuine uncertainty about the true magnitude. We know the direction; we're less precise on how big.

2. **No pre-registered power analysis.** This experiment was not sized for a target minimum detectable effect. A properly powered replication (n~200 per arm to detect a 15 pp lift at 80% power) would tighten the estimate; a sizing sketch follows this list.

3. **Subgroup cuts are unreliable at this n.** Platform and region breakdowns directionally favor treatment wherever both arms have users, but individual cells are too small, or empty (e.g., APAC has 0 treatment users), to support conclusions.

4. **Activation ≠ retention.** 14-day activation is a leading indicator. We should validate that these activated users also retain at 30/60/90 days before drawing long-term conclusions.
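
For caveat 2, the replication size can be sanity-checked with a standard two-proportion power calculation. A minimal sketch, assuming statsmodels and taking the brief's figures (20% control baseline, 15 pp minimum detectable lift, 80% power, α = 0.05):

```python
# Back-of-envelope sizing for a confirmatory run (Cohen's h, normal approx).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.35, 0.20)  # Cohen's h for 35% vs 20%
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(round(n_per_arm))  # ~137 per arm at the textbook minimum
```

The textbook minimum lands near 140 per arm; the n~200 figure above presumably adds buffer for exclusions and dropout.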

## Recommendation

**Ship the 3-step flow as default**, with two follow-up actions:

1. **Monitor 30-day retention** for the existing cohort to confirm activation translates to sustained usage.
2. **Run a larger confirmatory experiment** (n~400+) to narrow the confidence interval and enable reliable platform/region subgroup analysis.

---

*Analysis excludes 4 users per data quality rules. Primary test: Fisher's exact (two-sided). CI: Agresti-Caffo method for small-sample proportion differences. Reproducible via `analyze.py`.*
