Marketing tried a new landing page on half the traffic. The new version has 4.2% conversion vs the old 3.8%. Is that a real win or just luck?
import pandas as pd
df = pd.read_csv("ab_test.csv") # visitor_id, group, converted (0/1)
print(df.groupby("group")["converted"].agg(["count","mean","sum"]))
from scipy import stats
a = df.loc[df["group"]=="control", "converted"]
b = df.loc[df["group"]=="treatment", "converted"]
# Welch's t-test (unequal variances); on 0/1 outcomes this is a
# close cousin of the two-proportion z-test
result = stats.ttest_ind(b, a, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
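As a sanity check, you can run the same comparison as the textbook two-proportion z-test with a pooled standard error. This is a sketch with assumed counts (chosen to match the story's 3.8% vs 4.2% rates, with a made-up 20,000 visitors per group); on your real data you'd plug in `a.sum(), len(a)` and `b.sum(), len(b)`.

```python
from scipy import stats

# Assumed counts for illustration: 3.8% and 4.2% of 20,000 visitors each
x_a, n_a = 760, 20_000   # control: conversions, visitors
x_b, n_b = 840, 20_000   # treatment: conversions, visitors

p_a, p_b = x_a / n_a, x_b / n_b
p_pool = (x_a + x_b) / (n_a + n_b)          # pooled rate under H0
se = (p_pool * (1 - p_pool) * (1/n_a + 1/n_b)) ** 0.5
z = (p_b - p_a) / se
p_value = 2 * stats.norm.sf(abs(z))          # two-sided
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these numbers the z-test and the Welch t-test land on essentially the same p-value, which is why either is acceptable here.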
The p-value is the probability of seeing a difference at least this big if there were no real effect. By convention, p < 0.05 means "this is unlikely to be noise." Not "definitely real." Not "important." Just "unlikely noise."
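One way to make that definition concrete is to simulate it: generate many experiments in which both groups share the same true rate, and count how often a lift this large shows up by chance alone. The pooled rate and per-group sample sizes below are assumptions for illustration, not values from the actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a = n_b = 20_000           # assumed visitors per group
p_null = 0.04                # shared true rate: no real effect
observed_lift = 0.042 - 0.038

# Simulate many "null" experiments and measure each one's lift
sims = 10_000
a_sim = rng.binomial(n_a, p_null, sims) / n_a
b_sim = rng.binomial(n_b, p_null, sims) / n_b

# Fraction of pure-noise experiments with a lift at least this big
p_sim = np.mean(np.abs(b_sim - a_sim) >= observed_lift)
print(f"Fraction of null experiments with a lift this large: {p_sim:.4f}")
```

That fraction is, up to simulation noise, the two-sided p-value.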
p_a, n_a = a.mean(), len(a)
p_b, n_b = b.mean(), len(b)
lift = p_b - p_a
# Normal-approximation 95% CI for the difference in proportions
se = (p_a*(1-p_a)/n_a + p_b*(1-p_b)/n_b) ** 0.5
ci = (lift - 1.96*se, lift + 1.96*se)
print(f"Lift: {lift*100:+.2f} pp ({ci[0]*100:+.2f}, {ci[1]*100:+.2f})")
The confidence interval is more useful than the p-value: it tells you how big the effect plausibly is, not just whether it's distinguishable from noise.
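It also makes the role of sample size visible: the CI's half-width shrinks like 1/√n. A quick sketch, holding the conversion rates fixed at the story's 3.8% and 4.2%:

```python
# How the 95% CI half-width on the lift shrinks with sample size
# (rates fixed at the article's 3.8% vs 4.2% for illustration)
p_a, p_b = 0.038, 0.042
for n in (1_000, 10_000, 100_000):
    se = (p_a*(1-p_a)/n + p_b*(1-p_b)/n) ** 0.5
    half = 1.96 * se
    print(f"n = {n:>7,} per group: lift ± {half*100:.2f} pp")
```

At 1,000 visitors per group the interval is roughly ±1.7 pp, far wider than the 0.4 pp lift we're trying to detect; only around 100,000 per group does it tighten to a fraction of the effect.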
import matplotlib.pyplot as plt
g = df.groupby("group")["converted"]
rates = g.mean()
# 95% error bars from the normal approximation
errs = g.apply(lambda s: 1.96 * (s.mean()*(1-s.mean())/len(s))**0.5)
fig, ax = plt.subplots(figsize=(5, 4))
rates.plot(kind="bar", yerr=errs, capsize=6, ax=ax, color=["#94a3b8","#217346"])
ax.set_title("Conversion rate by group (95% CI)")
ax.set_ylabel("Conversion rate")
plt.tight_layout()
plt.savefig("ab_test.png", dpi=200)
Re-run the analysis with only the first 1,000 rows of each group. How does the CI change? You've just seen why we need lots of data for small effects.
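You can also compute that punchline directly. The standard normal-approximation formula gives the sample size needed per group to detect a given lift with 80% power at a two-sided alpha of 0.05; the rates here are the 3.8% and 4.2% from the story.

```python
# Rough per-group sample size for 80% power, alpha = 0.05 two-sided,
# via the standard normal-approximation formula (a sketch, not a
# substitute for a proper power calculator)
z_alpha, z_beta = 1.96, 0.84
p1, p2 = 0.038, 0.042
n = (z_alpha + z_beta)**2 * (p1*(1-p1) + p2*(1-p2)) / (p2 - p1)**2
print(f"~{n:,.0f} visitors per group to reliably detect a 0.4 pp lift")
```

It comes out to tens of thousands of visitors per group, which is why a 1,000-row subsample leaves the CI far too wide to call the result.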