Marketing tried a new landing page on half the traffic. The new version has 4.2% conversion vs the old 3.8%. Is that a real win or just luck?
import pandas as pd
df = pd.read_csv("ab_test.csv") # visitor_id, group, converted (0/1)
print(df.groupby("group")["converted"].agg(["count","mean","sum"]))
from scipy import stats
a = df.loc[df["group"]=="control", "converted"]
b = df.loc[df["group"]=="treatment", "converted"]
# Welch's t-test (unequal variances); on 0/1 outcomes this is a
# close cousin of the two-proportion z-test
result = stats.ttest_ind(b, a, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
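As a sanity check, you can run the same comparison as the textbook two-proportion z-test with a pooled standard error. This is a sketch with assumed counts (chosen to match the story's 3.8% vs 4.2% rates, with a made-up 20,000 visitors per group); on your real data you'd plug in `a.sum(), len(a)` and `b.sum(), len(b)`.

```python
from scipy import stats

# Assumed counts for illustration: 3.8% and 4.2% of 20,000 visitors each
x_a, n_a = 760, 20_000   # control: conversions, visitors
x_b, n_b = 840, 20_000   # treatment: conversions, visitors

p_a, p_b = x_a / n_a, x_b / n_b
p_pool = (x_a + x_b) / (n_a + n_b)          # pooled rate under H0
se = (p_pool * (1 - p_pool) * (1/n_a + 1/n_b)) ** 0.5
z = (p_b - p_a) / se
p_value = 2 * stats.norm.sf(abs(z))          # two-sided
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these numbers the z-test and the Welch t-test land on essentially the same p-value, which is why either is acceptable here.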
The p-value is the probability of seeing a difference at least this big if there were no real effect. By convention, p < 0.05 means "this is unlikely to be noise." Not "definitely real." Not "important." Just "unlikely noise."
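One way to make that definition concrete is to simulate it: generate many experiments in which both groups share the same true rate, and count how often a lift this large shows up by chance alone. The pooled rate and per-group sample sizes below are assumptions for illustration, not values from the actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a = n_b = 20_000           # assumed visitors per group
p_null = 0.04                # shared true rate: no real effect
observed_lift = 0.042 - 0.038

# Simulate many "null" experiments and measure each one's lift
sims = 10_000
a_sim = rng.binomial(n_a, p_null, sims) / n_a
b_sim = rng.binomial(n_b, p_null, sims) / n_b

# Fraction of pure-noise experiments with a lift at least this big
p_sim = np.mean(np.abs(b_sim - a_sim) >= observed_lift)
print(f"Fraction of null experiments with a lift this large: {p_sim:.4f}")
```

That fraction is, up to simulation noise, the two-sided p-value.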
p_a, n_a = a.mean(), len(a)
p_b, n_b = b.mean(), len(b)
lift = p_b - p_a
# Normal-approximation 95% CI for the difference in proportions
se = (p_a*(1-p_a)/n_a + p_b*(1-p_b)/n_b) ** 0.5
ci = (lift - 1.96*se, lift + 1.96*se)
print(f"Lift: {lift*100:+.2f} pp ({ci[0]*100:+.2f}, {ci[1]*100:+.2f})")
The confidence interval is more useful than the p-value: it tells you how big the effect plausibly is, not just whether it's distinguishable from noise.
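It also makes the role of sample size visible: the CI's half-width shrinks like 1/√n. A quick sketch, holding the conversion rates fixed at the story's 3.8% and 4.2%:

```python
# How the 95% CI half-width on the lift shrinks with sample size
# (rates fixed at the article's 3.8% vs 4.2% for illustration)
p_a, p_b = 0.038, 0.042
for n in (1_000, 10_000, 100_000):
    se = (p_a*(1-p_a)/n + p_b*(1-p_b)/n) ** 0.5
    half = 1.96 * se
    print(f"n = {n:>7,} per group: lift ± {half*100:.2f} pp")
```

At 1,000 visitors per group the interval is roughly ±1.7 pp, far wider than the 0.4 pp lift we're trying to detect; only around 100,000 per group does it tighten to a fraction of the effect.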
import matplotlib.pyplot as plt
g = df.groupby("group")["converted"]
rates = g.mean()
# 95% error bars from the normal approximation
errs = g.apply(lambda s: 1.96 * (s.mean()*(1-s.mean())/len(s))**0.5)
fig, ax = plt.subplots(figsize=(5, 4))
rates.plot(kind="bar", yerr=errs, capsize=6, ax=ax, color=["#94a3b8","#217346"])
ax.set_title("Conversion rate by group (95% CI)")
ax.set_ylabel("Conversion rate")
plt.tight_layout()
plt.savefig("ab_test.png", dpi=200)
Re-run the analysis with only the first 1,000 rows of each group. How does the CI change? You've just seen why we need lots of data for small effects.
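You can also compute that punchline directly. The standard normal-approximation formula gives the sample size needed per group to detect a given lift with 80% power at a two-sided alpha of 0.05; the rates here are the 3.8% and 4.2% from the story.

```python
# Rough per-group sample size for 80% power, alpha = 0.05 two-sided,
# via the standard normal-approximation formula (a sketch, not a
# substitute for a proper power calculator)
z_alpha, z_beta = 1.96, 0.84
p1, p2 = 0.038, 0.042
n = (z_alpha + z_beta)**2 * (p1*(1-p1) + p2*(1-p2)) / (p2 - p1)**2
print(f"~{n:,.0f} visitors per group to reliably detect a 0.4 pp lift")
```

It comes out to tens of thousands of visitors per group, which is why a 1,000-row subsample leaves the CI far too wide to call the result.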