📊 A/B Test Calculator

Easily calculate A/B test statistical significance, required sample size, and CVR improvement rate.

Ⓐ Pattern A (Control)

Ⓑ Pattern B (Variant)

Explanation of Formulas

Z-test (Normal approximation) Significance Testing

Pooled estimate: p̂ = (xA + xB) / (nA + nB)

Standard error: SE = √( p̂(1 - p̂)(1/nA + 1/nB) )

Z-statistic: Z = (pB - pA) / SE

P-value (two-tailed test): p = 2 × (1 - Φ(|Z|))
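As an illustration, the test above can be computed with the Python standard library alone (a minimal sketch; the function name `z_test` and the sample counts are our own, not part of the calculator):

```python
import math

def z_test(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple[float, float]:
    """Two-proportion z-test; returns (Z-statistic, two-tailed p-value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)                            # pooled estimate p̂
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se                                          # Z-statistic
    # Φ(x) via the error function: Φ(x) = (1 + erf(x / √2)) / 2
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = z_test(500, 10_000, 580, 10_000)
print(f"Z = {z:.2f}, p = {p:.3f}")  # Z ≈ 2.50, p ≈ 0.012 → significant at 95%
```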

Sample size calculation

n = (Zα/2 + Zβ)² × (p₁(1 - p₁) + p₂(1 - p₂)) / (p₂ - p₁)²

p₁ = baseline CVR, p₂ = p₁ × (1 + MDE/100), Zα/2 = critical value for the significance level, Zβ = critical value for statistical power
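The formula above translates directly to code. Here is a minimal sketch in Python (the helper name and defaults are our own; critical values are the standard two-sided normal quantiles):

```python
import math

# Two-sided critical values: Z_{α/2} = 1.96 for α = 0.05; Z_β = 0.8416 for 80% power.
Z_ALPHA = {0.10: 1.6449, 0.05: 1.9600, 0.01: 2.5758}
Z_BETA = {0.80: 0.8416, 0.90: 1.2816}

def sample_size_per_group(baseline_cvr: float, mde_pct: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Required visitors per group for a relative MDE (in %) over the baseline CVR."""
    p1 = baseline_cvr
    p2 = p1 * (1 + mde_pct / 100)           # expected variant CVR
    z = Z_ALPHA[alpha] + Z_BETA[power]
    n = z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2
    return math.ceil(n)

# Baseline 5% CVR, detect a 20% relative lift, 5% significance, 80% power:
print(sample_size_per_group(0.05, 20))  # → 8156 per group
```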

How to Read Results

| Metric | Meaning | Reference |
| --- | --- | --- |
| P-value | Probability of a difference this extreme arising by chance alone | Significant if < 0.05 |
| Z-value | Size of the A/B difference in standard errors | Significant at 95% when absolute Z > 1.96 |
| Improvement Rate | Relative CVR change from A to B | (CVR_B - CVR_A) / CVR_A |
| Confidence Interval | Estimated range of the true difference | Significant if it excludes 0 |
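The document gives no formula for the confidence interval, but a conventional choice is the Wald interval for the difference in rates, using the unpooled standard error (a sketch under that assumption; the function name is our own):

```python
import math

def diff_confidence_interval(x_a: int, n_a: int, x_b: int, n_b: int,
                             z_crit: float = 1.96) -> tuple[float, float]:
    """95% Wald interval for the CVR difference (pB - pA)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Unpooled standard error is conventional for the interval (pooled is for the test)
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se

lo, hi = diff_confidence_interval(500, 10_000, 580, 10_000)
print(f"[{lo:.4f}, {hi:.4f}]")  # → [0.0017, 0.0143]; excludes 0, so significant at 95%
```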

Usage and Application Examples

  • In 'Significance Testing Mode,' you can enter the number of visitors and conversions for A and B to determine statistical significance.
  • In 'Sample Size Calculation Mode,' you can estimate the required sample size in advance.
  • You can select from 90%, 95%, or 99% significance levels. 95% is typically recommended.
  • Visually compare CVR between A and B in the comparison chart.
  • Copy results to clipboard and paste into your report.

What is an A/B Test Calculator?

An A/B Test Calculator is a statistical tool that determines whether the difference between two groups in an experiment is statistically significant or just due to random chance. Marketers, product managers, and UX designers use it to validate that improvements actually work. Instead of guessing, you get confidence-backed answers about which version performs better.

How to Use

The calculator requires four inputs: Control visitors (Group A traffic), Control conversions (Group A successes), Variation visitors (Group B traffic), and Variation conversions (Group B successes). Enter the raw counts from your experiment period—no percentages needed. The tool automatically computes conversion rates, the uplift percentage, statistical significance (p-value), and confidence intervals. Most calculators use a standard 95% confidence threshold; a p-value under 0.05, or a confidence interval that excludes zero, indicates your result is statistically sound.

Use Cases

Scenario 1: An e-commerce site tests a new checkout button color. Control: 10,000 visitors with 450 purchases (4.5% CVR). Variation: 10,500 visitors with 525 purchases (5.0% CVR). The calculator shows an 11.1% relative uplift, but with p ≈ 0.09 it is significant only at the 90% level—worth extending the test before committing to a rollout. Scenario 2: A SaaS company tests pricing. Control: 2,000 free-trial signups with 300 conversions (15.0%). Variation: 2,100 signups with 280 conversions (13.3%). The 1.7-point decline is not statistically significant (p ≈ 0.13)—the price change didn't clearly hurt, but didn't help either. Scenario 3: A content site tests headline wording. Control: 5,000 visitors, 120 article clicks (2.4%). Variation: 4,800 visitors, 180 clicks (3.75%). The 56% relative uplift is statistically significant (p < 0.001) even at this scale, informing broader content strategy changes.

Tips & Insights

Sample size matters: tiny experiments with hundreds of visitors often show false positives. Run tests long enough to capture weekly variation—weekends differ from weekdays. Statistical significance at 95% confidence means that, if there were truly no difference, a result this extreme would occur only 5% of the time. Practical significance is different: a 1% lift on revenue might be statistically valid but negligible. Always decide sample size before running the test to avoid peeking bias—stopping early when results happen to look good inflates the false positive rate.

Frequently Asked Questions

What is statistical significance in A/B testing?

Statistical significance means the difference between pattern A and pattern B can be statistically determined to be real rather than due to chance. Generally, if the p-value is less than 0.05 (95% confidence level), it is considered statistically significant.

How is the required sample size calculated?

It is calculated statistically from four parameters: baseline CVR, minimum detectable effect (MDE), significance level, and statistical power. Typically, 5% significance level and 80% statistical power are used.

What is the p-value?

The p-value is the probability of obtaining results as extreme as or more extreme than those observed, assuming the null hypothesis (no difference between A and B) is correct. The smaller the p-value, the stronger the evidence that the observed difference is not due to chance.

How is CVR (conversion rate) calculated?

CVR = Conversions / Visitors × 100 (%). For example, if 50 out of 1,000 visitors convert, the CVR is 5.0%.
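The worked example above, expressed as a trivially short Python snippet:

```python
visitors, conversions = 1_000, 50
cvr = conversions / visitors * 100  # CVR in percent
print(f"{cvr:.1f}%")  # → 5.0%
```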

How long should the test period be?

We recommend a minimum of 1–2 weeks, covering at least one complete business cycle. Continue testing until the planned sample size is reached, and avoid stopping early based on interim peeks—this is key to obtaining statistically accurate results.

When should I stop an A/B test early?

You should avoid stopping tests early even if results look promising, as this can introduce selection bias and inflate your false positive rate. A best practice is to set your sample size in advance and let the test run to completion. If you must stop early, consider using a Bayesian approach or sequential testing methods that account for multiple peeks.

What's the difference between one-tailed and two-tailed tests?

A two-tailed test checks if variant B is different from variant A in either direction (better or worse), while a one-tailed test only checks if B is better. Two-tailed tests are more conservative and recommended for most A/B tests unless you have a strong reason to only care about improvement in one direction. If you use a one-tailed test incorrectly, you risk missing important negative effects.
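The practical difference is easy to see numerically. This sketch (our own example; `phi` is a hypothetical helper for the standard normal CDF) shows a Z-statistic that clears the 0.05 bar one-tailed but not two-tailed:

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

z = 1.80                               # example Z-statistic
p_one_tailed = 1 - phi(z)              # tests only "B is better than A"
p_two_tailed = 2 * (1 - phi(abs(z)))   # tests a difference in either direction
print(f"{p_one_tailed:.3f} {p_two_tailed:.3f}")  # → 0.036 0.072
```

Here the one-tailed test would declare significance at 0.05 while the more conservative two-tailed test would not—exactly why the two-tailed default is the safer choice.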

How do I account for testing multiple hypotheses simultaneously?

When running multiple A/B tests in parallel, you increase the risk of false positives due to the multiple comparison problem. You can adjust your significance level using Bonferroni correction (divide 0.05 by the number of tests) or more advanced methods like FDR control. This calculator tests one hypothesis at a time, so apply corrections manually if you're testing several variants.
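The Bonferroni adjustment described above is a one-line computation (the function name is our own):

```python
def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test significance level under Bonferroni correction."""
    return alpha / num_tests

# Running 4 variant comparisons at an overall α of 0.05:
print(bonferroni_alpha(0.05, 4))  # each test must reach p < 0.0125
```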

What's a good baseline conversion rate to use?

Baseline conversion rates vary widely by industry—e-commerce averages 2-3%, SaaS may be 5-10%, and form signups could be 1-5%. You should use your own historical data as the baseline rather than industry benchmarks to ensure your sample size accounts for your actual traffic patterns. Higher baseline rates require smaller sample sizes to achieve the same statistical power.

Can I use this calculator for metrics other than conversion rates?

Yes, this calculator works for any binary metric where you're comparing two rates, including email open rates, click-through rates, sign-up rates, or engagement metrics. Simply input your baseline metric rate and your expected improvement to get the required sample size. The statistical principles are identical whether you're testing a button color or an email subject line.

What should I do if my test doesn't reach statistical significance?

If your test is underpowered, you can run it longer to collect more data, accept a larger minimum effect size, or declare the test inconclusive. Never draw strong conclusions from non-significant results—you simply don't have enough evidence either way. Consider whether your minimum detectable effect was realistic, as sometimes the true difference is smaller than initially assumed.