# A/B Testing Deployments
A/B testing is a deployment strategy where two versions of an application run simultaneously. A portion of users see version A (the existing version) and a portion see version B (the new version). You measure the outcome and keep whichever performs better.
It is different from a canary deployment. A canary release is about risk reduction: you are checking that the new version does not break anything. A/B testing is about learning: you are comparing two approaches to find out which works better.
## How It Works
- Define what you are testing and what success looks like (the success metric)
- Split traffic: send a percentage of users to the new version
- Run both versions for a defined period
- Measure the outcome against your success metric
- Roll out the winner to 100% of users and remove the losing version
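The traffic-splitting step is usually done with deterministic hash-based bucketing, so a given user always lands in the same variant. A minimal sketch; the function name, the experiment key, and the 10% split are illustrative assumptions, not a specific tool's API:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, b_percent: int = 10) -> str:
    """Deterministically bucket a user so repeat visits see the same variant."""
    # Hash the user id together with the experiment name so that
    # different experiments split the user base independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a stable number in 0..99
    return "B" if bucket < b_percent else "A"

# The same user always gets the same answer for the same experiment:
assert assign_variant("user-42", "checkout-redesign") == assign_variant("user-42", "checkout-redesign")
```

Because the assignment is a pure function of the user ID, no per-user state needs to be stored, and the split percentage can be raised gradually without reshuffling users who are already in variant B.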
## Defining Success
Before you run an A/B test, be explicit about your metric. Common metrics:
- Conversion rate (sign-ups, purchases, form completions)
- Click-through rate on a button or link
- Page engagement time
- Error rate or page load time
- Feature adoption rate
Without a defined metric, you cannot make a decision.
## Statistical Significance
Do not end a test too early. Small samples produce noisy results. You need enough data to be confident the difference you see is real, not random variance. Most A/B testing tools calculate statistical significance for you — wait until you reach at least 95% confidence (p < 0.05) before acting on results.
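As a sketch of the kind of check those tools run, here is a two-proportion z-test for a conversion-rate metric using only the standard library. The sample numbers are made up for illustration:

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # combined conversion rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# 1,000 users per variant: A converts at 10%, B at 13%.
p = two_proportion_z_test(100, 1000, 130, 1000)
significant = p < 0.05  # "at least 95% confidence"
```

With these numbers the difference is significant; with a much smaller lift (say 10.0% vs 10.1%) the same sample size would not be, which is exactly why stopping early on a small sample is unsafe.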
## What to A/B Test
A/B testing is well suited to:
- UI layout and copy changes
- New features you are unsure users will adopt
- Checkout flow changes
- Pricing page variations
- Onboarding sequences
It is less suited to backend infrastructure changes (use canary or blue/green instead).
## Tools
| Tool | Notes |
|---|---|
| Optimizely | Full-featured experimentation platform |
| Google Optimize | Now deprecated — use GA4 experiments |
| LaunchDarkly | Feature flags with built-in experimentation |
| Unleash | Open source feature toggle service |
| Bucketer | Simple percentage-based traffic bucketing |
## Risks
- Interaction effects — if you run multiple A/B tests simultaneously, they can interfere with each other
- Novelty effect — users sometimes engage more with anything new, not because it is better
- Sample pollution — if the same user sees both variants (e.g. clears cookies), your data is inaccurate
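A common mitigation for sample pollution is to make the assignment sticky: record a user's variant at first exposure, keyed on a stable identifier such as an account ID rather than a cookie, and reuse it on every later visit. A minimal in-memory sketch; the dictionary stands in for whatever durable store (database, first-party cookie) you actually use:

```python
import random

_assignments: dict[str, str] = {}  # stand-in for a durable assignment store

def sticky_variant(user_id: str, b_percent: int = 50) -> str:
    """Assign a variant once, then return the stored value on every later call."""
    if user_id not in _assignments:
        _assignments[user_id] = "B" if random.random() * 100 < b_percent else "A"
    return _assignments[user_id]
```

Keying on a logged-in user ID means the assignment survives cleared cookies; anonymous traffic is harder, and is one reason many teams restrict experiments to authenticated users.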