paulund
#deployments #testing #devops

A/B Testing Deployments

A/B testing is a deployment strategy where two versions of an application run simultaneously. A portion of users see version A (the existing version) and a portion see version B (the new version). You measure the outcome and keep whichever performs better.

It is different from a canary deployment. A canary release is about risk reduction: you are checking that the new version does not break anything. A/B testing is about learning: you are comparing two approaches to find out which works better.

How It Works

  1. Define what you are testing and what success looks like (the success metric)
  2. Split traffic: send a percentage of users to the new version
  3. Run both versions for a defined period
  4. Measure the outcome against your success metric
  5. Roll out the winner to 100% of users and remove the losing version
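The traffic split in step 2 is commonly done with deterministic hashing, so the same user always lands in the same bucket across requests. A minimal sketch in Python (the function name and split value are illustrative, not from any particular tool):

```python
import hashlib

def assign_variant(user_id: str, split_b: float = 0.5) -> str:
    """Deterministically bucket a user into variant A or B.

    Hashing the user ID means the assignment is stable: the same
    user gets the same variant on every request, with no state to store.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the hash to [0, 1]
    return "B" if bucket < split_b else "A"
```

Because the split is a pure function of the user ID, ramping the test up or down is just a matter of changing `split_b`.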

Defining Success

Before you run an A/B test, be explicit about your metric. Common metrics:

  • Conversion rate (sign-ups, purchases, form completions)
  • Click-through rate on a button or link
  • Page engagement time
  • Error rate or page load time
  • Feature adoption rate

Without a defined metric, you cannot make a decision.

Statistical Significance

Do not end a test too early. Small samples produce noisy results. You need enough data to be confident the difference you see is real, not random variance. Most A/B testing tools calculate statistical significance for you — wait until you reach at least 95% confidence before acting on results.
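Most testing tools calculate this for you, but the underlying check is typically a standard two-proportion z-test on the conversion rates. A rough sketch using only the standard library (the sample figures are made up for illustration):

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# Hypothetical run: 120/2400 conversions on A vs 156/2400 on B.
z, p = two_proportion_z_test(120, 2400, 156, 2400)
significant = p < 0.05  # the 95% confidence threshold mentioned above
```

If `p` is above 0.05 the observed difference could plausibly be random variance, and the test should keep running.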

What to A/B Test

A/B testing is well suited to:

  • UI layout and copy changes
  • New features you are unsure users will adopt
  • Checkout flow changes
  • Pricing page variations
  • Onboarding sequences

It is less suited to backend infrastructure changes (use canary or blue/green instead).

Tools

Tool             Notes
Optimizely       Full-featured experimentation platform
Google Optimize  Now deprecated; use GA4 experiments
LaunchDarkly     Feature flags with built-in experimentation
Unleash          Open source feature toggle service
Bucketer         Simple percentage-based traffic bucketing

Risks

  • Interaction effects — if you run multiple A/B tests simultaneously, they can interfere with each other
  • Novelty effect — users sometimes engage more with anything new, not because it is better
  • Sample pollution — if the same user sees both variants (e.g. after clearing cookies), your data is inaccurate
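Sample pollution is usually mitigated with sticky bucketing: persist each user's assignment so they never switch variants mid-test. A minimal sketch, using an in-memory dict as a stand-in for a real store such as a cookie, cache, or database:

```python
import random

# In-memory stand-in for a persistent store (cookie, database, cache).
_assignments: dict[str, str] = {}

def sticky_variant(user_id: str) -> str:
    """Return the user's variant, assigning one on first sight and
    reusing that same assignment on every later request."""
    if user_id not in _assignments:
        _assignments[user_id] = random.choice(["A", "B"])
    return _assignments[user_id]
```

The deterministic hashing approach shown earlier achieves the same stickiness without any storage, but an explicit store lets you reassign or exclude individual users when needed.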