A/B Testing Feature Flag Configuration

Set up A/B testing with feature flags by defining multiple variations, traffic allocation, and control/treatment groups for data-driven decisions.

Detailed Explanation

A/B Testing with Feature Flags

A/B testing uses feature flags to split traffic between two or more variations, allowing you to measure which version performs better. Unlike simple rollouts, A/B tests assign users to named groups for statistical analysis.

Configuration Example

{
  "checkout-button-color": {
    "name": "Checkout Button Color Test",
    "description": "Test whether green or orange CTA button improves conversion",
    "type": "string",
    "enabled": true,
    "defaultValue": "green",
    "targeting": [
      {
        "type": "percentage-rollout",
        "percentage": 50
      }
    ]
  }
}

LaunchDarkly Multivariate Setup

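For proper A/B testing in LaunchDarkly, define each variation explicitly and give it a rollout weight. Weights are expressed in thousandths of a percent, so they must sum to 100000 (here 33334 + 33333 + 33333):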

{
  "variations": [
    { "value": "control", "name": "Original green button" },
    { "value": "treatment-a", "name": "Orange button" },
    { "value": "treatment-b", "name": "Blue button" }
  ],
  "rollout": {
    "variations": [
      { "variation": 0, "weight": 33334 },
      { "variation": 1, "weight": 33333 },
      { "variation": 2, "weight": 33333 }
    ]
  }
}
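
To make the weights concrete, here is a minimal Python sketch of how a weighted rollout like the one above could map a user to a variation. It is an illustration only, not LaunchDarkly's actual bucketing algorithm: the function name assign_variation, the SHA-256 hash, and the "flag:user" key format are all assumptions made for this example.

```python
import hashlib

# Mirrors the rollout above; weights are in thousandths of a percent.
ROLLOUT = [
    ("control", 33334),
    ("treatment-a", 33333),
    ("treatment-b", 33333),
]
TOTAL_WEIGHT = 100_000


def assign_variation(flag_key: str, user_key: str, rollout=ROLLOUT) -> str:
    """Map a user to a variation; the same inputs always give the same result."""
    if sum(weight for _, weight in rollout) != TOTAL_WEIGHT:
        raise ValueError("rollout weights must sum to 100000")
    # Hypothetical scheme: hash flag key + user key into a bucket in [0, 100000).
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode("utf-8")).hexdigest()
    bucket = int(digest[:15], 16) % TOTAL_WEIGHT
    # Walk the cumulative weights until the bucket falls inside a range.
    cumulative = 0
    for value, weight in rollout:
        cumulative += weight
        if bucket < cumulative:
            return value
    raise AssertionError("unreachable when weights sum to 100000")
```

Calling assign_variation("checkout-button-color", "user-123") repeatedly returns the same variation every time, because the bucket is derived from a hash rather than a random draw.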

Statistical Significance

To get meaningful results from an A/B test:

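  • Sample size: Need enough users in each group to detect the effect you care about. 1,000+ per variation is a common rule of thumb, but the real requirement depends on the baseline conversion rate and the smallest lift you want to detect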
  • Duration: Run for at least 1-2 full business cycles (typically 1-2 weeks)
  • Consistency: The same user must always see the same variation
  • Single variable: Change only one thing per test to isolate the effect
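
The sample size point can be estimated before the test starts. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function name, the default 5% significance level, and the 80% power are assumptions chosen for illustration, not values from this guide.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline: float, expected: float,
                              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per group to detect baseline -> expected."""
    if not (0 < baseline < 1 and 0 < expected < 1) or baseline == expected:
        raise ValueError("rates must be distinct values between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    # Sum of the two binomial variances, one per group.
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    effect = expected - baseline
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)
```

Under these assumptions, detecting a lift from a 10% to a 12% conversion rate needs a little under 4,000 users per variation, and smaller lifts need considerably more, which is why a flat 1,000 is only a starting point.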

Metrics to Track

Metric Type | Examples
Primary     | Conversion rate, revenue per user
Secondary   | Bounce rate, time on page
Guardrail   | Error rate, page load time, support tickets

Important: Avoid Interaction Effects

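If you run multiple A/B tests simultaneously, make sure they don't interfere with each other. Use a separate groupId value for bucketing in each test so that a user's assignment in one test is independent of their assignment in another. Otherwise the same users always end up in the treatment groups together, and you cannot tell which change caused the effect.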
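
The effect of sharing versus separating bucketing keys can be seen in a small simulation. This is a sketch under assumptions: in_treatment, the SHA-256 hash, and the groupId strings below are hypothetical and not tied to any specific flag service.

```python
import hashlib


def in_treatment(group_id: str, user_key: str, percentage: int = 50) -> bool:
    """Return True if the user falls in the treatment share for this group_id."""
    digest = hashlib.sha256(f"{group_id}:{user_key}".encode("utf-8")).hexdigest()
    return int(digest[:15], 16) % 100 < percentage


users = [f"user-{i}" for i in range(10_000)]

# Same groupId for both tests: a user treated in one is treated in the other.
shared = sum(
    in_treatment("shared", u) and in_treatment("shared", u) for u in users
)

# Separate groupId per test: assignments are effectively independent.
separate = sum(
    in_treatment("button-color-test", u) and in_treatment("checkout-copy-test", u)
    for u in users
)

shared_fraction = shared / len(users)      # about half of all users
separate_fraction = separate / len(users)  # about a quarter of all users
```

With a shared key, roughly half of all users are in both treatment groups at once; with separate keys, only about a quarter are, which is what independent 50/50 splits should produce.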

Use Case

An e-commerce team wants to test whether changing the checkout button color from green to orange increases the conversion rate. They split 50/50 between control and treatment, run the test for two weeks, then analyze the results to decide which button to keep.
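
One common way to analyze such a test is a two-proportion z-test on the conversion counts. The sketch below is a minimal version of that test; the function name is hypothetical, and the counts are invented for illustration rather than taken from a real experiment.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) comparing conversion rate B against A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: green (control) vs. orange (treatment), 5,000 users each.
z, p_value = two_proportion_z_test(conv_a=500, n_a=5000, conv_b=560, n_b=5000)
```

A p-value below the threshold chosen before the test (0.05 is a common choice) suggests the difference is unlikely to be noise. The guardrail metrics should still be checked before switching buttons.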
