Monitoring and Observability with Feature Flags

Set up monitoring and alerting for feature flag rollouts. Track flag evaluations, compare metrics between variations, and detect issues early.

Monitoring Feature Flag Rollouts

Rolling out a feature flag without monitoring means flying blind. You need to track which variation each user receives, compare key metrics between the groups, and set up alerts that catch problems before they spread.

Monitoring Configuration

{
  "new-recommendation-algo": {
    "name": "New Recommendation Algorithm",
    "type": "boolean",
    "enabled": true,
    "defaultValue": false,
    "targeting": [
      { "type": "percentage-rollout", "percentage": 20 }
    ],
    "_monitoring": {
      "primaryMetrics": [
        "recommendation.click_through_rate",
        "recommendation.conversion_rate"
      ],
      "guardrailMetrics": [
        "page.load_time_p99",
        "recommendation.error_rate",
        "support.ticket_count"
      ],
      "alertThresholds": {
        "error_rate_increase": 0.5,
        "latency_p99_increase_ms": 200,
        "conversion_rate_decrease": 2.0
      }
    }
  }
}
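The _monitoring block above can drive an automated guardrail check. A minimal sketch in Python, assuming the threshold names from the config and hypothetical dicts of observed metric values per group (how you fetch those values depends on your metrics backend):

```python
def check_guardrails(thresholds, treatment, control):
    """Compare treatment vs. control metrics against alert thresholds.

    thresholds: the "alertThresholds" dict from the flag config
    treatment/control: observed metric values for each group
    Returns the names of any violated thresholds.
    """
    violations = []
    # Error rate is compared as an absolute percentage-point increase.
    if treatment["error_rate"] - control["error_rate"] > thresholds["error_rate_increase"]:
        violations.append("error_rate_increase")
    # Latency is compared in milliseconds at p99.
    if treatment["latency_p99_ms"] - control["latency_p99_ms"] > thresholds["latency_p99_increase_ms"]:
        violations.append("latency_p99_increase_ms")
    # Conversion is alerted on a decrease, not an increase.
    if control["conversion_rate"] - treatment["conversion_rate"] > thresholds["conversion_rate_decrease"]:
        violations.append("conversion_rate_decrease")
    return violations
```

A rollout controller can run this on a schedule and halt or roll back the flag when the list is non-empty.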

What to Monitor

Category          Metrics                          Alert When
Reliability       Error rate, exceptions           > 0.5% increase vs control
Performance       p50, p95, p99 latency            > 100ms increase
Business          Conversion, revenue              > 2% decrease
Infrastructure    CPU, memory, DB queries          > 20% increase
User experience   Bounce rate, session duration    Significant change

Flag Evaluation Logging

Log every flag evaluation for analysis:

{
  "timestamp": "2025-01-15T10:30:00Z",
  "flagKey": "new-recommendation-algo",
  "userId": "user-123",
  "variation": true,
  "reason": "percentage-rollout",
  "evaluationTimeMs": 2
}
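One way to produce events in that shape is a small wrapper around your flag evaluation call. A sketch, assuming the field names above (in production you would ship the event to a log pipeline rather than print it):

```python
import json
import time
from datetime import datetime, timezone

def log_evaluation(flag_key, user_id, variation, reason, started_at):
    """Emit a structured evaluation event matching the JSON shape above.

    started_at: a time.monotonic() reading taken just before evaluation,
    used to compute evaluationTimeMs.
    """
    event = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "flagKey": flag_key,
        "userId": user_id,
        "variation": variation,
        "reason": reason,
        "evaluationTimeMs": round((time.monotonic() - started_at) * 1000),
    }
    print(json.dumps(event))  # stand-in for a real log shipper
    return event
```

Logging every evaluation is what makes the treatment/control comparisons below possible: you can join these events against business metrics by userId.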

Dashboard Layout

Feature Flag Rollout Dashboard
├─ Flag Status: ON (20% rollout)
├─ Users in treatment: 12,456
├─ Users in control: 49,824
├─── Primary Metrics
│   ├─ CTR: Treatment 12.3% vs Control 10.1% (+2.2%)
│   └─ Conversion: Treatment 3.2% vs Control 3.0% (+0.2%)
├─── Guardrail Metrics
│   ├─ Error rate: 0.02% vs 0.01% (within threshold)
│   ├─ p99 latency: 340ms vs 310ms (within threshold)
│   └─ Support tickets: 3 vs 2 (normal)
└─── Rollout History
    ├─ Day 1: 5% → No issues
    ├─ Day 3: 10% → No issues
    └─ Day 5: 20% → Current

Alerting Rules

  • Immediate (PagerDuty): Error rate spike > 1% in treatment group
  • Warning (Slack): Latency p99 increase > 50ms
  • Informational (Dashboard): Any metric divergence > 5%
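These three tiers can be expressed as a simple severity classifier. A sketch mirroring the rules above; the severity labels and the idea of routing on metric name are assumptions, and you would wire the return value to PagerDuty, Slack, or a dashboard annotation:

```python
def classify_alert(metric, treatment, control):
    """Map a treatment-vs-control divergence to an alert severity.

    Returns "page", "warn", "info", or None, following the tiers:
    page  - error rate spike > 1 percentage point in treatment
    warn  - p99 latency increase > 50 ms
    info  - any relative metric divergence > 5%
    """
    if metric == "error_rate" and treatment - control > 1.0:
        return "page"
    if metric == "latency_p99_ms" and treatment - control > 50:
        return "warn"
    if control and abs(treatment - control) / control * 100 > 5:
        return "info"
    return None
```

Note the ordering: the most severe matching tier wins, so an error-rate spike pages even though it also exceeds the 5% informational threshold.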

Post-Rollout Analysis

After reaching 100%, analyze the complete data:

  1. Compare all metrics between the treatment and control periods
  2. Check for any lagging effects (issues that took days to appear)
  3. Document findings in a rollout retrospective
  4. Archive the monitoring configuration with the flag cleanup ticket
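For step 1, eyeballing a difference is not enough at low traffic; a standard two-proportion z-test tells you whether a conversion delta is likely real. A minimal sketch, not tied to any flag platform:

```python
from math import sqrt

def two_proportion_z(conv_t, n_t, conv_c, n_c):
    """Z-score for the difference in conversion rate, treatment vs control.

    conv_*: conversion counts, n_*: users in each group.
    |z| above ~1.96 suggests significance at the 95% level.
    """
    p_t, p_c = conv_t / n_t, conv_c / n_c
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_t + conv_c) / (n_t + n_c)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se
```

Run the same test on guardrail metrics too; a "2% lift" that fails this check is noise, not a win, and belongs in the retrospective as such.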

Use Case

A media platform rolls out a new recommendation algorithm to 20% of users. Their monitoring dashboard shows a 2.2% improvement in click-through rate with no increase in error rate or latency. Guardrail alerts are configured to automatically page the on-call engineer if error rates spike, giving the team confidence to increase the rollout to 50%.
