Canary Release Feature Flag
Configure a canary release using feature flags to test new functionality with a tiny percentage of real traffic before wider rollout.
Rollout Strategies
Detailed Explanation
Canary Release with Feature Flags
A canary release exposes a new feature to a very small percentage of production traffic (typically 1-5%) to detect issues before they affect all users. Named after the canary in a coal mine, this pattern provides early warning of problems.
Configuration Example
{
"new-payment-gateway": {
"name": "New Payment Gateway",
"description": "Stripe integration replacing Braintree",
"type": "boolean",
"enabled": true,
"defaultValue": false,
"targeting": [
{
"type": "percentage-rollout",
"percentage": 2
}
]
}
}
Canary vs Other Rollout Strategies
| Strategy | Traffic | Purpose | Duration |
|---|---|---|---|
| Canary | 1-5% | Detect bugs in production | Hours to 1 day |
| Beta | Segment-based | Gather feedback | Days to weeks |
| Gradual rollout | 5% → 100% | Reduce risk | Days to weeks |
| Blue-green | 0% or 100% | Zero-downtime deploy | Minutes |
What to Monitor During Canary
Key Metrics for Canary Group vs Control:
├─ Error rate (HTTP 5xx)
├─ Latency (p50, p95, p99)
├─ CPU and memory usage
├─ Business metrics (conversion, revenue)
├─ Client-side errors (JavaScript exceptions)
└─ User complaints / support tickets
Automatic Canary Analysis
Some teams automate the canary decision:
- Deploy canary at 2% traffic
- Automated system compares canary metrics vs control for 1 hour
- If metrics are within acceptable thresholds: auto-promote to 10%
- If metrics degrade: auto-rollback to 0%
- Continue promoting through stages until 100%
Canary Configuration for Multiple Metrics
{
"canary-analysis-config": {
"type": "json",
"defaultValue": {
"analysisInterval": "10m",
"maxErrorRateIncrease": 0.5,
"maxLatencyP99Increase": 100,
"minimumSampleSize": 1000,
"promotionSchedule": [2, 10, 25, 50, 100]
}
}
}
Rollback Criteria
Immediately rollback if:
- Error rate increases by more than 0.5% compared to control
- p99 latency increases by more than 100ms
- Any SEV-1 alerts fire for the canary group
- Revenue per user drops by more than 2%
Use Case
A fintech company is migrating from Braintree to Stripe for payment processing. They create a canary flag at 2% to route a small fraction of real transactions through Stripe. After 24 hours of monitoring shows no increase in payment failures or latency, they increase to 10% and continue the gradual rollout.