Kill Switch Feature Flag Pattern
Implement a kill switch pattern using feature flags to instantly disable problematic features in production without code deployment or server restarts.
Detailed Explanation
Kill Switch Pattern
A kill switch is a feature flag designed to instantly disable a feature or service when problems occur. Unlike regular feature flags, which start off and are turned on to release a new feature, kill switches are on by default in production and are turned off only during incidents.
Configuration Example
{
  "payment-processing-enabled": {
    "name": "Payment Processing Enabled",
    "description": "KILL SWITCH: Disable to stop all payment processing during incidents",
    "type": "boolean",
    "enabled": true,
    "defaultValue": true,
    "targeting": []
  }
}
Key Difference from Regular Flags
| Aspect | Regular Flag | Kill Switch |
|---|---|---|
| Default state | Off (feature disabled) | On (feature enabled) |
| Action | Turn on to enable | Turn off to disable |
| Urgency | Planned launches | Emergency response |
| Lifetime | Temporary (remove after launch) | Permanent |
| Naming | enable-feature-x | feature-x-enabled or circuit-breaker-x |
Implementation in Code
// Kill switch pattern in application code
if (featureFlags.isEnabled("payment-processing-enabled")) {
  processPayment(order);
} else {
  // Graceful degradation
  queuePaymentForLater(order);
  showMaintenanceMessage();
}
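One detail the snippet above leaves open is what happens when the flag itself cannot be evaluated, for example because the flag service is unreachable. The following is a minimal sketch, assuming a hypothetical FlagClient interface (not any particular SDK's API) and assuming that an evaluation failure should fall back to the configured default of true, so that an outage of the flag service does not disable payments by itself. The killSwitchOn and handlePayment names, and the string results, are illustrative stand-ins.

```typescript
// Hypothetical flag client interface; real SDKs expose different APIs.
interface FlagClient {
  // May throw if the flag store cannot be reached.
  isEnabled(key: string): boolean;
}

// Reads a kill switch, falling back to the default when evaluation fails.
function killSwitchOn(
  client: FlagClient,
  key: string,
  defaultValue: boolean = true
): boolean {
  try {
    return client.isEnabled(key);
  } catch {
    // Assumption: a flag-service failure should not switch the feature off.
    return defaultValue;
  }
}

type PaymentResult = "processed" | "queued";

// Stand-in for the branch shown above: process now, or queue for later.
function handlePayment(
  client: FlagClient,
  orderId: string,
  queue: string[]
): PaymentResult {
  if (killSwitchOn(client, "payment-processing-enabled")) {
    return "processed"; // stand-in for processPayment(order)
  }
  queue.push(orderId); // stand-in for queuePaymentForLater(order)
  return "queued";
}
```

Whether the fallback should be true or false depends on the feature: for a switch that guards a risky operation, falling back to off may be the safer choice.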
Multiple Kill Switches
Complex systems often have layered kill switches, one per subsystem, so a failing component can be switched off without disabling the rest:
{
  "external-api-calls-enabled": { "defaultValue": true },
  "payment-processing-enabled": { "defaultValue": true },
  "email-notifications-enabled": { "defaultValue": true },
  "search-indexing-enabled": { "defaultValue": true },
  "analytics-tracking-enabled": { "defaultValue": true }
}
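One way to read "layered" is that a broad switch such as external-api-calls-enabled also gates the narrower switches that depend on it. The sketch below illustrates that reading only; the parents map is a hypothetical dependency layout, not something the configuration above defines, and isActive is an illustrative helper name.

```typescript
// Hypothetical layering: each switch lists broader switches that must also be on.
// The map must not contain cycles, or isActive would recurse forever.
const parents: Record<string, string[]> = {
  "payment-processing-enabled": ["external-api-calls-enabled"],
  "email-notifications-enabled": ["external-api-calls-enabled"],
};

// A feature is active only if its own switch and all of its parents are on.
function isActive(key: string, flags: Record<string, boolean>): boolean {
  // A flag with no stored value falls back to true, matching "defaultValue": true.
  const self = flags[key] ?? true;
  return self && (parents[key] ?? []).every((p) => isActive(p, flags));
}

// Turning off the broad switch disables payments but leaves search indexing alone.
const flags = { "external-api-calls-enabled": false };
console.log(isActive("payment-processing-enabled", flags)); // false
console.log(isActive("search-indexing-enabled", flags)); // true
```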
Incident Response Runbook
- Identify the failing component
- Toggle the corresponding kill switch to OFF
- Verify the system stabilizes
- Investigate the root cause
- Deploy a fix
- Toggle the kill switch back to ON
- Monitor for recurrence
Best Practices
- Label clearly: Prefix descriptions with "KILL SWITCH:" so on-call engineers find them quickly
- No targeting rules: Kill switches should affect all users immediately
- Test the off path: Regularly verify that the degraded experience works correctly
- Fast propagation: Use streaming SDKs to minimize the delay between toggling and effect
- Document dependencies: Note which services are affected by each kill switch
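Some of these practices can be checked automatically. The following is a minimal sketch of a lint function over the flag shape used in the configuration example; FlagConfig and lintKillSwitch are hypothetical names, and the rules simply restate the practices above (labeled description, default of true, no targeting rules).

```typescript
// Shape of a single flag entry, mirroring the configuration example.
interface FlagConfig {
  name?: string;
  description?: string;
  type?: string;
  enabled?: boolean;
  defaultValue?: unknown;
  targeting?: unknown[];
}

// Returns a list of problems; an empty list means the flag passes every check.
function lintKillSwitch(key: string, flag: FlagConfig): string[] {
  const problems: string[] = [];
  if (!(flag.description ?? "").startsWith("KILL SWITCH:")) {
    problems.push(`${key}: description should start with "KILL SWITCH:"`);
  }
  if (flag.defaultValue !== true) {
    problems.push(`${key}: defaultValue should be true`);
  }
  if ((flag.targeting ?? []).length > 0) {
    problems.push(`${key}: kill switches should have no targeting rules`);
  }
  return problems;
}
```

A check like this could run in CI against the flag configuration file, so that a mislabeled or targeted kill switch is caught before an incident rather than during one.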
Use Case
During a Black Friday sales event, the third-party payment processor experiences intermittent failures. The on-call engineer flips the payment-processing-enabled kill switch to OFF, which queues orders for later processing instead of showing errors to customers. The switch is restored once the payment provider stabilizes.