Error Budget Calculation: How to Track and Use Your Downtime Allowance

Learn how to calculate and manage error budgets for SRE teams. Convert SLA percentages into actionable downtime budgets with practical examples and burn rate tracking.


Detailed Explanation

What Is an Error Budget?

An error budget is the maximum amount of downtime or failed requests your service can accumulate in a given period while still meeting its SLA. It is the mathematical complement of the SLA target: a 99.9% SLA leaves an error budget of 0.1%.

Calculating Error Budgets

The formula is straightforward, with the SLA expressed as a percentage (for example, 99.9):

Error Budget = (1 - SLA/100) x Total Minutes in Period

Monthly error budgets for common SLA levels, assuming an average month of 43,800 minutes (365 days / 12):

SLA        Error Budget %    Monthly Budget (minutes)
99%        1.0%              438 min (7h 18m)
99.5%      0.5%              219 min (3h 39m)
99.9%      0.1%              43.8 min
99.95%     0.05%             21.9 min
99.99%     0.01%             4.38 min
99.999%    0.001%            0.44 min (26s)
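
As a rough illustration, the formula can be written as a small Python function. This is a minimal sketch, not a reference implementation: the function name and the constant are hypothetical, and the default period assumes an average month of 43,800 minutes, which is the month length the table values imply.

```python
# Assumption: an "average month" of 365 days / 12 = 43,800 minutes.
MINUTES_PER_AVERAGE_MONTH = 365 * 24 * 60 / 12


def error_budget_minutes(sla_percent: float,
                         period_minutes: float = MINUTES_PER_AVERAGE_MONTH) -> float:
    """Return the allowed downtime in minutes for an SLA given as a percentage."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    # Error Budget = (1 - SLA/100) x Total Minutes in Period
    return (1 - sla_percent / 100) * period_minutes


if __name__ == "__main__":
    # Print the monthly budget for each SLA level listed in the table.
    for sla in (99, 99.5, 99.9, 99.95, 99.99, 99.999):
        print(f"{sla}% -> {error_budget_minutes(sla):.2f} min")
```

Passing a different period_minutes (for example, 30 * 24 * 60 for a strict 30-day window) gives slightly different budgets, so it helps to state the period length explicitly wherever the budget is reported.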

Error Budget Policy

A well-defined error budget policy answers these questions:

  1. What happens when the budget is exhausted? (Typically: freeze deployments, focus on reliability)
  2. What counts against the budget? (User-facing errors, latency SLO violations, full outages)
  3. How is the budget measured? (Request success rate, synthetic monitoring, real user metrics)
  4. Who owns the budget? (Usually shared between product and SRE teams)
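
If the budget is measured by request success rate rather than minutes of downtime, the same complement logic applies to request counts. The sketch below is one possible way to express that; the function name, the returned keys, and the sample numbers in the usage note are all hypothetical.

```python
def request_budget_status(total_requests: int,
                          failed_requests: int,
                          sla_percent: float) -> dict:
    """Summarize error budget usage for a request-based target."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < sla_percent < 100:
        raise ValueError("sla_percent must be strictly between 0 and 100")

    # The number of requests allowed to fail while still meeting the target.
    allowed_failures = (1 - sla_percent / 100) * total_requests

    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed_fraction": failed_requests / allowed_failures,
        # Policy hook: an exhausted budget typically triggers a deployment freeze.
        "exhausted": failed_requests >= allowed_failures,
    }
```

For example, with 1,000,000 requests in the period and a 99.9% target, roughly 1,000 requests may fail; 400 failures would consume about 40% of the budget.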

Burn Rate Monitoring

Track how fast your error budget is being consumed:

Burn Rate = Actual Error Rate / Allowed Error Rate

  • Burn rate = 1.0: Consuming budget at exactly the expected rate, so the budget lasts the full period
  • Burn rate = 2.0: Budget will be exhausted in half the period
  • Burn rate = 10.0: Major incident; budget will be gone in one-tenth of the period (about 3 days of a 30-day month)
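
The burn rate formula and the time-to-exhaustion reasoning can be sketched in Python as follows. The function names are hypothetical, the error rates are expressed as fractions (0.001 for 0.1%), and the 30-day default period is an assumption chosen for the example.

```python
def burn_rate(actual_error_rate: float, sla_percent: float) -> float:
    """Burn Rate = Actual Error Rate / Allowed Error Rate (rates as fractions)."""
    allowed_error_rate = 1 - sla_percent / 100
    if allowed_error_rate <= 0:
        raise ValueError("a 100% SLA leaves no error budget to burn")
    return actual_error_rate / allowed_error_rate


def days_until_exhausted(rate: float, period_days: float = 30.0) -> float:
    """Days until a full budget is gone if the burn rate stays constant."""
    if rate <= 0:
        return float("inf")  # no errors, so the budget is never consumed
    return period_days / rate
```

Under these assumptions, a service with a 99.9% SLA and an observed error rate of 0.2% has a burn rate of about 2, which exhausts a full 30-day budget in roughly 15 days; a burn rate of 10 exhausts it in 3 days.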

Practical Example

Your service has a 99.9% SLA with a monthly budget of 43.8 minutes:

  • Week 1: Deployment issue causes 5 minutes of errors → 38.8 minutes remaining
  • Week 2: Clean week → 38.8 minutes remaining
  • Week 3: Database failover takes 3 minutes → 35.8 minutes remaining
  • Week 4: 35.8 minutes still available → safe to deploy new features

If the budget had been fully consumed by Week 2, the team would freeze feature deployments and focus on reliability improvements until the budget resets at the start of the next period.
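
The week-by-week bookkeeping in this example can be reproduced with a short Python sketch. The function name and the tuple layout are hypothetical; the rule that deployments are allowed only while budget remains is a simplified stand-in for a real error budget policy.

```python
def track_budget(monthly_budget_minutes: float, weekly_downtime: list) -> list:
    """Return (label, remaining_minutes, can_deploy) after each period entry."""
    remaining = monthly_budget_minutes
    history = []
    for label, downtime_minutes in weekly_downtime:
        remaining -= downtime_minutes
        # Simplified policy: deployments continue only while budget is left.
        history.append((label, round(remaining, 1), remaining > 0))
    return history


# Downtime per week, taken from the example: 5 minutes, none, 3 minutes, none.
weeks = [("Week 1", 5.0), ("Week 2", 0.0), ("Week 3", 3.0), ("Week 4", 0.0)]

for label, remaining, can_deploy in track_budget(43.8, weeks):
    status = "safe to deploy" if can_deploy else "freeze deployments"
    print(f"{label}: {remaining} minutes remaining, {status}")
```

Feeding in a larger incident, such as 45 minutes in Week 1, drives the remaining budget below zero and flips every later week to the freeze state.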

Use Case

SRE teams can use error budget calculations to negotiate SLA targets with stakeholders, set deployment policies, define incident severity thresholds, and balance feature velocity against reliability investments.
