Database SLA: Uptime Considerations for Data Services

Understand database-specific SLA considerations including replication lag, failover time, RTO/RPO, and how database availability differs from compute availability.

Detailed Explanation

Database Availability Is Different

Database SLAs are more nuanced than compute SLAs because databases are stateful. A stateless web server can be restarted or replaced in seconds, but a database failover involves data consistency checks, replication catch-up, and connection re-establishment.

Key Database SLA Metrics

Metric | Definition | Typical Targets
Availability | Percentage of time the DB accepts queries | 99.9% - 99.99%
RTO (Recovery Time Objective) | Max time to restore service after failure | 1 min - 4 hours
RPO (Recovery Point Objective) | Max acceptable data loss in time | 0 - 24 hours
Replication Lag | Delay between primary and replica writes | 0 - 60 seconds
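
To make the availability targets concrete, a percentage can be converted into a downtime budget. The following is a minimal sketch: the function name `downtime_budget` is hypothetical, and the 365-day year and 30-day month are assumptions chosen for illustration. Providers define their own measurement periods.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1 - availability_pct / 100
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


if __name__ == "__main__":
    year = timedelta(days=365)   # assumption: non-leap year
    month = timedelta(days=30)   # assumption: 30-day billing month
    for pct in (99.9, 99.95, 99.99):
        print(f"{pct}%: {downtime_budget(pct, year)} per year, "
              f"{downtime_budget(pct, month)} per month")
```

Under these assumptions, 99.9% allows about 8.76 hours of downtime per year, while 99.99% allows about 52.6 minutes.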

Cloud Database SLA Comparison

Service | SLA | Failover Time | RPO
AWS RDS Multi-AZ | 99.95% | 60-120 seconds | 0 (synchronous)
AWS Aurora | 99.99% | <30 seconds | 0 (synchronous)
Azure SQL Database (Business Critical) | 99.995% (zone-redundant) | ~30 seconds | 0
GCP Cloud SQL (Regional) | 99.95% | ~60 seconds | 0
GCP Cloud Spanner | 99.999% (multi-region) | Automatic | 0

Failover Impact on Application SLA

Database failover is not instantaneous. During failover:

  1. Connection pool drain: Existing connections are broken (2-5 seconds)
  2. DNS propagation: New endpoint resolves (0-30 seconds)
  3. Replica promotion: New primary takes over (10-120 seconds)
  4. Connection re-establishment: App reconnects (1-10 seconds)

Total perceived downtime: 30 seconds to 3 minutes per failover event
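
The step ranges above can be added up, and the result compared with a downtime budget. This is a rough sketch rather than a model of any particular service: it assumes the four steps run strictly one after another, it ignores failure detection time, and the names `FAILOVER_STEPS`, `total_failover_window`, and `failovers_within_budget` are hypothetical.

```python
# Step ranges in seconds (low, high), copied from the list above.
FAILOVER_STEPS = {
    "connection pool drain": (2, 5),
    "DNS propagation": (0, 30),
    "replica promotion": (10, 120),
    "connection re-establishment": (1, 10),
}


def total_failover_window(steps):
    """Sum best-case and worst-case durations, assuming the steps run sequentially."""
    best = sum(low for low, _ in steps.values())
    worst = sum(high for _, high in steps.values())
    return best, worst


def failovers_within_budget(availability_pct, period_seconds, failover_seconds):
    """Count how many failovers of a given length fit in the downtime budget."""
    budget_seconds = period_seconds * (1 - availability_pct / 100)
    return int(budget_seconds // failover_seconds)


if __name__ == "__main__":
    best, worst = total_failover_window(FAILOVER_STEPS)
    print(f"Sequential failover window: {best}-{worst} seconds")
    thirty_days = 30 * 24 * 3600  # assumption: 30-day month
    print(failovers_within_budget(99.95, thirty_days, 180), "three-minute failovers fit")
```

Summed sequentially, the steps give 13 to 165 seconds. With a 99.95% target over an assumed 30-day month, the budget is about 21.6 minutes, which leaves room for seven three-minute failovers before the target is missed.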

Write vs Read Availability

A common pattern is to have higher availability for reads than writes:

Writes: Primary only → 99.95% (single point of failure)
Reads: Primary + N replicas → 99.999%+ (parallel redundancy)
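
The "parallel redundancy" figure follows from the standard formula for redundant components: reads fail only if every node is down at the same time. The sketch below assumes node failures are independent, which real deployments only approximate; the function name `parallel_availability` is hypothetical.

```python
def parallel_availability(node_availability_pct: float, nodes: int) -> float:
    """Availability (in %) of `nodes` redundant nodes, any one of which can serve reads.

    Assumes failures are independent; correlated failures lower the real figure.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # Probability that all nodes are unavailable at once.
    all_down = (1 - node_availability_pct / 100) ** nodes
    return (1 - all_down) * 100


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} node(s) at 99.95% each: {parallel_availability(99.95, n):.6f}%")
```

With two nodes at 99.95% each, the formula gives 99.999975%. Shared dependencies such as a common region, a bad deployment, or broken replication are not captured here.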

For applications that are read-heavy (most web apps), this means:

  • Read SLA: Very high (99.99%+) with multiple read replicas
  • Write SLA: Limited by primary availability (99.95% typically)
  • Composite: Weighted by read/write ratio
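
One simple way to read "weighted by read/write ratio" is a request-weighted average of the two availabilities. The sketch below uses that interpretation as an assumption; the function name `composite_availability` and the 90/10 traffic split are illustrative only.

```python
def composite_availability(read_pct: float, write_pct: float, read_ratio: float) -> float:
    """Request-weighted availability in %.

    `read_ratio` is the fraction of requests that are reads (0.0 to 1.0).
    """
    if not 0 <= read_ratio <= 1:
        raise ValueError("read_ratio must be between 0 and 1")
    return read_pct * read_ratio + write_pct * (1 - read_ratio)


if __name__ == "__main__":
    # Assumed workload: 90% reads at 99.99%, 10% writes at 99.95%.
    print(f"{composite_availability(99.99, 99.95, 0.9):.3f}%")
```

For the assumed 90/10 workload this yields 99.986%. A request-weighted figure describes the share of requests that succeed; it does not help a user journey that needs a write to complete.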

Database-Specific Downtime Causes

Unlike compute instances, databases face unique availability threats:

  • Schema migrations (ALTER TABLE on large tables can lock writes)
  • Replication breakage (replica falls too far behind)
  • Storage exhaustion (disk full = database crash)
  • Connection limit exhaustion (too many clients)
  • Vacuum/maintenance operations (PostgreSQL VACUUM, MySQL OPTIMIZE TABLE)
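
Several of these threats can be caught early with threshold checks. The sketch below covers only storage, connection, and replication-lag checks. The `DbMetrics` type, the `availability_warnings` function, and every threshold value are hypothetical, and collecting the metrics from a real database is left out.

```python
from dataclasses import dataclass


@dataclass
class DbMetrics:
    disk_used_pct: float
    connections_used: int
    max_connections: int
    replication_lag_seconds: float


def availability_warnings(
    m: DbMetrics,
    disk_limit_pct: float = 85.0,         # assumed threshold
    connection_limit_ratio: float = 0.8,  # assumed threshold
    lag_limit_seconds: float = 60.0,      # upper end of the typical lag target
) -> list[str]:
    """Return a warning for each metric that is at or past its threshold."""
    warnings = []
    if m.disk_used_pct >= disk_limit_pct:
        warnings.append("storage: disk usage is near exhaustion")
    if m.connections_used >= m.max_connections * connection_limit_ratio:
        warnings.append("connections: close to the connection limit")
    if m.replication_lag_seconds >= lag_limit_seconds:
        warnings.append("replication: replica is falling behind")
    return warnings


if __name__ == "__main__":
    sample = DbMetrics(disk_used_pct=91.0, connections_used=50,
                       max_connections=200, replication_lag_seconds=5.0)
    print(availability_warnings(sample))
```

Thresholds like these need tuning per workload; the defaults are placeholders, not recommendations.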

Use Case

Reference this guide when designing database architectures, choosing between managed database services, setting RTO/RPO targets, and understanding how database failover impacts your application's overall SLA.
