Database SLA: Uptime Considerations for Data Services
Understand database-specific SLA considerations including replication lag, failover time, RTO/RPO, and how database availability differs from compute availability.
Detailed Explanation
Database Availability Is Different
Database SLAs are more nuanced than compute SLAs because databases hold state. A stateless web server can usually be restarted or replaced in seconds, but a database failover involves data consistency checks, replication catch-up, and connection re-establishment.
Key Database SLA Metrics
| Metric | Definition | Typical Targets |
|---|---|---|
| Availability | Percentage of time the DB accepts queries | 99.9% - 99.99% |
| RTO (Recovery Time Objective) | Max time to restore service after failure | 1 min - 4 hours |
| RPO (Recovery Point Objective) | Max acceptable data loss, measured in time | 0 - 24 hours |
| Replication Lag | Delay between a write committing on the primary and appearing on a replica | 0 - 60 seconds |
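To make an availability target concrete, it helps to convert it into a downtime budget. The sketch below is a minimal illustration, assuming a 30-day month and a 365-day year; the function name `downtime_budget_seconds` is hypothetical, and providers may define their measurement periods differently.

```python
# Convert an availability target into an allowed-downtime budget.
# Assumption: a 30-day month and a 365-day year; real SLAs may measure
# over calendar months or other periods.

SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(availability_pct: float, period_days: float) -> float:
    """Return the maximum downtime (in seconds) allowed by an availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_days * SECONDS_PER_DAY


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        month_min = downtime_budget_seconds(target, 30) / 60
        year_min = downtime_budget_seconds(target, 365) / 60
        print(f"{target}%: {month_min:.1f} min/month, {year_min:.1f} min/year")
```

Under these assumptions, 99.9% allows 43.2 minutes of downtime per 30-day month, while 99.99% allows about 4.3 minutes.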
Cloud Database SLA Comparison
| Service | SLA | Typical Failover Time | RPO |
|---|---|---|---|
| AWS RDS Multi-AZ | 99.95% | 60-120 seconds | 0 (synchronous) |
| AWS Aurora (Multi-AZ) | 99.99% | <30 seconds | 0 (quorum-replicated storage) |
| Azure SQL Database (Business Critical, zone-redundant) | 99.995% | ~30 seconds | 0 |
| GCP Cloud SQL (Regional) | 99.95% | ~60 seconds | 0 |
| GCP Cloud Spanner (multi-region) | 99.999% | Automatic | 0 |
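Failover time and SLA interact directly: every failover spends part of the monthly downtime budget. The following rough sketch estimates how many failovers each service could absorb per month, assuming a 30-day month, that each failover is full downtime for the worst-case duration in the table, and that nothing else fails. The helper `failovers_per_month` is hypothetical, and real SLA accounting rules differ by provider.

```python
# Rough sketch: how many failover events fit inside a monthly SLA budget?
# Assumptions: 30-day month, each failover is full downtime for its whole
# worst-case duration, and no other outages occur.
import math

MONTH_SECONDS = 30 * 24 * 60 * 60


def failovers_per_month(sla_pct: float, failover_seconds: float) -> int:
    """Return how many failovers of the given length fit in the monthly downtime budget."""
    budget_seconds = (1.0 - sla_pct / 100.0) * MONTH_SECONDS
    return math.floor(budget_seconds / failover_seconds)


if __name__ == "__main__":
    # (service, SLA %, worst-case failover seconds) taken from the comparison table
    services = [
        ("AWS RDS Multi-AZ", 99.95, 120),
        ("AWS Aurora", 99.99, 30),
        ("Azure SQL Database (Business Critical)", 99.995, 30),
        ("GCP Cloud SQL (Regional)", 99.95, 60),
    ]
    for name, sla, failover in services:
        print(f"{name}: ~{failovers_per_month(sla, failover)} failovers/month")
```

With these inputs, RDS Multi-AZ (99.95%, 120 s) fits about 10 failovers per month, while Azure SQL Database Business Critical (99.995%, 30 s) fits about 4, so a higher SLA can leave less room for failovers even when each one is faster.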
Failover Impact on Application SLA
Database failover is not instantaneous. During failover:
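- Connection pool drain: Existing connections to the old primary are broken and must be discarded (2-5 seconds)
- Replica promotion: A standby or replica is promoted to primary (10-120 seconds)
- DNS propagation: The endpoint's DNS record is repointed to the new primary, and clients pick it up once cached entries expire (0-30 seconds)
- Connection re-establishment: The application reconnects (1-10 seconds)
Total perceived downtime: roughly 30 seconds to 3 minutes per failover event. The phases partly overlap, and failure detection adds time before promotion begins.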
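A quick way to sanity-check the total is to add up the per-phase ranges as if they ran back to back. This is a minimal sketch under that simplifying assumption; it ignores overlap between phases and does not model failure detection, and the names `PHASES_SECONDS` and `serial_downtime_bounds` are illustrative.

```python
# Sum the per-phase ranges to get naive bounds on failover downtime.
# Assumption: phases run strictly one after another; in practice they
# overlap partly, and failure detection (not modeled) adds time.
from __future__ import annotations

PHASES_SECONDS = {
    "connection pool drain": (2, 5),
    "replica promotion": (10, 120),
    "DNS propagation": (0, 30),
    "connection re-establishment": (1, 10),
}


def serial_downtime_bounds(phases: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Return (best, worst) downtime in seconds if every phase ran back to back."""
    best = sum(low for low, _ in phases.values())
    worst = sum(high for _, high in phases.values())
    return best, worst


if __name__ == "__main__":
    best, worst = serial_downtime_bounds(PHASES_SECONDS)
    print(f"Serial estimate: {best:.0f}-{worst:.0f} seconds")
```

Running it prints 13-165 seconds. The worst case lines up with the roughly 3-minute upper figure; the best case is optimistic because detection time is not modeled.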
Write vs Read Availability
In a primary/replica topology, reads can be made more available than writes:
- Writes: Primary only → 99.95% (only one node accepts writes at a time)
- Reads: Primary + N replicas → 99.999%+ (parallel redundancy)
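The read-side figure comes from parallel redundancy: if reads succeed whenever any one node is up, the combined availability is 1 - (1 - A)^n for n nodes of availability A. The sketch below assumes identical nodes, independent failures, and clients that route reads to any healthy node; correlated failures (a shared region, a bad deploy, one expensive query hitting every replica) push real numbers lower. `parallel_availability` is a hypothetical helper.

```python
# Availability of n redundant copies, where the service is up if ANY copy is up.
# Assumption: failures are independent, which real replicas often violate,
# so treat the results as upper bounds.


def parallel_availability(component_availability: float, copies: int) -> float:
    """Return 1 - (1 - A)^n for n independent, identical components."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    primary = 0.9995  # single-node availability, as on the write path
    for replicas in range(0, 3):
        a = parallel_availability(primary, 1 + replicas)
        print(f"primary + {replicas} replica(s): {a * 100:.6f}%")
```

With A = 99.95%, a primary plus one replica gives 1 - 0.0005² = 99.999975%, consistent with the 99.999%+ figure for reads.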
For applications that are read-heavy (most web apps), this means:
- Read SLA: Very high (99.99%+) with multiple read replicas
- Write SLA: Limited by primary availability (99.95% typically)
- Composite: The read and write SLAs weighted by the read/write ratio of the traffic
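One way to read "weighted by read/write ratio" is as the expected fraction of successful requests. This minimal sketch assumes a traffic mix that stays constant over time; the function name `weighted_availability` and the 90/10 split are hypothetical.

```python
# Request-weighted availability for a mixed read/write workload.
# Assumption: availability means the fraction of requests that succeed,
# and the read/write mix is constant over time.


def weighted_availability(read_fraction: float, read_avail: float, write_avail: float) -> float:
    """Blend read and write availability by the share of traffic each receives."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    return read_fraction * read_avail + write_fraction * write_avail


if __name__ == "__main__":
    # Hypothetical 90/10 read/write mix with 99.99% reads and 99.95% writes.
    blended = weighted_availability(0.9, 0.9999, 0.9995)
    print(f"Blended request availability: {blended * 100:.4f}%")
```

For this mix the blend is 99.986%. It can overstate what users experience, since a write outage may break whole workflows (checkout, sign-up) even when most individual requests are reads.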
Database-Specific Downtime Causes
Unlike compute instances, databases face unique availability threats:
- Schema migrations (ALTER TABLE on large tables can lock writes)
- Replication breakage (a replica falls too far behind or stops replicating)
- Storage exhaustion (a full disk can crash the database or halt writes)
- Connection limit exhaustion (too many clients)
- Maintenance operations that lock tables or saturate I/O (e.g., PostgreSQL VACUUM FULL, MySQL OPTIMIZE TABLE)
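Two of these causes, storage and connection exhaustion, can often be caught by simple threshold checks before they become outages. The sketch below uses Python's `shutil.disk_usage` for the disk check; the 85% and 80% thresholds, the `/` path, and the hard-coded connection numbers are hypothetical placeholders, not recommendations. In a real deployment the connection figures would come from the database itself, for example PostgreSQL's `pg_stat_activity` view and `max_connections` setting.

```python
# Minimal pre-emptive checks for storage and connection exhaustion.
# Thresholds, the path, and the connection counts are hypothetical placeholders.
import shutil


def storage_alert(total_bytes: int, used_bytes: int, threshold: float = 0.85) -> bool:
    """Return True when disk usage crosses the alert threshold."""
    return used_bytes / total_bytes >= threshold


def connection_alert(active: int, max_connections: int, threshold: float = 0.80) -> bool:
    """Return True when active connections approach the server's limit."""
    return active / max_connections >= threshold


if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # hypothetical: point at the database's data directory
    if storage_alert(usage.total, usage.used):
        print("WARNING: database volume is nearly full")
    # Hypothetical values; query the database for real numbers.
    if connection_alert(active=170, max_connections=200):
        print("WARNING: active connections are close to max_connections")
```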
Use Case
Reference this guide when designing database architectures, choosing between managed database services, setting RTO/RPO targets, and understanding how database failover impacts your application's overall SLA.