Memory Usage Health Check Component Design
Design a memory usage health check component that monitors heap usage, garbage collection pressure, and memory thresholds for Node.js, Java, and Go applications.
Detailed Explanation
Memory Usage Health Check
Memory health checks detect memory leaks and excessive memory consumption before they cause out-of-memory (OOM) kills or performance degradation.
Response Component
```json
{
  "memory": {
    "status": "UP",
    "duration": "1ms",
    "message": "Memory usage within limits",
    "details": {
      "heapUsed": "256MB",
      "heapTotal": "512MB",
      "rss": "580MB",
      "external": "24MB",
      "utilization": "50%"
    }
  }
}
```
Node.js Implementation
```javascript
function checkMemory() {
  const start = process.hrtime.bigint();
  const mem = process.memoryUsage();
  const heapUsedMB = Math.round(mem.heapUsed / 1024 / 1024);
  const heapTotalMB = Math.round(mem.heapTotal / 1024 / 1024);
  const utilization = (mem.heapUsed / mem.heapTotal) * 100;

  let status = 'UP';
  if (utilization > 90) status = 'DOWN';
  else if (utilization > 75) status = 'DEGRADED';

  // Report the actual check duration rather than a hardcoded value.
  const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
  return {
    status,
    duration: `${durationMs.toFixed(1)}ms`,
    message: `Heap: ${heapUsedMB}MB / ${heapTotalMB}MB (${utilization.toFixed(1)}%)`
  };
}
```
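To produce the full `details` object shown in the response component earlier, the check can be extended along these lines (a sketch; the field names follow the example response, and the thresholds match the table below):

```javascript
function checkMemoryDetailed() {
  const mem = process.memoryUsage();
  const toMB = (bytes) => `${Math.round(bytes / 1024 / 1024)}MB`;
  const utilization = (mem.heapUsed / mem.heapTotal) * 100;

  let status = 'UP';
  if (utilization > 90) status = 'DOWN';
  else if (utilization > 75) status = 'DEGRADED';

  return {
    memory: {
      status,
      message: status === 'UP' ? 'Memory usage within limits' : 'High memory usage',
      details: {
        heapUsed: toMB(mem.heapUsed),
        heapTotal: toMB(mem.heapTotal),
        rss: toMB(mem.rss),           // resident set size: total process memory
        external: toMB(mem.external), // memory for C++ objects bound to JS (e.g. Buffers)
        utilization: `${utilization.toFixed(1)}%`
      }
    }
  };
}
```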
Memory Thresholds
| Metric | Healthy | Degraded | Unhealthy |
|---|---|---|---|
| Heap utilization | < 75% | 75-90% | > 90% |
| RSS growth rate | Stable | Slow growth | Rapid growth |
| GC pause time | < 50ms | 50-200ms | > 200ms |
Why Include in Liveness Probe
Memory checks are one of the few resource checks appropriate for liveness probes: unlike external dependencies, a memory problem is local to the process. A memory leak that pushes heap usage above 90% typically requires a restart to recover, which is exactly what liveness probes are designed to trigger.
Container Memory vs App Memory
In containerized environments, check both:
- Application heap: Your runtime's managed memory
- Container limit: The cgroup memory limit set by Kubernetes
An app can be using 60% of its heap but 95% of the container limit due to native memory, thread stacks, and memory-mapped files.
Use Case
Long-running Node.js, Java, or Python services where memory leaks can gradually degrade performance, especially in Kubernetes where OOM kills disrupt service availability.