Message Queue Health Check Component Design
Design a health check component for message queues (RabbitMQ, SQS, Kafka) that monitors connection status, queue depth, consumer lag, and message throughput.
Detailed Explanation
Message Queue Health Check
Queue health checks verify that your asynchronous messaging infrastructure is working: the broker is reachable, consumers are attached, and messages are not backing up. Left undetected, queue problems can cause silent data loss or processing delays.
Response Component
{
  "queue": {
    "status": "UP",
    "duration": "8ms",
    "message": "RabbitMQ connected, queues operational",
    "details": {
      "type": "rabbitmq",
      "connection": "open",
      "channels": 3,
      "queues": {
        "orders": { "messages": 42, "consumers": 2 },
        "notifications": { "messages": 0, "consumers": 1 }
      }
    }
  }
}
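One way to produce a component of this shape is to time a probe function and wrap whatever it returns. The sketch below is illustrative only: `run_queue_check` and `fake_rabbitmq_probe` are hypothetical names, the probe returns canned data instead of calling a broker, and the `"DOWN"` status for a failed probe is an assumption (the example above only shows `"UP"`).

```python
import time
from typing import Any, Callable, Dict


def run_queue_check(probe: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Time a probe and wrap its result in the response component shape."""
    started = time.perf_counter()
    try:
        details = probe()
        status = "UP"
        message = "broker connected, queues operational"
    except Exception as exc:
        # Assumption: any exception raised by the probe marks the component DOWN.
        details = {"error": str(exc)}
        status = "DOWN"
        message = "queue check failed"
    elapsed_ms = round((time.perf_counter() - started) * 1000)
    return {
        "queue": {
            "status": status,
            "duration": f"{elapsed_ms}ms",
            "message": message,
            "details": details,
        }
    }


def fake_rabbitmq_probe() -> Dict[str, Any]:
    """Hypothetical probe returning canned data; a real one would query the broker."""
    return {
        "type": "rabbitmq",
        "connection": "open",
        "channels": 3,
        "queues": {
            "orders": {"messages": 42, "consumers": 2},
            "notifications": {"messages": 0, "consumers": 1},
        },
    }


def failing_probe() -> Dict[str, Any]:
    """Hypothetical probe that simulates an unreachable broker."""
    raise ConnectionError("broker unreachable")
```

Keeping the probe separate from the wrapper means the same timing and error handling can be reused for RabbitMQ, SQS, or Kafka probes.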
What to Check
- Connection status: Is the broker reachable?
- Queue depth: Are messages piling up?
- Consumer count: Are consumers attached and processing?
- Dead letter queue: Are messages failing processing?
- Consumer lag (Kafka): How far behind are consumers?
Health Thresholds
| Metric | Healthy | Degraded | Unhealthy |
|---|---|---|---|
| Connection | Open | Reconnecting | Failed |
| Queue depth | < 1000 | 1000-10000 | > 10000 |
| Consumer count | > 0 | 0 (briefly) | 0 (extended) |
| Consumer lag | < 100 | 100-10000 | > 10000 |
| DLQ messages | 0 | 1-10 | > 10 |
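The thresholds in the table can be expressed as a small classification helper. This is a minimal sketch, not a definitive implementation: the `Health` enum and all function names are hypothetical, a DLQ count of exactly 10 is treated as degraded, and the 60-second grace period that separates "briefly" from "extended" for consumer count is an assumed value you would tune for your workload.

```python
from enum import IntEnum
from typing import Iterable


class Health(IntEnum):
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2


def classify(value: float, degraded_at: float, unhealthy_above: float) -> Health:
    """Map a 'lower is better' metric onto the three health levels."""
    if value > unhealthy_above:
        return Health.UNHEALTHY
    if value >= degraded_at:
        return Health.DEGRADED
    return Health.HEALTHY


def classify_queue_depth(depth: int) -> Health:
    return classify(depth, degraded_at=1000, unhealthy_above=10000)


def classify_consumer_lag(lag: int) -> Health:
    return classify(lag, degraded_at=100, unhealthy_above=10000)


def classify_dlq(count: int) -> Health:
    # Assumption: exactly 10 dead-lettered messages still counts as degraded.
    return classify(count, degraded_at=1, unhealthy_above=10)


def classify_consumers(count: int, seconds_without_consumers: float,
                       grace_seconds: float = 60.0) -> Health:
    """Zero consumers is degraded within the grace period, unhealthy after it."""
    if count > 0:
        return Health.HEALTHY
    if seconds_without_consumers <= grace_seconds:
        return Health.DEGRADED
    return Health.UNHEALTHY


# Connection states from the table, keyed by lowercase state name.
CONNECTION_HEALTH = {
    "open": Health.HEALTHY,
    "reconnecting": Health.DEGRADED,
    "failed": Health.UNHEALTHY,
}


def overall(results: Iterable[Health]) -> Health:
    """The component is only as healthy as its worst metric."""
    return max(results, default=Health.HEALTHY)
```

For example, `overall([classify_queue_depth(42), classify_dlq(3)])` evaluates to `Health.DEGRADED`, because the dead letter queue is not empty even though queue depth is well within limits.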
Producer vs Consumer Health
Producer services should check:
- Connection to broker
- Ability to publish messages
- Queue existence
Consumer services should check:
- Connection to broker
- Consumer registration
- Processing throughput
- Dead letter queue size
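The two lists above could be wired up as separate check sets, so that each service runs only the checks relevant to its role. The sketch below assumes a broker client exposing the methods on `StubBroker`; that class and every method on it are hypothetical placeholders, not the API of any real client library, and the `min_throughput` and `max_dlq` defaults are illustrative values.

```python
from typing import Callable, Dict

Check = Callable[[], bool]


class StubBroker:
    """Hypothetical stand-in for a real broker client; all methods are illustrative."""

    def is_connected(self) -> bool:
        return True

    def can_publish(self) -> bool:
        return True

    def queue_exists(self, name: str) -> bool:
        return name == "orders"

    def consumer_registered(self) -> bool:
        return True

    def messages_per_second(self) -> float:
        return 12.5

    def dlq_size(self) -> int:
        return 0


def producer_checks(broker: StubBroker, queue: str) -> Dict[str, Check]:
    """Checks a publishing service cares about."""
    return {
        "connection": broker.is_connected,
        "publish": broker.can_publish,
        "queue_exists": lambda: broker.queue_exists(queue),
    }


def consumer_checks(broker: StubBroker, min_throughput: float = 1.0,
                    max_dlq: int = 10) -> Dict[str, Check]:
    """Checks a consuming service cares about; thresholds are assumed defaults."""
    return {
        "connection": broker.is_connected,
        "consumer_registered": broker.consumer_registered,
        "throughput": lambda: broker.messages_per_second() >= min_throughput,
        "dead_letter_queue": lambda: broker.dlq_size() <= max_dlq,
    }


def run_checks(checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run each check, treating any exception as a failed check."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results
```

A service that both publishes and consumes could merge the two dictionaries before calling `run_checks`.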
Kafka-Specific Checks
For Kafka, include consumer group lag per partition and check that the consumer group is active. High lag indicates consumers are falling behind, which may mean the service cannot process real-time data effectively.
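Per-partition lag is conventionally the log end offset minus the group's committed offset. The sketch below works on plain dictionaries rather than a live cluster, so how you fetch the offsets is left to your client library. The function names are hypothetical, and two choices are assumptions: a partition with no committed offset is counted as lagging by its full end offset, and a group is considered active when it has at least one member.

```python
from typing import Dict, Optional, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition number)


def partition_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, Optional[int]],
) -> Dict[TopicPartition, int]:
    """Compute lag per partition as end offset minus committed offset."""
    lag: Dict[TopicPartition, int] = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp)
        if committed is None:
            # Assumption: nothing committed yet, so the whole partition is unread.
            lag[tp] = end
        else:
            # Clamp at zero in case the two offsets were read at different moments.
            lag[tp] = max(end - committed, 0)
    return lag


def summarize_group(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, Optional[int]],
    member_count: int,
) -> Dict[str, object]:
    """Build the Kafka-specific details for a health response."""
    lag = partition_lag(end_offsets, committed_offsets)
    return {
        "active": member_count > 0,  # assumed definition of an active group
        "total_lag": sum(lag.values()),
        "max_partition_lag": max(lag.values(), default=0),
        "lag_by_partition": {f"{topic}-{part}": value
                             for (topic, part), value in lag.items()},
    }
```

Reporting lag per partition as well as in total helps distinguish a uniformly slow consumer group from a single stuck partition.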
Use Case
Event-driven architectures using RabbitMQ, Amazon SQS, Apache Kafka, or Redis Streams where message processing delays or failures directly impact business operations.