Deployment with Horizontal Pod Autoscaler
Configure a Kubernetes Deployment with resource requests and limits optimized for Horizontal Pod Autoscaler (HPA) based on CPU utilization.
Detailed Explanation
Autoscaling with HPA
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas based on observed metrics like CPU utilization. For HPA to work correctly, your pods must have resource requests defined.
Deployment Configuration
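apiVersion: apps/v1
kind: Deployment
metadata:
  name: scalable-app
  labels:
    app: "scalable-app"
spec:
  replicas: 2                # initial count; the HPA adjusts it once active
  selector:                  # required in apps/v1; must match the template labels
    matchLabels:
      app: "scalable-app"
  template:
    metadata:
      labels:
        app: "scalable-app"
    spec:
      containers:
        - name: app
          image: my-app:latest
          ports:
            - name: http
              containerPort: 8080
          resources:
            requests:        # HPA measures utilization against these values
              cpu: "200m"
              memory: "256Mi"
            limits:          # CPU limit above the request leaves room to burst
              cpu: "1000m"
              memory: "512Mi"
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5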
HPA Configuration (separate resource)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scalable-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scalable-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Why Resource Requests Matter
HPA calculates CPU utilization as a percentage of the requested CPU, averaged across the pods it targets. If your pod requests 200m CPU and is using 140m, utilization is 70%. If any container in the pod has no CPU request, utilization is undefined for that pod and HPA takes no scaling action on the CPU metric.
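The replica count follows from the same ratio. The formula below is a sketch of the scaling rule as described in the Kubernetes documentation; the worked line assumes a hypothetical load in which 2 pods average 210m CPU against the 200m request (105% utilization) with the 70% target from the HPA above.

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentUtilization}}{\text{targetUtilization}} \right\rceil
\]
\[
\left\lceil 2 \times \frac{105}{70} \right\rceil = \lceil 3 \rceil = 3
\]
```

Under these assumed numbers HPA would add one replica. When the ratio is close to 1 (within a tolerance that defaults to about 10%), HPA leaves the replica count unchanged.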
Best Practices for HPA
- Set requests realistically: Overly generous requests mean HPA scales too late; too low means it scales too aggressively
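- CPU limits > requests: Let pods burst above their requests to absorb short spikes while HPA adds replicas; usage above the request still counts toward utilization, so a sustained burst will trigger a scale-up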
- Readiness probes: New pods should only receive traffic when ready; otherwise, HPA may scale up more than needed
- Cooldown periods: Default scale-down stabilization is 5 minutes to prevent flapping
- Min replicas >= 2: Always keep at least 2 replicas for high availability
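The cooldown behavior in the list above can be made explicit with the optional `behavior` field of the `autoscaling/v2` API. The manifest below is a minimal sketch that repeats the HPA from this page and adds that field; the 300-second window matches the default, while the policy values (50 percent per minute down, 4 pods per minute up) are illustrative assumptions and not recommendations from the original configuration.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scalable-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scalable-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      # Default value written out: use the highest recommendation from the
      # last 5 minutes before removing pods, which prevents flapping.
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50          # assumed: remove at most 50% of pods per minute
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0   # react to load increases immediately
      policies:
        - type: Pods
          value: 4           # assumed: add at most 4 pods per minute
          periodSeconds: 60
```

Omitting `behavior` entirely keeps the defaults, so this block is only needed when the default scale-down or scale-up pace does not suit the workload.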
Use Case
Building auto-scaling applications on Kubernetes that handle variable traffic loads, from web APIs to microservices that need to scale with demand while maintaining cost efficiency.