GPU Workload Pod with Node Selector and Tolerations
Schedule a GPU-intensive workload (ML inference, video processing) on GPU-equipped Kubernetes nodes using node selectors, tolerations, and nvidia.com/gpu resource requests.
Detailed Explanation
GPU Workloads on Kubernetes
GPU workloads require specific scheduling to land on nodes with GPU hardware. Kubernetes uses node labels, tolerations, and extended resources to manage GPU scheduling.
Key Configuration
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference
  labels:
    app: ml-inference
    workload: gpu
spec:
  replicas: 1
  selector:            # required for apps/v1 Deployments
    matchLabels:
      app: ml-inference
  template:
    metadata:
      labels:          # must match the selector above
        app: ml-inference
        workload: gpu
    spec:
      containers:
        - name: inference
          image: my-ml-model:latest
          ports:
            - name: grpc
              containerPort: 8500
          resources:
            requests:
              cpu: "500m"
              memory: "2Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
      nodeSelector:
        accelerator: nvidia-tesla-v100
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```
GPU Resource Requests
To actually request GPU hardware, add to the container resources:
```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```
This requires the NVIDIA device plugin to be installed in your cluster; without it, nodes do not advertise the nvidia.com/gpu extended resource and the pod stays Pending. GPU resources are integer-only and cannot be oversubscribed — if you request 1 GPU, you get exclusive access to one physical GPU. Note that GPUs may only be specified in limits: if you omit requests, Kubernetes uses the limit as the request, and if you do specify a request it must equal the limit.
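Putting the two pieces together, the container's resources stanza in the Deployment above would look like this — a sketch assuming the NVIDIA device plugin is running and the GPU nodes expose nvidia.com/gpu:

```yaml
resources:
  requests:
    cpu: "500m"
    memory: "2Gi"
  limits:
    cpu: "2000m"
    memory: "4Gi"
    nvidia.com/gpu: 1   # GPU goes in limits only; the request is implied
```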
Node Selector vs Node Affinity
| Feature | nodeSelector | nodeAffinity |
|---|---|---|
| Syntax | Simple key-value | Expressive operators |
| Soft preferences | No | Yes (preferredDuringScheduling) |
| Multiple conditions | AND only | AND and OR |
| Use case | Simple GPU scheduling | Complex multi-zone scheduling |
For simple GPU scheduling, nodeSelector is sufficient. Use nodeAffinity when you need soft preferences or complex matching logic.
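For comparison, here is a sketch of what the nodeAffinity equivalent could look like. The second accelerator value and the zone preference are illustrative assumptions, not part of the original manifest — they show the OR matching and soft preferences that nodeSelector cannot express:

```yaml
affinity:
  nodeAffinity:
    # Hard requirement: node must carry one of these accelerator labels (OR).
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: accelerator
              operator: In
              values:
                - nvidia-tesla-v100
                - nvidia-tesla-a100   # hypothetical second GPU type
    # Soft preference: favor a zone if possible, but schedule elsewhere if not.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a          # illustrative zone
```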
Tolerations
GPU nodes are typically tainted so that non-GPU workloads are not scheduled onto expensive GPU hardware they cannot use. The toleration allows your GPU workload to be placed on these tainted nodes; without it, the scheduler will not consider GPU nodes even if they have available resources.
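For reference, the node-side taint that the toleration above matches might look like the excerpt below. The value `present` is an illustrative assumption — in practice the taint is often applied by the NVIDIA GPU Operator, a cloud provider's node pool settings, or manually with `kubectl taint nodes <node> nvidia.com/gpu=present:NoSchedule`:

```yaml
# Excerpt of a Node object with a GPU taint applied
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1          # hypothetical node name
  labels:
    accelerator: nvidia-tesla-v100   # matched by the nodeSelector
spec:
  taints:
    - key: nvidia.com/gpu
      value: present        # any value works with operator: Exists
      effect: NoSchedule    # repels pods without a matching toleration
```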
Use Case
Running machine learning inference, model training, video transcoding, or other GPU-accelerated workloads on dedicated GPU nodes in a Kubernetes cluster.