GPU Workload Pod with Node Selector and Tolerations

Schedule a GPU-intensive workload (ML inference, video processing) on GPU-equipped Kubernetes nodes using node selectors, tolerations, and nvidia.com/gpu resource requests.

Scheduling

Detailed Explanation

GPU Workloads on Kubernetes

GPU workloads require specific scheduling to land on nodes with GPU hardware. Kubernetes uses node labels, taints and tolerations, and extended resources to manage GPU scheduling.

Key Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference
  labels:
    app: "ml-inference"
    workload: "gpu"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "ml-inference"
  template:
    metadata:
      labels:
        app: "ml-inference"
        workload: "gpu"
    spec:
      containers:
        - name: inference
          image: my-ml-model:latest
          ports:
            - name: grpc
              containerPort: 8500
          resources:
            requests:
              cpu: "500m"
              memory: "2Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
      nodeSelector:
        accelerator: "nvidia-tesla-v100"
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
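The nodeSelector above only matches nodes that carry the accelerator label. A minimal sketch of a matching Node object (the node name is hypothetical; in practice the label is usually applied by the cloud provider or with kubectl label):

```yaml
# Node carrying the label that the Deployment's nodeSelector matches.
# Name is illustrative; set the label with:
#   kubectl label nodes gpu-node-1 accelerator=nvidia-tesla-v100
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
  labels:
    accelerator: nvidia-tesla-v100
```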

GPU Resource Requests

To actually request GPU hardware, add to the container resources:

resources:
  limits:
    nvidia.com/gpu: 1

This requires the NVIDIA device plugin to be installed in your cluster. GPUs are specified only under limits; Kubernetes implicitly sets the request to the same value. GPU resources are integer-only and cannot be oversubscribed: requesting 1 GPU grants exclusive access to one physical GPU (unless a sharing mechanism such as time-slicing or MIG is configured).
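Putting this together with the CPU and memory settings from the Deployment above, the container's full resources block would look like this (values carried over from the example; tune them for your model):

```yaml
# Combined resources for the inference container: CPU/memory plus one GPU.
# nvidia.com/gpu appears only under limits; the request is implied.
resources:
  requests:
    cpu: "500m"
    memory: "2Gi"
  limits:
    cpu: "2000m"
    memory: "4Gi"
    nvidia.com/gpu: 1
```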

Node Selector vs Node Affinity

Feature               nodeSelector            nodeAffinity
Syntax                Simple key-value        Expressive operators
Soft preferences      No                      Yes (preferredDuringScheduling)
Multiple conditions   AND only                AND and OR
Use case              Simple GPU scheduling   Complex multi-zone scheduling

For simple GPU scheduling, nodeSelector is sufficient. Use nodeAffinity when you need soft preferences or complex matching logic.
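As a sketch, the nodeSelector from the Deployment above could be expressed with nodeAffinity instead, which also allows matching several GPU types at once (the second accelerator value is illustrative):

```yaml
# nodeAffinity equivalent of the nodeSelector, extended to accept
# more than one GPU type via the In operator.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: accelerator
              operator: In
              values:
                - nvidia-tesla-v100
                - nvidia-tesla-a100
```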

Tolerations

GPU nodes are typically tainted so that ordinary workloads are not scheduled onto expensive GPU capacity. The toleration allows your GPU workload to be placed on these tainted nodes; without it, the scheduler will not consider GPU nodes even when they have available resources.
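The taint that the Deployment's toleration matches might look like this on the Node object (often applied automatically by the cloud provider or GPU operator; it can also be set with kubectl taint):

```yaml
# Taint on the GPU node; pairs with the toleration in the Deployment.
# Equivalent CLI (node name hypothetical):
#   kubectl taint nodes gpu-node-1 nvidia.com/gpu=present:NoSchedule
spec:
  taints:
    - key: nvidia.com/gpu
      value: "present"
      effect: NoSchedule
```

Because the toleration uses operator "Exists", it matches this taint regardless of the value field.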

Use Case

Running machine learning inference, model training, video transcoding, or other GPU-accelerated workloads on dedicated GPU nodes in a Kubernetes cluster.
