GPUs

prokube uses the NVIDIA GPU Operator to manage GPUs in the cluster. The operator is responsible for the installation of the necessary drivers and the configuration of the GPUs.

There are two ways of sharing a GPU in the cluster: timeslicing and MIG (Multi-Instance GPU). Both allow several pods to use one physical GPU. Timeslicing each GPU into 4 equal parts is the default configuration. MIG provides stronger isolation, because each instance gets dedicated memory and compute, but it is only supported by certain data-center GPUs (usually A30 and upwards, such as A100 and H100).

With timeslicing, multiple workloads share the same physical GPU through time-multiplexed access. Throughout this documentation, each slice is called a logical GPU, because a workload requests it as nvidia.com/gpu: 1 in the pod's resource requests/limits. Each logical GPU can be assigned to a different workload. For example, if a GPU is timesliced into 4 parts, 4 pods can use the same physical GPU at the same time by requesting one logical GPU each. A pod can also request multiple logical GPUs if needed, but this does not guarantee it a proportionally larger share of the GPU's compute.
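The number of logical GPUs a node advertises is a simple product: a node with 2 physical GPUs and 4x timeslicing offers 8 nvidia.com/gpu. The sketch below encodes that arithmetic; the helper name is hypothetical, and the usage comment assumes the node carries the nvidia.com/gpu.count and nvidia.com/gpu.replicas labels set by the GPU Operator's feature discovery (verify the labels on your cluster before relying on them).

```bash
# Hypothetical helper: logical GPUs (nvidia.com/gpu) advertised by a node
# under timeslicing = physical GPUs x replicas per GPU.
logical_gpus() {
  local physical="$1" replicas="$2"
  echo $(( physical * replicas ))
}

# Usage against a live node (assumes the gpu.count / gpu.replicas labels exist):
#   node=example-node-name
#   count=$(kubectl get node "$node" -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.count}')
#   replicas=$(kubectl get node "$node" -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.replicas}')
#   logical_gpus "$count" "$replicas"   # compare with .status.allocatable
```

If the computed value differs from the node's allocatable nvidia.com/gpu, the timeslicing configuration has probably not been applied to that node yet.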

The processing power of a timesliced GPU is shared between all active workloads on it, so a workload may get more than 25% of the GPU's compute when fewer than 4 workloads are active on a 4x timesliced GPU.

The memory of a timesliced GPU is not partitioned: any workload can attempt to use the full memory of the GPU. If the combined memory usage of all workloads exceeds the physical memory of the GPU, allocations fail with CUDA out-of-memory errors, which usually crash the affected workloads (applications that rely on unified memory may instead slow down drastically).
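To see how close a shared GPU is to its memory limit, nvidia-smi can report per-GPU memory in CSV form. The helper below is a sketch that formats that output and flags GPUs above a threshold; the function name and the 90% default are assumptions for illustration, not part of prokube or the GPU Operator.

```bash
# Hypothetical helper: flag GPUs whose memory usage is at or above a threshold (%).
# Expects lines as printed by:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
# e.g. "0, 30000, 40960" (values in MiB).
gpu_memory_report() {
  local threshold="${1:-90}"
  awk -F', *' -v t="$threshold" '
    NF >= 3 {
      pct = ($3 > 0) ? 100 * $2 / $3 : 0
      status = (pct >= t) ? "HIGH" : "ok"
      printf "GPU %s: %d/%d MiB (%.0f%%) %s\n", $1, $2, $3, pct, status
    }'
}

# Usage (runs nvidia-smi in one driver pod; pick the pod on the node you care about):
#   kubectl exec -n gpu-operator-resources ds/nvidia-driver-daemonset -- \
#     nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits \
#     | gpu_memory_report 90
```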

How to set up timeslicing

By default, prokube sets up 4x timeslicing for all GPUs in the cluster, so no additional configuration is needed. To change this, modify the ClusterPolicy custom resource clusterpolicies.nvidia.com/cluster-policy (the GPU Operator itself runs in the gpu-operator-resources namespace).
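In the GPU Operator, timeslicing is configured through a ConfigMap that the ClusterPolicy references under spec.devicePlugin.config. The sketch below generates such a ConfigMap following the format in the NVIDIA GPU Operator documentation. The ConfigMap name time-slicing-config and the key any are hypothetical placeholders; if your cluster policy already references a ConfigMap there, edit that one (via GitOps) instead of creating a second.

```bash
# Hypothetical generator for a GPU Operator timeslicing ConfigMap.
# $1 = replicas per physical GPU (default 4), $2 = namespace.
timeslicing_configmap() {
  local replicas="${1:-4}" namespace="${2:-gpu-operator-resources}"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: ${namespace}
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: ${replicas}
EOF
}

# Usage (or commit the generated YAML to the paas repository instead):
#   timeslicing_configmap 2 | kubectl apply -f -
#   kubectl patch clusterpolicies.nvidia.com/cluster-policy --type merge \
#     -p '{"spec":{"devicePlugin":{"config":{"name":"time-slicing-config","default":"any"}}}}'
```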

How to enable MIG

With MIG, a GPU is partitioned into multiple instances at the hardware level. Each instance has its own dedicated resources (compute cores, memory, bandwidth) and can be assigned to a different workload. This allows for better isolation and performance compared to timeslicing.

Since not all devices support Multi-Instance GPU (MIG), the default setting in prokube is timeslicing. To change this, modify the ClusterPolicy custom resource cluster-policy. You can edit it directly with tools like kubectl, k9s, or Lens, but the preferred method is GitOps: commit the corresponding changes to the paas repository.

You'll find the necessary customizations in the paas repo under this path: paas/gpus/base/cluster-policy.yaml. Here is an example configuration you might want to use:

apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  mig:
    strategy: single
  migManager:
    config:
      default: all-1g.5gb
      name: default-mig-parted-config

The default-mig-parted-config is a ConfigMap that the GPU operator deploys automatically. It holds named configurations such as all-1g.5gb or all-balanced that describe how to partition the GPUs. The profile names encode the compute slices and memory per instance: an A30 (24 GB), for example, can be split into 1g.6gb or 2g.12gb instances, while the 1g.5gb profile used by all-1g.5gb belongs to the A100 40GB.
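To check which configuration names a cluster actually offers, you can print the keys under mig-configs. A shell sketch; the helper name is hypothetical, and the usage comment assumes the ConfigMap stores its content under the key config.yaml (check with kubectl describe configmap if in doubt).

```bash
# Hypothetical helper: print the configuration names defined under "mig-configs:"
# in a MIG parted config read from stdin.
list_mig_profiles() {
  awk '
    /^mig-configs:/  { inside = 1; next }   # start of the section
    /^[^[:space:]#]/ { inside = 0 }         # any other top-level key ends it
    inside && /^  [^[:space:]#-][^:]*:[[:space:]]*$/ {
      name = $0
      sub(/^[[:space:]]+/, "", name)        # drop indentation
      sub(/:[[:space:]]*$/, "", name)       # drop trailing colon
      print name
    }'
}

# Usage:
#   kubectl get configmap default-mig-parted-config -n gpu-operator-resources \
#     -o jsonpath='{.data.config\.yaml}' | list_mig_profiles
```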

For heterogeneous GPU nodes, use mig.strategy: mixed and configure MIG only for the devices that support it. Non-MIG-capable GPUs remain available as full nvidia.com/gpu resources. If you define custom MIG profiles in your own config map, keep the default set to a value accepted by the GPU Operator, such as all-disabled, and activate the custom profile through the node label described below. Some GPU Operator versions validate the default field against built-in profiles and reject custom profile names.

Example custom MIG config map:

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-mig-config
  namespace: gpu-operator-resources
data:
  config.yaml: |-
    version: v1
    mig-configs:
      all-disabled:
      - devices: all
        mig-enabled: false

      a100-80gb-balanced:
      - device-filter: ["0x20B510DE"]
        devices: all
        mig-enabled: true
        mig-devices:
          "1g.10gb": 2
          "2g.20gb": 1
          "3g.40gb": 1

The device-filter value is derived from the PCI vendor and device IDs. For example, lspci -nn | grep -i nvidia may show [10de:20b5] (vendor 10de, device 20b5); the NVIDIA MIG config format expects the device ID followed by the vendor ID, in uppercase and prefixed with 0x: 0x20B510DE. If all GPUs on the node are the same model, you can omit device-filter and select devices by index instead (e.g. devices: [0, 1, 2]).
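The conversion is mechanical: swap the two IDs, uppercase them, and prefix 0x. A small bash sketch (the helper name is hypothetical; ${var^^} requires bash 4 or newer):

```bash
# Hypothetical helper: convert an lspci "[vendor:device]" ID such as 10de:20b5
# (brackets optional) into the device-filter format 0x<DEVICE><VENDOR>.
pci_to_device_filter() {
  local id="${1#\[}"            # strip optional leading "["
  id="${id%\]}"                 # strip optional trailing "]"
  local vendor="${id%%:*}"      # part before the colon
  local device="${id##*:}"      # part after the colon
  printf '0x%s%s\n' "${device^^}" "${vendor^^}"
}

# Usage:
#   lspci -nn | grep -i nvidia     # read the [10de:xxxx] pair from the output
#   pci_to_device_filter 10de:20b5
```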

Reference the custom config map from the ClusterPolicy:

apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  mig:
    strategy: mixed
  migManager:
    config:
      name: custom-mig-config
      default: all-disabled

To ensure a clean configuration, it is advisable to reinstall the operator. On MicroK8s deployments:

microk8s disable gpu
microk8s enable gpu

On other Kubernetes distributions, uninstall and reinstall the GPU operator using Helm or your preferred method.

Then proceed with the GitOps synchronization.

More details on this can be found in the NVIDIA GPU Operator documentation on MIG support.

Changing MIG partitions

MIG Manager applies the MIG configuration named in the nvidia.com/mig.config label of each GPU node. The operator initially sets this label to the default value from the cluster policy. To change the partitioning of a node, overwrite the label:

kubectl label nodes example-node-name nvidia.com/mig.config=all-2g.12gb --overwrite

The operator will pick up the label change and reconfigure the GPUs accordingly.

For custom profiles, use the custom profile name from the MIG config map:

kubectl label nodes example-node-name nvidia.com/mig.config=a100-80gb-balanced --overwrite

Check whether the configuration was applied successfully:

kubectl get nodes -o custom-columns=NAME:.metadata.name,MIG:.metadata.labels.nvidia\.com/mig\.config,STATE:.metadata.labels.nvidia\.com/mig\.config\.state

The expected state is success. If the state becomes failed, inspect the MIG Manager logs:

kubectl get pods -n gpu-operator-resources | grep -i mig
kubectl logs -n gpu-operator-resources <mig-manager-pod-name> --tail=200
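On clusters with many GPU nodes, the table printed by the custom-columns command can be filtered down to the nodes that still need attention. A sketch, assuming the three-column NAME/MIG/STATE layout of that command; nodes without a mig.config label (typically CPU-only nodes) are skipped.

```bash
# Hypothetical helper: read the NAME/MIG/STATE table on stdin and print
# "<node>\t<state>" for every MIG-labelled node whose state is not "success".
mig_nodes_not_ready() {
  awk 'NR > 1 && $2 != "<none>" && $3 != "success" { print $1 "\t" $3 }'
}

# Usage:
#   kubectl get nodes -o custom-columns=NAME:.metadata.name,MIG:.metadata.labels.nvidia\.com/mig\.config,STATE:.metadata.labels.nvidia\.com/mig\.config\.state \
#     | mig_nodes_not_ready
```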

Changing MIG mode usually requires a GPU reset. The reset can fail if GPU devices are still in use by workloads or host processes. Before changing MIG partitions, stop GPU workloads and verify that no pods request NVIDIA resources:

kubectl get pods -A -o json | jq -r '
  .items[] as $pod |
  $pod.spec.containers[]? as $container |
  ($container.resources.limits // {}) |
  to_entries[] |
  select(.key | startswith("nvidia.com/")) |
  "\($pod.metadata.namespace)\t\($pod.metadata.name)\t\(.key)\t\(.value)"
'

If the MIG Manager still reports that a GPU is in use, inspect host processes on the affected node:

sudo fuser -v /dev/nvidia*

On single-node clusters or after driver installation, rebooting the node can be the simplest way to release all GPU clients and allow the GPU reset to complete.

Inspecting MIG configuration

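To view the current MIG partitioning on a node, run nvidia-smi -L inside the driver daemonset container. kubectl exec on a daemonset runs in only one of its pods, so on clusters with several GPU nodes, replace ds/nvidia-driver-daemonset with the name of the driver pod on the node you want to inspect: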

kubectl exec -it -n gpu-operator-resources ds/nvidia-driver-daemonset -- nvidia-smi -L
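To turn the nvidia-smi -L listing into a count of instances per profile, a short awk filter is enough. This sketch assumes MIG devices appear on lines that start with MIG followed by the profile name (for example "  MIG 1g.10gb     Device  0: (UUID: MIG-...)"); the helper name is hypothetical.

```bash
# Hypothetical helper: summarize `nvidia-smi -L` output read from stdin
# as "<profile> <count>" lines, sorted by profile name.
count_mig_devices() {
  awk '$1 == "MIG" { count[$2]++ } END { for (p in count) print p, count[p] }' | sort
}

# Usage:
#   kubectl exec -n gpu-operator-resources ds/nvidia-driver-daemonset -- nvidia-smi -L | count_mig_devices
```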

To list all MIG profiles supported by the GPUs on a node:

kubectl exec -it -n gpu-operator-resources ds/nvidia-driver-daemonset -- nvidia-smi mig -lgip

Not every combination of the listed profiles is valid on one GPU. The NVIDIA MIG user guide lists the supported profiles and combinations per GPU model.

Using GPUs for your own workloads

To use GPUs in prokube components, please refer to the respective sections of this documentation. To use GPUs in your own workloads, request them in the pod specification:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
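spec:
  restartPolicy: Never
  containers:
  - name: gpu-container
    image: nvidia/cuda:12.2.0-base-ubuntu22.04 # old tags such as 11.0-base were removed; use a current tag
    command: ["nvidia-smi"] # print the GPU visible to the container, then exit
    resources:
      limits:
        nvidia.com/gpu: 1 # <-- Request 1 GPU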

This requests one GPU for the container (one logical GPU with the default timeslicing). Kubernetes schedules the pod onto a node with a free GPU, and the NVIDIA Container Toolkit installed by the GPU Operator mounts the driver libraries from the host into the container. You do not need to install GPU drivers in the container image; it only needs the CUDA runtime your application uses, for example from an nvidia/cuda base image.

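For MIG partitions with the mixed strategy, request the MIG resource name instead of nvidia.com/gpu. With the single strategy, MIG instances are advertised as nvidia.com/gpu and are requested like full GPUs: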

apiVersion: v1
kind: Pod
metadata:
  name: mig-pod
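spec:
  restartPolicy: Never
  containers:
  - name: gpu-container
    image: nvidia/cuda:12.2.0-base-ubuntu22.04 # old tags such as 11.0-base were removed; use a current tag
    command: ["nvidia-smi", "-L"] # list the MIG device visible to the container, then exit
    resources:
      limits:
        nvidia.com/mig-1g.10gb: 1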

The exact resource names depend on the MIG profiles configured on your GPUs, for example nvidia.com/mig-1g.10gb, nvidia.com/mig-2g.20gb, or nvidia.com/mig-3g.40gb.
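Which resource name a pod has to request depends on both the profile and the mig.strategy in the cluster policy: with mixed, each profile is its own resource; with single, MIG instances are advertised as nvidia.com/gpu. A small sketch that encodes this rule (the helper name is hypothetical):

```bash
# Hypothetical helper: resource name to request for a MIG profile,
# given the ClusterPolicy mig.strategy ("single" or "mixed").
mig_resource_name() {
  local strategy="$1" profile="$2"
  case "$strategy" in
    single) echo "nvidia.com/gpu" ;;
    mixed)  echo "nvidia.com/mig-${profile}" ;;
    *)      echo "unknown MIG strategy: $strategy" >&2; return 1 ;;
  esac
}

# Usage:
#   mig_resource_name mixed 2g.20gb    # nvidia.com/mig-2g.20gb
```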

MIG and multi-GPU workloads

NCCL is not supported with MIG (NVIDIA MIG deployment considerations). This means tensor parallelism, which vLLM and PyTorch implement on top of NCCL, cannot span multiple MIG instances, even on the same physical GPU. As a result, a single model inference process is limited to one MIG slice.

Data parallelism (running independent model replicas, each on its own MIG slice) works with any combination of slices, since each replica uses only its assigned instance and no communication between instances is needed.
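A sketch of such a data-parallel setup: a Deployment whose replicas each request one MIG slice. It assumes the mixed strategy (so the resource is nvidia.com/mig-&lt;profile&gt;); the helper name, and the name, image and profile arguments, are placeholders, and a Service in front of the replicas would handle load balancing.

```bash
# Hypothetical generator: Deployment with <replicas> independent replicas,
# each limited to one MIG instance of the given profile (mixed strategy).
mig_replica_deployment() {
  local name="$1" image="$2" profile="$3" replicas="$4"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
      - name: ${name}
        image: ${image}
        resources:
          limits:
            nvidia.com/mig-${profile}: 1
EOF
}

# Usage:
#   mig_replica_deployment my-model my-registry/my-server:latest 1g.10gb 3 | kubectl apply -f -
```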

Monitoring of GPU Usage

Checking how many GPUs are available on the cluster and how many are in use

You can use the following script to get the total number of GPUs available in the cluster and how many are currently in use:

# Get total GPUs available in the cluster (print 0 instead of an empty string if none)
total_gpus=$(kubectl get nodes -o=jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' | awk '{s+=$1} END {print s+0}')

# Get total GPUs currently in use (pods that have not terminated; completed pods no longer hold GPUs)
used_gpus=$(kubectl get pods -A --field-selector=status.phase!=Succeeded,status.phase!=Failed -o=jsonpath='{range .items[*]}{range .spec.containers[*]}{.resources.requests.nvidia\.com/gpu}{"\n"}{end}{end}' | awk '{s+=$1} END {print s+0}')

echo "Total GPUs: $total_gpus"
echo "GPUs in use: $used_gpus"

Note that this script only counts GPUs requested in the spec.containers[*].resources.requests["nvidia.com/gpu"] field. GPUs requested by init containers or as MIG resources (such as nvidia.com/mig-1g.10gb) are not included; adjust the jsonpath expressions if your pods request GPUs differently.
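From the two totals you can derive how many logical GPUs are still free. A sketch with a hypothetical helper name; it treats empty arguments as 0, which covers clusters where no node or pod reports the resource.

```bash
# Hypothetical helper: summarize total, used and free GPUs.
# $1 = total allocatable GPUs, $2 = GPUs requested by pods.
gpu_capacity_summary() {
  local total="${1:-0}" used="${2:-0}"
  local free=$(( total - used ))
  local pct=0
  if (( total > 0 )); then
    pct=$(( 100 * used / total ))   # integer percentage, rounded down
  fi
  echo "Total: ${total}, used: ${used}, free: ${free} (${pct}% allocated)"
}

# Usage, after running the script above:
#   gpu_capacity_summary "$total_gpus" "$used_gpus"
```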

For clusters using the mixed MIG strategy, nvidia.com/gpu does not include MIG partitions; they are advertised as separate nvidia.com/mig-* resources. Check all NVIDIA allocatable resources on a node with:

kubectl describe node <node-name> | grep -E "nvidia.com/(gpu|mig)"

To list GPU and MIG resource usage by pods, use:

kubectl get pods -A -o json | jq -r '
  .items[] as $pod |
  $pod.spec.containers[]? as $container |
  ($container.resources.limits // {}) |
  to_entries[] |
  select(.key | startswith("nvidia.com/")) |
  "\($pod.metadata.namespace)\t\($pod.metadata.name)\t\(.key)\t\(.value)"
'
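To get totals per resource name instead of one line per pod, the tab-separated output of the jq command can be summed with awk. A sketch with a hypothetical helper name; note that the jq listing includes pods in every phase, so completed pods are counted too unless you filter them (for example with --field-selector on kubectl get pods).

```bash
# Hypothetical helper: read "namespace\tpod\tresource\tcount" lines on stdin
# and print "resource\ttotal" per NVIDIA resource name, sorted.
sum_nvidia_resources() {
  awk -F'\t' 'NF == 4 { total[$3] += $4 }
              END { for (r in total) print r "\t" total[r] }' | sort
}

# Usage: append "| sum_nvidia_resources" to the jq command above.
```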

Checking which pods are using GPUs and how many

The following command prints a table of the pods that set a GPU limit, with their namespace, status, and number of GPUs:

kubectl get pods -A -o=custom-columns='NAMESPACE:metadata.namespace,NAME:metadata.name,STATUS:status.phase,GPUs:spec.containers[*].resources.limits.nvidia\.com/gpu' | grep -v '<none>'

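Note that the GPUs column shows what pods requested, not what they actually use. Under timeslicing, a pod can get more than its share of compute, and containers based on NVIDIA images that do not request a GPU may still see all GPUs on the node, so the GPUs column may not reflect actual usage.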