GPUs
prokube uses the NVIDIA GPU Operator to manage GPUs in the cluster. The operator is responsible for the installation of the necessary drivers and the configuration of the GPUs.
There are two ways of sharing a GPU in the cluster: timeslicing and MIG (Multi Instance GPU). Both allow a single physical GPU to be divided among multiple pods. Timeslicing each GPU into 4 equal parts is the default configuration. MIG offers stronger isolation because it partitions the GPU at the hardware level, but it is only supported by certain GPUs (usually A30 and upwards).
GPU Sharing with Timeslicing
With timeslicing, multiple workloads can share the same physical GPU concurrently on the host. In the rest of this documentation, each slice is called a logical GPU, because it can be assigned to workloads as nvidia.com/gpu: 1 in Pod requests/limits. Each logical GPU can be assigned to a different workload. For example, if a GPU is timesliced into 4 parts, 4 pods can use the same physical GPU simultaneously by requesting one logical GPU each. A pod can also request multiple logical GPUs if needed.
The processing power of a timesliced GPU is shared between all active workloads on the GPU, meaning that a workload may get more than 25% of the GPU power when there are fewer than 4 active workloads on a 4x timesliced GPU.
The memory of a timesliced GPU is also shared between all active workloads and is not partitioned, meaning that any workload can attempt to use the full memory of the GPU. If the total memory usage of all workloads exceeds the physical memory of the GPU, allocations can fail with out-of-memory errors or performance can degrade drastically.
How to set up timeslicing
By default, prokube sets up 4x timeslicing for all GPUs in the cluster. No additional configuration is needed. To change this, modify the ClusterPolicy custom resource (CR) clusterpolicies.nvidia.com/cluster-policy. ClusterPolicy is a cluster-scoped resource; the operator components it manages run in the gpu-operator-resources namespace.
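As a rough sketch of what changing the replica count could involve, the helper below prints a time-slicing ConfigMap in the format used by the NVIDIA device plugin. The function name, the ConfigMap name time-slicing-config, and the data key any are assumptions chosen for illustration; prokube may already ship its own config map under a different name, so check the existing ClusterPolicy before applying anything.

```shell
#!/usr/bin/env bash
# Print a time-slicing ConfigMap manifest for the given replica count.
# The ConfigMap name and the data key "any" are hypothetical defaults,
# not names taken from the prokube repository.
render_timeslicing_config() {
  local replicas="$1"
  local name="${2:-time-slicing-config}"
  local namespace="${3:-gpu-operator-resources}"

  # Accept only positive integers.
  case "$replicas" in
    ''|*[!0-9]*) echo "replicas must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$replicas" -lt 1 ]; then
    echo "replicas must be a positive integer" >&2
    return 1
  fi

  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: ${name}
  namespace: ${namespace}
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: ${replicas}
EOF
}

# Example: render a manifest for 8x timeslicing (prints YAML, applies nothing).
render_timeslicing_config 8
```

The rendered manifest could then be committed to the GitOps repository or piped to kubectl apply -f -, and referenced from the ClusterPolicy under spec.devicePlugin.config (name set to the ConfigMap name, default set to the data key).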
GPU Sharing with MIG (Multi Instance GPU)
With MIG, a GPU is partitioned into multiple instances at the hardware level. Each instance has its own dedicated resources (compute cores, memory, bandwidth) and can be assigned to a different workload. This allows for better isolation and performance compared to timeslicing.
How to enable MIG
Since not all devices support Multi Instance GPU (MIG), the default setting in prokube is timeslicing.
To change this, you can modify the ClusterPolicy custom resource (CR) named cluster-policy.
One option is to edit that CR using tools like kubectl, k9s, or Lens.
However, the preferred method is to use GitOps and commit the corresponding changes in the paas repository.
You'll find the necessary customizations in the paas repo under this path:
paas/gpus/base/cluster-policy.yaml.
Here is an example configuration you might want to use:
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  mig:
    strategy: single
  migManager:
    config:
      default: all-1g.5gb
      name: default-mig-parted-config
default-mig-parted-config is a ConfigMap that is automatically deployed by the GPU operator.
It holds configurations such as all-1g.5gb or all-balanced, describing how to partition the GPUs.
For example, an A30 GPU (24 GB) can be split into units with 6 GB (1g.6gb) or 12 GB (2g.12gb) of GPU memory each.
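To see which profile names a MIG config map offers, a small filter along these lines can help. It is a sketch that assumes the layout shown in this documentation (profile names indented by two spaces under mig-configs); the function name list_mig_profiles is hypothetical.

```shell
#!/usr/bin/env bash
# Print the profile names defined under "mig-configs:" in a MIG config read from stdin.
# Assumes profile names are indented by exactly two spaces.
list_mig_profiles() {
  awk '
    /^mig-configs:/ { in_section = 1; next }      # start of the profile section
    in_section && /^[^ #]/ { in_section = 0 }     # any other top-level key ends it
    in_section && /^  [^ #-][^:]*:/ {             # a key at two-space indentation
      name = $1
      sub(/:$/, "", name)
      print name
    }
  '
}

# Example with an inline config; prints all-disabled and all-1g.5gb.
list_mig_profiles <<'EOF'
version: v1
mig-configs:
  all-disabled:
    - devices: all
      mig-enabled: false
  all-1g.5gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.5gb": 7
EOF
```

Against a live cluster, the input could come from kubectl get configmap default-mig-parted-config -n gpu-operator-resources -o jsonpath='{.data.config\.yaml}', assuming the config is stored under the config.yaml key.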
For heterogeneous GPU nodes, use mig.strategy: mixed and configure MIG only
for the devices that support it. Non-MIG-capable GPUs remain available as full
nvidia.com/gpu resources. If you define custom MIG profiles in your own
config map, keep the default set to a value accepted by the GPU Operator, such
as all-disabled, and activate the custom profile through the node label
described below. Some GPU Operator versions validate the default field against
built-in profiles and reject custom profile names.
Example custom MIG config map:
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-mig-config
  namespace: gpu-operator-resources
data:
  config.yaml: |-
    version: v1
    mig-configs:
      all-disabled:
        - devices: all
          mig-enabled: false
      a100-80gb-balanced:
        - device-filter: ["0x20B510DE"]
          devices: all
          mig-enabled: true
          mig-devices:
            "1g.10gb": 2
            "2g.20gb": 1
            "3g.40gb": 1
The device-filter value is derived from the PCI vendor and device IDs. For example,
lspci -nn | grep -i nvidia may show [10de:20b5] (vendor ID 10de, device ID 20b5); the NVIDIA MIG config
format expects the device ID followed by the vendor ID in uppercase hex, here 0x20B510DE. If all GPUs on the node are the same
model, you can omit device-filter and use device indices directly instead (e.g.
devices: [0, 1, 2]).
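The conversion from the lspci notation to a device-filter value can be sketched as a small helper. The function name pci_id_to_device_filter is hypothetical; it only reorders and uppercases the two hex fields and does not check that the GPU supports MIG.

```shell
#!/usr/bin/env bash
# Convert a PCI ID in lspci notation (vendor:device, e.g. "10de:20b5")
# into the device-filter notation (0x<DEVICE><VENDOR>, uppercase hex).
pci_id_to_device_filter() {
  local id="$1"
  local vendor="${id%%:*}"   # part before the colon
  local device="${id##*:}"   # part after the colon

  # Both fields must be exactly four hex digits.
  case "$vendor$device" in
    *[!0-9a-fA-F]*) echo "invalid PCI ID: $id" >&2; return 1 ;;
  esac
  if [ "${#vendor}" -ne 4 ] || [ "${#device}" -ne 4 ]; then
    echo "invalid PCI ID: $id" >&2
    return 1
  fi

  printf '0x%s%s\n' "$device" "$vendor" | tr 'a-f' 'A-F'
}

pci_id_to_device_filter "10de:20b5"   # prints 0x20B510DE
```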
Reference the custom config map from the ClusterPolicy:
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  mig:
    strategy: mixed
  migManager:
    config:
      name: custom-mig-config
      default: all-disabled
To ensure a clean configuration, it is advisable to reinstall the operator. On MicroK8s deployments:
microk8s disable gpu
microk8s enable gpu
On other Kubernetes distributions, uninstall and reinstall the GPU operator using Helm or your preferred method.
Then proceed with the GitOps synchronization.
More details on this can be found in the NVIDIA GPU Operator documentation on MIG support.
Changing MIG partitions
To apply a specific configuration, the nodes that hold GPUs must be labeled. The operator labels the nodes with the value specified by default in the cluster policy. To change the configuration, modify the label:
kubectl label nodes example-node-name nvidia.com/mig.config=all-2g.12gb --overwrite
The operator will pick up the label change and reconfigure the GPUs accordingly.
For custom profiles, use the custom profile name from the MIG config map:
kubectl label nodes example-node-name nvidia.com/mig.config=a100-80gb-balanced --overwrite
Check whether the configuration was applied successfully:
kubectl get nodes -o custom-columns='NAME:.metadata.name,MIG:.metadata.labels.nvidia\.com/mig\.config,STATE:.metadata.labels.nvidia\.com/mig\.config\.state'
The expected state is success. If the state becomes failed, inspect the MIG Manager logs:
kubectl get pods -n gpu-operator-resources | grep -i mig
kubectl logs -n gpu-operator-resources <mig-manager-pod-name> --tail=200
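Because reconfiguration takes a while, it can be convenient to poll the state label instead of re-running the check by hand. The following is a minimal sketch; the function name wait_for_mig_state, the default timeout, and the poll interval are assumptions, and any state other than success or failed is simply treated as "still in progress".

```shell
#!/usr/bin/env bash
# Poll the nvidia.com/mig.config.state label of a node until it reports
# success or failed, or until the timeout expires.
# Arguments: node name, timeout in seconds (default 600), poll interval (default 5).
# Returns 0 on success, 1 on failed, 2 on timeout.
wait_for_mig_state() {
  local node="$1" timeout="${2:-600}" interval="${3:-5}"
  local waited=0 state=""

  while [ "$waited" -le "$timeout" ]; do
    state="$(kubectl get node "$node" \
      -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}')"
    case "$state" in
      success) echo "MIG configuration applied on $node"; return 0 ;;
      failed)  echo "MIG configuration failed on $node" >&2; return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "Timed out after ${timeout}s; last state: ${state:-<none>}" >&2
  return 2
}
```

A typical call after relabeling a node would be wait_for_mig_state example-node-name 900, followed by a look at the MIG Manager logs if the function returns a non-zero status.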
Changing MIG mode usually requires a GPU reset. The reset can fail if GPU devices are still in use by workloads or host processes. Before changing MIG partitions, stop GPU workloads and verify that no pods request NVIDIA resources:
kubectl get pods -A -o json | jq -r '
  .items[] as $pod |
  $pod.spec.containers[]? as $container |
  ($container.resources.limits // {}) |
  to_entries[] |
  select(.key | startswith("nvidia.com/")) |
  "\($pod.metadata.namespace)\t\($pod.metadata.name)\t\(.key)\t\(.value)"
'
If the MIG Manager still reports that a GPU is in use, inspect host processes on the affected node:
sudo fuser -v /dev/nvidia*
On single-node clusters or after driver installation, rebooting the node can be the simplest way to release all GPU clients and allow the GPU reset to complete.
Inspecting MIG configuration
To view the current MIG partitioning on a node, run nvidia-smi -L inside the
driver daemonset container:
kubectl exec -it -n gpu-operator-resources ds/nvidia-driver-daemonset -- nvidia-smi -L
To list all MIG profiles supported by the GPUs on a node:
kubectl exec -it -n gpu-operator-resources ds/nvidia-driver-daemonset -- nvidia-smi mig -lgip
Not all profiles listed here may be valid in combination with others. The NVIDIA MIG user guide lists the supported profiles per GPU model.
Using GPUs for your own workloads
To use GPUs in prokube components, please refer to the respective sections of this documentation. To use GPUs in your own workloads, request them in the pod specification:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: gpu-container
      image: nvidia/cuda:11.0-base
      resources:
        limits:
          nvidia.com/gpu: 1 # <-- Request 1 GPU
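For MIG partitions exposed with mig.strategy: mixed, request the MIG resource name instead of nvidia.com/gpu. With mig.strategy: single, MIG instances are advertised as nvidia.com/gpu, so the pod specification above applies unchanged.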
apiVersion: v1
kind: Pod
metadata:
  name: mig-pod
spec:
  containers:
    - name: gpu-container
      image: nvidia/cuda:11.0-base
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1
The exact resource names depend on the MIG profiles configured on your GPUs, for
example nvidia.com/mig-1g.10gb, nvidia.com/mig-2g.20gb, or
nvidia.com/mig-3g.40gb.
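The naming rule can be summarized in a few lines of shell. This is only an illustration of the convention, with a hypothetical function name; it covers the single and mixed strategies and does not check whether a profile actually exists on a node.

```shell
#!/usr/bin/env bash
# Print the Kubernetes resource name a pod should request for a MIG profile.
# Arguments: MIG strategy (single or mixed), profile name (e.g. 1g.10gb).
mig_resource_name() {
  local strategy="$1" profile="$2"
  case "$strategy" in
    single) echo "nvidia.com/gpu" ;;              # MIG instances appear as plain GPUs
    mixed)  echo "nvidia.com/mig-${profile}" ;;   # one resource name per profile
    *) echo "unknown MIG strategy: $strategy" >&2; return 1 ;;
  esac
}

mig_resource_name mixed 1g.10gb    # prints nvidia.com/mig-1g.10gb
mig_resource_name single 1g.10gb   # prints nvidia.com/gpu
```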
MIG and multi-GPU workloads
NCCL is not supported with MIG (NVIDIA MIG deployment considerations). This means tensor parallelism, which vLLM and PyTorch use via NCCL, cannot span multiple MIG instances, even on the same physical GPU. As a result, a single model inference process is limited to one MIG slice.
Data parallelism (running independent model replicas, each on its own MIG slice) works in any configuration, since each replica uses only its assigned instance and no inter-instance communication is needed.
Monitoring of GPU Usage
Checking how many GPUs are available on the cluster and how many are in use
You can use the following script to get the total number of GPUs available in the cluster and how many are currently in use:
# Get total GPUs available in the cluster
total_gpus=$(kubectl get nodes -o=jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' | awk '{s+=$1} END {print s+0}')
# Get total GPUs currently requested by pods (s+0 prints 0 instead of an empty string when nothing is requested)
used_gpus=$(kubectl get pods -A -o=jsonpath='{range .items[*]}{range .spec.containers[*]}{.resources.requests.nvidia\.com/gpu}{"\n"}{end}{end}' | awk '{s+=$1} END {print s+0}')
echo "Total GPUs: $total_gpus"
echo "GPUs in use: $used_gpus"
Note that this script only counts GPUs requested in the
spec.containers[*].resources.requests.nvidia\.com/gpu field. If the pods are
requesting GPUs in a different field, you will need to modify the script
accordingly.
For clusters with MIG enabled, nvidia.com/gpu does not include MIG partition
resources. Check all NVIDIA allocatable resources on a node with:
kubectl describe node <node-name> | grep -E "nvidia.com/(gpu|mig)"
To list GPU and MIG resource usage by pods, use:
kubectl get pods -A -o json | jq -r '
  .items[] as $pod |
  $pod.spec.containers[]? as $container |
  ($container.resources.limits // {}) |
  to_entries[] |
  select(.key | startswith("nvidia.com/")) |
  "\($pod.metadata.namespace)\t\($pod.metadata.name)\t\(.key)\t\(.value)"
'
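The per-pod listing can be condensed into one total per resource name. The sketch below assumes the tab-separated format produced by the command above (namespace, pod, resource, count); the function name summarize_gpu_requests is hypothetical.

```shell
#!/usr/bin/env bash
# Sum the requested count per NVIDIA resource name.
# Reads tab-separated lines from stdin: namespace, pod, resource, count.
summarize_gpu_requests() {
  awk -F '\t' '
    { total[$3] += $4 }
    END { for (r in total) printf "%s\t%d\n", r, total[r] }
  ' | sort
}

# Example with sample data instead of live cluster output.
printf 'team-a\ttrain-0\tnvidia.com/gpu\t1\nteam-a\ttrain-1\tnvidia.com/gpu\t2\nteam-b\tserve-0\tnvidia.com/mig-1g.10gb\t1\n' \
  | summarize_gpu_requests
# prints:
# nvidia.com/gpu	3
# nvidia.com/mig-1g.10gb	1
```

On a cluster, the jq command above can be piped directly into summarize_gpu_requests.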
Checking which pods are using GPUs and how many
Use the following command to print a table showing which pods, in which namespaces, have requested how many GPUs, along with the status of each pod:
kubectl get pods -A -o=custom-columns='NAMESPACE:metadata.namespace,NAME:metadata.name,STATUS:status.phase,GPUs:spec.containers[*].resources.limits.nvidia\.com/gpu' | grep -v '<none>'
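The GPUs column may not always be accurate: the command only reads the nvidia.com/gpu limits of regular containers, so init containers and MIG resources such as nvidia.com/mig-1g.10gb are not shown.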