GPUs⚓︎
prokube uses the NVIDIA GPU Operator to manage GPUs in the cluster. The operator is responsible for the installation of the necessary drivers and the configuration of the GPUs.
There are two ways of sharing a GPU in the cluster: timeslicing and MIG (Multi-Instance GPU). Both allow a GPU to be partitioned into multiple smaller instances, which can be used by different pods. Timeslicing each GPU into 4 equal parts is the default configuration. MIG offers better isolation and more predictable performance, but it is only supported by certain GPUs (usually A30 and upwards).
GPU Sharing with Timeslicing⚓︎
With timeslicing, multiple workloads can share the same physical GPU concurrently on the host. In the rest of this documentation, we will refer to each slice as a logical GPU, since it can be assigned to workloads as `nvidia.com/gpu: 1` in pod requests/limits. Each logical GPU can be assigned to a different workload. For example, if a GPU is timesliced into 4 parts, 4 pods can use the same physical GPU simultaneously by requesting one logical GPU each. A pod can also request multiple logical GPUs if needed.
The processing power of a timesliced GPU is shared between all active workloads on the GPU, meaning that a workload may get more than 25% of the GPU's compute when fewer than 4 workloads are active on a 4x timesliced GPU.
The memory of a timesliced GPU is also shared between all active workloads: any workload can attempt to use the full memory of the GPU. However, if the total memory usage of all workloads exceeds the physical memory of the GPU, performance will degrade drastically.
How to set up timeslicing⚓︎
By default, prokube sets up 4x timeslicing for all GPUs in the cluster; no additional configuration is needed. To change this, you can modify the `ClusterPolicy` custom resource (`clusterpolicies.nvidia.com`) named `cluster-policy` in the `gpu-operator-resources` namespace.
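As a sketch of what a changed time-slicing setup can look like: the device plugin reads its sharing settings from a ConfigMap that the `ClusterPolicy` references via `spec.devicePlugin.config`. The names below are illustrative assumptions, not the values deployed by prokube:

```yaml
# Hypothetical time-slicing ConfigMap (name, namespace, and data key are assumptions)
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator-resources
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # expose each physical GPU as 4 logical GPUs
```

The `ClusterPolicy` would then reference this ConfigMap by name under `spec.devicePlugin.config`.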
GPU Sharing with MIG (Multi Instance GPU)⚓︎
With MIG, a GPU is partitioned into multiple instances at the hardware level. Each instance has its own dedicated resources (compute cores, memory, bandwidth) and can be assigned to a different workload. This allows for better isolation and performance compared to timeslicing.
How to enable MIG⚓︎
Since not all devices support Multi-Instance GPU (MIG), the default setting in prokube is timeslicing. To change this, modify the custom resource of kind `ClusterPolicy`. One option is to edit that resource directly with tools like kubectl, k9s, or Lens. However, the preferred method is to use GitOps and commit the corresponding changes to the paas repository. You'll find the necessary customizations in the paas repo under this path: `paas/gpus/base/cluster-policy.yaml`.
Here is an example configuration you might want to use:
```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  mig:
    strategy: single
  migManager:
    config:
      default: all-1g.5gb
      name: default-mig-parted-config
```
`default-mig-parted-config` is a ConfigMap that is automatically deployed by the GPU operator. It holds configurations such as `all-1g.5gb` or `all-balanced`, which describe how to partition the GPUs. For example, an A30 GPU can be split into instances with 6 GB or 12 GB of GPU memory.
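To see which partition profiles are available in your installation, you can inspect the ConfigMap directly (the namespace shown here is an assumption and may differ in your cluster):

```shell
kubectl get configmap default-mig-parted-config -n gpu-operator-resources -o yaml
```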
To ensure a clean configuration, it is advisable to reinstall the operator. On MicroK8s deployments:

```shell
microk8s disable gpu
microk8s enable gpu
```

On other Kubernetes distributions, uninstall and reinstall the GPU operator using Helm or your preferred method. Then proceed with the GitOps synchronization.
More details on this can be found in the NVIDIA GPU Operator documentation on MIG support.
Changing MIG partitions⚓︎
To apply a specific configuration, the nodes that hold GPUs must be labeled. The operator initially labels the nodes with the default value specified in the cluster policy. To change the configuration, modify the label:
```shell
kubectl label nodes example-node-name nvidia.com/mig.config=all-2g.12gb --overwrite
```
The operator will pick up the label change and reconfigure the GPUs accordingly.
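You can follow the reconfiguration by watching the state label that the MIG manager sets on the node (the label name follows NVIDIA GPU Operator conventions; the node name is a placeholder):

```shell
# Reports e.g. pending, rebooting, or success
kubectl get node example-node-name -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}'
```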
Using GPUs for your own workloads⚓︎
To use GPUs in prokube components, please refer to the respective sections of this documentation. To use GPUs in your own workloads, request them in the pod specification:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: gpu-container
      image: nvidia/cuda:11.0-base
      resources:
        limits:
          nvidia.com/gpu: 1 # <-- Request 1 GPU
```
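Once the pod is running, you can verify that the GPU is visible inside the container (assuming the manifest above was saved as `gpu-pod.yaml`):

```shell
kubectl apply -f gpu-pod.yaml
# Should list exactly one GPU (or MIG device) inside the container
kubectl exec gpu-pod -- nvidia-smi
```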
Monitoring of GPU Usage⚓︎
Checking how many GPUs are available on the cluster and how many are in use⚓︎
You can use the following script to get the total number of GPUs available in the cluster and how many are currently in use:
```shell
# Get total GPUs available in the cluster
total_gpus=$(kubectl get nodes -o=jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' | awk '{s+=$1} END {print s}')

# Get total GPUs currently in use
used_gpus=$(kubectl get pods -A -o=jsonpath='{range .items[*]}{range .spec.containers[*]}{.resources.requests.nvidia\.com/gpu}{"\n"}{end}{end}' | awk '{s+=$1} END {print s}')

echo "Total GPUs: $total_gpus"
echo "GPUs in use: $used_gpus"
```
Note that this script only works if pods request GPUs in the `spec.containers[*].resources.requests.nvidia\.com/gpu` field. If pods request GPUs in a different field, you will need to modify the script accordingly.
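A note on the `awk` summation used above: entries that print as empty lines (nodes without GPUs, or containers without GPU requests) contribute nothing to the sum, so they do not break the totals. A self-contained check with simulated jsonpath output:

```shell
# Simulate jsonpath output: some entries carry a GPU count, others are empty lines
printf '4\n\n2\n\n' | awk '{s+=$1} END {print s}'   # prints 6
```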
Checking which pods are using GPUs and how many⚓︎
You can use the following command to print a table of which pods, in which namespaces, have requested how many GPUs, and what the status of those pods is:

```shell
kubectl get pods -A -o=custom-columns='NAMESPACE:metadata.namespace,NAME:metadata.name,STATUS:status.phase,GPUs:spec.containers[*].resources.limits.nvidia\.com/gpu' | grep -v '<none>'
```

Note that the GPUs column may not always be accurate, for example for pods with multiple containers, where the per-container limits are shown as a comma-separated list.