Grafana


prokube uses a single Grafana instance to visualize all metrics and logs. The instance is served under the /grafana path of your prokube installation.

Custom Dashboards

prokube offers a few custom dashboards that make it easier to get an overview of cluster resources. Check out:

Cluster Overview (Admin) dashboards for conveniently tracking GPU, CPU, memory, storage (for multi-node clusters), and MinIO resource usage. Separate versions are available for single-node and multi-node deployments. (User) versions are also available, which show less detail.
NVIDIA DCGM Exporter for more detailed tracking of GPU resources.
vLLM for monitoring vLLM inference metrics (token throughput, latency, cache usage).

Dashboard Persistence

Dashboards created in the UI are not persisted

Dashboards created or modified interactively through the Grafana UI are stored in Grafana's internal database. This data is lost when the Grafana pod restarts.

To persist dashboards, export them as JSON and store them in Kubernetes ConfigMaps. The recommended approach is to commit these ConfigMaps to your GitOps repository so that they are provisioned automatically on deployment.

See the Grafana documentation on provisioning dashboards for details on the ConfigMap format.
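
As a rough illustration of the export-and-wrap step, the following Python sketch turns an exported dashboard into a ConfigMap manifest. It rests on several assumptions that are not confirmed for prokube: that dashboards are picked up by a sidecar watching for the label `grafana_dashboard: "1"` (the default in the upstream Grafana Helm chart), and that the namespace `monitoring` and the name `my-dashboard` are placeholders. The helper `dashboard_configmap` is hypothetical and exists only for this example. Check your deployment for the actual label and namespace before using it.

```python
import json


def dashboard_configmap(name, namespace, dashboard,
                        label_key="grafana_dashboard", label_value="1"):
    """Wrap an exported Grafana dashboard (a dict) in a ConfigMap manifest.

    The label key/value are assumptions based on the upstream Grafana Helm
    chart's sidecar defaults; your deployment may expect a different label.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {label_key: label_value},
        },
        # ConfigMap values must be strings, so the dashboard is serialized.
        "data": {f"{name}.json": json.dumps(dashboard, indent=2)},
    }


if __name__ == "__main__":
    # Stand-in for a dashboard exported as JSON from the Grafana UI.
    exported = {"title": "My Dashboard", "uid": "my-dashboard", "panels": []}

    # "my-dashboard" and "monitoring" are placeholder values.
    manifest = dashboard_configmap("my-dashboard", "monitoring", exported)

    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Redirecting the output to a file in your GitOps repository (for example `my-dashboard-configmap.json`, again a placeholder name) keeps the dashboard under version control, in line with the recommendation above.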