Kubernetes Access⚓︎
prokube runs on a Kubernetes cluster and the Kubeflow profiles are mapped to Kubernetes namespaces.
Your administrator can create a .kubeconfig file for you to access the cluster.
This file contains the necessary information to access the cluster and can be used with tools like kubectl, k9s or OpenLens.
The names of the namespaces are the same as the profile names.
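A minimal sketch of pointing kubectl at such a file. The function name check_access and the file path you pass to it are placeholders, not prokube conventions; the KUBECONFIG environment variable and the kubectl subcommand are standard.

```shell
# Hypothetical helper: use a kubeconfig file and list the pods of one namespace.
# Usage: check_access <path-to-kubeconfig> <namespace>
check_access() {
  kubeconfig_file=$1
  namespace=$2
  # kubectl reads the cluster address and credentials from this file.
  export KUBECONFIG="$kubeconfig_file"
  # The namespace name is the same as the Kubeflow profile name.
  kubectl get pods -n "$namespace"
}
```

Tools such as k9s also honor the KUBECONFIG variable, so exporting it once per shell session is usually enough.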
Image Pull Secrets⚓︎
prokube comes with a small operator that copies all secrets from the ops namespace that begin with regcred- to all other namespaces, and patches the service accounts in those namespaces to use those image pull secrets.
It also updates those secrets if they get changed in the ops namespace.
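To check whether the operator has done its work in a given namespace, something like the following sketch may help. The function name show_pull_secrets is hypothetical; it assumes the service account of interest is named default.

```shell
# Hypothetical helper: show the copied registry secrets and the image pull
# secrets attached to the default service account of a namespace.
# Usage: show_pull_secrets <namespace>
show_pull_secrets() {
  namespace=$1
  # Secrets copied by the operator keep their regcred- prefix.
  kubectl get secrets -n "$namespace" -o name | grep '^secret/regcred-'
  # Names of the image pull secrets referenced by the service account.
  kubectl get serviceaccount default -n "$namespace" \
    -o jsonpath='{.imagePullSecrets[*].name}'
}
```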
Limits⚓︎
prokube applies some default limits to all workspaces (i.e. all namespaces that are
attached to a "Profile"). By default, each workspace is limited to a maximum of 100 pods.
This prevents a single workspace from overloading the cluster with tens of thousands of pods.
An admin can adjust this limit per profile live in the
cluster, or for all newly created profiles by changing the patch
paas/ops/pk-user-management-operator/profiles/patches/patch_profiles.j2, which
is automatically applied to every newly created profile.
To limit resource consumption such as CPU, memory, or persistent volume size, you can add any ResourceQuota entry to the patch (for newly created profiles) or edit existing profiles.
An example patch in patch_profiles.j2 with resource quotas that limits the
number of pods to 100, the number of persistent volume claims to 2 and the
total size of persistent volumes of the storage class openebs-hostpath to 10Gi
would look like this:
{
  "apiVersion": "kubeflow.org/v1",
  "kind": "Profile",
  "metadata": {
    "name": "{{ profile_id }}"
  },
  "spec": {
    "resourceQuotaSpec": {
      "hard": {
        "count/pods": "100",
        "persistentvolumeclaims": "2",
        "openebs-hostpath.storageclass.storage.k8s.io/requests.storage": "10Gi"
      }
    }
  }
}
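For an existing profile, an admin could change a single limit live with a merge patch, roughly as sketched below. The function name set_pod_limit is hypothetical, and the sketch assumes that the Kubeflow profile controller propagates the change to the ResourceQuota in the namespace named after the profile.

```shell
# Hypothetical helper (admin only): change the pod limit of one existing profile.
# Usage: set_pod_limit <profile-name> <max-pods>
set_pod_limit() {
  profile=$1
  max_pods=$2
  # A merge patch changes only the listed key and leaves other quotas untouched.
  kubectl patch profile "$profile" --type merge \
    -p "{\"spec\":{\"resourceQuotaSpec\":{\"hard\":{\"count/pods\":\"$max_pods\"}}}}"
  # Show the resulting quota and the current usage in the profile's namespace.
  kubectl describe resourcequota -n "$profile"
}
```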
FAQ⚓︎
How do I debug pods?⚓︎
Often the events of the pod hint at what went wrong. You can get the events
of a pod with kubectl describe pod <pod-name> -n <namespace>. Note that
events are not kept forever, so if the pod was created a long time ago, you
might not see them anymore. In that case, you can try to delete the pod
and re-run the pipeline.
For crashing pods, you can also get the logs of the container with kubectl logs
<pod-name> -n <namespace>. If the container is crashing immediately after
starting, you can also try to run the container locally with docker run to see
if it crashes there as well.
If the logs are not available anymore, you can also get the logs from Loki via
the Grafana interface which is served on prokube under the /grafana path.
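The checks above can be bundled into one small function, sketched here under the assumption that the pod has a single container. The name debug_pod is hypothetical; the kubectl flags are standard.

```shell
# Hypothetical helper: collect the usual debugging information for one pod.
# Usage: debug_pod <pod-name> <namespace>
debug_pod() {
  pod=$1
  namespace=$2
  # Events that reference the pod, oldest first.
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
  # Logs of the current container instance.
  kubectl logs "$pod" -n "$namespace"
  # Logs of the previous instance, useful when the container keeps restarting.
  kubectl logs "$pod" -n "$namespace" --previous
}
```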
Example failures and their solutions:⚓︎
There is extensive info about debugging in the Kubernetes documentation.
Some common errors and their solutions are:
Status CreateContainerConfigError⚓︎
- Error: secret "<secret-name>" not found: The secret <secret-name> is missing in the namespace.
- Error: couldn't find key <MY-SECRET-KEY> in Secret <namespace>/<secret-name>: The secret <secret-name> is missing the key <MY-SECRET-KEY>.
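Both cases can be checked from the command line. The following sketch uses a hypothetical function name, check_secret; kubectl describe lists the keys of a secret with their sizes but does not print the values.

```shell
# Hypothetical helper: report whether a secret exists and which keys it holds.
# Usage: check_secret <secret-name> <namespace>
check_secret() {
  secret=$1
  namespace=$2
  if kubectl get secret "$secret" -n "$namespace" >/dev/null 2>&1; then
    # describe lists each key with its size in bytes, not its value.
    kubectl describe secret "$secret" -n "$namespace"
  else
    echo "secret $secret not found in namespace $namespace"
  fi
}
```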
Status ImagePullBackOff⚓︎
The image cannot be pulled. Check if the image exists and if the image pull secret is set correctly. If the image is private, you need to set the image pull secret in the pipeline's service account. See the KFP Cookbook for more information.
Status CrashLoopBackOff⚓︎
The container crashes immediately after starting. Check the logs of the
container with kubectl logs <pod-name> -n <namespace>. The logs should
hint at what went wrong.
Status Init:Error⚓︎
The init container failed. Check the logs of the init container with kubectl
logs <pod-name> -n <namespace> -c <init-container-name>.
If you see something like exec /usr/local/bin/argoexec: exec format error, the
binary does not match the CPU architecture of the node it runs on. For example,
the node might be arm64 while your binary is compiled for amd64, or vice versa.
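To compare the two sides, you can look up the architecture of the cluster nodes and build the image for that platform. This is a sketch with hypothetical function names; it assumes Docker with the buildx plugin is installed locally.

```shell
# Hypothetical helper: print the CPU architecture of every node.
# Usage: node_architectures
node_architectures() {
  kubectl get nodes \
    -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
}

# Hypothetical helper: build an image for a given platform, e.g. linux/amd64.
# Usage: build_for_platform <platform> <image-tag> <build-context>
build_for_platform() {
  docker buildx build --platform "$1" -t "$2" "$3"
}
```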
How do I access my service?⚓︎
You can access services in your namespace using port forwarding:
kubectl port-forward -n <namespace> svc/<service-name> <local-port>:<service-port>
The service is then reachable at localhost:<local-port>.
My portforwards keep dying, how do I automatically re-open them?⚓︎
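Wrap the command in a loop that restarts it whenever it exits. The short sleep avoids a busy loop if the command fails immediately. For a service, prefix the name with svc/.
while :; do kubectl port-forward -n <namespace> <pod-or-service-name> <local-port>:<remote-port>; sleep 1; done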
How do I know what I can do on the cluster (permissions)?⚓︎
Run kubectl auth can-i in your shell.
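Two typical invocations, wrapped in a hypothetical function my_permissions for illustration: one asks about a single verb and resource, the other lists everything the current user may do in a namespace.

```shell
# Hypothetical helper: inspect the current user's permissions in a namespace.
# Usage: my_permissions <namespace>
my_permissions() {
  namespace=$1
  # Answers yes or no for a single verb and resource.
  kubectl auth can-i create pods -n "$namespace"
  # Lists all resources and verbs allowed in the namespace.
  kubectl auth can-i --list -n "$namespace"
}
```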
After restart, some pods cannot attach their PVCs⚓︎
After restarting a node or the whole cluster, some pods are stuck in ContainerCreating with an error message like this:
FailedAttachVolume 2s attachdetach-controller Multi-Attach error for volume "pvc-50b69437-1280-4487-aeec-b48282148f98" Volume is already exclusively attached to one node and can't be attached to another
Find the VolumeAttachment that belongs to the PVC and delete it. For example, run kubectl get volumeattachment to list all VolumeAttachments, and then kubectl delete volumeattachment csi-fae461b653134a7c137bcae38b3e7198efbe6cfa1063c617c0cec1da7f8aad6c.
See this post for more info.
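When there are many VolumeAttachments, filtering by the volume name from the error message narrows the list down. This is a sketch with a hypothetical function name; it relies on the standard spec.source.persistentVolumeName field of the VolumeAttachment resource.

```shell
# Hypothetical helper: find the VolumeAttachment for a persistent volume.
# Usage: find_volume_attachment <pv-name>   (e.g. the pvc-... name from the event)
find_volume_attachment() {
  pv_name=$1
  # Print name, volume and node for each attachment, then keep matching rows.
  kubectl get volumeattachment \
    -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName \
    | grep -F "$pv_name"
}
```

The first column of the matching row is the name to pass to kubectl delete volumeattachment.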
Missing user permissions⚓︎
Check that your account has all the permissions that the latest version of the Kubernetes user generation script grants to a new user account.
Cannot see resources of shared profiles in OpenLens⚓︎
Non-admin users do not have permission to list the namespaces of the cluster, which makes handling resources in OpenLens difficult. To enable namespace switching in OpenLens, go to the corresponding cluster's settings (not the application settings) and then to "Namespaces". Add all namespaces that you have access to on the cluster. You can then switch between your namespaces in the resources view.
My Pods cannot pull my image from a docker registry⚓︎
If you see a status of ImagePullBackOff and an event similar to the following, the cluster is not authorized to pull the image:
Failed to pull image "myregistry.com:4567/group/myimage:latest": rpc error: code = Unknown desc = failed to pull and unpack image "myregistry.com:4567/group/myimage:latest": failed to resolve reference "myregistry.com:4567/group/myimage:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
To fix this, create a new access token with permission to pull images (on GitLab this is
called read_registry). On GitLab, the token name is the username to use with
that token. To make sure that the token works, you can validate it by hand:
Tip
On GitLab, you need to select at least the Reporter role for the token to be able to read the registry.
First make sure that you are logged out of the registry in question, and then validate that you currently cannot pull the image (you should see an access forbidden error message):
docker logout myregistry.com:4567
docker pull myregistry.com:4567/group/myimage:latest
Then log in with docker (you should see the message Login Succeeded if it worked) and pull your image:
TOKEN=<your-secret-token>
TOKEN_NAME=<your-token-name>
docker login myregistry.com:4567 -u "$TOKEN_NAME" --password-stdin <<<"$TOKEN"
docker pull myregistry.com:4567/group/myimage:latest
Create a new secret in your personal namespace and add your new secret as an ImagePullSecret to the default ServiceAccount of your namespace:
NAME=<name-for-your-new-secret>
NAMESPACE=<your-namespace>
SERVER=myregistry.com:4567
kubectl create secret docker-registry "$NAME" --docker-server="$SERVER" --docker-username="$TOKEN_NAME" --docker-password="$TOKEN" -n "$NAMESPACE"
kubectl patch serviceaccount default -p '[{"op": "add", "path": "/imagePullSecrets/-", "value": {"name": "'"$NAME"'"}}]' -n "$NAMESPACE" --type='json'
Now Kubernetes should be able to pull images from your registry.
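One caveat with the JSON patch: the add operation on /imagePullSecrets/- appends to an existing list and fails if the service account has no imagePullSecrets list yet. The following sketch, with the hypothetical function name add_pull_secret, checks for existing entries first and otherwise creates the list with a merge patch.

```shell
# Hypothetical helper: attach an image pull secret to the default service
# account, whether or not it already has an imagePullSecrets list.
# Usage: add_pull_secret <secret-name> <namespace>
add_pull_secret() {
  secret=$1
  namespace=$2
  # Names of the image pull secrets already attached (empty if none).
  existing=$(kubectl get serviceaccount default -n "$namespace" \
    -o jsonpath='{.imagePullSecrets[*].name}') || return 1
  if [ -n "$existing" ]; then
    # Append to the existing list.
    kubectl patch serviceaccount default -n "$namespace" --type json \
      -p "[{\"op\":\"add\",\"path\":\"/imagePullSecrets/-\",\"value\":{\"name\":\"$secret\"}}]"
  else
    # No entries yet: create the list with a single entry.
    kubectl patch serviceaccount default -n "$namespace" --type merge \
      -p "{\"imagePullSecrets\":[{\"name\":\"$secret\"}]}"
  fi
}
```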
How do I enable pulling images from a private or internal image registry?⚓︎
To allow Kubernetes to pull images from a private registry, create a docker-registry secret with a name starting with regcred- (e.g., regcred-my-custom-registry).
Option 1: Cluster-wide access (recommended)⚓︎
Create the secret in the ops namespace:
#!/bin/bash
# Create registry secret for cluster-wide access
username='your-registry-token-name'
password='your-token'
server='your.registry.server'
namespace='ops'
secret_name='regcred-my-custom-registry'
kubectl create secret docker-registry "$secret_name" \
  --docker-server="$server" \
  --docker-username="$username" \
  --docker-password="$password" \
  -n "$namespace"
Option 2: Single namespace access⚓︎
- Create the secret in your specific namespace (use the script above but set namespace='your-namespace')
- Add it to your ServiceAccount:
kubectl patch serviceaccount default \
  -p '[{"op": "add", "path": "/imagePullSecrets/-", "value": {"name": "'"$secret_name"'"}}]' \
  --type='json' \
  -n "$namespace"
Info
When using Option 1 (ops namespace), the prokube secret operator automatically:
- Copies all regcred-* secrets to every namespace
- Adds them to all ServiceAccounts starting with "default"
- Ensures all prokube-supported applications can pull images