Model Serving FAQ⚓︎

My InferenceService is not ready even though all pods (model and/or transformer) are running⚓︎

Check the logs of the Knative Serving controller pod in the knative-serving namespace (you may need to ask your administrator for access). You can do this with the following command:

kubectl logs deployment/controller -n knative-serving

If no errors show up, check the other components of knative.
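For example, you can check the pod status and logs of the other core components (deployment names below are the stock Knative Serving ones; your installation may differ):

```shell
# Verify all Knative Serving components are running
kubectl get pods -n knative-serving

# Inspect the other core components' logs for errors
kubectl logs deployment/activator -n knative-serving
kubectl logs deployment/autoscaler -n knative-serving
kubectl logs deployment/webhook -n knative-serving
```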

My InferenceService is not ready and the logs show an error like failed to create private K8s Service: Internal error occurred: failed to allocate a serviceIP: range is full⚓︎

Example log message:

controller {"severity":"ERROR","timestamp":"2024-05-21T19:42:11.338383374Z","logger":"controller","caller":"controller/controller.go:566","message":"Reconcile error","commit":"e82287d","knative.dev/pod":"controller-fc5fc97bb-g9vjn","knative.dev/controller":"knative.dev.serving.pkg.reconciler.serverlessservice.reconciler","knative.dev/kind":"networking.internal.knative.dev.ServerlessService","knative.dev/traceid":"4ec31ccc-784d-4f57-a87b-2b98499c9cec","knative.dev/key":"<namespace>/<name>-transformer-default-00006","duration":"5.8467ms","error":"failed to create private K8s Service: Internal error occurred: failed to allocate a serviceIP: range is full","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\tknative.dev/pkg@v0.0.0-20221011175852-714b7630a836/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20221011175852-714b7630a836/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20221011175852-714b7630a836/controller/controller.go:491"}

You probably have too many Services in your cluster, exhausting the cluster's Service IP range. If this is caused by accumulated revisions of InferenceServices, you can either delete the stale Kubernetes Services by hand (the Services, not the InferenceServices) or ask your administrator to tune Knative's revision garbage-collection settings.
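To gauge how close you are to exhausting the Service IP range, and to spot the per-revision Services that old revisions leave behind, you can run something like the following (namespace is a placeholder; the `-00001`-style suffix pattern matches Knative's revision naming):

```shell
# Count Services cluster-wide
kubectl get svc --all-namespaces --no-headers | wc -l

# List revision-private Services in your namespace; old revisions
# typically appear with numbered suffixes like -00001, -00002, ...
kubectl get svc -n <namespace> | grep -E -- '-[0-9]{5}'
```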

401 Unauthorized when accessing model endpoint⚓︎

The request requires the x-api-key header to bypass the istio-system/api-key-filter EnvoyFilter.

curl -v -k \
  -H "Content-Type: application/json" \
  -H "x-api-key: ${X_API_KEY}" \
  "${MODEL_ENDPOINT}"

Note: the -k flag disables TLS certificate verification and is useful for debugging only; do not use it in production scripts. You can retrieve the valid key by running (ask your admin if you don't have access):

kubectl get envoyfilter api-key-filter -n istio-system -o yaml | grep 'api_key ==' | awk '{print $NF}' | tr -d '"'

My LLM loads successfully, but the model pod restarts right after⚓︎

Loading the model probably took longer than Knative's Revision progressDeadline allows. To work around this, set an annotation on your InferenceService's predictor:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen2-5-72b-instruct
spec:
  predictor:
    annotations:
      serving.knative.dev/progress-deadline: 180m
    # ...

Alternatively, ask your admin to change the cluster-wide default via the progress-deadline key of the knative-serving/config-deployment ConfigMap (currently set to 1 hour in prokube deployments).
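For reference, in upstream Knative the cluster-wide default is the progress-deadline key of the config-deployment ConfigMap in the knative-serving namespace. A sketch of what such an override looks like (the value shown is illustrative; your cluster's ConfigMap will contain other keys as well):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  # Default deadline for a Revision to become ready; applies when no
  # per-service progress-deadline annotation is set.
  progress-deadline: "1h"
```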

I want to schedule my model on a specific node⚓︎

Modify spec.predictor.nodeSelector in the InferenceService, e.g.:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen2-5-72b-instruct
spec:
  predictor:
    nodeSelector:  # also works with affinity and tolerations
      nvidia.com/mig.config: all-disabled  # these are node labels
      nvidia.com/gpu.product: NVIDIA-H100-NVL

NOTE: node scheduling must be enabled in your cluster; otherwise applying such a spec will fail with an error. If it is not enabled, ask your admin to turn it on.
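To find label values to use in nodeSelector, you can inspect the labels on your nodes. The label keys below match the example above and are typically set by the NVIDIA GPU operator; your cluster may use different ones:

```shell
# Show nodes with the GPU-related labels used in the example
kubectl get nodes -L nvidia.com/gpu.product -L nvidia.com/mig.config
```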