C4-W2-Ungraded Lab: Intro to Kubernetes-Issue with yaml files

Hello,

I cannot access the pod or the service (lab: “C4-W2-Ungraded Lab: Intro to Kubernetes”). I suspect the issue may be with the yaml files, but I am not sure. I pulled the “tensorflow/serving” image in Docker Desktop before starting the lab. When I deploy in WSL using Docker as the vm-driver, everything seems to work fine until I try to reach the exposed service with a curl command, or when I try to exec into the pod itself to run commands; at that point the deployment breaks. I have attached my commands and output below. I also wonder whether it is using any image at all (not sure that is even possible, since the deployment appears to work at first). Can someone please point out what I am doing wrong? Thanks.

Regards,
Matt

My commands and outputs:

mahadi@moham:~/tmp/saved_model_half_plus_two_cpu$ pwd
/home/mahadi/tmp/saved_model_half_plus_two_cpu
mahadi@moham:~/tmp/saved_model_half_plus_two_cpu$ cd ~
mahadi@moham:~$ pwd
/home/mahadi
mahadi@moham:~$ minikube start --mount=True --mount-string="/home/mahadi/tmp:/var/tmp" --vm-driver=docker
😄  minikube v1.27.1 on Ubuntu 20.04 (amd64)
✨  Using the docker driver based on user configuration
📌  Using Docker driver with root privileges
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🔥  Creating docker container (CPUs=2, Memory=2200MB) ...
🐳  Preparing Kubernetes v1.25.2 on Docker 20.10.18 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: default-storageclass, storage-provisioner
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
mahadi@moham:~$ kubectl apply -f yaml/configmap.yaml
configmap/tfserving-configs created
mahadi@moham:~$ kubectl describe cm tfserving-configs
Name:         tfserving-configs
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
MODEL_NAME:
----
half_plus_two
MODEL_PATH:
----
/models/half_plus_two

BinaryData
====

Events:  <none>
mahadi@moham:~$ kubectl apply -f yaml/deployment.yaml
deployment.apps/tf-serving-deployment created
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   0/1     1            0           7s
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   0/1     1            0           28s
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   1/1     1            1           43s
mahadi@moham:~$ kubectl apply -f yaml/service.yaml
service/tf-serving-service created
mahadi@moham:~$ kubectl get svc tf-serving-service
NAME                 TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
tf-serving-service   NodePort   10.104.81.39   <none>        8501:30001/TCP   14s
mahadi@moham:~$ curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST http://127.0.0.1:41713/v1/models/half_plus_two:predict
curl: (56) Recv failure: Connection reset by peer
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   0/1     1            0           2m48s
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   1/1     1            1           3m25s
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   1/1     1            1           4m10s
mahadi@moham:~$ kubectl get pods
NAME                                     READY   STATUS             RESTARTS      AGE
tf-serving-deployment-84c7c5448b-9f9k9   0/1     CrashLoopBackOff   3 (31s ago)   4m43s
mahadi@moham:~$ kubectl exec -ti tf-serving-deployment-84c7c5448b-9f9k9 -- curl localhost:8501
error: unable to upgrade connection: container not found ("tf-serving")
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   0/1     1            0           6m26s
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   0/1     1            0           7m5s
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   0/1     1            0           7m23s
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   1/1     1            1           7m37s
mahadi@moham:~$ kubectl get pods
NAME                                     READY   STATUS    RESTARTS       AGE
tf-serving-deployment-84c7c5448b-9f9k9   1/1     Running   5 (109s ago)   7m42s
mahadi@moham:~$ kubectl exec -ti tf-serving-deployment-84c7c5448b-9f9k9 -- bash
root@tf-serving-deployment-84c7c5448b-9f9k9:/# command terminated with exit code 137
mahadi@moham:~$ kubectl get deploy
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
tf-serving-deployment   0/1     1            0           8m23s

Another terminal, from which I got the service URL:

mahadi@moham:~$ minikube service tf-serving-service --url
http://127.0.0.1:41713
❗  Because you are using a Docker driver on linux, the terminal needs to be open to run it.

The tensorflow/serving image is pulled from the cloud registry. The way the image is specified in deployment.yaml is correct.
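
For reference, the container spec in the lab’s deployment.yaml looks roughly like this (a sketch reconstructed from the pod description later in this thread, so treat it as illustrative rather than the exact file):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-serving-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-serving
  template:
    metadata:
      labels:
        app: tf-serving
    spec:
      containers:
        - name: tf-serving
          image: tensorflow/serving        # pulled from the public registry
          ports:
            - containerPort: 8501          # REST API port
          envFrom:
            - configMapRef:
                name: tfserving-configs    # supplies MODEL_NAME and MODEL_PATH
          volumeMounts:
            - name: tf-serving-volume
              mountPath: /models/half_plus_two
      volumes:
        - name: tf-serving-volume
          hostPath:
            path: /var/tmp/saved_model_half_plus_two_cpu
            type: Directory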

For some odd reason, the virtualbox driver works fine whereas the docker driver doesn’t.

Here are the logs for virtualbox:

$ kubectl logs -f tf-serving-deployment-84c7c5448b-qcw5h
2022-10-23 07:09:28.443664: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config:  model_name: half_plus_two model_base_path: /models/half_plus_two
2022-10-23 07:09:28.542783: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-10-23 07:09:28.642558: I tensorflow_serving/model_servers/server_core.cc:594]  (Re-)adding model: half_plus_two
2022-10-23 07:10:03.942898: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2022-10-23 07:10:03.943060: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2022-10-23 07:10:03.943168: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2022-10-23 07:10:04.050410: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/half_plus_two/00000123
2022-10-23 07:10:04.245332: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-10-23 07:10:04.245493: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2022-10-23 07:10:04.250039: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-23 07:10:10.443469: I external/org_tensorflow/tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-10-23 07:10:10.742613: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-10-23 07:10:15.842522: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2022-10-23 07:10:17.842692: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 13792294 microseconds.
2022-10-23 07:10:17.844166: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:62] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2022-10-23 07:10:50.942567: I tensorflow_serving/core/loader_harness.cc:95] Successfully loaded servable version {name: half_plus_two version: 123}
2022-10-23 07:10:51.142848: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2022-10-23 07:10:51.143249: I tensorflow_serving/model_servers/server.cc:133] Using InsecureServerCredentials
2022-10-23 07:10:51.143324: I tensorflow_serving/model_servers/server.cc:395] Profiler service is enabled
2022-10-23 07:10:51.542713: I tensorflow_serving/model_servers/server.cc:421] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2022-10-23 07:10:51.743451: I tensorflow_serving/model_servers/server.cc:442] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...

Here’s the curl output:

$ curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST $(minikube ip):30001/v1/models/half_plus_two:predict
{
    "predictions": [2.5, 3.0, 4.5
    ]
}

This is the log when using the docker driver:

$ kubectl logs -f tf-serving-deployment-84c7c5448b-jl486
2022-10-23 07:16:21.624013: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config:  model_name: half_plus_two model_base_path: /models/half_plus_two
2022-10-23 07:16:21.823624: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-10-23 07:16:21.823684: I tensorflow_serving/model_servers/server_core.cc:594]  (Re-)adding model: half_plus_two
/usr/bin/tf_serving_entrypoint.sh: line 3:     7 Killed                  tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

Before moving further, could you please try on virtualbox and confirm that things work?

Hi Balaji,

I could finally get Docker to work with this lab by changing the resource limits in the yaml file, but VirtualBox keeps giving me a CrashLoopBackOff, and it is not clear what is causing it. Probably, the yaml files need to be adjusted again. I can make VirtualBox work with the previous labs but not with this one because of this CrashLoopBackOff error.
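
For anyone hitting the same thing: the edit goes in the resources block of the container spec in deployment.yaml. A rough sketch of the kind of change (the requests and the 640M limit are the lab’s originals, visible in the pod description below; the raised value is illustrative, and the numbers that work may differ on your machine):

      resources:
        requests:
          cpu: 100m
          memory: 320M
        limits:
          cpu: 500m
          memory: 1Gi    # illustrative: raised from the lab's 640M, which the container kept hitting (exit code 137 = killed)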

Regards,
Matt

PS C:\> kubectl get pods
NAME                          READY   STATUS             RESTARTS      AGE
tf-serving-656989d955-tp6qq   0/1     CrashLoopBackOff   2 (53s ago)   114s
PS C:\> kubectl describe pods
Name:             tf-serving-656989d955-tp6qq
Namespace:        default
Priority:         0
Service Account:  default
Node:             minikube/192.168.59.115
Start Time:       Tue, 25 Oct 2022 18:20:46 -0700
Labels:           app=tf-serving
                  pod-template-hash=656989d955
Annotations:      <none>
Status:           Running
IP:               172.17.0.3
IPs:
  IP:           172.17.0.3
Controlled By:  ReplicaSet/tf-serving-656989d955
Containers:
  tf-serving:
    Container ID:   docker://643dcda59bd942adf6e057fa460cc2a3a652b6ac8226967b6473a14391f53176
    Image:          tensorflow/serving:latest
    Image ID:       docker-pullable://tensorflow/serving@sha256:6c3c199683df6165f5ae28266131722063e9fa012c15065fc4e245ac7d1db980
    Port:           8501/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    132
      Started:      Tue, 25 Oct 2022 18:22:20 -0700
      Finished:     Tue, 25 Oct 2022 18:22:20 -0700
    Ready:          False
    Restart Count:  3
    Limits:
      cpu:     500m
      memory:  640M
    Requests:
      cpu:     100m
      memory:  320M
    Environment Variables from:
      tfserving-configs  ConfigMap  Optional: false
    Environment:         <none>
    Mounts:
      /models/half_plus_two from tf-serving-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jz2qj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tf-serving-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /var/tmp/saved_model_half_plus_two_cpu
    HostPathType:  Directory
  kube-api-access-jz2qj:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  2m3s               default-scheduler  Successfully assigned default/tf-serving-656989d955-tp6qq to minikube
  Normal   Pulled     85s                kubelet            Successfully pulled image "tensorflow/serving:latest" in 35.266442322s
  Normal   Pulled     82s                kubelet            Successfully pulled image "tensorflow/serving:latest" in 1.205746643s
  Normal   Pulled     63s                kubelet            Successfully pulled image "tensorflow/serving:latest" in 1.093954775s
  Normal   Pulling    31s (x4 over 2m)   kubelet            Pulling image "tensorflow/serving:latest"
  Normal   Created    30s (x4 over 85s)  kubelet            Created container tf-serving
  Normal   Pulled     30s                kubelet            Successfully pulled image "tensorflow/serving:latest" in 1.154055431s
  Normal   Started    29s (x4 over 84s)  kubelet            Started container tf-serving
  Warning  BackOff    7s (x6 over 82s)   kubelet            Back-off restarting failed container
PS C:\> kubectl logs tf-serving-656989d955-tp6qq --previous --tail 10
/usr/bin/tf_serving_entrypoint.sh: line 3:     7 Illegal instruction     (core dumped) tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

Hi @ahadimatt, could you show me how to change the resource limits in the yaml file? I’m facing the same issue right now. I’m using Windows + the Docker driver because the virtualbox driver doesn’t work after many tries. Thank you in advance.

Hello @Josh_Chen, please check out this blog.

Hi all,
minikube often has issues with Docker Desktop, because the Docker driver does not support reaching NodePorts directly from the host. You typically expose the service outside the cluster with a Service of type: LoadBalancer or use an ingress gateway. As of now, I haven’t seen Docker patch this yet.

See this issue and minikube’s handbook about NodeIP
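
For illustration, a LoadBalancer variant of the lab’s service could look roughly like this (a sketch only: the name, selector, and port are assumed from the NodePort service and pod labels shown earlier in this thread, and with the docker driver you would still need minikube tunnel running in another terminal for it to receive an external IP):

apiVersion: v1
kind: Service
metadata:
  name: tf-serving-service
spec:
  type: LoadBalancer
  selector:
    app: tf-serving      # assumed: matches the pod label from the describe output
  ports:
    - port: 8501         # REST port exposed by tensorflow/serving
      targetPort: 8501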

Therefore, the VirtualBox method is more stable than the Docker one. On Windows, please try again with these changes:

  1. Run minikube start --mount=True --mount-string="C:/tmp:/var/tmp" --vm-driver=virtualbox

If there’s an error like this:

This VM is having trouble accessing https://k8s.gcr.io
 To pull new external images, you may need to configure a proxy: https://minikube.sigs.k8s.io/docs/reference/networking/proxy/

You need to set your proxy before starting minikube:

Run minikube ip, then copy your IP into this command:

set NO_PROXY=localhost,127.0.0.1,10.96.0.0/12,192.168.59.0/24,192.168.49.0/24,192.168.39.0/24,<your_minikube_ip>

Then start minikube as normal:

minikube start --mount=True --mount-string="C:/tmp:/var/tmp" --vm-driver=virtualbox

  2. The curl command is different on Windows: the JSON body must be wrapped in "" rather than the ' used in the ungraded lab file. Run this curl command in Command Prompt and you should see the correct output:

curl -d "{""instances"": [1.0, 2.0, 5.0]}" -X POST http://<your_minikube_ip>:30001/v1/models/half_plus_two:predict

@Th_o_Vy_Le_Nguy_n thank you for your response. I tried your virtualbox method: I first set up the proxy and then ran the Windows commands in PowerShell, and I got the following error.


Then I followed the suggestion, ran minikube delete, and did the above steps again; now I’m getting this error.

OK, then I googled a bit and found an option we can try, --no-vtx-check, so I added it to the minikube start command, which ended up as minikube start --mount=True --mount-string="C:/tmp:/var/tmp" --vm-driver=virtualbox --no-vtx-check. Unfortunately, I got this error:

So it seems the virtualbox method is even more buggy on my side. Also, I can confirm that I enabled virtualization on my PC.

Hi @Josh_Chen, please try reinstalling VirtualBox 7.0.6 (Windows host).
When you reboot, please ensure:

  1. Virtualization is enabled in the BIOS
  2. Hyper-V is turned off
  3. SGX is set to “always” rather than “software control”

If the errors still persist, you can try again with Docker. The README.md has been updated, so let us know if any errors appear again. We can also try screen-sharing support if you want.

Happy Learning,
Vy