C4W2 Assignment Autoscaling TensorFlow model deployments with TF Serving and Kubernetes

In "Task 8. Testing the model", after running

EXTERNAL_IP=[YOUR_SERVICE_IP] curl -d @locust/request-body.json -X POST http://${EXTERNAL_IP}:8501/v1/models/image_classifier:predict

I get: curl: (52) Empty reply from server

I set EXTERNAL_IP after

kubectl get svc image-classifier

NAME                    TYPE                   CLUSTER-IP    **EXTERNAL-IP**   PORT(S)                              AGE

Is this correct?

I passed the assignment but would like to understand what’s wrong.
Thank you.
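One thing worth checking about the command form itself (an aside; it may not be the cause of the (52) error, which others below trace to the image): if `EXTERNAL_IP=[YOUR_SERVICE_IP]` and `curl` are run as a single command line, the assignment only changes curl's environment. The shell expands `${EXTERNAL_IP}` in the argument list *before* applying the one-off assignment, so curl sees an empty host. A minimal sketch:

```shell
# A one-off assignment (VAR=value cmd) only sets cmd's *environment*;
# the shell expands ${EXTERNAL_IP} in the arguments first, using the
# current (here: empty) value.
unset EXTERNAL_IP
EXTERNAL_IP=1.2.3.4 echo "http://${EXTERNAL_IP}:8501"   # prints http://:8501

# Assigning on its own line first makes the expansion see the value.
EXTERNAL_IP=1.2.3.4
echo "http://${EXTERNAL_IP}:8501"                        # prints http://1.2.3.4:8501
```

So set `EXTERNAL_IP` on its own line (or `export` it) before running the curl command.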

I got the same response as well, and as a result the HPA didn’t work. This means the autograder will fail on the last task, monitoring the load test.


After further investigation, it seems that the latest tf-serving image is the cause.
The latest image on Docker Hub was updated on Aug 30.
To fix this, pin the image to a specific tag (e.g. 2.8.0) in the deployment manifest.
After that change, your tf-serving/deployment.yaml should look something like this:

apiVersion: apps/v1
kind: Deployment
metadata: # kpt-merge: default/image-classifier
  name: image-classifier
  namespace: default
  labels:
    app: image-classifier
spec:
  replicas: 1
  selector:
    matchLabels:
      app: image-classifier
  template:
    metadata:
      labels:
        app: image-classifier
    spec:
      containers:
      - name: tf-serving
        image: "tensorflow/serving:2.8.0"
        args:
        - "--model_name=$(MODEL_NAME)"
        - "--model_base_path=$(MODEL_PATH)"
        envFrom:
        - configMapRef:
            name: tfserving-configs
        imagePullPolicy: IfNotPresent
        readinessProbe:
          tcpSocket:
            port: 8500
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 10
        ports:
        - name: http
          containerPort: 8501
          protocol: TCP
        - name: grpc
          containerPort: 8500
          protocol: TCP
        resources:
          requests:
            cpu: "3"
            memory: 4Gi

Please note that you need to do this only after passing the assessment for “Task 5. Creating TensorFlow Serving deployment”, because that assessment seems to also check the tf-serving version.

So in short, what you need to do is:

  1. Follow the lab instructions up to and including the Task 5 assessment.
  2. Update the deployment manifest as shown above and apply it again, i.e.
    kubectl apply -f tf-serving/deployment.yaml
  3. The rest is as instructed in the lab.
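The manifest edit in step 2 can also be scripted instead of done by hand. A sketch using sed on a stand-in file (the real file is tf-serving/deployment.yaml in the lab; the unpinned `image:` line is an assumption based on the default manifest):

```shell
# Stand-in for the lab's tf-serving/deployment.yaml, reduced to the one
# line that needs to change.
cat > /tmp/deployment.yaml <<'EOF'
        image: "tensorflow/serving"
EOF

# Pin the image to the 2.8.0 tag instead of the implicit :latest.
sed -i 's|image: "tensorflow/serving"|image: "tensorflow/serving:2.8.0"|' /tmp/deployment.yaml

cat /tmp/deployment.yaml
```

After the substitution you would run `kubectl apply -f tf-serving/deployment.yaml` as in step 2.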

Until we get official clarification, I hope this helps.


Hi! Thank you for sharing this workaround! We’ll investigate the issue and escalate to our partners if needed. Thanks again!

Hi Ingrid! Welcome to the community! Sorry saw this just now. We’ll investigate so our partners can update the lab. Thank you for reporting!

Hi everyone! I am able to replicate this issue and have reported it to our partners. Will update this thread as soon as I hear from them. Thanks!

I’d say it’s bad practice to rely on an implicit :latest tag there - it just breaks eventually. There’s also no digest pinned, but that might be fine for a tutorial.



curl -d @locust/request-body.json -X POST http://${EXTERNAL_IP}:8501/v1/models/image_classifier:predict
Warning: Couldn't read data from file "locust/request-body.json", this makes
Warning: an empty POST.
"error": "JSON Parse error: The document is empty"


I’m getting the same error, but I still passed. How?


Task 5 did not clear for me despite everything passing.
Then Tasks 6 and 7 cleared, but in Task 8 I get an error despite setting the external IP:
curl -d @locust/request-body.json -X POST http://${EXTERNAL_IP}:8501/v1/models/image_classifier:predict
Warning: Couldn't read data from file "locust/request-body.json", this makes
Warning: an empty POST.
"error": "JSON Parse error: The document is empty"

I’m getting the above error but still passed.
Kindly enlighten.
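For what it’s worth, that warning usually means curl cannot find the file rather than anything being wrong with the model: the `@locust/request-body.json` path is resolved relative to the directory curl is run from, so it fails if you are still inside the locust folder. A self-contained sketch (the /tmp paths are stand-ins for the lab’s layout):

```shell
# Recreate the lab's layout with a stand-in directory tree.
mkdir -p /tmp/demo/locust
echo '{"instances": []}' > /tmp/demo/locust/request-body.json

# From inside locust/, the relative path locust/request-body.json does not
# exist -- this is the situation that triggers curl's warning.
cd /tmp/demo/locust
test -f locust/request-body.json && echo found || echo missing   # prints missing

# From the parent directory the same relative path resolves fine.
cd /tmp/demo
test -f locust/request-body.json && echo found || echo missing   # prints found
```

So run the curl command from the directory that *contains* the locust folder (in the lab, ~/tfserving-gke), not from inside it.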

I completed the entire exercise including the load test,
yet the grader is not assessing Task 5 despite the deployment being created:
student_01_ed20a4bdd641@cloudshell:~/tfserving-gke (qwiklabs-gcp-01-fcba78b1c319)$ kubectl get deployments
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
image-classifier   1/1     1            1           53m

@Zaid_Askari But do not wait here. Go ahead and complete the assignment. You will pass!

Hi everyone! We’ve reported the new bug to our partners so it can be fixed. In the meantime, you can skip the checkpoint for Task 5 after you see that the deployment is READY (i.e. 1/1). You will still be able to complete the lab and get a passing score of 85/100. More importantly, you will still see how to set up autoscaling in your model deployments.

Lab scores are not shown on your public certificates. However, in case you want to get 100/100 without waiting for the official fix, you can follow this workaround:

  1. Complete all tasks (i.e. up to Task 12) to get 85/100.
  2. Return to the Terminal and navigate outside the locust folder: cd ..
  3. You should now be inside the ~/tfserving-gke directory. Here, you can terminate the deployment: kubectl delete -f tf-serving/deployment.yaml
  4. Open the Cloud Editor and navigate to tfserving-gke/tf-serving/deployment.yaml
  5. Edit line 34 from image: "tensorflow/serving:2.8.0" to image: "tensorflow/serving"
  6. Save the file.
  7. Go back to the Terminal and start the deployment: kubectl apply -f tf-serving/deployment.yaml
  8. Wait for 5 minutes and click the Task 5 checkpoint. It should now be marked as passed. Note: In my attempt, the checkpoint passed even if the deployment was not ready (i.e. shown as 0/1 when you do kubectl get deployments)
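The “wait for 5 minutes” in step 8 can be turned into a simple poll loop instead of a fixed sleep. A sketch with a stand-in readiness signal (in the lab, the real condition would be `kubectl get deployments` showing 1/1; the signal file here is a placeholder so the loop is self-contained):

```shell
# Stand-in readiness signal; in the lab this would be the deployment
# reporting READY 1/1.
touch /tmp/ready-signal

# Poll up to 60 times, 5 seconds apart (~5 minutes total), and stop as
# soon as the signal appears.
for i in $(seq 1 60); do
  if [ -f /tmp/ready-signal ]; then
    echo "ready after ${i} check(s)"
    break
  fi
  sleep 5
done
```

With the real check in place, this exits as soon as the deployment is ready rather than always waiting the full 5 minutes.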

Hope this helps. Temporarily marking this as the solution for visibility. Will update this thread once the bug is fixed. Thank you!

Hi everyone! The issue with Task 5 should now be fixed. Feel free to retry the lab if you want. Thanks!

Same error! Please provide the solution if you were able to solve it on your side.