I found that whenever I made a request, I received the following error:
upstream connect error or disconnect/reset before headers. reset reason: connection termination
I could still pass the lab, but could not see the canary deployment of ResNet101. Does anyone know why this occurs, and how to get around it?
I also got the same error and for me the issue was due to a wrong model path in the ConfigMap.
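If you want to rule that out first, you can print the ConfigMap and check that the model path value points at the directory that actually contains the saved model. The ConfigMap name below is a placeholder; use whatever name your manifests define.
# print the config map and inspect the model path value
kubectl get configmap <CONFIGMAP_NAME> -o yaml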
Otherwise I would make sure that the model server is actually up and listening on port 8501, e.g.
# open a shell in the default container of your resnet50/resnet101 pod
kubectl exec -it <POD_NAME> -- /bin/bash
# check model server process
ps aux | grep tensorflow_model_server
# check port 8501 is open
apt-get update
apt-get install -y lsof
lsof -i :8501
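Another quick check from inside the same container, assuming the REST API is on port 8501 and the model is named resnet50 (use resnet101 for the other deployment), is TF Serving's model status endpoint, which should report the model version as AVAILABLE:
# install curl if the image does not already have it
apt-get install -y curl
# query the model status endpoint
curl http://localhost:8501/v1/models/resnet50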
I’m also getting the same error.
Same error, cannot get it to predict even on resnet50.
Looking at the pod status with kubectl get pods, the pods get restarted after the request, so it looks like something is crashing the pod upon arrival of the request.
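The logs of the crashed container instance and the pod events might show why, e.g. (substituting a pod name from kubectl get pods):
# logs from the previous (crashed) container instance
kubectl logs <POD_NAME> --previous
# events, last state and exit code of the pod
kubectl describe pod <POD_NAME>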
I tried following @tatoooo's suggestions but the service looks healthy and the ports are open.
@d1ggs I faced the same issue and the solution below seems to solve it.
Concretely, you can try modifying the tf-serving image in the deployment manifests tf-serving/deployment-resnet50.yaml and tf-serving/deployment-resnet101.yaml by adding a specific version tag (such as 2.8.0). After that you can reapply the deployment, e.g.
kubectl apply -f tf-serving/deployment-resnet<version>.yaml
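To confirm the pinned tag was actually picked up, you can read the image back from the deployment (assuming a single container in the pod spec; use the resnet101 name for the other deployment):
# show which image the deployment is now running
kubectl get deploy image-classifier-resnet50 -o jsonpath='{.spec.template.spec.containers[0].image}'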
To make sure the deployment is updated, you can also delete the deployment before reapplying it.
kubectl delete deploy image-classifier-resnet<version>
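Once it is reapplied, you can also wait for the new pods to become Ready before sending another prediction request, e.g. for resnet50:
# wait for the rollout to finish, then check the pods
kubectl rollout status deployment/image-classifier-resnet50
kubectl get pods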
Hope this helps.
N.B. Another debugging method you could try