C4W3 Canary Releases: Error when making requests

I found that whenever I made a request, I received the following error:
upstream connect error or disconnect/reset before headers. reset reason: connection termination
I could still pass the lab, but could not see the canary deployment of ResNet101. Does anyone know why this occurs, and how to get around it?

I also got the same error; for me, the issue was a wrong model path in the ConfigMap.
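A wrong model path usually means TF Serving finds no servable version: it expects the model base path to contain numeric version subdirectories (e.g. `1/`), each holding a SavedModel. Below is a minimal sketch for checking that layout wherever the model directory is visible (for example inside the serving pod). `check_model_path` is a hypothetical helper, and the path you pass should be whatever your ConfigMap points TF Serving at.

```shell
#!/usr/bin/env bash
# check_model_path (hypothetical helper): succeed only if the given model base
# path contains at least one numeric version subdirectory holding a SavedModel.
check_model_path() {
  local base="$1"
  if [ ! -d "$base" ]; then
    echo "missing: $base is not a directory" >&2
    return 1
  fi
  local found=0 dir version
  for dir in "$base"/*/; do
    version="$(basename "$dir")"
    # Only purely numeric subdirectory names are treated as model versions.
    case "$version" in
      ''|*[!0-9]*) continue ;;
    esac
    if [ -f "${dir%/}/saved_model.pb" ] || [ -f "${dir%/}/saved_model.pbtxt" ]; then
      echo "ok: version $version in ${dir%/}"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no servable version found under $base" >&2
    return 1
  fi
}

# Usage sketch: pass the path from your ConfigMap.
# check_model_path /path/from/your/configmap
```

A common mistake this catches is pointing the path directly at the SavedModel folder instead of at its parent, so there is no numbered version directory.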

Otherwise, I would make sure that the model server is actually up and listening on port 8501, e.g.:

# enter the default container for your pod for resnet50/resnet101
kubectl exec -it <POD_NAME> -- /bin/bash

# check model server process
ps aux | grep tensorflow_model_server

# check port 8501 is open
apt-get update
apt-get install -y lsof
lsof -i :8501
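As a complement to the in-pod checks, TF Serving's REST API has a model status endpoint (GET /v1/models/<model name> on port 8501) that reports the state of each loaded version. A minimal sketch, assuming you port-forward the pod and have curl on your machine; `model_is_available` is a hypothetical helper, and `POD` and `MODEL_NAME` are placeholders you need to set.

```shell
#!/usr/bin/env bash
# model_is_available (hypothetical helper): succeed if the TF Serving model
# status JSON read from stdin reports a version in state AVAILABLE.
model_is_available() {
  grep -Eq '"state"[[:space:]]*:[[:space:]]*"AVAILABLE"'
}

# Usage sketch (POD and MODEL_NAME are placeholders):
# kubectl port-forward "pod/$POD" 8501:8501 &
# sleep 2   # give the port-forward a moment to start
# curl -s "http://localhost:8501/v1/models/$MODEL_NAME" | model_is_available && echo "model is up"
```

If the endpoint reports the model as AVAILABLE but predictions still fail, the problem is more likely in the request or in what happens while serving it than in model loading.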

I’m also getting the same error.

Same error; I cannot get it to predict even on resnet50.

Looking at the pod status with kubectl get pods, I can see the pods restart after the request, so it looks like something crashes the pod when the request arrives.
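When pods restart right after a request, the previous container's exit code and logs usually say why. A minimal sketch: `describe_exit_code` is a hypothetical helper relying on the shell convention that exit codes above 128 mean "killed by signal (code - 128)" (e.g. 137 is SIGKILL, which is also what OOM kills show up as). `POD` and `CONTAINER` are placeholders.

```shell
#!/usr/bin/env bash
# describe_exit_code (hypothetical helper): turn a container's last exit code
# into a short human-readable description.
describe_exit_code() {
  local code="$1"
  if [ -z "$code" ]; then
    echo "no previous termination recorded"
  elif [ "$code" -gt 128 ]; then
    # By shell convention, codes above 128 mean "killed by signal (code - 128)".
    echo "killed by signal SIG$(kill -l $((code - 128)))"
  elif [ "$code" -eq 0 ]; then
    echo "exited normally"
  else
    echo "exited with error code $code"
  fi
}

# Usage sketch (POD and CONTAINER are placeholders; CONTAINER should be the
# tf-serving container, not a sidecar such as istio-proxy):
# code=$(kubectl get pod "$POD" -o jsonpath="{.status.containerStatuses[?(@.name==\"$CONTAINER\")].lastState.terminated.exitCode}")
# describe_exit_code "$code"
# kubectl logs "$POD" -c "$CONTAINER" --previous   # logs of the crashed container
```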

I tried following @tatoooo's suggestions, but the service looks healthy and the ports are open.

@d1ggs I faced the same issue, and the solution I provide here seems to solve it.

Concretely, you can try modifying the tf-serving image in the deployment manifests tf-serving/deployment-resnet50.yaml and tf-serving/deployment-resnet101.yaml by adding a specific version tag (such as 2.8.0). After that, you can reapply the deployment, e.g.:

kubectl apply -f tf-serving/deployment-resnet<version>.yaml

To make sure the deployment is updated, you can also delete the deployment before reapplying it.

kubectl delete deploy image-classifier-resnet<version>
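The pin-delete-reapply steps can be scripted roughly as follows. This is a sketch assuming the manifests reference the image unquoted as `tensorflow/serving` with no tag or with `:latest` (check your YAML first); `pin_serving_tag` is a hypothetical helper, while the file and deployment names follow the commands in this post.

```shell
#!/usr/bin/env bash
# pin_serving_tag (hypothetical helper): rewrite an untagged or ":latest"
# tensorflow/serving image line in a manifest to use TAG instead.
# Assumes the image is written unquoted; a backup is kept as <manifest>.bak.
pin_serving_tag() {
  local manifest="$1" tag="$2"
  sed -E -i.bak "s#(image:[[:space:]]*tensorflow/serving)(:latest)?[[:space:]]*\$#\1:${tag}#" "$manifest"
}

# Usage sketch: pin, delete, and reapply both deployments.
# for v in 50 101; do
#   pin_serving_tag "tf-serving/deployment-resnet${v}.yaml" 2.8.0
#   kubectl delete deploy "image-classifier-resnet${v}" --ignore-not-found
#   kubectl apply -f "tf-serving/deployment-resnet${v}.yaml"
# done
```

Lines that already carry a specific tag (e.g. `:2.7.0`) are left untouched, so the edit is safe to rerun.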

Hope this helps.

N.B. Other debugging methods you could try: