C3W3_Graded Lab_The status of all pods does not change to Running

W3_Graded Lab__Distributed Multi-worker TensorFlow Training on Kubernetes

The status of all pods does not change to Running.
(70/100)

kubectl get pods

Notice that the pods are named using the following convention: [JOB_NAME]-worker-[WORKER_INDEX].
Wait until the status of all pods changes to Running.
To retrieve the logs for the chief (worker 0), execute the following command. It will continue streaming the logs until the training program completes.
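For reference, the pod names in the output below suggest the job name is multi-worker, so the log command would look roughly like this (a sketch; substitute your own job name if it differs):

# Stream the chief's logs; JOB_NAME is inferred from the pod names in the transcript below.
JOB_NAME=multi-worker
kubectl logs --follow ${JOB_NAME}-worker-0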

student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ImagePullBackOff   0          5m5s
multi-worker-worker-1   0/1     ImagePullBackOff   0          5m5s
multi-worker-worker-2   0/1     ImagePullBackOff   0          5m5s
student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ImagePullBackOff   0          7m26s
multi-worker-worker-1   0/1     ImagePullBackOff   0          7m26s
multi-worker-worker-2   0/1     ImagePullBackOff   0          7m26s
student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ImagePullBackOff   0          11m
multi-worker-worker-1   0/1     ImagePullBackOff   0          11m
multi-worker-worker-2   0/1     ImagePullBackOff   0          11m
student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ImagePullBackOff   0          17m
multi-worker-worker-1   0/1     ImagePullBackOff   0          17m
multi-worker-worker-2   0/1     ImagePullBackOff   0          17m
student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ImagePullBackOff   0          37m
multi-worker-worker-1   0/1     ImagePullBackOff   0          37m
multi-worker-worker-2   0/1     ImagePullBackOff   0          37m
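When pods sit in ImagePullBackOff, kubectl describe shows the exact image reference and the pull failure in the Events section. This is a generic diagnostic step, not part of the lab instructions:

# Inspect the chief pod; the Events section at the bottom reports
# "Failed to pull image ..." together with the reason (image not found, access denied, etc.).
kubectl describe pod multi-worker-worker-0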

student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl logs --follow {JOB_NAME}-worker-0
Error from server (BadRequest): container "tensorflow" in pod "multi-worker-worker-0" is waiting to start: trying and failing to pull image
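Because the container never starts, there are no training logs to stream yet. A quick way to see which image the pod is actually trying to pull (plain kubectl, nothing lab-specific):

# Print the image reference from the chief pod's container spec.
kubectl get pod multi-worker-worker-0 -o jsonpath='{.spec.containers[*].image}'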

Hi @jschoi, the error message indicates that something went wrong while pulling the Docker image. Can you check the YAML file and make sure the image repository URL is correct?
There is a similar issue in this thread.
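If the URL looks right, it is also worth confirming that the image was actually built and pushed before the job was applied. A rough check, assuming the image is hosted in Container Registry under the lab project (PROJECT_ID is a placeholder):

# List the images available in the project's Container Registry.
gcloud container images list --repository gcr.io/PROJECT_ID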

Hi there, I have the same error. There's no part of the YAML file that has a repository URL.
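In case it helps: Kubernetes manifests usually don't have a field literally called "repository URL"; the image to pull is given by the image: field of the container spec. A quick way to find it, assuming the manifests sit in the lab-files directory you applied them from:

# Show every image reference in the manifests (file names may differ in your lab).
grep -n "image:" *.yaml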