C3W3_Graded Lab_The status of all pods does not change to Running

jschoi · July 25, 2021, 12:24pm

W3_Graded Lab__Distributed Multi-worker TensorFlow Training on Kubernetes

The status of all pods does not change to Running.
(70/100)

kubectl get pods

Notice that the pods are named using the following convention [JOB_NAME]-worker-[WORKER_INDEX].
Wait till the status of all pods changes to Running.
To retrieve the logs for the chief (worker 0) execute the following command. It will continue streaming the logs till the training program completes.

student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
multi-worker-worker-0 0/1 ImagePullBackOff 0 5m5s
multi-worker-worker-1 0/1 ImagePullBackOff 0 5m5s
multi-worker-worker-2 0/1 ImagePullBackOff 0 5m5s
student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
multi-worker-worker-0 0/1 ImagePullBackOff 0 7m26s
multi-worker-worker-1 0/1 ImagePullBackOff 0 7m26s
multi-worker-worker-2 0/1 ImagePullBackOff 0 7m26s
student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
multi-worker-worker-0 0/1 ImagePullBackOff 0 11m
multi-worker-worker-1 0/1 ImagePullBackOff 0 11m
multi-worker-worker-2 0/1 ImagePullBackOff 0 11m
student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
multi-worker-worker-0 0/1 ImagePullBackOff 0 17m
multi-worker-worker-1 0/1 ImagePullBackOff 0 17m
multi-worker-worker-2 0/1 ImagePullBackOff 0 17m
student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
multi-worker-worker-0 0/1 ImagePullBackOff 0 37m
multi-worker-worker-1 0/1 ImagePullBackOff 0 37m
multi-worker-worker-2 0/1 ImagePullBackOff 0 37m

student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl logs --follow {JOB_NAME}-worker-0
Error from server (BadRequest): container "tensorflow" in pod "multi-worker-worker-0" is waiting to start: trying and failing to pull image
student_04_594954e1dbf0@cloudshell:~/lab-files (qwiklabs-gcp-03-5790d7d6f5d1)$ kubectl logs --follow {JOB_NAME}-worker-0
Error from server (BadRequest): container "tensorflow" in pod "multi-worker-worker-0" is waiting to start: trying and failing to pull image

tranvinhcuong · July 26, 2021, 6:15am

Hi @jschoi, the error message indicates a problem pulling the Docker image. Can you check the YAML file and make sure the repository URL is correct?
There is a similar issue in this thread.
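
For anyone landing here with the same symptom, here is a rough way to follow up on that suggestion from Cloud Shell. This is only a sketch: `not_running_pods` is a hypothetical helper name, not part of the lab files, and it assumes the default `kubectl get pods` column order (NAME, READY, STATUS, ...) seen in the output posted above.

```shell
# Hypothetical helper (not part of the lab files): reads `kubectl get pods`
# output on stdin and prints the name and status of every pod that is not Running.
# Assumes the default column order NAME READY STATUS RESTARTS AGE.
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Usage against a live cluster (commented out so the snippet has no side effects):
#   kubectl get pods | not_running_pods
#   kubectl describe pod multi-worker-worker-0
#   kubectl get pod multi-worker-worker-0 -o jsonpath='{.spec.containers[*].image}'
```

The Events section at the end of the `kubectl describe pod` output typically names the image being pulled and the reason the pull failed, and the `jsonpath` query prints the image reference the pod was actually created with, which can then be compared against the registry path you pushed to.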

ZaheedaT · July 28, 2022, 8:32pm

Hi there, I have the same error. No part of the YAML file has a repository URL.
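
In a Kubernetes manifest the repository URL is usually not a separate field; it is the value of the container's `image:` key. Below is a minimal sketch for listing those values. `list_images` is a hypothetical helper name, the manifest path is whatever file the lab has you edit, and the helper assumes each `image:` key sits on the same line as its value.

```shell
# Hypothetical helper (not part of the lab files): prints the value of every
# `image:` key in the manifest passed as the first argument.
# Handles both "image: X" and "- image: X" (list item) forms on a single line.
list_images() {
  awk '$1 == "image:" { print $2 }
       $1 == "-" && $2 == "image:" { print $3 }' "$1"
}

# Usage (the path is a placeholder):
#   list_images path/to/manifest.yaml
```

If the printed value still contains a template placeholder, or points at a project or registry other than the one the image was pushed to, that would be consistent with the ImagePullBackOff status shown above.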
