C3W3-ImagePullBackOff, not "Running"

Hi,

I am at the final stage of the C3W3 assignment “Distributed MultiWorker Tensorflow Training…”.
The following command is expected to show “Running” in the STATUS column for every worker pod, but that is not what I get.

kubectl get pods

Output should be:

NAME                    READY   STATUS    RESTARTS   AGE
multi-worker-worker-0   0/1     Running   0          21m
multi-worker-worker-1   0/1     Running   0          21m
multi-worker-worker-2   0/1     Running   0          21m

Instead, this is the output I get. According to the assignment, the STATUS column should show “Running”.

NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ErrImagePull       0          21m
multi-worker-worker-1   0/1     ImagePullBackOff   0          21m
multi-worker-worker-2   0/1     ErrImagePull       0          21m

When I run the following command to retrieve the logs:

kubectl logs --follow ${JOB_NAME}-worker-0

the output is:

Error from server (BadRequest): container "tensorflow" in pod "multi-worker-worker-0" is waiting to start: trying and failing to pull image

I got this error because I had not updated tfjob.yaml properly. Look carefully at the "Updating the TFJob manifest" section and make sure you update the image name and the two path arguments:

kind: TFJob
metadata:
  name: multi-worker
spec:
  cleanPodPolicy: None
  tfReplicaSpecs:
    Worker:
      replicas: 3
      template:
        spec:
          containers:
            - name: tensorflow
              image: <CHANGE THIS>
              args:
                - --epochs=5
                - --steps_per_epoch=100
                - --per_worker_batch=64
                - --saved_model_path=<CHANGE THIS>
                - --checkpoint_path=<CHANGE THIS>
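
If it helps, here is a minimal sketch of a check that no placeholder was left behind before submitting the job. The helper name check_placeholders is made up for illustration, and it assumes the placeholders still read exactly as in the manifest above.

```shell
# Hypothetical helper (not part of the assignment): list any placeholder
# lines that are still present in a manifest file.
check_placeholders() {
  # $1 is the path to the manifest, e.g. tfjob.yaml
  if grep -n '<CHANGE THIS>' "$1"; then
    # grep printed the offending lines with their line numbers
    echo "Placeholders remain in $1; edit them before submitting the job." >&2
    return 1
  fi
  echo "No placeholders left in $1."
}
```

Running check_placeholders tfjob.yaml in the directory that holds the manifest should print the lines that still need editing, or confirm that none are left. It only catches untouched placeholders; it cannot tell whether the image name you typed actually exists in your registry.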


I did update the .yml file within the editor as you said.
However, it still gives me the same error message.
Any suggestions?

Thanks this helped me.

I am also getting the same error, even after updating tfjob.yaml.

I got the same error. Updating the tfjob.yaml file alone didn't solve the issue for me.
So I deleted each pod with the command "kubectl delete pods multi-worker-worker-x" (where x is the worker number) and submitted the job again. That solved my issue.
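
For anyone who wants the steps in one place, this is a rough sketch of what I mean. The function name resubmit_job is made up, and I am assuming the job is submitted with kubectl apply -f on the manifest and that there are three workers, as in the manifest shown earlier in this thread.

```shell
# Hypothetical helper (not from the assignment): delete the stuck worker
# pods, resubmit the job from the edited manifest, then list the pods.
resubmit_job() {
  job="$1"        # job name, e.g. multi-worker
  manifest="$2"   # manifest path, e.g. tfjob.yaml (assumed file name)
  # Three workers are assumed, matching "replicas: 3" in the manifest.
  for i in 0 1 2; do
    kubectl delete pod "${job}-worker-${i}" --ignore-not-found
  done
  # Assumption: the job is submitted by applying the manifest.
  kubectl apply -f "$manifest"
  kubectl get pods
}
```

Calling resubmit_job multi-worker tfjob.yaml after fixing the manifest should recreate the pods with the corrected image name. If the status is still ErrImagePull, kubectl describe pod multi-worker-worker-0 shows the exact image the pod is trying to pull in its Events section.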


How can I edit the YAML file? Is there a link or a screenshot?

I am also stuck.
How did you do this? I can't complete the course because of this.