Week3: Graded external tool: Error in starting job

Hi, how do I fix this?

kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ImagePullBackOff   0          10m
multi-worker-worker-1   0/1     ImagePullBackOff   0          10m
multi-worker-worker-2   0/1     ImagePullBackOff   0          10m

kubectl describe pod multi-worker-worker-0
Name: multi-worker-worker-0
Namespace: default
Priority: 0
Node: gke-cluster-1-default-pool-8c9b34ca-ml4l/10.128.0.4
Start Time: Fri, 10 Sep 2021 15:50:54 +0000
Labels: controller-name=tf-operator
group-name=kubeflow.org
job-name=multi-worker
job-role=master
tf-job-name=multi-worker
tf-replica-index=0
tf-replica-type=worker
Annotations:
Status: Pending
IP: 10.92.0.8
IPs:
IP: 10.92.0.8
Controlled By: TFJob/multi-worker
Containers:
tensorflow:
Container ID:
Image: mnist
Image ID:
Port: 2222/TCP
Host Port: 0/TCP
Args:
--epochs=5
--steps_per_epoch=100
--per_worker_batch=64
--saved_model_path=gs://qwiklabs-gcp-04-918586e665a4-bucket/saved_model_dir
--checkpoint_path=gs://qwiklabs-gcp-04-918586e665a4-bucket/checkpoints
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment:
TF_CONFIG: {"cluster":{"worker":["multi-worker-worker-0.default.svc:2222","multi-worker-worker-1.default.svc:2222","multi-worker-worker-2.default.svc:2222"]},"task":{"type":"worker","index":0},"environment":"cloud"}
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-gfpw2 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-gfpw2:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-gfpw2
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Normal Scheduled 11m default-scheduler Successfully assigned default/multi-worker-worker-0 to gke-cluster-1-default-pool-8c9b34ca-ml4l
Normal Pulling 9m29s (x4 over 11m) kubelet Pulling image "mnist"
Warning Failed 9m28s (x4 over 11m) kubelet Failed to pull image "mnist": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/mnist:latest": failed to resolve reference "docker.io/library/mnist:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Warning Failed 9m28s (x4 over 11m) kubelet Error: ErrImagePull
Normal BackOff 5m52s (x21 over 11m) kubelet Back-off pulling image "mnist"
Warning Failed 49s (x43 over 11m) kubelet Error: ImagePullBackOff

Hi! It looks like you forgot to change the image in the YAML file; it's still mnist. The events show the kubelet resolving that bare name to docker.io/library/mnist:latest, which doesn't exist, so the pull is denied. You need to change it to the image you pushed to Container Registry. That would be something like gcr.io/qwiklabs-gcp-04-918586e665a4/mnist-train. The project ID changes if you restart the lab, so please re-check the new project ID provided. Hope this helps!
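
For reference, here is a rough sketch of where that image line sits in a TFJob manifest. The job name, container name, replica count, and args are taken from the describe output above; the apiVersion and overall layout are assumptions about the lab's file, and <PROJECT_ID> and <BUCKET> are placeholders for your own values. Note that a Container Registry reference starts with the registry host (gcr.io), while gs:// is only for Cloud Storage paths such as the checkpoint and saved model locations.

```yaml
# Sketch only: the lab's actual manifest may differ in layout and fields.
apiVersion: kubeflow.org/v1   # assumed; the pod labels only show group kubeflow.org
kind: TFJob
metadata:
  name: multi-worker
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 3
      template:
        spec:
          containers:
            - name: tensorflow
              # Was "image: mnist", which resolves to docker.io/library/mnist:latest.
              # <PROJECT_ID> is a placeholder for the current lab project ID.
              image: gcr.io/<PROJECT_ID>/mnist-train
              args:
                - --epochs=5
                - --steps_per_epoch=100
                - --per_worker_batch=64
                - --saved_model_path=gs://<BUCKET>/saved_model_dir
                - --checkpoint_path=gs://<BUCKET>/checkpoints
```

After editing the file, the existing job most likely needs to be deleted and the manifest applied again so that new worker pods are created with the corrected image.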

Yes, it works after changing the image name.

Thank you Chris F.

Regards,
Herojit