C3W3 Distributed Multi-worker TensorFlow Training on Kubernetes - Submitting TF job failed

Dear team,
I receive the error message "TFJob multi-worker has failed because 1 Worker replica(s) failed.", although I believe the YAML file was set up correctly. Please find a screenshot attached.

Likewise, please find a screenshot of the error message for your reference.

Please advise what to do.
Thanks in advance.


Hi Jorn! Sorry for the late reply. If you haven’t solved the issue yet, it looks like you forgot to append the -bucket suffix in the saved model and checkpoint paths. That is part of the bucket name as mentioned in the instructions (i.e. gs://{PROJECT_ID}-bucket). Hope this helps!
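
To double-check the paths before resubmitting, here is a minimal Python sketch of how they should look. It only assumes the gs://{PROJECT_ID}-bucket naming from the instructions. The helper names, the example project ID, and the directory names ("saved_model_dir", "checkpoints") are hypothetical placeholders, so swap in whatever your YAML actually uses.

```python
# Sketch only: builds and checks GCS paths that follow the
# gs://{PROJECT_ID}-bucket convention from the lab instructions.
# Directory names and the project ID below are hypothetical placeholders.

def bucket_uri(project_id: str) -> str:
    """Return the bucket URI, including the required -bucket suffix."""
    return f"gs://{project_id}-bucket"


def job_paths(project_id: str,
              saved_model_dir: str = "saved_model_dir",
              checkpoint_dir: str = "checkpoints") -> dict:
    """Build the saved model and checkpoint paths under the bucket."""
    base = bucket_uri(project_id)
    return {
        "saved_model_path": f"{base}/{saved_model_dir}",
        "checkpoint_path": f"{base}/{checkpoint_dir}",
    }


def has_bucket_suffix(path: str, project_id: str) -> bool:
    """True if the path points into gs://{project_id}-bucket."""
    prefix = bucket_uri(project_id)
    return path == prefix or path.startswith(prefix + "/")


if __name__ == "__main__":
    project = "my-project-123"  # hypothetical project ID
    for name, path in job_paths(project).items():
        print(name, path, has_bucket_suffix(path, project))
    # A path that is missing the suffix fails the check:
    print(has_bucket_suffix(f"gs://{project}/checkpoints", project))
```

If has_bucket_suffix returns False for either path in your YAML, the suffix is most likely what is missing.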