Update the image and args

Next, update the --saved_model_path and --checkpoint_path arguments by replacing the bucket token with the name of your Cloud Storage bucket. Recall that your bucket name is [YOUR_PROJECT_ID]-bucket.

What do we need to do here?

Hi @quartermaine, welcome to the course!

During model training you need somewhere to store the trained model and its intermediate results (checkpoints); in this lab that is a Cloud Storage bucket. To use it, you need to put the name of the bucket you created in the previous steps into those two arguments.
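
As a rough sketch of what the substitution amounts to, the snippet below builds the bucket name from a project ID and prints the two arguments. The project ID `my-project-id`, the `gs://` sub-paths, and the `--flag=value` form are assumptions for illustration only; keep whatever paths and formatting your tfjob.yaml already uses and change just the bucket part.

```shell
# Hypothetical project ID. In Cloud Shell you could instead use:
#   PROJECT_ID=$(gcloud config get-value project)
PROJECT_ID="my-project-id"

# The lab states the bucket name is [YOUR_PROJECT_ID]-bucket
BUCKET="${PROJECT_ID}-bucket"

# The sub-paths here are made up; only the bucket name comes from the lab text
echo "--saved_model_path=gs://${BUCKET}/saved_model_dir"
echo "--checkpoint_path=gs://${BUCKET}/checkpoints"
```

The printed values show roughly how the two arguments should look once the bucket token has been replaced.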

Hope it helps,
Cuong

Hi @tranvinhcuong,
Thank you for the information. I have edited the tfjob.yaml file with the storage bucket, but when I run the command kubectl logs --follow ${JOB_NAME}-worker-0 I get:

Error from server (BadRequest): container "tensorflow" in pod "multi-worker-worker-0" is waiting to start: trying and failing to pull image

Hi @quartermaine,
The error message says the image could not be pulled, so something is wrong with the Docker image reference. Can you check that you have the correct tag for the image?
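
One possible way to check this from Cloud Shell is sketched below. The helper names `pod_images` and `pod_events` are made up for this post; the kubectl subcommands they wrap are standard, and the pod name in the usage comments is the one from the error above.

```shell
# Show the image (including its tag) each container in a pod is configured to pull
pod_images() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].image}'
}

# Show the pod's Events section, where image pull failures are reported
pod_events() {
  kubectl describe pod "$1" | sed -n '/^Events:/,$p'
}

# Usage (needs access to the lab cluster):
#   pod_images multi-worker-worker-0
#   pod_events multi-worker-worker-0
```

Comparing the image name and tag printed by `pod_images` with the image you actually built and pushed should show whether the tag in tfjob.yaml is the problem.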

Hi @tranvinhcuong,
I was able to pass the lab. As you mentioned, the image tag was actually wrong.

Where do you find the tfjob.yaml file? Thanks.

Hi @qchaldemer,

In Cloud Shell, click the Open Editor button and you will see a list of files. The tfjob.yaml file is located in the lab-files folder, and you can edit it there.
