Next, update the --saved_model_path and --checkpoint_path arguments by replacing the bucket token with the name of your Cloud Storage bucket. Recall that your bucket name is [YOUR_PROJECT_ID]-bucket.
What do we need to do here?
Hi @quartermaine, welcome to the course!
When you train a model, you need somewhere to store the final model and its intermediate results (checkpoints). In this lab that place is a Cloud Storage bucket, so you need to put the name of the bucket you created in the previous steps into those two arguments.
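In case it helps, here is a minimal Python sketch of the substitution. It assumes the two arguments are written in the form gs://bucket/..., which is my guess at how the placeholder appears in the file. The function name fill_bucket, the project ID my-project, and the paths saved_model_dir and checkpoints are made up for illustration.

```python
import re


def fill_bucket(yaml_text: str, project_id: str) -> str:
    """Point the two GCS path arguments at the <project_id>-bucket bucket."""
    bucket_name = f"{project_id}-bucket"
    # Only touch the two arguments named in the lab step. The "gs://bucket/"
    # placeholder form is an assumption about how the file is written.
    pattern = r"(--(?:saved_model_path|checkpoint_path)=gs://)bucket(?=/)"
    return re.sub(pattern, lambda m: m.group(1) + bucket_name, yaml_text)


# Hypothetical excerpt; the real tfjob.yaml may use different paths.
sample = """\
        args:
        - --saved_model_path=gs://bucket/saved_model_dir
        - --checkpoint_path=gs://bucket/checkpoints
"""

print(fill_bucket(sample, "my-project"))
```

Editing the two lines by hand in the editor works just as well; the sketch only shows which part of each argument is meant to change.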
Hope it helps,
Cuong
Hi @tranvinhcuong,
Thank you for the information. I have edited the tfjob.yaml file with the storage bucket, but when I run the command
kubectl logs --follow ${JOB_NAME}-worker-0
I get
Error from server (BadRequest): container "tensorflow" in pod "multi-worker-worker-0" is waiting to start: trying and failing to pull image
Hi @quartermaine,
The error message says something is wrong with the Docker image. Can you check that you have the correct tag for the image?
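One way to double-check is to list every image reference in the file and compare the tag with the image you actually built. The following is a rough sketch, not part of the lab: the helper name image_refs and the image gcr.io/example-project/example-image:v1 are hypothetical, and references pinned by digest (with @sha256:...) are not handled.

```python
import re


def image_refs(yaml_text: str) -> list[tuple[str, str]]:
    """Return (repository, tag) for every `image:` line; tag is "" if absent."""
    refs = []
    for match in re.finditer(r"^\s*-?\s*image:\s*(\S+)", yaml_text, flags=re.MULTILINE):
        ref = match.group(1).strip("'\"")
        # A tag is whatever follows a ":" that comes after the last "/",
        # so a registry port such as localhost:5000/... is not mistaken for one.
        last_slash = ref.rfind("/")
        colon = ref.rfind(":")
        if colon > last_slash:
            refs.append((ref[:colon], ref[colon + 1:]))
        else:
            refs.append((ref, ""))
    return refs


# Hypothetical excerpt; the container name comes from the error message,
# the image name is a placeholder.
sample = """\
      containers:
      - name: tensorflow
        image: gcr.io/example-project/example-image:v1
"""

print(image_refs(sample))
```

Running kubectl describe pod on the failing pod also shows the pull events, including the exact image reference that could not be pulled.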
Hi @tranvinhcuong,
I was able to pass the lab. As you suggested, the image tag was wrong.
Where do you find the tfjob.yaml file? Thanks.
Hi @qchaldemer,
In Cloud Shell, press the Open Editor button and you will see a list of files. The tfjob.yaml file is located in the lab-files folder, and you can edit it there.