Error in updating TFJob manifest

apica.sharma15 · October 7, 2022, 2:50pm

Please help me solve this.

apica.sharma15 · October 7, 2022, 3:32pm

@chris.favila, please help. I have tried it 12 times now.

chris.favila · October 7, 2022, 4:37pm

Hi Apica! Can you post your tfjob.yaml here? Please check that you edited it correctly. You should modify these fields:

image
--saved_model_path
--checkpoint_path

All three fields include your project ID. Please check the sample file in the lab instructions. You can edit the file with the Cloud Editor: in the upper right of the Cloud Shell terminal you are typing in, there is a Cloud Editor button. Click it to open a file-manager-like interface, then navigate to tfjob.yaml under the lab-files folder. From there, you can edit the file so it has the correct image and paths. Save the file, then click the Cloud Shell button again to return to your terminal and run the next commands.
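If you want to double-check the edit before applying it, the three fields can also be pulled out of the manifest programmatically. The Python below is only a sketch: it assumes the YAML has already been parsed into a dict, that the file follows the kubeflow.org/v1 TFJob layout (spec.tfReplicaSpecs.Worker.template.spec.containers), and that the flags appear in the container's command or args list. The functions extract_fields and missing_project_id, the "Worker" replica type, and every sample value are hypothetical, not taken from the lab.

```python
# Sketch: pull the three fields the lab asks you to edit out of a TFJob
# manifest that has already been parsed into a Python dict (for example
# with a YAML library). Assumes the kubeflow.org/v1 TFJob layout
# spec.tfReplicaSpecs.<ReplicaType>.template.spec.containers[0];
# the "Worker" replica type and all sample values are hypothetical.

FLAGS = ("--saved_model_path", "--checkpoint_path")


def extract_fields(manifest, replica="Worker"):
    """Return {"image": ..., "--saved_model_path": ..., "--checkpoint_path": ...}."""
    spec = manifest["spec"]["tfReplicaSpecs"][replica]["template"]["spec"]
    container = spec["containers"][0]
    fields = {"image": container.get("image", "")}
    # Flags may sit in either "command" or "args" depending on the manifest.
    tokens = list(container.get("command", [])) + list(container.get("args", []))
    for i, token in enumerate(tokens):
        for flag in FLAGS:
            if token.startswith(flag + "="):
                # "--flag=value" written as one list item
                fields[flag] = token.split("=", 1)[1]
            elif token == flag and i + 1 < len(tokens):
                # "--flag", "value" written as two list items
                fields[flag] = tokens[i + 1]
    return fields


def missing_project_id(fields, project_id):
    """Names of the fields whose value does not contain the project ID."""
    names = ("image",) + FLAGS
    return [name for name in names if project_id not in fields.get(name, "")]


# Usage with made-up values:
manifest = {"spec": {"tfReplicaSpecs": {"Worker": {"template": {"spec": {
    "containers": [{
        "image": "gcr.io/my-project/my-image",
        "args": ["--saved_model_path=gs://my-project-bucket/saved_model",
                 "--checkpoint_path", "gs://my-project-bucket/checkpoints"],
    }]}}}}}}
print(missing_project_id(extract_fields(manifest), "my-project"))  # []
```

An empty list means every field mentions the project ID; it does not prove the rest of each string is correct.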

Also, can you tell us which command you were running when you got the error in the 4th screenshot above? There's an "object must be a non-empty string" error that I haven't encountered in this lab yet. Knowing the command might help us debug it.

Lastly, please also provide a screenshot of the output of this command:

gcloud container images list

Just leave them here and I'll take a look as soon as I can tomorrow. I have to log out now because it's already past midnight on my side of the world. Hope you understand. Rest assured that you will be able to complete this lab. Thanks!

apica.sharma15 · October 7, 2022, 6:08pm

apica.sharma15 · October 7, 2022, 6:10pm

Tried again, and this time I'm getting an error.

chris.favila · October 8, 2022, 11:33am

Hi Apica. The tfjob.yaml looks correct. I’ll try this out and update you asap.

By the way, when you reply, please use the reply button at the bottom left of this box instead of the blue Reply button at the bottom of the whole thread. I think that one only sends notifications to the owner of the topic (i.e. you). Thanks!

chris.favila · October 8, 2022, 1:21pm

Hello again. I retried the lab and did not run into any issues. On your next attempt, please send screenshots of the output of these three commands when you reach them in the lab:

gsutil ls - The result should look something like this:

[Screenshot: example gsutil ls output]
gcloud container images list - It should look something like this:

[Screenshot: example gcloud container images list output]

JOB_NAME=multi-worker
kubectl describe tfjob $JOB_NAME

You sent something similar above. It will look something like this. Notice that the image and bucket names match the output of the first two commands above.
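Since the image and bucket names in tfjob.yaml should match the output of the first two commands, the comparison can be sketched in Python as below. This is a hypothetical helper, not part of the lab: it assumes you paste the command output in as strings, that bucket lines from gsutil ls look like gs://name/, and that gcloud container images list prints a NAME header followed by one image path per line. All function names and sample values are made up for illustration.

```python
# Sketch: compare the values in tfjob.yaml with what `gsutil ls` and
# `gcloud container images list` printed. Paste each command's output in
# as a string; `fields` is a plain dict of the three edited values.
# The sample strings below are made up, not real lab output.

FLAGS = ("--saved_model_path", "--checkpoint_path")


def listed_buckets(gsutil_ls_output):
    """Bucket names from `gsutil ls` lines such as 'gs://name/'."""
    names = set()
    for line in gsutil_ls_output.splitlines():
        line = line.strip()
        if line.startswith("gs://"):
            names.add(line[len("gs://"):].rstrip("/"))
    return names


def listed_images(images_list_output):
    """Image paths from `gcloud container images list`; skips the NAME
    header and any explanatory lines (no '/' or containing spaces)."""
    return {
        line.strip()
        for line in images_list_output.splitlines()
        if "/" in line and " " not in line.strip()
    }


def bucket_of(path):
    """'gs://name/dir' -> 'name'; None for anything that is not a gs:// path."""
    if not path.startswith("gs://"):
        return None
    return path[len("gs://"):].split("/", 1)[0]


def without_tag(image):
    """Drop a ':tag' or '@digest' from the last path segment."""
    head, _, last = image.rpartition("/")
    last = last.split("@", 1)[0].split(":", 1)[0]
    return f"{head}/{last}" if head else last


def cross_check(fields, gsutil_ls_output, images_list_output):
    """List mismatches between the manifest values and the command output."""
    problems = []
    buckets = listed_buckets(gsutil_ls_output)
    for flag in FLAGS:
        bucket = bucket_of(fields.get(flag, ""))
        if bucket not in buckets:
            problems.append(f"{flag}: bucket {bucket!r} not in gsutil ls output")
    image = without_tag(fields.get("image", ""))
    if image not in listed_images(images_list_output):
        problems.append(f"image: {image!r} not in gcloud container images list output")
    return problems


print(cross_check(
    {"image": "gcr.io/my-project/my-image:latest",
     "--saved_model_path": "gs://my-project-bucket/saved_model",
     "--checkpoint_path": "gcr.io/my-project-bucket/checkpoints"},
    "gs://my-project-bucket/\n",
    "NAME\ngcr.io/my-project/my-image\n",
))  # ['--checkpoint_path: bucket None not in gsutil ls output']
```

In the sample, the checkpoint path has gcr.io where the bucket should be, so no bucket can be read from it and the check flags it.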

After that, the training should start. Hope you're able to complete the lab on your next attempt!

chris.favila · October 8, 2022, 1:26pm

Oh, I think I see the issue now. In your tfjob.yaml, a gcr.io string was prepended to the paths. It should be removed: gcr.io points to Google Container Registry, and that's not where your buckets are stored. Please see the sample output in my post above. Thanks!

apica.sharma15 · October 9, 2022, 1:51pm

I did not see any notice about a limit on the number of attempts before. Now it says "your quota has been exceeded for the lab".
What do I do?

apica.sharma15 · October 9, 2022, 1:57pm

Okay, I checked the other thread about this and contacted support. The issue is solved.

apica.sharma15 · October 9, 2022, 2:20pm

I get the same error with and without gcr.io in both the image path and the saved model path.

chris.favila · October 9, 2022, 11:30pm

Hi Apica. Looks like you misspelled qwiklabs. It is shown as qwicklabs in the screenshot. Kindly refer to my instructions in this post to make sure you don’t misspell anything and to ensure the resources are properly created. Please provide those three screenshots next time for easier debugging.

Also, please do not remove gcr.io from the image field. You only need to remove it from saved_model_path and checkpoint_path. As mentioned before, the strings you put there depend on the output of gsutil ls and gcloud container images list. Thanks.
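The rules in this post (keep gcr.io in the image, drop it from the two paths, spell qwiklabs correctly) can be written as a small check. This is a minimal sketch assuming the three edited values are already in a plain dict; lint_fields and the sample values are hypothetical, not part of the lab.

```python
# Sketch: apply the gcr.io and spelling rules to a plain dict of the three
# edited values. Names and sample values are hypothetical.

FLAGS = ("--saved_model_path", "--checkpoint_path")


def lint_fields(fields):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    # The image lives in Container Registry, so it keeps its gcr.io prefix.
    if not fields.get("image", "").startswith("gcr.io/"):
        problems.append("image: should start with gcr.io/")
    # The two paths point at a Cloud Storage bucket, so no gcr.io in them.
    for flag in FLAGS:
        if "gcr.io" in fields.get(flag, ""):
            problems.append(f"{flag}: remove gcr.io")
    # Catch the qwicklabs/qwiklabs typo in any field.
    for name, value in fields.items():
        if "qwicklabs" in value:
            problems.append(f"{name}: 'qwicklabs' should be 'qwiklabs'")
    return problems


for problem in lint_fields({
    "image": "gcr.io/my-project/my-image",
    "--saved_model_path": "gs://qwicklabs-my-project-bucket/saved_model",
    "--checkpoint_path": "gcr.io/my-project-bucket/checkpoints",
}):
    print(problem)
```

The checks are plain substring tests, so they only catch the specific mistakes discussed in this thread.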

apica.sharma15 · October 10, 2022, 6:54am

@chris.favila thank you. It is done.

chris.favila · October 10, 2022, 8:08am

Awesome! Glad you were able to complete it. Next time, kindly create the topic in the correct course category so the course mentors can see it. This was initially posted in the News and Announcements category and they are not monitoring that. Thanks!

HarryXPan · February 22, 2023, 7:44am

For anyone who lands on this thread because of the "object must be a non-empty string" error: please check whether you forgot to include -bucket in the bucket name in the tfjob.yaml file.
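A quick way to spot the missing -bucket is to pull the bucket name out of each gs:// path and look at it. The sketch below assumes -bucket is the final part of the bucket name and that the paths use the gs:// scheme; both are assumptions, as are the helper names and sample paths. It uses only the standard library's urllib.parse.urlparse, which puts the bucket in netloc for gs:// URLs.

```python
# Sketch: flag bucket paths whose bucket name lacks the "-bucket" part.
# Assumes (not confirmed by the thread) that it comes at the end of the
# bucket name; helper names and sample paths are hypothetical.
from urllib.parse import urlparse


def bucket_name(gcs_path):
    """'gs://name/dir' -> 'name'; None if the path is not a gs:// URL."""
    parsed = urlparse(gcs_path)
    return parsed.netloc if parsed.scheme == "gs" else None


def missing_bucket_suffix(fields, suffix="-bucket"):
    """Flags whose bucket name does not end with the expected suffix."""
    flags = ("--saved_model_path", "--checkpoint_path")
    return [f for f in flags
            if not (bucket_name(fields.get(f, "")) or "").endswith(suffix)]


print(missing_bucket_suffix({
    "--saved_model_path": "gs://my-project/saved_model",
    "--checkpoint_path": "gs://my-project-bucket/checkpoints",
}))  # ['--saved_model_path']
```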

Isics · May 3, 2023, 11:35am

Yep, that was me. Thanks!
