C3W3_Graded Lab

image


Hello, Sir, I have checked all the answers to similar issues, but I still cannot get the worker status to change to Running, and I am not sure what the problem is. Also, my time for the lab is about to run out. Can I redo it later with the same resources allocated?
Thank you!

Hi @XIAO_XIAO , welcome to the community!

Could you please explain a bit more what you are trying to do?

Is this error happening when running the routine on item 9, the one that takes 10-15 minutes? If not, when is this happening to you?

I look forward to your reply.

Juan

Hi Juan, thank you for the quick reply. This is the week 3 assignment. The error occurs at Task 6: after I submit the TFJob in Task 5, the workers never start and the status shows that they failed to pull the image. From the screenshot you can see that I already changed the YAML file to use my bucket name, so I am confused and do not know where the problem is. Thank you!

Thanks for the reply @XIAO_XIAO. Based on your response, I suspect that this question is not from the Machine Learning Specialization but from a TensorFlow specialization. Please correct me if I am wrong.

No, it is for the Machine Learning Engineering for Production (MLOps) Specialization, third course: Machine Learning Modeling Pipelines in Production. The assignment is week 3, Distributed Multi-worker TensorFlow Training on Kubernetes. The problem occurs at Tasks 5 and 6: after I submit the TFJob, the job never runs, and the workers always fail to pull the image. Let me know if you need more information. Thank you!

@XIAO_XIAO thank you for the clarification. I have reclassified your case from “Machine Learning Specialization” to “Machine Learning Engineering for Production (MLOps)”.

I am sorry that I cannot help you with this case, as I have not done that specialization. With this reclassification, a mentor assigned to this specialization will be able to assist you.

I wish you success in your endeavor!

Juan

Thank you very much!

Juan Olano via DeepLearning.AI <notifications@dlai.discoursemail.com> wrote on Saturday, October 22, 2022 at 22:24:

@XIAO_XIAO
Please fix the yaml file. The image entry in the yaml file should match the tag you created and pushed to the registry. It’s not just mnist.
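To illustrate the mismatch described above, here is a minimal Python sketch that scans a manifest for `image:` entries and reports any that differ from the tag that was pushed. The manifest contents, the job name, and the registry path `gcr.io/example-project/mnist-train` are hypothetical placeholders, not the lab's actual values, and the line-based regex is a simplification rather than a full YAML parser.

```python
import re

# Matches lines such as "image: foo" or "- image: foo" and captures the value.
# This is a simple line-based scan, not a full YAML parser.
IMAGE_LINE = re.compile(r"^\s*(?:-\s*)?image:\s*['\"]?([^'\"\s#]+)", re.MULTILINE)


def find_image_refs(manifest_text):
    """Return the value of every `image:` key found in the manifest text."""
    return IMAGE_LINE.findall(manifest_text)


def mismatched_images(manifest_text, pushed_ref):
    """Return the image entries that differ from the reference that was pushed."""
    return [ref for ref in find_image_refs(manifest_text) if ref != pushed_ref]


# Hypothetical TFJob manifest where the image entry is only "mnist".
EXAMPLE_MANIFEST = """\
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: example-job
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 3
      template:
        spec:
          containers:
          - name: tensorflow
            image: mnist
"""

# Hypothetical tag; substitute the exact tag you built and pushed.
PUSHED_REF = "gcr.io/example-project/mnist-train"

if __name__ == "__main__":
    for ref in mismatched_images(EXAMPLE_MANIFEST, PUSHED_REF):
        print(f"image '{ref}' does not match pushed tag '{PUSHED_REF}'")
```

If the check reports a mismatch, the workers will likely try to pull an image name that does not exist in your registry, which is consistent with the "failed to pull image" status described earlier in the thread.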

Thank you very much for the instructions. I just tried to fix it and received a message that my quota has been exceeded for this lab. Is there any way I can refresh it and finish the lab? Thank you very much!

The issue is resolved, thank you very much for all the help!

Balaji Ambresh via DeepLearning.AI <notifications@dlai.discoursemail.com> wrote on Sunday, October 23, 2022 at 14:33: