Error in "TFX on Google Cloud Vertex Pipelines"

loka108 · June 30, 2022, 5:19pm

I am trying to finish the graded external tool lab “TFX on Google Cloud Vertex Pipelines”.
My pipeline run takes forever to complete the CsvExampleGen step, as shown in the link.

So I restarted the lab several times to try to finish it.
But now it won’t restart, and it shows the following error message.
Please let me know what I should do.

======================
Lab failed on Thu, 30 Jun 2022 13:05:41 -0400: Deployment Manager error: On [/deployments/qldm-23591937-4ebe9e728fc6d2bf/resources/startup-vm]: ["{"ResourceType":"compute.v1.instance","ResourceErrorCode":"ZONE_RESOURCE_POOL_EXHAUSTED","ResourceErrorMessage":"The zone 'projects/qwikl… Any credits or tokens you used to start this lab have been refunded.

balaji.ambresh · June 30, 2022, 5:25pm

ZONE_RESOURCE_POOL_EXHAUSTED indicates that the particular zone doesn’t have sufficient resources. Please get in touch with qwiklabs via chat and ask them to scale up resources.

One tip I can offer is to switch to a different zone and try the lab again. Start with zones under us-central1. They almost always seem to have resources.

loka108 · July 1, 2022, 12:31am

Thank you for your fast response.
I could access the lab when I started it at a different time.
But it still doesn’t work.
It showed the following error message:

Build Failed
Build failed with 524.
If you are experiencing the build failure after installing an extension (or trying to include previously installed…
if you specifically intended to install a source extension, please run ‘jupyter lab build’ on the server for full…

As you know, there is a pip install cell at the start of the Jupyter notebook.

!pip install --upgrade pip
!pip install --upgrade "tfx[kfp]<2"

It produces the following dependency error:

ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
explainable-ai-sdk 1.3.2 requires xai-image-widget, which is not installed.
tensorflow-io 0.21.0 requires tensorflow<2.7.0,>=2.6.0, but you have tensorflow 2.8.2 which is incompatible.
tensorflow-io 0.21.0 requires tensorflow-io-gcs-filesystem==0.21.0, but you have tensorflow-io-gcs-filesystem 0.23.1 which is incompatible.
pandas-profiling 3.0.0 requires tangled-up-in-unicode==0.1.0, but you have tangled-up-in-unicode 0.2.0 which is incompatible.
numba 0.54.1 requires numpy<1.21,>=1.17, but you have numpy 1.21.6 which is incompatible.
cloud-tpu-client 0.10 requires google-api-python-client==1.8.0, but you have google-api-python-client 1.12.11 which is incompatible.
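
The conflict lines above follow a regular shape, so they can be turned into a small table to see which pins clash (for example, tensorflow-io 0.21.0 wanting tensorflow 2.6.x while 2.8.2 is installed). The following is only a minimal sketch and not part of the lab notebook: the regular expression and the `parse_conflicts` helper are hypothetical names written for illustration, and they cover only the two message shapes shown above.

```python
import re

# Matches the two message shapes that appear in the output above:
#   "<pkg> <ver> requires <req>, but you have <name> <ver> which is incompatible."
#   "<pkg> <ver> requires <req>, which is not installed."
CONFLICT_RE = re.compile(
    r"^(?P<package>\S+) (?P<version>\S+) requires (?P<requirement>.+?), "
    r"(?:but you have (?P<found>\S+) (?P<found_version>\S+) which is incompatible"
    r"|which is not installed)\.$"
)


def parse_conflicts(pip_output):
    """Return one dict per conflict line found in pip's output (hypothetical helper)."""
    conflicts = []
    for line in pip_output.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


# Three of the lines quoted above, copied as-is.
sample = """\
explainable-ai-sdk 1.3.2 requires xai-image-widget, which is not installed.
tensorflow-io 0.21.0 requires tensorflow<2.7.0,>=2.6.0, but you have tensorflow 2.8.2 which is incompatible.
numba 0.54.1 requires numpy<1.21,>=1.17, but you have numpy 1.21.6 which is incompatible.
"""

for conflict in parse_conflicts(sample):
    # found_version is None when the required package is not installed at all.
    print(conflict["package"], "needs", conflict["requirement"],
          "| installed:", conflict["found_version"])
```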

If I use pip install --upgrade "tfx[kfp]" instead of pip install --upgrade "tfx[kfp]<2", the error disappears.

But my pipeline run still takes forever to complete the CsvExampleGen step.
I found the error message below in the logs of tfx-on-googlecloud in Vertex AI Workbench.
I think the metadata URL is not valid.
Please let me know what you think.

Jun 30 23:46:57 tfx-on-googlecloud bash[960]: 404 Client Error: Not Found for url: http://metadata/computeMetadata/v1/instance/attributes/use-collaborative
Jun 30 23:46:57 tfx-on-googlecloud bash[960]: 404 Client Error: Not Found for url: http://metadata/computeMetadata/v1/instance/attributes/use-collaborative
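
A note on the log above: the URL has the usual shape of a Compute Engine metadata lookup for a custom instance attribute, and a 404 on such a path normally means the attribute (here `use-collaborative`) is not set on the VM rather than that the URL is malformed. Whether this has anything to do with the CsvExampleGen hang is not established in this thread. The sketch below shows how such a lookup could be reproduced from inside the VM; `build_request` and `read_attribute` are hypothetical helper names, the base URL is copied from the log, and the `Metadata-Flavor: Google` header is what the metadata server requires. It will not resolve outside Google Cloud.

```python
import urllib.error
import urllib.request

# Base path copied from the log lines above; only reachable from inside a Google Cloud VM.
METADATA_BASE = "http://metadata/computeMetadata/v1/instance/attributes/"


def build_request(key):
    """Build the metadata request; the server rejects calls without this header."""
    return urllib.request.Request(
        METADATA_BASE + key, headers={"Metadata-Flavor": "Google"}
    )


def read_attribute(key, urlopen=urllib.request.urlopen):
    """Return the attribute value, or None if the server answers 404 (attribute not set)."""
    try:
        with urlopen(build_request(key), timeout=2) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Calling `read_attribute("use-collaborative")` on the workbench VM would be expected to return `None` if the attribute is simply absent, which would match the 404 in the log.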

loka108 · July 1, 2022, 1:06am

I think I’ve tried too many times.
I can’t access Google Cloud now; I get the message “Sorry, your quota has been exceeded for this lab.”
Please let me know what I should do to finish this lab.
I have already finished all the courses in the Machine Learning Engineering for Production (MLOps) Specialization except this external lab.

hhhhho · July 1, 2022, 3:55am

Lab failed on Thu, 30 Jun 2022 12:34:15 -0400: Deployment Manager error: On [/deployments/qldm-23585995-f6b7dd5ef24525ef/resources/startup-vm]: ["{"ResourceType":"compute.v1.instance","ResourceErrorCode":"ZONE_RESOURCE_POOL_EXHAUSTED","ResourceErrorMessage":"The zone 'projects/qwikl… Any credits or tokens you used to start this lab have been refunded.

I also received this message.
So, should I wait a while and try again at another time?

balaji.ambresh · July 1, 2022, 6:27am

@loka108 If you want to fix the library dependencies, the first step would be to downgrade tensorflow to a 2.6.x version. After that, install the rest of the dependencies. Here’s a reference notebook which uses a metadata path. See if providing a local path helps you make progress in the lab.
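
To confirm that a downgrade actually took effect in the running kernel, a quick check like the one below can help. This is a minimal sketch using only the standard library; `version_tuple`, `in_range` and `installed_version` are hypothetical helper names, and the simple numeric comparison ignores pre-release tags such as `rc1`. The 2.6.0 to 2.7.0 range comes from the tensorflow-io 0.21.0 requirement in the pip output earlier in the thread.

```python
import re
from importlib import metadata


def version_tuple(text):
    """Turn '2.6.5' into (2, 6, 5), padding short versions like '2.6' with zeros."""
    parts = [int(part) for part in re.findall(r"\d+", text)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def in_range(version, lower, upper):
    """True if lower <= version < upper, comparing major.minor.patch only."""
    return version_tuple(lower) <= version_tuple(version) < version_tuple(upper)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("tensorflow")
if found is None:
    print("tensorflow is not installed in this kernel")
else:
    print("tensorflow", found, "| within 2.6.x:", in_range(found, "2.6.0", "2.7.0"))
```

Running this after restarting the kernel shows which version the kernel really sees, which may differ from what the last pip command reported.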

Another approach is to ping qwiklabs. They’ll do the following:

1. File a ticket. (Ask them to provide the ticket id for your reference as well.)
2. Fix the notebook.
3. Send an email once the fix is in place.

Please reply to this post with the ticket id so that I can file the ticket to notify the staff as well.

@hhhhho What did qwiklabs help say?

hhhhho · July 1, 2022, 8:12am

I believe it is not a library dependency issue; there may be some server issues instead.
I tried many times with exactly the same steps, and I successfully ran the lab on the fourth or fifth attempt.

I personally suggest being patient with this lab …

loka108 · July 1, 2022, 3:13pm

Congratulations on finishing your lab.
I couldn’t finish my lab even though I tried 7~8 times, and now I can’t access the lab; I get the message “Sorry, your quota has been exceeded for this lab.”
I guess I’m out of luck.

loka108 · July 1, 2022, 3:23pm

I pinned the tensorflow version in the notebook as follows, but it did not work.
!pip install --upgrade pip
!pip install --upgrade tensorflow==2.6.0
!pip install --upgrade "tfx[kfp]<2"

I also created a new workbench with tensorflow 2.6 LTS myself in Vertex AI Workbench.
But it switched to tensorflow 2.8 automatically after a kernel reset.

What does “ticket” mean? How can I find the ticket?
I can’t access the lab now; I get the message “Sorry, your quota has been exceeded for this lab.”
If the ticket can only be found after the lab starts, I have no way to check it.

balaji.ambresh · July 1, 2022, 4:06pm

Contact qwiklabs via the help menu on the assignment page. Get in touch through the chat option and a ticket will be filed for you. Get the reference number for follow-up.

loka108 · July 1, 2022, 8:56pm

I chatted with qwiklabs.
The agent shared a video about how to do this lab.

I could not figure out what was different from what I did.
I think some cells in the notebook need to be run several times.
But that is just my guess. Anyhow, it is finished now.
Thank you for your help.

gbournigal · July 2, 2022, 1:12am

I am having the same problem and don’t know how to finish the lab.

When checking the version I am getting the following warning:

2022-07-02 00:53:26.268197: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library ‘libcudart.so.11.0’; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-07-02 00:53:26.268239: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

I don’t know if this could shed some light on the problem.

balaji.ambresh · July 2, 2022, 6:29am

@gbournigal These warnings indicate that you’re not using a GPU. Starting a TensorFlow Enterprise notebook without a GPU will produce these warnings. This has nothing to do with the correctness of the lab. Please check with qwiklabs (via chat) on what the underlying issue is.
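
The severity of each TensorFlow log line can be read from the single letter after the timestamp (I, W, E or F), which is a quick way to see that neither of the two lines quoted above is an error. A minimal sketch follows; the `LOG_RE` pattern and the `classify` helper are hypothetical and only handle lines in the format shown above.

```python
import re

# Format seen in the quoted lines: "<date> <time>: <level letter> <file>:<line>] <message>"
LOG_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+)\] (?P<message>.*)$"
)
LEVELS = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


def classify(line):
    """Return (severity, source, message) for a TensorFlow log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    return LEVELS[match.group("level")], match.group("source"), match.group("message")


lines = [
    "2022-07-02 00:53:26.268197: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
    "Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: "
    "cannot open shared object file: No such file or directory",
    "2022-07-02 00:53:26.268239: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] "
    "Ignore above cudart dlerror if you do not have a GPU set up on your machine.",
]

for line in lines:
    severity, source, _ = classify(line)
    print(severity, "from", source)
```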

loka108 · July 2, 2022, 7:41pm

According to the video, the dependency error in the pip install cell disappears if you run the cell a second time.
In the video, the version check cell shows no “could not load dynamic library” message, but I saw the same message too.
I think the tensorflow version in the lab is 2.8.0, while both the video and the workbench notes show tensorflow 2.7.0.
In my experience, that message also disappears when you run the cell again, just like the pip install cell.
After re-running some cells several times, I could finish this lab.
I still don’t understand why the error messages disappear after re-running. Anyway, it worked in my case.
