Challenges with Week 1 - Lab 1

Hi Guys,

Overall, I enjoy courses from DeepLearning.AI, but my first hands-on exercise hasn’t been a great experience.

The instructions are generally easy to follow, but it is difficult to find help when something goes wrong.

  1. First, I had an issue selecting the right instance type (size) (see “[Troubleshooting] Getting “ResourceLimitExceeded” when selecting ml.m5.2xlarge”, Course Q&A / Generative AI with Large Language Models - DeepLearning.AI). To be fair, this problem is mentioned on the Coursera page, but it would be good to say whether it is even worth attempting the lab if you hit this problem, or simply to give a link where you can report it and wait.
  2. I then seemed to end up on the default-sized instance (2 CPUs and 4 GB; I forgot the name, and the instance is blocked now). However, I faced another issue during the lab, related to dependent libraries, when trying to load the FLAN-T5 model:
    RuntimeError: Failed to import transformers.models.t5.modeling_t5 because of the following error (look up to see its traceback): No module named ‘torch._C’
    This issue has been reported several times, but it doesn’t seem to have a clear resolution:
    2.1 “Assignment 1; torch.c runtime error” (Generative AI with Large Language Models / GenAI with LLMs Resources - DeepLearning.AI): no resolution; the suggestion was to report the problem to support, and the learner went quiet.
    2.2 “Module Not Found Error: No module named ‘torch._C’” (Course Q&A / Generative AI with Large Language Models - DeepLearning.AI): a long conversation; the learner seems to have resolved the problem auto-magically after a persistent struggle, but got the account blocked in the end. No clear resolution (unless you count “have you tried turning it off and on again?” as one).
    2.3 “Getting “No module named ‘torch._C’” error for Lab 1” (Course Q&A / Generative AI with Large Language Models - DeepLearning.AI): the suggestion was to try another image, and the learner didn’t come back.

Suggestion 2.3 was something I could try. However, it seems the images have changed since then: instead of the DataScience image, we now have DataScience2.0 and DataScience3.0. I tried both; DataScience2.0 uses Python 3.8 and errored out even sooner, when trying to load the dataset. At this point my AWS learning account expired and I got “Account deactivated”.

So I had to submit the “GenAI with LLMs Lab Issue Report” form (google.com). I actually submitted it twice, because there is no notification of whether the report has been received, and it is not clear how to follow up.

I ended up spending ~2 hours trying to configure the lab environment instead of actually learning.

Outcome: I suppose I’ll wait until my AWS lab/learning account is reactivated and hope that this time I won’t run into the issues above.

I might not hit these issues once my account is back, but if I may ask DeepLearning.AI for some support, it would be great to:

  1. Have some sort of notification or status for the Google form report (so you know your problem is being taken care of).
  2. If possible, look into the unresolved torch._C problem; I’m sure it would help other people who are as frustrated as I was.
  3. List the full library requirements for the exercise/lab somewhere, in case learners want to run it in their own environment.

Thank you in advance.

An update.

My AWS account was reactivated and I completed Lab 1.

I still ran into the problem when selecting the instance type:
ResourceLimitExceeded: The account-level service limit ‘Studio KernelGateway Apps running on ml.m5.2xlarge instance’ is 1 Apps, with current utilization of 1 Apps and a request delta of 1 Apps.

However, one of the posts suggested opening the list of running instances and resources, shutting everything down, and then selecting the right instance size again. That seems to work: it looks like this training AWS account is limited to one ml.m5.2xlarge instance, so an app already running on it blocks a new one.
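The numbers in that error message explain why shutting everything down helps: the limit is 1 app, 1 is already in use, and the request adds 1, so 1 + 1 > 1 and the request is refused. As a small illustration, here is a sketch that parses the message and repeats that check. It assumes the message keeps exactly the wording quoted above (other AWS quota errors may be phrased differently), and `QuotaCheck` and `parse_quota_message` are hypothetical names.

```python
import re
from typing import NamedTuple


class QuotaCheck(NamedTuple):
    limit: int
    current: int
    delta: int

    @property
    def exceeded(self) -> bool:
        # The request fails when current usage plus the new request exceeds the limit.
        return self.current + self.delta > self.limit


# Assumption: matches only the wording of the message quoted in this post.
_PATTERN = re.compile(
    r"is (\d+) Apps, with current utilization of (\d+) Apps "
    r"and a request delta of (\d+) Apps"
)


def parse_quota_message(message: str) -> QuotaCheck:
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like a ResourceLimitExceeded error")
    limit, current, delta = (int(g) for g in match.groups())
    return QuotaCheck(limit, current, delta)


msg = (
    "ResourceLimitExceeded: The account-level service limit "
    "'Studio KernelGateway Apps running on ml.m5.2xlarge instance' is 1 Apps, "
    "with current utilization of 1 Apps and a request delta of 1 Apps."
)
check = parse_quota_message(msg)
print(check, check.exceeded)
```

For the message above the request is reported as exceeding the quota; once the running app is shut down (current utilization 0), the same request of 1 fits within the limit of 1.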

In case it helps anyone, you can navigate to the running instances via the icon tab on the left:

  1. Start from the view you land on when Studio opens.
  2. Click the icon with an empty square inside a circle.

I have already restarted my instances, but you’ll see something like this:
2.1. Running Instances:
    ml.m5.2xlarge
    ml.t3.medium
2.2. Running Apps:
    sagemaker-data-science1.0
    sagemaker-data-science1.0

Click the on/off (power) icon and shut down all instances; this stops all running resources.

Then you can select an instance type again.

That worked for me.
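If you prefer a script to clicking through the UI, the same cleanup can in principle be done with the AWS SDK for Python (boto3), using the SageMaker `list_apps` and `delete_app` calls. This is a hedged sketch, not an official procedure: the domain ID and user profile name are placeholders you would look up in your own Studio setup, the lab’s training account may not grant permission for these API calls, and the helper names are hypothetical. It only stops KernelGateway apps, since that is the app type named in the quota error; the JupyterServer app (the Studio UI itself) is left alone.

```python
from typing import Dict, Iterable, List


def running_kernel_gateways(apps: Iterable[Dict]) -> List[Dict]:
    """Select KernelGateway apps that may still hold an instance (InService or Pending)."""
    return [
        app
        for app in apps
        if app.get("AppType") == "KernelGateway"
        and app.get("Status") in ("InService", "Pending")
    ]


def shut_down_kernel_gateways(domain_id: str, user_profile: str) -> List[str]:
    """Request deletion of every running KernelGateway app for one Studio user profile."""
    import boto3  # imported here so the filter above can be used without boto3

    client = boto3.client("sagemaker")

    # list_apps is paginated; follow NextToken until all apps are collected.
    apps: List[Dict] = []
    kwargs = {"DomainIdEquals": domain_id, "UserProfileNameEquals": user_profile}
    while True:
        response = client.list_apps(**kwargs)
        apps.extend(response["Apps"])
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    deleted = []
    for app in running_kernel_gateways(apps):
        client.delete_app(
            DomainId=domain_id,
            UserProfileName=user_profile,
            AppType=app["AppType"],
            AppName=app["AppName"],
        )
        deleted.append(app["AppName"])
    return deleted
```

Deletion may take a short while to complete, so wait until the apps no longer show as running before selecting ml.m5.2xlarge again.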

Hi Jevgenijs, thank you for the feedback. The AWS bug is still being investigated by our partners. Thank you also for sharing the workaround from the other thread.

After you submit the Google form, the confirmation message shows the expected wait time for reactivating your account. If the account is not reactivated within that time, learners can follow up.

As for the torch._C problem: in previous cases, it usually started with an incorrect instance type or a pip install cell that wasn’t run. As far as I know, learners usually get through it by retrying with the correct instance type and making sure the pip install cell (usually the first cell) is run. I see that the third topic you referenced had the same answer. I have just marked it as the solution so other learners will see it, and I will also add it to the FAQ.
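Following that advice, a quick sanity check can be run right after the pip install cell and before loading FLAN-T5, to tell an environment problem apart from a problem in the lab code. This is a sketch: the module names are assumptions about what the notebook imports, and `check_imports` is a hypothetical helper.

```python
import importlib
from typing import Dict, Iterable, Optional


def check_imports(module_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try importing each module; map name -> None on success, else the error text."""
    results: Dict[str, Optional[str]] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or e.g. a RuntimeError raised during import
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Assumed module names for the FLAN-T5 lab; adjust to what the notebook imports.
    for name, error in check_imports(["torch", "transformers", "datasets"]).items():
        print(f"{name}: {'OK' if error is None else error}")
```

If `torch` already fails here with “No module named ‘torch._C’”, the problem lies in the kernel’s environment (image, instance type, or the install step) rather than in the lab code.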

I’m not sure about the full library requirements, but maybe I can ask the team after the holidays.
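Until an official list exists, one workaround is to record the package versions from a lab kernel where everything works and reuse them in your own environment. A minimal sketch using the standard library’s `importlib.metadata`; the package names are assumptions about what the lab installs, not the course’s published requirements, and the helper names are hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions


def to_requirements(versions: Dict[str, Optional[str]]) -> str:
    """Render pinned requirements lines, skipping packages that are not installed."""
    return "\n".join(f"{pkg}=={ver}" for pkg, ver in versions.items() if ver is not None)


if __name__ == "__main__":
    # Assumed package names; run this inside the working lab kernel.
    print(to_requirements(installed_versions(["torch", "transformers", "datasets"])))
```

The printed lines can be saved as a requirements.txt and installed elsewhere with pip; pinning exact versions is the simplest way to reproduce the lab kernel, though a different Python version may still need adjustments.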

Thanks again for the feedback!
