Failed to start kernel - AccessDeniedException

When I try to start the Generative AI with LLMs week 3 lab, I get as far as the Jupyter notebook, but it doesn’t have a kernel running. When I click on “No kernel” and pick the large kernel I get the following error message:

Failed to start kernel
Failed to launch app [sagemaker-data-science-ml-m5-large-685ada67a98eea46e68c3200c9bf]. AccessDeniedException: User: arn:aws:sts::236211171397:assumed-role/sagemaker-studio-vpc-firewall-us-east-1-sagemaker-execution-role/SageMaker is not authorized to perform: sagemaker:CreateApp on resource: arn:aws:sagemaker:us-east-1:236211171397:app/d-olachvsndcyk/sagemaker-user-profile-us-east-1/kernelgateway/sagemaker-data-science-ml-m5-large-685ada67a98eea46e68c3200c9bf with an explicit deny in an identity-based policy (Context: RequestId: 058a259c-85a5-4e0b-82ca-dd40136686e8, TimeStamp: 1691849491.0373292, Date: Sat Aug 12 14:11:31 2023)

How can I reset this lab and get it to work?

1 Like

Hi @nickweeds

welcome to the community.

Did you try restarting and clearing out the kernel?

best regards
elirod

It is important to select the exact kernel, i.e. ml.m5.2xlarge, as all others will fail with an access-denied error like the one you got.

1 Like

@elirod
Hi all,
I have the same issue (tried to get access from scratch 3 times), getting the following message:
Failed to start kernel
Failed to launch app [datascience-1-0-ml-t3-2xlarge-04b8c65bf681bfc91b919b0dd21a]. AccessDeniedException: User: arn:aws:sts::216558164389:assumed-role/sagemaker-studio-vpc-firewall-us-east-1-sagemaker-execution-role/SageMaker is not authorized to perform: sagemaker:CreateApp on resource: arn:aws:sagemaker:us-east-1:216558164389:app/d-g79lsyb2ctyr/sagemaker-user-profile-us-east-1/kernelgateway/datascience-1-0-ml-t3-2xlarge-04b8c65bf681bfc91b919b0dd21a with an explicit deny in an identity-based policy (Context: RequestId: 954c6286-c480-4346-bb15-3489617953e3, TimeStamp: 1691941946.7399821, Date: Sun Aug 13 15:52:26 2023)

Nevertheless, I tried it with 2 CPUs and low memory, but the kernel died during evaluation.
Has anyone found a solution they could share?
And what happens to our timeframe if it is not our fault that Lab 3 is not working?

Thanks in advance.

1 Like

@Ilo_Brnk, this is the same issue for me as well; the kernel dies at:

toxicity_evaluator = evaluate.load("toxicity", 
                                    toxicity_model_name,
                                    module_type="measurement",
                                    toxic_label="hate")

I have checked the Hugging Face repo for any changes with regard to the model, and there seems to be no change.

Could this be related to the instance size? We are allowed only a medium instance, and the one shown in the walkthrough video is m5 large.
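A kernel that dies silently at `evaluate.load` is very often the Linux OOM killer, so it is worth confirming which instance you actually landed on. Here is a small sketch (standard library only; the `sysconf` names used are Linux-specific, which is fine for Studio kernels) to print RAM and CPU count from inside the notebook:

```python
import os

# Rough check of total and currently available RAM (Linux-only).
# If total RAM is around 4 GiB, you are on the ml.t3.medium instance,
# which is likely too small for the toxicity evaluator.
page_size = os.sysconf("SC_PAGE_SIZE")
total_gb = page_size * os.sysconf("SC_PHYS_PAGES") / 1024**3
avail_gb = page_size * os.sysconf("SC_AVPHYS_PAGES") / 1024**3
print(f"Total RAM: {total_gb:.1f} GiB, available: {avail_gb:.1f} GiB")
print(f"CPUs: {os.cpu_count()}")
```

On the 2xlarge config this should report roughly 32 GiB total and 8 CPUs; much less than that means the notebook is not running on the kernel the lab expects.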
@elirod, I appreciate your support here. Thanks!

3 Likes

Hey, has this issue been resolved? I am having the same issue!

P.S. The issue was resolved by “Restart kernel and clear outputs”
and then changing the instance type.

1 Like

Hi all,
restarting the kernel does not solve the issue for me;
having only 2 CPUs and 4 GB RAM as the fixed medium config is not enough for the task.
The Lab 3 code says (as it already did for Lab 2) that you need 8 CPUs and 32 GB RAM, which corresponds to the 2xlarge config, and as a student you are not allowed to configure that.

Each time it crashed at the same code position @bharathy89 mentioned (2.3 Evaluate Toxicity):

toxicity_evaluator = evaluate.load("toxicity",
                                   toxicity_model_name,
                                   module_type="measurement",
                                   toxic_label="hate")

So, what kind of solution exists? How can we finish the course?

PS - some time later, having found a workaround:
After the kernel died in the toxicity-evaluation part, I could change the environment to the 2xlarge config mentioned in the first part of the notebook: click the Kernel menu, select ‘Restart Kernel and Clear All Outputs’, then use the kernel config function at the top right of the SageMaker Studio window.
After all that, with the correct environment, the kernel does not die during the notebook run, and the notebook finished properly so I could submit the result.

3 Likes

This worked for me! Thanks @Ilo_Brnk

Hi all,

I’m experiencing a similar problem. When trying to load the ml.m5.2xlarge config, the notebook gets stuck on “loading kernel notebook”:

Facing the same problem; unable to change the environment.

Thanks @Ilo_Brnk for sharing this.
This helped me to complete the lab.

I’m getting the same issue.
Failed to start kernel
Failed to launch app [sagemaker-data-science-ml-m5-large-685ada67a98eea46e68c3200c9bf]. AccessDeniedException: User: arn:aws:sts::244930903551:assumed-role/sagemaker-studio-vpc-firewall-us-east-1-sagemaker-execution-role/SageMaker is not authorized to perform: sagemaker:CreateApp on resource: arn:aws:sagemaker:us-east-1:244930903551:app/d-9k51gqjl6hfm/sagemaker-user-profile-us-east-1/kernelgateway/sagemaker-data-science-ml-m5-large-685ada67a98eea46e68c3200c9bf with an explicit deny in an identity-based policy (Context: RequestId: 37d2d479-6220-4ae7-80f1-b091c685fdd1, TimeStamp: 1703278224.5513651, Date: Fri Dec 22 20:50:24 2023)

It looks like this is happening again, and the old “workaround” no longer works. I have been getting the message below:

Failed to start kernel

Failed to launch app [sagemaker-data-scien-ml-m5-2xlarge-58ec53cbfb4afb44281d61bdec8c]. ResourceLimitExceeded: The account-level service limit 'Studio KernelGateway Apps running on ml.m5.2xlarge instance' is 1 Apps, with current utilization of 1 Apps and a request delta of 1 Apps. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota. (Context: RequestId: 73991515-a9b3-4a80-b84e-5b19aa0d72db, TimeStamp: 1703733665.2841454, Date: Thu Dec 28 03:21:05 2023)
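The ResourceLimitExceeded above means a previously started KernelGateway app is still counted against the 1-app quota, so the stale app has to be deleted before a new one can start. A sketch of doing that with boto3, assuming you have AWS credentials with SageMaker permissions (the restricted student lab role may not allow this); the `DomainId` and `UserProfileName` values below are placeholders taken from the shape of the ARN in the error message:

```python
def kernel_gateway_apps(apps):
    """Filter a list_apps() response to non-deleted KernelGateway apps."""
    return [a for a in apps
            if a["AppType"] == "KernelGateway" and a["Status"] != "Deleted"]

def delete_stale_kernel_apps():
    import boto3  # imported here so the helper above stays dependency-free

    # Placeholders: take your DomainId (the "d-..." part) and UserProfileName
    # from the ARN shown in the error message.
    domain_id = "d-xxxxxxxxxxxx"
    user_profile = "sagemaker-user-profile-us-east-1"

    sm = boto3.client("sagemaker", region_name="us-east-1")
    resp = sm.list_apps(DomainIdEquals=domain_id,
                        UserProfileNameEquals=user_profile)
    for app in kernel_gateway_apps(resp["Apps"]):
        print("Deleting", app["AppName"])
        sm.delete_app(DomainId=domain_id,
                      UserProfileName=user_profile,
                      AppType="KernelGateway",
                      AppName=app["AppName"])

if __name__ == "__main__":
    delete_stale_kernel_apps()
```

In the lab environment the simpler equivalent is shutting down the running app from the Studio UI ("Running Terminals and Kernels" panel) before selecting the ml.m5.2xlarge kernel again.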

I can get as far as section 3.2 - Fine-Tune the Model using the default ml.t3.medium kernel; then it crashes.

Please advise or fix this…
Ticket: https://www.coursera.support/s/case/500VH000003BzckYAC/blocker-can-not-select-the-correct-kernel-for-the-lab-mlm52xlarge

John

Hi everyone! From the screenshots on this thread, the issue is that the incorrect instance type was chosen. Please make sure that you’re selecting ml.m5.2xlarge as mentioned in the instructions and NOT ml.m5.large. Hope this helps!

Hi John. That is a different error. Please follow this topic instead. Thanks!

1 Like