Week 2 - Lab - Section 2.1 - Internal Server Error

Three times today I have tried and failed this lab. This is my latest output:

In CloudWatch Logs everything looks ok:

Hello @memark,

I have faced the same issue. I will report it.

Kind regards,

Hi @memark,

Unfortunately, it seems that this particular training job is demanding on the server. What worked for me was restarting the kernel and running the previous cells again.

Let us know how it goes.

Cheers!

I have already tried three times (restarting the whole lab, not just the kernel) and failed. Is there no other solution?

Did another try. It finally succeeded!

Thanks for letting us know. I have reported it to the DeepLearning.AI team, and I think @bj.kim did as well.

Additional Edit:
The DeepLearning.AI team said that they have reported the issue to AWS and that they are investigating. This happens very rarely. For now, the solution is to restart the kernel and run the job again, as you did.
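
If you would rather not re-run the cell by hand each time, the "try again" part can be sketched as a small retry helper. This is only a rough sketch under assumptions: `run_with_retries` and its parameters are names made up for illustration, `job` stands for whatever call starts the training in your notebook, and it only helps if the failure shows up as a Python exception in the cell. It does not restart the kernel, so it may not be enough for every failure reported in this thread.

```python
import time


def run_with_retries(job, max_attempts=3, wait_seconds=60, sleep=time.sleep):
    """Call job() until it succeeds or max_attempts is reached.

    `job` is a hypothetical placeholder: any zero-argument callable
    that starts the work and raises an exception on failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Return the job's result as soon as one attempt succeeds.
            return job()
        except Exception as error:
            last_error = error
            print(f"Attempt {attempt} of {max_attempts} failed: {error}")
            # Wait before the next attempt, but not after the last one.
            if attempt < max_attempts:
                sleep(wait_seconds)
    raise RuntimeError(f"Job failed after {max_attempts} attempts") from last_error
```

You would wrap the training call from the lab in a function or lambda and pass it as `job`. Keep `max_attempts` small, since each attempt of this lab can take a long time.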

I’ve tried this several times today and it is still a problem. I spent a lot of time restarting the kernel and restarting the lab, and have gotten nowhere.

This is not a rare occurrence.

Same issue today. CloudWatch suggests the training was successful. Could this be an issue with how I set up the notebook?

Same happens to me. Training takes a reasonable amount of time (30-40 minutes) but uploading the resulting model takes literally hours, which makes absolutely no sense.

Hi @Alkanen and @norcalpedaler,

Thanks for letting us know, and apologies that you are facing this issue.

I have followed up with the teaching staff at DeepLearning.AI and with AWS, and will let you know as soon as they return a status to share.

@Alkanen and @norcalpedaler:

Can you please share your Vocareum lab link and your AWS account in a private message? I have had feedback from DeepLearning.AI: we’ll forward them to the AWS team.

However, I was told that it might not be a fast fix, as this is a SageMaker problem.

Hi @Raul!

Thanks for getting back to us so quickly. I actually got it working shortly after writing my previous comment (my very next retry actually), so I finished the lab and don’t have the link anymore.

I’m glad you’ve made it. Nonetheless, this issue is currently under assessment…

The same is happening to me; I can't make it work after several tries…

Hello all, I tried twice this evening and failed both times. I will try again tomorrow. Is there no other solution than to just try again? I noticed the generated model file is ~800 MB. Is that why it is taking so long?

Hi, I also got the same issue. What should I do?

Hi @gary, @sachinkl and @ValentinDeLaRosa

Can you please share your Vocareum lab link and your AWS account so that I can forward them to the DeepLearning.AI and AWS teams?

However, this is not a fast fix because it is a SageMaker problem. Apologies for the inconvenience.

I tried it early the next morning and it worked fine, so I am good for now.
Thanks, Raul, for the follow-up.

It works now. Thanks Raul!