C3-W2 Programming Assignment 2: Unable to Continue with Lab Due to Persistent AWS Glue Issues

Dear Support Team,

I hope you’re well. I’m writing to report persistent issues with the hands-on lab involving AWS Glue (programming assignment Week 2 of Course 3). Every time I run the lab, I encounter problems that prevent me from progressing.

Specifically:

  • The Glue jobs remain stuck in “Running” for an unusually long time, and sometimes even time out.
  • When trying to check the job status or logs, I receive permission errors such as:
    “This IAM user does not have permission to view Log Groups in this account…”
  • I’ve attempted to rerun the lab several times, but the problems persist. It seems that the environment is not resetting cleanly between runs.

Because of this, I’m unable to complete the exercise or move forward in the course. I kindly request that my lab environment — including the AWS resources — be fully reset or reviewed so I can proceed with a clean state.

Thank you in advance for your help,
Best regards,

@Diana_Bohorquez Can you explain at which step you’re getting this error? It looks like something is misconfigured in the Glue job scripts.

A couple of clarification items:

  • Every time you finish a session, the AWS account automatically goes through a cleaning process. So when you start a new session, the AWS resources are automatically fully reset for you.
  • If you want the lab files to be reset, you would need to do that manually. I can explain what to do in this case if you also want your lab files to be reset.

Hi,

I’m trying to run step 3.2.3 of the lab, where we’re instructed to start the Glue jobs using the aws glue start-job-run command. However, every time I attempt this, both jobs get stuck and eventually return a TIMEOUT status.

I’ve tried this multiple times — restarting the lab environment and following all the instructions — but the result is always the same.

Additionally, I read on another forum that it’s possible to recover the original lab scripts by deleting the current ones and clicking “Get Latest Version” from the Lab Help tab. I tried that too, but unfortunately it didn’t resolve the issue.

Could you please advise how I can proceed? Is there a problem with the scripts or something else in the setup?

Thank you in advance,
Diana

@Diana_Bohorquez Can you share two things with me in a message:

  • your glue.tf file under landing_etl module
  • your lab id

Yea, sure!

Lab ID: wcdfonnkznzs
Glue file attached.
glue.tf.txt (2.0 KB)

@Diana_Bohorquez Did you make sure to run “source scripts/setup.sh”? Were you prompted to enter the Terraform variables, and did you manually assign a value like my-glue-scripts-bucket?

Because here’s what I’m seeing in your account: SdkClientException occurred : com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to my-glue-scripts-bucket.s3.amazonaws.com:443 [my-glue-scripts-bucket.s3.amazonaws.com/16.12.5.52, my-glue-scripts-bucket.s3.amazonaws.com/3.5.253.173, my-glue-scripts-bucket.s3.amazonaws.com/3.5.254.0, my-glue-scripts-bucket.s3.amazonaws.com/3.5.252.126, my-glue-scripts-bucket.s3.amazonaws.com/16.12.5.124, my-glue-scripts-bucket.s3.amazonaws.com/16.12.4.76, my-glue-scripts-bucket.s3.amazonaws.com/3.5.254.69] failed: connect timed out

The bucket for the Glue scripts is not named my-glue-scripts-bucket; it has a different name, so the job is having a hard time locating the scripts.

Yes, I always run source scripts/setup.sh at the beginning.

Regarding your second question, I’m not sure if I was prompted for the Terraform variables, so I might have assigned some values manually without realizing it.

How could I proceed in order to deal with this issue?

In the terminal, can you try the following again?

If you’re in the terraform folder, type
cd ..
source scripts/setup.sh
then

cd terraform
terraform init || echo "$?"
terraform apply || echo "$?"

Then try the Glue jobs again.

Let me know how it goes for you

@Diana_Bohorquez I suggested the above assuming you’re still in the lab session.

When you repeat the lab, just make sure to follow the same steps. Make sure to start with source scripts/setup.sh.

Sometimes, if the terminal crashes, you need to run source scripts/setup.sh again to make sure the environment variables are set.
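
One way to check whether the environment variables survived a terminal crash is sketched below. It assumes that setup.sh supplies the Terraform inputs as TF_VAR_<name> environment variables, which is how Terraform reads variables from the environment; the thread does not show the script, so treat that as an assumption. The function names are hypothetical. If Terraform finds no value for a required variable, it prompts for one, which is how a literal value like my-glue-scripts-bucket can end up being typed in by hand.

```shell
#!/usr/bin/env bash
# Sketch: list the Terraform input variables exported in the current shell.
# Assumption: setup.sh exports its values as TF_VAR_<name> variables.

list_tf_vars() {
  # env prints NAME=value pairs; keep only the ones Terraform would read.
  env | grep '^TF_VAR_' | sort
}

check_tf_vars() {
  if [ -z "$(list_tf_vars)" ]; then
    echo "No TF_VAR_* variables found; run: source scripts/setup.sh"
    return 1
  fi
  list_tf_vars
}
```

Running check_tf_vars || echo "$?" before terraform apply shows which values Terraform will pick up without being prompted.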

Hi!
This morning I started the lab again. I was able to run the Glue jobs successfully and continued the lab up to step 4.3.2.

When I started step 4.3.3, I ran into more issues while trying to deploy the Glue jobs. Here’s what I did and the problems I encountered:

  1. I ran the following commands successfully:
cd terraform
terraform init
terraform plan
terraform apply
  2. I got the Job ID for the first Glue job:
aws glue start-job-run --job-name de-c3w2a1-csv-transformation-job | jq -r '.JobRunId'
  3. When I tried to get the Job ID for the second Glue job, the terminal crashed.
  4. I started over and followed your suggestion:
source scripts/setup.sh

then

cd terraform
terraform init || echo "$?"
terraform apply || echo "$?"
  5. Then I repeated the steps from point 2.
  6. The terminal crashed again when running the second job.
  7. This time, I tried:
cd terraform
terraform apply -no-color 2> errors.txt

and started over again from step 1.

  8. It crashed once more.
  9. I rebooted and started over again from step 4, and this time I got the following error message.

I really appreciate your support.
Diana

@Diana_Bohorquez Thank you for all the details you provided!

Item 3
" 3. When I tried to get the Job ID for the second Glue job, the terminal crashed."

  • What did you do to get the job ID of the second Glue job?

Item 6
" 6. The terminal crashed again when running the second job."

So here it looks like the terminal crashed when you tried:
aws glue start-job-run --job-name sth | jq -r '.JobRunId'

Am I correct?

I’ll have more feedback for you, especially on the last error, but first I want to make sure that this is exactly when the terminal crashed.

@Diana_Bohorquez If the problem was only related to running the glue job,

  • to get the Glue job ID, you can either run terraform output or check the Glue job in the AWS console
  • you don’t need to redo terraform apply because the resources have already been created; you just need to make sure you have the right job ID
  • please avoid rebooting the lab during the lab session; this puts your lab environment out of sync with the AWS resources that have already been created, hence the “resources already existed” issue
  • you can append || echo "$?" to any command to prevent the terminal from crashing
  • you can also run the Glue job manually in the AWS console
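
To see why appending || echo "$?" helps, here is a small sketch. It assumes the lab terminal behaves like a shell running with set -e, where any failing command ends the session; that is a guess about the lab setup, not something confirmed in this thread. The (exit 3) command is only a stand-in for a failing terraform or aws call.

```shell
#!/usr/bin/env bash
# Assumption: the lab terminal acts like a shell with `set -e` enabled.
# `(exit 3)` stands in for any command that fails with exit code 3.

# Guarded: the `||` branch handles the failure, so the child shell keeps going.
guarded=$(bash -c 'set -e; (exit 3) || echo "exit code: $?"; echo "shell still alive"')
echo "$guarded"

# Unguarded: the child shell stops at the failing command and prints nothing.
unguarded=$(bash -c 'set -e; (exit 3); echo "shell still alive"') || echo "child shell exited with $?"
echo "unguarded output: '$unguarded'"
```

In the guarded case the failure is reported as a printed exit code and the session continues; in the unguarded case the child shell ends before reaching the last echo.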

:frowning:

I was able to advance a little bit, but …

I ran source scripts/setup.sh, cd terraform, terraform init, and terraform plan.

After that, I ran aws glue start-job-run --job-name <GLUE-JOB-NAME> | jq -r '.JobRunId', and this time I could get the ID for every Glue job.

When I checked the job status (aws glue get-job-run --job-name <GLUE-JOB-NAME> --run-id <JOB-RUN-ID> --output text --query "JobRun.JobRunState"), I got a FAILED status.

When I went to the AWS Console, I got a message that I don’t know how to debug.
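
For a FAILED run, the same get-job-run call can also return the error text, which is often easier to read in the terminal than in the console. A minimal sketch, with a hypothetical function name and placeholder arguments for the job name and run ID:

```shell
#!/usr/bin/env bash
# Sketch: print the state and the error message of a single Glue job run.
# $1 = job name, $2 = job run ID (both placeholders supplied by the caller).
describe_job_run() {
  aws glue get-job-run \
    --job-name "$1" \
    --run-id "$2" \
    --output text \
    --query "JobRun.[JobRunState, ErrorMessage]"
}
```

Called as describe_job_run <GLUE-JOB-NAME> <JOB-RUN-ID> || echo "$?", it prints the run state followed by the error message, if the run recorded one.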

Did you make sure to run the cells in the notebook that upload the scripts to the S3 bucket?

The s3 bucket that should have the scripts is empty.
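
A quick way to confirm from the terminal whether the scripts bucket has anything in it is sketched below. The bucket name is a placeholder passed in by the caller, since the thread does not give the real one, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether a bucket contains any objects.
# $1 = bucket name (placeholder; use the scripts bucket from your own lab).

list_script_keys() {
  # With --output text, a missing Contents field is printed as "None".
  aws s3api list-objects-v2 --bucket "$1" --query 'Contents[].Key' --output text
}

bucket_has_scripts() {
  local keys
  keys=$(list_script_keys "$1") || return 2
  if [ -z "$keys" ] || [ "$keys" = "None" ]; then
    echo "Bucket $1 has no objects; run the notebook cells that upload the scripts."
    return 1
  fi
  echo "$keys"
}
```

Running bucket_has_scripts <SCRIPTS-BUCKET> || echo "$?" before starting the jobs lists the uploaded keys or points back to the upload step.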

@Diana_Bohorquez Also, please make sure to run the Glue jobs in the correct order:

  • extraction jobs first
  • then transformation job second
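
The ordering above can be sketched as a small helper that starts each job and waits for it to finish before starting the next one. The function names and the POLL_SECONDS setting are hypothetical, and the job names are placeholders because the thread only names one of the jobs. Running the extraction jobs one after another is a simplification; it is not something the lab requires.

```shell
#!/usr/bin/env bash
# Sketch: run Glue jobs one at a time, in the order given.
# POLL_SECONDS is a hypothetical setting: seconds between status checks.

run_glue_job_and_wait() {
  local job_name="$1" run_id state
  run_id=$(aws glue start-job-run --job-name "$job_name" \
    --query JobRunId --output text) || return 1
  echo "Started $job_name as $run_id"
  while true; do
    state=$(aws glue get-job-run --job-name "$job_name" --run-id "$run_id" \
      --output text --query "JobRun.JobRunState") || return 1
    case "$state" in
      SUCCEEDED) echo "$job_name: $state"; return 0 ;;
      FAILED|ERROR|TIMEOUT|STOPPED) echo "$job_name: $state"; return 1 ;;
      *) sleep "${POLL_SECONDS:-30}" ;;  # still starting or running
    esac
  done
}

run_jobs_in_order() {
  local job
  for job in "$@"; do
    # Stop at the first job that does not succeed.
    run_glue_job_and_wait "$job" || return 1
  done
}
```

A call such as run_jobs_in_order <EXTRACTION-JOB-1> <EXTRACTION-JOB-2> <TRANSFORMATION-JOB> || echo "$?" keeps the transformation job from starting until the extraction jobs have succeeded.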

Thank you so much for your support.
I was finally able to finish this lab.

I am having issues with this assignment as well. I get to section 3.2 where you use terraform to get the Glue jobs going and it crashes on me during ‘terraform apply’. I get this error message:

│ Error: Reference to undeclared module
│
│ on outputs.tf line 24, in output "glue_ratings_transform_job":
│ 24: value = module.transform_etl.glue_ratings_transform_job
│
│ No module call named "transform_etl" is declared in the root module.

But I’m not sure what the root module is?

I tried running the command terraform apply -no-color 2> errors.txt as suggested in the instructions and it does create a text file with an error message but the terminal still crashes. I do not see any Glue jobs in the AWS console. Here is the error message in the .txt file generated:

Error: No configuration files

Apply requires configuration to be present. Applying without a configuration
would mark everything for destruction, which is normally not what is desired.
If you would like to destroy everything, run ‘terraform destroy’ instead.

Any help would be appreciated. I know there is a way to reset labs/assignments but I cannot find those instructions. Thanks!
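
On the two errors above: in Terraform, the root module is simply the directory you run terraform from, together with the .tf files in it. “No configuration files” means the directory where terraform apply ran contains no .tf files, which usually points to being in the wrong folder. “Reference to undeclared module” means outputs.tf refers to module.transform_etl while no module "transform_etl" block is active in the root module's .tf files; whether that block is commented out or missing in this lab is an assumption to verify in your own files. A minimal pre-flight sketch, with hypothetical function names:

```shell
#!/usr/bin/env bash
# Sketch: checks to run before `terraform apply`.

has_tf_config() {
  # $1 = directory to check (defaults to the current one).
  # Succeeds when the directory contains at least one .tf file.
  local f
  for f in "${1:-.}"/*.tf; do
    [ -e "$f" ] && return 0
  done
  return 1
}

module_is_declared() {
  # $1 = module name, $2 = directory (defaults to the current one).
  # Succeeds when a line starts with `module "<name>"`, so lines
  # commented out with # or // do not count.
  grep -Eq "^[[:space:]]*module[[:space:]]+\"$1\"" "${2:-.}"/*.tf 2>/dev/null
}
```

From the terraform folder, has_tf_config || echo "no .tf files here" and module_is_declared transform_etl || echo "transform_etl is not declared" narrow down which of the two situations applies.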