I would like to ask about an error I get when starting the Glue job in section 4.1.7 of the capstone project, part 1. I keep getting an AnalysisError on the AWS Glue Console for the api-users-extract job only, while the other two extract jobs always succeed. I reran the job multiple times and got the same result.
I also submitted a request to refresh the lab, but the result is still the same as before.
Hello @Youhorng,
I was able to reproduce the error. Could you check that you are using the correct api_end_date? Thank you:
# Set `"--api_start_date"` to `"2020-01-01"`
"--api_start_date" = "2020-01-01"
# Set `"--api_end_date"` to `"2020-01-31"`
"--api_end_date" = "2020-01-31" # use 2020-01-31 instead of 2020-01-01
# Replace the placeholder <API-ENDPOINT> with the value from the CloudFormation outputs
"--api_url" = "http://ec2-***.compute-1.amazonaws.com/users"
This is the error you get if you use the end date 2020-01-01 instead. Hope it helps.
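For anyone who wants to catch this before starting the job, here is a minimal sketch of a date check in Python. The thread only confirms that an end date of 2020-01-31 works and that 2020-01-01 reproduces the error; the rule that the end date must be strictly later than the start date is my assumption, and `dates_are_valid` is a hypothetical helper, not part of the lab.

```python
from datetime import date


def dates_are_valid(start: str, end: str) -> bool:
    """Return True when `end` is strictly later than `start` (ISO YYYY-MM-DD)."""
    # Assumption: the job needs a non-empty date range to extract any records.
    return date.fromisoformat(end) > date.fromisoformat(start)


# Values from the job arguments discussed above.
print(dates_are_valid("2020-01-01", "2020-01-31"))  # the suggested setting
print(dates_are_valid("2020-01-01", "2020-01-01"))  # the setting that reproduced the error
```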
I am getting this error:
User: arn:aws:sts::058264395471:assumed-role/voclabs/user3923357=crhcxbsvcuog is not authorized to perform: logs:DescribeLogStreams on resource: arn:aws:logs:us-east-1:058264395471:log-group:/aws-glue/jobs/error:log-stream: because no identity-based policy allows the logs:DescribeLogStreams action
Hello @HeyChong,
When you run the Glue job, you have to use the actual name of the Glue job, not the name of the output variable.
So one of the outputs you got:
glue_api_users_extract_job = "de-c4w4a1-api-users-extract-job"
When you run the glue job, the command should be:
aws glue start-job-run --job-name de-c4w4a1-api-users-extract-job | jq -r '.JobRunId'
Hope it helps
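To make the difference between the output variable and the job name concrete, here is a small Python sketch that reads the name = "value" lines printed by Terraform and builds the command from the value. Both helper functions are hypothetical names of my own, and the parsing only covers simple quoted string outputs like the one shown above.

```python
import re
import shlex


def parse_terraform_outputs(text: str) -> dict[str, str]:
    """Map output names to values for lines shaped like: name = "value"."""
    pattern = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')
    outputs = {}
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            outputs[match.group(1)] = match.group(2)
    return outputs


def start_job_command(job_name: str) -> str:
    """Build the CLI command using the job's actual name."""
    return "aws glue start-job-run --job-name " + shlex.quote(job_name)


sample = 'glue_api_users_extract_job = "de-c4w4a1-api-users-extract-job"'
job_name = parse_terraform_outputs(sample)["glue_api_users_extract_job"]
print(start_job_command(job_name))
```

The key on the left (glue_api_users_extract_job) is only the lookup name; the value on the right is what goes after --job-name.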
Thanks! I ran it directly from the AWS console and still encounter this error.
Hello @HeyChong,
That is strange. Do you get this error when you run the command, or when the job fails in the AWS Glue console? Please fill in this form for a lab refresh. Thank you
I have the same issue too. The API config arguments, especially the highlighted api_start_date and api_end_date, are set as per the comments in the file. @Georgios could you please assist?
I have no access to the console to check the job run logs.
However, I went ahead and submitted the lab refresh request through the form, using the link from one of the posts in this discussion.
Thanks
Karthik
Hello @Karthikeyadarbha,
I could reproduce your issue after I deleted the .py files from the scripts bucket:
Could you run those two cells to upload them in step 4.1.5, thanks:
I ran these steps multiple times. However, I will execute them again, check, and post an update.
Thanks
I executed the suggested steps again, and the job is still failing.
Hello @Karthikeyadarbha,
Could you check whether the .py files are in the scripts bucket? Also, in glue.tf, check that you use the correct script_location: the value of scripts_bucket for the bucket and "de-c4w4a1-api-extract-job.py" for the script object key, thanks:
script_location = "s3://${var.scripts_bucket}/de-c4w4a1-api-extract-job.py"
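As a quick sanity check, the rendered script_location can be split back into bucket and object key to confirm both parts match what is actually in S3. This is only a sketch: `split_s3_uri` is a hypothetical helper and the bucket name below is a made-up placeholder, so substitute the real value of scripts_bucket.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


scripts_bucket = "example-scripts-bucket"  # hypothetical placeholder, not the lab's bucket
script_location = f"s3://{scripts_bucket}/de-c4w4a1-api-extract-job.py"

bucket, key = split_s3_uri(script_location)
print(bucket)
print(key)
```

The key should be exactly the file name uploaded in step 4.1.5; a mismatch in either part means the job cannot find its script.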
Many thanks, that worked and the extract Glue jobs now run successfully. But I now have trouble executing the transform jobs. There is no execution error; instead, the shell prompts me for a few input configuration variables.
Could you please help me identify the issue? @Georgios
I am not sure whether I set the configuration below correctly.
I ran into this issue several times as well. I’m guessing I was taking too long? Anyway, you can run the setup scripts again (the script you ran from the command line at the start of the lab) and the background variables got set back up for me and terraform didn’t prompt me for these anymore.
I made many attempts but couldn’t execute the transform jobs. I keep seeing the following error when running the Terraform commands, specifically terraform plan, or I see the same error reported in the previous message again.
Also, completing this lab exercise is very time-consuming, since all the steps need to be executed again and again.
Hello @Karthikeyadarbha,
Yes, it’s true that you need to execute the same steps repeatedly and that it is time-consuming. After you get this message, you need to reload the window and run source scripts/setup.sh again to set those variables. Otherwise you will be asked to input them manually when you run terraform plan or terraform apply. Also, if terraform apply runs into existing resources and you get disconnected, you need to run source scripts/setup.sh again as well. Basically it’s the same process again and again. Thank you
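If you want to see whether the variables are still set before running terraform plan, a check like the one below can help. It relies on Terraform reading input variables from environment variables named TF_VAR_<name>. Whether scripts/setup.sh sets them this way is my assumption; the thread does not say. The function name and the second variable name are hypothetical.

```python
import os


def missing_terraform_vars(required, environ=None):
    """Return the variable names that have no non-empty TF_VAR_<name> entry."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(f"TF_VAR_{name}")]


# "scripts_bucket" appears in glue.tf above; "project" is a made-up example name.
required = ["scripts_bucket", "project"]
missing = missing_terraform_vars(required)
if missing:
    print("Not set, Terraform will prompt for:", ", ".join(missing))
else:
    print("All listed variables are set.")
```

If anything is reported as missing, that is the cue to reload the window and run source scripts/setup.sh again.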
So, I executed all the steps in sequence and the transform job is failing now. Could you please help with the error? I am not sure what the error logs say, as they are not accessible.
In particular, am I configuring this piece correctly?
Hello @Karthikeyadarbha,
There seems to be a bug in de-c4w4a1-transform-songs-job.py. If you get a "has no attribute add" error, use df with the withColumn method instead. Hope it helps:
df = df.withColumn(  # use withColumn instead of add
"ingest_on", F.to_date(F.lit(ingest_date), "yyyy-MM-dd")
).withColumn("source_from", F.lit("postgres_rds"))
I am having trouble completing the lab, particularly with resource availability. By the time I reach the serving stage, after the extract and transform layers, the session expires. The suggested options, such as running source scripts/setup.sh and then repeating the whole sequence of steps, are not helping.
Can someone please help with this? The main challenge seems to be completing the lab before the session expires.
Hi @Georgios
Today I logged out and started a new session in my Coursera and lab accounts, and I can still reproduce the same connection issue about 20 minutes into the session. I also see the terminal crash right from the beginning of the session and when running the “terraform plan” command, and no error.txt file is generated.
Thanks
Hello @Karthikeyadarbha,
I couldn’t reproduce the issue with your lab when I tried it earlier. When this happens, you could try to reload the window and start from the beginning with the source scripts/setup.sh command, then go to the terraform folder as usual. To create the errors.txt file, you need to run terraform apply -no-color 2> errors.txt every time the terminal crashes, so you can find any existing resources. Post a screenshot if you hit any issue, thanks.
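Once errors.txt exists, a short script can pull out the lines that look like "resource already exists" failures. This is a rough sketch only: the exact wording of the error lines depends on the provider and the AWS service, so matching on "already exists" is my assumption, and the function name is hypothetical.

```python
from pathlib import Path


def find_existing_resource_errors(text: str) -> list[str]:
    """Return error lines that mention a resource that already exists."""
    hits = []
    for line in text.splitlines():
        # Drop the box-drawing margin Terraform may print before diagnostics.
        cleaned = line.lstrip("│|╷╵ \t")
        # Ignore spaces so both "already exists" and "AlreadyExists" match.
        squashed = cleaned.lower().replace(" ", "")
        if cleaned.startswith("Error:") and "alreadyexists" in squashed:
            hits.append(cleaned)
    return hits


errors_file = Path("errors.txt")
if errors_file.exists():
    for hit in find_existing_resource_errors(errors_file.read_text()):
        print(hit)
```

Each printed line names a resource left over from an earlier run.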