Assignment 5: Capstone Project Part 2 - Data Quality and Orchestration and Visualization

Hello,

I was able to finish Part 1 of the Capstone project. In Part 2, I am running into problems at steps 2.3 and 2.4.

1- Everything works up to terraform plan. But whenever I run the terraform apply commands, I get an exit code 1 error and the terminal crashes.

2- One time I was able to run all three apply commands without error, but then I got an error at step 2.4. I got Succeeded for glue_api_users_extract_job when I ran

aws glue start-job-run --job-name | jq -r '.JobRunId'

aws glue get-job-run --job-name --run-id --output text --query "JobRun.JobRunState"

But I got Failed instead of Succeeded when I ran the second command above for glue_sessions_users_extract_job. Hence I was not able to proceed to glue_rds_extract_job.

Why am I getting problems with step 2.3? And if I get to step 2.4, how can I make the above two commands run for all three extract jobs?

Regards.

For step 2.3, the instructions say at the end: if the terminal continues to crash, run the following command instead: terraform apply -no-color 2> errors.txt. You could also try deleting these resources from the AWS console:

  1. de-c4w4a2-connection-rds at AWS Glue > Connections

  2. de-c4w4a2-glue-role at IAM > Roles
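Once terraform apply -no-color 2> errors.txt has run, the actual failure is in errors.txt rather than in the crashed terminal. Here is a minimal sketch for pulling the error messages out of that file. The function name show_tf_errors is made up for illustration, and it assumes Terraform marks each failure with a line containing "Error:".

```shell
#!/usr/bin/env bash
# Hypothetical helper: show each "Error:" line in a Terraform error log
# together with the five lines that follow it.
# Usage: show_tf_errors [logfile]   (defaults to errors.txt)
show_tf_errors() {
  local log="${1:-errors.txt}"
  # -s is true only if the file exists and is not empty
  if [ ! -s "$log" ]; then
    echo "No errors recorded in $log"
    return 0
  fi
  # -n prefixes line numbers, -A 5 prints five lines of context after a match
  grep -n -A 5 "Error:" "$log"
}
```

After a failed apply you would call show_tf_errors with no arguments. The resource named in the error message should tell you which of the two resources above is the one blocking the apply.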

If the issue continues even after a new lab session (a few hours later), you can try the lab refresh form; it takes 1-2 business days.
For part 2.4 you need to check terraform/modules/extract_job/glue.tf: the <API_ENDPOINT> placeholder has to be replaced again in each new lab session.
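A quick way to see whether the placeholder is still sitting in the file is to search for it. This is only a sketch: check_api_endpoint is a made-up name, and the default path is the one mentioned above, relative to the project root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the <API_ENDPOINT> placeholder
# is still present in the given Terraform file.
check_api_endpoint() {
  local file="${1:-terraform/modules/extract_job/glue.tf}"
  if [ ! -f "$file" ]; then
    echo "File not found: $file"
    return 2
  fi
  # grep -n prints every matching line with its line number
  if grep -n "<API_ENDPOINT>" "$file"; then
    echo "Placeholder still present: replace it with the endpoint of the current lab session"
    return 1
  fi
  echo "No <API_ENDPOINT> placeholder left in $file"
}
```

Note that this only catches a placeholder that was never replaced. An endpoint left over from an earlier lab session will pass this check, so that value still needs to be compared by hand with the current one.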
You could also check the job configuration for issues, for example whether you have the correct api_url (in 2 places) and ingest_date:

    "--api_url"             = "http://ec2-****.compute-1.amazonaws.com/sessions" <-----Check if you have /sessions missing
    "--target_path"         = "s3://${var.data_lake_bucket}/landing_zone/api/sessions"
    "--ingest_date"         = "2020-02-01"
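
As for running the two commands from step 2.4 for all three extract jobs, one option is to wrap them in a small loop. The sketch below is an assumption on my part, not something from the lab instructions: run_glue_job, run_all_extract_jobs and POLL_SECONDS are made-up names. It needs the aws CLI and jq, and when a run fails it also prints JobRun.ErrorMessage, which usually says what went wrong.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start one Glue job, poll until it reaches a
# terminal state, and print the error message if it did not succeed.
run_glue_job() {
  local job_name="$1" run_id state
  run_id=$(aws glue start-job-run --job-name "$job_name" | jq -r '.JobRunId')
  echo "$job_name: started run $run_id"
  while true; do
    state=$(aws glue get-job-run --job-name "$job_name" --run-id "$run_id" \
      --output text --query "JobRun.JobRunState")
    echo "$job_name: $state"
    case "$state" in
      SUCCEEDED) return 0 ;;
      FAILED|STOPPED|TIMEOUT|ERROR)
        # Show why the run ended without success
        aws glue get-job-run --job-name "$job_name" --run-id "$run_id" \
          --output text --query "JobRun.ErrorMessage"
        return 1 ;;
      "")
        echo "$job_name: could not read the job state"
        return 1 ;;
    esac
    # Any other state (for example RUNNING): wait and poll again
    sleep "${POLL_SECONDS:-30}"
  done
}

# Run every job given as an argument, one after another.
# Returns 1 if at least one of them did not succeed.
run_all_extract_jobs() {
  local job failed=0
  for job in "$@"; do
    run_glue_job "$job" || failed=1
  done
  return "$failed"
}
```

You would then call it with the three job names, for example run_all_extract_jobs glue_api_users_extract_job glue_sessions_users_extract_job glue_rds_extract_job. Those are the names as written in the question; the names actually deployed in your account may differ, and aws glue list-jobs will show them.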