Week 4 Capstone – Glue API Extract Jobs Failing (ConnectTimeout) – Unable to Proceed to DBT/Redshift

Course: Data Modeling, Transformation, and Serving
Week: Week 4 – Serving Data
Assignment: Programming Assignment 4 – Capstone Project Part 1 (C4_W4_Assignment_1.ipynb)

Hi everyone,

I’m currently working on C4_W4_Assignment_1.ipynb from the Coursera platform, and I’ve been stuck for almost a month due to repeated AWS Glue job failures.

Landing Zone – Extract Jobs (Glue)

After successfully running Terraform, I executed the following Glue extract jobs:

  • de-c4w4a1-rds-extract-job: SUCCEEDED

  • de-c4w4a1-api-users-extract-job: FAILED

  • de-c4w4a1-api-sessions-extract-job: FAILED

Both API-based jobs fail with a ConnectTimeout. For the users job, the error is:

ConnectTimeout: HTTPConnectionPool(host='ec2-3-219-211-66.compute-1.amazonaws.com', port=80):
Max retries exceeded with url: /users?start_date=2020-01-01&end_date=2020-01-31
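A ConnectTimeout means the TCP connection to the API host never completed, so the request itself never reached the API. One way to narrow this down is to check whether the host accepts connections on port 80 at all. The snippet below is a minimal sketch using only the Python standard library. The helper name `can_connect` and the 5-second timeout are hypothetical choices for illustration, not part of the assignment code.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper: a False result for the API host would be
    consistent with the ConnectTimeout in the Glue logs (host unreachable,
    port blocked, or the service behind it not running).
    """
    try:
        # create_connection resolves the hostname and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout, ConnectionRefusedError and DNS failures (socket.gaierror)
        # are all subclasses of OSError.
        return False
```

For example, `can_connect("ec2-3-219-211-66.compute-1.amazonaws.com", 80)` uses the host and port from the error above. The check runs from wherever you execute it, which may not be the same network path the Glue job uses, so a True result from a notebook does not by itself prove that Glue can reach the host.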

As a result, neither API job has a single successful run; only the RDS extract works.

Transformation Zone – Transform Jobs

Because the API extract jobs fail, the transform Glue jobs fail as well, so I can't move on to the Redshift/dbt serving-layer part of the assignment.

Has anyone faced a similar API timeout issue in Week 4? Any guidance on how to resolve this or proceed would be really helpful.