C4W4 Project 1: 2 of 3 Glue jobs are failing

I am getting this error when running the "de-c4w4a1-api-users-extract-job" and "de-c4w4a1-api-sessions-extract-job" jobs:

    ConnectTimeout: HTTPConnectionPool(host='ec2-3-224-47-182.compute-1.amazonaws.com', port=80): Max retries exceeded with url: /users?start_date=2020-01-01&end_date=2020-01-31 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f80aa8b71f0>, 'Connection to ec2-3-224-47-182.compute-1.amazonaws.com timed out. (connect timeout=None)'))

The "de-c4w4a1-rds-extract-job" job is running successfully. Can you please tell me what the issue is?

Hello @UmairRauf,

I could reproduce your issue; it seems you have the wrong API endpoint. Could you check that you replaced it in both places in terraform/modules/extract_job/glue.tf (step 4.1.2)? Note that the API endpoint changes every lab session and needs to be updated (the same applies to the PACIFIC-TIME-CURRENT-DATE for the transform job):

    # Replace the placeholder <API-ENDPOINT> with the value from the CloudFormation outputs
    "--api_url"             = "http://ec****/users"




Now I am getting this error. Here is my terraform/modules/extract_job/glue.tf:
    resource "aws_glue_connection" "rds_connection" {
      name = "${var.project}-connection-rds"

      # At connection_properties, add var.username and var.password to the
      # USERNAME and PASSWORD parameters respectively
      connection_properties = {
        JDBC_CONNECTION_URL = "jdbc:postgresql://${var.host}:${var.port}/${var.database}"
        USERNAME            = var.username
        PASSWORD            = var.password
      }

      # At the physical_connection_requirements configuration, set the subnet_id to
      # data.aws_subnet.public_a.id and the security_group_id_list to a list
      # containing the element data.aws_security_group.db_sg.id
      physical_connection_requirements {
        availability_zone      = data.aws_subnet.public_a.availability_zone
        security_group_id_list = [data.aws_security_group.db_sg.id]
        subnet_id              = data.aws_subnet.public_a.id
      }
    }

    # Complete the resource "aws_glue_job" "rds_ingestion_etl_job"
    resource "aws_glue_job" "rds_ingestion_etl_job" {
      name = "${var.project}-rds-extract-job"

      # Set the role_arn parameter to aws_iam_role.glue_role.arn
      role_arn     = aws_iam_role.glue_role.arn
      glue_version = "4.0"

      # Set the connections parameter to a list containing the RDS connection
      # you just created with aws_glue_connection.rds_connection.name
      connections = [aws_glue_connection.rds_connection.name]

      command {
        name = "glueetl"
        # Set the value of scripts_bucket and "de-c4w4a1-extract-songs-job.py" for the script object key
        script_location = "s3://${var.scripts_bucket}/de-c4w4a1-extract-songs-job.py"
        python_version  = 3
      }

      # At default_arguments, complete the arguments
      default_arguments = {
        "--enable-job-insights" = "true"
        "--job-language"        = "python"
        # Set "--rds_connection" as aws_glue_connection.rds_connection.name
        "--rds_connection"      = aws_glue_connection.rds_connection.name
        # Set "--data_lake_bucket" as var.data_lake_bucket
        "--data_lake_bucket"    = var.data_lake_bucket
      }

      # Set up the timeout to 5 and the number of workers to 2. The time unit here is minutes.
      timeout           = 5
      number_of_workers = 2

      worker_type = "G.1X"
    }

    # Complete the resource "aws_glue_job" "api_users_ingestion_etl_job"
    resource "aws_glue_job" "api_users_ingestion_etl_job" {
      name         = "${var.project}-api-users-extract-job"
      role_arn     = aws_iam_role.glue_role.arn
      glue_version = "4.0"

      command {
        name = "glueetl"
        # Set the value of scripts_bucket and "de-c4w4a1-api-extract-job.py" for the script object key
        script_location = "s3://${var.scripts_bucket}/de-c4w4a1-api-extract-job.py"
        python_version  = 3
      }

      # Set the arguments in the default_arguments configuration parameter
      default_arguments = {
        "--enable-job-insights" = "true"
        "--job-language"        = "python"
        # Set "--api_start_date" to "2020-01-01"
        "--api_start_date"      = "2020-01-01"
        # Set "--api_end_date" to "2020-01-31"
        "--api_end_date"        = "2020-01-31"
        # Replace the placeholder with the value from the CloudFormation outputs
        "--api_url"             = "http://ec2-3-210-55-215.compute-1.amazonaws.com/"
        # Notice the target path. This line of the code is complete - no changes are required
        "--target_path"         = "s3://${var.data_lake_bucket}/landing_zone/api/users"
      }

      # Set up the timeout to 5 and the number of workers to 2. The time unit here is minutes.
      timeout           = 5
      number_of_workers = 2

      worker_type = "G.1X"
    }

    # Complete the resource "aws_glue_job" "api_sessions_ingestion_etl_job"
    resource "aws_glue_job" "api_sessions_ingestion_etl_job" {
      name         = "${var.project}-api-sessions-extract-job"
      role_arn     = aws_iam_role.glue_role.arn
      glue_version = "4.0"

      command {
        name = "glueetl"
        # Set the value of scripts_bucket and "de-c4w4a1-api-extract-job.py" for the script object key
        script_location = "s3://${var.scripts_bucket}/de-c4w4a1-api-extract-job.py"
        python_version  = 3
      }

      # Set the arguments in the default_arguments configuration parameter
      default_arguments = {
        "--enable-job-insights" = "true"
        "--job-language"        = "python"
        # Set "--api_start_date" to "2020-01-01"
        "--api_start_date"      = "2020-01-01"
        # Set "--api_end_date" to "2020-01-31"
        "--api_end_date"        = "2020-01-31"
        # Replace the placeholder with the value from the CloudFormation outputs
        "--api_url"             = "http://ec2-3-210-55-215.compute-1.amazonaws.com/"
        # Notice the target path. This line of the code is complete - no changes are required
        "--target_path"         = "s3://${var.data_lake_bucket}/landing_zone/api/sessions"
      }

      # Set up the timeout to 5 and the number of workers to 2. The time unit here is minutes.
      timeout           = 5
      number_of_workers = 2

      worker_type = "G.1X"
    }

This is the error

Hello @UmairRauf,

I could reproduce your issue; it seems you didn't append /users and /sessions to the api_url after the API endpoint. Could you check these two places in terraform/modules/extract_job/glue.tf:

    # Replace the placeholder <API-ENDPOINT> with the value from the CloudFormation outputs
    "--api_url"             = "http://ec****/users"   # <-- add /users or /sessions after the API endpoint

Do the same in the sessions job.
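Applying that to the configuration you posted (keeping in mind that the endpoint hostname changes every lab session, so yours may already be different by now), the two lines would become roughly:

    # In the api_users_ingestion_etl_job resource
    "--api_url"             = "http://ec2-3-210-55-215.compute-1.amazonaws.com/users"

    # In the api_sessions_ingestion_etl_job resource
    "--api_url"             = "http://ec2-3-210-55-215.compute-1.amazonaws.com/sessions"

After editing glue.tf, you will likely need to re-apply the Terraform configuration (whatever apply step the lab uses) so the updated default_arguments reach the Glue jobs before re-running them.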