C4W4 Project 1: 2 of 3 Glue jobs are failing

I am getting this error when running the "de-c4w4a1-api-users-extract-job" and "de-c4w4a1-api-sessions-extract-job" jobs:

    ConnectTimeout: HTTPConnectionPool(host='ec2-3-224-47-182.compute-1.amazonaws.com', port=80): Max retries exceeded with url: /users?start_date=2020-01-01&end_date=2020-01-31 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f80aa8b71f0>, 'Connection to ec2-3-224-47-182.compute-1.amazonaws.com timed out. (connect timeout=None)'))

The "de-c4w4a1-rds-extract-job" job is running successfully. Can you please tell me what the issue is?

Hello @UmairRauf,

I could reproduce your issue; it seems you have the wrong API endpoint. Could you check that you replaced it in both places in terraform/modules/extract_job/glue.tf (step 4.1.2)? Note that the API endpoint changes every lab session and needs to be updated (the same applies to the PACIFIC-TIME-CURRENT-DATE for the transform job):

    # Replace the placeholder <API-ENDPOINT> with the value from the CloudFormation outputs
    "--api_url"             = "http://ec****/users"




Now I am getting this error. Here is my terraform/modules/extract_job/glue.tf:
    resource "aws_glue_connection" "rds_connection" {
      name = "${var.project}-connection-rds"

      # At connection_properties, add var.username and var.password to the
      # USERNAME and PASSWORD parameters respectively
      connection_properties = {
        JDBC_CONNECTION_URL = "jdbc:postgresql://${var.host}:${var.port}/${var.database}"
        USERNAME            = var.username
        PASSWORD            = var.password
      }

      # At the physical_connection_requirements configuration, set the subnet_id to
      # data.aws_subnet.public_a.id and the security_group_id_list to a list
      # containing the element data.aws_security_group.db_sg.id
      physical_connection_requirements {
        availability_zone      = data.aws_subnet.public_a.availability_zone
        security_group_id_list = [data.aws_security_group.db_sg.id]
        subnet_id              = data.aws_subnet.public_a.id
      }
    }

    # Complete the resource "aws_glue_job" "rds_ingestion_etl_job"
    resource "aws_glue_job" "rds_ingestion_etl_job" {
      name = "${var.project}-rds-extract-job"

      # Set the role_arn parameter to aws_iam_role.glue_role.arn
      role_arn     = aws_iam_role.glue_role.arn
      glue_version = "4.0"

      # Set the connections parameter to a list containing the RDS connection
      # you just created with aws_glue_connection.rds_connection.name
      connections = [aws_glue_connection.rds_connection.name]

      command {
        name = "glueetl"
        # Set the value of scripts_bucket and "de-c4w4a1-extract-songs-job.py" for the script object key
        script_location = "s3://${var.scripts_bucket}/de-c4w4a1-extract-songs-job.py"
        python_version  = 3
      }

      # At default_arguments, complete the arguments
      default_arguments = {
        "--enable-job-insights" = "true"
        "--job-language"        = "python"
        # Set "--rds_connection" as aws_glue_connection.rds_connection.name
        "--rds_connection"      = aws_glue_connection.rds_connection.name
        # Set "--data_lake_bucket" as var.data_lake_bucket
        "--data_lake_bucket"    = var.data_lake_bucket
      }

      # Set up the timeout to 5 and the number of workers to 2. The time unit here is minutes.
      timeout           = 5
      number_of_workers = 2

      worker_type = "G.1X"
    }

    # Complete the resource "aws_glue_job" "api_users_ingestion_etl_job"
    resource "aws_glue_job" "api_users_ingestion_etl_job" {
      name         = "${var.project}-api-users-extract-job"
      role_arn     = aws_iam_role.glue_role.arn
      glue_version = "4.0"

      command {
        name = "glueetl"
        # Set the value of scripts_bucket and "de-c4w4a1-api-extract-job.py" for the script object key
        script_location = "s3://${var.scripts_bucket}/de-c4w4a1-api-extract-job.py"
        python_version  = 3
      }

      # Set the arguments in the default_arguments configuration parameter
      default_arguments = {
        "--enable-job-insights" = "true"
        "--job-language"        = "python"
        # Set "--api_start_date" to "2020-01-01"
        "--api_start_date"      = "2020-01-01"
        # Set "--api_end_date" to "2020-01-31"
        "--api_end_date"        = "2020-01-31"
        # Replace the placeholder with the value from the CloudFormation outputs
        "--api_url"             = "http://ec2-3-210-55-215.compute-1.amazonaws.com/"
        # Notice the target path. This line of the code is complete - no changes are required
        "--target_path"         = "s3://${var.data_lake_bucket}/landing_zone/api/users"
      }

      # Set up the timeout to 5 and the number of workers to 2. The time unit here is minutes.
      timeout           = 5
      number_of_workers = 2

      worker_type = "G.1X"
    }

    # Complete the resource "aws_glue_job" "api_sessions_ingestion_etl_job"
    resource "aws_glue_job" "api_sessions_ingestion_etl_job" {
      name         = "${var.project}-api-sessions-extract-job"
      role_arn     = aws_iam_role.glue_role.arn
      glue_version = "4.0"

      command {
        name = "glueetl"
        # Set the value of scripts_bucket and "de-c4w4a1-api-extract-job.py" for the script object key
        script_location = "s3://${var.scripts_bucket}/de-c4w4a1-api-extract-job.py"
        python_version  = 3
      }

      # Set the arguments in the default_arguments configuration parameter
      default_arguments = {
        "--enable-job-insights" = "true"
        "--job-language"        = "python"
        # Set "--api_start_date" to "2020-01-01"
        "--api_start_date"      = "2020-01-01"
        # Set "--api_end_date" to "2020-01-31"
        "--api_end_date"        = "2020-01-31"
        # Replace the placeholder with the value from the CloudFormation outputs
        "--api_url"             = "http://ec2-3-210-55-215.compute-1.amazonaws.com/"
        # Notice the target path. This line of the code is complete - no changes are required
        "--target_path"         = "s3://${var.data_lake_bucket}/landing_zone/api/sessions"
      }

      # Set up the timeout to 5 and the number of workers to 2. The time unit here is minutes.
      timeout           = 5
      number_of_workers = 2

      worker_type = "G.1X"
    }

This is the error

Hello @UmairRauf,

I could reproduce your issue; it seems you didn't append /users and /sessions to the api_url after the API endpoint. Could you check these two places in terraform/modules/extract_job/glue.tf:

    # Replace the placeholder <API-ENDPOINT> with the value from the CloudFormation outputs
    "--api_url"             = "http://ec****/users"   # <-- add /users or /sessions after the API endpoint

Do the same in the sessions job.
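Applying that to the configuration you posted (keeping in mind that the endpoint hostname changes every lab session, so yours may already be different by now), the two lines would become roughly:

    # In the api_users_ingestion_etl_job resource
    "--api_url"             = "http://ec2-3-210-55-215.compute-1.amazonaws.com/users"

    # In the api_sessions_ingestion_etl_job resource
    "--api_url"             = "http://ec2-3-210-55-215.compute-1.amazonaws.com/sessions"

After editing glue.tf, you will likely need to re-apply the Terraform configuration (whatever apply step the lab uses) so the updated default_arguments reach the Glue jobs before re-running them.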