This is regarding section 4.1.7 of the notebook C4_W4_Assignment_1.ipynb.
I am able to successfully run the Terraform steps after loading the ETL jobs, and both the `rds-extract-job` and the `users-extract-job` complete successfully. However, I'm getting the following error in the Glue console when I try to run the `api-sessions-extract-job`:

```
HTTPError: 500 Server Error: Internal Server Error for url: http://ec2-3-217-34-172.compute-1.amazonaws.com/sessions
```
I've tried rebooting the session, re-running, and reloading everything, but I get the same result.
Here is the relevant code block from the glue.tf file in the repo:
```hcl
# Complete the resource "aws_glue_job" "api_sessions_ingestion_etl_job"
resource "aws_glue_job" "api_sessions_ingestion_etl_job" {
  name         = "${var.project}-api-sessions-extract-job"
  role_arn     = aws_iam_role.glue_role.arn
  glue_version = "4.0"
  command {
    name = "glueetl"
    # Set the value of scripts_bucket and "de-c4w4a1-api-extract-job.py" for the script object key
    script_location = "s3://${var.scripts_bucket}/de-c4w4a1-api-extract-job.py"
    python_version  = 3
  }
  # Set the arguments in the `default_arguments` configuration parameter
  default_arguments = {
    "--enable-job-insights" = "true"
    "--job-language"        = "python"
    # Set `"--api_start_date"` to `"2020-01-01"`
    "--api_start_date" = "2020-01-01"
    # Set `"--api_end_date"` to `"2020-01-31"`
    "--api_end_date" = "2020-01-31"
    # Replace the placeholder <API-ENDPOINT> with the value from the CloudFormation outputs
    "--api_url" = "http://ec2-3-217-34-172.compute-1.amazonaws.com/sessions"
    # Notice the target path. This line of the code is complete - no changes are required
    "--target_path" = "s3://${var.data_lake_bucket}/landing_zone/api/sessions"
  }
  # Set up the `timeout` to 5 and the number of workers to 2. The time unit here is minutes.
  timeout           = 5
  number_of_workers = 2
  worker_type       = "G.1X"
}
```
Earlier in the script, I'm able to get a 200 response from that same API endpoint.
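One way to narrow this down might be to reproduce the job's request outside Glue and compare the bare endpoint (which returned 200 earlier) with the same URL plus the job's date range. Below is a minimal stdlib-only Python sketch. It assumes the extract script sends the dates as query parameters named `start_date` and `end_date`; those names are a guess, so check `de-c4w4a1-api-extract-job.py` for the names it actually uses. The helpers `build_sessions_url` and `probe` are hypothetical, written only for this check.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_sessions_url(api_url: str, start_date: str, end_date: str) -> str:
    """Append a date range to the API URL as query parameters.

    ASSUMPTION: the parameter names `start_date` and `end_date` are hypothetical;
    match them to whatever the Glue extract script really sends.
    """
    query = urllib.parse.urlencode({"start_date": start_date, "end_date": end_date})
    return f"{api_url}?{query}"


def probe(url: str, timeout: float = 10.0) -> int:
    """GET `url` and return the HTTP status code, including 4xx/5xx codes.

    Connection-level failures (e.g. the host is unreachable) still raise URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the status code is on the exception
        return err.code
```

Calling `probe()` on the bare `/sessions` URL and on `build_sessions_url(..., "2020-01-01", "2020-01-31")` would show whether the 500 only appears once the date range is attached, which would point at the parameters rather than at the endpoint itself.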
Any ideas? Thank you!