Data Engineering - C4W4 - Task 1

Hello,
I am working on Task 1 of Course 4, Week 4 (C4W4) of Data Engineering. When I try to deploy the scripts in the Landing Zone, they fail to start (aws glue get-job-run reports a failure). I don’t know how to solve this problem.

Thanks in advance.

In the AWS console I get the error:
MissingSchema: Invalid URL 'ec2-107-21-8-209.compute-1.amazonaws.com?start_date=2020-01-01&end_date=2020-01-31': No schema supplied. Perhaps you meant http://ec2-107-21-8-209.compute-1.amazonaws.com?start_date=2020-01-01&end_date=2020-01-31?

Now terraform apply stops with these errors:

Error: creating Glue Connection (de-c4w4a1-connection-rds): operation error Glue: CreateConnection, https response error StatusCode: 400, RequestID: c312f61f-e395-43d7-865c-a7ee46d5be33, AlreadyExistsException: Connection already exists.

with module.extract_job.aws_glue_connection.rds_connection,
on modules/extract_job/glue.tf line 2, in resource "aws_glue_connection" "rds_connection":
2: resource "aws_glue_connection" "rds_connection" {

Error: creating IAM Role (de-c4w4a1-glue-role): operation error IAM: CreateRole, https response error StatusCode: 409, RequestID: 7384f773-15e0-4bcf-8fcc-0a16cb836e22, EntityAlreadyExists: Role with name de-c4w4a1-glue-role already exists.
There are also other errors similar to this CreateRole one.

I will move this thread to the discussion forum area for that course.

Posting in the “AI Discussions” forum will not get you many answers related to a specific course.

Hi,

I think this pertains to the second data source (new API).

Please make sure to copy the correct endpoint value from the CloudFormation outputs tab and replace the placeholder <API_ENDPOINT> with it (i.e., item 2.4)

Then, test the API by performing a GET request to the endpoint (i.e., item 2.5). Please run the cell below it and make sure that you get a status code 200 before proceeding to the next items.
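For reference, the MissingSchema error earlier in the thread is what requests raises when the URL has no http:// or https:// prefix. A minimal sketch of a guard for this, assuming the endpoint is the host name copied from the CloudFormation outputs and that plain http is the right scheme (as the error message itself suggests). The helper name build_api_url is hypothetical and not part of the lab code:

```python
from urllib.parse import urlencode


def build_api_url(endpoint: str, start_date: str, end_date: str) -> str:
    """Return a URL with an explicit scheme and the date-range query string.

    Hypothetical helper: the lab scripts may build the URL differently.
    """
    endpoint = endpoint.strip()
    # Prepend a scheme only when the copied value does not already have one.
    if not endpoint.startswith(("http://", "https://")):
        endpoint = "http://" + endpoint  # assumption: the API is served over http
    query = urlencode({"start_date": start_date, "end_date": end_date})
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    print(build_api_url("ec2-107-21-8-209.compute-1.amazonaws.com",
                        "2020-01-01", "2020-01-31"))
```

Passing a value that already starts with http:// leaves it unchanged, so the guard is safe to apply either way.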

Hi,
I updated it in the notebook, and the request now returns status code 200. I also updated it in two places in glue.tf. I now get this error in the AWS console:

HTTPError: 404 Client Error: Not Found for url: http://ec2-3-223-128-148.compute-1.amazonaws.com/?start_date=2020-01-01&end_date=2020-01-31

Hi,

Could you please share more details about the specific step where you’re encountering the issue? This will help us assist you more effectively.

In Item 2.5 I get status code 200, so everything is ok up to that point.

The problem arises in step 4.1.7, where I have to start the jobs:

aws glue start-job-run --job-name | jq -r '.JobRunId'

This command runs fine. But when I check whether the job runs succeeded:

aws glue get-job-run --job-name --run-id --output text --query "JobRun.JobRunState"

For the scripts de-c4w4a1-api-users-extract-job and de-c4w4a1-api-sessions-extract-job I get FAILED.

For the script de-c4w4a1-rds-extract-job I get SUCCEEDED.
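The status check above can also be scripted. A minimal sketch, assuming a boto3 Glue client is passed in; the helper name wait_for_job_run and the polling defaults are hypothetical and not part of the lab. It calls get_job_run repeatedly and returns the final JobRunState together with the ErrorMessage field when the run reports one:

```python
import time

# Job run states after which the state will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def wait_for_job_run(glue_client, job_name, run_id,
                     poll_seconds=15, max_polls=40, sleep=time.sleep):
    """Poll a Glue job run until it reaches a terminal state.

    Returns (state, error_message); error_message is None when the
    response carries no ErrorMessage field.
    """
    for _ in range(max_polls):
        response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        job_run = response["JobRun"]
        state = job_run["JobRunState"]
        if state in TERMINAL_STATES:
            return state, job_run.get("ErrorMessage")
        sleep(poll_seconds)
    raise TimeoutError(f"{job_name} run {run_id} did not finish in time")


# Usage (requires boto3 and AWS credentials; run_id comes from start-job-run):
#   import boto3
#   state, error = wait_for_job_run(
#       boto3.client("glue"), "de-c4w4a1-api-users-extract-job", run_id)
#   print(state, error)
```

For a FAILED run, the returned error message should be the same text shown for that run in the console, which saves a trip to the Runs view.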

When I go to Glue in the AWS Console and open the Runs tab for these jobs, I see:

HTTPError: 404 Client Error: Not Found for url: http://ec2-54-243-232-194.compute-1.amazonaws.com/?start_date=2020-01-01&end_date=2020-01-31

Thanks in advance.

Hello,

Could you please start by checking whether you’re able to extract the sample data from the two API sources, as outlined in steps 2.6 to 2.7? You should receive a JSON response containing the data content.

Next, please ensure that you’ve included the required endpoint parameters in the Python script for the API, as described in step 4.1.1, and that the changes are saved correctly.

Also, kindly review step 4.1.3 to confirm that no modifications were made to this section. Only uncomment the lines specified in step 4.1.4, as instructed.
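When checking the script against the notebook, it can help to compare the URL from the Glue error with the one that returned 200 in step 2.5, piece by piece. A minimal sketch using only the standard library; the helper names describe_url and differing_parts are hypothetical, and this only shows which parts differ, not which value is correct:

```python
from urllib.parse import parse_qsl, urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts that matter when debugging a 404."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        "params": dict(parse_qsl(parts.query)),
    }


def differing_parts(url_a: str, url_b: str) -> list:
    """Return the names of the parts in which the two URLs differ."""
    a, b = describe_url(url_a), describe_url(url_b)
    return sorted(key for key in a if a[key] != b[key])


if __name__ == "__main__":
    failing = ("http://ec2-54-243-232-194.compute-1.amazonaws.com/"
               "?start_date=2020-01-01&end_date=2020-01-31")
    print(describe_url(failing))
```

Calling differing_parts with the URL from the Glue log and the URL used in the notebook lists the parts that do not match (scheme, host, path, or params), which narrows down where the script and the notebook disagree.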