C4W4 Capstone Project Pt 1: Section 4.2.6

After five attempts, I am stuck. I don't think the lab refreshes are working. After a few corrections to my initial code changes, everything worked fine. I got some random terminal crashes and errors I couldn't reproduce, but after a lab reboot things worked fine until section 4.2, when setting up the transform Glue jobs.

To the best of my knowledge there are no code errors, but it will not get past the terraform apply in 4.2.6.

No matter what I do (lab reboots, starting again at step 1 and redoing the entire lab, etc.), it stops here. Any help would be appreciated. My last resort is to have the entire lab reset to zero.

Run the environment setup script and, immediately after it finishes, deploy with Terraform. The variables should then be set, and you will not be asked to provide values.
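For reference, Terraform reads an input variable from an environment variable named TF_VAR_<name>, which is presumably how the setup script avoids the prompt. The following is a minimal Python sketch that lists which required variables are not set before you run terraform apply. The variable names in the `__main__` section are hypothetical placeholders; take the real ones from the lab's variables.tf. Run it from the same shell in which you sourced the setup script, because a child process only inherits that shell's environment.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_tf_vars(required: Iterable[str],
                    env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that have no non-empty TF_VAR_<name> value.

    Terraform prompts for a variable that gets no value from any source
    (default, .tfvars file, -var flag, or TF_VAR_ environment variable).
    Empty strings are flagged too, since they usually mean the setup
    script did not populate the value.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(f"TF_VAR_{name}")]


if __name__ == "__main__":
    # Hypothetical variable names: replace with the ones declared in variables.tf.
    missing = missing_tf_vars(["project", "region"])
    if missing:
        print("Not set (did you run `source scripts/setup.sh`?):", missing)
    else:
        print("All required TF_VAR_ variables are set.")
```

Note that running plain scripts/setup.sh (without source) executes it in a child shell, so any variables it exports disappear when it exits; that is one plausible reason for being prompted again.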

Not sure what you mean. Are you referring to source scripts/setup.sh?

This part never finishes. It shouldn't be asking me for the variable value; this plan should run and complete. I've tried re-running setup.sh, and that doesn't seem to help.

Are you suggesting that I run setup.sh after uploading the scripts and before running terraform init?

Yes. Run the following in the project directory:
source scripts/setup.sh

I had a similar issue and just finished, basically by reverse engineering it. I ran terraform destroy to get rid of all the resources on AWS, and then ran terraform apply again. At that point I got errors saying the resources already existed in AWS, so I manually deleted them on AWS and made sure terraform apply went through. Then run the extract jobs first, and when they finish, run the transform jobs.
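The ordering described above (extract jobs to completion, then transform jobs) can also be scripted. This is a minimal Python sketch that assumes `glue` is a boto3 Glue client (boto3.client("glue")); it only uses the start_job_run and get_job_run calls, and the helper names are hypothetical. It waits for every extract run to reach a terminal state and refuses to start the transforms if any extract run did not succeed.

```python
import time
from typing import Any, Dict, Sequence

# Glue job-run states after which a run will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_and_wait(glue: Any, job_names: Sequence[str],
                 poll_seconds: float = 30.0) -> Dict[str, str]:
    """Start each job once, then poll until every run is in a terminal state."""
    run_ids = {name: glue.start_job_run(JobName=name)["JobRunId"] for name in job_names}
    final_states: Dict[str, str] = {}
    while len(final_states) < len(run_ids):
        for name, run_id in run_ids.items():
            if name in final_states:
                continue
            run = glue.get_job_run(JobName=name, RunId=run_id)["JobRun"]
            if run["JobRunState"] in TERMINAL_STATES:
                final_states[name] = run["JobRunState"]
        if len(final_states) < len(run_ids):
            time.sleep(poll_seconds)
    return final_states


def run_extract_then_transform(glue: Any, extract_jobs: Sequence[str],
                               transform_jobs: Sequence[str],
                               poll_seconds: float = 30.0) -> Dict[str, str]:
    """Run all extract jobs to completion, then start the transform jobs."""
    extract_states = run_and_wait(glue, extract_jobs, poll_seconds)
    failed = {name: s for name, s in extract_states.items() if s != "SUCCEEDED"}
    if failed:
        # The transforms read what the extracts wrote, so stop here.
        raise RuntimeError(f"Extract jobs did not succeed: {failed}")
    return run_and_wait(glue, transform_jobs, poll_seconds)
```

With a real client, the transform list would hold the two job names that appear later in this thread (de-c4w4a1-json-transform-job and de-c4w4a1-songs-transform-job); the extract job names are whatever your Terraform created and are not shown here.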

I just tried that, with no luck. The plan command ran, but when I ran apply it gave me:

Error: creating Glue Connection (de-c4w4a1-connection-rds): operation error Glue: CreateConnection, https response error StatusCode: 400, RequestID: 40ba307e-2d25-4c12-aed0-8b4bd23d59d8, AlreadyExistsException: Connection already exists.

with module.extract_job.aws_glue_connection.rds_connection,
on modules/extract_job/glue.tf line 2, in resource "aws_glue_connection" "rds_connection":
2: resource "aws_glue_connection" "rds_connection" {

Error: creating IAM Role (de-c4w4a1-glue-role): operation error IAM: CreateRole, https response error StatusCode: 409, RequestID: b5b67b63-b33e-4c01-ae92-16c6c36ab556, EntityAlreadyExists: Role with name de-c4w4a1-glue-role already exists.

with module.extract_job.aws_iam_role.glue_role,
on modules/extract_job/iam.tf line 1, in resource "aws_iam_role" "glue_role":
1: resource "aws_iam_role" "glue_role" {

This worked fine for the extract Glue jobs, but now it fails here.

Look at the error messages, brother:
“Connection already exists.”
“Role with name de-c4w4a1-glue-role already exists.”

Manually delete those from the AWS console and rerun.
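If you prefer to script this instead of clicking through the console, here is a minimal Python sketch, assuming `glue` and `iam` are boto3 clients (boto3.client("glue") and boto3.client("iam")); the helper names are hypothetical. IAM will not delete a role that still has policies attached, so the sketch detaches managed policies and deletes inline ones first. Only delete the resources named in the AlreadyExists errors: removing other roles can break later steps, as the missing de-c4w4a1-load-role error further down this thread shows.

```python
from typing import Any


def delete_glue_connection(glue: Any, name: str) -> None:
    """Delete a Glue Data Catalog connection by name."""
    glue.delete_connection(ConnectionName=name)


def delete_iam_role(iam: Any, role_name: str) -> None:
    """Delete an IAM role, first removing the policies that would block deletion.

    Assumes the role has no instance profiles and that each list call fits
    in a single page (no pagination handling).
    """
    # Managed policies must be detached before the role can be deleted.
    attached = iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]
    for policy in attached:
        iam.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])
    # Inline policies must be deleted as well.
    for policy_name in iam.list_role_policies(RoleName=role_name)["PolicyNames"]:
        iam.delete_role_policy(RoleName=role_name, PolicyName=policy_name)
    iam.delete_role(RoleName=role_name)
```

For the errors above, that would be delete_glue_connection(glue, "de-c4w4a1-connection-rds") and delete_iam_role(iam, "de-c4w4a1-glue-role"), followed by another terraform apply.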

Hi,

I can't move forward with this Capstone Project Pt 1:

Error: reading IAM Role (de-c4w4a1-load-role): couldn't find resource

with module.serving.data.aws_iam_role.redshift_spectrum_role,
on modules/serving/iam.tf line 2, in data "aws_iam_role" "redshift_spectrum_role":
2: data "aws_iam_role" "redshift_spectrum_role" {

Is it possible to have a fresh environment?

How do I manually delete those connections?

After the terminal crash, I got an "already exists" IAM role error for the Glue job, so I decided to manually delete the IAM roles.
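That may explain the earlier load-role error: modules/serving/iam.tf reads de-c4w4a1-load-role through a data block, and a Terraform data source only looks up an existing object; it never creates one. Below is a minimal Python sketch for checking which expected roles are gone, assuming `iam` is a boto3 IAM client, whose get_role call raises the client's NoSuchEntityException when the role does not exist. The function name is hypothetical, and the role list is whatever your modules reference; de-c4w4a1-load-role and de-c4w4a1-glue-role are the two named in this thread.

```python
from typing import Any, Iterable, List


def missing_roles(iam: Any, role_names: Iterable[str]) -> List[str]:
    """Return the role names that IAM reports as nonexistent."""
    missing = []
    for name in role_names:
        try:
            iam.get_role(RoleName=name)
        except iam.exceptions.NoSuchEntityException:
            # The role is gone; a data source referencing it will fail.
            missing.append(name)
    return missing
```

If a role created outside this Terraform configuration turns out to be missing, recreating it by hand means guessing its trust and permission policies, which is why a lab refresh is the safer route.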

Hello @AritraDasRay and @euloge,
Your labs need a refresh. Please fill out this form if you are still facing the issue.

Thank you.

Now I am stuck on a transformation job: it fails when running.

Job: de-c4w4a1-json-transform-job
Failed with error message: AnalysisException: Path does not exist: s3://de-c4w4a1-972550453525-us-east-1-data-lake/landing_zone/api/users/2025-03-01

But the S3 path exists.

Job: de-c4w4a1-songs-transform-job
Failed with error message: AttributeError: 'DataFrame' object has no attribute 'duration'

But I can see the field "duration" in the CSV file on S3.
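The thread does not show the cause of this particular error, but in PySpark, attribute access such as df.duration raises exactly this AttributeError when the DataFrame has no column whose name is exactly duration, so it is worth confirming what header the job actually sees. Two assumptions worth checking: if the CSV is read without the header option, Spark names the columns _c0, _c1, and so on; and a stray space, different case, or a UTF-8 byte-order mark in the header yields a name that looks right but does not match. The following is a plain-Python sketch (not Spark) that classifies the header of a downloaded copy of the file; the function name is hypothetical.

```python
import csv
import io


def check_header(csv_text: str, wanted: str) -> str:
    """Classify how `wanted` appears in the first row of a CSV document.

    Returns "exact", "near-miss" (matches only after stripping a BOM,
    whitespace and case), or "missing".
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    if wanted in header:
        return "exact"
    # Normalise away a UTF-8 BOM, surrounding whitespace and case.
    normalised = [h.lstrip("\ufeff").strip().lower() for h in header]
    if wanted.strip().lower() in normalised:
        return "near-miss"
    return "missing"
```

A "near-miss" result would mean the name you see in the file is not the exact string Spark's attribute lookup compares against.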

Hi, I filled out the form.

@euloge your post is from yesterday, but the transformation job is looking for files in a path with 2025-03-01 in it. I guess you might have forgotten to update some of the current-date placeholders.
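A minimal sketch of the check suggested here, assuming the landing-zone layout visible in the error message (s3://<bucket>/landing_zone/<source>/<YYYY-MM-DD>) and assuming the extract jobs wrote their partition under the date on which you ran them. Both helper names are hypothetical, and where the lab actually stores the date value (Terraform variable or job argument) is not shown in this thread.

```python
from datetime import date
from typing import Optional


def landing_zone_path(bucket: str, source: str, run_date: date) -> str:
    """Build the date-partitioned landing-zone path, e.g. .../api/users/2025-03-01."""
    return f"s3://{bucket}/landing_zone/{source}/{run_date.isoformat()}"


def stale_date_warning(configured: str, extract_date: date) -> Optional[str]:
    """Return a warning if the configured date is not the date the extract ran.

    date.fromisoformat raises ValueError for strings that are not ISO dates.
    """
    if date.fromisoformat(configured) != extract_date:
        return (f"transform reads partition {configured}, "
                f"but the extract ran on {extract_date.isoformat()}")
    return None
```

For example, landing_zone_path(bucket, "api/users", date.today()) gives the folder a run started today would need, which you can compare against what aws s3 ls lists under landing_zone/.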

Thank you, @Amir_Zare. It works. The issue was the date, as you mentioned.