Capstone Project Part 1 - ETL and Data Modeling - complete frustration

I have now filled out the form a total of 5 times. The last three times, I was sent an e-mail that stated " has been resolved" - only to find out that the same error messages are reported when I run setup.sh.
The problem starts the same way… I initiate and plan terraform, but when I apply, the terminal goes away. I have to start another terminal and repeat the same process… The same problem repeats. After several trials, I reboot the system by following the instructions. Then I’m back to the original problem:

Finally, now I get this message:

What does this mean? Does it mean I can’t complete the assignment, and therefore can’t complete the course either?
Please help!

Hello @sgeletta
Have you tried running terraform apply -no-color 2> errors.txt? This way the error output is saved to errors.txt, so you can see what’s going wrong with the terraform apply command. There might be an issue with your terraform configuration files.
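
A minimal sketch of that suggestion as a reusable wrapper, in case it helps. The function name run_logged is made up for illustration and is not part of the lab; the only Terraform flag involved is -no-color, as mentioned above.

```shell
# Hypothetical helper (not part of the lab): run a command, send its
# stderr to a log file, and report the exit code instead of failing.
run_logged() {
  local log_file="$1"
  shift
  local status=0
  # stdout stays on the terminal; stderr is redirected to the log file
  "$@" 2> "$log_file" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit code $status; see $log_file"
  fi
  # Always return 0 so the failure does not propagate to the calling shell
  return 0
}

# Intended usage in the lab terminal (not executed here):
#   run_logged errors.txt terraform apply -no-color
#   cat errors.txt
```

Returning 0 from the wrapper is a design choice for this sketch only: it keeps the failure visible in the message and the log file rather than in the shell's exit status.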

It is not about terraform errors. It is about my connections being reset while I am waiting for apply to run… and then never being able to reconnect in the current state… although when I reset Windows, I get connected… with all environment variables disappearing and/or the error message:
“Error: creating IAM Role (de-c4w4a1-glue-role): operation error IAM: CreateRole, https response error StatusCode: 409, RequestID: f3942540-b7a4-4479-84b1-f0552d7a6d0e, EntityAlreadyExists: Role with name de-c4w4a1-glue-role already exists.” when I try to re-run terraform apply. I have been through this routine about 4 times, just today!! Seems to me that the assigned resources are not adequate to allow completing the project… or something… It is towards the end of the project, when all 9 terraform processes are run, that this is happening.

Just FYI, if AWS gets mad at you, it can take several hours before it will release the assets it is holding. After that, you have a better chance of starting over.

The “terraform” tasks are particularly vexing.

This course is plagued by an unstable programming environment, because it’s run by AWS, not DLAI.

It’s not really a good learning environment. AWS assumes you don’t make any mistakes, and gets unhappy if you do.

@sgeletta the allocated resources are enough for the assignment. However, as @TMosh has already mentioned, if you run similar commands repeatedly, AWS will overload, and you won’t be able to continue. Please try filling out the form once again, and this time, try to get everything correct on the first try. This way you will be able to pass the assignment.

@TMosh: “if AWS gets mad at you…” lol! AWS has been mad at me for the last four weeks… and now I’m mad at it because I got charged extra for a course that I should have finished four weeks ago.
@Amir_Zare: debugging code the size of the capstone project requires running repetitive commands. This should have been anticipated by the course planners - IMHO. I will fill the form, again…

I have (pretty much) finished debugging the terraform code. I applied the necessary changes for the “serving” module to run, but again, I got kicked out at “terraform apply”. When I restored the console and re-ran the “terraform apply” command, I got the now-so-familiar error message:
│ Error: creating Glue Connection (de-c4w4a1-connection-rds): operation error Glue: CreateConnection, https response error StatusCode: 400, RequestID: b848ce21-6d79-46d4-b4b1-deaf31d2df2e, AlreadyExistsException: Connection already exists.

│ with module.extract_job.aws_glue_connection.rds_connection,
│ on modules/extract_job/glue.tf line 2, in resource "aws_glue_connection" "rds_connection":
│ 2: resource "aws_glue_connection" "rds_connection" {

│ Error: creating IAM Role (de-c4w4a1-glue-role): operation error IAM: CreateRole, https response error StatusCode: 409, RequestID: e668f7d8-043d-42b9-8479-8bd686b4a878, EntityAlreadyExists: Role with name de-c4w4a1-glue-role already exists.

│ with module.extract_job.aws_iam_role.glue_role,
│ on modules/extract_job/iam.tf line 1, in resource "aws_iam_role" "glue_role":
│ 1: resource "aws_iam_role" "glue_role" {

│ Error: pq: Schema "deftunes_serving" already exists

│ with module.serving.redshift_schema.serving_schema,
│ on modules/serving/redshift.tf line 2, in resource "redshift_schema" "serving_schema":
│ 2: resource "redshift_schema" "serving_schema" {

│ Error: pq: Schema "deftunes_transform" already exists

│ with module.serving.redshift_schema.transform_external_from_glue_data_catalog,
│ on modules/serving/redshift.tf line 12, in resource "redshift_schema" "transform_external_from_glue_data_catalog":
│ 12: resource "redshift_schema" "transform_external_from_glue_data_catalog" {

Which means I will have to submit a lab reset request and wait for another two days. I have no options, do I?

Hi @sgeletta - When you get this error, you don’t need to submit the form. This is something related to the terraform state and to the order you’re following to apply the terraform steps.

  • Did you remove any terraform related file?
  • Did you reboot the lab while you were in the lab session?
  • Can you explain to me the steps you did before you got this error?

You can always delete these resources from the AWS console. Also, when you repeat the lab, make sure you’re doing things step by step, as if you’re starting the lab from the beginning.
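
One way to check the state hypothesis before deleting anything is to compare the resource addresses from the error output against what terraform state list reports. The sketch below is only an illustration under assumptions: missing_from_state is a made-up helper name, and the addresses in the usage comment are the ones shown in the error messages earlier in this thread. Whether importing an existing resource is appropriate in this lab is also an assumption.

```shell
# Hypothetical helper: read `terraform state list` output on stdin and
# print every address passed as an argument that is NOT tracked in the state.
missing_from_state() {
  local tracked
  tracked="$(cat)"
  local addr
  for addr in "$@"; do
    # -x matches the whole line, -F treats the address as a fixed string
    if ! printf '%s\n' "$tracked" | grep -qxF -- "$addr"; then
      printf '%s\n' "$addr"
    fi
  done
}

# Intended usage from the lab's terraform directory (not executed here):
#   terraform state list | missing_from_state \
#     module.extract_job.aws_iam_role.glue_role \
#     module.extract_job.aws_glue_connection.rds_connection
#
# If the role is reported as missing from the state but already exists in
# AWS, one option (an assumption, not a lab instruction) is to adopt it
# into the state instead of deleting it:
#   terraform import module.extract_job.aws_iam_role.glue_role de-c4w4a1-glue-role
```

An address printed by the helper exists in the configuration but not in the state, which would explain why terraform apply tries to create it again and gets an "already exists" error.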

I’m happy to assist you when you try the lab again, but I would appreciate it if you explained the exact steps you’re following (you can send them in a message).

Please make sure to tag me so I get to see your reply.

As another tip, one learner suggested appending || echo "$?" to all the terraform commands so the error gets displayed in the terminal without crashing it (as an alternative to printing the errors to a text file). I thought I would share it here as well.
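
A small sketch of what that suffix does, using a stand-in function instead of terraform so it can be run anywhere. The names fake_apply and guarded are made up for illustration. The || operator runs echo only when the command on its left fails, "$?" is that command's exit code at that point, and because echo itself succeeds, the whole line finishes with status 0. That this is the reason the lab terminal stays open is an assumption, not something confirmed by the course.

```shell
# Stand-in for a failing `terraform apply`; hypothetical, for illustration only.
fake_apply() {
  echo "Error: something already exists" >&2
  return 1
}

# Same pattern as `terraform apply || echo "$?"`, wrapped in a function
# so it can be applied to any command.
guarded() {
  "$@" || echo "$?"
}

# The error text still goes to stderr; the exit code is printed on stdout,
# and the guarded line as a whole succeeds.
guarded fake_apply
echo "status of the guarded line: $?"
```

In the lab terminal the equivalent would simply be terraform apply || echo "$?".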

It is great news that I didn’t need to reset the lab. Please let me know how to proceed.
Thanks again!
Simon

Okay great for the first two points! Great you didn’t reboot the lab, as this would have affected the terraform state.

Regarding the steps, a clarification question: when you ran the first module, did you run the extract glue jobs before proceeding to the second module? Also, when you ran the second module, did you run the transform glue jobs?

The extract job didn’t give me much trouble. The transform job required repeated debugging (the last holdup was correctly specifying the value for “ingestion-date”, which I finally succeeded in doing). So, these two tasks were submitted together. As soon as I got a “success” message for each of the modules in the two steps, I continued to the “serving” step and tried to submit that next.

@sgeletta Thank you for all the detailed info. That’s very helpful for me.

One more clarification question: after the connection was lost, you said you reset the window. What exactly did you do here? Did you refresh the window or launch the lab again? I’m asking to understand why the terraform state has been lost.

My initial hypothesis: it looks like there might be some error in the files you edited in 4.3.1 → 4.3.5, so terraform apply did not work, you lost the connection, and something happened to the terraform state.
I will try the lab and attempt to replicate the error so I can understand what’s happening.

If you’d like, you can do the following:

  • Share with me, in a message, the work you did in 4.3.1 → 4.3.5 to make sure all is good there
  • Or try the exact lab steps again (module 1 and its glue jobs, then module 2 and its glue jobs), but when you get to the serving section, let’s try to use
    terraform apply || echo "$?" (I’ve been hearing from other learners that it prevents the disconnection from happening)

I refreshed the browser window. I will share my files momentarily… Thanks!