Intro to Data Eng: Week 4 Lab 3 AWS Glue Error

Hi,

I have been facing consistent issues with this lab, especially when running the Terraform files. I have gone through all the solutions provided to other users with the same issue: I deleted the resources in AWS, deleted the code files in VS and used ‘Get latest version’. I also requested a lab refresh, which seemed to help with the Terraform issues, but I then immediately ran into issues running the Glue job. I have also deleted resources from AWS, only to have Terraform claim it needed those resources after an earlier error had claimed they already existed. I have waited the stipulated time for the lab to reset, but that has not resolved the errors either. Could someone please provide some guidance on how I can overcome these issues?
The latest issue I ran into was an ‘Insufficient Lake Formation’ error. Attached is a screenshot from the AWS console for reference:

Hello @Epoch_and_chill,

The Insufficient Lake Formation Permissions error can be handled with that form. There is an option for Insufficient Lake Formation; note that it should take 2 business days (no weekends) to complete, since it is a manual process performed by the engineers. That should fix your issue with de-c1w4-etl-job failing.

As for your other issues, there is no need to delete the Terraform files and use ‘Get latest version’ unless you have changed the code or are missing files. What you need to do instead is run the command terraform apply -no-color 2> errors.txt. Then, as you found out, the errors.txt file lists resources you need to delete:

  1. rds connection

  2. glue role

  3. ml database

  4. the crawler (you might also need to delete it if you run terraform apply -no-color 2> errors.txt a second time)
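
As a rough aid for reading errors.txt, the sketch below pulls out the resources that Terraform reports as already existing. It assumes the error lines look like the ones quoted in this thread (Error: creating <kind> (<name>): ...). The function name find_existing_resources and the shortened sample text are made up for illustration and are not part of the lab.

```python
import re

# Matches lines such as:
#   Error: creating IAM Role (de-c1w4-glue-role): operation error IAM: ...
ERROR_RE = re.compile(
    r"^Error: creating (?P<kind>.+?) \((?P<name>[^)]+)\):(?P<rest>.*)$"
)

# Substrings that indicate a name clash (compared in lower case).
CLASH_MARKERS = ("alreadyexists", "already exists", "duplicate")


def find_existing_resources(text):
    """Return (kind, name) pairs for every 'already exists' error in text."""
    found = []
    for raw in text.splitlines():
        # Some Terraform output prefixes diagnostics with a bar character;
        # strip it if present.
        line = raw.strip().lstrip("│").strip()
        match = ERROR_RE.match(line)
        if match is None:
            continue
        rest = match.group("rest").lower()
        if any(marker in rest for marker in CLASH_MARKERS):
            found.append((match.group("kind"), match.group("name")))
    return found


# Shortened sample in the shape of the errors quoted in this thread.
sample = """\
Error: creating Glue Connection (de-c1w4-rds-connection): operation error Glue: CreateConnection, https response error StatusCode: 400, AlreadyExistsException: Connection already exists.
Error: creating IAM Role (de-c1w4-glue-role): operation error IAM: CreateRole, https response error StatusCode: 409, EntityAlreadyExists: Role with name de-c1w4-glue-role already exists.
"""

for kind, name in find_existing_resources(sample):
    print(f"{kind}: {name}")
```

To use it on a real run, pass the contents of errors.txt to find_existing_resources instead of the sample string.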

I am not sure which resources Terraform claimed it needed, since once terraform apply completes without crashing it recreates any missing resources. Hope it helps.

Thanks @Georgios I’ll give it a try.

So I ran terraform apply with the errors.txt redirect and it gave me no errors, so the only issue I am facing now is ‘Insufficient Lake Formation’ when running the Glue job. I had previously submitted a request to reset the lab to get past the Terraform errors, but it looks like it will need another reset to get past the Glue job errors.

Hello @Epoch_and_chill,

Sorry for the inconvenience. A lab refresh seems to fix that. Perhaps @hawraa.salami could check your lab refresh, since you already had the same issue with this lab before. Hope it helps.

That would be great! Thanks a lot @Georgios

@Epoch_and_chill I just refreshed your AWS account, can you try again and let me know if you’re still encountering any issues?

Hey @hawraa.salami, I ran the lab again. The first Terraform initialization and the Glue jobs ran well, and the embeddings and the vector DBs were created. I am now getting an error when trying to implement the streaming pipeline. I generated the errors.txt file after running terraform apply; here’s the error message.

Error: creating Glue Catalog Database (de-c1w4-ml-db): operation error Glue: CreateDatabase, https response error StatusCode: 400, RequestID: 329f7fd8-30a8-47f5-9139-e1865d1b680c, AlreadyExistsException: Database already exists.

with module.etl.aws_glue_catalog_database.ml_database,
on modules/etl/glue.tf line 1, in resource "aws_glue_catalog_database" "ml_database":
1: resource "aws_glue_catalog_database" "ml_database" {

Error: creating Glue Connection (de-c1w4-rds-connection): operation error Glue: CreateConnection, https response error StatusCode: 400, RequestID: 46cf95a7-8db3-4a9c-9116-321c3eb27172, AlreadyExistsException: Connection already exists.

with module.etl.aws_glue_connection.rds_connection,
on modules/etl/glue.tf line 6, in resource "aws_glue_connection" "rds_connection":
6: resource "aws_glue_connection" "rds_connection" {

Error: creating IAM Role (de-c1w4-glue-role): operation error IAM: CreateRole, https response error StatusCode: 409, RequestID: 2a612e5a-57dd-401f-b8c1-7b98fbe8bc42, EntityAlreadyExists: Role with name de-c1w4-glue-role already exists.

with module.etl.aws_iam_role.glue_role,
on modules/etl/iam_roles.tf line 1, in resource "aws_iam_role" "glue_role":
1: resource "aws_iam_role" "glue_role" {

UPDATE: I deleted the resources mentioned above through the AWS Console but was then met with different errors:

Error: creating IAM Role (de-c1w4-rds-role): operation error IAM: CreateRole, https response error StatusCode: 409, RequestID: 4098ae43-0c4f-4c9b-a7f0-bb4bde5aaa8e, EntityAlreadyExists: Role with name de-c1w4-rds-role already exists.

with module.vector_db.aws_iam_role.rds_role,
on modules/vector-db/iam_roles.tf line 1, in resource "aws_iam_role" "rds_role":
1: resource "aws_iam_role" "rds_role" {

Error: creating RDS DB Subnet Group (de-c1w4-vector-db-subnet-group): operation error RDS: CreateDBSubnetGroup, https response error StatusCode: 400, RequestID: 5aff7b22-c523-40f4-886c-f4759cf1f0f2, DBSubnetGroupAlreadyExists: The DB subnet group 'de-c1w4-vector-db-subnet-group' already exists.

with module.vector_db.aws_db_subnet_group.vector_db_subnet_group,
on modules/vector-db/rds.tf line 1, in resource "aws_db_subnet_group" "vector_db_subnet_group":
1: resource "aws_db_subnet_group" "vector_db_subnet_group" {

Error: creating Security Group (de-c1w4-vector-db-sg): operation error EC2: CreateSecurityGroup, https response error StatusCode: 400, RequestID: c1d3f429-a6f9-4f0e-b2ce-3b5ba1a05963, api error InvalidGroup.Duplicate: The security group 'de-c1w4-vector-db-sg' already exists for VPC 'vpc-09df03d28804533c5'

with module.vector_db.aws_security_group.vector_db_sg,
on modules/vector-db/rds.tf line 10, in resource "aws_security_group" "vector_db_sg":
10: resource "aws_security_group" "vector_db_sg" {
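
For reference, the "with module..." lines in errors like these can be grouped by module to see which part of the lab each clash comes from. This is a minimal sketch, assuming every address has the single-level form module.<module>.<type>.<name> seen in the output above. The helper name group_by_module is hypothetical.

```python
import re
from collections import defaultdict

# Matches the "with module.<module>.<type>.<name>," line of an error.
ADDRESS_RE = re.compile(
    r"^with module\.(?P<module>[\w-]+)\.(?P<type>\w+)\.(?P<name>[\w-]+),?$"
)


def group_by_module(text):
    """Map each module name to the list of '<type>.<name>' resources it owns."""
    grouped = defaultdict(list)
    for raw in text.splitlines():
        match = ADDRESS_RE.match(raw.strip())
        if match:
            resource = f"{match.group('type')}.{match.group('name')}"
            grouped[match.group("module")].append(resource)
    return dict(grouped)


# Address lines copied from the errors quoted in this thread.
addresses = """\
with module.etl.aws_glue_connection.rds_connection,
with module.vector_db.aws_iam_role.rds_role,
with module.vector_db.aws_security_group.vector_db_sg,
"""

for module, resources in group_by_module(addresses).items():
    print(module, resources)
```

If a module appears here before the lab has asked you to uncomment it, that suggests the module was still active in the Terraform files.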

Hello @Epoch_and_chill,

This error occurs because the terminal crashed while terraform apply was running when you tried to implement the streaming pipeline, so those resources now show as already existing. A new lab session after a few hours should clear them, since you can’t delete them manually. Could you try again in a new lab session? Hopefully terraform apply will then run without errors. Thank you

@Epoch_and_chill When you try the lab again, the Terraform files contain your saved work from a previous session.

The lab asks you to first uncomment the etl section in main.tf and outputs.tf, then those of vector_db, and so on. Suppose you try the lab again with the first two modules still uncommented from the previous session. When you run Terraform in the first section, all the resources of both modules are created. So when you get to the second section and run Terraform again, you get errors saying those resources already exist.

So when you try the lab again, I’d suggest that you either comment back everything you uncommented in previous sessions (in main.tf and outputs.tf), or retrieve the original lab files as follows:

  1. Create a new folder and move all current lab files to this folder (you can label the folder as old_files)

  2. Click on lab help which is the question mark on the top right

  3. Click on “Get the latest version”.
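
If you go the commenting route, a quick check of which modules are still active can help before running terraform apply. The sketch below is only an illustration: it assumes each module in main.tf starts with a line like module "etl" { and that commented-out blocks begin with # or //. The function name active_modules and the sample file contents are hypothetical, not the lab's actual main.tf.

```python
import re

# An active module block starts with: module "<name>" {
# Commented lines start with # or // and therefore do not match.
MODULE_RE = re.compile(r'^\s*module\s+"([^"]+)"\s*\{')


def active_modules(main_tf_text):
    """Return the names of module blocks whose opening line is not commented out."""
    names = []
    for line in main_tf_text.splitlines():
        match = MODULE_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


# Hypothetical main.tf contents: etl is active, vector_db is commented out.
sample_main_tf = """\
module "etl" {
  source = "./modules/etl"
}

# module "vector_db" {
#   source = "./modules/vector-db"
# }
"""

print(active_modules(sample_main_tf))
```

At the start of the lab, only the module for the section you are working on should show up in the result.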

Hey @hawraa.salami and @Georgios, just wanted to say thank you for all your help. The lab did reset as you had suggested and all of your suggestions helped and I was able to complete it. Thanks again!

Hi, I deleted all my resources by mistake, so I am not able to do my assignment. Please help me.

Hello @AkkashGanvir,

Could you go to lab help (at the top right corner) and choose “Get the latest version”? That should update your lab and recreate any files you deleted. Hope it helps:

  1. Click on lab help which is the question mark on the top right

  2. Click on “Get the latest version”.

You could also ask for a lab refresh with this form to reset your lab from the start.

@Georgios , I am also getting the same error. Can you please advise on how to proceed?

Hello @IroquoisPliskin,

Please submit this form for a lab refresh. It seems to fix Insufficient Lake Formation but it takes 2 business days to complete. Thank you

Thank you, I submitted the information in the form, let me know if you need additional information.
