Capstone Project Part 2 - Glue Data Quality Rulesets Already Exist

While running terraform apply -target=module.data_quality, I keep getting this error:
Error: creating Glue Data Quality Ruleset (songs_dq_ruleset): operation error Glue: CreateDataQualityRuleset, https response error StatusCode: 400, RequestID: daa04876-8600-4696-b04b-f83aa804ae72, AlreadyExistsException: Another ruleset with the same name already exists: songs_dq_ruleset

with module.data_quality.aws_glue_data_quality_ruleset.songs_dq_ruleset,
on modules/data_quality/glue.tf line 1, in resource "aws_glue_data_quality_ruleset" "songs_dq_ruleset":
1: resource "aws_glue_data_quality_ruleset" "songs_dq_ruleset" {

Error: creating Glue Data Quality Ruleset (sessions_dq_ruleset): operation error Glue: CreateDataQualityRuleset, https response error StatusCode: 400, RequestID: 9356bf26-eb03-42a8-a02d-5c497095ab3f, AlreadyExistsException: Another ruleset with the same name already exists: sessions_dq_ruleset

with module.data_quality.aws_glue_data_quality_ruleset.sessions_dq_ruleset,
on modules/data_quality/glue.tf line 10, in resource "aws_glue_data_quality_ruleset" "sessions_dq_ruleset":
10: resource "aws_glue_data_quality_ruleset" "sessions_dq_ruleset" {

Error: creating Glue Data Quality Ruleset (users_dq_ruleset): operation error Glue: CreateDataQualityRuleset, https response error StatusCode: 400, RequestID: 6631fb93-b058-45f7-bfb6-8749ec9e60ef, AlreadyExistsException: Another ruleset with the same name already exists: users_dq_ruleset

with module.data_quality.aws_glue_data_quality_ruleset.users_dq_ruleset,
on modules/data_quality/glue.tf line 19, in resource "aws_glue_data_quality_ruleset" "users_dq_ruleset":
19: resource "aws_glue_data_quality_ruleset" "users_dq_ruleset" {

My previous Glue jobs ran successfully, but I cannot delete the rulesets in the Glue databases.

What should I do? Could anyone please help?

Hello @arban2212,
Could you make sure you are following the correct order of the commands in part 2.3:

terraform init
terraform plan
terraform apply -target=module.extract_job
terraform apply -target=module.transform_job
terraform apply -target=module.serving

Later, in step 3.1.2, you create the rulesets with the command terraform apply -target=module.data_quality. If you run it before serving, for example, you might get this error. Hope it helps.
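The ordering described above could be scripted so that no step runs early. The following is only a minimal sketch and not part of the lab: the module targets are taken from this thread, the function names build_commands and run_all are hypothetical, and by default it only prints the commands instead of running them.

```python
import subprocess

# Module targets in the order given in this thread (part 2.3, then step 3.1.2).
TARGETS = ["extract_job", "transform_job", "serving", "data_quality"]


def build_commands(targets=TARGETS):
    """Return the terraform commands as argument lists, in execution order."""
    commands = [["terraform", "init"], ["terraform", "plan"]]
    for target in targets:
        commands.append(["terraform", "apply", f"-target=module.{target}"])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(build_commands())  # dry run: prints the six commands in order
```

Keeping data_quality last in the list reflects the point made above: the rulesets are created only after the serving module has been applied.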

Hi @Georgios, I followed the correct order of commands, as you mentioned. Those worked perfectly for me; my issue is only with module.data_quality.
Can you suggest some other alternatives?

Hello @arban2212,

Sorry for the inconvenience. Since you followed the correct order and didn’t find the rulesets in the AWS Glue console, I can’t think of anything else. Because you are not able to delete them to continue, you could ask for a lab refresh. Please submit the form; it takes 2 business days to complete. Hope it helps.
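For readers who cannot wait for a lab refresh, one general way to handle an AlreadyExistsException is to import the existing resources into the Terraform state instead of recreating them. The sketch below only builds the terraform import commands from the error text and runs nothing. It assumes that aws_glue_data_quality_ruleset can be imported by ruleset name (as the AWS provider documentation describes), that each resource label matches its ruleset name as in the error above, and that the lab role permits the import; none of this is confirmed in this thread. The helper name import_commands is hypothetical.

```python
import re

# Matches the ruleset name at the end of each AlreadyExistsException message.
ERROR_RE = re.compile(r"same name already exists: (\w+)")


def import_commands(error_text, module="data_quality"):
    """Build one `terraform import` command per distinct ruleset in the error."""
    names = []
    for name in ERROR_RE.findall(error_text):
        if name not in names:  # keep first-seen order, skip duplicates
            names.append(name)
    return [
        # Assumption: the Terraform resource label equals the ruleset name.
        f"terraform import module.{module}.aws_glue_data_quality_ruleset.{name} {name}"
        for name in names
    ]


if __name__ == "__main__":
    sample = (
        "AlreadyExistsException: Another ruleset with the same name "
        "already exists: songs_dq_ruleset\n"
        "AlreadyExistsException: Another ruleset with the same name "
        "already exists: sessions_dq_ruleset\n"
    )
    for command in import_commands(sample):
        print(command)
```

If the import is not permitted in the lab account, the lab refresh suggested above remains the fallback.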

Yes, I have done that.
Thank you.

Hi @Georgios, I have applied for a lab refresh again because my Glue jobs, which succeeded in the previous attempt, failed when I ran them again today. Could you please make sure that I get the refreshed lab on time? My deadline is in 3 days.

Hello @arban2212,

Yes, the lab refresh will be on time. Perhaps you need to change the API endpoint in your new lab session. Could you check in CloudFormation and make sure you use the correct one? Thank you.

OK, I’ll make sure of that.

Hello @arban2212,

Yes, you could check that. You need to replace the <API_ENDPOINT> placeholders with the API endpoint value (in two places). Every lab session should have a different one. What error do you get when you try to run the jobs? Could you share a screenshot? Thanks.
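A quick way to confirm that both placeholders were replaced is to count them before and after the edit. This is only a sketch: the helper name fill_endpoint, the sample configuration lines, and the host name ec2-example.compute-1.amazonaws.com are hypothetical. The real endpoint value comes from CloudFormation for the current lab session.

```python
PLACEHOLDER = "<API_ENDPOINT>"


def fill_endpoint(config_text, endpoint):
    """Replace every placeholder; return the new text and the replacement count."""
    if not endpoint or PLACEHOLDER in endpoint:
        raise ValueError("endpoint must be a real value, not the placeholder")
    count = config_text.count(PLACEHOLDER)
    return config_text.replace(PLACEHOLDER, endpoint), count


if __name__ == "__main__":
    # Hypothetical excerpt; the actual file layout in the lab may differ.
    sample = (
        '"--api_url" = "http://<API_ENDPOINT>/users"\n'
        '"--api_url" = "http://<API_ENDPOINT>/sessions"\n'
    )
    # Hypothetical host name standing in for the CloudFormation value.
    filled, replaced = fill_endpoint(sample, "ec2-example.compute-1.amazonaws.com")
    print(f"replaced {replaced} placeholder(s)")
    print(filled)
```

A count other than two before replacement, or any placeholder left afterwards, would suggest the file was edited incorrectly.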

Yes, I was getting a similar error for 3 Glue jobs.

Hello @arban2212,

I see. This might be because you’ve edited something in the terraform/modules/extract_job/glue.tf file. Which jobs have the issue, and what do the errors say? Perhaps I can check it, thanks. Another possible cause is an incorrect edit to these values in the file:

    "--api_url"             = "http://ec2***.compute-1.amazonaws.com/users"
    "--target_path"         = "s3://${var.data_lake_bucket}/landing_zone/api/users"
    "--ingest_date"         = "2020-02-01"

Hi @Georgios
OK, I will change the values.
I just wanted to know whether the lab refresh will be on time, because of my tight deadline. Thanks.

Hello @hawraa.salami,

It seems there is an issue with @arban2212’s lab when running his jobs and with existing resources. Could you check his lab refresh, since his deadline is close? @arban2212, when did you submit the form? It takes two business days to complete. Could you also please share the outputs of the jobs, so we can see whether the error is Insufficient Lake Formation Permissions or something else? Thank you.

Hi @Georgios, I submitted the form yesterday. Thanks.
I will try to find the output screenshots, if I still have them.

Hello @arban2212,

Yes, please do, since it has already been two days since you had the issue with the dq_rulesets. You should be fine after the lab refresh, but I am asking in case there is something we could do about the failing jobs. Thank you.

Another issue I have had recently: when I try to open the lab, the Jupyter notebook does not load and instead shows ‘unsupported extension vscode.ipynb’.
I tried reloading multiple times, but that did not help.

Hello @arban2212,

Does the lab load but fail to open the instructions, showing something like Unknown extension vscode.ipynb? I had this issue too: it didn’t load because of a poor connection and threw that error after a while. Could you check that you are using everything as before? Thanks.

Yes, I am getting the same output as in your screenshot.

Hello @arban2212,

Just checking, since I was able to get a lab refresh earlier: were you able to access the lab after your new issue? I ask because you should be able to continue with your Glue jobs, or perhaps you might need to submit a new form to be sure. Thank you.

Hi @Georgios
I received the new lab. I will let you know if I face the same issue again.
