C4W4 Capstone Project Part 1 - ETL and Data Modeling: 4.2 - Transformation Zone - Jobs failed

My jobs (de-c4w4a1-json-transform-job and de-c4w4a1-songs-transform-job) failed when I ran them from the VS Code terminal. I compared the following files in Capstone Project Part 1 with the same files in Capstone Project Part 2. The variables in my Part 1 files match those in the Part 2 files, except that Part 2 has a few extra lines for date variables.

  1. terraform/assets/transform_jobs folder:
    a. de-c4w4a1-transform-songs-job.py
    b. de-c4w4a1-transform-json-job.py
  2. terraform/modules/transform_job/s3.tf
  3. terraform/modules/transform_job/glue.tf
  4. terraform/main.tf: lines 16 to 30
  5. terraform/outputs.tf: lines 22 to 34

I don't see any errors in the log, and I have reviewed the files above a couple of times. Could you please advise me on how to troubleshoot these jobs? Here is the information from my end.

VS Code terminal:

```
(jupyterlab-venv) abc@b5d1bacb7ba0:~/workspace/terraform$ aws glue get-job-run --job-name de-c4w4a1-json-transform-job --run-id jr_ef6395bc9ca8f3c8cf16019775c42a9493949153173bfbef293416360791c9a7 --output text --query "JobRun.JobRunState"
FAILED

(jupyterlab-venv) abc@b5d1bacb7ba0:~/workspace/terraform$ aws glue get-job-run --job-name de-c4w4a1-songs-transform-job --run-id jr_ea76439a91beb002faf0fb8ca6851769aaa49cb589f2b6259c0f0f977c433b90 --output text --query "JobRun.JobRunState"
FAILED
```
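When `JobRunState` is `FAILED`, the run record itself usually carries the reason in its `ErrorMessage` field, even when nothing obvious shows up in the logs. From the CLI, the same field can be read by changing the query to `--query "JobRun.ErrorMessage"`. Below is a minimal Python sketch of the same lookup, assuming a boto3 Glue client is available; the helper name `describe_job_run` is hypothetical and not part of the course code.

```python
from typing import Any, Dict


def describe_job_run(glue_client: Any, job_name: str, run_id: str) -> Dict[str, str]:
    """Return the state and error message of one Glue job run.

    `glue_client` is expected to behave like boto3.client("glue"). It is passed
    in rather than created here, so the helper can be exercised without AWS.
    """
    response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
    run = response["JobRun"]
    return {
        "state": run.get("JobRunState", "UNKNOWN"),
        # ErrorMessage is typically only set for runs that did not succeed.
        "error": run.get("ErrorMessage", ""),
    }


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   info = describe_job_run(boto3.client("glue"), "de-c4w4a1-songs-transform-job", run_id)
#   print(info["state"], info["error"])
```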

[Screenshot: Glue jobs check]

I found the job errors.

In de-c4w4a1-transform-songs-job.py, the error in Glue is `ValueError: time data 'yyyy-mm-dd' does not match format '%Y-%m-%d'`. My date variables are as follows. Could you advise me?

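Lines 63-65 (original code):

```python
from datetime import datetime  # imported at the top of the script

# Lines 63-65: parse the --ingest_date job argument and reformat it for the S3 folder name
ingest_date = args["ingest_date"]
date_object = datetime.strptime(ingest_date, "%Y-%m-%d")
ingest_date_str = date_object.strftime("%Y_%m_%d")

landing_node = glueContext.create_dynamic_frame.from_options(
    # ... (other arguments omitted in the original post)
    connection_options={
        "paths": [
            f"s3://{source_bucket_path}/landing_zone/db_songs/ingest_on={ingest_date_str}/"
        ],
    },
)
```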
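The `ValueError` above comes from `datetime.strptime` receiving the literal placeholder text `yyyy-mm-dd` instead of a real date, which suggests the `--ingest_date` value passed to the job was never filled in. A minimal standard-library sketch reproducing this; `to_folder_suffix` is a hypothetical helper that mirrors lines 63-65.

```python
from datetime import datetime


def to_folder_suffix(ingest_date: str) -> str:
    """Mirror lines 63-65: parse 'YYYY-MM-DD' and reformat it as 'YYYY_MM_DD'."""
    date_object = datetime.strptime(ingest_date, "%Y-%m-%d")
    return date_object.strftime("%Y_%m_%d")


# A real date parses and is reformatted for the ingest_on=... folder name.
print(to_folder_suffix("2024-12-04"))  # 2024_12_04

# The unfilled placeholder fails the same way the Glue job did.
try:
    to_folder_suffix("yyyy-mm-dd")
except ValueError as err:
    print(err)
```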

Task:
I tried using the variables `ingest_date`, `date_object`, and `ingest_date_str`, with no luck. I also tried `"%Y-%m-%d"`, with no luck either.

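Add Metadata (lines 106-108):

```python
from pyspark.sql import functions as F  # imported at the top of the script

# Add the ingestion date and the data source as metadata columns
df = df.withColumn(
    "ingest_on", F.to_date(F.lit(date_object), "yyyy-MM-dd")
).withColumn("source_from", F.lit("postgres_rds"))
```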

Hello @Adazhu,
I think the code in de-c4w4a1-transform-songs-job is correct. The issue with de-c4w4a1-json-transform-job is related to the S3 data lake bucket. Could you check that you are not missing this folder in the S3 bucket:
`de-c4w4a1-[ACCOUNT ID]-us-east-1-data-lake/landing_zone/api/users/[TODAY DATE]`

If it is missing, your Landing Zone jobs did not create the folder the transform job needs in order to succeed. Check that you used the correct [API-ENDPOINT] in the terraform/modules/extract_job/**glue.tf** file.
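One way to check for that folder without opening the console is to list at most one object under its prefix, since S3 "folders" exist only as key prefixes. A minimal sketch, assuming a boto3 S3 client; the helper name `prefix_has_objects` is hypothetical, and the bucket and prefix have to be filled in with your own account ID and the exact date folder name your extract job wrote.

```python
from typing import Any


def prefix_has_objects(s3_client: Any, bucket: str, prefix: str) -> bool:
    """Return True if at least one object key in `bucket` starts with `prefix`.

    `s3_client` is expected to behave like boto3.client("s3").
    MaxKeys=1 keeps the request cheap: one matching key is enough.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


# Usage (assumes boto3 and AWS credentials; replace <ACCOUNT_ID> with your own):
#   import boto3
#   prefix_has_objects(
#       boto3.client("s3"),
#       "de-c4w4a1-<ACCOUNT_ID>-us-east-1-data-lake",
#       "landing_zone/api/users/",
#   )
```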

When you fill in the dates in terraform/modules/transform_job/glue.tf, use the correct format (year-month-day).

Another possible mistake is including the `.py` extension in s3.tf but not in the other file. Hope it's helpful.

Hi Georgios,

This is the last assignment of my entire DE certification, and my subscription ends on 12/7. Thanks for responding so quickly. 🙂

  1. S3 bucket check:
    I have today's date folder in the landing_zone/api/users folder. My 4.1 jobs succeeded as usual.

  2. glue.tf:
    I have `"yyyy-mm-dd"` for `"--ingest_date"`. Is that correct? Do I need to replace it with today's date, for example `"2024-12-04"`?

  3. The API endpoints in the files are correct.

Thanks!

@Adazhu yes, could you replace the date in part 2 with today's date? Thanks
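The `"yyyy-mm-dd"` value is a placeholder: the songs job parses `--ingest_date` with `%Y-%m-%d`, so it needs a real date such as `2024-12-04`. A minimal sketch of producing today's date in that format and rejecting the unfilled placeholder with a clearer message; `resolve_ingest_date` is a hypothetical helper, not part of the course code.

```python
from datetime import date, datetime
from typing import Optional

PLACEHOLDER = "yyyy-mm-dd"


def resolve_ingest_date(value: Optional[str] = None) -> str:
    """Return a date string suitable for --ingest_date.

    With no value, today's date is returned in ISO format (YYYY-MM-DD).
    The template placeholder is rejected with an explicit message.
    """
    if value is None:
        return date.today().isoformat()
    if value.strip().lower() == PLACEHOLDER:
        raise ValueError("--ingest_date is still the 'yyyy-mm-dd' placeholder; set a real date")
    # Raises ValueError if the value cannot be parsed with %Y-%m-%d.
    datetime.strptime(value, "%Y-%m-%d")
    return value
```

In the assignment itself, the date goes directly into the `--ingest_date` argument in glue.tf; the helper only illustrates the expected format.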

Yes, both jobs completed successfully! Thank you so much! 🙂
