C4W4 Capstone project Part-1 'ingest_date' format

Hey, I am getting the following errors when I run the transformation Glue jobs.

aws glue start-job-run --job-name de-c4w4a1-json-transform-job | jq -r '.JobRunId'

AnalysisException: Path does not exist: s3://de-c4w4a1-426992998030-us-east-1-data-lake/landing_zone/api/users/yyyy-mm-dd

aws glue start-job-run --job-name de-c4w4a1-songs-transform-job | jq -r '.JobRunId'

ValueError: time data 'yyyy-mm-dd' does not match format '%Y-%m-%d'

Could you please confirm what the value of "--ingest_date" should be in the glue.tf file?

Hello @cchristyraj
You are supposed to replace the <PACIFIC-TIME-CURRENT-DATE> placeholder in two places in the terraform/modules/transform_job/glue.tf file. The value should be the current date in Pacific Time, written in the yyyy-mm-dd format, for example 2024-11-28 (the actual digits, not the literal text yyyy-mm-dd).
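
If it helps, the current Pacific date can be printed with a few lines of Python. This is only a sketch: the helper name pacific_today is made up for illustration and is not part of the lab code, and it assumes Python 3.9+ so that the zoneinfo module is available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def pacific_today(now_utc=None) -> str:
    """Return the current date in Pacific Time as a yyyy-mm-dd string.

    `now_utc` is a hypothetical hook for passing a fixed, timezone-aware
    datetime; when omitted, the current time is used.
    """
    if now_utc is None:
        now_utc = datetime.now(timezone.utc)
    # America/Los_Angeles handles the PST/PDT switch automatically.
    pacific = now_utc.astimezone(ZoneInfo("America/Los_Angeles"))
    return pacific.strftime("%Y-%m-%d")


# Paste the printed value into both "--ingest_date" entries in glue.tf.
print(pacific_today())
```

Using a named time zone rather than a fixed UTC offset avoids an off-by-one date around midnight when daylight saving time changes.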

I get that, but I don't know where I am making the mistake. Should the value be "yyyy-MM-dd" instead of "yyyy-mm-dd"?

I entered today's date as the value, and the transformation jobs completed.
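
For anyone landing here with the same question: the ValueError text above matches what Python's datetime.strptime raises, so the songs job presumably parses the argument that way (an assumption based on the message alone, not on the job's source). Under that assumption, the sketch below shows that neither yyyy-mm-dd nor yyyy-MM-dd can work as a value; only real digits parse. parse_ingest_date is a hypothetical helper, not a function from the lab.

```python
from datetime import date, datetime


def parse_ingest_date(value: str) -> date:
    """Parse a yyyy-mm-dd string the way the error message suggests."""
    # "%Y-%m-%d" expects digits: 4-digit year, 2-digit month, 2-digit day.
    return datetime.strptime(value, "%Y-%m-%d").date()


# A real date parses without complaint.
print(parse_ingest_date("2024-11-28"))

# The format description itself is not a date, whatever its letter case.
for literal in ("yyyy-mm-dd", "yyyy-MM-dd"):
    try:
        parse_ingest_date(literal)
    except ValueError as err:
        print(f"{literal!r} rejected: {err}")
```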

Hi, Amir! I'm still hitting this issue after supplying the current Pacific date. The relevant snippets from my glue.tf are below. What am I missing?

default_arguments = {
  "--enable-job-insights" = "true"
  "--job-language" = "python"
  # Set "--catalog_database" to aws_glue_catalog_database.transform_db.name
  "--catalog_database" = aws_glue_catalog_database.transform_db.name
  # Set "--ingest_date" to the server's current date in Pacific Time (UTC-7), in "yyyy-mm-dd" format.
  # (replace the placeholder <PACIFIC-TIME-CURRENT-DATE>)
  "--ingest_date" = "2024-12-19"

default_arguments = {
  "--enable-job-insights" = "true"
  "--job-language" = "python"
  "--catalog_database" = aws_glue_catalog_database.transform_db.name
  # Set "--ingest_date" to the server's current date in Pacific Time (UTC-7), in "yyyy-mm-dd" format.
  # (replace the placeholder <PACIFIC-TIME-CURRENT-DATE>)
  "--ingest_date" = "2024-12-19"

For the songs-transform job I get AttributeError: 'DataFrame' object has no attribute 'duration'. For the json-transform job I get AnalysisException: Path does not exist: s3://de-c4w4a1-533267286350-us-east-1-data-lake/landing_zone/api/users/2024-12-19. I would appreciate your advice.

Hello @ArtK
It looks like you have filled in the <PACIFIC-TIME-CURRENT-DATE> placeholder in the transform_job/glue.tf file correctly. However, the data that the songs job reads doesn't have the required columns, and the JSON job can't find the data it needs. So my guess is that the issue is with your extract jobs. After you run the extract jobs, check the path shown in the exception, namely s3://de-c4w4a1-sensitivedatahere-us-east-1-data-lake/landing_zone/api/users/2024-12-19, and verify that the files the transform job wants to read actually exist there.
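
One way to do that check from a terminal is sketched below. It relies on boto3's list_objects_v2 call; the helper name list_landing_keys is hypothetical, and the bucket name in the usage comment is a placeholder that has to be swapped for the bucket from your own exception message. The S3 client is passed in as a parameter so the function itself does not need credentials to be imported.

```python
def list_landing_keys(s3_client, bucket: str, prefix: str, max_keys: int = 20) -> list[str]:
    """Return up to `max_keys` object keys stored under `prefix` in `bucket`."""
    response = s3_client.list_objects_v2(
        Bucket=bucket, Prefix=prefix, MaxKeys=max_keys
    )
    # S3 omits "Contents" entirely when nothing matches the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]


# Usage (needs boto3 installed and AWS credentials for the lab account):
#   import boto3
#   keys = list_landing_keys(
#       boto3.client("s3"),
#       "<your-data-lake-bucket>",              # placeholder, not a real name
#       "landing_zone/api/users/2024-12-19/",
#   )
#   print(keys or "No objects found: the extract job did not write here.")
```

An empty result for the date you put in "--ingest_date" would be consistent with the "Path does not exist" error from the transform job.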

Thanks, Amir! I think I messed it up somehow by proceeding to the next step (servicing) before the jobs were completed. Anyway, my lab expired the other day, and when I re-did it just now, all worked like a charm. Thanks!

Hello @Amir_Zare
I seem to have the same issue, even though the data files from the extract jobs are in place.
I still get this error in AWS Glue for the job de-c4w4a1-songs-transform-job: "ValueError: time data '' does not match format '%Y-%m-%d'"

In the respective glue.tf, it's the current date:
default_arguments = {

  # Set "--ingest_date" to the server's current date in Pacific Time (UTC-7), in "yyyy-mm-dd" format.
  # (replace the placeholder <PACIFIC-TIME-CURRENT-DATE>)
  "--ingest_date" = "2025-02-13"

My extract jobs finish with SUCCESS.
I do not know where to fix this. This is the fourth time I have been stuck at the same step. Could you please help?

Thanks! Thomas (LabID drdmhpekafrw)

Hello @max85
It seems that you have missed one of the placeholders, or you might have forgotten to save the changes to the file before deploying your Terraform components. The error is saying that you still have "<PACIFIC-TIME-CURRENT-DATE>" in your files. Please make sure that you replace it with the current date in both places in glue.tf (lines 27 and 69) and save the file before running the terraform commands.
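
A quick way to confirm that no placeholder is left on disk is to scan the saved file before deploying. The script below is only a sketch: find_placeholders is a hypothetical helper, and it skips full-line comments on the assumption that the "# (replace the placeholder ...)" hints stay in the file and should not count as hits.

```python
from pathlib import Path

PLACEHOLDER = "<PACIFIC-TIME-CURRENT-DATE>"


def find_placeholders(path, placeholder=PLACEHOLDER):
    """Return (line_number, text) for non-comment lines that contain the placeholder."""
    hits = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Full-line comments only describe the placeholder; ignore them.
        if stripped.startswith("#") or stripped.startswith("//"):
            continue
        if placeholder in stripped:
            hits.append((number, stripped))
    return hits


# Path taken from earlier in this thread; run from the project root.
glue_tf = Path("terraform/modules/transform_job/glue.tf")
if glue_tf.exists():
    for number, text in find_placeholders(glue_tf):
        print(f"line {number}: {text}")
```

If this prints nothing, the saved file no longer contains the placeholder as a value, and the terraform commands need to be run again so the Glue jobs pick up the new argument.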