C4W4 Capstone project Part-1 'ingest_date' format

Hey, I am getting the following errors when I run the transformation Glue jobs.

aws glue start-job-run --job-name de-c4w4a1-json-transform-job | jq -r '.JobRunId'

AnalysisException: Path does not exist: s3://de-c4w4a1-426992998030-us-east-1-data-lake/landing_zone/api/users/yyyy-mm-dd

aws glue start-job-run --job-name de-c4w4a1-songs-transform-job | jq -r '.JobRunId'

ValueError: time data 'yyyy-mm-dd' does not match format '%Y-%m-%d'

Could you please confirm what the value of "--ingest_date" should be in the glue.tf file?

Hello @cchristyraj
You are supposed to replace the <PACIFIC-TIME-CURRENT-DATE> placeholder in two places in the terraform/modules/transform_job/glue.tf file. The value should be the current Pacific date in yyyy-mm-dd format, for example 2024-11-28.
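
If it helps, the value can be computed rather than typed by hand. This is only a minimal sketch, assuming Python 3.9+ (for zoneinfo) and that "America/Los_Angeles" is the Pacific zone the lab means; the function name pacific_date_string is made up for illustration and is not part of the lab code.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

# Assumption: the lab's "Pacific Time" corresponds to this IANA zone.
PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_date_string(now: Optional[datetime] = None) -> str:
    """Return the calendar date in Pacific time, formatted as yyyy-mm-dd."""
    if now is None:
        now = datetime.now(tz=PACIFIC)
    else:
        # Convert any timezone-aware instant to Pacific time first.
        now = now.astimezone(PACIFIC)
    return now.strftime("%Y-%m-%d")


if __name__ == "__main__":
    # Value to paste into glue.tf for "--ingest_date".
    print(pacific_date_string())
    # A fixed instant: 03:00 UTC on Nov 29 is still Nov 28 in Pacific time.
    print(pacific_date_string(datetime(2024, 11, 29, 3, 0, tzinfo=timezone.utc)))
```

Note that the Pacific date can be a day behind your local date, depending on the time of day and where you are.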

I get that, but I don't know where I am making the mistake. Should the value be "yyyy-MM-dd" instead of "yyyy-mm-dd"?

I entered today's date as the value, and the transformation jobs completed.

Hi, Amir! I'm still hitting this issue after supplying the current Pacific date. My snippets from glue.tf are below. What am I missing?

default_arguments = {
  "--enable-job-insights" = "true"
  "--job-language" = "python"
  # Set "--catalog_database" to aws_glue_catalog_database.transform_db.name
  "--catalog_database" = aws_glue_catalog_database.transform_db.name
  # Set "--ingest_date" to the server's current date in Pacific Time (UTC-7), in "yyyy-mm-dd" format.
  # (replace the placeholder <PACIFIC-TIME-CURRENT-DATE>)
  "--ingest_date" = "2024-12-19"

default_arguments = {
  "--enable-job-insights" = "true"
  "--job-language" = "python"
  "--catalog_database" = aws_glue_catalog_database.transform_db.name
  # Set "--ingest_date" to the server's current date in Pacific Time (UTC-7), in "yyyy-mm-dd" format.
  # (replace the placeholder <PACIFIC-TIME-CURRENT-DATE>)
  "--ingest_date" = "2024-12-19"

For the songs-transform job I get AttributeError: 'DataFrame' object has no attribute 'duration'. For the json-transform job I get AnalysisException: Path does not exist: s3://de-c4w4a1-533267286350-us-east-1-data-lake/landing_zone/api/users/2024-12-19. I would appreciate your advice.

Hello @ArtK
It looks like you have filled in the <PACIFIC-TIME-CURRENT-DATE> placeholder in the transform_job/glue.tf file correctly. However, the data the songs-transform job reads doesn't have the required columns, and the json-transform job can't find the data it needs. So my guess is that the issue is with your extract jobs. After you run the extract jobs, check the address shown in the exception, namely s3://de-c4w4a1-sensitivedatahere-us-east-1-data-lake/landing_zone/api/users/2024-12-19, and verify that the files the transform job wants to read actually exist there.
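
One way to do that check from a terminal is sketched below. It assumes boto3 is installed and that your lab credentials are configured; split_s3_uri and prefix_has_objects are illustrative helper names, not part of the lab code. The only AWS call used is the S3 client's list_objects_v2.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def prefix_has_objects(s3_client, uri: str) -> bool:
    """Return True if at least one object exists under the given S3 prefix."""
    bucket, prefix = split_s3_uri(uri)
    # Treat the path as a "folder" so 2024-12-19 does not also match 2024-12-190.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


if __name__ == "__main__":
    import boto3  # third-party; assumed to be installed in the lab environment

    # Use the exact path from your own exception message here.
    uri = "s3://de-c4w4a1-sensitivedatahere-us-east-1-data-lake/landing_zone/api/users/2024-12-19"
    print(prefix_has_objects(boto3.client("s3"), uri))
```

If this prints False, the extract job has not written anything for that date, so the transform job has nothing to read.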

Thanks, Amir! I think I messed it up somehow by proceeding to the next step (servicing) before the jobs were completed. Anyway, my lab expired the other day, and when I re-did it just now, all worked like a charm. Thanks!

Hello @Amir_Zare
I seem to have the same issue, but I have the mentioned data files from the extract jobs in place.
I still get the error "ValueError: time data '<PACIFIC-TIME-CURRENT-DATE>' does not match format '%Y-%m-%d'" in AWS Glue for the job de-c4w4a1-songs-transform-job.

In the respective glue.tf it's the current date:
default_arguments = {

  # Set "--ingest_date" to the server's current date in Pacific Time (UTC-7), in "yyyy-mm-dd" format.
  # (replace the placeholder <PACIFIC-TIME-CURRENT-DATE>)
  "--ingest_date" = "2025-02-13"

My extract jobs finish with SUCCESS.
I do not know where to fix this. I have now been stuck at the same point four times. Could you please help?

Thanks! Thomas (LabID drdmhpekafrw)

Hello @max85
It seems you have missed some of the placeholders, or you might have forgotten to save the changes to the file before deploying your terraform components. The error is saying that you still have "<PACIFIC-TIME-CURRENT-DATE>" in your files. Please make sure that you replace it with the current date in both instances in glue.tf (lines 27 and 69) and save the file before running the terraform commands.
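
A quick way to confirm that nothing was missed is to scan the saved file for the placeholder. The sketch below is only an illustration: find_placeholder_lines is a made-up helper, and the path assumes you run it from the lab's project root. Comment lines are skipped because the instruction comment itself mentions the placeholder.

```python
from pathlib import Path

PLACEHOLDER = "<PACIFIC-TIME-CURRENT-DATE>"


def find_placeholder_lines(text: str, placeholder: str = PLACEHOLDER) -> list[int]:
    """Return 1-based line numbers of non-comment lines that still contain the placeholder."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip().startswith("#"):
            continue  # the instruction comments mention the placeholder on purpose
        if placeholder in line:
            hits.append(number)
    return hits


if __name__ == "__main__":
    # Path taken from this thread; adjust it if your working directory differs.
    path = Path("terraform/modules/transform_job/glue.tf")
    if path.exists():
        leftover = find_placeholder_lines(path.read_text())
        print("unreplaced placeholder on lines:", leftover or "none")
    else:
        print(f"{path} not found; run this from the project root")
```

This reads the file from disk, so it also catches the case where the edit was made in the editor but never saved.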

I followed the instructions exactly but still encounter the following error:

AnalysisException: Path does not exist: s3://de-c4w4a1-05826439xxxx-us-east-1-data-lake/landing_zone/api/users/yyyy-mm-dd

Hello @HeyChong
The error is saying that the path s3://de-c4w4a1-05826439xxxx-us-east-1-data-lake/landing_zone/api/users/yyyy-mm-dd does not exist. That means the job is looking for a path that literally contains yyyy-mm-dd. You might have missed replacing the yyyy-mm-dd placeholder somewhere, or you might have forgotten to save the changes to the files.
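
To illustrate why a literal value produces both errors seen in this thread, here is a rough sketch. It assumes the jobs paste the "--ingest_date" string into the S3 path and parse it with datetime.strptime using '%Y-%m-%d'; that is inferred from the error messages, not taken from the actual job scripts. The bucket name and function names are hypothetical.

```python
from datetime import datetime


def landing_path(bucket: str, ingest_date: str) -> str:
    """Build the users landing-zone path the way the error messages suggest."""
    return f"s3://{bucket}/landing_zone/api/users/{ingest_date}"


def describe_ingest_date(ingest_date: str) -> str:
    """Return 'ok' if the value parses as yyyy-mm-dd, else the parser's error text."""
    try:
        datetime.strptime(ingest_date, "%Y-%m-%d")
    except ValueError as exc:
        return str(exc)
    return "ok"


if __name__ == "__main__":
    for value in ("yyyy-mm-dd", "%Y-%m-%d", "2025-03-19"):
        # A literal value ends up verbatim in the path and fails date parsing.
        print(landing_path("example-data-lake", value))
        print("  ", describe_ingest_date(value))
```

Only a real date such as 2025-03-19 passes the parse and yields a path that the extract job could actually have written to.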

I tried changing "--ingest_date" to "%Y-%m-%d" in modules>transform>glue.tf,
yet I get the following errors:

de-c4w4a1-json-transform-job
AnalysisException: Path does not exist: s3://de-c4w4a1-058264395471-us-east-1-data-lake/landing_zone/api/users/%Y-%m-%d

de-c4w4a1-songs-transform-job
ValueError: time data '%Y-%m-%d' does not match format 'Y_%m_%d'

My time zone is GMT+8. Is there anything I am missing?

@HeyChong it doesn't have anything to do with your time zone. You need to replace the placeholder "<PACIFIC-TIME-CURRENT-DATE>" with the current Pacific date in yyyy-mm-dd format. For example, if today is March 19th, 2025, it would be 2025-03-19.
You are either not changing the placeholder in all the places it appears, or you are not saving the changes before running the terraform commands. According to the error you are getting, the job is looking for the literal address s3://de-c4w4a1-058264395471-us-east-1-data-lake/landing_zone/api/users/%Y-%m-%d, while in your case it should be looking for s3://de-c4w4a1-058264395471-us-east-1-data-lake/landing_zone/api/users/2025-03-19.