C3W2A1 - which part of the code explicitly applies the iceberg format?

Suraj_Kamath · November 19, 2024, 7:21am

I'm in the part of the assignment where the ratings table in the curated db is updated with the new ratings from the JSON file. There is a section of the code that checks whether the table exists and, if it doesn't, creates it. I figured that is where the format provided in the Glue parameters would be applied, but I can't see where that happens. The only mention of Iceberg is in the `location` table property, and that doesn't seem like enough to set the table up in Iceberg format.

```python
try:
    if table_exists:
        ...  # merge branch not included in the original post
    else:
        additional_options = {"write.parquet.compression-codec": "gzip"}
        data_df.writeTo(
            f"glue_catalog.{database_name}.{table_name}"
        ).tableProperty("format-version", "2").tableProperty(
            "location",
            f"s3://{data_lake_bucket}/{database_name}/{table_name}/iceberg",
        ).options(
            **additional_options
        ).create()
        logger.info(f"Created {database_name}.{table_name} \n")
except Exception as err:
    traceback_error = traceback.format_exc()
    logger.error(
        f"Error while merging into {database_name}.{table_name}. {traceback_error} {err} \n"
    )
```
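In a Glue/Spark job like this, the Iceberg format usually comes from the catalog rather than from a table property: `writeTo("glue_catalog....")` resolves `glue_catalog` through Spark configuration, and if that name is registered as an Iceberg `SparkCatalog`, every table created through it is an Iceberg table. The lab's actual Spark configuration is not shown in this thread, so the sketch below is an assumption about how such a catalog is commonly wired up. The helper name `iceberg_glue_catalog_conf` and the warehouse path are hypothetical; the configuration keys and class names are the standard Iceberg ones for AWS Glue.

```python
def iceberg_glue_catalog_conf(catalog_name: str, warehouse: str) -> dict[str, str]:
    """Hypothetical helper: Spark settings that register `catalog_name` as an
    Iceberg catalog backed by the AWS Glue Data Catalog (assumed setup)."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg-specific SQL such as MERGE INTO and CALL procedures.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # This entry is what makes tables under `catalog_name` Iceberg tables.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Iceberg table pointers are stored in the Glue Data Catalog.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Data and metadata files are read and written through S3.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Default root for tables that do not set an explicit location.
        f"{prefix}.warehouse": warehouse,
    }


if __name__ == "__main__":
    # Placeholder bucket; print each setting as key=value.
    conf = iceberg_glue_catalog_conf("glue_catalog", "s3://example-bucket/warehouse/")
    for key, value in conf.items():
        print(f"{key}={value}")
```

In Glue these settings are typically supplied through the job's `--conf` parameter (or on the SparkSession builder), together with `--datalake-formats iceberg` so the Iceberg libraries are on the classpath. The `"format-version", "2"` property in the snippet above is itself an Iceberg table-spec setting. If you want the format to be explicit in the write call, `DataFrameWriterV2` also has a `.using("iceberg")` method.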

Georgios · November 19, 2024, 3:00pm

Hello @Suraj_Kamath,
I think it is explained in part 6.1.11: if `table_exists`, the table is populated with the data from the JSON file and written to Iceberg by the `de-c3w2a1-ratings-to-iceberg-job` job.
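The `if table_exists:` branch is not included in the posted snippet, but its error message ("Error while merging into ...") suggests an upsert. A common way to upsert into an Iceberg table from Spark is a `MERGE INTO` statement over a temporary view of the new data. The sketch below only builds that SQL string; the helper name, the view name and the key columns are hypothetical and not taken from the lab.

```python
def build_merge_sql(target: str, source_view: str, key_columns: list[str]) -> str:
    """Hypothetical helper: build an Iceberg MERGE INTO (upsert) statement."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Match existing target rows to incoming rows on every key column.
    on_clause = " AND ".join(f"t.{col} = s.{col}" for col in key_columns)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source_view} s\n"
        f"ON {on_clause}\n"
        # Existing rows are overwritten; new rows are appended.
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    # Placeholder names; in a job you would first call
    # data_df.createOrReplaceTempView("new_ratings") and then spark.sql(sql).
    print(build_merge_sql("glue_catalog.curated.ratings", "new_ratings", ["user_id", "item_id"]))
```

`UPDATE SET *` and `INSERT *` copy all columns by name, so this assumes the new data has the same schema as the target table.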

There are other parts of the lab where Iceberg tables are created; you can find `create_iceberg_table` in `lf_utils.py`. I am not sure where you can use it, though. Hope it helps.
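For contrast with the `writeTo(...).create()` call, Spark SQL can also declare the format explicitly with a `USING iceberg` clause. This is not the contents of `create_iceberg_table` in `lf_utils.py`, which isn't shown in the thread; the helper below is a hypothetical sketch that only builds the DDL string, and the table name, columns and location are placeholders.

```python
def build_create_iceberg_sql(table: str, columns: dict[str, str], location: str) -> str:
    """Hypothetical helper: Spark SQL DDL that explicitly creates an Iceberg table."""
    column_defs = ",\n    ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n    {column_defs}\n)\n"
        # The USING clause is where the table format is stated explicitly.
        "USING iceberg\n"
        f"LOCATION '{location}'\n"
        # Same Iceberg spec version as the snippet in the question.
        "TBLPROPERTIES ('format-version' = '2')"
    )


if __name__ == "__main__":
    print(build_create_iceberg_sql(
        "glue_catalog.curated.ratings",
        {"user_id": "bigint", "rating": "int"},
        "s3://example-bucket/curated/ratings/iceberg",
    ))
```

Running the resulting string with `spark.sql(...)` would still require the Iceberg catalog to be configured for the `glue_catalog` prefix.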

Suraj_Kamath · November 20, 2024, 4:31am

That helps. It isn't needed for the lab itself; I'm just exploring the code to see how it works behind the scenes. The Glue job runs the Python script, and the script should create the table on the first run. I'll try to find where it calls the `create_iceberg_table` function.

I just wasn't sure whether the Iceberg tables were pre-created during the lab setup or whether the Glue job was creating them, and if so, where.
