C3W2A1 - Which part of the code explicitly applies the Iceberg format?

I'm in the portion of the assignment where the ratings table in the curated database is updated with the new ratings from the JSON file. One part of the code checks whether the table exists and, if it doesn't, creates it. I figured that is where the format provided in the Glue parameters would be applied, but I can't see where that happens. The only mention of Iceberg is in the location, and that doesn't seem like enough to set up the table in Iceberg format.

import traceback

try:
    if table_exists:
        pass  # merge branch omitted from this excerpt
    else:
        additional_options = {"write.parquet.compression-codec": "gzip"}
        # Create the table through the Spark catalog named "glue_catalog"
        data_df.writeTo(
            f"glue_catalog.{database_name}.{table_name}"
        ).tableProperty("format-version", "2").tableProperty(  # Iceberg format version 2
            "location",
            f"s3://{data_lake_bucket}/{database_name}/{table_name}/iceberg",
        ).options(
            **additional_options
        ).create()
        logger.info(f"Created {database_name}.{table_name} \n")
except Exception as err:
    traceback_error = traceback.format_exc()
    logger.error(
        f"Error while merging into {database_name}.{table_name}. {traceback_error} {err} \n"
    )

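A likely explanation, assuming the lab follows the usual AWS Glue setup (I haven't checked the lab's exact job parameters): the Iceberg format is not applied by the create call itself but by the catalog. `writeTo("glue_catalog....")` targets a Spark catalog named `glue_catalog`, and if the job's `--conf` parameter registers that name as Iceberg's `SparkCatalog` (with the `--datalake-formats iceberg` job parameter loading the Iceberg libraries), every table created through it is an Iceberg table. The sketch below only builds those settings as a dict and joins them into the single `--conf` string Glue accepts; the function names and the warehouse path `s3://example-bucket/warehouse` are placeholders, not taken from the lab.

```python
def iceberg_glue_catalog_conf(catalog_name: str, warehouse: str) -> dict:
    """Spark settings that register `catalog_name` as an Iceberg catalog backed by AWS Glue.

    Hypothetical helper: assumes the lab configures its catalog in this common way.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions (e.g. MERGE INTO, CALL procedures)
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # The catalog class decides the format: tables created through it are Iceberg tables
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Keep table metadata pointers in the AWS Glue Data Catalog
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Read and write data/metadata files on S3
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Default root location for tables that don't set their own location
        f"{prefix}.warehouse": warehouse,
    }


def as_glue_conf_parameter(conf: dict) -> str:
    """Join settings into one value for the Glue `--conf` job parameter."""
    return " --conf ".join(f"{key}={value}" for key, value in conf.items())


if __name__ == "__main__":
    settings = iceberg_glue_catalog_conf("glue_catalog", "s3://example-bucket/warehouse")
    print(as_glue_conf_parameter(settings))
```

If you want the format spelled out in the script as well, PySpark's `DataFrameWriterV2` also has a `.using("iceberg")` method that can be chained before `.create()`.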
Hello @Suraj_Kamath,
I think it is explained in part 6.1.11: if table_exists, the table is populated with data from the JSON files into Iceberg (de-c3w2a1-ratings-to-iceberg-job).
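For the table_exists branch, a common way to load new rows into an existing Iceberg table from Spark is an upsert with `MERGE INTO` (available once the Iceberg SQL extensions are enabled). This is only a sketch of the statement shape, not the lab's actual merge code; the helper name, the temp view `new_ratings`, the database name `curated`, and the key columns `customer_id`/`product_id` are all hypothetical.

```python
from typing import List


def build_merge_sql(target: str, source_view: str, key_columns: List[str]) -> str:
    """Build a MERGE INTO statement that upserts rows from source_view into target.

    Hypothetical helper: matched rows are updated, unmatched rows are inserted.
    """
    condition = " AND ".join(f"t.{col} = s.{col}" for col in key_columns)
    return (
        f"MERGE INTO {target} t "
        f"USING {source_view} s "
        f"ON {condition} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    # In a Glue job this would run as spark.sql(...) after
    # data_df.createOrReplaceTempView("new_ratings")
    print(build_merge_sql("glue_catalog.curated.ratings", "new_ratings", ["customer_id", "product_id"]))
```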


There are other parts of the lab where Iceberg tables are created; you can find create_iceberg_table in lf_utils.py, though I am not sure where you would use it. Hope it helps!
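I haven't looked at the lab's create_iceberg_table, so this is not its implementation. As a rough illustration of how a helper can pre-create an Iceberg table explicitly in Spark SQL, the statement names the format with `USING iceberg` and sets the same location and `format-version` property as the `writeTo(...).create()` call in the question. The helper name, column names, and paths are hypothetical.

```python
from typing import Dict


def build_create_iceberg_sql(table: str, columns: Dict[str, str], location: str) -> str:
    """Build a CREATE TABLE statement for an Iceberg (format v2) table.

    Hypothetical helper; intended to be run with spark.sql(...) on an Iceberg-enabled catalog.
    """
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({column_list}) "
        "USING iceberg "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('format-version'='2')"
    )


if __name__ == "__main__":
    print(build_create_iceberg_sql(
        "glue_catalog.curated.ratings",
        {"customer_id": "string", "rating": "int"},
        "s3://example-bucket/curated/ratings/iceberg",
    ))
```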

That helps. It isn't needed for the lab itself; I'm just exploring the code to see how it works behind the scenes. The Glue job runs the Python script, and the script should create the table on the first run. I'll try to find where it calls the create_iceberg_table function.

I just wasn't sure whether the Iceberg tables were pre-created during the lab setup or created by the Glue job, and if so, where.
