C1W2 - Storage systems vs Storage abstractions

Ajith1 · April 1, 2025, 2:47pm

When creating a post, please add:

Module # C1W2
Link to the classroom item you are referring to:
https://www.coursera.org/learn/intro-to-data-engineering/lecture/nOOKQ/storage
Description (include relevant info but please do not post solution code or your entire notebook)
According to the slides, Apache Iceberg and Hudi are considered storage systems. But Apache Iceberg and Hudi are data lakehouse solutions, and the data lakehouse is considered a storage abstraction.

So the question is: are Apache Iceberg and Hudi storage systems or storage abstractions?

Georgios · April 1, 2025, 3:40pm

You will learn more about storage abstractions later in Course 3, which also has a lab on AWS Lake Formation and Iceberg. In the Week 2 slides, storage abstractions (e.g., data lakes, data warehouses) sit at the top of the hierarchy. Storage systems like the ones you mentioned (Iceberg, Hudi, and others) sit in the middle and add features to the storage abstractions. Iceberg adds schema flexibility, data partitioning for large datasets, and other features, as we will see later in the course. Hope it helps.
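One way to read the hierarchy described above is as layers, where each storage abstraction is assembled from components one level down. The sketch below is a minimal Python model under stated assumptions: the three layer names follow the Week 2 hierarchy (raw ingredients, storage systems, storage abstractions), the members listed in each layer are illustrative rather than copied from the slides, and `Layer`, `BUILT_FROM`, `layer_of`, and `check_hierarchy` are hypothetical names. It shows how both statements can hold at once: a data lakehouse is an abstraction, and Iceberg or Hudi is one of the storage systems it is built from.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One level of the storage hierarchy; level 0 is the bottom."""
    name: str
    level: int
    members: list = field(default_factory=list)


# Illustrative members only (an assumption, not a verbatim copy of the slides).
RAW_INGREDIENTS = Layer("raw ingredients", 0,
                        ["HDD", "SSD", "RAM", "networking", "serialization", "compression"])
STORAGE_SYSTEMS = Layer("storage systems", 1,
                        ["object storage", "HDFS", "Apache Iceberg", "Apache Hudi"])
STORAGE_ABSTRACTIONS = Layer("storage abstractions", 2,
                             ["data warehouse", "data lake", "data lakehouse"])
LAYERS = (RAW_INGREDIENTS, STORAGE_SYSTEMS, STORAGE_ABSTRACTIONS)

# Hypothetical "built from" relation: which lower-level components an
# abstraction is assembled from. A lakehouse could use Hudi instead of Iceberg.
BUILT_FROM = {
    "data lake": ["object storage"],
    "data lakehouse": ["object storage", "Apache Iceberg"],
}


def layer_of(component: str) -> Layer:
    """Return the layer a component belongs to, or raise KeyError."""
    for layer in LAYERS:
        if component in layer.members:
            return layer
    raise KeyError(component)


def check_hierarchy() -> None:
    """Assert every abstraction is built only from lower-level components."""
    for abstraction, parts in BUILT_FROM.items():
        top = layer_of(abstraction)
        for part in parts:
            assert layer_of(part).level < top.level, (abstraction, part)


if __name__ == "__main__":
    check_hierarchy()
    print(layer_of("Apache Iceberg").name)  # storage systems
    print(layer_of("data lakehouse").name)  # storage abstractions
```

Read this way, calling Iceberg or Hudi a "lakehouse solution" is shorthand for "a storage system you can build a lakehouse on", which is consistent with both placements in the slides.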

Ajith1 · April 2, 2025, 5:30pm

Sorry for the confusion; let me clarify my question.

Apache Iceberg and Hudi are open data lakehouse solutions.

According to the slide deck:

Apache Iceberg and Hudi are placed under storage systems.
The data lakehouse is placed under storage abstractions.

Aren't these statements contradictory?

Georgios · April 2, 2025, 6:19pm

Hello @Ajith1,

OK, I think I understand what you mean, but according to the Week 2 lectures, Iceberg and Hudi are definitely not storage abstractions sitting at the top of the hierarchy. You are correct that Apache Iceberg and Hudi are open data lakehouse solutions, and Joe explains this in his book right after discussing Delta Lake (which he calls a storage management system).

If you follow Course 3, we implement a data lakehouse later in this assignment: Building a Data Lakehouse with AWS Lake Formation and Apache Iceberg. It explains the differences between the data lakehouse and the Iceberg architecture. Hope it helps.

Related topics

Topic	Replies	Views	Activity
C3W2-Quiz discussion item for snapshots (Data Storage and Queries; week-module-2, coursera-platform)	2	13	January 15, 2025
Redshift & DLH architecture notes & feedback (Data Storage and Queries; week-module-3, coursera-platform)	1	11	January 3, 2025
C3-W2 Assignment Metadata problem (Data Storage and Queries; week-module-2, week-module-3, coursera-platform)	33	156	June 4, 2025
C3W2 Assignment 2: Building a Data Lakehouse...cannot see AWS Glue Catalog (Data Storage and Queries; week-module-2, coursera-platform)	1	27	March 29, 2025
Assignment 2: Building a Data Lakehouse with AWS Lake Formation and Apache Iceberg (Data Storage and Queries; week-module-2, coursera-platform)	6	72	April 10, 2025