Why use S3 instead of RDS?

trhnam · March 28, 2025, 1:19pm

Hi, in the Week 2 lab exercise, why is S3 used for the star-schema database (the one created after ETL) instead of a relational (RDS) database? What are the differences? Why is S3 preferable in this case?

yjcb22 · April 15, 2025, 4:26pm

Dear @trhnam
Thanks for posting your question and welcome to the team!

Please find my comments below:

1. Since this is the first course, the idea behind the lab is to show you the general workflow of an entire data engineering pipeline, i.e., ingestion -> ETL -> serving. Consequently, you will see this diagram:

[Image: diagram of the lab pipeline]

There you will see that we are already using a database as the data source. Consequently, after the ETL phase it makes more sense to show the outcome in storage that is closer to a data lake (in this case S3), which you can then query with Amazon Athena.
2. By being exposed to more systems, RDS (MySQL/PostgreSQL) and S3 with Athena, you learn more of the technologies found in an actual production environment.
3. By using S3 you learn that it is well suited to storing many types of data: files (video, Parquet, CSV, audio) as well as datasets that are queried like databases via Athena.
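
As a rough illustration of the S3 + Athena idea in the points above, here is a minimal Python sketch, not the lab's actual code. It assumes a boto3 Athena client is passed in (for example one created with `boto3.client("athena")`). The table names, column names, database name, and S3 output location are hypothetical placeholders. The function submits a SQL query over data stored in S3, polls until the query finishes, and returns the rows.

```python
import time

# Hypothetical star-schema query; table and column names are placeholders.
EXAMPLE_SQL = """
SELECT d.country, SUM(f.order_amount) AS total_sales
FROM fact_orders AS f
JOIN dim_customers AS d ON f.customer_key = d.customer_key
GROUP BY d.country
"""

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` with Athena and return the result rows as lists of strings.

    `athena` is assumed to be a boto3 Athena client.
    `output_location` is an S3 URI where Athena writes the query results.
    """
    # Submit the query; Athena reads the underlying files directly from S3.
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Queries run asynchronously, so poll until a terminal state is reached.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Fetch the first page of results; for SELECT queries the first row
    # holds the column names. Missing (NULL) cells become None.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

With real AWS credentials you might call it as `run_athena_query(boto3.client("athena"), EXAMPLE_SQL, "star_schema_db", "s3://my-lab-bucket/athena-results/")`, where the database name and bucket are again only placeholders. Note that no database server is involved: the data stays as files in S3 and Athena only reads it at query time.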

I hope this makes it clear for you.
Later in the course you will learn how the transformed SQL data (in a star schema) is loaded into another database (MySQL/PostgreSQL), but in this introductory lab the idea is to expose you to more of the technologies used in a production environment.
