AWS Networking

Hello :grinning:

I am planning to start a DE project to apply everything I’ve learned from this course. The first step I have in mind is to design the network architecture.

In this post, I will explain what I understand about AWS Networking best practices. Please feel free to correct me or add any clarifications.

Key AWS Networking Concepts

  1. VPC (Virtual Private Cloud):
  • A VPC is a virtual network that isolates resources and defines IP ranges.

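  • Resources within a VPC can be organized into public and private subnets. A subnet is public when its route table has a route to an Internet Gateway, so resources in it that have a public IP can be reached from the internet. Private subnets have no such route and are isolated from direct internet access.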
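
To make the "defines IP ranges" point concrete, here is a minimal sketch using Python's standard `ipaddress` module. The 10.0.0.0/16 range, the /24 subnet size, and the Availability Zone names are assumptions picked for illustration, not values from a real account.

```python
import ipaddress

# Hypothetical VPC range and Availability Zones, chosen only for illustration.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
AZS = ["eu-west-1a", "eu-west-1b"]


def plan_subnets(vpc, azs, new_prefix=24):
    """Assign one public and one private subnet per AZ from consecutive blocks."""
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of /24 networks
    plan = []
    for az in azs:
        plan.append({"az": az, "tier": "public", "cidr": next(blocks)})
        plan.append({"az": az, "tier": "private", "cidr": next(blocks)})
    return plan


subnet_plan = plan_subnets(VPC_CIDR, AZS)
for s in subnet_plan:
    print(f"{s['az']:<12} {s['tier']:<8} {s['cidr']}")
```

Planning the ranges up front like this helps avoid overlapping subnets before anything is created in AWS.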

  1. Internet Gateway (IGW):

An IGW is attached to a VPC to give public subnets internet access. Resources in public subnets that have a public IP can send traffic through the IGW to external destinations, and can receive traffic from them (such as your local machine or external tools) when their security groups allow it.

  1. NAT Gateway:

NAT Gateways allow private subnet resources to initiate outbound internet connections (for things like updates or accessing APIs) while still remaining inaccessible to incoming internet traffic. NAT Gateways sit in public subnets to route traffic out from private subnets.

  1. Route Tables:
  • Route tables define how traffic is directed within the VPC. Each subnet is associated with a route table that determines where traffic goes.
  • Public subnets typically have routes to the Internet Gateway, while private subnets have routes to the NAT Gateway for outbound-only internet access.
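
As a rough illustration of how a route table picks a target, the sketch below simulates the "most specific route wins" (longest prefix match) rule with the standard `ipaddress` module. The CIDR ranges and the `igw-example` / `nat-example` target names are placeholders I made up, not real resource IDs.

```python
import ipaddress

# Hypothetical route tables; the target names are placeholders, not real IDs.
PUBLIC_ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "igw-example"),
]
PRIVATE_ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "nat-example"),
]


def resolve_target(routes, destination):
    """Return the target of the most specific route that contains destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the traffic is dropped
    # The route with the longest prefix is the most specific one.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(resolve_target(PUBLIC_ROUTES, "10.0.1.25"))  # stays inside the VPC
print(resolve_target(PRIVATE_ROUTES, "8.8.8.8"))   # leaves through the NAT Gateway
```

The only difference between the two tables is the target of the 0.0.0.0/0 route, which is what makes a subnet public or private.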
  1. Security Groups:

Security groups are virtual firewalls that control inbound and outbound traffic at the instance or service level. They are stateful: return traffic for an allowed connection is permitted automatically.

  1. Network ACLs (Access Control Lists):

Network ACLs provide an additional layer of security at the subnet level, controlling traffic in and out of subnets with rules for IPs and ports. They are stateless, so return traffic must be allowed explicitly.

Practical Example

In this example, I want to create a Redshift database and be able to access it from my local machine or an orchestration tool outside of the VPC.

  1. VPC Configuration:
  • Create a VPC with public and private subnets across different Availability Zones for high availability.
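  • Place the Redshift cluster in one of the private subnets to keep it secure, limiting its exposure to the internet. Note that a cluster in a private subnet cannot be reached directly from outside the VPC: access from my local machine would then need a bastion host or VPN, or the cluster would have to go in a public subnet with the "publicly accessible" option enabled.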
  1. Internet Gateway:

Attach an IGW to the VPC to enable internet access for public subnets. The IGW allows traffic to flow between your local machine and resources in public subnets that have a public IP.

  1. NAT Gateway:
  • Deploy a NAT Gateway in each public subnet and configure it to handle outbound traffic from the private subnets.
  • This will allow resources in private subnets to access the internet without direct exposure.
  1. Route Tables:
  • Public Subnet Route Table: Configure the route table for each public subnet to direct 0.0.0.0/0 traffic through the Internet Gateway.
  • Private Subnet Route Table: Configure the route tables for each private subnet to direct 0.0.0.0/0 traffic through the respective NAT Gateway, ensuring only outbound internet access.
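
The two default routes above could be created with something like the sketch below. It assumes `ec2` is a boto3 EC2 client (`boto3.client("ec2")`) and uses its `create_route` call; the helper name `add_default_route` and all resource IDs are hypothetical.

```python
def add_default_route(ec2, route_table_id, *, igw_id=None, nat_gateway_id=None):
    """Add a 0.0.0.0/0 route pointing at either an IGW or a NAT Gateway.

    `ec2` is assumed to be a boto3 EC2 client; exactly one target must be given.
    """
    if (igw_id is None) == (nat_gateway_id is None):
        raise ValueError("Pass exactly one of igw_id or nat_gateway_id")
    params = {"RouteTableId": route_table_id, "DestinationCidrBlock": "0.0.0.0/0"}
    if igw_id is not None:
        params["GatewayId"] = igw_id            # public subnet: out via the IGW
    else:
        params["NatGatewayId"] = nat_gateway_id  # private subnet: out via NAT
    return ec2.create_route(**params)


# Example usage (IDs are made up):
#   import boto3
#   ec2 = boto3.client("ec2")
#   add_default_route(ec2, "rtb-public-example", igw_id="igw-example")
#   add_default_route(ec2, "rtb-private-example", nat_gateway_id="nat-example")
```

Passing the client in as an argument keeps the helper easy to try out with a stub before pointing it at a real account.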
  1. Security Groups for Redshift:
  • Create a security group for Redshift that allows inbound traffic on port 5439 (Redshift’s default port) only from your local machine’s IP address or the IP range of your orchestration tool.
  • Outbound rules rarely need changes here: security groups are stateful, so responses to allowed inbound connections are permitted automatically.
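
To sanity-check the inbound rule before applying it, here is a small sketch that mimics how an allow-list of source CIDRs on port 5439 would be evaluated. The rule list and the addresses (taken from the reserved documentation ranges) are assumptions standing in for my real IP and the orchestration tool's range.

```python
import ipaddress

REDSHIFT_PORT = 5439  # Redshift's default port, as mentioned above

# Hypothetical inbound rules as (allowed source CIDR, port).
# The addresses come from documentation ranges and stand in for real ones.
INBOUND_RULES = [
    ("203.0.113.10/32", REDSHIFT_PORT),  # a single machine, e.g. my laptop
    ("198.51.100.0/24", REDSHIFT_PORT),  # e.g. an orchestration tool's range
]


def is_allowed(source_ip, port, rules=INBOUND_RULES):
    """Return True if any rule allows this source IP on this port."""
    src = ipaddress.ip_address(source_ip)
    return any(
        port == rule_port and src in ipaddress.ip_network(cidr)
        for cidr, rule_port in rules
    )


print(is_allowed("203.0.113.10", 5439))  # matches the /32 rule
print(is_allowed("203.0.113.11", 5439))  # neighbouring IP, no matching rule
```

Using a /32 for a single machine keeps the rule as narrow as possible. Anything not matched by a rule is denied, since security groups only contain allow rules.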
  1. Network ACLs (Optional):

These are not essential for the project, since the VPC's default network ACL allows all inbound and outbound traffic.

Has anyone else started a real-world project? I’d love to hear about them!

Looking forward to learning from all of you!


Hi @francois_adam, that’s an amazing approach, and I think you could build something around it. I want to build some projects using the tools we learned in the first course; this can help us understand the material and fill any gaps in our knowledge! I will share a post about it soon! Do you have a dataset or type of project in mind?

Good morning @pastorsoto,

I recently started the DataTalksClub Data Engineering Zoomcamp to work on more practical projects. Although the course uses Google Cloud as the cloud provider, I decided to use AWS instead to apply what I learned in our course.

For the introductory project, I’m working with the NYC TLC dataset. However, for my final project, I’m considering a topic focused on European public transportation data or something within the medical field.

Looking forward to seeing your post and hearing more about your project ideas!
