Airflow production implementation: use cases

I am working on the last graded Airflow exercise. Every one of the Airflow labs includes an instruction like this:

For educational purposes, the proposed DAG uses pandas, SciPy, Great Expectations, and NumPy to handle the data within the Airflow instance. This is not desirable in real-life Airflow pipelines.

I understand that for one or two labs it is good to keep things simple and put all of the implementation and use cases in a single file so it is easy to run and execute.

However, this makes it difficult and confusing for learners to see how a real production implementation/workflow would use these operators in actual projects.

The last exercise should at least demonstrate a real-world use-case implementation, or provide a code reference for anyone who is interested.

If any mentor or support person can help with a real-life implementation structure and template, it would be a great help moving forward.

Hello @hardikg

I did a little research, and to the best of my understanding we should use operators that let us run resource-intensive tasks on external machines rather than on the Airflow host itself. Two such operators are:

  1. KubernetesPodOperator: it lets us launch a Kubernetes pod on a cluster and run our code inside that pod (see the sketch just after this list).
  2. ECS operator: it lets us run containerized tasks on AWS ECS (Elastic Container Service), on either EC2 or Fargate capacity (a second sketch follows further below).
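Just to give a concrete picture, here is a minimal sketch of the Kubernetes route. It assumes the apache-airflow-providers-cncf-kubernetes package is installed (the exact import path has changed across provider versions) and that the pandas/SciPy/Great Expectations logic has been packaged into a container image of your own; the image name, namespace, and transform.py entry point below are all hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Hypothetical image: your pandas/SciPy/Great Expectations code,
# packaged and pushed to your own registry.
TRANSFORM_IMAGE = "registry.example.com/etl/transform:1.0"

with DAG(
    dag_id="offloaded_transform_k8s",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    # The heavy data processing runs in a pod on the cluster, not on the
    # Airflow worker; Airflow only launches the pod and waits for it.
    transform = KubernetesPodOperator(
        task_id="transform_data",
        name="transform-data",
        namespace="etl",                    # hypothetical namespace
        image=TRANSFORM_IMAGE,
        cmds=["python", "transform.py"],    # entry point baked into the image
        arguments=["--date", "{{ ds }}"],   # pass the logical date via templating
        get_logs=True,                      # stream pod logs back into Airflow
    )
```

The same pattern applies to any heavy step: the DAG file stays pure orchestration, and the actual computation ships inside the image.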

You can take a look at the documentation for these two operators. They are outside the scope of this course, though, so unfortunately we cannot cover them in depth here.
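That said, to sketch the AWS route too: in recent versions of the Amazon provider the operator is called EcsRunTaskOperator. This sketch assumes a task definition is already registered in ECS; the cluster name, task definition, container name, and subnet are all made-up placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

with DAG(
    dag_id="offloaded_transform_ecs",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Runs a container on AWS ECS (Fargate here); the Airflow worker
    # only triggers the task and polls until it completes.
    transform = EcsRunTaskOperator(
        task_id="transform_data",
        cluster="etl-cluster",                 # hypothetical cluster name
        task_definition="transform-task:1",    # hypothetical task definition
        launch_type="FARGATE",
        overrides={
            "containerOverrides": [
                {
                    "name": "transform",       # container name in the task def
                    "command": ["python", "transform.py", "--date", "{{ ds }}"],
                }
            ]
        },
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],   # hypothetical subnet
                "assignPublicIp": "ENABLED",
            }
        },
    )
```

Either way, the Airflow host only triggers and monitors the remote task, which is exactly the separation the course warning is pointing at.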

Hope it helps.

Thank you for your response, @Amir_Zare.
Yes, I know it's outside the course, but I'm looking for any documentation or template use cases for using Airflow in real-world scenarios. Even if this were included only as optional material, those who are interested could practice using that approach as well.
