Airflow


Airflow: (2.5.0)
 Airflow is an open-source tool used to create, schedule, and monitor workflows.
 It was developed by Airbnb and is now maintained by the Apache Software Foundation (ASF).
 Airflow allows users to define their workflows as code, which can be versioned,
tested, and reused.
 Airflow uses Directed Acyclic Graphs (DAGs) to define the structure of workflows,
with each task in the DAG representing a discrete unit of work.
 Tasks are defined using operators, which cover work such as running SQL queries,
executing scripts, and sending emails.
 Airflow's scheduling engine allows users to specify when tasks should be run, with
support for both cron-like schedules and more complex calendar-based schedules.
 Airflow also provides a web interface for monitoring the progress of workflows,
viewing logs, and manually triggering tasks.
 Airflow has become popular for its ability to integrate with a wide variety of
external systems and tools, such as databases, message queues, and cloud platforms.

 Traditional ETL data pipelines execute their steps sequentially.
 If any step fails midway, the entire pipeline has to be re-run, which wastes time
and resources.
 They also have no built-in mechanism for the following:

1. Retry and notify
2. Monitoring
3. Logging
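
To make the retry, notify, and logging points concrete, here is a minimal sketch in plain Python of what a hand-written pipeline would otherwise have to implement for every step. It is not Airflow code: `run_with_retries`, `flaky_extract`, and the `alerts` list are hypothetical names used only for illustration. Airflow provides comparable behaviour through task arguments such as `retries` and `email_on_failure`.

```python
import logging
from typing import Callable, List, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def run_with_retries(step: Callable[[], str], retries: int,
                     notify: Callable[[str], None]) -> str:
    """Run one step, retry on failure, and notify if every attempt fails."""
    if retries < 0:
        raise ValueError("retries must be zero or greater")
    attempts = retries + 1  # the first try plus the allowed retries
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            result = step()
            log.info("attempt %d succeeded", attempt)
            return result
        except Exception as exc:
            log.warning("attempt %d of %d failed: %s", attempt, attempts, exc)
            last_error = exc
    # Every attempt failed: send one notification, then re-raise the last error.
    notify(f"step failed after {attempts} attempts: {last_error}")
    raise last_error


# Hypothetical step that fails twice before succeeding.
calls = {"count": 0}


def flaky_extract() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("source unavailable")
    return "rows loaded"


alerts: List[str] = []  # stands in for an email or chat notification
outcome = run_with_retries(flaky_extract, retries=2, notify=alerts.append)
```

Here only the failing step is retried, the log records each attempt, and a notification is sent only when the retries are exhausted.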

What is Airflow?

 Airflow is a workflow management tool, not an ETL tool.
 It uses Directed Acyclic Graphs (DAGs) to create data pipelines.
 Workflows are written in Python.
 It is highly scalable: tasks can be distributed across multiple workers.

What is DAG?

 DAG stands for Directed Acyclic Graph: a graph whose edges have a direction and
which contains no cycles.
 Each edge points one way only, so a task can never loop back to an earlier task.
 Airflow is a workflow management tool, and DAGs are how its workflows are defined.
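
The two properties above (one-way edges, no cycles) are what make it possible to work out a valid run order. The sketch below illustrates this with Python's standard `graphlib` module (Python 3.9 or later). It is not how Airflow is implemented, and the task names and the helpers `execution_order` and `has_cycle` are assumptions made for the example.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks that must finish before it starts.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(graph):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph) -> bool:
    """Report whether the dependency graph contains a cycle."""
    try:
        execution_order(graph)
    except CycleError:
        return True
    return False


order = execution_order(pipeline)

# Two tasks that depend on each other: not a DAG, so no run order exists.
cyclic = {"a": {"b"}, "b": {"a"}}
```

Because `pipeline` is a simple chain, `order` comes out as extract, transform, load, notify. The `cyclic` graph is rejected, which is why a workflow has to be acyclic.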

 By default, Airflow uses SQLite as its metadata database. SQLite is intended for
development only, because it allows only one writer at a time.
 As a result, tasks run one at a time and multiple workflows cannot execute in parallel.
 In production, use PostgreSQL or MySQL instead.
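
The single-writer limitation can be reproduced with Python's built-in `sqlite3` module. The sketch below shows SQLite's locking in isolation and does not involve Airflow. The file name `demo.db`, the `runs` table, and the function `second_writer_is_blocked` are hypothetical.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked() -> bool:
    """Return True if a second connection cannot write while the first holds the lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
        first = sqlite3.connect(path, isolation_level=None)
        # timeout=0 makes the second connection fail at once instead of waiting.
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            first.execute("CREATE TABLE runs (id INTEGER)")
            first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
            first.execute("INSERT INTO runs VALUES (1)")
            try:
                second.execute("BEGIN IMMEDIATE")  # second writer asks for the same lock
            except sqlite3.OperationalError:
                return True  # rejected: the database is locked by the first writer
            return False
        finally:
            first.close()
            second.close()


blocked = second_writer_is_blocked()
```

A server database such as PostgreSQL or MySQL accepts concurrent writers, which is what parallel task execution needs.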