AIRFLOW
Key Concepts:
Introduction:
In Apache Airflow, Directed Acyclic Graphs (DAGs) represent a collection of tasks with
dependencies and a defined schedule. DAGs are defined using Python scripts and are
the backbone of workflows managed by Airflow. Each DAG describes how tasks are
structured and how they relate to each other.
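Conceptually, a DAG is just tasks plus dependency edges, and the scheduler runs a task only after all of its upstream tasks have finished. A minimal, stdlib-only sketch of that ordering rule (plain Python, not Airflow's API — the task names here are made up for illustration):

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
deps = {
    "extract": set(),
    "validate": {"extract"},
    "load": {"validate"},
    "notify": {"load"},
}

# static_order() yields an execution order that respects every dependency,
# which is essentially the guarantee Airflow's scheduler gives for DAG tasks.
order = list(TopologicalSorter(deps).static_order())
print(order)  # "extract" comes first, "notify" last
```

Because the graph is acyclic, such an order always exists; a cycle would make the workflow unschedulable, which is why Airflow rejects cyclic task dependencies.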
Example:
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from airflow.providers.apache.druid.operators.druid import DruidOperator

# `notification` and `healthcheck_uuid` are project-specific helpers
# defined elsewhere in this codebase.
default_args = {
    'owner': 'Gaurav',
    'sla': timedelta(minutes=5),
    'params': {
        'endpoint': healthcheck_uuid
    }
}

mysql_params = {
    'mysql_conf_file': '/opt/airflow/mysql.cnf',
    'conn_suffix': 'db-zenu'
}

with DAG('age_of_listings_count_daily',
         default_args=default_args,
         schedule_interval='1 0 * * *',  # daily at 00:01
         template_searchpath=['/opt/airflow/templates'],
         catchup=False,
         concurrency=1) as dag:

    t0 = notification.dag_start_healthcheck_notify("start")

    # Dump the daily counts via a templated bash script.
    t1 = BashOperator(task_id='get_daily',
                      bash_command='druid_dump_age_of_listings_count.bash',
                      params=mysql_params,
                      retries=2,
                      retry_delay=timedelta(seconds=15),
                      on_failure_callback=notification.task_fail_healthcheck_notify)

    # Wait until the dump for this execution date lands in S3.
    t2 = S3KeySensor(task_id='check_daily',
                     bucket_key="{{ var.json.s3_age_of_listings_count_data_path.s3_path }}/"
                                "{{ execution_date.format('YYYY-MM-DD') }}*.csv.gz",
                     bucket_name="{{ var.json.s3_age_of_listings_count_data_path.bucket_name }}",
                     wildcard_match=True,
                     retries=2,
                     retry_delay=timedelta(seconds=5),
                     on_failure_callback=notification.task_fail_healthcheck_notify)

    # Ingest the file into Druid.
    t3 = DruidOperator(task_id='ingest_daily',
                       json_index_file='druid_ingest_age_of_listings_count.json',
                       druid_ingest_conn_id='druid_ingest_conn',
                       on_failure_callback=notification.task_fail_healthcheck_notify)

    tn = notification.dag_success_healthcheck_notify("end")

    t0 >> t1 >> t2 >> t3 >> tn
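The sensor's bucket_key above is a Jinja template: before polling S3, Airflow substitutes the s3_age_of_listings_count_data_path Variable and the run's execution_date. A rough stdlib illustration of what the rendered key looks like — the path and date below are made-up stand-ins, and Pendulum's format('YYYY-MM-DD') corresponds to strftime('%Y-%m-%d'):

```python
from datetime import date

# Hypothetical stand-ins for the Airflow Variable and the run's execution date.
s3_path = "s3://example-bucket/age_of_listings"
execution_date = date(2023, 5, 1)

# Roughly what Airflow renders the sensor's bucket_key into for that run;
# the trailing * is a wildcard, hence wildcard_match=True on the sensor.
bucket_key = f"{s3_path}/{execution_date.strftime('%Y-%m-%d')}*.csv.gz"
print(bucket_key)  # s3://example-bucket/age_of_listings/2023-05-01*.csv.gz
```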