
@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $

export AIRFLOW_HOME="/workspaces/hands-on-introduction-data-engineering-4395021/airflow" && pip install "apache-airflow==2.5.1" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.1/constraints-3.7.txt"

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $


airflow  (running it with no arguments prints usage and the list of available subcommands)

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $


airflow db init  — initializes the Airflow metadata database (SQLite by default)

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $


airflow users create \
--username admin \
--firstname Firstname \
--lastname Lastname \
--role Admin \
--email admin@example.org \
--password password

creates an Admin user for logging into the webserver

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021 (02-02) $


cd airflow/
@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (02-02) $ ls

Go to webserver_config.py and set WTF_CSRF_ENABLED = False
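That edit can also be made from the terminal. A sketch, assuming webserver_config.py lives in $AIRFLOW_HOME (Airflow generates it on the first webserver start; the touch covers the case where it doesn't exist yet):

```shell
# Set WTF_CSRF_ENABLED = False in webserver_config.py, replacing an
# existing line or appending one if it's absent.
CFG="${AIRFLOW_HOME:-.}/webserver_config.py"
touch "$CFG"
if grep -q '^WTF_CSRF_ENABLED' "$CFG"; then
  sed -i 's/^WTF_CSRF_ENABLED.*/WTF_CSRF_ENABLED = False/' "$CFG"
else
  echo 'WTF_CSRF_ENABLED = False' >> "$CFG"
fi
grep '^WTF_CSRF_ENABLED' "$CFG"
```

Disabling CSRF protection is only acceptable for a local lab environment like this one; leave it enabled in production.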

The Airflow webserver is a web-based UI that's commonly used in production to provide an overview of all DAGs and their execution flow. It offers several ways to manage administrative settings, connections, variables, and other components of Airflow through an easy-to-use web interface.

The Airflow scheduler is a process that continually monitors all tasks and DAGs in Airflow. It starts subprocesses that keep track of the heartbeat of all DAGs and checks whether any active tasks can be triggered. Although it's possible to run the webserver without the scheduler, it's not recommended. Now let's switch back to Codespaces and see how to run both the Airflow webserver and the scheduler.

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (02-02) $ airflow webserver -D

A forwarded port is generated for the webserver UI (Airflow defaults to port 8080).


(cd ..)

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (02-02) $ airflow dags list

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (02-02) $ airflow scheduler -D

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (02-02) $ cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill
@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (02-02) $ echo "" > $AIRFLOW_HOME/airflow-webserver.pid
@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (02-02) $ cat $AIRFLOW_HOME/airflow-scheduler.pid | xargs kill
@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (02-02) $ echo "" > $AIRFLOW_HOME/airflow-scheduler.pid
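Those four commands can be folded into a small helper. A sketch, assuming the pid files sit in $AIRFLOW_HOME, where Airflow writes them by default:

```shell
# Kill a daemonized Airflow process from its pid file, then truncate the
# file so the next -D start doesn't trip over a stale pid.
stop_airflow() {
  pidfile="${AIRFLOW_HOME:-.}/airflow-$1.pid"
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null || true
    : > "$pidfile"
  fi
}
stop_airflow webserver
stop_airflow scheduler
```

The truncation step mirrors the `echo "" >` commands above; without it, a leftover pid file can block the next `airflow webserver -D` from starting.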

Upon installation, Airflow creates an airflow.cfg file that lives in the $AIRFLOW_HOME directory. To see where that directory is, you can run echo $AIRFLOW_HOME.

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (main) $ echo $AIRFLOW_HOME

I start by checking if any of the environment variables have been set for Airflow. In this case,
it looks like I only have AIRFLOW_HOME set, so I should be good to go.

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (main) $ env | grep -i airflow

# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository. This path must be absolute.
dags_folder = /workspaces/hands-on-introduction-data-engineering-4395021/airflow/dags

We can disable the example DAGs by setting load_examples = False in the airflow.cfg file.
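A sketch of making that change non-interactively, assuming the stock load_examples = True line that airflow db init writes is present (the fallback line is only there so the snippet runs standalone):

```shell
# Flip load_examples from True to False in airflow.cfg.
CFG="${AIRFLOW_HOME:-.}/airflow.cfg"
[ -f "$CFG" ] || echo 'load_examples = True' > "$CFG"  # fallback for a standalone run
sed -i 's/^load_examples = True/load_examples = False/' "$CFG"
grep '^load_examples' "$CFG"
```

Restart the webserver and scheduler afterwards for the change to take effect.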

@mohansaid9 ➜ /workspaces/hands-on-introduction-data-engineering-4395021/airflow (main) $ cat airflow.cfg | grep "dags_folder"
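The same grep can be extended into a sketch that extracts the configured path and makes sure the folder exists (the fallback line is only there so the snippet runs standalone):

```shell
# Read dags_folder out of airflow.cfg and create the directory if missing.
CFG="${AIRFLOW_HOME:-.}/airflow.cfg"
[ -f "$CFG" ] || echo 'dags_folder = ./dags' > "$CFG"  # fallback for a standalone run
dags_dir=$(grep '^dags_folder' "$CFG" | sed 's/^dags_folder *= *//')
mkdir -p "$dags_dir"
echo "DAGs will be picked up from: $dags_dir"
```

Airflow also ships a built-in way to read any setting: `airflow config get-value core dags_folder`.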
