Case Study - Flight Delay

Predicting Flight Delay Risk Using a Random Forest Classifier Based on Air Traffic Scenarios and Environmental Conditions
Markus Bardach, Vienna University of Technology, Vienna, Austria, markus -dot- bardach -at- gmail -dot- com
Eduard Gringinger, Frequentis AG, Vienna, Austria, ORCID 0000-0003-3897-3003
Michael Schrefl, Christoph G. Schuetz, Johannes Kepler University Linz, Linz, Austria, ORCID 0000-0003-1741-0252, 0000-0002-0955-8647
Abstract—A reduction of delay costs can be achieved through more adaptable flight planning, which hinges on accurate prediction of delays. In order to counteract the expected delay of flights, air traffic control may adapt flight plans, for example through slot swapping, opening another runway, or changing the runway configuration. Environmental conditions and external events such as runway and airspace closures may render a flight plan obsolete, which must be taken into account when aiming to reduce delay. Air traffic control must recognize changes in the environment and external events such as runway and airspace closures as early as possible in order to adapt flight plans accordingly and avoid delays. Current systems employed by air traffic control do not sufficiently leverage the multitude of available data for the detection of upcoming congestion and, consequently, flight delays. Therefore, flight plans are not adapted fast enough in air traffic scenarios with potentially high delay. In this paper, we aim to predict the risk class of an air traffic scenario based on the expected cost of the delays, taking into account information about environmental conditions and external events. In particular, we present a random forest classifier for Atlanta International Airport, which achieves an accuracy of 82.5% for the highest and thus most important risk classes. The development of similar classifiers for other airports may help air traffic control to predict scenarios with high congestion more accurately and to counteract accordingly in the future.

Index Terms—air traffic management, machine learning, predictive analytics

978-1-7281-9825-5/20/$31.00 ©2020 IEEE

I. INTRODUCTION

According to EUROCONTROL delay statistics [16], total en-route delay in air traffic amounted to 19.1 million minutes in 2018, a 105% increase with respect to 2017. En-route delay accounts only for the delay while the aircraft is in the air after take-off and before landing. The two main causes of delay were capacity/staffing issues and weather conditions [16]. A report on delay costs published by the University of Westminster in cooperation with EUROCONTROL states that, on average, delay costs start at EUR 32 for the first minute and rise to EUR 80 270 for a delay of over 300 minutes [11, p. 81]. A reduction of delay costs can be achieved through more adaptable flight planning, which hinges on accurate prediction of delays. A flight plan includes information about take-off and landing slots as well as the route itself. A flight plan is created before the flight and must be approved by air traffic control. Air traffic control coordinates all flights and aims to minimize congestion and avoid delay over all flights. In order to counteract the expected delay of flights, air traffic control may adapt flight plans, for example through slot swapping, opening another runway, or changing the runway configuration. Environmental conditions and external events such as runway and airspace closures may render a flight plan obsolete, which must be taken into account when aiming to reduce delay. METAR and NOTAM messages indicate changes in environmental conditions and notify of external events, respectively. METARs contain basic weather data such as air temperature and information about special weather phenomena from thunderstorms to hail. NOTAMs describe runway or airspace closures, among other things. General airport and runway information, such as the number of runways or the length of a runway, may also influence delay and is therefore a valuable source for delay planning.

Air traffic control must recognize changes in the environment and external events such as runway and airspace closures as early as possible in order to adapt flight plans accordingly and avoid delays. Current systems employed by air traffic control do not sufficiently leverage the multitude of available data for the detection of upcoming congestion and, consequently, flight delays. Therefore, flight plans are not adapted fast enough in air traffic scenarios with potentially high delay. In this regard, the term “air traffic scenario” refers to the ensemble of flights departing from or arriving at a certain airport within a certain time frame. This ensemble comes together with information about the characteristics of the involved aircraft (number of small/medium/large aircraft). For example, the ensemble of flights departing Vienna International Airport on April 2, 2017, from 2:50pm to 3:49pm constitutes an air traffic scenario that involves ten small, 45 medium, and nine large aircraft; this information describes the air traffic scenario proper. The information describing the air traffic scenario proper can be enriched with additional information about environmental conditions. It can also be enriched with information about runway status (number of open/closed runways) and airspace closure (yes or no) obtained from METARs and NOTAMs, respectively. For example, the scenario that involves flights
departing Vienna International Airport on April 2, 2017, from 2:50pm to 3:49pm is characterized, among others, by an air temperature of 45° F, a wind speed of 20 mph, a cloud ceiling of 5 000 ft, and the presence of rain. In this scenario, two out of four runways are open and no airspace closures have been announced. In this paper, we consider information about the scenario proper together with information about environmental conditions and external events in order to predict delay.

Belcastro et al. [12] demonstrate that individual flights can be assigned to classes of delay time (risk classes) with up to 80% accuracy using a random forest classifier based on weather observations at the airport.

In this paper, we aim to predict the risk class of an air traffic scenario based on the expected cost of the delays, which goes beyond previous work in the following ways:
1) Rather than individual flights, the focus of delay planning is on air traffic scenarios, i.e., an ensemble of flights within a time frame.
2) The risk classes are not determined by the delay time but by the expected delay costs of the air traffic scenario. These costs are based on a cost model which takes into account the length of the delays and the size of the aircraft.
3) Prediction takes into account not only environmental conditions of a scenario obtained from weather data (METARs) but also external events such as runway and airspace closures indicated by Notices to Airmen (NOTAMs).

We present a random forest classifier for Atlanta International Airport based on the analysis of air traffic scenarios. The analysis includes information about environmental conditions and external events obtained from METARs and NOTAMs, respectively. The classifier achieves an accuracy of 82.5% (82% precision, 83% recall) for the highest and thus most important risk classes, i.e., those with average delay costs per flight of more than USD 10 500. The development of similar classifiers for other airports may help air traffic control to evaluate upcoming scenarios with high congestion more easily. This in turn would allow air traffic control to take appropriate action in order to prevent flight delay.

The remainder of this paper is organized as follows. In Section II we explain the background and review related work. In Section III we describe the available data. In Section IV we describe the necessary data preparation work. In Section V we discuss the development of the random forest classifier. In Section VI we evaluate the developed classifier. We conclude with a summary and an outlook on future work.

II. BACKGROUND AND RELATED WORK

Research on predicting flight delay has mainly focused on individual flights. For example, Assent et al. [6] classify flights into the three categories “ahead of time”, “on time”, and “delayed” based on historic flight data, with a reported accuracy of up to 45.4%. The authors adapted their classifier to take attributes of locally varying relevance into account. The classifier, however, does not consider information about the environment, which is left to future work. Belcastro et al. [12] combine flight and meteorological data to predict the delay of individual flights using various delay thresholds. The threshold determines the tolerated deviation from the planned arrival time for a flight to be considered on time. By adding weather information from the origin airport and the destination airport, the classification results achieve an accuracy of up to 85.8% for a delay threshold of 60 minutes. Smaller thresholds decrease classification accuracy. Different types of classifiers were evaluated, and the best-performing classification method was random forest, which we also use in this paper.

Rebollo et al. [10] model the US airport network to predict aggregated air traffic delays. Consequently, prediction requires information about the entire airport network. The inputs are the delay states of the most influential airports and the global delay state of the National Airspace System. Meteorological information or any other information about environmental conditions is not used for prediction. For a delay threshold of 60 minutes, prediction has an accuracy of 81%.

Banavar et al. [7] model flight delays and cancellations based on the expected weather. They compare linear regression and neural network models, and the neural network outperforms linear regression. Interestingly, the accuracy of delay estimation varies depending on the season. In this paper, we use the same data source for US flight data as Banavar et al. [7].

Choi et al. [13] classify delays of individual pairs of origin and destination airports by including meteorological data at the origin and destination airports. The goal is to predict whether a flight is on time or delayed. In this case, a flight is considered on time if it has less than 15 minutes of delay. The following classifiers are evaluated: random forest, AdaBoost, KNN, and decision trees. The random forest classifier outperformed the others.

Yi et al. [14] predict flight delay by using multiple linear regression on historical flight data that also include some weather information, e.g., about wind direction and speed. The goal is to predict whether a flight has more than 30 minutes of delay. The linear regression model outperformed naïve Bayes and C4.5, achieving an accuracy of 79.1%.

According to the current state of knowledge, meteorological conditions are of key importance for the prediction of delay. For example, Abdelghany et al. [4] state that weather accounts for nearly 75% of delays due to tight connections and that this share could increase further if no actions are taken. Allan et al. [2] find that reduced ceiling and visibility are the leading contributors to major delays. Pattern identification in air traffic flow management by Cruciol et al. [9] shows that air temperature and wind speed correlate with air traffic delay. Furthermore, Belcastro et al. [12] propose a classifier for predicting flight delays based on historical flight data and meteorological data. These data include the two attributes ceiling and visibility highlighted by Allan et al. [2], and the classifier demonstrates that good prediction results can be achieved with them. Other works on flight delay classification [6], [17] state that meteorological information would have improved their results.

Research on delay prediction is predominantly focused on the delay of a single flight. Assent et al. [6] classify individual flights into the categories “ahead of time”, “on time”, and
“delayed”, whereas Belcastro et al. [12] use delay thresholds from below 15 minutes to below 90 minutes of delay. In this paper, we instead classify air traffic scenarios into risk classes. The presented classifier predicts the average delay of a group of planes in the time frame of the scenario.

An interesting aspect of flight delay classification is that different attributes have locally varying relevance [6]. For example, air temperature has been found to be more relevant for delay prediction at certain airports than at others. Assent et al. [6] account for this fact by developing their own classifier, which was able to slightly outperform traditional classification methods. Consequently, it is difficult to generalize the classifier to different airports. Instead, the findings by Assent et al. [6] suggest that it is better to develop a classifier for a single airport and then describe a way to adapt it to others.

When it comes to modeling of classifiers, current knowledge shows that there is no single best classification method for a given application. Instead, different classification methods have their own strengths and weaknesses (see [8, p. 331]). For example, a random forest classifier is robust to outliers, scalable, and able to model non-linear decision boundaries. These properties are important for airport scenarios. Pham et al. [17] and Belcastro et al. [12] both use random forest classifiers. Belcastro et al. [12] also experimented with other classification methods, e.g., support vector machines, naïve Bayes, and logistic regression, which were outperformed by random forest. Wang et al. [18], however, used multi-layer neural networks for their delay predictions over flight data.

The cross-industry standard process for data mining (CRISP-DM) defines multiple steps that can be followed in order to achieve valid results in a transparent manner [1], [15]. Figure 1 shows the CRISP-DM steps, which are conducted in iterations. We followed these steps in the development of the presented classifier, and the structure of this paper reflects the CRISP-DM steps.

Fig. 1. Cross-industry standard process for data mining (CRISP-DM) [1]

CRISP-DM consists of six steps. The first step is business understanding, which serves to assess the situation and determine the goals of the whole data mining process. The first step can be seen as hypothesis building. The second step is data understanding, where the data are collected, described, explored, and verified. It is important to determine whether all data are available and valid in order to reach the goal set in the first step. The third step is data preparation. Collected data must be selected, cleaned, and transformed into the required format. Because almost all data mining processes require data from different sources and formats, data preparation is essential to achieve good and valid results. The fourth step is modeling, where a predictive model is selected, trained on the available data, and assessed; this step is where the actual data analysis happens. The fifth step is evaluation. The results of the previous step are evaluated and the whole process is reviewed. The process owner may have to restart with the first step for either of two reasons. The initial hypothesis may have to be reformulated, or the chosen data sources may not have been appropriate for the analysis problem at hand. The process concludes with deployment. Multiple cycles of the CRISP-DM process may be necessary to achieve and improve the results. Because the process is iterative, the deployment of data mining results leads to a better business understanding (Step 1), which then leads to new hypotheses, goals, and so on [1].

III. DATA UNDERSTANDING

The United States Department of Transportation provides open and free access to aviation data dating back to 1987 [24]. Carrier on-time performance data can be downloaded in monthly batches. The data set includes all domestic flights of carriers that account for at least one percent of scheduled passenger revenues. This means that the data represent about 77% of US air traffic measured by the number of passengers. The same source is also used by Belcastro et al. [12] and Abdel et al. [5]. The second data source for historical flight data is the EUROCONTROL flight archive for R&D, which is a new portal and will replace the existing Demand Data Repository 2 [19]. At the time of data collection, the new flight archive was still in its beta phase, and EUROCONTROL granted access as a beta user for the purposes of this paper. During the beta tests, only three months of data were available: June 2016, March 2017, and June 2017. The portal provided information about all flights managed by EUROCONTROL, including international flights and transcontinental flights that merely passed through European airspace. The US and the EUROCONTROL historic flight data differ in information granularity. US flight data include information about taxi times and the exact take-off and landing times, which the European flight data do not include. However, the departure delay and arrival delay relevant for creating air traffic scenarios can be calculated from both data sources.

Risk class prediction for air traffic scenarios has to be done for individual airports since attributes have locally varying relevance. To select an airport, we chose the one with the largest delays. One might assume that the delay measured
in absolute numbers would correspond to the airports with the most traffic, since, as probability would suggest, the more flights there are, the more chances there are for delays. This assumption, however, does not entirely hold. Table I shows a top-ten list of airports with the largest total amount of arrival delay. The list is dominated by the airports with the most flights per year. Shaded cells indicate airports that are also among the top-ten busiest airports. Nevertheless, three of these airports are not among the busiest and must have other reasons for their large amount of delay. As mentioned by Franck et al. [3], EWR is highly affected by weather. Between Atlanta and O’Hare, in 1st and 2nd position in the table, respectively, there is quite a large jump of 13.8% in total arrival delay. The difference in movements, however, is only 3.5%. Thus, Atlanta is a suitable candidate for classification, as its delay might be caused by environmental conditions rather than by traffic volume alone.

TABLE I
TOTAL ARRIVAL DELAY FOR AIRPORTS IN THE US IN 2016

No. | Airport | Total Arrival Delay (min) | Movements
1 | Atlanta (ATL) | 3 938 293 | 898 356
2 | O’Hare (ORD) | 3 457 079 | 867 635
3 | San Francisco (SFO) | 3 073 502 | 450 388
4 | Los Angeles (LAX) | 2 997 929 | 697 138
5 | Dallas (DFW) | 2 779 141 | 672 748
6 | Denver (DEN) | 2 392 183 | 565 503
7 | New York (EWR) | 1 956 875 | 435 907
8 | New York (LGA) | 1 732 238 | 369 987
9 | Las Vegas (LAS) | 1 639 679 | 541 428
10 | Boston (BOS) | 1 628 219 | 372 930

Similar evaluations have been done for European airports (Table II). Vienna is not number one on the list of total arrival delays for European airports. However, our greater familiarity with the airport, owing to its geographical proximity, makes it worth a closer look. Vienna is in 11th position with 11.44 minutes of average arrival delay and in 10th position in total arrival delay in Europe.

TABLE II
TOTAL ARRIVAL DELAY FOR AIRPORTS IN THE EU IN JUNE 2016

No. | Airport | Total Arrival Delay (min) | Movements
1 | London Heathrow (LHR) | 472 622 | 39 920
2 | Amsterdam Schiphol (AMS) | 286 155 | 42 988
3 | Paris Charles-de-Gaulle (CDG) | 262 999 | 41 120
4 | Frankfurt am Main (FRA) | 258 185 | 40 808
5 | Gatwick Airport (LGW) | 223 985 | 25 652
6 | Madrid-Barajas (MAD) | 163 565 | 32 367
7 | Barcelona-El Prat (BCN) | 159 961 | 28 389
8 | Munich (EDDM) | 138 944 | 34 933
9 | Paris-Orly (ORY) | 127 800 | 20 659
10 | Wien-Schwechat (VIE) | 121 346 | 21 206

METARs have been downloaded from ASOS stations at the Iowa Environmental Mesonet [22]. A script written in R to download METAR data for a single airport is available. For the purposes of this paper, the script was adapted to download weather data for the largest 50 US and 50 European airports at once. Airport and runway data were collected from a website¹ which offers data about all airports and runways in the world, free to download as open data [23]. The downloaded CSV file included 55 485 entries, covering small, medium, and large airports as well as heliports, balloon ports, and closed airports. Aircraft data were downloaded from the aircraft characteristics database of the Federal Aviation Administration (FAA) [20]. The file was obtained in Excel format and includes information about 2 766 different aircraft types. Each aircraft is described with 25 attributes. The data set includes all aircraft needed according to the historic flight data sets from the US and Europe. Even so, it had a lot of “to be done” values. If information about a certain aircraft is needed, these values cannot be replaced with zero or the average from other records. Missing aircraft attributes have to be added manually by researching the correct value on websites such as Skybrary². NOTAMs have been collected by the FAA for years. The FAA provides an archival search function for the last five years on its website [21]. The website provides no API and displays NOTAMs for one day at a time; thus, every day needs to be downloaded manually. The two offered formats are PDF and Excel. For the airport of Atlanta, 365 Excel files covering 2017 were downloaded and converted to a CSV file; for the airport of Vienna, 91 Excel files were downloaded. Each file consists of all messages that were active on the respective day.

¹ http://ourairports.com
² http://skybrary.aero

IV. DATA PREPARATION

Experiments with air traffic scenario data sets of different sizes have shown that large data sets are very slow to process on the provided hardware. This led to the decision to reduce the size of the data sets employed for learning the classifier. For the airport of Atlanta, only data from 2017 are selected, and for Vienna all three months of data. This reduction increases the speed of the classification and makes parameter optimization possible.

The first step in the two-step data cleaning process for flight data concerns all flight records that do not have a valid tail or model number. A tail or model number is considered invalid if no matching aircraft could be found. Reasons for that are typos in the numbers, or that the aircraft is a helicopter or balloon. If no aircraft information is available, the maximum take-off weight is unknown, which is necessary for the delay cost calculation. Therefore, these flights are flagged by setting the maximum take-off weight to 9 999 999. This unique value highlights these flights for the scenario creation in the data construction task, where they are processed individually.

The second step in the cleaning process for flight data is to subtract the departure delay of arriving flights from their total delay. If an aircraft is arriving in Atlanta or Vienna, the reason for delay at the departure airport is unknown. It might be due to certain environmental conditions at Vienna or Atlanta, but this is only one of many possible reasons. Other reasons may be security procedures, technical failure, waiting for crew, etc.
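The two cleaning steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Flight` record, its field names, and the `mtow_by_model` lookup table are hypothetical, and delays are assumed to be in minutes. Only the sentinel value 9 999 999 is taken from the text.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

# Sentinel from the text: flags flights whose aircraft could not be matched.
UNKNOWN_MTOW = 9_999_999


@dataclass(frozen=True)
class Flight:
    """Hypothetical flight record; field names are illustrative only."""
    model: str                    # aircraft model as recorded in the flight data
    arriving: bool                # True if the flight arrives at the studied airport
    dep_delay: float              # departure delay in minutes
    arr_delay: float              # arrival delay in minutes
    mtow: Optional[float] = None  # maximum take-off weight, filled in by step 1


def clean_flights(flights: List[Flight], mtow_by_model: Dict[str, float]) -> List[Flight]:
    """Apply the two cleaning steps and return new, cleaned records."""
    cleaned = []
    for f in flights:
        # Step 1: no matching aircraft -> flag with the sentinel MTOW so the
        # flight can be handled individually during scenario construction.
        mtow = mtow_by_model.get(f.model, UNKNOWN_MTOW)
        # Step 2: for arriving flights, remove the delay already incurred at the
        # origin, whose cause cannot be attributed to the studied airport.
        # (No clamping at zero is applied; the text does not specify one.)
        arr_delay = f.arr_delay - f.dep_delay if f.arriving else f.arr_delay
        cleaned.append(replace(f, mtow=mtow, arr_delay=arr_delay))
    return cleaned
```

Take a hypothetical lookup table `{"B738": 174_200}`. An arriving B738 flight with 10 minutes of departure delay and 25 minutes of arrival delay keeps 15 minutes of attributable arrival delay. A flight whose model is missing from the table receives the sentinel MTOW. Returning new frozen records rather than mutating the input keeps the raw data available for later checks.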
For the European historical flight data, the departure delay is defined as the time between the filed off-block time and the actual off-block time. For US flights, the departure delay is stored in a separate “dep delay” field.

Meteorological data include many missing values. The first reason is that the unprocessed METAR did not include the information. For example, ice accretion is very unlikely in summer. The second reason is that the data provider's automatic transformation of the unprocessed METAR into attributes did not work correctly. In the second case, it is possible to fill the missing values with the correct ones from the METAR. Not all attributes, however, can be extracted from the METAR. Sea level pressure was dismissed as it was missing in 92.3% of records and cannot be extracted from the unprocessed METAR. The same applies to the apparent temperature, which was missing in 91% of records. Sky Level Coverage 2, 3, and 4 were missing in 73.06%, 87.3%, and 98.7% of records, respectively, as the METAR only included information about Level 1 in 95% of records. Experiments in the modeling phase have shown that only Sky Level Coverage 1 had an effect on prediction. Therefore, the other coverage attributes were dismissed in the final construction of air traffic scenarios. The same applies to Sky Level Altitude 2, 3, and 4. Ice accretion 1h, 3h, and 6h, as well as peak wind gust, peak wind gust direction, and peak wind gust time, were all missing in 99% of records. For the few records that included information about ice accretion and peak wind gust time/direction, these environmental conditions were expressed in other attributes in the final air traffic scenario. In total, no METAR record had to be removed completely. This matters because, if no METAR data are available for an air traffic scenario, the whole scenario has to be dismissed.

The required attributes of the aircraft data are the model (Attribute 3), the maximum take-off weight, and the ICAO code. Of the aircraft selected in the previous step, about 35% had a missing value for the maximum take-off weight. This information has to be added manually by searching for each aircraft model on the Skybrary website, which includes facts about various aircraft models, including the maximum take-off weight.

The base of each scenario is METAR data, as it defines the period of time and the meteorological information of a scenario. An air traffic scenario begins when a METAR record is published and ends before the next METAR record is published. This sets the time span of a scenario to one hour at Atlanta airport and to 30 minutes at Vienna airport.

NOTAM data are also added to the scenario. As there is one NOTAM file for each day, the information of the file corresponding to the date of the scenario is added. For the airport of Atlanta, ten attributes are added to the air traffic scenarios from NOTAM data. The number of attributes depends on the number of runways and relevant taxiways. Experiments in the modeling phase have shown that only taxiways B to B3 have a positive impact on the prediction of risk classes. For the airport of Vienna, NOTAM data only add three attributes, because Vienna has only two runways and no relevant taxiways could be identified for prediction.

Using a sort-and-merge algorithm, all flights that occurred during the time span of the scenario have been identified. The flights are split into three attributes: small aircraft, medium aircraft, and large aircraft. The size of an aircraft depends on the maximum take-off weight (MTOW). The range of all MTOWs is split into three equally sized groups. A small aircraft has a weight of up to 70 000 pounds, a medium aircraft of up to 200 000 pounds, and a large aircraft of more than 200 000 pounds. Some flights were flagged in the data cleaning task by setting their maximum take-off weight to 9 999 999. These flights get an artificial take-off weight, calculated as the average take-off weight of all flights in the particular scenario. Based on this weight, the flight is added to one of the three groups. If more than 5% of the flights in a scenario are flagged (i.e., have an unknown take-off weight), the whole scenario is removed from the final dataset. By this rule, 9.1% of departure scenarios and 11.7% of arrival scenarios were removed.

Within the time span of the scenario, the planned arrival and departure distance is calculated, since very short distances between flights are likely to cause congestion. This is done by splitting the time span into five-minute blocks and counting how many aircraft depart and arrive during each block. Then, each count is divided by five to get the average number of aircraft per minute in that block. The block with the highest value is added to the scenario as the arrival or departure distance attribute. Both the arrival and the departure distance attributes are added, regardless of whether the particular scenario is about arriving or departing aircraft. The reason is that a peak in arrival distance can lead to congestion for departing aircraft, as they might have to wait for a free take-off slot on the runway. For Atlanta, the arrival distance peaks at 4.8 aircraft per minute. In Vienna, the peak is 2.8 aircraft per minute. Finally, the scenario type is defined, i.e., whether the flight information is about arriving or departing aircraft. Weather and NOTAM information are identical for both types.

The last missing attribute to characterize an air traffic scenario is the risk class. It is calculated from the cost of all delayed flights of a scenario. The delay cost of each flight is calculated using the report of cost reference values [11, p. 79], compiled by EUROCONTROL in cooperation with the University of Westminster. Annex J of the report proposes a regression analysis that allows calculating the delay cost for each aircraft based on its maximum take-off weight in tons. The linear regression function is of the form

y = m · x + c    (1)

The function is based on the twelve most common aircraft types. Variable x of the linear regression function is the root of the maximum take-off weight. Two sets of regression coefficients were calculated. The first set covers the full tactical costs including reactionary cost for at-gate delay, also known as departure delay. The second set covers en-route delay, also known as arrival delay. Each set includes the coefficients m and c for
nine delay categories from below 5 minutes to more than 300 minutes.

The delay values of each flight are taken from the historical flight data. For US data, the departure and arrival delay of a flight are stored in the attributes “dep delay” (Attribute 9) and “arr delay” (Attribute 16), respectively. For data from Europe, the delay values have to be calculated. Departure delay is the difference between the filed off-block time (Attribute 4) and the actual off-block time (Attribute 6). Arrival delay is the difference between the filed arrival time (Attribute 5) and the actual arrival time (Attribute 7). Once the delay is known, the right coefficients can be chosen to calculate delay costs. The maximum take-off weight is taken from the aircraft data set.

For each scenario, the delay costs are divided by the total number of aircraft to get an average delay cost per aircraft. That way, scenarios can be compared with each other independently of the number of aircraft they contain. The range of average delay costs then needs to be split into the three risk classes.

TABLE III
RISK CLASSES FOR AIR TRAFFIC SCENARIOS

Risk Class | Average Delay Cost per Aircraft | Average Delay Range
1 | EUR 0 to 1 000 | 0 to 15 minutes
2 | EUR 1 001 to 9 500 | 15 to 60 minutes
3 | more than EUR 9 500 | more than 60 minutes

Following the ranges in Table III, every air traffic scenario is put into one of the three risk classes, which is the final attribute of each scenario record.

V. MODELING

The modeling phase is done with the help of the tool RapidMiner, version 9.5. In RapidMiner, training and testing of a classifier was done by defining a process. The processes were executed on RapidMiner Server, version 9.5, in a virtual machine with 8 GB RAM. Processes were also executed in parallel on the client version, using a laptop with 8 GB RAM and an Intel Core i7-4500U at 1.80 GHz. The tool allows creating processes by connecting different operators, similar to workflow modeling. The classifier that is trained is the random forest classifier. In RapidMiner two processes are set

1) Number of trees in the forest: The possible number of trees is set from 0 to 100, divided into 10 steps.
2) Maximum depth of each tree in the forest: The possible depth is set from -1 to 100, divided into 5 steps.
3) Minimum leaf size: The possible leaf size is set from 1 to 100, divided into 5 steps.

These parameter optimization settings lead to 396 possible combinations of parameters and training runs. The running time of all combinations was about 6 hours for the larger dataset on the given hardware. The number of combinations that could be tried was limited by the available random access memory.

The Cross Validation operator itself consists of a sub-process in which the Random Forest classifier is trained and tested. Its performance is measured with the metrics accuracy, precision, and recall. After training, testing, and parameter optimization, the best model moves on to the Explain Predictions operator shown in Figure 2. This operator visualizes which attributes were the most important for classification and calculates global weights for every attribute of the air traffic scenario.

The air traffic scenario dataset for the airport of Atlanta consists of 15 064 scenarios. The Sample operator undersamples the scenarios to 624 per risk class, which adds up to 1 872 scenarios for training and testing. The air traffic scenario dataset for the airport of Vienna consists of 6 768 scenarios. In this case, the dataset has to be undersampled to 153 scenarios per class, which adds up to 459 scenarios for training and testing.

VI. EVALUATION AND RESULTS

Because of the iterative nature of CRISP-DM, the process, including the evaluation phase, was carried out more than once. Results of earlier runs of the process had to be discarded because precision and recall were not satisfactory. The presumed reason was twofold. First, the initial data set only included meteorological data as environmental condition data. Second, it included several airports at once. This led to the decision to include NOTAMs and perform classification only for individual airports, which substantially improved precision
up to train and test with air traffic scenarios from Atlanta and and recall. For the datasets of Atlanta and Vienna airport the
Vienna. The process is shown in Figure 2. following metrics have been finally achieved:
The implemented RapidMiner process starts by retrieving For the airport of Atlanta precision values range from
the air traffic data set. The Set Role operator sets the risk 75.00% to 81.96% and recall values range from 67.79% to
class attribute as the attribute that should be classified. Then, 86.06% for the random forest classifier. For the classification
the Sample operator balances the data set and reduces the of Class 3 risks the classifier achieved the best results. Class 3
records of Class 1 and Class 2 to the number of Class 3 includes the air traffic scenarios with the highest delay costs
records. In the case of Atlanta, the number was reduced to 624 and thus also with the biggest saving potential with over 80%
records, and in the case of Vienna to 153 records. Now the in precision and recall values. For the classification of Class 2
data are ready for classifier training. The Optimize Parameter risks the classifier achieved the worst results. The overall best
operator varies the parameters of the random forest classifier value is recall of class 1 with 86.06%.
for every run and consists of a sub-process. In the sub-process, Results show that the classifier is able to predict risk Class 3
the Cross Validation operator splits the dataset into k-folds for the best, which means that the chosen environment condition
training and testing. For parameter optimization, the following data sources explain the reasons for high delays. The classifier
parameters are varied: has the most difficulties to predict class 2.
Fig. 2. Random Forest process in Rapid Miner for air traffic scenario training and testing
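The data preparation and modeling steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' RapidMiner process, and every function name is hypothetical. It assumes that delay is computed as actual minus filed time (positive values mean late), that an average cost exactly on a Table III boundary falls into the lower class, and that a linear parameter grid with s steps yields s + 1 values including both endpoints. Under that last assumption, the three parameter ranges listed above give 11 × 6 × 6 = 396 combinations, which matches the reported count.

```python
from collections import defaultdict
from datetime import datetime
import random

# Upper bounds of Table III (average delay cost per aircraft, EUR).
CLASS_1_MAX_EUR = 1000
CLASS_2_MAX_EUR = 9500


def delay_minutes(filed: datetime, actual: datetime) -> float:
    """Delay in minutes as actual minus filed time (assumed sign: positive = late)."""
    return (actual - filed).total_seconds() / 60


def risk_class(total_delay_cost_eur: float, n_aircraft: int) -> int:
    """Average the scenario's delay cost over its aircraft and map it to class 1, 2 or 3.

    Assumption: a value exactly on a boundary belongs to the lower class.
    """
    avg_cost = total_delay_cost_eur / n_aircraft
    if avg_cost <= CLASS_1_MAX_EUR:
        return 1
    if avg_cost <= CLASS_2_MAX_EUR:
        return 2
    return 3


def undersample(records, label_of, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], n_min))
    return balanced


def linear_grid_size(steps_per_parameter):
    """Count grid points, assuming each linear range with s steps yields s + 1 values."""
    size = 1
    for steps in steps_per_parameter:
        size *= steps + 1
    return size


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    print(delay_minutes(datetime(2019, 5, 1, 10, 0), datetime(2019, 5, 1, 10, 40)))  # 40.0
    print(risk_class(total_delay_cost_eur=48_000, n_aircraft=12))  # 4000 EUR per aircraft -> 2
    print(linear_grid_size([10, 5, 5]))  # 11 * 6 * 6 = 396
```

The undersampling helper mirrors what the Sample operator does here: every class is cut to the size of the smallest one (Class 3 in both datasets), at the cost of discarding most Class 1 and Class 2 scenarios.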
Classification metrics for the dataset of Vienna are worse than for the Atlanta dataset. The relations, however, are similar: the classifier achieved the best results predicting Class 3 risks, while it performed worst for Class 2 risks. Additionally, the differences between precision and recall of each class are very similar to those of the Atlanta dataset. The lower precision and recall values are probably due to the smaller size of the data set: for the airport of Vienna, only three months of data were available. This suggests that a larger training dataset, of about one year of data, would improve the results.
TABLE IV
EVALUATION OF RANDOM FOREST CLASSIFIERS
Atlanta - Random Forest Classifier
Risk Class   Precision   Recall
1            79.56%      86.06%
2            75.00%      67.79%
3            81.96%      83.15%
Vienna - Random Forest Classifier
Risk Class   Precision   Recall
1            60.69%      69.54%
2            53.16%      55.63%
3            64.66%      66.03%
VII. SUMMARY AND FUTURE WORK
The aim of this work was to predict one of three risk classes for air traffic scenarios at a certain airport. Those scenarios are characterized by the flights within a certain time frame; classification takes into account information about the involved aircraft as well as weather data and information about external events such as runway and airspace closures. The risk classes are based on the delay costs of a scenario. Costs are determined using linear regression over the reference costs reported by EUROCONTROL; the regression depends on the maximum take-off weight of an aircraft. The data sources are on-time performance statistics reported by the United States Bureau of Transportation Statistics, flight data from EUROCONTROL, METARs, NOTAMs, airport and runway data, and aircraft characteristics. The analysis of the data sets gives insights into delay amount and frequency in relation to weather and external events. The allocation of the scenarios to the specific risk classes for training showed a high imbalance in the data, as flights with short delays are more common than flights with long delays. To solve this problem, undersampling was used to achieve balanced classes for classification training and testing. Results showed that, for Atlanta, precision and recall exceed 80% for classifying Class 3 risk, the delay class with the highest costs.
Research described in this paper may serve as the basis for an intelligent combination of different sources of air traffic and environmental data in order to support air traffic control by providing classification of air traffic scenarios. In this regard, a possible improvement lies in the use of additional training data. The data that could be collected from EUROCONTROL and the available computational power limited the size of the training data in this preliminary study. The weaker results for Vienna, where only three months of data were available, suggest that a larger data set improves the results. For example, using two or more years of data for training the classifier might improve the results further. In the presented research, time series were not considered for the analysis. Information about past scenarios could further increase classification accuracy. Information from different past scenarios could be combined by choosing different time frames. In this work, the time frame of a scenario is determined by the times when the weather data are updated. Peaks in traffic, however, do not always fall neatly between two publications of weather data. Defining new strategies for setting the time span of a scenario could therefore improve the results. Another possible path for future research is to use air traffic situations and risk class analysis as a basis for finding association patterns, which may reveal correlations between certain attributes that in turn may influence flight planning. Knowing the correlation between two attributes, flight plans can be adapted accordingly with the aim of decreasing delay when the aircraft performs the flight. As the evaluation has shown, the attributes that influence the classification of a scenario into a risk class differ between the two investigated airports. Increasing knowledge about the specific airport may help to improve classification. People with domain knowledge of the specific airport may help to find further attributes that have an effect on classification.
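In this work, the time frame of a scenario is bounded by updates of the weather data. A minimal sketch of that windowing, assuming sorted METAR issue times as boundaries and one reference timestamp per flight (which flight timestamp the authors used is an assumption here), assigns each flight to the half-open interval between two consecutive reports using Python's bisect module; the function name is hypothetical:

```python
import bisect
from datetime import datetime


def assign_to_scenarios(boundaries, flight_times):
    """Group flight timestamps into scenarios bounded by consecutive weather reports.

    `boundaries` must be sorted (e.g. METAR issue times). Scenario i covers the
    half-open interval [boundaries[i], boundaries[i + 1]). Flights before the
    first boundary or at/after the last one are not assigned to any scenario.
    """
    scenarios = {i: [] for i in range(len(boundaries) - 1)}
    for t in flight_times:
        i = bisect.bisect_right(boundaries, t) - 1  # index of the last boundary <= t
        if 0 <= i < len(boundaries) - 1:
            scenarios[i].append(t)
    return scenarios


if __name__ == "__main__":
    # Illustrative timestamps only, not data from the paper.
    metar_times = [datetime(2019, 5, 1, 10, 0), datetime(2019, 5, 1, 10, 30), datetime(2019, 5, 1, 11, 0)]
    flights = [datetime(2019, 5, 1, 10, 5), datetime(2019, 5, 1, 10, 45), datetime(2019, 5, 1, 11, 10)]
    for index, times in assign_to_scenarios(metar_times, flights).items():
        print(index, [t.strftime("%H:%M") for t in times])
```

Because the boundaries are only a sorted list, alternative strategies for setting the time span of a scenario, for example fixed intervals or windows placed around traffic peaks, could be tried by passing different boundaries without changing the assignment logic.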

978-1-7281-9825-5/20/$31.00 ©2020 IEEE
