
QMT 3001 BUSINESS

FORECASTING TERM PROJECT

GROUP 5
PEGASUS

02.06.2023

Student ID Name Signature

2019432018 Tuğçe AKIN

2019432024 Busenur ARPACI

2017432126 Özkancan BAHAR

2019432062 Ömer KARABAY

Contents
INTRODUCTION
PRELIMINARY DATA ANALYSIS
    Autocorrelation Function Analysis
MOVING AVERAGE AND EXPONENTIAL SMOOTHING METHODS
    Naïve Method
    Moving Average Method
    Single Exponential Smoothing
    Holt’s Method
    Winter’s Method
    Diagnostics and Model Selection
DECOMPOSITION
    Forecasting with Decomposition
REGRESSION ANALYSIS
    Model Specification
    Model Building
        Overall Model
        Reduced Model
    Forecasting with Regression
MODEL SELECTION and FORECASTING
    Best Model Forecasting
MANAGERIAL IMPLICATIONS
REFERENCES

INTRODUCTION

The airline industry is a large and growing sector that plays a critical role in global
transportation and commerce. Airlines generate significant economic benefits, supporting
jobs and contributing to national and global GDP. The market is highly concentrated, with a
small number of large carriers dominating the industry. Key competitive factors include
price, route network, customer service, and technology.

Low-cost carrier Pegasus Airlines is based in Turkey and offers service to a number
of locations in Europe, the Middle East, and Asia. The airline was founded in 1990, and it
began flying in 1997. Pegasus' international market share is 19.5%. Pegasus Airlines
provides a range of goods and services, including a few fare classes to accommodate various
travel requirements and price ranges. Only a carry-on bag and a personal item are included in
the airline's basic rate, dubbed Essentials. For a price, passengers can purchase extra services
like checked baggage, preferred seating, and meal options.

Our dataset shows the number of enplanements for U.S. air carrier domestic and
international scheduled passenger flights from January 2000 to December 2022. The data is
presented in millions of passengers. The numbers in this dataset will be associated with
Pegasus Airlines' 19.5% market share. Based on our time series graph shown in Figure 1, it
appears that there is an increasing trend over time. The trend line shows a positive slope,
indicating that the values are increasing over time. Additionally, there seems to be some
seasonality in the data. The peaks occur at around the end of each year, which may be due to
increased consumer spending during the holiday season. It is also worth noting that there
appears to be some cyclicality in our dataset. There are periodic fluctuations in the data, with
peaks occurring every few years, followed by periods of lower values. This cyclicality could
be due to various factors such as economic cycles, industry trends, or other external factors.

The graph shown in Figure 1 also shows us an overall increasing trend in the number
of enplanements over the period shown, with some fluctuations and variations. The number
of enplanements started to decline in 2001, which may have been due to the impact of the
September 11 terrorist attacks on air travel. After a few years of recovery, the number of
enplanements again declined during the 2008-2009 recession, before returning to growth. The
COVID-19 pandemic had a significant impact on air travel, with a sharp decline in
enplanements starting in March 2020. The lowest point was reached in April 2020, when
enplanements were down to around 5% of their pre-pandemic level. There has been a gradual
recovery in enplanements since the start of the pandemic, but the number is still well below
pre-pandemic levels as of the end of 2022. It is reasonable to expect that enplanements will
continue to increase over the long term, albeit with some short-term fluctuations and
setbacks. Based on the historical data shown in the graph, it's likely that there are seasonal
effects on our dataset. Traditionally, the peak travel periods for air travel are during the
summer months (June to August) and around major holidays such as Thanksgiving and
Christmas/New Year's. During these times, many people take vacations or travel to visit
family and friends, leading to increased demand for air travel.

Conversely, there tends to be lower demand for air travel during the winter months
(January to February) and during the fall season (September to November), as people are less
likely to travel for leisure purposes during these times. There appear to be some business
cycles. As we can see, there was a downturn in enplanements starting in 2001, following the
September 11 terrorist attacks. This downturn was followed by a period of recovery and
growth, which was then interrupted by the 2008-2009 recession. After the recession, there
was another period of growth in enplanements, which was then followed by the significant
downturn in air travel due to the COVID-19 pandemic. Based on this dataset, we can say that
one potential forecasting problem we could investigate is predicting the number of
enplanements for future periods. Enplanements refer to the number of passengers boarding a
flight, and forecasting future enplanements can be useful for airlines, airport operators, and
transportation planners. The variable being investigated is the number of passengers who
board a flight from U.S. air carriers on domestic and international, scheduled passenger
flights.

As a time series variable, the number of enplanements can exhibit various patterns
and trends over time. We might expect to see a long-term upward trend in enplanements,
reflecting growth in the travel industry and increasing demand for air travel. We might also
see seasonal patterns, with higher levels of enplanements during certain times of the year
(such as holiday travel periods) and lower levels during other times. Some potential
independent variables to investigate for forecasting purposes include economic indicators
such as GDP and unemployment rate, fuel prices, airline-specific factors such as route
network and pricing strategies, and external factors such as weather and geopolitical events.

PRELIMINARY DATA ANALYSIS

The dataset consists of enplaned passenger numbers for Pegasus over a period of 23
years. It is a monthly time series running from January 2000 through December 2022
(276 observations). Despite certain dips and plateaus along the way, and considerable
month-to-month variation, the numbers generally rise over time.

Figure 1: Time Series Plot of the Number of Enplanements

To begin with, there appears to have been a general increasing trend in passenger
counts over time, indicating that the airline has been growing in popularity and luring more
customers. This tendency is visible from the dataset's graph, which has a definite increasing
slope.
Based on the dataset supplied, it appears that Pegasus Airlines has been growing
steadily over the past 23 years, with no significant changes in demand or other external
factors that can affect passenger counts. However, it would be useful to have more
comprehensive data on the airline industry, such as details on monetary conditions, political
developments, and shifts in consumer behaviour.

Enplanements fell by 46% in March 2020, and the reported decline reached 106.4% within
3 months. This period coincides with the coronavirus (Covid-19) pandemic. During this
period, in addition to the closure of international travel, stay-at-home orders were
imposed in many countries. Many industries were affected by the pandemic, and the airline
industry was among the most affected.

Airlines responded by reducing their flight schedules and eliminating some routes.
Because of the costs they continued to bear while few flights were permitted, the airline
sector experienced a huge financial loss. Due to flight cancellations and reduced customer
service, many airlines saw a significant loss in revenue. Additionally, airlines were
required to take health precautions on behalf of their clients. These measures lengthened
flights and made an already dire financial situation for airlines even worse.

Historical data will not be sufficient for forecasting due to the pandemic's effects on
the airline industry. The number of upcoming flights and passenger demand may
change significantly based on the pandemic's progress and travel restrictions. As a result,
businesses need to be ready for an uncertain time. It appears that more data will be required
for forecasting in order to identify potential trends and future outcomes for the airline
business.

Median    Mean     Standard Deviation    Minimum Value    Maximum Value
61651     60997    10772,95544           3013             80746

Table 1: Descriptive Statistics

The dataset's median value is 61651, which means that 50% of the dataset's values fall
below it and 50% lie above it. The distribution may be slightly negatively skewed, given
that the mean value is 60997, which is marginally less than the median. The standard
deviation of 10772.96 measures how widely the values are spread around the mean, with most
values occurring within around one standard deviation of the mean. The range of values
contained in the data set is shown by the minimum and maximum values, 3013 and 80746,
respectively. This provides an understanding of the possible range of results that might
be anticipated in further observations. The dataset contains a wide range of values
overall, although most of the values are centred around the mean.

A histogram is a chart made of rectangles whose area is proportional to the frequency of
a variable and whose width is equal to the class interval. When we look at the histogram
(see Figure 2), we see that the most frequent enplanement values fall in the interval
between 57500 and 62500 thousand. We also see that most of the Pegasus enplanement values
lie between 52500 and 67500 thousand.

Figure 2: Histogram of Pegasus Enplanements


If most of the data values lie on the right side of the histogram and the tail extends to
the left, the histogram is skewed to the left. In this case, the mean is less than the
median. In our data set the mean is 60997 and the median is 61651, which is consistent
with a left-skewed histogram. That is why most of the bars in our histogram are stacked on
the right of the chart. A few low-value intervals make the graph asymmetrical and
left-skewed. When the data set is examined, the values between 17500 and 22500 give the
histogram this appearance. The Covid-19 pandemic quarantine, in effect since March 2020,
is one of the main reasons for these low values.

A box plot is a form of chart frequently employed in exploratory data analysis in
descriptive statistics. Box plots use the quartiles and the median of the data to visually
depict the distribution and skewness of numerical data. Box plots show the five-number
summary of a set of data: the minimum score, first (lower) quartile, median, third (upper)
quartile, and maximum score.

Figure 3. Boxplots of Pegasus Enplanements

As Figure 3 shows, the lower and upper whisker ends are 46163 and 77609, respectively;
values beyond them, such as the overall minimum of 3013 in Table 1, are plotted as
outliers. Our Q1 (lower quartile) and Q3 (upper quartile) values are 57899,3 and 66176,
and our median value is 61651. In a box plot, the distribution is approximately
symmetrical when the median is in the middle of the box and the whiskers are approximately
the same length on both sides of the box.

So far, we have analysed the basic statistics of the enplanements and shown them both
numerically and graphically. Overall, the dataset's distribution is spread out around the
mean value (60997), while most of the Pegasus enplanement values lie between 52500 and
67500 thousand. Also, the most frequent values fall between 57500 and 62500 thousand, an
interval which includes the median value (61651).

Autocorrelation Function Analysis

Autocorrelation, also known as serial correlation, refers to the degree of correlation
between the values of the same variable at two different points in time, separated by a
given lag. The value of autocorrelation ranges from -1 to 1. A value between -1 and 0
represents negative autocorrelation. A value between 0 and 1 represents positive
autocorrelation.
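
As an illustration of the definition, the sample autocorrelation at lag k can be computed as the sum of cross-products of deviations from the mean, k periods apart, divided by the total sum of squared deviations. The sketch below is a minimal version under that standard textbook formula; the function name and the toy series are hypothetical and are not taken from the Pegasus data.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation r_k of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total squared deviation from the mean.
    denom = sum((y - mean) ** 2 for y in series)
    # Numerator: co-movement of the series with itself shifted by `lag`.
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


# Hypothetical, steadily trending toy series (not the Pegasus data).
toy = [10, 12, 14, 16, 18, 20, 22, 24]
r1 = autocorrelation(toy, 1)
r2 = autocorrelation(toy, 2)
print(round(r1, 3), round(r2, 3))
```

For a trending series such as this one, the coefficients are positive at short lags and shrink as the lag grows, which is the pattern discussed for Figure 4.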

Figure 4: Autocorrelation Function of Pegasus Enplanements


If a series is trending, successive observations are highly correlated: the
autocorrelation coefficients are often significantly different from zero for the first few
time lags, and as the number of time lags grows, they steadily decrease towards zero. When
we examine Figure 4, it is clear that the enplanement series is trending. The first few
autocorrelation function values are significantly different from zero, whereas the value
of the autocorrelation function gets closer to zero as the time lag grows (see Figure 5).

Figure 5: Value of ACF from Lag1 to Lag12

For example, the first autocorrelation coefficient (0,93) is statistically significant at
a 5% level, because T = 15,49 exceeds the tabulated value T (276;0,025) ≈ 1,97. The second
autocorrelation coefficient (0,82) is also statistically significant at a 5% level
(T = 8,23 > 1,97) (see Figure 5). Up to lag 8, the autocorrelation function values are
statistically significant. After lag 8, however, the autocorrelation function values no
longer exceed the significance limits; in other words, they are statistically
insignificant. This slow decay shows that our data is nonstationary (trending).
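
The T values quoted above can be approximately reproduced by hand. The sketch below assumes the standard error formula commonly given in forecasting textbooks, SE(r_k) = sqrt((1 + 2·Σ r_i²) / n) with the sum taken over lags before k; whether the software used exactly this formula is an assumption. The function name is hypothetical, and the inputs are the rounded coefficients 0,93 and 0,82 with n = 276.

```python
import math


def acf_t_statistic(acf_values, k, n):
    """t statistic for r_k, where acf_values[0] is r_1, acf_values[1] is r_2, ..."""
    # Sum of squared autocorrelations at the lags before k.
    prior = sum(r ** 2 for r in acf_values[: k - 1])
    std_error = math.sqrt((1 + 2 * prior) / n)
    return acf_values[k - 1] / std_error


r = [0.93, 0.82]          # rounded coefficients quoted in the text
t1 = acf_t_statistic(r, 1, 276)
t2 = acf_t_statistic(r, 2, 276)
print(round(t1, 2), round(t2, 2))
```

The results are close to the reported 15,49 and 8,23; the small gaps are what one would expect from using coefficients rounded to two decimals.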

To decide whether our data is statistically seasonal, we look at lag 12, because our data
has a monthly frequency. The lag 12 autocorrelation coefficient (0,07) is not
statistically significant at a 5% level (T = 0,41 < T (276;0,025) ≈ 1,97; see Figure 5).
Therefore, there is not strong enough evidence to say that there is seasonality in our
dataset.
MOVING AVERAGE AND EXPONENTIAL SMOOTHING METHODS
Our analysis reveals that the airline's passenger counts show a clear increasing trend
over time, indicating the airline's growing popularity and customer base. This trend is visible
in the dataset's graph, which shows a clear upward slope.

The moving average and exponential smoothing methods are used to forecast future
passenger counts. To smooth out fluctuations in the data, the moving average method takes
the average of a specified number of previous observations. While this method can detect
trends in data, it may not be suitable for more complex patterns such as seasonality or
irregular fluctuations. Alternatively, the exponential smoothing method gives more weight to
recent observations while keeping the data trend in mind. Given the obvious upward trend in
passenger numbers, this method may be more appropriate. However, experimenting with
different parameter settings is critical in order to find the best forecasting model for the
dataset.

Overall, the exponential smoothing method, which considers both the trend in the data
and recent observations, can be used to forecast future passenger counts for the airline. This
approach can assist the airline in better planning its business operations and resources to meet
rising demand.

Naïve Method
The Naive Method is a very simple forecasting technique that involves using the
actual value from the previous period as the forecast for the current period. While it is a good
starting point for any forecasting project, it is generally not the best method for making
accurate predictions, particularly for longer time horizons.

Regarding the values we obtained, a Mean Absolute Deviation (MAD) of 1320.93 indicates
that, on average, the forecasts are off by about 1320.93 enplanements. The Mean Squared
Error (MSE) of 14852078.14 is large because squaring penalises big misses heavily, which
points to a few very large forecast errors, and a Mean Absolute Percentage Error (MAPE) of
6.85% indicates that the average percentage error in the forecasts is about 6.85%.
Naïve Model     MAD        MSE            MAPE
Trend Naive     1320,93    14852078,14    6,85%

Table 2: Trend Naive Method Errors
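
To make the three error measures concrete, the following sketch computes MAD, MSE and MAPE for the simple naive forecast described in the text (last period's actual value used as this period's forecast). It is an illustration only: the toy numbers and function names are hypothetical, and the table above is labelled "Trend Naive", which may correspond to a variant that also carries forward the last change.

```python
def naive_forecast(series):
    """One-step-ahead naive forecasts for periods 2..n (previous actual value)."""
    return series[:-1]


def error_metrics(actual, forecast):
    """Return (MAD, MSE, MAPE in percent) for paired actuals and forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return mad, mse, mape


toy = [100, 110, 105, 120]            # hypothetical values
forecast = naive_forecast(toy)        # forecasts for periods 2, 3, 4
mad, mse, mape = error_metrics(toy[1:], forecast)
print(mad, round(mse, 2), round(mape, 2))
```

Because MSE squares each error, a single large miss raises it far more than it raises MAD, which is why the two measures can give different impressions of the same forecasts.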

Based on these results, it seems that the Naive Method may not be the best forecasting
method for our dataset, as the errors are relatively high. However, it's important to note that
the appropriateness of a forecasting method depends on a variety of factors, including the
nature of the data, the forecasting horizon, and the specific goals of the analysis.

It may be worth exploring other forecasting methods, such as moving averages or
exponential smoothing, to see if they can produce more accurate forecasts for our dataset.
Additionally, we may want to consider incorporating other independent variables, such as
GDP or oil prices, into our analysis to improve the accuracy of our forecasts.

Moving Average Method


The moving average model cannot handle trend or seasonality effectively, but it may
perform better than the overall mean. Since the Pegasus data shows an increasing trend
with cyclical peaks and troughs, a double exponential moving average method would be
preferable, because it tracks the trend better. A longer moving average period is often
more appropriate for monthly data that shows a trend. The length of the moving average
period is determined by the features of the data as well as the goal of the analysis.

Which parameter produces the smallest error depends on the error metric being used.
Looking at the table, we can see that, regardless of the error metric, a bigger value of
k generally corresponds to a larger error. This implies that a lower k value may be more
appropriate for this data.

To explain why one parameter has the smallest error, we look at the MAPE value to simplify
things. In Table 3, we can see that the 2-period parameter has the lowest MAPE value
because it tracks the data most closely. As we move on to higher parameters, the MAPE also
increases because the average includes more old data relative to recent data. This makes
the moving average track the trend more slowly, so the gap between the actual and
forecasted values increases.

Parameter (k)    MAD        MSE            MAPE
2                1563,50    23879110,65    10,23
4                2056,25    38780520,77    13,85
6                2522,14    51126164,43    15,90
12               3705,47    85088621,98    19,53
Table 3: Moving Average Errors
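
The lag effect described above can be seen in a small sketch. It assumes the usual definition of a k-period moving average forecast (the mean of the k most recent observations); the function name and the toy series are hypothetical.

```python
def moving_average_forecast(series, k):
    """Forecast for each period t >= k is the mean of the k observations before t."""
    return [sum(series[t - k:t]) / k for t in range(k, len(series))]


# Hypothetical series rising by 10 each period.
toy = [10, 20, 30, 40, 50]
ma2 = moving_average_forecast(toy, 2)   # forecasts for periods 3, 4, 5
ma4 = moving_average_forecast(toy, 4)   # forecast for period 5
print(ma2, ma4)
```

On this steadily rising series the 2-period forecasts trail the actual values by 15, while the 4-period forecast trails by 25, so a larger k lags further behind a trend.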

Single Exponential Smoothing


Based on the values of the mean absolute deviation (MAD), mean squared error
(MSE), and mean absolute percentage error (MAPE) for the different values of the smoothing
parameter, it appears that the single exponential smoothing method with a smoothing
parameter of 0.9 is the most appropriate for our dataset.
This is because it has the lowest MAD, MSE, and MAPE values among the different
smoothing parameter values tested, indicating that it produces the most accurate forecasts.
However, further analysis and comparison with other forecasting methods may be necessary
to confirm this.

Parameter (α)    MAD     MSE         MAPE
0.1              3836    73816966    19
0.2              2613    49489560    15
0.3              2095    36955686    13
0.6              1507    21776008    10
0.9              1328    15980615    7
Table 4: Single Exponential Smoothing Errors

The parameter that yields the smallest error is 0.9, with a MAD value of 1328, an
MSE value of 15980615, and an MAPE value of 7. This indicates that the best fit for the data
was achieved with a smoothing parameter of 0.9. This is likely because the larger smoothing
parameter places more weight on recent observations, which may be more indicative of future
trends in the data.
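
The effect of the smoothing parameter can be illustrated with a minimal sketch of single exponential smoothing, where each new forecast is α times the latest observation plus (1 − α) times the previous forecast. The initialisation with the first observation is an assumption (statistical packages may initialise differently), and the function name and toy data are hypothetical.

```python
def single_exponential_smoothing(series, alpha):
    """One-step-ahead forecasts; forecasts[t] is the prediction for series[t]."""
    forecasts = [series[0]]  # assumption: start from the first observation
    for y in series[:-1]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


toy = [100, 110, 120, 130]   # hypothetical rising series
fast = single_exponential_smoothing(toy, 0.9)
slow = single_exponential_smoothing(toy, 0.1)
print([round(f, 1) for f in fast])
print([round(f, 1) for f in slow])
```

With α = 0.9 the last forecast is much closer to the actual value of 130 than with α = 0.1, which mirrors why the larger parameter gives smaller errors on trending data.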
Holt’s Method
For time series data with a trend component, double exponential smoothing, often known as
Holt's method, is appropriate. Double exponential smoothing can be used to predict future
values of a time series with a linear or exponential trend. This method is especially
beneficial when the data contains a trend component that detrending tools cannot simply
eliminate. Since the Pegasus time series has a trend component, double exponential
smoothing (Holt's Method) is appropriate.

Using the double exponential smoothing approach (Holt's method) with the given data, the
combination of α=0.9 and β=0.9 yields the lowest MAPE value of 5,50. This implies that
this set of parameters provides the best fit to the data.

This particular set of parameters produces the lowest MAPE number because it allows
the model to respond aggressively to both the level and trend in the data. The parameter α
governs the weight provided to the most recent observation, whereas the β parameter governs
the weight given to the trend's slope. By increasing both α and β, the model is able to respond
to changes in the data more quickly, resulting in a more accurate forecast.

MAPE       β = 0.1    β = 0.2    β = 0.3    β = 0.6    β = 0.9
α = 0.1    19,28      19,36      19,72      20,07      19,64
α = 0.2    15,78      16,17      16,14      15,21      14,83
α = 0.3    13,69      13,84      13,54      12,98      13,02
α = 0.6    9,80       9,52       9,29       9,47       9,12
α = 0.9    7,67       7,68       7,53       6,67       5,50
Table 5: Holt's Method Errors
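
The roles of α and β can be sketched as follows: the level is updated with weight α on the newest observation, and the trend is updated with weight β on the newest change in level. This is a minimal sketch of the standard Holt recursions; the initial level and trend (first observation and first difference) are assumptions, and the function name and toy series are hypothetical.

```python
def holt_forecasts(series, alpha, beta):
    """Return (one-step-ahead forecasts for periods 2..n, final level, final trend)."""
    # Assumed initial values: first observation and first difference.
    level, trend = series[0], series[1] - series[0]
    forecasts = []
    for y in series[1:]:
        forecasts.append(level + trend)          # forecast made before seeing y
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return forecasts, level, trend


toy = [10, 12, 14, 16, 18]   # hypothetical series with a constant slope of 2
fitted, level, trend = holt_forecasts(toy, alpha=0.9, beta=0.9)
next_forecast = level + trend
print([round(f, 2) for f in fitted], round(next_forecast, 2))
```

On a perfectly linear series the method reproduces every value and projects the next one, which is the behaviour that makes it suitable for trending data.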

Winter’s Method

Holt (1957) and Winters (1960) extended Holt’s method to capture seasonality. The
Holt-Winters seasonal method comprises the forecast equation and three smoothing
equations: one for the level ℓt, one for the trend bt, and one for the seasonal component
st, with corresponding smoothing parameters α, β and γ (Athanasopoulos, 2018). When we
enter the α, β and γ values in Minitab, the resulting MAPE values are shown below.
MAPE                  α = 0.1    α = 0.2    α = 0.3    α = 0.6
γ = 0.1    β = 0.1    19,18      15,81      13,79      10,04
γ = 0.1    β = 0.2    19,22      16,24      14,06      9,79
γ = 0.1    β = 0.3    19,88      16,54      14,06      9,67
γ = 0.2    β = 0.1    18,99      15,81      13,94      10,13
γ = 0.2    β = 0.2    19,12      16,50      14,41      9,98
γ = 0.2    β = 0.3    20,12      17,37      14,81      9,97
Table 6: Winter's Method Errors

Comparing the mean absolute percentage errors (MAPEs), the smoothing constant α = 0.6
performs best. The lowest error is obtained with:

α = 0.6 β = 0.3 γ = 0.1 MAPE = 9,67%

When the other parameters are kept constant, the error decreases as α increases.
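
For readers who want to see the three smoothing equations in action, the sketch below implements the additive Holt-Winters recursions in the form given in the cited textbook. It is not a reproduction of the Minitab procedure, whose formulas and initial values may differ. The initial values (mean of the first season, zero trend, first-season deviations), the function name and the toy series are all assumptions.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecasts for the next `horizon` periods (season length m)."""
    # Assumed initial values: level = mean of the first season, zero trend,
    # seasonal terms = first-season deviations from that mean.
    level = sum(series[:m]) / m
    trend = 0.0
    seasonal = [y - level for y in series[:m]]
    for t in range(m, len(series)):
        y = series[t]
        s_prev = seasonal[t - m]
        new_level = alpha * (y - s_prev) + (1 - alpha) * (level + trend)
        new_trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal.append(gamma * (y - level - trend) + (1 - gamma) * s_prev)
        level, trend = new_level, new_trend
    n = len(series)
    # Reuse the most recent estimate of each seasonal position.
    return [level + h * trend + seasonal[n - m + (h - 1) % m]
            for h in range(1, horizon + 1)]


# Hypothetical series with a two-period seasonal pattern and no trend.
toy = [10, 20, 10, 20, 10, 20, 10, 20]
forecasts = holt_winters_additive(toy, m=2, alpha=0.6, beta=0.3, gamma=0.1, horizon=3)
print([round(f, 2) for f in forecasts])
```

For monthly data such as the enplanement series, the season length would be m = 12.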

Diagnostics and Model Selection


When comparing models, it is important to consider the specific needs and
requirements. Performance measures such as MAD, MSE, and MAPE provide different
perspectives on the accuracy of the models. The model with the smallest value for each
measure would be preferred, as it indicates lower forecast errors. The goodness of fit to the
data is crucial in determining how well the models capture the underlying patterns and trends
in the historical data. It is important to choose a model that adequately represents the data
without overfitting. Overfitting occurs when a model captures noise or random fluctuations in
the data, leading to poor performance on new data. A balance between model complexity and
goodness of fit should be considered, aiming for a model that strikes the right balance
between simplicity and accuracy.
Holt’s Method can capture trends in the data and provides better predictions than the
Naive and Moving Average methods. As a result, Holt's Model performs best: its error
values are lower than those of the other models, and its residuals are closer to
independent and normally distributed. For this reason, the Holt Model can be regarded as
the model that best matches the data and produces the best forecasts among those tested.
Holt's model is helpful when there is a trend in the data that is expected to persist in
the future, because it considers both level and trend components, unlike simple methods
such as the Naïve Model or Moving Average.

DECOMPOSITION
Both the additive and multiplicative models have the same MAPE value of 22 and
MAD value of 5696 based on the accuracy measures provided. The MSD value for the
multiplicative model, on the other hand, is slightly higher than that for the additive model,
implying that the multiplicative model may have a slightly higher error.

We can compare their graphs to see which one is better suited to the data. The
additive model assumes that the magnitude of seasonal fluctuations remains constant over
time, whereas the multiplicative model assumes that the magnitude of seasonal fluctuations
increases or decreases in accordance with the overall trend. As a result, we can compare the
seasonal fluctuations in the two models to see which one best fits the data.

If the seasonal fluctuations in the data appear to have a constant magnitude, the
additive model may be more appropriate. If the seasonal fluctuations appear to be
proportional to the overall trend, the multiplicative model may be more appropriate. In
our case, as we see in Figure 6, there is no significant difference between the two
methods. However, as stated before, the MSD value is slightly lower for the additive
method than for the multiplicative method. Therefore, we continue our analysis with the
additive method.
Figure 6: Comparison of Additive and Multiplicative Methods
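
The difference between the two models can be illustrated with a simplified sketch: in the additive model the seasonal effect is the average of (value − trend) for each season, while in the multiplicative model it is the average of (value ÷ trend). Classical decomposition as implemented in statistical software typically uses centred moving averages and further adjustments, so this is an illustration under simplifying assumptions; the function name and the toy numbers are hypothetical.

```python
def seasonal_indices(series, trend, m, multiplicative=False):
    """Average detrended value for each season position 0..m-1."""
    buckets = [[] for _ in range(m)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        # Additive: subtract the trend. Multiplicative: divide by the trend.
        buckets[t % m].append(y / tr if multiplicative else y - tr)
    return [sum(b) / len(b) for b in buckets]


trend = [100, 100, 200, 200]          # hypothetical trend values
additive_data = [90, 110, 190, 210]   # swings of a constant size (10)
multiplicative_data = [90, 110, 180, 220]  # swings of a constant share (10%)

add_idx = seasonal_indices(additive_data, trend, m=2)
mult_idx = seasonal_indices(multiplicative_data, trend, m=2, multiplicative=True)
print(add_idx, mult_idx)
```

In the first toy series the seasonal swing stays at 10 units as the trend doubles, which suits the additive model; in the second it stays at 10% of the trend, which suits the multiplicative model.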

Figure 7: Fitted Line Plot


On this fitted line plot, the points generally follow the regression line and cover the
entire observed time range. However, the points from 01 March 2020 to 01 June 2021 appear
to be outliers, because the pandemic affected flights all around the world. Other than
that, the model properly fits any curvature in the data.
R² is the percentage of variation in the response that is explained by the model. The
higher the R² value, the better the model fits the data. In our calculation, R² is 11,8%.
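
As a reminder of how this statistic is obtained, R² equals one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses hypothetical toy numbers and a hypothetical function name; it does not reproduce the 11,8% figure, which requires the full dataset.

```python
def r_squared(actual, fitted):
    """Share of the variation in `actual` explained by `fitted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))   # unexplained
    ss_tot = sum((a - mean) ** 2 for a in actual)                # total
    return 1 - ss_res / ss_tot


actual = [1, 2, 3, 4]             # hypothetical observations
fitted = [1.5, 1.5, 3.5, 3.5]     # hypothetical fitted values
r2 = r_squared(actual, fitted)
print(round(r2, 3))
```

A value of 0.8 here means that 80% of the variation is explained; a value as low as 11,8% means that most of the variation lies outside the fitted line.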

Figure 8: Time Series Decomposition Plot


Figure 9: Component Analysis

Seasonally adjusted data are useful for identifying both short- and
long-term trends in flight patterns. Year-over-year fluctuations in unadjusted data have
typically been used to illustrate short- and long-term trends in the aviation sector. By
comparing the same month (May to May), these comparisons avoid the impact of seasonal
changes, but they are flawed for two reasons. First, due to calendar influences, the months
may differ (one may have Easter while the other does not). Second, there may be variance
within the months; there can be a general increase but some internal drop.
Figure 10: Seasonal Indices and Residuals by Season of Number of Enplanements

The difference between the prediction and the observed value is the residual. If we
plot the observed values and overlay the fitted regression line, the residuals for each
observation would be the vertical distance between the observation and the regression line.

An observation has a positive residual if its value is greater than the predicted value
made by the regression line. In our dataset, residuals by season values are positive, but not
significant.

The seasonally adjusted data plot depicts the data without the seasonal component. It
appears to have a clear increasing trend from 2017 to mid-2019, then a decrease until
mid-2020, and then another increase until the series ends in early 2021. With some minor seasonal
variations, the values appear to fluctuate around this trend. The seasonally adjusted and
detrended data graph depicts the data after both the seasonal and trend components have been
removed. The seasonally adjusted and detrended data is relatively flat, with no discernible
trend or pattern. The data shows some fluctuations, but they do not follow a consistent
pattern.
Figure 11: Component Analysis for the Number of Enplanements

The residuals' normal probability plot and histogram both show a roughly normal
distribution, which is a desirable feature of residuals in a regression analysis. This indicates
that the regression model fits the data well and that the residuals do not exhibit any
significant patterns or deviations from normality.

A random scatter of points is visible in the residuals versus fits plot, which is another
desirable feature of a well-fitting regression model. There appears to be no discernible pattern
in the residuals, implying that the model captures the majority of the variation in the data.
However, there are a few outliers with magnitudes far greater than the majority of the
residuals. One possible real-life cause of these outliers is an unexpected event, the
Covid-19 health crisis, which significantly impacted air travel in Turkey.
Figure 12: Residuals Plots for the Number of Enplanements

Forecasting with Decomposition


Multiplicative Model:

Fitted Trend Equation: Yt = 56028 + 35,88×t

Forecasts
Period    Forecast
277       65952,1
278       65906,4
279       65928,5
280       66304,6

Additive Model:

Fitted Trend Equation: Yt = 56028 + 35,87×t

Forecasts
Period    Forecast
277       65949,7
278       65908,5
279       65929,7
280       66279,9
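
In the additive model, each forecast is the trend value plus a seasonal component. As a worked check, the sketch below evaluates the additive trend equation Yt = 56028 + 35,87×t at periods 277 to 280 and subtracts it from the reported forecasts to back out the seasonal component implied for each period. Because the slope is shown rounded to two decimals, the implied values are approximate; the function and variable names are hypothetical.

```python
def trend_value(t, intercept=56028.0, slope=35.87):
    """Trend component from the reported additive trend equation."""
    return intercept + slope * t


# Additive-model forecasts as reported above.
reported = {277: 65949.7, 278: 65908.5, 279: 65929.7, 280: 66279.9}

# Additive model: forecast = trend + seasonal, so seasonal = forecast - trend.
implied_seasonal = {t: round(f - trend_value(t), 1) for t, f in reported.items()}

for t in sorted(reported):
    print(t, round(trend_value(t), 2), implied_seasonal[t])
```

The trend alone gives about 65964 for period 277, so the reported forecast of 65949,7 implies a small negative seasonal adjustment for that month.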

REGRESSION ANALYSIS

Model Specification
To determine the independent variables that may influence the dependent variable, we
need to analyse the dataset and consider the context of the problem. In this case, the
dependent variable is "the number of enplanements" which represents the enplanements for
U.S. air carrier domestic and international scheduled passenger flights. The following
independent factors could potentially have an impact on the number of enplanements:

1. Time Index: observation date (time variable, continuous).
2. Economic Indicators: Variables related to the state of the economy, such as GDP
growth rate, consumer spending, or unemployment rate (continuous variables).
3. Fuel Prices: Variables representing the price of fuel (continuous variable).
4. Airline-Specific Factors: Dummy variables representing specific events or actions
related to individual airlines, such as marketing campaigns, route expansion or
reduction, mergers, or pricing strategies (binary dummy variables).
5. Travel Restrictions or Regulations: Dummy variables indicating the presence or
absence of significant travel restrictions, visa policies, or security measures (binary
dummy variables).
6. Events and Holidays: Dummy variables indicating the presence or absence of major
events, holidays, or conferences (seasonal dummy variables).
7. Terrorist Attacks: Terrorist attacks as September 11 attacks can be considered dummy
according to our dataset. (Dummy variables)
8. Covid-19 pandemic: Dummy variables indicating the presence or absence of Covid-
19. (Dummy variables)

Economic indicators such as GDP growth rate, consumer spending, or
unemployment rate can influence air travel demand. A positive relationship is generally
expected, meaning that as the economy improves, there may be an increase in enplanements.
Fuel prices can impact airline operating costs and airfares, which can influence passenger
demand. A negative relationship is typically expected, as higher fuel prices could lead to
higher airfares, potentially reducing air travel demand.

The relationship between airline-specific factors and enplanements can vary
depending on the specific context. For example, positive airline-specific factors such as
successful marketing campaigns, route expansions, or competitive pricing strategies may lead
to increased enplanements. Conversely, negative factors like service disruptions or negative
publicity could have a negative relationship. Stricter travel restrictions or
regulations are expected to have a negative relationship, as they can limit travel
opportunities and reduce air travel demand. Major events, holidays, or conferences can have a
positive relationship with enplanements, as they can attract more passengers travelling for
leisure, business, or specific purposes during those periods. Finally, the dummy variables for
terrorist attacks and the Covid-19 pandemic are expected to have a negative relationship with
enplanements.

The correlation coefficient computed between "Number of Enplanements" and "US Gasoline
Prices" is 0.373. This positive correlation implies that when "US Gasoline Prices" rise, the
"Number of Enplanements" tend to climb as well. However, the correlation value of 0.373
indicates that the association between these two variables is relatively weak. The correlation
coefficient between "Number of Enplanements" and "US Consumption Expenditures" is also
calculated as 0.318. This positive correlation implies that the "Number of Enplanements" has
a minor tendency to increase as "US Consumption Expenditures" rise.
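
For readers who want to reproduce this kind of figure, the following is a minimal sketch of the Pearson correlation coefficient. The two short series are hypothetical toy values, not the project's data, and the function name `pearson_r` is our own. The last line squares the reported coefficient of 0.373 to show how small the shared variance is.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values for illustration only (NOT the project's dataset).
gas_price = [2.1, 2.4, 2.2, 3.0, 2.8, 3.5]
enplanements = [60.0, 58.0, 63.0, 61.0, 66.0, 64.0]

r = pearson_r(gas_price, enplanements)
print(round(r, 3))

# Squaring the coefficient reported in the text gives the share of variance
# that a simple linear relationship would account for.
shared_variance = 0.373 ** 2
print(round(shared_variance, 3))
```

With r = 0.373, only about 14% of the variation in one series is linearly associated with the other, which supports reading the relationship as weak.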

Based on the reasoning above, we would expect “Number of Enplanements” and “US Gasoline
Prices” to have at least some negative correlation, because higher gasoline prices push
ticket prices up. However, periods of rising gasoline prices largely coincide with
inflationary monetary policies, during which US citizens spend more money overall. That
might be the reason we do not see a negative correlation between these two
variables.

Figure 13: Correlations between the Number of Enplanements and U.S. Gasoline Prices

Model Building
Overall Model
To investigate the link between the response variable and the predictor factors, a
regression analysis was performed. Each predictor variable's coefficients were estimated,
along with their standard errors, t-values, and p-values.

With a coefficient of 44,462, the constant term, which represents the expected value
of the response variable when all predictor variables are zero, was found to be statistically
significant (p < 0.05). The US Gasoline Prices coefficient was estimated to be 3,904 with a
standard error of 946. It was also statistically significant (p < 0.05), indicating that a
one-unit increase in US Gasoline Prices is associated with a 3,904-unit increase in the
response variable when all other factors are held constant.

Similarly, the US Personal Consumption Expenditure coefficient was estimated to be
0.565 with a standard error of 0.264. It was also statistically significant (p < 0.05),
implying that a one-unit rise in US Personal Consumption Expenditure is associated with a
0.565-unit increase in the response variable when all other variables are held constant.
Figure 14: The Coefficients; Constant, U.S. Gasoline Prices and U.S. Personal Consumption Expenditures
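
The significance statements above can be checked by dividing each reported coefficient by its standard error. The sketch below does this, reading 3,904 as 3904 (comma as thousands separator). The critical value of about 1.97 is our assumption for a two-sided 5% test with roughly 273 degrees of freedom (276 observations minus three estimated coefficients). It is not taken from the report's output.

```python
# Coefficients and standard errors as reported in the text.
estimates = {
    "US Gasoline Prices": (3904.0, 946.0),
    "US Personal Consumption Expenditure": (0.565, 0.264),
}

# Assumption: approximate two-sided 5% critical t-value for about 273 df.
CRITICAL_T = 1.97


def t_value(coefficient, standard_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


t_values = {name: t_value(c, se) for name, (c, se) in estimates.items()}

for name, t in t_values.items():
    print(name, round(t, 2), "significant" if abs(t) > CRITICAL_T else "not significant")
```

Both ratios exceed the assumed critical value, although the consumption expenditure coefficient does so by a much smaller margin than the gasoline price coefficient.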

The variance inflation factor (VIF) was used to test for multicollinearity. Both
predictor variables have VIF values of 1.55, showing low multicollinearity. Based on the
coefficients, statistical significance, and low multicollinearity, it is suggested that the final
model be constructed by including the significant predictor variable (US Personal
Consumption Expenditure) and excluding any variables that are not statistically significant or
have severe multicollinearity.
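
With only two predictors, the VIF is tied directly to the correlation between them through VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the other. The sketch below inverts that relation for the reported VIF of 1.55. The implied correlation is a derived figure under this two-predictor assumption and does not appear in the report's output.

```python
import math


def vif_from_r2(r_squared):
    """Variance inflation factor for a predictor, given the R-squared from
    regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared)


def r2_from_vif(vif):
    """Invert the VIF formula to recover the underlying R-squared."""
    return 1.0 - 1.0 / vif


REPORTED_VIF = 1.55  # value stated in the text for both predictors

implied_r2 = r2_from_vif(REPORTED_VIF)
# With exactly two predictors, this R-squared is the squared correlation
# between them, so its square root is the absolute correlation.
implied_abs_r = math.sqrt(implied_r2)

print(round(implied_r2, 3), round(implied_abs_r, 3))
```

An implied correlation of roughly 0.6 between the two predictors is moderate, which is in line with the low-multicollinearity reading of a VIF well below common warning thresholds such as 5 or 10.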

The Model Summary reported an R-squared value of 15.36%, suggesting that the
predictor variables in the model explain roughly 15.36% of the variability in the response
variable.

Figure 15: Model Summary

To obtain the final model, we remove US Gasoline Prices, whose positive correlation with
enplanements does not fit our assumption. We keep only the US Consumption
Expenditures variable in the final model. The new Coefficients, Model
Summary and Analysis of Variance tables for the final model are given below.
Figure 16: Coefficients

Figure 17: Model Summary

Figure 18: Analysis of Variance

Figure 19: Residual Plots for Number of Enplanements


Reduced Model
In this analysis, we aim to develop a model that gives
insight into the relationship between US consumption expenditures and the number of
enplanements. According to our analysis, US consumption expenditures emerged
as the most influential factor affecting the number of enplanements. To assess the relationship
between enplanements and this factor, we performed an analysis of variance (ANOVA),
which allowed us to evaluate the significance of the regression model. The results showed a
significant relationship between enplanements and the selected variable, as evidenced by a low
P-value and a large F-value.

Figure 20: Durbin-Watson Statistic

We also examined the diagnostics and residuals of the reduced model for unusual
observations. We identified several observations with large residuals indicating potential
outliers or influential data points that may require further investigation. In addition, the
Durbin-Watson statistic reveals the presence of positive autocorrelation in residuals,
suggesting that the independence assumption is violated.
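
The Durbin-Watson statistic mentioned above can be computed directly from the residuals. The sketch below uses two short, hypothetical residual series (not the project's residuals) to show how the statistic behaves: values near 2 indicate no first-order autocorrelation, values well below 2 indicate positive autocorrelation, and values well above 2 indicate negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals. Ranges from 0 to 4."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals for illustration only.
persistent = [1.0, 1.2, 0.9, 1.1, -0.8, -1.0, -1.1, -0.9]  # long runs of one sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]            # sign flips every step

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(round(dw_persistent, 3), round(dw_alternating, 3))
```

Residuals that stay on the same side of zero for many consecutive periods, as tends to happen around a shock such as the pandemic months, pull the statistic towards 0.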
Also, the histogram of the dataset is given in Graph 2. The histogram looks skewed and has a
non-normal shape, which may indicate a violation of the normality assumption.
Figure 21: Fits and Diagnostics

These observations include entries with extreme differences between the actual
number of enplanements and the predicted values. Outliers and unusual observations can
arise due to various factors such as data entry errors, exceptional events, or unaccounted-for
factors that influence the dependent variable. Observations 243, 244, 245, 246, and 247 have
significantly lower actual enplanement numbers than the predicted values, indicating
a potential anomaly in these instances. Conversely, observations 270, 271, 272, 273, 274,
275, and 276 have actual enplanements higher than predicted, suggesting an underestimation
by the model.

Forecasting with Regression


We used the double exponential smoothing method to forecast the independent variable,
US consumption expenditures. The next 12 forecasted values are given in the table
below:
Figure 22: Forecasting with Regression

Substituting these forecasted values of US consumption expenditures into our final
regression model gives the following forecasts:

Figure 23: Regression Equation

Table 7: Regression Forecasts


Figure 24: Residuals Plots for U.S. Consumption Expenditures

MODEL SELECTION and FORECASTING


To determine the best of the moving average and exponential smoothing methods, the
accuracy measures considered are MAD, MAPE, and MSE, respectively. In addition, the parameters
α, β, and γ are also checked. As we mentioned before, our dataset is not stationary and there
is no seasonality in our dataset. Therefore, the most suitable model for enplanements
among the moving average and exponential smoothing methods is Holt's method.
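
The three accuracy measures named above can be sketched as follows. The actual and fitted values are hypothetical and serve only to show the calculations. The mean squared measure is labelled MSD here, following the name used in the decomposition comparison later in this section.

```python
def accuracy_measures(actual, fitted):
    """Return MAD, MAPE (in percent) and MSD for paired actual/fitted values."""
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n                                   # mean absolute deviation
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n  # mean absolute % error
    msd = sum(e ** 2 for e in errors) / n                                   # mean squared deviation
    return {"MAD": mad, "MAPE": mape, "MSD": msd}


# Hypothetical values for illustration only (NOT the project's data).
actual = [100.0, 110.0, 120.0, 130.0]
fitted = [90.0, 115.0, 120.0, 143.0]

measures = accuracy_measures(actual, fitted)
print({name: round(value, 3) for name, value in measures.items()})
```

MAPE is scale-free, which makes it convenient for comparing methods on the same series, while MSD penalises large errors such as the pandemic-period residuals much more heavily than MAD does.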

Using Holt's method with the enplanements data, the combination of α=0.9 and β=0.9
yields the lowest MAPE (5.50%) among the methods compared in that section. This
implies that this set of parameters provides the best fit for the enplanements. Mean Absolute
Percentage Error has been the decisive measure in finding the most suitable model in this
section.
Figure 25: Accuracy Measures for Holt's Method
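
The following is a minimal sketch of Holt's method with the smoothing constants stated above (α = 0.9, β = 0.9). It is not the software routine used in the project: the initial level and trend are set from the first two observations, which is one common convention and an assumption here, and the toy series is hypothetical.

```python
def holt(series, alpha, beta, horizon):
    """Holt's linear (double) exponential smoothing.

    Returns the one-step-ahead fits for series[1:] and the forecasts for the
    next `horizon` periods after the end of the series.
    """
    # Assumption: initialise level and trend from the first two observations.
    level = series[0]
    trend = series[1] - series[0]
    one_step = []
    for y in series[1:]:
        one_step.append(level + trend)  # forecast made before observing y
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    forecasts = [level + h * trend for h in range(1, horizon + 1)]
    return one_step, forecasts


# Hypothetical toy series for illustration only (NOT the enplanements data).
toy_series = [100.0, 104.0, 103.0, 109.0, 112.0, 111.0]
one_step, forecasts = holt(toy_series, alpha=0.9, beta=0.9, horizon=4)
print([round(f, 2) for f in forecasts])
```

With both constants at 0.9, the level and trend follow the most recent observations very closely, so the forecasts react strongly to the last few data points. This is worth keeping in mind when the end of the series contains unusual values.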

Figure 26: Residual Plots for Enplanements of Holt's Method

The points in the normal probability plot in Figure 2 do not lie close to a straight
line. Therefore, the residuals of Holt’s method do not follow a normal
distribution well. The histogram is centred at zero and is symmetrical (not skewed). In the
versus fits plot, most residuals lie on the zero line; however, there are some outliers.
The error variability may not be constant because of the Covid-19 pandemic.
Also, in the versus order plot, some residuals take very high values around the 240th month.
Observation 240 coincides with the Covid-19 pandemic quarantines, in May 2020.

When choosing the most suitable decomposition method, we looked at the effect
of seasonal fluctuations on our enplanement dataset in addition to the accuracy measures used
for the Moving Average methods. As we mentioned before, the additive and multiplicative
methods gave almost the same results. However, the MSD value is slightly lower for
the additive method than for the multiplicative method, so we chose the additive method in
this part.

Figure 27: Residual Plots for Additive Decomposition Model

As we mentioned at the end of the decomposition part, the residuals' normal
probability plot and histogram both show an almost normal distribution, which is a desirable
feature of residuals in a regression analysis. This indicates that the regression model fits the
data well and that the residuals do not exhibit any significant patterns or deviations from
normality.

The residuals versus fits plot exhibits a random scatter of dots, which is another
desirable quality of a well-fitting regression model. The absence of any obvious trend in the
residuals suggests that the model adequately accounts for most of the variation in the data.
There are a few outliers, nevertheless, whose magnitudes are significantly larger than most of
the residuals. One potential real-life occurrence that might have contributed to these outliers
is the Covid-19 health issue, which had a major influence on Turkish air travel.

In the regression part, in addition to the measures we looked at in the other parts, we
added independent variables. We first explored how different variables could potentially have
an impact on enplanements. Then we computed the correlation values and reduced the candidates
to a single variable. As mentioned in the overall model of the regression analysis, the final
regression model has only one variable, because we removed US Gasoline
Prices, whose correlation does not fit our assumption. Therefore, we conducted our
regression analysis with the US Consumption Expenditures variable for our final model.

Figure 28: Residual Plots for U.S. Consumption Expenditures

The figure shows the residual plots of the regression model for consumption expenditures.
Overall, the plots indicate a better fit than the decomposition models. In the normal
probability plot, the residuals are concentrated at zero, and only four of the 276
observations deviate from 0. The histogram is centred at zero and is symmetrical, like that of
Holt’s method. The versus fits and versus order plots resemble those of Holt’s method, but
this time the residuals are spread horizontally along the zero axis.
Figure 29: Residual Plots for Number of Enplanements

The regression analysis residuals are nearly indistinguishable from the decomposition
analysis residuals. The only visible difference is in the versus fits plot. A random scatter of
points is also visible in the residuals versus fits plot, which is desirable. As in the
decomposition model, there appears to be no discernible pattern in the residuals, implying that
the model captures most of the variation in the data.

However, in this model, there are fewer outliers than in the decomposition
residuals. Also, the vertical spread of the residuals about zero is more even
along the horizontal axis than in the decomposition model. Therefore, we can
say that this model fits better than the decomposition one in terms of residuals.

Best Model Forecasting

So far, we have reviewed the statistical data and residual plots for the top 3 models.
What we need to consider in determining the superior forecast model are internal and external
factors for the aviation industry. In businesses, SWOT analysis is very useful in identifying
internal and external opportunities and threats.
Pegasus is one of the pioneering brand names in Turkey and the leading low-cost
airline carrier in the sector. Pegasus Airlines' board of directors and financial structure are
also considered to determine the superior model.

The adverse impact of the Covid-19 pandemic led to bottom-line losses and
eroded the equity level. The global recession and geopolitical risks stemming from the
Russia-Ukraine tension have increased uncertainty for both Pegasus and its competitors. We
mentioned some independent variables in the regression analysis part. Some of these variables
have a huge effect on the airline sector and the number of enplanements.

Therefore, considering the airline companies' desire to reduce uncertainty and foreign
dependency, we chose Holt's Model, the model that depends least on uncertain
external factors.

Estimates based on Holt's Model for the next four periods of our dataset are set out below.

Figure 30: Forecasting Periods 277 to 280


Figure 31: Residual Plots for Number of Enplanements of Holt's Method

Figure 32: Overall Plot of Enplanements including Forecasted Data


MANAGERIAL IMPLICATIONS

Table 8: Forecasts for Pegasus:

According to Figure 7, the expected number of enplanements for period 277 is
around 14,454, with lower and upper limits around that value. This suggests a moderate level of
enplanements during this period. For the next period, the forecast decreases slightly
to approximately 14,265, and the forecasts for the remaining two periods continue to
decrease slightly. Overall, Pegasus's forecast model indicates a general decline in
enplanements over the next four time periods.
However, it is important to remember that forecasts are never exact, and external factors
can affect actual enplanement numbers.
Over the next four time periods, a decrease in the projected number of enplanements
is expected. This implies that Pegasus might see a drop in passenger volume at this time. Less
traffic is expected, which could influence the airline's profitability. Pegasus needs to plan for
a probable drop in demand for flights. To minimize the effect on their commercial operations,
this might necessitate modifying flight itineraries, managing resources, and putting in place
cost-saving measures. To reduce the effects of decreasing enplanements, Pegasus may also
look at steps to draw in and keep consumers, such as launching marketing campaigns, giving
affordable prices, and offering top-notch customer service. Pegasus must keep a close eye on
market conditions and be aware of anything that can have an impact on travel behaviour, such
as alterations in consumer preferences, travel restrictions, or changes in the economy. This
will enable them to make sensible business judgments and adjust their strategy as necessary.
Forecasted demand for flights enables Pegasus Airlines to allocate its resources
optimally. By aligning flight schedules, staffing levels, and aircraft capacity with anticipated
passenger traffic, Pegasus can enhance operational efficiency and cost-effectiveness.
Accurate forecasts of future passenger traffic empower Pegasus to engage in robust financial
planning and budgeting. By understanding the expected trends in enplanements, the airline
can project revenues, manage costs effectively, and make informed investment decisions.
Forecasts also play a vital role in risk mitigation for Pegasus Airlines. By understanding the
expected changes in passenger traffic, Pegasus can identify potential risks and take proactive
measures to address them. For instance, forecasts can help the airline manage challenges such
as overcapacity, underutilization of resources, or potential revenue shortfalls. Armed with this
knowledge, Pegasus can implement risk management strategies and contingency plans,
minimizing the impact of adverse events and ensuring business continuity.

Our analysis aimed to develop a reliable forecasting model for enplanements. After
examining the data, we found that Holt's method, an exponential smoothing method,
provided the best fit for predicting enplanements. By fine-tuning the parameters, we achieved
the lowest prediction errors. We also investigated the impact of seasonal fluctuations on
enplanements using decomposition methods. The additive method, which considers a linear
combination of components, proved to be more effective in capturing the underlying patterns
in the data. Furthermore, we explored the relationship between enplanements and
independent variables through regression analysis. After evaluating multiple variables, we
determined that US consumption expenditures had the strongest correlation with
enplanements and was the most suitable predictor for our final model. It is important to note
that our models were affected by the Covid-19 pandemic, which introduced unusual
observations and outliers in the data. These exceptional circumstances had a significant
impact on air travel and should be considered when interpreting the forecasting results.

Overall, our refined forecasting model, incorporating Holt's method, additive
decomposition, and US consumption expenditures in regression analysis, provides a robust
approach for predicting enplanements. These findings can assist decision-makers in the
aviation industry in making informed plans and strategies. Recognizing the inherent
uncertainty of forecasting is crucial, especially in the face of unanticipated occurrences like
the Covid-19 pandemic. To provide accurate and current forecasts, it is necessary to regularly
review the model and update it as new data becomes available.
REFERENCES

Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice.
