Sarath Chandra Tumuluri

Passengers Carried from Sofia Airport (2006-2016)

Contents
Introduction
Exogenous Variables
Time Series of Data
Data Imputation
Data Smoothing
Sample Mean and ACF Function
Data Transformations

Introduction

1) This file contains hourly heat-consumption data for the Wallace Library, in BTU per hour
(the British thermal unit is a traditional unit of energy equal to about 1055 joules).

2) Source of data: Wallace Library heat consumption data provided by Rochester Institute of
Technology.
3) Exogenous variables: outside and inside temperatures; winter ventilation provided by
mechanical and other systems; infiltration resulting from building construction and
usage; and the heat required to raise the temperature of materials frequently
brought into the heated space from outdoors.

Time Series Plot of the Entire Data Set

The data are cyclic (seasonal) and non-stationary (there is no natural mean over time and a
trend is exhibited), with no atypical events.

Subset of the data for the year 2015:

The graph shows that heat consumption is higher from January to April, which is
understandable since this is the winter season, when outside temperatures are at their lowest.

Comparing January 2015 with January 2016 to check for any correlation between the same
months shows that more heat was consumed in 2016 than in 2015. This may be related to the
exogenous variable of outside temperature, which might have been lower in 2016, resulting in
higher heat consumption per hour (BTU/hr).

A subset containing only the month of January was examined for any trend or seasonality.
The graph shows that heat consumption is high at the end of January 2015.

A single week of January 2015 was then examined for any trend or seasonality.

There is a sudden jump in heat consumption starting on the 3rd day of the week, that is,
within the first few days of that week of January 2015.
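The year, month and week subsets can be taken by filtering on the timestamps. This is a sketch under assumptions: the hourly data are taken to be in a data frame with a POSIXct column `time` and a numeric column `btu` (hypothetical names, filled with placeholder values), and the week is taken as the first seven days of January 2015, since the exact week used for the plot is not stated.

```r
# Hypothetical hourly timestamps for 2015-2016 with placeholder values;
# the column names `time` and `btu` are assumptions, not the original file's
hours <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"),
             as.POSIXct("2016-12-31 23:00", tz = "UTC"), by = "hour")
heat <- data.frame(time = hours, btu = seq_along(hours))

# Keep rows with from <= time < to (half-open, so no hour is counted twice)
subset_period <- function(df, from, to) {
  from <- as.POSIXct(from, tz = "UTC")
  to <- as.POSIXct(to, tz = "UTC")
  df[df$time >= from & df$time < to, ]
}

year_2015 <- subset_period(heat, "2015-01-01", "2016-01-01")
jan_2015 <- subset_period(heat, "2015-01-01", "2015-02-01")
jan_2016 <- subset_period(heat, "2016-01-01", "2016-02-01")
week_jan_2015 <- subset_period(heat, "2015-01-01", "2015-01-08")  # first 7 days (assumed)

plot(jan_2015$time, jan_2015$btu, type = "l",
     xlab = "Date", ylab = "Heat consumption (BTU/hr)")
```

Using a half-open interval keeps adjacent subsets (for example two consecutive months) from sharing a boundary hour.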

Data Cleansing Operations

The provided data were verified to be evenly spaced, with no missing observations in
between. To display dates on the x-axis, the function as.POSIXct was used, and the axis
sequence was split into 3-month intervals, as can be seen in the graph above.
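A minimal sketch of the two steps described above, the even-spacing check and a date axis with ticks every 3 months, is given below. It uses only base R; the hourly timestamps and the placeholder values are hypothetical stand-ins for the RIT file, whose column layout is not shown here.

```r
# Hypothetical hourly timestamps for 2015; the values are placeholders, not the RIT data
times <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"),
             as.POSIXct("2015-12-31 23:00", tz = "UTC"), by = "hour")
btu <- 5000 + 2000 * cos(2 * pi * seq_along(times) / length(times))

# Even spacing: every gap between consecutive timestamps should be one hour (3600 s)
gaps <- diff(as.numeric(times))
evenly_spaced <- all(gaps == 3600)
no_missing <- !anyNA(btu)

# Tick positions every 3 months, drawn with a custom date axis
ticks <- seq(as.POSIXct("2015-01-01", tz = "UTC"), by = "3 months", length.out = 4)
plot(times, btu, type = "l", xaxt = "n",
     xlab = "Date", ylab = "Heat consumption (BTU/hr)")
axis.POSIXct(1, at = ticks, format = "%b %Y")
```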

To evaluate data imputation techniques, some observations were intentionally removed and
several imputation methods were applied, to see how each method performs on this
particular heat-consumption data set. The imputation methods used are Kalman,
interpolation and moving average.
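The remove-then-impute procedure can be sketched in base R as follows. The series, the removed positions and the window size k = 4 are all hypothetical; the moving-average imputation here is a plain unweighted variant (packaged implementations may weight neighbours by distance), and linear interpolation is included for comparison. The error is measured only on the points that were deliberately removed.

```r
set.seed(42)
# Hypothetical hourly series with a daily cycle, standing in for the heat data
n <- 240
x <- 5000 + 1500 * sin(2 * pi * (1:n) / 24) + rnorm(n, sd = 100)

# Intentionally remove a few observations, keeping the true values for scoring
idx <- c(50, 51, 120, 200)
truth <- x[idx]
x_miss <- x
x_miss[idx] <- NA

# Unweighted moving-average imputation: each NA becomes the mean of the
# available values within k positions on either side
impute_ma <- function(y, k = 4) {
  out <- y
  for (i in which(is.na(y))) {
    neighbours <- y[max(1, i - k):min(length(y), i + k)]
    out[i] <- mean(neighbours, na.rm = TRUE)
  }
  out
}

# Linear interpolation; approx() drops the NA points before interpolating
x_lin <- approx(seq_len(n), x_miss, xout = seq_len(n))$y
x_ma <- impute_ma(x_miss, k = 4)

# Mean absolute percentage error on the removed points only
mape <- function(est) mean(abs(est[idx] - truth) / abs(truth)) * 100
round(c(interpolation = mape(x_lin), moving_average = mape(x_ma)), 2)
```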

Kalman Technique:

Interpolation Technique:

Moving Average:

Data Smoothing:

Data for the 2nd week of January 2015 were smoothed using a rolling median and a rolling
mean, and the two smoothed series are compared in the graph.
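A rolling mean and a rolling median can be compared on one week of hourly data with base R alone, using `stats::filter()` for the mean and `runmed()` for the median. The data are hypothetical, with one artificial spike added, and the window length of 5 is an assumption since the original window is not stated.

```r
set.seed(7)
# Hypothetical week of hourly readings (168 values) with one artificial spike
week <- 5000 + 1500 * sin(2 * pi * (1:168) / 24) + rnorm(168, sd = 150)
week[60] <- week[60] + 4000

k <- 5  # window length (assumed)
# Centred rolling mean via a linear filter; the first and last 2 values are NA
roll_mean <- stats::filter(week, rep(1 / k, k), sides = 2)
# Centred running median; runmed() keeps the full length by smoothing the ends
roll_median <- runmed(week, k)

plot(week, type = "l", col = "grey", xlab = "Hour of week", ylab = "BTU/hr")
lines(as.numeric(roll_mean), col = "red")
lines(roll_median, col = "blue")
legend("topright", c("observed", "rolling mean", "rolling median"),
       col = c("grey", "red", "blue"), lty = 1)
```

With a single large spike, the rolling median stays close to the neighbouring values, while the rolling mean is pulled upwards in the hours around it.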

Moving Average for Two Months' Worth of Data

Moving Median for the same data

Discussion of the Observations


A simple moving average of span 5 assigns a weight of 1/5 to each of the five most recent
observations. The smoothed series exhibits less variability, which makes any trend easier to
interpret and analyse, but it fails to remove the potential outliers.
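A short worked equation shows why a span-5 moving average spreads an outlier out instead of removing it. Here ŷ_t is the trailing average, Δ is a hypothetical spike added to a single observation y_s, and starred quantities denote the values without the spike.

```latex
\hat{y}_t = \frac{1}{5}\sum_{i=0}^{4} y_{t-i}, \qquad
y_s = y_s^{*} + \Delta \;\Rightarrow\;
\hat{y}_t = \hat{y}_t^{*} + \frac{\Delta}{5}
\quad \text{for } t = s, s+1, \dots, s+4
```

For example, a spike of Δ = 4000 BTU/hr raises each of five consecutive smoothed values by 800 BTU/hr, whereas a running median of 3 typically ignores a single isolated spike.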

Exogenous variables

1) Aircraft movements, that is, the number of planes arriving at and departing from the
airport. The number of passengers carried increases as aircraft movements increase, but
not vice versa, which makes this a significant exogenous variable for forecasting the
number of passengers carried by the airport in the coming years.
2) Extended airport terminal (hotels, retail, parking)
As terminal facilities expand, more business happens near the airport, which
increases the number of passengers carried by the airport each month.
3) Location of the airport
If the area around the airport is developing and attracting more business from around
the world, this affects the number of passengers carried, which makes location an
exogenous variable.

Time Series of Data

The data are cyclic (seasonal) and non-stationary (there is no natural mean over time and a
trend is exhibited), with no atypical events.
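The trend and seasonal pattern described above can be separated with a classical additive decomposition, `decompose()` in base R. The monthly series below is synthetic (a linear trend plus a June-July peak and noise), built only to show the call; it is not the Sofia Airport data.

```r
set.seed(2006)
# Hypothetical monthly counts, Jan 2006 to Dec 2016 (132 months): a linear trend
# plus a June-July peak and noise, standing in for the Sofia Airport series
month_index <- 1:132
seasonal_shape <- c(0, 0, 0, 0, 10, 20, 35, 20, 5, 0, -5, -5) * 1000
passengers <- ts(100000 + 800 * month_index + rep(seasonal_shape, 11) +
                   rnorm(132, sd = 1000),
                 start = c(2006, 1), frequency = 12)

# Classical additive decomposition into trend, seasonal and irregular parts
parts <- decompose(passengers, type = "additive")
plot(parts)

# Estimated seasonal effect for each calendar month (January first)
round(parts$figure)
```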

Data Subset for the Year 2016

The number of passengers carried generally increases in June and July and declines towards
the end of the year.
This increase may be attributed to the special festival held in Bulgaria between June and
July.

Data Imputation
To assess the accuracy of three imputation methods (mean imputation, interpolation
imputation and Kalman imputation), two observations were removed and each method was
used to impute them.
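The three methods can be sketched in base R as below. The 12 monthly values and the two removed positions are hypothetical. Kalman imputation is shown here as the smoothed level of a local-level structural model fitted with `StructTS()` and smoothed with `tsSmooth()`; this is one common way to do it, and packages such as imputeTS offer other model choices.

```r
# Hypothetical monthly series for 2016; values are placeholders, not the airport data
y <- ts(c(150, 148, 160, 170, 185, 210, 230, 225, 195, 175, 160, 155) * 1000,
        start = c(2016, 1), frequency = 12)
removed <- c(2, 9)            # positions taken out (assumed)
truth <- y[removed]
y_na <- y
y_na[removed] <- NA

# 1) Mean imputation: every gap gets the mean of the observed values
imp_mean <- mean(y_na, na.rm = TRUE)

# 2) Linear interpolation between the neighbouring observations
imp_lin <- approx(seq_along(y_na), as.numeric(y_na), xout = removed)$y

# 3) Kalman smoothing: fit a local-level model (StructTS tolerates NAs after
#    the first value) and read the smoothed level at the missing positions
fit <- StructTS(y_na, type = "level")
imp_kalman <- as.numeric(tsSmooth(fit))[removed]

rbind(truth, mean = imp_mean, interpolation = imp_lin, kalman = imp_kalman)
```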

Interpolation Imputation

Red: imputed value

Black: passengers carried data

152022: value recorded in Feb 2016

170728: value imputed by the interpolation technique


Error of about 12.3% relative to the recorded value (|170728 - 152022| / 152022 ≈ 0.123).

Mean Imputation

Red: imputed value

Black: passengers carried data

190477.2: value calculated by mean imputation

Error of about 25.3% relative to the recorded value (|190477.2 - 152022| / 152022 ≈ 0.253).

Kalman Imputation

Red: imputed value
Black: passengers carried data

168858.6: value calculated by Kalman imputation

Error of about 11.1% relative to the recorded value (|168858.6 - 152022| / 152022 ≈ 0.111).
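The percentage errors can be recomputed directly from the values reported above, taking the error relative to the recorded Feb 2016 value:

```r
recorded <- 152022                       # value recorded in Feb 2016
imputed <- c(interpolation = 170728, mean = 190477.2, kalman = 168858.6)

# Absolute error as a percentage of the recorded value
pct_error <- abs(imputed - recorded) / recorded * 100
round(pct_error, 1)
# interpolation 12.3, mean 25.3, kalman 11.1
```

On these figures the Kalman estimate is closest to the recorded value, followed by interpolation and then mean imputation.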

Data Smoothing
Moving Average (N=5) for 2006

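R code used:

library(zoo)

# Passenngerdata holds a Time column ("YYYY-MM") and a Units column.
# as.Date() needs a day component, so "-01" is appended before parsing
Passenngerdata$Date <- as.Date(paste0(Passenngerdata$Time, "-01"), "%Y-%m-%d")

# 5-period moving average (mean of the current and 4 previous months) and
# 3-period moving median; fill = NA keeps the output the same length as the input
movingaverage5periods <- rollmean(Passenngerdata$Units, k = 5, fill = NA, align = "right")
rollingmedian3periods <- rollmedian(Passenngerdata$Units, k = 3, fill = NA)

par(mfrow = c(2, 1)) # make two graphs one above another

plot(Passenngerdata$Date, Passenngerdata$Units, type = "l", xlab = "Dates", ylab = "UnitsofPassengers")
lines(Passenngerdata$Date, movingaverage5periods, col = "red")

plot(Passenngerdata$Date, Passenngerdata$Units, type = "l", xlab = "Dates", ylab = "UnitsofPassengers")
lines(Passenngerdata$Date, rollingmedian3periods, col = "red")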

A simple moving average of span 5 assigns a weight of 1/5 to each of the five most recent
observations. The smoothed series exhibits less variability, which makes any trend easier to
interpret and analyse, but it fails to remove the potential outliers.

Rolling Median or Moving Median Method


Applied to the entire data set, taking the median of each window of 3 observations.

This gives a smoother curve that deviates less from the original series.

Sample Mean and ACF Function

ACF calculated for the year 2006

With lag = 0, 1, 23.

This can be interpreted as an autocorrelated, stationary series.
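The sample mean and ACF can be written out explicitly and checked against base R's `acf()`. For lag h, the sample autocorrelation is r_h = sum over t = 1..n-h of (x_t - xbar)(x_{t+h} - xbar), divided by the sum over t = 1..n of (x_t - xbar)^2, where xbar is the sample mean. The 12 values below are hypothetical stand-ins for 2006.

```r
# Sample autocorrelation at lags 0..max_lag, computed from the definition
sample_acf <- function(x, max_lag) {
  n <- length(x)
  xbar <- mean(x)                      # sample mean
  denom <- sum((x - xbar)^2)           # lag-0 sum of squares
  sapply(0:max_lag, function(h) {
    sum((x[1:(n - h)] - xbar) * (x[(1 + h):n] - xbar)) / denom
  })
}

# Hypothetical 12 monthly values standing in for 2006
x <- c(120, 118, 130, 140, 155, 180, 200, 195, 165, 145, 130, 125)
r_manual <- sample_acf(x, max_lag = 6)
r_stats <- as.numeric(acf(x, lag.max = 6, plot = FALSE)$acf)
round(rbind(r_manual, r_stats), 3)
```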

ACF for passengers carried from Sofia Airport (2006-2016)

This has been interpreted as autocorrelated data.

Data Transformations
Using Differencing to remove Trend and Seasonality:

x(t) = y(t) - y(t-1)
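The first difference above, and the lag-12 seasonal difference that is usually needed to remove a 12-month pattern, are both available through base R's `diff()`. The series below is synthetic: a linear trend of 2 per month plus a fixed seasonal shape, so the seasonal difference comes out as the constant 2 × 12 = 24.

```r
# Hypothetical monthly series: linear trend (slope 2 per month) plus a fixed
# 12-month seasonal shape, standing in for the passenger data
m <- 1:48
season <- c(0, 1, 3, 6, 10, 15, 18, 15, 9, 4, 1, 0)
y <- ts(2 * m + rep(season, 4), start = c(2006, 1), frequency = 12)

x1 <- diff(y, lag = 1)     # x(t) = y(t) - y(t-1): removes the linear trend
x12 <- diff(y, lag = 12)   # y(t) - y(t-12): removes the 12-month seasonality

par(mfrow = c(2, 1))
plot(y, col = "red", ylab = "Without differencing")
plot(x1, ylab = "First difference")
```

The first difference removes the linear trend but keeps the seasonal swings; in this example only the lag-12 difference removes both.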

Red: without differencing; black: with differencing

Red: without differencing; black: with differencing

The differenced series shows a roughly constant mean and variance over time.

Exponential Smoothing Method Decision Chart

Single Exponential Smoothing

Double Exponential Smoothing

Forecast Using Holt-Winters Single Exponential Smoothing
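Single and double exponential smoothing can both be fitted with base R's `HoltWinters()` by switching components off: `beta = FALSE, gamma = FALSE` leaves only the level (single), and `gamma = FALSE` alone keeps level and trend (double, that is, Holt's method). The series below is simulated, not the airport data; the smoothing parameters are estimated by minimising the squared one-step prediction errors, which is what `HoltWinters()` does by default.

```r
set.seed(11)
# Hypothetical monthly series standing in for the passenger data
y <- ts(150000 + cumsum(rnorm(60, mean = 500, sd = 2000)),
        start = c(2012, 1), frequency = 12)

# Single exponential smoothing: level only (no trend, no seasonal term)
ses <- HoltWinters(y, beta = FALSE, gamma = FALSE)
# Double exponential smoothing (Holt's method): level and trend, no seasonal term
des <- HoltWinters(y, gamma = FALSE)

# 12-month-ahead forecasts
f_ses <- predict(ses, n.ahead = 12)
f_des <- predict(des, n.ahead = 12)

plot(ses, f_ses)   # observed, fitted and forecast values
ses$alpha          # estimated smoothing parameter
```

A single-exponential-smoothing forecast is flat (every horizon equals the final level a), while the double-exponential forecast extends the final trend b linearly.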
