Written Report - 890

STAT 890AI Spring/Summer 2022

Course Project
Student name: Bains, Prabhraj Kaur

Student ID number: 200469841

TIME SERIES

A time series is a dataset whose observations form an ordered sequence, with an explicit or implicit attribute (or attributes) indicating a temporal value. In other words, a time series is simply a series of data points ordered in time. In a time series, time is usually the independent variable, and the goal is typically to make a forecast for the future. A time series metric is a piece of data tracked at an increment of time; for instance, a metric could be how much inventory a store sold from one day to the next. Time series data is everywhere, since time is a constituent of everything that is observable. As our world becomes increasingly instrumented, sensors and systems constantly emit a relentless stream of time series data. Examples of time series include:

• Electrical activity in the brain
• Rainfall measurements
• Stock prices
• Number of sunspots
• Annual retail sales
• Monthly subscribers
• Heartbeats per minute

FOURIER ANALYSIS

The Fourier analysis (or harmonic analysis) of a time series is a decomposition of the series into a sum of sinusoidal components, the coefficients of which are the discrete Fourier transform of the series. The term is also used in a wider sense to describe any data analysis procedure that describes or measures the fluctuations in a time series by comparing them with sinusoids. Formally, the Fourier Transform is a method that decomposes functions depending on space or time into functions depending on frequency. The Fourier Transform is a great tool for extracting the different seasonality patterns from a single time series variable. For an hourly temperature dataset, for example, the Fourier Transform can detect the presence of day/night variations and summer/winter variations, and it will tell you that those two seasonalities (frequencies) are present in your data.
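
As an illustration of this idea, here is a minimal sketch that uses NumPy's FFT to recover exactly those two seasonalities. The synthetic hourly temperature series below is an assumption for demonstration, not the report's data:

```python
import numpy as np

# Synthetic hourly temperature: a day/night cycle, a summer/winter cycle,
# and some noise (illustrative stand-in for real hourly temperature data).
rng = np.random.default_rng(0)
hours = np.arange(24 * 365)                          # one year of hourly samples
temp = (10 * np.sin(2 * np.pi * hours / 24)          # daily cycle
        + 15 * np.sin(2 * np.pi * hours / (24 * 365))  # annual cycle
        + rng.normal(0, 1, hours.size))              # noise

# Remove the mean so the zero-frequency (DC) term does not dominate.
spectrum = np.abs(np.fft.rfft(temp - temp.mean()))
freqs = np.fft.rfftfreq(temp.size, d=1.0)            # cycles per hour

# The two largest peaks should sit near 1/24 (daily) and 1/8760 (annual).
top = freqs[np.argsort(spectrum)[-2:]]
print("dominant frequencies (cycles/hour):", np.sort(top))
print("corresponding periods (hours):", np.sort(1 / top))
```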

FAST FOURIER TRANSFORM GRAPH

To produce the Fast Fourier Transform graph, we first calculate a moving average of the series. For a window of 𝑘 periods, the moving average is

𝑀𝐴𝑡 = (𝑥𝑡 + 𝑥𝑡−1 + ⋯ + 𝑥𝑡−𝑘+1) / 𝑘

After calculating the moving average (MAB), we calculate Δf = 1 / (total number of months). Using Δf we compute the frequencies. The next step is to apply the Fourier analysis to the calculated moving averages. We then calculate the absolute values of the Fourier output in a separate column. Finally, we plot the frequencies against these absolute values, setting the x-axis interval from 0 to 0.5. The same steps are followed for the moving averages of Temperature, Births and Deaths; a code sketch of the procedure follows.
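
The sketch below walks through those steps in order. The synthetic monthly series and the 3-month window are assumptions for illustration, not the report's actual workbook columns:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic monthly series with an annual cycle (hypothetical stand-in for
# the Temperature/Birth/Death columns); 256 months matches the report's
# 256 observations.
rng = np.random.default_rng(0)
months = 256
values = (20 + 10 * np.sin(2 * np.pi * np.arange(months) / 12)
          + rng.normal(0, 1, months))

# Step 1: moving average (window length is an assumed choice).
window = 3
ma = np.convolve(values, np.ones(window) / window, mode="valid")

# Step 2: delta f = 1 / total number of months, then the frequencies.
delta_f = 1 / months
freqs = np.arange(len(ma)) * delta_f                 # cycles per month

# Step 3: Fourier analysis of the moving averages, then absolute values.
amplitude = np.abs(np.fft.fft(ma))

# Step 4: plot only the x-axis interval 0 to 0.5, as in the report's graphs.
mask = freqs <= 0.5
plt.plot(freqs[mask], amplitude[mask])
plt.xlabel("Frequency (cycles per month)")
plt.ylabel("|FFT|")
plt.show()
```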

FAST FOURIER TRANSFORM GRAPH- TEMPERATURE

[Figure: FFT(Temp), absolute FFT values plotted against frequency, x axis from 0 to 0.5]

Insights from Graph

The graph clearly has its highest peak at the point (0.0820313, 22789.0324) on the x-axis interval from 0 to 0.5.

FAST FOURIER TRANSFORM GRAPH- BIRTH

[Figure: FFT (Birth), absolute FFT values plotted against frequency, x axis from 0 to 0.5]

Insights from Graph

The highest peak of the graph is at the point (0.0820313, 54.8825175) on the x-axis interval from 0 to 0.5.

FAST FOURIER TRANSFORM GRAPH- DEATH

[Figure: FFT(Death), absolute FFT values plotted against frequency, x axis from 0 to 0.5]

Insights from Graph

The highest peak of the graph is at the point (0.0820313, 59.03373405) on the x-axis interval from 0 to 0.5.

Interesting insight: every graph has its highest peak at the same x coordinate, 0.0820313.
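
As a quick sanity check (assuming the underlying series are monthly, consistent with Δf = 1 / total number of months and the 256 observations reported later), this shared peak frequency corresponds to a period of

1 / 0.0820313 ≈ 12.19 months

that is, roughly one year. Note also that 0.0820313 = 21/256, the 21st frequency bin of a 256-month series, so all three series appear to share an annual cycle.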

MULTIPLE REGRESSION

Multiple regression is a powerful statistical technique used to analyze the relationship between
a dependent variable and multiple independent variables. It expands upon the concept of
simple linear regression, which focuses on the relationship between a dependent variable and a
single independent variable.

In multiple regression, the goal is to understand how a set of independent variables, when
considered together, influences the dependent variable. It enables researchers to explore
complex relationships and determine the relative contributions of each independent variable in
explaining the variability in the dependent variable.

The multiple regression model can be represented by the equation:

𝑦 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + 𝛽3𝑋3 + … + 𝛽𝑛𝑋𝑛 + 𝜀

Here is a breakdown of each component:

• 𝑦 represents the dependent variable, which we aim to predict or explain.
• 𝑋1, 𝑋2, 𝑋3, …, 𝑋𝑛 denote the independent variables or predictors, which can be quantitative or categorical variables.
• 𝛽0 represents the intercept or constant term, which accounts for the expected mean value of 𝑦 when all independent variables are zero.
• 𝛽1, 𝛽2, 𝛽3, …, 𝛽𝑛 are the regression coefficients associated with each independent variable. These coefficients indicate the change in the dependent variable for a unit change in the corresponding independent variable, assuming all other variables remain constant. They reflect the strength and direction of the relationships.
• 𝜀 represents the error term or residual, which captures the unexplained variability in the dependent variable not accounted for by the independent variables. It includes factors that are not included in the model, or random fluctuations.
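
As a minimal sketch of how such a model can be fit, the following uses ordinary least squares via statsmodels. The variables 𝑥1, 𝑥2, 𝑦 and the simulated data are hypothetical stand-ins for the report's actual columns:

```python
import numpy as np
import statsmodels.api as sm

# Simulate 256 observations (matching the summary output below) from a
# two-predictor model; the true coefficients here are illustrative only.
rng = np.random.default_rng(0)
x1 = rng.normal(50, 10, 256)
x2 = rng.normal(300, 50, 256)
y = 1040.5 + 23.4 * x1 + 1.6 * x2 + rng.normal(0, 326, 256)

# Stack the predictors and add a constant column for the intercept β0.
X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()

print(model.params)     # estimates of β0, β1, β2
print(model.pvalues)    # p-values for each coefficient
print(model.rsquared)   # R Square, cf. the summary output below
```
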
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.727765196
R Square 0.52964218
Adjusted R Square 0.525923937
Standard Error 326.1778955
Observations 256

ANOVA

             df     SS            MS            F            Significance F
Regression   2      30309848.81   15154924.41   142.444184   3.64912E-42
Residual     253    26917180.94   106392.0195
Total        255    57227029.75

             Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept    1040.548682    161.0396158      6.461445378   5.28885E-10   723.3997092   1357.697654
𝑥1           23.44406306    2.118239371      11.06771188   1.76675E-23   19.27243463   27.61569149
𝑥2           1.610667203    0.105095589      15.32573554   6.02872E-38   1.403693548   1.817640859

So, the fitted regression equation is:

𝑦 = 1040.548682 + 23.44406306𝑥1 + 1.610667203𝑥2
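
For illustration only, with hypothetical inputs 𝑥1 = 10 and 𝑥2 = 100 (values not taken from the report's data), the fitted equation predicts

𝑦 = 1040.548682 + 23.44406306(10) + 1.610667203(100) ≈ 1436.06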

For 𝑥1 the p-value is 1.76675E-23 which is much smaller than the significance level of 0.05. Therefore, we
can reject the null hypothesis for 𝑥1 and conclude that there is a statistically significant relationship
between 𝑥1 and the dependent variable.

For 𝑥2, the p-value is 6.02872E-38, which is also far smaller than 0.05. Hence, we can reject the null hypothesis for 𝑥2 and conclude that there is a statistically significant relationship between 𝑥2 and the dependent variable.

In both cases, the p-values are much smaller than the significance level, providing strong evidence
against the null hypothesis. Therefore, based on the provided information, we can reject the null
hypothesis for both 𝑥1 and 𝑥2 , indicating that they have significant relationships with the dependent
variable.
