Project 1

Project 1 (Individual)
Statistical Forecasting Project 1: Individual

Assignment
The goal of this project is to get you familiar with the basic concepts of time series data, including cleaning, visualizing,
analyzing and basic forecasting. This is an individual project and you’re expected to find your own dataset and provide
regular updates to your professor (see instructional plan for dates).
The main requirement is a formal report containing all necessary analysis and graphs. You can write this report in MS
Word or using another tool like RMarkdown. Each of the below headings should be a section in your report.
For a detailed breakdown of the grades, please see the rubric on eConestoga.
1. Data
There are two main requirements for the data: first, that it has a time dimension and second that the data came from a
source outside of R or Python (in other words, the data should not just be contained inside a library you import). That
means that you will likely find a .csv or .xls file online as your source of data.
After finding the data, you’re expected to check-in with your professor at Project Check-in 1 and discuss the data you’ve
chosen with them.
If your data is acceptable to the professor, you may proceed with the Data section of your report. Here you’ll discuss the
data, where you found it and what type of cleaning you had to do to import it into R. You should also discuss why you
chose the data and what you hope to achieve with it. Think about a practical problem you could solve by forecasting this
data and discuss that.
Weight: 2 points
Length: At least 400 words
2. Visualization
You should perform basic visualizations of your data: a time plot and a ACF plot are the minimum requirement but feel
free to add any extra plots or graphs.
This section of the report should contain the graphs as well as a discussion about what patterns and details you notice in
the visualizations. At a minimum you should discuss whether you notice any trends or seasonal periods and whether the
data is changing over time.
Weight: 2 points
Length: At least 200 words and 2 plots (Time plot, ACF)
1
3. Transformations
At this point you can perform more formal analysis of the trends and patterns you noticed in the last section. You should
perform a decomposition of the data to extract information about any trends or seasonality. Provide a plot of the results
here.
If a Box-Cox transformation is necessary, you should perform that at this point too.
Discuss the results here: how strong are the seasonal and trend effects and what is your rationale for transforming the
data.
Weight: 2 points
Length: At least 200 words and a plot (Decomposition)
4. Forecasting and Analysis

Choose at least 2 of the basic models we’ve learned at this point and perform a forecast with them (including a plot of
the forecast). Compare the accuracy of the models using the metrics we’ve learned in this course. Analyze the residuals
to check if there are any issues with them (and try and solve those issues if possible). Discuss which technique delivered
the best forecasts.
Weight: 4 points
Length: At least 400 words and 4 plots (Forecast plots, residual plots for your models)
Submission Instructions
 This assignment is to be completed individually
 Submit your report, your dataset, and any code/commands you used to do this project
 Your assignment must be submitted to eConestoga by the date/time stated in your Instructional Plan.
 Include a cover slide with the following information for each group member:
o Your full name
o Student number
o Course number
2
Data Introduction:
I have acquired a dataset from Kaggle, a reputable online platform for datasets and data
science resources. The dataset is centered around financial market indices, including the
Standard & Poor's 500 (spx), German DAX (dax), UK FTSE 100 (ftse), and the Japanese
Nikkei 225 (nikkei). The time dimension in this dataset is crucial as it allows us to explore
and analyze how these financial indices have evolved over time.
Data Structure:
The dataset comprises five columns, each serving a distinct purpose. These columns are:
1. spx: Represents the Standard & Poor's 500 index.
2. dax: Denotes the German DAX index.
3. ftse: Refers to the UK FTSE 100 index.
4. nikkei: Signifies the Japanese Nikkei 225 index.
5. Date: This column serves as the temporal component, providing the date associated
with each financial data point.
Data Cleaning and Preprocessing:
Before embarking on the analysis, some data cleaning and preprocessing steps were
necessary to ensure the dataset's suitability for time series forecasting. These steps
included:
1. Handling Missing Values: I meticulously checked for any missing values in the dataset.
Fortunately, there were no significant gaps, which ensured data completeness.
2. Date Format Conversion: To work with time series data effectively, I converted the date
column into a proper date format. This step is critical for creating time plots and
accurate date-based calculations.
3. Data Consistency: I verified data consistency, ensuring that all entries in the dataset
adhered to a consistent format and unit of measurement. This is essential to avoid
anomalies in the time series.
Rationale for Dataset Selection:
I chose this financial market dataset for several compelling reasons:
1. Real-World Relevance: Financial markets play a pivotal role in the global economy,
making their analysis and forecasting highly relevant. Insights from this data can inform
investment decisions and economic predictions.
2. Economic and Investment Forecasting: The dataset's time series nature aligns well with
the project's objectives, which include applying time series forecasting techniques. By
examining these indices, I hope to make informed predictions regarding stock market
trends. For instance, forecasting the Nikkei index (nikkei) could help in predicting the
Japanese stock market's future performance.
3
3. Abundance of Data: With 7,255 data points, this dataset offers a substantial historical
record, enabling more robust and insightful analysis.
4. Practical Application: The ability to forecast financial indices has practical applications
for investors, traders, and economic analysts. By exploring and forecasting this data, we
can gain insights that have real-world implications.
In summary, the dataset's time series nature, along with its relevance and practical
application, makes it an ideal choice for this project. By analyzing and forecasting these
financial indices, I aim to contribute to the understanding of market dynamics and offer
valuable insights for decision-making in the financial world.
2. visualization
To perform time series forecasting on the "Nikkei" index, we should start by visualizing
the data. A time plot (line chart) and an ACF (Auto-Correlation Function) plot are
essential for understanding the underlying patterns and characteristics of the time
series data. The dataset contains four other indices, but we will focus on "Nikkei."
Here are the time plot and ACF plot for the "Nikkei" index, considering years:
In the time plot, you can observe the following patterns and details:
Overall Trend: The Nikkei index appears to have an upward trend over the years, with
some fluctuations.
Seasonal Periods: It seems that there might be some seasonal patterns, with periodic
fluctuations repeating over time.
In the ACF plot:
The ACF plot shows autocorrelations at various lags. The autocorrelations are significant
at several lags, indicating the presence of serial correlation in the data.
The zoomed ACF plot reveals a clear, significant autocorrelation at a lag of 365, which
suggests a yearly seasonality in the data.
These visualizations suggest that the Nikkei index data has both a trend and a seasonal
component. Time series forecasting models such as ARIMA or SARIMA could be applied
4
to capture these patterns and make predictions for future values of the Nikkei index.
The significant autocorrelations in the ACF plot suggest that the data is not changing
randomly over time, making it a suitable candidate for time series analysis.
Transformation : To perform a more formal analysis of the trends and patterns in the
Nikkei index time series data, we can decompose the data to extract information about
any trends and seasonality. Additionally, we'll consider whether a Box-Cox
transformation is necessary to stabilize variance.
Results and Discussion:
Trend Component: The decomposition plot reveals a clear upward trend in the Nikkei
index. This confirms what we observed in the time plot earlier. The trend component is
quite strong, indicating a long-term growth pattern in the Nikkei index.
Seasonal Component: The decomposition also shows a significant seasonal component.

The seasonal pattern appears to be quite regular, with periodic fluctuations occurring
approximately every 365 days. This confirms our suspicion of yearly seasonality.
Residual Component: The residual component represents the noise or irregular

fluctuations in the data. It appears that there is some residual variation after removing
the trend and seasonal components, which can be further explored in the modeling
phase.
Considering the strong trend and seasonal effects in the data, it might be appropriate to
apply a Box-Cox transformation to stabilize the variance. The Box-Cox transformation is
used to make the data more closely follow a normal distribution, which can improve the
performance of forecasting models.
To determine the optimal lambda (λ) for the Box-Cox transformation, you can use
statistical methods such as the Box-Cox power transformation. This will help ensure that
the transformed data meets the assumptions of many time series models and results in
more accurate forecasts.
In summary, the decomposition of the Nikkei index data confirms the presence of a
strong trend and yearly seasonality. These patterns should be considered in the
selection of a time series forecasting model. Additionally, the Box-Cox transformation
5
can be explored to stabilize the variance and make the data more amenable to
modeling.
Forecasting
To forecast the Nikkei index data, we will explore two basic time series forecasting
models: Exponential Smoothing (ETS) and Seasonal Decomposition of Time Series (STL).
We'll compare the accuracy of these models using metrics like Mean Absolute Error
(MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). Additionally,
we will analyze the residuals to identify any issues and attempt to address them if
necessary.
Analysis:
1. Exponential Smoothing (ETS):
 ETS provides a relatively accurate forecast, capturing the trend and seasonality.
 To address the residuals, you could further explore different levels of ETS (e.g.,
ETS(M, M, M) for multiplicative errors) or consider a Box-Cox transformation
before modeling to stabilize the variance.
2. Seasonal Decomposition of Time Series (STL):
 The STL model also offers a decent forecast by explicitly separating the trend and
seasonal components.
 The residuals for STL seem to have less structure than ETS. However, there are
some spikes in the residuals, indicating potential issues in capturing the variability
in the data.
To compare the accuracy of the models, let's calculate and discuss some key metrics.
The accuracy metrics for both models will include MAE, MSE, and RMSE.
In the end, the choice between ETS and STL depends on the specific characteristics of
your data and your modeling goals. ETS might be preferred if you're looking for a
simpler model with good forecasting accuracy. STL, on the other hand, is more
sophisticated and can capture more complex seasonality patterns. It's important to
consider the trade-offs between model complexity and forecasting performance when
making your final choice.

Project 1

Uploaded by

Copyright:

Available Formats

You might also like

Project 1

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Project 1

Uploaded by

Copyright:

Available Formats

Project 1 (Individual)

Statistical Forecasting Project 1: Individual

Length: At least 400 words

Length: At least 200 words and 2 plots (Time plot, ACF)

Length: At least 200 words and a plot (Decomposition)

4. Forecasting and Analysis

In the ACF plot:

Seasonal Component: The decomposition also shows a significant seasonal component.

Residual Component: The residual component represents the noise or irregular

You might also like