Professional Documents
Culture Documents
01 ASAP GM TimeSeriesForcasting _Day1_2_Introduction
01 ASAP GM TimeSeriesForcasting _Day1_2_Introduction
01 ASAP GM TimeSeriesForcasting _Day1_2_Introduction
Business Analytics
MODULE II-Unit:3
Time Series Forecasting: Day 1-2
Prof. Dr. George Mathew
B.Sc., B.Tech, PGDCA, PGDM, MBA, PhD
Phone: 9447798852
Site: https://relsoft.in 1
Time Series Forecasting
SYLLABUS
3.1 What is Time-Series analysis
3.1.1 Decomposition
3.2 Correlation and Causation
3.3 Autocorrelation
3.4 Forecasting Data
3.5 Autoregressive Modeling, Moving averages and
ARIMA
3.6 Neural networks for Time series data
Overview of Time Series Forecasting
Time series is a type of data that measures how things
change over time. In a time series data set, the time
column does not represent a variable per se: it is actually
a primary structure that you can use to order your data
set.
This primary temporal structure makes time series
problems more challenging as data scientists need to
apply specific data preprocessing and feature engineering
techniques to handle time series data.
Overview of Time Series Forecasting
Time series is a type of data that measures how things change over time. In
a time series data set, the time column does not represent a variable per se:
it is actually a primary structure that you can use to order your data set. This
primary temporal structure makes time series problems more challenging as
data scientists need to apply specific data preprocessing and feature
engineering techniques to handle time series data.
However, it also represents a source of additional knowledge that data
scientists can use to their advantage:
You will learn how to leverage this temporal information to extrapolate
insights from your time series data, like trends and seasonality information,
to make your time series easier to model and to use it for future strategy and
planning operations in several industries. From finance to manufacturing
and health care, time series forecasting has always played a major role in
unlocking business insights with respect to time.
Following are some examples of problems that time
series forecasting can help you solve:
• What are the expected sales volumes of thousands
of food groups in different grocery stores next
quarter?
• What are the resale values of vehicles after leasing
them out for three years?
• What are passenger numbers for each major
international airline route and for each class of
passenger?
• What is the future electricity load in an energy
supply chain infrastructure, so that suppliers can
ensure efficiency and prevent energy waste and
theft?
Overview of Time Series Forecasting
Time series data is an important source of information used for
future decision making, strategy, and planning operations in
different industries: from marketing and finance to education,
healthcare, and robotics. In the past few decades, machine learning
model-based forecasting has also become a very popular tool
in the private and public sectors. Currently, most of the resources
and tutorials for machine learning model-based time series
forecasting generally fall into two categories:
1. Code demonstration repo for certain specific forecasting
scenarios, without conceptual details, and
2. Academic-style explanations of the theory behind forecasting
and mathematical formula.
3. Both of these approaches are very helpful for learning purposes,
and I highly recommend using those resources if you are
interested in understanding the math behind theoretical
hypotheses.
The plot in Figure illustrates an example of time series
forecasting applied to the energy load use case.
What is Time-Series analysis?
Time-Series Analysis (TSA) is the process of extracting a summary and
other statistical information from time-series, most importantly, the analysis
of trend and seasonality. It is often an ad hoc exploration and analysis that
usually involves visualizing distributions, trends, cyclic patterns, and
relationships between features, and between features and
the target(s).
TSA is roughly exploratory data analysis (EDA) that's specific to time-
series data. This comparison can be misleading however since TSA can
include both descriptive and exploratory elements.
Let's see quickly the differences between descriptive and exploratory
analysis:
● Descriptive analysis summarizes characteristics of a dataset
● Exploratory analysis analyzes for patterns, trends, or relationships
between variables
Therefore, TSA is the initial investigation of a dataset with the goal of
discovering patterns, especially trend and seasonality, and obtaining initial
insights, testing hypotheses, and extracting meaningful summary statistics.
What is Time-Series analysis?
A part of TSA is collecting and reviewing data, examining the distribution of
variables (and variable types), and checking for errors, outliers, and
missing values. Some errors, variable types, and anomalies can be
corrected, therefore EDA is often performed hand in hand with
preprocessing and feature engineering, where columns and fields are
selected and transformed. The whole process from data loading to
machine learning is highly iterative and may involve multiple instances of
TSA at different points
Here are a few crucial steps for working with time-series:
● Importing the dataset
● Data cleaning
● Understanding variables
● Uncovering relationships between variables
● Identifying trend and seasonality
● Preprocessing (including feature engineering)
● Training a machine learning model
What is Time-Series analysis?
Importing the data can be considered prior to TSA,
and data cleaning, feature engineering, and training
a machine learning model are not strictly part of TSA.
Importing the data includes parsing, for example
extracting dates. The three steps that are central to
TSA are understanding variables, uncovering
relationships between variables, and identifying trend
and seasonality.
The steps belonging to TSA and leading to
preprocessing (feature engineering) and machine
learning are highly iterative, and can be visually
appreciated in the following time-series machine
learning flywheel:
Components of time series
There are a few additional concepts that are important to explain for those new to Python:
Scalar class: A scalar class is a numerical value alone and defines a vector
space. A quantity described by multiple scalars, such as having both direction and
magnitude, is called a vector.
Array class: An array class is a data structure that holds a fixed number of values of the
same data type.
Data type: A data type is a labeling of different data objects. It represents the type of
values in your data set used to understand how to store and manipulate data.
Primary creation method: A group of functions and operations that are typically written
and used to perform a single or a set of actions on your data. Functions, if properly written,
provide better efficiency, reproducibility, and modularity to your Python code.
Time stamps vs. Periods: The pandas library in Python provides comprehensive
and built-in support for time series data. The examples below and for the rest of
the chapter focuses on energy demand forecasting. In general, demand
forecasting is the practice of predicting future demand (or quantity of specific
products and services) by using historical demand data to make informed business
decisions about future plans, such as planning a product inventory optimization
strategy, allocating marketing investments, and estimating the cost of new
products and services to meet customer demand and needs.
Energy demand forecasting is a type of demand forecasting where the goal is to
predict the future load (or energy demand) on an energy grid. It is a critical
business operation for companies in the energy sector, as operators need to
maintain the fine balance between the energy consumed on a grid and the energy
supplied to it. Typically, grid operators can take short-term decisions to manage
energy supply to the grid and keep the load in balance. An accurate short-term
forecast of energy demand is therefore essential for the operator to make these
decisions with confidence. This scenario details the construction of a machine
learning energy demand forecasting solution.
Let's first look at the concept of time stamped data, which is the most essential
type of time series data and helps you combine values with specific points in time.
This means that each data point of your data set will have a temporal information
that you can leverage.
Time Series Exploration and Understanding
In the following sections, you will learn the first steps to take to explore,
analyze, and understand time series data. We will focus on these topics:
➢ How to get started with time series data analysis
➢ How to calculate and review summary statistics for time series data
➢ How to perform data cleaning of missing periods in time series
➢ How to perform time series data normalization and standardization