Time Series Analysis and Spectral Analysis


Research Methodology in Science

FUNDAMENTALS OF TIME SERIES ANALYSIS AND SPECTRAL ANALYSIS
Time Series
A time series is a set of data indexed by time, for example {y_t : t = 1, 2, …, n}. Diggle (1990)
notes that observations need not be evenly spaced, so a “more honest” notation might be
{y(t_i) : i = 1, 2, …, n}.

Autocovariance
Time series are typically characterized by some degree of serial dependence. This
dependence can be measured by the autocovariance, which is simply the covariance
between two elements of the series: γ(s, t) = cov(y_s, y_t) = E[(y_s − μ_s)(y_t − μ_t)].

Autocorrelation Function (ACF)


The ACF is a measure of the linear predictability of the series. It is the Pearson correlation
coefficient between two elements of the series, e.g., at times s and t:

ρ(s, t) = γ(s, t) / √(γ(s, s) γ(t, t))
Cross-correlation Function (CCF)
The CCF measures the linear predictability of one series y_t from another series x_s:

ρ_xy(s, t) = γ_xy(s, t) / √(γ_x(s, s) γ_y(t, t))

where γ_xy(s, t) = cov(x_s, y_t) = E[(x_s − μ_xs)(y_t − μ_yt)] is the cross-covariance.
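To make these definitions concrete, here is a small sketch (not from the original notes) that computes sample autocorrelations and cross-correlations directly from the formulas above; the simulated series y and x and the chosen lags are purely illustrative.

```python
# A minimal sketch, assuming stationarity so that the correlation depends only on the lag.
# The series y and x below are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))            # a serially dependent example series
x = np.roll(y, 5) + rng.normal(size=200)       # a second series related to y at lag 5

def autocorr(y, lag):
    """Sample autocorrelation between y_t and y_{t+lag}."""
    yc = y - y.mean()
    return np.sum(yc[:-lag] * yc[lag:]) / np.sum(yc * yc)

def crosscorr(x, y, lag):
    """Sample cross-correlation between x_t and y_{t+lag}."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc[:-lag] * yc[lag:]) / np.sqrt(np.sum(xc * xc) * np.sum(yc * yc))

print([round(autocorr(y, k), 2) for k in range(1, 6)])       # ACF at lags 1..5
print([round(crosscorr(x, y, k), 2) for k in range(1, 6)])   # CCF at lags 1..5
```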

Time Series in R
R has a class for regularly spaced time-series data (ts), but the requirement of regular
spacing is quite limiting: epidemic data are frequently irregular, and the format of the dates
attached to reported data can vary wildly. The package zoo (which stands for “Z’s ordered
observations”) provides support for irregularly spaced data indexed by an arbitrary ordered
class.

Time Series Analysis Types

Because time series analysis covers many categories and variations of data, analysts
sometimes must build complex models. However, analysts can’t account for every source of
variance, and they can’t generalize a specific model to every sample. Models that are too
complex or that try to do too many things can lead to lack of fit or to overfitting; in either
case the model fails to distinguish between random error and true relationships, skewing the
analysis and making forecasts incorrect.

Models of time series analysis include:

 Classification: Identifies and assigns categories to the data.
 Curve fitting: Plots the data along a curve to study the relationships of variables within the data.
 Descriptive analysis: Identifies patterns in time series data, like trends, cycles, or seasonal variation.
 Explanative analysis: Attempts to understand the data and the relationships within it, as well as cause and effect.
 Exploratory analysis: Highlights the main characteristics of the time series data, usually in a visual format.
 Forecasting: Predicts future data based on historical trends, using the historical data as a model for future data and predicting scenarios that could happen along future plot points.
 Intervention analysis: Studies how an event can change the data.
 Segmentation: Splits the data into segments to show the underlying properties of the source information.

Data classification

Further, time series data can be classified into two main categories:

 Stock time series data means measuring attributes at a certain point in time, like a static snapshot of the information as it was.
 Flow time series data means measuring the activity of the attributes over a certain period, which is generally part of the total whole and makes up a portion of the results.

Data variations

In time series data, variations can occur sporadically throughout the data:

 Functional analysis can pick out the patterns and relationships within the data to identify notable events.
 Trend analysis means determining consistent movement in a certain direction. There are two types of trends: deterministic, where we can find the underlying cause, and stochastic, which is random and unexplainable.
 Seasonal variation describes events that occur at specific and regular intervals during the course of a year. Serial dependence occurs when data points close together in time tend to be related.

Time series analysis and forecasting models must define the types of data relevant to
answering the business question. Once analysts have chosen the relevant data they want to
analyze, they choose what types of analysis and techniques are the best fit.

Important Considerations for Time Series Analysis

While time series data is data collected over time, there are different types of data that
describe how and when that time data was recorded. For example:

 Time series data is data that is recorded over consistent intervals of time.
 Cross-sectional data consists of several variables recorded at the same time.
 Pooled data is a combination of both time series data and cross-sectional data.

Time Series Analysis Models and Techniques


Just as there are many types and models, there are also a variety of methods to study data.
Here are the three most common.

 Box-Jenkins ARIMA models: These univariate models are used to better understand a single time-dependent variable, such as temperature over time, and to predict future data points of that variable. They work on the assumption that the data are stationary, so analysts have to account for and remove as much differencing and seasonality in past data points as they can; the ARIMA model includes terms for moving averages, seasonal difference operators, and autoregressive components.
 Box-Jenkins multivariate models: Multivariate models are used to analyze more than one time-dependent variable, such as temperature and humidity, over time.
 Holt-Winters method: The Holt-Winters method is an exponential smoothing technique designed to predict outcomes, provided that the data points include seasonality (a minimal fit is sketched after this list).
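As a rough illustration of the Holt-Winters bullet above, the following sketch fits statsmodels' additive ExponentialSmoothing to a simulated monthly series; the series itself, its length, and the seasonal period of 12 are assumptions made only so the example is self-contained.

```python
# A hedged Holt-Winters sketch; the data are simulated, not real observations.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

idx = pd.date_range("2015-01-31", periods=96, freq="M")
t = np.arange(96)
rng = np.random.default_rng(1)
y = pd.Series(10 + 0.05 * t                               # slow upward trend
              + 2 * np.sin(2 * np.pi * t / 12)            # yearly seasonality
              + rng.normal(scale=0.5, size=96), index=idx)

fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
print(fit.forecast(12))   # point forecasts for the next 12 months
```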

Methods for time series analysis

Time series analysis is a powerful technique for studying patterns and trends in data
that change over time, such as speech, language, and communication. In linguistics
research, time series analysis can help you answer questions such as how language evolves,
how speakers vary their speech, and how linguistic features correlate with social or
cognitive factors. The main stages of such an analysis are:

 Data preparation
 Data visualization
 Data modeling
 Data analysis
 Data interpretation
 Data ethics

Data preparation
Before you can perform any time series analysis, you need to prepare your data in a
suitable format. This means that you need to have a clear definition of your time variable,
such as date, hour, or second, and a measure of your linguistic variable, such as word
frequency, pitch, or sentiment. You also need to check for missing values, outliers, and non-
stationarity, which can affect the quality and validity of your analysis. Depending on your
research question, you may also need to transform, aggregate, or normalize your data to
make it more comparable or interpretable.
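A minimal preparation sketch in pandas, assuming a hypothetical CSV file observations.csv with 'date' and 'value' columns (both names are illustrative, not from the text):

```python
# A data-preparation sketch: regular index, missing values, outliers, normalization.
import pandas as pd

df = pd.read_csv("observations.csv", parse_dates=["date"])   # hypothetical input file
s = df.set_index("date")["value"].sort_index().asfreq("D")   # enforce a daily index; gaps become NaN
s = s.interpolate(limit=3)                                   # fill only short gaps
s = s.clip(lower=s.quantile(0.01), upper=s.quantile(0.99))   # crude outlier trimming
s_norm = (s - s.mean()) / s.std()                            # normalize for comparability
```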

Data visualization
One of the simplest and most effective ways to explore your time series data is to
visualize it. Visualization can help you identify patterns, trends, cycles, and anomalies in
your data, and generate hypotheses for further analysis. There are many tools and libraries
that can help you create interactive and informative plots of your time series data, such as
matplotlib, seaborn, plotly, and ggplot2 in Python and R. Some of the common types of
plots for time series data are line charts, scatter plots, histograms, box plots, and heat maps.
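For instance, a simple matplotlib line chart with a rolling-mean overlay often reveals trend and seasonality at a glance; the simulated series below is only a stand-in for real data:

```python
# A basic visualization sketch: raw series plus a 30-day rolling mean.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series(np.random.default_rng(2).normal(size=365).cumsum(),
              index=pd.date_range("2023-01-01", periods=365, freq="D"))

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(s.index, s, lw=0.8, label="observed")
ax.plot(s.index, s.rolling(30, center=True).mean(), lw=2, label="30-day rolling mean")
ax.set_xlabel("date")
ax.set_ylabel("value")
ax.legend()
plt.tight_layout()
plt.show()
```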

Data modelling
To go beyond descriptive statistics and infer causal relationships or make
predictions from your time series data, you need to use data modeling techniques. Data
modeling involves fitting a mathematical function or a statistical model to your data, and
testing its accuracy and significance. There are many types of models for time series data,
such as autoregressive models, moving average models, exponential smoothing models, and
neural network models. Some of the tools and libraries that can help you build and evaluate
these models are statsmodels, scikit-learn, TensorFlow, and Keras in Python and R.
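As one possible modelling sketch (an illustration, not a prescribed workflow), the code below fits an ARIMA(1,1,1) with statsmodels and checks forecast accuracy on a hold-out set; the simulated random-walk series and the 80/20 split are assumptions of the example:

```python
# An ARIMA modelling sketch with a simple hold-out evaluation.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

s = pd.Series(np.random.default_rng(3).normal(size=300).cumsum(),
              index=pd.date_range("2022-01-01", periods=300, freq="D"))

train, test = s[:240], s[240:]
model = ARIMA(train, order=(1, 1, 1)).fit()
pred = model.forecast(steps=len(test))
rmse = float(((pred - test) ** 2).mean() ** 0.5)
print("hold-out RMSE:", round(rmse, 3))
```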

Data analysis
Once you have a suitable model for your time series data, you can use it to perform
various types of data analysis, such as trend analysis, seasonality analysis, correlation
analysis, and anomaly detection. These types of analysis can help you answer specific
questions about your data, such as how your linguistic variable changes over time, how it is
influenced by external factors, how it relates to other variables, and how it deviates from
normal behavior. Some of the tools and libraries that can help you perform these types of
analysis are pandas, numpy, scipy, and dplyr in Python and R.
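A small analysis sketch along these lines, using pandas and numpy on a simulated daily series; the 90-day trend window and the z-score threshold of 3 are arbitrary illustrative choices:

```python
# Trend, seasonality, and simple anomaly detection on a simulated series.
import numpy as np
import pandas as pd

s = pd.Series(np.random.default_rng(4).normal(size=730).cumsum(),
              index=pd.date_range("2022-01-01", periods=730, freq="D"))

trend = s.rolling(90, center=True).mean()       # long rolling mean as a trend proxy
seasonal = s.groupby(s.index.month).mean()      # average level by calendar month
resid = s - trend
z = (resid - resid.mean()) / resid.std()
anomalies = s[np.abs(z) > 3]                    # points far from the local trend
print(seasonal.round(2))
print(anomalies.head())
```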

Data interpretation
The final step of time series analysis is to interpret your results and communicate
them to your audience. This means that you need to explain what your plots, models, and
statistics mean in terms of your research question and hypothesis, and how they contribute
to the existing knowledge in your field. You also need to acknowledge the limitations and
assumptions of your analysis, and suggest directions for future research. Some of the tools
and libraries that can help you create and share reports and presentations of your results are
Jupyter Notebook, R Markdown, Shiny, and Dash in Python and R.

Data ethics
As a linguistics researcher, you also need to be aware of the ethical implications of
your time series analysis, especially if you are dealing with sensitive or personal data, such
as speech recordings, text messages, or social media posts. You need to respect the privacy
and consent of your data sources, and follow the ethical guidelines and regulations of your
institution and discipline. You also need to be transparent and honest about your data
collection, processing, and analysis methods, and avoid any bias or manipulation of your
data or results.

Data preparation and feature engineering tools for time series


Data preparation and feature engineering are two very important steps in the
data science pipeline. Data preparation is typically the first step in any data science project.
It’s the process of getting data into a form that can be used for analysis and further
processing. Feature engineering is a process of extracting features from raw data to make it
more useful for modelling and prediction. Below, we’ll mention some of the most popular
tools used for these tasks.

 Time series projects with Pandas
 Time series projects with NumPy
 Time series projects with Datetime
 Time series projects with Tsfresh

Time series projects with Pandas


Pandas is a Python library for data manipulation and analysis. It includes data
structures and methods for manipulating numerical tables and time series. Also, it contains
extensive capabilities and features for working with time series data for all domains.
It supports data input from a variety of file types, including CSV, JSON, Parquet, SQL
database tables and queries, and Microsoft Excel. Also, Pandas allows various data
manipulation features such as merging, reshaping, selecting, as well as data cleaning and
wrangling.
Some useful time series features are:
 Date range generation and frequency conversions
 Moving window statistics
 Moving window linear regressions
 Date shifting
 Lagging and many more
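The sketch below illustrates several of the features just listed (date range generation, frequency conversion, moving windows, shifting, and lags); the simulated daily series is an assumption made only to keep the example self-contained:

```python
# Pandas time series features in a few lines.
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=90, freq="D")       # date range generation
s = pd.Series(np.random.default_rng(5).normal(size=90).cumsum(), index=idx)

weekly = s.resample("W").mean()        # frequency conversion (daily to weekly)
roll = s.rolling(window=7).mean()      # moving window statistics
shifted = s.shift(1)                   # date shifting / one-step lag
lag7_corr = s.corr(s.shift(7))         # correlation with a 7-day lag
print(weekly.head(3), roll.tail(3), round(lag7_corr, 2), sep="\n")
```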
Time series projects with NumPy
NumPy is a Python library that adds support for huge, multi-dimensional arrays and
matrices, as well as a vast number of high-level mathematical functions that may be used on
these arrays. It has a very similar syntax to MATLAB and includes a high-performance
multidimensional array object as well as capabilities for working with these arrays.
NumPy’s datetime64 data type and arrays enable an extremely compact representation of
dates in time series. Using NumPy also makes it simple to perform various time series
operations, as sketched below.
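A short sketch of what that looks like in practice; the dates and values below are fabricated for illustration:

```python
# datetime64 arrays: compact dates, vectorised arithmetic, boolean selection.
import numpy as np

dates = np.arange("2024-01", "2024-04", dtype="datetime64[D]")   # daily dates for Jan-Mar 2024
values = np.random.default_rng(6).normal(size=dates.size)

deltas = dates - dates[0]                                        # timedelta64 differences
march = values[dates >= np.datetime64("2024-03-01")]             # select March by date
print(dates[:3], deltas[-1], round(march.mean(), 3))
```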

Time series projects with Datetime


Datetime is a Python module that allows us to work with dates and times. This module
contains the methods and functions required to handle scenarios such as:
 Representation of dates and times
 Arithmetic of dates and times
 Comparison of dates and times
Working with time series is simple using this tool: it allows users to turn dates and times
into objects and manipulate them. For example, with only a few lines of code we can convert
from one datetime format to another, add a number of days, months, or years, and so on (see
the sketch below).
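A minimal sketch of those scenarios with the standard library; the specific dates and formats are invented for the example:

```python
# Representation, arithmetic, comparison, and format conversion with datetime.
from datetime import datetime, timedelta

t1 = datetime.strptime("2024-03-01 12:30", "%Y-%m-%d %H:%M")   # parse one format
t2 = t1 + timedelta(days=45)                                   # date arithmetic
print(t2.strftime("%d %B %Y"))                                 # convert to another format
print(t2 > t1, (t2 - t1).days)                                 # comparison and difference
```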

Time series projects with Tsfresh


Tsfresh is a Python package. It automatically calculates a large number of time series
characteristics, known as features. The package combines established algorithms from
statistics, time series analysis, signal processing, and non-linear dynamics with a robust
feature selection algorithm to provide systematic time series feature extraction.
The Tsfresh package includes a filtering procedure to prevent the extraction of irrelevant
features. This filtering procedure assesses each feature’s explanatory power and significance
for the regression or classification task at hand.
Some examples of advanced time series features are:
 Fourier transform components
 Wavelet transform
 Partial autocorrelation and others
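A hedged sketch of automatic feature extraction with Tsfresh; the long-format frame with 'id', 'time', and 'value' columns is a common input layout, but the column names and the simulated data are assumptions of this example:

```python
# Automatic feature extraction over two simulated series with tsfresh.
import numpy as np
import pandas as pd
from tsfresh import extract_features

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "id": np.repeat([0, 1], 100),                 # two example series
    "time": np.tile(np.arange(100), 2),
    "value": rng.normal(size=200).cumsum(),
})

features = extract_features(df, column_id="id", column_sort="time")
print(features.shape)     # one row per series, hundreds of candidate features
```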

Data analysis and visualization packages for time series


Data analysis and visualization packages are tools that help data analysts to create
graphs and charts from their data. Data analysis is defined as the process of cleaning,
transforming, and modelling data in order to uncover useful information for business
decisions. The goal of data analysis is to extract useful information from data and make
decisions based on that information.
The graphical representation of data is known as data visualization. Data visualization tools,
which use visual elements such as charts and graphs, provide an easy way to see and
understand trends and patterns in data.
There is a wide range of data analysis and visualization packages for time series; some of
them are:

 Time series projects with neptune.ai
 Time series projects with Matplotlib
 Time series projects with Plotly
 Time series projects with Statsmodels

Time series projects with neptune.ai


neptune.ai is an experiment tracking tool used by more than 20,000 data scientists,
machine learning engineers, and researchers. It provides a convenient interface for
organizing and controlling models in a single place.
With neptune.ai, it’s possible to:
 Record information about datasets, parameters, and code for every model
 Keep all the metrics, charts, and any other ML metadata organized in a single place
 Reproduce model training effortlessly and make comparisons
 Back up everything on the cloud
 Integrate it with more than 25 libraries, such as PyTorch, TensorFlow, Matplotlib, and others

Time series projects with Matplotlib


Probably the most popular Python package for data visualization is Matplotlib. It’s
used for creating static, animated, and interactive visualizations. With Matplotlib it’s
possible to do some things such as:
 Produce plots suitable for publication
 Create interactive figures that can be zoomed in, panned, and updated
 Change the visual style and layout
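A brief sketch of those capabilities (style change, zooming in on a date range, and exporting a publication-ready figure); the series and the output file name are illustrative:

```python
# Styled, zoomed, and exported time series figure with matplotlib.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series(np.random.default_rng(8).normal(size=365).cumsum(),
              index=pd.date_range("2023-01-01", periods=365, freq="D"))

plt.style.use("ggplot")                        # change the visual style
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(s)
ax.set_xlim(pd.Timestamp("2023-06-01"), pd.Timestamp("2023-09-01"))   # zoomed view
fig.savefig("series.pdf", dpi=300, bbox_inches="tight")               # publication output
```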

Time series projects with Plotly


Plotly is an interactive, open-source, and browser-based graphing library for Python and R.
It’s a high-level, declarative charting library with over 30 chart types, including scientific
charts, 3D graphs, statistical charts, SVG maps, financial charts, and more. Besides that,
with Plotly it’s possible to draw interactive time series charts such as line charts, Gantt
charts, scatter plots, and similar.
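For example, an interactive line chart takes only a few lines with plotly.express; the simulated daily data are a stand-in for real measurements:

```python
# An interactive, zoomable time series chart with Plotly Express.
import numpy as np
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=180, freq="D"),
    "value": np.random.default_rng(9).normal(size=180).cumsum(),
})

fig = px.line(df, x="date", y="value", title="Example time series")
fig.show()   # opens an interactive chart in the browser or notebook
```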

Time series projects with Statsmodels


Statsmodels is a Python package that provides classes and functions for estimating a wide
range of statistical models, as well as for running statistical tests and statistical data
analysis. Here it’s worth highlighting that it provides a very convenient method for time
series decomposition and its visualization: with this package we can easily decompose any
time series and analyze its components, such as the trend, the seasonal component, and the
residual or noise (a minimal example follows).
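The example below decomposes a simulated monthly series into those components with statsmodels' seasonal_decompose; the additive model and the period of 12 are assumptions appropriate only to this toy series:

```python
# Classical additive decomposition into trend, seasonal, and residual components.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

t = np.arange(96)
idx = pd.date_range("2016-01-31", periods=96, freq="M")
rng = np.random.default_rng(10)
y = pd.Series(5 + 0.03 * t + np.sin(2 * np.pi * t / 12)
              + rng.normal(scale=0.3, size=96), index=idx)

result = seasonal_decompose(y, model="additive", period=12)
result.plot()                          # panels for observed, trend, seasonal, residual
print(result.trend.dropna().head())
```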

Experiment tracking tools for time series


Experiment tracking tools are usually high-level tools that can be used for a variety of
purposes like tracking the results of an experiment, showing what would happen if one
changed the parameters in an experiment, model management, and similar.
They are typically more user-friendly than low-level packages and can save a significant
amount of time when developing machine learning models. Only two of them will be
mentioned here, as they are most likely the most popular ones.
For time series, it’s especially important to have a convenient environment for tracking
defined metrics and hyperparameters, since it’s most likely that we would need to run a lot
of different experiments. Usually, time series models are not big in comparison to some
convolution neural networks and as an input have a few hundred or thousand numerical
values, so models train pretty fast. Also, they often require quite some time for
hyperparameter tuning.
Finally, it would be very beneficial to connect in one place models from different packages
as well as visualization tools.
Spectral Analysis
 Many time series show periodic behavior, and this periodic behavior can be very
complex. Spectral analysis is a technique that allows us to discover underlying periodicities.
To perform spectral analysis, we first transform the data from the time domain to the
frequency domain.
 The technical details of spectral analysis go well beyond the scope of these notes.
The classic source is Priestley (1981), but there are plenty of others. In brief, the covariance
structure of the time series can be represented by a function known as the spectral density.
The spectral density can be estimated using an object known as the periodogram, which is
the squared correlation between our time series and sine/cosine waves at the different
frequencies spanned by the series (Venables & Ripley 2002).
 For large n, the periodogram is approximately independent at distinct frequencies.
This approximate independence allows the periodogram to be smoothed with a kernel
smoother (generally some sort of weighted running average), which improves the stability of
the estimate as well as the visual quality and interpretability of the plot, as sketched below.
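A hedged sketch of a raw and a kernel-smoothed periodogram using scipy; the two-sinusoid signal, the sampling rate, and the five-point running-average kernel are all assumptions of the example:

```python
# Raw periodogram plus a simple running-average (kernel) smoother.
import numpy as np
from scipy.signal import periodogram

fs = 100.0                                   # sampling frequency (Hz)
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(11)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t) + rng.normal(size=t.size)

f, pxx = periodogram(x, fs=fs)
kernel = np.ones(5) / 5                      # weighted running average (equal weights)
pxx_smooth = np.convolve(pxx, kernel, mode="same")
print(f[np.argmax(pxx_smooth)])              # dominant frequency, close to 5 Hz
```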

Coherence
 Coherence is a time-series measure similar to correlation. It’s a measure of
recurrent phenomena (i.e., waves). Two waves are coherent if they have a constant relative
phase.
 Most approaches to finding periodic behavior (including coherence) assume
that the underlying series are stationary, meaning that the mean of the process remains
constant. Clearly, this is not such a good assumption when the goal of an analysis is to study
environmental change. Wavelets allow us to study localized periodic behavior. In
particular, we look for regions of high power in the frequency-time plot.
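As a small illustration (using scipy's Welch-based estimator rather than a wavelet approach), the sketch below estimates the magnitude-squared coherence between two noisy series that share a 5 Hz component; the signals and the segment length are assumptions:

```python
# Magnitude-squared coherence between two series with a shared periodic component.
import numpy as np
from scipy.signal import coherence

fs = 100.0
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(12)
common = np.sin(2 * np.pi * 5 * t)            # shared 5 Hz wave
x = common + rng.normal(size=t.size)
y = 0.8 * common + rng.normal(size=t.size)

f, cxy = coherence(x, y, fs=fs, nperseg=512)
print(round(f[np.argmax(cxy)], 1), round(cxy.max(), 2))   # coherence peaks near 5 Hz
```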

The following are some of the research outcomes where spectral analysis played a vital
role.

High-Precision Spectral Analysis Techniques

 High-precision spectral analysis techniques have proved to be an important means of
carrying out research on rotor system stability. After adopting this technique, the
identification accuracy of the spectral feature parameters for the speed-change process
improved significantly, and the maximum amplitude error was controlled to less than 15%.
 In this research, spectral analysis techniques were linked with several other methods,
namely proportional interpolation, time-space domain transformation, and time-domain
refinement of the sampling method.

Arc Atomic Emission Spectral Analysis Method

 Arc atomic emission spectral analysis is a novel method for the determination of
macro- and micro-element contents of human bio-substrates, based on complex physical and
chemical procedures for preparing hair samples. Following this technique,
analysis was carried out on the hair samples of a group of patients in order to diagnose and
also to restore the element balance in the body. The research revealed that by comparing the
elemental content in the human hair with reference values, it is possible to assess the degree
of element imbalance in the body.
 Spectral analysis also offers a rapid, accurate, versatile, and reliable method of
measuring the quality of both fresh and frozen fish by identifying and quantifying specific
contaminants and determining physical/chemical processes that indicate spoilage.
Spectrophotometric instrumentation has been recently used to monitor a number of key
parameters for quality checks, such as oxidative rancidity, dimethylamine, ammonia,
hypoxanthine, thiobarbituric acid, and formaldehyde levels.
 Researchers have developed a novel colorimetric method, i.e., analysis of
trimethylamine using microvolume UV-Vis spectrophotometry in combination with
headspace single-drop microextraction. This method has increased sensitivity, stability,
simplicity, and rapidity, which allows spoilage to be detected at an earlier stage across a
larger number of species. This spectral analysis technique is an economical method for
quality assurance and thus has a huge positive impact on the fish industry.

Entropy Spectral Analysis Methods

 Entropy spectral analysis methods are applied for the forecasting of streamflow that is
vital for reservoir operation, flood control, power generation, river ecological restoration,
irrigation, and navigation. This method is used to study the monthly streamflow for five
hydrological stations in northwest China and is based on using maximum Burg entropy,
maximum configurational entropy, and minimum relative entropy.
 Similarly, spectral analysis acts as an important tool for deciphering information from
the paleoclimatic time series in the frequency domain. Thus, it is utilized to detect the
presence of harmonic signal components in a time series or to obtain phase relations
between harmonic signal components being present in two different time series (cross-
spectral analysis). The spectral analysis of surface waves (SASW) method is a
nondestructive method that determines the moduli and thicknesses of pavement systems.
Importance of time series data analysis

Time series analysis helps organizations understand the underlying causes of trends or
systemic patterns over time. Using data visualizations, business users can see seasonal
trends and dig deeper into why these trends occur.

When organizations analyze data over consistent intervals, they can also use time series
forecasting to predict the likelihood of future events. Time series forecasting is part of
predictive analytics. It can show likely changes in the data, like seasonality or cyclic
behavior, which provides a better understanding of data variables and helps forecast better.
Working principle of spectral analysis
The spectrum calculation has been carefully designed to provide a coherent, statistically
reasonable result that works consistently for many different types of data.
 In the specified region, every sample is retained for spectral analysis. Above
and below this window, an additional 100ms of data is sampled and ramped to zero, with
empty samples at the window boundaries excluded. The spectrum for each prepared trace is
calculated, smoothed and resampled to a 1Hz increment.
 Using this approach, the smallest possible sampled region is 200ms, assuming
no empty samples or data boundaries.
 The spectral resolution is dependent on the number of samples in the window.
For the minimal window case at a 2ms sample rate, 100 frequencies can be calculated prior
to resampling. For larger windows with more samples, the spectral resolution increases
accordingly.
 This results in a spectrum that is reasonable for the data, without suffering
from edge effects or bias from limited window sizes.

Error Analysis

The Idea of Error


 The concept of error needs to be well understood.
 A measurement may be made of a quantity which has an accepted value which
can be looked up in a handbook (e.g., the density of brass). The difference between the
measurement and the accepted value is not what is meant by error. Such accepted values are
not "right" answers. They are just measurements made by other people which have errors
associated with them as well.
 Nor does error mean "blunder." Reading a scale backwards, misunderstanding
what you are doing or elbowing your lab partner's measuring apparatus are blunders which
can be caught and should simply be disregarded.
 Obviously, it cannot be determined exactly how far off a measurement is; if
this could be done, it would be possible to just give a more accurate, corrected value.
 Error, then, has to do with uncertainty in measurements that nothing can be
done about. If a measurement is repeated, the values obtained will differ and none of the
results can be preferred over the others. Although it is not possible to do anything about
such error, it can be characterized. For instance, the repeated measurements may cluster
tightly together or they may spread widely. This pattern can be analyzed systematically.
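For instance, the spread of repeated measurements can be characterized by their mean, standard deviation, and standard error; the readings in the sketch below are made-up values used only to show the arithmetic:

```python
# Characterizing random error in repeated measurements.
import numpy as np

readings = np.array([8.43, 8.39, 8.45, 8.41, 8.40, 8.44])   # hypothetical repeated readings
mean = readings.mean()
std = readings.std(ddof=1)                  # sample standard deviation (spread)
sem = std / np.sqrt(readings.size)          # standard error of the mean
print(f"{mean:.3f} ± {sem:.3f}")
```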

Classification of Error
Generally, errors can be divided into two broad and rough but useful classes:
 Systematic errors
 Random errors

Systematic errors
Systematic errors are errors that tend to shift all measurements in a systematic way so
their mean value is displaced. This may be due to such things as incorrect calibration of
equipment, consistently improper use of equipment or failure to properly account for some
effect. In a sense, a systematic error is rather like a blunder and large systematic errors can
and must be eliminated in a good experiment. But small systematic errors will always be
present. For instance, no instrument can ever be calibrated perfectly.
Other sources of systematic errors are external effects which can change the results of the
experiment, but for which the corrections are not well known. In science, one reason why
several independent confirmations of experimental results are often required (especially
using different techniques) is that different apparatus in different places may be affected by
different systematic effects. Aside from mistakes (such as thinking one is using the ×10
scale while actually using the ×100 scale), the reason experiments sometimes yield results
far outside the quoted errors is systematic effects that were not accounted for.

Random errors
 Random errors are errors which fluctuate from one measurement to the next. They
yield results distributed about some mean value. They can occur for a variety of reasons.
 They may occur due to lack of sensitivity. For a sufficiently small change, an
instrument may not be able to respond to it or indicate it, or the observer may not be able to
discern it.
 They may occur due to noise. There may be extraneous disturbances which cannot be
taken into account.
 They may be due to imprecise definition.
 They may also occur due to statistical processes such as the roll of dice.
 Random errors displace measurements in an arbitrary direction whereas systematic
errors displace measurements in a single direction. Some systematic error can be
substantially eliminated (or properly taken into account). Random errors are unavoidable
and must be lived with.
