
Data Presentation and Analysis

BABAJIDE, Adesina David (Doctoral Student)


Department of Banking & Finance,
School of Management and Business Studies
Yaba College of Technology,
Yaba Lagos
https://orcid.org/0000-0002-6278-8644
Web of Science Researcher ID: ADK-8747-2022
Email: david.babajide@yabatech.edu.ng

Abstract
Data are the basis of information, reasoning, or calculation, and are analysed to obtain information. Data analysis is the process of inspecting, cleansing, transforming, and modelling data with the main aim of discovering useful information, informing conclusions, or supporting theories for empirical decision making. Most business decisions are taken in a dynamic and uncertain business environment; data analysis is therefore necessary to assist management in taking informed decisions that will enhance performance. Data also enable businesses to plan ahead without repeating past mistakes.
Data presentation forms an integral part of all academic and business research as well as professional practice. It requires skill and an understanding of data: the collected data are the raw facts that need to be processed to provide empirical information. Analysis starts with the collection of data, followed by processing (which can be done by various data processing methods). Processing turns raw data, which are not comprehensible on their own, into information. Data can be presented in the form of frequency tables (displayed as frequencies, percentages, or both) or diagrammatic displays (graphs, charts, maps and other methods). These methods add a visual aspect that makes the data much easier to understand; this visual representation is also called data visualization.
Data analysis is used in business to understand the problems facing an organisation and to explore data in meaningful ways. In simplest terms, it is nothing but checking the records of your business: finding out more about your customers, knowing what they buy, how much they spend, and their preferences and purchasing habits. It also involves keeping tabs on your competitors to find out who their customers are and their spending habits, preferences, and other details, in order to gain a competitive edge in the market.

Keywords: Diagnostic Analysis, Predictive Analysis, Prescriptive Analysis, ARMA

Section I: Introduction
Data are defined as things known or assumed as facts which form the basis of information, reasoning or calculation; data are therefore analysed to obtain information. Data analysis is the process of inspecting, cleansing, transforming and modelling data with the main aim of discovering useful information, informing conclusions, and supporting empirical decision making. These processes involve a number of closely related operations performed with the purpose of summarizing the collected data, and organizing and manipulating them to bring out information that proffers solutions to the research questions raised. Xia and Gong (2015) opined that in today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
Most business decisions are taken in a dynamic and uncertain business environment; data analysis is necessary to assist management and other stakeholders in taking informed decisions that will enhance performance. Data also enable businesses to plan ahead without repeating past mistakes.
Data presentation forms an integral part of all academic and business research as well as professional practice. It requires skill and an understanding of data: the collected data are the raw facts that need to be processed to provide empirical information. Data analysis helps in the interpretation of data and helps the researcher take decisions or answer the research questions raised in the course of the research. This is done by using various data processing tools and software. Analysis starts with the collection of data, followed by processing (which can be done by various data processing methods). Processing turns raw data, which are not comprehensible on their own, into information. Data can be presented in the form of frequency tables (displayed as frequencies, percentages, or both) or diagrammatic displays (graphs, charts, maps and other methods). These methods add a visual aspect that makes the data much easier to understand; this visual representation is also called data visualization.
Data analysis tools make it easier for users to process and manipulate data, analyse the relationships and correlations between data sets, and identify patterns and trends for interpretation. This paper is organized in six sections: Section I is the introduction; Section II reviews types of data; Section III deals with methods of analysing data, including the data analysis process and data cleaning; Section IV reviews unit root tests; Section V discusses co-integration; and Section Six presents the summary and conclusion of the paper.

Section II: Types of Data


Data are defined as things known or assumed as facts which form the basis of information, reasoning or calculation; data are therefore analysed to obtain information. There are basically two types of data in statistics, which can be further sub-divided into four (4) types. Data can be quantitative or qualitative. Quantitative data are data about quantities of things, things that we measure, and so we describe them in terms of numbers; as such, quantitative data are also called numerical data. Qualitative data (also known as categorical data) give us information about the qualities of things; they are observed phenomena, not measured, and so we generally label them with names. Therefore, all data that are collected are either measured (quantitative data) or observed features of interest (qualitative data).
Quantitative data can be either discrete or continuous. Discrete data are data that can only take certain values and cannot be made more precise. These might be whole numbers, like the number of students in a class at a certain time (any number from 1 to 20 or even 40), or some other fixed numbering scheme, such as shoe sizes (2, 2.5, 3, 3.5, and so on). They are called discrete data because they have fixed points and measures in between do not exist (you cannot have 2.5 students, nor a shoe size of 3.49). Counted data are also discrete data, so the number of patients in a hospital and the number of faculty in a university are both examples of discrete data.
Continuous data are data that can take any value, usually within certain limits, and can be divided into finer and finer parts. A person's height is continuous data as it can be measured in metres and fractions of metres (centimetres, millimetres, nanometres). The time of an event is also continuous data and can be measured in years and divided into smaller fractions, depending on how accurately you wish to record it (months, days, hours, minutes, seconds, etc.).
While quantitative data are measured, qualitative data are observed and placed into categories. The exception to this is when categories have been numbered for practical purposes, such as types of animals (1, 2, 3, ...) instead of (pig, sheep, cow, ...). In this case, the numbers must be treated as the names of the categories; you are not allowed to do any calculations with them.
Stevens (1946), an American psychologist, sub-divided quantitative and qualitative data into four (4) types: ratio, interval, ordinal and nominal data (a short coding illustration follows the list).
a. Ratio data is defined as quantitative data, having the same properties as interval data, with an equal and definitive ratio between data points and an absolute "zero" treated as the point of origin. In other words, there can be no negative numerical value in ratio data.
b. Interval data is defined as a data type which is measured along a scale on which each point is placed at an equal distance from the next. Interval data always appear in the form of numbers or numerical values, where the distance between two points is standardized and equal.
c. Ordinal data is a categorical, statistical data type where the variables have natural, ordered categories but the distances between the categories are not known. These data exist on an ordinal scale, one of four levels of measurement.
d. Nominal data (also known as the nominal scale) is a type of data that is used to label variables without providing any quantitative value. It is the simplest form of a scale of measure. One of the most notable features of nominal data is that it cannot be ordered and cannot be meaningfully measured.
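The four measurement levels map directly onto column types in common data analysis tools. The short sketch below is a minimal illustration in Python using the pandas library, with made-up example values; it shows how nominal, ordinal and numeric (interval or ratio) variables might be encoded so that only operations meaningful for each level are applied.

import pandas as pd

# Hypothetical records used purely for illustration.
df = pd.DataFrame({
    "animal": ["Pig", "Sheep", "Cow", "Pig"],            # nominal: labels only
    "satisfaction": ["Low", "High", "Medium", "High"],   # ordinal: ordered categories
    "temperature_c": [21.5, 19.0, 23.2, 20.1],           # interval: no true zero
    "income": [1500.0, 2300.0, 900.0, 1800.0],           # ratio: true zero exists
})

# Nominal: unordered categories -- counting is meaningful, averaging is not.
df["animal"] = pd.Categorical(df["animal"])
print(df["animal"].value_counts())

# Ordinal: ordered categories -- comparisons and medians make sense.
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["Low", "Medium", "High"], ordered=True
)
print(df["satisfaction"].min())   # smallest ordered category

# Interval and ratio: full arithmetic is allowed.
print(df["temperature_c"].mean(), df["income"].mean())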

Section III: Types of Data Analysis: Techniques and Methods


Data analysis is the systematic application of logical techniques and/or statistical methods to describe, illustrate, condense, recap, and evaluate data. It is an important process in business decision making and in understanding the problems facing an organization. It is of note that data in themselves are merely facts and figures. Savenye and Robinson (2004) opined that data analysis organizes, interprets, structures and presents data in a useful format (information) that provides context for empirical decision making. Shamoo and Resnik (2003) opined that various analytic procedures provide a way of drawing inductive inferences from data and distinguishing the signal (the phenomenon of interest) from the noise (statistical fluctuations) present in the data.
Every field of study has developed its own accepted practices for data analysis.
Resnik (2000) states that it is prudent to follow these accepted norms. Resnik
further states that the norms are based on two factors:
a) the nature of the variables used (i.e., quantitative, comparative, or
qualitative),
b) assumptions about the population from which the data are drawn (i.e.,
random distribution, independence, sample size, etc.).
While data analysis in qualitative research can include statistical procedures, many
times analysis becomes an ongoing iterative process where data is continuously
collected and analyzed almost simultaneously. Indeed, researchers generally
analyze for patterns in observations through the entire data collection phase
(Savenye & Robinson, 2004). The form of the analysis is determined by the
specific qualitative approach taken (field study, ethnography, content analysis, oral
history, biography, unobtrusive research) and the form of the data (field notes,
documents, audiotape and videotape).
Quantitative data analysis is a process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively. Kimball, Ross, Thornthwaite, Mundy and Becker (2008) posited that an essential component of ensuring data integrity is the accurate and appropriate analysis of data. Data analysis tools make it easier for users to process and manipulate data, analyse the relationships and correlations between data sets, and identify patterns and trends for interpretation.

3.1 Types of Data Analysis Techniques


There are several types of Data Analysis techniques that exist based on business
and technology. However, the major Data Analysis methods are:
 Text Analysis
 Statistical Analysis

 Diagnostic Analysis
 Predictive Analysis
 Prescriptive Analysis

3.1.1 Text Analysis


Text analysis is also referred to as data mining. It is one of the methods of data analysis used to discover patterns in large data sets using databases or data mining tools. Research organizations use data mining tools to turn raw data into useful information (SAS, 2019); it helps transform raw data into business information.
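As a toy illustration of discovering patterns in text, the sketch below counts the most frequent terms in a small, made-up set of customer reviews; a term-frequency count of this kind is often the first step before more advanced mining such as clustering or topic modelling. It is a minimal Python sketch, not a full data mining pipeline, and the review snippets and stop-word list are assumptions made purely for illustration.

import re
from collections import Counter

# Hypothetical customer feedback used purely for illustration.
reviews = [
    "Delivery was fast and the product quality is great",
    "Great product but delivery was slow",
    "Poor quality, slow delivery, great support team",
]

stop_words = {"was", "and", "the", "is", "but", "a"}

# Tokenize, lower-case and drop stop words before counting.
tokens = []
for text in reviews:
    tokens += [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stop_words]

# The most common terms hint at recurring themes (e.g. delivery speed, quality).
print(Counter(tokens).most_common(5))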

3.1.2 Statistical Analysis


This is a component of data analysis: it is the collection and interpretation of data in order to uncover patterns and trends. Statistical analysis can be used in situations such as gathering research interpretations, statistical modelling, or designing surveys and studies. It includes data collection, analysis, interpretation, presentation, and modelling of data.

a. Inferential Analysis
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It arises from the fact that sampling naturally incurs sampling error, and thus a sample is not expected to perfectly represent the population.
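To make the idea of inferring population properties from a sample concrete, the sketch below runs a two-sample t-test on simulated data using Python's scipy library. The two "customer groups" and their spending figures are invented for illustration; the point is the structure of the test, a statistic, a p-value and a decision about the null hypothesis, not the numbers themselves.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical samples: spending of two customer groups (made-up data).
group_a = rng.normal(loc=50.0, scale=10.0, size=200)
group_b = rng.normal(loc=53.0, scale=10.0, size=200)

# Two-sample t-test: do the population means differ?
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# A small p-value (e.g. below 0.05) suggests the observed difference in sample
# means is unlikely to be due to sampling error alone.
if p_value < 0.05:
    print("Reject H0: the group means appear to differ.")
else:
    print("Fail to reject H0: no evidence of a difference in means.")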

b. Descriptive Data Analysis


This is the term given to the analysis of data that helps describe or summarize data in a meaningful way such that patterns might emerge from the data. Descriptive statistics do not, however, allow us to reach conclusions beyond the data we have analysed, or regarding any hypotheses we might have made; they are simply a way to describe our data. Descriptive statistics are very important because raw data would be hard to visualize, especially if there were a lot of them. Descriptive statistics therefore enable us to present the data in a more meaningful way, which allows simpler interpretation. Laerd (2019) categorised descriptive data analysis into two:
i. Measures of central tendency: these are ways of describing the central
position of a frequency distribution for a group of data. In this case, the
frequency distribution is simply the distribution and pattern of data under
study. Central position is described by using a number of statistics, including
the mode, median, and mean.
ii. Measures of spread: these are ways of summarizing a group of data by describing how spread out the scores are, with some data lower and others higher. To describe this spread, a number of statistics are available to us, including the range, quartiles, absolute deviation, variance, skewness (a distortion or asymmetry that deviates from the symmetrical bell curve, or normal distribution), kurtosis (a statistical measure of how heavily the tails of a distribution differ from the tails of a normal distribution) and standard deviation.
When we use descriptive statistics it is useful to summarize our group of data using a combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts) and statistical commentary (that is, a discussion of the results).
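The sketch below computes the measures of central tendency and spread listed above for a small, made-up series of monthly sales figures, using the Python pandas library; it is a minimal illustration rather than a full descriptive report.

import pandas as pd

# Hypothetical monthly sales figures (made-up data for illustration).
sales = pd.Series([120, 135, 150, 128, 160, 142, 155, 170, 138, 150, 300, 145])

# Measures of central tendency.
print("mean:  ", sales.mean())
print("median:", sales.median())
print("mode:  ", sales.mode().tolist())

# Measures of spread.
print("range: ", sales.max() - sales.min())
print("std:   ", sales.std())
print("var:   ", sales.var())
print("skew:  ", sales.skew())      # asymmetry of the distribution
print("kurt:  ", sales.kurtosis())  # heaviness of the tails

# A tabulated description of the whole series in one call.
print(sales.describe())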

3.1.3 Diagnostic Analysis


Diagnostic analysis describes techniques that let you dig deeper into your data to ask: why did this happen? It finds the cause by using the insight gained from statistical analysis, employing techniques such as data discovery, drill-down, data mining, and correlations. Diagnostic analytics takes a deeper look at data to better understand the causes of behaviours and events, helping to answer critical business questions. Its functions fall into three categories (a short drill-down sketch follows the list).
 Identify anomalies: based on the results of descriptive analysis, researchers must identify areas that require further study because they raise questions that cannot be answered simply by looking at the data.
 Drill into the analytics: researchers must identify the data sources that will help them explain these anomalies.
 Determine causal relationships: hidden relationships are uncovered by looking at events that might have resulted in the identified anomalies.
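The drill-down sketch below is a minimal Python example on made-up sales records: a descriptive summary first reveals an anomalous month, and a diagnostic drill-down by region then locates the likely source of the anomaly. The column names and figures are assumptions for illustration only.

import pandas as pd

# Hypothetical transaction-level sales data (made-up for illustration).
df = pd.DataFrame({
    "month":  ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "region": ["North", "South", "North", "South", "North", "South"],
    "sales":  [100, 110, 105, 108, 40, 112],
})

# Descriptive view: total sales per month reveals a drop in March.
monthly = df.groupby("month", sort=False)["sales"].sum()
print(monthly)

# Diagnostic drill-down: break the anomalous month down by region.
march_detail = df[df["month"] == "Mar"].groupby("region")["sales"].sum()
print(march_detail)  # points to the North region as the source of the drop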

3.1.4 Predictive Analysis


This is the use of data, statistical techniques and machine learning to identify the likelihood of future outcomes based on historical data. It is the branch of analytics used to make predictions about unknown future events. Predictive analysis shows what is likely to happen by analysing previous data, with the aim of going beyond knowing what has happened to providing the best assessment of what will happen in the future. A forecast is only an estimate; its accuracy depends on how much detailed information you have and how the data are manipulated. Predictive analysis uses techniques from data mining, statistics, modelling, machine learning, and artificial intelligence to analyse current data and make predictions about the future. Several models have been developed for specific functions:
i. Forecast models
A forecast model is one of the most common predictive analytics models. It handles metric value prediction by estimating the values of new data based on learnings from historical data. It is often used to generate numerical values where none exist in the historical data. One of the greatest strengths of forecast models is their ability to take multiple input parameters; for this reason, they are among the most widely used predictive analytics models. They are applied in different industries and for different business purposes, and are popular because they are incredibly versatile.

ii. Classification models


One of the most common predictive analytics models is the classification model. These models work by categorising information based on historical data. Classification models are used in different industries because they can easily be retrained with new data and can provide a broad analysis for answering questions. They can be applied in industries such as finance and retail, which explains why they are so common compared to other models.
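The sketch below is a minimal illustration of a classification model using Python's scikit-learn library on synthetic data; the features and labels are generated artificially and stand in for, say, customer attributes and a churn flag.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic "customer" features and a binary label (e.g. churn / no churn).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit a simple classification model on historical (training) data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Categorise unseen records and check how often the labels are correct.
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))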

iii. Outliers Models


While classification and forecast models work with historical data, the outlier model works with anomalous data entries within a dataset. As the name implies, anomalous data are data that deviate from the norm. The model works by identifying unusual data, either in isolation or in relation to different categories and numbers. Outlier models are useful in industries where identifying anomalies can save organisations millions of dollars, namely retail and finance. One reason predictive analytics models are so effective in detecting fraud is that outlier models can be used to find anomalies: since an incidence of fraud is a deviation from the norm, an outlier model is more likely to flag it before it causes damage. For example, when identifying a fraudulent transaction, the outlier model can assess the amount of money involved, location, purchase history, time and the nature of the purchase. Outlier models are highly valued because of their close connection to anomalous data.
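As a hedged illustration of an outlier model, the sketch below fits scikit-learn's IsolationForest to made-up transaction data in which two late-night, unusually large transactions have been planted; the contamination rate and feature choice are assumptions for the example, not recommendations.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical transactions: [amount, hour of day] -- made-up data.
normal = np.column_stack([rng.normal(60, 15, 300), rng.normal(14, 3, 300)])
suspicious = np.array([[950.0, 3.0], [1200.0, 4.0]])   # unusually large, late-night
X = np.vstack([normal, suspicious])

# Fit an isolation forest; `contamination` is the assumed share of anomalies.
clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = clf.predict(X)   # -1 = anomaly, 1 = normal

# The two planted transactions should be among the flagged rows.
print("flagged rows:", np.where(labels == -1)[0])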

iv. Time series model


While classification and forecast models focus on historical data, outliers
focus on anomaly data. The time series model focuses on data where time is
the input parameter. The time series model works by using different data
points (taken from the previous year’s data) to develop a numerical metric
that will predict trends within a specified period.
If organisations want to see how a particular variable changes over time,
then they need a Time Series predictive analytics model. For example, if a
small business owner wants to measure sales for the past four quarters, then
a Time Series model is needed. A Time Series model is superior to
conventional methods of calculating the progress of a variable because it can
forecast for multiple regions or projects simultaneously or focus on a single
region or project, depending on the organisation’s needs. Furthermore, it can
take into account extraneous factors that could affect the variables, like
seasons.

v. Clustering Model
The clustering model takes data and sorts it into different groups based on common attributes. The ability to divide data into different groups based on specific attributes is particularly useful in certain applications, like marketing; for example, marketers can segment a potential customer base based on common attributes. Clustering comes in two types: hard and soft. Hard clustering assigns each data point to exactly one cluster, while soft clustering assigns each data point a probability of belonging to each cluster.
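The sketch below illustrates the hard versus soft clustering distinction with scikit-learn: k-means assigns each made-up "customer" to exactly one segment, while a Gaussian mixture model returns membership probabilities. The two simulated customer groups are assumptions used only to make the output readable.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Hypothetical customers described by [annual spend, visits per month].
X = np.vstack([
    rng.normal([200,  2], [30, 0.5], size=(100, 2)),   # occasional buyers
    rng.normal([800, 10], [80, 1.5], size=(100, 2)),   # frequent buyers
])

# Hard clustering: each customer belongs to exactly one segment.
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)
print("hard assignment of first 5 customers:", hard_labels[:5])

# Soft clustering: each customer gets a probability of belonging to each segment.
gmm = GaussianMixture(n_components=2, random_state=1).fit(X)
print("membership probabilities of first 5 customers:")
print(gmm.predict_proba(X)[:5].round(3))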

Predictive analytics models each have their strengths and weaknesses and are best suited to specific uses. One of the biggest benefits applicable to all models is that they are reusable and can be adjusted to reflect common business rules; a model can be retrained on new data using the same algorithms.

3.1.5. Prescriptive Analysis


Prescriptive analytics focuses on finding the best course of action given the results obtained from manipulating the available data. It makes use of both descriptive analytics and predictive analytics, but emphasizes actionable insights instead of data monitoring; it combines the insights from all previous analyses to determine which action to take on a current problem or decision. Many data-driven companies use prescriptive analysis because descriptive and predictive analysis alone are not enough to improve performance: based on current situations and problems, prescriptive analysis helps them decide what to do.

3.2 Data Analysis Process


The data analysis process is the process of gathering raw facts (data) and using a proper application or tool to explore the data and find patterns in it. Based on the information obtained, empirically based decisions can be taken. Data analysis consists of the following phases:

3.2.1 Data Requirement Gathering


The data requirement gathering process ensures that the identified data are relevant and feasible. The process employs a top-down approach that emphasizes business-driven needs by incorporating data discovery and assessment. Having identified the data requirements, the selected data sources are determined and their quality is assessed using a data quality assessment process.

3.2.2 Data Collection


After requirement gathering, the next step is the collection of data. There are two main sources of data, and the choice depends on the aim of the research.
a. Primary Data Source: research may be subjective, which is referred to as phenomenological research; it is concerned with the study of experiences from the perspective of an individual and emphasizes the importance of personal perspectives and interpretations. In this case the data are collected from a primary source.
A primary data source is an original data source, one in which the data are collected firsthand by the researcher for a specific research purpose or project. Primary data can be collected in a number of ways, for example by observing phenomena and events, or through pictures, surveys, interviews or questionnaires used as instruments of data collection. Primary data collection is quite expensive and time consuming compared to secondary data collection.
b. Secondary Data Source: secondary data are data that were originally collected and processed by others for other purposes; they are often compiled at regular intervals, which may be weekly, monthly or yearly. Sources of such data are annual reports of organizations, national data sources (central banks-www.cenbank.org; bureaus of statistics-www.beureuofstatistic.org.ng), international data sources (World Bank-www.wb.org; IMF-www.imf.org and so on) and other international development agencies (International Labour Organization-www.ilo.org; United Nations Development Programme-www.undp). These data are normally collected as time series, that is, at equal intervals. If the data follow one unit (for example, one organization) over time, they are called time series data (TSD); if they cut across sectors, e.g. the banking sector and the manufacturing sector, at a single point in time, they are called cross-sectional data, that is, data gathered from different sectors or groups at a single point in time. Panel data, also referred to as longitudinal data, contain observations on different cross-sectional units across time intervals.

3.2.3 Data Cleaning


Data cleaning is the process of preparing data for analysis by removing or modifying data that are incorrect, incomplete, irrelevant, duplicated, or improperly formatted. Such data are usually neither necessary nor helpful when it comes to analysing data, because they may hinder the process or produce inaccurate results. Whatever data are collected may include records that are not useful or relevant to the aim of the analysis, hence the data should be cleaned. Collected data may contain duplicate records, white space or errors; the data should be made clean and error free. This phase must be completed before analysis, because the quality of the cleaning determines how close the output of the analysis will be to the expected outcome.
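A minimal data cleaning sketch in Python with pandas is shown below; the raw table is invented to exhibit the typical problems mentioned above (white space, duplicates, missing and improperly formatted values), and the cleaning steps are illustrative rather than a prescribed sequence.

import pandas as pd

# Hypothetical raw extract with typical quality problems (made-up data).
raw = pd.DataFrame({
    "customer": [" Ada ", "Bola", "Bola", "Chidi", None],
    "amount":   ["100", "250", "250", "abc", "75"],
})

clean = (
    raw
    .dropna(subset=["customer"])                            # drop incomplete records
    .assign(customer=lambda d: d["customer"].str.strip())   # remove white space
    .drop_duplicates()                                      # remove duplicate records
)

# Coerce improperly formatted numbers; invalid entries become NaN and are dropped.
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")
clean = clean.dropna(subset=["amount"])

print(clean)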

Section IV: Unit Root Test


Unit root tests are tests for stationarity in a time series. A time series is stationary if a shift in time does not cause a change in the shape of its distribution; unit roots are one cause of non-stationarity. A unit root (also called a unit root process or a difference stationary process) is a stochastic trend in a time series, sometimes called a "random walk with drift"; if a time series has a unit root, it shows a systematic pattern that is nevertheless unpredictable. A stochastic model represents a situation where uncertainty is present; in other words, it is a model for a process that has some kind of randomness. The opposite is a deterministic model, which predicts outcomes with certainty. Most data employed in financial time series display trending behaviour or non-stationarity in the mean; examples are asset prices, exchange rates and real GDP. It is therefore important to determine the most appropriate form of the trend in the data, and unit root tests are used for this purpose. A time series is stationary if a shift in time does not result in a change in the shape of the distribution, that is, if basic properties of the distribution such as the mean, variance and covariance are constant over time.
Stationarity therefore means that the statistical properties of the process generating a time series do not change over time. It does not mean that the series does not change over time, only that the way it changes does not itself change over time. The algebraic analogue is a linear rather than a constant function: the value of a linear function changes as x grows, but the way it changes remains constant; it has a constant slope, one value that captures that rate of change.

Non-stationary time series variables are said to have a unit root because of the mathematics behind the process. A process can be written as a series of monomials (expressions with a single term), and each monomial corresponds to a root; if one of these roots is equal to 1, it is a unit root. The unit-root problem is concerned with the existence of characteristic roots of a time series model on the unit circle. A random walk model can be written as

z_t = z_{t-1} + a_t                                (1)

where a_t is a white noise process. A random process X(t) is called a white noise process if its power spectral density S_X(f) is constant for all frequencies f; this constant is usually denoted by N_0/2, so that S_X(f) = N_0/2 for all f.
The innovations {a_t} can also be a sequence of martingale differences, that is, E(a_t | F_{t-1}) = 0, Var(a_t | F_{t-1}) is finite, and E(|a_t|^{2+δ} | F_{t-1}) < ∞ for some δ > 0, where F_{t-1} is the σ-field generated by {a_{t-1}, a_{t-2}, . . .}. It is assumed that z_0 = 0; it will be seen later that this assumption has no effect on the limiting distributions of unit-root test statistics.
Chan and Wei (1988) opined that a martingale is a sequence of random variables (a
stochastic process) for which, at a particular time, the conditional expectation of the
next value in the sequence is equal to the present value, regardless of all prior
values.
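The sketch below simulates equation (1) in Python: a white noise sequence a_t and the random walk z_t built from it (with z_0 = 0). Comparing the dispersion of the two series over the first and second halves of the sample gives an informal feel for why the random walk is non-stationary; the seed and sample size are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
T = 1000

# White noise innovations a_t: zero mean, constant variance, uncorrelated.
a = rng.normal(0.0, 1.0, size=T)

# Random walk from equation (1): z_t = z_{t-1} + a_t, with z_0 = 0.
z = np.cumsum(a)

# The white noise series is stationary: its variance is stable across the sample.
# The random walk is not: its dispersion tends to grow over time
# (theoretically Var(z_t) = t for unit-variance innovations).
half = T // 2
print("white noise variance, 1st vs 2nd half:", a[:half].var().round(3), a[half:].var().round(3))
print("random walk variance, 1st vs 2nd half:", z[:half].var().round(1), z[half:].var().round(1))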

Figure 1A: Time series generated by a stationary process.
Figure 1B: Time series generated by a non-stationary process.

4.1 Purpose and Uses of Unit Root Test


Most forecasting methods assume that the underlying distribution is stationary; autocovariances and autocorrelations, in particular, rely on this assumption. An absence of stationarity can cause unexpected or bizarre behaviour:
 In Auto Regressive Moving Average (ARMA) modeling the data must be
transformed to stationary form prior to analysis. If the data are trending, then
some form of trend removal is required. Two common trend removal or de-
trending procedures are first differencing and time-trend regression. First
differencing is appropriate for I(1) time series and time-trend regression is
appropriate for trend stationary I(0) time series. Unit root tests can be used
to determine if trending data should be first differenced or regressed on
deterministic functions of time to render the data stationary. Moreover,
economic and finance theory often suggests the existence of long-run
equilibrium relationships among non-stationary time series variables. If
these variables are I(1), then co-integration techniques can be used to model
these long-run relations. Hence, pre-testing for unit roots is often a first step
in co-integration modelling. In practice, a common trading strategy in finance involves exploiting mean-reverting behaviour among the prices of pairs of assets; unit root tests can be used to determine which pairs of assets appear to exhibit mean-reverting behaviour.
 It is observed that economic and finance theory often suggests the existence of long-run equilibrium relationships among non-stationary time series variables. Non-stationarity can strongly influence a series' behaviour and properties; e.g. the persistence of shocks will be infinite for a non-stationary series.
 The spurious regression phenomenon in least squares occurs for a wide range of data-generating processes, such as driftless unit roots and unit roots with drift: if two variables are trending over time, a regression of one on the other could have a high R² even if the two are totally unrelated.
 If the variables in the regression model are not stationary, then it can be shown that the standard assumptions for asymptotic analysis will not be valid. In other words, the usual "t-ratios" will not follow a t-distribution, so we cannot validly undertake hypothesis tests about the regression parameters.
Due to these properties, stationarity has become a common assumption for many practices and tools in time series analysis, including trend estimation, forecasting and causal inference, among others. The short sketch below illustrates the two de-trending routes mentioned in the first point above: first differencing and time-trend regression.
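The sketch below, using Python with numpy and statsmodels on a simulated trending series, illustrates first differencing and regression on a deterministic time trend. The simulated series and its parameters are assumptions for illustration only.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
t = np.arange(200)

# Hypothetical series: a linear trend plus a random-walk component (made-up data).
y = 0.5 * t + np.cumsum(rng.normal(0, 1, size=200))

# Route 1: first differencing -- appropriate for an I(1) series.
dy = np.diff(y)

# Route 2: time-trend regression -- appropriate for a trend-stationary I(0) series;
# the residuals from the regression form the detrended series.
X = sm.add_constant(t)
detrended = y - sm.OLS(y, X).fit().fittedvalues

print("differenced series mean/std:", dy.mean().round(3), dy.std().round(3))
print("detrended series mean/std:  ", detrended.mean().round(3), detrended.std().round(3))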

4.2 Types of Stationarity


Models can show different types of stationarity:
 Strict stationarity means that the joint distribution of any moments of any
degree (e.g. expected values, variances, third order and higher moments) within
the process is never dependent on time. This definition is in practice too strict
to be used for any real-life model.
 First-order stationary series have a mean that never changes with time. Any other statistic (like the variance) can change.
 Second-order stationarity (also called weak stationarity) time series have a
constant mean, variance and an autocovariance that does not change with time.
Other statistics in the system are free to change over time. This constrained
version of strict stationarity is very common.
 Trend-stationary models fluctuate around a deterministic trend (the series
mean). These deterministic trends can be linear or quadratic, but the amplitude
(height of one oscillation) of the fluctuations neither increases nor decreases
across the series.
 Difference-stationary models are models that need one or more differencings to become stationary.

4.3 Unit Root Tests


Unit root tests are known for having low statistical power. Many such tests exist, partly as complements to one another, because none stands out as having the most power. Tests include the following (a short worked illustration follows the list):
 The Dickey-Fuller test (sometimes called a Dickey-Pantula test), which is based on linear regression. Serial correlation can be an issue, in which case the Augmented Dickey-Fuller (ADF) test can be used. The ADF handles bigger, more complex models, but has the downside of a fairly high Type I error rate.
 The Elliott–Rothenberg–Stock Test, which has two subtypes:
o The P-test takes the error term’s serial correlation into account,
o The DF-GLS test can be applied to detrended data without intercept.
 The Schmidt–Phillips Test includes the coefficients of the deterministic
variables in the null and alternate hypotheses. Subtypes are the rho-test and
the tau-test.
 The Phillips–Perron (PP) Test is a modification of the Dickey Fuller test,
and corrects for autocorrelation and heteroscedasticity in the errors.
 The Zivot-Andrews test allows a break at an unknown point in the intercept
or linear trend.
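As a quick illustration of running unit root tests in practice, the sketch below applies the statsmodels implementations of the ADF test and the KPSS stationarity test (whose null hypothesis is reversed relative to the ADF) to a simulated random walk, and then to its first difference. The simulated data and the option choices (autolag, regression, nlags) are assumptions for the example.

import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0, 1, size=500))   # simulated random walk (has a unit root)

# ADF: H0 is "the series has a unit root"; a large p-value means we cannot reject it.
adf_stat, adf_p, *_ = adfuller(y, autolag="AIC")
print(f"ADF statistic = {adf_stat:.3f}, p-value = {adf_p:.3f}")

# KPSS: H0 is "the series is level stationary"; a small p-value rejects stationarity.
kpss_stat, kpss_p, *_ = kpss(y, regression="c", nlags="auto")
print(f"KPSS statistic = {kpss_stat:.3f}, p-value = {kpss_p:.3f}")

# First differencing the random walk should remove the unit root.
adf_stat_d, adf_p_d, *_ = adfuller(np.diff(y), autolag="AIC")
print(f"ADF on differenced series: p-value = {adf_p_d:.3f}")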

Section V: Concept of Co-integration

Co-integration tests identify scenarios where two or more non-stationary time series are integrated together in such a way that they cannot deviate from equilibrium in the long term. The tests are used to identify the degree of sensitivity of two variables to the same average price over a specified period of time.
Before the introduction of co-integration tests, economists relied on linear regressions to find the relationship between several time series processes. Granger and Newbold (1974) argued that linear regression was an incorrect approach for analysing time series due to the possibility of producing spurious correlation. A spurious correlation occurs when two or more associated variables are deemed causally related due to either a coincidence or an unknown third factor; the result can be a misleading statistical relationship between several time series variables. Engle and Granger (1987) formalized the co-integrating vector approach: their concept established that two or more non-stationary time series are integrated together in a way that they cannot move away from some equilibrium in the long term.

Figure 3: Co-integration of gender as an indicator of marriage age (series: Male, Female). Source: Econometrics Beat (Dave Giles, 2020).

The two economists argued against the use of linear regression to analyze the
relationship between several time series variables because detrending would not
solve the issue of spurious correlation. Instead, they recommended checking for co-
integration of the non-stationary time series. They argued that two or more time
series variables with I(1) trends can be co-integrated if it can be proved that there is
a relationship between the variables.

5.1 Methods of Testing for Co-integration


There are three main methods of testing for co-integration. They are used to
identify the long-term relationships between two or more sets of variables. The
methods include:

a. Engle-Granger Two-Step Method


The Engle-Granger two-step method starts by creating residuals based on a static regression and then testing the residuals for the presence of unit roots. It uses the Augmented Dickey-Fuller (ADF) test or other tests to test the residuals for stationarity. If the time series are co-integrated, the Engle-Granger method will show that the residuals are stationary.
A limitation of the Engle-Granger method is that if there are more than two variables, there may be more than one co-integrating relationship, which a single-equation model cannot capture. Some of these drawbacks have been addressed in more recent co-integration tests such as the Johansen and Phillips-Ouliaris tests. The Engle-Granger test can be performed using software such as Stata or MATLAB.
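A minimal sketch of the two-step procedure is given below in Python: step one runs the static regression, step two tests the residuals for a unit root. The simulated pair of I(1) series is an assumption for illustration; note that the standard ADF critical values are not strictly appropriate for estimated residuals, which is why the statsmodels coint() convenience routine shown at the end uses adjusted critical values.

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, coint

rng = np.random.default_rng(0)

# Two simulated I(1) series sharing a common stochastic trend (made-up data).
trend = np.cumsum(rng.normal(0, 1, size=500))
x = trend + rng.normal(0, 0.5, size=500)
y = 2.0 + 1.5 * trend + rng.normal(0, 0.5, size=500)

# Step 1: static (co-integrating) regression of y on x.
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# Step 2: unit root test on the residuals.
print("ADF p-value on residuals:", adfuller(resid)[1])

# Convenience wrapper for the Engle-Granger test with adjusted critical values.
eg_stat, eg_p, _ = coint(y, x)
print(f"Engle-Granger statistic = {eg_stat:.3f}, p-value = {eg_p:.4f}")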

b. Johansen Test
The Johansen test is used to test for co-integrating relationships between several non-stationary time series. Compared to the Engle-Granger test, the Johansen test allows for more than one co-integrating relationship. However, it relies on asymptotic properties (large sample theory), a framework for assessing the properties of estimators and statistical tests. Within this framework it is often assumed that the sample size n may grow indefinitely; the properties of estimators and tests are then evaluated in the limit as n → ∞. In practice, a limit evaluation is considered to be approximately valid for large finite sample sizes too, while a small sample size would produce unreliable results. Using the test to find co-integration of several time series avoids the issues created when errors are carried forward to the next step. The Johansen test comes in two main forms, the trace test and the maximum eigenvalue test; a short worked sketch follows the discussion of both.

c. Trace tests
Trace tests evaluate the number of co-integrating linear combinations, K, in the time series data, testing the hypothesis that K equals a given value K0 against the alternative that K is greater than K0. This is illustrated as follows:
H0: K = K0
H1: K > K0
When using the trace test to test for co-integration in a sample, we set K0 to zero and test whether the null hypothesis is rejected. If it is rejected, we can deduce that there exists at least one co-integration relationship in the sample; the null hypothesis therefore has to be rejected to confirm the existence of a co-integration relationship in the sample.

Maximum Eigenvalue test


An eigenvector is a non-zero vector which, when a linear transformation is applied to it, changes only by a scalar factor; that scalar factor is the eigenvalue. The maximum eigenvalue test is similar to the Johansen trace test; the key difference between the two is the alternative hypothesis:
H0: K = K0
H1: K = K0 + 1
If the null hypothesis is rejected for K0 = 0, it means that there is at least one combination of the variables that produces a stationary process. However, if the null hypothesis is rejected when K0 = m − 1, it means that there are m co-integrating linear combinations; such a scenario is impossible unless the variables in the time series are themselves stationary.
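The sketch below runs the Johansen test on three simulated I(1) series that share one common stochastic trend, using the coint_johansen routine from statsmodels; the deterministic term (det_order=0) and lag choice (k_ar_diff=1) are illustrative assumptions. The printed trace and maximum eigenvalue statistics are compared with their 5% critical values to decide how many co-integrating relations to accept.

import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(0)

# Three simulated I(1) series driven by one common stochastic trend (made-up data).
trend = np.cumsum(rng.normal(0, 1, size=500))
data = np.column_stack([
    trend + rng.normal(0, 0.5, size=500),
    0.8 * trend + rng.normal(0, 0.5, size=500),
    -0.5 * trend + rng.normal(0, 0.5, size=500),
])

# det_order=0: constant term; k_ar_diff=1: one lagged difference in the VECM.
result = coint_johansen(data, det_order=0, k_ar_diff=1)

# Trace statistics test H0: K = K0 against H1: K > K0 for K0 = 0, 1, 2;
# max eigenvalue statistics test H0: K = K0 against H1: K = K0 + 1.
print("trace stats:           ", result.lr1.round(2))
print("trace 5% crit values:  ", result.cvt[:, 1].round(2))
print("max-eig stats:         ", result.lr2.round(2))
print("max-eig 5% crit values:", result.cvm[:, 1].round(2))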

Section Six: Conclusion


Data analytics is important in today's businesses because, more than ever, businesses face dynamic environments full of uncertainty; to navigate such environments, businesses must make empirically backed decisions. Businesspersons sometimes wrongly believe that data analysis is complex, that it applies only to online businesses, and that the new-fangled technology is somehow connected to eCommerce only. Over time, these beliefs have been proved wrong; in fact, data analysis is important for the success of every business.
Data analysis is used in business to understand the problems facing an organisation and to explore data in meaningful ways. Data in themselves are merely facts and figures; data analysis organises, interprets, structures and presents the data as useful information that provides context for decision making in the business environment. In simplest terms, data analysis is nothing but checking the records of your business. This means finding out more about your customers, knowing what they buy and how much they spend, as well as their preferences and purchasing habits. It also involves keeping tabs on your competitors to find out who their customers are and their spending habits, preferences, and other details, in order to gain a competitive edge in the market.
Drucker (1974) opined that every business has two functions: marketing and innovation. This helps to sum up the importance of data analysis for every business:
 Data analysis, whether for an online or offline business, helps you draw up a near-accurate marketing strategy. This means you can target the right audiences for your business with the correct blend of products or services. Additionally, you can charge different rates in different markets due to shipping or currency conversion issues.
 When data obtained from an eCommerce portal are analysed, they provide a clear picture of where the bulk of your customers are located and their buying preferences, budgets, and other vital details. These details can form the backbone of the marketing strategy for any business.
 In innovation, data analysis helps to innovate a product or service line to meet the requirements of a local or distant market. It also enables the extension of a product line, if necessary, to appeal to a larger number of people. Expanding a product line is possible through data analysis because you can find out why people prefer a specific product and how much they are willing to pay for it.
 By offering a better, innovative product for a slightly higher price, or adapting something to sell at a lower rate, more consumers are motivated to patronize the product. Data analysis gives you a clear picture of the frequency at which people buy something: lowering the price can lead people to buy more, while raising your rate for an innovative product can help attract newer customers. In both ways, data analysis helps you win.
 Data analysis is extremely important for every business that is serious about expanding its operations. Especially when introducing a product into a new market, a small quantity of products or services can be introduced and the response gauged through data analysis. Data analysis helps you understand the preferences of people based on demographics; with a clear picture of the demographics of a distant location and their requirements, it is possible to enter that market with products or services that have local appeal.

 Data analysis helps expose grey areas in your business so you can counter them effectively by deploying different strategies. Data analysis will clearly indicate areas where an online or offline business lags, and you can then devise ways and means to address them.

REFERENCES
Caner, M. and L. Kilian (2001). Size distortions of tests of the null hypothesis of
stationarity: Evidence and implications for the PPP debate. Journal of International
Money and Finance, Vol. 20 (2). Pg. 639-657.
Dickey, D. and W. Fuller (1979). Distribution of the estimators for autoregressive time
series with a unit root. Journal of the American Statistical Association, Vol. 74, (9)
pg. 427-431.
Dickey, D. and W. Fuller (1981). Likelihood ratio statistics for autoregressive time series
with a unit root. Econometrica Vol. 49 (3). Pg.1057-1072.
Elliot, G., T.J. Rothenberg, and J.H. Stock (1996). Efficient tests for an autoregressive unit
root. Econometrica. Vol. 64 (8). Pg.813-836.
Everitt, B. S. and Skrondal, A. (2010). The Cambridge dictionary of statistics. Cambridge; Cambridge University Press.
Fuller, W. (1996). Introduction to statistical time series, Second Edition. New York; John
Wiley.
Gottschalk, L. A. (1995). Content analysis of verbal behavior: New findings and clinical
applications. New Jersey: Lawrence Erlbaum Associates, Inc
Hamilton, J. (1994). Time series analysis. New Jersey; Princeton University Press.
Hatanaka, T. (1995). Time-series-based econometrics: unit roots and co-integration.
Oxford; University Press.
Kimball, R., Ross, M., Thornthwaite, W., Mundy, J. and Becker, B. (2008). The data warehouse lifecycle toolkit. New Jersey; Wiley Publishing. ISBN 978-0-470-14977-5.
Kwiatkowski, D., P.C.B. Phillips, P. Schmidt and Y. Shin (1992). Testing the null
hypothesis of stationarity against the alternative of a unit root. Journal of
Econometrics; Vol.54 (5). Pg 159-178.
MacKinnon, J. (1996). Numerical distribution functions for unit root and cointegration
tests. Journal of Applied Econometrics; Vol. 11 (8). Pg. 601-618.
Maddala, G.S. and I.-M. Kim (1998). Unit roots, cointegration and structural change. Oxford; Oxford University Press.
Ng, S., and P. Perron (1995). Unit root tests in ARMA models with data-dependent
methods for the selection of the truncation Lag. Journal of the American Statistical
Association; Vol. 90 (6). Pg. 268-281.
Ng, S., and P. Perron (2001). Lag length selection and the construction of unit root tests
with good size and power. Econometrica; Vol. 69 (7). Pg. 1519-1554.
Perron, P. and S. Ng. (1996). Useful modifications to some unit root tests with dependent
errors and their local asymptotic properties. Review of Economic Studies; Vol. 63
(23). Pg. 435-463.
Phillips, P.C.B. (1987). Time series regression with a unit root. Econometrica, Vol. 55
(21). Pg. 227-301.
Phillips, P.C.B. and P. Perron (1988). Testing for unit roots in time series regression.
Biometrika; Vol. 75 (6). Pg. 335-346.
Phillips, P.C.B. and Z. Xiao (1998). A primer on unit root testing. Journal of Economic
Surveys; Vol. 12 (6). Pg. 423-470.
Schwert, W. (1989). Test for Unit Roots: A Monte Carlo investigation. Journal of
Business and Economic Statistics; Vol. 7 (1). Pg. 147-159.
Said, S.E. and D. Dickey (1984). Testing for unit roots in autoregressive moving-average
models with unknown order. Biometrika; Vol. 71 (1). Pg. 599-607.
Stock, J.H. (1994). Unit roots, structural breaks and trends. In R.F. Engle and D.L. McFadden (eds.), Handbook of Econometrics, Vol. IV. (2).

Jeans, M. E. (1992). Clinical significance of research: A growing concern. Canadian
Journal of Nursing Research; Vol. 24 (1). Pg. 1-4.
Resnik, D. (2000). Statistics, ethics, and research: an agenda for education and reform. Accountability in Research; Vol. 8 (4). Pg. 163-88.
Shamoo, A.E. and Resnik, D.B. (2003). Responsible Conduct of Research. Oxford; Oxford University Press.
Shamoo, A.E. (1989). Principles of Research Data Audit. New York; Gordon and Breach.
Smeeton, N. and Goda, D. (2003). Conducting and presenting social work research: some basic statistical considerations. British Journal of Social Work; Vol. 33 (6). Pg. 567-573.
Vogt, W.P. (2005). Dictionary of statistics & methodology: A nontechnical guide for the social sciences. SAGE.
