
Examining the Data

By Salman Sarwat
Examining the Data
Data examination is an important initial step in
quantitative research, one that researchers often overlook

Data collection → data examination → data analysis

Data examination assesses whether the data are
appropriate for a particular analysis

Thus, data examination should be conducted before data
analysis
2 Examining the Data by Salman Sarwat
Examining the Data
Data examination reveals hidden elements and
characteristics of the data that can seriously
affect the validity and accuracy of the results of
statistical tools

Example: Nonrandom missing data, cyclic data, outliers

Data examination is not limited to identifying problems;
it also proposes remedies (data refinement)



Data Examination includes
1. Checking the data against the statistical assumptions
on which quantitative tools rely
2. Identification & handling of Missing Data
3. Identification & handling of Outliers
4. Checking the data for illogical values
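
Items 2 and 4 of this list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the variable names (`age`, `income`) and the plausibility ranges are hypothetical and not taken from the slides.

```python
# Hypothetical survey records; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": None},
    {"age": 210, "income": 61000},   # illogical: age outside a plausible range
    {"age": 41, "income": -500},     # illogical: negative income
]

# Assumed plausibility rules (low, high) for each variable.
valid_range = {"age": (0, 120), "income": (0, float("inf"))}


def count_missing(rows):
    """Return the number of missing (None) values per variable."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (value is None)
    return counts


def find_illogical(rows, rules):
    """Return (row index, variable, value) for values outside their rule."""
    flagged = []
    for i, row in enumerate(rows):
        for name, (low, high) in rules.items():
            value = row.get(name)
            if value is not None and not (low <= value <= high):
                flagged.append((i, name, value))
    return flagged


missing = count_missing(records)
illogical = find_illogical(records, valid_range)
print(missing)
print(illogical)
```

Whether a flagged value is corrected, dropped, or kept is a judgment call for the analyst; the sketch only identifies candidates.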

Approaches for Data Examination:


Graphical tools
Empirical tools
Graphical tools are not meant to replace empirical tools but rather
to complement them



Checking Statistical Assumptions
Normality
Linearity
Homoscedasticity
Non-correlated Errors
Multicollinearity
Stationarity
Some assumptions are checked before the analysis, whereas
others are checked concurrently with it



Data Profiling
(Graphical Examination)
Univariate Profiling
Examine the shape of the distribution
 Bar diagram,
 Histogram

Bivariate Profiling
Relationship between the variables
 Scatter plots
 Scatter Matrix
Group differences
 Box plot

Multivariate Profiling
Direct Portrayal of data values (glyphs / metroglyphs) – circles with radii
Mathematical Transformations (Fourier transformation)
Iconic Representation (Chernoff faces)



Univariate Profiling
A pie chart is a disk divided into wedge-shaped pieces
proportional to the relative frequencies of the
qualitative data.
A bar chart displays the distinct values of the qualitative
data on a horizontal axis and the relative frequencies
(or frequencies or percents) of those values on a
vertical axis.
A histogram displays the classes of the quantitative
data on a horizontal axis and the frequencies (relative
frequencies, percents) of those classes on a vertical
axis.
Univariate Profiling
Distribution of a data set: The distribution of a data set is a table,
graph, or formula that provides the values of the observations and
how often they occur.

A frequency distribution provides a table of the values of the
observations and how often they occur.
Qualitative: value grouping
Quantitative: value grouping, interval grouping, boundary grouping
(for continuous data)

Distribution Shapes: bell shaped, triangular, uniform, reverse J
shaped, J shaped, right skewed, left skewed, bimodal, and
multimodal.
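
A frequency distribution with value grouping and interval grouping can be sketched as follows. The data and the class settings (`start`, `width`, `num_classes`) are made up for illustration, and the half-open class convention [lower, upper) is an assumption rather than something the slides specify.

```python
from collections import Counter


def value_grouping(values):
    """Frequency and relative frequency of each distinct value."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, c / n) for v, c in sorted(counts.items())}


def interval_grouping(values, start, width, num_classes):
    """Frequencies for classes [start + k*width, start + (k+1)*width)."""
    freqs = [0] * num_classes
    for x in values:
        k = int((x - start) // width)
        if 0 <= k < num_classes:
            freqs[k] += 1
    return [(start + k * width, start + (k + 1) * width, f)
            for k, f in enumerate(freqs)]


# Hypothetical qualitative and quantitative data.
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O"]
ages = [21, 25, 34, 38, 39, 42, 47, 55, 58, 63]

by_value = value_grouping(blood_types)
by_interval = interval_grouping(ages, start=20, width=10, num_classes=5)
print(by_value)
print(by_interval)
```

The interval frequencies are exactly what a histogram would draw as bar heights, so the table and the graph carry the same information.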



Distribution Shapes



Plots
Dotplots:
The horizontal axis displays the possible values of the
quantitative data; each observation is recorded by placing a dot
over the appropriate value on the horizontal axis
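
The construction can be mimicked with a small text-based sketch. It assumes single-digit integer observations so that the columns line up, and the `scores` list is hypothetical; a plotting library would normally be used instead.

```python
from collections import Counter


def text_dotplot(values):
    """Return a dotplot as text: marks stacked over each value on the axis."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    height = max(counts.values())
    rows = []
    # Build the plot from the top row down; a mark appears where the
    # value occurs at least `level` times.
    for level in range(height, 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in axis))
    return "\n".join(rows)


scores = [2, 3, 3, 4, 4, 4, 5, 7]  # hypothetical observations
plot = text_dotplot(scores)
print(plot)
```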



Plots
Stemplot: Stem & Leaf Diagram



Plots
Normal Probability Plot
For a large sample from a normally distributed variable, a histogram,
stemplot, or dotplot of the observations should be roughly bell shaped;
for a relatively small sample, the shape is often difficult to judge.
For a relatively small sample, a more sensitive graphical
technique, the normal probability plot, is used.
Idea:
Compare the observed values of the variable to the values
expected for a normally distributed variable (normal scores). If the
variable is normally distributed, the normal probability plot
should be roughly linear (i.e., the points fall roughly on a straight line)
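
The points of a normal probability plot can be computed with the Python standard library, as in the sketch below. Two assumptions are made that the slides do not state: normal scores are taken from Blom's plotting positions (i - 0.375) / (n + 0.25), and the correlation of the plotted points is used as a rough numeric companion to judging linearity by eye. The sample is made up.

```python
from statistics import NormalDist, mean


def normal_scores(n):
    """Expected standard-normal values for a sorted sample of size n.

    Uses Blom's plotting positions (i - 0.375) / (n + 0.25); this choice
    is an assumption, and other conventions give slightly different scores.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
            for i in range(1, n + 1)]


def probability_plot_points(sample):
    """Pair each sorted observation with its normal score."""
    ordered = sorted(sample)
    return list(zip(ordered, normal_scores(len(ordered))))


def pearson_r(pairs):
    """Pearson correlation of the plotted points; near 1 suggests linearity."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical small sample.
sample = [9.7, 93.1, 33.0, 21.2, 81.4, 51.1, 43.5, 10.6, 12.8, 7.8, 18.1, 12.7]
points = probability_plot_points(sample)
r = pearson_r(points)
print(points)
print(r)
```

Plotting the first element of each pair against the second gives the normal probability plot; a visible curve in the points suggests a departure from normality.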



Normal Probability Plot



Normal probability Plot



Plots
Boxplots
A boxplot, also called a box-and-whisker diagram, is
based on the five-number summary and can be used to
provide a graphical display of the center and variation of
a data set. These diagrams, like stem-and-leaf diagrams,
were invented by Professor John Tukey.
The five-number summary of a data set is
 Min, Q1, Q2, Q3, Max



Boxplots
In a boxplot, the two lines emanating from the box are
called whiskers.
The lower limit and upper limit of a data set, beyond which
observations are treated as potential outliers, are
 Lower limit = Q1 − 1.5 · IQR
 Upper limit = Q3 + 1.5 · IQR
 IQR = Q3 − Q1

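
The five-number summary and the boxplot limits can be worked through in a short sketch. Quartile conventions differ between textbooks and software; the version here takes Q1 and Q3 as the medians of the lower and upper halves, excluding the middle observation when n is odd, which is an assumption and not necessarily the convention behind the slides. The data are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (Min, Q1, Q2, Q3, Max) using medians of the two halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]  # skip the middle value when n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def boxplot_limits(data):
    """Lower limit = Q1 - 1.5 * IQR, upper limit = Q3 + 1.5 * IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def potential_outliers(data):
    """Observations lying outside the lower and upper limits."""
    low, high = boxplot_limits(data)
    return [x for x in data if x < low or x > high]


data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 30]  # hypothetical observations
print(five_number_summary(data))   # (2, 5, 7.5, 10, 30)
print(boxplot_limits(data))        # (-2.5, 17.5)
print(potential_outliers(data))    # [30]
```

Here IQR = 10 − 5 = 5, so the limits are 5 − 7.5 = −2.5 and 10 + 7.5 = 17.5, and only the value 30 falls outside them.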


Bivariate Profiling
Scatterplot:

It is a graph of data from two quantitative variables of a
population, with a horizontal axis for the observations of one
variable and a vertical axis for the observations of the
other. Each pair of observations is then plotted as a point.

It gives a general idea of the nature of the relationship
between the two variables



Scatterplot



Boxplots
Used to understand how a metric variable differs across
nonmetric (categorical) groups



Scatterplot Matrix
For examining multiple bivariate relationships at once



Multivariate Profiling
Chernoff Faces
The individual parts, such as eyes, ears, mouth and nose
represent values of the variables by their shape, size,
placement and orientation.
The idea behind using faces is that humans easily recognize
faces and notice small changes without difficulty.
Chernoff faces handle each variable differently.
Because the features of the faces vary in perceived
importance, the way in which variables are mapped to the
features should be carefully chosen.
Asymmetrical Chernoff faces: the number of variables that can be
shown rises from 18 to 36



Chernoff Faces

