
Chapter 3
Description of Data: Statistics and Graphical Description
Outline
Introduction
Descriptive Statistics: Describing Data
Measures of Central Tendency
Properties of Measures of Central Tendency
Measures of Variability
Graphical Representation of Data
Descriptive Statistics: Describing Data
Descriptive statistics provide information about:
Center of the data
Variability of the data
Distribution of the data
They are used to understand the sample, not to make inferences about the population.
Measures of Central Tendency

Mean
Most popular measure
Obtained by dividing the sum of the scores
(x) by the number of scores (n)
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$
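The worked example for the mean can be checked in R. This is a minimal sketch using the five scores from the formula; the variable names are illustrative and not from the original slides.

```r
# Scores from the worked example: 1, 2, 3, 4, 5
x <- c(1, 2, 3, 4, 5)
n <- length(x)

# Mean by definition: sum of the scores divided by the number of scores
mean_by_hand <- sum(x) / n

# Base R's built-in function gives the same value
mean_builtin <- mean(x)

print(mean_by_hand)  # 3
print(mean_builtin)  # 3
```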
Median and Mode
Median: The point above which and below which 50 percent of the ordered data points lie.
$$\text{Median location} = \frac{n + 1}{2} = \frac{5 + 1}{2} = \frac{6}{2} = 3$$

Mode: The score in the distribution that has the highest frequency.
Data: x = 1, 2, 3, 4, 2, 5. Mode = 2
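Both definitions can be reproduced in R. The sketch below assumes the median example uses the scores 1 to 5 (n = 5, as in the median location formula) and takes the mode data from the slide. Base R has no built-in function for the statistical mode, so a frequency table is used here as one possible approach.

```r
# Median: assumed data with n = 5, matching the median location formula
x_median <- c(1, 2, 3, 4, 5)
n <- length(x_median)
median_location <- (n + 1) / 2                    # position in the sorted data
median_value <- sort(x_median)[median_location]   # direct lookup works because n is odd
print(median_value)      # 3
print(median(x_median))  # 3, using the built-in function

# Mode: data from the slide
x_mode <- c(1, 2, 3, 4, 2, 5)
freq <- table(x_mode)                             # frequency of each score
mode_value <- as.numeric(names(freq)[freq == max(freq)])
print(mode_value)        # 2
```

If several scores tie for the highest frequency, `mode_value` contains all of them.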
Properties of Measures of Central Tendency
Mean:
Balancing point of the distribution
The mean changes if even one score changes
The sum of the scores is reflected directly in the mean
Algebraic manipulations are possible with the mean
The measure of central tendency most resistant to sampling variation
More sensitive to extreme scores
Properties of Measures of Central Tendency (continued)
Median:
Less sensitive to extreme scores
Algebraic manipulation is not possible
Mode:
Useful in understanding the most frequent response
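The difference in sensitivity to extreme scores can be illustrated with a small example. The data below are hypothetical: the second vector replaces the highest score with an extreme value.

```r
# Hypothetical scores, with and without an extreme value
x_plain   <- c(1, 2, 3, 4, 5)
x_extreme <- c(1, 2, 3, 4, 50)

# The mean is pulled toward the extreme score
print(mean(x_plain))      # 3
print(mean(x_extreme))    # 12

# The median stays at the middle score
print(median(x_plain))    # 3
print(median(x_extreme))  # 3
```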
Measures of Variability
Range
Difference between the lowest score and the highest score in the data
Interquartile Range (IQR)
Solves the problem of the range's dependence on extreme scores.
The cut-off point for the lower 25 percent of scores is called the first quartile (Q1), and the cut-off point
for the upper 25 percent of scores is called the third quartile (Q3).
The interquartile range is Q3 - Q1.
The semi-interquartile range (Q) is obtained by dividing the interquartile range
by 2.
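These three measures can be computed in base R. The sketch below uses hypothetical scores 1 to 9. Note that `quantile()` supports several quartile conventions; its default is used here, so other software or textbook methods may give slightly different Q1 and Q3 for the same data.

```r
# Hypothetical scores
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Range: highest score minus lowest score
range_value <- max(x) - min(x)

# Quartiles with R's default quantile method
q1 <- unname(quantile(x, 0.25))
q3 <- unname(quantile(x, 0.75))

# Interquartile range and semi-interquartile range
iqr_value <- q3 - q1
semi_iqr  <- iqr_value / 2

print(range_value)  # 8
print(iqr_value)    # 4, same as IQR(x)
print(semi_iqr)     # 2
```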
Average Deviation
Mean Absolute Deviation
Median Absolute Deviation
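The slide lists these measures without formulas. The sketch below assumes the common definitions: the mean absolute deviation is the average absolute distance of the scores from the mean, and the median absolute deviation is the median absolute distance from the median. R's `mad()` multiplies by a scaling constant by default, so `constant = 1` is set to obtain the unscaled value.

```r
# Hypothetical scores
x <- c(1, 2, 3, 4, 5)

# Mean absolute deviation: average absolute distance from the mean
mean_abs_dev <- mean(abs(x - mean(x)))

# Median absolute deviation: median absolute distance from the median
median_abs_dev <- median(abs(x - median(x)))

print(mean_abs_dev)          # 1.2
print(median_abs_dev)        # 1
print(mad(x, constant = 1))  # 1, built-in function without scaling
```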
Variance and Standard Deviation
$$S_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
Sample variance is the average of the squared deviations from the mean.
When the denominator is n - 1 instead of n, the variance is an unbiased estimator of the population
variance.
Standard Deviation (S)
Standard deviation is the positive square root of the variance.
$$S_X = \sqrt{S_X^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
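The two denominators can be compared directly in R. This is a minimal sketch with hypothetical scores. R's built-in `var()` and `sd()` use the n - 1 denominator, so the n version is computed by hand.

```r
# Hypothetical scores
x <- c(1, 2, 3, 4, 5)
n <- length(x)

# Sum of squared deviations from the mean
ss <- sum((x - mean(x))^2)

# Variance with denominator n (average squared deviation)
var_n <- ss / n

# Variance with denominator n - 1 (estimator of the population variance)
var_n_minus_1 <- ss / (n - 1)

# Standard deviation: positive square root of the n - 1 variance
sd_by_hand <- sqrt(var_n_minus_1)

print(var_n)          # 2
print(var_n_minus_1)  # 2.5, same as var(x)
print(sd_by_hand)     # same as sd(x)
```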
The Graphical Representation of Data
Stem and Leaf Graph
Stems are listed in the left-hand column, and the leaves for each stem are listed
in the row to its right.
Data: 22, 25, 32, 43, 46, 49, 55, 55, 55
Stem | Leaves
1    | 0
2    | 2 5
3    | 2
4    | 3 6 9
5    | 5 5 5
6    | 0
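A stem-and-leaf display for the listed data can be produced with base R's `stem()`. The sketch below also builds the stems and leaves by hand, treating the tens digit as the stem and the units digit as the leaf. The exact layout printed by `stem()` depends on its `scale` argument and may differ from the table.

```r
# Data listed on the slide
x <- c(22, 25, 32, 43, 46, 49, 55, 55, 55)

# Built-in stem-and-leaf display (printed to the console)
stem(x)

# Manual version: tens digit is the stem, units digit is the leaf
stems  <- x %/% 10
leaves <- split(x %% 10, stems)
print(leaves)
```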
Box-whisker Plot (Box Plot)
The box plot is based on quartiles.
The data are divided into three areas: the lower 25 percent, the middle 50 percent, and the upper 25 percent.
The upper and lower 25 percent of the data are represented by whiskers, and the middle 50 percent is represented
by a box.
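The values behind a box plot can be inspected before drawing it. The sketch below reuses the stem-and-leaf data as an assumed example. `fivenum()` returns Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum); the hinges are one quartile convention and may differ slightly from other quartile methods.

```r
# Assumed example data (the stem-and-leaf data from this chapter)
x <- c(22, 25, 32, 43, 46, 49, 55, 55, 55)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)  # 22 32 46 55 55

# Draw the box plot: the box spans the hinges, the whiskers reach toward the extremes
boxplot(x, horizontal = TRUE, main = "Box plot of x")
```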
Histograms
A histogram shows how frequently values occur in the data.
The figure shows the histogram for data sampled from a normally distributed population.
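A comparable histogram can be generated in base R. This is a sketch under stated assumptions: the sample size, the seed, and the number of breaks are arbitrary choices, not values from the original figure.

```r
# Draw a sample from a standard normal population (assumed n = 1000)
set.seed(1)
x <- rnorm(1000, mean = 0, sd = 1)

# hist() draws the plot and returns the bin boundaries and counts
h <- hist(x, breaks = 20, main = "Histogram of a normal sample", xlab = "x")

# Each observation falls into exactly one bin
print(sum(h$counts))  # 1000
```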
Kernel Density Plots
The kernel density estimator (KDE) is a nonparametric method of estimating the probability density function (pdf) of a
continuous random variable.
$$\hat{f}_h(X) = \frac{1}{n} \sum_{i=1}^{n} K_h(X - X_i)$$
Steps:
(i) Choose a kernel
(ii) Construct a kernel function for each data point
(iii) Add all the individual functions and divide by n
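The three steps can be sketched by hand in R. The example assumes a Gaussian kernel and a hypothetical bandwidth h = 5, and reuses the stem-and-leaf data; none of these choices come from the slide. In practice, base R's `density()` performs the same kind of estimate and chooses a bandwidth automatically.

```r
# Assumed example data and a hypothetical bandwidth
x <- c(22, 25, 32, 43, 46, 49, 55, 55, 55)
n <- length(x)
h <- 5
step <- 0.1
grid <- seq(0, 80, by = step)  # points at which the density is evaluated

# Step (i): choose a kernel; here a Gaussian kernel scaled by the bandwidth h
kernel_h <- function(u) dnorm(u, mean = 0, sd = h)

# Step (ii): one kernel function per data point, evaluated on the grid
# (matrix with one row per grid point and one column per data point)
bumps <- sapply(x, function(xi) kernel_h(grid - xi))

# Step (iii): add the individual functions and divide by n
f_hat <- rowSums(bumps) / n

# The estimate should integrate to approximately 1
area <- sum(f_hat) * step
print(area)

plot(grid, f_hat, type = "l", xlab = "x", ylab = "Estimated density")
```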
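The ggplot2 and Lattice: Data Visualization with R
install.packages("ggplot2")  # run once to install the package
library(ggplot2)
x <- rnorm(100)  # example data; assumed here, not given on the original slide
qplot(x, geom = "histogram", binwidth = 1)
General form of the call (a template, not runnable as written):
qplot(x, y, data =, color =, shape =, size =, alpha =,
geom =, method =, formula =, facets =, xlim =,
ylim =, xlab =, ylab =, main =, sub =)
Supply a value for each argument you use and drop the arguments you do not need.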
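The same histogram can also be written with `ggplot()` and an explicit geom layer. This may be preferable because `qplot()` is reported as deprecated in recent ggplot2 releases, which is worth checking against the installed version. The data frame and its contents are hypothetical.

```r
library(ggplot2)

# Hypothetical data: 100 draws from a standard normal distribution
set.seed(1)
df <- data.frame(x = rnorm(100))

# Map x to the horizontal axis and add a histogram layer
p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 1) +
  labs(x = "x", y = "Count", title = "Histogram of x")

print(p)
```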
