
Descriptive Statistics

Quantitative (variable)
  Discrete (no. of customers, no. of claims)
  Continuous (salary, price)
Qualitative (attribute)
  Ordinal (customer satisfaction, efficiency of workers, bond rating)
  Nominal (sex, nationality, eye color)
7/3/2013 2 Descriptive Statistics
Data
  Primary
  Secondary
Data
  Time series (unemployment rate, GDP)
  Cross sectional (queue length in different SBI branches)
Definition
Primary Data

Collected directly from the source
Collected under the control and supervision of the investigator

Secondary Data

Not collected by the investigator
Derived from other sources
Methods of collecting Primary Data
Interview Method
Questionnaire Method
Observation Method


Diagram Presentation
Diagram
  Line (time series): Simple, Multiple
  Bar: Vertical (time series), Horizontal (cross sectional), Component, Subdivided
  Pie
When data are collected in original form, they are called raw data.
When raw data are organized into a frequency distribution, the frequency is the number of values that fall in a specific class of the distribution (grouped data).
Data Table: Compressive Strength of 80 Aluminum-Lithium Alloy Specimens
105 221 183 186 121 181 180 143
97 154 153 174 120 168 167 141
245 228 174 199 181 158 176 110
163 131 154 115 160 208 158 133
207 180 190 193 194 133 156 123
134 178 76 167 184 135 229 146
218 157 101 171 165 172 158 169
199 151 142 163 145 171 148 158
160 175 149 87 160 237 150 135
196 201 200 176 150 170 118 149
Stem-And-Leaf
Stem | Leaf                    | Frequency
   7 | 6                       |  1
   8 | 7                       |  1
   9 | 7                       |  1
  10 | 5 1                     |  2
  11 | 5 0 8                   |  3
  12 | 1 0 3                   |  3
  13 | 4 1 3 5 3 5             |  6
  14 | 2 9 5 8 3 1 6 9         |  8
  15 | 4 7 1 3 4 0 8 8 6 8 0 8 | 12
  16 | 3 0 7 3 0 5 0 8 7 9     | 10
  17 | 8 5 4 4 1 6 2 1 0 6     | 10
  18 | 0 3 6 1 4 1 0           |  7
  19 | 9 6 0 9 3 4             |  6
  20 | 7 1 0 8                 |  4
  21 | 8                       |  1
  22 | 1 8 9                   |  3
  23 | 7                       |  1
  24 | 5                       |  1
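
A stem-and-leaf display like the one above can be built mechanically. The following is a minimal Python sketch, assuming the stem is the value with its last digit dropped and the leaf is that last digit; the function name `stem_and_leaf` is illustrative and not part of the slides. The leaves come out in data-table reading order, so their order within a row may differ from the slide.

```python
from collections import defaultdict

# Compressive-strength values copied from the data table (80 observations).
strengths = [
    105, 221, 183, 186, 121, 181, 180, 143,
    97, 154, 153, 174, 120, 168, 167, 141,
    245, 228, 174, 199, 181, 158, 176, 110,
    163, 131, 154, 115, 160, 208, 158, 133,
    207, 180, 190, 193, 194, 133, 156, 123,
    134, 178, 76, 167, 184, 135, 229, 146,
    218, 157, 101, 171, 165, 172, 158, 169,
    199, 151, 142, 163, 145, 171, 148, 158,
    160, 175, 149, 87, 160, 237, 150, 135,
    196, 201, 200, 176, 150, 170, 118, 149,
]

def stem_and_leaf(values):
    """Group values by stem (value // 10) and collect the leaves (value % 10)."""
    stems = defaultdict(list)
    for v in values:
        stems[v // 10].append(v % 10)
    return dict(sorted(stems.items()))

display = stem_and_leaf(strengths)
for stem, leaves in display.items():
    # One row per stem: stem, leaves, and the row frequency.
    print(f"{stem:>3} | {' '.join(map(str, leaves)):<23} | {len(leaves)}")
```

The row frequencies printed in the last column should add up to the 80 observations in the table.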

Terms Associated with a Grouped Frequency Distribution
Class width = upper class boundary - lower class boundary
Class Mark or Mid-Value
Class marks are the midpoints of the class boundaries.

Class mark = 1/2 (upper class boundary + lower class boundary)
Frequency Density
FD = class frequency / class width
It gives the number of observations per unit of class width.
Use: when class widths are not equal, frequency density is plotted on the y-axis to draw the histogram.
Relative Frequency
RF = class frequency / total frequency
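
The grouped-distribution terms defined so far (class width, class mark, frequency density, relative frequency) can be computed together. This is a small Python sketch under stated assumptions: the three classes and their frequencies are hypothetical, chosen with unequal widths to show why frequency density matters.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, class frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]

total = sum(freq for _, _, freq in classes)  # total frequency

rows = []
for lower, upper, freq in classes:
    width = upper - lower          # class width
    mark = (lower + upper) / 2     # class mark (mid-value)
    density = freq / width         # frequency density
    rel_freq = freq / total        # relative frequency
    rows.append((width, mark, density, rel_freq))
    print(f"{lower}-{upper}: width={width}, mark={mark}, "
          f"FD={density:.2f}, RF={rel_freq:.2f}")
```

The widest class (20-40) has a higher frequency than the first class but a lower frequency density, which is what the histogram should display when widths differ.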
Visualizing Data
The three most commonly used graphs in research are:
The histogram.
The frequency polygon.
The cumulative frequency graph, or ogive.

Key Characteristics

Characteristic: Central Tendency
Definition / Interpretation: Where are the data values concentrated? What seem to be typical or middle data values?

Characteristic: Dispersion
Definition / Interpretation: How much variation is there in the data? How spread out are the data values? Are there unusual values?

Characteristic: Shape
Definition / Interpretation: Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?
Measures

Measure: Mean (raw data)
Formula: \[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
Excel formula: =AVERAGE(Data)
Pro: Familiar and uses all the sample information.
Con: Not appropriate when there are extreme values or open-ended classes.

Measure: Mean (grouped data)
Formula: \[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} \]
Excel formula: =AVERAGE(Data)
Pro: Familiar and uses all the sample information.
Con: Not appropriate when there are extreme values or open-ended classes.
Measures

Measure: Median
Formula: Middle value in sorted array
Excel formula: =MEDIAN(Data)
Pro: Robust when extreme data values exist.
Con: Statistical procedures for the median are complex.

Measure: Mode
Formula: Most frequently occurring data value
Excel formula: =MODE(Data)
Pro: Useful for attribute data or discrete data with a small range.
Con: May not be unique, and is not helpful for continuous data.
Population vs Sample Characteristics
A statistic is a descriptive measure derived from a sample (n items).
A parameter is a descriptive measure derived from a population (N items).
Calculation of Mean

Raw data:
Population mean: \[ \mu = \frac{1}{N}\sum_{i=1}^{N} x_i ; \quad N = \text{population size} \]
Sample mean: \[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ; \quad n = \text{sample size} \]

Grouped data:
Population mean: \[ \mu = \frac{1}{N}\sum_{i=1}^{k} f_i x_i ; \quad N = \sum_{i=1}^{k} f_i \]
Sample mean: \[ \bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i ; \quad n = \sum_{i=1}^{k} f_i \]


Sample Mean
Example: Apartment Rents

Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed below.

[Table of the 70 monthly rents not recovered in this copy.]
Sample Mean
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80 \]


Calculation of Median (n is even)
Consider the following n = 6 data values:
11 12 15 17 21 32
What is the median?
For even n, Median =
\[ \frac{x_{n/2} + x_{(n/2+1)}}{2} \]
n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4
M = (x_3 + x_4)/2 = (15 + 17)/2 = 16
The median, 16, falls between the third and fourth values: 11 12 15 | 16 | 17 21 32
Calculation of Median (n is odd)
Consider the following n = 7 data values:
11 12 15 17 21 32 38
What is the median?
For odd n, Median =
\[ x_{(n+1)/2} \]
(n+1)/2 = 8/2 = 4, so the median is x_4 = 17.
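
The two positional rules for the median (even n and odd n) can be checked with a short Python sketch. The function name `median_by_position` is illustrative; the two data sets are the ones used in the median examples.

```python
import statistics

def median_by_position(values):
    """Median from sorted positions: the middle value for odd n,
    the mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # x_{(n+1)/2}, shifted by one for 0-based indexing
        return ordered[(n + 1) // 2 - 1]
    # (x_{n/2} + x_{n/2+1}) / 2, shifted by one for 0-based indexing
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

even_data = [11, 12, 15, 17, 21, 32]
odd_data = [11, 12, 15, 17, 21, 32, 38]

print(median_by_position(even_data))  # mean of 15 and 17
print(median_by_position(odd_data))   # fourth value
# The standard library gives the same answers.
print(statistics.median(even_data), statistics.median(odd_data))
```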
Trimmed Mean
Another measure, sometimes used when extreme values are present, is the trimmed mean.
It is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
For example, the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
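
As a rough illustration of the trimmed mean, here is a Python sketch. The 20 data values are hypothetical (one deliberately extreme), and rounding the number of trimmed values down to a whole count is an assumption; the slides do not say how to handle a fractional count.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the values from each end.
    Assumption: the count dropped from each end is rounded down."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100        # values to drop from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical sample of 20 values with one extreme observation (500).
sample = [10, 12, 13, 14, 15, 15, 16, 17, 18, 19,
          20, 21, 22, 23, 24, 25, 26, 27, 28, 500]

plain_mean = sum(sample) / len(sample)
mean_5pct = trimmed_mean(sample, 5)          # drops 10 and 500

print(f"mean = {plain_mean:.2f}, 5% trimmed mean = {mean_5pct:.2f}")
```

With the extreme value removed, the trimmed mean sits near the bulk of the data, while the ordinary mean is pulled far upward.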
Mode
A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data.
It occurs when dissimilar populations are combined in one sample.
Percentiles and Quartiles
Percentiles divide the sorted data into 100 groups and show how the data are spread over the interval from the smallest to the largest value.
For example, suppose you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you.
Deciles divide the data into 10 groups.
Quartiles divide the data into 4 groups.
In general, by the pth-order quantile or fractile (Z_p), we mean the value below which a proportion p of the total observations lie.
Percentiles and Quartiles
Put p = 1/4, 2/4, 3/4 to get quartiles.
Put p = 1/10, 2/10, ..., 9/10 to get deciles.
Put p = 1/100, 2/100, ..., 99/100 to get percentiles.

Step 1. Sort the observations.
Step 2. Calculate np, where n = number of observations.
Step 3. If np is not an integer, round up to the next integer and take the value at that position; if np is an integer, take the mean of the values at positions np and np + 1.
Third Quartile
Third quartile = 75th percentile
np = (75/100)(70) = 52.5, which is not an integer, so round up to position 53.
Third quartile = value at position 53 = 525
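
The three-step percentile rule can be sketched in Python as follows. The function name `percentile` is illustrative, the percentile is passed as a whole number k so that np = nk/100 can be tested exactly with integer arithmetic, and the 70 values used here are hypothetical stand-ins (the integers 1 to 70), not the apartment rents.

```python
def percentile(values, k):
    """k-th percentile (k a whole number from 1 to 99) by the sort /
    compute np / pick-position rule."""
    ordered = sorted(values)                  # Step 1: sort
    n = len(ordered)
    whole, remainder = divmod(n * k, 100)     # Step 2: np = n*k/100, kept exact
    if remainder:
        # np is not an integer: use the next integer position (1-based whole + 1).
        return ordered[whole]
    # np is an integer: mean of the values at positions np and np + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2

# Hypothetical data: 70 sorted values, so position and value coincide.
data = list(range(1, 71))

q3 = percentile(data, 75)   # np = 52.5, so position 53
q2 = percentile(data, 50)   # np = 35, so mean of positions 35 and 36
print(q3, q2)
```

Because the hypothetical values equal their positions, the result for k = 75 confirms that the rule selects position 53 when n = 70.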
Dispersion
Describes how similar a set of observations are to each other, or the degree of deviation (spread) of a set of data from their central value.
In general, the more spread out a distribution is, the larger the measure of dispersion will be.
Measures of Dispersion
There are five main measures of dispersion:
Range
Mean Deviation
Mean squared deviation (Variance)
Root mean squared deviation (Standard Deviation)
Inter-quartile range (IQR)
Measures

Measure: Range
Formula: \[ x_{\max} - x_{\min} \]
Excel formula: =MAX(Data)-MIN(Data)
Pro: Easy to calculate.
Con: Sensitive to extreme data values.

Measure: Mean Deviation
Formula: \[ \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert \]
Excel formula: =AVEDEV(Data)
Pro: Measures deviation accurately.
Con: Further algebraic treatment is not possible.
Measures

Measure: Population Variance
Formula: \[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \]
Excel formula: =VARP(array)
Pro: Important measure.
Con: Overestimates the error.

Measure: Sample Variance
Formula: \[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Excel formula: =VAR(array)
Pro: Important measure.
Con: Overestimates the error.
REMEMBER
For grouped data:
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i x_i^2 - \frac{n}{n-1}\,\bar{x}^2 \]
Measures

Measure: Population Standard Deviation
Formula: \[ \sigma = \sqrt{\sigma^2} \]
Excel formula: =STDEVP(array)
Pro: Best measure.

Measure: Sample Standard Deviation
Formula: \[ s = \sqrt{s^2} \]
Excel formula: =STDEV(array)
Pro: Best measure.
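
The range, mean deviation, variance, and standard deviation can be computed directly from their definitions. The Python sketch below reuses the six values from the even-n median example (11, 12, 15, 17, 21, 32) purely for illustration and cross-checks the variances against the standard library.

```python
import statistics

data = [11, 12, 15, 17, 21, 32]   # values from the even-n median example
n = len(data)
mean = sum(data) / n              # 108 / 6 = 18

data_range = max(data) - min(data)                        # x_max - x_min
mean_deviation = sum(abs(x - mean) for x in data) / n     # (1/n) * sum |x_i - mean|
sum_sq = sum((x - mean) ** 2 for x in data)               # sum of squared deviations
pop_variance = sum_sq / n                                 # divide by N
sample_variance = sum_sq / (n - 1)                        # divide by n - 1
pop_sd = pop_variance ** 0.5
sample_sd = sample_variance ** 0.5

print(data_range, round(mean_deviation, 3), pop_variance, sample_variance)
print(round(pop_sd, 3), round(sample_sd, 3))
# Cross-check with the standard library.
print(statistics.pvariance(data), statistics.variance(data))
```

Dividing by n - 1 instead of n makes the sample variance (60) larger than the population variance (50) for the same six values.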
Inter-quartile Range
The inter-quartile range (IQR) is defined as the difference between the third and first quartiles. Dividing the IQR by two gives the semi-interquartile range (SIR).
The first quartile is the 25th percentile
The third quartile is the 75th percentile
IQR = Q_3 - Q_1
SIR = (Q_3 - Q_1)/2
When To Use the SIR
The semi-interquartile range (SIR) is half the IQR, which is the range of the middle 50% of the data.
The SIR is often used with skewed data, as it is insensitive to extreme scores.
The SIR is used with open-ended distributions.
Interquartile Range
Example: Apartment Rents
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
Coefficient of Variation (CV)
Relative measure (unit free) used to compare variability when
(i) two variables with different units are compared
(ii) two variables with the same unit but different means are compared
Relative measure = (absolute measure / average) * 100
\[ CV = \frac{s}{\bar{x}} \times 100 \]
Sample Variance, Standard Deviation, and Coefficient of Variation
Example: Apartment Rents

Variance
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = 2996.16 \]
Standard Deviation
\[ s = \sqrt{s^2} = \sqrt{2996.16} = 54.74 \]
Coefficient of Variation
\[ \left( \frac{s}{\bar{x}} \times 100 \right)\% = \left( \frac{54.74}{490.80} \times 100 \right)\% = 11.15\% \]
The standard deviation is about 11% of the mean.
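
The arithmetic in the apartment-rent example can be reproduced in a few lines of Python. Only the summary figures quoted in the example (variance 2996.16, mean 490.80) are used, since the individual rents are not available here; the variable names are illustrative.

```python
import math

sample_variance = 2996.16   # s^2 from the apartment-rent example
sample_mean = 490.80        # sample mean from the apartment-rent example

s = math.sqrt(sample_variance)   # sample standard deviation
cv = s / sample_mean * 100       # coefficient of variation, in percent

print(f"s = {s:.2f}")
print(f"CV = {cv:.2f}%")
```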
Skewness
Skew is a measure of the asymmetry of the distribution of data.
Positive Skew
Negative Skew
Normal (skew = 0)
Measure of Skew
Skewness is a unit-free measure of the shape of any frequency distribution.
The coefficient can be used to compare two samples measured in different units, or one sample with a known reference distribution (e.g., the symmetric normal distribution).
Calculate the sample's skewness coefficient.
Nature of Skewness
If g_1 > 0, the distribution has positive skewness (it is right skewed).
If g_1 < 0, the distribution has negative skewness (it is left skewed).
If g_1 = 0, the distribution is symmetrical.
Kurtosis
Kurtosis is the relative length of the tails and the degree of concentration in the center.
Consider three kurtosis prototype shapes.
Kurtosis
When the distribution is normal, its kurtosis equals 3 and it is said to be mesokurtic.
When the distribution is more sharply peaked, with heavier tails, than the normal, its kurtosis is greater than 3 and it is said to be leptokurtic.
When the distribution is flatter, with lighter tails, than the normal, its kurtosis is less than 3 and it is said to be platykurtic.
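
The slides do not reproduce the formulas for the skewness coefficient g_1 or for kurtosis, so the Python sketch below assumes the common moment-based definitions: g_1 = m_3 / m_2^(3/2) and kurtosis = m_4 / m_2^2, where m_r is the r-th central moment. These definitions agree with the statement that a normal distribution has kurtosis 3. Excel's SKEW and KURT functions apply small-sample adjustments (and KURT reports the excess over 3), so their output will differ somewhat. The data are the six values from the even-n median example.

```python
def central_moment(values, order):
    """r-th central moment: the mean of (x - mean)^r."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n

def skewness(values):
    """Assumed moment coefficient of skewness, g1 = m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Assumed moment coefficient of kurtosis, m4 / m2^2 (normal = 3)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

data = [11, 12, 15, 17, 21, 32]
g1 = skewness(data)
k = kurtosis(data)

shape = "right skewed" if g1 > 0 else "left skewed" if g1 < 0 else "symmetrical"
print(f"g1 = {g1:.4f} ({shape}), kurtosis = {k:.4f}")
```

The long right tail created by the value 32 gives a positive g_1, matching the sign rule for right-skewed data.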
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value x_i is from the mean.
\[ z_i = \frac{x_i - \bar{x}}{s} \]
An observation's z-score is a measure of the relative location of the observation in a data set.
Excel's STANDARDIZE function can be used to compute the z-score.
z-Scores
Example: Apartment Rents
Standardized Values for Apartment Rents
For a rent of x_i = 425:
\[ z_i = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 \]
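
A short Python sketch of the z-score calculation, using the mean (490.80) and standard deviation (54.74) from the apartment-rent example. The function name `z_score` is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

z = z_score(425, 490.80, 54.74)
print(f"z = {z:.2f}")   # negative: the rent of 425 is below the mean
```

The negative sign carries the information that the observation lies below the mean, so it should not be dropped when reporting the value.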


Chebyshev's Theorem
At least (1 - 1/z^2) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
Chebyshev's theorem requires z > 1, but z need not be an integer.
Chebyshev's Theorem
At least 75% of the data values must be within z = 2 standard deviations of the mean.
At least 89% of the data values must be within z = 3 standard deviations of the mean.
At least 94% of the data values must be within z = 4 standard deviations of the mean.
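
The three percentages follow directly from the bound 1 - 1/z^2. A minimal Python sketch (the function name is illustrative) that reproduces them and shows a non-integer z:

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

for z in (1.5, 2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.2%}")
```

Values of z at or below 1 are rejected because the bound would be zero or negative and therefore says nothing.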
Empirical Rule
For data having a bell-shaped distribution:
68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
Empirical Rule
[Figure: normal curve over x, marked at 1, 2, and 3 standard deviations on either side of the mean, with 68.26%, 95.44%, and 99.72% of the area within those bands respectively.]
Detecting Outliers
An outlier is an unusually small or unusually large
value in a data set.
A data value with a z-score less than -3 or greater
than +3 might be considered an outlier.
Box Plot
A box plot is a graphical summary of data that can be used to identify outliers.
A key to the development of a box plot is the computation of the median and the quartiles Q_1 and Q_3.
Box Plot
Example: Apartment Rents
The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
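
The box-plot limits can be computed from the two quartiles alone. In the Python sketch below, Q1 = 445 and Q3 = 525 come from the apartment-rent example; the short list of rents used to demonstrate outlier flagging is hypothetical, and the function name is illustrative.

```python
def box_plot_limits(q1, q3):
    """Lower and upper limits located 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = box_plot_limits(445, 525)
print(lower, upper)

# Hypothetical rents, including one value beyond the upper limit.
rents = [425, 480, 525, 615, 700]
outliers = [x for x in rents if x < lower or x > upper]
print(outliers)
```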
Box Plot
Example: Apartment Rents
Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
Smallest value inside limits = 425
Largest value inside limits = 615
[Figure: box plot of the apartment rents on an axis running from 400 to 625.]
Weighted Mean
When the mean is computed by giving each data
value a weight that reflects its importance, it is
referred to as a weighted mean.
In the computation of a grade point average (GPA),
the weights are the number of credit hours earned for
each grade.
When data vary in importance, the analyst
must choose the weight that best reflects the
importance of each value.
Weighted Mean
\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]
where:
x_i = value of observation i
w_i = weight for observation i
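
A grade point average is a natural use of the weighted mean. The Python sketch below uses hypothetical grade points and credit hours; the function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4, 3, 2]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)
```

With equal weights the function reduces to the ordinary mean, which is a quick sanity check on any weighting scheme.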
Mean for Grouped Data
Sample Data
\[ \bar{x} = \frac{\sum f_i M_i}{n} \]
Population Data
\[ \mu = \frac{\sum f_i M_i}{N} \]
where:
f_i = frequency of class i
M_i = midpoint of class i
Sample Mean for Grouped Data
Example: Apartment Rents
\[ \bar{x} = \frac{34{,}525}{70} = 493.21 \]
This approximation differs by $2.41 from the actual sample mean of $490.80.
Variance for Grouped Data
For sample data
\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]
For population data
\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]
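
The grouped-data mean and variance formulas can be sketched in Python as follows. The class midpoints and frequencies are hypothetical (the apartment-rent frequency table is not available here), and the function name `grouped_stats` is illustrative.

```python
def grouped_stats(midpoints, freqs):
    """Mean, sample variance, and population variance from class
    midpoints M_i and class frequencies f_i."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
    return mean, sum_sq / (n - 1), sum_sq / n

# Hypothetical frequency table: three classes with midpoints 10, 20, 30.
midpoints = [10, 20, 30]
freqs = [2, 5, 3]

mean, sample_var, pop_var = grouped_stats(midpoints, freqs)
print(mean, round(sample_var, 3), pop_var)
```

Because every observation in a class is treated as if it sat at the class midpoint, these results approximate the values that the raw data would give.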
Sample Variance for Grouped Data
Example: Apartment Rents
Sample Variance
\[ s^2 = \frac{208{,}234.29}{70 - 1} = 3{,}017.89 \]
Sample Standard Deviation
\[ s = \sqrt{3{,}017.89} = 54.94 \]
This approximation differs by only $.20 from the actual standard deviation of $54.74.
ACKNOWLEDGEMENT
1) Statistics for Management by Levin & Rubin (Prentice Hall)
2) Business Statistics by Aczel & Sounderpandian (Pearson)
3) Business Statistics by Anderson, Sweeney & Williams (Cengage)
4) Applied Statistics in Business & Economics by Doane (McGraw-Hill)
