
INTRODUCTION TO STATISTICS AND DESCRIPTIVE STATISTICS

LECTURE GOALS
After completing this lecture, you should be able to:

• Know key definitions: population vs. sample
• Identify the different types of statistical charts
• Compute and interpret the mean, median, and mode for a set of data
• Compute the range, variance, and standard deviation and explain what these values mean

POPULATIONS and SAMPLES

• Population: all of the items or individuals of interest
  e.g., all voters in an election, all students in a university, all workers in a factory
• Parameter: a summary measure computed to describe a characteristic of the population
  e.g., the average age of the population
• Sample: a subset of the population
  e.g., 1,000 voters selected at random for interview, 500 students from a university
• Statistic: a summary measure computed to describe a characteristic of the sample
DESCRIBING DATA USING NUMERICAL MEASURES
Describing Data Numerically
• Central Location: Mean, Median, Mode
• Variation / Dispersion: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
MEAN (ARITHMETIC AVERAGE)
• The mean is the arithmetic average of the data values.

Population mean, where N = population size:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

Sample mean, where n = sample size:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
MEAN (ARITHMETIC AVERAGE)
• The most common measure of central tendency
• Mean = sum of the values divided by the number of values
• Affected by extreme values (outliers)

For example, consider the two data sets below:

2, 7, 9, 12, 15: mean = (2+7+9+12+15)/5 = 9
2, 7, 9, 12, 15, 70: mean = (2+7+9+12+15+70)/6 = 19.17
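A minimal Python sketch of the mean formula applied to the two data sets above, using only the standard library. The helper name `arithmetic_mean` is illustrative, and `statistics.mean` is used only as a cross-check.

```python
import math
from statistics import mean


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


without_outlier = [2, 7, 9, 12, 15]
with_outlier = [2, 7, 9, 12, 15, 70]  # same data plus one extreme value

for data in (without_outlier, with_outlier):
    m = arithmetic_mean(data)
    # Cross-check the hand formula against the standard library.
    assert math.isclose(m, mean(data))
    print(f"{data}: mean = {m:.2f}")
```

Running it prints a mean of 9.00 for the first set and 19.17 for the second, so the single value 70 is enough to more than double the mean.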
MEDIAN
• In an ordered array, the median is the "middle" number, i.e., the number that splits the distribution in half
• The median is not affected by extreme values (outliers)
• To find the median, sort the data values from low to high
• Find the value in the i = (n+1)/2 position (the middle position); when n is even, this position falls halfway between two values, and the median is their average

For example, consider the data set below:
10 5 19 8 3

First, rank the data in increasing order:
3 5 8 10 19

There are five observations in the data set, so n = 5 and

Position of the middle term = (5+1)/2 = 3

The median is 8.
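The (n+1)/2 position rule can be sketched in Python as follows. The function name `median_by_position` is hypothetical; averaging the two middle values when n is even is the standard convention rather than something spelled out in the worked example.

```python
from statistics import median


def median_by_position(values):
    """Median via the (n+1)/2 position rule (assumes a non-empty list)."""
    ordered = sorted(values)               # step 1: sort from low to high
    position = (len(ordered) + 1) / 2      # step 2: 1-based middle position
    lower = int(position)
    if position == lower:                  # odd n: a single middle value
        return ordered[lower - 1]
    # Even n: the position ends in .5, so average the two middle values.
    return (ordered[lower - 1] + ordered[lower]) / 2


data = [10, 5, 19, 8, 3]
print(median_by_position(data))  # position 3 of [3, 5, 8, 10, 19]
print(median(data))              # standard-library result for comparison
```

For the six-value set 2, 7, 9, 12, 15, 70 the position is 3.5, so the sketch averages 9 and 12 and returns 10.5; note that, unlike the mean, the median barely reacts to the outlier 70.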
MODE
• The value that occurs most often
• Not affected by extreme values
• There may be no mode
• There may be several modes

For example, consider the following two data sets:

77 69 74 81 71 68 74 73
Mode = 74

495, 486, 503, 495, 470, 505, 470, 499
Mode = 495 and 470 (two modes)
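Python's `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, which covers the "several modes" case. The `modes` wrapper below is a hypothetical helper, and treating a data set in which every value occurs only once as having no mode is an assumed convention.

```python
from statistics import multimode


def modes(values):
    """Return the value(s) that occur most often; [] means 'no mode'."""
    # Assumed convention: if every value occurs exactly once, there is no mode.
    if len(set(values)) == len(values):
        return []
    # multimode returns every value tied for the highest count,
    # in the order each value first appears in the data.
    return multimode(values)


print(modes([77, 69, 74, 81, 71, 68, 74, 73]))
print(modes([495, 486, 503, 495, 470, 505, 470, 499]))
```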

SHAPE OF A DISTRIBUTION
• Describes how the data are distributed
• Symmetric or skewed

Left-skewed: Mean < Median; the longer tail (with outliers) extends to the left.
  Example: 1, 30, 34, 40, 46, 50
Symmetric: Mean = Median.
  Example: 1, 2, 4, 6, 7
Right-skewed: Mean > Median; the longer tail (with outliers) extends to the right.
  Example: 2, 7, 9, 12, 15, 70
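A rough sketch of the mean-versus-median comparison for the three example sets, assuming "equal" means equal up to floating-point tolerance. This is only the rule of thumb stated above, not a formal skewness coefficient, and `describe_shape` is a hypothetical name.

```python
import math
from statistics import mean, median


def describe_shape(values):
    """Rule of thumb: classify the shape by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if math.isclose(m, med):
        return "symmetric"
    return "right-skewed" if m > med else "left-skewed"


examples = [
    [1, 30, 34, 40, 46, 50],
    [1, 2, 4, 6, 7],
    [2, 7, 9, 12, 15, 70],
]
for data in examples:
    print(f"{data}: mean = {mean(data):.2f}, median = {median(data):.2f}, "
          f"{describe_shape(data)}")
```

For the left-skewed set the mean (33.5) falls below the median (37); for the right-skewed set the mean (about 19.17) sits well above the median (10.5).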
Which measure of location is the "best"?

• The mean is generally used for symmetric distributions when no extreme values (outliers) exist.
• The median is often used for skewed distributions or when extreme values (outliers) exist.

Measures of Variation

Variation
• Range, Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation

Variation

• Measures of variation give information on the spread or variability of the data values.

Example data sets:

20, 25, 30, 35, 40
Mean: 30
Sample standard deviation (SD): 7.9

10, 20, 30, 40, 50
Mean: 30
Sample standard deviation (SD): 15.8

Same center (mean), different variation.
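The SD values quoted for these two sets match the sample standard deviation (dividing by n - 1). A quick standard-library check, assuming Python 3:

```python
from statistics import mean, stdev

narrow = [20, 25, 30, 35, 40]
wide = [10, 20, 30, 40, 50]

for data in (narrow, wide):
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    print(f"{data}: mean = {mean(data)}, SD = {stdev(data):.1f}")
```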
RANGE
• Simplest measure of variation
• Difference between the largest and the smallest observations:

$$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$$

Example:
[Figure: example data plotted on a number line from 0 to 14]

Disadvantages of the Range
• Ignores the way in which data are distributed
  [Figure: two differently distributed data sets plotted on number lines from 7 to 12]
• Sensitive to outliers
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119
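A short Python sketch of the range formula on the two data sets above; `data_range` is a hypothetical helper name. Changing a single observation from 5 to 120 is enough to move the range from 4 to 119.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# The two data sets above; they differ only in their last value.
first = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
second = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 120]

print(data_range(first))   # 5 - 1 = 4
print(data_range(second))  # 120 - 1 = 119
```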

Interquartile Range
• Difference between the 3rd quartile and the 1st quartile of the data
• Interquartile range = 3rd quartile – 1st quartile
• To find the 1st and 3rd quartiles, sort the data values from low to high:
  1st quartile: the value in the (n+1)/4 position
  3rd quartile: the value in the 3(n+1)/4 position

Find the Interquartile Range
Sample data in ordered array: 11 12 13 16 16 17 18 21 22
*Make sure the data are ranked in increasing order.

1st quartile position = (9+1)/4 = 2.5, so
1st quartile = (12+13)/2 = 12.5

3rd quartile position = 3(9+1)/4 = 7.5, so
3rd quartile = (18+21)/2 = 19.5

Interquartile range = 19.5 – 12.5 = 7
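The (n+1)/4 and 3(n+1)/4 position rules can be sketched in Python as below. The helper names are hypothetical. Resolving a fractional position by linear interpolation is an assumption; for an x.5 position it reduces to the averaging used in the worked example. The default `'exclusive'` method of `statistics.quantiles` uses the same (n+1)-based positions, so it serves as a cross-check.

```python
from statistics import quantiles


def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower = int(position)            # whole-number part of the position
    fraction = position - lower      # e.g. 0.5 for position 2.5
    value = sorted_values[lower - 1]
    if fraction:
        # Move the given fraction of the way towards the next value.
        value += fraction * (sorted_values[lower] - value)
    return value


def interquartile_range(values):
    """Return (Q1, Q3, IQR) using the (n+1)/4 and 3(n+1)/4 positions."""
    ordered = sorted(values)         # the data must be ranked first
    n = len(ordered)
    if n < 3:
        raise ValueError("the (n+1)/4 position rule needs at least 3 values")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q3, iqr = interquartile_range(data)
print(q1, q3, iqr)
print(quantiles(data, n=4))  # [Q1, median, Q3] from the standard library
```

Other textbooks and software use slightly different quartile conventions, so results can differ from this sketch for other sample sizes.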

VARIANCE

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
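A minimal Python sketch of the two variance formulas, reusing the data set 20, 25, 30, 35, 40 from the Variation example as assumed input. The only difference between them is the divisor: N for the population version, n - 1 for the sample version. The function names are hypothetical; `statistics.pvariance` and `statistics.variance` are the standard-library counterparts.

```python
from statistics import pvariance, variance


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


data = [20, 25, 30, 35, 40]
print(population_variance(data), pvariance(data))  # divide by N
print(sample_variance(data), variance(data))       # divide by n - 1
```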
STANDARD DEVIATION

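Population standard deviation:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \quad \text{OR} \quad \sigma = \sqrt{\frac{1}{N}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{N}\right]}$$

Sample standard deviation:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \quad \text{OR} \quad s = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right]}$$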
Calculation Example: Sample Standard Deviation

Sample data (xᵢ): 48.50, 38.40, 65.50, 22.60, 79.80, 54.60

x        x²
48.50    2352.25
38.40    1474.56
65.50    4290.25
22.60     510.76
79.80    6368.04
54.60    2981.16
∑x = 309.40    ∑x² = 17,977.02

$$s = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right]} = \sqrt{\frac{1}{6-1}\left[17{,}977.02 - \frac{(309.40)^2}{6}\right]} \approx 20.11$$
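The worked example can be checked in Python by computing the sample standard deviation with both the shortcut formula (from ∑x and ∑x²) and the definitional formula (squared deviations from x̄), then comparing them with `statistics.stdev`. The variable names are illustrative.

```python
import math
from statistics import stdev

data = [48.50, 38.40, 65.50, 22.60, 79.80, 54.60]
n = len(data)
sum_x = sum(data)                   # ∑x
sum_x2 = sum(x * x for x in data)   # ∑x²

# Shortcut formula, as used in the worked example.
s_shortcut = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# Definitional formula: squared deviations from the sample mean.
x_bar = sum_x / n
s_definition = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

print(f"sum x = {sum_x:.2f}, sum x^2 = {sum_x2:.2f}")
print(f"s (shortcut)   = {s_shortcut:.2f}")
print(f"s (definition) = {s_definition:.2f}")
print(f"s (stdev)      = {stdev(data):.2f}")
```

The two formulas are algebraically identical. In floating point, however, the shortcut form can lose precision when the values are large and close together, so the definitional form is usually the safer choice for hand-written code.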
Coefficient of Variation

• The coefficient of variation is used to measure variation relative to the mean.

Population:

$$CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$$

Sample:

$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$

Comparing Coefficients of Variation
• Stock A:
  Average price last year = $50
  Standard deviation = $5
  CV = ($5/$50) × 100% = 10%
• Stock B:
  Average price last year = $100
  Standard deviation = $5
  CV = ($5/$100) × 100% = 5%

Both stocks have the same standard deviation, but stock B is less variable relative to its price; i.e., the price of stock B is less volatile than the price of stock A.
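A minimal sketch of the coefficient of variation for the two stocks, with the mean and standard deviation taken from the comparison above. The function name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(std_dev, average):
    """CV as a percentage: the standard deviation relative to the mean."""
    return std_dev * 100 / average


cv_a = coefficient_of_variation(5, 50)   # Stock A: SD $5, mean price $50
cv_b = coefficient_of_variation(5, 100)  # Stock B: SD $5, mean price $100
print(f"Stock A: {cv_a:.0f}%, Stock B: {cv_b:.0f}%")
```

Because CV is a ratio, the dollar units cancel. That is what makes it usable for comparing variability across data sets measured on different scales.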
