Data
Categorical (defined categories or groups)
Examples: Marital Status; Are you registered to vote?; Eye Color
Numerical
Discrete (counted items)
Examples: Number of Children; Defects per hour
Continuous (measured characteristics)
Examples: Weight; Voltage
Statistics for Business and Economics, 6e 2007 Pearson Education, Inc. Chap 2-2
Measurement Levels
Quantitative Data
Ratio Data: differences between measurements, true zero exists
Interval Data: differences between measurements but no true zero
Qualitative Data
Ordinal Data: ordered categories (rankings, order, or scaling)
Nominal Data: categories (no ordering or direction)
Graphical
Presentation of Data
Graph
Graphical
Presentation of Data
(continued)
Techniques reviewed in this chapter:
Categorical Variables
Numerical Variables
Tables and Graphs for
Categorical Variables
Categorical Data
Frequency Distribution Table
Bar Chart
Pie Chart
Pareto Diagram
The Frequency
Distribution Table
Summarize data by category
Bar and Pie Charts
Bar Chart Example
[Figure: bar chart of Number of Patients by Hospital Unit (Cardiac Care, Emergency, Intensive Care, Maternity, Surgery)]
Pie Chart Example
Hospital Unit     Number of Patients   % of Total
Cardiac Care      1,052                11.93
Emergency         2,245                25.46
Intensive Care    340                  3.86
Maternity         552                  6.26
Surgery           4,630                52.50
[Figure: pie chart "Hospital Patients by Unit": Surgery 53%, Emergency 25%, Cardiac Care 12%, Maternity 6%, Intensive Care 4%]
(Percentages are rounded to the nearest percent)
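The "% of Total" column can be recomputed from the patient counts. The following is a minimal Python sketch; the variable names (`patients`, `percent_of_total`) are chosen here for illustration and are not part of the original slides.

```python
# Patient counts by hospital unit, taken from the pie chart example.
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}

total = sum(patients.values())

# Share of the total for each unit, as a percentage rounded to two decimals.
percent_of_total = {
    unit: round(100 * count / total, 2) for unit, count in patients.items()
}

for unit, pct in percent_of_total.items():
    print(f"{unit:15s} {patients[unit]:6,d} {pct:6.2f}")
```

Rounding each percentage to the nearest whole number gives the labels used on the pie chart.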
Graphs for Time-Series Data
Line Chart Example
[Figure: line chart; vertical axis: thousands of subscribers (0 to 350); horizontal axis: year (1990 to 2006)]
Frequency Distributions
Why Use Frequency Distributions?
Class Intervals
and Class Boundaries
Frequency Distribution Example
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution Example
(continued)
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute interval width: 10 (46/5 then round up)
Frequency Distribution Example
(continued)
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Interval               Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100
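The steps of this example (range, number of classes, interval width, class counts) can be sketched in Python. Rounding the raw width up to the next multiple of 10 and starting the first class at a multiple of the width are assumptions made here to reproduce the intervals in the example; the slides only say "round up".

```python
import math

# The 20 observations from the frequency distribution example.
data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(data) - min(data)        # 58 - 12 = 46
raw_width = data_range / num_classes      # 46 / 5 = 9.2
width = math.ceil(raw_width / 10) * 10    # assumed rule: round up to a multiple of 10
lower = (min(data) // width) * width      # assumed rule: first class starts at 10

# Count the observations that fall in each "lower but less than upper" class.
counts = [0] * num_classes
for x in data:
    counts[(x - lower) // width] += 1

n = len(data)
for k, f in enumerate(counts):
    lo = lower + k * width
    print(f"{lo} but less than {lo + width}: {f}  {f / n:.2f}  {100 * f / n:.0f}")
```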
Histogram
[Figure: histogram "Daily High Temperature"; horizontal axis: temperature in degrees; vertical axis: frequency; bars of height 3, 6, 5, 4, 2 for the five intervals from 10 to 60; no gaps between bars]
How Many Class Intervals?
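Many (narrow class intervals)
may yield a distribution with gaps from empty classes
[Figure: histogram of temperature with many narrow classes; upper class endpoints 4, 8, 12, ..., 60, More]
Few (wide class intervals)
may compress variation too much and yield a blocky distribution
[Figure: histogram of temperature with few wide classes; upper class endpoints 0, 30, 60, More]
(X axis labels are upper class endpoints)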
The Cumulative
Frequency Distribution
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class   Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
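The rows of this table did not survive, but the cumulative columns follow directly from the class frequencies of the earlier frequency distribution example. A hedged Python sketch, with illustrative variable names:

```python
from itertools import accumulate

# Class labels and frequencies from the frequency distribution example.
classes = [
    "10 but less than 20",
    "20 but less than 30",
    "30 but less than 40",
    "40 but less than 50",
    "50 but less than 60",
]
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)

percentages = [100 * f / n for f in frequencies]

# Running totals of the frequencies, then expressed as a share of n.
cumulative_frequencies = list(accumulate(frequencies))
cumulative_percentages = [100 * c / n for c in cumulative_frequencies]

for row in zip(classes, frequencies, percentages,
               cumulative_frequencies, cumulative_percentages):
    print(*row, sep="  ")
```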
Distribution Shape
The shape of the distribution is said to be
symmetric if the observations are balanced,
or evenly distributed, about the center.
Symmetric Distribution
[Figure: histogram of a symmetric distribution; vertical axis: frequency (0 to 10); horizontal axis: values 1 to 9]
Distribution Shape
(continued)
The shape of the distribution is said to be
skewed if the observations are not
symmetrically distributed around the center.
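Positively Skewed Distribution
A positively skewed distribution has a tail that extends to the right in the direction of positive values.
[Figure: histogram of a positively skewed distribution; vertical axis: frequency; horizontal axis: values 1 to 9]
Negatively Skewed Distribution
A negatively skewed distribution has a tail that extends to the left in the direction of negative values.
[Figure: histogram of a negatively skewed distribution; vertical axis: frequency; horizontal axis: values 1 to 9]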
Scatter Diagrams
Scatter Diagram Example
Volume per Day   (vertical-axis variable)
29               146
33               160
38               167
42               170
50               188
55               195
60               200
[Figure: scatter diagram of the pairs above; horizontal axis: Volume per Day (0 to 70)]
Cross Tables
Cross Table Example
Graphing
Multivariate Categorical Data
(continued)
[Figure: bar chart with categories Savings, CD, Bonds, Stocks; value axis from 0 to 60]
Statistics for
Business and Economics
6th Edition
Mode Variance
Standard Deviation
Coefficient of Variation
Measures of Central Tendency
Overview
Central Tendency
Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
Arithmetic Mean
The arithmetic mean (mean) is the most
common measure of central tendency
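For a population of N values:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
where the $x_i$ are the population values and N is the population size
For a sample of size n:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
where the $x_i$ are the observed values and n is the sample size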
Arithmetic Mean
(continued)
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Median
In an ordered list, the median is the middle
number (50% above, 50% below)
[Figure: two data sets plotted on a scale from 0 to 10; Median = 3 in both]
Finding the Median
Median position = $\frac{n+1}{2}$ position in the ordered data
If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of
the two middle numbers
Note that $\frac{n+1}{2}$ is not the value of the median, only the
position of the median in the ranked data
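The position rule can be sketched in Python as follows. The function name `median` is an illustrative choice; the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Median via the (n + 1) / 2 position rule on the ranked data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position, not the median value itself
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take the middle value.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median([1, 2, 3, 4, 5]))    # odd n, position 3
print(median([1, 2, 3, 4, 10]))   # odd n, extreme value does not move the median
print(median([1, 2, 3, 4]))       # even n, position 2.5
```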
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
[Figure: two data sets; on a scale from 0 to 14, Mode = 9; on a scale from 0 to 6, No Mode]
Review Example
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Review Example:
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000
Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
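These summary statistics can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative, and the mode is computed here as an extra even though its line does not appear in the surviving slide text.

```python
import statistics

# The five house prices from the review example.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # sum divided by the number of values
median_price = statistics.median(house_prices)  # middle value of the ranked data
mode_price = statistics.mode(house_prices)      # most frequently occurring value

print(f"Mean:   ${mean_price:,.0f}")
print(f"Median: ${median_price:,.0f}")
print(f"Mode:   ${mode_price:,.0f}")
```

The single $2,000,000 house pulls the mean well above the median, which is why the choice of measure matters for skewed data.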
Which measure of location
is the best?
Shape of a Distribution
Measures of Variability
Variation
Same center,
different variation
Range
Example:
[Figure: data plotted on a scale from 0 to 14]
Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two differently distributed data sets on a scale from 7 to 12]
Range = 12 - 7 = 5 for both data sets
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
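The sensitivity to outliers can be reproduced with a short Python sketch. The helper name `value_range` is an illustrative choice, not something defined in the slides.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# The two data sets from the slide; they differ only in the last value.
without_outlier = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                   2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_outlier = without_outlier[:-1] + [120]

print(value_range(without_outlier))  # 5 - 1
print(value_range(with_outlier))     # 120 - 1
```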
Interquartile Range
Interquartile Range
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(25% of the observations lie in each of the four segments)
Interquartile range = 57 - 30 = 27
Quartiles
Quartiles split the ranked data into 4 segments with
an equal number of values per segment
Q1 Q2 Q3
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are
larger)
Only 25% of the observations are greater than the third
quartile
Quartile Formulas
Quartiles
(n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values:
Q1 = 12.5
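The position calculation can be sketched in Python. Two things here are assumptions rather than content from the slides: the data set is hypothetical, chosen only so that the 2nd and 3rd ranked values are 12 and 13 (consistent with Q1 = 12.5), and the same q(n+1) position rule is assumed to extend to Q2 and Q3 with q = 0.50 and q = 0.75.

```python
def quartile(ordered, q):
    """Value at the q * (n + 1) position (1-based) of ranked data.

    Fractional positions are resolved by linear interpolation between
    the two neighbouring ranked values.
    """
    n = len(ordered)
    position = q * (n + 1)
    lower_rank = int(position)           # whole-number part of the position
    fraction = position - lower_rank     # how far to move toward the next value
    if lower_rank < 1:
        return ordered[0]
    if lower_rank >= n:
        return ordered[-1]
    lower_value = ordered[lower_rank - 1]
    upper_value = ordered[lower_rank]
    return lower_value + fraction * (upper_value - lower_value)


# Hypothetical ranked sample with n = 9 (not taken from the slides).
ranked = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1 = quartile(ranked, 0.25)  # position 2.5
q2 = quartile(ranked, 0.50)  # position 5.0, the median
q3 = quartile(ranked, 0.75)  # position 7.5
print(q1, q2, q3, "IQR =", q3 - q1)
```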
Population Variance
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Where $\mu$ = population mean
N = population size
$x_i$ = ith value of the variable x
Sample Variance
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Where $\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith value of the variable x
Population Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
Calculation Example:
Sample Standard Deviation
Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
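The worked arithmetic for this example is not in the surviving text, so the following Python sketch carries it out step by step using the sample standard deviation formula (divisor n - 1). Variable names are illustrative.

```python
import math

# Sample data from the calculation example.
sample = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(sample)
mean = sum(sample) / n                                  # 128 / 8 = 16

# Squared deviations of each observation from the sample mean.
squared_deviations = [(x - mean) ** 2 for x in sample]
sum_of_squares = sum(squared_deviations)                # 130

sample_variance = sum_of_squares / (n - 1)              # 130 / 7
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean}, s^2 = {sample_variance:.4f}, s = {sample_sd:.4f}")
```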
Weighted Mean
$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i}$
Where wi is the weight of the ith observation
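A minimal Python sketch of the weighted mean follows. The values and weights are hypothetical numbers made up for illustration; they do not come from the slides.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical observations and weights (illustration only).
values = [10, 20, 30]
weights = [1, 1, 2]

print(weighted_mean(values, weights))  # (10 + 20 + 60) / 4
```

With equal weights the result reduces to the ordinary arithmetic mean.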
Approximations for Grouped
Data
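Suppose a data set contains values $m_1, m_2, \ldots, m_K$,
occurring with frequencies $f_1, f_2, \ldots, f_K$
For a population of N observations, the mean is
$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$ where $N = \sum_{i=1}^{K} f_i$
For a sample of n observations, the mean is
$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$ where $n = \sum_{i=1}^{K} f_i$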
Approximations for Grouped
Data
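Suppose a data set contains values $m_1, m_2, \ldots, m_K$,
occurring with frequencies $f_1, f_2, \ldots, f_K$
For a population of N observations, the variance is
$\sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}$
For a sample of n observations, the variance is
$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n-1}$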
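The grouped-data formulas can be sketched in Python. As an illustration, the frequencies are taken from the earlier temperature frequency distribution, and the class midpoints (15, 25, 35, 45, 55) are used as the values $m_i$; using midpoints is an assumption made here, not something stated in the surviving slide text.

```python
# Assumed values m_i: midpoints of the classes 10-20, 20-30, ..., 50-60.
midpoints = [15, 25, 35, 45, 55]
# Frequencies f_i from the frequency distribution example.
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)

# Grouped mean: sum of f_i * m_i divided by the number of observations.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Sum of f_i * (m_i - mean)^2, shared by both variance formulas.
weighted_squares = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
)

population_variance = weighted_squares / n        # divisor N
sample_variance = weighted_squares / (n - 1)      # divisor n - 1

print(grouped_mean, population_variance, round(sample_variance, 4))
```

These are approximations: every observation in a class is treated as if it sat at the class midpoint.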
The Sample Covariance
The covariance measures the strength of the linear relationship
between two variables
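The population covariance:
$\mathrm{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
The sample covariance:
$\mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$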
Only concerned with the strength of the relationship
No causal effect is implied
Interpreting Covariance
Coefficient of Correlation
Measures the relative strength of the linear relationship
between two variables
Features of
Correlation Coefficient, r
Unit free
Ranges between -1 and 1
The closer to -1, the stronger the negative linear
relationship
The closer to 1, the stronger the positive linear
relationship
The closer to 0, the weaker the linear
relationship
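These features can be checked numerically. The sketch below computes the sample covariance and then the sample correlation as $r = s_{xy} / (s_x s_y)$, the standard definition, which is assumed here because the formula slide itself is empty in this text. The function names are illustrative, and the data pairs are reused from the scatter diagram example.

```python
import math


def sample_covariance(x, y):
    """Sum of (x_i - x_bar)(y_i - y_bar) divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """r = s_xy / (s_x * s_y); unit free and between -1 and 1."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is s_x^2
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


# Pairs reused from the scatter diagram example.
x = [29, 33, 38, 42, 50, 55, 60]
y = [146, 160, 167, 170, 188, 195, 200]

r = sample_correlation(x, y)
print(f"covariance = {sample_covariance(x, y):.2f}, r = {r:.3f}")
```

Rescaling either variable (for example, multiplying every y by 1000) changes the covariance but leaves r unchanged, which is what "unit free" means.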
Scatter Plots of Data with Various
Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0 (top row) and r = +1, r = +.3, r = 0 (bottom row)]
Interpreting the Result
r = .733
There is a relatively strong positive linear relationship between
test score #1 and test score #2
[Figure: Scatter Plot of Test Scores; horizontal axis: Test #1 Score (70 to 100); vertical axis: Test #2 Score (70 to 100)]
Obtaining Linear Relationships
$Y = \beta_0 + \beta_1 X$
Least Squares Regression
Estimates for coefficients $\beta_0$ and $\beta_1$ are found to
minimize the sum of the squared residuals
The least-squares regression line, based on sample
data, is
$\hat{y} = b_0 + b_1 x$
Where $b_1$ is the slope of the line and $b_0$ is the y-intercept:
$b_1 = \frac{\mathrm{Cov}(x, y)}{s_x^2} = r \frac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$
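The slope and intercept formulas translate directly into code. The following Python sketch uses illustrative function names and, as example data, the pairs from the scatter diagram example; it is one straightforward way to apply the formulas, not a reference implementation.

```python
def least_squares_line(x, y):
    """Return (b0, b1) with b1 = Cov(x, y) / s_x^2 and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and sample variance of x, both with divisor n - 1.
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - x_bar) ** 2 for a in x) / (n - 1)
    b1 = cov_xy / var_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Pairs reused from the scatter diagram example.
x = [29, 33, 38, 42, 50, 55, 60]
y = [146, 160, 167, 170, 188, 195, 200]

b0, b1 = least_squares_line(x, y)
print(f"y_hat = {b0:.2f} + {b1:.4f} x")

# Predicted value at a hypothetical x of 45 (chosen for illustration only).
print(f"prediction at x = 45: {b0 + b1 * 45:.1f}")
```

By construction the fitted line passes through the point of means $(\bar{x}, \bar{y})$, since $b_0 = \bar{y} - b_1 \bar{x}$.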