
Statistics for Business and Economics
6th Edition

Describing Data: Graphical


Types of Data

Data
  Categorical (defined categories or groups)
    Examples: marital status; "Are you registered to vote?"; eye color
  Numerical
    Discrete (counted items)
      Examples: number of children; defects per hour
    Continuous (measured characteristics)
      Examples: weight; voltage

Statistics for Business and Economics, 6e 2007 Pearson Education, Inc. Chap 2-2
Measurement Levels

Quantitative Data
  Ratio Data: differences between measurements, true zero exists
  Interval Data: differences between measurements but no true zero
Qualitative Data
  Ordinal Data: ordered categories (rankings, order, or scaling)
  Nominal Data: categories (no ordering or direction)

Graphical Presentation of Data

Data in raw form are usually not easy to use for decision making
Some type of organization is needed:
  Table
  Graph
The type of graph to use depends on the variable being summarized

Graphical Presentation of Data (continued)

Techniques reviewed in this chapter:

Categorical Variables:
  Frequency distribution
  Bar chart
  Pie chart
  Pareto diagram
Numerical Variables:
  Line chart
  Frequency distribution
  Histogram and ogive
  Stem-and-leaf display
  Scatter plot

Tables and Graphs for Categorical Variables

Categorical Data
  Tabulating Data: frequency distribution table
  Graphing Data: bar chart, pie chart, Pareto diagram

The Frequency Distribution Table

Summarize data by category

Example: Hospital Patients by Unit

Hospital Unit      Number of Patients
Cardiac Care                    1,052
Emergency                       2,245
Intensive Care                    340
Maternity                         552
Surgery                         4,630

(Variables are categorical)

Bar and Pie Charts

Bar charts and pie charts are often used for qualitative (category) data
Height of bar or size of pie slice shows the frequency or percentage for each category

Bar Chart Example

Hospital Unit      Number of Patients
Cardiac Care                    1,052
Emergency                       2,245
Intensive Care                    340
Maternity                         552
Surgery                         4,630

[Figure: bar chart "Hospital Patients by Unit"; horizontal axis: hospital unit; vertical axis: number of patients per year, 0 to 5000]

Pie Chart Example

Hospital Unit      Number of Patients   % of Total
Cardiac Care                    1,052        11.93
Emergency                       2,245        25.46
Intensive Care                    340         3.86
Maternity                         552         6.26
Surgery                         4,630        52.50

[Figure: pie chart "Hospital Patients by Unit": Surgery 53%, Emergency 25%, Cardiac Care 12%, Maternity 6%, Intensive Care 4%]
(Percentages are rounded to the nearest percent)
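
The "% of Total" column can be reproduced from the counts alone. The following is a minimal Python sketch, assuming the five patient counts from the example; the variable names are illustrative and not part of the slides.

```python
# Patient counts copied from the hospital example.
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}

total = sum(patients.values())

# Share of the total for each unit, in percent, rounded to two decimals.
percent_of_total = {
    unit: round(100 * count / total, 2) for unit, count in patients.items()
}

for unit, pct in percent_of_total.items():
    print(f"{unit:<15} {patients[unit]:>6,} {pct:>7.2f}")
```

Rounding each share to the nearest whole percent gives the labels used on the pie chart.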

Graphs for Time-Series Data

A line chart (time-series plot) is used to show the values of a variable over time
Time is measured on the horizontal axis
The variable of interest is measured on the vertical axis

Line Chart Example

[Figure: line chart "Magazine Subscriptions by Year"; horizontal axis: year, 1990 to 2006; vertical axis: thousands of subscribers, 0 to 350]

Frequency Distributions

What is a Frequency Distribution?

A frequency distribution is a list or a table containing class groupings (categories or ranges within which the data fall) and the corresponding frequencies with which data fall within each class or category

Why Use Frequency Distributions?

A frequency distribution is a way to summarize data
The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data

Class Intervals and Class Boundaries

Each class grouping has the same width
Determine the width of each interval by

$w = \text{interval width} = \frac{\text{largest number} - \text{smallest number}}{\text{number of desired intervals}}$

Use at least 5 but no more than 15-20 intervals
Intervals never overlap
Round up the interval width to get desirable interval endpoints

Frequency Distribution Example

Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature

24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Frequency Distribution Example (continued)

Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute interval width: 10 (46/5 = 9.2, then round up)
Determine interval boundaries: 10 but less than 20, 20 but less than 30, . . . , 50 but less than 60
Count observations and assign to classes

Frequency Distribution Example (continued)

Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Interval               Frequency   Relative Frequency   Percentage
10 but less than 20         3             .15               15
20 but less than 30         6             .30               30
30 but less than 40         5             .25               25
40 but less than 50         4             .20               20
50 but less than 60         2             .10               10
Total                      20            1.00              100
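
The counting step can be sketched in a few lines of Python. This is an illustrative sketch, assuming the slide's choices of five classes of width 10 starting at 10; the names `start`, `width`, and `counts` are hypothetical.

```python
# Daily high temperatures from the insulation example.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# Class layout taken from the slides: five classes of width 10 starting at 10.
start, width, num_classes = 10, 10, 5

# Assign each observation to the class "lower but less than lower + width".
counts = [0] * num_classes
for t in temps:
    k = (t - start) // width
    if not 0 <= k < num_classes:
        raise ValueError(f"{t} falls outside the class boundaries")
    counts[k] += 1

n = len(temps)
for k, f in enumerate(counts):
    lower = start + k * width
    print(f"{lower} but less than {lower + width}: "
          f"frequency {f}, relative frequency {f / n:.2f}, "
          f"percentage {100 * f / n:.0f}")
```

Integer division by the class width works here only because every class has the same width, as the slides require.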

Histogram

A graph of the data in a frequency distribution is called a histogram
The interval endpoints are shown on the horizontal axis
The vertical axis is either frequency, relative frequency, or percentage
Bars of the appropriate heights are used to represent the number of observations within each class

Histogram Example

Interval               Frequency
10 but less than 20         3
20 but less than 30         6
30 but less than 40         5
40 but less than 50         4
50 but less than 60         2

[Figure: histogram "Daily High Temperature"; horizontal axis: temperature in degrees, 0 to 70; vertical axis: frequency; no gaps between bars]

How Many Class Intervals?

Many (narrow class intervals)
  may yield a very jagged distribution with gaps from empty classes
  can give a poor indication of how frequency varies across classes
[Figure: histogram of temperature with narrow classes (upper endpoints 4, 8, ..., 60); vertical axis: frequency]

Few (wide class intervals)
  may compress variation too much and yield a blocky distribution
  can obscure important patterns of variation
[Figure: histogram of temperature with wide classes (upper endpoints 0, 30, 60); vertical axis: frequency]

(X axis labels are upper class endpoints)

The Cumulative Frequency Distribution

Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20         3          15                 3                      15
20 but less than 30         6          30                 9                      45
30 but less than 40         5          25                14                      70
40 but less than 50         4          20                18                      90
50 but less than 60         2          10                20                     100
Total                      20         100
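
The cumulative columns are running totals of the class frequencies. A minimal Python sketch, assuming the five class frequencies from the table (variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies for 10-20, 20-30, 30-40, 40-50, 50-60.
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)

# Running total of the frequencies, then the same totals as percentages.
cumulative = list(accumulate(frequencies))
cumulative_pct = [100 * c / n for c in cumulative]

for f, c, p in zip(frequencies, cumulative, cumulative_pct):
    print(f"frequency {f}, cumulative frequency {c}, cumulative percentage {p:.0f}")
```

The last cumulative frequency always equals the number of observations, which is a quick check on the table.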

Distribution Shape

The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center.

[Figure: histogram "Symmetric Distribution"; vertical axis: frequency]

Distribution Shape (continued)

The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.

A positively skewed distribution (skewed to the right) has a tail that extends to the right in the direction of positive values.
[Figure: histogram "Positively Skewed Distribution"; vertical axis: frequency]

A negatively skewed distribution (skewed to the left) has a tail that extends to the left in the direction of negative values.
[Figure: histogram "Negatively Skewed Distribution"; vertical axis: frequency]

Scatter Diagrams

Scatter diagrams are used for paired observations taken from two numerical variables

The scatter diagram: one variable is measured on the vertical axis and the other variable is measured on the horizontal axis

Scatter Diagram Example

Volume per day   Cost per day
      23              125
      26              140
      29              146
      33              160
      38              167
      42              170
      50              188
      55              195
      60              200

[Figure: scatter plot "Cost per Day vs. Production Volume"; horizontal axis: volume per day, 0 to 70; vertical axis: cost per day, 0 to 250]

Cross Tables

Cross tables (or contingency tables) list the number of observations for every combination of values for two categorical or ordinal variables

If there are r categories for the first variable (rows) and c categories for the second variable (columns), the table is called an r x c cross table

Cross Table Example

4 x 3 Cross Table for Investment Choices by Investor (values in $1000s)

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                   46.5          55          27.5        129
Bonds                    32.0          44          19.0         95
CD                       15.5          20          13.5         49
Savings                  16.0          28           7.0         51
Total                   110.0         147          67.0        324
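
The margins of a cross table are row and column sums of the interior cells. The sketch below, assuming the twelve interior values from the investment example, recomputes the "Total" row and column; the nested-dictionary layout is just one convenient representation, not something the slides prescribe.

```python
# Interior cells of the 4 x 3 cross table (values in $1000s).
cross_table = {
    "Stocks":  {"Investor A": 46.5, "Investor B": 55.0, "Investor C": 27.5},
    "Bonds":   {"Investor A": 32.0, "Investor B": 44.0, "Investor C": 19.0},
    "CD":      {"Investor A": 15.5, "Investor B": 20.0, "Investor C": 13.5},
    "Savings": {"Investor A": 16.0, "Investor B": 28.0, "Investor C": 7.0},
}
investors = ["Investor A", "Investor B", "Investor C"]

# Row totals: sum across investors for each investment category.
row_totals = {cat: sum(row.values()) for cat, row in cross_table.items()}

# Column totals: sum across categories for each investor.
column_totals = {
    inv: sum(row[inv] for row in cross_table.values()) for inv in investors
}

grand_total = sum(row_totals.values())
print(row_totals, column_totals, grand_total)
```

The grand total can be reached from either margin, so summing the column totals is a useful consistency check.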

Graphing Multivariate Categorical Data (continued)

Side by side bar charts

[Figure: side-by-side bar chart "Comparing Investors"; categories: Savings, CD, Bonds, Stocks; value axis: 0 to 60; series: Investor A, Investor B, Investor C]


Describing Data: Numerical


Describing Data Numerically

Central Tendency:
  Arithmetic Mean
  Median
  Mode
Variation:
  Range
  Interquartile Range
  Variance
  Standard Deviation
  Coefficient of Variation

Measures of Central Tendency

Overview

Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value

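Arithmetic Mean

The arithmetic mean (mean) is the most common measure of central tendency

For a population of N values:

$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$

where the $x_i$ are the population values and N is the population size

For a sample of size n:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

where the $x_i$ are the observed values and n is the sample size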

Arithmetic Mean (continued)

The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)

Values 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Values 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4

Median

In an ordered list, the median is the middle number (50% above, 50% below)

[Figure: two dot plots on a 0 to 10 scale; Median = 3 in both]

Not affected by extreme values

Finding the Median

The location of the median:

$\text{Median position} = \frac{n+1}{2}$ position in the ordered data

If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of the two middle numbers

Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data

Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes

[Figure: two dot plots; on the 0 to 14 scale, Mode = 9; on the 0 to 6 scale, No Mode]

Review Example

Five houses on a hill by the beach

House Prices:
$2,000,000
   500,000
   300,000
   100,000
   100,000

Review Example: Summary Statistics

House Prices:
$2,000,000
   500,000
   300,000
   100,000
   100,000
Sum 3,000,000

Mean: $3,000,000/5 = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
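
The three summary statistics for the house prices can be checked with Python's standard `statistics` module. This is a small sketch using the five prices from the example; only the variable names are invented.

```python
import statistics

# The five house prices from the review example, in dollars.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # sum of values / number of values
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value

print(mean_price, median_price, mode_price)
```

The single $2,000,000 house pulls the mean well above the median, which is the outlier effect the slides describe.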

Which measure of location is the best?

Mean is generally used, unless extreme values (outliers) exist
Then median is often used, since the median is not sensitive to extreme values.
Example: Median home prices may be reported for a region, because the median is less sensitive to outliers

Shape of a Distribution

Describes how data are distributed
Measures of shape: symmetric or skewed

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean

Measures of Variability

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Measures of variation give information on the spread or variability of the data values.

[Figure: two distributions with the same center but different variation]

Range

Simplest measure of variation
Difference between the largest and the smallest observations:

$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$

Example:

[Figure: dot plot on a 0 to 14 scale]

Range = 14 - 1 = 13

Disadvantages of the Range
Ignores the way in which data are distributed

[Figure: two dot plots on a 7 to 12 scale; Range = 12 - 7 = 5 in both]

Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119

Interquartile Range

Can eliminate some outlier problems by using the interquartile range
Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

Interquartile range = 3rd quartile - 1st quartile

$\text{IQR} = Q_3 - Q_1$

Interquartile Range

Example:

X minimum    Q1    Median (Q2)    Q3    X maximum
   12        30        45         57       70

(25% of the observations fall in each of the four segments)

Interquartile range = 57 - 30 = 27

Quartiles

Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% each, separated by Q1, Q2, and Q3)

The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are larger)
Only 25% of the observations are greater than the third quartile, Q3

Quartile Formulas

Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = 0.25(n+1)
Second quartile position: Q2 = 0.50(n+1) (the median position)
Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values

Quartiles

Example: Find the first quartile

Sample Ranked Data: 11 12 13 16 16 17 18 21 22 (n = 9)

Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5
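
The position rule can be written as a short function. The sketch below is one possible reading: it assumes linear interpolation between neighbouring ranked values whenever the position p(n+1) is not a whole number, which matches the "halfway" step in the example but is an assumption for other fractions. The function name `quartile` is hypothetical.

```python
def quartile(ranked, p):
    """Value at position p * (n + 1) in ranked data (positions start at 1).

    Fractional positions are resolved by linear interpolation between the
    two neighbouring ranked values; this interpolation rule is an assumption.
    """
    n = len(ranked)
    pos = p * (n + 1)
    lower = int(pos)       # whole part of the position
    frac = pos - lower     # fractional part of the position
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    below, above = ranked[lower - 1], ranked[lower]
    return below + frac * (above - below)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # already ranked, n = 9
q1 = quartile(data, 0.25)
q2 = quartile(data, 0.50)
q3 = quartile(data, 0.75)
print(q1, q2, q3, q3 - q1)
```

Statistical packages use several different quartile conventions, so library results for small samples may differ from this (n+1) rule.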

Population Variance

Average of squared deviations of values from the mean

Population variance:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Where $\mu$ = population mean
N = population size
$x_i$ = ith value of the variable x

Sample Variance

Average (approximately) of squared deviations of values from the mean

Sample variance:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

Where $\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith value of the variable x

Population Standard Deviation

Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data

Population standard deviation:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample Standard Deviation

Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data

Sample standard deviation:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

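Calculation Example: Sample Standard Deviation

Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16

$s = \sqrt{\frac{(10-\bar{x})^2 + (12-\bar{x})^2 + (14-\bar{x})^2 + \cdots + (24-\bar{x})^2}{n-1}}$

$= \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}}$

$= \sqrt{\frac{130}{7}} = 4.3095$

A measure of the average scatter around the mean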
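
The arithmetic in this example can be checked by recomputing every squared deviation from the eight listed values. The following is a minimal Python sketch of the sample standard deviation formula; the variable names are illustrative.

```python
import math

# Sample data from the calculation example.
data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n

# Squared deviation of each observation from the sample mean.
squared_deviations = [(x - mean) ** 2 for x in data]
sum_sq = sum(squared_deviations)

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(sum_sq / (n - 1))

print(f"mean = {mean}, sum of squared deviations = {sum_sq}, s = {s:.4f}")
```

Printing the intermediate sum of squared deviations makes it easy to compare each step with a hand calculation.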

Measuring variation

[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]

Weighted Mean

The weighted mean of a set of data is

$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i}$

Where $w_i$ is the weight of the ith observation

Use when data is already grouped into n classes, with $w_i$ values in the ith class

Approximations for Grouped Data

Suppose a data set contains values $m_1, m_2, \ldots, m_K$, occurring with frequencies $f_1, f_2, \ldots, f_K$

For a population of N observations the mean is

$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$ where $N = \sum_{i=1}^{K} f_i$

For a sample of n observations, the mean is

$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$ where $n = \sum_{i=1}^{K} f_i$

Approximations for Grouped Data

Suppose a data set contains values $m_1, m_2, \ldots, m_K$, occurring with frequencies $f_1, f_2, \ldots, f_K$

For a population of N observations the variance is

$\sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}$

For a sample of n observations, the variance is

$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n-1}$
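
As an illustration of the grouped-data formulas, the sketch below applies the sample versions to the class frequencies 3, 6, 5, 4, 2 from the daily high temperature example. Taking each $m_i$ to be the class midpoint (15, 25, ..., 55) is an assumption made here for illustration; the slides do not state which values to use.

```python
# Assumed class midpoints for 10-20, 20-30, 30-40, 40-50, 50-60.
midpoints = [15, 25, 35, 45, 55]
# Class frequencies from the temperature frequency distribution.
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)

# Grouped sample mean: sum of f_i * m_i divided by n.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Grouped sample variance: sum of f_i * (m_i - mean)^2 divided by n - 1.
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)

print(f"n = {n}, mean = {grouped_mean}, variance = {grouped_variance:.3f}")
```

Because every observation in a class is replaced by a single value, the results approximate, rather than reproduce, the statistics of the raw data.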

The Sample Covariance

The covariance measures the strength of the linear relationship between two variables

The population covariance:

$\text{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$

The sample covariance:

$\text{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$

Only concerned with the strength of the relationship
No causal effect is implied

Interpreting Covariance

Covariance between two variables:

Cov(x,y) > 0: x and y tend to move in the same direction
Cov(x,y) < 0: x and y tend to move in opposite directions
Cov(x,y) = 0: x and y have no linear relationship (zero covariance alone does not establish independence)

Coefficient of Correlation

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

$\rho = \frac{\text{Cov}(x, y)}{\sigma_X \sigma_Y}$

Sample correlation coefficient:

$r = \frac{\text{Cov}(x, y)}{s_X s_Y}$
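
To make the sample formulas concrete, the sketch below computes the sample covariance and the sample correlation coefficient for the nine volume and cost pairs from the scatter diagram example. Reusing that data here is a choice made for illustration, and the variable names are hypothetical.

```python
import math

# Paired observations from the scatter diagram example.
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]          # x: volume per day
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]   # y: cost per day

n = len(volume)
x_bar = sum(volume) / n
y_bar = sum(cost) / n

# Sample covariance: sum of cross-deviations divided by n - 1.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(volume, cost)) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in volume) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in cost) / (n - 1))

# Sample correlation coefficient: covariance scaled by both standard deviations.
r = cov_xy / (s_x * s_y)

print(f"Cov(x, y) = {cov_xy:.4f}, r = {r:.4f}")
```

The covariance carries the units of both variables, while r is unit free, which is why r is the easier quantity to compare across data sets.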

Features of Correlation Coefficient, r

Unit free
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker any linear relationship

Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +1, r = +.3, and r = 0]

Interpreting the Result

r = .733

There is a relatively strong positive linear relationship between test score #1 and test score #2

Students who scored high on the first test tended to score high on the second test

[Figure: scatter plot "Scatter Plot of Test Scores"; horizontal axis: Test #1 Score, 70 to 100; vertical axis: Test #2 Score, 70 to 100]

Obtaining Linear Relationships

An equation can be fit to show the best linear relationship between two variables:

$Y = \beta_0 + \beta_1 X$

Where Y is the dependent variable and X is the independent variable

Least Squares Regression

Estimates for coefficients $\beta_0$ and $\beta_1$ are found to minimize the sum of the squared residuals
The least-squares regression line, based on sample data, is

$\hat{y} = b_0 + b_1 x$

Where $b_1$ is the slope of the line and $b_0$ is the y-intercept:

$b_1 = \frac{\text{Cov}(x, y)}{s_x^2} = r \frac{s_y}{s_x}$, $b_0 = \bar{y} - b_1 \bar{x}$
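
The slope and intercept formulas can be applied directly to paired data. The sketch below is illustrative only: it reuses the volume and cost pairs from the scatter diagram example as input, which the regression slides themselves do not do, and the helper `predict` is a hypothetical name.

```python
# Paired observations from the scatter diagram example (illustrative input).
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]          # x
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]   # y

n = len(volume)
x_bar = sum(volume) / n
y_bar = sum(cost) / n

# Sample covariance of x and y, and sample variance of x.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(volume, cost)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in volume) / (n - 1)

# Least-squares estimates: b1 = Cov(x, y) / s_x^2 and b0 = y_bar - b1 * x_bar.
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar


def predict(x):
    """Fitted value on the least-squares line for a given x."""
    return b0 + b1 * x


print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, fitted cost at volume 40 = {predict(40):.1f}")
```

A useful property to verify is that the fitted line passes through the point of means, so `predict(x_bar)` equals `y_bar` up to rounding.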