Managerial Stat Lecture

Statistics for Business and
Economics
Week 1
Introduction to Statistics
Descriptive Statistics
Chapter 1
Statistics, Data, &
Statistical Thinking
What Is Statistics?
The science of collecting, organizing, analyzing,
and interpreting data in order to make decisions.
Application Areas
Economics: Forecasting, Demographics
Sports: Individual & Team Performance
Engineering: Construction, Materials
Business: Consumer Preferences, Financial Trends
Branches of Statistics
Descriptive statistics:
Involves the collection, organization, summarization, and display of data.
Branches of Statistics
Inferential statistics:
Involves using a sample to draw conclusions about a population.
A basic tool of inferential statistics is probability.
2. Decide which branch of statistics each statement belongs to (inferential/descriptive).
The chances of winning the California lottery are 1 in 22 million.
a. Inferential statistics
b. Descriptive statistics
In the year 2020, 168 million Americans will be enrolled in an HMO.
a. Descriptive statistics
b. Inferential statistics
There is a relationship between smoking cigarettes and getting emphysema.
a. Inferential statistics
b. Descriptive statistics
Population vs. Sample
Population
Collection of all outcomes, responses, measurements, or counts that are of interest.
Numeric description of a population is a parameter.
Sample
A subset of the population.
Has the same or similar characteristics as the population.
More often used.
Numeric description of a sample is a statistic.
Ex
1. Is the number a parameter or a statistic?
A recent survey by a national women's association showed that the average salary of 3,500 of its 65,000 members was $73,000.
a. Statistic
b. Parameter
A recent survey by the alumni of a major university indicated that the average salary of 8,000 of its 250,000 graduates was $120,000.
a. Statistic
b. Parameter
The average salary of all GM workers is $42,500.
a. Statistic
b. Parameter
Types of Data
Qualitative:
Consists of attributes, labels, or nonnumeric entries.
Examples:
College major of each student in a class.
Gender of each employee at a company.
Method of payment (cash, check, credit card).
Quantitative:
Consists of numerical measurements and counts.
Examples:
Number of defective items in a lot.
Salaries of CEOs of oil companies.
Ages of employees at a company.
Ex
3. Decide if the data is quantitative or qualitative:
The colors of automobiles on a used car lot
a. Quantitative
b. Qualitative
The number of complaint letters received by the USPS in a given day
a. Quantitative
b. Qualitative
The number of seats in a movie theater
a. Quantitative
b. Qualitative
Types of Data Collection

1. Published source:
Example: book, journal, newspaper, Web site.
2. Designed experiment:
A treatment is applied to part of a population and responses are observed.
Example: effect of stress on driving habits.
Example: effect of a drug using 2 groups (drug/placebo).
3. Survey:
A group of people is surveyed and their responses are recorded to investigate one or more characteristics of a population.
Example: a study of U.S. residents' presidential approval.
4. Observational study:
A researcher observes and measures characteristics of a group.
Example: observing how DeVry students drive.
Ex
4. What method of data collection would you
use to collect data for:
a. A study where a drug was given to patients and a placebo to another group of 10 patients to determine if the drug has an effect on a patient's illness.
Perform an Experiment
b. A study of the salaries of college professors at a particular college.
Take a Survey (Census)
c. A study where a political pollster wishes to determine if his candidate is leading in the polls.
Take a Survey
Samples
A representative sample exhibits characteristics
typical of those possessed by the population of
interest.
A random sample of n experimental units is a
sample selected from the population in such a way
that every different sample of size n has an equal
chance of selection.
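The definition of a random sample can be illustrated with Python's standard library. This is a minimal sketch, not part of the lecture: the population of 1,000 numbered units, the sample size of 10, and the helper name `draw_random_sample` are hypothetical choices made for illustration.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Draw n distinct units so that every sample of size n is equally likely."""
    rng = random.Random(seed)   # the seed only makes the illustration repeatable
    return rng.sample(population, n)

# Hypothetical population: experimental units numbered 1 to 1000
population = list(range(1, 1001))
sample = draw_random_sample(population, 10, seed=1)
print(sample)
```

Because `random.sample` draws without replacement, no unit appears twice in the sample.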
Nonrandom Sample Errors

Selection bias results when a subset of the
experimental units in the population is excluded so
that these units have no chance of being selected for
the sample.
Non-response bias results when the researchers
conducting a survey or study are unable to obtain data
on all experimental units selected for the sample.
Measurement error refers to inaccuracies in the
values of the data recorded. In surveys, the error may
be due to ambiguous or leading questions and the
interviewer's effect on the respondent.
Chapter 2
Methods for Describing
Sets of Data
Data Presentation
Qualitative data: summary table, bar graph, pie chart, Pareto diagram
Quantitative data: dot plot, stem-and-leaf display, frequency distribution, histogram
Summary Table
1. Lists categories & number of elements in category
2. Obtained by tallying responses in category
3. May show frequencies (counts), % or both
Major        Count
Accounting     130
Economics       20
Management      50
Total          200
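The tallying step behind a summary table can be sketched in Python. This is an illustration under assumptions: the list of raw responses is hypothetical, constructed only so that its tallies match the counts in the table above.

```python
from collections import Counter

# Hypothetical raw responses, built to match the counts in the summary table
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50

counts = Counter(responses)     # tally the responses in each category
total = sum(counts.values())

# Each category maps to (frequency, percentage of the total)
summary = {major: (count, 100 * count / total) for major, count in counts.items()}

for major, (count, pct) in summary.items():
    print(f"{major:<12}{count:>6}{pct:>8.1f}%")
print(f"{'Total':<12}{total:>6}{100.0:>8.1f}%")
```

The percentages computed here (65%, 10%, 25%) are the same shares a pie chart of these majors would display.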
Bar Graph
[Figure: bar graph of frequency (0 to 150) by major (Acct., Econ., Mgmt.). Callouts: vertical bars for qualitative variables; equal bar widths; bar height shows frequency or %; zero point on the vertical axis; percent is also used in place of frequency.]
Pie Chart
1. Shows breakdown of
total quantity into
categories
2. Useful for showing
relative difference
[Figure: pie chart of majors. Acct. 65%, Mgmt. 25%, Econ. 10%.]
Pareto Diagram
Like a bar graph, but with the categories arranged by
height in descending order from left to right.
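[Figure: Pareto diagram of frequency (0 to 150) by major, bars in descending order: Acct., Mgmt., Econ. Callouts: vertical bars for qualitative variables; equal bar widths; bar height shows frequency or %; zero point on the vertical axis; percent is also used in place of frequency.]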
Ex
Example
You're an analyst for IRI. You want to show the
market shares held by Web browsers in 2006.
Construct a bar graph, pie chart, & Pareto diagram
to describe the data.
Browser             Mkt. Share 2006   Mkt. Share 2013
Firefox                   14%               23%
Internet Explorer         81%               26%
Safari                     4%                6%
Others (Chrome)            1%               45%
Total                    100%              100%
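The only difference between a bar graph and a Pareto diagram is the ordering of the bars, which a short Python sketch can make concrete. The 2006 shares are taken from the table above; the text bars are only a stand-in for a real chart, and the variable names are illustrative.

```python
# 2006 market shares (%) from the table above
share_2006 = {"Firefox": 14, "Internet Explorer": 81, "Safari": 4, "Others": 1}

# A Pareto diagram is a bar graph with the bars sorted by height, tallest first
pareto_order = sorted(share_2006.items(), key=lambda item: item[1], reverse=True)

for browser, share in pareto_order:
    bar = "#" * (share // 2)    # one '#' per 2 percentage points (text stand-in for a bar)
    print(f"{browser:<18}{share:>3}% {bar}")
```

Iterating over `share_2006` directly, without sorting, gives the category order of the plain bar graph instead.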
Bar Graph Solution 2006
[Figure: bar graph of market share (%) by browser, vertical axis 0% to 100%; bars in the order Firefox, Internet Explorer, Safari, Others.]
Pie Chart Solution 2006
[Figure: pie chart of market share. Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%.]
Pareto Diagram Solution 2006
[Figure: Pareto diagram of market share (%) by browser, vertical axis 0% to 100%; bars in descending order: Internet Explorer, Firefox, Safari, Others.]
Dot Plot
1. Horizontal axis is a scale for the quantitative variable,
e.g., percent.
2. The numerical value of each measurement is located
on the horizontal scale by a dot.
Stem-and-Leaf Display
1. Divide each observation into a stem value and a leaf value.
Stems are listed in order in a column.
Leaf value is placed in the corresponding stem row to the right of the bar.
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
2 | 144677
3 | 028
4 | 1
Key: 3 | 8 = 38
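The construction of the display can be sketched in Python. This is a minimal illustration that assumes two-digit observations, so the tens digit serves as the stem and the units digit as the leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group two-digit observations by tens digit (stem); list units digits (leaves)."""
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)
for stem in sorted(display):
    print(f"{stem} | {''.join(str(leaf) for leaf in display[stem])}")
```

Sorting the data first keeps the leaves in increasing order within each stem row.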
Frequency Distribution
A table that shows classes or intervals of data entries with the number of entries in each class.
Data:
2, 7, 12, 17, 1, 4, 21,
22, 29, 24, 18, 14, 8, 9,
27, 3, 10, 13, 15, 5, 23,
30, 6, 6, 11, 20, 19, 7,
19, 28, 16, 25, 16, 12,
17, 9.
Class    Frequency (f)   Relative Frequency
1-5            5               0.14
6-10           8               0.22
11-15          6               0.17
16-20          8               0.22
21-25          5               0.14
26-30          4               0.11
Total         36
Key Terms
A class is one of the categories into which
quantitative data can be classified.
The class frequency is the number of
observations in the data set falling into a
particular class.
The class relative frequency is the class
frequency divided by the total number of
observations in the data set.
The class percentage is the class relative
frequency multiplied by 100.
Class width: (2nd lower class limit) - (1st lower class limit) = 6 - 1 = 5
Lower class limits: 1, 6, 11, 16, 21, 26
Upper class limits: 5, 10, 15, 20, 25, 30
Midpoints: (lower class limit + upper class limit)/2 = 3, 8, 13, 18, 23, 28
Range: max - min = 30 - 1 = 29
Sample size: n = Σf = 36
Relative frequency = f/n
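The key terms above can be checked against the 36 data values with a short Python sketch. It assumes equal-width classes with integer limits, as in this example; the variable names are illustrative.

```python
data = [2, 7, 12, 17, 1, 4, 21, 22, 29, 24, 18, 14, 8, 9, 27, 3, 10, 13,
        15, 5, 23, 30, 6, 6, 11, 20, 19, 7, 19, 28, 16, 25, 16, 12, 17, 9]

lower_limits = [1, 6, 11, 16, 21, 26]
class_width = lower_limits[1] - lower_limits[0]   # 6 - 1 = 5
n = len(data)                                     # sample size

table = []
for lower in lower_limits:
    upper = lower + class_width - 1               # upper class limit
    f = sum(lower <= x <= upper for x in data)    # class frequency
    midpoint = (lower + upper) / 2
    table.append((lower, upper, midpoint, f, round(f / n, 2)))

for lower, upper, midpoint, f, rel in table:
    print(f"{lower:>2}-{upper:<2}  midpoint={midpoint:<5} f={f}  f/n={rel}")
print("range =", max(data) - min(data))
```

The class frequencies found this way sum to n, and the rounded relative frequencies agree with the frequency distribution table for this data set.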
Frequency Distribution Table

Example
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Class    Midpoint   Frequency
16-25      20.5         3
26-35      30.5         5
36-45      40.5         2
Class width: 26 - 16 = 10
Midpoint = (Lower + Upper Boundaries) / 2
Relative Frequency & % Distribution Tables
Class    Relative Frequency (f/n)   Percentage
16-25             .3                   30.0
26-35             .5                   50.0
36-45             .2                   20.0
Ex
6. Use the table to answer the questions.
Class    Frequency   Relative Frequency
4-10          5       5/47 = 0.11
11-17        15       15/47 = 0.32
18-24        16       16/47 = 0.34
25-31        11       11/47 = 0.23
a. Identify the class width: 11 - 4 = 7
b. Identify the number of observations: Total = 5 + 15 + 16 + 11 = 47
c. Give the relative frequency of each class: Rel. Freq = Freq/Total
Histogram
Class    Freq.
16-25      3
26-35      5
36-45      2
[Figure: histogram of the table above. Vertical axis: frequency (count); horizontal axis: lower class boundaries; bars touch.]
Central Tendency
Numerical descriptive measures capture two characteristics of a data set:
Central tendency (location): the tendency of the data to cluster about certain numerical values.
Variation (dispersion): the variability of the data.
Standard Notation
Measure              Sample       Population
Mean                 $\bar{x}$    $\mu$ (mu)
Standard deviation   $s$          $\sigma$ (sigma)
Variance             $s^2$        $\sigma^2$
Size                 $n$          $N$
Mean
1. Most common measure of central tendency
2. Acts as balance point
3. Affected by extreme values (outliers)
4. Denoted $\bar{x}$ where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median
1. Measure of central tendency
2. Middle value in ordered sequence
If n is odd, middle value of sequence
If n is even, average of 2 middle values
3. Position of median in sequence: Positioning Point = $\frac{n + 1}{2}$
4. Not affected by extreme values
Mode
1. Measure of central tendency
2. Value that occurs most often
3. May be no mode or several modes
4. May be used for quantitative or qualitative data
Range
1. Measure of dispersion
2. Difference between largest & smallest
observations
Range = $x_{\text{largest}} - x_{\text{smallest}}$
3. Ignores how data are distributed
[Figure: two dot plots on a scale from 7 to 10; in both, Range = 10 - 7 = 3.]
Variance &
Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{x}$ or $\mu$)
[Figure: dot plot on a scale from 4 to 12 with $\bar{x} = 8.3$ marked.]
Central Tendency
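Mean: population $\mu = \frac{\sum x}{N}$, sample $\bar{x} = \frac{\sum x}{n}$
Median: value in the middle when the data are placed in order.
Mode: most frequent value.
Range: (data_max) - (data_min)
Deviation: difference between a data value and the mean, $x - \mu$ or $x - \bar{x}$
Variance: population $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, sample $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Standard deviation = $\sqrt{\text{Variance}}$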
Ex
7. What is the standard deviation if the variance is 36?
Std. Dev = Sqrt(Variance) = Sqrt(36) = 6
8. You're a financial analyst for Prudential-Bache
Securities. You have collected the following closing stock
prices of new stock issues: 32, 33, 40, 32, 30.
Describe the stock prices in terms of central tendency.
Mean: (32 + 33 + 40 + 32 + 30)/5 = 33.4
Median: ordered data 30, 32, 32, 33, 40; median = 32
Mode: ordered data 30, 32, 32, 33, 40; mode = 32
Range: 40 - 30 = 10
Variance: [(30 - 33.4)^2 + (32 - 33.4)^2 + (32 - 33.4)^2 + (33 - 33.4)^2 + (40 - 33.4)^2]/(5 - 1) = 14.80
Standard Deviation: Sqrt(14.80) = 3.85
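The hand calculations in exercise 8 can be cross-checked with Python's standard `statistics` module. This is a sketch for verification only; note that `statistics.variance` and `statistics.stdev` use the sample formulas (division by n - 1), matching the slide.

```python
import statistics

prices = [32, 33, 40, 32, 30]

mean = statistics.mean(prices)            # 33.4
median = statistics.median(prices)        # 32
mode = statistics.mode(prices)            # 32
data_range = max(prices) - min(prices)    # 10
variance = statistics.variance(prices)    # sample variance, divides by n - 1
std_dev = statistics.stdev(prices)        # square root of the sample variance

print(mean, median, mode, data_range, round(variance, 2), round(std_dev, 2))
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead.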
Knowing central tendency

Knowing ONLY the mean can lead one to purchase
Model A. Knowing the variance as well may change
one's decision!
Shape
1. Describes how data are distributed
2. Measures of Shape
Skewness (direction of the tail) or symmetry
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
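Chebyshev's Theorem
Applies to a data set of any shape.
At least 3/4 (75%) of the data lies in the interval $\bar{x} - 2s$ to $\bar{x} + 2s$.
At least 8/9 of the data lies in the interval $\bar{x} - 3s$ to $\bar{x} + 3s$.
[Figure: number line marked from $\bar{x} - 3s$ to $\bar{x} + 3s$. Within $\bar{x} \pm s$: no useful information; within $\bar{x} \pm 2s$: at least 3/4 of the data; within $\bar{x} \pm 3s$: at least 8/9 of the data.]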
Chebyshev's Theorem Example
Suppose the mean closing stock price of new stock
issues is 33.5 and the standard deviation is 3.44.
Use this information to form an
interval that will contain at least
75% of the closing stock prices of
new stock issues.
Chebyshev's Theorem Example
At least 75% of the closing stock prices of new stock
issues will lie within 2 standard deviations of the mean.
$\bar{x} = 33.5$, $s = 3.44$
$(\bar{x} - 2s, \bar{x} + 2s)$ = (33.5 - 2 * 3.44, 33.5 + 2 * 3.44) = (26.62, 40.38)
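The 3/4 and 8/9 bounds are both instances of the general Chebyshev bound 1 - 1/k^2 for k standard deviations (k > 1). The sketch below applies that bound to the values used in the solution (mean 33.5, s = 3.44); the function name `chebyshev_interval` is a hypothetical helper, not something from the lecture.

```python
def chebyshev_interval(mean, s, k):
    """Return (lower, upper, minimum fraction of data) for k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's bound gives no useful information for k <= 1")
    return mean - k * s, mean + k * s, 1 - 1 / k**2

lower, upper, fraction = chebyshev_interval(33.5, 3.44, 2)
print(f"At least {fraction:.0%} of the data lies in ({lower:.2f}, {upper:.2f})")
```

Calling the function with k = 3 gives the 8/9 bound; the check on k reflects the "no useful information" region within one standard deviation.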
Empirical Rule (68-95-99.7)
Describes how data vary about the mean.
The data need to have a symmetric, bell-shaped distribution.
Empirical Rule Example

Suppose the mean closing stock price of new
stock issues is 33.5 and the
standard deviation is 3.44. If
we can assume the data is
symmetric and mound shaped,
calculate the percentage of the
data that lie within the intervals
$\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$.
Empirical Rule Example

According to the Empirical Rule, approximately 68% of
the data will lie in the interval $(\bar{x} - s, \bar{x} + s)$:
(33.5 - 3.44, 33.5 + 3.44) = (30.06, 36.94)
Approximately 95% of the data will lie in the interval
$(\bar{x} - 2s, \bar{x} + 2s)$:
(33.5 - 2 * 3.44, 33.5 + 2 * 3.44) = (26.62, 40.38)
Approximately 99.7% of the data will lie in the interval
$(\bar{x} - 3s, \bar{x} + 3s)$:
(33.5 - 3 * 3.44, 33.5 + 3 * 3.44) = (23.18, 43.82)
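The three Empirical Rule intervals can be generated in one pass. This sketch uses the mean and standard deviation from the example (33.5 and 3.44); the dictionary and function names are illustrative, and the percentages are the approximate 68-95-99.7 values, which apply only to symmetric, mound-shaped data.

```python
# Approximate percent of data within k standard deviations (mound-shaped data only)
EMPIRICAL_COVERAGE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_intervals(mean, s):
    """Return {k: (lower, upper, approximate percent)} for k = 1, 2, 3."""
    return {k: (round(mean - k * s, 2), round(mean + k * s, 2), pct)
            for k, pct in EMPIRICAL_COVERAGE.items()}

intervals = empirical_intervals(33.5, 3.44)
for k, (lower, upper, pct) in intervals.items():
    print(f"about {pct}% of the data within {k}s: ({lower}, {upper})")
```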
Numerical Measures of
Relative Standing: Percentiles
Describes the relative location of a
measurement compared to the rest of the data
The pth percentile is a number such that p% of
the data falls below it and (100 - p)% falls
above it
Median = 50th percentile
Percentile Example
You scored 560 on the GMAT exam. This
score puts you in the 58th percentile.
What percentage of test takers scored lower
than you did?
What percentage of test takers scored higher
than you did?
Percentile Example
What percentage of test takers scored lower
than you did?
58% of test takers scored lower than 560.
What percentage of test takers scored higher
than you did?
(100 - 58)% = 42% of test takers scored
higher than 560.
Numerical Measures of
Relative Standing: z-Scores
Describes the relative location of a
measurement compared to the rest of the data
Sample z-score: $z = \frac{x - \bar{x}}{s}$
Population z-score: $z = \frac{x - \mu}{\sigma}$
Measures the number of standard deviations
away from the mean a data value is located
z-Score Example
The mean time to assemble a
product is 22.5 minutes with a
standard deviation of 2.5 minutes.
Find the z-score for an item that
took 20 minutes to assemble.
Find the z-score for an item that
took 27.5 minutes to assemble.
z-Score Example
$x = 20$, $\mu = 22.5$, $\sigma = 2.5$: $z = \frac{20 - 22.5}{2.5} = -1.0$
$x = 27.5$, $\mu = 22.5$, $\sigma = 2.5$: $z = \frac{27.5 - 22.5}{2.5} = 2.0$
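The same calculation as a small Python function, offered as a sketch; the name `z_score` is illustrative. The sign of the result shows whether the value lies below (negative) or above (positive) the mean.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_fast = z_score(20, 22.5, 2.5)      # -1.0: one standard deviation below the mean
z_slow = z_score(27.5, 22.5, 2.5)    # 2.0: two standard deviations above the mean
print(z_fast, z_slow)
```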
Interpretation of z-Scores for
Mound-Shaped Distributions
of Data
1. Approximately 68% of the measurements
will have a z-score between -1 and 1.
2. Approximately 95% of the measurements
will have a z-score between -2 and 2.
3. Approximately 99.7% of the measurements
will have a z-score between -3 and 3.
(see the figure on the next slide)
Interpretation of z-Scores
Outlier
An observation (or measurement) that is unusually large
or small relative to the other values in a data set is called
an outlier. Outliers typically are attributable to one of
the following causes:
1. The measurement is observed, recorded, or entered
into the computer incorrectly.
2. The measurement comes from a different
population.
3. The measurement is correct but represents a rare
(chance) event.
Quartiles
Split ordered data into 4 quarters:
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Lower quartile QL is the 25th percentile.
Middle quartile m is the median.
Upper quartile QU is the 75th percentile.
Interquartile range: IQR = QU - QL
Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles
Interquartile Range = Q3 - Q1
4. Spread in middle 50%
Thinking Challenge
You're a financial analyst for
Prudential-Bache Securities.
You have collected the
following closing stock prices
of new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
What are the quartiles, Q1 and Q3, and the interquartile
range?
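One way to work this challenge is sketched below. Quartile conventions differ between textbooks and software, and the slides do not say which one is intended, so this sketch assumes the median-of-halves convention: Q1 is the median of the lower half of the ordered data and Q3 the median of the upper half. Other conventions may give slightly different values.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the ordered data."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]    # leaves out the middle value when n is odd
    return statistics.median(lower_half), statistics.median(upper_half)

prices = [17, 16, 21, 18, 13, 16, 12, 11]
q1, q3 = quartiles(prices)
iqr = q3 - q1
print(q1, q3, iqr)
```

Under this assumed convention the ordered data 11, 12, 13, 16, 16, 17, 18, 21 give Q1 = 12.5, Q3 = 17.5, and an interquartile range of 5.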
Box Plot
1. Graphical display of data using 5-number
summary (Min, Q1, Q2, Q3, Max)
[Figure: box plot on a number line with Xsmallest, Q1, Median, Q3, and Xlargest marked.]
Shape & Box Plot
[Figure: box plots for left-skewed, symmetric, and right-skewed data, each marked with Q1, Median, and Q3.]
Detecting Outliers
Box Plots: Observations falling between the
inner and outer fences are deemed suspect
outliers.
z-scores: Observations with z-scores greater than
3 in absolute value are considered outliers.
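Both detection rules can be sketched in Python. The slides do not define the fences, so this sketch assumes a common textbook convention: inner fences at 1.5 * IQR beyond the quartiles and outer fences at 3 * IQR beyond them. The function names, quartiles, and observations below are hypothetical values chosen for illustration.

```python
def classify_by_fences(x, q1, q3):
    """Label x using box-plot fences (assumed: inner = 1.5 * IQR, outer = 3 * IQR)."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    if inner[0] <= x <= inner[1]:
        return "not an outlier"
    if outer[0] <= x <= outer[1]:
        return "suspect outlier"        # between the inner and outer fences
    return "beyond the outer fences"

def is_z_outlier(x, mean, std_dev):
    """z-score rule: an absolute z-score greater than 3 marks an outlier."""
    return abs((x - mean) / std_dev) > 3

# Hypothetical quartiles (12.5, 17.5) give inner fences (5, 25) and outer fences (-2.5, 32.5)
labels = [classify_by_fences(x, q1=12.5, q3=17.5) for x in (21, 30, 40)]
print(labels)
print(is_z_outlier(31, mean=22.5, std_dev=2.5))   # z = 3.4
```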
Graphing Bivariate
Relationships
Describes a relationship between two
quantitative variables
Plot the data in a scattergram (or scatterplot)
[Figure: three scattergrams showing a positive relationship, a negative relationship, and no relationship.]
Scattergram Example
You're a marketing analyst for Hasbro Toys.
You gather the following data:
Ad $ (x)   Sales (Units) (y)
1          1
2          1
3          2
4          2
5          4
Draw a scattergram of the data.
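A scattergram would normally be drawn with a plotting tool; as a dependency-free stand-in, the sketch below prints a text version of the plot. It assumes small positive integer values, as in this data set, and the function name `text_scattergram` is hypothetical.

```python
def text_scattergram(xs, ys):
    """Return text rows for small positive integer data; '*' marks each (x, y) pair."""
    points = set(zip(xs, ys))
    x_values = range(1, max(xs) + 1)
    rows = []
    for y in range(max(ys), 0, -1):             # largest y at the top
        cells = ["*" if (x, y) in points else " " for x in x_values]
        rows.append(f"{y} | " + " ".join(cells))
    rows.append("    " + " ".join(str(x) for x in x_values))   # x-axis labels
    return rows

ad_dollars = [1, 2, 3, 4, 5]
sales_units = [1, 1, 2, 2, 4]
rows = text_scattergram(ad_dollars, sales_units)
for row in rows:
    print(row)
```

The marks rise from lower left to upper right, which is the pattern of a positive relationship between ad dollars and sales.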
