Managerial Stat Lecture
Economics
Week 1
Introduction to Statistics
Descriptive Statistics
Chapter 1
Statistics, Data, &
Statistical Thinking
What Is Statistics?
The science of collecting, organizing, analyzing,
and interpreting data in order to make decisions.
Application Areas
Economics: Forecasting, Demographics
Sports: Individual & Team Performance
Engineering: Construction, Materials
Business: Consumer Preferences, Financial Trends
Branches of Statistics
Descriptive Statistics
Involves the collection, organization, summarization,
and display of data
Inferential Statistics
Involves using a sample to draw conclusions about a population.
A basic tool of inferential statistics is probability.
b. Descriptive statistics
b. Inferential statistics
b. Descriptive statistics
Population
Collection of all outcomes, responses, measurements, or counts that are of interest
Numeric description of a population is a Parameter
Sample
A subset of the population
Has the same/similar characteristics as the population
More often used
Numeric description of a sample is a Statistic
Ex
1. Is the number a parameter or statistic?
a. A recent survey by a national women's
association showed that the average salary of
3,500 of its 65,000 members was $73,000.
a. Statistic
b. Parameter
b. Parameter
b. Parameter
Types of Data
Qualitative
Consists of attributes,
labels, or nonnumeric entries
Quantitative:
Consists of numerical
measurements and
counts.
Ex
b. Qualitative
b. Qualitative
b. Qualitative
Ex
4. What method of data collection would you
use to collect data for:
a. A study where a drug was given to patients and
a placebo to another group of 10 patients to
determine if the drug has an effect on a patient's
illness.
Perform an Experiment
Samples
A representative sample exhibits characteristics
typical of those possessed by the population of
interest.
A random sample of n experimental units is a
sample selected from the population in such a way
that every different sample of size n has an equal
chance of selection.
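The definition of a random sample above can be sketched in Python. This is only an illustration: the member ID list is hypothetical (the sizes 65,000 and 3,500 are borrowed from the survey exercise), and `draw_random_sample` is a helper name made up for this sketch. The standard-library `random.sample` draws without replacement, which is one way to give every subset of size n the same chance of selection.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed is used here only to make the run repeatable
    return rng.sample(population, n)

# Hypothetical population: member ID numbers 1..65000 (illustrative only).
member_ids = list(range(1, 65001))
sample_ids = draw_random_sample(member_ids, 3500, seed=1)

print(len(sample_ids), "units drawn,", len(set(sample_ids)), "distinct")
```

Any number computed from `sample_ids` would be a statistic; the same number computed from all of `member_ids` would be a parameter.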
Chapter 2
Methods for Describing
Sets of Data
Data Presentation
Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
Quantitative Data: Dot Plot, Stem-&-Leaf Display, Frequency Distribution, Histogram
Summary Table
1. Lists categories & number of elements in category
2. Obtained by tallying responses in category
3. May show frequencies (counts), % or both
Major         Count
Accounting      130
Economics        20
Management       50
Total           200
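As a rough sketch of how such a summary table could be tallied, the snippet below rebuilds the counts from a list of individual responses and adds percentages. The response list is synthetic (it is constructed only to match the counts in the table), and `summary_table` is a hypothetical helper name.

```python
from collections import Counter

def summary_table(responses):
    """Tally responses per category and return {category: (count, percent)}."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {cat: (count, 100 * count / total) for cat, count in counts.items()}

# Synthetic responses built to match the table: 130, 20, and 50 students.
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50
table = summary_table(responses)

for major, (count, pct) in table.items():
    print(f"{major:<12}{count:>5}{pct:>7.1f}%")
```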
Bar Graph
[Figure: bar graph of frequency (axis from 0 to 150) by major: Acct., Econ., Mgmt.]
Vertical bars for qualitative variables
Equal bar widths
Bar height shows frequency or % (percent is also used)
Frequency axis starts at a zero point
Pie Chart
1. Shows breakdown of
total quantity into
categories
2. Useful for showing
relative difference
[Figure: pie chart of majors: Acct. 65%, Mgmt. 25%, Econ. 10%]
Pareto Diagram
Like a bar graph, but with the categories arranged by
height in descending order from left to right.
[Figure: Pareto diagram of frequency (axis from 0 to 150) by major, ordered Acct., Mgmt., Econ.]
Vertical bars for qualitative variables
Equal bar widths
Bar height shows frequency or % (percent is also used)
Frequency axis starts at a zero point
Ex
Example
You're an analyst for IRI. You want to show the
market shares held by Web browsers in 2006.
Construct a bar graph, pie chart, & Pareto diagram
to describe the data.
Browser
Firefox               14%    23%
Internet Explorer     81%    26%
Safari                 4%     6%
Others (Chrome)        1%    45%
Total                100%   100%
[Figures: bar graph and pie chart of market share by browser (Internet Explorer, 81%; Firefox; Safari; Others)]
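The ordering step behind a Pareto diagram can be sketched as below. The shares are taken from the first column of percentages in the browser table, on the assumption that this column is the 2006 data; `pareto_order` is a hypothetical helper name, and the `#` bars are only a crude stand-in for a real chart.

```python
def pareto_order(freqs):
    """Return (category, value) pairs sorted by value, largest first."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)

# Assumption: first percentage column of the example table (2006 shares).
shares = {"Firefox": 14, "Internet Explorer": 81, "Safari": 4, "Others": 1}
ordered = pareto_order(shares)

for browser, pct in ordered:
    # One '#' per 2 percentage points, purely for a quick visual check.
    print(f"{browser:<18}{'#' * (pct // 2)} {pct}%")
```

Plotting the same pairs in their original order would give the ordinary bar graph; only the sort distinguishes the Pareto diagram.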
Dot Plot
1. Horizontal axis is a scale for the quantitative variable,
e.g., percent.
2. The numerical value of each measurement is located
on the horizontal scale by a dot.
Stem-and-Leaf Display
1. Divide each observation into a stem value and a leaf value
   Stems are listed in order in a column
   Leaf value is placed in the corresponding stem row, to the right of the bar
   (e.g., 26 has stem 2 and leaf 6)
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
2 | 144677
3 | 028
4 | 1
Key: 3 | 8 = 38
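A minimal sketch of the stem-and-leaf construction, assuming non-negative two-digit integers so that the tens digit is the stem and the units digit is the leaf. `stem_and_leaf` is a hypothetical helper; note that it skips stems with no leaves, which a hand-drawn display would normally still list.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Build display rows like '2 | 144677' from two-digit integers."""
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # tens digit, units digit
        rows[stem].append(str(leaf))
    return [f"{stem} | {''.join(leaves)}" for stem, leaves in sorted(rows.items())]

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
for row in stem_and_leaf(data):
    print(row)
```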
Frequency Distribution
Class     Relative Frequency
1-5       0.14
6-10      0.22
11-15     0.17
16-20     0.22
21-25     0.14
26-30     0.11
Total frequency: 36
Key Terms
A class is one of the categories into which
quantitative data can be classified.
The class frequency is the number of
observations in the data set falling into a
particular class.
The class relative frequency is the class
frequency divided by the total number of
observations in the data set.
The class percentage is the class relative
frequency multiplied by 100.
Class width: second lower class limit - first lower class limit = 6 - 1 = 5
Lower class limits: 1, 6, 11, 16, 21, 26
Upper class limits: 5, 10, 15, 20, 25, 30
Midpoints: (lower class limit + upper class limit)/2
3, 8, 13, 18, 23, 28
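The class width and midpoints above can be checked with a few lines of Python. This is just the arithmetic from the slide restated; the variable names are chosen for this sketch.

```python
lower_limits = [1, 6, 11, 16, 21, 26]
upper_limits = [5, 10, 15, 20, 25, 30]

# Width: difference between consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Midpoint of each class: average of its lower and upper limits.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

print("width:", class_width)
print("midpoints:", midpoints)
```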
Class     Midpoint   Relative Frequency (f/n)   Percentage
16-25     20.5       .3                         30.0
26-35     30.5       .5                         50.0
36-45     40.5       .2                         20.0
Ex
6. Use the table to answer the questions
Class     Frequency   Relative Frequency
4-10      5           5/47 = 0.11
11-17     15          15/47 = 0.32
18-24     16          16/47 = 0.34
25-31     11          11/47 = 0.23
a. Identify the class width: 11 - 4 = 7
b. Identify the number of observations: Total = 5 + 15 + 16 + 11 = 47
c. Give the relative frequency of each class: Rel. Freq. = Freq./Total
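The answers to this exercise can be reproduced with a short sketch. The class labels and frequencies come from the table; rounding to two decimals is an assumption made to match how the slide reports the values.

```python
classes = ["4-10", "11-17", "18-24", "25-31"]
frequencies = [5, 15, 16, 11]

total = sum(frequencies)                                 # part b
relative = [round(f / total, 2) for f in frequencies]    # part c
percentages = [round(100 * f / total, 1) for f in frequencies]

for label, f, rel, pct in zip(classes, frequencies, relative, percentages):
    print(f"{label:<8}{f:>4}{rel:>7.2f}{pct:>7.1f}%")
print("Total:", total)
```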
Histogram
Class     Freq.
16-25     3
26-35     5
36-45     2
[Figure: histogram of the classes above. Horizontal axis: lower class boundaries; vertical axis: frequency (count). Bars touch.]
Central Tendency
A set of measurements that describes the tendency of the data
to cluster about certain numerical values (central tendency)
and the variability of the data (variation).
Central Tendency
(Location)
Variation
(Dispersion)
Standard Notation
Measure              Sample        Population
Mean                 $\bar{x}$     $\mu$ (mu)
Standard Deviation   $s$           $\sigma$ (sigma)
Variance             $s^2$         $\sigma^2$
Size                 $n$           $N$
Mean
1.
2.
3.
4.
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median
1. Measure of central tendency
2. Middle value in ordered sequence
Positioning point $= \frac{n + 1}{2}$
Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative
data
Range
1. Measure of dispersion
2. Difference between largest & smallest
observations
Range = $x_{\text{largest}} - x_{\text{smallest}}$
3. Ignores how data are distributed
[Figure: two dot plots on a scale from 7 to 10 with differently distributed data; in both, Range = 10 - 7 = 3]
Variance &
Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about the mean ($\bar{x}$ or $\mu$)
[Figure: dot plot on a scale from 4 to 12 with the mean marked at $\bar{x} = 8.3$]
Central Tendency
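Mean: Population $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, Sample $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median: Value in the middle when the data are placed in order.
Variance: Population $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, Sample $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$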
Ex
7. What is the standard deviation if variance is 36?
Std. Dev = Sqrt(Variance ) = Sqrt (36)= 6
8. You're a financial analyst for Prudential-Bache
Securities. You have collected the following closing stock
prices of new stock issues: 32, 33, 40, 32, 30
Describe the stock prices in terms of central tendency.
Mean: (32 + 33 + 40 + 32 + 30)/5 = 33.4
Median: 30, 32, 32, 33, 40 = 32
Mode: 30, 32, 32, 33, 40 = 32
Range: 40 - 30 = 10
Variance: 14.80
Standard Deviation: Sqrt(14.80) = 3.85
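The stock-price answers can be checked with Python's standard `statistics` module. A small sketch, noting that `statistics.variance` and `statistics.stdev` use the sample formulas (divisor n - 1), which is what the worked values 14.80 and 3.85 correspond to.

```python
import statistics

prices = [32, 33, 40, 32, 30]

mean = statistics.mean(prices)
median = statistics.median(prices)        # middle value of the sorted data
mode = statistics.mode(prices)            # most frequent value
value_range = max(prices) - min(prices)
variance = statistics.variance(prices)    # sample variance, divisor n - 1
std_dev = statistics.stdev(prices)        # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}, range={value_range}")
print(f"variance={variance:.2f}, std dev={std_dev:.2f}")
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead.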
Shape
1. Describes how data are distributed
2. Measures of Shape
Skew (tail) = Symmetry
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
Chebyshev's Theorem
Applies to any shape data set
At least 3/4 (75%) of the data lies in the interval
$\bar{x} - 2s$ to $\bar{x} + 2s$
At least 8/9 of the data lies in the interval
$\bar{x} - 3s$ to $\bar{x} + 3s$
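[Figure: number line marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$. Within 1 standard deviation: no useful information. Within 2: at least 3/4 of the data. Within 3: at least 8/9 of the data.]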
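The 3/4 and 8/9 figures are the k = 2 and k = 3 cases of the general bound $1 - 1/k^2$, which the slide does not spell out. The sketch below computes that bound and compares it with the observed fraction for a small data set; both function names are hypothetical, and the data are the five stock prices used in an earlier exercise.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        return 0.0  # the bound gives no useful information for k <= 1
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of data inside mean - k*s to mean + k*s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if mean - k * s <= x <= mean + k * s]
    return len(inside) / len(data)

prices = [32, 33, 40, 32, 30]
for k in (1, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), fraction_within(prices, k))
```

The observed fraction should never fall below the bound, whatever the shape of the data.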
s = 3.44
Data needs to be
Symmetric
Bell shaped distribution
Numerical Measures of
Relative Standing: Percentiles
Describes the relative location of a
measurement compared to the rest of the data
The pth percentile is a number such that p% of
the data falls below it and (100 - p)% falls
above it
Median = 50th percentile
Percentile Example
You scored 560 on the GMAT exam. This
score puts you in the 58th percentile.
What percentage of test takers scored lower
than you did?
58% of test takers scored lower than 560.
What percentage of test takers scored higher
than you did?
(100 - 58)% = 42% of test takers scored
higher than 560.
Numerical Measures of
Relative Standing: z-Scores
Describes the relative location of a
measurement compared to the rest of the data
Sample z-score: $z = \frac{x - \bar{x}}{s}$
Population z-score: $z = \frac{x - \mu}{\sigma}$
z-Score Example
The mean time to assemble a
product is 22.5 minutes with a
standard deviation of 2.5 minutes.
Find the z-score for an item that
took 20 minutes to assemble.
Find the z-score for an item that
took 27.5 minutes to assemble.
For x = 20, with $\mu = 22.5$ and $\sigma = 2.5$:
$z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0$
For x = 27.5:
$z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0$
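The two assembly-time calculations can be written as a one-line function. A minimal sketch; `z_score` is a name chosen for this illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_fast = z_score(20, 22.5, 2.5)     # item assembled in 20 minutes
z_slow = z_score(27.5, 22.5, 2.5)   # item assembled in 27.5 minutes

print(z_fast, z_slow)
```

A negative z-score means the measurement is below the mean; a positive one means it is above.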
Interpretation of z-Scores
Outlier
An observation (or measurement) that is unusually large
or small relative to the other values in a data set is called
an outlier. Outliers typically are attributable to one of
the following causes:
1. The measurement is observed, recorded, or entered
into the computer incorrectly.
2. The measurement comes from a different
population.
Quartiles
Split ordered data into 4 quarters
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles
Interquartile Range = Q3 - Q1
4. Spread in middle 50%
5. Not affected by extreme values
Thinking Challenge
You're a financial analyst for
Prudential-Bache Securities.
You have collected the
following closing stock prices
of new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
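The slide does not say what is to be computed, so the sketch below simply assumes the quartiles and interquartile range are wanted. It uses `statistics.quantiles` with its default "exclusive" method, which interpolates between neighbouring values; textbooks that round the quartile position to a whole observation can give slightly different Q1 and Q3 for the same data.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1

print("sorted:", sorted(prices))
print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```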
Box Plot
1. Graphical display of data using 5-number
summary (Min, Q1, Q2, Q3, Max)
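[Figure: box plot on a number line marking Xsmallest, Q1, Median, Q3, and Xlargest]
[Figure: box plots by shape, showing the positions of Q1, Median, and Q3 for symmetric and right-skewed data]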
Detecting Outliers
Box Plots: Observations falling between the
inner and outer fences are deemed suspect
outliers.
z-scores: Observations with z-scores greater than
3 in absolute value are considered outliers.
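Both rules can be sketched in a few lines. The slides do not define the fences, so the multipliers used here (1.5 x IQR for the inner fences, 3 x IQR for the outer fences) are the conventional textbook values and should be treated as an assumption. The data set is made up for illustration, and both function names are hypothetical.

```python
import statistics

def z_score_outliers(data, threshold=3.0):
    """Values whose sample z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / s) > threshold]

def fences(q1, q3):
    """Inner and outer fences, assuming the usual 1.5*IQR and 3*IQR rules."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer

# Made-up data: fifteen ordinary values and one unusually large one.
data = [9, 10, 11] * 5 + [50]
print("z-score outliers:", z_score_outliers(data))
print("fences for Q1=12, Q3=18:", fences(12, 18))
```

With very small samples the z-score rule cannot flag anything, because a single value can only sit so many sample standard deviations from the mean.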
Graphing Bivariate
Relationships
Describes a relationship between two
quantitative variables
Plot the data in a scattergram (or scatterplot)
[Figure: three scattergrams of y against x showing a positive relationship, a negative relationship, and no relationship]
Scattergram Example
You're a marketing analyst for Hasbro Toys.
You gather the following data:
Ad $ (x)    Sales (Units) (y)
1           1
2           1
3           2
4           2
5           4
Draw a scattergram of the data
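A rough character-based scattergram of these five points is sketched below. It assumes small positive integer coordinates, as in this example, and `text_scattergram` is a hypothetical helper; a plotting library would normally be used instead.

```python
def text_scattergram(points):
    """Return a character plot: y increases upward, x increases to the right."""
    marks = set(points)
    max_x = max(x for x, _ in points)
    max_y = max(y for _, y in points)
    lines = []
    for y in range(max_y, 0, -1):
        cells = "".join(" *" if (x, y) in marks else " ." for x in range(1, max_x + 1))
        lines.append(f"{y} |{cells}")
    lines.append("  +" + "--" * max_x)                                   # x axis
    lines.append("   " + "".join(f"{x:>2}" for x in range(1, max_x + 1)))  # x labels
    return "\n".join(lines)

# (Ad $, Sales units) pairs from the table.
ad_sales = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 4)]
plot = text_scattergram(ad_sales)
print(plot)
```

The points rise from lower left to upper right, which is the pattern of a positive relationship.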