Intro Ba


Business statistics

Quantitative techniques

Decision Making & Analysis in Business Management

DECISION MAKING
A method for determining optimal
strategies when faced with several
decision alternatives and at times an
uncertain pattern of future events.

Business & decision making


Business management levels
Basis of decision

Types of Decision Making Situations

Decision making under uncertainty: anything can happen; we don't know.
Some analysis is still possible on the different outcomes (payoffs), based on different approaches.
The decision maker is unable or unwilling to estimate probabilities.
Apply a common-sense criterion.

For example:
Senior management decision making is strategic in nature and long term.
Participating in a competition

Types of Decision Making Situations


Decision making under certainty
The state of nature is known.
The decision is to choose the alternative with the best payoff.
We know (or assume) a well-defined, unchanging system.
Mathematical models and equations are derived.
We try to implement the best system or model.

For example:
Lower management decision making involves implementation of actionable objectives and plans, and is short term.
MRP, ERP, many software-based systems, rigid rules and regulations.

Types of Decision Making Situations


Decision making under risk (probability situations)
Expected Value Criterion:
compute the expected value for each decision alternative;
select the alternative with the best expected value.
We know the probabilities of some events.

For example:
Middle management decision making involves converting strategies into actionable objectives and plans, and is medium term.
This applies wherever the probability of an event can be calculated, or some pattern can be observed in how events happen. In such cases, probabilities are often calculated from past data.
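The Expected Value Criterion can be sketched in Python. The language choice, the payoff table, and the probabilities are illustrative assumptions, not taken from the slides. Each alternative's expected value is the probability-weighted sum of its payoffs; because the payoffs are treated as profits, the alternative with the largest expected value is selected.

```python
# Hypothetical payoff table (profits), NOT taken from the slides:
# each decision alternative maps to its payoffs under two states of nature
# (state 1 = strong demand, state 2 = weak demand).
payoffs = {
    "large plant": [200, -20],
    "medium plant": [150, 20],
    "small plant": [100, 60],
}
# Hypothetical probabilities of the two states of nature (they sum to 1).
probabilities = [0.3, 0.7]


def expected_value(row, probs):
    """Probability-weighted sum of one alternative's payoffs."""
    return sum(p * v for p, v in zip(probs, row))


expected = {alt: expected_value(row, probabilities) for alt, row in payoffs.items()}
# Payoffs are profits, so the best expected value is the largest one.
best = max(expected, key=expected.get)

for alt, ev in expected.items():
    print(f"{alt:<13} EV = {ev:6.1f}")
print("Select:", best)
```

With these made-up numbers the small plant is selected (EV 72 versus 59 and 46); different probabilities can change the choice, which is why estimating them from past data matters.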

Uncertainty vs. risk vs. certainty

Management always struggles to convert uncertainty into risk, and risk into certainty.
How?
There is a cost involved in this conversion.

The whole world, or any world, operates under certain laws and influences.
If one understands these laws, one can manage better.
To free oneself from one law, one has to take the help of another law.

If one is not able to know and understand the laws of how things happen (cause and effect), one looks for probabilistic outcomes based on past data, trying to find patterns.
If one is not able to get past data and the associated probabilities, one considers the possible choices based on their outcomes (payoffs) and one's optimistic or pessimistic approach.

Knowledge and information reduce uncertainty and risk.
That is where business statistics and operations research play a role, which has been further developed as business analytics.

In today's scenario, all of business management is about approaching the right decision with more and more certainty.
But with a highly changing, dynamic environment and competition, there is more and more uncertainty.

The problem is compounded when lower management tends to become strategic and senior management tends to become highly rigid and defined (at times to save themselves from difficult or harsh leadership decisions), or at times whimsical.

This results in (or results from) the failure of existing models and systems.

On the other side, now more than ever, we have data: big data.
The challenge is to convert this data into knowledge and information leading to a decision, and to develop more flexible models and systems.

Some approaches
Analysis-based deterministic models, such as inventory models, production planning models, MRP models, and linear programming models
Market research and quality (SPC, SQC), based on sampling and probability
Simulation
Under uncertainty: optimistic, conservative, minimax regret, game theory
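Three of the criteria for decisions under uncertainty (optimistic, conservative, minimax regret) can be sketched in Python on a hypothetical payoff table of profits; the table and function names are illustrative assumptions. No probabilities are used: the optimistic criterion maximizes the best payoff (maximax), the conservative one maximizes the worst payoff (maximin), and minimax regret minimizes the largest opportunity loss. Game theory is not covered by this sketch.

```python
# Hypothetical payoff table (profits), NOT from the slides:
# rows are alternatives, columns are states of nature.
payoffs = {
    "large plant": [200, -20],
    "medium plant": [150, 20],
    "small plant": [100, 60],
}


def optimistic(table):
    """Maximax: pick the alternative whose best payoff is largest."""
    return max(table, key=lambda alt: max(table[alt]))


def conservative(table):
    """Maximin: pick the alternative whose worst payoff is largest."""
    return max(table, key=lambda alt: min(table[alt]))


def minimax_regret(table):
    """Pick the alternative whose maximum regret is smallest.

    Regret = best payoff in that state of nature minus the payoff received.
    """
    best_per_state = [max(column) for column in zip(*table.values())]
    max_regret = {
        alt: max(best - value for best, value in zip(best_per_state, row))
        for alt, row in table.items()
    }
    return min(max_regret, key=max_regret.get), max_regret


choice, regrets = minimax_regret(payoffs)
print("Optimistic (maximax):  ", optimistic(payoffs))
print("Conservative (maximin):", conservative(payoffs))
print("Minimax regret:        ", choice, regrets)
```

On this table the three criteria pick three different alternatives (large, small, and medium plant), which shows how strongly the decision depends on the decision maker's attitude.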

Business statistics

Business statistics is a science assisting you to make business decisions under uncertainty, based on numerical and measurable scales.
Decision-making processes must be based on data, not on personal opinion or belief.

Decisions,
Decisions,
Decisions,
BUSINESS IS ALL ABOUT
Decisions

Business Decisions

Uncertainties

THINGS ARE NOT IN OUR CONTROL

Just like weather, if you cannot control something, you should learn how to measure and analyze it in order to predict it effectively.

It's all about DATA, INFORMATION AND DECISION, dear.

Elements, Variables, and Observations

The elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.
The set of measurements collected for a particular element is called an observation.
The total number of data values in a data set is the number of elements multiplied by the number of variables.

Data, Data Sets, Elements, Variables, and Observations
(Each row is an observation for one element, named in the Company column; the other columns are the variables; the whole table is the data set.)

Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        AMEX             73.10               0.86
EnergySouth    OTC              74.00               1.67
Keystone       NYSE             365.70              0.86
LandCare       NYSE             111.40              0.33
Psychemedics   AMEX             17.60               0.13
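A minimal Python sketch of the Data Set slide (the dictionary layout is a representation choice, not part of the slides): each element is a key, each observation is the tuple of that element's measurements, and the number of data values is the number of elements multiplied by the number of variables.

```python
# The Data Set slide as a Python dict: one key per element (company),
# one tuple of measurements per element (an observation).
variables = ("Stock Exchange", "Annual Sales ($M)", "Earn/Share ($)")
data_set = {
    "Dataram": ("AMEX", 73.10, 0.86),
    "EnergySouth": ("OTC", 74.00, 1.67),
    "Keystone": ("NYSE", 365.70, 0.86),
    "LandCare": ("NYSE", 111.40, 0.33),
    "Psychemedics": ("AMEX", 17.60, 0.13),
}

n_elements = len(data_set)      # entities on which data are collected
n_variables = len(variables)    # characteristics of interest
n_values = n_elements * n_variables

print("Observation for Keystone:", dict(zip(variables, data_set["Keystone"])))
print(f"{n_elements} elements x {n_variables} variables = {n_values} data values")
```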

Measuring DATA

Scales of Measurement
Scales of measurement include:
Nominal
Ordinal
Interval
Ratio
The scale determines the amount of information contained in the data.
The scale indicates the data summarization and statistical analyses that are most appropriate.

Scales of Measurement

Nominal
Data are labels or names used to identify an attribute of the element.
A nonnumeric label or numeric code may be used.

Scales of Measurement
Nominal
Example:
Students of a university are classified by the school in which they are enrolled, using a nonnumeric label such as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).

Scales of Measurement

Ordinal
The data have the properties of nominal data, and the order or rank of the data is meaningful.
A nonnumeric label or numeric code may be used.

Scales of Measurement

Ordinal
Example:
Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).

Scales of Measurement
Interval
The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
Interval data are always numeric.

Scales of Measurement
Interval
Example:
Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.

Scales of Measurement

Ratio
The data have all the properties of interval data, and the ratio of two values is meaningful.
Variables such as distance, height, weight, and time use the ratio scale.
This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.

Scales of Measurement

Ratio
Example:
Melissa's college record shows 36 credit hours earned, while Kevin's record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.
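A tiny Python illustration of the interval and ratio examples (the language is an illustrative choice): on the interval scale the difference carries the meaning, while the ratio scale's true zero also makes the ratio meaningful. Python will divide any two numbers, so it is the scale of measurement, not the arithmetic, that decides which result is interpretable.

```python
# Interval scale (SAT scores): differences have a fixed unit of measure.
melissa_sat, kevin_sat = 1205, 1090
sat_difference = melissa_sat - kevin_sat

# Ratio scale (credit hours): zero means "no credit hours", so the ratio
# of two values is meaningful as well as their difference.
melissa_credits, kevin_credits = 36, 72
credit_ratio = kevin_credits / melissa_credits

print(f"Melissa scored {sat_difference} points more than Kevin.")
print(f"Kevin has {credit_ratio:.0f} times as many credit hours as Melissa.")
```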

BUSINESS
STATISTICS

Always remember: it's all about DATA,
and we want to take decisions based on our collection and interpretation of data.

Types of Data
Data
  Categorical (Qualitative)
  Numerical (Quantitative)
    Discrete
    Continuous

Qualitative Data
Labels or names used to identify an attribute of each element
Often referred to as categorical data
Use either the nominal or ordinal scale of measurement
Can be either numeric or nonnumeric
Appropriate statistical analyses are rather limited

Quantitative Data

Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful for quantitative data.

Quantitative Data
Quantitative data indicate how many or how much:
discrete, if measuring how many
continuous, if measuring how much

GIVE EXAMPLES OF BOTH

Classify each variable as discrete or continuous.
The time it takes to drive to work.
The number of credit cards a person has.
The number of employees working in a large department store.
The number of cars stolen each week in a large city.

Scales of Measurement
Data
  Qualitative
    Nonnumerical: Nominal, Ordinal
    Numerical: Nominal, Ordinal
  Quantitative
    Numerical: Interval, Ratio

For each of the following variables, determine whether the variable is categorical or numerical.
If the variable is numerical, determine whether the variable is discrete or continuous.
a. Number of telephones per household
b. Length (in minutes) of the longest long-distance call made per month
c. Whether someone in the household owns a cell phone

TILL NOW WE HAVE CONSIDERED TYPES OF DATA AND SCALES OF MEASUREMENT


BUT HOW, AND FROM WHERE, DO WE COLLECT DATA?

Data Sources
Primary: Experiment, Survey, Observation
Secondary: Published (& On-Line)

Cross-Sectional Data
Cross-sectional data are collected at the same or approximately the same point in time.
Example: data detailing the number of building permits issued in June 2003 in each of the counties of Ohio

Time Series Data


Time series data are collected over several time periods.
Example: data detailing the number of building permits issued in Lucas County, Ohio in each of the last 36 months

WE CAN HAVE POPULATION DATA OR SAMPLE DATA

WHAT DOES THIS STATEMENT MEAN?

Key Definitions
A population (universe) is the collection of things under consideration.
A sample is a portion of the population selected for analysis.
A parameter is a summary measure computed to describe a characteristic of the population.
A statistic is a summary measure computed to describe a characteristic of the sample.
Population and Sample

Population: use parameters to summarize features
Sample: use statistics to summarize features
Inference on the population from the sample

Statistical Methods
Descriptive Statistics
Inferential Statistics

For each statement, decide whether descriptive or inferential statistics is used.
A recent study showed that eating garlic can lower blood pressure.
The average number of students in a class at White Oak University is 22.6.
Last year's total attendance at Long Run High School's football games was 8235.
The chance that a person will be robbed in a certain city is 15%.

Descriptive Statistics
Descriptive statistics are the
tabular, graphical, and numerical
methods used to summarize data.

Descriptive Statistics:
These are statistical
methods used to
describe data that
have been collected.

Descriptive Statistics:
Tabular and Graphical
Presentations

Summarizing Qualitative Data


Summarizing Quantitative Data

Example: Marada Inn


Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average

Frequency Distribution

Rating           Frequency
Poor             2
Below Average    3
Average          5
Above Average    9
Excellent        1
Total            20

Relative Frequency and Percent Frequency Distributions

Rating           Relative Frequency   Percent Frequency
Poor             .10                  10
Below Average    .15                  15
Average          .25                  25
Above Average    .45                  45
Excellent        .05                  5
Total            1.00                 100

(For example, the relative frequency of Excellent is 1/20 = .05, and the percent frequency of Poor is .10(100) = 10.)
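The three distributions can be computed from the raw ratings with a short Python sketch using `collections.Counter` from the standard library (the tool choice is an assumption; the ratings are the 20 listed for Marada Inn).

```python
from collections import Counter

# The 20 Marada Inn ratings as listed on the slide.
ratings = [
    "Below Average", "Above Average", "Above Average", "Average",
    "Above Average", "Average", "Above Average",
    "Average", "Above Average", "Below Average", "Poor",
    "Excellent", "Above Average", "Average",
    "Above Average", "Above Average", "Below Average", "Poor",
    "Above Average", "Average",
]
# Report the categories in their natural (ordinal) order.
categories = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
n = len(ratings)

table = []
for category in categories:
    frequency = counts[category]
    relative = frequency / n               # e.g. 2/20 = .10
    table.append((category, frequency, relative, 100 * relative))

print(f"{'Rating':<15}{'Freq':>6}{'Rel':>7}{'Pct':>6}")
for category, frequency, relative, percent in table:
    print(f"{category:<15}{frequency:>6}{relative:>7.2f}{percent:>6.0f}")
print(f"{'Total':<15}{n:>6}")
```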

Bar Graph
Marada Inn Quality Ratings (figure: frequency, 0 to 10, on the vertical axis; ratings Poor, Below Average, Average, Above Average, Excellent on the horizontal axis)

Pie Chart
Marada Inn Quality Ratings (figure: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%)

Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Relative Frequency Distribution


The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Bar Graph
A bar graph is a graphical device for presenting
qualitative data.
On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).

The bars are separated to emphasize the fact that each class is a separate category.

Pie Chart
The pie chart is a commonly used graphical device
for presenting relative frequency distributions for
qualitative data.

Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
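Applying the same rule to every Marada Inn category gives all the slice angles; a minimal Python sketch using the relative frequencies from the Marada Inn table (Python is an illustrative choice):

```python
# Relative frequencies from the Marada Inn table.
relative_frequency = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each slice gets its share of the 360 degrees in a circle.
degrees = {rating: rf * 360 for rating, rf in relative_frequency.items()}

for rating, angle in degrees.items():
    print(f"{rating:<15}{angle:6.1f} degrees")
```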

Example: Marada Inn

Insights Gained from the Preceding Pie Chart
One-half of the customers surveyed gave Marada a quality rating of above average or excellent (looking at the left side of the pie). This might please the manager.
For each customer who gave an excellent rating, there were two customers who gave a poor rating (looking at the top of the pie). This should displease the manager.

A categorical variable has three categories with the following frequencies of occurrence:

a. Compute the percentage of values in each category.
b. Construct a bar chart.
c. Construct a pie chart.
d. Construct a Pareto diagram.

You manage a team that sells computer hardware to software development companies. At each company, your representatives have a primary contact. You have categorized these contacts by the department of the company in which they work (Development, Computer Services, Finance, Other, Don't Know) (contacts.sav).
Use Frequencies to study the distribution of departments to see if it meshes with your goals.

The Frequencies procedure produces a frequency table and pie chart for the variable dept.
What's the interpretation?

At a glance, you see that the plurality of your contacts work in the computer services departments of their respective companies, followed by those in the financial and development departments.

Business Statistics

PARETO PRINCIPLE
The Pareto principle exists when
the majority of items in a set of
data occur in a small number of
categories and the few remaining
items are spread out over a large
number of categories. These two
groups are often referred to as the
vital few and the trivial many.

The Pareto Diagram


In a Pareto diagram, the
categorized responses are plotted
in descending order, according to
their frequencies, and are combined
with a cumulative percentage line on
the same chart.
The Pareto diagram can identify
situations in which the Pareto principle
occurs.

The Pareto diagram has the ability to separate the vital few from the trivial many, enabling you to focus on the important categories.
In situations in which the data involved
consist of defective or nonconforming
items, the Pareto diagram is a powerful
tool for prioritizing improvement efforts.

We have data for a large injection-molding company that manufactures plastic molded components used in computer keyboards, washing machines, automobiles, and television sets.
The data presented in the table consist of all computer keyboards with defects produced during a three-month period.

Using Frequencies to Study Ordinal Data
In addition to the department of each contact, you have recorded their company ranks. Use Frequencies to study the distribution of company ranks to see if it meshes with your goals.

The frequency table for ordinal data serves much the same purpose as the table for nominal data. For example, you can see from the table that 15.7% of your contacts are junior managers.

However, when studying ordinal data, the Cumulative Percent is much more useful. The table, since it has been ordered by descending values, shows that 62.7% of your contacts are of at least senior manager rank.

Medication errors are a serious problem in hospitals. The following data represent the root causes of pharmacy errors at a hospital during a recent time period:

Reason for Failure                       Frequency
Additional instructions                  16
Dose                                     23
Drug                                     14
Duplicate order entry                    22
Frequency                                47
Omission                                 21
Order not discontinued when received     12
Order not received                       52
Patient                                  5
Route                                    4
Other                                    8

a. Construct a Pareto diagram.
b. Discuss the "vital few" and "trivial many" reasons for the root causes of pharmacy errors.
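One way to prepare the data for part (a) is sketched in Python: sort the categories by descending frequency and add a cumulative percentage column, which supplies the bars and the cumulative line of the Pareto diagram. Drawing the chart itself is left to a charting tool. Some presentations place "Other" last regardless of its frequency; this sketch sorts purely by frequency.

```python
# Root causes of pharmacy errors, from the table above.
errors = {
    "Additional instructions": 16,
    "Dose": 23,
    "Drug": 14,
    "Duplicate order entry": 22,
    "Frequency": 47,
    "Omission": 21,
    "Order not discontinued when received": 12,
    "Order not received": 52,
    "Patient": 5,
    "Route": 4,
    "Other": 8,
}

total = sum(errors.values())
# Pareto order: categories in descending order of frequency.
ordered = sorted(errors.items(), key=lambda item: item[1], reverse=True)

pareto = []
cumulative = 0
for reason, frequency in ordered:
    cumulative += frequency
    pareto.append((reason, frequency, 100 * frequency / total, 100 * cumulative / total))

for reason, frequency, pct, cum_pct in pareto:
    print(f"{reason:<38}{frequency:>4}{pct:>7.1f}%{cum_pct:>7.1f}%")
```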

Summarizing Quantitative Data
Frequency Distribution
Relative Frequency and Percent
Frequency Distributions
Dot Plot
Histogram
Cumulative Distributions
Ogive

Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Example: Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups

91    78    93    57    75    52    99    80    97    62
71    69    72    89    66    75    79    75    72    76
104   74    62    68    97    105   77    65    80    109
85    97    88    68    83    68    71    69    67    74
62    82    98    101   79    105   79    69    62    73

Frequency Distribution Table
Steps
1- Determine the range
2- Select the number of classes (usually between 5 and 20 inclusive)
3- Compute the class interval (width)
4- Determine the class boundaries (limits)
5- Compute the class midpoints
6- Count the observations and assign them to classes

Frequency Distribution
Guidelines for Selecting Number of Classes
Use between 5 and 20 classes.
Data sets with a larger number of elements usually require a larger number of classes.
Smaller data sets usually require fewer classes.

Frequency Distribution (Continued)
Guidelines for Selecting Width of Classes
Use classes of equal width.
$$\text{Approximate Class Width} = \frac{\text{Largest Data Value} - \text{Smallest Data Value}}{\text{Number of Classes}}$$

Example: Frequency Distribution
For Hudson Auto Repair, if we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10

Parts Cost ($)   Frequency
50-59            2
60-69            13
70-79            16
80-89            7
90-99            7
100-109          5
Total            50
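The steps above (range, number of classes, width, limits, counting) can be sketched in Python for the 50 Hudson Auto Repair parts costs. The width of 10 and the first lower limit of 50 are the choices made on the slide, not something the code derives.

```python
# The 50 tune-up parts costs ($) from the Hudson Auto Repair sample.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
data_range = max(parts_cost) - min(parts_cost)   # step 1: 109 - 52
approx_width = data_range / num_classes          # step 3: approximate width
width = 10            # rounded up to a convenient width, as on the slide
first_lower = 50      # first lower class limit, as on the slide

frequencies = []
for k in range(num_classes):
    lower = first_lower + k * width
    upper = lower + width - 1                    # stated limits, e.g. 50-59
    count = sum(lower <= x <= upper for x in parts_cost)
    frequencies.append((f"{lower}-{upper}", count))

for label, count in frequencies:
    print(f"{label:>8}{count:>4}")
print(f"{'Total':>8}{sum(c for _, c in frequencies):>4}")
```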

Relative Frequency and Percent Frequency Distributions

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59            .04                  4
60-69            .26                  26
70-79            .32                  32
80-89            .14                  14
90-99            .14                  14
100-109          .10                  10
Total            1.00                 100

(For example, 2/50 = .04 and .04(100) = 4.)

Histogram

Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Histogram
Tune-up Parts Cost (figure: frequency, 0 to 18, on the vertical axis; parts cost ($) classes 50-59 through 100-109 on the horizontal axis)

Histogram (Continued)
Symmetric
Left tail is the mirror image of the right tail
Example: heights and weights of people
(figure: relative frequency histogram, vertical axis from 0 to .35)

Histogram (Continued)
Moderately Skewed Left
A longer tail to the left
Example: exam scores
(figure: relative frequency histogram, vertical axis from 0 to .35)

Histogram (Continued)
Moderately Skewed Right
A longer tail to the right
Example: housing values
(figure: relative frequency histogram, vertical axis from 0 to .35)

Histogram (Continued)
Highly Skewed Right
A very long tail to the right
Example: executive salaries
(figure: relative frequency histogram, vertical axis from 0 to .35)

Cumulative Distributions
The cumulative frequency distribution shows the number of items with values less than or equal to the upper limit of each class.
The cumulative relative frequency distribution shows the proportion of items with values less than or equal to the upper limit of each class.
The cumulative percent frequency distribution shows the percentage of items with values less than or equal to the upper limit of each class.

Cumulative Distributions
Example: Hudson Auto Repair

Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59       2                      .04                             4
≤ 69       15                     .30                             30
≤ 79       31                     .62                             62
≤ 89       38                     .76                             76
≤ 99       45                     .90                             90
≤ 109      50                     1.00                            100

(For example, 15 = 2 + 13, .30 = 15/50, and 30 = .30(100).)
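The cumulative columns are running totals of the class frequencies; a Python sketch using `itertools.accumulate` from the standard library (an implementation choice, not something the slides prescribe):

```python
from itertools import accumulate

# Class upper limits and frequencies for the Hudson Auto Repair parts costs.
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]
n = sum(frequencies)

cumulative_freq = list(accumulate(frequencies))       # running totals
cumulative_rel = [c / n for c in cumulative_freq]     # proportions
cumulative_pct = [100 * r for r in cumulative_rel]    # percentages

for upper, cf, cr, cp in zip(upper_limits, cumulative_freq, cumulative_rel, cumulative_pct):
    print(f"<= {upper:<4}{cf:>4}{cr:>7.2f}{cp:>6.0f}")
```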

Ogive
An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal axis.
Shown on the vertical axis are the:
cumulative frequencies, or
cumulative relative frequencies, or
cumulative percent frequencies
The cumulative frequency (one of the above) of each class is plotted as a point.
The plotted points are connected by straight lines.

Ogive
Example: Hudson Auto Repair
Because the class limits for the parts-cost
data are 50-59, 60-69, and so on, there
appear to be one-unit gaps from 59 to 60,
69 to 70, and so on.
These gaps are eliminated by plotting points
halfway between the class limits.
Thus, 59.5 is used for the 50-59 class, 69.5
is used for the 60-69 class, and so on.
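The ogive's points can be listed with a short Python sketch. Starting the curve at 0% at 49.5 (the lower true limit of the first class) is a common convention assumed here, and `read_ogive` is a hypothetical helper that reads a value off the straight-line segments by linear interpolation.

```python
# Cumulative percent frequencies for the Hudson Auto Repair classes.
upper_limits = [59, 69, 79, 89, 99, 109]
cumulative_pct = [4, 30, 62, 76, 90, 100]

# Plot each point halfway between stated class limits (59.5, 69.5, ...),
# which removes the one-unit gaps. Starting at (49.5, 0) is an assumed convention.
ogive_points = [(49.5, 0)] + [(u + 0.5, p) for u, p in zip(upper_limits, cumulative_pct)]


def read_ogive(x, points):
    """Hypothetical helper: linearly interpolate the cumulative percent at x."""
    if x <= points[0][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return 100.0


print(ogive_points)
print(f"About {read_ogive(85, ogive_points):.0f}% of tune-ups cost $85 or less in parts.")
```

Reading the ogive at $85 gives roughly 70%; this is an estimate from the grouped data, not an exact count of invoices.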

Ogive with Cumulative Percent Frequencies
Tune-up Parts Cost (figure: cumulative percent frequency, 0 to 100, on the vertical axis; parts cost ($), 50 to 110, on the horizontal axis; the point (89.5, 76) is labeled)

Stated and True (or Real) Class Limits


True classes are classes such that the upper true limit of a class is the same as the lower true limit of the next class.
For comparison, the stated class limits and true class limits are given in the following table:

Stated and True (or Real) Class Limits

Stated        True
$600-$799     $599.50 up to but not including $799.50
$800-$999     $799.50 up to but not including $999.50

In the first column of the above table, the data were rounded to the nearest dollar. For example, $799.50 was rounded up to $800 and tallied in the second class. Any amount over $799 but under $799.50 was rounded down to $799 and included in the first class. Thus, the $600-$799 class actually includes all data from $599.50 inclusive up to but not including $799.50.

Universal Burger is concerned about product waste, so they sampled their burger waste records from the past year with the following results:
2, 16, 4, 12, 19, 29, 24, 7, 19, 22, 14, 24, 31, 18, 20, 16

Construct a frequency distribution and a relative frequency distribution for these data. Use intervals of 5 burgers.
One of the goals is for at least 75% of shifts to have no more than 16 burgers wasted. Can you determine from the frequency distribution whether this goal has been achieved?

Dot Plot
One of the simplest graphical
summaries of data is a dot plot.
A horizontal axis shows the range
of data values.
Then each data value is
represented by a dot placed above
the axis.

Dot Plot
Tune-up Parts Cost (figure: one dot per invoice above a horizontal Cost ($) axis from 50 to 110)

Stem and Leaf diagram


A stem-and-leaf display
organizes data into groups
(called stems) so that the values
within each group (the leaves)
branch out to the right on each row.
The resulting display allows you to
see how the data are distributed and
where concentrations of data exist.

Suppose that 15 students from your class eat lunch at a fast-food restaurant. The following data are the amounts spent for lunch:
5.40, 4.30, 4.80, 5.50, 7.30, 8.50, 6.10, 4.80, 4.90, 4.90, 5.50, 3.50, 5.90, 6.30, 6.60

To form the stem-and-leaf display, you use the units as the stems and round the decimals (the leaves) to one decimal place.
For example, the first value is 5.40. Its stem (row) is 5, and its leaf is 4. The second value is 4.30. Its stem (row) is 4, and its leaf is 3.
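A minimal Python sketch of the stem-and-leaf construction for the 15 lunch amounts: each value is rounded to tenths, the units digit becomes the stem, and the tenths digit becomes the leaf. Converting through `round(v * 10)` is an implementation assumption that avoids relying on exact floating-point products.

```python
from collections import defaultdict

# Amounts spent for lunch by the 15 students.
lunch = [5.40, 4.30, 4.80, 5.50, 7.30, 8.50, 6.10, 4.80,
         4.90, 4.90, 5.50, 3.50, 5.90, 6.30, 6.60]


def stem_and_leaf(values):
    """Map each stem (units digit) to its sorted leaves (tenths digit)."""
    display = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # e.g. 5.40 -> 54
        stem, leaf = divmod(tenths, 10)   # 54 -> stem 5, leaf 4
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(lunch)
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```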

CROSS TABULATIONS
The study of patterns that may exist
between two or more categorical
variables is common in business.
These patterns are explained by
cross-tabulating the data.
You can present crosstabulations in tabular form (contingency tables) or graphical form.

The Contingency Table


A contingency table presents the results of two categorical variables. The joint responses are classified so that the categories of one variable are located in the rows and the categories of the other variable are located in the columns. The values located at the intersections of the rows and columns are called cells.

Depending on the type of contingency table constructed, the cells for each row-column combination contain the frequency, the percentage of the overall total, the percentage of the row total, or the percentage of the column total.
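The Mutualfunds.xls data are not reproduced in these notes, so the Python sketch below uses a small invented set of (category, risk) records purely to show how the cell frequencies and the overall, row, and column percentages are computed.

```python
from collections import Counter

# Hypothetical (category, risk) records invented for illustration;
# they are NOT the Mutualfunds.xls data.
funds = [
    ("Growth", "High"), ("Growth", "Average"), ("Growth", "High"),
    ("Value", "Low"), ("Value", "Average"), ("Value", "Low"),
    ("Growth", "Low"), ("Value", "Average"),
]
row_labels = ["Growth", "Value"]          # categories of the row variable
col_labels = ["Low", "Average", "High"]   # categories of the column variable

cells = Counter(funds)   # frequency per (row, column) cell; missing cells count 0
total = len(funds)


def row_total(row):
    return sum(cells[(row, col)] for col in col_labels)


def column_total(col):
    return sum(cells[(row, col)] for row in row_labels)


overall_pct = {(r, c): 100 * cells[(r, c)] / total
               for r in row_labels for c in col_labels}
row_pct = {(r, c): 100 * cells[(r, c)] / row_total(r)
           for r in row_labels for c in col_labels}
column_pct = {(r, c): 100 * cells[(r, c)] / column_total(c)
              for r in row_labels for c in col_labels}

print(f"{'':<8}" + "".join(f"{c:>9}" for c in col_labels) + f"{'Total':>7}")
for r in row_labels:
    freqs = "".join(f"{cells[(r, c)]:>9}" for c in col_labels)
    print(f"{r:<8}{freqs}{row_total(r):>7}")
```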

Mutualfunds .xls
Cross tabulation of category vs
objective
Cross tabulation of objective vs risk

The Scatter Plot


You use a scatter plot to examine possible relationships between two numerical variables.
For each observation, you plot one variable on the horizontal X axis and the other variable on the vertical Y axis.

the cost of a fast-food hamburger meal and the cost of two movie tickets in 10 cities around the world. The data file Cost of Living.xls contains the complete data set.

a. Construct six separate scatter plots. For each, use the overall cost index as the Y axis. Use the monthly rent for a two-bedroom apartment, the costs of a cup of coffee with service, a fast-food hamburger meal, dry cleaning a men's blazer, toothpaste, and movie tickets as the X axis.
b. What conclusions can you reach about the relationship of the overall cost index to these six variables?


Descriptive Statistics:
Numerical Measures
Measures of Location
Measures of Variability

Measures of Location
Mean
Median
Mode
Percentiles
Quartiles
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.
For example, the sample mean is a point estimator of the population mean.

Mean
The mean of a data set is the average of all the data values.
As we said, the sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

Sample Mean
$$\bar{x} = \frac{\sum x_i}{n}$$
where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.

Population Mean
$$\mu = \frac{\sum x_i}{N}$$
where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.

Sample Mean
Example: Apartment Rents
Seventy efficiency apartments
were randomly sampled in
a small college town. The
monthly rent prices for
these apartments are listed
in ascending order on the next slide.

Sample Mean
Example Continued
Monthly Rent for 70 Apartments

425   430   430   435   435   435   435   435   440   440
440   440   440   445   445   445   445   445   450   450
450   450   450   450   450   460   460   460   465   465
465   470   470   472   475   475   475   480   480   480
480   485   490   490   490   500   500   500   500   510
510   515   525   525   525   535   549   550   570   570
575   575   580   590   600   600   600   600   615   615

Sample Mean
Example Continued
$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$
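The same arithmetic can be checked with a short Python sketch. The list `RENTS` is simply the 70 rents from the table typed in row by row, and `sample_mean` is an illustrative helper name, not part of any library.

```python
# Monthly rents ($) for the 70 sampled apartments, in ascending order,
# ten values per row as in the rent table.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_mean(values):
    """Sample mean: the sum of the n observations divided by n."""
    if not values:
        raise ValueError("the sample mean needs at least one observation")
    return sum(values) / len(values)


total = sum(RENTS)
n = len(RENTS)
x_bar = sample_mean(RENTS)
print(f"sum = {total:,}  n = {n}  mean = {x_bar:.2f}")
# prints: sum = 34,356  n = 70  mean = 490.80
```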

Arithmetic Mean of Grouped Data
If $z_1, z_2, z_3, \ldots, z_k$ are the mid-values and $f_1, f_2, f_3, \ldots, f_k$ are the corresponding frequencies, where the subscript k stands for the number of classes, then the mean is
$$\bar{z} = \frac{\sum f_i z_i}{\sum f_i}$$
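As an illustration only, the grouped-mean formula can be applied to the age-at-birth frequency table used in Example-3 of this section, taking each mid-value as the midpoint of the class boundaries. This is a minimal Python sketch; `grouped_mean` is a hypothetical helper name.

```python
# Illustrative data: the age-at-birth table from Example-3 (class boundaries, births).
CLASSES = [(14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5),
           (34.5, 39.5), (39.5, 44.5), (44.5, 49.5)]
BIRTHS = [677, 1908, 1737, 1040, 294, 91, 16]


def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f_i * z_i) / sum(f_i), z_i = class mid-value."""
    if len(classes) != len(freqs) or sum(freqs) <= 0:
        raise ValueError("classes and freqs must align and the total frequency must be positive")
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * z for f, z in zip(freqs, mids)) / sum(freqs)


z_bar = grouped_mean(CLASSES, BIRTHS)
print(f"grouped mean age = {z_bar:.2f} years")
```

Because every observation is replaced by its class mid-value, the grouped mean is only an approximation of the mean of the raw (ungrouped) ages.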

Geometric Mean
Geometric mean is defined as the positive nth root of the product of the n observations. Symbolically,
$$G = (x_1 x_2 x_3 \cdots x_n)^{1/n}$$
It is often used for a set of numbers whose values are meant to be multiplied together or are exponential in nature, such as data on the growth of the human population or interest rates of a financial investment.
Find the geometric mean of the rates of growth: 34, 27, 45, 55, 22, 34.
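A minimal Python sketch of the geometric mean, applied to the six growth rates exactly as listed. The slide does not say whether they should first be converted to growth factors such as 1 + r/100, which is common practice for percentage rates, so that step is left out here. The nth root is computed through logarithms and cross-checked against the standard library's `statistics.geometric_mean` (Python 3.8+); `geometric_mean` below is a hand-written helper.

```python
import math
import statistics


def geometric_mean(values):
    """G = (x1 * x2 * ... * xn) ** (1/n), computed via logarithms so a long
    product cannot overflow. Every value must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


rates = [34, 27, 45, 55, 22, 34]
g = geometric_mean(rates)
print(f"geometric mean  = {g:.2f}")
print(f"arithmetic mean = {statistics.mean(rates):.2f}")
print(f"stdlib check    = {statistics.geometric_mean(rates):.2f}")
```

The geometric mean (about 34.5) comes out below the arithmetic mean (about 36.17); for positive data it can never exceed the arithmetic mean.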

Weighted Mean
The weighted mean of the positive real numbers x1, x2, ..., xn with weights w1, w2, ..., wn is defined to be
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
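A minimal Python sketch of the weighted mean. The scores and weights are hypothetical, chosen only to show how the weights pull the result away from the unweighted mean.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: three exam scores weighted 2, 3 and 5.
scores = [80, 90, 70]
weights = [2, 3, 5]
print(weighted_mean(scores, weights))   # 78.0
print(sum(scores) / len(scores))        # 80.0 (unweighted mean)
```

The heaviest weight sits on the lowest score, so the weighted mean falls below the simple average.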

Potential Problem with Means
Sample mean gives equal weight to all measurements
Outliers can have a large influence on the computed mean value
This distorts our intuition about the central tendency of the measured values

Potential Problem with Means
[Figure: two panels, each labeled "Mean"]

Median

The median of a data set is the value in the middle when the data items are arranged in ascending order.
Whenever a data set has extreme values, the median is the preferred measure of central location.
The median is the measure of location most often reported for annual income and property value data.
A few extremely large incomes or property values can inflate the mean.
The position of the median in the ordered data is
$$\text{Positioning point} = \frac{n + 1}{2}$$

Median
For an odd number of observations:
26 18 27 12 14 27 19

7 observations

12 14 18 19 26 27 27

in ascending order

the median is the middle value.


Median = 19

Median
For an even number of observations:
26 18 27 12 14 27 30 19

8 observations

12 14 18 19 26 27 27 30

in ascending order

the median is the average of the middle two values.


Median = (19 + 26)/2 = 22.5
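The two cases can be expressed as a small Python function. This is a sketch: `median` here is a hand-written helper, although the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Median via the positioning point (n + 1) / 2 on the sorted data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd n: the single middle value
    return (data[mid - 1] + data[mid]) / 2    # even n: average of the middle two


print(median([26, 18, 27, 12, 14, 27, 19]))       # 19
print(median([26, 18, 27, 12, 14, 27, 30, 19]))   # 22.5
```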

Median: Example
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475

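Median of Grouped Data
$$M_e = L_0 + \frac{h}{f_0}\left(\frac{n}{2} - F\right)$$
L0 = Lower class boundary of the median class
h = Width of the median class
f0 = Frequency of the median class
F = Cumulative frequency of the pre-median class (the class just before the median class)
n = Total frequency
The median class is the first class whose cumulative frequency is at least n/2.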

Example-3: Find the Median
Age in years   Number of births   Cumulative number of births
14.5-19.5       677                677
19.5-24.5      1908               2585
24.5-29.5      1737               4322
29.5-34.5      1040               5362
34.5-39.5       294               5656
39.5-44.5        91               5747
44.5-49.5        16               5763
All ages       5763
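A minimal Python sketch of the grouped-median formula applied to the Example-3 table. Here n/2 = 2881.5, and the first class whose cumulative frequency reaches it is 24.5-29.5, so L0 = 24.5, h = 5, f0 = 1737 and F = 2585. `grouped_median` is a hypothetical helper name.

```python
CLASSES = [(14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5),
           (34.5, 39.5), (39.5, 44.5), (44.5, 49.5)]
BIRTHS = [677, 1908, 1737, 1040, 294, 91, 16]


def grouped_median(classes, freqs):
    """Me = L0 + (h / f0) * (n/2 - F) for classes given as (lower, upper) boundaries."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cum = 0                                   # F: cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cum + f >= half:         # first class reaching n/2 is the median class
            h = upper - lower
            return lower + (h / f) * (half - cum)
        cum += f
    raise ValueError("classes and frequencies do not line up")


me = grouped_median(CLASSES, BIRTHS)
print(f"median age = {me:.2f} years")
```

The result is about 25.35 years, that is, 24.5 + (5/1737)(2881.5 - 2585).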

Potential Problem with Means
[Figure: two panels, each labeled "Median" and "Mean"]

Mode

The mode of a data set is the value that occurs with the greatest frequency.
The greatest frequency can occur at two or more
different values.
If the data have exactly two modes, the data are
bimodal.
If the data have more than two modes, the data are
multimodal.

Mode: Example
450 occurred most frequently (7 times)
Mode = 450

Mode: Another Example
No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
One Mode
Raw Data: 6.0 4.9 6.0 4.9 8.9 6.3 4.9
Mode = 4.9
More Than 1 Mode
Raw Data: 21 28 28 41 43 43
Modes = 28 and 43
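A minimal Python sketch that returns every mode. The convention that a data set in which each value occurs only once has no mode follows the slide; the standard library's `statistics.multimode` would instead return all values in that case. `modes` is a hypothetical helper name.

```python
from collections import Counter


def modes(values):
    """All values with the greatest frequency; [] means 'no mode'
    (every value occurs only once)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))     # []
print(modes([6.0, 4.9, 6.0, 4.9, 8.9, 6.3, 4.9]))  # [4.9]
print(modes([21, 28, 28, 41, 43, 43]))             # [28, 43]
```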

Use Excel to Compute the Mean, Median, and Mode of the Monthly Rent Data for the 70 Apartments and Explain the Answers
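If Excel is not at hand, Python's `statistics` module gives the same three values and can serve as a cross-check. This is a sketch; `statistics.mode` is assumed to agree with Excel's `MODE` here only because this data set has a single most frequent value.

```python
import statistics

# Monthly rents ($) for the 70 apartments, ascending, ten per row.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(RENTS)      # Excel: =AVERAGE(...)
median = statistics.median(RENTS)  # Excel: =MEDIAN(...)
mode = statistics.mode(RENTS)      # Excel: =MODE(...); one clear mode here
print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

One reading of the results: the mean (490.80) sits above the median (475) because a handful of high rents, up to 615, pull the average up, while the mode (450) is simply the most common rent.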

Mean, Median, or Mode?


Mean
If the sum of all values is meaningful
Incorporates all available information

Median
Intuitive sense of central tendency with
outliers
What is typical of a set of values?

Mode
When data can be grouped into distinct
types, categories (categorical data)

Which mean to use?


Mean value must still conform to
characteristics of a good performance
metric
Linear
Reliable
Repeatable
Easy to use
Consistent
Independent

The best measure of performance is still execution time

Percentiles

A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
Admission test scores for colleges and universities are frequently reported in terms of percentiles.
You are probably familiar with percentile scores on national educational tests such as the MAT and SAT, which tell you where you stand in comparison with others.
For example, if you are in the 83rd percentile, then 83% of the test-takers scored below you and you are in the top 17% of the test-takers.

Percentiles
Definition
The pth percentile of a data set is a
value such that at least p percent of
the items take on this value or less
and at least (100 - p) percent of the
items take on this value or more.

Steps for Finding Percentiles


Arrange the data in ascending order.
Compute index i, the position of the pth percentile.
i = (p/100)n
If i is not an integer, round up. The pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
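The steps can be sketched in Python. `Fraction` keeps the index i exact, so the "is i an integer?" test is not disturbed by floating-point rounding; positions are 1-based in the steps and 0-based in Python lists. `percentile` is a hypothetical helper implementing this textbook rule, which is not the rule Excel's `PERCENTILE` uses.

```python
import math
from fractions import Fraction

# Monthly rents ($) for the 70 apartments, ascending, ten per row.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """pth percentile (0 < p < 100) by the steps: i = (p/100) * n; if i is not
    an integer, round up and take the ith value; otherwise average the
    values in positions i and i + 1 (1-based)."""
    if not values or not 0 < p < 100:
        raise ValueError("need data and 0 < p < 100")
    data = sorted(values)                    # step 1: ascending order
    i = Fraction(p) * len(data) / 100        # step 2: exact index
    if i.denominator != 1:                   # step 3: not an integer, round up
        return data[math.ceil(i) - 1]        # 1-based position -> 0-based index
    k = int(i)                               # step 4: integer, average i and i + 1
    return (data[k - 1] + data[k]) / 2


print(percentile(RENTS, 80))   # 542.0
```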

80th Percentile: Example


i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
Note: The data are in ascending order.

80th Percentile: Example Continued

At least 80% of the items take on a value of 542 or less (56/70 = .8 or 80%).
At least 20% of the items take on a value of 542 or more (14/70 = .2 or 20%).

Use Excel to Find 80th Percentile


Excel Formula Worksheet
     A          B                  C    D
1    Apartment  Monthly Rent ($)        80th Percentile
2    1          525                     =PERCENTILE(B2:B71,.8)
3    2          440
4    3          450
5    4          615
6    5          480
Note: Rows 7-71 are not shown.
It is not necessary to put the data in ascending order.

80th Percentile
Excel Value Worksheet

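     A          B                  C    D
1    Apartment  Monthly Rent ($)        80th Percentile
2    1          525                     537.8
3    2          440
4    3          450
5    4          615
6    5          480
Note: Rows 7-71 are not shown.
Excel's PERCENTILE function interpolates between neighboring values at position (n - 1)p + 1 = 69(.8) + 1 = 56.2 of the sorted data, so it returns 535 + .2(549 - 535) = 537.8 rather than the 542 obtained with the percentile steps (average of positions 56 and 57).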

Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile

Unless the sample size is large, percentiles may not make sense, since percentiles divide the data into 100 groups.
In smaller samples, we might divide the data into four groups (quartiles). Since almost any sample can be divided into four groups, the quartiles are important descriptive statistics to explain.

Third Quartile
Excel Formula Worksheet

     A          B                  C    D
1    Apartment  Monthly Rent ($)        Third Quartile
2    1          525                     =QUARTILE(B2:B71,3)
3    2          440
4    3          450
5    4          615
6    5          480
Note: Rows 7-71 are not shown.
It is not necessary to put the data in ascending order.

Third Quartile
Excel Value Worksheet

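     A          B                  C    D
1    Apartment  Monthly Rent ($)        Third Quartile
2    1          525                     522.5
3    2          440
4    3          450
5    4          615
6    5          480
Note: Rows 7-71 are not shown.
QUARTILE(B2:B71,3) is the same as PERCENTILE(B2:B71,.75): it interpolates at position 69(.75) + 1 = 52.75, giving 515 + .75(525 - 515) = 522.5. The percentile steps instead give i = (75/100)70 = 52.5, rounded up to position 53, which is 525.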
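Excel's `PERCENTILE` and `QUARTILE` (and their newer names `PERCENTILE.INC` and `QUARTILE.INC`) interpolate linearly between the two sorted values that surround position (n - 1)p + 1. A minimal Python sketch of that rule, which reproduces 537.8 and 522.5 for the rent data; the helper names are hypothetical.

```python
import math

# Monthly rents ($) for the 70 apartments, ascending, ten per row.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def excel_percentile(values, k):
    """Linear interpolation as in Excel's PERCENTILE / PERCENTILE.INC, 0 <= k <= 1.
    The 0-based rank r = (n - 1) * k sits between two sorted values."""
    if not values or not 0 <= k <= 1:
        raise ValueError("need data and 0 <= k <= 1")
    data = sorted(values)
    r = (len(data) - 1) * k
    lo = math.floor(r)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (r - lo) * (data[hi] - data[lo])


def excel_quartile(values, quart):
    """QUARTILE(values, quart) equals PERCENTILE(values, quart / 4) for quart 0..4."""
    return excel_percentile(values, quart / 4)


print(f"{excel_percentile(RENTS, 0.8):.1f}")  # 537.8
print(f"{excel_quartile(RENTS, 3):.1f}")      # 522.5
```

NumPy's `numpy.percentile` with its default linear method follows the same convention (with q given in percent), so it is a convenient second check.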

ABC Computer Company manufactures a computer with a 3-year warranty.
At present, only about 3% of its computers fail before the end of the warranty period.
Currently, Supplier X supplies the chip (the main component of the computer).
There are two chip suppliers, X and Y.
A consultant said to take a sample of 10 chips from each supplier and compare the mean life.
The results: the average life of X chips is 10 years, and the average life of Y chips is 20 years.
The consultant orders the company to stop buying from X and switch to Y.

After switching to supplier Y, more than 50% of the computers started failing before the end of the warranty period.
What happened?

Supplier X chips, n = 10 (life in years):
10, 10, 10, 10, 9, 11, 10, 11, 9, 10
Mean = 10 years, range = 2 years, and standard deviation = 0.67 years
Supplier Y chips, n = 10 (life in years):
40, 2, 37, 1, 0, 31, 49, 1, 38, 1
Mean = 20 years, range = 49 years, and standard deviation = 20.50 years
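A Python sketch that recomputes the two summaries with the standard library (`statistics.stdev` uses the n - 1 divisor, matching the reported 0.67 and 20.50). Counting chips that last less than the 3-year warranty is an added illustration of why the higher mean of supplier Y was misleading.

```python
import statistics

# Chip life in years for the two samples of n = 10.
CHIPS = {
    "X": [10, 10, 10, 10, 9, 11, 10, 11, 9, 10],
    "Y": [40, 2, 37, 1, 0, 31, 49, 1, 38, 1],
}
WARRANTY_YEARS = 3

summary = {}
for supplier, life in CHIPS.items():
    summary[supplier] = {
        "mean": statistics.mean(life),
        "range": max(life) - min(life),
        "sd": statistics.stdev(life),   # sample standard deviation (n - 1 divisor)
        # chips whose life is shorter than the warranty period
        "fail_in_warranty": sum(1 for x in life if x < WARRANTY_YEARS),
    }
    s = summary[supplier]
    print(f"{supplier}: mean = {s['mean']}, range = {s['range']}, "
          f"sd = {s['sd']:.2f}, failed within warranty = {s['fail_in_warranty']}/{len(life)}")
```

Half of the sampled Y chips last less than 3 years, while none of the X chips do: the large standard deviation of Y, not its mean, is what matters for the warranty.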

The New York City Transit Authority has just purchased a brand-new type of fluorescent light bulb.
These bulbs have an average life of 5.10 years with a standard deviation of 0.00.
The consultant for the Transit Authority recommends that on one Saturday evening between midnight and 5 a.m., when the number of riders is small, all the bulbs in the subway system should be replaced.
A worker for the system suggests replacing 5% of the bulbs each Saturday evening for the next twenty weeks until all the bulbs are replaced.
The consultant insists that it is cheaper and more efficient to replace all the bulbs at once. The Transit Authority listens to the consultant.

Set A: 2, 5, 17, 17, 44.
Set B: 17, 17, 17, 17, 17.
Set C: 13, 14, 17, 17, 24.
All three sets have an average of 17, although all the sets are different.
We need measures that account for such dispersion of data.

The word dispersion is used to denote the degree of heterogeneity in the data.
It is an important characteristic indicating the extent to which observations vary amongst themselves.
A measure of dispersion is designed to state numerically the extent to which individual observations vary on the average.

Measures of Variability
(Dispersion)

It is often desirable to consider measures of variability (dispersion), as well as measures of location.
For example, in choosing supplier A or supplier B, we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Measures of Variability
(Dispersion)
Range
Interquartile Range or
Midspread
Variance
Standard Deviation
Coefficient of Variation

Range

The range of a data set is the difference between the largest and smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data values.

Range: Example
Range = largest value - smallest value
Range = 615 - 425 = 190

Interquartile Range or
Midspread

The interquartile range of a data set is the difference between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values, because it is not affected by the extreme values.
$$\text{Interquartile Range} = Q_3 - Q_1$$

Interquartile Range:
Example
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
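A Python sketch of the range and the interquartile range for the rent data, using the textbook percentile steps (the helper is defined again so the block runs on its own). The second call replaces the largest rent with a hypothetical mistyped value of 1615 to show that the range reacts to a single extreme value while the IQR does not.

```python
import math
from fractions import Fraction

# Monthly rents ($) for the 70 apartments, ascending, ten per row.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """Hand rule: i = (p/100)n; round a fractional i up, otherwise
    average the values in positions i and i + 1 (1-based)."""
    data = sorted(values)
    i = Fraction(p) * len(data) / 100
    if i.denominator != 1:
        return data[math.ceil(i) - 1]
    return (data[int(i) - 1] + data[int(i)]) / 2


def spread(values):
    """Range and interquartile range side by side."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    return {"range": max(values) - min(values), "Q1": q1, "Q3": q3, "IQR": q3 - q1}


print(spread(RENTS))          # range 190, Q1 445, Q3 525, IQR 80

# Hypothetical data-entry error: the largest rent typed as 1615 instead of 615.
with_outlier = sorted(RENTS)[:-1] + [1615]
print(spread(with_outlier))   # range jumps to 1190; Q1, Q3 and IQR are unchanged
```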

Mean Deviation

The average distance by which the scores deviate from the mean
Set A: 2, 5, 17, 17, 44.
Set B: 17, 17, 17, 17, 17.
Set C: 13, 14, 17, 17, 24.
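The slide defines the mean deviation in words; written as a formula it is the average of |x_i - x̄| over the n observations. A minimal Python sketch applying it to the three sets, which share the mean 17 but differ in spread; `mean_deviation` is a hypothetical helper name.

```python
def mean_deviation(values):
    """Average absolute distance of the observations from their mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


SETS = {
    "A": [2, 5, 17, 17, 44],
    "B": [17, 17, 17, 17, 17],
    "C": [13, 14, 17, 17, 24],
}
for name, data in SETS.items():
    mean = sum(data) / len(data)
    print(f"Set {name}: mean = {mean:.1f}, mean deviation = {mean_deviation(data):.1f}")
# Set A: 10.8, Set B: 0.0, Set C: 2.8
```

The identical means hide very different spreads; the mean deviation ranks the sets B < C < A, as the raw data suggest.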

Variance

The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

Variance

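The variance is the average of the squared differences between each data value and the mean (for a sample, the sum of squared differences is divided by n - 1 instead of n).
The variance is computed as follows:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{for a sample}$$
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{for a population}$$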

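Variance for Grouped Data
For sample data
$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}$$
For population data
$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N}$$
Here $x_i$ is the mid-value of class i, $f_i$ is its frequency, and n (or N) is the total frequency $\sum f_i$.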
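A Python sketch of the grouped formulas, using the class mid-values and frequencies of the Example-3 births table purely as illustrative data. Expanding the table into one value per birth and calling `statistics.variance` / `statistics.pvariance` gives the same numbers, which is a convenient check. `grouped_variance` is a hypothetical helper.

```python
import statistics

# Illustrative data: mid-values and frequencies of the Example-3 births table.
MIDS = [17, 22, 27, 32, 37, 42, 47]
BIRTHS = [677, 1908, 1737, 1040, 294, 91, 16]


def grouped_variance(mids, freqs, sample=True):
    """sum(f_i * (x_i - mean)^2) divided by n - 1 (sample) or N (population),
    where n = N = sum(f_i) and x_i is the mid-value of class i."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("need a total frequency of at least 2")
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids))
    return ss / (n - 1) if sample else ss / n


s2 = grouped_variance(MIDS, BIRTHS)
sigma2 = grouped_variance(MIDS, BIRTHS, sample=False)

# Cross-check: expand the table into one value per birth.
expanded = [x for x, f in zip(MIDS, BIRTHS) for _ in range(f)]
print(f"sample:     {s2:.4f} vs {statistics.variance(expanded):.4f}")
print(f"population: {sigma2:.4f} vs {statistics.pvariance(expanded):.4f}")
```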

Standard Deviation
The standard deviation of a data set is the positive
square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Standard Deviation
The standard deviation is computed as follows:
$$s = \sqrt{s^2} \quad \text{for a sample}$$
$$\sigma = \sqrt{\sigma^2} \quad \text{for a population}$$

Coefficient of Variation
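The coefficient of variation indicates how large the standard deviation is in relation to the mean.
The coefficient of variation is computed as follows:
$$\text{CV} = \frac{s}{\bar{x}} \times 100\% \quad \text{for a sample}$$
$$\text{CV} = \frac{\sigma}{\mu} \times 100\% \quad \text{for a population}$$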

Coefficient of Variation
(Continued)
Measure of relative dispersion
Always expressed as a %
CV is the standard deviation expressed as a percent of the mean
Used to compare two or more groups
Weakness: CV is undefined if the mean is zero and is not meaningful if the data can be negative.
Thus, CV is used only for variables whose values are X ≥ 0.

Example Continued
For the monthly rent prices of the 70 apartments, find the variance, standard deviation, and coefficient of variation, using the equations and Excel.

Solutions
Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$
Standard Deviation
$$s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74$$
Coefficient of Variation
$$\text{CV} = \frac{s}{\bar{x}} \times 100\% = \frac{54.74}{490.80} \times 100\% = 11.15\%$$
The standard deviation is about 11% of the mean.
Note that CV is the standard deviation expressed as a percent of the mean.
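A Python sketch that reproduces these three results from the raw rents. Excel's `VAR.S` and `STDEV.S` (or the older `VAR` and `STDEV`) use the same n - 1 divisor as the sample formulas, as does the standard library call used for the cross-check.

```python
import math
import statistics

# Monthly rents ($) for the 70 apartments, ascending, ten per row.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(RENTS)
x_bar = sum(RENTS) / n
s2 = sum((x - x_bar) ** 2 for x in RENTS) / (n - 1)   # sample variance
s = math.sqrt(s2)                                     # sample standard deviation
cv = s / x_bar * 100                                  # coefficient of variation (%)

print(f"variance = {s2:,.2f}")   # 2,996.16
print(f"std dev  = {s:.2f}")     # 54.74
print(f"CV       = {cv:.2f}%")   # 11.15%
# Cross-check with the standard library's sample statistics.
print(f"{statistics.variance(RENTS):,.2f}  {statistics.stdev(RENTS):.2f}")
```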

EXAMPLE
Given the following data:
357  654  763  621  900
550  290  700  789  605
Use Excel to find:
A. The mean
B. The mode
C. The median
D. The 75th percentile
E. The first and the third quartile
F. The range
G. The interquartile range or midspread
H. The standard deviation
I. The coefficient of variation
If you need help with this, see the next slides.
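To check Excel answers for this exercise, here is a Python sketch. It assumes Excel's inclusive conventions: `PERCENTILE`/`QUARTILE` with linear interpolation and `STDEV` with the n - 1 divisor. Because no value repeats, Excel's `MODE` returns #N/A; the sketch reports `None` instead, since `statistics.mode` would simply return the first value. `excel_percentile` is a hypothetical helper.

```python
import math
import statistics

DATA = [357, 654, 763, 621, 900, 550, 290, 700, 789, 605]


def excel_percentile(values, k):
    """Excel PERCENTILE / PERCENTILE.INC: interpolate at 0-based rank (n - 1) * k."""
    xs = sorted(values)
    r = (len(xs) - 1) * k
    lo = math.floor(r)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (r - lo) * (xs[hi] - xs[lo])


mean = statistics.mean(DATA)                                          # A
mode = statistics.mode(DATA) if len(set(DATA)) < len(DATA) else None  # B (Excel: #N/A)
median = statistics.median(DATA)                                      # C
p75 = excel_percentile(DATA, 0.75)                                    # D
q1, q3 = excel_percentile(DATA, 0.25), excel_percentile(DATA, 0.75)   # E
data_range = max(DATA) - min(DATA)                                    # F
iqr = q3 - q1                                                         # G
sd = statistics.stdev(DATA)                                           # H (sample, like STDEV)
cv = sd / mean * 100                                                  # I (percent)

for label, value in [("mean", mean), ("mode", mode), ("median", median),
                     ("75th percentile", p75), ("Q1", q1), ("Q3", q3),
                     ("range", data_range), ("IQR", iqr),
                     ("std dev", sd), ("CV %", cv)]:
    print(label, value if value is None else round(value, 2))
```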

A Problem Using Excel

A private research organization studying families in various countries reported the following data for the amount of time 4-year-old children spent alone with their fathers each day.

Country     Time with Dad (minutes)
Belgium     30
Canada      44
China       54
Finland     50

A Problem Using Excel


(Continued)
Using Excel, answer the following questions and explain your answers (round all numbers to two decimal places):

A. The mean
B. The mode
C. The median
D. The 75th percentile
E. The first and the third quartile
F. The range
G. The interquartile range or midspread
H. The standard deviation
I. The coefficient of variation

Note: All results are rounded to two decimal places.

Statistical Analysis Using Microsoft Excel

Statistical analysis typically involves working with large amounts of data.
Computer software is typically used to conduct the analysis.
Frequently the data to be analyzed reside in a spreadsheet.
Modern spreadsheet packages are capable of data management, analysis, and presentation.
MS Excel is the most widely available spreadsheet software in business organizations.

Statistical Analysis Using Microsoft Excel


3 tasks might be needed:
Enter Data
Enter Functions and Formulas
Apply Tools

Worksheet excerpt
Column A (Parts Cost), rows 2-8: 91, 71, 104, 85, 62, 78, 69
D          E
Mean       =AVERAGE(A2:A71)
Median     =MEDIAN(A2:A71)
Mode       =MODE(A2:A71)
Range      =MAX(A2:A71)-MIN(A2:A71)

Statistical Analysis Using Microsoft Excel


Excel Worksheet (showing data)
     A                    B          C               D
1    Customer             Invoice #  Parts Cost ($)  Labor Cost ($)
2    Sam Abrams           20994       91             185
3    Mary Gagnon          21003       71             205
4    Ted Dunn             21010      104             192
5    ABC Appliances       21094       85             178
6    Harry Morgan         21116       62             242
7    Sara Morehead        21155       78             148
8    Vista Travel, Inc.   21172       69             165
9    John Williams        21198       74             190
Note: Rows 10-51 are not shown.

Statistical Analysis Using Microsoft Excel


Excel Formula Worksheet
     C               D               E
1    Parts Cost ($)  Labor Cost ($)
2     91             185
3     71             205
4    104             192
5     85             178
6     62             242
7     78             148
8     69             165
9     74             190
Average Parts Cost: =AVERAGE(C2:C51)
Note: Columns A-B and rows 10-51 are not shown.

Statistical Analysis Using Microsoft Excel


Excel Value Worksheet
     C               D               E
1    Parts Cost ($)  Labor Cost ($)
2     91             185
3     71             205
4    104             192
5     85             178
6     62             242
7     78             148
8     69             165
9     74             190
Average Parts Cost: 79
Note: Columns A-B and rows 10-51 are not shown.
