
Statistics for Business and Economics
Week 1
Introduction to Statistics
Descriptive Statistics

Chapter 1
Statistics, Data, &
Statistical Thinking

What Is Statistics?
The science of collecting, organizing, analyzing, and
interpreting data in order to make decisions.

Application Areas
Economics
Forecasting
Demographics

Sports
Individual & Team
Performance

Engineering
Construction
Materials

Business
Consumer Preferences
Financial Trends

Branches of Statistics
Descriptive statistics:
Involves the collection, organization, summarization,
and display of data.
Inferential statistics:
Involves using a sample to draw conclusions about a
population.
A basic tool of inferential statistics is probability.

Ex
2. Decide which branch of statistics each statement belongs to
(inferential/descriptive):
a. The chances of winning the California lottery are 1 in 22 million.
   a. Inferential statistics
   b. Descriptive statistics
b. In the year 2020, 168 million Americans will be enrolled in an HMO.
   a. Descriptive statistics
   b. Inferential statistics
c. There is a relationship between smoking cigarettes and getting emphysema.
   a. Inferential statistics
   b. Descriptive statistics

Population vs. Sample

Population
Collection of all outcomes, responses, measurements or
counts that are of interest
Numeric description of a population is a Parameter

Sample
A subset of the population
Has the same/similar characteristics as the population
More often used
Numeric description of a sample is a Statistic

Ex
1. Is the number a parameter or a statistic?
a. A recent survey by a national women's association showed that
the average salary of 3,500 of its 65,000 members was $73,000.
   a. Statistic
   b. Parameter
b. A recent survey by the alumni of a major university indicated
that the average salary of 8,000 of its 250,000 graduates was $120,000.
   a. Statistic
   b. Parameter
c. The average salary of all GM workers is $42,500.
   a. Statistic
   b. Parameter

Types of Data
Qualitative:
Consists of attributes, labels, or nonnumeric entries.
Examples:
College major of each student in a class.
Gender of each employee at a company.
Method of payment (cash, check, credit card).

Quantitative:
Consists of numerical measurements and counts.
Examples:
Number of defective items in a lot.
Salaries of CEOs of oil companies.
Ages of employees at a company.

Ex
3. Decide if the data is quantitative or qualitative:
a. The colors of automobiles on a used car lot
   a. Quantitative
   b. Qualitative
b. The number of complaint letters received by the USPS in a given day
   a. Quantitative
   b. Qualitative
c. The number of seats in a movie theater
   a. Quantitative
   b. Qualitative

Types of Data Collection

1. Published source:
Example: book, journal, newspaper, Web site

2. Designed experiment:
A treatment is applied to part of a population and
responses are observed.
Example: effect of stress on driving habits.
Example: effect of a drug using 2 groups (drug/placebo).

3. Survey:
A group of people is surveyed and their responses are
recorded to investigate one or more characteristics of a
population.
Example: a study of U.S. residents' presidential approval.

4. Observational study:
A researcher observes and measures characteristics of a
group.
Example: observing how DeVry students drive.

Ex
4. What method of data collection would you use to collect data for:
a. A study where a drug was given to patients and
a placebo to another group of 10 patients to
determine if the drug has an effect on a patient's
illness.
Perform an Experiment
b. A study of the salaries of college professors at a
particular college.
Take a Survey (Census)
c. A study where a political pollster wishes to
determine if his candidate is leading in the polls.
Take a Survey

Samples
A representative sample exhibits characteristics
typical of those possessed by the population of
interest.
A random sample of n experimental units is a
sample selected from the population in such a way
that every different sample of size n has an equal
chance of selection.

Nonrandom Sample Errors
Selection bias results when a subset of the
experimental units in the population is excluded so
that these units have no chance of being selected for
the sample.
Non-response bias results when the researchers
conducting a survey or study are unable to obtain data
on all experimental units selected for the sample.
Measurement error refers to inaccuracies in the
values of the data recorded. In surveys, the error may
be due to ambiguous or leading questions and the
interviewer's effect on the respondent.

Chapter 2
Methods for Describing
Sets of Data

Data Presentation
Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
Quantitative Data: Dot Plot, Stem-&-Leaf Display, Frequency Distribution, Histogram

Summary Table
1. Lists categories & number of elements in category
2. Obtained by tallying responses in category
3. May show frequencies (counts), % or both

Major         Count
Accounting    130
Economics     20
Management    50
Total         200

Bar Graph
Vertical bars for qualitative variables
Equal bar widths
Bar height shows frequency or % (percent is also used)
Vertical axis starts at the zero point
[Figure: bar graph of Frequency (0 to 150) by Major: Acct., Econ., Mgmt.]

Pie Chart
1. Shows breakdown of total quantity into categories
2. Useful for showing relative differences
[Figure: pie chart of Majors: Acct. 65%, Mgmt. 25%, Econ. 10%]

Pareto Diagram
Like a bar graph, but with the categories arranged by
height in descending order from left to right.
Vertical bars for qualitative variables
Equal bar widths
Bar height shows frequency or % (percent is also used)
Vertical axis starts at the zero point
[Figure: Pareto diagram of Frequency (0 to 150) by Major, in the order Acct., Mgmt., Econ.]

Ex

Example
You're an analyst for IRI. You want to show the
market shares held by Web browsers in 2006.
Construct a bar graph, pie chart, & Pareto diagram
to describe the data.

Browser              Mkt. Share 2006    Mkt. Share 2013
Firefox              14%                23%
Internet Explorer    81%                26%
Safari               4%                 6%
Others (Chrome)      1%                 45%
Total                100%               100%

Bar Graph Solution 2006
[Figure: bar graph of Market Share (%) from 0% to 100% by Browser: Firefox, Internet Explorer, Safari, Others]

Pie Chart Solution 2006
[Figure: pie chart of Market Share: Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%]

Pareto Diagram Solution 2006
[Figure: Pareto diagram of Market Share (%) from 0% to 100% by Browser, in the order Internet Explorer, Firefox, Safari, Others]
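
The ordering step behind a Pareto diagram can be sketched in a few lines of Python. This is only an illustration using the 2006 shares from the example table; the variable names and the text-bar rendering are assumptions made for this sketch, not part of the original slides.

```python
# 2006 market shares (percent) from the example table.
shares_2006 = {
    "Firefox": 14,
    "Internet Explorer": 81,
    "Safari": 4,
    "Others": 1,
}

# A Pareto diagram is a bar graph with categories sorted by height, tallest first.
pareto_order = sorted(shares_2006.items(), key=lambda item: item[1], reverse=True)

# Crude text rendering: one '#' for every 2 percentage points.
for browser, share in pareto_order:
    print(f"{browser:<18}{'#' * (share // 2)} {share}%")
```

Sorting in descending order is the only difference from the plain bar graph; the bar heights themselves are unchanged.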

Dot Plot
1. Horizontal axis is a scale for the quantitative variable,
e.g., percent.
2. The numerical value of each measurement is located
on the horizontal scale by a dot.

Stem-and-Leaf Display
1. Divide each observation into stem value and leaf value
   Stems are listed in order in a column
   Leaf value is placed in corresponding stem row to right of bar
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1

Key: 3 | 8 = 38
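
The stem-and-leaf construction can be sketched as follows. The function name `stem_and_leaf` is hypothetical, and the sketch assumes two-digit whole numbers so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its ordered list of leaves (units digits)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 38 -> stem 3, leaf 8
        rows[stem].append(leaf)
    return {stem: rows[stem] for stem in sorted(rows)}


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)

for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```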

Frequency Distribution
A table that shows classes or intervals of data entries
with the number of entries in each class.
Data:
2, 7, 12, 17, 1, 4, 21,
22, 29, 24, 18, 14, 8, 9,
27, 3, 10, 13, 15, 5, 23,
30, 6, 6, 11, 20, 19, 7,
19, 28, 16, 25, 16, 12,
17, 9.

Class    Frequency (f)    Relative Frequency
1-5      5                0.14
6-10     8                0.22
11-15    6                0.17
16-20    8                0.22
21-25    5                0.14
26-30    4                0.11
Total    36

Key Terms
A class is one of the categories into which
quantitative data can be classified.
The class frequency is the number of
observations in the data set falling into a
particular class.
The class relative frequency is the class
frequency divided by the total number of
observations in the data set.
The class percentage is the class relative
frequency multiplied by 100.

Class width:
2nd_lower_class - 1st_lower_class = 6 - 1 = 5
Lower class limits: 1, 6, 11, 16, 21, 26
Upper class limits: 5, 10, 15, 20, 25, 30
Midpoints: (upper_class + lower_class)/2
3, 8, 13, 18, 23, 28
Range: max - min = 30 - 1 = 29
Sample size: n = Σf = 36
Relative frequency = f/n
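
As a rough check of the class definitions above, the following Python sketch tallies the 36 data values into the six classes and computes each midpoint and relative frequency. The variable names are chosen for this illustration only.

```python
data = [2, 7, 12, 17, 1, 4, 21, 22, 29, 24, 18, 14, 8, 9,
        27, 3, 10, 13, 15, 5, 23, 30, 6, 6, 11, 20, 19, 7,
        19, 28, 16, 25, 16, 12, 17, 9]

lower_limits = [1, 6, 11, 16, 21, 26]
class_width = lower_limits[1] - lower_limits[0]  # 6 - 1 = 5

n = len(data)  # sample size, equal to the sum of the class frequencies
table = []
for lower in lower_limits:
    upper = lower + class_width - 1                  # upper class limit
    f = sum(lower <= x <= upper for x in data)       # class frequency
    midpoint = (lower + upper) / 2
    table.append((lower, upper, midpoint, f, f / n))  # f / n = relative frequency

for lower, upper, midpoint, f, rel in table:
    print(f"{lower:>2}-{upper:<2}  midpoint={midpoint:>4}  f={f}  f/n={rel:.2f}")
```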

Frequency Distribution Table
Example
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Class    Midpoint    Frequency
16-25    20.5        3
26-35    30.5        5
36-45    40.5        2

Width = 26 - 16 = 10
Midpoint = (Lower + Upper Boundaries) / 2

Relative Frequency & % Distribution Tables

Relative Frequency Distribution (f/n)
Class    f/n
16-25    .3
26-35    .5
36-45    .2

Percentage Distribution
Class    %
16-25    30.0
26-35    50.0
36-45    20.0

Ex
6. Use the table to answer the questions

Class    Frequency    Relative Frequency
4-10     5            5/47 = 0.11
11-17    15           15/47 = 0.32
18-24    16           16/47 = 0.34
25-31    11           11/47 = 0.23

a. Identify the class width
11 - 4 = 7
b. Identify the number of samples
Total = 5 + 15 + 16 + 11 = 47
c. Give the relative frequency of each class
Rel. Freq = Freq/Total

Histogram

Class    Freq.
16-25    3
26-35    5
36-45    2

[Figure: histogram of the classes above; vertical axis is Count (Frequency, 0 to 5), horizontal axis is marked at the lower class boundaries; bars touch]

Central Tendency
A set of numerical measures that describe the tendency of
data to cluster about certain numerical values (central
tendency, or location) and the variability of the data
(variation, or dispersion).

Standard Notation

Measure               Sample       Population
Mean                  $\bar{x}$    $\mu$ (mu)
Standard Deviation    $s$          $\sigma$ (sigma)
Variance              $s^2$        $\sigma^2$
Size                  $n$          $N$

Mean
1. Most common measure of central tendency
2. Acts as balance point
3. Affected by extreme values (outliers)
4. Denoted $\bar{x}$ where
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Median
1. Measure of central tendency
2. Middle value in ordered sequence
   If n is odd, middle value of sequence
   If n is even, average of 2 middle values
3. Position of median in sequence
$$\text{Positioning Point} = \frac{n + 1}{2}$$
4. Not affected by extreme values

Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative
data

Range
1. Measure of dispersion
2. Difference between largest & smallest
observations
Range = x_largest - x_smallest
3. Ignores how data are distributed
[Figure: two dot plots on a scale from 7 to 10 with different distributions; in both, Range = 10 - 7 = 3]

Variance &
Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{x}$ or $\mu$)
[Figure: dot plot on a scale from 4 to 12 with the mean marked at $\bar{x}$ = 8.3]

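Central Tendency
Mean: population $\mu = \frac{\sum x}{N}$, sample $\bar{x} = \frac{\sum x}{n}$
Median: Value in the middle when data is placed in order.
Mode: Most frequent value
Range: (data_max) - (data_min)
Deviation: Difference between a data value and the mean, $x - \mu$ or $x - \bar{x}$
Variance: population $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, sample $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Standard deviation = $\sqrt{\text{Variance}}$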

Ex
7. What is the standard deviation if the variance is 36?
Std. Dev = Sqrt(Variance) = Sqrt(36) = 6
8. You're a financial analyst for Prudential-Bache
Securities. You have collected the following closing stock
prices of new stock issues: 32, 33, 40, 32, 30.
Describe the stock prices in terms of central tendency.
Mean: (32 + 33 + 40 + 32 + 30)/5 = 33.4
Median: 30, 32, 32, 33, 40, so the median is 32
Mode: 30, 32, 32, 33, 40, so the mode is 32
Range: 40 - 30 = 10
Variance: [(30 - 33.4)^2 + (32 - 33.4)^2 + (32 - 33.4)^2
+ (33 - 33.4)^2 + (40 - 33.4)^2]/(5 - 1) = 14.80
Standard Deviation: Sqrt(14.80) = 3.85
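
The hand calculations in exercise 8 can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic; note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas, matching the slides.

```python
import statistics

prices = [32, 33, 40, 32, 30]  # closing stock prices from exercise 8

mean = statistics.mean(prices)
median = statistics.median(prices)
mode = statistics.mode(prices)
data_range = max(prices) - min(prices)
variance = statistics.variance(prices)  # sample variance, divides by n - 1
std_dev = statistics.stdev(prices)      # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}, range={data_range}")
print(f"variance={variance:.2f}, std dev={std_dev:.2f}")
```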

Knowing central tendency
Knowing ONLY the mean can lead one to purchase
Model A. Knowing the variance as well may change
one's decision!

Shape
1. Describes how data are distributed
2. Measures of shape: skewness (direction of the tail) versus symmetry
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean

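Chebyshev's Theorem
Applies to any shape data set
No useful information about the interval $\bar{x} - s$ to $\bar{x} + s$
At least 3/4 (75%) of the data lies in the interval $\bar{x} - 2s$ to $\bar{x} + 2s$
At least 8/9 of the data lies in the interval $\bar{x} - 3s$ to $\bar{x} + 3s$
[Figure: number line marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$, with brackets for the three intervals above]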

Chebyshev's Theorem Example
Suppose the mean closing stock price of new stock
issues is 33.5 and the standard deviation is 3.44.
Use this information to form an interval that will
contain at least 75% of the closing stock prices of
new stock issues.

Chebyshev's Theorem Example
At least 75% of the closing stock prices of new stock
issues will lie within 2 standard deviations of the mean.
$\bar{x}$ = 33.5, s = 3.44
($\bar{x}$ - 2s, $\bar{x}$ + 2s) = (33.5 - 2 * 3.44, 33.5 + 2 * 3.44)
= (26.62, 40.38)
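
The interval calculation can be sketched in Python as below. The function name `chebyshev_interval` is hypothetical, and the general bound 1 - 1/k^2 is the standard statement of Chebyshev's theorem rather than something spelled out on the slides, which list only the k = 2 and k = 3 cases.

```python
def chebyshev_interval(mean, s, k):
    """Return (low, high, minimum fraction of data) for k standard deviations, k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem gives no useful information for k <= 1")
    min_fraction = 1 - 1 / k**2  # 3/4 for k = 2, 8/9 for k = 3
    return mean - k * s, mean + k * s, min_fraction


low, high, fraction = chebyshev_interval(33.5, 3.44, 2)
print(f"At least {fraction:.0%} of the data lies in ({low:.2f}, {high:.2f})")
```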

Empirical Rule (68-95-99.7)
To see how data varies from the mean
Data needs to be:
Symmetric
Bell-shaped distribution

Empirical Rule Example
Suppose the mean closing stock price of new stock
issues is 33.5 and the standard deviation is 3.44. If
we can assume the data is symmetric and mound
shaped, calculate the percentage of the data that lie
within the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$.

Empirical Rule Example
According to the Empirical Rule, approximately 68% of
the data will lie in the interval ($\bar{x}$ - s, $\bar{x}$ + s),
(33.5 - 3.44, 33.5 + 3.44) = (30.06, 36.94)
Approximately 95% of the data will lie in the interval
($\bar{x}$ - 2s, $\bar{x}$ + 2s),
(33.5 - 2 * 3.44, 33.5 + 2 * 3.44) = (26.62, 40.38)
Approximately 99.7% of the data will lie in the interval
($\bar{x}$ - 3s, $\bar{x}$ + 3s),
(33.5 - 3 * 3.44, 33.5 + 3 * 3.44) = (23.18, 43.82)
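
The three Empirical Rule intervals can be checked with a short Python sketch. The dictionary names are chosen for this illustration; the percentages are the approximate 68-95-99.7 figures from the slide and apply only if the data really are symmetric and mound shaped.

```python
mean, s = 33.5, 3.44

# Approximate percentage of data within k standard deviations of the mean.
coverage = {1: 68.0, 2: 95.0, 3: 99.7}

intervals = {
    k: (round(mean - k * s, 2), round(mean + k * s, 2))
    for k in coverage
}

for k, (low, high) in intervals.items():
    print(f"About {coverage[k]}% of the data lies in ({low}, {high})")
```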

Numerical Measures of
Relative Standing: Percentiles
Describes the relative location of a
measurement compared to the rest of the data
The pth percentile is a number such that p% of
the data falls below it and (100 - p)% falls
above it
Median = 50th percentile

Percentile Example
You scored 560 on the GMAT exam. This
score puts you in the 58th percentile.
What percentage of test takers scored lower
than you did?
What percentage of test takers scored higher
than you did?

Percentile Example
What percentage of test takers scored lower
than you did?
58% of test takers scored lower than 560.
What percentage of test takers scored higher
than you did?
(100 - 58)% = 42% of test takers scored
higher than 560.

Numerical Measures of
Relative Standing: z-Scores
Describes the relative location of a
measurement compared to the rest of the data
Sample z-score
$$z = \frac{x - \bar{x}}{s}$$
Population z-score
$$z = \frac{x - \mu}{\sigma}$$
Measures the number of standard deviations
away from the mean a data value is located

z-Score Example
The mean time to assemble a
product is 22.5 minutes with a
standard deviation of 2.5 minutes.
Find the z-score for an item that
took 20 minutes to assemble.
Find the z-score for an item that
took 27.5 minutes to assemble.

z-Score Example
x = 20, $\mu$ = 22.5, $\sigma$ = 2.5
$$z = \frac{20 - 22.5}{2.5} = -1.0$$
x = 27.5, $\mu$ = 22.5, $\sigma$ = 2.5
$$z = \frac{27.5 - 22.5}{2.5} = 2.0$$
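
The same two z-scores, computed in Python. The helper name `z_score` is hypothetical and exists only for this sketch.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_fast = z_score(20, 22.5, 2.5)    # item assembled in 20 minutes
z_slow = z_score(27.5, 22.5, 2.5)  # item assembled in 27.5 minutes

print(z_fast, z_slow)
```

A negative z-score means the value is below the mean; a positive one means it is above.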

Interpretation of z-Scores for
Mound-Shaped Distributions
of Data
1. Approximately 68% of the measurements
will have a z-score between -1 and 1.
2. Approximately 95% of the measurements
will have a z-score between -2 and 2.
3. Approximately 99.7% of the measurements
will have a z-score between -3 and 3.
(see the figure on the next slide)

Interpretation of z-Scores
[Figure: mound-shaped distribution showing the share of measurements within 1, 2, and 3 standard deviations of the mean]

Outlier
An observation (or measurement) that is unusually large
or small relative to the other values in a data set is called
an outlier. Outliers typically are attributable to one of
the following causes:
1. The measurement is observed, recorded, or entered
into the computer incorrectly.
2. The measurement comes from a different
population.

3. The measurement is correct but represents a rare
(chance) event.

Quartiles
Split ordered data into 4 quarters
[Figure: ordered data divided as 25% | Q1 | 25% | Q2 | 25% | Q3 | 25%]
Lower quartile QL is the 25th percentile.
Middle quartile m is the median.
Upper quartile QU is the 75th percentile.
Interquartile range: IQR = QU - QL

Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles
Interquartile Range = Q3 - Q1
4. Spread in middle 50%
5. Not affected by extreme values

Thinking Challenge
You're a financial analyst for
Prudential-Bache Securities.
You have collected the
following closing stock prices
of new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
What are the quartiles, Q1
and Q3, and the interquartile
range?
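
One way to work this challenge is sketched below. It assumes the "median of each half" convention for quartiles, which is only one of several in common use; the convention intended by the course textbook is not stated here, and other conventions (for example, interpolating at position (n + 1)/4) can give slightly different values. The function name `quartiles_by_halves` is hypothetical.

```python
import statistics


def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the ordered data.

    Assumes at least two values; for odd n the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)


prices = [17, 16, 21, 18, 13, 16, 12, 11]
q1, q3 = quartiles_by_halves(prices)
iqr = q3 - q1

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
```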

Box Plot
1. Graphical display of data using 5-number
summary (Min, Q1, Q2, Q3, Max)
[Figure: box plot on a numeric scale marking Xsmallest, Q1, Median, Q3, and Xlargest]

Shape & Box Plot
[Figure: three box plots, for left-skewed, symmetric, and right-skewed data, each marking Q1, Median, and Q3]

Detecting Outliers
Box Plots: Observations falling between the
inner and outer fences are deemed suspect
outliers.
z-scores: Observations with z-scores greater than
3 in absolute value are considered outliers.
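
Both detection rules can be sketched in Python. The slides do not define the fences, so this sketch assumes the usual convention: inner fences at 1.5 * IQR beyond the quartiles and outer fences at 3 * IQR beyond them. The function names and the quartile values 12.5 and 17.5 are illustrative assumptions; the mean of 22.5 and standard deviation of 2.5 are borrowed from the assembly-time example.

```python
def classify_by_fences(x, q1, q3):
    """Classify x against the inner (1.5 * IQR) and outer (3 * IQR) fences."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    if x < outer_low or x > outer_high:
        return "beyond outer fences"
    if x < inner_low or x > inner_high:
        return "suspect outlier"  # between the inner and outer fences
    return "within inner fences"


def is_z_outlier(x, mean, sd):
    """True when the z-score of x exceeds 3 in absolute value."""
    return abs((x - mean) / sd) > 3


# Illustrative quartiles: IQR = 5, inner fences (5, 25), outer fences (-2.5, 32.5).
print(classify_by_fences(16, 12.5, 17.5))
print(classify_by_fences(27, 12.5, 17.5))
print(classify_by_fences(40, 12.5, 17.5))
print(is_z_outlier(31, 22.5, 2.5))
```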

Graphing Bivariate
Relationships
Describes a relationship between two
quantitative variables
Plot the data in a scattergram (or scatterplot)
[Figure: three scattergrams of y against x showing a positive relationship, a negative relationship, and no relationship]

Scattergram Example
You're a marketing analyst for Hasbro Toys.
You gather the following data:

Ad $ (x)    Sales (Units) (y)
1           1
2           1
3           2
4           2
5           4

Draw a scattergram of the data
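
A rough text-only scattergram of these five points can be produced with plain Python, as sketched below. The function name `text_scatter` and the character-grid layout are assumptions of this sketch (it expects small positive whole-number coordinates); a plotting library would normally be used for a real chart.

```python
def text_scatter(xs, ys):
    """Return the rows of a character grid with '*' at each (x, y) point, largest y first."""
    points = set(zip(xs, ys))
    max_x, max_y = max(xs), max(ys)
    rows = []
    for y in range(max_y, 0, -1):
        cells = "".join(" *" if (x, y) in points else " ." for x in range(1, max_x + 1))
        rows.append(f"{y} |{cells}")
    # Axis labels line up under the grid columns.
    rows.append("    " + " ".join(str(x) for x in range(1, max_x + 1)))
    return rows


ad_dollars = [1, 2, 3, 4, 5]   # x
sales_units = [1, 1, 2, 2, 4]  # y

rows = text_scatter(ad_dollars, sales_units)
print("\n".join(rows))
```

The points rise from lower left to upper right, which is the pattern the slides call a positive relationship.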
