Lecture 1
Describing Data
Summarizing and Describing Data (Graphical)
Pareto Chart
Histogram
Pareto Chart
Consider a company that sells business
bags. Some of these bags generate more
revenue than others. By ranking the items
by revenue, the company learns which
items to emphasize (in terms of cost
management, etc.). A Pareto chart is
useful for this purpose.
Pareto Chart Example
[Figure: Pareto chart of revenue by item. Bars: revenue per item (left axis, ¥0 to ¥16,000,000). Line: cumulative percentage of revenue (right axis, 0% to 120%).]
A Procedure to Make a Pareto Chart
1. Compute the revenue for each item
2. Compute the total revenue
3. Sort the data according to the revenue
4. Compute the percentage of revenue for
each item
5. Compute the cumulative percentage of
revenue
6. Make the Pareto Chart
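Steps 1 to 5 above can be sketched in a few lines of Python. This is only an illustration: the function name `pareto_table` and the item names and revenue figures are hypothetical and do not come from the lecture data. Step 6 (drawing the chart) is left to whatever charting tool is at hand.

```python
def pareto_table(revenue_by_item):
    """Return (item, revenue, percent, cumulative percent) rows, largest revenue first."""
    total = sum(revenue_by_item.values())                      # step 2: total revenue
    ranked = sorted(revenue_by_item.items(),
                    key=lambda pair: pair[1], reverse=True)    # step 3: sort by revenue
    rows = []
    cumulative = 0.0
    for item, revenue in ranked:
        percent = 100.0 * revenue / total                      # step 4: share of revenue
        cumulative += percent                                  # step 5: running total of shares
        rows.append((item, revenue, percent, cumulative))
    return rows


# Step 1 (revenue per item) is assumed to be done already; these numbers are made up.
revenue_by_item = {"Item C": 150, "Item A": 500, "Item D": 50, "Item B": 300}

rows = pareto_table(revenue_by_item)
for item, revenue, percent, cumulative in rows:
    print(f"{item:8s} {revenue:6d} {percent:6.1f}% {cumulative:6.1f}%")
```

In the chart, the revenue column becomes the bars and the cumulative column becomes the line.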
Revenue by clients
[Figure: Pareto chart of revenue by client. Bars: Revenue (left axis, 0 to 25,000,000). Line: Cumulative % (right axis, 0% to 120%). Client labels on the horizontal axis: Hakuhodo Office ABC Daisan Kikaku Asahi Agency Ad soken Taiyo Advertisement.]
Histogram and frequency table
Example
[Figure: histogram of clients' ages. Horizontal axis: Clients' Age range. Vertical axis: Frequency (0 to 12).]

Age range    Frequency
~ 20         0
~ 25         4
~ 30         5
~ 35         11
~ 40         11
~ 45         6
~ 50         4
~ 55         2
~ 60         0
More         0
From the histogram, we can
learn that
clients aged between 35 and 45 are the
primary clients.
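A frequency table like the one in the example can be built by counting how many observations fall in each bin. The sketch below assumes that a label such as "~ 35" means "above the previous limit, up to and including 35" and that "More" collects everything above the last limit; the lecture does not state this convention. The function name `frequency_table` and the ages are hypothetical.

```python
def frequency_table(values, upper_limits):
    """Count values per bin; bin '~ u' is assumed to hold values up to and including u."""
    counts = [0] * (len(upper_limits) + 1)     # the extra last slot is the "More" bin
    for value in values:
        for k, limit in enumerate(upper_limits):
            if value <= limit:
                counts[k] += 1
                break
        else:
            counts[-1] += 1                    # larger than every upper limit
    labels = [f"~ {u}" for u in upper_limits] + ["More"]
    return list(zip(labels, counts))


# Made-up ages, not the lecture's client data.
ages = [23, 27, 31, 33, 34, 36, 38, 39, 41, 44, 47, 52]
upper_limits = list(range(20, 65, 5))          # 20, 25, ..., 60

table = frequency_table(ages, upper_limits)
for label, count in table:
    print(f"{label:6s} {count:3d} {'#' * count}")
```

The row of `#` marks printed next to each count is a rough text version of the histogram bars.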
Population
A population is the complete set of all items
in which an investigator is interested.
Examples of Populations
Names of all registered voters in the
United States.
Incomes of all families living in Daytona
Beach.
Grade point averages of all the students in
your university.
A major objective of statistics is to make
an inference about the population. For
example, “What is the average income of
all families living in Daytona Beach?”
Often, collecting the data for the whole
population is costly or impossible.
Therefore, we often collect data for only a
part of the population. Such data is called
a “sample”.
Sample
A sample is an observed subset of
population values.
Numerical Measures for Summarizing Data
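For a sample of n observations x_1, x_2, ..., x_n, the sample mean is
\[ \bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
For the population mean, we use μ. We also use upper case N to denote the
population size.
\[ \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} \]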
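As a small worked example of the mean formula, the sketch below adds up the observations and divides by their number. The function name `mean` and the five sample values are hypothetical; the same arithmetic applies to a population, with N in place of n.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by how many there are."""
    return sum(values) / len(values)


sample = [2, 4, 4, 5, 10]      # made-up observations x_1, ..., x_5
x_bar = mean(sample)           # (2 + 4 + 4 + 5 + 10) / 5
print(x_bar)
```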
3-1 Cautionary note
The mean (average) is not necessarily the
“center of the data”.
3-2 Example
“The average Japanese household savings
in 2005 were ¥17,280,000”
The mean may not be “the center of the data”. An example
[Figure: Histogram of Japanese Household Savings. Savings bins run from "below 2,000" to "above 40,000" in steps of 2,000; the other axis is Percentage (0 to 16). The sample mean (= 17,280,000) is marked on the chart.]
One may think that the average describes the
“normal household”. However, you can
see that many households have savings
much lower than the average. The average
is very high because a few
households have huge savings.
In such a case, the “median” can give you a
better sense of a “normal household”. The
definition of the median is given on the
next slide.
4-1 Median
Sort the data in ascending order.
Then the median is the value in the
middle (the middle observation).
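The definition can be sketched as follows. The slide only covers the case with a single middle observation; for an even number of observations this sketch averages the two middle values, which is a common convention and an assumption here. The savings figures are made up to show how one very large value pulls the mean up while the median stays near the typical value.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if the count is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2     # assumed convention for even n


# Made-up savings: four modest households and one very wealthy one.
savings = [100, 200, 300, 400, 9000]

mean_savings = sum(savings) / len(savings)
median_savings = median(savings)
print(mean_savings, median_savings)
```

Here four of the five households are below the mean, so the median is the better description of a "normal household".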
Japanese Household saving
[Figure: Histogram of Japanese Household Savings. Savings bins run from "below 2,000" to "above 40,000" in steps of 2,000; the other axis is Percentage. The sample average (= 17,280,000) and the median (= 10,520,000) are marked on the chart.]
Corresponding chapters
This lecture note covers the following
topics of the textbook.
1.1 Sampling
Example 2.6 Pareto Diagram
2.4 Arithmetic Mean, Median