Chapter 2
Descriptive Statistics
data. Excel was used to create the following bar graphs for the
[Figure: bar graph by Year, 1999 to 2002]
[Figure: bar graph of Doctoral Enrollment by Year]
We can see from the graph that there has been little change in the number of doctoral degrees awarded during these years. We also see that the enrollment has increased dramatically over the same years. These bar graphs support the statements made by the American Society of Engineering Education.
2.3
Beach hotspot
will
it is a qualitative variable.
Beach condition
will
will
it is a qualitative
a pie chart
Beach Condition
Excel was used to form a pie chart for nearshore bar condition of the six hotspots.
Nearshore Bar Condition
d.
These pie charts describe the conditions of these six beach hotspots. They should not be used to make inferences about beach hotspots around the country.
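2.5 a.
Excel was used to create a bar graph for the sources of unauthorized computer use in 1999.
[Figure: bar graph of sources of unauthorized computer use, 1999]
Excel was used to create a bar graph for the sources of unauthorized computer use in 2001.
[Figure: bar graph of sources of unauthorized computer use, 2001]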
2.7
a Pareto diagram
[Figure: Pareto diagram by First Significant Digit]
b.
The results do not support Benford's Law. Benford's Law states that the value "1" should occur first most often, around 30% of the time. We see in these results that the value "6" occurred most often. In addition, the value "1" occurred much less than 30% of the time. In fact, it occurred just 14.67% (109/743) of the time.
2.9
Using MINITAB a bar chart for the Extinct status versus flight capability is:
[Figure: bar chart of Extinct status versus flight capability]
It appears that extinct status is related to flight capability. For birds that do have flight capability, most of them are present. For those birds that do not have flight capability, most are extinct.
The bar chart for Extinct status versus Nest Density is:
It appears that extinct status is not related to nest density. The proportion of birds present, absent, and extinct appears to be very similar for nest density high and nest density low.
It appears that the extinct status is related to habitat. For those in aerial terrestrial (TA), most species are present. For those in ground terrestrial (TG), most species are extinct. For those in aquatic, most species are present.
2.11
a.
b.
Stem-and-leaf of Cesium  N = 9
Leaf Unit = 0.10
1 2 4 (3) 2 -5 0 -55 -500 -4 865 -47r
[Figure: histogram of Cesium]
d.
The stem-and-leaf display appears to be more informative than the other graphs. Since there are only 9 observations, the histogram and dot plot have very few observations per category. There are 4 observations with radioactivity level of -5.00 or lower. The proportion of measurements with a radioactivity level of -5.0 or lower is 4/9 = .444.
2.13
To estimate the percentage of the aftershocks that measure between 1.5 and 2.5 on the
To estimate the percentage of the aftershocks that measure greater than 3.0 on the Richter scale, we must sum the percentages on the corresponding bars of the bar graph.
2.15
[Stem-and-leaf display of the sanitation scores; Minimum = 74.000]
The stems are the tens digit and the leaves are the ones digit of the sanitation score for the ship.
b.
168 of the 174 ships, or 96.6% of the ships, registered a sanitation score of 86 or higher. We estimate the proportion of ships that have an accepted sanitation standard to be .966.
c.
74 is the lowest score on the stem-and-leaf display.
2.17 a.
Statistix was used to construct the following histogram for the pH levels of the sampled wells:
[Figure: frequency histogram of pH Level]
a pH value less
b.
Statistix was used to construct the following histogram for the MTBE values of the wells with detectable levels of MTBE:
[Figure: frequency histogram of MTBE values, 70 cases]
It appears that roughly 6 of the 70 MTBE levels, or 6/70 = .086, have an MTBE value that exceeds 5.
2.19
L'
b.
To convert a frequency histogram to a relative frequency histogram, we must first divide each of the frequencies by the sum of all the frequencies, which is 35 + 130 + 70 + 15 + 5 = 255. The relative frequency table is:
Class Interval    Frequency    Relative Frequency
3-5               35           35/255 = .137
5-7               130          130/255 = .510
7-9               70           70/255 = .275
9-11              15           15/255 = .059
11-13             5            5/255 = .020
[Figure: relative frequency histogram]
c.
The proportion of copper particles exceeding 9 nanometers in diameter is approximately 20/255 = .078.
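The conversion from frequencies to relative frequencies can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and it assumes the last two classes are the ones above 9 nanometers, which is what the 20/255 figure suggests.

```python
# Class frequencies as summed in the solution (35 + 130 + 70 + 15 + 5 = 255).
frequencies = [35, 130, 70, 15, 5]

total = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative = [round(f / total, 3) for f in frequencies]

# Assumption: the last two classes hold the particles exceeding 9 nm (15 + 5 = 20).
proportion_over_9 = round(sum(frequencies[-2:]) / total, 3)

print(total)
print(relative)
print(proportion_over_9)
```

Rounded relative frequencies need not sum to exactly 1, which is why a relative frequency column can total .999 or 1.001.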
2.21
a.
mean: $\bar{y} = \frac{\sum y}{n} = \frac{30}{5} = 6$
median: $m$ = middle (3rd) measurement = 5
4, 5, 6, 6, 9, 12
mean: $\bar{y} = \frac{\sum y}{n} = \frac{42}{6} = 7$
median: $m$ = average = $\frac{5+6}{2} = \frac{11}{2}$
mode: Most frequently occurring value
Mode
2.23
mean:
median: $m = \frac{2+2}{2} = 2$
Half of the 40 observations fall below 2 and half fall above 2.
mode: Most frequently occurring value
Mode: 2
Because there is only a slight degree of skewness in the data, the preferred measure of central tendency would be the mean.
2.25
)oL1l
30
Z'
median:
m:
16th)
Half the values in the old process fall below 9.975. Half the values in the new process fall below 9.455.
mode: Most frequently occurring value
Mo.n- 8'82
Because of the symmetry in the data, the preferred measure of central tendency is the mean.
2.27
.112, .205, .225, .239, .241, .270, .270, .330, .375, .523, .591, .618
mean: $\bar{y} = \frac{\sum y}{n} = \frac{3.999}{12} = .333$
median: $m = \frac{.270 + .270}{2} = .270$
Mode: .270
Although the data set is slightly skewed to the right, the mean would probably be preferred to the median and mode because it has nicer properties.
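As a quick check of these three measures, the twelve values can be run through Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and the values are the ones listed in the solution.

```python
import statistics

# The twelve measurements from this exercise, already in ascending order.
values = [.112, .205, .225, .239, .241, .270, .270,
          .330, .375, .523, .591, .618]

mean = statistics.mean(values)      # sum of the values divided by n
median = statistics.median(values)  # average of the 6th and 7th values (n is even)
mode = statistics.mode(values)      # most frequently occurring value

print(round(mean, 3), median, mode)
```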
2.29
a.
$\bar{y} = \frac{\sum y}{n} = \frac{-.08 + (-.06) + \cdots + (-.16)}{25} = \frac{-3.86}{25} = -.1544$
The median is the middle number once the data have been arranged in order:
-1.07, -.21, -.20, -.17, -.17, -.17, -.16, -.16, -.16, -.15, -.12, -.12, -.11, -.10, -.09, -.09, -.09, -.08, -.08, -.07, -.07, -.06, -.06, -.06, -.04
The median is -.11.
The mode is the value with the highest frequency. Since -.17, -.16, -.09, and -.06 each occur 3 times, all are modes.
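The summary measures for these 25 values, and their sensitivity to the single extreme value -1.07, can be checked with a short script. This is a sketch only; the variable names are illustrative, and `statistics.multimode` needs Python 3.8 or later.

```python
import statistics

# The 25 ordered values listed in the solution.
data = [-1.07, -.21, -.20, -.17, -.17, -.17, -.16, -.16, -.16, -.15,
        -.12, -.12, -.11, -.10, -.09, -.09, -.09, -.08, -.08, -.07,
        -.07, -.06, -.06, -.06, -.04]

mean_all = statistics.mean(data)
median_all = statistics.median(data)
modes_all = statistics.multimode(data)  # every value tied for the highest frequency

# Drop the extreme value and recompute.
trimmed = [y for y in data if y != -1.07]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(round(mean_all, 4), median_all, modes_all)
print(round(mean_trimmed, 5), round(median_trimmed, 3))
```

The mean moves from -.1544 to -.11625 while the median moves only from -.11 to -.105, which is the usual argument for the median when outliers are present.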
b.
After deleting -1.07:
$\bar{y} = \frac{\sum y}{n} = \frac{-2.79}{24} = -.11625$
The median is the average of the middle two numbers once the data are arranged in order. The middle two numbers are -.10 and -.11. The median is $\frac{(-.10) + (-.11)}{2} = -.105$.
The mean changes from -.1544 to -.11625. The median changes from -.11 to -.105. The mean changes much more than the median when the outlier is removed.
2.31
Range: 1.55 - 1.37 = .18
b.
$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n-1} = \frac{17.3453 - \frac{11.77^2}{8}}{8-1} = .0041$
c.
$s = \sqrt{.0041} = .064$
d.
If the standard deviation of the daily ammonia levels during the morning drive-time is 1.45 ppm (compared to .064 ppm in the afternoon drive-time), then the morning drive-time has more variable ammonia levels.
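The shortcut formula for the sample variance used in this exercise can be sketched in Python as follows. The function name `shortcut_variance` is hypothetical, and the sums 11.77 and 17.3453 with n = 8 are taken as read from the solution.

```python
import math

def shortcut_variance(sum_y, sum_y_sq, n):
    """Sample variance from the sums: (sum of y^2 - (sum of y)^2 / n) / (n - 1)."""
    return (sum_y_sq - sum_y ** 2 / n) / (n - 1)

# Sums for the afternoon drive-time ammonia levels, as read from the solution.
s2 = shortcut_variance(sum_y=11.77, sum_y_sq=17.3453, n=8)
s = math.sqrt(s2)

print(round(s2, 4), round(s, 3))
```

Because the formula subtracts two nearly equal quantities, the sums should be carried to full precision before rounding the final answer.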
2.33
a.
The data appear to be mound-shaped in nature. We use the Empirical Rule to describe this data set. We expect approximately 68% of the observations to fall within the interval $\bar{y} \pm s$. We expect approximately 95% of the observations to fall within the interval $\bar{y} \pm 2s$. We expect approximately 100% of the observations to fall within the interval $\bar{y} \pm 3s$.
b.
From the printout, we see $\bar{y} = 0.8425$ and $s = 0.3455$.
c.
$\bar{y} \pm 2s \Rightarrow 0.8425 \pm 2(0.3455) \Rightarrow (0.1515, 1.534)$
Based on the Empirical Rule, we would expect approximately 95% of the observations to fall within the interval.
d.
From the histogram, it appears that roughly 91% of the observations fall in the interval. This is very close to the expected 95%.
2.35
We use the Empirical Rule to describe this data set. We expect approximately 68% of the observations to fall within the interval $\bar{y} \pm s$. We expect approximately 95% of the observations to fall within the interval $\bar{y} \pm 2s$. We expect approximately 100% of the observations to fall within the interval $\bar{y} \pm 3s$. We get
$\bar{y} \pm 2s \Rightarrow 1.000 \pm 2(0.950) \Rightarrow (-0.900, 2.900)$
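The interval arithmetic behind the Empirical Rule is the same in every one of these exercises, so it can be sketched once. The function name `interval` and the lookup table are illustrative, and the percentages are the approximate ones quoted in the solution.

```python
def interval(mean, s, k):
    """Return the interval (mean - k*s, mean + k*s)."""
    half_width = k * s
    return (mean - half_width, mean + half_width)

# Approximate share of a mound-shaped data set within k standard deviations,
# as stated in the solution text.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 1.00}

low, high = interval(mean=1.000, s=0.950, k=2)
print(round(low, 3), round(high, 3), EMPIRICAL_RULE[2])
```

These percentages apply only when the data are roughly mound-shaped; otherwise Chebyshev's Rule is the safer statement.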
We use the Empirical Rule to describe this data set. We expect approximately 68% of the observations to fall within the interval $\bar{y} \pm s$. We expect approximately 95% of the observations to fall within the interval $\bar{y} \pm 2s$. We expect approximately 100% of the observations to fall within the interval $\bar{y} \pm 3s$. We get
$\bar{y} \pm 2s \Rightarrow 4.560 \pm 2(10.390) \Rightarrow (-16.220, 25.340)$
2.37
We find the summary information and the stem-and-leaf display for the data set from Statistix:
Variable    N     Mean      SD        Minimum    Maximum
FRP         10    234.74    9.9075    215.70     248.90
[Stem-and-leaf display of FRP]
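Since the shape is not mound-shaped, we use Chebyshev's Rule to describe where the data fall. We expect at least 0% of the observations to fall within the interval $\bar{y} \pm s$. We expect at least 3/4 of the observations to fall within the interval $\bar{y} \pm 2s$. We expect at least 8/9 of the observations to fall within the interval $\bar{y} \pm 3s$.
$\bar{y} \pm 3s \Rightarrow 234.74 \pm 3(9.9075) \Rightarrow (205.0175, 264.4625)$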
2.39 a.
From the histogram, the data do not follow the true mound-shape very well. The intervals in the middle are much higher than they should be. In addition, there are some extremely large velocities and some extremely small velocities. Because the data do not follow a mound-shaped distribution, the Empirical Rule would not be appropriate.
b.
Using Chebyshev's Rule, at least $1 - 1/4^2$ or $1 - 1/16$ or $15/16$ or 93.8% of the velocities will fall within 4 standard deviations of the mean. This interval is:
$\bar{y} \pm 4s \Rightarrow 27,117 \pm 4(1,280) \Rightarrow 27,117 \pm 5,120 \Rightarrow (21,997, 32,237)$
At least 93.75% of the velocities will fall between 21,997 and 32,237 km per second.
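Chebyshev's bound of at least 1 - 1/k² can be computed exactly with Python's `fractions` module. This is a sketch: `chebyshev_bound` is a hypothetical helper that accepts whole-number k only, and the mean and standard deviation are the ones used above.

```python
from fractions import Fraction

def chebyshev_bound(k):
    """Minimum fraction of observations within k standard deviations (integer k > 1)."""
    return 1 - Fraction(1, k * k)

bound = chebyshev_bound(4)  # 1 - 1/16

# Interval for the galaxy velocities: mean 27,117 and s = 1,280 km per second.
mean, s, k = 27117, 1280, 4
low, high = mean - k * s, mean + k * s

print(bound, float(bound), (low, high))
```

The same helper gives 3/4 for k = 2 and 8/9 for k = 3, the two other bounds that appear in this chapter.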
c.
Since the data look approximately symmetric, the mean would be a good estimate for the velocity of galaxy cluster A2142. Thus, this estimate would be 27,117 km per second.
2.41
The 75th percentile is the point in the distribution in which 75% of the data values fall below and 25% of the data values fall above. 75% of the TP concentrations at the 28 Everglades sites had levels that fell below 10 micrograms per liter. The 75th percentile was most likely chosen to identify the upper quarter of the sites that the DEP wanted to label as unsafe.
2.43
The mean cyanide concentration is 84.0 and the median is 28.8. Since the mean is much greater than the median, the data are skewed to the right. Since the data are not mound-shaped, the Empirical Rule does not apply. The variance is 6,400, so the standard deviation is 80. The upper quartile is 88.5. Thus, 75% of all the measurements are less than 88.5.
2.45
$\bar{y} = 94.42$ and $s = 4.380$.
$z = \frac{y - \bar{y}}{s} = \frac{74 - 94.42}{4.380} = -4.66$
The score for the Nautilus Explorer is 4.66 standard deviations below the mean for all the cruise ships.
b.
The z-score for the Rotterdam is: $z = \frac{y - \bar{y}}{s} = -0.32$
The score for the Rotterdam is 0.32 standard deviations below the mean for all the cruise ships.
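A z-score is a one-line calculation, sketched below with the Nautilus Explorer figures from part a. The function name `z_score` is illustrative.

```python
def z_score(y, mean, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / s

# Nautilus Explorer: score 74, with mean 94.42 and s = 4.380 for all ships.
z_nautilus = z_score(74, 94.42, 4.380)
print(round(z_nautilus, 2))
```

A negative result means the observation is below the mean, which matches the wording "standard deviations below the mean" in the solution.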
2.47
The z-score for 10.50 is $z = \frac{10.50 - 9.804}{.541} = 1.29$
b.
$\bar{y} = 9.422$ and $s = .479$
$z = \frac{10.50 - 9.422}{.479} = 2.25$
The closer the z-score is to 0, the more likely it is to occur. Thus, a voltage reading of 10.50 is more likely to occur at the old location because the z-score is closer to 0.
2.49
a.
The median is the value in which half the data fall above and half fall below. 50% of the sampled clinkers have measurements below 170 mg/kg.
b.
The lower quartile is the point in the distribution in which 25% of the data values fall below and 75% of the data values fall above. 25% of the sampled clinkers have measurements below 115 mg/kg.
c.
The upper quartile is the point in the distribution in which 75% of the data values fall below and 25% of the data values fall above. 75% of the sampled clinkers have measurements below 260 mg/kg.
d.
145
Inner fences are found 1.5 IQR's above the upper quartile and below the lower quartile. Since no clinkers were found beyond the inner fences, no outliers were detected for this data set.
To construct a box plot, we first must make some preliminary calculations. The median is the average of the middle two observations, once the data have been arranged in ascending order, because there are an even number of observations, 30. The average of the 15th and 16th observations is (9.97 + 9.98)/2 = 9.975. Next, we compute $Q_L$ and $Q_U$. The lower quartile, $Q_L$, is the data point with the rank of (1/4)(n + 1) = (1/4)(30 + 1) = 7.75 ≈ 8. The 8th ranked data point is 9.80. The upper quartile, $Q_U$, is the data point with the rank of (3/4)(n + 1) = (3/4)(30 + 1) = 23.25 ≈ 23. The 23rd data point is 10.05. The interquartile range, IQR, is $Q_U - Q_L$ = 10.05 - 9.80 = .25. The inner fences are located 1.5(IQR) = 1.5(.25) = .375 below $Q_L$ and above $Q_U$. The inner fences are 9.80 - .375 = 9.425 and 10.05 + .375 = 10.425. The outer fences are located 3(IQR) = 3(.25) = .75 below $Q_L$ and above $Q_U$. The outer fences are 9.80 - .75 = 9.05 and 10.05 + .75 = 10.80. The box plot is shown below.
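The fence calculations can be sketched in Python as follows. The function names `fences` and `classify` and the label strings are illustrative; the quartiles 9.80 and 10.05 are the ones found above.

```python
def fences(q_lower, q_upper):
    """Return the IQR with the inner (1.5 IQR) and outer (3 IQR) fences."""
    iqr = q_upper - q_lower
    inner = (q_lower - 1.5 * iqr, q_upper + 1.5 * iqr)
    outer = (q_lower - 3 * iqr, q_upper + 3 * iqr)
    return iqr, inner, outer

def classify(y, inner, outer):
    """Label a point by where it falls relative to the fences."""
    if y < outer[0] or y > outer[1]:
        return "highly suspect outlier"
    if y < inner[0] or y > inner[1]:
        return "suspect outlier"
    return "not an outlier"

iqr, inner, outer = fences(9.80, 10.05)
print(round(iqr, 3), [round(f, 3) for f in inner], [round(f, 3) for f in outer])
print(classify(8.05, inner, outer), "|", classify(10.55, inner, outer))
```

Points beyond the outer fences are treated as highly suspect and points between the inner and outer fences as suspect, which is the convention this solution follows.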
[Figure: box plot with the inner and outer fences marked; m = 9.975]
There are four highly suspect outliers in the data set: 8.05, 8.72, 8.72, and 8.80. All lie outside the outer fences. In addition, there is one suspect outlier: 10.55. It lies between the inner and outer fences.
b.
To use the z-score method to detect outliers, we must first calculate the mean and standard deviation for the data.
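mean: $\bar{y} = \frac{\sum y}{n} = \frac{294.11}{30} = 9.804$
variance: $s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n-1} = \frac{2891.8415 - \frac{294.11^2}{30}}{30-1} = \frac{8.4851}{29} = .2926$
standard deviation: $s = \sqrt{.2926} = .5409$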
$z = \frac{8.05 - 9.804}{.5409} = -3.24$
$z = \frac{8.72 - 9.804}{.5409} = -2.00$
These are the only data points with z-scores of 2 or more in absolute value. These data
The median of the new data is the average of the middle two observations, once the data have been arranged in ascending order, because there are an even number of observations, 30. The average of the 15th and 16th observations is (9.46 + 9.45)/2 = 9.455. Next, we compute $Q_L$ and $Q_U$. The lower quartile, $Q_L$, is the data point with the rank of (1/4)(n + 1) = (1/4)(30 + 1) = 7.75 ≈ 8. The 8th ranked data point is 9.14. The upper quartile, $Q_U$, is the data point with the rank of (3/4)(n + 1) = (3/4)(30 + 1) = 23.25 ≈ 23. The 23rd data point is 9.75. The interquartile range, IQR, is $Q_U - Q_L$ = 9.75 - 9.14 = .61. The inner fences are located 1.5(IQR) = 1.5(.61) = .915 below $Q_L$ and above $Q_U$. The inner fences are 9.14 - .915 = 8.225 and 9.75 + .915 = 10.665. The outer fences are located 3(IQR) = 3(.61) = 1.83 below $Q_L$ and above $Q_U$. The outer fences are 9.14 - 1.83 = 7.31 and 9.75 + 1.83 = 11.58.
The box plot is shown as follows:
There are no points beyond the inner fences, so there are no outliers or suspected outliers.
d.
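mean: $\bar{y} = \frac{\sum y}{n} = \frac{282.67}{30} = 9.422$
variance: $s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n-1} = \frac{2670.0613 - \frac{282.67^2}{30}}{30-1} = \frac{6.650337}{29} = .2293$
standard deviation: $s = \sqrt{.2293} = .4789$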
The data point 8.51 has a z-score of
$z = \frac{8.51 - 9.422}{.4789} = -1.90$
and the data point 10.12 (maximum data point) has a z-score of
$z = \frac{10.12 - 9.422}{.4789} = 1.46$
No data point has a z-score of 2 or more in absolute value. There are no outliers or suspected outliers.
If the data are actually mound-shaped, it would be very unusual (less than 2.5%) to observe a batch with 1.80% zinc phosphide if the true mean is 2.0%. Thus, if we did observe 1.8%, we would conclude that the mean percent of zinc phosphide in today's production is probably less than 2.0%.
.55
': ':':
T#g
-2'5
a.
To find the frequencies for each of the categories, we multiply the relative frequencies by the total sample size, 242 million. The frequency table is:
Tire Fate                     Relative Frequency    Frequency (millions)
Dumped                        .776                  187.792
Burned for fuel               .107                  25.894
Recycled into new products    .067                  16.214
Exported                      .050                  12.100
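The frequencies in this table are the relative frequencies scaled by the 242 million total, which can be checked as follows. This is a sketch with illustrative variable names; category names are left out because only the numbers are needed.

```python
# Total sample size in millions and the four relative frequencies from the table.
total_millions = 242
relative_frequencies = [.776, .107, .067, .050]

# Frequency = relative frequency x total sample size.
frequencies = [round(total_millions * rf, 3) for rf in relative_frequencies]

print(frequencies)
print(round(sum(frequencies), 3))
```

The frequencies add back to 242 million because the four relative frequencies sum to exactly 1.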
[Figure: bar graph of tire fate]
2.57
The poor road condition that caused the most accidents was "road repairs/under construction" with 39 accidents. The poor road condition that caused the next most accidents was
2.59
The Empirical Rule states that approximately 95% of the data will fall within the $\bar{y} \pm 2s$ interval. We find the following summary information for the data set.
Variable     N     Mean      SD
Roughness    20    1.8810    0.5239
$\bar{y} \pm 2s \Rightarrow 1.8810 \pm 2(0.5239) \Rightarrow (0.8332, 2.9288)$
2.61
The z-score is: $z = \frac{y - \bar{y}}{s} = -1.06$
A head-injury rating of 408 is less than the mean head-injury rating. It is a little more than one standard deviation below the mean.
2.63
[Figure: bar graph of relative abundance by Compound]
Compound C,H, has the highest relative abundance (.354), with compound CH: the next highest with .210. Compounds H, C3H.1, C7H15, C'oH,, , and others all have relative abundance less than .1.
Next, decide on the number of classes. Suppose we pick 8 classes. To find the class width, divide the range by the number of classes:
Class width = 2.66/8 = .3325 ≈ .34
The first class will begin at 61.675, just below the smallest percentage. The resulting eight class intervals are shown as follows:
Class    Class Interval     Frequency    Relative Frequency
1        61.675-62.015      4            .061
2        62.015-62.355      8            .121
3        62.355-62.695      6            .091
4        62.695-63.035      18           .273
5        63.035-63.375      15           .227
6        63.375-63.715      9            .136
7        63.715-64.055      3            .045
8        64.055-64.395      3            .045
Totals                      n = 66       .999
For each class, count the number of observations. This gives the class frequency. The class relative frequency is then calculated by dividing the class frequency by the number of observations, n = 66. For class 1, the class relative frequency is 4/66 = .061. The other class relative frequencies are computed in a similar fashion and are listed in the table. The data above are then used to construct the relative frequency histogram.
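The class-interval construction described here can be sketched in Python. The variable names are illustrative, and rounding the raw width up to two decimals is an assumption that reproduces the .34 used in the solution.

```python
import math

data_range = 2.66    # largest percentage minus smallest percentage
num_classes = 8
start = 61.675       # just below the smallest percentage
n = 66               # number of observations

raw_width = data_range / num_classes       # .3325
width = math.ceil(raw_width * 100) / 100   # rounded up to .34 (assumed convention)

# Class boundaries: start, start + width, ..., start + 8 * width.
boundaries = [round(start + i * width, 3) for i in range(num_classes + 1)]
intervals = list(zip(boundaries[:-1], boundaries[1:]))

# Relative frequency of class 1, which holds 4 of the 66 observations.
rel_freq_class_1 = round(4 / n, 3)

print(width, intervals[0], intervals[-1], rel_freq_class_1)
```

Starting the first class at a value with one more decimal place than the data keeps any observation from falling exactly on a class boundary.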
[Figure: relative frequency histogram]
d.
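mean: $\bar{y} = \frac{\sum y}{n} = \frac{4155.59}{66} = 62.963$
variance: $s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n-1} = \frac{261674.4983 - \frac{4155.59^2}{66}}{66-1} = \frac{24.0703}{65} = .3703$
standard deviation: $s = \sqrt{.3703} = .6085$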
e.
$\bar{y} \pm 2s \Rightarrow 62.963 \pm 2(.6085) \Rightarrow 62.963 \pm 1.217 \Rightarrow (61.746, 64.180)$
Yes. The percentage of observations that appear in this interval is 64/66 × 100% = 96.97%. This is close to the 95% given by the Empirical Rule.
2.67
Variable    N     Mean     SD
Scrams      56    4.036    3.027
To determine if 11 unplanned scrams is likely, we
b. c.
2.71
d.
Frequency = 81 + 65 = 146
Proportion =
d.
FrequencY
Yes. Since no rod diameters were reported at the interval centered at 0.999, it appears these rods were being recorded at the higher diameter value.
Because each of the bars are about the same height, it does not appear that one cause is more likely to occur than any other.
b.
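$\bar{y} = \frac{\sum y}{n} = \frac{2991}{50} = 59.82$
$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n-1} = \frac{318,477 - \frac{2991^2}{50}}{50-1} = \frac{139,555.38}{49} = 2,848.06898$
$s = \sqrt{2,848.06898} = 53.367$
Since the data are not mound-shaped, we must use Chebyshev's Rule to describe the data. We know that at least $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9}$ of the observations will fall within $\bar{y} \pm 3s$:
$\bar{y} \pm 3s \Rightarrow 59.82 \pm 3(53.367) \Rightarrow 59.82 \pm 160.101 \Rightarrow (-100.281, 219.921)$
At least 8/9 of the observations will fall between -100.281 and 219.921.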
2.73
a.
The histogram portrays quantitative data.
b.
A frequency distribution or histogram is being used to describe the data.
c.
The frequency of rods with diameters between 1.0025 and 1.0045 centimeters can be
[Figure: plot with horizontal axis Length]
The mean density for the unoiled area is 3.27, while the mean for the oiled area is 3.495. The median for the unoiled area is .89 and is .70 for the oiled area. These are both fairly similar. Using Chebyshev's Theorem, at least 75% of the observations will fall within 2 standard deviations of the mean. This interval for unoiled areas would be:
$\bar{y} \pm 2s \Rightarrow 3.495 \pm 2(5.968) \Rightarrow 3.495 \pm 11.936 \Rightarrow (-8.441, 15.431)$
h.
From the above two intervals, we know that at least 75% of the observations for the unoiled area will fall between -10.31 and 16.67 and at most 25% of the observations will fall above 15.431 for the oiled areas. Thus, the unoiled areas would be more likely to have a seabird density of 16.
2.71
We will use a frequency bar graph to describe the data. First, we must add up the number of spills under each category. These values are summarized in the following table:
Cause of Spillage    Frequency
11
13
12
12
50