Data: Data are individual pieces of information recorded and used for the purpose of analysis.

Statistics: Statistics is the science of collecting, analyzing, presenting, and interpreting data, as well as of making decisions based on such analyses.

Population: A population consists of all elements (individuals, items, or objects) whose characteristics are being studied.

Sample: A portion of the population selected for study is referred to as a sample.

Example 2-3
The following data give the total number of iPods® sold by a mail order company on each of 30 days. Construct a frequency distribution table.

8 25 11 15 29 22 10 5 17 21
22 13 26 16 18 12 9 26 20 16
23 14 19 23 20 16 27 16 21 14

Example 2-3: Solution
The minimum value is 5, and the maximum value is 29. Suppose we decide to group these data using five classes of equal width. Then

Approximate width of each class = $\frac{29 - 5}{5} = 4.8$

Now we round this approximate width to a convenient number, say 5. The lower limit of the first class can be taken as 5 or any number less than 5. Suppose we take 5 as the lower limit of the first class. Then our classes will be 5-9, 10-14, 15-19, 20-24, and 25-29.

Table 2.9 Frequency Distribution for the Data on iPods Sold

iPods Sold    Tally         f
5-9           |||           3
10-14         ||||| |       6
15-19         ||||| |||     8
20-24         ||||| |||     8
25-29         |||||         5
                            $\sum f = 30$

Relative Frequency and Percentage Distributions

Calculating Relative Frequency and Percentage

Relative frequency of a class = $\frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f}$

Percentage = (Relative frequency) $\cdot$ 100

Example 2-4
Calculate the relative frequencies and percentages for Table 2.9.

Example 2-4: Solution

Table 2.10 Relative Frequency and Percentage Distributions for Table 2.9

iPods Sold    Class Boundaries          Relative Frequency    Percentage
5-9           4.5 to less than 9.5      3/30 = .100           10.0
10-14         9.5 to less than 14.5     6/30 = .200           20.0
15-19         14.5 to less than 19.5    8/30 = .267           26.7
20-24         19.5 to less than 24.5    8/30 = .267           26.7
25-29         24.5 to less than 29.5    5/30 = .167           16.7

Figure 2.3 Frequency histogram for Table 2.9.
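The counting in Examples 2-3 and 2-4 can be checked with a few lines of Python. This is only a sketch under the choices made above (lower limit 5, class width 5, five classes); the function name `frequency_table` and its parameters are illustrative and do not come from the notes.

```python
from collections import Counter

# Daily iPod sales for 30 days (data from Example 2-3).
sales = [8, 25, 11, 15, 29, 22, 10, 5, 17, 21,
         22, 13, 26, 16, 18, 12, 9, 26, 20, 16,
         23, 14, 19, 23, 20, 16, 27, 16, 21, 14]


def frequency_table(data, lower, width, n_classes):
    """Return rows of (class label, f, relative frequency, percentage).

    Hypothetical helper: assumes integer data and equal-width classes
    starting at `lower`.
    """
    # Class index of each value: 5-9 -> 0, 10-14 -> 1, and so on.
    counts = Counter((x - lower) // width for x in data)
    total = len(data)
    rows = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width - 1
        f = counts.get(k, 0)
        rows.append((f"{lo}-{hi}", f, round(f / total, 3), round(100 * f / total, 1)))
    return rows


table = frequency_table(sales, lower=5, width=5, n_classes=5)
for label, f, rel, pct in table:
    print(f"{label:>6}  f={f}  relative={rel:.3f}  percent={pct:.1f}")
```

The frequencies come out as 3, 6, 8, 8, and 5, which agrees with Table 2.9.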
[Figure 2.3: horizontal axis "iPods sold", classes 5-9, 10-14, 15-19, 20-24, 25-29]

Figure 2.4 Relative frequency histogram for Table 2.10.

Figure 2.5 Frequency polygon for Table 2.9.

2.7 Dotplots

Definition
Values that are very small or very large relative to the majority of the values in a data set are called outliers or extreme values.

1(c), page 9

An experiment is a process that, when performed, results in one and only one of many observations. These observations are called the outcomes of the experiment.

An event is a collection of one or more of the outcomes of an experiment.

Probability is a numerical measure of the likelihood that a specific event will occur.

Example 4-34
The probability that a person is in favor of genetic engineering is .55 and that a person is against it is .45. Two persons are randomly selected, and it is observed whether they favor or oppose genetic engineering.
a) Draw a tree diagram for this experiment.
b) Find the probability that at least one of the two persons favors genetic engineering.

Prem Mann, Introductory Statistics, 7/E. Copyright © 2010 John Wiley & Sons. All rights reserved.

Example 4-34: Solution
a) Let
F = a person is in favor of genetic engineering
A = a person is against genetic engineering
This experiment has four outcomes. The tree diagram in Figure 4.21 shows these four outcomes and their probabilities.

Figure 4.21 Tree diagram. Final outcomes and their probabilities:
P(FF) = (.55)(.55) = .3025
P(FA) = (.55)(.45) = .2475
P(AF) = (.45)(.55) = .2475
P(AA) = (.45)(.45) = .2025

b) P(at least one person favors) = P(FF or FA or AF)
= P(FF) + P(FA) + P(AF)
= .3025 + .2475 + .2475 = .7975

Definition
Marginal probability is the probability of a single event without consideration of any other event. Marginal probability is also called simple probability.

Definition
Conditional probability is the probability that an event will occur given that another has already occurred. If A and B are two events, then the conditional probability of A given B is written as P(A | B) and read as "the probability of A given that B has already occurred."

Example 4-27
The probability that a patient is allergic to penicillin is .20. Suppose this drug is administered to three patients.
a) Find the probability that all three of them are allergic to it.
b) Find the probability that at least one of them is not allergic to it.

Example 4-27: Solution
a) Let A, B, and C denote the events that the first, second, and third patients, respectively, are allergic to penicillin. Hence,
P(A and B and C) = P(A) P(B) P(C) = (.20)(.20)(.20) = .008
b) Let us define the following events:
G = all three patients are allergic
H = at least one patient is not allergic
P(G) = P(A and B and C) = .008
Therefore, using the complementary event rule, we obtain
P(H) = 1 - P(G) = 1 - .008 = .992

Figure 4.18 Tree diagram for joint probabilities. Final outcomes (a bar denotes "not allergic"):
$P(ABC) = .008$
$P(AB\bar{C}) = .032$
$P(A\bar{B}C) = .032$
$P(A\bar{B}\bar{C}) = .128$
$P(\bar{A}BC) = .032$
$P(\bar{A}B\bar{C}) = .128$
$P(\bar{A}\bar{B}C) = .128$
$P(\bar{A}\bar{B}\bar{C}) = .512$

The standard deviation is the most-used measure of dispersion.
The value of the standard deviation tells how closely the values of a data set are clustered around the mean. In general, a lower value of the standard deviation for a data set indicates that the values of that data set are spread over a relatively smaller range around the mean. In contrast, a larger value of the standard deviation for a data set indicates that the values of that data set are spread over a relatively larger range around the mean.

The standard deviation is obtained by taking the positive square root of the variance. The variance calculated for population data is denoted by $\sigma^2$ (read as sigma squared), and the variance calculated for sample data is denoted by $s^2$. Consequently, the standard deviation calculated for population data is denoted by $\sigma$, and the standard deviation calculated for sample data is denoted by $s$. (Note that $\Sigma$ is uppercase sigma and $\sigma$ is lowercase sigma of the Greek alphabet.) Following are what we will call the basic formulas that are used to calculate the variance and standard deviation.

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ and $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ and $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

where $\sigma^2$ is the population variance, $s^2$ is the sample variance, $\sigma$ is the population standard deviation, and $s$ is the sample standard deviation.

The quantity $x - \mu$ or $x - \bar{x}$ in the above formulas is called the deviation of the x value from the mean. The sum of the deviations of the x values from the mean is always zero; that is, $\sum (x - \mu) = 0$ and $\sum (x - \bar{x}) = 0$.

For example, suppose the midterm scores of a sample of four students are 82, 95, 67, and 92, respectively. Then, the mean score for these four students is

$\bar{x} = \frac{82 + 95 + 67 + 92}{4} = 84$

The deviations of the four scores from the mean are calculated in Table 3.5. As we can observe from the table, the sum of the deviations of the x values from the mean is zero; that is, $\sum (x - \bar{x}) = 0$. For this reason we square the deviations to calculate the variance and standard deviation.
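The midterm-score example above can be verified numerically. The sketch below applies the basic formulas to the four scores, treating them as a sample as the text does; the variable names are illustrative, and the variance and standard deviation it computes are not quoted in the notes.

```python
import math

scores = [82, 95, 67, 92]            # midterm scores from the text
n = len(scores)
mean = sum(scores) / n               # (82 + 95 + 67 + 92) / 4 = 84

# Deviation of each score from the mean; these always sum to zero.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)

# Basic formula: sum of squared deviations divided by n - 1 (sample data).
sum_sq_dev = sum(d ** 2 for d in deviations)
sample_variance = sum_sq_dev / (n - 1)
sample_sd = math.sqrt(sample_variance)

# Dividing by n instead would treat the four scores as a whole population.
population_variance = sum_sq_dev / n

print("deviations:", deviations)
print("sum of deviations:", sum_of_deviations)
print("sample variance:", round(sample_variance, 2))
print("sample standard deviation:", round(sample_sd, 2))
```

Because the raw deviations cancel out, only the squared deviations carry information about spread, which is why the formulas square them before summing.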
Table 3.5 (deviations from the mean $\bar{x} = 84$)

x      $x - \bar{x}$
82     82 - 84 = -2
95     95 - 84 = +11
67     67 - 84 = -17
92     92 - 84 = +8
       $\sum (x - \bar{x}) = 0$

From the computational point of view, it is easier and more efficient to use short-cut formulas to calculate the variance and standard deviation. By using the short-cut formulas, we reduce the computation time and round-off errors. Use of the basic formulas for ungrouped data is illustrated in Section A3.1.1 of Appendix 3.1 of this chapter. The short-cut formulas for calculating the variance and standard deviation are as follows.

Short-Cut Formulas for the Variance and Standard Deviation for Ungrouped Data

$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$ and $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

where $\sigma^2$ is the population variance and $s^2$ is the sample variance.

The standard deviation is obtained by taking the positive square root of the variance.

Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$

Short-Cut Formulas for the Variance and Standard Deviation for Grouped Data

$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}$ and $s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1}$

where $\sigma^2$ is the population variance, $s^2$ is the sample variance, and $m$ is the midpoint of a class.

Number of Orders    f     m     mf     $m^2 f$
10-12               4     11    44     484
13-15               12    14    168    2352
16-18               20    17    340    5780
19-21               14    20    280    5600
                    n = 50      $\sum mf = 832$    $\sum m^2 f = 14{,}216$

$s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1} = \frac{14{,}216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$

$s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$ orders

$\bar{x} = \frac{\sum mf}{n} = \frac{832}{50} = 16.64$ orders

Example 3-16: Solution

Daily Commuting Time (minutes)    f    m     mf     $m^2 f$
0 to less than 10                 4    5     20     100
10 to less than 20                9    15    135    2025
20 to less than 30                6    25    150    3750
30 to less than 40                4    35    140    4900
40 to less than 50                2    45    90     4050
                                  N = 25     $\sum mf = 535$    $\sum m^2 f = 14{,}825$

$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} = \frac{14{,}825 - \frac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04$

$\sigma = \sqrt{\sigma^2} = \sqrt{135.04} = 11.62$ minutes

Thus, the standard deviation of the daily commuting times is 11.62 minutes.

4.1 Experiment, Outcomes, and Sample Space

4(a)

Definition
An experiment is a process that, when performed, results in one and only one of many observations. These observations are called the outcomes of the experiment. The collection of all outcomes for an experiment is called a sample space.

Let us define the following events:
F = the randomly selected person is a female
M = the randomly selected person is a male
V = the randomly selected person is a vegetarian
N = the randomly selected person is a non-vegetarian

Table 4.10 Two-Way Classification Table

              Vegetarian (V)    Nonvegetarian (N)    Total
Female (F)    400               1000                 1400
Male (M)      200               900                  1100
Total         600               1900                 2500

P(M or V) = P(M) + P(V) - P(M and V)
= $\frac{1100}{2500} + \frac{600}{2500} - \frac{200}{2500}$
= .44 + .24 - .08 = .60

Example 4-25
The probability that a randomly selected student from a college is a senior is .20, and the joint probability that the student is a computer science major and a senior is .03. Find the conditional probability that a student selected at random is a computer science major given that the student is a senior.
Example 4-25: Solution
Let us define the following two events:
A = the student selected is a senior
B = the student selected is a computer science major
From the given information,
P(A) = .20 and P(A and B) = .03
Hence,
P(B | A) = P(A and B)/P(A) = .03/.20 = .15

Example 4-30
A university president has proposed that all students must take a course in ethics as a requirement for graduation. Three hundred faculty members and students from this university were asked about their opinion on this issue. Table 4.9 gives a two-way classification of the responses of these faculty members and students. Find the probability that one person selected at random from these 300 persons is a faculty member or is in favor of this proposal.

Table 4.9 Two-Way Classification of Responses

           Favor    Oppose    Neutral    Total
Faculty    45       15        10         70
Student    90       110       30         230
Total      135      125       40         300

Example 4-30: Solution
Let us define the following events:
A = the person selected is a faculty member
B = the person selected is in favor of the proposal
From the information in Table 4.9,
P(A) = 70/300 = .2333
P(B) = 135/300 = .4500
P(A and B) = P(A) P(B | A) = (70/300)(45/70) = .1500
Using the addition rule, we obtain
P(A or B) = P(A) + P(B) - P(A and B) = .2333 + .4500 - .1500 = .5333

7.2 Sampling and Nonsampling Errors

5(a)

Definition
Sampling error is the difference between the value of a sample statistic and the value of the corresponding population parameter. In the case of the mean,

Sampling error = $\bar{x} - \mu$

assuming that the sample is random and no nonsampling error has been made.
SAMPLING AND NONSAMPLING ERRORS

Definition
The errors that occur in the collection, recording, and tabulation of data are called nonsampling errors.

Suppose there are only five students in an advanced statistics class and the midterm scores of these five students are

70 78 80 80 95

Let x denote the score of a student. Using single-valued classes (because there are only five data values, there is no need to group them), we can write the frequency distribution of scores as in Table 7.1 along with the relative frequencies of classes, which are obtained by dividing the frequencies of classes by the population size. Table 7.2, which lists the probabilities of various x values, presents the probability distribution of the population. Note that these probabilities are the same as the relative frequencies.

Table 7.1 Population Frequency and Relative Frequency Distributions

x     f        Relative Frequency
70    1        1/5 = .20
78    1        1/5 = .20
80    2        2/5 = .40
95    1        1/5 = .20
      N = 5    Sum = 1.00

Table 7.2 Population Probability Distribution

x     P(x)
70    .20
78    .20
80    .40
95    .20
      $\sum P(x) = 1.00$

The values of the mean and standard deviation calculated for the probability distribution of Table 7.2 give the values of the population parameters $\mu$ and $\sigma$. These values are $\mu = 80.60$ and $\sigma = 8.09$. The values of $\mu$ and $\sigma$ for the probability distribution of Table 7.2 can be calculated using the formulas given in Section 5.3 of Chapter 5 (see Exercise 7.6).

Reconsider the population of midterm scores of five students given in Table 7.1. Consider all possible samples of three scores each that can be selected, without replacement, from that population. The total number of possible samples, given by the combinations formula discussed in Chapter 4, is 10; that is,

Total number of samples = $_5C_3 = \frac{5!}{3!(5 - 3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10$

Suppose we assign the letters A, B, C, D, and E to the scores of the five students, so that

A = 70, B = 78, C = 80, D = 80, E = 95

Then, the 10 possible samples of three scores each are

ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE

These 10 samples and their respective means are listed in Table 7.3. Note that the first two samples have the same three scores. The reason for this is that two of the students (C and D) have the same score, and, hence, the samples ABC and ABD contain the same values. The mean of each sample is obtained by dividing the sum of the three scores included in that sample by 3. For instance, the mean of the first sample is (70 + 78 + 80)/3 = 76. Note that the values of the means of samples in Table 7.3 are rounded to two decimal places.

By using the values of $\bar{x}$ given in Table 7.3, we record the frequency distribution of $\bar{x}$ in Table 7.4. By dividing the frequencies of the various values of $\bar{x}$ by the sum of all frequencies, we obtain the relative frequencies of classes, which are listed in the third column of Table 7.4. These relative frequencies are used as probabilities and listed in Table 7.5. This table gives the sampling distribution of $\bar{x}$.

If we select just one sample of three scores from the population of five scores, we may draw any of the 10 possible samples. Hence, the sample mean, $\bar{x}$, can assume any of the values listed in Table 7.5 with the corresponding probability. For instance, the probability that the mean of a randomly selected sample of three scores is 81.67 is .20.
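The enumeration described above is small enough to reproduce directly. The following sketch lists every sample of three scores, computes each sample mean, and tallies the relative frequencies; it uses only the Python standard library, and the variable names are illustrative rather than taken from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population of midterm scores, labelled as in the text.
population = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# All samples of size 3 drawn without replacement (5C3 of them).
samples = list(combinations(population, 3))

# Mean of each sample, rounded to two decimals as in the text.
means = {
    "".join(sample): round(sum(population[k] for k in sample) / 3, 2)
    for sample in samples
}

# Frequency of each distinct sample mean, then its relative frequency.
frequency = Counter(means.values())
sampling_distribution = {
    mean: Fraction(f, len(samples)) for mean, f in sorted(frequency.items())
}

for label, mean in means.items():
    print(label, f"{mean:.2f}")
for mean, prob in sampling_distribution.items():
    print(f"P(x-bar = {mean:.2f}) = {float(prob):.2f}")
```

Exact fractions are used for the probabilities so that 2/10 is stored as 1/5 rather than as a rounded decimal; the means themselves are rounded only for display and for grouping equal values.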
The probability that $\bar{x}$ equals 81.67 can be written as

$P(\bar{x} = 81.67) = .20$

Table 7.3 All Possible Samples and Their Means When the Sample Size Is 3

Sample    Scores in the Sample    $\bar{x}$
ABC       70, 78, 80              76.00
ABD       70, 78, 80              76.00
ABE       70, 78, 95              81.00
ACD       70, 80, 80              76.67
ACE       70, 80, 95              81.67
ADE       70, 80, 95              81.67
BCD       78, 80, 80              79.33
BCE       78, 80, 95              84.33
BDE       78, 80, 95              84.33
CDE       80, 80, 95              85.00

Table 7.4 Frequency and Relative Frequency Distributions of $\bar{x}$ When the Sample Size Is 3

$\bar{x}$    f                 Relative Frequency
76.00        2                 2/10 = .20
76.67        1                 1/10 = .10
79.33        1                 1/10 = .10
81.00        1                 1/10 = .10
81.67        2                 2/10 = .20
84.33        2                 2/10 = .20
85.00        1                 1/10 = .10
             $\sum f = 10$     Sum = 1.00

Table 7.5 Sampling Distribution of $\bar{x}$ When the Sample Size Is 3

$\bar{x}$    $P(\bar{x})$
76.00        .20
76.67        .10
79.33        .10
81.00        .10
81.67        .20
84.33        .20
85.00        .10
             $\sum P(\bar{x}) = 1.00$

Example 7-2
The mean wage for all 5000 employees who work at a large company is $27.50 and the standard deviation is $3.70. Let $\bar{x}$ be the mean wage per hour for a random sample of certain employees selected from this company. Find the mean and standard deviation of $\bar{x}$ for a sample size of
(a) 30
(b) 75
(c) 200

5(b): same process, only with the digits 10 and 100.

Example 7-2: Solution
(a) N = 5000, $\mu$ = $27.50, $\sigma$ = $3.70. In this case, n/N = 30/5000 = .006 < .05.

$\mu_{\bar{x}} = \mu$ = $27.50

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{30}}$ = $.676

(b) N = 5000, $\mu$ = $27.50, $\sigma$ = $3.70. In this case, n/N = 75/5000 = .015 < .05.

$\mu_{\bar{x}} = \mu$ = $27.50

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{75}}$ = $.427

5(c)

Example 8-1: Solution
a) n = 25, $\bar{x}$ = $145, and $\sigma$ = $35

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{35}{\sqrt{25}}$ = $7.00

Point estimate of $\mu$ = $\bar{x}$ = $145

b) Confidence level is 90% or .90. Here, the area in each tail of the normal distribution curve is $\alpha/2 = (1 - .90)/2 = .05$. Hence, z = 1.65.
$\bar{x} \pm z\sigma_{\bar{x}} = 145 \pm 1.65(7.00) = 145 \pm 11.55$
= (145 - 11.55) to (145 + 11.55)
= $133.45 to $156.55

Example 6-7: Solution
The z value for x = 25 is 0.
The z value for x = 32 is

$z = \frac{x - \mu}{\sigma} = \frac{32 - 25}{4} = 1.75$

P(25 [...]

Example 8-5
Dr. Moore wanted to estimate the mean cholesterol level for all adult men living in Hartford. He took a sample of 25 adult men from Hartford and found that the mean cholesterol level for this sample is 186 mg/dL with a standard deviation of 12 mg/dL. Assume that the cholesterol levels for all adult men in Hartford are (approximately) normally distributed. Construct a 95% confidence interval for the population mean $\mu$.

Example 8-5: Solution
$\sigma$ is not known, n < 30, and the population is normally distributed (Case I).
Use the t distribution to make a confidence interval for $\mu$.
n = 25, $\bar{x}$ = 186, s = 12, and confidence level = 95%

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{12}{\sqrt{25}} = 2.40$

Example 6-3
Find the following areas under the standard normal curve.
a) Area to the right of z = 2.32
b) Area to the left of z = -1.54

Example 6-3: Solution
a) To find the area to the right of z = 2.32, first we find the area to the left of z = 2.32. Then we subtract this area from 1.0, which is the total area under the curve. The required area is 1.0 - .9898 = .0102.

Figure 6.22 Area to the left of z = -1.54. Shaded area is .0618.

Example 6-4
Find the following probabilities for the standard normal curve.
a) P(1.19 < z < 2.12)
b) P(-1.56 [...] -.75)

Example 6-4: Solution
a) P(1.19 [...]

If n/N > .05, then $\sigma_{\bar{x}}$ is calculated as:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

where the factor $\sqrt{\frac{N - n}{N - 1}}$ is called the finite-population correction factor.

Boe Consultant Associates has five employees. Table 7.6 gives the names of these five employees and information concerning their knowledge of statistics.

Table 7.6 Information on the Five Employees of Boe Consultant Associates

Name     Knows Statistics
Ally     Yes
John     No
Susan    No
Lee      Yes
Tom      Yes

If we define the population proportion, p, as the proportion of employees who know statistics, then

p = 3/5 = .60

Now, suppose we draw all possible samples of three employees each and compute the proportion of employees, for each sample, who know statistics. The total number of samples of size three that can be drawn from the population of five employees is

Total number of samples = $_5C_3 = \frac{5!}{3!(5 - 3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10$

Table 7.7 lists these 10 possible samples and the proportion of employees who know statistics for each of those samples. Note that we have rounded the values of $\hat{p}$ to two decimal places.

Table 7.7 All Possible Samples of Size 3 and the Value of $\hat{p}$ for Each Sample

Sample               Proportion Who Know Statistics, $\hat{p}$
Ally, John, Susan    1/3 = .33
Ally, John, Lee      2/3 = .67
Ally, John, Tom      2/3 = .67
Ally, Susan, Lee     2/3 = .67
Ally, Susan, Tom     2/3 = .67
Ally, Lee, Tom       3/3 = 1.00
John, Susan, Lee     1/3 = .33
John, Susan, Tom     1/3 = .33
John, Lee, Tom       2/3 = .67
Susan, Lee, Tom      2/3 = .67

Using Table 7.7, we prepare the frequency distribution of $\hat{p}$ as recorded in Table 7.8, along with the relative frequencies of classes, which are obtained by dividing the frequencies of classes by the total number of samples (10). The relative frequencies are used as probabilities and listed in Table 7.9.
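The Boe Consultant Associates enumeration can be sketched the same way as the one for sample means. The code below builds every sample of three employees, computes $\hat{p}$ for each as an exact fraction, and tallies the relative frequencies; the variable names are illustrative. As an extra check that is not worked out in the notes, it also confirms that the mean of the sampling distribution of $\hat{p}$ equals the population proportion p.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Table 7.6: whether each employee knows statistics.
knows_statistics = {"Ally": True, "John": False, "Susan": False,
                    "Lee": True, "Tom": True}

# Population proportion p = 3/5.
p = Fraction(sum(knows_statistics.values()), len(knows_statistics))

# All samples of three employees and the sample proportion for each.
samples = list(combinations(knows_statistics, 3))
p_hats = [Fraction(sum(knows_statistics[name] for name in sample), 3)
          for sample in samples]

# Relative frequency of each distinct value of p-hat.
frequency = Counter(p_hats)
sampling_distribution = {
    p_hat: Fraction(f, len(samples)) for p_hat, f in sorted(frequency.items())
}

# Mean of the sampling distribution: sum of p-hat times its probability.
mean_p_hat = sum(p_hat * prob for p_hat, prob in sampling_distribution.items())

for sample, p_hat in zip(samples, p_hats):
    print(", ".join(sample), f"{float(p_hat):.2f}")
print("mean of p-hat:", float(mean_p_hat), "population p:", float(p))
```

Keeping $\hat{p}$ as a fraction avoids the rounding in .33 and .67, so the mean comes out to exactly 3/5.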
Table 7.9 gives the sampling distribution of $\hat{p}$.

Table 7.8 Frequency and Relative Frequency Distributions of $\hat{p}$ When the Sample Size Is 3

$\hat{p}$    f                 Relative Frequency
.33          3                 3/10 = .30
.67          6                 6/10 = .60
1.00         1                 1/10 = .10
             $\sum f = 10$     Sum = 1.00

Table 7.9 Sampling Distribution of $\hat{p}$ When the Sample Size Is 3

$\hat{p}$    $P(\hat{p})$
.33          .30
.67          .60
1.00         .10
             $\sum P(\hat{p}) = 1.00$

Solution
Let p be the proportion of all eligible voters who favor Maureen Webster. Then,

p = .53 and q = 1 - p = 1 - .53 = .47

The mean of the sampling distribution of the sample proportion $\hat{p}$ is

$\mu_{\hat{p}} = p = .53$

The population of all voters is large (because the city is large), and the sample size is small compared to the population. Consequently, we can assume that $n/N \le .05$. Hence, the standard deviation of $\hat{p}$ is calculated as

$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.53)(.47)}{400}} = .02495496$

From the central limit theorem, the shape of the sampling distribution of $\hat{p}$ is approximately normal. (The reader should check that np > 5 and nq > 5 and, hence, the sample size is large.) The probability that $\hat{p}$ is less than .49 is given by the area under the normal distribution curve for $\hat{p}$ to the left of $\hat{p} = .49$, as shown in Figure 7.18. The z value for $\hat{p} = .49$ is

$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{.49 - .53}{.02495496} = -1.60$

Figure 7.18 $P(\hat{p} < .49)$. Shaded area is .0548.

Thus, the required probability from Table IV is

$P(\hat{p} < .49) = P(z < -1.60) = .0548$

Hence, the probability that less than 49% of the voters in a random sample of 400 will favor Maureen Webster is .0548.
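The arithmetic in this solution can be reproduced as a rough check. The sketch below assumes the normal approximation described above and replaces the lookup in Table IV with the standard normal cumulative distribution function written in terms of `math.erf`; the helper name `normal_cdf` is illustrative and not from the notes.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p = 0.53          # population proportion favoring the candidate
q = 1 - p
n = 400           # sample size

# The sample counts as large when np > 5 and nq > 5.
large_sample = n * p > 5 and n * q > 5

sigma_p_hat = math.sqrt(p * q / n)        # standard deviation of p-hat
z = (0.49 - p) / sigma_p_hat              # z value for p-hat = .49

# A printed table is read at z rounded to two decimals, so do the same here.
z_table = round(z, 2)
probability = normal_cdf(z_table)

print("sigma of p-hat:", round(sigma_p_hat, 8))
print("z:", z_table)
print("P(p-hat < .49):", round(probability, 4))
```

Evaluating the function at the unrounded z gives a slightly smaller area, because the printed table is read at z = -1.60 rather than at the full-precision value.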
