APPLIED BUSINESS STATISTICS
MEASURES OF CENTRAL LOCATION

For the lower quartile, Q1, apply a formula similar to the median formula:

$$Q_1 = O_{q1} + \frac{c\left[\frac{n}{4} - f(<)\right]}{f_{q1}}$$

In this example:
* $O_{q1} = 10$: lower limit of the Q1 interval
* $n = 30$: sample size
* $f(<) = 3$: cumulative frequency of the interval before the Q1 interval
* $c = 5$: class width
* $f_{q1} = 5$: frequency of the Q1 interval.

$$Q_1 = 10 + \frac{5\left[\frac{30}{4} - 3\right]}{5} = 10 + 4{,}5 = 14{,}5 \text{ days}$$

Interpretation
25 % of re-orders will be received within 14,5 days.

Upper quartile, Q3

Q3 position: $\frac{3n}{4} = \frac{3 \times 30}{4}$ = 22,5th position.
Q3 interval = [20-<25] because the 22,5th observation falls within these class limits.

For the upper quartile, Q3, a formula similar to the median formula is again applied:

$$Q_3 = O_{q3} + \frac{c\left[\frac{3n}{4} - f(<)\right]}{f_{q3}}$$

In this example:
* $O_{q3} = 20$: lower limit of the Q3 interval
* $n = 30$: sample size
* $f(<) = 17$: cumulative frequency of the interval before the Q3 interval
* $c = 5$: class width
* $f_{q3} = 7$: frequency of the Q3 interval.

Applying the formula:

$$Q_3 = 20 + \frac{5\left[\frac{3 \times 30}{4} - 17\right]}{7} = 23{,}93 \text{ days}$$

Interpretation
75 % of re-orders will be received within 23,93 days. Alternatively, 25 % of re-orders will be received after 23,93 days.

Use of percentiles

In general, any percentile value can be found by adjusting the median formula to:
* find the required percentile's position and, from this,
* establish the percentile interval.

Examples
* 90th percentile position = 0,90 × n
* 35th percentile position = 0,35 × n
* 25th percentile position (Q1) = 0,25 × n

Percentiles can be used to identify various non-central values. For example, if it is desired to work with a truncated dataset which excludes extreme values at either end of the ordered dataset (e.g. exclude the lower 5 per cent and the upper 5 per cent of observations), then finding the 5th and 95th percentiles will identify the limits within which the remaining 90 per cent of data values will fall.

COMPARING THE MEAN, MEDIAN AND MODE

The choice of a representative central location value depends on the shape of the frequency distribution.
Symmetrical distribution

If the frequency distribution of the random variable is symmetrical, then all three measures have the identical value and any can be chosen. This is illustrated for the symmetrical histogram in Figure 3.2. In these circumstances, it is recommended that the mean be selected as it has useful properties which are of value in further data analysis.

[Figure 3.2: Symmetrical distribution, where Mean = Median = Mode]

Negatively skewed distribution

A negatively skewed distribution is characterised by a few relatively small observations. This creates a "long" tail on the negative (left) side of the distribution. In such distributions, the mean is most influenced as these few small observations will "pull down" the mean value. It is then not a representative measure of central location.

Positively skewed distribution

A positively skewed distribution is characterised by a few relatively large observations. This creates a "long" tail on the positive (right) side of the distribution. These few larger observations will tend to "pull up" the mean value, which again makes it the least representative of all measures of central location. (See Figure 3.4.)

If a distribution is distorted by extreme values (i.e. skewed), then the median or the mode is more representative than the mean.

If the frequency distribution is skewed, the median may be the best measure of central location as it is not pulled by extreme values (as the mean is), nor is it as highly influenced by the frequency of occurrence of a single value (as the mode is).

OTHER MEASURES OF CENTRAL LOCATION

Geometric mean

When data represents percentage changes such as indexes or growth rates, then an appropriate measure of central location is the geometric mean.
* Multiply all n observations together;
* take the nth root of the product.

The geometric mean (G.M.) is calculated as follows:

$$\text{G.M.} = \sqrt[n]{x_1 \times x_2 \times x_3 \times \dots \times x_n}$$

EXAMPLE 6
The electricity tariff problem

The electricity tariff has increased by 12 per cent, 8 per cent and 16 per cent per annum over a three-year period. Find the average annual increase in tariff.

Solution
Geometric mean = $\sqrt[3]{1{,}12 \times 1{,}08 \times 1{,}16}$ = 1,1195, i.e.
an 11,95 % annual increase.

Compare this to the arithmetic mean, which gives a value of:

$$\frac{1{,}12 + 1{,}08 + 1{,}16}{3} = 1{,}12$$

i.e. a 12 % annual increase.

To check, assume the initial price per kilowatt hour was R1. Then:
* the actual price after three years would be R1,40313 (i.e. 1 × 1,12 × 1,08 × 1,16)
* computations based on the geometric and arithmetic means give:

            Geometric mean                  Arithmetic mean
  Year 1    1 × 1,1195 = 1,1195             1 × 1,12 = 1,12
  Year 2    1,1195 × 1,1195 = 1,2533        1,12 × 1,12 = 1,2544
  Year 3    1,2533 × 1,1195 = 1,4031 (correct)   1,2544 × 1,12 = 1,4049 (incorrect)

As seen from the example, the arithmetic mean overstates the average percentage change.

Another area of application involves finding average escalation rates of construction projects.

Grouped data method

The geometric mean (G.M.) for grouped data is:

$$\text{G.M.} = \text{Antilog}\left[\frac{\sum f_i \log(x_i)}{\sum f_i}\right]$$

EXAMPLE 7
The fuel price changes problem

Find the average price increase for fuel over the past 20 years. The following frequency distribution shows the percentage price changes in fuel on 30 occasions over the past 20 years.

  Price changes (percentages)   Midpoint x_i   f_i
  6-<12                          9              8
  12-<18                         15             11
  18-<24                         21             7
  24-<30                         27             4

Solution
Mathematically:

$$\text{Geometric mean} = \sqrt[30]{9^{8} \times 15^{11} \times 21^{7} \times 27^{4}}$$

This formula can be simplified using logarithms:

  Interval   x_i   f_i   Log(x_i)   f_i Log(x_i)
  6-<12      9     8     0,9542     7,6339
  12-<18     15    11    1,1761     12,9370
  18-<24     21    7     1,3222     9,2555
  24-<30     27    4     1,4314     5,7255
  Total            30               35,5519

Then Log(G.M.) = 35,5519/30 = 1,1851
Geometric mean = antilog(1,1851) = 15,314

On a calculator, the antilog is found by raising 10 to the power of 1,1851 (i.e. $10^{1{,}1851} = 15{,}314$).

Interpretation
Fuel has increased on average by 15,314 % on each occasion that the price was adjusted.

Harmonic mean

When a dataset contains values that represent rates of change, then the harmonic mean is the appropriate measure of central location.

Method
* Sum the reciprocals of the observations.
* Divide this sum into the number of observations.
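The two geometric-mean calculations in Examples 6 and 7 can be checked with a short script. The following is a minimal Python sketch, not part of the original text; the function names `geometric_mean` and `grouped_geometric_mean` are illustrative choices, and the inputs are the growth factors and class midpoints quoted in the examples.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def grouped_geometric_mean(midpoints, freqs):
    """Antilog (base 10) of the frequency-weighted mean of log10(midpoint)."""
    total_log = sum(f * math.log10(x) for x, f in zip(midpoints, freqs))
    return 10 ** (total_log / sum(freqs))

# Example 6: growth factors for the 12 %, 8 % and 16 % tariff increases
gm_tariff = geometric_mean([1.12, 1.08, 1.16])

# Example 7: class midpoints and frequencies of the fuel price changes
gm_fuel = grouped_geometric_mean([9, 15, 21, 27], [8, 11, 7, 4])

print(f"tariff factor: {gm_tariff:.4f}, fuel: {gm_fuel:.3f} %")
```

The fuel figure may differ from 15,314 in the last decimal place, because the worked example rounds the logarithm to four decimals before taking the antilog.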
Formula

$$\text{Harmonic mean} = \frac{n}{\sum \frac{1}{x_i}}$$

EXAMPLE 8
The cyclist problem

If a cyclist travels 50 km/h over a 5 km stretch of road, and 30 km/h over another hilly 5 km stretch of road, find the average speed over the 10 km distance.

Solution

$$\text{Harmonic mean} = \frac{2}{\frac{1}{50} + \frac{1}{30}} = 37{,}5 \text{ km per hour}$$

Note that the arithmetic mean would have given a value of 40 km per hour, which overstates the true average speed.

Weighted arithmetic mean (weighted average)

The simple arithmetic mean assumes that each observation is of equal importance in computing the average (i.e. each observation has a weight of 1/n). However, if the importance (weight) of each observation is different, then the appropriate measure of central location for the observations would be the weighted arithmetic mean.

Method
* Each observation, $x_i$, is first multiplied (weighted) by its importance measure, which is generally its frequency, $f_i$.
* These weighted observations are then summed.
* The sum of the weighted observations is then divided by the sum of the weights.

In formula terms, the weighted arithmetic mean is given by:

$$\bar{x}_{\text{weighted}} = \frac{\sum f_i x_i}{\sum f_i}$$

EXAMPLE 9
The training consultant's earnings problem

A training consultant is paid R150 per hour for one 8 hour training programme, R120 per hour on a second training programme of 6 hours, and R200 per hour for a 2 hour seminar. What are his average earnings per hour over the three engagements?

Solution

                      Earnings per hour x_i   Hours worked f_i   Total earnings f_i x_i
  Programme one       150                     8                  1 200
  Programme two       120                     6                  720
  Seminar (2 hours)   200                     2                  400
  Total                                       16                 2 320

Weighted arithmetic mean = Total earnings / Total hours = 2 320 / 16 = R145 per hour.

The method for computing the weighted arithmetic mean is identical to the method used to find the arithmetic mean for grouped data. (See section 3.3.)

3.9 SUMMARY

This chapter introduced the first of the measures of descriptive statistics, namely measures of central location.
The mean, mode and median are identified as the three most important and commonly used measures of central location. The computation for each of these three measures was illustrated for both ungrouped and grouped data. In addition, the conditions under which each central location measure would be appropriate to use are described. These refer to the skewness of the frequency distribution and the nature of the data available.

Quartiles (or percentiles), which are non-central measures of location, are also described. In particular, the first and third quartiles are derived for both ungrouped and grouped data.

Less commonly used measures of central location are identified as the geometric mean and the harmonic mean. The conditions appropriate for their use are also given. The weighted arithmetic mean is shown to be the same as the arithmetic mean for grouped data.

EXERCISES

Exercise 1
A restaurant owner randomly selected and recorded the value of meals enjoyed by 15 diners on a given day. The values of meals were (in rands):

  [15 values; illegible in the source]

(i) Define the random variable and the data type.
(ii) Find:
  (a) The arithmetic mean and interpret.
  (b) The median value of a meal.
  (c) Can you identify a mode? Give its value.
(iii) Which central location measure would you choose?

Exercise 2
The human resource department of a company analysed the level of absenteeism of 56 employees who reported ill over the past year.

  Absenteeism level (days absent)   Number of employees f_i
  …                                 …
  …                                 …
  11-<15                            …
  15-<19                            6
  19-<23                            3

(i) Find the mean, median and modal level of absenteeism.
(ii) Is the distribution of days absent symmetrical?
(iii) Determine the first quartile and 75th percentile.
(iv) What percentage of values lie between Q1 and Q3?

Exercise 3
Find the average price paid per share in an equity portfolio consisting of:
* 400 shares bought for R15 each,
* 100 shares bought for R20 each,
* 50 shares bought for R40 each, and
* 500 shares bought for R10 each.

Exercise 4
Office rental agreements contain escalation clauses. For a particular office block in the CBD, the escalation rates for 1988, 1989, 1990 and 1991 were 16 %, 14 %, 10 % and 22 % respectively. What was the average rate of escalation in office rentals for this office block over this 4 year period?

Exercise 5
The ratios of actual costs to estimated costs for 49 new product developments in a pharmaceutical firm are given in the following distribution:

  Cost ratio    Number of products
  0,00-0,49     …
  0,50-0,99     5
  1,00-1,49     …
  1,50-1,99     6
  2,00-2,49     6
  2,50-2,99     10
  3,00-3,49     3

(i) Calculate the mean, median and modal cost ratio for this distribution.
(ii) Do you think it is justified for the management to present any of the results of (i) above to the board of directors as a representative cost ratio for development projects? Explain.
[IMM - October 1990]

Exercise 6
The following data represents the percentages of family income allocated to groceries for a sample of 50 shoppers:

  Percentage of family income   Number of shoppers
  10-19                         5
  20-29                         12
  …                             …

(i) Compute the value of the mean.
(ii) Find the median and modal percentages of family income spent on groceries for this sample of shoppers.
(iii) Which of these three measures of central location would you publish in a consumer magazine as being representative of the actual percentage of family income spent on groceries? Explain.
[IMM - May 1990]

Exercise 7
The table below gives the monthly rental paid by 150 employees in company X.

  Rent (in rands)   Number of employees
  …                 …

(i) Calculate the value of the median rent.
[IMM - October 1989]
(ii) Determine the upper quartile rent paid.
(iii) 25 % of employees pay less than a certain rent. What is the maximum rent paid by this group of employees?
(iv) What is the range of rent paid by the middle 50 % of employees?

Exercise 8
Indicate the relation between the mean, median and mode in a positively skewed frequency distribution by means of a simple diagram. Explain briefly the positioning of these averages.
[IAC(SA) - May 1992]

Exercise 9
The daily wages earned by men in a certain industry are given in the following table:

  Wages (rands)   No. of men
  50-<52          …
  52-<54          68
  54-<56          7
  56-<58          294
  58-<60          …
  60-<62          550
  62-<64          40
  64-<66          130
  66-<68          92
  68-<70          60
  70-<72          26
  72-<74          …

(i) Calculate the arithmetic mean wage and the median wage.
(ii) Find the wage limits of the central 50 % of wage earners.
(iii) Find the percentage of wage earners earning less than R60 per day.
[IAC(SA) - May 1990]

Exercise 10
State whether the following statements are true or false and give reasons for your answers:
(i) The mean may be a misleading measure of central location if the data form a skewed distribution.
(ii) If the mean is greater than the median, the distribution is skewed to the right.
(iii) The mode is influenced by outliers.
(iv) In a frequency distribution, the modal value need not necessarily lie in the interval with the highest frequency.
(v) The 50th percentile is another term to describe the mode.
(vi) The value which divides an ordered dataset into 25 % below it and 75 % above it is called the upper quartile.

Exercise 11
A company operates a computer service bureau and provides accounting services for clients. The sales manager is concerned about the company's pricing policy, which uses a flat fee of R250 for computer processing time to develop a balance sheet and income statement.
The sales manager has selected a random sample of 20 computer runs and recorded the computer processing time in the following frequency distribution:

  Processing time                 Number of computer runs
  0,00 and under 1,00 minutes     7
  1,00 and under 2,00 minutes     0
  2,00 and under 3,00 minutes     3
  3,00 and under 4,00 minutes     1
  4,00 and under 5,00 minutes     8

(i) Find the mean processing time required to develop a balance sheet and income statement.
(ii) Using the mean processing time found in (i), find the average cost per balance sheet and income statement if the charge were R150 per processing minute. Would this be more profitable for the firm, based on this sample?
[CIS - November 1990]
(iii) Determine the modal processing time.
(iv) What processing time separates the faster 50 % of computer runs from the slower 50 % of computer runs?
(v) Perform the average costing exercise (as in (ii)) based on a charge of R150 per processing minute, using the mode as the central location measure.
(vi) Repeat part (v) assuming the median to be the representative measure of central location.
(vii) Which costing system would you recommend that the sales manager adopt? Why?

Exercise 12
For which of the following would the mean be inappropriate as a measure of central location?
(i) The ages of children at a play school,
(ii) the diameters of metal shafts in a shipment for the mining industry,
(iii) the different makes of cars in a car park,
(iv) the number of pages in different textbooks on statistics, and
(v) the residential location of mail order customers.

Exercise 13
The following measures of central location were calculated for a distribution of the number of persons per household in a residential area of a city:
* mode = 4 persons;
* mean = 5,6 persons; and
* median = 5 persons.

If there are 2 637 households in the area, which of the following procedures would be appropriate to determine the total number of persons residing in the area?
(i) multiply the number of households by 4
(ii) multiply the number of households by 5,6
(iii) multiply the number of households by 5
(iv) multiply the number of households by 14,6 (i.e. 4 + 5,6 + 5) and divide by 3.
[CIS - May 1990]

Exercise 14
The mean mark of 50 students in class A is 90; for 20 students in class B the mean is 80, and for 10 students in class C the mean is 60. Find the (weighted) arithmetic mean mark of the students in all three classes.
[CIS - May 1990]

Exercise 15
The number of days in a year that employees in a certain company were away from work due to illness is given in the following table:

  Sick days   Number of employees
  …-6         67
  7-8         …
  9-10        …
  11-12       5

Find the modal class and modal days sick.
[CIS - May 1990]

Exercise 16
State whether the following statements are true or false. Give reasons for your answers.
If the median mass of 5 people in a lift is 65 kg and a 70 kg man enters, then:
(i) The new median will be about 66 kg.
(ii) The median will increase.
(iii) It is impossible for the new median to be less than it was.
(iv) It is impossible for the new median to stay exactly at 65 kg.
(v) The median may increase, but that depends on the actual values of all six figures.
[CIS - November 1989]

Exercise 17
A company employs 12 persons in managerial positions. Their seniority (in years of service) and sex are listed below:

  Sex                 F  M  F  M  F  M  M  F  F  F  F  M
  Seniority (years)   [12 values; only partly legible in the source]

(i) Find the seniority mean, the seniority median and the seniority mode for the above data.
(ii) Which of the mean, median and mode is the least useful measure of location for the seniority data? Give a reason for your answer.
(iii) Find the mode for the sex data. Does this indicate anything about the employment practice of the company when compared to the medians for the seniority data for males and females?
[CIS - May 1989]

Exercise 18
The following table represents the frequency distribution of daily takings (R100's) of a retail company:

  Daily takings   Midpoint   Number of days
  0-100           50         10
  100-200         150        30
  200-300         250        50
  300-400         350        80
  400-500         450        100
  500-600         550        80
  Total                      350

(i) Calculate the mean takings per day.
(ii) Determine in which class the median will fall.
(iii) If the frequencies of the second and third class intervals were changed around so that they read:

  100-200         150        50
  200-300         250        30

would the mean and the median be affected? Give reasons for your answer, but do not re-calculate them.
[CIS - May 1988]

Exercise 19
Mary Scully is employed as an "Affirmative Action Officer" by Ortex Electronics. Mary reports directly to the plant manager, and is responsible for monitoring and making recommendations on Ortex hiring procedures, working conditions and compensation plans. As part of her ongoing monitoring of compensation plans, Mary collected data on hourly earnings of all non-salaried employees at Ortex. To aid in interpreting the data, Mary organised the data into the following frequency distribution:

  Hourly earnings (Rands)   Number of …
  …                         …

(i) Calculate the mean, median and mode of the hourly earnings for the men.
(ii) As part of her analysis, Mary calculated the mean, mode and median for the hourly earnings for men and women. The summary statistics for women are reproduced below:

  Women
  Mean   = R5,248
  Median = R5,273
  Mode   = R5,022

Mary reviewed the summary statistics, and concluded that there is little difference in the hourly earnings of the men and women at Ortex. Would you agree with Mary's conclusion?
[IMM - May 199…]

Exercise 20
Both the arithmetic mean and the median are measures of central location. Explain when you would use the median in preference to the arithmetic mean.
[CIS - May 199…]

Exercise 21
The mean monthly salary paid to all employees in a particular company was R1 500.
The average monthly salaries paid to male and female employees were R1 560 and R1 260 respectively. Determine the percentage of males and females employed by the company.
[IAC(SA) - November 199…]

Exercise 22
The price of a particular brand of soap increased each year by 5 %, 12 %, 6 %, 18 %, 9 % and 10 % over six years. Find the average annual increase for this brand of soap.

Exercise 23
An aeroplane travels at 950 km/hour to Johannesburg from Cape Town and returns to Cape Town at 810 km/hour. What was the average speed for the round trip?

Exercise 24
The daily percentage change (to the nearest per cent) of an equity traded on the JSE was monitored for 100 days by an investment analyst. These daily percentage changes were summarised into the frequency distribution below.

  Daily percentage change of an equity (%)   Number of days
  …                                          …

Find the geometric mean daily percentage change.

Exercise 25
The ABC Bicycle company published its annual production figures of bicycles as at 31 March of each year.

  Year end     Units produced
  31/3/1988    12 500
  31/3/1989    13 250
  31/3/1990    14 310
  31/3/1991    15 741
  31/3/1992    17 630

Use the formula for the geometric mean and calculate the average annual percentage increase over the past 5 years in units produced by the ABC Bicycle company. Use your answer to forecast the production figure for 1995.
[IAC(SA) - May 1992]

Exercise 26
If the mean annual salary paid to the three top executives of a firm is R156 000, can one of them receive an annual salary of R500 000? Show all calculations.
[IMM - November 1992]

4 Measures of Dispersion

CHAPTER OBJECTIVES
INTRODUCTION
RANGE
INTERQUARTILE RANGE
QUARTILE DEVIATION
VARIANCE
* Computation for ungrouped data
* Computation for grouped data
STANDARD DEVIATION
COEFFICIENT OF VARIATION
MEASURES OF SKEWNESS
* Pearson's coefficient of skewness
* Bowley's coefficient of skewness
MEASURES OF PEAKEDNESS (KURTOSIS)
SUMMARY
EXERCISES

4.1 OBJECTIVES

After studying this chapter, you should be able to:
* Identify the various measures of dispersion.
* Identify the dispersion measures appropriate for the different data types.
* Compute each dispersion measure for both grouped and ungrouped sets of data.
* Understand the meaning and uses of each measure of dispersion.
* Identify and compute measures of skewness.

4.2 INTRODUCTION

In Chapter 3 it was mentioned that there are two descriptive statistical measures which are always used to describe the behaviour (or characteristics) of a random variable. They are:
* a measure of central location (refer Chapter 3); and
* a measure of spread or dispersion.

Spread (or dispersion) refers to the extent to which the observations of a random variable are scattered about the central value. Figure 4.1 illustrates cases where 3 samples of a random variable could have the same central value, but different measures of dispersion. This chapter looks at measures to quantify this dispersion.

[Figure 4.1: Distributions with different dispersion values]

Measures of dispersion provide useful information with which the reliability of the central value may be judged. Widely dispersed observations indicate low reliability and less representativeness of the central value. Conversely, a high concentration of observations about the central value increases confidence in the reliability and representativeness of the central value. This chapter will also examine a means of comparing the dispersion of observations between two datasets to establish which one shows greater variability.
Types of dispersion measures

The measures that are used to describe dispersion are:
* Range
* Interquartile range
* Quartile deviation
* Variance
* Standard deviation.

The method of computation, appropriate data types, uses and interpretation of each are now described.

4.3 RANGE

The range is the difference between the highest and lowest observed values in a dataset.

Range = Maximum value - Minimum value
      = $x_{max} - x_{min}$ for ungrouped data; or
      = Upper limit (highest class) - Lower limit (lowest class) for grouped data.

EXAMPLE 1
The re-order problem. (Data obtained from Example 1 in Chapter 3.3.)

  [30 observations; values illegible in the source]

$x_{max} = 29$, $x_{min} = 5$
Range = 29 - 5 = 24 days

Interpretation
24 days separates the shortest time between successive re-orders ($x_{min}$) from the longest time between successive re-orders ($x_{max}$) for a particular range of women's clothing.

The range is a crude estimate of spread. It is easily calculated, but is distorted by extreme values (outliers), since an outlier would itself be $x_{min}$ or $x_{max}$. It is therefore a volatile and unstable measure of dispersion, as it can vary greatly between samples taken from the same population. It also provides no information on the clustering of observations within the dataset about a central value, as it uses only two observations (i.e. $x_{min}$ and $x_{max}$) in its computation.

4.4 INTERQUARTILE RANGE

Because the range can be distorted by extreme values (outliers), a modified range which excludes these outliers is often calculated. This modified range considers the variability shown by only the middle 50 per cent of observations and is called the interquartile range. It is the difference between the upper and lower quartiles.

$$\text{Interquartile Range} = Q_3 - Q_1$$

This modified range removes some of the instability inherent in the range if outliers are present, but it excludes 50 per cent of all observations from further analysis.
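As a rough check on the range and interquartile range of the re-order problem, the quartiles can be interpolated from a frequency table in the same way as the median formula of Chapter 3. The sketch below is illustrative only. The class frequencies (3, 5, 9, 7, 6) are an assumption: they are inferred from the quartile and median figures quoted from Chapter 3, because the frequency table itself is not reproduced in this section. The function name `grouped_quantile` is likewise hypothetical.

```python
def grouped_quantile(lower_limits, width, freqs, p):
    """Interpolate the p-th quantile (0 < p < 1) from a frequency table."""
    position = p * sum(freqs)
    cumulative = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("p must lie strictly between 0 and 1")

lower_limits = [5, 10, 15, 20, 25]   # classes 5-<10, 10-<15, ..., 25-<30
freqs = [3, 5, 9, 7, 6]              # assumed frequencies (see note above)

q1 = grouped_quantile(lower_limits, 5, freqs, 0.25)
q3 = grouped_quantile(lower_limits, 5, freqs, 0.75)

data_range = 29 - 5                  # x_max - x_min from Example 1
iqr = q3 - q1

print(f"range = {data_range}, Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
```

Under these assumed frequencies the interpolation reproduces the Chapter 3 quartiles of 14,5 and 23,93 days, so the interquartile range is a little under 9,5 days compared with a range of 24 days.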
This measure of dispersion, like the range, also provides no information on the clustering of observations within the dataset, as it uses only two values (i.e. Q1 and Q3) in its computation.

4.5 QUARTILE DEVIATION

A measure of variation based on this modified range is called the quartile deviation (Q.D.) (or alternatively, the semi-interquartile range). It is found by dividing the interquartile range in half.

$$\text{Quartile Deviation (Q.D.)} = \frac{Q_3 - Q_1}{2}$$

[Figure: ordered sample of observations from the 1st to the nth, marking Q1, Q2 and Q3, with one quartile deviation on either side of the median and the interquartile range spanning Q1 to Q3]

EXAMPLE 2
Quartile deviation for the re-order problem. (Data from Chapter 3.)

Q1 = 14,5 days
Median = 18,89 days
Q3 = 23,93 days

$$\text{Quartile deviation} = \frac{23{,}93 - 14{,}5}{2} = 4{,}715 \text{ days}$$

Interpretation
50 per cent of all observations are expected to lie within 4,715 days either side of the median of 18,89 days.
Alternatively, 25 per cent of observations are considered to lie within 4,715 days below the median value and 25 per cent of observations are expected to lie within 4,715 days above the median value.

The quartile deviation is an appropriate measure of spread for the median. It identifies the range below and above the median within which 50 per cent of the observations are likely to fall.

The quartile deviation is useful as a measure of dispersion if the sample of observations contains excessive outliers, as it ignores the top 25 per cent and bottom 25 per cent of the ranked observations.

As with the interquartile range, the quartile deviation does not use all the observations and therefore gives no indication of the spread of values between the upper and lower quartiles.

4.6 VARIANCE

The most useful and reliable measures of dispersion are those that:
* take every observation into account, and
* are based on an average deviation from a central value.

The variance is such a measure of dispersion. By having these properties, the variance statistic has become the most commonly used measure of dispersion.
It is a powerful statistic which is used extensively in statistical analysis.

Illustration
Consider the ages (in years) of 7 second hand cars: 13, 7, 10, 15, 12, 18, 9.

Step 1
Find the sample mean: $\bar{x} = \frac{84}{7} = 12$ years.

Step 2
Find the squared deviation of each observation from the sample mean.

As noted from Table 4.1, the sum of the deviations is zero. This will always be true when deviations about the mean are measured and summed. It emphasises the unbiased nature of the arithmetic mean.

Since $\sum (x_i - \bar{x}) = 0$, the deviations must first be squared to avoid plus and minus deviations cancelling each other. These squared deviations are then summed.

[Figure: deviations of each car age from the mean of 12 years, plotted along the scale of values]

TABLE 4.1 Variance calculation for the Car Ages problem

  Car ages x_i   Mean x̄   Deviation (x_i - x̄)   Squared deviation (x_i - x̄)²
  13             12        +1                     1
  7              12        -5                     25
  10             12        -2                     4
  15             12        +3                     9
  12             12        0                      0
  18             12        +6                     36
  9              12        -3                     9
                           Σ(x_i - x̄) = 0         Σ(x_i - x̄)² = 84

Step 3
Determine a measure of average squared deviation.

A measure of average squared deviation can now be found by dividing the total squared deviation by (n - 1). It is called the variance, i.e.:

$$S^2 = \frac{84}{7 - 1} = 14 \text{ years}^2$$

Note: Division by n would appear logical, but the variance statistic would then be a biased measure of dispersion. It can be shown to be unbiased if division is by (n - 1). For large samples, however (i.e. n > …), this distinction becomes less important.

The formula for a variance can now be expressed as:

$$\text{Variance} = \frac{\text{Sum of squared deviations}}{\text{sample size} - 1}$$

Mathematically:

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

The above formula for the variance is computationally cumbersome. By mathematically rearranging this formula, a computationally efficient formula can be produced which is more commonly used in practice.

Computational variance formula for ungrouped data:

$$S^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$$

EXAMPLE 3
The variance for the car ages problem. The computational variance formula is used.
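The car ages figures can be checked numerically. The following minimal Python sketch (not part of the original text) evaluates the definitional formula and the computational formula on the seven car ages, and compares both against `statistics.variance` from the Python standard library, which also divides by (n - 1).

```python
import statistics

ages = [13, 7, 10, 15, 12, 18, 9]   # car ages in years, from the illustration
n = len(ages)
mean = sum(ages) / n

# Definitional formula: sum of squared deviations divided by (n - 1)
var_definitional = sum((x - mean) ** 2 for x in ages) / (n - 1)

# Computational formula: (sum of squares - n * mean^2) divided by (n - 1)
var_computational = (sum(x * x for x in ages) - n * mean ** 2) / (n - 1)

# Library cross-check (sample variance, n - 1 divisor)
var_library = statistics.variance(ages)

print(mean, var_definitional, var_computational, var_library)
```

All three routes should agree on a variance of 14 years², which is the point of the rearrangement: the computational form avoids computing each deviation separately but yields the same statistic.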
  Car ages (in years) x_i   x_i²
  13                        169
  7                         49
  10                        100
  15                        225
  12                        144
  18                        324
  9                         81
  Σx_i = 84                 Σx_i² = 1 092

n = 7 and $\bar{x} = 12$, so

$$S^2 = \frac{1\,092 - (7)(12)^2}{7 - 1} = \frac{84}{6} = 14 \text{ years}^2$$

EXAMPLE 4
The variance for the re-order problem.

Refer to the re-order problem (Chapter 3). From the data given in Example 1 of Chapter 3.3, the following values are derived from the sample of 30 observations:

$\sum x_i = 18 + 26 + 15 + \dots + 26 + 19 + 22 = 555$
$\sum x_i^2 = 18^2 + 26^2 + 15^2 + \dots + 26^2 + 19^2 + 22^2 = 11\,541$

Also $\bar{x} = \frac{555}{30} = 18{,}5$ days.

Using the computational formula for a variance:

$$S^2 = \frac{11\,541 - 30(18{,}5)^2}{30 - 1} = 43{,}914 \text{ days}^2$$

Computation for grouped data

Grouped data is given by a frequency distribution. In these cases, as with the mean, only an approximation of the variance can be found, as the individual observations are not known.

In a similar way to computing the mean from grouped data, the midpoint of each class interval is used as a representative value for all the observations in that class.

Step 1
Find the mean for the sample of observations.

Step 2
Find the squared deviation of the midpoint from the mean for each interval, i.e. $(x_i - \bar{x})^2$.

Step 3
Multiply each squared deviation by its class frequency, i.e. $f_i (x_i - \bar{x})^2$. This gives the total squared deviation of all the observations within a class.

Step 4
Sum the total squared deviations from each class over all classes, i.e. $\sum f_i (x_i - \bar{x})^2$. This will give an approximation to the total squared deviation for all observations within the sample.

Step 5
The variance (average squared deviation) is found by dividing the total squared deviation by (n - 1).

Mathematically:

$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}$$

However, as for ungrouped data, a computationally more efficient variance formula can be developed and used.

Computational variance formula for grouped data:

$$S^2 = \frac{\sum f_i x_i^2 - n\bar{x}^2}{n - 1}$$

EXAMPLE 5
Variance for the re-order problem (computational formula).

Refer to the frequency distribution given in Table 3.1 of Example 2 in Chapter 3.3.
  Re-order period (days)   Midpoint x_i   f_i   f_i x_i   f_i x_i²
  5-<10                    7,5            3     22,5      168,75
  10-<15                   12,5           5     62,5      781,25
  15-<20                   17,5           9     157,5     2 756,25
  20-<25                   22,5           7     157,5     3 543,75
  25-<30                   27,5           6     165       4 537,5
  Total                                   30    565       11 787,5

$\bar{x} = \frac{565}{30} = 18{,}83$ days

$$S^2 = \frac{11\,787{,}5 - 30(18{,}83)^2}{30 - 1} = 39{,}67 \text{ days}^2$$

Compare this approximate measure of variance to the $S^2 = 43{,}914$ obtained from the original observations.

The variance is a measure of average squared deviation about the arithmetic mean. It is expressed in squared units. Consequently, its meaning in a practical sense is obscure. To provide meaning, the dispersion measure should be expressed in the original unit of measure of the random variable.

4.7 STANDARD DEVIATION

A standard deviation is a statistical measure which expresses the average deviation about the mean in the original units of the random variable (i.e. in un-squared units of measure).

The standard deviation is the square root of the variance and is written as $S$ or $S_x$.

Mathematically:

$$S_x = \sqrt{S^2} = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n - 1}}$$

EXAMPLE 6
Standard deviation for the re-order problem.

Refer to Example 4 for the ungrouped data variance.
Variance = 43,914 days²
Then the standard deviation $S_x = \sqrt{43{,}914} = 6{,}627$ days.

Interpretation of the standard deviation

The standard deviation is a relatively stable measure of dispersion across different samples of the same random variable. It is therefore a rather powerful statistic. It describes how the observations are spread about the mean.

If the frequency distribution is approximately symmetrical and bell-shaped, about 68 % of all observations will fall within the range of one standard deviation about the mean, i.e. within the limits of $(\bar{x} - S_x)$ and $(\bar{x} + S_x)$.

The interval within 2 standard deviations about the mean includes about 95,5 % of all observations, i.e. within the limits of $(\bar{x} - 2S_x)$ and $(\bar{x} + 2S_x)$.

This property will be referred to again in the sections dealing with probability distributions (Chapter 6) and sampling distributions (Chapter 7).

4.8 COEFFICIENT OF VARIATION

It is sometimes necessary to compare samples of data from different random variables to establish which sample data shows greater variability.
A direct comparison of their respective standard deviations would be misleading as the random variables may be measured in different units. A meaningful comparison is possible only if the measures of variability are expressed in the same units. This is achieved by producing a measure of relative variability (i.e. relative to their mean) which is expressed in percentage terms.

A statistic which shows this relative dispersion about a mean for a random variable is called the coefficient of variation. It is defined as follows:

$$CV = \frac{S_x}{\bar{x}} \times 100\,\%$$

This ratio describes how large the measure of dispersion is relative to the mean of the observations.

A coefficient of variation value close to 0 indicates low variability and a tight clustering of observations about the mean. Conversely, a large coefficient of variation value indicates that observations are more spread out about their mean value.

EXAMPLE 7
Compute and compare the coefficient of variation between 2 company features.

Assume 2 characteristics (random variables) of a company are being studied, namely turnover and employee age. Determine which characteristic shows greater variability. The following statistics have been computed:

                                  Turnover/month     Employee age
    Mean                          R54 588            38,2 years
    Standard deviation            R8 444             7,9 years
    Coefficient of variation      15,47 %            20,68 %

$$CV_1 = \frac{8\,444 \times 100}{54\,588} = 15{,}47\,\% \qquad CV_2 = \frac{7{,}9 \times 100}{38{,}2} = 20{,}68\,\%$$

Interpretation
The age characteristic shows greater variability than turnover/month.

4.9 MEASURES OF SKEWNESS

Random variables are primarily described by two measures, namely a measure of central location and a measure of dispersion. A third measure, however, can also be used to describe the shape of the distribution of observations. This third measure is referred to as a measure of skewness.

Figure 4.2 Symmetrical distribution
Figure 4.3 Skewed to the right distribution
Three common shapes can generally be observed:
* Symmetrical distributions
* Skewed to the right (positively skewed) distributions
* Skewed to the left (negatively skewed) distributions.

Symmetrical distributions

Symmetrical distributions have a central peak and mirror image slopes on either side of the central value. (See Figure 4.2.)

Mean = Median = Mode

In a symmetrical distribution, all three commonly used measures of central location (i.e. mean, median and mode) will lie at the same position.

Skewness refers to the degree of departure from symmetry.

Skewed to the right (positively skewed) distributions

A distribution which is skewed to the right has a few relatively large values in the dataset. This implies that the distribution has a 'long' tail to the right (in the positive direction). (See Figure 4.3.)

The mean is the measure of central location which is most influenced by the few extremely large values and hence lies the furthest to the right. The median is less influenced, while the mode lies at the peak of the distribution.

An example of a positively skewed random variable is a household's consumption pattern (in days) of perishable groceries. The majority will be eaten fresh (e.g. within say 3 days of purchase); the rest will only be eaten much later (e.g. a week or more later).

Skewed to the left (negatively skewed) distributions

A distribution which is skewed to the left has a few relatively small values in the dataset. This implies that the distribution has a relatively 'long' tail to the left (in the negative direction). (See Figure 4.4.)

Figure 4.4 Skewed to the left distribution

The mean is the measure of central location which is most influenced by the few extremely small values and hence lies the furthest to the left. The median is less influenced, while the mode lies at the peak of the distribution.

An example of a negatively skewed random variable is the lead time (in days) of selling houses. Few are sold quickly; the majority are sold after lengthy periods of time.
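The ordering of the three measures in a skewed sample can be checked on a small dataset. The sketch below uses Python's standard `statistics` module; the values are hypothetical, chosen only to have a long tail to the right (loosely in the spirit of the perishable-groceries example), and are not taken from the text.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample (days until an item is consumed):
# most values are small, a few are much larger.
days = [1, 2, 2, 2, 3, 3, 4, 8, 14]

centre = {
    "mode": mode(days),      # most frequent value, at the peak
    "median": median(days),  # middle value of the ordered data
    "mean": mean(days),      # pulled towards the long right tail
}

# For a right-skewed sample we expect mode < median < mean.
is_right_skewed_order = centre["mode"] < centre["median"] < centre["mean"]
```

Reversing the tail (a few very small values instead of a few very large ones) would be expected to reverse the ordering, with the mean furthest to the left.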
Two descriptive statistics can be used to measure the degree of skewness or symmetry within a distribution of values of a random variable. They are:
* Pearson's coefficient of skewness.
* Bowley's coefficient of skewness.

Pearson's coefficient of skewness

This measures the degree of departure from symmetry based on the difference between the mean and the median, or between the mean and the mode.

It is valid to compute Pearson's coefficient of skewness only for quantitative random variables where the data are interval- or ratio-scaled. The arithmetic mean cannot be computed for categorical data.

$$Sk_p = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$

or, alternatively

$$Sk_p = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

Symmetrical distribution (shown by $Sk_p = 0$)
In a symmetrical distribution, as seen from Figure 4.2, the mean and the median are equal. Hence the numerator in Pearson's formula has a value of zero (mean - median = 0) and consequently Pearson's skewness coefficient is zero.

Skewed to the right distribution (shown by $Sk_p > 0$)
If the distribution is skewed to the right as shown in Figure 4.3, the mean will be greater than the median. Hence (mean - median) will be a positive value, implying that $Sk_p$ will be positive.

The magnitude of the skewness coefficient will depend on the size of the difference between the mean and the median. An excessively skewed distribution will have a mean very much larger than the median, hence the skewness coefficient will be a large positive value. Moderate skewness will be reflected in a skewness coefficient which is a small positive value.

Skewed to the left distribution (shown by $Sk_p < 0$)
In a skewed to the left distribution the mean is smaller than the median. This is shown in Figure 4.4. Hence (mean - median) will be a negative value, implying that $Sk_p$ will be negative.

The magnitude of the skewness coefficient will again depend on the size of the difference between the mean and the median.
Moderate negative skewness will be reflected in a skewness coefficient which is a small negative value (i.e. close to zero).

EXAMPLE 8
Pearson's skewness measure for the re-order problem.

Given:
Mean = 18,5 days
Median = 18,89 days
Standard deviation = 6,627 days

Then Pearson's coefficient of skewness:

$$Sk_p = \frac{3(18{,}5 - 18{,}89)}{6{,}627} = -0{,}1766$$

Interpretation
$Sk_p$ is negative, but close to zero. This implies a slight skewness to the left. In practice, this value is close enough to zero to conclude reasonable symmetry.

The alternative Pearson's formula for skewness is used when only the mean, mode and standard deviation are known.

Bowley's coefficient of skewness

Bowley's coefficient of skewness is based on the quartile deviation and its measure relative to the median. Bowley's skewness coefficient is defined as follows:

$$Sk_b = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1}$$

The coefficient takes on values between -1 and +1 only.

The denominator is the interquartile range (i.e. $Q_3 - Q_1$), while $(Q_3 - Q_2)$ is the distance from the upper quartile to the median, and $(Q_2 - Q_1)$ is the distance from the median to the lower quartile.

If the two measures of deviation in the numerator are equal (i.e. $(Q_3 - Q_2) = (Q_2 - Q_1)$), then Bowley's coefficient of skewness is zero and the distribution is symmetrical.

If the upper quartile distance from the median is greater than the lower quartile distance from the median, then there are a few extremely large values in the dataset. This implies positive skewness. Bowley's coefficient will then be positive. The maximum value will be +1, implying excessive skewness to the right.

Conversely, if the lower quartile distance from the median is greater than the upper quartile distance from the median, then there are a few extremely small values in the dataset. This implies negative skewness. Bowley's coefficient will then be negative. The minimum value will be -1, implying excessive skewness to the left.
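The two skewness coefficients are simple enough to sketch in a few lines of Python. The function names below are illustrative only; the Pearson inputs are the figures quoted in Example 8, while the quartiles passed to the Bowley function are hypothetical values chosen to show a symmetrical and a right-skewed case.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


# Figures quoted in Example 8 for the re-order problem.
sk_p = pearson_skewness(mean=18.5, median=18.89, std_dev=6.627)

# Hypothetical quartiles, for illustration only.
sk_b_symmetric = bowley_skewness(q1=10, q2=20, q3=30)  # median midway
sk_b_right = bowley_skewness(q1=10, q2=12, q3=30)      # median near Q1
```

Because Bowley's numerator can never exceed the interquartile range in absolute value, the second function always returns a value between -1 and +1, whereas Pearson's coefficient has no such built-in bound.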
EXAMPLE 9
Consider the following ordered dataset of the length (in pages) of 9 tourist guide books on Cape Town:

uM 6 FB 30 56 98 106 110

The quartiles marked in the dataset are $Q_1 = 27$, $Q_2 = 30$ and $Q_3 = 98$.

Compute Bowley's skewness coefficient and comment on the result.

$Q_3 - Q_2 = 98 - 30 = 68$
$Q_2 - Q_1 = 30 - 27 = 3$
$Q_3 - Q_1 = 98 - 27 = 71$

$$Sk_b = \frac{68 - 3}{71} = 0{,}9155$$

Interpretation
This result shows excessive skewness to the right (positive).

EXAMPLE 10
Compute Bowley's skewness measure for the re-order problem.

From Chapter 3, sections 3.5 and 3.6, the following quartile measures were computed:

$Q_1$ = 14,5 days
$Q_2$ = 18,89 days
$Q_3$ = 23,93 days

Then Bowley's skewness coefficient is given by:

$$Sk_b = \frac{(23{,}93 - 18{,}89) - (18{,}89 - 14{,}5)}{23{,}93 - 14{,}5} = 0{,}0689$$

Interpretation
This result shows mild skewness to the right (positive). In practice, such a small positive (or negative) value would imply a reasonably symmetrical distribution. This result is consistent with Pearson's skewness coefficient value for the re-order problem in Example 8.

4.10 MEASURES OF PEAKEDNESS (KURTOSIS)

A final descriptive measure to describe the behaviour of a random variable is the degree of peakedness of the distribution. Figure 4.5 illustrates three possible forms of peakedness.

Figure 4.5 Peakedness (kurtosis)

* A highly peaked distribution (i.e. a heavy concentration of observations around the central location) is called leptokurtic (graph A).
* A moderately peaked distribution is called mesokurtic (graph B).
* A flat distribution (i.e. the observations are widely spread about the central location) is called platykurtic (graph C).

The formula to compute peakedness is complicated, so peakedness is usually determined only by inspection of a graph of the distribution (i.e. from the frequency polygon).

SUMMARY

The four descriptive statistical measures which have been developed in chapters 3 and 4 fully describe the behaviour of a random variable.
To summarise, a random variable can be described by its central value, its degree of dispersion (spread) about the central value, its skewness, and finally by its peakedness.

All provide information to assist in deciding whether the central location value can be regarded as a reliable, representative value of all the observations in a sample of data.

A good representative, reliable central measure has a low measure of dispersion (i.e. a small $S_x$, as reflected both in absolute terms and in relative terms through the CV); a symmetrical shape of distribution; and a highly peaked distribution.

These terms, particularly the mean, standard deviation (variance) and the concept of symmetry, are important in further statistical analysis and will be referred to again in later chapters.

EXERCISES

Exercise 1
The following data represents the percentage of family income allocated to groceries for a sample of 50 shoppers.

    Percentage    Frequency
    10-19         6
    20-29         14
    30-39         16
    40-49         11
    50-59         3

Compute the values of the mean and standard deviation.
[IMM - May 1

Exercise 2
The table below gives the monthly rent paid by 150 employees of company X.

    Rent (in Rands)    Number of employees
    100-159            20
    160-219            35
    220-279            :
    280-339            R
    340-399            19
    400-459            1b

(a) Calculate the value of:
    (i) the median monthly rent;
    (ii) the standard deviation of the monthly rent.
[IMM - October 1
(b) Find the lower and upper quartile monthly rents.
(c) Determine the quartile deviation of monthly rent.
(d) Interpret the meanings of the standard deviation and the quartile deviation in terms of the above problem.

Exercise 3
Discuss briefly:
(i) the limitation of the range as a measure of dispersion;
(ii) which measure of dispersion you would use if the mean is used as the measure of central location, and why;
(iii) which measure of dispersion you would use if the median is used as the measure of central location, and why.

Exercise 4
The mean is said to be affected by extremely large or small values (outliers).
Can it also be said that the standard deviation is affected by outliers? Explain.

Exercise 5
Find the mean and standard deviation for the following data which records the duration of 20 telephone hotline calls on the 087 line for advice on "car repairs".

    Duration    Number of calls
    0-<1        7
    1-<2        0
    2-<3        3
    3-<4        1
    4-<5        9

At a cost of R2,60 per minute, what was the average cost of a call, and what was the total cost paid by the 20 telephone callers?

Exercise 6
The following intermediate results were obtained from a frequency distribution:

$\sum f_i = 100$
$\sum f_i x_i = 3\,460$
$\sum f_i x_i^2 = 124\,690$

Find the mean and standard deviation of the x variable.
[CIS - November 1989]

Exercise 7
Consider the following two sets of data, each with 5 observations:

    Set 1    30     40     50     60     70
    Set 2    130    140    150    160    170

Which of the following descriptive measures will be the same for both sets of data? Give reasons for your answers. No calculations are necessary.
(i) The mean, (ii) the variance, (iii) the mode.
[CIS - May 1988]

Exercise 8
76 motorists were asked to record, for the month of May 1992, the amount of money they spent on petrol purchases. The following distribution shows their purchases:

    Petrol purchases (in Rands)    Number of motorists
    0-<50                          4
    50-<100                        11
    100-<150                       8
    150-<200
    200-<250
    250-<300

(a) Find the mean, median and modal amount spent on petrol during May 1992 by the 76 motorists.
(b) Find the standard deviation of petrol purchases.
(c) Determine the degree of skewness in the distribution of petrol purchases using Pearson's coefficient of skewness. Comment on the finding.
(d) Find the interquartile range and quartile deviation.
(e) Use Bowley's skewness measure to find the degree of skewness in the distribution of petrol purchases. Comment on the finding.
Exercise 9
(a) Calculate the mean, mode and standard deviation for the distribution of hourly earnings of women and men as given below:

    Hourly earnings    Number of women    Number of men
    4,70-4,90          6                  5
    4,90-5,10          31                 16
    5,10-5,30          29                 25
    5,30-5,50          19                 30
    5,50-5,70          15                 mt

(b) Find the coefficients of variation for each of the distributions of hourly earnings. Which shows greater variability in hourly earnings? Why?
(c) Find the degree of skewness for each distribution using Pearson's measure. Comment on the findings.

Exercise 10
An inspector visits the fresh fruit market and inspects at random trays of peaches. The number of peaches in each tray is counted and shown as a distribution:

    Number of peaches    Number of trays
    43                   1
    44                   4
    45                   18
    46                   5
    47                   2

Calculate the standard deviation for the number of peaches per tray.
[CIS - May 1

Exercise 11
Give three reasons why the standard deviation is regarded as a better measure of dispersion than the range.
[IAC(SA) - May 1

Exercise 12
12 female clients who purchased a particular brand of perfume gave the following ratings (based on the 5-point Likert scale) to each of two statements concerning brand preference. The Likert rating scale is as follows:
1 - Strongly disagree.
2 - Disagree.
3 - Neutral.
4 - Agree.
5 - Strongly agree.

Response ratings to each statement:

    Statement 1    2  2  3  4  2  3  2  1  3  3  2  4
    Statement 2    3  3  5  2  4  3  3  2  4  4  5  3

(a) Determine the mean response rating per statement.
(b) Find the standard deviation of response ratings per statement.
(c) Compute the coefficient of variation for responses to each statement.
(d) On which statement is there less consensus? Explain.

Exercise 13
The number of days in a year that employees in a certain company were away from work due to illness is given in the following table:

    Sick days    Number of employees
    5-6          67
    7-8          a1
    9-10         67
    11-12        52

Find the mean and variance of days sick.
Exercise 14
Employee bonuses earned by workers at a furniture factory in a recent month (in Rands) were:

wot dae as 58° 6 a 2 6a sit 30 a3 Rs 73 29° 39k” 53 ore Ble

(i) Find the mean and standard deviation of bonuses.
(ii) Find the interquartile range and quartile deviation.
(iii) Find Bowley's skewness measure. Comment on the result.

Exercise 15
A small kiosk selling ice creams has kept a record of sales for November and December 1991. They trade under the name "Cooldow". The sales are summarised in the following frequency distribution:

    Daily sales (R)    Number of days
    60-<70             5
    70-<80             u
    80-<90             Prd
    90-<100            13
    100-<110           7
    110-<120           3

(a) What is the mean and standard deviation of daily sales? Comment.
(b) Use Pearson's skewness measure to determine the skewness of the distribution of ice cream sales.

Exercise 16
A market research bureau asked 100 housewives who purchased Brand X washing powder how many packets they bought during a specified period. The purchases recorded were as follows:

1 is 6 © $$ 9 0 2 @ 0 6 41 4 #W 0 4 W 1B 1 22 2 1 7 6 1 1 & 2% 6b 3 2 3 8 6 b 3 1 1 7 1 101 1 4 1 1 2 30 6 7 5 2 2 B 0 B 3 6 1 3 2 3 1 3 B F mM 4 3 1 1 1 1 1 1 4 11 8 4 4 9 1 6 4 6 1 18 2 mw 6 6 9 4 2 6 1 9

(a) Arrange these data into an appropriate frequency distribution using the classes 0,5-10,5; 10,5-20,5; 20,5-30,5; etc.
(b) Calculate the arithmetic mean, median, mode and standard deviation of the grouped data.
(c) Explain what is meant by "skewness", how it is measured, and the difference between negative and positive skewness.
(d) Compute Pearson's coefficient of skewness and discuss the skewness of the frequency distribution constructed in (a) above.
[IMM - June 1

Exercise 17
Telkom has implemented a system of charging out telephone calls based
on the length of a call. To find out how this new charging out system would affect its telephone bill, a market research company which carries out extensive telephone interviews monitored the duration of 600 calls over a period of 3 days.

The following frequency distribution was compiled:

    Duration (minutes)    Number of calls
    2-4,9                 38
    5-7,9                 122
    8-10,9                186
    11-13,9               134
    14-16,9               98
    17-19,9               22

(i) Find the mean and median of call lengths.
(ii) Find the standard deviation of call lengths.
(iii) Find the interquartile range and quartile deviation.
(iv) Establish the values of both Pearson's and Bowley's skewness measures.
(v) Which set of descriptive measures (mean and standard deviation, or median and interquartile range) would you recommend to the management as the representative measures of call lengths?

Exercise 18
The annual earnings of a company's salesmen at its Johannesburg and Cape Town offices are as follows:

    Earnings       Number of salesmen
    (R1 000s)      Johannesburg    Cape Town
    6-<8           3               2
    8-<10          x               3
    10-<12         13              6
    12-<14         7               8
    14-<16         4               3
    16-<20         4               2
    20-<25         2               6

Compare the salesmen's earnings in Johannesburg and Cape Town by finding the means, medians and quartile deviations.
[CIS - November 1992]

Exercise 19
It has been claimed that for samples of size n = 10, the range should be roughly three times as large as the standard deviation. Check this claim with reference to the following data, representing the daily sales totals (in R1 000s) for 10 days at a large retail outlet.

1B 21 16 24 28 20 2 29 19 23

[IMM - November 1992]

5 Basic Probability Concepts

IN THIS CHAPTER
OBJECTIVES
INTRODUCTION
TYPES OF PROBABILITIES
BASIC PROPERTIES OF A PROBABILITY
BASIC PROBABILITY CONCEPTS
* Intersection of two events (A ∩ B)
* Union of two events (A ∪ B)
* Mutually exclusive events
* Collectively exhaustive events
* Statistically independent events
COMPUTATION OF OBJECTIVE PROBABILITIES
* Marginal probability
* Joint probability
* Conditional probability
PROBABILITY RULES
* Addition rule: non-mutually exclusive events
* Addition rule: mutually exclusive events
* Multiplication rule: statistically dependent events
* Multiplication rule: statistically independent events
COUNTING RULES
SUMMARY
EXERCISES

5.1 OBJECTIVES

After studying this chapter, you should be able to:
* Define the different types of probabilities.
* Describe the properties and concepts of a probability.
* Apply the rules of probability.
* Construct and interpret probabilities from a contingency table (joint probability table).
* Understand counting rules (permutations and combinations).

5.2 INTRODUCTION

Probability theory is fundamental to the area of statistical inference. Inferential statistics deals with generalising the behaviour of random variables from sample findings to the broader population. Probability theory is used to quantify the uncertainties involved in making these generalisations.

Definition
A probability is the chance, or likelihood, of a particular outcome out of a number of possible outcomes occurring for a given event.

5.3 TYPES OF PROBABILITIES

Probabilities are broadly of two types:

Subjective probabilities
Subjective probabilities cannot be statistically verified. They are not used extensively in statistical analysis and will not be considered further.

Objective probabilities
When the probability of an event can be verified (usually through repeated experimentation or empirical observations), it is referred to as an objective probability. It is this type of probability that is used extensively in statistical analysis.

Mathematically, a probability is defined as the ratio of two numbers.
$$P(A) = \frac{r}{n}$$

where
A = event of a specific type (or with specific properties)
r = number of outcomes of event A
n = total number of possible outcomes (also called the sample space)
P(A) = probability of event A occurring.

An event is part of a random experiment, which is defined as the issue under study for which outcomes are yet to be determined.

Objective probabilities are derived either:
* a priori, i.e. when the possible outcomes are known in advance, such as tossing a coin or selecting cards from a deck;
* empirically, i.e. when the values of r and n are not known in advance and have to be observed empirically through data collection; or
* theoretically, i.e. through the use of theoretical distribution functions (mathematical formulae that can be used to compute probabilities for certain event types).

This chapter will focus on computing probabilities empirically, while chapter 6 will show how objective probabilities can be found using theoretical functions.

5.4 BASIC PROPERTIES OF A PROBABILITY

(i) A probability value lies only between 0 and 1, i.e. $0 \le P(A) \le 1$.
(ii) If an event A cannot occur (i.e. an impossible event), then P(A) = 0.
(iii) If an event A is certain to occur, then P(A) = 1.
(iv) The sum of the probabilities of all possible outcomes of a random experiment equals one. For example, if liver, chicken and beef are the only possible choices (outcomes) of dog food flavours, then for a random selection of one tin off a supermarket shelf:
P(liver) + P(chicken) + P(beef) = 1
(v) Complementary probabilities: If P(A) is the probability of event A occurring, then the probability of event A not occurring is defined as:
$$P(\bar{A}) = 1 - P(A)$$
Note: $P(\bar{A})$ is also sometimes written as $P(A')$.

EXAMPLE 1
Consider a random process of drawing cards from a card deck. These probabilities are called a priori probabilities.

(i) Let A = event of selecting a red card.
Then $P(\text{red card}) = \frac{26}{52} = \frac{1}{2}$ (26 possible red cards out of 52 cards).

(ii) Let A = event of selecting a spade.
Then $P(\text{spade}) = \frac{13}{52} = \frac{1}{4}$ (13 possible spades out of a total of 52 cards).

(iii) Let A = event of selecting an ace.
Then $P(\text{ace}) = \frac{4}{52} = \frac{1}{13}$ (4 possible aces out of a total of 52 cards).

(iv) Let A = event of selecting "not an ace".
Then $P(\text{not an ace}) = 1 - P(\text{ace}) = 1 - \frac{1}{13} = \frac{12}{13}$.

5.5 BASIC PROBABILITY CONCEPTS

The concepts will be illustrated using the following example.

EXAMPLE 2
Consider a random experiment of selecting companies from the Johannesburg Stock Exchange.

These probabilities are called empirical probabilities as the data necessary for constructing probabilities has firstly to be gathered from surveys and organised into tabular form.

Values for the random variables company size and industry type are measured and tabulated in Table 5.1 for 150 companies. Table 5.1 is called a contingency table. It summarises into frequency counts the values of the two categorical random variables.

TABLE 5.1 Sample of 150 companies listed on the JSE

                     Company size (in R million turnover)
    Industry type    Small       Medium       Large      Row total
                     (0-<10)     (10-<50)     (>50)
    Mining           0           0            35         35
    Finance          9           21           42         72
    Service          6           3            1          10
    Retail           14          13           6          33
    Column total     29          37           84         150

Intersection of two events (A ∩ B)

The intersection of events A and B is the set of outcomes that belongs to both A and B simultaneously. It is written as (A ∩ B) (i.e. A and B), and the keyword is "and". Figure 5.1 shows the intersection of events graphically using a Venn diagram.

Figure 5.1 Venn diagram of the intersection of two events (sample space = n)

Example
* Let A = event (small company), and
* Let B = event (service company).

Then (A ∩ B) is the set of all small and service companies. From Table 5.1, there are 6 companies which are both small and service out of 150 JSE companies surveyed. This is shown graphically in the Venn diagram in Figure 5.2.

Union of two events (A ∪ B)

The union of events A and B is the set of outcomes that belongs to either A or B or both. It is written as (A ∪ B) (i.e.
A or B or both) and the keyword is "or". Figure 5.3 shows the union of events graphically using a Venn diagram.

Figure 5.3 Venn diagram of the union of two events

Example
* Let A = event (small company), and
* Let B = event (service company).

Then (A ∪ B) is the set of all small or service or both (small and service) companies.

There are 29 small companies (includes 6 service companies), 10 service companies (includes 6 small companies), and 6 small and service companies. Thus there are 33 (29 + 10 - 6) companies which are either small or service or both. In the calculation of the number of companies which are either small or service or both, the intersection event is subtracted once to avoid double-counting. This is shown graphically in the Venn diagram in Figure 5.4.

Figure 5.4 Venn diagram of the union of JSE companies

Mutually exclusive events

Events are mutually exclusive if they cannot occur together on a single trial of a random experiment (i.e. not at the same point in time).

Example
* Let A = event (small company), and
* Let B = event (medium company).

Events A and B are mutually exclusive because a randomly selected company from the JSE sample cannot be both small and medium at the same time.

Example
* Let A = event (small company), and
* Let B = event (finance company).

Events A and B are not mutually exclusive because a randomly selected company from the JSE sample can be a small, financial company at the same time.

Collectively exhaustive events

Events are said to be collectively exhaustive when the union of all events is equal to the sample space. This means that in a single trial of a random experiment, at least one of these events is certain to occur.

Example
* Let A = event (small company)
* Let B = event (medium company)
* Let C = event (large company)

Then (A ∪ B ∪ C) = sample space (i.e.
small, medium, large companies) = all JSE companies

Statistically independent events

Two events are said to be statistically independent if the occurrence of event A has no effect on the outcome of event B occurring and vice versa.

Example
* Let A = event (an employee over 30 years of age)
* Let B = event (a female employee)

If it can be assumed or empirically verified that a randomly selected employee over 30 years of age from a large organisation is equally likely to be either a male or a female employee, then the two events A and B are statistically independent.

A word of caution: The terms "statistically independent" events and "mutually exclusive" events should not be confused. They are two very different concepts. When two events are "mutually exclusive", they are not "statistically independent". They are dependent in the sense that if one event happens, then the other event cannot happen.

In probability terms, the probability of the intersection of two mutually exclusive events is zero, while the probability of the intersection of two independent events is equal to the product of the probabilities of the separate events.

5.6 COMPUTATION OF OBJECTIVE PROBABILITIES

Objective probabilities can be classified into 3 categories. The categories are:
* Marginal probability.
* Joint probability.
* Conditional probability.

The definition and computation of each type is described next.

Marginal probability

A marginal probability is the probability of only a single event A occurring. It is written as:

P(A)

A single event is an event that describes the outcomes of one random variable only. A frequency distribution describes the occurrence of only one characteristic of interest at a time and is used to estimate marginal probabilities.

Example
From Table 5.1, the random variable company size is described by the frequency distribution in the Column Total row.

Extract from Table 5.1:

    Company size        Number of JSE firms
    Small (0-<10)       29
    Medium (10-<50)     37
    Large (>50)         84
                        n = 150

Let A = event (small company).
Then $P(A) = \frac{29}{150} = 0{,}1933$

Example
From Table 5.1, the random variable industry type is described by the frequency distribution in the Row Total column.

Extract from Table 5.1:

    Industry type    Number of JSE firms
    Mining           35
    Finance          72
    Service          10
    Retail           33
                     n = 150

Joint probability

A joint probability is the probability of both event A and event B occurring simultaneously on a given trial of a random experiment.

A joint event describes the behaviour of two or more random variables (i.e. the characteristics of interest) simultaneously. It is written as:

P(A ∩ B)

A joint event was defined earlier as the intersection of two simple events in a Venn diagram.

A contingency table shows the behaviour of two random variables simultaneously and is used to find joint probabilities.

A contingency table gives absolute frequencies (counts). When the absolute frequencies are expressed in relative terms (i.e. as a proportion of the total sample size), the table is called a joint probability distribution (or a joint relative frequency distribution).

Table 5.1 (shown as Table 5.2 below) is a contingency table of the random variables company size and industry type.

TABLE 5.2 Sample of 150 companies listed on the JSE

                     Company size (in R million turnover)
    Industry type    Small       Medium       Large      Row total
                     (0-<10)     (10-<50)     (>50)
    Mining           0           0            35         35
    Finance          9           21           42         72
    Service          6           3            1          10
    Retail           14          13           6          33
    Column total     29          37           84         150

Example
* Let A = event (small company)
* Let B = event (finance company)

There are 9 out of 150 JSE listed companies in the sample which are both small and finance companies.

Then $P(A \cap B) = \frac{9}{150} = 0{,}06$

Conditional probability

A conditional probability is the probability of one event A occurring given information about the occurrence of another event B.
A conditional event describes the behaviour of one random variable given information about another random variable.

A conditional probability is defined as:

$$P(A/B) = \frac{P(A \cap B)}{P(B)}$$

The essential feature of the conditional probability is that the sample space is reduced to the outcomes describing event B (the given prior event) only, and not all possible outcomes as for marginal and joint probabilities.

Example
* Let A = event (large company)
* Let B = event (retail company)

Then P(A/B) is the probability of selecting a company from the JSE sample which is large given that the company is known to be a retail company.

Extract from Table 5.2:

                 Small    Medium    Large    Row total
    Retail       14       13        6        33

There are 6 large companies out of 33 retail companies.

Thus $P(A/B) = \frac{6}{33} = 0{,}1818$

Using the formula:
$P(A \cap B) = \frac{6}{150}$ (a joint probability) and
$P(B) = \frac{33}{150}$ (a marginal probability)

Then $P(A/B) = \frac{6/150}{33/150} = 0{,}1818$

5.7 PROBABILITY RULES

Probability rules have been developed to compute probabilities of more complex related events. There are two basic probability rules, namely:
* Addition rule:
  - For non-mutually exclusive events.
  - For mutually exclusive events.
* Multiplication rule:
  - For statistically dependent events.
  - For statistically independent events.

Addition rule: non-mutually exclusive events

The probability of either event A or event B or both occurring in a single trial of a random experiment (i.e. the union of two non-mutually exclusive events) is defined as:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

In Venn diagram terms, the union of two non-mutually exclusive events is the combined outcomes of the two overlapping events A, B. (See Figure 5.5.)

Example
* Let A = event (small company), and
* Let B = event (service company)

Events A and B are not mutually exclusive as they can occur simultaneously. From Table 5.2, compute the following:

$P(A) = P(\text{small company}) = \frac{29}{150}$
$P(B) = P(\text{service company}) = \frac{10}{150}$ and
$P(A \cap B) = P(\text{small and service company}) = \frac{6}{150}$
Then P(A ∪ B) = P(either small or service or both)
= 29/150 + 10/150 - 3/150
= 36/150
= 0,24
This is the probability that a company selected at random from the JSE will be either a small or a service company or both.

Addition rule: mutually exclusive events
The probability of either event A or event B occurring in a single trial of a random experiment (i.e. the union of two mutually exclusive events) is defined as:

P(A ∪ B) = P(A) + P(B)

For mutually exclusive events, there is no intersectional event. Thus P(A ∩ B) = 0.
In Venn diagram terms, the union of two mutually exclusive events is the combined outcomes of the two separate events, A, B. (See Figure 5.6.)

Figure 5.6 Venn diagram for the addition rule (mutually exclusive events)

Example
Let A = event (small company), and
Let B = event (large company)
Events A and B are mutually exclusive as both events cannot occur simultaneously in a single JSE company.
From Table 5.2, derive:
P(A) = P(small company) = 29/150
P(B) = P(large company) = 84/150
P(A ∪ B) = P(either small or large company)
= 29/150 + 84/150
= 113/150
= 0,7533
This is the probability that a company selected at random from the JSE will be either a small or a large company.

Multiplication rule
The multiplication rule may be used to find the joint probability of event A and event B occurring on a single trial of a random experiment (i.e. the intersection of the two events). By rearranging the conditional probability formula, the multiplication rule is defined as:

P(A ∩ B) = P(A/B) × P(B)

Where P(A ∩ B) = joint probability of A and B
      P(A/B) = conditional probability of A given B
      P(B) = marginal probability of B

Example
Find the probability of selecting a medium, finance company from the JSE sample. Refer to Table 5.2 for the absolute frequencies.
Let A = event (medium company)
Let B = event (finance company)
Intuitively from Table 5.2, P(A ∩ B) = 21/150 = 0,14
Applying the multiplication rule formula:
P(B) = P(finance company) = 72/150
P(A/B) = P(medium / finance company) = 21/72
Then P(A ∩ B) = P(A/B) × P(B)
= 21/72 × 72/150
= 21/150
= 0,14

Multiplication rule: statistically independent events
If two events are statistically independent, then the multiplication rule reduces to:

P(A ∩ B) = P(A) × P(B)

i.e. the product of the two marginal probabilities.
Refer to section 5.5 where the concept of statistically independent events was discussed.

Test for statistical independence of events
Two events A and B are statistically independent if the following relationship can be shown to be true:

P(A/B) = P(A)

This means that if the marginal probability of event A equals the conditional probability of event A given that event B has occurred, then the two events are statistically independent.
This implies that the prior occurrence of event B in no way influences the outcome of the (single) event A.
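The addition rule, the multiplication rule and the independence test can be checked with exact fractions. The sketch below is illustrative only: it assumes the JSE sample counts used in this section (29 small, 37 medium and 84 large companies; 72 finance companies, of which 21 are medium; n = 150), and the helper name `union` is hypothetical.

```python
from fractions import Fraction

n = 150  # sample size assumed from the JSE examples

p_small = Fraction(29, n)
p_medium = Fraction(37, n)
p_large = Fraction(84, n)
p_finance = Fraction(72, n)
p_medium_given_finance = Fraction(21, 72)  # 21 of the 72 finance companies


def union(p_a, p_b, p_a_and_b=Fraction(0)):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events the intersection term is zero.
    """
    return p_a + p_b - p_a_and_b


# Addition rule, mutually exclusive events (small or large).
p_small_or_large = union(p_small, p_large)

# Multiplication rule: P(A and B) = P(A/B) x P(B).
p_medium_and_finance = p_medium_given_finance * p_finance

# Independence test: A and B are independent only if P(A/B) = P(A).
independent = p_medium_given_finance == p_medium

print(f"P(small or large)     = {float(p_small_or_large):.4f}")
print(f"P(medium and finance) = {float(p_medium_and_finance):.4f}")
print(f"P(medium / finance)   = {float(p_medium_given_finance):.4f}")
print(f"P(medium)             = {float(p_medium):.4f}")
print(f"independent?            {independent}")
```

Because the conditional and marginal probabilities differ, the sketch reports the two events as statistically dependent.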
Example
Determine if company size is statistically independent of industry type in the JSE sample.
If it can be shown that P(A/B) = P(A), then the two events are statistically independent. Two outcomes are arbitrarily chosen, one from each event.
Let A = event (medium company)
Let B = event (finance company)
Then P(A/B) = 21/72 = 0,2917
and P(A) = 37/150 = 0,2467
Since the two probabilities are not equal, the two events are statistically dependent (i.e. company size and industry type are related).

COUNTING RULES

Probability computations involve counting the number of successful outcomes (r) and the total number of possible outcomes (n) and expressing them as a ratio. Often the values for r and n are not feasible to count because of the large number of possible outcomes involved. Counting rules assist in finding values for r and n.
There are three basic counting rules:
* Multiplication rule.
* Permutations.
* Combinations.

The multiplication rule
The multiplication rule is applied in 2 ways:
(a) The total number of ways in which n objects can be arranged (ordered) is given by:

n! = n factorial = n(n - 1)(n - 2)(n - 3) ... 3.2.1
Note: 0! = 1

Example
The number of different ways in which 7 horses can complete a race is given by:
7! = 7.6.5.4.3.2.1 = 5 040 different arrangements.

(b) If a particular random process has
- n1 possible outcomes on the first trial
- n2 possible outcomes on the second trial, etc.
- nj possible outcomes on the jth trial,
then the total number of outcomes for the j trials is:

n1 × n2 × n3 × ... × nj

Example
A restaurant menu has a choice of 4 starters, 10 main courses and 6 desserts. Then the total number of possible meals that can be ordered is:
4 × 10 × 6 = 240 meals.

Permutations
A permutation is the number of distinct ways in which a group of objects can be arranged. Each possible arrangement (ordering) is called a permutation.
The number of ways of arranging r objects selected from n objects, where ordering is important, is given by the formula:

nPr = n! / (n - r)!

Where n! = n factorial = n(n - 1)(n - 2)(n - 3) ... 3.2.1
      r = number of objects selected at a time
      n = total number of objects.

Example
10 horses compete in a race.
(i) How many distinct arrangements are there of the first 3 horses past the post?
(ii) What is the probability of predicting the order of the first 3 horses past the post?

Solution:
(i) Since the order of 3 horses is important, it is appropriate to use the permutation formula.
Given n = 10 horses
      r = 3 horses
Then 10P3 = 10! / (10 - 3)! = 10! / 7! = 720
There are 720 distinct ways of selecting 3 horses out of 10 horses.
(ii) The probability of selecting the first 3 horses past the post is:
P(first 3 horses) = 1 / (number of ways of selecting 3 out of 10 horses) = 1/720 chance of winning.

Combinations
A combination is the number of different ways of arranging a subset of objects selected from a group of objects where the ordering is not important. Each possible arrangement is called a combination.
The number of ways of arranging r objects selected from n objects, not considering order, is given by the formula:

nCr = n! / (r! (n - r)!)

Where n! = n factorial = n(n - 1)(n - 2)(n - 3) ... 3.2.1
      r! = r factorial = r(r - 1)(r - 2)(r - 3) ... 3.2.1
      r = number of objects selected
      n = total number of objects.

Example
10 horses compete in a race.
(i) How many arrangements are there of the first 3 horses past the post, not considering the order in which the first three pass the post?
(ii) What is the probability of predicting the first 3 horses past the post, in any order?

Solution:
(i) The order of the first 3 horses is not important, hence apply the combination formula.
Given n = 10 horses
      r = 3 horses
Then 10C3 = 10! / (3! (10 - 3)!) = 10! / (3! 7!)
= (10.9.8.7.6.5.4.3.2.1) / ((3.2.1)(7.6.5.4.3.2.1))
= 120
There are 120 different ways of selecting 3 horses out of 10 horses, without regard to order.
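The counting rules can be verified with Python's standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later). This is a sketch for checking the arithmetic of the horse race and menu examples; the helper functions `n_p_r` and `n_c_r` are hypothetical names that simply spell out the factorial formulas.

```python
import math


def n_p_r(n, r):
    """Permutations: n! / (n - r)!, ordering is important."""
    return math.factorial(n) // math.factorial(n - r)


def n_c_r(n, r):
    """Combinations: n! / (r! (n - r)!), ordering is not important."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# Multiplication rule (a): orderings of 7 horses.
orderings_of_7 = math.factorial(7)

# Multiplication rule (b): 4 starters x 10 main courses x 6 desserts.
meals = 4 * 10 * 6

# First 3 of 10 horses, with and without regard to order.
ordered_first_3 = n_p_r(10, 3)
unordered_first_3 = n_c_r(10, 3)

# The hand-written formulas agree with the library functions.
assert ordered_first_3 == math.perm(10, 3)
assert unordered_first_3 == math.comb(10, 3)

print(orderings_of_7, meals, ordered_first_3, unordered_first_3)
print(f"P(correct order) = 1/{ordered_first_3}")
print(f"P(any order)     = 1/{unordered_first_3}")
```

Each combination of 3 horses corresponds to 3! = 6 orderings, which is why the permutation count is six times the combination count.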
(ii) The probability of selecting the first 3 horses past the post, disregarding order, is given by:
P(first 3 horses) = 1 / (number of ways of selecting 3 out of 10 horses) = 1/120 chance of winning.

5.9 SUMMARY

This chapter laid the foundation for inferential statistics, which is covered in later chapters. The term probability was defined as a measure of the uncertainty associated with the outcome of a specific event. Properties of probabilities were defined, as were concepts of probabilities such as the union and intersection of events, mutually exclusive events, collectively exhaustive sets of events and statistically independent events.
The basic probability types, namely marginal, joint and conditional probabilities, were computed from contingency tables. Probability computations for more complex, related events were also derived through the use of the addition rule and the multiplication rule. Finally, counting rules such as permutations and combinations were introduced as a way of finding the number of outcomes associated with specific events.
The next chapter will consider probability computations for random variables which follow certain defined theoretical patterns called probability distributions.

EXERCISES

Exercise 1
The Personnel department of an Insurance firm analysed the qualifications profile of their 129 managers. The qualification level is the highest qualification achieved by a manager.

[Table: qualification level (Matric, Diploma, Degree, Total) by managerial level (Section head, Department head, Division head, Total); cell frequencies not legible]

(a) Define the two random variables, their measurement scale and data type.
(b) Complete the above contingency table.
(c) Produce the joint probability table.
(d) What is the probability of a person selected at random:
    (i) having only a matric?
    (ii) being a section head and having a degree?
    (iii) being a department head given that they have a diploma?
    (iv) being a division head?
    (v) being a division head or a section head?
    (vi) having matric, or a diploma, or a degree?
    (vii) having matric given that the person is a department head?
    (viii) being a division head or having a diploma?
(e) For each probability computed in (d)(i) - (viii), state:
    (i) the type of probability (i.e. marginal; joint; conditional)
    (ii) which probability rule, if any, was applied (i.e. addition rule; multiplication rule).
(f) Are the events in (v) and (viii) mutually exclusive?

Exercise 2
The incomplete relative frequency table for events X1, X2, X3, X4 and Y1, Y2 and Y3 is given below:

         X1      X2      X3      X4      Total
Y1      0,07    0,03    0,12    0,03    0,25
Y2      0,08    0,07    0,10    0,02    0,27
Y3      0,08    0,12    0,18    0,10    0,48

(i) Find P(X…)
(ii) Find P(Y…)
(iii) Find P(Y… and X…)
(iv) Find P(Y1 or Y2 or Y3)
(v) Find P(Y… or Y…)
[CIS - November 1990]

Exercise 3
If the probability is 0,2 that an event will occur at any attempt, independently, indicate which of the following statements are correct (give reasons for your answer):
(i) The event will occur exactly once in every sequence of five attempts.
(ii) The event will occur on every fifth attempt.
(iii) The event is certain to occur within 20 attempts.
(iv) We cannot predict with certainty on which attempt the event will first occur.
[CIS - May 1990]

Exercise 4
The following bivariate probability distribution refers to two characteristics: age and traffic offences of residents in Bloemfontein.

                        Offences over last 12 months
Age                     None (F1)   One (F2)   Two or more (F3)
E1 - Under 18             0,23        0,12         0,05
E2 - 18 or older          0,45        0,14         0,01

(i) What is the probability that a randomly selected resident had … traffic offences in the last 12 months, given that he/she is 18 or older?
(ii) What is the probability that a randomly selected resident had … or more traffic offences in the last 12 months?
(iii) What is P(E… ∪ F…) equal to?
(iv) If event A is a randomly selected resident under 18 with less than two offences, find P(A').
[CIS - May 1…]

Exercise 5
The following table shows the 2 500 employees of a large corporation classified by sex and by opinion on a proposal to emphasise fringe benefits rather than wage increases in pending wage discussions:

                  Opinion
                  Favour (F1)   Neutral (F2)   Opposed (F3)   Total
E1 Male              900           150            450         1 500
E2 Female            300           100            600         1 000
Total              1 200           250          1 050         2 500

(i) Given that an employee chosen at random is a male, find the probability that the employee is opposed to the proposal.
(ii) Find the joint probability that an employee picked at random is a female opposed to the proposal.
[CIS - May 1990]

Exercise 6
(a) A wine dealer has classified the last 200 customers according to the criteria given in the following table:

[Table: age of customer by type of wine bought (South African, French, German, Other, Total); cell frequencies not legible]

Find the following probabilities:
(i) P(…

…

[Table: employees by age (under 25, 25 to 40, over 40) and type of work; cell frequencies not legible]

(a) An employee is selected at random from this population. Calculate the probability that the employee is:
    (i) under 25 years of age
    (ii) a production worker
    (iii) a sales person and between 25 and 40 years of age
    (iv) over 40, given that he/she is an office worker
    (v) a production worker or under 25, or both.
(b) Find the value of:
    (i) …
    (ii) 5P5
(c) The following table shows 1 000 adult shoppers cross-classified on the basis of lifestyle (L) and the shop (S) from which they make most of their clothing purchases:

Lifestyle    Shop bought from                       Total
             S1      S2      S3      S4
L1           110     24      16      55      398
L2           152     170     2       …       356
L3           52      36      16      102     206
Total        314     420     …       212     1 000

Calculate the following probabilities:
    (i) P(L… ∩ S…)
    (ii) P(S…/L…).
Using probability notation, what is P(L…) × P(S…/L…) equivalent to?
[CIS - November 1988]

Exercise 9
A market research company was commissioned by a local car manufacturer to research the demographic profile (age and income per month) of the owners of their latest model car launched in March this year. A sample of 200 owners were interviewed. The research results are given as follows:

[Table: number of owners by age group and monthly income (R): < 3 000; 3 000 -< 6 000; 6 000 -< 9 000; > 9 000, repeated for each of three age groups; frequencies not legible]

(a) Present this data in a contingency table.
(b) If an owner is selected at random, what is the probability that:
    (i) the owner will be between 30 and 40 years old?
    (ii) the owner will be between 20 and 30 years old and have a monthly income less than R3 000?
    (iii) the owner will be older than 40 years of age or earn more than R9 000 per month?
    (iv) the owner will earn between R6 000 and R9 000 per month and be less than 40 years old?
    (v) the owner will earn between R6 000 and R9 000 per month given that this person is under 30 years of age?
    (vi) the owner is under 30 years of age given that this person is earning more than R6 000 per month?

Exercise 10
Explain the meaning of the following terms used in the theory of probability and give an example of each:
(i) a priori
(ii) joint events
(iii) independent events
(iv) mutually exclusive events.
[CIS - May 1991]

Exercise 11
In a group of people on holiday it is established that there are:
- 10 males under the age of 21
- 8 females under the age of 21
- 6 males aged between 21 and 30
- 5 females aged between 21 and 30
- 7 males over the age of 30.
(a) Calculate the probability that, if one person is selected at random from the group, this person will be:
    (i) a male under the age of 30
    (ii) a female
    (iii) a female over the age of 30
    (iv) a male over the age of 21
    (v) a person not older than 21.
[CIS - November 19…]

P(r = 1) = 8C1 (0,05)^1 (0,95)^7 = 0,2793
P(r = 2) = 8C2 (0,05)^2 (0,95)^6 = 0,0515
Then P(r ≤ 2) = 0,6634 + 0,2793 + 0,0515 = 0,9942

Interpretation
The probability is 0,9942 (almost complete certainty) that no more than 2 out of 8 randomly selected transactions will have been incorrectly processed.

Question 3
Find the probability that at least 2 out of 8 randomly selected transactions have been incorrectly processed.

Solution 3
Expressed mathematically, find
P(r ≥ 2) = P(r = 2) + P(r = 3) + P(r = 4) + ... + P(r = 8)
To avoid onerous calculations, the complementary rule of probability can be used.
Thus P(r ≥ 2) = 1 - P(r ≤ 1)
= 1 - [P(r = 0) + P(r = 1)]
= 1 - [0,6634 + 0,2793] (from question 2)
= 0,0573

Interpretation
The probability is only 0,0573 (almost a 6 % chance) that at least 2 incorrect transactions will be found from a random sample of 8 transactions. It is highly unlikely.

It should be noted that:
(i) Keywords such as at least, no more than, at most and no less than always imply the summing of individual probabilities.
(ii) The complementary rule should be considered wherever possible to reduce the amount of calculations.

Descriptive measures
A measure of central location and a measure of dispersion …

THE POISSON PROBABILITY DISTRIBUTION

A Poisson process is a discrete process. It measures the number of occurrences of a particular outcome of a discrete random variable in a predetermined time, space or volume interval, for which an average number of occurrences of the outcome can be determined.

Examples
* The number of cars arriving at a parking garage in a one-hour time interval.
* The number of telephone calls received in a ten-minute interval.
* The number of defective screws per consignment.
* The number of typing errors per page.
* The number of particles of a given chemical in a litre of water.

The Poisson question
What is the probability of x occurrences of a given outcome being observed in a predetermined time, space or volume interval?
The Poisson probability distribution function, which is defined as follows, will derive probabilities for the above question:

P(x) = e^(-a) × a^x / x!

Where a = the mean number of occurrences per predetermined time, space or volume interval
      e = a mathematical constant approximately equal to 2,71828
      x = number of occurrences for which a probability is required, i.e. x can be either 0, 1, 2, 3, 4, ... and is called the domain.

The shape of the Poisson distribution is shown in Figure 6.1.

Figure 6.1 Illustration of a Poisson distribution with a = 5 (horizontal axis: x, number of occurrences, 0 to 10)

In the case of Poisson processes, the number of occurrences of a given outcome for a discrete random variable can be any integer value from zero to infinity. There is no theoretical upper limit.
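The Poisson probability function can be evaluated directly from its definition. The following is a minimal sketch, not a library routine: the function names `poisson_pmf` and `poisson_cdf` are hypothetical, and the mean rate a = 5 occurrences per interval is chosen purely for illustration.

```python
import math


def poisson_pmf(x, a):
    """P(x) = e^(-a) * a^x / x!  for x = 0, 1, 2, ..."""
    return math.exp(-a) * a ** x / math.factorial(x)


def poisson_cdf(x, a):
    """P(at most x occurrences): sum of P(0), P(1), ..., P(x)."""
    return sum(poisson_pmf(k, a) for k in range(x + 1))


a = 5  # assumed mean number of occurrences per interval

p_exactly_3 = poisson_pmf(3, a)
p_at_most_2 = poisson_cdf(2, a)

# "More than 4" has no upper limit, so use the complementary rule.
p_more_than_4 = 1 - poisson_cdf(4, a)

# Observing over two intervals doubles the mean rate.
p_at_most_1_in_two_intervals = poisson_cdf(1, 2 * a)

print(f"P(x = 3)               = {p_exactly_3:.4f}")
print(f"P(x <= 2)              = {p_at_most_2:.4f}")
print(f"P(x > 4)               = {p_more_than_4:.4f}")
print(f"P(x <= 1), a = {2 * a}     = {p_at_most_1_in_two_intervals:.6f}")
```

The last line illustrates why the mean rate must always be restated for the interval actually being observed before any probabilities are computed.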
EXAMPLE 3
The textile spinning machine problem
A textile producer has established that a spinning machine stops randomly due to thread breakages at an average rate of 5 stoppages per hour.

Question 1
What is the probability that in a given hour, 3 stoppages will occur on this spinning machine?

The random variable, machine stoppages, "fits" the Poisson process for the following reasons:
(i) The random variable is discrete. It measures the number of machine stoppages per hour. Any integer value from zero to infinity can theoretically occur in any given hour.
(ii) The problem is a Poisson process as it describes the number of occurrences (stoppages) in a predetermined time interval (one hour).
(iii) The average number of stoppages, a, can be determined (or is given). In this instance, a = 5.

Solution 1
Given: a = average number of stoppages per hour = 5 per hour
Required to find P(x = 3 stoppages per hour):
P(x = 3) = e^(-5) × 5^3 / 3!
= 0,006738 × 20,833
= 0,1404

Interpretation
There is only a 14,04 % chance that exactly 3 breakdowns of this spinning machine will occur in any given hour when the average stoppage rate is 5 per hour.

Question 2
Find the probability that at most 2 stoppages will occur in any given hour.

Solution 2
Expressed mathematically, at most 2 stoppages implies either 0, or 1, or 2 stoppages in a given hour.
These outcomes are mutually exclusive and the combined probability can be found using the addition rule of probability, i.e.
P(x ≤ 2) = P(x = 0) + P(x = 1) + P(x = 2)

P(0 stoppages):
P(x = 0) = e^(-5) × 5^0 / 0! = 0,006738 (1) = 0,00674
P(1 stoppage):
P(x = 1) = e^(-5) × 5^1 / 1! = 0,006738 (5) = 0,0337
P(2 stoppages):
P(x = 2) = e^(-5) × 5^2 / 2! = 0,006738 (12,5) = 0,0842

Then P(x ≤ 2) = 0,00674 + 0,0337 + 0,0842 = 0,12464

Interpretation
There is only a 12,46 % chance that at most 2 stoppages will occur in a given one hour period.

Question 3
Find the probability that more than 4 stoppages will occur in a given hour.

Solution 3
Expressed mathematically, find P(x > 4).
Since x is a discrete random variable, the first integer value of x above 4 is x = 5. Hence the problem becomes one of finding:
P(x ≥ 5) = P(x = 5) + P(x = 6) + P(x = 7) + ...
The values of x for which probabilities must be found continue to infinity. To solve this problem, the complementary rule of probability must be used (i.e. the complement of x ≥ 5 is all x ≤ 4):
i.e. P(x ≥ 5) = 1 - P(x ≤ 4)
= 1 - [P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4)]
= 1 - [0,0067 + 0,0337 + 0,0842 + 0,1404 + 0,1755]
= 1 - 0,4405
= 0,5595

Interpretation
There is a 55,95 % chance that more than 4 stoppages will occur in any given one hour period.

Question 4
What is the probability that no more than 1 stoppage will occur in a given two-hour interval?

Solution 4
Notice that the time interval over which the breakdowns are observed has changed from one hour to two hours.
Now the new value for a must relate to the average number of breakdowns per two-hour interval.
Thus a is 10 per two hours (5 per hour × 2 hours).
Mathematically, using a = 10:
Find P(x ≤ 1) = P(x = 0) + P(x = 1)

P(0 stoppages):
P(x = 0) = e^(-10) × 10^0 / 0! = 0,0000454 (1) = 0,0000454
P(1 stoppage):
P(x = 1) = e^(-10) × 10^1 / 1! = 0,0000454 (10) = 0,000454

Then P(x ≤ 1) = 0,0000454 + 0,000454 = 0,000499

Interpretation
There is a 0,0499 % chance (less than 1/20 %) that no more than 1 stoppage will occur in a given two hour period. It is extremely unlikely.

A word of caution:
As a general rule, always check that the time, space or volume interval over which occurrences of the random variable are observed is the same as the time, space or volume interval corresponding to the average rate of occurrences, a.
When they differ, adjust the average rate of occurrences to coincide with the observed interval.

Descriptive measures of the Poisson distribution
A measure of central location and a measure of dispersion can be computed for a random variable which follows a Poisson process using the following formulae:
Mean = a
Standard deviation = √a

CONTINUOUS PROBABILITY DISTRIBUTIONS

A continuous random variable can take on any value (as opposed to only discrete values) in an interval.
Because there are an infinite number of possible outcomes associated with a continuous random variable, probabilities are only found for ranges of x values rather than individual x values as is the case in discrete probability distributions.
Continuous probability distribution functions are used to find probabilities associated with intervals of x values.
The likelihood of outcomes of a large majority of continuous random variables can be described by the normal probability distribution function.

THE NORMAL PROBABILITY DISTRIBUTION

A normal probability distribution function finds the probabilities for a continuous random variable and has the following characteristics:
* It is bell shaped.
* It is symmetrical about a central value, μ.
* The tails of the distribution never touch the x axis (i.e. asymptotic).
* A normally distributed random variable is described by two parameters, namely a mean (μ) and a standard deviation (σ).
* The area under the curve equals one. This corresponds to the complete sample space of a random experiment as it represents the sum of probabilities associated with all possible values of x of the given random variable. Due to symmetry, the area above μ is 0,5.
* The probability associated with a particular range of x values is described by the area under the curve between the limits of the x range (e.g. x1 < x < x2).

…

(iii) P(z > 1,82) = ?
The total area to the right of the midpoint at z = 0 is 0,5. The z table only provides areas between 0 < z < k. The area above k is found by subtracting the area between 0 < z < 1,82 from 0,5.
P(z > 1,82) = 0,5000 - 0,4656 = 0,0344

(iv) P(-2,1 < z < 1,32) = ?

…

P(z > k) = 0,8051
The area above k is greater than 0,5000. Hence k must lie below z = 0 and will again have a negative sign. Recall that the area under half the normal probability distribution is 0,5000.
The area which must be scanned for is not 0,8051, but 0,8051 - 0,5000 = 0,3051, as the z table provides areas for only half the z distribution.
The area of 0,3051 corresponds to a z value of 0,86.
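The z table lookups in this section can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the printed table; the variable names are illustrative, and small differences from table values are due to the table's rounding to two decimals of z.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

# Area in the upper tail above z = 1,82.
p_above_1_82 = 1 - z.cdf(1.82)

# Area between two z limits: cumulative area up to the upper limit
# minus cumulative area up to the lower limit.
p_between = z.cdf(1.32) - z.cdf(-2.1)

# Inverse problem: find k such that P(z > k) = 0,8051.
# The area below k is then 1 - 0,8051, and inv_cdf works on the area below.
k = z.inv_cdf(1 - 0.8051)

print(f"P(z > 1.82)         = {p_above_1_82:.4f}")
print(f"P(-2.1 < z < 1.32)  = {p_between:.4f}")
print(f"k with P(z > k) = 0.8051: {k:.2f}")
```

Because `cdf` returns the cumulative area from minus infinity, no adjustment for "half the distribution" is needed, unlike with a table that lists only areas between 0 and k.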
Since z is negative, k = -0,86.
Thus P(z > -0,86) = 0,8051.

This method of finding z values from given probabilities can be extended to find corresponding x values. This requires the use of the transformation formula to convert z values into associated x values:

z = (x - μx) / σx

EXAMPLE
The telephone installation problem
What should the time limit for installing a new telephone be such that no more than 10 per cent of installations will exceed this limit?
Recall from Example 5 that μx = 45 and σx = …

Solution
Step 1
The question translates into finding initially a z value (= k) corresponding to an area of 0,10 above this value of z.
From z tables, find the z value corresponding to an area of (0,5000 - 0,1000) = 0,4000. (As only areas between [0 < z < k] are given by the z table.)
The closest z value is 1,28.
Thus the area between [0 < z < 1,28] …

…

… calculate the probability that she:
(i) will receive exactly … calls in a given minute
(ii) will receive more than … in a given minute.
[IMM - October 1990]

Exercise …
A marketing manager makes the statement that the long-run probability that a customer would prefer the deluxe model to the standard model is 30 per cent. What is the probability that exactly … in a random sample of …
What is the probability that a typical family will purchase at least four tubes of toothpaste in any given month?
[IMM - October 1989]

Exercise 15
The average number of claims per hour made to a certain insurance company is 1,2. What is the probability that in any given hour either two or three claims will be received?
[IMM - May 1989]

Exercise 16
In a certain company there is a probability of 0,4 that an account has a credit balance among the account statements that are generated at the end of the month. If three accounts are sampled randomly, indicate which of the following expressions will give the probability that two of the three accounts have a credit balance. Give reasons for your answer.
(i) …
(ii) …
(iii) …
(iv) What is the probability that none of the sampled accounts has a credit balance?
(v) What is the probability that at least one of the sampled accounts has a credit balance?
[CIS - May 1990]

Exercise 17
There is an 80 % chance that a trainee in a company training programme will complete the programme successfully. What is the probability that in a group of 4 trainees chosen at random:
(i) all 4 will complete the programme successfully?
(ii) at most one trainee will be unsuccessful on the programme?
[CIS - May 1990]

Exercise 18
In a particular industrial process 15 % of all items are sub-standard. Twenty items are randomly sampled from the production line and tested. Find the probability that at most one item is defective.
[CIS - November 1…]

Exercise 19
It takes on average 70 minutes, with a standard deviation of 10 minutes, to assemble a particular microcomputer. Assume that assembly time is normally distributed.
(a) What is the probability that a given microcomputer will:
    (i) take between 70 and 80 minutes to be assembled?
    (ii) be assembled within 62 minutes?
    (iii) take between 56 and 72 minutes to be assembled?
(b) Within what time limit can 35 % of microcomputers be expected to be assembled?
Exercise 20
Given that x follows a normal distribution with a mean (μ) of 64 and a standard deviation (σ) of 0,5, find:
(i) P(x < 63)
(ii) P(x > 63,7)
(iii) P(62,9 < x < …)
(iv) P(x … ?) = 0,1026
(v) P(x > ?) = 0,9772
[CIS - November 1…]

Exercise 21
The number of seconds of continuous spray yielded by cans of a brand of (ozone friendly) deodorant spray is normally distributed with a mean of 260 seconds and a standard deviation of 15 seconds.
(i) What is the probability that a can selected at random will yield a continuous spray of duration between 245 and 275 seconds?
(ii) The probability is 0,0968 that a randomly selected can of this brand will yield a continuous spray equal to or less than what number of seconds?
[CIS - May 1…]

Exercise 22
Find the following probabilities for the standard normal distribution:
(i) P(z > 1,5)
(ii) P(z < -0,68)
(iii) P(0 < z < 1,5)
[CIS - November …]

Exercise 23
Determine the missing values for the following probabilities for the standard normal distribution, z:
(i) P(…

…

(b) What is the minimum age of the oldest 20 % of passengers?
[CIS - May 1…]

Exercise 32
Certain tablets are packed 12 per box. If 5 % of the tablets manufactured are chipped, what is the probability that a randomly selected box will:
(i) be free of chipped tablets?
(ii) have not more than one chipped tablet?
[IAC(SA) - May …]

Exercise 33
(a) Five questions are included in a true/false test requiring four correct answers to pass. If you guess each answer, what is the probability that you pass?
(b) If the true/false test is replaced by a multiple choice test containing four alternatives, what is the probability that you pass?
[IAC(SA) - May …]

Exercise 34
If printers experience an error rate of 0,075 errors per page, what is the probability of finding 12 errors in a 200 page book?
[IAC(SA) - May …]

Exercise 35
A market survey in a certain city showed that 70 % of the shoppers purchased their TV sets in departmental stores.
If a random sample of 15 shoppers is questioned, what is the probability that three or fewer of those questioned purchased their sets from stores other than department stores?
[IMM - October 1991]

Exercise 36
It has been observed that a proportion of 0,10 of the recipients of a mail order offer has responded to it. If a sample of 20 individuals is sent the offer, determine the probability that at least two individuals respond to the offer.
[IMM - June 1992]

Exercise 37
Customers are known to arrive at a muffler shop on a random basis, with an average of two customers per hour arriving at the facility.
(a) What is the probability that more than 3 customers will require service during a particular hour?
(b) What is the probability that fewer than 4 customers will require service during a 4 hour period in the morning on a particular day?
[IMM - June 1992]

Exercise 38
A car dealer estimates that they sell 2 cars on average per week. They only have space in their showroom for 4 cars. If stock replacement cannot take place within a week, what is the probability, assuming weekly car sales follow a Poisson process, that the dealer will run out of stock in any one week?

Exercise 39
An aptitude test indicates that 15 % of the population have negotiating skills. What is the probability that in a randomly chosen group of 5, at least 2 will be found to have negotiating skills?
[CIS - November 1992]

Exercise 40
The mean mass of 500 kudu at a private game park is 151 kg and the standard deviation is 15 kg. Assuming that the masses are normally distributed, calculate how many kudu have a mass:
(i) between 120 and 155 kg
(ii) more than 185 kg.

Exercise 41
The quality control manager of Marilyn's Cookies is inspecting a batch of chocolate-chip cookies that have just been baked. The average number of chip parts per cookie is 6. What is the probability that in any particular cookie being inspected (correct to 4 decimal places):
(i) fewer than five chip parts will be found?
Gi) exactly five chip parts will be found? Gi) five or more chip parts will be found? [IMM — November 1992] 201% ts 166 USINESS STATISTICS: : 2 Sia Exercise 42 A market researcher found that 10 % of the general population is left-handed, and that 22 % of children born to mothers over the age of 40 are left-handed. Find the probability of there being fewer than two left-handed people in any 10 randomly selected sample from the following groups: (@) the general population ii) children born to mothers over the age of 40. [IMM — November 199: 1 Index Numbers IAPTER OBJECTIVES ©. e cece ee eee eee neers v3 eea INTRODUCTION ...-.---- BS PRICE INDEXES = 5 QUANTITY INDEXES . Pare 7-7. PROBLEMS OF INDEX NUMBER CONSTRUCTION .... -. - .286 LIMITATION ON THE INTERPRETATION OF INDEX NUMBERS . Berrys ie te ee 287 ADDITIONAL TOPICS ON INDEX NUMBERS ......-...-- 287 SUMMARY) sua ccuw.uttis peer see cae oA EXERCISES... 0e-eumweiss PUES BANA AEROS ev 292 era i 27a APPLIED BUSINESS STATIS" 414 OBJECTIVES After studying this chapter, you should be able to: © Explain the purpose of index numbers, © Develop indices to measure price changes over time. Develop indices to measure quantity changes over time. * Distinguish between the Laspeyres and Paasche weighting methods. © Discuss pitfalls of index number construction, + Revise the base period of a series of index numbers. + Explain and derive link relatives, * Transform monetary values into real values. 11.2 INTRODUCTION Definition Construction EXAMPLES An index mumber is defined as: A summary measure of the change in the level of activity of a item or a basket of relat ne Hime period to anot ‘An index number is constructed by expressing the valve of an i the current period as a ratio of its value in the base periad ‘This ratio is expressed in percentage terms. Current Period Value ,, 100 Base Period Value Index Number = ‘An index number measures percentage changes from a bi which has an index = 100. 
An index number above 100 indicates an increase in the level of activity being monitored, while an index number below 100 reflects a decrease. The magnitude of the change is the difference between the index number and the base index (100).

Index numbers used in South Africa
* JSE Actuaries Index: Overall (all share index)
* JSE Actuaries Index: Gold Index
* JSE Actuaries Index: Industrial Index
* JSE Actuaries Index: Bond Index
* CPI Consumer Price Index (1985 = 100)
* PPI Production Price Index (1980 = 100)
* Manufacturing Output (1980 = 100)
* SACOB's Business Confidence Index
* Consumer Confidence Index.

Categories of index numbers
There are two major categories of index numbers. Within each category, an index can be computed for either a single item or a basket of related items. These categories are:
* Price indexes:
  - single price index
  - composite price index
* Quantity indexes:
  - single quantity index
  - composite quantity index.

This chapter covers the construction of index numbers (both price and quantity indexes), highlights some problems of index number construction, discusses some limitations of index numbers, and identifies other forms of index numbers.

The following symbolic notation is used:
$p_0$ = base period price
$q_0$ = base period quantity
$p_1$ = current period price
$q_1$ = current period quantity.

11.3 PRICE INDEXES
A price index measures the percentage change in price between any two time periods. This can be computed for either an individual item or a basket of items.

For a single item: price relative
For a single item, the relative price change from one time period to another is found by computing its price relative.

$$\text{Price relative} = \frac{p_1}{p_0} \times 100\,\%$$

This relative price change is multiplied by 100 to express it in percentage terms.

EXAMPLE 1
In 1986, 98 octane petrol cost R0,87 per litre ($p_0$). In 1991, 98 octane petrol cost R1,24 per litre ($p_1$). Then the price relative for 98 octane petrol is

$$\frac{1{,}24}{0{,}87} \times 100 = 142{,}53$$

Interpretation
The price index of 98 octane petrol in 1991 stands at 142,53. This shows that the price of 98 octane petrol has increased by 42,53 % between 1986 (with base index = 100) and 1991.

For a basket of items: composite price index
The average price change for a basket of items from one time period (the base period) to another period (the current period) can be measured by a composite price index.

Each item is weighted according to its importance in the basket. Importance is determined by both an item's unit price and the quantity consumed per period (i.e. the value of each item in the basket).

Since only price changes are being monitored by the composite price index, it is necessary to hold quantities consumed constant. This permits price changes to be monitored without the confounding effect of quantity changes.

Quantities can be held constant at either base period or current period levels when determining the weights (values) to be assigned to each item in the basket. This gives rise to two approaches to determining weights:
* The Laspeyres approach: this approach advocates holding quantities constant at base period levels.
* The Paasche approach: this approach advocates holding quantities constant at current period levels.

One of these approaches must be selected when deriving a composite price index. Unless explicitly stated, the Laspeyres approach is assumed.

Methods for constructing a composite price index
There are two methods which can be employed to find this composite price index once the approach to weighting the items (i.e. selecting either the Laspeyres or the Paasche approach) has been decided. Both methods produce the same composite price index value, but differ in their reasoning.

The two construction methods are:
* Weighted aggregates.
* Weighted average of price relatives.

Weighted aggregates method
The method of weighted aggregates is illustrated using the weighting approach of Laspeyres. The method is equally appropriate using the Paasche weighting approach.

Step 1 Find the base period value for the basket of items.
Multiply the base period prices by the base period quantities ($p_0 \times q_0$) for each item, and sum over all items in the basket, i.e.

$$\text{Base period value} = \sum (p_0 \times q_0)$$
Interpreted, this means finding what the basket of items would have cost in the base period, when base period prices applied and base period quantities were consumed.

Step 2 Find the current period value for the basket of items.
Multiply the current period prices by the base period quantities ($p_1 \times q_0$) for each item, and sum over all items in the basket, i.e.

$$\text{Current period value} = \sum (p_1 \times q_0)$$

Interpreted, this means finding what the basket of items would currently cost if current prices were paid for base period quantities.

Step 3 Find the ratio of these two values (i.e. current value divided by base value).
This will reflect the combined average relative change in prices from one time period to another for all items in the basket. This ratio can be seen as an aggregated price relative.

$$\text{Laspeyres price index} = \frac{\sum (p_1 \times q_0)}{\sum (p_0 \times q_0)} \times 100\,\%$$

Similarly, if quantities are held constant at current period levels, a Paasche price index using the method of weighted aggregates will be constructed as follows:

$$\text{Paasche price index} = \frac{\sum (p_1 \times q_1)}{\sum (p_0 \times q_1)} \times 100\,\%$$

EXAMPLE 2
The share portfolio problem
In 1986 an investor acquired a small portfolio of shares on the JSE. He has bought and sold from this portfolio over time. Both his initial and his current (1992) portfolios are given below.

Share    Base year (1986)      Current year (1992)
         p0        q0          p1        q1
A        65c       350         115c      300
B        200c      240         120c      60
C        1260c     50          1890c     100

Question 1
Use the method of weighted aggregates to find the Laspeyres composite price index for 1992. This index will show the average change in share prices from 1986 (1986 = base index = 100).

Solution 1
Weighted aggregates method (Laspeyres approach):

Share    Base year (1986)      Current year (1992)     Value
         p0        q0          p1        q1            Base (p0 x q0)    Current (p1 x q0)
A        65c       350         115c      300           22 750            40 250
B        200c      240         120c      60            48 000            28 800
C        1260c     50          1890c     100           63 000            94 500
                                                       133 750           163 550

$$\text{Laspeyres price index} = \frac{\sum (p_1 \times q_0)}{\sum (p_0 \times q_0)} \times 100\,\% = \frac{163\,550}{133\,750} \times 100 = 122{,}28$$

Interpretation
If quantities are held constant at 1986 (base period) levels, the prices of the shares in the portfolio have increased by, on average, 22,28 % since 1986.

Question 2
Use the method of weighted aggregates to find the Paasche composite price index for 1992. Again, this index will show the average change in share prices from 1986 (1986 = base index = 100).
Solution 2
Weighted aggregates method (Paasche approach):

Share    Base year (1986)      Current year (1992)     Value
         p0        q0          p1        q1            Base (p0 x q1)    Current (p1 x q1)
A        65c       350         115c      300           19 500            34 500
B        200c      240         120c      60            12 000            7 200
C        1260c     50          1890c     100           126 000           189 000
                                                       157 500           230 700

$$\text{Paasche price index} = \frac{\sum (p_1 \times q_1)}{\sum (p_0 \times q_1)} \times 100\,\% = \frac{230\,700}{157\,500} \times 100 = 146{,}48$$

Interpretation
If quantities are held constant at 1992 (current period) levels, the composite price index is 146,48. This means that the share prices in the portfolio have increased by, on average, 46,48 % since 1986.

Weighted average of price relatives method
This method highlights the fact that the composite price index is a weighted average of the individual items' price relatives. The averaging process takes into account the relative importance of each item in the basket as measured by its relative value. An item's relative value is a percentage showing its value relative to the total basket value. High valued items will weight (or influence) the composite price index more than low valued items.

Step 1 Find the price relative for each item in the basket.

Step 2 Find the value of each item in the basket using either Laspeyres [i.e. ($p_0 \times q_0$)] or Paasche [i.e. ($p_0 \times q_1$)].

Step 3 Weight the price relative for each item by the item's relative value in the basket.
Multiply the price relative for each item by its value. Then sum over all items, and finally divide by the total value of the basket.

In formula terms:
If base period quantities are used as weights, we have:

$$\text{Laspeyres price index} = \frac{\sum \left[ \left( \frac{p_1}{p_0} \times 100 \right) \times (p_0 \times q_0) \right]}{\sum (p_0 \times q_0)}$$

If current period quantities are used as weights, we have:

$$\text{Paasche price index} = \frac{\sum \left[ \left( \frac{p_1}{p_0} \times 100 \right) \times (p_0 \times q_1) \right]}{\sum (p_0 \times q_1)}$$

EXAMPLE 3
The share portfolio problem
Refer to Example 2 for the data.

Question 1
Use the weighted average of price relatives method to find the Laspeyres composite price index for 1992.
Solution 1
Weighted average of price relatives (Laspeyres):

Share    1986 (base)         1992 (current)    Price relative      Base value     Price relative
         p0        q0        p1                (p1/p0) x 100       p0 x q0        x base value
A        65c       350       115c              176,92              22 750         4 025 000
B        200c      240       120c              60                  48 000         2 880 000
C        1260c     50        1890c             150                 63 000         9 450 000
                                                                   133 750        16 355 000

$$\text{Laspeyres price index} = \frac{\sum \left[ \left( \frac{p_1}{p_0} \times 100 \right) \times (p_0 \times q_0) \right]}{\sum (p_0 \times q_0)} = \frac{16\,355\,000}{133\,750} = 122{,}28$$

Interpretation
Same as for the weighted aggregates method using the Laspeyres approach.

Question 2
Use the weighted average of price relatives method to find the Paasche composite price index for 1992.

Solution 2
Weighted average of price relatives (Paasche):

Share    1986 (base)    1992 (current)       Price relative      Current value    Price relative
         p0             p1        q1         (p1/p0) x 100       p0 x q1          x current value
A        65c            115c      300        176,92              19 500           3 450 000
B        200c           120c      60         60                  12 000           720 000
C        1260c          1890c     100        150                 126 000          18 900 000
                                                                 157 500          23 070 000

$$\text{Paasche price index} = \frac{\sum \left[ \left( \frac{p_1}{p_0} \times 100 \right) \times (p_0 \times q_1) \right]}{\sum (p_0 \times q_1)} = \frac{23\,070\,000}{157\,500} = 146{,}48$$

Interpretation
Same as for the weighted aggregates method using the Paasche approach.

It should be noted that the composite price indexes found for the Laspeyres and Paasche approaches are identical for both the weighted aggregates method and the method of weighted average of price relatives.

The benefit of the weighted average of price relatives method over the weighted aggregates method is the added insight it provides into the price changes of individual items.

Unweighted composite price index
An unweighted composite price index ignores the relative importance of each item's quantity consumed. It assumes equal weights for each item in a basket. If the quantity consumed is considered to be one unit for each item, then an unweighted composite price index formula can be stated as:

$$\text{Price index (unweighted)} = \frac{\sum p_1}{\sum p_0} \times 100\,\%$$

This method of index number construction is only valid if equal quantities of each item in the basket are consumed. Generally this is not the case.
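The two construction methods above can be checked against each other with a short script. The following is a minimal Python sketch, not part of the original text: the function names and the dictionary layout are illustrative assumptions, and the data are the share prices (in cents) and quantities from Example 2.

```python
# Share portfolio data from Example 2 (prices in cents).
# Layout (an assumption for this sketch): share -> (p0, q0, p1, q1)
portfolio = {
    "A": (65, 350, 115, 300),
    "B": (200, 240, 120, 60),
    "C": (1260, 50, 1890, 100),
}


def weighted_aggregates(data, approach="laspeyres"):
    """Composite price index as the ratio of two basket values."""
    current_value = 0
    base_value = 0
    for p0, q0, p1, q1 in data.values():
        # Laspeyres holds quantities at base levels, Paasche at current levels.
        q = q0 if approach == "laspeyres" else q1
        current_value += p1 * q
        base_value += p0 * q
    return current_value / base_value * 100


def weighted_relatives(data, approach="laspeyres"):
    """Composite price index as a value-weighted average of price relatives."""
    weighted_sum = 0
    total_value = 0
    for p0, q0, p1, q1 in data.values():
        q = q0 if approach == "laspeyres" else q1
        price_relative = p1 / p0 * 100   # Step 1
        value = p0 * q                   # Step 2: the item's weight
        weighted_sum += price_relative * value  # Step 3
        total_value += value
    return weighted_sum / total_value


for approach in ("laspeyres", "paasche"):
    agg = weighted_aggregates(portfolio, approach)
    rel = weighted_relatives(portfolio, approach)
    print(f"{approach}: aggregates = {agg:.2f}, relatives = {rel:.2f}")
# laspeyres: aggregates = 122.28, relatives = 122.28
# paasche: aggregates = 146.48, relatives = 146.48
```

Both functions agree because multiplying a price relative by its weight p0 x q cancels p0, leaving p1 x q in the numerator, which is exactly the weighted aggregates numerator.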
11.4 QUANTITY INDEXES
A quantity index measures the percentage change in consumption of either an individual item or a basket of items from one time period to another.

These changes in consumption patterns can be quantified in similar ways to those used for price indexes.

When constructing quantity indexes, it is necessary to hold price levels constant over time to isolate the effect of quantity (consumption level) changes only.

For a single item: quantity relative
For a single item, the relative quantity change from one time period to another is found by computing its quantity relative.

$$\text{Quantity relative} = \frac{q_1}{q_0} \times 100\,\%$$

Where: $q_1$ = quantity in the current period, $q_0$ = quantity in the base period.

The quantity change is measured in percentage terms.

EXAMPLE 4
The share portfolio problem
If 350 units of share A were purchased in 1986 and the investor holds 300 units in 1992, then the change in the shareholding of share A from 1986 to 1992 is given by its quantity relative, i.e.

$$\text{Quantity relative for share A} = \frac{300}{350} \times 100 = 85{,}7$$

Interpretation
Since the quantity index is below 100, this shows a reduction in consumption (a reduced holding level) of share A. 14,3 per cent fewer units of share A were held in 1992 than in the base period of 1986.

For a basket of items: composite quantity index
The average quantity change for a basket of items from one time period (the base period) to another period (the current period) can be measured by a composite quantity index.

As with price index construction, it must be decided whether to hold prices constant at either base period levels or current period levels in determining weights.

* Laspeyres approach: this approach advocates holding prices constant at base period levels.

$$\text{Laspeyres quantity index} = \frac{\sum (p_0 \times q_1)}{\sum (p_0 \times q_0)} \times 100\,\%$$

* Paasche approach: this approach advocates holding prices constant at current period levels.

$$\text{Paasche quantity index} = \frac{\sum (p_1 \times q_1)}{\sum (p_1 \times q_0)} \times 100\,\%$$

With each of these two approaches, either the weighted aggregates method or the weighted average of quantity relatives method can be employed to find the composite quantity index.

These two methods, for both Laspeyres and Paasche, will be illustrated using the share portfolio data.

EXAMPLE 5
The share portfolio problem

Question 1
Use the weighted aggregates method to find the Laspeyres composite quantity index for 1992. This index will show the average change in the number of units of shares held from 1986 (1986 = base index = 100) to 1992.

Solution 1
Weighted aggregates method (Laspeyres approach: i.e. prices held constant at base period levels):

Share    Base year (1986)      Current year (1992)     Value
         p0        q0          p1        q1            Base (p0 x q0)    Current (p0 x q1)
A        65c       350         115c      300           22 750            19 500
B        200c      240         120c      60            48 000            12 000
C        1260c     50          1890c     100           63 000            126 000
                                                       133 750           157 500

$$\text{Laspeyres quantity index} = \frac{\sum (p_0 \times q_1)}{\sum (p_0 \times q_0)} \times 100\,\% = \frac{157\,500}{133\,750} \times 100 = 117{,}8$$

Interpretation
If prices are held constant at 1986 (base period) levels, the composite quantity index is 117,8. This shows that the number of units of shares held has increased, on average, by 17,8 % from 1986 to 1992.

Question 2
Use the weighted aggregates method to find the Paasche composite quantity index for 1992. This index will show the average change in the number of units of shares held from 1986 (1986 = base index = 100) to 1992.

Solution 2
Method of weighted aggregates (Paasche approach: i.e. prices ...

[Scatter plot; horizontal axis: Coal usage (x)]

Step 3 Computations: Find a and b using the method of least squares.
This requires applying the formulae to find a and b.

x        y        x²        xy
15       35       225       525
6        18       36        108
10       24       100       240
18       32       324       576
9        24       81        216
7        20       49        140
14       32       196       448
11       29       121       319
5        14       25        70
8        22       64        176
Σx = 103   Σy = 250   Σx² = 1 221   Σxy = 2 818

$$b = \frac{10(2\,818) - (103 \times 250)}{10(1\,221) - (103)^2} = 1{,}518$$

$$a = \frac{250 - (1{,}518 \times 103)}{10} = 9{,}367$$

(The value of a is obtained using the unrounded value of b.)

The estimated regression line is now defined as:

$$\hat{y} = 9{,}367 + 1{,}518x$$ for 5 ...
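The least squares computation above can be reproduced with a few lines of code. This is a minimal Python sketch under stated assumptions: the function name `least_squares` is illustrative, and the y-values, which are partly illegible in the scanned table, are taken as the values that reproduce the printed products xy and the column totals (Σx = 103, Σy = 250, Σx² = 1 221, Σxy = 2 818).

```python
# (x, y) pairs from the computation table; x is coal usage.
# Assumption: unclear y-values were inferred from the xy column and the totals.
data = [(15, 35), (6, 18), (10, 24), (18, 32), (9, 24),
        (7, 20), (14, 32), (11, 29), (5, 14), (8, 22)]


def least_squares(pairs):
    """Return (a, b) for the fitted line y = a + b*x."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    # Slope and intercept from the standard least squares formulae.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = least_squares(data)
print(round(a, 3), round(b, 3))  # 9.367 1.518
```

Note that a is computed from the unrounded slope; substituting the rounded value 1,518 by hand gives 9,365 instead of 9,367.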
