QUIZ 1 (2020/2021)
ANSWER ALL QUESTIONS
1. Outline the two main things that are essential in survey methods (descriptive).
2. Apart from the bar chart and the pie chart, mention any eight specific graphs that can be used to describe data.
3. Mention any two characteristics that are common to all averages.
4. Mention any three key ways defective questionnaires could be determined.
5. Closed questions in questionnaire designing comprise ...
6. Assurance of confidentiality in questionnaire designing is very important. Where exactly can it be written?

SOLUTIONS TO QUIZ 1 (2020/2021)
1. a) You observe closely the population bounded by the researcher's parameter; the researcher's parameter means the information you need.
b) You make a careful record of what you observe; that is, if an aggregate record is made, it helps the researcher to study the observations being provided.
2. a) Histogram b) Frequency polygon c) Stem-and-leaf plot d) Pictogram e) Box-and-whisker plot f) Line graph g) Scatter plot h) Gauge chart.
3. a) All the averages are applicable to quantitative data.
b) All the averages are used to determine the central value of the data set.
c) The distribution is symmetric when all the averages are equal.
4. a) Omissions b) Inconsistencies c) Incomplete entries.
5. a) Multiple choice questions b) Dichotomous questions c) Ranking or rating questions.
6. It is part of the questionnaire layout.

QUIZ 2 (2020/2021)
ANSWER ALL QUESTIONS
1. What is Correlation Analysis?
2. Mention any eight types of correlation coefficient.
3. State whether each of the following statements is true or false.
a) Correlation analysis strictly measures the linear relationship between two variables.
b) The correlation coefficient can be between -1 and 1.
c) The process of using the sample mean, $\bar{X}$, to estimate the population parameter, $\mu$, is known as estimation.
d) The coefficient of non-determination is the fraction of variation that is not explained by the Y-variable relative to the X-variable.
4. Two data sets A and B are being compared. Data set A has a coefficient of skewness of 3.6 and a mean of 24. Data set B has a coefficient of skewness of 1.2 and a mean of 19. Assume that the two data sets have equal standard deviation and equal median.
a) Estimate the median and the variance of the two data sets.
b) Which data set is really more dispersed?
c) Given that a sample size of 16 was used, estimate the standard error that is associated with the use of $\bar{X}_A$ in estimating $\mu$.

SOLUTIONS TO QUIZ 2 (2020/2021)
1. Correlation analysis is the process of measuring the strength of the relationship between two variables.
2. a) Pearson product-moment correlation coefficient b) Spearman's rank correlation coefficient c) Kendall's tau rank correlation coefficient d) Phi correlation coefficient e) Spurious correlation coefficient f) Intraclass correlation coefficient g) Point biserial correlation coefficient h) Concordance correlation coefficient.
3. a) The correct option is True. b) The correct option is True. c) The correct option is False. d) The correct option is False.
4. a) i. We consider the two data sets.
DATA SET A:
coefficient of skewness $= \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
$3.6 = \frac{3(24 - \text{median})}{S_A}$, so $S_A = \frac{3(24 - \text{median})}{3.6}$ ... (1)
DATA SET B:
coefficient of skewness $= \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
$1.2 = \frac{3(19 - \text{median})}{S_B}$, so $S_B = \frac{3(19 - \text{median})}{1.2}$ ... (2)
From the question, $S_A = S_B$ and median of A = median of B. Hence, we equate both equations to find the median:
$\frac{3(24 - \text{median})}{3.6} = \frac{3(19 - \text{median})}{1.2}$
$1.2[3(24 - \text{median})] = 3.6[3(19 - \text{median})]$
$3.6(24 - \text{median}) = 10.8(19 - \text{median})$
$24 - \text{median} = 57 - 3\,\text{median}$
$\text{median} = 16.5$
Hence, median of A = median of B = 16.5.
ii. From equation (1), we can deduce the standard deviation of A: $S_A = \frac{3(24 - \text{median})}{3.6}$, but median of A = 16.5. Then we have that $S_A = \frac{3(24 - 16.5)}{3.6} = 6.25$.
Hence, variance of A $= (S_A)^2$
$= (6.25)^2 = 39.0625$
From equation (2), we can deduce the standard deviation of B: $S_B = \frac{3(19 - \text{median})}{1.2}$, but median of B = 16.5, so $S_B = \frac{3(19 - 16.5)}{1.2} = \frac{7.5}{1.2} = 6.25$.
Hence, variance of B $= (S_B)^2 = (6.25)^2 = 39.0625$.
b) Given that the two data sets have the same median and standard deviation, the data set with the higher mean is considered to be more dispersed. Hence, data set A is more dispersed than data set B.
c) Given $n = 16$ and standard deviation $= 6.25$, the standard error of $\bar{X}_A$ is $S_{\bar{X}_A} = \frac{S_A}{\sqrt{n}} = \frac{6.25}{\sqrt{16}} = 1.5625$.

END OF SECOND SEMESTER EXAMINATION (2020/2021)
SECTION A
Answer all these questions in the first hour. This section carries 50% of the total score.
Q1. If a data set has a mean of 24 and a mode of 30, then the median is likely to be
A. 28 B. 22 C. 26 D. None of the above
Q2. Which of the following is true about a platykurtic curve?
A. The peak is very sharp B. The mean, the median and the mode are the same C. It has a coefficient of skewness of zero D. None of the above
Q3. If the sum of the squared deviations of a sample data set is found to be 86, and the variance was found to be 5.058, determine the sample size.
A. 17 B. 18 C. 16 D. None of the above
Q4. Which of the following is not a type of closed question in questionnaire designing?
A. Rating B. Multiple choice C. Likert scale D. Ranking
Q5. All the following must be avoided at all cost in questionnaire designing except
A. technical terms B. leading questions C. double questions D. ambiguous questions
Q6. Identify the graph that can easily be used to discuss the skewness of the data set.
A. Percentage component bar chart B. Pie chart C. Simple bar chart D. All of the above
Q7. Estimate the mean absolute deviation of the data: 10, 12, 13, and 21.
A. 3.50 B. 4.66 C. 14.01 D. None of the above
Q8. Which of the following is not a type of correlation coefficient?
A. Polychoric correlation coefficient B. Matthews correlation coefficient C. Spurious correlation coefficient D.
All the above are correlation coefficients
Q9. The mean is most useful as a measure of central location when the distribution of scores is
A. bimodal B. normal C. skewed D. dispersed
Q10. One of the uses of correlation can be that it
A. determines the position of an individual student B. estimates the process of variation within scores C. explains the performance of students in different subjects D. None of the above
Q11. In a data set the mean was 15 and the variance was 20.25. What is the value of the coefficient of variation?
A. 3.3% B. 74.1% C. 30.5% D. None of the above
Q12. The correlation coefficient found between the performance in mathematics and statistics was 0.90. This means that
A. low performance in mathematics is more likely to record low performance in statistics
B. high performance in mathematics is likely to record moderate performance in statistics
C. moderate score in mathematics is more likely to record low performance in statistics
D. high performance in mathematics is more likely to record low performance in statistics
Q13. Which of the following can be considered as a variable?
A. The number of hours every day B. The amount of hours you sleep every day C. The size of a chemical container D. None of the above
Q14. If the correlation coefficient is given by r = 0.75, how much variation has not been explained by the independent variable relative to the dependent variable?
A. 15.00% B. 56.25% C. 43.75% D. 75.00%
Q15. The differences ($D_i$) between the paired rankings of the variables X, Y are: -3, 0, 2, -1, -1, 3.5, 1.5, 2, 1, 3, 2, and -3. Find the Spearman's rank correlation coefficient between X and Y.
A. 0.72 B. 0.80 C. 0.88 D. None of the above
Q16. In finding Kendall's tau rank correlation coefficient, P and Q were found to be 52 and 12 respectively. If the sample size, N, was 12, estimate the correlation coefficient.
A. 61 B. 41 C. 97 D. None of the above
Q17.
The proportion of the total variation in the dependent variable that is explained by the variation in the independent variable is known as
A. Coefficient of variation B. Correlation coefficient C. Coefficient of determination D. Coefficient of skewness
In a simple linear regression analysis the following summary statistics were obtained:
$\sum_{i=1}^{20} x_i^2 = 236.25$, $\sum_{i=1}^{20} x_i y_i = 77.66$, $\sum_{i=1}^{20} x_i = 62.50$, $\sum_{i=1}^{20} y_i = 21.63$, $\sum_{i=1}^{20} y_i^2 = 106.32$.
Use this information from the data to answer questions 18 to 21.
Q18. Find the regression coefficient, b, where $Y_i = a + bx_i + e_i$.
A. 0.20 B. 0.30 C. 0.35 D. 0.25
Q19. Find the regression coefficient, a.
A. 0.20 B. 0.30 C. 0.35 D. 0.25
Q20. Estimate the residual if X = 8 and Y = 3.
A. 2.3 B. 0.3 C. 0.7 D. None of the above
Q21. Predict Y when X is 511.
A. 128 B. 115 C. 119 D. None of the above
Q22. In simple linear regression the sum of the squared residuals was found to be 23.75. For a sample size of 10, estimate the standard error that is associated with the estimated model.
A. 1.185 B. 1.624 C. 1.723 D. None of the above
Q23. Which of the following statements is not true about regression analysis?
A. There are two main types of variables: X and Y.
B. In any regression model the error term can be estimated.
C. There are two main types of regression analysis: linear and non-linear.
D. The expected value of the error term should be equal to zero.
Q24. All the following are similarities of the Spearman's rank correlation and the Kendall's tau rank correlation coefficients except that
A. both are used for qualitative and quantitative data sets.
B. both involve ranking of observations.
C. they both talk about correlation analysis.
D. their values are always between -1 and 1 inclusive.
Q25. The rule that is used to estimate the population parameter is known as
A. estimations B. estimator C. estimate D. None of the above
Q26. Find the standard error for using $\bar{X}$ to estimate the population mean, $\mu$, for the data: 3, 5, 7, 5, and 10.
A. 1.183 B. 2.646 C. 0.817 D. None of the above
Q27. One of the strengths of the mean as a measure of central location is that it
A. performs best in skewed distributions B. is sensitive to extreme scores C. is useful for closed-end distributions D. performs best in normal distributions
Q28. Which of the following statements is not true about the use of the eyeball fitting method in estimating regression coefficients in simple linear regression analysis?
A. There is so much variation in estimating regression coefficients.
B. Without plotting the three additional points the regression coefficients cannot be estimated.
C. It requires plotting of a scatter diagram.
D. The least squares method estimates the coefficients better than the eyeball fitting method.
Q29. Which of the following variables can be termed as discrete?
A. The temperature of rooms B. The number of cell phones C. Student's earnings at a place D. Age of students
Q30. Scores in an examination yielded a mean of 25 and a coefficient of variation of 0.108. What is the variance of the distribution?
A. 7.29 B. 2.70 C. 6.42 D. None of the above

SECTION B
Answer any two questions from this section. All questions carry equal marks.
31. (a) Outline the structure of statistics as a subject.
(b) Outline any five key words in the definition of statistics, and briefly explain what each means in statistics.
(c) The scores of students in mathematics and statistics, marked out of 15, were recorded as follows:
STUDENTS | 1 | 2 | 3 | 4 | 5
MATHEMATICS | 5 | 12 | 4 | 8 | 11
STATISTICS | 4 | 7 | 3 | 6 | 10
(i) Estimate the linear regression model for mathematics on statistics using the least squares method. Comment on the relationship.
(ii) If a student scores 14 in statistics, what will be his likely score in mathematics?
(iii) Estimate the standard error of estimation of the model.
32. (a) Outline three differences and three similarities between the use of questionnaire and interview in collecting data.
(b) Classify the following variables as Discrete, Continuous, Nominal, or Ordinal.
(i) Rating of companies
(ii) Types of cars owned by people
(iii) Ranks of personnel in the police
(iv) Titles of different books
(v) Ages of some animals
(vi) Numbers on car number plates
(vii) The month people are born
(viii) Grades of students' examinations
(c) The following data were obtained for a sample size of 10.
[Table of X, Y and Z values; rows only partly legible in the scan: 5 6 2 0 1 3 4 3 2 4 5 5 3 2 1 9 7 8 5 6 4 6 7 8]
(i) Using Spearman's rank correlation coefficient, find which pair of variables is highly related.
(ii) Account for the percentage of variability unaccounted for by the estimates of the relationship.
33. a) Briefly discuss any five things that should be avoided in questionnaire designing.
b) A simple linear regression model is given by $P_i = a + bY_i + e_i$, where P is the response variable, Y is the predictor variable, a and b are regression coefficients and $e_i$ is the error term.
i) State the normal equations for estimating a and b.
ii) Discuss the importance of the random error, $e_i$.
iii) Briefly discuss how the eyeball fitting method could be used effectively to estimate a and b.
iv) State the error that is associated with estimating a and b.
34. (a) Let $\hat{p} = \frac{X}{n}$ be the sample proportion, where X is the number of observations with the characteristic under study and n is the sample size.
(i) Using $\hat{p}$, explain the concept of estimation.
(ii) Derive the formula for finding the standard error of using $\hat{p}$ to estimate the population proportion, P.
(b) The data are obtained on X, the length of time in weeks that a promotional project has been in progress at a small business, and Y, the beginning times of the campaign. (X, Y) is given as: (1, 10), (2, 14), (3, 18), (4, 20), (1, 11), (3, 15), (1, 12) and (4, 19).
(i) Using Kendall's tau rank correlation coefficient, discuss how the two variables, X and Y, are related.
(ii) Estimate the coefficient of non-determination and interpret your answer.

SOLUTIONS TO SECTION A (2020/2021)
1) Using the formula 3(median) = mode + 2(mean): $\text{median} = \frac{\text{mode} + 2(\text{mean})}{3} = \frac{30 + 2(24)}{3} = 26$. The correct option is (C).
2) The correct option is (C).
3) Sample variance $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$, so $5.058 = \frac{86}{n-1}$, $n - 1 = \frac{86}{5.058} = 17.00276789$, $n = 18.00276789 \approx 18$. The correct option is (B).
4) The correct option is (C).
5) The correct option is (C).
6) The correct option is (C).
7) Mean absolute deviation $= \frac{\sum |x_i - \bar{x}|}{n}$, with $\bar{x} = \frac{10 + 12 + 13 + 21}{4} = 14$.
$\text{MAD} = \frac{|10 - 14| + |12 - 14| + |13 - 14| + |21 - 14|}{4} = \frac{4 + 2 + 1 + 7}{4} = 3.5$. The correct option is (A).
8) The correct option is (D).
9) The correct option is (B).
10) The correct option is (D).
11) Coefficient of variation $= \frac{\text{standard deviation}}{\text{mean}} \times 100 = \frac{4.5}{15} \times 100 = 30$%. The correct option is (D).
12) The correct option is (A).
13) The correct option is (B).
14) Coefficient of non-determination $= 1 - r^2 = 1 - (0.75)^2 = 0.4375 = 43.75$%. The correct option is (C).
15) $r = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$, but
$\sum D^2 = (-3)^2 + (0)^2 + (2)^2 + (-1)^2 + (-1)^2 + (3.5)^2 + (1.5)^2 + (2)^2 + (1)^2 + (3)^2 + (2)^2 + (-3)^2 = 56.5$,
hence $r = 1 - \frac{6(56.5)}{12(12^2 - 1)} = 0.8024475524 \approx 0.80$. The correct option is (B).
16) $P = 52$, $Q = 12$, $N = 12$: $\tau = \frac{P - Q}{\frac{1}{2}N(N-1)} = \frac{52 - 12}{\frac{1}{2}(12)(11)} = \frac{40}{66} = 0.6060606061 \approx 0.61$. The correct option is (D).
17) The correct option is (C).
18) $b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{20(77.66) - (62.50)(21.63)}{20(236.25) - (62.50)^2} = 0.2458931298 \approx 0.25$. The correct option is (D).
19) $a = \frac{\sum y - b\sum x}{n} = \frac{21.63 - 0.25(62.50)}{20} = 0.30025 \approx 0.3$. The correct option is (B).
20) $e = y - \hat{y}$, where $\hat{y} = 0.3 + 0.25x$. When $x = 8$, $\hat{y} = 0.3 + 0.25(8) = 2.3$; hence $e = 3 - 2.3 = 0.7$. The correct option is (C).
21) $\hat{y} = 0.3 + 0.25x$, but $x = 511$, so $\hat{y} = 0.3 + 0.25(511) = 128.05 \approx 128$. The correct option is (A).
22) $SSE = \sum (y_i - \hat{y}_i)^2 = 23.75$, so $S_{y.x} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{23.75}{10-2}} = 1.723$. The correct option is (C).
23) The correct option is (A).
24) The correct option is (A).
25) The correct option is (B).
26) $\bar{x} = \frac{\sum x_i}{n} = \frac{3 + 5 + 7 + 5 + 10}{5} = \frac{30}{5} = 6$
$\sum (x_i - \bar{x})^2 = (3 - 6)^2 + (5 - 6)^2 + (7 - 6)^2 + (5 - 6)^2$
$+ (10 - 6)^2 = 28$
$s = \sqrt{\frac{28}{5 - 1}} = 2.6458$, so $S_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{2.6458}{\sqrt{5}} = 1.1832 \approx 1.183$. The correct option is (A).
27) The correct option is (D).
28) The correct option is (B).
29) The correct option is (B).
30) Coefficient of variation $= \frac{s}{\bar{x}}$, so $0.108 = \frac{s}{25}$, $s = 2.7$ and variance $= (2.7)^2 = 7.29$. The correct option is (A).

SOLUTIONS TO SECTION B (2020/2021)
31. (a) Statistics as a subject can be grouped into:
i. Descriptive statistics: presenting, organizing and summarizing data.
ii. Inferential statistics: drawing conclusions about a population based on data observed in a sample.
(b) i. Science: the systematic procedures and methods for collecting, analyzing, interpreting and presenting empirical data in statistics.
ii. Organization: the systematic arrangement of collected raw data, so that the data become easy to understand and more convenient for further statistical work.
iii. Numerical: data expressed in numbers rather than in natural language description. Numerical data are always collected in number form.
iv. Collection: the process of gathering information from all the relevant sources to find a solution to the research problem.
v. Analysis: the process of systematically applying statistical and logical techniques to describe, illustrate and evaluate data.
vi. Interpretation: the process of reviewing data through some predefined processes which help assign some meaning to the data and arrive at a relevant conclusion. It involves taking the results of data analysis, making inferences on the relations studied and using them to conclude.
vii. Presentation: the process of using various graphical formats to visually represent the relationship between two or more data sets so that an informed decision can be made based on them.
(c) i.
Finding the regression for Mathematics on Statistics using least squares. Let Y = Mathematics and X = Statistics.
The regression model is of the form $\hat{y} = a + bx_i$, where $a = \frac{\sum y_i - b\sum x_i}{n} = \bar{y} - b\bar{x}$ and $b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$.
Finding the parameters a and b:
$y_i$ | $x_i$ | $x_i y_i$ | $y_i^2$ | $x_i^2$
5 | 4 | 20 | 25 | 16
12 | 7 | 84 | 144 | 49
4 | 3 | 12 | 16 | 9
8 | 6 | 48 | 64 | 36
11 | 10 | 110 | 121 | 100
$\sum y_i = 40$ | $\sum x_i = 30$ | $\sum x_i y_i = 274$ | $\sum y_i^2 = 370$ | $\sum x_i^2 = 210$
where $n = 5$:
$b = \frac{5(274) - (30)(40)}{5(210) - (30)^2} = \frac{170}{150} = 1.133333333 \approx 1.133$
and $a = \bar{y} - b\bar{x} = \frac{40}{5} - 1.133\left(\frac{30}{5}\right) = 1.202 \approx 1.2$.
Therefore the linear regression model for Mathematics on Statistics is $\hat{y} = a + bx_i = 1.2 + 1.133x_i$.
ii. Given that a student scores 14 in Statistics, we use the regression model found in (i) to find the likely score in Mathematics. Substituting $x = 14$ into the regression model: $\hat{y} = 1.2 + 1.133x_i = 1.2 + 1.133(14) = 17.062$.
iii. Finding the standard error of estimation of the model. The standard error is given as $S_{y.x} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$, and we know that $n = 5$. Now, finding $\sum (y_i - \hat{y}_i)^2$ using the table below:
$x_i$ | $y_i$ | $\hat{y}_i = a + bx_i$ | $e_i = y_i - \hat{y}_i$ | $e_i^2 = (y_i - \hat{y}_i)^2$
4 | 5 | 5.732 | -0.732 | 0.535824
7 | 12 | 9.131 | 2.869 | 8.231161
3 | 4 | 4.599 | -0.599 | 0.358801
6 | 8 | 7.998 | 0.002 | 0.000004
10 | 11 | 12.53 | -1.53 | 2.3409
$\sum (y_i - \hat{y}_i)^2 = 11.466690$
$S_{y.x} = \sqrt{\frac{11.466690}{5 - 2}} = \sqrt{3.82223} = 1.955$
Therefore the standard error of the regression model is 1.955.
32. (a) Differences
Questionnaire | Interview
A questionnaire makes use of closed-ended questions. | In an interview, open-ended questions are asked by the interviewer to the respondent.
The questionnaire method of collecting data involves emailing the questionnaire to respondents in a written format. | The interview method is one wherein the interviewer communicates with the respondents orally.
The questionnaire is objective. | The interview is subjective.
Similarities
i. They both involve questions and answers.
ii. Both questionnaire and interview require some information to be obtained from the respondents.
iii. They are both instruments for collecting data.
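As a quick check on the least-squares working in 31(c), the short Python sketch below recomputes the slope, the intercept, the prediction at x = 14 and the standard error of estimate. It assumes the marks are Mathematics = 5, 12, 4, 8, 11 and Statistics = 4, 7, 3, 6, 10, which is the reading consistent with the column totals quoted in the solution (sum of y = 40, sum of x = 30); the function names are illustrative only and are not part of the original paper.

```python
import math

# Marks for question 31(c), assumed from the column totals in the solution:
# X = Statistics (predictor), Y = Mathematics (response).
x = [4, 7, 3, 6, 10]
y = [5, 12, 4, 8, 11]


def least_squares(x, y):
    """Return (a, b) for the fitted line y_hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def standard_error(x, y, a, b):
    """Standard error of estimate: sqrt(SSE / (n - 2))."""
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


a, b = least_squares(x, y)
print("a =", round(a, 3), "b =", round(b, 3))
print("predicted Mathematics score at x = 14:", round(a + b * 14, 3))
print("standard error:", round(standard_error(x, y, a, b), 3))
```

Because the sketch keeps b unrounded (170/150), the prediction at x = 14 comes out slightly above the 17.062 obtained by hand with b rounded to 1.133; the standard error agrees with the hand calculation to three decimal places.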
(b)
Discrete | Continuous | Nominal | Ordinal
(vii) The month people are born | (v) Ages of some animals | (ii) Types of cars owned by people | (i) Rating of companies
 | | (iv) Titles of different books | (iii) Ranks of personnel in the police
 | | (vi) Numbers on car number plates | (viii) Grades of students' examinations
(c) Using Spearman's rank correlation coefficient and taking the variables in pairs. The Spearman's rank correlation coefficient is given by $\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$, where $D = R_1 - R_2$ is the difference between the paired ranks.
Taking our first pair (X, Y):
[Rank table for X and Y not legible in the scan]
$\sum D^2 = 309.5$, $n = 10$, so $\rho = 1 - \frac{6(309.5)}{10(10^2 - 1)} = 1 - \frac{1857}{990} = -0.87576$.
Then the percentage of the variability unaccounted for is $1 - \rho^2 = 1 - (-0.87576)^2 = 0.233 = 23.3$%.
Taking the pair (X, Z):
[Rank table for X and Z not legible in the scan]
$\sum D^2 = 121$, $n = 10$, so $\rho = 1 - \frac{6(121)}{10(10^2 - 1)} = 1 - \frac{726}{990} = 0.2667$.
Then the percentage of the variability unaccounted for is $1 - \rho^2 = 1 - (0.2667)^2 = 0.9289 = 92.89$%.
Taking the last pair (Y, Z):
[Rank table for Y and Z not legible in the scan]
$\sum D^2$
$= 175.5$, $n = 10$, so $\rho = 1 - \frac{6(175.5)}{10(10^2 - 1)} = 1 - \frac{1053}{990} = -0.063636$.
Then the percentage of the variability unaccounted for is $1 - \rho^2 = 1 - (-0.063636)^2 = 0.996 = 99.6$%.
33. (a) i. Avoid using leading questions; that is, questions that force someone to give a particular answer.
ii. Avoid using technical words or jargon; these are words that only a particular group of people understand and that are not common to many people.
iii. Avoid questions that rely on memory as much as possible. For instance, "How many visitors did you receive last year?"
iv. The questions should not be long, complex or ambiguous.
v. Avoid using double-barrelled questions.
(b) Given that $P_i = a + bY_i + e_i$:
i. The normal equations for estimating a and b are given by $a = \frac{\sum P_i - b\sum Y_i}{n}$ and $b = \frac{n\sum Y_i P_i - \sum Y_i \sum P_i}{n\sum Y_i^2 - (\sum Y_i)^2}$.
ii. The error term $e_i$ in the regression model is also referred to as the residual. The error term is the difference between the actual value associated with a data point and the predicted value. It tells us about the component of the scores that cannot be predicted by the variables in the equation.
iii. To estimate the parameters using the eyeball fitting method, the first thing to do is to plot a scatter diagram of the response variable, say Y, against the predictor variable, say X; that is, a plot of Y on X.
[Figure: fitted line $\hat{Y} = a + bX$ on the scatter diagram, with the gradient b marked]
Then follow the steps below to estimate a and b:
1. Find the centroid of X and Y, that is, the point of means $(\bar{X}, \bar{Y})$.
2. Draw a vertical line through the point $(\bar{X}, \bar{Y})$, ensuring that it is parallel to the response-variable axis.
3. Identify all the points that are found at the right side and the points at the left side of the vertical line through the centroid.
4. Now find the right centroid by considering all the points at the right side of the line, $(\bar{X}_R, \bar{Y}_R)$. Also find the left centroid by considering all the points at the left side of the line, $(\bar{X}_L, \bar{Y}_L)$.
5. Plot the line of best fit by passing it through $(\bar{X}, \bar{Y})$ and then as close to $(\bar{X}_R, \bar{Y}_R)$ and $(\bar{X}_L, \bar{Y}_L)$ as possible.
6. Now find the Y-intercept and the slope of the line drawn. By obtaining a and b, you are able to obtain the linear regression model $P_i = a + bY_i + e_i$.
iv. The error that is associated with estimating a and b is the standard error of estimate, $S_{P.Y} = \sqrt{\frac{\sum (P_i - \hat{P}_i)^2}{n - 2}}$.
34. (a) i. With reference to $\hat{p}$, estimation is the process of using the sample proportion ($\hat{p}$) to estimate the value of the population proportion (P).
ii. Taking X to be binomial with parameters n and P, $Var(\hat{p}) = Var\left(\frac{X}{n}\right) = \frac{1}{n^2}Var(X) = \frac{nP(1 - P)}{n^2} = \frac{P(1 - P)}{n}$, so the standard error of $\hat{p}$ is $\sqrt{\frac{P(1 - P)}{n}}$, as required.
(b) i. Finding the Kendall's tau rank correlation. Since there are tied observations, $\tau = \frac{S}{\sqrt{\left[\frac{1}{2}N(N-1) - T_x\right]\left[\frac{1}{2}N(N-1) - T_y\right]}}$.
Re-arranging the X components in ascending order and presenting them in tabular form:
X | Y | P | Q
1 | 10 | 7 | 0
1 | 11 | 6 | 0
1 | 12 | 5 | 0
2 | 14 | 4 | 0
3 | 18 | 2 | 1
3 | 15 | 2 | 0
4 | 20 | 0 | 1
4 | 19 | 0 | 0
Total P = 26, Total Q = 2
$S = P + (-Q) = 26 - 2 = 24$. Also $N = 8$.
Finding $T_x$ and $T_y$: there is no tied observation in Y, so $T_y = 0$. There are tied observations in X, with $T_x = \frac{1}{2}\sum t_x(t_x - 1)$. Now, 1 appeared 3 times, 3 appeared 2 times and 4 appeared 2 times, so
$T_x = \frac{1}{2}[3(3 - 1) + 2(2 - 1) + 2(2 - 1)] = \frac{1}{2}(10) = 5$.
Hence $\tau = \frac{24}{\sqrt{\left[\frac{1}{2}(8)(8-1) - 5\right]\left[\frac{1}{2}(8)(8-1) - 0\right]}} = \frac{24}{\sqrt{(23)(28)}} = 0.94573$.
Therefore Kendall's tau correlation coefficient is 0.94573.
ii. Coefficient of non-determination $= 1 - \tau^2 = 1 - (0.94573)^2 = 0.1056$, which is 10.56%. It means 10.56% is the proportion of the total variation in the dependent variable, say Y, that is not explained by the variation in the independent variable, X.

QUIZ 1 (2019/2020)
ANSWER ALL QUESTIONS
Q1. Explain the following terms as used in the definition of statistics: a) Science b) Numerical c) Organization
Q2. Outline how a survey can be planned.
Q3. Briefly explain what is wrong with the following statements in questionnaire designing, if any.
a) Is it wise to keep the city clean?
b) Can you acquire and maintain an electronic blood pressure monitor?
c) How old are you?
Q4. Mention five probability sampling techniques and four non-probability sampling techniques.
Q5. a) Draw a symmetrical box-and-whisker plot.
b) Outline any four major uses of the box plot in (a).
Q6.
Identify four characteristics that are unique to only the mean.

SOLUTIONS TO QUIZ 1 (2019/2020)
Q1. a) Science: the systematic procedures and methods for collecting, analyzing, interpreting and presenting empirical data in statistics.
b) Numerical: data expressed in numbers rather than in natural language description. Numerical data are always collected in number form.
c) Organization: the systematic arrangement of collected raw data, so that the data become easy to understand and more convenient for further statistical work.
Q2. A survey can be planned through the following processes:
a) Objective (topic)
b) Target population
c) Samples
d) Knowing the sample size
e) Identical samples
Q3. a) This is a leading question, because it is forcing the respondent to say yes.
b) This is a double-barrelled question, because the question touches on more than one issue, yet the respondent is expected to give just one answer to the question.
c) This is a sensitive question. This ...
Q4. Using a table to present the answer:
Non-Probability Sampling Techniques | Probability Sampling Techniques
Snowball sampling | Systematic random sampling
Quota sampling | Stratified random sampling
Purposive sampling | Cluster sampling
Convenience sampling | Simple random sampling
Voluntary response sampling |
Q5. a) [Figure: symmetrical box-and-whisker plot marking the minimum observation, $Q_1$, $Q_2$, $Q_3$ and the maximum observation]
where $Q_1$ = lower quartile, $Q_2$ = median, $Q_3$ = upper quartile.
b) The importance of the box-and-whisker plot:
i) It detects a shift in location, that is, whether the data set is symmetric or skewed.
ii) It is used to identify the extreme observations or outliers.
iii) It is used in comparing variability in two or more data sets.
iv) It provides a visual summary of five key numbers that are associated with the data set (the minimum value, the lower quartile, the median, the upper quartile and the maximum value).
Q6.
The characteristics that are unique to only the mean are given below:
a) The mean is unique for any set of numerical data.
b) The mean takes into account every value or score in the data. Therefore, it is sensitive to extreme values, which are called outliers.
c) Means of subsets can be combined to determine the mean of the complete data set.
d) The mean cannot be computed for data in a frequency distribution that has an open-ended class.

QUIZ 2 (2019/2020)
ANSWER ALL QUESTIONS
Q1. Apart from the percentiles and the quartiles, mention any five measures of dispersion.
Q2. State any one way in basic statistics in which normality of a data set can be determined.
Q3. Complete the following statements:
a. Correlation coefficient, r, ______
b. In Spearman's rank correlation coefficient, ______ are used for the calculation of $\rho$.
c. In Spearman's rank correlation coefficient formula, the value $D^2$ represents ______
Q4. Give three similarities between Spearman's rank correlation and Kendall's tau rank correlation.
Q5. Briefly discuss how you will calculate the Kendall's tau correlation coefficient for a data set. Assume that there are tied values present in only X (the independent variable).
Q6. A data set is given by 5, 7, 3, 8 and 12. Find the mean, $\bar{X}$. Hence find the variance of the mean and comment on the value obtained.

SOLUTIONS TO QUIZ 2 (2019/2020)
Q1. The other five measures of dispersion are:
a) Deciles b) Range c) Standard deviation d) Variance e) Mean absolute deviation.
Q2. Normality of the data set is determined by finding the mean, mode and median and verifying whether they are of the same value.
Q3. a) The correct answer is "is independent of the units in which the two variables, say X and Y, are measured. It means no matter the unit of measurement of X and Y, the same value of r will be obtained."
b) The correct answer is "the ranks of X and Y."
Q4. Similarities
a) Both make use of ranked values in estimating r.
b) They are both suitable for qualitative and quantitative data.
c) They both assume a monotonic relationship between the variables.
Q5. The process to calculate the Kendall's tau correlation coefficient is given below:
a) First, we rank the values of X in ascending order.
b) We attach to them their corresponding Y values.
c) For each Y value, count the number of Y values after it that are greater than it; sum these counts and indicate the final result as P.
d) Also, for each Y value, count the number of Y values after it that are less than it; sum these counts and indicate the final result as Q (make sure to negate the value of Q).
e) For each tied value in X, multiply the number of times it occurs, t, by (t - 1); sum these products and halve the sum to obtain $T_x = \frac{1}{2}\sum t(t - 1)$.
f) Find the sum of P and (-Q), label it as S, and use the formula below to find the correlation coefficient:
$\tau = \frac{S}{\sqrt{\left[\frac{1}{2}N(N-1) - T_x\right]\left[\frac{1}{2}N(N-1) - T_y\right]}}$, where $T_y = 0$ because there are no ties in Y.
Q6. $\bar{x} = \frac{\sum x_i}{n} = \frac{5 + 7 + 3 + 8 + 12}{5} = \frac{35}{5} = 7$
Hence, $Var(\bar{x}) = Var\left(\frac{\sum x_i}{n}\right) = \frac{s^2}{n}$. But $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$.
Now $\sum (x_i - \bar{x})^2 = (5 - 7)^2 + (7 - 7)^2 + (3 - 7)^2 + (8 - 7)^2 + (12 - 7)^2 = 46$, so $s^2 = \frac{46}{4} = 11.5$.
Therefore $Var(\bar{x}) = \frac{11.5}{5} = 2.3$.
This means that, on an average of 7, the values are dispersed from the mean by 2.3 units.

END OF SEMESTER EXAMS 2019/2020
SECTION A
Answer these questions in the first hour. It constitutes 50% of the total score.
Q1. Which of the following variables can be termed as nominal scale?
A. Age of students B. Number of cell phones C. Temperature of a place D. None of the above
Q2. The mean is useful as a measure of central location when the distribution of scores is not
A. Skewed B. Dispersed C. Normal D. None of the above
Q3. The median for data set A is 20 and that of data set B is 30. What is the median of the complete data set?
A. 20 B. 25 C. 27 D. None of the above
Q4. Which of these is most essential in describing survey methods?
A. The target population
B. Set of questions to be asked in questionnaire designing
C. Make a careful record of what you observe
D. Aimed at determining some characteristics
Q5. Which of the following is not associated with a structured questionnaire in questionnaire designing?
A. Open questions B. Coding and editing C. Closed questions D. All the above
Q6. Which of the following statements is true about questionnaires?
A. Multiple choice questions may be exhaustive.
B. Leading questions may be avoided if possible.
C. Start your questions with simple demographic questions in all these cases.
D. None of the above
Q7. All the following are specific measures of dispersion except
A. Deviation B. Range C. Standard deviation D. All the above
Q8. Which of the following statements is not true?
A. A percentage component bar chart can be used to describe the skewness of a data set.
B. The error term in regression analysis can be estimated.
C. A stem-and-leaf plot can be used to describe the nature of a data set.
D. A relative frequency polygon can be used to describe the skewness of a data set.
Q9. Which of the following terms is not associated with the definition of statistics?
A. Analysis B. Numerical C. Classification D. None of the above
The following is the age distribution of students.
Age (yrs) | 10 | 12 | [illegible] | 17 | 9 | 20
Frequency | 55 | 40 | 20 | 18 | 28 | [illegible]
Use the information to answer questions 10 to 12.
Q10. Find the mean value of the distribution.
A. 12.79 B. 11.79 C. 13.31 D. None of the above
Q11. Find the median mark of the distribution.
A. 10 B. 12 C. 17 D. 9
Q12. Assuming that the variance of the data set is 0.25, calculate the coefficient of skewness of the distribution.
A. 5.45 B. 4.74 C. 9.48 D. None of the above
Q13. Which of the following is/are disadvantages of using a questionnaire for a survey?
I. Cost involved II. Low response rate III. Irresponsible
A. III only B. I and III only C. II and III only D. I only
Q14. If a statistician says that certain information is unreliable, it means that
A. The procedure used was wrong
B. The sample size used was too small
C. The method and the procedure used were unacceptable
D. The right sampling technique was not used
Q15.
Which of the following statements is true about regression analysis?
A. Regression analysis talks about the linear relationship between the two variables.
B. The error term is always equal to zero.
C. $\beta_0$ is the estimated value of the predictor variable when the response variables are unimportant.
D. None of the above.

Q16. The linear relationship between the Age (X) in years and the Height (Y) in centimeters of a certain kind of animal is given by: $\hat{Y} = 4.12X - 2.21$. If an animal of this kind is 20 years old, estimate the height of the animal.
A. 4.32cm
B. 5.39cm
C. 5.89cm
D. None of the above.

Q17. Which of the following statements is not true about the correlation coefficient (r)?
A. r can be independent of the units in which X and Y are measured.
B. There are a number of ways r can be calculated.
C. r does not depend on which of the variables under study is labelled X and which is labelled Y.
D. All the above statements are true about the correlation coefficient.

Q18. Which of the following is not a type of correlation coefficient?
A. Polychoric correlation coefficient
B. Matthew's correlation coefficient
C. Interclass correlation coefficient
D. All the above are types of correlation coefficient.

Q19. Which of the following statements represents a variable?
A. The volume of a liter bottle
B. The size of chemical containers
C. The number of hours in a day
D. None of the above.

Q20. Which interpretation correctly explains a very high coefficient of variation?
A. The data is highly dispersed about the origin
B. The data is highly skewed
C. The data is highly dispersed about the mean
D. The data is highly distributed about the mean.

Q21. Which of the following is not a true statement about the correlation coefficient (r)?
A. The value of r is independent of the values of the two variables.
B. It is possible to practically get r = 1 or r = -1.
C. r = 0 means that there is no correlation between the two variables concerned.
D. All the above statements about r.
The summary of data obtained when a research was conducted is given as:
$\sum_{i=1}^{10} x_i^2 = 28.64$, $\sum_{i=1}^{10} x_i y_i = 282.4$, $\sum_{i=1}^{10} y_i^2 = 2900$, $\sum_{i=1}^{10} x_i = 16.75$, $\sum_{i=1}^{10} y_i = 170$
Use this information from the data to answer questions 22 - 26.

Q22. Find the correlation coefficient, r, between X and Y.
A. -0.697
B. 0.973
C. 0.172
D. None of the above

Q23. Which of these is a suitable interpretation of the value, r?
A. Strong negative correlation
B. Strong positive correlation
C. Weak negative correlation
D. Moderate positive correlation.

Q24. Assuming simple linear regression, estimate the regression coefficient $\beta$.
A. -4.0
B. 4.5
C. 40
D. None of the above.

Q25. Estimate the coefficient of determination of the study.
A. 48.6%
B. 94.5%
C. 3.9%
D. None of the above.

Q26. Determine the mean value of the response variable of the study.
A. 17.00
B. 168
C. 16.70
D. None of the above

Q27. Which of the following statements is a weakness of using the eyeball fitting method in simple linear regression?
A. Different lines can be drawn by different people for the same data set.
B. Different estimates can be obtained by different people for the same data set.
C. It may not be reliable to use.
D. All the above.

The table below gives a linear relationship between X and Y.
X | 3 | 2 | 4 | 2 | 3 | 2
Y | 2 | 1 | 3 | 1 | 2 | 2
Use this to answer questions 28 to 29.

Q28. Find the simple linear regression equation of Y on X.
A. $\hat{Y} = -0.24 + 0.5X_i$
B. $\hat{Y} = -0.30 + 0.8X_i$
C. $\hat{Y} = 0.30 - 0.8X_i$
D. None of the above

Q29. Estimate the residual when X = 10 and Y = 7.
A. 0.7
B. 14.70
C. -0.70
D. None of the above.

Q30. What is the coefficient of non-determination?
A. Proportion of total variation in the dependent variable that is explained by the variation in the independent variable.
B. Proportion of total variation in the dependent variable that is not explained by the variation in the independent variable.
C. Proportion of total variation explained by the variables.
D. None of the above.
SECTION B
Answer any two questions from this section. All questions carry equal marks.

Q31. (a) What is Regression Analysis?
(b) What is editing in questionnaire designing?
(c) The following are the grades obtained by some ten students in an examination for Physics (P), Mathematics (M) and Chemistry (C), where grade A is the highest score and E is the lowest score.
Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Physics | B | A | D+ | B+ | D+ | A | E | B | A | C+
Maths | C | B+ | C | A | B | C | D | C+ | D | D+
Chem. | C+ | B | B | B | C+ | D | [illegible] | A | B | [illegible]
Using Spearman's rank correlation coefficient, discuss the relationship between each pair of subjects. Comment on the strength of the relationship.

Q32. (a) What is a pre-test in a survey?
(b) Discuss how the eyeball fitting method is used to obtain the regression coefficients $\alpha$ and $\beta$ for estimating the regression model $y = \alpha + \beta x_i + \varepsilon$, where y is the response variable, x is the predictor variable and $\varepsilon$ is the error term.
(c) Briefly discuss the term estimation.
(d) Given the sample data 3, 5, 7, 9 and 6, find the mean, $\bar{x}$, and estimate its standard error, $S_{\bar{x}}$.

Q33. (a) Discuss the types of closed items in questionnaire designing. Illustrate each type with an example.
(b) Sketch a box-and-whisker plot, and discuss any five main points of its importance.
(c) The table below shows the relationship between the Age (Y) and the Weight (X) of some people in a community:
People | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Age (Y) | 10 | 15 | 25 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
Weight (X) | 6 | 10 | 18 | 6 | 14 | 15 | 7 | 18
(i) Using the Kendall's tau rank correlation coefficient, discuss the relationship between the Age and the Weight.
(ii) Estimate the proportion of variation that is accounted for by the variables.

Q34. (a) What is a filter question in questionnaire designing?
(b) Briefly differentiate between data collection technique and sampling technique.
(c) The table below is a distribution of examination results of 70 students.
Class | 45-49 | 50-54 | 55-59 | 60-64 | 65-69 | 70-74 | 75-79 | 80-84 | 85-89 | 90-94 | 95-99
Freq. (as legible in the scan): 1, 2, 14, 4, 7, 9, 16, [remaining entries illegible]
(i) Estimate the averages of the data set.
(ii) Calculate the coefficient of skewness and hence comment on the distribution of the data.
(iii) Find the coefficient of variation and comment on the variability of the distribution.

SOLUTIONS TO SECTION A (2019/2020)
Q1. The correct option is (D)
Q2. The correct option is (A)
Q3. Note: we cannot find the combined median of two non-identical data sets given only their respective medians. Also, we have no information about the data values. The correct option is (D)
Q4. The correct option is (C)
Q5. The correct option is (B)
Q6. The correct option is (C)
Q7. The correct option is (A)
Q8. The correct option is (A)
Q9. The correct option is (D)
Q10. We complete the table by constructing a third row, given below:
x | 10 | 12 | 5 | 17 | 9 | 20 |
Freq. | 55 | 40 | 20 | 18 | 25 | 50 | $\sum f = 208$
fx | 550 | 480 | 100 | 306 | 225 | 1000 | $\sum fx = 2661$
$mean(\bar{x}) = \frac{\sum fx}{\sum f} = \frac{2661}{208} = 12.79326923$
The correct option is (A)
Q11. Since $\sum f$ is even, we have that $\frac{1}{2}\sum f = \frac{1}{2}(208) = 104$. We re-arrange the table in ascending order of age.
x | 5 | 9 | 10 | 12 | 17 | 20
Freq. | 20 | 25 | 55 | 40 | 18 | 50
We look for the 104th and 105th positions by adding frequencies starting from the least age (5 years). The cumulative frequency reaches 100 at age 10 and 140 at age 12, so both positions fall at age 12. Hence, the median is 12.
The correct option is (B)
Q12. $skewness = \frac{3(mean - median)}{\sqrt{variance}} = \frac{3(12.79 - 12)}{\sqrt{0.25}} = 4.74$
The correct option is (B)
Q13. The correct option is (C)
Q14. The correct option is (D)
Q15. The correct option is (D)
Q16. $\hat{y} = 4.12x - 2.21$; when x = 20, $\hat{y} = 4.12(20) - 2.21 = 80.19$ cm.
The correct option is (D)
Q17. The correct option is (D)
Q18. The correct option is (C)
Q19. The correct option is (B)
Q20. The correct option is (C)
Q21. The correct option is (A)
Q22. The correct option is (D), since
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} = \frac{10(282.4) - 16.75(170)}{\sqrt{[10(28.64) - (16.75)^2][10(2900) - (170)^2]}}$
$= -0.9726451274 \approx -0.973$
Q23. The correct option is (A)
Q24. $\beta = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(282.4) - 16.75(170)}{10(28.64) - (16.75)^2} = -4.025695931 \approx -4.0$
The correct option is (A)
Q25. $R^2 = r^2 = (-0.973)^2 = 0.946729 = 94.7\%$
The correct option is (D)
Q26. $\bar{y} = \frac{\sum y}{n} = \frac{170}{10} = 17.00$
The correct option is (A)
Q27. The correct option is (D)
Q28. $\hat{y} = \alpha + \beta x$
X | Y | XY | X^2
3 | 2 | 6 | 9
2 | 1 | 2 | 4
4 | 3 | 12 | 16
2 | 1 | 2 | 4
3 | 2 | 6 | 9
2 | 2 | 4 | 4
$\sum X = 16$, $\sum Y = 11$, $\sum XY = 32$, $\sum X^2 = 46$
$\beta = \frac{6(32) - 16(11)}{6(46) - (16)^2} = \frac{16}{20} = 0.8$
$\alpha = \frac{\sum Y - \beta \sum X}{n} = \frac{11 - 0.8(16)}{6} = -0.30$
Hence, $\hat{y} = -0.30 + 0.8x$
The correct option is (B)
Q29. $e = y - \hat{y}$; when x = 10 and y = 7, $\hat{y} = -0.30 + 0.8(10) = 7.7$; hence, $e = 7 - 7.7 = -0.7$
The correct option is (C)
Q30. The correct option is (B)

SOLUTIONS TO SECTION B
31.(a) Regression analysis represents a mathematical equation that defines the relationship between two or more variables. It seeks to find a model that best describes the relationship that exists between two or more variables. The relationship could be linear or non-linear. There are two main types of regression analysis:
i. Simple Linear Regression Analysis
ii. Multiple Linear Regression Analysis
(b) Editing in questionnaire designing is the first part of questionnaire processing. It deals with assessing the completeness of surveys and preparing them for spreadsheet entry. It involves checking for defective questionnaires and looking out for misprints, as well as illegible and inconsistent responses.
(c) Physics and Maths:
PHYSICS | MATHS | R_P | R_M | D^2 = (R_P - R_M)^2
B | C | 5.5 | 6 | 0.25
A | B+ | 2 | 2 | 0
D+ | C | 8.5 | 6 | 6.25
B+ | A | 4 | 1 | 9
D+ | B | 8.5 | 3 | 30.25
A | C | 2 | 6 | 16
E | D | 10 | 9.5 | 0.25
B | C+ | 5.5 | 4 | 2.25
A | D | 2 | 9.5 | 56.25
C+ | D+ | 7 | 8 | 1
$\sum D^2 = 121.5$
$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(121.5)}{10(10^2 - 1)} = 0.26363$
Physics and Chemistry:
PHYSICS | CHEMISTRY | R_P | R_C | D^2 = (R_P - R_C)^2
B | C+ | 5.5 | 6.5 | 1
A | B | 2 | 3.5 | 2.25
D+ | B | 8.5 | 3.5 | 25
B+ | B | 4 | 3.5 | 0.25
D+ | C+ | 8.5 | 6.5 | 4
A | D | 2 | 9 | 49
E | [illegible] | 10 | 8 | 4
B | A | 5.5 | 1 | 20.25
A | B | 2 | 3.5 | 2.25
C+ | [illegible] | 7 | 10 | 9
$\sum D^2 = 117$
$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(117)}{10(10^2 - 1)} = 0.2909$
32.(a) A pre-test in a survey is when a questionnaire is tested on a small sample of respondents before a full-scale study, in order to identify any problems such as unclear wording or ambiguous sentences.
(b) For the estimation of the parameters using the eyeball fitting method, the first thing to do is to plot a scatter diagram of the response variable, say Y, against the predictor variable, say X; that is, a plot of Y on X.
$y = \alpha + \beta x + \varepsilon$, where $\beta$ is the gradient.
Then follow the steps below to estimate $\alpha$ and $\beta$:
1. Find the centroid of X and Y, that is, the means of X and Y, $(\bar{X}, \bar{Y})$.
2. Draw a vertical line through the point $(\bar{X}, \bar{Y})$. Ensure that it is parallel to the response-variable axis.
3. Identify all the points that are found at the right side and all the points at the left side of the line.
4. Now find the right centroid, $(\bar{X}_R, \bar{Y}_R)$, by considering all the points at the right side of the line. Also find the left centroid, $(\bar{X}_L, \bar{Y}_L)$, by considering all the points at the left side of the line.
5. Plot the line of best fit by passing it through $(\bar{X}, \bar{Y})$ and then as close to $(\bar{X}_R, \bar{Y}_R)$ and $(\bar{X}_L, \bar{Y}_L)$ as possible.
6. Now find the Y-intercept and the slope of the line drawn.
By obtaining $\alpha$ and $\beta$, you are able to obtain the linear regression model $y = \alpha + \beta x_i + \varepsilon$.
(c) Estimation can be defined as the process of using a sample statistic to estimate the population parameter.
For example, the sample mean, $\bar{x}$, is used to estimate the population mean, $\mu$.
(d) Finding the mean first:
$\bar{x} = \frac{\sum x}{n} = \frac{3 + 5 + 7 + 9 + 6}{5} = \frac{30}{5} = 6$
Now $\sum (x - \bar{x})^2 = (3-6)^2 + (5-6)^2 + (7-6)^2 + (9-6)^2 + (6-6)^2 = 20$
$s = \sqrt{\frac{20}{4}} = 2.236$
Therefore $S_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{2.236}{\sqrt{5}} = 0.99997$

33.(a)
i. Multiple-Choice Questions: Here, a number of alternative answers are provided and the respondent is asked to select one or more of the alternatives. Usually, good multiple-choice questions have alternatives which are exhaustive.
Example: Which of the following means of transport do you use to travel to work?
Car [ ]  Bus [ ]  Bicycle [ ]  On foot [ ]  Other (Please specify) ...
ii. Dichotomous Questions: This type of question has only two alternative responses, e.g. Yes or No, Accept or Don't accept, Agree or Disagree, etc. These questions are easy to code and analyze but can also pose problems for the respondents and the researchers.
Example: Are you a citizen of Ghana?
Yes [ ]  No [ ]
iii. Ranking Questions: This is a situation in which a respondent is allowed to select two or more alternatives as answers to a single question. Here, respondents are asked to rank the set of alternatives in order of increasing or decreasing importance.
Example: What attributes do you think are responsible for your company's success? (Indicate by numbering from 1 to 4, in order, where 1 is the least important and 4 the most important.)
Good management [ ]  Employee morale [ ]  Adequate funding [ ]  Loyalty of consumers [ ]
iv. Rating Questions (The Likert Scale): Here, our concern is not about which attribute is more important than the other. Instead, we want to measure the level of importance attached to each one separately using the same scale.
Example: every respondent might be asked to rate Good management from (iii) on a 1 to 5 scale, which may be defined as:
1 = Very unimportant (VU)
2 = Unimportant (U)
3 = Not sure (N)
4 = Important (I)
5 = Very important (VI)
(b). BOX-AND-WHISKER PLOT
[Figure: box-and-whisker plot marking the minimum observation, Q1, Q2, Q3 and the maximum observation, with the H-spread indicated]
where Q1 = lower quartile, Q2 = median, Q3 = upper quartile.
The importance of the box-and-whisker plot:
i. It detects shift in location, that is, whether the distribution is symmetric or skewed.
ii. It is used to identify the extreme observations or outliers.
iii. It is used in comparing variability in two or more datasets.
iv. It provides a visual summary of five key numbers that are associated with the dataset, i.e. the minimum value, the lower quartile, the median, the upper quartile and the maximum value.
v. It is used in comparing location in two or more datasets.

(c) (i) Finding the Kendall's tau rank correlation: there are tied observations, so
$\tau = \frac{S}{\sqrt{[\frac{1}{2}n(n-1) - T_x][\frac{1}{2}n(n-1) - T_y]}}$
Re-arranging the x components in ascending order with their corresponding y values and representing the counts in tabular form gives the totals P = 26 and Q = 2, so S = 26 - 2 = 24.
And also finding $T_x$ and $T_y$:
There is no tied observation in y, so $T_y = 0$.
There are tied observations in x, so $T_x = \frac{1}{2}\sum t_x(t_x - 1)$.
Now, 6 appeared 2 times and 18 appeared 2 times, so
$T_x = \frac{1}{2}\sum t_x(t_x - 1) = \frac{1}{2}[2(2-1) + 2(2-1)] = 2$
Hence, with n = 8, $\tau = \frac{24}{\sqrt{[\frac{1}{2}(8)(7) - 2][\frac{1}{2}(8)(7) - 0]}} = \frac{24}{\sqrt{(26)(28)}} = 0.8895$
Therefore the Kendall's tau correlation coefficient is 0.8895.
(ii) Finding the coefficient of determination: $R^2 = (\tau)^2 = (0.8895)^2 = 0.791 = 79.1\%$

34.(a) Filter questions are questions (typically formatted as "yes or no") meant to help respondents avoid answering questions that do not pertain to them. Respondents who answer "yes" to a filter question are then asked more detailed follow-up questions, whereas those who answer "no" are not questioned further on the topic.
(b) Data collection technique refers to the ways data are collected for a study. We can collect data using these key methods: by observation, by interviews and by questionnaire. Sampling technique, on the other hand, refers to the methods used in selecting a sample for a study. It includes probability sampling and non-probability sampling.
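As a numerical check on the Kendall's tau value in 33.(c), the short Python sketch below recomputes the coefficient from the totals quoted in that solution (P = 26, Q = 2, n = 8, and two tied groups of size 2 in X). The function name and its arguments are hypothetical and chosen only for illustration; the raw Age values are not legible in the scan, so the sketch starts from the totals and does not recount P and Q.

```python
from math import sqrt


def kendall_tau_from_counts(p, q, n, x_tie_sizes=(), y_tie_sizes=()):
    """Tie-corrected Kendall's tau from the totals P and Q.

    p, q         -- totals of 'greater than' and 'less than' counts
    n            -- number of paired observations
    *_tie_sizes  -- size of each group of tied values in X and in Y
    """
    pairs = n * (n - 1) / 2                            # (1/2) n (n - 1)
    t_x = sum(t * (t - 1) for t in x_tie_sizes) / 2    # T_x
    t_y = sum(t * (t - 1) for t in y_tie_sizes) / 2    # T_y
    s = p - q                                          # S = P + (-Q)
    return s / sqrt((pairs - t_x) * (pairs - t_y))


# Totals from 33.(c): the values 6 and 18 each appear twice in X, no ties in Y.
tau = kendall_tau_from_counts(26, 2, 8, x_tie_sizes=(2, 2))
print(round(tau, 4))  # 0.8895
```

Working from the totals keeps the check tied to the figures actually stated in the solution; with the raw pairs available, a library routine could be used to count P and Q directly.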
Class limit | Boundary | Midpoint (x) | Frequency (f) | fx
45-49 | 44.5-49.5 | 47 | 1 | 47
50-54 | 49.5-54.5 | 52 | 2 |
