Course Code: BCS-040
Course Title: Statistical Techniques
Assignment Number: BCA(IV)/040/Assignment/2023-24
Maximum Marks: 100
Weightage: 23%
Last Date of Submission: 31st October, 2023 (For July session)
                         30th April, 2024 (For January session)

Note: This assignment has 8 questions of 80 marks (each question carries equal marks). Answer all the questions. Rest 20 marks are for viva voce. You may use illustrations and diagrams to enhance explanations. Please go through the guidelines regarding assignments given in the Programme Guide for the format of presentation.

Q1. Calculate the mean and standard deviation for the following data: (10)

Interval    Frequency
0 - 10      7
10 - 20     8
20 - 30     10
30 - 40     36
40 - 50     12
50 - 60     17
60 - 70     10

Q2. Given the following sample of 20 numbers: (10)
15 45 52 43 50 59 41 47 56 79
72 18 45 54 78 12 41 48 58 14
(i) Compute mean, variance and standard deviation.
(ii) If the largest value in the above set of numbers is changed to 500, to what extent are the mean and variance affected by the change? Justify your answer.

Q3. (10)
(a) Write two merits and two demerits of Median.
(b) An incomplete frequency distribution is given as follows:

C.I.        Frequency
10 - 20     12
20 - 30     30
30 - 40     ?
40 - 50     65
50 - 60     ?
60 - 70     25
70 - 80     18

Given that median value of 200 observations is 46, determine the missing frequencies using the median formula.

Q4. Box X contains 5 red and 4 blue balls. Box Y contains 2 red and 5 blue balls. A ball is drawn at random from each box. Find the probability of drawing one red and one blue ball. (10)

Q5. A Manager of a car company wants to estimate the relationship between age of cars and annual maintenance cost. The following data from six cars of same model are obtained as:

Age of Car (in years):                         1    2    3    4    5    6
Annual Maintenance Cost (in hundred rupees):   10   15   18   20   25   35
(The ages are not legible in this copy; the values shown are those used in the worked solution.)

(a) Construct a scatter diagram for the data given above.
(b) Fit a best linear regression line, by considering annual maintenance cost as the dependent variable and the age of the car as the independent variable.
(c) Use this regression line to predict the annual maintenance cost for the car of age 8 years. (5)

Q6. Suppose A and B are two independent events, associated with a random experiment. If the probability of occurrence of either A or B equals 0.6, while probability that only A occurs equals 0.4, then determine the probability of occurrence of event B. (10)

Q7. A chemical firm wants to determine how four catalysts differ in yield. The firm runs the experiment in three of its plants, types A, B, C. In each plant, the yield is measured with each catalyst. The yield (in quintals) are as follows:

          Catalyst
Plant     1    2    3    4
A         2    1    2    4
B         3    2    1    3
C         1    3    3    1

(a) Perform an ANOVA and comment whether the yield due to a particular catalyst is significant or not at 5% level of significance. Given F(3,6) = 4.76.
(b) Construct ANOVA table for one-way classification.

Q8. Explain the following with the help of an example each: (10)
a) Binomial distribution
b) t-test for mean
c) Properties of good estimator
d) F-test for Equality of two variances

Copyright with Kunj Publication only. Not for resale. Ph. 8006184581 (Call Us)

Disclaimer/Special Note: These are just the sample of the Answers/Solutions to some of the Questions given in the Assignments. These Sample Answers/Solutions are prepared by Private Teacher/Tutors/Authors for the help and guidance of the student to get an idea of how he/she can answer the Questions given in the Assignments. We do not claim 100% accuracy of these sample answers as these are based on the knowledge and capability of the Private Teacher/Tutor. Sample answers may be seen as the Guide/Help for the reference to prepare the answers of the questions given in the assignment. As these solutions and answers are prepared by the private Teacher/Tutor, the chances of error or mistake cannot be denied. Any Omission or Error is highly regretted though every care has been taken while preparing these Sample Answers/Solutions. Please consult your own Teacher/Tutor before you prepare a particular Answer and for up-to-date and exact information, data and solution. Students must read and refer to the official study material provided by the university.

Q1. Calculate the mean and standard deviation for the following data:

Interval    Frequency
0 - 10      7
10 - 20     8
20 - 30     10
30 - 40     36
40 - 50     12
50 - 60     17
60 - 70     10

Ans: To calculate the mean and standard deviation for grouped data, we first find the midpoint of each interval, use the midpoints (weighted by frequency) to find the mean, and then apply the standard deviation formula.

Step 1: Find the midpoint of each interval.
The midpoint of an interval is the average of its lower and upper bounds.

Interval    Midpoint
0 - 10      5
10 - 20     15
20 - 30     25
30 - 40     35
40 - 50     45
50 - 60     55
60 - 70     65

Step 2: Calculate the sum of (midpoint x frequency) for all intervals.
Sum = (5 x 7) + (15 x 8) + (25 x 10) + (35 x 36) + (45 x 12) + (55 x 17) + (65 x 10)

Step 3: Calculate the total frequency.
Total frequency = 7 + 8 + 10 + 36 + 12 + 17 + 10

Step 4: Calculate the mean (average).
Mean = Sum / Total frequency

Step 5: Calculate the variance (the frequency-weighted average of squared differences from the mean).
Variance = [(5 - Mean)^2 x 7 + (15 - Mean)^2 x 8 + (25 - Mean)^2 x 10 + (35 - Mean)^2 x 36 + (45 - Mean)^2 x 12 + (55 - Mean)^2 x 17 + (65 - Mean)^2 x 10] / Total frequency

Step 6: Calculate the standard deviation (square root of the variance).
Standard deviation = √Variance

Let's perform the calculations:

Step 1: The midpoints are 5, 15, 25, 35, 45, 55 and 65.

Step 2:
Sum = (5 x 7) + (15 x 8) + (25 x 10) + (35 x 36) + (45 x 12) + (55 x 17) + (65 x 10)
Sum = 35 + 120 + 250 + 1260 + 540 + 935 + 650
Sum = 3790

Step 3:
Total frequency = 7 + 8 + 10 + 36 + 12 + 17 + 10
Total frequency = 100

Step 4:
Mean = Sum / Total frequency
Mean = 3790 / 100
Mean = 37.9

Step 5:
Variance = [(5 - 37.9)^2 x 7 + (15 - 37.9)^2 x 8 + (25 - 37.9)^2 x 10 + (35 - 37.9)^2 x 36 + (45 - 37.9)^2 x 12 + (55 - 37.9)^2 x 17 + (65 - 37.9)^2 x 10] / 100
Variance = [(-32.9)^2 x 7 + (-22.9)^2 x 8 + (-12.9)^2 x 10 + (-2.9)^2 x 36 + (7.1)^2 x 12 + (17.1)^2 x 17 + (27.1)^2 x 10] / 100
Variance = [7576.87 + 4195.28 + 1664.10 + 302.76 + 604.92 + 4970.97 + 7344.10] / 100
Variance = 26659.00 / 100
Variance = 266.59

Step 6:
Standard deviation = √Variance
Standard deviation = √266.59
Standard deviation ≈ 16.33

So, the mean is 37.9 and the standard deviation is approximately 16.33.

Q2. Given the following sample of 20 numbers:
15 45 52 43 50 59 41 47 56 79
72 18 45 54 78 12 41 48 58 14

(i) Compute mean, variance and standard deviation.

To compute the mean, variance, and standard deviation of the given sample of 20 numbers, follow these steps:

Step 1: Calculate the mean (average) of the numbers.
Step 2: Calculate the variance by finding the average of the squared differences between each number and the mean.
Step 3: Calculate the standard deviation by taking the square root of the variance.

Let's do the calculations step by step:

Step 1: Calculate the mean (average):
Mean = (15 + 45 + 52 + 43 + 50 + 59 + 41 + 47 + 56 + 79 + 72 + 18 + 45 + 54 + 78 + 12 + 41 + 48 + 58 + 14) / 20
Mean = 927 / 20
Mean = 46.35

Step 2: Calculate the variance:
Variance = Σ[(xi - Mean)^2] / N
where xi is each number in the sample, Mean is the mean calculated in Step 1, and N is the number of data points (20 in this case).

Variance = [(15 - 46.35)^2 + (45 - 46.35)^2 + (52 - 46.35)^2 + (43 - 46.35)^2 + (50 - 46.35)^2 + (59 - 46.35)^2 + (41 - 46.35)^2 + (47 - 46.35)^2 + (56 - 46.35)^2 + (79 - 46.35)^2 + (72 - 46.35)^2 + (18 - 46.35)^2 + (45 - 46.35)^2 + (54 - 46.35)^2 + (78 - 46.35)^2 + (12 - 46.35)^2 + (41 - 46.35)^2 + (48 - 46.35)^2 + (58 - 46.35)^2 + (14 - 46.35)^2] / 20
Variance = (982.8225 + 1.8225 + 31.9225 + 11.2225 + 13.3225 + 160.0225 + 28.6225 + 0.4225 + 93.1225 + 1066.0225 + 657.9225 + 803.7225 + 1.8225 + 58.5225 + 1001.7225 + 1179.9225 + 28.6225 + 2.7225 + 135.7225 + 1046.5225) / 20
Variance = 7306.55 / 20
Variance = 365.3275

Step 3: Calculate the standard deviation:
Standard Deviation = √Variance
Standard Deviation = √365.3275
Standard Deviation = 19.11 (rounded to two decimal places)

So, the computed values are:
Mean = 46.35
Variance = 365.3275
Standard Deviation = 19.11

(If the divisor N - 1 = 19 is used instead, as is conventional for a sample, the variance is 7306.55 / 19 ≈ 384.56 and the standard deviation is about 19.61.)

(ii) If the largest value in the above set of numbers is changed to 500, to what extent are the mean and variance affected by the change? Justify your answer.

To determine how changing the largest value in the set of numbers affects the mean and variance, let's first calculate the mean and variance of the original set of numbers and then compare them with the modified set.
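The comparison described above is easy to cross-check by machine. The following is a minimal sketch, assuming Python's standard `statistics` module and the divide-by-N (population) variance used in this answer; the helper names `summarize` and `replace_largest` are illustrative only.

```python
from statistics import fmean, pvariance

# Sample of 20 numbers from Q2.
data = [15, 45, 52, 43, 50, 59, 41, 47, 56, 79,
        72, 18, 45, 54, 78, 12, 41, 48, 58, 14]


def summarize(values):
    """Return (mean, variance), with the variance divided by N."""
    return fmean(values), pvariance(values)


def replace_largest(values, new_value):
    """Return a copy of values with its largest entry set to new_value."""
    changed = list(values)
    changed[changed.index(max(changed))] = new_value
    return changed


modified = replace_largest(data, 500)
orig_mean, orig_var = summarize(data)
mod_mean, mod_var = summarize(modified)

print(f"original: mean={orig_mean:.2f}, variance={orig_var:.4f}")
print(f"modified: mean={mod_mean:.2f}, variance={mod_var:.4f}")
print(f"change:   mean={mod_mean - orig_mean:+.2f}, "
      f"variance={mod_var - orig_var:+.4f}")
```

Swapping `pvariance` for `statistics.variance` would give the N - 1 version instead; the direction and rough size of the effect are the same either way.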
Original set of numbers (n = 20):
15 45 52 43 50 59 41 47 56 79 72 18 45 54 78 12 41 48 58 14

Step 1: Calculate the mean (average):
Mean = (Sum of all numbers) / (Number of numbers)
Mean = (15 + 45 + 52 + 43 + 50 + 59 + 41 + 47 + 56 + 79 + 72 + 18 + 45 + 54 + 78 + 12 + 41 + 48 + 58 + 14) / 20
Mean = 927 / 20
Mean = 46.35

Step 2: Calculate the variance:
Variance = Σ((xi - Mean)^2) / n
where xi is each individual number in the set.
Variance = [(15 - 46.35)^2 + (45 - 46.35)^2 + ... + (14 - 46.35)^2] / 20
Variance = (982.8225 + 1.8225 + ... + 1046.5225) / 20
Variance = 7306.55 / 20
Variance ≈ 365.33

Now, let's change the largest value, 79, to 500 and recalculate the mean and variance.

Modified set of numbers (n = 20):
15 45 52 43 50 59 41 47 56 500 72 18 45 54 78 12 41 48 58 14

Step 1: Calculate the mean (average):
Mean = (Sum of all numbers) / (Number of numbers)
Mean = (927 - 79 + 500) / 20
Mean = 1348 / 20
Mean = 67.4

Step 2: Calculate the variance:
Variance = Σ((xi - Mean)^2) / n
where xi is each individual number in the set.
Variance = [(15 - 67.4)^2 + (45 - 67.4)^2 + ... + (500 - 67.4)^2 + ... + (14 - 67.4)^2] / 20
Variance = (2745.76 + 501.76 + ... + 187142.76 + ... + 2851.56) / 20
Variance = 203176.80 / 20
Variance = 10158.84

Now, let's compare the original and modified mean and variance:

Original Mean = 46.35
Modified Mean = 67.4
Change in Mean = Modified Mean - Original Mean
Change in Mean = 67.4 - 46.35
Change in Mean = 21.05

Original Variance ≈ 365.33
Modified Variance = 10158.84
Change in Variance = Modified Variance - Original Variance
Change in Variance = 10158.84 - 365.33
Change in Variance ≈ 9793.51

Justification:
When the largest value in the set is changed from 79 to 500, the mean rises from 46.35 to 67.4, an increase of (500 - 79) / 20 = 21.05, or about 45%. The mean uses every observation, so a single extreme value pulls it upward.

The variance is affected far more: it increases from about 365.33 to 10158.84, roughly 28 times its original value (the standard deviation goes from about 19.11 to about 100.79). This is expected because the variance is built from squared deviations, and the single term (500 - 67.4)^2 = 187142.76 accounts for most of the new sum of squares.

In summary, changing the largest value from 79 to 500 leads to a considerable increase in the mean and a much larger increase in the variance. Both measures are sensitive to extreme values, the variance especially so.

Q3.
(a) Write two merits and two demerits of Median.

The median is a measure of central tendency commonly used in data analysis, especially where data is skewed or contains outliers. Here are two merits and two demerits of using the median:

Merits:
1. Robustness to outliers: One of the primary advantages of the median is its robustness to outliers. Outliers are extreme values that can significantly skew the mean (average) of a dataset. Since the median depends only on the middle value(s), it is much less affected by extreme data points, making it a more reliable measure of central tendency in such cases.
2. Applicable to ordinal data: The median can be used with ordinal data, which represents a ranking of values rather than numerical magnitudes. Where precise numerical values are not available or meaningful, the median still allows analysis and comparison based on the order of the data points.

Demerits:
1. Not based on all observations: While the median is robust to outliers, it ignores the actual magnitudes of all values other than the middle one(s), so it does not reflect the overall shape or spread of the data. When the data is symmetrically distributed and free of outliers, the mean, which uses every observation, is usually the better estimate of the centre.
2. Limited use with small sample sizes: The median can be less precise with small sample sizes. Since it only considers one or a few data points in the middle, it may not provide a good estimate of the true central tendency if the sample size is very small. In such cases, the mean might be a better measure.

In summary, the median is a valuable statistical measure due to its robustness to outliers and applicability to ordinal data. However, it has limitations when it comes to representing the overall data distribution and may not be as precise with small sample sizes. Understanding the context and characteristics of the data is essential in choosing the appropriate measure of central tendency for any analysis.

(b) An incomplete frequency distribution is given as follows:

C.I.        Frequency
10 - 20     12
20 - 30     30
30 - 40     ?
40 - 50     65
50 - 60     ?
60 - 70     25
70 - 80     18

Given that median value of 200 observations is 46, determine the missing frequencies using the median formula.

To determine the missing frequencies using the median formula, we need to find the median class first. The median class is the class interval that contains the median value, which in this case is 46. Once we identify the median class, we can use the median formula to find the missing frequencies.

Step 1: Identify the median class.
The median class is the class interval that contains the median value, which is 46.
Median class: 40 - 50

Step 2: Use the median formula to find the missing frequencies.
The median formula is:
Median = L + ((N/2 - F) / f) x C
Where:
L = Lower boundary of the median class
N = Total number of observations
F = Cumulative frequency of the class before the median class
f = Frequency of the median class
C = Width of the median class (upper boundary - lower boundary)

Given information:
N = 200 (total number of observations)
Median = 46
L = 40 (lower boundary of the median class)
f = 65 (frequency of the median class 40 - 50)
C = 50 - 40 = 10 (width of the median class)

Let f1 be the missing frequency of the class 30 - 40 and f2 the missing frequency of the class 50 - 60.

The cumulative frequency (F) of the class before the median class is the sum of the frequencies of all classes below 40:
F = 12 + 30 + f1 = 42 + f1

Substitute into the median formula:
46 = 40 + ((200/2 - (42 + f1)) / 65) x 10
46 - 40 = ((58 - f1) / 65) x 10
6 x 65 / 10 = 58 - f1
39 = 58 - f1
f1 = 19

Now use the fact that the total number of observations (N) is 200 to find f2:
N = Σ Frequency
200 = 12 + 30 + 19 + 65 + f2 + 25 + 18
200 = 169 + f2
f2 = 31

Completed frequency distribution:

C.I.        Frequency
10 - 20     12
20 - 30     30
30 - 40     19
40 - 50     65
50 - 60     31
60 - 70     25
70 - 80     18

Check: the cumulative frequency below 40 is 61 and below 50 is 126, so the 100th observation does fall in the class 40 - 50, and Median = 40 + ((100 - 61) / 65) x 10 = 40 + 6 = 46, as given.

So, the missing frequencies are 19 (class 30 - 40) and 31 (class 50 - 60).

Q4. Box X contains 5 red and 4 blue balls. Box Y contains 2 red and 5 blue balls. A ball is drawn at random from each box.
Find the probability of drawing one red and one blue ball.

To find the probability of drawing one red and one blue ball from the two boxes, we list the ways this can happen and add their probabilities. Let's use R to represent a red ball and B to represent a blue ball.

Box X contains 5 red balls (R) and 4 blue balls (B), 9 balls in all.
Box Y contains 2 red balls (R) and 5 blue balls (B), 7 balls in all.

One red and one blue ball can be drawn in two mutually exclusive ways:
1. Drawing a red ball from Box X and a blue ball from Box Y
2. Drawing a blue ball from Box X and a red ball from Box Y

Case    Box X (5R, 4B)    Box Y (2R, 5B)
1       R                 B
2       B                 R

Since one ball is drawn from each box, the total number of equally likely outcomes is 9 x 7 = 63. Now, let's calculate the probability for each case:

1. Probability of drawing a red ball from Box X and a blue ball from Box Y:
Probability = (Number of ways to draw (R, B)) / (Total number of possible outcomes)
Probability = (Number of red balls in Box X x Number of blue balls in Box Y) / (Total balls in Box X x Total balls in Box Y)
Probability = (5 x 5) / (9 x 7) = 25/63 ≈ 0.397

2. Probability of drawing a blue ball from Box X and a red ball from Box Y:
Probability = (Number of ways to draw (B, R)) / (Total number of possible outcomes)
Probability = (Number of blue balls in Box X x Number of red balls in Box Y) / (Total balls in Box X x Total balls in Box Y)
Probability = (4 x 2) / (9 x 7) = 8/63 ≈ 0.127

Now, to find the overall probability of drawing one red and one blue ball, we add the probabilities of the two cases:
Total Probability = Probability of Case 1 + Probability of Case 2
Total Probability = 25/63 + 8/63 = 33/63 = 11/21 ≈ 0.524

So, the probability of drawing one red and one blue ball is 11/21, approximately 0.524 or 52.4%.

Q5. A Manager of a car company wants to estimate the relationship between age of cars and annual maintenance cost.
The following data from six cars of same model are obtained as:

Age of Car (in years):                         1    2    3    4    5    6
Annual Maintenance Cost (in hundred rupees):   10   15   18   20   25   35
(The ages are not legible in this copy; the values shown are those used in the calculations below.)

(a) Construct a scatter diagram for the data given above.

To construct a scatter diagram for the given data, we plot the age of the cars on the x-axis and the annual maintenance cost (in hundred rupees) on the y-axis. Each data point represents one car from the dataset.

[Figure: scatter diagram of annual maintenance cost (in hundred rupees) against age of car (in years), with the points (1, 10), (2, 15), (3, 18), (4, 20), (5, 25) and (6, 35)]

The scatter diagram visually represents the relationship between the age of the cars and their annual maintenance costs. The points rise steadily from left to right, which indicates a positive and roughly linear relationship: older cars cost more to maintain.

(b) Fit a best linear regression line, by considering annual maintenance cost as the dependent variable and the age of the car as the independent variable.

To fit the best linear regression line for the given data, we need to find the equation of the line that best describes the relationship between the age of the cars and their annual maintenance cost.

The equation of a linear regression line is given by
y = mx + b
where:
- y is the dependent variable (Annual Maintenance Cost in this case).
- x is the independent variable (Age of the Car in years).
- m is the slope of the line.
- b is the y-intercept (the value of y when x is 0).

We can use the least squares method to estimate the values of m and b that minimize the sum of squared differences between the predicted values and the actual values of the dependent variable.

Let's calculate the values of m and b step by step:

Step 1: Calculate the mean of the independent variable (Age of Car) and the dependent variable (Annual Maintenance Cost).
x̄ = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 21 / 6 = 3.5
ȳ = (10 + 15 + 18 + 20 + 25 + 35) / 6 = 123 / 6 = 20.5

Step 2: Calculate the sum of the products of the differences between each data point and the means of x and y.
Σ((xi - x̄)(yi - ȳ)) = (1 - 3.5)(10 - 20.5) + (2 - 3.5)(15 - 20.5) + (3 - 3.5)(18 - 20.5) + (4 - 3.5)(20 - 20.5) + (5 - 3.5)(25 - 20.5) + (6 - 3.5)(35 - 20.5)
= (-2.5)(-10.5) + (-1.5)(-5.5) + (-0.5)(-2.5) + (0.5)(-0.5) + (1.5)(4.5) + (2.5)(14.5)
= 26.25 + 8.25 + 1.25 - 0.25 + 6.75 + 36.25
= 78.5

Step 3: Calculate the sum of the squared differences between each data point and the mean of x.
Σ((xi - x̄)^2) = (1 - 3.5)^2 + (2 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2 + (5 - 3.5)^2 + (6 - 3.5)^2
= (-2.5)^2 + (-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2 + (2.5)^2
= 6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25
= 17.5

Step 4: Calculate the slope (m) of the regression line.
m = Σ((xi - x̄)(yi - ȳ)) / Σ((xi - x̄)^2)
m = 78.5 / 17.5
m ≈ 4.486 (rounded to three decimal places)

Step 5: Calculate the y-intercept (b) of the regression line.
b = ȳ - (m x x̄)
b = 20.5 - (78.5 / 17.5) x 3.5
b = 20.5 - 15.7 = 4.8

So, the best linear regression line that fits the data can be represented by the equation:
Annual Maintenance Cost = 4.486 x Age of Car + 4.8

Note: The slope is rounded to three decimal places for simplicity; the intercept 4.8 is exact. Each additional year of age adds about 4.49 hundred rupees to the annual maintenance cost.

(c) Use this regression line to predict the annual maintenance cost for the car of age 8 years.

To predict the annual maintenance cost for a car of age 8 years, we can use the regression line obtained from the given data. The regression line is a mathematical model that estimates the relationship between the age of cars and annual maintenance costs.
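The least-squares fit from part (b) and the prediction asked for in part (c) can be cross-checked with a few lines of code. This is a sketch only: the ages 1 to 6 are the values assumed in the worked solution (they are not legible in the question itself), and `fit_line` is an illustrative helper, not a library function.

```python
# Ages 1..6 are an assumption carried over from the worked solution.
ages = [1, 2, 3, 4, 5, 6]
costs = [10, 15, 18, 20, 25, 35]  # annual maintenance cost, hundred rupees


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(ages, costs)
predicted = slope * 8 + intercept  # cost for a car aged 8 years

print(f"slope m     = {slope:.4f}")
print(f"intercept b = {intercept:.4f}")
print(f"predicted cost at age 8 = {predicted:.2f} hundred rupees")
```

Working from the deviation sums, as the hand calculation does, keeps the code and the written steps directly comparable.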
Let's first find the equation of the regression line (y = mx + b) using the given data:

Step 1: Calculate the mean of the age of cars (x) and the mean of the annual maintenance cost (y).
Mean of x (age) = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5
Mean of y (annual maintenance cost) = (10 + 15 + 18 + 20 + 25 + 35) / 6 = 123 / 6 = 20.5

Step 2: Calculate the sum of squares of the differences between x and its mean, Σ(x - x̄)^2, and the sum of products of the differences between x and its mean and between y and its mean, Σ(x - x̄)(y - ȳ).
Σ(x - x̄)^2 = (1 - 3.5)^2 + (2 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2 + (5 - 3.5)^2 + (6 - 3.5)^2 = 17.5
Σ(x - x̄)(y - ȳ) = (1 - 3.5)(10 - 20.5) + (2 - 3.5)(15 - 20.5) + (3 - 3.5)(18 - 20.5) + (4 - 3.5)(20 - 20.5) + (5 - 3.5)(25 - 20.5) + (6 - 3.5)(35 - 20.5) = 78.5

Step 3: Calculate the slope (m) of the regression line.
m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2
m = 78.5 / 17.5
m ≈ 4.486

Step 4: Calculate the y-intercept (b) of the regression line.
b = ȳ - m x x̄
b = 20.5 - 4.486 x 3.5
b = 20.5 - 15.7 = 4.8

So the regression line is y = 4.486x + 4.8. Now, we can use this equation to predict the annual maintenance cost for a car of age 8 years:
y = 4.486 x 8 + 4.8
y ≈ 35.89 + 4.8
y ≈ 40.69

The predicted annual maintenance cost for a car of age 8 years is approximately 40.69 hundred rupees (about Rs. 4,069). Since 8 years lies outside the range of ages used to fit the line, this is an extrapolation and should be treated with some caution.

Q6. Suppose A and B are two independent events, associated with a random experiment. If the probability of occurrence of either A or B equals 0.6, while probability that only A occurs equals 0.4, then determine the probability of occurrence of event B.

Let's denote the probability of event A occurring as P(A) and the probability of event B occurring as P(B).

Given information:
1. Probability of occurrence of either A or B (P(A or B)) equals 0.6.
2. Probability that only A occurs (P(A) - P(A and B)) equals 0.4.
Since A and B are independent events, the probability of both A and B occurring (P(A and B)) is given by:
P(A and B) = P(A) x P(B)

Now, let's use the information provided to solve for P(B).

By the addition rule:
P(A or B) = P(A) + P(B) - P(A and B)

From the second condition, P(A) = 0.4 + P(A and B). Substituting this into the addition rule:
0.6 = (0.4 + P(A and B)) + P(B) - P(A and B)

Simplify:
0.6 = 0.4 + P(B)

Now, solve for P(B):
P(B) = 0.6 - 0.4
P(B) = 0.2

Check using independence: P(only A) = P(A) x (1 - P(B)) = 0.8 x P(A) = 0.4, so P(A) = 0.5 and P(A and B) = 0.5 x 0.2 = 0.1. Then P(A or B) = 0.5 + 0.2 - 0.1 = 0.6, as given.

So, the probability of occurrence of event B (P(B)) is 0.2 or 20%.

Q7. A chemical firm wants to determine how four catalysts differ in yield. The firm runs the experiment in three of its plants, types A, B, C. In each plant, the yield is measured with each catalyst. The yield (in quintals) are as follows:

          Catalyst
Plant     1    2    3    4
A         2    1    2    4
B         3    2    1    3
C         1    3    3    1

(a) Perform an ANOVA and comment whether the yield due to a particular catalyst is significant or not at 5% level of significance. Given F(3,6) = 4.76.

To perform the analysis of variance (ANOVA) for the given data, we first set up the null and alternative hypotheses:

Null Hypothesis (H0): The mean yields for all four catalysts are equal.
Alternative Hypothesis (Ha): The mean yield for at least one catalyst differs from the others.

Each catalyst is tried once in each plant, so the plants act as blocks and the data form a two-way classification with one observation per cell. The error degrees of freedom are then (4 - 1) x (3 - 1) = 6, which matches the given critical value F(3,6).

Row and column totals:
Plant totals: A = 2 + 1 + 2 + 4 = 9, B = 3 + 2 + 1 + 3 = 9, C = 1 + 3 + 3 + 1 = 8
Catalyst totals: 1: 2 + 3 + 1 = 6, 2: 1 + 2 + 3 = 6, 3: 2 + 1 + 3 = 6, 4: 4 + 3 + 1 = 8

1. Calculate the grand total (G) and the correction factor (CF):
G = 9 + 9 + 8 = 26, N = 12
CF = G^2 / N = 676 / 12 = 56.3333
(Grand Mean = 26 / 12 ≈ 2.17)

2. Calculate the total sum of squares (TSS):
TSS = Σ(Observation^2) - CF
= (4 + 1 + 4 + 16 + 9 + 4 + 1 + 9 + 1 + 9 + 9 + 1) - 56.3333
= 68 - 56.3333
= 11.6667

3. Calculate the sum of squares between catalysts (SSC):
SSC = (6^2 + 6^2 + 6^2 + 8^2) / 3 - CF
= 172 / 3 - 56.3333
= 57.3333 - 56.3333
= 1.0000

4. Calculate the sum of squares between plants (SSP):
SSP = (9^2 + 9^2 + 8^2) / 4 - CF
= 226 / 4 - 56.3333
= 56.5 - 56.3333
= 0.1667

5. Calculate the error sum of squares (SSE):
SSE = TSS - SSC - SSP
= 11.6667 - 1.0000 - 0.1667
= 10.5000

6. Calculate the degrees of freedom:
df (catalysts) = 4 - 1 = 3
df (plants) = 3 - 1 = 2
df (error) = 3 x 2 = 6
df (total) = 12 - 1 = 11

7. Calculate the mean squares:
MSC = SSC / 3 = 1.0000 / 3 = 0.3333
MSP = SSP / 2 = 0.1667 / 2 = 0.0833
MSE = SSE / 6 = 10.5000 / 6 = 1.7500

8. Calculate the F-ratio for catalysts:
F = MSC / MSE
= 0.3333 / 1.7500
= 0.19 (rounded to two decimal places)

Source | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Value
Catalysts | 1.0000 | 3 | 0.3333 | 0.19
Plants | 0.1667 | 2 | 0.0833 | 0.05
Error | 10.5000 | 6 | 1.7500 |
Total | 11.6667 | 11 | |

Now, we compare the calculated F-ratio (0.19) with the critical F-value at 5% level of significance (given as F(3,6) = 4.76).

Since the calculated F-ratio (0.19) is less than the critical F-value (4.76), we fail to reject the null hypothesis (H0). In conclusion, the difference in yield due to the catalysts is not statistically significant at the 5% level of significance.

(b) Construct ANOVA table for one-way classification.

To construct the ANOVA (Analysis of Variance) table for one-way classification, the catalysts are treated as the groups (treatments) and the plant effect is ignored. We follow these steps:

Step 1: Calculate the overall mean (grand mean) of all the data points.
Step 2: Calculate the sum of squares for each source of variation (treatments, error, and total).
Step 3: Determine the degrees of freedom for each source of variation.
Step 4: Calculate the mean square for each source of variation.
Step 5: Calculate the F-statistic by dividing the mean square for treatments by the mean square for error.
Step 6: Determine the critical F-value at a chosen significance level and degrees of freedom for treatments and error.
Step 7: Compare the calculated F-statistic with the critical F-value to determine if there is a significant difference among the means of different catalysts.

Given data:

          Catalyst
Plant     1    2    3    4
A         2    1    2    4
B         3    2    1    3
C         1    3    3    1

Let's calculate the ANOVA table step by step:

Step 1: Calculate the overall mean (grand mean).
Grand Mean = (Sum of all observations) / (Total number of observations)
Grand Mean = (2 + 1 + 2 + 4 + 3 + 2 + 1 + 3 + 1 + 3 + 3 + 1) / 12
Grand Mean = 26 / 12
Grand Mean ≈ 2.1667

Step 2: Calculate the sum of squares for each source of variation.

The groups are the four catalysts, each with n = 3 observations:
Catalyst 1: 2, 3, 1 (mean 2)
Catalyst 2: 1, 2, 3 (mean 2)
Catalyst 3: 2, 1, 3 (mean 2)
Catalyst 4: 4, 3, 1 (mean 2.6667)

(a) Sum of squares between treatments (SS_between):
SS_between = n x Σ((Mean of each group - Grand Mean)^2)
SS_between = 3 x ((2 - 2.1667)^2 + (2 - 2.1667)^2 + (2 - 2.1667)^2 + (2.6667 - 2.1667)^2)
SS_between = 3 x (0.0278 + 0.0278 + 0.0278 + 0.2500)
SS_between = 3 x 0.3333
SS_between = 1.0000

(b) Sum of squares within treatments (SS_within):
SS_within = Σ((Individual value - Respective group mean)^2)
Catalyst 1: (2 - 2)^2 + (3 - 2)^2 + (1 - 2)^2 = 0 + 1 + 1 = 2
Catalyst 2: (1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2 = 1 + 0 + 1 = 2
Catalyst 3: (2 - 2)^2 + (1 - 2)^2 + (3 - 2)^2 = 0 + 1 + 1 = 2
Catalyst 4: (4 - 2.6667)^2 + (3 - 2.6667)^2 + (1 - 2.6667)^2 = 1.7778 + 0.1111 + 2.7778 = 4.6667
SS_within = 2 + 2 + 2 + 4.6667
SS_within = 10.6667

(c) Total sum of squares (SS_total):
SS_total = SS_between + SS_within
SS_total = 1.0000 + 10.6667
SS_total = 11.6667

Step 3: Determine the degrees of freedom for each source of variation.

(a) Degrees of freedom between treatments (df_between):
df_between = Number of groups - 1
df_between = 4 - 1
df_between = 3

(b) Degrees of freedom within treatments (df_within):
df_within = Total number of observations - Total number of groups
df_within = 12 - 4
df_within = 8

(c) Total degrees of freedom (df_total):
df_total = Total number of observations - 1
df_total = 12 - 1
df_total = 11

Step 4: Calculate the mean square for each source of variation.

(a) Mean square between treatments (MS_between):
MS_between = SS_between / df_between
MS_between = 1.0000 / 3
MS_between = 0.3333

(b) Mean square within treatments (MS_within):
MS_within = SS_within / df_within
MS_within = 10.6667 / 8
MS_within = 1.3333

Step 5: Calculate the F-statistic.
F-statistic = MS_between / MS_within
F-statistic = 0.3333 / 1.3333
F-statistic = 0.25

Step 6: Determine the critical F-value.
To determine the critical F-value, we need the significance level (α) and degrees of freedom for treatments (df_between) and error (df_within). Let's take a significance level of α = 0.05.

From statistical tables or software, for α = 0.05, df_between = 3, and df_within = 8, the critical F-value is approximately 4.07.

Step 7: Compare the calculated F-statistic with the critical F-value.
Since the calculated F-statistic (0.25) is less than the critical F-value (4.07), we fail to reject the null hypothesis. It means that there is no significant difference among the means of different catalysts at the chosen significance level.

The ANOVA table would look like this:

Source | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Value
Between (Treatments) | 1.0000 | 3 | 0.3333 | 0.25
Within (Error) | 10.6667 | 8 | 1.3333 |
Total | 11.6667 | 11 | |

Note: The values in the table are rounded for simplicity. In practice, you would use more decimal places for calculations.

Q8. Explain the following with the help of an example each:

a) Binomial distribution

The binomial distribution is a probability distribution that models the number of successes in a fixed number of independent Bernoulli trials. Each trial can result in one of two possible outcomes: success (usually denoted as "S") with a probability of p, or failure (usually denoted as "F") with a probability of q = 1 - p.

The key characteristics of the binomial distribution are:
- The trials are independent, meaning the outcome of one trial does not affect the outcome of the other trials.
- The number of trials, denoted as n, is fixed.
- Each trial has the same probability of success, denoted as p.

Example:
Let's consider an example of flipping a fair coin. In this case, the probability of getting a head (success) is p = 0.5, and the probability of getting a tail (failure) is q = 0.5.
Now, suppose we want to find the probability of getting exactly 3 heads in 5 coin flips (n = 5). To calculate this probability, we can use the binomial distribution formula:

P(X = k) = (n choose k) * p^k * q^(n-k)

where:
P(X = k) is the probability of getting k successes (in this case, 3 heads),
(n choose k) is the binomial coefficient, which represents the number of ways to choose k successes out of n trials, calculated as (n choose k) = n! / (k! * (n - k)!),
p is the probability of success (getting a head), which is 0.5 in this example,
q is the probability of failure (getting a tail), which is also 0.5 in this example,
k is the number of successes we are interested in (in this case, 3).

Let's calculate the probability of getting exactly 3 heads in 5 coin flips using the formula:

P(X = 3) = (5 choose 3) * 0.5^3 * 0.5^(5-3)
P(X = 3) = (5! / (3! * (5 - 3)!)) * 0.5^3 * 0.5^2
P(X = 3) = 10 * 0.125 * 0.25
P(X = 3) = 0.3125

So, the probability of getting exactly 3 heads in 5 coin flips is 0.3125, or 31.25%.

b) t-test for mean

The t-test is a statistical hypothesis test used to determine whether there is a significant difference between the means of two groups. It is commonly employed when comparing the means of two samples to assess whether the observed difference is due to chance or represents a real effect.

Example:
Let's say a pharmaceutical company is testing a new drug to lower blood pressure. They conduct an experiment with two groups of participants:
Group A: Receives the new drug.
Group B: Receives a placebo (inactive substance).
The researchers want to know if the new drug has a significant effect on reducing blood pressure compared to the placebo.
They measure the blood pressure of both groups after a month of treatment and collect the following data:
Group A (Drug): [130, 125, 135, 128, 132]
Group B (Placebo): [135, 138, 132, 140, 137]

To determine if there is a statistically significant difference between the means of the two groups, the researchers can use the t-test for mean. The t-test assesses whether the difference in the means is large compared to the variation within the groups.

For these data the sample means are 130 (drug) and 136.4 (placebo). A pooled two-sample t-test with 5 + 5 - 2 = 8 degrees of freedom gives t ≈ -2.93, and the corresponding two-sided p-value is approximately 0.019. The p-value is the probability of observing a difference at least as extreme as the one in the sample, under the assumption that there is no real difference between the groups. In this case, a p-value of 0.019 indicates that there is roughly a 1.9% chance of obtaining results this extreme if the drug had no effect.

Typically, researchers set a significance level (alpha), commonly 0.05 (5%), as a threshold. If the p-value is less than or equal to the significance level, they reject the null hypothesis and conclude that there is a significant difference in blood pressure between the drug and placebo groups. If the p-value is greater than the significance level, they fail to reject the null hypothesis: the data do not show a significant difference between the means of the two groups.

In this example, since the p-value (0.019) is less than the significance level (0.05), the researchers can conclude that the new drug has a statistically significant effect on reducing blood pressure compared to the placebo.

c) Properties of good estimator

In statistics, an estimator is a rule or formula used to estimate an unknown parameter or quantity from a sample of data. A good estimator possesses certain desirable properties that make it reliable, efficient, and unbiased in estimating the population parameter.
Here are some essential properties of a good estimator, along with an example for each:

1. Unbiasedness: An estimator is unbiased if its expected value is equal to the true value of the parameter being estimated. In other words, on average, the estimator neither overestimates nor underestimates the parameter.
Example: Let's say we want to estimate the average height of students in a school. We randomly select a sample of 50 students and measure their heights. Any single sample mean may be a little above or below the true average, but averaged over all possible random samples it equals the true average height of all students in the school. The sample mean is therefore an unbiased estimator of the population mean height.

2. Efficiency: An estimator is efficient if it has the smallest variance among all unbiased estimators of the parameter. An efficient estimator provides the most precise estimates, with the least sampling variability.
Example: In the context of estimating the variance of exam scores, suppose two different estimators are computed using different formulas. If both are unbiased and one has a smaller sampling variance than the other, then that one is considered more efficient.

3. Consistency: An estimator is consistent if, as the sample size increases, the estimator approaches the true value of the parameter. In other words, it converges to the correct value as more data is collected.
Example: Suppose we are estimating the probability of a coin landing heads up. As we flip the coin more and more times, the proportion of heads in the sample approaches the true probability of a head, so the sample proportion is a consistent estimator.

4. Sufficiency: An estimator is sufficient if it uses all the relevant information in the data and does not depend on unnecessary details.
It captures all the essential information required to estimate the parameter.
Example: When estimating the mean score of students in a class, the sample mean is a sufficient estimator because it is computed from every individual score and ignores details, such as the order of the scores, that are irrelevant to the mean.

5. Robustness: An estimator is robust if it remains reasonably accurate even when the underlying assumptions are violated or the data contains outliers or extreme values.
Example: The median is a robust estimator for the center of a data set because it is less affected by outliers than the mean.

6. Asymptotic Normality: An estimator is asymptotically normal if, as the sample size approaches infinity, its distribution becomes approximately normal. This property allows us to use various statistical techniques based on the normal distribution.
Example: In large samples, the sample mean of a random variable with a finite variance is approximately normally distributed, due to the central limit theorem.

These properties help assess the quality of an estimator and its suitability for making reliable inferences about the population parameter based on sample data.

d) F-test for Equality of two variances

The F-test for equality of two variances is a statistical test used to determine whether two populations have equal variances. It is based on the F-distribution and is commonly used in various fields, such as statistics, engineering, and scientific research. The test is important because it compares the variability or spread of data in two groups and is often a prerequisite for other statistical tests, such as the t-test.

Here's how the F-test for equality of two variances works, along with an example:

Example:
Let's say we have two groups of students, Group A and Group B, and we want to compare their scores on a mathematics test. We want to determine if the variance of scores in Group A is significantly different from the variance of scores in Group B.

1.
Hypotheses:
The null hypothesis (H0): The variances of scores in Group A and Group B are equal.
The alternative hypothesis (Ha): The variances of scores in Group A and Group B are not equal.

2. Data collection:
For each group, we collect the test scores of a random sample of students.
Group A scores: 85, 88, 90, 82, 87, 91, 86, 84, 83, 89
Group B scores: 78, 80, 79, 81, 82, 84, 86, 83, 85, 87

3. Calculate sample variances:
Next, we calculate the sample variance for each group. The sample means are 86.5 for Group A and 82.5 for Group B.
Sample variance of Group A (s_A^2) = [(85 - 86.5)^2 + (88 - 86.5)^2 + ... + (89 - 86.5)^2] / (n_A - 1) = 82.5 / 9 ≈ 9.17
Sample variance of Group B (s_B^2) = [(78 - 82.5)^2 + (80 - 82.5)^2 + ... + (87 - 82.5)^2] / (n_B - 1) = 82.5 / 9 ≈ 9.17

4. Compute the F-statistic:
The F-statistic is calculated as the ratio of the larger sample variance to the smaller sample variance.
F = s_A^2 / s_B^2 if s_A^2 > s_B^2
F = s_B^2 / s_A^2 if s_B^2 > s_A^2

5. Determine the critical value and compare with the F-statistic:
Using the F-distribution table and the degrees of freedom for each group (n_A - 1 and n_B - 1), we can find the critical value. If the calculated F-statistic is greater than the critical value, we reject the null hypothesis and conclude that the variances are significantly different. Otherwise, we fail to reject the null hypothesis: the data are consistent with equal variances.

6. Interpretation:
For the samples above the two sample variances are equal, so F = 9.17 / 9.17 = 1.00. With (9, 9) degrees of freedom, the tabulated critical value at the 5% level (α = 0.05) is 3.18. Since 1.00 < 3.18, we fail to reject the null hypothesis. The same conclusion would hold for any ratio below the critical value, for instance F = 2.4. This means that, at the chosen level of significance, we do not have enough evidence to claim that the variances of scores in Group A and Group B are different. Therefore, we assume their variances are equal for further analysis.

Note: The F-test is sensitive to departures from normality, particularly when the sample sizes are small. In such cases it may not be reliable, and other tests like Levene's test or the Brown-Forsythe test can be used to assess variance equality.
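
The arithmetic in steps 3 to 6 can be checked with a short script. The following is a minimal sketch using only Python's standard library on the Group A and Group B scores listed above. The function name `variance_ratio` is an illustrative choice, not part of the original solution, and the critical value 3.18 is simply the table value quoted in the text for (9, 9) degrees of freedom.

```python
from statistics import mean, variance

# Scores as listed in step 2 of the example
group_a = [85, 88, 90, 82, 87, 91, 86, 84, 83, 89]
group_b = [78, 80, 79, 81, 82, 84, 86, 83, 85, 87]


def variance_ratio(x, y):
    """Return (F, df_numerator, df_denominator), larger sample variance on top."""
    var_x = variance(x)  # sample variance, divisor n - 1
    var_y = variance(y)
    if var_x >= var_y:
        return var_x / var_y, len(x) - 1, len(y) - 1
    return var_y / var_x, len(y) - 1, len(x) - 1


f_stat, df_num, df_den = variance_ratio(group_a, group_b)

# Assumption: critical value copied from the text, not computed here
critical_value = 3.18

print(f"mean A = {mean(group_a)}, mean B = {mean(group_b)}")
print(f"s_A^2 = {variance(group_a):.4f}, s_B^2 = {variance(group_b):.4f}")
print(f"F = {f_stat:.4f} with ({df_num}, {df_den}) degrees of freedom")
print("reject H0" if f_stat > critical_value else "fail to reject H0")
```

Both samples consist of ten consecutive integers, so each has a sum of squared deviations of 82.5 and a sample variance of 82.5 / 9. The ratio is therefore exactly 1, well below the critical value.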
