Project by Rebecca Winzer University of Idaho STAT 301 5 December 2012
Winzer 1

The purpose of this study is to determine the source of size discrepancies in brake wheels. These discrepancies can be attributed to three different operators as well as three different machines, so this study will be multi-variable and rather complex. I expect to find that the error can be attributed to at least one of the operators or one of the machines, but the errors could accumulate from more than one source. This particular data set interested me because of the complexity and the wide range of possibilities for the source of the greatest error; I thought it would be a great study for applying many of the statistical tests learned over the semester.

For each of machines one, two, and three, 15 samples are collected, and 5 of those samples are paired with each of operators one, two, and three. Therefore each measurement depends on both a machine and an operator throughout the study. For these particular brake wheels, the population is the total set of brake wheels made by those specific machines and operators. The sample is random and is drawn from that population so that it reflects the population as a whole.

There are concerns about statistical bias throughout the study. Representativeness is a potential issue: if a sample does not represent the true population, results will be skewed and biased. Fortunately the subject of the study is not complex. Brake wheels themselves pose no bias threat because they are inanimate objects with no ability to influence the data collected. The factor that we have to take into account is selection bias, in other words equal representation for all of the different combinations. So we need an equal number of samples for each machine/operator combination, which the data set does have. There are other potential bias threats. The potential omission of information has to be understood.
Was it just the machines and operators that played a role in the size discrepancy of the brake wheels, or have other steps in the process been omitted because they were deemed negligible? We put some trust in the data collectors that they appropriately analyzed all of the factors that went into making these brake wheels and that they have given us an accurate reflection of these discrepancies. Detection bias might also play a role: the data collectors might have been biased towards choosing brake wheels with discrepancies because that is what they were looking for when collecting the data. We have to consider how the data was taken. We trust that the data collectors collected the data as randomly as they possibly could for each machine and operator. Reporting bias might be another contributing factor. We must assume that only 5 samples were collected at random for each combination and that the samples were not hand-picked from a larger data collection. I believe the sample to be representative of the true population because we have an equal number of samples for each machine/operator combination and good reason to trust that our data collectors are competent.

I plotted a boxplot with the difference as the graph variable and the three machines and three operators as the categorical variables. The boxplot showed that the overall error was roughly equal across the machines. Most of the error appears to come from the operators. When looking at the boxplot, one sees that operators one, two, and three are very different from one another. Operator two appears to contribute the greatest amount of error and operator three the least. But the difference between operators one and three, and between operators two and three, is quite drastic.
It is true that operator two contributes the greatest amount of error, but the drastic difference between the operators' outputs appears to contribute the greatest discrepancy. When looking at the descriptive statistics grouped by machine one, two, and three, one sees great variation in the operators' mean values. When the results are grouped by operator, the machines have mean values that are very close to one another, and their maximum and minimum values are very close as well. For the machine results, there is a great amount of variation between the operators.

Are the mean values for machines one, two, and three essentially equal? We perform a two-way ANOVA test of the difference versus machine and operator. Our null hypothesis is that the mean values of machines one, two, and three are equal. Our alternative hypothesis is that at least two of the three mean values are different. I performed the test and found a p-value of 0.391, which is much greater than the 0.05 significance level. We cannot reject the null hypothesis, so the data give no evidence that the machine means differ.

Are the mean values for operators one, two, and three essentially equal? Again we perform the two-way ANOVA test. Our null hypothesis is that the mean values of operators one, two, and three are equal. Our alternative hypothesis is that at least two of the three mean values are different. I performed the test and found a p-value of 0.000 for operators. Since 0.000 is less than the significance level of 0.05, we must reject the null hypothesis. Therefore we accept the alternative hypothesis that at least two of the operator mean values are different.

Are the mean values for the combinations of machines and operators essentially equal? We perform a two-way ANOVA test. Our null hypothesis is that the mean values of the machine and operator combinations are equal once the separate machine and operator effects are accounted for (that is, there is no interaction).
Our alternative hypothesis is that at least two of the combination means differ beyond what the separate machine and operator effects explain. We perform the test and get a p-value of 0.896 for the interaction, which is much greater than the 0.05 significance level. We cannot reject the null hypothesis, so there is no evidence of a machine/operator interaction.

The statistical inference data show that the operators are contributing to the discrepancy. To further emphasize this conclusion, I performed one-way ANOVA tests of difference versus machine and difference versus operator, using the Tukey method for both. For the operators, operators one and two were in grouping A, but operator three was in grouping B. The method shows the large difference between operator three and the other two operators. Again we get a p-value of 0.000, which shows the difference in the mean values of the operators. For the machines, machines one, two, and three are all in grouping A. The p-value is 0.499, which is much greater than 0.05, so it again gives no evidence that the machine means differ. The method was appropriate because we want to find what is causing the largest discrepancy in brake wheel size. The important information is the difference measured for each operator and machine combination.

I collected a group of four residual plots for difference. In the normal probability plot, the residuals follow the line overall. The plot shows that the residuals are approximately normally distributed, as the model assumes. This plot shows that the data collectors did a good job in reflecting all of the variables that contribute to the difference in brake wheel size.

In the versus fits graph, there is a little change in variability across the plot, but not too much. It shows that the residuals have roughly constant variance and that the model fits the data reasonably well.
The histogram does not quite show a bell curve, but we also have to take into account the very small sample that we are working from. The larger the sample, the more closely the histogram will reflect a normal bell curve. For the most part, though, the histogram has a bell curve shape, and therefore the residuals appear to be approximately normally distributed.

The residuals versus observation order graph is very unpredictable and reflects no pattern, which is good. This graph shows that the residuals are independent of one another, so
the independence condition is met. These graphs show that there are no serious violations of the linear model assumptions.

From our studies we can conclude that the operators are the greatest source of error when making the brake wheels. Our descriptive statistics show that the greatest discrepancy occurs among the operators. They have the greatest difference among their mean values. The boxplot confirms this. It shows that the machines overall have around the same level of error, but the error contributed by the operators is very widespread. In our statistical inference data, only the operators had unequal mean values of error. For both machines and machine/operator combinations the error was essentially equal. Of the three operators, two differed greatly from the third: operators one and two were drastically different from operator three (operators one and two were in grouping A while operator three was in grouping B).

One question naturally arises: will changing the operators also change the machines' output? We can address this question. Our residuals versus observation order graph shows that there is independence among the residuals, and the interaction term in the two-way ANOVA was not significant (p = 0.896). Therefore we can conclude that operators and machines are (for the most part) independent of one another and that changing one will not change the other. We found our large source of error: fluctuation among the operators.

To further our study of this problem, I suggest that we use a much larger sample size. Instead of 5 samples for each machine/operator combination, I advise that we take 50. We must still use all of the methods needed to obtain a completely random sample, but taking a larger sample would contribute to more accurate results. From our studies, I would also propose that we take a much closer look at the operators. They appear to be the leading cause of brake wheel size discrepancy, and addressing the variation among operators would help diminish the variance.
For someone conducting a similar study, I would first ask how the operators specifically function in making the brake wheels. Is there a distinct function the operators perform that creates the variation? Are there multiple factors? There must be many elements that contribute to how an operator functions, so I suggest that anyone conducting a similar study take these factors into account. The factors that contribute to machine output and operator output were not really considered in this study. Since there are multiple contributing factors to operator output, they must also be considered in future studies to improve efficiency.
Descriptive Statistics: Difference
Results for Machine = 1
Variable    Operator   Mean  SE Mean  StDev  Variance  Minimum     Q1  Median
Difference  1         2.800    0.255  0.570     0.325    2.000  2.250   3.000
            2         3.400    0.292  0.652     0.425    3.000  3.000   3.000
            3         2.300    0.255  0.570     0.325    1.500  1.750   2.500
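The SE Mean and Variance columns follow directly from the StDev column and the number of observations in each machine/operator cell. The short Python sketch below illustrates that relationship, assuming n = 5 per cell as described in the study design; the variable and function names are illustrative only and are not part of the Minitab output.

```python
import math

# StDev values for Machine 1, by operator, copied from the table above.
stdev_by_operator = {1: 0.570, 2: 0.652, 3: 0.570}
n = 5  # assumed observations per machine/operator cell, per the study design


def se_mean(stdev, n):
    """Standard error of the mean: StDev / sqrt(n)."""
    return stdev / math.sqrt(n)


# Print operator, SE Mean, and Variance (StDev squared), rounded like the table.
for operator, sd in stdev_by_operator.items():
    print(operator, round(se_mean(sd, n), 3), round(sd ** 2, 3))
```

Because the StDev values in the table are already rounded, the recomputed figures should only be expected to agree with the table to about three decimal places.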
Statistical Inference Data for Machines and Operators
Two-way ANOVA: Difference versus Machine, Operator
Source       DF       SS       MS      F      P
Machine       2   0.8778  0.43889   0.96  0.391
Operator      2   9.2111  4.60556  10.11  0.000
Interaction   4   0.4889  0.12222   0.27  0.896
Error        36  16.4000  0.45556
Total        44  26.9778
S = 0.6749 R-Sq = 39.21% R-Sq(adj) = 25.70%
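The F, P, S, and R-Sq figures above can be checked by hand from the SS and DF columns. The sketch below is an illustration only, not the procedure Minitab uses internally. The closed-form p-value applies only to terms with 2 numerator degrees of freedom (Machine and Operator here), so no p-value is computed for the interaction. All names are hypothetical.

```python
import math

# Degrees of freedom and sums of squares copied from the two-way ANOVA table.
table = {
    "Machine": (2, 0.8778),
    "Operator": (2, 9.2111),
    "Interaction": (4, 0.4889),
}
df_error, ss_error = 36, 16.4000
ss_total = 26.9778

ms_error = ss_error / df_error


def f_ratio(df, ss):
    """F statistic: the term's mean square divided by the error mean square."""
    return (ss / df) / ms_error


def p_value_df1_is_2(f, df2):
    """Upper-tail F probability; this closed form is valid only when the
    numerator degrees of freedom equal 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


for name, (df, ss) in table.items():
    f = f_ratio(df, ss)
    if df == 2:
        print(name, round(f, 2), round(p_value_df1_is_2(f, df_error), 3))
    else:
        print(name, round(f, 2))  # no closed form used for this term

s = math.sqrt(ms_error)              # S is the square root of the error MS
r_squared = 1 - ss_error / ss_total  # share of total variation explained
print(round(s, 4), round(100 * r_squared, 2))
```

A general F-distribution routine from a statistics library would be needed to reproduce the interaction p-value.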
Individual 95% CIs For Mean Based on Pooled StDev

Machine     Mean
1        2.83333
2        3.06667
3        3.16667

[Interval plot of the 95% CIs for each machine mean; axis from 2.70 to 3.60]
Individual 95% CIs For Mean Based on Pooled StDev

Operator     Mean
1         3.10000
2         3.53333
3         2.43333

[Interval plot of the 95% CIs for each operator mean; axis from 2.50 to 4.00]
One-way ANOVA: Difference versus Operator
Source    DF      SS     MS      F      P
Operator   2   9.211  4.606  10.89  0.000
Error     42  17.767  0.423
Total     44  26.978
S = 0.6504 R-Sq = 34.14% R-Sq(adj) = 31.01%
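The one-way F for Operator (10.89) is slightly larger than the two-way F (10.11). When Machine is left out of the model, its sum of squares and the interaction's sum of squares are absorbed into the error term along with their degrees of freedom. The sketch below illustrates that bookkeeping using the two-way table values; the names are hypothetical.

```python
import math

# Sums of squares and degrees of freedom copied from the two-way ANOVA table.
ss = {"Machine": 0.8778, "Operator": 9.2111, "Interaction": 0.4889, "Error": 16.4000}
df = {"Machine": 2, "Operator": 2, "Interaction": 4, "Error": 36}

# Dropping Machine from the model moves its SS and df, and the interaction's,
# into the error term of the one-way analysis.
pooled_terms = ("Machine", "Interaction", "Error")
ss_error_one_way = sum(ss[t] for t in pooled_terms)
df_error_one_way = sum(df[t] for t in pooled_terms)

ms_error_one_way = ss_error_one_way / df_error_one_way
f_operator = (ss["Operator"] / df["Operator"]) / ms_error_one_way
s = math.sqrt(ms_error_one_way)  # pooled standard deviation

print(round(ss_error_one_way, 3), df_error_one_way)
print(round(f_operator, 2), round(s, 4))
```

The operator mean square is the same in both analyses. Only the error mean square changes, which is why the two F statistics are close but not identical.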
Individual 95% CIs For Mean Based on Pooled StDev

Level   N    Mean   StDev
1      15  3.1000  0.5412
2      15  3.5333  0.8550
3      15  2.4333  0.4952

[Interval plot of the 95% CIs for each operator mean; axis from 2.50 to 4.00]
Pooled StDev = 0.6504
Grouping Information Using Tukey Method
Operator   N    Mean  Grouping
2         15  3.5333  A
1         15  3.1000  A
3         15  2.4333  B
Means that do not share a letter are significantly different.
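The A/B grouping can be understood through the Tukey honestly significant difference (HSD): two means are declared different when they are farther apart than q * sqrt(MSE / n). The sketch below is an illustration under one stated assumption. The critical value q of about 3.44 (alpha = 0.05, 3 groups, 42 error degrees of freedom) is an approximate figure from a standard studentized range table and does not appear in the Minitab output. All names are hypothetical.

```python
import math
from itertools import combinations

# Operator means, error mean square, and per-level sample size taken from
# the one-way ANOVA output for Operator.
means = {1: 3.1000, 2: 3.5333, 3: 2.4333}
ms_error = 0.423
n_per_level = 15

# ASSUMPTION: approximate studentized range critical value for alpha = 0.05,
# 3 groups, 42 error df, read from a standard table (not from the output).
Q_CRIT = 3.44

hsd = Q_CRIT * math.sqrt(ms_error / n_per_level)


def significantly_different(a, b):
    """True when the two operator means differ by more than the HSD."""
    return abs(means[a] - means[b]) > hsd


print("HSD:", round(hsd, 3))
for a, b in combinations(sorted(means), 2):
    gap = abs(means[a] - means[b])
    print(a, b, round(gap, 4), significantly_different(a, b))
```

Under that assumption, only the operator one versus operator two gap falls below the threshold. This is consistent with operators one and two sharing grouping A while operator three stands alone in grouping B.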
Tukey 95% Simultaneous Confidence Intervals All Pairwise Comparisons among Levels of Operator
One-way ANOVA: Difference versus Machine

Source   DF      SS     MS     F      P
Machine   2   0.878  0.439  0.71  0.499
Error    42  26.100  0.621
Total    44  26.978
S = 0.7883 R-Sq = 3.25% R-Sq(adj) = 0.00%
Individual 95% CIs For Mean Based on Pooled StDev

Level   N    Mean   StDev
1      15  2.8333  0.7237
2      15  3.0667  0.7988
3      15  3.1667  0.8381

[Interval plot of the 95% CIs for each machine mean; axis from 2.70 to 3.60]
Pooled StDev = 0.7883
Grouping Information Using Tukey Method
Machine   N    Mean  Grouping
3        15  3.1667  A
2        15  3.0667  A
1        15  2.8333  A
Means that do not share a letter are significantly different.
Tukey 95% Simultaneous Confidence Intervals All Pairwise Comparisons among Levels of Machine