MS 5318 HW2 Lam Wing Yan
1. (5 pts) Do poets die young? According to William Butler Yeats, “She is the Gaelic
muse, for she gives inspiration to those she persecutes. The Gaelic poets die young, for
she is restless, and will not let them remain long on earth.” One study designed to
investigate this issue examined the age at death for writers from different cultures and
genders. Three categories of writers examined were novelists, poets, and nonfiction
writers. The ages at death for female writers in these categories from North America are
given in poets.xls. Most of the writers are from the United States, but Canadian and
Mexican writers are also included.
Does the mean age at death differ among the three groups? Run the appropriate
procedure and summarize the findings. Use a significance level α = 0.05. In your
submission, include the Excel output tables, NOT the original data.
Ans:
The null and alternative hypotheses for one-way ANOVA:
H0: μ1 = μ2 = μ3 (the mean age at death is the same for all three groups)
Ha: not all the μi are equal (at least one group mean differs from the others)
where μ1 = the mean age at death for novelists
μ2 = the mean age at death for poets
μ3 = the mean age at death for nonfiction writers
Running a one-way ANOVA in Excel gives the following output:
Anova: Single Factor
SUMMARY
Groups              Count   Sum    Average    Variance
Novels' Age         67      4787   71.44776   170.3419
Poems' Age          32      2022   63.1875    299.1895
Nonfiction's Age    24      1845   76.875     198.7228
ANOVA
Source of Variation   SS         df    MS         F          P-value    F crit
Between Groups        2744.193   2     1372.096   6.562944   0.001973   3.071779
Within Groups         25088.07   120   209.0672
Conclusion:
Since the p-value (0.001972985) is less than α = 0.05, we reject H0 at the 5%
significance level and conclude that the mean age at death is not the same across
the three groups.
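The F statistic and p-value can be cross-checked outside Excel. The following is a minimal sketch, assuming Python as the scripting language (the original analysis was done in Excel) and working only from the counts, averages, and variances in the SUMMARY table rather than the raw data in poets.xls. The variable names are illustrative. The p-value uses a closed form for the F distribution that holds only when the between-groups degrees of freedom equal 2, as they do here.

```python
# One-way ANOVA rebuilt from the summary statistics in the Excel output.
# Each entry: (count, average, variance), copied from the SUMMARY table.
groups = {
    "Novels": (67, 71.44776, 170.3419),
    "Poems": (32, 63.1875, 299.1895),
    "Nonfiction": (24, 76.875, 198.7228),
}

n_total = sum(n for n, _, _ in groups.values())
k = len(groups)

# Grand mean is the count-weighted average of the group means.
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between-groups and within-groups sums of squares.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between = k - 1
df_within = n_total - k

f_stat = (ss_between / df_between) / (ss_within / df_within)

# Upper-tail probability of F(2, df_within).
# This closed form is valid ONLY because df_between == 2.
assert df_between == 2
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"F = {f_stat:.4f}, p-value = {p_value:.6f}")
```

Because the inputs are the rounded values from the table, the results agree with the Excel output only up to rounding.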
2. (5pts) A particular paperback mystery book is published with a choice of three
different pictures on the cover: a photograph of the actor playing the main character in
the movie version of the book, a drawing of the mansion where the story in the book
takes place, or an embossed graphic of the murder weapon. A certain bookstore keeps
copies of this book with each of the pictures on the cover on its racks. To test the
hypothesis that sales of this book are equally divided among the three choices, a
simple random sample of 120 purchases of this book is obtained. The numbers are
displayed in the table below:
Run the appropriate procedure to test the null hypothesis that sales of this book are
equally divided among the three choices. Report the results and the conclusion of your
hypothesis tests. Use α = 0.05.
Ans:
H0: The sales of this book are equally divided among the three choices
Ha: The sales of this book are not equally divided among the three choices
We use a chi-squared goodness-of-fit test with uniform probabilities (1/3 for each
cover), so the expected count for each cover is 120 × 1/3 = 40.
Conditions:
1. The counts are mutually exclusive, with each purchase contributing to only one cell.
2. Every expected cell count is at least 10 (each is 40).
Both conditions are met. For each cell, calculate (Observed − Expected)^2 / Expected;
the results are shown below:
Picture on the cover
                                    Photograph   Drawing       Embossed graphic   Total
Observed Count                      31           47            42                 120
Uniform Probability                 0.33333333   0.333333333   0.333333333
Expected Count                      40           40            40
(Observed - Expected)^2/ Expected   2.025        1.225         0.1
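Conclusion:
The chi-squared statistic is 2.025 + 1.225 + 0.1 = 3.35 with 3 − 1 = 2 degrees of
freedom, which gives a p-value of 0.187308179. Since the p-value is greater than
α = 0.05, we fail to reject H0: at the 5% significance level there is not enough
evidence to conclude that sales of this book are unequally divided among the three choices.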
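The goodness-of-fit calculation can be reproduced in a few lines. This is a minimal sketch, assuming Python in place of Excel; the variable names are illustrative. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-squared upper-tail probability is exp(−x/2), so it should not be reused for other degrees of freedom.

```python
import math

# Observed purchases: photograph, drawing, embossed graphic.
observed = [31, 47, 42]
n = sum(observed)

# Under H0 every cover is equally likely, so each expected count is n / 3.
expected = [n / len(observed)] * len(observed)

# Chi-squared statistic: sum of (O - E)^2 / E over the cells.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

df = len(observed) - 1

# Upper-tail probability; this closed form holds ONLY for df == 2.
assert df == 2
p_value = math.exp(-chi2 / 2)

print(f"chi-squared = {chi2:.2f}, df = {df}, p-value = {p_value:.6f}")
```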
3. (5pts) The students in a statistics class are categorized by gender and by the year in
school. The numbers obtained are displayed below: Suppose we wish to test the null
hypothesis that there is no association between the year in school and gender. What are
the results and the conclusion of the test? Use α = 5%.
Ans:
H0: The two categorical variables (Gender and Year in school) are independent.
Ha: The two categorical variables (Gender and Year in school) are not independent.
Step 1: Write down the contingency table and calculate row total, column total, and
total count as the observed table shown below:
Observed Table   Year in school
Gender           Freshman   Sophomore   Junior   Senior   Row Total
Male             5          4           15       19       43
Female           25         23          13       10       71
Column Total     30         27          28       29       114
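Step 2: Calculate the expected count for each cell as E_ij = (n_i × m_j) / n, where n_i
is the total of row i, m_j is the total of column j, and n is the overall total; for
example, E_11 = (43 × 30) / 114 = 11.31578947. The expected table is shown below: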
Expected Table   Year in school
Gender           Freshman      Sophomore     Junior    Senior        Row Total
Male             11.31578947   10.18421053   10.5614   10.93859649   43
Female           18.68421053   16.81578947   17.4386   18.06140351   71
Column Total     30            27            28        29            114
Conditions:
1. The counts are mutually exclusive, with each individual contributing to only one cell.
2. Every expected cell count is at least 10 (the smallest is 10.18421053).
Both conditions are fulfilled, so we can compute the chi-squared statistic.
Degrees of freedom = (2 − 1) × (4 − 1) = 3
Chi-squared statistic χ² = 24.22380259
P-value = 2.24316E-05
Conclusion:
Since the p-value (2.24316E-05) is less than α = 0.05, we reject H0 and conclude at
the 5% significance level that gender and year in school are not independent.
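The expected counts, the chi-squared statistic, and the p-value for this test can be recomputed from the observed table. The following is a minimal sketch, assuming Python rather than Excel; the variable names are illustrative. The p-value uses a closed form for the chi-squared upper tail that applies only when there are exactly 3 degrees of freedom, as in this 2 × 4 table.

```python
import math

# Observed counts: rows are Male, Female; columns are Freshman..Senior.
observed = [
    [5, 4, 15, 19],
    [25, 23, 13, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this closed form holds ONLY for df == 3.
assert df == 3
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)

print(f"chi-squared = {chi2:.4f}, df = {df}, p-value = {p_value:.3e}")
```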
4. (5pts) A producer of fertilizer fills bags using an automated process. When the
process is in control, the mean weight of the bags is μ=50 kilograms and the standard
deviation of the weights is σ=2.4 kilograms. Each hour, 30 filled bags are selected at
random and the average weight is computed. The quality control manager set the
control limits for the sample average weight to be [48.98, 51.02]. If the sample mean
falls outside the control limits, the manager has to stop the process and have the filling
machine calibrated. If the process is in control, what is the probability that the average
weight will be within the control limits? Note: use central limit theorem.
Ans:
By the central limit theorem, the mean weight x̄ of the 30 sampled bags is approximately
normally distributed with mean μ = 50 and standard error σ/√n = 2.4/√30 ≈ 0.438178. The
process keeps running only when x̄ lies within the control limits [48.98, 51.02], that is,
when x̄ is neither below 48.98 kilograms nor above 51.02 kilograms.
To find the probability that the average weight is within the control limits
[48.98, 51.02], proceed as follows:
Step 1: Find the probability that the sample mean is less than 51.02, using
z = (51.02 − 50) / (2.4/√30).
Step 2: With the z-score of 2.327820869, evaluate the standard normal cumulative
distribution using the Excel formula =NORM.S.DIST(2.327820869,1).
Step 3: The probability that the sample mean is less than 51.02 is 0.990039191.
Step 4: Find the probability that the sample mean is less than 48.98, using
z = (48.98 − 50) / (2.4/√30).
Step 5: With the z-score of -2.327820869, evaluate the standard normal cumulative
distribution using the Excel formula =NORM.S.DIST(-2.327820869,1).
Step 6: The probability that the sample mean is less than 48.98 is 0.009960809.
Conclusion:
The probability that the average weight is within the control limits [48.98, 51.02]
= P(sample mean < 51.02) − P(sample mean < 48.98)
= 0.990039191 − 0.009960809
= 0.980078382
So, when the process is in control, there is about a 98% probability that the average
weight falls within the control limits.
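The same probability can be checked without Excel. This is a minimal sketch, assuming Python and its standard-library statistics.NormalDist class in place of NORM.S.DIST; the variable names are illustrative. It models the sample mean as normal with mean 50 and standard deviation 2.4/√30, as the central limit theorem suggests.

```python
from math import sqrt
from statistics import NormalDist

mu = 50.0        # in-control mean bag weight (kg)
sigma = 2.4      # in-control standard deviation of bag weights (kg)
n = 30           # bags sampled each hour
lower, upper = 48.98, 51.02  # control limits for the sample mean

# Standard error of the sample mean.
std_error = sigma / sqrt(n)

# z-scores of the control limits.
z_lower = (lower - mu) / std_error
z_upper = (upper - mu) / std_error

# Approximate sampling distribution of the mean under the CLT.
sampling_dist = NormalDist(mu=mu, sigma=std_error)
prob_within = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

print(f"z = ({z_lower:.6f}, {z_upper:.6f}), P(within limits) = {prob_within:.6f}")
```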