
MS 5318 – Homework # 2

Lam Wing Yan

1. (5 pts) Do poets die young? According to William Butler Yeats, “She is the Gaelic
muse, for she gives inspiration to those she persecutes. The Gaelic poets die young, for
she is restless, and will not let them remain long on earth.” One study designed to
investigate this issue examined the age at death for writers from different cultures and
genders. Three categories of writers examined were novelists, poets, and nonfiction
writers. The ages at death for female writers in these categories from North America are
given in poets.xls. Most of the writers are from the United States, but Canadian and
Mexican writers are also included.
Does the mean age at death differ among the three groups? Run the appropriate
procedure and summarize the findings. Use a significance level α = 0.05. In your
submission, include the Excel output tables NOT the original data.
Ans:
The null and alternative hypotheses for the one-way ANOVA are:
H0: μ1 = μ2 = μ3 (the mean age at death is the same for all three groups)
Ha: not all of the μi are equal (at least one group mean differs from the others)
where μ1 = the mean age at death for novelists
μ2 = the mean age at death for poets
μ3 = the mean age at death for nonfiction writers

Running a one-way (single-factor) ANOVA in Excel gives the following output:
Anova: Single Factor

SUMMARY
Groups              Count   Sum    Average     Variance
Novels' Age         67      4787   71.44776    170.3419
Poems' Age          32      2022   63.1875     299.1895
Nonfiction's Age    24      1845   76.875      198.7228

ANOVA
Source of Variation   SS          df    MS          F          P-value    F crit
Between Groups        2744.193    2     1372.096    6.562944   0.001973   3.071779
Within Groups         25088.07    120   209.0672

Total                 27832.26    122

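Conclusion:
Since the p-value = 0.001972985 < α = 0.05 (equivalently, F = 6.562944 > F crit = 3.071779),
we reject H0 at the 5% significance level and conclude that the mean age at death is not
the same for all three groups of writers. The ANOVA alone does not identify which groups
differ, although poets have the lowest sample mean (63.19 years) and nonfiction writers
the highest (76.88 years).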
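
As a cross-check, the ANOVA table can be rebuilt from the SUMMARY table alone. The sketch below is only an illustration, not part of the Excel workflow: the function and variable names are made up for this example, the inputs are the rounded counts, sums and variances printed above, and the critical value is copied from the Excel output rather than recomputed.

```python
# Summary statistics copied from the Excel SUMMARY table: (count, sum, variance).
groups = {
    "Novels": (67, 4787, 170.3419),
    "Poems": (32, 2022, 299.1895),
    "Nonfiction": (24, 1845, 198.7228),
}


def anova_from_summary(groups):
    """Return (SSB, SSW, df_between, df_within, F) for a one-way ANOVA."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(s for _, s, _ in groups.values()) / n_total
    # Between-groups sum of squares: weighted squared distance of each
    # group mean from the grand mean.
    ssb = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups.values())
    # Within-groups sum of squares: (n - 1) times each sample variance.
    ssw = sum((n - 1) * var for n, _, var in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, df_between, df_within, f_stat


ssb, ssw, df_b, df_w, f_stat = anova_from_summary(groups)
F_CRIT = 3.071779  # critical value taken from the Excel output, not recomputed
print(f"SSB={ssb:.3f}  SSW={ssw:.2f}  df=({df_b}, {df_w})  F={f_stat:.4f}")
print("Reject H0 at the 5% level:", f_stat > F_CRIT)
```

Because the variances are rounded to four decimals, the recomputed sums of squares agree with Excel only to about two decimal places.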
2. (5pts) A particular paperback mystery book is published with a choice of three
different pictures on the cover: a photograph of the actor playing the main character in
the movie version of the book, a drawing of the mansion where the story in the book
takes place, or an embossed graphic of the murder weapon. A certain bookstore keeps
copies of this book with each of the pictures on the cover on its racks. To test the
hypothesis that sales of this book are equally divided among the three choices, a
simple random sample of 120 purchases of this book is obtained. The numbers are
displayed in the table below:

Run the appropriate procedure to test the null hypothesis that sales of this book are
equally divided among the three choices. Report the results and the conclusion of your
hypothesis tests. Use 𝛼 = 0.05.

Ans:

The null and alternative hypotheses are set as follows:

H0: The sales of this book are equally divided among the three choices

Ha: The sales of this book are not equally divided among the three choices

Next, we carry out a chi-squared goodness-of-fit test against a uniform distribution.

Step1 :

Under H0 each cover has the same probability, so the uniform probability = 1/3 = 0.3333,

and the expected count for each cover = Total × Uniform probability = 120 × 1/3 = 40.

Using Excel, we get the following table:


                       Picture on the cover
                       Photograph    Drawing       Embossed graphic   Total
Observed Count         31            47            42                 120
Uniform Probability    0.33333333    0.333333333   0.333333333
Expected Count         40            40            40

Step 2:

Then we check conditions:

1. The counts are mutually exclusive with each individual contributing to only one cell.
2. Every expected cell count is at least 10 (here each expected count is 40).

Since both conditions are satisfied, we can continue the test.


Step 3:

Calculate (Observed − Expected)^2 / Expected for each cell. The results are shown
below:
                                     Picture on the cover
                                     Photograph    Drawing       Embossed graphic   Total
Observed Count                       31            47            42                 120
Uniform Probability                  0.33333333    0.333333333   0.333333333
Expected Count                       40            40            40
(Observed - Expected)^2 / Expected   2.025         1.225         0.1

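Step 4: Calculate the degrees of freedom, the chi-squared statistic and the p-value:

Degrees of freedom = k − 1 = 3 − 1 = 2, where k is the number of categories

Chi-squared statistic = 2.025 + 1.225 + 0.1 = 3.35

P-value = CHISQ.DIST.RT(χ^2, k − 1) = CHISQ.DIST.RT(3.35, 2) = 0.187308179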

Conclusion:

Since the p-value = 0.187308179 > α = 0.05, we fail to reject H0. At the 5% significance
level there is not sufficient evidence to conclude that sales of this book are
unequally divided among the three cover choices.
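
The goodness-of-fit calculation above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library, with illustrative variable names. It relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-squared distribution equals exp(−x/2); that shortcut is an assumption of this example and does not carry over to other degrees of freedom.

```python
import math

observed = [31, 47, 42]  # photograph, drawing, embossed graphic
n = sum(observed)

# Under H0 the 120 purchases are split evenly across the three covers.
expected = [n / len(observed)] * len(observed)

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Closed form valid only for df == 2: P(X > x) = exp(-x / 2).
assert df == 2
p_value = math.exp(-chi2 / 2)

alpha = 0.05
print(f"chi2={chi2:.2f}  df={df}  p-value={p_value:.6f}")
print("Reject H0:", p_value < alpha)
```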
3. (5pts) The students in a statistics class are categorized by gender and by the year in
school. The numbers obtained are displayed below: Suppose we wish to test the null
hypothesis that there is no association between the year in school and gender. What are
the results and the conclusion of the test? Use 𝛼 = 5%.

Ans:

The null and alternative hypotheses are set as follows:

H0: The two categorical variables (Gender and Year in school) are independent.

Ha: The two categorical variables (Gender and Year in school) are not independent.

Next, we carry out a chi-squared test of independence:

Step 1: Write down the contingency table and calculate row total, column total, and
total count as the observed table shown below:
Observed Table    Year in school
Gender            Freshman    Sophomore    Junior    Senior    Row Total
Male              5           4            15        19        43
Female            25          23           13        10        71
Column Total      30          27           28        29        114

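Step 2: Calculate the expected count for each cell, $\hat{n}_{ij} = \frac{n_i m_j}{n}$ (for example, $\hat{n}_{11} = (n_1 \times m_1)/n$), where $n_i$ is the total of row i, $m_j$ is the total of column j and $n$ is the total count, and get the
expected table shown below: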
Expected Table    Year in school
Gender            Freshman       Sophomore      Junior     Senior         Row Total
Male              11.31578947    10.18421053    10.5614    10.93859649    43
Female            18.68421053    16.81578947    17.4386    18.06140351    71
Column Total      30             27             28         29             114

Step 3: Check both conditions:

1. The counts are mutually exclusive, with each individual contributing to only one cell.

2. Every expected cell count is at least 10 (the smallest is 10.18421053, for male
sophomores).

Since both conditions are satisfied, we can continue the test.


Step 4: Calculate the chi-squared statistic $\chi^2 = \sum \frac{(n_{ij} - \hat{n}_{ij})^2}{\hat{n}_{ij}}$
(Observed cell - Expected cell)^2 / Expected cell    Year in school
Gender    Freshman    Sophomore      Junior      Senior
Male      3.525092    3.755269958    1.86539     5.941002
Female    2.134915    2.274318425    1.129743    3.598072

Step 5: Calculate the degrees of freedom, the chi-squared statistic and the p-value:

Degrees of freedom = (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3

Chi-squared statistic = 24.22380259

P-value = CHISQ.DIST.RT(χ^2, (r-1)(c-1)) = CHISQ.DIST.RT(24.22380259, 3) = 2.24316E-05

Conclusion:

Since the p-value = 2.24316E-05 < α = 0.05, we reject H0 and conclude, at the 5%
significance level, that gender and year in school are not independent.
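
The test of independence can also be checked outside Excel. The following is a rough sketch using only the Python standard library; the helper name `chi2_sf_df3` is hypothetical and implements the closed-form upper-tail probability that holds for exactly 3 degrees of freedom, erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2). It is not a general replacement for CHISQ.DIST.RT.

```python
import math

observed = [
    [5, 4, 15, 19],    # male: freshman, sophomore, junior, senior
    [25, 23, 13, 10],  # female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic summed over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 df."""
    t = x / 2
    return math.erfc(math.sqrt(t)) + 2 * math.sqrt(t / math.pi) * math.exp(-t)


assert df == 3  # the closed form above is only valid for 3 degrees of freedom
p_value = chi2_sf_df3(chi2)

print(f"chi2={chi2:.6f}  df={df}  p-value={p_value:.6e}")
print("Smallest expected count:", min(min(row) for row in expected))
```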

4. (5pts) A producer of fertilizer fills bags using an automated process. When the
process is in control, the mean weight of the bags is μ=50 kilograms and the standard
deviation of the weights is σ=2.4 kilograms. Each hour, 30 filled bags are selected at
random and the average weight is computed. The quality control manager set the
control limits for the sample average weight to be [48.98, 51.02]. If the sample mean
falls outside the control limits, the manager has to stop the process and have the filling
machine calibrated. If the process is in control, what is the probability that the average
weight will be within the control limits? Note: use central limit theorem.

Ans:

Let x̄ be the mean weight of the 30 sampled bags x1, x2, …, x30. The control limits
[48.98, 51.02] apply to this sample mean, not to the individual bag weights, so the
probability we need is

P{48.98 ≤ x̄ ≤ 51.02}

By the central limit theorem, when the process is in control x̄ is approximately normally
distributed with mean μ = 50 and standard error σ/√n = 2.4/√30. The detailed process is
shown below:

Step 1: Standardize the upper control limit to find the probability that the sample
mean weight is at most 51.02:

P{x̄ ≤ 51.02} = P{z ≤ (51.02 − 50)/(2.4/√30)} = P{z ≤ 2.327820869}

Step 2: With the z-score 2.327820869, find the cumulative standard normal probability
using the Excel formula =NORM.S.DIST(2.327820869,1)

Step 3: The probability that the sample mean weight is at most 51.02
= 0.990039191.

Step 4: Standardize the lower control limit to find the probability that the sample
mean weight is at most 48.98:

P{x̄ ≤ 48.98} = P{z ≤ (48.98 − 50)/(2.4/√30)} = P{z ≤ -2.327820869}

Step 5: With the z-score -2.327820869, find the cumulative standard normal probability
using the Excel formula =NORM.S.DIST(-2.327820869,1)

Step 6: The probability that the sample mean weight is at most 48.98 =
0.009960809.

Conclusion:

The probability that the average weight is within the control limits [48.98, 51.02]

= P{x̄ ≤ 51.02} − P{x̄ ≤ 48.98}

= 0.990039191 − 0.009960809

= 0.980078382

So, when the process is in control, there is about a 98% probability that the sample
average weight falls within the control limits.
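
The same probability can be obtained in Python as a quick check. This sketch assumes, as the question instructs, that the central limit theorem makes the sample mean approximately normal; it uses `statistics.NormalDist` from the standard library, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50.0, 2.4, 30   # in-control mean, standard deviation, sample size
lower, upper = 48.98, 51.02    # control limits for the sample mean

# Sampling distribution of the mean under the CLT: N(mu, sigma / sqrt(n)).
sampling_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

z_upper = (upper - mu) / sampling_dist.stdev
p_within = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

print(f"standard error={sampling_dist.stdev:.6f}  z={z_upper:.6f}")
print(f"P(within control limits)={p_within:.6f}")
```

The complement, 1 − `p_within`, is the chance of a false alarm in any given hour while the process is actually in control.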
