
VIETNAM NATIONAL UNIVERSITY

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY




Subject: Probability and Statistics


Project 1

Teacher in charge: Nguyễn Tiến Dũng


Group: 2
Class: CC02

Name Student ID
Từ Hữu Thịnh 1952471
Vũ Hồng Quân 1952946
Vương Quang Hưng 1952752
Trịnh Trần Nguyên An 1952550
Tô Minh Đức 1810897

Ho Chi Minh City, 2020


Exercise 1:

Method: One-way ANOVA.
Tool: R.
Hypotheses:
- H: the mean blood lead levels of the groups of workers in the above factory are the same.
- H̄: the mean blood lead levels of the groups of workers in the above factory are different.
Solving with R
Re-entering the data table:
 observation=c(rep("F1",5),rep("F2",7),rep("F3",7),rep("F4",5),rep("F5",5))
 F1 = c(0.25,0.28,0.32,0.22,0.22)
 F2 = c(0.22,0.25,0.24,0.28,0.31,0.21,0.22)
 F3 = c(0.25,0.26,0.28,0.25,0.22,0.28,0.31)
 F4 = c(0.31,0.33,0.30,0.29,0.25)
 F5 = c(0.22,0.28,0.28,0.25,0.30)
 factorlevel=c(F1,F2,F3,F4,F5)
 data=data.frame(observation,factorlevel)
 print(data)
Then we apply aov() and summary() to find the result:
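The report does not show the exact call, so the following is only a minimal sketch. It assumes the data frame built above, with its columns `observation` (the group label) and `factorlevel` (the measured level); the names `fit1`, `tab1`, `f_value` and `p_value` are introduced here for illustration.

```r
# Rebuild the data from the listing above so the sketch runs on its own.
observation <- c(rep("F1", 5), rep("F2", 7), rep("F3", 7), rep("F4", 5), rep("F5", 5))
F1 <- c(0.25, 0.28, 0.32, 0.22, 0.22)
F2 <- c(0.22, 0.25, 0.24, 0.28, 0.31, 0.21, 0.22)
F3 <- c(0.25, 0.26, 0.28, 0.25, 0.22, 0.28, 0.31)
F4 <- c(0.31, 0.33, 0.30, 0.29, 0.25)
F5 <- c(0.22, 0.28, 0.28, 0.25, 0.30)
factorlevel <- c(F1, F2, F3, F4, F5)

# The group label is stored as a factor so aov() treats it as a grouping variable.
data <- data.frame(observation = factor(observation), factorlevel)

# One-way ANOVA: measured level explained by group.
fit1 <- aov(factorlevel ~ observation, data = data)
tab1 <- summary(fit1)[[1]]   # ANOVA table as a data frame
print(tab1)

# First row = between-group term; pull out the F statistic and its p-value.
f_value <- tab1[["F value"]][1]
p_value <- tab1[["Pr(>F)"]][1]
```

The degrees of freedom in `tab1` (4 for the groups, 24 for the residuals) are the ones used for the critical value in the first method.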

From here the conclusion can be reached in two ways.

First method
F value < F critical(4, 24, 0.03) (1.58 < 3.22), so H is not rejected, which means the data show no significant difference in blood lead levels among workers in the above factory.
Second method
p-value = Pr(>F) = 0.211 > significance level (0.03), so H is not rejected, which means the data show no significant difference in blood lead levels among workers in the above factory.
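Both decision rules can be checked directly from the F distribution. This is a hedged sketch that takes the reported F value (1.58) and the degrees of freedom (4, 24) as given; the variable names are illustrative only.

```r
alpha   <- 0.03   # significance level used in this exercise
f_value <- 1.58   # F statistic reported above
df1     <- 4      # number of groups minus 1
df2     <- 24     # number of observations minus number of groups

# First method: compare the F statistic with the upper-alpha critical value.
f_crit <- qf(1 - alpha, df1 = df1, df2 = df2)
reject_by_critical_value <- f_value > f_crit

# Second method: compare the upper-tail probability with alpha.
p_value <- pf(f_value, df1 = df1, df2 = df2, lower.tail = FALSE)
reject_by_p_value <- p_value < alpha

print(c(f_crit = f_crit, p_value = p_value))
print(c(reject_by_critical_value, reject_by_p_value))
```

The two rules are equivalent, so they always give the same decision for the same significance level.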
Exercise 2:
 Solve using: Statistical hypothesis testing.
 Method: Chi-squared test.
 Tool: R, chisq.test().
 Hypotheses:
 H: the income of skilled workers is the same in the 2 age groups.
 H̄: the income of skilled workers differs between the 2 age groups.

Solving with R:
First, we enter the table as a matrix (one column per income range, one row per age group) for the computation:
 T0_1= c(71,54)
 T1_2= c(430,324)
 T2_3= c(1072,894)
 T3_4= c(1609,1202)
 T4_6= c(1178,903)
 Tover6= c(158,112)
 Topic2_2 = cbind(T0_1,T1_2,T2_3,T3_4,T4_6,Tover6)
 colnames(Topic2_2)=c("0-1","1-2","2-3","3-4","4-6",">6")
 rownames(Topic2_2)=c("40-50","50-60")
 print(Topic2_2)

Then we apply chisq.test(Topic2_2) to get the result
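The output of the call is not reproduced in the report. A minimal sketch of the step, assuming the same counts as above but entered with matrix() so that the block is self-contained; `test2` is a name introduced here for illustration.

```r
# Same counts as Topic2_2 above: rows = age groups, columns = income ranges.
Topic2_2 <- matrix(
  c(71, 430, 1072, 1609, 1178, 158,
    54, 324,  894, 1202,  903, 112),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("40-50", "50-60"),
                  c("0-1", "1-2", "2-3", "3-4", "4-6", ">6"))
)

# Pearson chi-squared test on the 2 x 6 contingency table.
test2 <- chisq.test(Topic2_2)

print(test2$statistic)   # chi-squared statistic
print(test2$parameter)   # degrees of freedom: (2 - 1) * (6 - 1)
print(test2$p.value)     # compared with alpha = 0.05

# The approximation is usually considered adequate when all expected counts exceed 5.
print(min(test2$expected))
```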

 p-value = 0.5116 > α = 0.05, so H is not rejected.

 There is no significant difference in the income of skilled workers between the two age groups.

Exercise 3: 
 The number of students arriving late at four high schools on different days of the week is given in the following table:

                  High school
Days of week      A(1)   B(2)   C(3)   D(4)
Monday (1)        3.5    7.4    8.0    3.5
Tuesday (2)       5.6    4.1    6.1    9.6
Wednesday (3)     4.1    2.5    1.8    2.1
Thursday (4)      7.2    3.2    2.2    1.5

Is there any significant difference in the number of late arrivals among different days of the week at the significance level α = 1%?

 Type, Method: ANOVA: Two-Factor Without Replication.

 Hypotheses:
+ H1: the mean number of students arriving late is the same at all high schools.
+ H̄1: at least 2 schools have different means.
+ H2: the mean number of students arriving late is the same on every day of the week.
+ H̄2: at least 2 days have different means.

Solving with R:
First we use the gl() command to create the 2 factor columns, school and day of the week:
 school=gl(4,4,16)
 dayofweek=gl(4,1,16)
Then we enter the numbers of students arriving late:
 students=c(5,4,5,7,4,5,3,2,4,3,4,5,4,4,3,2)
We put everything in a data frame and print it:
 Topic2_3=data.frame(school, dayofweek,students)
 print(Topic2_3)
Finally, we apply aov() and summary() to get the conclusion:
 Topic2_3= aov(students ~ school + dayofweek, data= Topic2_3)
 summary(Topic2_3)
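The summary table is not reproduced in the report. The sketch below reruns the same model on the `students` vector exactly as entered in the code above and reads the two p-values from the ANOVA table; the names `late`, `fit3`, `tab3` and `p3` are introduced here for illustration, and the data frame is kept separate from the fitted model instead of being overwritten.

```r
# Factors and responses exactly as entered above.
school    <- gl(4, 4, 16)   # 1,1,1,1,2,2,2,2,...
dayofweek <- gl(4, 1, 16)   # 1,2,3,4,1,2,3,4,...
students  <- c(5, 4, 5, 7, 4, 5, 3, 2, 4, 3, 4, 5, 4, 4, 3, 2)
late <- data.frame(school, dayofweek, students)

# Two-factor ANOVA without replication: additive model, no interaction term.
fit3 <- aov(students ~ school + dayofweek, data = late)
tab3 <- summary(fit3)[[1]]
print(tab3)

# Rows appear in formula order: school, dayofweek, Residuals.
p3 <- setNames(tab3[["Pr(>F)"]][1:2], c("school", "dayofweek"))

alpha <- 0.01
print(p3)
print(p3 < alpha)   # TRUE would mean the corresponding hypothesis is rejected
```

With one observation per cell the interaction cannot be separated from the error, which is why only the two main effects appear in the formula.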

So we have the result and conclusion as below:

 school (Pr(>F)) = 0.179 > α = 0.01, so H1 is not rejected, which means the school has no significant effect on the number of students arriving late.

 dayofweek (Pr(>F)) = 0.954 > α = 0.01, so H2 is not rejected, which means the day of the week has no significant effect on the number of students arriving late.
Exercise 4:
In a scientific experiment, the thickness of the nickel coating obtained from
various types of plating tanks is measured.

Thickness of        Plating tank
nickel coating      A(1)   B(2)   C(3)
4-8 (1)             32     51     68
8-12 (2)            123    108    80
12-16 (3)           10     26     26
16-20 (4)           41     24     28
20-24 (5)           19     20     28

At the significance level of α = 0.05, test the hypothesis: the coating thickness
does not depend on the type of plating tank used.
 Type, Method: ANOVA: Two-Factor Without Replication.
Hypotheses
+ H1: the mean measured value is the same for every thickness class.
+ H̄1: at least 2 thickness classes have different means.
+ H2: the mean measured value is the same for every type of plating tank.
+ H̄2: at least 2 types of plating tank have different means.

Using R to solve:
As in Exercise 3, we first create the 2 factor columns, Thickness and Tank, with the gl() command:
 Thickness =gl(5,3,15)
 Tank = gl(3,1,15)

Then we enter the measured values and build the data frame used for the computation:
 Measured = c(32,51,68,123,108,80,10,26,26,41,24,28,19,20,28)
 Topic2_4 = data.frame(Thickness,Tank,Measured)
 print(Topic2_4)

Finally, we use aov() then summary() to see the result


 Topic2_4 = aov (Measured ~ Thickness + Tank, data = Topic2_4)
 summary(Topic2_4)

So we have:
Thickness: Pr(>F) = 0.000978 < significance level = 0.05, so H1 is rejected, which means the measured values depend on the thickness class of the nickel coating.
Tank: Pr(>F) = 0.99 > 0.05, so H2 is not rejected, which means the measured values do not depend on the type of plating tank.
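To show where these two p-values come from, the sums of squares of the two-factor model can be computed by hand and compared with the summary() output. This is only a cross-check sketch under the same additive model; the matrix layout and all variable names below are introduced here for illustration.

```r
# Same values as the Measured vector above: rows = thickness classes, columns = tanks.
Measured <- matrix(
  c( 32,  51, 68,
    123, 108, 80,
     10,  26, 26,
     41,  24, 28,
     19,  20, 28),
  nrow = 5, byrow = TRUE,
  dimnames = list(Thickness = c("4-8", "8-12", "12-16", "16-20", "20-24"),
                  Tank = c("A", "B", "C"))
)

r <- nrow(Measured)      # number of thickness classes
k <- ncol(Measured)      # number of tanks
grand <- mean(Measured)  # grand mean of all cells

# Sums of squares for each factor, the total, and the residual.
ss_thickness <- k * sum((rowMeans(Measured) - grand)^2)
ss_tank      <- r * sum((colMeans(Measured) - grand)^2)
ss_total     <- sum((Measured - grand)^2)
ss_error     <- ss_total - ss_thickness - ss_tank

# F statistics: factor mean square divided by the error mean square.
df_error    <- (r - 1) * (k - 1)
ms_error    <- ss_error / df_error
f_thickness <- (ss_thickness / (r - 1)) / ms_error
f_tank      <- (ss_tank / (k - 1)) / ms_error

# Upper-tail probabilities, i.e. the Pr(>F) column of summary().
p_thickness <- pf(f_thickness, r - 1, df_error, lower.tail = FALSE)
p_tank      <- pf(f_tank, k - 1, df_error, lower.tail = FALSE)
print(c(p_thickness = p_thickness, p_tank = p_tank))
```

The column totals of the table are almost identical (225, 229, 230), which is why the Tank sum of squares is tiny and its p-value is close to 1.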
