Project 2

2011
Group:
- Nguyn Quang Pht _ BAIU09433
- Trng Phc Tn V _BAWE09285
- L Th Bch Tuyn _BAIU08233
- Nguyn Vn Tun _BAIU09448
- Nguyn Hong Ph _BAIU09149
27/12/2011
Statistics Project
2 | P a g e

REQUIREMENT
Use the following information and instructions for the next three questions:
Objective: Conduct a small research study of the IU Cafeteria service.
Method:
- Use the questionnaire below to ask students for their evaluation (graded from 0 to 100 points) of the IU Cafeteria service.
- Use statistical tools to study the evaluation.
Question 1:
Write a short essay to describe and explain your sampling process. Interpret your collected data.
Do you believe that the whole service of IU Cafeteria is good (> 70 points)? Explain.
Question 2:
Consider the whole service score of IU Cafeteria as the dependent variable and the others as the
independent variables.
a. Test the correlation among independent variables.
b. Establish the regression relationship and write down the regression equation. Which aspect affects the IU Cafeteria service most?
c. Use the regression equation to estimate a new value of the dependent variable with the given independent variables.

Question 3:
Apply the Chi-squared test to test the normality of the score of IU Cafeteria's whole service, using α = 0.05.

Use the data of the 2011 IU entrance examination scores (attached) for the next two questions:
Question 4:
Do you believe that the scores of all divisions (A, B, D1) are equal? Explain.
Which division has the highest scores and which one has the lowest scores? Interpret your
answer.

Question 5:
Apply the Chi-squared test to test the independence between the scores of students and their gender.
(Hint: Classify the scores into 3 or 4 scales. For example, scores are classified into <17, 17 to <20, 20 to <24 and ≥24.)


ANSWER
Question 1:
To collect the data randomly, we used two channels: online and offline. For the online channel, we combined internet tools such as Google Docs, Facebook and e-mail, and built a worksheet for the survey.

We used the same worksheet for both channels. For the offline channel, we printed it and handed it to randomly chosen students in several classes. After gathering the responses (from more than 60 people), we checked whether any person had submitted more than once. In that case, we kept only the response submitted on the last day. Finally, we used Excel's random-number function to select 40 samples. That is how we collected the data randomly.
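The deduplication and random-selection steps described above can be sketched in Python as a stand-in for the Excel workflow. This is a minimal sketch under stated assumptions. The record layout (respondent, day, score), the helper names `dedupe_keep_last` and `simple_random_sample`, and the generated responses are all hypothetical, not the group's real data. `random.sample` plays the role of the Excel random function.

```python
import random

def dedupe_keep_last(rows):
    """Keep one response per respondent: the one submitted on the latest day."""
    latest = {}
    for respondent, day, score in rows:
        # A later submission (or a same-day one listed later) replaces the earlier one.
        if respondent not in latest or day >= latest[respondent][1]:
            latest[respondent] = (respondent, day, score)
    return list(latest.values())

def simple_random_sample(rows, k, seed=None):
    """Draw k rows without replacement; every k-subset is equally likely."""
    return random.Random(seed).sample(rows, k)

# Hypothetical raw data: 60 respondents, the first 6 of whom resubmitted on day 6.
gen = random.Random(1)
raw = [(f"student{i:02d}", gen.randint(1, 5), gen.randint(0, 100)) for i in range(60)]
raw += [(f"student{i:02d}", 6, gen.randint(0, 100)) for i in range(6)]

unique = dedupe_keep_last(raw)
sample = simple_random_sample(unique, 40, seed=2011)
print(len(raw), len(unique), len(sample))  # 66 60 40
```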

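As a quick check of the Question 1 test, the one-sample t statistic can be recomputed from the 40 whole-service (WS) scores in the sample data table. The sketch assumes a left-tailed test of $H_0: \mu \ge 70$. Python's standard library has no t distribution, so the sketch compares against a tabulated critical value $t_{0.05,39} \approx 1.685$ instead of computing a p-value. That critical value is an assumption read from a t table.

```python
import math
import statistics

# The 40 whole-service (WS) scores from the sample data table
ws = [70, 80, 70, 70, 50, 60, 30, 50, 65, 80, 60, 65, 50, 80, 65, 85,
      45, 60, 60, 73, 40, 50, 65, 75, 60, 70, 10, 70, 50, 50, 60, 70,
      70, 70, 50, 40, 65, 50, 75, 40]

mu0 = 70                      # "good service" threshold under H0: mu >= 70
n = len(ws)
x_bar = statistics.mean(ws)
s = statistics.stdev(ws)      # sample standard deviation (divides by n - 1)
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Left-tailed rejection region: t < -t_{0.05, 39}; 1.685 is a table value (assumption).
t_crit = -1.685
reject = t_stat < t_crit

print(f"n={n}, mean={x_bar:.2f}, s={s:.4f}, t={t_stat:.4f}, reject H0: {reject}")
```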
Sample data (WS):
70 80 70 70
50 60 30 50
65 80 60 65
50 80 65 85
45 60 60 73
40 50 65 75
60 70 10 70
50 50 60 70
70 70 50 40
65 50 75 40

One-sample t-test for the whole-service score (template output):
Evidence
Sample size        40        n
Sample mean        59.95     x-bar
Sample st. dev.    15.1859   s
σ unknown; population normal
Test statistic     -4.1856   t

Null hypothesis    p-value   At α = 5%
H0: μ = 70         0.0002    Reject
H0: μ ≥ 70         0.0001    Reject
H0: μ ≤ 70         0.9999

We test:
$H_0: \mu \ge 70$
$H_1: \mu < 70$

Since the p-value (0.0001) is far below α = 0.05, we reject the null hypothesis. The mean evaluation of the whole service (59.95 points) is significantly lower than 70 points, so we do not believe that the whole service of IU Cafeteria is good.

Question 2:
a) The correlation among independent variables
Correlation matrix (Cafeteria quality)

      FQ      FD      WT      FC      ET
FQ    1.0000
FD    0.6511  1.0000
WT    0.5766  0.6149  1.0000
FC    0.2733  0.2863  0.2973  1.0000
ET    0.3409  0.4372  0.5332  0.4190  1.0000

The strongest correlation is between FQ and FD (0.6511). No pair of independent variables has a correlation above 0.7, so the correlations among them are moderate at most.
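Part a asks for a test of the correlations. One standard way, sketched here as an assumption rather than as the template's method, is to test each pairwise $H_0: \rho = 0$ with $t = r\sqrt{n-2}/\sqrt{1-r^2}$ on $n - 2 = 38$ degrees of freedom. The critical value 2.024 is taken from a t table.

```python
import math

n = 40          # sample size behind the correlation matrix
t_crit = 2.024  # two-sided t_{0.025, 38}, read from a t table (assumption)

# Off-diagonal entries of the correlation matrix: (variable, variable) -> r
pairs = {
    ("FQ", "FD"): 0.6511, ("FQ", "WT"): 0.5766, ("FD", "WT"): 0.6149,
    ("FQ", "FC"): 0.2733, ("FD", "FC"): 0.2863, ("WT", "FC"): 0.2973,
    ("FQ", "ET"): 0.3409, ("FD", "ET"): 0.4372, ("WT", "ET"): 0.5332,
    ("FC", "ET"): 0.4190,
}

def corr_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

not_significant = [pair for pair, r in pairs.items() if abs(corr_t(r, n)) <= t_crit]

for (a, b), r in pairs.items():
    t = corr_t(r, n)
    print(f"{a}-{b}: r = {r:.4f}, t = {t:.3f}, significant: {abs(t) > t_crit}")
```

With these inputs, only the correlations of FC with FQ, FD and WT fall below the critical value.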

b)
Multiple Regression Results (Cafeteria quality)

          Intercept    FQ        FD       WT       FC        ET
b         14.4279817   0.31629   0.1215   0.1019   -0.0352   0.2453
s(b)      8.38897445   0.11652   0.1322   0.122    0.117     0.1326
t         1.71987432   2.71449   0.919    0.8355   -0.3008   1.8494
p-value   0.0945       0.0104    0.3646   0.4093   0.7654    0.0731
VIF                    1.9180    2.0966   2.0330   1.2446    1.5905

ANOVA Table
Source   SS        df   MS       F        F critical   p-value
Regn.    5186.56   5    1037.3   9.2633   2.4936       0.0000
Error    3807.34   34   111.98
Total    8993.9    39   230.61

s = 10.582    R² = 0.5767    Adjusted R² = 0.5144
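The derived entries in the ANOVA table follow from the two sums of squares. The sketch below recomputes them from the SS and df values printed in the table; the variable names are illustrative.

```python
import math

ss_reg, df_reg = 5186.56, 5    # regression sum of squares and degrees of freedom
ss_err, df_err = 3807.34, 34   # error sum of squares and degrees of freedom

ss_tot, df_tot = ss_reg + ss_err, df_reg + df_err
ms_reg, ms_err = ss_reg / df_reg, ss_err / df_err
f_stat = ms_reg / ms_err                  # compared with F critical = 2.4936
r2 = ss_reg / ss_tot
adj_r2 = 1 - ms_err / (ss_tot / df_tot)   # 1 - MSE / MST
s = math.sqrt(ms_err)                     # standard error of the estimate

print(f"F = {f_stat:.4f}, R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}, s = {s:.3f}")
```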

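The regression equation:
$Y(WS) = 14.43 + 0.32\,FQ + 0.12\,FD + 0.10\,WT - 0.04\,FC + 0.25\,ET + \varepsilon$

We test the following hypotheses:
$H_0: \beta_{FQ} = 0$ vs. $H_1: \beta_{FQ} \neq 0$
$H_0: \beta_{FD} = 0$ vs. $H_1: \beta_{FD} \neq 0$
$H_0: \beta_{WT} = 0$ vs. $H_1: \beta_{WT} \neq 0$
$H_0: \beta_{FC} = 0$ vs. $H_1: \beta_{FC} \neq 0$
$H_0: \beta_{ET} = 0$ vs. $H_1: \beta_{ET} \neq 0$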
Since df = 34 > 30, we use the z-distribution as an approximation for testing.
The test statistic values:
$Z_{FQ} = \frac{b_{FQ} - 0}{s(b_{FQ})} = 2.71449$
$Z_{FD} = \frac{b_{FD} - 0}{s(b_{FD})} = 0.919$
$Z_{WT} = \frac{b_{WT} - 0}{s(b_{WT})} = 0.8355$
$Z_{FC} = \frac{b_{FC} - 0}{s(b_{FC})} = -0.3008$
$Z_{ET} = \frac{b_{ET} - 0}{s(b_{ET})} = 1.8494$

With df = 34 and α = 0.05, the critical value is $Z_C = 1.96$. Only $Z_{FQ}$ falls outside $[-1.96;\ 1.96]$, so we can reject $H_0: \beta_{FQ} = 0$ but not the null hypotheses for FD, WT, FC and ET. Based on this test, we have enough evidence that Food Quality (FQ) is a significant variable. It is also the variable with the largest coefficient (0.32), so food quality affects the IU Cafeteria service most.
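The coefficient tests amount to dividing each estimate by its standard error. The sketch uses the b and s(b) values from the regression output and applies the 1.96 cut-off used in the text. For comparison, it also applies a tabulated $t_{0.025,34} \approx 2.032$, which is an assumption read from a t table. With these inputs, both cut-offs single out FQ only.

```python
# name: (b, s(b)) from the multiple regression output
coefficients = {
    "FQ": (0.31629, 0.11652),
    "FD": (0.1215, 0.1322),
    "WT": (0.1019, 0.122),
    "FC": (-0.0352, 0.117),
    "ET": (0.2453, 0.1326),
}
z_crit = 1.96       # normal approximation used in the text
t_crit_34 = 2.032   # t_{0.025, 34} from a t table, for comparison (assumption)

# Test statistic for H0: beta = 0 is (b - 0) / s(b)
stats = {name: (b - 0) / sb for name, (b, sb) in coefficients.items()}
significant_z = [name for name, z in stats.items() if abs(z) > z_crit]
significant_t = [name for name, z in stats.items() if abs(z) > t_crit_34]

for name, z in stats.items():
    print(f"{name}: {z:.4f}")
print("significant (z):", significant_z, "significant (t):", significant_t)
```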

c) Use the regression equation to estimate a new value of the dependent variable with the given independent variables.

Given X   FQ   FD   WT   FC   ET
1         70   80   65   75   85

So, $Y(WS) = 14.43 + 0.32(70) + 0.12(80) + 0.10(65) - 0.04(75) + 0.25(85) = 71.18$
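The prediction is a plug-in of the new X values. This sketch evaluates it with both the rounded coefficients used above and the unrounded ones from the regression output, which shows how much the rounding moves the estimate. The `predict` helper is a hypothetical name.

```python
x_new = {"FQ": 70, "FD": 80, "WT": 65, "FC": 75, "ET": 85}

# (intercept, slopes) with two-decimal coefficients, as in the equation above
rounded = (14.43, {"FQ": 0.32, "FD": 0.12, "WT": 0.10, "FC": -0.04, "ET": 0.25})
# (intercept, slopes) with the unrounded estimates from the regression output
unrounded = (14.4279817, {"FQ": 0.31629, "FD": 0.1215, "WT": 0.1019,
                          "FC": -0.0352, "ET": 0.2453})

def predict(model, x):
    """Evaluate intercept + sum(slope * value) for one observation."""
    intercept, slopes = model
    return intercept + sum(slopes[k] * x[k] for k in slopes)

print(round(predict(rounded, x_new), 2))    # 71.18
print(round(predict(unrounded, x_new), 2))  # 71.12
```

The unrounded coefficients give about 71.12 rather than 71.18, so the two-decimal rounding shifts the estimate by roughly 0.06 points.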


Question 3:
$H_0$: The population has a normal distribution.
$H_1$: The population is not normally distributed.
We know that the probability that Z lies between -1 and 1 is about 0.68, and that the probability that Z lies between -2 and 2 is about 0.95; other such probabilities are also known. One possible partition of the standard normal distribution into intervals and their probabilities is as follows.

Using the values 0.44 and 1 and their negatives, we get a complete partition of the Z scale into six intervals: -∞ to -1, with probability 0.1587; -1 to -0.44, with probability 0.1713; -0.44 to 0, with probability 0.1700; 0 to 0.44, with probability 0.1700; 0.44 to 1, with probability 0.1713; and, finally, 1 to ∞, with probability 0.1587.
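The six interval probabilities can be checked with the standard normal CDF. This sketch uses `statistics.NormalDist` from Python's standard library. The printed values should agree with the probabilities above to about four decimals.

```python
from statistics import NormalDist

z = NormalDist()                        # standard normal: mean 0, sd 1
cuts = [-1.0, -0.44, 0.0, 0.44, 1.0]
edges = [float("-inf")] + cuts + [float("inf")]

# P(a < Z <= b) = Phi(b) - Phi(a) for each consecutive pair of edges
probs = [z.cdf(b) - z.cdf(a) for a, b in zip(edges, edges[1:])]

for (a, b), p in zip(zip(edges, edges[1:]), probs):
    print(f"{a:>6} to {b:<6}: {p:.4f}")
```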
Now we transform the Z-scale values into interval boundaries for our problem, using $x = \bar{x} + z\,s$:
$x_1 = 59.95 + (-1)(15.19) = 44.76$
$x_2 = 59.95 + (-0.44)(15.19) = 53.27$
$x_3 = 59.95 + (0)(15.19) = 59.95$
$x_4 = 59.95 + (0.44)(15.19) = 66.63$
$x_5 = 59.95 + (1)(15.19) = 75.14$
Therefore, we have the following table (mean = 59.95, std. dev. = 15.1859, size = 40):

Class interval      Actual   Expected
-∞ to 44.76            5      6.348
44.76 to 53.27         9      6.852
53.27 to 59.95         0      6.800
59.95 to 66.63        11      6.800
66.63 to 75.14        11      6.852
75.14 to ∞             4      6.348

$\chi^2 = 13.7333$, df = 3, p-value = 0.0033

The degrees of freedom are 6 classes - 1 - 2 estimated parameters (mean and standard deviation) = 3. Since $\chi^2 = 13.7333 > \chi^2_{(0.05,\,3)} = 7.81473$, $H_0$ is rejected at the 0.05 level of significance. Therefore, the population is not normally distributed.
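The whole goodness-of-fit calculation can be reproduced from the 40 WS scores. The sketch assumes intervals that are open on the left and closed on the right. No score falls on a boundary, so this choice does not change the counts. The sketch uses the rounded interval probabilities from the text so that the statistic matches the reported 13.7333.

```python
import math
import statistics

ws = [70, 80, 70, 70, 50, 60, 30, 50, 65, 80, 60, 65, 50, 80, 65, 85,
      45, 60, 60, 73, 40, 50, 65, 75, 60, 70, 10, 70, 50, 50, 60, 70,
      70, 70, 50, 40, 65, 50, 75, 40]
mean = statistics.mean(ws)
sd = statistics.stdev(ws)

z_cuts = [-1.0, -0.44, 0.0, 0.44, 1.0]
probs = [0.1587, 0.1713, 0.1700, 0.1700, 0.1713, 0.1587]  # as tabulated in the text

bounds = [mean + z * sd for z in z_cuts]         # x = x_bar + z * s
edges = [-math.inf] + bounds + [math.inf]
observed = [sum(lo < x <= hi for x in ws) for lo, hi in zip(edges, edges[1:])]
expected = [len(ws) * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 2                       # mean and s were estimated from the data
chi2_crit = 7.81473                              # chi-square(0.05, 3), from the text
print(observed, round(chi2, 4), df, chi2 > chi2_crit)
```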

Question 4:
Step 1. Collect data from the 2011 IU entrance exam:
+ Convert the PDF file into Excel format.
+ Choose the range for each division (A, B, D1).
+ Use RANDBETWEEN to take random samples from the population.
+ For each division, take 15 samples.
Division A
No.  ID          Name                    Birthday  Gender  Score  Major
3    QSQA.00533  m Minh Khu              01/06/93  Male    19     Food Technology
5    QSQA.00828  L Cao Nhi               25/10/93  Female  16.5   Food Technology
9    QSQA.01573  Nguyn Nh Phng           20/02/93  Female  15.5   Food Technology
252  QSQA.01031  L Nguyn Cm San          02/02/93  Female  14     Biotechnology
281  QSQA.00228  Nguyn Thnh t            19/11/93  Male    14     Biotechnology
231  QSQA.01589  Huznh Ngc Bo Trn
241  QSQA.01051  Phm Anh Ti              28/10/93  Male    26     Biotechnology
289  QSQA.01555  Tng Tho Uyn Minh        12/11/93  Female  13     Aquatic Resources Management
249  QSQA.00958  Nguyn Ngc Mai Phng
444  QSQA.00740  Trn i Ngha              20/07/93  Male    25     Electronics and Telecommunications
450  QSQA.00591  Nguyn Hong Phng Linh    26/11/93  Female  15     Industrial Systems Engineering
429  QSQA.00403  Dng Minh Hong           01/04/91  Male    13.5   Electronics and Telecommunications
926  QSQA.01215  Nguyn Anh Th            19/03/93  Female  21.5   Business Administration
878  QSQA.00566  Nguyn Phng Ln           04/04/93  Male    18     Business Administration
844  QSQA.00074  Phm L T Anh             02/09/93  Female  15     Business Administration


Division B
No.  ID          Name                 Birthday  Gender  Score  Major
228  QSQB.00017  L Th Ngc Anh         04/04/93  Female  17.5   Biotechnology
206  QSQB.00111  Trn Ngc Huy          17/08/93  Male    19.5   Biotechnology
209  QSQB.00107  Trn Th Khnh Ha       17/10/93  Female  16     Biotechnology
147  QSQB.00365  Nguyn Hi Tr          21/04/93  Male    19     Biotechnology
146  QSQB.00352  Nguyn Vng Bo Trm     04/03/93  Female  19     Biotechnology
221  QSQB.00077  Thanh Hi             20/07/93  Male    17.5   Biotechnology
337  QSQB.00368  L Trung              05/04/93  Male    24.5   Biotechnology
331  QSQB.00071  V Ngc Gic            24/01/93  Male    19.5   Biotechnology
536  QSQB.00362  Nguyn Minh Trit      10/08/93  Male    17     Biomedical Engineering
548  QSQB.00110  C Gia Huy            13/03/93  Male    29     Biomedical Engineering
535  QSQB.00378  Trnh Nht Tun         27/11/93  Male    21     Biomedical Engineering
533  QSQB.00430  L Dng Bnh            20/08/93  Male    16.5   Biomedical Engineering
553  QSQB.00204  Hong Kim Ngn         30/11/93  Female  16     Biomedical Engineering
556  QSQB.00188  Trn Kiu My           11/01/93  Female  16     Biomedical Engineering
541  QSQB.00253  Trn Cao Phc          27/01/93  Male    18     Biomedical Engineering

Division D1
No.   ID           Name                   Birthday  Gender  Score  Major
606   QSQD1.00268  Trn Th Diu Hin         13/09/93  Female  22.5   Finance and Banking
580   QSQD1.00580  Nguyn ng Khi Nguyn     19/02/93  Male    19.5   Finance and Banking
585   QSQD1.00564  Huznh L Hng Ngc        11/05/93  Female  21     Finance and Banking
603   QSQD1.00338  Phm Quznh Thin Hng     12/12/93  Female  21.5   Finance and Banking
582   QSQD1.00548  Nguyn Kim Ngn          10/03/93  Female  22     Finance and Banking
607   QSQD1.00270  ng Nguyn Tho Hin       03/12/93  Female  23.5   Finance and Banking
718   QSQD1.01126  Phm Hong Yn            06/08/93  Female  23.5   Finance and Banking
752   QSQD1.01088  Phm Thanh Vn           14/08/93  Female  21     Business Administration
745   QSQD1.01066  Phan Hong Phng Uyn     27/04/93  Female  22     Finance and Banking
794   QSQD1.01150  Lng Th Thy Nga         10/12/92  Female  18.5   Business Administration
793   QSQD1.01149  Trn Minh Nam           10/10/93  Male    18.5   Business Administration
1044  QSQD1.00607  ng Hin Xun Nhi         13/11/93  Female  21.5   Business Administration
1048  QSQD1.00646  Phan Th Bo Nh          21/02/93  Female  17.5   Business Administration
996   QSQD1.00332  Trn Vit Hng            02/11/93  Male    24     Business Administration
1010  QSQD1.00704  Nguyn H Phng           25/04/93  Female  19.5   Business Administration

Step 2: Select the necessary data. We have:
No.  Division A  Division B  Division D1
1    19          17.5        22.5
2    16.5        19.5        19.5
3    15.5        16          21
4    14          19          21.5
5    14          19          22
6    12          17.5        23.5
7    26          24.5        23.5
8    13          19.5        21
9    15          17          22
10   25          29          18.5
11   15          21          18.5
12   13.5        16.5        21.5
13   21.5        16          17.5
14   18          16          24
15   15          18          19.5

Step 3: Analyze the data to determine whether the scores of all divisions are equal.
- We assume that the three treatments (the three samples) are independent of each other and that the three populations are normally distributed.
- The null and alternative hypotheses:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all $\mu_i$ (i = 1, 2, 3) are equal
- Summary:
Division  Count  Sum  Average      Std. dev. (sample)
A         15     253  16.86666667  4.273952113
B         15     286  19.06666667  3.560029427
D1        15     316  21.06666667  1.989855223


ANOVA TABLE
We obtained the ANOVA table using the template.
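As a cross-check of the template, the one-way ANOVA can be recomputed from the 45 scores in Step 2. The `one_way_anova` helper is a hypothetical name. The critical value $F_{0.05;2,42} \approx 3.22$ is an approximation read from an F table, and it is an assumption of this sketch.

```python
groups = {
    "A":  [19, 16.5, 15.5, 14, 14, 12, 26, 13, 15, 25, 15, 13.5, 21.5, 18, 15],
    "B":  [17.5, 19.5, 16, 19, 19, 17.5, 24.5, 19.5, 17, 29, 21, 16.5, 16, 16, 18],
    "D1": [22.5, 19.5, 21, 21.5, 22, 23.5, 23.5, 21, 22, 18.5, 18.5, 21.5, 17.5, 24, 19.5],
}

def one_way_anova(samples):
    """Return (SSB, SSW, df_between, df_within, F) for a one-way ANOVA."""
    all_values = [x for s in samples for x in s]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group SS: group size times squared distance of group mean from grand mean
    ssb = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Within-group SS: squared distances of each value from its own group mean
    ssw = sum(sum((x - sum(s) / len(s)) ** 2 for x in s) for s in samples)
    df_b = len(samples) - 1
    df_w = len(all_values) - len(samples)
    return ssb, ssw, df_b, df_w, (ssb / df_b) / (ssw / df_w)

ssb, ssw, df_b, df_w, f_stat = one_way_anova(list(groups.values()))
f_crit = 3.22  # approx. F_{0.05; 2, 42} from an F table (assumption)
print(f"SSB={ssb:.2f} SSW={ssw:.2f} F({df_b},{df_w})={f_stat:.3f} reject H0: {f_stat > f_crit}")
```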

CONCLUSION:
From the result above, we reject $H_0$. This means there is enough statistical evidence that the mean scores of the three divisions are not all equal.
In the template's Tukey test for pairwise comparison of group means, only the difference between $\bar{X}_1$ (Division A) and $\bar{X}_3$ (Division D1) is significant. Since $\bar{X}_3 > \bar{X}_1$, we conclude that Division D1 has the highest scores and Division A has the lowest scores. Division B lies between them and is not significantly different from either.
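The Tukey comparison can be sketched from the summary statistics. All three groups have 15 scores, so the within-group mean square is the average of the three sample variances. The studentized-range critical value $q_{0.05;3,42} \approx 3.44$ is an approximate table value and an assumption of this sketch.

```python
import itertools
import math

n = 15                                              # observations per division
means = {"A": 253 / 15, "B": 286 / 15, "D1": 316 / 15}
sds = {"A": 4.273952113, "B": 3.560029427, "D1": 1.989855223}

# With equal group sizes, MSW equals the average of the group variances.
msw = sum(s ** 2 for s in sds.values()) / len(sds)
q_crit = 3.44  # approx. studentized range q(0.05; 3, 42) from a table (assumption)
hsd = q_crit * math.sqrt(msw / n)                   # Tukey's honest significant difference

significant = []
for g1, g2 in itertools.combinations(means, 2):
    diff = abs(means[g1] - means[g2])
    if diff > hsd:
        significant.append((g1, g2))
    print(f"{g1} vs {g2}: |diff| = {diff:.3f}, HSD = {hsd:.3f}")
print("significant pairs:", significant)
```

With these inputs the HSD is about 3.0 points. Only the 4.2-point gap between A and D1 exceeds it, which agrees with the template.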


Question 5:
We used Excel's RANDBETWEEN(1;1402) function in cell J2. It returns a number drawn randomly from the interval [1; 1402], and we then picked the record with the matching ordinal number. For example, running RANDBETWEEN(1;1402) gave 837, so we picked the 837th record, which is:
V Khoa   03/02/93   Male   02.14 - Tan Binh District, Ho Chi Minh City   24   407   Business Administration
We repeated this 100 times and obtained 100 random records. Using Excel functions such as FILTER, we counted the number of students in each of four score scales.
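Repeating RANDBETWEEN(1;1402) and counting by scale can be mimicked in Python. In this sketch, `random.randint(1, n)`, which is inclusive at both ends, stands in for RANDBETWEEN. The 1402 records are randomly generated placeholders, not the real exam data, and `scale_of` and `randbetween_draws` are hypothetical helper names. Independent draws like these can pick the same record twice; `random.sample` would be the choice for sampling without replacement.

```python
import random
from collections import Counter

def scale_of(score):
    """Map a score to one of the four scales used in the count table."""
    if score <= 17:
        return "<=17"
    if score <= 20:
        return "17<-<=20"
    if score <= 24:
        return "20<-<=24"
    return ">24"

def randbetween_draws(records, k, rng):
    """Repeat a RANDBETWEEN(1;N)-style pick k times; a record may be drawn twice."""
    return [records[rng.randint(1, len(records)) - 1] for _ in range(k)]

rng = random.Random(2011)
# Hypothetical stand-in for the 1402 exam records: (gender, score) pairs
records = [(rng.choice(["Male", "Female"]), rng.randrange(20, 61) / 2) for _ in range(1402)]

sample = randbetween_draws(records, 100, rng)
counts = Counter((scale_of(score), gender) for gender, score in sample)
print(counts)
```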
Part of the sampled data (gender, score):
Male 19; Male 24; Male 22; Female 22; Male 21.5; Female 21; Female 20; Female 19; Female 19; Male 19; Male 19; Female 18.5; Female 18.5; Female 18; Female 18; Male 18; Female 17.5; Female 17.5; Female 17; Male 17; Female 16.5; Female 16.5; Female 16.5; Male 16; Female 15.5; Female 15.5; Female 15.5; Female 15.5; Male 14; Female 13.5; Female 13.5; Female 13; Female 13; Female 12
Male 24; Female 22; Female 21.5; Female 21; Male 21; Female 19.5; Male 18.5; Female 18.5; Female 17.5; Male 17; Male 17; Male 16; Female 15.5; Female 15.5; Male 13.5; Female 12.5

Number of students by score scale and gender:
Score scale     Male   Female
<=17             21     26
17 < x <= 20      9     22
20 < x <= 24      5     15
> 24              2      0

$H_0$: The scores of students and their gender are independent of each other.
$H_1$: The scores of students and their gender are not independent.

We used the Chi-squared test in the template to test the independence between the scores of students and their gender, and obtained the result below:

Chi-square Contingency Table Test for Independence
                            Male    Female   Total
<=17          Observed       21      26       47
              (O - E)²/E     0.75    0.44     1.19
17 < x <= 20  Observed        9      22       31
              (O - E)²/E     0.53    0.31     0.84
20 < x <= 24  Observed        5      15       20
              (O - E)²/E     0.78    0.46     1.24
> 24          Observed        2       0        2
              (O - E)²/E     2.15    1.26     3.41
Total         Observed       37      63      100
              (O - E)²/E     4.21    2.47     6.67

chi-square = 6.67    df = 3    p-value = 0.0830    coefficient of contingency = 0.250

Since $\chi^2 = 6.67 < \chi^2_{(0.05,\,3)} = 7.81473$, we cannot reject $H_0$. Based on the Chi-square test for independence, we do not have enough evidence to conclude that the scores of students depend on their gender.
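The template's statistic can be reproduced from the observed counts alone, using expected counts $E = (\text{row total})(\text{column total})/N$. The sketch also lists cells whose expected count is below 5, a common rule-of-thumb threshold for the chi-square approximation. For this table, both cells in the > 24 row fall below it.

```python
import math

# Observed counts (male, female) per score scale, from the contingency table
observed = {
    "<=17": (21, 26),
    "17<-<=20": (9, 22),
    "20<-<=24": (5, 15),
    ">24": (2, 0),
}

row_totals = {k: sum(v) for k, v in observed.items()}
col_totals = [sum(v[j] for v in observed.values()) for j in range(2)]
n = sum(row_totals.values())

chi2 = 0.0
small_cells = []
for scale, counts in observed.items():
    for j, o in enumerate(counts):
        e = row_totals[scale] * col_totals[j] / n   # E = row total * column total / N
        chi2 += (o - e) ** 2 / e
        if e < 5:
            small_cells.append((scale, j, round(e, 2)))

df = (len(observed) - 1) * (len(col_totals) - 1)
contingency_coef = math.sqrt(chi2 / (chi2 + n))
chi2_crit = 7.81473  # chi-square(0.05, 3), as used in the text
print(f"chi2 = {chi2:.2f}, df = {df}, C = {contingency_coef:.3f}, reject H0: {chi2 > chi2_crit}")
print("cells with expected count < 5:", small_cells)
```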
