Data Analysis of Research Methodology
Univariate analysis: averages (A.M., G.M., H.M., Me, Mo), variation (s, M.D., R, C.V.), skewness, kurtosis, test of hypothesis
Bivariate analysis: simple correlation, simple regression, trend analysis, ratio analysis, test of hypothesis
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{O^2}{E} - n$$
where O = observed frequency and E = expected (theoretical) frequency.
Degree of freedom:
Uses of $\chi^2$-test:
Conditions for the application of $\chi^2$ test:
The following six basic conditions must be met in order for chi-square
analysis to be applied.
One of the most frequent uses of $\chi^2$ is for testing the null hypothesis that
two criteria of classification are independent. They are independent if the
distribution of a criterion in no way depends on the distribution of the other
criterion. If they are not independent, there is an association between the
criteria.
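In the contingency table, the ith row (category $A_i$) contains the observed frequencies $O_{i1}, O_{i2}, \ldots, O_{ij}, \ldots, O_{ic}$ and the row total $R_i$.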
To test the above null hypothesis, the required test statistic is given by
$$\chi^2 = \sum_i \sum_j \frac{O_{ij}^2}{E_{ij}} - N$$
where $O_{ij}$ = observed frequency in the ith row and jth column, and
$$E_{ij} = \text{expected frequency} = \frac{R_i C_j}{N},$$
where $R_i$ is the row total and $C_j$ is the column total.
If the calculated value of $\chi^2$ is greater than the critical value, the null hypothesis is rejected; otherwise it is accepted.
Example :
A sample of 200 people with a particular disease was selected. Out of these,
100 were given a drug and the others were not given any drug. The results are as
follows:
Outcome      Drug   No drug   Total
Cured         65      55       120
Not cured     35      45        80
Solution:
Here, the null hypothesis is H0: the drug and recovery from the disease are independent, i.e. the drug is not effective.
To test the above null hypothesis, the required test statistic is given by
$$\chi^2 = \sum_i \sum_j \frac{O_{ij}^2}{E_{ij}} - n$$
The calculations are as follows:

O_ij    E_ij = R_i C_j / N          O_ij² / E_ij
65      (120 × 100)/200 = 60        65²/60 = 70.417
35      (80 × 100)/200 = 40         35²/40 = 30.625
55      (120 × 100)/200 = 60        55²/60 = 50.417
45      (80 × 100)/200 = 40         45²/40 = 50.625

$$\chi^2 = \sum_i \sum_j \frac{O_{ij}^2}{E_{ij}} - n = 202.083 - 200 = 2.083$$
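Since the calculated value of $\chi^2$ is less than the critical value of $\chi^2$ (3.841 for (2 − 1)(2 − 1) = 1 degree of freedom at the 5% level of significance), the null hypothesis may be accepted, i.e. the drug is not effective in curing the disease.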
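The drug example can be checked with a short Python sketch. It is only an illustration of the shortcut formula (sum of O²/E minus N); the function name `chi_square_independence` and the 5% critical value constant are assumptions introduced here, not part of the original notes.

```python
def chi_square_independence(table):
    """Return (chi2, dof, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected frequency of each cell: E_ij = R_i * C_j / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Shortcut form of the statistic: sum(O_ij^2 / E_ij) - N
    chi2 = sum(
        o * o / e
        for o_row, e_row in zip(table, expected)
        for o, e in zip(o_row, e_row)
    ) - n

    # Degrees of freedom for an r x c table: (r - 1)(c - 1)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


# Rows: cured, not cured. Columns: drug, no drug.
observed = [[65, 55], [35, 45]]
chi2, dof, expected = chi_square_independence(observed)

# Tabulated chi-square value for 1 d.f., assuming a 5% level of significance
CRITICAL_5PCT_1DF = 3.841
reject_h0 = chi2 > CRITICAL_5PCT_1DF

print(f"chi-square = {chi2:.3f}, d.f. = {dof}, reject H0: {reject_h0}")
```

The same function accepts any r × c table, so it can also be applied to exercises of this type once the critical value for the corresponding degrees of freedom is looked up.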
Exercise:
To study the relationship between soft drinks and the gender of the students of an institution, a survey was conducted on 135 students. The findings are as follows:
Z-test / Normal test
Let U be a statistic, and let E(U) and σ(U) be the expected value and standard deviation of U respectively. If the population standard deviation is known or is estimated from a large sample (n ≥ 30), then the normal test or Z-test is defined as
$$Z = \frac{U - E(U)}{\sigma(U)} \quad \text{or} \quad Z = \frac{U - E(U)}{\text{estimated } \sigma(U)}$$
Uses of normal test:
Single population mean, two population means, proportion and correlation may
be tested by normal test.
Example: A drug research experimental unit is testing two drugs newly developed to reduce
blood pressure levels. The drugs are administered to two different sets of animals. In group one,
350 of 600 animals tested respond to drug one and in group two, 260 of 500 animals tested
respond to drug two. The research unit wants to test whether there is a difference between the
efficacies of the said two drugs at 5 per cent level of significance. How will you deal with this
problem?
Solution:
We take the null hypothesis that there is no difference between the two drugs, i.e. H0: π1 = π2.
The alternative hypothesis can be taken as that there is a difference between the drugs, i.e. Ha: π1 ≠ π2.
For testing the significance of the difference, the required test statistic is as follows:
$$z = \frac{(p_1 - p_2) - 0}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$$
The given information can be stated as: p1 = 350/600 = 0.583, n1 = 600, p2 = 260/500 = 0.520, n2 = 500.
Now,
$$z = \frac{(0.583 - 0.520) - 0}{\sqrt{\frac{0.583(1 - 0.583)}{600} + \frac{0.520(1 - 0.520)}{500}}} = \frac{0.063}{0.0301} = 2.093$$
At the α = 5% level of significance, the critical value of z is 1.96. Since the calculated value z = 2.093 is greater than the critical value 1.96, the null hypothesis may be rejected. Thus, we conclude that the difference between the efficacies of the two drugs is significant.
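The two-drug comparison can be reproduced with a minimal Python sketch that uses the unpooled standard error, as in the solution. The function name `two_proportion_z` is hypothetical. Because the code does not round the proportions or the standard error, it gives a slightly larger z than the hand calculation; the decision is unchanged.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi1 = pi2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


# 350 of 600 animals respond to drug one, 260 of 500 respond to drug two
z = two_proportion_z(350, 600, 260, 500)

# Two-sided critical value at the 5% level of significance
Z_CRITICAL = 1.96
reject_h0 = abs(z) > Z_CRITICAL

print(f"z = {z:.3f}, reject H0: {reject_h0}")
```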
Exercise:
F-test
Uses of F–test:
The F-test is mainly used to test the null hypothesis regarding the equality of two population variances, the homogeneity of independent estimates of population means, the significance of the sample correlation ratio, and the linearity of regression.
To test the above null hypothesis, the required test statistic is given by
$$F = \frac{s_1^2}{s_2^2} \quad (s_1^2 > s_2^2),$$
which follows the F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, or
$$F = \frac{s_2^2}{s_1^2} \quad (s_2^2 > s_1^2),$$
which follows the F-distribution with $n_2 - 1$ and $n_1 - 1$ degrees of freedom.
Example:
Solution:
Here, the null hypothesis is H0: $\sigma_1^2 = \sigma_2^2$, i.e. the variance of source A and that of source B are the same.
Here, the alternative hypothesis is Ha: $\sigma_1^2 > \sigma_2^2$, i.e. the variance of source A is greater than that of source B.
To test the above null hypothesis, the required test statistic is given by
$$F = \frac{s_1^2}{s_2^2} = \frac{225}{200} = 1.125$$
The tabulated value of F with n1 − 1 = 10 − 1 = 9 and n2 − 1 = 11 − 1 = 10 d.f. at the α = 0.05 level of significance is 3.02. Since the calculated value of F is less than the tabulated value, the null hypothesis may be accepted, i.e. the variance of source A and that of source B are the same.
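The variance-ratio calculation can be sketched in Python as follows. The sample variances (225 and 200), the sample sizes (10 and 11) and the tabulated value 3.02 are taken from the solution; the helper name `f_ratio` is an assumption made for illustration.

```python
def f_ratio(var_a, n_a, var_b, n_b):
    """Return (F, numerator d.f., denominator d.f.) with the larger variance on top."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Sample variances and sample sizes for source A and source B
f_value, df_num, df_den = f_ratio(225, 10, 200, 11)

# Tabulated F for (9, 10) d.f. at the 0.05 level, as quoted in the text
F_TABULATED = 3.02
reject_h0 = f_value > F_TABULATED

print(f"F = {f_value:.3f} with ({df_num}, {df_den}) d.f., reject H0: {reject_h0}")
```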
Exercise:
A sample of the monthly earnings records of 15 employees of company A has a variance
of Tk. 15.90 while a similar sample of 27 employees for company B has a variance of Tk. 17.50.
Is it safe to assume that there is less variance in company A than in company B?
Exercise:
CGPA of two sections of students each section containing 10 students of BBA 7th
semester of IIUC is given below:
CGPA of section A: 3.90, 4.00, 3.78, 3.50, 2.90, 3.45, 3.80, 3.95, 3.98, 3.70
CGPA of section B: 4.00, 2.95, 3.50, 3.25, 3.59, 3.80, 3.60, 3.90, 3.60, 3.55
Example: The following data gives the yield of wheat, amount of fertilizer and level of irrigation
of seven fields:
Yield of wheat (in 100 kg)   Amount of fertilizer (kg/acre)   Level of irrigation
40                           10                               100
50                           20                               200
50                           30                               300
70                           20                               400
65                           25                               450
68                           32                               470
80                           35                               500
i) Find regression equation of yield of wheat on amount of fertilizer and level of
irrigation;
ii) Estimate probable yield of wheat if amount of fertilizer and level of irrigation are 40
and 520 respectively;
iii) Construct analysis of variance (ANOVA) table;
iv) Find co-efficient of multiple determination and comment;
v) Test whether the regression as a whole is significant;
Solution:
i)
We have, regression equation of yield of wheat on amount of fertilizer and level of irrigation
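$$y = a + b_1 x_1 + b_2 x_2$$
$$\bar{y} = \frac{\sum y}{n} = \frac{423}{7} = 60.43, \quad \bar{x}_1 = \frac{\sum x_1}{n} = \frac{172}{7} = 24.57, \quad \bar{x}_2 = \frac{\sum x_2}{n} = \frac{2420}{7} = 345.71$$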
$$b_1 = \frac{SS(x_2)\,SP(x_1 y) - SP(x_1 x_2)\,SP(x_2 y)}{SS(x_1)\,SS(x_2) - \{SP(x_1 x_2)\}^2}$$
$$b_2 = \frac{SS(x_1)\,SP(x_2 y) - SP(x_1 x_2)\,SP(x_1 y)}{SS(x_1)\,SS(x_2) - \{SP(x_1 x_2)\}^2}$$
$$SS(x_1) = \sum x_1^2 - \frac{(\sum x_1)^2}{n} = 4674 - \frac{(172)^2}{7} = 447.71$$
$$SS(x_2) = \sum x_2^2 - \frac{(\sum x_2)^2}{n} = 973400 - \frac{(2420)^2}{7} = 136771.43$$
$$SS(y) = \sum y^2 - \frac{(\sum y)^2}{n} = 26749 - \frac{(423)^2}{7} = 1187.71$$
$$SP(x_1 y) = \sum x_1 y - \frac{\sum x_1 \sum y}{n} = 10901 - \frac{172 \times 423}{7} = 507.29$$
$$SP(x_2 y) = \sum x_2 y - \frac{\sum x_2 \sum y}{n} = 158210 - \frac{2420 \times 423}{7} = 11972.86$$
$$SP(x_1 x_2) = \sum x_1 x_2 - \frac{\sum x_1 \sum x_2}{n} = 65790 - \frac{172 \times 2420}{7} = 6327.14$$
$$b_1 = \frac{136771.43 \times 507.29 - 6327.14 \times 11972.86}{447.71 \times 136771.43 - \{6327.14\}^2} = -0.301$$
$$b_2 = \frac{447.71 \times 11972.86 - 6327.14 \times 507.29}{447.71 \times 136771.43 - \{6327.14\}^2} = 0.101$$
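The sums of squares and products above can be verified with a short Python sketch. The helper `centered_sum` is a hypothetical name. The intercept line uses the standard least-squares relation a = ȳ − b1·x̄1 − b2·x̄2, which is an assumption here because the notes do not show that step, and the prediction for part ii) follows from it.

```python
def centered_sum(u, v):
    """Corrected sum of products: sum(u*v) - sum(u)*sum(v)/n (SS when u is v)."""
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n


y = [40, 50, 50, 70, 65, 68, 80]            # yield of wheat (in 100 kg)
x1 = [10, 20, 30, 20, 25, 32, 35]           # amount of fertilizer (kg/acre)
x2 = [100, 200, 300, 400, 450, 470, 500]    # level of irrigation
n = len(y)

ss_x1 = centered_sum(x1, x1)
ss_x2 = centered_sum(x2, x2)
sp_x1y = centered_sum(x1, y)
sp_x2y = centered_sum(x2, y)
sp_x1x2 = centered_sum(x1, x2)

# Slopes from the two normal equations (same formulas as in the text)
denominator = ss_x1 * ss_x2 - sp_x1x2 ** 2
b1 = (ss_x2 * sp_x1y - sp_x1x2 * sp_x2y) / denominator
b2 = (ss_x1 * sp_x2y - sp_x1x2 * sp_x1y) / denominator

# Intercept (assumed standard relation, not shown in the notes)
a = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n

# Part ii): estimated yield for fertilizer = 40 and irrigation = 520
predicted = a + b1 * 40 + b2 * 520

print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, a = {a:.2f}, predicted yield = {predicted:.2f}")
```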
iv) We have the co-efficient of multiple determination
$$R^2 = \frac{SS(\text{Regression})}{SS(\text{Total})} = \frac{1056.56}{1187.71} = 0.89 = 89\%$$
$R^2$ indicates that 89% of the variation in the yield of wheat is explained by the amount of fertilizer and the level of irrigation.
The tabulated value of F with 2 and 4 d.f. at the α = 0.05 level of significance is 6.94. Since the calculated value of F is greater than the tabulated value, the null hypothesis may be rejected and the alternative hypothesis may be accepted, i.e. the regression as a whole is significant.
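The ANOVA quantities for the wheat example can be sketched in Python as below. It assumes SS(Regression) = b1·SP(x1y) + b2·SP(x2y) and F = [SS(Regression)/k] / [SS(Residual)/(n − k − 1)] with k = 2 predictors; all names are illustrative. Because the code keeps unrounded slopes, SS(Regression) comes out slightly higher than the hand value 1056.56, while R² still rounds to 0.89.

```python
def centered_sum(u, v):
    """Corrected sum of products: sum(u*v) - sum(u)*sum(v)/n."""
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n


y = [40, 50, 50, 70, 65, 68, 80]
x1 = [10, 20, 30, 20, 25, 32, 35]
x2 = [100, 200, 300, 400, 450, 470, 500]
n, k = len(y), 2  # observations, number of predictors

ss_x1, ss_x2 = centered_sum(x1, x1), centered_sum(x2, x2)
sp_x1y, sp_x2y = centered_sum(x1, y), centered_sum(x2, y)
sp_x1x2 = centered_sum(x1, x2)

denominator = ss_x1 * ss_x2 - sp_x1x2 ** 2
b1 = (ss_x2 * sp_x1y - sp_x1x2 * sp_x2y) / denominator
b2 = (ss_x1 * sp_x2y - sp_x1x2 * sp_x1y) / denominator

# Partition of the total variation
ss_total = centered_sum(y, y)
ss_regression = b1 * sp_x1y + b2 * sp_x2y
ss_residual = ss_total - ss_regression

r_squared = ss_regression / ss_total
f_value = (ss_regression / k) / (ss_residual / (n - k - 1))

# Tabulated F for (2, 4) d.f. at the 0.05 level, as quoted in the text
F_TABULATED = 6.94
regression_significant = f_value > F_TABULATED

print("Source       d.f.   SS")
print(f"Regression   {k}      {ss_regression:.2f}")
print(f"Residual     {n - k - 1}      {ss_residual:.2f}")
print(f"Total        {n - 1}      {ss_total:.2f}")
print(f"R^2 = {r_squared:.2f}, F = {f_value:.2f}, significant: {regression_significant}")
```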
Exercise: A soft drink bottler is analyzing the vending machine serving routes in his
distribution system. He is interested in predicting the time required by the distribution driver to
service the vending machines in an outlet. This service activity includes stocking the machines
with new beverage products and performing minor maintenance or housekeeping. It has been
suggested that the two most important variables influencing delivery time (y in min) are the
number of cases of product stocked (x1) and the distance walked by the driver (x2 in feet). 25
observations on delivery times, cases stocked and walking distances have been recorded.
Solution:
ii)
Exercise: The district manager of Jasons, a large discount retail chain, is investigating why
certain stores in her region are performing better than others. She believes that three factors are
related to total sales (Y): the number of competitors in the region (X1), the population in the
surrounding area (X2), and the amount spent on advertising (X3). From her district, consisting of
several hundred stores, she selects a random sample of 30 stores. The sample data were run on
the SPSS software package, and the results, with some missing figures, are given below:
Solution:
i)
Predictor   Coef    StDev   t-ratio
Constant    14      7       ----
X1          -1      0.70    -1/0.70 = -1.43
X2          30      5.20    30/5.20 = 5.77
X3          0.20    0.08    0.20/0.08 = 2.50
ii) Here, the null hypothesis is H0: the regression as a whole is insignificant.
The tabulated value of F with 3 and 26 d.f. at the α = 0.05 level of significance is 2.98. Since the calculated value of F is greater than the tabulated value, the null hypothesis may be rejected and the alternative hypothesis may be accepted, i.e. the regression as a whole is significant.
iii) Here, the null hypothesis is H0: the slope of the number of competitors in the region is insignificant.
Absolute t = 1.43
v) The tabulated value of t with 26 d.f. at the α = 0.05 level of significance is 2.056. Since the calculated value of t is smaller than the tabulated value, the null hypothesis may be accepted and the alternative hypothesis may be rejected, i.e. the slope of the number of competitors in the region is insignificant. Thus, this variable can be deleted from the model.
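The t-ratios in the table and the comparison with the tabulated t can be reproduced with a minimal Python sketch. The coefficients and standard deviations come from the table; the dictionary layout and variable names are assumptions for illustration.

```python
# (coefficient, standard deviation) for each predictor, from the output table
coefficients = {
    "X1": (-1.0, 0.70),   # number of competitors in the region
    "X2": (30.0, 5.20),   # population in the surrounding area
    "X3": (0.20, 0.08),   # amount spent on advertising
}

# Tabulated t for 26 d.f. at the 0.05 level, as quoted in the text
T_TABULATED = 2.056

# t-ratio = coefficient / standard deviation of the coefficient
t_ratios = {name: coef / sd for name, (coef, sd) in coefficients.items()}

# A slope is treated as significant when |t| exceeds the tabulated value
significant = {name: abs(t) > T_TABULATED for name, t in t_ratios.items()}

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.2f}, significant: {significant[name]}")
```

Under these figures only X1 fails the test, which matches the conclusion that the number of competitors is the candidate for removal.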