QUESTION – A
1. Study the Lungcap data set and answer the following questions.
2. Suppose it is given that 20% of the male smokers and 15% of the female smokers were born
caesarean. With the help of the data, verify the above statements. Give enough reasons for your
answers.
3. Plot the histogram of the distribution of Lungcap amongst smokers.
4. Plot the histogram of the distribution of Height amongst smokers.
5. Are height and Lungcap independent?
6. Is the variation of Lungcap of male smokers and female smokers equal?
7. Is the average Lungcap of smokers and non-smokers equal?
8. Plot the histogram of the age amongst smokers.
9. What percentage of people below 16 years smoke?
10. What percentage of people above 17 years smoke?
11. Test if smoking habit and age are dependent.
12. Test if smoking habit and Lungcap are dependent.
13. Fit a suitable distribution to height and also to Lungcap. Test the goodness of fit.
QUESTION – B
Study the car data set and answer the following questions.
1. Find the average and variance of price and mileage separately. Comment on the results. How will
you interpret the result statistically?
2. Test if the mean mileage of different car manufacturers within some price range are equal.
Clearly specify all the assumptions and the null and alternative hypotheses.
3. Find a 90% confidence interval for the price of Chevrolet cars.
4. Find a 90% confidence interval for the variance of prices of Pontiac cars.
5. Calculate the correlation coefficient between mileage and Liter for each company.
6. Comment on the results.
7. Suppose a car has a Liter of 3.8. How sure will you be that its mileage is more than 20,000?
8. Is there any correlation between prices and mileage?
QUESTION – C
1. Let Ȳ be the mean of a random sample of size n1 from N(μ, σ² = 10). Find n1 such that the
probability that the random interval (Ȳ − 1/2, Ȳ + 1/2) includes μ is approximately 0.954.
2. Let Z̄ be the mean of a random sample of size n2 from N(μ, σ² = 9). Find n2 such that the
probability that the random interval (Z̄ − 1, Z̄ + 1) includes μ is approximately 0.90.
3. Draw 200 random samples each of size n1 (found above) from a normal distribution with mean 5
and variance 3.
4. Write down the distribution of the sample mean. Test using the data obtained in Q3 above, if the
sample means follow that distribution.
5. Draw 200 random samples each of size n2 (found above) from a normal distribution with mean 7
and variance 3.
6. Compute 95% confidence interval for the difference of means from each of the 200 samples.
Draw a graph to show all 200 confidence intervals and comment.
QUESTION – D
1. Collect stock prices for 5 companies from 1st Jan 2016 to 30th June 2016.
2. Plot the histogram of the returns for each company. Describe the histograms.
3. Test whether the average returns for 5 companies are equal. State clearly the assumptions
required, null and alternative hypotheses.
4. Test whether the average returns for each pair of companies are equal.
5. Comment on the results.
QUESTION – E
1. The income distribution of a very large population is exponential with average income ₹40,000
per annum. Draw 500 samples (from the income distribution) of size 100 each. Sketch the
distribution of sample average income. Comment.
2. The age distribution of a very large population is given below:
Age Group (years)    15-18   18-21   21-23   23-25   25-27   27-29   29-31   31-33   33-35
Proportion           0.1     0.1     0.1     0.1     0.2     0.1     0.1     0.1     0.1
Draw 100 samples (from the age distribution) of size 50 each. Sketch the distribution of sample
average age. Comment.
Section-A
Q.1.i.
Two-Way Table (Gender by Smoking Habit)
Gender         Smoker    Non-Smoker    Grand Total
Male           33        334           367
Female         44        314           358
Grand Total    77        648           725
Q.1.ii. Marginal Probabilities
Gender                  Smoker    Non-Smoker    Marginal Probability
Male                    0.046     0.461         0.506
Female                  0.061     0.433         0.494
Marginal Probability    0.106     0.894         1.000
Q.1.iii. Given that one randomly selected person is a smoker, the probability that the person is
female:
P(Female | Smoker) = (# of female smokers) / (# of smokers) = 44/77 = 0.571
Since χ²cal < χ²(1, 0.05), the p-value for χ²cal (> 10%) is more than 5%. Hence there is not sufficient
evidence to reject H0, and the data are consistent with gender and smoking habit being independent.
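The chi-square statistic behind this conclusion is not shown in the answer. As a hedged sketch, it can be recomputed from the two-way table in Q.1.i. The function name `chi_square_2x2` is hypothetical, and the sketch assumes a plain Pearson chi-square test without continuity correction, which may differ from what the author actually ran.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). With 1 degree of freedom the p-value has the
    closed form erfc(sqrt(statistic / 2)), so no statistics library is needed.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: Male, Female; columns: Smoker, Non-Smoker (counts from the two-way table)
gender_by_smoking = [[33, 334], [44, 314]]
statistic, p_value = chi_square_2x2(gender_by_smoking)
print(f"chi-square = {statistic:.3f}, p-value = {p_value:.3f}")
```

Under these assumptions the statistic is about 2.08, below the critical value 3.841, with a p-value of roughly 0.15, which agrees with the "> 10%" quoted above.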
Q.2 Given: 20% (= πm) of male smokers and 15% (= πf) of female smokers were born caesarean.
a) As per the sample,
# of male smokers = 33, # of male smokers born caesarean = 10
Proportion of male smokers born caesarean, Pm = 10/33 = 30.3%
Sample size, Nm = 33
Since the sample size is > 30, as per the CLT, Pm ~ N(πm, SDm)
Standard deviation, SDm = sqrt(Pm × (1 − Pm)/Nm) = 0.08
Zcal = (Pm − πm)/SDm = (30.30% − 20%)/0.08 = 1.29
Z+cri = Z0.975 = 1.96; Z-cri = Z0.025 = −1.96
Hypothesis Statement:
H0: Proportion of male smokers born caesarean, πm = 20%
HA: Proportion of male smokers born caesarean, πm ≠ 20%
Rejection Rule
Reject H0 if Zcal > Z+cri or Zcal < Z-cri
Since Z-cri < Zcal (= 1.29) < Z+cri, there is not enough
evidence to reject H0.
Hence the sample is consistent with the statement that 20% of the male smokers were born caesarean.
b) As per the sample,
# of female smokers = 44, # of female smokers born caesarean = 11
Proportion of female smokers born caesarean, Pf = 11/44 = 25%
Sample size, Nf = 44
Since the sample size is > 30, as per the CLT, Pf ~ N(πf, SDf)
Standard deviation, SDf = sqrt(Pf × (1 − Pf)/Nf) = 0.065
Zcal = (Pf − πf)/SDf = (25% − 15%)/0.065 = 1.53
Z+cri = Z0.975 = 1.96; Z-cri = Z0.025 = −1.96
Hypothesis Statement:
H0: Proportion of female smokers born caesarean, πf = 15%
HA: Proportion of female smokers born caesarean, πf ≠ 15%
Rejection Rule
Reject H0 if Zcal > Z+cri or Zcal < Z-cri
Since Z-cri < Zcal (= 1.53) < Z+cri, there is not enough
evidence to reject H0.
Hence the sample is consistent with the statement that 15% of the female smokers were born caesarean.
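The two z-tests in Q.2 can be reproduced with a few lines. This is a minimal sketch that follows the answer's own choice of standard error (built from the sample proportion); the helper name `proportion_z` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """z statistic for H0: proportion = p0, using the sample proportion in the SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - p0) / se

# Two-sided test at the 5% level, so the critical value is the 97.5% normal quantile
z_crit = NormalDist().inv_cdf(0.975)

z_male = proportion_z(10, 33, 0.20)    # 10 of 33 male smokers born caesarean
z_female = proportion_z(11, 44, 0.15)  # 11 of 44 female smokers born caesarean

for label, z in (("male", z_male), ("female", z_female)):
    decision = "reject H0" if abs(z) > z_crit else "do not reject H0"
    print(f"{label}: z = {z:.2f}, critical = {z_crit:.2f}, {decision}")
```

A common textbook variant uses the hypothesised proportion in the standard error instead, sqrt(p0 × (1 − p0)/n). That gives somewhat larger z values (about 1.48 and 1.86), still inside ±1.96, so the decision is the same for both groups.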
Q3.&4.
Q5. Hypothesis Statement:
H0: Height and lungcap are independent for the following ranges
HA: Height and lungcap are dependent
Lungcap    Height <63    Height >63    Total
<7         229           34            263
>7         54            408           462
Total      283           442           725
Rejection Rule:
Reject H0 if the p-value for χ²cal is less than 5%.
Observed (Fij)    Expected (Eij)    Difference (Fij - Eij)    Sq. Diff./Exp. Freq (Fij - Eij)^2/Eij
F11  229          E11  102.66       126.34                    155.479
F12  34           E12  160.34       -126.34                   99.549
F21  54           E21  180.34       -126.34                   88.509
F22  408          E22  281.66       126.34                    56.670
Degrees of freedom = (2-1)(2-1) = 1
χ²cal = 400.207
χ²(1, 0.05) = 3.841
Since χ²cal > χ²(1, 0.05), the p-value for χ²cal (~0%) is less than 5%. Hence we reject H0 and state that
height and lungcap are dependent.
Q6. Since Fcal (= 1.19) < Fcrit (= 1.96), there is not enough evidence to reject H0. Hence the data are
consistent with the lungcap variances of male smokers and female smokers being equal.
Q7. Let μ1 and μ2 be the average lungcap of smokers and non-smokers, and let σ1² and σ2² be
the variances of the respective populations.
x̄1 = sample average lungcap of smokers ~ N(μ1, σ1²/n1)
x̄2 = sample average lungcap of non-smokers ~ N(μ2, σ2²/n2)
As per data,
No. of smokers, n1 = 77; No. of non-smokers, n2 = 648
Average lungcap of smokers, x̄1 = 8.645; Average lungcap of non-smokers, x̄2 = 7.77
Sample lungcap variance of smokers, s1² = 3.545; Sample lungcap variance of non-smokers, s2² = 7.432
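The answer to Q7 stops after listing the sample figures. One way the comparison could be completed from those figures is a large-sample two-sample z-test. The sketch below assumes H0: μ1 = μ2 against a two-sided alternative and unequal variances; this is an assumption, not necessarily the test the author went on to use.

```python
import math
from statistics import NormalDist

# Sample figures quoted above (smokers first, then non-smokers)
n1, mean1, var1 = 77, 8.645, 3.545
n2, mean2, var2 = 648, 7.77, 7.432

# Standard error of the difference of two independent sample means
se = math.sqrt(var1 / n1 + var2 / n2)
z = (mean1 - mean2) / se

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-sided p-value = {p_value:.5f}")
```

With these inputs z is roughly 3.65, well outside ±1.96, so under the stated assumptions the hypothesis of equal average lungcap would be rejected at the 5% level.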
Q.11. Hypothesis Statement:
H0: Age and smoking habit are independent for the following age ranges
HA: Age and smoking habit are dependent for the following age ranges
Smoking Habit    Age <15    Age >15    Total
Yes              42         35         77
No               506        142        648
Total            548        177        725
Reject H0 if the p-value for χ²cal is less than 5%.
Observed (Fij)    Expected (Eij)    Difference (Fij - Eij)    Sq. Diff./Exp. Freq (Fij - Eij)^2/Eij
F11  42           E11  58.20        -16.20                    4.510
F12  35           E12  18.80        16.20                     13.963
F21  506          E21  489.80       16.20                     0.536
F22  142          E22  158.20       -16.20                    1.659
Degrees of freedom = (2-1)(2-1) = 1
χ²cal = 20.668
χ²(1, 0.05) = 3.841
Since χ²cal > χ²(1, 0.05), the p-value for χ²cal (~0%) is less than 5%. Hence we reject H0 and state that age
and smoking habit are dependent.
Q.12. Hypothesis Statement:
H0: Lungcap and smoking habit are independent for the following lungcap ranges
HA: Lungcap and smoking habit are dependent for the following lungcap ranges
Smoking Habit    Lungcap <9    Lungcap >9    Total
Yes              43            34            77
No               432           216           648
Total            475           250           725
Reject H0 if the p-value for χ²cal is less than 5%.
Observed (Fij)    Expected (Eij)    Difference (Fij - Eij)    Sq. Diff./Exp. Freq (Fij - Eij)^2/Eij
F11  43           E11  50.45        -7.45                     1.100
F12  34           E12  26.55        7.45                      2.089
F21  432          E21  424.55       7.45                      0.131
F22  216          E22  223.45       -7.45                     0.248
Degrees of freedom = (2-1)(2-1) = 1
χ²cal = 3.568
χ²(1, 0.05) = 3.841
Since χ²cal < χ²(1, 0.05), the p-value for χ²cal is more than 5%. Hence there is not enough
evidence to reject H0; the data are consistent with lungcap and smoking habit being independent.
Q13. As per the data, we have the following descriptive statistics for lungcap and height:
                      LungCap     Height
Mean                  7.863148    64.83628
Standard Deviation    2.662008    7.202144
Count                 725         725
Distribution for Lungcap
H0: The lungcap distribution of the population follows a Normal distribution, N(7.863, 2.66)
HA: The lungcap distribution does not follow N(7.863, 2.66)
We construct the following frequency distribution, choosing the bin limits so that each bin has an
expected frequency of 10%.
Percentage    Z-value    Bin       Frequency (fi)    Expected Frequency (ei)    fi-ei    (fi-ei)^2    (fi-ei)^2/ei
10%           -1.28      4.456     83                72.5                       10.5     110.250      1.521
20%           -0.84      5.627     62                72.5                       -10.5    110.250      1.521
30%           -0.52      6.479     67                72.5                       -5.5     30.250       0.417
40%           -0.25      7.198     61                72.5                       -11.5    132.250      1.824
50%           0          7.863     72                72.5                       -0.5     0.250        0.003
60%           0.25       8.529     74                72.5                       1.5      2.250        0.031
70%           0.52       9.247     79                72.5                       6.5      42.250       0.583
80%           0.84       10.099    71                72.5                       -1.5     2.250        0.031
90%           1.28       11.271    88                72.5                       15.5     240.250      3.314
More                               68                72.5                       -4.5     20.250       0.279
χ²cal                                                                                                 9.524
We get χ²cal = 9.52.
For significance level 5% and degrees of freedom 7 (= 10-2-1), we have χ²(7, 0.05) = 14.067.
Since χ²cal < χ²(7, 0.05), the p-value is more than 5%. Hence we do not reject H0; the data are
consistent with the lungcap distribution following N(7.863, 2.66).
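The lungcap goodness-of-fit calculation can be retraced as follows. This is a sketch under the same setup as the table (ten equal-probability bins, mean 7.863 and standard deviation 2.662 estimated from the data). The critical value is hard-coded from a standard chi-square table rather than computed, because the Python standard library has no chi-square quantile function.

```python
from statistics import NormalDist

mean, sd = 7.863, 2.662  # estimated from the 725 observations
observed = [83, 62, 67, 61, 72, 74, 79, 71, 88, 68]  # bin frequencies from the table

n = sum(observed)
k = len(observed)

# Upper bin limits at the 10%, 20%, ..., 90% quantiles of the fitted normal
fitted = NormalDist(mean, sd)
upper_limits = [fitted.inv_cdf(i / k) for i in range(1, k)]

# Every bin has the same expected count because the bins are equal-probability
expected = n / k
chi_square = sum((f - expected) ** 2 / expected for f in observed)

# Two parameters (mean, sd) were estimated, so dof = bins - 2 - 1
dof = k - 2 - 1
critical_value = 14.067  # chi-square table value for dof = 7 at the 5% level

print([round(x, 3) for x in upper_limits])
print(f"chi-square = {chi_square:.3f}, dof = {dof}, critical = {critical_value}")
```

The bin limits printed here differ slightly from the table (for example about 4.45 instead of 4.456) because the table rounds the z-values to two decimals; the statistic itself depends only on the observed counts.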
80%           0.84       70.886    70                72.5                       -2.5     6.250        0.086
90%           1.28       74.055    98                72.5                       25.5     650.250      8.969
More                               71                72.5                       -1.5     2.250        0.031
χ²cal                                                                                                 17.414
We get χ²cal = 17.414.
For significance level 5% and degrees of freedom 7 (= 10-2-1), we have χ²(7, 0.05) = 14.067.
Since χ²cal > χ²(7, 0.05), the p-value is less than 5%. Hence we reject H0: the height distribution
does not follow N(64.836, 7.202).
Section-B
Q1. We observe that the sample variance of price is larger than that of mileage, i.e. prices are
spread more widely around their average than mileages are. Cars across a wide range of prices
therefore have mileage close to the average of 19831.93.
Q2.
Price Range    Average mileage (x̄i)    Variance (σi²)    Sample Size (ni)
<20000         20241.52                 64394503          467
20k-40k        19759.26                 65564947          297
>40K           15589.65                 95651556          40
Let μ1, μ2 and μ3 be the average mileage of cars in the price ranges given in the table, and let
σ1², σ2² and σ3² be the mileage variances of the respective price ranges.
Hypothesis Statement
H0: μ1 = μ2 = μ3
HA: not all of μ1, μ2 and μ3 are equal
We conducted an ANOVA test. Since the p-value is less than 0.05, we reject H0 and state that the
average mileage of the cars in the above price ranges is not equal.
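The ANOVA output itself is not reproduced in the answer. As a rough check, the F statistic can be rebuilt from the summary table in Q2 alone. The sketch assumes the "Variance" column holds the sample variance of mileage within each price range (divisor n − 1), which the source does not state explicitly.

```python
# (sample size, average mileage, mileage variance) per price range, from the Q2 table
groups = [
    (467, 20241.52, 64394503),  # price < 20000
    (297, 19759.26, 65564947),  # price 20k-40k
    (40, 15589.65, 95651556),   # price > 40K
]

k = len(groups)
n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / n_total

# Between-group and within-group sums of squares
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
ss_within = sum((n - 1) * v for n, _, v in groups)

df_between = k - 1
df_within = n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

# For exactly 2 numerator degrees of freedom the F survival function has a
# closed form: P(F > f) = (df_within / (df_within + 2 f)) ** (df_within / 2)
p_value = (df_within / (df_within + 2 * f_stat)) ** (df_within / 2)

print(f"grand mean = {grand_mean:.2f}, F = {f_stat:.3f}, p-value = {p_value:.4f}")
```

Under that assumption F comes out near 6 with a p-value of about 0.0025, which is in line with the reported rejection of H0. The closed-form p-value only applies because there are three groups.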
Q3.
Price (Chevrolet)
Mean                        16427.6
Standard Deviation          6901.439
Sample Variance             47629867
Count                       320
Confidence Level (90.0%)    636.4364
The 90% confidence interval for the mean price is 16427.6 ± 636.4364, i.e. (15791.16, 17064.04),
where 636.4364 = t(0.95, 319) × 6901.439/sqrt(320) with t(0.95, 319) ≈ 1.65.
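The reported Confidence Level (90.0%) value of 636.4364 is the half-width of the interval around the mean. A minimal sketch of that calculation follows. It swaps in the normal quantile for the t quantile, an approximation chosen only because the standard library has no t distribution; for 319 degrees of freedom the two quantiles agree to about 0.3%.

```python
import math
from statistics import NormalDist

# Summary statistics for Chevrolet prices, from the table above
mean, sd, n = 16427.6, 6901.439, 320

# 90% two-sided interval, so 5% in each tail
z = NormalDist().inv_cdf(0.95)
margin = z * sd / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"margin = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The margin from this approximation is about 634.6, slightly below the 636.4364 obtained with the t quantile, as expected.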
Q.4.
N                                 150
Sample Variance (s²)              16708238
χ²(149, 0.1)                      171.507
CI Variance (90%), lower bound    14515607 = (n-1)s²/χ²(149, 0.1)
Q.5.
Manufacturer    Cov.(ML)    SD(M)       SD(L)    Corr.(ML)
Buick           162.323     6932.136    0.230    0.102
Cadillac        594.100     8964.292    0.803    0.083
Chevrolet       -285.829    8203.571    1.151    -0.030
Pontiac         959.280     8110.435    1.098    0.108
SAAB            -9.525      8404.288    0.162    -0.007
Saturn          -501.661    8479.994    0.301    -0.197
Q.6. Analysing the correlation coefficients above, it can be stated that there is only a weak linear
relation between mileage and liter, as the correlation coefficients are close to zero.
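Each correlation in the Q.5 table is the covariance divided by the product of the two standard deviations. The short sketch below recomputes them from the tabulated values as a consistency check; small differences in the third decimal are expected because the inputs are already rounded.

```python
# manufacturer: (Cov(M, L), SD(M), SD(L), reported Corr(M, L)) from the Q.5 table
table = {
    "Buick": (162.323, 6932.136, 0.230, 0.102),
    "Cadillac": (594.100, 8964.292, 0.803, 0.083),
    "Chevrolet": (-285.829, 8203.571, 1.151, -0.030),
    "Pontiac": (959.280, 8110.435, 1.098, 0.108),
    "SAAB": (-9.525, 8404.288, 0.162, -0.007),
    "Saturn": (-501.661, 8479.994, 0.301, -0.197),
}

# Pearson correlation: r = Cov(M, L) / (SD(M) * SD(L))
recomputed = {
    name: cov / (sd_m * sd_l) for name, (cov, sd_m, sd_l, _) in table.items()
}

for name, r in recomputed.items():
    reported = table[name][3]
    print(f"{name:10s} recomputed = {r:+.3f}, reported = {reported:+.3f}")
```

All six coefficients lie between about −0.20 and +0.11, which is the basis for calling the linear relation weak.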
Q8.
Cov.(PM)         SD(P)       SD(M)       Corr.(PM)
-11589868.158    9884.853    8196.320    -0.143
As the correlation between price and mileage is close to zero, there is only a weak linear
relation between them.
Section-C:
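For parts 1 and 2 of Question C, the interval (sample mean ± h) covers μ with probability c when h = z × σ/sqrt(n), where z is the (1 + c)/2 quantile of the standard normal. Solving for n gives n = z²σ²/h². The sketch below applies that formula; the helper name `required_sample_size` is hypothetical, and rounding up to the next integer is an assumption about how "approximately" should be read.

```python
import math
from statistics import NormalDist

def required_sample_size(variance, half_width, coverage):
    """Smallest n with P(|sample mean - mu| < half_width) >= coverage for N(mu, variance)."""
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    return math.ceil(z ** 2 * variance / half_width ** 2)

# Part 1: sigma^2 = 10, interval (mean - 1/2, mean + 1/2), coverage 0.954
n1 = required_sample_size(10, 0.5, 0.954)

# Part 2: sigma^2 = 9, interval (mean - 1, mean + 1), coverage 0.90
n2 = required_sample_size(9, 1.0, 0.90)

print(f"n1 = {n1}, n2 = {n2}")
```

For part 1 the quantile is almost exactly 2, so n1 = 4 × 10/0.25 = 160. For part 2 the quantile is about 1.645, giving 24.35, which rounds up to 25.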
[Figure: plot with x-axis running from 1 to 197 in steps of 7 and y-axis from -1 to 1; the plotted series is not recoverable.]
Section D:
Q.1. I collected the stock prices of Monnet Ispat & Energy Ltd, GAIL (India) Ltd, Alstom India Ltd, ABB
India Ltd and Siemens Ltd from 01.01.2016 to 30.06.2016.
Since the p-value is greater than 0.05, we do not reject H0; the data are consistent with the average
returns of the said companies being equal.
Section E:
Q.1
Since the sample size (100) is more than 30, the average income of each sample will approximately
follow a normal distribution, N(40000, SD = 40000/sqrt(100)), by the CLT.
Sample mean calculated = 40260.59 ≈ 40000
Standard deviation = 3983.923 ≈ 4000 (= 40000/10)
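The experiment in Q.1 can be repeated with a short simulation. This is only a sketch: the seed is arbitrary, so the numbers it prints will not match the 40260.59 and 3983.923 reported above, but they should land near 40000 and 4000 for the same reason.

```python
import random
import statistics

random.seed(2016)  # arbitrary seed, used only to make the run repeatable

mean_income = 40000
n_samples, sample_size = 500, 100

# expovariate takes the rate (1 / mean) of the exponential distribution
sample_means = [
    statistics.fmean(
        random.expovariate(1 / mean_income) for _ in range(sample_size)
    )
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# CLT prediction: mean 40000 and SD 40000 / sqrt(100) = 4000
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"SD of sample means   = {sd_of_means:.2f}")
```

An exponential distribution has standard deviation equal to its mean, which is why the predicted spread of the sample averages is 40000/sqrt(100) = 4000.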
Q.2 Since the sample size (50) is more than 30, the average age of the samples should approximately
follow a normal distribution as per the CLT.
Population average age = 25.8 and standard deviation = 5.216, so the sample average age should have
mean 25.8 (observed: 25.09) and standard deviation 5.216/sqrt(50) = 0.738 (observed: 0.64).
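The population figures 25.8 and 5.216 can be recovered from the age table in Question E if each age group is represented by its midpoint. That midpoint convention is an assumption, though it does reproduce both numbers:

```python
import math

# Age groups and proportions from the table in Question E, part 2
age_groups = [(15, 18), (18, 21), (21, 23), (23, 25), (25, 27),
              (27, 29), (29, 31), (31, 33), (33, 35)]
proportions = [0.1, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1]

# Represent every group by its midpoint (assumption)
midpoints = [(low + high) / 2 for low, high in age_groups]

pop_mean = sum(p * m for p, m in zip(proportions, midpoints))
pop_var = sum(p * (m - pop_mean) ** 2 for p, m in zip(proportions, midpoints))
pop_sd = math.sqrt(pop_var)

# Standard deviation of the average of a sample of 50 ages
sample_size = 50
se = pop_sd / math.sqrt(sample_size)

print(f"mean = {pop_mean:.1f}, SD = {pop_sd:.3f}, SD of sample mean = {se:.3f}")
```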