Ba Assignment 3
ASSIGNMENT 3
Submitted by:
RADHIKA PADIA (ROLL NO 15)
CIS   AGE   MARTIAL   NUMCAR   AVAGE   NUMTRIP   CONCEPT1   NUMCAR1   GROUPS
0     24    0         4        2.5     1         0          1         1
0     22    0         1        3       1         0          0         1
0     27    0         1        2.5     0         0          0         1
0     26    0         2        2       2         0          1         1
0     29    0         1        2.5     1         0          0         1
0     23    0         1        3       0         0          0         1
0     24    0         2        3       0         0          1         1
0     22    0         2        3       1         0          1         1
0     26    0         1        2       1         0          0         1
0     48    0         1        0.5     3         1          0         1
0     37    0         1        1.5     2         1          0         1
0     32    1         3        3       0         0          1         2
0     32    1         1        1       0         0          0         2
0     37    1         1        1       0         0          0         2
0     41    1         1        2       3         0          0         2
0     23    1         1        2       0         0          0         2
0     34    1         5        3       1         0          1         2
0     38    1         2        2       0         0          1         2
0     32    1         1        1.5     0         0          0         2
0     29    1         3        2.5     0         0          1         2
0     36    1         1        1.5     0         0          0         2
0     34    1         1        1.5     0         0          0         2
0     32    1         2        2       1         0          1         2
0     28    1         1        2.5     0         0          0         2
0     29    1         1        2       3         1          0         2
0     47    1         2        1       5         1          1         2
0     24    1         1        0.5     9         1          0         2
0     43    1         2        1       2         1          1         2
0     42    1         3        1.5     2         1          1         2
0     44    1         1        0.5     7         1          0         2
1     24    0         3        1       2         0          1         3
1     24    0         4        2       0         0          1         3
1     23    0         2        2.5     0         0          1         3
1     36    0         1        2       0         0          0         3
1     39    0         1        2       0         0          0         3
1     30    0         1        2       0         0          0         3
1     28    0         2        4.5     2         0          1         3
1     42    0         1        2       2         1          0         3
1     39    1         1        1.5     1         0          0         4
1     27    1         1        2       3         0          0         4
1     36    1         2        2.5     1         0          1         4
1     47    1         1        2       1         0          0         4

CONCEPT   CIS   AGE   MARTIAL   NUMCAR   AVAGE   NUMTRIP   CONCEPT1   NUMCAR1   GROUPS
3         1     42    1         2        3.5     1         0          1         4
4         1     42    1         1        0.5     3         1          0         4
5         1     47    1         1        1       4         1          0         4
4         1     43    1         2        1.5     4         1          1         4
5         1     62    1         1        0.5     6         1          0         4
5         1     55    1         2        0.5     4         1          1         4
4         1     39    1         3        1.5     5         1          1         4
5         1     58    1         1        2       6         1          0         4
4         1     43    1         1        0.4     2         1          0         4
5         1     59    1         1        0.5     7         1          0         4
4         1     43    1         3        1.5     3         1          1         4
4         1     42    1         1        1.5     4         1          0         4
4         1     47    1         4        1.5     4         1          1         4
4         1     38    1         2        1       3         1          1         4
5         1     37    1         2        0.8     8         1          1         4
4         1     51    1         1        1       2         1          0         4
4         1     47    1         2        1.5     1         1          1         4
5         1     51    1         1        2       6         1          0         4
QUESTION 1:
Divide the sample into two groups
a. Those showing high interest (4 or 5 rating on CONCEPT)
b. Those showing low interest (1, 2 or 3 rating on CONCEPT)
Cross tabulate high versus low interest with CIS. How strong is the association between
interest in the policy and the current insurance supplier? Is the association statistically
significant? What does it tell you?
SOLUTION:
THE RECODED VALUES OF CONCEPT ARE NAMED CONCEPT1 (0 = LOW INTEREST, 1 = HIGH INTEREST) AND CROSS TABULATED AS FOLLOWS:
CONCEPT1 (rows) by CIS (columns)
         0     1     Total
0        22    12    34
1        8     18    26
Total    30    30    60

                     Value   Df      p-Value
Pearson Chi-Square   6.787   1.000   0.009
INTERPRETATION:
Let us assume,
Ho: there is no association between interest in the policy and the current insurance supplier.
H1: there is an association between interest in the policy and the current insurance supplier.
From the above analysis, the p value is 0.009, which is less than 0.05 (at 5% level of
significance).
Therefore Ho is rejected in favour of H1.
We can conclude that the association between interest in the policy and the current insurance
supplier is statistically significant. The p value shows significance, not strength; the strength is moderate, since phi = sqrt(6.787 / 60) = 0.34. High interest was shown by 18 of the 30 respondents with CIS = 1 but by only 8 of the 30 respondents with CIS = 0.
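The chi-square value above can be recomputed from the four cell counts of the cross-tabulation. The following is a minimal sketch, not the output of the statistics package used in the assignment. The function name `chi_square_2x2` is hypothetical, the p-value shortcut is valid only for 1 degree of freedom, and the phi coefficient is an added effect-size measure that does not appear in the original output.

```python
import math

# Observed counts from the CONCEPT1 (rows) by CIS (columns) table.
observed = [
    [22, 12],  # CONCEPT1 = 0 (low interest)
    [8, 18],   # CONCEPT1 = 1 (high interest)
]


def chi_square_2x2(table):
    """Pearson chi-square, p-value (df = 1 only) and phi for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    phi = math.sqrt(chi2 / n)  # effect size for a 2x2 table
    return chi2, p_value, phi


chi2, p_value, phi = chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}, phi = {phi:.3f}")
```

Separating the p-value from phi keeps the two questions apart: the first says whether an association exists, the second how strong it is.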
QUESTION 2:
We can consider the concept rating (CONCEPT) as the dependent variable and the
remaining 6 variables as predictor variables. Regress CONCEPT on the other variables:
a. Interpret the regression equation and indicate the extent to which the variations in the
predictor variables explain the variation in the dependent variable.
b. Is each of the predictor variables significant at 0.05 level? Can a simpler model (involving
fewer predictors) be developed? If so, what is the model and what is the percentage
improvement of the simple model over the full model?
SOLUTION:
PART a:
OLS Regression
Dependent Variable             CONCEPT
N                              60
Multiple R                     0.869
Squared Multiple R             0.755
Adjusted Squared Multiple R    0.727
Standard Error of Estimate     0.705

Effect     Coefficient   Standard Error   Std. Coefficient   Tolerance   t        p-Value
CONSTANT   1.360         0.588            0.000              -           2.315    0.025
CIS        -0.067        0.211            -0.025             0.746       -0.318   0.752
MARTIAL    -0.098        0.239            -0.034             0.673       -0.409   0.684
AGE        0.056         0.013            0.418              0.451       4.132    0.000
NUMCAR     0.033         0.099            0.023              0.914       0.328    0.744
AVAGE      -0.442        0.140            -0.282             0.579       -3.160   0.003
NUMTRIP    0.226         0.051            0.383              0.627       4.458    0.000

Analysis of Variance
Source       SS       df   Mean Squares   F-Ratio   p-Value
Regression   81.359   6    13.560         27.248    0.000
Residual     26.375   53   0.498
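INTERPRETATION:
The regression equation is:
CONCEPT = 1.360 - 0.067 CIS - 0.098 MARTIAL + 0.056 AGE + 0.033 NUMCAR - 0.442 AVAGE + 0.226 NUMTRIP
The p value of the F-ratio is 0.000, which is less than 0.05 at 5% level of significance, therefore Ho is
rejected in favour of H1, i.e., the regression equation is significant in predicting the
dependent variable.
Also, squared multiple R is 0.755, so the six predictors together explain 75.5% of the variation in CONCEPT (72.7% after adjusting for the number of predictors), which is a fairly good
level of fit.
The p value of the constant is 0.025, so the intercept differs significantly from zero. Among the predictors, AGE (p = 0.000), AVAGE (p = 0.003) and NUMTRIP (p = 0.000) are significant at the 0.05 level, while CIS, MARTIAL and NUMCAR are not.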
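The goodness-of-fit figures quoted above follow directly from the two sums of squares in the Analysis of Variance table. A minimal sketch, assuming only the reported values (regression SS 81.359, residual SS 26.375, 60 observations, 6 predictors); the function name `fit_summary` is hypothetical.

```python
def fit_summary(ss_regression, ss_residual, n_obs, n_predictors):
    """Return R-squared, adjusted R-squared and the F-ratio of an OLS fit."""
    ss_total = ss_regression + ss_residual
    df_residual = n_obs - n_predictors - 1
    r_squared = ss_regression / ss_total
    adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
    mean_sq_regression = ss_regression / n_predictors
    mean_sq_residual = ss_residual / df_residual
    return r_squared, adj_r_squared, mean_sq_regression / mean_sq_residual


# Values taken from the full-model ANOVA table.
r_squared, adj_r_squared, f_ratio = fit_summary(81.359, 26.375, 60, 6)
print(f"R-sq = {r_squared:.3f}, adjusted R-sq = {adj_r_squared:.3f}, F = {f_ratio:.3f}")
```

The same function applies to each of the single-predictor regressions by passing that model's sums of squares and `n_predictors=1`.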
PART b:
1. OLS Regression
Dependent Variable             CONCEPT
N                              60
Multiple R                     0.323
Squared Multiple R             0.105
Adjusted Squared Multiple R    0.089
Standard Error of Estimate     1.290

Effect     Coefficient   Standard Error   Std. Coefficient   Tolerance   t        p-Value
CONSTANT   2.633         0.235            0.000              -           11.184   0.000
CIS        0.867         0.333            0.323              1.000       2.603    0.012

Analysis of Variance
Source       SS       df   Mean Squares   F-Ratio   p-Value
Regression   11.267   1    11.267         6.774     0.012
Residual     96.467   58   1.663
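INTERPRETATION:
The regression equation is:
CONCEPT = 2.633 + 0.867 CIS
The p value is 0.012, which is less than 0.05, so CIS on its own is a significant predictor of CONCEPT at 0.05 level of significance.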
2. OLS Regression
Effect     Coefficient   Standard Error   Std. Coefficient   Tolerance   t        p-Value
CONSTANT   -0.549        0.449            0.000              -           -1.222   0.227
AGE        0.098         0.012            0.739              1.000       8.348    0.000

Analysis of Variance
Source       SS       df   Mean Squares   F-Ratio   p-Value
Regression   58.796   1    58.796         69.685    0.000
Residual     48.937   58   0.844
INTERPRETATION:
The regression equation is:
CONCEPT = -0.549 + 0.098 AGE
H0: the regression equation is not a significant predictor of the dependent variable (CONCEPT).
H1: the regression equation is a significant predictor of the dependent variable (CONCEPT).
The p value is 0.000, which is less than 0.05 at 5% level of significance, which indicates that H0 is
rejected in favour of H1.
Therefore AGE is a significant predictor of CONCEPT at 0.05 level of significance.
3. OLS Regression
Effect     Coefficient   Standard Error   Std. Coefficient   Tolerance   t       p-Value
CONSTANT   2.211         -                -                  -           7.851   0.000
MARTIAL    1.253         -                0.435              1.000       3.679   0.001

Analysis of Variance
Source       SS       df   Mean Squares   F-Ratio   p-Value
Regression   20.380   1    20.380         13.532    0.001
Residual     87.353   58   1.506
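INTERPRETATION:
The regression equation is:
CONCEPT = 2.211 + 1.253 MARTIAL
The p value is 0.001, which is less than 0.05, so MARTIAL on its own is a significant predictor of CONCEPT at 0.05 level of significance.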
4. OLS Regression
Effect     Coefficient   Standard Error   Std. Coefficient   Tolerance   t        p-Value
CONSTANT   3.395         0.352            0.000              -           9.633    0.000
NUMCAR     -0.195        0.182            -0.139             1.000       -1.073   0.288

Analysis of Variance
Source       SS        df   Mean Squares   F-Ratio   p-Value
Regression   2.095     1    2.095          1.150     0.288
Residual     105.638   58   1.821
INTERPRETATION:
The regression equation is:
CONCEPT = 3.395 - 0.195 NUMCAR
H0: the regression equation is not a significant predictor of the dependent variable (CONCEPT).
H1: the regression equation is a significant predictor of the dependent variable (CONCEPT).
The p value is 0.288, which is MORE than 0.05 at 5% level of significance, which indicates that H0
cannot be rejected.
Therefore NUMCAR is NOT a significant predictor of CONCEPT at 0.05 level of significance.
5. OLS Regression
Effect     Coefficient   Standard Error   Std. Coefficient   Tolerance   t        p-Value
CONSTANT   4.968         0.295            0.000              -           16.856   0.000
AVAGE      -1.074        0.150            -0.685             1.000       -7.164   0.000

Analysis of Variance
Source       SS       df   Mean Squares   F-Ratio   p-Value
Regression   50.576   1    50.576         51.321    0.000
Residual     57.158   58   0.985
INTERPRETATION:
The regression equation is:
CONCEPT = 4.968 - 1.074 AVAGE
H0: the regression equation is not a significant predictor of the dependent variable (CONCEPT).
H1: the regression equation is a significant predictor of the dependent variable (CONCEPT).
The p value is 0.000, which is less than 0.05 at 5% level of significance, which indicates that H0 is
rejected in favour of H1.
Therefore AVAGE is a significant predictor of CONCEPT at 0.05 level of significance.
6. OLS Regression
Effect     Coefficient   Standard Error   Std. Coefficient   Tolerance   t        p-Value
CONSTANT   2.136         0.166            0.000              -           12.843   0.000
NUMTRIP    0.430         0.053            0.729              1.000       8.116    0.000

Analysis of Variance
Source       SS       df   Mean Squares   F-Ratio   p-Value
Regression   57.286   1    57.286         65.863    0.000
Residual     50.447   58   0.870
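INTERPRETATION:
The regression equation is:
CONCEPT = 2.136 + 0.430 NUMTRIP
The p value is 0.000, which is less than 0.05, so NUMTRIP on its own is a significant predictor of CONCEPT at 0.05 level of significance.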
Best subsets regression
Model No   R-Sq    Adjusted R-Sq   Mallows' Cp   MSE     Variables
1          0.546   0.538           42.339        0.844   AGE
2          0.532   0.524           45.374        0.870   NUMTRIP
3          0.707   0.696           9.524         0.555   AGE, NUMTRIP
4          0.662   0.650           19.222        0.639   AVAGE, AGE
5          0.754   0.741           1.295         0.474   AVAGE, AGE, NUMTRIP
6          0.709   0.693           11.100        0.561   -
7          0.754   0.736           3.190         0.481   -
8          0.754   0.736           3.230         0.482   -
9          0.755   0.732           5.101         0.489   -
10         0.755   0.732           5.108         0.489   -
11         0.755   0.727           7.000         0.498   CIS, MARTIAL, AGE, NUMCAR, AVAGE, NUMTRIP

Model No   AIC       AICC      BIC       Variables
1          164.044   164.473   170.327   AGE
2          165.868   166.296   172.151   NUMTRIP
3          139.824   140.551   148.201   AGE, NUMTRIP
4          148.348   149.076   156.726   AVAGE, AGE
5          131.290   132.401   141.762   AVAGE, AGE, NUMTRIP
6          141.423   142.534   151.894   -
7          133.172   134.756   145.738   -
8          133.217   134.802   145.783   -
9          135.071   137.225   149.731   -
10         135.078   137.232   149.739   -
11         136.956   139.780   153.711   CIS, MARTIAL, AGE, NUMCAR, AVAGE, NUMTRIP

Best model (AVAGE, AGE, NUMTRIP)   Value
Adjusted R-Sq                      0.741
AIC                                131.290
AIC (Corrected)                    132.401
Schwarz's BIC                      141.762
INTERPRETATION:
From the tables above, we see that the adjusted R sq of the model with AVAGE, AGE and NUMTRIP is the highest,
i.e. 0.741, which indicates that it is the best subset model and a simpler model than the full one.
The AIC value of AVAGE, AGE, NUMTRIP is 131.290, which is the lowest (as are its AICC of 132.401 and BIC of 141.762), hence it is the best
subset.
So, the percentage improvement of the simpler model over the full model, measured as the relative gain in adjusted R sq, would be as
follows:
[(0.741 - 0.727) / 0.727] * 100 = 1.9257%
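The model choice and the percentage figure above can be checked with a few lines of code. This is a minimal sketch that uses only the adjusted R sq and AIC columns of the best subsets tables; the model numbers 1 to 11 follow the row order of those tables, with model 11 taken to be the full six-predictor model because its adjusted R sq (0.727) matches the full regression.

```python
# (model number, adjusted R-sq, AIC) in the row order of the best subsets tables.
models = [
    (1, 0.538, 164.044),
    (2, 0.524, 165.868),
    (3, 0.696, 139.824),
    (4, 0.650, 148.348),
    (5, 0.741, 131.290),
    (6, 0.693, 141.423),
    (7, 0.736, 133.172),
    (8, 0.736, 133.217),
    (9, 0.732, 135.071),
    (10, 0.732, 135.078),
    (11, 0.727, 136.956),  # assumed to be the full model
]

best_by_adj_r2 = max(models, key=lambda m: m[1])  # highest adjusted R-sq
best_by_aic = min(models, key=lambda m: m[2])     # lowest AIC
full_model = models[-1]

# Relative gain in adjusted R-sq of the chosen model over the full model.
improvement = (best_by_adj_r2[1] - full_model[1]) / full_model[1] * 100

print("best by adjusted R-sq: model", best_by_adj_r2[0])
print("best by AIC: model", best_by_aic[0])
print(f"improvement over full model: {improvement:.4f}%")
```

Both criteria pick the same model here, which is why the write-up can treat it as the single best subset.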
QUESTION 3:
Divide the sample into 4 groups: Rushmore Single, Rushmore Married, Other Company
Single, and Other Company Married. Run a single factor ANOVA to test the null hypothesis
that the mean of the CONCEPT for the four groups are the same at 5% level of significance?
If not which group has the highest average rating?
SOLUTION:
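The derived columns used in the analyses (CONCEPT1, NUMCAR1 and GROUPS) can be produced with simple recodes. The sketch below assumes the coding that is visible in the data table: CONCEPT1 is 1 for a rating of 4 or 5, NUMCAR1 is 1 for more than one car, and GROUPS runs from 1 to 4 as (CIS 0, MARTIAL 0), (CIS 0, MARTIAL 1), (CIS 1, MARTIAL 0), (CIS 1, MARTIAL 1). The function names are hypothetical.

```python
def recode_concept(concept):
    """CONCEPT1: 1 = high interest (rating 4 or 5), 0 = low interest (1, 2 or 3)."""
    return 1 if concept >= 4 else 0


def recode_numcar(numcar):
    """NUMCAR1: 1 = more than one car, 0 = one car."""
    return 1 if numcar > 1 else 0


def assign_group(cis, martial):
    """GROUPS: 1 to 4, built from the two 0/1 variables CIS and MARTIAL."""
    return 1 + 2 * cis + martial


# Example: the first row of the second data block
# (CONCEPT 3, CIS 1, MARTIAL 1, NUMCAR 2).
row = {"CONCEPT": 3, "CIS": 1, "MARTIAL": 1, "NUMCAR": 2}
derived = (
    recode_concept(row["CONCEPT"]),
    recode_numcar(row["NUMCAR"]),
    assign_group(row["CIS"], row["MARTIAL"]),
)
print(derived)
```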
Analysis of Variance
Effects coding used for categorical variables in model.
The categorical values encountered during processing are
Variables            Levels
GROUPS (4 levels)    1.000   2.000   3.000   4.000

Dependent Variable   CONCEPT
N                    60
Multiple R           0.563
Squared Multiple R   0.317

Analysis of Variance
Source   Type III SS   df   Mean Squares   F-Ratio   p-Value
GROUPS   34.150        3    11.383         8.663     0.000
Error    73.583        56   1.314
INTERPRETATION:
LET US ASSUME:
Ho: the means of CONCEPT for the four groups are the same.
H1: the mean of CONCEPT for at least one of the four groups is different.
From the analysis of variance, the p value is 0.000, which is less than 0.05 at 5% level of
significance. Therefore the null hypothesis is rejected in favour of the alternate
hypothesis.
We interpret that at least one of the means is different.
Now, to know which mean is different, we need to perform the pairwise comparison.
PAIRWISE COMPARISON:
Post Hoc Test of CONCEPT
Using least squares means.
Using model MSE of 1.314 with 56 df.
GROUPS (I)   GROUPS (J)   Difference   p-Value   Lower    Upper
1            2            -0.569       0.560     -1.719   0.581
1            3            0.148        0.992     -1.263   1.558
1            4            -1.727       0.001     -2.848   -0.606
2            3            0.717        0.454     -0.562   1.996
2            4            -1.158       0.011     -2.109   -0.207
3            4            -1.875       0.001     -3.128   -0.622
INTERPRETATION:
We compare the p values of all 6 pairs of groups and select those pairs whose p value is less than
0.05.
From the above table, the p values of pairs (1,4), (2,4) and (3,4) are less than 0.05.
Group 4 (Rushmore Married) is common to all the
pairs having a p value less than 0.05, and each of these differences (group I minus group 4) is negative.
Therefore the mean of RUSHMORE MARRIED (4) is different from the other three and is the highest average rating.
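The screening of the pairwise comparisons can be written out explicitly. A minimal sketch, assuming the six rows of the post hoc table are the pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) in that order, which is consistent with the pairs named in the interpretation, and assuming each difference is the mean of the first group minus the mean of the second.

```python
ALPHA = 0.05

# (group I, group J, difference, p-value) from the post hoc test of CONCEPT.
comparisons = [
    (1, 2, -0.569, 0.560),
    (1, 3, 0.148, 0.992),
    (1, 4, -1.727, 0.001),
    (2, 3, 0.717, 0.454),
    (2, 4, -1.158, 0.011),
    (3, 4, -1.875, 0.001),
]

# Keep only the pairs whose difference is significant at the 5% level.
significant = [(i, j, diff) for i, j, diff, p in comparisons if p < ALPHA]

# Groups that appear in every significant pair.
common_groups = set.intersection(*({i, j} for i, j, _ in significant))

# In each significant pair, the group with the higher mean
# (a negative difference means group J has the higher mean).
higher_groups = [j if diff < 0 else i for i, j, diff in significant]

print("significant pairs:", [(i, j) for i, j, _ in significant])
print("common to all significant pairs:", common_groups)
print("higher-mean group in each pair:", higher_groups)
```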
QUESTION 4:
Divide NUMCAR into two groups: One and More than One. Now, using CONCEPT and
the 4 groups (as developed in Question 3 above), run a 2 way ANOVA with the
NUMCAR groups as the second factor. Is there any difference between the results obtained in Question 3 above and this
new 2 way ANOVA at 5% level of significance?
SOLUTION:
Analysis of Variance
Effects coding used for categorical variables in model.
The categorical values encountered during processing are
Variables             Levels
GROUPS (4 levels)     1.000   2.000   3.000   4.000
NUMCAR1 (2 levels)    0.000   1.000

Dependent Variable    CONCEPT
N                     60
Multiple R            0.593
Squared Multiple R    0.351

Analysis of Variance
Source            Type III SS   df   Mean Squares   F-Ratio   p-Value
GROUPS            34.538        3    11.513         8.568     0.000
NUMCAR1           2.614         1    2.614          1.945     0.169
GROUPS*NUMCAR1    2.430         3    0.810          0.603     0.616
Error             69.873        52   1.344
INTERPRETATION:
We run a two way ANOVA with the 4 GROUPS and NUMCAR1.
Let us assume,
Ho g: the mean of CONCEPT is the same across the four GROUPS.
H1 g: the mean of CONCEPT is different for at least one of the GROUPS.
Ho n: the mean of CONCEPT is the same for both NUMCAR1 levels.
H1 n: the mean of CONCEPT is different between the NUMCAR1 levels.
From the p values, we can see that the null hypothesis (Ho g) is rejected, because the p value of GROUPS (0.000) is less
than 0.05, which means that the mean of CONCEPT differs across the
GROUPS.
The p value of NUMCAR1 (0.169) is greater than 0.05, therefore we cannot reject the null
hypothesis (Ho n), i.e., the mean of CONCEPT does not differ significantly between respondents with one car and those with more than one. The GROUPS*NUMCAR1 interaction (p = 0.616) is not significant either.
Now, to know which of the GROUPS has a different mean, we perform a pairwise
comparison of the GROUPS.
PAIRWISE COMPARISON
Post Hoc Test of CONCEPT
GROUPS (I)   GROUPS (J)   Difference   p-Value   Lower    Upper
1            2            -0.615       0.529     -1.781   0.550
1            3            0.089        0.998     -1.340   1.519
1            4            -1.786       0.001     -2.922   -0.650
2            3            0.705        0.483     -0.592   2.001
2            4            -1.170       0.012     -2.134   -0.207
3            4            -1.875       0.001     -3.145   -0.605
INTERPRETATION:
We compare the p values of all 6 pairs of groups and select those pairs whose p value is less than
0.05.
From the above table, the p values of pairs (1,4), (2,4) and (3,4) are less than 0.05.
Group 4 (Rushmore Married) is common to all the
pairs having a p value less than 0.05.
Therefore the mean of RUSHMORE MARRIED (4) is different.
Therefore, there is no difference between the results obtained in Question 3 above and this new 2
way ANOVA at 5% level of significance: GROUPS remains significant, group 4 remains the group that differs, and NUMCAR1 adds nothing significant.
QUESTION 5:
Factor analyse the full 60x7 data matrix using principal component analysis with Varimax
rotation. Apply Kaiser's criterion (eigenvalue > 1) to extract the principal components. How
will you interpret each set of rotated factor loadings?
Factor Analysis
Eigenvalues
3.413   1.065   0.913   0.663   0.432   0.353   0.161

Component Loadings
                                       1        2
CONCEPT                                0.906    -0.014
CIS                                    0.460    0.496
AGE                                    0.855    0.100
MARTIAL                                0.619    0.075
NUMCAR                                 -0.207   0.853
AVAGE                                  -0.779   0.269
NUMTRIP                                0.786    0.048
Variance Explained                     3.413    1.065
Percent of Total Variance Explained    48.754   15.217

Rotated Loadings (Varimax)
                                       1        2
CONCEPT                                0.904    -0.051
CIS                                    0.481    0.477
AGE                                    0.858    0.065
MARTIAL                                0.621    0.049
NUMCAR                                 -0.171   0.861
AVAGE                                  -0.767   0.301
NUMTRIP                                0.788    0.015
Variance Explained                     2.791    1.063
Percent of Total Variance Explained    39.871   15.182
INTERPRETATION:
Two components have an eigenvalue greater than 1, so there are two latent factors at work. Together they explain 63.97% (48.754% + 15.217%) of the total variance in the seven variables of
the Rushmore insurance data.
P1 = 0.906 CONCEPT + 0.460 CIS + 0.855 AGE + 0.619 MARTIAL - 0.207 NUMCAR - 0.779 AVAGE + 0.786 NUMTRIP
P2 = -0.014 CONCEPT + 0.496 CIS + 0.100 AGE + 0.075 MARTIAL + 0.853 NUMCAR + 0.269 AVAGE + 0.048 NUMTRIP
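Kaiser's criterion and the variance percentages can be reproduced from the seven values listed at the top of the factor analysis output, taken here to be the eigenvalues of the correlation matrix. This is a minimal sketch under that assumption; because the variables are standardised, the total variance is taken to equal the number of variables (7).

```python
# The seven values at the top of the factor analysis output (assumed eigenvalues).
eigenvalues = [3.413, 1.065, 0.913, 0.663, 0.432, 0.353, 0.161]

# For a correlation matrix each variable contributes one unit of variance.
total_variance = len(eigenvalues)

# Kaiser's criterion: keep components with an eigenvalue greater than 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Share of total variance explained by each retained component.
percent_explained = [ev / total_variance * 100 for ev in retained]
cumulative_percent = sum(percent_explained)

for number, (ev, pct) in enumerate(zip(retained, percent_explained), start=1):
    print(f"component {number}: eigenvalue {ev:.3f}, {pct:.2f}% of variance")
print(f"cumulative: {cumulative_percent:.2f}%")
```

Small differences in the third decimal place against the reported percentages come from the eigenvalues being rounded to three decimals.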
FACTORS
PRINCIPAL COMPONENTS
CONCEPT
CIS
AGE
MARTIAL
NUMCAR
AVAGE
NUMTRIP
1
1
1
1
1
2
1
USAGE CHARACTERISTICS
Therefore we can say that the first component, which reflects self
characteristics, accounts for 48.754% of the total variance, and the second component, which reflects car
characteristics, accounts for 15.217%.
RECOMMENDATIONS:
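1. WISDOM OF OFFERING:
From the CHI SQUARE test of association, we understood that interest in the policy is significantly associated with the
current insurance supplier: 18 of the 30 respondents with CIS = 1 showed high interest, against 8 of the 30 respondents with CIS = 0. Overall, 26 of the 60 respondents (43%) rated the concept 4 or 5,
which means that the policy appeals to a sizeable segment rather than to the majority of respondents.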
2. TARGET MARKET:
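Pairwise comparison of CONCEPT by age group
AGE GROUP (I)   AGE GROUP (J)   Difference   p-Value   Lower    Upper
1               2               -0.932       0.022     -1.762   -0.103
1               3               -1.991       0.000     -2.821   -1.162
1               4               -2.883       0.000     -4.054   -1.713
2               3               -1.059       0.010     -1.921   -0.196
2               4               -1.951       0.000     -3.145   -0.757
3               4               -0.892       0.208     -2.086   0.302
Age group coding:
20-30: 1
30-40: 2
40-50: 3
50 and above: 4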
After the pairwise comparison, we can see that interest rises with age. Every difference (lower age group minus higher age group) is negative, and all pairs except (3,4) are significant at the 0.05 level, so the age groups 40-50 and 50 and above are the segments most interested
in the insurance policy. This agrees with the positive AGE coefficient in the regressions.
Also, from the above analysis (Questions 3 and 4), we concluded that the RUSHMORE MARRIED group
has a significantly higher mean rating than the other three groups, and therefore this segment can serve as the target market of the
insurance company.
From the factor analysis, we saw that self characteristics account for 48.754% of the total variance
and car characteristics for 15.217%.
3. FURTHER RESEARCH REQUIRED:
Cluster analysis could be done to segment the market further.