Assignment Docx-Predictive

1. MRP Biscuit Company Ltd
i. Run a multiple regression explaining the preference for the brand of biscuits in terms of
the nutrition value, taste and preservation quality
Solution: The coefficient table below was generated in SPSS at a 95% confidence level (5% level of
significance).
Predictor variables: nutrition, preservation quality (p_qlty), taste.

Inference: The significance value for taste is 0.107, which is greater than 0.05, so taste does not
have a statistically significant effect on preference once nutrition and preservation quality are
in the model. Preservation quality (Sig. = .000) and nutrition (Sig. = .007) are both significant.
Coefficients (a)
                Unstandardized      Standardized           95.0% Confidence Interval Collinearity
                Coefficients        Coefficients           for B                     Statistics
Model           B      Std. Error   Beta    t       Sig.   Lower Bound  Upper Bound  Tolerance  VIF
1 (Constant)    .733   .301                 2.436   .020   .123         1.343
  p_qlty        .548   .118         .522    4.660   .000   .310         .787         .309       3.238
  taste         .170   .103         .198    1.655   .107   -.038        .379         .271       3.690
  nutrition     .295   .103         .284    2.865   .007   .086         .503         .395       2.531
a. Dependent Variable: pref
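The two collinearity columns of the coefficient table are linked by VIF = 1 / Tolerance. The short sketch below recomputes VIF from the rounded tolerance values in the table, so small differences in the last digit are expected. The rule of thumb quoted in the comment (VIF above 5 or 10 as a warning sign) is a common convention and is not part of the SPSS output.

```python
# Tolerance values copied from the three-predictor coefficient table.
tolerance = {"p_qlty": 0.309, "taste": 0.271, "nutrition": 0.395}

# VIF is the reciprocal of tolerance (tolerance = 1 - R_j^2 for predictor j).
vif = {name: 1 / tol for name, tol in tolerance.items()}

for name, value in vif.items():
    # Common rule of thumb (an assumption here): VIF above 5 or 10 signals a problem.
    print(f"{name}: VIF = {value:.3f}")
```

All three values stay below 5, which suggests that the predictors are correlated but not to a degree that would normally block the regression.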
Running the linear regression model again after eliminating the 'taste' variable

Predictor variables: nutrition, preservation quality (p_qlty)
Inference: Both remaining predictors are statistically significant, since the significance value of
each is less than 0.05, so each contributes to explaining the preference outcome.
Coefficients (a)
                Unstandardized      Standardized           95.0% Confidence Interval Collinearity
                Coefficients        Coefficients           for B                     Statistics
Model           B      Std. Error   Beta    t       Sig.   Lower Bound  Upper Bound  Tolerance
1 (Constant)    .668   .305                 2.189   .035   .050         1.287
  p_qlty        .665   .096         .634    6.914   .000   .470         .860         .484
  nutrition     .367   .095         .354    3.862   .000   .175         .560         .484
Hence the model below can be used to predict the preference for biscuits:
Preference = 0.668 + 0.367 * Nutrition + 0.665 * Preservation Quality
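As an illustration, the fitted equation can be wrapped in a small function. This is only a sketch: the function name and the example ratings are hypothetical, and the rating scale of the survey is not stated in this assignment.

```python
# Coefficients copied from the reduced model's coefficient table.
INTERCEPT = 0.668
B_NUTRITION = 0.367
B_PRESERVATION = 0.665


def predict_preference(nutrition: float, preservation_quality: float) -> float:
    """Return the fitted preference score for the given attribute ratings."""
    return INTERCEPT + B_NUTRITION * nutrition + B_PRESERVATION * preservation_quality


# Hypothetical respondent who rates both attributes 5.
example = predict_preference(5, 5)
print(round(example, 3))

# Raising nutrition by one unit, with preservation quality fixed,
# changes the prediction by exactly the nutrition coefficient.
print(round(predict_preference(4, 3) - predict_preference(3, 3), 3))
```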
ii. Interpret the partial regression coefficients
Solution: Preference = 0.668 + 0.367 * Nutrition + 0.665 * Preservation Quality

Inference: The coefficients of nutrition and preservation quality are both positive, so predicted
preference rises as either rating rises.
a. For every one-unit increase in nutrition, predicted preference increases by 0.367 units,
keeping preservation quality constant.
b. Similarly, keeping nutrition constant, every one-unit increase in preservation quality
increases predicted preference by 0.665 units.
These are changes on the preference rating scale, not percentage changes.
iii. Test the overall significance of the regression using the ANOVA table
Solution: Inference: The ANOVA table shows F = 104.557 with a significance value of .000 (p < 0.001),
which is less than 0.05. The regression is therefore significant overall, and the model can be used
to predict customer preference:
Preference = 0.668 + 0.367 * Nutrition + 0.665 * Preservation Quality
ANOVA (a)
Model           Sum of Squares   df   Mean Square   F         Sig.
1 Regression    107.036          2    53.518        104.557   .000 (b)
  Residual      18.939           37   .512
  Total         125.975          39
b. Predictors: (Constant), nutrition, p_qlty
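The F statistic in the ANOVA table is the ratio of the two mean squares, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below recomputes it from the table. It also derives R squared as the regression share of the total sum of squares; that figure is not reported in the text above and is included only as a derived check. Because the inputs are rounded, F may differ from the SPSS value in the second decimal place.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 107.036, 2
ss_residual, df_residual = 18.939, 37

ms_regression = ss_regression / df_regression  # mean square, regression
ms_residual = ss_residual / df_residual        # mean square, residual
f_statistic = ms_regression / ms_residual

# R squared: share of the total sum of squares explained by the model.
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"F = {f_statistic:.2f}, R^2 = {r_squared:.3f}")
```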
iv. Examine the significance of the partial regression coefficient using a 5 percent level
of significance
Solution: Inference: In the reduced model, both partial regression coefficients are significant at
the 5 percent level.
a. Nutrition: B = 0.367, t = 3.862, Sig. = .000 (less than 0.05); the 95% confidence interval
(0.175 to 0.560) does not include zero.
b. Preservation quality: B = 0.665, t = 6.914, Sig. = .000 (less than 0.05); the 95% confidence
interval (0.470 to 0.860) does not include zero.
In the initial three-predictor model, taste was not significant (t = 1.655, Sig. = .107), which is
why it was dropped.
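The significance of each partial coefficient in the reduced model can be checked by hand from the coefficient table: t is B divided by its standard error, and the 95% confidence interval is B plus or minus the critical t value times the standard error. In the sketch below, the critical value 2.026 (two-tailed, 5%, 37 residual degrees of freedom) comes from standard t tables and is an input assumed here, not part of the SPSS output. Since B and Std. Error are rounded, the t values differ slightly from the table.

```python
# B and Std. Error copied from the reduced-model coefficient table.
coefficients = {
    "p_qlty": (0.665, 0.096),
    "nutrition": (0.367, 0.095),
}

# Two-tailed 5% critical value of t for 37 residual degrees of freedom
# (taken from standard t tables; an assumption, not SPSS output).
T_CRITICAL = 2.026

results = {}
for name, (b, se) in coefficients.items():
    t_value = b / se
    lower, upper = b - T_CRITICAL * se, b + T_CRITICAL * se
    results[name] = (t_value, lower, upper)
    significant = abs(t_value) > T_CRITICAL
    print(f"{name}: t = {t_value:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), "
          f"significant = {significant}")
```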
v. As a marketing manager of the biscuit company, on what attributes will you
concentrate more so as to improve the marketability of the brand
Solution: Inference: Preservation quality has the larger effect on preference: its coefficient is
0.665 (standardized beta .634), compared with 0.367 (standardized beta .354) for nutrition. The
first priority is therefore to preserve the freshness of the biscuits by using appropriate storage
and packaging materials. The next focus should be nutrition, for example by baking with
nutrient-rich raw materials.
2. Predicting High/Low user of Social Networking Sites among students

A study was conducted to identify the variables that distinguish between heavy and light users of
social networking sites among students. A questionnaire was designed for the purpose. The
social networking sites considered for the study were Facebook, Orkut, LinkedIn, Twitter, etc.
The online survey was conducted on a sample of 61 students in the age group of 20 to 30. The
collected response data is attached in an Excel sheet.
Questions:
1. Divide the sample into two groups: one that uses social networking sites for less than
one hour on weekdays (low users) and a second that uses them
for one or more hours (high users). Run a two-group logistic regression analysis
with high/low user as the categorical dependent variable and the variables X3A to X3L as
predictor variables, in order to:
(a) Compute the percentage of respondents that it is able to classify correctly
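Solution:
Weekday_user
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid               31          33.7      33.7            33.7
       Low_user     32          34.8      34.8            68.5
       High_user    29          31.5      31.5            100.0
       Total        92          100.0     100.0
The first row has no category label and accounts for 31 of the 92 rows; the remaining 61 rows match
the sample of 61 students. Of these 61 respondents, 32 are low users and 29 are high users on
weekdays, that is 52.5% and 47.5% of the labelled responses (34.8% and 31.5% of all 92 rows).
Slightly more than half of the students therefore use social networking sites for less than one
hour on weekdays.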
2. Determine the statistical significance of the logistic function
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      .000         8    1.000
Since P = 1.000 is more than 0.05, we fail to reject the null hypothesis that there is no difference
between the predicted and observed frequencies. The goodness of fit is therefore satisfied and the
model is retained.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      71.865 (a)          .186                   .248
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.
The Cox & Snell R Square suggests that the model explains about 18.6% of the variation in the weekday
usage group, and the Nagelkerke R Square suggests about 24.8%. Both are pseudo R-square measures,
and both indicate only a modest fit.
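Both pseudo R-square values can be recovered from the -2 log likelihood and the two group sizes. The sketch below uses the standard Cox & Snell and Nagelkerke definitions and assumes that the model was fitted on the 61 labelled respondents (32 low users, 29 high users); the function name is illustrative. Under that assumption, the results should come out close to the reported .186 and .248.

```python
import math


def pseudo_r_squared(model_deviance: float, n_low: int, n_high: int):
    """Return (Cox & Snell, Nagelkerke) R squared for a binary logistic model.

    model_deviance is the fitted model's -2 log likelihood. The null
    deviance (intercept-only model) is rebuilt from the two group sizes.
    """
    n = n_low + n_high
    null_deviance = -2 * (n_low * math.log(n_low / n) + n_high * math.log(n_high / n))
    cox_snell = 1 - math.exp(-(null_deviance - model_deviance) / n)
    max_cox_snell = 1 - math.exp(-null_deviance / n)  # upper bound of Cox & Snell
    return cox_snell, cox_snell / max_cox_snell


# Weekday model: -2LL = 71.865, assumed 32 low users and 29 high users.
cs, nk = pseudo_r_squared(71.865, 32, 29)
print(round(cs, 3), round(nk, 3))
```

Nagelkerke rescales Cox & Snell by its maximum attainable value, which is why it is always the larger of the two.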
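Classification Table (a)
                                    Predicted
                                    Weekday_user            Percentage
Observed                            Low_user   High_user    Correct
Step 1   Weekday_user  Low_user     21         11           65.6
                       High_user    10         19           65.5
         Overall Percentage                                 65.6
a. The cut value is .500
The model classifies 65.6% of respondents correctly overall (65.6% of low users and 65.5% of high
users). This is better than the 52.5% that would result from assigning everyone to the larger
group, so the prediction is reasonable, though not strong.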
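The percentages in the classification table follow directly from the four cell counts. The sketch below reads the table as 21 low users and 19 high users classified correctly, which is the only placement of the four counts that reproduces the reported row percentages. It also computes a simple baseline, the accuracy obtained by always predicting the larger group, which is not part of the SPSS output.

```python
# Counts from the weekday classification table: observed group -> predicted group.
confusion = {
    "Low_user": {"Low_user": 21, "High_user": 11},
    "High_user": {"Low_user": 10, "High_user": 19},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[group][group] for group in confusion)

overall_accuracy = 100 * correct / total
per_group = {g: 100 * confusion[g][g] / sum(confusion[g].values()) for g in confusion}
# Baseline: accuracy from always predicting the larger observed group.
baseline = 100 * max(sum(row.values()) for row in confusion.values()) / total

print(f"overall {overall_accuracy:.1f}%, baseline {baseline:.1f}%")
```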
Variables in the Equation
                                                                 95% C.I. for EXP(B)
                    B       S.E.    Wald    df   Sig.   Exp(B)   Lower    Upper
Step 1 (a) X3A      -.015   .282    .003    1    .959   .986     .567     1.713
           X3B      .559    .498    1.260   1    .262   1.749    .659     4.645
           X3C      -.165   .503    .107    1    .743   .848     .316     2.274
           X3D      -.295   .296    .991    1    .319   .745     .417     1.330
           X3E      .147    .393    .140    1    .709   1.158    .536     2.501
           X3F      .604    .447    1.831   1    .176   1.830    .763     4.392
           X3G      -.361   .426    .717    1    .397   .697     .302     1.607
           X3H      .461    .338    1.861   1    .173   1.586    .817     3.079
           X3I      -.075   .301    .061    1    .804   .928     .515     1.674
           X3J      -.596   .402    2.201   1    .138   .551     .251     1.211
           X3K      .222    .339    .427    1    .514   1.248    .642     2.426
           X3L      -.315   .280    1.265   1    .261   .730     .422     1.263
           Constant -.150   1.821   .007    1    .935   .861
a. Variable(s) entered on step 1: X3A, X3B, X3C, X3D, X3E, X3F, X3G, X3H, X3I, X3J, X3K, X3L.
As per the above findings, none of the variables is significant: every Sig. value is greater than 0.05.

Variables in the Equation
                                                                 95% C.I. for EXP(B)
                    B       S.E.    Wald    df   Sig.   Exp(B)   Lower    Upper
Step 1 (a) X3B      .369    .297    1.553   1    .213   1.447    .809     2.587
           X3D      -.290   .293    .976    1    .323   .749     .421     1.330
           X3F      .730    .353    4.285   1    .038   2.075    1.040    4.141
           X3G      -.399   .406    .967    1    .325   .671     .303     1.486
           X3H      .451    .325    1.924   1    .165   1.570    .830     2.968
           X3J      -.536   .391    1.875   1    .171   .585     .272     1.260
           X3L      -.272   .266    1.044   1    .307   .762     .452     1.284
           Constant .094    1.543   .004    1    .952   1.098
a. Variable(s) entered on step 1: X3B, X3D, X3F, X3G, X3H, X3J, X3L.
When some of the variables are removed, X3F (blogging) is the only significant variable
(Sig. = .038). Its odds ratio, Exp(B), is 2.075: for every one-point increase in the X3F score, the
odds of being a high weekday user are multiplied by 2.075, holding the other variables constant.
The 95% confidence interval for Exp(B) (1.040 to 4.141) lies entirely above 1, which confirms that
blogging is associated with higher social media usage.
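The figures in the X3F row are tied together by simple formulas: Exp(B) is the exponential of B, the Wald statistic is (B / S.E.) squared, and the 95% interval for Exp(B) is the exponential of B plus or minus 1.96 standard errors. The sketch below recomputes them from the rounded B and S.E., so the last digits differ slightly from the SPSS output.

```python
import math

# B and S.E. for X3F, copied from the reduced "Variables in the Equation" table.
b, se = 0.730, 0.353

odds_ratio = math.exp(b)                 # Exp(B)
wald = (b / se) ** 2                     # Wald chi-square statistic, 1 df
z = 1.96                                 # normal critical value for a 95% interval
ci_lower = math.exp(b - z * se)
ci_upper = math.exp(b + z * se)
percent_change = 100 * (odds_ratio - 1)  # change in odds per one-point increase

print(f"Exp(B) = {odds_ratio:.3f}, Wald = {wald:.3f}, "
      f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f}), odds change = {percent_change:+.1f}%")
```

An odds ratio above 1 means the odds of the high-user group rise with the predictor; a value below 1 means they fall.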
Cut Off Score:
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy.         .619
Bartlett's Test of Sphericity    Approx. Chi-Square      271.788
                                 df                      66
                                 Sig.                    .000
The KMO value (.619) is more than 0.5, so we can go ahead with factor analysis. Bartlett's test is
also significant, as its significance value (.000) is less than 0.05.
Total Variance Explained
           Initial Eigenvalues                  Extraction Sums of Squared Loadings  Rotation Sums of Squared Loadings
Component  Total   % of Variance  Cumulative %  Total   % of Variance  Cumulative %  Total   % of Variance  Cumulative %
1          3.278   27.314         27.314        3.278   27.314         27.314        2.593   21.612         21.612
2          2.256   18.801         46.115        2.256   18.801         46.115        2.153   17.941         39.553
3          1.681   14.006         60.121        1.681   14.006         60.121        1.871   15.589         55.142
4          1.145   9.543          69.663        1.145   9.543          69.663        1.743   14.521         69.663
5          .902    7.515          77.179
6          .702    5.847          83.026
7          .600    4.997          88.023
8          .379    3.158          91.181
9          .349    2.910          94.092
10         .331    2.755          96.847
11         .261    2.172          99.019
12         .118    .981           100.000
Extraction Method: Principal Component Analysis.
All twelve variables contributing to social media usage were entered into the analysis. Only the
first 4 of the 12 components have an eigenvalue greater than 1, so 4 factors are retained.
Together these 4 factors explain 69.66% of the total variance.
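The retention rule used here (keep components with an eigenvalue above 1, often called the Kaiser criterion) and the cumulative percentage can be checked from the eigenvalues alone, because with 12 standardized variables the total variance is 12. The sketch below does this; the result differs from the table in the third decimal place because the eigenvalues are rounded.

```python
# Initial eigenvalues copied from the Total Variance Explained table.
eigenvalues = [3.278, 2.256, 1.681, 1.145, 0.902, 0.702,
               0.600, 0.379, 0.349, 0.331, 0.261, 0.118]

n_variables = len(eigenvalues)  # 12 standardized variables, total variance = 12

# Kaiser criterion: retain components whose eigenvalue exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1]

# A component's share of variance is its eigenvalue over the number of variables.
cumulative_pct = 100 * sum(retained) / n_variables

print(len(retained), round(cumulative_pct, 2))
```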
Rotated Component Matrix (a)
        Component
        1       2       3       4
X3A     .108    -.007   -.235   .849
X3B     -.082   .919    -.044   -.121
X3C     .012    .892    -.129   -.177
X3D     .035    .154    .800    -.108
X3E     .777    -.010   .092    .195
X3F     .833    .073    .133    .061
X3G     .809    .094    -.090   .127
X3H     .605    -.063   .556    -.022
X3I     .451    -.041   .043    .491
X3J     .188    .663    .190    .158
X3K     .118    -.111   .273    .783
X3L     .066    -.131   .837    .165
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.
The rotated loadings of the 12 variables on the 4 retained factors are shown in the above table.

We can take the cut-off score for the loadings to be 0.6. Factor 1 then includes X3E (Promote
events), X3F (Blogging), X3G (loading .809) and X3H (Games). Factor 2 includes X3B (Messaging), X3C
(Networking) and X3J (Photo sharing). Factor 3 includes X3D (Make new friends) and X3L (Online
dating). Factor 4 includes X3A (Linking with professionals) and X3K (Job seeking). X3I does not
reach the cut-off on any factor.
So, the 1st factor can be named Games and blogs.
The 2nd factor can be named Networking and sharing.
The 3rd factor can be named Friends and dating.
The 4th factor can be named Linking and job search.
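The grouping of variables into factors can be reproduced mechanically from the rotated loadings. The sketch below assigns each variable to the factor on which it has its largest absolute loading, provided that loading reaches the 0.6 cut-off. The tie-breaking rule (largest absolute loading) is an assumption made for this illustration; with these data no variable reaches the cut-off on two factors, so it does not affect the result.

```python
CUTOFF = 0.6

# Rotated loadings (factors 1 to 4) copied from the Rotated Component Matrix.
loadings = {
    "X3A": (0.108, -0.007, -0.235, 0.849),
    "X3B": (-0.082, 0.919, -0.044, -0.121),
    "X3C": (0.012, 0.892, -0.129, -0.177),
    "X3D": (0.035, 0.154, 0.800, -0.108),
    "X3E": (0.777, -0.010, 0.092, 0.195),
    "X3F": (0.833, 0.073, 0.133, 0.061),
    "X3G": (0.809, 0.094, -0.090, 0.127),
    "X3H": (0.605, -0.063, 0.556, -0.022),
    "X3I": (0.451, -0.041, 0.043, 0.491),
    "X3J": (0.188, 0.663, 0.190, 0.158),
    "X3K": (0.118, -0.111, 0.273, 0.783),
    "X3L": (0.066, -0.131, 0.837, 0.165),
}

factors = {1: [], 2: [], 3: [], 4: []}
unassigned = []
for variable, row in loadings.items():
    # Pick the factor with the largest absolute loading for this variable.
    best = max(range(4), key=lambda i: abs(row[i]))
    if abs(row[best]) >= CUTOFF:
        factors[best + 1].append(variable)
    else:
        unassigned.append(variable)

for number, members in factors.items():
    print(f"Factor {number}: {', '.join(members)}")
print("Below cut-off:", ", ".join(unassigned))
```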
II) 1. Frequency and percentage of respondents
Weekend_user
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid               31          33.7      33.7            33.7
       Low_user     22          23.9      23.9            57.6
       High_user    39          42.4      42.4            100.0
       Total        92          100.0     100.0
At the weekend, the students who use social media for more than 4 hours (39 high users, 42.4% of
all rows) outnumber those who use it for less than 4 hours (22 low users, 23.9%). Among the 61
labelled respondents this is 63.9% high users against 36.1% low users. The cumulative figure of
57.6% combines the 31 unlabelled rows with the low users, so it does not describe time spent.
Determine the statistical significance of the logistic function
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      11.679       8    .166
Since P = 0.166 is more than 0.05, we fail to reject the null hypothesis that there is no difference
between the predicted and observed frequencies. The goodness of fit is therefore satisfied and the
model is retained.
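The Sig. value of the Hosmer and Lemeshow test is the upper-tail probability of a chi-square distribution with the stated degrees of freedom. For an even number of degrees of freedom this tail has a closed form, so it can be checked with the standard library alone. The helper name below is illustrative, and the closed form applies only to even df such as the 8 used here.

```python
import math


def chi_square_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k the tail equals exp(-x/2) * sum over i < k of (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Hosmer and Lemeshow statistic for the weekend model: chi-square 11.679, df 8.
p_value = chi_square_sf_even_df(11.679, 8)
print(round(p_value, 3))
```

A chi-square of zero gives a tail probability of exactly 1, which is consistent with the Sig. of 1.000 reported for the weekday model.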
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      65.237 (a)          .212                   .290
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
The Cox & Snell R Square suggests that the model explains about 21% of the variation in the weekend
usage group, and the Nagelkerke R Square suggests about 29%. Both are pseudo R-square measures,
and both indicate only a modest fit.
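Classification Table (a)
                                    Predicted
                                    Weekend_user            Percentage
Observed                            Low_user   High_user    Correct
Step 1   Weekend_user  Low_user     9          13           40.9
                       High_user    8          31           79.5
         Overall Percentage                                 65.6
a. The cut value is .500
The model classifies 65.6% of respondents correctly overall, but it is much better at identifying
high users (79.5% correct) than low users (40.9% correct). Assigning everyone to the high-user
group would already be correct for 63.9% of respondents, so the improvement over that baseline is
small.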
Variables in the Equation
                                                                 95% C.I. for EXP(B)
                    B       S.E.    Wald    df   Sig.   Exp(B)   Lower    Upper
Step 1 (a) X3A      -.815   .340    5.732   1    .017   .443     .227     .863
           X3B      -.428   .545    .619    1    .432   .652     .224     1.895
           X3C      .278    .534    .271    1    .603   1.320    .464     3.759
           X3D      -.082   .298    .076    1    .782   .921     .513     1.653
           X3E      -.198   .425    .217    1    .641   .820     .357     1.887
           X3F      .483    .495    .948    1    .330   1.620    .613     4.279
           X3G      -.202   .476    .179    1    .672   .817     .322     2.078
           X3H      .060    .375    .025    1    .873   1.062    .509     2.212
           X3I      .175    .321    .299    1    .584   1.192    .636     2.234
           X3J      -.228   .418    .297    1    .586   .796     .351     1.807
           X3K      .067    .354    .035    1    .851   1.069    .534     2.140
           X3L      .177    .288    .378    1    .539   1.193    .679     2.097
           Constant 2.907   1.931   2.266   1    .132   18.303
a. Variable(s) entered on step 1: X3A, X3B, X3C, X3D, X3E, X3F, X3G, X3H, X3I, X3J, X3K, X3L.
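Only X3A (Linking with professionals) is significant (Sig. = .017). Its coefficient is negative
(B = -.815, Exp(B) = .443), so each one-point increase in X3A multiplies the odds of being a high
weekend user by 0.443, a reduction of about 56%, holding the other variables constant.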

