Assignment Docx-Predictive


1.

MRP Biscuit Company Ltd

i. Run a multiple regression explaining the preference for the brand of biscuits in terms of
the nutrition value, taste and preservation quality

Solution: The coefficient table below was generated in SPSS at the 95% confidence level (5% level of
significance).

Predictor variables – nutrition, preservation of quality, taste.


Inference: Looking at the significance value of each predictor variable, taste has a value of 0.107 >
0.05, so taste does not have a statistically significant effect on preference once nutrition and
preservation quality are in the model.

Coefficientsa
                Unstandardized      Standardized                  95.0% Confidence Interval   Collinearity
                Coefficients        Coefficients                  for B                       Statistics
Model           B      Std. Error   Beta           t       Sig.   Lower Bound   Upper Bound   Tolerance   VIF
1 (Constant)    .733   .301                        2.436   .020   .123          1.343
  p_qlty        .548   .118         .522           4.660   .000   .310          .787          .309        3.238
  taste         .170   .103         .198           1.655   .107   -.038         .379          .271        3.690
  nutrition     .295   .103         .284           2.865   .007   .086          .503          .395        2.531
a. Dependent Variable: pref

Running Linear regression model after eliminating ‘taste’ variable


Predictor variables – nutrition, preservation of quality
Inference: Both remaining predictor variables contribute significantly to explaining preference,
as each has a significance value below 0.05.

Coefficientsa
                Unstandardized      Standardized                  95.0% Confidence Interval   Collinearity
                Coefficients        Coefficients                  for B                       Statistics
Model           B      Std. Error   Beta           t       Sig.   Lower Bound   Upper Bound   Tolerance
1 (Constant)    .668   .305                        2.189   .035   .050          1.287
  p_qlty        .665   .096         .634           6.914   .000   .470          .860          .484
  nutrition     .367   .095         .354           3.862   .000   .175          .560          .484
a. Dependent Variable: pref

Hence the model below can be used to determine the preference for biscuits:
Preference = 0.668 + 0.367 * Nutrition + 0.665 * Preservation Quality

ii. Interpret the partial regression coefficients

Solution: Preference = 0.668 + 0.367 * Nutrition + 0.665* Preservation Quality


Inference: The coefficients of nutrition and preservation quality are both positive, so
preference rises as either attribute rises.
a. For every one-unit increase in the nutrition rating, predicted preference increases by
0.367 units, keeping preservation quality constant.
b. Similarly, keeping nutrition constant, every one-unit increase in the preservation quality
rating increases predicted preference by 0.665 units.
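
As a minimal sketch of how the partial coefficients act, the fitted equation can be written as a small Python function. The function name and the example ratings are illustrative assumptions; only the three coefficients come from the SPSS output.

```python
# Coefficients of the final model (taste removed), from the SPSS table.
INTERCEPT = 0.668
B_NUTRITION = 0.367
B_P_QLTY = 0.665


def predict_preference(nutrition: float, p_qlty: float) -> float:
    """Predicted preference rating for the given attribute ratings."""
    return INTERCEPT + B_NUTRITION * nutrition + B_P_QLTY * p_qlty


# Hypothetical ratings, chosen only to illustrate a one-unit change.
base = predict_preference(nutrition=4, p_qlty=5)
more_nutrition = predict_preference(nutrition=5, p_qlty=5)
more_p_qlty = predict_preference(nutrition=4, p_qlty=6)

print(round(more_nutrition - base, 3))  # change for +1 nutrition
print(round(more_p_qlty - base, 3))     # change for +1 preservation quality
```

Each difference equals the corresponding coefficient, which is what "holding the other variable constant" means: the change is in rating units, not in percent.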

iii. Test the overall significance of the regression using the ANOVA table

Solution: Inference: The ANOVA table shows F(2, 37) = 104.557 with a significance value of .000
(p < 0.001), which is less than 0.05. The regression is therefore significant overall, and the
model can be used to predict customer preference:
Preference = 0.668 + 0.367 * Nutrition + 0.665* Preservation Quality

ANOVAa
Model           Sum of Squares   df   Mean Square   F         Sig.
1 Regression    107.036          2    53.518        104.557   .000b
  Residual      18.939           37   .512
  Total         125.975          39
a. Dependent Variable: pref
b. Predictors: (Constant), nutrition, p_qlty
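
The F value in the ANOVA table can be cross-checked from the sums of squares and degrees of freedom. The sketch below is only a check on the printed figures; the variable names are illustrative, and the small difference from 104.557 comes from using the rounded table values.

```python
# Values copied from the ANOVA table.
ss_regression, df_regression = 107.036, 2
ss_residual, df_residual = 18.939, 37

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Overall F statistic and the share of variation explained by the model.
f_statistic = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"F = {f_statistic:.3f}, R^2 = {r_squared:.3f}")
```

The R-square of roughly 0.85 follows from the same table: it is the regression sum of squares as a share of the total.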

iv. Examine the significance of the partial regression coefficient using a 5 percent level
of significance
Solution: Inference: From the coefficient table of the final model, both partial regression
coefficients are significant at the 5 percent level.
a. Preservation quality: B = 0.665, t = 6.914, Sig. = .000 (< 0.05); the 95% confidence
interval of 0.470 to 0.860 excludes zero.
b. Nutrition: B = 0.367, t = 3.862, Sig. = .000 (< 0.05); the 95% confidence interval of
0.175 to 0.560 excludes zero.
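
The significance test for each partial coefficient rests on t = B / Std. Error. The sketch below recomputes the t values from the coefficient table of the final model; the critical value is an approximate figure from a standard t table for 37 residual degrees of freedom, which is an assumption and not part of the SPSS output.

```python
# (B, Std. Error) from the coefficient table of the final model.
coefficients = {"p_qlty": (0.665, 0.096), "nutrition": (0.367, 0.095)}

# Approximate two-tailed 5% critical value of t with 37 degrees of freedom
# (taken from a standard t table; an assumption, not SPSS output).
T_CRITICAL = 2.026

t_values = {name: b / se for name, (b, se) in coefficients.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_values.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}, significant at 5% = {significant[name]}")
```

The recomputed t values differ slightly from the table because B and Std. Error are rounded to three decimals.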

v. As a marketing manager of the biscuit company, on what attributes will you
concentrate more so as to improve the marketability of the brand

Solution: Inference: From the coefficients above, preservation quality has the larger effect: a
one-unit improvement raises predicted preference by 0.665 units (standardized Beta .634). It is
therefore imperative to focus first on preserving the freshness of the biscuits by using
appropriate storage and packaging materials. The next focus should be nutrition, where a one-unit
improvement raises predicted preference by 0.367 units (Beta .354), for example by using
nutrient-rich raw materials to bake the biscuits.

2. Predicting High/Low user of Social Networking Sites among students


A study was conducted to identify the variables which distinguish between heavy/light users of
social networking sites among students. A questionnaire was designed for the purpose. The
social networking sites considered for the study were Facebook, Orkut, Linked-in, Twitter, etc.
The online survey was conducted on a sample of 61 students in the age group of 20 to 30. The
collected response data is attached herewith in an Excel sheet.

Questions:

1. Divide the sample into two groups-one that is using the social networking site for less than
one hour on weekdays (low users) and the second which is using the social networking site
for one or more hours (high users). Run a two-group Logistic regression analysis
with high/low user as a categorical dependent variable and the variables X3A to X3L as
predictor variables. To:
(a) Compute the percentage of respondents that it is able to classify correctly

Solution:

Weekday_user
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid               31          33.7      33.7            33.7
        Low_user    32          34.8      34.8            68.5
        High_user   29          31.5      31.5            100.0
        Total       92          100.0     100.0

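The 31 unlabelled rows in the table are blank cases, so the analysis sample is the 61 students who
responded. Of these, 32 (52.5%) are low users, spending less than one hour on social networking
sites on weekdays, and 29 (47.5%) are high users. The 68.5% cumulative percent includes the blank
cases and should not be read as the share of low users.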

2. Determine the statistical significance of the logistic function

Hosmer and Lemeshow Test


Step Chi-square df Sig.
1 .000 8 1.000

Since p = 1.000 is more than 0.05, we fail to reject the hypothesis that there is no difference
between the predicted and observed frequencies. The Hosmer and Lemeshow test therefore gives no
evidence of poor fit, and the model is accepted.

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      71.865a             .186                   .248
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

As per the Cox & Snell R Square, the model explains about 18.6% of the variation in the outcome,
and as per the Nagelkerke R Square, about 24.8%. Both are pseudo R-square measures, so these
figures are approximate.
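
As a rough cross-check, both pseudo R-square values can be recomputed from the -2 Log likelihood and the weekday group sizes. This sketch assumes the baseline is an intercept-only model whose -2 Log likelihood is derived from the 32/29 split; that baseline figure is not printed in the output, and the variable names are illustrative.

```python
import math

n_low, n_high = 32, 29       # weekday group sizes from the frequency table
n = n_low + n_high
deviance_model = 71.865      # -2 Log likelihood from the Model Summary

# -2 Log likelihood of an intercept-only model (assumed baseline,
# derived here from the group sizes rather than read from SPSS).
deviance_null = -2 * (
    n_low * math.log(n_low / n) + n_high * math.log(n_high / n)
)

# Cox & Snell compares the two likelihoods; Nagelkerke rescales it so
# that the maximum attainable value is 1.
cox_snell = 1 - math.exp(-(deviance_null - deviance_model) / n)
nagelkerke = cox_snell / (1 - math.exp(-deviance_null / n))

print(f"Cox & Snell = {cox_snell:.3f}, Nagelkerke = {nagelkerke:.3f}")
```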

Classification Tablea
                                    Predicted
                                    Weekday_user             Percentage
Observed                            Low_user    High_user    Correct
Step 1  Weekday_user  Low_user      21          11           65.6
                      High_user     10          19           65.5
        Overall Percentage                                   65.6
a. The cut value is .500

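The model classifies 65.6% of the respondents correctly. This is a moderate improvement over the
52.5% that would be obtained by assigning every respondent to the larger group (32 of the 61 are
low users).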
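
The percentages in the classification table can be recomputed from the cell counts. The sketch assumes the counts read as 21 and 11 for observed low users and 10 and 19 for observed high users, which is the arrangement consistent with the printed percentages; the variable names are illustrative.

```python
# Rows: observed group. Tuple: (predicted Low_user, predicted High_user).
confusion = {"Low_user": (21, 11), "High_user": (10, 19)}

low_correct = confusion["Low_user"][0]
high_correct = confusion["High_user"][1]
total = sum(sum(row) for row in confusion.values())

# Share of each observed group that the model places correctly.
pct_low = 100 * low_correct / sum(confusion["Low_user"])
pct_high = 100 * high_correct / sum(confusion["High_user"])
pct_overall = 100 * (low_correct + high_correct) / total

print(f"{pct_low:.1f} {pct_high:.1f} {pct_overall:.1f}")
```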

Variables in the Equation


95% C.I.for EXP(B)

B S.E. Wald df Sig. Exp(B) Lower Upper

Step 1a X3A -.015 .282 .003 1 .959 .986 .567 1.713


X3B .559 .498 1.260 1 .262 1.749 .659 4.645

X3C -.165 .503 .107 1 .743 .848 .316 2.274

X3D -.295 .296 .991 1 .319 .745 .417 1.330

X3E .147 .393 .140 1 .709 1.158 .536 2.501

X3F .604 .447 1.831 1 .176 1.830 .763 4.392

X3G -.361 .426 .717 1 .397 .697 .302 1.607

X3H .461 .338 1.861 1 .173 1.586 .817 3.079

X3I -.075 .301 .061 1 .804 .928 .515 1.674

X3J -.596 .402 2.201 1 .138 .551 .251 1.211

X3K .222 .339 .427 1 .514 1.248 .642 2.426


X3L -.315 .280 1.265 1 .261 .730 .422 1.263
Constant -.150 1.821 .007 1 .935 .861
a. Variable(s) entered on step 1: X3A, X3B, X3C, X3D, X3E, X3F, X3G, X3H, X3I, X3J, X3K, X3L.

As per the above findings, none of the variables is significant at the 5% level.

Variables in the Equation


95% C.I.for EXP(B)

B S.E. Wald df Sig. Exp(B) Lower Upper

Step 1a X3B .369 .297 1.553 1 .213 1.447 .809 2.587


X3D -.290 .293 .976 1 .323 .749 .421 1.330

X3F .730 .353 4.285 1 .038 2.075 1.040 4.141

X3G -.399 .406 .967 1 .325 .671 .303 1.486

X3H .451 .325 1.924 1 .165 1.570 .830 2.968

X3J -.536 .391 1.875 1 .171 .585 .272 1.260

X3L -.272 .266 1.044 1 .307 .762 .452 1.284


Constant .094 1.543 .004 1 .952 1.098

a. Variable(s) entered on step 1: X3B, X3D, X3F, X3G, X3H, X3J, X3L.

When some of the variables are removed, X3F (blogging) is the only significant variable
(Sig. = .038). Its Exp(B) of 2.075 means that for every one-point increase in X3F, the odds of
being a high user are multiplied by 2.075, holding the other variables constant. As the 95%
confidence interval for Exp(B) (1.040 to 4.141) lies entirely above 1, blogging is associated
with higher social media usage.
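
The Exp(B) column is the exponential of B, and its confidence interval is the exponential of B plus or minus 1.96 standard errors. A minimal sketch for X3F, using the B and S.E. from the reduced model (variable names are illustrative; small differences from the table come from rounded inputs):

```python
import math

b, se = 0.730, 0.353   # B and S.E. for X3F in the reduced model
z = 1.96               # normal critical value for a 95% interval

odds_ratio = math.exp(b)
ci_lower = math.exp(b - z * se)
ci_upper = math.exp(b + z * se)

print(f"Exp(B) = {odds_ratio:.3f}, 95% CI = {ci_lower:.3f} to {ci_upper:.3f}")
```

An interval that stays above 1 corresponds to a coefficient that is significantly positive at the 5% level.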

Cut Off Score:

KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy.         .619
Bartlett's Test of Sphericity   Approx. Chi-Square       271.788
                                df                       66
                                Sig.                     .000

The KMO value (.619) is more than 0.5, so we can go ahead with factor analysis. Bartlett’s test is
also significant, as its significance value (.000) is less than 0.05.

Total Variance Explained
            Initial Eigenvalues                  Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total  % of Variance  Cumulative %   Total  % of Variance  Cumulative %    Total  % of Variance  Cumulative %

1 3.278 27.314 27.314 3.278 27.314 27.314 2.593 21.612 21.612


2 2.256 18.801 46.115 2.256 18.801 46.115 2.153 17.941 39.553
3 1.681 14.006 60.121 1.681 14.006 60.121 1.871 15.589 55.142
4 1.145 9.543 69.663 1.145 9.543 69.663 1.743 14.521 69.663
5 .902 7.515 77.179
6 .702 5.847 83.026
7 .600 4.997 88.023
8 .379 3.158 91.181
9 .349 2.910 94.092
10 .331 2.755 96.847
11 .261 2.172 99.019
12 .118 .981 100.000

Extraction Method: Principal Component Analysis.

All 12 variables (X3A to X3L) contributing to social media usage were included. Only the first 4
of the 12 components have an eigenvalue greater than 1, so 4 factors are retained. Together these
4 factors explain 69.66% of the total variance.
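
The retention rule used here (keep components with an eigenvalue above 1) and the cumulative percentage can be reproduced from the initial eigenvalues. A minimal sketch, assuming the analysis is on the correlation matrix so that the total variance equals the number of variables (12); variable names are illustrative.

```python
# Initial eigenvalues from the Total Variance Explained table.
eigenvalues = [3.278, 2.256, 1.681, 1.145, 0.902, 0.702,
               0.600, 0.379, 0.349, 0.331, 0.261, 0.118]

n_variables = len(eigenvalues)

# Keep only components whose eigenvalue exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1]

# Each component's share of variance is its eigenvalue over the
# number of variables.
pct_each = [100 * ev / n_variables for ev in retained]
cumulative_pct = sum(pct_each)

print(len(retained), round(cumulative_pct, 2))
```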

Rotated Component Matrixa


Component
1 2 3 4

X3A .108 -.007 -.235 .849


X3B -.082 .919 -.044 -.121
X3C .012 .892 -.129 -.177
X3D .035 .154 .800 -.108
X3E .777 -.010 .092 .195
X3F .833 .073 .133 .061
X3G .809 .094 -.090 .127
X3H .605 -.063 .556 -.022
X3I .451 -.041 .043 .491
X3J .188 .663 .190 .158
X3K .118 -.111 .273 .783
X3L .066 -.131 .837 .165

Extraction Method: Principal Component Analysis.


Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.

The loadings of each variable on the 4 retained factors are shown in the above table.

We can consider the cut-off score to be 0.6. Factor 1 then includes X3E (Promote events), X3F
(Blogging), X3G and X3H (Games); X3G is included because its loading of .809 on factor 1 exceeds
the cut-off. Factor 2 includes X3B (Messaging), X3C (Networking) and X3J (Photo sharing). Factor 3
includes X3D (Make new friends) and X3L (Online dating). Factor 4 includes X3A (Linking with
professionals) and X3K (Job seeking). X3I does not reach the cut-off on any factor.
So, we can name the 1st factor Games and blogs site.
The 2nd factor can be named Networking and sharing.
The 3rd factor can be named Friends and dating.
The 4th factor can be named Linking and job search.
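
The grouping above can be sketched as a mechanical rule: assign each variable to the component on which it has its largest absolute loading, provided that loading reaches the 0.6 cut-off. The rule and the names below are an illustrative assumption, not SPSS output; the loadings are copied from the rotated component matrix. Applied this way, the rule also places X3G (.809) on factor 1 and leaves X3I unassigned.

```python
# Rotated loadings on components 1 to 4, from the rotated component matrix.
loadings = {
    "X3A": (0.108, -0.007, -0.235, 0.849),
    "X3B": (-0.082, 0.919, -0.044, -0.121),
    "X3C": (0.012, 0.892, -0.129, -0.177),
    "X3D": (0.035, 0.154, 0.800, -0.108),
    "X3E": (0.777, -0.010, 0.092, 0.195),
    "X3F": (0.833, 0.073, 0.133, 0.061),
    "X3G": (0.809, 0.094, -0.090, 0.127),
    "X3H": (0.605, -0.063, 0.556, -0.022),
    "X3I": (0.451, -0.041, 0.043, 0.491),
    "X3J": (0.188, 0.663, 0.190, 0.158),
    "X3K": (0.118, -0.111, 0.273, 0.783),
    "X3L": (0.066, -0.131, 0.837, 0.165),
}

CUTOFF = 0.6
factors = {1: [], 2: [], 3: [], 4: []}
unassigned = []

for variable, row in loadings.items():
    # Index of the component with the largest absolute loading.
    best = max(range(len(row)), key=lambda i: abs(row[i]))
    if abs(row[best]) >= CUTOFF:
        factors[best + 1].append(variable)
    else:
        unassigned.append(variable)

print(factors)
print(unassigned)
```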

II) 1. Frequency and percentage of respondents

Weekend_user
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid               31          33.7      33.7            33.7
        Low_user    22          23.9      23.9            57.6
        High_user   39          42.4      42.4            100.0
        Total       92          100.0     100.0

The 31 unlabelled rows in the table are blank cases, leaving 61 respondents. Of these, 39 (63.9%)
are high users who spend more than 4 hours on social media on weekends, and 22 (36.1%) are low
users who spend less than 4 hours. The 57.6% cumulative percent includes the blank cases and is
not the share of low users.

Determine the statistical significance of the logistic function

Hosmer and Lemeshow Test


Step Chi-square df Sig.
1 11.679 8 .166

Since p = .166 is more than 0.05, we fail to reject the hypothesis that there is no difference
between the predicted and observed frequencies. The Hosmer and Lemeshow test therefore gives no
evidence of poor fit, and the model is accepted.
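
The significance value in the Hosmer and Lemeshow table can be cross-checked from the chi-square statistic (11.679) and its degrees of freedom (8). The sketch below uses the closed-form upper-tail probability of the chi-square distribution that holds for even degrees of freedom; the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, valid for positive even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(11.679, 8)
print(round(p_value, 3))
```

A chi-square of 0 gives a probability of exactly 1 under the same formula, which is the pattern a Sig. of 1.000 reflects.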

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      65.237a             .212                   .290
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

As per the Cox & Snell R Square, the model explains about 21.2% of the variation in the outcome,
and as per the Nagelkerke R Square, about 29.0%. Both are pseudo R-square measures, so these
figures are approximate.

Classification Tablea
                                    Predicted
                                    Weekend_user             Percentage
Observed                            Low_user    High_user    Correct
Step 1  Weekend_user  Low_user      9           13           40.9
                      High_user     8           31           79.5
        Overall Percentage                                   65.6
a. The cut value is .500

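The model classifies 65.6% of the respondents correctly overall, but the accuracy is uneven: 79.5%
of high users are classified correctly against only 40.9% of low users. Since 39 of the 61
respondents (63.9%) are high users, the overall figure is only slightly better than assigning
everyone to the high-user group.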

The coefficient estimates of the weekend model are shown below.

Variables in the Equation


95% C.I.for EXP(B)

B S.E. Wald df Sig. Exp(B) Lower Upper


Step 1a X3A -.815 .340 5.732 1 .017 .443 .227 .863

X3B -.428 .545 .619 1 .432 .652 .224 1.895

X3C .278 .534 .271 1 .603 1.320 .464 3.759

X3D -.082 .298 .076 1 .782 .921 .513 1.653

X3E -.198 .425 .217 1 .641 .820 .357 1.887

X3F .483 .495 .948 1 .330 1.620 .613 4.279

X3G -.202 .476 .179 1 .672 .817 .322 2.078

X3H .060 .375 .025 1 .873 1.062 .509 2.212

X3I .175 .321 .299 1 .584 1.192 .636 2.234

X3J -.228 .418 .297 1 .586 .796 .351 1.807

X3K .067 .354 .035 1 .851 1.069 .534 2.140

X3L .177 .288 .378 1 .539 1.193 .679 2.097


Constant 2.907 1.931 2.266 1 .132 18.303

a. Variable(s) entered on step 1: X3A, X3B, X3C, X3D, X3E, X3F, X3G, X3H, X3I, X3J, X3K, X3L.

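Only X3A (Linking with professionals) is significant (Sig. = .017). Its Exp(B) of .443 is below 1,
so, assuming High_user is the modelled outcome category, each one-point increase in X3A multiplies
the odds of being a high weekend user by 0.443. X3A is therefore associated with less, not more,
time on social media at weekends.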

