Assignment Docx-Predictive
i. Run a multiple regression explaining the preference for the brand of biscuits in terms of
the nutrition value, taste and preservation quality
Solution: Using SPSS, the coefficient tables below were generated at the 95% confidence level
(5% level of significance).
Coefficients(a)
              Unstandardized      Standardized                 95.0% Confidence Interval   Collinearity
              Coefficients        Coefficients                 for B                       Statistics
Model         B      Std. Error   Beta           t      Sig.   Lower Bound   Upper Bound   Tolerance   VIF
1 (Constant)  .733   .301                        2.436  .020   .123          1.343
  p_qlty      .548   .118         .522           4.660  .000   .310          .787          .309        3.238
  taste       .170   .103         .198           1.655  .107   -.038         .379          .271        3.690
  nutrition   .295   .103         .284           2.865  .007   .086          .503          .395        2.531
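The VIF column in the table above is the reciprocal of the Tolerance column. The short Python sketch below recomputes it from the tolerances printed in the table; the variable and function names are illustrative and not part of the SPSS output. A common rule of thumb (an assumption here, not something stated in the assignment) treats VIF values above 10 as a sign of serious multicollinearity.

```python
# Tolerance values copied from the first coefficient table.
tolerance = {"p_qlty": 0.309, "taste": 0.271, "nutrition": 0.395}


def vif(tol: float) -> float:
    """Variance inflation factor, defined as the reciprocal of tolerance."""
    return 1.0 / tol


# Recompute VIF for every predictor, rounded to three decimals like SPSS.
vifs = {name: round(vif(tol), 3) for name, tol in tolerance.items()}

for name, value in vifs.items():
    print(f"{name}: VIF = {value}")
```

The recomputed values agree with the SPSS column up to rounding of the printed tolerances.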
Coefficients(a)
              Unstandardized      Standardized                 95.0% Confidence Interval   Collinearity
              Coefficients        Coefficients                 for B                       Statistics
Model         B      Std. Error   Beta           t      Sig.   Lower Bound   Upper Bound   Tolerance
1 (Constant)  .668   .305                        2.189  .035   .050          1.287
  p_qlty      .665   .096         .634           6.914  .000   .470          .860          .484
  nutrition   .367   .095         .354           3.862  .000   .175          .560          .484
a. Dependent Variable: pref
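In the first model, taste is not significant at the 5% level (Sig. = .107), so it is dropped and
the regression is re-run with nutrition and preservation quality only. Hence the model below can
be used to determine the preference for biscuits:
Preference = 0.668 + 0.367 * Nutrition + 0.665 * Preservation Quality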
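As a minimal sketch, the fitted equation can be wrapped in a small Python function to produce predicted preference scores. The function name and the example ratings (nutrition = 5, preservation quality = 4) are hypothetical; the assignment does not state the rating scale.

```python
def predict_preference(nutrition: float, p_qlty: float) -> float:
    """Predicted preference from the two-predictor regression equation."""
    intercept = 0.668       # (Constant)
    b_nutrition = 0.367     # partial coefficient for nutrition
    b_p_qlty = 0.665        # partial coefficient for preservation quality
    return intercept + b_nutrition * nutrition + b_p_qlty * p_qlty


# Hypothetical respondent: nutrition rated 5, preservation quality rated 4.
example = predict_preference(nutrition=5, p_qlty=4)
print(f"Predicted preference: {example:.3f}")
```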
ii. Interpret the partial regression coefficients
iii. Test the overall significance of the regression using the ANOVA table
Solution: Inference: The ANOVA table shows a significance value of .000 (p < 0.001), which is less
than 0.05; hence the regression model as a whole is significant and can be used to predict customer
preference.
Preference = 0.668 + 0.367 * Nutrition + 0.665 * Preservation Quality
ANOVA(a)
Model           Sum of Squares   df   Mean Square   F         Sig.
1 Regression    107.036          2    53.518        104.557   .000(b)
  Residual      18.939           37   .512
  Total         125.975          39
a. Dependent Variable: pref
b. Predictors: (Constant), nutrition, p_qlty
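The F statistic in the ANOVA table follows directly from the sums of squares and degrees of freedom. The sketch below recomputes the mean squares and F in Python; it also derives R squared as regression SS over total SS, a figure that is not printed in the table and is shown only as a derived quantity.

```python
# Values copied from the ANOVA table.
ss_regression, df_regression = 107.036, 2
ss_residual, df_residual = 18.939, 37

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F = explained variance per df over unexplained variance per df.
f_stat = ms_regression / ms_residual

# Share of total variation explained by the model (derived, not in the table).
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"MS regression = {ms_regression:.3f}")
print(f"MS residual   = {ms_residual:.3f}")
print(f"F             = {f_stat:.3f}")
print(f"R squared     = {r_squared:.3f}")
```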
iv. Examine the significance of the partial regression coefficients using a 5 percent level
of significance
Solution: Inference: Both partial regression coefficients are significant at the 5 percent level
(Sig. = .000 for nutrition and for preservation quality).
a. For every one-unit increase in nutrition value, customer preference increases by
0.367 units, keeping preservation quality constant
b. Similarly, keeping nutrition constant, for every one-unit increase in
preservation quality, customer preference increases by 0.665 units
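The significance check for each partial coefficient can be sketched as a t ratio (B divided by its standard error) plus a 95% confidence interval. The Python snippet below uses the B and Std. Error values from the two-predictor table; the critical value 2.026 is an assumption taken from standard t tables for 37 residual degrees of freedom, and the helper name is illustrative.

```python
# Two-sided 5% critical value of t with 37 residual df (assumed from t tables).
T_CRIT = 2.026

# (B, Std. Error) copied from the two-predictor coefficient table.
coefficients = {"p_qlty": (0.665, 0.096), "nutrition": (0.367, 0.095)}


def summarize(b: float, se: float, t_crit: float = T_CRIT):
    """Return t ratio, 95% CI bounds, and whether the coefficient is significant."""
    t_ratio = b / se
    lower = b - t_crit * se
    upper = b + t_crit * se
    significant = abs(t_ratio) > t_crit
    return t_ratio, lower, upper, significant


results = {name: summarize(b, se) for name, (b, se) in coefficients.items()}

for name, (t_ratio, lower, upper, significant) in results.items():
    print(f"{name}: t = {t_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), "
          f"significant = {significant}")
```

Neither interval contains zero, which is the same conclusion as the Sig. column. Small differences from the SPSS t values come from rounding of the printed B and Std. Error.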
Solution: Inference: From inference (iv) above, a one-unit improvement in preservation quality raises
customer preference by about 0.665 units, hence it is imperative to focus on preserving the
freshness of the biscuits by using appropriate storage and packaging materials. The next focus should
be on nutrition, where a one-unit improvement raises customer preference by about 0.367 units; hence
use nutrient-rich raw materials to bake the biscuits.
Questions:
1. Divide the sample into two groups: one that uses the social networking site for less than
one hour on weekdays (low users) and a second that uses the social networking site
for one or more hours (high users). Run a two-group logistic regression analysis
with high/low user as a categorical dependent variable and the variables X3A to X3L as
predictor variables, in order to:
(a) Compute the percentage of respondents that it is able to classify correctly
Solution:
Weekday_user frequency table (columns: Frequency, Percent, Valid Percent, Cumulative Percent)
The percentage of low users (34.8%) is higher than that of high users (31.5%). Cumulatively, 68% of
internet and social media usage is less than 1 hour on weekdays.
Since p = 1.0 is more than 0.05, we fail to reject the null hypothesis that there is no difference
between the predicted values and the observed frequencies, so the model is retained.
As the significance is higher than 0.05, the model fits the data adequately and the goodness of fit
is satisfied.
Model Summary
-2 Log likelihood: 71.865(a)
As per the Cox and Snell R Square, the model explains about 18.6% of the variation in the dependent
variable, and as per the Nagelkerke R Square, about 24.8%.
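The two pseudo R squared figures can be reproduced from the model summary value, as the Python sketch below illustrates. It rests on two assumptions: that 71.865 is the model's -2 log likelihood, and that the group sizes are 32 low users and 29 high users, read from the counts in the classification table (21 + 11 and 19 + 10). The function names are illustrative.

```python
import math


def null_deviance(n_low: int, n_high: int) -> float:
    """-2 log likelihood of an intercept-only model for a two-group outcome."""
    n = n_low + n_high
    return -2.0 * (n_low * math.log(n_low / n) + n_high * math.log(n_high / n))


def pseudo_r2(dev_model: float, dev_null: float, n: int):
    """Cox and Snell R squared and its Nagelkerke rescaling."""
    cox_snell = 1.0 - math.exp(-(dev_null - dev_model) / n)
    max_cox_snell = 1.0 - math.exp(-dev_null / n)  # upper bound of Cox and Snell
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke


# Assumed inputs: 32 low users, 29 high users, model -2LL of 71.865.
dev_null = null_deviance(32, 29)
cox_snell, nagelkerke = pseudo_r2(71.865, dev_null, 32 + 29)

print(f"Cox and Snell R squared = {cox_snell:.3f}")
print(f"Nagelkerke R squared    = {nagelkerke:.3f}")
```

Under these assumptions the results round to the 18.6% and 24.8% reported in the text.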
Classification Table(a)
                                     Predicted
                                     Weekday_user             Percentage
Observed                             Low_user    High_user    Correct
Step 1  Weekday_user  Low_user       21          11           65.6
                      High_user      10          19           65.5
        Overall Percentage                                    65.6
a. The cut value is .500
The model classifies 65.6% of the respondents correctly. Overall, it is a good prediction.
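The percentages in the classification table can be checked with a few lines of Python. The sketch assumes the counts are read as follows: 21 low users classified as low and 11 as high, 10 high users classified as low and 19 as high. That reading is an inference from the reported percentages, and the dictionary layout is illustrative.

```python
# Rows are observed groups, inner keys are predicted groups (assumed reading).
table = {
    "Low_user": {"Low_user": 21, "High_user": 11},
    "High_user": {"Low_user": 10, "High_user": 19},
}


def percent_correct(confusion: dict) -> dict:
    """Per-group and overall percentage of correctly classified respondents."""
    result = {}
    correct_total = 0
    grand_total = 0
    for observed, predicted in confusion.items():
        row_total = sum(predicted.values())
        correct = predicted[observed]  # diagonal cell: predicted == observed
        result[observed] = 100.0 * correct / row_total
        correct_total += correct
        grand_total += row_total
    result["Overall"] = 100.0 * correct_total / grand_total
    return result


accuracy = percent_correct(table)
for group, pct in accuracy.items():
    print(f"{group}: {pct:.1f}% correct")
```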
a. Variable(s) entered on step 1: X3B, X3D, X3F, X3G, X3H, X3J, X3L.
When some of the variables are removed, X3F (blogging) is the only significant variable, with an
odds ratio of about 2.07.
For every one-point increase in X3F (blogging), the odds of being a high user are multiplied by
2.075. As the reported odds value for X3F is more than 1 (1.040), blogging is associated with
higher social media usage.
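To make the odds interpretation concrete, the Python sketch below treats 2.075 as the odds ratio, Exp(B), for X3F and shows how odds and probabilities change with the score. The starting probability of 0.5 is a hypothetical example, and the function names are illustrative.

```python
import math

ODDS_RATIO_X3F = 2.075  # Exp(B) for X3F as reported in the text

# The logistic coefficient B is the natural log of the odds ratio.
b_x3f = math.log(ODDS_RATIO_X3F)


def odds_multiplier(points: float) -> float:
    """Factor by which the odds of being a high user change for a given rise in X3F."""
    return ODDS_RATIO_X3F ** points


def updated_probability(p: float, points: float = 1.0) -> float:
    """Probability of being a high user after X3F rises, other predictors fixed."""
    odds = p / (1.0 - p) * odds_multiplier(points)
    return odds / (1.0 + odds)


print(f"B for X3F = {b_x3f:.3f}")
print(f"Odds multiplier for +2 points = {odds_multiplier(2):.3f}")
# Hypothetical respondent who starts at a 50% chance of being a high user.
print(f"Probability after +1 point from 0.5 = {updated_probability(0.5):.3f}")
```

Note that the multiplier applies to odds, not to probabilities: a respondent starting at 50% moves to roughly 67%, not to 100%.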
The KMO value is more than 0.5, so we can go ahead with factor analysis. Bartlett’s test is also
significant, as its significance value is less than 0.05.
Total Variance Explained (columns: Initial Eigenvalues, Extraction Sums of Squared Loadings,
Rotation Sums of Squared Loadings)
We have considered all the variables contributing to social media usage. Of the 12 components,
only the first 4 have an eigenvalue greater than 1, so 4 factors are retained. Cumulatively, these
4 factors explain 69.66% of the variance.
The eigenvalues for the 4 contributing factors are shown in the above table.
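The retention rule used above (keep components whose eigenvalue exceeds 1) can be sketched in Python as follows. The eigenvalues in the example are hypothetical placeholders for a six-variable problem, chosen only to illustrate the rule; they are not the values from this assignment, whose eigenvalue table is not reproduced here.

```python
def kaiser_retained(eigenvalues: list) -> int:
    """Number of components with eigenvalue greater than 1."""
    return sum(1 for value in eigenvalues if value > 1.0)


def cumulative_variance_pct(eigenvalues: list, k: int) -> float:
    """Percent of total variance explained by the k largest components."""
    ordered = sorted(eigenvalues, reverse=True)
    return 100.0 * sum(ordered[:k]) / sum(ordered)


# Hypothetical eigenvalues for six variables (NOT the assignment's data).
toy_eigenvalues = [2.4, 1.5, 0.9, 0.7, 0.3, 0.2]

n_factors = kaiser_retained(toy_eigenvalues)
explained = cumulative_variance_pct(toy_eigenvalues, n_factors)
print(f"Factors retained: {n_factors}")
print(f"Cumulative variance explained: {explained:.2f}%")
```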
Taking the cut-off loading as 0.6, Factor 1 includes X3E (Promote events), X3F (Blogging) and
X3H (Games). Factor 2 includes X3B (Messaging), X3C (Networking) and X3J (Photo sharing).
Factor 3 includes X3D (Make new friends) and X3L (Online dating). Factor 4 includes X3A
(Linking with professionals) and X3K (Job seeking).
So, the 1st factor can be renamed as Games and blogs site.
The 2nd factor can be renamed as Networking and sharing.
The 3rd factor can be renamed as Friends and dating.
The 4th factor can be renamed as Linking and job search.
II)1. Frequency and percentage of respondents
Weekend_user frequency table (columns: Frequency, Percent, Valid Percent, Cumulative Percent)
The percentage of social media users who spend more than 4 hours on weekends (42.4%) is higher than
that of users who spend less than 4 hours (23.9%). Cumulatively, high users account for more than
57% when compared with low users (less than 4 hours).
Since p = 1.0 is more than 0.05, we fail to reject the null hypothesis that there is no difference
between the predicted values and the observed frequencies, so the model is retained.
As the significance is higher than 0.05, the model fits the data adequately and the goodness of fit
is satisfied.
Model Summary
-2 Log likelihood: 65.237(a)
As per the Cox and Snell R Square, the model explains about 21% of the variation in the dependent
variable, and as per the Nagelkerke R Square, about 29%.
Classification Table(a)
                                     Predicted
                                     Weekend_user             Percentage
Observed                             Low_user    High_user    Correct
Step 1  Weekend_user  Low_user       9           13           40.9
                      High_user      8           31           79.5
        Overall Percentage                                    65.6
a. The cut value is .500
The model classifies 65.6% of the respondents correctly. Overall, it is a good prediction.
a. Variable(s) entered on step 1: X3A, X3B, X3C, X3D, X3E, X3F, X3G, X3H, X3I, X3J, X3K, X3L.
The KMO value is more than 0.5, so we can go ahead with factor analysis. Bartlett’s test is also
significant, as its significance value is less than 0.05.
We have considered all the variables contributing to social media usage. Of the 12 components,
only the first 4 have an eigenvalue greater than 1, so 4 factors are retained. Cumulatively, these
4 factors explain 69.66% of the variance.
Rotated Component Matrix(a) (columns: Component 1, 2, 3, 4)
The eigenvalues for the 4 contributing factors are shown in the above table.