
UNIT- 4 ASSIGNMENT

Simple Linear Regression Model:


Descriptive Statistics:
Mean of "V11" (Operating Profit): $34,425.24
Standard Deviation of "V11": $54,193.11
Mean of "V9" (Unit Sold): 256.93
Standard Deviation of "V9": 214.252
Sample Size (N): 9648

Correlation:
The Pearson correlation between "V11" and "V9" is 0.892, indicating a strong positive
linear relationship between Unit Sold and Operating Profit.
The p-value (Sig.) is reported as .000, i.e. p < 0.001, so the correlation is statistically
significant at the 0.05 level.

Model Summary:
The model's R-squared value is 0.796, meaning that 79.6% of the variance in Operating Profit
(V11) can be explained by the linear relationship with Unit Sold (V9). This indicates a strong
relationship between the two variables.
The Adjusted R-squared is the same as R-squared to three decimals. With a single predictor
and 9648 cases, the adjustment for model size is negligible, so this equality says nothing
extra about how much Unit Sold improved the model.
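Why the adjusted value matches R-squared here can be checked with the standard adjustment formula. The sketch below uses only the rounded figures reported above (R-squared 0.796, N = 9648, one predictor); the function name is illustrative and not part of the SPSS output.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors fitted to n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

n = 9648            # sample size reported in the descriptive statistics
r2_simple = 0.796   # R-squared reported in the model summary (rounded)

adj_simple = adjusted_r2(r2_simple, n, k=1)
print(round(adj_simple, 3))  # 0.796: the penalty vanishes at this sample size
```

The correction factor (n - 1) / (n - k - 1) is about 1.0001 here, which is why the two values agree at three decimals.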
The Standard Error of the Estimate is the estimate of the standard deviation of the
errors/residuals in the model. In this case, it's approximately $24,457.86.
ANOVA (Analysis of Variance):
The ANOVA table shows that the regression model is statistically significant, as indicated by
the p-value (Sig.) of .000 (p < 0.001). A relationship this strong between Unit Sold (V9)
and Operating Profit (V11) would be very unlikely to arise by chance alone.
The F-statistic and its associated p-value indicate that the regression model as a whole is
significant in predicting Operating Profit based on Unit Sold.
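The F-statistic itself is not quoted in this write-up, but its approximate size can be reconstructed from R-squared and the sample size. The following is a sketch under that assumption; because it starts from the rounded R-squared of 0.796, the result will differ somewhat from the value in the SPSS ANOVA table.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F-statistic implied by R-squared, n cases and k predictors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Rounded figures from the model summary; the exact SPSS value will differ slightly.
f_simple = f_from_r2(0.796, n=9648, k=1)
print(f"Approximate F(1, 9646) = {f_simple:,.0f}")
```

An F-statistic in the tens of thousands on (1, 9646) degrees of freedom is consistent with the reported Sig. of .000.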
Coefficient Interpretation:
The coefficient for "V9" (Unit Sold) in the regression equation indicates the estimated change
in Operating Profit for a one-unit increase in Unit Sold. Because this model has a single
predictor, there are no other variables to hold constant.
For example, if the coefficient for "V9" were 100, it would mean that for every additional unit
sold, Operating Profit is estimated to increase by $100.
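The coefficient table is not reproduced here, but in a simple regression the slope follows from the correlation and the two standard deviations, and the intercept from the two means. The sketch below applies those identities to the rounded descriptive statistics above, so the figures are approximations and the SPSS coefficient table remains the authoritative source.

```python
# Rounded summary statistics from the Descriptive Statistics and Correlation sections.
r = 0.892                 # Pearson correlation between V9 and V11
mean_profit = 34425.24    # mean of V11 (Operating Profit)
sd_profit = 54193.11      # standard deviation of V11
mean_units = 256.93       # mean of V9 (Unit Sold)
sd_units = 214.252        # standard deviation of V9

# Simple-regression identities: slope = r * sd_y / sd_x, intercept = mean_y - slope * mean_x
slope = r * sd_profit / sd_units
intercept = mean_profit - slope * mean_units

print(f"Implied slope:     {slope:.2f} dollars of profit per extra unit sold")
print(f"Implied intercept: {intercept:.2f}")
```

Under these inputs the implied slope is roughly $226 per unit, which gives a concrete counterpart to the hypothetical value of 100 used in the example.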
MULTIPLE REGRESSION ANALYSIS WITH MATRIX SCATTERPLOT:
Descriptive Statistics:
Mean of "Operating_Profit": $34,425.24 with a standard deviation of $54,193.11.
Mean of "Units_Sold": 256.93 with a standard deviation of 214.252.
Mean of "Price_per_Unit": $45.22 with a standard deviation of $14.71.
Mean of "Operating_Margin": 42.30% with a standard deviation of 9.72%.
Mean of "Total_Sales": $93,273.44 with a standard deviation of $141,916.02.
Sample Size (N): 9648.

Correlation Analysis:
There are several correlations to note:
"Operating_Profit" has a strong positive correlation with "Units_Sold" (0.892) and
"Total_Sales" (0.956), and a moderate positive correlation with "Price_per_Unit" (0.395).
"Operating_Margin" shows a weak negative correlation with "Operating_Profit" (-0.212).
"Total_Sales" also has a strong positive correlation with "Units_Sold" (0.913), so these two
predictors carry largely overlapping information.

Model Summary:
The multiple linear regression model has an impressive R-squared value of 0.938, indicating
that 93.8% of the variance in Operating Profit (dependent variable) is explained by the
combination of "Total_Sales," "Operating_Margin," "Price_per_Unit," and "Units_Sold"
(predictor variables).
The Adjusted R-squared is the same as R-squared to three decimals. With four predictors and
9648 cases, the penalty for model size is negligible, so this equality does not by itself show
that each predictor improved the model.
The Standard Error of the Estimate is approximately $13,535.71, the typical size of the gap
between observed Operating Profit values and the values predicted by the model.

ANOVA (Analysis of Variance):


The ANOVA table shows that the regression model is highly significant, with a very small
p-value (< 0.0001). This indicates that at least one of the predictors in the model significantly
contributes to explaining the variance in Operating Profit.
Coefficient Interpretation:
Each predictor variable's coefficient in the regression equation represents the change in
Operating Profit for a one-unit change in that predictor variable, holding other predictors
constant.
For example, if the coefficient for "Units_Sold" is 100, it means that for every additional unit
sold, Operating Profit is estimated to increase by $100, assuming all other factors remain
constant.
Overall, the multiple linear regression model suggests a strong relationship between the
predictors ("Total_Sales," "Operating_Margin," "Price_per_Unit," "Units_Sold") and the
outcome variable ("Operating_Profit"), with high explanatory power and statistical
significance.

STEPWISE METHOD COMPARISON:


Stepwise Model Summary:
In the stepwise method, variables are added or removed based on their contribution to the
model's predictive power.
The stepwise regression results in multiple models with different numbers of predictors
added sequentially.
Model Comparison:
Enter Method (from previous analysis):
R-squared (Adjusted) for the model: 0.938 (0.938).
Predictors (4): Total Sales, Operating Margin, Units Sold, Price per Unit.
Stepwise Method (Four Models):
Model 1: R-squared (Adjusted): 0.915 (0.915), Predictors: Total Sales.
Model 2: R-squared (Adjusted): 0.936 (0.936), Predictors: Total Sales, Operating Margin.
Model 3: R-squared (Adjusted): 0.937 (0.937), Predictors: Total Sales, Operating Margin,
Units Sold.
Model 4 (Final Model): R-squared (Adjusted): 0.938 (0.938), Predictors: Total Sales, Operating
Margin, Units Sold, Price per Unit.
ANOVA (Model Comparison):
Each model's ANOVA table shows the significance of the regression models.
Model 4 (from stepwise) has a similar F-statistic and p-value as the model obtained from the
normal enter method, indicating its significant predictive power.
Interpretation:
The stepwise method adds or removes predictors one at a time according to statistical entry
and removal criteria. It is not guaranteed to find the best possible subset.
In this case, the final stepwise model (Model 4) includes Total Sales, Operating Margin, Units
Sold, and Price per Unit, the same four predictors used in the enter method.
Overall, both methods result in a model with the same explanatory power (R-squared of 0.938)
and the same predictors. Most of that power comes from Total Sales alone (0.915); the last two
steps add only about 0.001 each. The stepwise method automates the selection process based on
statistical criteria, while the enter method requires the predictors to be chosen beforehand.
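Why stepwise kept adding predictors even when R-squared rose by only 0.001 can be illustrated with the F-test for a change in R-squared. The sketch below uses the rounded R-squared values listed for the four stepwise models, so the results are rough approximations rather than the values SPSS would report.

```python
def f_change(r2_old: float, r2_new: float, n: int, k_new: int, added: int = 1) -> float:
    """F-statistic for the R-squared gain when `added` predictors bring the total to k_new."""
    return ((r2_new - r2_old) / added) / ((1.0 - r2_new) / (n - k_new - 1))

n = 9648
# (R-squared, number of predictors) for stepwise Models 1 to 4, as reported above.
models = [(0.915, 1), (0.936, 2), (0.937, 3), (0.938, 4)]

changes = []
for (r2_old, _), (r2_new, k_new) in zip(models, models[1:]):
    f = f_change(r2_old, r2_new, n, k_new)
    changes.append(f)
    print(f"Model {k_new - 1} -> Model {k_new}: R2 gain {r2_new - r2_old:.3f}, F change ~ {f:,.0f}")
```

With nearly ten thousand cases, even a 0.001 gain yields a large F change, so statistical significance at each step does not imply that the later predictors add much practical explanatory power.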

BINARY LOGISTIC REGRESSION:


Highly_Profitable: 1 if Operating_Profit > 200,000, else 0
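The coding rule for the outcome can be written out as a small function. This is a minimal sketch; the sample profit values are made up for illustration and do not come from the data set.

```python
THRESHOLD = 200_000  # cutoff stated in the rule above

def highly_profitable(operating_profit: float) -> int:
    """Return 1 when operating profit is strictly above the threshold, else 0."""
    return 1 if operating_profit > THRESHOLD else 0

# Hypothetical profits, chosen to show behaviour around the cutoff.
sample_profits = [150_000.0, 200_000.0, 200_000.01, 390_000.0]
labels = [highly_profitable(p) for p in sample_profits]
print(labels)  # [0, 0, 1, 1]
```

Note that a profit of exactly 200,000 is coded 0 because the rule uses a strict inequality.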
Case Processing Summary:
All 9648 cases were included in the analysis, with no missing data.
Categorical Variables Coding (Sales Method):
In-store: 1740 cases
Online: 4889 cases
Outlet: 3019 cases
Classification Table (Step 0):
The initial (intercept-only) model classifies every case as not highly profitable (0) and is
correct for 98% of cases. This reflects how rare the "Highly Profitable" class is, not
predictive power; it is the baseline that later steps must beat.
As a result, all 190 cases that were actually "Highly Profitable" were misclassified as not
highly profitable.
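The 98% figure follows directly from the class counts. The short calculation below uses only the two numbers reported above (9648 cases, 190 of them highly profitable) and assumes, as is usual for Step 0, that every case is assigned to the majority class.

```python
n_total = 9648      # cases in the analysis
n_positive = 190    # cases that are actually highly profitable

# An intercept-only model predicts the majority class (0) for every case.
baseline_accuracy = (n_total - n_positive) / n_total
positive_recall = 0 / n_positive  # none of the positives are identified

print(f"Baseline accuracy: {baseline_accuracy:.1%}")   # 98.0%
print(f"Recall on highly profitable cases: {positive_recall:.0%}")  # 0%
```

Any later model should therefore be judged against 98.0% accuracy and 0% recall on the positive class, not against 50%.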
Variables in the Equation (Step 0):
The initial step (Step 0) contains only the constant. This is by design: Step 0 is the baseline
model fitted before any predictors are entered, so it says nothing about their significance.
Block 1 Method=Enter:
The enter method attempts to enter all variables simultaneously in the model.
Omnibus Tests of Model Coefficients:
The model at Step 1 was highly significant (p < .001) based on the chi-square test, indicating
that the included variables collectively contribute to predicting the outcome.
Model Summary (Step 1):
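The -2 log-likelihood value is very close to zero, meaning the fitted probabilities are almost
exactly 0 or 1 for every case. Taken with the perfect classification reported next, this points
to separation in the data rather than simply a well-fitted model. The Cox & Snell R-Square and
Nagelkerke R-Square values indicate a strong association between predictors and the outcome.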
Classification Table (Step 1):
The final model achieved 100% accuracy in classifying cases as "Highly_Profitable" or not,
which is unusual and may indicate potential overfitting or perfect separation in the data.
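Perfect separation can be illustrated on a toy example. The sketch below uses synthetic profit values (not the assignment's data) with the same kind of threshold rule, and shows that the log-likelihood keeps rising toward zero as the slope grows, so there is no finite best-fitting coefficient.

```python
import math

def log_sigmoid(t: float) -> float:
    """Numerically stable log(1 / (1 + exp(-t)))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))

def log_likelihood(xs, ys, slope, cutoff):
    """Log-likelihood of the logistic model P(y=1) = sigmoid(slope * (x - cutoff))."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = slope * (x - cutoff)
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total

# Synthetic operating profits in thousands of dollars (illustrative only).
profits = [50, 120, 180, 199, 201, 260, 400]
labels = [1 if p > 200 else 0 for p in profits]

slopes = [0.1, 1.0, 10.0]
lls = [log_likelihood(profits, labels, s, cutoff=200) for s in slopes]
for s, ll in zip(slopes, lls):
    print(f"slope={s:>5}: log-likelihood={ll:.6f}")
```

Each tenfold increase in the slope improves the fit, which is why software facing separated data tends to report extreme coefficients with very large standard errors.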
Variables in the Equation (Step 1):
• Operating_Profit: The coefficient is 0.004 and is not statistically significant (p = 0.654).
This does not mean operating profit is unrelated to the outcome: Highly_Profitable is
defined directly from Operating_Profit (1 if above 200,000), so this predictor separates the
two classes perfectly, which makes the estimates and their significance tests unreliable.
• Operating_Margin: The coefficient is 0.848, but it has a very large standard error
(S.E.), so the estimate is too imprecise to interpret.
• Units_Sold: The coefficient is -0.026, indicating that as units sold increase, the odds of
being highly profitable decrease slightly. However, it is not statistically significant (p =
0.982).
• Sales_Method: The model includes dummy variables for Sales_Method (In-store,
Online, Outlet). However, none of these variables are statistically significant in
predicting "Highly_Profitable" status, as indicated by their high p-values.
• Price_per_Unit and Total_Sales: Neither variable has a significant coefficient in this
model.
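Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios (the Exp(B) column in SPSS output). The sketch below converts the three coefficients quoted above; given the non-significant p-values and large standard errors, the resulting ratios are shown only to illustrate the conversion.

```python
import math

# Coefficients (B) quoted in the Variables in the Equation discussion above.
coefficients = {
    "Operating_Profit": 0.004,
    "Operating_Margin": 0.848,
    "Units_Sold": -0.026,
}

# Odds ratio = exp(B): multiplicative change in the odds per one-unit increase.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")
```

An odds ratio below 1 (as for Units_Sold) corresponds to a negative coefficient, and a ratio above 1 to a positive one.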
Constant Term:
The constant term has a large negative coefficient (-813.027), but it is not statistically
significant (p = 0.699). An intercept of this magnitude is itself a sign of unstable estimation,
so it should not be interpreted on its own.
Overall Impression:
While the model achieved perfect accuracy in classification, none of the reported coefficients
is statistically significant. The most likely cause is perfect separation: the outcome is
defined from Operating_Profit, and Operating_Profit is also entered as a predictor.
Multicollinearity among the predictors and overfitting may add to the problem, and the model
may not generalize well to new data.
A reasonable next step is to refit the model without Operating_Profit, and then to check data
quality, model assumptions, and potential interactions between variables before relying on the
logistic regression results.
