
EXERCISE DISCUSSION

A clothing company manager wants to know how sales relate to several factors, so that
sales can be predicted and profit maximized. The dependent variable is sales (in million
rupiahs). The independent variables are promotion costs through TV (in million rupiahs),
promotion costs through flyers (in million rupiahs), the number of stores in the district,
and the number of competitors in the same district. Perform the regression analysis and
determine the best regression model using the model selection criteria procedure. The
following data were obtained from fifteen districts.
TV    Flyer   Store   Competitor   Sales
31    5.5     8       10           79
55    2.5     6       8            200
67    8       9       12           163
50    3       16      7            201
38    3       15      8            146
71    2.9     17      12           177
30    8       8       12           31
56    9       10      5            292
42    4       4       8            160
73    6.5     16      5            340
60    5.5     7       11           160
44    5       12      12           87
50    6       6       6            237
39    5       4       10           107
55    3.5     4       10           155

Answer:
In this case, we use multiple linear regression analysis because there are four
independent variables (promotion costs through TV, promotion costs through flyers,
the number of stores, and the number of competitors) and one dependent variable,
namely Sales. The procedure is as follows.

Assumption:
1. Normality test
● Hypothesis
H0: Sales data follows a normal distribution.
H1: Sales data does not follow a normal distribution.
● Significance level
α = 5%
● Test statistic

P-value = 0.759 (Shapiro-Wilk, used because n ≤ 50)
● Critical region
H0 is rejected if the p-value is less than the significance level (α)
● Conclusion
Because the p-value (0.759) is greater than the significance level (0.05),
H0 is not rejected. There is no evidence that the Sales data depart from a
normal distribution.
● Interpretation
The normality test checks the first assumption, which is normality of the
dependent variable. The p-value (0.759, from the Sig. column of the
Shapiro-Wilk test) is greater than the significance level (0.05), so H0 is
not rejected and the Sales data can be treated as normally distributed.
Thus, the assumption of normality of the dependent variable is met.
2. Linearity test

Interpretation:
The scatter plots above show the relationship between the Sales variable (Y) and
promotion costs through TV (X1), promotion costs through flyers (X2), the number
of stores (X3), and the number of competitors (X4).

Dependent   Independent   R-Square   Interpretation
Variable    Variable
Sales       TV            0.500      50% of the variation in Sales can be explained
                                     by promotion costs through TV
            Flyer         0.012      1.2% of the variation in Sales can be explained
                                     by promotion costs through flyers
            Store         0.094      9.4% of the variation in Sales can be explained
                                     by the number of stores
            Competitor    0.638      63.8% of the variation in Sales can be explained
                                     by the number of competitors

Conclusion:
The TV and Competitor variables show a straight-line relationship with the Sales
variable. The Flyer and Store variables do not show a linear relationship with Sales,
and their R-Square values are very small; in other words, each of them explains only
a very small share of the variation in Sales. Therefore, only two independent
variables pass the linearity check. However, because some variables do show a linear
relationship, all four independent variables are still included when forming the
model.
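
The single-predictor R-Square values in the table can be checked by hand. The sketch below is one possible way to do it, not the procedure used to produce the original output: it assumes that each R-Square comes from a simple regression of Sales on one predictor, in which case it equals the squared Pearson correlation. The names DATA, NAMES, and r_squared are illustrative.

```python
# Exercise data: (TV, Flyer, Store, Competitor, Sales) for the 15 districts.
DATA = [
    (31, 5.5, 8, 10, 79), (55, 2.5, 6, 8, 200), (67, 8, 9, 12, 163),
    (50, 3, 16, 7, 201), (38, 3, 15, 8, 146), (71, 2.9, 17, 12, 177),
    (30, 8, 8, 12, 31), (56, 9, 10, 5, 292), (42, 4, 4, 8, 160),
    (73, 6.5, 16, 5, 340), (60, 5.5, 7, 11, 160), (44, 5, 12, 12, 87),
    (50, 6, 6, 6, 237), (39, 5, 4, 10, 107), (55, 3.5, 4, 10, 155),
]
NAMES = ("TV", "Flyer", "Store", "Competitor")


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R-Square of a one-predictor regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


sales = [row[4] for row in DATA]
r2 = {name: r_squared([row[i] for row in DATA], sales)
      for i, name in enumerate(NAMES)}

for name, value in r2.items():
    print(f"{name:<11}{value:.3f}")
```

A small R-Square here only says that a predictor explains little of Sales on its own; it does not by itself rule the predictor out of a multiple regression, which is why all four variables are carried forward.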

Multiple Linear Regression Analysis


MODEL 1
Variables included in this model are:
- Dependent variable (Y) : Sales
- Independent variable (X)
1. X1 = TV
2. X2 = Flyer
3. X3 = Store
4. X4 = Competitor
- Constant

1. Model Summary
Interpretation:
● R: the multiple correlation between the independent variables and the
dependent variable is 0.999, which indicates a very strong relationship.
● R²: 0.997 indicates that 99.7% of the variation in the dependent variable is
explained by the independent variables. The rest is explained by other
factors.
● Adjusted R²: 0.996 is R² corrected for the number of predictors in the model.
● Std. Error of the Estimate: 4.95659 is the estimated standard deviation of
the residuals around the regression model.
● AIC (Akaike Information Criterion): the value of AIC is 51.940
● SBC (Schwarz Bayesian Criterion): the value of SBC is 55.480
● Mallows' Cp: the value is 5, equal to the number of parameters (including
the constant)
● PRESS: the predicted residual sum of squares is 679.00
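
To make the link between R² and Adjusted R² concrete, here is a minimal sketch using the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - p), where n is the number of observations and p the number of parameters including the constant. The function name adjusted_r2 is illustrative, and the input is the rounded R² from the summary, so the result is only accurate to about three decimals.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-square for n observations and p parameters (constant included)."""
    return 1 - (1 - r2) * (n - 1) / (n - p)


# Model 1: R-square = 0.997, n = 15 districts, p = 5 parameters.
model1_adjusted = adjusted_r2(0.997, 15, 5)
print(round(model1_adjusted, 3))
```

The penalty grows with p, so adding a predictor raises Adjusted R² only when it improves the fit by more than would be expected by chance.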

2. Overall Test
● Hypothesis
H0: βi = 0 for all i = 1, 2, 3, 4 (the regression model is not suitable for
use; there is no linear relationship between the dependent variable and the
independent variables)
H1: Not all βi = 0, i = 1, 2, 3, 4 (the regression model is suitable for use;
there is a linear relationship between the dependent variable and at least
one independent variable)
● Significance level
α = 5%
● Test Statistic

P-value = 0.000
● Critical region
H0 is rejected if the p-value is less than the significance level (α)
● Conclusion
Because the p-value (0.000) is less than the significance level (0.05),
H0 is rejected. So it can be concluded that the model is suitable for use:
there is a linear relationship between the dependent variable and the
independent variables.
● Interpretation
The null hypothesis H0 states that the regression model is not feasible to
use, and the alternative hypothesis H1 states that it is feasible to use.
At the significance level α = 5%, H0 is rejected if the Sig. value is less
than α. The obtained Sig. (reported as 0.000, i.e. below 0.001) is less
than the significance level (0.05), so H0 is rejected. It can be concluded
that the regression model is feasible to use, or that there is a linear
relationship between the dependent variable and the independent variables.

3. Partial Test

a. Partial hypothesis for constant


● Hypothesis
H0: 𝛽0 = 0 (constant is not statistically significant to the regression
model)
H1: 𝛽0 ≠ 0 (constant is statistically significant to the regression
model)
● Significance level
α = 5%
● Test statistic
P-value = 0.000
● Critical region
H0 is rejected if the p-value is less than the significance level (α)
● Conclusion
Because the p-value (0.000) is less than the significance level
(0.05), H0 is rejected. So it can be concluded that the constant
is statistically significant to the regression model.
● Interpretation
The null hypothesis H0 states that the constant is not
statistically significant to the regression model, and the
alternative hypothesis H1 states that it is. At the significance
level α = 5%, H0 is rejected if the Sig. value is less than α.
The obtained Sig. (0.000) is less than the significance level
(0.05), so H0 is rejected. It can be concluded that the constant
is statistically significant to the regression model.

b. Partial hypothesis for coefficient variable


● Hypothesis
H0: βi = 0, i = 1, 2, 3, 4 (predictor variable Xi is not statistically
significant)
H1: βi ≠ 0, i = 1, 2, 3, 4 (predictor variable Xi is statistically
significant)
● Significance level
α = 5%
● Test statistic
Independent variable   Sig.
TV                     0.000
Flyer                  0.007
Store                  0.461
Competitor             0.000
● Critical region
H0 is rejected if the p-value is less than the significance level (α)
● Conclusion
Independent variable   Conclusion
TV                     Because the p-value (0.000) is less than the significance
                       level (0.05), H0 is rejected. So it can be concluded that
                       the TV variable is significant to the regression model.
Flyer                  Because the p-value (0.007) is less than the significance
                       level (0.05), H0 is rejected. So it can be concluded that
                       the Flyer variable is significant to the regression model.
Store                  Because the p-value (0.461) is greater than the significance
                       level (0.05), H0 is not rejected. So it can be concluded that
                       the Store variable is not significant to the regression model.
Competitor             Because the p-value (0.000) is less than the significance
                       level (0.05), H0 is rejected. So it can be concluded that
                       the Competitor variable is significant to the regression
                       model.


● Interpretation
The null hypothesis H0 states that the predictor variable Xi is not
statistically significant to the regression model, and the
alternative hypothesis H1 states that it is. At the significance
level α = 5%, H0 is rejected if the Sig. value is less than α. The
Sig. values of TV, Flyer, Store, and Competitor are 0.000, 0.007,
0.461, and 0.000 respectively. H0 is therefore rejected for the TV,
Flyer, and Competitor variables but not for Store. In other words,
the Store variable is not statistically significant to the
regression model.

From the Partial Test, we know that the Store variable is not statistically significant to the
regression model and has the largest p-value (Sig.). So, we should take out the Store
variable and do the regression analysis again.
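
The removal rule applied here is one step of backward elimination: drop the predictor with the largest p-value if that p-value exceeds α, refit, and repeat. A minimal sketch of the rule follows, using the Sig. values reported in this write-up rather than values computed from the data; the function name variable_to_drop is illustrative.

```python
ALPHA = 0.05


def variable_to_drop(p_values, alpha=ALPHA):
    """Return the predictor with the largest p-value if it exceeds alpha, else None."""
    name, p_value = max(p_values.items(), key=lambda item: item[1])
    return name if p_value > alpha else None


# Sig. values reported for the partial tests of Model 1 and Model 2.
model1_sig = {"TV": 0.000, "Flyer": 0.007, "Store": 0.461, "Competitor": 0.000}
model2_sig = {"TV": 0.000, "Flyer": 0.007, "Competitor": 0.000}

dropped_from_model1 = variable_to_drop(model1_sig)
dropped_from_model2 = variable_to_drop(model2_sig)
print(dropped_from_model1, dropped_from_model2)
```

Only one variable is removed per step because the remaining p-values change once the model is refitted.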
MODEL 2
Variables included in this model are:
- Dependent variable (Y) : Sales
- Independent variable (X)
1. X1 = TV
2. X2 = Flyer
3. X3 = Competitor
- Constant
1. Model Summary

Interpretation:
● R: the multiple correlation between the independent variables and the
dependent variable is 0.999, which indicates a very strong relationship.
● R²: 0.997 indicates that 99.7% of the variation in the dependent variable is
explained by the independent variables. The rest is explained by other
factors.
● Adjusted R²: 0.996 is R² corrected for the number of predictors in the model.
● Std. Error of the Estimate: 4.86255 is the estimated standard deviation of
the residuals around the regression model.
● AIC (Akaike Information Criterion): the value of AIC is 50.795
● SBC (Schwarz Bayesian Criterion): the value of SBC is 53.627
● Mallows' Cp: the value is 4, equal to the number of parameters (including
the constant)
● PRESS: the predicted residual sum of squares is 605.12
2. Overall Test
● Hypothesis
H0: βi = 0 for all i = 1, 2, 3 (the regression model is not suitable for use;
there is no linear relationship between the dependent variable and the
independent variables)
H1: Not all βi = 0, i = 1, 2, 3 (the regression model is suitable for use;
there is a linear relationship between the dependent variable and at least
one independent variable)
● Significance level
α = 5%
● Test Statistic

P-value = 0.000
● Critical region
H0 is rejected if the p-value is less than the significance level (α)
● Conclusion
Because the p-value (0.000) is less than the significance level (0.05),
H0 is rejected. So it can be concluded that the model is suitable for use:
there is a linear relationship between the dependent variable and the
independent variables.
● Interpretation
The null hypothesis H0 states that the regression model is not feasible to
use, and the alternative hypothesis H1 states that it is feasible to use.
At the significance level α = 5%, H0 is rejected if the Sig. value is less
than α. The obtained Sig. (reported as 0.000, i.e. below 0.001) is less
than the significance level (0.05), so H0 is rejected. It can be concluded
that the regression model is feasible to use, or that there is a linear
relationship between the dependent variable and the independent variables.
3. Partial Test

a. Partial hypothesis for constant


● Hypothesis
H0: 𝛽0 = 0 (constant is not statistically significant to the regression
model)
H1: 𝛽0 ≠ 0 (constant is statistically significant to the regression
model)
● Significance level
α = 5%
● Test statistic
P-value = 0.000
● Critical region
H0 is rejected if the p-value is less than the significance level (α)
● Conclusion
Because the p-value (0.000) is less than the significance level
(0.05), H0 is rejected. So it can be concluded that the constant
is statistically significant to the regression model.
● Interpretation
The null hypothesis H0 states that the constant is not
statistically significant to the regression model, and the
alternative hypothesis H1 states that it is. At the significance
level α = 5%, H0 is rejected if the Sig. value is less than α.
The obtained Sig. (0.000) is less than the significance level
(0.05), so H0 is rejected. It can be concluded that the constant
is statistically significant to the regression model.

b. Partial hypothesis for coefficient variable


● Hypothesis
H0: βi = 0, i = 1, 2, 3 (predictor variable Xi is not statistically
significant)
H1: βi ≠ 0, i = 1, 2, 3 (predictor variable Xi is statistically
significant)
● Significance level
α = 5%
● Test statistic
Independent variable   Sig.
TV                     0.000
Flyer                  0.007
Competitor             0.000
● Critical region
H0 is rejected if the p-value is less than the significance level (α)
● Conclusion
Independent variable   Conclusion
TV                     Because the p-value (0.000) is less than the significance
                       level (0.05), H0 is rejected. So it can be concluded that
                       the TV variable is significant to the regression model.
Flyer                  Because the p-value (0.007) is less than the significance
                       level (0.05), H0 is rejected. So it can be concluded that
                       the Flyer variable is significant to the regression model.
Competitor             Because the p-value (0.000) is less than the significance
                       level (0.05), H0 is rejected. So it can be concluded that
                       the Competitor variable is significant to the regression
                       model.

● Interpretation
The null hypothesis H0 states that the predictor variable Xi is not
statistically significant to the regression model, and the
alternative hypothesis H1 states that it is. At the significance
level α = 5%, H0 is rejected if the Sig. value is less than α. The
Sig. values of TV, Flyer, and Competitor are 0.000, 0.007, and
0.000 respectively. H0 is therefore rejected for the TV, Flyer,
and Competitor variables. In other words, all independent
variables in Model 2 are statistically significant to the
regression model.

REGRESSION MODEL
Model 1
Sales = 177.392 + 3.533 * TV + 2.184 * Flyer + 0.236 * Store − 22.187 * Competitor

Model 2
Sales = 178.893 + 3.562 * TV + 2.109 * Flyer − 22.222 * Competitor
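
The Model 2 coefficients can be checked against the data without statistical software. The following is a minimal sketch, assuming ordinary least squares solved through the normal equations (X'X)b = X'y. It is not the software routine that produced the output above, and the helper names solve and fit_ols are illustrative.

```python
# Exercise data: (TV, Flyer, Store, Competitor, Sales) for the 15 districts.
DATA = [
    (31, 5.5, 8, 10, 79), (55, 2.5, 6, 8, 200), (67, 8, 9, 12, 163),
    (50, 3, 16, 7, 201), (38, 3, 15, 8, 146), (71, 2.9, 17, 12, 177),
    (30, 8, 8, 12, 31), (56, 9, 10, 5, 292), (42, 4, 4, 8, 160),
    (73, 6.5, 16, 5, 340), (60, 5.5, 7, 11, 160), (44, 5, 12, 12, 87),
    (50, 6, 6, 6, 237), (39, 5, 4, 10, 107), (55, 3.5, 4, 10, 155),
]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return OLS coefficients (constant first) and the residual sum of squares."""
    x = [[1.0] + list(r) for r in rows]  # prepend the constant column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, xi)) for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, sse


sales = [row[4] for row in DATA]
model2_rows = [(tv, flyer, comp) for tv, flyer, _store, comp, _sales in DATA]
beta, sse = fit_ols(model2_rows, sales)

n, p = len(sales), len(beta)
std_error = (sse / (n - p)) ** 0.5  # Std. Error of the Estimate
print([round(b, 3) for b in beta], round(std_error, 5))
```

Passing the rows with the Store column included fits Model 1 in the same way. For larger or badly conditioned problems a QR-based solver is generally preferable to the normal equations.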

MODEL SELECTION CRITERIA


The best model is the one that fulfills the most criteria. We prefer the model with
the highest R² and Adjusted R² and the smallest SE, AIC, SBC, and PRESS. For
Mallows' Cp, we prefer a model with Cp ≤ p, where p is the number of parameters
including the constant. The seven criteria are summarized below:

Model     R²      Adj. R²   SE        AIC      SBC      Mallows' Cp   PRESS
Model 1   0.997   0.996     4.95659   51.940   55.480   5             679.00
Model 2   0.997   0.996     4.86255   50.795   53.627   4             605.12

From the table above:

● R² is the same for Model 1 and Model 2
● Adjusted R² is the same for Model 1 and Model 2
● The smallest SE is 4.86255, in Model 2
● The smallest AIC is 50.795, in Model 2
● The smallest SBC is 53.627, in Model 2
● The Mallows' Cp criterion is fulfilled by both models
● The smallest PRESS is 605.12, in Model 2
Model 2 ties with Model 1 on R² and Adjusted R², satisfies the Mallows' Cp criterion,
and is better on SE, AIC, SBC, and PRESS, so it fulfills 7 out of 7 criteria. We
therefore choose Model 2 as the best model.
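
The AIC and SBC values in the table can be recovered from the Std. Error of the Estimate alone. The sketch below assumes the SSE-based forms AIC = n ln(SSE/n) + 2p and SBC = n ln(SSE/n) + p ln(n), with SSE = SE² (n - p). These forms appear to match the reported values, but other software may use a different definition. The function name aic_sbc is illustrative.

```python
import math


def aic_sbc(std_error, n, p):
    """AIC and SBC from the standard error of the estimate (SSE-based forms)."""
    sse = std_error ** 2 * (n - p)  # residual sum of squares
    base = n * math.log(sse / n)
    return base + 2 * p, base + p * math.log(n)


N_DISTRICTS = 15
# (Std. Error of the Estimate, number of parameters including the constant)
models = {"Model 1": (4.95659, 5), "Model 2": (4.86255, 4)}

results = {name: aic_sbc(se, N_DISTRICTS, p) for name, (se, p) in models.items()}
best_by_aic = min(results, key=lambda name: results[name][0])

for name, (aic, sbc) in results.items():
    print(f"{name}: AIC = {aic:.3f}, SBC = {sbc:.3f}")
```

Both criteria reward a smaller residual sum of squares and penalize extra parameters. SBC penalizes each parameter by ln(15), about 2.71, instead of 2, so it favors the smaller model slightly more strongly.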
BEST REGRESSION MODEL
After the tests above, we can conclude that Model 2 is the best regression model
according to the model selection criteria procedure. The equation of Model 2 is as
follows:
Y = β0 + β1X1 + β2X2 + β3X3
Sales = 178.893 + 3.562 * TV + 2.109 * Flyer − 22.222 * Competitor

Interpretation:
● For each additional 1 million rupiahs of promotion costs through TV, Sales
increases by 3.562 million rupiahs, holding the other variables constant.
● For each additional 1 million rupiahs of promotion costs through flyers, Sales
increases by 2.109 million rupiahs, holding the other variables constant.
● For each additional competitor in the district, Sales decreases by 22.222
million rupiahs, holding the other variables constant.
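
As a usage illustration, the Model 2 equation can be wrapped in a small function and applied to a district. This is a sketch using the rounded coefficients reported above; the function name predict_sales is illustrative, and the example inputs are the first district in the data (TV = 31, Flyer = 5.5, Competitor = 10, observed Sales = 79).

```python
def predict_sales(tv, flyer, competitor):
    """Predicted Sales (million rupiahs) from the Model 2 equation."""
    return 178.893 + 3.562 * tv + 2.109 * flyer - 22.222 * competitor


first_district = predict_sales(tv=31, flyer=5.5, competitor=10)
one_more_competitor = predict_sales(tv=31, flyer=5.5, competitor=11)

print(round(first_district, 2))
print(round(one_more_competitor - first_district, 3))
```

The difference between the two predictions equals the Competitor coefficient, which is what "holding the other variables constant" means in practice. Predictions are only reliable for inputs within the range of the fifteen observed districts.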

Conclusion:
Based on the analysis above, the best regression model obtained with the model
selection criteria procedure is Model 2. The manager of the clothing company can
therefore predict sales using the TV, Flyer, and Competitor variables. This model has
an R² value of 0.997 (Adjusted R² of 0.996), which means that 99.7% of the variation
in the dependent variable can be explained by the independent variables (TV, Flyer,
and Competitor), while the rest is explained by other factors.
