Assignment 2 Completed

Assignment 02: Business Analytics (BUSI 650)

Deadline: Monday, October 13, 2023 (11:59 PM PST)

Submission: Word or PDF

The price ($) of commuting in a country is found to depend on the age of the passenger, the duration of the trip, and the
distance travelled.

Age of the passenger   Duration (in minutes)   Distance (in miles)   Price (in $)
61                     16                      3.2                   22.3
24                     4                       1.5                   12.5
47                     29                      5                     29
32                     33                      5.8                   36.2
23                     14                      2.3                   19.1
82                     30                      6.1                   36.5
57                     56                      12                    66.9
36                     11                      1.9                   17.3
42                     2                       0.8                   8
47                     14                      1.7                   18
29                     15                      2.8                   24.3
27                     15                      2.5                   22.1
19                     45                      4.2                   44.3
45                     19                      3.6                   23
39                     49                      10.2                  61.1
33                     31                      6                     35

Based on the data, please answer the following questions:

1. State dependent (output) and independent variables (input) in the dataset provided
Dependent Variable (Output): Price (in $)
Independent Variables (Inputs): Age of the passenger, Duration (in minutes), Distance (in miles)

2. Conduct a univariate analysis for Price vs Age of the passenger. State the univariate regression function,
calculate error % and calculate model accuracy. Predict the price for age=37.
The regression function based on the output would be:

Price = 21.3917 + 0.2074 × Age

Error % = ((Actual Price − Predicted Price) / Actual Price) × 100%

The R² value is 0.0426816, or 4.27%, so the age of the passenger explains only about 4.27% of the variance in the Price. An R² this
low suggests that the age of the passenger is not a strong predictor of the Price in this model.
Using the regression function, the predicted price for age=37 is: Price = 21.3917 + 0.2074 × 37 ≈ $29.06
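The fit above can be reproduced without a spreadsheet. The following is a minimal sketch in plain Python, assuming ordinary least squares on the 16 rows of the dataset; the helper name `fit_line` is illustrative and not part of the assignment.

```python
# Age (years) and Price ($) columns copied from the dataset table.
ages = [61, 24, 47, 32, 23, 82, 57, 36, 42, 47, 29, 27, 19, 45, 39, 33]
prices = [22.3, 12.5, 29, 36.2, 19.1, 36.5, 66.9, 17.3,
          8, 18, 24.3, 22.1, 44.3, 23, 61.1, 35]


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared). Hypothetical helper for illustration.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation for a single input
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_line(ages, prices)
predicted_37 = intercept + slope * 37

print(f"Price = {intercept:.4f} + {slope:.4f} x Age")
print(f"R squared = {r_squared:.4f}")
print(f"Predicted price at age 37 = ${predicted_37:.2f}")
```

Because the sketch keeps the unrounded coefficients, its prediction should agree with the $29.06 reported above to the cent.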

3. Conduct a univariate analysis for Price vs Duration. State the univariate regression function, calculate error %
and calculate model accuracy. Predict the price for duration=22

Univariate Regression Function: The regression function based on the output would be:

Price = 5.5011 + 1.01197 × Duration
Model Accuracy: The R-squared value represents the model accuracy, which in this case is 0.95556, or 95.56%. This high R-squared
value indicates a strong linear relationship between Duration and Price.

Predict the Price for Duration 22: Using the regression equation provided, the predicted price for a duration of 22 minutes is:
Price = 5.5011 + 1.01197 × 22 = $27.76

The formula for each individual percentage error is:

Predicted Price = Intercept + (Coefficient × Duration)
Absolute Error = |Actual Price - Predicted Price|
Percentage Error = (Absolute Error / Actual Price) × 100%

Duration (in minutes)   Actual Price ($)   Predicted Price ($)   Absolute Error ($)   Percentage Error (%)
16                      22.3               21.693                0.607                2.72
4                       12.5               9.549                 2.951                23.61
29                      29                 34.848                5.848                20.17
33                      36.2               38.896                2.696                7.45
14                      19.1               19.669                0.569                2.98
30                      36.5               35.860                0.640                1.75
56                      66.9               62.171                4.729                7.07
11                      17.3               16.633                0.667                3.86
2                       8                  7.525                 0.475                5.94
14                      18                 19.669                1.669                9.27
15                      24.3               20.681                3.619                14.89
15                      22.1               20.681                1.419                6.42
45                      44.3               51.040                6.740                15.21
19                      23                 24.728                1.728                7.51
49                      61.1               55.087                6.013                9.84
31                      35                 36.872                1.872                5.35

MAPE = Σ(Percentage Error) / Number of Observations


The Mean Absolute Percentage Error (MAPE) for the regression model predicting price based on duration
is approximately 9.00%.
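The percentage-error and MAPE formulas above translate directly into a few lines of Python. This is a sketch that plugs in the rounded coefficients reported for the Duration model (5.5011 and 1.01197); the variable names are illustrative.

```python
# Duration (minutes) and Price ($) columns copied from the dataset table.
durations = [16, 4, 29, 33, 14, 30, 56, 11, 2, 14, 15, 15, 45, 19, 49, 31]
prices = [22.3, 12.5, 29, 36.2, 19.1, 36.5, 66.9, 17.3,
          8, 18, 24.3, 22.1, 44.3, 23, 61.1, 35]

# Coefficients as reported in the regression function for Price vs Duration.
INTERCEPT = 5.5011
SLOPE = 1.01197

# Predicted Price = Intercept + (Coefficient x Duration)
predicted = [INTERCEPT + SLOPE * d for d in durations]

# Percentage Error = (|Actual - Predicted| / Actual) x 100
pct_errors = [abs(a - p) / a * 100 for a, p in zip(prices, predicted)]

# MAPE = sum of percentage errors / number of observations
mape = sum(pct_errors) / len(pct_errors)

for d, a, p, e in zip(durations, prices, predicted, pct_errors):
    print(f"{d:>3} min  actual {a:>5.1f}  predicted {p:>7.3f}  error {e:>6.2f}%")
print(f"MAPE = {mape:.2f}%")
```

Each printed row should line up with the corresponding row of the table, up to rounding in the last digit.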

4. Conduct a univariate analysis for Price vs Distance. State the univariate regression function, calculate error %
and calculate model accuracy. Predict the price for distance=3.9
The regression function based on the output would be:

Price = 7.7636 + 5.0486 × Distance

The R-squared value, which provides the model accuracy, is 0.92205, or 92.21%. This suggests that the model explains 92.21% of the
variance in the Price based on the Distance alone, indicating a strong linear relationship.

Price at 3.9 = 7.7636 + 5.0486 × 3.9 = $27.45

The formula for each individual percentage error is:

Predicted Price = Intercept + (Coefficient × Distance)


Absolute Error = |Actual Price - Predicted Price|
Percentage Error = (Absolute Error / Actual Price) × 100%

Distance (in miles)   Actual Price ($)   Predicted Price ($)   Absolute Error ($)   Percentage Error (%)
3.2                   22.3               23.919                1.619                7.26
1.5                   12.5               15.337                2.837                22.69
5.0                   29.0               33.007                4.007                13.82
5.8                   36.2               37.045                0.845                2.34
2.3                   19.1               19.375                0.275                1.44
6.1                   36.5               38.560                2.060                5.64
12.0                  66.9               68.347                1.447                2.16
1.9                   17.3               17.356                0.056                0.32
0.8                   8.0                11.803                3.803                47.53
1.7                   18.0               16.346                1.654                9.19
2.8                   24.3               21.900                2.400                9.88
2.5                   22.1               20.385                1.715                7.76
4.2                   44.3               28.968                15.332               34.61
3.6                   23.0               25.939                2.939                12.78
10.2                  61.1               59.259                1.841                3.01
6.0                   35.0               38.055                3.055                8.73

MAPE = Σ(Percentage Error) / Number of Observations

The MAPE across all observations is 11.82%.
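With all three single-input models worked through, it can help to see them side by side. The sketch below refits each model from the raw data and ranks the inputs by R²; the `evaluate` helper and the dictionary layout are assumptions made for illustration. Because it uses unrounded coefficients, its figures may differ from the tables in the last decimal place.

```python
# Columns copied from the dataset table.
prices = [22.3, 12.5, 29, 36.2, 19.1, 36.5, 66.9, 17.3,
          8, 18, 24.3, 22.1, 44.3, 23, 61.1, 35]
inputs = {
    "Age": [61, 24, 47, 32, 23, 82, 57, 36, 42, 47, 29, 27, 19, 45, 39, 33],
    "Duration": [16, 4, 29, 33, 14, 30, 56, 11, 2, 14, 15, 15, 45, 19, 49, 31],
    "Distance": [3.2, 1.5, 5, 5.8, 2.3, 6.1, 12, 1.9,
                 0.8, 1.7, 2.8, 2.5, 4.2, 3.6, 10.2, 6],
}


def evaluate(x, y):
    """Fit y = intercept + slope * x by least squares and score the fit.

    Returns a dict with the coefficients, R squared, and MAPE in percent.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    mape = sum(abs(yi - (intercept + slope * xi)) / yi
               for xi, yi in zip(x, y)) / n * 100
    return {"intercept": intercept, "slope": slope,
            "r2": sxy ** 2 / (sxx * syy), "mape": mape}


results = {name: evaluate(x, prices) for name, x in inputs.items()}

# Strongest single predictor first.
ranking = sorted(results, key=lambda name: results[name]["r2"], reverse=True)
for name in ranking:
    r = results[name]
    print(f"{name:<9} Price = {r['intercept']:.4f} + {r['slope']:.4f} x {name}"
          f"   R2 = {r['r2']:.4f}   MAPE = {r['mape']:.2f}%")
```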

5. Conduct a multivariate regression analysis [ Include all inputs/ independent variables. Do not exclude any
inputs even if the p values are not significant (>0.05)]
Take a snapshot of the summary output of regression that shows R square, coefficients for the intercept and
the independent variables

What is the regression function?

Price = Intercept + (Coefficient for Age × Age) + (Coefficient for Duration × Duration) + (Coefficient for Distance × Distance)
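The coefficients in this function come from the regression summary output. As a rough cross-check, the same three-input model can be fitted in plain Python by solving the normal equations (XᵀX)b = Xᵀy. This is a sketch under that assumption; `solve` and `predict` are hypothetical helper names, and a spreadsheet or statistics package would normally be used instead.

```python
# (age, duration, distance, price) rows copied from the dataset table.
rows = [
    (61, 16, 3.2, 22.3), (24, 4, 1.5, 12.5), (47, 29, 5, 29), (32, 33, 5.8, 36.2),
    (23, 14, 2.3, 19.1), (82, 30, 6.1, 36.5), (57, 56, 12, 66.9), (36, 11, 1.9, 17.3),
    (42, 2, 0.8, 8), (47, 14, 1.7, 18), (29, 15, 2.8, 24.3), (27, 15, 2.5, 22.1),
    (19, 45, 4.2, 44.3), (45, 19, 3.6, 23), (39, 49, 10.2, 61.1), (33, 31, 6, 35),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Design matrix with a leading 1 for the intercept.
X = [[1.0, age, duration, distance] for age, duration, distance, _ in rows]
y = [price for _, _, _, price in rows]
k = len(X[0])

xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
coef = solve(xtx, xty)  # [intercept, age, duration, distance]


def predict(age, duration, distance):
    """Price = Intercept + b_age*Age + b_duration*Duration + b_distance*Distance."""
    return coef[0] + coef[1] * age + coef[2] * duration + coef[3] * distance


fitted = [predict(age, duration, distance) for age, duration, distance, _ in rows]
mean_y = sum(y) / len(y)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print("Intercept, Age, Duration, Distance:", [round(c, 4) for c in coef])
print(f"R squared = {r_squared:.4f}")
```

The same `predict` helper can score any input row, such as the one requested in question 8.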

6. From the summary output above, was there any independent variable (input) that was not statistically
significant (p value > 0.05)? If so, which one?
The independent variable that was not statistically significant (p-value > 0.05) is the "Age of the
passenger". The p-value for age is 0.282423278, which is greater than the usual significance level of 0.05.
7. Remove the statistically insignificant input (p value > 0.05) from the dataset and rerun the regression again.
4.1. If you are using Word file, take a snapshot of the summary output of regression that
shows R square, coefficients for the intercept and the variables.

4.2. What is the regression function?


Price = Intercept + (Coefficient for Duration × Duration) + (Coefficient for Distance × Distance)

4.3. Conduct model evaluation (Hint: Predicted vs Actual, Absolute of Predicted vs Actual
and Variance)

Actual Value   Predicted Value   Absolute Difference   Residual
22.3           20.871            1.429                 1.429
12.5           11.905            0.595                 0.595
29             29.285            0.285                 -0.285
36.2           37.716            1.516                 -1.516
19.1           18.509            0.591                 0.591
36.5           39.006            2.506                 -2.506
66.9           66.233            0.667                 0.667
17.3           17.515            0.215                 -0.215
8              6.558             1.442                 1.442
18             16.374            1.626                 1.626
24.3           21.981            2.319                 2.319
22.1           21.189            0.911                 0.911
44.3           45.597            1.297                 -1.297
23             23.605            0.605                 -0.605
61.1           58.763            2.337                 2.337
35             38.722            3.722                 -3.722
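The hint for this step mentions predicted vs actual, absolute differences, and variance. One way to summarize the table is sketched below. It takes the Actual and Predicted columns exactly as listed and does not refit the model; reading "variance" as the sample variance of the residuals is an assumption.

```python
import statistics

# Actual and Predicted columns copied from the evaluation table.
actual = [22.3, 12.5, 29, 36.2, 19.1, 36.5, 66.9, 17.3,
          8, 18, 24.3, 22.1, 44.3, 23, 61.1, 35]
predicted = [20.871, 11.905, 29.285, 37.716, 18.509, 39.006, 66.233, 17.515,
             6.558, 16.374, 21.981, 21.189, 45.597, 23.605, 58.763, 38.722]

# Residual = Actual - Predicted (signed); Absolute Difference = |Residual|.
residuals = [a - p for a, p in zip(actual, predicted)]
abs_diff = [abs(r) for r in residuals]

mae = sum(abs_diff) / len(abs_diff)                  # mean absolute error
mean_residual = statistics.mean(residuals)           # average signed error
residual_variance = statistics.variance(residuals)   # sample variance (n - 1)

print(f"Mean absolute error   = {mae:.3f}")
print(f"Largest absolute diff = {max(abs_diff):.3f}")
print(f"Mean residual         = {mean_residual:.3f}")
print(f"Residual variance     = {residual_variance:.3f}")
```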

8. From your multivariate analysis, calculate the predicted value for the following inputs (Age = 30, distance
= 15 and duration = 4)
