AnBis Gasal 2324 - Sesi 2
Analytics (1)
Course: Business Analytics – Session 2
Odd Semester – 2023/2024
September 2023
Summarized from
Evans, James R. Business Analytics, 3rd Edition.
Pearson Education, 2020.
• Statistical Inference
• Trendlines and Regression Analysis
2
Statistical
Inference
3
Statistical Inference
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.276 4
Hypothesis Testing
Reject the null hypothesis (H0): the sample data provide sufficient statistical evidence to
support the alternative hypothesis.
Fail to reject the null hypothesis (H0): the sample data do not support the alternative hypothesis.
We can only accept the existing theory or belief as valid, but we can never prove it.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.276 5
Steps of Hypothesis-Testing
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.276 - 277 6
One-Sample Hypothesis Tests -
Types
1. H0: population parameter ≥ constant vs H1: population parameter < constant
2. H0: population parameter ≤ constant vs H1: population parameter > constant
3. H0: population parameter = constant vs H1: population parameter ≠ constant
REMEMBER:
We cannot “prove” that H0 is true, we can only fail to reject it.
If we cannot reject the null hypothesis, we have shown only that there is insufficient evidence to
conclude that the alternative hypothesis is true.
Rejecting the null hypothesis provides strong evidence (in a statistical sense) that the null hypothesis
is not true and that the alternative hypothesis is true.
Therefore, what we wish to provide evidence for statistically should be identified as the alternative
hypothesis.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.277 7
One-Sample Hypothesis Tests -
Example 1
CadSoft, a producer of computer-aided design software for the aerospace
industry, receives numerous calls for technical support. In the past, the average
response time has been at least 25 minutes. The company has upgraded its
information systems and believes that this will help reduce response time. As a
result, it believes that the average response time can be reduced to less than 25
minutes. The company collected a sample of 44 response times.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p. 278 8
One-Sample Hypothesis Tests -
Results
1. H0 is actually true and the test correctly fails to reject it.
2. H0 is actually false and the test correctly reaches this conclusion.
3. H0 is actually true BUT the test incorrectly rejects it. (TYPE I ERROR)
4. H0 is actually false BUT the test incorrectly fails to reject it. (TYPE II ERROR)
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.279 10
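The four outcomes listed on this slide can be sketched as a small lookup. This is only an illustration: the function name `classify_outcome` is hypothetical, and in practice the true state of H0 is unknown, which is exactly why the two error types matter.

```python
def classify_outcome(h0_is_true: bool, h0_rejected: bool) -> str:
    """Label a test result given the (normally unknown) truth about H0."""
    if h0_is_true and not h0_rejected:
        return "correct: fail to reject a true H0"
    if not h0_is_true and h0_rejected:
        return "correct: reject a false H0"
    if h0_is_true and h0_rejected:
        return "Type I error"   # rejected a true H0
    return "Type II error"      # failed to reject a false H0


print(classify_outcome(True, True))
print(classify_outcome(False, False))
```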
One-Sample Hypothesis Tests -
Example 2
In the CadSoft example, sample data for 44 customers revealed a mean response
time of 21.91 minutes and a sample standard deviation of 19.49 minutes.
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{21.91 - 25}{19.49 / \sqrt{44}} = \frac{-3.09}{2.938} = -1.05$$
t = −1.05 indicates that the sample mean of 21.91 is 1.05 standard errors below the
hypothesized mean of 25 minutes.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p. 280 11
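The test statistic in Example 2 can be reproduced with a few lines of Python. This is a minimal sketch using only the summary values given on the slide; the function name `one_sample_t` is an assumption made for illustration.

```python
import math


def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t statistic for a one-sample test of the mean (sigma unknown)."""
    standard_error = sample_sd / math.sqrt(n)  # s / sqrt(n)
    return (sample_mean - mu0) / standard_error


# CadSoft summary values from the slide: mean 21.91, s = 19.49, n = 44, mu0 = 25
t_stat = one_sample_t(21.91, 25, 19.49, 44)
print(round(t_stat, 2))  # -1.05
```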
Critical Value and Drawing a
Conclusion
• The conclusion to reject or fail to reject H0 is based on comparing the value of the
test statistic to a “critical value” from the sampling distribution of the test statistic
when the null hypothesis is true and the chosen level of significance, α.
• The sampling distribution of the test statistic is usually the normal distribution or the
t-distribution.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.280 12
Rejection Regions
One-Tailed Tests :
Specify a direction of relationship (where H0 is either ≥ or ≤)
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.281 13
One-Tailed Test - Example 3
For the CadSoft example, if the level of significance is 0.05, then the critical value
t0.05,43 is found in Excel using =T.INV(0.95, 43) = 1.68. Because the t-distribution
is symmetric with a mean of 0 and this is a lower-tailed test, we use the negative
of this number (−1.68) as the critical value. Here n = 44, so df = n − 1 = 43.
Hypothesis :
H0: µ ≥ 25 minutes
H1: µ < 25 minutes
Result :
t = −1.05 does not fall in the rejection region.
Conclusion :
• Fail to reject H0.
• Cannot conclude that the mean response
time has improved to less than 25 minutes.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p. 282 14
Two-Tailed Test
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.282 15
Two-Tailed Test - Example 4
A travel agency collected data in a survey of 34 respondents. Suppose that the travel
agency wanted to target individuals who were approximately 35 years old. Thus, we wish
to test whether the average age of respondents is equal to 35. The sample mean is
38.676, and the sample standard deviation is 7.857.
Hypothesis :
H0: mean age = 35
H1: mean age ≠ 35
Critical value :
Using the Excel function =T.INV.2T(0.05, 33), we obtain t0.025,33 = 2.0345.
Thus, the critical values are ±2.0345.
Test statistic :
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{38.676 - 35}{7.857 / \sqrt{34}} = 2.73$$
Result :
t = 2.73 falls in the rejection region.
Conclusion :
Reject H0 that the average age is 35.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p. 282 16
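The rejection-region logic used in Examples 3 and 4 can be sketched as a single decision function. The function name `reject_h0` and the `tail` labels are assumptions for illustration; the critical values are the ones quoted on the slides.

```python
def reject_h0(t_stat, t_crit, tail):
    """Return True if t_stat falls in the rejection region.

    t_crit is the positive critical value; tail is 'lower', 'upper', or 'two'.
    """
    if tail == "lower":   # H1: parameter < constant
        return t_stat < -t_crit
    if tail == "upper":   # H1: parameter > constant
        return t_stat > t_crit
    if tail == "two":     # H1: parameter != constant
        return abs(t_stat) > t_crit
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


print(reject_h0(-1.05, 1.68, "lower"))  # CadSoft: False, fail to reject H0
print(reject_h0(2.73, 2.0345, "two"))   # Travel agency: True, reject H0
```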
p-Values
Read the textbook about finding p-values when σ is known or unknown.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.284 17
p-Values - Example 5
For the CadSoft example, the p-value is obtained using the Excel function
=T.DIST(-1.05, 43, TRUE) = 0.15.
Result :
p = 0.15 is not less than α = 0.05.
Conclusion :
• Fail to reject H0.
• There is about a 15% chance that the test statistic would be −1.05 or smaller if H0
were true.
For the Vacation example, the p-value is obtained using the Excel function
=T.DIST.2T(2.73, 33) = 0.010.
Result :
p = 0.010 is less than α = 0.05.
Conclusion :
• Reject H0.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p. 282 18
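Outside Excel, the same p-values can be approximated by integrating the t-distribution density numerically. The sketch below uses only the Python standard library; `t_pdf` and `t_cdf` are hypothetical helper names, and Simpson's rule is chosen here for simplicity rather than being the method Excel uses.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# Lower-tailed p-value (CadSoft), analogous to =T.DIST(-1.05, 43, TRUE)
p_lower = t_cdf(-1.05, 43)
# Two-tailed p-value (Vacation), analogous to =T.DIST.2T(2.73, 33)
p_two = 2 * (1 - t_cdf(2.73, 33))
print(round(p_lower, 2), round(p_two, 3))
```

The results should agree with the slide values of about 0.15 and 0.010 up to rounding.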
One Sample Hypothesis Tests -
Excel Template
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p. 287 19
Two-Sample Hypothesis Tests
Hypotheses:
H0: µ1 − µ2 {≥, ≤, or =} 0
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.288 20
Two-Sample Hypothesis Tests –
Example 6 (Independent Samples)
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.290 - 291 21
Two-Sample Hypothesis Tests –
Example 6 (Independent Samples)
Hypothesis :
H0: µ1 − µ2 ≤ 0
H1: µ1 − µ2 > 0
Result :
tstat = 3.827 > tcritical = 1.812, OR
p = 0.00166 < α = 0.05.
Conclusion :
Reject H0.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.291 22
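The slide reports only the final statistic, so the sketch below shows how a two-sample t statistic for independent samples might be computed. It assumes unequal population variances (Welch's form), which the slide does not confirm, and the data are hypothetical toy values, not the textbook data.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """t statistic and approximate df for two independent samples,
    assuming unequal variances (an assumption, not stated on the slide)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # s1^2 / n1
    v2 = variance(sample2) / n2  # s2^2 / n2
    t_stat = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Hypothetical data for illustration only
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, df)
```

The statistic is then compared with the critical value, or converted to a p-value, exactly as in the one-sample case.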
Two-Sample Hypothesis Tests –
Paired Samples
Hypotheses:
H0: µD {≥, ≤, or =} 0
H1: µD {<, >, or ≠} 0
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.290 - 292 23
Two-Sample Hypothesis Tests –
Example 7 (Paired Samples)
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.290 - 293 24
Two-Sample Hypothesis Tests –
Example 7 (Paired Samples)
Hypothesis :
H0: µD = 0
H1: µD ≠ 0
(two-tailed test)
Result :
t is less than the lower critical value, OR
p-value ≈ 0
Conclusion :
Reject H0.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education p.290 - 293 25
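For paired samples, the test reduces to a one-sample t-test on the differences. The following is a minimal sketch with hypothetical before/after values (the slide's data are not reproduced here); `paired_t` is an illustrative name.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic and df for H0: mean difference = 0 on paired data."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical paired observations for illustration only
t_stat, df = paired_t([10, 12, 14, 16], [11, 14, 15, 18])
print(round(t_stat, 3), df)
```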
F-test
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.292 - 293 26
Conducting F-test
• Reject H0 if the F-test statistic is greater than the critical value.
• Note that we use α/2, not α, to find the critical value.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.292 - 293 27
F-test –
Example 8
• Determine whether the variance of lead times is the same for
Alum Sheeting and Durable Products in the Purchase Orders
data.
• The variance of the lead times for Alum Sheeting is larger than
the variance for Durable Products, so Alum Sheeting is assigned to
Variable 1.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.292 - 293 28
F-test –
Example 8
Result :
F < Fcrit, OR
p-value > α/2 = 0.025
Conclusion :
Fail to reject H0.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.292 - 293 29
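The F statistic for comparing two variances is the ratio of the sample variances, with the larger one in the numerator, as described for Example 8. The sketch below uses hypothetical data, not the Purchase Orders data, and `f_statistic` is an illustrative name.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger sample
    variance placed in the numerator, so that F >= 1."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical lead times for illustration only
f_stat, df1, df2 = f_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f_stat, df1, df2)
```

The resulting F is compared with the upper critical value found with α/2, as noted on the earlier slide.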
Analysis of Variance (ANOVA)
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.294 - 296 30
Analysis of Variance (ANOVA) –
Assumptions
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.294 - 296 31
Analysis of Variance (ANOVA) –
Example 9
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.294 - 295 32
Analysis of Variance (ANOVA) –
Example 9
Result :
F > Fcrit, OR
p-value < α = 0.05
Conclusion :
Reject H0.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education p.294 - 295 33
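The F statistic behind a one-way ANOVA table compares the variation between group means with the variation within groups. This is a minimal sketch on hypothetical groups, not the textbook data; `one_way_anova_f` is an illustrative name.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


# Hypothetical groups for illustration only
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df_b, df_w)
```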
Trendlines and
Regression Analysis
34
Using Charts in Modeling
Relationships and Trends in Data
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.313 35
Mathematical Functions Used in
Predictive Analytical Models
The base of natural logarithms, e = 2.71828, is often used for the constant b.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.313 36
Excel Trendline Tool
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.313 37
Mathematical Functions –
Example
A market research study has collected data on sales volumes for different levels of pricing of
a particular product.
The model is :
Sales = 20,512 − (9.5116 × Price)
If the price is $125, we can
estimate the level of sales as :
Sales = 20,512 − (9.5116 × 125)
Sales = 19,323
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.314 - 315 38
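The fitted trendline can be evaluated for any price with a one-line function. The coefficients are the ones on the slide; the function name `predicted_sales` is an assumption for illustration.

```python
def predicted_sales(price, intercept=20512, slope=-9.5116):
    """Evaluate the linear trendline Sales = 20,512 - 9.5116 * Price."""
    return intercept + slope * price


print(round(predicted_sales(125)))  # 19323
```

Predictions are most trustworthy for prices within the range of the data used to fit the line.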
R² (R-squared)
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.314 39
Regression Analysis
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.314 40
Data Classification by the Type of
Measurement Scale
Categorical (Nominal)
• Sorted into categories according to specified characteristics
• The categories bear no quantitative relationship to one another, but we usually
assign an arbitrary number to each category, e.g., 0 or 1, to ease the process of
managing the data and computing statistics.
• Usually counted or expressed as proportions or percentages.
• Example: Employees might be classified as managers, supervisors, and associates.
Ordinal
• Can be ordered or ranked according to some relationship to one another.
• More meaningful than categorical data because data can be compared.
• However, ordinal data have no fixed units of measurement, so we cannot make
meaningful numerical statements about differences between categories.
• Example: College basketball rankings are ordinal; a higher ranking signifies a stronger
team but does not specify any numerical measure of strength.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.146 - 147 41
Data Classification by the Type of
Measurement Scale
Interval
• Ordinal, but with constant differences between observations and
arbitrary zero points.
• Common examples are time and temperature.
• Celsius scales represent a specified measure of distance (degrees)
but have arbitrary zero points. We cannot compute meaningful ratios;
for example, we cannot say that 50 degrees is twice as hot as 25
degrees. However, we can compare differences.
Ratio
• Continuous and have a natural zero point.
• Example: if the Seattle region sold $12 million in March whereas the Tampa
region sold $6 million, then Seattle sold twice as much as Tampa.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.146 - 147 42
Simple Linear Regression
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.317 - 318 43
Simple Linear Regression –
Example
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.317 - 318 44
Simple Linear Regression –
Example
Equation :
Market Value = a + b × Square Feet
Two possible lines are shown below :
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.317 - 318 45
Least-Squares Regression - 1
$$\hat{Y} = b_0 + b_1 X \qquad (8.2)$$
• The estimated value of Y for $X_i$ :
$$\hat{Y}_i = b_0 + b_1 X_i$$
Note: $X_i$ is the value of the independent variable for the ith observation.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.320 - 321 46
Residuals
The observed errors associated with estimating the value of the dependent
variable using the regression line.
$$e_i = Y_i - \hat{Y}_i \qquad (8.3)$$
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.320 - 321 47
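The least-squares line and its residuals can be computed directly from the data. The sketch below uses the standard closed-form estimates for b0 and b1 and the usual definition R² = 1 − SSE/SST; these formulas are not shown on the slides, the data are hypothetical, and the function names are chosen for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for Y-hat = b0 + b1 * X."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Residuals e_i = Y_i - Y-hat_i."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def r_squared(ys, resid):
    """Share of the variation in Y explained by the line: 1 - SSE / SST."""
    y_bar = mean(ys)
    sse = sum(e * e for e in resid)
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data for illustration only
xs, ys = [1, 2, 3], [2, 4, 5]
b0, b1 = fit_line(xs, ys)
resid = residuals(xs, ys, b0, b1)
print(b0, b1, r_squared(ys, resid))
```

With an intercept in the model, the residuals sum to zero up to floating-point error, which is a quick sanity check on the fit.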
Least-Squares Regression - 2
• Excel functions:
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.320 - 321 48
Simple Linear Regression with Excel
– Alt 1
Slope = b1 = 35.036
=SLOPE(C4:C45, B4:B45)
Intercept = b0 = 32,673
=INTERCEPT(C4:C45, B4:B45)
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.322 - 324 49
Simple Linear Regression with Excel
– Alt 2
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.322 - 324 50
Simple Linear Regression with Excel
– Alt 2
Ŷ = b0 + b1X
Ŷ = 32,673 + 35.036X
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.322 - 324 51
Regression Statistics
Multiple R (|r|)
• r is the sample correlation coefficient.
• The value of r varies from −1 to +1.
• r is negative if the slope is negative.
R Square (R²)
• See previous slides.
Adjusted R Square
• Modifies the value of R² by incorporating the sample size and the number of explanatory variables
in the model.
• Useful when comparing this model with other models that include additional explanatory
variables.
Standard Error
• The variability of the observed Y-values from the predicted values (Ŷ).
• If the data are clustered close to the regression line, then the standard error will be small.
• The more scattered the data, the larger the standard error.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.322 - 324 52
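Two of the statistics above have simple closed forms. The sketch below uses the standard textbook definitions, which are not spelled out on the slide: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), and standard error = sqrt(SSE/(n − k − 1)), where n is the sample size and k the number of explanatory variables. The function names and inputs are hypothetical.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def standard_error(resid, k=1):
    """Standard error of the estimate from a list of residuals."""
    sse = sum(e * e for e in resid)
    return math.sqrt(sse / (len(resid) - k - 1))


# Hypothetical values for illustration only
print(adjusted_r2(0.5, 11, 2))
print(standard_error([1, -1, 1, -1]))
```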
Regression as Analysis of Variance
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.324 - 326 53
Regression as Analysis of Variance –
Example
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.324 - 326 54
Regression as Analysis of Variance –
Using t-Test - Example
$$t = \frac{b_1 - 0}{\text{standard error}} \qquad (8.8)$$
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.324 - 326 55
Checking Assumptions
Linearity
• examine the scatter diagram → it should appear linear.
• examine the residual plot → it should appear random.
Normality of Errors
• view a histogram of standard residuals.
• regression is robust to departures from normality.
Homoscedasticity
• variation about the regression line is constant.
• examine the residual plot.
Independence of Errors
• Successive observations should not be related.
• Correlation among successive observations over time is called autocorrelation.
• Important when the independent variable is time.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.327 - 329 56
Checking Assumptions –
Example Home Market Value
Linearity
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.327 - 329 57
Checking Assumptions –
Example Home Market Value
Normality of Errors
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.327 - 329 58
Checking Assumptions –
Example Home Market Value
Homoscedasticity
residual plot shows no serious difference in the spread of the data for different
X values
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.327 - 329 59
Checking Assumptions –
Example Home Market Value
Independence of
Errors
Because the data is cross-sectional, we can assume this assumption holds.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.327 - 329 60
Multiple Linear Regression
A linear regression model with more than one independent variable (X).
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon \qquad (8.10)$$
Where :
Y is the dependent variable,
$X_1, \ldots, X_k$ are the independent (explanatory) variables,
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.329 61
ANOVA for Multiple Regression
• ANOVA tests for significance of the ENTIRE model. That is, it computes
an F-statistic testing the hypotheses:
H0: β1 = β2 = ⋯ = βk = 0
H1: at least one βj is not 0
• Output also provides information to test hypotheses about each of the
individual regression coefficients.
• If we reject H0 that the slope associated with independent variable i is 0,
then the independent variable i (Xi) is significant and improves the
ability of the model to predict the dependent variable.
• If we fail to reject H0, the independent variable is not significant and
probably should not be included in the model.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.330 - 331 62
Multiple Linear Regression –
Example
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.331-333 63
Multiple Linear Regression –
Example
Result and Regression Model
• The value of R2 indicates that 53% of the variation in the dependent variable is
explained by these independent variables.
• All coefficients of independent variables are statistically significant (p-value < 0.05).
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.331-333 64
Systematic Model Building Approach
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.334 - 335 65
Systematic Model Building Approach
- Example
Banking Data
Home value has the largest p-value; drop it and re-run the regression.
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.334 - 335 66
Systematic Model Building Approach
- Example
Banking Data
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.334 - 335 67
Multicollinearity
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.336 68
Multicollinearity ?
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.336 69
Regression with Categorical
Independent Variables
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.336 70
Regression with Categorical
Independent Variables - Example
• Employee Salaries provides data for 35 employees.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$
where
Y = salary
$X_1$ = age
$X_2$ = MBA indicator (0 or 1)
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.338 - 340 71
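A 0/1 indicator variable shifts the intercept of the regression line without changing its slope. The sketch below illustrates this with hypothetical coefficients; the values of `b0`, `b1`, and `b2` are placeholders, not the estimates from the Employee Salaries data.

```python
def predicted_salary(age, has_mba, b0, b1, b2):
    """Evaluate Y-hat = b0 + b1 * age + b2 * MBA, where MBA is 0 or 1."""
    mba = 1 if has_mba else 0
    return b0 + b1 * age + b2 * mba


# Hypothetical coefficients for illustration only
b0, b1, b2 = 1000.0, 1000.0, 15000.0
without_mba = predicted_salary(40, False, b0, b1, b2)
with_mba = predicted_salary(40, True, b0, b1, b2)
print(with_mba - without_mba)  # equals b2 at any age
```

Under this model, b2 is read as the estimated salary difference between employees with and without an MBA at the same age.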
Regression with Categorical
Independent Variables - Example
Source: Evans, James R. 2020. Business Analytics, 3rd Edition. Pearson Education. p.338 - 340 72
THANK YOU