
Chapter 8: (Part II)
8.4 OTHER CONTINUOUS DISTRIBUTIONS
Student t Distribution (σ is unknown)
The Student t distribution was first derived by William Gosset (1876–1937). Gosset published his findings in 1908 under the pseudonym “Student” and used the letter t to represent the random variable; hence the name Student t distribution (also called the Student's t distribution).
t = (x̄ – µ) / (s / √n),   or equivalently   t = ((x̄ – µ) / s) · √n
Definition: The number of degrees of freedom for a data set corresponds to the number of scores that
can vary after certain restrictions have been imposed on all scores.
For the applications of this section, the number of degrees of freedom is simply the sample size minus 1:
Degrees of freedom = ν (nu) = n – 1

Important Properties of the Student t Distribution


1. The Student t distribution is different for different sample sizes.
2. The Student t distribution has the same general symmetric bell shape as the standard normal
distribution, but it reflects the greater variability that is expected with small samples.
3. The Student t distribution has a mean of t = 0.
4. The standard deviation of the Student t distribution varies with the sample size, but it is always greater than 1.
5. As the sample size n gets larger, the Student t distribution gets closer to the standard normal distribution.

Critical t Values
1. Critical values are found in Table 8.2 (or Appendix B, Table 4).
2. Degrees of freedom = ν = n – 1
3. After finding the number of degrees of freedom, refer to Table 8.2 and locate that number in the
column at the left. With a particular row of t values now identified, select the critical t value that
corresponds to the appropriate column heading (the area in the tail(s)).
If a critical t value is located in the left tail, be sure to make it negative.

Table 8.2 Critical Values of t (Table 4 in Appendix B)


---♦---

Determining Student t Values


(Note: INWARD method)

Example: Using the t table (Table 4) to find the value of t,


if given a sample size n = 11, and the area in the right tail under the Student t curve is 0.05

Solution: Degrees of freedom = ν = n – 1 = 11 – 1 = 10, top 5% (α = 0.05 in the right tail)

[Figure: t curve with area 0.95 to the left of the critical value and α = 0.05 in the right tail]

Hence, t = +1.812
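The table lookup can also be checked in software. Below is a minimal Python sketch (not part of the original notes; it assumes SciPy is installed) that reproduces this critical value using the t distribution's inverse CDF.

# Minimal sketch (assumes SciPy is available): checking the table lookup.
from scipy.stats import t

nu = 10          # degrees of freedom: n - 1 = 11 - 1
alpha = 0.05     # area in the right tail

# ppf returns the value with the given area to its LEFT, so use 1 - alpha
print(round(t.ppf(1 - alpha, nu), 3))   # 1.812; the left-tail critical value is just its negative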
---♦---

Example: Using the t table (Table 4) to find the value of t,


if given a sample size n = 11, and the bottom (left) 5%
Solution: Degrees of freedom = ν = n – 1 = 11 – 1 = 10, lowest 5% (α = 0.05 in the left tail)

[Figure: t curve with area 0.95 to the right of the critical value and α = 0.05 in the left tail]

Hence, t = –1.812
(Note: The t value is negative because the area lies in the left tail.)

Example: Using the t table (Table 4) to find the value of t,


if given a sample size n = 33, and the lowest 1%

Solution: Degrees of freedom = ν = n – 1 = 33 – 1 = 32, lowest 1% (α = 0.01 in the left tail)


Note: Because 32 degrees of freedom is not listed, we find the closest number of degrees of freedom,
which is 30.
[Figure: t curve with area 0.99 to the right of the critical value and α = 0.01 in the left tail]

Hence, t = –2.475
---♦---

Example:
t.05, 10 = 1.812, t.05, 25 = 1.708, t.05, 72 ≈ t.05, 70 = 1.667, and the corresponding left-tail value for ν = 10 is –t.05, 10 = –1.812

Example:
t.10, ∞ = 1.282, t.05, ∞ = 1.645, t.025, ∞ = 1.960, t.01, ∞ = 2.326, t.005, ∞ = 2.576 (these match the standard normal critical values, since the t distribution approaches Z as ν → ∞)
---♦---

Exercise: Questions 8.94, 8.96


The Student t Distribution

Degrees of Freedom     t.100     t.050     t.025     t.010     t.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750

35 1.306 1.690 2.030 2.438 2.724
40 1.303 1.684 2.021 2.423 2.704
45 1.301 1.679 2.014 2.412 2.690
50 1.299 1.676 2.009 2.403 2.678
55 1.297 1.673 2.004 2.396 2.668
60 1.296 1.671 2.000 2.390 2.660
65 1.295 1.669 1.997 2.385 2.654
70 1.294 1.667 1.994 2.381 2.648
75 1.293 1.665 1.992 2.377 2.643
80 1.292 1.664 1.990 2.374 2.639
85 1.292 1.663 1.988 2.371 2.635
90 1.291 1.662 1.987 2.368 2.632
95 1.291 1.661 1.985 2.366 2.629
100 1.290 1.660 1.984 2.364 2.626
110 1.289 1.659 1.982 2.361 2.621
120 1.289 1.658 1.980 2.358 2.617
130 1.288 1.657 1.978 2.355 2.614
140 1.288 1.656 1.977 2.353 2.611
150 1.287 1.655 1.976 2.351 2.609
160 1.287 1.654 1.975 2.350 2.607
170 1.287 1.654 1.974 2.348 2.605
180 1.286 1.653 1.973 2.347 2.603
190 1.286 1.653 1.973 2.346 2.602
200 1.286 1.653 1.972 2.345 2.601
∞ 1.282 1.645 1.960 2.326 2.576

Chapter 12: (Part I)
INFERENCE ABOUT A POPULATION
12.1 INFERENCE ABOUT A POPULATION MEAN
WHEN THE STANDARD DEVIATION σ IS UNKNOWN

Student t Distribution: The Student t distribution was first derived by William Gosset (1876–1937). Gosset published his findings in 1908 under the pseudonym “Student” and used the letter t to represent the random variable; hence the name Student t distribution (also called the Student's t distribution).

Example: Using the t table (Table 4) to find the value of t,


if given a sample size n = 11 and a 95% confidence level
Solution: Degrees of freedom = ν = n – 1 = 11 – 1 = 10,
95% confidence level (α = 0.05 split between two tails → each tail has α/2 = 0.05/2 = 0.025)

[Figure: t curve with area 0.95 between –2.228 and +2.228 and α/2 = 0.025 in each tail]

Hence, with ν = 10 and 95% confidence, t = ±2.228


---♦---

Example: Using the t table (Table 4) to find the value of t,


if given a sample size n = 33 and a 95% confidence level.

Solution: Degrees of freedom = ν = n – 1 = 33 – 1 = 32 (32 is not listed in the table, so use the closest listed value, ν = 30),
95% confidence level (α = 0.05 split between two tails → each tail has α/2 = 0.05/2 = 0.025)

[Figure: t curve with area 0.95 between –2.042 and +2.042 and α/2 = 0.025 in each tail]

Hence, with ν = 30 and 95% confidence, t = ±2.042


---♦---

CONFIDENCE INTERVAL ESTIMATOR of µ


( σ is unknown → Student t distribution)

Margin of Error for the Estimate of µ


E = tα/2 · s/√n, where tα/2 has n – 1 degrees of freedom

Confidence Interval for the Estimate of µ: x̄ – E < µ < x̄ + E, where E = tα/2 · s/√n.
An equivalent form for the confidence interval is (x̄ – E, x̄ + E). The values x̄ – E and x̄ + E are called the lower and upper confidence limits:
LCL = x̄ – tα/2 · s/√n, and UCL = x̄ + tα/2 · s/√n
---♦---


The following three steps are expected when constructing a confidence interval.
Step 1. Find the critical value tα/2 (σ is unknown and s is known) corresponding to the given confidence level (%).
Step 2. Find E = tα/2 · s/√n (s is given).
Step 3. Construct the confidence interval: x − E < µ < x + E
---♦---

Example 12.2:
Given s = 4499, sample size n = 192, x̄ = 1,829,247 / 192 = 9527. Construct a 95% confidence interval.
Solution:
Step 1: Degrees of freedom = ν = n – 1 = 192 – 1 = 191. (Because 191 degrees of freedom is not listed, we use the closest listed value, ν = 190.)
95% confidence level (α = 0.05 split between two tails → each tail has α/2 = 0.05/2 = 0.025)

Hence, with ν = 190 and 95% confidence, t = ±1.973

Step 2: E = tα/2 · s/√n = 1.973 × 4499/√192 ≈ 641

Step 3: LCL = x̄ – E = 9527 – 641 = 8886
UCL = x̄ + E = 9527 + 641 = 10168
The 95% confidence interval estimate of µ is $8,886 < µ < $10,168.
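As a cross-check, here is a minimal Python sketch (not part of the notes; SciPy assumed) that reproduces Example 12.2. It uses the exact degrees of freedom (191), so the limits can differ from the table-based answer by a dollar or two of rounding.

# Minimal sketch (assumes SciPy): 95% confidence interval for Example 12.2.
from math import sqrt
from scipy.stats import t

n, xbar, s = 192, 9527, 4499
alpha = 0.05

t_crit = t.ppf(1 - alpha / 2, df=n - 1)   # exact df = 191 (the table rounds down to 190)
E = t_crit * s / sqrt(n)                  # margin of error, about 641

print(round(E), round(xbar - E), round(xbar + E))   # roughly 641, 8886 and 10168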
---♦---

Exercise: 12.2, 12.4, 12.6, 12.8


Chapter 13:
INFERENCE ABOUT COMPARING TWO POPULATIONS
In a two-sample hypothesis-testing problem, the underlying parameters of two different populations,
{neither of whose values is assumed known}, are compared.

Two samples are said to be independent when the data points in one sample are unrelated to the data
points in the second sample.
If the values in one sample are related to the values of the other sample, the samples are dependent. Such
samples are often referred to as matched pairs or paired samples.
---♦---
Sampling Distribution of x1 – x2
The standard error is σ(x̄1 – x̄2) = √( σ1²/n1 + σ2²/n2 )
---♦---

Components of a Formal Hypothesis Test

 The null hypothesis (denoted by H 0 ) is a statement about the value of a population parameter (such as
the mean µ), and it must contain the condition of equality and must be written with the symbol =, ≤ ,
or ≥ . For the mean, the null hypothesis will be stated in one of these three forms:
H0: µ1 – µ2 = 0 H0: µ1 – µ2 ≤ 0 H0: µ1 – µ2 ≥ 0
We test the null hypothesis directly in the sense that we assume it is true and reach a conclusion
to either reject H 0 or fail to reject H 0 .

 The alternative hypothesis (denoted by H1 or Ha) is the statement that must be true if the null hypothesis
is false. For the mean, the alternative hypothesis will be stated in only one of these three possible forms:
Ha: µ1 – µ2 ≠ 0 Ha: µ1 – µ2 > 0 Ha: µ1 – µ2 < 0
(two tailed test) (right tailed test) (left tailed test)
Note that H1 (or Ha) is the opposite of H 0 .
Note:
1. There are 2 possible decisions:
a. Conclude that there is enough evidence to support the alternative hypothesis Ha.
Reject the null hypothesis Ho
b. Conclude that there is not enough evidence to support the alternative hypothesis Ha.
Do not reject the null hypothesis Ho
(accept, support)

2. Two possible errors can be made in a test.


a. Type I error: The mistake of rejecting a true null hypothesis.
b. Type II error: The mistake of failing to reject a false null hypothesis.


Describing the p-Value (Excel)


If the p-value ≤ significance level α, there is strong evidence to infer that the alternative hypothesis Ha is
true. The result is deemed to be significant. → Reject the null hypothesis H0.

When the p-value > significance level α, there is not enough evidence to infer that the alternative hypothesis Ha is
true. We also say that the test is not significant. → Do not reject the null hypothesis H0.
(accept, support)
---♦---

ONE TAILED or TWO TAILED TEST?


Two Tailed Test:
same, equal, no difference, = , … Ho
Difference, not the same, not equal, ≠ , … Ha
---♦---

Example: There is no difference in mean credit card debts of household in Burlington and Hamilton.

So, the null and alternative hypotheses are

Ho: µB = µH (claim) or, Ho : µ B – µ H = 0 or, Ho : µ D = 0


Ha : µ B ≠ µ H Ha : µ B – µ H ≠ 0 Ha : µ D ≠ 0

[Figure: two-tailed test, with rejection regions in both tails and the acceptance region in the middle]
---♦---

Example: … Can we infer that police officers and security officers differ in their vacation expenses?

So, the null and alternative hypotheses are

Ho : µ P = µ S or, Ho : µ P – µ S = 0 or, Ho : µ D = 0
Ha: µP ≠ µS (claim) Ha : µ P – µ S ≠ 0 Ha : µ D ≠ 0


One Tailed Test:


If the statement (claim) states: Better, exceeds, improved, or lower, … → one tailed test problem.

Right Tailed Test:


at most, no more than, ≤ ,… Ho
More than, higher, or increase, > , … Ha
---♦---
Example: A real estate agency says that the number of home sales has dropped the most since the recession, as new rules put the brakes on the market. Can we conclude at the 5% significance level that the number of houses sold this year decreased on average? (Note: last year's sales exceed this year's.)

Ho: µ last year ≤ µ this year or Ho: µ last year – µ this year ≤ 0 or, Ho : µ d ≤ 0
Ha: µlast year > µthis year (claim) Ha: µ last year – µ this year > 0 Ha: µd > 0

[Figure: right-tailed test, with the rejection region in the right tail and the acceptance region to its left]
---♦---

Example: Can the company infer that the new tire will last longer on average than the existing tire?

Ho: µnew ≤ µexisting or, Ho: µnew – µexisting ≤ 0 or, Ho : µ d ≤ 0
Ha: µnew > µexisting Ha: µnew – µexisting > 0 Ha : µ d > 0

---♦---

Example: The store manager would like to know if the mean checkout time using the standard method is
longer than using the U-Scan?

Ho : µ S ≤ µ U or Ho : µ S – µ U ≤ 0 or, Ho : µ d ≤ 0
Ha: µS > µU (claim) Ha : µ S – µ U > 0 Ha : µ d > 0


Left Tailed Test:


at least, no less than, ≥ , … Ho
Less than, lower, cheating or reduce, < , … Ha
---♦---

Example: A real estate agency says that the number of home sales has dropped the most since the recession, as new rules put the brakes on the market. Can we conclude at the 5% significance level that the number of houses sold this year decreased on average? (Note: this year's sales are less than last year's.)

Ho: µ this year ≥ µ last year or Ho: µ this year – µ last year ≥ 0 or, Ho : µ d ≥ 0
Ha: µthis year < µlast year (claim) Ha: µ this year – µ last year < 0 Ha: µd < 0

[Figure: left-tailed test, with the rejection region in the left tail and the acceptance region to its right]

---♦---
Note: We can change a right-tailed test problem into a left-tailed test problem by changing the setup.
---♦---

Example: Can the company infer that the existing tire will last a shorter time on average than the new tire?

Ho: µexisting ≥ µnew or, Ho: µexisting – µnew ≥ 0 or, Ho: µd ≥ 0
Ha: µexisting < µnew Ha: µexisting – µnew < 0 Ha : µ d < 0

---♦---

Example: A manufacturer claims that the watt usage of its 17-inch flat panel monitors is less than that of its leading competitor. At α = 0.10, is there enough evidence to support the manufacturer's claim?

The claim is “the watt usage of manufacturer’s 17-inch flat monitors is less than that of its leading competitor.”

Ho : µ m ≥ µ c or, Ho : µ m – µ c ≥ 0 or, Ho : µ d ≥ 0
Ha: µm < µc (claim) Ha : µ m – µ c < 0 Ha : µ d < 0


13.1 INFERENCE ABOUT THE DIFFERENCE BETWEEN TWO MEANS:


INDEPENDENT SAMPLES

Hypothesis Testing of Two Populations (σ is known → Z distribution)


The following four steps are expected when carrying out a hypothesis test.

1. Step 1:
H0: μ1 – μ2 = 0 or , H0: μ1 – μ2 ≤ 0 or, H0: μ1 – μ2 ≥ 0
Ha: μ1 – μ2 ≠ 0 (two tailed test). Ha: μ1 – μ2 > 0 (right tailed test). Ha: μ1 – μ2 < 0 (left tailed test).

2. Step 2: Bell curve figure (two tails or one tail) and Table 3 (find Z critical from the Z Table).

3. Step 3: Z test statistic:

Z = [(x̄1 – x̄2) – (µ1 – µ2)] / √( σ1²/n1 + σ2²/n2 ),   Note: (µ1 – µ2) = 0 under H0

4. Step 4: Comparison and Conclusion


Reject the null hypothesis H0 if the test statistic is in the critical region.
Do not reject the null hypothesis H0 if the test statistic is in the accepted region.

(and/or using p-value to make a conclusion)


---♦---

Example:
Samples of size 6 were drawn independently from two normal populations. These data are listed below.
Test to determine whether the means of the two populations differ. (Use α = 0.01)

Sample   Sample mean x̄   Population standard deviation σ   Sample size n
1        7.83            3.06                              6
2        8.50            2.95                              6

Solution: Since σ is known → Z distribution


1. Step 1: H0: µ1 = µ2 ,or H0: µ1 – µ2 = 0 (claim)
Ha: µ1 ≠ µ2 (two tailed test) Ha: µ1 – µ2 ≠ 0 (two tailed test)

2. Step 2: Given 2 tails α = 0.01 = 1%


(Using the Normal Distribution Table → Find the Z value)


[Figure: standard normal curve with two-tailed rejection regions beyond Z = –2.58 and Z = +2.58]

3. Step 3: Test statistic:

Z = [(x̄1 – x̄2) – (µ1 – µ2)] / √( σ1²/n1 + σ2²/n2 ) = [(7.83 – 8.50) – 0] / √( 3.06²/6 + 2.95²/6 ) = –0.67 / √3.01 = –0.67 / 1.7349 = –0.386

4. Step 4: Comparison and Conclusion


Because the test statistic Z = –0.386 falls in the accepted region,
we do not reject (accept) the null hypothesis H0: µ1 = µ2.
---♦---
or:
1. Step 1: H0: µ1 = µ2 or, H0: µ1 – µ2 = 0
Ha: µ1 ≠ µ2 (two tailed test) Ha: µ1 – µ2 ≠ 0 (two tailed test)

2. Steps 2 and 3: Z-test (Excel):


z-Test: Two Sample for Means

                                   Sample 1      Sample 2
Mean 7.833333 8.5
Known Variance 9.3636 8.7025
Observations 6 6
Hypothesized Mean Difference 0
z -0.3842
P(Z<=z) one-tail 0.350417
z Critical one-tail 2.326348
P(Z<=z) two-tail 0.700834
z Critical two-tail 2.575829

3. Step 4: Comparison and Conclusion


Because |Z| = 0.384 is less than the two-tail critical value 2.576
(the test statistic falls in the accepted region),
we do not reject (accept) the null hypothesis H0: µ1 = µ2.

Or, using the p-value (from Excel): p-value = 0.700834 > α = 0.01, so we do not reject H0: µ1 = µ2.
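For reference, a minimal Python sketch (not part of the notes; SciPy assumed) that reproduces the same Z test from the summary statistics; small differences from the Excel output come from rounding the sample means.

# Minimal sketch (assumes SciPy): two-sample Z test with known sigmas.
from math import sqrt
from scipy.stats import norm

x1, x2 = 7.83, 8.50          # sample means
sig1, sig2 = 3.06, 2.95      # known population standard deviations
n1, n2 = 6, 6
alpha = 0.01

z = (x1 - x2) / sqrt(sig1**2 / n1 + sig2**2 / n2)   # about -0.386
z_crit = norm.ppf(1 - alpha / 2)                    # about 2.576
p_two_tail = 2 * norm.sf(abs(z))                    # about 0.70

# |z| < z_crit and p > alpha, so do not reject H0
print(round(z, 3), round(z_crit, 3), round(p_two_tail, 3))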


Hypothesis Testing of Two Populations (σ is unknown → student t distribution)

Test statistic for μ1 – μ2 when Equal Variances are assumed (s1² = s2²), (n1 = n2 → same sample sizes)

1). Step 1:
H0: μ1 – μ2 = 0 or, H0: μ1 – μ2 ≤ 0 or, H0: μ1 – μ2 ≥ 0
Ha: μ1 – μ2 ≠ 0 (two tailed test). Ha: μ1 – μ2 > 0 (right tailed test). Ha: μ1 – μ2 < 0 (left tailed test).

2). Steps 2 and 3: Excel


(Data → Data Analysis → t-test: Two sample Assuming Equal Variances)

Degrees of freedom: ν = n1 + n2 – 2

Pooled variance estimator: sp² = [ (n1 – 1)s1² + (n2 – 1)s2² ] / (n1 + n2 – 2)

Test statistic: t = [(x̄1 – x̄2) – (µ1 – µ2)] / √( sp² (1/n1 + 1/n2) ),   Note: (µ1 – µ2) = 0 under H0

3). Step 4: Comparison and Conclusion


If the p-value ≤ significance level α, → Reject the null hypothesis H0.
When the p-value > significance level α, → Do not reject the null hypothesis H0.
(accept)
---♦---

Step by Step Illustration: Equal Variances


INDEPENDENT GROUPS, ONE-TAILED TEST

Example 13.1: Direct and Broker-Purchased Mutual Funds


Can investors do better by buying mutual funds directly than by purchasing mutual funds through a broker? To help answer this question, a group of researchers randomly sampled directly purchased and broker-purchased mutual funds and recorded the net annual returns, which are the returns on investment after deducting all relevant fees.

                  Direct         Broker
Sample size       n1 = 50        n2 = 50
Sample mean       x̄1 = 6.63      x̄2 = 3.72
Sample variance   s1² = 37.49    s2² = 43.34

Can we conclude at the 5% significance level that directly purchased mutual funds outperformed funds
bought through brokers?


Solution: Since σ is unknown → t distribution


1. Step 1: H0: μD ≤ μB or, H0: μD – μB ≤ 0
Ha: μD > μB (right tail test). Ha: μD – μB > 0 (right tail test).

2. Steps 2 and 3: Excel

3. Step 4: Comparison and Conclusion


From Excel, the P(T ≤ t ) one-tail = 0.0122 < α = 0.05, We reject the null hypothesis H0: μD ≤ μB.
(or, we support the alternative hypothesis Ha: μD > μB.)
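The same test can be reproduced from the summary statistics alone with the pooled-variance formulas above. A minimal Python sketch (not part of the notes; SciPy assumed):

# Minimal sketch (assumes SciPy): pooled-variance t test for Example 13.1.
from math import sqrt
from scipy.stats import t

n1, n2 = 50, 50
x1, x2 = 6.63, 3.72          # sample mean returns: direct, broker
s1_sq, s2_sq = 37.49, 43.34  # sample variances

df = n1 + n2 - 2
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df     # pooled variance, about 40.4
t_stat = (x1 - x2) / sqrt(sp_sq * (1 / n1 + 1 / n2))   # about 2.29
p_one_tail = t.sf(t_stat, df)                          # about 0.012 < 0.05, reject H0

print(round(t_stat, 2), round(p_one_tail, 4))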
---♦---

Test statistic for μ1 – μ2 when Unequal Variances are assumed (s1² ≠ s2²), (n1 ≠ n2 → different sample sizes)
1). Step 1:
H0: μ1 – μ2 = 0 or, H0: μ1 – μ2 ≤ 0 or, H0: μ1 – μ2 ≥ 0
Ha: μ1 – μ2 ≠ 0 (two tailed test). Ha: μ1 – μ2 > 0 (right tailed test). Ha: μ1 – μ2 < 0 (left tailed test).

2). Steps 2 and 3: Excel


(Data → Data Analysis → t-test: Two sample Assuming Unequal Variances)

Degrees of freedom: ν = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)² / (n1 – 1) + (s2²/n2)² / (n2 – 1) ]

Test statistic: t = [(x̄1 – x̄2) – (µ1 – µ2)] / √( s1²/n1 + s2²/n2 ),   Note: (µ1 – µ2) = 0 under H0

3). Step 4: Comparison and Conclusion


If the p-value ≤ significance level α, → Reject the null hypothesis H0.
When the p-value > significance level α, → Do not reject the null hypothesis H0.
(accept, support)


Step by Step Illustration: Unequal Variances


INDEPENDENT GROUPS, TWO-TAILED TEST

Example: Effect of New CEO in Family Run Business


Do the data allow us to infer that the effect of making an offspring CEO is different from the effect of
hiring an outsider as CEO?

                  1) Offspring    2) Outsider
Sample size       n1 = 42         n2 = 98
Sample mean       x̄1 = –0.10      x̄2 = 1.24
Sample variance   s1² = 3.79      s2² = 8.03
Test the hypothesis at α = 5% significance level.
---♦---

Solution: Since σ is unknown → t distribution


1. Step 1: H0: μ1 = μ2 or, H0: (μ1 – μ2) = 0
Ha: μ1 ≠ μ2 (two tailed test). Ha: (μ1 – μ2) ≠ 0 (two tailed test).

2. Steps 2 and 3: Excel

3. Step 4: Comparison and Conclusion


From Excel, the P(T ≤ t ) two-tail = 0.0017 < α = 0.05, We reject the null hypothesis H0: μ1 = μ2.
(or, we do not reject the alternative hypothesis Ha: μ1 ≠ μ2.)

Conclusion: There is sufficient evidence to infer that the mean changes in operating income differ.
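The Excel result can be reproduced from the summary statistics with the unequal-variances (Welch) formulas above. A minimal Python sketch (not part of the notes; SciPy assumed):

# Minimal sketch (assumes SciPy): unequal-variances (Welch) t test from summary statistics.
from math import sqrt
from scipy.stats import t

n1, x1, s1_sq = 42, -0.10, 3.79    # offspring CEO group
n2, x2, s2_sq = 98, 1.24, 8.03     # outsider CEO group

a, b = s1_sq / n1, s2_sq / n2
t_stat = (x1 - x2) / sqrt(a + b)                              # about -3.23
df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))   # about 111
p_two_tail = 2 * t.sf(abs(t_stat), df)                        # about 0.0017 < 0.05, reject H0

print(round(t_stat, 2), round(df, 1), round(p_two_tail, 4))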
---♦---


13.3 INFERENCE ABOUT THE DIFFERENCE BETWEEN TWO MEANS:


MATCHED PAIRS EXPERIMENT (DEPENDENT SAMPLES)

x̄D = d̄ = mean value of the differences D for the dependent (paired) samples:

d̄ = ΣD / n = (D1 + D2 + … + Dn) / n
where D1 = x(before, 1) – x(after, 1), D2 = x(before, 2) – x(after, 2), …
---♦---

or, x̄D = d̄ = x̄before – x̄after = (Σ xbefore)/n – (Σ xafter)/n
---♦---

sD = standard deviation of the differences for the paired sample data:  sD = √[ Σ(d – d̄)² / (n – 1) ]
---♦---

Test statistic for μD (matched pairs) → dependent samples → n1 = n2 (same sample sizes)
The mean of the population of differences μD = μ1 – μ2
------
• Pre-step: Identify the statistics relevant to this test, such as the sample size n; calculate the difference for each pair (by subtraction); and calculate the sample mean of the differences d̄ = x̄D = x̄1 – x̄2 and the standard deviation of the differences sD = √[ Σ(d – d̄)² / (n – 1) ].

1. Step 1:
H0: μD = 0 or, H0: μd ≤ 0 or, H0: μD ≥ 0
Ha: μD ≠ 0 (two tailed test). Ha: μd > 0 (right tailed test). Ha: μD < 0 (left tailed test).

2. Step 2: Determine number degrees of freedom ν = nD – 1 and rejection regions.


and then, find the t-critical value (from t-Table).

3. Step 3: Test statistic:

t = (x̄D – µD) / (sD / √nD),   Note: µD = 0 under H0

4. Step 4: Comparison and Conclusion


Reject the null hypothesis H0 if the test statistic is in the critical region.
Do not reject the null hypothesis H0 if the test statistic is in the accepted region.
---♦---


Step by Step Illustration:


DEPENDENT GROUPS, ONE-TAILED TEST
Example: The following sample information shows the number of defective units produced on the morning
shift and the afternoon shift for a sample of four days last week.

Day      Morning shift   Afternoon shift   d            d – d̄           (d – d̄)²
1        10              8
2        12              9
3        15              12
4        11              15
Totals                                     Σd = ?       Σ(d – d̄) = 0    Σ(d – d̄)² = ?

At the 0.05 significance level, can we conclude that fewer defects are produced on the afternoon shift?
(Or, equivalently, can we conclude that more defects are produced on the morning shift?)

Solution:
Pre-step: First, we have to calculate the difference between the “morning shift” and “afternoon shift”, and
the total of the differences.
Day      Morning shift   Afternoon shift   d              d – d̄           (d – d̄)²
1        10              8                 10 – 8 = 2
2        12              9                 12 – 9 = 3
3        15              12                15 – 12 = 3
4        11              15                11 – 15 = –4
Totals                                     Σd = 4         Σ(d – d̄) = 0    Σ(d – d̄)² = ?

Calculate the average of the differences: d̄ = Σd / n = 4 / 4 = 1
And then, continue the table:

Day      Morning shift   Afternoon shift   d              d – d̄           (d – d̄)²
1        10              8                 10 – 8 = 2     2 – 1 = 1       (1)² = 1
2        12              9                 12 – 9 = 3     3 – 1 = 2       (2)² = 4
3        15              12                15 – 12 = 3    3 – 1 = 2       (2)² = 4
4        11              15                11 – 15 = –4   (–4) – 1 = –5   (–5)² = 25
Totals                                     Σd = 4         Σ(d – d̄) = 0    Σ(d – d̄)² = 34

Calculate the standard deviation of the differences: sD = √[ Σ(d – d̄)² / (n – 1) ] = √(34 / 3) = √11.3333 = 3.3665


Since σ is unknown → t distribution


1. Step 1: H0: µM ≤ µA or, H0: µD ≤ 0
Ha: µM > µA (right tailed test) Ha: µD > 0 (right tailed test)

2. Step 2: right tails α = 0.05 = 5%


Degrees of freedom = ν = nD – 1 = 4 – 1 = 3

Degrees of Freedom     t.100     t.050     t.025     t.010     t.005
1                      3.078     6.314     12.706    31.821    63.657
2                      1.886     2.920     4.303     6.965     9.925
3                      1.638     2.353     3.182     4.541     5.841
4                      1.533     2.132     2.776     3.747     4.604

[Figure: t curve with the rejection region to the right of t = +2.353 (right tail, 5%)]

3. Step 3: Test statistic: t = (x̄D – µD) / (sD / √nD) = (1 – 0) / (3.3665 / √4) = 1 / 1.68325 = 0.5941

4. Step 4: Comparison and Conclusion


Because the test statistic t = 0.5941 falls in the accepted region,
we do not reject the null hypothesis H0: µD ≤ 0.
(accept)

There is not enough evidence of a difference in the number of defects produced on the morning and afternoon shifts.


---♦---


Second method:

1. Step 1: H0: µM ≤ µA or, H0: µD ≤ 0


Ha: µM > µA (right tailed test) Ha: µD > 0 (right tailed test)

2. Steps 2 and 3: Excel

t-Test: Paired Two Sample for Means

Morning Shift Afternoon Shift


Mean 12 11
Variance 4.666666667 10
Observations 4 4
Pearson Correlation 0.243975018
Hypothesized Mean Difference 0
df 3
t Stat 0.594088526
P(T<=t) one-tail 0.297136382
t Critical one-tail 2.353363435
P(T<=t) two-tail 0.594272763
t Critical two-tail 3.182446305

3. Step 4: Comparison and Conclusion


From Excel, the P(T ≤ t ) one-tail = 0.297136382 > α = 0.05,
We do not reject the null hypothesis H0: μM ≤ μA.

(or, we do not support the alternative hypothesis Ha: μM > μA.)

Conclusion: There is not enough evidence of a difference in the number of defects produced on the morning and afternoon shifts.
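Both methods above can be reproduced directly from the raw data. A minimal Python sketch (not part of the notes; it assumes a recent SciPy that supports the alternative argument):

# Minimal sketch (assumes SciPy >= 1.6): paired t test, Ha: mean(morning - afternoon) > 0.
from scipy.stats import ttest_rel

morning = [10, 12, 15, 11]
afternoon = [8, 9, 12, 15]

res = ttest_rel(morning, afternoon, alternative='greater')
print(round(res.statistic, 4), round(res.pvalue, 4))   # about 0.5941 and 0.2971
# p-value > 0.05, so do not reject H0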


---♦---

CHAPTER SUMMARY:

Exercises: 13.2, 13.4, 13.8, 13.10, 13.12, 13.16, 13.88, 13.90, 13.92a


Chapters 16 and 4
SIMPLE LINEAR REGRESSION AND CORRELATION
General Form of Linear Regression Equation:
y = a + bx or, ŷ = b0 + b1x
Where, a (or b0) is called y-intercept
b (or b1) is called slope (or regression coefficient) for x.
x is called independent variable.
y is called dependent variable.
---♦---

Objective:
Analyze the relationship between two variables x (independent variable) and y (dependent variable).
Regression analysis is used to predict the value of one variable (y) on the basis of the other variable (x).
---♦---

4.4 MEASURES OF LINEAR RELATIONSHIP (page 110)

Coefficient of Correlation: r
The linear correlation coefficient r, measures the strength of the linear relationship between the
paired x and y values in a sample. (The linear correlation coefficient is sometimes referred to as the
Pearson product moment correlation coefficient in honor of Karl Pearson (1857-1936), who originally
developed it.)

Correlation coefficient: A number, generally ranging between –1.00 and +1.00, that expresses both the strength and the direction of the correlation.

–1.00 ← perfect negative correlation



–0.60 ← strong negative correlation

–0.30 ← moderate negative correlation

–0.10 ← weak negative correlation

0.00 ← no correlation

+0.10 ← weak positive correlation

+0.30 ← moderate positive correlation

+0.60 ← strong positive correlation

+1.00 ← perfect positive correlation


Properties of the Linear Correlation Coefficient r


1. The value of r is always between -1 and 1. That is, –1 ≤ r ≤ +1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x or y.
4. r measures the strength of a linear relationship.

Sample coefficient of correlation:

r = [ n Σxy – (Σx)(Σy) ] / √{ [ n Σx² – (Σx)² ] [ n Σy² – (Σy)² ] }
---♦---
Least Squares method: ŷ = b0 + b1x, or ŷ = a + bx

Slope b1 (or b):  b1 = [ n(Σxy) – (Σx)(Σy) ] / [ n(Σx²) – (Σx)² ]

y-intercept b0 (or a):  b0 = [ (Σy)(Σx²) – (Σx)(Σxy) ] / [ n(Σx²) – (Σx)² ]

or, b0 = ȳ – b1 x̄ = (Σy)/n – b1 (Σx)/n.
---♦---

Example 4.17: Estimating Fixed and Variable Costs


Day 1 2 3 4 5 6 7 8 9 10
Number of tools 7 3 2 5 8 11 5 15 3 6
Electricity cost 23.80 11.89 15.98 26.11 31.79 39.93 12.27 40.06 21.38 18.65

Solution:
Day X Y XY X2 Y2
1 7 23.80 (7)(23.80) = 166.60 (7)2 = 49 (23.80)2 = 566.44
2 3 11.89 (3)(11.89) = 35.67 (3)2 = 9 (11.89)2 = 141.37
3 2 15.98 - - -
4 5 26.11 - - -
5 8 31.79 - - -
6 11 39.93 - - -
7 5 12.27 - - -
8 15 40.06 - - -
9 3 21.38 - - -
10 6 18.65 (6)(18.65) = 111.90 (6)2 = 36 (18.65)2 = 347.82
Total = ∑X = 65 ∑Y=241.86 ∑XY = 1896.62 ∑X2 = 567 ∑Y2 = 6810.20


b1 = [ n(Σxy) – (Σx)(Σy) ] / [ n(Σx²) – (Σx)² ] = [ 10(1896.62) – (65)(241.86) ] / [ 10(567) – (65)² ] = 3245.3 / 1445 = 2.246

b0 = (Σy)/n – b1 (Σx)/n = 241.86/10 – 2.246(65/10) = 24.186 – 14.599 = 9.587

Hence, ŷ = 9.59 + 2.25 x


Interpret the coefficients:
• Y- intercept “a” (or b0); this represents the value of Y when X equals to 0.
• Slope “b” (or b1): this means that for each increase of one unit in X, the average value of Y is estimated to change by b units (i.e., increase by b units if b is “+”, or decrease by |b| units if b is “–”).

Interpret:
The least squares line is: ŷ = 9.59 + 2.25 x
The y-intercept is b0 = 9.59, which means that, if the number of tools x = 0, the least squares line would
intersect the y-axis at 9.59.
The slope is b1 = 2.25, which means that in this sample, for each 1-unit increase in the number of
tools, the marginal increase in the electricity cost is 2.25.
---♦---

Linear Correlation Coefficient r


r = [ n Σxy – (Σx)(Σy) ] / √{ [ n Σx² – (Σx)² ] [ n Σy² – (Σy)² ] } = [ 10(1896.62) – (65)(241.86) ] / √{ [ 10(567) – (65)² ] [ 10(6810.20) – (241.86)² ] } = 0.8711

There is a strong positive correlation between the number of tools and electricity cost. (Since, it is
close to +1). → More tools, higher electricity cost.

Note: The linear correlation coefficient r and the slope have the same sign.
If the slope is (+), the correlation coefficient r is (+)
If the slope is (–), the correlation coefficient r is (–).
---♦---

Using the Regression Equation for Prediction


In predicting a value of y based on some given value of x …
1. If there is not a significant linear correlation (r ≈ 0), the best predicted y value is ȳ (the mean of the observed y values).
2. If there is a significant linear correlation (r ≈ -1, or r ≈ +1), the best predicted y value is
found by substituting the x value into the regression equation.

Example: From examples 4.17 and 4.18 above and given number of tools x = 10.
Since r = 0.8711 (significant), we found that the best predicted electricity cost at 10 is
ŷ = 9.59 + 2.25 x
= 9.59 + 2.25(10)
= 9.59 + 22.50
= 32.09
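The hand calculations in Example 4.17 can be verified in a few lines. A minimal Python sketch (not part of the notes; NumPy assumed):

# Minimal sketch (assumes NumPy): least squares line and correlation for Example 4.17.
import numpy as np

x = np.array([7, 3, 2, 5, 8, 11, 5, 15, 3, 6], dtype=float)   # number of tools
y = np.array([23.80, 11.89, 15.98, 26.11, 31.79, 39.93, 12.27, 40.06, 21.38, 18.65])

b1, b0 = np.polyfit(x, y, 1)      # slope about 2.25, intercept about 9.59
r = np.corrcoef(x, y)[0, 1]       # about 0.8711

print(round(b0, 3), round(b1, 3), round(r, 4))
print(round(b0 + b1 * 10, 2))     # predicted cost for 10 tools, about 32 (the notes round to 32.09)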


Excel Solution:
[Figure: scatter diagram of number of tools (x) vs. electricity cost (y) with fitted line y = 2.245x + 9.587, R² = 0.7588, r = 0.8711]

Coefficient of Determination R²
The coefficient of determination is the amount of the variation in y that is explained by the regression line. It is computed as r² = explained variation / total variation = SSR / SSTotal
---♦---

Other Parts of the Computer Printout


Total deviation = explained deviation + unexplained deviation
(y – ȳ) = (ŷ – ȳ) + (y – ŷ)

Total sum of squares = regression sum of squares + error sum of squares


SSTotal = SSR + SSE
---♦---

The coefficient of determination r2, measures the amount of variation in the dependent variable that is
explained by the variation in the independent variable.
---♦---

Example 4.18: r = 0.8711


R2 = (0.8711)2 = 0.7588

This tells us that 75.88% of the variation in electrical costs is explained by the number of tools.
The remaining 24.12% (100% – 75.88% = 24.12%) is unexplained due to the other factors.


16.2 ESTIMATING THE COEFFICIENTS (page 635)


Least squares (regression line); yˆ = b0 + b1 x
where, b0 is the y-intercept, b1 is the slope, and ŷ is the predicted or fitted value of y.

Example 16.1: Annual Bonus and Years of Experience

Years of Experience, x 1 2 3 4 5 6

Annual Bonus, y 6 1 9 5 17 12

The graph of the regression equation is called the regression line (or line of best fit, or least square line)

1st method: (Excel)

[Figure: scatter plot of years of experience (x) vs. annual bonus (y) with fitted line y = 2.114x + 0.933, R² = 0.491, r = 0.7007]

---♦----
2nd method:

ANOVA Table: (from Excel output)


SUMMARY OUTPUT

Regression Statistics
Multiple R          0.700695581
R Square            0.490974298
Adjusted R Square   0.363717872
Standard Error      4.502909113
Observations        6


ANOVA
Significance
df SS MS F F
Regression 1 78.22857143 78.22857 3.858149 0.120968
Residual 4 81.1047619 20.27619
Total 5 159.3333333

               Coefficients    Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept      0.933333333     4.19198025       0.222647   0.834717   –10.70547    12.572136
X Variable 1   2.114285714     1.076401159      1.964217   0.120968   –0.874283    5.1028544

(The X Variable 1 row gives the slope b1.)

Standard Error of Estimate: Sε = √[ SSE / (n – 2) ] = √( 81.1047619 / (6 – 2) ) = √20.27619 = 4.503

Coefficient of Determination: r² = SSR / SSTotal = 78.22857143 / 159.3333333 = 0.490974298 ≈ 0.491
This tells us that 49.1% of the variation in annual bonus, y is explained by the number years of
experience, x. The remaining 50.9% (100% – 49.1% = 50.9%) is unexplained due to the other factors.
---♦---

Correlation Coefficient: r = √r² = √0.491 = 0.701


This tells us there is a strong positive correlation between the number of years of experience and the annual bonus. (As the number of years of experience increases, the annual bonus tends to be higher.)
---♦---

Regression equation ŷ = 0.933333333 + 2.114285714 x


= 0.933 + 2.114 x

Predict the annual bonus ($ thousands) if the years of experience equal 4.


Solution:
ŷ = 0.933 + 2.114 x
= 0.933 + 2.114(4)
= 0.933 + 8.456
= 9.389 ($ thousands) = $ 9389
---♦---

Recall: Describing the p-value


If the p-value is less than significance level α → Reject the null hypothesis Ho
When the p-value exceeds significance level α → Do not reject (support) the null hypothesis Ho


Testing the Slope


The process of testing hypotheses about β1 is identical to the process of testing any other parameter.
We can draw inferences about the population slope β1 from the sample slope b1.

The null hypothesis specified that there is no relationship, which means that the slope is 0.

Example:
Test to determine whether there is enough evidence in the Excel output above to infer that there is a
linear relationship between the years of experience and the annual bonus. Use 5% significance level.
Solution:
H0: β1 = 0 There is no relationship between the year of experience and the annual bonus.
H1: β1 ≠ 0 There is a relationship between the year of experience and the annual bonus. (claim)

From the Excel output above

               Coefficients    Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept      0.933333333     4.19198025       0.222647   0.834717   –10.70547    12.572136
X Variable 1   2.114285714     1.076401159      1.964217   0.120968   –0.874283    5.1028544

p = 0.120968 > α = 0.05

Do not reject (support) H0: β1 = 0; do not support H1: β1 ≠ 0.

No linear relationship exists at the 5% significance level.
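Example 16.1, including the slope test, can be reproduced with scipy.stats.linregress. A minimal Python sketch (not part of the notes; SciPy assumed):

# Minimal sketch (assumes SciPy): coefficients, R^2, and slope-test p-value for Example 16.1.
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6]          # years of experience
y = [6, 1, 9, 5, 17, 12]        # annual bonus ($ thousands)

res = linregress(x, y)
print(round(res.intercept, 3), round(res.slope, 3))   # about 0.933 and 2.114
print(round(res.rvalue ** 2, 3))                      # R^2, about 0.491
print(round(res.pvalue, 4))                           # two-tail p-value for the slope, about 0.121
print(round(res.intercept + res.slope * 4, 3))        # predicted bonus for 4 years, about 9.39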


---♦---

Example 16.2: Car price and odometer. (for all 3-year-old Toyota Camry)
Car Odometer (1000 mi) Price ($1000)
1 37.4 14.6
2 44.8 14.1
3 45.8 14.0
…        …        …
98 33.2 14.5
99 39.2 14.7
100 36.4 14.3


1st method: (Excel)
[Figure: scatter plot of odometer (1000 mi) vs. price ($1000) with fitted line y = –0.0669x + 17.249, R² = 0.6483, r = –0.8052]

The least squares line is: ŷ = 17.249 – 0.0669x


Interpret:
The y-intercept is b0 = 17.249, which means that when x = 0 (the car was not driven at all), the predicted selling price is 17.249 ($ thousands), or $17,249.

The slope is b1 = –0.0669, which means that in this sample, for each additional unit (1,000 miles) on the odometer, the price decreases on average by 0.0669 ($ thousands), or $66.90.

2nd method:

ANOVA Table: (from Excel output)


SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8052
R Square 0.6483
Adjusted R Square 0.6447
Standard Error 0.3265
Observations 100.0000

ANOVA
Significance
df SS MS F F
Regression 1.0000 19.2556 19.2556 180.642989 0.0000
Residual 98.0000 10.4463 0.1066
Total 99.0000 29.7019


                     Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept            17.2487        0.1821           94.7250    0.000000   16.8874     17.6101
Odometer (1000 mi)   –0.0669        0.0050           –13.4403   0.000000   –0.0767     –0.0570

(The Odometer row gives the slope b1.)
---♦---
R2 = 0.6483
This tells us that 64.83% of the variation in car price is explained by the Odometer. The remaining
35.17% (100% – 64.83% = 35.17%) is unexplained due to the other factors.

r = –√r² = –√0.6483 = –0.8052 (negative because the slope is negative)
This tells us there is a strong negative correlation between the odometer reading and the price. (Higher odometer, lower price.)

Regression equation ŷ = 17.249 – 0.0669x


Predict the price of a used Toyota Camry, if the odometer is 40000 miles.
Solution:
ŷ = 17.249 – 0.0669 x
= 17.249 – 0.0669(40)
= 17.249 – 2.676
= 14.573 ($ thousands) = $14573
---♦---

TESTING THE SLOPE


Two Tail Test:
Test to determine whether there is enough evidence in the Excel output above to infer that there is a
linear relationship between the odometer (1000 mi) and the price ($1000) for all 3-year-old Toyota
Camry. Use 5% significance level.
Solution:
H0: β1 = 0 There is no relationship between odometer and price.
H1: β1 ≠ 0 There is a relationship between odometer and price.

From the Excel output above

                     Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept            17.2487        0.1821           94.7250    0.000000   16.8874     17.6101
Odometer (1000 mi)   –0.0669        0.0050           –13.4403   0.000000   –0.0767     –0.0570

p = 0.00000 < α = 0.05


Reject H0: β1 = 0, support H1: β1 ≠ 0.

A linear relationship exists at the 5% significance level.


One Tail Test: (We have to divide the two tail p-value by 2)
Test to determine whether there is enough evidence in the Excel output above to infer that there is a
negative linear relationship between the price and the odometer reading for all 3-year-old Toyota
Camry. Use a 5% significance level.

Solution:
H0: β1 = 0 (There is no linear relationship)
H1: β1 < 0 (There is a negative linear relationship)

p/2 = 0.00000 / 2 = 0.00000 < α = 0.05
→ Reject H0: β1 = 0, support H1: β1 < 0.

A negative linear relationship exists at the 5% significance level. (Higher odometer, lower price.)
---♦---

Note: If r² is given, then r = ±√r².
The slope b1 and the linear correlation coefficient (r value) have the same sign.

If the slope is (+), the correlation coefficient r is (+)


If the slope is (–), the correlation coefficient r is (–).
---♦---

Tan Le Example: Given ŷ = 1.25 – 0.67x, and r2 = 0.64, what is r value?


Solution: r = –√r² = –√0.64 = –0.80 (negative because the slope is negative).
There is a strong negative correlation between the two variables x and y.
---♦---

CHAPTER 4 SUMMARY (page 132)


Exercises
Exercise 4.4: Try even questions 4.84 – 4.98 page 122.

CHAPTER 16 SUMMARY (page 674)


Exercises
Exercise 16.2: Try even questions 16.2 – 16.16 page 641.
Exercise 16.4: Try even questions 16.22 – 16.32 page 661.


EXERCISE #10
1. Are the marks one receives in a course related to the amount of time spent studying the
subject? To analyze this mysterious possibility a student took a random sample of 10 students
who had enrolled in an accounting class last semester. He asked each to report his or her mark in the course and the total number of hours spent studying accounting. These data are listed here.
(text #4.66 textbook)
Time Spent Studying 40 42 37 47 25 44 41 48 35 28
Marks 77 63 79 86 51 78 83 90 65 47
a) What is the dependent and independent variable?
b) What is the coefficient of determination (r2) and interpret the meaning.
c) What is the correlation coefficient (r) and comment on the strength of correlation?
d) Determine the least squares line ŷ = b0 + b1x.
e) Predict the mark if the time spent studying is 45 hours.
---♦---

2. Attempting to analyze the relationship between advertising and sales, the owner of a furniture
store recorded the monthly advertising budget ($ thousands) and the sales ($ million) for a
sample of 12 months. The data are listed here: (text #16.2 textbook)

Advertising 23 46 60 54 28 33 25 31 36 88 90 99

Sales 9.6 11.3 12.8 9.8 8.9 12.5 12.0 11.4 12.6 13.7 14.4 15.9

a) Draw a scatter diagram. Does it appear that advertising and sales are linearly related?
b) Calculate the least squares line and interpret the coefficients.
---♦---

3. An economist believes that the price is the biggest factor affecting quantity sold. To support his
argument he collects 30 data points relating quantity sold and price.
Suppose the regression line for the data is ŷ = 45 – 0.28 x and r2 = 0.58.

a) What is the dependent and independent variable?


b) Is the correlation positive or negative?
c) What is the correlation coefficient (r) and comment on the strength of correlation?
d) What is the coefficient of determination (r2) and interpret the meaning.
e) Predict the quantity sold at a price of $7.50.


Chapter 17
MULTIPLE REGRESSION
A multiple regression equation expresses a linear relationship between a dependent variable Y and two
or more independent variables (x1, x2, x3,…,xk).

General Form of Multiple Regression Equation:


ŷ = b0 + b1 x1 + b2 x2 + b3 x3 + … + bk xk
Where, b0 is called y-intercept
b1, b2, b3, …,bk are called slopes (or regression coefficients) for x’s.
x1, x2, x3, …, xk are called independent variables.
y is called dependent variable.
---♦---

Measuring autocorrelation: The Durbin Watson Statistic:


One of the basic assumptions of the regression model that has been considered is the independence of
the errors. This assumption is often violated when data are collected over sequential periods of time
because a residual at any one point in time may tend to be similar to residuals at adjacent points in time.
Such a pattern in the residuals is called autocorrelation. When substantial autocorrelation is present in a
set of data, the validity of a fitted regression model can be in serious doubt.

AUTOCORRELATION:
In multiple regression models, successive observations of the dependent variable are supposed to be
uncorrelated. Violation of this assumption often occurs when data are correlated successively over periods of time. This type of correlation is called autocorrelation.
---♦---

Coefficient of Determination R2
The coefficient of determination is the amount of the variation in y that is explained by the regression line. It is computed as R² = explained variation / total variation = SSR / SSTotal


HETEROSCEDASTICITY and HOMOSCEDASTICITY:


The variance of the error variable σ ε2 is required to be constant (This means that the errors vary the
same amount when X is a high value). If the requirement is satisfied, the condition is called
homoscedasticity. When this requirement is violated, the condition is called heteroscedasticity.

HOMOSCEDASTICITY:
The variation around the regression equation is the same for all the values of the independent variables.

[Figure: residual plot of Residuals vs. Floor (an example of homoscedasticity)]

HETEROSCEDASTICITY

[Figure: residual plot of Residuals vs. DEPTH (an example of heteroscedasticity)]

---♦---

Multicollinearity:
Multicollinearity exists when the independent variables are correlated. There are two main symptoms of
multicollinearity:
a) The model as a whole is significant but no variables are significant.
b) The coefficient for a variable can have the opposite sign of what it should.
---♦---
MULTICOLLINEARITY
In multiple regression models, correlation among independent variables can potentially distort the
standard error of estimate and may therefore lead to incorrect conclusions as to which independent
variables are statistically significant. Such correlation is called multicollinearity.


Inferences in Multiple Linear Regression:


y = β0 + β1 x1 + β2 x2 + β3 x3 + … + βk xk + ε
Note: ε (Epsilon) = Error

Example: The Table below contains the measurements from anaesthetized bears. Using all the
independent variables x1 through x6, find the multiple regression by Excel.

Dependent variable Y: Weight (pounds)


Independent variable x1: Age (months)
Independent variable x2: Head Length (inches)
Independent variable x3: Head Width (inches)
Independent variable x4: Neck (inches)
Independent variable x5: Body Length (inches)
Independent variable x6: Chest (inches)
---♦---
Excel:


Examining the results of the ANOVA table, if we use α = 0.05, we find that the model as a whole is significant, with a p-value of 0.0464, though barely so. However, when we examine the p-values of each of the independent variables, Age has the lowest one at 0.1388. Again using α = 0.05, we conclude that none of the individual variables is significant. This is due to the presence of multicollinearity.
---♦---
Describing the p-value
If the p-value ≤ significance level α, there is strong evidence to infer that the alternative hypothesis Ha is true. The result is deemed to be significant. → Reject the null hypothesis H0.
When the p-value > significance level α, there is not enough evidence to infer that the alternative hypothesis Ha is true. We also say that the test is not significant. → Do not reject (accept) the null hypothesis H0.
---♦---

TESTING THE COEFFICIENTS


Question:
Based on the Excel output, does it look that the multiple regression model is overall significant?
Conduct the test at 5% significance level.

Solution:
Step 1:
Ho: β1 = β2 = β3 = β4 = β5 = β6 = 0
Ha: At least one of the regression coefficient βi is not equal to 0.

Step 2: Excel output

P-value = significance F = 0.0464 < α = 0.05


Step 3:
Reject H0: β1 = β2 = β3 = β4 = β5 = β6 = 0
(Support H1: At least one of the regression coefficient is not equal to 0.)

At the 5% significance level, there is at least one significant predictor of bear weight. The model is significant overall.
(A linear relationship exists at the 5% significance level.)


Using the Multiple Regression Equation for Prediction

The multiple regression model is:


Yˆ = – 216.4152 – 1.3720 x1 – 4.4894 x2 + 6.8528 x3 + 19.4721 x4 – 2.9307 x5 + 6.7196 x6

Question:
Predict the bear weight (lb.) if a bear is 50 months old, and has a head-length measuring 16 inches, a
head-width measuring 8 inches, a neck measuring 27 inches, a body-length measuring 70 inches, and a
chest measuring 40 inches around.

Solution:
Yˆ = – 216.4152 – 1.3720 x1 – 4.4894 x2 + 6.8528 x3 + 19.4721 x4 – 2.9307 x5 + 6.7196 x6
= – 216.4152 – 1.3720(50) – 4.4894(16) + 6.8528(8) + 19.4721(27) – 2.9307(70) + 6.7196(40)
= 287.3585 pounds
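Evaluating the fitted equation is just a dot product of the coefficient vector with the predictor values. A minimal Python sketch (not part of the notes; NumPy assumed), using the coefficients quoted above:

# Minimal sketch (assumes NumPy): prediction from the fitted multiple regression equation.
import numpy as np

# intercept first, then the coefficients of x1..x6 from the output above
b = np.array([-216.4152, -1.3720, -4.4894, 6.8528, 19.4721, -2.9307, 6.7196])

# leading 1 multiplies the intercept; then age, head length, head width, neck, body length, chest
x = np.array([1, 50, 16, 8, 27, 70, 40])

print(round(b @ x, 1))   # predicted weight, about 287.4 pounds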
---♦---

CHAPTER 17 SUMMARY

Exercises


Chapter 20:
TIME SERIES ANALYSIS AND FORECASTING
Any variable that is measured over time in sequential order is called a time series.
Forecasting is a common practice among managers and government decision makers.
20.1 TIME SERIES COMPONENTS
Time Series Components:
1. Long-term trend
2. Seasonal variation
3. Cyclical variation
4. Random variation.

1. Long-term trend: A trend is a long-term, relatively smooth pattern (upward or downward) or


direction exhibited by a series. Its duration is more than a year.


2. Seasonal variation: Seasonal variation refers to cycles that occur over short, repetitive calendar periods and, by definition, have a duration of less than a year.


3. Cyclical variation: Cyclical variation is a wavelike pattern describing a long-term trend that is generally apparent over a number of years, resulting in a cyclical effect.



4. Random variation: Random variation is caused by irregular and unpredictable changes in a time
series that are not caused by any other components.


Table: Factors influencing time-series data

Trend (systematic): Overall or persistent, long-term upward or downward pattern of movement. Reason for influence: changes in technology, wealth, value. Duration: several years.

Seasonal (systematic): Fairly regular periodic fluctuations that occur within each 12-month period, year after year. Reason for influence: weather conditions, social customs, religious customs. Duration: within 12 months (monthly or quarterly data).

Cyclical (systematic): Repeating up-and-down swings or movements through four phases: from peak (prosperity) to contraction (recession) to trough (depression) to expansion (recovery or growth). Reason for influence: interactions of numerous combinations of factors influencing the economy. Duration: usually 2–10 years, with differing intensity for a complete cycle.

Irregular (unsystematic): The erratic or “residual” fluctuations in a series that remain after taking into account the systematic effects (trend, seasonal, and cyclical). Reason for influence: random variations in data due to unforeseen events such as strikes, hurricanes, floods, political assassinations, etc. Duration: short and nonrepeating.
---♦---

MEASURING FORECAST ERROR


Forecast Error: The forecast error tells us how well the model performed using the past data.

Forecast Error = Actual value – Forecast value = At – Ft

Mean Absolute Deviation (MAD), or Mean Absolute Error (MAE):
MAD = MAE = ( Σt=1..T |forecast error| ) / T = ( Σt=1..T |At – Ft| ) / T

Mean Squared Error (MSE): The MSE accentuates large deviations.
MSE = ( Σt=1..T (forecast error)² ) / T = ( Σt=1..T (At – Ft)² ) / T

Mean Absolute Percent Error (MAPE): The MAPE expresses the error as a percentage of the actual values.
MAPE = [ Σt=1..T |forecast error| / Actual ] / T × 100% = [ Σt=1..T |At – Ft| / At ] / T × 100%

Note: The smaller the error, the better the forecast.
---♦---

20.2 SMOOTHING TECHNIQUES


MOVING AVERAGES
Moving averages smooth out variation when forecasting demands are fairly steady.

k-period moving average = Σ(actual values in the previous k periods) / k
Note: Increasing the size of “k” (number of averaged periods) increases the smoothness of the time series.
---♦---
Example: Wallace Garden Supply Example
Month Actual 3 Month Moving Error |Error| (Error)2 |Error|/Actual
Sales Average Forecast =A-F
Jan 10 “ “ “ “ “
Feb 12 “ “ “ “ “
Mar 16 “ “ “ “ “
Apr 13 (10+12+16)/3 = 12.67 0.333 0.333 0.111 0.026
May 17 (12+16+13)/3 = 13.67 3.333 3.333 11.111 0.196
Jun 19 (16+13+17)/3 = 15.33 3.667 3.667 13.444 0.193
Jul 15 (13+17+19)/3 = 16.33 -1.333 1.333 1.778 0.089
Aug 20 (17+19+15)/3 = 17.00 3.000 3.000 9.000 0.150
Sep 22 (19+15+20)/3 = 18.00 4.000 4.000 16.000 0.182
Oct 19 (15+20+22)/3 = 19.00 0.000 0.000 0.000 0.000
Nov 21 (20+22+19)/3 = 20.33 0.667 0.667 0.444 0.032
Dec 19 (22+19+21)/3 = 20.67 -1.667 1.667 2.778 0.088
Totals = 18.0 54.666 0.956
Forecast for Jan of the following year: (19 + 21 + 19) / 3 = 19.667


Mean Absolute Deviation (Mean Absolute Error): MAD = MAE = Σ|forecast error| / T = 18.0 / 9 = 2.000
Mean Squared Error: MSE = Σ(forecast error)² / T = 54.666 / 9 = 6.074
Mean Absolute Percent Error: MAPE = [ Σ(|forecast error| / Actual) ] / T = 0.956 / 9 = 0.1061 = 10.61%
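The whole moving-average table can be rebuilt in a few lines. A minimal Python sketch (not part of the notes; pandas assumed):

# Minimal sketch (assumes pandas): 3-month moving-average forecasts and MAD.
import pandas as pd

sales = pd.Series([10, 12, 16, 13, 17, 19, 15, 20, 22, 19, 21, 19],
                  index=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

forecast = sales.rolling(3).mean().shift(1)   # forecast for month t = mean of the 3 previous months
error = sales - forecast                      # defined from Apr onward

print(forecast.dropna().round(2))             # 12.67, 13.67, 15.33, ...
print(round(error.abs().mean(), 3))           # MAD over Apr-Dec, about 2.0
print(round(sales.tail(3).mean(), 3))         # forecast for next January, 19.667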
---♦♦---

WEIGHTED MOVING AVERAGE


Weights can be used to put more emphasis on recent periods.

k-period weighted moving average = Σ (weight for period i × actual value in period i) / Σ (weights)

Period                Weight Applied


Last month 3
Two months ago 2
Three months ago 1
Sum of weights 6

Example: Wallace Garden Supply Example


Month Actual 3 Month Weighted Moving Error |Error| (Error)2 |Error|/Actual
Sales Average Forecast =A-F
Jan 10 “ “ “ “ “
Feb 12 “ “ “ “ “
Mar 16 “ “ “ “ “
Apr 13 (1x10+2x12+3x16)/6 = 13.667 -0.667 0.667 0.444 0.051
May 17 (1x12+2x16+3x13)/6 = 13.833 3.167 3.167 10.028 0.186
Jun 19 (1x16+2x13+3x17)/6 = 15.500 3.500 3.500 12.250 0.184
Jul 15 (1x13+2x17+3x19)/6 = 17.333 -2.333 2.333 5.444 0.156
Aug 20 (1x17+2x19+3x15)/6 = 16.667 3.333 3.333 11.111 0.167
Sep 22 (1x19+2x15+3x20)/6 = 18.167 3.833 3.833 14.694 0.174
Oct 19 (1x15+2x20+3x22)/6 = 20.167 -1.167 1.167 1.361 0.061
Nov 21 (1x20+2x22+3x19)/6 = 20.167 0.833 0.833 0.694 0.040
Dec 19 (1x22+2x19+3x21)/6 = 20.500 -1.500 1.500 2.250 0.079
Totals 20.33 58.275 1.098

Next period (January) forecast = [ (1)(19) + (2)(21) + (3)(19) ] / 6 = 19.667


Mean Absolute Deviation (Mean Absolute Error): MAD = MAE = Σ|Error| / T = 20.33 / 9 = 2.26
Mean Squared Error: MSE = Σ(Error)² / T = 58.275 / 9 = 6.475
Mean Absolute Percent Error: MAPE = [ Σ(|Error| / Actual) ] / T × 100% = 1.098 / 9 = 0.122 = 12.20%
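A minimal Python sketch (not part of the notes) of the same weighted-moving-average forecast, with weights 1, 2, 3 from oldest to most recent month:

# Minimal sketch: 3-month weighted moving average with weights 1 (oldest) to 3 (most recent).
sales = [10, 12, 16, 13, 17, 19, 15, 20, 22, 19, 21, 19]   # Jan..Dec
weights = [1, 2, 3]

def wma(last_three):
    return sum(w * a for w, a in zip(weights, last_three)) / sum(weights)

print(round(wma(sales[0:3]), 3))    # forecast for April, 13.667
print(round(wma(sales[-3:]), 3))    # forecast for next January, 19.667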

---♦---
EXPONENTIAL SMOOTHING
Exponential smoothing is also a type of moving averages model.
Forecast for period (t+1) = forecast for period t + α(actual value in period t – forecast for period t)
Ft+1 = Ft + α (At – Ft)
or, Ft+1 = α At + (1 – α) Ft
where
Ft = value of the exponentially smoothed series (the forecast) in time period t
Ft+1 = value of the exponentially smoothed series (the forecast) in time period t + 1
At = actual value of the time series in period t
α = subjectively assigned weight or smoothing constant (where 0 ≤ α ≤ 1)
F1 = A1
Example: Wallace Garden Supply Example
Month Actual Forecast ( α = 0.1) Error |Error| (Error)2 |Error|/Actual
Sales =A-F
Jan 10 10.00 (assumed) “ “ “ “
Feb 12 10.0 +.1(10-10.0) = 10.00 2.000 2.000 4.000 0.167
Mar 16 10.0 +.1(12-10.0) = 10.20 5.800 5.800 33.640 0.364
Apr 13 10.2 +.1(16-10.2) = 10.78 2.220 2.220 4.928 0.171
May 17 10.8 +.1(13-10.8) = 11.00 5.998 5.998 35.976 0.353
Jun 19 11.0 +.1(17-11.0) = 11.60 7.398 7.398 54.733 0.389
Jul 15 11.6 +.1(19-11.6) = 12.34 2.658 2.658 7.067 0.177
Aug 20 12.3 +.1(15-12.3) = 12.61 7.393 7.393 54.650 0.370
Sep 22 12.6 +.1(20-12.6) = 13.35 8.653 8.653 74.879 0.393
Oct 19 13.4 +.1(22-13.4) = 14.21 4.788 4.788 22.925 0.252
Nov 21 14.2 +.1(19-14.2) = 14.69 6.309 6.309 39.806 0.300
Dec 19 14.7 +.1(21-14.7) = 15.32 3.678 3.678 13.529 0.194
Totals 56.89 346.137 3.130

Forecast for Jan of following year: F13 = F12 + α (A12 – F12) = 15.32 + 0.1(19 – 15.32) = 15.69

Mean Absolute Deviation (Mean Absolute Error): MAD = MAE = (Σ|Error|) / 11 = 56.89 / 11 = 5.172
MSE = (Σ Error²) / 11 = 346.137 / 11 = 31.467
MAPE = [ Σ(|Error| / Actual) ] / 11 = 3.13 / 11 = 0.2844 = 28.44%
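The exponential-smoothing recursion is easy to code directly. A minimal Python sketch (not part of the notes), using α = 0.1 and F1 = A1 as in the table; it carries full precision, so numbers can differ slightly from the hand-rounded table.

# Minimal sketch: exponential smoothing of the Wallace Garden series with alpha = 0.1.
actual = [10, 12, 16, 13, 17, 19, 15, 20, 22, 19, 21, 19]   # Jan..Dec
alpha = 0.1

forecasts = [actual[0]]                     # F1 = A1 (assumed, as in the table)
for a_t in actual:                          # F(t+1) = F(t) + alpha * (A(t) - F(t))
    forecasts.append(forecasts[-1] + alpha * (a_t - forecasts[-1]))

errors = [a - f for a, f in zip(actual[1:], forecasts[1:])]   # Feb..Dec forecast errors
mad = sum(abs(e) for e in errors) / len(errors)

print(round(forecasts[-1], 2))   # forecast for next January, about 15.69
print(round(mad, 3))             # MAD, about 5.17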

Relationship between α and k: α = 2 / (k + 1), or k = 2/α – 1

Note: The exponential smoothing formula Ft+1 = α At + (1 – α) Ft can be rewritten as

Ft+1 = α At + α(1 – α) At–1 + α(1 – α)² At–2 + α(1 – α)³ At–3 + …

Note:
If we desire only to smooth a series by eliminating unwanted cyclical and irregular variations, we should
select a small value for α (close to 0).

On the other hand, if our goal is forecasting, we should choose a large value for α (close to 1).

---♦---

CHAPTER SUMMARY

Exercises: questions 20.1 – 20.17, 20.30, 20.32


---♦---

(Homework:)
The following data represent the annual sales (in millions of dollars) for a food-processing
company for the years 1993 – 2014.

ANNUAL SALES (MILLIONS OF DOLLARS)

YEAR SALES YEAR SALES YEAR SALES


1993 6.60 2002 8.30 2011 7.80
1994 8.60 2003 9.30 2012 8.40
1995 9.10 2004 8.60 2013 8.30
1996 9.50 2005 7.80 2014 8.40
1997 9.00 2006 8.10
1998 7.10 2007 7.90
1999 6.80 2008 7.50
2000 6.20 2009 7.40
2001 7.80 2010 7.70

a) Fit a 3 year moving average to the data and plot the results on your chart.
b) Using a smoothing coefficient of α = 0.25, exponentially smooth the series.
c) What is your exponentially smoothed forecast for the trend in 2015?

Page 6 of 6
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

Chapter 11:
INTRODUCTION TO HYPOTHESIS TESTING
11.1 CONCEPT OF HYPOTHESIS TESTING

Definition: In statistics, a hypothesis is a claim or statement about the population parameters.


(such as: population mean μ, or population variance σ2).

The critical concepts in hypothesis testing follow:


1) There are 2 hypotheses. One is called the null hypothesis (Ho), and the other called the alternative
or research hypothesis (Ha or H1).
2) The testing procedure begins with the assumption that the null hypothesis is true.
3) The goal of the process is to determine whether there is enough evidence to infer that the
alternative is true.
4) There are 2 possible decisions:
a. Conclude that there is enough evidence to support the alternative hypothesis Ha.
Reject the null hypothesis Ho
b. Conclude that there is not enough evidence to support the alternative hypothesis.
Do not reject the null hypothesis Ho
(accept, support)

5) Two possible errors can be made in any test.


a. Type I error: The mistake of rejecting the null hypothesis when it is true.
P(Type I error) = α
b. Type II error: The mistake of failing to reject the null hypothesis when it is false.
P(Type II error) = β

Components of a Formal Hypothesis Test

 The null hypothesis (denoted by H 0 ) is a statement about the value of a population parameter (such
as the mean), and it must contain the condition of equality and must be written with the symbol =, ≤,
or ≥. For the mean, the null hypothesis will be stated in one of these three possible forms:
H0 : µ = k H0 : µ ≤ k H0 : µ ≥ k
We test the null hypothesis directly in the sense that we assume it is true and reach a conclusion to either
reject H 0 or fail to reject H 0 .

 The alternative hypothesis (denoted by H1 or Ha) is the statement that must be true if the null
hypothesis is false. For the mean, the alternative hypothesis will be stated in only one of these three
possible forms:
Ha : µ ≠ k Ha : µ > k Ha : µ < k
(two tailed test) (right tailed test) (left tailed test)
Note that H1 (or Ha) is the opposite of H 0 .

Page 1 of 10
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

11.2 TESTING THE POPULATION MEAN


WHEN THE POPULATION STANDARD DEVIATION σ IS KNOWN

Rejection Region: (Critical region)


The rejection region is a range of values such that if the test statistic falls into that range, we decide to
reject the null hypothesis in favor of the alternative hypothesis. (Critical region(s): tail(s) of the
symmetrical bell shape.)

Significance level α: The value (probability) of the tail(s).

Compare Test Statistic and the Critical Value from the Table:
If the test statistic (value) is in the rejected region, we reject the null hypothesis H0, we conclude that
there is enough statistical evidence to infer that the alternative hypothesis is true.
If the test statistic (value) is not in the rejected region, we do not reject the null hypothesis H0, we
conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.
---♦---

Standardized Test Statistic: z = (x̄ − µ) / (σ/√n), or equivalently z = [(x̄ − µ)/σ]·√n
---♦---

p-value: The p-value of a test is the probability of observing a test statistic at least as extreme as the one
computed given that the null hypothesis is true.

Describing the p-Value


If the p-value ≤ significance level α, there is strong evidence to infer that the alternative hypothesis Ha is
true. The result is deemed to be significant. → Reject the null hypothesis H0.

When the p-value > significance level α, there is not enough evidence to infer that the alternative hypothesis Ha is
true. We also say that the test is not significant. → Do not reject (accept) the null hypothesis H0.
---♦---


Note: When we set up the hypotheses, we usually identify the alternative hypothesis Ha first and then state the null hypothesis Ho as its complement.
---♦---

Page 2 of 10
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

ONE TAILED or TWO TAILED TEST?


Two Tailed Test:
It was, is, equal to, … Ho: µ = …
Difference, more-or-less, not equal to, … Ha: µ ≠ …
---♦---

Example: A consumer analyst reports that the mean life of a certain type of automobile battery is not 72 months.

Note: The claim “the mean … is not 72 months” can be written as µ ≠ 72 months.
Its complement is µ = 72 months. Because µ = 72 months contains the statement of equality, it becomes the null hypothesis.
In this case, the alternative hypothesis represents the claim.

Ho: µ = 72 months
Ha: µ ≠ 72 months. (claim)

rejected region rejected region

accepted region

Sign used in
Two-tailed test
---♦---

Example: The college cafeteria claims that the average amount spent by a student per visit is $3.50.

Note: The claim “the average … is 3.50” can be written as µ = $3.50.
Its complement is µ ≠ $3.50. Because µ = $3.50 contains the statement of equality, it becomes the null hypothesis. In this
case, the null hypothesis represents the claim.

You intend to test whether the mean amount spent differs from this amount.

Ho: µ = $3.50 (claim)


Ha: µ ≠ $3.50

Page 3 of 10
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

Right Tailed Test:


at most, no more than ,… Ho: µ ≤ …
More than, higher, or increase, … Ha: µ > …
---♦---

Example: A company advertises that the mean life of its furnaces is more than 15 years

Note: The claim “the mean … is more than 15 years” can be written as µ > 15 years.
Its complement is µ ≤ 15 years. Because µ ≤ 15 years contains the statement of equality, it becomes the null hypothesis. In
this case, the alternative hypothesis represents the claim.

Ho: µ ≤ 15 years
Ha: µ > 15 years (claim)

rejected region

accepted region

Sign used in

Right-tailed test
---♦---

Example: A company claims that its product contains no more than 2 grams of saturated fat on average.

Note: The claim “…product contains no more than 2 grams” can be written as µ ≤ 2 grams.
Its complement is µ > 2 grams. Because µ ≤ 2 grams contains the statement of equality, it becomes the null hypothesis. In this
case, the null hypothesis represents the claim.

You intend to test whether there is strong evidence that the mean saturated fat content is greater than the claimed value.

Ho: µ ≤ 2 grams (claim)


Ha: µ > 2 grams

Page 4 of 10
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

Left Tailed Test:


at least, no less than , … Ho: µ ≥ …
Less than, lower, cheating or reduce , … Ha: µ < …
---♦---

Example: A car dealership announces that the mean time for an oil change is less than 20 minutes.

Note: The claim “the mean … is less than 20 minutes” can be written as µ < 20 minutes.
Its complement is µ ≥ 20 minutes. Because µ ≥ 20 minutes contains the statement of equality, it becomes the null hypothesis.
In this case, the alternative hypothesis represents the claim.

Ho: µ ≥ 20 minutes
Ha: µ < 20 minutes (claim)

rejected region

accepted region

Sign used in
Left-tailed test
---♦---

Example: The shipping dock receives 40 kg bags of flour for use in the production of baked goods. A sample of
25 bags is taken from each shipment and weighed to ensure that the average weight is at least 40 kg.

Note: The claim “…the average weight is at least 40 kg” can be written as µ ≥ 40 kg.
Its complement is µ < 40 kg. Because µ ≥ 40 kg contains the statement of equality, it becomes the null hypothesis. In this
case, the null hypothesis represents the claim.

You intend to test whether there is strong evidence that the mean weight is less than 40 kg.

Ho: µ ≥ 40 kg (claim)
Ha: µ < 40 kg

Page 5 of 10
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

Hypothesis Testing of One Population (σ is known → Z distribution)


The following four steps are the standard procedure for hypothesis testing.

1). Step 1. Identify the specific claim (about population mean μ) or hypothesis to be tested and put it in
symbolic form.
The null hypothesis H0 is the one that contains the condition of equality.
The alternative hypothesis Ha (or H1) is the other statement.

H0: μ = … or, H0: μ ≤ … or, H0: μ ≥ …


Ha: μ ≠ … (two tailed test). Ha: μ > … (right tailed test). Ha: μ < … (left tailed test).

2). Step 2. Bell curve figure (two tails or one tail) and Z critical (Z Table).
(OUTWARD method)

3). Step 3. Z test statistic (formula).


Z = (x̄ − µ) / (σ/√n), or Z = [(x̄ − µ)/σ]·√n

4). Step 4. Comparison and Conclusion


Reject the null hypothesis H0 if the test statistic value is in the critical regions (tails).
Do not reject the null hypothesis H0 if the test statistic value is not in the critical regions.
(and/or using p-value to make a conclusion)

Page 6 of 10
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

Step by Step Illustration: (σ is given → using Z distribution)


Example: Test the claim that μ > 750, given a sample of n = 36 for which x = 800. Assume the
population standard deviation σ = 100. Use a significance level of α = 0.01.
Solution:
Since σ is given → Z distribution
1. Step 1: H0: μ ≤ 750
Ha: μ > 750 (right tailed test).

2. Step 2: right tail α = 1% = 0.01


rejected region (α = 1% = 0.01)

accepted region

99%

0 +2.33 Z

(Note: given right tail = 1% → the rest of the curve is 99% = 0.99 → find Z value by the Table 3
0.99 ≈ 0.9901 → Z = +2.33)

3. Step 3: Test statistic (formula).


z = (x̄ − µ) / (σ/√n) = (800 − 750) / (100/√36) = 3.00

4. Step 4: Comparison and Conclusion


Because the test statistic z = 3.00 is greater than 2.33 (it is in the rejected region),
we reject the null hypothesis H0: μ ≤ 750.

And/or
p-value = P(Z > 3.00) = 0.13% = 0.0013.
Since the p-value (0.0013) is less than α = 0.01, there is overwhelming evidence to infer that the
alternative hypothesis Ha is true. Reject the H0: μ ≤ 750.
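A short Python sketch (standard library only; the variable names are illustrative, not from the text) reproduces the critical value, the test statistic, and the p-value for this right-tailed test:

# Minimal sketch of the right-tailed z test above (sigma known).
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 800, 750, 100, 36, 0.01

z = (xbar - mu0) / (sigma / sqrt(n))             # test statistic = 3.00
z_crit = NormalDist().inv_cdf(1 - alpha)         # ≈ 2.326 (table value 2.33)
p_value = 1 - NormalDist().cdf(z)                # P(Z > 3.00) ≈ 0.0013

reject = z > z_crit                              # True -> reject H0: mu <= 750
print(round(z, 2), round(z_crit, 2), round(p_value, 4), reject)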

---♦---

Page 7 of 10
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

Step by Step Illustration: (σ is given → using Z distribution)


Example 11.2: Comparison of AT&T and Its Competitor
Test whether there is a difference (μ ≠ 17.09), given a sample of n = 100 for which x̄ = 17.55. Assume the
population standard deviation σ = 3.87. Use a significance level of α = 0.05.

Solution:
1. Step 1: H0: μ = 17.09
Ha: μ ≠ 17.09 (two tailed test).

2. Step 2: two tails, α = 0.05 → each tail = 0.025 → Z = ±1.96

0.025 0.025 z0.025 = 1.96

Reject H0 if test statistic z > 1.96, or z < -1.96

-1.96 1.96

Reject H0 Reject H0

3. Step 3: Test statistic (formula).


z = (x̄ − µ) / (σ/√n) = (17.55 − 17.09) / (3.87/√100) = 1.19

4. Step 4: Comparison and Conclusion


Because the test statistic z = 1.19 is in the accepted region,
we do not reject the null hypothesis H0: μ = 17.09.
(We support the null hypothesis H0: μ = 17.09).

And/or
p-value = P(Z < –1.19) + P(Z > 1.19) = 0.1170 + 0.1170 = 0.2340.
p-value = 0.2340, which is more than 0.05. Thus, there is not enough evidence to infer that there is a
difference between the average AT&T bill and that of its competitor.
Do not reject the null hypothesis H0: μ = 17.09.
(Accept the null hypothesis H0: μ = 17.09)

(That is, we do not have enough evidence to support the alternative hypothesis Ha: μ ≠ 17.09.)
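For the two-tailed case, the p-value doubles the area beyond |z|. A minimal sketch, again using only the Python standard library (variable names are mine):

# Minimal sketch: two-tailed z test for the AT&T example (sigma known).
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 17.55, 17.09, 3.87, 100, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))                 # ≈ 1.19
p_value = 2 * (1 - NormalDist().cdf(abs(z)))         # 2 * P(Z > 1.19) ≈ 0.234

# p_value > alpha, so we do not reject H0: mu = 17.09.
print(round(z, 2), round(p_value, 4), p_value > alpha)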

Page 8 of 10
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

Chapter 12: (Part II)


INFERENCE ABOUT A POPULATION
TESTING THE POPULATION MEAN WHEN
THE POPULATION STANDARD DEVIATION σ IS UNKNOWN
(Student t distribution)

Hypothesis Testing of One Population (σ is unknown → student t distribution)


The following four steps are the standard procedure for hypothesis testing.

1. Step 1: Identify the specific claim (about population mean μ) or hypothesis to be tested and put it in
symbolic form.
The null hypothesis H0 is the one that contains the condition of equality.
The alternative hypothesis Ha (or H1) is the other statement.

H0: μ = … or, H0: μ ≤ … or, H0: μ ≥ …


Ha : μ ≠ … Ha : μ > … Ha : μ < …
(two tailed test) (right tailed test). (left tailed test).

2. Step 2: Bell curve figure (two tails or one tail) and Table 4 (t Table).
Degrees of freedom = ν = n – 1

3. Step 3: Test statistic (formula).

t = (x̄ − µ) / (s/√n), or t = [(x̄ − µ)/s]·√n

4. Step 4: Comparison and Conclusion


Reject the null hypothesis H0 if the test statistic value is in the critical regions (tails).
Do not reject the null hypothesis H0 if the test statistic value is not in the critical regions.
(and/or using p-value to make a conclusion)
---♦---

Step by Step Illustration: (s is given → using t distribution)


Test the claim that μ ≠ 60, given a sample of n = 162 for which sample mean, x = 63.70 and sample
standard deviation, s = 18.94. Use a significance level of α = 0.05.
Solution:
Since s is given → t distribution (σ is unknown)
1. Step 1: H0: μ = 60
Ha: μ ≠ 60 (two tailed test).

Page 9 of 10
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le

2. Step 2: two tails α = 0.05 = 5%, (each tail = α/2 = 0.05/2 = 0.025)
Degrees of freedom = n – 1 = 162 – 1 = 161.
(161 degrees of freedom is not listed in the table, so we use the closest value, df = 160, to find the t value in Table 4.)

rejected region rejected region

accepted region

– 1.975 0 +1.975 t

3. Step 3: Test statistic (formula).


t = (x̄ − µ) / (s/√n) = (63.70 − 60) / (18.94/√162) = 2.49

4. Step 4: Comparison and Conclusion


Because the test statistic t = 2.49 is greater than 1.975 (it is in the rejected region),
we reject the null hypothesis H0: μ = 60.
---♦---

or We support the alternative hypothesis Ha: μ ≠ 60.
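If SciPy is available, the same two-tailed t test can be checked from the summary statistics alone. This is only a sketch (the distribution functions are SciPy's; the variable names are mine):

# Minimal sketch: two-tailed one-sample t test from summary statistics (sigma unknown).
from math import sqrt
from scipy import stats

xbar, mu0, s, n, alpha = 63.70, 60, 18.94, 162, 0.05
df = n - 1                                            # 161

t_stat = (xbar - mu0) / (s / sqrt(n))                 # ≈ 2.49
t_crit = stats.t.ppf(1 - alpha / 2, df)               # ≈ 1.975
p_value = 2 * stats.t.sf(abs(t_stat), df)             # ≈ 0.014

# |t_stat| > t_crit (equivalently p_value < alpha) -> reject H0: mu = 60.
print(round(t_stat, 2), round(t_crit, 3), round(p_value, 4))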


---♦---

CHAPTER 11 SUMMARY:

Exercises: 11.4, 11.8, 11.10, 11.12, 11.14, 11.36, 11.38, 11.40, 11.42

CHAPTER 12 SUMMARY:

Exercises: 12.10, 12.12, 12.14, 12.16, 12.18, 12.20, 12.22, 12.24, 12.26, 12.28

Page 10 of 10
STAT 5002 PRACTICE #5 (part II) Prof. Tan Le

CONFIDENCE INTERVAL (Solution)


(Student t-distribution)
Name: ___________________________
ID#: ___________________________

#1. A random sample of 25 was drawn from a population. The sample mean and standard
deviation are x = 510 and s = 125. Estimate population mean µ with 98% confidence.
Solution: Given: n = 25, x = 510, s = 125.
98% confidence

Step 1: s is given (σ is unknown) → t distribution


98% confidence → the remaining area of the bell curve is 2% = 0.02 → each tail is 1% = 0.01
Degrees of freedom = n – 1 = 25 – 1 = 24

α/2 = 1% α/2 = 1%

0.98

–2.492 +2.492 t

Step 2: Standard error = s/√n = 125/√25 = 25.00
Margin of error = E = t × (s/√n) = (2.492)(25) = 62.3

Step 3: LCL = X – E = 510 – 62.3 = 447.7


UCL = X + E = 510 + 62.3 = 572.3
Thus, the 98% confidence interval is: 447.7 < μ < 572.3
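A minimal SciPy sketch (illustration only; variable names are mine) that reproduces the critical value t = 2.492, the margin of error, and the interval:

# Minimal sketch: 98% t confidence interval for mu from summary statistics.
from math import sqrt
from scipy import stats

n, xbar, s, conf = 25, 510, 125, 0.98

df = n - 1                                    # 24
t_crit = stats.t.ppf(1 - (1 - conf) / 2, df)  # ≈ 2.492
std_err = s / sqrt(n)                         # 25.0
margin = t_crit * std_err                     # ≈ 62.3

print(round(xbar - margin, 1), round(xbar + margin, 1))   # ≈ 447.7, 572.3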

Page 1 of 1
STAT 5002 PRACTICE #6 Prof. Tan Le

HYPOTHESIS TESTING FOR ONE POPULATION MEAN (SOLUTION)

1. A manufacturer of light bulbs advertises that, on average, its long-life bulb will last more
than 5,000 hours. To test the claim, a statistician took a random sample of 100 bulbs and
measured the amount of time until each bulb burned out. The sample mean was calculated as
x = 5,060 hours. If we assume that the lifetime of this type of bulb has a population standard
deviation σ = 350 hours, can we conclude at the 5% significance level that the claim is true?

Solution:

Step 1: State the null hypothesis and the alternate hypothesis


H0: µ ≤ 5000
H1: µ > 5000 → ( __RIGHT__ tailed test)
(note: keyword in the problem is “…last more than …”)

Step 2: right tail 5% = 0.05


rejected region (5% = 0.05)

Accepted region

0.95

1.645 (from Z Table)

Step 3: Test statistic (formula).


Z = (x̄ − µ) / (σ/√n) = (5060 − 5000) / (350/√100) = 1.71

Step 4. Comparison and Conclusion


Because the test statistic Z = 1.71 is greater than 1.645 (it is in the rejected region),

we reject the null hypothesis H0: μ ≤ 5,000 hours.

At the 5% significance level, there is enough evidence to conclude that the claim is true.

Page 1 of 2
STAT 5002 PRACTICE #6 Prof. Tan Le

2. A major bank is concerned about the amount of debt being accrued by customers using its
credit cards. The Board of Directors voted to institute an expensive monitoring system if the
mean credit card debt of all the bank’s customers is $2,000. The bank randomly selected 28
credit-card holders and determined the amount of credit card debt charged. For this sample
group, the sample mean was $2,180 and the sample standard deviation was $800. Use a 5%
level of significance to test the claim that the mean credit card debt is not equal to $2,000.

Solution:

Step 1: State the null hypothesis and the alternate hypothesis


H0: µ = 2000
Ha: µ ≠ 2000 → ( __Two__ tailed test)
(note: keyword in the problem is “…not equal to …”)

Step 2. Two tails = 5% → each tail = 2.5% = 0.025


Degree of freedom = n – 1 = 27

rejected region (2.5% = 0.025) rejected region (2.5% = 0.025)

Accepted region

–2.052 2.052 (from t Table)

Step 3. Test statistic (formula).


t = (x̄ − µ) / (s/√n) = (2180 − 2000) / (800/√28) = 1.19

Step 4. Comparison and Conclusion


Because test statistic t = 1.19 is greater than –2.052 but less than 2.052
→ It is in the accepted region.

We fail to reject the null hypothesis H0: μ = $2000


(We support the null hypothesis H0: μ = $2000)

Page 2 of 2
STAT 5002 PRACTICE #7 (SOLUTION) Prof. Tan Le

HYPOTHESIS TESTING FOR TWO POPULATIONS

Name: ______________________________

ID#: ______________________________

#1. Samples of size 6 were drawn independently from two normal populations. These data are
listed below. Test to determine whether the means of the two populations differ. (Use α = 0.05)
Sample 1: 12, 6, 5, 8, 11, 5 → X 1 = 7.83, and given σ1 = 3.06
Sample 2: 7, 11, 13, 5, 8, 7 → X 2 = 8.50, and given σ2 = 2.95

Solution:

Since σ is known → Z distribution


1. Step 1: H0: µ1 = µ2 ,or H0: µ1 – µ2 = 0
Ha: µ1 ≠ µ2 (two tailed test) Ha: µ1 – µ2 ≠ 0 (two tailed test)

2. Step 2: Given 2 tails α = 0.05 = 5%


(Using the Normal Distribution Table → Find the Z value)

rejected region rejected region

accepted region

–1 .96 1.96
Z

Page 1 of 4
STAT 5002 PRACTICE #7 (SOLUTION) Prof. Tan Le

3. Step 3: Test statistic:

Z = [(x̄1 − x̄2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2)

  = [(7.83 − 8.50) − 0] / √(3.06²/6 + 2.95²/6)

  = −0.67 / √3.01

  = −0.67 / 1.735

  = −0.386

4. Step 4: Comparison and Conclusion


Because the test statistic Z = –0.386 is in the accepted region,

we do not reject (accept) the null hypothesis H0: µ1 = µ2.
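A minimal Python sketch of the same two-sample z test (standard library only; variable names are mine):

# Minimal sketch: two-sample z test with known population standard deviations.
from math import sqrt
from statistics import NormalDist

x1bar, x2bar = 7.83, 8.50
sigma1, sigma2 = 3.06, 2.95
n1, n2, alpha = 6, 6, 0.05

z = ((x1bar - x2bar) - 0) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # ≈ -0.386
z_crit = NormalDist().inv_cdf(1 - alpha / 2)                        # ≈ 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))                        # ≈ 0.70

# |z| < z_crit (p_value > alpha) -> do not reject H0: mu1 = mu2.
print(round(z, 3), round(z_crit, 2), round(p_value, 3))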

---♦---

Page 2 of 4
STAT 5002 PRACTICE #7 (SOLUTION) Prof. Tan Le

#2. An advertising agency wants to determine whether a new training program improved an
agent’s ability to undertake an advertising job. A supervisor rated six agents on their ability
before and after the program as follows (high score indicates greater ability).
Before After
Mary 7.6 14.7
John 9.9 14.1
Bill 8.6 11.8
Jim 9.5 16.1
Peter 8.4 14.7
Tom 9.2 14.1

Is there enough evidence to support the claim that the new training program improved an agent's ability?
Use a 5% level of significance.

Solution: Before < After

Participant   Before X1   After X2   d = X1 – X2   d – d̄   (d – d̄)²
Mary   7.6   14.7   7.6 – 14.7 = –7.1   –7.1 – (–5.38) = –1.72   (–1.72)² = 2.96
John   9.9   14.1   9.9 – 14.1 = –4.2   –4.2 – (–5.38) = +1.18   (+1.18)² = 1.39
Bill   8.6   11.8   8.6 – 11.8 = –3.2   –3.2 – (–5.38) = +2.18   (+2.18)² = 4.75
Jim   9.5   16.1   9.5 – 16.1 = –6.6   –6.6 – (–5.38) = –1.22   (–1.22)² = 1.49
Peter   8.4   14.7   8.4 – 14.7 = –6.3   –6.3 – (–5.38) = –0.92   (–0.92)² = 0.85
Tom   9.2   14.1   9.2 – 14.1 = –4.9   –4.9 – (–5.38) = +0.48   (+0.48)² = 0.23
Total   ∑d = –32.3   ∑(d – d̄) = 0   ∑(d – d̄)² = 11.67

where d̄ = (∑d)/n = –32.3/6 = –5.38

Since σ is unknown → t distribution

1. Step 1: H0: µ1 ≥ µ2 ,or H0: µ1 – µ2 ≥ 0


Ha: µ1 < µ2 (left tailed test) Ha: µ1 – µ2 < 0 (left tailed test)

2. Step 2:

sD = √[∑(d – d̄)² / (n – 1)] = √(11.67/5) = √2.334 = 1.53

Page 3 of 4
STAT 5002 PRACTICE #7 (SOLUTION) Prof. Tan Le

3. Step 3: Tail α = 0.05 = 5%


Degrees of freedom = ν = nD – 1 = 6 – 1 = 5

(Note: given left tail = 5% = 0.05


Degree of freedom = 5 → find t value by the Table )

rejected region

accepted region

–2.015
t

4. Step 4: Test statistic: t = (d̄ – µD) / (sD/√n) = (–5.38 – 0) / (1.53/√6) = –8.62

5. Step 5: Comparison and Conclusion


Because the test statistic t = –8.62 is in the rejected region,

we reject the null hypothesis H0: µ1 ≥ µ2. At the 5% significance level, there is enough evidence to
conclude that the new training program improved the agents' ability.
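If SciPy (version 1.6 or later, which supports the alternative argument) is available, the matched-pairs test can be checked directly from the raw scores. A sketch, for illustration only:

# Minimal sketch: paired (matched-pairs) t test for the training-program data.
# ttest_rel with alternative='less' tests H1: mean(before - after) < 0.
from scipy import stats

before = [7.6, 9.9, 8.6, 9.5, 8.4, 9.2]
after  = [14.7, 14.1, 11.8, 16.1, 14.7, 14.1]

t_stat, p_value = stats.ttest_rel(before, after, alternative='less')
t_crit = stats.t.ppf(0.05, df=len(before) - 1)        # ≈ -2.015

# t_stat ≈ -8.6 < -2.015 (p_value < 0.05) -> reject H0: the program did not improve ability.
print(round(t_stat, 2), round(p_value, 4), round(t_crit, 3))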


---♦---

Page 4 of 4
STAT 5002 Practice #8 (Solution) Prof. Tan Le

REGRESSION (SOLUTION)

1. Attempting to analyze the relationship between advertising and sales, the owner of a furniture
store recorded the monthly advertising budget ($ thousands) and the sales ($ million) for a
sample of 12 months. The data are listed here: (text #16.2)

Advertising 23 46 60 54 28 33 25 31 36 88 90 99

Sales 9.6 11.3 12.8 9.8 8.9 12.5 12.0 11.4 12.6 13.7 14.4 15.9

Suppose the regression line for the data is ŷ = 9.10 + 0.058 x and r = 0.78.

a) What is the dependent and independent variable?


Answer:
Independent variable, x: Advertising
Dependent variable, y: Sales
---♦---
b) Is the correlation positive or negative?
Answer:
Positive, since the slope is positive (+0.058), which means that in this sample, each additional $1 thousand of
advertising is associated with an increase of about 0.058 ($ million) in sales.
---♦---
c) What is the correlation coefficient (r) and comment on the strength of correlation?
Answer:
Correlation coefficient r = 0.78
There is a strong positive correlation between the advertising, x and the sales, y.
→ More advertising, more sales.
---♦---
d) What is the coefficient of determination (r²) and interpret the meaning.
Answer:
Coefficient of determination r2 = (0.78)2= 0.61
which means that 61% of the total variation in the sales, y can be explained by the variation in the
advertising, x. The remaining 39% (100% – 61% = 39%) of the total variation in the sales, y
remained unexplained due to the other factors.
---♦---
e) Predict the sales if the advertising budget is $50 000.
Answer:
Since r = 0.78 (a significant correlation), the best predicted sales when advertising is $50 (thousands) is
ŷ = 9.10 + 0.058 x
= 9.10 + 0.058 (50)
= 9.10 + 2.90
= 12 ($ million)
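A short NumPy sketch (illustration only; variable names are mine) confirms that fitting the least-squares line to the twelve data points gives coefficients that round to the line quoted in the problem, and reproduces the prediction at x = 50:

# Minimal sketch: least-squares line for the advertising/sales data with NumPy.
import numpy as np

x = np.array([23, 46, 60, 54, 28, 33, 25, 31, 36, 88, 90, 99], dtype=float)          # advertising ($000)
y = np.array([9.6, 11.3, 12.8, 9.8, 8.9, 12.5, 12.0, 11.4, 12.6, 13.7, 14.4, 15.9])  # sales ($ million)

b1, b0 = np.polyfit(x, y, 1)          # slope ≈ 0.058, intercept ≈ 9.10
r = np.corrcoef(x, y)[0, 1]           # ≈ 0.78; r**2 ≈ 0.61

y_hat_at_50 = b0 + b1 * 50            # predicted sales ≈ 12 ($ million)
print(round(b1, 3), round(b0, 2), round(r, 2), round(y_hat_at_50, 1))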
Page 1 of 3
STAT 5002 Practice #8 (Solution) Prof. Tan Le
2. Are the marks one receives in a course related to the amount of time spent studying the
subject? To analyze this mysterious possibility a student took a random sample of 10 students
who had enrolled in an accounting class last semester. He asked each to report his or her mark
in the course and the total number of hours spent studying accounting. These data are listed here.
(text #4.66)
Time Spent Studying 40 42 37 47 25 44 41 48 35 28
Marks 77 63 79 86 51 78 83 90 65 47
Below is the Excel output. Please use it to answer the following questions

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.881173563
R Square 0.7764668
Adjusted R Square 0.748525204
Standard Error 7.375607542
Observations 10

ANOVA
df SS MS F Significance F
Regression 1 1511.703307 1511.703 27.7889 0.000753909
Residual 8 435.1966929 54.39959
Total 9 1946.9

Upper Lower
Coefficients Standard Error t Stat P-value Lower 95% 95% 95.0%
Intercept 5.9217458 12.7314594 0.465127 0.654239 -23.4370522 35.28054 -23.4371

Time Spent Studying 1.7048644 0.323410691 5.271515 0.000754 0.959078057 2.450651 0.959078

Solution:

a) Test to determine whether there is enough evidence in the Excel output above to infer that
there is a linear relationship between the time spent studying and the marks. Use 5%
significance level.
Solution: Testing the slope b1

Step 1: H0: β1 = 0 (no linear relationship)


H1: β1 ≠ 0 (linear relationship)

Page 2 of 3
STAT 5002 Practice #8 (Solution) Prof. Tan Le
Step 2:

Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   5.9217458   12.7314594   0.465127   0.654239   -23.4370522   35.28054

Time Spent Studying   1.7048644   0.323410691   5.271515   0.000754   0.959078057   2.450651

Step 3: Conclusion
The p-value = 0.000754 < α = 0.05.
Reject H0: β1 = 0

(or, we support the H1: β1 ≠ 0,


There is a relationship between the time spent studying and the marks.)
---♦---

b) What is the dependent and independent variable?


Time spent studying: independent variable x
Marks: dependent variable y
---♦---
c) What is the coefficient of determination (r2) and interpret the meaning.
Coefficient of determination r2 = 0.7765
which means that 77.65% of the total variation in the marks, y can be explained by the
variation in the time spent studying, x.
The remaining 22.35% (since 100% – 77.65% = 22.35%) of the total variation in the marks, y,
remains unexplained, due to other factors.
---♦---
d) What is the correlation coefficient (r) and comment on the strength of correlation?
Correlation coefficient r = +0.8812, which indicates a strong positive correlation between the time spent
studying and the marks. (In general, more time studying, higher marks.)
---♦---
e) Determine the least squares line ŷ = b0 + b1x.
ŷ = 5.922 + 1.705 x
---♦---
f) Predict the mark if the time spent studying is 45 hours.
ŷ = 5.922 + 1.705 (45)
= 5.922 + 76.725
= 82.647
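The key numbers in the Excel output can be reproduced with scipy.stats.linregress. A sketch for illustration only (variable names are mine):

# Minimal sketch: simple linear regression of marks on study time.
from scipy import stats

hours = [40, 42, 37, 47, 25, 44, 41, 48, 35, 28]
marks = [77, 63, 79, 86, 51, 78, 83, 90, 65, 47]

res = stats.linregress(hours, marks)
print(round(res.slope, 4), round(res.intercept, 4))   # ≈ 1.7049, 5.9217
print(round(res.rvalue, 4), round(res.rvalue**2, 4))  # ≈ 0.8812, 0.7765
print(round(res.pvalue, 6))                           # ≈ 0.000754 -> reject H0: beta1 = 0
print(round(res.intercept + res.slope * 45, 1))       # predicted mark ≈ 82.6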
---♦---

Page 3 of 3
STAT 5002 PRACTICE #9 Prof. Tan Le

MULTIPLE LINEAR REGRESSION (SOLUTION)


Excel:
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.856184
R Square 0.733051
Adjusted R Square 0.682999
Standard Error 44.738608
Observations 20

ANOVA
df SS MS F Significance F p-value
Regression 3 87941.06119 29313.69 14.64554 0.00007518
Residual 16 32024.68881 2001.543
Total 19 119965.75

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 140.3187 36.1172 3.8851 0.0013 63.7536 216.8839 63.7536 216.8839
Mean Outside Temperature -12.3168 2.6850 -4.5873 0.0003 -18.0087 -6.6249 -18.0087 -6.6249
Attic Insulation (cm) -4.1013 1.7248 -2.3778 0.0302 -7.7578 -0.4449 -7.7578 -0.4449
Age of Furnace (years) 8.5268 3.2954 2.5875 0.0198 1.5409 15.5126 1.5409 15.5126

Solution:
Recall: The population multiple linear regression model for this problem is:
y = β0 + β1 x1 + β2 x2 + β3 x3 + ε

a) Based on the Excel output, does it look that the multiple regression model is overall significant?
Conduct the test at 5% significance level.
Step 1:
Ho: β1 = β2 = β3 = 0
Ha: At least one of the regression coefficient βi is not equal to 0.

Steps 2&3: Excel


ANOVA
df SS MS F Significance F
Regression 3 87941.06119 29313.69 14.64554 0.00007518
Residual 16 32024.68881 2001.543
Total 19 119965.75

P-value = significance F = 0.00007518 < α = 0.05

Step 4: Conclusion
Reject H0: β1 = β2 = β3 = 0
(Support H1: At least one of the regression coefficients is not equal to 0.)

At the 5% significance level, there is at least one significant predictor of the home heating cost.
The model is overall significant. (A linear relationship exists at the 5% significance level.)
Page 1 of 2
STAT 5002 PRACTICE #9 Prof. Tan Le

b) Predict the January heating cost (y) for a home if we know the mean outside temperature for the
month is –8oC, (x1), with 7.5cm of insulation (x2), and the furnace is 6 years old (x3)
Solution:

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 140.3187 36.1172 3.8851 0.0013 63.7536 216.8839
Mean Outside Temperature -12.3168 2.6850 -4.5873 0.0003 -18.0087 -6.6249
Attic Insulation (cm) -4.1013 1.7248 -2.3778 0.0302 -7.7578 -0.4449
Age of Furnace (years) 8.5268 3.2954 2.5875 0.0198 1.5409 15.5126

The multiple regression model is:


ŷ = 140.3187 – 12.3168 x1 – 4.1013 x2 + 8.5268 x3

Predict the January heating cost $y if given x1 = –8, x2 = 7.5, and x3 = 6


ŷ = 140.3187 – 12.3168 x1 – 4.1013 x2 + 8.5268 x3
= 140.3187 – 12.3168(-8) – 4.1013(7.5) + 8.5268(6)
= 140.3187 + 98.53 – 30.76 + 51.16
= $259.25
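The prediction is simply a dot product of the coefficient vector with the new observation (including a 1 for the intercept). A tiny NumPy sketch, for illustration only:

# Minimal sketch: plug the new observation into the fitted multiple regression equation.
import numpy as np

coefs = np.array([140.3187, -12.3168, -4.1013, 8.5268])   # b0, b1, b2, b3 from the Excel output
x_new = np.array([1, -8, 7.5, 6])                          # 1 for the intercept, then x1, x2, x3

y_hat = coefs @ x_new                                      # ≈ 259.25 (January heating cost, $)
print(round(y_hat, 2))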

Page 2 of 2
STAT 5002 PRACTICE #10 Prof. Tan Le

TIME SERIES (SOLUTION)


1. Given the following data

Period   Time Series   3-Period Moving Average   Weighted Moving Average   Exponentially Smoothed (α = 0.25)
1 16 - - 16.00
2 22 - - 16.00
3 19 - - 17.50
4 24 (16+22+19)/3=19.00 (16*1+22*3+19*5)/9=19.67 17.88
5 30 (22+19+24)/3=21.67 (22*1+19*3+24*5)/9=22.11 19.41
6 26 (19+24+30)/3=24.33 (19*1+24*3+30*5)/9=26.78 22.06
Forecast (24+30+26)/3=26.67 (24*1+30*3+26*5)/9=27.11 23.045

a) Calculate the 3-period moving averages.


b) Compute the weighted moving averages with three periods, using 5, 3, and 1 for the weights of
the most recent, the second most recent, and the third most recent periods respectively. Find the
forecast for period 7. Round all values to two decimal places.

c) Compute the exponentially smoothed values using a smoothing coefficient α = 0.25. Find the
forecast for period 7. Round all values to two decimal places. Ft+1 = α At + (1 – α) Ft

d) Calculate MAD for 3-period moving averages and weighted moving averages (Use results from
question a and b). Which method provides better forecast? Round MAD to three decimal places.

Period   Time Series   3-Period Moving Average   Absolute Error   Weighted Moving Average   Absolute Error
1 16 - - - -
2 22 - - - -
3 19 - - - -
4 24 19.00 5.00 19.67 4.33
5 30 21.67 8.33 22.11 7.89
6 26 24.33 1.67 26.78 0.78
MAD = 5.000 MAD = 4.333

MAD (Moving Average) = (5.00 + 8.33 + 1.67)/3 = 5.000


MAD (Weighted Moving Average) = (4.33 + 7.89 + 0.78)/3 = 4.333

Weighted Moving Average provides the better forecast.
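A compact Python sketch (illustration only; variable names are mine) that recomputes both sets of forecasts and their MADs:

# Minimal sketch: 3-period moving average vs. weighted moving average (weights 1, 3, 5),
# with MAD computed over periods 4-6.
series = [16, 22, 19, 24, 30, 26]
weights = [1, 3, 5]                                   # third most recent ... most recent

ma  = [sum(series[i-3:i]) / 3 for i in range(3, len(series) + 1)]
wma = [sum(w * a for w, a in zip(weights, series[i-3:i])) / sum(weights)
       for i in range(3, len(series) + 1)]
# ma[-1] ≈ 26.67 and wma[-1] ≈ 27.11 are the period-7 forecasts.

mad_ma  = sum(abs(a - f) for a, f in zip(series[3:], ma[:3]))  / 3   # = 5.000
mad_wma = sum(abs(a - f) for a, f in zip(series[3:], wma[:3])) / 3   # ≈ 4.333
print(round(mad_ma, 3), round(mad_wma, 3))   # the weighted version forecasts better here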


Page 1 of 1
