Stats CH Final
Tan Le
Chapter 8: (Part II)
8.4 OTHER CONTINUOUS DISTRIBUTIONS
Student t Distribution (σ is unknown)
The Student t distribution was first derived by William Gosset (1876-1937). Gosset published his
findings in 1908 under the pseudonym “Student” and used the letter t to represent the random variable,
hence the name Student t distribution (also called Student’s t distribution).
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$, or $t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{s}$
Definition: The number of degrees of freedom for a data set corresponds to the number of scores that
can vary after certain restrictions have been imposed on all scores.
For the applications of this section, the number of degrees of freedom is simply the sample size minus 1:
Degrees of freedom = ν (nu) = n – 1
Critical t Values
1. Critical values are found in Table 8.2 (or Appendix B, Table 4).
2. Degrees of freedom = ν = n – 1
3. After finding the number of degrees of freedom, refer to Table 8.2 and locate that number in the
column at the left. With a particular row of t values now identified, select the critical t value that
corresponds to the appropriate column heading (the values of the tail(s)).
If a critical t value is located at the left tail, be sure to make it negative.
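The lookup procedure above can be sketched in code. This is a minimal illustration, not part of the original notes: the helper name `critical_t` and the dictionary layout are assumptions, only a handful of rows from the table are copied in, and an unlisted number of degrees of freedom is replaced by the closest listed one.

```python
# A few rows copied from the t table in these notes.
# Columns are the right-tail areas 0.100, 0.050, 0.025, 0.010, 0.005.
T_TABLE = {
    10: (1.372, 1.812, 2.228, 2.764, 3.169),
    25: (1.316, 1.708, 2.060, 2.485, 2.787),
    30: (1.310, 1.697, 2.042, 2.457, 2.750),
    35: (1.306, 1.690, 2.030, 2.438, 2.724),
    70: (1.294, 1.667, 1.994, 2.381, 2.648),
    190: (1.286, 1.653, 1.973, 2.346, 2.602),
    200: (1.286, 1.653, 1.972, 2.345, 2.601),
}
TAIL_AREAS = (0.100, 0.050, 0.025, 0.010, 0.005)


def critical_t(n, tail_area, left_tail=False):
    """Return the critical t value for sample size n (hypothetical helper)."""
    df = n - 1  # degrees of freedom = n - 1
    # Use the listed row whose degrees of freedom are closest to df.
    row = min(T_TABLE, key=lambda listed: abs(listed - df))
    value = T_TABLE[row][TAIL_AREAS.index(tail_area)]
    # A critical value in the left tail is negative.
    return -value if left_tail else value


print(critical_t(11, 0.05))                  # right tail, 10 df
print(critical_t(11, 0.05, left_tail=True))  # left tail, 10 df
print(critical_t(33, 0.025))                 # 32 df is not listed, row 30 is used
```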
STATISTICS Chapter 8 (Part II) Prof. Tan Le
Solution: Degree of freedom = ν = n – 1 = 11 – 1 = 10, top 5% (α = 0.05 on the right side)
[Figure: t curve with area 0.95 to the left of t = +1.812 and α = 0.05 in the right tail]
Hence, t = +1.812
---♦---
[Figure: t curve with area 0.95 to the right of t = –1.812]
Hence, t = –1.812
(Note: A critical value of t located in the left tail is negative.)
[Figure: t curve with area 0.99 to the right of t = –2.475]
Hence, t = –2.475
---♦---
Example:
t.05, 10 = 1.812, t.05, 25 = 1.708, t.05, 72 ≈ t.05, 70 = 1.667, –t.05, 10 = –1.812
Example:
t.10, ∞ = 1.282, t.05, ∞ = 1.645, t.025, ∞ = 1.960, t.01, ∞ = 2.326, t.005, ∞ = 2.576
---♦---
Degrees of Freedom    t.100    t.050    t.025    t.010    t.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
35 1.306 1.690 2.030 2.438 2.724
40 1.303 1.684 2.021 2.423 2.704
45 1.301 1.679 2.014 2.412 2.690
50 1.299 1.676 2.009 2.403 2.678
55 1.297 1.673 2.004 2.396 2.668
60 1.296 1.671 2.000 2.390 2.660
65 1.295 1.669 1.997 2.385 2.654
70 1.294 1.667 1.994 2.381 2.648
75 1.293 1.665 1.992 2.377 2.643
80 1.292 1.664 1.990 2.374 2.639
85 1.292 1.663 1.988 2.371 2.635
90 1.291 1.662 1.987 2.368 2.632
95 1.291 1.661 1.985 2.366 2.629
100 1.290 1.660 1.984 2.364 2.626
110 1.289 1.659 1.982 2.361 2.621
120 1.289 1.658 1.980 2.358 2.617
130 1.288 1.657 1.978 2.355 2.614
140 1.288 1.656 1.977 2.353 2.611
150 1.287 1.655 1.976 2.351 2.609
160 1.287 1.654 1.975 2.350 2.607
170 1.287 1.654 1.974 2.348 2.605
180 1.286 1.653 1.973 2.347 2.603
190 1.286 1.653 1.973 2.346 2.602
200 1.286 1.653 1.972 2.345 2.601
∞ 1.282 1.645 1.960 2.326 2.576
STATISTICS Chapter 12 (Part I) Prof. Tan Le
Chapter 12: (Part I)
INFERENCE ABOUT A POPULATION
12.1 INFERENCE ABOUT A POPULATION MEAN
WHEN THE STANDARD DEVIATION σ IS UNKNOWN
Student t Distribution: The Student t distribution was first derived by William Gosset (1876-1937).
Gosset published his findings in 1908 under the pseudonym “Student” and used the letter t to represent the
random variable, hence the name Student t distribution (also called Student’s t distribution).
[Figure: t curve with central area 0.95 between t = –2.228 and t = +2.228]
Solution: Degrees of freedom = n – 1 = 33 – 1 = 32 (32 degrees of freedom is not listed, so use the closest listed value, 30),
95% degree of confidence (α = 5% = 0.05 for two tails → each tail = α/2 = 0.05/2 = 0.025)
[Figure: t curve with central area 0.95 between t = –2.042 and t = +2.042, and α = 0.025 in each tail]
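Confidence Interval for the Estimate of µ: $\bar{x} - E < \mu < \bar{x} + E$, where $E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$.
An equivalent form of the confidence interval is $(\bar{x} - E,\ \bar{x} + E)$. The values $\bar{x} - E$ and $\bar{x} + E$ are the
lower and upper confidence limits of the confidence interval estimator:
$LCL = \bar{x} - t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$, and $UCL = \bar{x} + t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$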
---♦---
The following three steps are expected when constructing a confidence interval.
Step 1. Find the critical value $t_{\alpha/2}$ (used when σ is unknown and s is known) that corresponds to the given
degree of confidence (%).
Step 2. Find $E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$ (s is given)
Step 3. Construct the confidence interval: $\bar{x} - E < \mu < \bar{x} + E$
---♦---
Example 12.2:
Given s = 4499, sample size n = 192, and $\bar{x} = \dfrac{1829247}{192} = 9527$. Construct a 95% confidence interval.
Solution:
Step 1: Degrees of freedom = n – 1 = 192 – 1 = 191. (Because 191 degrees of freedom is not listed, we
use the closest listed number of degrees of freedom, which is 190.)
95% degree of confidence (α = 5% = 0.05 for two tails → each tail = α/2 = 0.05/2 = 0.025)
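Steps 2 and 3 for this example can be sketched numerically as follows. This is an illustration under stated assumptions rather than the worked solution from the notes: the critical value 1.973 is read from the t.025 column of the t table at 190 degrees of freedom, and the variable names are chosen here for readability.

```python
import math

x_bar = 9527    # sample mean from Example 12.2
s = 4499        # sample standard deviation
n = 192         # sample size
t_crit = 1.973  # t.025 at 190 df (closest listed value to 191 df)

# Step 2: margin of error E = t * s / sqrt(n)
margin = t_crit * s / math.sqrt(n)

# Step 3: confidence limits x_bar - E and x_bar + E
lcl = x_bar - margin
ucl = x_bar + margin

print(f"E = {margin:.2f}")
print(f"{lcl:.2f} < mu < {ucl:.2f}")
```

With these inputs the margin of error is about 640.6, so the interval runs from roughly 8886 to roughly 10168.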
STATISTICS Chapter 13 Prof. Tan Le
Chapter 13:
INFERENCE ABOUT COMPARING TWO POPULATIONS
In a two-sample hypothesis-testing problem, the underlying parameters of two different populations
(neither of whose values is assumed known) are compared.
Two samples are said to be independent when the data points in one sample are unrelated to the data
points in the second sample.
If the values in one sample are related to the values of the other sample, the samples are dependent. Such
samples are often referred to as matched pairs or paired samples.
---♦---
Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
The standard error is $\sigma_{error} = \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
---♦---
The null hypothesis (denoted by H0) is a statement about the value of a population parameter (such as
the mean µ). It must contain the condition of equality and must be written with the symbol =, ≤,
or ≥. For the mean, the null hypothesis will be stated in one of these three forms:
H0: µ1 – µ2 = 0    H0: µ1 – µ2 ≤ 0    H0: µ1 – µ2 ≥ 0
We test the null hypothesis directly in the sense that we assume it is true and reach a conclusion
to either reject H0 or fail to reject H0.
The alternative hypothesis (denoted by H1 or Ha) is the statement that must be true if the null hypothesis
is false. For the mean, the alternative hypothesis will be stated in only one of these three possible forms:
Ha: µ1 – µ2 ≠ 0 (two-tailed test)    Ha: µ1 – µ2 > 0 (right-tailed test)    Ha: µ1 – µ2 < 0 (left-tailed test)
Note that H1 (or Ha) is the opposite of H0.
Note:
1. There are 2 possible decisions:
a. Conclude that there is enough evidence to support the alternative hypothesis Ha.
→ Reject the null hypothesis H0.
b. Conclude that there is not enough evidence to support the alternative hypothesis Ha.
→ Do not reject (accept, support) the null hypothesis H0.
When the p-value > significance level α, there is no evidence to infer that the alternative hypothesis Ha is
true. We also say that the test is not significant. → Do not reject (accept, support) the null hypothesis H0.
---♦---
Example: There is no difference in the mean credit card debts of households in Burlington and Hamilton.
↓
So, the null and alternative hypotheses are
[Figure: two-tailed test, with the acceptance region between the two rejection regions; sign used in Ha: ≠]
---♦---
Example: … Can we infer that police officers and security officers differ in their vacation expenses?
↓
So, the null and alternative hypotheses are
H0: µP = µS              or, H0: µP – µS = 0    or, H0: µD = 0
Ha: µP ≠ µS (claim)      Ha: µP – µS ≠ 0        Ha: µD ≠ 0
[Figure: right-tailed test, with the acceptance region on the left and the rejection region in the right tail; sign used in Ha: >]
---♦---
Example: Can the company infer that the new tire will last longer on average than the existing tire?
↓
H0: µnew ≤ µexisting    or, H0: µnew – µexisting ≤ 0    or, H0: µD ≤ 0
Ha: µnew > µexisting    Ha: µnew – µexisting > 0        Ha: µD > 0
---♦---
Example: The store manager would like to know if the mean checkout time using the standard method is
longer than using the U-Scan?
↓
H0: µS ≤ µU            or, H0: µS – µU ≤ 0    or, H0: µD ≤ 0
Ha: µS > µU (claim)    Ha: µS – µU > 0        Ha: µD > 0
Example: A real estate agency says that home sales have dropped the most since the recession as new rules
put the brakes on the market. Can we conclude at the 5% significance level that the number of houses sold
this year decreased on average? (Note: This year's sales are less than last year's.)
↓
Ho: µ this year ≥ µ last year or Ho: µ this year – µ last year ≥ 0 or, Ho : µ d ≥ 0
Ha: µthis year < µlast year (claim) Ha: µ this year – µ last year < 0 Ha: µd < 0
[Figure: left-tailed test, with the rejection region in the left tail and the acceptance region on the right; sign used in Ha: <]
---♦---
Note: We can change a right-tailed test problem into a left-tailed test problem by reversing the order of the two populations in the setup.
---♦---
Example: Can the company infer that the existing tire will, on average, not last as long as the new tire?
↓
Ho: µexisting ≥ µnew or, Ho: µexisting – µnew ≥ 0 or, Ho: µd ≥ 0
Ha: µexisting < µnew Ha: µexisting – µnew < 0 Ha : µ d < 0
---♦---
Example: A manufacturer claims that the watt usage of its 17-inch flat panel monitors is less than that of its
leading competitor. At α = 0.10, is there enough evidence to support the manufacturer’s claim?
The claim is “the watt usage of the manufacturer’s 17-inch flat panel monitors is less than that of its leading competitor.”
↓
Ho : µ m ≥ µ c or, Ho : µ m – µ c ≥ 0 or, Ho : µ d ≥ 0
Ha: µm < µc (claim) Ha : µ m – µ c < 0 Ha : µ d < 0
1. Step 1:
H0: μ1 – μ2 = 0 or , H0: μ1 – μ2 ≤ 0 or, H0: μ1 – μ2 ≥ 0
Ha: μ1 – μ2 ≠ 0 (two tailed test). Ha: μ1 – μ2 > 0 (right tailed test). Ha: μ1 – μ2 < 0 (left tailed test).
2. Step 2: Bell curve figure (two tails or one tail) and Table 3 (find Z critical from the Z Table).
Example:
Samples of size 6 were drawn independently from two normal populations. These data are listed below.
Test to determine whether the means of the two populations differ. (Use α = 0.01)
Sample    Sample mean x̄    Population standard deviation σ    Sample size n
1         7.83             3.06                               6
2         8.50             2.95                               6
[Figure: standard normal curve with the acceptance region between the critical values Z = –2.58 and Z = +2.58]
                              Sample 1    Sample 2
Mean 7.833333 8.5
Known Variance 9.3636 8.7025
Observations 6 6
Hypothesized Mean Difference 0
z -0.3842
P(Z<=z) one-tail 0.350417
z Critical one-tail 2.326348
P(Z<=z) two-tail 0.700834
z Critical two-tail 2.575829
Alternatively, using the p-value (from Excel): p-value = 0.700834 > α = 0.01, so we do not reject H0: µ1 = µ2.
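The z statistic and two-tail p-value in the Excel output can be reproduced with a short sketch. The function name `two_sample_z` is an assumption made for illustration; the standard normal CDF is built from `math.erf`, and the inputs are the means, known variances and sample sizes listed in the output above.

```python
import math


def two_sample_z(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """z test for mu1 - mu2 when the population variances are known."""
    std_error = math.sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2 - hypothesized_diff) / std_error
    # Standard normal CDF evaluated at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    p_two_tail = 2.0 * min(cdf, 1.0 - cdf)
    return z, p_two_tail


z, p_value = two_sample_z(7.833333, 9.3636, 6, 8.5, 8.7025, 6)
alpha = 0.01
print(f"z = {z:.4f}, two-tail p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```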
Test statistic for μ1 – μ2 when Equal Variances ($s_1^2 = s_2^2$), (n1 = n2 → same sizes)
1). Step 1:
H0: μ1 – μ2 = 0 or, H0: μ1 – μ2 ≤ 0 or, H0: μ1 – μ2 ≥ 0
Ha: μ1 – μ2 ≠ 0 (two tailed test). Ha: μ1 – μ2 > 0 (right tailed test). Ha: μ1 – μ2 < 0 (left tailed test).
Degrees of freedom ν = n1 + n2 – 2
Pooled variance estimator: $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, Note: (μ1 – μ2) = 0
Direct           Broker
n1 = 50          n2 = 50
x̄1 = 6.63        x̄2 = 3.72
s1² = 37.49      s2² = 43.34
Can we conclude at the 5% significance level that directly purchased mutual funds outperformed funds
bought through brokers?
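A minimal sketch of the pooled-variance calculation for these data follows. It is not the solution from the notes: the function name `pooled_t` is hypothetical, the test is taken to be right-tailed (Ha: μ1 – μ2 > 0, direct outperforms broker), and the critical value 1.660 is the t.050 table entry at 100 degrees of freedom, the closest listed row to ν = 98.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variances t statistic for mu1 - mu2 under H0: mu1 - mu2 = 0."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, pooled_var


t_stat, df, sp2 = pooled_t(6.63, 37.49, 50, 3.72, 43.34, 50)
T_CRIT = 1.660  # assumed: t.050 at 100 df, closest listed row to 98 df

print(f"pooled variance = {sp2:.3f}, df = {df}, t = {t_stat:.3f}")
reject_h0 = t_stat > T_CRIT  # right-tailed rejection rule
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the statistic is about 2.29, which exceeds 1.660, so H0 would be rejected.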
Test statistic for μ1 – μ2 when Unequal Variances ($s_1^2 \ne s_2^2$), (n1 ≠ n2 → different sizes)
1). Step 1:
H0: μ1 – μ2 = 0 or, H0: μ1 – μ2 ≤ 0 or, H0: μ1 – μ2 ≥ 0
Ha: μ1 – μ2 ≠ 0 (two tailed test). Ha: μ1 – μ2 > 0 (right tailed test). Ha: μ1 – μ2 < 0 (left tailed test).
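Degrees of freedom: $\nu = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, Note: (μ1 – μ2) = 0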
1) Offspring      2) Outsider
n1 = 42           n2 = 98
x̄1 = –0.10        x̄2 = 1.24
s1² = 3.79        s2² = 8.03
Test the hypothesis at the α = 5% significance level.
---♦---
Conclusion: There is sufficient evidence to infer that the mean changes in operating income differ.
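The unequal-variances statistic and its degrees of freedom for the offspring and outsider samples can be checked with the sketch below. The function name `unequal_variances_t` is hypothetical, the test is assumed to be two-tailed (matching the conclusion that the means differ), and the critical value 1.982 is the t.025 table entry at 110 degrees of freedom, the listed row closest to the computed ν.

```python
import math


def unequal_variances_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and degrees of freedom when variances are not pooled."""
    a = var1 / n1
    b = var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


t_stat, df = unequal_variances_t(-0.10, 3.79, 42, 1.24, 8.03, 98)
T_CRIT = 1.982  # assumed: t.025 at 110 df, closest listed row

print(f"t = {t_stat:.3f}, df = {df:.2f}")
reject_h0 = abs(t_stat) > T_CRIT  # two-tailed rejection rule
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The statistic comes out near –3.23 with ν near 110.7, which is consistent with the stated conclusion.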
---♦---
$s_D$ = standard deviation of the differences for the paired sample data: $s_D = \sqrt{\dfrac{\sum (d - \bar{d})^2}{n - 1}}$
---♦---
Test statistic for μD (matched pairs) → dependent samples → n1 = n2 (same sample sizes)
The mean of the population of differences is μD = μ1 – μ2
------
• Pre-step: Identify the statistics relevant to this test, such as the sample size n. Calculate the difference d
(by subtraction) within each pair, the sample mean of the differences $\bar{d} = \bar{x}_D = \bar{x}_1 - \bar{x}_2$, and
the sample standard deviation of the differences $s_D = \sqrt{\dfrac{\sum (d - \bar{d})^2}{n - 1}}$.
1. Step 1:
H0: μD = 0    or, H0: μD ≤ 0    or, H0: μD ≥ 0
Ha: μD ≠ 0 (two tailed test).    Ha: μD > 0 (right tailed test).    Ha: μD < 0 (left tailed test).
At the 0.05 significance level, can we conclude that fewer defects are produced in the afternoon shift?
(Or, equivalently, at the 0.05 significance level, can we conclude that more defects are produced in the morning shift?)
Solution:
Pre-step: First, we calculate the difference d between the “morning shift” and “afternoon shift” values for each day, and
the total of the differences.
Day       Morning shift    Afternoon shift    d
1         10               8                  10 – 8 = 2
2         12               9                  12 – 9 = 3
3         15               12                 15 – 12 = 3
4         11               15                 11 – 15 = –4
Totals                                        ∑d = 4
Calculate the average of the differences: $\bar{d} = \dfrac{\sum d}{n} = \dfrac{4}{4} = 1$
And then, continue the table:
Day       Morning shift    Afternoon shift    d                d – d̄             (d – d̄)²
1         10               8                  10 – 8 = 2       2 – 1 = 1         (1)² = 1
2         12               9                  12 – 9 = 3       3 – 1 = 2         (2)² = 4
3         15               12                 15 – 12 = 3      3 – 1 = 2         (2)² = 4
4         11               15                 11 – 15 = –4     (–4) – 1 = –5     (–5)² = 25
Totals                                        ∑d = 4           ∑(d – d̄) = 0      ∑(d – d̄)² = 34
Calculate the standard deviation of the differences: $s_D = \sqrt{\dfrac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\dfrac{34}{4 - 1}} = \sqrt{11.3333333} = 3.3665$
Degrees of Freedom    t.100    t.050    t.025    t.010    t.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
[Figure: t curve with the acceptance region to the left of the critical value t = +2.353]
3. Step 3: Test statistic: $t = \dfrac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}} = \dfrac{1 - 0}{3.3665 / \sqrt{4}} = \dfrac{1}{1.68325} = 0.5941$
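The matched-pairs calculation can be reproduced from the raw shift counts. This sketch assumes a right-tailed test (Ha: μD > 0, more defects in the morning) and uses the critical value 2.353 from the t.050 column at 3 degrees of freedom shown above; the variable names are chosen here for illustration.

```python
import math
import statistics

morning = [10, 12, 15, 11]
afternoon = [8, 9, 12, 15]

# Difference within each pair (morning minus afternoon)
d = [m - a for m, a in zip(morning, afternoon)]
d_bar = statistics.mean(d)   # mean of the differences
s_d = statistics.stdev(d)    # sample standard deviation, divisor n - 1
n = len(d)

# Test statistic with hypothesized mean difference 0
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

T_CRIT = 2.353  # t.050 at n - 1 = 3 degrees of freedom
reject_h0 = t_stat > T_CRIT
print(f"d_bar = {d_bar}, s_d = {s_d:.4f}, t = {t_stat:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Since the statistic (about 0.594) falls below 2.353, this sketch does not reject H0.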
Second method:
CHAPTER SUMMARY:
Exercises: 13.2, 13.4, 13.8, 13.10, 13.12, 13.16, 13.88, 13.90, 13.92a
STAT 5002 Chapter 16 Prof. Tan Le
Chapters 16 and 4
SIMPLE LINEAR REGRESSION AND CORRELATION
General Form of Linear Regression Equation:
y = a + bx or, ŷ = b0 + b1x
Where, a (or b0) is called y-intercept
b (or b1) is called slope (or regression coefficient) for x.
x is called independent variable.
y is called dependent variable.
---♦---
Objective:
Analyze the relationship between two variables x (independent variable) and y (dependent variable).
Regression analysis is used to predict the value of one variable (y) on the basis of the other variable (x).
---♦---
Coefficient of Correlation: r
The linear correlation coefficient r, measures the strength of the linear relationship between the
paired x and y values in a sample. (The linear correlation coefficient is sometimes referred to as the
Pearson product moment correlation coefficient in honor of Karl Pearson (1857-1936), who originally
developed it.)
Correlation coefficient: Generally ranging between –1.00 and +1.00, a number in which both the
strength and direction of correlation are expressed.
---♦---
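Least Squares method: ŷ = b0 + b1x, or ŷ = a + bx
Slope b1 (or b): $b_1 = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
y-intercept b0 (or a): $b_0 = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$
or, $b_0 = \bar{y} - b_1\bar{x} = \dfrac{\sum y}{n} - b_1\dfrac{\sum x}{n}$.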
---♦---
Solution:
Day      X         Y           XY                       X²            Y²
1        7         23.80       (7)(23.80) = 166.60      (7)² = 49     (23.80)² = 566.44
2        3         11.89       (3)(11.89) = 35.67       (3)² = 9      (11.89)² = 141.37
3        2         15.98       -                        -             -
4        5         26.11       -                        -             -
5        8         31.79       -                        -             -
6        11        39.93       -                        -             -
7        5         12.27       -                        -             -
8        15        40.06       -                        -             -
9        3         21.38       -                        -             -
10       6         18.65       (6)(18.65) = 111.90      (6)² = 36     (18.65)² = 347.82
Total    ∑X = 65   ∑Y = 241.86 ∑XY = 1896.62            ∑X² = 567     ∑Y² = 6810.20
$b_0 = \dfrac{\sum y}{n} - b_1\dfrac{\sum x}{n} = \dfrac{241.86}{10} - 2.246\left(\dfrac{65}{10}\right) = 24.186 - 14.599 = 9.587$
Interpret:
The least squares line is: ŷ = 9.59 + 2.25 x
The y-intercept is b0 = 9.59, which means that, if the number of tools x = 0, the least squares line would
intersect the y-axis at 9.59.
The slope is b1 = 2.25, which means that in this sample, for each 1-unit increase in the number of
tools, the marginal increase in the electricity cost is 2.25.
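The slope and intercept can be recomputed directly from the ten (X, Y) pairs in the table. The sketch below uses the sums-based formulas for b1 and b0 given earlier; the correlation coefficient r is computed with the standard computational formula for Pearson's r, which is not written out in these notes and is included here as an assumption.

```python
import math

# Number of tools (X) and electricity cost (Y) from the table above
x = [7, 3, 2, 5, 8, 11, 5, 15, 3, 6]
y = [23.80, 11.89, 15.98, 26.11, 31.79, 39.93, 12.27, 40.06, 21.38, 18.65]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Least squares slope and y-intercept
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Pearson correlation coefficient (standard computational formula)
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}, r = {r:.4f}, r^2 = {r * r:.4f}")
```

Carrying full precision gives b1 ≈ 2.246 and b0 ≈ 9.588, which round to the 2.25 and 9.59 used in the least squares line.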
---♦---
There is a strong positive correlation between the number of tools and electricity cost (since r = 0.8711 is
close to +1). → More tools, higher electricity cost.
Note: The linear correlation coefficient r and the slope have the same sign.
If the slope is (+), the correlation coefficient r is (+)
If the slope is (–), the correlation coefficient r is (–).
---♦---
Example: From Examples 4.17 and 4.18 above, and given the number of tools x = 10.
Since r = 0.8711 (significant), the best predicted electricity cost at x = 10 is
ŷ = 9.59 + 2.25 x
= 9.59 + 2.25(10)
= 9.59 + 22.50
= 32.09
Excel Solution:
[Figure: Scatter Diagram of Number of Tools and Electricity Cost. x-axis: Number of tools; y-axis: Electricity cost. Fitted line: y = 2.245x + 9.587, R² = 0.7588, r = 0.8711]
Coefficient of Determination R²
The coefficient of determination is the amount of the variation in y that is explained by the regression
line. It is computed as $r^2 = \dfrac{\text{explained variation}}{\text{total variation}} = \dfrac{SSR}{SS_{Total}}$
---♦---
The coefficient of determination r2, measures the amount of variation in the dependent variable that is
explained by the variation in the independent variable.
---♦---
This tells us that 75.88% of the variation in electrical costs is explained by the number of tools.
The remaining 24.12% (100% – 75.88% = 24.12%) is unexplained due to the other factors.
Years of Experience, x 1 2 3 4 5 6
Annual Bonus, y 6 1 9 5 17 12
The graph of the regression equation is called the regression line (or line of best fit, or least square line)
[Figure: scatter plot of Bonus, y against Years of Experience, x, with the fitted regression line]
---♦----
2nd method:
Regression Statistics
Multiple R        0.700695581
R Square          0.490974298
Adjusted R²       0.363717872
Standard Error    4.502909113
Observations      6
ANOVA
              df    SS             MS          F           Significance F
Regression    1     78.22857143    78.22857    3.858149    0.120968
Residual      4     81.1047619     20.27619
Total         5     159.3333333
Standard Error of Estimate: $s_\varepsilon = \sqrt{\dfrac{SSE}{n - 2}} = \sqrt{\dfrac{81.1047619}{6 - 2}} = \sqrt{20.27619048} = 4.502909 = 4.503$
Coefficient of Determination: $r^2 = \dfrac{SSR}{SS_{Total}} = \dfrac{78.22857143}{159.3333333} = 0.490974298 = 0.491$
This tells us that 49.1% of the variation in annual bonus, y, is explained by the number of years of
experience, x. The remaining 50.9% (100% – 49.1% = 50.9%) is unexplained and due to other factors.
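The sums of squares in the ANOVA table can be rebuilt from the six (x, y) pairs listed earlier. This sketch fits the least squares line using the deviation form of the slope (algebraically equivalent to the sums-based formula in these notes) and then computes SSE, SSR, the standard error of estimate and r²; the variable names are assumptions made for readability.

```python
import math

x = [1, 2, 3, 4, 5, 6]      # years of experience
y = [6, 1, 9, 5, 17, 12]    # annual bonus

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit (deviation form of the slope)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * a for a in x]

sse = sum((b - f) ** 2 for b, f in zip(y, fitted))   # unexplained variation
ss_total = sum((b - y_bar) ** 2 for b in y)          # total variation
ssr = ss_total - sse                                 # explained variation

s_e = math.sqrt(sse / (n - 2))   # standard error of estimate
r2 = ssr / ss_total              # coefficient of determination

print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SSTotal = {ss_total:.4f}")
print(f"s_e = {s_e:.4f}, r^2 = {r2:.4f}")
```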
---♦---
The null hypothesis specified that there is no relationship, which means that the slope is 0.
Example:
Test to determine whether there is enough evidence in the Excel output above to infer that there is a
linear relationship between the years of experience and the annual bonus. Use 5% significance level.
Solution:
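H0: β1 = 0 There is no relationship between the years of experience and the annual bonus.
H1: β1 ≠ 0 There is a relationship between the years of experience and the annual bonus. (claim)
p = 0.120968 > α = 0.05
→ Do not reject H0: β1 = 0. There is not enough evidence at the 5% significance level to infer a linear
relationship between the years of experience and the annual bonus.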
Example 16.2: Car price and odometer. (for all 3-year-old Toyota Camry)
Car Odometer (1000 mi) Price ($1000)
1 37.4 14.6
2 44.8 14.1
3 45.8 14.0
98 33.2 14.5
99 39.2 14.7
100 36.4 14.3
[Figure: scatter plot of price against Odometer (1000 mi), with fitted line y = -0.0669x + 17.249, R² = 0.6483, r = –0.8052]
The slope is b1 = –0.0669, which means that in this sample, for each 1-unit (1000-mile) increase in the
odometer reading, the marginal decrease in the price is $0.0669 thousand, or $66.90.
2nd method:
Regression Statistics
Multiple R 0.8052
R Square 0.6483
Adjusted R Square 0.6447
Standard Error 0.3265
Observations 100.0000
ANOVA
              df    SS    MS    F    Significance F
Regression 1.0000 19.2556 19.2556 180.642989 0.0000
Residual 98.0000 10.4463 0.1066
Total 99.0000 29.7019
                              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept                     17.2487         0.1821            94.7250     0.000000    16.8874      17.6101
Odometer (1000 mi) (slope)    –0.0669         0.0050            -13.4403    0.000000    -0.0767      -0.0570
---♦---
R² = 0.6483
This tells us that 64.83% of the variation in car price is explained by the odometer reading. The remaining
35.17% (100% – 64.83% = 35.17%) is unexplained and due to other factors.
$r = -\sqrt{r^2} = -\sqrt{0.6483} = -0.8052$ (negative because the slope b1 is negative)
This tells us there is a strong negative correlation between the odometer reading and the price. (Higher
odometer reading, lower price.)
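The regression statistics for this example can be derived from the ANOVA sums of squares alone. The sketch below is illustrative: it takes SSR, SSE, SSTotal and the degrees of freedom from the Excel output, and gives r the sign of the slope via `math.copysign`, following the rule that r and the slope share a sign.

```python
import math

# Values from the ANOVA table in the Excel output
ssr, sse, ss_total = 19.2556, 10.4463, 29.7019
df_regression, df_residual = 1, 98
slope = -0.0669  # coefficient of Odometer (1000 mi)

r2 = ssr / ss_total                       # coefficient of determination
r = math.copysign(math.sqrt(r2), slope)   # r takes the sign of the slope

msr = ssr / df_regression                 # mean square for regression
mse = sse / df_residual                   # mean square for residuals
f_stat = msr / mse
s_e = math.sqrt(mse)                      # standard error of estimate

print(f"R^2 = {r2:.4f}, r = {r:.4f}")
print(f"F = {f_stat:.2f}, standard error = {s_e:.4f}")
```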
One Tail Test: (We have to divide the two tail p-value by 2)
Test to determine whether there is enough evidence in the Excel output above to infer that there is a
negative linear relationship between the price and the odometer reading for all 3-year-old Toyota
Camry. Use a 5% significance level.
Solution:
H0: β1 = 0 (There is no linear relationship)
H1: β1 < 0 (There is a negative linear relationship)
$\dfrac{p}{2} = \dfrac{0.00000}{2} = 0.00000 < \alpha = 0.05$
→ Reject H0: β1 = 0, support H1: β1 < 0.
A negative linear relationship exists at the 5% significance level. (Higher odometer, lower price.)
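Halving the two-tail p-value is only appropriate when the sign of the t statistic agrees with the direction of the alternative hypothesis. The helper below, `one_tailed_p`, is a hypothetical sketch of that rule, applied to the slope row of the Excel output; when the sign disagrees it returns one minus half the two-tail value.

```python
def one_tailed_p(p_two_tailed, t_stat, alternative):
    """Convert a two-tail p-value to a one-tail p-value.

    alternative: "less" for H1: beta1 < 0, "greater" for H1: beta1 > 0.
    """
    half = p_two_tailed / 2
    if alternative == "less":
        agrees = t_stat < 0
    else:
        agrees = t_stat > 0
    # If the statistic points the wrong way, the one-tail p-value is large.
    return half if agrees else 1 - half


alpha = 0.05
p_one = one_tailed_p(0.000000, -13.4403, "less")
reject_h0 = p_one < alpha
print(f"one-tail p-value = {p_one:.5f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```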
---♦---
Note: If r² is given, then $r = \pm\sqrt{r^2}$.
The slope b1 and the linear correlation coefficient (r value) have the same sign.
EXERCISE #10
1. Are the marks one receives in a course related to the amount of time spent studying the
subject? To analyze this mysterious possibility a student took a random sample of 10 students
who had enrolled in an accounting class last semester. He asked each to report his or her mark
in course and the total number of hours spent studying accounting. These data are listed here.
(text #4.66 textbook)
Time Spent Studying 40 42 37 47 25 44 41 48 35 28
Marks 77 63 79 86 51 78 83 90 65 47
a) What are the dependent and independent variables?
b) What is the coefficient of determination (r²)? Interpret its meaning.
c) What is the correlation coefficient (r)? Comment on the strength of the correlation.
d) Determine the least squares line ŷ = b0 + b1x.
e) Predict the mark if the time spent studying is 45 hours.
---♦---
2. Attempting to analyze the relationship between advertising and sales, the owner of a furniture
store recorded the monthly advertising budget ($ thousands) and the sales ($ million) for a
sample of 12 months. The data are listed here: (text #16.2 textbook)
Advertising 23 46 60 54 28 33 25 31 36 88 90 99
Sales 9.6 11.3 12.8 9.8 8.9 12.5 12.0 11.4 12.6 13.7 14.4 15.9
a) Draw a scatter diagram. Does it appear that advertising and sales are linearly related?
b) Calculate the least squares line and interpret the coefficients.
---♦---
3. An economist believes that the price is the biggest factor affecting quantity sold. To support his
argument he collects 30 data points relating quantity sold and price.
Suppose the regression line for the data is ŷ = 45 – 0.28 x and r² = 0.58.
STAT 5002 Chapter 17 Prof. Tan Le
Chapter 17
MULTIPLE REGRESSION
A multiple regression equation expresses a linear relationship between a dependent variable Y and two
or more independent variables (x1, x2, x3,…,xk).
AUTOCORRELATION:
In multiple regression models, successive observations of the dependent variable are supposed to be
uncorrelated. Violation of this assumption often occurs when data are correlated successively over a
period of time. This type of correlation is called autocorrelation.
---♦---
Coefficient of Determination R²
The coefficient of determination is the amount of the variation in y that is explained by the regression
line. It is computed as $r^2 = \dfrac{\text{explained variation}}{\text{total variation}} = \dfrac{SSR}{SS_{Total}}$
HOMOSCEDASTICITY:
The variation around the regression equation is the same for all the values of the independent variables.
[Figure: residual plot, Residuals versus Floor]
HETEROSCEDASTICITY
[Figure: residual plot, Residuals versus DEPTH]
---♦---
Multicollinearity:
Multicollinearity exists when the independent variables are correlated. There are two main symptoms of
multicollinearity:
a) The model as a whole is significant but no variables are significant.
b) The coefficient for a variable can have the opposite sign of what it should.
---♦---
MULTICOLLINEARITY
In multiple regression models, correlation among independent variables can potentially distort the
standard error of estimate and may therefore lead to incorrect conclusions as to which independent
variables are statistically significant. Such correlation is called multicollinearity.
Example: The Table below contains the measurements from anaesthetized bears. Using all the
independent variables x1 through x6, find the multiple regression by Excel.
Examining the results of the ANOVA table, if we use α = 0.05, we find that the model as a whole is
significant with a P-value of 0.0646, though barely so. However, when we examine the P-values of each
of the independent variables, Age has the lowest one at 0.1388. Again using α = 0.05, we conclude that
none of the variables is significant. This is due to the presence of multicollinearity.
---♦---
Describing the p-value
If the p-value ≤ significance level α, there is strong evidence to infer that the alternative hypothesis Ha is
true. The result is deemed to be significant. → Reject the null hypothesis H0.
When the p-value > significance level α, there is no evidence to infer that the alternative hypothesis Ha is
true. We also say that the test is not significant. → Do not reject (accept) the null hypothesis H0.
---♦---
Solution:
Step 1:
Ho: β1 = β2 = β3 = β4 = β5 = β6 = 0
Ha: At least one of the regression coefficients βi is not equal to 0.
At the 5% significance level, there is at least one significant predictor of bear weight. The model is
significant overall.
(A linear relationship exists at the 5% significance level.)
Question:
Predict the bear weight (lb.) if a bear is 50 months old, and has a head-length measuring 16 inches, a
head-width measuring 8 inches, a neck measuring 27 inches, a body-length measuring 70 inches, and a
chest measuring 40 inches around.
Solution:
ŷ = – 216.4152 – 1.3720 x1 – 4.4894 x2 + 6.8528 x3 + 19.4721 x4 – 2.9307 x5 + 6.7196 x6
= – 216.4152 – 1.3720(50) – 4.4894(16) + 6.8528(8) + 19.4721(27) – 2.9307(70) + 6.7196(40)
= 287.3585 pounds
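The substitution above is a sum of coefficient-times-value products plus the intercept, which can be checked as follows. The ordering of the predictors (x1 = age, x2 = head length, x3 = head width, x4 = neck, x5 = body length, x6 = chest) is inferred from the order in which the question lists the measurements and is an assumption of this sketch.

```python
intercept = -216.4152
coefficients = [-1.3720, -4.4894, 6.8528, 19.4721, -2.9307, 6.7196]

# Assumed order: age (months), head length, head width, neck,
# body length, chest (all lengths in inches)
bear = [50, 16, 8, 27, 70, 40]

predicted_weight = intercept + sum(b * x for b, x in zip(coefficients, bear))
print(f"Predicted weight = {predicted_weight:.4f} pounds")
```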
---♦---
CHAPTER 17 SUMMARY
Exercises
STAT 5002 Chapter 20 Prof. Tan Le
Chapter 20:
TIME SERIES ANALYSIS AND FORECASTING
Any variable that is measured over time in sequential order is called a time series.
Forecasting is a common practice among managers and government decision makers.
20.1 TIME SERIES COMPONENTS
Time Series Components:
1. Long-term trend
2. Seasonal variation
3. Cyclical variation
4. Random variation.
2. Seasonal variation: Seasonal variation refers to cycles that occur over short, repetitive calendar periods and,
by definition, have a duration of less than a year.
3. Cyclical variation: Cyclical variation is a wavelike pattern describing a long-term trend that is
generally apparent over a number of years, resulting in a cyclical effect.
4. Random variation: Random variation is caused by irregular and unpredictable changes in a time
series that are not caused by any other components.
Mean Absolute Percent Error (MAPE): The MAPE expresses the error as a percentage of the actual
values.
$MAPE = \dfrac{\sum \dfrac{|\text{Forecast error}|}{\text{Actual}}}{T} \times 100\% = \dfrac{100\%}{T} \sum_{t=1}^{T} \dfrac{|A_t - F_t|}{A_t}$
Note: The smaller the error, the better the forecast.
---♦---
Mean Absolute Deviation (Mean Absolute Error): $MAD = MAE = \dfrac{\sum |\text{Forecast error}|}{T} = \dfrac{18.0}{9} = 2.000$
Mean Squared Error: $MSE = \dfrac{\sum (\text{Forecast error})^2}{T} = \dfrac{54.666}{9} = 6.074$
Mean Absolute Percent Error: $MAPE = \dfrac{\sum \dfrac{|\text{Forecast error}|}{\text{Actual}}}{T} = \dfrac{0.956}{9} = 0.1061 = 10.61\%$
---♦♦---
k-period weighted moving average = $\dfrac{\sum_{i=1}^{k} (\text{weight for period } i)(\text{actual value in period } i)}{\sum_{i=1}^{k} (\text{weights})}$
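A weighted moving average can be sketched in a few lines. The example below is hypothetical: the weights 1, 2 and 3 (heaviest on the most recent month) are chosen only for illustration, and the sales figures are the January to March values from the Wallace Garden Supply example later in this chapter.

```python
def weighted_moving_average(values, weights):
    """Weighted average of the k most recent values (oldest first)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


recent_sales = [10, 12, 16]  # Jan, Feb, Mar actual sales
weights = [1, 2, 3]          # hypothetical weights, most recent month heaviest

forecast_april = weighted_moving_average(recent_sales, weights)
print(f"3-period weighted moving average forecast: {forecast_april:.3f}")
```

With these illustrative weights the forecast is (10 + 24 + 48)/6, or about 13.667.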
---♦---
EXPONENTIAL SMOOTHING
Exponential smoothing is also a type of moving-average model.
Forecast for period (t+1) = forecast for period t + α(actual value in period t – forecast for period t)
Ft+1 = Ft + α (At – Ft)
or, Ft+1 = α At + (1 – α) Ft
where
Ft = value of the exponentially smoothed series being computed in time period t
Ft+1 = value of the exponentially smoothed series being computed in time period t + 1
At = actual value of the time series in period t
α = subjectively assigned weight or smoothing constant (where 0 ≤ α ≤ 1)
F1 = A1
Example: Wallace Garden Supply Example
Month   Actual Sales (A)   Forecast F (α = 0.1)             Error = A – F   |Error|   (Error)²   |Error|/Actual
Jan     10                 10.00 (assumed)                  -               -         -          -
Feb     12                 10.0 +.1(10-10.0) = 10.00        2.000           2.000     4.000      0.167
Mar     16                 10.0 +.1(12-10.0) = 10.20        5.800           5.800     33.640     0.364
Apr     13                 10.2 +.1(16-10.2) = 10.78        2.220           2.220     4.928      0.171
May     17                 10.8 +.1(13-10.8) = 11.00        5.998           5.998     35.976     0.353
Jun     19                 11.0 +.1(17-11.0) = 11.60        7.398           7.398     54.733     0.389
Jul     15                 11.6 +.1(19-11.6) = 12.34        2.658           2.658     7.067      0.177
Aug     20                 12.3 +.1(15-12.3) = 12.61        7.393           7.393     54.650     0.370
Sep     22                 12.6 +.1(20-12.6) = 13.35        8.653           8.653     74.879     0.393
Oct     19                 13.4 +.1(22-13.4) = 14.21        4.788           4.788     22.925     0.252
Nov     21                 14.2 +.1(19-14.2) = 14.69        6.309           6.309     39.806     0.300
Dec     19                 14.7 +.1(21-14.7) = 15.32        3.678           3.678     13.529     0.194
Totals                                                                      56.89     346.137    3.13
Forecast for Jan of following year: F13 = F12 + α (A12 – F12) = 15.32 + 0.1(19 – 15.32) = 15.69
Mean Absolute Deviation (Mean Absolute Error): MAD = MAE = (∑|Error|)/11 = 56.89/11 = 5.172
MSE = (∑(Error)²)/11 = 346.137/11 = 31.467
MAPE = [∑(|Error|/Actual)]/11 = 3.13/11 = 0.2844 = 28.44%
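The Wallace Garden Supply calculation can be reproduced with a short Python sketch. The function names are assumptions made for illustration. The data, α = 0.1, the starting value F1 = A1, and the choice to average the errors over the 11 months from February to December come from the example above.

```python
def exponential_smoothing(actual, alpha):
    """Return forecasts F_1 .. F_(T+1), with F_1 = A_1 and
    F_(t+1) = F_t + alpha * (A_t - F_t)."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    forecasts = [float(actual[0])]          # F_1 = A_1
    for a in actual:
        f = forecasts[-1]
        forecasts.append(f + alpha * (a - f))
    return forecasts


def forecast_accuracy(actual, forecasts, skip=1):
    """MAD, MSE and MAPE (as a fraction), ignoring the first `skip` periods."""
    pairs = list(zip(actual, forecasts))[skip:]
    n = len(pairs)
    mad = sum(abs(a - f) for a, f in pairs) / n
    mse = sum((a - f) ** 2 for a, f in pairs) / n
    mape = sum(abs(a - f) / a for a, f in pairs) / n
    return mad, mse, mape


# Monthly sales, January to December, from the table above.
sales = [10, 12, 16, 13, 17, 19, 15, 20, 22, 19, 21, 19]
forecasts = exponential_smoothing(sales, alpha=0.1)
mad, mse, mape = forecast_accuracy(sales, forecasts)

print(f"Forecast for next January: {forecasts[-1]:.2f}")
print(f"MAD = {mad:.3f}, MSE = {mse:.3f}, MAPE = {mape:.2%}")
```

The code carries full precision from month to month, so intermediate values can differ in the last digit from a table built with rounded forecasts.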
Relationship between α and k: $\alpha = \dfrac{2}{k+1}$, or $k = \dfrac{2}{\alpha} - 1$
Note:
If we desire only to smooth a series by eliminating unwanted cyclical and irregular variations, we should
select a small value for α (close to 0).
On the other hand, if our goal is forecasting, we should choose a large value for α (close to 1).
---♦---
CHAPTER SUMMARY
(Homework:)
The following data represent the annual sales (in millions of dollars) for a food-processing
company for the years 1993 – 2014.
a) Fit a 3-year moving average to the data and plot the results on your chart.
b) Using a smoothing coefficient of w = 0.25, exponentially smooth the series.
c) What is your exponentially smoothed forecast for the trend in 2015?
STATISTICS Chapter 11 & Chapter 12(Part II) Prof. Tan Le
Chapter 11:
INTRODUCTION TO HYPOTHESIS TESTING
11.1 CONCEPT OF HYPOTHESIS TESTING
The null hypothesis (denoted by H0) is a statement about the value of a population parameter (such
as the mean), and it must contain the condition of equality and must be written with the symbol =, ≤,
or ≥. For the mean, the null hypothesis will be stated in one of these three possible forms:
H0 : µ = k H0 : µ ≤ k H0 : µ ≥ k
We test the null hypothesis directly in the sense that we assume it is true and reach a conclusion to either
reject H0 or fail to reject H0.
The alternative hypothesis (denoted by H1 or Ha) is the statement that must be true if the null
hypothesis is false. For the mean, the alternative hypothesis will be stated in only one of these three
possible forms:
Ha : µ ≠ k Ha : µ > k Ha : µ < k
(two tailed test) (right tailed test) (left tailed test)
Note that H1 (or Ha) is the opposite of H0.
Compare Test Statistic and the Critical Value from the Table:
If the test statistic (value) is in the rejection region, we reject the null hypothesis H0 and conclude that
there is enough statistical evidence to infer that the alternative hypothesis is true.
If the test statistic (value) is not in the rejection region, we do not reject the null hypothesis H0 and
conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.
---♦---
Standardized Test Statistic: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$, or $z = \dfrac{\bar{x} - \mu}{\sigma} \sqrt{n}$
---♦---
p-value: The p-value of a test is the probability of observing a test statistic at least as extreme as the one
computed given that the null hypothesis is true.
When the p-value > significance level α, there is no evidence to infer that the alternative hypothesis Ha is
true. We also say that the test is not significant. → Do not reject (accept) the null hypothesis H0.
---♦---
Note: When we state the hypotheses, we list the alternative hypothesis Ha first followed by the null hypothesis Ho.
---♦---
Example: A consumer analysis reports that the mean life of a certain type of automobile battery is not 72 months.
↓
Note: The claim “the mean … is not 72 months” can be written as µ ≠ 72 months.
Its complement is µ = 72 months. Because µ = 72 months contains the statement of equality, it becomes the null hypothesis.
In this case, alternative hypothesis represents the claim.
Ho: µ = 72 months
Ha: µ ≠ 72 months. (claim)
[Figure: two-tailed test; the sign used in Ha is ≠, with the accepted region in the middle]
---♦---
Example: The college cafeteria claims that the average amount spent by a student per visit is $3.50.
↓
Note: The claim “the average … is 3.50” can be written as µ = $3.50.
Its complement is µ ≠ $3.50. Because µ = $3.50 contains the statement of equality, it becomes the null hypothesis. In this
case, the null hypothesis represents the claim.
You intend to test whether the mean amount spent differs from this amount.
Example: A company advertises that the mean life of its furnaces is more than 15 years.
↓
Note: The claim “the mean … is more than 15 years” can be written as µ > 15 years.
Its complement is µ ≤ 15 years. Because µ ≤ 15 years contains the statement of equality, it becomes the null hypothesis. In
this case, alternative hypothesis represents the claim.
Ho: µ ≤ 15 years
Ha: µ > 15 years (claim)
[Figure: right-tailed test; the sign used in Ha is >, with the rejection region in the right tail and the accepted region to its left]
---♦---
Example: A company claims that its product contains no more than 2 grams of saturated fat on average.
↓
Note: The claim “…product contains no more than 2 grams” can be written as µ ≤ 2 grams.
Its complement is µ > 2 grams. Because µ ≤ 2 grams contains the statement of equality, it becomes the null hypothesis. In this
case, the null hypothesis represents the claim.
You intend to test whether there is strong evidence that the mean saturated fat content is greater than the claimed amount.
Example: A car dealership announces that the mean time for an oil change is less than 20 minutes.
↓
Note: The claim “the mean … is less than 20 minutes” can be written as µ < 20 minutes.
Its complement is µ ≥ 20 minutes. Because µ ≥ 20 minutes contains the statement of equality, it becomes the null hypothesis.
In this case, alternative hypothesis represents the claim.
Ho: µ ≥ 20 minutes
Ha: µ < 20 minutes (claim)
[Figure: left-tailed test; the sign used in Ha is <, with the rejection region in the left tail and the accepted region to its right]
---♦---
Example: The shipping dock receives 40 kg bags of flour to use in the production of baked goods. A sample of
25 bags is taken from each shipment and weighed to ensure that the average weight is at least 40 kg.
↓
Note: The claim “…the average weight is at least 40 kg” can be written as µ ≥ 40 kg.
Its complement is µ < 40 kg. Because µ ≥ 40 kg contains the statement of equality, it becomes the null hypothesis. In this
case, the null hypothesis represents the claim.
You intend to test whether there is strong evidence that the mean weight is less than 40 kg.
Ho: µ ≥ 40 kg (claim)
Ha: µ < 40 kg
1). Step 1. Identify the specific claim (about population mean μ) or hypothesis to be tested and put it in
symbolic form.
The null hypothesis H0 is the one that contains the condition of equality.
The alternative hypothesis Ha (or H1) is the other statement.
2). Step 2. Bell curve figure (two tails or one tail) and Z critical (Z Table).
(OUTWARD method)
[Figure: right-tailed test; accepted region 99%, critical value Z = +2.33]
(Note: a right tail of 1% leaves 99% = 0.99 of the curve to its left. In Table 3 the closest area is
0.9901, which gives Z = +2.33.)
And/or
p-value = P(Z > 3.00) = 0.13% = 0.0013.
p-value = 0.0013, which is less than 0.01. Thus, there is overwhelming evidence to infer that the
alternative hypothesis Ha is true. Reject H0: μ ≤ 750.
---♦---
Solution:
1. Step 1: H0: μ = 17.09
Ha: μ ≠ 17.09 (two tailed test).
[Figure: two-tailed test; reject H0 to the left of Z = –1.96 and to the right of Z = +1.96]
And/or
p-value = P(Z < –1.19) + P(Z > 1.19) = 0.1170 + 0.1170 = 0.2340.
p-value = 0.2340, which is more than 0.05. Thus, there is not enough evidence to infer that there is a
difference between the average AT&T bill and that of its competitor.
Do not reject the null hypothesis H0: μ = 17.09.
(Accept the null hypothesis H0: μ = 17.09)
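The two p-values above, P(Z > 3.00) for the right-tailed test at α = 0.01 and P(Z < –1.19) + P(Z > 1.19) for the two-tailed test at α = 0.05, can be checked without a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_p_value(z, tail):
    """p-value of a z test statistic; tail is 'left', 'right' or 'two'."""
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "two":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")


def decide(p_value, alpha):
    """Decision rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


p_right = z_p_value(3.00, "right")   # right-tailed example, alpha = 0.01
p_two = z_p_value(-1.19, "two")      # two-tailed example, alpha = 0.05

print(f"{p_right:.4f} -> {decide(p_right, 0.01)}")
print(f"{p_two:.4f} -> {decide(p_two, 0.05)}")
```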
1. Step 1: Identify the specific claim (about population mean μ) or hypothesis to be tested and put it in
symbolic form.
The null hypothesis H0 is the one that contains the condition of equality.
The alternative hypothesis Ha (or H1) is the other statement.
3. Step 3: Bell curve figure (two tails or one tail) and Table 4 (t Table).
Degree of freedom = n – 1
2. Step 2: Two tails, α = 0.05 = 5% (each tail = α/2 = 0.05/2 = 0.025).
Degrees of freedom = n – 1 = 162 – 1 = 161.
(Table 4 has no row for 161 degrees of freedom, so use the closest row, df = 160, to find the t value.)
[Figure: two-tailed test; accepted region between t = –1.975 and t = +1.975]
CHAPTER 11 SUMMARY:
Exercises: 11.4, 11.8, 11.10, 11.12, 11.14, 11.36, 11.38, 11.40, 11.42
CHAPTER 12 SUMMARY:
Exercises: 12.10, 12.12, 12.14, 12.16, 12.18, 12.20, 12.22, 12.24, 12.26, 12.28
STAT 5002 PRACTICE #5 (part II) Prof. Tan Le
#1. A random sample of 25 was drawn from a population. The sample mean and standard
deviation are x̄ = 510 and s = 125. Estimate the population mean µ with 98% confidence.
Solution: Given: n = 25, x̄ = 510, s = 125.
[Figure: 98% confidence; α/2 = 1% in each tail, central area 0.98, critical values t = –2.492 and t = +2.492]
Step 2: Standard error = $s_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{125}{\sqrt{25}} = 25.00$
Margin of error = $E = t \cdot s_{\bar{x}} = t \cdot \dfrac{s}{\sqrt{n}} = (2.492)\dfrac{125}{\sqrt{25}} = (2.492)(25) = 62.3$
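The standard error and margin of error above lead directly to the interval x̄ ± E. The sketch below repeats the arithmetic in Python. The function name is an assumption, and the critical value t = 2.492 is taken from the solution above as given rather than computed.

```python
import math


def t_confidence_interval(sample_mean, s, n, t_critical):
    """Standard error, margin of error and interval: sample_mean +/- t * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    margin = t_critical * standard_error
    return standard_error, margin, (sample_mean - margin, sample_mean + margin)


# n = 25, x-bar = 510, s = 125; t = 2.492 is the table value used above (df = 24).
se, margin, (lower, upper) = t_confidence_interval(510, 125, 25, 2.492)
print(f"SE = {se:.2f}, E = {margin:.1f}, interval = ({lower:.1f}, {upper:.1f})")
```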
STAT 5002 PRACTICE #6 Prof. Tan Le
1. A manufacturer of light bulbs advertises that, on average, its long-life bulb will last more
than 5,000 hours. To test the claim, a statistician took a random sample of 100 bulbs and
measured the amount of time until each bulb burned out. The sample mean was calculated as
x̄ = 5,060 hours. If we assume that the lifetime of this type of bulb has a population standard
deviation σ = 350 hours, can we conclude at the 5% significance level that the claim is true?
Solution:
[Figure: accepted region with area 0.95]
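The worked solution is not reproduced here, so the following is only a sketch of how the test could be carried out, assuming the hypotheses are set up by the rules of Chapter 11 (H0: µ ≤ 5,000 and Ha: µ > 5,000, with Ha as the claim, a right-tailed test). The function name is an assumption; all numbers come from the problem statement.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """z test statistic for a mean when the population sigma is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


alpha = 0.05
z = one_sample_z(5060, 5000, 350, 100)
z_critical = NormalDist().inv_cdf(1 - alpha)   # right-tail critical value
p_value = 1 - NormalDist().cdf(z)              # right-tail p-value
reject_h0 = z > z_critical

print(f"z = {z:.3f}, critical z = {z_critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the test statistic lies just beyond the critical value, so the decision is sensitive to the choice of α.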
2. A major bank is concerned about the amount of debt being accrued by customers using its
credit cards. The Board of Directors voted to institute an expensive monitoring system if the
mean credit card debt of all the bank’s customers is $2,000. The bank randomly selected 28
credit-card holders and determined the amount of credit card debt charged. For this sample
group, the sample mean was $2,180 and the sample standard deviation was $800. Use a 5%
level of significance to test the claim that the mean credit card debt is not equal to $2,000.
Solution:
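Since σ is unknown here, a t test with n – 1 = 27 degrees of freedom applies. The sketch below assumes H0: µ = 2,000 against Ha: µ ≠ 2,000 (two-tailed) and uses 2.052 as the t-table critical value for 0.025 in each tail; both the hypotheses and the table lookup are assumptions, since the worked solution is not reproduced here.

```python
import math


def one_sample_t(sample_mean, mu0, s, n):
    """t test statistic for a mean when only the sample s is available."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


# Assumed t-table value for area 0.025 in each tail with df = 27.
T_CRITICAL = 2.052

t_stat = one_sample_t(2180, 2000, 800, 28)
reject_h0 = abs(t_stat) > T_CRITICAL   # two-tailed decision rule

print(f"t = {t_stat:.3f}, critical t = ±{T_CRITICAL}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```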
STAT 5002 PRACTICE #7 (SOLUTION) Prof. Tan Le
Name: ______________________________
ID#: ______________________________
#1. Samples of size 6 were drawn independently from two normal populations. These data are
listed below. Test to determine whether the means of the two populations differ. (Use α = 0.05)
Sample 1: 12, 6, 5, 8, 11, 5 → x̄1 = 7.83, and given σ1 = 3.06
Sample 2: 7, 11, 13, 5, 8, 7 → x̄2 = 8.50, and given σ2 = 2.95
Solution:
[Figure: two-tailed test; accepted region between Z = –1.96 and Z = +1.96]
$z = \dfrac{(7.83 - 8.50) - 0}{\sqrt{\dfrac{3.06^2}{6} + \dfrac{2.95^2}{6}}} = \dfrac{-0.67}{\sqrt{3.01}} = \dfrac{-0.67}{1.734935157} = -0.386$
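The test statistic can be re-computed as a check. The sketch below uses the rounded sample means 7.83 and 8.50 and the given σ values exactly as they appear in the problem; the function name and the two-tailed p-value line are additions for illustration.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z statistic for the difference of two means with known sigmas."""
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


z = two_sample_z(7.83, 8.50, 3.06, 2.95, 6, 6)
p_value = 2 * NormalDist().cdf(-abs(z))   # two-tailed p-value
reject_h0 = abs(z) > 1.96                 # critical values ±1.96 at alpha = 0.05

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Because |z| is well inside ±1.96, the statistic falls in the accepted region.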
---♦---
#2. An advertising agency wants to determine whether a new training program improved an
agent’s ability to undertake an advertising job. A supervisor rated six agents on their ability
before and after the program as follows (high score indicates greater ability).
Before After
Mary 7.6 14.7
John 9.9 14.1
Bill 8.6 11.8
Jim 9.5 16.1
Peter 8.4 14.7
Tom 9.2 14.1
Is there enough evidence to support the claim that the new training program improved an agent’s ability?
Use a 5% level of significance.
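where $\bar{d} = \bar{x}_D = \dfrac{\sum d}{n} = \dfrac{-32.3}{6} = -5.38$
2. Step 2:
$s_D = \sqrt{\dfrac{\sum (D - \bar{D})^2}{n - 1}} = \sqrt{\dfrac{11.67}{6 - 1}} = \sqrt{\dfrac{11.67}{5}} = \sqrt{2.334} = 1.53$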
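[Figure: left-tailed test; rejection region to the left of t = –2.015, accepted region to its right]
4. Step 4: Test statistic: $t = \dfrac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \dfrac{-5.38 - 0}{1.53 / \sqrt{6}} = -8.62$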
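The paired-difference calculation can be verified from the ratings table. The sketch below takes each difference as Before minus After, as in the solution, and uses the critical value –2.015 from the figure. Because it carries full precision instead of the rounded –5.38 and 1.53, the test statistic comes out near –8.63 rather than exactly –8.62.

```python
import math
from statistics import mean, stdev

before = [7.6, 9.9, 8.6, 9.5, 8.4, 9.2]
after = [14.7, 14.1, 11.8, 16.1, 14.7, 14.1]

# Paired differences d = Before - After for each agent.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

d_bar = mean(diffs)     # mean difference
s_d = stdev(diffs)      # sample standard deviation (divides by n - 1)
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

T_CRITICAL = -2.015     # left-tail critical value from the solution (df = 5)
reject_h0 = t_stat < T_CRITICAL

print(f"d-bar = {d_bar:.2f}, s_D = {s_d:.2f}, t = {t_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```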
STAT 5002 Practice #8 (Solution) Prof. Tan Le
REGRESSION (SOLUTION)
1. Attempting to analyze the relationship between advertising and sales, the owner of a furniture
store recorded the monthly advertising budget ($ thousands) and the sales ($ million) for a
sample of 12 months. The data are listed here: (text #16.2)
Advertising 23 46 60 54 28 33 25 31 36 88 90 99
Sales 9.6 11.3 12.8 9.8 8.9 12.5 12.0 11.4 12.6 13.7 14.4 15.9
Suppose the regression line for the data is ŷ = 9.10 + 0.058 x and r = 0.78.
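The stated line and correlation can be checked against the listed data with the usual least-squares sums. This is a sketch with an assumed function name; it computes everything from the definitions instead of relying on a statistics package.

```python
import math

advertising = [23, 46, 60, 54, 28, 33, 25, 31, 36, 88, 90, 99]            # $ thousands
sales = [9.6, 11.3, 12.8, 9.8, 8.9, 12.5, 12.0, 11.4, 12.6, 13.7, 14.4, 15.9]  # $ million


def least_squares(x, y):
    """Intercept, slope and correlation coefficient of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


b0, b1, r = least_squares(advertising, sales)
print(f"y-hat = {b0:.2f} + {b1:.3f} x, r = {r:.2f}")
```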
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.881173563
R Square 0.7764668
Adjusted R Square 0.748525204
Standard Error 7.375607542
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 1511.703307 1511.703 27.7889 0.000753909
Residual 8 435.1966929 54.39959
Total 9 1946.9
                      Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%   Lower 95.0%
Intercept             5.9217458      12.7314594       0.465127   0.654239   -23.4370522    35.28054    -23.4371
Time Spent Studying   1.7048644      0.323410691      5.271515   0.000754   0.959078057    2.450651    0.959078
Solution:
a) Test to determine whether there is enough evidence in the Excel output above to infer that
there is a linear relationship between the years of experience and the annual bonus. Use 5%
significance level.
Solution: Testing the slope b1
Step 2:
                      Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%   Lower 95.0%
Intercept             5.9217458      12.7314594       0.465127   0.654239   -23.4370522    35.28054    -23.4371
Time Spent Studying   1.7048644      0.323410691      5.271515   0.000754   0.959078057    2.450651    0.959078
Step 3: Conclusion
The p-value = 0.000754 < α = 0.05.
Reject H0: β1 = 0
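The figures in the Excel output are tied together by a few identities, which gives a quick way to check the conclusion. In a regression with one predictor, the slope's t statistic is the coefficient divided by its standard error, F equals t squared, and R Square equals SS(Regression) divided by SS(Total). The variable names in this sketch are assumptions; the numbers are copied from the output.

```python
# Values copied from the Excel output above.
b1 = 1.7048644            # slope coefficient
se_b1 = 0.323410691       # standard error of the slope
ss_regression = 1511.703307
ss_total = 1946.9
p_value = 0.000754
alpha = 0.05

t_stat = b1 / se_b1                  # t Stat for the slope
f_stat = t_stat ** 2                 # equals the ANOVA F when there is one predictor
r_square = ss_regression / ss_total  # coefficient of determination
reject_h0 = p_value < alpha          # decision for H0: beta1 = 0

print(f"t = {t_stat:.4f}, F = {f_stat:.4f}, R Square = {r_square:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```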
STAT 5002 PRACTICE #9 Prof. Tan Le
Regression Statistics
Multiple R 0.856184
R Square 0.733051
Adjusted R Square 0.682999
Standard Error 44.738608
Observations 20
ANOVA
df SS MS F Significance F p-value
Regression 3 87941.06119 29313.69 14.64554 0.00007518
Residual 16 32024.68881 2001.543
Total 19 119965.75
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 140.3187 36.1172 3.8851 0.0013 63.7536 216.8839 63.7536 216.8839
Mean Outside Temperature -12.3168 2.6850 -4.5873 0.0003 -18.0087 -6.6249 -18.0087 -6.6249
Attic Insulation (cm) -4.1013 1.7248 -2.3778 0.0302 -7.7578 -0.4449 -7.7578 -0.4449
Age of Furnace (years) 8.5268 3.2954 2.5875 0.0198 1.5409 15.5126 1.5409 15.5126
Solution:
Recall: The population multiple linear regression equation for this problem is:
y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
a) Based on the Excel output, does the multiple regression model appear to be significant overall?
Conduct the test at 5% significance level.
Step 1:
Ho: β1 = β2 = β3 = 0
Ha: At least one of the regression coefficients βi is not equal to 0.
Step 4: Conclusion
Reject H0: β1 = β2 = β3 = 0
(Support H1: At least one of the regression coefficients is not equal to 0.)
At the 5% significance level, there is at least one significant predictor of the home heating cost.
The model is overall significant. (A linear relationship exists at the 5% significance level.)
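The overall F test rests on the ANOVA table, and its entries can be re-derived from the sums of squares. The sketch below recomputes F, R Square and Adjusted R Square from the output and applies the p-value rule; the variable names are assumptions, and the numbers are copied from the Excel output.

```python
# Values copied from the Excel output above.
ss_regression, df_regression = 87941.06119, 3
ss_residual, df_residual = 32024.68881, 16
ss_total, df_total = 119965.75, 19
significance_f = 0.00007518   # p-value of the overall F test
alpha = 0.05

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * df_total / df_residual

reject_h0 = significance_f < alpha   # H0: beta1 = beta2 = beta3 = 0

print(f"F = {f_stat:.5f}, R Square = {r_square:.6f}, Adjusted R Square = {adjusted_r_square:.6f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```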
b) Predict the January heating cost (y) for a home if we know the mean outside temperature for the
month is –8°C (x1), the attic has 7.5 cm of insulation (x2), and the furnace is 6 years old (x3).
Solution:
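The worked answer is not reproduced here, so the following is only a sketch of the substitution, assuming the prediction uses the estimated coefficients from the Excel output: ŷ = 140.3187 – 12.3168 x1 – 4.1013 x2 + 8.5268 x3. The function and variable names are assumptions.

```python
# Estimated coefficients copied from the Excel output above.
INTERCEPT = 140.3187
B_TEMPERATURE = -12.3168   # mean outside temperature (°C)
B_INSULATION = -4.1013     # attic insulation (cm)
B_FURNACE_AGE = 8.5268     # age of furnace (years)


def predict_heating_cost(temperature_c, insulation_cm, furnace_age_years):
    """Point prediction from the fitted multiple regression equation."""
    return (INTERCEPT
            + B_TEMPERATURE * temperature_c
            + B_INSULATION * insulation_cm
            + B_FURNACE_AGE * furnace_age_years)


predicted_cost = predict_heating_cost(-8, 7.5, 6)
print(f"Predicted January heating cost: {predicted_cost:.2f}")
```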
STAT 5002 PRACTICE #10 Prof. Tan Le
c) Compute the exponentially smoothed values using a smoothing coefficient α = 0.25. Find the
forecast for period 7. Round all values to two decimal places. Ft+1 = α At + (1 – α) Ft
d) Calculate MAD for the 3-period moving averages and the weighted moving averages (use the results from
questions a and b). Which method provides the better forecast? Round MAD to three decimal places.