Statistical Analysis BSA
Nominal
• Data are labels or names used to identify an
attribute of the element.
• A nonnumeric label or a numeric code may be
used.
Nominal
• Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business,
Humanities, Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business, 2
denotes Humanities, 3 denotes Education, and
so on).
Ordinal
• The data have the properties of nominal data and
the order or rank of the data is meaningful.
• A nonnumeric label or a numeric code may be
used.
Ordinal
• Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Interval
• The data have the properties of ordinal data and
the interval between observations is expressed in
terms of a fixed unit of measure.
• Interval data are always numeric.
Interval
• Example:
Melissa has an SAT score of 1205, while Kevin
has an SAT score of 1090. Melissa scored 115
points more than Kevin.
Ratio
• The data have all the properties of interval data
and the ratio of two values is meaningful.
• Variables such as distance, height, weight, and
time use the ratio scale.
• This scale must contain a zero value that indicates
that nothing exists for the variable at the zero
point.
Ratio
• Example:
Melissa’s college record shows 36 credit hours
earned, while Kevin’s record shows 72 credit
hours earned. Kevin has twice as many credit
hours earned as Melissa.
Existing Sources
• Data needed for a particular application might
already exist within a firm. Detailed information
is often kept on customers, suppliers, and
employees for example.
• Substantial amounts of business and economic
data are available from organizations that
specialize in collecting and maintaining data.
Existing Sources
• Government agencies are another important
source of data.
• Data are also available from a variety of industry
associations and special-interest organizations.
Internet
• The Internet has become an important source of
data.
• Most government agencies, like the Bureau of the
Census (www.census.gov), make their data
available through a web site.
• More and more companies are creating web sites
and providing public access to them.
• A number of companies now specialize in making
information available over the Internet.
Statistical Studies
• Statistical studies can be classified as either
experimental or observational.
• In experimental studies the variables of interest
are first identified. Then one or more factors are
controlled so that data can be obtained about how
the factors influence the variables.
• In observational (nonexperimental) studies no
attempt is made to control or influence the
variables of interest; an example is a survey.
Time Requirement
• Searching for information can be time consuming.
• Information might no longer be useful by the time
it is available.
Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
Data Errors
• Using any data that happens to be available or
that were acquired with little care can lead to poor
and misleading information.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
[Histogram: frequency (2-12) of Parts Cost ($), 50 to 110]
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
Frequency Distribution
Relative Frequency
Percent Frequency Distribution
Bar Graph
Pie Chart
Frequency Distribution
Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
Bar Graph
[Bar graph: Frequency (1-9) by Rating: Poor, Below Average, Average, Above Average, Excellent]
Pie Chart
[Pie chart "Quality Ratings": Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]
Frequency Distribution
Relative Frequency and Percent Frequency
Distributions
Dot Plot
Histogram
Cumulative Distributions
Ogive
Frequency Distribution
If we choose six classes:
Approximate Class Width = (109 − 52)/6 = 9.5 ≈ 10
Cost ($) Frequency
50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
Cost ($)   Relative Frequency   Percent Frequency
50-59 .04 4
60-69 .26 26
70-79 .32 32
80-89 .14 14
90-99 .14 14
100-109 .10 10
Total 1.00 100
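The three distributions above can be reproduced with a short Python sketch (a minimal illustration; the data and class limits are those shown on the slides):

```python
# Frequency, relative frequency, and percent frequency distributions
# for the 50 parts costs listed earlier in the deck.
costs = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
         71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
         104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
         85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
         62, 82, 98, 101, 79, 105, 79, 69, 62, 73]

n = len(costs)
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]

for lo, hi in classes:
    freq = sum(1 for c in costs if lo <= c <= hi)   # class frequency
    rel = freq / n                                  # relative frequency
    print(f"{lo}-{hi}: {freq}  {rel:.2f}  {100 * rel:.0f}%")
```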
Dot Plot
[Dot plot: the 50 parts costs plotted as dots along a Cost ($) axis from 50 to 110]
Histogram
[Histogram: Parts Cost ($), 50 to 110, against frequency (2-18)]
© 2003 Thomson/South-Western Slide
Cumulative Distributions
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
< 59 2 .04 4
< 69 15 .30 30
< 79 31 .62 62
< 89 38 .76 76
< 99 45 .90 90
< 109 50 1.00 100
Ogive
• Because the class limits for the parts-cost data are
50-59, 60-69, and so on, there appear to be one-unit
gaps from 59 to 60, 69 to 70, and so on.
• These gaps are eliminated by plotting points
halfway between the class limits.
• Thus, 59.5 is used for the 50-59 class, 69.5 is used
for the 60-69 class, and so on.
[Ogive: cumulative percent frequency (20-100) plotted against Parts Cost ($), 50 to 110, using points halfway between the class limits]
Stem-and-Leaf Display
5 2 7
6 2 2 2 2 5 6 7 8 8 8 9 9 9
7 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
8 0 0 2 3 5 8 9
9 1 3 7 7 7 8 9
10 1 4 5 5 9
Leaf Units
• A single digit is used to define each leaf.
• In the preceding example, the leaf unit was 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.
• Where the leaf unit is not shown, it is assumed to
equal 1.
Leaf Unit = 10
16 8
17 1 9
18 0 3
19 1 7
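A stem-and-leaf display with leaf unit 1, like the parts-cost display above, can be sketched as:

```python
# Stem-and-leaf display (leaf unit = 1): the tens digits form the stem,
# the units digit the leaf.
from collections import defaultdict

costs = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
         71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
         104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
         85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
         62, 82, 98, 101, 79, 105, 79, 69, 62, 73]

stems = defaultdict(list)
for c in sorted(costs):
    stems[c // 10].append(c % 10)   # stem = tens digits, leaf = units digit

for stem in sorted(stems):
    print(stem, " ".join(str(leaf) for leaf in stems[stem]))
```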
Crosstabulation
The number of Finger Lakes homes sold for each
style and price for the past two years is shown below.
Price Range   Colonial   Ranch   Split   A-Frame   Total
< $99,000 18 6 19 12 55
> $99,000 12 14 16 3 45
Total 30 20 35 15 100
Row Percentages
Column Percentages
[Scatter plots of y against x illustrating a positive relationship, a negative relationship, and no apparent relationship]
Scatter Diagram
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.
x = Number of Interceptions   y = Number of Points Scored
1 14
3 24
2 18
1 17
3 27
Scatter Diagram
[Scatter diagram: Number of Interceptions (x, 0-3) against Number of Points Scored (y, 0-30)]
Example: Panthers Football Team
Mean
Median
Mode
Percentiles
Quartiles
Mean
x̄ = Σxi/n = 34,356/70 = 490.80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Median
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Since i is an integer, average the 35th and 36th data values:
Median = (475 + 475)/2 = 475
Mode
450 occurred most frequently (7 times)
Mode = 450
90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5; since i is not an integer, round up to 53
Third quartile = 525
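The percentile rule used in these slides (i = (p/100)n; average the ith and (i+1)th ordered values when i is an integer, otherwise round i up) can be sketched as:

```python
import math

def percentile(data, p):
    """Slide rule: i = (p/100)n; if i is an integer, average the ith and
    (i+1)th ordered values, otherwise round i up and take that value."""
    data = sorted(data)
    i = (p / 100) * len(data)
    if i == int(i):
        i = int(i)
        return (data[i - 1] + data[i]) / 2
    return data[math.ceil(i) - 1]

# The 70 monthly apartment rents listed above.
rents = [425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
         440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
         450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
         465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
         480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
         510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
         575, 575, 580, 590, 600, 600, 600, 600, 615, 615]

print(percentile(rents, 50))   # median
print(percentile(rents, 75))   # third quartile
print(percentile(rents, 90))   # 90th percentile
```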
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Range
Range = largest value - smallest value
Range = 615 - 425 = 190
Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
Variance
s² = Σ(xi − x̄)² / (n − 1) = 2,996.16
Standard Deviation
s = √s² = √2,996.16 = 54.74
Coefficient of Variation
CV = (s/x̄)(100) = (54.74/490.80)(100) = 11.15%
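The variance, standard deviation, and coefficient of variation above can be checked with a minimal sketch over the same 70 rents:

```python
# Sample variance, standard deviation, and coefficient of variation
# for the 70 apartment rents listed earlier.
rents = [425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
         440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
         450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
         465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
         480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
         510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
         575, 575, 580, 590, 600, 600, 600, 600, 615, 615]

n = len(rents)
mean = sum(rents) / n                                # x-bar
s2 = sum((x - mean) ** 2 for x in rents) / (n - 1)   # sample variance
s = s2 ** 0.5                                        # standard deviation
cv = (s / mean) * 100                                # coefficient of variation (%)
print(round(mean, 2), round(s2, 2), round(s, 2), round(cv, 2))
```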
The Weighted Mean and
Working with Grouped Data
Measures of Relative Location
and Detecting Outliers
z-Scores
Chebyshev’s Theorem
Empirical Rule
Detecting Outliers
Chebyshev’s Theorem
Empirical Rule
Interval % in Interval
Within ±1s: 436.06 to 545.54   48/70 = 69%
Within ±2s: 381.32 to 600.28   68/70 = 97%
Within ±3s: 326.58 to 655.02   70/70 = 100%
Detecting Outliers
The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier,
there are no outliers in this data set.
Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
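The outlier screen above (flag any value with |z| > 3) can be sketched as:

```python
# Standardize the 70 rents and flag |z| > 3 as outliers.
rents = [425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
         440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
         450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
         465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
         480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
         510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
         575, 575, 580, 590, 600, 600, 600, 600, 615, 615]

n = len(rents)
mean = sum(rents) / n
s = (sum((x - mean) ** 2 for x in rents) / (n - 1)) ** 0.5

z = [(x - mean) / s for x in rents]           # z-score of each rent
print(round(min(z), 2), round(max(z), 2))     # most extreme z-scores
outliers = [x for x, zi in zip(rents, z) if abs(zi) > 3]
print(outliers)                               # empty list: no outliers
```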
Five-Number Summary
Box Plot
Smallest Value
First Quartile
Median
Third Quartile
Largest Value
Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
Box Plot
Lower Limit: Q1 − 1.5(IQR) = 445 − 1.5(80) = 325
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers.
x̄ = Σwixi / Σwi
where:
xi = value of observation i
wi = weight for observation i
Sample Data
x̄ = ΣfiMi / n
Population Data
μ = ΣfiMi / N
where:
fi = frequency of class i
Mi = midpoint of class i
Sample Data
s² = Σfi(Mi − x̄)² / (n − 1)
Population Data
σ² = Σfi(Mi − μ)² / N
s = √3,017.89 = 54.94
This approximation differs by only $.20
from the actual standard deviation of $54.74.
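The grouped-data formulas can be sketched as follows, assuming the 70 rents are grouped into $20-wide classes starting at 420 (an assumption: the class table itself is not shown on these slides, but these classes reproduce the $54.94 figure):

```python
# Grouped-data mean and variance using class midpoints Mi and
# frequencies fi (classes assumed: $20 wide, starting at 420).
classes = [(420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
           (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
           (580, 599, 2), (600, 619, 6)]

n = sum(f for _, _, f in classes)                      # total frequency
mids = [(lo + hi) / 2 for lo, hi, _ in classes]        # class midpoints Mi
mean = sum(f * m for (_, _, f), m in zip(classes, mids)) / n
s2 = sum(f * (m - mean) ** 2
         for (_, _, f), m in zip(classes, mids)) / (n - 1)
print(round(mean, 2), round(s2, 2), round(s2 ** 0.5, 2))
```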
Finite Population
• A simple random sample from a finite population
of size N is a sample selected such that each
possible sample of size n has the same probability
of being selected.
• Replacing each sampled element before selecting
subsequent elements is called sampling with
replacement.
Finite Population
• Sampling without replacement is the procedure
used most often.
• In large sampling projects, computer-generated
random numbers are often used to automate the
sample selection process.
Infinite Population
• A simple random sample from an infinite
population is a sample selected such that the
following conditions are satisfied.
• Each element selected comes from the same
population.
• Each element is selected independently.
Infinite Population
• The population is usually considered infinite if it
involves an ongoing process that makes listing or
counting every element impossible.
• The random number selection procedure cannot
be used for infinite populations.
• Population Mean
μ = Σxi/900 = 990
• Population Standard Deviation
σ = √(Σ(xi − μ)²/900) = 80
Sample Data
Random
No. Number Applicant SAT Score On-Campus
1 744 Connie Reyman 1025 Yes
2 436 William Fox 950 Yes
3 865 Fabian Avante 1090 No
4 790 Eric Paxton 1120 Yes
5 835 Winona Wheeler 1015 No
. . . . .
30 685 Kevin Cossack 965 No
Point Estimates
• x̄ as Point Estimator of μ
x̄ = Σxi/30 = 29,910/30 = 997
• s as Point Estimator of σ
s = √(Σ(xi − x̄)²/29) = √(163,996/29) = 75.2
• p̄ as Point Estimator of p
p̄ = 20/30 = .67
Point Estimates
Note: Different random numbers would have
identified a different sample which would have
resulted in different point estimates.
Standard Deviation of x̄
Finite Population: σx̄ = (σ/√n)√((N − n)/(N − 1))
Infinite Population: σx̄ = σ/√n
• A finite population is treated as being infinite if n/N < .05.
• √((N − n)/(N − 1)) is the finite population correction factor.
• σx̄ is referred to as the standard error of the mean.
σx̄ = σ/√n = 80/√30 = 14.6
Sampling Distribution of x̄
E(x̄) = 990, σx̄ = 14.6
[Normal curve: shaded area between x̄ = 980 and x̄ = 1000]
Area = 2(.2518) = .5036
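The area above can be checked without normal tables using the standard normal CDF built from math.erf; the slide's .5036 comes from rounding z to .68, so the exact computation differs slightly in the third decimal:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

se = 80 / math.sqrt(30)               # standard error of the mean, ~14.6
z = (1000 - 990) / se                 # ~.68
prob = norm_cdf(z) - norm_cdf(-z)     # symmetric interval around E(x-bar)
print(round(prob, 4))
```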
E(p̄) = p
where:
p = the population proportion
Standard Deviation of p̄
Finite Population: σp̄ = √(p(1 − p)/n)√((N − n)/(N − 1))
Infinite Population: σp̄ = √(p(1 − p)/n)
• σp̄ is referred to as the standard error of the proportion.
np > 5
and
n(1 – p) > 5
σp̄ = √(.72(1 − .72)/30) = .082
E(p̄) = .72, σp̄ = .082
Sampling Distribution of p̄
[Normal curve: shaded area between p̄ = .67 and p̄ = .77]
Area = 2(.2291) = .4582
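The same check applies to the proportion: with p = .72 and n = 30, the probability that p̄ falls between .67 and .77 under the normal approximation is:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.72, 30
se = math.sqrt(p * (1 - p) / n)       # standard error of the proportion, ~.082
z = 0.05 / se                         # ~.61
prob = norm_cdf(z) - norm_cdf(-z)
print(round(prob, 4))
```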
Population Condition
Conclusion        H0 True          Ha True
Accept H0         Correct          Type II Error
Reject H0         Type I Error     Correct
Test Statistic
σ Known: z = (x̄ − μ0)/(σ/√n)
σ Unknown: z = (x̄ − μ0)/(s/√n)
Rejection Rule
Upper-tail test: Reject H0 if z > zα
Lower-tail test: Reject H0 if z < −zα
[Sampling distribution of x̄, assuming H0 is true and μ = 12: rejection region above the critical value c, i.e. for z > 1.645; the observed z = 2.47 falls in the rejection region]
Rejection Rule
Reject H0 if |z| > zα/2
[Standard normal curve: two-tailed rejection regions beyond z = −1.96 and z = +1.96]
[t distribution: rejection region for t > 1.753 (critical value)]
[Summary: with n ≥ 30, use z = (x̄ − μ0)/(σ/√n) when σ is known, or z = (x̄ − μ0)/(s/√n) when it is not; with n < 30, use t = (x̄ − μ0)/(s/√n) or increase n to 30 or more]
where:
σp̄ = √(p0(1 − p0)/n)
ANOVA
Estimation of the Difference Between the Means
of Two Populations: Independent Samples
Point Estimator of the Difference between the Means
of Two Populations
Sampling Distribution of x̄1 − x̄2
Interval Estimate of μ1 − μ2: Large-Sample Case
Interval Estimate of μ1 − μ2: Small-Sample Case
σx̄1−x̄2 = √(σ1²/n1 + σ2²/n2)
x̄1 − x̄2 ± zα/2 σx̄1−x̄2
where:
1 - is the confidence coefficient
where:
sx̄1−x̄2 = √(s1²/n1 + s2²/n2)
Population 1: Par, Inc. golf balls
μ1 = mean driving distance of Par golf balls
Population 2: Rap, Ltd. golf balls
μ2 = mean driving distance of Rap golf balls
μ1 − μ2 = difference between the mean distances
Simple random sample of n1 Par golf balls: x̄1 = sample mean distance for the Par sample
Simple random sample of n2 Rap golf balls: x̄2 = sample mean distance for the Rap sample
x̄1 − x̄2 = point estimate of μ1 − μ2
x̄1 − x̄2 ± zα/2 σx̄1−x̄2
where:
σx̄1−x̄2 = √(σ²(1/n1 + 1/n2))

x̄1 − x̄2 ± t.025 sx̄1−x̄2 = 2.5 ± 2.101 √(5.28(1/12 + 1/8))
= 2.5 ± 2.2, or .3 to 4.7 miles per gallon.
We are 95% confident that the difference between the
mean mpg ratings of the two car types is from .3 to
4.7 mpg (with the M car having the higher mpg).
Test Statistic
Large-Sample: z = ((x̄1 − x̄2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2)
Small-Sample: t = ((x̄1 − x̄2) − (μ1 − μ2)) / √(s²(1/n1 + 1/n2))
• Hypotheses
H0: μd = 0, Ha: μd ≠ 0
MSTR = Σ nj(x̄j − x̄)² / (k − 1)
MSE = Σ (nj − 1)sj² / (nT − k)
where x̄j and sj² are the mean and variance of sample j, and x̄ is the overall sample mean.
Hypotheses
H0: μ1 = μ2 = μ3 = . . . = μk
Ha: Not all population means are equal
Test Statistic
F = MSTR/MSE
Rejection Rule
Reject H0 if F > Fα
where the value of Fα is based on an F distribution with k − 1 numerator degrees of freedom and nT − k denominator degrees of freedom.
Analysis of Variance
J. R. Reed would like to know if the mean number
of hours worked per week is the same for the
department managers at her three manufacturing
plants (Buffalo, Pittsburgh, and Detroit).
A simple random sample of 5 managers from
each of the three plants was taken and the number of
hours worked by each manager for the previous
week is shown on the next slide.
Analysis of Variance
Plant 1 Plant 2 Plant 3
Observation Buffalo Pittsburgh Detroit
1 48 73 51
2 54 63 63
3 57 66 61
4 54 64 54
5 62 74 56
Sample Mean 55 68 57
Sample Variance 26.0 26.5 24.5
Analysis of Variance
• Hypotheses
H0: μ1 = μ2 = μ3
Ha: Not all the means are equal
where:
μ1 = mean number of hours worked per week by the managers at Plant 1
μ2 = mean number of hours worked per week by the managers at Plant 2
μ3 = mean number of hours worked per week by the managers at Plant 3
Analysis of Variance
• Mean Square Treatment
Since the sample sizes are all equal:
x̄ = (55 + 68 + 57)/3 = 60
SSTR = 5(55 − 60)² + 5(68 − 60)² + 5(57 − 60)² = 490
MSTR = 490/(3 − 1) = 245
• Mean Square Error
SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308
MSE = 308/(15 - 3) = 25.667
Analysis of Variance
• F - Test
If H0 is true, the ratio MSTR/MSE should be near 1 since both MSTR and MSE are estimating σ². If Ha is true, the ratio should be significantly larger than 1 since MSTR tends to overestimate σ².
Analysis of Variance
• Rejection Rule
Assuming α = .05, F.05 = 3.89 (2 d.f. numerator, 12 d.f. denominator). Reject H0 if F > 3.89.
• Test Statistic
F = MSTR/MSE = 245/25.667 = 9.55
Analysis of Variance
• ANOVA Table
Source        SS    df   MS       F
Treatments    490   2    245      9.55
Error         308   12   25.667
Total         798   14
Analysis of Variance
• Conclusion
F = 9.55 > F.05 = 3.89, so we reject H0. The mean
number of hours worked per week by department
managers is not the same at each plant.
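The full ANOVA computation above can be sketched in a few lines (a minimal illustration using the three plant samples from the slides):

```python
# One-way ANOVA for the hours-worked data: 5 managers at each plant.
plants = {
    "Buffalo":    [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit":    [51, 63, 61, 54, 56],
}

k = len(plants)                                  # number of treatments
nT = sum(len(v) for v in plants.values())        # total observations
grand_mean = sum(sum(v) for v in plants.values()) / nT

# Between-treatments and within-treatments sums of squares.
sstr = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
           for v in plants.values())
sse = sum(sum((x - sum(v) / len(v)) ** 2 for x in v)
          for v in plants.values())

mstr = sstr / (k - 1)
mse = sse / (nT - k)
F = mstr / mse
print(round(mstr, 3), round(mse, 3), round(F, 2))
```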
H0: p1 − p2 = 0
Ha: p1 − p2 ≠ 0
Expected Value
E(p̄1 − p̄2) = p1 − p2
Standard Deviation
σp̄1−p̄2 = √(p1(1 − p1)/n1 + p2(1 − p2)/n2)
Distribution Form
If the sample sizes are large (n1p1, n1(1 - p1), n2p2,
and n2(1 - p2) are all greater than or equal to 5), the
sampling distribution of p̄1 − p̄2 can be approximated
by a normal probability distribution.
[Sampling distribution of p̄1 − p̄2: mean p1 − p2, standard deviation σp̄1−p̄2 = √(p1(1 − p1)/n1 + p2(1 − p2)/n2)]
Interval Estimate
p̄1 − p̄2 ± zα/2 σp̄1−p̄2
Point Estimator of σp̄1−p̄2
sp̄1−p̄2 = √(p̄1(1 − p̄1)/n1 + p̄2(1 − p̄2)/n2)
.08 ± 1.96(.0510)
.08 ± .10
-.02 to +.18
Hypotheses
H0: p1 − p2 ≤ 0
Ha: p1 − p2 > 0
Test statistic
z = ((p̄1 − p̄2) − (p1 − p2)) / sp̄1−p̄2
where:
sp̄1−p̄2 = √(p̄(1 − p̄)(1/n1 + 1/n2))
p̄ = (n1p̄1 + n2p̄2)/(n1 + n2)

z = ((.48 − .40) − 0)/.0514 = .08/.0514 = 1.56
5. Reject H0 if χ² > χ²α (where α is the significance level and there are k − 1 degrees of freedom).
• Test Statistic
χ² = (30 − 25)²/25 + (20 − 25)²/25 + (35 − 25)²/25 + (15 − 25)²/25
= 1 + 1 + 4 + 4
= 10
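The goodness-of-fit statistic above reduces to a one-line sum over observed and expected counts:

```python
# Chi-square goodness-of-fit: four observed counts against equal
# expected counts of 25 each.
observed = [30, 20, 35, 15]
expected = [25, 25, 25, 25]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
print(chi2)   # 10.0
```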
χ² = Σi Σj (fij − eij)² / eij
y = β0 + β1x + ε
E(y) = β0 + β1x
[Graphs of E(y) = β0 + β1x: a regression line with intercept β0 and positive slope β1; a regression line with negative slope β1; and, for no relationship, a horizontal regression line with slope β1 = 0]
Estimated Regression Equation
ŷ = b0 + b1x
The sample statistics b0 and b1 provide estimates of β0 and β1.
min Σ(yi − ŷi)²
where:
yi = observed value of the dependent variable
for the ith observation
y^i = estimated value of the dependent variable
for the ith observation
b1 = (Σxiyi − (Σxi)(Σyi)/n) / (Σxi² − (Σxi)²/n)
b0 = ȳ − b1x̄
where:
xi = value of independent variable for ith observation
yi = value of dependent variable for ith observation
x̄ = mean value for independent variable
ȳ = mean value for dependent variable
n = total number of observations
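The slope and intercept formulas above can be sketched directly. The five (x, y) pairs used here are recovered from the TV Ads example later in the deck (ŷ = 10 + 5x together with the residual table imply x = 1, 3, 2, 1, 3 and y = 14, 24, 18, 17, 27):

```python
# Least squares slope and intercept from the summation formulas.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = sxy / sxx                      # slope
b0 = sum(y) / n - b1 * sum(x) / n   # intercept
print(b0, b1)                       # 10.0 5.0
```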
Scatter Diagram
[Scatter diagram: TV Ads (x, 0-4) against Cars Sold (y, 0-30), with the estimated regression line ŷ = 10 + 5x]
SST = SSR + SSE:  Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
r2 = SSR/SST
where:
SST = total sum of squares
SSR = sum of squares due to regression
Coefficient of Determination
r2 = SSR/SST = 100/114 = .8772
The regression relationship is very strong because
88% of the variation in number of cars sold can be
explained by the linear relationship between the
number of TV ads and the number of cars sold.
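The sums of squares behind r² = 100/114 can be checked with a minimal sketch over the same five points:

```python
# SST, SSR, SSE, and r-squared for the TV Ads / Cars Sold data,
# using the fitted line y-hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

y_bar = sum(y) / len(y)
y_hat = [10 + 5 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                   # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)               # regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))      # error
r2 = ssr / sst
print(sst, ssr, sse, round(r2, 4))
```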
rxy = (sign of b1)√r²
where:
b1 = the slope of the estimated regression equation ŷ = b0 + b1x
rxy = (sign of b1)√r²
The sign of b1 in the equation ŷ = 10 + 5x is "+".
rxy = +√.8772 = +.9366
An Estimate of σ²
The mean square error (MSE) provides the estimate of σ²; the notation s² is also used.
s² = MSE = SSE/(n − 2)
where:
SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²
An Estimate of σ
• To estimate σ we take the square root of s².
• The resulting s is called the standard error of the estimate.
s = √MSE = √(SSE/(n − 2))
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Test Statistic
t = b1/sb1
Rejection Rule
Reject H0 if t < −tα/2 or t > tα/2
t Test
• Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
• Rejection Rule
For α = .05 and d.f. = 3, t.025 = 3.182
Reject H0 if |t| > 3.182
t Test
• Test Statistic
t = 5/1.08 = 4.63
• Conclusion
t = 4.63 > 3.182, so we reject H0.
where tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.
Rejection Rule
Reject H0 if 0 is not included in the confidence interval for β1.
95% Confidence Interval for β1
b1 ± tα/2 sb1 = 5 ± 3.182(1.08) = 5 ± 3.44
or 1.56 to 8.44
Conclusion
0 is not included in the confidence interval.
Reject H0
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Test Statistic
F = MSR/MSE
Rejection Rule
Reject H0 if F > Fα
F Test
• Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
• Rejection Rule
For α = .05 and d.f. = 1, 3: F.05 = 10.13
Reject H0 if F > 10.13.
F Test
• Test Statistic
F = MSR/MSE = 100/4.667 = 21.43
• Conclusion
F = 21.43 > 10.13, so we reject H0.
ŷp ± tα/2 sind
Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean
number of cars sold to be:
y^ = 10 + 5(3) = 25 cars
Residuals
Observation Predicted Cars Sold Residuals
1 15 -1
2 25 -1
3 20 -2
4 15 2
5 25 2
Residual Plot
[Residual plot: residuals (−2 to 2) plotted against TV Ads (0-4)]
Residual Plot Against ŷ
[Good pattern: residuals form a horizontal band around zero]
[Nonconstant variance: the spread of the residuals increases with ŷ]
[Model form not adequate: the residuals follow a curved pattern]
Estimated Multiple Regression Equation
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
The sample statistics b0, b1, b2, . . . , bp provide estimates of β0, β1, β2, . . . , βp.
SST = SSR + SSE:  Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
R 2 = SSR/SST
Ra² = 1 − (1 − R²)(n − 1)/(n − p − 1)
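The adjusted R² formula can be checked against the programmer-salary output shown later in this section; n = 20 is an inference from the 17 error degrees of freedom reported there (d.f. = n − p − 1 with p = 2):

```python
# Adjusted R-squared: R2 = .834, p = 2 predictors, n = 20 observations
# (n inferred from the 17 error d.f. in the printout).
n, p, r2 = 20, 2, 0.834

ra2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(ra2, 3))
```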
where
y = annual salary ($000)
x1 = years of experience
x2 = score on programmer aptitude test
Least Squares
Input data (x1, x2, y): (4, 78, 24), (7, 100, 43), . . . , (3, 89, 30)
→ Computer package for solving multiple regression problems →
Output: b0, b1, b2, R², etc.
The regression equation is
Salary = 3.174 + 1.404 Exper + 0.251 Score

Predictor    Coef      Stdev     t-ratio   p
Constant     3.174     6.156     .52       .613
Exper        1.4039    .1986     7.07      .000
Score        .25089    .07735    3.24      .005

s = 2.419   R-sq = 83.4%   R-sq(adj) = 81.5%
Hypotheses
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.
Test Statistic
F = MSR/MSE
Rejection Rule
Reject H0 if F > Fα
where Fα is based on an F distribution with p d.f. in the numerator and n − p − 1 d.f. in the denominator.
Hypotheses
H0: βi = 0
Ha: βi ≠ 0
Test Statistic
t = bi/sbi
Rejection Rule
Reject H0 if t < −tα/2 or t > tα/2
where tα/2 is based on a t distribution with n − p − 1 degrees of freedom.
F Test
• Hypotheses
H0: β1 = β2 = 0
Ha: One or both of the parameters
is not equal to zero.
• Rejection Rule
For α = .05 and d.f. = 2, 17:
F.05 = 3.59
Reject H0 if F > 3.59.
F Test
• Test Statistic
F = MSR/MSE
= 250.16/5.85 = 42.76
• Conclusion
F = 42.76 > 3.59, so we reject H0.
t Test
• Hypotheses
H0: βi = 0
Ha: βi ≠ 0
• Rejection Rule
For α = .05 and d.f. = 17: t.025 = 2.11
Reject H0 if |t| > 2.11
• Test Statistics
t = b1/sb1 = 1.4039/.1986 = 7.07
t = b2/sb2 = .25089/.07735 = 3.24
• Conclusions
Reject H0: β1 = 0 and reject H0: β2 = 0.
Both independent variables are significant.