Excel Statistical Analysis
Conjoint Analysis is used by marketers to tell which attributes of a product are most important to a consumer
and to what degree each is important to the consumer.
Step 4 - Final data preparation step prior to running regression - Remove 1 variable from each set of
variables with more than 1 choice. Removing these variables keeps the remaining variables in each set from predicting one another.
Attribute combinations shown on the 18 cards (1 = attribute present, 0 = absent), with the consumer's stated preference for each combination:

Card   A   B   C   Red   Blue   $50   $100   $150   Preference
1      1   0   0   1     0      1     0      0      5
2      1   0   0   1     0      0     1      0      5
3      1   0   0   1     0      0     0      1      0
4      1   0   0   0     1      1     0      0      8
5      1   0   0   0     1      0     1      0      5
6      1   0   0   0     1      0     0      1      2
7      0   1   0   1     0      1     0      0      7
8      0   1   0   1     0      0     1      0      5
9      0   1   0   1     0      0     0      1      3
10     0   1   0   0     1      1     0      0      9
11     0   1   0   0     1      0     1      0      6
12     0   1   0   0     1      0     0      1      5
13     0   0   1   1     0      1     0      0      10
14     0   0   1   1     0      0     1      0      7
15     0   0   1   1     0      0     0      1      5
16     0   0   1   0     1      1     0      0      9
17     0   0   1   0     1      0     1      0      7
18     0   0   1   0     1      0     0      1      8

The same data after Step 4 (columns for Brand A, Red, and $50 removed):

Card   B   C   Blue   $100   $150   Preference
1      0   0   0      0      0      5
2      0   0   0      1      0      5
3      0   0   0      0      1      0
4      0   0   1      0      0      8
5      0   0   1      1      0      5
6      0   0   1      0      1      2
7      1   0   0      0      0      7
8      1   0   0      1      0      5
9      1   0   0      0      1      3
10     1   0   1      0      0      9
11     1   0   1      1      0      6
12     1   0   1      0      1      5
13     0   1   0      0      0      10
14     0   1   0      1      0      7
15     0   1   0      0      1      5
16     0   1   1      0      0      9
17     0   1   1      1      0      7
18     0   1   1      0      1      8
Conjoint is an analysis that provides a marketer with a method to predict how much more or less a co
one combination of product attributes over another combination of product attributes. The degree tha
a product attribute is called the "utility" of that attribute. For example, a product might come in three b
at three levels of price. Each color, brand, and price level will have its own utility calculated during th
Conjoint is done using Multiple Regression. Each product attribute variation will be assigned as one of th
to the Multiple Regression equation. For example, the color red will be represented by one independe
blue will be represented by another independent variable. The resulting regression equation assigns a
variable. These coefficients are the utilities of each of the attributes. The more positive an individual c
highly valued is the associated product attribute. The coefficients can be interpreted as the utilities o
In this conjoint exercise, we are going to determine the utilities of eight product attributes. They are a
There are 18 possible combinations of these attributes (3 brands x two colors x three prices). The
on a scale of 0 to 10 (10 being the best). The consumer test results are modified for the regression e
The resulting regression analysis calculates a coefficient for each independent variable as part of the
Each coefficient is the measure of value that the consumer places on the product attribute associated
The chart on the left side provides the choices that the consumer had to analyze. The consumer
was provided with 18 separate cards. Each card contained one of the 18 possible variations of
product attributes. The consumer had to rate their overall preference of each combination of attributes
on a scale of 0 to 10.
The chart on the right shows the consumer's stated preference for each combination of attributes.
Non-numerical attributes were assigned numbers. Brand A and Red are shown as 1's in their respect
respective columns. Brand C was assigned a 3 in its respective column.
The chart is now further prepared for Regression Analysis. Each individual product attribute
is given its own column. Each product attribute now has either the value of 1 or 0.
One problem must be corrected before this data can be submitted for Regression
Analysis. Independent variables or combinations of independent variables should
not be able to predict each other. Using independent variables that are highly correlated
with each other (either positively or negatively) produces a regression error known as collinearity.
For example, if the color is either red or blue, knowing the state of one color tells us
the state of the other color (if the state of Blue = 1, the state of Red must = 0).
This error condition also occurs when there are 3 variables. If you know the states of 2,
you know the state of the remaining one.
These error conditions are solved by removing one column of data from each type of
variation. Information about Brand A, Red, and Price level $50 was removed.
We will see below that this has no effect on the accuracy of the Regression output.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.933190299
R Square             0.8708441342
Adjusted R Square    0.8121369224
Standard Error       1.1413191612
Observations         17

ANOVA
              df    SS                MS
Regression    5     96.6124727669     19.3224946
Residual      11    14.3287037037     1.30260943
Total         16    110.9411764706

              Coefficients      Standard Error    t Stat
Intercept     5.9166666667      0.8070345183      7.33136753
Brand B       1.5138888889      0.6989123946      2.16606387
Brand C       3.3472222222      0.6989123946      4.7891871
Blue          1.2314814815      0.5599921057      2.19910507
$100          -2.3194444444     0.6989123946      -3.31864832
$150          -4.3194444444     0.6989123946      -6.18023729
The coefficients attached to each of the product attributes simply show the consumer's
utility for that attribute. The utilities for each attribute are relative to each other.
For example, Price level $50 has the highest preference with a utility of 0 while Price
level $150 has the lowest utility of -4.319444444. Blue has a utility of 1.231481481,
which is that much higher than the utility of red, which was 0. Brand C was the most liked brand
with a utility of 3.347222222, while Brand A was liked the least with a utility of 0.
The resulting Regression Equation still does a good job of predicting overall preference.
For example, the consumer rated the combination of attributes on card 13 with a 10.
Here the predicted Combination Preference for card 13 attribute combination is:
(5.9166) + (3.3472)(1) = 9.263 which is very close to the consumer's rating of 10.
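The prediction above can be reproduced for any card by adding the intercept to the utilities of the attribute levels on that card. The following is a minimal sketch in Python, not part of the original workbook; the names `INTERCEPT`, `UTILITIES`, and `predicted_preference` are illustrative assumptions, and the numbers are copied from the regression output.

```python
# Coefficients copied from the regression output. The baseline levels
# (Brand A, Red, $50) were removed before the regression, so their utility is 0.
INTERCEPT = 5.9166666667
UTILITIES = {
    "Brand A": 0.0, "Brand B": 1.5138888889, "Brand C": 3.3472222222,
    "Red": 0.0, "Blue": 1.2314814815,
    "$50": 0.0, "$100": -2.3194444444, "$150": -4.3194444444,
}


def predicted_preference(brand, color, price):
    """Intercept plus the utility of each attribute level shown on the card."""
    return INTERCEPT + UTILITIES[brand] + UTILITIES[color] + UTILITIES[price]


# Card 13 is Brand C, Red, $50; the consumer rated it 10.
card_13 = predicted_preference("Brand C", "Red", "$50")
print(round(card_13, 3))
```

Because the baseline levels carry a utility of 0, a card showing Brand A, Red, and $50 is predicted at the intercept alone.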
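The regression appears to be a good one because Adjusted R Square is high (close to 1).
Adjusted R Square is the share of the variance in preference explained by the model, adjusted for the number of predictors. Here, Adjusted R Square is 0.812.
Most of the variables have a p-Value below 0.05 and are therefore significant predictors; Brand B (0.053) and Blue (0.050) fall marginally above that level.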
The absolute value of the coefficients indicates the effect that each has on the consumer's
overall liking of product. For example, Brand C (coefficient = 3.347) produced the highest
positive influence while the $150 price (coefficient = -4.319) reduces consumer liking the most.
The low Significance F value of the regression's F statistic (0.000143) indicates that the regression, overall, is valid.
F                 14.83368241
Significance F    0.000143011

              P-value           Lower 95%         Upper 95%
Intercept     1.482774E-005     4.1403956692      7.692937664
Brand B       0.0531402239      -0.0244069189     3.052184697
Brand C       0.0005630386      1.8089264144      4.88551803
Blue          0.0501644572      -0.0010528323     2.464015795
$100          0.0068476873      -3.8577402522     -0.781148637
$150          6.905826E-005     -5.8577402522     -2.781148637
Regression
Regression is a statistical technique that is used to create predictive models. The models receive input (independen
the outcome of the dependent variable.
When performing Multiple Regression, Correlation Analysis should be performed on a independent and dependent va
S&P
0.8799
7.5187
5.558
1.3716
Viacom
0.7541
14.9701
11.9792
7.907
AT&T
2.1407
-2.5948
7.7869
-8.5551
GM
-4.6296
18.986
-1.7226
-0.5535
Coke
-18.8406
6.6964
-3.3473
5.8442
5/29/1998
6/27/1998
-1.6289
2.4171
-5.1724
3.4091
1.2474
0.8214
6.679
1.8261
1.9427
2.1063
S&P
Viacom
0.8799
0.7541
7.5187 14.9701
5.558 11.9792
1.3716
7.907
-1.6289
-5.1724
2.4171
3.4091
AT&T
2.1407
-2.5948
7.7869
-8.5551
1.2474
0.8214
GM
-4.6296
18.986
-1.7226
-0.5535
6.679
1.8261
Correlation Analysis
Tools / Data Analysis / Correlation
S&P
S&P
Viacom
AT&T
GM
Coke
Viacom
AT&T
GM
1
0.9386616468
0.1285583787
0.4703491066
1
-0.0989328142
0.3504379667
1
-0.2637108598
0.2550526617
0.3423373581
-0.5014902082
0.627513676
Coke has a low correlation with the S&P and is therefore not a good predictor of the S&P
Also, if two of the independent variables above are highly correlated with each other, only one of th
be used in the Multiple Regression below. This is not the case here because none of the variables ab
a high correlation with another variable. Using highly correlated variables as inputs to a Multiple Reg
causes an error called Multicollinearity and should be avoided. Multiple Regressions should be built
new independent variable at a time and evaluating results. Good new independent variables noticea
and lower Standard Error without causing much change to Coefficients. Poor new independent varia
R-Square much but have unpredictable effects on Coefficients. Build regressions up one variable at
evaluate after adding each new variable.
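The correlation screening described above can be checked outside Excel. The sketch below computes the Pearson correlation between the S&P and Viacom return columns listed earlier; it is an illustration under the assumption that the six values of each column line up row by row, and the function name `pearson` is not from the workbook.

```python
import math

# Six return observations per series, copied from the columns above.
sp500 = [0.8799, 7.5187, 5.558, 1.3716, -1.6289, 2.4171]
viacom = [0.7541, 14.9701, 11.9792, 7.907, -5.1724, 3.4091]


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(spread_x * spread_y)


r_sp_viacom = pearson(sp500, viacom)
print(round(r_sp_viacom, 4))
```

The result should agree with the S&P / Viacom entry of the correlation matrix (about 0.94).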
Multiple Regression
SUMMARY OUTPUT
Regression Statistics
Multiple R
0.9877323112
R Square
0.9756151185
Adjusted R Square
0.9390377963
Standard Error
0.8210012659
Observations
6
ANOVA
df
3
2
5
SS
53.9356008961
1.3480861572
55.2836870533
MS
17.978533632
0.6740430786
F
26.6726774628
Coefficients
0.1250621001
0.3942208055
0.1701350642
0.0912674536
Standard Error
0.4416975598
0.0525631859
0.0701416328
0.0474978751
t Stat
0.2831396672
7.4999412397
2.4255931513
1.9215060332
P-value
0.8036858951
0.0173175912
0.1361101668
0.1946174186
Regression
Residual
Total
Intercept
Viacom
AT&T
GM
Low significance of the F statistic - indicates that, overall, the regression output is statistically significant (valid), at leas
p-Values for each variable - The lower the p-Value, the better predictor the variable was.
Viacom returns are a good predictor of the S&P
AT&T and GM returns are much less effective predictors of the S&P return (higher p-Values) - These would not be vali
The small coefficients of these two company returns also indicate that they are less valid predictors.
Adding new independent variables to a regression equation never decreases R Square.
Adjusted R Square is increased only when newly added independent variables increase predictability of the dependent
Coke
r investments
ates that 94% of the variance of the S&P return is explained by the model - This is good.
Viacom indicates that it is the biggest predictor of the S&P. Its high correlation indicates this as well.
egression is used to determine confidence intervals.
al = Predicted S&P Value +/- z(95%) * (Standard Error)
e) shows high ratio of explained (regression) over unexplained (residual) variance. Low p value (Significance of F) shows regressi
iance (17.9) / Unexplained variance (0.67) = 26.6 - This is high and is good. A low P value shows that this is significant.
Significance F
0.0363534242
Lower 95%
-1.7754091113
0.1680596703
-0.1316600237
-0.1130994085
Upper 95%
2.0255333115
0.6203819408
0.4719301521
0.2956343158
Lower 95.0%
-1.7754091113
0.1680596703
-0.1316600237
-0.1130994085
Upper 95.0%
2.0255333115
0.6203819408
0.4719301521
0.2956343158
AFTER
Average
Daily
DEALER
Sales
Sales
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
100
130
120
140
155
200
300
260
190
185
100
130
120
140
155
110
135
122
157
160
206
309
283
202
192
110
135
122
157
160
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
P
Q
R
S
T
U
V
W
X
Y
Z
A1
B1
C1
D1
200
300
260
190
185
100
130
120
140
155
200
300
260
190
185
206
309
283
202
192
110
135
122
157
160
206
309
283
202
192
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
hypothesis is 0 and =
After
-4.6296
18.986
-1.7226
-0.5535
6.679
1.8261
Mean
Variance
Observations
Pearson Correlation
Hypothesized Mean Difference
df
t Stat
P(T<=t) one-tail
t Critical one-tail
P(T<=t) two-tail
t Critical two-tail
Problem:
Car
1
2
3
4
5
6
A tire manufacturer wants to determine if a new rubber formulation will improve tire wear.
12 sets of tires were created with the old rubber formula and 12 sets with the new
rubber formulation. They were placed on the following cars and driven until they wore out.
Determine at a 0.05 level of significance whether the new rubber produces longer tread life.
Tire Location
Front
Rear
Front
Rear
Front
Rear
Front
Rear
Front
Rear
Front
Rear
Old Rubber
37661
42342
31108
41239
32903
42658
29829
39616
34625
42650
31923
39990
New Rubber
31902
41203
38816
43305
35375
52353
30883
49424
38724
43234
34565
43861
The NULL Hypothesis here is that the mean tread wear of the old rubber equals the mean tread wear of the new rubber.
The p-Value for both the one-tailed test and the two-tailed test is less than the level of significance (0.05) so the NULL Hypothesis
is rejected - Therefore, we have a 95% certainty that the new rubber compound increases tread life.
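The t Stat in the Excel output for this problem can be reproduced by hand. The sketch below treats the two columns as paired observations (same car and tire location), which is an assumption based on the Pearson Correlation and df = 11 rows in the output; the variable names are illustrative, and the critical value is copied from the output rather than computed.

```python
import math
from statistics import mean, stdev

# Tread life in miles, paired by car and tire location (front/rear).
old_rubber = [37661, 42342, 31108, 41239, 32903, 42658,
              29829, 39616, 34625, 42650, 31923, 39990]
new_rubber = [31902, 41203, 38816, 43305, 35375, 52353,
              30883, 49424, 38724, 43234, 34565, 43861]

# A paired t-test works on the per-pair differences.
diffs = [old - new for old, new in zip(old_rubber, new_rubber)]
n = len(diffs)
df = n - 1

# t = mean difference / standard error of the mean difference
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# One-tail critical value for df = 11 at 0.05, copied from the Excel output.
T_CRITICAL_ONE_TAIL = 1.7958848142
reject_null = t_stat < -T_CRITICAL_ONE_TAIL

print(round(t_stat, 4), df, reject_null)
```

A t Stat further from zero than the critical value leads to the same decision as a p-Value below the level of significance.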
Problem:
Viacom
0.7541
14.9701
11.9792
7.907
-5.1724
3.4091
0.7541
14.9701
11.9792
7.907
-5.1724
3.4091
0.7541
14.9701
Evaluate the returns of these two stocks to determine if there is a real difference. Use a 0.05
GM
-4.6296
18.986
-1.7226
-0.5535
6.679
1.8261
-4.6296
18.986
-1.7226
-0.5535
6.679
1.8261
-4.6296
18.986
11.9792
7.907
-5.1724
3.4091
3.4091
0.7541
14.9701
11.9792
7.907
-5.1724
3.4091
0.7541
14.9701
11.9792
7.907
-5.1724
Problem:
-1.7226
-0.5535
6.679
1.8261
1.8261
-4.6296
18.986
-1.7226
-0.5535
6.679
1.8261
-4.6296
18.986
-1.7226
-0.5535
6.679
The NULL Hypothesis that the means of both returns are equ
A company is testing light bulbs from 2 suppliers. Below is listed the hours of usage before
Determine using a 0.05 level of significance whether the new supplier's light bulbs really las
old supplier's.
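p-Values for both one and two tailed tests are greater than the stated level of significance (0.05),
so the NULL Hypothesis cannot be rejected: at this level there is no evidence of a difference in the returns of these companies.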
New
55
45
58
52
54
47
51
61
49
56
52
49
If Change Occurred
Difference
10
5
2
17
5
6
9
23
12
7
10
5
2
17
5
6
9
23
12
7
10
5
2
17
5
6
9
23
12
7
In this case, we want to determine with 95% certainty whether an advertising campa
to our large dealer network. To determine this, we must take Before and After sampl
The keys to success of this sampling are the following:
We are trying to determine whether the Mean Difference falls inside or outside the 9
If the Mean Difference falls within this 95% Confidence Interval, We say that there is
If the Mean Difference Falls outside this Confidence Interval, there is a 95% chance
We can state with 95% certainty that there has been no significant change if the Ave
the 95% Confidence Interval of this mean being 0. To determine the 95% Confidence
Sample size (COUNT) = 30
Sample Standard Deviation (STDEV) = 6.11
Sample Standard Error = 1.11
Sample Mean (AVERAGE) = 9.60
(1 - Confidence Interval) = 0.05
The 95% confidence interval will contain 95% of the area under the Normal curve. The rem
The Z Score represents the right outer edge of the confidence interval. Total area under th
a 95% two-tailed confidence interval is 97.5%. The z Score for this is 1.96. This means tha
is to the left of 1.96 Standard deviations to the right of the mean.
1.96
NORMSINV(0.975)
The 95% Confidence Interval around a Sample Mean of 0 = 0 +/- (Z Score for 95% CI)
0 +/- (1.96) x (1.11)
The 95% Confidence Interval for the Mean = 0 is from -2.18 to +2.18
If the Sample Mean (9.60) is outside of the 95% Confidence Interval for the Mean Differen
We can say with 95% certainty that Average Daily sales throughout the entire population o
has increased.
This is the case because Mean of 9.60 is outside of the confidence interval of -2.18 to +2.
We can now state with 95% certainty that the advertising campaign has caused a change
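The Before/After calculation can be sketched in a few lines. This is an illustration rather than the workbook's own method: it assumes the 30 differences are the ten listed values repeated three times, as in the Difference column, and uses Python's `statistics.NormalDist` in place of NORMSINV.

```python
import math
from statistics import NormalDist, mean, stdev

# After-minus-Before differences for the 30 sampled dealers.
differences = [10, 5, 2, 17, 5, 6, 9, 23, 12, 7] * 3

n = len(differences)
sample_mean = mean(differences)
std_error = stdev(differences) / math.sqrt(n)  # sample std dev / sqrt(n)

# Equivalent of NORMSINV(0.975): z value with 97.5% of the area to its left.
z = NormalDist().inv_cdf(0.975)

# 95% confidence interval around a hypothesized mean difference of 0.
half_width = z * std_error
changed = abs(sample_mean) > half_width

print(round(sample_mean, 2), round(std_error, 2), round(half_width, 2), changed)
```

If the sample mean lies outside the interval from `-half_width` to `+half_width`, the hypothesis of no change is rejected at the 0.05 level.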
Old Rubber
37212
23678506
12
0.7364904091
0
11
-2.3950919344
0.0177699241
1.7958848142
0.0355398482
2.2009851587
New Rubber
40303.75
43699518.3864
12
New
47.5 52.416666667
90.5476190476 21.537878788
22
12
66.8255208333
0
32
-1.675954
0.051746314
1.6938887026
0.103492628
2.0369333344
average daily sales over a week or a month. It cannot just be one sample of one day's sales
tative of the overall population.
e falls inside or outside the 95% Confidence Interval that the Mean Difference is 0.
Interval, We say that there is a 95% that the Mean Difference is 0 and No change occurred.
terval, there is a 95% chance that average daily sales for the whole network has changed.
ple Standard Error = (Sample Standard Deviation) / ( Square Root of Sample Size)
under the Normal curve. The remaining 5% () will be split between each outer tail on the Normal curve.
nce interval. Total area under the Normal curve to the left of this Z value for
e for this is 1.96. This means that 97.5% of the total area under the Normal curve
MSINV(0.975)
= 0 +/- (Z Score for 95% CI) * (Sample Standard Error)
ampaign has caused a change in the daily sales of the dealer network.
ANOVA is a technique for testing the equality of different population means. ANOVA is very useful because it can be
extended to any number of populations. All ANOVA tests evaluate the NULL Hypothesis - that is - all samples drawn have the sam
ANOVA is often used by marketers to test whether different marketing campaigns with multiple varying elements actua
The NULL Hypothesis is rejected - that is - there are real differences between the means - if the p-Value pertaining to t
item being evaluated is less than the desired level of significance. For example, in the 1st ANOVA below, the p-Value
pertaining to "Between Methods (Groups)" is less than the desired level of significance - So there is a difference betwe
Students
Problem: 3 different sales training methods are used. Three groups of
four randomly chosen new salespeople are chosen. Each
group is trained using one of the methods. After the course
is completed, the sales totals of each salesperson over the
next two weeks are collected.
1
2
3
4
Count
Sum
4
4
4
68
80
92
ANOVA
Source of Variation
Between Groups
Within Groups
Total
SS
df
72
46
2
9
118
11
The p-Value for Methods (Between Groups, which are the Methods) (0.014419201) is less than the level of significance (0.05),
so there is a difference between the effectiveness of the teaching methods.
The p-Value calculated by Excel agrees with the hand-calculated p-Value, which is less than the level of significance. There is a
difference in the effectiveness between the courses.
Problem:
Typist 1
Typist 2
Typist 3
Typist 4
Typist 5
In this example, the two factors that influence the speed of typing are 1) the keyboard, and 2) the typing ability of each
Count
Keyboard A
Keyboard B
Keyboard C
Sum
3
3
3
3
3
180
338
141
303
216
5
5
5
375
379
424
ANOVA
Source of Variation
Rows
Columns
Error
Total
SS
df
9151.0666666667
296.1333333333
94.5333333333
4
2
8
9541.7333333333
14
The p-Value for the Rows (5.42004E-08) is much less than the level of significance (0.05) so there is a difference betwe
The p-Value for columns (0.003428581) is much less than the level of significance (0.05) so there is a difference betwe
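The Rows and Columns sums of squares in this two-factor ANOVA (without replication) can be recomputed directly from the typing speeds. The sketch below is illustrative and assumes rows are the five typists and columns the three keyboards; it stops at the F statistics because the p-Values need an F distribution function that the Python standard library does not provide.

```python
# Typing speeds: one list per keyboard, one entry per typist (Typist 1..5).
speeds = {
    "Keyboard A": [51, 109, 47, 98, 70],
    "Keyboard B": [57, 112, 43, 98, 69],
    "Keyboard C": [72, 117, 51, 107, 77],
}
columns = list(speeds.values())
n_rows, n_cols = 5, 3

grand_mean = sum(sum(col) for col in columns) / (n_rows * n_cols)
row_means = [sum(col[i] for col in columns) / n_cols for i in range(n_rows)]
col_means = [sum(col) / n_rows for col in columns]

# Sums of squares for typists (rows), keyboards (columns), and the remainder.
ss_rows = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)
ss_total = sum((x - grand_mean) ** 2 for col in columns for x in col)
ss_error = ss_total - ss_rows - ss_cols

# Mean squares and F statistics; the error term has (rows-1)*(cols-1) df.
ms_error = ss_error / ((n_rows - 1) * (n_cols - 1))
f_rows = (ss_rows / (n_rows - 1)) / ms_error
f_cols = (ss_cols / (n_cols - 1)) / ms_error

print(round(ss_rows, 4), round(ss_cols, 4), round(ss_error, 4))
print(round(f_rows, 3), round(f_cols, 3))
```

The values should match the SS and F columns of the Excel ANOVA table for this problem.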
Two factors are being evaluated and the tests are performed more than once (in this case, each test is performed in tw
Problem
Sophisticated
Athletic
Design 1
Count
Sum
Average
Variance
2
5.53
2.765
0.00245
2
3.37
1.685
0.25205
2
5.97
2.985
0.18605
2
2.9
1.45
0.005
2
5.13
2.565
0.00125
2
6.03
3.015
0.03645
6
16.63
2.7716666667
0.0732566667
6
12.3
2.05
0.62848
Design 2
Count
Sum
Average
Variance
Design 3
Count
Sum
Average
Variance
Total
Count
Sum
Average
Variance
ANOVA
Source of Variation
Sample
Columns
Interaction
Within
SS
Total
df
0.8072111111
4.9910777778
2.2771222222
1.0447
2
2
4
9
9.1201111111
17
The p-Value for Sample (0.076062) is more than the level of significance (0.05). We cannot reject the NULL Hypothesis
The p-value for Columns (0.00037339) is less than the level of significance (0.05). This indicates that that overall adve
The p-Value for Interaction (0.022409) is less than the level of significance. This indicates that different combinations
Column Total
Method 1
16
21
18
13
68
Column Mean
17
Grand Mean =
20
-3
36
72
Degrees of Freedom
Between Groups DOF = # groups - 1 = c - 1 = 3 - 1 =
11
Sum of Squares
Between Groups Sum of the Squares
Sum of Squares Within Groups
72
46
118
Mean Squares
MS = Mean Square = Sum of Square / degrees of freedom
SS
72
46
df
2
9
F Statistic
F Statistic = (MS Between Group) / (MS Within Groups)
F Statistic = 36 / 5.111111 =
7.0434782609
p Value
p-Value = FDIST(F Statistic,DOF Between Groups,DOF Within Groups) =
p-Value = FDIST(7.043478,2,9) =
0.0144192029
The p-value of 0.014419 is less than the designated level of significance of 0.05. This indicates
if there was no difference in effectiveness between the courses. Therefore, there is at least 95
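The hand calculation for the three teaching methods can be condensed into a short script. This is a minimal sketch with illustrative names; it reproduces the sums of squares, mean squares, and F statistic, and leaves the FDIST step to Excel because the Python standard library has no F distribution function.

```python
# Sales totals for the four salespeople trained with each method.
groups = {
    "Method 1": [16, 21, 18, 13],
    "Method 2": [19, 20, 21, 20],
    "Method 3": [24, 21, 22, 25],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Between Groups: group size times squared distance of each column mean
# from the grand mean.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within Groups: squared distance of each value from its own column mean.
ss_within = sum(
    (x - sum(values) / len(values)) ** 2
    for values in groups.values()
    for x in values
)

df_between = len(groups) - 1                # c - 1
df_within = len(all_values) - len(groups)   # n - c

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_statistic = ms_between / ms_within

print(ss_between, ss_within, round(f_statistic, 6))
```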
by Excel
is worksheet
Teaching Method
Method 1
Method 2
Method 3
16
19
24
21
20
21
18
21
22
13
20
25
Average
Variance
17 11.3333333
20 0.66666667
23 3.33333333
P-value
F
36 7.04347826 0.014419201
5.1111111111
MS
F crit
4.2564947291
Keyboard A
51
109
47
98
70
Keyboard B
57
112
43
98
69
Keyboard C
72
117
51
107
77
Average
Variance
60
117
112.6666667 16.3333333
47
16
101
27
72
19
75
75.8
84.8
767.5
819.7
724.2
P-value
MS
F
2287.766667 193.605078 5.42004E-008
148.0666667 12.5303244 0.003428581
11.81666667
F crit
7.0060766231
8.6491106407
Sophisticated
2.80
2.73
3.29
2.68
2.54
2.59
Popular
Athletic
2.04
1.33
1.50
1.40
3.15
2.88
Total
2
6
2.84
11.74
1.42 1.95666667
0.0512 0.46722667
2
6
2.82
11.69
1.41 1.94833333
0.3362 0.75057667
2
6
3.25
14.41
1.625 2.40166667
0.17405 0.44477667
6
8.91
1.485
0.12407
Popular
1.58
1.26
1.00
1.82
1.92
1.33
P-value
MS
F
0.403605556 3.4770269 0.076062669
2.495538889 21.4988513
0.00037339
0.569280556 4.90430267 0.022409688
0.116077778
F crit
4.2564947291
4.2564947291
3.6330885115
ect the NULL Hypothesis that states that the package does not affect sales.
by Hand
Method 2
19
20
21
20
80
Method 3
24
21
22
25
92
Column Total
20
23
Column Mean
36
MS
36
5.1111111111
The p-Value represents the proportion of area under the F Distribution curve to the right of the given F value.
If this p-Value is less than the stated level of significance, this demonstrates that there is a difference
in the objects or processes being analyzed - in other words, there is a difference in the group means.
nce of 0.05. This indicates that there is less than a 5% chance that this result could have occurred
refore, there is at least 95% certainty that there is a real difference in effectiveness of the courses.
Method 1
16
21
18
13
68
17
Method 1
Method 2 Method 3
19
24
20
21
21
22
20
25
80
92
20
23
Method 2 Method 3
16 - 17
21-17
18 - 17
13 - 17
Method 1
-1
4
1
-4
19 - 20
20 - 20
21 - 20
20 - 20
24 - 23
21 - 23
22 - 23
25 - 23
Method 2 Method 3
-1
1
0
-2
1
-1
0
2
Square each
Method 1
1
16
1
16
34
46
Method 2 Method 3
1
1
0
4
1
1
0
4
2
10
Quality control people use the Chi Square test to determine if a process's variance levels are staying within given limits.
The Chi Square Distribution is used to determine if a population's variance has been changed. The Chi Square Distribution is skewed, with the mean of the
curve at the point on the x axis that equals the number of degrees of freedom (n-1 --> Sample Size - 1); the high point of the curve sits slightly to the left of that point. The total area under the Chi Squared curve is 1.0.
The area under the curve to the left or right of outer limits determines whether it can be said with a certain degree of confidence that the population variance has changed.
If the area outside the Chi Square Statistic (the p value) is less than the desired level of significance, then the population variance has changed.
If Sample Standard Deviation, s, is greater than Population Standard Deviation, σ, then the Chi Squared Statistic will be to the right of (greater than) the degree of freedom point
and the p value produced by CHIDIST(ChiSquare Statistic, degrees of freedom) will be the p value of the right tail.
If Sample Standard Deviation, s, is less than Population Standard Deviation, σ, then the Chi Squared Statistic will be to the left of (less than) the degree of freedom point
and the p value produced by CHIDIST(ChiSquare Statistic, degrees of freedom) will still be the area under the Chi Square curve to the right of the Chi Square Statistic point.
To get the area under the left tail (area to the left of the Chi Square point), the p-value = 1 - CHIDIST(Chi Square Statistic, degrees of freedom)
Problem: A manufacturer wants to check if the variance on a process has changed. A machine drills a hole as part o
The standard deviation of the hole diameter has historically been 1.6 ml.
A random sample of 50 hole diameters were checked in one batch. The measured sample standard deviatio
At a 0.05 level of significance, has the population standard deviation increased above 1.6 ml?
Givens:
n = 50
Degrees of Freedom = n-1 = 49
Level of Significance, α, = 0.05
Population Standard Deviation, σ, = 1.6
Sample Standard Deviation, s, = 1.9
Use the Chi Squared Test to determine if there has been a change in variance.
1) Calculate Chi Square Statistic, χ², = [ (n-1)*(s*s) ] / (σ*σ) = 69.09766
2) Calculate the p value = CHIDIST(69.09766,49) = 0.030749
This p value states the portion of total area under the Chi Square distribution curve for 49 degree of freedom to the
The Chi Square Statistic is caluculated from sample size (n - 1), population standard deviation, and sample standa
If the p value ( the area under the Chi Square distribution curve to the right of the Chi Square Statistic on that curve
greater than the level of significance value we are evaluating ( = 0.05 on a one-tailed test), then we accept the NU
In this case the p value (0.030749) is less than the desired level of significance (α = 0.05), and we reject the NULL Hypothesis that there has been no change.
It appears that the population standard deviation has increased above 1.6 ml.
Problem: A manufacturer wants to check if the variance on a process has changed. A machine drills a hole as part o
The standard deviation of the hole diameter has historically been 1.6 ml. The engineers believe that they ha
A random sample of 50 hole diameters were checked in one batch. The measured sample standard deviatio
At a 0.05 level of significance, has the population standard deviation decreased below 1.6 ml?
Givens:
n = 50
Degrees of Freedom = n-1 = 49
Level of Significance, α, = 0.05
Population Standard Deviation, σ, = 1.6
Sample Standard Deviation, s, = 1.375
Use the Chi Squared Test to determine if there has been a change in variance.
1) Calculate Chi Square Statistic, χ², = [ (n-1)*(s*s) ] / (σ*σ) = 36.18774
2) CHIDIST(36.18774,49) = 0.912951 (area to the right of the Chi Square Statistic)
3) p value of the left tail = 1 - CHIDIST(36.18774,49) = 0.087049
This p value states the portion of total area under the Chi Square distribution curve for 49 degree of freedom to the
The Chi Square Statistic is calculated from sample size (n - 1), population standard deviation, and sample standard
If the p value ( the area under the Chi Square distribution curve to the right of the Chi Square Statistic on that curve
greater than the level of significance value we are evaluating ( = 0.05 on a one-tailed test), then we accept the NU
In this case the p value (0.087049) is greater than the desired level of significance (α = 0.05), and we do not reject the NULL Hypothesis that there has been no change.
It appears that the population standard deviation has not decreased below 1.6 ml.
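The Chi Square Statistic for both hole-drilling problems follows from the same formula. The sketch below is illustrative (the function name `chi_square_statistic` is an assumption); it covers only the statistic, since the CHIDIST step needs a chi-square distribution function that the Python standard library lacks. A statistics library such as SciPy (`scipy.stats.chi2.sf`) could supply the right-tail area.

```python
def chi_square_statistic(n, sample_sd, population_sd):
    """Chi Square Statistic = (n - 1) * s^2 / sigma^2."""
    return (n - 1) * sample_sd ** 2 / population_sd ** 2


# First problem: has the standard deviation increased above 1.6? (s = 1.9)
stat_increase = chi_square_statistic(50, 1.9, 1.6)

# Second problem: has it decreased below 1.6? (s = 1.375)
stat_decrease = chi_square_statistic(50, 1.375, 1.6)

degrees_of_freedom = 50 - 1
print(round(stat_increase, 5), round(stat_decrease, 5), degrees_of_freedom)
```

A statistic above the degrees of freedom (49) corresponds to s greater than σ, and a statistic below it corresponds to s less than σ.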
Normal Distribution
The Normal distribution is a continuous distribution, as opposed to a discrete distribution such as the binomial distribution.
Any Normal distribution can be identified by two parameters - the mean and standard deviation
The area under the entire density function = 1.
Most problems involving the Normal distribution fall into two categories:
1) Determining the probability of a normally distributed random variable having a value within a given interval
2) Determining a Confidence Interval - that is - Determining an interval within which the value of a normally distributed random variable will fall with a given probability
To be able to apply the Normal distribution, it is extremely important that the underlying population can be proven to be normally distributed. This is often not the case.
For any population, whether Normally distributed or not, the distribution of x bar (the sample mean) is approximately
Normally distributed if sample size is large (30 or more).
This is a basic tenet of the Central Limit Theorem - Statistics' most fundamental rule.
It is important to note that the problems on this page do not deal with samples. These problems only use parameters
The z distribution, sometimes called the standard normal distribution, is a normal distribution with the mean, μ, = 0 and the standard deviation, σ, = 1.
Population parameters are generally described with Greek letters, such as μ (population mean) and σ (population standard deviation),
while Sample parameters are generally described with Roman letters, such as x bar (sample mean) and s (sample standard deviation).
Statistical Function NORMSDIST(z) tells what percentage of total area of the standardized normal curve (mean = 0 and standard deviation = 1)
is to the left of a point z standard deviations from the mean, which is 0.
NORMSDIST(0) = 0.5
This means that half of the area under the standardized normal curve exists to the left of z when z = 0 (z is exactly on top of the mean, that is, 0 standard deviations away from the mean).
NORMSDIST(1.96) = 0.975
This means that 97.5% of the total area under the standardized normal curve is to the left of z when z is 1.96 standard deviations from the mean.
This point of z = 1.96 is often used to calculate the 95% Confidence interval. That is, the section under the normal curve that starts at 1.96
standard deviations to the left of the mean and extends to 1.96 standard deviations to the right of the mean will contain
95% of the total area under the bell shaped Normal curve.
Statistical Function NORMSINV() tells how many standard deviations from the mean a point on the standardized normal curve must be so that the total area under the curve to its left
will equal the percentage given as the argument for the function.
NORMSINV(0.975) = 1.96
This means that 97.5% of the total area under the normal curve is to the left of the point 1.96 standard deviations from the mean
Statistical Function NORMDIST(x, mean, standard dev, TRUE) will calculate the area under the curve to the left of point x on a normal curve with the given mean and standard deviation.
(The TRUE is stated to provide Cumulative area - This is nearly always TRUE)
NORMDIST(1.96,0,1,TRUE) = 0.975
Problem: A store has normally distributed daily sales. The average daily sales = $2,000 and the daily sales standard deviation = $500.
What is the probability that the sales of one random day will be below $1,000?
Population Mean = μ = "mu" = $2,000
Population Standard Deviation = σ = "sigma" = $500
x = $1,000
NORMDIST(1000,2000,500,TRUE) = 0.02275 = 2.28%
This can be interpreted by saying that only 2.28% of the total area under this particular Normal curve falls to the left of x = 1,000
Problem: A brand of car has a mean fuel consumption of 27 mpg with a standard deviation of 5 mpg.
What percentage of the cars can be expected to have a fuel consumption of between 25 mpg and 30 mpg?
Fuel consumption is normally distributed for this population.
Percentage of cars with fuel efficiency between 25 mpg and 30 mpg =
Percentage of cars with fuel efficiency less than 30 mpg - Percentage of cars with fuel efficiency less than 25 mpg =
NORMDIST(30,27,5,TRUE) - NORMDIST(25,27,5,TRUE) = 0.725747 - 0.344578 = 0.381169, or about 38.1%
Statistical Function NORMINV(probability, mean, standard dev) returns the point x on a normal curve with the given mean and standard deviation such that the total area under the curve to the left of x
will equal the percentage given as the first argument for the function.
NORMINV(0.975,0,1) = 1.96
This means that 97.5% of the total area under the normal curve is to the left of the point 1.96 standard deviations from the mean
Problem: A company's package delivery time is normally distributed with a mean of 10 hours and a standard deviation of 3 hours.
What delivery time will be beaten by only 2.5% of all deliveries?
μ = 10
σ = 3
NORMINV(0.025,10,3) = 4.12
Meaning that only 2.5% of all package delivery times will be quicker than 4.12 hours
Problem: A tire company makes a tire with a normally distributed tread life that has a mean of 39,000 miles and standa
What tread life would be exceeded by 98% of all tires?
= 39,000
= 5,000
NORMINV(0.02,39000,5300) =
28115
Meaning that only 2% of all tires will wear out before 28,115 miles
Problem: A tire company makes a tire with a normally distributed tread life that has a mean of 39,000 miles and standa
What would the range of tread life be that 95% of all tires would wear out in?
μ = 39,000
σ = 5,300
Calculation of the left boundary:
NORMINV(0.025,39000,5300) = 28612
Meaning that only 2.5% of all tires will wear out before 28,612 miles
Calculation of the right boundary:
NORMINV(0.975,39000,5300) = 49388
Meaning that only 2.5% of all tires will wear out after 49,388 miles
So, 95% of tires will wear out in the range of 28,612 miles to 49,388 miles.
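The inverse lookups in the tire problems can be reproduced with a small sketch. It assumes `statistics.NormalDist.inv_cdf` as a stand-in for NORMINV(probability, mean, standard deviation) and uses the standard deviation of 5,300 that appears in the formulas.

```python
from statistics import NormalDist

# Tread life modeled as Normal(mean = 39000, standard deviation = 5300)
tread_life = NormalDist(mu=39000, sigma=5300)

# Tread life exceeded by 98% of tires = point with 2% of the area to its left
exceeded_by_98_pct = tread_life.inv_cdf(0.02)

# Central 95% range: 2.5% of the area in each tail
left_boundary = tread_life.inv_cdf(0.025)
right_boundary = tread_life.inv_cdf(0.975)

print(round(exceeded_by_98_pct), round(left_boundary), round(right_boundary))
```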
e value of a normally distributed random variable will fall with a given probability
nderlying population can be proven to be normally distributed. This is often not the case.
tandardized normal curve exists to the left of z when z = 0 (z is exactly on top of the mean, that is, 0 standard deviations away from the mea
er that staandardized normal curve is to the left of the z when z is 1.96 standard deviations from the mean.
ate the 95% Confidence interval. That is, the section under the normal curve that starts a 1.96
nd extends to 1.96 standard deviations to the right of the normal curve will contain
Normal curve.
ve is to the left of the mean that the stated total area under the normal curve
er the normal curve is to the left of the point 1.96 standard deviations from the mean
he curve to the left of point x on a normal curve with the given mean and standard deviation.
g the only 2.28% of the total area under this particular Normal curve falls to the left of x = 1,000
Confidence Intervals
Collection of 40 individual test scores
210
340
490
610
In other words, calculate a 95% Confidence Interval for the population mean.
Sample size must be at least 30 and must be random and representative of the population
Sample size (COUNT) = 40
Sample Standard Deviation (STDEV) = 159.48
Alpha (1 - Confidence Level) = 0.05
Mean (AVERAGE) = 473.75
Excel calculates the half-width of the Confidence Interval to be 49.42 using the following statistical function: CONFIDENCE(alpha, standard deviation, sample size)
Inputs for this function are CONFIDENCE(0.05,159.48,40) = 49.42
Let's see how Excel's calculation holds up to the correct, manual calculation of the Confidence Interval.
(Excel hits it just about right on)
The 95% Confidence Interval around a Sample Mean = Sample Mean +/- (Z Score for 95% Confidence Interval) * (Sample Standard Error)
Z Score for 95% Confidence Interval (two sided) = Z(0.975) = NORMSINV(0.975) = 1.96
Sample Standard Error = (Sample Standard Deviation) / (Square Root of Sample Size)
Sample Standard Error = (159.48) / (Square Root [40]) = 25.21
Confidence Interval = 473.75 +/- (1.96) x (25.21) = 473.75 +/- 49.41 = 424.34 to 523.16
This means that we can be 95% confident that the mean of the entire population
is between the endpoints of this 95% Confidence Interval
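The manual calculation above can be sketched as follows. This is a minimal illustration that starts from the summary values given in the worksheet (n = 40, standard deviation 159.48, mean 473.75) and assumes `statistics.NormalDist` for the z score; it is not the workbook's own formula.

```python
import math
from statistics import NormalDist

n = 40            # sample size (COUNT)
s = 159.48        # sample standard deviation (STDEV)
mean = 473.75     # sample mean (AVERAGE)
alpha = 0.05      # 1 - confidence level

# Two-sided z score, the counterpart of NORMSINV(0.975)
z = NormalDist().inv_cdf(1 - alpha / 2)

std_error = s / math.sqrt(n)   # sample standard error
margin = z * std_error         # half-width, comparable to CONFIDENCE(0.05,159.48,40)

lower, upper = mean - margin, mean + margin
print(f"{mean} +/- {margin:.2f} = {lower:.2f} to {upper:.2f}")
```

Because the z score is not rounded to 1.96 here, the half-width comes out at about 49.42 rather than 49.41.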
Determining Sample Size (n) for a Given Confidence Level and Bound (B)
n = number of samples needed to establish a specified confidence interval of width B on either side of the mean
e.g. How many samples must be taken to estimate the population diameter (of, for example,
holes drilled by a machine) to within 0.05 mm of the mean sample diameter with 99% confidence?
Standard deviation (determined from previous sampling) is 0.75 mm.
n = [ (Z score of two-tailed 99% confidence)**2 x (sample standard deviation)**2 ] / [Interval**2]
n = [ (2.575)**2 x (0.75)**2 ] / [ (0.05)**2 ] = 1,492
NORMSINV(0.995) = 2.576 (2.575 is used in the calculation above)
Problem: A restaurant owner wants to estimate within $2.00 the average amount that customers spend during lunch.
From experience, the standard deviation of the population is $5.00. How many samples need to be taken to get a sample mean
that is 92% certain of being within $2.00 of the population mean?
Z score of two-tailed 92% confidence = NORMSINV(0.96) = 1.751
Population Standard Deviation = 5.00
Interval = 2.00
n = [ (1.751)**2 x (5.00)**2 ] / [ (2.00)**2 ] = 19.16, which rounds up to 20 samples
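The sample-size formula can be wrapped in a small helper. The function name `required_sample_size` is hypothetical, and the sketch assumes the result is always rounded up to the next whole sample.

```python
import math
from statistics import NormalDist

def required_sample_size(confidence, std_dev, bound):
    """n = (z * std_dev / bound)**2, rounded up to a whole sample.

    confidence: two-tailed confidence level, e.g. 0.92
    std_dev:    standard deviation from earlier sampling or experience
    bound:      allowed distance B on either side of the mean
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * std_dev / bound) ** 2)

restaurant_n = required_sample_size(0.92, 5.00, 2.00)
holes_n = required_sample_size(0.99, 0.75, 0.05)
print(restaurant_n, holes_n)
```

With the unrounded z score (about 2.5758) the drilled-hole example evaluates to roughly 1,492.9 and rounds up to 1,493, one more than the 1,492 obtained above with z = 2.575.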
220
370
500
640
230
370
500
640
240
380
510
640
270
400
510
650
Z scores used above (Insert / Function / NORMSINV): NORMSINV(0.975) = 1.96, NORMSINV(0.95) = 1.64, NORMSINV(0.995) = 2.576, NORMSINV(0.96) = 1.751
Samples
300
410
540
660
300
410
540
660
320
450
580
750
320
470
580
750
320
470
610
790
Binomial Distribution
Binomial distributions are collections of discrete values as opposed to, for example, the Normal distribution, which is continuous.
Any binomial distribution can be identified by the value of two of its variables - the number of trials (n) and the probability of success on a single trial (p).
In this case, generate 5 random numbers, each with possible outcomes of 2 or 3. Each event has a 20% probability of a "2" outcome.
(You could easily do the same thing with outputs of 1 and 0 - measuring something occurring or not occurring)
3
3
3
2
2
Number of variable = 1
(The value of the 1 variable is 1 or 0)
Number of random variables = 5
Outcome Probability
2
0.2
3
0.8
Sum of 2's =
2
Statistical function COUNTIF - Select the range of outputs to be counted and then select the cell that has the output to be counted (where outcome = 2)
The sum is the number of successes in 5 random trials, each having a 0.20 chance of a "2" outcome.
Problem: What is the probability of 3 successful outcomes in 5 trials if the probability of a success on each trial is 0.2?
s = number of successes = 3
n = number of trials = 5
p = probability of success = 0.2
Probability of this is = BINOMDIST(3,5,0.2,0) = 0.0512
Statistical Function / BINOMDIST (in this case, you don't want cumulative distribution - Use 0 as the last argument)
Which is = 5.12% (Format / Cell / Percentage)
Problem - In 12 trials (n = 12), what is the probability that at least 10 of them (Sum of the probabilities that s = 10, s = 11, and s = 12)
will have the 1 of the 2 possible outcomes that has a probability of occurring of 65%?
The probabilities of each outcome need to be added up.
s     n     p      Cumulative   Probability
10    12    0.65   0            0.108846
11    12    0.65   0            0.036753
12    12    0.65   0            0.005688
Sum = 0.151288. This represents a combined probability of 15.13%
Statistical function BINOMDIST(s,n,p,FALSE)
BINOMDIST(10,12,0.65,0) + BINOMDIST(11,12,0.65,0) + BINOMDIST(12,12,0.65,0)
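The non-cumulative BINOMDIST values can be reproduced from the binomial formula directly. The helper name `binom_pmf` is hypothetical; the sketch relies only on `math.comb` from the standard library.

```python
from math import comb

def binom_pmf(s, n, p):
    """Probability of exactly s successes in n trials (BINOMDIST with 0 as last argument)."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)

# 3 successes in 5 trials with p = 0.2
p_three_of_five = binom_pmf(3, 5, 0.2)

# At least 10 successes in 12 trials with p = 0.65: add the cases s = 10, 11, 12
p_at_least_10 = sum(binom_pmf(s, 12, 0.65) for s in (10, 11, 12))

print(f"{p_three_of_five:.4f} {p_at_least_10:.6f}")
```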
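BINOMDIST(6,10,0.5,1) = 0.828125   (s = 6, p = 0.5, n = 10, cumulative = 1)
BINOMDIST(3,10,0.5,1) = 0.171875   (s = 3, p = 0.5, n = 10, cumulative = 1)
BINOMDIST(6,10,0.5,1) - BINOMDIST(3,10,0.5,1) Equals 0.828125 - 0.171875 = 0.65625 (the probability of 4, 5, or 6 successes in 10 trials)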
Problem - If 10% of products require servicing, what is the probability that fewer than 15 out of 200 products will need servicing?
The problem actually asks what is the probability that up to 14 products will need servicing.
Therefore, you are solving for the cumulative probability that up to 14 products need servicing
s = 14
p = 0.10
n = 200
Cumulative: TRUE = 1
BINOMDIST(14,200,0.10,1) = 0.092946 = 9.29%
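A cumulative binomial probability is the sum of the individual probabilities from 0 up to s. The sketch below assumes a hypothetical helper `binom_cdf` as the counterpart of BINOMDIST with 1 (TRUE) as the last argument.

```python
from math import comb

def binom_cdf(s, n, p):
    """Probability of at most s successes in n trials (cumulative BINOMDIST)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1))

# Fewer than 15 of 200 products need servicing = at most 14
p_up_to_14 = binom_cdf(14, 200, 0.10)

# Difference of two cumulative values: between 4 and 6 successes in 10 trials
p_4_to_6 = binom_cdf(6, 10, 0.5) - binom_cdf(3, 10, 0.5)

print(f"{p_up_to_14:.6f} {p_4_to_6:.5f}")
```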
15.13%
0.65625
65.63%
Population Proportions
When sample of size n is used to estimate a population proportion, e.g. a proportion of a population who would vote f
it can be analyzed using the binomial distribution
The population proportion of success will be the same as p, the probability of success of a single trial.
The following relationships hold true for population proportions:
The mean of sample proportions = μ = p
The standard deviation of sample proportions = σ = SQRT { [ p (1 - p) ] / n }
The confidence interval of a population proportion would be = p +/- z * SQRT { [ p (1 - p) ] / n }
Problem: A random sample of 350 people was chosen and each person was asked if they recognized a particular bran
112 people recognized the brand. Calculate a 95% confidence interval of the proportion of the total populatio
who recognize the brand.
Givens:
n = 350
p = 112 / 350 = 0.32
Confidence level = 0.95 - This means that 2.5% of area under Normal curve exists in each tail above and below the confidence interval
z = NORMSINV(0.975) = 1.96 - 97.5% of the total area under the normal curve is to the left of a point 1.96 standard deviations above the mean
Confidence interval = p +/- z * SQRT { [ p (1 - p) ] / n } = 0.32 +/- 0.04887 = 0.27113 to 0.36887
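The brand-recognition interval can be recomputed with a short sketch, assuming `statistics.NormalDist` for the z score; variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 350                  # people sampled
recognized = 112         # people who recognized the brand
p = recognized / n       # sample proportion = 0.32

z = NormalDist().inv_cdf(0.975)               # counterpart of NORMSINV(0.975)
margin = z * math.sqrt(p * (1 - p) / n)       # z * SQRT{[p(1-p)]/n}

lower, upper = p - margin, p + margin
print(f"{p:.2f} +/- {margin:.5f} = {lower:.5f} to {upper:.5f}")
```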
The minimum number of samples needed, n, to obtain a confidence interval of a certain width, e (or given sample error), is:
n = p (1-p) (z/e)**2
It is better to use the binomial distribution to calculate the p value when dealing with population proportions.
(The p value is the area under the Normal curve outside of x - NOT the probability of a successful trial)
Problem: A manufacturer of circuit boards wants to keep the proportion of defective boards at 0.098.
The manufacturer tested 156 randomly chosen boards and found 20 to be defective.
Determine with a 95% certainty (0.05 level of significance) whether the defective proportion has increased above 0.098.
n = 156
p = 0.098
x = 19
1 - the probability that 19 or fewer are defective = 1 - Cumulative probability of 19 defective = 1 - BINOMDIST(19,156,0.098,1)
= 1 - 0.870142 = 0.129858
This p-value of 0.129858 is greater than the level of significance (0.05).
We therefore conclude that the large x value could have happened by chance and we fail to reject the NULL Hypothesis.
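The circuit-board test can be sketched as a one-sided binomial calculation. The helper `binom_cdf` is hypothetical and stands in for the cumulative form of BINOMDIST.

```python
from math import comb

def binom_cdf(s, n, p):
    """Probability of at most s successes in n trials (cumulative BINOMDIST)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1))

n = 156          # boards tested
p = 0.098        # target defective proportion (NULL Hypothesis)
defective = 20   # defective boards found
alpha = 0.05     # level of significance

# Probability of seeing 20 or more defective boards if p is still 0.098
p_value = 1 - binom_cdf(defective - 1, n, p)

reject_null = p_value < alpha
print(f"{p_value:.6f} reject NULL: {reject_null}")
```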
To determine whether a known population has changed, take a sample of the population and use the binomial distribution to
calculate the probability of that sampling event (the number of successes, x, per given sample size, n, given p - the previously established proportion)
and compare that probability to the desired level of significance.
If this probability is less than the level of significance you have established (α for a one-tailed test and α/2 for a two-tailed test),
then the NULL Hypothesis is rejected.
here is a 95% chance that between 27.1% and 36.9% of the total population
Males     Females
40,619    14,974
40,803    15,580
41,129    16,285
40,831    17,000
40,712    17,593
41,334    17,957
41,496    17,492
41,749    18,266
42,645    19,456
42,625    19,591
42,833    20,093
43,053    20,455
43,563    20,689
43,907    21,608
43,589    21,758
44,025    22,134
44,397    22,734
44,837    23,351
44,698    24,043
45,086    25,003
45,671    25,642
46,081    26,770
46,842    27,954
47,627    28,810
48,542    29,580
49,389    30,148
50,862    31,491
51,213    32,972
51,753    34,214
52,784    35,399
54,077    37,323
55,349    38,959
56,225    40,747
56,860    41,866
57,461    42,952
58,105    44,255
59,250    44,994
59,949    46,740
61,126    47,852
61,899    49,085
62,423    50,436
[Chart: Males and Females by year; vertical axis from 0 to 80,000]
Year    Males     Females
1989    63,375    51,996
1990    64,805    52,925
1991    65,149    53,328
1992    65,767    54,356
1993    66,329    54,982
1994    66,788    56,322
1995    67,516    56,871
1996    67,434    57,503
1997    68,884    58,788
1998    69,547    59,583
1999    70,295    60,718
x      x bar (x mean)
20     118
30     118
42     118
40     118
55     118
521    118
# of points (Statistical Function COUNT) = 6
Sum (Arithmetic Function SUM) = 708
Mean (Statistical Function AVERAGE) = 118
Variance = Sum ( (x - x bar)**2 ) / (n-1) = 39117.2
Standard Deviation (Statistical Function STDEV) = 197.7807
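The step-by-step variance calculation can be sketched as below, with the standard-library `statistics.stdev` used only as a cross-check; variable names are illustrative.

```python
import math
import statistics

x = [20, 30, 42, 40, 55, 521]

n = len(x)                      # COUNT
x_bar = sum(x) / n              # AVERAGE

deviations = [value - x_bar for value in x]     # x - x bar
squared = [d ** 2 for d in deviations]          # (x - x bar)**2

sum_of_squares = sum(squared)
variance = sum_of_squares / (n - 1)             # sample variance divides by n - 1
std_dev = math.sqrt(variance)                   # comparable to STDEV

print(x_bar, sum_of_squares, variance, round(std_dev, 4))
print(round(statistics.stdev(x), 4))            # cross-check
```

The single large value (521) dominates the sum of squares, which is why the standard deviation is larger than the mean.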
Median Value Owner Occupied
State                   Median Value
Alabama                 $53,700
Alaska                  $94,400
Arizona                 $80,100
Arkansas                $46,300
California              $195,500
Colorado                $82,700
Connecticut             $177,800
Delaware                $100,100
District of Columbia    $123,900
Florida                 $77,100
Georgia                 $71,300
Hawaii                  $245,300
Idaho                   $58,200
Illinois                $80,900
Indiana                 $53,900
Iowa                    $45,900
Kansas                  $52,200
Kentucky                $50,500
Louisiana               $58,500
Maine                   $87,400
Maryland                $116,500
Massachusetts           $162,800
Michigan                $60,600
Minnesota               $74,000
Mississippi             $45,600
Missouri                $59,800
Montana                 $56,600
Nebraska                $50,400
Nevada                  $95,700
New Hampshire           $129,400
New Jersey              $162,300
New Mexico              $70,100
New York                $131,600
North Carolina          $65,800
North Dakota            $50,800
Ohio                    $63,500
Oklahoma                $48,100
Oregon                  $67,100
Pennsylvania            $69,700
Rhode Island            $133,500
South Carolina          $61,100
South Dakota            $45,200
Tennessee               $58,400
Texas                   $59,600
Utah                    $68,900
Vermont                 $95,500
Virginia                $91,000
Washington              $93,400
West Virginia           $47,900
Wisconsin               $62,500
Wyoming                 $61,600
Descriptive Statistics
Median Value Owner Occupied
Mean
Standard Error
Median
Mode
Standard Deviation
Sample Variance
Kurtosis
Skewness
Range
Minimum
Maximum
Sum
Count
Histogram
Bin Range Requested By Histogram (in Yellow)
Interval: 1, 2, 3, 4, 5, 6, 7, 8
[Histogram chart of Median Value Owner Occupied; vertical axis: Frequency; last bin labeled "More"]
Original Data
Gross Domestic Product Per Capita
using Purchasing Power Parity 1991
Country           GDP Per Capita
Australia         $16,085
Austria           $17,280
Belgium           $17,454
Canada            $19,178
Denmark           $17,621
Finland           $15,997
France            $18,227
Germany           $19,500
Greece            $7,775
Iceland           $17,237
Ireland           $11,507
Italy             $16,896
Japan             $19,107
Luxembourg        $21,372
Netherlands       $16,530
New Zealand       $13,883
Norway            $16,904
Portugal          $9,191
Spain             $12,719
Sweden            $16,729
Switzerland       $21,747
Turkey            $3,491
United Kingdom    $15,720
United States     $22,204
Greece
Portugal
Ireland
Spain
New Zealand
United Kingdom
Finland
Australia
Netherlands
Sweden
Italy
Norway
Iceland
Austria
Belgium
Denmark
France
Japan
Canada
Germany
Luxembourg
Switzerland
United States
Males
Females
Highlight the Males and Females columns of data to create the chart. Do not highlight the year column.
In the 2nd step of creating the chart, click the Series tab and highlight the Year column as the x-axis.
Statistics - Tools / Data Analysis / Descriptive Statistics
                      Males               Females
Mean                  52371.3076923077    34646.6
Standard Error        1362.9393673634     2051.745
Median                50125.5             30819.5
Mode                  #N/A                #N/A
Standard Deviation    9828.2955487544     14795.34
Sample Variance       96595393.3936654    2.2E+008
Kurtosis              -1.3294344022       -1.365154
Skewness              0.4121671405        0.341443
Range                 29676               45744
Minimum               40619               14974
Maximum               70295               60718
Sum                   2723308             1801623
Count                 52                  52
and Variance
x - x bar    (x - x bar)**2
-98          9604
-88          7744
-76          5776
-78          6084
-63          3969
403          162409
Sum ( (x - x bar)**2 ) = 195586
n-1 = 5
Variance = 195586 / 5 = 39117.2
Standard Deviation = SQRT(39117.2) = 197.7807 (Arithmetic Function SQRT)
Mean                  118
Standard Error        80.7436271995
Median                41
Mode                  #N/A
Standard Deviation    197.7806866203
Sample Variance       39117.2
Kurtosis              5.925570311
Skewness              2.429919032
Range                 501
Minimum               20
Maximum               521
Sum                   708
Count                 6
ptive Statistics
195000
220000
220000
245000
ta Analysis / Histogram
Frequency (intervals 1 through 8): 27, 11, 4, 4, 2, 1, 1, 1
Histogram
Bin         Frequency
3491        1
8169.25     1
The data needs to be copied here and then sorted (Data / Sort)
[Chart: Gross Domestic Product Per Capita using Purchasing Power Parity 1991]
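Sorted Data
Country           GDP Per Capita
Greece            $7,775
Portugal          $9,191
Ireland           $11,507
Spain             $12,719
New Zealand       $13,883
United Kingdom    $15,720
Finland           $15,997
Australia         $16,085
Netherlands       $16,530
Sweden            $16,729
Italy             $16,896
Norway            $16,904
Iceland           $17,237
Austria           $17,280
Belgium           $17,454
Denmark           $17,621
France            $18,227
Japan             $19,107
Canada            $19,178
Germany           $19,500
Luxembourg        $21,372
Switzerland       $21,747
United States     $22,204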
Bin         Frequency (continued)
12847.5     3
17525.75    11
More        8
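The bin values and frequencies for the GDP data can be reproduced with a short sketch. It assumes equal-width bins that start at the minimum value, which matches the bin values shown (3491, 8169.25, 12847.5, 17525.75, More); each bin counts values greater than the previous bin and less than or equal to its own value. This is an illustration, not a description of how the Histogram tool chooses bins in general.

```python
# GDP per capita for the 24 countries, in the order of the original table
gdp = [16085, 17280, 17454, 19178, 17621, 15997, 18227, 19500,
       7775, 17237, 11507, 16896, 19107, 21372, 16530, 13883,
       16904, 9191, 12719, 16729, 21747, 3491, 15720, 22204]

n_bins = 4
low, high = min(gdp), max(gdp)
width = (high - low) / n_bins                      # 4678.25
bins = [low + i * width for i in range(n_bins)]    # upper edge of each bin

counts = []
previous = float("-inf")
for upper in bins:
    # count values in the half-open interval (previous, upper]
    counts.append(sum(previous < v <= upper for v in gdp))
    previous = upper

more = sum(v > bins[-1] for v in gdp)              # the "More" row

for upper, count in zip(bins, counts):
    print(upper, count)
print("More", more)
```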
[Charts: Males and Females line chart; Histogram charts with vertical axis: Frequency]
Hypothesis testing is one of the types of statistical tests to determine if a change has occurred to a population mean.
The first hypothesis, the NULL Hypothesis, is usually stated in terms such as "There has been no change in the popu
This will normally involve an equal sign.
The second hypothesis, the Alternative Hypothesis, states that the population mean has changed in one of three ways
1) The population mean has changed (increased OR decreased) - This involves a two-tailed test
2) The population mean has decreased - This involves a one-tailed test with the left tail
3) The population mean has increased - This involves a one-tailed test with the right tail.
In summary, hypothesis testing involves:
1) Determining the NULL hypothesis. This is normally that the original population mean has not changed.
2) Determining the level of certainty to which that NULL Hypothesis will be tested. If you want to establish a 95% certainty leve
3) Take a sample of the population.
4) Calculate the sample mean. This value will be called x.
5) Graph this sample mean on the normal curve created from the original population mean
6) The NULL Hypothesis is rejected or not rejected based upon the results of either of the following tests (which are both equivalent):
6a) The critical value test - The level of significance, α, is converted to a "critical value." This "critical value" is the number of standard deviations that
the level of certainty is from the mean. For example, on a two-tailed test, an α of 0.05 translates to a 95% level of certainty.
On a two-tailed test, this would result in 2.5% of the total area under the Normal curve to be greater than the right critical value
and 2.5% of the area under the Normal curve to be less than the left critical value. Each critical value is 1.96 standard deviations
from the mean on the normal curve - NORMSINV(0.975) = 1.96
The z value of the sample mean is calculated. The z-value is the number of standard deviations that the sample mean is from the mean
on a Normal curve derived from the population mean.
If the z-value of the sample is farther away from the mean than the critical value (the z value of that level of certainty), then the NULL Hypothesis is rejected.
6b) The p-value test - This is equivalent to the above test. A Normal curve is constructed based upon the population mean.
α is the significance level. The significance level represents the percentage of the area under the normal curve that lies outside the region covered by the required level of certainty.
For example, on a two-tailed test with a 95% required level of certainty, α = 0.05. The test is two-tailed so 2.5% of the total area under the normal curve will be above the 95% confidence area
and 2.5% of the area under the normal curve will be below the 95% confidence area.
The p value is equal to the percentage of area under the normal curve that is outside of x on the normal curve.
If the p value is less than the percentage of the area under the normal curve corresponding to α, the NULL Hypothesis is normally rejected.
Problem: A manufacturer claims that the average thickness of metal sheets is 15 mls. and that the population standard deviation is 0.1 mls.
50 sheets are sampled, having a sample mean of 14.982 mls. At the 0.05 significance level (95% confidence level), test
the manufacturer's claim that the average thickness of 15 mls. is correct.
Givens:
n = 50
α = 0.05
σ = 0.1
x = 14.982
μ = 15
The NULL Hypothesis is that μ = 15 mls.
The ALTERNATE Hypothesis is that μ ≠ 15 mls. (Since we are testing whether a difference exists in either direction, this is a two-tailed test.)
1) Calculate Sample Standard Error = σ / SQRT(n) = 0.1 / SQRT(50) = 0.014142
2) Calculate z value = (x - μ) / (Sample Standard Error) = (14.982 - 15) / 0.014142 = -1.272792
3) Calculate p value - the area under the Normal curve outside the sample z value.
1 - NORMSDIST(1.272792) = 0.101546
This states that 10.15% of the total area under the Normal curve lies outside a point 1.27 standard deviations from the mean on either side (tail) of the Normal curve.
THE P TEST CAN BE PERFORMED AT THIS POINT
The NULL Hypothesis is rejected if the p-value (the percentage of area under the Normal curve outside point x) is less than α/2
The p-value = 0.101546 and is much larger than α/2 (0.025) so the NULL Hypothesis is not rejected - The manufacturer's claim cannot be rejected.
THE CRITICAL VALUE TEST
Critical value = NORMSINV(0.975) = 1.96
This states that an α of 0.05 on a two-tailed test produces a confidence interval that goes from 1.96 standard deviations above the mean to 1.96 standard deviations below the mean.
If x is outside of this range (the absolute z value for x is greater than 1.96), then the NULL Hypothesis is rejected.
In this case, the z value of x (1.27279) is less than the critical value (1.96) and therefore x is closer to the mean than the critical value, and we do not reject the NULL Hypothesis.
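Both forms of the two-tailed test can be sketched together. The snippet assumes `statistics.NormalDist` as the counterpart of NORMSDIST and NORMSINV, and follows the convention used above of comparing the one-tail area with α/2.

```python
import math
from statistics import NormalDist

n = 50            # sample size
alpha = 0.05      # significance level
sigma = 0.1       # population standard deviation
x_bar = 14.982    # sample mean
mu = 15.0         # claimed population mean (NULL Hypothesis)

std_error = sigma / math.sqrt(n)
z = (x_bar - mu) / std_error

standard_normal = NormalDist()
p_tail = 1 - standard_normal.cdf(abs(z))          # area beyond |z| in one tail
critical = standard_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value

reject_by_p_value = p_tail < alpha / 2
reject_by_critical_value = abs(z) > critical

print(f"z = {z:.6f}, p = {p_tail:.6f}, critical = {critical:.2f}")
print(reject_by_p_value, reject_by_critical_value)
```

The two decisions always agree, since the one-tail area drops below α/2 exactly when |z| exceeds the critical value.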
Problem: A furniture company states that its average delivery time is 15 days with a (population) standard deviation of 4 days.
A random sample of 50 deliveries showed an average delivery time of 17 days.
Determine within 98% certainty (0.02 significance level) whether delivery time has increased.
Givens:
n = 50
α = 0.02
σ = 4
x = 17
μ = 15
This is a one-tailed test because we are checking whether delivery time increased.
NULL Hypothesis - μ = 15
ALTERNATE Hypothesis - μ > 15
Using the P-test, we will determine if the p value (area above x under the normal curve) is less than α (since this is a one-tailed test).
1) Calculate Sample Standard Error = σ / SQRT(n) = 4 / SQRT(50) = 0.565685
2) Calculate z value = (x - μ) / (Sample Standard Error) = (17 - 15) / 0.565685 = 3.535534
3) Calculate p value - the area under the Normal curve outside the sample z value =
1 - NORMSDIST(3.535534) = 1 - 0.999797 = 0.000203
This states that 0.000203 of the total area under the Normal curve lies above the point 3.535534 standard deviations above the mean.
This p-value (0.000203) is less than α (0.02) so the NULL Hypothesis is rejected - It appears likely that delivery time has increased.
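The one-tailed delivery-time test can be sketched in the same way, with the upper-tail area compared against α itself. `statistics.NormalDist` is assumed as the counterpart of NORMSDIST.

```python
import math
from statistics import NormalDist

n = 50          # deliveries sampled
alpha = 0.02    # significance level
sigma = 4.0     # population standard deviation (days)
x_bar = 17.0    # sample mean delivery time (days)
mu = 15.0       # stated mean delivery time (NULL Hypothesis)

std_error = sigma / math.sqrt(n)
z = (x_bar - mu) / std_error

# Right-tailed test: area under the standard normal curve above z
p_value = 1 - NormalDist().cdf(z)

reject_null = p_value < alpha
print(f"z = {z:.6f}, p = {p_value:.6f}, reject NULL: {reject_null}")
```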
Discrete Variables
Calculating Means, Standard Deviations, and Variances of the distributions of Discrete Variables.
Mean = Sum of [ x * P(x) ]
x (Grade)    P(x) (Probability)    x * P(x)
4            0.1                   0.4
3            0.2                   0.6
2            0.35                  0.7
1            0.25                  0.25
0            0.1                   0
Mean = 1.95
Variance = Sum of [ Square of (x - Mean) * P(x) ]
x (Grade)    Mean    (x - Mean)    Square of (x - Mean)    P(x) (Probability)
4            1.95    2.05          4.2025                  0.1
3            1.95    1.05          1.1025                  0.2
2            1.95    0.05          0.0025                  0.35
1            1.95    -0.95         0.9025                  0.25
0            1.95    -1.95         3.8025                  0.1
Variance = 1.2475
Standard Deviation = SQRT (Variance) = 1.116915 (Mathematical Function SQRT)
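The probability-weighted mean and variance of the grade distribution can be sketched as below; the variable names are illustrative.

```python
import math

grades = [4, 3, 2, 1, 0]
probabilities = [0.1, 0.2, 0.35, 0.25, 0.1]

# Mean = sum of x * P(x)
mean = sum(x * p for x, p in zip(grades, probabilities))

# Variance = sum of (x - mean)**2 * P(x)
variance = sum((x - mean) ** 2 * p for x, p in zip(grades, probabilities))

std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 4), round(std_dev, 6))
```

Unlike the sample variance, no n - 1 divisor appears here: the probabilities already sum to 1 and act as the weights.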