Chapter No. 08 Fundamental Sampling Distributions and Data Descriptions - 02 (Presentation)
Fall-21
If Z = 0, the value is equal to the mean. A positive Z value means the score is above the
mean, while a negative Z value means the score is below the mean.
Chi-squared “n” degrees of freedom
If z1, z2, …, zn are independent standard normal random variables,
then the random variable x = z1² + z2² + … + zn² follows the chi-square
distribution with n degrees of freedom.
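A chi-square variable can be built as a sum of squared independent standard normal values. The following is a minimal simulation sketch of that construction, not part of the original slides; the seed, the choice of 5 degrees of freedom, and the number of draws are arbitrary assumptions. It compares the simulated mean and variance with the theoretical values (the degrees of freedom, and twice the degrees of freedom).

```python
import random
import statistics

def chi_square_draw(n, rng):
    """One draw: the sum of n squared independent standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(42)   # fixed seed so the run is repeatable (arbitrary choice)
n = 5                     # degrees of freedom (arbitrary choice for illustration)
draws = [chi_square_draw(n, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# Theory: a chi-square variable with n degrees of freedom has mean n and variance 2n.
print(f"simulated mean     ~ {sample_mean:.2f} (theory: {n})")
print(f"simulated variance ~ {sample_var:.2f} (theory: {2 * n})")
```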
Chi-squared “n-1” degrees of freedom
The t-distribution is used extensively in problems that deal with inference about
the population mean or in problems that involve comparative samples (i.e., in cases
where one is trying to determine if means from two samples are significantly different)
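F-distribution with (n-1, m-1) degrees of freedom
If X1 and X2 are two independent chi-square random variables with u and v
degrees of freedom, respectively, then the ratio F = (X1/u) / (X2/v) follows
the F distribution with u numerator and v denominator degrees of freedom.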
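The F ratio is formed by dividing each chi-square variable by its own degrees of freedom before taking the quotient, F = (X1/u) / (X2/v). The sketch below simulates that ratio; the values u = 5 and v = 10, the seed, and the number of draws are assumptions chosen only for illustration. It checks the simulated mean against the known theoretical mean v / (v - 2), which holds for v > 2.

```python
import random
import statistics

def chi_square_draw(df, rng):
    """Sum of df squared independent standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def f_draw(u, v, rng):
    """Ratio of two independent chi-square draws, each divided by its df."""
    x1 = chi_square_draw(u, rng)
    x2 = chi_square_draw(v, rng)
    return (x1 / u) / (x2 / v)

rng = random.Random(7)    # fixed seed (arbitrary choice)
u, v = 5, 10              # numerator and denominator df (arbitrary choices)
draws = [f_draw(u, v, rng) for _ in range(20000)]

f_mean = statistics.fmean(draws)
print(f"simulated mean of F ~ {f_mean:.3f} (theory: {v / (v - 2):.3f})")
```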
DEFINITION:
The null hypothesis (H0), stated as the null, is a statement about a
population parameter, such as the population mean, that is assumed to be
true.
An alternative hypothesis (H1) is a statement that directly contradicts a
null hypothesis by stating that the actual value of a population
parameter is less than, greater than, or not equal to the value stated in the
null hypothesis.
FOUR STEPS TO HYPOTHESIS TESTING
The goal of hypothesis testing is to determine
how likely it is that a hypothesized value of a
population parameter, such as the mean, is true.
Step 1: State the hypotheses.
Step 2: Set the criteria for a decision.
Step 3: Compute the test statistic.
Step 4: Make a decision.
Step 1: State the hypotheses.
We begin by stating the value of a population
mean in a null hypothesis, which we presume
is true. For the children watching TV example,
we state the null hypothesis that children in
the United States watch an average of 3 hours
of TV per week.
Step 2: Set the criteria for a decision.
To set the criteria for a decision, we state the level of significance for
a test.
Level of significance, or significance level
Setting this level is similar to the criterion that jurors use in a criminal trial.
Jurors decide whether the evidence presented shows guilt beyond a
reasonable doubt (this is the criterion). Likewise, in hypothesis
testing, we collect data to show that the null hypothesis is not true,
based on the likelihood of selecting a sample mean from a population
(the likelihood is the criterion). The likelihood or level of significance
is typically set at 5% in behavioral research studies. When the
probability of obtaining a sample mean is less than 5% if the null
hypothesis were true, then we conclude that the sample we selected
is too unlikely and so we reject the null hypothesis.
Step 3: Compute the test statistic.
Suppose we measure a sample mean equal to 4 hours per week
that children watch TV. To make a decision, we need to evaluate
how likely this sample outcome is, if the population mean stated
by the null hypothesis (3 hours per week) is true. We use a test
statistic to determine this likelihood. Specifically, a test statistic
tells us how far, or how many standard deviations, a sample
mean is from the population mean. The larger the value of the
test statistic, the further the distance, or number of standard
deviations, a sample mean is from the population mean stated in
the null hypothesis.
The test statistic is a mathematical formula that allows
researchers to convert the original measurement (e.g. a sample
mean) into units of the null distribution (e.g. a z-score), so that
we can look up probabilities in a table.
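To make the conversion concrete, here is a minimal sketch of a z test statistic for the TV example. The slides give only the sample mean (4 hours) and the null value (3 hours); the population standard deviation of 2 hours and the sample size of 16 used below are hypothetical values assumed purely for illustration.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Number of standard errors between the sample mean and the null mean."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# sample_mean and mu0 come from the TV example; sigma and n are hypothetical.
z = z_statistic(sample_mean=4.0, mu0=3.0, sigma=2.0, n=16)
print(f"z = {z:.2f}")
```

Under these assumed values the sample mean lies two standard errors above the mean stated in the null hypothesis.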
Step 4: Make a decision.
We use the value of the test statistic to make a decision about the
null hypothesis. The decision is based on the probability of
obtaining a sample mean, given that the value stated in the null
hypothesis is true. If the probability of obtaining a sample mean is
less than 5% when the null hypothesis is true, then the decision is
to reject the null hypothesis. If the probability of obtaining a
sample mean is greater than 5% when the null hypothesis is true,
then the decision is to retain the null hypothesis. In sum, there
are two decisions a researcher can make:
1. Reject the null hypothesis. The sample mean is associated with
a low probability of occurrence when the null hypothesis is true.
2. Fail to reject the null hypothesis. The sample mean is
associated with a high probability of occurrence when the null
hypothesis is true.
P-Value
The total area under the distribution curve is 1. The blue shaded area is
equal to α (the rejection region). In a two-sided test, α/2 is the area above
the upper cutoff in the right tail, and 1 − α/2 is the area to the left of
that cutoff.
The P-value tells how much area lies above (or below) the calculated test
statistic t₀.
Let's say, for an upper-tail test:
If the P-value is high, the area above the test statistic t₀ is large.
Hence, the test statistic t₀ is close to the mean value, which is consistent
with the null hypothesis.
Note: the test statistic t₀ measures how far the sample result lies from the
hypothesized mean.
A p value is the probability of obtaining a sample
outcome at least as extreme as the one observed,
given that the value stated in the null hypothesis
is true. The p value is compared to the level of
significance.
1. P-value ≤ α → Reject H₀ at level α
2. P-value > α → Do not reject H₀ at level α
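The decision rule above can be sketched in a few lines. The slides work with a t statistic t₀; because the Python standard library has no t distribution, this sketch assumes a standard normal reference distribution instead, so it applies to a z statistic. The function names `p_value` and `decide` are illustrative, not from the slides.

```python
from statistics import NormalDist

def p_value(z0, tail="two-sided"):
    """Tail area beyond the test statistic z0 under a standard normal null."""
    std_normal = NormalDist()
    if tail == "upper":
        return 1.0 - std_normal.cdf(z0)
    if tail == "lower":
        return std_normal.cdf(z0)
    if tail == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z0)))
    raise ValueError("tail must be 'upper', 'lower', or 'two-sided'")

def decide(p, alpha=0.05):
    """Apply the rule: reject H0 when the P-value is at most alpha."""
    return "reject H0" if p <= alpha else "do not reject H0"

p_small_stat = p_value(1.0, "upper")   # statistic close to the mean: large P-value
p_large_stat = p_value(2.5)            # statistic far from the mean: small P-value
print(round(p_small_stat, 4), decide(p_small_stat))
print(round(p_large_stat, 4), decide(p_large_stat))
```

A statistic close to the mean leaves a large area beyond it, so H₀ is not rejected; a statistic far out in the tail leaves a small area, so H₀ is rejected.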
Suppose that two materials are being considered
for coating a particular type of metal in order to
inhibit corrosion.
Specimens are obtained, and one collection is
coated with material 1 and one collection coated
with material 2.
The sample sizes are n1 = n2 = 10, and corrosion is
measured in percent of surface area affected.
The hypothesis is that the samples came from
common distributions with mean μ = 10. Let us
assume that the population variance is 1.0. Then we
are testing
H₀: μ₁ = μ₂ = 10
The figure above represents a point plot of the data; the data are placed on the
distribution stated by the null hypothesis. Let us assume that the “×” data refer to
material 1 and the “◦” data refer to material 2. Now it seems clear that the data
do refute the null hypothesis. But how can this be summarized in one number?
The P-value can be viewed as simply the probability of obtaining these data
given that both samples come from the same distribution. Clearly, this
probability is quite small, say 0.00000001! Thus, the small P-value clearly refutes
H0, and the conclusion is that the population means are significantly different.
H₀: µ₁ = µ₂
H₁: µ₁ ≠ µ₂
α = 0.05
Degrees of freedom = n1 + n2 − 2 = 10 + 10 − 2 = 18
The Two Sample t-Test
Note: The equal variance and normality assumptions are easy to check using a
normal probability plot.
[Figure: normal probability plot; horizontal axis: Modified Mortar, 15.5 to 18.0]
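H₀: µ = µ₀
The sample variance S² is used to estimate the population variance σ². The test
statistic is t₀ = (x̄ − µ₀) / (S/√n), which is compared with the t distribution
with n − 1 degrees of freedom.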
The One Sample t-test
Example:
The following sample consists of n = 20 observations on dielectric breakdown voltage
of a piece of epoxy resin.
a) We would like to demonstrate that the mean of dielectric breakdown is equal to
27 days. Set up appropriate hypotheses for investigating this claim.
b) Test these hypotheses using α = 0.01. What are your conclusions?
Solution: Because the population variance is unknown and the sample size is less than 40, use
the one-sample t-test.
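A one-sample t statistic can be computed as sketched below. The 20 observations from the example are not reproduced in these notes, so the five values in `sample` are hypothetical placeholders, and the critical value 4.604 is the standard table entry for a two-sided test at α = 0.01 with 4 degrees of freedom (matching the placeholder sample, not the original n = 20).

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t0, df) for testing H0: mu = mu0 with unknown variance."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
    t0 = (xbar - mu0) / (s / math.sqrt(n))
    return t0, n - 1

sample = [26.0, 28.0, 29.0, 27.0, 30.0]   # hypothetical data, NOT the slide's sample
t0, df = one_sample_t(sample, mu0=27.0)

t_crit = 4.604                            # table value t(0.005, 4), two-sided alpha = 0.01
reject = abs(t0) > t_crit
print(f"t0 = {t0:.3f}, df = {df}, reject H0: {reject}")
```

With the real data, the same function would be called on the 20 observations and compared with the table value for 19 degrees of freedom.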
Zα = Z₀.₀₅ = 1.645
Conclusions:
As │Z₀│ > Zα
2.80 > 1.645
Thus H₀ is rejected, and we conclude that the lot average
breaking strength exceeds 200 psi.
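The table lookup and comparison in this conclusion can be reproduced with the standard library. This is a small sketch that takes the test statistic 2.80 and α = 0.05 from the slide as given; the upper-tail P-value is an extra quantity computed here for comparison, not one reported in the slides.

```python
from statistics import NormalDist

alpha = 0.05
z0 = 2.80                                    # test statistic reported on the slide

# Upper-tail critical value: the point with area alpha above it.
z_crit = NormalDist().inv_cdf(1.0 - alpha)

# Upper-tail P-value: the area above the observed statistic.
p_upper = 1.0 - NormalDist().cdf(z0)

print(f"z_crit = {z_crit:.3f}, reject H0: {z0 > z_crit}, P-value = {p_upper:.4f}")
```

Both routes lead to the same decision: the statistic exceeds the critical value, and the P-value is below α.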
T test and z-Test
•Both tests are identical as the sample size (dof) approaches infinity.
•Use z-test if n>30 (population variance known or unknown)
•Use z-test if n<30 (population variance known)
•Use t-test if population variance is unknown and n<30
The larger the df is, the more closely the t distribution approximates
a normal distribution.
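The selection rules above can be written as a small helper. This is only a sketch of the slide's rule of thumb; the function name is illustrative, and because the slide does not say what to do when n is exactly 30, the sketch assumes that case is treated as a small sample.

```python
def choose_test(n, variance_known):
    """Pick a test following the rule of thumb: z if variance known or n > 30."""
    if variance_known or n > 30:
        return "z-test"
    # Variance unknown and n <= 30 (n == 30 is an assumption; the slide omits it).
    return "t-test"

print(choose_test(50, variance_known=False))   # large sample
print(choose_test(10, variance_known=True))    # small sample, variance known
print(choose_test(10, variance_known=False))   # small sample, variance unknown
```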
Summary
Inferences About the Variances
of Normal Distributions
In many experiments, we are interested in possible differences
in the mean response for two treatments. However, in some
experiments it is the comparison of variability in the data that is
important.
In the food and beverage industry, for example, it is important
that the variability of filling equipment be small so that all
packages have close to the nominal net weight or volume of
content.
In chemical laboratories, we may wish to compare the
variability of two analytical methods. We now briefly examine
tests of hypotheses and confidence intervals for variances of
normal distributions.
Unlike the tests on means, the procedures for tests on variances
are rather sensitive to the normality assumption.
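Consider testing the equality of the variances of two normal populations,
H₀: σ₁² = σ₂² against H₁: σ₁² ≠ σ₂². If independent random samples of size
n1 and n2 are taken from populations 1 and 2, respectively, the test statistic
is the ratio of the sample variances, F₀ = S₁²/S₂².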
Because 11 numerator degrees of freedom is not listed in the table, we take the average
of the entries for 10 and 12 degrees of freedom (i.e., (3.14 + 3.07)/2 = 3.105).
Or in Excel, use the command FINV(α, n-1, m-1) = FINV(0.05, 11, 9) = 3.10
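Averaging the two neighboring table entries is a special case of linear interpolation. The sketch below shows the general form using the two table values quoted above; the function name is illustrative, and linear interpolation is only an approximation to the exact percentage point.

```python
def interpolate_table(df_lo, value_lo, df_hi, value_hi, df):
    """Linearly interpolate a table value for a df between two listed rows."""
    weight = (df - df_lo) / (df_hi - df_lo)
    return value_lo + weight * (value_hi - value_lo)

# Table entries for 10 and 12 numerator degrees of freedom, as quoted above.
f_crit = interpolate_table(10, 3.14, 12, 3.07, 11)
print(f"interpolated critical value = {f_crit:.3f}")
```

The interpolated value agrees with the Excel result to two decimal places.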
Percentage Points of the F Distribution
Table IV of the Appendix gives only upper-tail percentage points
of F; however, the upper- and lower-tail points are related by
F(1−α; ν₁, ν₂) = 1 / F(α; ν₂, ν₁)
Inferences About the Differences in
Means, Paired Comparison Designs
Comparison between 2 brands of tires (i.e., General and Dunlop)
•You want to buy new tires for your car, and you want to know the
average life of each brand's tires.
•Some friends suggested General tires (12, 14, 16, 19, 13 months)
and some suggested Dunlop tires (11, 13, 15, 20, 19 months) on the
basis of their experience.
•Now the question is whether the tires were used under the
same conditions, i.e.,
•Road conditions
•Load on tires
•Environmental temperature
•Hence, the analysis would have been more accurate had each person
used 1 General tire (front right wheel) and 1 Dunlop tire (front
left wheel) on the same car.
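For comparison, the tire lives quoted above can be analyzed as two independent samples with a pooled two-sample t statistic. This sketch assumes equal population variances and independent samples, which is exactly the assumption the slide questions; the critical value 2.306 is the standard table entry t(0.025, 8) for a two-sided test at α = 0.05.

```python
import math
import statistics

def pooled_two_sample_t(sample1, sample2):
    """Return (t0, df) for H0: mu1 = mu2, assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t0 = (statistics.fmean(sample1) - statistics.fmean(sample2)) / std_error
    return t0, df

general = [12, 14, 16, 19, 13]    # months, from the slide
dunlop = [11, 13, 15, 20, 19]     # months, from the slide

t0, df = pooled_two_sample_t(general, dunlop)
t_crit = 2.306                    # table value t(0.025, 8)
print(f"t0 = {t0:.3f}, df = {df}, reject H0: {abs(t0) > t_crit}")
```

Treated this way, the difference in mean life is small relative to the variability between cars and drivers, which is the motivation for pairing the two brands on the same car.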
• If the numerator is considerably bigger than the denominator, you have evidence
for a systematic factor on top of random chance
Example I
Tim believes that his “true weight” is 187 lbs with a standard
deviation of 3 lbs.
Tim weighs himself once a week for four weeks. The average
of these four measurements is 190.5.
3. What Ztest corresponds to an X-bar of 190.5? What is the probability of getting a Ztest as high as
ours?
$$Z_{test} = \frac{\bar{X} - \mu_{H_0}}{\sigma_{\bar{X}}} = \frac{190.5 - 187}{3/\sqrt{4}} = 2.33, \qquad P(Z \ge 2.33) = .0099$$
4. If H0 were true, there would be only about a 1% chance of randomly obtaining the
data we have. Reject H0.
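Example I can be checked numerically with the standard library. All inputs come from the example except the significance level: α = 0.05 (one-tailed) is an assumption inferred from the critical value of 1.65 shown in the illustration. The P-value printed here uses the unrounded statistic, so it differs slightly from the table value of .0099 obtained with z = 2.33.

```python
import math
from statistics import NormalDist

mu0 = 187.0      # Tim's hypothesized true weight (lbs)
sigma = 3.0      # known standard deviation (lbs)
n = 4            # four weekly measurements
xbar = 190.5     # observed average
alpha = 0.05     # assumed one-tailed significance level

standard_error = sigma / math.sqrt(n)          # 3 / 2 = 1.5
z_test = (xbar - mu0) / standard_error
p = 1.0 - NormalDist().cdf(z_test)             # upper-tail probability
decision = "reject H0" if p <= alpha else "do not reject H0"

print(f"Z_test = {z_test:.2f}, P-value = {p:.4f}, decision: {decision}")
```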
Example I illustrated
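[Figure: sampling distribution of X-bar under H0, centered at 187 with standard error 1.5; z = (190.5 − 187) / (3/√4) = 2.33; Zcrit = 1.65 and Ztest = 2.33 marked on the z axis, tail area 0.01; Reject H0]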
What Is a t-test?
• In most research situations, the parameters µ and σ
are unknown because the test is novel
• Estimates, based on sample statistics must be used in
place of the parameters
• Use of estimation reduces the certainty of the tests
by a quantifiable probability which depends upon the
size of the sample
• For very large samples, the t-test and z-test are
identical
• t-tests are just like z-tests, except that they
compensate for the increasing uncertainty of small
sample sizes
Anthony Greene 89
The larger the df is, the more closely the t
distribution approximates a normal distribution.
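The convergence can be seen by comparing critical values. In this sketch the t values are copied by hand from a standard t table (two-sided α = 0.05, i.e. upper-tail area 0.025), since the Python standard library has no t distribution; only the normal critical value is computed.

```python
from statistics import NormalDist

# Critical values t(0.025, df) copied from a standard t table.
t_table = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

# Corresponding standard normal critical value.
z_crit = NormalDist().inv_cdf(0.975)

# Gap between each t critical value and the normal one.
gaps = {df: t - z_crit for df, t in t_table.items()}
for df, gap in gaps.items():
    print(f"df = {df:>3}: t = {t_table[df]:.3f}, gap to z = {gap:.3f}")
```

The gap shrinks steadily as df grows, which is why the t-test and z-test give nearly the same answer for large samples.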
Use of t-test vs. z-test
1. z-test
a) µ is known:
b) Mx is computed
c) σM is known (σ is known)
2. t-test
a) µ is hypothesized or predicted
(not computed and generally not known):
b) Mx is computed
c) σM is unknown (σ is unknown):
sM is computed
d) Degrees freedom (d.f.) is computed as the one less than the
sample size (the denominator of the standard deviation):
df = n - 1