
Small Sample Test

Test Based on F Distribution


An F-test is used to test whether the variances of two populations are equal. This test can be a
two-tailed test or a one-tailed test. The two-tailed version tests against the alternative that the
variances are not equal. The one-tailed version tests in only one direction: that the variance of the
first population is either greater than or less than (but not both) the variance of the second
population. The choice is determined by the problem. For example, if we are testing a new
process, we may only be interested in knowing whether the new process is less variable than the
old process.
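As a concrete sketch of this test: the F statistic is the ratio of the two sample variances, and the one-tailed p-value can be read from the upper tail of the F distribution. SciPy has no single-call F-test for variances, so the example below computes it directly; the data values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from an old and a new process (illustrative values).
old = np.array([10.2, 9.8, 10.5, 10.1, 9.7, 10.4, 10.0, 9.9])
new = np.array([10.1, 10.0, 10.2, 9.9, 10.1, 10.0, 10.2, 9.8])

# F statistic: ratio of the sample variances (ddof=1 gives the unbiased estimate).
f_stat = np.var(old, ddof=1) / np.var(new, ddof=1)
df1, df2 = len(old) - 1, len(new) - 1

# One-tailed test: is the old process MORE variable than the new one?
# The p-value is the upper-tail probability of the F distribution.
p_one_tailed = stats.f.sf(f_stat, df1, df2)
print(f_stat, p_one_tailed)
```

A small p-value here would suggest the new process is indeed less variable than the old one.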
Chi-Square Distribution: Goodness of Fit
The test is applied when you have one categorical variable from a single population. It is used to
determine whether sample data are consistent with a hypothesized distribution.
For example, suppose a company printed baseball cards. It claimed that 30% of its cards were
rookies; 60%, veterans; and 10%, All-Stars. We could gather a random sample of baseball cards
and use a chi-square goodness of fit test to see whether our sample distribution differed
significantly from the distribution claimed by the company.

When to Use the Chi-Square Goodness of Fit Test


The chi-square goodness of fit test is appropriate when the following conditions are met:
- The sampling method is simple random sampling.
- The variable under study is categorical.
- The expected value of the number of sample observations in each level of the variable is at
least 5.

This approach consists of four steps:


(1) State the hypotheses,
(2) Formulate an analysis plan,
(3) Analyze sample data, and
(4) Interpret results.

State the Hypotheses


Every hypothesis test requires the analyst to state a null hypothesis (H0) and an alternative
hypothesis (Ha). The hypotheses are stated in such a way that they are mutually exclusive. That is,
if one is true, the other must be false; and vice versa.
For a chi-square goodness of fit test, the hypotheses take the following form.

H0: The data are consistent with a specified distribution.


Ha: The data are not consistent with a specified distribution.

Typically, the null hypothesis (H0) specifies the proportion of observations at each level of the
categorical variable. The alternative hypothesis (Ha) is that at least one of the specified proportions
is not true.

Formulate an Analysis Plan


The analysis plan describes how to use sample data to decide whether to reject the null
hypothesis. The plan should specify the following elements.

Significance level: Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10;
but any value between 0 and 1 can be used.
Test method: Use the chi-square goodness of fit test to determine whether observed sample
frequencies differ significantly from the expected frequencies specified in the null hypothesis.

Analyze Sample Data


Using sample data, find the degrees of freedom, expected frequency counts, test statistic, and the
P-value associated with the test statistic.

Degrees of freedom: The degrees of freedom (DF) equal the number of levels (k) of the
categorical variable minus 1: DF = k - 1.

Expected frequency counts: The expected frequency count at each level of the variable equals
the sample size times the hypothesized proportion from the null hypothesis: Ei = n * pi.

Test statistic: The test statistic is a chi-square random variable (Χ²) defined by the following
equation:

Χ² = Σ [ (Oi - Ei)² / Ei ]

where Oi is the observed frequency count for level i of the variable, and Ei is the expected
frequency count for level i.

P-value: The P-value is the probability of observing a sample statistic as extreme as the test
statistic.

Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.
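The baseball-card example above can be run with SciPy's `chisquare` function. The observed counts below are hypothetical; the expected counts come from applying the company's claimed 30%/60%/10% split to a sample of 100 cards.

```python
from scipy import stats

# Hypothetical sample of 100 cards vs. the claimed distribution:
# 30% rookies, 60% veterans, 10% All-Stars.
observed = [35, 55, 10]
expected = [30, 60, 10]   # 100 * (0.30, 0.60, 0.10)

# chisquare computes X^2 = sum((O - E)^2 / E) with DF = k - 1 = 2.
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)
```

With these counts the p-value is well above 0.05, so the sample would not give grounds to reject the company's claimed distribution.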

Chi-Square Distribution: Independence of Attributes


The test is applied when you have two categorical variables from a single population. It is used to
determine whether there is a significant association between the two variables.
For example, in an election survey, voters might be classified by gender (male or female) and
voting preference (Democrat, Republican, or Independent). We could use a chi-square test for
independence to determine whether gender is related to voting preference.

When to Use Chi-Square Test for Independence


The test procedure described in this lesson is appropriate when the following conditions are met:
- The sampling method is simple random sampling.
- The variables under study are each categorical.
- If sample data are displayed in a contingency table, the expected frequency count for each
cell of the table is at least 5.

This approach consists of four steps:


(1) State the hypotheses,
(2) Formulate an analysis plan,
(3) Analyze sample data, and
(4) Interpret results.

State the Hypotheses


Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that
knowing the level of Variable A does not help you predict the level of Variable B. That is, the
variables are independent.
H0: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not independent.

The alternative hypothesis is that knowing the level of Variable A can help you predict the level of
Variable B.

Formulate an Analysis Plan


The analysis plan describes how to use sample data to decide whether to reject the null
hypothesis. The plan should specify the following elements.

Significance level: Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10;
but any value between 0 and 1 can be used.

Test method: Use the chi-square test for independence to determine whether there is a
significant relationship between two categorical variables.

Analyze Sample Data


Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-
value associated with the test statistic.

Degrees of freedom: The degrees of freedom (DF) equal DF = (r - 1) * (c - 1), where r is the
number of levels for Variable A, and c is the number of levels for Variable B.

Expected frequencies: The expected frequency count for each cell of the table is
Er,c = (nr * nc) / n, where nr is the total number of sample observations at level r of Variable A,
nc is the total number of sample observations at level c of Variable B, and n is the total sample
size.

Test statistic: The test statistic is a chi-square random variable (Χ²) defined by the following
equation:

Χ² = Σ [ (Or,c - Er,c)² / Er,c ]

where Or,c is the observed frequency count at level r of Variable A and level c of Variable B, and
Er,c is the expected frequency count at level r of Variable A and level c of Variable B.

P-value: The P-value is the probability of observing a sample statistic as extreme as the test
statistic.

Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.
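The election-survey example can be run with SciPy's `chi2_contingency`, which computes the expected counts, test statistic, degrees of freedom, and p-value from a contingency table in one call. The counts below are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table: rows = gender, columns = voting preference.
#                   Democrat  Republican  Independent
table = np.array([[120,       90,         40],    # male
                  [110,      105,         35]])   # female

# Returns the test statistic, p-value, degrees of freedom, and the table
# of expected counts under independence.
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)   # dof = (2 - 1) * (3 - 1) = 2
```

A p-value below the chosen significance level would indicate that gender and voting preference are not independent in this (hypothetical) sample.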

Test Based on T Distribution


T-tests are handy hypothesis tests in statistics when you want to compare means. You can
compare a sample mean to a hypothesized or target value using a one-sample t-test. You can
compare the means of two groups with a two-sample t-test. If you have two groups with paired
observations (e.g., before and after measurements), use the paired t-test.
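All three variants are available in SciPy; the sketch below runs each one on simulated data (the means, spreads, and sample sizes are arbitrary, chosen only to illustrate the calls).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# One-sample t-test: does the sample mean differ from a target value of 5.0?
sample = rng.normal(loc=5.2, scale=1.0, size=25)
t1, p1 = stats.ttest_1samp(sample, popmean=5.0)

# Two-sample t-test: do two independent groups have different means?
group_a = rng.normal(5.0, 1.0, 30)
group_b = rng.normal(5.5, 1.0, 30)
t2, p2 = stats.ttest_ind(group_a, group_b)

# Paired t-test: before/after measurements on the same units.
before = rng.normal(5.0, 1.0, 20)
after = before + rng.normal(0.3, 0.5, 20)
t3, p3 = stats.ttest_rel(before, after)

print(p1, p2, p3)
```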
What Are t-Values?
T-tests are called t-tests because the test results are all based on t-values. T-values are an
example of what statisticians call test statistics. A test statistic is a standardized value that is
calculated from sample data during a hypothesis test. The procedure that calculates the test
statistic compares your data to what is expected under the null hypothesis.
Each type of t-test uses a specific procedure to boil all of your sample data down to one value, the
t-value. The calculations behind t-values compare your sample mean(s) to the null hypothesis and
incorporate both the sample size and the variability in the data. A t-value of 0 indicates that the
sample results exactly equal the null hypothesis. As the difference between the sample data and
the null hypothesis increases, the absolute value of the t-value increases.
Assume that we perform a t-test and it calculates a t-value of 2 for our sample data. By itself, a t-
value of 2 doesn’t really tell us anything. T-values are not in the units of the original data, or
anything else we’d be familiar with. We need a larger context in which we can place individual t-
values before we can interpret them. This is where t-distributions come in.

What Are t-Distributions?


When you perform a t-test for a single study, you obtain a single t-value. However, if we drew
multiple random samples of the same size from the same population and performed the same t-
test, we would obtain many t-values and we could plot a distribution of all of them. This type of
distribution is known as a sampling distribution.
Fortunately, the properties of t-distributions are well understood in statistics, so we can plot them
without having to collect many samples! A specific t-distribution is defined by its degrees of
freedom (DF), a value closely related to sample size. Therefore, different t-distributions exist for
every sample size.
T-distributions assume that you draw repeated random samples from a population where the null
hypothesis is true. You place the t-value from your study in the t-distribution to determine how
consistent your results are with the null hypothesis.

Consider the graph of a t-distribution that has 20 degrees of freedom, which corresponds to a
sample size of 21 in a one-sample t-test. It is a symmetric, bell-shaped distribution that is similar to
the normal distribution, but with thicker tails. The graph plots the probability density function
(PDF), which describes the likelihood of each t-value.
The peak of the graph is right at zero, which indicates that obtaining a sample value close to the
null hypothesis is the most likely. That makes sense because t-distributions assume that the null
hypothesis is true. T-values become less likely as you get further away from zero in either
direction. In other words, when the null hypothesis is true, you are less likely to obtain a sample
that is very different from the null hypothesis.
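These two properties, a peak at zero and thicker tails than the normal, are easy to check numerically; the sketch below compares the t density with 20 degrees of freedom against the standard normal density.

```python
from scipy import stats

# The t density peaks at zero, slightly below the standard normal's peak...
t_peak = stats.t.pdf(0, df=20)
norm_peak = stats.norm.pdf(0)

# ...and carries more probability in the tails than the normal does.
t_tail = stats.t.pdf(3, df=20)
norm_tail = stats.norm.pdf(3)

print(t_peak < norm_peak, t_tail > norm_tail)
```

As the degrees of freedom grow, the t-distribution converges to the standard normal, and both comparisons shrink toward equality.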
Our t-value of 2 indicates a positive difference between our sample data and the null hypothesis.
The graph shows that there is a reasonable probability of obtaining a t-value from -2 to +2 when
the null hypothesis is true. Our t-value of 2 is an unusual value, but we don’t know exactly how
unusual. Our ultimate goal is to determine whether our t-value is unusual enough to warrant
rejecting the null hypothesis. To do that, we'll need to calculate the probability.
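That probability is the two-tailed p-value: the chance, when the null hypothesis is true, of a t-value at least as far from zero as ours in either direction. For t = 2 with 20 degrees of freedom it can be computed from the t-distribution's survival function.

```python
from scipy import stats

# Two-tailed p-value for t = 2 with 20 degrees of freedom:
# twice the upper-tail probability, by the symmetry of the t-distribution.
p_two_tailed = 2 * stats.t.sf(2, df=20)
print(p_two_tailed)
```

The result lands just above 0.05, so at the 0.05 significance level a t-value of 2 with 20 degrees of freedom would narrowly fail to reject the null hypothesis.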