Introduction to the Chi-Square Distribution
Significance Test
Just how low is sufficiently low? The choice is somewhat arbitrary, but by convention
levels of 0.05 and 0.01 are most commonly used.
For instance, an experimenter may hypothesize that the size of a food reward does
not affect the speed at which a rat runs down an alley. One group of rats receives a
large reward and another receives a small reward for running down the alley. Suppose the mean
running time for the large reward were 1.5 seconds and the mean running time for
the small reward were 2.1 seconds.
The difference between means is thus 2.1 - 1.5 = 0.6 seconds. The test of whether
this difference is significant consists of determining the probability of obtaining a
difference as large or larger than 0.6 seconds given there is really no effect of
magnitude of reward. If the probability is low (below the significance level) then the
null hypothesis that magnitude of reward makes no difference is rejected in favor of
the alternate hypothesis that it does make a difference. The null hypothesis is not
accepted just because it is not rejected.
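As a rough sketch of this logic, a permutation test estimates the probability of a mean difference at least as large as the one observed, assuming the null hypothesis of no reward effect. The running times below are hypothetical values invented to match the means in the text (1.5 s and 2.1 s); the original experiment's raw data are not given.

```python
import random

# Hypothetical running times chosen to have means 1.5 and 2.1 seconds
large_reward = [1.2, 1.4, 1.5, 1.6, 1.8]
small_reward = [1.8, 2.0, 2.1, 2.2, 2.4]

observed_diff = (sum(small_reward) / len(small_reward)
                 - sum(large_reward) / len(large_reward))  # 0.6 seconds

random.seed(0)
pooled = large_reward + small_reward
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    a, b = pooled[:5], pooled[5:]
    # Count shuffles whose mean difference is at least as extreme as observed
    if abs(sum(b) / 5 - sum(a) / 5) >= observed_diff:
        count += 1
p_value = count / trials
print(round(observed_diff, 1), p_value)
```

With data this well separated, the estimated probability falls below 0.05, so the null hypothesis of no reward effect would be rejected at that level.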
Significance Level
In hypothesis testing, the significance level is the criterion used for rejecting the null
hypothesis. The significance level is used in hypothesis testing as follows: First, the
difference between the results of the experiment and the null hypothesis is
determined. Then, assuming the null hypothesis is true, the probability of a
difference that large or larger is computed. Finally, this probability is compared to
the significance level. If the probability is less than or equal to the significance level,
then the null hypothesis is rejected and the outcome is said to be statistically
significant. Traditionally, experimenters have used either the 0.05 level (sometimes
called the 5% level) or the 0.01 level (1% level), although the choice of levels is largely
subjective. The lower the significance level, the more the data must diverge from the
null hypothesis to be significant. Therefore, the 0.01 level is more conservative than
the 0.05 level. The Greek letter alpha (α) is sometimes used to indicate the
significance level. See also: Type I error and significance test.
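The decision rule just described can be sketched in a few lines. The p-value here is assumed to have been computed already, by whatever test statistic fits the experiment:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject the null hypothesis when the p-value is at or below alpha."""
    if p_value <= alpha:
        return "reject null hypothesis (statistically significant)"
    return "fail to reject null hypothesis"

print(decide(0.03))        # significant at the 0.05 level
print(decide(0.03, 0.01))  # not significant at the more conservative 0.01 level
```

Note how the same result can be significant at one level and not at another, which is exactly the sense in which 0.01 is more conservative than 0.05.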
There are two kinds of errors that can be made in significance testing: (1) a true null
hypothesis can be incorrectly rejected and (2) a false null hypothesis can fail to be
rejected. The former error is called a Type I error and the latter error is called a Type
II error. These two types of errors are defined in the table.
The probability of a Type I error is designated by the Greek letter alpha (α) and is
called the Type I error rate; the probability of a Type II error (the Type II error rate) is
designated by the Greek letter beta (β). A Type II error is only an error in the sense
that an opportunity to reject the null hypothesis correctly was lost. It is not an error
that an opportunity to reject the null hypothesis correctly was lost. It is not an error
in the sense that an incorrect conclusion was drawn since no conclusion is drawn
when the null hypothesis is not rejected.
A Type I error, on the other hand, is an error in every sense of the word. A conclusion
is drawn that the null hypothesis is false when, in fact, it is true. Therefore, Type I
errors are generally considered more serious than Type II errors. The probability of a
Type I error (α) is called the significance level and is set by the experimenter. There is
a tradeoff between Type I and Type II errors. The more an experimenter protects
himself or herself against Type I errors by choosing a low level, the greater the
chance of a Type II error. Requiring very strong evidence to reject the null hypothesis
makes it very unlikely that a true null hypothesis will be rejected. However, it
increases the chance that a false null hypothesis will not be rejected, thus lowering
power. The Type I error rate is almost always set at .05 or at .01, the latter being
more conservative since it requires stronger evidence to reject the null hypothesis at
the .01 level than at the .05 level.
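The meaning of the Type I error rate can be illustrated by simulation (this example is not from the text): generate many experiments in which the null hypothesis is true, run a two-sided z-test at the 0.05 level on each, and count how often the true null is rejected. The rejection rate lands near the nominal α (slightly above it here, since the sample standard deviation is used in place of the population value).

```python
import random
import statistics

random.seed(42)
n, trials = 30, 2000
cutoff = 1.96  # |z| > 1.96 corresponds to a two-sided p below 0.05
rejections = 0
for _ in range(trials):
    # Null hypothesis is TRUE: samples come from a distribution with mean 0
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = statistics.mean(sample) / (statistics.stdev(sample) / n ** 0.5)
    if abs(z) > cutoff:
        rejections += 1
print(rejections / trials)  # hovers near the chosen alpha of 0.05
```

Lowering the cutoff's α to 0.01 would make such false rejections rarer, at the cost of missing more real effects, which is the Type I/Type II tradeoff described above.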
Applications:-
To test whether more than two population proportions can be considered equal.
Characteristics:-
Continuous distribution.
The chi-square distribution curve starts at the origin and lies entirely to the right
of the Y-axis.
The shape of the chi-square distribution curve is skewed for very small d.f. and
changes drastically as d.f. increases. For large d.f., the chi-square distribution
looks like a normal distribution curve.
For good accuracy, each expected frequency should be greater than 10, and no
expected frequency should be below 5.
Take H0: There is no significant difference between the sample proportions or
between the observed and the corresponding expected values.
If the degrees of freedom and the area required in the right tail (i.e., the significance
level of the test) are given, the critical values of chi-square can be found from the
table.
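A table lookup of this kind can be sketched as follows. The critical values below are copied from a standard chi-square table (right-tail area equal to the significance level); only a few df values are included for illustration:

```python
# Critical chi-square values from a standard table: {df: {alpha: value}}
CHI2_CRITICAL = {
    1: {0.05: 3.841, 0.01: 6.635},
    2: {0.05: 5.991, 0.01: 9.210},
    3: {0.05: 7.815, 0.01: 11.345},
}

def reject_h0(chi2: float, df: int, alpha: float = 0.05) -> bool:
    """Reject H0 when the statistic exceeds the critical value."""
    return chi2 > CHI2_CRITICAL[df][alpha]

# The flower example later in the text gives chi-square = 3.0 with 1 df:
print(reject_h0(3.0, 1))  # False: 3.0 < 3.841, so H0 is not rejected
```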
The χ² test is a statistical method that tests whether a given set of data fits a hypothesis.
Can the test accept the hypothesis when it is wrong? Can we reject a correct
hypothesis? You bet! That’s why we call it probability and not certainty!
But in the following pages you will see how statistics can gauge the confidence
of our answer.
Predict the outcome of the data if your null hypothesis were correct
Draw a conclusion
These 7 steps will be explained in detail in the following example:-
Example: If we observed 99 purple and 45 white flowers, is this a 3:1 ratio? Use
χ² to test this hypothesis.
Pp × Pp → 3 P_ : 1 pp
Pp × Pp gives 3 Purple : 1 White
- The given data are the observed values (i.e., in this case 99 purple and 45
white)
The expected values for 144 total flowers under a 3:1 ratio are 108 purple and 36 white.
expected − observed: 108 − 99 = 9 (purple); 36 − 45 = −9 (white)
deviations squared: (9)² = 81 (purple); (−9)² = 81 (white)
squared deviations divided by expected: 81/108 = 0.75 (purple); 81/36 = 2.25 (white)
χ² = 0.75 + 2.25 = 3.0
To answer that question we have to remember that χ² is determined from the deviations
from the expected values. Therefore, the smaller the χ² value, the smaller the
deviation from the expected values:
If χ² = 0, the observed values match the expected values exactly.
Conversely, the larger the χ² value, the larger the deviation from the expected
values.
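The computation above can be written as a short function. This is a sketch of the standard goodness-of-fit formula, χ² = Σ(observed − expected)²/expected, applied to the flower data:

```python
def chi_square(observed, expected):
    """Sum of squared deviations from expected, each divided by expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [99, 45]                        # purple, white flowers
total = sum(observed)                      # 144
expected = [total * 3 / 4, total * 1 / 4]  # 3:1 hypothesis -> 108, 36
df = len(observed) - 1                     # number of classes minus one
print(chi_square(observed, expected), df)  # 3.0 with 1 degree of freedom
```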
One phenotype is not variable with itself and so would have 0 degrees of freedom.
Two phenotypes have one variable and therefore would have 1 degree of freedom, etc.
We can therefore summarize this by saying that the degrees of freedom equal the
number of phenotypic classes minus one.
We will take a little detour to demonstrate that probability depends on sample size:
Let’s look at 3 sample sizes below, 4, 8, and 40. Here we are looking at tall vs. dwarf
trees. Now let’s look at the predicted distribution of trees having the indicated
number of tall trees on the left for each category and the probability for each outcome
on the right. Since tall is dominant, it is not surprising to see that the probability is
higher for larger numbers of tall trees. You will notice, though, that the larger the
sample, the more closely the most probable outcomes cluster around the predicted ¾ tall.
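This detour can be reproduced with the binomial distribution (the specific numbers below are illustrative, not taken from the text's figure): for a dominant trait with expected proportion ¾ tall, the probability of exactly k tall trees in a sample of n is C(n, k)(¾)^k(¼)^(n−k). Summing the probability near the predicted ¾ shows the effect of sample size:

```python
from math import comb

def p_tall(k: int, n: int, p: float = 0.75) -> float:
    """Binomial probability of exactly k tall trees out of n."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

for n in (4, 8, 40):
    # Probability that the observed fraction of tall trees lands
    # within 0.15 of the predicted 3/4
    near = sum(p_tall(k, n) for k in range(n + 1)
               if abs(k / n - 0.75) <= 0.15)
    print(n, round(near, 3))
```

The probability of an outcome close to ¾ grows with n, which is why larger samples give more trustworthy ratios.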
If we look at the graph, 95% of the area is under the curve and 5% is in the shoulder
regions. This 5% is the data that is most in doubt – do probabilities that fall into this
region mean that the hypothesis is incorrect? No. Do probabilities that fall within
the 95% area mean that the hypothesis is correct? No. But, depending upon sample
size, if we are within those 95% confidence limits we do not reject the hypothesis.
If we go from the graph to the χ² table below we see columns with probabilities as
headings.
• Degrees of freedom (df) are listed in the outer columns.
Within the body of the table are the χ² values that correspond to the probability at the
top.
Locate the 2 columns that span the χ² value from your analysis.
Since we have 1 df and our χ² is 3.0, our probability is between 0.1 and 0.05.
These data for purple and white are consistent with a 3:1 ratio.
The χ² test can be used to test any type of genetic hypothesis using these 7 steps: