Chapter 11
Nonparametric tests
o Tests that do not rely on distributional parameters and do not assume the data
follow a normal distribution. If the distribution is skewed, the mean does not carry
the same meaning as it does when the data are distributed symmetrically
Parametric tests
o Tests that rely on statistics from distribution, such as the normal distribution
The Jarque-Bera test checks whether your data are drawn from a normal distribution. If the
distribution is normal, the JB statistic will be close to zero. For a given amount of skew, the JB
statistic grows as the sample size increases, which means larger samples can tolerate less and less skew
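A quick sketch of this behaviour, using scipy's jarque_bera on simulated data (the sample sizes and seed here are arbitrary, chosen only for illustration): the skewed sample produces a much larger JB statistic than the normal one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_data = rng.normal(loc=0, scale=1, size=500)     # symmetric
skewed_data = rng.exponential(scale=1, size=500)       # strongly skewed

jb_normal, p_normal = stats.jarque_bera(normal_data)
jb_skewed, p_skewed = stats.jarque_bera(skewed_data)

# JB near zero for normal data; large (and significant) for skewed data
print(jb_normal, jb_skewed)
```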
Compares the mean of a sample to a pre-specified value and tests for a deviation from that
value. It is used to compare the mean of a variable in a sample of data to a (hypothesized) mean
in the population from which the sample is drawn. This is important because we rarely
have access to data for an entire population
The general rule for hypothesis testing: the observed difference goes in the numerator and what
is effectively the expected difference (the standard error) in the denominator. The denominator
gets smaller and smaller as the sample size increases, because you have more and more of the
population in your sample. You then see if the observed difference is much larger than the
expected difference by comparing the ratio to some known statistic
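That observed-over-expected ratio is exactly the one-sample t-statistic. A minimal sketch, with a hypothetical sample and a hypothesized population mean of 10 (both invented for illustration), showing the manual ratio matches scipy's ttest_1samp:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.5, scale=2, size=40)  # hypothetical sample data

# Numerator: observed difference from the hypothesized mean of 10
# Denominator: standard error, which shrinks as n grows
t_manual = (sample.mean() - 10) / (sample.std(ddof=1) / np.sqrt(len(sample)))

t_scipy, p = stats.ttest_1samp(sample, popmean=10)
print(t_manual, t_scipy, p)
```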
Why do we care about sample means? Because, if the samples are representative, we can draw
conclusions about the populations they were drawn from
The test is nearly the same as the other tests we have done, but different because we want to
know if there is a difference between the two sample means, given their respective standard deviations
We want to know if the difference that we observed is out of the ordinary, given that we are
using sample data. To form the z-statistic we calculate the standard error of the difference in
the means
By placing restrictions (e.g. assuming the two populations have equal variances) we can be more
precise, and the standard error of the difference in means will be smaller in magnitude. In the
end we would be more likely to reject the null hypothesis of equal means, because our expected
difference will be smaller
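The equal-variance restriction corresponds to the pooled two-sample t-test, versus Welch's test which drops it. A sketch with two hypothetical samples (means and sizes invented for illustration), verifying the pooled formula against scipy's ttest_ind:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(5.0, 1.5, size=30)   # hypothetical sample 1
b = rng.normal(6.0, 1.5, size=35)   # hypothetical sample 2

# Pooled (restricted) test: assumes equal population variances
t_pooled, p_pooled = stats.ttest_ind(a, b, equal_var=True)
# Welch's test: no such restriction
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

# Pooled standard error computed by hand
n1, n2 = len(a), len(b)
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
print(t_manual, t_pooled, t_welch)
```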
Often a two-sample difference-of-means test is not appropriate, for two reasons
o Your data are interval or ratio, but they are not from a normal distribution
o Or, your data are ordinal or ranked, so the mean and standard deviation do
not have the meanings that they should
If these are present, use a nonparametric test for two independent samples
The Wilcoxon Rank Sum Test:
o Combine all of the data together
o Rank the combined data from lowest to highest
o Separate the samples
o Calculate the sum of the ranks for each sample
If the samples are drawn from the same population, the sum of the ranks should be similar if the
sample sizes are the same
If two samples have similar summed ranks when they are smooshed together into
one data set, ranked, and then separated, the two “distributions” of those data should be
similar. If the data have similar summed ranks, the combined data would look something like this
when smooshed together: XXYYXYXYYYXXYXXYYX
o When it looks like this, you would not know which data value came from which sample by
looking at it
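The combine / rank / separate / sum steps above can be sketched directly, using two tiny hypothetical samples (values invented for illustration) and scipy's rankdata:

```python
import numpy as np
from scipy.stats import rankdata

x = np.array([3.1, 4.2, 2.8, 5.0])   # hypothetical sample X
y = np.array([4.8, 3.9, 5.5, 4.1])   # hypothetical sample Y

combined = np.concatenate([x, y])    # step 1: combine all the data
ranks = rankdata(combined)           # step 2: rank lowest to highest
w_x = ranks[:len(x)].sum()           # steps 3-4: separate, sum ranks per sample
w_y = ranks[len(x):].sum()
print(w_x, w_y)                      # the two rank sums must total n(n+1)/2
```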
Step 5: Calculate the test statistic using the z/t test:
o For this we need the theoretical mean and standard deviation of the rank sum, which
are calculated from the sample sizes alone, and they increase as the sample sizes increase
o This does not occur with a parametric sample mean: it moves up or down but gets closer
and closer to the population mean
Step 6: Compare the calculated Zw with the critical values from the Zw table
o If the Zw you calculate is between the lower and upper critical values, Zwl* and Zwu*,
you fail to reject the null hypothesis of equal ranks
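Steps 5-6 can be sketched with the standard large-sample formulas for the rank sum W of sample 1: mean n1(n1+n2+1)/2 and standard deviation sqrt(n1*n2*(n1+n2+1)/12). The data here are simulated for illustration; scipy's ranksums applies the same formulas, so the two z-values should agree:

```python
import numpy as np
from scipy.stats import rankdata, ranksums, norm

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 12)   # hypothetical sample 1
y = rng.normal(0.5, 1.0, 15)   # hypothetical sample 2

n1, n2 = len(x), len(y)
ranks = rankdata(np.concatenate([x, y]))
w = ranks[:n1].sum()                              # rank sum of sample 1

mu_w = n1 * (n1 + n2 + 1) / 2                     # theoretical mean
sigma_w = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # theoretical std. deviation
z_w = (w - mu_w) / sigma_w
p = 2 * norm.sf(abs(z_w))                         # two-sided p-value

z_scipy, p_scipy = ranksums(x, y)
print(z_w, z_scipy)
```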
Measures the number of times an observation from the smaller sample ranks lower than an
observation from the larger sample
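This count is the Mann-Whitney U statistic, an equivalent formulation of the rank-sum test. A sketch with small hypothetical samples (values invented for illustration), counting the pairs directly and checking against scipy's mannwhitneyu:

```python
import numpy as np
from scipy.stats import mannwhitneyu

small = np.array([2.1, 3.4, 1.8])                  # hypothetical smaller sample
large = np.array([2.9, 3.8, 4.4, 2.5, 3.1])        # hypothetical larger sample

# Count how many (small, large) pairs have the small-sample value ranked higher
u_manual = sum(a > b for a in small for b in large)

u_stat, p = mannwhitneyu(small, large, alternative="two-sided")
print(u_manual, u_stat, p)
```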
If you have two samples taken at the same time in different places, where people are asked
their views on capital punishment, these samples are independent. The views of one spatial unit
(a city) do not depend on the views of another city; both can be influenced by the same things,
media and politicians, but there is no direct link between the two samples
o What will be dependent is a sample from the same city, of the same people, who are
asked about their views on capital punishment before and after an information session
covering the issues; this sample is dependent
Matched pairs: there is a matching observation (same person, place, and thing) in the other
sample
There are two tests for these:
o Parametric (a test for interval/ratio data)
o Nonparametric, for ranked data or interval/ratio data that has been converted to
ordinal data
This test considers the difference between the values of each matched pair (same person,
place, thing), and we want to see if there is a pattern
To find the difference for the matched pair, you take the difference between the two variables
The rest of the steps are the same, calculate the test statistic and compare it with the t-table or
calculate a p-value
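The paired (parametric) version reduces to a one-sample t-test on the differences. A sketch with hypothetical before/after scores (values invented for illustration), checking the by-hand statistic against scipy's ttest_rel:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same people before and after a session
before = np.array([6.2, 5.8, 7.1, 6.5, 5.9, 6.8])
after = np.array([5.9, 5.5, 6.4, 6.6, 5.3, 6.1])

d = before - after                                       # one difference per matched pair
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))  # one-sample t on the differences

t_paired, p = stats.ttest_rel(before, after)
print(t_manual, t_paired, p)
```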
When data are “strongly ordered,” each observation has its own rank
We also measure each difference in absolute terms, noting the difference and then attaching the
sign of the change to its rank. It is common not to have strongly ordered data
Next, we sum up the positive ranks, sum up the negative ranks, and then determine the total
number of matched pairs
As the sample size increases, we can use more “traditional” statistical tables to perform
hypothesis tests
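The signed-rank steps above (rank the absolute differences, attach the signs, sum the positive and negative ranks) can be sketched with hypothetical before/after values; scipy's wilcoxon reports the smaller of the two rank sums as its statistic:

```python
import numpy as np
from scipy.stats import wilcoxon, rankdata

# Hypothetical matched-pair measurements, invented for illustration
before = np.array([12.0, 9.5, 11.2, 13.4, 10.1, 9.8, 12.7])
after = np.array([11.1, 9.9, 10.4, 12.0, 9.0, 9.1, 11.7])

d = before - after
ranks = rankdata(np.abs(d))        # rank the differences in absolute terms
w_plus = ranks[d > 0].sum()        # sum of positive ranks
w_minus = ranks[d < 0].sum()       # sum of negative ranks

stat, p = wilcoxon(before, after)  # statistic = smaller of the two sums
print(w_plus, w_minus, stat, p)
```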
Here we work with calculated percentages, which are ratio data; with proportions we always
calculate a z-statistic because these proportions are approximately normally distributed
Used if we want to know whether different neighborhoods within a city support some initiative
There are three steps to this:
o Calculate the proportions of yes, no, pass, fail, support, etc.
o Calculate the test statistic. Only if the difference in proportions is large enough is that
difference considered statistically significant
o Compare the calculated value with a z-table or calculate p-value
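These three steps can be sketched with the standard two-proportion z-test, using hypothetical survey counts (62 of 100 supporters in one neighborhood, 45 of 100 in another, both invented for illustration):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical counts of "yes" responses in two neighborhoods
yes1, n1 = 62, 100
yes2, n2 = 45, 100

# Step 1: calculate the proportions
p1, p2 = yes1 / n1, yes2 / n2

# Step 2: calculate the test statistic (pooled proportion under H0)
p_pool = (yes1 + yes2) / (n1 + n2)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Step 3: compare with the z-table / calculate a p-value
p_value = 2 * norm.sf(abs(z))
print(z, p_value)
```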
To get the p-value:
o Identify the correct test statistic.
o Calculate the test statistic using the relevant properties of your sample
o Specify the characteristics of the test statistic’s sampling distribution
o Place your test statistic in the sampling distribution to find the p value
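The last two steps amount to evaluating the tail area of the sampling distribution at your test statistic. A minimal sketch (the statistic values and degrees of freedom here are arbitrary examples):

```python
from scipy.stats import norm, t

# z-based test: place the statistic in the standard normal distribution
z_stat = 1.96
p_z = 2 * norm.sf(abs(z_stat))        # two-sided p-value, about 0.05

# t-based test: the sampling distribution depends on degrees of freedom
t_stat, df = 2.5, 24
p_t = 2 * t.sf(abs(t_stat), df)       # two-sided p-value
print(p_z, p_t)
```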