02 HME 712 Week 2 Audio Performing T Tests

In this slideshow, we're going to find out how to perform t-tests in Stata, so you'll get the
commands. We're concerned in this slideshow just with the tests for independent
samples: we're going to do single sample and also two sample t-tests.
Recall that we only want to do t-tests when the conditions have been met. Therefore, in
this slideshow, we're going to spend quite a lot of time looking at our independent samples
or our single sample and deciding two things. First, do they come, or are they likely to
come, at random from a normally distributed population? And secondly, for two sample
tests, are the variances equal or not?
It may seem obvious, but you'd be surprised that every year some student says, "I don't
get any results," and they don't have the data loaded. So, make sure that you've got the
correct variable in your data set. If you're going to type a command about ht, make sure
there's a variable called ht, otherwise Stata will not be able to perform the analysis for you.
Generally, you need one continuous variable for height, if you're looking at height, and a
second binary variable to identify the two groups that you want to compare. This is called
the long format. In the rare situation where you have wide format, you'll need, for example,
height of males and height of females, but you won't need any binary variable.
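As a quick check before any analysis, something like the following could be typed in Stata. This is only a sketch: it assumes a long-format dataset is already open and that, as in the lecture, the height variable is called ht and the binary group variable is called sex.

```stata
* Assumes the dataset is already loaded and uses the lecture's names ht and sex.
* describe stops with a "variable not found" error if either name is missing.
describe ht sex

* Check that ht is stored as a number rather than as text.
confirm numeric variable ht

* Look at the height values and at the two groups.
summarize ht
tabulate sex
```

If describe reports that a variable is not found, the data are not loaded or the variable has a different name, and no t-test command on ht will run.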
Here we have the correct layout for the variable height, and the binary variable, sex. In
other words, the long format. Look in your data editor and make sure that you have the
data before you try the analysis. In this case, sex equals zero is used for females, and
sex equals one is used for males. Here we have an example of the wide format for the
same kind of data, heights of males and heights of females. This is compulsory if you're
doing a paired t-test, which we're going to learn about later. But it's very unusual, although
it can happen, to have this wide format when you're doing a two-sample t-test.
If your variable has got a sample size greater than 30, it's recommended that you plot a
frequency histogram and see whether it looks more or less normal. It doesn't have to be
very normal; it just has to be unimodal and not too skewed. Remember, the t-tests are
robust to this assumption.
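The histogram check for larger samples might look like this in Stata. It is a minimal sketch that assumes the lecture's variable names ht and sex; the normal option simply overlays a normal curve to make the visual comparison easier.

```stata
* Assumes a dataset with the lecture's variable ht is already loaded.
* Frequency histogram of height with a normal curve overlaid.
histogram ht, frequency normal

* For a two-sample comparison: one histogram per group (sex coded 0/1).
histogram ht, frequency normal by(sex)
```

We are only looking for a single peak and no strong skew, not a perfect bell shape.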
If a sample is smaller than 30, then the histogram can be very misleading. It may make
you think that this is not normal, whereas in fact, it has come from a normal distribution.
So we prefer to use the Shapiro-Wilk test or the equivalent. And that's what we use. If the
p-value for swilk is less than 0.05, then you've got a problem: don't do the t-test.
Here we have Stata output for a Shapiro-Wilk test, and you'll see that the p-value is
0.04817. It's less than 0.05, so we reject the null hypothesis that the data come from a
normal distribution. So in this situation, with only 14 observations, we would not perform
a t-test. Just a Stata tip, to save you a bit of time: if you're doing a Shapiro-Wilk test on
a number of variables, you can do them all in one go like this. You can say, for example,
swilk height weight age, and it will give you the Shapiro-Wilk p-values for all three. In
this case, we would only be happy to do a t-test on age.
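The Shapiro-Wilk commands described above can be sketched as follows. The variable names height, weight and age come from the lecture's example, and ht and sex from the earlier slides. Whether your own dataset uses these names is an assumption.

```stata
* Assumes the named variables exist in the dataset that is loaded.
* Shapiro-Wilk test on a single variable.
swilk ht

* Several variables in one go: one row of output per variable.
swilk height weight age

* For a two-sample test, check each group separately (sex coded 0/1).
swilk ht if sex == 0
swilk ht if sex == 1
```

In the output, the p-value is the column headed Prob>z. That is the number to compare with 0.05.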
The next thing we have to decide, if it's a two-sample test, is whether to do Student's
test or Welch's test. Remember, the difference is that if the variances are equal, then we do
Student's test. If they're not equal, we do Welch's test. Now, it's customary in some places
to use an F test to do this. Please do not use the F test to decide. We're going to use a
visual method: we're just going to appraise the variances and decide whether there's less
than 10% difference, in which case we'll do a Student's test. The F test lacks power, so you
end up doing too many Student's tests and committing type I errors.
The way in which we assess equality of variances is simply by inspection. And as a rule
of thumb, if there is less than 10% difference in the variances, we can treat them as if
they were equal. If it's 10% or greater, we should probably do a Welch test. Be careful,
we're comparing a 10% difference in the variances, not the standard deviations. If you
look in that table, the standard deviations are 2.5 and 2.7, not very different. But once
you square them, the variances become 6.25 and 7.29, and that's quite a lot different. It's
a 14% difference (1.04 out of 7.29). So, we will do a Welch test.
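The inspection of the variances can be done in Stata itself. The sketch below assumes the lecture's variable names ht and sex for the tabstat line. The display lines just repeat the arithmetic for the standard deviations 2.5 and 2.7 quoted above.

```stata
* Assumes a long-format dataset with ht and sex is loaded.
* Sample size, mean, standard deviation and variance for each group.
tabstat ht, by(sex) statistics(n mean sd variance)

* The lecture's worked example: square the standard deviations...
display 2.5^2
display 2.7^2

* ...then express the difference as a percentage of the larger variance
* (about 14.3, so Welch's test is chosen).
display 100 * (2.7^2 - 2.5^2) / 2.7^2
```

Dividing by the smaller variance instead gives about 16.6%, so in this example the decision is the same whichever variance is used as the base.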
In summary, for both Welch's and Student's t-test, we want quantitative data, and they must
be normally distributed. And for the Student's t-test, variances must be equal. If they're
not, we do the Welch t-test. If you're doing a single sample t-test, then the null hypothesis
is always that mu equals the gold standard. The difference is that Ha can take one of three
options. Either mu is not equal to the gold standard, for a two-sided option. That's if we
don't mind whether it's too high or too low; both are equally bad. On the other hand, if we're
only concerned if mu is higher than the gold standard, then the Ha would be, mu greater
than GS. If we're only concerned if mu is less than the gold standard, then Ha would be,
mu less than GS.
The Stata command for the single sample t-test is the same irrespective of which option
for the alternative hypothesis you want to explore. It's simply ttest, then the variable
name, then a double equals sign and the gold standard. Irrespective of which one of the
Ha's you are interested in, the command in Stata is exactly the same. It's very simple: we
just state that the variable equals the gold standard. So, if it's the mean of the birth
weights, and the gold standard is 3.4, we'd say ttest, the birth weight variable, double
equals, 3.4.
Now we turn our attention to the Stata commands for a t-test for two independent samples.
The Stata commands are quite simple. But before we get to them, let's talk about the null
and alternative hypotheses. Unlike those for the single sample test, these are quite
straightforward, because they don't vary. We almost always do two-sided tests for two
sample t-tests; we almost never do single sided. So, we can ignore all those alternative Ha's.
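Going back to the single sample command for a moment, the birth weight example could be typed as below. The variable name bweight is hypothetical; the lecture does not say what the variable is called. The gold standard of 3.4 is the lecture's.

```stata
* Assumes a dataset containing a birth weight variable is loaded.
* bweight is a made-up name for illustration; substitute your own.
* One-sample t-test of H0: mu = 3.4.
ttest bweight == 3.4
```

The command does not change with the alternative hypothesis because Stata's output reports all three: Ha: mean < 3.4, Ha: mean != 3.4 and Ha: mean > 3.4, each with its own p-value. You read off the one that matches the Ha you chose beforehand.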
For a two sample t-test, which is a common test that people do, and if it's long format,
then for the Welch test and the Student's t-test the commands are the same, they're given
here, except that for the Welch test you must type in unequal after the comma. In the rare
situation where you've got a wide format, the commands are shown here again. Don't
forget to put unpaired. If you don't do that, Stata will assume that these are paired data,
which they aren't, and will perform a paired t-test, which would be incorrect. Here
we have two hypothetical examples, one with long format and one with wide format.
Hopefully, once you've read through the scenarios (there are many scenarios) and you've
looked at the tests, it will all make sense.
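The slides with the commands are not part of this transcript, so the following is a sketch of what they most likely show, based on Stata's ttest syntax. The names ht and sex are the lecture's. The wide-format names ht_male and ht_female are hypothetical.

```stata
* Long format: ht holds the heights, sex (0/1) identifies the two groups.
* Student's t-test (equal variances).
ttest ht, by(sex)

* Unequal-variance test: add unequal after the comma.
ttest ht, by(sex) unequal

* Wide format: one variable per group (hypothetical names).
* unpaired is essential, otherwise Stata runs a paired t-test.
ttest ht_male == ht_female, unpaired
ttest ht_male == ht_female, unpaired unequal
```

One detail to be aware of: with unequal, Stata uses Satterthwaite's degrees of freedom by default, and there is a separate welch option that uses Welch's formula instead. The two usually give very similar results.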
