
STAB 2143

BIOLOGICAL STATISTIC
ANALYSIS

Izfa Riza Binti Hazmi


PhD (Insect Systematics &
Taxonomy)
izfahazmi@ukm.edu.my
t-test
A t-test is used to determine whether there is a
significant difference between the means of
two groups, which may be related in certain
features.

t-tests are used when:

• We do not know the population variance
• Our sample size is small, n < 30
The t-test compares two
averages (means) and tells you whether
they are different from each other.

The t-test also tells you how
significant the differences are.
Example: You have the flu and you try a
naturopathic remedy. Your flu lasts a couple
of days.

The next time you have the flu, you take
Panadol and the flu lasts a week.

You survey your friends and they tell you
that their flu was of shorter duration (an
average of 3 days) when they took the
naturopathic remedy.
What you really want to know is: are
these results repeatable?

A t-test can tell you by comparing the
means of the two groups and letting
you know the probability of those
results happening by chance.
Types of t-tests

Independent (unpaired) t-test
The independent samples t-test is the most
common form of the t-test.

It helps you compare the means of two sets of
data.

For example, you could run a t-test to see if the
average test scores of males and females are
different; the test answers the question,
“Could the difference between male and female scores have
occurred by random chance?”
The independent-samples t-test
compares the means between
two unrelated groups on the
same continuous, dependent
variable.

One-sample t-test
It is used to examine the
difference between the sample mean and
a known value of the population
mean.

We draw a random sample,
compare the sample mean with the
population mean, and make a statistical
decision about whether the sample
mean is different from the population
mean.
The mean height of students
(age 12 years) in Malaysia is 140 mm.
You want to test whether the mean
height of class 6 Cemerlang is different
from 140 mm. The heights of a random sample of 22
students were collected.

135 119 106 135 180 108 128 160 143 175
170 205 195 185 182 150 175 190 180 195
220 235
P = 0.002 < 0.05 (α level)
We reject the null hypothesis.

The mean height of class 6 Cemerlang is not
equal to 140 mm.
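The same one-sample test can be reproduced outside Minitab. The following is a minimal sketch in Python, assuming SciPy is installed and assuming a two-sided test at α = 0.05 (the slide does not state the level); the variable names are illustrative only.

```python
from scipy import stats

# Heights of the 22 sampled students, as listed on the slide
heights = [135, 119, 106, 135, 180, 108, 128, 160, 143, 175,
           170, 205, 195, 185, 182, 150, 175, 190, 180, 195,
           220, 235]

mu0 = 140      # hypothesized population mean
alpha = 0.05   # assumed significance level (not stated on the slide)

# Two-sided one-sample t-test of H0: mean = mu0
result = stats.ttest_1samp(heights, popmean=mu0)
t_stat, p_value = result.statistic, result.pvalue

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value returned here is already two-sided, so it is compared with α directly rather than with α/2.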
Stat > Basic Statistics > 1-Sample t

We use 1-Sample t to compute a confidence
interval and perform a hypothesis test of
the mean when the population standard
deviation, σ, is unknown.
Measurements were made on nine widgets. You
want to test whether the population mean is 5, using a
90% confidence interval.

1. Open the worksheet and key in your data.
2. Choose Stat > Basic Statistics > 1-Sample t.
3. In One or more samples, each in a column,
enter Values.
4. Check Perform hypothesis test. In
Hypothesized mean, enter 5.
5. Click Options. In Confidence level, enter 90.
Click OK in each dialog box.
Session window output

One-Sample T: Values

Test of μ = 5 vs ≠ 5

Variable  N  Mean    StDev   SE Mean  90% CI            T      P
Values    9  4.7889  0.2472  0.0824   (4.6357, 4.9421)  -2.56  0.034
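Interpret the results
• P value, 0.034 < 0.10 (α level for 90% confidence)
• We reject H0

Test statistic: t = -2.56
Critical values: -1.860 and 1.860 (the test statistic falls in the rejection region)

So the population mean for the widgets is not 5.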
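The numbers in the session output can be checked by hand from the summary statistics alone. The sketch below assumes SciPy is available and uses only the values printed in the output (N, Mean, StDev); the raw widget measurements are not given on the slides, so they are not used.

```python
import math
from scipy import stats

# Summary statistics taken from the session window output
n, mean, sd = 9, 4.7889, 0.2472
mu0 = 5.0            # hypothesized mean
conf_level = 0.90    # confidence level chosen in the example
alpha = 1 - conf_level

se = sd / math.sqrt(n)        # standard error of the mean
t_stat = (mean - mu0) / se    # one-sample t statistic
df = n - 1                    # degrees of freedom

# Two-sided p-value and critical value from the t distribution
p_value = 2 * stats.t.sf(abs(t_stat), df)
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Confidence interval for the population mean
ci = (mean - t_crit * se, mean + t_crit * se)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, critical = ±{t_crit:.3f}")
print(f"{conf_level:.0%} CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Note that the hypothesized value 5 lies outside the 90% confidence interval, which agrees with rejecting H0 at α = 0.10.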


A professor wants to know if her introductory statistics
class has a good grasp of basic math. Six students are
chosen at random from the class and given a math
proficiency test. The professor wants the class to be able to
score above 70 on the test. The six students get scores of
62, 92, 75, 68, 83, and 95. Can the professor have 90
percent confidence that the mean score for the class on
the test would be above 70?

One-tailed or two-tailed?

What would the significance level be?
Interpret the results
• P value, 0.074 < 0.1 (α level)
• We reject H0

Test statistic: t = 1.71
Critical value: 1.476 (one-tailed)

So the mean score for the class on the test would be above 70.
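The one-tailed test for the professor's question can be sketched in Python as follows. This assumes SciPy 1.6 or later (needed for the `alternative` argument) and takes α = 0.10 from the "90 percent confidence" in the question; the variable names are illustrative.

```python
from scipy import stats

scores = [62, 92, 75, 68, 83, 95]   # the six sampled students
mu0 = 70                            # score the professor wants the class to exceed
alpha = 0.10                        # from the 90 percent confidence requirement

# One-tailed test: H0: mean <= 70 vs Ha: mean > 70
result = stats.ttest_1samp(scores, popmean=mu0, alternative="greater")
t_stat, p_value = result.statistic, result.pvalue

# One-tailed critical value with n - 1 degrees of freedom
t_crit = stats.t.ppf(1 - alpha, df=len(scores) - 1)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, critical = {t_crit:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```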
Two-sample t-test
In statistical hypothesis testing, a two-
sample test is a test performed on the data
of two random samples, each
independently obtained from a different
given population.

The purpose of the test is to determine
whether the difference between
these two populations is statistically
significant.
For example, we may want to compare two
categories of some categorical variable
(e.g., compare males and females) or
two populations receiving different
treatments in the context of an experiment.

The two-sample t-test answers
questions about the mean where the
data are collected from two random
samples of independent observations.
One of the model assumptions of
the two-sample t-test for means is
that the observations are
independent.

Thus, if samples are chosen so that
there is some natural pairing, then
the two-sample t-test is not
appropriate.
A hypothesis test for two population
means determines whether they
are significantly different.

The two-sample t-test for unpaired
data is defined as:

Two-tailed      Upper-tailed    Lower-tailed
H0: μ1 = μ2     H0: μ1 ≤ μ2     H0: μ1 ≥ μ2
Ha: μ1 ≠ μ2     Ha: μ1 > μ2     Ha: μ1 < μ2
Example: You want to compare two car
manufacturers – Company A and Company B – to
determine which makes stronger seatbelts.

You take a sample of seatbelts from each company and measure the
mean amount of force needed to break them.

The 2-sample t-test analyzes the difference
between these two means to determine
whether the difference is statistically significant.
H0: μ1 - μ2 = 0
(seatbelt strengths from both
companies are equal)

Ha: μ1 - μ2 ≠ 0
(seatbelt strengths from both
companies are different)
Two-sample T for C1 vs C2

     N   Mean  StDev  SE Mean
C1  20   33.1   11.0      2.5
C2  20  35.90   4.53      1.0

Difference = μ (C1) - μ (C2)
Estimate for difference: -2.80
95% CI for difference: (-8.18, 2.58)
T-Test of difference = 0 (vs ≠): T-Value = -1.05  P-Value = 0.298  DF = 38
Both use Pooled StDev = 8.3974
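The pooled two-sample test above can be approximated from the printed summary statistics. This is a sketch assuming SciPy is available and α = 0.05 (suggested by the 95% CI, not stated explicitly); because the printed means and standard deviations are rounded, the result matches the session output only to about two decimal places.

```python
from scipy import stats

alpha = 0.05  # assumed significance level, matching the 95% CI in the output

# Pooled (equal-variance) two-sample t-test from summary statistics
result = stats.ttest_ind_from_stats(
    mean1=33.1, std1=11.0, nobs1=20,    # C1 row of the output
    mean2=35.90, std2=4.53, nobs2=20,   # C2 row of the output
    equal_var=True,                     # "Both use Pooled StDev"
)
t_stat, p_value = result.statistic, result.pvalue

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Since the p-value is well above α, the data give no evidence that the seatbelt strengths of the two companies differ.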
The data set contains miles
per gallon for U.S. cars
(sample 1) and for Japanese
cars (sample 2).

The first column is miles per
gallon for U.S. cars and the
second column is miles per
gallon for Japanese cars.
We are testing the hypothesis that the population
means are equal for the two samples. We assume
that the variances for the two samples are equal.

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The absolute value of the test statistic for our
example, 12.62059, is greater than the critical
value of 1.9673, so we reject the null
hypothesis and conclude that the two
population means are different at the 0.05
significance level.
A study was performed to evaluate the effectiveness
of two devices for improving the efficiency of gas
home-heating systems.

Energy consumption in houses was measured after
one of the two devices was installed. The two devices
were an electric vent damper (Damper=1) and a
thermally activated vent damper (Damper=2).

You want to compare the effectiveness of these two
devices by determining whether or not there is any
evidence that the difference between the devices is
different from zero.
Session window output
Two-Sample T-Test and CI: BTU.In, Damper

Two-sample T for BTU.In

Damper   N   Mean  StDev  SE Mean
1       40   9.91   3.02     0.48
2       50  10.14   2.77     0.39

Difference = μ (1) - μ (2)
Estimate for difference: -0.235
95% CI for difference: (-1.450, 0.980)
T-Test of difference = 0 (vs ≠): T-Value = -0.38  P-Value = 0.701  DF = 88
Both use Pooled StDev = 2.8818
Interpret the results

Since the p-value (0.701) is greater than
commonly chosen α-levels, there is
no evidence for a difference in energy
use when using an electric vent
damper versus a thermally activated
vent damper.
Example
The scores on a (hypothetical) vocabulary test of a
group of 20 year olds and a group of 60 year olds
are shown below.
Test the mean difference for
significance using the 0.05
level. List the assumptions
made in computing your
answer.

Two-Sample T-Test and CI: C1, C2

Two-sample T for C1 vs C2

    N   Mean  StDev  SE Mean
C1  9  18.89   5.80      1.9
C2  8  25.38   4.81      1.7

Difference = mu (C1) - mu (C2)
Estimate for difference: -6.48611
95% CI for difference: (-12.00667, -0.96555)
T-Test of difference = 0 (vs not =): T-Value = -2.52  P-Value = 0.025  DF = 14
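The output above reports DF = 14 rather than n1 + n2 - 2 = 15 and shows no pooled standard deviation, which appears consistent with a test that does not assume equal variances (Welch's t-test). The sketch below works under that assumption, using SciPy and only the rounded summary statistics from the output; the raw scores are not reproduced on the slides.

```python
from scipy import stats

# Summary statistics from the session output (rounded values)
m1, s1, n1 = 18.89, 5.80, 9   # C1
m2, s2, n2 = 25.38, 4.81, 8   # C2
alpha = 0.05                  # level stated in the exercise

# Welch's t-test (equal variances NOT assumed) from summary statistics
result = stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

# Welch-Satterthwaite approximation for the degrees of freedom
v1, v2 = s1**2 / n1, s2**2 / n2
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t = {t_stat:.2f}, df = {df_welch:.2f}, p = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

SciPy keeps the fractional degrees of freedom, while the session output shows a whole number, so the p-values can differ slightly in the third decimal place. Either way p < 0.05, so the mean difference is significant at the 0.05 level.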
To do a 2-sample t confidence
interval and test

1. Choose Stat > Basic Statistics > 2-Sample t.
2. If your data are unstacked, that is, each sample is in a
separate column:
• Choose Each sample is in its own column.
• In Sample 1, enter the column containing the first
sample.
• In Sample 2, enter the column containing the other
sample.
3. If you like, use any dialog box options, and click OK.
Dependent samples t-test / Paired t-test
Paired t-test (dependent samples):
used to compare related
observations.

The paired-sample t-test is used in
‘before-after’ studies, when the
samples are matched pairs, or
when it is a case-control study.
Before-and-after observations on the same
subjects (e.g. students’ diagnostic test results
before and after a particular module or
course).

A comparison of two different
measurements/treatments where the
measurements/treatments are applied to the
same subjects (e.g. blood pressure
measurements using a stethoscope and a
Dinamap).
Do test scores differ significantly if the
test is taken at 8 a.m. or noon?

Trace metals in drinking water affect
the flavor, and an unusually high
concentration can pose a health
hazard. Ten pairs of data were taken
measuring zinc concentration in
bottom water and surface water.
XL company gives training to its employees and
wants to know whether the training had an impact
on the efficiency of the employees.

They collect data from the employees on a seven-point
rating scale, before the training and after the
training.

By using the paired-sample t-test, they can
statistically conclude whether or not the training has
improved the efficiency of the employees.
In medicine, by using the paired-sample
t-test, we can assess whether or not
a particular medicine has an effect on the
illness.
Managers at a fitness facility want to determine
whether their weight-loss program is effective.

They randomly select 50 people to participate in
the program.

Because the "before" and "after" samples
measure the same subjects, and because a
subject's weight after the program is related to
her weight before participation, the samples are
dependent.
The managers' test uses the following hypotheses:
H0: μd = 0
(Participants' weight did not change after they completed the
program)
Ha: μd ≠ 0
(Participants' weight changed after they completed the
program)

• Because we recorded before-and-after measurements on
each individual subject, the variation that exists between
independent members of a sample is removed.
• Any remaining effect is largely due to the effect of the
weight-loss program.
Using the example
stated with n = 20
students, the following
results were obtained:

Perform a hypothesis
test to check whether
the module had an impact
on the students'
knowledge.
Results

Paired T for C7 - C8

             N      Mean    StDev  SE Mean
C7          20   18.4000   3.1523   0.7049
C8          20   20.4500   4.0585   0.9075
Difference  20  -2.05000  2.83725  0.63443

95% CI for mean difference: (-3.37787; -0.72213)
T-Test of mean difference = 0 (vs not = 0): T-Value = -3.23  P-Value = 0.004

The P-value (0.004) is less than α = 0.05, thus we reject H0.

Conclusion: there is a difference in the performance of the students after the
module was implemented.
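A paired t-test is a one-sample t-test on the within-pair differences, so the results above can be recomputed from the "Difference" row alone. This is a sketch assuming SciPy is available and α = 0.05 (matching the 95% CI); the individual student scores are not listed on the slides, so only the summary values are used.

```python
import math
from scipy import stats

# "Difference" row of the paired t output
n = 20
mean_diff = -2.05000
sd_diff = 2.83725
alpha = 0.05   # assumed, matching the 95% CI in the output

se_diff = sd_diff / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / se_diff       # tests H0: mean difference = 0
df = n - 1

# Two-sided p-value and 95% confidence interval for the mean difference
p_value = 2 * stats.t.sf(abs(t_stat), df)
t_crit = stats.t.ppf(1 - alpha / 2, df)
ci = (mean_diff - t_crit * se_diff, mean_diff + t_crit * se_diff)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print(f"95% CI: ({ci[0]:.5f}, {ci[1]:.5f})")
```

With the raw before and after columns, `scipy.stats.ttest_rel(before, after)` performs the same test directly.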
To compute a paired t-test

1. Choose Stat > Basic Statistics > Paired t.
2. In Sample 1, enter the column
containing the first sample.
3. In Sample 2, enter the column
containing the second sample.
4. If you like, use any dialog box options,
and click OK.
Data - Paired t
• The data from each sample must be in
separate numeric columns of equal length.
• Each row contains the paired
measurements for an observation.
• If either measurement in a row is missing,
MINITAB automatically omits that row
from the calculations.
Paired t - Graphs
• Stat > Basic Statistics > Paired t > Graphs
• Displays a histogram, an individual value
plot, and a boxplot of the paired
differences.
• The graphs show the sample mean of the
differences and a confidence interval (or
bound) for the mean of the differences.
• In addition, the null hypothesis test value
is displayed when you do a hypothesis test.
Example of Paired t
• A shoe company wants to compare
two materials, A and B.
• Each of ten boys in a study wore a
special pair of shoes [left (A) and
right (B)].
• After three months, the shoes are
measured for wear.
• A paired t-test would probably have a
smaller error term than the corresponding
unpaired test because it removes
variability that is due to differences
between the pairs.

• For example, one boy may live in the city
and walk on pavement most of the day,
while another boy may live in the country
and spend much of his day on unpaved
surfaces.
Session window output
Paired T-Test and CI: Mat-A, Mat-B

Paired T for Mat-A - Mat-B

             N    Mean  StDev  SE Mean
Mat-A       10  10.630  2.451    0.775
Mat-B       10  11.040  2.518    0.796
Difference  10  -0.410  0.387    0.122

95% CI for mean difference: (-0.687, -0.133)
T-Test of mean difference = 0 (vs ≠ 0): T-Value = -3.35  P-Value = 0.009
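The claim that pairing gives a smaller error term can be illustrated with the summary statistics in this output. The sketch below compares the standard error used by the paired test with the one an unpaired (unpooled) test would use on the same numbers; the unpaired calculation is hypothetical and shown only for contrast, since the shoe data are paired.

```python
import math

n = 10                                # boys in the study
sd_a, sd_b = 2.451, 2.518             # StDev of Mat-A and Mat-B
mean_diff, sd_diff = -0.410, 0.387    # "Difference" row of the output

# Paired test: error term is based on the spread of the differences
se_paired = sd_diff / math.sqrt(n)
t_paired = mean_diff / se_paired

# Unpaired test (for contrast only): error term is based on the
# spread of each material's wear across the ten boys
se_unpaired = math.sqrt(sd_a**2 / n + sd_b**2 / n)
t_unpaired = mean_diff / se_unpaired

print(f"paired:   SE = {se_paired:.3f}, t = {t_paired:.2f}")
print(f"unpaired: SE = {se_unpaired:.3f}, t = {t_unpaired:.2f}")
```

The boy-to-boy variability inflates the unpaired standard error to roughly nine times the paired one, so the same mean difference would look far from significant if the pairing were ignored.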
Interpreting the results
H0: μd = 0
Ha: μd ≠ 0

• P = 0.009 < 0.05 (α level), so we reject H0; this suggests that
the two materials do not perform equally.

• Specifically, Material B (mean = 11.04)
performed better than Material A (mean =
10.63) in terms of wear over the three-month
test period.
Inferential statistics

• Z-test: n above 30, σ known
• t-test: n (sample size) small, σ unknown
  - 1-sample: a hypothesized mean is given
  - 2-sample (unpaired): two-sample t methods allow us
    to draw conclusions about the difference between the means
    of two independent groups
  - Paired: the data are paired; the simplest form of pairing is to
    measure each subject twice, often before and after a
    treatment is applied

P < α: REJECT H0    P > α: FAIL TO REJECT H0
