BIO1103PE2


FAR EASTERN UNIVERSITY MANILA

INSTITUTE OF HEALTH SCIENCES AND NURSING
Medical Technology Department
Nicanor Reyes St. Sampaloc Manila
Tel. no. (02) 877 7338

PRACTICE EXERCISE 2
CHAPTER 2: Basic Statistics and Chemometrics
Cuevas, Marish Ydrick
Demafelis, Richie Anthea
Diaz, Aalia Mhylls

Significance Testing
● A way to determine whether the difference between two or more values is too large to be
explained by indeterminate error is via significance testing.
● The first step of any significance test is to state the null hypothesis (H0) and the alternative
hypothesis (HA).
o The test either rejects H0 in favor of HA or fails to reject H0. Failing to reject the null
hypothesis is not the same as accepting it: the null hypothesis is retained because there is
insufficient evidence to reject it.
● The second step is to set the confidence level (CI). The confidence level can also be expressed
as the significance level α, which is the probability of incorrectly rejecting the null hypothesis
when it is actually true.
o Usually we set the confidence level at 95%, but the choice depends on the researcher. We can
solve for α from the confidence level: α = 1 − CI/100.
o Say we are testing at the 95% confidence level. Then α = 1 − 95/100 = 0.05, which means there is
a 5% probability of incorrectly rejecting a null hypothesis that is true.
o Is it always ideal to reject the null hypothesis? Not really, because there are cases
where having no significant difference is much preferred. It really depends on the
researcher that conducts the significance testing.
● Aside from identifying the confidence level, we also decide whether the significance test is
one-tailed or two-tailed.
o If we are to just determine if there is any significant difference between two or
more values, we are doing significance testing under two-tailed testing.
o If we are testing under two-tailed testing, we state the null hypothesis as “…no significant
difference…”, or:
H0: µ = µ0
HA: µ ≠ µ0
o However, if we are to determine whether there was an increase or a decrease in the values, we
use one-tailed testing.
o If we are testing under one-tailed testing, we state the null hypothesis as “…no significant
increase…” or “…no significant decrease…”, depending on the condition:
HA: µ > µ0 (…is greater than…)
HA: µ < µ0 (…is less than…)
o Unless stated otherwise, statistical tests are usually done under two-tailed testing with
α = 0.05.
● Aside from CI, we also determine the p-value, which is the probability of obtaining a test
statistic at least as extreme as the one observed if the null hypothesis is true. It is a
different value from α: α is chosen before the test, while the p-value is computed from the data.
● Setting up these parameters is needed so that we can locate the critical value to which we
compare the test statistic.
● After setting the parameters needed, we determine the appropriate testing method to be
applied. For two groups with unknown variance and N<30, we use t-test.
● Lastly, we compare the calculated statistic to the tabulated statistic, and from that, we
determine whether we reject the null hypothesis or retain it.
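The relationship between the confidence level and α described above can be sketched in a few lines of Python. This is only an illustration: the function names `alpha_from_confidence` and `tail_area` are made up for this sketch and do not come from the handout.

```python
def alpha_from_confidence(ci_percent: float) -> float:
    """Return the significance level alpha = 1 - CI/100."""
    if not 0 < ci_percent < 100:
        raise ValueError("confidence level must be between 0 and 100 percent")
    return 1 - ci_percent / 100


def tail_area(alpha: float, two_tailed: bool = True) -> float:
    """Area of each rejection region: alpha is split in half for two tails."""
    return alpha / 2 if two_tailed else alpha


alpha = alpha_from_confidence(95)
print(round(alpha, 4))                    # 0.05
print(round(tail_area(alpha, True), 4))   # 0.025 in each tail
print(round(tail_area(alpha, False), 4))  # 0.05 in the single tail
```

Splitting α between the two tails is why a t-table asks whether the test is one-tailed or two-tailed before a critical value is read off.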

Student’s t-test
● Student’s t-test is a way to check whether there is a significant difference between two means.
We conduct a t-test for two groups of samples with unknown variances and N<30.
● A one-sample t-test tests the mean of a single group against a known mean/population
mean.
● An independent samples t-test compares the means of two independent groups.
● A paired sample t-test compares the means of two groups that depend on each other, for example
by cause and effect or by time (before and after).
● We reject or fail to reject the null hypothesis depending on the conditions:
Condition          H0                   HA
|tcalc| > ttab     Reject H0            Accept HA
|tcalc| < ttab     Fail to reject H0    Do not accept HA
(For two-tailed testing, compare the absolute value of tcalc with ttab.)
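The decision rule in the table can be written as a small helper. This is a minimal sketch for two-tailed testing; the function name `decide` is hypothetical, and the example numbers are the t Stat and two-tail critical value reported in Practice Exercise B.

```python
def decide(t_calc: float, t_tab: float) -> str:
    """Two-tailed decision rule: compare |t_calc| with the tabulated t value."""
    if abs(t_calc) > t_tab:
        return "reject H0"
    return "fail to reject H0"


# Values taken from the paired t-test output in Practice Exercise B (df = 9)
print(decide(-0.727606875, 2.262157163))  # fail to reject H0
```

Taking the absolute value matters because the sign of tcalc only reflects which group was subtracted from which.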

Working Example
A. A supplier claims that there would be a change in the longevity of batteries after they added new
Li ores sourced from other mining sites. To validate the supplier’s claim, you were tasked to test
whether there is a significant difference by measuring the number of hours the battery lasts before
and after the addition of the Li ore. You apply the parameters 90% CI, α = 0.10.

        Group 1                             Group 2
No.     Battery without new Li ore (hrs)    Battery with new Li ore (hrs)
1       177                                 181
2       185                                 189
3       172                                 170
4       163                                 161
5       181                                 182
6       152                                 180
7       165                                 158
8       194                                 189
9       183                                 187
10      175                                 180

1. State the hypotheses:
H0: There is no significant difference between the longevity of batteries that do not contain the
new Li ore and those that contain the new Li ore.
HA: There is a significant difference between the longevity of batteries that do not contain the new
Li ore and those that contain the new Li ore.
2. Determine the parameters needed for the analysis.
For this example, CI = 90%, so α = 0.10. Since we are only testing for any significant
difference, we use two-tailed testing.
3. Determine the appropriate test method
For this problem, since we don’t know the population standard deviation (σ), and N<30, we use
t-test, specifically paired t-test.
To solve for t-test statistic tcalc:
3.1. We first tabulate our values

3.2. Get the difference (D) between the values of the two groups, then sum the differences. We will call this ∑D.

3.3. Square the differences, then add them up. We will call this ∑D².

No. Group 1 Group 2 D D²

1 177 181
2 185 189
3 172 170
4 163 161
5 181 182
6 152 180
7 165 158
8 194 189
9 183 187
10 175 180
SUM

So, we determine that for sample size N = _______, ∑D is ________, while ∑D² is _________.

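3.4. Once we determine the sums, we plug those values into the formula below.

$$t_{calc} = \frac{\dfrac{\sum D}{N}}{\sqrt{\dfrac{\sum D^2 - \dfrac{\left(\sum D\right)^2}{N}}{N(N-1)}}}$$

$$t_{calc} = \frac{\dfrac{(\quad)}{(\quad)}}{\sqrt{\dfrac{(\quad) - \dfrac{(\quad)^2}{(\quad)}}{(\quad)\left((\quad)-1\right)}}}$$

tcalc = ___________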
3.5. Compare tcalc to ttab and determine whether we reject H0. To find ttab, use the t-table. For
this problem, CI = 90%, α = 0.10, two-tailed testing. Aside from these two values, we also need to
determine the degrees of freedom (df). With the CI (or α), the number of tails, and df, we can
locate the needed ttab.
df = N − 1 = ________
Condition          H0                   HA
|tcalc| > ttab     Reject H0            Accept HA
|tcalc| < ttab     Fail to reject H0    Do not accept HA
ttab: ________
tcalc __________ ttab

Since tcalc __________ ttab, we (reject, fail to reject) the null hypothesis. Therefore, there is (a significant,
no significant) difference after addition of the new Li ores.
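The steps of the working example can be checked with a short script. This is a sketch under two stated assumptions: D is taken as Group 1 minus Group 2 (the handout does not fix the direction, and reversing it only flips the sign of tcalc), and the tabulated value 1.833 is the t-table entry for df = 9, two-tailed, α = 0.10.

```python
import math

# Battery life in hours, copied from the table in the working example
group_1 = [177, 185, 172, 163, 181, 152, 165, 194, 183, 175]  # without new Li ore
group_2 = [181, 189, 170, 161, 182, 180, 158, 189, 187, 180]  # with new Li ore

# Steps 3.2 and 3.3: differences, their sum, and the sum of their squares
diffs = [a - b for a, b in zip(group_1, group_2)]  # assumed direction: Group 1 - Group 2
n = len(diffs)
sum_d = sum(diffs)
sum_d2 = sum(d * d for d in diffs)

# Step 3.4: paired t statistic from the formula in the handout
t_calc = (sum_d / n) / math.sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))

# Step 3.5: compare with the tabulated value
df = n - 1
T_TAB = 1.833  # assumed t-table entry: df = 9, two-tailed, alpha = 0.10

print(n, sum_d, sum_d2)   # 10 -30 940
print(df)                 # 9
print(round(t_calc, 3))   # -0.976
print("reject H0" if abs(t_calc) > T_TAB else "fail to reject H0")  # fail to reject H0
```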

*How about the independent t-test? For an independent t-test, we assume that the two groups have
either equal or unequal variances. We first apply an F-test to check whether the variances are equal,
then apply the formula that matches that result. THE EXAM WILL ONLY COVER the paired t-test.

Test for Outliers

● An outlier is any data point that is far away from the other points. Outliers are usually created
by gross errors.
● There are various ways to test for outliers, but the most recommended is Grubbs’ test.
● We compute a G value (Gcalc), then compare it to the tabulated G value (Gtab).

$$G_{calc} = \frac{\left| x_{questionable} - \bar{x} \right|}{s}$$

● If Gcalc > Gtab, the questionable value is an outlier and should be discarded.
● If Gcalc < Gtab, the questionable value should be retained.
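As an illustration, the Gcalc formula can be written as a short function using Python's standard `statistics` module. The function name `grubbs_g` is made up for this sketch; the data and Gtab = 2.110 are the ones used in Practice Exercise C.

```python
import statistics


def grubbs_g(values, questionable):
    """Return Gcalc = |x_questionable - mean| / s for one suspect value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by N - 1)
    return abs(questionable - mean) / s


# Chloride data from Practice Exercise C (g Cl- per 100 mL water, x10^-3)
chloride = [1.05, 2.79, 2.67, 2.99, 3.85, 3.77, 3.56, 4.21, 4.02]
G_TAB = 2.110  # tabulated value used in Practice Exercise C for N = 9

for suspect in (max(chloride), min(chloride)):
    g_calc = grubbs_g(chloride, suspect)
    verdict = "outlier" if g_calc > G_TAB else "retain"
    print(suspect, round(g_calc, 2), verdict)
```

Because the script keeps the unrounded mean and standard deviation, its Gcalc values can differ in the last digit from a hand calculation that uses the rounded 3.21 and 0.98.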

t-TEST TABLE
How to use: Solve for df. Then find the column for the chosen significance level (α), which is
listed in the upper part of the table. Make sure to note whether the test is one-tailed or two-tailed.
Practice Exercises

A. A researcher wants to determine if there is a significant difference in the Fe content of food
supplements after addition of lyophilized seaweed extract (LSE) at 97.5% CI. Given below are the Fe
contents (expressed as mg Fe/100 g) of food supplements before and after addition of LSE.
Supplement No.    Food supplements without LSE    Food supplements with LSE
1                 49                              52
2                 53                              55
3                 51                              52
4                 52                              53
5                 47                              50
6                 50                              54
7                 52                              54
8                 53                              53

1. Find the mean, median, standard deviation, variance, RSD (in terms of %) of both groups.

2. State the null and alternative hypotheses.

3. Show if there is a significant difference by presenting solutions on how you solved for tcalc and its
comparison to ttab.
4. Do we reject the null hypothesis? What would be the p inequality for the groups?

B. You and your friends decided, on a whim, to run a significance test to find out whether looking at
the Dolomite Beach (DB) can temporarily improve a person's emotional state. Your group surveyed 10
randomly selected participants, who were first asked to rate their emotional state on a scale of
1-10, where 1 is the least happy and 10 is the happiest. You then led them to the Dolomite Beach,
let them sit for 15 minutes, and asked them to rate their emotional state again on the same scale.
Below are the scores of the participants before and after staying at the Dolomite Beach.
Participant No.    Before Looking at DB    After Looking at DB
1                  3                       5
2                  5                       6
3                  2                       5
4                  6                       3
5                  8                       8
6                  7                       9
7                  2                       5
8                  5                       3
9                  6                       7
10                 5                       3

1. Find the mean, median, standard deviation, variance, RSD (in terms of %) of both groups.

Before looking at DB:
Mean: (3+5+2+6+8+7+2+5+6+5)/10 = 4.9
Median: 2, 2, 3, 5, 5, 5, 6, 6, 7, 8 => (5+5)/2 = 5
Standard Deviation: s = 2.02
Variance: s^2 = 4.10
Relative Standard Deviation: 100(2.025)/4.9 = 41.32%

After looking at DB:
Mean: (5+6+5+3+8+9+5+3+7+3)/10 = 5.4
Median: 3, 3, 3, 5, 5, 5, 6, 7, 8, 9 => (5+5)/2 = 5
Standard Deviation: s = 2.12
Variance: s^2 = 4.49
Relative Standard Deviation: 100(2.119)/5.4 = 39.24%
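The descriptive statistics for both groups can be cross-checked with Python's standard `statistics` module. This is only a checking sketch; the helper name `describe` is hypothetical, and `stdev` and `variance` are the sample versions that divide by N − 1.

```python
import statistics

# Scores from the Practice Exercise B table
before = [3, 5, 2, 6, 8, 7, 2, 5, 6, 5]
after = [5, 6, 5, 3, 8, 9, 5, 3, 7, 3]


def describe(scores):
    """Mean, median, sample SD, sample variance, and RSD in percent."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return {
        "mean": mean,
        "median": statistics.median(scores),
        "sd": sd,
        "variance": statistics.variance(scores),
        "rsd_percent": 100 * sd / mean,
    }


for label, scores in (("before", before), ("after", after)):
    summary = describe(scores)
    print(label, {name: round(value, 2) for name, value in summary.items()})
```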

2. State the null and alternative hypotheses.

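H0: There is no significant difference between the emotional state of the participants before and
after looking at the Dolomite Beach (mean difference = 0).
HA: There is a significant difference between the emotional state of the participants before and
after looking at the Dolomite Beach (mean difference ≠ 0).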

3. Show if there is a significant difference by presenting solutions on how you solved for tcalc and its
comparison to ttab.

                                VARIABLE 1      VARIABLE 2
Mean                            4.9             5.4
Variance                        4.1             4.488888889
Observations                    10              10
Pearson Correlation             0.450656234
Hypothesized Mean Difference    0
df                              9
t Stat                          -0.727606875
P(T<=t) one-tail                0.242675532
t Critical one-tail             1.833112933
P(T<=t) two-tail                0.485351064
t Critical two-tail             2.262157163

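4. Do we reject the null hypothesis? What would be the p inequality for the groups?

We do not reject the null hypothesis. Since |tcalc| = 0.728 is less than ttab = 2.262 (two-tailed,
df = 9), there is no significant difference between before and after looking at the Dolomite Beach.
The p inequality is p > 0.05 (two-tailed p = 0.485).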
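The t Stat in the table can be reproduced with the paired t formula from the working example. This is a sketch, assuming D is taken as before minus after (which gives the negative sign shown in the table); the function name `paired_t` is hypothetical.

```python
import math

# Scores from the Practice Exercise B table
before = [3, 5, 2, 6, 8, 7, 2, 5, 6, 5]
after = [5, 6, 5, 3, 8, 9, 5, 3, 7, 3]


def paired_t(x, y):
    """Paired t statistic: (sum(D)/N) / sqrt((sum(D^2) - sum(D)^2/N) / (N(N-1)))."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    return (sum_d / n) / math.sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))


t_calc = paired_t(before, after)
T_TAB = 2.262157163  # "t Critical two-tail" from the table, df = 9

print(round(t_calc, 4))  # -0.7276
print("reject H0" if abs(t_calc) > T_TAB else "fail to reject H0")  # fail to reject H0
```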

C. Given below is the chloride (Cl-) content of drinking water samples gathered from a well source.
The chloride content was determined via argentometry, that is, by titrating the water sample with
0.01 M AgNO3.

Sample No.                          1     2     3     4     5     6     7     8     9
g Cl- per 100 mL water (×10^-3)     1.05  2.79  2.67  2.99  3.85  3.77  3.56  4.21  4.02

1. Find the mean, median, standard deviation, and RSD (as decimal) of the data set.

Mean = (1.05+2.79+2.67+2.99+3.85+3.77+3.56+4.21+4.02)/9 = 3.21
Median = 1.05, 2.67, 2.79, 2.99, 3.56, 3.77, 3.85, 4.02, 4.21 => 3.56
Mode = There is no mode because no value was repeated in the given data set.
Standard Deviation (sx) = 0.98
Relative Standard Deviation = s/x → 0.98/3.21 = 0.31
n = 9   min(x) = 1.05   max(x) = 4.21   Q1 = 2.73   Q3 = 3.94
Σx = 28.91
Σx^2 = 100.55
σ^2x = 0.85
σx = 0.92
s^2x = 0.96

2. Is the highest value an outlier? Show by computing for Gcalc then compare it with Gtab.

Gtab = 2.110
Gcalc = |questionable value − mean| / standard deviation
Gcalc = |4.21 − 3.21| / 0.98 = 1.02
Since Gcalc < Gtab, the highest value is NOT AN OUTLIER.

3. Is the lowest value an outlier? Show by computing for Gcalc then compare it with Gtab.

Gtab = 2.110
Gcalc = |questionable value − mean| / standard deviation
Gcalc = |1.05 − 3.21| / 0.98 = 2.20
Since Gcalc > Gtab, the lowest value IS AN OUTLIER.

4. Find the new mean, standard deviation, and RSD (as decimal) of the data set AFTER removing
the outlier.

Mean = (2.79+2.67+2.99+3.85+3.77+3.56+4.21+4.02) ÷ 8 = 3.48
Standard Deviation (sx) = 0.59
RSD = s/x → 0.59/3.48 = 0.17
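The recomputation after discarding the outlier can be checked as follows. This is a minimal sketch that simply drops the lowest value (1.05), which the Grubbs' test in item 3 flagged; the variable names are illustrative.

```python
import statistics

# Chloride data from Practice Exercise C (g Cl- per 100 mL water, x10^-3)
chloride = [1.05, 2.79, 2.67, 2.99, 3.85, 3.77, 3.56, 4.21, 4.02]

# Discard the value flagged as an outlier in item 3 (the lowest value)
outlier = min(chloride)
trimmed = [x for x in chloride if x != outlier]

new_mean = statistics.mean(trimmed)
new_sd = statistics.stdev(trimmed)  # sample standard deviation (N - 1)
new_rsd = new_sd / new_mean         # RSD as a decimal

print(len(trimmed))        # 8
print(round(new_mean, 2))  # 3.48
print(round(new_sd, 2))    # 0.59
print(round(new_rsd, 2))   # 0.17
```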

D. Below are the amounts of Au that were extracted from electronic scraps through an
electrogravimetric process.

Sample No.      1    2    3    4    5    6    7    8
Mass Au (μg)    18   29   20   27   22   39   21   23

1. Find the mean, median, standard deviation, and RSD (as decimal) of the data set.

Mean: (18 + 29 + 20 + 27 + 22 + 39 + 21 + 23) ÷ 8 = 24.875 ≈ 24.88
Median: 18, 20, 21, 22, 23, 27, 29, 39 => (22 + 23) ÷ 2 = 22.5
Standard Deviation: 6.75
Relative Standard Deviation (RSD): 6.75 ÷ 24.88 = 0.2713

2. Is the highest value an outlier? Show by computing for Gcalc then compare it with Gtab.

18, 20, 21, 22, 23, 27, 29, 39

Gtab = 2.032
Gcalc = |39 − 24.88| / 6.75 = 2.09
Since Gcalc > Gtab, the highest value is CONSIDERED AN OUTLIER.

3. Is the lowest value an outlier? Show by computing for Gcalc then compare it with Gtab.

18, 20, 21, 22, 23, 27, 29, 39

Gtab = 2.032
Gcalc = |18 − 24.88| / 6.75 = 1.02
Since Gcalc < Gtab, the lowest value is NOT AN OUTLIER.

4. Find the new mean, standard deviation, and RSD (as decimal) of the data set AFTER removing
the outlier.

Mean: (18 + 29 + 20 + 27 + 22 + 21 + 23) ÷ 7 = 22.86
Standard Deviation: 3.89
Relative Standard Deviation (RSD): 3.89 ÷ 22.86 = 0.170
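The whole of Exercise D can be verified in one pass. This is a sketch under a simplifying assumption: every point is screened against the original mean and standard deviation in a single pass, which matches items 2 and 3 here because only the highest and lowest values are tested.

```python
import statistics

gold = [18, 29, 20, 27, 22, 39, 21, 23]  # mass of Au in micrograms
G_TAB = 2.032                            # tabulated value used in this exercise for N = 8

mean = statistics.mean(gold)
s = statistics.stdev(gold)  # sample standard deviation (N - 1)

g_high = abs(max(gold) - mean) / s
g_low = abs(min(gold) - mean) / s

# Keep only the points whose Gcalc does not exceed Gtab
kept = [x for x in gold if abs(x - mean) / s <= G_TAB]
new_mean = statistics.mean(kept)
new_s = statistics.stdev(kept)
new_rsd = new_s / new_mean

print(round(g_high, 2), round(g_low, 2))  # 2.09 1.02
print(kept)                               # [18, 29, 20, 27, 22, 21, 23]
print(round(new_mean, 2), round(new_s, 2), round(new_rsd, 2))  # 22.86 3.89 0.17
```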

-END-
