
Biostatistics (Final Term)

Inferential Statistics (Applied Statistics): Branch of statistics that uses various analytical tools to draw
inferences about the population data from sample data.
Estimation: Statistical estimation is a method by which we estimate the population parameters from
sample values (i.e. statistics).

Estimator: A sample statistic such as X̄ or the s.d., used to estimate a population parameter, is called an estimator, and the actual value obtained by evaluating
an estimator in a given instance is called an estimate.
Biased Estimation: A statistic is called a biased estimator of the corresponding parameter if the mean of the
sampling distribution of the statistic is NOT equal to the corresponding population parameter.
Unbiased Estimation: Statistic is called unbiased estimator of corresponding parameter if the mean of
the sampling distribution of the statistic is equal to the corresponding population parameter.
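The difference between a biased and an unbiased estimator can be checked by simulation. The sketch below is only an illustration and is not part of the original notes: it assumes a normal population with variance 4 and compares the variance estimator that divides by n with the one that divides by n - 1. The function names, the sample size and the seed are arbitrary choices.

```python
import random

def variance(sample, ddof):
    """Variance estimate with divisor (n - ddof)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)

def mean_of_estimates(ddof, n=5, trials=20000, sigma=2.0, seed=0):
    """Average the variance estimate over many samples drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += variance(sample, ddof)
    return total / trials

biased = mean_of_estimates(ddof=0)    # divisor n
unbiased = mean_of_estimates(ddof=1)  # divisor n - 1
print(f"true variance = 4.0, divisor n: {biased:.2f}, divisor n-1: {unbiased:.2f}")
```

With the divisor n - 1 the average of the estimates should settle close to the true variance, while the divisor n gives an average that is too small by the factor (n - 1)/n.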
Point estimation: A single statistic is used to provide an estimate of population parameter.
Interval Estimation: A range of values is used to estimate the population parameter. For
example, if the height of a student is reported as 175 cm, the measurement gives a point estimate, but
if the height is reported as 175 ± 5 cm, the measurement gives an interval estimate.
Confidence Interval: In interval estimation of population parameter, we can find two quantities t1 & t2
based on sample observations drawn from the population such that the unknown parameter is included in
the interval (t1 , t2) in a specified percentage of cases, then this interval is called confidence interval.
Confidence Limit: the lower limit t1 & upper limit t2 of the confidence interval are called confidence
limits.
Parametric Test: A test that makes assumptions about the parameters of the population distribution
from which the sample is drawn.
Non-Parametric Test: A test that does not assume a specific population distribution, that is, the data can be collected from a sample that
does not follow a specific distribution.
P-Value vs. Level of Significance:
P-Value: The probability which is calculated from the test statistic and compared with the level of significance
to judge whether the null hypothesis is accepted or rejected. It is a probability that you calculate after a
given study.
Level of Significance: The probability of rejecting the null hypothesis when the null hypothesis is true. It
is a pre-chosen probability.
Null hypothesis (Ho): Statistical Hypothesis taken for possible acceptance.
Alternative Hypothesis (H1): Statistical Hypothesis complementary to null hypothesis.
Composite Hypothesis: A statistical hypothesis which does not completely specify the distribution of a
random variable and covers a set of values from the parameter space. E.g. if the entire parameter space
covers -∞ to +∞, a composite hypothesis could be μ ≤ 0. An alternative hypothesis is usually a composite
hypothesis.
Errors in hypothesis testing:

Suhaib Afzaal
Type I Error (α): When null hypothesis is true but it is rejected by test procedure leading to a wrong
decision.
Type II Error (β): When null hypothesis is false but it is accepted by test procedure leading to a wrong
decision.
Level of significance: The maximum probability of making a type I error is called the level of significance,
denoted by “α”. The probability of correctly retaining a true null hypothesis is then (1-α). α is generally specified
before the test procedure; in most cases the level of significance is 1% (α=0.01) or 5% (α=0.05).
Critical region: Also called the “Rejection Region”, it is the area under the sampling distribution curve of the test statistic (e.g. the standard normal curve) in which the
test statistic value has to fall for the rejection of the null hypothesis.
Example 9.1 The systolic BP of 100 males was taken. The mean BP was found to be 128 mmHg and the standard
deviation 13 mmHg. Find the (i) 95% and (ii) 99% confidence limits of BP within which the population
mean would lie.
Solution:
Given: $\bar{x} = 128$ mmHg, $n = 100$, S.D. $= 13$ mmHg
$$\text{S.E.} = \frac{\sigma}{\sqrt{n}} = \frac{13}{\sqrt{100}} = \frac{13}{10} = 1.3$$
(i) 95% Confidence Limit
$$\bar{x} \pm 1.96\,(\text{S.E. of } \bar{x}) = 128 \pm 1.96(1.3) = 130.5,\ 125.5$$
(ii) 99% Confidence Limit
$$\bar{x} \pm 2.58\,(\text{S.E. of } \bar{x}) = 128 \pm 2.58(1.3) = 131.4,\ 124.6$$
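The arithmetic of Example 9.1 can be reproduced with a few lines of Python. This is a minimal sketch: the function name `confidence_limits` is a hypothetical helper, and the multipliers 1.96 and 2.58 are simply the values used in the example.

```python
import math

def confidence_limits(mean, sd, n, z):
    """Return (lower, upper) limits: mean ± z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se

# Values from Example 9.1; 1.96 and 2.58 are the multipliers used there.
low95, high95 = confidence_limits(128, 13, 100, 1.96)
low99, high99 = confidence_limits(128, 13, 100, 2.58)
print(f"95% limits: {low95:.1f} to {high95:.1f}")
print(f"99% limits: {low99:.1f} to {high99:.1f}")
```

The 99% interval is wider than the 95% interval because a larger multiplier is applied to the same standard error.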
Chi-Square Test
Chi-Square Test: A non-parametric test used when data are in frequencies, such as the numbers of
responses in two or more categories. It can be used with any data which can be reduced to proportions or
percentages. It is denoted by the Greek letter χ² and pronounced as Kye square.
Q. Write down the procedure to test the association of two attributes?
1. Hypothesis:
Null hypothesis (Ho): Two attributes are independent and have no association. There is no
significant association between the 2 attributes.
Alternative Hypothesis (H1): Two attributes are dependent and have association. There is a
significant association between the 2 attributes.
2. Level of significance: α = 1% or 5%
3. Test statistic: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
4. Degree of freedom: $Df = \nu = (r - 1)(c - 1)$
5. Critical region (C.R): $\chi^2 \geq \chi^2_{\alpha,\nu}$
6. Calculation
7. Conclusion
Where,
α = Level of significance
fo = Observed Frequency
fe = Expected Frequency = (row total × column total) / grand total
Df = Degree of freedom
r = No. of Rows
c = No. of Columns
Properties:

• Tabulated (critical) chi-square values increase with increase in degrees of freedom.
• The value of χ² lies between 0 and ∞.
• Chi-Square curve is always positively skewed.
• The mean of the χ² distribution is the number of degrees of freedom (ν) & its standard deviation is √(2ν).
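These properties can be checked numerically. The sketch below is an illustration only, not part of the original notes. It relies on the standard fact that a χ² variable with ν degrees of freedom is the sum of ν squared standard normal variables, so its mean is ν and its standard deviation is √(2ν). The function name, number of trials and seed are arbitrary choices.

```python
import random
import statistics

def chi_square_draws(df, trials=20000, seed=1):
    """Simulate chi-square values as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(trials)]

for df in (2, 5, 10):
    draws = chi_square_draws(df)
    mean = statistics.mean(draws)
    sd = statistics.pstdev(draws)
    # Compare the simulated mean and s.d. with df and sqrt(2 * df).
    print(f"df={df}: mean={mean:.2f} (expected {df}), "
          f"sd={sd:.2f} (expected {(2 * df) ** 0.5:.2f}), min={min(draws):.3f}")
```

Every simulated value is non-negative, and the simulated mean and standard deviation should land close to ν and √(2ν).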
Uses/Applications:
Chi-square test is mainly applied to:
• Test the goodness of fit.
• Test the independence of attributes.
• Test the homogeneity of attributes in respect of a particular characteristic.
• Test the population variance.
Conditions of Validity of Chi-Square test:
Chi-Square test can be used only when:
• The total number of observations is large, i.e. n > 50.
• The observations are independent.
• The expected frequency of any item or cell is not less than 5. If fe < 5, the frequency is
pooled with the preceding or succeeding frequency so that the pooled frequency is 5 or more.
Q. Calculate the value of χ² and test, at the 5% level of significance, whether smoking and gender are
independent for the data given below:
          Smokers   Non-Smokers   Total
Male        40          10          50
Female       5          45          50
Total       45          55         100
1. Hypothesis:
H0 = Genders and smoking habit are independent
H1 = Genders and smoking habit are dependent
2. Level of Significance:
α = 5% = 0.05
3. Degree of freedom:
𝐷𝑓 = 𝑣 = (𝑟 − 1)(𝑐 − 1) = (2 − 1)(2 − 1) = 1

4. Calculations:
fo     fe                       fo − fe    (fo − fe)²    (fo − fe)²/fe
40     (45×50)/100 = 22.5        17.5       306.25        13.61
5      (45×50)/100 = 22.5       −17.5       306.25        13.61
10     (55×50)/100 = 27.5       −17.5       306.25        11.14
45     (55×50)/100 = 27.5        17.5       306.25        11.14
Total                                                     49.50

5. χ² Test Formula:
$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right] = 49.5$$
6. Critical region:
$$\chi^2 \geq \chi^2_{\alpha,\nu}$$
$$\chi^2 \geq \chi^2_{0.05,1}$$
$$49.5 \geq 3.841$$

7. Conclusion:
Since the calculated value falls in the critical region, H0 is rejected and H1 is accepted. We
conclude that gender and smoking are dependent.
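The whole worked example can be reproduced in code. The following is a minimal sketch using only plain Python; `chi_square_independence` is a hypothetical helper name, and the critical value 3.841 is taken from the example rather than computed.

```python
def chi_square_independence(table):
    """Return (chi_square, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # Expected frequency = row total * column total / grand total
            fe = row_totals[i] * col_totals[j] / grand_total
            chi_square += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df

observed = [[40, 10],   # male: smokers, non-smokers
            [5, 45]]    # female: smokers, non-smokers
chi_square, df = chi_square_independence(observed)
critical_value = 3.841  # table value for alpha = 0.05, df = 1, as used in the example
print(f"chi-square = {chi_square:.1f}, df = {df}")
print("reject H0" if chi_square >= critical_value else "do not reject H0")
```

The same function works for larger r × c tables, provided the critical value is looked up for the corresponding degrees of freedom.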

T-Test:
Used for small sample sizes (n ≤ 30).
Requires the sample standard deviation (S).
Q. Write down the procedure of simple t-test for small sample size / analysis of population mean of
small sample size?
1. Hypothesis:

t-test types                   Null Hypothesis (Ho)   Alternative Hypothesis (H1)
Two Tail t-test                µ = µo                 µ ≠ µo
One Tail Lower tailed t-test   µ ≥ µo                 µ < µo
One Tail Upper tailed t-test   µ ≤ µo                 µ > µo
2. Level of significance:
α = 1%(0.01) or 5% (0.05)
3. t-test:
$$T = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
4. Degree of freedom:
𝝂=𝒏−𝟏
5. Critical region:
Two tailed: $|T| \geq T_{\alpha/2,\nu}$
Lower tailed: $T < -T_{\alpha,\nu}$
Upper tailed: $T > T_{\alpha,\nu}$

6. Calculations:
7. Conclusion:
Properties:
• t-Distribution is asymptotic to the x-axis.
• The shape of the t-distribution curve varies with the degrees of freedom.
• t-distribution is a symmetrical distribution with mean zero.
• t-distribution has a greater spread than the normal distribution.
• t-distribution becomes closer to the z distribution with increase in degrees of freedom. Theoretically,
when d.f. approaches ∞, the t-distribution approaches the standard normal curve.
Uses/Applications of t-test(for small sample size):

• To test the hypothesis that the population mean µ has a specified value µo when the population
variance σ² is not known.
• To test whether the two population means are equal (n1 < 30, n2 < 30) when σ1 and σ2 are
unknown.

Q. 25 students used Horlicks for a month; the mean and standard deviation of their increase in height are 20 and
17 respectively. Test at the 5% level of significance whether the population
mean is 30?
Given:
n = 25 α = 5% = 0.05
x̄ = 20 µ = 30
S = 17
Solution:
Null Hypothesis H0: µ = 30
Alternative Hypothesis H1: µ ≠ 30
Degree of freedom: ν = n – 1 = 25 – 1 = 24
Critical Region: $|T| \geq T_{\alpha/2,\nu} = T_{0.025,24} = 2.064$
t-test formula:
$$T = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{20 - 30}{17/\sqrt{25}} = \frac{-10}{3.4} = -2.94$$
Critical region: $|T| \geq T_{\alpha/2,\nu}$, i.e. $2.94 \geq 2.064$
Conclusion: The calculated value of T falls in the critical region, thus the null hypothesis is rejected and
the alternative hypothesis is accepted. The researcher may conclude that the population mean increase in height
is not 30.
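The calculation in this example can be checked with a short script. This is a sketch only: `one_sample_t` is a hypothetical helper name, and the critical value 2.064 is the table value quoted in the solution, not something the code derives.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """T = (sample mean - hypothesized mean) / (S / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# Given values: n = 25, sample mean = 20, S = 17, hypothesized mean = 30
t_stat = one_sample_t(20, 30, 17, 25)
df = 25 - 1
critical_value = 2.064  # two-tailed table value for alpha = 0.05, df = 24
print(f"T = {t_stat:.2f}, df = {df}")
print("reject H0" if abs(t_stat) >= critical_value else "do not reject H0")
```

Because the test is two-tailed, the absolute value of T is compared with the critical value.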
Q. Write down the procedure:
• For analysis of 2 drugs in small sample size?
• For independent sample t-test of 2 populations?
• For difference of two population means in small sample size?
• To test the equality of 2 population means in small sample size?
1. Hypothesis:
t-test types                   Null Hypothesis (Ho)   Alternative Hypothesis (H1)
Two Tail t-test                µ1 - µ2 = 0            µ1 - µ2 ≠ 0
One Tail Lower tailed t-test   µ1 - µ2 ≥ 0            µ1 - µ2 < 0
One Tail Upper tailed t-test   µ1 - µ2 ≤ 0            µ1 - µ2 > 0
2. Level of significance:
α = 1%(0.01) or 5% (0.05)
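3. t-test (assuming equal population variances):
$$T = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
4. Degree of freedom:
$$\nu = n_1 + n_2 - 2$$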
5. Critical region:
Two tailed: $|T| \geq T_{\alpha/2,\nu}$
Lower tailed: $T < -T_{\alpha,\nu}$
Upper tailed: $T > T_{\alpha,\nu}$

6. Calculations:
7. Conclusion:
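A two-sample comparison can be sketched as follows. The notes give no data for this procedure, so the two samples below are hypothetical. The code uses the standard pooled-variance t statistic, which assumes equal population variances, with n1 + n2 − 2 degrees of freedom. This is one common form of the independent-samples t-test and not necessarily the only one intended here.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, n1 + n2 - 2

drug_a = [5, 7, 6, 8, 9]  # hypothetical responses to drug A
drug_b = [3, 4, 5, 4, 4]  # hypothetical responses to drug B
t_stat, df = pooled_t(drug_a, drug_b)
critical_value = 2.306    # two-tailed table value for alpha = 0.05, df = 8
print(f"T = {t_stat:.3f}, df = {df}")
print("reject H0" if abs(t_stat) >= critical_value else "do not reject H0")
```

For these made-up samples the means are 7 and 4 and the pooled variance is 1.5, so T is about 3.873 on 8 degrees of freedom.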

ANOVA/f-Test
ANOVA (Analysis of Variance): A statistical method that uses variances to compare the means (or
averages) of different groups.
Experimental Design:
Definition: A plan used to collect data relevant to the problem under study in such a way as to provide a
basis for valid and objective inference about stated problem. There are two types of designs;
a. Systematic designs
b. Randomized designs
Basic Principles of Experimental Designs

• Randomization: Random assignment of treatments to experimental units, so that every possible allotment
of treatments has the same probability. E.g. drawing cards from a well-shuffled pack or balls
from a well-shaken container.
• Replication: Repetition of the basic experiment, i.e. a complete run of all treatments to be tested in the
experiment. E.g. plots of land for an agricultural experiment.
• Local Control: Not all extraneous sources of variation are removed by randomization &
replication. Local control refers to the amount of balancing, blocking, and grouping of experimental units. Balancing means
treatments should be assigned to experimental units in such a way that the result is a balanced
arrangement of treatments. Blocking means experimental units should be collected together to
form a relatively homogeneous group.
Completely Randomized design: The simplest of the basic designs, it may be defined as a design in
which the treatments are assigned to experimental units completely at random, that is, the randomization is
done without any restrictions. An experimental layout for a CRD could use, for example, 4 treatments A, B, C and D,
each repeated 3 times.
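A layout of this kind can be generated by shuffling all treatment labels together. The sketch below is an illustration, not the layout from the original notes; the function name, the seed and the 3 × 4 printing grid are arbitrary choices.

```python
import random

def crd_layout(treatments, replications, seed=None):
    """Assign each treatment to `replications` units completely at random."""
    units = [t for t in treatments for _ in range(replications)]
    random.Random(seed).shuffle(units)  # unrestricted randomization over all units
    return units

layout = crd_layout(["A", "B", "C", "D"], replications=3, seed=42)
# Print the 12 units as a 3 x 4 grid of plots (the grid shape is arbitrary).
for start in range(0, len(layout), 4):
    print(" ".join(layout[start:start + 4]))
```

Each treatment appears exactly 3 times, but nothing restricts where in the grid it lands.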

Randomized complete Block design: A design in which the experimental material is divided into groups or
blocks in such a manner that:
i. The experimental units within a particular block are relatively homogeneous.
ii. Each block contains a complete set of treatments.
iii. The treatments are assigned at random to experimental units within each block, which means
randomization is restricted to blocks.
An experimental layout for an RCBD could use, for example, 6 treatments A, B, C, D, E and F, each repeated 3 times.
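For a randomized complete block design, the shuffle is carried out separately inside each block. The following sketch is illustrative only and assumes 3 blocks, one per replication; the function name and seed are arbitrary.

```python
import random

def rcbd_layout(treatments, blocks, seed=None):
    """Return one randomly ordered complete set of treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(blocks):
        block = list(treatments)
        rng.shuffle(block)  # randomization is restricted to this block
        layout.append(block)
    return layout

layout = rcbd_layout("ABCDEF", blocks=3, seed=7)
for number, block in enumerate(layout, start=1):
    print(f"Block {number}: {' '.join(block)}")
```

Every block contains all six treatments exactly once; only the order within a block is random.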

Latin Square Design: Involves simultaneous blocking of experimental units in two perpendicular
directions, called rows and columns, thus imposing the restriction that each treatment must appear once
and only once in each row & once and only once in each column. Such double blocking of experimental
units / doubly restricted random assignment is called Latin square design.
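One simple way to build such an arrangement is to start from a cyclic square and then shuffle its rows and columns. This is a sketch under that assumption: it always yields a valid Latin square, but it does not pick uniformly among all possible Latin squares. The function name and seed are arbitrary.

```python
import random

def latin_square(treatments, seed=None):
    """Build a Latin square by permuting rows and columns of a cyclic square."""
    rng = random.Random(seed)
    k = len(treatments)
    # Cyclic square: row i is the treatment list shifted by i places.
    square = [[treatments[(i + j) % k] for j in range(k)] for i in range(k)]
    rng.shuffle(square)   # permute the rows
    cols = list(range(k))
    rng.shuffle(cols)     # permute the columns
    return [[row[c] for c in cols] for row in square]

square = latin_square("ABCD", seed=3)
for row in square:
    print(" ".join(row))
```

Permuting whole rows or whole columns keeps every treatment exactly once in each row and each column, which is the defining restriction of the design.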
