Biostatistics | Hasnat Hussain (Reus-11)

Biostatistics
Q.1) Research in medical science is incomplete without applying statistical
techniques and analyses.
(a) Define data and classify quantitative and qualitative data.
(c) Define standard error.
(d) Briefly explain rapid sand filtration process for purification of water.
ANS:
Data:
“Collection of observations in a scientific way is called data. When the data becomes
meaningful, it is called information.”

Classification of data:
1. Quantitative data (numerical)
i. Discrete data
• Data is in whole numbers.
• E.g. number of live births, number of TB cases etc.
ii. Continuous data
• Measurement is on a continuous scale (interval, ratio).
• E.g. height, weight, pulse, BP etc.

2. Qualitative data (categorical)
i. Nominal data
• Data is divided into named groups that have no natural order.
• E.g. male/female, black/white etc.
ii. Ordinal data
• Data is arranged in some kind of order.
• E.g. (i) Opinion on something: fully agree, agree, disagree, totally disagree
(ii) Rank: 1st, 2nd, 3rd, 4th, 5th, 6th, 7th etc.

Another classification of variables divides them into independent and dependent variables.
(i) Independent variable → does not depend on other factors in the study; it is the factor
whose effect is being examined.
(ii) Dependent variable → depends on other factors; it is the outcome that changes in
response to the independent variable.

Standard error:
“The standard error (SE) is the standard deviation of the sampling distribution of a statistic,
most commonly of the mean.”
• The standard error is an estimate of the standard deviation of a statistic.
• If we take a random sample from the population over and over again, we will find
that every sample has a different mean. The SE measures how much these sample means vary.
• Formula → $$SE = \frac{s}{\sqrt{n}}$$
s → standard deviation of the sample
n → number of observations
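As an illustration of the standard error of the mean (SD of the sample divided by the square root of the number of observations), here is a minimal Python sketch. The pulse readings and the function name `standard_error` are hypothetical and are not taken from the notes.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: s / sqrt(n)."""
    s = statistics.stdev(sample)   # sample SD (n - 1 in the denominator)
    n = len(sample)                # number of observations
    return s / math.sqrt(n)

# Hypothetical pulse rates (beats per minute), for illustration only.
pulse = [72, 75, 68, 80, 77, 70, 74, 76]
print(round(standard_error(pulse), 2))
```

A larger sample gives a smaller SE, because n appears in the denominator.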

Rapid sand filtration process of water purification:


Rapid sand filters are of two types: the gravity type and the pressure type. The following
steps are involved in the purification of water in rapid sand filters;
1. Coagulation
• The raw water is treated with a chemical coagulant such as alum.
2. Rapid mixing
• The water is then violently agitated in the “mixing chamber” for a few minutes.
• This distributes the alum evenly throughout the water.
3. Flocculation
• Water is stirred slowly and gently in the flocculation chamber for about 30
minutes.
• This leads to the formation of a thick, white precipitate (floc) of aluminium
hydroxide.
4. Sedimentation
• The water is now led into the sedimentation tanks and left there for 2-6 hours
until the floc, bacteria and impurities settle at the bottom of the
tanks.
• The precipitate or sludge which settles at the bottom is removed from
time to time.
5. Filtration
• The partly clarified water is now subjected to rapid sand filtration.

Q.2) The following data shows the amount of phosphates per load of laundry
in grams for a random sample of various types of detergents.
(a) Calculate the measures of central tendency.
(b) Give characteristics of normal distribution.
Detergent           Phosphate per load (grams)
A x P Blue sail     48
Dash                47
Concentrated all    42
Cold water all      42
Breeze              41
Ajax                31
Sears               30
Fab                 29
Cold powder         29
Bold                29
Rinso               26
Oxydol              34

ANS:
Measures of central tendency:
Mean → (48+47+42+42+41+31+30+29+29+29+26+34)/12 = 428/12 = 35.67
Median (the middle value after arranging the data in ascending or descending order) →
26 29 29 29 30 31 34 41 42 42 47 48 → (31+34)/2 = 32.5
Mode (most commonly occurring number) → 29
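The three measures for the detergent data can be checked with Python's standard `statistics` module. This is only a sketch for verification; the variable names are illustrative.

```python
import statistics

# Phosphate per load (grams), in the order given in the question.
phosphate = [48, 47, 42, 42, 41, 31, 30, 29, 29, 29, 26, 34]

mean = statistics.mean(phosphate)      # 428 / 12
median = statistics.median(phosphate)  # average of the two middle values, 31 and 34
mode = statistics.mode(phosphate)      # most frequently occurring value

print(round(mean, 2), median, mode)  # 35.67 32.5 29
```

Because the number of observations is even (12), the median is the average of the 6th and 7th values of the sorted data.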

Characteristics of normal distribution curve:


1. Bell shaped
2. Mean, mode and median coincide
3. Total area under the curve is 1
4. Mean is 0 (in the standard normal curve)
5. SD is 1 (in the standard normal curve)
6. Unimodal
7. Curve never touches the baseline
8. Right and left halves are equal (symmetrical)
9. Theoretical, not seen practically.

Q.3) Name the various methods of presenting quantitative and qualitative
data.
ANS:
Methods of presenting quantitative and qualitative data:
1. Tables
i. Simple tables
ii. Frequency distribution tables
2. Charts and diagrams
i. Bar chart
ii. Pie chart
iii. Histogram
iv. Frequency polygon
v. Line diagram
vi. Scatter diagram
3. Graphs
4. Pictures
i. Pictograms
5. Special curves

Q.4) Following is the data of IQ of 9 students of MBBS; 110, 105, 110, 106, 95,
120, 115, 112, 117.
(a) What do you understand by measures of central tendency of a data?
(b) How will you calculate measure of central tendency for this data?
(c) Calculate the standard deviation for the data.
ANS:
Measures of central tendency:
A measure of central tendency is a single value that describes a set of data by identifying
the central position within that set of data. There are 3 measures of central tendency;
mean, mode and median.

Measures of central tendency for this data:


Mean → (110+105+110+106+95+120+115+112+117)/9 = 990/9 = 110
Median → 95 105 106 110 110 112 115 117 120 → 110 (the 5th of 9 sorted values)
Mode → 110

SD:

S/no.   x     x̄ (mean)   x-x̄    (x-x̄)²
1       95    110        -15    225
2       105   110        -5     25
3       106   110        -4     16
4       110   110        0      0
5       110   110        0      0
6       112   110        2      4
7       115   110        5      25
8       117   110        7      49
9       120   110        10     100
Total                           444

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

By putting in the values;

$$SD = \sqrt{\frac{444}{9 - 1}} = \sqrt{55.5} \approx 7.45$$
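The table-based calculation can be reproduced step by step. The sketch below mirrors the columns of the table (deviation from the mean, squared deviation, total) and uses n - 1 in the denominator; the variable names are illustrative.

```python
import math

iq = [110, 105, 110, 106, 95, 120, 115, 112, 117]

n = len(iq)
mean = sum(iq) / n                             # 990 / 9
squared_devs = [(x - mean) ** 2 for x in iq]   # the (x - mean) squared column
sum_sq = sum(squared_devs)                     # total of that column
sd = math.sqrt(sum_sq / (n - 1))               # sample SD

print(mean, sum_sq, round(sd, 2))  # 110.0 444.0 7.45
```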

Q.5) Define; (i) confidence interval (ii) type-1 error in hypothesis testing (iii)
variable.
ANS:
Confidence interval:
“An interval between confidence limits within which the true value of a population
parameter is stated to lie with a specified probability.”

Type-1 error in hypothesis testing:
“Rejecting the null hypothesis when it is in fact true is called a Type I error.”
• Type I errors are similar to false positives.
• Example → A drug is being used to treat a disease, and the null hypothesis is that the
drug has no effect. If we reject the null hypothesis in this situation, then our claim is
that the drug does in fact have some effect on the disease. But if the null hypothesis is
true, then in reality the drug does not combat the disease at all. The drug is falsely
claimed to have a positive effect on the disease.
• Type I errors can be controlled. The value of alpha, which is the level of significance
you set for your hypothesis test, has a direct effect on type I errors. Alpha is the
probability of making a type I error when the null hypothesis is true.
• For a 95% confidence level, the value of alpha is 0.05. This means that there is a 5%
probability that we will reject a true null hypothesis.
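To see what "alpha is the probability of a type I error" means in practice, here is a small simulation sketch. It repeatedly samples from a population in which the null hypothesis is true by construction and counts how often a two-sided z-test at alpha = 0.05 rejects it anyway. The population (mean 0, SD 1), sample size, number of trials and function name are all assumptions made for illustration.

```python
import math
import random
import statistics

def type_one_error_rate(trials=2000, n=30, z_crit=1.96, seed=1):
    """Share of samples that wrongly reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # The null hypothesis (population mean = 0, SD = 1) is true by construction.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = statistics.mean(sample) / (1 / math.sqrt(n))  # (sample mean - 0) / (SD / sqrt(n))
        if abs(z) > z_crit:          # two-sided test at alpha = 0.05
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(rate)  # expected to be close to 0.05
```

Lowering alpha (a larger critical value) would reduce the share of false rejections.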

Variable:
“Any characteristic whose value is different (variable) among individuals.”
E.g. height, weight, IQ etc.

Q.6) What are the different types of data? How do you construct a frequency
distribution table? What is the coefficient of variation? How is it calculated?
ANS:
Different types of data:
• Described previously (quantitative and qualitative data, Q.1).
Construction of a frequency distribution table:
• The data is first divided into groups (class intervals) and the frequency of each group
is shown in the adjacent column.
• As a rule, no fewer than 5 groups should be made for a small data set and no more than
20 groups for a large one.
• The class intervals should be equal, so that data can be compared.

Example;
Ages of patients admitted due to polio → 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7,
8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 13, 14, 14, 14, 14, 14

Age (class intervals)   Number of patients (frequency)
0-2                     6
3-5                     6
6-8                     9
9-11                    11
12-14                   7
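The grouping into equal class intervals can be sketched in Python as follows. The helper `frequency_table` and its parameters (`start`, `width`, `groups`) are hypothetical names chosen for this example.

```python
ages = [1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7,
        8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 13, 14, 14, 14, 14, 14]

def frequency_table(values, start=0, width=3, groups=5):
    """Count how many values fall in each equal-width class interval."""
    table = []
    for i in range(groups):
        low = start + i * width
        high = low + width - 1            # inclusive upper limit, e.g. 0-2
        count = sum(low <= v <= high for v in values)
        table.append((f"{low}-{high}", count))
    return table

for interval, freq in frequency_table(ages):
    print(interval, freq)
```

The frequencies should add up to the total number of observations (39 here), which is a quick check that no value was left out.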

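Coefficient of variation:
“A statistical measure of the dispersion of data values in a data series around the mean.”
It expresses the SD as a percentage of the mean, so the variability of data sets with
different means or units can be compared.
Formula;
Coefficient of variation (CV) = (SD / mean) x 100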

Q.7) Following is the data of IQ of MBBS students; 110, 115, 99, 105, 120, 100,
107.
(a) Calculate mean and median.
(b) Calculate standard deviation.
ANS:
Mean:
(110+115+99+105+120+100+107)/7 = 756/7 = 108
Median:
99 100 105 107 110 115 120 → 107 (the 4th of 7 sorted values)
SD:

S/no.   x     x̄ (mean)   x-x̄    (x-x̄)²
1       99    108        -9     81
2       100   108        -8     64
3       105   108        -3     9
4       107   108        -1     1
5       110   108        2      4
6       115   108        7      49
7       120   108        12     144
Total                           352

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

By putting in the values;

$$SD = \sqrt{\frac{352}{7 - 1}} = \sqrt{58.67} \approx 7.66$$
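As a cross-check of the mean, median and SD for this data, the standard `statistics` module can be used. Note that `statistics.stdev` is the sample SD (n - 1 in the denominator), which matches the formula used in these notes.

```python
import statistics

iq = [110, 115, 99, 105, 120, 100, 107]

print(f"mean   = {statistics.mean(iq):.2f}")    # 756 / 7
print(f"median = {statistics.median(iq)}")      # middle value of the sorted data
print(f"SD     = {statistics.stdev(iq):.2f}")   # sqrt(352 / 6)
```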

Q.8) Note on standard normal distribution curve.


ANS:
Standard normal distribution curve:
This is a standardized normal distribution curve (mean 0, SD 1) which has been devised by
statisticians to estimate the area under the normal curve. Also called the Gaussian curve.
Characteristics;
• Described previously (see Q.2).

Area between 1 SD on either side of the mean covers 68.3% of the total area.
Area between 2 SD on either side of the mean covers 95.4% of the total area.
Area between 3 SD on either side of the mean covers 99.7% of the total area.
These limits on either side of the mean are called “confidence limits” and the interval
between them is called “confidence interval”.
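The three percentages can be verified numerically. For a normal curve, the area within k SDs of the mean equals erf(k / √2), where erf is the error function available in Python's `math` module. The function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Area under the normal curve within k SDs on either side of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 1))  # 68.3, 95.4 and 99.7 respectively
```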


Q.9) What do you understand by the term statistical significance? Write a
short note on confidence interval.
ANS:
Statistical significance:
“The probability that a result or relationship is caused by something other than mere
random chance.”
• In everyday English, the word significance means "important." But when researchers
say the findings of a study were "statistically significant," they do not necessarily
mean the findings are important.
• Statistical significance refers to whether any differences observed between groups
being studied are "real" or whether they are simply due to chance.
• Hypothesis testing is used to determine if a result is statistically significant or not. We
get a "p-value". In general, a p-value of 5% (0.05) or lower is considered to be
statistically significant.
• When a result is statistically significant, it simply means that it is unlikely to have
occurred by chance alone.

Confidence interval:
“An interval between confidence limits within which the true value of a population
parameter is stated to lie with a specified probability.”
(Population parameter e.g. mean, mode or median etc.)

• A 95% confidence interval is a range of values that you can be 95% sure contains
the true parameter (e.g. mean) of the population.
• To express a confidence interval, you need three pieces of information: (i) confidence
level (ii) statistic (iii) margin of error

Q.10) Short note on chi-square test.


ANS:
Chi-square test:
Chi-square (χ²) test is an alternative method of testing the significance of difference
between two proportions. It can also be used to compare more than 2 groups.
Formula → $$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O → observed frequency
E → expected frequency

4 steps of chi-square test;
1. Testing the null hypothesis
2. Application of the χ² test
3. Finding the degrees of freedom
4. Looking into the probability tables
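The steps above can be sketched for a 2 x 2 table. The sketch uses the usual chi-square statistic, the sum of (observed - expected)² / expected over all cells, with each expected frequency computed as row total x column total / grand total. The counts below are hypothetical and serve only as an illustration; the function name is also an assumption.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the null hypothesis (no difference) is true.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) x (columns - 1)
    return chi2, df

# Hypothetical counts: rows = group 1 / group 2, columns = attacked / not attacked.
observed = [[10, 90],
            [30, 70]]
chi2, df = chi_square(observed)
print(round(chi2, 2), df)  # 12.5 1
```

With 1 degree of freedom, the tabulated critical value at the 0.05 level is 3.84, so a calculated value of 12.5 would lead to rejection of the null hypothesis in this made-up example.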

Q.11) A study was conducted in a medical college to know the Hb level of male
and female students. The study was conducted on 150 males and 100 females.
The results showed that the mean Hb level of boys was 12 with +/- 2 standard
deviation whereas the mean Hb in females was 11 with +/- 3 standard
deviation.


(a) What type of variables were studied in the above study?


(b) Calculate the coefficient of variation for male and female students
separately. What does this ratio depict?
(c) Calculate 95% confidence interval of Hb for male and female students
separately.
ANS:
Variables studied in this study:
• Hb level → quantitative (numerical), continuous data
• Sex (male/female) → qualitative (categorical), nominal data
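Coefficient of variation:
Males
CV = 2/12 x 100 = 16.67 %
Females
CV = 3/11 x 100 = 27.27 %
This ratio depicts relative variability: compared with their mean, the Hb levels of female
students vary more than those of male students.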
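The same calculation as a short Python sketch, using the means and SDs given in the question. The function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV = (SD / mean) x 100, expressed as a percentage."""
    return sd / mean * 100

cv_male = coefficient_of_variation(2, 12)
cv_female = coefficient_of_variation(3, 11)
print(round(cv_male, 2), round(cv_female, 2))  # 16.67 27.27
```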

Calculation of 95% confidence interval:


Males
95% CI = mean +/- 2 x SD/√n (2 is the rounded value of 1.96)
By putting in the values, we get;
12 +/- 2 x 2/√150 → 12 +/- 4/12.25 → 12 +/- 0.33
95% CI = 11.67 - 12.33

Females
95% CI = mean +/- 2 x SD/√n
By putting in the values, we get;
11 +/- 2 x 3/√100 → 11 +/- 6/10 → 11 +/- 0.6
95% CI = 10.4 - 11.6
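The two intervals can be recomputed in Python, keeping intermediate values unrounded. The sketch follows the notes in using 2 as the multiplier (an approximation of 1.96); the function name and the `multiplier` parameter are assumptions made for this example.

```python
import math

def confidence_interval_95(mean, sd, n, multiplier=2):
    """Approximate 95% CI: mean +/- multiplier x SD / sqrt(n)."""
    margin = multiplier * sd / math.sqrt(n)   # margin of error
    return mean - margin, mean + margin

low_m, high_m = confidence_interval_95(12, 2, 150)   # males
low_f, high_f = confidence_interval_95(11, 3, 100)   # females
print(round(low_m, 2), round(high_m, 2))  # 11.67 12.33
print(round(low_f, 2), round(high_f, 2))  # 10.4 11.6
```

Passing `multiplier=1.96` gives the slightly narrower interval based on the exact value from the standard normal table.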


Q.12) The principal of a medical college in KPK claims that the school’s
students are highly intelligent group with an average IQ of 135. This claim
constitutes a hypothesis that can be tested referring to above statement.
(a) Define hypothesis and enlist the various steps involved in testing it.
ANS:
Hypothesis:
“A supposition or proposed explanation made on the basis of limited evidence as a starting
point for further investigation.”

Steps involved in testing a hypothesis:


Step 1: State the null and alternative hypotheses.
Step 2: Set the criteria for a decision (level of significance).
Step 3: Establish the critical values.
Step 4: Draw a random sample from the population and calculate the mean of the
sample.
Step 5: Select the appropriate statistical test and compute the value of the test statistic
Z or t or χ².
Step 6: Compare the calculated value with the critical values and then reject or fail to
reject (accept) the null hypothesis.
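The six steps can be sketched for the principal's claim with a one-sample z-test. Only the claimed mean of 135 comes from the question; the sample mean (130), the SD (15) and the sample size (36) are hypothetical values chosen for illustration, as is the function name.

```python
import math

def one_sample_z_test(sample_mean, claimed_mean, sd, n, z_crit=1.96):
    """Two-sided z-test of H0: population mean = claimed_mean."""
    # Step 5: compute the test statistic z = (sample mean - claimed mean) / (SD / sqrt(n)).
    z = (sample_mean - claimed_mean) / (sd / math.sqrt(n))
    # Step 6: compare with the critical values (+/- 1.96 for alpha = 0.05).
    reject = abs(z) > z_crit
    return z, reject

# Step 1: H0: mean IQ = 135; H1: mean IQ is not 135.
# Steps 2-3: alpha = 0.05, so the critical values are +/- 1.96.
# Step 4: hypothetical random sample of 36 students with mean IQ 130 and SD 15.
z, reject_null = one_sample_z_test(sample_mean=130, claimed_mean=135, sd=15, n=36)
print(z, reject_null)  # -2.0 True
```

In this made-up sample the calculated z lies beyond the critical value, so the null hypothesis would be rejected at the 5% level.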

Prepared By: Hasnat Hussain (Reus-11)
