
Biostatistics and Experimental Design
Yadgar Ali Mahmood
University of Garmian
1. Introduction
1.1 Statistics
Statistics is defined in two ways:
1. As statistical data: it is a numerical
representation of things.
2. As statistical method: it is a field of study
that deals with the mathematical formulas, models,
and techniques used in the statistical analysis of
raw research data.
It helps us understand the object under study better.
– Statistical methods include:
1. Designing studies
2. Collecting data
3. Presenting data
4. Summarizing data
5. Drawing inferences
What is Biostatistics?
• It is the application of statistical methods to
the biological and life sciences.

Limitations of statistics
• As a science, statistics has its own limitations
– Deals only with quantitative information
– Deals with aggregates of facts, not with individual data items
– Data are approximately, not mathematically, correct
– Statistics can be easily misused

1.2 Data
• Data — The Key Component of a Study
– More important than the methods used in
the analysis are the use of the
appropriate study design and the proper
definition and measurement of the study
variables.

• No good study without good data!


1.3 Design
• Design — The Road to Relevant Data

• Obtaining relevant data requires a carefully drawn plan that identifies
– the population of interest
– the procedure used to select study units
– the process used in the measurement of
the attributes of interest.
Design…
• Standard methods of data collection:
1. Surveys: deal with ways to select a random
sample that is representative of the
population of interest and from which a valid
inference can be made.
2. Experiments: involve the creation of a plan
for determining whether or not there are
differences between groups.
3. Records: provide ready-made data for
routine and continuous information.
4. External sources: sometimes we also analyze data that
were already collected.
• In this case, we need to understand how the data were collected
in order to determine the appropriate methods of analysis.
1.4 Replication
• Replication — Part of the Scientific
Method
– Statistical analysis of data may demonstrate
that there is a high probability of an association
between two variables.

– However, a single study rarely provides proof that such an association exists.
– Results must be replicated by additional studies that
eliminate other factors that could have accounted for
the relationship observed between the study
variables.
1.5 Applying Statistical Methods
• It requires more than the ability to use
statistical software or derive formulas.
– It requires understanding the context in which
statistical procedures are used (the study's goal, the data,
and how the data were collected and measured)
• Think, rather than simply memorizing
formulas and statistical software steps
1.6 Scales of measurement
• Observations and variables:
– A variable is a characteristic under study that assumes
different values for different elements, e.g. blood
pressure, age, sex, …
– In statistics, we observe or measure
characteristics, called variables, of study
subjects, called observational units.
• The main divisions are qualitative (categorical)
and quantitative (numerical) variables.
Scales of measurement…
• Qualitative variable: a variable that can't be
measured in quantitative form but can only be
identified by name or category
– E.g. place of birth, types of drug, stages of breast
cancer (I, II, III, or IV), degree of pain (minimal,
moderate, severe), …
Scales of measurement…
• Quantitative variable: a variable that can be
measured and expressed numerically. It
can be of two types (discrete or continuous).
– The values of a discrete variable are usually whole
numbers, e.g. the number of episodes of diarrhea in the
first five years of life.
– A continuous variable is a measurement on a
continuous scale, e.g. weight, height, blood
pressure, age, etc.
Types of measurement scales
• Nominal
– Data that represent categories or names
– There is no implied order to the categories of
nominal data.
– No arithmetic or relational operations can be
applied.
– E.g.
• Blood type (A, B, O and AB)
• Eye color (brown, black, blue, etc.)
• Sex (Male, Female)
Types of measurement scales…
• Ordinal
– Categories that can be ranked, but differences
between ranks are not meaningful
– Arithmetic operations are not applicable, but
relational operations are.
– Ordering is the sole property of the ordinal scale.
– E.g.
• Degree of pain (minimal, moderate, severe)
• Rating scales (Excellent, Very good, Good, Fair, Poor)
• Letter grade (A, B, C, D and F)
Types of measurement scales…
• Interval
– Data that can be ranked and differences are
meaningful. However, there is no meaningful
zero, so ratios are meaningless.
– All arithmetic operations except division are
possible, and relational operations also apply.
– E.g.
• IQ test (intelligence scale)
• Temperature in degrees Celsius or Fahrenheit
Types of measurement scales…
• Ratio
– Data can be ranked, differences are
meaningful, and there is a true zero.
– All arithmetic and relational operations are
applicable.
– E.g.
• Age (a 30-year-old is twice as old as a 15-year-old)
• Weight (0 kg means no weight)
• Number of drugs (0 means no drug)
1.7 Sources of data
Two sources: primary and secondary
1. Primary data: data collected by the user
directly from the source.
– Methods of collection
• Personal interview (telephone, face-to-face, …)
• Group discussion
• Questionnaires
• Door-to-door survey
• New product registration
Sources of data…
2. Secondary data: data gathered or compiled
from published and unpublished sources.
– From journals, reports, government publications,
publications of professionals and research
organizations.
– E.g.
- CSA: Central Statistics Agency
- DHS: Demographic and Health Survey
- HDS: Health and Demographic Surveillance
1.8 Division of statistics
Depending on how data can be used:
• Descriptive statistics (exploratory): concerned with
summary calculations, graphs, charts and tables about a
given data set.
• Inferential statistics (confirmatory): a method used to
generalize from a sample to a population.
– Sometimes called analytical statistics
1.9 Stages in statistical investigation
Five stages
• Collection of data: the process of measuring,
gathering, and assembling the raw data upon which the
investigation is to be based.
• Organization of data: summarization of data in
some meaningful way, e.g. table form
• Presentation of data: the process of reorganization,
classification, compilation… of data to
present it in a meaningful form.
• Analysis of data: the process of extracting
relevant information from the summarized data
• Inference of data: the interpretation and further
observation of the various statistical measures
through the analysis of the data
– and the implementation of those methods by which
conclusions are formed and inferences made.


1.10 Types of questions
1. Open-ended questions
• Permit free responses
• No list of possible answers is offered to
choose from.
• Mostly used for investigating
– Facts with which the researcher is not familiar
– Opinions, attitudes, and suggestions of
informants
– Sensitive issues
Types of questions…Example
• Can you describe exactly what the traditional
birth attendant did when your labor started?
• What sensations did you experience during
your cataract surgery?
• How do you feel when your baby’s diarrhea
does not stop?
Types of questions…
2. Closed-ended questions
• Offer a list of possible options/answers
• When designing closed questions you should
try to:
– Make sure the lists are complete and mutually exclusive
(no two options can apply at the same time)
– Keep the number of options as small as possible
• They are useful if the range of possible responses is
known
Types of questions…Example
• What is your marital status?
1. Single
2. Married/living together
3. Separated/divorced/widowed
• Have you ever gone to the local village health worker for treatment?
1. Yes
2. No
Steps in designing questionnaire
1. Content
Decide what questions will be needed to measure your variables and reach your objectives
2. Formulating questions
Questions should be specific and precise enough that respondents do not interpret them differently
3. Sequencing of questions
The order should be logical for the respondent
4. Formatting the questionnaire
It should be not only consumer-friendly but also user-friendly
5. Translation
If the interviews will be conducted in one or
more local languages, translate the questionnaire
2. Data presentation
• Having collected and edited the data, the next
step is to organize it,
• that is, to present it in a clear,
condensed form.
• The presentation of data is classified into two types:
1. Tabular
2. Diagrammatic
Tabular presentation
• Frequency distribution: is the organization of
raw data in table form using classes and
frequencies
• There are three basic types of frequency
distributions
• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution
Categorical frequency distribution
• Used for data that can be placed in specific
categories, such as nominal or ordinal data.
E.g. a researcher collected the following
data on marital status for 25 patients
(M = married, S = single, W = widowed and
D = divorced).

Qs: Present the given data in table form

S D W D M
S M M M S
D S M M S
D D S S W
W W D D W
Solution
Make a table as shown

Class  Tally    Frequency  Percent
M      //////   6          24%
S      ///////  7          28%
D      ///////  7          28%
W      /////    5          20%
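
The tally above can be checked mechanically. The following is a minimal sketch in Python, assuming the 25 codes are typed in row by row exactly as listed in the example; the variable names (`raw`, `counts`, `table`) are illustrative only.

```python
from collections import Counter

# Marital status codes for the 25 patients, row by row as listed in the example
raw = (
    "S D W D M "
    "S M M M S "
    "D S M M S "
    "D D S S W "
    "W W D D W"
).split()

counts = Counter(raw)  # frequency of each category
n = len(raw)           # total number of patients

# (class, frequency, percent) rows in the same order as the table
table = [(c, counts[c], 100 * counts[c] / n) for c in ("M", "S", "D", "W")]

for cls, freq, pct in table:
    print(f"{cls}  {'/' * freq:<8} {freq:>2}  {pct:.0f}%")
```

The percent column is simply frequency divided by the total, times 100, so the four percentages always add up to 100%.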
Ungrouped frequency distribution
• Is a table listing every distinct raw value together with its frequency
• Often constructed for small data sets or for data on a
discrete variable.
E.g. the following data represent the weights of
12 clients in a nutrition consulting clinic.

Qs: Construct an ungrouped frequency distribution

80 76 90
70 60 62
63 60 63
76 70 70
Solution
Make a table as shown

Weight  Tally  Frequency
60      //     2
62      /      1
63      //     2
70      ///    3
76      //     2
80      /      1
90      /      1
Grouped frequency Distribution
• When the range of the data is large, the data
must be grouped into classes that are more
than one unit in width
Qs: Construct a frequency distribution for the
following data.

11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solution
Make a table as follows

Class limit  Class boundary  Class mark  Freq.  CF(<)  CF(>)
6-11         5.5-11.5        8.5         2      2      20
12-17        11.5-17.5       14.5        2      4      18
18-23        17.5-23.5       20.5        7      11     16
24-29        23.5-29.5       26.5        4      15     9
30-35        29.5-35.5       32.5        3      18     5
36-41        35.5-41.5       38.5        2      20     2
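
The columns of this table can be rebuilt step by step. The sketch below is one possible way to do it in Python; it assumes the class width of 6 and the starting limit of 6 read off the table, and it takes the last class as 36-41 so that classes do not overlap. The dictionary keys are illustrative names, not standard terminology.

```python
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

width = 6                            # class width, taken from the table (assumption)
lower_limits = range(6, 42, width)   # 6, 12, 18, 24, 30, 36

n = len(data)
rows = []
cum = 0  # running "less than" cumulative frequency of the previous classes
for lo in lower_limits:
    hi = lo + width - 1
    freq = sum(lo <= x <= hi for x in data)   # observations inside the class limits
    rows.append({
        "limits": (lo, hi),
        "boundaries": (lo - 0.5, hi + 0.5),   # half a unit beyond each limit
        "mark": (lo + hi) / 2,                # midpoint of the class
        "freq": freq,
        "cf_less": cum + freq,                # CF(<): up to and including this class
        "cf_more": n - cum,                   # CF(>): this class and all above it
    })
    cum += freq

for r in rows:
    print(r)
```

Note that CF(<) of the last class and CF(>) of the first class both equal the total number of observations, which is a quick check on the table.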
Diagrammatic and Graphic presentation
• Presenting data in visual displays
• Importance
– They have greater attraction.
– They facilitate comparison.
– They are easily understandable.
• The commonly used diagrammatic
presentations for discrete as well as
qualitative data are:
– Pie charts, bar charts, pictograms, maps, …
Pie chart
• A pie chart is a circle that is divided into
sections according to the percentage of
frequencies in each category of the
distribution.
Example: Draw a pie chart to represent the
following OPD (Out Patient Department) patients
for the year 2018 in the given hospital.

Men   Women  Girls  Boys
2500  2000   4000   1500
Solution
• First make a table like:

• Then, draw
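
The table this solution calls for would hold each group's share of the total. A minimal sketch of that calculation in Python follows; it assumes each sector's angle is the group's proportion of the 360 degrees of the circle, and the names `patients` and `shares` are illustrative.

```python
patients = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(patients.values())  # all OPD patients in the year

# Percent of the total and the matching sector angle for each group
shares = {
    group: {"percent": 100 * count / total, "degrees": 360 * count / total}
    for group, count in patients.items()
}

for group, s in shares.items():
    print(f"{group:<6} {s['percent']:5.1f}%  {s['degrees']:6.1f} degrees")
```

The angles must add up to 360 degrees and the percentages to 100%, which gives a simple check before drawing.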
Bar chart
• The most widely used graphical method for
describing qualitative data.
• A set of bars representing some magnitude
over time or space.
• The common types of bar chart
– Simple
– Multiple
– Component … etc
Simple bar chart
E.g. Distribution of Decayed teeth among
children of a primary school
Multiple bar chart
E.g. Distribution of marital status by sex
[Figure: multiple bar chart; x-axis: marital status (Single, Married, Divorced, Widowed); y-axis: % (ticks at 20, 40, 60); separate bars for Male and Female]
Graphical presentation of data

• The commonly used graphs for continuous data are
– Histogram
– Frequency polygon
– Ogive (cumulative frequency graph), …
Histogram
• A graph which displays the data by using
vertical bars of various heights to represent
frequencies.
• Class boundaries are placed along the horizontal
axis.
• Example: Construct a histogram to represent
the previous data
– i.e., the example on grouped frequency distribution
Solution
[Figure: histogram; x-axis: class boundaries 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5; y-axis: frequency (ticks at 2, 3, 4, 7)]


Frequency polygon
• It is a line graph where
– the frequency is placed along the vertical axis and
the class marks along the horizontal axis
• Example: draw a line graph for the above
example on histogram
Solution
[Figure: frequency polygon; class marks on the x-axis]
3. MCT and MV

MCT (Measures of central tendency)


MV (Measures of variation)
MCT (Measures of central tendency)
• useful in data editing as well as in aiding our
understanding of the data
• Sometimes called Average

• Objectives
• To understand the data easily
• To facilitate comparison
• To make further statistical analysis
Types of MCT
• The Mean (Arithmetic, Geometric and
Harmonic)
• The Mode
• The Median
• Quantiles (quartiles, deciles and percentiles)
• The choice among these averages depends on
which best fits the property under discussion.
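The Mean ($\bar{X}$)
• The arithmetic mean:
• Is defined as the sum of the magnitudes of the
items divided by the number of items
• The mean of $X_1, X_2, X_3, \dots, X_n$ is denoted by
A.M., m or $\bar{X}$ and is given by:
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} \quad \text{or} \quad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$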
Mean for ungrouped data
• Example: Obtain the mean age of the following ages of children attending a pediatric clinic:
2, 7, 8, 2, 7, 3, 7
• Solution:
$$\bar{X} = \frac{2 + 7 + 8 + 2 + 7 + 3 + 7}{7} = \frac{36}{7} \approx 5.14$$
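Mean for grouped data
• If data are given in the form of a continuous
frequency distribution, then the mean is
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where $X_i$ is the class mark and $f_i$ the frequency of the $i$-th of $k$ classes.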
Example: calculate the mean for the
following data

Class   Frequency
6-10    35
11-15   23
16-20   15
21-25   12
26-30   9
31-35   6
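
One way to work this example is to replace every class by its class mark (the midpoint of its limits) and take the frequency-weighted average. The sketch below assumes that convention; the variable names are illustrative.

```python
# (class limits, frequency) pairs from the example
classes = [((6, 10), 35), ((11, 15), 23), ((16, 20), 15),
           ((21, 25), 12), ((26, 30), 9), ((31, 35), 6)]

total_f = sum(f for _, f in classes)                          # sum of frequencies
total_fx = sum(f * (lo + hi) / 2 for (lo, hi), f in classes)  # sum of f * class mark
mean = total_fx / total_f

print(f"sum f = {total_f}, sum fX = {total_fx}, mean = {mean}")
```

With class marks 8, 13, 18, 23, 28 and 33, the sums are 100 and 1575, giving a mean of 15.75.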

The Mode ($\hat{X}$)
• The mode is the value that occurs most frequently in
a set of values
• The mode may not exist, and even if it does
exist, it may not be unique.
• In the case of a discrete distribution, the value having
the maximum frequency is the modal value.
• The mode of a set of numbers $X_1, X_2, X_3, \dots, X_n$ is
usually denoted by $\hat{X}$.
Examples:
1. Find the mode of 5, 3, 5, 8, 9.
Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5.
The data are bimodal: 8 and 9.
3. Find the mode of 4, 12, 3, 6, and 7.
There is no mode for these data.
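
The three cases above (one mode, two modes, no mode) can be covered by a single small function. This is a sketch only: the helper `modes` is hypothetical, and it adopts the convention used in these examples that a data set in which every value occurs equally often has no mode.

```python
from collections import Counter

def modes(values):
    """Return all modes in sorted order, or [] when no value stands out."""
    counts = Counter(values)
    if not counts:
        return []
    # Convention assumed here: if all distinct values are equally frequent, there is no mode
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 3, 5, 8, 9]))        # single mode
print(modes([8, 9, 9, 7, 8, 2, 5]))  # bimodal
print(modes([4, 12, 3, 6, 7]))       # no mode
```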

The Median ($\tilde{X}$)
• In a distribution, the median is the value of the
variable that divides it into two equal
halves.
• In an ordered series of data, the median is the
observation lying exactly in the middle of the
series.
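Example:
Find the median of the following numbers.
a) 6, 5, 2, 8, 9, 4
b) 2, 1, 8, 3, 5
Solution:
a) First order the data: 2, 4, 5, 6, 8, 9
Here n = 6, which is even, so the median is the mean of the two middle values: $\tilde{X} = (5 + 6)/2 = 5.5$
b) Order the data: 1, 2, 3, 5, 8
Here n = 5, which is odd, so the median is the middle value: $\tilde{X} = 3$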
MV: Measures of variation
• The spread of items of a distribution is known
as dispersion or variation.
• In other words, the degree to which numerical
data tend to spread about an average value is
called dispersion or variation of the data.

Objectives of measures of variation
• To judge the reliability of MCT
• To control variability itself
• To compare two or more groups of numbers in terms
of their variability
• To make further statistical analysis

Types of Measures of Dispersion
• The most commonly used measures of
dispersions are:
– Range and relative range
– Standard deviation and coefficient of
variation
– Quartile deviation and coefficient of
Quartile deviation

The Range
• The range is the largest score minus the
smallest score.
• It is a quick and dirty measure of variability.
• It is greatly affected by extreme scores.

• R = L - S, where L = largest and S = smallest
Example: 32 35 36 42 42 43 43 45
Range is 45 - 32 = 13
Mean Deviation
• Is the arithmetic mean of the
absolute deviations from a given average
• Depending on the type of average used,
we have different mean deviations
• Mean deviation for raw data and for a
frequency distribution, respectively, with $A$ the chosen average:
$$M.D. = \frac{\sum_{i=1}^{n} |X_i - A|}{n}, \qquad M.D. = \frac{\sum_{i=1}^{k} f_i |X_i - A|}{\sum_{i=1}^{k} f_i}$$
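
As an illustration of how the choice of average changes the result, the sketch below computes the mean deviation about the mean and about the median for the eight scores used in the range example. The helper `mean_deviation` is a hypothetical name; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """Average absolute deviation of the values from the given center."""
    return sum(abs(x - center) for x in values) / len(values)

scores = [32, 35, 36, 42, 42, 43, 43, 45]  # data from the range example

md_mean = mean_deviation(scores, mean(scores))      # about the mean (39.75)
md_median = mean_deviation(scores, median(scores))  # about the median (42)

print(md_mean, md_median)
```

For these scores the mean deviation about the median is the smaller of the two, which is a general property of the median.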

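The variance and standard deviation
Population variance:
• If we divide the variation (the sum of squared deviations from the mean) by the number of
values in the population, we get the
population variance.
• This variance is the "average squared
deviation from the mean":
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
• And for a frequency distribution:
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}, \qquad N = \sum_{i=1}^{k} f_i$$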
Sample variance
• One would expect it to simply be the population variance with the
population mean replaced by the sample mean.
• However, one of the major uses of statistics is
to estimate the corresponding population parameter, and that formula tends to underestimate the population variance.
• To counteract this, the sum of the squares of
the deviations is divided by one less than the
sample size.
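Sample variance formula
For raw data:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
or, as a shorthand formula,
$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}$$
For a frequency distribution (with $n = \sum_{i=1}^{k} f_i$):
$$S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}$$
or, as a shorthand formula,
$$S^2 = \frac{\sum_{i=1}^{k} f_i X_i^2 - n\bar{X}^2}{n - 1}$$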
Standard deviation
• It is the square root of the variance
• Population standard deviation: $\sigma = \sqrt{\sigma^2}$
• Sample standard deviation: $S = \sqrt{S^2}$
Examples:
• Find the variance and standard deviation of
the following sample data
1. 5, 17, 12, 10.
2. The data are given in the form of a frequency
distribution
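
For the first data set (5, 17, 12, 10), the calculation can be sketched as follows. The code treats the four values as a sample, so the sum of squared deviations is divided by n - 1; it also evaluates the shorthand form as a cross-check. Variable names are illustrative.

```python
from math import sqrt

sample = [5, 17, 12, 10]
n = len(sample)
x_bar = sum(sample) / n                          # sample mean

ss = sum((x - x_bar) ** 2 for x in sample)       # sum of squared deviations
variance = ss / (n - 1)                          # sample variance
std_dev = sqrt(variance)                         # sample standard deviation

# Shorthand form: (sum of squares - n * mean^2) / (n - 1)
variance_short = (sum(x * x for x in sample) - n * x_bar ** 2) / (n - 1)

print(f"mean = {x_bar}, variance = {variance:.2f}, sd = {std_dev:.2f}")
```

Here the mean is 11 and the squared deviations are 36, 36, 1 and 1, so the variance is 74/3 (about 24.67) and the standard deviation about 4.97.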

Coefficient of Variation (C.V)
• Is defined as the ratio of the standard deviation
to the mean, usually expressed as a percentage:
$$C.V = \frac{S}{\bar{X}} \times 100\%$$
• The distribution having the smaller C.V is said to be
less variable or more consistent.
Example:
• An analysis of the monthly wages paid to
workers in two departments, Pedi (A) and Ortho (B),
belonging to the same campus gives the
following results

Value      Dep't A  Dep't B
Mean wage  52.5     47.5
Variance   100      121

In which department is there greater variability in individual wages?
Cont…
• In department B there is greater variability in
individual wages, because its C.V is larger.
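
The comparison can be reproduced from the table. The sketch below takes the standard deviation as the square root of each variance and expresses it as a percentage of the mean; the dictionary layout and names are illustrative.

```python
from math import sqrt

departments = {
    "A": {"mean": 52.5, "variance": 100},
    "B": {"mean": 47.5, "variance": 121},
}

# C.V = standard deviation / mean * 100, per department
cv = {name: 100 * sqrt(d["variance"]) / d["mean"] for name, d in departments.items()}
more_variable = max(cv, key=cv.get)  # department with the larger C.V

for name, value in cv.items():
    print(f"C.V({name}) = {value:.2f}%")
print("Greater variability:", more_variable)
```

This gives roughly 19.05% for department A and 23.16% for department B.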
Standard Scores (Z-scores)
• If X is a measurement from a distribution with
mean $\bar{X}$ and standard deviation S, then its
value in standard units is
$$Z = \frac{X - \bar{X}}{S}$$
Cont…
• Z gives the deviation from the mean in units
of standard deviation
• Z gives the number of standard deviations a
particular observation lies above or below the
mean.
• It is used to compare two observations
coming from different groups
Examples:
1. Two sections were given a Biostatistics
examination. The following information was
given.

Value  HO (Sec 1)  Nursing (Sec 2)
Mean   78          90
SD     6           5

• Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively
speaking, who performed better?
Solutions:
• Calculate the standard score of both students:
$$Z_A = \frac{90 - 78}{6} = 2, \qquad Z_B = \frac{95 - 90}{5} = 1$$
• Student A performed better relative to his
section, because the score of student A is
2 SD above the mean score of his section,
while the score of student B is only 1 SD
above the mean score of his section.
Measures of shape
• Measures of skewness
– Skewed to the right
– Skewed to the left
– Symmetric

• Measures of kurtosis
– Leptokurtic
– Mesokurtic
– Platykurtic
