MODULE 1 Introduction To BIOSTAT
BERNADINE NSA EKPENYONG
ASSOC. PROF. OF PUBLIC HEALTH (EPIDEMIOLOGY)
Introduction: Why Biostatistics
• Knowledge of biostatistics helps you to critically
appraise a scientific publication and to know which test to
run for your own research.
• To do this, you should know whether the right test
has been used and how to interpret the resulting
figures.
Learning outcome
• Understand the meaning of biostatistics
Descriptive statistics are the first tools used to explore the data, giving
some important indications of what the data set “looks like” (measures
of central tendency, measures of dispersion, and the summarization and
presentation of data).
Inferential statistics
• Inferential statistics are used to make inferences or draw
conclusions from our data. They allow us to generalize findings
from a smaller group (sample) to a larger group
(population).
• Statistical software packages: Excel, SPSS, Epi Info,
Stata
Variables
[Diagram: Variables divide into Numerical (quantitative) and Categorical (qualitative)]
• A variable is any feature or characteristic of the unit or subject that is
observed or measured and that may vary from subject to subject.
• Numerical variables (data) include counts, such as the number of children of a
specific age, and measurements, such as height and weight. They are divided into:
• (1) Continuous variables (also called measurement variables): the data are
measured and can take any value within a range, so the number of possible values is
infinite, e.g. height, time, weight, SBP, CD4 cell count, age.
• (2) Discrete variables: the data are counted and can only take certain values,
usually whole numbers (integers), e.g. the number of students in a class (you can’t
have half a student) or the number of episodes of diarrhea a child has had in one year.
Numerical Variables
[Diagram: Numerical variables divide into Discrete and Continuous]
Categorical variables/ Data
• Categorical variables (data) are the result of classifying. For instance, individuals can
be classified into categories according to their blood group: A, B, O, or AB. Nominal and
ordinal variables are types of categorical variable.
• Nominal variables have distinct levels that have no inherent ordering, e.g. hair colour,
sex, tribe, occupation, race, religion, LGA, state of origin, country.
• Ordinal variables have levels that do follow a distinct ordering, e.g. position in the office,
year of study.
• Categorical variables can also be dichotomous (binary), where there are only two
categories into which an observation can fall, e.g. sex (male or female), dead or alive;
• or polychotomous, where there are more than two categories into which an observation
can fall, e.g. ethnic group, occupation, religion.
Categorical variables
[Diagram: Categorical variables divide into Ordinal and Nominal; they may be Dichotomous or Polychotomous]
Classification of Variables
[Diagram: Variables; Confounding Variables]
Independent and dependent variables
• Independent (exposure) variable: a factor that is measured or manipulated
by the researcher to determine its relationship to an observed phenomenon (i.e. the
dependent variable). It is presumed to cause a change in the dependent (outcome)
variable, e.g. age, sex, occupation, marital status, educational level, SES.
• The Median is the point which has half the values above,
and half below.
EXAMPLE
• Consider a set of six women patients aged 52, 55, 56, 58, 59 and 92
years. There are two “middle” ages, 56 and 58. The median is halfway
between these, i.e. 57 years.
• This gives a better idea of the mid-point of this skewed data than the
mean of 62 years.
• The median value of 57 years indicates that half the women are older
than 57 years, while half the women are younger than 57 years.
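The arithmetic in this example can be checked with a short Python sketch. It uses only the six ages given above and the standard-library `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Ages (in years) of the six women patients from the example above.
ages = [52, 55, 56, 58, 59, 92]

age_mean = mean(ages)      # sum of the ages divided by 6
age_median = median(ages)  # average of the two middle values, 56 and 58

print(f"mean = {age_mean} years, median = {age_median} years")
```

The single large value (92) pulls the mean upward but leaves the median unchanged, which is why the median describes the mid-point of skewed data better.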
Standard Deviation
• Standard deviation (SD) is used for data which are “normally
distributed” to provide information on how much the data vary or
cluster around their mean.
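As a minimal sketch of how an SD is obtained, the Python snippet below computes the mean and sample standard deviation of a small set of readings. The blood pressure values are hypothetical and chosen only for illustration; they are not taken from this module.

```python
from statistics import mean, stdev

# Hypothetical systolic blood pressure readings in mmHg (illustrative only).
sbp = [118, 120, 122, 124, 126]

sbp_mean = mean(sbp)
# stdev() returns the sample standard deviation (it divides by n - 1).
sbp_sd = stdev(sbp)

print(f"mean = {sbp_mean} mmHg, SD = {sbp_sd:.2f} mmHg")
```

A small SD relative to the mean, as here, indicates that the values cluster closely around the mean.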
• Bar chart
• Pie chart
• Line graphs
Histogram
[Figure: histogram; the horizontal axis shows the class intervals [1, 5], (5, 9], (9, 13], (13, 17], (17, 21], (21, 25]]
Histogram
• A histogram is the appropriate graphical display for continuous
numerical variables grouped into class intervals.
Bar chart
• A bar chart is the appropriate graphical display for
categorical variables.
[Figure: bar chart with four categories]
• A distribution that has its central location to the left and tails off to the right is
said to be positively skewed, or skewed to the right.
• A distribution that has its central location to the right and tails off to the left is
said to be negatively skewed, or skewed to the left.
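The direction of skew can also be checked numerically. The sketch below uses one common definition, the moment coefficient of skewness, which is an assumption of this example rather than something specified in the module: a positive result indicates a tail to the right, a negative result a tail to the left. The helper name `skewness` and the symmetric data set are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - centre) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [52, 55, 56, 58, 59, 92]    # ages from the median example
left_tailed = [-x for x in right_tailed]   # mirror image of the same data
symmetric = [1, 2, 3, 4, 5]                # hypothetical symmetric data

for name, data in [("right-tailed", right_tailed),
                   ("left-tailed", left_tailed),
                   ("symmetric", symmetric)]:
    print(f"{name}: skewness = {skewness(data):.2f}")
```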
Normal distribution
Skewed distributions
Characteristics of the normal curve
• Bell shaped curve
• Symmetrical
• Limits are called confidence limits and the range between the two is called
confidence interval
• Notice that the Standard Deviation (SD) tells us about the variability or
spread of data values around the mean value in a sample. However, the
Confidence Interval (CI) tells us the range in which the true value (say, the
mean if the sample were infinitely large) is likely to be.
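To make the difference between SD and CI concrete, here is a minimal Python sketch that computes a 95% confidence interval for a mean. It assumes the normal approximation (mean ± 1.96 × SD / √n); for a sample this small a t-based critical value would give a somewhat wider interval. The function name and the readings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    n = len(values)
    centre = mean(values)
    standard_error = stdev(values) / sqrt(n)
    # Critical value from the standard normal curve (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return centre - z * standard_error, centre + z * standard_error

# Hypothetical systolic blood pressure readings in mmHg (illustrative only).
sbp = [118, 120, 122, 124, 126]
low, high = mean_confidence_interval(sbp)

print(f"SD = {stdev(sbp):.2f} mmHg, 95% CI = ({low:.1f}, {high:.1f}) mmHg")
```

The SD describes the spread of the individual readings, while the two confidence limits bracket the plausible range for the true mean; the interval narrows as the sample size grows.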
P values
• The P (probability) value is used when we wish to see how compatible our
data are with a hypothesis. The hypothesis, known as the “null hypothesis”, is
usually that there is no difference between two treatments. The P value is the
probability of seeing a difference at least as large as the one observed if the
null hypothesis were true.
• The lower the P value, the less likely it is that the difference happened
by chance alone, and so the higher the significance of the finding.
P Value
• In most cases, an event might be considered to be “unlikely” to
occur if the chance of occurrence is less than 0.05 (or 1 in 20).
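A simple worked case, sketched in Python, shows how a P value is compared with the 0.05 threshold. The scenario is hypothetical: a coin is tossed 10 times and lands heads 9 times, and the null hypothesis is that the coin is fair. The function name is illustrative.

```python
from math import comb

def two_sided_binomial_p(heads, tosses):
    """Exact two-sided P value for a fair coin (null: P(heads) = 0.5)."""
    # Distance of the observed count from the count expected under the null.
    observed_gap = abs(heads - tosses / 2)
    # Count every outcome at least that far from the expected count.
    extreme = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - tosses / 2) >= observed_gap
    )
    return extreme / 2 ** tosses

p = two_sided_binomial_p(9, 10)
print(f"P = {p:.4f}; unlikely under the null (P < 0.05)? {p < 0.05}")
```

Here the outcomes at least as extreme as 9 heads are 0, 1, 9 or 10 heads, which together account for 22 of the 1024 equally likely sequences, so the result would be considered “unlikely” under the null hypothesis.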
Test statistics
[Diagram: Test statistics divide into Parametric and Non-parametric]
Parametric & non parametric
• Parametric: Student's independent t test. Non-parametric counterpart: Mann-Whitney U test.
• Analysis of variance (ANOVA) is a group of statistical techniques used to compare the means of
two or more samples to see whether they come from the same population
• The chi-squared test is used when the researcher wants to test for a difference between observed
and expected frequencies, for example to see whether two categorical variables are independent
• Correlation analysis is used when the researcher wants to know if there is a linear relationship
between two variables that are not necessarily dependent on one another
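As an illustration of one of these tests, the Python sketch below computes the chi-squared statistic for a 2×2 table by comparing observed counts with the counts expected if the two variables were independent. The counts and the function name are hypothetical, no continuity correction is applied, and the P value formula used is valid only for 1 degree of freedom (a 2×2 table).

```python
from math import erfc, sqrt

def chi_squared_2x2(table):
    """Pearson chi-squared statistic and P value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows = exposed / unexposed, columns = diseased / healthy.
observed = [[20, 30], [30, 20]]
chi2, p = chi_squared_2x2(observed)

print(f"chi-squared = {chi2:.2f}, P = {p:.4f}")
```

In practice a statistical package such as those listed earlier would be used; the sketch only shows what the test compares.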
Thank you for your attention
Questions
Quick assessment
• Characterize the following variables and classify them as
qualitative/categorical or quantitative/numerical. If qualitative
/categorical, can the variable be ordered? If quantitative /numerical, is
the variable discrete or continuous? In each case define the values of
the variable: (1) race, (2) date of birth, (3) systolic blood pressure, (4)
intelligence quotient, (5) Apgar score, (6) white blood count, (7) weight,
and (8) quality of medical care.