EPI 1.05 Basic Biostatistics
Content Outline:
I. Introduction to Biostatistics
II. Descriptive Statistics
A. Tables
B. Graphs
C. Summarizing Figures
i. Qualitative
ii. Quantitative
III. Inferential Statistics
IV. Hypothesis Testing
OBJECTIVES
To differentiate descriptive and inferential statistics
To know the concept of population and sample
To distinguish a parameter from a statistic
To know the different ways of describing data
To determine the appropriate descriptive measure for summarizing data
To compute and interpret a summary measure
To differentiate estimation and hypothesis testing
BIOSTATISTICS
Definition of Biostatistics
Application of statistical methods to life sciences
o Collection
o Organization
o Analysis
o Interpretation
Applications: Information-based decision-making process
o Problem identification
o Needs assessment
o Resource allocation
o Program evaluation
I. Descriptive Statistics
Method to summarize and present data in a form that’s easier to analyze and interpret
e.g. The Philippine Atmospheric, Geophysical and Astronomical Services Administration (PAGASA) measures the daily amount of rainfall in millimeters. They compute the average daily amount of rainfall every month for the past year.
Inferential Statistics
Method to make generalizations and conclusions about a target population based on results from a sample
e.g. Given the data on the daily amount of rainfall for the last ten years, PAGASA declares that the average amount of rainfall for August the next year is between 23 and 25 millimeters.
Examples:
o To determine Peak Expiratory Flow Rates (PEFR) of children residing in barangays with high and low pollution load
o To compare the mean PEFR of children between 2 barangays
o To compare the proportion of those with PEFR values less than predicted amongst those exposed to high vs. low pollution loads
o To determine if there is an association between PEFR and pollution load
o To determine if pollution load (µg/NCM TSP) is a predictor of lung function (PEFR)
Descriptive Analysis
Tables
Graphs
Summarizing Figures
o Qualitative Measures
- Frequency
- Location
o Quantitative Measures
- Central Tendency
- Dispersion
Inferential Analysis
Estimation
- Point Estimate
- Interval Estimate
Hypothesis Testing
- Mean
- Proportion
Population vs. Sample
Population includes all the elements from a set of data
Sample consists of one or more observations from the population
Best employed due to constraints (manpower, money, time) of conducting a study on an entire population
Used to make generalizations about the target population
Should be representative of the population to make a valid conclusion
Example Study: Prevalence of internet addiction among high school students in Quezon City
Population: All high school students in Quezon City
Sample: All high school students in 1 private and 1 public school in Quezon City
Parameter vs. Statistic
Parameter
o Numerical constant obtained by observing the total population
o (usually unknown)
o e.g. “A barangay has 5095 households and the average income per year is Php 222,000”
Statistic
o Numerical variable obtained by observing a random sample from the population
o e.g. “Ten households are selected for investigation and the average income per year of the households is Php 195,000”
ESTIMATION
Definition of Estimation
The process by which a statistic computed for a random sample is used to approximate the corresponding parameter
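The parameter/statistic distinction and the idea of estimation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the household incomes are simulated rather than taken from any real barangay, the population size of 5095 is borrowed from the example above, and the income distribution, the seed, and the names `population`, `parameter`, and `statistic` are hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible (arbitrary choice)

# Hypothetical population: yearly income (Php) of all 5095 households.
# The distribution below is invented purely for illustration.
population = [random.gauss(222_000, 40_000) for _ in range(5095)]

# Parameter: computed from the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Statistic: computed from a random sample of ten households.
sample = random.sample(population, 10)
statistic = statistics.mean(sample)  # point estimate of the parameter

print(f"Parameter (population mean): {parameter:,.0f}")
print(f"Statistic (sample mean):     {statistic:,.0f}")
```

The parameter stays fixed, while the statistic changes with every new random sample, which is why the sample value only approximates the population value.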
Interval Estimate
Consists of two numbers, a lower limit and an upper
limit, which serve as the bounding values within which the
parameter (true value) is expected to lie with a certain
degree of confidence
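A minimal Python sketch of an interval estimate for a population mean follows. It assumes the normal (z) approximation, which suits reasonably large samples; the PEFR readings and the helper name `mean_confidence_interval` are hypothetical and not part of the lecture.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(data, confidence=0.95):
    """Return (point_estimate, lower, upper) for the population mean.

    Uses the normal (z) approximation, an assumption that is reasonable
    for large samples; small samples would call for a t value instead.
    """
    n = len(data)
    point = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return point, point - z * se, point + z * se

# Hypothetical PEFR readings (L/min) from 40 children, invented for illustration.
pefr = [250 + (i * 7) % 60 for i in range(40)]
point, lower, upper = mean_confidence_interval(pefr)
print(f"Point estimate: {point:.1f}")
print(f"95% interval estimate: {lower:.1f} to {upper:.1f}")
```

Raising the degree of confidence (for example to 0.99) widens the interval, since a wider range is needed to be more sure of capturing the true value.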
Example: Estimate the height of Dr. Zulueta
o 6’1” = Point estimate
o 5’11” to 6’2” = Interval estimate
Steps in Hypothesis Testing:
1. Stating the null hypothesis, H0, and the alternative hypothesis, H1
2. Stating the level of significance, α
3. Choosing the test statistic and determining its sampling distribution
4. Determining the critical region
5. Computing the TEST STATISTIC
6. Making the statistical decision (whether or not to reject the null hypothesis)
Appropriate Statistical Test Factors
Objectives of the study
Type of variable
Level of measurement
Number of samples
Whether samples are related or independent
Assumption on the distribution
Critical Region
SIZE of a critical region depends on the set level of significance α
LOCATION of the critical region depends on the nature of the alternative hypothesis (one-tailed or two-tailed)
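The six steps can be traced in a short Python sketch. It uses a one-sample z test as the example, which is a choice made here for illustration and not a test named in the lecture; it assumes the population standard deviation is known. The function name `z_test_mean`, the sample values, `mu0`, and `sigma` are all hypothetical.

```python
import math
from statistics import NormalDist, mean

def z_test_mean(sample, mu0, sigma, alpha=0.05, two_tailed=True):
    """One-sample z test following the six steps; sigma is assumed known.

    The one-tailed option only covers the upper tail (H1: mu > mu0).
    """
    # Step 1: H0: mu = mu0; H1: mu != mu0 (two-tailed) or mu > mu0 (one-tailed).
    # Step 2: the level of significance is alpha.
    # Step 3: the test statistic is z, which is standard normal under H0.
    # Step 4: critical region; its size is set by alpha, its location by H1.
    if two_tailed:
        critical = NormalDist().inv_cdf(1 - alpha / 2)
    else:
        critical = NormalDist().inv_cdf(1 - alpha)
    # Step 5: compute the test statistic.
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    # Step 6: statistical decision.
    in_critical_region = abs(z) > critical if two_tailed else z > critical
    return z, critical, in_critical_region

# Hypothetical PEFR values (L/min); mu0 and sigma are invented for illustration.
sample = [310, 295, 320, 305, 330, 315, 300, 325, 310, 318]
z, critical, reject = z_test_mean(sample, mu0=300, sigma=15)
print(f"z = {z:.2f}, critical value = {critical:.2f}, reject H0: {reject}")
```

Switching to `two_tailed=False` moves the whole critical region into one tail, so the critical value drops from about 1.96 to about 1.645 at α = 0.05.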
STATISTICAL DECISION
Definition of Statistical Decision
WHETHER OR NOT to reject the null hypothesis
1. FOR MANUAL COMPUTATIONS, if the computed value of the test statistic falls in the critical region, THE NULL HYPOTHESIS IS REJECTED
2. FOR COMPUTERIZED OUTPUT, if the p-value (usually presented together with the computed value of the test statistic) is ≤ the set α, then THE NULL HYPOTHESIS IS REJECTED
3. If the CONFIDENCE INTERVAL does not include the null value (e.g. 1 for a ratio measure), THE NULL HYPOTHESIS IS REJECTED
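Decision rules 2 and 3 reduce to simple comparisons, sketched below in Python. The function names and the example numbers are hypothetical; the null value of 1 follows the rule as stated above and would differ for other measures.

```python
def decide_from_p_value(p_value, alpha=0.05):
    """Rule 2: reject H0 when the p-value is <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

def decide_from_interval(lower, upper, null_value):
    """Rule 3: reject H0 when the confidence interval excludes the null value."""
    excludes_null = not (lower <= null_value <= upper)
    return "reject H0" if excludes_null else "do not reject H0"

# Hypothetical results, invented for illustration.
print(decide_from_p_value(0.03))          # p-value below alpha = 0.05
print(decide_from_interval(1.2, 2.5, 1))  # interval excludes the null value 1
print(decide_from_interval(0.8, 2.5, 1))  # interval includes the null value 1
```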
Drawing Conclusion
Statistical Decision: Reject the null hypothesis (H0)
o Conclusion: State the alternative hypothesis (H1)
Statistical Decision: Do not reject the null hypothesis (H0)
o Conclusion: There is no sufficient evidence to say (state the alternative hypothesis)
Note: We do not accept the null hypothesis
STATEMENT OF LEVEL OF SIGNIFICANCE
Definition of Statement of Level of Significance
When arbitrarily setting the level of significance at α, the probability of erroneously rejecting a true H0 is set at most equal to α.
E.g. If α = .05, the probability of rejecting a true hypothesis is at most only 5%
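The meaning of α can be checked with a small simulation, sketched here in Python. It repeatedly samples from a population where H0 is true and counts how often a two-tailed z test rejects it anyway. The function name `type_one_error_rate`, the sample size, the number of trials, and the seed are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean

def type_one_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Estimate how often a true H0 (mu = 0, sigma = 1) is rejected."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # H0 is true here
        z = mean(sample) / (1 / math.sqrt(n))
        if abs(z) > critical:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"Observed rejection rate under a true H0: {rate:.3f}")
```

The observed rate should land close to the chosen α, which is the sense in which α caps the chance of erroneously rejecting a true null hypothesis.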
WAYS TO DESCRIBE OR INTERPRET DATA
Textual
When there are 3 or fewer numbers or data to be interpreted (APA)
Data is simply narrated
Used if the data can easily be explained using sentences or paragraphs.
Involves enumerating important characteristics, emphasizing significant figures, and identifying important features of data
Tabular
When there are 4-20 numbers or data to be interpreted (APA)
Tables provide a compact way of presenting large sets of detailed information
The formatting of a table may allow comparisons with ease
Provides interrelationships among the different variables being presented
Can have frequency counts, proportions, percentages, totals and averages
Characteristics of a good table:
o Simple – devoid of unnecessary markings
o Direct – contains only what is necessary
o Clear – must agree with the textual description
At a single glance, you can immediately see the data being described, compared to a long text description.
Graphical
When there are more than 20 numbers or data to be interpreted (APA)
Portrays numerical figures that are found in a table in a pictorial or illustrational form
Allows comparison of the different series or groups
Gives only an approximation of the data – tables have the exact data!
Shows the different relationships between the given variables
Accurate, clear, simple, and professional looking
o Scales should have equal increments
o Avoid optical illusions
o Avoid broken lines and markers
Should emphasize only data – only DATA and not NON DATA
o Non data: the use of too many grid lines and frames (as examples)
Quantitative vs. Qualitative
QUANTITATIVE
o Variables can be measured and ordered according to quantity or amount
o The values can be expressed numerically
o Two subtypes:
Discrete: integers, whole numbers
Continuous: fractions, decimals
QUALITATIVE
o The categories are simply used as labels to distinguish one group from another and not for comparison
o Examples: Sex, religion, regions in the country, marital status
Measures of Frequency
1. COUNT
Absolute number of persons/elements with the characteristic
Ex. 100 females in the room
2. RATIO
Single number representing the relative size of two numbers; depicted as (a/b) × k, with k as the multiplier (multiples of 10)
Examples:
o Sex ratio: “1:3 male to female ratio” (“100 males to 300 females”)
o Teacher to student ratio
o Odds ratio – odds of exposure among cases / odds of exposure among controls
o Risk ratio
3. PROPORTION
Special type of ratio where the numerator is part of the denominator
Examples:
o 300 females, 100 males; proportion of females is ¾ (300 divided by 400)
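The ratio and proportion examples above can be reproduced with exact fractions in Python. The 300 and 100 counts come from the notes; the 2x2 table used for the odds ratio is hypothetical and invented only to show the calculation.

```python
from fractions import Fraction

females, males = 300, 100  # counts from the example above

# Ratio: relative size of two numbers; the numerator is not part of the denominator.
sex_ratio = Fraction(males, females)  # reduces to 1/3, i.e. 1:3 male to female

# Proportion: the numerator is part of the denominator.
proportion_female = Fraction(females, females + males)  # reduces to 3/4

# Odds ratio from a hypothetical 2x2 table (counts invented for illustration).
exposed_cases, unexposed_cases = 40, 60
exposed_controls, unexposed_controls = 20, 80
odds_cases = Fraction(exposed_cases, unexposed_cases)
odds_controls = Fraction(exposed_controls, unexposed_controls)
odds_ratio = odds_cases / odds_controls

print(f"Sex ratio (male:female) = {sex_ratio.numerator}:{sex_ratio.denominator}")
print(f"Proportion female = {proportion_female} = {float(proportion_female):.0%}")
print(f"Odds ratio = {float(odds_ratio):.2f}")
```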
4. RATE
Frequency of occurrence of events in a given interval
of time
Examples:
o Birth Rate/Live Births
o Incidence Rate
o Prevalence Rate
Measures of Location
PERCENTILE
o One of the 99 values of a variable, which divides the distribution into 100 equal parts
DECILE
o One of the 9 values of a variable, which divides the distribution into 10 equal parts
QUARTILE
o One of the 3 values of a variable, which divides the distribution into 4 equal parts
“50th percentile is 2nd quartile, median, 5th decile
25th percentile is 1st quartile, 2.5th decile
75th percentile is 3rd quartile, 7.5th decile”
Application of percentiles in the interpretation of a growth chart: the horizontal axis is age in months, the vertical axis is height and weight.
e.g. For a 12-month-old boy, the 95th percentile falls at 12.4 kg: 95% of 12-month-old boys weigh less than or equal to 12.4 kg, and 5% weigh more.
Used to compare individual values with a standard value/norm
Also used for measurement of ability and intelligence, and for normal lab value ranges
Measures of Central Tendency
Median
o Middlemost observation, if n is odd
o Mean of the two middlemost observations, if n is even
Mode
o Most frequently occurring value in a set of observations
Where:
Σ – sum
Xi – individual value
i – subscript from first to last observation
n – total no. of observations
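The measures of location and central tendency above can be computed with Python's standard `statistics` module, as in this sketch. It assumes the symbol legend (sum, Xi, i, n) refers to the arithmetic mean, that is, the sum of all Xi divided by n. The weights are hypothetical values invented for illustration, and `statistics.quantiles` uses its default interpolation method, which may differ slightly from a hand method taught in class.

```python
import statistics

# Hypothetical weights (kg) of nine 12-month-old boys, invented for illustration.
weights = [8.9, 9.4, 9.6, 9.6, 10.1, 10.3, 10.8, 11.2, 12.4]

n = len(weights)
mean = sum(weights) / n              # sum of Xi for i = 1..n, divided by n
median = statistics.median(weights)  # middlemost value because n is odd
mode = statistics.mode(weights)      # most frequently occurring value

# Quartiles: the 3 cut points that divide the distribution into 4 equal parts.
q1, q2, q3 = statistics.quantiles(weights, n=4)

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"quartiles: Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}")
```

The second quartile coincides with the median, matching the note that the 50th percentile, the 2nd quartile, and the median are the same value.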
Levels of Measurement
1. NOMINAL
Class to which an object/unit is assigned is numbered or named
2. ORDINAL
Classes are ordered or ranked
Two types:
o Qualitative – best described using proportions
Ex: Likert Scaled responses (Strongly Disagree, Disagree, Agree, Strongly Agree)
o Quantitative – best described using median
3. INTERVAL
Zero point is not fixed but is arbitrary
Zero does not mean absence of characteristic
4. RATIO
Height, weight, blood pressure
Nature of Distribution
Looks like a perfect bell-shaped curve
Measures of Dispersion
o VARIANCE (s²) – sum of the squared differences between each value and the mean, divided by n − 1 (n = number of observations)
o STANDARD DEVIATION – square root of the variance
o COEFFICIENT OF VARIATION
o INTERQUARTILE RANGE
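The dispersion measures named in these notes (variance, standard deviation, coefficient of variation, interquartile range, and the range mentioned in the summary) can be computed as in the Python sketch below. The blood pressure readings are hypothetical, and the coefficient of variation is expressed here as the standard deviation as a percentage of the mean, which is the usual definition but is not spelled out in the lecture.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
values = [110, 118, 120, 122, 125, 128, 130, 135]

n = len(values)
mean = sum(values) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in values) / (n - 1)
std_dev = math.sqrt(variance)            # square root of the variance
coeff_variation = std_dev / mean * 100   # SD as a percentage of the mean

q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                            # spread of the middle 50% of values
value_range = max(values) - min(values)  # highest minus lowest value

print(f"variance = {variance:.2f}, SD = {std_dev:.2f}, CV = {coeff_variation:.1f}%")
print(f"IQR = {iqr:.2f}, range = {value_range}")
```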
SUMMARY:
Proportions and percentages are used to summarize
nominal and ordinal data
Percentiles are useful to compare an individual
observation with a norm
Median is used for ordinal data or skewed numerical data
Range is used with numerical data when the purpose is to
emphasize extreme values
Standard deviation is used when the mean is used
Epidemiology and Research Methods I
Ma. Peñafrancia L. Adversario, M.D., FPPS, MSPH||February 2, 2017
LINE GRAPH
Time series
Shows trend or changes of data within a given
amount of time
REFERENCE
Dr. Adversario’s PPT
SCATTER PLOT
Quantitative
Shows the probable relationship between two quantitative
variables
It specifically shows correlation – gives information on
magnitude and direction of correlation.
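The magnitude and direction that a scatter plot shows visually can be summarized numerically with Pearson's correlation coefficient r. This is a minimal sketch; the coefficient is a common choice but is not named in the lecture, and the paired pollution and PEFR values are hypothetical data invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two quantitative variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, invented for illustration:
# pollution load (TSP) and mean PEFR (L/min) for six barangays.
pollution = [80, 100, 120, 150, 180, 200]
pefr = [330, 325, 310, 300, 280, 275]

r = pearson_r(pollution, pefr)
direction = "negative" if r < 0 else "positive"
print(f"r = {r:.2f} ({direction} correlation)")
```

The sign of r gives the direction of the relationship, and its absolute value (between 0 and 1) gives the magnitude.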