EPI 1.05 Basic Biostatistics

Basic Biostatistics

Epidemiology and Research Methods I


Ma. Peñafrancia L. Adversario, M.D., FPPS, MSPH||February 2, 2017

Content Outline:
I. Introduction to Biostatistics
II. Descriptive Statistics
A. Tables
B. Graphs
C. Summarizing Figures
i. Qualitative
ii. Quantitative
III. Inferential Statistics
IV. Hypothesis Testing

LEGEND: ‼ Remember | Presentation | Mentioned in the lecture | Book

OBJECTIVES
 To differentiate descriptive and inferential statistics
 To know the concept of population and sample
 To distinguish a parameter from a statistic
 To know the different ways of describing data
 To determine the appropriate descriptive measure for summarizing data
 To compute and interpret a summary measure
 To differentiate estimation and hypothesis testing

BIOSTATISTICS
Definition of Biostatistics
 Application of statistical methods to life sciences
o Collection
o Organization
o Analysis
o Interpretation
 Applications: Information-based decision-making process
o Problem identification
o Needs assessment
o Resource allocation
o Program evaluation

Descriptive Statistics
 Method to summarize and present data in a form that’s easier to analyze and interpret
 e.g. The Philippine Atmospheric, Geophysical and Astronomical Services Administration (PAGASA) measures the daily amount of rainfall in millimeters. They compute the average daily amount of rainfall every month for the past year.

Inferential Statistics
 Method to make generalizations and conclusions about a target population based on results from a sample
 e.g. Given the data on the daily amount of rainfall for the last ten years, PAGASA declares that the average amount of rainfall for August the next year is between 23 and 25 millimeters.
 Examples:
o To determine Peak Expiratory Flow Rates (PEFR) of children residing in barangays with high and low pollution load
o To compare the mean PEFR of children between 2 barangays
o To compare the proportion of those with PEFR values less than predicted amongst those exposed to high vs. low pollution loads
o To determine if there is an association between PEFR and pollution load
o To determine if pollution load (µg/NCM TSP) is a predictor of lung function (PEFR)

Descriptive Analysis
 Tables
 Graphs
 Summarizing Figures
 Qualitative Measures
- Frequency
- Location
 Quantitative Measures
- Central Tendency
- Dispersion

Inferential Analysis
 Estimation
- Point Estimate
- Interval Estimate
 Hypothesis Testing
- Mean
- Proportion

Population vs. Sample
 Population includes all the elements from a set of data
 Sample consists of one or more observations from the population
o Best employed due to constraints (manpower, money, time) of conducting a study on an entire population
o Used to make generalizations about the target population
o Should be representative of the population to make a valid conclusion

Example Study: Prevalence of internet addiction among high school students in Quezon City
 Population: All high school students in Quezon City
 Sample: All high school students in 1 private and 1 public school in Quezon City

Parameter vs. Statistic
 Parameter
o Numerical constant obtained by observing the total population
o (usually unknown)
o e.g. “A barangay has 5095 households and the average income per year is Php 222,000”
 Statistic
o Numerical variable obtained by observing a random sample from the population
o e.g. “Ten households are selected for investigation and the average income per year of the households is Php 195,000”

ESTIMATION
Definition of Estimation
 The process by which a statistic computed for a random sample is used to approximate the corresponding parameter
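The distinction between a parameter and a statistic, and the idea of using one to estimate the other, can be sketched in a few lines of Python. This is only an illustration: the household incomes below are randomly generated (an assumption, not data from the lecture), and the names `population` and `sample` are hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: yearly income (Php) of 5095 households.
# The incomes are simulated; they are NOT the lecture's data.
population = [random.gauss(222_000, 40_000) for _ in range(5095)]

# Parameter: a constant computed from every element of the population.
parameter = statistics.mean(population)

# Statistic: a value computed from a random sample of 10 households.
# It changes from sample to sample and is used to estimate the parameter.
sample = random.sample(population, 10)
statistic = statistics.mean(sample)

print(f"Parameter (population mean): Php {parameter:,.0f}")
print(f"Statistic (sample mean):     Php {statistic:,.0f}")
```

Re-running with a different seed changes the statistic noticeably but barely moves the parameter, which is why the statistic is described as a variable and the parameter as a constant.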

| Sunico | Tan, J. | Tan, B. | Tan, R. | Telan |


Point Estimate
 Single numerical value used to approximate the population
parameter

Interval Estimate
 Consists of two numbers, a lower limit and an upper
limit, which serve as the bounding values within which the
parameter (true value) is expected to lie with a certain
degree of confidence
 Example: Estimate the height of Dr. Zulueta
o 6’1” = Point estimate
o 5’11” to 6’2” = Interval estimate

 Steps in Hypothesis Testing:
1. Stating the null hypothesis, H0, and the alternative hypothesis, H1
2. Stating the level of significance, α
3. Choosing the test statistic and determining its sampling distribution
4. Determining the critical region
5. Computing the TEST STATISTIC
6. Making the statistical decision (whether or not to reject the null hypothesis)

Appropriate Statistical Test Factors
 Objectives of the study
 Type of variable
 Level of measurement
 Number of samples
 Whether samples are related or independent
 Assumption on the distribution

Critical Region
 SIZE of a critical region depends on the set level of significance α
 LOCATION of the critical region depends on the nature of the alternative hypothesis (one-tailed or two-tailed)
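How α sets the size of the critical region and how the alternative hypothesis sets its location can be illustrated with a short sketch. It assumes the test statistic follows a standard normal distribution (a z-test), which the notes do not specify; the helper name `critical_region` is hypothetical.

```python
from statistics import NormalDist

def critical_region(alpha, tail="two-sided"):
    """Return the z cut-off(s) that bound the rejection region.

    Assumes a standard normal test statistic (z-test).
    tail: "two-sided", "right" (H1: greater) or "left" (H1: less).
    """
    z = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        cut = z.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
        return (-cut, cut)
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)  # all of alpha in the upper tail
    if tail == "left":
        return (z.inv_cdf(alpha),)      # all of alpha in the lower tail
    raise ValueError("tail must be 'two-sided', 'right' or 'left'")

# Size: a smaller alpha pushes the cut-offs outward (smaller region).
print(critical_region(0.05))           # about (-1.96, 1.96)
print(critical_region(0.01))           # about (-2.58, 2.58)
# Location: a one-tailed H1 places the whole region in one tail.
print(critical_region(0.05, "right"))  # about (1.64,)
```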

STATISTICAL DECISION
Definition of Statistical Decision
 WHETHER OR NOT to reject the null hypothesis
1. For manual computations, if the computed value of the test statistic falls in the critical region, THE NULL HYPOTHESIS IS REJECTED
2. For computerized output, if the p-value (usually presented together with the computed value of the test statistic) is ≤ the set α, then THE NULL HYPOTHESIS IS REJECTED
3. If the CONFIDENCE INTERVAL does not include the null value (e.g. 1 for a ratio measure such as an odds ratio, or 0 for a difference), THE NULL HYPOTHESIS IS REJECTED
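The three decision rules can be checked against each other with a small one-sample z-test. This is a sketch under stated assumptions: every number below (hypothesized mean, known standard deviation, sample size, sample mean) is hypothetical, and the null value for the confidence-interval rule is the hypothesized mean rather than 1 because the quantity tested is a mean, not a ratio.

```python
from statistics import NormalDist

# Hypothetical inputs (not from the lecture). H0: mu = 300, H1: mu != 300
mu0 = 300.0          # null value for the mean
sigma = 50.0         # population standard deviation, assumed known
n = 100              # sample size
sample_mean = 312.0  # observed sample mean
alpha = 0.05         # level of significance

std_normal = NormalDist()
standard_error = sigma / n ** 0.5
z = (sample_mean - mu0) / standard_error  # computed test statistic

# Rule 1: the test statistic falls in the (two-tailed) critical region.
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_by_region = abs(z) >= z_crit

# Rule 2: the p-value is less than or equal to alpha.
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_by_p = p_value <= alpha

# Rule 3: the (1 - alpha) confidence interval excludes the null value.
ci = (sample_mean - z_crit * standard_error,
      sample_mean + z_crit * standard_error)
reject_by_ci = not (ci[0] <= mu0 <= ci[1])

print(f"z = {z:.2f}, p = {p_value:.4f}, CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print(reject_by_region, reject_by_p, reject_by_ci)
```

For a two-tailed z-test at the same α, the three rules always agree, so any one of them is enough to state the decision.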

Statistical Decision and Corresponding Conclusion


Null Hypothesis Rejected
 Results statistically significant; then STATE
ALTERNATIVE HYPOTHESIS (H1)

Null Hypothesis NOT Rejected


 Results NOT statistically significant; then “There is no
sufficient evidence to say (STATE ALTERNATIVE
HYPOTHESIS)”
 Note: We generally don’t accept the null hypothesis

Drawing Conclusion
Statistical Decision                      Conclusion
Reject the null hypothesis (H0)           State the alternative hypothesis (H1)
Do not reject the null hypothesis (H0)    There is no sufficient evidence to say (state the alternative hypothesis)
Note: We do not accept the null hypothesis

STATEMENT OF LEVEL OF SIGNIFICANCE
Definition of Statement of Level of Significance
When arbitrarily setting the level of significance at α, the probability of erroneously rejecting a true H0 is set to be at most equal to α.
 E.g. If α = .05, the probability of rejecting a true null hypothesis is at most only 5%
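The meaning of α as the probability of rejecting a true H0 can be illustrated by simulation. The sketch below repeatedly samples from a population where H0 is true and counts how often a two-tailed z-test rejects it. All settings (normal population, known σ, n = 30, 4000 trials, the function name `type_one_error_rate`) are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, n=30, trials=4000, seed=1):
    """Estimate how often a true H0 is rejected by a two-tailed z-test."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0  # H0 is TRUE: samples really come from mean mu0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        if abs(z) >= z_crit:  # test statistic falls in the critical region
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"Proportion of true H0 rejected: {rate:.3f} (alpha = 0.05)")
```

The simulated proportion should land close to α, with small run-to-run differences caused by the finite number of trials.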

WAYS TO DESCRIBE OR INTERPRET DATA
Textual
 When there are 3 or fewer numbers or data to be interpreted (APA)
 Data is simply narrated
 Used if the data can easily be explained using sentences or paragraphs.
 Involves enumerating important characteristics, emphasizing significant figures, and identifying important features of data

Tabular
 When there are 4-20 numbers or data to be interpreted (APA)
 Tables provide a compact way of presenting large sets of detailed information
 The formatting of a table may allow comparisons with ease
 Provides interrelationships among the different variables being presented
 Can have frequency counts, proportions, percentages, totals and averages
 Characteristics of a good table:
o Simple – devoid of unnecessary markings
o Direct – contains only what is necessary
o Clear – must agree with the textual description
 At a single glance, you can already see the data you are referring to, compared to a long text description.

Graphical
 When there are more than 20 numbers or data to be interpreted (APA)
 Portrays numerical figures that are found in a table in a pictorial or illustrational form
 Allows comparison of the different series or groups
 Gives only an approximation of the data – tables have the exact data!
 Shows the different relationships between the given variables
 Should be accurate, clear, simple, and professional looking
o Scales should have equal increments
o Avoid optical illusions
o Avoid broken lines and markers
 Should emphasize only DATA and not NON DATA
o Non data: the use of too many grid lines and frames (as examples)

Quantitative vs. Qualitative
 QUANTITATIVE
o Variables can be measured and ordered according to quantity or amount
o The values can be expressed numerically
o Two subtypes:
 Discrete: integers, whole numbers
 Continuous: fractions, decimals
 QUALITATIVE
o The categories are simply used as labels to distinguish one group from another and not for comparison
o Examples: Sex, religion, regions in the country, marital status

Measures of Frequency
1. COUNT
 Absolute number of persons/elements with the characteristic
 Ex. 100 females in the room
2. RATIO
 Single number representing the relative size of two numbers; depicted as (a/b) × k, with k as the multiplier (multiples of 10)
 Examples:
o Sex ratio: “1:3 male to female ratio”, i.e. “100 males to 300 females”
o Teacher to student ratio
o Odds ratio – odds of exposure among cases / odds of exposure among controls
o Risk ratio
3. PROPORTION
 Special type of ratio where the numerator is part of the denominator
 Examples:
o 300 females, 100 males; proportion of females is ¾ (300 divided by 400)

4. RATE
 Frequency of occurrence of events in a given interval of time
 Examples:
o Birth Rate/Live Births
o Incidence Rate
o Prevalence Rate

Measures of Location
 PERCENTILE
o One of the 99 values of a variable, which divide the distribution into 100 equal parts
 DECILE
o One of the 9 values of a variable, which divide the distribution into 10 equal parts
 QUARTILE
o One of the 3 values of a variable, which divide the distribution into 4 equal parts
 Equivalent positions:
o 50th percentile = 2nd quartile = median = 5th decile
o 25th percentile = 1st quartile = 2.5th decile
o 75th percentile = 3rd quartile = 7.5th decile
 Application of percentiles in the interpretation of a growth chart: the horizontal axis is age in months; the vertical axis is height and weight.
 For a 12-month-old boy, the 95th percentile falls at 12.4 kg: 95% of 12-month-old boys weigh less than or equal to 12.4 kg, and 5% weigh more.
 Used to compare individual values with a standard value/norm
 Also used for measurement of ability and intelligence, and for normal lab value ranges

Measures of Central Tendency
 MEAN – average; “arithmetic mean”
 MEDIAN – middlemost observation
 MODE – most frequently occurring value

Measures of Central Tendency for Ungrouped Data
Mean
 Sum of observations / total no. of observations
 $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Where:
Σ – sum
Xi – individual value
i – subscript from first to last observation
n – total no. of observations
Median
 Middlemost observation, if n is odd
 Mean of the two middlemost observations, if n is even
Mode
 Most frequently occurring value in a set of observations
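The mean, median and mode of ungrouped data can be computed with Python's standard `statistics` module. The weights below are hypothetical values chosen for easy arithmetic; they are not the patient data discussed in the lecture.

```python
import statistics

# Hypothetical weights in kg (illustrative only), n = 7 (odd)
weights = [11.0, 12.5, 12.5, 13.0, 14.5, 15.0, 19.5]

mean = statistics.mean(weights)        # sum of observations / n
median = statistics.median(weights)    # middlemost observation (n is odd)
modes = statistics.multimode(weights)  # most frequent value(s), as a list

# The 2nd quartile (50th percentile) coincides with the median.
q1, q2, q3 = statistics.quantiles(weights, n=4, method="inclusive")

# With an even n, the median is the mean of the two middlemost values.
median_even = statistics.median(weights + [20.0])

print(f"mean = {mean}, median = {median}, mode(s) = {modes}")
print(f"Q2 = {q2}, median for n = 8: {median_even}")
```

`multimode` is used instead of `mode` because a data set can have more than one mode; it returns every value tied for the highest frequency.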


Interpretation:
 Mean – average weight of patients is 14.4 kg
 Median – half of the patients weighed less than 15.85 kg while the other half weighed more than or equal to 15.85 kg
 Mode – usual weights of patients are 12.6 and 16 kg

Choice of the Measure of Central Tendency
 SCALE OF MEASUREMENT
 NATURE OF THE DISTRIBUTION

Type of Variable and Scale of Measurement
1. NOMINAL
 Class to which object/unit is assigned is numbered or named
2. ORDINAL
 Classes are ordered or ranked
 Two types:
o Qualitative – best described using proportions
 Ex: Likert-scaled responses (Strongly Disagree, Disagree, Agree, Strongly Agree)
o Quantitative – best described using median
3. INTERVAL
 Zero point is not fixed but is arbitrary
 Zero does not mean absence of characteristic
4. RATIO
 Height, weight, blood pressure

Nature of Distribution
 A symmetric (normal) distribution looks like a perfect bell-shaped curve
 Median is not affected by extreme values. The mean cannot be used to summarize data if the distribution is skewed.
 Ex: Income of individuals in developing countries (as income increases, the number of individuals at that income level gets lower)
 Ex: Incidence of diabetes by age: as individuals get older, the incidence of diabetes increases

Measures of Dispersion
 “Average is not adequate to describe a set of observations because it does not give information on how the values tend to clump together or spread apart”
o RANGE – extremes; difference between highest and lowest values
o VARIANCE – average of squared deviations of individual observations from the mean; the mean is the reference point, and the deviation is how far an observation is from it
- s² is the variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, where $\bar{X}$ is the mean
- sum of the squared deviations from the mean, divided by n − 1 (number of observations minus one)
o STANDARD DEVIATION – square root of the variance
o COEFFICIENT OF VARIATION
o INTERQUARTILE RANGE
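The measures of dispersion named in these notes can be computed as follows. The data are hypothetical weights in kg. The notes only name the coefficient of variation and the interquartile range, so the definitions used here (standard deviation as a percentage of the mean, and Q3 minus Q1) are the usual textbook ones, stated as an assumption.

```python
import statistics

# Hypothetical weights in kg (illustrative only)
weights = [11.0, 12.5, 12.5, 13.0, 14.5, 15.0, 19.5]
n = len(weights)
mean = statistics.mean(weights)

# Range: difference between the highest and lowest values
data_range = max(weights) - min(weights)

# Variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in weights) / (n - 1)

# Standard deviation: square root of the variance
std_dev = variance ** 0.5

# Coefficient of variation (assumed definition): SD as a percent of the mean
coeff_variation = std_dev / mean * 100

# Interquartile range (assumed definition): Q3 - Q1
q1, _, q3 = statistics.quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(f"range = {data_range}, variance = {variance:.2f}, SD = {std_dev:.2f}")
print(f"CV = {coeff_variation:.1f}%, IQR = {iqr}")
```

The hand-written variance matches `statistics.variance(weights)`, which also divides by n − 1.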


SUMMARY:
 Proportions and percentages are used to summarize nominal and ordinal data
 Percentiles are useful to compare an individual observation with a norm
 Median is used for ordinal data or skewed numerical data
 Range is used with numerical data when the purpose is to emphasize extreme values
 Standard deviation is used when the mean is used

Choosing the appropriate graph

HISTOGRAM or FREQUENCY POLYGON
 Quantitative, continuous data
 It is a graphic representation of the distribution of a continuous variable
 This continuous variable is placed on the horizontal axis
 Frequencies are placed on the vertical axis
 Advantage: can depict two or more distributions in a single graph

BAR (horizontal or vertical)
 Qualitative, quantitative discrete
 It is used to compare absolute or relative values/counts between different categories of a discrete quantitative (for vertical bar graphs) or qualitative (for horizontal bar graphs) variable.
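Before drawing a bar graph or a histogram, the data are tabulated into frequencies: one count per category for a qualitative variable, and one count per class interval for a continuous variable. The sketch below shows that tabulation with hypothetical data; the class width of 2 kg and the starting point of 10 kg are arbitrary choices made for illustration.

```python
from collections import Counter

# Hypothetical qualitative data: one bar per category in a bar graph
marital_status = ["single", "married", "single",
                  "widowed", "married", "single"]
counts = Counter(marital_status)                   # absolute counts
total = sum(counts.values())
relative = {c: k / total for c, k in counts.items()}  # relative values

# Hypothetical continuous data: one bar per class interval in a histogram
weights_kg = [11.0, 12.5, 12.5, 13.0, 14.5, 15.0, 19.5]
lower, width = 10.0, 2.0  # equal class widths give equal scale increments
bins = Counter()
for w in weights_kg:
    start = lower + width * int((w - lower) // width)
    bins[(start, start + width)] += 1  # interval is [start, start + width)

for category, k in counts.items():
    print(f"{category:8s} | {'#' * k} ({relative[category]:.0%})")
for (lo, hi), freq in sorted(bins.items()):
    print(f"{lo:4.1f}-{hi:4.1f} kg | {'#' * freq}")
```

The text bars are only a stand-in for a plotted graph; the counts and intervals are what a plotting tool would take as input.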


PIE CHART
 Qualitative
 Used when you need a breakdown of a group where the
number of categories used is not too many (at most five)
 Sections of pie graph are usually in sequential order –
from the largest proportion to the smallest.

LINE GRAPH
 Time series
 Shows trend or changes of data within a given
amount of time

SCATTER PLOT
 Quantitative
 Shows the probable relationship between two quantitative
variables
 It specifically shows correlation – gives information on
magnitude and direction of correlation.
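The magnitude and direction that a scatter plot shows visually are commonly summarized by Pearson's correlation coefficient r. The notes do not name a specific coefficient, so using Pearson's r is an assumption, and the pollution load and PEFR values below are invented for illustration.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: pollution load vs. peak expiratory flow rate (PEFR)
pollution_load = [50, 100, 150, 200, 250]
pefr = [320, 310, 290, 285, 260]

r = pearson_r(pollution_load, pefr)
# Sign gives the direction; absolute value (0 to 1) gives the magnitude.
direction = "negative" if r < 0 else "positive"
print(f"r = {r:.2f} ({direction} correlation)")
```

In this invented data set PEFR falls as pollution load rises, so r is negative and close to −1, which corresponds to points lying near a downward-sloping line on the scatter plot.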
