Introduction to Biostatistics

By: Amare M (MPH/Biostatistics)

• Appreciate basic statistical concepts

• Identify classification of statistics

• Understand applications and limitations of statistics

• Understand types of variables

Introduction
• While Biostatistics can be broadly defined as
the application of statistics to problems in
biology and medicine, understanding its
intricate facets requires a closer look at specific
definitions and nuances:

Core definition:
• The application of statistical theory and methods to
the collection, analysis, interpretation, and
presentation of data in biology and health-related
fields.
• It encompasses design of experiments, analysis of
observational data, development of statistical
models, and drawing scientific conclusions based on
data.

Key components
• Statistical Theory: Biostatistics utilizes the foundational principles of
probability and statistical inference to draw meaning from
data. This includes techniques like hypothesis testing, regression
analysis, and statistical modeling.
• Data Collection and Analysis: Designing studies to collect accurate
and relevant data, implementing efficient analysis methods, and
ensuring data quality are core parts of Biostatistical practice.
• Interpretation and Presentation: Converting complex statistical
results into understandable and meaningful conclusions for
scientists, policymakers, and the public is crucial. Effective
communication of findings is essential for their impact.

Subfields and applications
• Clinical Research: Designing and analyzing clinical trials for new
drugs and treatments, assessing safety and efficacy, and meta-
analyzing data across studies.
• Epidemiology: Investigating patterns of disease occurrence and
identifying risk factors, studying disease outbreaks, and informing
public health interventions.
• Public Health: Analyzing healthcare utilization data, evaluating
healthcare policies, and optimizing resource allocation for disease
prevention and screening programs.
• Genetics and Genomics: Identifying genetic variants associated with
diseases, developing personalized medicine approaches based on
individual genetic profiles, and analyzing large-scale genomic data.

Beyond Medicine:

• Applications extend to conservation biology, agriculture, ecology, and other fields using
statistical methods to analyze biological data and
inform decision-making.

Nuances and Limitations
• Data Quality and Bias: Biostatistics relies heavily on the quality of
the data collected and is vulnerable to biases in study design or
analysis. Addressing these limitations is crucial for drawing reliable
conclusions.
• Statistical Interpretation: P-values and statistical significance alone
don't guarantee practical or clinical relevance of
findings. Contextual understanding and consideration of effect size
are essential.
• Biological Complexity: Statistically simplifying complex biological
phenomena can lead to overlooking important interactions or non-
linear relationships. Biostatistics needs to embrace the inherent
complexity of living systems.
• Ethical Considerations: Privacy protection, data security, and
responsible use of statistics are crucial areas of concern in
Biostatistical practice.
Evolving landscape
• Biostatistics is a continuously evolving field adapting to new
technologies, data sources, and analytical methods.
Advanced areas like machine learning, Bayesian statistics,
and network analysis are increasingly integrated into
Biostatistical research.
• In conclusion, understanding biostatistics goes beyond a
simple textbook definition. It involves appreciating its
diverse applications, limitations, and nuances. By critically
evaluating its tools and acknowledging its complexities, we
can harness the power of biostatistics to advance scientific
knowledge and promote health for all.

Why Biostatistics Matters: Unveiling the Rationales
• Biostatistics isn't just a fancy way to crunch numbers in biology.
• It's a vital tool with compelling rationales driving its importance in
various fields:
1. Quantifying Uncertainty in the Messy World of Biology:
• Biological phenomena are inherently complex and subject to
variability.
• Biostatistics provides a framework to quantify this uncertainty and
draw reliable conclusions from data.
• It allows us to move beyond anecdotal evidence and intuition to
make informed decisions based on data-driven insights.

2. Designing Meaningful Research:

• From clinical trials to epidemiological studies, biostatistics is crucial for designing sound
research protocols.
• It helps determine appropriate sample
sizes, randomization procedures, and data collection
methods to ensure the study yields valid and
generalizable results.
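
As a rough illustration of the sample-size step, the sketch below applies the standard normal-approximation formula for comparing two group means: n per group = 2 (z_{1-α/2} + z_{1-β})² σ² / δ². The function name and the planning values (standard deviation 10, detectable difference 5) are assumptions chosen for illustration and do not come from these slides.

```python
import math
from statistics import NormalDist


def sample_size_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.

    Uses the normal approximation; sigma is the assumed common standard
    deviation and delta is the smallest difference worth detecting.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole participant


# Hypothetical planning values: SD of 10 units, difference of 5 units.
n_per_group = sample_size_two_means(sigma=10, delta=5)
print(n_per_group)
```

Halving the detectable difference roughly quadruples the required sample size, which is why the target effect size should be fixed before data collection begins.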

3. Making Sense of Data Overload:
• The scientific landscape is flooded with data, both large and
small.
• Biostatistics provides powerful tools to analyze these
datasets, identify patterns, and extract meaningful
insights.
• This empowers researchers to uncover hidden
trends, predict future outcomes, and test hypotheses about
biological processes.

4. Guiding Clinical Decisions and Public Health Strategies:
• Biostatistics informs evidence-based medical practice.
• By analyzing clinical trial data, it helps determine the
efficacy and safety of new treatments, leading to
improved patient care.
• In public health, biostatistics plays a crucial role in
tracking disease outbreaks, identifying risk factors, and
developing effective prevention programs.

5. Personalized Medicine Revolution:
• Biostatistics is fueling the personalized
medicine revolution by analyzing individual
genetic and other biological data to predict
disease risk and tailor treatment plans. This
shift towards a more precise and patient-
centered approach to healthcare relies heavily
on Biostatistical techniques.

6. Beyond Medicine: Broad Applications:
• The reach of biostatistics extends beyond healthcare.
• It is used in conservation biology to model population
dynamics and guide conservation efforts.
• In agriculture, it optimizes crop yields and disease
control strategies.
• Even environmental studies utilize biostatistics to
analyze pollution levels and assess ecological impacts.

7. Bridging the Gap Between Data and Action:

• Biostatistics provides a crucial bridge between raw data
and actionable insights.
• By turning numbers into knowledge, it empowers
researchers, policymakers, and healthcare
professionals to make informed decisions that improve
human health and well-being.

8. Continuous Innovation and Refinement:

• Biostatistics is a dynamic field that constantly evolves
with new technologies and analytical methods.
• As research expands and data become more
complex, biostatisticians develop new tools and refine
existing methods to better understand and interpret
biological phenomena.

• In conclusion, the rationales for biostatistics go far
beyond mere number crunching.
• It's a powerful tool that drives scientific discovery,
informs healthcare decisions, and shapes public health
policy.
• As we navigate the ever-evolving world of Biology,
Biostatistics provides us with a lens of logic and insight,
guiding us towards a future of evidence-based and
data-driven solutions.
Basic statistical concepts
• Statistics might seem daunting, but the core
concepts are surprisingly simple.

Key terms
• Population: The entire group you're interested in
studying (e.g., all patients in a hospital).
• Sample: A smaller subset of the population used to
draw conclusions about the whole (e.g., 100 patients
chosen from the hospital).
• Parameter: A characteristic of the entire population
(e.g., average height of all patients).
• Statistic: A characteristic of the sample used to
estimate the corresponding parameter (e.g., average
height of the 100 patients).
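
The four terms above can be made concrete with a small simulation. This is a minimal sketch: the population of 5,000 heights is simulated with assumed values (mean 165 cm, standard deviation 10 cm) rather than taken from real patients, and the variable names are illustrative.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: heights (cm) of 5,000 patients in a hospital.
population = [random.gauss(165, 10) for _ in range(5000)]

# Simple random sample of 100 patients, drawn without replacement.
sample = random.sample(population, 100)

parameter = mean(population)  # population mean: usually unknown in practice
statistic = mean(sample)      # sample mean: our estimate of the parameter

print(f"Parameter (population mean): {parameter:.1f} cm")
print(f"Statistic (sample mean):     {statistic:.1f} cm")
```

The statistic lands close to the parameter but is not identical to it; a different random sample would give a slightly different estimate.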

• Census: A complete enumeration of all members of the population to
obtain the required information. It is usually conducted and
compiled nationally.
• Survey: Gathering data from a part of the population.
• Data: The quantities (numbers) or qualities (attributes) measured
or observed that are to be collected and/or analyzed. Data are
pieces of information about individuals, organized into variables.
• Variable: A characteristic that takes different values in different
persons, places, or things.

Descriptive Statistics
• Measures of Central Tendency: Tell you where the
data tends to cluster.
o Mean: Average of all values.
o Median: The middle value when data is ordered.
o Mode: The value that appears most frequently.
• Measures of Dispersion: Tell you how spread out the
data is.
o Variance: Average squared distance from the
mean.
o Standard Deviation: Square root of the
variance; easier to interpret because it is in the
same units as the data.
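
The measures above can be computed with Python's standard `statistics` module. The data set below is hypothetical and chosen so the results are round numbers. Note the distinction the code makes between the population formulas (divide by n, matching the "average squared distance" definition) and the sample formulas (divide by n - 1), which are the usual choice when the data are a sample.

```python
from statistics import mean, median, mode, pvariance, pstdev, variance, stdev

# Hypothetical data: number of clinic visits for eight patients.
visits = [2, 4, 4, 4, 5, 5, 7, 9]

print("Mean:  ", mean(visits))      # 5
print("Median:", median(visits))    # 4.5
print("Mode:  ", mode(visits))      # 4

# Population formulas divide by n ...
print("Population variance:", pvariance(visits))  # 4
print("Population SD:      ", pstdev(visits))     # 2.0

# ... sample formulas divide by n - 1, so they come out slightly larger.
print("Sample variance:", variance(visits))
print("Sample SD:      ", stdev(visits))
```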

Inferential Statistics

• Making inferences about the population based on a sample.
• Hypothesis Testing: Formulating a null hypothesis (e.g., no
difference between groups) and assessing whether the sample data
provide enough evidence to reject it.
• Confidence Intervals: Estimating a range within
which the true population parameter likely lies.
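
Both ideas can be sketched with the standard library alone. The example assumes a large-sample normal approximation (a z-interval and a two-sample z-test); with samples as small as the hypothetical ones below, a t-distribution would normally be preferred, but it is not available in the standard library. The function names and blood pressure readings are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
    m = mean(sample)
    return m - z * se, m + z * se


def two_sample_z_test(a, b):
    """Two-sided z-test of the null hypothesis that two group means are equal."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical systolic blood pressure readings (mmHg) in two groups.
treated = [118, 122, 125, 119, 121, 117, 124, 120, 123, 116]
control = [128, 131, 127, 133, 129, 126, 132, 130, 134, 125]

low, high = mean_confidence_interval(treated)
z, p = two_sample_z_test(treated, control)
print(f"95% CI for treated mean: ({low:.1f}, {high:.1f})")
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value indicates that a difference this large would be unlikely if the null hypothesis were true; it does not by itself say whether the difference is clinically important.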

Important Concepts:
• Correlation: How strongly two variables are related (correlation
does not imply causation).
• Probability Distributions: Describe the likelihood of different
outcomes in a process.
• Regression Analysis: Modeling the relationship between
variables.
• Random sampling is crucial for valid inferences.
• Statistics are estimates, not exact values.
• Context matters when interpreting data.
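
Correlation and regression from the list above can be written out directly from their definitions. The sketch below computes the Pearson correlation coefficient and a least-squares line for one predictor; the function names and the age and blood pressure values are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_linear_regression(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Hypothetical data: age (years) and systolic blood pressure (mmHg).
age = [25, 35, 45, 55, 65]
sbp = [115, 120, 128, 131, 140]

r = pearson_r(age, sbp)
slope, intercept = simple_linear_regression(age, sbp)
print(f"r = {r:.3f}")
print(f"SBP is modeled as {intercept:.1f} + {slope:.2f} * age")
```

The correlation measures the strength of the linear association, while the slope estimates how much the outcome changes per unit of the predictor. Neither establishes that age causes the change.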

Classification of statistics
• There are several ways to classify statistics, depending
on your perspective and purpose. Here are some
common classifications:
By Purpose:
• Descriptive Statistics: These summarize the
characteristics of a dataset, like measures of central
tendency (mean, median, mode) and dispersion
(variance, standard deviation).
• Inferential Statistics: These use sample data to draw
conclusions about the population from which it
came, like hypothesis testing and confidence intervals.

By Analysis Method:
• Parametric Statistics: These assume data follows a
specific known distribution (e.g., normal). Examples
include t-tests and ANOVA.
• Non-Parametric Statistics: These do not assume that the data
follow a specific distribution. Examples
include the Mann-Whitney U test and the Kruskal-Wallis test.
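
To show why a non-parametric method needs no distributional assumption, the sketch below computes the Mann-Whitney U statistic by direct pair counting. It stops at the statistic and does not compute a p-value; the function name and pain scores are hypothetical.

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for group a versus group b.

    Counts, over all pairs (x from a, y from b), how often x exceeds y;
    ties count as one half. Only the ordering of values matters.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical pain scores (0-10) under two treatments.
drug = [2, 3, 3, 4, 5]
placebo = [4, 6, 6, 7, 8]

u_drug = mann_whitney_u(drug, placebo)
u_placebo = mann_whitney_u(placebo, drug)
print(u_drug, u_placebo)  # the two U values always sum to len(drug) * len(placebo)
```

Because the statistic depends only on which value in each pair is larger, it is unaffected by outliers or skew that would distort a t-test.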
By Focus:
• Univariate Statistics: These analyze a single variable.
• Multivariate Statistics: These analyze relationships
between multiple variables.

Other Classifications:
• Frequency Statistics: These deal with the
frequency of observations within categories.
• Bayesian Statistics: These incorporate prior
knowledge into analysis to refine estimation.
• Machine Learning: This field employs statistical
methods to build models that can learn and make
predictions from data.
Remember: The choice of classification depends on
the specific context and question you're trying to
answer.
Applications of statistics
• Biostatistics plays a crucial role in various aspects of biology
and medicine, helping us make sense of data and improve
health outcomes. Some key applications are described below:

1. Clinical Research:

• Designing Clinical Trials: Biostatisticians help ensure trials are
properly designed, with adequate sample
size, randomization, and blinding to minimize bias. They help
define primary and secondary endpoints to be measured and
analyze the collected data to assess the efficacy and safety of
new drugs, treatments, or interventions.

• Meta-Analysis: Combining data from multiple
studies of similar design using statistical
methods can provide stronger evidence than
any individual study. Biostatisticians ensure
proper data pooling and adjustment for
differences across studies.
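
One common way to pool studies is inverse-variance weighting under a fixed-effect model, sketched below. The slides do not say which pooling method is intended, so this choice is an assumption, as are the function name and the three study results.

```python
import math


def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / v for v in variances]  # more precise studies get more weight
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se


# Hypothetical mean differences and variances from three studies.
effects = [-4.0, -6.0, -5.0]
variances = [4.0, 1.0, 2.0]

pooled, se = fixed_effect_pool(effects, variances)
print(f"Pooled effect: {pooled:.2f} (SE {se:.2f})")
```

The pooled standard error is smaller than that of any single study, which is the sense in which pooling strengthens the evidence. A random-effects model would be a more cautious choice when the studies differ substantially.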

2. Epidemiology:
• Identifying Risk Factors: Biostatisticians help identify
factors that increase the risk of developing
diseases, like specific genes, environmental
exposures, or lifestyle habits. They analyze large
datasets to find associations and estimate the strength
of these relationships.
• Outbreak Investigation: During disease
outbreaks, biostatisticians analyze surveillance data to
track transmission patterns, identify sources of
infection, and inform control measures. They use
modeling techniques to predict future trends and guide
public health decisions.

3. Public Health:
• Healthcare Policy and Planning: Biostatisticians provide
data-driven insights to inform healthcare policies. They
analyze healthcare utilization data to identify gaps in
access, assess the effectiveness of different
interventions, and allocate resources efficiently.
• Disease Prevention and Screening
Programs: Biostatisticians contribute to designing and
evaluating screening programs for early detection of
diseases like cancer or diabetes. They develop models
to assess the cost-effectiveness of these programs and
optimize their implementation.

4. Genetics and Genomics:
• Genome-Wide Association Studies
(GWAS): Biostatisticians analyze vast amounts of
genetic data to identify genetic variants associated
with diseases or traits. They use sophisticated
statistical methods to account for multiple
comparisons and identify true associations.
• Personalized Medicine: Biostatisticians develop
statistical models to predict individual risk of disease
based on genetic and other individual factors. This
information can guide personalized healthcare
decisions and treatment plans.
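
Accounting for multiple comparisons, mentioned under GWAS above, can be illustrated with the Bonferroni correction, the simplest such adjustment. This is only a sketch: real genomic analyses may use other procedures, and the function name and p-values here are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that remain significant after Bonferroni correction."""
    threshold = alpha / len(p_values)  # per-test significance level
    return [p < threshold for p in p_values]


# Hypothetical p-values for five genetic variants.
p_values = [0.0001, 0.004, 0.012, 0.03, 0.2]
print(bonferroni_significant(p_values))
```

With five tests the per-test threshold drops from 0.05 to 0.01, so variants with p-values of 0.012 and 0.03 are no longer declared significant.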

5. Beyond Medicine:
• Conservation Biology: Biostatisticians help assess
population sizes and trends of endangered species, analyze
the impact of environmental changes on ecosystems, and
inform conservation strategies.
• Agriculture: Biostatisticians optimize crop yields by
analyzing the effects of different fertilizers, pesticides, and
farming practices. They develop statistical models to
predict crop growth and disease outbreaks.
• These are just a few examples of the vast applications of
biostatistics. Its role continues to expand as new
technologies and data sources emerge, driving
advancements in healthcare, medicine, and various other
fields.

Limitations of statistics
• It deals with quantitative data only
• It deals with the mass, not an individual
• It is true on average only
• Its results are correct in a general sense (always
subject to a certain amount of error)
• It can be misused in many ways

Types of variables in Biostatistics
• Understanding the type of variable you're dealing with is crucial for
choosing appropriate statistical methods in biostatistics. Here's a
breakdown of the main types:
1. By Measurement:
• Quantitative Variables: These represent numerical values that can
be measured and ordered.
• Examples include height, weight, blood pressure, or cholesterol
levels. These can be further subcategorized:
– Continuous: They can take any value within a range (e.g., height
can be 1.56m, 1.73m, etc.).
– Discrete: They can only take specific values, usually whole numbers
(e.g., number of children, number of hospital admissions,
number of antenatal care (ANC) visits a pregnant woman attends).

• Qualitative Variables: These represent non-numerical attributes or
categories.
• Examples include blood type, disease status (present/absent), or
eye color.
• They can be further subcategorized:
– Nominal: Categories have no inherent order (e.g., blood type
A, B, AB, O). Nominal variables have two or more categories.
– Ordinal: Categories have an order of increasing or decreasing
value (e.g., tumor stage I, II, III). Ordinal variables have more than two categories.
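
The variable type determines which summaries make sense. The sketch below uses hypothetical records for six patients, with illustrative variable names, to show one reasonable summary for each type: counts for nominal data, an order-based median for ordinal data, and means for quantitative data.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical records for six patients.
blood_type = ["A", "O", "B", "O", "AB", "O"]       # nominal
tumor_stage = ["I", "III", "II", "II", "I", "II"]  # ordinal
anc_visits = [4, 2, 3, 4, 1, 4]                    # discrete
height_m = [1.56, 1.73, 1.62, 1.80, 1.68, 1.59]    # continuous

# Nominal: only counts (and the most frequent category) are meaningful.
most_common_type = Counter(blood_type).most_common(1)[0]
print("Most common blood type:", most_common_type)

# Ordinal: map categories to ranks so that order-based summaries work,
# then map the median rank back to its category label.
stage_rank = {"I": 1, "II": 2, "III": 3}
rank_to_stage = {rank: stage for stage, rank in stage_rank.items()}
median_stage = rank_to_stage[median_low(stage_rank[s] for s in tumor_stage)]
print("Median tumor stage:", median_stage)

# Quantitative: arithmetic summaries such as the mean are meaningful.
print("Mean ANC visits:", mean(anc_visits))
print("Mean height (m):", round(mean(height_m), 2))
```

Averaging the stage ranks would also run without error, but the result would assume equal spacing between stages, which an ordinal scale does not guarantee.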

2. By Role in Analysis:
• Independent Variables: These are thought to influence or cause
changes in other variables. In an experiment, this is the manipulated
variable. Examples include type of medication in a clinical trial or age in
a study of age-related diseases.
• Dependent Variables: These are affected by the independent variable
and are measured to observe the effect. In an experiment, this is the
outcome variable. Examples include blood pressure changes after
taking medication or disease incidence in different age groups.
• Confounding Variables: These influence both the independent and
dependent variables, potentially distorting the observed relationship.
They need to be controlled for or accounted for in analysis. Examples
include smoking status or socioeconomic status in studies of health
outcomes.
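
A small worked example shows how a confounder can distort an association. The counts below are entirely hypothetical and constructed so that coffee drinking appears to double disease risk in the crude analysis, while within each smoking stratum there is no association at all. The function name and data layout are illustrative.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical counts: (cases, total) for coffee drinkers and non-drinkers,
# split by smoking status. Smoking is the potential confounder.
strata = {
    "smokers":     {"coffee": (16, 80), "no_coffee": (4, 20)},
    "non_smokers": {"coffee": (1, 20),  "no_coffee": (4, 80)},
}

# Crude analysis: ignore smoking and pool the strata.
coffee_cases = sum(s["coffee"][0] for s in strata.values())
coffee_total = sum(s["coffee"][1] for s in strata.values())
other_cases = sum(s["no_coffee"][0] for s in strata.values())
other_total = sum(s["no_coffee"][1] for s in strata.values())
crude_rr = risk_ratio(coffee_cases, coffee_total, other_cases, other_total)
print(f"Crude risk ratio: {crude_rr:.2f}")

# Stratified analysis: compare within each level of the confounder.
stratum_rr = {
    name: risk_ratio(*s["coffee"], *s["no_coffee"]) for name, s in strata.items()
}
for name, rr in stratum_rr.items():
    print(f"Risk ratio among {name}: {rr:.2f}")
```

The crude association arises only because smokers are both more likely to drink coffee and more likely to develop the disease. Stratifying by the confounder is one simple way to control for it.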

3. Other Important Variables:
• Time-dependent Variables: These change over time, requiring
specific statistical methods for analysis.
• Examples include weight gain over months or tumor growth over
time.
• Binary Variables: These only have two possible values, often coded
as 0/1 or yes/no.
• Examples include presence/absence of a disease or success/failure
of a treatment.
• Choosing the right statistical analysis depends on the type of
variable and its role in your study. Knowing these classifications
empowers you to make informed decisions and extract meaningful
insights from your Biostatistical data.

Scales of measurement
• Measurement is the process of assigning numbers or other
symbols to characteristics or attributes of the objects or
people of interest according to certain prespecified rules.
• Scaling is representing a quantity or variable according to a
particular scale of measurement.
• The level of measurement determines which statistical
calculations are meaningful.
• Depending on their nature, variables can be measured
on four different scales:

• Nominal = Naming
• Ordinal = Naming + Order
• Interval = Naming + Order + Equal Intervals
• Ratio = Naming + Order + Equal Intervals +
True zero value and True ratio
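
The difference between the interval and ratio scales can be checked with temperature, a standard textbook example that is not taken from these slides. Celsius has equal intervals but an arbitrary zero, while Kelvin has a true zero, so only Kelvin supports statements about ratios. The helper function is illustrative.

```python
def celsius_to_kelvin(temp_c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return temp_c + 273.15


warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales and agree.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are meaningful only where zero means "none of the quantity".
ratio_c = warm_c / cool_c                                        # 2.0, but 20 °C is not "twice as hot"
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # about 1.035

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```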

The four levels of measurement



