Biostat Intro

Ariel Marticio, RMT, MBA

Faculty of College of Medical Laboratory Science
STATISTICS is a science that deals with the
collection, organization, analysis, interpretation, and
presentation of information that can be stated
numerically.

▪ The study and use of theory and methods for the
analysis of data arising from random processes or
phenomena.

▪ The study of how we make sense of data.

▪ An estimate of an unknown numerical quantity (e.g.,
the mean height of men age 20).
The field of STATISTICS provides some of the most
fundamental tools and techniques of the scientific
method:
o forming hypotheses
o designing experiments and observational studies
o gathering data
o summarizing data
o drawing inferences from data (e.g., testing hypotheses)
The field of STATISTICS can be divided into:

1. Mathematical Statistics - the study and development
of statistical theory and methods in the abstract.

2. Applied Statistics - the application of statistical
methods to solve real problems involving randomly
generated data and the development of new
statistical methodology motivated by real problems.

▪ E.g., biostatistics, psychometrics, econometrics,
chemometrics, astrostatistics, environmetrics
▪ Biostatistics: the branch of applied statistics directed toward
applications in the health sciences and biology.

▪ Biostatistics means using statistical
tools on biological problems and deriving
results about them. Example: medical science.

▪ It is also called biometry, meaning the measurement of life.

▪ Normally, in medicine, facts, observations,
or measurements have to be expressed in figures for precision.

▪ Bar diagrams, multiple bar diagrams, histograms, pie charts, etc.

▪ To calculate the average, median, mode, and standard deviation
of the collected data
▪ To compare two sets of data
▪ To reach a conclusion or result
▪ To find the association between two variables
▪ To find the correlation between two variables
▪ To give the results in tabular or diagrammatic form
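
The first and fifth uses listed above can be sketched in a few lines of Python. This is only an illustration: the height and weight values are hypothetical (invented for this example), and `pearson_r` is a helper name introduced here, not something from the lecture.

```python
import math
import statistics

# Hypothetical measurements, invented for illustration only.
heights_cm = [150, 155, 160, 160, 165, 170, 175]
weights_kg = [50, 54, 58, 60, 63, 68, 74]

mean_height = statistics.mean(heights_cm)      # average
median_height = statistics.median(heights_cm)  # middle value
mode_height = statistics.mode(heights_cm)      # most frequent value
sd_height = statistics.stdev(heights_cm)       # sample standard deviation (n - 1)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(heights_cm, weights_kg)
print(mean_height, median_height, mode_height, round(sd_height, 2), round(r, 3))
```

A value of r close to +1 would suggest that taller subjects in this made-up data set also tend to be heavier.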
• In Public Health or Community Health, it is called Health
Statistics.

• In Medicine, it is called Medical Statistics.
Here we study defects, injuries, diseases, the efficacy of
drugs, serums, lines of treatment, etc.

• In population-related studies, it is called Vital Statistics, e.g.,
the study of vital events like births, marriages, and deaths.
Statistics and Scientific Method
In general, the scientific method includes:
1. A review of facts, theories, and proposals,
2. Formulation of a logical hypothesis that can be
evaluated by experimental methods, and
3. Objective evaluation of the hypothesis on the
basis of experimental results

Basic steps:
1. Making observation
2. Generating a hypothesis
3. Deciding how to test the hypothesis
4. Experimenting
Importance of studying Biostatistics
• It is a tool in the decision-making process. An
information-based decision-making process
needs the application of biostatistics.

• It is an integral part of the basic foundation
upon which the expertise of health
administrators and planners rests.
Importance of studying Health Statistics
1. It is essential in the fields of preventive
medicine and public health.
2. It provides the foundation upon which all
aspects of public health programs are
3. It serves as guide in the planning,
programming and implementation of health
4. It is used in the evaluation of the
effectiveness of health services
Two Branches of Statistics
Descriptive - concerned with the
presentation, organization, and summarization of
data. (collect and present data)

Inferential - allows us to generalize from our
sample data to a larger group of subjects.
(analyze and interpret data)
Phenomenon of Variation
The reason for the existence of Biostatistics is
that the world is full of variation.

It would not be needed if everyone in the world were
exactly like everyone else.

It is from this variability among people, and even
within any one person from one time to another, that
statistics was born.
Data in biostatistics can be both
qualitative and quantitative, and a
measured characteristic can be:
1. Constant – value remains the same from
person to person, from time to time or from
place to place
2. Variable- values or categories cannot be
Data Types
Data are observations of random variables made on
the elements of a population or sample.
• Data are the quantities (numbers) or qualities
(attributes) measured or observed that are to be
collected and/or analyzed.
• The word data is plural; datum is singular.
• A collection of data is often called a data set.

Example: Low Birth Weight Infant Data

Data Types
▪ Measurements and observed attributes on low
birth weight infants born in two teaching hospitals
in Metro Manila.

The variables measured here are:

• sbp = systolic blood pressure
• sex = gender (1-male, 0-female)
• tox = maternal diagnosis of toxemia (1-yes, 0-no)
• grmhem = whether infant had a germinal matrix
hemorrhage (1-yes, 0-no)
• gestage = gestational age (weeks)
Measurement Scale Variable
A characteristic that varies from one biological entity to
another is termed a variable.

Various Measurements:
Ratio Scale: Measurement scales having a constant
interval size and a true zero point.

• Besides heights and numbers, ratio scales include
weights (mg, g), volumes (cc, cu. m), capacities (mL, L),
rates (cm/s, km/h), and lengths of time (h, yr), etc.
Various Measurements:

Interval Scale: Some measurement scales possess a
constant interval size but not a true zero.

• A good example is that of the two common
temperature scales:
• Celsius (°C) and Fahrenheit (°F).
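
A small sketch of why the missing true zero matters: on an interval scale, differences are meaningful but ratios are not. The two temperatures below are arbitrary values chosen for illustration, and `celsius_to_fahrenheit` is a helper name introduced here.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion between the two temperature scales."""
    return c * 9 / 5 + 32


c_low, c_high = 10.0, 20.0  # arbitrary example temperatures
f_low = celsius_to_fahrenheit(c_low)
f_high = celsius_to_fahrenheit(c_high)

# Differences describe the same physical change on either scale.
diff_c = c_high - c_low   # 10 Celsius degrees
diff_f = f_high - f_low   # 18 Fahrenheit degrees

# Ratios depend on where the arbitrary zero sits, so they disagree.
ratio_c = c_high / c_low  # 2.0
ratio_f = f_high / f_low  # 68 / 50 = 1.36

print(diff_c, diff_f, ratio_c, ratio_f)
```

Because the two ratios differ, it makes no sense to say that 20 °C is "twice as hot" as 10 °C. On a ratio scale such as weight, the ratio would be the same in any unit.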
Various Measurements:

Ordinal Scale: The data consist of an ordering or ranking
of measurements.
• For example: The examination marks of 75, 80, 87, 92,
and 95% (ratio scale) might be recorded as A, B, C, D
and E (ordinal scale) respectively.
Various Measurements:

Nominal Scale: The variables are classified by some
quality rather than by a numerical measurement.

In such cases, the variable is called an attribute and is said to
use a nominal scale of measurement.
• For example:
1. Data are represented as male or female.
2. Heights may be recorded as tall or short.
Types of Variables
Another way to distinguish Types of Variables
1. Qualitative variables have values that are
intrinsically nonnumeric (categorical).
• E.g. Cause of death, nationality, race, gender, severity of pain
(mild, moderate, severe)
• Qualitative variables generally have either nominal or ordinal
scales.
• Qualitative variables can be reassigned numeric values (e.g.,
male-1, female-0), but they are still intrinsically qualitative.

2. Quantitative variables have values that are

intrinsically numeric.
• E.g., survival time, systolic blood pressure, number of children
in a family, height, age, body mass index
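
A minimal sketch of the distinction above, using hypothetical records (the values are invented for illustration, not taken from any real data set). The numeric codes for sex are only labels, so the sensible summary is a frequency count, while a quantitative variable such as systolic blood pressure can be averaged.

```python
from collections import Counter
import statistics

# Hypothetical records; sex is coded 1 = male, 0 = female.
sex_codes = [1, 0, 0, 1, 0, 1, 0, 0]
sbp_mmhg = [43, 51, 42, 39, 48, 31, 31, 40]

# Qualitative variable: decode the labels and count categories.
labels = {1: "male", 0: "female"}
sex_counts = Counter(labels[code] for code in sex_codes)

# Quantitative variable: arithmetic summaries are meaningful.
mean_sbp = statistics.mean(sbp_mmhg)

print(sex_counts, mean_sbp)
```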
Types of Quantitative Variable
1. Discrete variables have a set of possible values
that is either finite or countably infinite.
• E.g., number of pregnancies, shoe size, number of missing teeth
• For a discrete variable there are gaps between its possible
values
• Discrete variables often take integer (whole number) values
(counts), but some discrete variables can take non-integer
values

2. Continuous variables have a set of possible values
including all values in an interval of the real line.
• E.g., duration of a seizure, body mass index, height
• No gaps between possible values
Data Collection
The collection of health information needs
knowledge of the different sources of these data
as well as the methods of obtaining them
1. Primary data – those obtained firsthand
by the investigator to specifically
answer the purpose of the study
2. Secondary data – those that already exist and were
obtained by other people for purposes not
necessarily those of the investigation

• A variable is a characteristic or condition that can change or take on

different values.
• Most research begins with a general question about the relationship
between two variables for a specific group of individuals.


• The entire group of individuals is called the population.

• For example, a researcher may be interested in the relation between
class size (variable 1) and academic performance (variable 2) for the
population of third-grade children.


• Usually populations are so large that a researcher cannot examine
the entire group. Therefore, a sample is selected to represent the
population in a research study. The goal is to use the results
obtained from the sample to help answer questions about the
population.

Correlational Studies

• The goal of a correlational study is to determine whether there is a
relationship between two variables and to describe the relationship.
• A correlational study simply observes the two variables as they exist.

Experiments
• The goal of an experiment is to demonstrate a cause-and-effect
relationship between two variables; that is, to show that changing the
value of one variable causes changes to occur in a second variable.

Experiments (cont.)
• In an experiment, one variable is manipulated to create
treatment conditions. A second variable is observed and
measured to obtain scores for a group of individuals in each
of the treatment conditions. The measurements are then
compared to see if there are differences between treatment
conditions. All other variables are controlled to prevent
them from influencing the results.

• In an experiment, the manipulated variable is called the

independent variable and the observed variable is the
dependent variable.

Other Types of Studies
• Other types of research studies, known as non-experimental or quasi-
experimental, are similar to experiments because they also compare
groups of scores.
• These studies do not use a manipulated variable to differentiate the
groups. Instead, the variable that differentiates the groups is usually a
pre-existing participant variable (such as male/female) or a time variable
(such as before/after).

Other Types of Studies (cont.)

• Because these studies do not use the manipulation and control of
true experiments, they cannot demonstrate cause-and-effect
relationships. As a result, they are similar to correlational research
because they simply demonstrate and describe relationships.

Descriptive Statistics

• Descriptive statistics are methods for organizing and summarizing
data.
• For example, tables or graphs are used to organize data, and
descriptive values such as the average score are used to summarize
data.
• A descriptive value for a population is called a parameter and a
descriptive value for a sample is called a statistic.

Inferential Statistics

• Inferential statistics are methods for using sample
data to make general conclusions (inferences) about
populations.
• Because a sample is typically only a part of the
whole population, sample data provide only limited
information about the population. As a result,
sample statistics are generally imperfect
representatives of the corresponding population
parameters.

Sampling Error

• The discrepancy between a sample statistic and its population
parameter is called sampling error.
• Defining and measuring sampling error is a large part of inferential
statistics.
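
Sampling error can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the population is simulated (1,000 scores with mean near 100), the sample size of 25 is arbitrary, and the random seed is fixed only so the run is repeatable.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of N = 1000 simulated scores.
population = [random.gauss(100, 15) for _ in range(1000)]
mu = statistics.mean(population)         # population parameter

# Draw a simple random sample of n = 25 without replacement.
sample = random.sample(population, 25)
sample_mean = statistics.mean(sample)    # sample statistic

# Sampling error: discrepancy between the statistic and the parameter.
sampling_error = sample_mean - mu

print(round(mu, 2), round(sample_mean, 2), round(sampling_error, 2))
```

Re-running with a different seed gives a different sample mean, and therefore a different sampling error, even though the population parameter is computed from the same kind of population.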


• The individual measurements or scores obtained for
a research participant will be identified by the letter
X (or X and Y if there are multiple scores for each
individual).
• The number of scores in a data set will be identified
by N for a population or n for a sample.
• Summing a set of values is a common operation in
statistics and has its own notation. The Greek letter
sigma, Σ, will be used to stand for "the sum of." For
example, ΣX identifies the sum of the scores.

Order of Operations

1. All calculations within parentheses are done
first.
2. Squaring or raising to other exponents is done
second.
3. Multiplying and dividing are done third, and
should be completed in order from left to right.
4. Summation with the Σ notation is done next.
5. Any additional adding and subtracting is done
last and should be completed in order from left
to right.
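
The order of operations above decides what a summation expression means. A short sketch with an arbitrary set of scores X = 1, 2, 3, 4 (chosen only for illustration) shows how expressions that look similar give different results.

```python
X = [1, 2, 3, 4]  # arbitrary scores for illustration

# ΣX: add the scores.
sum_x = sum(X)                                   # 10

# ΣX²: squaring comes before summation, so square each score, then add.
sum_of_squares = sum(x ** 2 for x in X)          # 1 + 4 + 9 + 16 = 30

# (ΣX)²: parentheses come first, so add the scores, then square the total.
square_of_sum = sum(X) ** 2                      # 10² = 100

# Σ(X − 1): parentheses first, so subtract 1 from each score, then add.
sum_of_x_minus_1 = sum(x - 1 for x in X)         # 0 + 1 + 2 + 3 = 6

# ΣX − 1: summation comes before the remaining subtraction.
sum_x_then_minus_1 = sum(X) - 1                  # 10 − 1 = 9

print(sum_x, sum_of_squares, square_of_sum, sum_of_x_minus_1, sum_x_then_minus_1)
```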
Data Sources
Experimental studies
▪ The researcher deliberately imposes a treatment on
one or more subjects or experimental units (not
necessarily human). The experimenter then measures
or observes the subjects’ response to the treatment.

▪ The crucial element is that there is an intervention.

Data Sources
• To assess whether or not saccharine is
carcinogenic, a researcher feeds 25 mice daily
doses of saccharine. After 2 months 10 of the 25
mice have developed tumors.

Select 25 more mice and treat them exactly the same
but give them daily doses of an inert substance (a placebo).
Data Sources
• Suppose that in the control group only 1 mouse
develops a tumor. Is this evidence of a
carcinogenic effect?

Starting with 50 relatively homogeneous (similar)
mice, randomly assign 25 to the saccharine
treatment, and 25 to the control treatment.
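
One hedged way to look at the question "is this evidence of a carcinogenic effect?" is sketched below. It assumes the randomized design just described and the counts from the example (10 of 25 vs. 1 of 25). The calculation is a one-sided exact test of the kind usually called Fisher's exact test; the slides do not name a method, so this choice is an assumption made for illustration.

```python
from math import comb

# Counts from the saccharine example.
tumors_treated, n_treated = 10, 25
tumors_control, n_control = 1, 25

p_treated = tumors_treated / n_treated   # proportion with tumors, treated
p_control = tumors_control / n_control   # proportion with tumors, control

total_mice = n_treated + n_control
total_tumors = tumors_treated + tumors_control


def prob_k_tumors_in_treated(k):
    """Chance that exactly k of the tumor-bearing mice land in the treated
    group if group labels were assigned purely at random."""
    return (comb(total_tumors, k)
            * comb(total_mice - total_tumors, n_treated - k)
            / comb(total_mice, n_treated))


# Probability of a split at least this extreme under random assignment alone.
p_value = sum(prob_k_tumors_in_treated(k)
              for k in range(tumors_treated, total_tumors + 1))

print(p_treated, p_control, round(p_value, 4))
```

A small probability here would mean that a 10 vs. 1 split is unlikely to arise from random assignment alone, which is the kind of argument randomization makes possible.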
Data Sources
Randomization is an extremely important aspect of
experimental design.

• In the saccharine example, we should start out with 50
homogeneous mice, but of course they will differ some.

▪ Randomization ensures that the two experimental
groups will be probabilistically alike with respect to
all nuisance variables (potential confounders). E.g.,
the distribution of body weights should be about
the same in the two groups.
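
A minimal sketch of how the random assignment of 50 mice into two groups of 25 might be carried out. The mouse identifiers and the fixed seed are assumptions made for illustration; in practice the seed would not be chosen to produce a particular split.

```python
import random

random.seed(42)  # fixed only so the sketch is reproducible

# Hypothetical identifiers for the 50 mice.
mice = [f"mouse_{i:02d}" for i in range(1, 51)]

# Shuffle the whole list at random, then split it in half.
shuffled = random.sample(mice, k=len(mice))
saccharine_group = shuffled[:25]
control_group = shuffled[25:]

print(len(saccharine_group), len(control_group))
```

Because chance alone decides which mouse goes where, nuisance variables such as body weight have no systematic way to end up concentrated in one group.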
Data Sources
▪ An experiment is blind if the subjects don’t know
which treatment they receive.
▪ E.g., suppose we randomize 25 of 50 migraine
sufferers to an active drug and the remaining 25 to
a placebo control treatment.

• The experiment is blind if pills in the two treatment groups
look and taste identical and subjects are not told
which treatment they receive.
• This guards against the placebo effect.
Data Sources
▪ An experiment is double-blind if the researcher
who administers the treatments and measures the
response does not know which treatment is
being given.

• Guards against experimenter effect.

• Experimenter may behave differently toward the
subjects in the two groups, or measure the response
differently in the two groups.
Data Sources
Experiments have many advantages and are strongly
preferred when possible. However, experiments are
rarely feasible in public health / epidemiology.

In health sciences/medicine, experiments involving

humans are called clinical trials.
Data Sources
An observational study is one in which the investigator collects
data merely by watching or asking questions.

• No intervention
• Data collected on an existing system.
o Less expensive
o Easier logistically
o More often ethically practical
o Interventions often not possible
Types of Observational Studies
1. Case studies or case series
▪ A descriptive account of interesting characteristics
(e.g., symptoms) observed in a single case (subject
with disease) or in a sample of cases.

• Typically are unplanned and don’t involve any research
hypotheses.
• No comparison group.
• Poor design, but can generate research hypotheses
for subsequent investigation.
Types of Observational Studies
2. Case control study
▪ Conducted retrospectively by looking into the past.
▪ Two types of subjects included:
• cases - subjects with the disease/outcome of interest
• controls - subjects without the disease/outcome

▪ History of two groups is examined to determine which
subjects were exposed to, or otherwise possessed, a prior
▪ Association between exposure and disease then
▪ Controls are often matched to cases based on similar
Types of Observational Studies
2. Case control study

Advantages:
• Useful for studying rare diseases
• Useful for studying diseases with long latency periods
• Can explore several potential risk factors (exposures) for disease
• Can use existing data sources - cheap, quick, easy to conduct

Disadvantages:
• Prone to methodological errors and biases
• Dependent on high quality records
• Difficult to select an appropriate control group
• More difficult statistical methods required for proper analysis
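
One common way to quantify the association in a case control study is the odds ratio. The slides do not name a measure, so this choice is an assumption, and the counts below are hypothetical values invented for illustration.

```python
# Hypothetical counts from looking back at exposure history.
cases_exposed, cases_unexposed = 30, 70        # subjects with the disease
controls_exposed, controls_unexposed = 10, 90  # subjects without the disease

# Odds of past exposure within each group.
odds_cases = cases_exposed / cases_unexposed
odds_controls = controls_exposed / controls_unexposed

# Odds ratio: how many times higher the odds of exposure are among cases.
odds_ratio = odds_cases / odds_controls        # (30 * 90) / (70 * 10)

print(round(odds_ratio, 2))
```

An odds ratio near 1 would suggest no association between exposure and disease, while values well above 1 would suggest that exposure is more common among cases.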
Types of Observational Studies
3. Cross-sectional studies
▪ Collect data from a group of subjects at one point in
time.
▪ Sometimes called prevalence studies, due to their
focus on a single point in time.
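
The name "prevalence study" comes from the quantity such a study can estimate: the proportion of the group that has the disease at the time of the survey. A brief sketch with hypothetical counts (invented for illustration):

```python
# Hypothetical cross-sectional survey results at one point in time.
n_surveyed = 2000
n_with_disease = 150

# Prevalence: existing cases divided by everyone surveyed.
prevalence = n_with_disease / n_surveyed
prevalence_per_1000 = 1000 * n_with_disease / n_surveyed

print(prevalence, prevalence_per_1000)
```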
Types of Observational Studies
3. Cross-sectional studies

Advantages:
• Often based on a sample of the general population, not just people
seeking medical care.
• Can be carried out over a relatively short period of time.

Disadvantages:
• Difficult to separate cause and effect because measurements of
exposure and disease are made at one point in time, so it may not be
possible to determine which came first
• Are biased toward detecting cases with disease of long duration and
can involve misclassifications of cases in remission or under effective
medical treatment
• Snapshot in time can be misleading in a variety of other ways
Types of Observational Studies
4. Cohort Studies
▪ Usually conducted prospectively (forward in time)
▪ A cohort is a group of people who have something in
common at a particular point in time and who remain
part of the group through time.
▪ A cohort of disease-free subjects is selected and
their exposure status is evaluated at the start of the
study.
▪ They are then followed through time in order to
observe who develops disease.
▪ Associations between exposures (risk factors) and
disease are then quantified.
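
A sketch of how the association might be quantified in a cohort study. It uses the incidence in each exposure group and their ratio, often called the relative risk. The slides do not name a measure, so this is an assumption, and all counts are hypothetical.

```python
# Hypothetical cohort, disease-free at the start and followed over time.
exposed_total, exposed_cases = 1000, 50      # exposed subjects, new cases
unexposed_total, unexposed_cases = 2000, 40  # unexposed subjects, new cases

# Incidence: new cases during follow-up divided by subjects at risk.
incidence_exposed = exposed_cases / exposed_total        # 0.05
incidence_unexposed = unexposed_cases / unexposed_total  # 0.02

# Relative risk: ratio of the two incidences (about 2.5 here).
relative_risk = incidence_exposed / incidence_unexposed

print(incidence_exposed, incidence_unexposed, round(relative_risk, 2))
```

Incidence can be measured directly here because the cohort starts disease-free and new cases are observed as they occur.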
Types of Observational Studies
4. Cohort studies

Advantages:
• Useful when exposure of interest is rare
• Can examine multiple effects (e.g., diseases) of a single exposure
• Can elucidate temporal relationship between exposure and disease,
thereby getting closer to causation
• Allows direct measurement of incidence of disease

Disadvantages:
• Inefficient for studying rare diseases
• Generally requires a large number of subjects
• Expensive and time consuming
1. Consider a survey of nurses’ opinions of their
working conditions. What type of variables are:
(i) length of service
(ii) staff grade
(iii) age
(iv) salary
(v) number of patients seen in a day
(vi) possession of a degree.
2. What differences do you think there are
between a discrete measurement such as shoe
size and a discrete measurement such as
family size?
3. You want to determine if cinnamon reduces a
person’s insulin sensitivity. You give patients who
are insulin sensitive a certain amount of cinnamon
and then measure their glucose levels. Is this an
observational study or an experiment? Why?

