
First Edition

2015
Statistics ASM

Important definitions
Incidence
 Incidence → the number of new cases within a defined population at a specified time
 Incidence rate → the number of new cases per population at risk in a given time period
 Incidence rate → is a ratio of the number of individuals in the population who develop an illness in a given
time period (commonly 1 year) divided by the total number of individuals at risk for the illness during that
time period (e.g. the number of IV drug abusers newly diagnosed with AIDS in 2013 divided by the number of
IV drug abusers in the population during 2013)
 Incidence proportion (cumulative incidence) → the number of new cases within a specified time period
divided by the size of the population initially at risk. For example, if a population initially contains 1,000 non-
diseased persons and 28 develop a condition over 2 years of observation, the incidence proportion is 28 cases
per 1,000 persons → 2.8%
Incidence rate = Number of new cases within a defined population at a specified time ÷ Total population at risk during the same period of time
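As a quick arithmetic check, here is a minimal Python sketch (variable names are illustrative) applying these definitions to the incidence proportion example above:

```python
# Incidence proportion (cumulative incidence) from the example above:
# 28 new cases among 1,000 initially non-diseased persons over 2 years
new_cases = 28
population_at_risk = 1000

incidence_proportion = new_cases / population_at_risk
print(f"Incidence proportion: {incidence_proportion:.1%}")  # 2.8%
```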

Prevalence
 Prevalence → the total number of cases within a defined population at a specified time
 Prevalence rate → the total number of cases per population at risk in a given time period
 Prevalence rate → is a ratio of the number of individuals in the population who have an illness (e.g. AIDS)
divided by the total number of individuals at risk for the illness
 Point prevalence → is a ratio of the number of individuals who have an illness at a specific point in time (e.g.
the number of people who have AIDS on August 31, 2013) divided by the total population who could have the
illness on that date
 Period prevalence → is a ratio of the number of individuals who have an illness during a specific time period
(e.g. the number of people who have AIDS in 2013) divided by the total population who could have the illness
during that specific time period
 Lifetime prevalence (LTP) → the proportion of a population that at some point in their life (up to the time of
assessment) have experienced the condition
 Prevalence rates are affected by the incidence of the disease and the duration of illness
Prevalence rate = Total number of cases within a defined population at a specified time ÷ Total population at risk during the same period of time

 Prevalence answers "How many people have this disease right now?" and incidence answers "How many people
per year newly acquire this disease?"
 The prevalence rate of an illness decreases when patients either recover or die

Attack rate
 It is the cumulative incidence of infection over a period of time. This is typically used during an epidemic (e.g.
during the influenza outbreak, the attack rate was 12%)
 It is the biostatistical measure of the frequency of morbidity, or the speed of spread, in an at-risk population
 It is calculated by taking the number of new cases in the population at risk and dividing by the number of
persons at risk in the population. It is usually expressed as a percentage
 For example → if, after a picnic, 20 out of 40 people who ate fried chicken and 10 out of 50 people who ate
fried fish become ill → the attack rate is 50% for chicken and 20% for fish
Attack rate = Number of new cases in the population at risk ÷ Number of persons at risk in the population
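A minimal Python sketch of this calculation, using the picnic figures from the example above (the helper function is illustrative):

```python
def attack_rate(new_cases, persons_at_risk):
    # Cumulative incidence in the at-risk group, expressed as a percentage
    return 100 * new_cases / persons_at_risk

print(attack_rate(20, 40))  # 50.0 -> attack rate for fried chicken
print(attack_rate(10, 50))  # 20.0 -> attack rate for fried fish
```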


Mortality rates
 Mortality rate is the number of deaths within a defined population at a specified time
 Mortality rate is typically expressed in units of deaths per 1000 individuals per year
 The standardized mortality ratio is the observed mortality rate divided by the expected mortality rate
 Proportionate mortality rate (PMR) is used to determine the relative importance of a specific cause of death in
relation to all causes of death in a population (e.g. the leading cause of death in USA in 1980 was heart disease
with a PMR of 38.2%)
 Stillbirth rate → number of stillbirths (i.e. babies born with no signs of life after 24 weeks’ gestation) per
1000 total births
 Perinatal mortality rate → number of stillbirths and deaths within the first week of life per 1000 total births
 Neonatal mortality rate → number of deaths of live born babies aged up to 1 month per 1000 live births
 Infant mortality rate → number of deaths of all infants aged under 1 year per 1000 live births
Mortality rate = (Number of deaths during one year ÷ Total population at mid-year) × 1000

Case fatality rate = (Number of deaths due to a disease in a specified period of time ÷ Number of cases of the disease in the same period of time) × 100

PMR = (Number of deaths from a given cause in a specified period of time ÷ Total number of deaths in the same period of time) × 100

Stillbirth rate = (Number of stillbirths in a period of time ÷ Total number of births (live & still) in the same period of time) × 1000

Perinatal mortality rate = (Number of stillbirths and early neonatal deaths (< 7 days old) in a period of time ÷ Total number of births (live & still) in the same period of time) × 1000

Neonatal mortality rate = (Number of deaths under 28 days of age in a period of time ÷ Total number of live births in the same period of time) × 1000

Infant mortality rate = (Number of deaths under one year of age in a period of time ÷ Total number of live births in the same period of time) × 1000

Postneonatal mortality rate = (Number of deaths in infants aged from 28 days to < 1 year in a period of time ÷ Total number of live births in the same period of time) × 1000

 Stillbirth → fetal death and expulsion from the uterus after 24 weeks’ gestation
 Miscarriage/abortion → fetal death and expulsion from the uterus before 24 weeks’ gestation
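The following Python sketch applies these formulas; the counts are hypothetical, chosen only to illustrate the denominators (total births for the stillbirth and perinatal rates, live births for the others):

```python
# Hypothetical one-year figures for a single population (illustrative only)
live_births = 9800
stillbirths = 200             # no signs of life after 24 weeks' gestation
early_neonatal_deaths = 50    # deaths at < 7 days of age
neonatal_deaths = 70          # deaths at < 28 days of age
infant_deaths = 120           # deaths at < 1 year of age

total_births = live_births + stillbirths  # live & still

stillbirth_rate = 1000 * stillbirths / total_births                           # 20.0 per 1000 total births
perinatal_rate = 1000 * (stillbirths + early_neonatal_deaths) / total_births  # 25.0 per 1000 total births
neonatal_rate = 1000 * neonatal_deaths / live_births                          # ~7.1 per 1000 live births
infant_rate = 1000 * infant_deaths / live_births                              # ~12.2 per 1000 live births
postneonatal_rate = 1000 * (infant_deaths - neonatal_deaths) / live_births    # ~5.1 per 1000 live births
```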


Study design
Research questions
 A research study should always be designed to answer a particular research question. The question usually
relates to a specific population. For example:
 Does taking folic acid early in pregnancy prevent neural tube defects?
 Is a new inhaled steroid better than current treatment for improving lung function among cystic fibrosis
patients?
 Is low birth weight associated with hypertension in later life?
 A well-built clinical foreground question should have 4 components. The PICO model is a helpful tool that
assists you in organizing and focusing your foreground question into a searchable query
 P = Patient, Problem, Population
 How would you describe a group of patients similar to you?
 What are the most important characteristics of the patient?
 I = Intervention, Prognostic Factor, Exposure
 What main intervention are you considering?
 What do you want to do with this patient?
 What is the main alternative being considered?
 C = Comparison (can be none or placebo)
 What is the main alternative to compare with the intervention?
 Are you trying to decide between 2 drugs, a drug and no medication or placebo, or 2 diagnostic tests?
 O = Outcome
 What are you trying to accomplish, measure, improve or affect?
 Outcomes may be disease-oriented or patient-oriented
 For example:
 P = pregnant women; I = folic acid; C = no folic acid; O = neural tube defect – yes/no
 P = CF patients; I = new inhaled steroid; C = current treatment, O = improvement in lung function
 P = newborns; I = low birthweight; C = normal birthweight; O = hypertension
 To answer the specified research question, random samples of the relevant populations are taken (e.g. pregnant
women, cystic fibrosis patients, LBW individuals, and normal birthweight individuals)
 Based on the differences found between the different groups of samples, inferences are made about the
populations from which they were randomly sampled. For example:
 If the women in the sample taking folic acid have fewer neural tube defects, it may be inferred that taking
folic acid during pregnancy will reduce the incidence of neural tube defects in the population
 If among our sample of CF patients, those taking steroids have better lung function on average than those
on current treatment, the inference might be that steroids improve lung function among CF patients in
general. Note that some of the patients in the sample who were on current treatment may have had better
lung function than some of those using steroids, but it is the average difference that is considered
 If there is a difference in hypertensive rates between samples of individuals who were and were not of
LBW, it may be inferred that birth weight is associated with later hypertension in the population in general

Confounding
 Confounding may be an important source of error
 A confounding factor is a background variable (i.e. something not of direct interest) which:
 Is different between the groups being compared, and
 Affects the outcome being studied
 In a study to compare the effect of folic acid supplementation in early pregnancy on neural tube defects, age
will be a confounding factor if:
 Either the folic acid group or placebo group tends to consist of older women, and
 Older women are more, or less, likely to have a child with a neural tube defect


 When studying the effects of a new inhaled steroid against standard therapy for cystic fibrosis patients, disease
severity will be a confounder if:
 One of the groups (new steroid or standard therapy) consists of more severely affected patients, and
 Disease severity affects the outcome measure (lung function)
 In the comparison of hypertension rates between LBW and not LBW, social class will be a confounder if:
 The LBW babies are more likely to have lower social class, and
 Social class is associated with the risk of hypertension
 If a difference is found between the groups (folic acid/placebo, new steroid/standard therapy, and low/normal
birth weight) we will not know whether the differences are, respectively, the results of folic acid or age, of the
potency of the new steroid or the severity of disease in the patient, or of birth weight or social class
 Confounding may be avoided by matching individuals in the groups according to potential confounders
 For example:
 We could age-match folic acid and placebo pairs
 We could recruit individuals of low and normal birth weight from similar social classes
 We could find pairs of cystic fibrosis patients of similar disease severity, and randomly allocate one of
each pair to receive the new steroid while the other receives standard therapy

Different types of studies


A. Descriptive studies
1. Case reports
2. Case-series
3. Survey studies
4. Qualitative studies
B. Observational studies
1. Case-control study
2. Cross-sectional study
3. Cohort study
4. Ecological study
C. Experimental studies
1. Randomized controlled trials
a) Double blind study
b) Single blind study
c) Unblinded study
2. Crossover study

Case series and case reports


 They consist either of collections of reports on the treatment of individual patients with the same condition, or
of reports on a single patient
 They are used to illustrate an aspect of a condition, the treatment or the adverse reaction to treatment
 They have no control group (a group with which to compare outcomes), so they have no statistical validity
 The benefits of case series/reports are that they are easy to understand and can be written up in a very short
period of time


Observational studies
 The researchers don’t change anything. They only observe and document what occurs in one or more groups
of individuals (e.g. those who do and do not take folic acid early in pregnancy)
 When an observational study compares 2 groups, it may be categorized as being either:
1. Case control study → consider differences between the groups in the past
2. Cross-sectional study → consider differences between the groups at the present time
3. Cohort study → consider differences between the groups in the future
 Case control study
 It is a retrospective study. This means that you begin at the end (with the disease), and then work
backwards, to hunt for possible causes
 It usually compares diseased and healthy groups, and looks back in time to see what they have done
differently in the past that may have led to the disease
 These studies are concerned with aetiology rather than treatment. They are more suitable for rare diseases
 These studies can’t calculate incidence, prevalence, or relative risk. Results are expressed as odds ratios
 Case control studies are less reliable than either randomized controlled trials or cohort studies
 For example → a study in which colon cancer patients are asked what kinds of food they have eaten in the
past and the answers are compared with a selected healthy control group

 Cross-sectional study (transversal study, prevalence study)


 A study that examines the relationship between diseases (or other health-related characteristics) and other
variables of interest as they exist in a defined population at a specified time (i.e. exposure and outcomes
are both measured at the same time)
 It is usually used for quantifying:
 The prevalence of a disease or risk factor
 The accuracy of a diagnostic test
 These studies allow determination of prevalence and relative risk; but not the incidence
 It can’t evaluate hypotheses about causation, as it doesn’t take into account how the timing of exposure
to a risk factor relates to the development of disease
 For example → what is the current prevalence of cystic fibrosis in a population of adolescents ?
 Another example → Is there an association between diabetes and overweight ?
 Cohort study (longitudinal study)
 Two or more groups of subjects are selected on the basis of their exposure or lack of exposure to a
particular risk factor or agent (e.g. drinking alcohol or not, smoking or not, eating a high-fat diet or not),
and are followed forward in time to see how many in each group develop a particular disease
 Cohort studies are used to establish causation of a disease or to evaluate the outcome/impact of treatment,
when randomized controlled clinical trials are not possible (more suitable for common diseases)
 Cohort studies may be either:
a) Prospective → exposure factors are identified at the beginning of a study and a defined population is
followed into the future
b) Retrospective → past medical records for the defined population are used to identify exposure factors
 Prospective cohort studies are more reliable than retrospective cohort studies


 Cohort studies allow determination of incidence and relative risk; but not the prevalence
 Disadvantages include the large numbers required for rare outcomes, problems of drop-out bias, and
changes in practice during long follow-up periods
 Cohort studies are used for determining the outcome of infants born prematurely

 Ecological study
 Here the unit of analysis is a population rather than an individual, and association across different
populations is investigated
 For example → an ecological study may look at the association between prematurity and childhood cancer
rates in different countries, to see whether those countries with higher prematurity rates also have higher
levels of childhood cancers

Case-control studies | Cohort studies
Suitable for rare diseases | Suitable for common diseases
Short study time and cheaper to perform | Prolonged study time with potential for increasing drop-out rates, and therefore more costly
Smaller number of subjects required | Large number of subjects usually required
Bias may occur in the selection of cases & controls, and in ascertaining exposure | Less selection bias occurs
Because data are collected retrospectively, some data may not be available or may be of poorer quality | Prospective data collection may be more accurate
No volunteer subjects needed | Subjects usually volunteer
Can't determine incidence, prevalence or relative risk; results are expressed as odds ratios | Can determine incidence, relative risk & attributable risk, but can't calculate prevalence
Estimate of time from exposure to development of disease is not possible | Estimate of time from exposure to development of disease is possible

 Case-control studies can’t calculate incidence, prevalence, or relative risk. Results are expressed as odds ratios
 Cross-sectional studies allow determination of prevalence and relative risk, but not incidence
 Cohort studies allow determination of incidence and relative risk; but not the prevalence
 Case-control studies are highly prone to selection and recall bias

Randomized controlled trial (randomized clinical trial, RCT)


 It is the mainstay of experimental medical studies, normally used in testing new drugs and treatments
 In randomized controlled studies:
 There are two groups, one treatment group and one control group. The treatment group receives the
treatment under investigation, and the control group receives either no treatment (placebo) or standard
treatment. The control group does not necessarily consist of normal healthy individuals
 Patients are randomly assigned to all groups
 A placebo (a pharmacologically inert dummy, identical in appearance to the treatment(s)) should normally be
used for the control group when there is no conventional treatment available
 Having a control group allows for a comparison of treatments (e.g. treatment A produced favorable results
56% of the time versus treatment B in which only 25% of patients had favorable results)
 Randomization means that each patient has the same chance of being assigned to either of the groups,
regardless of their personal characteristics. Note that random does not mean haphazard or systematic


 Randomization helps to avoid the selection bias in the assignment process. It also increases the probability
that differences between groups can be attributed to the treatment(s) under study
 Allocation concealment means that the allocation (to treatment or control) is unknown before the individual
is entered into the study
 RCT may be either:
a) Double-blind → neither the patient nor the medical staff/physician knows which treatment the patient has
been randomized to receive
b) Single-blind → either the patient or the medical staff/physician does not know (usually the patient)
c) Unblinded (or open) → both the patient and the medical staff/physician know
 It is preferable that studies are blinded, because knowledge of treatment may affect the outcome and introduce
a bias in the results
 There are certain types of questions on which RCT can’t be done for ethical reasons, for instance, if patients
were asked to undertake harmful experiences (e.g. smoking) or denied any treatment beyond a placebo when
there are known effective treatments
 For example → studies of treatments that consist essentially of taking pills are very easy to do double blind;
the patient takes one of two pills of identical size, shape, and color, and neither the patient nor the physician
needs to know which is which

 The "random" in RCT refers to the equal chance in allocation of individuals to either experimental or control group,
not to the way the sample is drawn

Stages of drug development in clinical trials


Phase | Main aims
Preclinical | In vitro studies and pharmacology/toxicology testing in laboratory animals
Phase 1 | Checking for safety → safety, tolerability, pharmacokinetics, and pharmacodynamics in healthy individuals and/or patients
Phase 2 | Checking for efficacy and dosing → initial treatment studies to determine the efficacy and dosing of the drug in a small number of patients
Phase 3 | Confirming the results → large RCTs comparing the new drug with the current gold standard drug
Phase 4 | Post-marketing surveillance → long-term safety and rare events in all patients prescribed the drug

Crossover studies
 In a crossover (or within-patient) study, each patient receives treatment and placebo in a random order
 Crossover studies are only suitable for chronic disorders that are not cured, but for which treatment may give
temporary relief
 There should be no carryover effect of the treatment from one treatment period to the next
 Sometimes, it is necessary to leave a gap between the end of the first treatment and the start of the next to
ensure that there is no overlap. This gap period is known as a washout period
 The outcome may or may not be normally distributed


Systematic review
 It is a comprehensive survey of a topic that takes great care to find all relevant studies of the highest level of
evidence (both published & unpublished) → assess each study → synthesize the findings from individual
studies in an unbiased, explicit and reproducible way and present a balanced and impartial summary of the
findings with due consideration of any flaws in the evidence
 In this way it can be used for the evaluation of either existing or new technologies and practices
 A systematic review is more rigorous than a traditional literature review & attempts to reduce influence of bias
 The difference between a systematic review and a meta-analysis is that a systematic review looks at the whole
picture (qualitative view), while a meta-analysis looks for the specific statistical picture (quantitative view)

Meta-analyses
 Meta-analysis is a systematic, objective way to combine data from many studies, usually from RCTs, and
arrive at a pooled estimate of treatment effectiveness and statistical significance
 Meta-analysis can also combine data from case-control and cohort studies
 The results of a meta-analysis are usually expressed as odds ratio or relative risks
 The advantage to merging these data is that it increases sample size and allows for analyses that would not
otherwise be possible
 Two problems with meta-analysis:
a) Publication bias → studies showing no effect or little effect are often not published and just “filed” away
b) The quality of the design of the studies from which data is pulled
 This can lead to misleading results when all the data on the subject from “published” literature are summarized

 The results of a before and after study should always be viewed with great caution
 The results of both case-control studies and meta-analysis are usually expressed as odds ratio
 The main aim of randomization is to remove selection bias and confounding
 Consent to randomization should be obtained as a part of the overall consent to the study
 Crossover trials are less prone to confounding and are more efficient than parallel trials (different patients on each
treatment), and should be preferred where the nature and outcome for the treatments allows them
 The best study design to determine whether a certain factor is causally implicated in the onset of a rare disease is
case-control study


Comparing study designs


 There may be many ways to address a particular research question (e.g. if we wish to determine whether
smoking is causally associated with lung cancer, any of the following would potentially address this question:
 Case-control studies → would select a group of lung cancer patients and a group of healthy controls to
see how they differed in previous behaviors (i.e. smoking)
 Cross-sectional studies → would show that those with lung cancer are more likely to be current smokers
 Cohort studies → would select groups of currently healthy smokers and non-smokers, and follow these
forward in time to see whether one group was more likely to develop lung cancer
 Ecological studies → might show a relationship between levels of smoking and lung cancer rates in
different countries/regions
 Randomized controlled trial (RCT) → would randomly allocate healthy individuals to smoke or not,
and then see who developed lung cancer (unethical)
 In all of the study types, there is potential for confounding (e.g. if the smokers are more likely to drink alcohol)

Matching the strongest research designs to clinical questions


Question | Design
Causation and risk factors | Case-control studies, cohort studies
Prevalence and diagnosis | Cross-sectional studies
Incidence and prognosis | Cohort studies
Treatment | Randomized controlled trials (RCTs)

Intention-to-treat analysis
 In RCTs, the outcomes for the two allocation groups (those allocated to active treatment and those allocated to
control) should be compared irrespective of whether patients actually received or completed allocated
treatment(s) or had missing data or poor compliance
 This avoids introducing bias into the assessment of treatment

Interim analysis
 Analyses that are carried out before the end of the clinical trial in order to assess whether the accumulating
data are beginning to demonstrate a beneficial effect of one treatment over the other with sufficient certainty
 This can avoid further patients being randomized to the inferior treatment

Criteria for assessing causation


1. Specificity → if a variable is associated with a single outcome, and the outcome is associated with only a
single possible cause, then the relationship is more likely to be causal
2. Strength → strong associations are more likely to be causal than weak ones
3. Consistency → multiple studies finding the same thing are more likely to be causal
4. Temporality → causes must precede effects. This is absolutely necessary to suggest causation
5. Biological gradient → if an increased exposure is associated with an increased rate or severity of disease, then
causality is more likely
6. Plausibility → hypotheses should sound reasonable. However, new epidemiological findings may expand
knowledge
7. Coherence → causal association is strengthened if epidemiological data fits in with pathology
8. Experiment → if the cause is removed and disease frequency declines, the likelihood of a causal link is
strengthened
9. Analogy → if a similar association has been shown to be causal, then the association under investigation is
more likely to be causal

 Case-control studies are useful for studying disease types with a long latent period
 Observational studies cannot be randomized


Bias, reliability and validity


Bias
 Sampling bias → volunteer subjects in a study may not be representative of the population being studied;
as a consequence, the results of the study may not be generalizable to the entire population
 Selection bias → occurs when there is a systematic difference in the way study groups are chosen. One
method of decreasing this bias is randomization
 Recall bias → patients who experience an adverse outcome have a different likelihood of recalling an
exposure than do patients who do not have an adverse outcome, independent of the true extent of the exposure
 Expectancy bias → occurs when a physician knows which patients are in treatment versus placebo group,
causing the physician to draw conclusions supporting the expected outcome. One method of decreasing this
bias is a double-blind design
 Late-look bias → results from information being gathered too late to draw conclusions about the disease or
exposure of interest from the entire study population. For instance, more severe cases may have already died
 Measurement bias → occurs when the way information is gathered distorts the data collected. For example, the
Hawthorne effect describes how people act differently when being watched
 Proficiency bias → this is an issue when comparing the effects of different treatments administered at multiple
sites. Physicians at one site may have more skill, thereby providing better treatment
 Publication bias (Positive-outcome bias)
 It is where researchers publish only favorable results
 It is the tendency to publish research with a positive outcome more frequently than research with a
negative outcome
 Negative outcome refers to finding nothing of statistical significance or causal consequence, not to finding
that something affects us negatively
 Media bias → refers to the tendency of the media to publish medical study stories with positive outcomes
much more frequently than such stories with negative outcomes

Reducing bias in clinical treatment trials


 Blind studies, placebos, crossover studies, and randomized studies are used to reduce bias
 Blind studies
 The expectations of subjects can influence the effectiveness of treatment
 Blind studies attempt to reduce this influence
 In a single-blind study → the subject does not know what treatment he or she is receiving
 In a double-blind study → neither the subject nor the clinician-evaluator knows what treatment the
subject is receiving
 Placebo responses
 In a blind drug study, a subject may receive a placebo (an inactive substance) rather than the active drug
 People receiving the placebo are the control group
 People receiving the active drug are the experimental group
 A number of subjects in research studies respond to the treatment with placebos alone (the placebo effect)
 Crossover studies
 In a crossover study, subjects are randomly assigned to one of the two groups
 Subjects in Group 1 first receive the drug, and subjects in Group 2 first receive the placebo
 Later in the crossover study, the groups switch → those in Group 1 receive the placebo, and those in
Group 2 receive the drug
 Because all of the subjects receive both drug and placebo, each subject acts as his or her own control
 Randomization
 In order to ensure that the proportion of sicker and healthier people is the same in the treatment and
control (placebo) groups, subjects are randomly assigned to the groups
 The number of subjects in each group does not have to be equal


Reliability
 Reliability refers to the reproducibility or dependability of results
 Interrater reliability → is a measure of whether the results of the test are similar when the test is administered
by a different rater or examiner
 Test-retest reliability → is a measure of whether the results of the test are similar when the person is tested a
second or third time

Validity
 It is a measure of the appropriateness of a test, that is, whether the test assesses what it was designed to assess
(e.g. Does a new IQ test really measure IQ or does it instead measure educational level?)
 Sensitivity and specificity are components of validity

 To be useful, testing instruments must be bias-free, reliable, and valid


 Coverage is the proportion of children in a population who have been screened over a period of time
 Acceptable screening tests should give a yield of at least 1 in 10,000 positive diagnoses of a treatable condition


Quantifying risk
 Absolute risk, relative risk, attributable risk, and the odds (or odds risk) ratio are measures used to quantify
risk in population studies
 The odds ratio is calculated for case-control studies and meta-analyses
 Absolute, relative, and attributable risks are calculated for cohort studies
 Risk
 It is the probability that an event will happen
 As one boy is born for every two births → the risk (probability) of giving birth to a boy is 1/2 or 0.50
 If one in every 100 patients suffers a side effect from a treatment → the risk is 1/100 or 0.01
 Absolute risk
 It is the probability that a person will have a medical event
 It is the ratio of the number of people who have a medical event divided by all of the people who could
have the event because of their medical condition
 Absolute risk is expressed as a percentage
 Absolute risk is equal to the incidence rate
 For example → research studies have found that among 10,000 people age 75 and over who take
a drug like ibuprofen for osteoarthritis pain, 15 of them will die from stomach bleeding. The absolute
risk of dying from stomach bleeding is 15 out of 10,000, or 0.15% of people taking ibuprofen
 Absolute risk reduction (ARR)
 It is the difference in absolute risks
 It is a way of measuring the size of a difference between two treatments
 For example → if the incidence rate of lung cancer among the people in Newark and in Trenton, New
Jersey, in 2013 is 20/1,000 and 15/1,000 respectively → the absolute risk is 20/1,000 or 2.0% in Newark
and 1.5% in Trenton → the ARR is 2.0% − 1.5% = 0.5%
 Another example → in a clinical trial of a drug to prevent migraines, 2 of 100 people taking the drug
experience a migraine (2%), compared with 4 of 100 people taking a placebo (4%) → The absolute risk
reduction is 4% − 2% = 2%. That is, there were 2% fewer migraines in people taking the drug
 Relative risk (RR)
 It compares the incidence rate of a disorder among individuals exposed to a risk factor (e.g. lung cancer
among smokers) with the incidence rate of the same disorder among individuals not exposed to risk factor
 RR is the incidence rate of the exposed (or treated) group (i.e. experimental event rate = EER) divided by
the incidence rate of the unexposed (or untreated) group (i.e. control event rate = CER)
 RR = EER ÷ CER
 For example → incidence rate of lung cancer among smokers in a city in New Jersey is 20/1,000 (0.02),
while the incidence rate of lung cancer among non-smokers in this city is 2/1,000 (0.002). Therefore, the
fold increase in risk of lung cancer (the relative risk) for smokers vs. nonsmokers in this New Jersey
population is 0.02 ÷ 0.002 = 10
 A relative risk of 10 means that in this city, if an individual smokes, his or her risk of getting lung cancer
is 10 times that of a non-smoker
 Relative risk reduction (RRR)
 It is the proportion or percentage by which an intervention reduces the event rate
 RRR = (CER – EER) ÷ CER
 For example → in a clinical trial of a drug to prevent migraines, 2 of 100 people taking the drug
experience a migraine (2%), compared with 4 of 100 people taking a placebo (4%)
 RRR = (0.04 – 0.02) ÷ 0.04 = 0.5 = 50%
 Attributable risk
 Attributable risk is useful for determining what would happen in a study population if the risk factor were
removed (e.g. determining how common lung cancer would be in a study if people did not smoke)
 Attributable risk is the incidence rate of the unexposed group subtracted from the incidence rate of the
exposed group


 Attributable risk = EER - CER


 For example → incidence rate of lung cancer among smokers in a city in New Jersey is 20/1,000 (0.02),
while the incidence rate of lung cancer among non-smokers in this city is 2/1,000 (0.002). Therefore, the
risk of lung cancer attributable to smoking (the attributable risk) in this New Jersey city's population is
0.02 – 0.002 = 0.018 (18/1,000)
 Odds
 They are calculated by dividing number of times an event happens by number of times it does not happen
 For every two births, one boy is born and one boy is not born → the odds of giving birth to a boy are
1/1 = 1, or 50:50
 If one in every 100 patients suffers a side effect from a treatment → odds are 1/99; while risk is 1/100
 Odds ratios (OR)
 An odds ratio is a measure of association between an exposure and an outcome
 An odds ratio is a relative measure of effect, which allows the comparison of the intervention group of a
study relative to the comparison or placebo group
 Since incidence data are not available in a case-control study, the odds ratio (i.e. odds risk ratio) can be
used as an estimate of relative risk in such studies
 They are calculated by dividing odds of having been exposed to a risk factor by odds in a control group
 An odds ratio of 1 → the likelihood of exposure to the risk factor is identical for both cases and controls
 An odds ratio of > 1 → the likelihood of exposure to the risk factor is greater for cases than for controls
 An odds ratio of < 1 → the likelihood of exposure to the risk factor is lower for cases than for controls
 Odds ratios should always be presented with their (95%) confidence intervals
 If CI includes 1 → the relationship between risk exposure and disease occurrence is not statistically
significant
 If CI does not include 1 → the relationship between risk exposure & disease occurrence is statistically
significant
 The width of the interval estimates the precision of the odds ratio
 A narrow CI → indicates a higher level of precision of the odds ratio
 A wide CI → indicates a lower level of precision of the odds ratio

Relative risk | Confidence interval | Interpretation
1.77 | (1.22 - 2.45) | Statistically significant (increased risk)
1.63 | (0.85 - 2.46) | NOT statistically significant (no difference in risk demonstrated)
0.78 | (0.56 - 0.94) | Statistically significant (decreased risk)
- If RR > 1.0, then subtract 1.0 and read as a percent increase. So 1.77 means one group has 77% more cases than the other.
- If RR < 1.0, then subtract from 1.0 and read as a reduction in risk. So 0.78 means one group has a 22% reduction in risk.


 Number needed to treat (NNT)


 NNT is the number of persons who need to take a treatment for one person to benefit from the treatment
 It is the reciprocal of ARR, and is more meaningful than the RRR for assessing the efficacy of a treatment
 RRR is constant regardless of risk, whereas NNT is likely to be higher in low risk groups
 NNT is 1 divided by the ARR [NNT = 1 ÷ ARR]
 Number needed to harm (NNH)
 NNH is the number of persons who need to be exposed to a risk factor for one person to be harmed who
would otherwise not be harmed
 NNH is 1 divided by the attributable risk [NNH = 1 ÷ attributable risk]

 Absolute risk = incidence rate


 Absolute risk reduction = the difference in absolute risks
 Relative risk = EER ÷ CER
 Attributable risk = EER – CER
 Relative risk reduction = (CER – EER) ÷ CER
 NNT = 1 ÷ absolute risk reduction
 NNH = 1 ÷ attributable risk
 Odds ratio = (A × D) ÷ (B × C)
 Odds and risk give similar values for rare events, but may be very different for common events
 Odds ratio are best described as the odds of an event in one group divided by the odds of the event in another
 The test threshold is the probability of disease below which neither testing nor treatment is warranted
 The treatment threshold is the probability of disease above which a test is not warranted, as the diagnosis is clinically very likely
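The boxed formulas can be checked with a short Python sketch. The numbers reuse the migraine-trial example from this section; the odds_ratio helper uses the conventional 2 × 2 labels (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls), which the text does not define at this point:

```python
# Migraine trial from this section: 2/100 events on the drug, 4/100 on placebo
eer = 2 / 100                  # experimental event rate
cer = 4 / 100                  # control event rate

rr = eer / cer                 # relative risk           -> 0.5
arr = cer - eer                # absolute risk reduction -> 0.02 (2%)
rrr = (cer - eer) / cer        # relative risk reduction -> 0.5 (50%)
nnt = 1 / arr                  # number needed to treat  -> 50.0

def odds_ratio(a, b, c, d):
    # a = exposed cases, b = exposed controls,
    # c = unexposed cases, d = unexposed controls
    return (a * d) / (b * c)
```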


Clinical probability
 Clinical probability is the number of times an event actually occurs divided by the number of times the event
can occur


Screening tests
 Screening tests are often used to identify individuals at risk of disease. Individuals who are positive on
screening may be investigated further to determine whether they actually have the disease
 Some of those who are screen-positive will not have the disease (i.e. false-positive screen test)
 Some of those who are screen-negative will have the disease (i.e. false-negative screen test)
 This gives a 4-fold situation as shown in the box below (a, b, c and d are the numbers of individuals who fall
into each of the 4 categories)

Screening test result Diseased Disease-free Totals


Positive (indicating possible disease) a (true +ve) b (false +ve) a+b
Negative c (false -ve) d (true –ve) c+d
Totals a+c b+d a+b+c+d

 There are several summary measures that are often used to quantify how good a screening test is:
 Sensitivity → is the proportion of true positives correctly identified by the test → proportion of true
positive screening test in diseased individuals → [a ÷ (a + c)] × 100
 Specificity → is the proportion of true negatives correctly identified by the test → proportion of true
negative screening test in healthy individuals → [d ÷ (b + d)] × 100
 Positive predictive value → describes the chance of a patient having the disease if the test is positive →
proportion of true positive screening test in all positive individuals → [a ÷ (a + b)] × 100
 Negative predictive value → describes the chance of a patient being disease free if the test is negative →
proportion of true negative screening test in all negative individuals → [d ÷ (c + d)] × 100
 Prevalence (pretest probability) → (a + c) ÷ (a + b + c + d)
 Accuracy → proportion of true +ve and true –ve to all individuals → (a + d) ÷ (a + b + c + d)
 Systematic error = (a + b) ÷ (a + c)
 For all of these measures, larger values are associated with better screening tests
 A high sensitivity implies few false negatives, which is important for very rare or lethal diseases
 A high specificity implies few false positives, which is important for common diseases (e.g. diabetes)
 Predictive value is the test’s ability to identify those individuals who truly have the disease (true positive)
amongst all those individuals whose screening tests are positive (true positive + false positive)
 The sensitivity, specificity, and likelihood ratios do not depend on the prevalence of the disease
 The PPV and NPV depend on the prevalence of the disease, and may vary from population to population
 If the prevalence is increased → PPV increases, NPV decreases & the proportion of false positives decreases
 Lowering the screening cut-off level:
 Increases the sensitivity, false-positive results, and negative predictive value (NPV)
 Decreases the specificity, false-negative results, and positive predictive value (PPV)
 No change in the incidence or prevalence of the disease

Likelihood ratios
 These compare the probability of the test result given that the individual has the disease to the probability of
the result occurring if they are disease free
 They are calculated from the sensitivity and specificity, and are not dependent on disease prevalence
 Positive likelihood ratio (LR+) → sensitivity ÷ (100 – specificity)
 Negative likelihood ratio (LR–) → (100 – sensitivity) ÷ specificity
 LR > 1 indicates that the test result is associated with the presence of the disease
 LR < 1 indicates that the test result is associated with the absence of the disease
 Pre-test odds → the odds of having the disease before you do the test → [pretest probability ÷ (1 - pretest
probability)]
 Post-test odds → the odds of having the disease after you did the test → pre-test odds × LR+


 Pre-test probability (~ prevalence) → is the proportion of people in the population at risk who have the
disease at a specific time or time interval (i.e. the point prevalence or the period prevalence of the disease)
 Post-test probability → is the proportion of patients testing positive who truly have the disease → [post-test
odds ÷ (1 + post-test odds)]

 The rule of thumb is that a high sensitivity helps to rule out disease (SnOut) and a high specificity helps to rule in
(SpIn) disease

Example
 A screening test is applied to 300 patients with and without disease X. Of 100 who have the disease, 60 test
positive; and of 200 without the disease, only 20 test positive
 The following table can be constructed:

Screening test result Disease X Disease-free Totals


Positive (indicating possible disease) 60 20 80
Negative 40 180 220
Totals 100 200 300

 Sensitivity = a ÷ (a + c) = 60 ÷ (60 + 40) = 0.6 or 60%


 Specificity = d ÷ (b + d) = 180 ÷ (20 + 180) = 0.9 or 90%
 Positive predictive value = a ÷ (a + b) = 60 ÷ (60 + 20) = 0.75 or 75%
 Negative predictive value = d ÷ (c + d) = 180 ÷ (40 + 180) = 0.82 or 82%
 Prevalence (pre-test probability) = (a + c) ÷ (a + b + c + d) = 100 ÷ 300 = 0.33 or 33%
 Accuracy = (a + d) ÷ (a + b + c + d) = (60 + 180) ÷ 300 = 0.8 or 80%
 Systematic error = (a + b) ÷ (a + c) = (60 + 20) ÷ (60 + 40) = 0.8
 Positive likelihood ratio (LR+) = [sensitivity ÷ (100 – specificity)] = 60 ÷ (100 – 90) = 6
 Negative likelihood ratio (LR–) = [(100 – sensitivity) ÷ specificity] = (100 – 60) ÷ 90 = 0.44
 Pre-test odds = [pretest probability ÷ (1 - pretest probability)] = 0.33 ÷ (1 – 0.33) = 0.49
 Post-test odds = pre-test odds × LR+ = 0.49 × 6 = 2.94
 Post-test probability = [post-test odds ÷ (1 + post-test odds)] = 2.94 ÷ (1 + 2.94) = 0.75
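These calculations are easy to reproduce in Python. The sketch below recomputes the worked example directly from the 2 × 2 counts; note that keeping the prevalence as an exact fraction (1/3) gives pre-test odds of 0.5 rather than the rounded 0.49 above, and the same post-test probability of 0.75:

```python
a, b, c, d = 60, 20, 40, 180                 # 2x2 counts from the example
n = a + b + c + d

sensitivity = a / (a + c)                    # 0.60
specificity = d / (b + d)                    # 0.90
ppv = a / (a + b)                            # 0.75
npv = d / (c + d)                            # ~0.82
prevalence = (a + c) / n                     # ~0.33 (pre-test probability)
accuracy = (a + d) / n                       # 0.80

lr_pos = sensitivity / (1 - specificity)     # 6.0
lr_neg = (1 - sensitivity) / specificity     # ~0.44

pretest_odds = prevalence / (1 - prevalence)          # 0.5
posttest_odds = pretest_odds * lr_pos                 # 3.0
posttest_prob = posttest_odds / (1 + posttest_odds)   # 0.75
```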

Example
 In a study of the utility of serum procalcitonin level for early diagnosis of bacteraemia, 100 consecutive febrile
patients admitted to hospital were tested for serum procalcitonin and culture of bacteria. It was reported that
serum procalcitonin level above 0.5 microgram/L had a specificity of 90% in detecting bacteraemia
 A specificity of 90% means that 90% of individuals without the disease (bacteraemia) will test negative
(serum procalcitonin levels less than 0.5 µg/L)
 Therefore, 10% of individuals without the disease (bacteraemia) would be expected to falsely test positive
(serum procalcitonin levels more than 0.5 µg/L)

Example
 In a study of the utility of serum procalcitonin level for early diagnosis of bacteraemia, 100 consecutive febrile
patients admitted to hospital were tested for serum procalcitonin and culture of bacteria. It was reported that
serum procalcitonin level below 0.5 microgram/L had a negative predictive value of 95% in detecting
bacteraemia.
 NPV of 95% means that 95% of patients who have serum procalcitonin level below 0.5 microgram/L (tested
negative) do not have bacteraemia
 Therefore, 5% of patients who have serum procalcitonin level below 0.5 microgram/L (falsely tested negative)
would be expected to have bacteraemia


Distributions
Types of data
 Data may be either qualitative (categoric) or quantitative (numeric)
a) Qualitative data → can be nominal or ordinal
b) Quantitative data → can be discrete or continuous
 Qualitative (categoric) data:
 Deals with descriptions
 Data can be observed but not measured. For example:
 Colour of eyes → blue, green, brown, etc
 Socio-economic status → low, middle, or high
 Qualitative data are classified as either:
 Nominal → if there is no natural order between the categories (e.g. eye colour)
 Ordinal → if there is a natural order between the categories (e.g. exam results, socio-economic status)
 If there are only two categories, then the variable is binary
 Quantitative (numeric) data:
 Deals with numbers
 Data can be measured (e.g. length, height, area, volume, weight, speed, time, temperature, humidity)
 Quantitative data are classified as either:
 Discrete → if the measurements are integers (i.e. a whole number such as 3 or 4, but not 3.5) (e.g.
number of people in a household, number of cigarettes smoked per day, number of antibiotic courses)
 Continuous → if the measurements can take on any value, usually within some range (BW, HR, BP)
 Quantities such as sex and weight are called variables, because the value of these quantities vary from
one observation to another
 Numbers calculated to describe important features of the data are called statistics

Ranking data
 Ranking the data involves putting the values in numerical order, and then assigning new values to denote
where in the ordered set they fall
 We give the smallest value the rank 1, the next largest value the number 2, the next largest number 3 etc
 The numbers 1,2,3,... that are assigned to the various values are called the ranks
 The highest value will have a rank equal to the total number in the sample (i.e. if there are n values in the
sample, the largest value will have rank n)
 Equal data values should be given equal ranks. To achieve this, the corresponding ranks will be averaged
between the data values
 Non-parametric tests use the ranks rather than the original data values in the subsequent analysis
 The median (i.e. middle ranked value) is used as a measure of centre
 Non-parametric tests make comparisons of medians between groups as opposed to parametric tests which
compare means
 The ranks yield a lot less information than the original values and are not very sensitive to changes in the data
 For example → rank the following sample of 14 values (2, 34, -5, -7, 25, 2, 34, 34, 67, 28, -2, 0, 7, 23)
 Sorting the values into the order of magnitude gives → -7, -5, -2, 0, 2, 2, 7, 23, 25, 28, 34, 34, 34, 67
 Ranks are assigned

Values -7 -5 -2 0 2 2 7 23 25 28 34 34 34 67
Ranks 1 2 3 4 ? ? 7 8 9 10 ? ? ? 14
 There are 14 numbers → the smallest value has a rank of 1, while the largest number has a rank of 14
 The ranks 5 and 6 need to be assigned to the two '2's → hence assign rank (5+6)/2 = 5½ to each value 2
 The ranks 11,12 and 13 need to be assigned to the three '34's → hence assign rank (11+12+13)/3 = 12 to
each value 34


 Thus, the ranks for the sample will be:

Values -7 -5 -2 0 2 2 7 23 25 28 34 34 34 67
Ranks 1 2 3 4 5½ 5½ 7 8 9 10 12 12 12 14
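A small Python function (illustrative, not from the text) that assigns average ranks to ties reproduces the table above:

```python
def average_ranks(values):
    # Rank values 1..n in order of magnitude, giving tied values
    # the average of the ranks they would otherwise occupy
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1                             # j is one past the tie group
        rank_of[ordered[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return [rank_of[v] for v in values]

data = [2, 34, -5, -7, 25, 2, 34, 34, 67, 28, -2, 0, 7, 23]
print(average_ranks(sorted(data)))
# ranks: 1, 2, 3, 4, 5.5, 5.5, 7, 8, 9, 10, 12, 12, 12, 14 (printed as floats)
```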

Measures of central tendency


1. Mean (= average)
 The sum of the values of the observations divided by the numbers of observations
 Use the mean to describe the middle of a set of data that does not have an outlier
2. Median (= middle)
 The central value of data series when the values are lined up in order of magnitude (50th percentile value)
 Use the median to describe the middle of a set of data that does have an outlier
 To get the median → you first put your numbers in an ascending or descending order
 If you have an odd number of numbers → the median is the center number (e.g. 3 is the median for the
numbers 1, 1, 3, 4, 9)
 If you have an even number of numbers → the median is the average of the two innermost numbers (e.g.
2.5 is the median for the numbers 1, 2, 3, 7)
3. Mode (= most frequent)
 The most frequently occurring value in a set of observations
 It is not always representative of the true central value
 Use the mode when the data is non-numeric or when asked to choose the most popular item
 It is possible to have more than one mode, and it is possible to have no mode
 If there is no mode → write "no mode", and do not write zero (0)
 For example → find the mean, median and mode for the following data: 5, 15, 10, 15, 5, 10, 10, 20, 25, 15
 Firstly, you will need to organize the data in an ascending order → 5, 5, 10, 10, 10, 15, 15, 15, 20, 25
 Mean = (5 + 5 + 10 + 10 + 10 + 15 + 15 + 15 + 20 + 25) ÷10 = 130 ÷ 10 = 13
 Median → 5, 5, 10, 10, 10, 15, 15, 15, 20, 25
 Listing the data in order (from smallest to largest) is the easiest way to find the median
 The numbers 10 and 15 both fall in the middle
 Average these two numbers to get the median → (10 + 15) ÷ 2 = 12.5
 Mode → Two numbers appear most often (10 and 15). There are three 10's and three 15's. In this
example there are two answers for the mode
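For reference, Python's standard statistics module reproduces this example (multimode, available from Python 3.8, returns all modes):

```python
from statistics import mean, median, multimode

data = [5, 15, 10, 15, 5, 10, 10, 20, 25, 15]
print(mean(data))       # 13
print(median(data))     # 12.5
print(multimode(data))  # [15, 10] -- the two modes, 10 and 15
```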
 What will happen to the measures of central tendency if we add the same amount to all data values, or
multiply each data value by the same amount?

Data | Mean | Median | Mode
Original data set: 6, 7, 8, 10, 12, 14, 14, 15, 16, 20 | 12.2 | 13 | 14
Add 3 to each data value: 9, 10, 11, 13, 15, 17, 17, 18, 19, 23 | 15.2 | 16 | 17
Multiply each data value by 2: 12, 14, 16, 20, 24, 28, 28, 30, 32, 40 | 24.4 | 26 | 28

 When added → since all values are shifted by the same amount, the measures of central tendency all shift by
the same amount. If you add 3 to each data value, you will add 3 to the mean, mode and median
 When multiplied → since all values are affected by the same multiplicative factor, the measures of central
tendency are affected in the same way. If you multiply each data value by 2, you will multiply the mean, mode
and median by 2


 Consider two sets of scores that are identical except for a single score, which in the second set is WAY out of
range in regard to the other scores. Such a value is called an outlier
 With the outlier → the mean changes, but the median does not (e.g. the set 1, 2, 3, 4, 5 has a mean of 3 and a
median of 3; replacing the 1 with −95 gives a mean of −16.2, but the median is still 3)

Shapes of distribution
A. Normal distribution (Gaussian or Bell-shaped distribution)
 The normal distribution is symmetrical and bell-shaped, with one side the mirror image of the other
 If distribution is normal:
 The mean and standard deviation are preferable as a summary of the data
 The mean, median, and mode are equal
 About 68% of the observations fall within one standard deviation of the mean
 About 95% of the observations fall within 2 standard deviations of the mean
 About 99.7% of the observations fall within 3 standard deviations of the mean
B. Skewed distribution
 A distribution that is asymmetric (skewed)
 If distribution is skewed → the median & interquartile range (IQR) are preferable as a summary of data
 The skew is named according to the direction in which the tail points
 A skewed distribution with a longer tail among the lower values is skewed to left or negatively skewed
 A skewed distribution with a longer tail among the higher values is skewed to right or positively skewed
 In general, if the curve is skewed:
 The mean is always towards the long tail
 The mode is always on the top of the curve
 The median is somewhere between the two
 When the data are negatively skewed → mean < median < mode
 When the data are positively skewed → mean > median > mode

 Logarithmic conversion may transform a skewed distribution to a normal distribution


 A positive skew has the tail to the right, and the mean greater than the median
 A negative skew has the tail to the left, and the median greater than the mean
 For skewed distributions, the median is a better representation of central tendency than is the mean
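The 68/95/99.7 rule quoted above for the normal distribution can be verified with Python's statistics.NormalDist (Python 3.8+):

```python
from statistics import NormalDist

z = NormalDist()                       # standard normal: mean 0, SD 1
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)      # proportion within k SDs of the mean
    print(f"within {k} SD: {within:.1%}")
# within 1 SD: 68.3%; within 2 SD: 95.4%; within 3 SD: 99.7%
```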


Measures of variability
 Measures of variability tell you how "spread out" or how much variability is present in a set of numbers
 Measures of variability should be reported along with measures of central tendency because they provide very
different but complementary and important information
 An easy way to get the idea of variability is to look at two sets of data, one that is highly variable and one that
is not very variable
 For example, which of these two sets of numbers appears to be the most spread out, Set A or Set B?
 Set A → 93, 96, 98, 99, 99, 99, 100
 Set B → 10, 29, 52, 69, 87, 92, 100
 If you said Set B is more spread out, then you are right! The numbers in Set B are more "spread out"; that is,
they show more variability
 Measures of variability include the range, the variance, and the standard deviation
 Range
 It is the difference between the largest and smallest value
 For example the range in Set A shown above is 7, and the range in Set B shown above is 90
 Variance (σ2)
 It is the average of the squared differences between each value and the mean
 It measures how far a set of numbers is spread out
 It tells you (exactly) the average deviation from the mean, in "squared units"
 A small variance indicates that the data points tend to be very close to the mean and hence to each other,
while a high variance indicates that the data points are very spread out from the mean and from each other
 A variance of zero indicates that all the values are identical (e.g. for the data 3, 3, 3, 3, 3, 3, the variance
and standard deviation will equal zero)
 For example, how to calculate the variance for the following data → 3, 4, 4, 5, 6, 8
 Firstly, calculate the mean → (3 + 4 + 4 + 5 + 6 + 8) ÷ 6 = 5
 Then for each number; subtract the mean and square the result → [(3-5)2 + (4-5)2 + (4-5)2 + (5-5)2 +
(6-5)2 + (8-5)2] = 4 + 1 + 1 + 0 + 1 + 9 = 16
 Finally, calculate the average of those squared differences → 16 ÷ 6 = 2.7
 Standard deviation (σ)
 It is the square root of the variance
 The standard deviation tells you (approximately) how far the numbers tend to vary from the mean
 SD should only be used when the data has a normal distribution
 It is important to distinguish between the standard deviation of a population and the standard deviation of
a sample. They have different notation, and they are computed differently
 The standard deviation of a population is denoted by σ, and the standard deviation of a sample, by s
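Python's statistics module makes the population/sample distinction explicit; this sketch reuses the data from the variance example above:

```python
from statistics import pvariance, pstdev, variance, stdev

data = [3, 4, 4, 5, 6, 8]
print(pvariance(data))  # ~2.67 -- population variance (divide by n), as in the example
print(pstdev(data))     # ~1.63 -- population standard deviation (sigma)
print(variance(data))   # 3.2   -- sample variance (divide by n - 1)
print(stdev(data))      # ~1.79 -- sample standard deviation (s)
```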

 Percentile or centile → the value below which a certain percentage of values fall. The value at the 50th percentile
means that half the data is above and half below that value.
 Interquartile range → usually describes the data which fall between the 25th and 75th percentile
 Lead time → the interval between identification of a condition by screening and the development of symptoms
 Lag time → the interval between an intervention being assessed as clinically useful and an intervention actually
entering everyday practice


Standard error of mean and confidence intervals


Standard error of the mean (SEM, σM)
 It is also known as “standard error” (SE)
 It is the standard deviation of the sampling distribution of the mean
 It is a measure of how precisely the sample mean approximates the population mean
 It indicates how much the sample mean is expected to vary from sample to sample (the SD, in contrast, measures the scatter of the individual values around the mean)
 It is calculated by dividing the standard deviation by the square root of the sample size → SEM = σ ÷ √N
 Where σ is the standard deviation of the original distribution and N is the sample size
 The standard error of the mean must be smaller than the SD
 SE is NOT used to calculate the chi-squared test, as that test compares proportions rather than means
 The SD and SE may only be used in a normally distributed population
 SEM is affected by two main factors, illustrated in the sketch below:
a) Standard deviation of original population values → the greater the SD, the greater the SEM
b) Sample size → the greater the sample size, the lower the SEM
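A minimal Python sketch of the SEM formula and the two factors above (the numbers are arbitrary):

    import math

    def sem(sd, n):
        # SEM = SD / sqrt(N)
        return sd / math.sqrt(n)

    print(sem(10, 25))    # 2.0 -> baseline
    print(sem(20, 25))    # 4.0 -> the greater the SD, the greater the SEM
    print(sem(10, 100))   # 1.0 -> the greater the sample size, the lower the SEM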
Confidence interval (CI)
 It is a range of values in which we are fairly confident the true population value lies
 Therefore, a confidence interval gives an estimated range (interval) of values which is likely to include an
unknown population parameter
 Confidence interval is usually reported as 95% CI (i.e. we can be 95% confident that the population value lies
within those limits)
 CI = mean ± (z × SE)
 The CI is equal to the mean of the sample (X) plus or minus the z score multiplied by the SE
 For the 95% CI → a z score of 2 is used (more precisely, 1.96)
 For the 99% CI → a z score of 2.5 is used (more precisely, 2.58)
 For the 99.7% CI → a z score of 3 is used
 Confidence interval is affected by two main factors:
a) Variation in the population → the greater the variation, the wider the confidence interval
b) Sample size → the larger the sample size, the smaller the confidence interval
Example
 A research study is designed to identify the mean body weight of women between the ages of 30 and 39 in Los
Angeles. To do this, a researcher obtains the body weights of an unbiased sample of 81 women in Los Angeles
in this age group. The mean body weight of the women in the sample is 135 pounds with a SD of 18
 What is the estimated standard error of the mean for this population?
 SEM = 18 ÷ square root of 81 = 18 ÷ 9 = 2
 What is the 95% confidence interval for this sample?
 For the 95% CI → a z score of 2 is used
 95% CI = 135 ± (2 × 2) = 135 ± 4 = 131 - 139
 What is the 99% confidence interval for this sample?
 For the 99% CI → a z score of 2.5 is used
 99% CI = 135 ± (2.5 × 2) = 135 ± 5 = 130 – 140
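The example can be checked in a few lines of Python (using the rounded z scores from above; the exact values are 1.96 and 2.58):

    import math

    mean, sd, n = 135, 18, 81                      # body-weight example above
    sem = sd / math.sqrt(n)                        # 18 / 9 = 2.0

    ci_95 = (mean - 2 * sem, mean + 2 * sem)       # (131.0, 139.0)
    ci_99 = (mean - 2.5 * sem, mean + 2.5 * sem)   # (130.0, 140.0)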
 The wider the CI → the less precise and the more accurate the estimate of the mean
 The narrower the CI → the more precise and the less accurate the estimate of the mean
 Precision reflects how tightly the mean is estimated (a narrower interval is more precise), and accuracy reflects how likely the interval is to capture the true mean
 If the CI contains zero → the difference will not be statistically significant (e.g. 95% CI is –2 to 22)
 The standard deviation can never be more than the largest difference between the mean and the most extreme value, and it is typically only around 65% of that difference or less
Significance tests
 Statistical significance tests, or hypothesis tests, use the sample data to assess how likely some specified null
hypothesis is to be correct
 The measure of “how likely” is given by a probability known as the p-value
 Usually, the null hypothesis is that there is “no difference” between the groups
Null hypotheses
 The null hypothesis (H₀) underlies all statistical tests
 The null hypothesis says that the findings are the result of chance or random factors
 If you want to show that a drug works, the null hypothesis will be that the drug does NOT work
 Example of the null hypothesis:
 A group of 20 patients who have similar systolic blood pressures at the beginning of a study (Time 1) is
divided into two groups of 10 patients each. One group is given daily doses of an experimental drug meant
to lower blood pressure (experimental group); the other group is given daily doses of a placebo (placebo
group). Blood pressure in all 20 patients is measured 2 weeks later (Time 2)
 The null hypothesis assumes that there are no significant differences in blood pressure between the two
groups at Time 2
 If, at Time 2, patients in the experimental group show systolic blood pressures similar to those in the
placebo group → the null hypothesis is not rejected
 If, at Time 2, patients in the experimental group have significantly lower or higher blood pressures than
those in the placebo group → the null hypothesis is rejected
Critical level (p-value)
 P-value is the probability of obtaining the observed results, or more extreme ones, if the null hypothesis were true
 P-value is a measure of the strength of the evidence against the null hypothesis
 Since the p-value is a probability, it takes values between 0 and 1
 The p-value is never actually zero, so we can never totally disprove the null hypothesis
 The smaller the value (nearer to 0), the less likely the null hypothesis is to be true, suggesting that there is
likely to be a difference between groups
 A P-value equal to or less than 0.05 is generally considered to be statistically significant
 If p ≤ 0.05 → reject the null hypothesis
 If p > 0.05 → do not reject the null hypothesis
 P = 0.05 → means that the difference would have occurred by chance only 1 time in 20 (5%). This is considered to be ‘statistically significant’
 P = 0.01 → means that the difference would have occurred by chance only 1 time in 100 (1%). This is considered to be ‘highly significant’
 P = 0.001 → means that the difference would have occurred by chance only 1 time in 1,000 (0.1%). This is considered to be ‘very highly significant’
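As an illustration of the decision rule, the blood-pressure example above might be analysed as follows (the readings are invented; scipy is assumed to be available):

    from scipy import stats

    # Hypothetical systolic BP (mmHg) at Time 2 in the two groups of 10 patients
    drug    = [128, 131, 125, 130, 127, 129, 126, 132, 124, 128]
    placebo = [138, 142, 135, 140, 137, 141, 139, 136, 143, 138]

    t_stat, p_value = stats.ttest_ind(drug, placebo)

    if p_value <= 0.05:
        print("p =", round(p_value, 4), "-> reject the null hypothesis")
    else:
        print("p =", round(p_value, 4), "-> do not reject the null hypothesis")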
Meaning of the P-value
 Provides a criterion for making decisions about the null hypothesis
 Quantifies the chance that a decision to reject the null hypothesis will be wrong
 Indicates statistical significance, not clinical significance or likelihood of benefit
 Limits to the p-value → the p-value does NOT tell us:
 The chance that an individual patient will benefit
 The percentage of patients who will benefit
 The degree of benefit expected for a given patient
Types of error
 Type 1 error (false positive)
 Rejection of the null hypothesis when it is really true (i.e. assuming that there is a difference between data
sets when in fact there is not) (e.g. asserting that the drug works when it doesn't)
 The chance of type I error is given by the p-value
 If p = 0.05 → the chance of a type I error is 5 in 100, or 1 in 20
 The probability of a Type I error is designated by the Greek letter alpha (α), and is called Type I error rate
or significance level
 Type 2 error (false negative)
 Failing to reject the null hypothesis when it is really false (i.e. assuming that there is no difference when
there is one) (e.g. asserting the drug does not work when it really does)
 The chance of a type II error cannot be directly estimated from the p-value
 The probability of a Type II error is designated by the Greek letter beta (β), and is called Type II error rate
 One minus the type II error (1-β) is the power of the test
 Type I error (error of commission) is generally considered worse than type II error (error of omission)
 If the null hypothesis is not rejected → there is no chance of a type I error
 If the null hypothesis is rejected → there is no chance of a type II error
Significance level (α)
 A Type I error occurs when the researcher rejects a null hypothesis, when it is actually true
 The probability of committing a Type I error is called the significance level, and is often denoted by α
 It refers to the amount of risk of wrongly rejecting a true null hypothesis that the researcher is willing to accept
 The significance level is used in hypothesis testing as follows:
 First, the difference between the results of the experiment and the null hypothesis is determined
 Then, assuming the null hypothesis is true, the probability of a difference that large or larger is computed
 Finally, this probability is compared to the significance level
 If the probability (P-value) is less than or equal to the significance level (α), then the null hypothesis is
rejected and the outcome is said to be statistically significant
 Traditionally, experimenters have used either the 0.05 level (sometimes called the 5% level) or the 0.01 level
(1% level), although the choice of levels is largely subjective
 The lower the significance level, the more the data must diverge from the null hypothesis to be significant.
Therefore, the 0.01 level is more conservative than the 0.05 level
Statistical power
 In statistics, power is the capacity to detect a difference if there is one
 Just as increasing the power of a microscope makes it easier to see what is going on in histology, increasing
statistical power allows us to detect what is happening in the data
 Power is directly related to type II error
 Power = 1 – type II error
 Power indicates the probability of not making a Type II error
 Power is the probability of correctly rejecting a false null hypothesis
 The power of a hypothesis test is affected by 3 factors, illustrated in the sketch below:
a) Sample size (n) → other things being equal, the larger the sample size, the greater the power of the test
b) Significance level (α) → the higher the significance level, the higher the power of the test
c) The effect size → the greater the effect size, the greater the power of the test
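A rough simulation sketch of all three factors (the settings are arbitrary; numpy and scipy are assumed to be available):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def estimated_power(effect_size, n, alpha=0.05, sims=2000):
        # Fraction of simulated two-group experiments in which a true
        # difference of `effect_size` SDs is detected at level `alpha`
        rejections = 0
        for _ in range(sims):
            a = rng.normal(0.0, 1.0, n)
            b = rng.normal(effect_size, 1.0, n)
            if stats.ttest_ind(a, b).pvalue <= alpha:
                rejections += 1
        return rejections / sims

    print(estimated_power(0.5, n=30))               # baseline
    print(estimated_power(0.5, n=100))              # larger sample -> greater power
    print(estimated_power(0.5, n=30, alpha=0.10))   # higher significance level -> greater power
    print(estimated_power(1.0, n=30))               # larger effect -> greater power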
Effect size
 The effect size is the difference between the true value and the value specified in the null hypothesis
 Effect size = True value - Hypothesized value
 For example → suppose the null hypothesis states that a population mean is equal to 100. A researcher might
ask: what is the probability of rejecting the null hypothesis if the true population mean is equal to 90? In this
example, the effect size would be 90 − 100, which equals −10
 Power calculations should be performed before any study commences, and do not help directly with interpretation
of the results
 The expression p < 0.01 indicates a smaller probability that the result occurred by chance alone than does p < 0.05
 Power is increased by increasing the number of patients in the study
 P < 0.0001 means that there is less than a 1 in 10,000 probability that the results could have occurred by chance
One- and two-tailed tests
 A one-tailed test (one-sided test)
 It is concerned with differences between observations in one direction (i.e. only one tail of the normal
distribution curve) (e.g. whether drug A is better than a placebo)
 It is a test of a statistical hypothesis, where the region of rejection lies on one side of the sampling distribution
 For example → suppose the null hypothesis states that the mean is ≤ 10. The alternative hypothesis would
be that the mean is > 10. The region of rejection would consist of a range of numbers located on the right
side of sampling distribution; that is, a set of numbers greater than 10
 A two-tailed test (two-sided test)
 It is concerned with differences between observations in either direction (e.g. whether drug A or drug B is
better)
 It is a test of a statistical hypothesis, where the region of rejection lies on both sides of the sampling distribution
 For example → suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis
would be that the mean is less than 10 or greater than 10. The region of rejection would consist of a range
of numbers located on both sides of sampling distribution; that is, the region of rejection would consist
partly of numbers that were less than 10 and partly of numbers that were greater than 10
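The mean-of-10 example above can be tested both ways in Python (hypothetical data; the `alternative` argument requires scipy ≥ 1.6):

    from scipy import stats

    # Hypothetical sample; null hypothesis: the population mean is 10
    sample = [11.2, 9.8, 12.1, 10.9, 11.5, 10.4, 12.3, 11.0]

    two_sided = stats.ttest_1samp(sample, popmean=10, alternative='two-sided')
    one_sided = stats.ttest_1samp(sample, popmean=10, alternative='greater')

    print(two_sided.pvalue)   # rejection region on both sides of the distribution
    print(one_sided.pvalue)   # rejection region on the right side only
    # When the sample mean lies above 10, the one-tailed p-value is half the two-tailed one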
Clinical versus statistical significance
 Statistical significance is not the same as clinical significance
 Although a study may show that the results from drug A are statistically significantly better than those for
drug B, we have to consider:
 The magnitude of the improvement
 The costs
 The ease of administration
 The potential side-effects of the two drugs, etc
before deciding that the result is clinically significant, and that drug A should be introduced in preference to
drug B
 Therefore, the question about whether a new treatment should be used in practice requires evaluating its
importance in the real world
 The confidence interval for a difference provides all of the information needed to do a significance test and p-values
are not needed in the presence of the 95% confidence interval for a difference
 If the CI for a difference contains zero → the difference is not statistically significant (e.g. a 95% CI of –2 to 22)
 The CI for a difference is considered significant only if it does not contain zero
 The CI for a relative risk or odds ratio is considered significant only if it does not contain 1
Types of significance tests
 Statistical tests are used to analyze data from medical studies
 The results of statistical tests indicate whether to reject or not reject the null hypothesis
 Statistical tests can be parametric or nonparametric
Parametric tests
 Parametric tests use population parameters (e.g. mean scores) and are usually used to identify the presence of
statistically significant differences between groups when:
 Data is normally distributed
 The sample size is large
 Commonly used parametric statistical tests include:
a) t-test (sometimes called “Student's t-test” or “Student's paired t-test”)
b) Analysis of variance (ANOVA)
c) Pearson's coefficient of linear correlation
 Student's t-test
 It is a statistic that checks if two means (averages) are reliably different from each other
 It is used for comparing a single small sample with a population or to compare the difference in means
between two small samples
 Use Student's t-test when you have one nominal variable and one measurement variable, and you want to
compare the mean values of the measurement variable. The nominal variable must have only two values,
such as "male" and "female" or "treated" and "untreated"
 It is inappropriate if more than two means are compared
 The larger the t-value → the larger the difference between the two means relative to their variability → the
smaller the p-value → the stronger the evidence that the null hypothesis is untrue
 t-value = difference between the group means ÷ standard error of the difference
 The t-test may be paired or unpaired
 Unpaired (independent) t-test → used to compare the average (means) of two groups of numerical
data, provided the values are approximately normally distributed and the samples are not small
 Paired (dependent) t-test → compares the means of two small paired observations of numerical data,
either on the same individual or on matched individuals
 If the two-sample t-test is invalid (non-normality and/or small samples) → Mann-Whitney U test is used
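A short sketch of the three variants just described (all data invented; scipy assumed):

    from scipy import stats

    # Unpaired (independent) t-test: two separate groups
    treated   = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 5.8]
    untreated = [4.2, 4.5, 4.0, 4.8, 4.1, 4.6, 4.3, 4.4]
    print(stats.ttest_ind(treated, untreated))

    # Paired (dependent) t-test: the same individuals measured twice
    before = [5.1, 4.8, 6.0, 5.5, 5.9]
    after  = [4.6, 4.5, 5.2, 5.0, 5.3]
    print(stats.ttest_rel(before, after))

    # Non-normal and/or small samples: Mann-Whitney U test instead
    print(stats.mannwhitneyu(treated, untreated))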
 Analysis of variance (ANOVA)
 This is a set of techniques used to compare the means of more than two samples
 They can also allow for independent variables which may affect the outcome
Non-parametric tests
 They are usually used to identify the presence of statistically significant differences between groups when:
 Data is not normally distributed
 The sample size is small
 Commonly used non-parametric statistical tests include:
a) Chi-square (χ²) test
b) Wilcoxon signed rank test → used for matched or paired data
c) Wilcoxon rank sum test → used for unpaired data
d) Mann-Whitney U test → gives equivalent results to the Wilcoxon rank sum test
e) Kruskal-Wallis test
 Chi-square (χ²) test
 It is used to compare proportions between two groups, provided the samples are large enough and the
proportions in each group are not extreme
 If only small samples are available and/or the proportions are extreme → Fisher’s exact test is used
 If proportions are paired → McNemar’s test should be used
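For illustration, both tests applied to a hypothetical 2 × 2 table (McNemar's test for paired proportions lives in statsmodels rather than scipy):

    from scipy import stats

    # Hypothetical 2 x 2 table: rows = exposed / unexposed, columns = ill / well
    table = [[30, 10],
             [18, 22]]

    chi2, p, dof, expected = stats.chi2_contingency(table)   # large samples
    print(chi2, p)

    odds_ratio, p_exact = stats.fisher_exact(table)          # small samples / extreme proportions
    print(odds_ratio, p_exact)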
Goal | Normal distribution | Non-normal distribution
Describe one group | Mean, SD | Median, interquartile range
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test
Compare two unpaired groups | Unpaired t test | Mann-Whitney test
Compare two paired groups | Paired t test | Wilcoxon test
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test
Quantify association between two variables | Pearson correlation | Spearman correlation
Predict value from another measured variable | Simple linear regression or non-linear regression | Non-parametric regression
Predict value from several measured or binomial variables | Multiple linear regression or multiple non-linear regression | –
 Non-parametric tests are less powerful than parametric tests
 t-Tests are appropriate for continuous numeric outcomes
 The t-test is used to compare the differences between 2 means
 Proportions can be compared between two groups using the chi-square or Fisher’s exact tests
 If the samples are large and the proportions are not extreme → chi-square test
 If the samples are small and the proportions are extreme → Fisher’s exact test
 Chi-square is not applicable for extreme percentages (e.g. 10% and 0.5%). Therefore, in order to compare 10%
with 0.5% seen in two different groups, Fisher’s exact test should be used
 The chi-square statistic is never negative; it ranges from 0 upwards with no fixed maximum (it is the correlation coefficient, not chi-square, that ranges from −1.0 to +1.0)
 The larger the Chi-square value → the lower the p-value → the more statistically significant difference
 The Fisher exact test is used only for categorical measures
 95% confidence intervals are calculated as the mean ± 1.96 times the standard error
 The most appropriate statistical test for a non-normally distributed continuous measure when the groups are
independent is the Mann-Whitney U test
Correlation
 Correlation exists if there is a linear relationship between two continuous variables (e.g. height and weight)
 The strength of the relationship is denoted by the Pearson correlation coefficient, which is denoted by r
 The correlation coefficient indicates how closely the points lie to a line
 The correlation coefficient gives a measure of the linear association between two continuous measurements
 If the two variables move in the same direction → r is positive (e.g. as height increases, body weight
increases, or as calorie intake decreases, body weight decreases)
 If the two variables move in opposite directions → r is negative (e.g., as time spent exercising increases, body
weight decreases)
 The r value can range from -1 to +1
 0 implies no linear association between the variables, although there may still be a non-linear one
 +1 or -1 represents perfect positive or negative correlation
 Different values have different implications for strength of the relationship, but cannot imply cause & effect
r value | Strength of relationship
0.0–0.2 | Very weak, probably meaningless
0.2–0.4 | Low, might warrant investigation
0.4–0.6 | Reasonable
0.6–0.8 | Strong
0.8–1.0 | Very strong
 A correlation coefficient may be strong but statistically non-significant because of a small sample size
 The statistical significance of the correlation coefficient is based on the associated p value
 For normal distributions → the Pearson correlation coefficient is used
 For non-normal distributions → the Spearman correlation coefficient is used
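Both coefficients are one call each in Python (the height/weight values are invented; scipy assumed):

    from scipy import stats

    height = [150, 155, 160, 165, 170, 175, 180]   # cm
    weight = [52, 56, 60, 64, 70, 76, 82]          # kg

    r, p = stats.pearsonr(height, weight)          # normal distributions
    rho, p_s = stats.spearmanr(height, weight)     # non-normal distributions
    print(r, p)      # r close to +1: strong positive linear association
    print(rho, p_s)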
Regression
 Regression analysis is used to find how one set of data relates to another
 Linear regression is used to describe relationship between two quantitative variables, and how one value (y)
varies depending on another value (x) (e.g. the incidence of myocardial infarction and number of cigarettes
smoked per day)
 It is mainly used when there is one measured dependent variable and one or more independent variables
 In simple regression → there is one quantitative dependent variable and one independent variable
 In multiple regression → there is one quantitative dependent variable and two or more independent variables
 The regression line is given by the equation → y = mx + c
 Y → is the dependent variable that the equation tries to predict
 X → is the independent variable that is being used to predict Y
 C → is the regression constant (i.e. the value where the line intercepts the y axis)
 M → is the regression coefficient (i.e. the slope of the line, representing a positive or negative correlation)
 This can be used to predict what value y may take for a given value of x (since m is the amount by which y
changes for each one-unit increase in x, and c is the value of y when x is zero)
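As a minimal sketch, fitting y = mx + c and predicting from it (the data are invented; scipy assumed):

    from scipy import stats

    # Hypothetical data: cigarettes smoked per day (x) and an outcome measure (y)
    x = [0, 5, 10, 15, 20, 25, 30]
    y = [2.1, 3.0, 4.2, 4.8, 6.1, 6.9, 8.0]

    fit = stats.linregress(x, y)
    m = fit.slope        # regression coefficient (slope of the line)
    c = fit.intercept    # regression constant (value of y when x = 0)

    predicted = m * 12 + c   # predicted y for x = 12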
 Correlation and regression are easily confused:
 Correlation measures the strength of the association between variables
 Regression quantifies the association, and should only be used if one of the variables is thought to precede
or cause the other
 Logistic regression analysis can be used to compare proportions, making adjustment for other factors, yielding
adjusted odds ratios
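A brief sketch with statsmodels (all data invented; exponentiating the fitted coefficients gives odds ratios adjusted for the other variables in the model):

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: binary outcome, binary exposure and age as a confounder
    exposure = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0])
    age      = np.array([40, 55, 62, 48, 41, 58, 60, 45, 52, 50, 63, 44])
    outcome  = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0])

    X = sm.add_constant(np.column_stack([exposure, age]))
    model = sm.Logit(outcome, X).fit(disp=0)

    print(np.exp(model.params))   # adjusted odds ratios (constant, exposure, age)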