Statistics ASM (2015)
Important definitions
Incidence
Incidence → the number of new cases within a defined population at a specified time
Incidence rate → the number of new cases per population at risk in a given time period
Incidence rate → is a ratio of the number of individuals in the population who develop an illness in a given
time period (commonly 1 year) divided by the total number of individuals at risk for the illness during that
time period (e.g. the number of IV drug abusers newly diagnosed with AIDS in 2013 divided by the number of
IV drug abusers in the population during 2013)
Incidence proportion (cumulative incidence) → the number of new cases within a specified time period
divided by the size of the population initially at risk. For example, if a population initially contains 1,000 non-
diseased persons and 28 develop a condition over 2 years of observation, the incidence proportion is 28 cases
per 1,000 persons → 2.8%
Incidence rate = Number of new cases within a defined population at a specified time ÷ Total population at risk during the same period of time
Prevalence
Prevalence → the total number of cases within a defined population at a specified time
Prevalence rate → the total number of cases per population at risk in a given time period
Prevalence rate → is a ratio of the number of individuals in the population who have an illness (e.g. AIDS)
divided by the total number of individuals at risk for the illness
Point prevalence → is a ratio of the number of individuals who have an illness at a specific point in time (e.g.
the number of people who have AIDS on August 31, 2013) divided by the total population who could have the
illness on that date
Period prevalence → is a ratio of the number of individuals who have an illness during a specific time period
(e.g. the number of people who have AIDS in 2013) divided by the total population who could have the illness
during that specific time period
Lifetime prevalence (LTP) → the proportion of a population that at some point in their life (up to the time of
assessment) have experienced the condition
Prevalence rates are affected by the incidence of the disease and the duration of illness
Prevalence rate = Total number of cases within a defined population at a specified time ÷ Total population at risk during the same period of time
Prevalence answers "How many people have this disease right now?" and incidence answers "How many people
per year newly acquire this disease?"
The prevalence of an illness decreases when patients either recover or die
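To make these formulas concrete, here is a minimal Python sketch; the 28-case/1,000-person figures are from the incidence proportion example above, and the 50 pre-existing cases are invented for illustration:

```python
# Incidence vs. prevalence (28/1,000 from the example above; existing cases invented)
population_at_risk = 1000   # non-diseased persons at the start of observation
new_cases = 28              # new cases over the observation period
existing_cases = 50         # pre-existing cases at the survey date (invented)

# Incidence proportion: new cases / population initially at risk
incidence_proportion = new_cases / population_at_risk
print(f"Incidence proportion: {incidence_proportion:.1%}")   # 2.8%

# Point prevalence: all current cases / total population who could have the illness
point_prevalence = (new_cases + existing_cases) / (population_at_risk + existing_cases)
print(f"Point prevalence: {point_prevalence:.1%}")           # 7.4%
```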
Attack rate
It is the cumulative incidence of infection over a period of time. This is typically used during an epidemic (e.g.
during the influenza outbreak, the attack rate was 12%)
It is the biostatistical measure of frequency of morbidity, or speed of spread, in an at-risk population
It is calculated by taking the number of new cases in the population at risk and dividing by the number of
persons at risk in the population. It is usually expressed as a percentage
For example → if, after a picnic, 20 out of 40 people who ate fried chicken and 10 out of 50 people who ate
fried fish become ill → the attack rate is 50% for chicken and 20% for fish
Attack rate = Number of new cases in the population at risk ÷ Number of persons at risk in the population
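The picnic example can be checked with a couple of lines of Python:

```python
# Attack rate = new cases in the population at risk / persons at risk
ill_chicken, ate_chicken = 20, 40
ill_fish, ate_fish = 10, 50

print(f"Attack rate (chicken): {ill_chicken / ate_chicken:.0%}")  # 50%
print(f"Attack rate (fish): {ill_fish / ate_fish:.0%}")           # 20%
```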
Mortality rates
Mortality rate is the number of deaths within a defined population at a specified time
Mortality rate is typically expressed in units of deaths per 1000 individuals per year
The standardized mortality ratio is the observed mortality rate divided by the expected mortality rate
Proportionate mortality rate (PMR) is used to determine the relative importance of a specific cause of death in
relation to all causes of death in a population (e.g. the leading cause of death in USA in 1980 was heart disease
with a PMR of 38.2%)
Stillbirth rate → number of stillbirths (i.e. babies born with no signs of life after 24 weeks’ gestation) per
1000 total births
Perinatal mortality rate → number of stillbirths and deaths within the first week of life per 1000 total births
Neonatal mortality rate → number of deaths of live born babies aged up to 28 days (1 month) per 1000 live births
Infant mortality rate → number of deaths of all infants aged under 1 year per 1000 live births
Mortality rate = (Number of deaths during one year ÷ Total population at mid-year) × 1000
Perinatal mortality rate = (Number of stillbirths and early neonatal deaths (< 7 days old) in a period of time ÷ Total number of births (live & still) in the same period of time) × 1000
Postneonatal mortality rate = (Number of deaths in infants aged from 28 days to < 1 year in a period of time ÷ Total number of live births in the same period of time) × 1000
Stillbirth → fetal death and expulsion from the uterus after 24 weeks’ gestation
Miscarriage/abortion → fetal death and expulsion from the uterus before 24 weeks’ gestation
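A minimal Python sketch of the rate formulas above (all counts invented for illustration):

```python
# Hypothetical counts, invented for illustration
deaths_in_year = 850
mid_year_population = 100_000
stillbirths = 12
early_neonatal_deaths = 8              # deaths at < 7 days old
live_births = 2_488
total_births = live_births + stillbirths

mortality_rate = deaths_in_year / mid_year_population * 1000                   # 8.5
stillbirth_rate = stillbirths / total_births * 1000                            # 4.8
perinatal_rate = (stillbirths + early_neonatal_deaths) / total_births * 1000   # 8.0

print(f"Mortality rate: {mortality_rate:.1f} per 1000 per year")
print(f"Stillbirth rate: {stillbirth_rate:.1f} per 1000 total births")
print(f"Perinatal mortality rate: {perinatal_rate:.1f} per 1000 total births")
```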
Study design
Research questions
A research study should always be designed to answer a particular research question. The question usually
relates to a specific population. For example:
Does taking folic acid early in pregnancy prevent neural tube defects?
Is a new inhaled steroid better than current treatment for improving lung function among cystic fibrosis
patients?
Is low birth weight associated with hypertension in later life?
A well-built clinical foreground question should have 4 components. The PICO model is a helpful tool that
assists you in organizing and focusing your foreground question into a searchable query
P = Patient, Problem, Population
How would you describe a group of patients similar to yours?
What are the most important characteristics of the patient?
I = Intervention, Prognostic Factor, Exposure
What main intervention are you considering?
What do you want to do with this patient?
What is the main alternative being considered?
C = Comparison (can be none or placebo)
What is the main alternative to compare with the intervention?
Are you trying to decide between 2 drugs, a drug and no medication or placebo, or 2 diagnostic tests?
O = Outcome
What are you trying to accomplish, measure, improve or affect?
Outcomes may be disease-oriented or patient-oriented
For example:
P = pregnant women; I = folic acid; C = no folic acid; O = neural tube defect – yes/no
P = CF patients; I = new inhaled steroid; C = current treatment, O = improvement in lung function
P = newborns; I = low birthweight; C = normal birthweight; O = hypertension
To answer the specified research question, random samples of the relevant populations are taken (e.g. pregnant
women, cystic fibrosis patients, LBW individuals, and normal birthweight individuals)
Based on the differences found between the different groups of samples, inferences are made about the
populations from which they were randomly sampled. For example:
If the women in the sample taking folic acid have fewer neural tube defects, it may be inferred that taking
folic acid during pregnancy will reduce the incidence of neural tube defects in the population
If among our sample of CF patients, those taking steroids have better lung function on average than those
on current treatment, the inference might be that steroids improve lung function among CF patients in
general. Note that some of the patients in the sample who were on current treatment may have had better
lung function than some of those using steroids, but it is the average difference that is considered
If there is a difference in hypertensive rates between samples of individuals who were and were not of
LBW, it may be inferred that birth weight is associated with later hypertension in the population in general
Confounding
Confounding may be an important source of error
A confounding factor is a background variable (i.e. something not of direct interest) which:
Is different between the groups being compared, and
Affects the outcome being studied
In a study to compare the effect of folic acid supplementation in early pregnancy on neural tube defects, age
will be a confounding factor if:
Either the folic acid group or placebo group tends to consist of older women, and
Older women are more, or less, likely to have a child with a neural tube defect
When studying the effects of a new inhaled steroid against standard therapy for cystic fibrosis patients, disease
severity will be a confounder if:
One of the groups (new steroid or standard therapy) consists of more severely affected patients, and
Disease severity affects the outcome measure (lung function)
In the comparison of hypertension rates between LBW and not LBW, social class will be a confounder if:
The LBW babies are more likely to be of lower social class, and
Social class is associated with the risk of hypertension
If a difference is found between the groups (folic acid/placebo, new steroid/standard therapy, and low/normal
birth weight) we will not know whether the differences are, respectively, the results of folic acid or age, of the
potency of the new steroid or the severity of disease in the patient, or of birth weight or social class
Confounding may be avoided by matching individuals in the groups according to potential confounders
For example:
We could age-match folic acid and placebo pairs
We could recruit individuals of low and normal birth weight from similar social classes
We could find pairs of cystic fibrosis patients of similar disease severity, and randomly allocate one of
each pair to receive the new steroid while the other receives standard therapy
Observational studies
The researchers don’t change anything. They only observe and document what occurs in one or more groups
of individuals (e.g. those who do and do not take folic acid early in pregnancy)
When an observational study compares 2 groups, it may be categorized as one of the following:
1. Case control study → consider differences between the groups in the past
2. Cross-sectional study → consider differences between the groups at the present time
3. Cohort study → consider differences between the groups in the future
Case control study
It is a retrospective study. This means that you begin at the end (with the disease), and then work
backwards, to hunt for possible causes
It usually compares diseased and healthy groups, and looks back in time to see what they have done
differently in the past that may have led to disease
These studies are concerned with aetiology rather than treatment. They are more suitable for rare diseases
These studies can’t calculate incidence, prevalence, or relative risk. Results are expressed as odds ratios
Case control studies are less reliable than either randomized controlled trials or cohort studies
For example → a study in which colon cancer patients are asked what kinds of food they have eaten in the
past and the answers are compared with a selected healthy control group
Cohort study
It is a prospective study. This means that you begin with the exposure, and then follow the groups forwards in time to see who develops the disease
Cohort studies allow determination of incidence and relative risk; but not the prevalence
Disadvantages include the large numbers required for rare outcomes, problems of drop-out bias, and
changes in practice during long follow-up periods
Cohort studies are used for determining the outcome of infants born prematurely
Ecological study
Here the unit of analysis is a population rather than an individual, and association across different
populations is investigated
For example → an ecological study may look at the association between prematurity and childhood cancer
rates in different countries, to see whether those countries with higher prematurity rates also have higher
levels of childhood cancers
Case-control studies can’t calculate incidence, prevalence, or relative risk. Results are expressed as odds ratios
Cross-sectional studies allow determination of prevalence and relative risk; but not the incidence
Cohort studies allow determination of incidence and relative risk; but not the prevalence
Case-control studies are highly prone to selection and recall bias
Randomized controlled trial (RCT)
Randomization helps to avoid the selection bias in the assignment process. It also increases the probability
that differences between groups can be attributed to the treatment(s) under study
Allocation concealment means that the allocation (to treatment or control) is unknown before the individual
is entered into the study
RCT may be either:
a) Double-blind → neither the patient nor the medical staff/physician knows which treatment the patient has
been randomized to receive
b) Single-blind → either the patient or the medical staff/physician does not know (usually the patient)
c) Unblinded (or open) → both the patient and the medical staff/physician know
It is preferable that studies are blinded, because knowledge of treatment may affect the outcome and introduce
a bias in the results
There are certain types of questions for which an RCT can't be done for ethical reasons, for instance, if
patients were asked to undertake harmful exposures (e.g. smoking) or denied any treatment beyond a placebo
when there are known effective treatments
For example → studies of treatments that consist essentially of taking pills are very easy to do double blind;
the patient takes one of two pills of identical size, shape, and color, and neither the patient nor the physician
needs to know which is which
The "random" in RCT refers to the equal chance in allocation of individuals to either experimental or control group,
not to the way the sample is drawn
Crossover studies
In a crossover (or within-patient) study, each patient receives treatment and placebo in a random order
Crossover studies are only suitable for chronic disorders that are not cured, but for which treatment may give
temporary relief
There should be no carryover effect of the treatment from one treatment period to the next
Sometimes, it is necessary to leave a gap between the end of the first treatment and the start of the next to
ensure that there is no overlap. This gap period is known as a washout period
The outcome may or may not be normally distributed
Systematic review
It is a comprehensive survey of a topic that takes great care to find all relevant studies of the highest level of
evidence (both published & unpublished) → assess each study → synthesize the findings from individual
studies in an unbiased, explicit and reproducible way and present a balanced and impartial summary of the
findings with due consideration of any flaws in the evidence
In this way it can be used for the evaluation of either existing or new technologies and practices
A systematic review is more rigorous than a traditional literature review and attempts to reduce the influence of bias
The difference between a systematic review and a meta-analysis is that a systematic review looks at the whole
picture (qualitative view), while a meta-analysis looks for the specific statistical picture (quantitative view)
Meta-analyses
Meta-analysis is a systematic, objective way to combine data from many studies, usually from RCTs, and
arrive at a pooled estimate of treatment effectiveness and statistical significance
Meta-analysis can also combine data from case-control and cohort studies
The results of a meta-analysis are usually expressed as odds ratio or relative risks
The advantage to merging these data is that it increases sample size and allows for analyses that would not
otherwise be possible
Two problems with meta-analysis:
a) Publication bias → studies showing no effect or little effect are often not published and just “filed” away
b) The quality of the design of the studies from which data are pooled
This can lead to misleading results when all the data on the subject from “published” literature are summarized
The results of a before and after study should always be viewed with great caution
The results of both case-control studies and meta-analysis are usually expressed as odds ratio
The main aim of randomization is to remove selection bias and confounding
Consent to randomization should be obtained as a part of the overall consent to the study
Crossover trials are less prone to confounding and are more efficient than parallel trials (different patients on each
treatment), and should be preferred where the nature and outcome for the treatments allows them
The best study design to determine whether a certain factor is causally implicated in the onset of a rare disease is
case-control study
Intention-to-treat analysis
In RCTs, the outcomes for the two allocation groups (those allocated to active treatment and those allocated to
control) should be compared irrespective of whether patients actually received or completed allocated
treatment(s) or had missing data or poor compliance
This avoids introducing bias into the assessment of treatment
Interim analysis
Analyses that are carried out before the end of the clinical trial in order to assess whether the accumulating
data are beginning to demonstrate a beneficial effect of one treatment over the other with sufficient certainty
This can avoid further patients being randomized to the inferior treatment
Case-control studies are useful for studying disease types with a long latent period
Observational studies cannot be randomized
Reliability
Reliability refers to the reproducibility or dependability of results
Interrater reliability → is a measure of whether the results of the test are similar when the test is administered
by a different rater or examiner
Test-retest reliability → is a measure of whether the results of the test are similar when the person is tested a
second or third time
Validity
It is a measure of the appropriateness of a test, that is, whether the test assesses what it was designed to assess
(e.g. Does a new IQ test really measure IQ or does it instead measure educational level?)
Sensitivity and specificity are components of validity
Quantifying risk
Absolute risk, relative risk, attributable risk, and the odds ratio are measures used to quantify risk in
population studies
The odds ratio is calculated for case-control studies and meta-analyses
Absolute, relative, and attributable risks are calculated for cohort studies
Risk
It is the probability that an event will happen
As one boy is born for every two births → the risk (probability) of giving birth to a boy is 1/2 or 0.50
If one in every 100 patients suffers a side effect from a treatment → the risk is 1/100 or 0.01
Absolute risk
It is the probability that a person will have a medical event
It is the number of people who have a medical event divided by all of the people who could have the
event because of their medical condition
Absolute risk is expressed as a percentage
Absolute risk is equal to the incidence rate
For example → research studies have found that among 10,000 people age 75 and over who take
a drug like ibuprofen for osteoarthritis pain, 15 of them will die from stomach bleeding. The absolute
risk of dying from stomach bleeding is 15 out of 10,000, or 0.15% of people taking ibuprofen
Absolute risk reduction (ARR)
It is the difference in absolute risks
It is a way of measuring the size of a difference between two treatments
For example → if the incidence rate of lung cancer among the people in Newark and in Trenton, New
Jersey, in 2013 is 20/1,000 and 15/1,000 respectively → the absolute risk is 20/1,000 or 2.0% in Newark
and 1.5% in Trenton → the ARR is 2.0% − 1.5% = 0.5%
Another example → in a clinical trial of a drug to prevent migraines, 2 of 100 people taking the drug
experience a migraine (2%), compared with 4 of 100 people taking a placebo (4%) → The absolute risk
reduction is 4% − 2% = 2%. That is, there were 2% fewer migraines in people taking the drug
Relative risk (RR)
It compares the incidence rate of a disorder among individuals exposed to a risk factor (e.g. lung cancer
among smokers) with the incidence rate of the same disorder among individuals not exposed to risk factor
RR is the incidence rate of the exposed (or treated) group (i.e. experimental event rate = EER) divided by
the incidence rate of the unexposed (or untreated) group (i.e. control event rate = CER)
RR = EER ÷ CER
For example → incidence rate of lung cancer among smokers in a city in New Jersey is 20/1,000 (0.02),
while the incidence rate of lung cancer among non-smokers in this city is 2/1,000 (0.002). Therefore, the
fold increase in risk of lung cancer (the relative risk) for smokers vs. nonsmokers in this New Jersey
population is 0.02 ÷ 0.002 = 10
A relative risk of 10 means that in this city, if an individual smokes, his or her risk of getting lung cancer
is 10 times that of a non-smoker
Relative risk reduction (RRR)
It is the proportion or percentage by which an intervention reduces the event rate
RRR = (CER – EER) ÷ CER
For example → in a clinical trial of a drug to prevent migraines, 2 of 100 people taking the drug
experience a migraine (2%), compared with 4 of 100 people taking a placebo (4%)
RRR = (0.04 – 0.02) ÷ 0.04 = 0.5 = 50%
Attributable risk
Attributable risk is useful for determining what would happen in a study population if the risk factor were
removed (e.g. determining how common lung cancer would be in a study if people did not smoke)
Attributable risk is the incidence rate of the unexposed group subtracted from the incidence rate of the
exposed group
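These measures can be reproduced in Python using the migraine-trial and lung-cancer numbers from this section:

```python
# Migraine trial: 2/100 on drug (EER) vs. 4/100 on placebo (CER)
eer, cer = 2 / 100, 4 / 100
print(f"ARR = {cer - eer:.0%}")            # absolute risk reduction: 2%
print(f"RR  = {eer / cer}")                # relative risk: 0.5
print(f"RRR = {(cer - eer) / cer:.0%}")    # relative risk reduction: 50%

# Lung cancer: smokers 20/1,000 vs. non-smokers 2/1,000
exposed, unexposed = 20 / 1000, 2 / 1000
print(f"Relative risk = {exposed / unexposed:.0f}")                         # 10
print(f"Attributable risk = {(exposed - unexposed) * 1000:.0f} per 1000")   # 18
```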
Clinical probability
Clinical probability is the number of times an event actually occurs divided by the number of times the event
can occur
Screening tests
Screening tests are often used to identify individuals at risk of disease. Individuals who are positive on
screening may be investigated further to determine whether they actually have the disease
Some of those who are screen-positive will not have the disease (i.e. false-positive screen test)
Some of those who are screen-negative will have the disease (i.e. false-negative screen test)
This gives a 4-fold (2 × 2) situation in which a, b, c and d are the numbers of individuals who fall into each
of the 4 categories: a = true positives (diseased, test positive), b = false positives (healthy, test positive),
c = false negatives (diseased, test negative), and d = true negatives (healthy, test negative)
There are several summary measures that are often used to quantify how good a screening test is:
Sensitivity → is the proportion of true positives correctly identified by the test → proportion of true
positive screening test in diseased individuals → [a ÷ (a + c)] × 100
Specificity → is the proportion of true negatives correctly identified by the test → proportion of true
negative screening test in healthy individuals → [d ÷ (b + d)] × 100
Positive predictive value → describes the chance of a patient having the disease if the test is positive →
proportion of true positive screening test in all positive individuals → [a ÷ (a + b)] × 100
Negative predictive value → describes the chance of a patient being disease free if the test is negative →
proportion of true negative screening test in all negative individuals → [d ÷ (c + d)] × 100
Prevalence (pretest probability) → (a + c) ÷ (a + b + c + d)
Accuracy → proportion of true +ve and true –ve to all individuals → (a + d) ÷ (a + b + c + d)
Systematic error = (a + b) ÷ (a + c)
For all of these measures, larger values are associated with better screening tests
A high sensitivity implies few false negatives, which is important for very rare or lethal diseases
A high specificity implies few false positives, which is important for common diseases (e.g. diabetes)
Predictive value is the test’s ability to identify those individuals who truly have the disease (true positive)
amongst all those individuals whose screening tests are positive (true positive + false positive)
The sensitivity, specificity, and likelihood ratios do not depend on the prevalence of the disease
The PPV and NPV depend on the prevalence of the disease, and may vary from population to population
If the prevalence is increased → PPV increases, NPV decreases & the proportion of false positives decreases
Lowering the screening cut-off level:
Increases the sensitivity, false-positive results, and negative predictive value (NPV)
Decreases the specificity, false-negative results, and positive predictive value (PPV)
No change in the incidence or prevalence of the disease
Likelihood ratios
These compare the probability of the test result given that the individual has the disease to the probability of
the result occurring if they are disease free
They are calculated from the sensitivity and specificity, and are not dependent on disease prevalence
Positive likelihood ratio (LR+) → sensitivity ÷ (100 – specificity)
Negative likelihood ratio (LR–) → (100 – sensitivity) ÷ specificity
LR > 1 indicates that the test result is associated with the presence of the disease
LR < 0.1 indicates that the test result is associated with the absence of disease
Pre-test odds → the odds of having the disease before you do the test → [pretest probability ÷ (1 - pretest
probability)]
Post-test odds → the odds of having the disease after you did the test → pre-test odds × LR+
Pre-test probability (~ prevalence) → is the proportion of people in the population at risk who have the
disease at a specific time or time interval (i.e. the point prevalence or the period prevalence of the disease)
Post-test probability → is the proportion of patients testing positive who truly have the disease → [post-test
odds ÷ (1 + post-test odds)]
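A short Python sketch tying these formulas together, using the figures from the worked screening example below (sensitivity 60%, specificity 90%, pre-test probability 100/300):

```python
# Likelihood ratios and post-test probability (sensitivity/specificity in %,
# matching the formulas above; numbers from the worked example below)
sensitivity, specificity = 60.0, 90.0
pretest_probability = 100 / 300

lr_positive = sensitivity / (100 - specificity)      # 6.0
lr_negative = (100 - sensitivity) / specificity      # ~0.44

pretest_odds = pretest_probability / (1 - pretest_probability)   # 0.5
posttest_odds = pretest_odds * lr_positive                       # 3.0
posttest_probability = posttest_odds / (1 + posttest_odds)       # 0.75

print(f"LR+ = {lr_positive:.1f}, LR- = {lr_negative:.2f}")
print(f"Post-test probability after a positive test: {posttest_probability:.0%}")  # 75%
```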
The rule of thumb is that a high sensitivity helps to rule out disease (SnOut) and a high specificity helps to rule in
(SpIn) disease
Example
A screening test is applied to 300 patients with and without disease X. Of 100 who have the disease, 60 test
positive; and of 200 without the disease, only 20 test positive
The following 2 × 2 table can be constructed:

                    Disease present   Disease absent   Total
Test positive           a = 60            b = 20          80
Test negative           c = 40            d = 180        220
Total                     100               200          300
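The summary measures for this table can be verified with a short Python sketch:

```python
# Screening-test summary measures for the 2 x 2 table above
a, b, c, d = 60, 20, 40, 180   # TP, FP, FN, TN

sensitivity = a / (a + c)                 # 0.60
specificity = d / (b + d)                 # 0.90
ppv = a / (a + b)                         # 0.75
npv = d / (c + d)                         # ~0.818
prevalence = (a + c) / (a + b + c + d)    # ~0.333
accuracy = (a + d) / (a + b + c + d)      # 0.80

print(f"Sensitivity = {sensitivity:.0%}, Specificity = {specificity:.0%}")
print(f"PPV = {ppv:.0%}, NPV = {npv:.1%}")
print(f"Prevalence = {prevalence:.1%}, Accuracy = {accuracy:.0%}")
```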
Example
In a study of the utility of serum procalcitonin level for early diagnosis of bacteraemia, 100 consecutive febrile
patients admitted to hospital were tested for serum procalcitonin and culture of bacteria. It was reported that
serum procalcitonin level above 0.5 microgram/L had a specificity of 90% in detecting bacteraemia
A specificity of 90% means that 90% of individuals without the disease (bacteraemia) will test negative
(serum procalcitonin levels less than 0.5 µg/L)
Therefore, 10% of individuals without the disease (bacteraemia) would be expected to falsely test positive
(serum procalcitonin levels more than 0.5 µg/L)
Example
In a study of the utility of serum procalcitonin level for early diagnosis of bacteraemia, 100 consecutive febrile
patients admitted to hospital were tested for serum procalcitonin and culture of bacteria. It was reported that
serum procalcitonin level below 0.5 microgram/L had a negative predictive value of 95% in detecting
bacteraemia.
NPV of 95% means that 95% of patients who have serum procalcitonin level below 0.5 microgram/L (tested
negative) do not have bacteraemia
Therefore, 5% of patients who have serum procalcitonin level below 0.5 microgram/L (falsely tested negative)
would be expected to have bacteraemia
Distributions
Types of data
Data may be either qualitative (categoric) or quantitative (numeric)
a) Qualitative data → can be nominal or ordinal
b) Quantitative data → can be discrete or continuous
Qualitative (categoric) data:
Deals with descriptions
Data can be observed but not measured. For example:
Colour of eyes → blue, green, brown, etc
Socio-economic status → low, middle, or high
Qualitative data are classified as either:
Nominal → if there is no natural order between the categories (e.g. eye colour)
Ordinal → if there is a natural order between the categories (e.g. exam results, socio-economic status)
If there are only two categories, then the variable is binary
Quantitative (numeric) data:
Deals with numbers
Data can be measured (e.g. length, height, area, volume, weight, speed, time, temperature, humidity)
Quantitative data are classified as either:
Discrete → if the measurements are integers (i.e. a whole number such as 3 or 4, but not 3.5) (e.g.
number of people in a household, number of cigarettes smoked per day, number of antibiotic courses)
Continuous → if the measurements can take on any value, usually within some range (BW, HR, BP)
Quantities such as sex and weight are called variables, because their values vary from one observation
to another
Numbers calculated to describe important features of the data are called statistics
Ranking data
Ranking the data involves putting the values in numerical order, and then assigning new values to denote
where in the ordered set they fall
We give the smallest value the rank 1, the next largest value the rank 2, the next largest the rank 3, etc
The numbers 1,2,3,... that are assigned to the various values are called the ranks
The highest value will have a rank equal to the total number in the sample (i.e. if there are n values in the
sample, the largest value will have rank n)
Equal data values should be given equal ranks. To achieve this, tied values are each assigned the average
of the ranks they would otherwise occupy
Non-parametric tests use the ranks rather than the original data values in the subsequent analysis
The median (i.e. middle ranked value) is used as a measure of centre
Non-parametric tests make comparisons of medians between groups as opposed to parametric tests which
compare means
The ranks yield a lot less information than the original values and are not very sensitive to changes in the data
For example → rank the following sample of 14 values (2, 34, -5, -7, 25, 2, 34, 34, 67, 28, -2, 0, 7, 23)
Sorting the values into the order of magnitude gives → -7, -5, -2, 0, 2, 2, 7, 23, 25, 28, 34, 34, 34, 67
Ranks are assigned
Values -7 -5 -2 0 2 2 7 23 25 28 34 34 34 67
Ranks 1 2 3 4 ? ? 7 8 9 10 ? ? ? 14
There are 14 numbers → the smallest value has a rank of 1, while the largest number has a rank of 14
The ranks 5 and 6 need to be assigned to the two '2's → hence assign rank (5+6)/2 = 5½ to each value 2
The ranks 11,12 and 13 need to be assigned to the three '34's → hence assign rank (11+12+13)/3 = 12 to
each value 34
Values -7 -5 -2 0 2 2 7 23 25 28 34 34 34 67
Ranks 1 2 3 4 5½ 5½ 7 8 9 10 12 12 12 14
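Assuming scipy is available, rankdata reproduces this tied-rank averaging directly (method='average' is its default):

```python
from scipy.stats import rankdata

values = [2, 34, -5, -7, 25, 2, 34, 34, 67, 28, -2, 0, 7, 23]
print(rankdata(values))
# [ 5.5 12.   2.   1.   9.   5.5 12.  12.  14.  10.   3.   4.   7.   8. ]
```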
Effect of adding or multiplying a constant
When a constant is added → since all values are shifted by the same amount, the measures of central tendency all
shift by the same amount. If you add 3 to each data value, you will add 3 to the mean, mode and median
When each value is multiplied by a constant → since all values are affected by the same multiplicative factor, the
measures of central tendency are affected in the same way. If you multiply each data value by 2, you will multiply
the mean, mode and median by 2
Consider two sets of scores that are identical except for one score, which in the second set is far out of range
with regard to the other scores. Such a value is called an outlier
With the outlier → the mean changes, but the median does not change
Shapes of distribution
A. Normal distribution (Gaussian or Bell-shaped distribution)
The normal distribution is symmetrical and bell-shaped, with one side the mirror image of the other
If distribution is normal:
The mean and standard deviation are preferable as a summary of the data
The mean, median, and mode are equal
About 68% of the observations fall within one standard deviation of the mean
About 95% of the observations fall within 2 standard deviations of the mean
About 99.7% of the observations fall within 3 standard deviations of the mean
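Assuming scipy is available, the 68-95-99.7 rule can be verified from the standard normal CDF:

```python
from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)   # probability within k SDs of the mean
    print(f"Within {k} SD: {coverage:.1%}")
# Within 1 SD: 68.3%, within 2 SD: 95.4%, within 3 SD: 99.7%
```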
B. Skewed distribution
A distribution that is asymmetric (skewed)
If distribution is skewed → the median & interquartile range (IQR) are preferable as a summary of data
The skew is named according to the direction in which the tail points
A skewed distribution with a longer tail among the lower values is skewed to the left, or negatively skewed
A skewed distribution with a longer tail among the higher values is skewed to the right, or positively skewed
In general, if the curve is skewed:
The mean is always towards the long tail
The mode is always on the top of the curve
The median is somewhere between the two
When the data are negatively skewed → mean < median < mode
When the data are positively skewed → mean > median > mode
Measures of variability
Measures of variability tell you how "spread out" or how much variability is present in a set of numbers
Measures of variability should be reported along with measures of central tendency because they provide very
different but complementary and important information
An easy way to get the idea of variability is to look at two sets of data, one that is highly variable and one that
is not very variable
For example, which of these two sets of numbers appears to be the most spread out, Set A or Set B?
Set A → 93, 96, 98, 99, 99, 99, 100
Set B → 10, 29, 52, 69, 87, 92, 100
If you said Set B is more spread out, then you are right! The numbers in Set B are more "spread out"; that is,
they have more variability
Measures of variability include the range, the variance, and the standard deviation
Range
It is the difference between the largest and smallest value
For example the range in Set A shown above is 7, and the range in Set B shown above is 90
Variance (σ²)
It is the average of the squared differences between each value and the mean
It measures how far a set of numbers is spread out
It tells you the average squared deviation from the mean
A small variance indicates that the data points tend to be very close to the mean and hence to each other,
while a high variance indicates that the data points are very spread out from the mean and from each other
A variance of zero indicates that all the values are identical (e.g. for the data 3, 3, 3, 3, 3, 3, the variance
and standard deviation will equal zero)
For example, how to calculate the variance for the following data → 3, 4, 4, 5, 6, 8
Firstly, calculate the mean → (3 + 4 + 4 + 5 + 6 + 8) ÷ 6 = 5
Then for each number, subtract the mean and square the result → [(3 − 5)² + (4 − 5)² + (4 − 5)² + (5 − 5)² +
(6 − 5)² + (8 − 5)²] = 4 + 1 + 1 + 0 + 1 + 9 = 16
Finally, calculate the average of those squared differences → 16 ÷ 6 ≈ 2.7
Standard deviation (σ)
It is the square root of the variance
The standard deviation tells you (approximately) how far the numbers tend to vary from the mean
SD should only be used when the data has a normal distribution
It is important to distinguish between the standard deviation of a population and the standard deviation of
a sample. They have different notation, and they are computed differently
The standard deviation of a population is denoted by σ, and the standard deviation of a sample, by s
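Python's statistics module reproduces the worked example above and shows the population (σ) vs. sample (s) distinction, since pvariance/pstdev divide by n while variance/stdev divide by n − 1:

```python
import statistics

data = [3, 4, 4, 5, 6, 8]
print(statistics.pvariance(data))   # population variance: 16 / 6 = 2.67 (as above)
print(statistics.pstdev(data))      # population SD (sigma): 1.63
print(statistics.variance(data))    # sample variance (divides by n - 1): 16 / 5 = 3.2
print(statistics.stdev(data))       # sample SD (s): 1.79
```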
Percentile or centile → the value below which a certain percentage of values fall. The value at the 50th percentile
means that half the data is above and half below that value.
Interquartile range → usually describes the data which fall between the 25th and 75th percentile
Lead time → the interval between identification of a condition by screening and the development of symptoms
Lag time → the interval between an intervention being assessed as clinically useful and an intervention actually
entering everyday practice
Example
A research study is designed to identify the mean body weight of women between the ages of 30 and 39 in Los
Angeles. To do this, a researcher obtains the body weights of an unbiased sample of 81 women in Los Angeles
in this age group. The mean body weight of the women in the sample is 135 pounds with a SD of 18
What is the estimated standard error of the mean for this population?
SEM = 18 ÷ square root of 81 = 18 ÷ 9 = 2
What is the 95% confidence interval for this sample?
For the 95% CI → a z score of 2 is used
95% CI = 135 ± (2 × 2) = 135 ± 4 = 131 - 139
What is the 99% confidence interval for this sample?
For the 99% CI → a z score of 2.5 is used
99% CI = 135 ± (2.5 × 2) = 135 ± 5 = 130 – 140
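This example can be checked with a few lines of Python (using the rounded z scores 2 and 2.5, as in the text):

```python
import math

mean, sd, n = 135, 18, 81
sem = sd / math.sqrt(n)                        # 18 / 9 = 2

ci95 = (mean - 2 * sem, mean + 2 * sem)        # z of ~2 for the 95% CI
ci99 = (mean - 2.5 * sem, mean + 2.5 * sem)    # z of ~2.5 for the 99% CI
print(f"SEM = {sem:.0f}, 95% CI = {ci95}, 99% CI = {ci99}")
# SEM = 2, 95% CI = (131.0, 139.0), 99% CI = (130.0, 140.0)
```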
The wider the CI → the less precise and the more accurate the estimate of the mean
The narrower the CI → the more precise and the less accurate the estimate of the mean
Precision reflects how reliable the estimate is, and accuracy reflects how close the estimate is to the true mean
If the CI contains zero → the difference will not be statistically significant (e.g. 95% CI is –2 to 22)
The standard deviation is never more than the difference between the mean and the lowest value
The standard deviation has to be about 65% of the difference between the mean and the lowest value
Significance tests
Statistical significance tests, or hypothesis tests, use the sample data to assess how likely some specified null
hypothesis is to be correct
The measure of “how likely” is given by a probability known as the p-value
Usually, the null hypothesis is that there is “no difference” between the groups
Null hypotheses
The null hypothesis (Ho) underlies all statistical tests
Null hypothesis says that the findings are the result of chance or random factors
If you want to show that a drug works, the null hypothesis will be that the drug does NOT work
Example of the null hypothesis:
A group of 20 patients who have similar systolic blood pressures at the beginning of a study (Time 1) is
divided into two groups of 10 patients each. One group is given daily doses of an experimental drug meant
to lower blood pressure (experimental group); the other group is given daily doses of a placebo (placebo
group). Blood pressure in all 20 patients is measured 2 weeks later (Time 2)
The null hypothesis assumes that there are no significant differences in blood pressure between the two
groups at Time 2
If, at Time 2, patients in the experimental group show systolic blood pressures similar to those in the
placebo group → the null hypothesis is not rejected
If, at Time 2, patients in the experimental group have significantly lower or higher blood pressures than
those in the placebo group → the null hypothesis is rejected
Types of error
Type 1 error (false positive)
Rejection of the null hypothesis when it is really true (i.e. assuming that there is a difference between data
sets when in fact there is not) (e.g. asserting that the drug works when it doesn't)
The chance of type I error is given by the p-value
If p = 0.05 → the chance of a type I error is 5 in 100, or 1 in 20
The probability of a Type I error is designated by the Greek letter alpha (α), and is called Type I error rate
or significance level
Type 2 error (false negative)
Failing to reject the null hypothesis when it is really false (i.e. assuming that there is no difference when
there is one) (e.g. asserting the drug does not work when it really does)
The chance of a type II error cannot be directly estimated from the p-value
The probability of a Type II error is designated by the Greek letter beta (β), and is called Type II error rate
One minus the type II error (1-β) is the power of the test
Type I error (error of commission) is generally considered worse than type II error (error of omission)
If the null hypothesis is not rejected → there is no chance of a type I error
If the null hypothesis is rejected → there is no chance of a type II error
Statistical power
In statistics, power is the capacity to detect a difference if there is one
Just as increasing the power of a microscope makes it easier to see what is going on in histology, increasing
statistical power allows us to detect what is happening in the data
Power is directly related to type II error
Power = 1 – type II error
Power indicates the probability of not making a Type II error
Power is the probability of correctly rejecting a false null hypothesis
The power of a hypothesis test is affected by 3 factors:
a) Sample size (n) → other things being equal, the larger the sample size, the greater the power of the test
b) Significance level (α) → the higher the significance level, the higher the power of the test
c) The effect size → the greater the effect size, the greater the power of the test
Effect size
The effect size is the difference between the true value and the value specified in the null hypothesis
Effect size = True value - Hypothesized value
For example → suppose the null hypothesis states that a population mean is equal to 100. A researcher might
ask: What is the probability of rejecting the null hypothesis if the true population mean is equal to 90? In this
example, the effect size would be 90 − 100, which equals −10
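Building on this example, here is a rough power sketch for a two-sided one-sample z-test under the normal approximation; σ = 20 and n = 25 are invented for illustration, and scipy is assumed available:

```python
from scipy.stats import norm

mu0, mu_true, sigma, n, alpha = 100, 90, 20, 25, 0.05
effect = mu_true - mu0             # effect size as defined above: -10
se = sigma / n ** 0.5              # standard error of the mean: 4

z_crit = norm.ppf(1 - alpha / 2)   # critical z for a two-sided test: 1.96
# Power: probability of rejecting H0 when the true mean is mu_true
power = norm.cdf(-z_crit - effect / se) + (1 - norm.cdf(z_crit - effect / se))
print(f"Power = {power:.2f}")      # ~0.71 with these numbers
```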
Power calculations should be performed before any study commences, and do not help directly with interpretation
of the results
The expression p < 0.01 means that the probability of it occurring by chance alone is less than if p < 0.05
Power is increased by increasing the number of patients in the study
P < 0.0001 means that there is only a 1 in 10,000 probability that the results could have occurred by chance
The confidence interval for a difference provides all of the information needed to do a significance test and p-values
are not needed in the presence of the 95% confidence interval for a difference
If the CI contains zero → the difference will not be statistically significant (e.g. 95% CI is –2 to 22)
The confidence interval (CI) for a difference is significant only if it doesn't include zero
The CI for relative risk and odds ratio is considered significant only if it doesn’t contain 1
Parametric tests
Parametric tests use population parameters (e.g. mean scores) and are usually used to identify the presence of
statistically significant differences between groups when:
Data is normally distributed
The sample size is large
Commonly used parametric statistical tests include:
a) t-test (sometimes called “Student's t-test” or “Student's paired t-test”)
b) Analysis of variance (ANOVA)
c) Pearson's coefficient of linear correlation
Student's t-test
It is a statistic that checks if two means (averages) are reliably different from each other
It is used for comparing a single small sample with a population or to compare the difference in means
between two small samples
Use Student's t-test when you have one nominal variable and one measurement variable, and you want to
compare the mean values of the measurement variable. The nominal variable must have only two values,
such as "male" and "female" or "treated" and "untreated"
It is inappropriate if more than two means are compared
The larger the t-value → the larger the difference between the two means → the smaller the p-value → the
stronger the evidence that the null hypothesis is untrue
t-value = difference between the group means ÷ standard error of the difference
The t-test may be paired or unpaired
Unpaired (independent) t-test → used to compare the average (means) of two groups of numerical
data, provided the values are approximately normally distributed and the samples are not small
Paired (dependent) t-test → compares the means of two small paired observations of numerical data,
either on the same individual or on matched individuals
If the two-sample t-test is invalid (non-normality and/or small samples) → Mann-Whitney U test is used
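Assuming scipy is available, a minimal sketch of the unpaired t-test, with the Mann-Whitney U test as the rank-based fallback (the data are invented):

```python
from scipy.stats import ttest_ind, mannwhitneyu

treated = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]   # outcome in the treated group
control = [4.2, 4.5, 3.9, 4.8, 4.1, 4.4]   # outcome in the control group

t_stat, p = ttest_ind(treated, control)     # assumes roughly normal data
print(f"t = {t_stat:.2f}, p = {p:.4f}")

u_stat, p = mannwhitneyu(treated, control)  # rank-based, no normality assumption
print(f"U = {u_stat:.1f}, p = {p:.4f}")
```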
Analysis of variance (ANOVA)
This is a set of techniques used to compare the means of more than two samples
They can also allow for independent variables which may affect the outcome
Non-parametric tests
They are usually used to identify the presence of statistically significant differences between groups when:
Data is not normally distributed
The sample size is small
Commonly used non-parametric statistical tests include:
a) Chi-square (χ²) test
b) Wilcoxon signed rank test → used for matched or paired data
c) Wilcoxon rank sum test → used for unpaired data
d) Mann-Whitney U test → gives equivalent results to the Wilcoxon rank sum test
e) Kruskal-Wallis test
Chi-square (χ²) test
It is used to compare proportions between two groups, provided the samples are large enough and the
proportions in each group are not extreme
If only small samples are available and/or the proportions are extreme → Fisher’s exact test is used
If proportions are paired → McNemar’s test should be used
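Assuming scipy is available, a minimal sketch of comparing proportions in a 2 × 2 table (the counts are invented):

```python
from scipy.stats import chi2_contingency, fisher_exact

table = [[30, 20],   # group 1: events, non-events
         [15, 35]]   # group 2: events, non-events

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")

odds_ratio, p_exact = fisher_exact(table)    # preferred when counts are small
print(f"OR = {odds_ratio:.2f}, Fisher p = {p_exact:.4f}")
```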
Correlation
Correlation exists if there is a linear relationship between two continuous variables (e.g. height and weight)
The strength of the relationship is denoted by the Pearson correlation coefficient, which is denoted by r
The correlation coefficient indicates how closely the points lie to a line
The correlation coefficient gives a measure of the linear association between two continuous measurements
If the two variables move in the same direction → r is positive (e.g. as height increases, body weight
increases, or as calorie intake decreases, body weight decreases)
If the two variables move in opposite directions → r is negative (e.g., as time spent exercising increases, body
weight decreases)
The r value can range from -1 to +1
0 implies no linear association between the variables, although there may still be a non-linear one
+1 or -1 represents perfect positive or negative correlation
Different values have different implications for strength of the relationship, but cannot imply cause & effect
A correlation coefficient may be strong but statistically non-significant because of sample size
The statistical significance of the correlation coefficient is based on the associated p value
For normal distributions → the Pearson correlation coefficient is used
For non-normal distributions → the Spearman correlation coefficient is used
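Assuming scipy is available, both coefficients can be computed on toy data:

```python
from scipy.stats import pearsonr, spearmanr

height = [150, 155, 160, 165, 170, 175, 180]   # invented paired observations
weight = [52, 56, 59, 65, 70, 72, 80]

r, p = pearsonr(height, weight)                # for normally distributed data
print(f"Pearson r = {r:.2f}, p = {p:.4f}")

rho, p = spearmanr(height, weight)             # rank-based, for non-normal data
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")
```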
Regression
Regression analysis is used to find how one set of data relates to another
Linear regression is used to describe relationship between two quantitative variables, and how one value (y)
varies depending on another value (x) (e.g. the incidence of myocardial infarction and number of cigarettes
smoked per day)
It is mainly used when there is one measured dependent variable and one or more independent variables
In simple regression → there is one quantitative dependent variable and one independent variable
In multiple regression → there is one quantitative dependent variable and two or more independent variables
The regression line is given by the equation → y = mx + c
Y → is the dependent variable that the equation tries to predict
X → is the independent variable that is being used to predict Y
C → is the regression constant (i.e. the value where the line intercepts the y axis)
M → is the regression coefficient (i.e. the slope of the line, representing a positive or negative correlation)
This can be used to predict what value y may take for a given value of x (since m is the amount by which y
changes for each unit increase in x, and c is the value of y when x is zero)
Correlation and regression are easily confused:
Correlation measures the strength of the association between variables
Regression quantifies the association, and should only be used if one of the variables is thought to precede
or cause the other
Logistic regression analysis can be used to compare proportions, making adjustment for other factors, yielding
adjusted odds ratios
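Assuming scipy is available, a minimal sketch of simple linear regression, y = mx + c (the data are invented):

```python
from scipy.stats import linregress

x = [0, 5, 10, 15, 20, 25, 30]            # e.g. cigarettes smoked per day
y = [2.0, 2.9, 4.1, 5.2, 5.8, 7.1, 8.0]   # e.g. incidence of MI (arbitrary units)

fit = linregress(x, y)
m, c = fit.slope, fit.intercept            # regression coefficient and constant
print(f"y = {m:.2f}x + {c:.2f} (r = {fit.rvalue:.2f}, p = {fit.pvalue:.4f})")
print(f"Predicted y at x = 12: {m * 12 + c:.2f}")
```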