Heterogeneity

EVIDENCE BASED MEDICINE
1. In medical statistics which of the following terms describes the probability that a particular
result would have occured by chance
Heterogeneity
Homogeneity
External vailidity
Internal vailidity
P-value
The p value is the value which represents the probability that a particular result would have occured
by chance
2. Which of the following is typically used as a graphical representation to show the results of a
meta analysis?
Histogram
Scatterplot
Bar Diagram
Forest plot
Kaplan Meier curve

The results of multiple individual trials in a systematic review or meta analysis may be represented
diagrammatically using a 'forest plot'.
As can be seen from the diagram below each study has its own square plotted to show that studies
outcome measure. The area of each square is proportional to the study's weight in the meta-
analysis. The overall meta-analysed measure of effect is often represented on the plot as a dashed
vertical line. This meta-analysed measure of effect is commonly plotted as a diamond
Image sourced from Wikipedia
3. In a normal (gaussian) distribution what percentage of results lie within 1 standard deviations
of the mean?
50%
66%
68%
95%
99.7%
Gaussian distribution
In a normal (gaussian) distribution the median equals the mode and mean value. 68% of values are
within 1 standard deviation of the mean, 95% are within 2 standard deviations of the mean and
99.7% are within 3 standard deviations of the mean. This is known as the three-sigma rule.
Next question
4. Which one of the following statements regarding significance tests is incorrect?
Parametric data is usually normally distributed
Student's t-test may be paired on unpaired
Pearson's product-moment coefficient is used to assess correlation between two variables
Chi-squared test is used to compare parametric data
Paired data refers to data obtained from a single group of patients
Significance tests: types
The type of significance test used depends on whether the data is parametric (something which can
be measured, usually normally distributed) or non-parametric
Parametric tests
 Student's t-test - paired or unpaired*

 Pearson's product-moment coefficient - correlation
Non-parametric tests
 Mann-Whitney U test - unpaired data

 Wilcoxon signed-rank test - compares two sets of observations on a single sample
 chi-squared test - used to compare proportions or percentages
 Spearman, Kendall rank - correlation
*paired data refers to data obtained from a single group of patients, e.g. Measurement before and
after an intervention. Unpaired data comes from two different groups of patients, e.g. Comparing
response to different interventions in two groups
Next question
Condition Positive Condition Negative
Test Outcome Positive A B
Test Outcome Negative C D
5. Which of the following represent Type 2 errors?
A+B
False negatives are type 2 errors
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative

Sensitivity
Sensitivity is the ability of a test to correctly identify disease.
Sensitivity = A / A+C or True Positives/Number of people with disease
Specificity
Specificity is the ability of a test to correctly exclude a disease
Specificity = D / B+D or True Negatives/Number of people without disease
Positive Predictive Value

Positive predictive value = A/A+B or True Positives/All positive test results
Negative Predictive Value

Negative predictive value = D/C+D or True Negatives/All Negative test results
Likelihood Ratios
LR+ = Sensitivity / (1 - Specificity)
LR- = (1- Sensitivity) / Specificity
Next question
6. An endocrinologist performs a study to assess whether a patient's HbA1c level is correlated

to their LDL level. Assuming both HbA1c and LDL are normally distributed, which one of the
following statistical tests is it most appropriate to perform?
Chi-squared test
Pearson's product-moment coefficient
Mann-Whitney test
Spearman's rank correlation coefficient
McNemar's test
Correlation
 parametric (normally distributed): Pearson's coefficient
 non-parametric: Spearman's coefficient
Pearson's product-moment coefficient test is most appropriate as the data is parametric and the
study is assessing the correlation of two variables
McNemar's test is a non-parametric method used on nominal data to determine whether the row and
column marginal frequencies are equal
Parametric tests


Next question
7. A follow-up study is performed looking at the height of 100 adults who were given steroids
during childhood. The average height of the adults is 169cm, with a standard deviation of
16cm. What is the standard error of the mean?
Cannot be calculated
1.69
0.16
1.6
1.3
Standard error of the mean = standard deviation / square root (number of patients)
The standard error of the mean is calculated by the standard deviation / square root (number of
patients)
= 16 / square root (100) = 16 / 10 = 1.6

Please rate this question:
Confidence interval and standard error of the mean
The confidence interval is a common and sometimes misunderstood principle in medical statistics.
 a formal definition may be: a range of values for a variable of interest constructed so that this
range has a specified probability of including the true value of the variable. The specified
probability is called the confidence level, and the end points of the confidence interval are
called the confidence limits*
 in simpler terms: a range of values within which the true effect of intervention is likely to lie
The likelihood of the true effect lying within the confidence interval is determined by the confidence
level. For example a confidence interval at the 95% confidence level means that the confidence
interval should contain the true effect of intervention 95% of the time.
How is the confidence interval calculated?
The standard error of the mean (SEM) is a measure of the spread expected for the mean of the
observations - i.e. how 'accurate' the calculated sample mean is from the true population mean
Key point
 SEM = SD / square root (n)

 where SD = standard deviation and n = sample size
 therefore the SEM gets smaller as the sample size (n) increases
A 95% confidence interval:
 lower limit = mean - (1.96 * SEM)

 upper limit = mean + (1.96 * SEM)
The above formula is a slight simplification:
 if a small sample size is used (e.g. n < 100) then it is important to use a 'Student's T critical
value' look-up table to replace 1.96 with a different value
 if a different confidence level is required, e.g. 90% then 1.96 is replaced by a different value.
For 90% this would 1.645
Results such as mean value are often presented along with a confidence interval. For example, in a
study the mean height in a sample taken from a population is 183cm. You know that the standard
error (SE) (the standard deviation of the mean) is 2cm. This gives a 95% confidence interval of 179-
187cm (+/- 2 SE).
*Last JM. A dictionary of epidemiology. Oxford: International Journal of Epidemiology, 1988

Next question
8. In evidence based medicine what does the term MeSH stand for?
Method of Scientific Hierarchy
Medical Subject Headings
Method of Structural Harmonization
Medical Statistics Hierarchy
Method Statistical Help
MeSH stands for Medical Subject Headings. MeSH is used by databases such as Medline/Pubmed
to classify medical information in a hierarchical structure.

9. Regarding the table above which of the below formulas would you use to calculate the
positive likelihood ratio?
A ÷ (A+C)
D ÷ (B+D)
(A/A+C) ÷ (1 - (D/B+D))
(1 - (A/A+C)) ÷ (D/B+D)
A ÷ (A+B)
Two types of likelihood ratios exist - positive (LR+) and negative (LR-).
Positive likelihood ratios tell us how likely a disease is if the test is positive, while the negative
likelihood ratio tells us how unlikely a disease is if the test is negative.
They are calculated as:

A ÷ (A+C) is the calculation for sensitivity

D ÷ (B+D) is the calculation for specificity
A ÷ (A+B) is the calculation for positive predictive value
question
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative
Sensitivity
Specificity


Likelihood Ratios
Next question
10. Which one of the following techniques would be most suitable to detect and quantify a viral
protein?
Polymerase chain reaction
Northern blotting
Western blotting
Southern blotting
Eastern blotting
Molecular biology techniques
 SNOW (South - NOrth - West)

 DROP (DNA - RNA - Protein)
The following table shows a very basic summary of molecular biology techniques
Technique Description
Southern blotting Detects DNA
Northern blotting Detects RNA
Western blotting Detects proteins

Uses gel electrophoresis to separate native proteins by 3-D structure
Examples include the confirmatory HIV test
 SNOW (South - NOrth - West)

 DROP (DNA - RNA - Protein)
Enzyme-linked immunosorbent assay (ELISA)
 a type of biochemical assay used to detect antigens and antibodies

 a colour changing enzyme is attached to the antibody if looking for an antigen and to an
antigen if looking for an antibody
 the sample therefore changes colour if the antigen or antibody is detected
 an example includes the initial HIV test
Next question
11.
Which of the following represent Type 1 errors?
A+B
False positives are type 1 errors

Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative

Sensitivity
Specificity


Likelihood Ratios
Next question
12. Which of the following id NOT a common feature of lithium toxicity?
Ataxia
Nausea
Tented T-waves on ECG
Tremor
Vomiting
The most common ECG finding in Lithium toxicity is T-wave flattening

The main symptoms of toxicity are:
 GI symptoms eg nausea, vomiting and diarrhoea

 Neuromuscular symptoms eg dystonia, tremor, ataxia, hyperreflexia, altered mental state
 Severely poisoned patients can develop a syndrome of irreversible lithium-effectuated
neurotoxicity (SILENT). Features include cognitive impairment, peripheral neuropathy and
cerebellar dysfunction.
Lithium
Lithium Key Points
 Lithium has a narrow therapeutic index and serum levels should be taken at 12 hours post
dose.
 Serum levels should be between 0.4-1mmol.litre.
 Overdose occurs with levels over 1.5mmol/litre, levels over 2.0mmol/litre require urgent
treatment.
 Patients should carry lithium cards to alert medical professionals of treament and to monitor
levels.
Next question
13. Which one of the following significance tests is used to analyse data which is measured and
follows a normal distribution?
Chi-squared test
Spearman's rank correlation coefficient
Wilcoxon matched-pairs
Mann-Whitney test
Student's t-test
Student's t-test is used to analyse parametric data. The other tests are used on non-parametric data
Parametric tests


Next question
14. For the following data set:
1,3,4,5,5,5,6,7,7,9,12,14,17
What is the mean?
7.3
7.7
Mean = 7.3
Mode = 5
Median = 6
Mean, Median and Mode
The mean or average when the context is clear, is the sum of a collection of numbers divided by the
number of numbers in the collection
The mode is the value that appears most often in a set of data
The median is the number separating the higher half of a data sample, a population, or a probability
distribution, from the lower half. The median of a finite list of numbers can be found by arranging all
the observations from lowest value to highest value and picking the middle one
For example a study looks at the number of attendances brittle asthmatics have at A&E's across the
north of England each quarter. Each A&E submits the number of attendances as below:
1,3,4,5,5,5,6,7,7,9,12,14,17
Mean = 7.3
Mode = 5
Median = 6
Next question
A contingency table is constructed for a new blood protein marker to screen for prostate cancer in
men aged between 50 and 70 years:
Prostate cancer present Prostate cancer absent
New test positive 19 20
New test negative 14 723
15. What is the positive predictive value of the new test?
19/20
723/743
19/39
19/33
723/737
Positive predictive value = true positives / (true positives + false positives)
= 19 / (19 + 20)
Screening test statistics
Patients and doctors need to know if a disease or condition is present or absent. Tests can be used
to help us decide. Tests generally guide us by indicating how likely it is that the patient has the
condition.
In order to interpret test results we need to have a working knowledge of the statistics used to
describe them.
Contingency tables (also known as 2 * 2 tables, see below) are used to illustrate and calculate test
statistics such as sensitivity. It would be unusual for a medical exam not to feature a question based
around screening test statistics. Commit the following table to memory and spend time practicing
using it as you will be expected to make calculations using it in your exam.
TP = true positive; FP = false positive; TN = true negative; FN = false negative
Disease present Disease absent
Test positive TP FP
Test negative FN TN
The table below lists the main statistical terms used in relation to screening tests:
Measure Formula Plain english
Sensitivity TP / (TP + FN ) Proportion of patients with the condition who have a

positive test result
Specificity TN / (TN + FP) Proportion of patients without the condition who

have a negative test result
Positive predictive value TP / (TP + FP) The chance that the patient has the condition if the
diagnostic test is positive
Negative predictive value TN / (TN + FN) The chance that the patient does not have the
condition if the diagnostic test is negative
Likelihood ratio for a sensitivity / (1 - How much the odds of the disease increase when a
positive test result specificity) test is positive
Likelihood ratio for a (1 - sensitivity) / How much the odds of the disease decrease when a
negative test result specificity test is negative
Positive and negative predictive values are prevalence dependent. Likelihood ratios are not
prevalence dependent.
Precision
The precision quantifies a tests ability to produce the same measurements with repeated tests.
Next question
16. A 64 year old woman, whose husband had a TIA one month ago brings you a newspaper
article with the headline 'new super drug prevents stroke'. Reading through the article with
her, it states that a recent clinical trial has shown that a new lipid-lowering therapy for stroke
had a number needed to treat (NNT) of 20 for the prevention of the primary end-point. How
do you best describe these results to her?
20 extra patients in the placebo group had a stroke
For 1000 patients treated with active therapy, there would be 50 fewer strokes
For 1000 patients treated with active therapy, there would be 20 fewer strokes
For every 1000 patients treated with active therapy there would be 100 fewer strokes
20 patients in the treatment group were protected from stroke
This prevention study for stroke reveals that 20 patients need to be treated to prevent one event.
Thus if you treat a 1000 patients then you will expect to have 50 fewer strokes.
NNT is a time-specific epidemiological measure of the number of patient who need to be treated in
order to prevent one adverse outcome. A perfect NNT would be 1, where everyone improves with
treatment, thus the higher the NNT, the less effective the treatment.
Numbers needed to treat and absolute risk reduction
Numbers needed to treat (NNT) is a measure that indicates how many patients would require an
intervention to reduce the expected number of outcomes by one
It is calculated by 1/(Absolute risk reduction) and is rounded to the next highest whole number
Experimental event rate (EER) = (Number who had particular outcome with the intervention) / (Total
number who had the intervention)
Control event rate (CER) = (Number who had particular outcome with the control/ (Total number who
had the control)
Absolute risk reduction = CER-EER or EER-CER?
The absolute risk reduction (ARR) may be calculated by finding the absolute difference between the
control event rate (CER) and the experimental event rate (EER). You will often find both versions of
the above listed in different sources. In some ways in doesn't matter which you use as you will end
up with the same answer but from a technical point of view:
 if the outcome of the study is undesirable then ARR = CER - EER

 if the outcome of the study is desirable then ARR* = EER - CER
*this may be more accurately termed absolute benefit increase, rather than absolute risk reduction
Next question
17. The serum potassium is measured in a 1,000 patients taking an ACE inhibitors. The mean
potassium is 4.6 mmol/l with a standard deviation of 0.3 mmol/l. Which one of the following
statements is correct?
95% of values lie between 4.5 and 4.75 mmol/l
95.4% of values lie between 4.3 and 4.9 mmol/l
We know that 68.3% of values of a normally distributed variable lie within 1 standard deviation of the
mean. This means the range is 4.3 to 4.9 mmol/l.
Normal distribution
The normal distribution is also known as the Gaussian distribution or 'bell-shaped' distribution. It
describes the spread of many biological and clinical measurements
Properties of the Normal distribution
 symmetrical i.e. Mean = mode = median

 68.3% of values lie within 1 SD of the mean
 this is often reversed, so that within 1.96 SD of the mean lie 95% of the sample values
 the range of the mean - (1.96 *SD) to the mean + (1.96 * SD) is called the 95% confidence
interval, i.e. If a repeat sample of 100 observations are taken from the same group 95 of
them would be expected to lie in that range
Standard deviation
 the standard deviation (SD) is a measure of how much dispersion exists from the mean
 SD = square root (variance)
Next question
18. In systematic reviews which of the following terms refers to the degree of incompatibility
between the results of trials?
Heterogeneity
Homogeneity
External validity
Internal vailidity
P-value
In systematic reviews heterogeneity (not homogeneity) refers to the degree of incompatibility

between the results of trials included in the review.
External validity of a study refers to degree to which the results of a study can be applied to non-
study populations. The internal validity refers to how good the experimental techniques used to
determine cause and effect in a study are.
EBM Definitions
Next question
19. Which of the following statements regarding likelihood ratios (LR) is correct?
LR > 1 indicates disease
LR can be calculated if sensitivity is known
LR can be calculated if specificity is known
Negative LR = sensitivity / (1 -specificity)
Positive LR = (1-sensitivity) / specificity
positive likelihood ratio = sensitivity / (1 -specificity)

negative likelihood ratio = (1-sensitivity) / specificity
A likelihood ratio > 1 indicates the test result is associated with the disease.
A likelihood ratio < 1 indicates that the result is associated with absence of the disease
Sensitivity and specificity are both needed to calculate likelihood ratios
Discuss and give feedback
Next question
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative
Sensitivity
Specificity


Likelihood Ratios
20. The Framingham Heart Study is an example of a:
Cross-sectional survey
Cohort study
Case-control study
Randomised controlled trial

Meta-analysis
Study design
The following table highlights the main features of the main types of study:
Study type Key features
Randomised Participants randomly allocated to intervention or control group (e.g. standard

controlled trial treatment or placebo)
Practical or ethical problems may limit use
Cohort study Observational and prospective. Two (or more) are selected according to their
exposure to a particular agent (e.g. medicine, toxin) and followed up to see how
many develop a disease or other outcome.
The usual outcome measure is the relative risk.
Examples include Framingham Heart Study
Case-control Observational and retrospective. Patients with a particular condition (cases) are
study identified and matched with controls. Data is then collected on past exposure to a
possible causal agent for the condition.
The usual outcome measure is the odds ratio.
Inexpensive, produce quick results

Useful for studying rare conditions
Prone to confounding
Cross-sectional Provide a 'snapshot', sometimes called prevalence studies

survey
Provide weak evidence of cause and effect
Next question
21. A study is designed to compare the calcium levels of males and females who have Crohn's
disease. The investigators aim to discover whether there is a difference between the average
calcium level in males compared to females. From previous studies it is known that the
calcium levels are normally distributed. Which one of the following statistical tests is it most
appropriate to use?
Pearson's test
Mann-Whitney test
Chi-squared test
Student's unpaired t-test
Student's paired t-test
As the data is parametric and compares two independent sample from the same population an
unpaired t-test is the most appropriate test to use
Parametric tests


Next question
22. What is the correct formula to calculate the positive predictive value?
Sensitivity / (1 - specificity)
TP / (TP + FP)
TN / (TN + FP)
TN / (TN + FN)
TP / (TP + FN )
Positive predictive value = TP / (TP + FP)
condition.
describe them.

Test positive TP FP
Test negative FN TN


Precision
Next question
23. A randomised controlled trial is performed to look at a new drug to prevent hip fractures in
postmenopausal women. Group A consists of 1,000 women who take the new drug whilst
group B contains 1,400 women taking a placebo. The hip fracture rate in group A is 2% and
in group B is 4%. What is the number needed to treat to prevent one hip fracture?
10
50
12
NNT = 1 / Absolute Risk Reduction
They key to answering this question is to ignore irrelevant data such as the number of patients in
each group.
Control event rate = 4% = 0.04

Experimental event rate = 2% = 0.02
Absolute risk reduction = 0.04 - 0.02 = 0.02

Number needed to treat = 1 / 0.02 = 50
Numbers needed to treat and absolute risk reduction
Numbers needed to treat (NNT) is a measure that indicates how many patients would require an
intervention to reduce the expected number of outcomes by one
It is calculated by 1/(Absolute risk reduction) and is rounded to the next highest whole number
Experimental event rate (EER) = (Number who had particular outcome with the intervention) / (Total
number who had the intervention)
Control event rate (CER) = (Number who had particular outcome with the control/ (Total number who
had the control)
Absolute risk reduction = CER-EER or EER-CER?

The absolute risk reduction (ARR) may be calculated by finding the absolute difference between the
control event rate (CER) and the experimental event rate (EER). You will often find both versions of
the above listed in different sources. In some ways in doesn't matter which you use as you will end
up with the same answer but from a technical point of view:
 if the outcome of the study is undesirable then ARR = CER - EER

 if the outcome of the study is desirable then ARR* = EER - CER
*this may be more accurately termed absolute benefit increase, rather than absolute risk reduction
Next question
24. Evidence can be organised into a hierarchy. Of the following which is the best source of
evidence?
Cohort study
Expert opinion
Observational studies
Systematic review
Randomised control trial
Systematic reviews provide the most valuable evidence. See the pyramid b
Hierarchy of Evidence
Next question
25.
Specificity is calculated by which of the following?
A/A+B
A/A+C
B/A+B
B/B+D
D/B+D
Specificity is calculated by using D/B+D
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative
Sensitivity
Specificity


Likelihood Ratios
Next question
26. If a new anti-platelet drug is given to 100 people and 2 people experience life-threatening
bleeding due to the agent. What is the number needed to harm?
2
50
98
100
200
In this case NNH is 50.
This is calculated in the same way as for NNT, but used to describe adverse events. For NNH, large
numbers are good, because they mean that adverse events are rare. Small values for NNH are bad,
because they mean adverse events are common.
It is defined as the inverse of the attributable risk.
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative
Sensitivity
Specificity


Likelihood Ratios
27. Which of the following terms best describes the degree to which the results of a study can be
a applied to non-study populations?
Heterogeneity
Homogeneity
External validity
Internal validity
P-value
In systematic reviews heterogeneity (not homogeneity) refers to the degree of incompatibility

between the results of trials included in the review.
External validity of a study refers to degree to which the results of a study can be applied to non-
study populations. The internal validity refers to how good the experimental techniques used to
determine cause and effect in a study are.
EBM Definitions
Next question
28. When considering the hierarchy of evidence. Which of the following is considered the
weakest form of evidence?
Meta Analysis
Case report
Cohort study
Systematic review
Randomised control trial
Of the above options the case report is the weakest form. See the pyramid below.
Hierarchy of Evidence
Next question
29. Positive Predictive Value is calculated by which of the following?
A/A+B
A/A+C
A/A+D
D/C+D
D/B+D
Positive Predictive Value is calculated by using A/A+B
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative

Sensitivity
Specificity


Likelihood Ratios
Next question
30. When comparing normally distributed data to skewed data on a graph. Which of the following
statements are true?
In left skew the median moves left relative to the mean
The median is farther out in the long tail than is the mean
In right skew the mean moves left relative to the median
Moving away from the peak and toward the tail, the order is always from: mean to mode to
median
Moving away from the peak and toward the tail, the order is always from: mode to median to
mean
When comparing normally distributed data to skewed data the following are true:
 The mean is farther out in the long tail than is the median
 In right skew the mean moves right relative to the median
 In left skew the mean moves left relative to the median
 Moving away from the peak and toward the tail, the order is always from mode to median to
mean
1,3,4,5,5,5,6,7,7,9,12,14,17
Mean = 7.3
Mode = 5
Median = 6
Next question
31.
Negative Predictive Value is calculated by which of the following?
B/A-B
B/A+B
B/B+D
D/C+D
D/B+D
Negative Predictive Value is calculated by using D/C+D

Next question
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative
Sensitivity
Specificity


Likelihood Ratios
Next question
A rapid urine screening test is developed to detect Chlamydia. A trial involving 200 men and women
is performed comparing the new test to the existing NAAT techniques:
Chlamydia present Chlamydia absent
New test positive 20 3
New test negative 5 172
32.
What is the negative predictive value of the new test?
172/177
20/23
172/192
172/175
20/25
Negative predictive value = TN / (TN + FN)
= 172 / (172 + 5)
Next question
condition.
describe them.
Test positive TP FP
Test negative FN TN


Precision
Next question
33. A new blood test to screen patients for heart failure is trialled on 500 patients. The test was
positive in 40 of the 50 patients shown to have heart failure by echocardiography. It was also
positive in 20 patients who were shown not to have heart failure. What is the positive
predictive value of the test?
0.8
0.66
0.33
0.1
Cannot be calculated
A contingency table can be constructed from the above data, as shown below:
Heart failure No heart failure
Test positive 40 20
Test negative 10 430
Positive predictive value = TP / (TP + FP) = 40 / (40 + 20) = 0.66

Next question
condition.
describe them.
Test positive TP FP
Test negative FN TN


Precision
Next question
34. Sensitivity is calculated by which of the following?
A/A+B
A/A+C
B/A+B
B/B+D
B/C+D
The answer is B
In the exam its often useful to draw the table above

Next question
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative
Sensitivity
Specificity


Likelihood Ratios
Next question
35. What are funnel plots primarily used for?
Demonstrate the heterogeneity of a meta-analysis
Demonstrate the existence of publication bias in meta-analyses
Provide a graphical representation of the relative risk results in a case-control study
Provide a graphical representation of the relative risk results in a cohort study

Provide a graphical representation of the probability of a patient experiencing a particular
adverse effect
Funnel plots - show publication bias in meta-analyses
Next question
Funnel plot
A funnel plot is primarily used to demonstrate the existence of publication bias in meta-analyses.
Funnel plots are usually drawn with treatment effects on the horizontal axis and study size on the
vertical axis.
Interpretation
 a symmetrical, inverted funnel shape indicates that publication bias is unlikely

 conversely, an asymmetrical funnel indicates a relationship between treatment effect and
study size. This indicates either publication bias or a systematic difference between smaller
and larger studies ('small study effects')
Next question
36. Which of the following terms refers to a situation where patients are analysed in the groups
to which they were originally assigned, even though they may have switched treatment arms
of treatment or not completed the trial.
Blinding
Double blinding
Intention to treat
Intention to prevent harm

Selection bias
Intention-to-treat refers to the fact that in a study patients are analysed in the groups to which they
were originally assigned, even though they may have switched treatment arms of treatment or not
completed the trial.
Blinding is used in trials to reduce bias by hiding the intervention from the patient or clinician.
Next question
EBM Definitions
Next question
37. For the following data set:
1,3,4,5,5,5,6,7,7,9,12,14,17
What is the median?
7.3
7.7
Mean = 7.3
Mode = 5
Median = 6
Next question
1,3,4,5,5,5,6,7,7,9,12,14,17
Mean = 7.3
Mode = 5
Median = 6
Next question
38. Randomisation such as that used in randomised control trials are particularly effective in
eliminating which of the following?
Loss to follow up
Observer bias
Selection bias
Measurement bias
Attrition bias
Randomisation in RCTs if done correctly should eliminate bias in treatment assignment specifically
selection bias and confounding.
Next question
Bias in RCTs
Performance bias occurs when participants know if they are in the control group and seek other
forms of medical care.
Observer bias occurs where observers interpret a persons behaviour incorrectly and do not notice
other behaviours, and attribute results according to the group the participant is allocated. Double
blinding reduces performance and observer bias.
Measurement bias is sometimes also referred to as systemic error.
Attrition bias results from systematic differences between treatment groups as a result of differential
withdrawals or exclusions of participants
Next question
39. Number needed to treat is the inverse of which of the following?
Attributable risk
Odds ration
Standard deviation of the mean
Relative risk reduction
Relative risk increase
Number needed to treat (NNT) and Number needed to harm (NNH) are both calculated by:
1/ (EER - CER) OR 1/attributable risk
as EER - CER = attributable risk
NNT and NNH may also be described as the inverse of absolute risk reduction/increase as:
EER CER = absolute risk reduction (if <0) or absolute risk increase (if >0)
The difference between NNT and NNH is in whats being measured. In NNT a positive or beneficial
outcome is being measured whereas in NNH a negative outcome is being measured e.g. for steroids
in asthma NNT may measure reduction in frequency of asthma attacks whereas NNH may measure
number of cases or oral candida.
Key:
EER is experimental group event rate
CER is control group event rate
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative
Sensitivity
Specificity


Likelihood Ratios
Next question
40. Which one of the following statements best describes a type II statistical error?
The p value fails to reach statistical significance
A study fails to reach an appropriate power
The null hypothesis is rejected when it is true
The null hypothesis is accepted when it is false
The alternative hypothesis is rejected when it is false
Next question
Significance tests
A null hypothesis (H0) states that two treatments are equally effective (and is hence negatively
phrased). A significance test uses the sample data to assess how likely the null hypothesis is to be
correct.
For example:
 'there is no difference in the prevalence of colorectal cancer in patients taking low-dose

aspirin compared to those who are not'
The alternative hypothesis (H1) is the opposite of the null hypothesis, i.e. There is a difference
between the two treatments
The p value is the probability of obtaining a result by chance at least as extreme as the one that was
actually observed, assuming that the null hypothesis is true. It is therefore equal to the chance of
making a type I error (see below).
Two types of errors may occur when testing the null hypothesis
 type I: the null hypothesis is rejected when it is true - i.e. Showing a difference between two
groups when it doesn't exist, a false positive. This is determined against a preset significance
level (termed alpha). As the significance level is determined in advance the chance of
making a type I error is not affected by sample size. It is however increased if the number of
end-points are increased. For example if a study has 20 end-points it is likely one of these
will be reached, just by chance.
 type II: the null hypothesis is accepted when it is false - i.e. Failing to spot a difference when
one really exists, a false negative. The probability of making a type II error is termed beta. It
is determined by both sample size and alpha
Study accepts H0 Study rejects H0
Reality H0 Type 1 error (alpha)
Reality H1 Type 2 error (beta) Power (1 - beta)
The power of a study is the probability of (correctly) rejecting the null hypothesis when it is false, i.e.
the probability of detecting a statistically significant difference
 power = 1 - the probability of a type II error

 power can be increased by increasing the sample size
Next question
41. Which one of the following defines the standard error of the mean?
Square root (Standard deviation / number of patients)
Number of patients / square root (mean)
Number of patients / square root (standard deviation)
Standard deviation / square root (number of patients)

Standard deviation / square root (mean)
Standard error of the mean = standard deviation / square root (number of patients)
Next question
Confidence interval and standard error of the mean
The confidence interval is a common and sometimes misunderstood principle in medical statistics.
 a formal definition may be: a range of values for a variable of interest constructed so that this
range has a specified probability of including the true value of the variable. The specified
probability is called the confidence level, and the end points of the confidence interval are
called the confidence limits*
 in simpler terms: a range of values within which the true effect of intervention is likely to lie
The likelihood of the true effect lying within the confidence interval is determined by the confidence
level. For example a confidence interval at the 95% confidence level means that the confidence
interval should contain the true effect of intervention 95% of the time.
How is the confidence interval calculated?
The standard error of the mean (SEM) is a measure of the spread expected for the mean of the
observations - i.e. how 'accurate' the calculated sample mean is from the true population mean
Key point
 SEM = SD / square root (n)

 where SD = standard deviation and n = sample size
 therefore the SEM gets smaller as the sample size (n) increases
A 95% confidence interval:
 lower limit = mean - (1.96 * SEM)

 upper limit = mean + (1.96 * SEM)
The above formula is a slight simplification:
 if a small sample size is used (e.g. n < 100) then it is important to use a 'Student's T critical
value' look-up table to replace 1.96 with a different value
 if a different confidence level is required, e.g. 90% then 1.96 is replaced by a different value.
For 90% this would 1.645
Results such as mean value are often presented along with a confidence interval. For example, in a
study the mean height in a sample taken from a population is 183cm. You know that the standard
error (SE) (the standard deviation of the mean) is 2cm. This gives a 95% confidence interval of 179-
187cm (+/- 2 SE).
*Last JM. A dictionary of epidemiology. Oxford: International Journal of Epidemiology, 1988

Next question
42. In a normal (gaussian) distribution what percentage of results lie within 2 standard deviations
of the mean?
67%
95%
97.5%
99%
99.7%
Gaussian distribution
In a normal (gaussian) distribution the median equals the mode and mean value. 68% of values are
within 1 standard deviation of the mean, 95% are within 2 standard deviations of the mean and
99.7% are within 3 standard deviations of the mean. This is known as the three-sigma rule.
43. A new anti-epileptic drug is trialled for children with absence seizures. There are 250 children
in the control group and 150 children assigned to take the new drug. After 4 months 100
children in the control group had had a seizure compared to 15 children in the group taking
the new medication. What is the relative risk reduction?
30%
3.33
75%
40%
Relative risk reduction = (EER - CER) / CER
Experimental event rate, EER = 15 / 150 = 0.1
Control event rate, CER = 100 / 250 = 0.4
Relative risk reduction = (EER - CER) / CER = (0.1 - 0.4) / 0.4 = -0.75 or a 75% reduction
Relative risk
Relative risk (RR) is the ratio of risk in the experimental group (experimental event rate, EER) to
risk in the control group (control event rate, CER). The term relative risk ratio is sometimes used
instead of relative risk.
To recap
 EER = rate at which events occur in the experimental group

 CER = rate at which events occur in the control group
For example, if we look at a trial comparing the use of paracetamol for dysmenorrhoea compared to
placebo we may get the following results
Total number of patients Experienced significant pain relief
Paracetamol 100 60
Placebo 80 20
Experimental event rate, EER = 60 / 100 = 0.6
Control event rate, CER = 20 / 80 = 0.25

Therefore the relative risk ratio = EER / CER = 0.6 / 0.25 = 2.4
If the risk ratio is > 1 then the rate of an event (in this case experiencing significant pain relief) is
increased compared to controls. It is therefore appropriate to calculate the relative risk increase if
necessary (see below).
If the risk ratio is < 1 then the rate of an event is decreased compared to controls. The relative risk
reduction should therefore be calculated (see below).
Relative risk reduction (RRR) or relative risk increase (RRI) is calculated by dividing the absolute
risk change by the control event rate
Using the above data, RRI = (EER - CER) / CER = (0.6 - 0.25) / 0.25 = 1.4 = 140%
Next question
44. Number needed to harm is the inverse of which of the following?
Attributable risk
Relative risk
Odds ratio
Number needed to treat
Standard deviation
This is calculated in the same way as for NNT, but used to describe adverse events. For NNH, large
numbers are good, because they mean that adverse events are rare. Small values for NNH are bad,
because they mean adverse events are common.
It is defined as the inverse of the attributable risk (or absolute risk reduction)
Statistics
A = True Positive
B = False Positive
C = False Negative
D = True Negative
Sensitivity
Specificity


Likelihood Ratios

Heterogeneity

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Heterogeneity

Uploaded by

Copyright:

Available Formats

EVIDENCE BASED MEDICINE

Kaplan Meier curve

Image sourced from Wikipedia

Image sourced from Wikipedia

Parametric data is usually normally distributed

Student's t-test may be paired on unpaired

Pearson's product-moment coefficient is used to assess correlation between two variables

Chi-squared test is used to compare parametric data

Paired data refers to data obtained from a single group of patients

Significance tests: types

 Student's t-test - paired or unpaired*

 Mann-Whitney U test - unpaired data

Test Outcome Positive A B

Test Outcome Negative C D

5. Which of the following represent Type 2 errors?

False negatives are type 2 errors

Condition Positive Condition Negative

Test Outcome Positive A B

Test Outcome Negative C D

Positive Predictive Value

Negative Predictive Value

6. An endocrinologist performs a study to assess whether a patient's HbA1c level is correlated

Pearson's product-moment coefficient

Spearman's rank correlation coefficient

Significance tests: types

 Student's t-test - paired or unpaired*

 Mann-Whitney U test - unpaired data

= 16 / square root (100) = 16 / 10 = 1.6

Confidence interval and standard error of the mean

How is the confidence interval calculated?

 SEM = SD / square root (n)

A 95% confidence interval:

 lower limit = mean - (1.96 * SEM)

*Last JM. A dictionary of epidemiology. Oxford: International Journal of Epidemiology, 1988

Method of Scientific Hierarchy

Medical Subject Headings

Method of Structural Harmonization

Medical Statistics Hierarchy

Method Statistical Help

Condition Positive Condition Negative

Test Outcome Positive A B

Test Outcome Negative C D

They are calculated as:

LR+ = Sensitivity / (1 - Specificity)

A ÷ (A+C) is the calculation for sensitivity

Condition Positive Condition Negative

Test Outcome Positive A B

Test Outcome Negative C D

Positive Predictive Value

Negative Predictive Value

Polymerase chain reaction

Molecular biology techniques

 SNOW (South - NOrth - West)

Molecular biology techniques

Southern blotting Detects DNA

Northern blotting Detects RNA

Western blotting Detects proteins

Molecular biology techniques

 SNOW (South - NOrth - West)

Enzyme-linked immunosorbent assay (ELISA)

 a type of biochemical assay used to detect antigens and antibodies

Test Outcome Positive A B

Test Outcome Negative C D