Evaluation of Diagnostic Tests

Dr Mridula Solanki, Department of Community Medicine, Seth G.S. Medical College & K.E.M. Hospital

Reasons for Ordering a Laboratory Test

There are 4 major legitimate reasons for ordering a laboratory test:
1. Diagnosis (to rule in or rule out a diagnosis).
2. Monitoring (e.g., the effect of drug therapy).
3. Screening (e.g., for congenital hypothyroidism via neonatal thyroxine testing).
4. Research (to understand the pathophysiology of a particular disease process).

The Challenge of Clinical Measurement

Diagnoses are based on information, from formal measurements and/or from your clinical judgment. This information is seldom perfectly accurate:
* Random errors can occur (machine not working?).
* Biases in judgment or measurement can occur ("this kid doesn't look sick").
* Due to biological variability, this patient may not fit the general rule.
* Diagnosis (e.g., hypertension) involves a categorical judgment; this often requires dividing a continuous score (blood pressure) into categories. Choosing the cutting-point is challenging.

Therefore, You Need to Be Aware ...

* Diagnostic judgments are based on probabilities.
* Using a quantitative approach is better than just guessing!
* You will gradually become familiar with the typical accuracy of measurements in your chosen clinical field.
* The principles apply to both diagnostic and screening tests.
* There are standard ways to describe the accuracy of a measurement.

Why Choose One Test and Not Another?

* Reliability: consistency or reproducibility; this considers chance or random errors (which sometimes increase, sometimes decrease, scores). "Is it measuring something?"
* Validity: "Is it measuring what it is supposed to measure?" By extension, "what diagnostic conclusion can I draw from a particular score on this test?" Validity may be affected by bias, which refers to systematic errors (these fall in a certain direction).
* Safety, acceptability, cost, etc.
This is probably how screening questionnaires work.

[Figure: possible outcomes of a test result compared with the true value]

Validity is the ability of a test to indicate which individuals have the disease and which do not.

Ways of Assessing Validity

* Content or "face" validity: does it make clinical or biological sense? Does it include the relevant symptoms?
* Criterion validity: comparison to a "gold standard" definitive measure (e.g., biopsy, autopsy); expressed as sensitivity and specificity.
* Construct validity.

Criterion Validation: the "Gold Standard"

The criterion that your clinical observation or simple test is judged against:
– more definitive (but expensive or invasive) tests, such as a complete work-up, or
– the clinical outcome (for screening tests, when work-up of well patients is unethical).
Sensitivity and specificity are calculated from a research study comparing the test to a gold standard.

Validation of Methods

Definition: validation is the confirmation, by examination and the provision of objective evidence, that the particular requirements for a specific intended use are fulfilled.

Method Validation

Analytical methods must be appropriately validated for:
* Linearity (upper and lower): is the calibration accurate over the entire reporting range? The calibration parameters may change at the upper end (saturation) and the lower end (adsorption).
* Accuracy.
* Precision.
* Specificity: is only the target compound being measured? Mass spectrometry is increasingly preferred, but must be properly run.
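The linearity point can be illustrated with a short sketch. This is a minimal example with invented calibration data (the numbers and variable names are not from the slides): it fits a straight line by least squares and inspects the residuals, which drift at the top of the range if the detector saturates and at the bottom if analyte is lost to adsorption.

```python
import numpy as np

# Invented calibration standards: concentration vs. instrument response.
conc = np.array([1.0, 5.0, 10.0, 50.0, 100.0, 500.0, 1000.0])
response = np.array([1.1, 5.0, 10.3, 49.0, 97.0, 455.0, 820.0])  # flattens at the top

# Ordinary least-squares line: response = slope * conc + intercept.
slope, intercept = np.polyfit(conc, response, 1)
residuals = response - (slope * conc + intercept)

# R^2 as a crude summary of linearity over the whole reporting range.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((response - response.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.3f}, intercept={intercept:.2f}, R^2={r_squared:.4f}")
# Systematic negative residuals at the highest standards suggest saturation;
# systematic losses at the lowest standards would suggest adsorption.
for c, r in zip(conc, residuals):
    print(f"conc={c:7.1f}  residual={r:+8.2f}")
```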
Accuracy vs. Precision

* Accuracy: how well a measurement agrees with an accepted value.
* Precision: how well a series of measurements agree with each other.

[Figure: targets illustrating the four combinations — not accurate/not precise, accurate/not precise, not accurate/precise, accurate/precise]

Sensitivity and Specificity

* Sensitivity: the ability of the test to identify correctly those who have the disease.
* Specificity: the ability of the test to identify correctly those who do not have the disease.

Calculating Sensitivity and Specificity

* You must know the correct disease status prior to calculation.
* The gold standard test is the best test available; it is often invasive or expensive.
* A new test is, for example, a new screening test or a less expensive diagnostic test.
* Use a 2 x 2 table to compare the performance of the new test to the gold standard test.

In medicine and statistics, a gold standard test refers to a diagnostic test or benchmark that is the best available under reasonable conditions. It is not necessarily the best possible test for the condition in absolute terms. For example, for conditions that require an autopsy for a perfect diagnosis, the gold standard test is normally less accurate than the autopsy.

A hypothetical ideal "gold standard" test has a sensitivity of 100% with respect to the presence of the disease (it identifies all individuals with a well-defined disease process; it has no false-negative results) and a specificity of 100% (it does not falsely identify someone as having a condition they do not have; it has no false-positive results). In practice, there are sometimes no true "gold standard" tests; the ideal and the practically available tests are sometimes called the "perfect" and the "alloyed" gold standard, respectively.

Comparison of Disease Status: Gold Standard Test and New Test

                  Disease +               Disease –
New test +        a (true positives)      b (false positives)
New test –        c (false negatives)     d (true negatives)
Total             a + c                   b + d
                  (all with disease)      (all without disease)

* Sensitivity is the ability of the test to identify correctly those who have the disease (a) from all individuals with the disease (a + c):

    sensitivity = a / (a + c) = true positives / all with disease = Pr(T+ | D+)

* Sensitivity is a fixed characteristic of the test.
* Specificity is the ability of the test to identify correctly those who do not have the disease (d) from all individuals free from the disease (b + d):

    specificity = d / (b + d) = true negatives / all without disease = Pr(T– | D–)

* Specificity is also a fixed characteristic of the test.

Clinical Applications

* A specific test can be useful to rule in a disease. Why? Very specific tests give few false positives, so if the result is positive, you can be confident the patient has the condition.
* A sensitive test can be useful for ruling a disease out: a negative result on a very sensitive test (which detects all true cases) reassures you that the patient does not have the disease.
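As a sketch, the cell labels a, b, c, d from the 2 x 2 table above map directly to code (the counts in the example calls are invented):

```python
def sensitivity(a, c):
    """Probability of a positive test given disease: Pr(T+|D+) = a / (a + c)."""
    return a / (a + c)

def specificity(b, d):
    """Probability of a negative test given no disease: Pr(T-|D-) = d / (b + d)."""
    return d / (b + d)

# A very specific test (few false positives, b small) helps rule disease IN;
# a very sensitive test (few false negatives, c small) helps rule disease OUT.
print(sensitivity(a=90, c=10))  # 0.9
print(specificity(b=5, d=95))   # 0.95
```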
Most Tests Provide a Continuous Score: Selecting a Cutting Point

[Figure: overlapping score distributions for a healthy and a sick population, with possible cut-point scores between the healthy and pathological ranges]

* Moving the cut-point toward the healthy scores increases sensitivity (it includes more of the sick group).
* Moving the cut-point toward the pathological scores increases specificity (it excludes more of the healthy group).
* Crucial issue: changing the cut-point can improve sensitivity or specificity, but never both.

Problems Resulting from Test Errors

* False positives can arise due to other factors (such as taking other medications, diet, etc.). They entail the cost and danger of further investigations, labeling, and worry for the patient.
  – This is similar to Type I or alpha error in a test of statistical significance.
* False negatives imply missed cases, and so potentially bad outcomes if untreated.
  – This is similar to Type II or beta error: the chance of missing a true difference.

Your Patient's Question: "Doctor, how likely am I to have this disease?"

This introduces predictive values.
* Sensitivity and specificity don't answer this, because they work from the gold standard.
* Now you need to work from the test result, but you won't know whether this person is a true positive or a false positive (or a true or false negative). How accurately does a positive (or negative) result predict disease (or health)?

Start from Prevalence

* Before you do any test, the best guide you have to a diagnosis is based on prevalence: common conditions (in this population) are the more likely diagnosis.
* Prevalence indicates the 'pre-test probability of disease': prevalence = (a + c) / N.

An ideal, or truly accurate, test will always give a positive result with disease and a negative result without disease. This is not the case for all tests. In practice this means that not all positive test results will represent disease; this is described by the Positive Predictive Value (PPV). Equally, not all negative results will represent no disease; this is described by the Negative Predictive Value (NPV).
* Positive predictive value: the proportion of those with a positive test result who actually have disease.
* Negative predictive value: the proportion of those with a negative test result who do not have disease.

Evaluation of a Screening Test

Screening Test Result   Diseased      Not Diseased   Total
Positive                40 (a)        20 (b)         60 (a+b)
Negative                100 (c)       9840 (d)       9940 (c+d)
Total                   140 (a+c)     9860 (b+d)     10000 (a+b+c+d)

a) Sensitivity (true positives)   = a / (a + c) = (40 / 140) x 100 = 28.57%
b) Specificity (true negatives)   = d / (b + d) = (9840 / 9860) x 100 = 99.79%
c) False negative rate            = c / (a + c) = (100 / 140) x 100 = 71.4%
d) False positive rate            = b / (b + d) = (20 / 9860) x 100 = 0.20%
e) Positive predictive value      = a / (a + b) = (40 / 60) x 100 = 66.66%
f) Negative predictive value      = d / (c + d) = (9840 / 9940) x 100 = 98.9%
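The arithmetic in this worked example can be double-checked in a few lines; the counts are the ones from the table above.

```python
# Cell counts from the screening-test example above.
a, b, c, d = 40, 20, 100, 9840

print(f"Sensitivity:         {100 * a / (a + c):.2f}%")  # 28.57%
print(f"Specificity:         {100 * d / (b + d):.2f}%")  # 99.80% (slide truncates to 99.79%)
print(f"False negative rate: {100 * c / (a + c):.1f}%")  # 71.4%
print(f"False positive rate: {100 * b / (b + d):.2f}%")  # 0.20%
print(f"PPV:                 {100 * a / (a + b):.2f}%")  # 66.67%
print(f"NPV:                 {100 * d / (c + d):.2f}%")  # 98.99% (slide rounds to 98.9%)
```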
This Leads to ... Likelihood Ratios

* Defined as the odds that a given level of a diagnostic test result would be expected in a patient with the disease, as opposed to a patient without: true positive rate / false positive rate [TP / FP].
* Advantages:
  – Combines sensitivity and specificity into one number.
  – Can be calculated for many levels of the test.
  – Can be turned into predictive values.
* LR for a positive test = sensitivity / (1 – specificity); the larger the number, the better.
* LR for a negative test = (1 – sensitivity) / specificity; the smaller, the better.

Practical Application: a Nomogram

[Figure: Fagan nomogram with pre-test probability, likelihood ratio, and post-test probability axes]
1) You need the LR for this test.
2) Plot the likelihood ratio on the center axis (e.g., LR+ = 20).
3) Select the pretest probability (prevalence) on the left axis (e.g., prevalence = 30%).
4) Draw a line through these points to the right axis to indicate the post-test probability of disease. Example: post-test probability = 91%.

Chaining LRs Together (1)

* Example: a 45-year-old woman presents with "chest pain".
  – Based on her age, the pretest probability that a vague chest pain indicates CAD is about 1%.
* Take a fuller history. She reports a 1-month history of intermittent chest pain, suggesting angina (substernal pain; radiating down the arm; induced by effort; relieved by rest...).
  – The LR of this history for angina is about 100.
* From the history: she's young, so the pretest probability is about 1%; applying LR = 100 on the nomogram, the post-test probability rises to about 50%.

Chaining LRs Together (2)

* The same 45-year-old woman with a 1-month history of intermittent chest pain: after the history, the post-test probability is about 50%. What will you do?
* A more precise (but also more costly) test: record an ECG.
  – Result: 2.2 mm ST-segment depression. The LR for this ECG result is 10.
  – Starting from the new pretest probability of about 50% (prior to the ECG, based on the history), this raises the post-test probability to > 90% for coronary artery disease.
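The chaining works through odds: convert the pretest probability to odds, multiply by the LR for each new finding, and convert back to a probability. A minimal sketch reproducing the chest-pain example above (the function name is my own):

```python
def post_test_probability(pretest_prob, lr):
    """Apply a likelihood ratio: probability -> odds -> multiply by LR -> probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)

# 45-year-old woman with chest pain: pretest probability ~1% for CAD.
p = post_test_probability(0.01, 100)  # history suggesting angina, LR ~ 100
print(f"After history: {p:.0%}")      # ~50%
p = post_test_probability(p, 10)      # ECG with 2.2 mm ST depression, LR = 10
print(f"After ECG: {p:.0%}")          # ~91%, i.e. > 90%
```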
There is another way to combine sensitivity and specificity: Receiver Operating Characteristic (ROC) curves. Work out sensitivity and specificity for every possible cut-point, then plot sensitivity against 1 – specificity (the false positive rate). The area under the curve indicates the information provided by the test. In an ideal test, the curve would reach the top left corner; for a useless test it would lie along the diagonal: no better than guessing.

Introduction to ROC Curves

* ROC = Receiver Operating Characteristic.
* Started in electronic signal detection theory (1940s–1950s).
* Has become very popular in biomedical applications, particularly radiology and imaging.
* Also used in machine learning applications to assess classifiers.
* Can be used to compare tests/procedures.

[Figure: comparing ROC curves — true positive rate plotted against false positive rate]

ROC Curves: Simplest Case

* Consider a diagnostic test for a disease.
* The test has 2 possible outcomes: 'positive' (suggesting presence of disease) or 'negative'.
* An individual can test either positive or negative for the disease.

Area Under the ROC Curve (AUC)

* An overall measure of test performance.
* Comparisons between two tests are based on differences between (estimated) AUCs.
* For continuous data, the AUC is equivalent to the Mann-Whitney U-statistic (a nonparametric test of difference in location between two populations).

[Figure: example ROC curves with AUC = 100%, 90%, 65%, and 50%]

Interpretation of AUC

* The AUC can be interpreted as the probability that the test result from a randomly chosen diseased individual is more indicative of disease than that from a randomly chosen nondiseased individual.
* So you can think of it as a nonparametric distance between the disease and nondisease test results.

Problems with AUC

* It has no directly clinically relevant meaning.
* A lot of the area comes from the range of large false-positive values, and no one cares what is going on in that region (restricted regions may need to be examined).
* The curves might cross, so there might be a meaningful difference in performance that is not picked up by the AUC.
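To make this concrete, here is a sketch with invented test scores: it sweeps every cut-point to trace the ROC curve, and computes the AUC via the Mann-Whitney interpretation mentioned above (the probability that a random diseased score exceeds a random healthy one, counting ties as half).

```python
import numpy as np

def roc_points(diseased, healthy):
    """Sweep every cut-point; return (1 - specificity, sensitivity) pairs."""
    thresholds = np.unique(np.concatenate([diseased, healthy]))
    thresholds = np.append(thresholds, np.inf)  # adds the (0, 0) corner
    points = []
    for t in thresholds:
        sens = np.mean(diseased >= t)  # true positive rate
        fpr = np.mean(healthy >= t)    # false positive rate
        points.append((fpr, sens))
    return sorted(points)

def auc_mann_whitney(diseased, healthy):
    """AUC = Pr(random diseased score > random healthy score); ties count half."""
    d = diseased[:, None]
    h = healthy[None, :]
    return float(np.mean((d > h) + 0.5 * (d == h)))

# Invented test scores for the two groups.
diseased = np.array([5.5, 6.0, 7.5, 8.0, 9.2])
healthy = np.array([2.8, 3.0, 4.1, 5.0, 6.5])
print(roc_points(diseased, healthy))
print(auc_mann_whitney(diseased, healthy))  # 0.92 here; 0.5 would be guessing
```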

Test Phases for Diagnostic Tests

Phase I:   Investigates whether test results are different for patients with and without the disease.
Phase II:  Investigates whether patients with the disease are more likely to have positive test results compared to patients without the disease.
Phase III: Investigates how well the test distinguishes between patients with and without the disease, in patients suspected of having the disease.
Phase IV:  Investigates how informative a test is considering additional information available at the moment of testing.
Phase V:   Investigates whether using the test leads to better health outcomes.
Phase VI:  Investigates whether using the test leads to better health outcomes at acceptable costs.

5 Key Points in the Clinical Evaluation of Diagnostic Tests

1. Sensitivity and specificity should always be reported together.
2. ROC curves allow a comprehensive assessment and comparison of diagnostic test accuracy.
3. PPV and NPV cannot be interpreted correctly without knowing the prevalence of disease in the study sample.
4. Patients who did not undergo the reference standard procedure should never be omitted from studies of diagnostic test accuracy.
5. Published guidelines should be followed when reporting the findings from studies of diagnostic test accuracy.

Thank you