
Basic concept of probability

Mohamed Hussein

Introduction

• People use the term probability many times each day.
• For example, a physician may say that a patient has a 50-50 chance of surviving a certain operation.
• Another physician may say that she is 95% certain that a patient has a particular disease.
• Most people express probabilities in terms of percentages.
• A probability is a number that reflects the chance or likelihood that a particular event will occur.
Probabilities can be expressed as proportions that range from 0 to 1, and they can also be
expressed as percentages ranging from 0% to 100%.
• A probability of 0 indicates that there is no chance that a particular event will occur, whereas a
probability of 1 indicates that an event is certain to occur.
• A probability of 0.45 (45%) indicates that there are 45 chances out of 100 of the event
occurring.

Some Terms Related to Probability

• Probability is the likelihood that an event will occur.
• An experiment is a process by which an outcome is obtained, or an activity with an observable
result, e.g., rolling a die.
• The sample space is the set of all possible outcomes or results of an experiment.
• A sample point is an element of the sample space.
• An event is an outcome or a set of outcomes of a random process.

Properties of probability

• A probability ranges between 0 and 1.
• If an outcome cannot occur, its probability is 0.
• If an outcome is certain, its probability is 1.
• The probabilities of mutually exclusive outcomes that together cover the whole sample space sum to 1.
• For example, if A and B are mutually exclusive and are the only possible outcomes, then P(A) + P(B) = 1.
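
The terms and properties above can be illustrated with the die example. This is a minimal sketch that assumes a fair six-sided die, so that every sample point is equally likely; the function name `probability` and the chosen events are illustrative only.

```python
from fractions import Fraction

# Sample space for one roll of a die; each face is a sample point.
# Assumption: the die is fair, so all sample points are equally likely.
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """P(event) = sample points in the event / size of the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = sample_space - even   # mutually exclusive with `even`
impossible = {7}            # not in the sample space

print(probability(even))                      # 1/2
print(probability(impossible))                # 0
print(probability(sample_space))              # 1
print(probability(even) + probability(odd))   # 1
```

Here "even" and "odd" cannot both occur and together cover every face, which is why their probabilities sum to 1.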

Example

• Suppose we wish to conduct a study of obesity in children 5-10 years of age who are seeking
medical care at a particular pediatric practice.
• The population (sampling frame) includes all children who were seen in the practice in the past
12 months and is summarized in the table below.

                     Age (years)
             5     6     7     8     9    10    Total
Boys       432   379   501   410   420   418    2,560
Girls      408   513   412   436   461   500    2,730
Totals     840   892   913   846   881   918    5,290

Unconditional Probability

If we select a child at random (by simple random sampling), then each child has the same probability
(equal chance) of being selected, and the probability is 1/N, where N = the population size. Thus, the
probability that any particular child is selected is 1/5,290 = 0.0002. In most sampling situations we are
not concerned with sampling a specific individual; instead, we are concerned with the
probability of sampling certain types of individuals. For example, what is the probability of selecting a
boy, or of selecting a child 7 years of age?

The following formula can be used to compute probabilities of selecting individuals with specific
attributes or characteristics.

P (characteristic) = # persons with characteristic / N

Try to figure these out before looking at the answers:

1. What is the probability of selecting a boy?
2. What is the probability of selecting a 7-year-old?
3. What is the probability of selecting a boy who is 10 years of age?
4. What is the probability of selecting a child (boy or girl) who is at least 8 years of age?
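
The four questions can be checked by applying the formula P(characteristic) = # persons with characteristic / N to the counts in the table. The following is a sketch only; the variable names are illustrative and not part of the original material.

```python
# Counts of children seen in the practice, by age (5 to 10 years).
ages = [5, 6, 7, 8, 9, 10]
boys = dict(zip(ages, [432, 379, 501, 410, 420, 418]))
girls = dict(zip(ages, [408, 513, 412, 436, 461, 500]))

N = sum(boys.values()) + sum(girls.values())  # population size, 5,290

p_boy = sum(boys.values()) / N
p_age7 = (boys[7] + girls[7]) / N
p_boy_age10 = boys[10] / N
p_at_least_8 = sum(boys[a] + girls[a] for a in ages if a >= 8) / N

print(f"1. P(boy)              = {p_boy:.3f}")         # 0.484
print(f"2. P(7 years old)      = {p_age7:.3f}")        # 0.173
print(f"3. P(boy aged 10)      = {p_boy_age10:.3f}")   # 0.079
print(f"4. P(at least 8 years) = {p_at_least_8:.3f}")  # 0.500
```

Every probability uses the same denominator, N = 5,290, because every child in the population is eligible to be selected.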

Conditional Probability
Each of the probabilities computed in the previous section (e.g., P(boy), P(7 years of age)) is an
unconditional probability, because the denominator for each is the total population size (N=5,290)
reflecting the fact that everyone in the entire population is eligible to be selected. However,
sometimes it is of interest to focus on a particular subset of the population (e.g., a sub-population).
For example, suppose we are interested just in the girls and ask the question, what is the probability
of selecting a 9 year old from the sub-population of girls? There is a total of N_G = 2,730 girls (here
N_G refers to the population of girls), and the probability of selecting a 9 year old from the
sub-population of girls is written as follows:

P (9 year old | girls) = # girls who are 9 years old / N_G

where "| girls" indicates that we are conditioning the question on a specific subgroup, i.e., the subgroup
specified to the right of the vertical line.

The conditional probability is computed using the same approach we used to compute unconditional
probabilities. In this case:

P (9 year old | girls) = 461/2,730 = 0.169.

This also means that 16.9% of the girls are 9 years of age. Note that this is not the same as the
probability of selecting a 9-year old girl from the overall population, which is P (girl who is 9 years of
age) = 461/5,290 = 0.087.

What is the probability of selecting a boy from among the 6 year olds?
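
A conditional probability only changes the denominator: it is the size of the subgroup rather than the whole population. A small sketch, using counts taken from the table of children (the helper name `conditional_probability` is illustrative):

```python
def conditional_probability(count_with_characteristic, subgroup_size):
    """P(characteristic | subgroup) = count in subgroup with characteristic / subgroup size."""
    return count_with_characteristic / subgroup_size

# Counts from the table of children by sex and age.
girls_age9, girls_total = 461, 2730
boys_age6, girls_age6 = 379, 513
N = 5290

p_9_given_girl = conditional_probability(girls_age9, girls_total)
p_girl_and_9 = girls_age9 / N   # unconditional, for comparison
p_boy_given_6 = conditional_probability(boys_age6, boys_age6 + girls_age6)

print(f"P(9 year old | girl) = {p_9_given_girl:.3f}")   # 0.169
print(f"P(girl who is 9)     = {p_girl_and_9:.3f}")     # 0.087
print(f"P(boy | 6 year old)  = {p_boy_given_6:.3f}")    # 0.425
```

The last line answers the question above: 379 of the 892 six-year-olds are boys.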

Evaluating Screening Tests

Screening tests are often used in clinical practice to assess the likelihood that a person has a
particular medical condition. The rationale is that, if disease is identified early (before the
manifestation of symptoms), then earlier treatment may lead to cure or improved survival or quality of
life.

One can collect data to examine the ability of a screening procedure to identify individuals with a
disease. Suppose that a population of N=120 men over 50 years of age who are considered at high
risk for prostate cancer have both the PSA screening test and a biopsy. The PSA results are reported
as low, slightly to moderately elevated or highly elevated based on the following levels of measured
protein, respectively: 0-2.5, 2.6-19.9 and 20 or more nanograms per milliliter. The biopsy results of
the study are shown below.

PSA Level (Screening Test)                  Prostate Cancer   No Prostate Cancer   Totals
Low (0-2.5 ng/ml)                                         3                   61       64
Slight/Moderate Elevation (2.6-19.9 ng/ml)               13                   28       41
Highly Elevated (≥20 ng/ml)                              12                    3       15
Totals                                                   28                   92      120

• What is the probability that a man has prostate cancer given he has a low level of PSA?
• What is the probability that a man has prostate cancer given he has a slightly to moderately
elevated level of PSA?
• What is the probability that a man has prostate cancer given he has a highly elevated level of
PSA?
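
Each of these is a conditional probability whose denominator is the row total for that PSA level. A sketch using the biopsy counts from the table (the dictionary layout is an illustrative choice):

```python
# PSA level -> (men with prostate cancer, men without prostate cancer)
psa_table = {
    "low": (3, 61),
    "slightly to moderately elevated": (13, 28),
    "highly elevated": (12, 3),
}

p_cancer_given_psa = {}
for level, (cancer, no_cancer) in psa_table.items():
    # Condition on the PSA level: divide by the row total, not by N = 120.
    p_cancer_given_psa[level] = cancer / (cancer + no_cancer)
    print(f"P(prostate cancer | {level} PSA) = {p_cancer_given_psa[level]:.3f}")
```

This prints 0.047, 0.317 and 0.800, so the probability of prostate cancer rises with the PSA level.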

Screening for Down syndrome

Consider a screening test for Down syndrome. In pregnancy,
women often undergo screening to assess whether their fetus is likely to have Down syndrome. The
screening test evaluates levels of specific hormones in the blood. Screening test results are reported
as positive or negative, indicating that a woman is more or less likely to be carrying an affected fetus.
Suppose that a population of N=4,810 pregnant women undergo the screening test and are scored as
either positive or negative depending on the levels of hormones in the blood. In addition, suppose that
each woman is followed to birth to determine whether the fetus was, in fact, affected with Down
syndrome. The results of the screening tests are summarized below.

Screening Test   Down Syndrome   No Down Syndrome   Total
Positive                     9                351     360
Negative                     1              4,449   4,450
Total                       10              4,800   4,810

In order to evaluate the screening test, each participant undergoes the screening test and is classified
as positive or negative based on criteria that are specific to the test (e.g., high levels of a marker in a
serum test or presence of a mass on a mammogram). A definitive diagnosis is also made for each
participant based on definitive diagnostic tests or on an actual determination of outcome.

Using the data above:

What is the probability that a woman with a positive screening test has an affected fetus?

What is the probability that a woman with a negative test has an affected fetus?

Sensitivity and Specificity

As noted above, screening tests are not diagnostic, but instead may identify individuals more likely to
have a certain condition. There are two measures that are commonly used to evaluate the
performance of screening tests: the sensitivity and specificity of the test. The sensitivity of the test
reflects the probability that the screening test will be positive among those who are diseased. In
contrast, the specificity of the test reflects the probability that the screening test will be negative
among those who, in fact, do not have the disease.
A total of N patients complete both the screening test and the diagnostic test. The data are often
organized as follows, with the results of the screening test shown in the rows and the results of the
diagnostic test shown in the columns.

                  Diseased   Disease Free   Total
Screen Positive   a          b              a+b
Screen Negative   c          d              c+d
Total             a+c        b+d            N

• Sensitivity = True Positive Fraction = P(Screen Positive | Disease) = a/(a+c)
• Specificity = True Negative Fraction = P(Screen Negative | Disease Free) = d/(b+d)

One might also consider the:

• False Positive Fraction = P(Screen Positive | Disease Free) = b/(b+d)
• False Negative Fraction = P(Screen Negative | Disease) = c/(a+c)

The false positive fraction is 1-specificity and the false negative fraction is 1-sensitivity. Therefore,
knowing sensitivity and specificity captures the information in the false positive and false negative
fractions. These are simply alternate ways of expressing the same information. Often,
sensitivity and the false positive fraction are reported for a test.

For the screening test for Down syndrome the following results were obtained:

Screening Test Result   Affected Fetus   Unaffected Fetus   Total
Positive                             9                351     360
Negative                             1              4,449   4,450
Totals                              10              4,800   4,810

Thus, the performance characteristics of the test are:

• Sensitivity = P(Screen Positive | Affected Fetus) = 9/10 = 0.900
• Specificity = P(Screen Negative | Unaffected Fetus) = 4,449/4,800 = 0.927
• False Positive Fraction = P(Screen Positive | Unaffected Fetus) = 351/4,800 = 0.073
• False Negative Fraction = P(Screen Negative | Affected Fetus) = 1/10 = 0.100
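
The four fractions can be computed for any 2x2 screening table with one small function. This is a sketch under the a, b, c, d cell layout used in this section; the function name `screening_performance` is illustrative.

```python
def screening_performance(a, b, c, d):
    """Return the four performance fractions for a 2x2 screening table.

    a = screen positive and diseased,  b = screen positive and disease free,
    c = screen negative and diseased,  d = screen negative and disease free.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false_positive_fraction": b / (b + d),
        "false_negative_fraction": c / (a + c),
    }

# Down syndrome screening data from the table.
down_syndrome = screening_performance(a=9, b=351, c=1, d=4449)
for name, value in down_syndrome.items():
    print(f"{name}: {value:.3f}")
```

For the Down syndrome data this prints 0.900, 0.927, 0.073 and 0.100. The false positive fraction equals 1 - specificity and the false negative fraction equals 1 - sensitivity, because each pair shares a denominator.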

Interpretation:

• If a woman is carrying an affected fetus, there is a 90.0% probability that the screening test will
be positive.
• If a woman is carrying an unaffected fetus, there is a 92.7% probability that the screening
test will be negative.

However, the false positive and false negative fractions quantify errors in the test. The errors are
often of greatest concern.
• If a woman is carrying an unaffected fetus, there is a 7.3% probability that the screening test
will incorrectly come back positive. This is potentially a serious problem, as a positive test
result would likely produce great anxiety for the woman and her family.
• If a woman is carrying an affected fetus, there is a 10.0% probability that the test will come back
negative. A false negative result is also problematic: the woman and her
family might feel a false sense of assurance that the fetus is not affected when, in fact, the
screening test missed the abnormality.

The sensitivity and false positive fractions are often reported for screening tests. However, for some
tests, the specificity and false negative fractions might be the most important. The most important
characteristics of any screening test depend on the implications of an error. In all cases, it is
important to understand the performance characteristics of any screening test to appropriately
interpret results and their implications.

Positive and Negative Predictive Value

Consider the results of a screening test from the patient's perspective. If the screening test is positive,
the patient wants to know, "What is the probability that I actually have the disease?" And if the test is
negative, astute patients may ask, "What is the probability that I do not actually have the disease if my
test comes back negative?"

These questions refer to the positive and negative predictive values of the screening test, and they
can be answered with conditional probabilities.

                  Diseased   Non-Diseased   Total
Screen Positive   a          b              a+b
Screen Negative   c          d              c+d
Totals            a+c        b+d            N

• Positive Predictive Value = P(Disease | Screen Positive) = a/(a+b)
• Negative Predictive Value = P(Disease Free | Screen Negative) = d/(c+d)

Consider again the study evaluating pregnant women for carrying a fetus with Down syndrome:

Screening Test   Affected Fetus   Unaffected Fetus   Total
Positive                      9                351     360
Negative                      1              4,449   4,450
Total                        10              4,800   4,810

• Positive Predictive Value = P(Affected Fetus | Screen Positive) = 9/360 = 0.025
• Negative Predictive Value = P(Unaffected Fetus | Screen Negative) = 4,449/4,450 = 0.9998

Interpretation:

• If a woman screens positive, there is a 2.5% probability that she is carrying an affected fetus.
• If a woman screens negative, there is a 99.98% probability that she is carrying an unaffected
fetus.
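
The predictive values condition on the rows of the table (the screening result) rather than on the columns (the true disease status). A minimal sketch with the same a, b, c, d cell layout; the function name `predictive_values` is illustrative.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) for a 2x2 screening table.

    a = screen positive and diseased,  b = screen positive and disease free,
    c = screen negative and diseased,  d = screen negative and disease free.
    """
    ppv = a / (a + b)   # P(Disease | Screen Positive)
    npv = d / (c + d)   # P(Disease Free | Screen Negative)
    return ppv, npv

# Down syndrome screening data from the table.
ppv, npv = predictive_values(a=9, b=351, c=1, d=4449)
print(f"PPV = {ppv:.4f}")   # 0.0250
print(f"NPV = {npv:.4f}")   # 0.9998
```

The complement of the negative predictive value, 1/4,450, is the probability that a woman with a negative test is nevertheless carrying an affected fetus.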

Positive Predictive Value (Yield) Depends on the Prevalence of Disease

The sensitivity and specificity of a screening test are characteristics of the test's performance at a
given cut-off point (criterion of positivity). However, the positive predictive value of a screening test
will be influenced not only by the sensitivity and specificity of the test, but also by the prevalence of
the disease in the population that is being screened. In this example, the positive predictive value is
very low (2.5%) because the prevalence of the disease in the screened population is low. As a
disease becomes more prevalent, more of the subjects with positive tests fall in the
"affected" or "diseased" column, so the probability of disease among subjects with positive tests
is higher.

In this example, the prevalence of Down syndrome in the population of N=4,810 women is 10/4,810 =
0.002 (i.e., in this population Down syndrome affects 2 per 1,000 fetuses). While this screening test
has good performance characteristics (sensitivity of 90.0% and specificity of 92.7%), the prevalence
of the condition is low, so even a test with a high sensitivity and specificity has a low positive
predictive value. Because positive and negative predictive values depend on the prevalence of the
disease, they cannot be estimated in case-control designs, where the investigator fixes the numbers
of diseased and non-diseased subjects.
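
The dependence on prevalence can be made explicit with Bayes' theorem, which expresses the positive predictive value in terms of sensitivity, specificity and prevalence. In the sketch below the first prevalence is the one observed in this population; the other three values are hypothetical and chosen only to show the trend. The function name `ppv_from_prevalence` is illustrative.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Bayes' theorem: P(Disease | Screen Positive)."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)

# Test characteristics from the Down syndrome example.
sensitivity = 9 / 10
specificity = 4449 / 4800

# 10/4810 is the observed prevalence; 0.01, 0.05 and 0.20 are hypothetical.
for prevalence in [10 / 4810, 0.01, 0.05, 0.20]:
    ppv = ppv_from_prevalence(sensitivity, specificity, prevalence)
    print(f"prevalence = {prevalence:.3f}  ->  PPV = {ppv:.3f}")
```

At the observed prevalence the function reproduces 9/360 = 0.025, and with sensitivity and specificity held fixed the positive predictive value rises as the prevalence rises.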
