
CSN 2045

BIOSTATISTICS AND
RESEARCH METHODS IN PHARMACY

Pharmacy C479

(4 quarter credits)

A Course for Distance Learning


Prepared by Sean D. Sullivan, Holly Andrilla, David Blough, and Andy Stergachis
First Edition November 1996
Revised February 1998

Copyright 1996 by UW Distance Learning


University of Washington Extension
5001 25th Avenue N.E.
Seattle, Washington 98105-4190

All rights reserved. No part of this publication may


be reproduced in any form or by any means without
permission in writing from the publisher.

Printed on recycled paper.


CONTENTS
Introduction.................................................................................................................................................4
Required Texts; Supplemental Materials; What Is Pharmacy C479?;
Assignments and Exams; Grading Policy; Instruction for Assignment
Submission; Deadlines; Finding Key Terms; A Word of Caution; Study
Tips for Research Design and Biostatistics; What If I Can’t Get the
Same Answer as the Book?; Optional Materials; Electronic Mail
Options; About Your Instructors; Contacting Your Instructor

Lesson One The Basics of Clinical Research...........................................................................................12


The Anatomy of Research; Elements of the Research Protocol;
Significance; Research Design; Study Subjects; Measures; Statistical
Plan; Validity; Error; The Research Question; Choosing Study
Subjects; Populations; Specifying Selection Criteria; Sampling; Design
Errors and Prevention Strategies; Planning Measurement: Precision
and Accuracy; Measurement Scales; Precision; Accuracy; Validity;
Conclusion

Lesson Two Cohort Studies ......................................................................................................................................26


An Introduction to Cohort Studies; Retrospective Cohort Study; Steps
in Planning a Cohort Study; Analyzing Cohort Studies: Incidence and
Relative Risk; The Relative Risk Ratio

Lesson Three Cross-Sectional and Case-Control Studies...............................................................32


Cross-Sectional Studies; Advantages of Cross-Sectional Studies;
Weaknesses of Cross-Sectional Studies; Conducting a Cross-
Sectional Study; An Introduction to Case-Control Studies; Advantages
of Case-Control Studies; Weaknesses of Case-Control Studies;
Choosing Cases; Choosing Controls; Assessment of Exposures; Bias
in Case-Control Studies; Sampling Bias; Measurement Bias; Some
Methods to Reduce Measurement Bias; Confounding; Analysis of
Data from Case-Control Studies

Lesson Four Experiments.............................................................................................................................................38


Introduction to Experimental Designs; Types of Experimental Designs;
Special Types of RBTs and Other Experimental Designs; Conclusion;
Further Reading

Assignment #1 .........................................................................................................................................46

Lesson Five Preparing for the First Midterm Examination...........................................................48

Lesson Six Data Analysis: Descriptive Measurement....................................................................50


Understanding the Data Set

Assignment #3 .........................................................................................................................................54

Lesson Seven Distributions............................................................................................................................................56


Introduction; Probability

Assignment #4 .........................................................................................................................................59

Lesson Eight Confidence Estimation.........................................................................................60
Introduction to Statistical Estimation; Constructing a Confidence
Interval for a Mean; Comparing Two Means; Confidence Intervals for
an Odds Ratio

Assignment #5 .........................................................................................................................................64

Lesson Nine Preparing for the Second Midterm Examination...................................................66

Lesson Ten Hypothesis Testing...........................................................................................................................68


The Purpose of Hypothesis Testing; Errors; Stating a Research
Hypothesis

Assignment #7 .........................................................................................................................................71

Lesson Eleven Basic Statistical Testing ..............................................................................................................72


Introduction; Testing Two Proportions; Testing Several Proportions;
Testing Two Means; Limitations

Assignment #8 .........................................................................................................................................76

Lesson Twelve Correlation and Regression......................................................................................................78


Correlation; Regression

Assignment #9 .........................................................................................................................................81

Lesson Thirteen Statistical Testing...............................................................................................................................82


Nonparametric Methods; Wilcoxon Sign Test; Wilcoxon Rank-Sum
Test

Assignment #10 ......................................................................................................................................85

Lesson Fourteen Preparing for the Final Examination.................................................................................86


PHARMACY C479
BIOSTATISTICS AND RESEARCH
METHODS IN PHARMACY
INTRODUCTION
Required Texts

Le and Boen. Health and Numbers: Basic Biostatistical Methods. Wiley-Liss, 1995.

Hulley and Cummings. Designing Clinical Research. Williams and Wilkins, 1988.

Supplemental Materials
You will need a calculator, preferably one that has scientific or statistical
functions. We do not recommend that you spend hundreds of dollars on a
calculator for this course. Before undertaking Lessons Six through
Fourteen, you may want to refer to a standard algebra refresher text to
refamiliarize yourself with the concepts of notation (e.g., Σ, σ, µ, α, β, ν),
formulas, expressions, equations, coefficients, plotting and coordinates,
slope, and intercept.

What Is Pharmacy C479?


This course is designed for students in the external Pharm.D. program as a
clear and practical introduction to the concepts of clinical research study
design and applied biostatistics. The specific purpose of the course is to
prepare you to evaluate the clinical literature with respect to study design
and statistical assessment. There is an emphasis in this course on the
application of clinical research design and statistics concepts to
biomedical and pharmacy or drug-related problems and not on the
mathematical fundamentals.

We will be covering a lot of material in this course in a relatively short


time (two textbooks in five months or less!). Strict attention to the
readings and assignments will be an important factor for successful
completion of the course. In addition, you will need a working
understanding of algebra. If you have not reviewed algebraic logic or
concepts in the recent past, it would be in your best interest to purchase
and work through an algebra review guide.

Many of the underlying technical equations and proofs in biostatistics are


not presented in this course. You are not likely to require this level of
detail in practice. In addition, calculators and computers handle many of
the mundane but necessary calculations for statistical analysis. However,
certain key statistical concepts require knowledge about some basic



formulae. These will be presented when necessary, but the detail and
mathematics will be kept at a minimum.

Assignments and Exams


Most lessons will consist of a reading assignment, a review of the
important elements of each chapter, practice problems, and a written
assignment for submission. There will be two midterms and a
comprehensive final examination. These tests will be supervised by a
proctor in your local area.

The course will cover most but not all of the material in the two required
textbooks. The fact that this is a tremendous amount of reading is not lost
on us. Without the benefit of lectures, you are limited to the information
contained in the textbooks and this guide. To make the textbooks more
practical and useful, we have endeavored to select the key concepts in
each chapter to review in this study guide. In addition, for the statistical
analysis sections we have selected problems for you to work that
accompany each of the key concepts. This study guide works well as a
companion to the textbooks and not as a substitute. Please do all the
readings before attempting the problems.

The practice problems for Lessons Six through Fourteen have been
selected from the problem sets in the Le and Boen book. The answers to
some of these problems can be found in the book. You should make every
attempt to complete each of the problems even though many of the
answers appear in the back of the book. That way, when grading your
assignments, we will be able to discern how you arrived at your answer
and give you feedback in areas where you may be having some difficulty.
In addition, you may see one or more of these problems on the
examinations.

Each written assignment will consist of between one and twenty problems
that you will submit for grading. These problems will come from the Le
and Boen book and other sources. Because these will be graded it is
important for you to write clearly and, in the case of statistics problems, to
show the steps you took to arrive at the solution. The midterms, Lessons
Five and Nine, and the final examination, Lesson Fourteen, should be
scheduled only after you have received all your corrected assignments
back. Please allow sufficient time (ten to fifteen days) for these to be
graded. The midterms cover the lessons since the last exam. The final is
comprehensive, covering all of the lessons. All exams are to be taken
without the aid of the textbooks or other materials, with two exceptions:
you are allowed to bring one page of formulae to help you on the exams,
and you are allowed to bring and use a calculator. To help complete the
exams, you will also be provided with a copy of appendixes A, B, and C
from the Le and Boen text. You will have ninety minutes to complete each
midterm exam and two hours to complete the final.

Grading Policy
The written assignments will collectively count for 25% of your grade,
each of the midterms will count 25% of your grade (for a total of 50%),

and the final will count for the remaining 25% of the grade. Each
assignment and exam will be graded on a four-point scale.

Instruction for Assignment Submission


Please use 8.5" by 11" paper for your assignments. Mark your solutions
for the statistical problems clearly by drawing a box around the answer.
Put your name and the assignment number atop each page and staple
multiple pages together. Assignments submitted by electronic mail cannot
be accepted for this course.

Please be certain to enclose the Assignment Identification Sheet that


corresponds to your assignment when you submit your work. Your
assignments cannot be processed without these forms.

It is best to always keep a copy of the work you submit. Mail can
occasionally get lost.

Deadlines
If you need to have your final course grade turned in to the registrar by a
particular date, plan ahead. Turn in all of the preceding assignments before
you schedule an exam and allow two weeks after submitting the final
exam for Distance Learning to process your grade. No special
accommodations can be made on this point.

Finding Key Terms


Each lesson has a list of key terms that are crucial to your understanding
the material and to your success on the exams. Neither this course guide
nor the textbooks contain a glossary for an easy list of definitions. Instead,
all three attempt to define the terms as they use them.

In the course guide, you will find many of the key terms explained in the
commentary. The textbooks, however, have the added advantage of an
index that you can use to find many of the key terms used throughout the
texts. Please take the time to look up and memorize these terms.

A Word of Caution
Procrastination is your enemy in completing a self-study course. Without
the set schedule of a lecture course, some students may put off doing
lessons and assignments. This course can be completed in the time frame
of a ten-week quarter if you apply yourself. You will have to manage your
own time to successfully complete this course.

Study Tips for Research Design and Biostatistics


Some students of research design and biostatistics complain that the
concepts and techniques are too difficult to comprehend and apply.
Common complaints are that textbooks are complex and difficult to read,
the lectures too dry, and the examples irrelevant. We acknowledge these
criticisms as being partly true. The fault lies largely not with the students, but
with the instructors and educational materials. We have searched extensively to
select the most applied and easy-to-use textbooks for this course. In
addition, we have made it a priority to develop a clear and concise study
guide with practical problems relevant to clinical practice.



It is incumbent upon you to treat this course as you would a job. Pay
attention to it every day. There is no shortcut to learning these tools. You
will master this material with practice and a good attitude.

Do not read the text like a novel. While there is a rather important “story
line,” the books do not read like a novel. You may find it easiest to read
each chapter in sections, mastering each before proceeding to the next.

Remember to practice, practice, practice, and to review if the material


fades.

What If I Can’t Get the Same Answer as the Book?


Imagine this: You’ve just come to the end of a problem that involves some
messy calculations, and you feel confident in your answer. You check it in
the book and the answer isn’t the same!

What now? Don’t panic.

There are three possible reasons for the discrepancy between your answer
and the one in the book:

1. It’s impossible to print an error-free study guide or textbook. There


are mistakes in the answers.

2. Space is at a premium in any textbook. The answer in the book may


have been manipulated to fit into a given space and therefore looks
different.

3. Your answer might, in fact, be wrong.

So what do you do? Quickly look over your work for clerical errors like
dropping a negative sign. Compare your answer with the book’s, and use
the similarities and differences to focus on the particular spot where you
disagree. If you still can’t find the error, look at other problems similar to
the one you just did. If your answers agree on four out of five problems,
you understand how to do the problems and you can safely forget this one.
On the other hand, if your answers agree with the book’s on zero out of
five problems, you are probably on the wrong track, and you should go
back and read some more.

Important note: Don’t get in the habit of relying on the answers.


Remember, they won’t be there for you during exams.

Nevertheless, we have found many errors in Le and Boen’s Health and


Numbers: Basic Biostatistical Methods. Please see the errata sheet we
have provided as an insert to help alert you to these errors in advance.
The insert may be found at the end of this introduction to the course guide.

Optional Materials
The following books may help you with course material but are not
required and will not be referred to in the course.

Clinical Research Methods


These books address the fundamentals of clinical trial design and conduct
and are an excellent source for those wishing further reading in the area.
• Friedman, Furberg, and DeMets. Fundamentals of Clinical
Trials. 2d ed. Mosby-Year, 1985.

• Shapiro and Louis, eds. Clinical Trials: Issues and Approaches.


Marcel Dekker, 1983.

Biostatistics
These two books are commonly used as introductory texts in biostatistics
courses for health professionals. You may find them helpful if you wish
further clarification or more in-depth discussion on the statistics topics.
• Norman and Streiner. Biostatistics: The Bare Essentials. Mosby-
Year, 1994.

• Glantz. Primer of Biostatistics. 3d ed. McGraw-Hill, 1995.

Electronic Mail Options


The advantages of using electronic mail in this course are as follows:
• You can get personal answers to your questions more quickly
than by regular mail.

• Your account will provide access to library catalogues of major


universities, Grolier’s Encyclopedia, Webster’s Dictionary, the
Oxford English Dictionary, Usenet, the UW Campus Calendar, and
other resources.

Establishing an Electronic Mail Account


If you do not have an electronic mail account, you can establish a UW
Uniform Access account by doing one of the following:
• If you are a matriculated UW student, follow the instructions in
the “Getting Started” section of the Guide to Information
Technologies and Resources via the University of Washington.

• If you are not a matriculated UW student, or if you need help


setting up your account, call Distance Learning at (206) 543-2350 or
(800) 543-2320 (voice), or (206) 543-6452 (TDD).

For more information on setting up and using electronic mail and other
Internet resources, see the “Online Access” section of the Distance
Learning Student Handbook.



Asking Questions
Once you have an e-mail account, you can address questions about
Distance Learning in general to distance@u.washington.edu and questions
about this particular course and homework problems to David Blough
using the following e-mail address: dkblough@u.washington.edu.

If you have any problems using electronic mail, you can get help by
calling Distance Learning. You can also send e-mail to Distance Learning
at distance@u.washington.edu or UW Computing and Communications at
help@cac.washington.edu.

About Your Instructors

Sean D. Sullivan
I am a native of west central California and I currently live in Kirkland,
Washington. I have been teaching clinical research methods and medical
technology assessment at the University of Washington since 1992. I
received a B.S. in pharmacy from Oregon State University, an M.S. in
pharmacy administration from the University of Texas, and a Ph.D. in
health economics and policy from the University of California at
Berkeley. As an instructor, I have tried to convey a sense of the practical
when it comes to medical statistics and clinical research design.

Holly Andrilla
I am a native of Seattle and attended the University of Washington as an
undergraduate, earning my B.S. in statistics. I continued at the UW and
completed my M.S. in biostatistics. I currently work as a biostatistician in
the Departments of Family Medicine, Pharmacy, and Biostatistics. I also
teach at North Seattle Community College.

Dave Blough
I have been pushing numbers around since receiving my doctorate in
statistics from Iowa State University in 1982. I left the corn and soybean
fields to teach and publish, and currently enjoy working with researchers
from various disciplines through my own consulting firm. When I moved
to Seattle in 1994, it was to work for the Department of Health Services at
the University of Washington. There I helped develop medical risk
adjustment models for the state of Washington. When not playing the
piano or backpacking in the mountains, I love to hunch over the computer
and derive great insights from masses of data. Even more than that, I enjoy
teaching others that statistics can be useful, interesting, and even fun.

Andy Stergachis
I am a lifelong resident of the Pacific Northwest. A graduate of the
Washington State University College of Pharmacy and the University of
Minnesota, I have held a variety of research and administrative positions
at the University of Washington School of Pharmacy and at Group Health
Cooperative of Puget Sound. My research has used several study designs,
ranging from experimental to epidemiologic methods. I am the recipient of
numerous research grants and awards in the area of pharmacoepidemiology
and drug policy. My teaching methods emphasize real-life problems
facing pharmacists and other health-care professionals.

Contacting Your Instructor


You can reach your instructor through voice mail by calling Dave Blough
at (206) 616–8353 or (800) 499–7182, or through electronic mail at
dkblough@u.washington.edu.



LESSON ONE
THE BASICS OF CLINICAL RESEARCH
Reading Assignment

Hulley and Cummings, chapters 1 through 4

Objectives
At the end of this lesson, you should be able to
o identify the elements of a research protocol;
o formulate a set of primary and secondary clinical research
questions;
o differentiate between internal and external validity;
o critique clinical study enrollment (inclusion and exclusion)
criteria for appropriateness to the research question;
o select from among the possible approaches to sampling; and
o identify the characteristics of study variables.

Key Terms
research questions
design
selection of study subjects
measures
statistical evaluation
treatment group
internal validity
external validity
random error
systematic error
sampling error
measurement error
target population
accessible population
study sample
inclusion and exclusion criteria
sample
probability sampling
nonprobability sampling
consecutive sampling
convenience sampling
judgement sampling

The Anatomy of Research


This course begins with a discussion of the basic elements of clinical
research design. Understanding the research process is a necessary
prerequisite to critical biomedical literature evaluation. In this lesson and
its accompanying textbook reading, the components of the research
protocol are described, as is the process for selecting the primary and



secondary research questions and related clinical measurements.
Enrollment criteria are presented along with a discussion of common
sampling approaches for selecting the study subjects from among the
eligible population. Finally, measurement issues in research data are
presented and discussed.

Elements of the Research Protocol


The structure and conduct of a clinical research project is set out in the
study protocol. This is the written plan of the study and is usually prepared
by a team of researchers who are headed by the principal investigator (PI).
The purpose of the protocol is to provide a detailed and descriptive road
map for the efficient conduct of the study. A researcher interested in
conducting the study should be able to read the clinical protocol and
follow the intended research plan from start to finish.

The basic components of the research protocol include the following:


• the primary and secondary research questions;

• the significance of the research problem;

• the overall design of the study;

• criteria and plan for selection of study subjects;

• list and characteristics of study variables or measures; and

• the statistical evaluation plan.

The following sections in this course guide describe each of the elements
of the research protocol in more detail.

Research Questions
The research questions describe the primary and secondary objectives of
the study in a manner that is feasible for clinical investigation. There is
always at least one primary research question. There is often an additional
set of secondary research questions related to the primary objective of the
study. For example, a primary research question might be “Are patients
who receive calcium channel blockers for hypertension treatment at
greater risk of a myocardial infarction than those not receiving calcium
channel blockers?” A secondary research question might relate to the risk
of myocardial infarction in a special subset of hypertensive patients
receiving calcium channel blockers, or it might relate to the risk of MI in
patients taking the drug at a specific dose or taking the short-acting agents
and not the long-acting preparations. The research question is a focused
statement about the clinical problem to be addressed.

Significance
The significance section of the protocol should state with clarity the
scientific rationale for the study. In other words, this section should
develop and answer the question “Why is this study important to
undertake?” To do this, the researcher must critically review and make
reference to previously published literature (including the investigator’s
own work) and the gaps in knowledge that exist. The primary and
secondary research questions should emerge from the significance section.

Research Design
Most clinical studies in pharmacy involve research related to new or
existing drugs, devices, or services. There are two basic types of research
designs for studying the effects of these interventions. Clinical studies are
frequently experimental in that the researcher elects to actively test the
effect of an intervention (e.g., a drug, device, or program) in one group of
subjects when compared to a control or usual care group. The most
widely used experimental research design is the randomized trial, which is
often viewed as the standard by which all other clinical study designs are
compared. As noted in the text, not all situations require the use of a
randomized trial.

An observational or nonexperimental study is one in which the


investigator observes the interventions and clinical outcomes without
imposing or altering them. Observational studies can be retrospective,
where existing event and outcome data are collected on subjects after the
event has occurred, or they can be prospective, where study subjects are
followed for outcomes that have not yet occurred. Two examples of
observational study designs are the case-control and the prospective or
retrospective cohort study. The results from an observational study often
lead an investigator to conduct a randomized trial in order to test and
measure more formally and specifically the relationship between the
intervention and the outcome.

Study Subjects
The section of the research protocol on study subjects describes the
criteria for including and excluding potentially eligible patients. For
example, the investigator may be interested in studying the respiratory
outcomes of adolescent asthmatics who use peak flow meters as part of a
self-monitoring program. In this case, the inclusion criteria would reflect
an age restriction so that adolescents would be included in the study
sample and all others would be excluded. Another inclusion criterion may
relate to the ability to use a peak flow meter or willingness to participate in an
asthma self-monitoring program. In any event, the inclusion and exclusion
criteria serve to select and standardize as much as possible the
characteristics of the population that is to be studied.

In addition to selecting appropriate patients, the investigators will need to


describe the process for picking the study subjects from the population of
patients who may be eligible for the study. For example, subjects for the
previously mentioned calcium channel blocker study may be drawn from a
community catchment area or from a hypertension clinic; the latter would
be quicker and possibly less costly, and the former would require a great



deal of research organization in order to recruit potential subjects from the
broader community.

Measures
A major element of the research protocol is the description of data
elements or variables that are to be measured and collected as part of the
evaluation. The primary focus of most clinical intervention studies is on
the clinical outcome or primary clinical endpoint. The outcome is typically
collected from intervention and control subjects both before and after the
intervention. The group receiving the intervention is sometimes referred to
as the treatment group. Outcome variables for clinical studies of
pharmaceuticals are related to the potential benefits and risks of the drug
and can include measures of physiologic and symptomatic function such
as blood pressure, frequency of nocturnal awakenings, cholesterol level, or
intraocular pressure; health status variables such as measures of physical
and mental activity or health-related quality of life; and socioeconomic
measures such as time off from school or work and medical resource
consumption.

The ability of statistical tests to differentiate between real and random


differences in outcome values between treatment and control groups is
directly related to the accuracy and precision of measurement of variables
in a clinical research study. Bias in the data collection process and
measurement of study variables may lead to faulty statistical findings and
inappropriate clinical interpretations.

Statistical Plan
The final section of the research protocol contains a description of the
statistical evaluation strategy as proposed by the researchers. It is
important that the statistical plan is developed prior to data analysis. As
you will learn later in the course, the statistical evaluation plan relates
directly to the primary and secondary research questions, which are posed
in the form of research hypotheses. Without a clinically plausible
theoretical link from the research question to the analysis plan, the study is
left open to criticism from other researchers.

Validity
The purpose of conducting clinical research is to ensure that new medical
technologies, devices, procedures, drugs, or programs can be correctly and
accurately described in terms of effectiveness and safety. This is generally
accomplished by a statistical evaluation of the relevant clinical outcome
measures. Chance, bias, and confounding, however, can be alternative
explanations for positive statistical findings.

The validity of a study result is the extent to which the result observed in
the study is in fact the “true” result. Strictly speaking, the results of any
research study apply only to those subjects selected through the
inclusion/exclusion criteria and enrolled in the study. The extent to which
the results of a single clinical study accurately describe what happened
within the study is a measure of the study’s internal validity.

The extent to which the findings of a single study can be applied to
settings and populations outside the immediate boundaries of the clinical
study is external validity or generalizability. Investigators strive to
maximize both internal and external validity. Sound internal validity
strengthens the interpretability of the statistical evaluation. External
validity is necessary for the acceptance of clinical trial results beyond the
confines of the clinical trial population. Figures 1.4 and 1.5 in Hulley and
Cummings illustrate the concepts of study validity.

Error
Error may interfere with inference within clinical studies and should be
minimized through appropriate research design. The two types of error
most common to clinical studies are random error and systematic error.
Random error results from chance or unknown influence on the
measurement of important clinical variables.

Suppose that we wanted to estimate the average age of a class of 30


students. We select a random sample of 3 students (a 10% random sample)
and take an average of their age as an estimate for the class. If the “true”
average age (using all 30 students) was 23 years and the “estimated”
average age from the 3 subjects was 25, the difference between the two
measures can be described as random error and likely due to chance. Of
course, the “estimated” average age depends upon the ages of the 3
randomly selected students. Selection of another 3 subjects may result in a
different estimate of the average age. As you might infer from this
example, the larger the sample of students selected relative to the overall
size of the class, the smaller the random error will be. The most important
strategy for minimizing random error is to increase the sample size upon
which the “estimate” is based. In this case, it would have been optimal to
have estimated the “true” age of the class using all 30 subjects. In clinical
research, it is generally not possible to sample from among all the possible subjects
with a given disease.
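
A short simulation can make this idea concrete. The following Python sketch is not
part of the required course materials, and the class ages are hypothetical. It draws
repeated random samples of 3, 10, and all 30 students and shows that the average
random error of the sample mean shrinks as the sample size grows.

```python
import random
import statistics

random.seed(1)

# Hypothetical ages for a class of 30 students; the "true" mean is known here.
ages = [random.randint(21, 27) for _ in range(30)]
true_mean = statistics.mean(ages)

for n in (3, 10, 30):
    # Draw many random samples of size n and record each sample's mean age.
    sample_means = [statistics.mean(random.sample(ages, n)) for _ in range(1000)]
    # Average absolute difference from the true mean = typical random error.
    avg_error = statistics.mean(abs(m - true_mean) for m in sample_means)
    print(f"sample size {n:2d}: average random error = {avg_error:.2f} years")
```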

Systematic error results from bias or distortion where the source of the
bias is not random. Suppose that we wanted to estimate the average height
of all females in the United States, but there was only enough research
money to query a random sample of 100 females. If the random sample
were to be drawn from among female basketball players who visited a
sports medicine clinic, then an overestimate of the population or “true”
average height would surely result. The estimate would be systematically
biased in an upward direction and the source of the bias would be
sampling error.

Suppose that the height measurements were taken from among a truly
random sample of all females, and the height measuring device added
(mistakenly, of course) an extra six inches to the height of all subjects.
This would result in a substantial bias of the population average value.
Bias derived from a faulty measurement tool is termed measurement
error. This type of error has the potential to be common and devastating in
clinical research studies because of all of the measurement tools that are
used to collect health outcomes data. The entire field of laboratory



medicine is based on the accuracy of values derived from measurement
devices.
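
Systematic error behaves differently: it does not shrink as the sample grows. The
hypothetical Python sketch below (the heights are simulated, not real data) adds a
constant six-inch instrument error to every measurement; even a very large random
sample remains biased by the same six inches.

```python
import random
import statistics

random.seed(2)

# Simulated "true" heights (inches) for a large female population.
population = [random.gauss(64.0, 2.5) for _ in range(10_000)]
true_mean = statistics.mean(population)

for n in (100, 5_000):
    sample = random.sample(population, n)
    unbiased = statistics.mean(sample)                  # random error only
    biased = statistics.mean(h + 6.0 for h in sample)   # faulty device adds 6 inches
    print(f"n = {n:5d}: unbiased estimate = {unbiased:.2f}, "
          f"biased estimate = {biased:.2f}, true mean = {true_mean:.2f}")
```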

Sampling and measurement error are potentially fatal problems in clinical


research. Steps in the research design process must be taken to minimize
both types of error; otherwise, faulty statistical and clinical conclusions
can be drawn from research reports.

The Research Question


The planning and design of the research study depends on the research
question. The clarity of the research question focuses the direction of the
research study. Researchers usually have several related research
questions in mind as they begin to develop the study protocol. Only one
main question, however, should be addressed by the research study. A
clearly stated research question improves the research-design process and
increases the credibility of the entire effort.

The origin of the main research question depends upon the past experience
of the researchers, knowledge of the literature, and key or controversial
topics. Most clinical research questions derive from the application or use
of new technologies. The development of new drugs leads to interesting
questions about their efficacy and safety in various diseases and special
populations.

There are many sources of inspiration for new research questions. The
most notable are the medical literature, journal clubs, national meetings
and other scientific forums, direct observation of patients in a clinical
practice, and interactions with students, residents, graduate students, and
fellows.

Table 2.1 in Hulley and Cummings depicts the characteristics of a good


research question. Many of these features are self-evident. A good
research question must be feasible, novel, and relevant. Once the criteria
are met, the PI must determine whether the research team is technically
capable of conducting the study, whether there are sufficient resources for
the effort, and whether the study questions, the intervention, and the
research design are ethical.

Point 6 in the summary of chapter 2 of Hulley and Cummings is worth


repeating here. Developing the research question is an iterative process
that includes consultations with advisors and colleagues, a growing
familiarity with the literature, and a series of smaller scale research
efforts—including pilot studies—for testing and improving components of
the larger research plan.

Choosing Study Subjects


Two of the most difficult aspects of clinical research are (1) defining the
characteristics of the study sample and (2) identifying and recruiting
subjects that meet these eligibility criteria. Another challenging aspect of
clinical research relates to completing the study at a reasonable cost in
time and money. These critical features of clinical research are closely
linked.

The following section describes the process of establishing appropriate
research selection and sampling strategies in order to meet statistical
significance criteria, satisfy the clinical research objective, and implement
the study at a reasonable cost.

Populations
Every person with Type I diabetes alive today on the planet Earth
constitutes the population of Type I diabetics; every male alive today with
HIV-1 in the United States represents the population of U.S. male HIV-1
carriers; and so forth. A subset of the population, let’s say males with
HIV-1 in New York City, represents a sample of the population.

From a research perspective, the population of subjects with a specified


condition represents the target population. The accessible population is a
subset of the target population that is available for study, defined by
geography and by time. The study sample is the assemblage of subjects
selected from among the accessible population.

Specifying Selection Criteria


The inclusion criteria explicitly define the main characteristics of the
target and accessible population and, importantly, determine the exact
demographic and clinical characteristics of subjects to be enrolled in the
study. These criteria are to be carefully selected and consistently applied
throughout the study, and will serve as the basis for applying the results of
the study to other clinical groups or populations. Exclusion criteria
systematically eliminate individuals who meet the inclusion criteria but
whose data are likely to interfere with the interpretation of the findings.
Inclusion and exclusion criteria serve to standardize the study sample at
the expense of generalizability. This trade-off needs to be understood and
weighed carefully by investigators and by readers of research papers.

Sampling
It is not feasible for investigators to enroll the entire accessible population
in a research study because of geographic and temporal barriers. As a
consequence, a smaller number, or sample, can be selected from among
the accessible population whose characteristics closely represent the larger
population. The process of selecting the sample is called sampling.

Two prevailing types of sampling strategies are available to researchers:


probability and nonprobability sampling. Probability sampling allows all
possible study subjects a specified (usually equal) chance of being selected
into the sample. The most frequently used sampling strategy in clinical
research is a simple random sampling scheme that selects persons at
random from amongst the accessible population. A systematic sampling
process involves selecting every kth individual from an ordered list (e.g.,
every 10th person). Possible
selection biases limit the usefulness of this strategy in clinical research.
Stratification prior to random sampling allows disproportionate
representation from less common subgroups. Because there are
disproportionately fewer cases of HIV-1 infection in females than in
males, a stratified sampling approach would be needed to identify and



recruit enough female subjects for a research study on gender differences
in HIV-1 treatment.
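
If you are curious about the mechanics, the three probability-sampling schemes
described above can be sketched in a few lines of Python. The patient list and
strata below are hypothetical and chosen only for illustration.

```python
import random

random.seed(3)

# Hypothetical accessible population: (patient id, sex) pairs, mostly male.
accessible = [(i, "F" if i % 10 == 0 else "M") for i in range(1, 201)]

# Simple random sample: every subject has an equal chance of selection.
simple = random.sample(accessible, 20)

# Systematic sample: every kth subject from the ordered list (here k = 10).
k = 10
systematic = accessible[::k]

# Stratified sample: sample each sex separately so that the less common
# subgroup (females here) is adequately represented.
females = [p for p in accessible if p[1] == "F"]
males = [p for p in accessible if p[1] == "M"]
stratified = random.sample(females, 10) + random.sample(males, 10)

print("females in simple sample:    ", sum(1 for p in simple if p[1] == "F"))
print("females in stratified sample:", sum(1 for p in stratified if p[1] == "F"))
```
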
Nonprobability sampling schemes are more common and practical for
clinical research, yet suffer from serious problems related to statistical
testing. Statistical significance testing requires the assumption that a
probability sample has been used to select study subjects. While each of
the nonprobability sampling strategies strives to approximate a random
sample, the outcome is usually less than desirable. Consecutive sampling
involves taking every patient who meets the inclusion criteria. This is
common among clinical trials. Convenience sampling involves taking
those persons from among the accessible population who are conveniently
available to the research team. Usually this means everyone from a select
number of clinics. Judgement sampling is the quickest and involves
individual selection by the research team.

It may be clear to you now that nonprobability sampling, while convenient


and efficient, may in the worst case produce a study sample that
does not adequately represent the accessible population.

Design Errors and Prevention Strategies


Errors in the design and implementation of the study are common and
preventable. Careful consideration of inclusion and exclusion criteria,
sampling strategies, measurement, and subject recruitment procedures will
minimize research errors and allow for completion of the study in a
feasible time period and at a reasonable cost. Tables 3.2 and 3.3 in Hulley
and Cummings illustrate common research design and implementation
errors and typical prevention alternatives.

Planning Measurement: Precision and Accuracy


Measurements are observations that describe clinical events in terms that
can be analyzed descriptively and statistically. Measurements are taken on
study subjects while they are enrolled in a research study. For example, an
investigator may measure a subject’s heart rate, blood pressure, blood
sugar, age, height, and weight. Measures may be taken through direct
observation, but more frequently are taken by using measurement tools
such as scales or rulers or blood pressure cuffs.

The external validity of a study depends upon how well the measures in
the study represent the “true” clinical events or outcomes. How well do
systolic and diastolic blood pressure measurements reflect
hypertension? How well does peak expiratory flow reflect clinical asthma?
The internal validity of a study rests, in part, on the ability of the
measurement tools to accurately and precisely represent the phenomenon
of interest to the investigator. Is the electronic blood pressure device
measuring the “true” blood pressure of the study subject? Is the
questionnaire on mental status describing “true” mental impairment?

Measurement Scales
Data or measures often are represented numerically. For analytic purposes,
these measures fall into one of two general classifications: (1) continuous
measures or (2) categorical measures. Table 4.1 in Hulley and Cummings
displays these measurement scales. These distinctions are important in that

the choice of statistical tests depends upon the type of measure being
compared. (This will be discussed in detail later in the course.) Note that
the technical term for a measure often used by scientists is “variable.”

Continuous measures have quantified intervals on an infinite scale of


values. Weight and height are two examples of continuous measures. The
ability of someone to make an accurate measure of height or weight is
limited by the ability of the tool to elicit the exact value. For example, a
standard bathroom scale may only be able to present a person’s weight in
pounds up to one decimal place, whereas a more accurate scale may be
able to measure weight out to four decimal places. The actual value of the
measure—weight—is limited only by the sensitivity of the tool used to
measure it. On the other hand, discrete measures are those with limited or
finite possible values.

When data are not readily quantifiable (i.e., when they do not have innate
numeric values), researchers often reconfigure them into categories and
assign numerical values to these categories. For example, characterizing
the gender of study subjects is not something that can be represented
numerically in the absence of a categorization scheme. Measures in which
there are only two possible categories are referred to as dichotomous, or
binary, variables. Gender (male or female) is a perfect example of a
dichotomous variable.

Continuous variables can be readily converted into categorical variables


for analytic purposes. A dichotomous classification of blood pressure
control can be devised simply by setting a threshold level. Persons with a
diastolic reading less than 90 mm Hg could be classified as
normotensive, and all others (90 mm Hg and higher) could be classified as
hypertensive. By creating these classifications, we have created a
dichotomous variable from a continuous measure.
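
The threshold rule just described is simple enough to express directly. A minimal
Python sketch, using hypothetical diastolic readings and the 90 mm Hg cutoff from
the example above:

```python
# Hypothetical diastolic blood pressure readings (mm Hg).
diastolic = [78, 85, 92, 101, 88, 95]

# Dichotomize the continuous measure at the 90 mm Hg threshold:
# below 90 -> "normotensive"; 90 and higher -> "hypertensive".
categories = ["hypertensive" if bp >= 90 else "normotensive" for bp in diastolic]

print(list(zip(diastolic, categories)))
```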

Nominal variables are those with more than two categories for which the
ordering of the categories is not important. Ordinal variables are probably
quite familiar to you; they are measures with multiple categories for which
the ordering of the categories has meaning. For example, you are no doubt
familiar with the following type of question:

Q1. Please rate your overall health during the past two weeks:

1. Excellent
2. Very Good
3. Good
4. Fair
5. Poor

The response set (1 to 5) represents an ordinal variable in which the order


and value of the responses are meaningful to the researcher. A value of
“1” represents a level of health that is greater than a value of “2”, and so
on.



Precision
A precise measure is one that has nearly the same value each time it is
assessed. If the values that appear each time a person steps on a scale are
the same over 10 different weighings, the measure (and the scale) is
described as being precise. The goal of any research project is to make
precise measures of all of the important outcome and predictor variables
so that the study can report the results of statistical tests with a defined
degree of confidence. The less precise the measurements, the less
confidence one can have in the statistical results.

There are at least three possible sources of error when making


measurements, all of which can reduce precision and have potentially
harmful consequences on the statistical results of a study. Observer error
results from the person making the measurement. For example, a
pharmacist may not interpret a blood pressure reading correctly, or a
physician may not determine a person’s heart rate (pulse) correctly.
Another source of error rests within the patient: intrinsic biologic
variability may produce different values on repeated measures. The final
source of error, and one of particular importance, is the error produced by
the measurement tool.

The degree of precision in a measure is represented numerically by the


standard deviation and the coefficient of variation. These values depict the
level or amount of error (but not the source, of course) around a central or
mean estimate of the measure. For example, when averaged together, 10
readings of diastolic blood pressure on the same person within one hour
produced a mean value of 87 mm Hg with a standard deviation of 8.7 mm
Hg. The standard deviation suggests that, on average, the individual blood
pressure measurements deviated from the mean of 87 mm Hg by
approximately 8.7 mm Hg. The coefficient of variation (the standard
deviation divided by the mean) represents a useful measure to compare
precision across different variables. In this example, the coefficient of
variation would be 0.10, or about 10% variability around the mean
estimate. Values with large standard deviations and coefficients of
variation lack precision.
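
If you would like to verify this arithmetic yourself, the short Python sketch below
computes the mean, standard deviation, and coefficient of variation for a set of
hypothetical repeated readings (chosen to have a mean of 87 mm Hg), and reproduces
the 8.7 / 87 = 0.10 calculation quoted above.

```python
import statistics

# Ten hypothetical diastolic readings on the same person within one hour.
readings = [80, 95, 78, 92, 85, 99, 76, 90, 88, 87]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)   # sample standard deviation
cv = sd / mean                    # coefficient of variation

print(f"mean = {mean:.1f} mm Hg, SD = {sd:.1f} mm Hg, CV = {cv:.2f}")

# Using the figures quoted in the text: CV = 8.7 / 87 = 0.10 (about 10%).
print(f"CV from the text's example: {8.7 / 87:.2f}")
```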

There are additional approaches to assessing precision. Correlation


techniques are used widely to evaluate the concordance of paired
measurements. For instance, if three different measures of sleep quality
(e.g., three different types of questionnaires) are taken as part of a drug
study for persons with asthma, it would be logical to assume that the
results of the three measures would produce similar findings. That is, if
sleep quality were improved in one measure, similar findings should
follow from the other two. If the measures of sleep quality were not
correlated, then they lack precision. The same correlation approach can be
used to evaluate the consistency of measurements taken by two or more
observers. If the correlation coefficients are weak (e.g., less than 0.25), the
measures, or the inter- or intraobserver consistency, suffer from random
variability that reduces precision.
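
The concordance check described above can be carried out directly with a
correlation coefficient. A minimal Python sketch (it uses statistics.correlation,
available in Python 3.10 and later); the paired readings from the two observers are
hypothetical.

```python
import statistics

# Hypothetical paired diastolic readings taken by two observers
# on the same ten subjects.
observer_a = [72, 80, 95, 88, 76, 90, 84, 101, 79, 93]
observer_b = [74, 78, 97, 86, 75, 92, 85, 99, 81, 94]

# Pearson correlation between the paired measurements.
r = statistics.correlation(observer_a, observer_b)
print(f"inter-observer correlation r = {r:.2f}")

# A value near 1 suggests consistent (precise) measurements; a weak value
# (e.g., below 0.25) signals poor inter-observer precision.
```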

Table 4.2 in Hulley and Cummings illustrates various strategies employed


by researchers in order to reduce random error. This is important for you as
evaluators of the medical
literature in that any threats to the precise measurement of outcome or
predictor variables directly affect the statistical results and your
interpretation of the findings. When evaluating medical and
pharmaceutical papers, pay particular attention to the methods section and
the extent to which the investigators have employed the strategies listed in
table 4.2. Furthermore, note that all five approaches involve activities that
the research team can attend to prior to implementation of the study.
Outcome assessment tools can be refined and standardized through pilot
testing and previous research, observers and data collectors can be trained,
diagnostic tests can be automated and standardized across different sites,
and if necessary, measurements can be repeated to improve precision.

Accuracy
The accuracy of a measure refers to the degree to which the measure
actually represents what it is intended to represent. Recall the sleep quality study we
discussed in the previous section. If one of the sleep quality questionnaires
were to be written so that it focused less on the actual quality of sleep and
more on the duration of sleep, the measure could be said to lack accuracy
with respect to sleep quality measurement (and would probably be less
precise, too). Figure 4.2 in the text presents the relationship between
precision and accuracy in a basic and easily understood diagram. You
should study this chart carefully.

The lack of accuracy in a measure results from systematic error. The


greater the error (or bias), the less accurate the measure. Recall that
random error affects precision and ultimately affects the results of
statistical tests. Systematic error, or measurement bias, is much more
problematic and has severely detrimental effects on statistical tests and
determinations of the value of outcome and predictor variables. As you
will read later on, researchers take great pains in designing research studies
that avoid systematic bias. Randomization and blinding are two important
strategies for reducing systematic bias.

There are three main sources of systematic error:


1. Observer bias is a consistent misrepresentation of the “true” value of
a measure as reported by the observer. An observer may knowingly or
unknowingly report values of outcome or predictor variables that are
consistently greater than or less than actual values. Imagine an
observer consistently reporting greater-than-actual diastolic blood
pressure measurements for persons assigned to the placebo arm of a
hypertensive drug study (when compared to the active treatment arm)
simply because of an interest in concluding that the active
treatment was better than placebo.

2. Subject bias is a consistent misrepresentation of the “true” value of a


measure as reported by the study subject. Analogous to the example
from above, patients have motivations for influencing the results of
drug studies.



3. Instrument bias is a consistent distortion of the “true” value of a
measure caused by a faulty calibration of a lab standard, diagnostic
instrument, questionnaire, or any other measurement tool.

Unlike precision, for which there are quantifiable methods for assessing
the degree of random error (e.g., the standard deviation and coefficient of
variation), equivalent metrics are not available to determine the degree of
systematic error that ultimately affects accuracy. To do so, one would
need an estimate of the “true” value of a variable against which to compare
observed measures. These “gold standard” measures are rarely accessible.

Table 4.4 in Hulley and Cummings displays various strategies for


enhancing accuracy. We shall highlight two: blinding and instrument
calibration.

Blinding is a much-used approach that, importantly, eliminates differential


bias. Differential bias can doom a study. A differential bias is one that
affects one treatment group more than another. For example, knowledge of
assignment to treatment groups by the researcher may differentially bias
the measurement, collection, and analysis of data. Blinding the researcher
and the study subjects (e.g., double-blind) to the assignment of treatments
eliminates any possible differential bias. That is, if there is systematic bias,
it will affect measurements equally in the two treatment groups.

Inaccurate calibration of instruments and questionnaires is an important


source of systematic bias. If laboratory standards are not set properly,
outcome and predictor variables will be affected. Also, if laboratory
standards or diagnostic instruments differ across several sites participating
in the same study, the value of pooled measurements will be affected. It is
important to standardize all measurement tools within and across sites and
over time during the course of a clinical research study.

Validity
Accuracy is more difficult to assess when the outcome measures are more
subjective, such as quality of life, pain, and cognitive function. These
measures are typically assessed not by mechanical tools but rather by
questionnaires. The accuracy of questionnaires is defined in terms of
external validity. That is, how well does the measurement tool (the
questionnaire) represent the phenomenon (e.g., quality of life, pain) in
which the researcher is interested? Has the instrument been validated?

Validity has several components, three of which are described as follows:


1. Predictive validity relates to the degree to which the measurement
successfully predicts a health outcome of interest. A depression
questionnaire is said to have good predictive validity if it can
differentiate individuals into those with and without depression.

2. Criterion-related validity is the extent to which the measurement


agrees with other approaches for measuring the same characteristic. If
an asthmatic with poor lung function, as measured by spirometry, at

the time of a clinic visit reports asthma symptom scores that are also
suggestive of poor lung function, the symptom questionnaire is said to
have good criterion validity. This is because the results of the
symptom questionnaire correlate with the results of the spirometer
values. If the symptom scores were not related to the lung function
values, the symptom questionnaire could be said to be an invalid
measure of lung function, even though it still could be a very good
measure of asthma symptoms.

3. Face or content validity is a subjective judgment on the part of the


investigator (and literature evaluator) of whether the components of
the questionnaire actually ask about the phenomenon of interest; that
is, whether it makes sense. If a questionnaire on depression is
evaluated and items or individual questions unrelated to the construct
of depression are discovered, the measurement tool is said to lack face
validity.

Conclusion
This concludes the discussion on research protocol design and
measurement. While complex, these issues are at the heart of a clinical
research study. If you have reached this point and lack a basic
understanding of any of these topics, please contact your instructor (by
electronic mail) for further explanation or additional readings. We will
move next into types of research designs for the purpose of answering
clinical questions. (Note that we skip chapters 5 and 6 in Hulley and
Cummings on purpose.)

Many of the pharmaceutical agents used to treat cancer, mental illnesses,


and disorders that involve pain and other subjective symptoms rely on
questionnaires as the source of outcome measurement in clinical trials. It
is therefore important that you become aware of the possible pitfalls and
biases of using questionnaires for clinical research. If you have interest in
studying this topic in more detail (for example, if you intend a career in
cancer or mental diseases), I strongly encourage you to read chapter 5. It is
short (10 pages) and addresses many of the critical issues related to the use
of questionnaires in clinical research.

You may be thinking about a research fellowship following the external Pharm.D. program. If so, or if you have plenty of spare time, you may
want to read chapter 6, which concerns the use of secondary data. Many of
the leading medical and pharmaceutical journals are populated with
studies that have used secondary data—data that have not been collected
primarily to answer the researchers’ questions.

LESSON TWO
COHORT STUDIES
Reading Assignment

Hulley and Cummings, chapter 7

Objectives
At the end of this lesson, you should be able to
o differentiate between a prospective and retrospective cohort
study;
o recount inherent biases in cohort studies and describe methods to
eliminate or reduce these biases; and
o define and calculate a relative risk ratio as a measure of
association between the risk factor and outcome measure.

Key Terms
cohort
cohort study
incidence
prospective cohort study
confounding factors
retrospective cohort study
relative risk ratio

An Introduction to Cohort Studies


A cohort is defined as a group of subjects that are followed over time. In
cohort studies, subjects are identified and examined over time to
determine the incidence of disease in exposed and unexposed or
comparison group subjects.

Incidence is a measure of disease frequency and is defined as the ratio of the number of new events or cases of disease that develop in a population
during a period of time (often one year) to the number of persons at risk of
contracting that disease during that period of time.

Cohort studies are generally used to compare the incidence of disease among exposed patients and unexposed patients. Also, they can identify
relationships between risk factors and outcomes. For example, a
researcher may identify a sample of persons without a particular health
outcome, such as lung cancer. The investigator would then follow these
subjects over time taking periodic measurements of potential risk factors
(predictors) of the health outcome, such as exposure to cigarette smoke.
Based upon the results of the cohort study, a researcher may conclude that
the risk factor, in this instance smoking, is associated with a greater
incidence of lung cancer in comparison to persons not exposed to cigarette
smoking.



The purpose of a cohort study is to explore the relationship between risk
factors (e.g., NSAID use) and the incidence of various health outcomes
(e.g., GI bleeding). Many examples of this type of medical research should
be familiar to you: the relationship between fat intake and poor
cardiovascular outcomes, the relationship between smoking and lung
cancer, or the relationship between alcohol consumption and liver disease.
This is accomplished by comparing the presence or absence of the
condition in those who have the risk factor with those who do not. If the
incidence of the condition is greater in those with the exposure of interest,
the risk factor is said to be associated with the health outcome. For
example, there are many cohort studies that have established that cigarette
smoking is a risk factor for the development of lung cancer.

Prospective Cohort Studies


The prospective cohort study is one of the most valid types of
observational study designs. The design is one of the major research tools
of the epidemiologist and health services researcher. A prospective cohort
study is one in which study participants are recruited from an appropriate
population and then followed over a sufficient time period to observe the
occurrence of a particular health outcome. The investigator either
repeatedly measures exposure to certain risk factors over the course of the
study or assesses exposures only at the onset of the study and then relates
the presence or absence of the risk factor to the occurrence of new cases of
the health outcome. See figure 7.1 in the Hulley and Cummings text.
However, this study design cannot answer all the important epidemiology
or health services research questions, nor is it always the most feasible or
practical design. The following describes the main strengths and
limitations of the prospective cohort study design.
The prospective cohort study has these strengths:
• It is a strong design for establishing a causal relationship between risk factors and health outcomes because the risk factors are measured prior to the occurrence of the health outcome.

• It allows for assessment of multiple outcomes related to a single risk factor. For example, the relationship between use of oral contraceptives (the risk factor) and the presence or absence of breast cancer, ovarian cancer, malignant melanoma, or myocardial infarction may be determined in a single study.

• It is of particular value when the exposure factor occurs rarely in the population since individuals are selected into the study on the basis of exposure.

The prospective cohort study has these limitations:
• It is expensive and time consuming. It frequently requires a large number of persons followed over a long time period. This design is inefficient for the study of health outcomes that are not particularly common.

• The associations found in cohort studies can be erroneous if they are due to confounding factors. Confounding factors are those that are related to both the risk factor and health outcome of interest. The example in the text regarding smoking as a confounding factor in the relationship between exercise and CHD is particularly clear. As you will learn in subsequent chapters, these confounding factors can be measured and accounted for (controlled) through appropriate design features and through the use of appropriate statistical methods.

• It is important to determine whether or not persons enrolled in the cohort study have the health outcome of interest prior to beginning the study. Subjects should be free of the health outcome prior to enrollment in the study or the results may be biased. Effective screening for subclinical forms of the disease should reduce this potential bias.

• The validity of the results can be affected by loss to follow-up. Loss to follow-up generally results from persons who withdraw or prematurely terminate their participation in the study for whatever reason.

Retrospective Cohort Study


An alternative to the prospective cohort design is the retrospective cohort
study. The researcher uses data from an existing cohort with available
baseline and follow-up measures and then proceeds to collect
retrospectively from the cohort additional information on possible risk
factors. (See figure 7.2 in Hulley and Cummings.) Because the original
cohort does not have to be recontacted and the data exist in some research
file, the cost of conducting these studies is much less than the prospective
study design. The main limitations relate to possible quality problems with
the existing data and measurements of risk factors. The investigator of a
retrospective study has to work with existing data including all of their
inherent biases.

Steps in Planning a Cohort Study


When Is a Cohort Study the Best Research Design?
Cohort studies are the best research design for
• accurately describing incidence and natural history of disease;

• describing the timing of risk factors in relationship to a health outcome measure;

• studying rapidly fatal disease (they are the only way to study such disease);

• permitting the investigation of many health outcomes associated with one risk factor; and

• studies requiring long follow-ups on large numbers of persons.

How Are Subjects Selected for a Cohort Study?


• Subjects are selected based on their appropriateness to the research question.

• Subjects who cannot possibly develop the health outcome should not be included.

• Selecting subjects from among a population with a high probability of developing the outcome can reduce the sample size requirement for the study.

Measuring Risk Factors and Confounding Variables


The quality of the results will be, in large part, related to the quality of the
data collection. All variables (risk factors, confounding variables, and outcome measures alike) should be measured with attention toward
accuracy and precision. If smoking exposure is not measured accurately,
the researcher is likely to misrepresent the relationship between smoking
and the health outcome of interest—in this case, lung cancer. For many
studies, the necessary frequency of repeated measurement of risk factors
and confounding variables depends upon the nature of the characteristic
and the resources the investigator has to make measurements.

Following Subjects and Measuring Outcomes


As has been mentioned previously, complete follow-up of study subjects
is important to the successful conduct of cohort studies. Not every person
enrolled in a prospective study can be retained by the investigative team
over the duration of the research project. Subjects terminate a study early
for a variety of reasons—some because they move, others because they do
not perceive any benefit from further participation in the study. Subject
dropouts or loss to follow-up can lead to underestimation of the “true”
incidence of the illness.

Retaining members in a cohort (cohort maintenance) is a unique challenge for investigators. Often, the strategies employed involve frequent contact
by investigators or financial or other incentives to complete follow-up.
Table 7.1 in Hulley and Cummings delineates several strategies for
reducing the loss of subjects during the follow-up period.

Analyzing Cohort Studies: Incidence and Relative Risk


Recall that at the beginning of the chapter we described cohort studies as
being particularly suited for measuring the incidence of disease or for
estimating the relationship between risk factors and illness outcomes. The
incidence of disease is described in terms of the proportion of the “at risk
population” affected by the illness. Variability in this point estimate is
represented by a confidence interval. (These concepts will be developed
further in the statistical analysis section of this course.)

Estimation of the association between risk factors and health outcomes is determined by calculation of a relative risk ratio.

The Relative Risk Ratio


The relative risk ratio is the ratio of the risk of a health outcome in
persons with the risk factor of interest to the risk of the health outcome in
persons without the risk factor. It is calculated by dividing the incidence
rate of the health outcome in exposed subjects by the incidence rate of the
outcome in unexposed subjects.

Table 2-1 shows how data from a cohort study may be presented and used
to estimate relative risk.

Table 2-1—Relative Risk Ratio

                                   Health Outcome
Exposure to Risk Factor         Yes        No        Total
Yes                              a          b         a+b
No                               c          d         c+d

The simple relative risk ratio is frequently used to describe associations between health outcomes and risk factors in cohort studies. Relative risk ratio is defined as

    RR = [a ÷ (a + b)] ÷ [c ÷ (c + d)]    or    RR = (incidence rate of disease in exposed group) ÷ (incidence rate of disease in unexposed group)

Examples illustrated in the text are the relationship between exercise and
the risk of coronary heart disease (example 7.1) or the association between
mitral-valve prolapse and the risk of death, cerebral embolus, and
endocarditis (example 7.2). However, simple relative risk ratios fail to
account for the effect (bias) of confounding factors. Multivariate methods
(e.g., logistic regression) and stratification can take account of the effect of
confounding factors and allow for calculation of an “adjusted” relative risk
ratio. For more on this approach, see chapter 10 in Hulley and Cummings.
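
To make the arithmetic concrete, here is a minimal sketch in Python of the simple (unadjusted) relative risk calculation using the Table 2-1 notation. The cohort counts are hypothetical and were chosen only to illustrate the formula.

    # Minimal sketch: incidence and relative risk from a 2x2 cohort table
    # using the Table 2-1 notation (a, b, c, d). The counts below are
    # hypothetical and chosen only to illustrate the arithmetic.

    def relative_risk(a, b, c, d):
        """Return incidence in exposed, incidence in unexposed, and RR."""
        incidence_exposed = a / (a + b)      # new cases among the exposed
        incidence_unexposed = c / (c + d)    # new cases among the unexposed
        return incidence_exposed, incidence_unexposed, incidence_exposed / incidence_unexposed

    # Hypothetical cohort: 1,000 exposed and 1,000 unexposed subjects
    ie, iu, rr = relative_risk(a=30, b=970, c=10, d=990)
    print(f"Incidence (exposed)   = {ie:.3f}")   # 0.030
    print(f"Incidence (unexposed) = {iu:.3f}")   # 0.010
    print(f"Relative risk         = {rr:.1f}")   # 3.0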

LESSON THREE
CROSS-SECTIONAL AND CASE-CONTROL
STUDIES
Reading Assignment

Hulley and Cummings, chapter 8

Objectives
At the end of this lesson, you should be able to
o describe the basic features of a cross-sectional and a case-control
study;
o recognize the strengths and weaknesses of cross-sectional studies;
o discuss biases of particular concern in case-control studies, and
ways to minimize their influence; and
o define and calculate an odds ratio and interpret 95% confidence
intervals.

Key Terms
cross-sectional study
prevalence
case-control study
sampling bias
measurement bias
recall bias
confounding
odds ratio
confidence interval

Cross-Sectional Studies
Cross-sectional studies, sometimes called prevalence studies, are used to
determine the distribution of exposures, other risk factors, and/or diseases
in a sample of the population at a single point in time. Cross-sectional
studies tell us about the distribution of a disease or its risk factors in the
population, rather than its etiology. However, the distribution patterns may
suggest etiologic hypotheses that can then be tested by case-control and/or
cohort studies.

Example: What is the prevalence of Chlamydia in the population, and is it associated with the use of oral contraceptives? To answer this question you
studied a sample of 1,800 nonpregnant women aged 15–34 years from an HMO
who underwent culture diagnostic testing for Chlamydia trachomatis. Each
woman completed a self-administered questionnaire on her contraceptive use just
prior to the examination.

This example uses the term prevalence, which is defined as the proportion
of the population who have a disease at one point in time. One can also

determine the prevalence of risk factors (e.g., smoking among users of oral
contraceptives, use of long-acting benzodiazepines by the elderly) in a
prevalence study.

Advantages of Cross-Sectional Studies


The advantages of cross-sectional studies are as follows:
• The rapidity with which they can be conducted. A cross-sectional
study can provide information about disease and risk factors at the
same time.
• Their relatively low expense and high efficiency. Cross-sectional
studies can be convenient first steps in identifying hypotheses that can
be tested using more analytic study designs.

Weaknesses of Cross-Sectional Studies


The weaknesses of cross-sectional studies are as follows:
• It is difficult to establish cause-and-effect relationships from data
collected in cross-sectional studies. It is hard to know if the exposure
or other risk factor of interest actually preceded the development of
the disease.
• Unless they are large, cross-sectional studies are not suitable for
the study of rare diseases. Rare diseases can, however, be studied if a
sample is drawn from a population of diseased persons rather than
from the general population.

Conducting a Cross-Sectional Study


Once the research question is determined, you would likely begin by
specifying the target population for the study. One can study a total
population or, more likely, a sample of a population. The sample would
then be selected and the appropriate information would be collected about
present and (sometimes) past exposures, behaviors, diseases, and other
characteristics. It is important to remember that people are studied at a
point (i.e., cross-section) in time. Cross-sectional studies may include a
one-time examination (as in the example on the prevalence of Chlamydia)
or surveys of a population of individuals. Examples of the latter strategy
include ad hoc studies of the prevalence of chronic diseases in the general
population and cross-sectional national surveys, such as the Health and
Nutrition Examination Survey (HANES).

The main statistic for expressing disease (or risk factor) frequency in a
cross-sectional study is prevalence. The example noted at the beginning of
the chapter reported that the overall prevalence of C. trachomatis in the
study population was 0.038 or 3.8%. The same study found that the
prevalence of C. trachomatis was much higher (13.7%) among women
aged 15–19 years. Contrary to the example in the text, this
particular study also reported that there was no association between
current use of oral contraceptives and the prevalence of C. trachomatis (a
prevalence odds ratio of 0.99, 95% confidence interval of 0.57–1.73).

An Introduction to Case-Control Studies


A case-control study identifies a group of persons who have a particular
disease (i.e., cases) and persons who do not have the disease (i.e.,
controls). The control group serves as a reference group. The exposure histories of cases and controls are ascertained to compute measures of association (e.g., odds ratio) between exposure and disease.

Example: You decide to assess the effect of current use of triphasic oral contraceptives
(OCs) on the risk of ovarian cyst development. You identify 106 women aged 15–59
years old with a primary diagnosis of ovarian cysts at an HMO. You also identify 255
control women without ovarian cysts who are randomly selected from the HMO,
matched to the cases for age. Pharmacy and medical records are then reviewed to
determine if cases and controls were using OCs at the time the case was diagnosed (or
at the same date for the controls).

Case-control studies have been extensively used to assess the safety of


pharmaceuticals. There are many examples of case-control studies that
have identified important associations: vaginal cancer and
diethylstilbestrol (DES), Reye’s syndrome and aspirin, peptic ulcer disease
and nonsteroidal anti-inflammatory drugs, and venous thromboembolism
and oral contraceptives.

Advantages of Case-Control Studies


Case-control studies are advantageous because
• they can be conducted rapidly (case-control studies are generally
retrospective);
• they are relatively inexpensive and efficient (e.g., the study of
vaginal cancer and DES required only 8 cases and 40 controls, rather
than the thousands of exposed subjects that would have been required
for a cohort study of this question);
• they are well-suited for studying risk factors for rare disease (for
diseases that are rare—the vaginal cancer and DES example, or, for
example, risk factors for agranulocytosis—or have long latent periods
between exposure and disease, case-control studies are often the only
feasible option); and
• they make it easy to evaluate several risk factors for a single disease
(information about past or current exposures to numerous risk factors
can be ascertained from cases and controls).

Weaknesses of Case-Control Studies


Case-control studies are weak because
• they cannot directly estimate the incidence or prevalence of the
disease, nor the attributable or excess risk;
• only one outcome can be studied, whereas cohort studies can
investigate numerous outcomes;
• they are not suited to rare exposures (since the presence or
absence of disease is the criterion for drawing the samples, subjects
are not selected on the basis of exposure);
• they are not suitable if the exposure(s) of interest is difficult or
impossible to assess retrospectively; and
• they are susceptible to several types of bias (as is noted later in
this lesson).



Choosing Cases
• It is necessary to specify inclusion and exclusion criteria for
defining cases. The criteria should be objective, sensitive, and
specific.
• Unless otherwise specified, only incident cases (i.e., those newly
diagnosed during a defined time period) are included in the case
group.
• Special consideration should be given to the source of cases or
the population from which the cases are derived. Defined populations
(e.g., geographic, HMO) have certain advantages.

Choosing Controls
The goal is to select individuals in whom the frequency of exposure would
be the same as that of the cases in the absence of an association between
the exposure and the disease.
• They should be selected from a population whose distribution of
the exposure is the same as the population at-risk for becoming a case.
• Almost always, sampling is used to select control subjects.
Sources of controls include third-party payors’ enrollment records,
hospital records, and household samples.
• Controls should be selected from among persons in whom the
presence of exposure and other characteristics can be measured in a
comparable manner.
• Population-based controls have several advantages, including the
degree to which they represent the general population. They allow for
calculating attributable risk as well as an estimate of the population
frequency of exposure.
• Hospital- or clinic-based controls may be relatively inexpensive
to identify but are more susceptible to several types of bias and results
may not be as generalizable.

Assessment of Exposures
The goal is to accurately assess the presence (and level) of exposure (e.g.,
the risk factor) for the period of time prior to the onset of the disease
during which the exposure would have acted as a causal factor.

Sources of information include interviews, secondary data sources (e.g., medical, pharmacy, birth, and death records), direct measurement (e.g., physical exam, laboratory), and environmental sources (e.g., ambient air pollution).

Bias in Case-Control Studies


Bias is any error in the design, conduct, or analysis of a study that results
in a distorted estimate of an exposure’s effect on the risk of the disease.
Bias tends to produce results that depart systematically from the true
values (in contrast to random error). Three types of bias of particular
concern are sampling bias, measurement bias, and confounding.

Sampling Bias
Also referred to as a form of selection bias, sampling bias pertains to how
cases and controls get into a study. The separate sampling of cases and
controls makes it difficult to ensure that the controls represent the same
underlying population. Selection bias occurs when noncomparable criteria
are used to enroll subjects in a study.

Measurement Bias
Because of the retrospective approach to measuring exposures, case-
control studies are susceptible to measurement bias. A type of
measurement bias referred to as recall bias is of particular concern. Recall
bias occurs when subjects report information in noncomparable ways. For
example, parents of infants with birth defects (i.e., the cases) may be more
likely than parents of normal infants (i.e., the controls) to report the use of
certain pharmaceuticals, because they will already have been concerned
about what might have caused the defect.

Some Methods to Reduce Measurement Bias


• Use memory aids, such as pictures of medications. Also, try to
validate exposure with independent sources.
• Data should be collected from cases and controls in the same
manner, using standardized data collection forms. Interviewers should
be trained and ideally should not know the case or control status of
persons being interviewed.

Confounding
Confounding occurs when a third factor, associated with both the exposure
and the disease, contributes independently to the observed association
between the exposure and the disease. Thus, one would have an indirect,
or confounded association. For example, a study of the effect of OCs on
the risk of myocardial infarction (MI) should also assess cigarette smoking
since OC users may be more likely to smoke and smoking increases the
risk of MI.

Analysis of Data from Case-Control Studies


Table 3-1 displays data from the case-control study of OCs and ovarian
cysts. Recall that cases and controls were first selected; the investigators then
ascertained their use of OCs. In these data, the proportion of cases using
OCs at the time of diagnosis is 10/75 or 0.13, and the proportion of
controls using OCs is 22/198 or 0.11.

The measure of association between exposure and disease is the odds ratio
(OR) or relative odds. The OR is a good estimate of the relative risk (RR) when the disease under study is rare; in such cases there is usually close agreement between the OR and RR.

Table 3-1—Ovarian Cysts and Current Use of Triphasic OCs

                                     Ovarian Cyst
Current Use of Triphasic OCs       Cases      Controls
Yes                                  10           22
No                                   65          176

Table 3-2—The Calculation of an OR

                 Case      Control      Total
Exposed           a           b          a+b
Unexposed         c           d          c+d

    OR = case exposure odds ÷ control exposure odds
       = { [a ÷ (a + c)] ÷ [c ÷ (a + c)] } ÷ { [b ÷ (b + d)] ÷ [d ÷ (b + d)] }
       = (a × d) ÷ (b × c)

For the example noted in table 3-1, the unadjusted OR is (10 × 176) ÷ (22 × 65) = 1.23.

ORs and RRs are usually reported with confidence intervals (CIs), which indicate a range of plausible values around the point estimate (e.g., the OR or RR). For example, a 95% confidence interval around an OR means that we can be 95% confident that the “true” OR lies in this range. If a 95% CI excludes 1.0, the finding is considered to be statistically significant. For example, a study yielding an OR of 3.0 (95% CI of 2.2 to 3.8) is clearly showing an important increase in risk. An OR of 0.6 (95% CI of 0.4 to 0.8) provides strong evidence of a reduction in risk. As a final example, an OR of 1.23 (95% CI of 0.5 to 3.3) does not provide evidence of an association, since the 95% CI includes 1.0.
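
For readers who want to check the arithmetic, here is a minimal sketch that computes the unadjusted OR for Table 3-1 and a 95% CI. The CI uses the standard log (Woolf) approximation, which is an assumption on our part; the interval quoted above (0.5 to 3.3) was presumably computed differently (for example, with adjustment for the age matching), so this crude interval will not match it exactly.

    import math

    # Minimal sketch: unadjusted odds ratio and 95% CI from a 2x2 table.
    # Counts are taken from Table 3-1 (ovarian cysts and triphasic OC use).
    # The CI uses the log (Woolf) approximation, an assumption on our part.
    def odds_ratio_ci(a, b, c, d, z=1.96):
        or_hat = (a * d) / (b * c)
        se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of ln(OR)
        lower = math.exp(math.log(or_hat) - z * se_log_or)
        upper = math.exp(math.log(or_hat) + z * se_log_or)
        return or_hat, lower, upper

    # a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
    or_hat, lo, hi = odds_ratio_ci(a=10, b=22, c=65, d=176)
    # OR is about 1.23; the crude CI need not match the CI quoted in the text.
    print(f"OR = {or_hat:.2f}, 95% CI {lo:.2f} to {hi:.2f}")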

LESSON FOUR
EXPERIMENTS
Reading Assignment

Hulley and Cummings, chapter 11

Objectives
At the end of this lesson, you should be able to
o identify and discuss the important aspects of the randomized trial
including placebo or active control, blinding, and randomization;
o describe the deficiencies inherent to nonrandomized studies with
regard to study design, analysis and clinical interpretation of the
results;
o show the importance of a crossover design.

Key Terms
experimental study
between-group design
within-group design
randomized control trial (RCT)
randomized blinded trial (RBT)
experimental/intervention group
control group
run-in RBT
crossover study

Introduction to Experimental Designs


Experimental studies involve enrollment of a group (or groups) of subjects
for the purpose of determining the association between medical
interventions and health outcome variables. The investigator selects the
comparison groups and determines the intervention to be received by both
groups in order to study the effect(s) of the intervention. Because the
researcher is in control of patient selection and the intervention, thereby
controlling through the research design possible confounding factors, the
randomized trial is considered the “gold standard” for medical research.
The major advantage of this method is that the researcher is better able to
establish a causal relationship between the intervention and the health
outcome. A substantial majority of all pharmacotherapy studies involve
the use of experimental methods.

Two key features of experimental studies are group classification and prospective follow-up. For example, subjects involved in a drug study may
be classified into two groups, those receiving a new drug and those
receiving usual care or a placebo treatment. Note that the text refers to
these groups as cohorts, whereas we and the conventional medical
literature refer to these as treatment groups (e.g., those receiving one
treatment or another). Once assigned to one of these treatment groups,
subjects are then followed over time to observe the effect of the
intervention on various health outcomes. The purpose of the study is to
causally link the intervention to observed differences in health outcomes.
Experimental designs in clinical research are preferred over observational
studies such as case-control and cohort studies because they provide the
most rigorous test of the causal relationships between interventions and
health outcomes.

Types of Experimental Designs


There are two general types of experimental study designs: (1) between-
group designs and (2) within-group designs. The important distinction
between these two design features is that with between-group experiments
the health outcomes of persons completing the study are compared
between two or more treatment groups, each of which received a different
intervention. Within-group experiments require that study subjects serve
as their own control. That is, outcomes within the treatment groups are
compared before and after an intervention.

The randomized controlled trial (RCT) is the most widely used between-
group experimental design in clinical medicine. Graphically, the research
process for the RCT is depicted in figure 11.1 in Hulley and Cummings.
(Note that the book uses the term randomized blinded trial [RBT] to
describe the same method. For clarity, throughout the remaining
sections of this lesson we will refer to this design as the RBT. But be
aware that a preponderance of clinical literature refers to these types of
studies as RCTs.) The five steps in designing an RBT are as follows:
1. After considering the primary research question, investigators (a)
select an appropriate target population for sampling, (b) identify
necessary inclusion and exclusion criteria for subject identification,
(c) estimate the sample size requirement to answer the study question,
and (d) develop a recruitment plan. Recall from previous readings that
decisions about sample identification and the sampling procedure
determine, in part, the generalizability of the findings.

2. Investigators agree upon and assess key measurements that will characterize the study subjects’ disease status and define the health outcome variables prior to randomization to one of the treatment groups. The medical literature frequently refers to health outcomes measured as part of an RBT as study endpoints. It is these endpoints that the researcher hopes to impact with the intervention. For example, forced expiratory volume in one second (FEV1) is a primary
study endpoint for assessing the effect of asthma treatments on lung
function. Baseline measures of subject disease characteristics and
health outcomes will assist the investigator in describing the subjects
and, more importantly, assure that the two treatment groups are
balanced at the beginning of the study with respect to these key
measures. If these measures are not approximately equivalent across
the treatment groups, it is possible that the randomization procedure
was unsuccessful.

3. Investigators randomize the study subjects to the treatment groups. This rather simple-sounding procedure is the most critical aspect of
the RBT. Randomization assures equal and unbiased assignment of
study subjects to the treatment groups. Furthermore, randomization
provides that possible confounding factors such as age, gender,
disease severity, and length of illness are distributed equally among
the treatment groups. Even with the most judicious randomization,
there are likely to remain slight differences in baseline factors that can
be attributed to chance. The randomization process is an important
component of what differentiates experiments from observational
studies.

Sometimes it is necessary to take special precautions in order to assure that a possible confounder is controlled through the randomization
process or that special populations are enrolled in a trial. Suppose an
investigator wishes to design an RBT to investigate the treatment
effect of a new protease inhibitor for the management of AIDS. The
investigator then wishes to assess these treatment effects in different
subpopulations of persons with AIDS. The enrollment criteria are
broad and select from among all possible persons with AIDS. A
stratified random sampling scheme would assure that enough persons
from different risk strata (e.g., women and IV drug users) were
enrolled in the trial in order to test the treatment effects within these
subpopulations. Appendix 11.A of Hulley and Cummings describes
these special randomization procedures.
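
As a rough, simplified illustration of why stratification helps (this is not the specific procedure in appendix 11.A), here is a minimal sketch in Python that randomizes subjects separately within each stratum so that every stratum contributes subjects to both arms. The strata and subject IDs are hypothetical.

    import random

    # Minimal sketch of stratified randomization: subjects are shuffled and
    # assigned within each stratum, so both arms are represented in every
    # stratum. The strata and subject IDs below are hypothetical.
    random.seed(42)  # fixed seed so the illustration is reproducible

    strata = {
        "women":         ["W01", "W02", "W03", "W04"],
        "IV drug users": ["D01", "D02", "D03", "D04"],
        "other":         ["O01", "O02", "O03", "O04"],
    }

    assignments = {}
    for stratum, subjects in strata.items():
        shuffled = subjects[:]          # copy so the original list is untouched
        random.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignments[subject] = ("treatment", stratum)
        for subject in shuffled[half:]:
            assignments[subject] = ("control", stratum)

    for subject, (arm, stratum) in sorted(assignments.items()):
        print(f"{subject} ({stratum}): {arm}")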

4. Investigators choose and apply the intervention. Between-group designs always divide study subjects into an experimental or
intervention group and a control group in order to test the effects of
the intervention on the outcomes of interest. The experimental group
receives the intervention, whether a new drug, a new device, or a new
surgical procedure. The control group can either receive a placebo or
be assigned to some alternative intervention, most commonly the
prevailing standard or usual care. Several factors should be
considered when choosing and applying the intervention:

• Blinding. Recall that treatment randomization only protects the study (and the investigator’s analysis of the results) from
confounding factors present at the beginning of the study.
Blinding study subjects and treating clinicians (double-blind) to
treatment assignment can protect against most confounding that
may occur during the conduct of the study. The consequence of
unblinded studies is that unintended interventions (co-
intervention) can affect the between-group differences that the
investigator is attempting to measure. If complete blinding is
achieved, differential bias from any co-intervention that may
occur is minimized. It is important to remember that unblinded
studies suffer from this important limitation.

Despite its importance, it is often difficult to accomplish complete blinding, particularly in drug studies. Suppose an
investigator wishes to compare the health outcomes of
theophylline versus inhaled corticosteroids in an adult COPD
population. Achieving a double-blind would be difficult because
one product is an oral tablet and the other an oral inhaler. The
treating clinician would likely know which patient was taking
which product. Double-blinding is more easily achieved in
placebo-controlled trials because a placebo look-alike product
can be produced for the control group.

• Selecting the intervention. Some interventions are not suitable for blinding. The text gives the useful example of
exercise compared to cholesterol-lowering agents. Not all
research questions can be answered with blinded designs.
Another factor to consider when selecting the experimental
treatment and comparator relates to its relevance in clinical
practice. It makes no sense to study treatments that are long since
out of date or doses that are not appropriate to the patient’s
condition or severity. Finally, if an investigator compares
multiple drug regimens, as is common in cancer chemotherapy
and HIV treatments, the results of the study will describe the
effects of the entire regimen and will not be able to discern the
individual effects of any one drug within the regimen.

• Selecting the comparator. The optimal approach to control involves the use of a blinded placebo. This way, the difference
between the treatment and control group can be completely
attributed to the effect of the treatment on the primary health
outcome. More frequently, the selection of a placebo as the
comparator has been judged to be unethical, often for good reason. For example, the Diabetes Control and Complications Trial (DCCT) investigators decided that they would not allow control patients to participate in the study without any therapy.
The treatment group received intense insulin therapy, and the
control group received usual care that involved a less intense
regimen of insulin. Note that any difference in health outcomes
between the treatment and control groups in the DCCT study
would represent only the difference between these two
approaches to the treatment of insulin-dependent diabetes and not
the absolute efficacy of intense insulin therapy compared to no
treatment.

There is a special problem to be aware of with the use of active comparators. One possible result of an active comparator study is
that a new treatment may prove to be equivalent to the
comparator (e.g., no statistically significant difference in health
outcomes between the treatment and control groups). If the study
is too small in enrollment terms (if there are not enough patients
for statistical power) or is biased for some reason, a new
treatment may demonstrate equivalence when in fact it is less
effective than the comparator. For this reason, the FDA strongly
encourages placebo-controlled trials for demonstration of
efficacy and safety in new agents.

• Compliance with the intervention. The former surgeon general of the United States, C. Everett Koop, MD, once said, “Drugs don’t work unless people take them.” As a pharmacist,
you know this to be true and work as hard as you can at getting
patients to take their medications correctly. Thinking about
compliance, you can then imagine how difficult it would be for
an investigator to test the differences between two treatments if
patients, for whatever reason, were not compliant with therapy or
did not adhere to the protocol during the course of the study. To
minimize this risk, investigators spend an enormous amount of
time attempting to get study subjects to take their treatments or
complete their study diaries according to study protocol. Many
methods are used to increase compliance, such as small out-of-
pocket payments and other incentives. As an aside, the offer of
financial incentives is highly controlled by IRB and ethics
committees so that patients are not unduly coerced by
investigators.

5. Investigators measure the outcome. Selecting the study endpoint is a delicate process, balancing practical and statistical considerations.
Ultimately, the choice of the health outcome depends upon factors
that can be summarized as follows:

• Appropriateness to the research question. In a study of comparable antiviral agents for the treatment of AIDS, the
standard outcome measure of the effect of treatment has been
CD-4 count. This biological marker of disease activity is
considered a surrogate measure. If CD-4 counts are improved, the
pharmaceutical agents are said to have “biological activity in
AIDS.” But the true outcome measure of interest in AIDS is
survival, an outcome that is difficult to study in short-term drug
trials. Recall that for a dichotomous variable like survival,
frequent events over a long period of time are required to provide
sufficient power for statistical testing.

If CD-4 is correlated with survival in persons with AIDS, then drugs that alter CD-4 may have an effect on survival. That is, the
CD-4 count is a surrogate for the actual measure of
interest—survival. Upon reflection, you may find that many of
the health outcomes studied as endpoints in drug trials are
surrogate measures (e.g., blood pressure, cholesterol levels, and
bacterial count). A further note: CD-4 count is currently being
replaced as the most important outcome in AIDS drug trials by
another surrogate measure, viral load.

• Statistical characteristics. Investigators must consider measurement issues such as the ability to make precise and
accurate measures of the outcome during the research study. In
addition, investigators prefer outcomes that are continuous rather
than dichotomous for reasons of power and sample size.

• Number of outcome variables. Clinical research studies often contain more than one health outcome in order to provide
different information on the phenomenon of interest. Note that
multiple outcome measures can be problematic. What if one
measure suggests a positive benefit and another suggests a
negative benefit? This can and does happen. Which will the
investigators report in the published paper? More importantly,
which is the true result?

• Importance of blinding. Triple blinding refers to blinding of the investigator, study subject, and the person assigned to
measure the outcomes. In this way, ascertainment bias—or the
bias associated with ascertainment of outcome measures—is
reduced.

• Follow-up. The importance of follow-up should not be underestimated. Complete follow-up (100% response rate) is the goal
of all clinical studies and investigators will take extra measures to
achieve this outcome.

Once the outcome measure has been measured and collected, the task of
statistical evaluation of the outcome data follows. In general, the
researcher aggregates the data by treatment group and performs the
necessary statistical tests, selected a priori, of the between-group
difference in the primary outcome measure. Generally, if the outcome
measure is dichotomous the investigator will use a chi-square test or a test
of proportions. If the outcome measure is continuous, the researcher will
likely use a t-test if the data are normally distributed or a nonparametric
test if the data are not normally distributed. If the outcome is measured in
units of time (e.g., time to death or time to an event), survival analysis
techniques are used. If there is baseline or treatment-period confounding,
multiple regression techniques can be used to adjust for these differences.
The technical details of these methods are described in the subsequent section of this course.
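
As an illustration of this decision rule, here is a minimal sketch using the scipy library (chi2_contingency, ttest_ind, and mannwhitneyu); the Mann-Whitney U test is used here as one common nonparametric choice. The data arrays are hypothetical, and a real analysis would of course follow the pre-specified analysis plan.

    # Minimal sketch of choosing a test by outcome type, assuming scipy is
    # available. The arrays below are hypothetical illustration data only.
    import numpy as np
    from scipy import stats

    # Dichotomous outcome (e.g., responder yes/no): chi-square test on a 2x2 table
    table = np.array([[30, 70],    # treatment: responders, non-responders
                      [18, 82]])   # control:   responders, non-responders
    chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
    print(f"Chi-square p-value: {p_chi2:.3f}")

    # Continuous, approximately normal outcome (e.g., change in FEV1): two-sample t-test
    treatment = np.array([0.31, 0.25, 0.40, 0.18, 0.29, 0.35])
    control = np.array([0.12, 0.20, 0.15, 0.22, 0.08, 0.18])
    t_stat, p_t = stats.ttest_ind(treatment, control)
    print(f"t-test p-value: {p_t:.3f}")

    # Continuous but clearly non-normal outcome: a nonparametric alternative
    u_stat, p_u = stats.mannwhitneyu(treatment, control, alternative="two-sided")
    print(f"Mann-Whitney p-value: {p_u:.3f}")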

You are advised to read appendix 11.b (pp. 212–14), which covers intent-
to-treat, subgroup investigations, and stopping rules. These are important
topics that deserve consideration. Note that the Medical Literature
Evaluation course (Pharmacy 493) discusses these issues in greater detail.

Special Types of RBTs and Other Experimental Designs


The next section describes variants of the RBT. We will highlight two of
these designs and ask you to read about the others. Please pay attention to
the section on nonblinded studies. Earlier we dismissed these types of
studies as providing insufficient control of important biases. There are
instances, however, when unblinded designs are necessary and may be the
only option.

The run-in RBT is used frequently for drug studies. The goal is to increase
compliance with the intervention. Subjects are placed either on active or
placebo treatment for a brief period prior to the baseline assessment in
order to select only those patients who are likely to comply with the
treatments and protocol during the course of the study. Once selected,
patients are randomized in the usual fashion to treatment and control and
the study proceeds.

Another variation of the RBT is the crossover study. With this design,
persons randomized to the initial treatments (active or placebo) are
switched to the other treatment (“crossed over”) after a period of time.
This allows the investigator some flexibility. Fewer patients can be used,
time-dependent confounders can be avoided, and between-group and
within-group analyses can be performed. Recall that between-group
analysis is one in which treatment differences between the groups can be
assessed. Within-group analysis implies that the effects of treatments can
be assessed within the same patient or group. This design is frequently
used for dose ranging or dose finding studies of new pharmaceuticals.

Conclusion
Lessons Two through Four have endeavored to discuss the elements of
basic clinical research designs in sufficient detail to present an
understanding for those interested in evaluating the medical and
pharmaceutical literature. Any further discussions of research design are
better undertaken in the context of the review of actual studies, which are
presented in the literature evaluation course.

The next section of this course focuses on the descriptive, graphical, and
statistical analysis of data collected as part of a clinical research study.

The lessons deal with the complex issues of presenting and depicting the
results of clinical studies. We encourage you to study these forthcoming
topics carefully and thoroughly; this material is the basis for much clinical
decision making. The statistical results of clinical studies have had far-
reaching effects on the practice of medicine and pharmacy. It is best that
you and your colleagues gain knowledge about the applications of
biostatistics in medical and pharmaceutical research.

Further Reading
If you are interested in further reading on the topics of clinical research
methods, you are encouraged to explore chapters 12 through 14. If you are
contemplating a clinical research career, you should read the rest of the
book. Even if you have no additional interest, you may want to make a cup
of coffee and spend an hour or so reading or skimming the contents of
chapter 14. This chapter addresses the very real issues of the ethics of
clinical research, such as placebo-control, financial compensation, and
other important topics. Mostly without your knowledge, many of your
patients are enrolled in clinical drug studies. If you work in an inpatient
facility, you may be aware of those that are. You are not likely to know if
you work in an ambulatory setting. After reading chapter 14, you are
likely to learn how your patients were approached for participation, given
informed consent, compensated, and encouraged to complete the study.
The more informed you are about the conduct of clinical research, the
better pharmaceutical care you are likely to give your patients.

ASSIGNMENT #1
Following the outline presented in Lesson One, prepare a brief research
protocol of no more than three pages to address a research project that is
of interest to you in your current work environment. This can be a drug
study, a study of a pharmaceutical care program or service, or any other
medical or pharmaceutical topic. In the protocol, state the research
questions, the study sample, the selection criteria, the health outcomes,
and the intervention and nonintervention groups. Describe the research
design you selected and give the benefits and limitations of this design
over other possible designs.

Attach Assignment Identification Sheet #1

LESSON FIVE
PREPARING FOR THE FIRST MIDTERM
EXAMINATION
This test covers chapters 1-4, 7, 8, and 11 of Hulley and Cummings.
Remember that it is a closed-book, closed-note test. However, you are
allowed to use one page of formulae to help you during the exam and also
a calculator. To help you complete the exam, a copy of appendices A, B,
and C from the Le and Boen text will be provided. You will have ninety
minutes to complete the examination.

See the following page for information on scheduling your examination.

Submit Assignment Identification Sheet #2

LESSON SIX
DATA ANALYSIS: DESCRIPTIVE
MEASUREMENT
Reading Assignment

Le and Boen, chapters 1 and 2 and section 7.2

Objectives
At the end of this lesson, you should be able to
o explain the difference between a continuous and discrete variable;
o compute a proportion and graphically display your results in a bar
or pie chart;
o construct a two-by-two table and from it calculate disease
prevalence, and the sensitivity and specificity of a test;
o compute a mean, variance, and standard deviation of a data set;
o read and understand a survival curve; and
o explain the concept of standardizing rates for comparison.

Key Terms
histogram
stem and leaf diagram
mean
median
variance
standard deviation
sensitivity
specificity

Understanding the Data Set


Data come in many forms, usually as a collection or batch of data points. Before analysis begins it is wise to get an understanding of the data set. A relatively easy way to do this is to display the data graphically, as is done in the following figure. A picture (graph) of the data allows you to quickly determine the mode(s) of the data, the general shape of the distribution (symmetric or skewed), and the spread of the data. While knowing the mean of a distribution is useful, without also knowing the variance you don’t really have much information about the data. Although the following data sets have identical means, they have different spreads (variances) and are quite different.



Histograms and stem and leaf diagrams allow visual representations of the
data. Stem and leaf diagrams also allow you to recover any data point
from the data set. Histograms do not, since they provide only a summary of data
values in each interval. Despite this, histograms are useful and more
commonly used than stem and leaf diagrams.

The mean and median are two statistics that describe the center of a data set. The mean is
calculated by adding the values of all data points and dividing that sum by
the number of data points. Another name for the mean is the average. The
median is found by arranging the data in ascending order and taking the
middle value. When a data set has an odd number of observations N, the
median is the (N + 1)/2 observation when the data are ordered. If a data set
has an even number of observations the median is the average of the two
middle observations.

Example 1
Data Set A
1, 2, 3, 7, 9, 12, 22
Mean: (1 + 2 + 3 + 7 + 9 + 12 + 22)/7 = 56/7 = 8
The mean of data set A is 8.
The median is 7 because half of the data points are greater than 7, and half are less
than 7.

Data Set B
2, 3, 5, 6, 8, 12
Mean: (2 + 3 + 5 + 6 + 8 + 12)/6 = 36/6 = 6
The mean of data set B is 6.
The median is (5 + 6)/2 = 5.5.

The median is not affected by extreme values and doesn’t use information
about the spread of the data. When a data set is symmetric the mean and
median are quite similar to each other. Compare the mean and median of
data sets C and D.

Data Set C: 1, 2, 10, 80, 92      Mean = 37      Median = 10
Data Set D: 0, 5, 10, 15, 20      Mean = 10      Median = 10

In data set C, the median is not affected by the large size of the observations 80 and 92. The median is therefore the more appropriate measure of centrality. In data set D, which is symmetric, the mean and median both provide a good measure of centrality.

Another descriptive attribute of a data set is its variance, which is a
numeric value describing the spread of the data. The larger the variance, the
more spread out the data set.

The data graphed as a solid line in the preceding figure have a larger
variance than the data graphed as a dashed line.

The positive square root of the variance is called the standard deviation. It
is a useful measure because it is measured in the same units as the data.

Example 2a
Data Set E
0, 3, 4, 6, 8, 9
Mean: (0 + 3 + 4 + 6 + 8 + 9)/6 = 30/6 = 5
Variance: [(0 – 5)² + (3 – 5)² + (4 – 5)² + (6 – 5)² + (8 – 5)² + (9 – 5)²] / (6 – 1)
= (25 + 4 + 1 + 1 + 9 + 16)/5 = 56/5 = 11.2

The variance of data set E is 11.2. The standard deviation of data set E is the square
root of 11.2, or approximately 3.3.

A quicker way to compute the variance is given by

    variance = [ Σx² − (Σx)²/n ] ÷ (n − 1)

Example 2b
Data Data Squared
0 0
3 9
4 16
6 36
8 64
9 81

Total: 30 206

variance = (206 − 900/6) ÷ 5 = 56/5 = 11.2

Most scientific calculators will compute the mean and variance of a data
set. Check your manual to see if yours will. Try the previous example with
your calculator to ensure you understand how to compute a variance by
hand and on your calculator.
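
If you prefer to check your work in software rather than on a calculator, here is a minimal sketch in Python that reproduces Example 2a. The script is our illustration and is not part of the assigned texts.

    import math

    # Minimal sketch: mean, sample variance, and standard deviation for
    # data set E from Example 2a (divisor n - 1, as in the lesson).
    data = [0, 3, 4, 6, 8, 9]

    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    std_dev = math.sqrt(variance)

    print(f"mean = {mean}")            # 5.0
    print(f"variance = {variance}")    # 11.2
    print(f"std dev = {std_dev:.2f}")  # 3.35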



When the outcome we are trying to improve is time to an event (for
example, time to relapse), a survival rate is the appropriate statistic to
compute. A group of patients, called the cohort, is followed for a given period of time. If no event occurs during a patient's observation period, we say the
patient is a censored data point. Carefully study example 2.7 in your text.

When a data set is larger, calculating the Kaplan-Meier curve can be quite
computationally intensive. We can create a curve using the actuarial
method, which is similar to a cumulative frequency distribution. Your text discusses
this process in detail at the end of chapter 2. The curve graphically depicts
the survival rate as a percent at any point in time. See figure 2.10 in your
text. For example, in figure 2.10, at t = 5 years approximately 30% of the
cancer patients in the study are still surviving. When comparing two
curves (to compare two treatments for instance), the better treatment
would have a higher percent of survivors over the course of the curve.
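
To make the mechanics concrete, here is a minimal Kaplan-Meier sketch in plain Python. The event times and censoring flags are hypothetical; the actuarial (life-table) approach described in your text would instead group the times into fixed intervals.

    # Minimal sketch of a Kaplan-Meier estimate from hypothetical follow-up data.
    # times: months of follow-up; event = 1 means the event (e.g., relapse)
    # occurred, event = 0 means the observation was censored at that time.
    times = [2, 3, 3, 5, 8, 8, 12, 12]
    event = [1, 1, 0, 1, 1, 0, 0,  0]

    n_at_risk = len(times)
    survival = 1.0
    # Walk through the distinct follow-up times in order, updating S(t) at
    # each time where at least one event occurred.
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, event) if ti == t and ei == 1)  # events at t
        if d > 0:
            survival *= 1 - d / n_at_risk
            print(f"t = {t:>2}: S(t) = {survival:.3f}")
        # everyone with time == t (event or censored) leaves the risk set after t
        n_at_risk -= sum(1 for ti in times if ti == t)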

ASSIGNMENT #3
Le and Boen, chapter 1: exercises 3, 4, 9, 16, 27
Le and Boen, chapter 2: exercises 2, 11, 13–15, 18, 20

Attach Assignment Identification Sheet #3

LESSON SEVEN
DISTRIBUTIONS
Reading Assignment

Le and Boen, chapter 3 and appendix B

Objectives
At the end of this lesson, you should be able to
o explain conditional probability;
o explain the meaning of independent events;
o express the sensitivity, specificity, positive predictive value and
the negative predictive value of a test as a conditional
probability;
o calculate a probability from a z-score;
o standardize statistics to a standard normal curve; and
o approximate a probability from a binomial distribution using a
normal approximation.

Key Terms
probability
simple random sampling (SRS)
conditional probability
positive predictive value (PPV)
negative predictive value (NPV)
z-score
binomial distribution

Introduction
This section is designed to introduce you to the tools you will need in
subsequent lessons. The chapter begins by defining and discussing
probability and expresses some ideas already introduced in terms of
probability. Additionally you will be introduced to the normal distribution.
The normal distribution is arguably the most powerful tool in statistical
experimentation. Understanding its application is vital to interpreting the literature and to performing research.

Probability
Scientific conclusions are drawn after examining evidence from
experimentation. This is true in all fields. The evidence we consider leads us to our conclusions. Sometimes our conclusions are incorrect. We
formulate our conclusions based on probability. Probability can be defined
as

    probability = (number of positive outcomes) ÷ (number of possible outcomes)

What we consider a positive outcome is defined by what we are studying.
If we were rolling a fair die, we could define a positive outcome as a 3 or
5 and the probability would be equal to 2/6 (which reduces to 1/3) because
there are two positive outcomes (3 and 5) and six possible outcomes (1, 2,
3, 4, 5, and 6). We could define a positive outcome as a disease-free state
or even as a disease-present state. In the case of a disease-outcome, the
probability is the number of persons diagnosed as having the disease
divided by the number of persons checked for the disease. If 1,000
randomly selected children were screened for Disease A, and 150 of them
were diagnosed as diseased, the probability of Disease A would be
denoted by Pr(Disease A) = 150/1000 = 0.15.

It is important when determining probability to select carefully the sample to study. For example, if you were trying to determine the probability of a child having Disease A above and you screened all children from a doctor's office, you would probably overestimate the disease prevalence.
Children in a doctor’s office are not representative of the general
population of children. The fact that they are in the doctor’s office means
that they are more likely to be ill.

Simple random sampling (SRS) means each person has an equal chance of being selected into the sample. Researchers often use random digit dialing to select samples for studies. Practically speaking, most homes have phones, and therefore any household would have a roughly equal chance of being selected. Studies done using only volunteers must be careful about the conclusions drawn and applied to the general population, because people willing to participate in a study may be fundamentally different in some ways from those who refuse to participate.

Conditional probability is the chance something will happen given that some other condition has been met. For example, we might be interested in the probability that a person has a disease given that they have tested positive for the disease. This could be denoted by Pr(diseased | test positive). This is also called the positive predictive value (PPV) of a test. The negative predictive value (NPV) of a test is Pr(disease free | test
negative); that is, the probability that a person is disease free if they test
negative for the disease. Clearly we would like tests to have large PPV and
NPV.

Two other test measures that can be expressed as a conditional probability are the sensitivity and specificity of a test. The sensitivity of a test is defined as Pr(test positive | diseased). We want the sensitivity of a test to be high because we want to know that if we test a diseased person we will in fact determine that they are diseased. If we don't have reasonable sensitivity, the test is not very worthwhile. The specificity of a test is defined as Pr(test negative | disease free). If a test has low specificity, we
obtain a large proportion of false-positives (tests that indicate a person is
diseased when in fact they are not). If the cost of the test is low, a low
specificity is not as important because a person testing positive will just be
retested to confirm the result. If the cost of a test is high, a low specificity
becomes more important in judging the usefulness of the test.
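
For readers who like to check such calculations by computer (the course itself requires only a calculator and the appendix tables), a minimal Python sketch, using made-up screening counts, shows how sensitivity, specificity, PPV, and NPV could be computed from a two-by-two table:

# Hypothetical screening-test counts (illustrative only).
true_pos = 90     # diseased and test positive
false_neg = 10    # diseased but test negative
false_pos = 45    # disease free but test positive
true_neg = 855    # disease free and test negative

sensitivity = true_pos / (true_pos + false_neg)   # Pr(test positive | diseased)
specificity = true_neg / (true_neg + false_pos)   # Pr(test negative | disease free)
ppv = true_pos / (true_pos + false_pos)           # Pr(diseased | test positive)
npv = true_neg / (true_neg + false_neg)           # Pr(disease free | test negative)

print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")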

In order to draw conclusions from our studies, we make comparisons
between different treatments or applications. For example, if we want to
introduce a new drug treatment, it needs to be compared with the current
standard treatment and to outperform it in some respect (effectiveness,
cost, side effects). In order to make this comparison we frequently use statistics
from our data and the normal distribution. In this lesson, you need to be
able to read a table of the area under a normal curve (appendix B).
Carefully study the examples in your book in section 3.2.2. Keep in mind
the total area under the curve is equal to 1. The curve is symmetric, and
therefore the area to the right of 0 equals 0.5, as does the area to the left of
0.

In order to make comparisons using a normal table, we must first
standardize our normal curve. We do this with the following equation:

z = (x − µ)/σ

The value of z, called the z-score, is the number of standard deviations our
measurement is from the mean. Important values to know are z = 1, which
captures approximately 68% of observations, and z = 1.96, which captures
95% of observations. This means that about 68% of observations fall within
one standard deviation of the mean in either direction, and 95% of
observations fall within approximately two standard deviations (more
precisely, 1.96) of the mean in either direction. (See figure 3.4 on page 85.)

Example
Suppose that the mean cholesterol level is 275 mg/100 mL with a standard deviation of
75. Find the probability that a person's cholesterol is below 200.

Pr(X < 200) = Pr([X – 275] / 75 < [200 – 275] / 75) = Pr(Z < –1)
= .5 – Pr(–1 < Z < 0)
= .5 – .3413
= .1587
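
A calculation like this one can be checked with a few lines of Python (illustrative only; on the exam you will still use the table in appendix B). The normal_cdf function below gets the area under the standard normal curve from the error function:

import math

def normal_cdf(z):
    # Area under the standard normal curve to the left of z.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = 275, 75
z = (200 - mean) / sd            # z = -1.0
print(round(normal_cdf(z), 4))   # prints 0.1587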

Another distribution commonly used in studies is the binomial
distribution. It is useful when the outcome is dichotomous. The probability
that the outcome is positive is denoted by P. Since a binomial distribution
has only two possible outcomes, the probability of a negative outcome is 1
– P. Note the sum of these two probabilities is 1, which means one of
these options must be the outcome. For example,

Pr(person being diseased) + Pr(person being disease-free) = 1.

A person must fall into one of the two categories. If the number of trials
performed (persons studied) is large enough, we can use a normal
distribution to approximate a binomial distribution. The mean is estimated
by nP and the variance is estimated by nP(1 – P). Once we have these
numbers, we apply the same procedures as we did when using the normal
distribution.
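
As a rough sketch of the normal approximation (with made-up numbers: n = 1,000 children screened and P = 0.15, as in the earlier disease example), one could estimate the chance of seeing 170 or more diseased children as follows; the continuity correction is omitted to keep the sketch short.

import math

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, P = 1000, 0.15
mean = n * P                      # nP = 150
sd = math.sqrt(n * P * (1 - P))   # square root of nP(1 - P), about 11.3

z = (170 - mean) / sd
print(round(1 - normal_cdf(z), 4))   # approximate Pr(170 or more cases)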

ASSIGNMENT #4
Le and Boen, chapter 3: exercises 1–3, 5, 7, 8, 10, 12, 19, 20

Attach Assignment Identification Sheet #4

LESSON EIGHT
CONFIDENCE ESTIMATION
Reading Assignment

Le and Boen, chapter 4 and appendix C

Objectives
At the end of this lesson, you should be able to
o explain the concept of a random variable;
o explain how a statistic has a distribution of its own;
o estimate a population mean;
o compute a standard error;
o construct a confidence interval with varying measures of
certainty; and
o estimate a population proportion from a sample and construct a
confidence interval around it.

Key Terms
statistical estimation
confidence interval
Central Limit Theorem
standard error

Introduction to Statistical Estimation


Scientific experimentation is usually done to discover information about a
particular point of interest. To describe our results we usually want to
estimate some parameter of interest. For example, if we were studying a
new drug regimen we might want to report the number of adverse side-
effects a patient experienced. Since the number of side effects experienced
by any one patient is a variable number we would want to summarize the
side-effects of all our patients. In other words, we might choose to report
the average, or mean number of side-effects experienced. Since we could
not try our drug out on an entire population we would study only a sample
of patients. Using our data we want to estimate the number of side effects
a person in the general population is likely to experience.

This scenario is called statistical estimation. Using statistical theory and
rigorous research methods, we can draw conclusions about, or estimate,
information about the population our sample represents. It can be shown
that the sample mean is approximately normally distributed with variance
equal to σ²/n (estimated by s²/n). This is intuitively pleasing because it
implies our variance (or uncertainty) decreases as n (the number in our
sample) increases. This makes sense because we would expect that the
more data points we have, the better our estimate would be. Would you
believe that the results of a poll that surveyed 1,000 people more
accurately reflected the population's views than the results of a poll that
surveyed 25 people?

Constructing a Confidence Interval for a Mean
Once we have estimated the mean of the population using the mean of our
sample, we want to express the certainty of our estimate. This is done with
a confidence interval. Knowing that the sample mean is approximately
normally distributed, we can determine an interval that will contain 95%
of all sample means. This interval can be written as x̄ ± 1.96 SE(x̄). This
process is exactly analogous to our work using the normal distribution in
the previous study unit. Recall that the total area under the curve was equal
to 1 and that 1.96 standard deviations in either direction cut off a central
area equal to 95%. Think of the standard error as the standard deviation of
our sample mean.

Example 1
A simple random sample was selected from a population. The sample size, mean and
variance are given here. Calculate the 95% confidence interval around the mean.

sample size: 50
sample mean: 39.4
variance: 0.7
95% confidence interval: 39.4 ± 1.96√(0.7/50) = 39.4 ± 0.23
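
The same calculation in Python (illustrative only) makes the steps explicit:

import math

n, sample_mean, variance = 50, 39.4, 0.7
se = math.sqrt(variance / n)   # standard error of the mean
half_width = 1.96 * se         # about 0.23
print(f"95% CI: {sample_mean - half_width:.2f} to {sample_mean + half_width:.2f}")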

Comparing Two Means


Matched Samples
If we are comparing two treatments or groups, as is often the case in
research, there are two ways to combine the samples. The first scenario is
when we have a matched sample. By this we mean the data points (people,
plants, machines) in the first sample are not independent of those in the
second sample. Both “before and after” experiments and crossover studies
fall into this category. For each pair of data points we can come up with a
single summary measure. An example of this is the difference between
Treatment A outcome and Treatment B outcome, or the difference
between our measurement at Time Point 1 and Time Point 2. For example
suppose we are comparing the number of asthma episodes a patient
experiences on two different drug regimens. A way to summarize the two
data points is to take the difference of the two. The mean difference across
patients will be normally distributed according to the Central Limit
Theorem (see section 3.2). Using this information, we can construct a
confidence interval of the difference between the two treatments.

Independent Samples
If the data of the two samples are unrelated or independent, our approach
is slightly different. The mean for each group is computed and then the
difference of the means is estimated and its confidence interval calculated.
The standard error of the difference of two means is given by the
following:

SE(x̄1 − x̄2) = √(s1²/n1 + s2²/n2)

Again the statistic is normally distributed so we can make the desired
inferences. Carefully read section 4.2.3 of Le and Boen and study the
examples of the matched and independent samples.
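
As a sketch of the independent-samples case (the summary statistics below are made up for illustration), the standard error and a 95% confidence interval for the difference of two means could be computed like this:

import math

# Hypothetical summary statistics for two independent samples.
n1, mean1, var1 = 40, 12.1, 4.0
n2, mean2, var2 = 35, 10.6, 5.5

diff = mean1 - mean2
se_diff = math.sqrt(var1 / n1 + var2 / n2)
print(f"difference = {diff:.2f}")
print(f"95% CI: {diff - 1.96 * se_diff:.2f} to {diff + 1.96 * se_diff:.2f}")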

Sometimes the quantity we want to estimate about a population is a
proportion. This is the case when the outcome variable is categorical or
dichotomous. We use the sample proportion p to estimate the population
proportion, which we denote by π, pronounced "pie." The variance of the
sample proportion is [p(1 – p)]/n.

When our sample is sufficiently large, the sample proportion will be
approximately normally distributed. This means we can apply the same
procedure for computing the confidence interval of a proportion as we did
for computing the confidence interval for a mean. Therefore, the 95%
confidence interval for a proportion p is p ± (1.96)SE(p).

Example 2
A simple random sample of 100 people yields the following: 75 of the people have
brown hair and the remaining 25 have some hair color other than brown. From this we
can estimate π (the population proportion with brown hair) by p = 0.75. The variance of
this binomial proportion is [(.75)(.25)]/100 = 0.001875. The standard error is
√(0.001875) = 0.043.

The 95% confidence interval for the proportion of the population with brown hair is

.75 ± 1.96(.043) = .75 ± 0.08 = (.67 ⇒ .83)

For purposes of demonstrating a proportion, we considered only two
outcomes for hair color: "brown" and "not brown."

Confidence Intervals for an Odds Ratio


Finding the confidence interval for an odds ratio involves some knowledge
of the natural logarithm (“natural log” for short) and exponential
functions. Without explaining these two functions here, we can state that
the two functions are inverses (or rather that they “undo” each
other, much like multiplication and division do). The variance you are
given for an odds ratio is actually the variance of ln(OR), which is read
"the natural log of the odds ratio"; this is calculated as

Variance[ln(OR)] ≅ 1/a + 1/b + 1/c + 1/d,

where a, b, c, and d represent the four cell counts in a two-by-two table. In
order to compute the confidence interval of the odds ratio (which is what
we are interested in), we must first calculate the confidence interval of
ln(OR) using our usual technique. We must then raise e (a constant
approximately equal to 2.718, which can be found on any scientific
calculator) to the power of each of the two endpoints of this confidence
interval. The result is the confidence interval of the odds ratio.

See the example on the next page.

Example 3
A study looking at the relationship between vitamin A intake and breast cancer resulted
in the following data:
Vitamin A Cases Controls
(IU/month)
≤150,500 893 392
>150,500 132 83

Variance [ln(OR)] ≅ 1/893 + 1/392 + 1/132 + 1/83 ≅ .0232


√(0.0232) = 0.1526

The 95% confidence interval for the odds ratio is given by exponentiating both parts of
the following:

ln(1.43) ± 1.96(.1526)
e^[ln(1.43) ± 1.96(.1526)] = (1.06 ⇒ 1.93)

Notice the confidence interval is quite narrow here and that it does not include 1. We
therefore can conclude an association between vitamin A intake as defined here and
breast cancer.
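
The arithmetic of Example 3 can be reproduced with a short Python sketch (illustrative only):

import math

# Two-by-two counts from Example 3 (exposure rows; case and control columns).
a, b = 893, 392   # <= 150,500 IU/month: cases, controls
c, d = 132, 83    # > 150,500 IU/month: cases, controls

odds_ratio = (a * d) / (b * c)              # about 1.43
var_ln_or = 1 / a + 1 / b + 1 / c + 1 / d   # about .0232
se_ln_or = math.sqrt(var_ln_or)

low = math.exp(math.log(odds_ratio) - 1.96 * se_ln_or)
high = math.exp(math.log(odds_ratio) + 1.96 * se_ln_or)
print(f"OR = {odds_ratio:.2f}, 95% CI ({low:.2f}, {high:.2f})")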

ASSIGNMENT #5
Le and Boen, chapter 4: exercises 2–4, 6–8, 13, 15, 20

Attach Assignment Identification Sheet #5

LESSON NINE
PREPARING FOR THE SECOND MIDTERM
EXAMINATION
This test covers chapters 1–4, section 7.2, and appendixes B and C of Le
and Boen. Remember that it is a closed-book, closed-note test. However,
you will be allowed one page of formulae to help you during the exam and
you may also bring a calculator. To help you complete the exam, a copy
of appendixes A, B, and C from the Le and Boen text will be provided.
You should find the test problems to be similar to the homework
problems. You will have ninety minutes to complete the examination.

See the following page for information on scheduling your examination.

Submit Assignment Identification Sheet #6

LESSON TEN
HYPOTHESIS TESTING
Reading Assignment

Le and Boen, sections 5.1–5.3

Objectives
At the end of this lesson, you should be able to
o explain the concept of a hypothesis test;
o state a research question as a null and alternative hypothesis;
o explain the meanings of Type I and Type II errors;
o differentiate between a one-tailed and two-tailed significance test
and explain when each is appropriate to use;
o perform a hypothesis test; and
o describe the relationship between a hypothesis test and a
confidence interval.

Key Terms
Type I error
Type II error

The Purpose of Hypothesis Testing


When we conduct a study in any field we are trying to answer a research
question. For example, we may be interested in whether two populations
differ in some way, or whether a new drug outperforms the current
accepted drug therapy for a given medical condition. We answer these
questions by carefully setting up an experiment or study and interpreting
the data collected. Consider the question of a new drug therapy. We may
want to think about how much better a new drug would need to perform to
be clinically important. Our study may well find that there is a difference
between the two drugs being tested but not such a large difference to be of
practical importance. This is the difference between being statistically
significant and clinically significant.

Errors
When we set up a study, we hope that we will draw the correct
conclusions from the data. However, this is not always the case. We can
make two kinds of errors when we make decisions about data. Consider
that there are only two possibilities in our drug trial: either the new drug
performs better than the current drug therapy or it does not. Given only
these options, we could correctly conclude that the new drug performs better
than the current drug when it in fact does, or we could erroneously
conclude that it performs no better than the current drug. This is called
a Type II error: the mistake of not finding a difference when one exists. If
no difference existed between the two drugs, we could also draw two
different conclusions. We could correctly state there was no difference, or
we could make another type of error. The other error would be to conclude
that the drug does outperform the current therapy when in fact it does not.
This is called a Type I error: the mistake of stating there is a difference
when in fact there is not (other than that caused by chance variation). We
denote the probability of making a Type I error as α. Usually we set α at
0.05. We want to be correct 95% of the time when we say that a
significant difference exists. Type II error levels are called β. Since α and
β are related, we cannot always achieve the level of β that we want to. The
power of a study is 1 – β, which is the probability that we will find a
difference if one exists. Since the variance of our statistic is influenced by
our sample size, we can improve the power of our study (decrease β) and
maintain our α level if we increase our study sample size.

Stating a Research Hypothesis


When we are investigating a research question, we state it as a hypothesis.
We state what we are trying to disprove as the null hypothesis, which is
denoted by H0. For example, in our hypothetical drug study we could state
the null hypothesis as "H0: There is no difference in performance level
between Drug A and Drug B." Our alternative hypothesis (that we want to
show to be true) would be "HA: Drug B outperforms Drug A."
Mathematically we might say "H0: XA – XB = 0" (where XA and XB are the
population parameters for populations A and B, respectively), which
would mean that there is no difference between the two populations on the
parameter of interest. Our alternative hypothesis would be "HA: XB – XA > 0,"
which would mean that the population B parameter is larger than the
population A parameter.

Finally, we have to decide which hypothesis we will believe. Our data
should guide us in our decision: we must decide whether to reject the null
hypothesis, that is, whether our data make it difficult to believe that the
null hypothesis could be true. Our statistics (e.g., the sample means) must
be sufficiently different that we do not believe the difference is due only to
sampling variability.

Example 1
We want to investigate the study question “Mothers with a low socioeconomic status
(SES) deliver babies whose birth weights are lower than normal.” Let the normal mean
weight for newborns be 120 ounces. Our null hypothesis and alternative hypotheses
are as follows:

H0: µlw = µ0 (mean for low SES is the same as the average mean birth weight)

HA: µlw < µ0 (mean for low SES is less than the average mean birth weight)

Note that we are only interested in the birth weight from low socioeconomic status
women being less than normal so we will express our alternative hypothesis with only
that option and use a one-sided test. Suppose we observe a mean of 100 ounces in our
sample with a standard error of 10. What should we conclude?

We calculate a z-score: z = (100 − 120)/10 = −2.0

A z-score of −2.0 corresponds to a one-sided p-value of .023 (the two-sided p-value is
.045). This means that if the null hypothesis were true, we would observe a sample
mean this low only about 2.3% of the time. This leads us to reject the null hypothesis
that there is no difference in the mean birth weights of the two groups.
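
A brief Python sketch of this one-sided test (illustrative only) follows the same steps:

import math

def normal_cdf(z):
    # Area under the standard normal curve to the left of z.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu0 = 120          # hypothesized mean birth weight (ounces)
sample_mean = 100  # observed mean in the low-SES sample
se = 10            # standard error of the sample mean

z = (sample_mean - mu0) / se   # -2.0
p_one_sided = normal_cdf(z)    # Pr(Z <= -2.0), about .023
print(f"z = {z:.1f}, one-sided p = {p_one_sided:.3f}")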

A p-value can be thought of as a measure of the strength of the evidence in
favor of the null hypothesis. A large p-value supports the null hypothesis,
while a small p-value does not provide much support for the null
hypothesis. A p-value greater than .10 is called “not significant.” A p-
value less than .01 is called “significant.” A p-value between .01 and .10 is
interpreted in the context of the study, with a value less than .05
commonly considered “significant.”

Hypothesis tests and confidence intervals are closely related. A null
hypothesis that is rejected at the α level implies that the observed sample
mean did not fall in the 1 – α confidence interval. In other words, if we
reject a null hypothesis at the .05 level, our observed mean would not fall
in the 95% confidence interval.

In the preceding birth weight example, the 95% confidence interval for
birth weight would be

120 ± 1.96(10) or (100.4, 139.6)

Our observed value of 100 did not fall in the range of the confidence
interval, and therefore we rejected the null hypothesis.

It is important to remember that failing to reject a null hypothesis does not
prove it true. Classical hypothesis testers never "accept H0;" instead, they
"do not reject H0." This subtle nuance can make statistical language
awkward and difficult to understand.

ASSIGNMENT #7
Le and Boen, chapter 5: exercises 1–7, 9, 10, 15

Attach Assignment Identification Sheet #7

LESSON ELEVEN
BASIC STATISTICAL TESTING
Reading Assignment

Le and Boen, sections 6.1–6.2 and appendixes C and D

Objectives
At the end of this lesson, you should be able to
o compute a statistical test to compare two proportions;
o explain the concept of a matched sample;
o explain the concept of independent samples;
o compute a statistical test to compare several independent
samples;
o compute a statistical test to compare two means from matched
data; and
o compute a statistical test to compare two means from
independent samples.

Key Terms
population proportion
chi-square
McNemar’s chi-square test

Introduction
Once we have gathered the data for a study, our next step is to analyze the
data to discover what conclusions we can draw. Choosing an appropriate
statistical test is an important part of the analysis process. This unit will
introduce you to some of the most common statistical tests for a variety of
types of outcome variables.

Testing Two Proportions


Recall from our earlier studies that we could have continuous or discrete
outcome variables. An example of a discrete variable is a variable for
which the outcome can only be “yes” or “no,” such as a person’s smoking
status: a person either is a smoker or isn’t. When a variable has only two
outcomes, we call it a binary variable. Frequently we are interested in a
summary measure of binary data known as a population proportion. For
example, we may want to know what proportion of the people in our
population are smokers. We may also be interested in comparing our study population to
some other population. We may want to test to see if the proportions from
two populations are statistically different.

Computing a test statistic to compare two proportions is relatively
straightforward. First we must decide if our test is to be one- or two-tailed.
Remember that a one-tailed test should always be specified in advance and
should be used when the hypothesis being tested specifically calls for a
one-sided alternative. For example, recall our test of the birth weight of
babies born to mothers of low socioeconomic status from the previous
lesson. We were interested specifically if those babies had a lower birth
weight than other babies. We were not interested if they were different,
only if they weighed less. This implied the use of a one-sided test. When
using a one-sided test, it is easier to reject the null hypothesis or find a
significant result because the z-score cutoff is lower (1.645) than that of a
two-tailed test (1.96). The population parameters do not need to be as
different to reject the null hypothesis. For example, a difference that
would not be considered significant if a two-tailed test were used could be
significant with a one-tailed test.

The test statistic is

z = (p − π0) / √[π0(1 − π0)/n]

where π0 is the proportion we are hypothesizing to be true under H0, and p
is our observed proportion.

Using a standard normal table, we will reject the null hypothesis of no
difference between the observed and hypothesized proportions if |z| > 1.96
for a two-tailed test, z > 1.645 for a right-tailed test, or z < –1.645 for a
left-tailed test.

When the data from which the proportions are being tested are matched,
our approach is a little bit different. A two-by-two table can be constructed
with each column and row entry being a combination of possible
outcomes.
For example,

                          Observation Time 1
                             +        –
Observation Time 2    +      a        b
                      –      c        d

In this table, d represents the number of people who did not have the
characteristic of interest at either Time 1 or Time 2. People who had the
characteristic at Time 1 but not at Time 2 are represented by c, and so on.
To test if the proportions are the same at the two points in time, we can
compute a z-score using

z = (b − c) / √(b + c)

When a two-tailed test is used, the above value is squared to give a statistic
called a chi-square (denoted χ²). This test, called McNemar's chi-square
test, is quite common. When a one-tailed test is used, the preceding z-score
is compared with 1.645 (α = .05), and the null hypothesis is rejected if it
exceeds that value. When a two-sided test is used, the rejection region is
χ² ≥ 3.84 (which is 1.96²).
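
A minimal sketch of McNemar's test in Python, with made-up counts for the two discordant cells, might look like this:

# Hypothetical discordant counts: b = positive at Time 2 only, c = positive at Time 1 only.
b, c = 25, 10

z = (b - c) / (b + c) ** 0.5   # z-score for the one-tailed version
chi_square = z ** 2            # McNemar's chi-square for the two-tailed test
print(f"z = {z:.2f}, chi-square = {chi_square:.2f}")
print("reject H0 at alpha = .05" if chi_square >= 3.84 else "do not reject H0")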

If we are interested in comparing proportions from two independent
samples we calculate a z-score for their difference. Our null hypothesis is
stated as “H0: π1 – π2 = 0” or π 1 = π 2.

Our z-score is computed as follows:

z = (p2 − p1) / √[p(1 − p)(1/n1 + 1/n2)]

where p = (x1 + x2)/(n1 + n2) is the pooled group proportion.
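
The sketch below (made-up counts, illustrative only) shows how the pooled proportion and the resulting z-score could be computed:

import math

# Hypothetical counts: x "successes" out of n in each independent sample.
x1, n1 = 45, 150
x2, n2 = 30, 140

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se
print(f"z = {z:.2f}")   # compare |z| with 1.96 for a two-tailed test at alpha = .05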

Testing Several Proportions


When there are proportions from several groups to compare, several two-
by-two tables can be created and individual tests done, or a chi-square test
for the entire table can be performed. The resulting chi-square statistic has
(k – 1) degrees of freedom, where k is the number of groups being
compared. Your text goes through several detailed, step-by-step
explanations of this procedure, called the Mantel-Haenszel procedure.
Study the examples to understand the logic and process used. Keep in
mind that this statistic is standard output for any statistical software
package.

Testing Two Means


Finally, we consider the case of comparing two means. If the samples are
matched, we calculate a difference score for every pair of data points and
then test to see if the mean difference is statistically different from zero. A
common use for this is "before-and-after" experiments on a group of
people. We use a t-statistic and conduct a t-test. The t-test and t-statistic
are analogous to the z-score but take into account the degrees of freedom
of the sample. As the sample gets large, the t-distribution approaches the
standard normal curve. Look at appendix C and the cutoff values for ν
degrees of freedom. How do they compare with those from the standard
normal curve?

The t-statistic is computed as follows:

t = (x̄ − µ) / SE(x̄)

Remember that SE(x̄) = s/√n.

If our samples are not matched, we calculate a t-statistic taking the
difference of the two observed means. The t-statistic is the following:

t = (x̄1 − x̄2) / SE(x̄1 − x̄2)

with (n1 + n2 – 2) degrees of freedom.

The SE(x̄1 − x̄2) = sp √(1/n1 + 1/n2), where

sp = √{[(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)}

is the pooled standard deviation.
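
A short Python sketch of the pooled two-sample t-statistic, with made-up data, follows; it is meant only for checking hand calculations:

import math
from statistics import mean, stdev

# Hypothetical independent samples (illustrative values only).
group1 = [5.1, 6.3, 5.8, 6.0, 5.5, 6.1]
group2 = [4.8, 5.0, 5.4, 4.6, 5.2, 4.9]

n1, n2 = len(group1), len(group2)
s1, s2 = stdev(group1), stdev(group2)
sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
se = sp * math.sqrt(1 / n1 + 1 / n2)
t = (mean(group1) - mean(group2)) / se
print(f"t = {t:.2f} with {n1 + n2 - 2} degrees of freedom")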

Limitations
When you are doing statistical tests, it is important to remember their
limitations. If enough tests are done, one will probably give a significant
result, by chance alone. For example, if you were testing 20 different
blood characteristics between two groups, and using a .05 significance
level, you could expect to find one significant result even if there were no
true differences. When we know we need to conduct a lot of tests, we
sometimes set more stringent levels for significance. For example, instead
of using a .05 cut point for significance, we might use .01. This gives us
more confidence that a significant result really indicates a difference
between groups rather than a Type I error.

ASSIGNMENT #8
Le and Boen, chapter 6: exercises 1–3, 6–11, 17, 18, 20, 21, 24

Attach Assignment Identification Sheet #8

LESSON TWELVE
CORRELATION AND REGRESSION
Reading Assignment

Le and Boen, section 6.3

Objectives
At the end of this lesson, you should be able to
o explain the meaning of a correlation coefficient;
o interpret different values of a correlation coefficient;
o perform a statistical test on a correlation coefficient;
o explain the concept of linear regression; and
o explain the meaning of the parameters generated using linear
regression.

Key Terms
correlation coefficient
regression

Correlation
We have discussed the methods for testing for an association between
discrete variables and in this unit will address the methods for testing for
an association between continuous variables. When we are interested in
determining a relationship between two continuous measures we compute
their correlation coefficient. A correlation coefficient is a numeric value
describing the degree of linear association. Correlation values can range
from –1 to 1. If
two variables have a correlation coefficient equal to a large positive value
(e.g., 0.9) the implication is that as one variable increases so does the
other. A scatter plot of the data would increase from left to right, as in the
following diagram:

[Scatter plot omitted: the points trend upward from left to right, illustrating a large positive correlation.]

If a correlation coefficient is large and negative, the implication is that as
one variable increases the other decreases. The scatter plot would decrease
from left to right, as in this figure:

[Scatter plot omitted: the points trend downward from left to right, illustrating a large negative correlation.]

We use the correlation coefficient from sample data in much the same way
as we use our sample mean. That is, we use the sample correlation
coefficient (denoted by r) to estimate the true population correlation
coefficient ρ (“rho”). We can statistically test if ρ is equal to zero, which,
if true, would mean that there is no association between the variables of
interest. (This is similar to the statistical tests we do on a sample mean.)
We could state “H0: ρ = 0” as our null hypothesis and “HA: ρ ≠ 0” as our
alternate hypothesis. The statistical test we would use is

t = r √[(n − 2)/(1 − r²)],

which is a t-statistic with (n – 2) degrees of freedom. We would then refer
to a table of the percentiles of the t-distribution (appendix C in Le and
Boen) to determine the corresponding p-value. Our interpretation of the
resulting p-value is the same as previous hypothesis tests. That is, we
consider the p-value to be a measure of the strength of the data in favor of
the null hypothesis. Small p-values lead us to reject the null hypothesis of
ρ = 0 and accept that there is some association between the variables.
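
The correlation coefficient and its t-statistic can be computed by hand or with a few lines of Python; the sketch below uses made-up paired measurements and is illustrative only.

import math
from statistics import mean

# Hypothetical paired measurements (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.4, 4.8, 5.1, 6.3]

n = len(x)
mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)
t = r * math.sqrt((n - 2) / (1 - r ** 2))
print(f"r = {r:.3f}, t = {t:.2f} with {n - 2} degrees of freedom")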

Regression
If we determine that ρ ≠ 0, we can use a technique called regression to
determine a relationship (described mathematically) between our two
variables. The goal in regression is to accurately predict the value of one
variable called Y or the dependent variable from the other called X or the
independent variable. Y is called the dependent variable because the value
we predict for Y depends on the corresponding value of X that we know.

When the relationship between X and Y is linear (meaning the graph of X
versus Y is a line) the resulting equation will have the form Y = a + bX,
where a is the Y-intercept of the line (the point at which the line crosses
the Y-axis) and b is the slope of the line (numeric value describing how
steep the line is). Given these parameters we can sketch the relationship
between X and Y and draw inferences from it.
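
For the same kind of made-up data as in the correlation sketch above, the least-squares slope and intercept could be computed as follows (illustrative only):

from statistics import mean

# Hypothetical data (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.4, 4.8, 5.1, 6.3]

mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)

b = sxy / sxx     # least-squares slope
a = my - b * mx   # least-squares intercept
print(f"Y = {a:.2f} + {b:.2f}X")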

All statistical packages easily compute correlation coefficients and
perform regression analysis. It is most important that you understand the
concept of what is being done rather than the specific computations.

ASSIGNMENT #9
Le and Boen, chapter 6: exercises 28–30

Attach Assignment Identification Sheet #9

LESSON THIRTEEN
STATISTICAL TESTING
Reading Assignment

Le and Boen, section 7.3

Objectives
At the end of this lesson, you should be able to
o explain why nonparametric tests are useful and when it is
appropriate to use them;
o compute a Wilcoxon rank-sum test;
o compute a Wilcoxon sign test; and
o compute a Spearman’s rank correlation test.

Key Terms
Wilcoxon rank-sum test
Wilcoxon sign test
Spearman’s rank correlation test

Nonparametric Methods
We have considered many statistical methods and tests that are commonly
used. Many of these tests require certain assumptions to be made to be
valid, such as normality or near-normality of a distribution. We often are
willing to make these assumptions when we are not sure if they are in fact
true. In reality, most of the tests we have talked about are quite robust
(meaning fairly insensitive) to some non-normality in the data. However,
extreme non-normality may affect the validity of the statistical tests used
and the corresponding conclusions we draw from them.

It is with this problem in mind that we introduce some nonparametric
tests. Nonparametric tests, as the name suggests, do not rely on estimating
the parameters of an assumed distribution; in particular, they require no
assumption that the data are normally distributed. Usually the inferences
drawn using nonparametric tests are about the distribution as a whole
rather than the value of any particular parameter (the mean, for example).

Wilcoxon Sign Test


The Wilcoxon sign test is based on a fairly straightforward idea. Using this
test, one may wish to infer that some random variable is centered at zero;
an example would be testing whether the difference between a person's
measurements at Time 1 and Time 2 is zero. The null hypothesis
corresponding to this would be "H0: Treatment1 – Treatment2 = 0" or
"H0: Treatment1 = Treatment2."

The corresponding alternate hypothesis would be "HA: Treatment1 ≠
Treatment2." The basic idea is that if indeed the null hypothesis is true,
approximately the same number of data points should be less than zero as
are greater than zero. In order to perform this test, the sign of each data
point is evaluated. The number of positive (or negative) signs is counted,
and the probability of that count is determined under the null hypothesis
using a binomial distribution. The probability P of exactly x positive
outcomes in n trials is given by

P = [n! / (x!(n − x)!)] (p)^x (1 − p)^(n − x)

where p is the probability that a positive outcome will occur.

Example 1

H0: The difference D between T1 and T2 is zero
HA: The difference D between T1 and T2 is not zero

If we had 20 data points di (where i = 1, . . . , 20), and 16 of them were positive, we
could compute the probability P of exactly 16 positive signs as follows:

P = [20!(0.5)^16 (0.5)^4] / [(16!)(4!)] = 20!(0.5)^20 / [(16!)(4!)] = 0.0046

Under the null hypothesis of no difference between T1 and T2, we would expect the
chance that the sign of the difference di is positive (or negative) to be 0.5. (Strictly, the
p-value sums the probabilities of all outcomes at least this extreme, that is, 16 or more
positive signs, doubled for a two-sided test, which gives about .012.) Either way, the
data provide little evidence in support of our null hypothesis (D = 0), and so we reject it.
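
The binomial arithmetic in Example 1 is easy to check in Python (illustrative only); the second calculation shows the stricter two-sided p-value mentioned above:

from math import comb

n, x = 20, 16   # 20 paired differences, 16 with a positive sign
p = 0.5         # chance of a positive sign under the null hypothesis

prob_exactly_16 = comb(n, x) * p ** x * (1 - p) ** (n - x)
print(round(prob_exactly_16, 4))   # about 0.0046, as in Example 1

# Two-sided p-value: probability of a result at least this extreme, doubled.
p_two_sided = 2 * sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))
print(round(p_two_sided, 4))       # about 0.012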

Wilcoxon Rank-Sum Test


Another commonly used nonparametric test is the Wilcoxon rank-sum test.
In this test, the two samples being compared are grouped into one large
sample, ordered from smallest to largest, and assigned a corresponding
numeric rank. For example, the smallest observation in a study with 20
data points would be assigned a rank of “1” and the largest a rank of “20.”
Under the null hypothesis of no difference between the two original
populations, the sum of all ranks from each sample would be expected to
be approximately equal. We test the null hypothesis using the z-score
statistic,

z = (R − µR)/σR,

where R is the sum of the ranks from one of the original samples and µR =
n1(n1 + n2 + 1)/2. The standard deviation of R, which is denoted by σR, is
given by σR = √[n1n2(n1 + n2 + 1)/12]. For sample sizes n1 and n2 of 10 or larger,
the statistic is approximately normal and the null hypothesis should be
rejected at the .05 level if z is larger than 1.96 or smaller than –1.96. There
are also tables of rank-sum probabilities to which one can refer in order to
come up with a p-value. These may be used for smaller sample sizes also.
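
A sketch of the rank-sum calculation, using two small made-up samples with no tied values, might look like this:

import math

# Hypothetical samples (illustrative values only; no ties).
sample1 = [12, 15, 14, 10, 13, 18, 16, 11, 17, 19]
sample2 = [22, 9, 21, 20, 25, 24, 8, 23, 26, 27]

combined = sorted(sample1 + sample2)
rank_of = {value: rank for rank, value in enumerate(combined, start=1)}

n1, n2 = len(sample1), len(sample2)
R = sum(rank_of[v] for v in sample1)                # rank sum for sample 1
mu_R = n1 * (n1 + n2 + 1) / 2
sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (R - mu_R) / sigma_R
print(f"R = {R}, z = {z:.2f}")                      # compare |z| with 1.96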

If we are interested in investigating a relationship between two continuous
variables but are concerned that extreme values may influence our
correlation coefficient r, we can use a rank correlation. The idea here is
that we correlate the two variables' ranks rather than their actual values,
thereby diminishing the effect of any particular data point. The first
variable, call it X, is ranked from smallest to largest. The same thing is
done for the second variable, Y. Then we compute a correlation coefficient
of the ranks rather than of the data values as we have done previously.
This process is called Spearman's rank correlation.

Nonparametric tests can be used in almost any situation, but they are
especially useful when little is known about a distribution or when the
assumptions of other tests are clearly not met. If a nonparametric test is
used when the data are in fact normally distributed, the resulting p-value
will usually be close to the p-value obtained using the corresponding
parametric test.

ASSIGNMENT #10
Le and Boen, chapter 7: exercises 10, 11, 14, 15

Attach Assignment Identification Sheet #10

LESSON FOURTEEN
PREPARING FOR THE FINAL EXAMINATION
This test covers all chapters covered in the course. Remember that it is a
closed-book, closed-note test. However, you will be allowed one page of
formulae to help you during the exam, and you may also bring a
calculator. To help you complete the exam, a copy of appendixes A, B,
and C from the Le and Boen text will be provided. You should find the
test problems to be similar to the homework problems. You will have two
hours to complete the examination.

See the following page for information on scheduling your examination.

Submit Assignment Identification Sheet #11
