Elements of Research: Study Design and Data Analysis


Chapter 1

Chapter Outline

Introduction

Study Design
Research Question
Target Population and Study Subjects
Properties of Interest, Variables, and Measurements
Testable Hypothesis
Number of Subjects

Data Description and Analysis
Data Screening
Data Reduction and Descriptive Summaries
Checking Assumptions for Analytic Techniques
Statistical Analysis

Summary

Selected Bibliography

Elizabeth R. Myers, PhD

This chapter at a glance


This chapter describes the components of designing a study and the
analytical approaches taken after data are collected, with the intent to
provide an overview of analysis used in planning new research or in
critically evaluating past and current studies.
2 Section 1 Principles and Methods

Introduction

Many researchers have been faced with the problem of conducting a study and then not being able to reach a well-supported conclusion about the results. The reasons could range from not having enough subjects to performing an incorrect multivariate statistical analysis. When designing a study, steps should be taken to maximize the ability to reach conclusions and to infer from study findings the value of the work. This chapter first describes the components of designing a study followed by analytic approaches taken after data are collected. The intent is to provide an overview of analysis used in planning new research or in critically evaluating past and current studies.

Most orthopaedic research is probabilistic in nature, that is, the phenomena of interest occur with some random error. Each occurrence is not exactly the same as any other observation of the phenomenon. Probabilistic phenomena can be contrasted with deterministic problems, in which there is no allowance for error and each run gives the same value as the others. An example of a deterministic model is force = mass × acceleration, or Newton’s Law. It is expected that an object of mass, under an acceleration, will generate force, with very little error in force for most practical purposes. The outcome is determined by the input values. There is little reason to apply statistical techniques of data analysis to deterministic problems because there is minimal variability associated with prediction of the results. However, in probabilistic research, analysis is required to determine if associations are likely caused by random error or are likely to be real. This aim is the basis behind most statistical analyses of research. This chapter deals with such techniques in orthopaedic research.

Such scientific studies can be broken down into 3 main components: a design phase, in which the scientist formulates a research question, chooses the subjects, determines the measurements, and plans the analysis; an implementation phase, during which the information is collected; and an analysis phase, in which descriptive summaries are generated and inferences are made based on the findings of the study. Overall goals of any scientific study are to be able to draw well-supported conclusions from the research and to convince others that the methods and interpretations are valid. To attain these goals, each component of research has specific aims. The aim of designing a study is to plan a convincing study and, when possible, to generalize the results to the world outside the study. The goal during implementation is to collect data while taking steps for quality control. The purpose of the analysis phase is to use appropriate statistical methods that estimate effects and assess the goodness of decision making.

Study Design

The point of designing a study in orthopaedic research before data collection is simple: to maximize the ability to draw valid and supported conclusions from findings in the study. In other words, the goal is to maximize both the internal and external validity of a study. Internal validity is the soundness of conclusions drawn within the study as based on the actual findings. External validity is the validity of inferences drawn from the study to the world outside the study. One recommended set of steps for designing a scientific study is shown in Outline 1. Maximizing the ability to generalize study results should be considered in all steps of study design. Reviewing the literature prior to beginning a study is essential; this information can impact proposed project design, analysis, and interpretation of results.

Outline 1
Sequential Steps in Designing a Research Project

Research question:
  Formulate the research question; review literature before proceeding
Study subjects:
  Conceptualize the target populations
  Plan the technique for obtaining the intended set of subjects from the populations
  Establish a plan to minimize loss of actual subjects
Measurements:
  Identify the properties of interest
  Translate the properties of interest into intended variables of the study
  Plan the actual measurements
Statistical analysis:
  Formulate a working hypothesis based on the research question, subjects, and variables
  Plan the statistical technique for testing the working hypothesis
Number of subjects:
  Estimate the number of subjects or specimens

Research Question
The first step in planning a new study is to formulate a research question. Similarly, the first step in evaluating a completed study is to elucidate the research question. A research question is a statement of an unknown issue in science that the investigator wishes to address. Every study begins with a basic question or series of questions. Resolution of the research question is then considered by planning

Orthopaedic Basic Science American Academy of Orthopaedic Surgeons


Chapter 1 Elements in Research: Study Design and Data Analysis 3

measurements or observations in subjects or specimens that represent appropriate characteristics in a population of interest. The research question should be new and important, but it must also be practical and workable. When stating the research question, the investigator should specify the unresolved issue, the properties of interest, and the general set of subjects or specimens. Some examples of research questions are: What is the prevalence of infection around a hip implant following joint replacement in patients with osteonecrosis? Does therapy that inhibits bone resorption result in a decrease in hip fractures in postmenopausal women? In the mature rabbit knee, does repaired cartilage have mechanical properties of normal cartilage in full-thickness cartilage defects?

During the formulation of the research question, it is important to decide whether to study the issue by observing events or by testing the effects of an active intervention or treatment (Fig. 1). If the investigator observes and measures uncontrolled events without altering them, then the study is considered nonexperimental or observational. However, if the investigator controls or manipulates events, then the study is considered an experiment. Observational studies can be further divided into descriptive studies, in which properties are described but relationships are not analyzed, versus analytic studies, in which relationships are analyzed. In analytic observational studies, the researcher must decide which properties are predictors and which are outcomes, although these designations are based on assumptions about cause and effect.

Figure 1
Comparison of observational versus experimental research studies.

There are basic research questions that can be answered by either observational or experimental studies. For example, 2 possible studies could be designed and conducted to answer the question: Do high-impact forces during a fall contribute to hip fractures in elderly women? In the first study, the investigator decides to compute estimated impact forces by observing and gathering pertinent information about falls in female patients older than 65 years of age. The values for impact force are compared between a group with hip fractures and a group of control fallers without hip fractures. This is an analytic observational study in which the investigator does not impose controlled events on the subjects. In the second study, the investigator decides to study the effects of padding the trochanteric region in elderly female subjects. One group of patients wears an attenuating pad and a second group serves as controls with no padding, and the outcome of hip fracture is assessed. In this experiment, the investigator controls impact force with the trochanteric pad. There is no correct manner in which to conduct a study, and many issues enter into the decision of observational versus experimental investigations. Often a research question is first examined by an observational study to confirm significant associations between a predictor and an outcome, and then a more difficult interventional study is done to establish cause and effect. There are even some research questions that can only be studied by an observational approach.

Target Population and Study Subjects
The second step in designing a study or in evaluating an existing investigation is to delineate the target populations and the subjects or specimens to be assessed in the study. A population is the complete set of subjects or specimens of interest to the researcher with specified characteristics. In health research, these characteristics are defined typically by clinical and demographic traits. A sample is a subset that is selected from the population of interest. The units of study are the individual subjects or specimens that are assessed in a scientific investigation. In orthopaedic research, many clinical studies use individual humans as the units of study, but there are also musculoskeletal research projects that use organs, tissues, cells, specimens of synthetic material, or animals as the units of study. In the rest of this chapter, the terms subject and specimen will be used interchangeably to describe the unit of study.

The first consideration in choosing subjects for orthopaedic research is to envision the target populations. A population is defined by a set of clinical, demographic, geographic, and/or time-based selection criteria. An example of a target population is the set of adolescent females living in the United States with idiopathic scoliosis of a certain degree of deformity.

Next, a selection procedure is chosen to select a group of specimens from the population. In a few rare instances it is
possible to study the entire target population. However, the target population is often too large or unmanageable to study all members. In such cases, a procedure is required to select the subset of subjects that will make up the sample.

External validity in choosing subjects is whether findings from the set of subjects can be generalized to a population of interest. External validity is of particular consideration in medical research using animals. Clearly, the animal subjects used in such a study are not a sample of the human target population. The process of generalizing, then, involves a judgment of what features in the animal study represent the human condition.

In clinical studies, there are several techniques for selecting the subjects. Random selection is marking every member of the population and then using a random technique for drawing a certain number of study units. Thus, random samples are those that each have an equal probability of being selected from the population. Selecting subjects randomly is a good method for obtaining a sample that will represent the underlying population, but it is often impractical to do in small, low-cost studies. Samples chosen by random selection are also called probability samples.

When it is not possible to draw a random sample, the subjects can be chosen by nonprobability methods. In consecutive selection, every available subject or specimen that meets the selection criteria is taken over a given time period or up to a certain number of units. As long as the time period is long enough to avoid seasonal effects, consecutive samples can work effectively. For example, all female patients between the ages of 12 and 18 years seen in a scoliosis clinic over 2 years from a given start date would make up a consecutive sample representing a population of adolescent girls with a certain degree of scoliosis in that geographic location.

There are many other selection techniques, including modifications of random sampling, and other nonprobability techniques such as selecting subjects based on convenience. Techniques that involve volunteers or convenience samples tend to be the least representative of the population. Findings from such a study can be distorted relative to phenomena in a population simply because the sample is nonrepresentative. Problems with such techniques include bias and confounding and can therefore yield limited conclusions.

During the design phase, it is important to establish strategies to retain as many subjects or specimens as possible within the intended set. If subjects or specimens are lost to measurements, then the internal validity of a study could suffer in that the actual subjects or specimens at the end of the study do not represent the intended set of subjects (bias). Minimizing such loss could require diverse approaches such as planning for effective recruitment and retention of human subjects or minimizing loss of specimens in cell culture. For example, an investigator is only able to contact and assess subjects who come into the scoliosis clinic during afternoons and, therefore, misses subjects participating in sports practice. The intended sample is meant to be a clinic-based consecutive sample, yet a group of important, active subjects is omitted. Strategies for encouraging participation in patient studies include making contact with every member of the intended sample and developing relationships between the study coordinator and subjects. Sometimes the best way to plan a study is to look at a previous study with successful recruitment strategies. For laboratory studies, strategies to reduce loss of specimens include providing training for technical personnel and establishing standard operating procedures for techniques.

Properties of Interest, Variables, and Measurements
A third step in the design phase of research is to identify the properties of interest, to define the intended variables, and to plan how the variables will be measured. The aim is to choose variables that represent the general properties of interest and to measure those variables with accuracy and precision. An example of a property of interest is infection around a hip implant, with the corresponding variable being the presence of bacteria after aspiration arthrogram, and the actual measurement being the reading of an investigator looking through a microscope at culture grown from the aspirate.

There are some basic concepts that must be understood to plan the variables effectively. The first is the idea of classifying the properties of interest into those that are predictive versus those that are responses or outcomes. The corresponding classification of the variables is into independent or dependent variables. Independent variables are those either controlled by the investigator in an experiment or chosen as predicting variables in an observational study. Independent variables are also known as factors, predictor variables, or effect variables. Dependent variables are the variables measured as outcome and are also called response or outcome variables.

The second concept is to determine the scale on which each variable is measured. Continuous variables take on values corresponding to points on a real number line. Discrete variables take on a finite number of values with quantified intervals. Categorical variables take on a finite number of values with qualitative intervals. The levels of a variable are the settings or possible values that the variable can take on; continuously scaled variables have an infinite number of possible levels, whereas discrete and categorical variables have a finite number of levels defined by the intervals. When levels are ordered in a categorical variable, it is called an ordinal variable. When there is no rank or order to the levels, the categorical variable is called nominal (“in name only”). Examples of each of these measurement scales are given in Table 1.

Considerations of validity should be made when planning
the variables and measurements. When picking variables to represent the properties of interest, the researcher should consider the external validity and make an informed judgment of how closely the variables represent the phenomena. For example, does the compressive failure load of a cadaveric spine specimen broken in the laboratory represent fracture risk in the elderly?

When designing the actual measurements, accuracy, the degree of agreement between the result of a measurement and the true value of the quantity measured, and precision, the degree of agreement of repeated measurements using the same protocol, are important to ensure that the values are valid internally. For instance, does the maximum axial force registered by a calibrated load cell during a compression test of an excised vertebra with end plates removed represent the failure load of the vertebra? Some strategies for increasing accuracy and precision include planning for calibration of the instruments, standard operating procedures, training time for the observer, automation of measurements, and use of objective measures if possible.

Table 1
Measurement Scales for Variables

Scale          Levels                          Examples
Continuous     Infinite                        Temperature
                                               Bone mineral density
                                               Fracture force
Discrete       Finite quantitative intervals   Temperature in intervals (30°, 35°, 40°)
                                               Number of alcoholic drinks per day
                                               Range of motion in intervals (10°, 20°, 30°)
Categorical    Finite qualitative intervals
  Ordinal                                      Temperature (room, body)
                                               Pain (mild, moderate, severe)
  Nominal                                      Gender (male, female)
                                               Blood type
                                               Hip fracture (yes, no)

Testable Hypothesis
Once the research question is set, the subjects defined, and the variables identified, the fourth step is to take all that information, generate a working hypothesis, and plan the statistical approach. The working hypothesis is a formulation of the research question, and it includes a tentative statement that can be tested or investigated. The working hypothesis is a practical version of the research question. The immediate goal for stating this hypothesis is to set up a strategy for statistical analysis, and it is important to note that this is done during the planning phase. The long-term goal is to be able to draw conclusions at the end of the study that answer the research question.

Research hypotheses involve an explanation of the phenomenon of interest and often provide explicit ideas about cause and effect. In an analytic study that will use statistical decision making, statistical hypotheses are also stated in addition to the research hypothesis.

Statistical hypotheses involve a concept called proof by contradiction. Both a null hypothesis and an alternate hypothesis are stated, and support for the research hypothesis is shown by rejecting or “nullifying” the null hypothesis. The null hypothesis (Ho) states that there is no association between predictor and outcome variables or that a treatment has no effect. The alternate hypothesis (Ha) states that there is an association between the variables or that the treatment has an effect. This alternate hypothesis is usually linked to the research hypothesis. By rejecting the null hypothesis, support is shown for the research hypothesis.

As an illustration of null and alternate hypotheses, the following research question should be considered: does therapy using dose A of Drug X increase bone density in the proximal femur of postmenopausal women in the United States? The specific research hypothesis is that hip bone mineral density measured by dual energy x-ray absorptiometry is changed by treatment with dose A of Drug X compared with placebo in a convenience sample of women aged 60 and older. The null hypothesis is that the mean bone mineral densities are equal between the treated and placebo groups, which are designated as group 1 and group 2:

Ho: (µ1 - µ2) = 0

and the alternate hypothesis is that the mean bone mineral densities are unequal:

Ha: (µ1 - µ2) ≠ 0

where µ1 = mean for group 1 treated with Drug X and µ2 = mean for group 2 treated with placebo.

In studies of associations among variables, the statistical approach is determined primarily by the type and scale of the variables. This is why, in addition to considerations of validity, it is very important to plan the variables and to list the type and scale of each variable during the planning phase. If the researcher plans to use statistical significance testing, the choice of the statistical test is made during the design phase for several reasons: to assure that the working hypothesis is testable, to confirm that the capabilities for performing the analysis are available, and to determine the number of subjects or specimens.

The alternate hypothesis can be stated with or without a definite direction. A 1-sided or 1-tailed alternate hypothesis states that there is a specific direction to the association between variables or to the difference among groups. To
illustrate, a 1-sided hypothesis would state that there is a subjects? It is necessary to know the statistical methods
positive linear relationship between x and y or that group 1 proposed for the study. Three other quantities are also need-
has a greater mean value than group 2. A 2-sided or 2-tailed ed: two originate from the probabilities the researcher is
alternate hypothesis has no specific direction. One-sided willing to accept in making a decision at the conclusion of the
tests should be planned when the scientist believes that study and the third is based on the size of the impact that the
medical or scientific meaning is only important in 1 direc- predictor variables will have on the response. These 3 quanti-
tion. For example, a 1-sided hypothesis might be used in a ties are called the alpha level, the beta level, and the effect
study of compromised bone accumulation in girls wearing size; these terms are defined in the following paragraphs.
back braces for treatment of scoliosis. The hypothesis that In statistical decision theory, 2 hypothetical states of real-
brace treatment results in lower rates of bone accumulation ity are established. One is the null hypothesis (no associa-
may be of interest in musculoskeletal research, whereas the tion found) and the other is the alternate hypothesis (an
hypothesis that brace treatment results in greater bone association exists). After a study is implemented and results
accumulation than in unbraced subjects is not part of the collected, a decision is made about whether there is suffi-
research question and may not be of concern. When there cient evidence to reject the null hypothesis in favor of the
is no clear, strong reason for directionality, it is recom- alternate hypothesis. Thus, there are 4 possible outcomes
mended that the 2-sided approach be used. after a study is completed (Table 2): the null hypothesis is
It should be pointed out that confidence interval estima- rejected and in reality the alternate hypothesis is true (a
tion is a strong alternative to statistical hypothesis testing correct and desirable decision); the null hypothesis is
that is gaining popularity in health research. The confi- not rejected when in reality it is true (also a correct deci-
dence interval is more informative than the significance sion); the null hypothesis is rejected but in reality it is true
test. A confidence interval is a bracket that has a certain (a type I error); and the null hypothesis is not rejected but
level of confidence (often 95%) that the interval encloses a in reality the alternate hypothesis is true (a type II error).
population parameter. Therefore, the confidence interval Hopefully a correct decision will be made, but it is helpful to
displays both the size of an effect and the variability of the consider the probabilities of making the wrong decisions.
estimate. Plans can be made during the design phase of a Alpha is the probability of making the wrong decision when
project to use interval estimation rather than or in addition the null hypothesis is true (Table 2), that is, deciding that
to null hypothesis testing. Which method of analysis should there is an association in the study when there is no associ-
be used? Both are used in basic and clinical orthopaedic ation in the population. This type of decision is sometimes
science. A decision based on rejection of a null hypothesis called a false positive, in that the result of the research study
is appropriate when the study is designed to make a choice is positive (an association is found) but it is false. Alpha is
between alternatives. The interpretation of the results is then analogous to a false positive rate. Beta is the probabil-
often clear and easy (“the difference in compressive ity of making the wrong decision when the alternate
strength between bone cement and the new polymer was hypothesis is actually true, that is, deciding that there is no
significant”). In research areas such as epidemiology or association in the study when there is an association in the
orthopaedic treatment, however, confidence intervals are population. This decision can be thought of as a false nega-
often preferred. Confidence interval estimation allows the tive; the study has a negative result (no association found)
clinical relevance of an effect to be evaluated because the but that result is false. Beta is therefore analogous to a false
magnitude and variability of the estimate are presented. negative rate. Scientific intuition should encourage the
Additional information and computational approaches for idea that alpha and beta (the false positive and false nega-
statistical decision making and confidence interval estima-
tion are given in the section on data description and analysis. Table 2
Decisions in Analytic Studies
Number of Subjects
A necessary step in designing any orthopaedic research Statistical Decision Reality
project, before beginning the study, is to determine the in the Study
number of subjects or specimens needed for an analytic Null hypothesis Alternate hypothesis
study. There are very practical reasons for determining the is true is true
number of subjects. The number of subjects impacts the Do not reject Correct Type II Error
feasibility, cost, ethical considerations, and the time scale of null hypothesis (1 - ␣)* (β)*
a project. If a large number of subjects is needed to ensure
a certain probability of detecting an effect or a certain plau- Reject null Type I Error Correct
sible range for a parameter, then it may not be feasible to hypothesis (␣)* (1 - β)*
perform the study at all.
* Conditional probabilities of the decisions.
What is involved in a determination of the number of

Orthopaedic Basic Science American Academy of Orthopaedic Surgeons


Chapter 1 Elements in Research: Study Design and Data Analysis 7

tive rates) should be set as low as possible to enhance the bone mineral density. The effect size would be smaller for
conclusions drawn at the end of a study. However, as the data of the second study compared with the first study.
described in the following sections, the number of subjects Furthermore, it would require more subjects to detect the dif-
increases as the levels of alpha and beta are restricted, so a ference for the second case at given levels of alpha and beta.
tradeoff is necessary in practice. To determine the number of subjects needed for the
Power is the probability of rejecting the null hypothesis (in study, the researcher first must decide on the maximum
favor of the research hypothesis) in the study when the probability of making type I and type II errors. Ideally, alpha
alternative is true in the population. This outcome often and beta should be set at small levels. Based on practical
leads to support for the research hypothesis. Thus, it is issues and tradition, alpha is often set at 0.05, but note that
important to have a study with high power. lower alpha levels should be used if it is critical to avoid
Effect size is the magnitude of the effect of an indepen- false positives. Conversely, higher alpha levels could be
dent variable on the dependent variable relative to the used if avoiding false positives is not that important, such
background variability or spread in the dependent variable. as for a therapy with clinically relevant potential benefits
Consider the example of determining the impact of a cate- but with minimal side effects. Beta is often set at 0.05 to 0.2,
gorical variable, drug treatment, with 2 levels (drug treat- which gives a power of 80% to 95%. These values for beta
ment can take on the value of “dose I” or “dose II”) on a con- are also based on tradition and should be adjusted to suit a
tinuous response variable, bone mineral density (Fig. 2). given study. Next the researcher estimates the effect size.
For illustrative purposes, suppose that 2 separate studies This may seem like an example of the cart coming before
are performed. The difference between doses I and II is the the horse, but the estimate can be done based on pilot stud-
same for the data of study A and that of study B. However, ies, values in the literature, or simply by making an educat-
the spread in the values for bone mineral density is much ed guess at the size of the effect and the variability in the
greater for the data of study B. For example, this greater dependent variables. If an educated guess is used, it is help-
spread could be caused by careless assessments in the sec- ful to estimate the number of subjects based on several rea-
ond study resulting in more error in the determination of sonable values of the effect size.
It is worthwhile at this point to consider what strategies
A could enhance the probability of a successful outcome.
There are 4 quantities involved: alpha, power (or beta),
effect size, and number of subjects. Power is the probabili-
ty of rejecting the null hypothesis when the alternative is
true in the population (a successful positive outcome), so it
is enlightening to consider the dependence of power on the
other quantities. The relationships among power, effect
size, and number of subjects are illustrated in Figures 3 and
4 for a Student’s t test or comparison between 2 groups. The
Student’s t test is defined in the section on data analysis.
Note that the power goes up as the number of subjects is
B increased for set values of alpha and effect size (Fig. 3).
Thus, there is an obvious strategy that could be used to
enhance the probability of a successful outcome: increase
the number of subjects. When the total number of subjects
is restricted, there are at least 2 other strategies that could
help in certain designs. One is to amplify the “signal” of the
information and the other is to reduce the “noise”. Both of
these act to increase the effect size, and the power of the
study increases as the effect size goes up for set levels of
alpha and number of subjects (Fig. 4). To increase the signal, a treatment or predictor variable can be planned that is thought to result in a large difference in the dependent variable. To reduce the noise of the study, precise assessments can be planned.

Figure 2
Example of how the effect size of a factor depends on both the magnitude of the effect and the spread in the data. Both parts of the figure show histograms of bone mineral density values in 2 groups. A, The difference in bone mineral density between Group I and Group II is large relative to the spread in values for bone mineral density. B, The difference between Groups I and II is the same as in study A, but the spread of data is much greater in both groups. Therefore, the effect size is smaller in study B compared with study A. More subjects would be required to detect the difference between groups in study B.

In summary, it is important to note that the components of a scientific study are sequential: a study must be planned and implemented before it is possible to make inferences based on the analysis. Steps in designing a study are straightforward: formulate a research question, pick the

American Academy of Orthopaedic Surgeons Orthopaedic Basic Science



study subjects, determine the measurements, and plan the analytic approach and number of subjects. The benefits of giving consideration to the design of a study are also straightforward. Careful attention to the steps in study design can enhance the validity of conclusions after the study is completed.

Data Description and Analysis
Once the study is designed and implemented, the results need to be analyzed and inferences need to be made based on the study results. Just as there are practical steps in study design, there are also functional steps in analysis of results:
screening data to maximize the quality, generating descrip-
tive summaries of the data, checking assumptions, and per-
forming analytic tests and calculating confidence intervals.
These steps are shown in Outline 2.

Data Screening
Why screen the data? The main goal is to ensure an accurate
data set. In the process, the researcher verifies that data are
entered correctly, that each variable falls within a proper
range, and that missing values are flagged.
Checks of data entry are perhaps the most tedious of the
steps in the screening process but have been aided by the
advent of computer programs for data entry. In the best of
all worlds, a complete list of data is generated by the soft-
ware program and checked on a cell-by-cell basis with the
laboratory notebook or other original source of values. In
addition, the investigator should check the following in the
output: number of variables, number of observations, and
format of each variable. Incorrect entries identified by this
initial screening should be corrected.
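The screening checks described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, variable names, and valid ranges are hypothetical, and missing values are assumed to be stored as `None`.

```python
def screen_records(records, valid_ranges):
    """Flag out-of-range and missing values.

    Returns two lists of (row_index, variable_name) pairs:
    values outside the allowed range, and values that are missing.
    """
    out_of_range, missing = [], []
    for i, row in enumerate(records):
        for variable, (low, high) in valid_ranges.items():
            value = row.get(variable)
            if value is None:
                missing.append((i, variable))
            elif not (low <= value <= high):
                out_of_range.append((i, variable))
    return out_of_range, missing


# Hypothetical data set: one missing age and one implausible fracture load.
records = [
    {"specimen": "S01", "age_years": 78, "fracture_load_N": 3200},
    {"specimen": "S02", "age_years": None, "fracture_load_N": 2100},
    {"specimen": "S03", "age_years": 81, "fracture_load_N": 10000},
]
valid_ranges = {"age_years": (60, 100), "fracture_load_N": (1000, 5000)}

out_of_range, missing = screen_records(records, valid_ranges)
print("number of observations:", len(records))
print("out of range:", out_of_range)
print("missing:", missing)
```

Flagged entries are only candidates for correction; each one still has to be traced back to the laboratory notebook or other original source before it is changed.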
Figure 3
Relationship between power and number of observations for a comparison of 2 groups with a fixed effect size of 0.5 and a fixed type I error rate of 0.05. Power is related directly to number of observations or subjects.

Figure 4
Relationship between power and effect size for a comparison of 2 groups with a fixed number of subjects (50) and a fixed type I error rate (0.05). The larger the effect size, the greater the power of a study for constant α and number of subjects.

Outline 2
Sequential Steps for Data Description and Analysis

Data screening
  Check data values, edit incorrect entries
  Flag outliers and missing values, identify cause, make decision how to handle

Data reduction and descriptive summaries
  Plot graphical displays
  Compute numerical measures of central tendency, spread, or frequency

Check of assumptions
  Check for normal distributions and other assumptions of planned tests

Perform statistical analysis and/or determine confidence intervals

Out-of-range values, or outliers, are observations that appear inconsistent with the remainder of the data set. Extreme values can be in a single variable only or in a combination of variables. Possible sources of extreme values include errors made in taking, recording, or entering data; cases that are not part of the population the investigator intended to represent; and values that are the result of extreme (but real) biologic variation. To detect outliers in a


single variable, the minimum and maximum values should be examined; to detect outliers in a combination of variables, more difficult multivariate procedures are needed.

What to do with outlying data depends on the source of the out-of-range value. If errors are made in data entry, the outlier is replaced with the correct value. If it is clear that the case is not from the target population, it is deleted from the data set. An example is the inadvertent inclusion of a young male cadaveric spine specimen with a fracture load of 10,000 N in a study of fracture in elderly female specimens only, with a range of fracture loads from 1,000 to 5,000 N.

If the outlier is suspected of being the result of extreme biologic variation, then the path to follow is not as clear. Most investigators simply live with the extreme value and accept any distortion caused by the outlier in the descriptive summaries and analysis. There are also mechanisms for handling outliers during analysis, such as techniques that adjust for skewed data. It should be noted that outliers may give insight into the phenomena under study and should therefore be examined carefully.

The approach to examining missing values is similar to that for out-of-range values. Sources of missing values include problems such as loss of specimens, poor patient recall, and equipment malfunction. Missing values should be detected and flagged in the data set. The cause should be determined if possible. The quantity and pattern of the missing information should be checked. Values that appear to be missing at random are much less of a problem in terms of distortion than values that are missing information in association with other variables in the study. For example, in a study of falls, impact location, and hip fracture, most of the subjects who cannot recall the location of impact during a fall are found to be in the fracture group. The subjects who readily identify the location of impact tend to be in the control group without fracture. Thus, there is an association between having a missing value for a key variable and fracture status. Consequently, deletion of these subjects could cause distortion of the sample.

The procedures for handling missing values should be done with care. Deletion of all data for a specimen or specimens with missing values is a possible alternative if there are only a few cases and they seem to be a random subset within the set. Similarly, the variable with missing values can be dropped from the data set, particularly if the missing values are concentrated within the variable and the variable is not crucial for answering the research question. Another common procedure is to impute the missing value based on nonmissing values for the variable or on relationships with other variables in the data set. A variable to be used in hypothesis testing with another variable should never be used to estimate missing values within this other variable.

Another approach sometimes used to handle missing information is to transform missing information into a new variable. This approach is taken when failure to have a value may itself be predictive of outcome. Such a tactic can yield interesting information about the phenomena under study but should be taken with caution. Typically, a dummy variable is created out of the variable with missing values. In the example used previously, the new variable would be labeled “ability to recall impact location” and would be coded as missing or complete. This new variable could then be used in the analysis. More complicated models can also be developed to describe the mechanism of missing data. For both outliers and missing values, the decision of how to handle the problem should be made before the data analysis.

Data Reduction and Descriptive Summaries
The second step in data description and analysis is to generate summaries of the data. How is a set of measurements described? The measurements could be presented in their entirety, but this would be of little help to the orthopaedic scientist in understanding the results. Instead, graphic displays are made or numeric measures are computed that represent the central tendency, the dispersion, or the frequency of the variables. There are many such methods for describing data sets; only a few methods used commonly in orthopaedic research are presented in this section.

Graphic methods for displaying distributions include frequency histograms and box plots. To form a frequency histogram, intervals are established from values of a variable and then the number of observations within each interval is determined. To form the histogram plot, the interval values of the variable are plotted on a horizontal axis and the vertical heights of bars are drawn proportional to the number of specimens within that interval. An example is shown in Figure 5 for fracture force from a study of cadaveric specimens from elderly female donors. The number of intervals is arbitrary but should be adjusted to the amount of data collected. Typically 5 to 20 intervals are used, with larger data sets requiring more intervals.

By examining the frequency histogram, the manner in which the measurements are distributed in the intervals is evident. In addition, the histogram can be used to determine what proportion of measurements have values greater or less than a certain value. For example, what fraction of spines broke at loads greater than 3,000 N? Based on Figure 5, six specimens out of 15 achieved fracture loads greater than 3,000 N, or 40%. Note that this is also the percent of the total area under the histogram. It is expected that the frequency histogram of a sample will provide information on the population frequency histogram, which is the histogram that would be generated if all values from the population were obtained.

A second method for graphic display of a set of measurements is the box plot. In contrast to the horizontal axis of a histogram, the distribution of a variable is displayed on a vertical scale in a box plot. First, a horizontal line is drawn at the midpoint of the measurements and then a box is con-


structed around the median line, with the lower edge at the 25th percentile and the upper edge at the 75th percentile of the observations. In addition, vertical lines mark the smallest and largest observations. Figure 6 is a box plot for the same data used to generate the histogram of Figure 5. If the actual data points are superimposed on the box plot, outlying values become readily apparent.

Numeric methods for describing data sets are intended to reduce the data to a limited set of numbers that conveys the distribution of the measurements. Scientists are often interested in numbers that describe the central tendency and the spread of observations within continuous and discrete variables. In certain cases, it is possible to summarize the entire set of measurements for a given continuous variable with two numbers, one that reflects the center and another that reflects the dispersion.

The sample mean is equal to the sum of a set of measurements divided by the number of observations:
Mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

where $y_i$ is the specific value of the variable y for the ith observation and n is the total number of observations. The
sample mean can be used as a measure of central tendency
for a continuous variable if the distribution is roughly bell-
shaped (the histogram of Figure 5 is an example). The sample
mean is used to estimate the population mean (µ), which is
generally unknown. If the population distribution is bell-
shaped, the population mean is the center of the distribution
and the most probable value within the population.
Other measures of central tendency include the median
and the mode. The median of a set of n measurements is the
value that falls in the middle of the ordered measurements.

Median: $\frac{y_i + y_{i+1}}{2},\ i = \frac{n}{2}$ for n even; $y_i,\ i = \frac{n+1}{2}$ for n odd (with the measurements ordered from smallest to largest)

Figure 5
Example of a histogram. Data are plotted for the failure force in newtons of 15 spine specimens. The horizontal axis depicts intervals of force values with interval widths of 500 N.

The mode is the most frequently occurring measurement in
a set of measurements and is often used with discrete and
categoric data.
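The three measures of central tendency can be computed directly from their definitions. The sketch below follows the mean and median formulas given above and cross-checks them against Python's standard `statistics` module; the fracture loads are hypothetical values chosen for illustration, not the data of Figure 5.

```python
import statistics


def sample_mean(y):
    """Sum of the measurements divided by the number of observations."""
    return sum(y) / len(y)


def sample_median(y):
    """Middle value of the ordered measurements (1-based index i as in the text)."""
    ordered = sorted(y)
    n = len(ordered)
    if n % 2 == 0:
        i = n // 2                      # i = n/2 for n even
        return (ordered[i - 1] + ordered[i]) / 2
    i = (n + 1) // 2                    # i = (n + 1)/2 for n odd
    return ordered[i - 1]


# Hypothetical fracture loads in newtons.
loads = [1500, 2200, 2500, 2500, 3100, 3600]

print("mean:  ", round(sample_mean(loads), 1))
print("median:", sample_median(loads))
print("mode:  ", statistics.mode(loads))
```

When a distribution is skewed, the mean and median computed this way will drift apart, which is one quick signal that the mean alone may not describe the center well.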
Measures of dispersion or spread in the data include the
variance, standard deviation, range, and interquartile
range. The sample variance ($s^2$) is:

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$

Note that yi minus the mean is a measure of the deviation of
that specific measurement from the mean. Thus, the vari-
ance reflects the average of the squares of the deviations of
the measurements about their mean. When the variance is
large, the data are more dispersed than when the number is
small. The sample standard deviation (s) is the positive
square root of the variance:

Sample standard deviation: $s = \sqrt{s^2}$
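A short sketch of these dispersion measures, written directly from the formulas above and checked against the standard `statistics` module. The data values are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
import math
import statistics


def sample_variance(y):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(y)
    y_bar = sum(y) / n
    return sum((yi - y_bar) ** 2 for yi in y) / (n - 1)


def sample_sd(y):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(y))


# Hypothetical measurements: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("variance:", round(sample_variance(data), 3))   # 32 / 7
print("sd:      ", round(sample_sd(data), 3))
print("range:   ", max(data) - min(data))
```

The divisor n - 1 rather than n is what distinguishes the sample variance from a plain average of squared deviations; `statistics.variance` uses the same convention.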


The sample variance is an estimate of the population variance (σ²), which, like the population mean, is generally unknown.

Figure 6
Example of a box plot. The same data as in Figure 5 are plotted in the box plot format. The vertical axis depicts force values on a continuous scale. The midpoint or median of the data array after ordering is plotted as a horizontal line. Then a box is drawn around the median line with the upper edge at the 75th percentile and the lower edge at the 25th percentile. The high and low values are also indicated by vertical lines.

Other indicators of sample variability include measures such as the range, which is the difference between the


largest and smallest value of y, and the interquartile range, which is the difference between the third quartile and the first quartile of a set of measurements. The first quartile is the value of y that separates the lower 25% from the upper 75% of values, and the third quartile is the value that separates the lower 75% from the upper 25%. Fifty percent of values fall within the interquartile range.

Several descriptors are used with nominal variables. A proportion is the number of measurements with a particular level of a nominal variable divided by the total number of measurements. For example, if 36 out of 50 patients with hip fracture are women, then the proportion of women is 36/50 or 0.72. A ratio is the number of measurements with a particular level of a nominal variable divided by the number of measurements without that value. The ratio of women with hip fracture to men with hip fracture is 36/14 or 2.6. A rate is a proportion determined over a period of time. A well known illustration of a rate in medicine is the incidence of a disease, which is the number of new cases of a disease divided by the total number of people at risk over a certain time period.

Checking Assumptions for Analytic Techniques
Parameters are numeric descriptive quantities that characterize the population, such as the population mean or standard deviation. Many analytic tests assume that the parameter being analyzed comes from a population with a certain frequency distribution called the normal probability distribution (Fig. 7). Therefore, before going on to the strategy for checking assumptions, it is necessary to review the concepts behind a normal distribution.

A large number of continuous variables in nature possess a frequency distribution with many values near the mean and progressively fewer values toward the extremes of the range. If the number of observations is large, the distribution is bell shaped and approximates a normal distribution. Examples include the height and weight of humans, bone mechanical properties, or bone density. In Figure 8, actual values for bone mineral density in a sample of 120 postmenopausal women are plotted in a frequency histogram. The cluster of values near the mean and the approximate bell shape can be seen.

Figure 8
Histogram of bone mineral density in 120 postmenopausal women.

The equation of the normal curve is given by the normal probability density function:

$f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/(2\sigma^2)}$

This is the equation of the bell-shaped curve illustrated in Figure 7 where µ is the population mean and σ is the standard deviation. Note that the area under the curve to the right of a given value of y represents the probability that y will be greater than or equal to the given value.

Figure 7
Normal probability distribution. The horizontal axis depicts the variable y and the vertical axis is the value of the normal density, f(y). The peak of the normal probability distribution corresponds to y = mean (µ).

The normal score (z) gives the distance that y is from the mean in number of standard deviations:

$z = \frac{y - \mu}{\sigma}$

If z = 1, then the corresponding y is one standard deviation away from the mean. If z = 0, y is equal to the mean. The probability distribution for z is called the standard normal distribution (Fig. 9). The probability that z belongs to some interval is equal to the corresponding area under the standard normal curve, and the total area under the curve is equal to 1. To illustrate the use of the standardized normal curve, consider the following question: What is the value of z (call it zo) such that 95% of z values fall within -zo and +zo? Based on Figure 9, the area under the curve between z = 0


and 1.96 is 0.475 and, with symmetry, the area between z = -1.96 and +1.96 is 0.95. Therefore, zo = 1.96 and it can be seen that 95% of values fall within 1.96 or approximately 2 standard deviations of the mean.

Figure 9
Standard normal distribution for the normal score (z). Area under the standard normal curve represents probability.

Just as variables often have a bell-shaped distribution with many values near the mean and progressively fewer values near the extremes or tails, so do the means of a given variable from multiple random samples drawn from a population. In other words, if many samples are drawn randomly from a population, the means of these samples will form a normal distribution. Many of the means will be near the mean of the means but a few will be far away. Even if the underlying population is not normal, the distribution of the means will tend toward normality as the number of observations within each of the samples increases. This leads to the definition of the standard error of the mean, which should not be confused with the standard deviation. The sample standard error of the mean (SEM) is the square root of the sample variance of the distribution of means and is equal to the sample standard deviation divided by the square root of n:

$\mathrm{SEM} = \frac{s}{\sqrt{n}}$

Note that the sample SEM is not a measure of the dispersion of a set of observations but a measure of the dispersion of the mean. Some call it an assessment of the precision of the estimate of the mean. It should not be used as an expression of the spread of a variable nor as an estimate of the population spread. SEM gives important information, however, when comparing means.

If an assumption for the analysis is that the observations have a normal distribution, then the sample distribution should be assessed before proceeding with analysis. A graphic display of the histogram should be checked for skewness and kurtosis. A skewed variable is one with the mean not in the center of the distribution. An example of a skewed distribution is shown in Figure 10 for the body mass index of 50 adolescent girls. Although many values tend to cluster near 18 to 20 kg/m², there are several subjects with quite high values for body mass index, resulting in positive skewness. A variable with kurtosis has either too many cases in the tails of the distribution or too few observations in the tails. There are also hypothesis tests for assessing departure from normality, such as the Shapiro-Wilk or W statistic and the Kolmogorov-Smirnov test.

Figure 10
Histogram of the body mass index (BMI) of 50 adolescent female subjects.

If there is departure from normality, there are nonparametric tests that do not rely on parameters such as the mean and standard deviation. There are also transformation functions that can be applied to variables to reduce skewness or kurtosis. This is often the reasoning behind logarithmic or square-root transformations of data in orthopaedic research. Taking the logarithm of y will sometimes pull in the tail of a skewed distribution. Note that transforming variables may make it difficult to describe and interpret results. For example, it is difficult to interpret the logarithm of body mass index.

The mean and standard deviation are appropriate measures of central tendency and dispersion only if the data have an approximate normal distribution. In situations with marked deviation from a normal distribution, the median and the range or interquartile range can be used as measures of central tendency and dispersion.

Other assumptions for analytic tests depend on the specific tests themselves. Some frequently required assumptions include independence of observations and equality of variances among groups. The researcher and critical


reviewer should be aware that a given parameter estimate or hypothesis test may have underlying assumptions that should be checked.

Statistical Analysis
The culminating step to data description and analysis is to perform statistical analyses. The objective is to make inferences about a population based on information gathered in the sample of a research study. It is important to understand that parameters determined in a study (values such as the sample mean and standard deviation or the difference between 2 means) do not necessarily completely represent the values of the underlying population. Typically, studies are limited by factors such as small numbers of specimens or large biologic variability. Therefore, researchers are called upon to estimate population values and sometimes to make decisions concerning the value of a parameter.

Strategies to analyze results tend to fall into 2 categories: tests of hypotheses concerning values of parameters (statistical decision making; also called the significance test) and estimations of parameter values (point and interval estimates). Almost everyone who has read scientific literature over the last century is familiar with the significance test. Some of the underlying ideas behind significance tests have been described in this chapter in the section on Study Design (see subsections Testable Hypothesis and Number of Subjects) and are covered in more depth in some of the suggested references. In many designs, a straw man (null hypothesis) is set up and an attempt is made to strike it down with the data of the study. The researcher and critical reviewer should keep in mind, however, that there are limitations to significance tests. Performing a significance test is a decision-making process. The significance test treats the acceptance or rejection of a hypothesis as a decision the researcher makes based on the data. As such, the test may only give a yes-no decision about a parameter. There is no sense of the size or strength of an effect or the nature of a relationship. An interval estimate, on the other hand, contains this important, additional information. Therefore, although reports of significance tests may be more familiar in orthopaedic publications, researchers should consider using confidence intervals to report results in the literature. Many experts have a strong preference for interval estimation over significance tests (see bibliography). Insomuch as both approaches are currently followed in orthopaedic science, techniques for performing significance tests and for determining point and interval estimates are described in the following sections.

Table 3
Techniques for Statistical Inference About Parameters

Parameter                                      Technique
Mean                                           One-sample t test
Difference between 2 means                     Two-sample t test
Difference between paired means                Paired-difference t test
Difference between 2 variances                 F test
Difference among > 2 means                     Analysis of variance
Difference among > 2 means with trial factor   Repeated measures analysis of variance
Linear association between 2 variables         Correlation
Slope between 2 variables                      Regression

Outline 3
Basic Setup for Statistical Significance Test

Null hypothesis (Ho):
  No difference or no association
Alternate hypothesis (Ha):
  Difference or an association specified by the investigator
Test statistic:
  Function of the data and parameters that are known
Degrees of freedom:
  Function of number of measurements
p-value:
  Probability of obtaining a value of the test statistic at least as extreme as the value observed given that Ho is true; depends on magnitude of test statistic and degrees of freedom
Type I error rate (α):
  Probability of erroneously rejecting Ho; set during design of study
Decision:
  If p ≤ α, reject Ho

A significance test involves a specific procedure that depends on the design of the study. Some of the frequently used parameters and the corresponding parametric significance tests are shown in Table 3. The anatomy of a statistical significance test is consistent among the many techniques (Outline 3). The basic question is whether an observed association in a sample could be the result of random error. Null and alternate hypotheses are stated and a single number called the test statistic is computed based on the sample information. If the magnitude of the test statistic is large enough, it is considered inconsistent with the truth of the null hypothesis and the null hypothesis is rejected. The p-value (p), also called the observed significance or associated probability, is the probability that the


test statistic could be at least this extreme assuming the null hypothesis is true. The p-value is compared against the alpha level set during the design of the study. If the p-value is less than or equal to the alpha level, then the null hypothesis is rejected.

To illustrate the use of a significance test, consider the comparison of 2 means, which uses the test statistic called Student’s t:

Ho: (µ1 – µ2) = Do
Ha: (µ1 – µ2) ≠ Do

Test statistic: $t = \frac{(\bar{y}_1 - \bar{y}_2) - D_o}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

Degrees of freedom: ν = n1 + n2 – 2

Further details of the t test are given in Outline 4 and in the selected bibliography, but the basic idea behind a significance test can be understood by considering the test statistic. Note that the hypotheses are stated for the population parameters but that the test statistic is calculated from the sample data. The value of t will be large if the difference between the mean for sample 1 and the mean for sample 2 is large relative to the assumed difference, Do. In most studies, the assumed difference is zero (Do = 0). The value of t will also be large if the pooled standard deviation (sp) is small. The statistic, therefore, captures important information about the comparison of 2 means. If the magnitude of t is very large, then it is plausible that the value is not the result of random error under the given condition (Ho) that there is no difference.

A case-control study of bone mineral density in hip fracture sufferers versus controls can be used as an example. The research question is whether bone mineral density is different in postmenopausal women with hip fracture than in controls without fracture. This question is translated into a 2-sided hypothesis test. Two samples are drawn in a consecutive fashion from a hospital orthopaedic floor: one group has hip fracture and the other has fallen without hip fracture. Bone mineral density is assessed in the proximal femur (Table 4), and the corresponding t value is:

$t = \frac{0.64\ \mathrm{g/cm^2} - 0.56\ \mathrm{g/cm^2}}{(0.12\ \mathrm{g/cm^2})\sqrt{\frac{1}{10} + \frac{1}{12}}} = 1.56$

For degrees of freedom equal to 20, the p-value (area under the t distribution to the right of t = 1.56 and to the left of t = -1.56) is p = 0.13. Thus, there is insufficient evidence to reject the null hypothesis or to support a conclusion of any difference between the means of the 2 populations.

Table 4
Example of Data for Comparison of Two Means: Femoral Bone Mineral Density (BMD) for Control and Hip Fracture Groups

Group          Mean BMD (g/cm2)   s (g/cm2)   n    95% confidence interval (g/cm2)
Control        0.64               0.13        10   0.55 - 0.73
Hip Fracture   0.56               0.11        12   0.49 - 0.63
Pooled                            0.12

A second analytic strategy is estimation of a population value based on data from the research study, including both point and interval estimates. A point estimate is a single number that estimates the parameter of interest. For instance, the difference between 2 sample means can serve as a point estimator of the difference between 2 population means. An interval estimate gives a plausible range for a parameter and, as such, contains very important information. To illustrate, the confidence interval of the difference between 2 means provides an assessment of the plausible range of the difference between 2 population means rather than just a point estimate. If this confidence interval overlaps zero, then it is plausible that the true values for the 2 means are not different. But if the range is large, then the plausible values for the difference cover broad ground.

The width of a confidence interval depends on the variability in the data, the number of subjects or specimens, and on a value called the confidence coefficient (1-α). The level of confidence is often expressed as a percentage: 100(1-α). It is arbitrary but is often set at 90% or 95%. For a confidence level of 95%, the estimated interval would enclose the population parameter 95% of the time if repeated studies were performed.

The upper and lower bounds of a confidence interval are


calculated from formulas specific to the parameter of interest. For example, the confidence interval for the population mean is:

$\bar{y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$

where (1-α) is the confidence coefficient. The 95% confidence intervals for the previous example of bone mineral density are given in Table 4. The plausible values for population mean bone density for hip fracture patients are between 0.49 and 0.63 g/cm2.

Table 5
Nonparametric Counterparts for Some Common Parametric Tests

Parametric                     Nonparametric
Student’s t test               Mann-Whitney or Wilcoxon rank-sum
Paired difference t test       Wilcoxon signed rank
One-way analysis of variance   Kruskal-Wallis
Two-way analysis of variance   Friedman
Linear correlation             Kendall or Spearman rank correlation

Outline 4
Inference About Mean: One Sample t test

Null hypothesis (Ho): µ = µo
Alternate hypothesis (Ha): µ ≠ µo OR µ < µo OR µ > µo (specified by investigator)
Test statistic: $t = \frac{\bar{y} - \mu_o}{s/\sqrt{n}}$
Degrees of freedom: ν = n – 1
p-value: One-tailed: area under the t distribution with n – 1 degrees of freedom to the right of t if Ha: µ > µo or to the left of t if Ha: µ < µo. Two-tailed: sum of areas under the same distribution to the right of |t| and to the left of -|t|
Decision: If p ≤ α, reject Ho
Assumptions: Random sample; sampled population has normal probability distribution with unknown mean µ and unknown variance
Confidence interval, 100 × (1 – α): $\bar{y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
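The worked example of Table 4 can be reproduced from the summary statistics alone. The sketch below is illustrative rather than a definitive implementation: the pooled standard deviation uses the usual weighted formula (not spelled out in this section), the p-value is obtained by numerically integrating the Student's t density instead of reading a table, and the critical values 2.262 (9 degrees of freedom) and 2.201 (11 degrees of freedom) are standard t-table entries supplied here as assumptions. Because the pooled standard deviation is not rounded to 0.12, t comes out marginally above the hand-calculated 1.56.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of 2 groups (assumes equal population variances)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def two_sample_t(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """Student's t statistic and degrees of freedom for a comparison of 2 means."""
    sp = pooled_sd(s1, n1, s2, n2)
    t = ((mean1 - mean2) - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: area in both tails beyond |t| (Simpson's rule)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps
    total = density(0.0) + density(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    central_half = total * h / 3        # area between 0 and |t|
    return 1 - 2 * central_half


def mean_ci(mean, s, n, t_crit):
    """Confidence interval for a mean: mean +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Summary statistics from Table 4 (g/cm2).
t, df = two_sample_t(0.64, 0.13, 10, 0.56, 0.11, 12)
p = two_sided_p(t, df)
control_ci = mean_ci(0.64, 0.13, 10, t_crit=2.262)    # t table value, df = 9
fracture_ci = mean_ci(0.56, 0.11, 12, t_crit=2.201)   # t table value, df = 11

print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.2f}")
print("control 95% CI: ", [round(x, 2) for x in control_ci])
print("fracture 95% CI:", [round(x, 2) for x in fracture_ci])
```

Reporting the intervals alongside the p-value gives the reader both the decision and a sense of how precisely each group mean has been estimated.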
Equations for the test statistics of a few of the most common parametric statistical tests are given in Outlines 4 through 6. In addition, the assumptions required for each test and the equations for computing the confidence interval of the parameter are also given. There are many other test statistics used to examine other null hypotheses, but the computations of these extensive tests are beyond the scope of this chapter. Several of the general statistics texts listed in the bibliography provide additional information.

When reporting results from significance tests, sometimes only the p value is given without also presenting additional information such as parameter estimates or confidence intervals. This omission is a common error in biomedical literature and should be avoided. An example is the following report: the drug raised the hip bone mineral density in postmenopausal women compared with placebo treatment (p = 0.04). Note that although the p value is given, indicating that there is an effect, there is no sense of the magnitude of the effect. An improved report is: the drug raised the hip bone mineral density in postmenopausal women by a mean of 0.06 g/cm² or 10% compared with placebo treatment (p = 0.04). The parameter in this case is the difference in bone mineral density between the two treatment groups. An even better account is to give the confidence interval for the difference.

Note that many parametric tests assume that the underlying population has a normal distribution (see Outlines 4 through 6). When sample size is large, many parametric tests are “robust” to deviations from the normal distribution. Robust means that the validity of the test is not seriously affected. When the assumption of normality is severely violated, however, nonparametric tests, which do not rely on an assumption of underlying normality, can be used. Many of the nonparametric tests use ranks rather than means and consequently do not rely on the shape of the distribution of the property being tested. Nonparametric counterparts to some common parametric tests are listed in Table 5. These tests are recommended for situations in which the investigator wishes to examine ranks or when the check of assumptions reveals severe violations.

Summary
There are many problems in research that can be improved by a sound understanding of study design and statistical analysis. Research is undoubtedly a creative process, but some practical skills will enhance the creativity. The approach and practical steps outlined in this chapter are idealized, but it is hoped that they will provide a rough framework for initiating new research studies or understanding current ones.


Outline 5
Inference About Difference Between 2 Means: Student's t Test

Null Hypothesis (Ho): µ1 – µ2 = Do
Alternate Hypothesis (Ha): (µ1 – µ2) ≠ Do or (µ1 – µ2) > Do or (µ1 – µ2) < Do
Test Statistic: $t = \dfrac{(\bar{y}_1 - \bar{y}_2) - D_o}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance
Degrees of Freedom: ν = n1 + n2 – 2
p-value: One-tailed: area under tν distribution to the right of t if Ha: (µ1 – µ2) > Do or to the left of t if Ha: (µ1 – µ2) < Do. Two-tailed: sum of areas under tν distribution to the right of | t | and to the left of -| t | (or 2 × area to the right of | t |)
Decision: If p ≤ α, reject Ho
Assumptions: Random samples. Sampled populations have normal probability distributions, population variances are equal, and samples are independent
Confidence Interval 100 × (1 – α): $(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

Outline 6
Inference About Difference Between Paired Means: Paired Difference Test

Null Hypothesis (Ho): δ = δo, where δ = mean of the differences
Alternate Hypothesis (Ha): δ ≠ δo, or δ < δo or δ > δo
Test Statistic: $t = \dfrac{\bar{d} - \delta_o}{s_d/\sqrt{n}}$, where $\bar{d}$ is the mean of the differences in the sample and $s_d$ is the standard deviation of the differences
Degrees of Freedom: ν = n – 1, where n = number of pairs
Associated probability (p): One-tailed: area under tn-1 distribution to the right of t if Ha: δ > δo or to the left of t if Ha: δ < δo. Two-tailed: sum of areas under tn-1 distribution to the right of | t | and to the left of -| t | (or 2 × area to the right of | t |)
Decision: If p ≤ α, reject Ho
Assumptions: Random sample. Sampled population has normal probability distribution
Confidence Interval 100 × (1 – α): $\bar{d} \pm t_{\alpha/2}\,\dfrac{s_d}{\sqrt{n}}$
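As an illustration of Outline 5, the sketch below applies Student's t test to the rounded summary statistics in Table 4. It assumes a two-tailed alternative with Do = 0 and α = 0.05, choices made here for illustration only. The function names are hypothetical, and the t distribution is computed by numerical integration with the standard library rather than with a statistics package.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, by Simpson's rule on [0, |t|] plus symmetry."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(alpha, df):
    """Critical value whose upper-tail area is alpha/2, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pooled_t_test(m1, s1, n1, m2, s2, n2, d0=0.0, alpha=0.05):
    """Two-sample Student's t test (equal variances) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    pooled_s = math.sqrt(pooled_var)
    se = pooled_s * math.sqrt(1 / n1 + 1 / n2)      # standard error of the difference
    t = ((m1 - m2) - d0) / se
    p_two_tailed = 2 * (1 - t_cdf(abs(t), df))      # 2 x area to the right of |t|
    half_width = t_critical(alpha, df) * se
    ci = ((m1 - m2) - half_width, (m1 - m2) + half_width)
    return {"df": df, "pooled_s": pooled_s, "t": t,
            "p_two_tailed": p_two_tailed, "ci": ci}

# Control vs hip fracture groups, values from Table 4 (g/cm^2)
result = pooled_t_test(0.64, 0.13, 10, 0.56, 0.11, 12)
print("pooled s = %.3f, t = %.3f, df = %d" % (result["pooled_s"], result["t"], result["df"]))
print("two-tailed p = %.3f" % result["p_two_tailed"])
print("95%% CI for difference: %.3f to %.3f" % result["ci"])
```

With these rounded inputs the pooled standard deviation comes out near the 0.12 g/cm² shown in Table 4, and t is about 1.56 on 20 degrees of freedom. Reporting the confidence interval for the difference alongside the p value follows the reporting advice given earlier in the chapter.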

Selected Bibliography

Study Design

Cohen J (ed): Statistical Power Analysis for the Behavioral Sciences, ed 2. Hillsdale, NJ, L Erlbaum Associates, 1988.

Hulley SB, Cummings SR, Browner WS (eds): Designing Clinical Research: An Epidemiologic Approach. Baltimore, MD, Williams & Wilkins, 1988.

Janssen HF: Experimental design and data evaluation in orthopaedic research. J Orthop Res 1986;4:504–509.

Lieber RL: Statistical significance and statistical power in hypothesis testing. J Orthop Res 1990;8:304–309.

Rothman KJ (ed): Modern Epidemiology. Boston, MA, Little, Brown & Co, 1986.

Winer BJ (ed): Statistical Principles in Experimental Design, ed 2. New York, NY, McGraw-Hill, 1971.

Analysis and Statistics

Dawson-Saunders B, Trapp RG (ed): Basic and Clinical Biostatistics. Norwalk, CT, Appleton & Lange, 1990.

Glantz SA (ed): Primer of Biostatistics, ed 3. New York, NY, McGraw-Hill, 1992.

Kleinbaum DG, Kupper LL, Muller KE (eds): Applied Regression Analysis and Other Multivariable Methods, ed 2. Boston, MA, PWS-Kent Publishing, 1988.

Lieber RL: Experimental design and statistical analysis, in Simon SR (ed): Orthopaedic Basic Science. Rosemont, IL, American Academy of Orthopaedic Surgeons, 1994, pp 623–665.

Mendenhall W (ed): Introduction to Probability and Statistics, ed 4. North Scituate, MA, Duxbury Press, 1975.

Munro BH, Page EB (eds): Statistical Methods for Health Care Research, ed 2. Philadelphia, PA, JB Lippincott, 1993.

Oakes MW (ed): Statistical Inference. Chestnut Hill, MA, Epidemiology Resources Inc, 1986.

Santner TJ: Fundamentals of statistics for orthopaedists: Part I. J Bone Joint Surg 1984;66A:468–471.

Santner TJ, Burstein AH: Fundamentals of statistics for orthopaedists: Part II. J Bone Joint Surg 1984;66A:794–799.

Santner TJ, Wypij D: Fundamentals of statistics for orthopaedists: Part III. J Bone Joint Surg 1984;66A:1309–1318.

Tabachnick BG, Fidell LS (eds): Using Multivariate Statistics, ed 2. New York, NY, Harper & Row, 1989.

Zar JH (ed): Biostatistical Analysis, ed 2. Englewood Cliffs, NJ, Prentice-Hall, 1984.

Special Topics

Browner WS, Newman TB: Are all significant P values created equal? The analogy between diagnostic tests and clinical research. JAMA 1987;257:2459–2463.

Cleveland WS (ed): The Elements of Graphing Data, rev ed. Murray Hill, NJ, AT&T Bell Laboratories, 1994.

DeMets DL: Statistics and ethics in medical research. Science Eng Ethics 1999;5:97–117.

Dorey FS, Nasser S, Amstutz H: The need for confidence intervals in the presentation of orthopaedic data. J Bone Joint Surg 1993;75A:1844–1852.

Friedman LM, Furberg C, DeMets DL (eds): Fundamentals of Clinical Trials, ed 2. Littleton, MA, PSG Publishing Company, 1985.

Glantz SA: Biostatistics: How to detect, correct, and prevent errors in the medical literature. Circulation 1980;61:1–7.

Lang TA, Secic M (eds): How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers. Philadelphia, PA, American College of Physicians, 1997.

Mills JL: Data torturing. N Engl J Med 1993;329:1196–1199.

Vrbos LA, Lorenz MA, Peabody EH, McGregor M: Clinical methodologies and incidence of appropriate statistical testing in orthopaedic spine literature: Are statistics misleading? Spine 1993;18:1021–1029.
