
Psychological Assessment © 2000 by the American Psychological Association

September 2000 Vol. 12, No. 3, 263-271


For personal use only--not for distribution.

Revising Psychological Tests


Lessons Learned From the Revision of the MMPI

James N. Butcher
Department of Psychology, University of Minnesota
ABSTRACT

Some types of psychological tests become dated and require more frequent and more extensive
revision than others. Because of the formidable effort that is required in a test revision, the goals
and scope of the revision need to be carefully staked out before a revision is undertaken. The
revision team needs to develop a generally agreed-upon guiding philosophy for the test revision in
the beginning of the project and incorporate broad input into the changes that are likely to be
required. Factors important to consider in a test revision are discussed, and the parameters of
personality test revision illustrated from the extensive program to revise the Minnesota Multiphasic
Personality Inventory (MMPI) are included. Recommendations for gauging acceptance of the
revision are suggested along with steps that revisers and publishers might take to make a test
revision both more research based and more acceptable to test users.

When is a psychological test in need of revision? Not everything in life becomes functionally
ineffective at the same rate. Some types of psychological tests become dated and require more
frequent or more extensive revision over time than others. Achievement tests, cognitive measures,
or interest scales that rely upon current information or recent events need revision more frequently
than measures such as personality tests that rely upon stimulus material that remains more constant
over time. Some measures, such as the Rorschach Inkblot Technique, which is made up of a series
of ambiguous inkblots, are not greatly influenced by changing meaning and may never need
revision at the stimulus level. Revisions at the level of interpretation, however, may be required as
information on the interpretation of the technique increases. Other projective techniques, such as the
Thematic Apperception Test (TAT), which includes a number of pictures involving people engaging
in various activities, however, are influenced by changing times and styles, and the stimulus cards
(still in use in their original form) may appear to people as dated.

Test revisions can come in all sizes and shapes. One might simply make some minor changes in the
booklet and test manual, leave all else the same, and call it a revision or go a bit further and drop
some nonworking items, add new ones, or even develop new norms. However, most widely used
measures require more extensive alterations to keep the instrument viable. Making even small item
or norm changes requires that many, more substantial tasks be undertaken.

The more successful a test, the more likely it becomes that inertia and resistance to change may
develop over time. Thus, broad usage of a test may extend its use beyond the point at which problems of
obsolescence begin to emerge. If a test is in wide use, there are likely to be more arguments against
its revision and more resistance to change even though everything else around it has changed,
resulting in an instrument that becomes even more out of date. Such was the case with the
Minnesota Multiphasic Personality Inventory (MMPI)–because it was so widely used and seemed to
work well enough (or else people tended to overlook its problems), a revision was long overdue by
the time the moon and stars and the test publisher were in the proper phase that a revision could be
mounted.

The goals and scope of possible modifications need to be carefully evaluated before a revision of a
widely used test is initiated. Revisions of major psychological instruments are typically more
substantial undertakings than revisers initially conceive them to be. In an insightful discussion of
his revision of the Strong Vocational Interest Blank (SVIB), Campbell (1969, 1972), more than a
decade before the MMPI revision began, warned potential revisers of the MMPI at a national
symposium devoted to the topic of an MMPI revision:

I emphasize that it was about twenty years ago that Strong first started thinking about these
revisions and ten years ago that the work actually started. Those have to be sobering figures to
anyone thinking about beginning to revise the MMPI. (1972, p. 118)

Campbell's sage advice to potential revisers of the MMPI back in the 1970s is clearly appropriate
today for anyone revising a major psychological test. Many of his experiences in revising the SVIB
occurred in the MMPI revision project as well. The issues surrounding the MMPI underwent
considerable scrutiny (Dahlstrom, 1972; Hathaway, 1972; Meehl, 1972) and debate (Loevinger,
1972; Norman, 1972) before a revision was finally undertaken. Debate continued to take place a
decade after the opening bell calling for a revision was sounded, and another decade passed before
work began. The guiding philosophy for the MMPI revision was widely discussed, and several
publications heralded the project startup in the 1980s (Butcher, 1972; Butcher & Owen, 1978),
with self-study continuing over the ten years that the MMPI project took to complete, culminating
in the publication of the MMPI-2 for adults (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989)
and MMPI-A for adolescents (Butcher et al., 1992). Thus, as Campbell had envisioned, the MMPI-2
was published a full 20 years after the initial discussions aimed at developing a revision program
for the test had begun.

Experiences from the revisions of the SVIB and MMPI suggest that it takes a great deal of time and
research effort to effect a successful revision and gain broad acceptance by the professional
community. The present article was written to provide a practical framework for conducting a test
revision of a major personality test and to examine ramifications that alterations might have on test
usage. Examples are drawn largely from the extensive MMPI revision program. Anyone seriously
contemplating a revision of a major instrument should also read David Campbell's (1972)
discussion of the SVIB revision.

Principles of Test Revision

A generally agreed-upon guiding philosophy for the test revision should be developed at the
beginning of the project. The best tack to take at startup is to develop a plan that stakes out the
major outlines of the alterations to be made and minimizes any adverse impact on professional use.
The major issues a test reviser will face are summarized as Principles of Test Revision and are
shown in Table 1.

The revision planners did not view the MMPI revision as being designed to accomplish a magical
metamorphosis into an entirely new test but rather to make an on-course correction along a clearly
predetermined path. One element of the guiding philosophy for the MMPI revision that served as a
framework was that the original instrument was considered to have many elements that should be
maintained in the revised form in order for the test to be considered a revision and not an altogether
new test:

• The traditional clinical scales needed to remain nearly identical to those of the original
instrument in terms of item structure and general configuration.

• Although many elements in the test needed to be identical in the revision, some changes to
the booklet were necessary: Unused and dated items were deleted, some of the items were
rewritten, some objectionable items were dropped, and new content was added in order to
measure contemporary clinical problems and make the MMPI a more effective instrument in
the future.

• A more relevant, contemporary normative population study needed to be conducted to
provide new norms for the traditional validity and clinical scales, even though the items
were continuous with the original version.

• The inclusion of new item content to replace out-of-date items would allow for the
development of new scales to address contemporary problems.

• The committee agreed that a substantial number of clinical studies needed to be developed
in order to provide for well-defined samples to "test out" new scales that were to be
developed or to provide a new validation of the traditional scales.

Early in the MMPI revision it was concluded that the use of the MMPI with adolescents required a
separate form (with expansion of the item pool to include more adolescent problems) and the
development of separate adolescent norms. The adolescent MMPI development program became a
separate program of research, with a shorter but more relevant content domain than the adult
version. This plan required the development of a separate set of norms and extensive clinical data
collection before the test could be published.

In sum, the research philosophy guiding the MMPI Revision Committee was to make the necessary
changes to improve the item content coverage, to change the traditional clinical scales as little as
possible, to develop new contemporary measures to broaden the test's assessment base, and to
employ new normative and new clinical data to test empirically any changes that were eventually
undertaken. The guiding philosophy for the MMPI revision included the collection of a substantial
database to allow for validation of the revised instrument.

The Scope of the Test Revision

How much change is required in a test revision? This is a complicated question and would, of
course, differ depending upon the type of test, the societal changes that have occurred, and new
improvements in the science and technology of test construction that are available–for example,
new psychometric procedures or strategies that have evolved since the test was developed or last
revised. The amount of revision needs to be determined in terms of the following possible
parameters to change.

In test stimuli.

The revision might call for the incorporation of new stimulus material or new tasks into the
instrument. Passing years tend to make some test stimuli or interpretive procedures appear quaint,
antiquated, or completely irrelevant. In revising the SVIB, Campbell (1972) found, for example,
that the instrument was so dated that it contained items that referred to magazines that were no
longer published.

In administrative procedures.

New or alternative methods of administering the test might be desirable to consider (e.g.,
computer-administered item presentation using adaptive testing technology or computer-based
voice-activated response recording technology as it becomes available).

In scales or units of measure.

New personality constructs or new ways of assessing important dimensions may have become
available since the earlier publication.

In norms or psychometric features.

Newer psychometric approaches, for example Item Response Theory (Embretson, 1996), might be
employed to enhance or replace traditional psychometric procedures.

In applications.

For example, instruments that were developed for one type of application might require substantial
revision and altered norming if the test is to be expanded into other applications.

Each of the elements, as noted above, would vary depending upon the type of assessment measure
involved and the length of time since the test was revised or was originally published.

In formulating the plan to revise the MMPI, the Revision Committee set out to maintain
continuity with the original MMPI while at the same time implementing important changes and
expanding the items and scales in the test. The major reason for maintaining the continuity between
MMPI-2 and the original MMPI was to allow clinicians and researchers alike to be able to use the
revised MMPI as the original was used, in an uninterrupted manner. Those with longitudinal
research, for example, could still understand the meaning of scale scores for persons who were
followed up over time with a different version of the test. Yet, it was also considered important to
incorporate new item content to address contemporary problems.

Resolve Key Prerevision Issues

It is critical to settle issues such as arrangement of credit, work responsibility, and royalties before
the project gets underway. Issues of authorship and financial arrangements, if not mutually agreed
upon prior to the revision (between the test publisher and revision team), can create ill feelings.
Because most psychological tests are developed and marketed by commercial or academic
publishers, revision arrangements should be spelled out in a contract, particularly if the revisers of
the test are different from the initial developers of the instrument. David Campbell (1972) pointed
out clear obstacles to a test revision:

The third major obstacle for the MMPI reviser concerns the necessary practical arrangements.
These can be subdivided into three main areas: first, the establishment of some administrative
structure so that the work will get done. Essentially this means deciding who is going to be
responsible and then giving him enough authority to carry out that responsibility; second, the
provision of the necessary funds to support the activity; and third, the assignment of credit for doing
the work to include both authorship listing and royalties. (p. 124)

Some aspects of the MMPI revision may not provide much useful guidance for potential test
revisers in that the MMPI-2 committee essentially revised the MMPI with the primary goal of
developing a stronger instrument for their ongoing research, without participating in possible
financial rewards for conducting the revision. That is, the Revision Committee chose to forgo
royalties from the revised instrument.

Each potential reviser and each test publisher will have different motives and expectations from a
test revision. It behooves the participants to air those expectations and responsibilities lest
difficulties of a pecuniary nature entangle and disrupt the endeavor. One issue that Campbell
addressed, and would be underscored with the MMPI revision, is that it is important that one person
be given the responsibility of making decisions when there are disagreements about the way to
proceed. Although theoretical and practical disagreements did occur on the MMPI project, no
disagreements occurred about financial arrangements. Two requirements were stipulated by the
MMPI Revision Committee in the beginning of the revision: (a) It was important that the MMPI-2
scoring keys be open and public. (There had been some precedent in the testing industry for test
publishers to keep the scoring keys for instruments secret in order to protect them from being
copied.) The MMPI committee believed that the instrument was a research tool "for the field" and
as such it was considered important that open and available access to the scoring keys be
maintained. (b) The MMPI committee strongly recommended (and the test publisher, the University
of Minnesota Press, has since followed) that a portion of the test revenues be allocated for future
research on the instrument.

Gauge Potential Reaction to Alterations

It is important that test revisers try to foresee problems that might result from any proposed
alterations so that ultimate professional acceptance of the revision can be better assured. Alterations,
even minor ones, can have an impact that was originally unintended. Campbell (1972) related the
following problem that occurred with changes made to the number of items on the SVIB:

The first example is the number of items on the test. When a test is changed, the new form should
have a different number of items so that it can quickly be distinguished from the older form,
especially when answer sheets are involved. In the initial planning of the men's revision of the
SVIB, we added 5 items (for a total of 405) to make the new form unique. After those plans became
public, I received an anguished letter from a user stating that if the new test had 405 items, it could
no longer be scored on the IBM-805 scoring machine because that answer sheet takes a maximum
of 400 items. That did not seem to be a major problem since most people do not use the IBM 805
anymore and, anyway, answer sheets are flexible so that a few more items can always be added. The
problem loomed larger when I found that 80,000 IBM answer sheets had been sold the preceding
year, so clearly someone was using them, and the problem became hopelessly complicated after I
spent an afternoon trying to cram five more items onto the answer sheet and finally concluded that
it was impossible. (I found out much later that the reason that the original SVIB had 400 items was
because that was the maximum number that could be fitted onto that answer sheet.) Since we were
at a point where we could still make changes, we dropped one item, rather than adding five. (pp.
121–122)

A similar problem with an unexpected result from one small change in the MMPI booklet was
unearthed early in the test revision. One of the changes that the MMPI committee considered a "no-brainer"
that would immediately improve the instrument was to shorten it by dropping the 16
repeated items from the original booklet. These items had been included in the original test booklet
in order to facilitate scoring on an earlier version of the instrument and are not used in clinical
interpretation. The intention to drop these items as a space-saving move was announced early in the
revision at a national meeting of the Symposium on Recent Developments in the Use of the MMPI.
We immediately received two protests against dropping the repeated items because they were used in the
test–retest (TR) index as a measure of response consistency. However, the fact that the 16 repeated
items were used by a small number of researchers was not very persuasive, particularly since plans
for the revision included developing stronger consistency measures. Eliminating the repeated items
improved the instrument and did not adversely affect the development of response consistency
measures because there still remained in the item pool a sufficient number of items with similar or
opposite meaning. This change, in the end, did not result in a loss of a significant measure because
two, more useful, consistency scales (VRIN and TRIN) developed by Auke Tellegen (1988) were
published in the revision. Moreover, dropping the 16 repeated items eliminated a frequent complaint
from test takers that we were trying to "trip them up" by repeating some items.
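
To make the idea of a repeated-item consistency check concrete, the following Python sketch counts how many repeated item pairs a test taker answered inconsistently. It is an illustration only: the function name, the item numbers, and the responses are hypothetical and are not taken from the MMPI booklet or its scoring keys.

```python
from typing import Dict, List, Tuple


def tr_index(responses: Dict[int, bool],
             repeated_pairs: List[Tuple[int, int]]) -> int:
    """Count repeated-item pairs that received different answers."""
    inconsistent = 0
    for first, second in repeated_pairs:
        # Pairs with an omitted answer cannot be compared, so skip them.
        if first not in responses or second not in responses:
            continue
        if responses[first] != responses[second]:
            inconsistent += 1
    return inconsistent


# Hypothetical item numbers: each pair is the same statement printed twice.
pairs = [(1, 101), (2, 102), (3, 103)]

# Hypothetical True/False answers from one test taker.
answers = {1: True, 101: True, 2: False, 102: True, 3: True, 103: False}

score = tr_index(answers, pairs)
```

A higher count suggests less consistent responding. Any cutoff for interpreting the count would have to come from empirical research, which this sketch does not supply.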

This situation did serve to alert the MMPI Revision Committee to the need to gauge the potential
impact of changes on research or clinical practice, where possible, and to secure clear empirical
support to justify any changes that were to be implemented. Assuring research justification is
crucial if major shifts or broad changes in interpretation are likely to come from the alterations that
are proposed. Before the MMPI revision was undertaken, a number of "focus group discussions"
were held with many researchers to determine the impact of possible changes. In addition, the test
publisher (University of Minnesota Press) conducted several surveys of test users and researchers,
before and after the revised version was published, in order to gauge the impact of some specific
changes that were undertaken, such as the elimination of the so-called subtle–obvious scales and
the timing of the phase-out of the original version of the test.

Commit to Necessary Changes and Modernize the Instrument

Fix all the problems that are fixable even though they may seem unimportant at first glance. If
major problems persist after the revision is published, there will surely be an adverse reaction and
potential stumbling blocks to the broader acceptance of the updated form. An instrument like the
MMPI that had been around so long and had so many contributors to its research base had a number
of unanticipated nuisance problems that came to light during the revision. Problems surfaced with
the instrument that, while not fatal, did result in a less effective instrument. For example, toward the
end of the MMPI revision, after the new norms had been generated and during the development of a
set of subscales for the Si scale (Ben-Porath, Hostetler, Butcher, & Graham, 1989), it was
discovered that the original Si scale contained two items that were traditionally misscored, that is,
they were being scored all along in the wrong direction! It was considered necessary to fix this
problem even though it meant disturbing some original scoring keys and, of course, recomputing
the T scores for the new norms.
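
The consequence of correcting a misscored item can be sketched in a few lines of Python. The example below assumes a simple linear T-score transformation (mean 50, standard deviation 10 in the normative sample) and uses a hypothetical three-item scale with invented responses; it is not the Si scoring key or the MMPI-2 norming procedure.

```python
from statistics import mean, stdev


def raw_score(responses, key):
    """Count the items answered in the keyed direction."""
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


def linear_t(raw, norm_raws):
    """Linear T score relative to a normative sample of raw scores."""
    return 50 + 10 * (raw - mean(norm_raws)) / stdev(norm_raws)


# Hypothetical keys: item 3 had been keyed in the wrong direction.
old_key = {1: True, 2: True, 3: False}
fixed_key = {1: True, 2: True, 3: True}

# Hypothetical normative responses (True/False answer per item).
norm_sample = [
    {1: True, 2: True, 3: True},
    {1: True, 2: False, 3: False},
    {1: False, 2: False, 3: False},
    {1: False, 2: True, 3: True},
]

# Changing the key changes every normative raw score that involves item 3,
# so the normative mean and standard deviation must be recomputed as well.
old_norm_raws = [raw_score(person, old_key) for person in norm_sample]
new_norm_raws = [raw_score(person, fixed_key) for person in norm_sample]

client = {1: True, 2: True, 3: True}
old_t = linear_t(raw_score(client, old_key), old_norm_raws)
new_t = linear_t(raw_score(client, fixed_key), new_norm_raws)
```

The point of the sketch is that a keying fix touches both sides of the comparison: the client's raw score and the normative distribution against which it is expressed.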

Not all problems in a complex instrument are, however, fixable. It is important to realize that,
because of the nature of the test, there may be some lingering troublesome elements in the
instrument that cannot be addressed in a revision because they would require a "new test" rather
than a revised version. For example, in the MMPI revision research some questions were raised
about the value of the K correction that had been added to "improve" the assessment of five of the
MMPI clinical scales since the mid-1940s (Weed, 1992). Empirical studies showed that the K
correction developed by Meehl and Hathaway (1946) did not actually improve the validity of the
predictions from the scales that were routinely K-adjusted. Clearly, this finding was troublesome for
the MMPI committee because simply doing away with this traditional
scoring correction in the MMPI-2 could have far-reaching consequences. It was a sobering thought
that the K correction had been applied in the vast majority of the traditional validity studies and that
deleting it might negate the applicability of those results to the revised scales. However, the use of K did not appear to
make the predictions substantially less valid. It was therefore concluded that since one goal of the
revision was an on-course correction, rather than a major revamping of the clinical scales, the
traditional clinical scales needed to remain as close as possible to the original versions. However, to
promote future research on the problem, the committee decided to encourage the development of
non-K corrected validity studies that could be applied in future research by making available in the
test manual non-K corrected T scores. In addition, a non-K corrected profile form (based on norms
developed without K corrections) was provided for psychologists to examine profiles without the K
correction being added.
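
For readers unfamiliar with how a K correction operates arithmetically, the sketch below adds a fraction of the K raw score to selected clinical scale raw scores. The weights shown (0.5 for Hs, 0.4 for Pd, 1.0 for Pt and Sc, 0.2 for Ma) are the conventionally cited K fractions and are not stated in this article, so they should be treated as assumptions and checked against the test manual. The raw scores are invented, and rounding half up is likewise an assumption.

```python
# Assumed K weights in tenths (0.5, 0.4, 1.0, 1.0, 0.2); verify against the manual.
K_WEIGHT_TENTHS = {"Hs": 5, "Pd": 4, "Pt": 10, "Sc": 10, "Ma": 2}


def k_corrected(raw_scores, k_raw):
    """Return raw scores with the weighted K raw score added where applicable."""
    corrected = {}
    for scale, raw in raw_scores.items():
        tenths = K_WEIGHT_TENTHS.get(scale, 0)
        # Integer arithmetic rounds the fraction of K half up (an assumption).
        corrected[scale] = raw + (tenths * k_raw + 5) // 10
    return corrected


# Invented raw scores for one profile; D is not a K-corrected scale.
raw = {"Hs": 10, "D": 22, "Pd": 20, "Ma": 18}

with_k = k_corrected(raw, k_raw=15)
without_k = dict(raw)  # the non-K-corrected profile simply keeps the raw scores
```

Keeping both versions side by side mirrors the committee's decision to report K-corrected and non-K-corrected scores so that future research can compare them.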

Changes in test stimuli are often required in any revision. All one has to do is to look at the TAT
cards or read through some of the items on the original MMPI to get a feel for the importance of
updating psychological test stimuli periodically. Many assessment stimuli are time bound and
experience a deterioration of meaning or relevance over time because changes in language, living styles, or
social practice make the original stimuli appear quaint. People being asked to respond to
a lot of questions on a psychological test that are out of date may not take the task very seriously.

The item-level improvements that were implemented with the MMPI had several important effects.
The items in the revised form became more readable and less objectionable, and the content became
more appropriate to a broader population. For example, dropping items like "I believe in the second
coming of Christ" or "I enjoyed reading Alice in Wonderland" made the instrument less
idiosyncratic. Another item on the original form that was dropped in MMPI-2 was
"I like Lincoln better than Washington." This item was problematic in that it was very difficult to
translate into other languages and showed biased results in various samples in the United States.
Erdberg (1970), for instance, found that this item completely separated African Americans from Whites
in a study of Black–White differences on the test in Alabama. Moreover, the item was not very
effective at detecting personality differences, and it was not scored on any major MMPI scale,
so dropping it from the inventory did not affect any major scale.

Many of the item-level improvements implemented in the revised version produced a more
appropriate instrument for diverse populations. Another interesting consequence of improving the
wording and reducing the culture-bound elements in the test was that the MMPI-2 item pool
became much easier to translate into other languages (Butcher, 1996).

Test New or Altered Stimuli in Pretests

If items are changed or substituted on the revised version of an instrument, then it is crucial to
empirically explore the impact of these changes before the revised version is released. For example,
the MMPI Revision Committee conducted an evaluation of wording changes on the revised MMPI
by administering the instrument to samples of known clients (airline pilot applicants) to determine if
the revised booklet produced different results than the original item wordings. Results indicated that
wording changes did not alter meaning–only the ease of reading the items. In the case of the MMPI-
A, item changes were evaluated in a field study. Adolescents from a private school served as
"project consultants"–they were given alternative wording of cumbersome items and asked to
determine the readability and acceptability of the item changes (Williams, Ben-Porath, & Hevern,
1991).

Choose the Most Generalizable Normative Approach

Applications of a psychological test are limited by the normative basis of the instrument.
Basing the instrument on the broadest normative populations is important for user
acceptance and to assure utility across a range of applications. Some instruments have been
developed for specialized purposes (e.g., the Millon Clinical Multiaxial Inventory [MCMI]; Millon,
1994) and contain "norms" that are based on responses from a narrow normative sample (i.e., from
a psychiatric sample rather than on responses from the general population). If the instrument's use is
expanded for broader assessment into wider applications (such as with personnel selection or
forensic assessment), then the norms for the instrument would not apply. The MCMI has limited
utility when used with nonpsychiatric samples or for broad assessment questions. For example,
normal persons are found to be "pathological" on the MCMI norms because these norms do not
distinguish normals from nonnormals–they assume that everyone taking the test is a psychiatric
patient. If one is developing an instrument that will be used in a broad range of applications, it is
advisable to employ norming procedures that do not unduly limit the generality of the instrument.

In developing a normative sample, potential test revisers should avoid shortcuts that might make
data collection easier but produce a weaker normative base for the instrument or result in a measure
that requires limited interpretation or use. For example, one recent personality scale, the Basic
Personality Inventory (Jackson, 1989), was developed using procedures that substantially limit the
generalizability and reduce the confidence in the utility of the test. The norms for the Basic
Personality Inventory were collected by mailing the test booklets out to a sample of possible
normative test subjects, with $1 attached, requesting that they fill them out. No effort was made to
sample diverse ethnic group membership or to provide a controlled testing environment for the
administration of the test. Moreover, each participant was asked to respond to only one third of the
items in the total item pool. Because no participant responded to the entire item pool, any
scale statistics, such as alpha coefficients for scales, are more difficult to interpret.
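
The interpretive difficulty can be seen from the way coefficient alpha is usually computed: the standard formula needs each respondent's score on every item of the scale. The Python sketch below implements that formula for complete data and declines to compute alpha when responses are missing. The function name and the data are hypothetical and are not drawn from the Basic Personality Inventory.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for complete data (one list of item scores per respondent)."""
    if any(score is None for row in rows for score in row):
        raise ValueError("alpha requires every respondent to answer every item")
    k = len(rows[0])
    # Variance of each item across respondents, and of the total scores.
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical complete responses: five respondents, three items.
complete = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 1, 0], [1, 1, 1]]
alpha = cronbach_alpha(complete)

# Hypothetical design in which each respondent answers only one third of the items.
partial = [[1, None, None], [None, 1, None], [None, None, 0]]
try:
    cronbach_alpha(partial)
except ValueError as error:
    message = str(error)
```

Statistical methods for planned missing data exist, but they rest on additional assumptions, which is one reason statistics from such a design are harder to interpret.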

The MMPI Revision Committee considered it important in the revision to develop norms that were
nationally based, randomly drawn from the community, and obtained in well-controlled testing
sessions. The sample was balanced for ethnic group membership and well represented the national
census. The MMPI-2 normative sample was drawn from a broadly diverse sample that clearly
approximated characteristics of the national census (Schinka & LaLone, 1997; Shaaffer, Erdberg,
& Haroian, 1998). Interestingly, the norms for the MMPI-2 have been found to apply well with
diverse ethnic samples in the United States (Ben-Porath, Shondrick, & Stafford, 1994; Ellertsen,
Havik, & Skavhellen, 1996; McNulty, Graham, Ben-Porath, & Stein, 1997; Timbrook & Graham,
1994; Velasquez et al., 1997) and with international populations.

Evaluate the Measuring Instruments

It is crucial in a test revision to accumulate a broad variety of data on the populations for which the
instrument is intended to be used. There can never be enough field testing of the revised instrument.
It is absolutely essential to provide basic psychometric information, such as scale reliability, factor
structure, and so forth, for relevant samples, in order to anchor the instrument in psychometric
"space." Test users expect to be provided basic psychometric statistics on the instrument as well as
evidence of congruence (or lack thereof) with the earlier instrument. As with any new test, information
about the external validity of the revised scale needs to be provided. For example, in the MMPI
revision a large sample of couples (over 800 couples) from the normative population were also
asked to rate each other on 110 personality characteristics. These ratings served as one source of external
validity data against which the clinical scales and new scales could be tested out (Butcher et al., 1989).

The MMPI Revision Committee approached the matter of justifying changes through extensive data
collection on many samples in addition to the normative population. In all, over 15,000 persons
from clinical and normal range populations were evaluated with the MMPI-2 or MMPI-A during the
decade of redevelopment of the instrument and before the revision was released. For example,
empirical studies were conducted on pain patients (Keller & Butcher, 1991), psychiatric inpatients
(Ben-Porath, Butcher, & Graham, 1991), likely child-abusing mothers (Egeland, Erickson,
Butcher, & Ben-Porath, 1991), alcoholics (Weed, Butcher, Ben-Porath, & McKenna, 1992),
couples in marital therapy (Hjemboe & Butcher, 1991), airline pilot applicants (Butcher, 1994),
older individuals (Butcher et al., 1991), military personnel (Butcher, Jeffrey, et al., 1990), and
college students (Butcher, Graham, Dahlstrom, & Bowman, 1990).

Develop the New Scales

As noted earlier, newer effective methods might be employed in a test revision to supplement
measures in the original instrument. Empirical scale construction methods, used in the development
of the original MMPI clinical scales, came under criticism because the scales developed tend to be
psychometrically complex and heterogeneous (Loevinger, 1972; Norman, 1972). Therefore,
during the MMPI revision several scales were developed following an entirely different strategy for
the MMPI-2 item pool. Over the past 30 years a substantial amount of research (Wiggins, 1966)
and theoretical discussion (Burisch, 1984) has provided a strong impetus for the development of
content-based scales. The MMPI-2 content scales (Butcher, Graham, Williams, & Ben-Porath,
1990) were published following a combined rational–empirical scale construction strategy. These
scales have been found subsequently to perform well as empirical predictors of external behavior
and to provide important content themes (Ben-Porath, Butcher, & Graham, 1991; Ben-Porath,
McCully, & Almagor, 1993).

As long as the client is cooperative with the evaluation, content-based scales are effective predictors
of criterion behavior–equal to or better than, in terms of predictive power, the empirical scales of the
MMPI-2. Therefore, it is important to have a clear picture of the client's test-taking attitudes and
cooperativeness with the evaluation when the instrument is used in clinical assessment (or in the
development of scales in the first place, for that matter).

Evaluate Response Sets

Given that not all people who are selected to serve as participants in a standardization study are
cooperative, it is critical for test revisers to have an effective means of assuring participant
cooperation. Clear and unambiguous directions and well-controlled and monitored testing sessions
can go a long way toward assuring quality standardization data.

Some psychological tests also incorporate measures for assessing protocol validity. The MMPI-2
contains a number of validity scales to detect invalidating conditions. For example, the F(B) scale
was developed in the MMPI Revision Project as a means of detecting random responses or "mixed-
up responding" toward the end of the booklet. A number of new validity scales were developed for
the MMPI-2 that cover an expanded range of possible invalidating conditions. The availability of an
expanded array of invalidity indices ( Arbisi & Ben-Porath, 1995 ; Butcher & Han, 1995 ; Tellegen,
1988 ) had the effect of stimulating a large number of studies to explore the utility of the MMPI-2 to
detect invalidating conditions in personality assessment.

Develop a Detailed Test Manual

Key to gaining acceptance of the revised instrument is the publication of a comprehensive, detailed
test manual. The test manual for a revised version of a psychological test has multiple purposes. As
for any psychological test, the manual needs to provide evidence with respect to the delineation of
scale constructs, administration and scoring procedures, psychometric properties and internal scale
relationships, evidence of test validity, and examples of how the instrument is used and interpreted.

The manual for a revised instrument also needs to be tied to the original test that it is replacing.
Commonalities between the two measures need to be identified so that the test user can relate his or
her past experiences with the test to the present version. Additionally, deviations from the "old way"
of doing things also need to be spelled out. For example, any variations in the administration and
scoring, alterations in test stimuli, interpretive strategies, and so forth need to be highlighted so that
current users can visualize the operation of the new version.

Dissent and Cacophony

Some people will, for a time, hold on to the earlier standard even though improvements make the
revised instrument much better than the original and even when most people have adopted the new
version. Psychologists who are placed in the role of "revisionists" for any widely used
psychological test need to be aware that not everyone likes change. Some people, by nature, abhor
change or modifications in their external world if they have to alter their practice or research
substantially. Changes in a beloved and relied-upon test can be an imposing event for some
professionals.

Odell Shepard (1929) pointed out in Joys of Forgetting:

There are people who not only strive to remain static themselves but strive to keep everything else
so, and weep like Heraclitus to find that nothing ever stands still to be studied, understood, and
described. Their grievance against the world is that it insists upon changing at every moment and
destroying all their categories. Who that has lived at all has not sympathized with them at one time
or another? And yet their position is almost laughably hopeless. (p. 146)

As noted earlier, even the most substantial and high-quality test revision will have its detractors
because of the resistance some people have to change. Critical detraction can result from several
reasons: blind loyalty to the earlier version of the instrument, financial ties to the earlier version,
resistance to change or novelty, or heavy commitment to an earlier version (e.g., "I have my file
cabinets filled with original test booklets and answer sheets"). Such criticisms of a revised test are
neither easy to predict nor possible to circumvent, because they may not be based on a careful
evaluation of the revision itself but on more subjective, idiosyncratic reasons. It may not be possible
to eliminate all such problems, and it is important to distinguish between well-founded criticism and
simple "grousing" over the situation.

Yet, criticisms of the revision do not have to be substantiated in order to be aired and to call the
revision into question. Some critics might find problems with the revision or voice criticisms even
though they may have little or no basis in fact. For example, one vocal critic of MMPI-2
complained that "the validity of the Pd scale was less in MMPI-2 than in the original MMPI." Yet
this criticism was clearly unfounded: the Pd scale has exactly the same item composition in the
revised MMPI (no items were dropped and none were added), so the ability of the new version to
predict external behavior would be exactly the same as that of the original version of the instrument.
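
The reasoning in this example can be made concrete with a minimal sketch. Everything in it is hypothetical: the item identifiers, the respondents, and the criterion ratings are invented for illustration and are not actual MMPI items or data. It shows only that when two versions of a scale share the same scoring key, the same responses yield the same raw scores and therefore the same correlation with any external criterion. Differences in norms or standard-score conversions are outside what the sketch covers.

```python
import math

# Hypothetical scoring key: item id -> keyed response.
# These ids are illustrative only, not actual MMPI items.
PD_KEY_ORIGINAL = {"item_a": True, "item_b": False, "item_c": True, "item_d": True}
# Assumption taken from the text: same composition, nothing dropped or added.
PD_KEY_REVISED = dict(PD_KEY_ORIGINAL)


def raw_score(responses, key):
    """Count the responses that match the keyed direction."""
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up respondents: (answers, external criterion rating).
respondents = [
    ({"item_a": True, "item_b": False, "item_c": True, "item_d": True}, 9.0),
    ({"item_a": True, "item_b": True, "item_c": False, "item_d": True}, 5.0),
    ({"item_a": False, "item_b": True, "item_c": False, "item_d": False}, 2.0),
    ({"item_a": True, "item_b": False, "item_c": False, "item_d": True}, 6.0),
]

criterion = [rating for _, rating in respondents]
scores_original = [raw_score(answers, PD_KEY_ORIGINAL) for answers, _ in respondents]
scores_revised = [raw_score(answers, PD_KEY_REVISED) for answers, _ in respondents]

# Identical keys give identical raw scores, hence identical validity coefficients.
validity_original = pearson(scores_original, criterion)
validity_revised = pearson(scores_revised, criterion)
```

If an item were dropped from or added to the revised key, the two score lists could diverge, and only then would a difference in predictive validity be possible.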

Opinions, even groundless ones, can sway perceptions on a temporary basis. However, a well-
conceived and constructed test revision will win out over the long term. It is important in a test
revision to do the most careful job possible–and to have the conviction to make necessary changes
even though some criticism may follow.

Some psychologists who have revised well-known tests have been criticized for changing the test.
People handle such criticism in different ways. Our view on the MMPI project was that we would
"let the data speak for themselves." If the revision is a strong one, then the criticisms will abate as
people become aware that the changes are backed by substantial information.

Develop a Critical Phase-Out Period for the Superseded Version

The test revision process is not complete until the earlier or original version of the instrument has
receded into history. There were very clear reasons for continuing the original MMPI for a period of
time along with the revised instrument. The MMPI-2 Revision Committee (because of the extensive
use of the MMPI) held the view that the original MMPI would likely be used for some time because
it was so tied to clinical practice. In addition, the MMPI-2 (for adults) was published in 1989 while
the MMPI revision for adolescents was still in progress and did not get published until 1992. The
MMPI-2 did not include adolescents in the norms–therefore, it was not recommended for teens–and
so the original version (particularly since there were adolescent norms available) was still
recommended for this population. The MMPI committee held the general view that this period
would be about 5 years, or roughly 3 years after the adolescent version of the MMPI was published.

Having two "standards" for the same instrument can be problematic for both clinical and research
applications. For example, if the instrument is widely used in court-related evaluations, then both
sides of the court case in the adversarial legal system could employ different versions of the
standard in their assessment and produce results that might appear to differ substantially. The
existence of both the original and revised forms of the MMPI has created some confusion,
particularly for applications involving forensic evaluations. The test publisher, with substantial
review of existing research by its psychological consultants, determined that it is in the best interest
of the field of psychological assessment to phase out the original version.

What is the professional standard for determining when the old test should recede into history?
There are no clear guidelines for determining when a test has been superseded. However,
psychologists are encouraged to use the most current version of a psychological test. The American
Psychological Association suggested the following:

A test should be amended or revised when new research data, significant changes in the domain
represented, or new conditions of test use and interpretation make the test inappropriate for its
intended uses. An apparently old test that remains useful need not be withdrawn or revised simply
because of the passage of time. But it is the responsibility of the test publishers to monitor changing
conditions and to amend, revise, or withdraw the test as indicated. ( American Psychological
Association, 1996 , Standard 3.18)

Although most MMPI users shifted over to the MMPI-2 in 1989 and the MMPI-A in 1992, a
smaller number of psychologists (fewer than 5%) continued to use the original version 9 years after
the MMPI-2 became available. A few somewhat vocal dissenters continued to employ the original
norms in critical assessment situations such as court cases–a situation that created confusion as to
what the "true" MMPI standard should be. This confusing situation led the test publisher to
withdraw the original MMPI from service as of September 1, 1999.

Provide Educational Training or Briefings on Changes

One of the most effective means of assuring that practitioners and researchers will transition to the
revised version of a test is for the revision team to conduct practical workshops or briefing sessions
for test users to explain the changes in the revised version and continuities with the original
instrument. Even before the MMPI-2 was published, an extensive series of workshops or briefing
sessions was conducted to inform test users about the new form. These educational programs were
influential in informing test users as to the important adjustments that needed to be made, as
measured by the fact that within 6 months of publication over 80 percent of MMPI users had
switched to the revised MMPI.

Another way in which test publishers can facilitate the transition to a revised version of a
psychological test is to implement an exchange or "buy back" program to help practitioners obtain
the newer version without undue costs. This program involves replacing stock of the earlier version
with the revised form at a lower cost.

Conclusions

Many psychological tests require updating if their timeliness and effectiveness are to be maintained.
Revising a widely used psychological test can be a daunting task, and one that can take great
effort and resources if it is to be done properly. Key elements in any revision are to (a) develop a
practical program and collect an ample base of supporting data to justify the changes that are made
in the revision, (b) communicate clearly to test users what changes have been implemented and
what are the continuities with the original instrument, and (c) conduct a series of accessible
continuing education programs across the country to inform practitioners of the changes and
modifications made.

References

American Psychological Association. (1996). Standards for psychological tests. (Washington, DC:
Author)
Arbisi, P. & Ben-Porath, Y. S. (1995). An MMPI-2 infrequency scale for use with
psychopathological populations: The Infrequency-Psychopathology Scale, F(p). Psychological
Assessment, 7, 424-431.
Ben-Porath, Y. S., Butcher, J. N. & Graham, J. R. (1991). Contribution of the MMPI-2 scales to the
differential diagnosis of schizophrenia and major depression. Psychological Assessment, 3, 634-
640.
Ben-Porath, Y. S., Hostetler, K., Butcher, J. N. & Graham, J. R. (1989). New subscales for the
MMPI-2 Social Introversion (Si) scale. Psychological Assessment, 1, 169-174.
Ben-Porath, Y. S., McCully, E. & Almagor, M. (1993). Incremental validity of the MMPI-2 Content
Scales in the assessment of personality and psychopathology by self-report. Journal of Personality
Assessment, 61, 557-575.
Ben-Porath, Y. S., Shondrick, D. & Stafford, K. (1994). MMPI-2 and race in a forensic diagnostic
sample. Criminal Justice and Behavior, 22, 19-32.
Burisch, M. (1984). Approaches in personality inventory construction. American Psychologist, 39,
214-227.
Butcher, J. N. (1972). Objective personality assessment: Changing perspectives. (New York:
Academic Press) (Note: Contains early articles highlighting problems with the original MMPI and
the need for a revision.)
Butcher, J. N. (1994). Psychological assessment of airline pilot applicants with the MMPI-2.
Journal of Personality Assessment, 62, 31-44.
Butcher, J. N. (1996). International adaptations of the MMPI-2: Research and clinical applications.
(Minneapolis, MN: University of Minnesota Press)
Butcher, J. N., Aldwin, C., Levenson, M., Ben-Porath, Y. S., Spiro, A. & Bossé, R. (1991).
Personality and aging: A study of the MMPI-2 among elderly men. Psychology of Aging, 6, 361-
370.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A. & Kaemmer, B. (1989). Minnesota
Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring.
(Minneapolis, MN: University of Minnesota Press)
Butcher, J. N., Graham, J. R., Dahlstrom, W. G. & Bowman, E. (1990). The MMPI-2 with college
students. Journal of Personality Assessment, 54, 1-15.
Butcher, J. N., Graham, J. R., Williams, C. L. & Ben-Porath, Y. S. (1990). Development and use of
the MMPI-2 Content Scales. (Minneapolis: University of Minnesota Press)
Butcher, J. N. & Han, K. (1995). Development of an MMPI-2 scale to assess the presentation of self
in a superlative manner: The S Scale. (In J. N. Butcher & C. D. Spielberger (Eds.), Advances in
personality assessment (Vol. 10, pp. 25-50). Hillsdale, NJ: Erlbaum.)
Butcher, J. N., Jeffrey, T., Cayton, T. G., Colligan, S., DeVore, J. & Minnegawa, R. (1990). A study
of active duty military personnel with the MMPI-2. Military Psychology, 2, 47-61.
Butcher, J. N. & Owen, P. (1978). Survey of personality inventories: Recent research developments
and contemporary issues. (In B. Wolman (Ed.), Handbook of clinical diagnosis. New York: Plenum.)

Butcher, J. N., Williams, C. L., Graham, J. R., Archer, R., Tellegen, A., Ben-Porath, Y. S. &
Kaemmer, B. (1992). MMPI-A manual for administration, scoring, and interpretation.
(Minneapolis, MN: University of Minnesota Press)
Campbell, D. P. (1969). The practical problems of revising an established psychological test. (Paper
presented at the 5th Conference on Recent Developments in the Use of the MMPI, Minneapolis,
MN)
Campbell, D. P. (1972). The practical problems of revising an established psychological test. (In J.
N. Butcher (Ed.), Objective personality assessment: Changing perspectives (pp. 117-130). New
York: Academic Press.)
Dahlstrom, W. G. (1972). Whither the MMPI? (In J. N. Butcher (Ed.), Objective personality
assessment: Changing perspectives (pp. 85-115). New York: Academic Press.)
Egeland, B., Erickson, M., Butcher, J. N. & Ben-Porath, Y. S. (1991). MMPI-2 profiles of women at
risk for child abuse. Journal of Personality Assessment, 57, 254-263.
Ellertsen, B., Havik, O. E. & Skavhellen, R. R. (1996). The Norwegian MMPI-2. (In J. N. Butcher
(Ed.), International adaptations of the MMPI-2: Research and clinical applications (pp. 350-367).
Minneapolis, MN: University of Minnesota Press.)
Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8, 341-349.
Erdberg, S. P. (1970). MMPI differences associated with sex, race, and residence in a southern
sample (Doctoral dissertation, University of Alabama, 1969). Dissertation Abstracts International,
30, 5236B.
Hathaway, S. R. (1972). Where have we gone wrong? The mystery of missing progress. (In J. N.
Butcher (Ed.), Objective personality assessment: Changing perspectives (pp. 24-44). New York:
Academic Press.)
Hjemboe, S. & Butcher, J. N. (1991). Couples in marital distress: A study of demographic and
personality factors as measured by the MMPI-2. Journal of Personality Assessment, 57, 216-237.
Jackson, D. N. (1989). Basic Personality Inventory Manual. (Goshen, NY: Sigma Assessment
Systems)
Keller, L. S. & Butcher, J. N. (1991). Use of the MMPI-2 with chronic pain patients. (Minneapolis,
MN: University of Minnesota Press)
Loevinger, J. (1972). Some limitations of objective personality tests. (In J. N. Butcher (Ed.),
Objective personality assessment: Changing perspectives (pp. 45-58). New York: Academic
Press.)
McNulty, J., Graham, J. R., Ben-Porath, Y. S. & Stein, L. A. R. (1997). Comparative validity of
MMPI-2 scores of African American and Caucasian health center clients. Psychological
Assessment, 9, 464-470.
Meehl, P. E. (1972). Reactions, reflections, projections. (In J. N. Butcher (Ed.), Objective
personality assessment: Changing perspectives (pp. 131-183). New York: Academic Press.)
Meehl, P. E. & Hathaway, S. R. (1946). The K factor as a suppressor variable. Journal of Applied
Psychology, 30, 525-564.
Millon, T. (1994). MCMI-III: Manual. (Minneapolis, MN: National Computer Systems)
Norman, W. (1972). Psychometric considerations for a revision of the MMPI. (In J. N. Butcher
(Ed.), Objective personality assessment: Changing perspectives (pp. 59-84). New York: Academic
Press.)
Schinka, J. A. & LaLone, L. (1997). MMPI-2 norms: Comparisons with a census-matched
subsample. Psychological Assessment, 9, 307-311.
Shaaffer, T. W., Erdberg, P. & Haroian, J. (1998, February). Current nonpatient data for the
Rorschach, WAIS-R and the MMPI-2. (Paper presented at the Midwinter Meeting of the Society for
Personality Assessment, Boston, MA)
Shepard, O. (1929). Joys of forgetting. (Boston: Houghton Mifflin)
Tellegen, A. M. (1988). The analysis of consistency in personality assessment. Journal of
Personality, 56, 621-663.
Timbrook, R. E. & Graham, J. R. (1994). Ethnic differences on the MMPI-2? Psychological
Assessment, 6, 212-217.
Velasquez, R., Gonzales, M., Butcher, J. N., Castillo-Canez, I., Apodaca, J. X. & Chavira, D.
(1997). Use of the MMPI-2 with Chicanos: Strategies for counselors. Journal of Multicultural
Counseling and Development, 25, 107-120.
Weed, N. C. (1992). An evaluation of the efficacy of MMPI-2 indicators of validity. (Unpublished
doctoral dissertation, University of Minnesota)
Weed, N. C., Butcher, J. N., Ben-Porath, Y. S. & McKenna, T. (1992). New measures for assessing
alcohol and drug abuse with the MMPI-2: The APS and AAS. Journal of Personality Assessment,
58, 389-404.
Wiggins, J. (1966). Substantive dimensions of self-report in the MMPI item pool. Psychological
Monographs, 80,
Williams, C. L., Ben-Porath, Y. S. & Hevern, V. (1991, March). Item level improvements for the
MMPI-A. (Paper presented at the 26th annual symposium on Recent Developments in the Use of the
MMPI (MMPI-2 and MMPI-A), St. Petersburg Beach, FL)
