
Methods of Measuring Health-Care Service Quality

Hanjoon Lee
Linda M. Delene
Mary Anne Bunda
WESTERN MICHIGAN UNIVERSITY
Chankon Kim
SAINT MARY’S UNIVERSITY

Service quality is an elusive and abstract construct to measure, and extra effort is required to establish a valid measure. This study investigates the psychometric properties of three different measurements of health-care service quality as assessed by physicians. The multitrait-multimethod approach revealed that convergent validity was established for measures based on the single-item global rating method and multi-item rating method. On the other hand, almost no evidence of convergent validity was found for the measures based on the constant-sum rating method. Furthermore, discriminant validity for the seven health-care service quality dimensions measured by the three methods was not well established. The high levels of interdimensional correlations found suggested that the service quality dimensions may not be separable in a practical sense. The study suggested an ongoing effort is needed to develop a new service quality scale suitable to this unique service industry. J BUSN RES 2000. 48.233–246. © 2000 Elsevier Science Inc. All rights reserved.

Address correspondence to Hanjoon Lee, Marketing Department, Haworth College of Business, Western Michigan University, Kalamazoo, Michigan 49008, USA.

The health-care delivery system has been undergoing formidable challenges in the 1990s. Rapid movement toward systems of managed care and integrated delivery networks has led health-care providers to recognize real competition. To be successful or even survive in this hostile environment, it is crucial to provide health-care recipients with service that meets or exceeds their expectations. At the same time, it is important to know which dimensions of health-care services physicians believe are necessary to constitute excellent service. It is crucial to have a better understanding of service quality perceptions possessed by both recipients and providers when shaping the health-care delivery system.

The traditional medical model has focused on the technical nature of health-care events; the focus has been on the training and updated skills of the physicians and the nature of the actual medical outcome (O'Connor, Shewchuk, and Carney, 1994). A series of services marketing studies, however, has looked at the relationship between the services expected and the service actually perceived as received by recipients (Carman, 1990; Finn and Lamb, 1991; Parasuraman, Zeithaml, and Berry, 1985, 1988; Zeithaml, Parasuraman, and Berry, 1988). The services marketing approach places an emphasis on quality evaluation from the recipients' perspectives, but ignores the necessity for including an evaluation of the technical skill of the provider and the nature of the medical outcome. Especially in the area of health-care service, the services marketing approach seems to neglect the important role of physicians in shaping patients' service expectations. A balanced approach, therefore, utilizing aspects of service quality from both the services marketing and health-care approaches may be required. At the same time, the physicians' view toward the quality of their own services needs more research attention.

For the success of health-care organizations, accurate measurement of health-care service quality is as important as understanding the nature of the service delivery system. Without a valid measure, it would be difficult to establish and implement appropriate tactics or strategies for service quality management. The most widely known and discussed scale for measuring service quality is SERVQUAL (Parasuraman, Zeithaml, and Berry, 1988). Since the scale was developed, various researchers have applied it across such different fields as securities brokerage, banks, utility companies, retail stores, and repair and maintenance shops. The scale has also been applied to the health-care field in numerous studies (Babakus and Mangold, 1992; Brown and Swartz, 1989; Carman, 1990; Headley and Miller, 1993; O'Connor, Shewchuk, and Carney, 1994; Walbridge and Delene, 1993). However, with a few exceptions, these studies did not systematically examine the psychometric properties of the scale, because they dealt with pragmatic and managerial issues for health-care services. Validity of the SERVQUAL scale seems not to be fully established.


A more stringent psychometric test has been recommended for the improvement of service quality measurement (for a recent review, please see Asubonteng, McCleary, and Swan, 1996).

In this study, we sought to examine rigorously the psychometric properties pertaining to alternative methods of measuring health-care service quality as perceived by physicians. Specifically, physicians were asked to assess health-care service quality along the seven dimensions of a modified SERVQUAL scale. Dimensional responses were collected using three measurement methods: the single-item global rating method, the constant-sum rating method, and the multi-item rating method, thus resulting in multitrait-multimethod (MTMM) data. Based on the results of construct validation conducted on the MTMM data, we report findings regarding the convergent validity of the three methods and the discriminant validity of the seven service quality dimensions as measured by the three methods.

Previous Research

Two Approaches in Health-Care Service Quality

Service quality is an elusive and abstract concept because of its "intangibility" as well as its "inseparability of production and consumption" (Parasuraman, Zeithaml, and Berry, 1985). Various approaches have been suggested regarding how to define and measure service quality. The services marketing literature has defined service quality in terms of what service recipients receive in their interaction with the service providers (i.e., technical, physical, or outcome quality), and how this technical quality is provided to the recipients (i.e., functional, interactive, or process quality) (Grönroos, 1988; Lehtinen and Lehtinen, 1982; Berry, Zeithaml, and Parasuraman, 1985). Parasuraman, Zeithaml, and Berry (1985) asserted that consumers perceive service quality in terms of the gap between received service and expected service. They identified 10 dimensions of service quality: access, communication, competence, courtesy, security, tangibles, reliability, responsiveness, credibility, and understanding or caring. They then classified the 10 dimensions into three categories: search properties (credibility and tangibles; dimensions that consumers can evaluate before purchase), experience properties (reliability, responsiveness, accessibility, courtesy, communication, and understanding/knowing the consumer; dimensions that can be judged during consumption or after purchase), and credence properties (competence and security; dimensions that a consumer finds hard to evaluate even after purchase or consumption).
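As a minimal illustration of the gap paradigm described above, perceived quality on a dimension is the perception rating minus the expectation rating. The Python sketch below uses invented 7-point ratings purely for illustration; the present study measured importance ratings rather than expectation-perception gaps.

```python
# SERVQUAL-style gap scoring: quality on a dimension = perception - expectation.
# A minimal sketch with illustrative 7-point ratings (not data from this study).
expectations = {"reliability": 6.5, "responsiveness": 6.0, "tangibles": 4.5}
perceptions  = {"reliability": 5.8, "responsiveness": 6.2, "tangibles": 5.0}

gap_scores = {dim: perceptions[dim] - expectations[dim] for dim in expectations}
print(gap_scores)  # negative gaps flag dimensions where service falls short of expectations
```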
In the area of traditional health-care research, the quality of health care has been viewed from a different perspective. Quality has been defined as "the ability to achieve desirable objectives using legitimate means" (Donabedian, 1988, p. 173), where the desirable objective implied "an achievable state of health." Thus, quality is ultimately attained when a physician properly helps his or her patients to reach an achievable level of health, and they enjoy a healthier life. One of the most widely used quality assessment approaches has been proposed in the structure-process-outcome model of Donabedian (1980). In this model, the structure indicates the settings where the health care is provided, and the process indicates how care is technically delivered; whereas, the outcome indicates the effect of the care on the health or welfare of the patient. In the structure-process-outcome model, quality was viewed as technical in nature and assessed from the physicians' point of view. It is well known that physicians pay significantly more attention to the technical and functional dimensions of health-care service (Donabedian, 1988; O'Connor, Shewchuk, and Carney, 1994). This tendency might be attributable to physician education and training. Considering the potentially fatal and irrevocable consequences of poor medical quality (malpractice) in health care, in contrast to other service industries, it would be logical and desirable for physicians to hold such an attitude.

A difference has been observed between the services marketing approach emphasizing recipients' perspectives and the traditional health-care approach honoring physicians' concerns. Both patient groups and physician groups are important constituents of the health-care system. However, it has been found that health-care recipients have difficulty in evaluating medical competence and the security dimensions (i.e., credence properties) considered to be the primary determinant of service quality (Bopp, 1990; Hensel and Baumgarten, 1988). This inability or impossibility of assessing the technical quality received in health-care service leads patients to rely more heavily on other dimensions, such as credibility or tangibility (i.e., search properties), when inferring the quality of health-care service (Bowers, Swan, and Taylor, 1994). This lack of patient ability to make a proper evaluation raises a question regarding the gap analysis paradigm suggested by Parasuraman, Zeithaml, and Berry (1985). If customers in the health-care delivery system cannot evaluate the important service dimensions, can they have a reasonable expectation about services they will receive? If they cannot, the contribution of the health-care recipients' views in influencing the design of an efficient system may not be as significant as we formerly thought.

If the health-care service industry were similar to other industries that provide services for their customers, a patient could choose among many physicians who offer different prices and provide service that differs in terms of medical technical quality (i.e., competence and security) or other service-related dimensions. The reality in the health-care industry is different. Patients do not have enough information about their physicians. Even if more information were available and accessible, patients probably could not weigh the information properly. Physician choice is often made not by the patients themselves, but through referral from the patient's primary doctor, from his or her health organization (HMO), and/or from friends. Although service recipients' perceptions of service are valuable for improving health-care service quality,

it is as crucial to understand physicians' perceptions of service quality when designing and improving the health-care delivery system. Therefore, this study placed its focus on how physicians perceive health-care service quality.

Measurement Issues in Health-Care Service Quality

A system cannot be designed and operated effectively unless the quality of the product or service can be understood or correctly measured. One major stride toward developing quantitative measures of service quality was made by Parasuraman, Zeithaml, and Berry (1985), and the SERVQUAL scale was the consequence of this effort (Parasuraman, Zeithaml, and Berry, 1988). The 10 dimensions discussed in the 1985 study were reduced into five dimensions in SERVQUAL after an empirical test. Their original objective was to discover dimensions that were generic to all services. If this assumption is correct, dimensional patterns for service quality should be similar across different service industries. Several researchers have since examined the stability of SERVQUAL dimensions (Asubonteng, McCleary, and Swan, 1996; Babakus and Boller, 1992; Carman, 1990; Dabholkar, Thorpe, and Rentz, 1996). Carman (1990) found that the numbers of service quality dimensions were not stable across different services in his factor analysis results. He also found that, among the five dimensions, items measuring "tangibles" and "reliability" consistently loaded on the expected factors across different services. However, items tapping "assurance" and "empathy" broke into different factors. A similar finding was reported by Babakus and Boller (1992). There seems to be a consensus that SERVQUAL is not a generic measure for all service industries and that service-specific dimensions other than those suggested in SERVQUAL may be needed to understand service quality perceptions fully.

Although these studies have generated insight into the measurement properties of SERVQUAL, their measurement analyses, which were aimed primarily at checking dimensionality, were inadequate for testing the construct validity of the scale. Construct validity is defined as the degree of correspondence between constructs and their measures (Peter, 1981). A systematic and rigorous construct validation requires multitrait-multimethod data, which is the correlation matrix for two or more traits where each trait is measured by two or more methods. Demonstration of construct validity requires evidence of convergent validity and discriminant validity (Campbell and Fiske, 1959).
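To make the MTMM data layout concrete: the design used in this study yields 7 traits measured by 3 methods, or 21 measures per respondent, and the analysis operates on their 21 x 21 correlation matrix. The Python sketch below simulates such a matrix; the column names and the random scores are assumptions for illustration only, not the study's data.

```python
# Shape of MTMM data: 7 traits (dimensions) x 3 methods = 21 measures per physician;
# the multitrait-multimethod matrix is the 21 x 21 correlation matrix of these measures.
import numpy as np
import pandas as pd

traits = ["assurance", "core_medical", "empathy", "prof_skill",
          "reliability", "responsiveness", "tangibles"]
methods = ["single_item", "constant_sum", "multi_item"]

rng = np.random.default_rng(1)
columns = [f"{t}__{m}" for t in traits for m in methods]
scores = pd.DataFrame(rng.normal(size=(348, len(columns))), columns=columns)  # simulated scores

mtmm_corr = scores.corr()   # the multitrait-multimethod correlation matrix
print(mtmm_corr.shape)      # (21, 21)
```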
The two main sources of variance in measures of a construct are the construct or trait being measured and measurement error. Measurement error can be divided further into random error and systematic error (e.g., method variance). Single measures do not allow us to make an assessment of measurement error. With a single method, we cannot separate trait variance from unwanted method variance (Bagozzi, Yi, and Phillips, 1991). Thus, construct validation is a process of separating the confounding effects of random and systematic errors from trait variance. Without disentangling the variation in measures attributable to the trait, we cannot assess the extent of the true relationship between the measures and traits (i.e., the convergent validity) or the true relationships between traits (the discriminant validity).

Despite its seminal role in the understanding and assessment of construct validity, the original Campbell and Fiske (1959) approach to MTMM analyses has limitations. Most notably, it prescribes no precise standards for determining evidence of construct validity. Furthermore, the procedure does not yield specific estimates of trait, method, and random variance. Several alternative procedures have been proposed for analyzing MTMM data (for a review, see Bagozzi, 1993). The construct validation process in this study utilized two of these alternative MTMM approaches; namely, application of the confirmatory factor analysis (CFA) model (Joreskog and Sorbom, 1993) and the correlated uniqueness (CU) model (Marsh, 1989).

Research Design

Design of the MTMM Study

Previous studies have indicated that SERVQUAL must be modified for each unique service sector (Carman, 1990; Babakus and Boller, 1992). Haywood-Farmer and Stuart (1988) empirically tested SERVQUAL and found it did not encompass all the dimensions of professional service quality. They suggested that service dimensions for core service, service customization, and knowledge and information be added to the five dimensions of SERVQUAL. Of these additional dimensions, core service was found to be the most important factor not represented in the SERVQUAL instrument. Related research on professional service quality perception was done by Brown and Swartz (1989). This study found that "professionalism" and "professional competence" were significant factors for both providers and patients in the evaluation of service quality.

The modified SERVQUAL approach utilized in this research, therefore, included the five dimensions of SERVQUAL (Parasuraman, Zeithaml, and Berry, 1985), as well as the "core medical service" (Haywood-Farmer and Stuart, 1988) and the "professionalism/skill" (Brown and Swartz, 1989) dimensions. The latter two dimensions were included to measure the technical aspects of health-care service. These same service quality dimensions were also used in the earlier research of Walbridge and Delene (1993), which involved physician attitudes toward service quality. The seven dimensions, their origins, and their definitions can be found in Table 1.

It is well known that measurement methods can affect the nature of a respondent's evaluation (Kumar and Dillon, 1992; Phillips, 1981). Of the various methods used in measurement, three were selected for this research: the single-item global rating method, the constant-sum rating method, and the multi-item rating method.

Table 1. Service Quality Attributes

Assurance: Courtesy displayed by physicians, nurses, or office staff and their ability to inspire patient trust and confidence (Parasuraman, Zeithaml, and Berry, 1988)

Empathy: Caring, individualized attention provided to patients by physicians and their staffs (Parasuraman, Zeithaml, and Berry, 1988)

Reliability: Ability to perform the expected service dependably and accurately (Parasuraman, Zeithaml, and Berry, 1988)

Responsiveness: Willingness to provide prompt service (Parasuraman, Zeithaml, and Berry, 1988)

Tangibles: Physical facilities, equipment, and appearance of contact personnel (Parasuraman, Zeithaml, and Berry, 1988)

Core medical service: The central medical aspects of the service: appropriateness, effectiveness, and benefits to the patient

Professionalism/skill: Knowledge, technical expertise, amount of training, and experience (Swartz and Brown, 1989)

The single-item global rating method provided the respondent with the names and definitions of each service dimension. With this method, the respondent reported his or her evaluation rating on each dimension, without evaluating the multiple indicators (components) of each dimension and without comparing it to other service dimensions. The constant-sum rating method, in contrast, is comparative in nature, requiring the respondent to allocate a given number of "importance points" among various dimensions. In this method, respondents were forced to think about the relative importance of each service dimension. In the multi-item rating method, multiple indicators were developed that were intended to capture each of the seven service quality dimensions.

It is generally accepted that the multi-item rating method can provide a better sampling of the domain of content than the single-item global rating method (Bagozzi, 1980). Thus, content validity can be enhanced with multiple-item measures. Multi-item measures also have the advantage of allowing the computation of reliability coefficients (e.g., Cronbach's alpha [Cronbach, 1951]). Reliability assessment with the single-item global rating method is a problem in typical survey research studies, because measurement error cannot be estimated with a single item.

However, a drawback in using the multi-item rating method in place of the single-item global rating method is the tendency toward questionnaire length, along with possible detrimental effects on response rate and respondent fatigue. In other words, the single-item global rating method has the potential advantage of parsimony for the respondent. Therefore, in areas where there is little or no difference between the explanatory power of single- and multi-item methods, the single-item global rating method may be preferable in studies where parsimony is important.

In studying service quality dimensions, it may be helpful to gather information regarding the relative importance of each dimension. One way to do this is through the use of the constant-sum rating method. The constant-sum rating method forces respondents to identify the comparative importance of each service dimension. In a health-care study, this constant-sum method was used to examine determinant dimensions in hospital preference (Woodside and Shinn, 1988). The constant-sum method also tends to eliminate individual response styles of "nay-saying" and the "halo effects," which cause respondents to carry over their judgments from one dimension to another (Churchill, 1991). In an earlier, related study (Walbridge and Delene, 1993), it was believed that physicians may be reluctant to rate any service quality dimension as unimportant. Thus, the constant-sum rating method was employed in this research to determine its applicability as an efficient measurement method of health-care service quality where physicians' perceptions were surveyed.

There are a few drawbacks to using the constant-sum rating method. The first is its inherent increase in task complexity for respondents. It requires more mental effort from the individual than either the single-item or multi-item methods. Each rating decision affects other ratings because of the constraints imposed by the nature of the measurement process. As the number of attributes increases, respondents become more taxed (Aaker, Kumar, and Day, 1994; Malhotra, 1995). This increase in complexity may lead the subject to use a subset of the dimensions instead of including all of them in his or her evaluation (Churchill, 1991). This effect may be heightened if the subject does not view the dimensions as being completely independent. This lack of independence was sometimes found to produce spurious correlations (Kerlinger, 1973).
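The sketch below illustrates the mechanics of the constant-sum task: every allocation must cover all seven dimensions and exhaust a fixed pool of points, so raising one rating necessarily lowers another. The 100-point pool is an assumption; the paper says only that a fixed number of importance points was distributed.

```python
# Constant-sum allocation: a fixed pool of importance points spread over the seven dimensions.
# A minimal sketch; the 100-point total and the example allocation are assumptions.
from typing import Dict

DIMENSIONS = ["assurance", "core medical service", "empathy", "professionalism/skill",
              "reliability", "responsiveness", "tangibles"]
TOTAL_POINTS = 100  # assumed pool size

def validate_allocation(points: Dict[str, int]) -> Dict[str, float]:
    """Check that the allocation covers all dimensions and sums to the pool, then return shares."""
    if set(points) != set(DIMENSIONS):
        raise ValueError("every dimension must receive an allocation (possibly zero)")
    if sum(points.values()) != TOTAL_POINTS:
        raise ValueError(f"allocations must sum to {TOTAL_POINTS}")
    return {dim: pts / TOTAL_POINTS for dim, pts in points.items()}

example = {"assurance": 15, "core medical service": 30, "empathy": 10,
           "professionalism/skill": 20, "reliability": 10, "responsiveness": 10, "tangibles": 5}
print(validate_allocation(example))
```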

In this study, we asked physicians how they perceived the seven dimensions of health-care service quality as measured by three different methods: the single-item global rating method, the multi-item rating method, and the constant-sum rating method.

Questionnaire Development

A panel of physicians was consulted on questionnaire design and semantics, and input was also received from a state university hospital. The questionnaire was divided into four sections, with one section for each of the three measurement methods and the last section containing demographic questions. Section One utilized the single-item global rating method. The subjects were given the name and definition of each dimension indicated in Table 1 and asked to rate the importance of each dimension on a seven-point scale. Pretesting with physicians showed that a conventional scale using the two bipolar adjectives "unimportant" and "important" was inappropriate. Physicians were reluctant to rate any of the dimensions as "unimportant" or "less important." Further pretesting results suggested the use of "Important" for the low end (one) and "Critical" for the high end (seven) of the seven-point scale.

Section Two was a constant-sum rating method that asked the subjects to distribute a fixed number of "importance points" among the seven dimensions. This led respondents to rate the comparative importance of each service dimension relative to the others. The same names and definitions of the dimensions were used as in Section One.

Section Three consisted of forty-three (43) "practice characteristics." Placed in random order, each practice characteristic corresponded with one of the seven service quality dimensions, with between five and seven characteristics pertaining to each service quality dimension based on a previous study (Walbridge and Delene, 1993) (please see Appendix A). In this section, physicians evaluated the practice characteristics without referring to the names or definitions of the pertinent service quality dimensions. Practice characteristics were evaluated using the same "important–critical" scale used in Section One. Respondents then answered questions related to demographic variables in the last section.

Sampling

Physicians (1,428) were randomly selected by a commercial mail-order vendor from a national database leased from the American Medical Association. Some professional categories were eliminated to remove nonphysicians from the list, as well as specialties considered divergent from the mainstream of health-care service (for a listing of the specialties used, please see Appendix B). The four-page, self-administered questionnaire was mailed to physicians. To attain a higher response rate, physicians received a "warm-up" postcard announcing the arrival of the questionnaire within the next week. The initial mailing of the questionnaire included a cover letter explaining the purpose of the research and the confidentiality of responses. Approximately 6 weeks later, a follow-up mailing of 1,200 questionnaires was sent to physicians who had not yet responded.

Of the original 1,428 addresses, 72 were invalid. Six questionnaires returned were unusable. A total of 348 responses were received from the two mailings, an effective response rate of 24.4%. Demographic characteristics of our sample are compared with those of the physician population in the United States in Table 2. The similarities become apparent through simple visual inspection. The population of physicians in the United States is 16.4% female; whereas, the sample was 18.7% female. The age distribution of the sample was also somewhat similar, especially for physicians under the age of 65, who accounted for about 90%. The sample was similar to the population on the basis of practice specialty. Goodness-of-fit tests were performed for sex, age, and specialty group categories. The results were χ2 = 0.4 (DF = 1, p = 0.729) for sex, χ2 = 14.4 (DF = 4, p = 0.006) for age, and χ2 = 4.8 (DF = 3, p = 0.084) for specialty group. These results suggest that the sample reflected the population's sex and specialty group compositions, but consisted of physicians who were somewhat older than the population.

Table 2. Demographics: Population vs. Sample (percent)

                        Population (a)   Sample
Age
  Under 35                   24.4         20.3
  35–44                      39.8         32.1
  45–54                      21.6         18.5
  55–64                      12.5         17.1
  65 and over                 1.7         12.1
Gender
  Male                       83.6         81.0
  Female                     16.4         18.7
Specialty group
  Primary care               34.5         39.4
  Surgical                   12.9         10.3
  Hospital based             10.8         12.9
  Other specialties          22.5         33.3

(a) Source: American Medical Association, 1990.
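A goodness-of-fit comparison of this kind can be reproduced with scipy. In the sketch below the observed counts are back-calculated from the sample percentages in Table 2 and the usable n of 348, and the population percentages are renormalized, so the statistic only approximates the published value for age.

```python
# Chi-square goodness-of-fit of the sample age distribution against the AMA population (Table 2).
# A minimal sketch: counts are approximated from percentages, so results are illustrative only.
import numpy as np
from scipy.stats import chisquare

n_sample = 348
sample_pct = np.array([20.3, 32.1, 18.5, 17.1, 12.1])     # Table 2, sample column (age groups)
population_pct = np.array([24.4, 39.8, 21.6, 12.5, 1.7])  # Table 2, AMA population column

observed = sample_pct / sample_pct.sum() * n_sample           # approximate observed counts
expected = population_pct / population_pct.sum() * n_sample   # expected counts under H0

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"age: chi2({len(sample_pct) - 1}) = {stat:.1f}, p = {p_value:.3f}")
```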

Analysis

Instrument Reliability for the Multi-Item Rating Method

It is necessary to derive a composite score for each of the seven service quality dimensions measured by the multi-item rating method. For this purpose, the level of internal consistency was checked as a way of assessing the homogeneity of items comprising each dimension. The Cronbach's alpha indices for the seven dimensions ranged from 0.80 to 0.90, with a mean of 0.85. This high degree of internal consistency (Nunnally, 1978) allowed us to sum the ratings to get composite scores for each of the seven dimensions. Each composite score indicated a measure of each service quality dimension obtained by the multi-item rating method. These composite scores were used for the MTMM analysis along with the scores assessed by the single-item global rating method and the constant-sum rating method.
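The following sketch shows the internal-consistency and composite-scoring steps for one dimension. The ratings are simulated placeholders; the actual items are the practice characteristics listed in Appendix A.

```python
# Cronbach's alpha and composite scoring for one multi-item dimension.
# A minimal sketch assuming `items` is a respondents x items array of 1-7 ratings.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's (1951) alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
items = rng.integers(1, 8, size=(348, 5)).astype(float)  # placeholder data: 348 respondents, 5 items
alpha = cronbach_alpha(items)
composite = items.sum(axis=1)  # summed ratings used as the dimension's multi-item measure
print(f"alpha = {alpha:.2f}, composite score shape = {composite.shape}")
```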
Construct Validation of the Modified SERVQUAL Scale

Our investigation of the construct validity of the modified SERVQUAL involved CFA of the multitrait–multimethod (MTMM) data. CFA allows methods to affect measures of traits in different degrees; whereas, methods are assumed to covary freely among themselves. CFA then provides assessments of over-all goodness of fit for the variable specification of the given MTMM data, while enabling the partition of variance in measures into trait, method, and error components. Trait variance reflects the shared variation for measures of a common trait and can be used to assess convergent validity. Discriminant validity among traits is indicated by intertrait correlations significantly lower than unity (Bagozzi and Yi, 1991).

As suggested by Bagozzi (1993) and Widaman (1985), we first tested a CFA model based on the hypothesis that the variation in measures can be explained by traits and random error (i.e., the trait-only model). In this model, there are seven traits (i.e., seven service quality dimensions), and each trait is indicated by three measures. Each of the three measures is related to its own rating method (i.e., single-item global rating method, constant-sum rating method, etc.). This model resulted in poor fit, as indicated by χ2 (168) = 1935.04, p = .00. A probable cause for the trait-only model's poor fit was the presence of method factors as important sources of variation in the measures (Bagozzi, 1993; Widaman, 1985). Subsequently, another CFA model that incorporated trait and method factors was tested. Estimation of this trait-method model was not possible, however, because iterations failed to converge. Such an occurrence in the confirmatory factor analysis of a trait-method model is not uncommon (Marsh and Bailey, 1991; Bagozzi, 1993; Van Driel, 1978). Also frequently found in CFA solutions are improper parameter estimates, such as negative variances. In all these instances, the confirmatory factor analysis model is construed as an inappropriate specification of the variable structure and must be rejected (Bagozzi and Yi, 1991).

In view of these problems that frequently accompany the application of CFA models to MTMM data, Marsh (1989) proposed the CU model as an alternative. The CU model differs from the CFA model primarily in the interpretation of method effects. In the CFA model, method effects are inferred from squared method factor loadings. In contrast, the CU model specification does not include method factors. Instead, method effects are depicted as, and inferred from, correlations among error terms corresponding to measures based on a common method. This depiction of method effects is the main reason why the CU model seldom produces an ill-defined solution (Marsh, 1989, p. 341). Another difference between the two approaches rests on the assumption regarding method correlations. Whereas the CU model assumes that methods are uncorrelated, no such assumption is necessary for the CFA model. Finally, both CFA and CU models are premised on the additive effects of traits and methods on measures.
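The structural difference between the specifications can be written down compactly in lavaan-style model syntax, which SEM packages (e.g., R's lavaan or Python's semopy) accept. The sketch below generates the CU specification: each of the 21 measures loads only on its trait, and error terms sharing a method are allowed to covary. Variable names are illustrative, and the original analysis was run in LISREL 8.

```python
# Build a correlated uniqueness (CU) model specification for 7 traits x 3 methods.
traits = ["assur", "core", "empat", "prof", "relia", "respo", "tangi"]
methods = ["sg", "cs", "mi"]  # single-item global, constant-sum, multi-item

measures = {t: [f"{t}_{m}" for m in methods] for t in traits}

# Trait factors: each of the 21 measures loads only on its own trait.
trait_lines = [f"{t} =~ " + " + ".join(cols) for t, cols in measures.items()]

# Correlated uniqueness: error terms of measures sharing a method may covary;
# no method factors are specified (unlike the trait-method CFA model, which adds
# three method factors instead of these error covariances).
cu_lines = []
for m in methods:
    same_method = [f"{t}_{m}" for t in traits]
    for i in range(len(same_method)):
        for j in range(i + 1, len(same_method)):
            cu_lines.append(f"{same_method[i]} ~~ {same_method[j]}")

cu_model = "\n".join(trait_lines + cu_lines)
print(cu_model)  # 7 loading lines plus 3 * 21 = 63 uniqueness covariances
```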
Given its robust nature, the CU model is an attractive alternative when the CFA model results in an ill-defined solution or nonconvergence (Bagozzi, 1993; Marsh, 1989). We subsequently tested the CU model's fit to our MTMM data (see Figure 1 for a diagram of the CU model). Another application of the CU model in a similar situation can be found in Kim and Lee's (1997) construct validation study involving measures of children's influences on family decisions. The CU model's fit as indicated by the χ2 test result (χ2 (105) = 311.62, p = .00) was unsatisfactory. However, because of the χ2 test's sensitivity to sample size, some researchers (Bentler, 1990; Bagozzi and Yi, 1991) have suggested fit assessments based on other goodness-of-fit indices when the sample size is suspected to be the cause for rejecting the hypothesized model. One frequently used measure is the comparative fit index (CFI), which evaluates the practical significance of the variance explained by the model (for a detailed discussion, see Bentler, 1990 and Bagozzi, Yi, and Phillips, 1991). For our CU model, computation of the CFI yielded .96. This is much greater than the .90 rule of thumb suggested as the minimum acceptable level by Bentler (1990). Therefore, the CU model captures a significant proportion of the variance of our MTMM data from a practical point of view; hence, little variance remains to be accounted for.
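Bentler's (1990) CFI compares the model's noncentrality with that of a baseline (independence) model. The function below implements the standard formula; the CU model values come from the text, but the baseline chi-square shown is hypothetical because the paper does not report it.

```python
# Comparative fit index: CFI = 1 - max(chi2_m - df_m, 0) / max(chi2_b - df_b, chi2_m - df_m, 0).
# A sketch; the baseline (independence-model) statistic below is an invented placeholder.
def comparative_fit_index(chi2_model: float, df_model: int,
                          chi2_baseline: float, df_baseline: int) -> float:
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model, 0.0)
    return 1.0 - d_model / d_baseline

cfi = comparative_fit_index(311.62, 105, 5500.0, 210)  # 5500/210 is a hypothetical baseline
print(f"CFI = {cfi:.2f}")
```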
method effects. In the CFA model, method effects are inferred 0 and 3%). The best results were found for the global single-
by squared method factor loadings. In contrast, the CU model item measures. Their trait variances ranged between 0.45 and
specification does not include method factors. Instead, method 0.83, with a mean level of .57. The seven multi-item measures
effects are depicted as and inferred from correlations among showed levels of trait variance generally lower than the global
error terms corresponding to the measures based on common single-item measures. Trait variances for these measures
method. This depiction of method effect is the main reason ranged from 0.30 to 0.49, with a mean of 0.39. According to
why the CU model seldom produces an ill-defined solution Bagozzi and Yi (1991), strong (weak) evidence for convergent
(Marsh, 1989, p. 341). Another difference between the two validity is achieved when at least (less than) half of the total
approaches rests on the assumption regarding method correla- variation in a measure is caused by trait. According to this
tions. Whereas the CU model assumes that methods are uncor- rule of thumb, there is strong evidence for convergent validity

Figure 1. Correlated uniqueness model for the MTMM data.



Table 3. Summary of Parameter Estimates for the Correlated Uniqueness Model: Trait Factor Loadings

Each measure loads only on its own trait; all other loadings were fixed at zero. Standard errors of estimates are shown in parentheses.

Trait                     Single-item global measure   Constant-sum measure   Multi-item measure
Assurance                 0.69 (0.10)                  -0.18 (0.06)           0.63 (0.09)
Core medical service      0.67 (0.10)                  -0.01 (0.06)           0.55 (0.09)
Empathy                   0.75 (0.10)                   0.05 (0.06)           0.70 (0.09)
Professionalism/skills    0.71 (0.10)                   0.06 (0.06)           0.62 (0.09)
Reliability               0.71 (0.10)                   0.10 (0.06)           0.66 (0.09)
Responsiveness            0.83 (0.11)                   0.16 (0.06)           0.60 (0.09)
Tangibles                 0.91 (0.11)                   0.16 (0.06)           0.62 (0.09)

According to this rule of thumb, there is strong evidence for convergent validity for most of our global single-item measures (5 out of 7). Trait variances for all seven multi-item measures fall below the level of 0.5. Therefore, evidence for convergent validity is weak for the measures using the multi-item rating method; whereas, the constant-sum measures exhibit little or no convergent validity.
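This decomposition can be reproduced directly from the standardized loadings in Table 3: the squared loading is the proportion of a measure's variance attributable to its trait, and the 0.50 cutoff separates strong from weak evidence. The sketch below applies the rule to the single-item global loadings.

```python
# Trait-variance screen from standardized loadings (Table 3, single-item global measures):
# trait variance = loading squared; >= .50 is read as strong evidence of convergence
# (Bagozzi and Yi, 1991), below .50 as weak.
loadings = {"assurance": 0.69, "core medical service": 0.67, "empathy": 0.75,
            "professionalism/skills": 0.71, "reliability": 0.71,
            "responsiveness": 0.83, "tangibles": 0.91}

for dimension, loading in loadings.items():
    trait_variance = loading ** 2
    verdict = "strong" if trait_variance >= 0.50 else "weak"
    print(f"{dimension:25s} trait variance = {trait_variance:.2f} ({verdict})")
```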
As noted before, the effects of methods under the CU model are represented as correlations among error (uniqueness) terms. Although the CFA model enables the separation of the variance portion that is caused by method bias, we can only infer the significance and size of the method bias in the CU model analysis based on examination of the estimated uniqueness correlations. Tables 4(a), 4(b), and 4(c) display the estimated error variances and covariances for single-item global measures, constant-sum measures, and multi-item measures, respectively. For the single-item measures, a significant covariance between error terms was found in 14 of 21 possible cases (see Table 4a). When these covariances were converted into correlations, the values ranged from 0.28 to 0.82, with an average of 0.59. These levels of uniqueness correlations demonstrate a considerable degree of method effect contained in the measurement. Therefore, a substantial portion of the variation in the global single-item measures can be attributed to the measurement procedure.
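The conversion from an estimated uniqueness covariance to a correlation is the usual standardization by the two error variances. The sketch below applies it to one pair of single-item entries from Table 4a.

```python
# Uniqueness correlation from an error covariance: r_ij = cov_ij / sqrt(var_i * var_j).
# A sketch using the assurance / core-medical-service single-item entries from Table 4a.
import math

def uniqueness_correlation(cov_ij: float, var_i: float, var_j: float) -> float:
    return cov_ij / math.sqrt(var_i * var_j)

r = uniqueness_correlation(cov_ij=0.37, var_i=0.51, var_j=0.53)
print(f"r = {r:.2f}")  # approximately 0.71 for this pair
```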
For the constant-sum measures, 16 of the 21 uniqueness covariances were significant (see Table 4b). Although this indicates the existence of a significant method effect, the magnitudes of the uniqueness correlations (range: 0.03–0.36; mean 0.19) suggest that the size of the method effect is small. The very large error variances shown in Table 4b demonstrate that almost all the variation in the constant-sum measures is attributable to random error. With regard to the multi-item measures, as can be seen in Table 4c, all uniqueness covariances are significant. Uniqueness correlations were also generally high (range: 0.37–0.71; mean 0.59).

Our next investigation focused on discriminant validity among the seven dimensions of health-care service quality. It consisted of verifying whether the correlations among the seven dimensions (i.e., traits) as measured by three different methods were significantly different from unity (+1 or -1) (Widaman, 1985; Bagozzi, Yi, and Phillips, 1991). As shown in Table 5, all of the correlations among the dimensions are significant and very high (range: 0.69–0.99; mean: 0.84). Seven of the 21 correlations were above the 0.90 level. Such high correlations among service quality dimensions (range: 0.67–0.92; mean: 0.82) were also observed in the study conducted by Dabholkar, Thorpe, and Rentz (1996). It should be noted, however, that these correlations are disattenuated correlations (i.e., corrected for measurement error) and are larger than the corresponding correlations among the measures.

Table 4. Summary of Parameter Estimates for the Correlated Uniqueness Model


Column order of traits: Assurance, Core medical service, Empathy, Professionalism/skills, Reliability, Responsiveness, Tangibles

(a) Error Variance and Covariance for Single-Item Global Measures


Assurance 0.51 (0.12)
Core medical service 0.37 (0.11) 0.53 (0.12)
Empathy 0.35 (0.13) 0.33 (0.12) 0.41 (0.14)
Professionalism/skills 0.35 (0.10) 0.34 (0.11) 0.28 (0.10) 0.51 (0.13)
Reliability 0.32 (0.11) 0.38 (0.11) 0.30 (0.11) 0.41 (0.12) 0.49 (0.13)
Responsiveness 0.27 (0.11) 0.23 (0.11) 0.22 (0.11) 0.26 (0.12) 0.26 (0.13) 0.31 (0.16)
Tangibles 0.14 (0.11) 0.09 (0.11) 0.11 (0.11) 0.08 (0.13) 0.10 (0.13) 0.09 (0.14) 0.17 (0.19)

(b) Error Variance and Covariance for Constant-Sum Measures


Assurance 0.99 (0.08)
Core medical service -0.03 (0.06) 1.00 (0.08)
Empathy 0.24 (0.06) -0.18 (0.06) 0.99 (0.08)
Professionalism/skills -0.35 (0.06) -0.14 (0.06) -0.13 (0.06) 0.99 (0.08)
Reliability -0.32 (0.06) -0.21 (0.06) -0.21 (0.06) 0.20 (0.06) 0.99 (0.08)
Responsiveness -0.16 (0.06) -0.20 (0.06) -0.15 (0.06) -0.16 (0.06) 0.27 (0.06) 0.97 (0.08)
Tangibles -0.10 (0.06) -0.23 (0.06) -0.09 (0.06) -0.18 (0.06) 0.07 (0.06) 0.26 (0.06) 0.97 (0.08)

(c) Error Variance and Covariance for Multi-Item Measures


Assurance 0.58 (0.11)
Core medical service 0.35 (0.09) 0.68 (0.09)
Empathy 0.34 (0.11) 0.36 (0.09) 0.49 (0.12)
Professionalism/skills 0.39 (0.08) 0.41 (0.09) 0.32 (0.09) 0.60 (0.11)
Reliability 0.32 (0.09) 0.32 (0.09) 0.28 (0.09) 0.31 (0.10) 0.55 (0.11)
Responsiveness 0.31 (0.08) 0.34 (0.08) 0.28 (0.08) 0.29 (0.08) 0.37 (0.09) 0.63 (0.10)
Tangibles 0.38 (0.08) 0.38 (0.08) 0.32 (0.08) 0.43 (0.09) 0.41 (0.09) 0.37 (0.08) 0.61 (0.10)

All error variance and covariance estimates differing significantly from zero are underscored.
Standard errors of estimates are shown within parentheses.

Particularly notable is the correlation between the dimensions of assurance and empathy (0.99), which is near unity. This high correlation between the assurance dimension and the empathy dimension seemed to be consistent with the findings of past studies that discovered the dimensional instability of the SERVQUAL scale (Babakus and Boller, 1992; Carman, 1990). A formal test of discriminant validity was conducted by computing a 95% confidence interval (the estimated correlation ± twice its standard error estimate) for each of the estimated correlations among the seven dimensions. Despite the high levels of correlation observed between the dimensions, only one interval (that for the correlation between assurance and empathy) contained unity. Hence, from a strict statistical point of view, discriminant validity was established, except between assurance and empathy. However, whether these dimensions are distinct from a practical standpoint is highly questionable.
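The confidence-interval criterion can be expressed in a few lines: a pair of dimensions passes the discriminant validity test only if the interval formed by the estimated correlation plus or minus twice its standard error excludes unity. The sketch below applies the rule to two trait correlations from Table 5.

```python
# Discriminant validity screen: the 95% interval (estimate +/- 2 SE) must exclude unity.
# A sketch using two of the disattenuated trait correlations reported in Table 5.
def discriminates(correlation: float, std_error: float) -> bool:
    lower, upper = correlation - 2 * std_error, correlation + 2 * std_error
    return not (lower <= 1.0 <= upper)

pairs = {("assurance", "empathy"): (0.99, 0.02),
         ("assurance", "tangibles"): (0.72, 0.05)}
for (t1, t2), (r, se) in pairs.items():
    verdict = "supported" if discriminates(r, se) else "not supported"
    print(f"{t1} vs {t2}: r = {r:.2f}, discriminant validity {verdict}")
```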
In summary, the above results of the CU model analysis of the MTMM data first led us to conclude that convergent validity was established for two of the three measures, the single-item global measure and the multi-item measure. Based on Bagozzi and Yi's (1991) rule of thumb, only the single-item global measure, which captured an average trait variance greater than 0.50, demonstrated strong evidence of convergence; whereas, weak evidence of convergence was found for the multi-item measure. For the constant-sum measure, on the other hand, there was virtually no sign of convergence. Almost all of the variance in the seven constant-sum measures (for the seven service dimensions) was attributed to random error.

With respect to discriminant validity, from a strict statistical viewpoint, discrimination was demonstrated among the seven health-care service quality dimensions, except for one instance (between "assurance" and "empathy"). That is, all intertrait (or interdimensional) correlations except one were significantly less than unity. However, the magnitudes of the intertrait correlations were generally very high, with a mean value of 0.84. Hence, the seven dimensions did not seem separable in a practical sense. We should note, however, that the interpretation of discriminant validity is meaningful only when convergent validity is established (Bagozzi, 1993). Given our finding that convergent validity was established for only two of the three types of measures tested, the evidence relating to discriminant validity should be viewed with caution.

Table 5. Summary of Parameter Estimates for the Correlated Uniqueness Model


Trait intercorrelations. Column order of traits: Assurance, Core medical service, Empathy, Professionalism/skills, Reliability, Responsiveness, Tangibles

Assurance 1.00
Core medical service 0.94 (0.03) 1.00
Empathy 0.99 (0.02) 0.91 (0.03) 1.00
Professionalism/skills 0.80 (0.04) 0.92 (0.03) 0.77 (0.04) 1.00
Reliability 0.89 (0.03) 0.90 (0.03) 0.83 (0.03) 0.95 (0.02) 1.00
Responsiveness 0.81 (0.04) 0.80 (0.05) 0.76 (0.04) 0.85 (0.04) 0.91 (0.03) 1.00
Tangibles 0.72 (0.05) 0.78 (0.05) 0.69 (0.05) 0.83 (0.04) 0.83 (0.03) 0.83 (0.03) 1.00

All correlation estimates differing significantly from zero are underscored.
Standard errors of estimates are shown within parentheses.

Implications and Conclusion

One of the more pressing challenges health-care providers and researchers face is to develop a better understanding of the key dimensions constituting health-care quality and valid approaches to their measurement. This research focused on conceptual and measurement issues relating to the study of health-care quality. In contrast to most of the past research in this area, we took the physician's (service provider's) rather than the patient's (service recipient's) perspective. This approach is justified in view of the prevalent understanding that health-care recipients are often unable to evaluate key dimensions of health-care service (Bopp, 1990; Hensel and Baumgarten, 1988), and, thus, may not have as much to contribute to the design of an effective health-care system as providers. Another contrast is found in the methodological approach. Whereas past studies that investigated the validity of the SERVQUAL scale tended to lack methodological rigor and scope, our construct validation procedure based on the MTMM data analysis allowed for a more systematic scrutiny of key measurement properties of the scale (i.e., convergent validity, discriminant validity, and method bias).

First, we compared the performance of the constant-sum rating method, the single-item global rating method, and the multi-item rating method in measuring health-care service quality. All seven measures based on the constant-sum method showed an almost complete lack of convergence with the measures based on the other methods. One plausible explanation for this is the relatively high degree of complexity inherent in the measures using the constant-sum method. This measure requires more effort on the part of the respondents and, thus, is likely to create cognitive strain. Consequently, the resulting responses may not be as reliable as those obtained by other methods. In fact, many physicians seemed to have difficulty allocating the importance points among the seven categories.

In contrast to common expectation, the single-item global measures performed better than the multi-item measures in capturing the intended dimensions. An attempt to generalize this finding beyond health-care providers may be inappropriate, because the result could have been caused by the high level of familiarity that our physician respondents had with the health-care service quality dimensions. A clear understanding of the issues involved in the questions reduces measurement error in responses. Thus, such an outcome may not be obtained from health-care recipients, who may not possess such a clear understanding. Nonetheless, this finding suggests that single-item global measures may elicit responses that are as reliable as the multi-item measures when knowledgeable service providers are involved, and do so with greater parsimony. The single-item global rating method may be useful if the goal of a study is to gain an understanding of the general nature of health-care service issues. We should add, however, that assessment of the reliability level of single-item measures is not possible in most cases. This remains a major problem for the single-item global rating method.

When the research is to be diagnostic in nature, focusing on specific characteristics of the service offering in an effort to identify areas for improvement, the multi-item rating method has greater utility. The multi-item rating method has the distinct advantage of being able to generate detailed information on specific aspects of service quality that can be used as a basis for action plans. As a caveat, it should be noted that our recommendation regarding the use of the single-item global rating method and the multi-item rating method is limited to future research involving health-care service providers' perceptions. For research involving the perceptions of patients who do not understand the key dimensions of health-care service quality, the multi-item rating method seems to be a better choice, because this method is less susceptible to measurement error than the single-item global rating method.

In terms of the discriminant validity of the seven health-care service quality dimensions, our results were not supportive of the validity. The computed magnitudes of the interdimensional correlations were very high. Although all correlations except one satisfied the statistical criterion applied (i.e., significantly less than unity), their magnitudes (ranging between 0.69 and 0.99) cast much doubt on the separability of these dimensions from a practical viewpoint. Considering that a similar finding has been reported before (Dabholkar, Thorpe, and Rentz, 1996), caution is warranted in future applications of the

SERVQUAL scale or its modified versions in health-care service quality research. Because the validation of a measure is an ongoing process, we suggest that more research be directed toward producing a suitable adaptation of the SERVQUAL scale. It is important for this research to take into consideration the unique aspects of this particular service sector.

This study limited its research scope to physicians' perceptions toward health-care service quality. Under CQI or TQM, patients' perceptions or evaluations of health-care services also play a critical role. If health-care providers do not understand how service recipients evaluate health-care services, it is difficult for providers to design or improve strategic planning and marketing activities effectively. Therefore, research based upon the patients' perspective is necessary. Based upon the perceptions of both parties in the health-care delivery system, we can identify areas where mutual understanding exists, means to inform and educate the public, and ways to improve the current delivery system.
References

Aaker, David A., Kumar, V., and Day, George S.: Marketing Research. John Wiley & Sons, Inc., New York, NY. 1995.

Asubonteng, Patrick, McCleary, Karl J., and Swan, John E.: SERVQUAL Revisited: A Critical Review of Service Quality. The Journal of Services Marketing 10(6) (1996): 62–71.

Babakus, Emin, and Mangold, W. Glynn: Adapting the SERVQUAL Scale to Hospital Services: An Empirical Investigation. Health Services Research 26 (February 1992): 767–786.

Babakus, Emin, and Boller, Gregory W.: An Empirical Assessment of the SERVQUAL Scale. Journal of Business Research 24(3) (1992): 253–268.

Bagozzi, Richard P.: Causal Models in Marketing. John Wiley and Sons, New York. 1980.

Bagozzi, Richard P., Yi, Youjae, and Phillips, Lynn W.: Assessing Construct Validity in Organizational Research. Administrative Science Quarterly 36(3) (1991): 421–458.

Bagozzi, Richard P., and Yi, Youjae: Multitrait–Multimethod Matrices in Consumer Research. Journal of Consumer Research 17 (March 1991): 426–439.

Bagozzi, Richard P.: Assessing Construct Validity in Personality Research: Applications to Measures of Self-Esteem. Journal of Research in Personality 27(1) (1993): 49–87.

Bentler, Peter: Comparative Fit Indexes in Structural Models. Psychological Bulletin 107(2) (1990): 238–246.

Berry, Leonard, Zeithaml, Valarie, and Parasuraman, A.: Quality Counts in Services, Too. Business Horizons 28 (May/June 1985): 44–52.

Bopp, Kenneth D.: How Patients Evaluate the Quality of Ambulatory Medical Encounters: A Marketing Perspective. Journal of Health Care Marketing 10(1) (March 1990): 6–15.

Bowers, Michael R., Swan, John E., and Taylor, Jack A.: Influencing Physicians Referrals. Journal of Health Care Marketing 14 (Fall 1994): 42–50.

Brown, Stephen W., and Swartz, Teresa A.: A Gap Analysis of Professional Service Quality. Journal of Marketing 53(4) (April 1989): 92–98.

Campbell, Donald T., and Fiske, Donald W.: Convergent and Discriminant Validity by the Multitrait-Multimethod Matrix. Psychological Bulletin 56(2) (1959): 81–105.

Carman, James M.: Consumer Perceptions of Service Quality: An Assessment of the SERVQUAL Dimensions. Journal of Retailing 66(1) (Spring 1990): 33–55.

Churchill, Gilbert A., Jr.: Marketing Research: Methodological Foundations, 5th ed. The Dryden Press, Chicago. 1991.

Cronbach, Lee J.: Coefficient Alpha and the Internal Structure of Tests. Psychometrika 16 (1951): 297–334.

Dabholkar, Pratibha A., Thorpe, Darly I., and Rentz, Joseph O.: A Measure of Service Quality for Retail Stores: Scale Development and Validation. Journal of the Academy of Marketing Science 24(1) (1996): 3–16.

Donabedian, Avedis: Quality Assessment and Assurance: Unity of Purpose, Diversity of Means. Inquiry (Spring 1988): 175–192.

Donabedian, Avedis: Explorations in Quality Assessment and Monitoring, vol. 1: The Definition of Quality and Approaches to Its Assessment. Health Administration Press, Ann Arbor, MI. 1980.

Finn, David W., and Lamb, Charles W., Jr.: An Evaluation of the SERVQUAL Scale in a Retail Setting, in Solomon, R.H., ed. Advances in Consumer Research, vol. 18, Association of Consumer Research, Provo, UT. 1991.

Grönroos, Christian: Service Quality: The Six Criteria of Good Perceived Service Quality. Review of Business (Winter 1988): 1–9.

Haywood-Farmer, John, and Stuart, F. Ian: Measuring the Quality of Professional Services. The Management of Service Operations. Proceedings of the 3rd Annual International Conference of the UK Operations Management Association. 1988.

Headley, D.E., and Miller, S.: Measuring Service Quality and Its Relationship to Future Consumer Behavior. Journal of Health Care Marketing 13(4) (December 1993): 32–41.

Hensel, James S., and Baumgarten, Steven A.: Managing Patient Perceptions of Medical Practice Service Quality. Review of Business 9(3) (Winter 1988): 23–26.

John, Joby: Improving Quality Through Patient-Provider Communication. Journal of Marketing Management 1(1) (Fall 1991): 51–60.

Joreskog, Karl G., and Sorbom, Dag: LISREL 8: Structural Equation Modeling with the SIMPLIS Command Language. Lawrence Erlbaum, Hillsdale, NJ. 1993.

Kerlinger, Fred N.: Foundations of Behavioral Research. Holt, Rinehart, and Winston, Inc., New York. 1973.

Kim, Chankon, and Lee, Hanjoon: Development of Family Triadic Measures for Children's Purchase Influence. Journal of Marketing Research 34(3) (August 1997): 307–321.

Kumar, Ajith, and Dillon, William R.: An Integrative Look at the Use of Additive and Multiplicative Covariance Structure Models in the Analysis of the MTMM Data. Journal of Marketing Research 24 (February 1992): 51–64.

Lehtinen, Uolevi, and Lehtinen, Jarmo: Service Quality: A Study of Quality Dimensions. Unpublished working paper, Service Management Institute, Helsinki, Finland. 1982.

Malhotra, Naresh K.: Marketing Research. Prentice Hall, Upper Saddle River, NJ. 1996.

Marsh, Herbert W.: Confirmatory Factor Analyses of Multitrait–Multimethod Data: Many Problems and a Few Solutions. Applied Psychological Measurement 13 (1989): 335–361.

Marsh, Herbert W., and Bailey, M.: Confirmatory Factor Analyses of Multitrait-Multimethod Data: A Comparison of the Behavior of Alternative Models. Applied Psychological Measurement 15 (1991): 47–70.

Nunnally, Jum C.: Psychometric Theory, 2nd ed. McGraw-Hill, New York. 1978.

O'Connor, Stephen J., Shewchuk, Richard M., and Carney, Lynn W.: The Great Gap. Journal of Health Care Marketing 14(2) (1994): 32–39.

Parasuraman, A., Zeithaml, Valarie, and Berry, Leonard: A Conceptual Model of Service Quality and Its Implications for Future Research. Journal of Marketing 49 (Fall 1985): 41–50.

Parasuraman, A., Zeithaml, Valarie, and Berry, Leonard: SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality. Journal of Retailing 64(1) (1988): 12–40.

Peter, J. Paul: Construct Validity: A Review of Basic Issues and Marketing Practices. Journal of Marketing Research 18 (May 1981): 133–145.

Phillips, Lynn W.: Assessing Measurement Error in Key Informants' Reports: A Methodological Note on Organizational Analysis in Marketing. Journal of Marketing Research 18 (November 1981): 395–415.

Reidenbach, R. Eric, and Sandifer-Smallwood, Beverly: Exploring Perceptions of Hospital Operations by a Modified SERVQUAL Approach. Journal of Health Care Marketing 10(4) (December 1990): 47–55.

Swartz, Teresa A., and Brown, Stephen W.: Consumer and Provider Expectations and Experiences in Evaluating Professional Service Quality. Journal of the Academy of Marketing Sciences 17(2) (Spring 1989): 189–195.

Van Driel, O.P.: On Various Causes of Improper Solutions of Maximum Likelihood Factor Analysis. Psychometrika 43(2) (1978): 225–243.

Walbridge, Stephanie W., and Delene, Linda M.: Measuring Physician Attitudes of Service Quality. Journal of Health Care Marketing 13(1) (Winter 1993): 6–15.

Widaman, Keith F.: Hierarchically Nested Covariance Structure Models for Multitrait–Multimethod Data. Applied Psychological Measurement 9(1) (March 1985): 1–26.

Woodside, Arch, and Shinn, Raymond: Customer Awareness and Preferences Toward Competing Hospital Services. Journal of Health Care Marketing 8(1) (March 1988): 39–47.

Zeithaml, Valarie A., Parasuraman, A., and Berry, Leonard L.: Problems and Strategies in Services Marketing. Journal of Marketing 49 (Spring 1988): 33–48.

Appendix A. Specifications
Quality Attribute Activity

Reliability: Ability to perform the expected service dependably and accurately


—Accuracy in patient billing
—Current, accurate and neat medical record
—Correct performance of the service the first time
—Physician reputation among patients
—Physician reputation among other physicians
—Reputation of the hospital
—Physician compliance with Universal Precautions

Professionalism/skill: Knowledge, technical expertise, amount of training, experience, etc.


—Knowledgeable, skilled nurses and support staff
—Residency trained physicians
—Highly experienced physicians
—Physician specialty board certification
—Knowledgeable and skilled physicians
—Explaining trade-offs between service and cost to patient
—Physician history of malpractice

Empathy: Caring, individualized attention provided to patients by physicians and their staffs
—Alleviating patient concerns about the medical treatment
—Personal demeanor of the physician
—Learning the patient’s individual needs
—Providing individual consideration to the patient
—Remembering names and faces of patients

Assurance: Courtesy displayed by physicians, nurses, or office staff and their ability to inspire patient trust and confidence
—Courteous, friendly nurses and support staff
—Explaining the cost of service to the patient
—Explaining the medical service to the patient
—Courteous and friendly physicians
—Sensitivity to patient confidentiality

Core medical services: The central medical aspects of the service; appropriateness, effectiveness, and benefits to the patient
—Physicians who have published in medical journals
—Well-established physician referral base
—Effective utilization of services
—Physicians who participate in medical research
—Appropriate utilization of services (non-defensive)
—Positive medical outcome
—Orientation to preventative medicine
—Emphasis on patient education

Responsiveness: Willingness to provide prompt service


—Providing the service at the time promised
—Prompt service without an appointment
—Physician accessibility to patients by phone
—Convenient office hours for patients
—Adherence to patient appointment schedule

Tangibles: Physical facilities, equipment and appearance of contact personnel


—Professional appearance/dress of the support staff
—Professional appearance/dress of the physician
—Location of the office
—Location of the hospital
—Visually attractive and comfortable facilities
—Up-to-date equipment to provide the service

Appendix B. Specialty Groupings


Primary care (a) Specialists Hospital-Based Surgical

Family practitioner (b) Allergist/immunologist Anesthesiologist General surgeon

General practitioner Chest physician Emergency medicine Neurological surgeon

Pediatrics Dermatologist Pathologist Ophthalmologist

Internal medicine Geriatric Radiologist Orthopedic surgeon


Occupational medicine Nuclear medicine Plastic surgeon
Oncologist Thoracic surgeon
Physical medicine Ob/gyn
Psychiatrist Colon/rectal
Neurologist Otolaryngologist
Urologist

(a) Specialty groupings (primary care, specialists, hospital-based, and surgical) were derived through consultation with a panel of physicians.
(b) Individual specialty designations provided by the AMA to the commercial mailing list vendor.
