All content following this page was uploaded by Eduardo García-Cueto on 20 January 2018.
Abstract
Most psychometric questionnaires used for evaluating personality traits are developed utilizing dichotomous
item formats. From a psychometric viewpoint the Likert-type item format has more advantages
than the dichotomous format. If so, why is the dichotomous format so widely used in the framework of
personality tests? Does empirical evidence support this extensive use? The main goal of this research is to
study systematically the way in which the number of categories of the Likert-type item format used (from
two to nine categories) affects the main psychometric properties of the scale: reliability and validity. The
Eysenck Personality Questionnaire was used, changing its original yes/no format into a Likert-type scale,
from dichotomous to nine categories. A sample of 1149 participants (578 men and 571 women) was used.
The results show that the psychometric properties of the test (reliability and validity) improve as the
number of item categories is raised. Seven categories maximize the psychometric properties of the test. The
implications of these findings for Eysenck’s personality model are analyzed, and some considerations for
practitioners and professionals discussed.
© 2004 Elsevier Ltd. All rights reserved.
1. Introduction
The majority of questionnaires used by psychologists for evaluating personality traits use a
two-choice response format. The reasons for choosing this format are not made explicit by those
that construct the tests, but seem to relate more to tradition and supposed pragmatism than to
* Corresponding author. Tel.: +34-985-10-32-22; fax: +34-985-10-41-44. E-mail address: jmuniz@uniovi.es (J. Muñiz).
0191-8869/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.paid.2004.03.021
J. Muñiz et al. / Personality and Individual Differences 38 (2005) 61–69
2. Method
2.1. Participants
We used a sample of 1149 persons, 578 men and 571 women, high-school and university students,
with a mean age of 20.4 years and a standard deviation of 4.58. Participants came from
different Spanish regions, and it was considered a fairly representative sample of the Spanish
population with these characteristics. Special care was taken to ensure that the different groups of
subjects were comparable.
2.3. Procedure
The questionnaires were administered by experts, keeping the instructions the same for all groups,
except those referring to the different response formats used. Participation was voluntary, and no
kind of reward or incentive was offered. Participants were highly cooperative at all times.
3. Results
3.1. Variability
Given that the standard deviation of the sample has a strong influence on psychometric indicators
such as reliability and other important correlations, we calculated the standard deviations
of the variables for the responses obtained, as a function of the number of categories of the
Likert-type scales used. The results are shown in Table 1. It can be seen that the lowest values
correspond to the data from the questionnaires with two categories (dichotomous response) and
three categories. Variability is maximum from six categories onwards. It is evident that these
results will influence the rest of the psychometric properties estimated, such as reliability and validity.
Table 1
Standard deviation of the variables as a function of number of response categories
Categories  Neuroticism  Extraversion  Psychoticism
2           7.6          5.3           5.0
3           11.0         6.2           5.6
4           14.5         10.1          6.5
5           18.1         8.7           7.8
6           21.7         12.6          9.3
7           27.6         17.0          17.1
8           32.6         18.6          16.7
9           36.8         21.4          16.7
3.2. Reliability
The Cronbach’s (1951) alpha reliability coefficients for the three scales of the questionnaire are
shown in Table 2. The first thing that can be observed is that the psychoticism scale has a lower
internal consistency than the extraversion scale, and that this, in turn, is less reliable than the
neuroticism scale. The reliability of neuroticism remains stable around 0.9 from three response
categories onwards; that of extraversion does so around 0.8. In both cases reliability is lowest
with two categories. Reliability of psychoticism varies from 0.6 to 0.8, with unsystematic
variations according to the number of categories. Using the Hakstian and Whalen (1976)
K-sample significance test for independent alpha coefficients, we found that for the
Neuroticism–Control and Extraversion–Introversion scales the differences between the alpha coefficients were
not statistically significant at the 95% confidence level (χ² = 5.39 and χ² = 14.59, respectively).
However, in the case of Psychoticism the differences between the alpha coefficients obtained for
different numbers of categories were statistically significant (χ² = 37.84; p < 0.05).
Table 2
Reliability of the variables (alpha coefficient) as a function of number of response categories
Categories  Neuroticism  Extraversion  Psychoticism
2           0.71         0.78          0.77
3           0.92         0.80          0.64
4           0.90         0.86          0.62
5           0.92         0.80          0.67
6           0.92         0.81          0.62
7           0.90         0.80          0.83
8           0.92         0.85          0.72
9           0.92         0.86          0.72
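The alpha coefficients reported in Table 2 follow Cronbach's (1951) formula, alpha = k/(k - 1) · (1 - Σ item variances / variance of total scores). A minimal sketch of that computation is given below; the function name `cronbach_alpha` and the small response matrix are hypothetical and are not data from this study.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    # Sample variance of each item (column) across respondents.
    item_variances = scores.var(axis=0, ddof=1)
    # Sample variance of the total (summed) scale score.
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical responses: 5 respondents answering 4 five-category items.
example = np.array([
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [5, 4, 5, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
])
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.3f}")
```

Because alpha depends on the ratio of item variances to total-score variance, the differences in variability shown in Table 1 can be expected to carry over into the reliability estimates.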
Once again the three variables of the Eysenck model behave in different ways, as regards their
factorial structure, depending on the number of response categories (Table 3). In the case of the
traditional dichotomous format, psychoticism is seen to be more one-dimensional, with the first
factor explaining 35% of total variance, followed by extraversion, with 30%, and finally neuroticism,
with 27%. However, when five and six categories are used, the percentage of variance explained
by the first factor increases to 36% and 37% respectively in the case of neuroticism; the
same does not occur with the other two scales, for which the percentage of variance explained
decreases with a number of categories higher than two. The factorial analyses were carried out
using the principal components method, and the variance explained by the first factor is computed
before the rotation of the axes.
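The percentage of variance explained by the first unrotated principal component can be obtained from the eigenvalues of the item correlation matrix. The sketch below assumes that the components were extracted from correlations rather than covariances, which the text does not state; the function name and the simulated one-factor data are hypothetical.

```python
import numpy as np

def first_component_share(scores):
    """Percentage of total variance explained by the first principal
    component of the item correlation matrix (before any rotation)."""
    scores = np.asarray(scores, dtype=float)
    corr = np.corrcoef(scores, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(corr)
    return 100.0 * eigenvalues[-1] / eigenvalues.sum()

# Hypothetical data: 500 respondents, 10 items driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
noise = rng.normal(size=(500, 10))
responses = 0.6 * latent + 0.8 * noise
share = first_component_share(responses)
print(f"first component explains {share:.1f}% of the variance")
```

With k items the share cannot fall below 100/k, so values for scales of different length are not directly comparable.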
Table 3
Percentage of variance explained by the first component
Categories  Neuroticism  Extraversion  Psychoticism
2           27           30            35
3           35           25            15
4           33           31            14
5           36           25            14
6           37           26            13
7           33           25            25
8           35           29            18
9           37           33            15
Note: Principal Components were extracted and the percentage of variance is computed before the rotation.
Table 4 shows the items correctly assigned according to the Eysenck model after the corresponding
factorial analyses. Two methods of classification were used. According to the first one
(CM1), an item was correctly assigned if the loading on its keyed factor was 0.2 or above.
According to the second classification method (CM2), an item was correctly classified when its
highest loading was on the keyed factor. This second method appears to be more restrictive than
the first one. The analyses were carried out using the principal components method, retaining only
three factors with varimax rotation, as hypothesized in Eysenck’s model implemented in the EPQ.
Table 4
Percentage of items correctly classified according to Eysenck’s personality model, using two different classification
methods (CM1 and CM2)
Categories  Neuroticism   Extraversion  Psychoticism  Average
            CM1   CM2     CM1   CM2     CM1   CM2     CM1   CM2
2           88    84      90    90      79    33      86    69
3           96    88      85    75      71    58      84    74
4           96    92      90    90      79    71      88    84
5           96    92      95    90      79    75      90    86
6           96    92      95    90      63    54      85    79
7           100   96      85    70      96    92      94    86
8           96    92      85    85      83    63      88    80
9           96    92      90    90      63    46      83    76
Average     95    91      89    85      77    61      87    79
Note: Three Principal Components were extracted and rotated using Varimax rotation. Two methods of classification
were used. According to the first one (CM1), an item was correctly assigned if the loading on its keyed factor was 0.2 or
above. According to the second classification method (CM2), an item was correctly classified when its highest loading
was on the keyed factor.
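The two classification rules (CM1 and CM2) can be sketched as follows for a rotated loading matrix. The function name `classify_items`, the toy loadings and the scoring key are hypothetical. Comparing absolute loadings, so that the sign of reverse-keyed items is ignored, is an assumption of this sketch that the text does not confirm.

```python
import numpy as np

def classify_items(loadings, keyed_factor, threshold=0.2):
    """Return the percentage of items correctly classified under CM1 and CM2.

    loadings:     items-by-factors matrix of rotated loadings.
    keyed_factor: for each item, the column index of the factor it is keyed on.
    """
    # Assumption: absolute loadings are compared, ignoring the item's sign.
    loadings = np.abs(np.asarray(loadings, dtype=float))
    keyed_factor = np.asarray(keyed_factor)
    rows = np.arange(loadings.shape[0])
    # CM1: the loading on the keyed factor reaches the threshold (0.2).
    cm1 = loadings[rows, keyed_factor] >= threshold
    # CM2: the item's highest loading falls on the keyed factor.
    cm2 = loadings.argmax(axis=1) == keyed_factor
    return 100.0 * cm1.mean(), 100.0 * cm2.mean()

# Hypothetical rotated loadings for 4 items on 3 factors.
toy_loadings = [
    [0.60, 0.10, 0.05],  # keyed on factor 0: passes CM1 and CM2
    [0.25, 0.40, 0.10],  # keyed on factor 0: passes CM1 only
    [0.05, 0.55, 0.20],  # keyed on factor 1: passes CM1 and CM2
    [0.30, 0.10, 0.15],  # keyed on factor 2: fails both
]
toy_key = [0, 0, 1, 2]
cm1_pct, cm2_pct = classify_items(toy_loadings, toy_key)
print(cm1_pct, cm2_pct)
```

The second toy item shows why CM2 usually gives lower percentages: an item may load 0.2 or more on its keyed factor and still load more strongly elsewhere.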
It can be seen that according to the first classification method (CM1) the number of categories
that maximized the correct assignment of the items is seven, with a total of 94% of the items
correctly classified. The second classification method (CM2) is more restrictive, reporting lower
percentages of correct assignments, with maximum numbers corresponding to five and
seven response categories (86%). As can be seen in Table 4, some differences in the percentages
of items correctly assigned can be observed among the three scales of the Eysenck Personality
model.
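The "Average" columns of Table 4 can be checked with a few lines of arithmetic. The sketch below assumes that each average is the unweighted mean of the three scale percentages, rounded to the nearest integer; the text does not state this, but the assumption reproduces the tabled averages for every response format. The helper names are hypothetical.

```python
# Percentages from Table 4 as (Neuroticism, Extraversion, Psychoticism),
# indexed by number of response categories.
cm1 = {2: (88, 90, 79), 3: (96, 85, 71), 4: (96, 90, 79), 5: (96, 95, 79),
       6: (96, 95, 63), 7: (100, 85, 96), 8: (96, 85, 83), 9: (96, 90, 63)}
cm2 = {2: (84, 90, 33), 3: (88, 75, 58), 4: (92, 90, 71), 5: (92, 90, 75),
       6: (92, 90, 54), 7: (96, 70, 92), 8: (92, 85, 63), 9: (92, 90, 46)}

def mean_by_format(table):
    """Unweighted mean across the three scales for each response format."""
    return {k: sum(values) / len(values) for k, values in table.items()}

def best_formats(table):
    """Response formats whose rounded mean percentage is the highest."""
    rounded = {k: round(m) for k, m in mean_by_format(table).items()}
    top = max(rounded.values())
    return sorted(k for k, value in rounded.items() if value == top)

print(best_formats(cm1))  # formats with the highest CM1 average
print(best_formats(cm2))  # formats with the highest CM2 average
```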
The objective of this work was to determine the extent to which manipulating the response
format of the items in the Eysenck Personality Questionnaire (Spanish version) would affect the
psychometric properties of the scales (reliability and validity). The modifications of the response
format consisted in applying the test in its usual dichotomous form to one sample of subjects,
and applying it to other samples after transforming this format into Likert-type formats with
three to nine categories. Apart from the intrinsic interest of determining the influence of the
format on the psychometric properties, we were also interested in analyzing the implications of
these changes for Eysenck’s personality model itself, originally constructed on the basis of
dichotomous data. The data obtained in a Spanish sample of 1149 subjects show that, in general,
the traditional dichotomous response format behaves worse, in psychometric terms, than the
Likert-type formats. The response formats that appear to function best are those of Likert type
from four categories onwards, thus confirming the results obtained by other authors with simulated
data (Bernstein & Teng, 1989; Bollen & Barb, 1981; García-Cueto, Muñiz, & Lozano, 2002).
Within the framework of item-response models, graded response models such as that of
Muraki (1990) fit the data better when between four and six response categories are used
(Hernández, Muñiz, & García-Cueto, 2000). Similarly, the Likert-type format produces good
results when compared with continuous response formats (Ferrando, 1995; Gregoire & Driver,
1987; Rasmussen, 1989; Tomás & Oliver, 1998).
Variability is lower for the case of two (dichotomous format) and three categories, increasing,
though not in a systematic way, as number of categories increases. Since variability influences
reliability and correlations, this fact will condition to some extent other psychometric indicators
such as the reliability and factorial structure of the items. When simulated data are used, variability
is lower for the case of three categories than for the dichotomous case, due probably to the
use of normal distributions of the responses, with the majority of scores accumulating in the
central category, resulting in a lowering of the variability (García-Cueto et al., 2002). However,
with real subjects, variability is greater, albeit slightly, for three categories than for two. From a
cognitive point of view it seems clear that the jump from two to three categories is not the same as
the jump from three to four or more.
As far as reliability is concerned, the data from other researchers are confirmed in the sense
that the neuroticism scale of Eysenck’s model shows better psychometric properties than the
other scales (Angleitner, John, & Löhr, 1986; Corulla, 1987; Ferrando, 2001; Helmes, 1980; Loo,
1995). The alpha coefficient of the neuroticism scale hovers around 0.90 from three categories
onwards, with small variations as a function of the number of categories of the scale used.
Reliability of the extraversion scale is around 0.80, with variations according to number of
categories of the scale, while for psychoticism somewhat lower values are obtained, between 0.62
and 0.83. The traditional dichotomous format seems to be less detrimental to the psychoticism
scale than to those of extraversion and neuroticism, which both improve when three categories or
more are used. This is especially true for the neuroticism scale, which goes from a reliability of
0.71 for the dichotomous case to hovering around 0.9 from three categories onwards (χ² = 37.84;
p < 0.05).
The neuroticism scale is found to be more one-dimensional than the other two, the first factor
explaining 27% of the total variance for the case of dichotomous format, and this percentage
rising to 37% when six response categories are used. These data are in the same line as those
obtained recently by Ferrando (2001), so that it can be asserted that the neuroticism scale is
essentially one-dimensional. In contrast to the case of neuroticism, the extraversion scale is more
one-dimensional for the case of dichotomous response, the first factor explaining 30% of the total
variance. This percentage decreases for the case of three to eight categories, rising again in the case
of nine (33%). The case of the psychoticism scale is quite striking: with a dichotomous format the
first factor explains 35% of the total variance, this percentage falling markedly when the number
of response categories increases. Everything appears to indicate that the classic dichotomous
format is more advantageous in terms of reliability and one-dimensionality for the psychoticism
scale than for those of neuroticism and extraversion.
Finally, we attempted to determine which type of item format provided the best results in
relation to Eysenck’s personality model (Eysenck, 1978; Eysenck et al., 1992; Eysenck & Eysenck,
1969; Moosbrugger & Fischbach, 2002; Van Hemert, van de Vijver, Poortinga, & Georgas, 2002).
As Table 4 shows, the use of seven response categories maximizes the percentage of items correctly
classified according to Eysenck’s model, with 94% correct assignments in the case of the first
classification method, and 86% using the second classification method. Using the traditional
dichotomous response, the percentages of correct classifications are lower (86% and 69%
respectively), so that everything seems to suggest that the Likert-type format of seven categories
would favor the confirmation of Eysenck’s model.
Although with some particular observations for each one of the scales considered, in general
our data support the idea that, from the psychometric point of view, the use of the Likert-type
format has more advantages than disadvantages in the case of personality questionnaires. Results
in the same line have been obtained by Bollen and Barb (1981) using simulated data. If to this we
add the finding that the people evaluated also prefer the Likert-type format (Velicer et al., 1984),
there remain few reasons for continuing to use the dichotomous format. In the present work we
have simply tried to add some empirical data on the behavior of those evaluated with different
formats; naturally, there remain many basic issues not dealt with here, such as the question of the
psychological processes involved in the change of format, especially with regard to the jump from
two to more categories. Nor, indeed, have we considered technical issues in relation to the conditions
required for a rigorous comparison of formats. In future investigations it would also be
desirable to include an external criterion in order to check if the validity coefficient reflects the
psychometric improvements observed when increasing the number of categories.
Acknowledgements
A previous version of this work was presented at the XXV International Congress of Applied
Psychology, Singapore, 7–12 July, 2002. This work was supported by a grant from the Spanish
Ministerio Español de Educación y Cultura, ref. PB97-1295.
References
Angleitner, A., John, O. P., & Löhr, F. J. (1986). It’s what you ask and how you ask it: an item metric analysis of
personality questionnaires. In A. Angleitner & J. S. Wiggins (Eds.), Personality assessment via questionnaires (pp.
61–107). Berlin: Springer-Verlag.
Bernstein, I. H., & Teng, G. (1989). Factoring items and factoring scales are different: spurious evidence for
multidimensionality due to item categorization. Psychological Bulletin, 105, 467–477.
Bollen, K. A., & Barb, H. (1981). Pearson’s R and coarsely categorized measures. American Sociological Review, 46,
232–239.
Chang, L. (1994). A psychometric evaluation of 4-point and 6-point Likert-type scales in relation to reliability and
validity. Applied Psychological Measurement, 18, 205–215.
Comrey, A. L., & Montag, I. (1982). Comparison of factor analytic results with two-choice and seven choice personality
item formats. Applied Psychological Measurement, 6, 285–289.
Corulla, W. J. (1987). A psychometric investigation of the Eysenck personality questionnaire (revised) and its
relationship to the I.7 impulsiveness questionnaire. Personality and Individual Differences, 8, 651–658.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Eysenck, H. J. (1978). Superfactors P, E, and N in a comprehensive factor space. Multivariate Behavioral Research, 13,
475–481.
Eysenck, H. J., Barrett, P., Wilson, G., & Jackson, C. (1992). Primary trait measurement of the 21 components of the P-
E-N system. European Journal of Psychological Assessment, 8, 109–117.
Eysenck, H. J., & Eysenck, S. B. G. (1969). Personality structure and measurement. London: Routledge.
Eysenck, H. J., & Eysenck, S. B. G. (1978). EPQ cuestionario de personalidad. Madrid: TEA.
Ferrando, P. J. (1995). Equivalencia entre los formatos Likert y continuo en items de personalidad: un estudio empírico
[Equivalence between Likert-type and continuous item formats in personality items: an empirical study]. Psicológica,
16, 417–428.
Ferrando, P. J. (1999). Likert scaling using continuous, censored, and graded response models: effects on criterion-
related validity. Applied Psychological Measurement, 23, 161–175.
Ferrando, P. J. (2000). Testing the equivalence among different item response formats in personality measurement: a
structural equation modeling approach. Structural Equation Modeling, 7, 271–286.
Ferrando, P. J. (2001). The measurement of neuroticism using MMQ, MPI, EPI and EPQ items: a psychometric
analysis based on item response theory. Personality and Individual Differences, 30, 641–656.
García-Cueto, E., Muñiz, J., & Lozano, L. M. (2002). Influencia del número de alternativas en las propiedades
psicométricas de los tests [Influence of the items format on the psychometric properties of tests] [Special issue].
Metodología de las Ciencias del Comportamiento, 201–205.
Gardner, D. G., Cummings, L. L., Dunham, R. B., & Pierce, J. L. (1998). Single-item versus multiple-item
measurement scales. An empirical comparison. Educational and Psychological Measurement, 58, 898–915.
Goldberg, L. R. (1981). Unconfounding situational attributions from uncertain, neutral, and ambiguous ones: a
psychometric analysis of descriptions of oneself and various types of others. Journal of Personality and Social
Psychology, 41, 517–552.
Gregoire, T. G., & Driver, B. L. (1987). Analysis of ordinal data to detect population differences. Psychological Bulletin,
101, 159–165.
Hakstian, A. R., & Whalen, T. E. (1976). A K-sample significance test for independent alpha coefficients.
Psychometrika, 41, 219–231.
Helmes, E. (1980). A psychometric investigation of the Eysenck personality questionnaire. Applied Psychological
Measurement, 4, 43–55.
Hernández, A., Muñiz, J., & García-Cueto, E. (2000). Comportamiento del modelo de respuesta graduada en función
del número de categorías de la escala [Effects of the scale number of categories on the Muraki graded response
model]. Psicothema, 12, 288–291.
Jacoby, J., & Matell, M. S. (1971). Three point Likert scales are good enough. Journal of Marketing Research, 8, 495–
500.
Loo, R. (1995). Validation of the EPQ Neuroticism subscales using a Japanese sample. Social Behavior and Personality,
23, 131–136.
Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Educational and
Psychological Measurement, 31, 657–674.
Maurer, T. J., & Andrews, K. D. (2000). Traditional, Likert, and simplified measures of self-efficacy. Educational and
Psychological Measurement, 60, 965–973.
Moosbrugger, H., & Fischbach, A. (2002). Evaluating the dimensionality of the Eysenck Personality Profiler––German
version (EPP-D): A contribution to the super three vs. big five discussion. Personality and Individual Differences, 33,
191–212.
Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement,
14, 59–71.
Oswald, W. L., & Velicer, W. F. (1980). Item format and the structure of the Eysenck Personality Inventory: a
replication. Journal of Personality Assessment, 44, 283–288.
Rasmussen, J. L. (1989). Analysis of Likert-scale data: a reinterpretation of Gregoire and Driver. Psychological Bulletin,
105, 167–170.
Tomás, J. M., & Oliver, A. (1998). Efectos de formato de respuesta y método de estimación en análisis factorial
confirmatorio [Response format and method of estimation effects on confirmatory factor analysis]. Psicothema, 10,
197–208.
Van Hemert, D. A., van de Vijver, F. J. R., Poortinga, Y. H., & Georgas, J. (2002). Structural and functional
equivalence of the Eysenck Personality Questionnaire within and between countries. Personality and Individual
Differences, 33, 1229–1250.
Velicer, W. F., DiClemente, C. C., & Corriveau, D. P. (1984). Item format and the structure of the Personal Orientation
Inventory. Applied Psychological Measurement, 8, 409–419.
Velicer, W. F., & Stevenson, J. F. (1978). The relation between item format and the structure of the Eysenck personality
Inventory. Applied Psychological Measurement, 2, 293–304.