Fuzzy vs. Likert Scale in Statistics
Abstract. Likert scales or associated codings are often used in connection with
opinions/valuations/ratings, and especially with questionnaires with a pre-specified
response format. A guideline for designing questionnaires that allow a free fuzzy-numbered
response format is given here; the scale of fuzzy numbers is very rich and expressive,
and it makes it possible to describe the usual answers in this context in a friendly way. A
review of some techniques for the statistical analysis of the obtained responses is
included, and a real-life example is used to illustrate the application.
1 Introduction
Likert scales are widely used to measure attributes often associated with opin-
ions/valuations/ratings, and so on, leading to ordinal/categorical data from a set of
pre-fixed labels/categories/names.
To facilitate the development of statistical data analysis in this setting, the usual
way to proceed is to code each response category by means of an integer number
(often by using either the scale 1-5 or the scale 1-7). More recently, some authors (see,
for instance, Lalla et al. [9], Lazim and Osman [10], Bharadwaj [2]) have suggested
identifying each Likert response category with a fuzzy subset from a class of operational
and flexible fuzzy sets which have been stated by ‘experts’ either individually
or by consensus.
María Ángeles Gil · Gil González-Rodríguez
University of Oviedo, 33071 Oviedo, Spain
e-mail: {magil,gil}@uniovi.es
Gil González-Rodríguez
European Centre for Soft Computing, 33600 Mieres, Spain
e-mail: gil.gonzalez@softcomputing.es
This paper has been written as a tribute to Professor Ebrahim Mamdani. We have had the
great opportunity of meeting a unique, outstanding person in recent years, mainly
because he was a member of the Scientific Committee of the European Centre for
Soft Computing. We have learned a lot from his lectures and conversations, and have
enjoyed the fruitful discussions around them, so we will always feel indebted to him.
E. Trillas et al. (Eds.): Combining Experimentation and Theory, STUDFUZZ 271, pp. 407–420.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2012
One of the key concerns for these approaches relates to the fact that the number
of different potential values of the attribute is small, whence many statistical developments
to be performed could be limited or even unfeasible, and the conclusions drawn from
them could sometimes be not very accurate.
In this respect, an alternative way to proceed (see, for instance, van Laerhoven et al.
[8]) is the one corresponding to the so-called simple visual analogue scale (or line
response option), in which the extreme answers of the Likert scale mark the ends of
a line and respondents are asked to mark the line with a cross somewhere between
both extremes that best reflects their answer. Therefore, there is no pre-specified list
of possible answers, and the questionnaire has a free response format.
Another, more general, alternative approach is explained in this paper. It takes
into account that, for the sake of realism, the nature of most of the attributes
concerning opinions/ratings/judgements involves subjectiveness and a certain
imprecision. In this way, the value of such an attribute for an individual is assumed
to be described by using a fuzzy set (usually a fuzzy number) fitting the perception
of the researcher/respondent without considering a pre-fixed list of answers. This
freedom in assigning values leads to a free (fuzzy-valued) response format enabling
a variability and accuracy which would not be captured when using either a
Likert scale or an associated real- or fuzzy-valued coding.
Whereas Likert scales or associated codings discretize concerned attributes into
a small number of potential values, the use of the free response format would allow
attributes to take either a large finite or infinite number of potential values. The
spirit of Statistics, as the science of variation, randomness and chance, would be
better captured by using this free response format than a Likert-like (or a coded
Likert-like) one.
Furthermore, the fuzzy scale is rich and expressive enough to find a value in it
that appropriately fits the valuation/opinion/rating involving subjective perceptions in
most real-life situations, even if we constrain ourselves to find it in some operational
classes of fuzzy sets, like trapezoidal, S- and Π-curves (see Eshragh and
Mamdani [3]).
On the other hand, to facilitate statistical data analysis, Likert response categories
are usually coded by consecutive integer values. This assignment has been
frequently criticized as unrealistic (cf. Wu [22]) because the integer numbers often cannot
reflect the real differences between scale categories.
In this paper, after presenting the preliminaries on the fuzzy scale which will be
most commonly used, a guideline is given to design questionnaires allowing a free
fuzzy-numbered response format, and to explain to non-expert users how to employ
this friendly and accurate approach.
Some real-life examples will illustrate the approach and a review will be given of
a methodology which is being developed to analyze fuzzy data in this setting. This
methodology is based on a versatile and intuitive distance with a meaning similar
to the one for real numbers. As a consequence, by combining the fuzzy scale with
this distance the usual concerns about the integer coding of Likert scale categories
are avoided, since distances between fuzzy numbers properly reflect the real differences
between the corresponding perceptions.
Figure 2 shows the answer supplied by one of the students to the question concerning
the motivation of a concrete course; this answer would indicate that for this
student the motivation of the course has not been lower than 75% (so, the 0-level
will be [75, 100]), and he/she considers that 80 to 90% are the values being fully
compatible with his/her opinion (so, the 1-level will be [80, 90]), these levels being
finally ‘interpolated’ by using a linear interpolation.
Fig. 2 Answer supplied by one of the students of a course about a given question (response axis graduated from 0% to 100%)
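The answer of Fig. 2 can be encoded as a trapezoidal fuzzy number determined by its 0-level [75, 100] and its 1-level [80, 90]. The following minimal sketch shows how the remaining levels follow from linear interpolation; the tuple layout `(a, b, c, d)` and the function names are illustrative assumptions, not notation taken from the paper.

```python
def alpha_level(trap, alpha):
    """Return the alpha-level interval (inf, sup) of a trapezoidal fuzzy number.

    trap = (a, b, c, d) is a hypothetical encoding: [a, d] is the 0-level,
    [b, c] is the 1-level, and intermediate levels are linearly interpolated.
    """
    a, b, c, d = trap
    if not a <= b <= c <= d:
        raise ValueError("expected a <= b <= c <= d")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (a + alpha * (b - a), d - alpha * (d - c))


def membership(trap, x):
    """Degree of compatibility of the value x with the fuzzy answer."""
    a, b, c, d = trap
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:                      # rising edge, here a <= x < b
        return (x - a) / (b - a)
    return (d - x) / (d - c)       # falling edge, here c < x <= d


# The student's answer described in the text: 0-level [75, 100], 1-level [80, 90].
answer = (75, 80, 90, 100)
print(alpha_level(answer, 0.0))    # the 0-level
print(alpha_level(answer, 1.0))    # the 1-level
print(alpha_level(answer, 0.5))    # an interpolated level
```

A respondent therefore only has to supply four numbers, and every other level is derived from them.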
This questionnaire has been designed for individual description. Since no pre-specified
list of categories has been considered in advance, students have considerable
freedom to express their valuation/rating accurately. Furthermore, as has been
pointed out in the Introduction, the variability of the collected data is definitely much
better captured in this way. To corroborate this assertion we can comment that, in
particular for the first examined course and question, only 4 coincidences have been
detected among the 29 students who have attended the course.
Broadly speaking, this fuzzy-valued questionnaire provides investigators
with richer information than traditional ones, and the freedom in the response
format leads to more interesting and powerful statistics.
and given a real number $\gamma$, the product of $U$ by the scalar $\gamma$ is defined as the fuzzy
number $\gamma \cdot U$ such that for each $\alpha \in [0, 1]$:
\[ (\gamma \cdot U)_\alpha = \gamma \cdot U_\alpha = \{ \gamma \cdot y : y \in U_\alpha \}. \]
The space $(\mathcal{F}_c(\mathbb{R}), +, \cdot)$ does not have a linear (but a semilinear-conical) structure, since
$U + (-1) \cdot U$ does not coincide in general with the indicator function of the singleton
$\{0\}$, but with a fuzzy number that is symmetrical w.r.t. 0.
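A short numerical illustration of this lack of linearity may help. The sketch below assumes trapezoidal fuzzy numbers stored as hypothetical tuples `(a, b, c, d)` (0-level [a, d], 1-level [b, c]); for such numbers the level-wise interval arithmetic reduces to operations on the four defining points.

```python
def add(u, v):
    """Level-wise (Minkowski) sum of two trapezoidal fuzzy numbers."""
    return tuple(x + y for x, y in zip(u, v))


def scale(gamma, u):
    """Product of the trapezoidal fuzzy number u by the real scalar gamma."""
    a, b, c, d = (gamma * x for x in u)
    # A negative scalar reverses the order of the interval end-points.
    return (a, b, c, d) if gamma >= 0 else (d, c, b, a)


u = (75, 80, 90, 100)
difference = add(u, scale(-1, u))
print(difference)   # symmetric about 0, but not the singleton {0}

crisp = (5, 5, 5, 5)
print(add(crisp, scale(-1, crisp)))   # only crisp values cancel exactly
```

The imprecision of `u` is doubled rather than cancelled, which is why a difference-free tool such as a distance is needed for the statistical analysis.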
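Let $\theta \in (0, +\infty)$ and let $\varphi$ be an absolutely continuous probability measure on
$([0, 1], \mathcal{B}_{[0,1]})$ with the mass function being positive in $(0, 1)$ and $\mathcal{B}_{[0,1]}$ being the
Borel $\sigma$-field on $[0, 1]$. From now on, and to guarantee the existence of the involved
distances, we will constrain ourselves to the wide subclass $\mathcal{F}_c^2(\mathbb{R})$ of fuzzy numbers $U$ for
which both $\int_{[0,1]} [\inf U_\alpha]^2 \, d\lambda(\alpha) < \infty$ and $\int_{[0,1]} [\sup U_\alpha]^2 \, d\lambda(\alpha) < \infty$.
Then, the mapping $D_\theta^\varphi : \mathcal{F}_c^2(\mathbb{R}) \times \mathcal{F}_c^2(\mathbb{R}) \to [0, +\infty)$ such that for $U, V \in \mathcal{F}_c^2(\mathbb{R})$
\[ D_\theta^\varphi(U, V) = \sqrt{ \int_{[0,1]} \left( \left[ \operatorname{mid} U_\alpha - \operatorname{mid} V_\alpha \right]^2 + \theta \cdot \left[ \operatorname{spr} U_\alpha - \operatorname{spr} V_\alpha \right]^2 \right) d\varphi(\alpha) } \]
with $U_\alpha^{[\nu]} = \nu \sup U_\alpha + (1 - \nu) \inf U_\alpha$ and $\lambda$ being the Lebesgue measure in
$([0, 1], \mathcal{B}_{[0,1]})$.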
More generally, if $\theta \in (0, 1]$, then (see Gil et al. [4], Trutschnig et al. [21]) there exists
a weighting measure $W$ formalized as a nondegenerate probability measure on
$([0, 1], \mathcal{B}_{[0,1]})$ with $\int_{[0,1]} \nu \, dW(\nu) = .5$ and $\theta = \int_{[0,1]} (2\nu - 1)^2 \, dW(\nu)$, such that
\[ D_\theta^\varphi(U, V) = \sqrt{ \int_{[0,1]} \int_{[0,1]} \left[ U_\alpha^{[\nu]} - V_\alpha^{[\nu]} \right]^2 dW(\nu) \, d\varphi(\alpha) }. \]
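The two expressions of the distance can be compared numerically. The sketch below is only illustrative and rests on several assumptions that are not in the text: fuzzy numbers are trapezoidal tuples `(a, b, c, d)`, mid and spr are read as the mid-point and half-width of each level interval, and both $\varphi$ and $W$ are taken to be the Lebesgue measure on [0, 1], for which $\theta = \int (2\nu - 1)^2 d\nu = 1/3$.

```python
import math


def level(trap, alpha):
    """(inf, sup) of the alpha-level of a trapezoidal fuzzy number."""
    a, b, c, d = trap
    return a + alpha * (b - a), d - alpha * (d - c)


def _int_sq(f0, f1):
    """Exact integral over [0, 1] of f(alpha)^2 for linear f with f(0)=f0, f(1)=f1."""
    return (f0 * f0 + f0 * f1 + f1 * f1) / 3.0


def dist_mid_spr(u, v, theta):
    """Mid/spread form of the distance (phi assumed to be Lebesgue)."""
    mid0 = ((u[0] + u[3]) - (v[0] + v[3])) / 2.0   # mid difference at alpha = 0
    mid1 = ((u[1] + u[2]) - (v[1] + v[2])) / 2.0   # mid difference at alpha = 1
    spr0 = ((u[3] - u[0]) - (v[3] - v[0])) / 2.0   # spread difference at alpha = 0
    spr1 = ((u[2] - u[1]) - (v[2] - v[1])) / 2.0   # spread difference at alpha = 1
    return math.sqrt(_int_sq(mid0, mid1) + theta * _int_sq(spr0, spr1))


def dist_weighted(u, v, n=200):
    """Weighted form with W = phi = Lebesgue, approximated by a midpoint rule."""
    total = 0.0
    for i in range(n):
        alpha = (i + 0.5) / n
        u_lo, u_hi = level(u, alpha)
        v_lo, v_hi = level(v, alpha)
        for j in range(n):
            nu = (j + 0.5) / n
            diff = (nu * u_hi + (1 - nu) * u_lo) - (nu * v_hi + (1 - nu) * v_lo)
            total += diff * diff
    return math.sqrt(total / (n * n))


u, v = (75, 80, 90, 100), (50, 60, 70, 80)
print(dist_mid_spr(u, v, 1.0 / 3.0), dist_weighted(u, v))

# For crisp values the distance reduces to the usual distance between reals.
print(dist_mid_spr((2, 2, 2, 2), (5, 5, 5, 5), 1.0 / 3.0))
```

Other choices of $\varphi$ or $W$ would need a different quadrature; the closed form used in `dist_mid_spr` is valid only because the level end-points of trapezoidal numbers are linear in $\alpha$.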
Random fuzzy sets (for short RFS) were introduced by Puri and Ralescu [19], as
a mathematical model for mechanisms associating a fuzzy value with each exper-
imental outcome and extending random variables and sets. The notion of random
fuzzy set (in the 1-dimensional case) can be introduced in some equivalent ways,
namely,
It should be pointed out that the Borel measurability of RFSs ensures that one can
properly refer in this setting to notions like the distribution induced by an RFS, the
stochastic independence of RFSs, and others that are required in the statistical
developments. As a consequence, most of the key ideas in statistical developments
can be preserved.
In the statistical analysis of fuzzy data two main types of summary measures/
parameters may be distinguished:
• fuzzy-valued summary measures, like the mean value of an RFS as a measure for
the central tendency of its values;
• real-valued summary measures, like the Fréchet-variance of an RFS as measures
for the mean error/dispersion of the values of the RFS.
The mean value of an RFS can be presented in two equivalent ways, either as
an extension of the set-valued Aumann expectation (see Puri and Ralescu [19]) or
level-wise in terms of the mids and spreads (as well as induced from the expectation
of a Hilbert space-valued random element). Thus,
The mean value of an RFS satisfies the usual properties of linearity. Thus,
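Proposition 1. If $\gamma \in \mathbb{R}$, $U \in \mathcal{F}_c^2(\mathbb{R})$ and $\mathcal{X}$, $\mathcal{Y}$ are RFSs associated with $(\Omega, \mathcal{A}, P)$
and such that $\max\{|\inf \mathcal{X}_\alpha|, |\sup \mathcal{X}_\alpha|\}$, $\max\{|\inf \mathcal{Y}_\alpha|, |\sup \mathcal{Y}_\alpha|\} \in L^1(\Omega, \mathcal{A}, P)$ for
all $\alpha \in (0, 1]$, then
i) $E(\gamma \cdot \mathcal{X} + U) = \gamma \cdot E(\mathcal{X}) + U$.
ii) $E(\mathcal{X} + \mathcal{Y}) = E(\mathcal{X}) + E(\mathcal{Y})$.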
Furthermore, the mean value of an RFS is coherent with the fuzzy arithmetic and it
is the Fréchet expectation w.r.t. $D_\theta^\varphi$, which corroborates the fact that it is a central
tendency measure. Thus,
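Proposition 2. The mean value of an RFS satisfies that
i) if $\mathcal{X}$ is an RFS associated with the probability space $(\Omega, \mathcal{A}, P)$ and
such that the set of the RFS values is finite or countable, that is, $\mathcal{X}(\Omega)
= \{x_1, \ldots, x_m, \ldots\} \subset \mathcal{F}_c^2(\mathbb{R})$, then
\[ E(\mathcal{X}) = P(\{\omega \in \Omega : \mathcal{X}(\omega) = x_1\}) \cdot x_1 + \ldots + P(\{\omega \in \Omega : \mathcal{X}(\omega) = x_m\}) \cdot x_m + \ldots; \]
ii) it is the fuzzy number leading to the lowest mean squared $D_\theta^\varphi$-distance (or error)
w.r.t. the RFS values, i.e.,
\[ E\left( \left[ D_\theta^\varphi(\mathcal{X}, E(\mathcal{X})) \right]^2 \right) = \min_{U \in \mathcal{F}_c^2(\mathbb{R})} E\left( \left[ D_\theta^\varphi(\mathcal{X}, U) \right]^2 \right). \]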
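Both parts of Proposition 2 can be illustrated for an RFS taking finitely many trapezoidal values. This is a minimal sketch under stated assumptions: the tuple encoding `(a, b, c, d)`, the choice $\theta = 1/3$ with $\varphi$ the Lebesgue measure, and all function names are illustrative and not taken from the paper.

```python
def _int_sq(f0, f1):
    """Exact integral over [0, 1] of the square of a linear function."""
    return (f0 * f0 + f0 * f1 + f1 * f1) / 3.0


def sq_dist(u, v, theta=1.0 / 3.0):
    """Squared mid/spread distance between trapezoidal fuzzy numbers."""
    mid0 = ((u[0] + u[3]) - (v[0] + v[3])) / 2.0
    mid1 = ((u[1] + u[2]) - (v[1] + v[2])) / 2.0
    spr0 = ((u[3] - u[0]) - (v[3] - v[0])) / 2.0
    spr1 = ((u[2] - u[1]) - (v[2] - v[1])) / 2.0
    return _int_sq(mid0, mid1) + theta * _int_sq(spr0, spr1)


def mean_value(values, probs):
    """Part i): probability-weighted fuzzy sum of the values of the RFS."""
    return tuple(sum(p * x[k] for p, x in zip(probs, values)) for k in range(4))


def mean_sq_error(values, probs, candidate):
    """Expected squared distance between the RFS and a candidate fuzzy number."""
    return sum(p * sq_dist(x, candidate) for p, x in zip(probs, values))


values = [(0, 1, 2, 3), (2, 3, 4, 5), (4, 6, 6, 8)]
probs = [0.5, 0.25, 0.25]
mean = mean_value(values, probs)
print(mean)

# Part ii): the mean should not be beaten by other candidates (here, the values).
for candidate in values:
    print(mean_sq_error(values, probs, mean) <= mean_sq_error(values, probs, candidate))
```

Checking a handful of candidates is of course no proof of the minimizing property; it only shows what the statement means in practice.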
On the other hand, in formalizing the variance of an RFS, Fréchet's approach
has been considered (see Lubiano et al. [12], Körner and Näther [14], Ramos-Guajardo et
al. [20]). In this approach the variance is conceived as a measure of the ‘error’ in
approximating the values of the RFS through the corresponding mean value, this
error being quantified in terms of a squared metric. In this way,
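Definition 3. Given a probability space $(\Omega, \mathcal{A}, P)$ and an associated RFS $\mathcal{X}$ such
that $\max\{|\inf \mathcal{X}_\alpha|, |\sup \mathcal{X}_\alpha|\} \in L^2(\Omega, \mathcal{A}, P)$ for all $\alpha \in (0, 1]$, the $(\theta, \varphi)$-Fréchet
variance of $\mathcal{X}$ is the real number given by any of the following statements
• $\sigma_{\mathcal{X}}^2 = E\left( \left[ D_\theta^\varphi(\mathcal{X}, E(\mathcal{X})) \right]^2 \right)$,
• $\sigma_{\mathcal{X}}^2 = \mathrm{Var}(\operatorname{mid} \mathcal{X}) + \theta \, \mathrm{Var}(\operatorname{spr} \mathcal{X})$.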
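The agreement of the two statements in Definition 3 can be checked numerically for a finitely-valued RFS. The sketch assumes trapezoidal values stored as hypothetical tuples `(a, b, c, d)` and $\varphi$ equal to the Lebesgue measure; it also reads Var(mid X) as the expected squared $L^2(\varphi)$-distance between the function $\alpha \mapsto \operatorname{mid} \mathcal{X}_\alpha$ and its mean, which is an interpretation of the notation rather than something the text spells out.

```python
def _int_sq(f0, f1):
    """Exact integral over [0, 1] of the square of a linear function."""
    return (f0 * f0 + f0 * f1 + f1 * f1) / 3.0


def mid_spr(t):
    """Mid and spread of the 0-level and of the 1-level of a trapezoid."""
    a, b, c, d = t
    return (a + d) / 2.0, (b + c) / 2.0, (d - a) / 2.0, (c - b) / 2.0


def variance_by_distance(values, probs, theta):
    """First statement: expected squared distance to the fuzzy mean."""
    mean = tuple(sum(p * x[k] for p, x in zip(probs, values)) for k in range(4))
    m = mid_spr(mean)
    total = 0.0
    for p, x in zip(probs, values):
        f = mid_spr(x)
        sq = _int_sq(f[0] - m[0], f[1] - m[1]) + theta * _int_sq(f[2] - m[2], f[3] - m[3])
        total += p * sq
    return total


def variance_by_mid_spr(values, probs, theta):
    """Second statement: Var(mid X) + theta * Var(spr X) (assumed reading)."""
    feats = [mid_spr(x) for x in values]
    avg = [sum(p * f[k] for p, f in zip(probs, feats)) for k in range(4)]
    var_mid = sum(p * _int_sq(f[0] - avg[0], f[1] - avg[1]) for p, f in zip(probs, feats))
    var_spr = sum(p * _int_sq(f[2] - avg[2], f[3] - avg[3]) for p, f in zip(probs, feats))
    return var_mid + theta * var_spr


values = [(0, 1, 2, 3), (2, 3, 4, 5), (4, 6, 6, 8)]
probs = [0.5, 0.25, 0.25]
print(variance_by_distance(values, probs, 1.0 / 3.0))
print(variance_by_mid_spr(values, probs, 1.0 / 3.0))
```

The first function averages fuzzy values before extracting mids and spreads, whereas the second averages the mids and spreads themselves; the two agree because mid and spr are linear under the fuzzy arithmetic with nonnegative weights.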
The (θ , ϕ )-Fréchet variance of an RFS satisfies the usual properties for this concept.
In this way,
Proposition 3. $\sigma_{\mathcal{X}}^2 \geq 0$ with $\sigma_{\mathcal{X}}^2 = 0$ if, and only if, there exists $U \in \mathcal{F}_c^2(\mathbb{R})$ such
that almost surely $\mathcal{X} = U$.
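Proposition 4. If $\gamma \in \mathbb{R}$, $U \in \mathcal{F}_c^2(\mathbb{R})$ and $\mathcal{X}$, $\mathcal{Y}$ are two independent RFSs associated
with the probability space $(\Omega, \mathcal{A}, P)$ and such that $\max\{|\inf \mathcal{X}_\alpha|, |\sup \mathcal{X}_\alpha|\}$,
$\max\{|\inf \mathcal{Y}_\alpha|, |\sup \mathcal{Y}_\alpha|\} \in L^2(\Omega, \mathcal{A}, P)$, then
i) $\sigma_{\gamma \cdot \mathcal{X} + U}^2 = \gamma^2 \cdot \sigma_{\mathcal{X}}^2$.
ii) $\sigma_{\mathcal{X} + \mathcal{Y}}^2 = \sigma_{\mathcal{X}}^2 + \sigma_{\mathcal{Y}}^2$.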
a simple random sample (i.e., independent RFSs being identically distributed as the
one to be analyzed), $(\mathcal{X}_1, \ldots, \mathcal{X}_n)$ from the RFS $\mathcal{X}$, methods have been suggested
to test the null ‘two-sided’ hypothesis
\[ H_0 : E(\mathcal{X}) = U \in \mathcal{F}_c^2(\mathbb{R}) \quad \text{(equality of fuzzy numbers)}, \]
which is equivalent to
\[ H_0 : D_\theta^\varphi\left( E(\mathcal{X}), U \right) = 0 \quad \text{(equality of real numbers)}. \]
An exact test for ‘normal’ RFSs (in Puri and Ralescu’s sense [18]) has been developed
(Montenegro et al. [16]). Although it is an exact and easy-to-apply method, $\mathcal{X}$
being normal in Puri and Ralescu’s sense (i.e., $\mathcal{X} = V + N(0, 1)$ with $V \in \mathcal{F}_c^2(\mathbb{R})$)
is quite restrictive and unrealistic.
On the other hand, asymptotic tests for general RFSs have also been introduced
(see Körner [13], Montenegro et al. [16]). Although this is a general method based
on the Central Limit Theorem for Banach space-valued random elements, and it
is rather easy to apply when $\mathcal{X}$ takes on a finite number of different values, the
asymptotic distribution of the statistic usually involves unknown parameters, and
large sample sizes are required. Moreover, simulation studies have shown that estimating
either the eigenvalues or the covariance operator entails a substantial loss of
precision w.r.t. the nominal significance level.
Taking these concerns into account, the use of $D_\theta^\varphi$ has been combined with that
of the Generalized Bootstrapped Central Limit Theorem by Giné and Zinn, allowing
us to consider bootstrap techniques in this context. Thus, in González-Rodríguez et
al. [7] a bootstrap approximation to the asymptotic test has been presented. The
algorithm summarizing the steps to be followed to apply such a test is the following
one:
where
\[ S_n^2(\text{sample}) = \sum_{i=1}^{n} \left[ D_\theta^\varphi\left( x_i, \tfrac{1}{n} \cdot [x_1 + \ldots + x_n] \right) \right]^2 \Big/ (n - 1) \]
S2. Fix the bootstrap population to be the above realization of the simple random
sample
S3. Obtain a realization of the simple random sample (X1∗ , . . . , Xn∗ ) from the boot-
strap population
S5. Steps S3 and S4 should be repeated a large number $B$ of times to get a set of $B$
estimates, denoted by $\{T_n^{*(1)}, \ldots, T_n^{*(B)}\}$
S6. Compute the bootstrap p-value as the proportion of values in $\{T_n^{*(1)}, \ldots, T_n^{*(B)}\}$
being greater than $T_n(\text{sample})$
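The resampling scheme of steps S2 to S6 can be sketched as follows. The exact form of the statistics in steps S1 and S4 is not reproduced above, so the sketch assumes a studentized form: the squared distance between the sample mean and the reference fuzzy number, divided by the sample quasi-variance $S_n^2$. That form, the trapezoidal encoding `(a, b, c, d)`, the value $\theta = 1/3$ with $\varphi$ the Lebesgue measure, and all names are assumptions made for illustration only.

```python
import math
import random

THETA = 1.0 / 3.0   # assumed weight of the spread term


def _int_sq(f0, f1):
    """Exact integral over [0, 1] of the square of a linear function."""
    return (f0 * f0 + f0 * f1 + f1 * f1) / 3.0


def sq_dist(u, v, theta=THETA):
    """Squared mid/spread distance between trapezoidal fuzzy numbers."""
    mid0 = ((u[0] + u[3]) - (v[0] + v[3])) / 2.0
    mid1 = ((u[1] + u[2]) - (v[1] + v[2])) / 2.0
    spr0 = ((u[3] - u[0]) - (v[3] - v[0])) / 2.0
    spr1 = ((u[2] - u[1]) - (v[2] - v[1])) / 2.0
    return _int_sq(mid0, mid1) + theta * _int_sq(spr0, spr1)


def sample_mean(sample):
    n = len(sample)
    return tuple(sum(x[k] for x in sample) / n for k in range(4))


def quasi_variance(sample):
    """S_n^2: sum of squared distances to the sample mean, divided by n - 1."""
    xbar = sample_mean(sample)
    return sum(sq_dist(x, xbar) for x in sample) / (len(sample) - 1)


def statistic(sample, target):
    """Assumed studentized statistic (hypothetical form of S1 and S4)."""
    s2 = quasi_variance(sample)
    num = sq_dist(sample_mean(sample), target)
    if s2 == 0.0:                       # degenerate resample
        return math.inf if num > 0.0 else 0.0
    return num / s2


def bootstrap_p_value(sample, u0, n_boot=1000, seed=0):
    rng = random.Random(seed)
    t_obs = statistic(sample, u0)       # S1 (assumed form)
    xbar = sample_mean(sample)          # S2: the observed sample is the population
    exceed = 0
    for _ in range(n_boot):             # S5: repeat S3 and S4
        boot = [rng.choice(sample) for _ in sample]   # S3: resample with replacement
        if statistic(boot, xbar) > t_obs:             # S4 (assumed form)
            exceed += 1
    return exceed / n_boot              # S6: proportion exceeding the observed value


sample = [(50, 60, 70, 80), (34, 40, 41, 46), (21, 23, 34, 40), (70, 80, 90, 100),
          (75, 80, 90, 100), (52, 60, 60, 64), (10, 30, 40, 60), (65, 70, 70, 75)]
print(bootstrap_p_value(sample, (50, 60, 70, 80), n_boot=500, seed=1))
```

In the bootstrap world the resampled mean is compared with the observed sample mean rather than with the hypothesised value, so that the null hypothesis holds for the bootstrap population by construction.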
Comparative simulation studies have been carried out, showing that for small/
medium samples, the bootstrap method performs and behaves usually much better
than the asymptotic one, and for large sample sizes (over 300), the improvement is
not that remarkable, but the bootstrap approach still provides the best approximation
to the nominal significance level. It should be also emphasized that the probability of
rejecting the null hypothesis under alternative assumptions converges to 1 as n → ∞
(i.e., both the asymptotic and the bootstrap tests are consistent).
The application of the bootstrapped one-sample test is now illustrated.
Table 1 Fuzzy answers to the motivation of a given course of 29 students attending the II
Summer School of the ECSC (one column per student, i = 1, ..., 29)
inf (x_i)_0 | 50 34 21 70 50 75 70 52 50 60 80 10 65 20 60 44 60 50 60 90 56 30 10 60 70 80 55 70 69
inf (x_i)_1 | 60 40 23 80 60 80 74 60 55 70 90 30 70 30 70 47 70 60 67 100 60 40 20 65 76 90 65 80 100
sup (x_i)_1 | 70 41 34 90 70 90 86 60 60 80 90 40 70 30 70 53 80 70 72 100 64 40 20 75 84 90 74 100 100
sup (x_i)_0 | 80 46 40 100 80 100 90 64 70 90 100 60 75 40 80 71 90 80 80 100 70 50 30 80 90 100 80 100 100
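As a small usage example, the data of Table 1 can be loaded and summarised by their sample fuzzy mean. The sketch assumes that each answer is the trapezoidal fuzzy number determined by its 0-level and 1-level (as in the description of Fig. 2), so that the fuzzy average reduces to averaging each row of the table; the variable names are illustrative.

```python
# Rows of Table 1, one entry per student.
INF_0 = [50, 34, 21, 70, 50, 75, 70, 52, 50, 60, 80, 10, 65, 20, 60,
         44, 60, 50, 60, 90, 56, 30, 10, 60, 70, 80, 55, 70, 69]
INF_1 = [60, 40, 23, 80, 60, 80, 74, 60, 55, 70, 90, 30, 70, 30, 70,
         47, 70, 60, 67, 100, 60, 40, 20, 65, 76, 90, 65, 80, 100]
SUP_1 = [70, 41, 34, 90, 70, 90, 86, 60, 60, 80, 90, 40, 70, 30, 70,
         53, 80, 70, 72, 100, 64, 40, 20, 75, 84, 90, 74, 100, 100]
SUP_0 = [80, 46, 40, 100, 80, 100, 90, 64, 70, 90, 100, 60, 75, 40, 80,
         71, 90, 80, 80, 100, 70, 50, 30, 80, 90, 100, 80, 100, 100]

# One trapezoid (inf 0-level, inf 1-level, sup 1-level, sup 0-level) per student.
answers = list(zip(INF_0, INF_1, SUP_1, SUP_0))


def sample_mean(sample):
    """Fuzzy average (1/n) * [x_1 + ... + x_n] of trapezoidal fuzzy numbers."""
    n = len(sample)
    return tuple(sum(x[k] for x in sample) / n for k in range(4))


mean_answer = sample_mean(answers)
print(len(answers), mean_answer)
```

The resulting four numbers define a trapezoidal fuzzy number again, which can be plotted on the same 0% to 100% axis as the individual answers.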
In a similar way, two-sample and multi-sample tests for both independent and
linked samples have been developed, the bootstrap approach being in general
the most appropriate one. An R package called SAFD (Statistical Analysis of
Fuzzy Data) has recently been designed by Lubiano and Trutschnig to perform
computations with RFSs. This package includes most of the procedures discussed
in Subsection 4.3 and is being periodically updated. It can be found at
http://bellman.ciencias.uniovi.es/SMIRE/SAFDpackage.html.
5 Concluding Remarks
The use of free response fuzzy-numbered formats instead of Likert ones (or
alternative real- or fuzzy-valued codings) to answer questions related to
valuations/opinions/ratings involving some subjectiveness has been discussed in this paper.
A relevant advantage of this approach is that the suggested format captures
the accuracy, subjectiveness and variability of answers much better, whence their
statistical analysis becomes more interesting. Actually, this analysis can be carried out
through recent inferential developments which have been briefly discussed.
An open direction that could be thought about is the one combining the free
response and summary answers with the ideas by Eshragh and Mamdani [3] in case
a linguistic interpretation is needed, although originally the suggested format would
not force users to take this combination into account.
Acknowledgements. This research has been partially supported by/benefited from the Span-
ish Ministry of Science and Innovation Grants MTM2009-09440-C02-01 and MTM2009-
09440-C02-02, the Principality of Asturias Grants IB09-042C1 and IB09-042C2, and the
COST Action IC0702. Their financial support is gratefully acknowledged.
References
1. Bertoluzza, C., Corral, N., Salas, A.: On a new class of distances between fuzzy numbers.
Math. & Soft Comput. 2, 71–84 (1995)
2. Bharadwaj, B.: Development of a fuzzy Likert scale for the WHO ICF to include categor-
ical definitions on the basis of a continuum. ETD Collection for Wayne State University.
Paper AAI1442894 (2007),
http://digitalcommons.wayne.edu/dissertations/AAI1442894
3. Eshragh, F., Mamdani, E.H.: A general approach to linguistic approximation. Int. J. Man-
Machine Studies 11, 501–519 (1979)
4. Gil, M.A., Lubiano, M.A., Montenegro, M., López-García, M.T.: Least squares fitting of
an affine function and strength of association for interval data. Metrika 56, 97–111 (2002)
5. Giné, E., Zinn, J.: Bootstrapping general empirical measures. Ann. Probab. 18, 851–869
(1990)
6. González-Rodríguez, G., Colubi, A., Gil, M.A.: Fuzzy data treated as functional
data. A one-way ANOVA test approach. Comp. Statist. Data Anal. (2011) (in press)
doi:10.1016/j.csda.2010.06.013
7. González-Rodríguez, G., Montenegro, M., Colubi, A., Gil, M.A.: Bootstrap techniques
and fuzzy random variables: Synergy in hypothesis testing with fuzzy data. Fuzzy Sets
and Systems 157, 2608–2613 (2006)
8. van Laerhoven, H., van der Zaag-Loonen, H.J., Derkx, B.H.F.: A comparison of Likert
scale and visual analogue scales as response options in children's questionnaires. Acta
Pædiatr 93, 830–835 (2004)
9. Lalla, M., Facchinetti, G., Mastroleo, G.: Ordinal scales and fuzzy set systems to measure
agreement: an application to the evaluation of teaching activity. Quality & Quantity 38,
577–601 (2004)
10. Lazim, M.A., Osman, M.T.A.: Measuring teachers’ beliefs about Mathematics: a fuzzy
set approach. Int. J. Soc. Sci. 4(1), 39–43 (2009)
11. Lubiano, M.A., Gil, M.A.: Estimating the expected value of fuzzy random variables in
random samplings from finite populations. Statistical Papers 40(3), 277–295 (1999)
12. Lubiano, M.A., Gil, M.A., López-Díaz, M., López, M.T.: The lambda-mean squared
dispersion associated with a fuzzy random variable. Fuzzy Sets and Systems 111(3),
307–317 (2000)
13. Körner, R.: An asymptotic α-test for the expectation of random fuzzy variables. J. Stat.
Plann. Infer. 83, 331–346 (2000)
14. Körner, R., Näther, W.: On the variance of random fuzzy variables. In: Bertoluzza, C.,
Gil, M.A., Ralescu, D.A. (eds.) Statistical Modeling, Analysis and Management of Fuzzy
Data, pp. 22–39. Physica-Verlag, Heidelberg (2002)
15. Montenegro, M., Casals, M.R., Lubiano, M.A., Gil, M.A.: Two-sample hypothesis tests
of means of a fuzzy random variable. Information Sciences 133(1-2), 89–100 (2001)
16. Montenegro, M., Colubi, A., Casals, M.R., Gil, M.A.: Asymptotic and Bootstrap tech-
niques for testing the expected value of a fuzzy random variable. Metrika 59, 31–49 (2004)
17. Nguyen, H.T.: A note on the extension principle for fuzzy sets. J. Math. Anal. Appl. 64,
369–380 (1978)
18. Puri, M.L., Ralescu, D.A.: The concept of normality for fuzzy random variables. Ann.
Probab. 11, 1373–1379 (1985)
19. Puri, M.L., Ralescu, D.A.: Fuzzy random variables. J. Math. Anal. Appl. 114, 409–422
(1986)
20. Ramos-Guajardo, A.B., Colubi, A., González-Rodríguez, G., Gil, M.A.: One sample
tests for a generalized Fréchet variance of a fuzzy random variable. Metrika 71(2),
185–202 (2010)
21. Trutschnig, W., González-Rodríguez, G., Colubi, A., Gil, M.A.: A new family of metrics
for compact, convex (fuzzy) sets based on a generalized concept of mid and spread. Inform. Sci. 179,
3964–3972 (2009)
22. Wu, C.-H.: An empirical study on the transformations of Likert-scale data to numerical
scores. Appl. Math. Sci. 58(1), 2851–2862 (2007)
23. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning.
Part 1. Inform. Sci. 8, 199–249 (1975); Part 2. Inform. Sci. 8, 301–353; Part 3.
Inform. Sci. 9, 43–80