Instrumentation for Sensory Measurements
“Is hypothesis testing really important? A more important issue than that concerning possible sensory
differences is the size and importance of these differences.”
Decision Rules
The second main assumption made by Thurstonian scaling is that subjects compare the percepts in a
systematic fashion when making their decisions in a sensory difference test. For example, in a 2‐AFC test,
a subject will respond that Sample A is less intense than Sample B if the percept corresponding to Sample
A is less intense than the percept corresponding to Sample B. If Sample A is in fact less intense on
average than Sample B, this response is correct. Such a case occurs if, for example, the percepts a₁ and b
shown in Fig. 1 arise. If the percept a₂ arises for Sample A instead, the subject will give an
incorrect answer. Because these percepts take momentary values drawn from normal distributions, the
probability of a correct response can be computed as a function of δ.
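To make the 2‐AFC case concrete, here is a minimal Python sketch, assuming the usual Thurstonian setup of Fig. 1: unit‐variance normal percepts with mean 0 for Sample A and mean δ for Sample B. It computes the probability of a correct response in two ways: by simulating the decision rule just described, and from the closed form Φ(δ/√2), which follows because the difference b − a of two independent unit‐variance percepts is normal with mean δ and variance 2. The function names are illustrative, not taken from any library.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pc_2afc(delta: float) -> float:
    """2-AFC psychometric function: b - a ~ N(delta, 2), so P(b > a) = Phi(delta / sqrt(2))."""
    return norm_cdf(delta / math.sqrt(2.0))


def simulate_2afc(delta: float, trials: int = 100_000, seed: int = 1) -> float:
    """Estimate P(correct) by applying the 2-AFC decision rule to simulated percepts.

    Percept a ~ N(0, 1) for Sample A and b ~ N(delta, 1) for Sample B.
    The subject calls the sample with the larger percept 'more intense',
    so the response is correct exactly when b > a.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(delta, 1.0)
        if b > a:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0, 2.0):
        print(f"delta={delta:.1f}  closed form={pc_2afc(delta):.4f}  "
              f"simulated={simulate_2afc(delta):.4f}")
```

With δ = 0 both numbers sit near 0.5, the guessing rate; as δ grows, the simulated and closed‐form values rise together toward 1.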
Figure 1
Possible Percepts in the 2‐Alternative Forced‐Choice (2‐AFC) Test
Similarly, for each difference testing method, the relationship between δ and the probability of a correct
response can be deduced once a decision rule has been specified. This relationship is called a
psychometric function (cf. Ennis 1993). Psychometric functions have been derived for the 2‐AFC
(Thurstone 1927), 3‐AFC (Elliot 1964), 4‐AFC (Bi et al. 2010), Triangle (Ura 1960; David and
Trivedi 1962; Bradley 1963), Duo‐Trio (David and Trivedi 1962) and the Specified and Unspecified
Tetrad (Ennis et al. 1998) tests, and have been approximated for the Two‐Out‐of‐Five test (Ennis 2013).
In addition, Bi and O'Mahony (2013) have provided simplified expressions for the psychometric
functions of both the Triangle and Unspecified Tetrad tests.
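As an illustration, the Python sketch below evaluates four of these psychometric functions numerically, assuming unit‐variance percepts. The expressions are the commonly cited Thurstonian forms (a closed form for the 2‐AFC and Duo‐Trio, a single integral for the 3‐AFC, and the Ura / David and Trivedi integral for the Triangle); the helper names and the use of Simpson's rule are our own illustrative choices, not the method used to produce the figures in this article.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n panels, made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def pc_2afc(d: float) -> float:
    # Correct if the B percept exceeds the A percept.
    return norm_cdf(d / math.sqrt(2.0))


def pc_3afc(d: float) -> float:
    # Correct if the B percept exceeds both A percepts:
    # integral over z of phi(z - d) * Phi(z)^2.
    return simpson(lambda z: norm_pdf(z - d) * norm_cdf(z) ** 2, d - 10.0, d + 10.0)


def pc_duotrio(d: float) -> float:
    # Closed form for the Duo-Trio test.
    p2 = norm_cdf(d / math.sqrt(2.0))
    p6 = norm_cdf(d / math.sqrt(6.0))
    return 1.0 - p2 - p6 + 2.0 * p2 * p6


def pc_triangle(d: float) -> float:
    # Triangle test: 2 * integral over z >= 0 of
    # [Phi(-z*sqrt(3) + d*sqrt(2/3)) + Phi(-z*sqrt(3) - d*sqrt(2/3))] * phi(z).
    s3 = math.sqrt(3.0)
    k = d * math.sqrt(2.0 / 3.0)

    def integrand(z: float) -> float:
        return (norm_cdf(-z * s3 + k) + norm_cdf(-z * s3 - k)) * norm_pdf(z)

    return 2.0 * simpson(integrand, 0.0, 10.0)


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(f"d={d:.1f}  2-AFC={pc_2afc(d):.3f}  3-AFC={pc_3afc(d):.3f}  "
              f"Duo-Trio={pc_duotrio(d):.3f}  Triangle={pc_triangle(d):.3f}")
```

At d = 0 each function returns its guessing probability (1/2 for the 2‐AFC and Duo‐Trio, 1/3 for the 3‐AFC and Triangle), which is a quick sanity check for any implementation.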
Differences in decision rules explain why some difference tests lead to more correct answers than others
(Frijters 1979) – some decision rules are more efficient than others and are less easily influenced by noise
in the percepts. For example, Jesionka et al. (2014) illustrate, through a case‐by‐case analysis, why the
3‐AFC leads to more correct answers than the Triangle test.
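The effect of the decision rule can also be seen directly by simulation. The Python sketch below assumes unit‐variance normal percepts, with two samples of A (mean 0) and one sample of B (mean δ), and applies two rules to the same simulated triplets: the 3‐AFC rule (choose the most intense sample) and the comparison‐of‐distances rule usually assumed for the Triangle test (the two closest percepts are judged to be the same, so the remaining one is chosen as the odd sample). The function name and trial counts are illustrative.

```python
import random


def simulate_triplets(delta: float, trials: int = 100_000, seed: int = 7):
    """Return (triangle_pc, three_afc_pc) for the same simulated percept triplets.

    Each triplet has percepts a1, a2 ~ N(0, 1) and b ~ N(delta, 1).
    """
    rng = random.Random(seed)
    triangle_correct = 0
    three_afc_correct = 0
    for _ in range(trials):
        a1 = rng.gauss(0.0, 1.0)
        a2 = rng.gauss(0.0, 1.0)
        b = rng.gauss(delta, 1.0)
        # 3-AFC rule: pick the most intense percept; correct if it is b.
        if b > a1 and b > a2:
            three_afc_correct += 1
        # Triangle rule: the closest pair is judged 'the same'; the answer is
        # correct if that pair is (a1, a2), leaving b as the odd sample.
        if abs(a1 - a2) < min(abs(a1 - b), abs(a2 - b)):
            triangle_correct += 1
    return triangle_correct / trials, three_afc_correct / trials


if __name__ == "__main__":
    for delta in (0.0, 1.0, 2.0):
        tri, afc = simulate_triplets(delta)
        print(f"delta={delta:.1f}  Triangle={tri:.3f}  3-AFC={afc:.3f}")
```

With δ = 0 both rules are correct about one third of the time, but at δ = 1 the Triangle rule is correct roughly 42% of the time and the 3‐AFC rule roughly 63% of the time, even though the underlying percepts are identical.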
Recently, investigations have been conducted into techniques to encourage subjects to employ more
efficient decision rules (van Hout et al. 2011; M. Kim et al. 2012). A challenge for this line of research is
that, unless the decision rule each subject actually used is known, Thurstonian theory cannot be used
to transform the results into d′ values. Through the skillful application of signal detection theory
(Hautus et al. 2007; O'Mahony and Hautus 2008; Wichchukit and O'Mahony 2010), it may be possible to
meet this challenge, but to our knowledge it has not yet been fully met.
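To see why the decision rule matters for this transformation, consider a hypothetical result in which 55% of responses to three‐sample sets are correct. The Python sketch below, assuming the commonly cited Triangle and 3‐AFC psychometric functions with unit‐variance percepts, inverts each function by bisection. The 55% figure and all helper names are illustrative assumptions, not data from the article.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule over [a, b] with an even number of panels."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def pc_3afc(d: float) -> float:
    # 3-AFC rule: choose the most intense of the three samples.
    return simpson(lambda z: norm_pdf(z - d) * norm_cdf(z) ** 2, d - 10.0, d + 10.0)


def pc_triangle(d: float) -> float:
    # Triangle rule: the two closest percepts are judged the same.
    s3, k = math.sqrt(3.0), d * math.sqrt(2.0 / 3.0)
    return 2.0 * simpson(
        lambda z: (norm_cdf(-z * s3 + k) + norm_cdf(-z * s3 - k)) * norm_pdf(z), 0.0, 10.0)


def invert(pc_fun, p_obs: float, hi: float = 6.0, tol: float = 1e-6) -> float:
    """Find d in [0, hi] with pc_fun(d) = p_obs by bisection.

    Assumes pc_fun is increasing on [0, hi]; returns 0 if p_obs is at or
    below the guessing rate pc_fun(0).
    """
    lo = 0.0
    if p_obs <= pc_fun(lo):
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pc_fun(mid) < p_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    p_obs = 0.55  # hypothetical proportion of correct responses
    print(f"Triangle rule: d' = {invert(pc_triangle, p_obs):.2f}")
    print(f"3-AFC rule:    d' = {invert(pc_3afc, p_obs):.2f}")
```

Under the Triangle rule a considerably larger d′ is needed to reach the same proportion correct, so analyzing the data under the wrong rule would misstate the size of the sensory difference.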
Precision of Measurement
Psychometric Functions and Precision
As noted in the previous section, the psychometric function of a sensory difference test relates the
underlying sensory difference δ to the probability of correct response. Figure 2 shows the
psychometric functions for the specified methods described in Section 2, while Fig. 3 shows the
psychometric functions for the unspecified methods.
Figure 2
Psychometric Functions for the Specified Methods of Forced‐Choice Difference Testing
Discussed in this Article
Figure 3
Psychometric Functions for the Unspecified Methods of Forced‐Choice Difference Testing
Discussed in this Article
When using the results of a difference test to estimate δ, the shape of the psychometric function
determines the precision of that estimate. If the psychometric function is relatively flat in the
region of δ values surrounding d′, then a wide range of δ values would yield results similar to
those observed experimentally, so the estimate of δ will not be precise. On the
other hand, if the psychometric function is relatively steep in the region of δ values
surrounding d′, only δ values close to d′ would yield results similar to those observed
experimentally. In such a case, the estimate of δ will be more precise. For example, Fig. 4 shows
the psychometric function of the Triangle test over the range 0.5 ≤ δ ≤ 1.5, while Fig. 5 shows the
psychometric function of the Tetrad test over the same range.
Figure 4
The Relationship Between a Range of Similar Values for the Proportion Correct and the
Corresponding Range of δ Values, in a Triangle Test
Figure 5
The Relationship Between a Range of Similar Values for the Proportion Correct and the
Corresponding Range of δ Values, in a Tetrad Test
Over this range of δ values, the psychometric function of the Tetrad test is steeper than that of
the Triangle test. As a result, small deviations in the proportion of correct responses (here, a
deviation of ±0.03) correspond to smaller deviations in δ for the Tetrad test than for the
Triangle test. This implies that the Tetrad test is more precise near δ = 1 than the Triangle
test [see Ennis and Christensen (2014) for additional discussion of this topic].
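The argument can be reproduced numerically. The Python sketch below assumes the commonly cited Thurstonian psychometric functions for the Triangle and Unspecified Tetrad tests with unit‐variance percepts, takes the proportion correct each test would give at δ = 1, shifts it by ±0.03, and inverts each function to find the corresponding interval of δ values; a steeper function gives a narrower interval. The helper names and the bisection and Simpson's‐rule numerics are illustrative choices.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule over [a, b] with an even number of panels."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def pc_triangle(d: float) -> float:
    s3, k = math.sqrt(3.0), d * math.sqrt(2.0 / 3.0)
    return 2.0 * simpson(
        lambda z: (norm_cdf(-z * s3 + k) + norm_cdf(-z * s3 - k)) * norm_pdf(z), 0.0, 10.0)


def pc_tetrad(d: float) -> float:
    # Unspecified Tetrad: 1 - 2 * integral of phi(z) * [2 Phi(z) Phi(z - d) - Phi(z - d)^2].
    def integrand(z: float) -> float:
        return norm_pdf(z) * (2.0 * norm_cdf(z) * norm_cdf(z - d) - norm_cdf(z - d) ** 2)
    return 1.0 - 2.0 * simpson(integrand, -10.0, 10.0)


def invert(pc_fun, p_obs: float, hi: float = 6.0, tol: float = 1e-6) -> float:
    """Bisection for pc_fun(d) = p_obs on [0, hi]; pc_fun assumed increasing there."""
    lo = 0.0
    if p_obs <= pc_fun(lo):
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pc_fun(mid) < p_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delta_interval(pc_fun, d0: float = 1.0, dp: float = 0.03):
    """Range of delta whose proportion correct lies within +/- dp of pc_fun(d0)."""
    p0 = pc_fun(d0)
    return invert(pc_fun, p0 - dp), invert(pc_fun, p0 + dp)


if __name__ == "__main__":
    for name, f in (("Triangle", pc_triangle), ("Tetrad", pc_tetrad)):
        lo, hi = delta_interval(f)
        print(f"{name:8s} pc(1)={f(1.0):.3f}  delta range: [{lo:.3f}, {hi:.3f}]  "
              f"width={hi - lo:.3f}")
```

The Tetrad interval comes out narrower than the Triangle interval, which is the sense in which the Tetrad estimate of δ is more precise near δ = 1.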
Variance in Estimate of δ
For a given testing method, the variance in the estimate of δ depends only on the experimentally
observed proportion of correct responses and the sample size. When discussing this variance, it is
thus standard practice to refer to the so‐called “B values,” which are the product of the variance and
the sample size (cf. Bi et al. 1997). Recently, Bi and O'Mahony (2013) compiled the B values for several
of the common difference testing methods into a single resource, and contributed the B values for the
Specified Tetrad test. These B values, which were previously published separately (Bi et al. 1997, 2010;
Bi 2006; Ennis 2012), have been combined with B values for the Two‐Out‐of‐Five test
(Ennis 2013) to create Figs 6 and 7, which show the B values for the forced‐choice testing methods in
our list, separated according to whether the tests are specified or unspecified.
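For readers who want to reproduce such values, the sketch below assumes the usual delta‐method form of the B value, B = pc(1 − pc)/[P′(δ)]², where pc is the proportion correct and P′ is the slope of the psychometric function at δ, so that the variance of d′ is approximately B/N. It computes B for the 2‐AFC and Triangle tests in Python, estimating the slope with a central finite difference. The helper names and step size are illustrative assumptions.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule over [a, b] with an even number of panels."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def pc_2afc(d: float) -> float:
    return norm_cdf(d / math.sqrt(2.0))


def pc_triangle(d: float) -> float:
    s3, k = math.sqrt(3.0), d * math.sqrt(2.0 / 3.0)
    return 2.0 * simpson(
        lambda z: (norm_cdf(-z * s3 + k) + norm_cdf(-z * s3 - k)) * norm_pdf(z), 0.0, 10.0)


def b_value(pc_fun, d: float, h: float = 1e-4) -> float:
    """Delta-method B value: pc(1 - pc) / slope^2, so that Var(d') is about B / N.

    The slope is a central difference, so d should not be 0 for tests whose
    psychometric function is flat there (e.g. the Triangle), where B diverges.
    """
    p = pc_fun(d)
    slope = (pc_fun(d + h) - pc_fun(d - h)) / (2.0 * h)
    return p * (1.0 - p) / slope ** 2


if __name__ == "__main__":
    for d in (0.5, 1.0, 1.5, 2.0):
        print(f"d'={d:.1f}  B(2-AFC)={b_value(pc_2afc, d):.2f}  "
              f"B(Triangle)={b_value(pc_triangle, d):.2f}")
```

For the 2‐AFC the value at δ = 0 works out to exactly π, and at δ = 1 the Triangle's B value is clearly larger than the 2‐AFC's, mirroring its flatter psychometric function.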
Figure 6
B Values for the Specified Methods of Forced‐Choice Difference Testing Discussed in this
Article
Figure 7
B Values for the Unspecified Methods of Forced‐Choice Difference Testing Discussed in this
Article
Comparing Figs 6 and 7 with Figs 2 and 3, we see that the B values for a difference test are
indeed smallest when the test's psychometric function is steepest. Two further points should be
noted: the variance in d′ is not the same as the perceptual noise described in Section 3, and
variances derived from B values should not be used to construct confidence intervals (cf.
Pawitan 2001; Christensen and Brockhoff 2009). The B values are only meant to provide a
rough assessment of the precision or imprecision of the various testing methods over a range of
possible δ values, for comparative purposes.
Figures 8 and 9 show the expected widths of the likelihood‐based 95% confidence intervals for
the specified and unspecified difference tests, respectively, for N = 30. Expected widths for the
Unspecified Tetrad, Triangle and 2‐AFC tests have previously appeared in Ennis and Christensen (2014);
the remaining expected widths are presented here for the first time.
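To make concrete what a likelihood‐based interval is, the Python sketch below computes one for a single hypothetical 2‐AFC result, 20 correct out of N = 30. It assumes a binomial model with the proportion correct restricted to [1/2, 1], keeps every value whose likelihood‐ratio statistic stays below the 95% χ² cutoff with one degree of freedom (3.841), and maps the endpoints to d′ through the 2‐AFC psychometric function. Because the model has a single parameter and the psychometric function is monotone, the interval can be found for the proportion correct and then transformed. The data and helper names are hypothetical.

```python
import math
from statistics import NormalDist

CHI2_1DF_95 = 3.841458820694124  # 95th percentile of chi-square with 1 df
STD_NORMAL = NormalDist()


def loglik(p: float, x: int, n: int) -> float:
    """Binomial log-likelihood of p (constant dropped); 0 * log(0) is treated as 0."""
    ll = 0.0
    if x > 0:
        ll += x * math.log(p)
    if n - x > 0:
        ll += (n - x) * math.log(1.0 - p)
    return ll


def bisect_root(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of f between lo and hi, assuming f(lo) and f(hi) differ in sign.

    f(hi) is never evaluated, so hi may be a point where f is undefined.
    """
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid in (lo, hi):  # interval can no longer be split in floating point
            break
        if (f(mid) > 0) == (f_lo > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dprime_2afc(p: float) -> float:
    """Invert the 2-AFC psychometric function pc = Phi(d / sqrt(2))."""
    if p >= 1.0:
        return math.inf
    return math.sqrt(2.0) * STD_NORMAL.inv_cdf(p)


def likelihood_ci_2afc(x: int, n: int, crit: float = CHI2_1DF_95):
    """Return (d_hat, d_lower, d_upper) for x correct out of n 2-AFC trials."""
    p_guess = 0.5
    p_hat = max(x / n, p_guess)  # MLE restricted to [p_guess, 1]
    ll_max = loglik(p_hat, x, n)

    def excess(p: float) -> float:
        # likelihood-ratio statistic minus the cutoff; <= 0 inside the interval
        return 2.0 * (ll_max - loglik(p, x, n)) - crit

    if p_hat == p_guess or excess(p_guess) <= 0:
        p_lo = p_guess  # interval reaches the guessing boundary
    else:
        p_lo = bisect_root(excess, p_guess, p_hat)
    p_hi = 1.0 if x == n else bisect_root(excess, p_hat, 1.0)
    return dprime_2afc(p_hat), dprime_2afc(p_lo), dprime_2afc(p_hi)


if __name__ == "__main__":
    d_hat, d_lo, d_hi = likelihood_ci_2afc(20, 30)  # hypothetical data
    print(f"d' = {d_hat:.3f}, 95% likelihood interval [{d_lo:.3f}, {d_hi:.3f}]")
```

With these data the lower limit reaches the guessing boundary, so the interval for d′ starts at 0; a perfect score (x = N) gives an unbounded upper limit.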
Figure 8
Expected Widths of Likelihood‐Based Confidence Intervals for the Specified Methods of Forced‐Choice
Difference Testing Discussed in this Article (N = 30)
Figure 9
Expected Widths of Likelihood‐Based Confidence Intervals for the Duo‐Trio, Triangle and
Tetrad Tests (N = 30)
From these figures, we see that both methods of Tetrad testing, specified and unspecified,
are, in theory, the most precise methods for which likelihood‐based confidence intervals have
been developed. However, some caveats are in order. Both of these methods require evaluation
of four samples, which could introduce additional noise. In addition, if four samples are to be
evaluated in a specified condition, it might be feasible to perform a double‐replicated
2‐AFC test with the same amount of product preparation. Thus, experimental comparisons of the
various methods, especially those involving Tetrad testing, are crucial for understanding the
behavior of these tests in practice [see Garcia et al. (2012), Ishii et al. (2014), and Garcia et al.
(2013) for three such recent comparisons].
https://onlinelibrary.wiley.com/doi/full/10.1111/joss.12086