The Beta-Binomial Model: Variability in Overdispersion
R.E. Liggett and J.F. Delwiche
ABSTRACT
INTRODUCTION
Binomial statistics have long been used for probability calculations when
panelist responses fall into one of two categories. For example, in a paired
preference test (“Which one of two products do you prefer?”) or paired
comparison test (“Which one of two products is stronger in <specified
attribute>?”), the judge must choose one of two possible alternatives,
sample 1 or sample 2. The same is true in a triangle test (“Two samples are
the same and one is different. Please choose the different sample.”). [Corresponding
author: TEL: 614 247 6756; FAX: 614 292 0218; EMAIL: delwiche.1@osu.edu] Even
though there are three samples, the judge’s response is either correct or
incorrect. The binomial statistic makes two basic assumptions. First, objects
(or judges) are identical, and second, cases (or judgments) are independent.
These assumptions are reasonably fulfilled when one judgment is collected
from each judge. However, bringing judges into the booths is often the biggest
challenge in conducting a sensory study; thus, it is generally beneficial to
collect more information from these judges than a single assessment.
Unfortunately, when judges make multiple judgments, the assumption of
independence is broken. The binomial distribution assumes the existence of
only one source of variability, i.e., the samples. In fact, judges are also a source
of variation as they vary in preference and/or sensitivity, leading to varying
choice probabilities. If judges are acting in an identical fashion, variance from
sample to sample is explained entirely by the binomial distribution. If judges
are not acting in an identical fashion, the variance due to differences between
judges is explained by the beta distribution. This variability is known as
overdispersion, which is measured by g (gamma) – a value that ranges from
0 to 1. When g = 0, there is no overdispersion and binomial statistics may be
used. When g = 1, there is total overdispersion and binomial statistics cannot
be used without violating their underlying assumptions. In order to use the beta-
binomial model, data must be collected with at least two replications, gamma
must be calculated, and it must be determined whether gamma is significantly
greater than zero. When gamma is significantly greater than zero, the binomial model is
invalid and the beta-binomial model should be used instead (Ennis and Bi
1998; Bi et al. 2000; Rousseau 2002).
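Under the beta-binomial model, the variance of a judge's count x of identical choices out of n replications is n·μ(1−μ)[1+(n−1)γ], which suggests a simple method-of-moments estimate of γ. The following Python sketch is an illustration of that moment estimate (not necessarily the estimator used in the works cited above); it clamps the result to the admissible range [0, 1]:

```python
def gamma_moment_estimate(counts, n):
    """Method-of-moments estimate of the overdispersion parameter gamma.

    Each judge gives n binary judgments; counts[i] is judge i's number of
    'correct' (or agreeing) responses. Under the beta-binomial model,
    Var(x_i) = n*mu*(1-mu)*(1 + (n-1)*gamma), so gamma can be backed out
    from the observed between-judge variance of the counts.
    """
    m = len(counts)
    mu = sum(counts) / (m * n)                 # overall choice proportion
    if mu in (0.0, 1.0) or n < 2:
        return 0.0                             # no information about gamma
    s2 = sum((x - n * mu) ** 2 for x in counts) / m   # variance of counts
    gamma = (s2 / (n * mu * (1.0 - mu)) - 1.0) / (n - 1)
    return min(1.0, max(0.0, gamma))           # gamma is bounded in [0, 1]
```

When every judge splits his or her choices evenly the estimate is driven to 0 (no overdispersion); when every judge answers perfectly consistently but judges disagree with one another, it is driven to 1.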
The beta-binomial model has been used successfully in a variety of areas,
including chromosome research (Skellam 1948), market research (Chatfield
and Goodhardt 1970), and policy change (Gange et al. 1996). To date, its use
in sensory evaluation has not been fully realized (Ennis and Bi 1998). More
recently, however, sensory preference of electrostatically coated potato chips
was investigated by Ratanatriwong et al. (2003) using the beta-binomial
model. Gamma values for a paired preference of electrostatically and nonelec-
trostatically coated barbecue and sour cream and onion chips were signifi-
cantly greater than zero and the beta-binomial model was used to determine
if electrostatically coated potato chips were significantly different from non-
electrostatically coated potato chips. Visual triangle tests of the same season-
ings showed a nonsignificant gamma indicating the binomial model
adequately fits the data. Similarly, Radovich et al. (2004) used the beta-
binomial model to test sensory quality of cabbage. Gamma for all triangle
comparisons was nonsignificant, indicating again that it was appropriate to
combine across judges and replications and use the binomial model to deter-
mine if significant differences existed between samples.
The objectives of this study were to: (1) determine if panelist variability
was a greater issue in preference tasks than in discrimination tasks; (2) exam-
ine the stability of gamma across various discrimination tasks (such as the 2-
alternative forced choice, 3-alternative forced choice, triangle and duo-trio);
and (3) investigate the stability of gamma over time and panelist experience.
Judges. Judges were recruited via intercept from the lobby of the Parker
Food Science and Technology Building on the Columbus Campus of The Ohio
State University. Panelists were selected based upon their willingness to assess
the given stimuli. In tests 1A, 2 and 3, 53, 54 and 56 judges participated,
respectively. Twenty-five judges participated in test 1B. In tests 4 and 5, 56
and 58 judges participated, respectively. Judges were compensated with
candy.
In test 2 and test 4, there was significant panelist variability for the preference
data but not for the discrimination data. Thus, for analysis, the beta-binomial
and binomial models were used, respectively. In tests 3 and 5, panelists acted
similarly in preference but there was a significant gamma for the 2-alternative
forced choices (2-AFCs); thus, the binomial and beta-binomial models were
used, respectively. Equivalent panel size was larger than the number of judges
in all tests. Regardless of model used, power was higher for the difference
tests than for the preference tests (see Table 1), indicating that differences in
preference were smaller than the perceived difference in sweet intensity.
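The equivalent panel sizes in Table 1 are consistent with treating m judges making n replications each as m·n/(1 + (n − 1)γ) independent binomial trials; for example, 54 judges with 2 replications and γ = 0.3213 gives 82. A minimal sketch, assuming this standard overdispersion correction:

```python
def equivalent_panel_size(m, n, gamma):
    """Effective number of independent judgments when m judges each make
    n replicated binary judgments with overdispersion gamma: the m*n
    correlated trials carry the information of m*n / (1 + (n-1)*gamma)
    independent ones (no correction when gamma = 0)."""
    return m * n / (1.0 + (n - 1) * gamma)
```

With γ = 0 this reduces to m·n (e.g., 53 × 2 = 106 for test 1A), and it shrinks toward m as γ approaches 1, when replications add no new information.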
While Ratanatriwong et al. (2003) found significant panelist variability
in preference data, and Radovich et al. (2004) found low panelist variability
in discrimination data, this pattern was not replicated with this study. These
differing results illustrate the unpredictable nature of panelist variability and
caution against assuming that panelist variability can be ignored.
Comparison of tests 1A and 1B shows the trade-off between judges and
replications. In both tests, panelist variability was very low for preference and
discrimination data. In each case, the binomial model was employed to arrive
at the same conclusions for both the preference and discriminability of the
products (see Table 2).

TABLE 1.
COMPARISON OF PANELIST VARIABILITY AND MODEL USED FOR DATA ANALYSIS FOR
THE PAIRED PREFERENCE AND PAIRED-COMPARISON TEST METHODS FOR TWO
REPLICATIONS OF EACH OF FIVE PAIRS OF PRODUCTS

                              Test 1A   Test 2    Test 3    Test 4    Test 5
No. of panelists              53        54        56        56        58
Paired preference (P = 1/2)
  No. of agreeing responses   62        76        71        64        102
  Panelist variability (g)    <0.0001   0.3213    0.1524    0.2513    <0.0001
  Binomial P-value            0.0990    <0.0001   0.0062    0.1586    <0.0001
  Beta-binomial P-value       0.0804    0.0001    0.0003    0.0426    <0.0001
  Appropriate model           Binomial  bb        Binomial  bb        Binomial
  Equivalent panel size       106       82        97        90        116
  Power                       0.416     0.982     0.958     0.527     1.000
Paired comparison (P = 1/2)
  No. of correct responses    69        100       99        81        97
  Panelist variability (g)    <0.0001   <0.0001   0.2381    0.0984    0.5623
  Binomial P-value            0.0012    <0.0001   <0.0001   <0.0001   <0.0001
  Beta-binomial P-value       0.0009    <0.0001   <0.0001   <0.0001   <0.0001
  Appropriate model           Binomial  Binomial  bb        Binomial  bb
  Equivalent panel size       106       108       90        102       74
  Power                       0.937     1.000     1.000     1.000     1.000

bb = beta-binomial.

TABLE 2.
COMPARISON OF PANELIST VARIABILITY AND MODEL USED FOR DATA ANALYSIS
WITH RESPECT TO NUMBER OF PANELISTS AND NUMBER OF REPLICATIONS FOR THE
PAIRED PREFERENCE AND PAIRED COMPARISON TEST METHODS FOR A PAIR OF
CHERRY-FLAVORED BEVERAGES

                              Test 1A   Test 1B
No. of panelists              53        25
No. of replications           2         4
Paired preference (P = 1/2)
  No. of agreeing responses   62        55
  Panelist variability (g)    <0.0001   <0.0001
  Binomial P-value            0.1442    0.3682
  Beta-binomial P-value       0.0804    0.3173
  Appropriate model           Binomial  Binomial
  Equivalent panel size       106       100
  Power                       0.416     0.169
Paired comparison (P = 1/2)
  No. of correct responses    69        66
  Panelist variability (g)    <0.0001   <0.0001
  Binomial P-value            0.0012    0.0009
  Beta-binomial P-value       0.0009    0.0007
  Appropriate model           Binomial  Binomial
  Equivalent panel size       106       100
  Power                       0.937     0.950

While the ability to trade-off judges and replications
is advantageous, caution should be taken when using small numbers of judges
for affective measurements as a small sample size is less likely to be repre-
sentative of the whole population.
Judges. Judges were recruited via intercept from the lobby of the Parker
Food Science and Technology Building on the Columbus Campus of The Ohio
State University. Panelists were selected based upon their willingness to assess
the given stimuli. One hundred three judges (48 males and 55 females; ages
18–65) participated and were compensated with candy.
TABLE 3.
COMPARISON OF PANELIST VARIABILITY, MODEL USED FOR DATA ANALYSIS AND d′
ACROSS DISCRIMINATION METHODS FOR A PAIR OF CHERRY-FLAVORED BEVERAGES
WHERE THE SAME 103 JUDGES EVALUATED TWO REPLICATIONS OF EACH METHOD
bb = beta-binomial.
The percentage of correct responses for the triangle is 35.58%, while for the
other methods it ranges from 48.26 to 63.82%. Thus, the latter methods can
detect a significant difference, while the percentage for the triangle is
barely above chance (33.33%).
The significant panelist variability found for the duo-trio can be explained
by Sequential Sensitivity Analysis (SSA). According to O’Mahony (1995),
the order of samples, specifically the impact of the strength of the first stimulus
on the second stimulus, can have an effect on panelist performance, even
altering d′. Commonly, the effects are due to carryover and adaptation. SSA
hypothesizes that the perception of a stimulus is impacted by the stimulus
directly preceding it. Specifically, it predicts a strong stimulus preceded by a
weaker one will be perceived as more intense than a strong stimulus preceded
by a stimulus of the same strength. Further, it predicts a weak stimulus
preceded by a stimulus of the same strength will be perceived as more intense
than a weak stimulus that is preceded by a stronger stimulus. Generally, the
easiest pair of stimuli to discriminate is weak followed by strong; intermediate
pairs are strong followed by weak and weak followed by weak; and the most
difficult is strong followed by strong (O’Mahony and Odbert 1985; O’Mahony
1995; Dessirier and O’Mahony 1999). In this study, half of judges received
the sweeter sample as the reference in the duo-trio and the other half
received the less sweet sample as the reference. When segregating the data
based on the reference received, there is no significant panelist variability in
either set. However, panelist performance was better when the weaker stimulus
was presented as the reference. It is then not surprising that there was signif-
icant panelist variability in the combined dataset. It is surprising, however,
that the d′ for the duo-trio was significantly larger than that of the 2-AFC, 3-
AFC and triangle. Because the stimuli were relatively simple, differing along
a single dimension, it is possible that the judges were able to use a skimming
strategy during the duo-trio, increasing performance levels and the apparent d′.
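For context on the d′ comparisons above: in a Thurstonian analysis, each method maps its proportion correct to d′ through its own psychometric function, which is why equal proportions correct across methods imply different d′ values. The 2-AFC case has a simple closed form, d′ = √2·z(p_c); a sketch of that case only (the triangle and duo-trio functions require numerical integration and are not shown):

```python
from math import sqrt
from statistics import NormalDist

def dprime_2afc(pc):
    """Thurstonian d' for the 2-AFC (paired comparison) method:
    d' = sqrt(2) * z(pc), where z is the inverse standard normal CDF
    and pc is the observed proportion of correct responses."""
    return sqrt(2) * NormalDist().inv_cdf(pc)
```

At chance performance (pc = 0.5) this gives d′ = 0, and d′ grows steeply as pc approaches 1, so small gains in proportion correct near the top of the range imply large perceptual differences.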
Judges. As before, all judges were recruited via intercept from the lobby
of the Parker Food Science and Technology Building on the Columbus Cam-
pus of The Ohio State University. Panelists were selected based upon their
willingness to assess the given stimuli.
TABLE 5.
COMPARISON OF PANELIST PERFORMANCE ACROSS 10 DAYS OF TESTING (FOUR
REPLICATIONS EACH DAY) FOR THE PAIRED-COMPARISON (P = 1/2) TEST METHOD FOR
A PAIR OF CHERRY-FLAVORED BEVERAGES FOR REPRESENTATIVE JUDGES
Overdispersion within judges was also examined. All judges across days
were consistent with themselves, i.e., gamma was not significant. However,
not all judges could discriminate between the products. Of the 25 panelists,
eight could not discriminate the products. Judge 2 showed the most variability
but could discriminate the products. Judges 17 and 18 both had very low
variability but one could discriminate the difference and the other could not
(see Table 5). Such analyses could be used to select more reliable panelists
for future studies.
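The within-judge consistency checks above hinge on testing whether gamma is significantly greater than zero. One common large-sample test for binomial overdispersion, shown here as an illustrative choice rather than necessarily the test used in this study, is Tarone's Z statistic:

```python
from math import sqrt

def tarone_z(counts, n):
    """Tarone's Z statistic for testing H0: gamma = 0 (no overdispersion),
    where m judges each give n binary judgments and counts[i] is judge i's
    number of correct responses. A large positive Z (e.g., > 1.645 at the
    5% one-sided level) rejects the simple binomial model."""
    m = len(counts)
    N = m * n
    p = sum(counts) / N                       # pooled choice proportion
    if p in (0.0, 1.0):
        return 0.0                            # no response variability to test
    s = sum((x - n * p) ** 2 for x in counts) / (p * (1 - p))
    return (s - N) / sqrt(2.0 * m * n * (n - 1))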
GENERAL DISCUSSION
ACKNOWLEDGMENTS
REFERENCES
BI, J., TEMPLETON-JANIK, L., ENNIS, J.M. and ENNIS, D.M. 2000.
Replicated difference and preference tests: How to account for intertrial
variation. Food Qual. Prefer. 11, 269–273.
BROCKHOFF, P.B. 2003. The statistical power of replications in difference
tests. Food Qual. Prefer. 14, 405–417.
CHATFIELD, C. and GOODHARDT, G. 1970. The beta-binomial model for
consumer purchasing behavior. Appl. Stat. 19, 240–250.
DELWICHE, J.F. and O’MAHONY, M. 1996. Flavour discrimination: An
extension of Thurstonian “paradoxes” to the tetrad method. Food Qual.
Prefer. 7, 1–5.
DESSIRIER, J.-M. and O’MAHONY, M. 1999. Comparison of d′ values for
the 2-AFC (paired comparison) and 3-AFC discrimination methods:
Thurstonian models, sequential sensitivity analysis and power. Food
Qual. Prefer. 10, 51–58.
ENNIS, D. 1993. The power of sensory discrimination methods. J. Sens. Stud.
8, 353–370.
ENNIS, D. and BI, J. 1998. The beta-binomial model: Accounting for inter-
trial variation in replicated difference and preference tests. J. Sens. Stud.
13, 389–412.
GANGE, S., MUNOZ, A., SAEZ, M. and ALONSO, J. 1996. Use of the beta-
binomial distribution to model the effect of policy change on appropri-
ateness of hospital stays. Appl. Stat. 45, 371–382.
O’MAHONY, M. 1995. Who told you the triangle test was simple? Food
Qual. Prefer. 6, 227–238.
O’MAHONY, M., MASUOKA, S. and ISHII, R. 1994. A theoretical note on
difference tests: Models, paradoxes and cognitive strategies. J. Sens.
Stud. 9, 247–272.