The Beta-Binomial Model: Variability in Overdispersion
R.E. Liggett and J.F. Delwiche
ABSTRACT
INTRODUCTION
Binomial statistics have long been used for probability calculations when
panelist responses fall into one of two categories. For example, in a paired
preference test (“Which one of two products do you prefer?”) or paired
comparison test (“Which one of two products is stronger in <specified
attribute>?”), the judge must choose one of two possible alternatives,
sample 1 or sample 2. The same is true in a triangle test (“Two samples are
the same and one is different. Please choose the different sample.”). [Corresponding
author: TEL: 614 247 6756; FAX: 614 292 0218; EMAIL: delwiche.1@osu.edu] Even
though there are three samples, the judge’s response is either correct or
incorrect. The binomial statistic makes two basic assumptions. First, objects
(or judges) are identical, and second, cases (or judgments) are independent.
These assumptions are reasonably fulfilled when one judgment is collected
from each judge. However, bringing judges into the booths is often the biggest
challenge in conducting a sensory study; thus, it is generally beneficial to
collect more information from these judges than a single assessment.
Unfortunately, when judges make multiple judgments, the assumption of
independence is broken. The binomial distribution assumes the existence of
only one source of variability, i.e., the samples. In fact, judges are also a source
of variation as they vary in preference and/or sensitivity, leading to varying
choice probabilities. If judges are acting in an identical fashion, variance from
sample to sample is explained entirely by the binomial distribution. If judges
are not acting in an identical fashion, the variance due to differences between
judges is explained by the beta distribution. This variability is known as
overdispersion, which is measured by g (gamma) – a value that ranges from
0 to 1. When g = 0, there is no overdispersion and binomial statistics may be
used. When g = 1, there is total overdispersion and binomial statistics cannot
be used without violating their underlying assumptions. In order to use the beta-
binomial model, data must be collected with at least two replications, gamma
must be calculated, and it must be determined whether gamma is significantly
greater than zero. When gamma is significantly greater than zero, the binomial model is
invalid and the beta-binomial model should be used instead (Ennis and Bi
1998; Bi et al. 2000; Rousseau 2002).
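Under the beta-binomial model, the variance of a judge's count x of identical choices out of n replications is n·μ(1−μ)[1+(n−1)γ], which suggests a simple method-of-moments estimate of γ. The following Python sketch is an illustration of that moment estimate (not necessarily the estimator used in the works cited above); it clamps the result to the admissible range [0, 1]:

```python
def gamma_moment_estimate(counts, n):
    """Method-of-moments estimate of the overdispersion parameter gamma.

    Each judge gives n binary judgments; counts[i] is judge i's number of
    'correct' (or agreeing) responses. Under the beta-binomial model,
    Var(x_i) = n*mu*(1-mu)*(1 + (n-1)*gamma), so gamma can be backed out
    from the observed between-judge variance of the counts.
    """
    m = len(counts)
    mu = sum(counts) / (m * n)                 # overall choice proportion
    if mu in (0.0, 1.0) or n < 2:
        return 0.0                             # no information about gamma
    s2 = sum((x - n * mu) ** 2 for x in counts) / m   # variance of counts
    gamma = (s2 / (n * mu * (1.0 - mu)) - 1.0) / (n - 1)
    return min(1.0, max(0.0, gamma))           # gamma is bounded in [0, 1]
```

When every judge splits his or her choices evenly the estimate is driven to 0 (no overdispersion); when every judge answers perfectly consistently but judges disagree with one another, it is driven to 1.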
The beta-binomial model has been used successfully in a variety of areas,
including chromosome research (Skellam 1948), market research (Chatfield
and Goodhardt 1970), and policy change (Gange et al. 1996). To date, its use
in sensory evaluation has not been fully realized (Ennis and Bi 1998). More
recently, however, sensory preference of electrostatically coated potato chips
was investigated by Ratanatriwong et al. (2003) using the beta-binomial
model. Gamma values for a paired preference of electrostatically and nonelec-
trostatically coated barbecue and sour cream and onion chips were signifi-
cantly greater than zero and the beta-binomial model was used to determine
if electrostatically coated potato chips were significantly different from non-
electrostatically coated potato chips. Visual triangle tests of the same season-
ings showed a nonsignificant gamma indicating the binomial model
adequately fits the data. Similarly, Radovich et al. (2004) used the beta-
binomial model to test sensory quality of cabbage. Gamma for all triangle
comparisons was nonsignificant, indicating again that it was appropriate to
combine across judges and replications and use the binomial model to deter-
mine if significant differences existed between samples.
The objectives of this study were to: (1) determine if panelist variability
was a greater issue in preference tasks than in discrimination tasks; (2) exam-
ine the stability of gamma across various discrimination tasks (such as the 2-
alternative forced choice, 3-alternative forced choice, triangle and duo-trio);
and (3) investigate the stability of gamma over time and panelist experience.
Judges. Judges were recruited via intercept from the lobby of the Parker
Food Science and Technology Building on the Columbus Campus of The Ohio
State University. Panelists were selected based upon their willingness to assess
the given stimuli. In tests 1A, 2 and 3, 53, 54 and 56 judges participated,
respectively. Twenty-five judges participated in test 1B. In tests 4 and 5, 56
and 58 judges participated, respectively. Judges were compensated with
candy.
In test 2 and test 4, there was significant panelist variability for the preference
data but not for the discrimination data. Thus, for analysis, the beta-binomial
and binomial models were used, respectively. In tests 3 and 5, panelists acted
similarly in preference but there was a significant gamma for the 2-alternative
forced choices (2-AFCs); thus, the binomial and beta-binomial models were
used, respectively. Equivalent panel size was larger than the number of judges
in all tests. Regardless of model used, power was higher for the difference
tests than for the preference tests (see Table 1), indicating that differences in
preference were smaller than the perceived difference in sweet intensity.
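The equivalent panel sizes in Table 1 are consistent with treating m judges making n replications each as m·n/(1 + (n − 1)γ) independent binomial trials; for example, 54 judges with 2 replications and γ = 0.3213 gives 82. A minimal sketch, assuming this standard overdispersion correction:

```python
def equivalent_panel_size(m, n, gamma):
    """Effective number of independent judgments when m judges each make
    n replicated binary judgments with overdispersion gamma: the m*n
    correlated trials carry the information of m*n / (1 + (n-1)*gamma)
    independent ones (no correction when gamma = 0)."""
    return m * n / (1.0 + (n - 1) * gamma)
```

With γ = 0 this reduces to m·n (e.g., 53 × 2 = 106 for test 1A), and it shrinks toward m as γ approaches 1, when replications add no new information.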
While Ratanatriwong et al. (2003) found significant panelist variability
in preference data, and Radovich et al. (2004) found low panelist variability
in discrimination data, this pattern was not replicated with this study. These
differing results illustrate the unpredictable nature of panelist variability and
caution against assuming that panelist variability can be ignored.
Comparison of tests 1A and 1B shows the trade-off between judges and
replications. In both tests, panelist variability was very low for preference and
discrimination data. In each case, the binomial model was employed to arrive
at the same conclusions for both the preference and discriminability of the
products (see Table 2).

TABLE 1.
COMPARISON OF PANELIST VARIABILITY AND MODEL USED FOR DATA ANALYSIS FOR
THE PAIRED PREFERENCE AND PAIRED-COMPARISON TEST METHODS FOR TWO
REPLICATIONS OF EACH OF FIVE PAIRS OF PRODUCTS

                              Test 1A   Test 2    Test 3    Test 4    Test 5
No. of panelists              53        54        56        56        58
Paired preference (P = 1/2)
  No. of agreeing responses   62        76        71        64        102
  Panelist variability (g)    <0.0001   0.3213    0.1524    0.2513    <0.0001
  Binomial P-value            0.0990    <0.0001   0.0062    0.1586    <0.0001
  Beta-binomial P-value       0.0804    0.0001    0.0003    0.0426    <0.0001
  Appropriate model           Binomial  bb        Binomial  bb        Binomial
  Equivalent panel size       106       82        97        90        116
  Power                       0.416     0.982     0.958     0.527     1.000
Paired comparison (P = 1/2)
  No. of correct responses    69        100       99        81        97
  Panelist variability (g)    <0.0001   <0.0001   0.2381    0.0984    0.5623
  Binomial P-value            0.0012    <0.0001   <0.0001   <0.0001   <0.0001
  Beta-binomial P-value       0.0009    <0.0001   <0.0001   <0.0001   <0.0001
  Appropriate model           Binomial  Binomial  bb        Binomial  bb
  Equivalent panel size       106       108       90        102       74
  Power                       0.937     1.000     1.000     1.000     1.000

bb = beta-binomial.

TABLE 2.
COMPARISON OF PANELIST VARIABILITY AND MODEL USED FOR DATA ANALYSIS
WITH RESPECT TO NUMBER OF PANELISTS AND NUMBER OF REPLICATIONS FOR THE
PAIRED PREFERENCE AND PAIRED COMPARISON TEST METHODS FOR A PAIR OF
CHERRY-FLAVORED BEVERAGES

                              Test 1A   Test 1B
No. of panelists              53        25
No. of replications           2         4
Paired preference (P = 1/2)
  No. of agreeing responses   62        55
  Panelist variability (g)    <0.0001   <0.0001
  Binomial P-value            0.1442    0.3682
  Beta-binomial P-value       0.0804    0.3173
  Appropriate model           Binomial  Binomial
  Equivalent panel size       106       100
  Power                       0.416     0.169
Paired comparison (P = 1/2)
  No. of correct responses    69        66
  Panelist variability (g)    <0.0001   <0.0001
  Binomial P-value            0.0012    0.0009
  Beta-binomial P-value       0.0009    0.0007
  Appropriate model           Binomial  Binomial
  Equivalent panel size       106       100
  Power                       0.937     0.950

While the ability to trade-off judges and replications
is advantageous, caution should be taken when using small numbers of judges
for affective measurements as a small sample size is less likely to be repre-
sentative of the whole population.
Judges. Judges were recruited via intercept from the lobby of the Parker
Food Science and Technology Building on the Columbus Campus of The Ohio
State University. Panelists were selected based upon their willingness to assess
the given stimuli. One hundred three judges (48 males and 55 females; ages
18–65) participated and were compensated with candy.
TABLE 3.
COMPARISON OF PANELIST VARIABILITY, MODEL USED FOR DATA ANALYSIS AND d′
ACROSS DISCRIMINATION METHODS FOR A PAIR OF CHERRY-FLAVORED BEVERAGES
WHERE THE SAME 103 JUDGES EVALUATED TWO REPLICATIONS OF EACH METHOD
bb = beta-binomial.
The percentage of correct responses for the triangle is 35.58%, while for the
other methods it ranges from 48.26 to 63.82%. Thus, the latter methods can
detect a significant difference, while the percentage for the triangle is
barely above chance (33.33%).
The significant panelist variability found for the duo-trio can be explained
by Sequential Sensitivity Analysis (SSA). According to O’Mahony (1995),
the order of samples, specifically the impact of the strength of the first stimulus
on the second stimulus, can have an effect on panelist performance, even
altering d′. Commonly, the effects are due to carryover and adaptation. SSA
hypothesizes that the perception of a stimulus is impacted by the stimulus
directly preceding it. Specifically, it predicts a strong stimulus preceded by a
weaker one will be perceived as more intense than a strong stimulus preceded
by a stimulus of the same strength. Further, it predicts a weak stimulus
preceded by a stimulus of the same strength will be perceived as more intense
than a weak stimulus that is preceded by a stronger stimulus. Generally, the
easiest pair of stimuli to discriminate is weak followed by strong; intermediate
pairs are strong followed by weak and weak followed by weak; and the most
difficult is strong followed by strong (O’Mahony and Odbert 1985; O’Mahony
1995; Dessirier and O’Mahony 1999). In this study, half of judges received
the sweeter sample as the reference in the duo-trio and the other half
received the less sweet sample as the reference. When segregating the data
based on the reference received, there is no significant panelist variability in
either set. However, panelist performance was better when the weaker stimulus
was presented as the reference. It is then not surprising that there was signif-
icant panelist variability in the combined dataset. It is surprising, however,
that the d′ for the duo-trio was significantly larger than that of the 2-AFC, 3-
AFC and triangle. Because the stimuli were relatively simple, differing along
a single dimension, it is possible that the judges were able to use a skimming
strategy during the duo-trio, increasing performance levels and the apparent d′.
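For context on the d′ comparisons above: in a Thurstonian analysis, each method maps its proportion correct to d′ through its own psychometric function, which is why equal proportions correct across methods imply different d′ values. The 2-AFC case has a simple closed form, d′ = √2·z(p_c); a sketch of that case only (the triangle and duo-trio functions require numerical integration and are not shown):

```python
from math import sqrt
from statistics import NormalDist

def dprime_2afc(pc):
    """Thurstonian d' for the 2-AFC (paired comparison) method:
    d' = sqrt(2) * z(pc), where z is the inverse standard normal CDF
    and pc is the observed proportion of correct responses."""
    return sqrt(2) * NormalDist().inv_cdf(pc)
```

At chance performance (pc = 0.5) this gives d′ = 0, and d′ grows steeply as pc approaches 1, so small gains in proportion correct near the top of the range imply large perceptual differences.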
Judges. As before, all judges were recruited via intercept from the lobby
of the Parker Food Science and Technology Building on the Columbus Cam-
pus of The Ohio State University. Panelists were selected based upon their
willingness to assess the given stimuli.
TABLE 5.
COMPARISON OF PANELIST PERFORMANCE ACROSS 10 DAYS OF TESTING (FOUR
REPLICATIONS EACH DAY) FOR THE PAIRED-COMPARISON (P = 1/2) TEST METHOD FOR
A PAIR OF CHERRY-FLAVORED BEVERAGES FOR REPRESENTATIVE JUDGES
Overdispersion within judges was also examined. All judges across days
were consistent with themselves, i.e., gamma was not significant. However,
not all judges could discriminate between the products. Of the 25 panelists,
eight could not discriminate the products. Judge 2 showed the most variability
but could discriminate the products. Judges 17 and 18 both had very low
variability but one could discriminate the difference and the other could not
(see Table 5). Such analyses could be used to select more reliable panelists
for future studies.
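The within-judge consistency checks above hinge on testing whether gamma is significantly greater than zero. One common large-sample test for binomial overdispersion, shown here as an illustrative choice rather than necessarily the test used in this study, is Tarone's Z statistic:

```python
from math import sqrt

def tarone_z(counts, n):
    """Tarone's Z statistic for testing H0: gamma = 0 (no overdispersion),
    where m judges each give n binary judgments and counts[i] is judge i's
    number of correct responses. A large positive Z (e.g., > 1.645 at the
    5% one-sided level) rejects the simple binomial model."""
    m = len(counts)
    N = m * n
    p = sum(counts) / N                       # pooled choice proportion
    if p in (0.0, 1.0):
        return 0.0                            # no response variability to test
    s = sum((x - n * p) ** 2 for x in counts) / (p * (1 - p))
    return (s - N) / sqrt(2.0 * m * n * (n - 1))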
GENERAL DISCUSSION
ACKNOWLEDGMENTS
REFERENCES
BI, J., TEMPLETON-JANIK, L., ENNIS, J.M. and ENNIS, D.M. 2000.
Replicated difference and preference tests: How to account for intertrial
variation. Food Qual. Prefer. 11, 269–273.
BROCKHOFF, P.B. 2003. The statistical power of replications in difference
tests. Food Qual. Prefer. 14, 405–417.
CHATFIELD, C. and GOODHARDT, G. 1970. The beta-binomial model for
consumer purchasing behavior. Appl. Stat. 19, 240–250.
DELWICHE, J.F. and O’MAHONY, M. 1996. Flavour discrimination: An
extension of Thurstonian “paradoxes” to the tetrad method. Food Qual.
Prefer. 7, 1–5.
DESSIRIER, J.-M. and O’MAHONY, M. 1999. Comparison of d′ values for
the 2-AFC (paired comparison) and 3-AFC discrimination methods:
Thurstonian models, sequential sensitivity analysis and power. Food
Qual. Prefer. 10, 51–58.
ENNIS, D. 1993. The power of sensory discrimination methods. J. Sens. Stud.
8, 353–370.
ENNIS, D. and BI, J. 1998. The beta-binomial model: Accounting for inter-
trial variation in replicated difference and preference tests. J. Sens. Stud.
13, 389–412.
GANGE, S., MUNOZ, A., SAEZ, M. and ALONSO, J. 1996. Use of the beta-
binomial distribution to model the effect of policy change on appropri-
ateness of hospital stays. Appl. Stat. 45, 371–382.
O’MAHONY, M. 1995. Who told you the triangle test was simple? Food
Qual. Prefer. 6, 227–238.
O’MAHONY, M., MASUOKA, S. and ISHII, R. 1994. A theoretical note on
difference tests: Models, paradoxes and cognitive strategies. J. Sens.
Stud. 9, 247–272.