CREATIVITY RESEARCH JOURNAL, 26(2), 135–143, 2014
Copyright © Taylor & Francis Group, LLC
ISSN: 1040-0419 print / 1532-6934 online
DOI: 10.1080/10400419.2014.901023

Is What You See What You Really Get? Comparison of Scoring Techniques in the Assessment of Real-World Divergent Thinking

Jonathan A. Plucker
University of Connecticut

Meihua Qian and Stephanie L. Schmalensee
Indiana University

The research reported in this article was supported, in part, by a grant to the first author from the Kempf Assessment Fund at the Indiana University School of Education. We appreciate the input and encouragement provided by James Kaufman and Paul Silvia throughout this line of research. A version of this article was presented at the 2011 conference of the National Association for Gifted Children in New Orleans, LA, and at a special symposium in 2011 at East China Normal University in Shanghai, China.

Correspondence should be sent to Jonathan A. Plucker, Neag School of Education, University of Connecticut, 2131 Hillside Road U-3007, Storrs, CT 06269–3007. E-mail: jonathan.plucker@uconn.edu

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/hcrj.

In recent years, the social sciences have seen a resurgence in the study of divergent thinking (DT) measures. However, many of these recent advances have focused on abstract, decontextualized DT tasks (e.g., list as many things as you can think of that have wheels). This study provides a new perspective by exploring the reliability and validity evidence for several methods for scoring real-world DT tasks (i.e., tasks situated within a real-world problem or situation). The results suggest a combination of objective and subjective scoring methods may be optimal for originality scoring for contextualized DT tasks, which stands in contrast to recent research suggesting the objective, percentage scoring technique may be optimal for scoring originality on abstract tasks.

Divergent thinking (DT) tasks remain a popular form of creativity assessment, with some evidence that they are increasing in use in both empirical and practical applications (Kaufman, Plucker, & Baer, 2008; Plucker & Makel, 2010). This phenomenon is probably due to several factors, including the growth in concern about creativity in many countries and a lack of creativity assessments other than DT measures that can be easily administered to large groups (Plucker & Makel, 2010). DT tests are open-ended tasks measuring ideational fluency, the ability to create a list of responses that are original (Runco, 1991, 1992). However, DT tasks allow for numerous scoring options (Silvia, 2011). Responses are usually scored for originality (statistical infrequency), fluency (number of responses), and flexibility (number of distinct categories of response), although there is a range of strategies for deriving such scores. Because of the increased attention paid to DT tests, their popularity and ease of large-group administration within educational settings, and the recognition of the multitude of scoring possibilities, DT assessments have undergone significant scrutiny. However, such scrutiny is based on the recognition that blanket approval or disapproval of DT measures is misguided: They have strengths and weaknesses, just like any other class of assessments (Kaufman, Plucker, & Baer, 2008).

One of the weaknesses of DT tests regarding scoring techniques is fluency contamination of the scoring of originality. Fluency is conventionally calculated by tallying the total number of given responses, and originality is derived by determining the statistical infrequency of the responses (Plucker, Qian, & Wang, 2011). However, research has documented that highly fluent respondents generally have a higher chance of earning a high originality score, thus leading to fluency contamination of originality scores (Hocevar, 1979a, 1979b; Hocevar & Michael, 1979; Plucker et al., 2011; Silvia, 2008; Silvia et al., 2008; Torrance, 1988). Silvia, Martin, and Nusbaum (2009) noted that fluency scores and the number of unique responses have traditionally been highly correlated, begging the question: Do they actually express discrete pieces of assessment information?
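
To make the conventional calculation concrete, here is a minimal sketch of fluency and statistical-infrequency originality scoring for a toy sample. The protocols, the normalization of responses, and the 20% cutoff are illustrative assumptions, not the authors' materials or data.

```python
from collections import Counter

# Toy ideational pools: one list of already-normalized responses per participant.
# These responses and the sample size are invented for illustration only.
protocols = {
    "p1": ["doorstop", "paperweight", "build a wall"],
    "p2": ["build a wall", "doorstop"],
    "p3": ["paperweight", "build a wall", "throw it", "garden border"],
    "p4": ["doorstop", "build a wall"],
    "p5": ["paperweight", "doorstop", "smash a window", "exercise weight", "chalk substitute"],
    "p6": ["build a wall"],
}

n = len(protocols)
# How many participants gave each response (each participant counted once per response).
counts = Counter(r for responses in protocols.values() for r in set(responses))

def fluency(responses):
    """Fluency: the total number of responses in a protocol."""
    return len(responses)

def originality(responses, threshold=0.20):
    """Traditional originality: number of responses given by fewer than
    `threshold` of the sample (statistical infrequency)."""
    return sum(1 for r in set(responses) if counts[r] / n < threshold)

for pid, responses in protocols.items():
    print(pid, fluency(responses), originality(responses))
# Longer protocols get more chances to include rare responses, which is the
# fluency contamination of originality scores described above.
```
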
The issue of fluency contamination is also important due to controversies over the psychometric quality of DT assessments. For example, several leading researchers have found that reliability evidence for originality scores is often poor if fluency effects are not controlled (Clark & Mirels, 1970; Hocevar, 1979c), and that originality scores are associated with improved reliability estimates when fluency is controlled (Runco & Albert, 1985). Due to the recognition of weaknesses within scoring techniques in the research community, many scholars have been analyzing various scoring models for DT assessments. This line of research has significant historical precedents. As Silvia et al. (2009) noted, Guilford and his colleagues, as part of their seminal work with DT assessment, experimented with various scoring techniques (e.g., Wilson, Guilford, & Christensen, 1953), and several other leading creativity scholars have performed similar studies (Milgram & Milgram, 1978; Runco & Charles, 1993; Runco & Mraz, 1992). This research is distinct from, but runs parallel to, research on scoring techniques with creativity product assessments, such as the work of Caroff and Besançon (2008), Kaufman, Baer, Cole, and Sexton (2008), and Plucker, Kaufman, Temple, and Qian (2009), among many others.

These different DT scoring strategies have included, but are not limited to, objective and subjective methods, uniqueness scoring, snapshot scoring, percentage scoring, having participants rate their own originality, and controlling the number of responses from each participant, in addition to traditional originality, fluency, and flexibility calculations (Clark & Mirels, 1970; Hocevar, 1979a; Hocevar & Michael, 1979; Plucker & Runco, 1998; Runco & Mraz, 1992; Runco & Smith, 1992; Silvia, 2011; Silvia et al., 2008; Silvia et al., 2009). Many of these methods are time-consuming for researchers and educators, leading to a situation in which one's choice of scoring technique may impact the student's score and create a tangible opportunity cost in lost time and resources.

In addition, the impact of the various techniques on different types of validity evidence has not been examined. DT tests are often associated with convincing evidence of reliability and validity, although the evidence is admittedly inconsistent (Davis, 1989; Hocevar & Bachelor, 1989; Kaufman et al., 2008; Plucker & Makel, 2010; Plucker & Runco, 1998; Runco, 1991, 1999; Runco, Okuda, & Thurston, 1987). Predictive and concurrent validity evidence is relatively strong (e.g., Hong, Milgram, & Gorsky, 1995; Milgram & Hong, 1993; Okuda, Runco, & Berger, 1991; Plucker, 1999a; Sawyers & Canestaro, 1989), and discriminant validity evidence is growing (Kaufman et al., 2008; Plucker & Makel, 2010), but researchers have largely overlooked issues of convergent validity [1] (i.e., How do the various techniques for controlling for fluency impact correlations between DT originality scores and external criteria of creativity?) and the educational outcomes of test use (i.e., How does use of a specific scoring technique impact who is admitted to a gifted program?).

Plucker et al. (2011) attempted to answer these questions, but they, like many others before them, used abstract DT tasks, by far the most common form of DT assessment. For example, in a recent study of alternative scoring techniques for DT tasks, Silvia (2011) utilized three common types of DT prompts: "two Unusual Uses tasks (unusual uses for a brick and for a knife), two Instances tasks (instances of things that are round and that make a noise), and two Consequences tasks (consequences of not needing to sleep and of everyone shrinking to 12 in. in height)" (p. 26). Although these abstract DT tasks are important to understanding scoring methods, they are lacking in terms of practicality and meaningfulness to respondents. For example, Kaufman et al. (2008) noted that other forms of validity evidence are much stronger for applied DT tasks than abstract items, suggesting that the lack of research on applied DT is a major limitation on the assessment of creativity.

[1] See Runco (1984, 1985) for exceptions.

THE PRESENT STUDY

In this study, various techniques were used to score student responses to DT assessments that employ real-world problems, in a replication and extension of the work of several researchers, including Plucker et al. (2011) and Silvia (2011; Silvia et al., 2008). Due to the implications of DT assessment use (i.e., educational placements), this study serves as an additional piece of information in practitioners' efforts to assess student originality while controlling for potential fluency contamination effects.

Using the abstract Wheels and Noise tests (Wallach & Kogan, 1965), Plucker et al. (2011) compared seven scoring methods, including objective and subjective techniques, and found evidence that the traditional scoring method for originality (i.e., counting the number of responses provided by less than 20% of the sample) as well as the percentage scoring method (i.e., dividing originality scores by fluency scores) are promising, with the latter probably being the most appropriate scoring strategy.

In addition, although Silvia et al. (2008) and Hocevar (1981) found somewhat impressive results for subjective scoring methods, Plucker et al. (2011) did not find that subjective scoring methods performed well in controlling fluency effects. This inconsistency may reflect that subjective scoring techniques can be unstable due to the characteristics of raters (Kaufman et al., 2008; Runco, 2008) or other factors such as the use of different criterion measures and different DT tasks. However, will the conclusions from previous research hold for real-world DT tasks? In this study, we apply techniques used in previous studies to the scoring of two real-world problems.

Specifically, the following issues are explored in this study:

1. Because responses to DT tests are usually scored for originality, fluency, and flexibility, but fluency can be a contaminating influence on originality scores, fluency will be treated as one unique method for scoring DT tests, and the corresponding evidence of reliability as well as validity will be examined in this study.

2. In contrast with fluency, several methods for scoring originality proposed in previous studies will also be examined regarding evidence of reliability and validity.

3. By correlating fluency and various originality scores and taking into account the findings from the first two goals of our study, the results should suggest which scoring methods most effectively control fluency effects on originality.

METHOD

Participants

A total of 148 adolescents (grades 8–10) attending a summer residential program for high-ability students volunteered to participate in this study (43% girls, 57% boys). Seventh-grade students in the program's region of the country scoring at or above the 97th percentile on their school-administered achievement test were invited to participate in the talent search, which involved an invitation to take the SAT or ACT out-of-level. Students achieving specific SAT (or ACT) score criteria become eligible for the summer residential programs. Although racial information was not collected for students in this sample, the sample was similar to that used in previous studies, where the samples included roughly 75% Caucasian, 15% Asian American, 5% African American, and 5% Hispanic students.

Measures

Participants completed two real-world DT tasks, the Pat and Kelly problems (Appendix A; Chand & Runco, 1992; Runco, Illies, & Eisenman, 2005; Runco & Okuda, 1988), and a battery of creativity assessments: the Creative Personality Scale (CPS; Gough, 1979), the Creative Behavior Inventory (CBI; Hocevar, 1981), and one performance measure of creativity. To avoid any presentation bias, instruments and activities were presented to students in a random order.

Established protocols were used to score these creativity assessments (Amabile, 1982, 1996; Hocevar, 1979c; Kaufman et al., 2008; Plucker & Runco, 1998; Runco, 1987). Specifically, the CPS consisted of 18 positive and 12 negative items. A participant scored +1 by endorsing a positive item and −1 by endorsing a negative item. If an item was not checked, it was coded as 0. In our sample, the internal consistency estimate of the CPS was acceptable (Cronbach's α = .70). The CBI (Hocevar, 1981) contained 90 items and required participants to indicate their involvement in a range of creative activities on a 4-point scale (i.e., never, 1–2 times, 3–4 times, or more than five times). Plucker (1999b) found that a one-factor solution best represents the structure of CBI data. Internal consistency estimates were excellent (Cronbach's α = .96). The performance measure required students to produce a story based on the prompt, "Write an original story of what it would be like to learn on the moon." Students were given 5 min to write their stories. The participants were verbally reminded to be creative when writing the story. Two raters, both graduate students in educational psychology, evaluated the creativity of each student's story. The correlation between the raters' scores was .89.
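
For the checklist scoring and alpha values just reported, the sketch below shows one generic way to score CPS-style items (+1 for an endorsed positive adjective, −1 for an endorsed negative one, 0 if unchecked) and to compute Cronbach's alpha from a participants-by-items matrix. The adjective lists, the demo matrix, and the alpha implementation are hypothetical illustrations, not the study's scoring code.

```python
import numpy as np

def score_cps(checked, positive_items, negative_items):
    """+1 for an endorsed positive adjective, -1 for an endorsed negative
    adjective, 0 for any item left unchecked."""
    scores = {item: (1 if item in checked else 0) for item in positive_items}
    scores.update({item: (-1 if item in checked else 0) for item in negative_items})
    return scores

def cronbach_alpha(item_matrix):
    """Cronbach's alpha for a participants-by-items score matrix."""
    x = np.asarray(item_matrix, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)       # variance of each item across people
    total_variance = x.sum(axis=1).var(ddof=1)   # variance of the total scores
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical 4-participant x 5-item score matrix, purely for demonstration.
demo = [[1, 1, 0, -1,  0],
        [1, 1, 1,  0,  0],
        [1, 1, 1, -1, -1],
        [0, 0, 0,  0,  0]]
print(round(cronbach_alpha(demo), 2))
```
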
Participants were given 5 min to complete each of the two real-world DT tasks. The following seven methods were employed to score each student's responses to the DT tests in terms of originality; fluency scores were also calculated by counting the number of responses.

• Traditional scoring of originality a (i.e., count the number of responses provided by less than 10% of the sample; Method 1)
• Traditional scoring of originality b (i.e., count the number of responses provided by less than 20% of the sample; Method 2)
• Percentage scoring formula (i.e., originality a divided by fluency using the entire ideational pool; Method 3)
• Percentage scoring formula (i.e., originality b divided by fluency using the entire ideational pool; Method 4)
• Scoring of originality using external raters' ratings of the entire ideational pool (Method 5)
• Scoring of originality c using external raters' ratings of the answers provided by less than 20% of the sample (Method 6)
• Percentage scoring formula (i.e., originality c obtained through Method 6 divided by fluency using the entire ideational pool; Method 7)
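
The objective methods in this list reduce to counting infrequent responses and forming ratios with fluency. The sketch below expresses Methods 1 through 4 and Method 7 in that form; the pooling helper, the use of summed rater means for originality c, and all names are assumptions made for illustration, since the article does not spell out these implementation details.

```python
from collections import Counter

def response_frequencies(protocols):
    """Proportion of the sample giving each response (a participant counts once per response)."""
    n = len(protocols)
    counts = Counter(r for responses in protocols.values() for r in set(responses))
    return {r: c / n for r, c in counts.items()}

def originality_count(responses, freqs, threshold):
    """Methods 1 and 2: number of responses below the infrequency threshold
    (threshold=0.10 for originality a, 0.20 for originality b)."""
    return sum(1 for r in set(responses) if freqs[r] < threshold)

def percentage_score(originality, fluency):
    """Methods 3, 4, and 7: an originality score divided by the fluency score."""
    return originality / fluency if fluency else 0.0

def originality_c(responses, rater_means, freqs, threshold=0.20):
    """Method 6 as sketched here: rater ratings (0-10, averaged over raters)
    summed over the responses given by fewer than 20% of the sample.
    Summing rather than averaging the ratings is an assumption."""
    return sum(rater_means[r] for r in responses if freqs[r] < threshold)

# Method 7 would then be: percentage_score(originality_c(...), len(responses))
```
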
For Methods 5–6, two raters evaluated the creativity of the responses from each student. Each item was evaluated separately on a 0–10 scale; evaluating them collectively would have increased the likelihood of fluency effects. The correlations between raters were .90 for Pat item evaluations and .91 for Kelly item evaluations for Method 5.

RESULTS

Tables 1 and 2 include the descriptive statistics of these DT task scores. For the Pat problem, all the scores approximated a normal distribution. However, with regard to the Kelly problem, the distributions of originality scores varied. Specifically, the objective scoring methods (i.e., Methods 3, 4 and 7) produced similar distributions that were close to normal; the other objective methods (i.e., Methods 1 and 2) and the subjective scoring methods (i.e., Methods 5 and 6) produced another cluster of similar distributions, which were all positively skewed and peaked to various degrees.

Reliability Evidence

Intraclass correlations were used to provide evidence of reliability for the DT test scores based on each of the scoring methods (see Table 3). With the exception of Methods 3 and 4, reliability estimates did not vary dramatically. Estimates, although not overwhelmingly positive, are in the expected range for assessments using only two tasks.
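
The Table 3 values are intraclass correlations over the two tasks. One common variant, the single-measure consistency ICC (ICC(3,1) in Shrout and Fleiss's notation), can be computed from a participants-by-tasks matrix as sketched below; the article does not state which ICC form was used, and the demo scores are invented.

```python
import numpy as np

def icc_consistency(scores):
    """Single-measure consistency ICC (ICC(3,1) in Shrout & Fleiss notation)
    for an n-participants x k-tasks matrix of scores."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between-participant
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between-task
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Invented Method 2 originality scores on the Pat and Kelly tasks (columns)
# for five participants (rows).
demo = [[5, 3], [8, 4], [2, 2], [6, 5], [9, 3]]
print(round(icc_consistency(demo), 2))
```
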
Validity Evidence

Convergent and criterion-related validity. Because the CPS assesses participants' creative personality and DT tasks estimate an aspect of their creative potential, the correlation between them provides evidence of each scoring method's convergent validity. Correlations between the DT tasks and the CBI, which assesses participants' actual creative achievements, and between the DT tasks and the performance measure of creativity, which assesses participants' creative products, provide evidence of the DT tasks' predictive and concurrent validity, respectively.

Table 4 includes correlations between the various originality scores and the criterion measures. The data suggest that almost every scoring method significantly correlated with the CPS (providing evidence of convergent validity) and the CBI (evidence of predictive validity), with the evidence of validity most convincing for Methods 4, 6, and 7. However, in all cases, the magnitude of the correlations was not impressive. In addition, few of the originality scores were significantly correlated with the performance measure of creativity (concurrent validity), and those that were, corresponding to one objective scoring method (i.e., Method 4) and the combination of subjective and objective scoring of originality (i.e., Method 7), were relatively small in magnitude. Fluency scores significantly correlated with some of the criterion measures, but the magnitude of the correlations was also small.

Social consequences of test interpretation and use. As noted by Messick (1989, 1995), various types of validity evidence (e.g., content validity, predictive and concurrent validity, convergent and discriminant validity) should be considered when examining the psychometric quality of data produced by a particular measure. Also, test validation involves not only score meaning but also value implications and action outcomes, especially for applied decision making. The social consequences of test interpretation and use constitute an important aspect of validity (Kane, 2008), which has received little attention within creativity research.

TABLE 1
Descriptive Statistics: Pat Problem

Scoring Method | Mean | SD | N | Skew (SE) | Kurtosis (SE)
Fluency | 8.58 | 3.70 | 148 | .56 (.20) | .17 (.40)
1: Traditional scoring method @ 10% (Originality a) | 2.09 | 1.91 | 148 | 1.21 (.20) | 1.13 (.40)
2: Traditional scoring method @ 20% (Originality b) | 5.50 | 3.52 | 148 | .87 (.20) | .65 (.40)
3: Originality a / fluency | .22 | .16 | 148 | .50 (.20) | –.19 (.40)
4: Originality b / fluency | .60 | .22 | 148 | –.44 (.20) | –.04 (.40)
5: External raters' ratings of the entire ideational pool | 25.48 | 11.73 | 148 | .65 (.20) | .59 (.40)
6: External raters' ratings of answers provided by less than 20% of the sample | 15.66 | 10.37 | 148 | .92 (.20) | .90 (.40)
7: Originality via Method 6 / fluency | 1.70 | .65 | 148 | –.35 (.20) | –.13 (.40)

TABLE 2
Descriptive Statistics: Kelly Problem

Scoring Method | Mean | SD | N | Skew (SE) | Kurtosis (SE)
Fluency | 5.22 | 2.32 | 143 | .92 (.20) | 1.98 (.40)
1: Traditional scoring method @ 10% (Originality a) | 1.32 | 1.41 | 142 | 1.50 (.20) | 3.14 (.40)
2: Traditional scoring method @ 20% (Originality b) | 3.21 | 2.03 | 142 | 1.14 (.20) | 2.17 (.40)
3: Originality a / fluency | .24 | .23 | 142 | .76 (.20) | .16 (.40)
4: Originality b / fluency | .60 | .24 | 143 | –.47 (.20) | .37 (.40)
5: External raters' ratings of the entire ideational pool | 14.76 | 7.28 | 142 | 1.53 (.20) | 5.93 (.40)
6: External raters' ratings of answers provided by less than 20% of the sample | 9.61 | 6.00 | 136 | 2.25 (.20) | 10.30 (.40)
7: Originality via Method 6 / fluency | 1.71 | .81 | 143 | .44 (.20) | .83 (.40)

TABLE 3
Reliability Estimates of the Various Scoring Methods for the DT Tests

Scoring Method | ICC
Fluency | .65
1: Traditional scoring method @ 10% (Originality a) | .57
2: Traditional scoring method @ 20% (Originality b) | .62
3: Originality a / fluency | .43
4: Originality b / fluency | .37
5: External raters' ratings of the entire ideational pool | .70
6: External raters' ratings of answers provided by less than 20% of the sample | .67
7: Originality via Method 6 / fluency | .53

Note. ICC = intraclass correlation coefficient.

Given the frequency with which educators use DT test scores to make important placement decisions, such as identification for accelerated learning programs, we used ranks to illustrate the influence of different scoring methods on decision making in real educational settings. In this study, as the objective scoring method (i.e., Method 4) as well as the combination of objective and subjective scoring (Method 7) appear to be relatively more promising, they were used to rank students. Figure 1 depicts how student ranks for originality differ according to the two scoring methods. If only minor differences exist, the scatterplot would depict points clustered along a straight line moving from the bottom left to the top right, with few data points in the upper left and lower right quadrants, and the correlation among the ranks would be large and statistically significant. The data in Figure 1 suggest that, for the majority of students, the ranks are similar. However, the number of students who ranked high according to one method but low on the other is striking; this phenomenon appears to be especially marked for students who scored high using Method 7.

The scatterplot data can be interpreted in several ways, but one interpretation appears to be most plausible. Method 4 controlled the impact of fluency by dividing originality scores obtained via Method 2 by fluency scores, but the data suggest that fluency is still a contaminating factor in the calculation of some originality scores because some students who ranked very high according to Method 7 (dividing originality scores obtained via Method 6 by fluency scores) ended up with very low ranks according to Method 4. Hence, alternative methods for scoring originality and controlling for fluency effects are critical in applied settings.
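
The rank comparison summarized in Figure 1 can be reproduced mechanically: rank students under each method and look for large disagreements. A sketch with invented scores follows; the choice to report the three largest shifts is arbitrary.

```python
import numpy as np
from scipy.stats import kendalltau, rankdata

# Invented originality scores for the same six students under two methods.
method4 = np.array([0.61, 0.42, 0.80, 0.55, 0.33, 0.71])
method7 = np.array([1.90, 1.10, 1.30, 2.20, 0.90, 1.80])

ranks4 = rankdata(-method4)  # rank 1 = most original under Method 4
ranks7 = rankdata(-method7)  # rank 1 = most original under Method 7

tau, p = kendalltau(ranks4, ranks7)
print(f"Kendall tau between the two rankings: {tau:.2f} (p = {p:.3f})")

# Students whose placement changes most if one method replaces the other.
shift = np.abs(ranks4 - ranks7)
print("largest rank shifts (student indices):", np.argsort(-shift)[:3])
```
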

TABLE 4
Correlations Between Pat and Kelly Problem Scores and Criterion Measures

Scoring Method | CPS r | CPS tau | CBI r | CBI tau | Performance r | Performance tau
Fluency | .186 | .142 | .324 | .216 | .071 | .068
1: Traditional scoring method @ 10% (Originality a) | .260 | .219 | .356 | .268 | .094 | .127
2: Traditional scoring method @ 20% (Originality b) | .294 | .237 | .426 | .291 | .125 | .132
3: Originality a / fluency | .294 | .203 | .330 | .238 | .164 | .138
4: Originality b / fluency | .386 | .260 | .407 | .290 | .197 | .118
5: External raters' ratings of the entire ideational pool | .211 | .141 | .312 | .211 | .077 | .080
6: External raters' ratings of answers provided by less than 20% of the sample | .328 | .232 | .425 | .298 | .058 | .086
7: Originality via Method 6 / fluency | .400 | .275 | .360 | .249 | .193 | .121

Note. CPS = Creative Personality Scale; CBI = Creative Behavior Inventory; Performance = performance measure of creativity. r = Pearson product-moment correlation; tau = Kendall rank-order correlation. *p < .05. **p < .01.
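
The r and tau columns in Tables 4 and 5 are Pearson product-moment and Kendall rank-order coefficients; the short sketch below computes both for one originality score against one criterion measure. The vectors are placeholders, not study data.

```python
from scipy.stats import kendalltau, pearsonr

# Placeholder vectors: one originality score per student and one criterion
# measure (e.g., a CBI total) for the same students.
originality = [2, 5, 3, 7, 4, 6, 1, 5]
criterion = [10, 22, 15, 30, 18, 25, 8, 21]

r, p_r = pearsonr(originality, criterion)
tau, p_tau = kendalltau(originality, criterion)
print(f"r = {r:.3f} (p = {p_r:.3f}); tau = {tau:.3f} (p = {p_tau:.3f})")
```
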
FIGURE 1. Scatterplot of ranks based on the Pat and Kelly problems.

Relationship Between Fluency and Originality

To further explore the relationship between fluency and originality scores, Table 5 displays the correlations between fluency and the various originality scores. Apparently, Method 5 was influenced the most by fluency. However, the magnitude of the correlations between fluency and the other methods was also generally large. Considering the nature of the traditional scoring methods for originality (i.e., originality was calculated by counting the number of responses provided by less than 20% of the sample), these results are not surprising.

TABLE 5
Correlations Between Fluency and the Seven Originality Scoring Methods

Scoring Method for Originality | r | tau
1: Traditional scoring method @ 10% (Originality a) | .71 | .54
2: Traditional scoring method @ 20% (Originality b) | .92 | .77
3: Originality a / fluency | .30 | .24
4: Originality b / fluency | .32 | .28
5: External raters' ratings of the entire ideational pool | .95 | .82
6: Originality c, external raters' ratings of answers provided by less than 20% of the sample | .88 | .69
7: Originality via Method 6 / fluency | .33 | .26

Note. Correlations are with fluency. r = Pearson product-moment correlation; tau = Kendall rank-order correlation. *p < .05. **p < .01.

DISCUSSION

Reliability

In this study, two real-world problems were used as the DT tasks, and four objective methods, one subjective method, and two methods involving both subjective and objective scoring were employed to score participants' responses for originality. Previous creativity researchers (e.g., Hocevar, 1981; Silvia et al., 2008) concluded that subjective scoring methods for DT tasks are more appropriate from a conceptual and psychometric perspective. As one finding of our study indicates, Method 5 (scoring of originality using external raters' ratings of the entire ideational pool) did produce the most convincing evidence of reliability among the seven methods. At the same time, the results in Table 5 suggest that Method 5 is the subjective measure most likely to be influenced by fluency effects. These findings were consistent with the results in Plucker et al. (2011). In other words, the content of DT tasks does not seem to have a significant impact on the effectiveness of various scoring methods in terms of reliability.

In addition, with regard to the traditional objective scoring of originality, some researchers have suggested assigning one point for responses given by fewer than 5% or 10% of the sample (e.g., Milgram & Milgram, 1976), although a more common practice is to count the number of responses provided by less than 20% of the sample (e.g., Plucker et al., 2011). According to Table 3 in this study, there is no obvious difference in reliability estimates between Method 1 (i.e., 10%) and Method 2 (i.e., 20%), or between Method 3 and Method 4. This indicates that the degree of infrequency does not greatly influence reliability evidence.

Validity

According to the results in Plucker et al. (2011), fluency was not associated with evidence of validity. In this study, however, fluency scores were significantly correlated with some of the criterion measures, although the magnitude was relatively small. For originality, each scoring method was associated with considerable evidence of validity, although the levels of evidence varied. Compared to the findings in Plucker et al. (2011), the traditional scoring method (Method 2) and the percentage scoring method (Method 4) still performed well in terms of convergent validity and predictive validity, but the combinations of subjective and objective scoring (i.e., Methods 6 and 7) also looked promising. The magnitude of all correlations was considerably larger than those in the previous study, suggesting that the content of DT tasks matters, given that this study used two real-world prompts and Plucker et al. (2011) used two abstract prompts (i.e., the Wheels and Noise tests).

In addition, according to Table 4, there are only minor differences in validity indices between Method 1 (i.e., 10%) and Method 2 (i.e., 20%), or between Method 3 (10% / fluency) and Method 4 (20% / fluency). These results provide evidence that different cut-off points do not play a dominant role in the effectiveness of the various objective scoring methods in terms of convergent and predictive validity evidence.

With respect to concurrent validity, none of the methods performed impressively. This confirms the findings in Plucker et al. (2011), but stands in contrast to the work of Silvia et al. (2008), who found somewhat impressive results for subjective scoring methods. This inconsistency may reflect that completely subjective scoring techniques can be unstable due to the characteristics of raters (Kaufman et al., 2008; Runco, 2008) or other factors such as the use of different criterion measures (e.g., the Big Five Personality Test in Kaufman et al. versus the performance measure of creativity in the present study) and different DT tests.

One striking finding of this study is the evidence regarding social consequences of test use. Rank-order correlations were used to illustrate the influence of various scoring methods on educational practice. Figure 1 demonstrates that the ranks based on Method 4 (originality via Method 2 / fluency) and Method 7 (originality via Method 6 / fluency) were far from identical. For example, a student received a rank in the top 30 according to Method 4, but the same student was placed over 100th when Method 7 was used. These findings suggest that even small differences in scoring techniques can have a disproportionate impact on score rankings for some students.

Given that the purpose of this study was to explore the impact of different methods for controlling fluency, it seems reasonable to conclude that using alternative, fluency-controlling methods for calculating DT originality does not have a significant impact on the majority of participants; but for a small subsample, controlling for fluency does lower originality scores substantially relative to their peers.

Relationship Between Fluency and Originality

The correlations between fluency and the creativity criterion measures were low (Table 4), which implies that fluency should not play the same role as originality in scoring students' responses to DT tests. However, fluency significantly correlated with most of the scoring methods. In other words, fluency is, indeed, a contaminating factor in scoring originality, and the techniques used in this study to control for this contaminating influence were only partially successful.

Future Research

Improving the precision and psychometric quality of measures, especially the most widely used measures, will continue to be a critical contribution to building the foundations of the field of creativity research. Although this study suggests that a combination of subjective and objective scoring techniques may be a good candidate for scoring DT tests for originality, more research is needed to examine the generalizability of this conclusion and its psychometric implications, especially in the areas of concurrent validity and social consequences of test use. For example, examining both the originality and the appropriateness or utility of responses (e.g., Runco & Charles, 1993), especially when real-world DT tasks are used, is appropriate and in line with recent conceptualizations of creativity (Plucker, Beghetto, & Dow, 2004).

REFERENCES

Amabile, T. M. (1982). Social psychology of creativity: A consensual assessment technique. Journal of Personality and Social Psychology, 43, 997–1013.
Amabile, T. M. (1996). Creativity in context: Update to "The social psychology of creativity." Boulder, CO: Westview Press.
Caroff, X., & Besançon, M. (2008). Variability of creativity judgments. Learning and Individual Differences, 18, 367–371.
Chand, I., & Runco, M. A. (1992). Problem finding skills as components in the creative process. Personality and Individual Differences, 14, 155–162.
Clark, P. M., & Mirels, H. L. (1970). Fluency as a pervasive element in the measurement of creativity. Journal of Educational Measurement, 7, 83–86.
Davis, G. A. (1989). Testing for creative potential. Contemporary Educational Psychology, 14, 257–274.
Gough, H. G. (1979). A creative personality scale for the Adjective Check List. Journal of Personality and Social Psychology, 37, 1398–1405.
Hocevar, D. (1979a). A comparison of statistical infrequency and subjective judgment as criteria in the measurement of originality. Journal of Personality Assessment, 43, 297–299.
Hocevar, D. (1979b). Ideational fluency as a confounding factor in the measurement of originality. Journal of Educational Psychology, 71, 191–196.
Hocevar, D. (1979c). The unidimensional nature of creative thinking in fifth grade children. Child Study Journal, 9, 273–278.
Hocevar, D. (1981). Measurement of creativity: Review and critique. Journal of Personality Assessment, 45, 450–464.
Hocevar, D., & Bachelor, P. (1989). A taxonomy and critique of measurements used in the study of creativity. In J. A. Glover, R. R. Ronning, & C. R. Reynolds (Eds.), Handbook of creativity (pp. 53–75). New York, NY: Plenum Press.
Hocevar, D., & Michael, W. B. (1979). The effects of scoring formulas on the discriminant validity of tests of divergent thinking. Educational and Psychological Measurement, 39, 917–921.
Hong, E., Milgram, R. M., & Gorsky, H. (1995). Original thinking as a predictor of creative performance in young children. Roeper Review, 18, 147–149.
Kane, M. T. (2008). Terminology, emphasis, and utility in validation. Educational Researcher, 37, 76–82.
Kaufman, J. C., Baer, J., Cole, J. C., & Sexton, J. D. (2008). A comparison of expert and nonexpert raters using the consensual assessment technique. Creativity Research Journal, 20, 171–178.
Kaufman, J. C., Plucker, J. A., & Baer, J. (2008). Essentials of creativity assessment. New York, NY: Wiley.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18, 5–11.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Milgram, R. M., & Hong, E. (1993). Creative thinking and creative performance in adolescents as predictors of creative attainments in adults: A follow-up study after 18 years. Roeper Review, 15, 135–139.
Milgram, R. M., & Milgram, N. A. (1976). Creative thinking and creative performance in Israeli students. Journal of Educational Psychology, 68, 255–259.
Milgram, R. M., & Milgram, N. A. (1978). Quality and quantity of creative thinking in children and adolescents. Child Development, 49, 385–388.
Okuda, S. M., Runco, M. A., & Berger, D. E. (1991). Creativity and the finding and solving of real-world problems. Journal of Psychoeducational Assessment, 9, 45–53.
Plucker, J. A. (1999a). Is the proof in the pudding? Reanalyses of Torrance's (1958 to present) longitudinal data. Creativity Research Journal, 12, 103–114.
Plucker, J. A. (1999b). Reanalyses of student responses to creativity checklists: Evidence of content generality. Journal of Creative Behavior, 33, 126–137.
Plucker, J., Beghetto, R. A., & Dow, G. (2004). Why isn't creativity more important to educational psychologists? Potential, pitfalls, and future directions in creativity research. Educational Psychologist, 39, 83–96.
Plucker, J., Kaufman, J. C., Temple, J. S., & Qian, M. (2009). Do experts and novices evaluate movies the same way? Psychology & Marketing, 26, 470–478.
Plucker, J., & Makel, M. (2010). Assessment of creativity. In R. J. Sternberg & J. C. Kaufman (Eds.), The Cambridge handbook of creativity (pp. 48–73). New York, NY: Cambridge University Press.
Plucker, J., Qian, M., & Wang, S. (2011). Is originality in the eye of the beholder? Comparison of scoring techniques in the assessment of divergent thinking. Journal of Creative Behavior, 45, 1–22.
Plucker, J. A., & Runco, M. (1998). The death of creativity measurement has been greatly exaggerated: Current issues, recent advances, and future directions in creativity assessment. Roeper Review, 21, 36–39.
Runco, M. A. (1984). Teachers' judgments of creativity and social validation of divergent thinking tests. Perceptual and Motor Skills, 59, 711–717.
Runco, M. A. (1985). Reliability and convergent validity of ideational flexibility as a function of academic achievement. Perceptual and Motor Skills, 61, 1075–1081.
Runco, M. A. (1987). Interrater agreement on a socially valid measure of students' creativity. Psychological Reports, 61, 1009–1010.
Runco, M. A. (1991). Divergent thinking. Norwood, NJ: Ablex.
Runco, M. A. (1992). Children's divergent thinking and creative ideation. Developmental Review, 12, 233–264.
Runco, M. A. (1999). Divergent and creative thinking. Cresskill, NJ: Hampton Press.
Runco, M. A. (2008). Commentary: Divergent thinking is not synonymous with creativity. Psychology of Aesthetics, Creativity, and the Arts, 2, 93–96.
Runco, M. A., & Albert, R. S. (1985). The reliability and validity of ideational originality in the divergent thinking of academically gifted and nongifted children. Educational and Psychological Measurement, 45, 483–501.
Runco, M. A., & Charles, R. E. (1993). Judgments of originality and appropriateness as predictors of creativity. Personality and Individual Differences, 15, 537–546.
Runco, M. A., Illies, J. J., & Eisenman, R. (2005). Creativity, originality, and appropriateness: What do explicit instructions tell us about their relationships? Journal of Creative Behavior, 39, 137–148.
Runco, M. A., & Mraz, W. (1992). Scoring divergent thinking tests using total ideational output and a creativity index. Educational and Psychological Measurement, 52, 213–221.
Runco, M. A., & Okuda, S. M. (1988). Problem-discovery, divergent thinking, and the creative process. Journal of Youth and Adolescence, 17, 211–220.
Runco, M. A., Okuda, S. M., & Thurston, B. J. (1987). The psychometric properties of four systems for scoring divergent thinking tests. Journal of Psychoeducational Assessment, 5, 149–156.
Runco, M. A., & Smith, W. R. (1992). Interpersonal and intrapersonal evaluations of creative ideas. Personality and Individual Differences, 13, 295–302.
Sawyers, J. K., & Canestaro, N. C. (1989). Creativity and achievement in design coursework. Creativity Research Journal, 2, 126–133.
Silvia, P. J. (2008). Creativity and intelligence revisited: A latent variable analysis of Wallach and Kogan (1965). Creativity Research Journal, 20, 34–39.
Silvia, P. J. (2011). Subjective scoring of divergent thinking: Examining the reliability of unusual uses, instances, and consequences tasks. Thinking Skills and Creativity, 6, 24–30.
Silvia, P. J., Martin, C., & Nusbaum, E. C. (2009). A snapshot of creativity: Evaluating a quick and simple method for assessing divergent thinking. Thinking Skills and Creativity, 4, 79–85.
Silvia, P. J., Winterstein, B. P., Willse, J. T., Barona, C. M., Cram, J. T., Hess, K. I., . . . Richard, C. A. (2008). Assessing creativity with divergent thinking tasks: Exploring the reliability and validity of new subjective scoring methods. Psychology of Aesthetics, Creativity, and the Arts, 2, 68–85.
Torrance, E. P. (1988). The nature of creativity as manifest in its testing. In R. J. Sternberg (Ed.), The nature of creativity: Contemporary psychological perspectives (pp. 43–75). New York, NY: Cambridge University Press.
Wallach, M. A., & Kogan, N. (1965). Modes of thinking in young children: A study of the creativity-intelligence distinction. New York, NY: Holt, Rinehart and Winston.
Wilson, R. C., Guilford, J. P., & Christensen, P. R. (1953). The measurement of individual differences in originality. Psychological Bulletin, 50, 362–370.

APPENDIX A: REAL-WORLD DIVERGENT THINKING TASKS

Pat problem: Your friend Pat sits next to you in class. Pat really likes to talk to you and often bothers you while you are doing your work. Sometimes Pat distracts you and you miss an important part of the lecture, and many times you don't finish your work because Pat is bothering you. What should you do? How would you solve this problem? Remember to list as many original ideas and solutions as you can.

Kelly problem: It's a great day for sailing and your friend, Kelly, calls and asks you if you want to go sailing. Unfortunately, you have a big project due tomorrow, and it requires a full day to complete. You would rather be sailing. What are you going to do? Think of as many original ideas as you can.