
Learning and Instruction 21 (2011) 220–231

www.elsevier.com/locate/learninstruc

Measuring spontaneous and instructed evaluation processes during Web search: Integrating concurrent thinking-aloud protocols and eye-tracking data
Peter Gerjets*, Yvonne Kammerer, Benita Werner
Knowledge Media Research Center, Konrad-Adenauer-Str. 40, 72072 Tuebingen, Germany

Abstract

Searching the Web for complex information requires appropriately evaluating diverse sources of information. Information science studies
identified different criteria applied by searchers to evaluate Web information. However, the explicit evaluation instructions used in these studies
might have resulted in a distortion of spontaneous evaluation processes. Accordingly, the present study compared explicit evaluation instructions
and neutral thinking-aloud instructions. Data from thinking-aloud protocols, eye tracking, and information problem-solving were collected from
30 participants equally distributed to two experimental conditions, that is, the Instructed Evaluation condition and the Spontaneous Evaluation
condition. Instructed evaluation, as compared to spontaneous evaluation, resulted in more verbal utterances of quality-related evaluation criteria,
in an increased attention focus on user ratings displayed on Web pages, and in better quality of decision making, although participants in the
Instructed Evaluation condition were not able to better justify their decision as compared to participants in the Spontaneous Evaluation
condition.
© 2010 Elsevier Ltd. All rights reserved.

Keywords: Web search; Evaluation criteria; Thinking aloud; Eye tracking

* Corresponding author. Tel.: +49 7071 979 219; fax: +49 7071 979 126.
E-mail address: p.gerjets@iwm-kmrc.de (P. Gerjets).

0959-4752/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.learninstruc.2010.02.005

1. Introduction

For the public, the World Wide Web (WWW) has increasingly evolved into one of the most important information sources, especially when it comes to complex contents of personal concern like medical, environmental, or technical issues (Stadtler & Bromme, 2007). More and more, searching for complex and science-related information on the Web supplements the interaction with experts, for instance, when a diagnosis or a treatment for a medical or technical problem is needed. However, on the Web not only scientific and other institutions, but also journalists, companies, and lay-people provide information on complex domains, yielding a substantial variability in terms of the quality and reliability of Web information. There is hardly any quality assurance on the Web as it used to be when expert knowledge was obtained from academic professionals or publishers (e.g., certificates of professional qualifications).

As a result, Web users are required to appropriately evaluate diverse, potentially diffuse, or even contradictory sources of information. Hence, Web search has turned into an activity that goes far beyond simple fact finding or browsing, although these tasks are still standard paradigms used in scientific studies on Web search (Brumby & Howes, 2008; Cutrell & Guan, 2007; Joachims, Granka, Pan, Hembrooke, & Gay, 2005; Miller & Remington, 2004). Instead, Web search is often related to personal decisions under uncertainty in domains characterized by fragile and conflicting evidence where goals of high personal relevance are set up and pursued in a self-directed way. Accordingly, adequate strategies for the critical evaluation of information quality while searching for complex contents on the Web become a crucial factor for the

searcher’s success in terms of gaining important and valid information as well as in terms of solving his or her information problems. However, it has been shown that searchers usually face difficulties in appropriately evaluating information during Web search (Brand-Gruwel, Wopereis, & Vermetten, 2005; Gerjets & Hellenthal-Schorr, 2008).

From these considerations, it should become clear that there are normative as well as descriptive questions to be answered with regard to the evaluation of Web information. Normative issues concern the questions of what evaluation processes searchers should engage in and how they should be supported to do so. Descriptive issues address the questions of what evaluation processes searchers actually do display spontaneously during Web search and how different constraints of their search situation actually do influence their engagement in evaluation processes. In this paper we focus on a descriptive perspective by investigating what type of spontaneous evaluation processes occur when users solve a complex search task related to a medical problem by using a standard search-engine design. In particular, we are interested in the role of different evaluation criteria in different stages of the Web search process.

1.1. Evaluation processes during information search on the WWW

Several models have described the process of Web search as an information problem-solving process that can be segmented into several sub-processes (Brand-Gruwel et al., 2005; Gerjets & Hellenthal-Schorr, 2008; Marchionini, 1995; Wilson, 1999). A compilation from different existing models yields five important stages of the Web search process:

(a) An information problem is recognized and a search goal is defined.
(b) Mostly, as a means to access Web information a search engine (e.g., Google) is selected, where a query with search terms is entered and sent off. Subsequently, a so-called "search engine result page" (SERP) with a list of search result links is returned to the user.
(c) Available search results are scanned and evaluated with regard to their significance for the search goal and links have to be selected for further inspection. This evaluation of search results is based on sparse information about the corresponding Web pages and the information they contain (e.g., titles, summaries, and uniform resource locators, URLs). Clicking on a link can be regarded as the central (observable) action in the information search process as it reflects which information will be subsequently accessed.
(d) After accessing a selected Web page, its information is scanned, evaluated with regard to its relevance for the search goal, and in case of relevance information is extracted for further processing.
(e) The information from different pages is compared and integrated towards a solution of the information problem. Here again evaluations are required to resolve conflicts and incoherences within the document collection retrieved.

Accordingly, three different types of evaluation processes are pivotal during Web search, namely the evaluation (a) of search results, (b) of Web pages, and (c) of document collections. Several studies have focused on the third evaluation phase, that is, on the evaluation of an incoherent document collection in order to construct an integrated mental representation of the search domain (Rouet, 2006; Rouet, Favart, Britt, & Perfetti, 1997; Stadtler & Bromme, 2007). In the present article, however, the focus will be on the cognitive processes occurring during the first two evaluation phases, that is, the search-result evaluation phase and the Web-page evaluation phase. Specifically, for the search-result evaluation phase it will be analyzed how search results are scanned and evaluated with regard to their significance for the current search goal. For the Web-page evaluation phase it will be investigated how users subsequently evaluate the Web pages that they actually selected from a SERP. Nevertheless, the third evaluation phase will not be neglected, as searchers’ solutions to their information problem that result from integrating different sources will also be analyzed.

1.2. Judging the "relevance" of Web information

Theoretical accounts for the evaluation of search results and Web pages usually refer to the concept of relevance. However, the definition of this concept varies between different scientific disciplines concerned with Web search, with a rather narrow definition of relevance in information retrieval (IR) research and a broader definition in information science.

In research on IR, the term relevance denotes how well a retrieved set of documents meets the information needs of a user (Maglaughlin & Sonnenwald, 2002). In this literature, relevance thus refers to topical relevance (or topicality), that is, the match between the topic of the information need or query and the topic of a search result (Pirolli & Card, 1999). Several Web search studies have shown that users rely heavily on the topicality ranking offered by search engines and predominantly select the first few links on the top of a SERP (Cutrell & Guan, 2007; Granka, Joachims, & Gay, 2004; Guan & Cutrell, 2007; Joachims et al., 2005; Pan et al., 2007). In accordance with the IR tradition to identify relevance with topicality, cognitive Web-search models derived from research at the intersection of IR, human-computer interaction (HCI), and cognitive science are usually restricted to the evaluation of topicality (Brumby & Howes, 2008; Miller & Remington, 2004; Pirolli & Card, 1999; Pirolli & Fu, 2003).

Based on the empirical success of these models to explain and predict the selection of search results, it seems at first sight that the concept of topical relevance is sufficient for understanding users’ evaluation of Web information. It has to be noted, however, that this conclusion might be a bit premature, because these models have been mainly applied to rather simple fact-finding tasks or to search tasks for which a selection of Web sources containing uncontroversial and consistent information of high quality was provided. Such tasks demand that users focus their attention mainly on the topical fit of

available information. However, in contrast to traditional search tasks like fact finding, searching for complex information on the Web is characterized by a much larger variability with regard to search goals pursued and information quality encountered. It can thus be assumed that Web searchers will also evaluate other aspects of Web information beyond topicality, at least when certain preconditions are given (e.g., the search task is sufficiently complex, the available information is highly variable with regard to its quality, the user has the cognitive prerequisites to engage in these processes at his or her disposal).

Furthermore, it has to be considered that different stages of the search process might differ with regard to the evaluation processes users engage in. It can be assumed that evaluating links on a SERP differs from evaluating the corresponding Web pages in several respects. For instance, SERPs are very often confined to topical information in terms of the title and the description of each search result. Contrary to that, Web pages usually also include quality-related cues like user ratings, author information, or publication information. Accordingly, users might be more focused on the evaluation of topicality when selecting links from a SERP, whereas they might be more open to other quality-related evaluation criteria when scanning the corresponding Web pages.

One way of addressing the assumption that evaluating Web information should not be confined to judgments of topicality is to broaden the concept of relevance beyond topicality. For instance, in information science relevance is introduced as a concept that cannot be reduced to topical fit, but is based on a set of different evaluation criteria (Borlund, 2003; Schamber, 1994) also reflecting the quality of information. In the last decade several studies which addressed the evaluation of Web search results and Web pages (Crystal & Greenberg, 2006; Rieh, 2002; Savolainen & Kari, 2005; Tombros, Ruthven, & Jose, 2005) showed that although topical fit (e.g., topicality, topical interest, scope) is the evaluation criterion most often applied, searchers nevertheless used other evaluation criteria as well which refer to the quality of information (e.g., credibility, up-to-dateness, design). Moreover, in these studies, publishing information (e.g., publication date, name and profession of the author) and source information (e.g., references, contact information, affiliation) turned out to be important Web page characteristics that were used to evaluate the quality of Web information. According to social navigation research it can be additionally assumed that social information like user ratings (e.g., rating stars, comments) will also be used by searchers to derive quality judgments when navigating and exploring information spaces (Höök, Benyon, & Munro, 2003). However, as we will argue in the next section, the results on evaluation criteria obtained in information science are by no means conclusive yet due to two methodological shortcomings of current research paradigms. Beyond this methodological critique, we also have the impression that the idea of including quality aspects into the concept of relevance might be rather confusing for readers not familiar with this terminology from information science. Thus, in the remainder of this paper we will avoid using the concept of relevance and will instead directly refer to topicality and quality as two important evaluation criteria.

1.3. Methodological considerations: measuring evaluation processes during Web search

Concerning the measurement of evaluation processes occurring during Web search, the scope of most studies on evaluation criteria conducted in information science is limited by two specific methodological shortcomings.

First, many studies investigating the use of different evaluation criteria in the evaluation of Web information seem to fail in addressing spontaneous evaluation processes as participants are explicitly instructed to evaluate search results and Web pages. All of these studies have in common, independent of the methods of data collection applied (e.g., screen capturing, thinking-aloud protocols, highlighting of important aspects of Web pages, or retrospective reports), that participants are instructed beforehand to mention or mark important factors or criteria which might influence their evaluation of information (Crystal & Greenberg, 2006; Rieh, 2002; Savolainen & Kari, 2005; Tombros et al., 2005). This kind of task does not seem to be suitable to investigate spontaneous evaluation processes, as it is likely to yield artifacts. Participants may become much more aware of their evaluation and selection processes due to the instruction used, resulting in a distortion of their spontaneous evaluation processes. In the present study the aim was to provide evidence that, as criticized, a thinking-aloud procedure including explicit instructions to apply evaluation criteria during Web search (as it is done, for instance, by Tombros et al., 2005) significantly increases the number of evaluation criteria mentioned compared to standard neutral thinking-aloud instructions (in line with Ericsson & Simon, 1993), which should reveal more spontaneous and undirected evaluation processes.

A second concern with regard to the methods currently used to investigate evaluation criteria in information science is the strong focus on consciously accessible, verbalized criteria and decisions that lead to overt interactions with the search environment (e.g., mouse movements and clicks). Hence, evaluation processes that go beyond overt actions remain largely undiscovered, and so do quick and unconscious evaluation processes. Moreover, the evaluation processes leading to the decision to not select a search result are as important as the processes leading to the selection of a search result. To unravel these processes that might not show up in overt behaviour, eye-tracking methodologies seem particularly promising because they provide more detailed insights into participants’ cognitive processing (Rayner, 1998). For instance, eye tracking makes it possible to reconstruct every search result and every Web page characteristic that was looked at, irrespective of its selection or rejection or of being verbally addressed or not. Moreover, Van Gog, Paas, and Van Merriënboer (2005) argue that especially a combined use of thinking-aloud protocols and eye-tracking data can provide deeper insights into implicit and fine-grained aspects of cognitive processes. Therefore, in the study outlined in this

paper concurrent thinking-aloud data were supplemented by participants’ eye-tracking data during Web search. This method also allows us to test whether explicit evaluation instructions alter the way users pay attention to different types of information.

1.4. The present study

The present study tested the general hypothesis that explicit evaluation instructions as used in information science studies (Rieh, 2002; Tombros et al., 2005) might result in a severe distortion of users’ spontaneous evaluation processes. As control condition neutral thinking-aloud instructions were chosen in order to detect this distortion in participants’ verbal utterances. Additionally, eye-tracking data were used in combination with thinking-aloud protocols to reveal fine-grained or implicit cognitive processes that do not show up in users’ verbal utterances.

It was expected that explicit evaluation instructions (compared to neutral thinking-aloud instructions) will influence users engaged in a complex search task with regard to different aspects of their Web search behaviour, namely their verbal utterances and eye movements with respect to SERPs and selected Web pages, their search result selection on SERPs, and the result of their information problem solving. The basic assumption that explicit evaluation instructions will elicit evaluation processes that would not take place spontaneously thus results in the following more specific expectations.

1.4.1. Verbal utterances

Explicit evaluation instructions should result in an increased number of verbal utterances addressing different evaluation criteria during both the evaluation of SERPs and of selected Web pages. In particular, for criteria that go beyond topicality and concern the quality of Web information (e.g., credibility, up-to-dateness, design) a strong increase was expected due to the explicit evaluation instructions, as compared to spontaneously produced verbal utterances because, in the latter case, searchers are probably less used to evaluating the quality of information than its topicality (Hypothesis 1).

1.4.2. Eye-tracking data

The assumption that explicit evaluation instructions will elicit evaluation processes that would not take place spontaneously gives rise to the expectation that users’ eye-tracking data will differ on parameters that might indicate evaluation processes. Potential indicators for processes of search result evaluation are a higher number of attended search results or a longer total fixation time on search results. Additionally, it was expected that explicit evaluation instructions, as compared to neutral thinking-aloud instructions, would result in a longer total fixation time or a higher number of gazes on specific pieces of information (e.g., publishing information, source information, user ratings) on a SERP or a Web page, indicating that users try to evaluate the quality of search results and of retrieved Web pages (Hypothesis 2).

1.4.3. Search result selection

Spontaneously, users tend to predominantly select the first few search results on a SERP and neglect the remaining ones (Guan & Cutrell, 2007; Joachims et al., 2005). Explicit evaluation instructions, as compared to neutral thinking-aloud instructions, should increase the number of search results with lower rankings chosen because users will be stimulated to evaluate all search results on their own with regard to information quality instead of selecting links only according to the topicality ranking provided by the search engine (Hypothesis 3).

1.4.4. Quality of information problem solving

If explicit evaluation instructions stimulated users to evaluate search results or Web pages in the course of their information problem solving, then the quality of the resulting problem solution should improve (e.g., the quality of an informed decision users are asked for or the quality of a statement to justify an informed decision) as compared to spontaneous evaluation processes (Hypothesis 4).

2. Method

2.1. Design – participants

The independent variable of the study was the type of instructions given to the participants. Participants received either neutral thinking-aloud instructions (in line with Ericsson & Simon, 1993) without any instructions to evaluate (Spontaneous Evaluation condition) or thinking-aloud instructions including instructions to evaluate (Instructed Evaluation condition; cf. Tombros et al., 2005). In the latter case participants were asked to mention the evaluation criteria that they applied while selecting search results and assessing Web pages.

Participants in the study were 30 students (10 male, 20 female; M = 25.40 years, SD = 3.95) from social and natural sciences and humanities (no computer science students) at the University of Tuebingen, Germany; participation was rewarded with either course credit or payment. Participants had normal or corrected to normal vision.

Participants were randomly assigned to the two experimental conditions with 15 participants serving in each of the conditions.

2.2. Material and apparatus

2.2.1. Task

A complex and controversial domain characterized by fragile and conflicting evidence was chosen that provided sufficient affordances for searchers to engage in evaluation processes beyond topicality. The task was to achieve an informed decision between a low fat and a low carb (i.e., carbohydrates) diet with regard to which of the two diet methods better promotes a healthy and long-lasting weight loss. In line with the method used by Stadtler and Bromme (2007) participants were confronted with a request from a fictitious overweight friend, who wants to lose weight by changing her diet and asks for advice. Participants were asked

to conduct a 20-min Web search regarding this controversial topic in order to decide which of the two diet methods they would recommend to their friend.

2.2.2. Web stimulus materials

For their Web search, participants were provided with three preselected Google-like SERPs. All three SERPs were accessible by means of a start Web page presenting three hyperlinks with the search terms used to generate these SERPs. The three search terms were "low fat", "low carb", and "low carb + low fat" whereby each of the search terms was used to generate one SERP containing 10 search results (for an example screenshot of the SERP "low carb + low fat" see Fig. 1). The set of search results and Web pages for each of the three SERPs reflected the given heterogeneity of information sources and their different perspectives and interests with regard to this controversial search topic. All three SERPs included Web pages provided by scientific institutions (e.g., universities), journalists (e.g., online magazines), industry and companies (e.g., online shops for nutrition or pharmaceutics), and lay-people (e.g., discussion pages). There was an approximately equal distribution of the type of Web pages for each SERP. Accordingly, for each SERP the available information varied largely with regard to contents and quality. The three SERPs and the 30 Web pages (with 845 words per Web page on average and a range from 241 words to 3191 words per Web page) corresponding to the search results were put offline to guarantee a standardized and controlled experimental setting. Hyperlinks within the 30 Web pages could be used whereby in that case the corresponding Web pages were accessed online. The layout of the Google-like SERPs was set up close to original, but ads and the hyperlinks "in cache" and "similar pages" were removed. The Web materials were displayed on a standard 17-inch computer screen and were presented with Microsoft Internet Explorer 6.

2.2.3. Eye tracking equipment

For eye tracking during task processing a 50 Hz Tobii 1750 remote eye-tracking system with infrared-cameras built into a 17-inch monitor (www.tobii.com) was used. The Web stimulus recording mode of the ClearView 2.7.1 analysis software was used that captures not only the eye movements, but the entire task performance process (including mouse operations). The minimum fixation duration was set to 100 ms with a fixation radius of 30 pixels (cf. Cutrell & Guan, 2007). Participants’ thinking-aloud protocols were recorded digitally by Camtasia 3.0 software using a standard microphone attached to the PC.

2.2.4. Thinking-aloud instructions

In the Spontaneous Evaluation condition instructions to think aloud were worded in line with the standards described by Ericsson and Simon (1993). The instructions were:

    Please think aloud during your Web search, that is, verbalize everything that comes to your mind.

    Please keep constantly talking from beginning till the end of the task.

    Act as if you were alone, with no one listening, and just keep talking.

In contrast, in the Instructed Evaluation condition the instructions were similar to the instructions used, for instance, by Tombros et al. (2005) or Rieh (2002):

    Please think aloud during your Web search, that is, mention the evaluation criteria you apply to select search results and to assess Web pages.

    Please keep constantly talking from beginning till the end of the task.

    Act as if you were alone, with no one listening, and just keep talking.

During task performance, whenever participants stopped verbalizing their thoughts, the experimenter reminded them (after 5 s) to think aloud.

2.2.5. Demographics, computer use and Web search skills, prior knowledge

Before task processing, participants were asked to fill in a short computer-based questionnaire about demographics (gender, age) as well as about computer use and their skills in Web search (5 items; Cronbach’s α = .82). Moreover, the questionnaire included nine items about participants’ prior knowledge on diets and nutrition; an example item is "I never heard about the low carb versus low fat controversy". The items had to be rated on five-point Likert-type response scales ranging from 1 (highly disagree) to 5 (highly agree). Cronbach’s alpha was .86 for the nine items. Analyses of the respective data revealed no differences between the two experimental conditions, that is, for gender, χ²(1, N = 29) = 0.84, ns, for age, t(27) = 0.07, ns, for computer use and Web search skills, t(27) = 0.96, ns, and for prior knowledge on diets and nutrition, t(27) = 0.03, ns.

2.2.6. Decision making and decision justification

At the end of task processing, participants were required to decide which of the two diet methods they would recommend to their friend. Additionally, participants had to write a short statement to justify their decision.

2.3. Procedure

Participants were tested in individual sessions of approximately 1 h. First, they filled in the questionnaire on demographics, computer use and Web search skills, and their prior knowledge on diets and nutrition. They also received instructions about Web search as well as the thinking-aloud instructions according to their experimental condition. Participants were then calibrated on the eye-tracking system using a 9-point calibration.

Before working on the main task, participants underwent a training task for approximately 5 min to get acquainted with the thinking-aloud method and the Web search environment. In this training task, they had to conduct a Web search on

possible causes and treatments of backache. They were presented with three search terms ("backache", "back gym", and "backache + back gym") leading to three Google-like SERPs (with 10 search results each) linked to websites. During the training task, participants’ thinking aloud was practiced together with the experimenter. In the case that participants did not verbalize their thoughts according to the instructions received, the experimenter repeated the instructions and encouraged them to think aloud freely. When the experimenter had the impression that the participants were able to think aloud freely and that they felt comfortable enough with the procedure, the training task was finished.

Fig. 1. Screenshot of the SERP with low carb + low fat as keywords.

When participants had finished the training task, they were given the instructions for the main task including the fictitious request of their friend to give a recommendation about low carb or low fat diets. Furthermore, participants were again reminded to think aloud during their task performance and to use all three search terms. Eye movements, screen recordings, and concurrent verbalizations were captured during the entire 20 min task performance. If necessary the experimenter

reminded the participants to keep thinking aloud and informed them after 10 min that half of the available time was over. Participants were asked to use all three SERPs for their Web search, but were not allowed to generate new SERPs by changing the search terms. Participants could access all Web pages corresponding to the list of search results.

Subsequent to the search task participants were required to decide which of the two diet methods they would recommend to their friend and to write a short statement about their decision.

2.4. Data analyses and dependent variables

2.4.1. Coding scheme for thinking-aloud protocols

For the analysis of participants’ thinking-aloud protocols a coding scheme was developed that was based on the evaluation criteria found in the information science literature (Savolainen & Kari, 2005). This scheme was refined by analyzing data from thinking-aloud protocols of a pilot study. It included the following five evaluation criteria: (a) topicality, (b) scope, (c) credibility, (d) up-to-dateness, and (e) design. The first two criteria were topic-related criteria, whereas the latter three were quality-related criteria. Short descriptions of these five evaluation criteria are provided in Table 1. Two raters familiar with the search task and the materials as well as with the coding scheme scored 30% of the protocols. Interrater reliability computed on this subsample of protocols yielded a Cohen’s kappa of 0.72.

Table 1
Evaluation criteria.

Evaluation criteria   Short description
Topicality            Is the search result or the Web page about the topic in general?
Scope                 Is the information focused enough on the specific request of the fictitious friend?
Credibility           Is the information valid and is the source providing information trustworthy and reliable?
Design                Is the design of the Web page good and is the information clearly displayed and easy to find?
Up-to-dateness        Is the information recent or up-to-date?

[...] on the SERPs and the Web pages. It was determined for all AOIs whether and for how long a participant was looking at this area. On the three SERPs each of the ten search results (including title, summary, and URL) was defined as a single rectangular AOI. For all 30 Web pages AOIs were defined on specific Web page characteristics, namely areas that provided publishing information (publication date, name and profession of the author), source information (references, contact information, affiliation), and user ratings (e.g., rating stars, comments). All AOIs corresponding to the same Web page characteristic (i.e., publishing information, source information, and user ratings, respectively) were aggregated for data analyses across the 30 Web pages. Similar AOIs were defined on the three SERPs. However, as the SERPs contained no user ratings and only a very limited amount of publishing information we aggregated these AOIs into a single variable indicating publishing and source information.

As a first eye-tracking variable during the search-result evaluation phase the number of visually inspected Google search results was counted. A visual inspection was defined as at least one fixation (minimum duration 100 ms) within a search-result AOI. The second eye-tracking variable was the total fixation time for which participants visually inspected Google search results during the search-result evaluation phase. The total fixation time on search results was defined as the sum of fixation durations for all search results. As a third eye-tracking variable the number of gazes on publishing and source information within the SERPs was registered. A single gaze was defined as all subsequent fixations on an AOI. If an AOI was re-inspected after fixations on another AOI, this was counted as a second gaze. As a fourth eye-tracking variable during the search-result evaluation phase the total fixation time on publishing and source information within the SERPs was measured by summing up all respective fixation durations. Furthermore, three eye-tracking variables during the Web-page evaluation phase were formed based on the number of gazes on the aggregated AOIs of each of the three Web page characteristics, namely publishing information, source information, and user ratings. Finally, three more eye-tracking variables were formed based on the total fixation time for each of the three Web page characteristics during the Web-page evaluation phase. The total fixation time for a single Web page characteristic (e.g., publishing information) was defined as the sum of all fixation durations on this
One rater scored the remaining protocols. As dependent vari- Web page characteristic across all Web pages. All fixation-
ables the number of verbal utterances referring to the five time data were transformed into seconds for ease of
different types of evaluation criteria was analyzed for both the interpretation.
search result evaluation and the Web page evaluation.
2.4.3. Log file data (mouse clicks)
2.4.2. Eye-tracking data As a first selection variable the number of search results
For the analysis of participants’ eye-tracking data so- selected per SERP was registered. Additionally as a second
called ‘‘areas of interest’’ (AOIs)1 were defined manually and third selection variable the number of selections of the top
five search results per SERP (i.e., positions 1e5) and the
number of selections of the bottom five search results per
1
‘‘Areas of interest’’ (AOIs) are precisely specified areas of an object (e.g., SERP (i.e., positions 6e10) were registered (cf. Pan et al.,
a search result on a SERP or an author name on a Web page). 2007).
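The gaze and fixation-time definitions in Section 2.4.2 can be illustrated with a short sketch. It assumes a hypothetical input format that is not described in the article: a chronologically ordered list of (AOI label, fixation duration in ms) pairs, with None marking fixations outside any AOI. Treating an off-AOI fixation as ending the current gaze is likewise an assumption, and the function name and AOI labels are illustrative only.

```python
from collections import defaultdict
from itertools import groupby


def summarize_aoi_fixations(fixations):
    """Return (gaze counts, total fixation time in seconds) per AOI.

    `fixations` is a hypothetical input format: a chronologically ordered
    list of (aoi_label, duration_ms) pairs, where aoi_label is None for
    fixations that fall outside every AOI.
    """
    gaze_counts = defaultdict(int)
    total_ms = defaultdict(int)
    # groupby yields one group per run of consecutive fixations on the same
    # AOI; each run is one gaze, so a return to an AOI starts a new gaze.
    for aoi, run in groupby(fixations, key=lambda fixation: fixation[0]):
        if aoi is None:
            continue  # assumption: off-AOI fixations belong to no gaze
        gaze_counts[aoi] += 1
        total_ms[aoi] += sum(duration for _, duration in run)
    # Convert summed durations to seconds, as done for the reported data.
    total_seconds = {aoi: ms / 1000 for aoi, ms in total_ms.items()}
    return dict(gaze_counts), total_seconds


example = [
    ("publishing", 200), ("publishing", 150),  # first gaze on publishing info
    ("user_ratings", 300),                     # first gaze on user ratings
    ("publishing", 250),                       # re-inspection: second gaze
    (None, 120),                               # fixation outside any AOI
    ("user_ratings", 100),                     # second gaze on user ratings
]
gazes, seconds = summarize_aoi_fixations(example)
print(gazes)    # {'publishing': 2, 'user_ratings': 2}
print(seconds)  # {'publishing': 0.6, 'user_ratings': 0.4}
```

Durations are summed as integer milliseconds and converted once at the end, which keeps the per-AOI totals free of accumulated floating-point error.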
P. Gerjets et al. / Learning and Instruction 21 (2011) 220e231 227

2.4.4. Quality of decision making and decision justification

Participants’ solution to the information problem, that is, the decision to either recommend low carb diets or low fat diets, was analyzed by counting the frequency with which the two diet methods were recommended in each experimental condition. Additionally, participants’ statements to justify their decision were rated with respect to their quality on a 3-point rating scale, ranging from 0 (false or no statements), 1 (personal opinions or likes and dislikes without any further argumentation), 2 (fuzzy statements mentioning risks and benefits of one diet method), to 3 points (detailed statement with arguments in favour and against both diet methods).

3. Results

An alpha level of .05 was used for the statistical tests reported. Due to technical problems one participant from the Spontaneous Evaluation condition was excluded from data analyses; consequently, statistical analyses were conducted for 29 participants. For all analyses partial eta-squared is reported as a measure of effect size.

3.1. Verbal utterances

Data from thinking-aloud protocols were analyzed with a 2 (conditions: instructed evaluation vs. spontaneous evaluation) × 2 (evaluation phases: search result evaluation vs. Web page evaluation) mixed model repeated measures MANOVA with the number of verbal utterances addressing the five evaluation criteria as five dependent variables; conditions served as the between subjects factor and the evaluation phases as within subjects factor (for means and standard deviations see Table 2). The MANOVA showed a significant overall main effect of conditions on the number of evaluative verbal utterances, Pillai’s trace = 0.43, F(5, 23) = 3.43, p < .05, partial η² = .43. Univariate ANOVAs showed that participants in the Instructed Evaluation condition showed significantly more verbal utterances than participants in the Spontaneous Evaluation condition only for the three quality-related evaluation criteria: credibility, F(1, 27) = 10.36, p < .01, partial η² = .28; design, F(1, 27) = 12.33, p < .01, partial η² = .31; and up-to-dateness, F(1, 27) = 4.38, p = .05, partial η² = .14. For topicality and scope the difference was not significant, F(1, 27) = 1.68, ns, and F(1, 27) = 3.38, p = .08, respectively.

Table 2
Means (and standard deviations) for the number of verbal utterances as a function of experimental condition in the two evaluation phases for the five evaluation criteria.

Evaluation criteria    Instructed evaluation    Spontaneous evaluation    Overall
Search-result evaluation phase
  Topicality           4.60 (4.09)              3.50 (2.77)               4.07 (3.49)
  Scope                0.67 (0.90)              0.14 (0.36)               0.41 (0.73)
  Credibilityᵃ         4.60 (4.55)              1.79 (1.72)               3.24 (3.71)
  Design               0.07 (0.26)              0.07 (0.27)               0.07 (0.26)
  Up-to-dateness       0.00 (0.00)              0.00 (0.00)               0.00 (0.00)
Web-page evaluation phase
  Topicality           1.93 (2.76)              0.93 (1.00)               1.45 (2.13)
  Scope                3.33 (2.97)              2.07 (2.17)               2.72 (2.64)
  Credibilityᵃ         9.13 (5.78)              3.57 (2.31)               6.45 (5.21)
  Designᵃ              5.53 (4.69)              0.93 (1.07)               3.31 (4.12)
  Up-to-datenessᵃ      0.47 (0.83)              0.00 (0.00)               0.24 (0.64)
Overall
  Topicality           3.27 (2.72)              2.21 (1.41)               2.76 (2.21)
  Scope                2.00 (1.49)              1.11 (1.08)               1.57 (1.36)
  Credibilityᵃ         6.87 (4.72)              2.68 (1.20)               4.85 (4.04)
  Designᵃ              2.80 (2.40)              0.50 (0.52)               1.69 (2.09)
  Up-to-datenessᵃ      0.23 (0.42)              0.00 (0.00)               0.12 (0.32)
ᵃ Significant differences between the two experimental conditions.

There was also a significant overall main effect of the evaluation phase, Pillai’s trace = 0.11, F(5, 23) = 15.37, p < .001, partial η² = .11. Univariate ANOVAs showed that for the following four evaluation criteria participants showed significantly more evaluative verbal utterances during the Web-page evaluation phase than during the search-result evaluation phase: F(1, 27) = 19.64, p < .001, partial η² = .42 for scope; F(1, 27) = 19.23, p < .001, partial η² = .42 for credibility; F(1, 27) = 24.95, p < .001, partial η² = .48 for design; and F(1, 27) = 4.38, p = .05, partial η² = .14 for up-to-dateness. In contrast, for topicality, that is, the fifth criterion, a reversed pattern was shown with significantly more verbal utterances during the search-result evaluation phase than during the Web-page evaluation phase, F(1, 27) = 13.80, p < .01, partial η² = .34.

Furthermore, there was a significant interaction between conditions and evaluation phases, Pillai’s trace = 0.49, F(5, 23) = 4.43, p = .01, partial η² = .49. Univariate ANOVAs showed that this interaction was significant only for design and up-to-dateness, F(1, 27) = 13.25, p < .01, partial η² = .33, and F(1, 27) = 4.38, p = .05, partial η² = .14, respectively. For topicality, scope, and credibility the interaction was not significant, F < 1, ns, F < 1, ns, F(1, 27) = 3.64, p = .07, respectively. Bonferroni post hoc tests revealed that during the Web-page evaluation phase participants in the Instructed Evaluation condition showed significantly more verbal utterances with respect to the criteria design and up-to-dateness than participants in the Spontaneous Evaluation condition (p < .01 and p = .05, respectively), whereas during the search-result evaluation phase the number of verbal utterances with regard to these criteria did not differ between participants (in both cases p > .90). This was not surprising as there are hardly any hints present in the search results concerning the design of the Web page and how up-to-date the information on the Web page is. With regard to the criterion credibility, during both the search result and Web-page evaluation phases, participants in the Instructed Evaluation condition showed more verbal utterances than participants in the Spontaneous Evaluation condition (p < .05 and p < .01, respectively).

3.2. Eye-tracking data

Eye-tracking data were analyzed separately for the search-result evaluation phase and the Web-page evaluation phase as AOIs were not comparable between the two phases (for means and standard deviations see Table 3).
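The partial eta-squared values reported in Section 3 can be related to the corresponding F statistics through the standard identity partial η² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies this identity to the three univariate condition effects on verbal utterances, each reported as F(1, 27). The function name is illustrative, and the article does not state how the authors computed the values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Univariate condition effects from Section 3.1: criterion -> (F, reported value).
reported = {
    "credibility": (10.36, 0.28),
    "design": (12.33, 0.31),
    "up-to-dateness": (4.38, 0.14),
}
for criterion, (f_value, reported_eta) in reported.items():
    eta = partial_eta_squared(f_value, df_effect=1, df_error=27)
    print(f"{criterion}: computed {eta:.2f}, reported {reported_eta:.2f}")
```

For these three effects the computed values agree with the reported ones to two decimals. The same identity applies to the multivariate tests when the multivariate F and its degrees of freedom are used, for example F(5, 23) for the condition effect.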

Table 3
Means (and standard deviations) for the eye-tracking data as a function of experimental condition in the two evaluation phases.

Eye-tracking data                                                        Instructed evaluation    Spontaneous evaluation    Overall
Search-result evaluation phase (sum of all 30 search results)
  Number of visually inspected search results                            27.07 (3.20)             28.64 (1.65)              27.83 (2.65)
  Total fixation time on search results (in seconds)                     103.63 (46.84)           100.68 (51.48)            102.20 (48.27)
  Number of gazes on publishing and source information                   47.07 (21.16)            47.36 (20.25)             47.21 (20.36)
  Total fixation time on publishing and source information (in seconds)  24.37 (15.91)            21.94 (16.33)             23.20 (15.87)
Web-page evaluation phase (sum of all 30 Web pages)
  Number of gazes on publishing information                              14.00 (7.22)             13.93 (6.82)              13.97 (6.90)
  Number of gazes on source information                                  3.20 (2.43)              2.14 (1.75)               2.69 (2.16)
  Number of gazes on user ratingsᵃ                                       1.60 (1.99)              0.36 (0.63)               1.00 (1.60)
  Total fixation time on publishing information (in seconds)             8.85 (7.11)              6.51 (3.13)               7.72 (5.59)
  Total fixation time on source information (in seconds)                 1.87 (1.18)              1.01 (1.72)               1.45 (1.52)
  Total fixation time on user ratingsᵃ (in seconds)                      2.08 (3.50)              0.12 (0.29)               1.13 (2.67)
ᵃ Significant differences between the two experimental conditions.

For the search-result evaluation phase the ANOVAs with conditions as between subjects factor showed no significant main effect of conditions on the number of visually inspected search results, F(1, 27) = 2.73, ns, and on the total fixation time on search results, F < 1, ns. Similarly, the ANOVAs showed no significant main effect of conditions on the number of gazes, F < 1, ns, and on the total fixation time, F < 1, ns, on publishing and source information of the search results.

For the Web-page evaluation phase results of a MANOVA with conditions as between subjects factor and the number of gazes on the three Web page characteristics (i.e., publishing information, source information, and user ratings) as three dependent variables showed a significant overall main effect of conditions on the number of gazes on these Web page characteristics, Pillai’s trace = 0.30, F(3, 25) = 3.48, p < .05, partial η² = .30. However, univariate ANOVAs revealed that only for user ratings the number of gazes was significantly higher in the Instructed Evaluation condition than in the Spontaneous Evaluation condition, F(1, 27) = 11.19, p < .05, partial η² = .16. For the two other Web page characteristics there were no significant differences found between the two conditions, that is, for source information, F(1, 27) = 1.79, ns, and for publishing information, F < 1, ns.

Furthermore, regarding the total fixation time on the three Web page characteristics similar patterns were found. Results of the respective MANOVA showed a nonsignificant overall main effect of conditions on the total fixation time on the Web page characteristics, F(3, 25) = 2.40, p = .09. However, univariate ANOVAs showed that only for user ratings the total fixation time was significantly longer in the Instructed Evaluation condition than in the Spontaneous Evaluation condition, F(1, 27) = 4.34, p = .05, partial η² = .14. For the two other Web page characteristics there were no significant differences found between the two conditions, that is, for source information, F(1, 27) = 2.42, ns, and for publishing information, F(1, 27) = 1.28, ns.

3.3. Search result selection

During the 20 min Web search participants on average selected 4.98 (SD = 1.98) search results per SERP. The number of selected search results did not differ significantly, F(1, 27) = 1.20, ns, between the two experimental conditions, with M = 4.57 (SD = 2.08) per SERP in the Instructed Evaluation condition and M = 5.41 (SD = 1.98) in the Spontaneous Evaluation condition. Furthermore, between the two conditions there were neither significant differences with regard to the number of selections of the top five search results per SERP, F(1, 27) < 1, ns, nor with regard to the number of selections of the bottom five ones, F(1, 27) = 1.77, ns. Participants in the Instructed Evaluation condition on average selected 2.76 (SD = 0.93) of the top five search results per SERP and 1.82 (SD = 1.32) of the remaining ones. Participants in the Spontaneous Evaluation condition on average selected 2.95 (SD = 1.00) of the top five search results per SERP and 2.45 (SD = 1.22) of the bottom five ones. A repeated measures ANOVA showed that participants selected the top five search results of a SERP significantly more often than the bottom five ones, F(1, 27) = 15.07, p < .01, partial η² = .36.

3.4. Quality of information problem solving

A χ²-test showed that there was a significant effect of conditions on participants’ decision with regard to the information problem, χ²(1, N = 29) = 4.97, p < .05. In the Spontaneous Evaluation condition 10 participants recommended low fat diets and 4 participants recommended low carb diets, whereas in the Instructed Evaluation condition all of the 15 participants recommended low fat diets. However, with regard to the quality of the statements justifying the decision the results of an ANOVA showed no significant differences between the two experimental conditions, F(1, 27) = 2.42, ns, with M = 1.29 (SD = 0.92) for the Spontaneous Evaluation condition and M = 1.73 (SD = 0.96) for the Instructed Evaluation condition.
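The χ² statistic in Section 3.4 can be reproduced from the reported recommendation frequencies. The sketch below uses the textbook Pearson formula for a 2 × 2 table without continuity correction. That the authors used exactly this variant is an assumption, although it yields the reported value. Function and variable names are illustrative.

```python
def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: experimental condition; columns: recommended diet (low fat, low carb).
instructed = (15, 0)
spontaneous = (10, 4)
chi2 = pearson_chi_square_2x2(*instructed, *spontaneous)
print(f"chi2(1, N = 29) = {chi2:.2f}")  # chi2(1, N = 29) = 4.97
```

With one empty cell and small expected frequencies in the low carb column, an exact test might be considered as a robustness check; the article reports only the χ² result.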

4. Discussion

Appropriately evaluating Web search results and Web pages is a crucial component when searching for complex and science-related information due to the diverse and often contradictory sources of information that can be found on the Web. To investigate these evaluation processes, information science studies (Rieh, 2002) have used explicit evaluation instructions that yielded sets of different criteria used by searchers to evaluate Web information. However, it can be criticized that these studies fail to address spontaneous evaluation processes due to the instructions used. Thus, the study reported in the present article compared standard thinking-aloud instructions to explicit evaluation instructions to test whether the latter lead to a distortion of users’ spontaneous evaluation processes. As expected, differences were found for thinking-aloud data (Hypothesis 1) and eye-tracking data (Hypothesis 2) as well as on participants’ information problem solving (Hypothesis 4), but not on search result selection (Hypothesis 3). We will first discuss the results for the evaluation of search results on the SERPs and subsequently elaborate on the findings on Web page evaluation.

4.1. Evaluation of search results

In line with Hypothesis 1, during the search-result evaluation phase users in the Instructed Evaluation condition differed in their verbal utterances from users in the Spontaneous Evaluation condition in that they showed more credibility-related verbal utterances when they inspected the SERPs. However, contrary to the assumptions of Hypothesis 2 and Hypothesis 3 no differences could be found between experimental conditions with regard to users’ eye-tracking data and search result selection on the SERPs. More precisely, explicit evaluation instructions did neither affect the number of attended search results or the fixation time on search results nor the processing of specific information on search results (publishing and source information). Second, experimental groups did not differ with regard to the overall number of selected search results or the number of selected search results with lower rankings as would have been expected if users had been stimulated to evaluate all search results on their own with regard to information quality instead of selecting search results only according to the topicality ranking provided by the search engine.

In sum, it was found for the search result evaluation on SERPs that due to the evaluation instructions participants uttered their considerations with regard to the credibility of different search results, but the effects of the instructions on verbal utterances were not reflected in eye-tracking data and log file data. One reason for the pattern of results found in this study for the search-result evaluation might be that users spent more time on the SERPs than they would spontaneously do because we provided them with 20 min time for their Web search that was confined to three SERPs with ten search results each. Accordingly, the number of attended search results as well as the number of selected search results per SERP was uncommonly high. Both groups attended about nine of the ten search results per SERP and selected about half of the available search results. It is known from previous research, however, that users normally attend to only three to four search results per SERP, at least when solving simple fact finding tasks (cf. Granka, Feusner, & Lorigo, 2008). Thus, the lacking time pressure might have lowered the importance of participants’ selection decisions. In future studies the time available for Web search could be manipulated to clarify whether a lack of time pressure may have reduced the processing differences between the experimental groups in the present study.

4.2. Evaluation of Web pages

As Web pages more often than SERPs contain characteristics like publishing information, source information, or user ratings that can be used to evaluate the quality of Web information it was not unexpected that participants showed more quality-related evaluative verbal utterances during the Web-page evaluation phase than during the search-result evaluation phase. Interestingly, the pattern for topicality was reversed; this is in line with the idea that searchers focus on topicality judgments when they evaluate Web search results.

During the Web-page evaluation phase there were several differences between participants in the Instructed Evaluation condition and participants in the Spontaneous Evaluation condition with regard to their thinking-aloud data (Hypothesis 1) and their eye-tracking data (Hypothesis 2) as well as with regard to the results of their information problem solving (Hypothesis 4). First, in line with Hypothesis 1, evaluation instructions increased the rate of verbal utterances with respect to quality-related criteria like up-to-dateness, design, and credibility. Second, eye-tracking data showed that evaluation instructions increased searchers’ attention for user ratings, which is quality-related information. Therefore, Hypothesis 2 was at least partly confirmed. Third, evaluation instructions influenced the results of participants’ information problem solving. In this study, all participants in the Instructed Evaluation condition decided to recommend low fat diets, whereas in the Spontaneous Evaluation condition 10 participants recommended low fat diets and 4 participants recommended low carb diets. As the sub-collection of sources in the present study that cited scientific evidence mainly favoured low fat diets over low carb diets the decision quality of the instructed evaluation group can be seen as superior. However, as participants in the Instructed Evaluation condition were not able to better justify their decision than participants in the Spontaneous Evaluation condition, we consider Hypothesis 4 as only partly confirmed.

In sum, it was found for Web page evaluation that due to the evaluation instructions, participants not only uttered more quality-related evaluation criteria but also changed their processing strategies and improved their information problem solving. Thus, it becomes clear from our results that findings in information science studies (Rieh, 2002) by no means reflect the evaluation criteria users spontaneously apply during Web search. It might be even criticized that the standard thinking-aloud instructions that were used as

a control condition for the explicit evaluation instruction are still not very close to a natural search situation. First, users’ search space was restricted to three SERPs with given search terms and was linked to 30 Web pages. Additionally, participants were required to study these materials for 20 min. So they might have elaborated the materials more thoroughly than they might have done under natural search conditions. Second, the procedure of the experiment in both conditions required participants to verbalize their thoughts concurrently to their search process, which might have interfered with participants’ cognitive processes. The use of obtrusive measurements like concurrent thinking aloud may have reactively influenced the search process itself (e.g., stimulated participants to process the materials more elaborately). To investigate Web users’ natural search behaviour more unobtrusively in future studies one might consider combining eye tracking with cued retrospective verbal utterances by presenting them with a screen recording of their task processing superimposed with their eye movements and mouse operations subsequent to the search task and asking them to report retrospectively what they were thinking during information seeking (cf. De Koning, Tabbers, Rikers, & Paas, 2010; Jarodzka, Scheiter, Gerjets, & Van Gog, 2010; Van Gog, Paas, Van Merriënboer, & Witte, 2005).

With regard to the practical implications of the present study it can be concluded that stimulating learners to engage in quality-related evaluation processes might be useful to improve their Web search performance. This is in line with findings by Stadtler and Bromme (2007) on the effects of the metacognitive tool met.a.ware on users’ Web search. Results of their study showed that participants’ knowledge about information sources and their justification of credibility judgments could be significantly enhanced by providing them with evaluation prompts from the tool. Other manipulations beyond explicit evaluation instructions that might serve this purpose could be to provide searchers with a specific training (cf. Brand-Gruwel & Gerjets, 2008) or with additional quality-related information on SERPs and Web pages. For instance, a recent study by Walraven (2008) on a training program for evaluation skills on the Web revealed that the program indeed improved the evaluation behaviour of high school students. With regard to the latter approach to provide additional quality-related information on SERPs, Kammerer, Wollny, Gerjets, and Scheiter (2009) could show that an augmented search engine interface containing additional information beyond topicality (source category labels) influenced participants’ viewing and selection behaviour concerning the search results compared to a standard search engine interface. Similarly, Sundar, Knobloch-Westerwick, and Hastall (2007) found that the availability of quality-related cues for news items presented by the online news service Google News (i.e., information on the source and the recency of a story, and on the number of related articles) affected the subjective evaluation of news leads. To conclude, the provision of both a specific training on how to use different criteria to evaluate the quality of Web information as well as an augmented search engine interface design seem to be promising approaches in order to improve searchers’ evaluation behaviour during Web search.

We expect that the methodological critique addressed in the present article and the possible improvements outlined finally result in future research that will gain deeper insights into evaluation processes during Web search. These insights will also be useful to inform concrete training programs and design guidelines for search interfaces in order to support users in an adequate evaluation of Web information.

References

Borlund, P. (2003). The concept of relevance in IR. Journal of the American Society for Information Science and Technology, 54, 913–925.
Brand-Gruwel, S., & Gerjets, P. (2008). Instructional support for enhancing students’ information problem solving ability. Computers in Human Behavior, 24, 615–622.
Brand-Gruwel, S., Wopereis, I., & Vermetten, Y. (2005). Information problem solving by experts and novices: analysis of a complex cognitive skill. Computers in Human Behavior, 21, 487–508.
Brumby, D. P., & Howes, A. (2008). Strategies for guiding interactive search: an empirical investigation into the consequences of label relevance for assessment and selection. Human-Computer Interaction, 23, 1–46.
Crystal, A., & Greenberg, J. (2006). Relevance criteria identified by health information users during web searches. Journal of the American Society for Information Science and Technology, 57, 1368–1382.
Cutrell, E., & Guan, Z. (2007). What are you looking for? An eye-tracking study of information usage in Web search. In M. B. Rosson & D. J. Gilmore (Eds.), Proceedings of the SIGCHI conference on human factors in computing systems (pp. 407–416). New York: ACM Press.
De Koning, B. B., Tabbers, H. K., Rikers, R. M. J. P., & Paas, F. (2010). Attention guidance in learning from complex animation: seeing is understanding? Learning and Instruction, 20, 111–122.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.
Gerjets, P., & Hellenthal-Schorr, T. (2008). Competent information search in the World Wide Web: development and evaluation of a web training for pupils. Computers in Human Behavior, 24, 693–715.
Granka, L., Feusner, M., & Lorigo, L. (2008). Eye monitoring in online search. In R. I. Hammoud (Ed.), Passive eye monitoring: Algorithms, applications and experiments (pp. 283–304). Berlin: Springer.
Granka, L., Joachims, T., & Gay, G. (2004). Eye-tracking analysis of user behavior in WWW search. In M. Sanderson, K. Järvelin, J. Allan, & P. Bruza (Eds.), Proceedings of the 27th annual ACM SIGIR conference on research and development in information retrieval (pp. 478–479). New York: ACM Press.
Guan, Z., & Cutrell, E. (2007). An eye tracking study of the effect of target rank on Web search. In M. B. Rosson & D. J. Gilmore (Eds.), Proceedings of the SIGCHI conference on human factors in computing systems (pp. 417–420). New York: ACM Press.
Höök, K., Benyon, D., & Munro, A. J. (Eds.). (2003). Designing information spaces: The social navigation approach. London: Springer.
Jarodzka, H., Scheiter, K., Gerjets, P., & Van Gog, T. (2010). In the eyes of the beholder: how experts and novices interpret dynamic stimuli. Learning and Instruction, 20, 146–154.
Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. (2005). Accurately interpreting click through data as implicit feedback. In R. Baeza-Yates, N. Ziviani, G. Marchionini, A. Moffat, & J. Tait (Eds.), Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 154–161). New York: ACM Press.
Kammerer, Y., Wollny, E., Gerjets, P., & Scheiter, K. (2009). How authority-related epistemological beliefs and salience of source information influence the evaluation of web search results – an eye tracking study. In N. A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st annual conference of the cognitive science society (pp. 2158–2163). Austin, TX: Cognitive Science Society.
Maglaughlin, K. L., & Sonnenwald, D. H. (2002). User perspectives on relevance criteria: a comparison among relevant, partially relevant, and not-relevant judgments. Journal of the American Society for Information Science and Technology, 53, 327–342.

Marchionini, G. (1995). Information seeking in electronic environments. Cambridge, UK: Cambridge University Press.
Miller, C. S., & Remington, R. W. (2004). Modeling information navigation: implications for information architecture. Human-Computer Interaction, 19, 225–271.
Pan, B., Hembrooke, H., Joachims, T., Lorigo, L., Gay, G., & Granka, L. (2007). In Google we trust: users’ decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12, 801–823.
Pirolli, P., & Card, S. K. (1999). Information foraging. Psychological Review, 106, 643–675.
Pirolli, P., & Fu, W.-T. F. (2003). SNIF-ACT: a model of information foraging on the World Wide Web. In P. Brusilovsky, A. T. Corbett, & F. De Rosis (Eds.), Proceedings of the 9th international conference on user modeling (pp. 45–54). Johnstown, PA: Springer.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422.
Rieh, S. Y. (2002). Judgment of information quality and cognitive authority in the web. Journal of the American Society for Information Science and Technology, 53, 145–161.
Rouet, J.-F. (2006). The skills of document use: From text comprehension to Web-based learning. Mahwah, NJ: Erlbaum.
Rouet, J.-F., Favart, M., Britt, M. A., & Perfetti, C. A. (1997). Studying and using multiple documents in history: effects of discipline expertise. Cognition and Instruction, 15, 85–106.
Savolainen, R., & Kari, J. (2005). User-defined relevance criteria in web searching. Journal of Documentation, 62, 685–707.
Schamber, L. (1994). Relevance and information behaviour. In M. E. Wiliams (Ed.), Annual review of information science and technology (ARIST) (pp. 3–48). Medford, NJ: Learned Information.
Stadtler, M., & Bromme, R. (2007). Dealing with multiple documents on the WWW: the role of metacognition in the formation of documents models. Computer-Supported Collaborative Learning, 2, 191–210.
Sundar, S. S., Knobloch-Westerwick, S., & Hastall, M. R. (2007). News cues: information scent and cognitive heuristics. Journal of the American Society for Information Science and Technology, 58, 366–378.
Tombros, A., Ruthven, I., & Jose, J. M. (2005). How users assess web pages for information-seeking. Journal of the American Society for Information Science and Technology, 56, 327–344.
Van Gog, T., Paas, F., & Van Merriënboer, J. J. G. (2005). Uncovering expertise-related differences in troubleshooting performance: combining eye movement and concurrent verbal protocol data. Applied Cognitive Psychology, 19, 205–221.
Van Gog, T., Paas, F., Van Merriënboer, J. J. G., & Witte, P. (2005). Uncovering the problem solving process: cued retrospective reporting versus concurrent and retrospective reporting. Journal of Experimental Psychology: Applied, 11, 237–244.
Walraven, A. (2008). Becoming a critical websearcher: Effects of instruction to foster transfer. Unpublished doctoral dissertation. Heerlen, The Netherlands: Open University of the Netherlands. Retrieved from http://hdl.handle.net/1820/1734
Wilson, T. D. (1999). Models in information behaviour research. Journal of Documentation, 55, 249–270.
