Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

1012 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO.

1, JANUARY 2020

The Perceptual Proxies of Visual Comparison


Nicole Jardine, Brian D. Ondov, Niklas Elmqvist, Senior Member, IEEE, Steven Franconeri

   

   
  
  
!   !  
     

 "   # 


 

     


 #  
     
      
  #              "  
    

Fig. 1: What’s visual in visual comparisons, such as finding the larger mean value? We identify mark arrangements that allow
for better performance across comparison tasks. Combining previous results with the results of two new tasks fails to produce a
clean ranking of arrangement effectiveness across tasks. We argue that to explain these complex patterns of performance, we first
need a perceptual explanation of how visual comparison actually unfolds. Viewers likely perform these mathematical comparison
operations with perceptual proxies. We propose and evaluate a candidate set of proxies for two visual comparison tasks.

Abstract—Perceptual tasks in visualizations often involve comparisons. Of two sets of values depicted in two charts, which set had
values that were the highest overall? Which had the widest range? Prior empirical work found that the performance on different
visual comparison tasks (e.g., “biggest delta”, “biggest correlation”) varied widely across different combinations of marks and spatial
arrangements. In this paper, we expand upon these combinations in an empirical evaluation of two new comparison tasks: the
“biggest mean” and “biggest range” between two sets of values. We used a staircase procedure to titrate the difficulty of the data
comparison to assess which arrangements produced the most precise comparisons for each task. We find visual comparisons of
biggest mean and biggest range are supported by some chart arrangements more than others, and that this pattern is substantially
different from the pattern for other tasks. To synthesize these dissonant findings, we argue that we must understand which features
of a visualization are actually used by the human visual system to solve a given task. We call these perceptual proxies. For example,
when comparing the means of two bar charts, the visual system might use a “Mean length” proxy that isolates the actual lengths of the
bars and then constructs a true average across these lengths. Alternatively, it might use a “Hull Area” proxy that perceives an implied
hull bounded by the bars of each chart and then compares the areas of these hulls. We propose a series of potential proxies across
different tasks, marks, and spatial arrangements. Simple models of these proxies can be empirically evaluated for their explanatory
power by matching their performance to human performance across these marks, arrangements, and tasks. We use this process to
highlight candidates for perceptual proxies that might scale more broadly to explain performance in visual comparison.
Index Terms—Graphical perception, visual perception, visual comparison, crowdsourced evaluation

1 I NTRODUCTION
Visual comparison is a core perceptual task in data visualizations [10]. ther of these comparison tasks need rely on the averages of each of
An epidemiologist might use two bar charts to assess whether, across these sets, or identification of individual values. They simply require
all age groups, there is a larger overall population of women than men. a judgment of which set’s mean or spread is larger than the other. Al-
An education researcher might use two bar charts to assess whether though certain visual channels [3, 18] are known to convey higher-
one group of students’ test scores has a larger range than another. Nei- precision information (position) than other marks (hue), existing eval-
uations of visual comparison performance suggest that there is no sin-
gle mark or spatial arrangement that optimizes visual comparison.
 Nicole Jardine is affilitated with Northwestern University and with the
Cook County Assessor’s Office in Chicago, IL, USA. E-mail: In prior work, Ondov et al. [20] evaluated the perceptual preci-
njardine@cookcountyassessor.com. sion of visual comparisons of the “biggest delta between items” and
 Brian Ondov is with the National Institutes of Health in Bethesda, MD, “biggest correlation between sets” for different visualization marks
USA and University of Maryland in College Park, MD, USA. E-mail: (bar, line) and spatial arrangements (stacked, mirrored, adjacent, su-
ondovb@umd.edu. perposed, and animated). Precision was not optimized by a single
 Steven Franconeri is with Northwestern University in Evanston, IL, USA. mark type or spatial arrangement. Instead, the precision of visual
E-mail: franconeri@northwestern.edu. comparison depended on an interaction of mark, arrangement, and
 Niklas Elmqvist is with University of Maryland in College Park, MD, USA. task (Figure 2). The best static chart for a precise delta compari-
E-mail: elm@umd.edu. son, for example, was one that was spatially superposed, rather than
Manuscript received 31 Mar. 2019; accepted 1 Aug. 2019. juxtaposed in a stacked format (Figure 1), validating an intuitively-
Date of publication 16 Aug. 2019; date of current version 20 Oct. 2019.
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TVCG.2019.2934786

1077-2626 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
JARDINE ET AL.: THE PERCEPTUAL PROXIES OF VISUAL COMPARISON 1013

tween all items within a set or only between neighboring items.


The complex dependency of visual comparison performance on
combinations of marks, arrangements, and tasks might soon be pre-
dicted by a model that accounts for such a set of perceptual proxies.
We speculate that these proxies follow two broad categories of global
or set-based visual comparisons and focal or item-based visual com-
parisons, drawing from perceptual psychology research on that divi-
sion in types of visual processing.
This work contributes (1) results on how visualization design ar-
rangement affects two new comparison tasks with two new crowd-
sourced graphical perception experiments for visual comparisons of
“biggest mean” and “biggest range”, (2) a framework of perceptual
proxies for visual comparison, (3) implementations of these proxies
for empirical evaluation, and (4) data generation procedures designed
to estimate the amount of signal needed in the data to support a given
visual comparison between sets of items. Our findings present a first
step toward a model of human perception during visual comparison.

2 R ELATED W ORK
Here we review empirical research on visual comparison and suggest
Fig. 2: Visual comparison depends not on a single dimension of mark, that visual comparisons can often be classified as being made between
arrangement, or task, but of the interactions between them. These isolated parts (i.e., a bar distinct from other bars in its set) or whole sets
interactions can be represented as a cube. Our present goal is not to (i.e., all the bars). Frequently, these correspond to analytic tasks for
examine the full space of the cube, but rather to understand how a which a goal is identification or comparison of items, or of sets, in data.
viewer uses visual features to serve analytic task goals depending on We propose that these analytic task goals correspond to proxies that
the marks and arrangements they see. determine the visual features that a viewer uses for visual comparison.
2.1 Visual Comparison
motivated guideline from Gleicher et al. [10]. Not predicted by current Frameworks of visual comparison are often driven by the analytic
guidelines was the discovery that, to support biggest delta comparisons goals of the viewer; for example, Amar et al. [2] names comparison as
in data, animation provided the most precise performance. Animation, a high-level “compound task” central to many specific analysis tasks.
however, did not perform as well for comparisons of correlations. Gleicher et al. [9] conducted a review of the taxonomies of visual com-
We first extend this work to empirically evaluate performance parison with the goal of a top-down approach examining what people
across these arrangements for two new tasks—“biggest mean” and do when they do visual comparison. They propose that, broadly, to
“biggest range”—and again find that performance is strongly impacted “compare” in a visualization involves at minimum three components:
by spatial arrangement. Comparison was most precise when these two targets (which multiple “items” are being compared), the relationship
datasets were vertically stacked, and least precise when the datasets between these items, and an action (a mechanism operating on the re-
were superposed. This pattern of which arrangements were best was lationship between these targets). These targets may correspond to
strikingly different than for the previous pattern for “biggest delta be- items or sets of data. Yi et al. [28] discussed adding a comparison
tween items” and “biggest correlation between sets.” task to their seven-task taxonomy, but ultimately decided against it be-
Why is there not a single clean emerging answer, where a given ar- cause “compare is a higher-level user goal or objective.” In contrast,
rangement is best across various tasks? This empirical evidence for the our work here and in past work [20] frames comparison as a relatively
more complex nature of visual comparison is consistent with the idea low-level perceptual task.
that it requires a series of visual actions at a variety of scales from one
object, to multiple objects, to whole sets of objects [9]. Taxonomies 2.2 Visual Comparison: Parts vs. Wholes
of visual comparison describe multiple stages of perceptual and cog- Visual comparison across multiple series can be comparisons of items-
nitive steps [4, 9], and vary in describing one or many types of visual to-items (focal) or sets-to-sets (global). We refer to these as different
comparisons, but the visual mechanisms supporting these stages are visual spans of comparison. One study tested visual comparisons be-
unclear. We argue that an empirical description of the precision of vi- tween smaller regions of time series charts, or between larger areas
sual comparison across each combination of mark × arrangement × of time series charts [14]. The data were consistent with the idea that
task would be valuable, but unlikely to scale to have predictive be- these are distinct visual actions: viewers conducting focal visual com-
yond its status as a lookup table. A different approach is required. We parisons of small regions benefited from shared-space charts that over-
propose that instead of continuing to fill out the entries of the cube in lapped in space, whereas viewers conducting visual comparisons over
Figure 2, it may be more productive to study perceptual proxies of a vi- the entire sets were better served by separated-space charts.
sualization are actually used to reason about a visual comparison task. Another study [20] investigated the perceptual precision of two vi-
The goal of this approach is to begin to identify the proxies for visual sual tasks, and tested a series of chart arrangements to see which ar-
comparison, as opposed to merely gathering additional empirical data. rangements best supported each visual comparison. In one task, peo-
We propose several candidate visual proxies and implement them ple saw two sets of 7 items and compared them to identify which of the
as simulations. This allows us to evaluate each proxy’s objective per- 7 items changed the most between the first and second chart. In this
formance in performing the same task given to the human participants. item-to-item comparison, visual comparison was most precise when
But by comparing the performance of these proxies to human perfor- the charts were spatially superposed within a single chart space, or had
mance on the same questions, we can rank proxies by which most an animated transition between them (temporally superposed). These
closely mirrors human performance. arrangements created visually salient features that mapped to these
Our evaluation of perceptual proxies suggests that although these item-to-item changes, either as overhang in superposed charts or as a
two comparison tasks show similar arrangement-driven patterns of re- salient motion cue that captured attention in the animated charts. Com-
sults, these patterns are consistent with different proxies. To compare parison was less precise when the sets were separated spatially such
means, the visual features that best predict human performance are the that the two chart axes were arranged horizontally, vertically, or mir-
ensembles of lengths and the centroids of the bars. To compare ranges, roring each other. In another task, people saw two pairs of bar charts
the best-predictive visual features were those that compared deltas be- that were correlated with each other to some degree. The viewer’s task
1014 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO. 1, JANUARY 2020

was to pick the pair with the strongest correlation. Here, performance delta value between the range widths of the two charts. Since
was best when each pair of correlated charts had axes that mirrored range may be a less widely-understood concept, we gave our
each other. Speculatively, viewers could rely on he rapid perception of participants a detailed description with a simple example, both at
symmetry between chart items, which in this task happened to indicate the start of the trials and each time they were incorrect in training
correlation between charts. trials. See past studies [12, 22] for similar tasks.
These focal vs. global modes have analogues in perceptual psychol-
ogy research as different perceptual modes. A viewer can opt between 4.2 Titer Staircase Method
these modes (with a mandatory dominance for global mode for a first Our goal was to quantify the magnitude of the difference of means
glance at a new image) to flexibly meet the demands of focal tasks for the M AX M EAN task, and the magnitude of the difference of range
(identify items within sets of larger items) or global tasks. In other widths for the M AX R ANGE task, for each arrangement. We dynami-
words, when people are presented with a visual stimulus, they can cally titrated stimulus difficulty using a staircase method [20]. Briefly,
flexibly choose whether to attend to the “forest” or the “trees” of that this method scales task difficulty on a trial-by-trial difference. The end
stimulus at varying spatial scales. These different attentional modes result quantifies a titer: a value between 0 and 1 that scales the mag-
switch which scope of a visual stimulus is used for a task goal [19]. nitude of the difference between stimuli to determine the threshold at
In sum, we predict that item-to-item comparisons are facilitated by which a participant can barely perform a discrimination task (expected
animated and superposed charts that place these items in close prox- performance of 75%). Fig. 3 illustrates this.
imity to each other, allowing a focused mode to subserve this more fo- In the M AX M EAN task, the titer controlled whether there was a
cused task. Set-to-set comparisons may be facilitated by arrangements large difference between the two chart means (large titer: easy task) or
that spatially separate these sets, allowing the visual system treat each a smaller difference between the two means (smaller titer: harder task).
set globally as its own unit, because preserving the values of specific For M AX R ANGE, the titer controlled whether there was a large differ-
items is not necessary. Different arrangements support different com- ence between the range widths of the two chart means or a smaller
parisons because, we propose, these arrangements offer visual proxies difference between the range widths of the two chart means (Fig. 1,
that people actually use to conduct visual comparisons. Biggest delta left; orange), making this pair stand out more from the baseline pair.
benefits from animation because a salient visual cue (motion speed of Titers and stimulus datasets changed trial-by-trial depending on par-
an item changing the most) naturally maps to, or is compatible with, ticipant performance for the previous trial. Briefly, the initial titer
the apprehension that the data value is also changing the most. value that scaled the difference-of-means or the difference-of-ranges
was 0.5. Depending on whether the participant’s discrimination in that
2.3 Interim Summary
trial was correct or incorrect, the next trial adjusted the titer by -0.01
Empirical evaluation of how human observers perceive values or rela- or +0.03. The goal of this staircase procedure is that by the end of
tionships in a visualization suggests that people can rely on a variety of the trials, the titer reflects a stable magnitude of signal that allows the
proxies, operating over the marks and arrangements and other visual participant to perform with 75% accuracy for that arrangement.
properties of a visualization, to meet the demands of an analytic task.
4.3 Arrangements
3 H YPOTHESES As in previous work [20], datasets were presented in blocks of 5 dif-
Our original hypothesis was that the two new tasks MaxMean and ferent arrangements (Fig. 1):
MaxRange would show similar patterns of performance to the Max-
Correlation and MaxDelta tasks from previous work [20]: that com-  Stacked: Vertically juxtaposed small multiples (i.e. one chart is
parisons of means would, like comparison of correlation, benefit from placed above the other).
mirrored charts. We were surprised to see an almost opposite pattern of
performance, leading us to turn out attention toward the path of seek-  Adjacent: A more commonly used instance of small multiples,
ing perceptual proxies that might provide more explanatory power for in which data series are horizontally juxtaposed.
why these strikingly different patterns emerged.
 Mirrored: This horizontally “mirrored” variation of adjacent
4 M ETHODS opposes the direction of the x-axis in each chart. The Gestalt
nature of bilateral symmetry suggests this arrangement prompts
Our first goal is to quantify the perceptual precision with which human “set” proxies rather than “item” proxies in viewers.
observers can perform visual comparison between two charts of hori-
zontal bars, and to measure whether that precision differs based on the  Superposed: A combined chart depicting both data series within
spatial arrangement of those charts. the same space. Past work has claimed that overlaying values, or
superposition, minimizes eye movements and memory load, and
4.1 Tasks may lead to efficient comparison [10].
We chose two tasks to build on previous work:
 Animated: In this “arrangement,” a single chart is transitioned,
 M AX M EAN: Of two sets, which had the largest average (mean) or morphed, from one data series to another over time, using
value? Difficulty is increased by reducing the delta between cubic interpolation to ease the transitions [7].
the mean values, so that the difference between sets is less
distinguishable. Displays were controlled so that the largest In trials, the order of these 5 blocks was rotated, and each rotation
single-item in a chart was not predictive of that chart having the reversed, for a total of 10 possible orderings, each of which was pre-
largest mean and so that charts in a trial were of approximately sented to 5 participants.
equal variance. Within-chart variance ranged from .04 to .09.
4.4 Task and Procedure
Harder discriminations (smaller mean deltas) spanned the low
to high variance range, whereas easier discriminations tended to Before each trial began, the screen contained a centrally placed fixa-
be lower variance. The two extreme values that bounded data tion dot and outlines of where the charts would appear. Participants
generation were directly included in a randomly selected chart, clicked a button to start the trial. After a countdown, the visualiza-
ensuring that the highest or lowest individual value did not corre- tion appeared for a short, fixed time. Static and animated charts were
late with the correct answer, and allowing us to examine whether shown for 1.5 seconds. In contrast to previous work, at the end of the
participants used this as a proxy. impression, both sets of data were removed from the display. Partici-
pants then clicked on the orange or blue button corresponding to the or-
 M AX R ANGE: Of two sets, which had the widest range between ange or blue set of bars to provide a response. For the M AX M EAN task
its min and max values? Difficulty is adjusted by varying the they were instructed to “Click on the chart that had the biggest mean
JARDINE ET AL.: THE PERCEPTUAL PROXIES OF VISUAL COMPARISON 1015

                         

Fig. 3: In the staircase procedure, a correct response produces a smaller difference in the subsequent trial.

values”; for M AX R ANGE to “Click on the chart that had the widest maximum titer value of 1.0) because at this point the stimuli cannot
range between min and max values.” Participants were informed if titrate difficulty beyond these floors and ceilings.
they were correct and, if incorrect, what the correct answer was. This Because viewers performed tasks for 5 arrangements, we excluded
feedback was provided to make the task more engaging and to rein- participants for whom there were at least 5 trials of floor or ceiling
force the goal. Between trials, the titer was adjusted based on the titer values. These criteria excluded 0 from the M AX M EAN task, but
response. To seek 75% accuracy during trials, the titer was increased for M AX R ANGE there was 1 trial in which a participant reached ceil-
three times as much for an incorrect answer as it was decreased for ing performance and 109 trials who repeatedly reached the floor titer
a correct answer (see Figure 3). Dynamic data generation according (largest delta). We excluded 7 participants for whom there were at
to the titer value is described in Supplemental Materials A. Each ex- least 5 (up to 22) trials of floor titer values (one of whom was also the
periment included all five arrangements. There were twenty trials for participant excluded with the standard deviation procedure), leaving
each arrangement, and arrangements were blocked. The order of the N = 49 for the M AX M EAN task and N = 47 for M AX R ANGE.
arrangement blocks was changed between participants.
5.2 Titer Analysis
4.5 Training We computed each observer’s mean titer values from the final 10 trials
Before training, participants were shown examples of stimuli and the for each arrangement. We used the final 10 trials because visual evalu-
task. Before each arrangement block, participants were given a time- ation of trial-by-trial data suggested that this was approximately when
unconstrained version of the task, which they were required to answer the staircase procedure stabilized around a narrow range of titers, for
correctly before proceeding (once for the M AX M EAN task, three times most participants. Thus we analyze the final 10 titer values achieved
for M AX R ANGE). Additionally, the first non-animated arrangement for each of the five arrangements, for each subject.
given to a participant followed untimed training with three timed train-
ing trials, which were identical to the real trials except that they always 5.3 Exp. 1 and 2: MaxMean and MaxRange
had the easiest (largest) titer. Data were regenerated on incorrectly an- Figure 4 (far left) displays the mean final 10 delta values for the
swered training answers to minimize answering by elimination. Video M AX M EAN task, and (second from the left) displays the mean values
demos are in Supplemental Materials. for the M AX R ANGE task. These titer values correspond to the differ-
ences between the charts being compared. Means could be discrimi-
4.6 Participant Recruitment nated when they differed by approximately 5-8% of the chart axis, and
Based on previous work, we predicted N = 50 would provide suffi- range widths when they differed by approximately 14-17%. For both
cient statistical power to reliably detect the presence or absence of tasks, the precision of visual comparison was affected by arrangement.
an effect of arrangement. We also expected that more participants Titer values for the present experiment were analyzed with a mixed
would struggle to understand the MaxRange task, so we collected data ANOVA to test for experiment-level and arrangement-level effects.
from 50 MTurk workers for the MaxMean task and 54 MTurk work- Titer values varied between experiments, F(1, 94) = 9.06, p = .003,
ers for MaxRange. Participants were asked to self-select out of the η p2 = 0.09, but this is likely because the titer values scale to different
study if they had color vision deficiencies. Each participant com- stimulus changes between the two experiments. As such we avoid a
pleted the staircase procedure for all 5 arrangements of one of the meaningful comparison between differing titer values.
tasks (M AX M EAN or M AX R ANGE). Worker IDs were used to ensure More meaningful is that there was a significant effect of arrange-
uniqueness of participants across all such combinations. All workers ment on precision, F(3.09, 290.7) = 8.17, p < .0001, η p2 = 0.08, with-
recruited for participation were adults in the United States. out evidence for an interaction between arrangement and experiment,
F(3.09, 290.7) = .34, p = .85, η p2 = 0.004, both Greenhouse-Geisser
5 R ESULTS corrected. This suggests that arrangement produces largely similar ef-
We evaluated the magnitude of deltas required in the data for non- fects on the precision of visual comparisons of means and of ranges.
expert participants to reliably identify the set with the largest mean We conducted pairwise comparisons within each experiment.
(Experiment 1) or the largest range (Experiment 2). Stacked charts were the most precise arrangement overall for both
tasks. Precision was better in stacked relative to adjacent charts for
5.1 Exclusion Criterion the M AX R ANGE task, t(49) = 2.73, p = .009, although not signifi-
The dependent measure in these experiments is the average titer value cantly better in the M AX R ANGE task, t(47) = 1.70, p = .09. Super-
from the final 10 trials of each task. We excluded datasets from par- posed charts resulted in the lowest precision for both tasks. Note that
ticipants whose average titers (averaged over spatial arrangement) was these patterns are strikingly different compared to prior evaluation of
more than 2 standard deviations from the mean. This procedure elim- visual comparisons of items, which were best supported by animated
inated 1 observer each from the M AX M EAN and M AX R ANGE tasks. and superposed charts.
We also adopted a second criterion. In a staircase procedure, the
goal is to find a converged titer value for which a participant is 75% 5.4 Accuracy
accurate. The procedure fails if a participant repeatedly reaches ceiling The goal of a staircase procedure is to titrate the task’s difficulty so dif-
performance (a minimum titer value of 0.01) or floor performance (the ficulty might change across arrangements, but that accuracy is equiv-
1016 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO. 1, JANUARY 2020

       


      





 

$%

&%

      ! "   " #   " #
   

Fig. 4: Means of averaged final titer values across participants performing the M AX M EAN and M AX R ANGE tasks. Smaller titers correspond to
more precise differences between means (range widths). The precision of both the M AX M EAN and M AX R ANGE tasks was affected by chart
arrangements. Also presented are titer values from previously published empirical evaluations of the precision of other comparison tasks. Note
that different chart arrangements support different visual comparisons. Gray bars represent 95% confidence intervals.

alent between arrangements. Mean accuracy in the M AX M EAN task 7 P ERCEPTUAL P ROXIES
for each arrangement ranged from 76.4% (stacked) to 79.9% (mirror),
with no evidence that accuracy was different between arrangements. Instead of mathematical operations, people more likely rely on heuris-
This suggests the staircase procedure reliably converged for this task. tic perceptual proxies to extract data values and patterns from data vi-
sualizations. Heuristics are shortcuts that rely on a simplified metric—
Mean accuracy in the M AX R ANGE task ranged from 75.2% (super- a proxy metric—to convey the desired information. Perceptual heuris-
posed) to 84.7% (stacked), and a repeated measures ANOVA found tics are easily computable features that (at least) correlate with the
that accuracy consistently differed between arrangements, F(4, 184) = right answer, allowing viewers of visualizations to use visual features
4.34, p = .002. The staircase procedure did not reliably converge as accurate or inaccurate proxies for the data those features represent.
for all arrangements in the M AX R ANGE task due to large effects of One example is the perception of correlation in scatterplots. The
arrangements on people’s ability to perceive range widths. Stacked perceptual process does not appear to calculate the true mathematical
charts allowed for higher accuracy and high precision than other ar- correlation, and there are instead proposals for multiple proxies that
rangements. In a pilot version of this experiment with fewer partici- might underlie correlation perception [12, 22, 27], including the as-
pants, we tested a larger initial titer value so that participants unfamil- pect ratio of the bounding box surrounding the points [27]. This proxy
iar with statistical ranges could use a very large signal in this task, but can be efficient because it relies on a rapid perceptual process of in-
found the same pattern: superposed charts simply outperform stacked specting a shape boundary around the points. This proxy is also fairly
charts regardless of initial task difficulty. Because of these strong ef- accurate [27]: the width of the bounding box of the visualization cor-
fects of chart arrangement, future work using this method might better responds strongly to the correlation in the data.
titrate task difficulty by relying on a greater number trials, or a differ- Scatterplots are commonly used to show correlation data, but not all
ent kind of adaptive titration procedure. links between visualization and analytic task are so strong. Further-
more, an analyst might not always know what kind of visualization
they will see. Finally, the selection of a proxy will be strongly affected
by the visual features that are available in a visualization.
6 D ISCUSSION
Different proxies may afford not only different data patterns, but
different conceptual associations of what those values might mean.
The precision of visual comparison of M AX M EAN and M AX R ANGE
The same two data points graphed as two bars or as two endpoints of a
tasks were best supported by vertically stacked charts, and least sup-
line chart can evoke different visual actions taken on visual features of
ported by superimposed charts. This is in contrast to previous findings
the visualization. Zacks and Tversky [29] presented simple line or bar
for item-item comparisons, which were best supported by animated
charts to participants for open description. Participants’ descriptions
and superimposed charts. These differential findings are consistent
of bar charts overwhelmingly tended to involve discretizing words,
with the principle that item-item and set-set comparisons of data are
such as “Y is higher than Z,” and descriptions of line charts entirely
supported by arrangements that enable focal and global visual feature
used continuous relations, such as “as X increases, Y decreases.” This
comparison, respectively. That vertically stacked charts were best may
bar-line message correspondence seems to occur because the type of
be unsurprising because marks are horizontally extending bars. View-
mark is associated with metaphors of bars being containers or groups,
ers can simply slice downward to extract lengths between charts.
in contrast to lines, which are continuous entities.
We hypothesize that the reason these arrangements best supported
these tasks is that visual system can, somewhat flexibly, adopt a series 7.1 Candidate Proxies
of focal or global “perceptual proxies” that operate on a visualization.
To understand visual comparison we must understand not only what A visualization contains any number of visual features potentially
visual features exist in a given visualization, but when they are used available as a proxy for a given task, such as the lengths of the top
for which tasks. But these are post-hoc explanations. Are people actu- most items of each set, or the perceived symmetry of each set. Differ-
ally using that proxy for both tasks? Or are people using different vi- ent visual features might be better proxies than others for different vi-
sual features to support M AX M EAN and M AX R ANGE? This question sual comparisons. Here we explore which visual features appear to be
cannot be answered by evaluating the precision of visual comparisons. most similar to participant performance (making the same decision),
Next we study whether the same patterns of precision arising from the when used as a proxy for M AX M EAN or M AX R ANGE. We developed
same arrangements for different tasks arise because of the different two broad categories of candidate features, informed by research in
visual features that people use for comparison. both visualization and perceptual psychology.
JARDINE ET AL.: THE PERCEPTUAL PROXIES OF VISUAL COMPARISON 1017

033*#-&1208*&3 &3$2*14*0/ *35"-$0(/*4*0/12*/$*1-&


&"/ 842"$43-&/(4)30'#"230'&"$)3&4 &01-&$"/&842"$44)&.&"/3*:&0'"3&4
$0.154&3&/3&.#-&3$)003&3$)"24 0'*4&.3 !4)05()4)"4.&"/*3-*,&-9
7*4)-0/(&2&/3&.#-& 4)205()30.&12089
&/420*% *$,3$)"247*4)-"2(&34$&/420*%0'4)& 9&.06&.&/432"1*%-9%&1-0940
#"2"2&"3"-0/(+5342&-&6"/48"8*3 $&/420*%30'(20513#544)03&$&/420*%3
"11&"240#&$0.154&%"$20334)&
#05/%*/()5--0'4)&0#+&$43/044)&*2425&
5--2&" "-$5-"4&3$0/6&8)5--0'$)"241*$,3 $&/4&20'(2"6*49 !
$)"247*4)#*((&2)5--"2&"
    

5--&/420*% "-$5-"4&3$0/6&8)5--31*$,3-"2(&2
$&/420*%"-0/(2&-&6"/4%*.&/3*0/
0/-9
2"12&" 2"7342"1&:0*%3#&47&&/&"$) 3)"1&<3&84&2/"-#05/%"2*&3$"/#&
$)"24<3401"/%#0440.#"231*$,3 .02&6*35"--93"-*&/44)"/*/4&2/"-
#*((&2"2&" #05/%"2*&3 !7)*$)$05-%120%5$&
06&27&*()4*/(0'4)&'*234"/%-"34#"23
2"1&/420*% 2"7342"1&:0*%3#&47&&/&"$) 7)&/+5%(*/(4)&#05/%"29$0/4052
$)"24<3401"/%#0440.#"231*$,3
42"1&:0*%7*4)-"2(&2$&/420*%
9..&429*"3 "-$5-"4&33,&7*&39..&4290' &01-&"2&3&/3*4*6&4039..&429 !
&"$)3&4$)003&37)*$)$)"24*3-&33 "/%"2&#*"3&%403&-&$439..&42*$
3,&7&%*&.02&39..&42*$ 0#+&$43&6&/7)&/4)&4"3,*3/04"
39..&429+5%(.&/4 !

"/(& 842"$43"--1"*27*3&%&-4"37*4)*/" &/(4)%*''&2&/4*"4*0/)"3)*()"$5*49


3&4$)003&33&47*4)4)&-0/(&34 !
1"*27*3&-&/(4)%*''&2&/$&
*((&3406&2 &47&&/$)"243'*/%3-"2(&34%&-4"
"*2#3 #&47&&/*4&.1"*23" # " # ;
1*$,3$)"247*4)-"2(&34103*4*6&%&-4"
    

*((&3406&2 ".&"3*((&3406&2"*27*3&
"*2&- #543$"-&%2&-"4*6&404)&3."--&2*4&.
7*4)*/4)&1"*2
*((&34 0.1"2&401*4&.31*$,3$)"247*4) )&/'"$&%7*4).5-4*1-&0#+&$431&01-&
*2344&. -"2(&2401*4&. "2&#*"3&%40"44&/%404)&4010#+&$4
!
*((&34*%%-& 0.1"2&.*%%-&*4&.31*$,3$)"24 &01-&.*()43&-&$4"(2051<3$&/4&2*4&.
4&. 7*4)-"2(&2.*%%-&*4&. !"/%4)&/3&-&$44)&-"2(&20'4)&470
!
-01&*/ */%3&"$)$)"24<3.*/."8 &01-&.*()43&-&$4.*/."8054-*&23
40"8 $0.154&33-01&#&47&&/4)&.*$,3 !4)&/$"-$5-"4&0''3&4 !
4)&$)"247*4)4)&-&"346&24*$"-3-01&

Fig. 5: A set of candidate perceptual proxies that might be used in visual comparison of means and ranges (and possibly other tasks). The
proxies are arranged by their correspondence with hypothesized distinctions between global and local visual scopes.

7.1.1 Global Features The mean length feature tests this veridical averaging. Viewers might
also perceptually organize the bars into a coherent object, such that
Global-level features describe properties aggregated over a visual what they perceive is the convex hull of the bounded object that in-
set of items, rather than comparing two focal items. Viewers can cludes the heights of the bars and the white space between bars, and
rapidly compute global statistics such as the mean of a collection of then compare the centroids or areas of these two hulls. These object
items [1, 11, 23, 25], though from present work it is unclear if this boundary proxies might be subject to perceptual biases, such as over-
ability is mediated by a proxy. One high-precision proxy is that the weighting outer edges in contour judgments [6]. Empirical research
lengths of bars in a set are veridically averaged together and the chart on human attention suggests that the allocation of attention through-
with the largest ensemble length is chosen as the answer for the task.
1018 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO. 1, JANUARY 2020

out visual displays is preceded by the organization of the scene into 7.2 Perceptual Proxy Results
objects and groups [5], and that the center-of-area of those objects The goal of this proxy approach is to evaluate which visual features
can be rapidly computed [17]. The hull area and hull centroid proxies are consistent with human performance, and which are actually useful
test whether this visual feature is consistent with participant responses for the task. As such we evaluate the “decisions” of each proxy against
and consistent with differences in the data. Note that for superposed two baselines. On what proportion of trials did the proxy agree with
charts, the two hulls are overlapping, such that this particular visual the participant’s response? And on what proportion of trials did the
feature may be harder for people to see because it involves filtering proxy agree with the true answer of the stimulus? We treat all of the
using color rather than space (as with the stacked, mirrored, and verti- following results as initial speculations, and make no claims of their
cal arrangements). Finally, people are highly sensitive to symmetry in statistical reliability. These values are depicted in Fig. 6. A visual
displays [24] and are biased to select symmetric over asymmetric in- feature can be considered useful if a decision using the differences in
formation [15]. One possible heuristic is that people use symmetry as that visual feature is consistent with the task-dependent differences in
a proxy for range, such that any chart that is less symmetric is selected the data. The dots in Fig. 6 to the right of 50% show features that give
as the one having the bigger range. above-chance performance at the task. We highlight a few patterns.
We suspected that visual features describing global, rather than First, the most useful visual feature depends on comparison task.
focal, characteristics of the visualization would be better predictors Broadly speaking, global visual features such as centroids are better
of human decisions in the M AX M EAN task, because this task in- candidates for the M AX M EAN comparisons, and focal visual features
volves set-level comparisons. Conversely, because precision for a such as Neighbor Delta are better candidates for M AX R ANGE compar-
M AX R ANGE task requires the isolation of individual item lengths for isons. For the M AX M EAN task, visual features of the Mean lengths
comparison between sets, we predicted visual features describing focal (global), Bar Centroids (global), and Biggest Mover Pair (focal) were
characteristics would be better predictors of human decisions. the most predictive of the difference in the means. It was unexpected
that the Biggest Mover Pair, which computes pairwise differences be-
7.1.2 Focal Features tween chart items, predicted the difference of means at above-chance
Focal features describe pairwise differences between two items. Peo- levels. It suggests that in the data, the largest between-item change
ple can discriminate small differences in line segment lengths [16]. (neighbor delta in superposed charts, motion in animated charts) was
Chart viewers might be sensitive to the deltas, either between charts predictive of the chart means, moreso than other global features. For
(Biggest Mover Pair) or within a chart (Neighbor Delta). In addition, M AX R ANGE, the Range proxy (which computed all pairwise dis-
focal attention can be biased to attend to the topmost item in a col- tances between items) was most useful, closely followed by pairwise
lection [26], so one possible proxy is that people compare only the differences only between neighboring items (Neighbor Delta).
lengths of the topmost items of the two sets (Biggest First Item). Second, people tend to make decisions consistent with using the
most useful visual features: the bars that show agreement between
7.1.3 Implementations proxy responses and human responses tend to follow the dots that
We implemented these global and focal perceptual proxies for all shows the most task-relevant useful features in Figure 6.
charts (Figure 5 and pseudocode in Appendix B in the supplemental Third, we note the absence of a symmetry bias. The Symmetry
materials). Some visual features may be salient [13] to human ob- proxy, which uses stimulus symmetry as a proxy on which to make
servers, but not useful for an analytic task (uncorrelated with the an- M AX M EAN and M AX R ANGE decisions, was predictive neither of ac-
swer). For example, the delta between adjacent bars (i.e., the amount tual differences in means or range, nor of human responses.
of overhang) might be a salient and useful indicator for an analytic Fourth, there is weak evidence of a bias for people to perform the
task involving comparing items, but if the viewer’s goal is to compare M AX R ANGE task with the global proxies of Hull Centroid and/or Area
means, relying on this feature should impair task performance. Trapezoid Centroid, to a higher degree than is actually useful in the
To evaluate these proxies, we simulated what would happen if each task: note where in Figure 6 the human behavior bars are to the right
proxy was tested on every data series combination that each observer of the proxy dots.
actually saw in the two experiments. Each proxy was used to make a We speculate that these findings are broadly consistent with the idea
decision about a visual comparison (e.g., Hull Area generated a convex that global visual features are useful for set-level visual comparisons,
hull around each of the two charts, calculated their areas, and evaluated and local visual features are useful for item-level visual comparisons.
the pixel difference in their areas), and provided an “answer” to the M AX M EAN and M AX R ANGE tasks benefit from the same chart ar-
task (i.e., larger area is used as a proxy for mean or for range). rangements, but use different emergent visual features in these chart
Note that this procedure necessarily shows the proxies different arrangements for visual comparison. Visual comparison is afforded
stimuli depending on arrangement: because the stimuli have been by more than precision of marks and their arrangements. The “visual”
titrated to respond to viewer accuracy, the charts “shown” for stacked component of visual comparison may rely on a flexible suite of visual
stimuli will have different properties than the charts “shown” for su- proxies that viewers can rely on to accomplish a given task, depending
perposed stimuli. Because the data in the charts “shown” to the proxies on what visual features are present. The slight bias to erroneously use
is arrangement-specific, proxies were implemented to be arrangement- global features for the M AX R ANGE task raises the speculative possi-
invariant. The proxies were calculated using raw data values, the bility that, in some tasks and arrangements, viewers use global shape-
length of each mark, and the relative location of each mark (e.g., the based proxies even when these proxies are not useful.
first datum in a chart was at the “top” location), not as visual features
7.3 Caveats, Limitations, and Future Directions
extracted from an image-based representation. Future work should
also test proxy performance using image-based implementations. We enumerate several important caveats below and how they suggest
We computed two outputs for each of these proxies: which chart possible future avenues of research.
would the proxy have chosen, and was this choice correct? Although 1. Whither the cube? We began this work with the intent of filling
we excluded some participants from the titer analysis for low accu- out more cells of the mark × arrangement × task “cube” in Fig-
racy, we included their data in the simulation to allow for the future ure 2. While we quickly found that this model does not seem to
possibility of testing whether their poorer task performance is consis- scale, it is possible that more data (varying marks, arrangements,
tent with using different perceptual proxies than other viewers with and tasks) might show that it can scale with some modifications.
higher-precision visual comparison. For example, dividing the tasks axis into “local” and “global”
Files that contain trial-by-trial data for properties of the stimuli, hu- might allow a useful level of predictive validity, even if the un-
man responses, the pixel information used by each perceptual proxy derlying perceptual explanations are less satisfying.
to inform a heuristic about a chart decision, and each proxy’s de-
cision, for all combinations of arrangement and task, are posted at 2. More data on the cells of the cube: Existing work has only
https://osf.io/uenzd/. now empirically evaluated four visual comparison tasks, and
JARDINE ET AL.: THE PERCEPTUAL PROXIES OF VISUAL COMPARISON 1019

             


   
  
   
!! 
    !
!!  
  
 
  ! " 
  # " 
!  
$ %& ! 
' 
 
' ' 
$ %& ! 
!  
!!  
!! 
 
  
  
 
 
  # " 
   
  ! " 
    !

Fig. 6: Results of the two analyses of visual feature performance, split by task: M AX M EAN and M AX R ANGE. The x-axis is the percentage of
trials for which the visual proxy was predictive, for human behavior (vertical bars), and for true answer for the comparison (colored dots). The
small dots show individual subjects, and light gray around the black lines shows 95% confidence interval. True answer dots are color-coded to
show whether we informally coded them as a global proxy feature (blue) or focal proxy feature (orange). The true answer dots indicates that
some features are more useful than others for a given visual comparison.

there are many more to test. The present experiment focused of the proxies is to create “bots” that perform the same experi-
only on bar charts, due to their ubiquity in real visualizations, and ments, simulating Mechanical Turk workers that exclusively use
combination of position and length encodings. It will clearly be only one proxy. Large numbers of these simulated participants
important to fill out the “cube” by testing other marks, including could run through the real experiment, with difficulty titrated ac-
lines, orientations, saturations, etc., both to test the robustness cording to their performance across trials, in a way that produces
of the cube model and to provide more data for the enterprise mean titer levels for each “proxy bot,” allowing another type of
of searching for candidate perceptual proxies for visualization comparison of the proxy to human performance.
tasks. The bar charts in the present work and that of Ondov et
al. [20] were horizontally extended, an increasingly common de- 5. Proxies merely fit performance: This human-proxy agreement
sign [8]. Other variants even of bar charts might reveal the use metric can only reveal features that are consistent with human
of different proxies for comparison. performance. They cannot confidently reveal what features peo-
ple actually use. As in science more broadly, we can rarely be
3. Different dependent measures: The current experiments titrate sure of an answer, but we can be sure which of many generated
the size of the compared difference, instead of other potential dif- potential answers is most consistent with the data.
ficulty manipulations, such as the time allowed to make the judg-
ment, or the number of objects in each set. But some visual com- 6. Proxy overlap: The tested visual features are highly inter-
parisons need not be precise, and future work should test whether correlated. As such, we cautiously refrain from strong conclu-
the same patterns of results hold for these alternative dependant sions about which best predict performance. As above, future
measures. We would not be surprised by substantial differences work should use more sophisticated modeling that accounts for
in those results, as the perceptual proxies that help make pre- this shared variance. It could also rely on datasets that are in-
cise judgments could differ substantially from those that allow tentionally designed to maximally differentiate among the pre-
coarser judgments, or judgments over larger sets of values. dictions of the candidate proxies. For example, our data gener-
ation for the M AX M EAN task was specifically designed so that
4. Artificial artificial artificial intelligence: We measured the per- the largest item was not predictive of the largest set mean. As
formance of workers on Mechanical Turk (whose slogan is “ar- such a largest-item proxy could not be useful for the current
tificial artificial intelligence,” because human workers take on M AX M EAN data sets, but could be for other data sets.
tasks that are often automated) on making visual comparisons,
and then matched their trial-by-trial performance to the predic- 7. A proxy for proxies: The proxies implemented here did not use
tions of each candidate proxy. Another way to test the match a computer visual system to “look at” pixels of a chart’s visual
1020 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 26, NO. 1, JANUARY 2020

features and parse those pixels into values. We used the actual One possible avenue of future work is a survey of the literature of
data values to generate models of these perceptual proxies. The what kinds of visualizations are used for different kinds of visual com-
value of this approach is that if we can determine the proper- parisons, to test the validity of the global-versus-focal distinction. A
ties of the data and arrangements that lend themselves to partic- possible eventual extension of this line of research is the potential au-
ular proxies for comparison, then a potential application of this tomation of constructing marks and arrangements, based on character-
approach is that an automated visualization system would only istics of the data and on the designer’s desired comparison they wish to
need know the data values and the designer’s desired compari- enable in the viewer. This line of work would invert the approach used
son to construct the mark and arrangement to support that com- here. In the current work, we assessed performance of proxies and of
parison. In other words, these proxies do not directly take into observers with a known task. The experimental task may have guided
account limitations in perceptual visual acuity, or the capacity the proxy that observers used. But what if a viewer does not have a
limitations of attention and memory. specific task before seeing a visualization? One possibility is that the
proxy that yields the most salient comparison compels the viewer to
8. How can one generate new candidate proxies? Future work perform a task guided with that proxy. For example, in a visualiza-
should generate more, and more sophisticated, proxies (includ- tion with two charts with one large item outlier, the single item out-
ing combinations of proxies, and eventually, predictions for who lier might capture attention provoke a focal comparison. Future work
will use which, and when). We generated proxies with a com- could manipulate the salience of different proxies in a stimulus and
bination of intuition and consultation with the perceptual psy- evaluate which task observers perform based on the salience of those
chology literature, including a strong influence of the literature proxies, and verify this approach with interviews.
on focal vs. global processing modes in vision. Our list is by
no means exhaustive, and identifying new candidates will be a 8 C ONCLUSION
creative process that, like hypothesis generation across the rest
of science, relies on engaging a diverse group of people with In this work, we found evidence that visual comparisons of biggest
different types of background knowledge across both the percep- mean and biggest range are supported by some chart arrangements
tion and data visualization communities. A brute force approach more than others, and that this pattern is substantially different from
would be to generate the full space of mathematically possible the pattern for other tasks. We proposed a series of potential prox-
pairwise and set-wise proxies. Another route could be based on ies across different tasks, marks, and spatial arrangements. Simple
interviews with viewers engaged in a particular task, to see which models of these proxies can be empirically evaluated for their explana-
aspects of their proxies might be consciously verbalized. tory power by comparing their performance to human performance for
these marks, arrangements, and tasks. We used this process to high-
7.4 Implications for Visualization light candidates of perceptual proxies that might scale more broadly to
explain performance in visual comparison.
The study of the visual features used for visual comparison points to
fruitful paths forward for both perceptual psychology as well as for
inspiring guidelines for effective visual comparison. For practitioners, ACKNOWLEDGMENTS
because our results indicate that people use different visual features for Nicole Jardine was supported by the U.S. National Science Founda-
different tasks, a visualization designer could use these rules to opti- tion grant number DRL-1661264 while affiliated with Northwestern
mize their visualizations and arrangements depending on the perceived University. Brian Ondov was supported by the Intramural Research
task. For example, a visualization where the key task is perceiving a Program of the National Human Genome Research Institute, a part of
range, such as trend over time in, e.g., a stock market visualization, the U.S. National Institutes of Health. Any opinions, findings, and
should clearly optimize for focal visual comparisons. In visualizations conclusions or recommendations expressed in this material are those
where understanding the biggest mean is central, value arrangements of the authors and do not necessarily reflect the views of the respective
should favor global visual features. funding agencies.
Our study here was confined to bar charts, but it is clear from the
richness of our results that this limitation did not restrict the complex-
R EFERENCES
ity of the performance results. Bar charts are clearly flexible visual
representations in that they support both global and focal visual com- [1] G. A. Alvarez. Representing multiple objects as an ensemble enhances
parison. Nevertheless, another clear next step is to expand this work visual cognition. Trends in Cognitive Sciences, 15(3):122–131, 2011. doi:
to other visual mark types. 10.1016/j.tics.2011.01.003
In our empirical study of perceptual proxies, the features that [2] R. Amar, J. Eagan, and J. Stasko. Low-level components of analytic activ-
yielded the most similar responses suggests that some proxies are more ity in information visualization. In Proceedings of the IEEE Symposium
on Information Visualization, pp. 111–117. IEEE, Piscataway, NJ, USA,
likely than others to explain human behavior. This is not to say such
2005. doi: 10.1109/INFOVIS.2005.24
a correspondence would prove that people use these extremely sim-
[3] J. Bertin. Semiology of Graphics. University of Wisconsin Press, Madi-
plified proxies exactly, or alone, but instead point a path forward to son, Wisconsin, 1983.
possible mechanisms that can be empirically evaluated. [4] M. Brehmer, J. Ng, K. Tate, and T. Munzner. Matches, mismatches, and
methods: Multiple-view workflows for energy portfolio analysis. IEEE
7.5 How Might Viewers Choose Proxies? Transactions on Visualization and Computer Graphics, 22(1):449–458,
The proxy approach has been a fruitful one for the study of value es- 2016. doi: 10.1109/TVCG.2015.2466971
timation and comparison of correlation in scatterplots [12, 22, 27]. [5] Z. Chen. Object-based attention: A tutorial review. Attention, Perception,
& Psychophysics, 74(5):784–802, 2012. doi: 10.3758/s13414-012-0322
Scatterplots are an ideal “Petri dish” with which to test perceptual
-z
proxies for visualization for a number of reasons. Scatterplots are also
[6] H. Choo, B. R. Levinthal, and S. L. Franconeri. Average orientation is
used often to communicate a single statistic (correlation) of a set for more accessible through object boundaries than surface features. Jour-
which precision is important. A viewer seeing a scatterplot will likely nal of Experimental Psychology: Human Perception and Performance,
develop the analytic goal of perceiving correlation, which should be 38(3):585, 2012.
more likely to trigger analysis of the proxies available in the scatter- [7] P. Dragicevic, A. Bezerianos, W. Javed, N. Elmqvist, and J.-D. Fekete.
plot visualization to calculate correlation [21]. Designers are likely to Temporal distortion for animated transitions. In Proceedings of the ACM
be implicitly aware of some of these perceptual proxies for global and Conference on Human Factors in Computing Systems, pp. 2009–2018.
focal tasks. It is possible that the ubiquity of bar chart histograms, or ACM, New York, NY, USA, 2011. doi: 10.1145/1978942.1979233
multi-item bar charts in general, is because they allow for both global [8] S. Few. Now You See It: Simple Visualization Techniques for Quantitative
and focal comparisons. Analysis. Analytics Press, Oakland, CA, USA, 2009.
JARDINE ET AL.: THE PERCEPTUAL PROXIES OF VISUAL COMPARISON 1021

[9] M. Gleicher. Considerations for visualizing comparison. IEEE Trans- [20] B. D. Ondov, N. Jardine, N. Elmqvist, and S. Franconeri. Face to face:
actions on Visualization and Computer Graphics, 24(1):413–423, 2017. Evaluating visual comparison. IEEE Transactions on Visualization and
doi: 10.1109/TVCG.2017.2744199 Computer Graphics, 25(1):861–871, 2019. doi: 10.1109/TVCG.2018.
[10] M. Gleicher, D. Albers, R. Walker, I. Jusufi, C. D. Hansen, and 2864884
J. C. Roberts. Visual comparison for information visualization. In- [21] S. Pinker. A theory of graph comprehension. In R. Freedle, ed., Artificial
formation Visualization, 10(4):289–309, Oct. 2011. doi: 10.1177/ Intelligence and the Future of Testing, pp. 73–126. Lawrence Erlbaum
1473871611416549 Associates, Inc., Hillsdale, NJ, USA, 1990.
[11] J. Haberman and D. Whitney. Ensemble perception: Summarizing the [22] R. A. Rensink and G. Baldridge. The perception of correlation in scat-
scene and broadening the limits of visual processing. From Perception to terplots. Computer Graphics Forum, 29(3):1203–1210, 2010. doi: 10.
Consciousness: Searching with Anne Treisman, pp. 339–349, 2012. doi: 1111/j.1467-8659.2009.01694.x
10.1093/acprof:osobl/9780199734337.003.0030 [23] D. A. Szafir, S. Haroz, M. Gleicher, and S. Franconeri. Four types of
[12] L. Harrison, F. Yang, S. Franconeri, and R. Chang. Ranking visualiza- ensemble coding in data visualizations. Journal of Vision, 16(5):11–11,
tions of correlation using Weber’s law. IEEE Transactions on Visual- 2016. doi: 10.1167/16.5.11
ization and Computer Graphics, 20(12):1943–1952, 2014. doi: 10.1109/ [24] J. Wagemans. Characteristics and models of human symmetry detection.
TVCG.2014.2346979 Trends in Cognitive Sciences, 1(9):346–352, 1997. doi: 10.1016/S1364
[13] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual atten- -6613(97)01105-4
tion for rapid scene analysis. IEEE Transactions on Pattern Analysis & [25] D. Whitney and A. Yamanashi Leib. Ensemble perception. Annual
Machine Intelligence, (11):1254–1259, 1998. Review of Psychology, 69:105–129, 2018. doi: 10.1146/annurev-psych
[14] W. Javed, B. McDonnel, and N. Elmqvist. Graphical perception of mul- -010416-044232
tiple time series. IEEE Transactions on Visualization and Computer [26] Y. Xu and S. L. Franconeri. Capacity for visual features in mental ro-
Graphics, 16(6):927–934, 2010. doi: 10.1109/TVCG.2010.162 tation. Psychological Science, 26(8):1241–1251, 2015. doi: 10.1177/
[15] M. King, G. E. Meyer, J. Tangney, and I. Biederman. Shape constancy 0956797615585002
and a perceptual bias towards symmetry. Perception & Psychophysics, [27] F. Yang, L. Harrison, R. A. Rensink, S. Franconeri, and R. Chang. Cor-
19(2):129–136, 1976. doi: 10.3758/BF03204219 relation judgment and visualization features: A comparative study. IEEE
[16] D. M. Levi and S. A. Klein. Vernier acuity, crowding and amblyopia. Vi- Transactions on Visualization and Computer Graphics, 25(3):1474–
sion Research, 25(7):979–991, 1985. doi: 10.1016/0042-6989(85)90208 1488, Mar. 2018. doi: 10.1109/TVCG.2018.2810918
-1 [28] J. S. Yi, Y. ah Kang, J. T. Stasko, and J. A. Jacko. Toward a deeper un-
[17] D. Melcher and E. Kowler. Shapes, surfaces and saccades. Vision Re- derstanding of the role of interaction in information visualization. IEEE
search, 39(17):2929–2946, 1999. doi: 10.1016/S0042-6989(99)00029-2 Transactions on Visualization and Computer Graphics, 13(6):1224–
[18] T. Munzner. Visualization Analysis and Design. A.K. Peters Visualization 1231, 2007. doi: 10.1109/TVCG.2007.70515
Series. CRC Press, Boca Raton, FL, USA, 2014. [29] J. Zacks and B. Tversky. Bars and lines: A study of graphic commu-
[19] D. Navon. Forest before trees: The precedence of global features in visual nication. Memory & Cognition, 27(6):1073–1079, 1999. doi: 10.3758/
perception. Cognitive Psychology, 9(3):353–383, 1977. doi: 10.1016/ BF03201236
0010-0285(77)90012-3

You might also like