
Computers & Graphics 94 (2021) 32–42


Special Section on SIBGRAPI 2020

What questions reveal about novices' attempts to make sense of data visualizations: Patterns and misconceptions

Ariane Moraes Bueno Rodrigues∗, Gabriel Diniz Junqueira Barbosa, Hélio Côrtes Vieira Lopes, Simone Diniz Junqueira Barbosa

Department of Informatics, PUC-Rio, Rua Marques de Sao Vicente, 225, Gavea, Rio de Janeiro 22451-900, Brazil

Article history: Received 4 June 2020; Revised 28 September 2020; Accepted 29 September 2020; Available online 8 October 2020

Keywords: Visualization sense-making; Data-related questions; Visualization literacy; Empirical study

Abstract

Data visualization literacy has attracted widespread interest due to the urgent need to analyze the unprecedented volumes of data available nowadays. Much work on visualization literacy focuses on asking people to answer specific questions about the data depicted in a visual representation, in an attempt to understand how people make sense of the underlying data. In this work, we investigate, through a user survey, the initial questions people pose when first encountering a visualization. We analyzed a set of 1058 questions that 22 participants created about 20 different visualizations, deriving templates for the recurring types of questions that emerged as information-seeking patterns, and classifying the various kinds of errors the participants introduced in the questions. By understanding the common mistakes people make when asking data-related questions, we are better equipped to inform further research on producing and consuming data visualizations. The results of the study reported in this paper can be used in teaching data visualization, as they uncover and classify frequent errors people make when trying to make sense of data represented visually. The study may also contribute to the design of visualization recommender systems, as the question patterns revealed what people expect to answer with each visualization.

© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

As the availability of data reaches unprecedented volumes, attention has shifted from data acquisition (when there were poor datasets) to data analysis (what to do with the recently available rich datasets) [1]. Human attention and the capacity to process those data are now the limited resources. To extract information and gain insights from those data, several tools to support data analysis and visualization techniques have been developed [2].

Despite the growing number of visualization tools, both in industry (e.g., Tableau [3]) and academia (e.g., Voyager 2 [4]), we still face the challenge of ensuring that both the producers and consumers of data visualizations are able to understand them, i.e., that they have an adequate level of visualization literacy.

Visualization literacy has received increasing attention in the last few years [5–11]. Boy et al. defined visualization literacy as “the ability to use well-established data visualizations (e.g., line graphs) to handle information in an effective, efficient, and confident manner” [6]. They related their definition to many others: visual literacy [12–14], visual information literacy [15,16], and graphicacy [17,18]. While exploring the literature, we found that visualization literacy research assesses several types of literacy: textual [19], visual [5,7,9], mathematical (or statistical, aka numeracy) [20–22], and data [23–25]. These types of literacy assessment, alone or combined, can be used for different research purposes. They help to analyze how people understand, create (by sketching, coding, or using an authoring tool), teach, and/or make sense of visualizations.

Much work on visualization literacy focuses on asking people to answer specific questions about the data represented in a visualization [5,6,26–28]. In an effort to understand how people attempt to make sense of certain data visualizations, in this paper we follow a complementary approach.

We investigate the initial questions people pose when first encountering a visualization, as an attempt to make sense of the underlying data. Through a user survey, we asked participants to pose questions about a set of visualizations. We analyzed the set of 1058 questions that 22 participants created, deriving templates for the recurring types of questions that emerged as information-seeking patterns, and classifying the various kinds of errors they introduced in the questions. By understanding the common mistakes they made when asking data-related questions, we now feel better equipped to inform further research on producing and consuming data visualizations.

∗ Corresponding author. E-mail addresses: arodrigues@inf.puc-rio.br (A. Moraes Bueno Rodrigues), simone@inf.puc-rio.br (S. Diniz Junqueira Barbosa).

https://doi.org/10.1016/j.cag.2020.09.015
Understanding the mistakes novices make allows us to recognize gaps in their understanding of information visualization concepts. This in turn contributes not only to data visualization education, but also to the design of visualization authoring tools and other tools that make use of visualizations, such as question-answering systems.

The remainder of this paper is organized as follows. Section 2 presents the related work, expanding on the motivation of this work. Section 3 describes the empirical study. Sections 4–6 analyze the study results. Finally, Section 7 concludes the paper with the main findings and points to future work.

2. Related work

After reviewing the literature, we can summarize the recent studies around four topic questions: (i) How well do people understand visualizations? (ii) How do people create visualizations? (iii) How do people teach visualizations? and (iv) How do people make sense of visualizations? Next, we discuss the main works on each one.

2.1. How well do people understand visualizations?

Typically, studies that assess how well people understand visualizations aim to investigate the ability of a user to extract information from a graphical representation. Boy et al. proposed a method for assessing a user's visualization literacy by inspecting ability scores derived from item response theory (IRT) models [29]. Their method tests visualization literacy for line charts, bar charts, and scatterplots, by asking participants to read and answer questions about these charts. Each chart type has a separate test, and each test has a set of 12 items using different stimulus parameters and tasks.

Proceeding in a very similar way, Lee et al. [5] developed VLAT, a visualization literacy test that associates tasks, chart types, and questions to assess users' visualization understanding, and that proposes difficulty and discrimination indices for evaluations. In one of the steps to create the systematic test, they asked the participants to formulate a sentence with information gained from the chart. The researchers analyzed the collected sentences and defined some potential test items, which are transcriptions of the sentences into analytical questions. The final set of test items has 53 questions covering 12 different chart types and eight tasks.

Interacting with data may make analytics accessible to a broad audience. To validate this intuition, Setlur et al. [26] developed Eviza, a natural language interface for visual analysis. To guide this development, they conducted a user study to gather a repository of questions that people would naturally ask, given different types of charts. The chart types used were: maps, bar charts, scatterplots, time-series line plots, bubblecharts, treemaps, heatmaps, and dashboards. They asked participants to provide three to five statements or questions about five random visualizations, and categorized the responses into 12 categories. Although they carried out a study very similar to ours, they used the questions as a basis for the design and conception of Eviza. Unfortunately, they described neither the question repository nor the relationships between the questions and the types of charts analyzed.

In order to evaluate graph comprehension capability, Livingston et al. [27] developed an algorithmic method for generating queries, based on the Sentence Verification Technique (SVT). Instead of defining transformations of prose sentences into query probes (as in SVT), they defined alterations of graphs' information statements or assertions into graph queries. In their pilot test, they showed a source graph, some diversionary images, and a source prose. Then, they showed a graph query (bar or line chart) and asked participants whether the information in the graph query was “stated” or “not stated” in the source graph. Next, they did the same with a prose query, asking participants whether the information in this query was “stated” or “not stated” in the source prose shown before. Their goal was to assess the participants' understanding of each graph.

These studies have in common that the participants' path to extracting information from the visualization is to analyze the chart based on a provided question.

Recently, Kim et al. [30] conducted a study to identify how people usually ask questions when they are analyzing a chart. It was a formative study to start broader research, whose ultimate goal was to develop a pipeline that automatically answers questions about charts, with visual explanations of how these answers were obtained. For a given chart, they asked crowdworkers to write questions, answers, and explanations for their answers. They limited the study to bar (simple, grouped, and stacked) and line charts. They reviewed the responses and, after removing invalid questions, classified the questions as lookup or non-lookup and as visual or non-visual, together with the explanations. They found that people regularly ask visual questions, and that visual explanations are both common and effective.

This study limited the diversity of visualizations in order to study the participants' analytical procedure in depth. Asking, answering, and explaining are cognitive tasks that encompass much more than we want to investigate. We therefore simplified the study task, analyzing only the data-related questions that participants would ask, so that we could diversify the set of visualizations.

2.2. How do people create visualizations?

Understanding a person's cognitive process when creating a visualization can yield discoveries about how well they understand and express the visual mappings, and how this can impact the final visualization.

One of the contributions of Grammel et al. [31] was the discovery, through an empirical study, of how information visualization novices construct visualizations. Although the formulation of questions was not the main point of the study, in all visualization construction cycles participants asked questions about the data to be analyzed, mapping these questions to visualization construction. Participants received a task sheet with data attributes, visual properties to map, task operations, and the task description with a scenario. They were allowed to construct any visualization they wanted and were encouraged to use various types of visualization beyond the common ones (bar, line, and pie charts). They faced several barriers, the first of which was selecting the right data attributes for their high-level questions.

Huron et al. [32] deconstructed the visual representation process by observing people creating representations using tangible tokens (each mapped to a data unit). Participants created a visualization based on the available data. After crafting the visualization, they had to explain it, as if to a friend. The literacy assessment process was intrinsic to each high-level activity of the creation process: construction, computation, and storytelling. Participants received a scenario and one goal, and did not have a specific visualization question to guide them. Although the researchers did not define questions for the study tasks, they noticed that some participants created particular questions about the scenario, and used these questions to customize the visual mappings made in initial designs.

2.3. How do people teach visualizations?

Visualizations are not limited to the computational field. They are present in everyone's daily life: newspapers, social networks, books, etc.

Some studies look for approaches to pedagogically improve people's ability to solve problems and obtain information through visualizations. Alper et al. [11] analyzed current practices and challenges in teaching and learning data visualization in early education. They conducted several studies with students and teachers to conceptualize a tool that helps students create visualizations and helps teachers teach them. The tool deals with the creation of bar charts, built up from the abstraction of pictograms. They concluded that concrete examples should guide students toward abstract knowledge. In their studies, they did not use visualization tasks or questions to guide students through the evaluation of the tool's use.

In a more recent study, Börner et al. [7] present a data visualization literacy framework (DVL-FW) to guide the design of data visualization literacy teaching and assessment. Their framework defines a hierarchical typology of core concepts and details the process steps required to extract insights from data. The interpretation of visualizations they propose refers to visualization tasks and typology types (insight needs, data scales, analyses, visualization types, graphic symbols, graphic variables, and interactions). Although this framework is quite extensive, it does not consider data-related questions about visualizations.

2.4. How do people make sense of visualizations?

Familiarity with a particular visualization type is crucial to ensuring that one can extract correct information from it. Börner et al. [9] informally investigated the familiarity of young and adult museum visitors with different visualization types. They sought to know whether and where the visitors had seen the visualizations before, how they read and named them, and what types of data or information they would visualize similarly. Since this study does not attempt to measure the ability to read the data and interpret the visualizations correctly, it does not take visualization questions into account.

Lee et al. [33] investigated how people make sense of unfamiliar information visualizations such as parallel-coordinates plots, chord diagrams, and treemaps. They experimented with three sessions, one for each visualization type. In each session, they asked participants to verbalize their thoughts and behavior while trying to make sense of the visualization. Although this is not a direct literacy assessment study, it seeks to extract the participant's visual knowledge and understanding. To collect the information, the participant has only their possible prior experience of similar visualizations. There were no questions or tasks to guide the participant.

Another significant contribution of Lee et al. was the definition of the NOVIS model (NOvice's information VIsualization Sensemaking). This model consists of five activities and a miscellaneous one, as follows:

1. Encountering visualization: a cognitive activity in which the reader faces and looks at the visualization as a whole image and can build their first impression of it.
2. Constructing a frame: a cognitive activity in which the reader attempts to construct a frame (i.e., “an explanatory internal structure that accounts for visual objects” [34]) to make sense of a given visualization.
3. Exploring visualization: the cognitive activity in which the reader interacts with a visualization to discover facts and insights about it based on the constructed frames.
4. Questioning the frame: the cognitive activity in which the reader tries to confirm that the constructed frame reasonably explains the visual object, so that they can use the frame to explore the visualization.
5. Floundering on visualization: the cognitive activity in which the reader does not know what to do with a visualization because they failed to construct any reasonable frames.

We believe that our work fits into Activities 1 and 2 of this model. Once the reader encounters a visualization, one might ask: What are their first impressions of it? And how can they make sense of the data it represents?

3. Empirical study

3.1. Objective and participants

The goal of the study was to assess the quality of data-related questions produced by people with minimal knowledge of data visualization, when exposed to different kinds of visualizations.

We invited graduate and undergraduate students of Computer Science, Design, Engineering, and Social Sciences, with little to no knowledge of data visualization, to participate in the study. Although we did not systematically test the participants' visual literacy, we asked them to self-evaluate their previous experience and knowledge of data visualization in a series of Likert-scale statements, on a 1-to-7 scale, where 1 means no knowledge or experience and 7 means specialist. We asked whether they had already: taken a course (median M = 1, interquartile range IQR = 3); read any textbook material or blogs on data visualization (M = 1, IQR = 1); selected a type of chart for a visualization (M = 3, IQR = 2); adjusted the mapping of visual variables in a chart (M = 2, IQR = 2); and evaluated a data visualization (M = 2, IQR = 2).

They had attended two two-hour data visualization lessons. The lessons comprised the following content: (i) motivations on the importance of visualization and real-world examples of situations where visualizations (or the lack thereof) influenced major decisions; (ii) variable types and scales (i.e., categorical, ordinal, interval, ratio, temporal, spatial); (iii) visual variables (i.e., color, size, shape, texture, etc.); (iv) visual perception and Gestalt theory. The design goals and visual encodings related to each type of chart were left to lessons they took after the survey was closed.

Their participation was completely voluntary and the data collected were anonymous. Twenty-two people participated in the study.

3.2. Materials and procedure

We created 20 visualizations: bar (ordered by category); bar (ordered by frequency); bar (clustered); bar (stacked); boxplot; heatmap; chord; Sankey; network; line; line (multiple); ridge; histogram; scatterplot; scatterplot (+ color); bubblechart; bubblechart (+ color); map (cartogram); map (choropleth); and table.

The dataset used to create the visualizations was made up of 15 variables of different types, with values calculated by combining different random distributions. The variables had meaningless names, i.e., not belonging to either Portuguese or English (e.g., kidun, klon, neji).

Most (18) visualizations had only dummy variables, with no relation to any discernible domain (e.g., klod, neji). The two exceptions were the maps, which presented familiar geographic regions, but the variable depicted in either the choropleth map or the cartogram was also a dummy variable. Fig. 1 shows a sample of the visualizations used in the study.

Fig. 1. A sample of the visualizations used in the study.

We used dummy variables because we believe that domain knowledge, although useful for interpreting real-world visualizations, may get in the way of assessing a person's visualization literacy, as they may rely on it to fill the gaps in their knowledge of the visual representation.

The study was conducted through an anonymous online questionnaire, in which the 20 visualizations were presented in random order, one at a time. For each visualization, each participant was asked to create up to five questions about the underlying data that could be answered by examining the visualization. For each question, they were also asked to indicate the level of effort required to generate the question, on a 7-point scale, with 1 meaning “no effort” and 7 meaning “excessive effort”.
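For illustration, a dummy dataset of this kind could be generated as in the following minimal sketch. The variable names, distributions, and row count here are our illustrative assumptions, not the actual generation procedure used in the study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200  # number of objects (rows); illustrative choice

# Meaningless variable names, with values combining different random
# distributions, as described in Section 3.2 (hypothetical recreation).
df = pd.DataFrame({
    "kidun": rng.choice(list("ABCDE"), size=n),                 # nominal
    "klon":  rng.normal(50, 10, n) + rng.gamma(2, 5, n),        # continuous numeric
    "klop":  rng.lognormal(3, 0.4, n),                          # continuous numeric
    "neji":  pd.date_range("2015-01-04", periods=n, freq="W"),  # temporal
})
print(df.head())
```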

3.3. Study results

The questionnaire collected a total of 1058 responses. We (three of the co-authors) examined the collected data and discarded eight non-questions (e.g., “I could not phrase a question.”), resulting in 1050 valid questions.

We individually created standardized versions of the questions, as an attempt to reduce the number of unique questions. To create the standardized questions, three researchers worked independently on the questions. The guiding principle was to rephrase each question following basic English grammar, using a simple structure starting with an interrogative pronoun and using the variable names as nouns. We rewrote 800 questions (76.2%) and flagged 250 questions (23.8%) as problematic, i.e., we could not unambiguously create standardized questions from them, and we deemed that those question instances could not be clearly answered by inspecting the visualizations. When examining the 800 standardized questions each researcher had written, we found that we had fully agreed on 641 of them (80.1%). In 150 cases (18.8%), one of us had created a different standardized question, and in 9 cases (1.1%) we had all created different standardized questions. Most of the discrepancies were caused by distraction and were easily resolved. In the cases where there were equivalent ways of posing the same question, we opted for the simplest phrasing.

For instance, the question “How many klon have klop?” for a scatterplot was considered problematic because both klon and klop are continuous numeric variables, so questions of the form “How many... have...?” make no sense. By contrast, the question “Is there a relation between klon and klop?” for the same chart was considered clear and conceptually sound. In Section 5 we analyze the clear, conceptually sound questions, and in Section 6 we exemplify each type of problematic question.

Fig. 2 shows the number of questions created per type of visualization, according to our assessment: 800 clear and conceptually sound questions (“OK”, in blue) and 250 problematic questions (in orange).

We wrote the questionnaire instructions in the respondents' native language, Portuguese. They were allowed to pose the questions in their preferred language, either Portuguese or English. In our analysis, we disregarded typos and grammatical errors when classifying a question as problematic or not.

To assess the possible influence of the participants' previous knowledge of data visualizations on the results, we (i) calculated each participant's error rate (number of problematic questions / total number of questions generated by the participant); and (ii) calculated the Spearman correlation between the degree of self-reported knowledge and the error rate. The correlation coefficients were very low (ρ in [−0.28023163, 0.09411151]) and none of the correlations were significant (the lowest p-value was p = 0.2185527).
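This knowledge-vs-error-rate check takes only a few lines to reproduce; the arrays below are hypothetical stand-ins for the per-participant values, one entry per participant and one test per self-assessment statement.

```python
from scipy.stats import spearmanr

# Hypothetical per-participant data: self-reported knowledge (1-7) for one
# Likert statement, and error rate = problematic / total questions.
knowledge  = [1, 3, 2, 5, 1, 4, 2, 6]
error_rate = [0.30, 0.18, 0.25, 0.21, 0.33, 0.20, 0.27, 0.22]

rho, p = spearmanr(knowledge, error_rate)
print(f"Spearman rho = {rho:.3f}, p = {p:.4f}")
```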


Fig. 2. Number of questions created per type of visualization. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

In the next sections, we analyze the study results: the levels of effort and question order with respect to the OK/problematic assessment of the questions (Section 4), the standardized templates derived from the clear, conceptually sound questions (Section 5), and the issues identified in the problematic questions (Section 6), respectively.

4. Analyzing perceived effort level and question order

Regarding the level of effort to create each question (on a 1–7 scale) and the question order (1–5), we had three hypotheses:

H1: There is a significant difference in the perceived effort level between OK and problematic questions (i.e., the questions perceived as more effortful would have a higher chance of being problematic).

H2: There is a significant difference in the question order between OK and problematic questions (i.e., most problematic questions would appear among the later questions for each visualization).

H3: The effort level to create a question and the question order are correlated (i.e., participants would expend more effort to create the later questions for each visualization).

Fig. 3 depicts the distribution of OK and problematic questions over the perceived (self-reported) effort level. To evaluate H1, we ran a Mann–Whitney non-parametric test on the effort level for OK vs. problematic questions, and found a significant difference (U = 63326, p = 5.047 × 10⁻¹⁶). Therefore, we accept H1, i.e., participants perceived that they expended more effort in creating the questions that were later assessed as having lower quality.

Fig. 4 depicts the distribution of OK and problematic questions over the question order. To evaluate H2, we ran a Mann–Whitney non-parametric test on the question order for OK vs. problematic questions, but found no significant difference (U = 94548, p = 0.1722). Therefore, we reject H2, i.e., there was no difference in the quality of questions created earlier or later for each visualization.

Fig. 5 depicts the relation between the effort level and the question order. To evaluate H3 (the level of effort to create a question would be low for the first questions created for each visualization and increase towards the fourth or fifth question), we calculated the Spearman rank-correlation coefficient between the effort level (on a 7-point scale) and the question order (1 to 5). We found only a weak correlation (ρ = 0.349, p ≤ 2.2 × 10⁻¹⁶), so we may reject H3. This result may be due to the low number of questions created in the fourth and fifth positions, given that the participants were not obliged to provide all five responses for each visualization.
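A minimal sketch of the H1/H2 tests, assuming lists of per-question effort levels (or question orders) split by the OK/problematic assessment; the data below are illustrative, not the study's.

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-question effort levels (1-7), split by our assessment.
effort_ok   = [1, 2, 2, 3, 1, 4, 2, 3]
effort_prob = [3, 5, 4, 6, 5, 7, 4, 5]

u, p = mannwhitneyu(effort_ok, effort_prob, alternative="two-sided")
print(f"U = {u}, p = {p:.4g}")  # a low p indicates a significant difference (H1)
```

The same call, applied to question order instead of effort level, evaluates H2.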
5. Analyzing the clear, conceptually sound questions

As mentioned in Section 3.3, we rewrote the 800 clear, conceptually sound questions in standardized form. Next, we abstracted the standardized questions as parameterized templates, replacing the variable names with their corresponding types: N for nominal variables, Q for continuous numeric variables, T for temporal variables, S for spatial variables, and Objs for objects. We also parameterized question variations (e.g., “What is the largest value of Q?” and “What is the smallest value of Q?” were subsumed under “What is the [largest | smallest] value of Q?”). In addition to replacing the variable names with their corresponding types, when different variables of the same type were mentioned in the question, we needed a way to differentiate them. If two or more variables of the same type are used (for instance, three continuous numeric variables in a bubblechart), they were indexed with numbers, as in Q1, Q2, Q3. Moreover, when referring to a specific value, we use a dot notation, such as N1.A, meaning a specific value of one of the nominal variables. Expressions within brackets indicate that they may or may not be present in the template instances. For instance, for the clustered bar chart, “What is the value of klod per nili?” and “What is the value of klod in nili BKL and neji MQQ?” were both instances of the template “What is the value of Q [per N1 | per N1 and N2 | in N1.A | in N1.A and N2.B]?” As a result, we generated a consolidated list of 249 unique question templates (265 ⟨visualization, template⟩ pairs). Table 1 illustrates the question templates that encompass at least five question instances per visualization.
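The abstraction step can be sketched as a simple substitution over a variable-type table. The sketch below always indexes placeholders (Q1, N1, ...), a simplification of the convention above, and the type table is hypothetical.

```python
# Hypothetical variable-type table for one visualization:
# N = nominal, Q = continuous numeric (cf. the conventions above).
VAR_TYPES = {"klod": "Q", "nili": "N", "neji": "N"}

def abstract(question: str) -> str:
    """Replace variable names with indexed type placeholders."""
    counts: dict[str, int] = {}    # placeholders issued per type
    assigned: dict[str, str] = {}  # variable name -> placeholder
    tokens = []
    for word in question.split():
        core = word.strip("?,.")
        if core in VAR_TYPES:
            if core not in assigned:
                t = VAR_TYPES[core]
                counts[t] = counts.get(t, 0) + 1
                assigned[core] = f"{t}{counts[t]}"
            word = word.replace(core, assigned[core])
        tokens.append(word)
    return " ".join(tokens)

print(abstract("What is the value of klod per nili?"))
# -> What is the value of Q1 per N1?
```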


Fig. 3. Self-reported effort level to create each question.

Fig. 4. Question order and the initial quality assessment of each question.

6. Analyzing the problematic questions

Using an open coding approach, we coded the 250 problematic questions. Next, we discussed the codes we had defined, to converge on an agreed set of codes. Some questions presented more than one problem, so the total number of problem occurrences (277) is greater than the number of questions (250). We individually recoded the questions and discussed the code definitions again in three more iterations, until we were satisfied with a unified set of 20 codes, categorized into five major classes:

• ERR-∗: questions containing conceptual errors (88 occurrences)
• AMB-∗: questions that contain some ambiguity (41)
• DTA-∗: questions that are technically answerable, but are difficult to answer with the visualization, i.e., questions for which the visualization was not appropriate (43)
• DNA-∗: questions the visualization does not answer (28)
• INS-∗: failures to follow the instructions when filling out the questionnaire (79)

From the coding of the problematic questions, we can observe which kinds of problems occurred more often for each type of visualization. Fig. 6 shows how the code classes are distributed across the different visualizations.

The highest incidence of AMB-∗ issues, associated with the Sankey diagram, is mostly related to underspecifying the focus of the question (e.g., “What is the intensity of this relation?”, without specifying the values of interest). The high incidence of DNA-∗ issues in boxplots is related to questions that require observing individual objects or computing derived values from them (e.g., the mean), which are not represented in the visualization. The high incidence of DTA-∗ issues in the stacked bar chart was related mostly to questions that inquired about the size of specific segments, which is often not straightforward to answer with this type of visualization. The high incidence of ERR-∗ issues in the bubblechart was related mostly to treating a continuous numeric variable as if it were discrete (e.g., “...per Q?”), an example of misunderstanding how to use a variable effectively. Finally, the high incidence of INS-∗ issues in the choropleth map is mostly related to assumptions about the underlying domain represented in the map (e.g., assuming that the numeric variable represented the population of each country).


Fig. 5. Number of questions per self-reported effort level and question order.

Fig. 6. Distribution of code classes per visualization.

After individually coding the 250 problematic questions with the unified set of 20 codes, we calculated the inter-rater agreement using the Fleiss kappa metric [35]. We obtained κ = 0.693, which, according to Landis and Koch's proposed benchmark, is considered substantial agreement [36]. We then examined the questions presenting conflicts and decided on the final coding.

Table 2 presents the codes and their definitions, together with the number of occurrences of each code. The classes of problems are detailed further in the next subsections.
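A minimal sketch of the agreement computation, assuming a questions × raters matrix of assigned code indices (the ratings below are hypothetical); statsmodels provides the Fleiss kappa implementation:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings: one row per problematic question, one column per
# rater; values are indices into the unified set of 20 codes.
ratings = np.array([
    [0, 0, 0],
    [1, 1, 2],
    [3, 3, 3],
    [2, 2, 2],
])

table, _ = aggregate_raters(ratings)  # rows: questions; cols: counts per code
print(f"Fleiss kappa = {fleiss_kappa(table):.3f}")
```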


Table 1
Question templates with five or more occurrences (counts in parentheses).

bar (clustered)
• What is the value of Q [per N1 | per N1 and N2 | in N1.A | in N1.A and N2.B]? (9)
• Which N1 has the [largest | smallest] Q in N2? (9)

bar (ordered by category)
• Which N has the [largest | smallest] number of Objs? (18)
• How many Objs are there per N? (6)
• What is the [average | median | mode] number of Objs? (5)

bar (ordered by frequency)
• Which N has the [largest | smallest] number of Objs? (19)
• How many Objs are there [per N | in N.A]? (10)
• What is the [average | median | mode] number of Objs? (5)
• Which Ns have [more | fewer] than A Objs? (5)

bar (stacked)
• Which N has the [largest | smallest] number of Objs? (14)
• Which N1 has the [largest | smallest] number of Objs [per N2 | in N2.A]? (12)

boxplot
• Which N has the [largest | smallest] variation of Q? (6)
• What is the median of Q per N? (5)
• Which N has the [most | fewest] outliers? (5)

bubblechart
• For what ranges of [Q1 | Q2] do we have the [most | fewest] Objs? (12)

bubblechart (+ color)
• For what ranges of [Q1 | Q2] do we have the [most | fewest] Objs (per N | in N.A)? (12)

chord
• Which N1 is the [most | least] associated with N2? (9)
• What is the [most | least] frequent N? (7)

heatmap
• Which N1 and N2 has the [largest | smallest] Q? (12)

histogram
• Which range of Q has the [most | fewest] Objs? (16)
• What is the distribution of Q? (5)

line (single)
• How has Q [behaved | increased | decreased] (in T.A)? (11)
• In which T did Q reach its [largest | smallest] value? (11)
• In which (period of) T did Q [increase | decrease] the [most | least]? (7)
• In which (period of) T did Q [increase | decrease] monotonically? (5)

line (multiple)
• Which N had the [most | least] variation of Q (in (period of) T)? (16)
• How has Q behaved [in each N | in N.A] (since T.A)? (7)
• In which N did Q [increase | decrease] the [most | least] over T? (6)
• Which N had the [largest | smallest] Q (in (period of) T)? (6)

map (cartogram)
• Which S (or set of Ss) has the [largest | smallest] values of Q? (16)

map (choropleth)
• Which (set of) Ss have the [largest | smallest] values of Q? (21)

network
• Which Vs have the [largest | smallest] degree (number of connections)? (11)
• What is the [shortest | longest] path between V.A and V.B? (7)
• How many cycles are there in this graph? (6)
• Which Vs are (indirectly) connected (to vertex V.A)? (5)

ridge
• In which T (year) did Q have its [largest | smallest] value (per N | in N.A)? (10)
• What are the values of Q per N (in T (year) | in T.A–T.B)? (5)
• Which N had the [largest | smallest] Q (per T | in T.A)? (5)

Sankey
• Which N1 is associated with (the [most | least]) Objs [in each N2 | in N2.A]? (9)

scatterplot
• In which range of [Q1 | Q2] are there the [most | fewest] Objs? (13)
• What is the relation between Q1 and Q2? (6)

scatterplot (+ color)
• What is the relation between Q1 and Q2 (in each N | in N.A)? (11)
• Which N has the [most | least] Objs (in the range Q1.A–Q1.B, Q2.C–Q2.D)? (7)

table
• Which N has the [largest | smallest] number of Objs? (20)
• What is the [average | median | standard deviation | variance] of the number of Objs? (8)
• What is the number of Objs in each N? (6)

6.1. Questions containing errors

In 88 cases, there were comprehension errors about the variables represented in the chart. In the following examples, we indicate the focus of the coded problem: 37 questions called for a countable object, but mentioned a continuous variable instead (ERR-COUNT-OBJ-IO-CONT-VAR, e.g., [bubblechart] “What value of Qx had the least frequency of Qy?”); in 20 questions, the participants seem to have misunderstood the variables encoded in the visualization in various ways (ERR-MISUNDERSTOOD-VAR, e.g., [bubblechart] “How many Qx have Qy?”); 18 questions called for a discrete variable, but mentioned a continuous variable instead (ERR-DISC-VAR-IO-CONT-VAR, e.g., [bubblechart (+ color)] “How many Ncolor per Qx?”); in five questions, the participants asked about a trend along a categorical variable (ERR-TREND-IN-CATEG-VAR, e.g., [bar chart] “How does the count of objects evolve along the Nbar?”); five questions called for a range of values, but mentioned a single value instead (ERR-POINT-IO-RANGE, e.g., [bubblechart] “In which value of Qx was there the lowest concentration of Qsize?”); one question called for countable objects, but mentioned object values (ERR-OBJ-VAL-IO-OBJ, e.g., [scatterplot] “Are the values more concentrated at the center or at the edges?”); one question called for a single value, but mentioned a range of values instead (ERR-RANGE-IO-POINT, e.g., [boxplot] “From which range on can one identify outliers?”); and, finally, one question called for values, but mentioned statistics instead (ERR-STAT-IO-VAL, e.g., [line (single)] “What is the largest Qy each month?”).

6.2. Ambiguous questions

In 41 cases, the questions were ambiguous. Because the ambiguities were very open to interpretation, we did not further specify the issues in more granular codes. Some examples of ambiguous questions are: [line chart with oscillating values] “What was the period of recovery?”; and [scatterplot (+ color)] “What is the sampling of Ncolor.a and Ncolor.b according to the growth of Qx and Qy?”.

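Several of these error types are mechanical enough to detect automatically from the variable-type table. As an illustration only (the study coded all questions manually), a hypothetical checker for the most frequent code could look like this:

```python
import re

# Hypothetical variable-type table: Q = continuous numeric, N = nominal.
VAR_TYPES = {"klon": "Q", "klop": "Q", "kidun": "N"}

def check_count_error(question: str):
    """Flag ERR-COUNT-OBJ-IO-CONT-VAR: 'How many X ...' where X is a
    continuous numeric variable rather than a countable object."""
    m = re.match(r"how many (\w+)", question.lower())
    if m and VAR_TYPES.get(m.group(1)) == "Q":
        return "ERR-COUNT-OBJ-IO-CONT-VAR"
    return None

print(check_count_error("How many klon have klop?"))  # -> ERR-COUNT-OBJ-IO-CONT-VAR
```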

Table 2
Codes resulting from the open coding process (occurrence counts in parentheses). A total of 277 code occurrences were associated with the 250 problematic questions.

Questions containing errors
• ERR-COUNT-OBJ-IO-CONT-VAR (37): Question called for a countable object, but mentioned a continuous variable instead.
• ERR-MISUNDERSTOOD-VAR (20): Participant seems to have misunderstood the variables encoded in the visualization.
• ERR-DISC-VAR-IO-CONT-VAR (18): Question called for a discrete variable, but mentioned a continuous variable instead.
• ERR-TREND-IN-CATEG-VAR (5): Participant asked about a trend along a categorical variable.
• ERR-POINT-IO-RANGE (5): Question called for a range of values, but mentioned a single value instead.
• ERR-OBJ-VAL-IO-OBJ (1): Question called for objects, but mentioned object values instead.
• ERR-RANGE-IO-POINT (1): Question called for a single value, but mentioned a range of values instead.
• ERR-STAT-IO-VAL (1): Question called for values, but mentioned a summary statistic or aggregate value instead.

Ambiguous questions
• AMB-QUESTION (41): Question used ambiguous terminology that allows for multiple interpretations.

Questions that were difficult to answer with the visualization
• DTA-ALL-VALS (23): Participant asked for all the values of one or more variables of all objects.
• DTA-VIS-TYPE (17): The question cannot be answered with the current visualization type.
• DTA-DATASET (3): Although the question could potentially be answered with the current type of visualization, the particular dataset made it very difficult to answer.

Questions the visualization does not answer
• DNA-REQ-INDIV-OBJS (9): The question requires individual objects to answer, but only aggregates are represented in the visualization.
• DNA-OTHER (8): The question cannot be answered by this visualization (for some reason other than the ones specified in the other DNA-∗ codes).
• DNA-REQ-ADDIT-VARS (4): The question requires additional variables that are not represented in the visualization.
• DNA-WHY (4): The question requires a statement of causal relationship that cannot be asserted based only on the visualization.
• DNA-REQ-DATA-VALS (3): The question requires individual data values to answer, but they are not represented in the visualization.

Failures to follow instructions
• INS-MAPPED-ONTO-DOMAIN (38): The participant phrased an analogous question about a familiar domain, ignoring the dummy variables.
• INS-ABOUT-VIS (37): Question about the chart type and its elements, not about the underlying data.
• INS-MULTIPLE-QUESTIONS (4): The participant provided multiple questions in a single text field.

6.3. Questions for which the visualization was not suitable

In 43 cases, the questions were difficult to answer, i.e., the issues were more related to an inadequacy of the visualization to answer the stated question (DTA-∗): 23 questions asked for all the exact values depicted in the visualization, which would be better answered by inspecting the dataset itself in a table (DTA-ALL-VALS, e.g., [line chart] “What is the exact value of Qy in each segment of the curve?”); 17 questions asked for information for which the visualization type was inadequate (DTA-VIS-TYPE, e.g., [bubblechart] “Which mode of Qy has more points?”); and three questions depended more heavily on the dataset, which incidentally made them very difficult to answer in this particular visualization instance (DTA-DATASET, e.g., [scatterplot with no discernible outliers] “Which values can be considered outliers?”).

6.4. Questions the visualization does not answer

In 28 questions, participants posed questions that the visualization could not answer (DNA-∗): nine questions required observing or counting individual objects, but the visualization only provided summaries (DNA-REQ-INDIV-OBJS, e.g., [horizontal boxplot] “What are the values of Qx per Ny?”); four questions required additional variables not represented in the visualization (DNA-REQ-ADDIT-VARS, e.g., [map (choropleth)] “How is Qcolor related to the population of each country?”); three questions required explicit data values, instead of the aggregate values provided (DNA-REQ-DATA-VALS, e.g., [map (cartogram)] “How many countries are above the average of Qcolor?”, where only ranges of values are provided, and the mean is neither provided nor can it be calculated); four questions asked about a cause-and-effect relation not depicted in the visualization (DNA-WHY, e.g., [bar chart] “Why is the Qy so different across most Nx?”); and eight questions could not be answered by the visualization for other reasons (DNA-OTHER, e.g., [chord diagram whose ribbons draw a pattern resembling the letter “M” in the middle of the chart] “What is this M in the middle of the chart?”).

6.5. Failures in following instructions

In 79 questions, participants failed to follow the instructions (INS-∗). Of those questions, 37 were about the visualization type or its elements (INS-ABOUT-VIS, e.g., [bar chart (ordered by category)] “Would it be better to sort bars by count?”); 38 ignored the dummy variables and attempted to phrase a similar question about a familiar domain (INS-MAPPED-ONTO-DOMAIN, e.g., [scatterplot] “What is the level of customer satisfaction with relation to the price of the meal?”); and four were multiple questions in a single text field (INS-MULTIPLE-QUESTIONS).

6.6. Distribution of problems across participants

Finally, we analyzed how the problems were distributed across participants, to assess how common each type of problem was, and whether a problem was particular to only one or a few participants. Table 3 summarizes the number of participants who introduced each kind of problem, and Fig. 7 details this distribution. We omitted the INS-∗ problems, as they are related only to not following the study instructions and are therefore unrelated to visualization literacy.

As we can see, most participants created ambiguous questions (AMB-QUESTION); misunderstood the types of variables represented in the visualizations (ERR-COUNT-OBJ-IO-CONT-VAR, ERR-DISC-VAR-IO-CONT-VAR, and the more general ERR-MISUNDERSTOOD-VAR); or created questions that the corresponding visualization type could not answer effectively (DTA-VIS-TYPE), i.e., questions that could be answered much better by other types of visualization (e.g., attempting to compare or find the difference between bar segments in a stacked bar chart, instead of a clustered bar chart).


Fig. 7. Distribution of codes per participant, as a percentage of each participant’s errors.

Table 3
Number of participants who introduced each kind of problem in their questions.

• AMB-QUESTION: 16
• ERR-COUNT-OBJ-IO-CONT-VAR: 12
• DTA-VIS-TYPE: 11
• DTA-ALL-VALS: 7
• ERR-DISC-VAR-IO-CONT-VAR: 7
• ERR-MISUNDERSTOOD-VAR: 7
• DNA-OTHER: 6
• DNA-REQ-INDIV-OBJS: 4
• DTA-DATASET: 3
• ERR-POINT-IO-RANGE: 3
• ERR-TREND-IN-CATEG-VAR: 3
• DNA-REQ-ADDIT-VARS: 2
• DNA-REQ-DATA-VALS: 2
• DNA-WHY: 2
• ERR-OBJ-VAL-IO-OBJ: 1
• ERR-RANGE-IO-POINT: 1
• ERR-STAT-IO-VAL: 1

For the three kinds of errors that were made only once (ERR-OBJ-VAL-IO-OBJ, ERR-RANGE-IO-POINT, and ERR-STAT-IO-VAL), we note that each one was made by a different participant.

7. Concluding remarks

In this paper, we reported a visual literacy study focused on how participants formulate questions to interpret data represented in visualizations. After having collected 1058 questions from 22 participants, we classified the questions into two groups: (i) clear and conceptually sound (800 questions) and (ii) problematic (250 questions). From the first group, we derived 249 unique question templates that describe the different types of questions people are expected to be able to answer through each visualization. For the second group, we applied an open coding technique to identify the types of problems found in the questions, yielding 20 types of problems, which can be subsumed under five classes.

The results of the study reported in this paper can be used in teaching data visualization, as they uncover and classify frequent errors people make when thinking about data represented visually. These results can also inform the design of visualization recommender systems, such as Voyager 2 [4], going beyond the association of variable types with visualizations and supporting question-answering interactions more fully. We are now better equipped to recommend visualizations based on users' questions, as the study revealed what questions participants expected to answer with each visualization.

From the types of errors identified in the questions formulated by the participants, we can also help users to formulate better questions and assist them in obtaining the desired information. One way to do so would be to suggest alternatives to poorly phrased questions, through a mechanism analogous to Google Search's way of suggesting alternative search queries based on what the user has typed (e.g., “Did you mean X?”), as sketched below.
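As a sketch of what such a mechanism might look like (hypothetical; we did not build it), a tool could match the user's question against the templates of Table 1 for the chart at hand and offer the closest well-formed alternative:

```python
from difflib import SequenceMatcher

# Hypothetical: two abstracted templates from Table 1 for a scatterplot.
TEMPLATES = [
    "What is the relation between Q1 and Q2?",
    "In which range of [Q1 | Q2] are there the [most | fewest] Objs?",
]

def suggest(question: str) -> str:
    """Return the template most similar to a (possibly ill-formed) question."""
    return max(TEMPLATES,
               key=lambda t: SequenceMatcher(None, question.lower(), t.lower()).ratio())

# A "How many Q1 have Q2?"-style error (cf. ERR-COUNT-OBJ-IO-CONT-VAR):
print("Did you mean:", suggest("How many Q1 have Q2?"))
```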
As future work, we want to carry out a similar study with a larger number of participants, to attempt to reproduce the results. We also plan to delve further into the ambiguous questions, by asking participants to both pose and answer questions, in a way similar to Swerdlik and Cohen's study [29], but with a focus on identifying sources of ambiguity, to later design ways to support data visualization students and tool users in disambiguating unclear questions.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors thank CAPES for the support for this work, and CNPq (processes #311316/2018-2 and #313654/2017-4).

References

[1] Key A, Howe B, Perry D, Aragon C. VizDeck: self-organizing dashboards for visual analytics. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data, SIGMOD '12. New York, NY, USA: ACM; 2012. p. 681–4. ISBN 978-1-4503-1247-9. doi:10.1145/2213836.2213931.
[2] Keim DA, Mansmann F, Oelke D, Ziegler H. Visual analytics: combining automated discovery with interactive visualizations. In: Proceedings of the 11th international conference on discovery science, DS '08. Berlin, Heidelberg: Springer-Verlag; 2008. p. 2–14. ISBN 978-3-540-88410-1. doi:10.1007/978-3-540-88411-8_2.
[3] Tableau. Tableau software. 2003. http://www.tableausoftware.com/, last accessed July 2020.
[4] Wongsuphasawat K, Qu Z, Moritz D, Chang R, Ouk F, Anand A, et al. Voyager 2: augmenting visual analysis with partial view specifications. In: Proceedings of the 2017 CHI conference on human factors in computing systems. ACM; 2017. p. 2648–59.
[5] Lee S, Kim S-H, Kwon BC. VLAT: development of a visualization literacy assessment test. IEEE Trans Vis Comput Graph 2016;23(1):551–60.
[6] Boy J, Rensink RA, Bertini E, Fekete J-D. A principled way of assessing visualization literacy. IEEE Trans Vis Comput Graph 2014;20(12):1963–72.
[7] Börner K, Bueckle A, Ginda M. Data visualization literacy: definitions, conceptual frameworks, exercises, and assessments. Proc Natl Acad Sci 2019;116(6):1857–64.
[8] Maltese AV, Harsh JA, Svetina D. Data visualization literacy: investigating data interpretation along the novice–expert continuum. J Coll Sci Teach 2015;45(1):84–90.
[9] Börner K, Maltese A, Balliet RN, Heimlich J. Investigating aspects of data visualization literacy using 20 information visualizations and 273 science museum visitors. Inf Vis 2016;15(3):198–213.
[10] Chevalier F, Riche NH, Alper B, Plaisant C, Boy J, Elmqvist N. Observations and reflections on visualization literacy in elementary school. IEEE Comput Graphics Appl 2018;38(3):21–9.
[11] Alper B, Riche NH, Chevalier F, Boy J, Sezgin M. Visualization literacy at elementary school. In: Proceedings of the 2017 CHI conference on human factors in computing systems; 2017. p. 5485–97.
[12] Bristor VJ, Drake SV. Linking the language arts and content areas through visual technology. THE J (Technol Horizons Educ) 1994;22(2):74.
[13] Avgerinou M, Ericson J. A review of the concept of visual literacy. Br J Educ Technol 1997;28(4):280–91.
[14] Avgerinou MD, Pettersson R. Toward a cohesive theory of visual literacy. J Vis Lit 2011;30(2):1–19.
[15] Taylor C. New kinds of literacy, and the world of visual information. Paper presented at the EIGVIL workshop: Explanatory & Instructional Graphics and Visual Information Literacy. London Metropolitan University; 2003. p. 1–22. Retrieved from http://www.conradiator.com/resources/pdf/literacies4eigvil_ct2003.pdf.
[16] Harris BR. Visual information literacy via visual means: three heuristics. Ref Serv Rev 2006;34(2):213–21. doi:10.1108/00907320610669452.
[17] Wainer H. A test of graphicacy in children. Appl Psychol Meas 1980;4(3):331–40.
[18] Herdal T, Pedersen JG, Knudsen S. Designing information visualizations for elite soccer children's different levels of comprehension. In: Proceedings of the 9th Nordic conference on human-computer interaction; 2016. p. 1–4.
[19] Gormley K, McDermott P. Searching for evidence: teaching students to become effective readers by visualizing information in texts. Clear House: J Educ Strat Issues Ideas 2015;88(6):171–7.
[20] Lee S, Kwon BC, Yang J, Lee BC, Kim S-H. The correlation between users' cognitive characteristics and visualization literacy. Appl Sci 2019;9(3):488.
[21] Galesic M, Garcia-Retamero R. Graph literacy: a cross-cultural comparison. Med Decis Mak 2011;31(3):444–57.
[22] Garfield JB. Assessing statistical reasoning. Stat Educ Res J 2003;2(1):22–38.
[23] Schield M. Information literacy, statistical literacy and data literacy. IASSIST Q 2004;28(2/3):6–11.
[24] Koltay T. Data literacy: in search of a name and identity. J Docum 2015;71(2):401–15. doi:10.1108/JD-02-2014-0026.
[25] Carlson J, Fosmire M, Miller C, Nelson MS. Determining data information literacy needs: a study of students and research faculty. Portal: Lib Acad 2011;11(2):629–57.
[26] Setlur V, Battersby SE, Tory M, Gossweiler R, Chang AX. Eviza: a natural language interface for visual analysis. In: Proceedings of the 29th annual symposium on user interface software and technology; 2016. p. 365–77.
[27] Livingston MA, Brock D, Decker JW, Perzanowski DJ, Van Dolson C, Mathews J, et al. A query generation technique for measuring comprehension of statistical graphics. In: Proceedings of the international conference on applied human factors and ergonomics. Springer; 2019. p. 3–14.
[28] Rodrigues AM, Barbosa GD, Lopes H, Barbosa SD. Comparing the effectiveness of visualizations of different data distributions. In: Proceedings of the 32nd SIBGRAPI conference on graphics, patterns and images (SIBGRAPI). IEEE; 2019. p. 84–91.
[29] Cohen RJ, Swerdlik ME, Phillips SM. Psychological testing and assessment: an introduction to tests and measurement. Mayfield Publishing Co; 1996.
[30] Kim DH, Hoque E, Agrawala M. Answering questions about charts and generating visual explanations. In: Proceedings of the 2020 CHI conference on human factors in computing systems; 2020. p. 1–13.
[31] Grammel L, Tory M, Storey M-A. How information visualization novices construct visualizations. IEEE Trans Vis Comput Graph 2010;16(6):943–52.
[32] Huron S, Jansen Y, Carpendale S. Constructing visual representations: investigating the use of tangible tokens. IEEE Trans Vis Comput Graph 2014;20(12):2102–11.
[33] Lee S, Kim S-H, Hung Y-H, Lam H, Kang Y-a, Yi JS. How do people make sense of unfamiliar visualizations? A grounded model of novice's information visualization sensemaking. IEEE Trans Vis Comput Graph 2015;22(1):499–508.
[34] Klein G, Phillips JK, Rall EL, Peluso DA. A data–frame theory of sensemaking. In: Proceedings of the sixth international conference on naturalistic decision making; 2007.
[35] Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions. 3rd ed. Wiley-Interscience; 2003. ISBN 978-0-471-52629-2.
[36] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33(1):159–74. doi:10.2307/2529310.
