Charles Forceville - Visual and Multimodal Communication - Applying The Relevance Principle-Oxford University Press (2020) - 82-117
Adapting Relevance Theory to Accommodate Visual Communication
3.1 INTRODUCTION
Visual and Multimodal Communication. Charles Forceville, Oxford University Press (2020). © Oxford University Press.
DOI:10.1093/oso/9780190845230.001.0001
be exploratory and require further debate. Here and there I will therefore take
considerable liberties to be speculative.
A few preliminary remarks are in order. First, as indicated before, the word
“visuals” will be taken in a very broad sense. I agree with Lisa El Refaie that
“it is important to recognize . . . that visual meaning-making is by no means
limited to the use of iconic pictures; it also includes nonrepresentational
aspects of visual design, such as style, layout, color, and typography”
(2019: 38). Of course the differences between these various phenomena
are substantial and have a bearing on how they can achieve relevance. For
present purposes, the main point is that “visuals” contrast with other
modes or modalities (which are the two terms I will use interchangeably
for what in other approaches are called semiotic systems or semiotic
resources), specifically with the spoken-verbal mode that has hitherto
been the privileged one in RT scholarship. Sometimes the distinction is
problematic, since for instance written language has visual dimensions as well
as verbal ones: typed language is printed in a specific font, with a specific
size. We are usually unaware of this, but are alerted to it when we suddenly
encounter a word in CAPITALS, or in bold, or in italics, or with a section
heading printed in another font or size. Children’s books and instruction
manuals may use different colors for different segments of texts. Certain
poems exploit the visual dimensions of language. The lines in George
Herbert’s “Easter-Wings” (1633) are arranged on the page in such a way
that, when the page is turned 90°, the poem’s two stanzas resemble a pair of
angels, while the lines in his “The Altar” (1633) have been arranged to
resemble the object of its title. Dadaists conducted radical experiments
with the type fonts of words and letters and their spatial arrangement on
the page. Comics artists often find creative ways in which to visualize
onomatopoeia (AAARRRGG! Boing! Zzzzzz). Contemporary fiction
sometimes explores the visual dimension, and its interrelations with the
written-verbal mode, in highly daring and thought-provoking manners
(Gibbons 2012). Whenever written language has such salient visual
dimensions that it seems reasonable to assume that these dimensions play a
role in meaning-making, I consider these dimensions pertinent to what is
discussed in this book. The same holds for gestures. Gesturing is usually
considered a mode in itself (e.g., Müller 2008; Cienki and Müller 2008),
even though people see gestures, which thereby could also be called
“visual.” I will not be overly concerned by such problems of delimitation. My
focus on the visual dimension is motivated by practical reasons pertaining
to space, reproducibility of data for analysis, and the areas I am more or
less knowledgeable about rather than by principled reasons.
Second, I will in this chapter talk about both “pure” visuals and visuals
accompanied by language. In fact, the former constitute a fairly untypical
situation: in the service of optimal relevance, ostensive visuals combine
more often than not with language; messages of the latter type are called
“multimodal.” Multimodality, however, is an ill-defined concept. As the
sociologist
Luc Pauwels warns, “Multimodal research is an ambitious venture given
the fact that even most forms of mono-modal or single mode analysis (for
ex- ample, the analysis of static photographs) are still underdeveloped—in
other words, not able to tap into the full expressive potential of this
medium” (2015: 73). In section 3.2 I will address the thorny concept of
“multimodality” at some greater length.
Third, while spoken language dominates the face-to-face variety of
communication habitually discussed in RT, visuals are typically used in
many forms of mass-communication. For the time being, I will take for
granted this mass-communicative aspect of visuals; in Chapter 4 I will
return to this crucial dimension in more detail.
Fourth, while the utterances exchanged between Mary and Peter typically
analyzed in RT are one or two complete or incomplete sentences, the
visuals that will be the center of attention in this book are often, in one way
or another, complete discourses. We need to be alert to a possible imbalance
between the amount of information that is conveyed in a single sentence vis-
à-vis that in a complete visual or word-and-image discourse.
Fifth, I will standardly assume that the examples of pictures discussed
in this book are ostensively used. That is, they are used by a communicator
as a message that aims to trigger a positive cognitive effect or reward in an
addressee or audience—and thus come with the presumption of relevance.
Usually, of course, it is the context in which the picture functions that
marks it as a piece of ostensive-inferential communication.
The structure of this chapter is as follows: I will begin by briefly discussing
multimodality and proceed by reflecting on some passages by Carston (2002)
that help pave the way for applying RT beyond the verbal mode. Then I will
establish where the applicability of RT (as summarized in Chapter 2) to
visuals is in my view completely or fairly unproblematic, and gradually
proceed to address more knotty issues.
3.2 MULTIMODALITY
worrisomely
First, a material substrate must be fixed as an essential component for any
semiotic mode; this material may itself stretch over diverse sensory channels.
Second, a mid-level, “mediating” stratum provides more (i.e., grammar-like) or
less (lexicon-like) compositionally functioning structural possibilities capable
of drawing “functionally”-motivated differentiations in form. Third and finally,
“above,” or “surrounding” these levels of semiotic abstraction, we place our
more abstract stratum of (local) discourse semantics, which operates abductively
on the descriptions of the lower levels of abstractions (2016: 46–47, emphases in
original).
I find particularly the third level rather daunting. Since the concept of a mode
can in my view only be useful if it covers a fairly limited number, I have
decided, pace Bateman (2016), to commit myself provisionally to the following
list: (1) visuals; (2) written language; (3) spoken language; (4) bodily behavior
(comprising gestures, postures, facial expressions, and [manner of] movement);
(5) sound; (6) music; (7) olfaction (see Plümacher and Holz 2007);
(8) taste; (9) touch. (This is a slight adaptation of the list I proposed in
Forceville (2006: 383), in which (4) covered only “gestures.”) The rationale
here is to try to keep as close a correspondence as possible between sensory
perception and mode, and to consider other meaning-generating
mechanisms as simply not belonging to “modality.” From this perspective it
would be highly convenient if we could restrict ourselves to the visual, the
aural, the olfactory, the tactile, and the gustatory modes. However, for
various reasons this is not a viable option. For one thing, we see both
written language and visuals, but whereas we have to learn the specific
language of the culture in which we are born, we often immediately
understand the meaning of visible and depicted objects and persons.
Moreover, we hear spoken language, music, and non-verbal sound.
Generally speaking, different input sources are associated with very
different material and social circumstances. Language learning requires a
substantial period of time, while this is not, or is far less, the case for visuals.
Languages’
the word-and-image combinations I discuss in this book are invariably
fairly short, often no more
Sperber and Wilson claim that “linguistic communication is the strongest possible
form of communication” (1995: 175). This is undoubtedly right if one
thinks of the nuanced and precise way in which one can communicate in language,
but visual communication has other affordances.
Fortunately, RT offers clear starting points for applications to the visual
realm. I feel encouraged, for instance, by a major point in RT scholar
Robyn Carston’s Thoughts and Utterances: The Pragmatics of Explicit
Communication (2002). The philosophers she takes to task, she argues, base
their theories on the misguided idea that everyday verbal communication
functions essentially in the same way as the language of logic. In the
language of logic there is, or should be, a complete distinction between the
purely semantic information in a proposition and the pragmatically inferred
information. This distinction is important for linguists of the language-of-
logic persuasion, for “the semantics of a formal logical language is typically
given in terms of a truth theory for the language, which assigns to each
sentential formula conditions on its truth; the propositionality and context-
independence of the sentences of the language are important factors in
making this feasible” (Carston 2002: 50). Informally formulated, Carston’s
opponents (she specifically discusses Higginbotham’s
2002: 76, emphasis mine, ChF).
Much animal communication is purely coded: for example, the bee dance used
to indicate the direction and distance of nectar. . . . It is arguable that some
human non-verbal communication is purely coded: for example, the interpretation
by neonates of facial expressions of emotion (Sperber and Wilson 2012c: 263).
Once these relaxations are accepted, the step from verbal communication
to communication involving other modalities is not such a big one
anymore. Indeed, De Brabanter (2010) discusses gestures from the angle of
RT, as does Wharton (2009). The latter, moreover, discusses Grice’s
example of Herod presenting the severed head of John the Baptist, and
accepts that such ostensive behavior can be understood as a specimen of
“overt intentional communication” (2009: 33).
The following sections will demonstrate in more detail how RT can be applied
to visuals.
of a handsome co-worker) that gives you information that you find highly
relevant, but it is no more an ostensive stimulus than a dark cloud spelling
rain. Both the accidentally discovered visual and the dark cloud may
trigger pertinent inferences—but thanks to the (First) Cognitive, not the
(Second) Communicative, Principle of Relevance.
By contrast, if the colleague had shown any of these visuals to you, they
would have been ostensive stimuli which, as always, come with the presumption
of optimal relevance. However, most of the time visuals (whether or not
accompanied by information in other modes) are used ostensively not between
two people but as part of a mass-communicative message. This latter
issue is discussed in Chapter 4.
There does not seem to be a big difference between how the two intentions
function in spoken language and visuals. As for the communicative intention:
an ostensive visual stimulus, too, needs first of all to be recognized (= acknowledged)
and fulfilled (= heeded). Some ostensive stimuli may actually be
difficult or impossible not to notice. For instance, when somebody addresses
you very loudly from close by or you hear the shrill doorbell in your own
house, if you are not hard of hearing in all likelihood you can’t avoid
noticing it. Similarly, a flickering or moving advertisement on your computer
screen irritatingly imposes itself upon your attention. That is, you cannot help
but recognize the communicative intention in these cases—and it is doubtful
whether you can choose not to fulfill it by ignoring it.
Next, the issue arises of whether an (envisaged) addressee will
recognize the informative intention by processing and interpreting the
visual stimulus; and finally, only if the addressee then accepts that
interpretation as (probably) true or more generally accepts the positive
cognitive effects or cognitive rewards the interpretation triggers can we say
that the informative intention is not only recognized but also fulfilled.
Let us first consider ostensive pictorial communication in a Mary-and-
Peter type of exchange, that is, a picture is shown by one person to an individual
addressee in a face-to-face situation. This situation is quite rare in
Western society (but less so in some other societies; see, e.g., Munn 2016;
Wilkins 2016). One example is a passer-by showing the way to a lost traveler
by drawing a map. The friendly passer-by has a relatively easy job: she wants
to help an audience of one (the traveler) to reach his destination with the
aid of her drawn-on-the-spot map and she can fine-tune her visual or
multimodal message (it turns multimodal if the visuals are accompanied by
language—for instance, in the form of written or spoken street names) in
interaction with the traveler. In the old board game Pictionary players take
turns drawing a
picture cueing a word printed on a card only they get to see in such a way that
their teammate is (hopefully) capable of guessing that word on the basis of
the picture. I propose this counts as a form of one-to-one visual communication.
Here is a third example: in an educational context a teacher expects her
pupils to make a drawing satisfying certain requirements (and not merely as
a form of self-expression) for a mark. The task could be, for instance, “Draw
what impressed you most in our recent excursion to the local art gallery,” or
“Draw your family,” or “Draw what you did during the weekend.” Each pupil
would then make a drawing for one person: the teacher—although, again,
it is likely that the visuals would be accompanied by language, as “in many
graphic texts produced by children, drawing and writing are co-present”
(Mavers 2009: 265). Another situation would be the explanation of a technical
problem by an expert to a fellow expert supporting it with a diagram
or sketch. In these examples the visuals are relatively expendable: after they
have served their purpose, they are deleted or thrown away, although a proud
father may stick up the product of a child’s enthusiastic exertions, specially
made for him, for a while in the living room. Similarly, certain visuals may
be stored for a period of time for later consultation—or even archived for
eternity. But these are presumably relatively rare cases of one-to-one visual
communication. Let us say, for argument’s sake, that mass-communication
technically begins when a sender addresses an audience of at
least two persons. Of course, the closer the number of addressees is to two,
the less typically mass-communicative the situation is, but for theoretical
purposes the distinction suffices. Importantly, the central RT tenet that
relevance is always relevance to an individual holds with undiminished force
when the audience consists of more than a single addressee—whether two,
2,000, or 2 million of them. The implications of the relevance-to-an-
individual tenet in mass-communication will be explored in more detail in
Chapter 4.
The principles of effect and effort apply in much the same way in visual communication
as they do in face-to-face verbal communication, given that any
ostensive stimulus comes with the presumption of optimal relevance to the
addressee. To recall, a presumption is by no means the same as a guarantee: as
addressees we are quite often disappointed in the implicitly promised
relevance of a communicator addressing us. Just as a narcissist at a party may
bore us to death with the presumption of the relevance of her self-
aggrandizing chatter, so we may find a TV-program or YouTube film
completely irrelevant because the meager positive cognitive effects or
emotional rewards do not warrant the investment of even our minimal
mental effort. Similarly, the discussion in a science program on TV may be
so difficult that we are not prepared to
So far, so good. Thanks to the parallels with processing verbal stimuli, the
processing of visual stimuli can hitherto by and large be accommodated
within the RT framework. But further pursuing the parallels gets us into
deeper waters, since Sperber and Wilson specify that the first stage in the
interpretation process of utterances is that they are decoded. Only after the
various elements have been decoded and their underlying logical form has
been decided on can utterances be further processed by means of
reference assignment, disambiguation, and enrichment. Recall that an
addressee draws on the logical form for the derivation of explicatures and
implicatures from an utterance. The concept of logical form is thus a central
issue in RT. For instance, only a fully propositional logical form allows for
interaction with other conceptual representations so as to enable deciding
issues like contradiction and implication (Sperber and Wilson 1995: 72).
Accepting that logical forms do not have to be fully propositional, Sperber
and Wilson state that there can also be less-than-fully-propositional logical
forms, as long as they are well-formed. Such “assumption schemas” (Sperber
and Wilson 1995: 73), which Carston defines in terms of “non-
propositional (non-truth-evaluable) logico-conceptual structure” (2002: 59),
require further completion, to be achieved in the future, in order to
develop, hopefully, into fully propositional logical forms. But even
allowing for the existence of not-yet-fully-propositional logical forms is
not going to help solve the following
course, only have implicatures” (Sperber and Wilson 1995: 182). The consequence
of respecting this definition would be that ostensive pictures
cannot convey explicit information (since that is what explicatures
communicate) or, to reformulate, there would then be no visuals that on
their own, unaccompanied by language, transmit explicit assumptions.
However, as Forceville and Clark (2014) argue, this is problematic, since
there seem to be at least some types of visuals that communicate coded,
explicit information (see also Forceville 2014; Tseronis and Forceville
2017a).
One way of solving the problem raised by Sperber and Wilson’s strict definition
of explicatures is to understand “syntax” in a broader sense than it
is used in linguistics, and to postulate that the logical form in the
“language of thought” is to be understood as a system of rules as to how
not just verbal elements but also information in other modes (such as
visuals and sounds) can be integrated to form fully propositional forms.
This solution would mean abandoning the requirement that in the logical
form concepts must be governed by grammar; it would be sufficient to say
that they are governed by some sort of structure. This means dropping the
idea that we (always) think grammatically or (always) think in grammar,
i.e., discarding the assumption that “the grammar of thought” has the same
underlying principles as “the grammar of language.” Now as indicated
before, there are very good reasons to believe that, indeed, we by no means
always “think in language.” Introspection suggests this to be the case, but
we can also ask: do painters think grammatically, or think in the grammar
of language? Do composers? The long tradition of systematically
investigating language as the supreme vehicle for communication has led
to a skewed view of its importance: “sentence-like linguistic expressions
are not primary, but are based on our more visceral, incarnate sources of
meaning and understanding” (Johnson 2015: 10).
It is important to recall here that semiotics has a long tradition of using
the word “code” and has routinely referred to “decoding” as a way to make
sense of pictures. Chandler (2017) devotes a chapter in his Semiotics: The
Basics to the concept of codes, and he summarizes insights from authors
such as De Saussure, Jakobson, Gombrich, and Hall as follows:
A semiotic code is closely associated with a set of interpretive and representational
practices familiar to its users, and the conventions of codes represent
a social dimension in structuralist semiotics. . . .
Codes provide relational frameworks within which social and cultural
meanings are produced. Unfamiliar experiences are interpreted analogically
in relation to already codified knowledge. The dominant codes help to maintain
a broad conceptual consensus and thus facilitate cultural transmission. . . .
Semioticians seek to identify and describe the various codes that are taken
for granted in this way. The task of the analyst involves identifying and making
explicit the system of distinctions, conventions, categories, operations, and
relations underlying a particular social practice, which gives familiar
phenomena cultural meaning and value as signs (2017: 177–179, emphases in
original).
Obviously, the word “code” is used in a much broader sense in semiotics than
in RT. Chandler mentions three main codes, each with a variety of subcodes:
interpretative codes (including those governing perception and ideology);
social codes (including those governing language, bodily contact, proximity
and appearance, fashion, behavior, etc.); and representational codes
(including codes pertaining to science, aesthetics, genre, rhetoric, mass
media, etc.). He adds that “most codes are not explicitly formulated and are
usually followed unconsciously. Some theorists question whether some of
the looser systems constitute codes at all” (2017: 187).
Without proposing that we equate the precision and sophistication of
the syntactic and semantic codes of language as used in RT with those in
the list provided above, I would nonetheless maintain that in essence they
pertain to the same thing: a set of conventions and rules that have to be
learned and that guide our interpretation of what semiotics calls “signs” and
RT calls “ostensive stimuli.” A person who is not in possession of the
proper code cannot check an ostensive stimulus (whether an utterance or a
visual or a sound or a gesture) against the encyclopedic knowledge
(pertaining to objects, people, scripts, schemata) in his cognitive
environment, and will misinterpret it, or not understand it at all.
That being said, there are important differences between verbal and
non-verbal codes. To understand utterances in an unfamiliar language one
truly has to learn that new language from scratch, but we can recognize and
understand many objects, people, and events in visuals because they very
closely resemble objects, people, and events in everyday life. We routinely
identify them, and usually do so correctly. This fact has led critics of
semiotics to point out that although interpreting film for instance requires
an understanding of some basic conventions (such as different ways of
editing together two shots), “learning” to watch film is nothing like learning a
new language. As Anderson puts it, “The perception and comprehension of
perceptual systems and the mind of the spectator are viewed in the context
of their evolutionary development” (Anderson 1996: 10; for similar views,
see, e.g., Bordwell 1985, 1989; Bordwell and Thompson 2008; Carroll 1996;
Ildirar and Schwan 2011).
Chandler admits that the popular structuralist terms “encoding” and
“decoding” have sometimes had “the unfortunate consequence of making the
processes of constructing and interpreting texts (visual, verbal, or otherwise)
sound too programmatic,” acknowledging that “inference is required to ‘go
beyond the information given’ ” (2017: 228, emphasis in original). Chandler
wisely dropped from his book a sentence that appeared in its early online
predecessor, Semiotics for Beginners: “In the context of semiotics, ‘decoding’
involves not simply basic recognition and comprehension of what a text ‘says’
but also the interpretation and evaluation of its meaning with reference to
relevant codes” (Chandler n.d.: “Encoding/decoding,” emphases in original).
This latter would seem to suggest that interpretation and evaluation are
entirely a matter of “decoding”—and this is precisely why Sperber and
Wilson saw semiotics as having failed in providing an adequate model for
communication. One of the great strengths of RT, after all, is that it shows
how much of meaning-making is a matter of combining ostensive stimuli with
ad hoc context, yielding implicatures as well as explicatures.
In short, the word “code” in semiotics tends to be used fairly broadly, and
thus it does not mean exactly the same as it means in language (namely, the
set of rules governing the correct use of grammar and vocabulary). That being
said, RT acknowledges that even the language code cannot specify the precise
meaning of each single word or phrase: the word “open” means something
slightly different in “he opened the tin,” “he opened the door,” and “he opened
his heart.” In RT, such different uses would be marked by asterisks, so that we
can distinguish between OPEN*, OPEN**, and OPEN***. We could do
something similar for the words “code” and “en/decoding” themselves: in
the case of linguistic utterances, an addressee DECODES*, while in the case
of non-linguistic ostensive stimuli he DECODES**. But I submit that the
similarities between the two meanings are much more important than the
differences. I thus propose that we use the words “code,” “encoding,” and
“decoding” not only for linguistic utterances but also for at least some (parts
of) ostensive visual stimuli (and by extension also for some ostensive stimuli
in different sign systems).
Accepting this claim would mean that not just decoded ostensive verbal
stimuli but also (some) decoded ostensive visual stimuli can serve as the
raw input for the next step in the process of deriving relevance:
disambiguation, reference assignment, and various enrichment procedures
—which in turn allow for the derivation of explicatures and implicatures.
But before we turn to these issues, let me briefly consider what other
semiotic concepts will be useful in an RT analysis of mass-communicative
visuals.
3.8 SOME OTHER USEFUL SEMIOTICS CONCEPTS PERTAINING TO CODE** IN VISUALS
Of the many tripartite divisions Charles Sanders Peirce proposed, the only
threesome that is regularly used outside of semiotics scholarship (e.g., by
Clark 1996) is that of symbol, icon, and index. I will here draw on Chandler
(2017) for their definition:
It is to be realized, as Chandler points out, that although the three are often
referred to as kinds of signs, Peirce envisaged them as different
dimensions most signs have simultaneously. However, usually one of
them is dominant over the others, so that in common parlance a sign is
considered a symbol, an icon, or an index.
In many ostensively used visuals we recognize elements because they iconically
cue their referents in everyday life. It is not just people or single objects
that are iconically understood, but also certain actions; and certain clusters
of objects, people, and actions, which are usually referred to as scripts or
scenarios. A cluster consisting of pews, people kneeling, an altar, and a priest
will activate the “church” scenario; people at a table with cutlery on it, and
somebody with a menu standing next to it evokes the “restaurant”
scenario. Incidentally, it is not just that the recognition of individual
elements leads to the recognition of the scenario; this is frequently a two-
way process, as often
Having argued that at least some elements of visuals are decoded, I will
now further pursue the analogy with language. As we saw in Chapter 2,
verbal messages rarely come in such a complete form that they
straightaway allow for the derivation of explicatures and implicatures. They
need to fit the format of a “logical form” or “assumption schema,” which can,
but need not be, fully propositional. To allow for the derivation of inferences,
the decoded verbal information requires reference assignment,
disambiguation, and various forms of enrichment. Let me reconsider these
operations with a view to their possible application to ostensively used
visuals.
3.9.1 Reference Assignment
First of all, in most realistic pictures we need to know who is who, and what is
what. In most photographs, there is a relation of resemblance between people
as they appear in the picture and as they are known to look in real life—
what Peirce calls an iconic relationship between signifier and signified.
Such a resemblance is supposed to be specifically salient in photographs for
passports, which have a clear ostensive function: “This is what person X,
whose name and other biographical details appear elsewhere in this
document, and who is this document’s carrier, looks like.” In other
photographs, the resemblance may, for all sorts of reasons, not be so clear,
and the assignment of the correct referent may require some reasoning.
Here is an example: Mary fetches a photograph taken at last year’s
Christmas party, showing it to Peter to prove that their friend Irene was
actually there, while Peter had insisted that Irene could not have been
present at that party as he is convinced she was abroad during that whole
December month. Technically speaking, Mary tries to persuade Peter to
delete one assumption in his cognitive environment (Irene was abroad last
Christmas) and replace it by another (Irene was at Mary and Peter’s party
last Christmas). However, the photograph may be blurred, or the person
supposedly being Irene may be difficult to identify because she is seen from
the back only. In such a situation the issue of “reference assignment” may
require (mental) work and/or background knowledge. For instance, the
person under consideration wears an unusual hat that both Mary and Peter
know to be Irene’s.
Many issues may problematize reference assignment. Here is an
attested example of one such issue. Quite some years ago (I have to say in my
defense), a student gave a presentation in a seminar, showing a photograph of
a young, blond woman with a milky moustache. One mature student and I saw
a young blond woman with a milky moustache; all the other students saw
Paris Hilton with a milky moustache (the photograph was part of the
celebrity-endorsed “Make Mine Milk” campaign promoting the drinking of
milk to young people). Clearly, spotting resemblance between somebody in a
photograph, on the one hand, and the real-life referent, on the other,
requires that the addressee of the photograph has the knowledge of who
the referent is, and what she looks like, stored somewhere in his cognitive
environment. Note that in this case, only seeing a blond woman, as the
mature student and I did, still left intact a large part of the message—but
not all of it, for we missed the celebrity status of this endorser of milk-
drinking. In the case of drawn or painted rather than photographed people,
other problems pertaining to reference-assignment may arise. Cartoonists
may cue a depicted person’s identity by means of certain salient features
(a big nose, a bald head, prominent breasts, protruding teeth), or by certain
props—much as sculptures of saints in Catholic churches are recognized by
resemblance. In portraits, artists may deliberately make the resemblance of
their sitters subservient to other interests, such as trying to bring out the
sitter’s character or status, or expressing their own idiosyncratic style
of painting.
Difficulties with reference assignment do not only emerge with
persons. Objects and buildings, too, may pose problems. We may not
recognize an object in the first place; or we are puzzled about what it is
because we see it represented from an unfamiliar angle or because we see
only part of it. Historians who want to use photographs as evidence, and
thus ostensively, may need to ensure that there is no disagreement about
the referent of a specific, unique building supposedly represented in a
given photograph, taken at a specific time.
Reference assignment also pertains to ostensive visuals’ depiction of
activities. Are these two boys playing or fighting? Is this nude woman
sophisticatedly exhibitionistic, or has she been surreptitiously photographed
by a paparazzo? Are these police officers involved in self-defense against
a dangerous criminal or are they beating up a helpless victim (the issue
in the infamous Rodney King case)? Consider Figure 3.1. We see a man
using a pole apparently to cross a ditch. The innocent viewer may be
forgiven for thinking he simply does this to get to its other side in order to
continue on his way. In fact, however, the man is engaged in a simulation
of the sports activity that is popular in the province of Frisia in the
Netherlands and is known as “fierljeppen”; the goal is to descend as far as
possible at the other
3.9.2 Disambiguation
Figure 3.2 Deliberately (?) ambiguous “A-Style” clothing company logo. Source:
https://www.boredpanda.com/worst-logo-fails-ever/?utm_source=google&utm_medium=organic&utm_campaign=organic,
last accessed January 2, 2020.
Figure 3.3 Deliberately (?) ambiguous “Dirty Bird” restaurant logo. Source:
https://www.adforum.com/creative-work/ad/player/34501437/logo/dirty-bird, last accessed
January 2, 2020.
is that we are not expected to resolve the ambiguity but to relish it. So provisionally
I will take it that the procedure of verbal “disambiguation” does not
have an equivalent in the sorting out of ostensive visuals that is distinct from
reference assignment.
Figure 3.4 (a) An extreme-close-up routinely makes us enrich the representation into a
complete face or body. Source:
https://steemit.com/tutorial/@armiden/how-to-take-a-good-and-true-video-image,
last accessed January 2, 2020.
(b) A skyscraper cityscape. Windows are absent or indicated as highly stylized slits only.
Source: http://clipart-library.com/clipart/kiMaRg4ij.htm, last accessed January 2, 2020.
(c) A stick figure to which the viewer mentally adds facial features, hands, feet, etc.
Source: Internet, provenance unknown.
In many types of ostensive pictures, the drawing style is, in one way or
another, not “realistic,” as we have seen in the discussion of “enrichment.”
Now what counts as realistic is in itself a knotty issue, since the idea of
realism is subject to change over time and place. Nonetheless, most of us
have a fairly clear everyday idea of when a picture counts as “realistic,”
namely, when the depiction of something closely, in a more or less
photographic manner, resembles the way we perceive that thing in reality.
But in many situations, the maker of a picture deliberately and routinely
fails to adhere to these conventions of realism. That is, like the woman who
answers “2.30” rather than “2.28” when asked the time by a passer-by in
the street, and like my saying “Jag talar inte svenska” rather than “I don’t speak
Swedish” to the Malmö market grocer (see Chapter 2), some visuals
deliberately deviate from a faithful depiction. Viewers unproblematically fill
in the missing details in such “short-hand depiction” thanks to their
knowledge of stereotypes and standard scenarios, as we saw in Figures 3.4b
and 3.4c. Other candidates for “loose visuals” are the stylized pictures of
elements of a machine in a manual for the prospective user, who is to
assemble these elements himself. Again, details that are unmistakably
recognizable in the real-life referent that the communicator wants to capture
are deliberately omitted in the representation of that referent. Why would
cartoonists and manual designers deliberately indulge in incompleteness in
their visuals? Of course, the answer is that they do so in the interest of
relevance, more specifically, the reduction of processing effort. To achieve
optimal relevance, a cartoonist, like any communicator, needs to ensure that
the envisaged addressee “gets” the critical, often more or less funny comment
on a state of affairs in the world without being unduly puzzled. This means
among other things that a viewer needs immediately to recognize the situation,
and often the person(s) depicted. Cluttering cartoons with too much detail
would
salvage the idea that there
Note that in all three cases deriving the pertinent information communicatively
requires that the picture is used ostensively. It is not difficult to imagine
situations in which this is plausibly the case. As for the paparazzo photograph
described in (1), the editor-in-chief of a gossip magazine, faced with
the dilemma of whether she will publish a juicy article about the rumors
that X and Y are having an affair, risking a libel suit if wrong, could print the
photograph to prove that the rumor is true. As for the Bamiyan statue
discussed in (2) (see Figure 3.5), art historians who would want to restore
the statue could present the photograph, probably along with many others,
as a correct representation of what the statue looked like before it was
destroyed. And as for the historically attested Van Meegeren case (3):
after World War II, this Dutch art dealer was accused of having stolen and
sold paintings by famous artists, including Johannes Vermeer, to the
Nazis, and was therefore charged with collaboration. However, Van
Meegeren claimed that he was innocent because he had actually forged the
supposed masterpieces himself. The skeptical judges required him to paint
a “Vermeer” on the spot to prove his point, and when he successfully did
so, he was indeed found not guilty of the charge.
A critic might object that additional information from a wider context is
necessary to promote these visuals to the status of giving rise to explicatures,
since on their own they supply at best (perhaps essential) proof for a fully
propositional logical form. But this surely is not fundamentally different
from having to complete lapidary verbal forms such as “on the top shelf,”
Figure 3.5 Buddha statue at Bamiyan, Afghanistan, before it was destroyed by the Taliban,
photographer unknown.
As we saw in Chapter 2, utterances can both describe a state of affairs and interpret
a representation (another utterance or a thought). Clearly, the most
common use of ostensive visuals, like that of ostensive utterances, is the “description”
(the noun is a bit awkward for visuals, but since it is a technical
term in RT, I will stick to it here) of actual states of affairs, but they also
can
Figure 3.6 (a) Similarity, not identity, between concepts in the communicator’s and the
addressee’s mind: communicating the concept “tree.” Source:
http://d3fhkv6xpescls.cloudfront.net/blog/wp-content/uploads/2011/02/miscommunication.jpg,
last accessed January 2, 2020.
(b) S. Bailie: The workers’ ideal as envisioned by the bourgeoisie. Source: Vers l’Avenir,
Brussels, 1912, p. 40.
(c) Men discussing early 20th-century art. Source: Panel from Soirs de Paris (1989)
by Philippe Petit-Roulet (writer) and François Avril (artist) © 2017 Humanoids, Inc.
Los Angeles, p. 24.
Figure 3.6b has a caption in French, which translates as follows: “To be
‘at home,’ to live in ‘his’ house, cultivate his plot of land, to be his own
lord and master, who then, at the hour when destinies were to be decided,
has not had this dream and thought about realizing this ambition?” We do not
need this caption, however, to understand that the scene in the smoke
“balloon” is a depiction of the man’s “pipe dream.” Put differently, the
depiction in the cloud of smoke is a visual interpretation of the man’s
thoughts pertaining to a desirable state of affairs. (I am indebted to Janet
Polasky for alerting me to this illustration.)
Figure 3.6c is a panel from the album Soirs de Paris, in which the
artists apparently set themselves the task of using no language in the text
balloons of their characters. In this particular panel, we understand on the
basis of the visual description of a number of elements that the men are at
a party. The contents of their speech balloons are visual interpretations of
their utterances about several early 20th-century paintings.
Further proof that “describing” need not be done literally can be found
in the existence of pictorial/visual metaphors (see, e.g., Forceville 1996,
2008a, 2016a; Bounegru and Forceville 2011; El Refaie 2003, 2009, 2019;
Abdel-Raheem 2019; Benedek and Nyíri 2019).
Gilles Fauconnier and Mark Turner (2002) were perhaps a bit too enthusiastic
when they implied that “the way we think” necessarily goes via “blends,”
but they certainly drew attention to, and modeled, an important phenomenon
in human cognition. Blending theory (BT; its later developments are
labeled conceptual integration theory/CIT) can help account for how it is
we understand certain visual and verbo-visual discourses. Whereas I cannot
do justice to the theory here, I trust that a quick summary, using some relevance
theory terminology, will give an idea of its potential usefulness for
present purposes. In this summary I rely heavily on, and sometimes literally
quote from, my earlier analyses (Forceville 2004, 2012, 2013). Fauconnier
and Turner claim roughly the following: many of the conceptual domains
an addressee of an ostensive stimulus draws on in meaning-making are not
in themselves sufficient to make sense of discourse and, more generally,
artifacts. Very often, the interpreter needs to evoke two or more conceptual
domains (called “mental input spaces” in BT), turning them into a new,
ad hoc conceptual domain (called the “blended space,” or the “blend”). This
blending is possible thanks to the fact that the input spaces share certain
similarities (modeled in the “generic space”). This ad hoc combining ability
The punning name “Nim Chimpsky,” for an ape that was able to deploy a
rudimentary form of sign language, is a blend that has the name of the linguist
“Noam Chomsky” and the noun “chimpanzee” as its two input spaces.
Shared properties include “being a mammal” and “belonging to a species
with a fairly highly developed signaling system.” On a purely formal level,
the input spaces share certain sounds as well, notably the “ch,” prominent
as the first phoneme of surname and noun, respectively, and the “m.” All
these would thus be represented in the generic space. Unique properties of
the “Chomsky” input space include him being, presumably, one of the most
informed and famous language experts of his species and a proponent of
the idea that the ability to use language is innate, while the chimpanzee
input space confers about everything generally known to be true of this species
of apes to the blend. The blended space also inherits the consonants
and the monosyllabic structure of the linguist’s first name (“Noam”→ “Nim”)
and the last syllable of his second name (“sky”), while the first part of its
second name (“chimp”) is inherited from the noun “chimpanzee.” The emergent
structure in the blend is a felicitous representation of a language-using
chimp (Forceville 2013: 256–257).
Figure 3.7 The blending space model (from Forceville 2013, adapted from Fauconnier and
Turner 2002: 46). Key: The big circles are mental spaces. The black and open dots represent
properties. Uninterrupted lines between dots represent a property shared across spaces.
Interrupted lines represent properties uniquely imparted to the blended space by each
of the input spaces. The square in the blended space contains the pertinent properties
in the blend. Unconnected black dots in the input spaces represent properties that were
not imparted to the blended space; open dots in the blended space represent properties
that were present in neither of the input spaces and are generated thanks to the combining of
the input spaces: these, then, symbolize the new, “emergent” properties.
this commonality is represented in the generic space. With the shared features
and structures of the input spaces as the base, the blend imports pertinent
features from the input spaces to create “emergent meaning” (see Figure 3.7).
The BT model also has serious limitations. While acknowledging the importance
of pragmatic factors in meaning-making, it does not pay much attention
to them. Veale et al. (2013) point out that the BT focus on the perspective
of the recipient leads to a kind of reverse-engineering: in retrospect, the addressee
can always discern which input spaces gave rise to the blend, but the theory
has nothing to say about how a communicator went about creating it, which
makes it of only limited use when one is interested in ad hoc meaning production:
“Blending theory cannot be considered a true theory of producer-centric
creativity until it can explicitly identify the heuristics, pathways
and mechanisms that allow a producer to infer the contents of a second input
space for a given input in a specific goal-oriented context” (Veale et al. 2013:
49; see also Veale 2012).
Brandt (2013), too, identifies problems with BT. Her central thesis is
that insights from semiotics are indispensable for further development of BT
—indeed of cognitive linguistics more generally. Specifically, she rightly
criticizes the lack of attention being paid in BT to the crucial importance
of “the situation of enunciation, as an experiential source in meaning
construction” (Brandt 2013: 219). This omission makes mental space theory
needlessly complex. Brandt points out, for instance, that Fauconnier
wrestles with the thorny issue of what constitutes a mental space. He
comes up with six types of spaces: time spaces, space spaces, domain spaces
(such as activities), hypothetical spaces (e.g., “if I were you . . .” or “imagine
the following . . .”), tenses, and moods. Fauconnier concludes:
A substantial part of my own scholarly work has been devoted to pictorial (or
visual) and multimodal metaphor. Since metaphoricity is not a central
issue in this book, it is not opportune to dwell on this work. Nonetheless, it
might be odd not to refer to it at all, particularly because, as briefly
mentioned in Chapter 2, my account of metaphor does not square with RT’s
categorization
of metaphor as a variety of loose use. This discrepancy, incidentally, did not
prevent me from drawing on RT in the development of my model of pictorial
metaphor in Forceville (1996).
My own understanding of metaphor, including its visual and
multimodal varieties, is in the spirit of Romero and Soria (2014), who see
as the great problem of RT’s view of metaphor as a form of loose use that
it cannot account for the ad hoc, emergent meaning that is typical of its
creative varieties. Emergent meaning in metaphor is meaning that does not
reside in the target, nor in the source, but comes into being in their
interaction (Black 1979). Discussing the example “Robert is a bulldozer,” the
authors state:
The reason why . . . it is hard for relevance theorists to solve the emergent
property issue is that they think that emergent properties not only have to be
attributed to the topic [or: target, ChF] but also to the denotation of the metaphorical
vehicle [or: source, ChF]. When the speaker uses “bulldozer” metaphorically,
he is not talking about a bulldozer but about Robert, a person, and
he is not interested in applying this expression to things that it literally
applies to. Nothing is meant to be conveyed about literal bulldozers. In its
metaphorical sense, this predicate is not applied to certain machines, the metaphorical
meaning of its properties does not have the requirement that they have in RT
of being applicable to both a tractor fitted with caterpillar [tracks] and Robert, a
person (Romero and Soria 2014: 499).
3.14 SUMMARY
an attitude, belief, or emotion vis-à-vis that information), and
communicate this intention by any of a wide range of attention-grabbing
devices. Like ostensive verbal stimuli and stimuli in other modes, these
visuals thus come with the promise that they are worth the audience’s
attention, since they have something to convey that is supposedly relevant
to the audience. Whether this promise is actually fulfilled depends, as
always, on whether the visuals trigger any positive cognitive/emotional
effects in the audience, and whether these effects are balanced by the
amount of mental effort this audience needs to invest to process these
effects.
Ostensive visuals, whether or not accompanied by language or other
modes, thus come with the presumption of relevance. Even though visuals
do not have a grammar or a vocabulary in the way languages have, visuals
have parts that are decoded, either because they resemble their counterparts
in reality or because they have a meaning that has been ascribed to them
conventionally. All visuals are in one way or another incomplete and require
that the addressee assign a referent to one or more of their elements. In
addition, usually various forms of enrichment are called for, as many visual
communicators (for instance, in cartoons and drawings) tend to leave out
certain details in the interest of optimizing relevance by reducing
processing effort on the part of the addressees. Enrichment can also take
the form of the recognition of intertextual references, or the awareness that
different mental concepts have been conflated into a “blend.” Even after
performing these mental operations, however, the addressee will not be able to
derive explicatures and implicatures without taking into account a huge
amount of contextual information. The reason for this is that visual
representations are governed by structuring principles but not by a
grammar specifying which elements are admissible in them and how these items
can be combined. Consequently, most enriched visuals are in need of
additional information, such as that supplied in the verbal text
that accompanies the picture or in the situational context in which the
visuals are ostensively used, to be capable of being judged relevant. One
consequence of this is that while all visuals have implicatures, it remains a matter
of debate to what extent we should routinely ascribe explicatures to them.
But inasmuch as some subtypes of visuals, and some parts of visuals, I
have argued, are completely coded, they can be claimed to give rise to
explicatures. Since this is a controversial claim that will require further
debate in the RT community, each time I refer to an explicature in visuals,
this is to be understood as meaning a visual explicature. It will thus still
have to be resolved whether “verbal explicatures” and “visual explicatures”
can ultimately be conflated or whether these terms should be
understood as referring to related but distinct concepts.
Finally, visuals can both describe actual and possible states of affairs
and interpret other agents’ descriptions, and can do so both literally and
non-literally. Thereby they are capable of metarepresentation.
Visuals usually function as part of mass-communicative messages. The
next chapter will therefore address the question of how the key notion of
“relevance to an individual” fares in situations where Mary needs not
just to be optimally relevant to dear Peter, but instead needs to take into
account the cognitive environments of dozens, thousands, or millions of
envisaged addressees.