Jones 2008 The Role of Text in Televideo Cybersex

The role of text in televideo cybersex
RODNEY H. JONES
Abstract
Televideo cybersex provides a unique example of the ways meanings are instantiated, identities constructed, and relationships negotiated across different semiotic modes. This article explores the role of verbal messages in these multimodal exchanges, examining the specific interactional functions they perform. Text, it is argued, plays a distinctive role in this particular kind of interaction. Unlike ordinary face-to-face conversation, in which the body (posture, gestures, gaze) usually plays more of an ancillary role, in televideo cybersex the bodily performance is primary, with verbal messages functioning to contextualize physical actions. Text is used to help increase the sense of ‘presence’ participants feel, to regulate the rhythm of the unfolding interaction, to help manage the orderly exchange of information, and to create narrative frames within which bodily displays can be interpreted and made coherent.
Keywords: computer-mediated communication; cybersex; gay men; multimodality; text.
1. Introduction
This paper focuses on the ways verbal messages are combined with bodily displays in televideo cybersex between gay men, or what my participants call ‘cam sex’ or ‘cam fun’. Although what will be described here might seem rather removed from the experience of many, looking at this kind of interaction, I will argue, can bring us closer to understanding how, in everyday life, the body functions as both an acting subject and an objective ‘display’, and how the subjective and objective qualities of the body change when it is combined with other modes such as spoken and written language. As more and more human interaction is mediated through technologies and these technologies become increasingly multimodal, it becomes ever more important to investigate how users employ different modes to ‘embody’ themselves and the effect these modes have on their social interaction.
1860–7330/08/0028–0453 Text & Talk 28–4 (2008), pp. 453–473
Online 1860–7349 DOI 10.1515/TEXT.2008.022
© Walter de Gruyter
Televideo cybersex is a kind of interactive ‘reality porn’ (Barcan 2002) in which users—who are often strangers—conduct erotic performances for each other using webcams. Participants meet each other in a variety of ways: in chat rooms, in forums, or through Web sites. Interaction may take place in the context of these Web sites with one or several other participants, or users may contact each other privately using software that supports videoconferencing, such as MSN Messenger. The activity chiefly consists of users displaying different parts of their bodies to each other, usually while masturbating. The enjoyment for participants is both voyeuristic and exhibitionistic as they trade positions as subjects gazing upon the bodies of others and objects offered up to the other’s gaze (Waskul 2002, 2003). These encounters are most often casual and anonymous; information about participants’ actual identities is seldom exchanged, and participants seldom display their faces to each other. There are no reliable statistics on the extent of this behavior, but anecdotal evidence and the sheer number of Web sites, IM groups, and video-conferencing directories devoted to it suggest that it is widespread. In fact, executives from Microsoft admit that the earliest adopters and most frequent users of their video-conferencing applications have been cybersex enthusiasts, though they are careful to add that the company in no way condones the use of its software for such ‘risqué’ activities (Lewis 1998).
Although the ‘meat’ of televideo cybersex is in the ‘conversation of gestures’ (Waskul 2002: 204) users engage in with their webcams, there is a considerable amount of verbal communication as well. This occasionally takes the form of voice conversations through microphones connected to users’ computers but more commonly consists of intermittent text messages. My purpose here is to explore the function of these text messages in relation to the images users transmit via their webcams, and to understand how text and image interact to create meaning, relationships, and interactional coherence.
The data come from an ethnographic study of how gay men in Hong Kong use computers.1 The study took a participatory approach in which gay men were recruited as participant researchers, and the data include screen movies of participants’ online practices, in-depth interviews, and participant diaries. Participants who engaged in televideo cybersex and were willing to share their interactions were asked to help recruit chat partners from their MSN Messenger buddy lists to join the study and to give consent for their interactions to be analyzed. All in all, 17 examples of televideo interaction were collected involving 24 different individuals. Of these participants, 18 were Hong Kong Chinese and six were Caucasians from the United States and the United Kingdom. Their ages ranged from 18 to 48, with an average age of 27.6.
My aim here is not to make generalizations about the conduct of televideo cybersex across a range of cultures, genders, and sexual interests—indeed such a limited data set would not allow this. My purpose, rather, is to explore the possible affordances and constraints associated with text and other modes in this kind of interaction through the close examination of the practices of a specific community of users.
The analytical framework I use comes chiefly from interactional sociolinguistics, with its focus on the ‘discourse strategies’ (Gumperz 1982) and negotiative processes people use to manage interactions and social relationships. I also draw on Halliday’s functional-semantic view of dialogue (Eggins and Slade 1997; Halliday 1994), which sees interaction as an ‘exchange of commodities’ realized through various conversational ‘moves’. Insights from analysts working from a more ethnomethodological perspective, such as Goodwin (1994, 2002), who focuses on the ways the sequential deployment of verbal messages and nonverbal displays operates to structure interaction in recognizable patterns, also figure in my approach. Finally, I draw on concepts from scholars in multimodal discourse analysis, particularly those working in systemic functional traditions (Kress and Van Leeuwen 1996, 2001; Lemke 1987, 1998; Norris 2004; Stöckl 2004).
2. The role of verbal messages in multimodal communication
Televideo cybersex is a unique form of multimodal communication both in terms of its social goals and in terms of the particular kinds of modal configurations users deploy to reach them. While it in some ways resembles other kinds of social interaction, in other ways it differs considerably. Like text-based computer chat or videoconferencing, or everyday face-to-face communication for that matter, it involves the regular, alternating exchange of verbal messages, but the function of these messages is very different. While verbal messages are usually the primary focus of these other forms of interaction, in televideo cybersex the visual messages interactants construct with their bodily displays are primary, and the ongoing verbal conversation takes on a more secondary or ancillary role. In certain kinds of face-to-face encounters in which bodily displays take on a similarly primary role, such as face-to-face sexual interaction and the kinds of bodily displays found in strip shows, peep shows, and public sex venues such as saunas and public toilets, where gay men engage in masturbatory displays, the exchange of verbal messages is not always evident and is sometimes prohibited. In televideo cybersex, however, the regular exchange of verbal messages is practically obligatory.
Research into the role verbal messages play in other forms of multimodal interaction has resulted in a number of observations applicable to this study. The first is the notion that different modes function differently in texts and interactions partly because these modes themselves embody certain ‘affordances’ and ‘constraints’ regarding the kinds of communication for which they can be employed. According to Kress and Van Leeuwen (1996, 2001), different modes work to structure and constrain meaning-making practices and construct participants’ orientations toward reality and each other. Among the chief differences between verbal and visual modes, they say, is that verbal texts work within the logic of time, orienting readers (or hearers) toward causality, and images operate within the logic of space, orienting viewers toward spatial analytic perspectives.
One useful way of approaching the different meaning-making potential of different modes is through the lens of Halliday’s three metafunctions of language: the ideational, in which language is used to depict a state of affairs; the interpersonal, in which language is used to construct the relationship between sender and receiver; and the textual, in which language contributes to the organization and structure of messages. Stöckl (2004) suggests that while all modes are theoretically capable of performing any of these metafunctions, in different communicative events the metafunctions may be distributed unequally across modes, depending on how they can be realized most efficiently.
One central point of such work is that modes do not function independently: verbal messages work together with other modes in texts and interactions in an integrated way, so that the meanings they create together are more than just the sum of the meanings they create separately (Kress and Van Leeuwen 1996). For Barthes (1977), for example, one of the key ways that words interact with images is by ‘anchoring’ or fixing certain meanings in them, defining their terms of reference or the point of view from which they are to be interpreted.
Whereas in printed texts words are more often seen as functioning to ‘anchor’ or contextualize images, in face-to-face interaction gestures, posture, gaze, and other nonverbal communication are more often seen as resources with which speakers contextualize words; Gumperz (1982) accordingly treats such cues as part of a broader class of signals he refers to as contextualization cues. Such cues affect interaction in a variety of ways. One way involves the ideational or referential plane of the interaction—nonverbal communication acting to add to, refer to, depict, or modulate the meaning of verbal utterances (Kendon 2004). Another way has more to do with organizational aspects of communication—gesture, gaze, and bodily orientation acting as resources with which people display information about their joint participation in the interaction and about the temporal and sequential organization they expect it to take (Goodwin 2002; Kendon 2004).
Particularly relevant to the kind of interaction I am addressing is the research into multimodality in computer-mediated environments, which has mainly focused on the degree to which the available modes either contribute to or inhibit the sense of ‘social presence’ (Short et al. 1976) users experience. Early studies with this focus used face-to-face conversation as the standard for optimum social presence and judged ‘low bandwidth’ media environments, with their ‘reduced social cues’, as intrinsically deficient (see, for example, Daft and Lengel 1984; Sproull and Kiesler 1986). Computer systems relying chiefly on text were seen as adequate for most task-related communication but insufficient for conveying social, emotional, and contextual aspects of messages. Later work in this area, however, has found that ‘low bandwidth’ can sometimes enhance feelings of intimacy and ‘presence’ (McLellan 1996; Walther and Parks 2002), a phenomenon Walther (1996) refers to as hyperpersonal communication. This observation has also been borne out in work on mediated sexual or romantic relationships (Mantovani 2001; Stone 1996). In her work on phone sex, for example, Stone (1996: 94) notes that the ‘narrow bandwidth’ of the telephone emerges as a ‘powerful asset’ in such encounters because ‘the interpretive faculties of one participant or another are powerfully . . . engaged, (so) . . . extremely complex fantasies can be generated from a small set of cues’. From the perspective of such scholars, one important lesson computer-mediated communication has to teach us about multimodality is that more modalities do not necessarily mean more meaning, and that reduced cues can often provide space for highly nuanced meaning making and highly intimate relationships.
In another area of communication that is particularly relevant to this study, that of face-to-face sexual communication in physical sexual encounters or other types of sexualized bodily displays, very little research has been done on the role of verbal messages. Linguistic messages and paralinguistic cues clearly play an important role in much sexual interaction: 58% of respondents in a Kinsey Institute study reported verbalization during sex (Reinisch 1991), and numerous pop psychologists encourage such verbalization as a way to increase sexual satisfaction (see, for example, Stanton 2006). Nevertheless, talk during sex is by no means an obligatory feature of this kind of interaction, and for many it is considered unusual, embarrassing, or ‘kinky’ (Foore 2004; Stanton 2006). Popular and scholarly treatments of sexual communication tend to frame talk during sex either as ‘an exchange of information’, as in most considerations focusing on the negotiation of sexual satisfaction or safe sex (Quina et al. 2000; Molitor et al. 1999), or as a way to increase the eroticism of the experience (Stanton 2006). Neither of these framings, however, leaves room for a consideration of the more interactional, regulatory, and social functions of talk in this context.
In other examples of face-to-face eroticized displays, verbal messages take on an even more marginal role. In what is perhaps the type of sexual interaction most similar to that under consideration, the masturbatory displays gay men perform in public toilets and other public sex venues (Clatts 1999; Humphreys 1970; Jones et al. 2000), any exchange of verbal messages is often strictly prohibited, as such audible exchanges might attract unwanted attention from passersby or authorities.
3. Text in televideo cybersex
In his 2003 book on cybersex, Waskul makes an important distinction between text-based cybersex, in which the body is constructed solely through participants’ verbal descriptions of themselves and of sexual acts, and televideo cybersex, in which participants display video images of their bodies to each other in real time. In text-based cybersex, he says, the body is ‘semiotically enselfed’ in words, whereas in televideo cybersex, the self is ‘embodied in moving images’. Despite the qualitative difference in the interaction brought about by the addition of the visual mode, however, most televideo cybersex is still heavily dependent on verbal communication, and in particular on written text. In fact, the mutual display of bodies participants engage in, and the way this display unfolds, is crucially dependent on the ‘textual selves’ they construct to go along with their visible bodies.
In my data, participants typed or received a message on average once every 12.6 seconds during their televideo encounters. Many of these messages were minimal, consisting of single words like ‘wow’ or ‘nice’, while others were more elaborate. In general, though, most were relatively short, averaging 2.8 words per turn. Despite their brevity, they were seen by my participants as playing an extremely important role in the interaction. ‘If the guy doesn’t type anything’, said one participant, ‘I just log off. So boring!’
If verbal exchanges are so important a part of the experience, why then, we might ask, do users not avail themselves of opportunities for voice-based communication? MSN Messenger, for example, offers the option to supplement videoconferencing with audio chat, and participants also have the option of combining ‘cam sex’ with phone sex. None of the encounters I collected, however, used these options. The main reasons mentioned were technical: participants noted things like the poor sound quality of computer-mediated voice communication and the inconvenience involved in holding a telephone, operating a computer, and masturbating at the same time. Another reason mentioned, however, was that they did not want to hear their partners’ voices or to be burdened with the necessity of constantly producing ‘noise’ for their partner to hear. For some, the addition of voice made the interaction ‘too personal’ or ‘kill(ed) the fantasy’. Text-based communication allowed them to focus more fully on the visual performance and left more room for them to create an idealized version of their partner.
This preference for written over spoken communication makes the relationship between the visual and the verbal message in this kind of interaction very different from that in face-to-face interaction (including face-to-face sexual interaction). The first difference has to do with the sense of ‘dislocation’ created when the verbal message in interaction is dissociated from the bodily act of talking. Computer-mediated communication in general has a dislocating effect, with the actual body proximally dislocated from the virtual self. In videoconferencing accompanied by textual messaging, this sense of dislocation is further exacerbated by the lack of entrainment of the visual and the verbal messages (Raudaskoski 1999). This is particularly true in televideo cybersex, in which the action of masturbating often has to be visibly interrupted to accommodate the act of typing. The second major difference is that, as mentioned above, visual and verbal modes in televideo cybersex take on functions different from those they perform in face-to-face communication, the visual performance being primary and verbal messages taking on more contextualizing and regulatory functions.
The functional bifurcation of modes in televideo cybersex is to a large degree a result of the fact that, whereas participants in this type of interaction are quite forthcoming with displays of body parts that are normally kept hidden from view in most face-to-face interaction, they are more reluctant to display the one part of the body that is regularly displayed in casual conversation: the face. The most obvious reason is that, unlike other regions of the body, the face contains information that can clearly be linked to users’ ‘real life’ identities. For some of the users I talked to, however, anonymity was not just a matter of safety but also part of the overall eroticism of the experience. As one participant said:
prolonged (facial) exposure is a little bit unusual—sort of face-to-face talk, this really diminishes the sexual tension, as if you identify the person as a friend that is . . . it’s being anonymous that gives you the kick, right?
That is not to say that participants never show their faces, but when they do, the display is often brief and usually preceded by careful negotiation (see below), and it often signals a change in framing and footing (Goffman 1974, 1981), a shift from cybersex to some other activity such as casual chatting. This relative absence of the face as a communicative tool, I will argue, is an important factor in understanding the role of written text in these interactions.
4. Functions of written text
On the basis of my interviews and an analysis of my corpus of interactions, I have isolated four primary functions of written text in televideo cybersex, functions that are usually taken up by paralinguistic and nonverbal communication in face-to-face encounters, even of the more intimate kind. In televideo cybersex, text is used (i) to convey a sense of presence and orientation toward one’s interlocutor, (ii) to aid in the timing of the interaction, (iii) to help users regulate the moment-by-moment exchange of conversational commodities, and (iv) to create narrative and interactive frames within which bodily actions can be interpreted.
4.1. Presence
For any interaction to proceed successfully, interactants must continually communicate and monitor mutual attention toward each other in order to maintain a sense of co-presence, which is central to what Goffman (1981) calls a ‘state of talk’. In face-to-face communication, this is usually achieved through bodily orientation and gaze (Kendon 2004). Since participants in televideo cybersex usually do not show their faces, gaze is typically not available, and participants’ physical bodies are usually oriented toward themselves (i.e., in masturbation). Therefore, the function of maintaining this sense of co-presence is transferred almost entirely to the mode of text.
According to my participants, if one partner fails to type a message for an extended period of time, his partner is likely to feel ignored, no matter what kind of bodily display he is being offered. As one put it, ‘If the guy doesn’t type, how do I know he’s there?’ This is further confirmed by the fact that participants typically ‘prompt’ one another with utterances like ‘hey’, ‘???’, and ‘u there?’, or with questions designed to elicit a response, such as ‘like it?’, after an extended period in which a textual offering is not reciprocated, in the same way one might respond to an extended silence in a telephone conversation. All this suggests that, even when two participants are aligned in the mutual and reciprocal display of their naked bodies, in the absence of facial displays this is not sufficient to give them the feeling that these displays are communicative acts. It is the exchange of textual messages that provides an overall framework for a sustained sense of mutual availability. In face-to-face conversation, according to Goffman (1959), the body is the anchor for communication, the peg upon which verbal messages are hung. In much televideo cybersex, the text becomes the peg upon which the body is hung.
Beyond this most basic sense of mutual monitoring, a sense of presence also involves the feeling of richness in the interaction, which some argue is increased by the number of modes available to participants (Daft and Lengel 1984). Whereas in face-to-face interaction gestures and other bodily displays serve to ‘animate’ verbal messages, making communication a ‘richer experience’ (Kendon 2004: 175), in televideo cybersex it is the text that serves to ‘animate’ the images, turning what might otherwise seem a cinematic display into something more closely resembling interaction.
4.2. Timing
Another important function of text is in regulating timing and creating a sense of conversational synchrony. In face-to-face conversation, gestures, posture, and paralinguistic cues play a key role in regulating the sequential and temporal organization of interaction and giving participants the feeling that they are ‘in synch’ with one another (Condon 1986; Goodwin 2002; Gumperz 1982). In text-based computer-mediated communication, conversational synchrony is just as important as it is in face-to-face conversation. In an earlier examination of text-based cybersex (Jones 2005b), I noted that the users who succeeded in engaging partners the longest tended to be those who were able to establish a regular rhythm of sending and receiving messages with their partners, and that this rhythmicity was part of the pleasure participants associated with the activity.
In the case of text-based cybersex, such rhythmic coordination is accomplished entirely through the mode of text, dependent upon things like typing speed and the length of pauses between messages. In televideo cybersex there is the added dimension of the body, through which information about timing can be sent. Participants can, for example, regulate their verbal contributions based on the actions of the other, delaying their own message while the other is visibly typing. They can also coordinate the pace of their bodily actions based on visual observation of the other. Even with these visual cues, however, the speed and frequency of verbal messages still play an important role in the maintenance of conversational synchrony as well as in signaling different phases in the interaction.
Figure 1. Time between turns in a sample conversation
To illustrate this, I have plotted the length of pauses between turns in one of the interactions from my data (see Figure 1). As can be seen, in the initial stages the time between turns is very short as users exchange greetings and initial information, and then it lengthens slightly after they ‘get down to business’. What is most striking here is the regularity of pause length maintained throughout this middle phase, with typed messages issued at fairly constant intervals of between 5 and 10 seconds. This mimics the kind of rhythmic regularity one associates with actual sex and helps the participants construct what Prior and Shipka (2003: 230–231) call ‘embodied chronotopes’, which add further to the ‘tone and feel’ one associates with physical presence.
Goodwin (2002) points out that in multimodal interaction participants orient to multiple orders of temporality simultaneously, with different modes used to create different forms of temporal and sequential organization. In the case of televideo cybersex, the rhythmic back and forth of text provides a temporal context for the more rapid sexualized rhythms of bodily movement involved in the masturbatory display. At the same time, rhythms on these two time scales exist in a complementary relationship in which increased pace on one scale results in a slowing down of action on the other—the increased bodily rhythms associated with nearing climax, for example, being associated with a slowing of the pace of verbal messages.
In this regard, text is important not just in creating and maintaining the ongoing rhythm of the interaction but also in signaling to participants shifts from one phase of the interaction to another, a function that is particularly evident in the early and later stages of these encounters. These shifts are accomplished both explicitly, through the words people type—initial shifts from ‘cyberchat’ to ‘cybersex’, for example, being signaled by utterances like ‘wanna show?’ or ‘u hard down there?’, and shifts to the closing phase of the encounter being signaled by utterances like ‘r u ready to cum?’—and rhythmically, through a slowing down or speeding up of the exchange of turns. As mentioned above, when participants move from their initial negotiations to more erotic interaction, the pace of verbal exchanges typically shifts from a rapid exchange of information to a slower but steady trading of comments to accompany their visual displays. Another such shift in rhythm comes near the end of these encounters, when participants are preparing to ejaculate, at which point another dramatic lengthening of the time between turns is typical, perhaps because this moment might require more prolonged attention to one’s own physical body.
In terms of timing, the moments leading up to climax are the most critical since, in the best of situations, participants prefer ejaculation to be performed simultaneously.
In order to facilitate this, users typically issue a series of pre-offers, like ‘lets cum man’ and ‘ready?’. The purpose of such moves is both to monitor the progress of one’s partner and to enforce reciprocity. With these messages, users attempt to open slots for the other to issue the offer of ejaculation, as in the following example:
(1) A: Cum?
B: OK
A: Ready?
B: u?
A: yeahhh
B: cumming?
A: u first
B: your cum gets mine.
Ejaculations, once performed, also demand a reaction in the form of a coda (like ‘wow’ or ‘nice’) in order to close the transaction.
4.3. Regulating the exchange of conversational commodities
The above example dramatically illustrates how, in these encounters, displays are treated as ‘exchanges’ in which an offer of a certain type by one participant is seen as requiring (or ‘earning’) a reciprocal offer from his partner (‘your cum gets mine’).
While cybersexual encounters always involve some degree of cooperation as partners work together to create a mutually satisfying fantasy, they are also to some degree competitive, as partners negotiate these exchanges. As with all interaction, what underlies cybersex is what Goffman (1959) calls an ‘information game’, a contest in which interactants vie to maintain control over their own ‘information preserves’ while gathering as much relevant information as possible about their interlocutor’s. What is perhaps unique about cybersex is the ‘value’ of the information at stake, information which in this case includes not just verbal information but also bodily displays of a most intimate nature. The most obvious risk involved in this game is that the information one reveals might somehow give away one’s ‘true identity’, but this is not the only risk. There is also the risk that the information offered may not accord with the desires of one’s partner, resulting in rejection, or that one’s offer of information may not be properly or ‘fairly’ reciprocated.
From the very beginning of such encounters, users tend to measure out their contributions carefully, trading information about their age, appearance, and sexual proclivities in an incremental fashion, each offer opening a slot for a reciprocal offer, and each exchange determining whether or not the interaction can progress to the next stage (Jones 2005a). This ‘code of reciprocity’ that governs initial textual exchanges becomes even more important when the visual mode is added, primarily because the more multimodal the message becomes, the less control interactants have over their ‘information preserves’: while textual descriptions allow users to restrict information to that which is voluntarily given, video also involves information that is involuntarily ‘given off’. Faced with this increased risk, users take steps to control the information they offer and to ensure that their offers result in reciprocal offers from their partners. They do this by carefully positioning their cameras to reveal or conceal various bodily regions and by using text to negotiate the positioning of their cameras with their partners. As in earlier text-based exchanges, displays are offered in an incremental, reciprocal fashion. Participants offer themselves ‘in pieces’: one piece of you for one piece of me. The default regions are the torso, the torso and penis, the penis alone, and the buttocks. The exposure of a particular region has to do not just with how the body is made erotic but also with how the body is made meaningful in the ongoing negotiation of information. Text is used here as a means of enforcing reciprocity, as in the following examples.
(2) A: Show dick?
Wow nice
B: Show yours
mmm
(3) A: You hard
down there.
B: LIKE A ROCK
A: lemme c
B: u hard?
Whereas purely textual interactions tend to be dominated by two-part initiation–response sequences, when the visual mode is added, three-part exchanges of the kind Sinclair and Coulthard (1975) observed in classroom interactions are the rule. Verbal initiations are answered by visual responses, which are then followed up by verbal feedback or reactions. These reactions are just as important as the displays that precede them; in fact, they act as a kind of ‘payment’ for the visual display.
(4) A: Can you show me your cock?
B: (displays penis)
A: nice!
As noted above, the region that is most rarely displayed is the face. Be-
cause of the higher risk involved in such displays, the code of reciprocity
is even more strongly enforced—few users would agree to show their face
in the absence of an agreement for a reciprocal display from their part-
ners, and in such cases, the sequential reciprocity observed above be-
comes simultaneous—faces must be shown ‘together’, users concurrently
moving their webcams gradually upward while at the same time checking
that the other user is also doing the same. The mode of text is crucial in
managing these simultaneous exchanges, as can be seen in this example:
(5) A: wanna show face? (displaying torso)
B: together (displaying torso)
A: ok
B: ready? (moving camera slowly upwards)
A: (moves camera upward to reveal face)
ok thanks (quickly moves camera downward to display torso)
B: (moves camera downward to display torso)
ur cute
A: really?
B: yeah
Often in such cases the camera only lingers in this region briefly, and,
as in this exchange, textual messages like ‘ok thanks’ are used to mark the
end of the exposure.
4.4. Framing
Interactions in televideo cybersex, however, are more than just mutually
negotiated bodily displays. These displays are framed within coherent
erotic narratives that are collaboratively constructed by participants turn
by turn (Goodwin 2002), and text messages play an important part in this
co-construction of discursive coherence.
A key phase in the narrative framing of cybersex is the beginning, when
participants are just getting to know each other. This phase is usually
characterized by a series of questions and answers that often need to be
successfully negotiated even before participants begin a videoconference,
questions usually centering on appearance (age, height, and weight) and
sexual preferences (e.g., passive, active). Such constructions are con-
strained by cultural conventions regarding the kinds of descriptors that
are deemed relevant and desirable (Jones 2005c; Stone 1996). Thus, from
the beginning, the narratives which are to unfold are to some extent gov-
erned by pre-existing scripts that include expectations about roles and
relationships (a participant who describes himself as a ‘bottom’, for ex-
ample, will be expected to conduct himself in a particular way once par-
ticipants’ cameras are turned on). When participants finally turn on their
cameras, the ‘verbal bodies’ they have constructed are ‘resemiotized’
(Iedema 2001, 2003) into images (Jones 2005a), but this resemiotization
does not totally replace the verbal with the visual. Instead the verbal
body is superimposed onto the visual image, continually informing the
way it is interpreted. On the one hand, then, the initial verbal descrip-
tions one offers are constrained by the future visual display (one must
be able to ‘pull off’ the textual self one has created). On the other hand,
the visual display is constrained by the ongoing verbal narration that
accompanies it. Once participants have switched on their cameras, text
helps to contextualize these displays within an ongoing sexual narrative
in which participants claim and impute various roles. In these narra-
tives, verbal contributions give meaning to visual displays, indicating, for
example, what function a particular bodily organ or region is meant
to play at a particular moment in the fantasy. These ‘stories’, however,
do not simply make use of a single narrative frame, but rather normally
exploit multiple inter-nested and overlapping frames, as in the example
below.
(6) A: nice dick (seated, displaying torso)
B: Thanks (seated, displaying penis)
A: wanna fuck me with it? (seated, displaying torso, leaning
forward)
B: sure (seated, displaying and stroking penis)
A: o nice (standing, displaying buttocks)
B: take my cock boy (moving penis toward camera for close-up)
A: fuck me (standing, leaning over, displaying and touching
buttocks)
B: show me that ass (standing, stroking penis—close-up)
more light please (seated, displaying penis and torso)
In this short excerpt, participants construct with their utterances at
least three different interactive frames: one in which they comment upon
and direct each other’s actual displays in the present moment (‘nice dick’,
‘show me that ass’), one in which a hypothetical fantasy is played out
(‘take my cock boy’, ‘fuck me’), and a broader ‘regulatory frame’ in
which technical aspects of the channel and message quality are negotiated
(‘more light please’) (see Figure 2).
[ regulatory frame [ present moment [ fantasy ] present moment ] regulatory frame ]
Figure 2. Interactive frames in televideo cybersex
Meanwhile, the shifts of frame accomplished through text are reinforced
with bodily movements, the shift from the present moment to the
fantasy frame (A: ‘wanna fuck me with it?’, B: ‘sure’), for example, being
accompanied by one participant moving forward and the other beginning
to stroke his penis. Verbal frames overlap with visual frames as bodily
movements (for example, one partner displaying his buttocks and the
other moving his penis closer to the camera) imitate the movements of
the sexual acts or highlight the body parts mentioned in the verbal track.
In such cases, the visual messages ‘act out’ the storyline participants co-
construct with their verbal exchanges, and verbal messages act as cap-
tions or as a soundtrack for the visual narrative that is being performed.
Because textual messages are to some degree dislocated from the body,
however, users can participate in different frames in different modes in
ways that might be considered rather unusual in real-life sex, as in the fol-
lowing example in which the ‘sexual act’ is momentarily interrupted by an
exchange of small talk.
(7) A: Yeah man, fuck me! (seated, displaying and stroking penis,
legs raised)
B: yea (seated, displaying and stroking penis)
A: where u from? (seated, displaying and stroking penis, legs
raised)
B: Nottingham (seated, displaying and stroking penis)
Robin Hood country (seated, displaying and stroking penis)
A: kewl (seated, displaying and stroking penis, legs raised)
B: give me ur ass boy (seated, displaying and stroking penis)
What is striking about this example, in contrast to Example (6), is that
verbal and visual messages are not entrained, the shift from eroticized in-
teraction to small talk in the verbal track having no effect at all on the
ongoing masturbatory displays in the visual track.
As they move across these multiple frames, participants themselves
take on not only particular roles in the erotic narrative that is being writ-
ten, but also ‘discourse roles’ (Sarangi and Slembrouck 1996) which posi-
tion them at various times in relation to these frames and to their part-
ners. In televideo cybersex, participants typically perform three different
kinds of discourse roles: the role of performer—presenting their visual
bodies and verbal selves for consumption, the role of director, controlling
the performance of the other, and the role of the spectator, enjoying and
reacting to what one sees. These roles can be fairly consistently mapped
onto the semantic functions (Halliday 1994) of the moves participants
make. The performer issues offers either of a visual variety, revealing
parts of their bodies, or of a verbal variety in the form of invitations
such as ‘want to see it?’, information such as ‘I love asian guys’, and de-
scriptions of the actions one is taking in the fantasy frame or acting out
on camera, such as ‘I’m fucking that ass man’. Directing, on the other
hand, is almost exclusively performed through text, chiefly because the
range of meanings one can express with the body in this regard is more
limited. The director asks questions like ‘do u have any toys?’, makes re-
quests like ‘can u do a close up on your cock and balls?’, and issues direc-
tives such as ‘move your cam closer please’. It is the role of the spectator,
however, in which text is the most important. In this role, participants
issue responses or reactions to the displays of the other like ‘wow’, ‘nice
body man’, ‘o my god’, and ‘HUGE MONSTER COCK!’. As noted
above, these reactions are a crucial element in the interaction, functioning
as a kind of ‘payment’ for visual displays.
Figure 3 shows the relative distribution of semantic moves achieved
through text in my corpus. Offers are the least common moves taken
with text, presumably because most o¤ers in this type of interaction are
visual. By far the most common moves taken with text are responding
moves: reactions, answers to questions, and indications of compliance or
refusal. The most likely reason for this is the absence of the face as a com-
municative tool. In face-to-face interaction, one watches and reacts with
one’s gaze and facial expressions. In face-to-face sexual encounters, espe-
cially mutual masturbatory displays, the face plays a similar role, signal-
ing responses or reactions to the other’s performance. In televideo cyber-
sex, on the other hand, where the face is usually not available, one
watches with one’s words.
Figure 3. Distribution of semantic moves
Of course, there is a kind of ambiguity and polyfunctionality to these
visual and verbal moves. A reaction, for example, can also be regarded
as a kind of performance, and, because of the code of reciprocity, each
visual display is also an implicit demand that the other party produce a
similar display. The combination of visual and verbal modes also allows
users to take up different positions in different modes, making a visual
offer of a particular body part, for example, while verbally issuing a de-
mand that one’s partner do the same.
5. Conclusion
Although televideo cybersex is a rather unique form of interaction, ana-
lyzing the ways communicative functions are distributed among different
modes in these encounters can inform our understanding of multimodal-
ity in general and of the new possibilities for multimodal contact offered
by new communication technologies. While any mode has the potential
to fulfill any communicative function, different functions are realized dif-
ferently in different genres. In face-to-face conversation, verbal messages
are usually the chief carriers of ideational meaning, while the body and
face, along with paralinguistic cues, are more associated with communi-
cating attitude and working to regulate the structure and flow of the inter-
action. In televideo cybersex, the opposite is true: most of the ideational
meaning is delivered through images, and the words serve more of a con-
textualizing and regulating function. One performs with one’s body. One
watches with one’s words. Jointly employing both modes allows users
to simultaneously present themselves as objects and to exert agency as
subjects.
One of the chief reasons for this phenomenon is the absence of the face
as a communicative tool. One might say that in televideo cybersex, while
the image is the body, the text is the face. Like the face in face-to-face in-
teraction, text functions as an emblem of selfhood and agency, lending the
experience a sense of being truly interactive. Thus, to use Waskul’s
(2003) terminology, while participants are ‘embodied’ in the visual images
they broadcast, it is through the words that they type that they are
‘enselfed’.
Another important observation to come from this analysis is how dif-
ferent modes serve not just to elaborate one another, but also to regulate
one another. One of the chief roles of text in these interactions is to allow
participants to manage more precisely their measured and incremental
visual displays. Televideo cybersex is just as much about what one does
not show as what one does. Visual messages come as carved-up body
parts rather than complete persons, and verbal messages are stripped of
the paralinguistic cues of audible talk. It is this ‘semiotic minimalism’
of both visual and verbal modes that helps to make these encounters so
exciting for users by leaving space for them to weave complex fantasies
from a limited set of cues.
In their search for something which one of my informants described as
‘more real than pornography and less real than reality’, participants in
televideo cybersex deploy the verbal mode of communication in different
ways and for different purposes than it is deployed in other kinds of inter-
action, such as text-based computer chat, most face-to-face conversation,
and face-to-face sexual interaction. At the same time, there are also simi-
larities. In nearly all forms of interaction there is an element of bodily
‘display’ (Goffman 1959), and even users of text-based chat often engage
in textual descriptions of their bodies (Jones 2005a, 2005c). Furthermore,
there are numerous contexts of face-to-face interaction, especially in the
workplace, in which verbal messages do function in similar ways to con-
textualize bodily actions and help regulate pace and rhythm. Examples
can be seen in Nevile’s (2004) descriptions of interactions between pilots
in commercial airliners, and in Filliettaz’s (2005) observations of multi-
modal interactions in a factory. Indeed, much more work needs to be
done to understand bodily displays in general and the ways they are man-
aged in various configurations of time and space using discourse. Finally,
this work also invites a closer consideration of the use of verbal messages
in face-to-face erotic encounters within an approach that focuses not just
on the informational function of language but also on its discourse func-
tions, the strategic ways participants use it to negotiate frames, actions,
and identities within the sexual act.
Note
1. ‘An ethnographic study of computer mediated communication among gay men in Hong
Kong’, City University of Hong Kong Small Scale Research Grant #9030988
(http://personal.cityu.edu.hk/~en-cyber/home.htm). An earlier version of this paper was
presented at the Third International Conference on Multimodality, 25–27 May 2006, Pavia,
Italy.
References
Barcan, R. (2002). In the raw: ‘Home-made’ porn and reality genres. Journal of Mundane
Behavior 3 (1 February). URL: ⟨http://www.mundanebehavior.org/issues/v3n1/barcan.htm⟩
[accessed on 3 October 2006].
Barthes, R. (1977). Image–Music–Text. London: Fontana.
Clatts, M. C. (1999). Ethnographic observations of men who have sex with men in public. In
Public Sex/Gay Space, W. Leap (ed.), 141–156. New York: Columbia University Press.
Condon, W. S. (1986). Communication: Rhythm and structure. In Rhythm in Psychological,
Linguistic and Musical Processes, J. R. Evans and M. Clynes (eds.), 55–78. Springfield,
IL: Charles C. Thomas.
Daft, R. L. and Lengel, R. H. (1984). Information richness: A new approach to managerial
behavior and organizational design. In Research in Organizational Behavior, L. L. Cum-
mings and B. M. Staw (eds.), 191–233. Homewood, IL: JAI Press.
Eggins, S. and Slade, D. (1997). Analyzing Casual Conversation. London: Cassell.
Filliettaz, L. (2005). Time, rhythm and multiactivity: Contextualizing teamwork. A paper
presented at the 9th International Pragmatics Conference, 10–15 July, Riva del Garda,
Italy.
Foore, K. A. (2004). Through the looking glass: Constructing sexual identity. Unpublished
M.A. thesis, University of Alaska, Fairbanks.
Goffman, E. (1959). The Presentation of Self in Everyday Life. New York: Anchor Doubleday.
Goffman, E. (1974). Frame Analysis. Cambridge, MA: Harvard University Press.
Goffman, E. (1981). Forms of Talk. Philadelphia: University of Pennsylvania Press.
Goodwin, C. (1994). Professional vision. American Anthropologist 96 (3): 606–633.
Goodwin, C. (2002). Time and action. Current Anthropology 43 (Suppl.): 19–35.
Gumperz, J. (1982). Discourse Strategies. Cambridge: Cambridge University Press.
Halliday, M. A. K. (1994). An Introduction to Functional Grammar, 2nd ed. London:
Edward Arnold.
Humphreys, L. (1970). Tearoom Trade: Impersonal Sex in Public Places. Chicago: Aldine.
Iedema, R. (2001). Resemiotization. Semiotica 137 (1–4): 23–39.
Iedema, R. (2003). Multimodality, resemiotization: Extending the analysis of discourse as
multi-semiotic practice. Visual Communication 2 (1): 29–57.
Jones, R. (2005a). ‘You show me yours, I’ll show you mine’: The negotiation of shifts from
textual to visual modes in computer mediated interaction among gay men. Visual Commu-
nication 4 (1): 69–92.
Jones, R. (2005b). Rhythm and timing in computer mediated communication. A paper
presented at the 9th International Pragmatics Conference, 10–15 July, Riva del Garda,
Italy.
Jones, R. (2005c). Sexual risk and the Internet. A paper presented at the Language and
Global Communication Conference, 7–9 July, Cardiff, Wales.
Jones, R., Yu, K. K., and Candlin, C. N. (2000). A preliminary study of HIV vulnerability
and risk behavior among MSM in Hong Kong. Report to the Council for the AIDS
Trust Fund, Hong Kong. URL: ⟨http://personal.cityu.edu.hk/~enrodney/Research/MSM/MSMindex.html⟩
[accessed on 25 September 2006].
Kendon, A. (2004). Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kress, G. and Van Leeuwen, T. (1996). Reading Images: The Grammar of Visual Design.
London: Routledge.
Kress, G. and Van Leeuwen, T. (2001). Multimodal Discourse: The Modes and Media of
Contemporary Communication. London: Edward Arnold.
Lemke, J. L. (1987). Strategic deployment of speech and action: A sociosemiotic analysis.
In Semiotics: 1983, Proceedings of the Semiotic Society of America ‘Snowbird’ Conference,
J. Evans and J. Deely (eds.), 67–79. New York: University Press of America.
Lemke, J. L. (1998). Multiplying meaning: Visual and verbal semiotics in scientific text.
In Reading Science: Critical and Functional Perspectives on Discourses of Science, J. R.
Martin and R. Veel (eds.), 87–113. London: Routledge.
Lewis, P. H. (1998). Videoconferencing’s killer app may be sex. The New York Times 16
July: sec. G, p. 7, col. 1.
Mantovani, F. (2001). Cyber-attraction: The emergence of computer-mediated communica-
tion in the development of interpersonal relationships. In Say Not to Say: New Perspec-
tives on Miscommunication, L. Anolli, R. Ciceri, and G. Riva (eds.), 229–246. Amsterdam:
IOS Press.
McLellan, H. (1996). Virtual realities. In Handbook of Research for Educational Communi-
cations and Technology, D. H. Jonassen (ed.), 457–487. New York: Macmillan Library
Reference.
Molitor, F., Facer, M., and Ruiz, J. D. (1999). Sex communication and unsafe sexual behav-
ior among young men who have sex with men in California. Archives of Sexual Behavior
28 (4): 335–344.
Nevile, M. (2004). Beyond the Black Box: Talk-in-Interaction in the Airline Cockpit. Alder-
shot: Ashgate.
Norris, S. (2004). Analyzing Multimodal Interaction: A Methodological Framework. London:
Routledge.
Prior, P. and Shipka, J. (2003). Chronotopic lamination: Tracing the contours of literate
activities. In Writing Selves/Writing Societies: Research from Activity Perspectives, C.
Bazerman and D. Russell (eds.), 182–238. Fort Collins, CO: The WAC Clearinghouse
and Mind, Culture, and Activity.
Quina, K., Harlow, L., Morokoff, P. J., Burkenholder, G., and Deiter, P. J. (2000). Sexual
communication in relationships: When words speak louder than actions. Sex Roles:
A Journal of Research April. URL: ⟨http://findarticles.com/p/articles/mi_m2294/is_2000_April/ai_65576710⟩
[accessed on 25 September 2007].
Raudaskoski, P. (1999). The use of communicative resources in language technology envi-
ronments. Unpublished doctoral dissertation, University of Oulu, Oulu.
Reinish, J. M. (1991). The Kinsey Institute New Report on Sex. New York: St. Martin’s
Griffin.
Sarangi, S. and Slembrouck, S. (1996). Language, Bureaucracy and Social Control. London:
Longman.
Short, J., Williams, E., and Christie, B. (1976). The Social Psychology of Telecommunica-
tions. New York: John Wiley & Sons.
Sinclair, J. M. and Coulthard, R. M. (1975). Towards an Analysis of Discourse. Oxford:
Oxford University Press.
Sproull, L. and Kiesler, S. (1986). Reducing social context cues: Electronic mail in organiza-
tional communication. Management Science 32 (11): 1492–1512.
Stanton, L. (2006). Talking Dirty: Learning to Speak the Language of Lust. San Francisco:
Chronicle Books.
Stöckl, H. (2004). In between modes: Language and image in printed media. In Perspectives
on Multimodality, E. Ventola, C. Charles, and M. Kaltenbacher (eds.), 9–30. Amsterdam:
John Benjamins.
Stone, A. R. (1996). The War of Desire and Technology at the Close of the Mechanical Age.
Cambridge, MA: MIT Press.
Walther, J. B. (1996). Computer-mediated communication: Impersonal, interpersonal and
hyperpersonal interaction. Communication Research 23: 3–43.
Walther, J. B. and Parks, M. R. (2002). Cues filtered out, cues filtered in: Computer-
mediated communication and relationships. In Handbook of Interpersonal Communica-
tion, M. L. Knapp and J. A. Daly (eds.), 529–563. Thousand Oaks, CA: Sage.
Waskul, D. D. (2002). The naked self: Being a body in televideo cybersex. Symbolic Interac-
tion 25 (2): 199–227.
Waskul, D. D. (2003). Self-Games and Body-Play: Personhood in On-line Chat and Cyber-
sex. New York: P. Lang.
Rodney H. Jones is Associate Professor in the Department of English and Communication
at City University of Hong Kong. He has published widely in the areas of language and sex-
uality, computer-mediated communication, and mediated discourse analysis. Address for
correspondence: Department of English and Communication, City University of Hong
Kong, Tat Chee Ave., Kowloon Tong, Hong Kong ⟨enrodney@netvigator.com⟩.