Remote Interpreting: A Technical Perspective
Panayotis Mouzourakis
European Parliament Interpretation Directorate (EPID)
This article reviews recent remote interpreting (RI) experiments carried out
at the United Nations and European Union institutions, with emphasis on
their salient technical features, which are also summarized in the Appendix.
The motivations for remote interpreting, the minimum technical requirements
for sound and image transmission in compressed form, and the methods used
in recent experiments for image capture in the meeting room and display in
the remote room are discussed. The impact of technical conditions upon
interpreters’ perception of remote interpreting is also examined using
questionnaire data, which seem to suggest that the interpreters’ visual
perception of the meeting room, as mediated by image displays, is the
determining factor for the “alienation”, or absence of a feeling of presence in
the meeting room, universally experienced by interpreters under RI conditions.
The paper also points out the advantages of a more coherent research
methodology based upon the notion of presence in a virtual environment, as well
as possible innovative approaches to providing the interpreter with meeting-room
views.
Remote interpreting, a boon for some, a bane to others, has given rise to much
heated and emotional debate within the interpreting community. Although not
exactly a new idea, as the first attempts in this direction had already taken place
in the 1970s, namely the “Symphonie Satellite” (Thiery 1976) and the 1978 New
York–Buenos Aires (Chernov 2004: 82–83) tests, remote interpreting has
recently come into the limelight as a potential mode of conference interpreting.
No fewer than eight experiments using this technique have been carried out
since 1999 in major multilingual organizations such as the United Nations and
the European Union institutions.
Since definitions of the term remote interpreting tend to vary (Niska 1998),
in the context of the present article remote interpreting (RI) will be used to refer
to situations in which interpreters are no longer present in the meeting room,
but work from a screen and earphones without a direct view of the meeting
room or the speaker. This is different from videoconferencing, where interpreters
are still physically present in a booth, within the meeting room where most
participants are gathered and other participants intervene remotely via a video
link-up. RI should not be confused with video remote interpreting, a term used,
especially in the United States, to refer to a form of person-to-person video-
conferencing mostly (although not exclusively) used to convey sign language.
Experimentation is, by its very nature, often anarchic, and recent experiments
in RI, using a bewildering variety of equipment and under conditions
often specific to the particular institution at hand, are probably no exception.
However, as the dust begins to settle, it is perhaps time to take stock, establish
what has been learned from these experiments and identify the still unresolved
issues. If RI is ever to become a routine form of interpreting, more systematic
research will be needed to establish technique-specific criteria for technical
standards and interpreter working conditions.
The fascinating ergonomic and cognitive issues raised by the interpreters’
experience under RI conditions, which require extensive research before they
can be resolved, and the consequences of the routine use of RI in multilingual
conferences at the organizational and administrative level, will not be covered
in the present article. Such matters will be touched upon only briefly, and only
to the extent that they are relevant to the technical aspects of remote interpreting.
2. Motivations for RI
In general, there are two kinds of rationale for the introduction of RI in the
United Nations, European Union or other international organizations:
Remote interpreting can be seen as a means of dealing with problems of
interpreter availability and cost. For large multilingual organizations, travel
expenses together with per diems account, on average, for about one third of
the total cost of a free-lance interpreter. In addition, such organizations need
to deal with the logistics of putting together large interpreter teams, often
requiring free-lance interpreters to travel to distant locations, even for brief meetings.
In fact, if these interpreters did not have to travel but could work from their
homes, they might even be able to take advantage of different time zones to
service more than one meeting per working day, an obvious advantage in the
case of those covering “exotic” languages. Furthermore, for organizations such
as the UN, with staff interpreters dispersed among a number of duty stations
(New York, Geneva, Vienna, Nairobi etc.), remote interpreting could provide
for a more efficient use of human resources.
Another explanation has to do with physical building constraints, in par-
ticular a shortage of meeting rooms with enough booths to accommodate the
twenty (and soon more) official EU languages, or even the impossibility of fit-
ting that many booths into a small or medium-sized meeting room, without
compromising interpreters’ visibility. Other constraints might include a reluc-
tance to install booths in historic meeting rooms and security considerations
mandating the physical separation of interpreters from conferring participants.
In these cases, RI would offer the alternative of accommodating the interpret-
ers in different rooms (whether in booths or in custom-built installations), lo-
cated in the same building or at least within the same building complex.
Whatever the motivation, the ability to physically separate interpreters
from the meeting room opens up a host of new possibilities. In principle, RI
could, without the need for long-distance travel, enable interpreters to service
meetings anywhere in the world. In the future, perhaps interpreters could work
from one of a dense network of decentralized “interpreting stations” linking all
the major cities of the globe.
One of the problems that must be resolved before such a vision can be realized
concerns the need to provide the proper technical framework for transmitting
audio and video to the remote interpreters and for sending the interpretation
audio stream back to the meeting room. Satellite links, both analog and
digital, such as those routinely used to transmit TV broadcasts, are used for
this purpose; however, the cost of these satellite connections is quite high. As a
result, much interest has focused on alternative terrestrial links, particularly
ISDN (Integrated Services Digital Network) lines,1 which can be readily
and inexpensively leased from telephone operators. ISDN lines provide
synchronized transmission of digitized audio and video streams in the form of a
number of data “blocks”, each providing a capacity or bandwidth of
64 kilobits/second (kbps),2 equivalent to one digital telephone connection. Using
In all but the simplest cases involving more than one mobile camera, a
director and an image-mixing station are needed to select the views projected
in the remote room. This is a demanding task, requiring a professional with
experience in working with and directing cameramen to capture relevant views
of the meeting room; for any extended period of time, two directors working
half a day each would probably be needed (Louvranges 2001: 10). Although
language skills are an advantage for the director at a multilingual meeting,
an experienced interpreter can act as a liaison with the director to facilitate
the proper flow of speech events. Cameramen need both sufficient experience
and time before they can reliably find their targets and provide smooth
transitions between images. Another complication with image mixing is the
additional delay it introduces in the video signal relative to the audio signal,
which must be properly compensated to preserve sound–image (“lip-sync”)
synchronization, the absence of which can be very unsettling for interpreters.
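Compensating the mixing delay amounts to holding the audio back by the extra latency of the video path. The Python sketch below illustrates this under stated assumptions: the 80 ms latency, the 48 kHz sample rate and the names `delay_in_samples` and `AudioDelayLine` are hypothetical and do not come from any of the experiments described here.

```python
from collections import deque


def delay_in_samples(video_latency_ms: float, audio_latency_ms: float,
                     sample_rate_hz: int) -> int:
    """Extra audio delay (in samples) needed for audio to line up with video.

    If the audio path is already slower than the video path, delaying the
    audio cannot help (the video would have to be delayed instead), so 0 is returned.
    """
    extra_ms = max(0.0, video_latency_ms - audio_latency_ms)
    return round(extra_ms * sample_rate_hz / 1000)


class AudioDelayLine:
    """Fixed delay implemented as a FIFO buffer of audio samples."""

    def __init__(self, delay_samples: int) -> None:
        if delay_samples < 0:
            raise ValueError("delay must be non-negative")
        # Pre-fill with silence so each output lags its input by delay_samples.
        self._buffer = deque([0.0] * delay_samples)

    def process(self, sample: float) -> float:
        self._buffer.append(sample)
        return self._buffer.popleft()


# Hypothetical figures: the mixing chain adds 80 ms to the video, the audio path none.
n = delay_in_samples(video_latency_ms=80, audio_latency_ms=0, sample_rate_hz=48_000)
print(n)  # 3840
```

With a two-sample delay line, feeding the samples 1, 2, 3, 4, 5 yields 0.0, 0.0, 1, 2, 3: every sample emerges two steps late, which is exactly the shift needed to realign it with a correspondingly delayed picture.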
Image capture quality is also sensitive to meeting-room lighting conditions.
Meeting-room design does not generally take image capture into account:
illumination is rarely uniform, and the color temperature of the lighting does
not correspond to daylight conditions (5200 K),8 resulting in an altered color balance.
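One common way to correct such a color cast after capture is a white-patch (diagonal, von Kries-style) correction: measure a surface that should appear neutral and scale each color channel so that it becomes grey. The sketch below illustrates that general technique only; it is not a method reported for the experiments discussed here, and the function names and pixel values are hypothetical. Pixels are treated as 8-bit RGB triplets, as described in note 4.

```python
def white_patch_gains(patch_rgb):
    """Per-channel gains that turn a measured reference patch neutral grey.

    Diagonal (von Kries-style) correction: each channel is scaled
    independently. Green is kept as the reference channel (gain 1.0).
    """
    r, g, b = patch_rgb
    if min(r, g, b) <= 0:
        raise ValueError("patch values must be positive")
    return (g / r, 1.0, g / b)


def apply_gains(pixel, gains):
    """Scale an 8-bit RGB pixel by the gains, rounding and clipping to 0..255."""
    return tuple(min(255, max(0, round(c * k))) for c, k in zip(pixel, gains))


# Hypothetical: a white wall under warm artificial light is captured as reddish.
gains = white_patch_gains((240, 200, 150))
print(apply_gains((240, 200, 150), gains))  # (200, 200, 200)
```

A single global correction of this kind cannot fix illumination that varies across the room, which is one reason non-uniform lighting remains a problem even when the overall balance is corrected.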
Passive projection and plasma screens, cathode ray tube (CRT), liquid
crystal display (LCD) monitors, and combinations thereof have all been used
to display meeting-room images to interpreters in their remote rooms. Plasma
screens have been most favorably judged for their lack of flicker and conse-
quent minimal eye fatigue, as well as their compatibility with normal lighting
in the remote room. Yet they have been judged aggressive when placed too
close to the booth. In the EU Council 2001 remote-interpreting experiment,
plasma screens initially placed at 130 cm had to be subsequently moved to a
distance of 350 cm from the booths (EU Council 2001: 5). In the December
2001 EPID remote experiment, where plasma screens were placed at a distance
of 150 cm from the booth and could not be moved, interpreters complained of
excessive glare (EPID 2001b: 7). In general, there has been no comprehensive
evaluation of the pros and cons of different kinds of display, particularly of the
optimum distance of screens and monitors from the booth.
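Part of what changes when a screen is moved is simply the angle it subtends at the interpreter’s eye. The sketch below computes that angle for the distances quoted above, assuming a 42-inch, 16:9 panel viewed head-on; 42 inches is the plasma size reported for the UN New York 2001 test (Appendix, entry f), while the size of the EU Council and EPID screens is not given here. Function names are illustrative.

```python
import math


def screen_width_cm(diagonal_inches: float, aspect_w: int = 16, aspect_h: int = 9) -> float:
    """Width of a flat screen from its diagonal size and aspect ratio."""
    diagonal_cm = diagonal_inches * 2.54
    return diagonal_cm * aspect_w / math.hypot(aspect_w, aspect_h)


def visual_angle_deg(width_cm: float, distance_cm: float) -> float:
    """Horizontal angle subtended by the screen, viewed head-on from its centre line."""
    return math.degrees(2 * math.atan(width_cm / (2 * distance_cm)))


width = screen_width_cm(42)  # assumed 42-inch 16:9 panel, about 93 cm wide
for distance in (130, 150, 350):  # distances in cm mentioned in the text
    print(f"{distance} cm: {visual_angle_deg(width, distance):.0f} degrees")
```

Under this assumption, moving the screen from 130 cm to 350 cm cuts the subtended angle from roughly 39 to 15 degrees. Whether that is what made the greater distance more comfortable is not established by the data reported here.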
In principle, relative to what can be achieved using standard definition
(SD), the quality of both image capture and projection should benefit from
the use of high definition (HD)9 video equipment, as tested during the EPID
2004 remote experiment. In practice, however, the HD “silver bullet” failed to
materialize: individually adjusting each projector for HD to overcome persistent
color balance problems consumed the first two weeks of what should have been
a 3-week test (Barco 2004: 15).
In any case, since HD transmission would require more bandwidth than SD,
the use of HD for anything other than direct room-to-room connections is
highly problematic.
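The bandwidth penalty of HD can be gauged from the pixel counts given in note 9. The sketch below computes uncompressed bitrates assuming 24 bits per pixel (note 4) and 30 frames per second, and ignoring the chroma subsampling and compression used by real video systems; it only shows how much more raw data a codec would have to reduce. The 384 kbps figure is the video rate used in several of the experiments listed in the Appendix.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed video bitrate: pixels per frame x bits per pixel x frames per second."""
    return width * height * bits_per_pixel * fps


FPS = 30  # assumed frame rate
sd = raw_bitrate_bps(720, 480, FPS)    # "standard" digital TV (note 9)
hd = raw_bitrate_bps(1920, 1080, FPS)  # high definition (note 9)

print(f"SD: {sd / 1e6:.0f} Mbps, HD: {hd / 1e6:.0f} Mbps, ratio {hd / sd:.1f}")
# Compression needed to squeeze raw SD into a 384 kbps video stream:
print(f"SD at 384 kbps: about {sd / 384_000:.0f}:1")
```

HD carries six times as many pixels per frame as SD, so at equal compression quality it needs correspondingly more link capacity, which is why it is practical mainly over direct room-to-room connections.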
5. Meeting-room views
While it is beyond the scope of the present article to provide a full account
of the cognitive aspects of remote interpreting, it might be instructive to
explore possible connections between individual features of the remote
interpreting environment, such as the speaker and meeting-room views, on the
one hand, and the interpreters’ subjective judgment of key psychological
parameters, including their feelings of participation or of alienation, on the
other. The analysis is based on questionnaire data provided by interpreters
who, after a meeting, graded the physical and psychological parameters on a
scale of −5 to +5, with 0 corresponding to the conditions of normal, “live”
interpreting.
Intriguing results emerged from the analysis of the EPID January 2001
questionnaire data of participating interpreters who had been divided into two
groups: those who consistently wear glasses and those who use only reading
glasses or no glasses at all. The feeling of participation reported by those who
consistently wear glasses was found to be much lower (−3.98) than that of the
second group (−2.89). According to a two-tail t-test10 the probability that this
difference could be due to chance was only 4% (EPID 2001a: 30). This would
indicate that interpreters who consistently wear glasses are at a disadvantage
under RI conditions, and suggests a connection between visual perception and
interpreters’ feelings of participation.
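To make the comparison in note 10 concrete, the sketch below computes a two-sample t statistic with Python’s standard library. The source does not say which variant of the t-test was used; this sketch assumes Welch’s unequal-variance version, and the grades are invented for illustration, not the EPID data. The two-tailed p-value would then follow from the t distribution, for example as `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of the mean of a
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical "feeling of participation" grades on the -5..+5 scale
glasses = [-4, -5, -3, -4]
no_glasses = [-3, -2, -3, -3]
t, df = welch_t(glasses, no_glasses)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -2.61, df = 5.0
```

Being two-tailed, the test treats a large difference in either direction as evidence against chance, matching the description in note 10.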
A correlation analysis11 of questionnaire data from the EPID January 2001
remote experiment also showed a correlation coefficient between alienation
and the speaker view of +0.01 (essentially no correlation), while that between
alienation and meeting-room view was +0.37 (−0.19 and +0.46, respectively,
for the group of interpreters not permanently wearing glasses). Although
strictly speaking not statistically significant, this result appears to suggest that
alienation, while unrelated to the quality of the speaker view, is influenced by
the more “global” meeting-room view. The same conclusion emerges from a
meta-analysis of questionnaire data from a number of recent RI experiments,
showing that interpreter alienation/lack of motivation in RI is linked to the
view of the meeting room rather than to that of the speaker (Moser-Mercer
2005: 733).
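The coefficients quoted above are presumably Pearson’s r, although note 11 does not name the variant. The sketch below computes Pearson’s r from two lists of grades; the function name and the data are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))  # -1.0
print(pearson_r([1, 2, 3], [1, 3, 1]))  # 0.0, although y clearly depends on x
```

The last case shows why a coefficient near 0, such as the +0.01 reported for the speaker view, should be read as the absence of a linear relationship rather than as proof of complete independence.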
Another result of the EPID January 2001 remote interpreting experi-
ment concerns the correlation between sound-image synchronization and
the speaker and meeting-room views. The respective correlation coefficients
were found to be +0.30 for the speaker view versus +0.04 for the meeting-room
view (+0.51 vs. +0.13 for interpreters not permanently wearing glasses).
These results could indicate a connection between sound and image
perception, i.e. a multimodality of perception under remote interpreting
conditions. Such “parallel” processing of simultaneous complementary audible
Perhaps the single most remarkable feature of the RI experiments to date has
been a lack of ambition in design philosophy. Existing meeting rooms were
“converted” into remote rooms for RI purposes by simply installing screens or
other display devices; the configuration of interpreting booths in these rooms
did not undergo any major modification relative to their normal use for “live”
interpretation. However, since the general layout of these rooms and the
positioning of interpreting booths often severely limit the possibilities for installing
display devices, this approach is rather problematic. While cost considerations
have also favored this approach to RI, these were less important than the im-
plicit, diehard assumption that RI is just another variant of normal interpret-
ing, or put more crudely, that it makes no fundamental difference to interpret-
ers if they are looking at a real meeting room or at a screen.
While one might have expected a thorough study of “live” interpretation in a
given meeting room, aimed at identifying potential interpreting problems, to be
carried out prior to RI experimentation in that setting, this has not been the
case to date. The only instance where such a study was actually performed was
the EPID 2004 remote experiment, but even there, due to tight time constraints,
it was impossible to alter the experiment’s technical setup significantly on the
basis of the ergonomic study data. Nevertheless, the final report for this
experiment was the first ever to offer TSI (Technology Supported Interpretation),
a concrete concept encompassing both “live” and remote interpreting booths
(Mertens & Hoffman 2005: 134–161).
According to the TSI concept, normal interpreting with full access to visu-
al information from the meeting room and remote interpreting are but the two
end-points of a continuum that also includes a wide spectrum of intermediate
situations where physical restrictions limit the interpreter’s ability to see all
the participants from the interpreting booth properly, e.g. in large meeting
halls. Accessing the missing visual information for the interpreter might call
for varying kinds of assistance, ranging from the relay of voting results, to a
slide presentation or an improved speaker view or even providing a full re-
placement view of the entire meeting room under RI conditions. The authors
of the 2004 EP remote interpreting report claim that such access can best be
achieved through the use of individual, ergonomic, computerized workstations
connected to the Internet, installed in an office environment or even within
existing booths, and integrating the functions of interpreting console and
visual display. These workstations would incorporate a flat 19’’ screen at
approximately 90 cm from the interpreters’ eyes, and allow the interpreter to select one
or more windows with views of the speaker and podium, as well as partial and
panoramic views of the meeting room.
Since the TSI concept has not yet been tested, some of its claims should
probably be taken with a healthy dose of skepticism. For instance, it is not
particularly likely that a 19’’ screen will be capable of providing sufficient detail
for a panoramic view of the meeting room; nor would interpreters be expected
to relish working from a display placed at a distance of only 90 cm (in most
RI experiments interpreters have insisted that screens be placed as far as pos-
sible from the booth). It is also unclear whether the TSI concept, in its present
form, would be compatible with existing booths, especially since the minimum
dimensions for the interpreters’ offices or booths recommended by Mertens &
Hoffman would represent an increase of at least 20% over present booth sizes
(Mertens & Hoffman 2005: 140). Nevertheless, the TSI concept has at least
provided a starting point for future efforts toward defining new standards for booths
intended for remote interpreting and/or limited-visibility situations.
While it would be rather premature to predict the precise direction of future
remote interpreting experiments, one can already identify two broad, interlinked
areas which could prove fruitful for future research: a) the analysis of the causes
of interpreter alienation under RI conditions and b) the exploration of alternative,
more effective methods for visualizing the meeting room in RI conditions.
It is unlikely that much progress in RI will be achieved without a more
systematic understanding of the ergonomic and cognitive issues involved and,
in particular, interpreter alienation or the absence of a feeling of participation
Notes
* Disclaimer: The opinions expressed in the present article are purely those of the author
and do not reflect the point of view of the European Parliament Interpretation Directorate
or of any other European Parliament body.
1. ISDN telephone lines provide an alternative to what is known as POTS (“plain old telephone
service”): unlike normal telephone lines restricted to carrying voice information only
(in analog form) they can carry voice or data plus connection control information in digital
form (i.e. as strings of 0’s and 1’s).
2. The speed at which data can be transmitted over a digital line is usually expressed in kbps,
i.e. thousands of bits of information per second, each bit (“binary digit”) having a value of
either 0 or 1. This speed is also (rather improperly) commonly referred to as the “bandwidth”
of the line in question.
4. A pixel (short for “picture element”) is the elementary unit out of which images in digital
form are composed; a pixel is typically represented by a triplet of 8-bit values (ranging from
0 to 255), one for each of the three primary colors: red, green and blue. The number of pixels
in an image is referred to as the resolution of that image; a typical resolution for computer
monitor images would be 1024 by 768 pixels.
5. Sound signals are characterized by their frequency, i.e. the number of times the sound
wave pattern repeats itself per second. 1 Hertz (Hz) corresponds to one such repetition per
second; one kHz is a thousand Hertz. The human ear is sensitive to frequencies ranging
roughly from 20 Hz to 20 kHz. ISO standards for simultaneous interpreting require the
faithful transmission of all speech frequencies between 125 Hz and 12.5 kHz (AIIC 2000).
6. A standard 64-kbps block (see note 3) is used in telephony to carry the human voice,
retaining frequencies from about 300 Hz to 3.4 kHz (a band roughly 3.1 kHz wide). The G.722
standard, part of the H.320 suite, compresses sound signals by a factor of roughly 2, so that
a 64 kbps block can carry frequencies from about 50 Hz to 7 kHz.
7. MP3 (more properly MPEG-1 Audio Layer III) is a popular standard for compressed sound files,
providing nearly CD quality at roughly one tenth the CD file size.
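The figures in notes 5 to 7 follow from simple arithmetic: a digitized signal can only represent frequencies up to half its sampling rate (the Nyquist limit), and an uncompressed bitrate is the sampling rate times the bits per sample times the number of channels. The sketch below applies this to G.711 telephony (8,000 samples/s at 8 bits), G.722 (16,000 samples/s carried in 64 kbps) and CD audio (44,100 samples/s, 16 bits, stereo); the helper names are illustrative. The usable band always stops somewhat below the Nyquist limit because of filtering, hence about 3.4 kHz rather than 4 kHz, and 7 kHz rather than 8 kHz.

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed (PCM) bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# G.711 telephony: exactly one 64 kbps block, band limited to below 4 kHz
print(pcm_bitrate_bps(8_000, 8), nyquist_limit_hz(8_000))  # 64000 4000.0

# G.722: twice the sampling rate in the same 64 kbps, i.e. 4 bits per sample on average
print(64_000 / 16_000, nyquist_limit_hz(16_000))  # 4.0 8000.0

# CD audio, and the "roughly one tenth" that MP3 aims for (note 7)
cd = pcm_bitrate_bps(44_100, 16, channels=2)
print(cd, cd / 10)  # 1411200 141120.0
```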
9. High definition (HD) refers to a standard for TV and video, providing considerably more
resolution (up to 1920 by 1080 pixels) than “standard” digital (SD) TV (720 by 480 pixels).
10. The t-test is a standard statistical test for determining whether a difference observed in
the values of a variable between two subsets of a population is statistically significant or not.
A two-tail t-test makes no presumption as to which subset is expected to return a higher
value for the variable.
11. In statistics, two variables are said to be correlated if they follow a similar trend, and
anticorrelated if they follow opposite trends. The correlation coefficient r between two variables
is defined in such a way as to have a value of +1 for perfect correlation and −1 for perfect
anticorrelation; a value of 0 indicates that there is no linear relationship between them.
A significant correlation is deemed to exist if r > 0.5 (or r < −0.5).
References
AIIC (2000). Code for the use of new technologies in conference interpreting. Communicate!
March–April 2000. http://www.aiic.net/ViewPage.cfm/page120.htm (accessed 18
December 2005).
Barco N. V. (2004). Technical report lot 2. Remote interpreting test, European Parliament
Brussels, December 2004. Unpublished.
Braun, S. (2004). Kommunikation unter widrigen Umständen? Tübingen: Gunter Narr.
Chernov, G. V. (2004). Inference and anticipation in simultaneous interpreting. Amsterdam/
Philadelphia: John Benjamins.
EPID (2001a). Report on a remote interpreting test at the European Parliament. http://www.
europarl.eu.int/interp/remote_interpreting/ep_report1.pdf (accessed 18 December
2005).
EPID (2001b). Report on the second remote interpreting test at the European Parliament.
http://www.europarl.eu.int/interp/remote_interpreting/ep_report2.pdf (accessed 18
December 2005).
EPID (2004). Study concerning the constraints arising from remote interpreting. Special provi-
sions and specifications. http://www.europarl.eu.int/interp/online/english/techno/foru-
men.pdf (accessed 18 December 2005).
EU Council (2001). Rapport sur un test de téléinterprétation effectué au Secrétariat Général
du Conseil. http://www.europarl.eu.int/interp/remote_interpreting/sg_conseil_avril2001.pdf
(accessed 18 December 2005).
Esteban-Causo, J. (1999). Rapport de mission: Expérience de téléinterprétation ONU. Unpub-
lished SCIC internal note.
ETSI (1993). Study of ISDN videotelephony for conference interpreters. Unpublished report.
Foote, J. & Kimber, D. (2000). FlyCam: Practical Panoramic Video. In Proceedings of IEEE
International Conference on Multimedia and Expo, vol. III, pp. 1419–1422. http://www.
fxpal.com/publications/FXPAL-PR-00-090.pdf (accessed 18 December 2005).
ITU (2001). Remote interpretation: status report. Unpublished report submitted to IAMLADP.
Kalawsky, R. S. (2000). The validity of presence as a reliable human performance metric in
immersive environments. 3rd International Workshop on Presence, Delft, Netherlands.
http://www.presence-research.org/Kalawsky.pdf (accessed 18 December 2005).
Author’s address
Panayotis Mouzourakis
Interpretation Directorate
European Parliament
Rue Wiertz
B-1047 Brussels
Belgium
E-mail: PMouzourakis@europarl.eu.int
overcome by using non-standard MP3 encoding (20 kHz sound); image capture in Geneva
using 3 cameras filming from different angles; projection in Vienna by 2000-lumen
projectors on large double screens, 4 by 4 m each, at 15–20 m from the booths; 384 kbps
per screen. The left-hand screen shows a static view of the meeting room with a small
close-up of the president in a corner; the right-hand screen shows the speaker. Conclusions
(UN 1999: 25–27):
The experiment was a technical success, albeit a qualified one. Saying that it was a
successful experiment should not be interpreted as meaning that remote interpretation
on a large scale is a viable and cost-effective alternative to on-site interpretation
… However, it is doubtful that RI [remote interpretation] will ever become standard
practice for interpreters who were trained to work on-site without some adjustments
to their working conditions. Indeed, this experiment seems to indicate that there are
components in simultaneous interpretation which do not lend themselves to technological
solutions …
e. Ecole de Traduction et de l’Interprétation / International Telecommunication Union
(ETI-ITU) test in Geneva, April 1999: the first experiment to investigate the interpreter’s
psychological/physiological response to remote vs. normal interpreting, with six interpreters
alternating between a normal and a remote booth. Technical setup similar to the UN
experiment above: image capture by three orientable cameras, one for the president and two
for the global view plus speaker. MP3-quality sound; 384 kbps for the image projected on a
monitor, showing a global view of the meeting room on which a small portrait of the
speaker or president is superimposed. Conclusions (ITU 1999: 19):
The first controlled experiment to evaluate human factors and technical arrangements
in remote interpreting has demonstrated that for the same group of interpreters work-
ing live in a conference room is psychologically less stressful, less tiring and conducive
to better performance.
f. UN New York test, April 2001: Two weeks of meetings in a conference room in N.Y. are
interpreted by a team located in a different conference room, to which audio and video
are provided by a combined ISDN plus satellite (4.85 MHz) link. In the meeting room,
3 cameras are used, one facing participants and capturing speaker close-ups, one facing
the podium to provide an image of the chairperson, and one providing a general view
of the meeting room. There are 3 cameramen and a director. For each booth, images are
displayed by one 42-inch plasma screen (showing the speaker at 512 kbps), 14 feet from
the booth, plus one 25-inch monitor (alternately showing the general room view or the
podium at 384 kbps), 11 feet from the booth. The experiment concluded that the minimum
requirements for remote interpreting were (UN 2001: 15):
14 kHz sound (requiring 128 kbps) for sending floor sound to the booths and 10 kHz
sound (at 64 kbps) for sending interpretation back to the floor;
512 kbps for the image of the speaker plus 384 kbps for the floor/podium image.
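To see what these minimum requirements imply for an ISDN link, one can count 64 kbps blocks. The sketch below assumes that each stream is carried in whole blocks and counts a single interpretation return channel; the text does not say how many return channels were needed, and each additional one would presumably require its own block. It also uses the fact that an ISDN Basic Rate Interface line bundles two 64 kbps channels. The names are illustrative.

```python
import math

BLOCK_KBPS = 64  # one ISDN "block" (B channel)


def blocks_needed(kbps: int) -> int:
    """Whole 64 kbps blocks needed to carry a stream of the given rate."""
    return math.ceil(kbps / BLOCK_KBPS)


# Minimum requirements reported for the UN New York test, April 2001
streams_kbps = {
    "floor sound to booths (14 kHz)": 128,
    "interpretation back to floor (10 kHz)": 64,  # one return channel assumed
    "speaker image": 512,
    "floor/podium image": 384,
}

total_blocks = sum(blocks_needed(k) for k in streams_kbps.values())
bri_lines = math.ceil(total_blocks / 2)  # two B channels per Basic Rate line
print(total_blocks, total_blocks * BLOCK_KBPS, bri_lines)  # 17 1088 9
```

Under these assumptions the requirements come to roughly 1.1 Mbps, far more than a single telephone connection and a useful yardstick when comparing ISDN with satellite capacity.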
g. SCIC test in Brussels, January 2000: sound and image transmission by direct cabling.
Image capture by 5–6 cameras (two fixed and 3–4 mobile or one fixed and 5 mobile),
one mixing station, a director, three cameramen, all quadrilingual. A number of different
configurations were used for image display: 16:9 monitors, projection on big screens,
plasma screens, under both natural and reduced light conditions. Interpreters stressed
the added fatigue due to artificial lighting and were not satisfied with the speed of reaction
and target choice of the cameras; they also stressed the alienation of the interpreter from
the meeting room under such conditions (SCIC 2001: 3):
The lack of an overall view of the room and of the actual course of the meeting leads
to the loss of information that is essential to interpretation and belongs to non-verbal
communication…
… All body language is missed. It becomes impossible to follow the delegates’ reactions
and interactions, to identify the next speaker, to anticipate the language he or she will
speak, to notice possible technical problems (microphone, poor sound …) and to
suggest remedies to the delegates.
h. EPID test in Brussels, January 2001: One week of parliamentary meetings; sound and
image transmission by direct cabling. The meeting room was covered by five cameras,
one fixed camera showing the podium, one fixed camera providing a global view of the
meeting room and three mobile cameras on tripods (3 full or partial left-right views of the
meeting room plus 2 cameras for speaker views), operated by 5 cameramen and a director.
In the remote room, images were projected by a pair of 3000-lumen projectors on
each of 3 large screens (placed in such a way as to be visible from at least one of the 11
interpreting booths) plus 11 monitors, one in front of each booth. The left-hand side of
the double screens was used to project a mosaic of four images which could be used by
the interpreter to select the image appearing on the monitor in front of each booth. The
right-hand screen usually showed a close-up of the speaker, as chosen by the director.
Interpreters had the possibility (through a rather unwieldy device) of choosing one of
the four mosaic images to be displayed on the monitor in front of the booth. Conclusions
(EPID 2001a: 5):
The technical set-up tested in the course of this experiment did not provide interpreters
with an adequate and coherent view of the meeting room. This led to a significant
loss of visual information, which was not compensated by the arrangement intro-
duced to allow individual booths to choose between multiple camera views …
… Remote interpreting resulted generally in a significant and cumulative increase in
fatigue and physical discomfort for the interpreters, who reported a marked deterio-
ration in their ability to concentrate and their motivation to work in such a setting, as
well as a significant feeling of alienation.
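The four-image mosaic used in this test is a quad split: each camera source is scaled to half the screen’s width and height. The sketch below lays out such a mosaic and maps a point on it back to a source, which is one hypothetical way a booth could choose the image for its own monitor. The actual selection device is described above only as “rather unwieldy”; the 1024 by 768 resolution (the typical figure from note 4) and all names here are assumptions.

```python
from typing import List, NamedTuple


class Tile(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def quad_mosaic(screen_w: int, screen_h: int) -> List[Tile]:
    """2x2 mosaic tiles in reading order: top-left, top-right, bottom-left, bottom-right."""
    w, h = screen_w // 2, screen_h // 2
    return [Tile(col * w, row * h, w, h) for row in range(2) for col in range(2)]


def tile_at(tiles: List[Tile], px: int, py: int) -> int:
    """Index of the tile (camera source) containing the point (px, py)."""
    for index, t in enumerate(tiles):
        if t.x <= px < t.x + t.width and t.y <= py < t.y + t.height:
            return index
    raise ValueError("point lies outside the mosaic")


tiles = quad_mosaic(1024, 768)  # assumed display resolution
print(tile_at(tiles, 600, 100))  # 1: the top-right source goes to the booth monitor
```

Each source in such a mosaic gets only a quarter of the screen area, which helps explain why the mosaic could serve for selection but not as a substitute for a full meeting-room view.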
i. European Union (EU) Council test in Brussels, April 2001: sound and image transmis-
sion by direct cabling. 7 cameras used: 4 mobile cameras for the speakers, 3 fixed cam-
eras for the chair, for a (wide-angle) global view of the meeting room plus a view of
Problems: Joining the two panoramic (fairly wide-angle) shots creates an optical distortion.
The alternative would be two different meeting room perspectives that would
not exactly match. Since the projector software was unable to provide the normal color
balance in HD mode, the experiment was run in standard definition mode for the first
two weeks; HD was achieved only during the last week and still fell short of the test
specifications.