Audio Engineering Society

Conference Paper
Presented at the Conference on
Audio for Virtual and Augmented Reality
2016 September 30–October 1, Los Angeles, CA, USA
This conference paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at
least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This conference paper has been
reproduced from the author’s advance manuscript without editing, corrections, or consideration by the Review Board. The
AES takes no responsibility for the contents. This paper is available in the AES E-Library (http://www.aes.org/e-lib), all rights
reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the
Audio Engineering Society.

Crafting cinematic high end VR audio for Etihad Airways


Ola Björling¹, Eric Thorsell²

¹ Global Director of VR, MediaMonks
² Senior Sound Designer, MediaMonks

Correspondence should be addressed to Ola Björling (ola@mediamonks.com)

ABSTRACT
MediaMonks were approached by Etihad Airways via their ad agency The Barbarian Group to create a Virtual Reality experience taking place aboard their Airbus A380, the world's largest and most luxurious non-private airplane. Challenges included capturing audio, including dialogue, aboard the real plane; crafting an experience that encourages repeated viewing; and combining a sense of truthful realism with a sense of dream-like luxury without relying on a musical score, all in a head-tracked spatialized mix. Artistic conventions around non-diegetic sound and their psychological impact in VR also required consideration.

1. Introduction

With the objective of creating an immersive experience of a virtual flight with the amazing Etihad A380 while conveying a narrative structure and encouraging a second and third view, audio played a key role in this production. The task was also to do all this in the form of cinematic VR with the appearance of a major Hollywood feature film. We were striving for a realistic and authentic sound while realizing that pure realism isn't always in the user's best interest. Creating a distinct feeling of smooth, effortless luxury was very important from the start, and to achieve that we would need some audio-enhanced storytelling. There was to be music only for the titles and credits, so the audio in itself would have to carry enough emotional power to create this feeling.

Figure 1. Grounded Etihad A380 plane.

2. Choice of audio engine

We tested several different methods for creating and playing back spatial audio before deciding on the toolset for this project. The production scope involved MediaMonks producing a proprietary app, which allowed for some freedom in choosing a suitable audio engine, but at the same time there had to be a mix that could live in channels like YouTube and Facebook 360. The choice of playback sound engine naturally has an impact on mixing and potentially even recording, so it was crucial that this research was carried out in full before production began.

We looked at three major functional principles for the audio engine:

• Object-based systems in the form of game engines offered excellent control over spatial audio but proved somewhat difficult to use for linear sound design. It would have involved building and animating objects for every sound source in the scene, and with rapidly changing picture edits this would have become too time consuming for this project. Intuitive timeline scrolling and linear automation editing to picture were also very important to us, as we were approaching this project from a cinematic feature film perspective.

• Multi-channel techniques like quad binaural did offer very detailed control over the entire soundscape in each of the four directions, allowing us for instance to slightly attenuate certain sounds outside the field of view, thus letting the viewer concentrate on the action or dialogue in the chosen viewing direction. We had previously done several projects this way, but the inferior spatial precision and the lack of elevation tracking were major drawbacks of this method.

• Motion-tracked ambisonics in general looked very promising and offered many of the things we wanted. Plugins for DAWs were available from several developers, with a great deal of control over spatial positioning. However, it proved difficult to include non-diegetic material such as music in this format, since it would follow head tracking. For us this was a really important aspect, as a score that follows movement will almost certainly leave the viewer wondering where in the scene the music is located, or at least break the illusion that we're actually on a plane.

Ultimately the verdict was that the 3Dception Spatial Workstation (since acquired by Facebook) from Two Big Ears best suited our needs. It is a proprietary format providing an end-to-end solution featuring many benefits from the ambisonics realm while also including a static stereo stream for non-diegetic material. The AAX format of their plugin suite offers a set of HRTF panners in mono and multichannel format plus ambisonic B-format decoders. This meant we could stay with Pro Tools for sound design, thereby utilizing many of the tools and techniques we have acquired through the years.

3Dception also allows for dynamically enhancing the mix in the viewing direction - a "Focus" effect which could help the viewer navigate through busy scenes, similar to the trick allowed by quad binaural mixes but with greater directional precision.
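To make the head-tracking issue with non-diegetic material concrete, the sketch below shows how a first-order B-format signal (W, X, Y, Z) is typically counter-rotated against head yaw during playback: anything encoded into the ambisonic bed, including a score, is world-locked and appears to move as the head turns, which is exactly why the separate static stereo stream mattered to us. This is a generic illustration of the principle, not code from the 3Dception toolchain, and the sign convention is an assumption that varies between engines.

import numpy as np

def rotate_bformat_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate a first-order B-format frame against head yaw.

    Illustrative only: axis and sign conventions vary between playback
    engines, so treat this as a sketch of the principle rather than a
    reference implementation.
    """
    c, s = np.cos(head_yaw_rad), np.sin(head_yaw_rad)
    # W (omnidirectional) and Z (vertical) are unaffected by yaw.
    x_rot = c * x + s * y
    y_rot = -s * x + c * y
    return w, x_rot, y_rot, z

# A source encoded hard left stays world-locked: after a 90 degree head
# turn to the left it renders in front of the listener. A score encoded
# into the same bed would "move" identically, which is what breaks the
# illusion for non-diegetic music.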

Figure 2. 3Dception work environment.

3. Location recording

We shot most of the experience in an Etihad A380 aircraft but also had access to a training facility in Abu Dhabi faithfully replicating the cabin environments. We quickly realized that it would not be enough to use an ambisonic microphone capturing a realistic sonic representation of the scene and that we would have to augment such a recording with classical film-style methods. Throughout this project we kept considering how to strike the balance between strict realism and an enhanced cinematic experience in a way that enhances rather than detracts from the unique form of immersion and presence that VR can provide.

We outlined a plan to capture as many individual close-up sounds as possible along with an ambisonic recording at camera position, and to that end we used traditional methods like booms and lapels but also small field recording devices hidden in the props.

4. Sound design – ambiences

We recorded authentic ambiences in the various sections of the A380 during real flights to use as a base for the sound design and as an anchor to reality.

Using the stereo HRTF panners in 3Dception we placed layers of these ambiences quite close to the listener in a quad pattern, which worked really well because of the static nature of the sound in itself. We augmented this foundation with subtle high-frequency wind noises in mono that we carefully placed at the nearest window with the HRTF panners. This helped to establish a fixed point in the cabin that would rotate nicely with head movements.

We discovered that the spatialization process as applied by this sound engine affected these sounds slightly differently at various frequencies, so we tweaked them extensively to match spatial location while maintaining an authentic sense of airiness and voluminosity.

Then we created the muffled noises from the engines and placed them at the correct positions; the A380 has a wingspan of around 80 meters (1), so some of these sounds were placed 20-40 meters from the listening position, also in mono but spread with a slight touch of algorithmic reverb in stereo. This worked surprisingly well to create the sense of wideness that you experience in reality during a flight but which proved very hard to capture in a conventional recording.
We also tried to position faint air conditioning noises elevated by the HRTF panners, but it proved difficult to sense the spatial location of these sounds, so we ended up losing them to avoid overloading the ambience with noise.

Added to this soundscape was of course the entire human component, the faint background walla (crowd murmur). We paid a lot of attention to getting the right and slightly anechoic cabin sound, due to the large amount of high-frequency absorption provided mainly by the seats and the carpets. We used the CoreSound TetraMic to make ambisonic recordings from the set for this, but we also used some dry studio recordings to be able to control the intensity of the voices at different locations. We placed various groups of voices at different positions to create a dynamic background that would change with head movement, as opposed to an evenly distributed wall of walla.

Figure 4. CoreSound TetraMic.

During the pre-production process it was decided that no musical score would be used in the cabin scenes, as we had previously experienced how in VR it could serve to distract the user rather than act as the emotional guide it's intended to be in traditional filmmaking. To make up for the emotional content of music we worked very hard to introduce a harmonic, almost musical sense to the ambiences.

We started out by filtering authentic cabin and engine sounds with the highly resonant analog filters in a modular synthesizer. A couple of passes with different tunings gave us the opportunity to play "chords" with airplane sounds. It's important to note that no synthesis was used, just analog filtering and waveshaping. But as it turned out, there was no way to strike the perfect balance between sound quality and harmony: with high resonances it started to sound almost sinusoidal, like a simple chord played on a synthesizer, and with lower resonances it turned into an eerie windy-tunnel effect.

The answer was to fill the harmonic gaps from the peak-resonant analog filters with a large number (15-20) of digital filters that could be played polyphonically, which allowed us to create the wash of tuned harmonic ambience we had imagined in the first place. We placed different parts of these sounds at different spatial positions using HRTF, thereby creating an immersive harmonic foundation. This ambience was balanced against the realistic sounds to create the different signature cabin sounds that run through the entire experience.
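A minimal sketch of the tuned filter-bank idea is shown below: a broadband cabin-style recording is run through a small bank of resonant band-pass filters tuned to chord tones and the outputs are summed. It only illustrates the concept and assumes scipy; the filter count, Q and the example chord are arbitrary, and the actual sounds were built with a modular synthesizer followed by a larger set of digital filters.

import numpy as np
from scipy.signal import iirpeak, lfilter

def tuned_filter_bank(source, chord_hz, sample_rate=48000, q=60.0):
    """Sum of resonant band-pass filters tuned to chord frequencies.

    Illustrative sketch: `source` stands in for a cabin/engine
    recording, `chord_hz` for the tones of the desired chord; the Q and
    gain values are assumptions, not values from the production.
    """
    out = np.zeros_like(source, dtype=float)
    for f0 in chord_hz:
        b, a = iirpeak(f0, Q=q, fs=sample_rate)  # narrow resonant peak
        out += lfilter(b, a, source)
    return out / len(chord_hz)

# Example: an A-major-ish wash pulled out of broadband "cabin noise".
fs = 48000
noise = np.random.randn(fs * 2)  # 2 seconds of stand-in cabin noise
wash = tuned_filter_bank(noise, chord_hz=[110.0, 220.0, 277.2, 330.0, 440.0],
                         sample_rate=fs)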
As observed during our recording of the plane in flight, a lot of the ambience is spatially perceived to originate far outside the fuselage. Extending these ambiences to that distance away from the viewer made the ambience much more believable, despite being texturally quite different from the reference recordings.

5. Dialogue and voices

No production is without surprises, and a major one for sound was the discovery that a grounded Airbus A380 cannot be powered up without running the rather powerful air conditioning at high power. Due to the noise this created, it proved to be somewhat of a challenge to capture the close-up dialogue we were after. Since the camera solution did not shoot omnidirectionally, we actually had the opportunity to occasionally use a boom and shotgun, something that's normally out of the question for cinematic VR. We also used carefully placed lapels, and after reviewing the sounds we ended up using a mix of location dialogue and ADR. We went to great efforts to match the sound of the ADR with the location dialogue using cabin impulse responses captured on the plane, and we put every voice through an individual HRTF panner to match the exact position on screen. It also proved quite challenging to match the sound of the last scene, shot in a studio, with those shot in the actual cabin because of the higher ceiling in the studio.

The cabin emulations were applied in mono as inserts on every channel of dialogue in order to match the sound characteristics whilst retaining accurate spatial positioning. With the very short decay times involved, this had almost more of an EQ effect than a room sound per se. We then used the 3Dception reflection engine to add some very subtle early reflections in order to make all of the dialogue sit together in the same space.
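A minimal sketch of the impulse-response step is shown below: a dry ADR line is convolved with a mono cabin IR and blended back against the dry signal, standing in for what would normally be a plugin insert tuned by ear. The wet/dry value and loudness matching are illustrative assumptions, not settings from the production, and scipy is assumed to be available.

import numpy as np
from scipy.signal import fftconvolve

def cabin_insert(dry_voice, cabin_ir, wet=0.3):
    """Convolve a dry ADR line with a mono cabin impulse response.

    `wet` is an illustrative wet/dry balance, not a value from the mix;
    with the very short cabin decay this acts almost like an EQ match.
    """
    wet_voice = fftconvolve(dry_voice, cabin_ir)[: len(dry_voice)]
    # Rough loudness matching of the wet path before blending.
    scale = (np.sqrt(np.mean(dry_voice ** 2)) /
             max(np.sqrt(np.mean(wet_voice ** 2)), 1e-12))
    return (1.0 - wet) * dry_voice + wet * scale * wet_voice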
At the onset we considered the idea of recording the dialogue with an ambisonic microphone, but we didn't move forward with that. The main reason was that the inevitable distance between the mic and the actor would have made it impossible to get the close-up sound that we were after. It could be argued that an ambisonic recording would have provided greater spatial precision and more realism overall, but we discovered that hyper-realism doesn't always tell the story the way we wanted. Sometimes an intimate whisper can be more effective when not played in a realistically correct ambience, for example.

Another task was to get the voice of the pilot over speakers to come from the expected location overhead. We had to do this in post, since the pilot's lines weren't decided at the time of the shoot. Once his line was recorded, we played it over a small loudspeaker placed at the correct position and recorded its output with an ambisonic microphone. That proved quite effective, and you can easily locate the speaker in the experience when looking up. We actually tried to place several speakers at realistic distances, since in real life the sound wouldn't just come from a single speaker in the cabin, but it tended to blur the effect a bit, so we ended up using just the one.

6. Foley and small sound effects

Since we used separated dialogue we had to recreate all footsteps, movements and small effects to make each scene come alive. This had the benefit of allowing us to bring some sounds forward more than others to achieve the effect we were looking for, to serve as a narrative guide for the user and to elevate the emotional impact of the scenes. For instance, enhancing the small sounds of the leather seats helped us create a sense of quiet luxury. We wanted the viewer to hear many small and normally quiet sounds because it creates a sense that the cabin is very quiet, and that in turn would allow us to bring up our elaborate ambience without drawing too much attention to it. We further enhanced this feeling by keeping the foley quite dry. Compared to the movement sounds captured on location, these new sounds made the entire cabin feel more expensive, so to speak. These sounds were also all mono.


In between cabins there is a text overlay explaining which cabin we are in and where the plane is on its journey from New York to Abu Dhabi. This is accompanied by a classic cinema-style graphic text readout sound. We were happy to discover that the spatial placement of this sound provided a cue for the viewer where to look first after a scene transition, since the text was placed at the main point of action. This is the only really synthetic sound in the experience.

We quickly learned the importance of drawing meticulous automation curves to make each sound really fit the picture, both in panning and in distance. It also proved extremely important to switch between different picture views for this. The proprietary 3Dception video player helped us get a sense of how close things really appear in the end experience and of course get the panning checked, while the final output from post-production, once a scene was done, provided an important overview of how the sounds interacted on a larger scale.

There is the possibility to use an Oculus Rift© with the 3Dception video player, but with the amount of Pro Tools interaction required for editing this didn't quite fit our workflow at that point.

7. Music and transitions

The Etihad theme music, played on the planes before take-off for example, was to be used in the intro and outro. In order not to lead the user to believe that the music is diegetic, it was vital to be able to keep the music in a static stereo layer, separate from head tracking.

Starting with the stereo music master, we used techniques inherited from the 5.1 realm, with upmixers and M/S matrices subsequently fed into a series of HRTF panners in a traditional 5.0 pattern to give the music a really immersive quality. We discovered that this approach worked better on some sounds than others, so we split the music into frequency bands with individual spreading functions.
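The sketch below illustrates only the mid/side and band-splitting part of that chain: mid and side signals are derived from the stereo master and split into two bands, which would then feed separate HRTF panners (for example mid to the front positions and side to the surround positions of a 5.0-style layout). The crossover frequency and routing are assumptions for illustration, not the values used on the project.

import numpy as np
from scipy.signal import butter, sosfilt

def ms_band_split(left, right, sample_rate=48000, crossover_hz=2000.0):
    """Mid/side decomposition plus a simple two-band split.

    Sketch only: in practice each band and M/S component would be sent
    to its own HRTF panner and spread individually.
    """
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    low = butter(4, crossover_hz, btype="low", fs=sample_rate, output="sos")
    high = butter(4, crossover_hz, btype="high", fs=sample_rate, output="sos")
    return {
        "mid_low": sosfilt(low, mid), "mid_high": sosfilt(high, mid),
        "side_low": sosfilt(low, side), "side_high": sosfilt(high, side),
    }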
The music was also layered, where some layers were pushed way back through reverbs to create an ambient mood while some sounds remained very clearly up front. The distance controls in the HRTF panners were really important to achieve this effect.

Figure 5 shows the places of origin we were imagining for the various musical components. The intention here was not to create a 360° sound field but rather something similar to a cinematic experience where there is a defined front and back.

For the transitions we used a heavily filtered symphony orchestra wash, pushed far to the sides by the HRTF panners and put quite low in the mix with the intention that it should be felt rather than heard.

Figure 5. Music layering and spatial positioning.

8. Mix

There was no mix phase in the traditional sense for this project. Every new sound was tweaked to the right position and level at the time it was put in, and constant adjustments were made throughout the project. This meant that the client would always review the actual mix, not having to imagine what the end result would sound like, thus creating an evolving process for sound design.

We have used this approach extensively in our previous work and, while it isn't suitable for all projects, we found it particularly useful in this case. Positioning the sounds spatially and with the right room reflections during sound design allowed us to evaluate the "final" result and also to learn how the HRTF process affected each sound. It became clear that timing differences were important below around 800-1200 Hz and that spectral/level differences dominated at higher frequencies. Small, repeated transients proved very effective for spatialization, while beds of noise in the white realm could mutate in various ways when put through HRTF panners. Many sounds were replaced immediately because we could hear the effect of the mix while designing them.
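This matches the classic duplex picture of localization, in which interaural time differences dominate at low frequencies and level/spectral differences take over higher up. As a back-of-the-envelope check (a textbook spherical-head approximation, not a measurement from this production), the maximum interaural time difference for an average head is only around 0.6-0.7 ms:

import numpy as np

def itd_woodworth(azimuth_rad, head_radius_m=0.0875, speed_of_sound=343.0):
    """Woodworth's spherical-head approximation of interaural time delay.

    ITD ~= (a / c) * (sin(theta) + theta) for a source at azimuth theta
    in the horizontal plane. Head radius and speed of sound are typical
    textbook values, not measurements from the project.
    """
    theta = abs(azimuth_rad)
    return (head_radius_m / speed_of_sound) * (np.sin(theta) + theta)

# A source 90 degrees to the side gives roughly 0.66 ms of delay. Once
# that approaches half the signal period (which happens in the high
# hundreds of Hz for lateral sources) the phase cue becomes ambiguous,
# consistent with the 800-1200 Hz transition noted above.
print(itd_woodworth(np.pi / 2))  # ~0.00066 s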


As mentioned earlier, we wanted to slightly emphasize some sounds in the viewing direction, and for that purpose the Focus feature of the 3Dception engine was used. It applies a level difference for a selectable area of the viewing direction and gives the viewer the opportunity to single out certain sounds or pieces of dialogue by looking in that direction. Since this very clearly means leaving the realistic domain, it's a matter of taste how effects like these should be applied, and what level of realism is required to avoid distracting the viewer. In the case of this production, the Focus feature provided a way for the viewer to focus on the selected action, especially in the scenes with several pieces of dialogue taking place simultaneously around us. It also helped enhance the sense of things moving out of focus and continuing behind us when looking around. All in all, we wanted to encourage the viewer to look around in the scenes and discover the many small details that contribute to the story.
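Conceptually, a focus effect of this kind boosts sources whose direction falls within a selectable angular area around the viewing direction and leaves everything else untouched. The sketch below is our own illustration of that idea with an invented window shape, width and boost amount; it is not the actual 3Dception Focus implementation.

import numpy as np

def focus_gain(source_dir, view_dir, focus_half_angle_deg=45.0, boost_db=3.0):
    """Level emphasis for sources near the viewing direction.

    `source_dir` and `view_dir` are unit vectors. Everything here
    (window shape, angle, boost) is an illustrative assumption, not the
    actual 3Dception Focus behaviour.
    """
    cos_angle = float(np.clip(np.dot(source_dir, view_dir), -1.0, 1.0))
    angle_deg = np.degrees(np.arccos(cos_angle))
    if angle_deg <= focus_half_angle_deg:
        # Smoothly fade the boost from full (dead ahead) to none (edge).
        fade = 0.5 * (1.0 + np.cos(np.pi * angle_deg / focus_half_angle_deg))
        return 10.0 ** (boost_db * fade / 20.0)
    return 1.0

# A line of dialogue straight ahead gets about +3 dB while one at 90
# degrees is left untouched, so looking at a speaker gently lifts them
# out of a busy scene.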
In summary, we were somewhat surprised that we ended up using so few ambisonic recordings in favor of constructing a soundscape in a more traditional feature film style for this project. That said, we don't feel that this is a clear path, and depending on the circumstances it might prove better suited to use only ambisonic recordings on other projects.

9. Room for improvement

We're always keen on improving existing workflows and finding ways to make things sound better, and in this case there are a couple of things we'd like to see as the technology evolves.

We'd really like a way to isolate certain sounds from the general mechanics of the spatial engine. For example, it would have been really nice to place the sound of an air conditioner so that it's only heard when looking up. We wouldn't care that much for realistic binaural spatialization of a sound like that, since it would mean hearing it constantly through the scene and just adding to the general noise level. But a touch of it, panned at the correct elevation by HRTF when looking up, would have helped to invoke curiosity and perhaps the desire to explore other areas of the scene. There might even be room for dynamically mixing layers of music depending on the viewing direction, with the intent of guiding the viewer towards looking in a certain direction to catch some action that will appear in a moment.

Figure 3. Desired placement of air conditioning sound (not used in the final mix).

All of these things were of course possible in an object-based system such as a game engine at the time, and there's a strong appeal in the possibility to declare logic rules for sound playback, but for a cinematic VR experience of this length and complexity it didn't seem quite realistic to use a game engine like Unity or Unreal for sound within the given time frame.

Systems like Dolby Atmos© for VR looked very promising but weren't released at the time of this production. Currently our dream system for creating immersive audio for cinematic VR experiences would be a combination of linear channels with solid sync and timeline scrolling like in a traditional DAW, combined with the object-based flexibility and programmable logic control of a powerful game engine. We'd really like to be able to build custom reflection patterns by defining the shape and materials of the surrounding walls, as well as placing objects for correct sound occlusion where applicable.

As for the playback format, we targeted our in-house developed app on a Unity base, which meant that we could use all the features offered by the 3Dception rendering engine. Following the release, the desire to publish the experience on YouTube 360 was expressed, and this showed us another field of potential improvement: there are several systems for spatial audio reproduction in use today, but many of them are incompatible with each other. The 3Dception mix had to be converted to the first-order ambisonic format that YouTube requires (ACN channel order using four channels and SN3D normalization (2)), which meant that we would lose not only the Focus aspect of the mix but, even more crucially, the static layer for non-diegetic sounds. These things of course made the mix sound radically different on YouTube compared to the app.
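For reference, the sketch below encodes a mono source into the four-channel ACN/SN3D first-order layout that YouTube expects (2). It only illustrates the target format; the actual conversion from the proprietary 3Dception mix is a different and more involved process.

import numpy as np

def encode_foa_acn_sn3d(mono, azimuth_rad, elevation_rad):
    """Encode a mono signal into first-order ambisonics, ACN/SN3D.

    Channel order (ACN): 0 = W, 1 = Y, 2 = Z, 3 = X, with SN3D
    normalization, i.e. the four-channel layout referenced in (2).
    Azimuth is measured counter-clockwise from the front; sign
    conventions vary between tools, so verify against a known test
    signal before trusting the directions.
    """
    w = mono                                                 # ACN 0
    y = mono * np.sin(azimuth_rad) * np.cos(elevation_rad)   # ACN 1
    z = mono * np.sin(elevation_rad)                         # ACN 2
    x = mono * np.cos(azimuth_rad) * np.cos(elevation_rad)   # ACN 3
    return np.stack([w, y, z, x], axis=0)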


While a standardization of playback formats for the different platforms is probably utopian, the current situation makes for some annoying obstacles when publishing across all the desired destinations. And who doesn't want to see (and hear!) their film on all available channels?

Also on the subject of playback: binaural stereo comes across as a very practical solution because of the compatibility with widespread hardware, but there is room for improvement when it comes to calibrating the output to the individual anatomy of the listener. Our tests have shown that different people listening to the same binaural recording perceive the spatial orientation quite differently. This might be caused by several things, but one of them is most likely individual head and pinna anatomy. A way of measuring these properties and applying the closest matching HRTF dataset for playback would be a very interesting prospect.

And last but not least, learning more about the physics of sound, psychoacoustics and the principles of human hearing is more important than ever when mixing sound for VR experiences.

References

1. Airbus S.A.S., "A380 specifications," http://www.airbus.com/aircraftfamilies/passengeraircraft/a380family/specifications/

2. YouTube, "Use spatial audio in 360-degree and VR videos," https://support.google.com/youtube/answer/6395969?hl=en
