Enhancing Virtual Reality Experiences Through Embedded 3D Models in Video Content


Jeferson B. da Costa, Roberto G. E. Leyva, Pio H. Ordozgoith. da F. S.
Sidia Institute of Technology
Manaus, Amazonas 69058-830
Email: {jeferson.costa, roberto.leyva, pio.sobrinho}@sidia.com

Eduardo J. Pereira Souto
Institute of Computing – Federal University of Amazonas
Manaus, Amazonas 69067-005
Email: esouto@icomp.ufam.edu.br

2024 IEEE International Conference on Consumer Electronics (ICCE) | 979-8-3503-2413-6/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICCE59016.2024.10444434

Abstract—In the rapidly evolving realm of Virtual Reality (VR), the integration of video content with 3D modeling presents various challenges and opportunities. This paper introduces a methodology that incorporates glTF models directly into video files, offering a holistic and immersive multimedia experience. By utilizing a custom Blender plugin and a Unity-based video player, we enable seamless synchronization between video content and 3D animations. Our approach utilizes steganography, specifically the Least Significant Bit (LSB) technique, to hide 3D models within video files without significantly compromising video quality. The Blender plugin facilitates the creative process by allowing users to synchronize video and glTF animations, while the Unity video player ensures real-time rendering and interactivity during playback. Our results, showcased through user perspective views, highlight the transformative potential of our methodology, setting the stage for a paradigm shift in VR entertainment and beyond.

I. INTRODUCTION

Virtual Reality (VR) has gained significant attention in recent years due to its immersive and interactive experience that enables users to interact with virtual objects and environments [1]. VR applications span various domains, including entertainment, education, training and healthcare [2], [3]. In the entertainment sector, particularly video consumption, there is significant potential for growth. However, enhancing videos with realistic 3D graphics, a critical element for immersive VR experiences, presents substantial challenges [4].

Integrating 3D effects into videos is not only technically challenging but also expensive [4]. While the focus often leans towards crafting new VR content, there is a wealth of existing videos awaiting enhancement. Adapting these to the 3D realm, especially when they were not originally designed for it, is a task that should not be underestimated.

Fortunately, there are open-source solutions available, equipped with capabilities to directly embed 3D objects into videos [5]. Harnessing the versatility of these open-source platforms paves the way for new viewing experiences with higher levels of video immersion. Affordable solutions that can seamlessly integrate 3D components into both contemporary and legacy videos are pivotal for the future of the VR entertainment industry.

While these tools already offer significant enhancements in immersion, their true potential lies in an intriguing possibility: embedding 3D assets directly within videos. This approach not only provides a streamlined and easily accessible format but also transcends mere display. The vision is to position these 3D assets around the viewer, enabling direct interaction. This interactivity enhances immersion, transforming the act of watching a video into a more encompassing experience. This approach can transform the experience of both contemporary and classic films.

The concept of embedding files within videos has been explored in several research studies [6], [7], [8], [9]. Some researchers have proposed the use of video Compositing or Camera Tracking (Matchmoving) [10], [11]. Both functionalities are crucial for visual effects work and are often used in tandem. For instance, after using camera tracking to insert a 3D model into real-world footage, it is commonly necessary to use compositing to further integrate the model by adding shadows, reflections, or other effects to make it fit seamlessly into the scene. Other researchers have explored the use of Steganography techniques to embed images within videos, where the images are hidden within the video frames [12], [13].

In this paper, we introduce an innovative use of the steganography technique. Rather than leveraging it for concealing information, we harness its potential to enhance information visibility, aiming to amplify the user experience during video consumption. Our focus is on seamlessly embedding 3D objects into videos, thereby allowing the simultaneous delivery of both video and 3D content within a single file. We introduce a time-stamped file format, designed to ensure that glTF models and their corresponding animations are rendered at the precise intended moments. To complement this, a specialized video player is presented, capable of extracting and displaying the 3D objects in sync with the time-stamped cues. Furthermore, we will delve into the challenges and opportunities this methodology presents for VR applications and explore its potential impact on the future of VR content creation.
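As a rough illustration of the time-stamped cues mentioned above, the following Python sketch parses an action file that maps object names to "HH:MM:SS" strings and reports which objects are due at a given playback time. It is only a sketch under stated assumptions: the paper's player is Unity-based, the object names and function names here are hypothetical, and each timestamp is assumed to be the moment the object first appears, which the text does not spell out.

```python
import json


def parse_hms(stamp: str) -> int:
    """Convert an "HH:MM:SS" string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def load_cues(action_json: str) -> dict[str, int]:
    """Map each object name in the action file to its cue time in seconds."""
    return {name: parse_hms(stamp) for name, stamp in json.loads(action_json).items()}


def objects_due(cues: dict[str, int], playback_seconds: float) -> list[str]:
    """Return the objects whose cue time has been reached, earliest cue first.

    Assumption: a cue marks when an object first appears.
    """
    due = [name for name, start in cues.items() if start <= playback_seconds]
    return sorted(due, key=lambda name: cues[name])


# Hypothetical action file: object names mapped to "HH:MM:SS" strings.
ACTION_JSON = '{"robot": "00:00:05", "spaceship": "00:01:30"}'
cues = load_cues(ACTION_JSON)
```

A player loop could call `objects_due(cues, current_time)` once per frame and load any newly listed model; how models are hidden again is left open here because the action file format does not state it.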
II. VIRTUAL REALITY EXPERIENCES THROUGH EMBEDDED 3D MODELS IN VIDEO CONTENT

Steganography is the practice of concealing a file or message within another file or message, in a way that is not obvious to an observer [14]. On the other hand, container formats are multimedia files that can contain multiple streams of audio and video, along with other metadata [15]. They are commonly used in creating and distributing video content, and are supported by a wide range of software and hardware players.

One way to integrate files into videos involves utilizing steganography techniques to embed them within the video stream. For example, a file could be hidden within the least significant bits of the video frames or audio samples, so that it is imperceptible to the human eye or ear, but can be extracted with specialized software [16]. This approach has the advantage of not requiring any modification to the container format and can be applied with a wide range of video codecs and formats.

Alternatively, container formats can be used to add the file as a separate stream within the video file [15]. For example, the file could be added as a subtitle or caption track, or as a separate audio or video stream. This approach has the advantage of being more visible and accessible to users, as the file can be easily extracted and played back using standard media players. However, it requires modification of the container format and may not be supported by all video codecs and players.

In recent years, 3D objects have been incorporated into videos through post-production techniques and used in the context of virtual reality (VR) and augmented reality (AR) applications [4]. For example, the GL Transmission Format (glTF) is a container format that is specifically designed for 3D models and VR/AR content. It supports a wide range of features, including animations, Physics-Based Rendering (PBR) materials, and texture mapping, and can be used with a range of authoring tools and game engines [17]. By embedding glTF files into video streams, developers can create immersive VR/AR experiences that combine video, audio, and 3D graphics.

Overall, steganography and container formats offer two distinct approaches for adding files to videos [14], [15], each with its own advantages and limitations. The choice of approach will depend on factors such as the intended use case, the desired level of visibility and accessibility, and the compatibility with existing hardware and software. In the next section, we will explore some specific techniques and tools for implementing these approaches in practice.

III. THE PROPOSED METHODOLOGY

Building upon the discussion above, we have developed a framework that enhances the virtual reality experience by embedding 3D models within video content files. This scheme relies on a Blender plugin and a Unity video player, incorporating steganography techniques. Our method involves concealing glTF files within a video file, enabling simultaneous playback of both the video and the 3D model.

To achieve this, we initially encode the glTF file using a steganographic technique. There are several steganographic techniques that can be used for this purpose, such as Least Significant Bit (LSB) and Discrete Cosine Transform (DCT) [13]. In our implementation, we employ the LSB technique, which involves replacing the least significant bit of each byte in the video file with a corresponding bit from the glTF file [13]. This allows us to hide the 3D model within the video file without significantly compromising its quality.

To minimize the impact on video quality, we first define the amount of information that each frame supports. Once the maximum amount of information in each frame has been defined, we use only half of that capacity to encode 3D objects, thus reducing the total number of bits that are changed and therefore preserving the quality of the original video.

Let us break down the potential for data hiding in a single frame when using a single channel (e.g., the blue channel). Each pixel can hold 1 bit of data in its LSB. For a frame of size Width × Height, we can hide W × H bits, or

$$\frac{W \times H}{8} \text{ bytes}$$

(since 1 byte = 8 bits). Using one channel for a 1920×1080 frame:

$$\frac{1920 \times 1080}{8} = 259{,}200 \text{ bytes}$$

or 259.2 KB. Since we are using only half of the capacity of each frame,

$$\frac{259{,}200}{2} = 129{,}600 \text{ bytes}$$

or 129.6 KB will be stored in each frame.

Considering these parameters, the video capacity (VC), in bytes, for the maximum size of 3D objects that can be added in our methodology is given by the video resolution times the number of frames (F):

$$VC = \frac{W \times H \times F}{16}$$

Another important aspect of the proposed methodology is the generation of the JSON action file (Listing 1). This file is produced by the Blender plugin and describes the objects along with the duration each should be displayed.

    {
      "objectName_1": "HH:MM:SS",
      "objectName_2": "HH:MM:SS",
      "objectName_N": "HH:MM:SS"
    }

Listing 1: Action JSON

The Unity video player plays a pivotal role in the enhanced VR experience we aim to deliver. Once the glTF files are embedded within the video using the steganographic LSB technique, and the action JSON file is generated by the Blender plugin, it is the responsibility of the Unity video player to
decode this information during playback. The player retrieves the 3D model data from the video’s least significant bits and simultaneously plays the video and renders the 3D models in real time based on the instructions from the JSON action file. Beyond just playback, the Unity video player also offers interactivity, allowing users to engage with the embedded 3D models, further deepening the immersion and bringing to fruition the full potential of our methodology.

IV. TARGET APPLICATION

In this section, we present our chosen application scenario developed to assess the feasibility of our proposed methodology. Our project encompasses the development of two distinct applications, each serving a unique purpose. The first application focuses on the creative domain, where we integrate a custom Blender plugin for precise synchronization of video content and glTF animations. Additionally, we employ steganography to embed glTF files into videos, enhancing storytelling capabilities. The second application extends into the realm of immersive experiences, where we introduce a Unity-based video player that interprets JSON data generated by our Blender plugin and extracts glTF from videos. This player seamlessly integrates glTF models into video scenes, offering real-time interaction and captivating multimedia experiences. These two applications collectively showcase the versatility and practicality of our methodology in both artistic and interactive contexts, and the ensuing experiments shed light on their effectiveness and challenges in practical implementation, offering valuable insights for further development and research.

A. Developing a Blender Plugin for Video and glTF Animation Synchronization

In this section of the paper, we delve into the development and implementation of a custom plugin for the popular 3D modeling and animation software, Blender. The primary objective of this plugin is to enhance Blender’s functionality by enabling users to seamlessly integrate video content and synchronize glTF files within their projects. Below we list the main features of the plugin.

1) Video Integration: Our plugin simplifies the inclusion of video content into Blender’s workspace. Users can effortlessly import video files of various formats, providing an additional layer of visual storytelling to their 3D animations.

2) glTF Animation Synchronization: One of the key features of our plugin is the ability to precisely time the playback of glTF files within the animation. Users can select specific glTF models and define the exact moments in the animation timeline when they should be displayed.

3) Embed glTF into Video: Another feature provided by our plugin is the ability to embed glTF files directly into video files using steganography. This allows content creators to deliver video and 3D objects in a single file.

4) JSON File Storage: To efficiently manage the synchronization data, our plugin employs a JSON file format. This file stores essential information such as the names of the glTF files and their corresponding playback timings. This enables users to maintain a structured overview of their animation elements.

Figure 1 illustrates how users interact with the Blender plugin during the creation process. There are six essential steps to accomplish this:

1) Users start by importing their video files and glTF models into Blender.
2) Within the plugin interface, users can designate the glTF files they wish to incorporate.
3) To make the 3D objects appear in context with scenes from the video, users set their positions.
4) They specify the precise time markers on the video timeline when each glTF model should appear and disappear.
5) With the plugin, users can export the video and 3D objects in a single file.
6) The plugin generates a JSON file that records the glTF file names and their associated timing information.

Fig. 1: Blender plugin flow.

B. Unity-Based Video Player for Synchronized glTF Playback

In this section, we introduce a Unity-based video player component specially designed to interpret the JSON files generated by our Blender plugin. This video player serves as the bridge between the JSON data, the glTF models and the final immersive experience for our users. Below we list the main features of the video player.

1) JSON Parsing: The Unity video player is equipped with a robust JSON parser that reads and interprets the JSON file generated by our Blender plugin. This parser extracts critical information, such as the names of glTF files and their associated playback timings.

2) Dynamic glTF Loading: The video player extracts and
dynamically loads the specified glTF models into the Unity environment, ensuring that the correct 3D assets are ready for playback when needed.

3) Timeline Synchronization: Our Unity video player synchronizes the playback of glTF models with the video content. It utilizes the timing data from the JSON file to ensure that the glTF models appear and disappear at precisely the right moments during video playback.

4) Real-Time Rendering: The video player leverages Unity’s powerful rendering capabilities to display glTF models with high fidelity, seamlessly integrating them into the video scene.

5) User Interaction: Users have the option to interact with the glTF models during playback, offering an engaging and interactive experience within the 3D environment.

To better understand the Unity video player’s role in our project, let us explore its fundamental workflow, as depicted in Figure 2. This component acts as the bridge between the JSON data generated by our Blender plugin and the dynamic integration of glTF models within Unity. The basic steps of the video player workflow are:

1) The video player parses the JSON file, extracting glTF filenames and timing information.
2) The video player extracts the 3D models that were added with the Blender plugin.
3) Based on the extracted data, the video player dynamically loads the relevant glTF models into the Unity scene.
4) Users can interact with the 3D models as they appear, enhancing the immersion and engagement of the experience.

Fig. 2: Video flow.

The Unity video player’s implementation is the linchpin that ties our entire methodology together, creating a cohesive and enhanced VR experience. Engineered to seamlessly interpret and utilize the JSON data generated by our Blender plugin, this player efficiently extracts and renders glTF models in the Unity environment. Its effectiveness becomes evident when we examine the results visually. Figure 3, capturing the user’s perspective view, provides a firsthand experience of the video content embellished with dynamically loaded glTF models. The intricate details of the 3D models, alongside the video, create a riveting visual narrative that holds the viewer’s attention. Meanwhile, Figure 4 offers a side camera view of the user’s perspective, offering a broader context and showcasing the spatial arrangement of the video content and 3D models. This juxtaposition vividly demonstrates the transformative effect of our proposed methodology, turning traditional video content into interactive 3D experiences, thereby illustrating the practicality and potential of our approach in the real-world VR landscape.

Fig. 3: User perspective.

Fig. 4: Camera with side view.
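
The single-channel LSB scheme and the half-capacity rule from Section III, which the player reverses when it extracts the 3D models, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' Blender or Unity code: the function names are hypothetical, one channel of a frame is modeled as a flat byte sequence, bits are written most significant first (an arbitrary choice), and frames are assumed to be stored losslessly so that the low bits survive encoding.

```python
def frame_capacity_bytes(width: int, height: int) -> int:
    """Bytes stored per frame: one LSB per pixel in one channel, half of it used."""
    return (width * height) // 16


def video_capacity_bytes(width: int, height: int, frames: int) -> int:
    """VC = W * H * F / 16, the capacity of the whole video in bytes."""
    return (width * height * frames) // 16


def embed_lsb(channel: bytes, payload: bytes) -> bytearray:
    """Write the payload bits into the LSB of successive channel samples."""
    # Enforce the half-capacity rule: len(channel) samples hold
    # len(channel) / 8 bytes, and only half of that is used.
    if len(payload) > len(channel) // 16:
        raise ValueError("payload exceeds half of the channel capacity")
    out = bytearray(channel)
    for i, byte in enumerate(payload):
        for bit_index in range(8):
            bit = (byte >> (7 - bit_index)) & 1      # most significant bit first
            pos = i * 8 + bit_index
            out[pos] = (out[pos] & 0xFE) | bit       # clear the LSB, then set it
    return out


def extract_lsb(channel: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of payload from the LSBs of the channel samples."""
    result = bytearray()
    for i in range(n_bytes):
        byte = 0
        for bit_index in range(8):
            byte = (byte << 1) | (channel[i * 8 + bit_index] & 1)
        result.append(byte)
    return bytes(result)


# Toy "blue channel" of 256 samples, so at most 16 payload bytes are allowed.
channel = bytes(range(256))
payload = b"glTF"
stego = embed_lsb(channel, payload)
recovered = extract_lsb(stego, len(payload))
```

For a 1920×1080 frame, `frame_capacity_bytes` returns 129,600, which matches the per-frame figure derived in Section III. Each sample changes by at most one intensity level, which is why the visual impact stays small.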
V. CONCLUSION

This research presents a novel methodology that integrates glTF 3D models with video content using steganographic techniques. The incorporation of a bespoke Blender plugin and a Unity-based video player has enabled the seamless embedding of 3D objects into video files, signaling a significant advancement in the realm of virtual reality (VR) multimedia. Such integration offers a dual advantage: it affords content creators the precision to synchronize 3D animations with video timelines and permits the consolidated delivery of both media forms in a single, efficient package. Visual results, demonstrated via both user-centric and ancillary side perspectives, underline the transformative potential of this approach for the VR domain. The technique promises not only to revitalize current video content but also to offer a mechanism to retrofit legacy materials. Future research avenues might include the exploration of advanced optimization algorithms for video quality preservation, the expansion of interactive user engagement parameters, and potential applicability within augmented reality (AR) environments. In summation, the proposed methodology signifies a pivotal step in shaping the next phase of VR multimedia, emphasizing both innovation and practical applicability.

ACKNOWLEDGMENT

This research, according to Article 48 of Decree nº 6.008/2006, was partially funded by Samsung Electronics of Amazonia Ltda, under the terms of Federal Law nº 8.387/1991, through agreement nº 003/2019, signed with ICOMP/UFAM.

REFERENCES

[1] Mendes, D., Caputo, F.M., Giachetti, A., Ferreira, A. and Jorge, J., 2019. A survey on 3D virtual object manipulation: From the desktop to immersive virtual environments. Computer Graphics Forum, 38(1), pp.21-45.
[2] Abdelmaged, M.A.M., 2021. Implementation of virtual reality in healthcare, entertainment, tourism, education, and retail sectors. Munich Personal RePEc Archive, Paper No. 110491.
[3] Hsieh, M.C. and Lee, J.J., 2018. Preliminary study of VR and AR applications in medical and healthcare education. J Nurs Health Stud, 3(1), p.1.
[4] Chen, Y., Rong, F., Duggal, S., Wang, S., Yan, X., Manivasagam, S., Xue, S., Yumer, E. and Urtasun, R., 2021. Geosim: Realistic video simulation via geometry-aware composition for self-driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7230-7240).
[5] Dovramadjiev, T., 2015. Modern accessible application of the system Blender in 3D design practice. International scientific on-line journal "Science & Technologies". Publishing House "Union of Scientists - Stara Zagora", ISSN 1314-4111.
[6] Kelash, H.M., Wahab, O.F.A., Elshakankiry, O.A. and El-Sayed, H.S., 2013, October. Hiding data in video sequences using steganography algorithms. In 2013 International Conference on ICT Convergence (ICTC) (pp. 353-358). IEEE.
[7] Chae, J.J. and Manjunath, B.S., 1999, October. Data hiding in video. In Proceedings 1999 international conference on image processing (Cat. 99CH36348) (Vol. 1, pp. 311-315). IEEE.
[8] Wu, M. and Liu, B., 2003. Data hiding in image and video. I. Fundamental issues and solutions. IEEE Transactions on image processing, 12(6), pp.685-695.
[9] Long, M., Peng, F. and Li, H.Y., 2018. Separable reversible data hiding and encryption for HEVC video. Journal of Real-Time Image Processing, 14, pp.171-182.
[10] Lee, J. and Lee, I., 2012. A Study on Correcting Virtual Camera Tracking Data for Digital Compositing. Journal of the Korea society of computer and information, 17(11), pp.39-46.
[11] Grundhöfer, A. and Bimber, O., 2008. VirtualStudio2Go: digital video composition for real environments. ACM Transactions on Graphics (TOG), 27(5), pp.1-8.
[12] Sadek, M.M., Khalifa, A.S. and Mostafa, M.G., 2015. Video steganography: a comprehensive review. Multimedia tools and applications, 74, pp.7063-7094.
[13] Liu, Y., Liu, S., Wang, Y., Zhao, H. and Liu, S., 2019. Video steganography: A review. Neurocomputing, 335, pp.238-250.
[14] Channalli, S. and Jadhav, A., 2009. Steganography an art of hiding data. arXiv preprint arXiv:0912.2319.
[15] Riiser, H., Halvorsen, P., Griwodz, C. and Johansen, D., 2010, February. Low overhead container format for adaptive streaming. In Proceedings of the first annual ACM SIGMM conference on Multimedia systems (pp. 193-198).
[16] Dasgupta, K., Mandal, J.K. and Dutta, P., 2012. Hash based least significant bit technique for video steganography (HLSB). International Journal of Security, Privacy and Trust Management (IJSPTM), 1(2), pp.1-11.
[17] Friston, S., Fan, C., Doboš, J., Scully, T. and Steed, A., 2017, June. 3DRepo4Unity: Dynamic loading of version controlled 3D assets into the Unity game engine. In Proceedings of the 22nd International Conference on 3D Web Technology (pp. 1-9).
