
Semantic 3D video communication for the metaverse (S3DVCOM)

Project proposed by Brahim Farhat, researcher at the Artificial Intelligence and Digital Science Research Center (AIDRC)

6-month Master's or Ph.D. internship starting February 2024 at the Technology Innovation Institute (TII), Masdar City, Abu Dhabi, UAE

Project description
The S3DVCOM project aims to build a real-time, high-quality streaming framework for natural or synthetic 3D scenes (a.k.a. volumetric video streaming). Several solutions have recently been standardized by ISO/IEC for the efficient coding and streaming of volumetric video in different formats, including multi-view plus depth (MPEG immersive video, MIV), point clouds (video-based PCC, V-PCC, and geometry-based PCC, G-PCC), and meshes (video-based dynamic mesh coding). However, although these solutions achieve good coding performance for dynamic 3D content, they introduce high computational complexity, preventing the fast development and deployment of hardware encoders and decoders. Moreover, the point cloud and mesh modalities fail to represent high-quality natural scenes with many objects and fine texture details. To tackle this latter issue, the neural radiance field (NeRF) [1] has emerged as an artificial intelligence (AI)-based solution that implicitly encodes the radiance field of a 3D scene in a multilayer perceptron (MLP) network. The MLP takes continuous 5D coordinates as input (a 3D spatial location plus a 2D viewing direction) and predicts the volume density and view-dependent emitted radiance at that location. Trained on a sparse set of input views, the NeRF model achieves state-of-the-art performance in synthesizing novel views through a continuous representation of the scene. The success of NeRF has revolutionized 3D scene representation and rendering with a lightweight neural network, and several follow-up papers have addressed its shortcomings: 1) reducing the memory access and computational complexity of training [2], 2) achieving real-time rendering on mobile devices [3], 3) extending the model to dynamic 3D scenes [4], and 4) enhancing the perceived quality [5].
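
For concreteness, below is a minimal PyTorch sketch of such a NeRF MLP. The network width, layer count, and number of encoding frequencies are illustrative choices rather than the exact architecture of [1], and the 2D viewing direction is passed as a 3D unit vector, as in the reference implementation.

import torch
import torch.nn as nn

def positional_encoding(x: torch.Tensor, n_freqs: int) -> torch.Tensor:
    """Map each coordinate to [x, sin(2^k x), cos(2^k x)] for k < n_freqs."""
    out = [x]
    for k in range(n_freqs):
        out.append(torch.sin((2.0 ** k) * x))
        out.append(torch.cos((2.0 ** k) * x))
    return torch.cat(out, dim=-1)

class TinyNeRF(nn.Module):
    """MLP mapping a 5D input (3D position + viewing direction) to volume
    density sigma and view-dependent emitted RGB radiance."""
    def __init__(self, pos_freqs: int = 10, dir_freqs: int = 4, width: int = 256):
        super().__init__()
        self.pos_freqs, self.dir_freqs = pos_freqs, dir_freqs
        pos_dim = 3 * (1 + 2 * pos_freqs)  # encoded position size
        dir_dim = 3 * (1 + 2 * dir_freqs)  # encoded direction size
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(width, 1)  # density depends on position only
        self.rgb_head = nn.Sequential(         # radiance also depends on view direction
            nn.Linear(width + dir_dim, width // 2), nn.ReLU(),
            nn.Linear(width // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        h = self.trunk(positional_encoding(xyz, self.pos_freqs))
        sigma = torch.relu(self.sigma_head(h))
        d = positional_encoding(view_dir, self.dir_freqs)
        rgb = self.rgb_head(torch.cat([h, d], dim=-1))
        return sigma, rgb

# Query density and radiance at a batch of sample points along camera rays.
model = TinyNeRF()
dirs = nn.functional.normalize(torch.randn(1024, 3), dim=-1)
sigma, rgb = model(torch.rand(1024, 3), dirs)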

The internship will focus on developing an AI-based model for a compact representation of dynamic 3D scenes, targeting high-quality visual reconstruction while minimizing the bitrate required to convey the weights of the MLP. Additional constraints are placed on the training time (i.e., fast and progressive training in the range of 1 to 3 minutes) and on real-time rendering on mobile devices leveraging their neural processing units. The project targets both a publication and a demonstration of a compact, high-quality representation of dynamic 3D scenes. The system will also be integrated into an end-to-end streaming pipeline and displayed to the end user on different platforms through a Unity plugin.
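
Because the bitrate needed to convey the MLP weights is the central rate metric here, it may help to see how such a bitrate can be estimated. The sketch below uses plain uniform 8-bit scalar quantization and the empirical entropy of the quantized weights as a stand-in for whichever compression scheme the intern will develop; the bit depth and entropy model are assumptions for illustration only.

import torch

def weight_bitrate_estimate(model: torch.nn.Module, n_bits: int = 8) -> float:
    """Estimate, in kilobits, the cost of the model's weights after uniform
    scalar quantization, using empirical entropy as an ideal-coder bound."""
    total_bits = 0.0
    for p in model.parameters():
        w = p.detach().flatten()
        lo, hi = w.min(), w.max()
        levels = 2 ** n_bits - 1
        # Uniform scalar quantization to 2^n_bits levels.
        q = torch.round((w - lo) / (hi - lo + 1e-12) * levels).long()
        # Empirical entropy in bits/weight approximates an ideal entropy coder.
        counts = torch.bincount(q, minlength=levels + 1).float()
        probs = counts[counts > 0] / q.numel()
        entropy = -(probs * probs.log2()).sum().item()
        total_bits += entropy * q.numel()
    return total_bits / 1000.0

# Example with a placeholder MLP; a real run would use the trained scene model.
mlp = torch.nn.Sequential(torch.nn.Linear(63, 256), torch.nn.ReLU(),
                          torch.nn.Linear(256, 4))
print(f"~{weight_bitrate_estimate(mlp):.0f} kbit for the raw quantized weights")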

Project milestones
- M0-M1: Study the state of the art and benchmark existing solutions.
- M1-M3: Propose and develop an AI-based model for a continuous representation of dynamic 3D scenes.
- M3-M5: Integrate the model within a streaming pipeline and develop a Unity plugin for display by the end user on different platforms.
- M5-M6: Assess the performance of the proposed model in terms of quality, bitrate, and training/inference time on different platforms (see the evaluation sketch below).
Figure 1: Gantt chart of the internship project
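
To make the M5-M6 assessment concrete, the snippet below sketches how reconstruction quality and rendering time could be measured, assuming rendered and ground-truth frames are float tensors in [0, 1]. The frame shape and the stand-in render call are hypothetical placeholders, not project APIs.

import math
import time
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Peak signal-to-noise ratio in dB for images with values in [0, 1]."""
    mse = torch.mean((pred - target) ** 2).item()
    return 10.0 * math.log10(1.0 / max(mse, 1e-12))

# Compare a rendered frame against ground truth and time the render step.
gt = torch.rand(3, 720, 1280)  # placeholder ground-truth frame
t0 = time.perf_counter()
rendered = (gt + 0.01 * torch.randn_like(gt)).clamp(0, 1)  # stand-in for a render call
render_ms = (time.perf_counter() - t0) * 1e3
print(f"PSNR: {psnr(rendered, gt):.2f} dB, render time: {render_ms:.1f} ms")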

Profile and skills


Candidate in computer vision with a solid background in 3D representations and modalities, including meshes and point clouds. Knowledge of neural radiance fields and implicit 3D representation models is a plus.

Programming experience in Python and advanced AI frameworks such as TensorFlow or PyTorch is required; mastery of Unity is appreciated.

Desirable Qualities:

- English proficiency: strong English language skills, both written and verbal.
- Analytical mindset with a problem-solving approach.
- Ability to work collaboratively in interdisciplinary teams.
- Eagerness to stay updated with the latest developments in computer vision and AI technologies.

Submit Your Application


To apply, please forward your Curriculum Vitae (CV) to the following email addresses:
Brahim Farhat: Brahim.farhat@tii.ae
Wassim Hamidouche: Wassim.hamidouche@tii.ae
We welcome the submission of recommendation letters alongside your application. Please
ensure that each letter is duly signed by the person recommending you.
Steps
To join TII, you will go through the following steps:
- Initial Exchange Meeting: The candidate will be provided with an opportunity to showcase
their professional profile and express their motivation for joining the TII project.
- Technical Proficiency Assessment: The candidate will undertake a Python programming
assessment to evaluate their technical capabilities relevant to the project’s requirements.
- Final HR Consultation: Following successful completion of the previous stages, the
candidate will engage with TII's HR team to finalize the onboarding process.
References
[1] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, "NeRF: Representing scenes as neural radiance fields for view synthesis," in Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, Eds. Cham: Springer International Publishing, 2020, pp. 405–421.

[2] T. Müller, A. Evans, C. Schied, and A. Keller, "Instant neural graphics primitives with a multiresolution hash encoding," ACM Trans. Graph., vol. 41, no. 4, Jul. 2022. [Online]. Available: https://doi.org/10.1145/3528223.3530127

[3] J. Cao, H. Wang, P. Chemerys, V. Shakhrai, J. Hu, Y. Fu, D. Makoviichuk, S. Tulyakov, and J. Ren, "Real-time neural light field on mobile devices," arXiv preprint arXiv:2212.08057, 2022.

[4] T. Li, M. Slavcheva, M. Zollhoefer, S. Green, C. Lassner, C. Kim, T. Schmidt, S. Lovegrove, M. Goesele, R. Newcombe, and Z. Lv, "Neural 3D video synthesis from multi-view video," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA, USA: IEEE Computer Society, Jun. 2022, pp. 5511–5521. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/CVPR52688.2022.00544

[5] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, "3D Gaussian splatting for real-time radiance field rendering," ACM Trans. Graph. (SIGGRAPH), 2023. [Online]. Available: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
