
Profiling User Interactions of 3D Complex Meshes for Predictive Streaming & Rendering



Vani V¹, Pradeep Kumar R² and Mohan S³

1 Department of Information Technology, Dr.N.G.P. IT, Affiliated to Anna University, Coimbatore, India.
2 Department of Computer Science Engineering, Adithya IT, Affiliated to Anna University, Coimbatore, India.
3 Department of Computer Science Engineering, Dr.N.G.P. IT, Affiliated to Anna University, Coimbatore, India.

vasudevan.vani@gmail.com, rpk.ind@gmail.com, s.mohan77@gmail.com


Abstract. Inspired by the cache model, a predictive agent is analytically constructed to determine user navigation from patterns derived out of user profiles. The user profiles are built from the interactions made by a diversified set of users over different 3D models. An attempt has been made to analyze how efficiently the prediction works to stream a 3D model based on the predetermined transition path generated from the user profiles. The transition paths for various models are generated by exploiting the properties of the Markov chain model. The analytics collected from the transition paths affirm that the predictive agent lessens the rendering latency significantly, by streaming the required data from the server to the client well before it is requested. The streaming and rendering process, driven by user interactions from the client, streams and renders only the visible portion of the 3D models while ensuring that there is no compromise on the visual quality of the objects. This paper mainly focuses on profiling the user interactions during the navigation of 3D meshes and analyzes the various outcomes of it.

Keywords: User Profiling, Web 3D, 3D Streaming, Predictive Agent, 3D
Modeling and Rendering, 3D Virtual Environment, Transition Path.

1 Introduction

3D modeling and rendering over the network has been an active area of recent research, as 3D content over the web finds wide application. While creating a photo-realistic virtual environment, the major challenge is to stream the 3D models within the available network bandwidth. At the same time, the visual quality and the delay in response to user navigation may be considerably affected. Hence a system that can promise a better virtual 3D environment over the existing network, without placing any constraint on user navigation, is the need of the hour. In this paper, an attempt is made towards further reducing the waiting time of the user/client during interaction by predicting the operation that the user will perform next.

2 3D Streaming: An Overview

3D streaming [3, 4, 5] is the process of delivering 3D content in real time to users over a network. 3D streaming is carried out in such a way that the interactivity and visual quality of the content match, as closely as possible, what would be obtained if the content were stored locally. The main resource bottleneck here is usually assumed to be the bandwidth, not the rendering or processing power of the clients. To achieve this goal, simplification of the model and transmission of the content based on the user's view are the two dominant strategies adopted. The model simplification [6] and transmission strategies exploit the resolution of the model with respect to the user's view. When the user's viewpoint is far from the 3D model's screen space, only the coarse (low-resolution) mesh is brought to the client, and when the user's viewpoint is close to the 3D model's screen space, the refined (high-resolution) mesh is brought to the client. Multi-resolution models [7, 8] therefore offer the possibility to manipulate representations of 3D objects at different levels of detail (LOD). It is possible to adapt different hierarchies of LODs [9, 10, 11] based on the application requirement; view-dependent LODs, which incrementally bring in the required quality of the 3D mesh, are widely used in virtual environments because they effectively utilize the available bandwidth. However, apart from the bandwidth, we also need to consider the rendering latency. Rendering latency can be reduced to some extent by choosing the multi-resolution 3D model based on the user's viewpoint. An attempt is made here to reduce the rendering latency further by predicting the user's next move.

Most 3D streaming and rendering systems deal only with what the user is viewing (the view frustum) and not with the actual way the user interacts. The main objective of the proposed work is therefore to analyze the user interactions and determine the relationship between the interaction elements and the streamed (and rendered) elements of the 3D model.

This determines, by prediction, the amount of data that ought to be sent to the client well before it is demanded by the client. This approach is expected to result in a reduction of the rendering latency.

3 Proposed Work

3.1 Predictive Model (Prm)

The proposed predictive model is based on understanding the user navigation in the
virtual world. Based on the current position of user navigation, only visible vertices
and faces of the selected triangular meshes are brought to the client machine during
visualization. At the same time, based on the previous history collated from various
user inputs, the next set of predicted vertices and faces are also pushed to the client
with the help of the Predictive Agent (PA).
A Predictive Agent (PA) is built after a successful offline analysis carried out on user profiles collected from 55 different users (aged 18 to 22, from engineering institutions, with good visual and computer skills; this age group spends more time on gaming and has a relatively better understanding of the interfaces used to navigate). As part of the user analysis, the speed of key presses, the total session time spent by every user, the visual coverage of the model and the pattern of keys/buttons pressed are recorded across complex 3D models/meshes. For experimentation purposes, complex 3D models of different shapes and of sizes 26 MB (Armadillo) and 45 MB (Brain) are considered. Shapes whose basic building block is a triangular mesh are considered, in order to profile the moves when the 3D shape is oriented either horizontally or vertically. Depending on the shape, the user movement also varies when the user wishes to inspect the visual appearance of the entire 3D shape.
The PA contains conventional transition probabilities [1, 2] of users as they move from one state to another. During the transition, the maximum probability out of the current state is chosen for prediction and, further, for predictive streaming. The algorithm uses a greedy approach, moving from a given state to the next by choosing the maximum transition probability (compared with the transition probabilities to all other states). The transition probability paths generated in this way for various models are used to predict the user interactions at every state. This further helps to optimize the 3D streaming and rendering over the network by reducing the time delay between user request and response.
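As a concrete illustration of the greedy construction described above, the following sketch walks a row-normalised transition probability matrix from a start state towards a finish state, always taking the most probable next state. This is a minimal sketch, assuming the matrix has already been estimated from the user profiles; the state names and probabilities shown are toy values, not the paper's measured data.

```python
import numpy as np

def greedy_transition_path(P, states, start, finish, max_steps=50):
    """Follow the locally most probable transition at every step.

    P[i][j] is the (row-normalised) probability of moving from state i
    to state j, as collated from the user profiles.
    """
    path = [start]
    current = start
    for _ in range(max_steps):
        current = int(np.argmax(P[current]))  # greedy: locally optimal choice
        path.append(current)
        if current == finish:
            break
    return [states[s] for s in path]

# Toy 3-state example (hypothetical probabilities, not the collected data)
P = np.array([[0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6],
              [0.3, 0.3, 0.4]])
print(greedy_transition_path(P, ["S", "Rotate +X", "Zoom In"], start=0, finish=2))
```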


3.1.1 Analytical Model

The main objective of the proposed work is to develop an analytical model based on the user interactions while viewing the 3D models over the network. The central idea is to predict the user navigation and construct an analytical model for every 3D object (3D triangular mesh) using the PA. This predictive model is then useful in bringing the necessary surfaces to the client during streaming, which reduces the rendering and response time. To construct the predictive model (Predictive Agent, PA), the following notations have been used:
Let S_v be the set of mesh vertices on the server and S_f be the set of corresponding mesh faces on the server for the selected 3D mesh. Let C_v be the set of mesh vertices on the client, where C_v ⊆ S_v, and C_f be the set of corresponding mesh faces on the client, where C_f ⊆ S_f.

On an operation O_i, which can be an arbitrary rotation (θ_x, θ_y, θ_z), a zoom in/zoom out (Z_in, Z_out) or a translation (T_x, T_y, T_z), C_v and C_f undergo a change expressed in terms of vertices and faces as {V_i} and {F_i}:

For {V_i}: +{V_i} ⊆ S_v and -{V_i} ⊆ C_v, where +{V_i} is the set of vertices chosen from S_v and -{V_i} is the set of vertices dropped out of C_v.

For {F_i}: +{F_i} ⊆ S_f and -{F_i} ⊆ C_f, where +{F_i} is the set of faces chosen from S_f and -{F_i} is the set of faces dropped out of C_f.
Table 1 summarizes the notations used in our model.
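To make the notation concrete, the sketch below applies one operation's vertex and face deltas to the client-side sets. Python sets of integer indices stand in for the actual mesh data; the function name and the toy indices are illustrative, not from the paper.

```python
def apply_operation_delta(client_vertices, client_faces,
                          add_vertices, drop_vertices,
                          add_faces, drop_faces):
    """Update the client sets C_v and C_f for one operation O_i.

    +{V_i}/+{F_i} are streamed in from the server sets S_v/S_f, while
    -{V_i}/-{F_i} are the elements that fall out of view on the client.
    """
    client_vertices |= add_vertices    # + {V_i}, chosen from S_v
    client_vertices -= drop_vertices   # - {V_i}, dropped from C_v
    client_faces |= add_faces          # + {F_i}, chosen from S_f
    client_faces -= drop_faces         # - {F_i}, dropped from C_f
    return client_vertices, client_faces

# Toy indices: two vertices streamed in, one dropped; one face streamed in
Cv, Cf = apply_operation_delta({0, 1, 2}, {10}, {3, 4}, {0}, {11}, set())
print(Cv, Cf)  # {1, 2, 3, 4} {10, 11}
```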







Table 1: Notations

S_v     Server vertex set
S_f     Server face set
C_v     Client vertex set
C_f     Client face set
O_i     i-th operation
{V_i}   Vertex changes
{F_i}   Face changes
R       Rotation
θ_x     Rotation about the x axis
θ_y     Rotation about the y axis
θ_z     Rotation about the z axis
T_x     Translation along the x axis
T_y     Translation along the y axis
T_z     Translation along the z axis
Z_in    Zoom in
Z_out   Zoom out

3.1.2 Operation Profiling

To profile the interactions performed by the user, the rotation operation R in any one of the directions +θ_x/-θ_x, +θ_y/-θ_y, +θ_z/-θ_z is considered, along with translation and scaling using fixed translation/scale factors.
For every key press/mouse move during rotation, a fixed angle of rotation is applied to the 3D object, and the outcome of the rotation generates an updated eye position and eye orientation (the eye refers to the camera position, which is the viewpoint of the user in the 3D world). From this, the speed of rotation is estimated as the number of keys pressed per second; the key presses determine the angle rotated per second.
Based on the rotation output, the change in vertices and faces (+{V_i} and +{F_i}) that ought to be transmitted to the client is predicted, and only the predicted faces and vertices are transmitted to the client. The prediction hence reduces the rendering latency for the client input.
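A simple way to estimate the rotation speed from the key-press log is sketched below. The text specifies a fixed rotation angle per key press but not its value, so the 5-degree step used here is only an assumed placeholder.

```python
ANGLE_PER_KEY_PRESS = 5.0  # degrees per press; fixed step (assumed value)

def rotation_speed(key_press_times):
    """Estimate rotation speed in degrees/second from key-press timestamps."""
    if len(key_press_times) < 2:
        return 0.0
    elapsed = key_press_times[-1] - key_press_times[0]
    presses_per_second = (len(key_press_times) - 1) / elapsed
    return presses_per_second * ANGLE_PER_KEY_PRESS

# Five presses spread over one second -> 4 presses/second -> 20 degrees/second
print(rotation_speed([0.00, 0.25, 0.50, 0.75, 1.00]))
```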

3.1.3 User Profiling

To construct the predictive agent, an offline analysis has been carried out on 55 user profiles, taken from a range of users, from novices to professionals, interacting with the 3D virtual world. The user profiles include the rate at which a key is pressed or a mouse button is clicked with a drag, and the actual key/mouse/scroll button pressed, per user session, on various complex 3D meshes. Using the collated user profiles, operation patterns are determined from the transition probabilities. At each step of constructing the transition path, the transitions from the currently most probable state to all other possible states are extracted from the user profiles by exploiting the Markov chain model and a greedy approach. As per the greedy approach, at every transition step the locally optimal choice (the state with maximum probability) is taken, and a complete transition path is constructed. Also, as per the Markov chain process, the sum of the transition probabilities from one state to all possible states should be 1. Based on the constructed transition path, with the maximum probability taken at every step, the predictive agent for a specific model is built. This process is considered to be a training session for the users before they actually navigate the virtual world. Once trained, the users are able to get the rendered 3D models with a better response time across the network while interacting with the 3D web. Table 2 lists the keys that can be used by the user and the corresponding operations performed in response to each key press. We have also carefully studied the existing systems, identified all possible operations that a user would perform in a 3D environment and consolidated them.
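The transition probabilities themselves can be estimated by counting consecutive pairs of operations in the logged sessions and normalising each row so that it sums to 1, as the Markov chain property requires. The sketch below assumes each session is recorded as a list of key values from Table 2; the data shown is a toy example, not the collected profiles.

```python
from collections import defaultdict

def build_transition_matrix(sessions):
    """Row-normalised transition probabilities from logged operation sequences.

    `sessions` is a list of per-user key-value sequences (Table 2 codes).
    Each row of the result sums to 1, as the Markov chain property requires.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sequence in sessions:
        for current, nxt in zip(sequence, sequence[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        matrix[state] = {nxt: c / total for nxt, c in successors.items()}
    return matrix

# Toy example with two short sessions (not the collected profiles)
print(build_transition_matrix([[1, 1, 3, 5], [1, 3, 3, 5]]))
```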
Similar to Table 2, Table 3 describes the mouse moves the user can perform during his/her interactions in the 3D environment. Here also, the most common practice adopted in 3D virtual environments is followed. It is, however, common for users to interact using the keys rather than the mouse most of the time. Our study suggests that this is due to the linear degree of movement of a key press versus the non-linear movement of a mouse operation; commercial graphics hardware consoles similarly use joystick movements, which are linear, while playing games. Nevertheless, we have conducted a detailed study and analyzed both key and mouse interactions extensively.

Table 2: User Key Press

S.No  Key Pressed  Operation Performed  Key Value
1     LEFT         Rotate -X            1
2     RIGHT        Rotate +X            2
3     Up           Rotate -Y            3
4     Down         Rotate +Y            4
5     PgUp         Rotate +Z            5
6     PgDown       Rotate -Z            6
7     Tab          Move Right           7
8     Backspace    Move Left            8
9     X            Move Up              <
10    Y            Move Down            >
11    O            Zoom Out             0
12    Z            Zoom In              9

Table 3: User Mouse Movement

S.No  Mouse Move           Operation Performed  Key Value
1     Left + Drag          Rotate               LG
2     Right + Drag         Zoom In/Out          RG
3     Left + Right + Drag  Move                 LRG
4     Scroll               Zoom In/Out          U/D

4 Results and Analysis

We have used two different 3D models (Table 4), which contain only meshes, to study the user interactions and build a robust predictive agent. Meshes are considered so that the number of culled faces can be easily computed. The 3D meshes, with their numbers of vertices, faces and triangle strips and their file sizes, are given in Table 5. The 3D meshes are carefully chosen so that they differ in shape and complexity (total number of vertices and faces). The experimental setup was run on an Intel Core 2 Duo CPU P8600 @ 2.4 GHz with 4 GB RAM and an ATI Radeon 1 GB graphics card. This system was used to simulate a server environment, with the server streaming the 3D meshes as requested from a client machine based on the user's navigation. The client module was run on machines without a dedicated graphics card; the configuration of the 55 client machines is an Intel Core 2 Duo CPU P6550 @ 2.3 GHz with 1 GB RAM. The profiling collected from the 55 client machines helps in rendering the 3D meshes more efficiently.

4.1 Analysis of User Profiling

We have collected the profiles of 55 different users (aged 18 to 22, from engineering institutions, with good visual and computer skills). For each 3D model, the users/clients are asked to navigate through the object using the key strokes/mouse moves defined earlier; a user manual is circulated among the users so that they get a fair idea of each key/mouse press and the corresponding operation performed, and all users are properly instructed on how to use the key strokes/mouse moves to navigate and visualize the 3D meshes. The number of times each key/mouse button is pressed is counted. From this, the probability of pressing each key/mouse button and the transition probability of moving from one key/mouse button to another are calculated, as shown in Figures 4(a) and 4(b).

Figure 1 shows the overall visual coverage of the 3D mesh models considered; it can be inferred that the Brain model is covered completely by more users (29 users, 53%) than the Armadillo model (13 users, 24%) out of the 55 users. This visual coverage is estimated from the interactions with the model, the operations performed and the total number of vertices and faces covered. After applying the visibility culling algorithm, the study reveals that about 40% of a complex mesh need not be rendered on the client machine; that is, only about 60% of the mesh is rendered and viewed even by the users who cover the entire model.

Figures 2(a) and 2(b) show the time spent on each model (session time) by each of the 55 users. From the results obtained, we can see that some users spent more than 560 seconds, around 10 minutes, on the Brain model with 1446 key/mouse interactions (Figure 3(a)), while, on the other hand, 353 seconds were spent on the Brain model with 4030 key presses. The results imply that the interactions performed also reflect psychological aspects of the user, such as how interested he/she is in the model being viewed and how long he/she takes to press the next key (think time). Similarly, for the Armadillo model, a maximum of 266 seconds is spent on the model with 1988 key/mouse interactions (Figure 3(b)), while a maximum of 3129 key/mouse interactions is performed over a duration of 211 seconds. Therefore, the session time and think time can be exploited to push the predicted faces and vertices well before they are requested.
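One way to exploit the think time, assuming a measured inter-interaction gap and a known link bandwidth, is sketched below: the predicted data is pushed only if its transfer fits within the average idle gap. The paper does not fix a specific policy, so this is purely an illustrative heuristic.

```python
def can_prefetch_during_think_time(inter_key_gaps, predicted_bytes,
                                   bandwidth_bytes_per_sec):
    """Push the predicted +{V_i}/+{F_i} only if the transfer fits in the
    user's average idle gap (think time). A heuristic sketch, not the
    paper's policy."""
    if not inter_key_gaps:
        return False
    mean_think_time = sum(inter_key_gaps) / len(inter_key_gaps)
    transfer_time = predicted_bytes / bandwidth_bytes_per_sec
    return transfer_time <= mean_think_time

# ~0.4 s average think time, 1 MB of predicted data over a 4 MB/s link -> True
print(can_prefetch_during_think_time([0.3, 0.5, 0.4], 1_000_000, 4_000_000))
```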

Figure 4(a) shows the transition path constructed from the transition probabilities, consolidated over the 55 users' interactions with the Brain model. From the start state S, the next probable state among the available 15 states is found by identifying the state with the maximum probability. Accordingly, state 3 is determined to be the most probable state from S, based on the user profiles. From state 3, the next probable state is determined once again, and the process is repeated until the finish state is reached from two states with equal probability 0.5, where the number of users involved in that move is 1. This transition path, estimated by considering only the unique moves (removing self-transitions such as 3 to 3 or 4 to 4), can be used as a look-up table in the predictive agent to perform predictive streaming and rendering. The consolidated user interactions also reveal that the probability of moving from a state X_i to itself is high compared with a transition from state X_i to X_j, where i ≠ j. Therefore, the predictive agent is built to obtain the utmost efficiency by considering both the constructed transition path of unique user interactions while moving from state X_{n-1} (previous) to X_n (current) and the calculated cumulative probability of X_i to X_i. Figure 4(b) shows the transition path generated for the Armadillo model. Figures 4(a) and 4(b) lead to the conclusion that the users' interactions vary with the shape of the model; this is also considered while implementing predictive 3D streaming and rendering to achieve a swift response from the server for every interaction.
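Putting the two observations together, a predictive agent lookup might first check whether the self-transition probability of the current state dominates (so prefetching continues for the same operation) and otherwise follow the constructed transition path. The threshold and the lookup values below are assumed for illustration; they are not taken from the figures.

```python
def predict_next_operation(current, path_lookup, self_probability, threshold=0.5):
    """Predict the next operation for predictive streaming.

    `path_lookup` maps X_{n-1} -> X_n along the constructed transition path
    (self-transitions removed); `self_probability[current]` is the measured
    probability of repeating the same operation. The 0.5 threshold is an
    assumed tuning parameter, not a value from the paper.
    """
    if self_probability.get(current, 0.0) >= threshold:
        return current                         # keep prefetching the same move
    return path_lookup.get(current, current)   # otherwise follow the path

# Toy lookup: from key 3 (Rotate -Y) the constructed path proceeds to key 5
print(predict_next_operation(3, {3: 5, 5: 1}, {3: 0.7}))  # -> 3 (repeat)
print(predict_next_operation(3, {3: 5, 5: 1}, {3: 0.2}))  # -> 5 (follow path)
```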

Table 4: Thumbnails of actual and rotated/zoomed 3D meshes using key presses / mouse moves (models from http://www1.cs.columbia.edu/~cs4162/models)



















Table 5: 3D objects and their attributes

S.No  Model      No. of Vertices  No. of Faces  No. of Triangle Strips  File Size
1     Armadillo  172,974          345,944       337,958                 26 MB
2     Brain      294,012          588,032       583,877                 44 MB


Fig. 1. Percentage of users who covered the entire model


Fig. 2(a). Users Session Time (in Sec) for Armadillo 3D mesh


Fig. 2(b). Users Session Time (in Sec) for Brain 3D mesh


Fig. 3(a). Users Interactions on Armadillo 3D mesh


Fig. 3(b). Users Interactions on Brain 3D mesh


















Fig. 4(a) Transition Path for Brain Model














Fig. 4(b) Transition Path for Armadillo Model

5 Conclusion

This paper explored the possibility of implementing predictive 3D streaming and rendering by exploiting the interactions performed by various users across two complex models that differ in size and shape. With the profiled user interactions and the resulting prediction, it is affirmed that the 3D meshes can be rendered with minimum latency. The transition paths generated, and the analysis performed based on the think time and session time, show that the predictive model for 3D streaming has a significant impact in bringing the 3D models to the client system without compromising on the visual quality. A future extension of the proposed system would be to introduce multiple objects in a more dynamic 3D environment.





6 References

[1] Gerald Benoit: Application of Markov chains in an interactive information retrieval system. Inf. Process. Manage. 41(4), 843-857 (July 2005). DOI=10.1016/j.ipm.2004.06.005.
[2] Dong Hyun Jeong, Soo-Yeon Ji, William Ribarsky, Remco Chang: A state transition approach to understanding users' interactions. IEEE VAST 2011, 285-286 (2011).
[3] Soumyajit Deb, P. J. Narayanan: Design of a geometry streaming system. In Proc. ICVGIP, pages 296-30 (2004).
[4] Nien-Shien Lin, Ting-Hao Huang, Bing-Yu Chen: 3D model streaming based on JPEG 2000. IEEE Transactions on Consumer Electronics (TCE), 53(1) (2007).
[5] William J. Schroeder, Jonathan A. Zarge, William E. Lorensen: Decimation of triangle meshes. SIGGRAPH Comput. Graph. 26(2), 65-70 (July 1992). DOI=10.1145/142920.134010.
[6] Hugues Hoppe: Progressive meshes. In Proc. SIGGRAPH, pages 99-108 (1996).
[7] Wei Cheng: Streaming of 3D progressive meshes. In Proceedings of the 16th ACM International Conference on Multimedia (MM '08), ACM, New York, NY, USA, 1047-1050 (2008). DOI=10.1145/1459359.1459570.
[8] Wei Cheng, Wei Tsang Ooi, Sebastien Mondet, Romulus Grigoras, Geraldine Morin: Modeling progressive mesh streaming: Does data dependency matter? ACM Trans. Multimedia Comput. Commun. Appl. 7(2), Article 10, 24 pages (March 2011). DOI=10.1145/1925101.1925105.
[9] Jonathan D. Cohen, Dinesh Manocha: Model simplification for interactive visualization. In: Chris Johnson, Chuck Hansen (eds.) Visualization Handbook, Chapter 20, pp. 393-410. Elsevier Butterworth-Heinemann (2005).
[10] Huimin Ma, Tiantian Huang, Yanzhi Wang: Multi-resolution recognition of 3D objects based on visual resolution limits. Pattern Recognition Letters 31(3), 259-266 (February 2010). ISSN 0167-8655. DOI=10.1016/j.patrec.2009.08.015.
[11] Li Xin: Research on LOD technology in virtual reality. Energy Procedia 13, 5144-5149 (2011). ISSN 1876-6102. DOI=10.1016/j.egypro.2011.12.142.
