Internet TV Architecture Based On Scalable Video Coding: January 2011
All content following this page was uploaded by Rui António Santos Cruz on 25 June 2014.
2. ARCHITECTURE OVERVIEW
The Internet TV distribution network architecture considers end user nodes and serving platforms. The end user nodes are distributed peers (with P2P capabilities) that can produce, consume and share contents, offering their resources (bandwidth, processing power, storage capacity) to other end user nodes. The serving platforms are centralized service nodes providing control (tracker for P2P), content treatment and distribution (transcoders and media servers), as well as interaction tools and facilities. The architecture is a multi-source Hypertext Transfer Protocol (HTTP) client and server solution providing an advanced form of Web-Streaming and WebSeeding (HTTP based P2P Streaming Protocols) [3]. The process used for streaming distribution relies on a chunk transfer scheme (instead of a bitstream) whereby the original video data is chopped into small video chunks of short duration (typically two seconds). Chunk-based streaming protocols allow the deployment of a distribution network compatible with the Internet infrastructure, such as Web caches and CDNs, as well as with P2P distribution. The serving platforms of the Internet TV distribution network are not covered in this paper (except for a brief description of the partition system), as its focus is on the client side: a solution able to consume Video On Demand (VoD) and Live video services, supporting multiple device types and resolutions with adaptive streaming mechanisms based on the SVC extension of H.264/AVC. The overview of the architecture focuses on the following components:

• The Partition system, which is responsible for splitting the SVC video files into chunks and then into several layer files.
• The Adaptation System, which requests the video with the maximum possible quality.
• The Reassembler System, which rebuilds the video file to a given level of quality.
• The Media Player, which plays the SVC video.

The first component, the Partition system, is typically deployed in a centralized heavy-duty SVC encoder appliance. The other components belong to the end user client.

2.1 Partition system
The Chunk Partitioner (Figure 1) encodes the SVC video file as a set of chunks that can be played independently, each with a duration of 2 s. The chunks are then split into several transmission layer files, as illustrated in Figure 1.

Figure 1: Layer creation process

This process begins with the demultiplexing of the NALs and the identification of each one with an ID, which is essential for the reconstitution of the original bitstream on the client side. The identification is done by inserting a 2-byte Sequence Numbering field between the start code (0001) and the beginning of the NAL unit (Figure 2).

Figure 2: Identification of NAL order

Layer separation uses the three identifiers that exist on SVC NALs: Dependency (DID), Quality (QID) and Temporal (TID) IDs. After the video file has been partitioned into layer files, an index file (Manifest) of the content is created. The manifest file, a well-formed XML document encoded as double-byte Unicode, holds information about the content, i.e., it describes the structure of the media, namely the codecs used, the chunks, the number of layers, the audio component, etc.

2.2 Adaptation System
The Adaptation System is responsible for the adjustments in video quality. It determines the number of layers to request from the serving nodes, based on a set of heuristics related to network and host system conditions. Network conditions, such as bandwidth and Round-Trip Time (RTT), are continuously measured, and their averages are used as smoothing factors to prevent abrupt changes in the quality of the video. This ensures that the variations between layers are smooth and cause an almost imperceptible impact on the user viewing experience. For the host system conditions, the heuristics are related to the Screen Resolution and the CPU Load, and the system always uses the lowest values returned by the metrics. Additionally, the download time of the several layers of each chunk is limited to 2 s to prevent pauses and re-buffering.
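The layer selection described for the Adaptation System can be sketched as follows. This is only an illustration under stated assumptions, not the prototype's implementation: the class LayerSelector, its fields, the exponentially weighted moving average (with weight alpha) used as the "average", and the estimate of download time as RTT plus transfer time are all hypothetical choices. Only the 2 s chunk duration, the 2 s download limit, and the rule of taking the lowest value among the metrics come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

CHUNK_DURATION_S = 2.0   # chunk duration stated in the text
DOWNLOAD_LIMIT_S = 2.0   # per-chunk download-time limit stated in the text


@dataclass
class LayerSelector:
    """Hypothetical selector; names and the EWMA smoothing are assumptions."""
    cumulative_kbps: List[float]          # assumed: bitrate needed for layers 1..n
    alpha: float = 0.25                   # assumed smoothing weight
    avg_bw_kbps: Optional[float] = None   # smoothed bandwidth
    avg_rtt_s: Optional[float] = None     # smoothed round-trip time

    def observe(self, bw_kbps: float, rtt_s: float) -> None:
        """Fold a new bandwidth/RTT measurement into the running averages."""
        if self.avg_bw_kbps is None:
            self.avg_bw_kbps, self.avg_rtt_s = bw_kbps, rtt_s
        else:
            self.avg_bw_kbps += self.alpha * (bw_kbps - self.avg_bw_kbps)
            self.avg_rtt_s += self.alpha * (rtt_s - self.avg_rtt_s)

    def network_layers(self) -> int:
        """Largest layer count whose estimated chunk download fits the limit."""
        best = 1  # the base layer is always requested
        if self.avg_bw_kbps is None:
            return best
        for n, kbps in enumerate(self.cumulative_kbps, start=1):
            transfer_s = kbps * CHUNK_DURATION_S / self.avg_bw_kbps
            if self.avg_rtt_s + transfer_s > DOWNLOAD_LIMIT_S:
                break
            best = n
        return best

    def select(self, screen_layers: int, cpu_layers: int) -> int:
        """Use the lowest value allowed by network, screen and CPU metrics."""
        return max(1, min(self.network_layers(), screen_layers, cpu_layers))
```

With cumulative bitrates of 200, 400, 800 and 1600 kbit/s, a first measurement of 1000 kbit/s and 50 ms RTT allows three layers. A sudden drop to 200 kbit/s moves the smoothed bandwidth only to 800 kbit/s, so the selection steps down to two layers rather than falling straight to the base layer.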
2.3 Reassembler System
The Reassembler re-creates the independent video chunk file from the received layer files (which contain several NALs identified by unique IDs that give the order of each NAL in the final video chunk). This video chunk is then sent to the SVC Media Player. Figure 3 illustrates the reassembly chain used for P2P or client-server streaming methods.

Figure 3: The SVC reassembly chain

2.4 Client Video Player
The end user client media player can be a platform-specific software client that delivers audio-visual content to the user in a variety of formats, a Web browser plug-in embedded into an HTML5 document, or a WebApp targeted to mobile smartphones, providing the user interface and content playback functionalities. The architecture of the client provides not only a client side but also a peer serving side.

The client side includes a local HTTP process that also supports standard client-server downloading and streaming via HTTP. The local HTTP process listens at a local port to redirect HTTP GET or POST requests, initiated from either the local web browser or the application Graphical User Interface (GUI), to either the P2P engine or the appropriate external Web server, basing its decision on information taken from the Manifest of the content.

The client media player supports several codecs, including SVC decoding [9]. As illustrated in Figure 4, the video playback can be made directly in the browser video canvas (for the browser plug-in version), making it easier to integrate P2P based video delivery into Web based distribution mechanisms. The client serving side back-end component is a [...] streams as well as accompanying metadata related to the stream content and Quality of Service (QoS) metrics, the daemon communicates directly with other P2P nodes, appropriate external Web servers, local video codecs, and the browser plug-in via a standard Javascript API (JSAPI). The back-end component lets the P2P core engine and the HTTP server run in the background regardless of whether the front-end interface is running.

3. EVALUATION RESULTS
To evaluate the prototype solution, a network scenario was prepared, using only the client-server mode, in which the available bandwidth for the client systems could be artificially adjusted. The HTTP web streaming server contained the SVC encoded videos, either as stored contents or as real-time encoded media chunk streams (simulating a Live TV program), together with the corresponding manifest files. The web streaming server had a public address accessible from the Internet. The SVC video used in the evaluation was encoded with ten layers for two spatial scalability levels, the first five layers with a Common Intermediate Format (CIF) resolution and the other layers with a Double Common Intermediate Format (DCIF) resolution. The Peak Signal-to-Noise Ratio (PSNR) of the video was analyzed for the first 200 frames. The results are plotted in Figure 5, where each numbered Layer PSNR line corresponds to the number of layers combined in the video.

Figure 5: PSNR of a test video
[Plot: PSNR (dB), 30 to 55, versus frame number, 0 to 200, with one curve for each of Layer 1 to Layer 9.]

The metrics measured in each test were Bandwidth, Network Load, RTT, Cache size and PSNR. For each relevant test the layer variation during streaming was also collected. As a reference for the perceived quality of the received media after compression and/or transmission, the analysis of the results used the relationship between PSNR and Mean Opinion Score (MOS) given in Table 1. The bandwidth