Video Conferencing Ebook
Video Conferencing,
The Enterprise and You
Video conferencing knowhow without the
computer science degree
eBook
Introduction
Computers have changed our lives by promising - and delivering - seemingly limitless innovation, from
word processing to the Internet.
Indeed, technology has even managed to honor the one pledge enterprises around the world were
hoping for: the ability to join virtual meetings with your peers and partners from anywhere, at any
time and thus reduce travel costs and increase productivity.
Though the hype around it has been exaggerated, video conferencing has grown consistently in
market share. Advances in processing power, available bandwidth and video coding technology,
coupled with the current economic slowdown and the "green" movement, place video conferencing in a
good position to grow faster in market share and fulfill its potential to offer high definition,
easy-to-use, on-demand video conferencing.
More and more companies are either deploying or considering deployment of visual communication
services. Still, there seems to be a great lack of knowledge regarding video conferencing at
management levels: what exactly it is, what its benefits are and what the hurdles are in
deploying it within an organization.
This eBook tries to tackle these issues by answering several simple, yet frequently asked questions:
• Why is there any need to deploy visual communication services?
• What is video conferencing anyway?
• What does it take to deploy visual communication services?
• How do I know if my infrastructure is ready for visual communication services?
For those who wish to dive deeper into the exciting world of visual communication and learn more
about its technological aspects, there is an additional chapter. It gives some
insights into the obstacles to video transmission over corporate networks as well as current industry
trends in video coding techniques.
We hope you will enjoy this eBook.
Table of Contents
Introduction
Acronyms and Glossary
  Frequently Used Acronyms
  Video Glossary
The Benefits of Using Video
  The Different Types of Office Communication Means
  The Benefits of Deploying Video Communications in the Enterprise
  How Does Video Really Work in an Enterprise?
  Video Conferencing Truly Exists
What Does it Take to Deploy Video?
  I feel the need, the need for speed
  High Definition is Next. Do YOU Know How Much Bandwidth You Have?!
What Can I Know About My Network?
  Testing Your Network’s Video Capabilities has now Become eVident.
Advanced Topics
  Visual Artifacts in Video over IP
  Codec Manipulation in Visual Communication
  Scalable Video Coding and the Future of Video Conferencing
About The Author
RADVISION’s Video Offering
  SCOPIA Unified Communications Video Infrastructure
Video Glossary
Video Encoder: A software or hardware device that performs video
compression. Generally, compression is used to reduce the size of
the visual content, either for storage or for streaming over
a network channel (reducing the bit rate). Video encoder performance and
quality are determined by the encoder's complexity.
Bit rate: The number of bits transmitted over a specific channel in a given period of time. In video
coding applications, the video bit rate is the number of bits used per second of video. For
example: 1 Mbps = 1 megabit (1 million bits) per second.
Frame Rate (fps): The number of frames in one second of a video stream.
Frame resolution: The size of the basic element of video content, the frame.
Frame resolution describes the number of pixels on the horizontal and vertical axes of a video frame.
There are several popular predefined names for frame resolutions: CIF – 352x288, 4CIF – 704x576,
D1 – 720x480 (NTSC) or 720x576 (PAL), 720p – 1280x720.
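To make the relationship between frame resolution, frame rate and bit rate concrete, the following sketch estimates the uncompressed bit rate of a video stream. It is an illustration only: the figure of 12 bits per pixel (typical of 8-bit YUV 4:2:0 sampling) and the function name are assumptions, not values taken from this eBook.

```python
# Rough estimate of the uncompressed (raw) bit rate of a video stream.
# Assumption: 12 bits per pixel, as in 8-bit YUV 4:2:0 sampling.

RESOLUTIONS = {
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "720p": (1280, 720),
}

def raw_bit_rate_mbps(width, height, fps, bits_per_pixel=12):
    """Return the uncompressed bit rate in Mbps (1 Mbps = 1,000,000 bits per second)."""
    return width * height * bits_per_pixel * fps / 1_000_000

for name, (w, h) in RESOLUTIONS.items():
    print(f"{name} at 30 fps: about {raw_bit_rate_mbps(w, h, 30):.1f} Mbps uncompressed")
```

Under these assumptions even a CIF stream needs tens of Mbps before compression, which is why a video encoder is required to bring the stream down to a bit rate a network channel can carry.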
PAL: A term used to describe video played back on a PAL TV. In general, PAL refers to standard
definition (SD) video with a vertical resolution of up to 576 pixels and a horizontal resolution of up to 720
pixels. The PAL frame rate is 25 fps. PAL broadcasting can be found in Western European countries,
Australia, some countries of South America and some Asian countries.
NTSC: A term used to describe video played back on an NTSC TV. NTSC generally refers to standard
definition (SD) video with a vertical resolution of up to 480 pixels and a horizontal resolution of up to 720
pixels. The NTSC frame rate is 29.97 fps. NTSC is used in the United States, Canada, Japan, and various Asian
countries.
Frame Types: In video coding, there are several common frame types:
• An I or Intra frame is coded independently of any other frame, using only spatial
redundancies for prediction and coding. An I frame uses relatively more bits than
other frame types, and is relatively less complex to code.
• A P or Inter frame is a predictive video frame. It is coded using predictions made
for the current frame from the previous I or P frames, exploiting temporal
redundancies with the previous frame. A P frame uses relatively fewer bits than an I frame
and its complexity is higher.
• A B frame is a bi-directionally predicted frame and requires information from previous
and following I, P or B frames. A B frame uses relatively fewer bits than all other frame types and
its coding complexity is greater than that of all other frame types. Using this type of frame
introduces system delay. Hence, it is not popular in real time, low delay applications.
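The delay introduced by B frames can be seen by comparing the order in which frames are displayed with the order in which they must be coded and sent. The sketch below is a simplified illustration: the frame labels and the function name are hypothetical, and it assumes each B frame depends only on the nearest I or P frames around it.

```python
def coding_order(display_order):
    """Reorder frames from display order to coding (transmission) order.

    Each frame is a string such as "I0", "B1" or "P3". A B frame needs the
    following reference frame (I or P), so it is held back until that
    reference frame has been emitted.
    """
    ordered = []
    pending_b = []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)       # wait for the next reference frame
        else:
            ordered.append(frame)         # reference frame goes out first
            ordered.extend(pending_b)     # then the B frames that depend on it
            pending_b = []
    ordered.extend(pending_b)             # trailing B frames, if any
    return ordered

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(coding_order(display))
# prints ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

Frame B1 cannot be coded until P3 has been captured, so the encoder has to wait for later frames before sending earlier ones. That waiting is the system delay mentioned above.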
Packet Loss: Packets are units of information sent across a packet switched network from their source
address to a destination. Packet loss occurs when one or more packets fail to reach their destination.
On network protocols such as UDP, which provide no recovery mechanism for packet loss, applications
should handle the loss efficiently and be able to conceal the lost data. In video conferencing
applications, packet loss is the main type of error encountered, and it reduces video quality and quality of
service.
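As an illustration of how a receiver might notice packet loss, the sketch below compares the sequence numbers that arrived with the range that was expected. The function name and sample data are hypothetical, and the sketch assumes sequence numbers that do not wrap around, which a real implementation would have to handle.

```python
def packet_loss(received_seq, first_seq, last_seq):
    """Return (lost sequence numbers, loss ratio) for the range first_seq..last_seq."""
    expected = set(range(first_seq, last_seq + 1))
    lost = sorted(expected - set(received_seq))
    return lost, len(lost) / len(expected)

# Packets 3, 6 and 7 never arrived.
lost, ratio = packet_loss([1, 2, 4, 5, 8, 9, 10], first_seq=1, last_seq=10)
print(lost, f"{ratio:.0%}")
```

Knowing which packets are missing is what allows an application to conceal the lost data rather than display a damaged frame.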
Video Codecs/Standards (H.26x, MPEG-x, WMV x, Real Video, VPx): Video coding standards are used
to standardize video codecs. Some of the widely used codecs are specified in
international standards while others are proprietary.
H.26x refers to ITU standards while MPEG-x refers to ISO/IEC standards.
WMV (its latest version is known as VC1) is a Microsoft standard for high efficiency video coding.
RealVideo is a popular video codec developed by RealNetworks, used mainly in PC and mobile
applications. VPx is a proprietary video codec developed by On2 Technologies and commonly used
by Adobe Flash Player and internet video platforms. Common MPEG codecs are MPEG 2 and MPEG 4.
MPEG 2 is widely used as a storage and broadcasting codec. MPEG 4 and its derivatives are
common in mobile device applications and storage formats, and are supported by many DVD players.
Common H.26x codecs are H.263 and H.264. H.263 is widely used by video conferencing applications.
H.264 is a joint development of ITU and ISO/IEC and is currently the latest video standard available in
the industry. The goal of H.264 was to provide good video quality at substantially lower bit rates than
previous standards without increasing the complexity of design.
Lossy compression: A term used to describe a compression method where the compressed data
cannot be reconstructed exactly in its original form. This type of compression is mainly used in visual
and audio applications, where a partial loss of data is acceptable to the human visual and hearing
systems. As opposed to lossless compression, where the compressed data can be reconstructed
precisely, lossy compression methods require significantly fewer bits in the compressed form.
Jitter: A term used to describe the variation in packet delay. In packet switched applications, where
data is carried in network packets, packet arrival times vary. To
overcome this variance and allow smooth consumption of the received packets, a delay buffer is added
to the system. In most cases, the buffer size is determined by the maximum variance introduced.
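A minimal sketch of the sizing rule described above, with the buffer covering the largest delay variation observed. The timestamps, the function name and the assumption that sender and receiver clocks are synchronized are all illustrative.

```python
def jitter_buffer_ms(send_times_ms, arrival_times_ms):
    """Return (per-packet delays, buffer size), where the buffer covers the maximum delay variation."""
    delays = [arrival - sent for sent, arrival in zip(send_times_ms, arrival_times_ms)]
    return delays, max(delays) - min(delays)

# Hypothetical packets sent every 20 ms and arriving with varying delay.
send = [0, 20, 40, 60, 80]
arrive = [50, 75, 95, 130, 132]
delays, buffer_ms = jitter_buffer_ms(send, arrive)
print(delays, buffer_ms)
# prints [50, 55, 55, 70, 52] 20
```

A larger buffer smooths out more variation but adds delay to the conversation, so real systems have to balance the two.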
Video Artifacts: The big challenge in most video applications is to provide the highest video quality
at the lowest bit rate. As a result of lossy compression techniques, non-optimized network
conditions and other application restrictions, video quality is affected and quality of service may
drop. Video artifacts may be generated by non-optimal settings and environment characteristics,
causing an unpleasant picture. The most common video artifact is quantization noise,
generated as a result of bit rate reduction. Network packet loss, when it occurs frequently, increases
video artifacts dramatically. Other artifacts, such as ringing noise, blocking effects, blurred images
and lack of sharpness, result from codec processing and in some cases may be
compensated for by post-processing after the decoder.
Immediacy: Services are either synchronous, where you expect to get a response or feedback
immediately, or asynchronous, where feedback is either unexpected or can be
delayed.
Direction: Services are either unidirectional (you get a feed of data but can’t respond to it) or
bidirectional, where both sides of the “conversation” participate.
Participation: How many people exist on each “side” of the service (1:1, 1:N and even M:N).
Here’s how the office communication means discussed above measure up:
Ease of Use Easy Easy Very Easy Less Easy Less Easy
1. IM is considered to be Synchronous, but one may argue that if the other party’s client is closed and/or the
other party is unavailable, it is an asynchronous experience.
2. IM is considered to be Bidirectional, but one may argue that if the other party’s client is closed and/or the
other party is unavailable, it is a unidirectional experience.
3. E-mail can be a 1:1 as well as a 1:N experience, depending on the type of e-mail chosen.
4. IM is usually a 1:1 service, but today’s IM clients can broadcast IM messages in a 1:N mode as well.
5. POTS is usually a 1:1 service, but today’s PBXs offer N:N services as well.
Looking at this complicated mix of video conferencing equipment, one may wonder how a
video conference can be conducted at all. The equipment is diverse in its capabilities: image size
(resolution), bit rate, choice of video and audio codecs, and so on. A low resolution endpoint can’t
interwork with a high resolution stream, and a high bandwidth endpoint will have trouble
communicating with a low bandwidth endpoint. Lastly, if endpoints use different video codecs, the
call will simply not be successful.
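The three constraints above (codec, resolution and bandwidth) can be pictured as a search for the settings both sides can handle. The sketch below is a simplified illustration and not the behavior of any particular signaling protocol or product. The Endpoint class, its fields and the sample endpoints are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    codecs: list          # supported video codecs, in order of preference
    max_width: int
    max_height: int
    max_kbps: int

def negotiate(a, b):
    """Return the common call parameters for two endpoints, or None if no codec is shared."""
    common = [codec for codec in a.codecs if codec in b.codecs]
    if not common:
        return None  # different video codecs: the call cannot succeed
    return {
        "codec": common[0],
        "width": min(a.max_width, b.max_width),
        "height": min(a.max_height, b.max_height),
        "kbps": min(a.max_kbps, b.max_kbps),
    }

room = Endpoint("room system", ["H.264", "H.263"], 1280, 720, 2000)
legacy = Endpoint("legacy endpoint", ["H.263", "H.261"], 352, 288, 384)
mobile = Endpoint("mobile handset", ["MPEG-4"], 176, 144, 128)

print(negotiate(room, legacy))   # falls back to H.263 at CIF and 384 kbps
print(negotiate(room, mobile))   # None: no common codec
```

In this sketch the call drops to the lowest common settings, so the more capable endpoint gives up quality. When no codec is shared, something in the network has to convert between the two sides.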
The choice of video codec is a “weak spot” in the whole concept of video communication. One may
argue that today most endpoints support H.264, the latest video codec standardized by ISO and ITU-T
(considered to be the best codec out there), but unfortunately the reality is far more complicated.
Some of the older (legacy) video conferencing equipment does not support H.264; some endpoints still
support H.261, which was standardized 18 years ago. Others use H.263. Mobile handsets support
MPEG4 and H.263 and have only now started to introduce H.264. Popular instant messaging clients,
such as Office Communicator and Skype, use proprietary video standards.
[Figure: timeline of video coding standards. ITU-T: H.263, H.263+, H.263++. ISO: MPEG 1, MPEG 2, MPEG 4. H.264 appears between the two.]
200 kbps FCC Definition of High Speed DSL Lite (256 kbps)
Broadband Applications & Speeds. Source: S. Derek Turner, Broadband Reality Check, Free Press,
August 2005
As one can see in the above table, most video applications require at least 1 Mbps of download speed.
HD television needs around 20 Mbps. In a Communication Workers of America policy
paper titled “Speed Matters - Affordable High Speed Internet For All”, the authors argue that a
high-speed interactive network on a national level will improve the quality of our economic, civic and
personal life, not just entertainment.
For instance, high speed interactive broadband can connect health professionals and patients which
6. Tagline for the movie “Top Gun”, voted as one of the top 100 movie quotes by the American Film Institute.
Imagine that your enterprise has many branches, connected to one another in some infrastructure. It
would be great if you could test the connectivity using REAL video and analyze how well it performs.
Or, think of another example - a CEO who has a 4pm conference call with some analysts… it’s 3:30pm
now. Is there a way for him to test the connection ahead of time and verify that everything is ready?!
Another important issue is testing video quality. Up until now, video quality has remained something
that only experts understood, even though anyone faced with a video conference will have some
notion of its perceived quality. Objective measures exist, but they are very hard to understand and
have limited correlation to human perception. This situation has caused video quality to be something
no one mentions when discussing network analysis or video conferencing systems.
RADVISION’s Elie Cohen has years of experience with testing and analysis tools. In a recent
whitepaper he discusses the growing need for video quality testing. Combinations of objective and
eVident provides voice and video quality measures that anyone can read. This means that you will not
only receive notifications on whether the video is “good” or “bad”, but also a numeric value which
strongly correlates to what you or other end-users will think of the video. On top of that, eVident
implements different network metrics (such as jitter, delay, packet loss and throughput utilization) in
order to better configure the network for voice and video conferencing.
One may think that a video conferencing system vendor such as RADVISION is “shooting itself in the
foot” by releasing such an analysis tool. However, only by educating the customer, only by giving him
the means to prepare his infrastructure better, and only by cooperating with the customer to provide
the desired user experience will video conferencing be able to truly provide the best means of
communication.
Advanced Topics
Visual Artifacts in Video over IP
One of the biggest obstacles of the video conferencing industry has long been the user experience.
Early video conferencing systems suffered from poor quality video, low fidelity audio, as well as great
difficulties in establishing and maintaining the connection for the conference duration. Most of these
issues were solved with advances in processor speed, audio and video coding algorithms and network
infrastructure improvement. But even today’s video conferencing experience - with high definition
video, wide-band audio and always-on networks - is not flawless, and the main Achilles heel is, as
always, the network.
A 1Mbps network connection is not a problem in a modern enterprise, but if video conferencing is
deployed all over the organization (as it should be), very soon the existing infrastructure would
become a bottleneck and bandwidth would drop, causing an array of nasty artifacts. Those
annoyances make your video terribly unpleasant. And while other aspects, such as video codec
features, scene type and source type, also influence the visual quality, it seems that network related
artifacts are the most frequent and most annoying.
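A back-of-the-envelope calculation shows how quickly a shared link fills up. The sketch below is illustrative only: the function name, the link sizes and the assumption that half of the link is available for video are hypothetical.

```python
def max_concurrent_calls(link_mbps, call_mbps=1.0, video_share=0.5):
    """Number of simultaneous calls that fit in the share of a link reserved for video."""
    return int(link_mbps * video_share // call_mbps)

for link in (10, 100):
    print(f"{link} Mbps link: {max_concurrent_calls(link)} concurrent 1 Mbps calls")
```

With these assumptions a 10 Mbps branch link supports only five simultaneous 1 Mbps calls. Once more users join, the available bandwidth per call drops and the artifacts described here begin to appear.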
Insufficient bandwidth will result in packet loss, which leads to lousy video. There are basically two
ways for an endpoint to deal with that - reduce the bit rate and/or increase the compression.
Reducing the bit rate causes a drop in visual quality (as can be seen above). Modern endpoints reduce
the resolution (picture size) together with the bit rate, but still, the experience suffers. Higher
1. Packet Loss
If packets are missing, whole areas in the video frame are displayed incorrectly. This causes ugly artifacts
to appear in various ways, all of them very unpleasant.
Left: A scene featuring trails on the person on the left. Right: zoom in on the trails.
3. Blockiness
Blockiness is usually visible on moving objects and backgrounds (walls, furniture). This is caused by a
high compression rate.
5. Noise
Noise is a general name for any “weird” looking squares and spots in the video. These are usually
caused by packet loss or erroneous packets.
7. Scaling
As endpoints lower the resolution when bit rates drop, the picture on the other end is smaller and has
to be scaled up to be displayed on a large screen. Scaling up a low resolution picture is very hard, and
often the visual quality suffers.
When it comes to visual quality, everyone is an expert. Whether you’re watching a video streamed
over the internet, attending a video conference with peers around the world or watching the news on
your mobile handset - you don’t want anything interfering with your experience, and definitely not
those pesky artifacts.
There are ways to remove, or at least reduce, those annoyances. Those methods mostly involve fancy
post-processing algorithms, but this, I fear, is a matter for a different book.
Scalable Video Coding (SVC) is an interesting technology. It definitely shows the potential that still
lies in H.264 tools for video applications such as video conferencing. I believe that the media
attention it is receiving will improve the video conferencing market, as it will push vendors to
introduce more tools that will improve the overall quality of experience.
[Figure: an SVC encoder sends one stream to three receivers: an AVC decoder showing qCIF @ 15fps, an SVC decoder showing CIF @ 30fps and an SVC decoder showing 720p @ 30fps.]
Example: SVC encoder with multiple receivers.
Any network component can then choose to process any set of layers, yielding the different
resolution(s) it chooses. In a similar way, layers can increase the frame rate (the number of frames in
the stream), the bit rate, or the quality (the base layer has low quality; higher layers improve the
quality gradually).
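The idea of a network component choosing which layers to process can be sketched as follows. The layer names and resolutions echo the example figure, but the bit rates, the data structure and the function name are assumptions made for illustration and are not taken from the H.264 SVC specification.

```python
# Hypothetical SVC stream: each layer adds to the ones below it.
# The kbps values are illustrative assumptions, not figures from any standard.
LAYERS = [
    {"name": "base", "video": "QCIF @ 15fps", "kbps": 64},
    {"name": "enhancement 1", "video": "CIF @ 30fps", "kbps": 320},
    {"name": "enhancement 2", "video": "720p @ 30fps", "kbps": 1100},
]

def select_layers(available_kbps):
    """Return the layers to forward: the base layer plus every enhancement that still fits."""
    chosen, total = [], 0
    for layer in LAYERS:
        if total + layer["kbps"] > available_kbps:
            break  # higher layers depend on this one, so stop here
        chosen.append(layer)
        total += layer["kbps"]
    return chosen

for bandwidth in (128, 512, 2000):
    layers = select_layers(bandwidth)
    print(bandwidth, "kbps ->", layers[-1]["video"] if layers else "no video")
```

The encoder produces a single stream, and each receiver ends up with the resolution and frame rate that fit its own bandwidth, without the stream being re-encoded along the way.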
All major video coding standards since 1994 have included tools for scalable coding (MPEG-2, H.263
V2, MPEG-4). It doesn’t seem like that long ago when there was big hype around MPEG-4’s scalable
video coding tools, which offered a brand new disruptive alternative to the traditional coding schemes
of the time (see part 8 of this interesting pdf: MPEG-4 overview).
Not all of these tools were accepted by the video applications market, even though the standards
themselves were, mainly due to the tremendous additional cost in terms of bit rate and computation.
Although it was appealing to send just one stream instead of multiple streams from a streaming
server to different clients, the overall bit rate of that one stream was close to the aggregate total of
the different streams, with added complexity and cost. Therefore the solution didn’t stick.
H.264 SVC
Scalable Video Coding (SVC), the extension of the H.264 standard, has been developed since October
2003 by the Moving Picture Experts Group (MPEG) at ISO/IEC. In January 2005 MPEG and the Video
Coding Experts Group (VCEG) at ITU-T agreed to standardize SVC as an amendment to the H.264
standard. In July 2007, this amendment got its final approval.
The scalability extension of H.264 offers spatial scalability (frame size adaption), temporal scalability
(frame rate adaption) and fidelity scalability (quality adaption). It also provides a great boost in error
resiliency and concealment, which helps prevent errors in the bit stream and recover from them
gracefully.
Many applications, such as video streaming, surveillance, broadcast and storage may potentially
decide to adopt SVC. For video conferencing SVC offers two potential benefits:
• Compatibility among different endpoints, from desktop to conference room, as the variety of their
capabilities is a great challenge to current MCUs.
• Greater error resiliency, as modern networks still introduce many artifacts to the video which hurt the
quality of experience tremendously.
The main question here is whether or not these benefits will drive SVC into the video conferencing
market.
Adaption to Different Endpoint Capabilities
As a general idea, adaption to different endpoint capabilities does sound great. The main problem is
interoperability. If your network is, as RADVISION believes, to be the Babel Fish of all video
conferencing endpoints, encompassing a full range of products from the mobile handset through the
desktop and up to HD video conferencing and Telepresence, then the range of video resolutions,
frame rates and bit rates that you have to support makes the use of SVC very complicated, especially
if you also need to support non-SVC (legacy) endpoints.
The use of SVC or SVC-like techniques to improve error resiliency and error concealment is, IMHO, the
biggest short term benefit to the video conferencing world, out of all the hype concerning this
technology.
Error resiliency schemes and tools already exist in H.264 AVC (pdf), but are mostly not used in the
video conferencing domain. SVC will assist video conferencing to move forward by giving these tools
the spotlight.
With SVC being introduced and companies like Vidyo pushing it hard and marketing their proprietary
error concealment capabilities, the focus of video conferencing vendors will change. More error
resiliency tools, but not necessarily the SVC tools, will be used and that will improve the overall
experience for all.
The value of Scalable Video technologies for improving error resiliency is clear. However, instead of a
dramatic “phase transition” into SVC, the video conferencing market should evolve gradually into
scalable technologies.
Bottom Line
SVC is intriguing but the video conferencing market will not move entirely to it. Instead, it will adopt
ideas and technologies into its fine foundations, making the video conferencing experience better.
Reliable and highly scalable visual communication infrastructure solutions for enterprise and service
provider environments, RADVISION’s SCOPIA Conferencing Platforms offer the industry’s most
technologically advanced and easy-to-use multipoint infrastructure for real-time conferencing over
any network, protocol and device. Easy to use plug and play functionality minimizes initial setup time,
and offers unmatched flexibility. High Definition (HD) is standard on each system, enabling HD at
720p, H.264 with up to 30 frames per second of perfect quality video.
With RADVISION’s PC based SCOPIA Desktop, you don’t need to go to “the conference room” to have a
conference. SCOPIA Desktop easily extends a room system conferencing application to remote and
desktop users for voice, video and data communications. Take conferencing “where you go” – instead
of being told “where to go!” SCOPIA Desktop is designed to meet the demands of high performance
video conferencing with a standard PC and Internet connection. It includes the latest in video
technology providing HD H.264 for viewing both meeting participants and data collaboration. Its audio
system provides echo cancellation, background noise suppression, and is highly resilient to network
errors common on the Internet.
SCOPIA Desktop is a simple web browser plug-in that is centrally managed and deployed without
complex licensing fees or installation issues. Simply click on a link and in moments you are ready to
go. Include tele-workers in meetings, participate in video conferences from the road, collaborate with
partners and suppliers and seamlessly connect through firewalls.
Gateways
SCOPIA Gateways provide seamless connectivity between different networks and standards to deliver
feature-rich, reliable, multimedia conferencing and communications. The Gateways are ideal for
connecting IP video conferencing networks with ISDN endpoints and networks to fully utilize existing
video conferencing infrastructure investments.
iVIEW Suite provides a comprehensive management solution for voice and video collaborative
communications. Efficiently manage and monitor a video network to ensure efficient bandwidth
utilization, easy meeting scheduling, management and control for an optimal, high quality video
communications experience. iVIEW delivers full gatekeeper functionality, complementing RADVISION’s
SCOPIA video network infrastructure and MCUs.
Firewall Traversal
SCOPIA PathFinder Firewall Traversal is a complete firewall and NAT solution enabling secure
connectivity between enterprise networks and remote sites. SCOPIA PathFinder maintains the security
and advantages of firewall and NAT over heterogeneous video networks and allows seamless
integration with existing video endpoints and infrastructure components.
Gatekeepers
The SCOPIA Interactive Video Platform (IVP) is a powerful general purpose media server with a flexible
high-level API and Service Creation Environment for generating a wide range of video services. With
the SCOPIA IVP, service providers, enterprises and developers can now easily create and reliably
deploy interactive video services seamlessly integrated with existing networks. These real-time,
video-based services offer a high revenue margin complement to traditional voice and data services
for true added value.