Determining the Right Solution for VoIP Media Transcoding in the Cloud

There are a significant number of Voice over IP (VoIP) encoding/decoding algorithms used in service provider networks, enterprise networks, and edge consumer devices. These algorithms have traditionally been developed to reduce bandwidth consumption (e.g., iLBC, G.729, and AMR) or to deliver superior voice quality (e.g., wideband codecs like AMR-WB, G.722, and OPUS).

As an example, a VoIP session initiated with WebRTC uses OPUS, an enterprise VoIP phone might use G.729, and a wireless network subscriber might use AMR-WB. Calls connecting these various endpoints require media transcoding between multiple codecs, and the choice of codecs can even change during a call. Therefore, the ability to successfully and reliably deliver real-time communications with a high degree of quality of service depends heavily on media transcoding capabilities.
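To make the decode/re-encode step concrete, here is a minimal sketch in Python of what transcoding one frame between two codecs involves; the codec classes and method names are hypothetical placeholders, not a real SBC or codec library API.

# Minimal sketch of the transcoding step a call between mismatched endpoints
# requires. The codec classes and method names here are hypothetical, not a
# real SBC or codec library API.

class OpusDecoder:
    def decode(self, payload: bytes) -> list[int]:
        # Hypothetical: return 16-bit linear PCM samples for one 20 ms frame.
        ...

class AmrWbEncoder:
    def encode(self, pcm: list[int]) -> bytes:
        # Hypothetical: compress one 20 ms PCM frame to an AMR-WB payload.
        ...

def transcode_frame(rtp_payload: bytes,
                    decoder: OpusDecoder,
                    encoder: AmrWbEncoder) -> bytes:
    """Decode the incoming codec to linear PCM, then re-encode to the codec
    the far endpoint negotiated. This decode/re-encode pair is what an SBC
    performs per frame when it transcodes between codecs."""
    pcm = decoder.decode(rtp_payload)   # e.g., WebRTC leg using OPUS
    return encoder.encode(pcm)          # e.g., wireless leg using AMR-WB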

Media transcoding can be provided in more than one place in a network: media gateways, session border controllers (SBCs), a centralized media resource function processor (MRFP), or even the edge device itself. While this brief focuses on SBCs, many of the same characteristics apply regardless of the use case.

For years, SBC-based media transcoding was done in hardware using digital signal processors (DSPs) optimized for scale, processing speed, and cost. While this has been an effective solution, networks have evolved, with real-time communications migrating to software-only, virtual cloud environments, forcing SBCs to adapt. Today's SBCs are deployable in virtual cloud environments and offer two options for media transcoding:

CPU-based Media Transcoding


The Intel x86 is a general-purpose CPU that can support the vector calculations required for media transcoding, but it is also designed to support a wide set of additional instructions unrelated to media handling. As such, the x86 is a functional, but somewhat inefficient, solution for media transcoding. While modern CPUs are fast, they come equipped with a small number of cores, limiting the number of concurrent threads of execution. For compute-intensive applications like media transcoding, this caps overall scale, making the solution inefficient with respect to power and cost.
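As a rough illustration of that core-count ceiling, the back-of-the-envelope estimate below bounds concurrent transcoded sessions by the available cores; the per-session CPU percentage is an illustrative assumption, not a measured Ribbon figure.

# Back-of-the-envelope capacity estimate for CPU-only transcoding.
# The per-session CPU percentage below is an illustrative assumption,
# not a measured Ribbon figure.

def max_concurrent_sessions(cores: int, cpu_pct_per_session: float,
                            headroom: float = 0.8) -> int:
    """Concurrent transcoded sessions are roughly bounded by the number of
    cores times the usable share of each core, divided by the CPU cost of
    one transcoded session."""
    usable_capacity = cores * 100.0 * headroom      # total usable CPU %
    return int(usable_capacity / cpu_pct_per_session)

# Example: 16 vCPUs and a hypothetical 2.5% of a core per transcoded session.
print(max_concurrent_sessions(cores=16, cpu_pct_per_session=2.5))  # -> 512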

Graphics Processing Unit (GPU)-based Media Transcoding


Graphics processing units (GPUs) are equipped with a large number of compute units that cumulatively provide high computational and data throughput. While historically designed for computer graphics, today's GPUs are general-purpose parallel processors and have become the de facto choice for large compute-intensive tasks such as image processing, machine learning, and more.

For real-time communications, when a very large number of media sessions must be transcoded simultaneously, the solution requirements map to those of a high-performance computing (HPC) cluster. Because media transcoding is a compute-intensive task, it is well suited to take advantage of a GPU's inherent characteristics and achieve disruptive performance increases versus optimized CPU implementations.
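The sketch below illustrates the batched, data-parallel pattern that makes GPUs attractive for transcoding: the same operation applied across frames from thousands of sessions at once. It uses NumPy on a CPU purely to show the data layout, and a simple gain stage stands in for real codec math; it is not Ribbon's implementation.

# Illustrative only: the parallel pattern that makes GPUs attractive for
# transcoding is "one operation applied across many sessions at once".
# NumPy on a CPU is used here just to show the batched data layout; a GPU
# implementation would run equivalent kernels on thousands of frames in
# parallel. The gain filter stands in for real codec math.

import numpy as np

SESSIONS = 4096          # concurrent transcoded sessions in one batch
FRAME_SAMPLES = 160      # 20 ms of 8 kHz audio per frame

# One 20 ms frame per session, laid out as a (sessions x samples) matrix.
frames = np.random.randint(-32768, 32767, size=(SESSIONS, FRAME_SAMPLES),
                           dtype=np.int16)

def process_batch(batch: np.ndarray) -> np.ndarray:
    """Apply the same per-sample operation to every session in the batch.
    On a GPU, each row (or each sample) maps to its own thread, so the
    whole batch is processed in roughly the time of one frame."""
    pcm = batch.astype(np.float32)
    return np.clip(pcm * 0.5, -32768, 32767).astype(np.int16)

out = process_batch(frames)
print(out.shape)  # (4096, 160): every session's frame processed in one call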

Ribbon’s SBC
The software in Ribbon's SBC, even when implemented as a physical appliance, was architected to separate signaling, media packet handling, and media transcoding. This design was well suited to the transition to a virtual cloud deployment model, because it made it possible to optimize the software for each of those functions independently.

The signaling software (session setup, signaling protocol interworking, session management, and signaling security) and the media processing software (media security, packet processing and forwarding, and media protocol interworking) are both well suited to run on central processing units (CPUs) in a virtual cloud deployment.


Since CPUs are already used for signaling and media packet handling, using them for media transcoding is possible, but it is not the only option. Using GPUs is also a viable, and some would say better, option. Figure 1 below shows the difference between these two options for an SBC, highlighting which functions run on virtual CPUs versus on GPUs. Note that what is called a "GPU SBC" is really composed of CPUs plus GPUs, as the CPUs still handle the signaling, the media packet handling, and some of the less complex transcoding functions.

Figure 1. CPU-based vs GPU-based transcoding SBC. In the CPU SBC, all transcoding functions, including the complex speech algorithms, run directly on transcoding vCPUs (using the DPDK library). In the GPU SBC, lightweight tasks such as playout processing and DTMF detection remain on the CPU, while the costly speech transcoding algorithms are moved onto the GPU (via the NVIDIA library). The management, signaling, and network vCPUs are the same in both configurations.
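The routing logic below is a simplified sketch of the split Figure 1 describes, with lightweight tasks kept on vCPUs and costly speech coding queued for batching onto the GPU; the function and queue names are hypothetical.

# Simplified sketch of the split Figure 1 describes: lightweight media tasks
# stay on the CPU, while costly speech transcoding is batched to the GPU.
# The function name and the work-queue objects are hypothetical.

LIGHTWEIGHT_TASKS = {"dtmf_detection", "playout_processing"}
GPU_CODECS = {"AMR-WB", "AMR-NB", "EVRC", "G.729", "G.722"}

def dispatch(task: str, codec: str, frame: bytes,
             cpu_queue: list, gpu_batch_queue: list) -> None:
    """Route work the way a GPU SBC splits it: DTMF detection, playout
    processing, and similar light tasks run on vCPUs; complex speech coding
    is queued so it can be batched onto the GPU with frames from other
    sessions."""
    if task in LIGHTWEIGHT_TASKS:
        cpu_queue.append((task, frame))
    elif codec in GPU_CODECS:
        gpu_batch_queue.append((codec, frame))
    else:
        # Anything else (e.g., simple G.711 handling) stays on the CPU.
        cpu_queue.append((task, frame))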

It's important to consider differences in scalability, voice quality, and cost per transcoded session when deciding between these two options. To assess these parameters, Ribbon ran lab tests to establish benchmark measurements. Independently, a Tier 1 network operator used a similar test configuration for the Ribbon SBC to run tests on these same parameters. The results follow:

Scalability
Transcoding multiple simultaneous sessions involves many small tasks repeated over and over. When this involves multiple codec types or complex, high-definition codecs, transcoding becomes a compute-intensive task, which is where the inherent design of GPUs provides scalability and proves extremely valuable.

To determine the potential scalability gains from GPUs, Ribbon built a lab configuration as shown in Figure 1. The goal was to determine the maximum number of concurrent transcoded sessions using a fixed number of CPUs versus that same number of CPUs plus a GPU. The testing covered transcoding between G.711 and the G.722, G.729, AMR-NB, AMR-WB, and EVRC codecs. The results showed significant improvements, with gains ranging from 400% to 1300% with GPUs, depending upon the codecs used.

Independent testing, using a more extensive set of codec combinations, validated the Ribbon lab results by showing GPU scalability gains between 425% and 1100%, depending upon the codecs used.
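For reference, a scalability gain percentage of this kind is computed as shown below; the session counts are illustrative placeholders, not the measured results.

# How a scalability gain percentage is computed. The session counts below
# are illustrative placeholders, not Ribbon's or the operator's measured
# results.

def gain_percent(cpu_only_sessions: int, cpu_plus_gpu_sessions: int) -> float:
    """Percentage gain of the CPU+GPU configuration over the CPU-only one."""
    return (cpu_plus_gpu_sessions - cpu_only_sessions) / cpu_only_sessions * 100

# e.g., 2,000 concurrent sessions CPU-only vs 10,000 with a GPU added -> 400%
print(gain_percent(2_000, 10_000))   # 400.0
# e.g., 2,000 vs 28,000 -> 1300%, the top of the reported range
print(gain_percent(2_000, 28_000))   # 1300.0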

Voice Quality
An important consideration when deciding between CPUs and GPUs for media transcoding is the level of voice quality they provide. Since the CPU-based codec implementations use fixed-point calculations while the GPU-based implementations use floating-point calculations, Ribbon needed to assess whether this difference had any effect on voice quality.

We ran voice quality assessment (VQA) tests; the results showed that GPU floating-point processing was either within 1% of CPU fixed-point processing or better, depending upon the codecs used. Independent VQA testing, examining an even more extensive codec list, showed similar results for GPUs.

The bottom line for voice quality: there is no perceptible difference between using GPUs and CPUs for transcoding.
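As a purely illustrative aside on why the fixed-point versus floating-point difference is inaudible, the snippet below quantizes a test tone to Q15 fixed point, applies the same gain in both representations, and measures the resulting signal-to-noise ratio; it is not the VQA methodology used in the tests above.

# Illustrative check of why fixed-point (CPU) and floating-point (GPU)
# arithmetic can produce near-identical audio. This is not the VQA
# methodology used in the tests above; it just quantifies how small the
# numeric difference between the two representations is for a test tone.

import numpy as np

fs = 8000
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 440 * t)          # 1 s, 440 Hz test tone

float_path = tone * 0.9                            # floating-point gain stage
q15 = np.round(tone * 32768).astype(np.int32)      # Q15 fixed-point samples
fixed_path = np.round(q15 * 0.9) / 32768           # same gain, fixed-point

noise = float_path - fixed_path
snr_db = 10 * np.log10(np.sum(float_path**2) / np.sum(noise**2))
print(f"fixed vs float SNR: {snr_db:.1f} dB")      # roughly 85 dB: far beyond audibility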


Cost per transcoded session


To assess the cost per transcoded session, Ribbon calculated two costs: the maximum power draw of each configuration, and the capital cost of each physical configuration with and without the GPU.

We saw substantial power savings in our testing, with GPUs providing anywhere from a 54% to 456% increase in sessions per watt versus CPUs, depending upon the codec used. Similarly, independent testing of a more extensive codec list showed that GPUs provided a gain in transcoded sessions per watt of between 400% and 800%, depending upon the codec used.

On top of power costs, we factored in the respective capital costs of the tested configurations to account for the incremental GPUs. By dividing the total costs by the maximum number of transcoded sessions, we calculated the cost per transcoded session. The results show that using GPUs is significantly less expensive per transcoded session across all codec pair combinations. We expect the performance of each successive generation of GPUs to increase faster than the associated cost, so the use of GPUs should achieve even better cost per transcoded session over time.
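The calculation behind a cost-per-session comparison of this kind can be sketched as follows; all dollar, watt, and session figures are illustrative placeholders rather than the tested configurations' actual costs or results.

# Sketch of the cost-per-session comparison described above. All dollar,
# watt, and session figures are illustrative placeholders, not the costs or
# results from the Ribbon or operator tests.

def cost_per_session(capex_usd: float, max_power_w: float,
                     usd_per_kwh: float, hours: float,
                     max_sessions: int) -> float:
    """Capital cost plus energy cost over the period, divided by the maximum
    number of concurrently transcoded sessions the configuration supports."""
    energy_cost = (max_power_w / 1000.0) * hours * usd_per_kwh
    return (capex_usd + energy_cost) / max_sessions

YEAR_HOURS = 24 * 365

cpu_only = cost_per_session(capex_usd=20_000, max_power_w=500,
                            usd_per_kwh=0.12, hours=YEAR_HOURS,
                            max_sessions=2_000)
cpu_gpu = cost_per_session(capex_usd=28_000, max_power_w=800,
                           usd_per_kwh=0.12, hours=YEAR_HOURS,
                           max_sessions=10_000)
print(f"CPU-only:  ${cpu_only:.2f} per session")
print(f"CPU + GPU: ${cpu_gpu:.2f} per session")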

Summary
As service providers and large enterprises plan the migration of their SBCs to a virtual cloud environment, they must determine how to deliver the most cost-effective, yet scalable, media transcoding.

In this solution brief we highlighted two options for media transcoding in the cloud: CPUs and GPUs. Results of testing conducted by both Ribbon and a Tier 1 network operator show that when it comes to scalability, voice quality, and cost per transcoded session, GPUs significantly outperform CPUs for media transcoding, especially as the requirement to scale grows.

In conclusion, using CPUs for media transcoding may work well for a low-scale transcoding solution or a use case where the maximum number of concurrent transcoded sessions is relatively static and not expected to grow over time. Otherwise, given the results described in this solution brief, moving a network infrastructure toward the use of GPUs for media transcoding means you are heading in the right direction for the future.

www.rbbn.com

Copyright © 2019, Ribbon Communications Operating Company, Inc. (“Ribbon”). All Rights Reserved. v0119

Ribbon Communications is a registered trademark of Ribbon Communications, Inc. All other trademarks, service marks, registered trademarks, or registered service marks may be the
property of their respective owners.

