VoIP Basics: Converting Voice to Digital Form

Let's start at the beginning. VoIP sends digitized voice across computer networks. So how do
we convert voice to digital form?

When converting an analog signal (be it speech or any other sound), you need to consider two
important factors: sampling and quantization. Together, they determine the quality of the
digitized sound.

• Sampling is about the sampling rate, i.e. how many samples per second you use to
encode the sound.
• Quantization is about how many bits you use to represent each sample. The number of
bits determines the number of different values you can represent with each sample.

Figures 1 and 2 show the idea of sampling — Figure 1 is the original analog signal, while Figure
2 shows the digitized form as a sequence of discrete samples.

Figure 1: Analog signal

Figure 2: Digitized signal
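
The two steps can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the function name `digitize`, the 1 kHz test tone and the scaling of the input to the range -1.0..1.0 are hypothetical choices, not part of any VoIP library.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bits):
    """Sample `signal` (a function of time returning -1.0..1.0) and quantize it."""
    peak = 2 ** (bits - 1) - 1                   # largest positive sample value
    count = round(duration_s * sample_rate_hz)   # number of samples to take
    samples = []
    for n in range(count):
        t = n / sample_rate_hz                   # sampling: read the signal at fixed time steps
        samples.append(round(peak * signal(t)))  # quantization: keep the nearest whole number
    return samples

def tone(t):
    """Hypothetical test input: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000 * t)

# One millisecond of the tone at 8 kHz gives 8 samples of 16 bits each.
samples = digitize(tone, duration_s=0.001, sample_rate_hz=8000, bits=16)
print(samples)
```

The sample rate decides how many numbers the list contains, and the bit depth decides how finely each number can follow the analog value.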

Quantization

As mentioned above, quantization is about how many bits you use to represent individual sound
samples. In practice, we want to work with whole bytes, so let's consider 8 or 16 bits.

With 8-bit samples, each sample can represent 256 different values, so we can work with whole
numbers between -128 and +127. Because of the whole numbers, it is inevitable that we
introduce some noise into the signal as we convert it to digital samples. For example, if the exact
analog value is "7.44125", we will represent it as "7". As we do this with each sample in the
sequence, we slightly distort the signal — inject noise, in other words.
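
The rounding step can be tried out directly. The sketch below assumes the simplest possible quantizer (round to the nearest whole number, then clip to the signed range); the name `quantize` is made up for this example.

```python
def quantize(value, bits=8):
    """Round an analog value to the nearest whole number that fits in a signed sample."""
    lowest = -(2 ** (bits - 1))      # -128 for 8 bits
    highest = 2 ** (bits - 1) - 1    # +127 for 8 bits
    return max(lowest, min(highest, round(value)))

exact = 7.44125
stored = quantize(exact)
error = exact - stored               # the part of the signal that is lost

print(stored)   # 7
print(error)    # about 0.44, the injected noise for this sample
```

Values outside the representable range are clipped to -128 or +127, which distorts the signal even more than rounding does.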

It turns out that 8-bit samples do not give good quality. With only 256 sample values, the
analog-to-digital conversion adds too much noise. The situation improves a lot if we switch to
16-bit samples, as 16 bits give us 65536 different values (from -32768 to +32767). 16-bit
samples are what you will find on a CD and what VoIP codecs typically use as their input.
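
To get a feel for the difference, the sketch below quantizes a full-scale sine wave with both sample sizes and compares the signal power with the power of the rounding error. The test signal, the sample count and the function name are assumptions made for this illustration; real speech will give different absolute numbers.

```python
import math

def quantization_snr_db(bits, count=10000):
    """Signal-to-noise ratio (in dB) of a full-scale sine rounded to whole numbers."""
    peak = 2 ** (bits - 1) - 1
    signal_power = 0.0
    noise_power = 0.0
    for i in range(count):
        exact = peak * math.sin(2 * math.pi * 7.3 * i / count)  # ideal analog value
        stored = round(exact)                                   # value kept in the sample
        signal_power += exact ** 2
        noise_power += (exact - stored) ** 2
    return 10 * math.log10(signal_power / noise_power)

snr_8 = quantization_snr_db(8)
snr_16 = quantization_snr_db(16)
print(f"8-bit:  {snr_8:.1f} dB")
print(f"16-bit: {snr_16:.1f} dB")
```

As a rule of thumb, every extra bit improves the ratio by roughly 6 dB, so the 16-bit result should come out close to 48 dB better than the 8-bit one.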

Sampling

Now that we have decided what sample size to use (16 bits), let's look at sampling rates. The
table below shows three frequently used sampling rates:

Type               Transmitted Bandwidth   Sampling Frequency
Telephone speech   300-3400 Hz             8 kHz
Wideband speech    50-7000 Hz              16 kHz
CD-quality audio   20-20000 Hz             44.1 kHz

With VoIP, you will most frequently encounter a sampling rate of 8 kHz. A rate of 16 kHz is
used in situations where higher audio quality is required, at the cost of proportionally
higher Internet bandwidth consumption.
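
The bandwidth cost follows from simple arithmetic: samples per second times bits per sample. The sketch below computes the raw, uncompressed bit rate only; it ignores codec compression and packet overhead, and the function name is hypothetical.

```python
def raw_bit_rate(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bits per second of uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels

narrowband = raw_bit_rate(8000)    # telephone speech
wideband = raw_bit_rate(16000)     # wideband speech

print(narrowband)                  # 128000 bit/s
print(wideband)                    # 256000 bit/s
print(wideband / narrowband)       # 2.0, doubling the rate doubles the data
```

These figures describe the input to a codec. What actually travels over the network depends on the codec used.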

The choice of sampling frequency for each type of audio is not random. There is a rule
(based on the work of Nyquist and Shannon) that the sampling frequency needs to be at least
two times the highest frequency in the transmitted band. Figures 3 and 4 show why this is required.
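
The rule can be checked against the three audio types listed earlier. The sketch below takes the upper edge of each transmitted band as the highest frequency; the list layout and the function name are assumptions made for this example.

```python
# (type, highest transmitted frequency in Hz, sampling frequency in Hz)
rates = [
    ("Telephone speech", 3400, 8000),
    ("Wideband speech", 7000, 16000),
    ("CD-quality audio", 20000, 44100),
]

def minimum_sampling_rate(highest_frequency_hz):
    """Lowest sampling rate the rule allows for a given highest frequency."""
    return 2 * highest_frequency_hz

for name, highest_hz, sampling_hz in rates:
    needed = minimum_sampling_rate(highest_hz)
    print(f"{name}: needs at least {needed} Hz, uses {sampling_hz} Hz")
```

In all three cases the sampling frequency in use sits somewhat above the minimum, which leaves a safety margin.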

Figure 3

In Figure 3, the sinusoid represents the original analog sound. The large black dots are where we
read our samples. Note that we take two samples in each period, i.e. the sampling rate is two
times the frequency of the sound. This is the theoretical minimum that will allow us to reconstruct
a signal that is still comprehensible (in practice the rate has to be slightly higher, because
samples that fall exactly on the zero crossings would all be zero). It certainly won't be hi-fi
sound, but it will have the correct frequency (see the thin black lines in the picture).

Figure 4

Figure 4 shows a situation where we take fewer than two samples per period. The thin black
lines show what would happen after we feed the samples into a digital-to-analog converter: we
would hear something different from the original, a sound with a lower frequency. This problem is
known as "aliasing" since the lower frequency appears to be an "alias" of the original correct one.
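
Aliasing is easy to reproduce with numbers. The sketch below assumes a pure 5 kHz tone sampled at 8 kHz (both values are chosen only for illustration) and shows that its samples cannot be told apart from those of a 3 kHz tone.

```python
import math

def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency a sampled pure tone appears to have after reconstruction."""
    nearest_multiple = sample_rate_hz * round(tone_hz / sample_rate_hz)
    return abs(tone_hz - nearest_multiple)

def sample_tone(tone_hz, sample_rate_hz, count):
    """Read `count` samples of a cosine wave with the given frequency."""
    return [math.cos(2 * math.pi * tone_hz * n / sample_rate_hz) for n in range(count)]

sample_rate = 8000
original = 5000                                   # above half the sampling rate
alias = alias_frequency(original, sample_rate)    # 3000

too_high = sample_tone(original, sample_rate, 16)
lower = sample_tone(alias, sample_rate, 16)
same = all(abs(a - b) < 1e-9 for a, b in zip(too_high, lower))
print(alias, same)
```

Because the two sample sequences are identical, a digital-to-analog converter has no way of knowing that the original tone was 5 kHz and plays back 3 kHz instead.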

Summary

In this piece, we discussed the conversion of voice to digital form. We considered the influence
of the sampling frequency and of the sample size. It's good to remember that VoIP most frequently
works with a sampling frequency of 8 kHz, with each sample stored in 16 bits.
