
REAL TIME DATA TRANSMISSION OVER GSM VOICE CHANNEL FOR SECURE VOICE & DATA
APPLICATIONS

N.N. Katugampala, K.T. Al-Naimi, S. Villette, and A.M. Kondoz


University of Surrey, United Kingdom
Email: a.kondoz@eim.surrey.ac.uk

ABSTRACT

This paper describes a real time prototype implementation of a system
that enables secure voice and data communication over the GSM voice
channel. The security of GSM is not guaranteed, especially over the core
network. The proposed system modulates digital data, which may be
encrypted, onto speech-like waveforms. The modulated waveform is then
transmitted over the GSM voice channel and can be demodulated and
decrypted at the receiver.

The real time prototype system has been tested on GSM-to-GSM voice calls,
and a proprietary speech codec is used on the real time data channel to
produce communication quality speech. A demonstration will be provided at
the presentation.

1. INTRODUCTION

The GSM system ensures subscriber identity confidentiality, subscriber
authentication, and confidentiality of user traffic and signalling. The
ciphering algorithms used in GSM [Lo and Chen 1] have proved to be
effective in ensuring traffic confidentiality. However, traffic
confidentiality is only ensured across the radio access channel. Voice
traffic is transmitted across the core circuit switched networks 'in
clear' in the form of PCM or ADPCM speech, which opens up the possibility
of unauthorised access to GSM-to-GSM or GSM-to-PSTN conversations.
Moreover, the security over the GSM speech channel is controlled by the
network operator, not the end user. Control by the end user may be
preferable in some applications. For guaranteed end-to-end security the
speech signal must be encrypted before entering the communications system.

Although the GSM data channel can be used for encrypted speech
transmission, this approach suffers from a number of disadvantages. The
GSM data channel has interoperability problems, especially across
international networks [Street 2]. The GSM data channel typically requires
28-31 seconds to establish a connection, of which approximately 18 seconds
are taken up by the GSM modem handshaking time. In addition, the GSM data
channel may use automatic repeat request (ARQ) for error correction,
achieving zero errors at the expense of increased delay. The average
round-trip time of the GSM data channel is 0.5 seconds [Challans et al 3].
This value depends upon the size of the packets transmitted. In practice
this translates into a delay that exceeds the ITU-T specification of
150 ms for one-way transmission time in telephony services [ITU-T G.114
4]. Although the proposed 3GPP standards specify the provision of
low-latency data bearer channels, which could be used for end-to-end
secure communications or telemetry operations, the deployment dates of
such systems are as yet uncertain, and it will be quite some time before
3G mobile systems are ubiquitously available.

On the other hand, the use of encryption on the speech channel is not
straightforward. The GSM terminal has a speech compression/decompression
process for efficient use of the bandwidth, and this is heavily based on
the assumption that the input signal will be speech. It uses the usual
speech production model parameters, such as pitch and vocal tract model
parameters, to compress the input speech efficiently. If the speech signal
is encrypted before it reaches the encoding block, it will be randomised
by the encryption process, will not satisfy the expected speech
characteristics, and hence will fail to go through the GSM speech
transcoding process with sufficient accuracy. A method was presented in
which, after the encryption process, the resultant bits are modulated back
onto speech-like waveforms that possess the required speech
characteristics [Katugampala et al 5]. This paper presents the progress
made since that publication [5], in terms of testing on public GSM network
voice calls, additional problems encountered, proposed solutions, and a
real time prototype of the system.

2. VOICE DATA TUNNELLING

The standard modems used in PSTN are not suitable for compressed low bit
rate speech channels. The main objective of speech compression is to
reduce the number of bits required to represent speech, whilst still
retaining an acceptable speech quality level [Kondoz 6]. A side-effect of
this approach is that the resulting synthesised speech, whilst
perceptually being similar to
the input speech, i.e. it sounds very similar to the original, may have a
fairly different waveform on a sample-by-sample basis. This objective
difference prevents most data modems from operating over channels that
employ speech compression systems.

This problem is compounded by the fact that in many networks, and in
particular in mobile communication systems, the speech signal may undergo
more than one set of compression/decompression stages, a phenomenon known
as tandeming.

Therefore a different modem was designed for low bit rate speech channels
[5]. This modem can be used to transmit any form of general digital data,
e.g. encrypted speech. Figure 1 depicts the relationship between the
modulator, the demodulator, and the transmission path in a low bit rate
voice communication system.

[Figure 1: Modulation over the speech channel of a communications network.
Block diagram. Transmitter: input data -> data modulator -> speech-like
waveform -> speech compression -> compressed 'speech'. Communications
network: data may be subjected to bit errors and packet/block loss.
Receiver: compressed 'speech' -> speech decompression -> speech-like
waveform -> data demodulator -> output data.]

Figure 2 depicts a more detailed example for the GSM system. The example
shown is a typical mobile terminal to mobile terminal communications path.
The input speech signal is first compressed using a very low bit rate
speech encoder [Stefanovic et al 7], e.g. 1.2 kbps, in order to fit within
the available bandwidth of the voice data tunnel. The output bit stream of
the speech encoder can be encrypted. The encrypted speech data is fed into
the modulator, which converts it into a speech-like waveform to feed into
the GSM handset. The GSM speech encoder in the handset compresses the
modulated waveform. The resulting digital bit stream is transmitted over
the communications channel, which includes a radio link, a speech decoder
at the base station, a core transmission network, a speech encoder at the
second base station, and a downlink radio channel. The bit stream is
received by the decoder of the receive terminal, which converts it back to
a speech-like waveform. The transcoding that takes place within the
network causes the waveform generated by the decoder to differ from that
produced by the modulator at the transmit end. The demodulator is still
able to extract the original transmitted data. For simplicity only simplex
communication is illustrated; however, full duplex secure voice
communication is possible using the same techniques.

[Figure 2: Overview of the complete system. Block diagram. Transmit add-on
module, to be connected to a standard GSM handset: input speech -> speech
encoder -> data encryption -> data modulator -> speech-like waveform.
Network path: speech encoder -> Base Station Subsystem -> GSM to PSTN ->
64 kbps PCM waveform -> PSTN to GSM -> Base Station Subsystem -> speech
decoder. Receive add-on module, to be connected to a standard GSM handset:
speech-like waveform -> data demodulator -> data decryption -> speech
decoder -> output speech.]

3. SIMULATIONS

The results presented in [5] were obtained from software simulations
written in C. The simulation included a modulator, a double tandem GSM EFR
[ETSI GSM 06.60 8] speech transcoding process, and a demodulator. Figure 3
depicts a modulated waveform segment and the signal received at the
demodulator in the simulations.

However, the real GSM voice channel proved to be more challenging, even
under error free conditions. The GSM system includes additional components
such as Automatic Gain Control (AGC), Voice Activity Detectors (VAD), and
various filters. These components further degrade the performance, in
addition to the double tandem speech transcoding. The modulated signal is
monitored for sections without much variation and modified so that
triggering of the VAD is avoided. The initial modem [5] needed significant
modifications in order to take the above problems into account.
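The monitoring step mentioned in Section 3, in which the modulated signal is checked for sections without much variation, could be sketched roughly as follows. The paper does not say how variation is measured or how the signal is then modified, so everything here is an assumption made for illustration: variation is taken to be the spread of per-frame energy, and the frame length (160 samples, i.e. 20 ms at 8 kHz), the 1 dB spread threshold, and the minimum run of 5 frames are hypothetical values. Only the detection step is shown.

```python
import math

FRAME_LEN = 160  # assumption: 20 ms frames at 8 kHz sampling


def frame_energies_db(samples, frame_len=FRAME_LEN):
    """Return the mean-square energy of each complete frame, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # Small offset avoids log10(0) on all-zero frames.
        energies.append(10.0 * math.log10(mean_sq + 1e-12))
    return energies


def low_variation_runs(energies_db, max_spread_db=1.0, min_frames=5):
    """Find runs of consecutive frames whose energies stay within a small spread.

    Returns a list of (first_frame, last_frame) index pairs, inclusive.
    The thresholds are hypothetical, not values from the paper.
    """
    runs = []
    n = len(energies_db)
    start = 0
    while start < n:
        lo = hi = energies_db[start]
        end = start
        # Extend the run while the min-to-max energy spread stays small.
        while end + 1 < n:
            nxt = energies_db[end + 1]
            new_lo, new_hi = min(lo, nxt), max(hi, nxt)
            if new_hi - new_lo > max_spread_db:
                break
            lo, hi = new_lo, new_hi
            end += 1
        if end - start + 1 >= min_frames:
            runs.append((start, end))
            start = end + 1
        else:
            start += 1
    return runs
```

Frames flagged in this way would be candidates for the modification step, whose details are not given in the paper.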
[Figure 3: Synthesised and received speech-like waveforms. Two panels,
synthesised signal and received signal; amplitude against time in samples
at 8 kHz, from 0 to 750.]

4. REAL TIME PROTOTYPE

There are several methods of interfacing the modem to the service access
point of the communications network, e.g. a GSM handset.

1. The modem, encryption/decryption, and the speech codec may be
   implemented on a personal digital assistant (PDA) with a GSM
   connection. The modulated waveform could then be copied directly onto
   the GSM voice buffer.
2. The secure voice system is implemented as a separate add-on module and
   the interface is provided by a Bluetooth audio link.
3. The secure voice system is implemented as a separate add-on module and
   the interface is provided by cables using the hands free sockets of the
   GSM handsets.

It should be noted that Bluetooth provides a digital connection, while the
hands free cables provide an analogue connection. Analogue connections add
extra distortion and perform worse than digital connections. An integrated
PDA implementation, which directly accesses the GSM voice buffers, will
not add any distortion due to the interface.

A real time prototype of the system was implemented on two desktop
personal computers (PC). Each PC used one 2 GHz Intel Pentium Xeon
processor and 2 GB of RAM, running the Microsoft Windows XP operating
system. The interface to the GSM handsets was provided using hands free
cables. Creative Sound Blaster Audigy 2 sound cards and various standard
Nokia handsets were used. Microsoft Visual C++ library functions were used
to read from and write to the sound cards. Once the complexity reduction
techniques currently being investigated are implemented, the complete full
duplex secure voice system is expected to run on a modern PDA, e.g. one
with a 200 to 400 MHz processor.

The present demonstrator is a simplex system, which can be extended to a
full duplex system. In order to achieve full duplex secure communication,
practical issues such as side tone cancellation and two-wire to four-wire
transformation need to be considered, depending on the interface and the
telephone connection used.

4.1 Synchronisation

Synchronisation of the frame boundaries is achieved by using a different
modulated signal with a much lower data rate (400 bps). This
synchronisation signal is derived from a known set of data stored at both
the modulator and the demodulator, and is transmitted at the start of
transmission. This signal passes through a GSM voice call with virtually
no errors.

An additional problem with the analogue connections is the drifting of the
digital samples of the modulated signal, caused by the difference between
the transmitter and receiver sound card clock rates. This problem was
solved by implementing an additional function to continuously monitor the
frame boundaries.

4.2 End-to-end delay

The extra end-to-end delay introduced by the system stays reasonable:
95 ms for the algorithmic delay of the 1.2 kb/s speech coder plus 40 ms
for the modulation/demodulation process give an overall extra delay of
135 ms in addition to the normal GSM speech channel delay. This is
significantly less than the delay of the GSM data channel. As a result the
proposed system provides a better quality of service than the existing
systems, in addition to the improved security.

5. RESULTS ON GSM-TO-GSM VOICE CALLS

Table 1 shows the results obtained on GSM-to-GSM cross network voice calls
on UK public networks, namely Vodafone and O2. This is the most
challenging scenario for the proposed system, since GSM-to-GSM calls
undergo double tandem speech transcoding. The system works better on
GSM-to-PSTN, PSTN-to-GSM, or PSTN-to-PSTN connections, because only one or
no speech transcoding stage is involved.
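The bit error rate (BER) and frame error rate (FER) figures reported for these calls follow the usual definitions: the fraction of received bits that differ from the transmitted bits, and the fraction of frames containing at least one bit error. A minimal sketch is given below, assuming the transmitted and received bit streams are already aligned and of equal length. The function names and the `frame_bits` parameter are hypothetical, since the paper does not state the frame size used for the FER measurement.

```python
def bit_error_rate(sent, received):
    """Fraction of bit positions where the received bit differs from the sent bit."""
    if len(sent) != len(received) or not sent:
        raise ValueError("bit streams must be non-empty and of equal length")
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)


def frame_error_rate(sent, received, frame_bits):
    """Fraction of complete frames that contain at least one bit error.

    frame_bits is a hypothetical parameter; any trailing partial frame
    is ignored.
    """
    if len(sent) != len(received):
        raise ValueError("bit streams must be of equal length")
    frames = len(sent) // frame_bits
    if frames == 0:
        raise ValueError("need at least one complete frame")
    bad = 0
    for i in range(frames):
        lo, hi = i * frame_bits, (i + 1) * frame_bits
        if sent[lo:hi] != received[lo:hi]:
            bad += 1
    return bad / frames
```

Multiplying either result by 100 gives the percentage form used in Table 1.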
TABLE 1: Results on GSM-to-GSM voice calls

                    Before channel decoding    After channel decoding
Interface           Rate (kbps)   BER (%)      Rate (kbps)   BER (%)   FER (%)
Digital/Analogue    3.0           2.9          1.7           0.40      1.8
Digital/Analogue    3.0           2.9          1.2           0.03      0.2

In order to avoid the potential problems associated with analogue
interfacing at the transmitter side, a digital interface was simulated by
copying a modulated waveform file onto a GSM handset using Bluetooth, and
playing the file while on a call to a second GSM handset. The second
handset was connected via hands free cables to a PC, which analyses the
received signal and plays the speech in real time. This process transmits
the modulated signal on a GSM-to-GSM call; however, the modulated signal
was transferred to the handset as Bluetooth data.

A 1/2 rate convolutional code with a constraint length of 7 is used to
derive the 1.2 kbps rate. The same code is used with puncturing to derive
the 1.7 kbps rate. A modern 1.2 kbps proprietary speech codec [7],
[Villette et al 9] is used, providing communication quality speech across
the secure voice channel. The speech codec can tolerate these error rates
without noticeably degrading the output speech quality.

6. CONCLUSION

A real time prototype system has been implemented, which enables
end-to-end secure voice communications over the GSM voice channel. The
secure voice system compresses the speech to reduce the bit rate, may
encrypt the resulting bit stream to provide security, and modulates it
onto speech-like waveforms so that it passes through the GSM speech
transcoding process.

The secure voice system has been tested on GSM-to-GSM voice calls. A
throughput of 3 kbps has been achieved with a 2.9 % bit error rate (BER).
With the addition of error correcting codes, a throughput of 1.2 kbps with
0.03 % BER and 0.2 % frame error rate (FER) has been obtained. A modern
1.2 kbps proprietary speech codec, which can tolerate these error rates,
is used to produce communication quality speech. It is shown that
end-to-end secure communication over the GSM voice channel is achievable.
A demonstration of the system on a GSM-to-GSM call with the 1.2 kbps
speech codec will be provided at the presentation.

7. REFERENCES

[1] C. Lo and Y. Chen, November 1999, “Secure communication mechanisms for
GSM networks”, IEEE Transactions on Consumer Electronics, Vol. 45, No. 4,
pp. 1074-1079.

[2] M. Street, February 2003, “Interoperability and international
operation: An introduction to end to end mobile security”, IEE Secure GSM
and Beyond: End to End Security for Mobile Communications, London.

[3] P. Challans, R. Gover, and J. P. Thorlby, February 2003, “End to end
data bearer performance characterisation for communications over wide area
mobile networks”, IEE Secure GSM and Beyond: End to End Security for
Mobile Communications, London.

[4] ITU-T Recommendation G.114, May 2000, “One-way transmission time”.

[5] N. Katugampala, S. Villette, and A. M. Kondoz, February 2003, “Secure
voice over GSM and other low bit rate systems”, IEE Secure GSM and Beyond:
End to End Security for Mobile Communications, London.

[6] A. Kondoz, 1994, “Digital speech: coding for low bit rate
communication systems”, J. Wiley, New York.

[7] M. Stefanovic, Y. D. Cho, S. Villette, and A. M. Kondoz, September
2000, “A 2.4/1.2 kb/s speech coder with noise pre-processor”, Proceedings
of EUSIPCO 2000, Tampere, Finland, pp. 4-8.

[8] ETSI Standard GSM 06.60, March 1997, “Digital cellular
telecommunications system; Enhanced Full Rate (EFR) speech transcoding”.

[9] S. Villette, K. Al-Naimi, C. Sturt, A. M. Kondoz, and H. Palaz,
October 2002, “A 2.4/1.2 SB-LPC based speech coder: the Turkish NATO
STANAG candidate”, Proceedings of the IEEE Speech Coding Workshop 2002,
Tsukuba, Japan.
