Automated Diagnosis For UMTS Networks Using Bayesian Network Approach
Abstract—This paper presents an automated diagnosis in troubleshooting (TS) for Universal Mobile Telecommunications System (UMTS) networks using a Bayesian network (BN) approach. An automated diagnosis model is first described using the Naïve Bayesian Classifier. To increase the performance of the diagnosis model, the entropy minimization discretization (EMD) method is incorporated into the model to select optimal segments for the discretization of the input symptoms. In the first phase, the diagnosis model is constructed using a dynamic simulator. The simulator TS platform allows generation of a large amount of data required to study the relations between faults and symptoms. In the second phase, the diagnosis model is adapted to a real UMTS network using counters and key performance indicators (KPIs) recovered from an Operations and Maintenance Center (OMC). Results for the automated diagnosis using both network simulator and real UMTS network measurements illustrate the efficiency of the proposed TS approach and its importance to mobile network operators.

Index Terms—Automated diagnosis, Bayesian networks (BNs), entropy minimization discretization (EMD), faults, symptoms, troubleshooting (TS), Universal Mobile Telecommunications System (UMTS) network.

Manuscript received February 21, 2007; revised May 28, 2007, September 17, 2007, and October 3, 2007. The review of this paper was coordinated by Prof. Y.-B. Lin.
R. M. Khanafer, J. Triola, and Z. Altman are with France Telecom R&D, 92794 Issy les Moulineaux, France (e-mail: rana.khanafer@orange-ftgroup.com; jordi.triolabosch@orange-ftgroup.com; zwi.altman@orange-ftgroup.com).
B. Solana is with Telefónica I+D, 28043 Madrid, Spain (e-mail: solana@tid.es).
R. Barco and P. Lázaro are with ETSI Telecomunicación, University of Málaga, 29071 Málaga, Spain (e-mail: rbm@ic.uma.es; plazaro@ic.uma.es).
L. Moltsen was with Moltsen Intelligent Software, 9220 Aalborg, Denmark. He is now with Wirtek, 9220 Aalborg, Denmark (e-mail: lars.moltsen@Wirtek.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVT.2007.912610

I. INTRODUCTION

THE MOBILE telecommunication industry has experienced significant changes in the recent past, and it will continue to evolve in the foreseeable future. The current scenario comprises a complex set of interrelated and rapidly growing wireless networks, applications that require increasing bandwidth, and users who demand high quality of service at low cost but with a limited spectrum. In a few years, the highly complex and heterogeneous Radio Access Network (RAN) will comprise different technologies, such as the Global System for Mobile Communications (GSM), the Universal Mobile Telecommunications System (UMTS), and the Wireless Local Area Network (WLAN). As a result, the operation of the RAN will be a tough challenge that the operators will have to tackle. In addition, cellular network operators have to find ways to reduce the cost of their services and improve their quality to counter the threat posed by emerging technologies, such as telephony based on the WLAN.

In a mature cellular network that has undergone most of its site roll out, the major cost is associated with the operation of the network. As the network consists of a high number of pieces of equipment that are distributed across the entire country, maintaining and operating this large and technically complicated system is a difficult task that requires operator personnel around the clock in several regional offices. Even with reliable hardware and software, there are always faults that have to be rectified as otherwise, the end user will either experience suboptimal service levels or no service at all. As in most countries, several operators are competing for subscribers, and it is imperative to quickly rectify such occurrences because otherwise, users will naturally switch to competing network operators. Hence, fault management, also called troubleshooting (TS), is a key aspect of the operation of a cellular system in a competitive environment. As the RAN of cellular systems is by far the biggest part of the network, most TS activities are focused on this area.

TS comprises the following three tasks: 1) fault detection (FD); 2) cause diagnosis (i.e., identification of the problems’ cause); and 3) solution deployment, namely fixing the problem. Among the TS tasks, the diagnosis of the cause of faults is the most complex and time-consuming one. A cause could be a hardware failure (like a broken base-band card in a node B) or a bad parameter value (i.e., transmission power, antenna tilt, or a control parameter such as a Radio Resource Management (RRM) parameter). The term symptom refers to indicators that may help to identify the fault cause. There are two types of symptoms, i.e., counters and/or key performance indicators (KPI), and alarms.

The first steps in the automation of the TS process in cellular networks have been focused on performance visualization [1] and FD [2]–[5]. Regarding automatic diagnosis, very few references can be found on the diagnosis in the RAN of cellular networks. However, automatic diagnosis has been extensively studied in other fields, such as diagnosis of diseases in medicine [6], TS of printer failures [7], and diagnosis in the core of communication networks [8]. However, diagnosis in the RAN of cellular networks has some distinctive characteristics, such as the continuous nature of performance indicators and the existence of logical faults, such as a wrong configuration, that
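\[
\mathrm{Ent}(R) = -\sum_{k=1}^{K} \frac{|R(c_k)|}{|R|} \cdot \log_2 \frac{|R(c_k)|}{|R|} \tag{3}
\]

\[
\mathrm{Ent}(D, m_j, S_i) = \frac{|D_1|}{|D|} \cdot \mathrm{Ent}(D_1) + \frac{|D_2|}{|D|} \cdot \mathrm{Ent}(D_2) \tag{4}
\]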
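As an illustration of how (3) and (4) can drive the choice of a threshold, the following Python sketch searches the candidate cut points of one symptom and keeps the one with the lowest weighted entropy. It is only a sketch under stated assumptions: the function names (`class_entropy`, `best_cut`, `emd_thresholds`) are hypothetical, candidate cut points are taken as midpoints between consecutive distinct values, and the recursion stops at a fixed depth rather than with a statistical stopping rule.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Ent(R) as in (3): entropy of the cause labels found in subset R."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut point, weighted entropy) minimizing (4), or None if no cut exists."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical symptom values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        ent = (len(left) / n) * class_entropy(left) \
            + (len(right) / n) * class_entropy(right)
        if best is None or ent < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, ent)
    return best


def emd_thresholds(values, labels, depth=2):
    """Split recursively `depth` times (assumed fixed depth), returning sorted cuts."""
    if depth == 0 or len(set(values)) < 2:
        return []
    cut, _ = best_cut(values, labels)
    low = [(v, lab) for v, lab in zip(values, labels) if v <= cut]
    high = [(v, lab) for v, lab in zip(values, labels) if v > cut]
    cuts_low = emd_thresholds([v for v, _ in low], [lab for _, lab in low], depth - 1)
    cuts_high = emd_thresholds([v for v, _ in high], [lab for _, lab in high], depth - 1)
    return sorted(cuts_low + [cut] + cuts_high)
```

With a depth of two, the sketch yields at most three thresholds, that is, up to four states per symptom.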
III. AUTOMATED DIAGNOSIS FOR UMTS RAN USING A NETWORK SIMULATOR

The diagnosis model for a UMTS network requires determining appropriate causes and the available indicators (symptoms) to efficiently infer causes of faults. Data from the network are “expensive,” and constructing a significant statistical database requires precious time from radio experts. A dynamic network simulator can produce the large amount of “cheap” data needed to adapt the BN diagnosis model to the new technology. It allows determination of the data requirements to construct an accurate diagnosis model, i.e., number of case studies, the symptoms associated with each cause, etc. A simulator can be used to study only a subset of problems that can occur in a real network: causes related to antenna and system parameters (i.e., common channel power, maximum base station (BS) transmitted power, neighboring list declaration) and causes related to different RRM functionalities, such as admission and congestion control or mobility. The simulator used in this paper is a semidynamic simulator based on correlated snapshots with time intervals on the order of 1 s [18].

A. Model Construction

Although one can benefit from the TS experience in GSM [12], [13], the particularities of the UMTS technology should be taken into account when developing the diagnosis model. For example, UMTS is an interference-limited system in which interference can produce coupling effects between neighboring cells. Hence, a faulty cell can considerably reduce the performance in a neighboring cell.

A reduced model has been built to prove the feasibility of the techniques proposed in this paper. The causes and symptoms in the model are the following ones.

1) Causes: Two types of fault causes are considered, i.e., hardware problems and bad parameter values, which result in poor quality indicators and alarms. A parameter value could be too big or too small (with respect to an optimal value); these cases are denoted as Par+ and Par−, respectively. The faults considered by the simulator are the following.

Channel element breakdown: This cause is a hardware problem. One or several channel elements in a node B could be out of service.

Pilot power: A Common Pilot Channel (CPICH) power that is too high (Pilot+) will overextend the service zone of the node B, which thus becomes overloaded. Conversely, a pilot power that is too low (Pilot−) will decrease the cell extent too much, reduce its load, and push traffic to neighboring cells.

Antenna tilt: As in the pilot case, tilt+/− affects the cell extent, its load, and that of its neighbors. An up-tilted antenna will create interference in neighboring cells and will deteriorate their performance, whereas down-tilted antennas will reduce the cell range and may cause coverage holes.

Mobility parameters: Mobility parameters are of particular importance in mobile networks. The hysteresis events 1A and 1B, or add and drop windows, respectively, for soft handover (HO), are considered here. The add window defines the threshold for adding a new link to the active set of a mobile, and the drop window defines the threshold for removing an existing link. For simplicity of notation, the add and drop window parameters with too high and too low values are denoted by RRM_MD+ and RRM_MD−, respectively. It is supposed that the difference between the add and drop windows is kept constant, as typically implemented in real networks. The add and drop windows of a given cell have an impact on the creation and suppression of links of its neighboring cells. In other words, poor quality in one cell could be caused by a fault in another cell. Hence, one needs to keep track of certain symptoms of a BS and of its neighbors, making the TS problem more complex. In a similar manner, one could consider other RRM parameters, such as admission and congestion control parameters.

2) Symptoms: In the context of a simulator, alarms can be defined using both counters and KPIs. When a KPI value exceeds a predetermined threshold, an alarm can be triggered. In a real network, such an alarm would correspond to a “flag” raised when processing data from the Operations and Maintenance Center (OMC) or capture tool. The following symptoms are considered.
— blocked call rate (BCR);
— dropped call rate (DCR);
— MD blocking rate. If a request to establish a new (additional) link with a BS is denied, it is considered as MD blocking. The ratio between the number of MD blockings and the total number of requests to establish additional links defines the MD blocking rate. The MD blocking rate of a BS is calculated here as the average blocking rate of all the mobiles having this station as the best server;
— capacity/throughput. For real-time traffic, capacity is given in terms of the number of mobiles per service. For nonreal-time traffic, the downlink throughput is used as a capacity indicator;
— Ping-pong. The ping-pong KPI is calculated as the frequency of active set updates.

B. Case Study

This section illustrates the application of the BN model for the automated diagnosis of UMTS networks using the semidynamic network simulator. Hence, data from the simulator are used to adapt and fine tune the reference model and to assess the amount of data necessary for accurate diagnosis.

1) Simulation Setup: The causes and symptoms used for constructing the diagnosis model are listed below.
Causes:
— channel element breakdown in a BS (hardware fault);
— bad settings of system and RRM parameters: CPICH power, antenna tilt, and mobility parameters (add and drop windows).
In the construction of the Bayesian model, parameters that are too high or too low are considered to be distinct fault causes. In addition, a “normal” state has been included in the Cause node, which stands for nonfaulty cells triggering an alarm.
Symptoms:
— BCR;
— DCR;
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA. Downloaded on November 04,2023 at 09:07:43 UTC from IEEE Xplore. Restrictions apply.
2456 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 57, NO. 4, JULY 2008
Fig. 5. One hundred ninety-six sector UMTS network for testing the automated diagnosis model. A buffer zone of 151 sectors is added to minimize truncation effects.
Fig. 6. BCR histograms for (black) nonfaulty and (white) faulty cells due to excess of pilot power.
— MD blocking rate;
— capacity;
— ping-pong indicator.
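The cause and symptom lists of the simulation setup can be written down as the variables of the model. The short Python sketch below does this under stated assumptions: all identifiers are hypothetical, each parameter fault is split into a too-high and a too-low state as the text describes, and every KPI is assumed to appear once for the serving cell and once as a neighboring-cell value.

```python
# Hypothetical identifiers; the lists mirror the simulation setup in the text.
FAULT_CAUSES = [
    "channel_element_breakdown",  # hardware fault
    "Pilot+", "Pilot-",           # CPICH power too high / too low
    "tilt+", "tilt-",             # antenna tilt too high / too low
    "RRM_MD+", "RRM_MD-",         # add/drop windows too high / too low
]

# The Cause node also has a "normal" state for nonfaulty cells raising an alarm.
CAUSE_STATES = FAULT_CAUSES + ["normal"]

SYMPTOMS = ["BCR", "DCR", "MD_blocking_rate", "capacity", "ping_pong"]


def symptom_nodes(symptoms):
    """One node per KPI for the serving cell and one for its neighbors (assumed)."""
    return [f"{scope}_{name}" for name in symptoms
            for scope in ("serving", "neighbor")]


NODES = symptom_nodes(SYMPTOMS)
```

Counting the entries gives seven fault causes, eight states of the Cause node, and ten symptom nodes.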
To improve the model accuracy and to take into account the
interference-related coupling effects between neighboring BSs
[19], the KPIs of neighboring cells (from the list above) have
also been included in the BN model.
A 66 trisectorial site network in a dense urban environment
is considered (with a total of 196 sectors, with two sites having
only two sectors). A buffer zone with 151 sectors is added to
minimize the truncation effects and is used in all the simulations
for network evaluation (see Fig. 5).
Fig. 7. MD blocking call rate histograms for (black) nonfaulty and (white) faulty cells due to excess of pilot power.

For each given cell and each selected KPI, a neighboring KPI has also been computed by averaging the values of that KPI over the four neighbors having the highest traffic flux with that cell. Each KPI (either neighboring or serving cell) has been averaged over 3000 time steps of 1 s in the simulator. The statistics for cells with faults have been calculated as follows. Fifteen simulations for each one of the seven faults have been carried out. In each one of these simulations, one single fault was introduced into 14 of the 196 cells. Thus, 210 (= 14 cells/simulation · 15 simulations) data points per fault have been stored in the data set. Finally, 210 nonfaulty cells (normal conditions) having triggered an alarm along the 105 different simulations have also been added to the whole data set.

In BN inference, the symptoms’ histograms for both normal and faulty cells are of particular interest and are illustrated below. Fig. 6 compares the BCR histograms for the normal and faulty cells in the case of a pilot value (denoted as Pilot+) of 38 dBm that is too high (33 dBm is considered as a normal value). A shift to the right of the Pilot+ histogram indicating quality degradation can be clearly noticed in the figure.

The histogram for the MD blocking rate for the normal and faulty cells in the case of a pilot value that is too high is depicted in Fig. 7. As in the BCR case, the excess of pilot power results in shifting the histogram to the right.

2) Diagnosis Performance Using Two Different Discretization Methods: The discretization techniques and the performance evaluation are now considered. Two different discretization methods have been analyzed, i.e., an unsupervised method (the PBD) and a supervised one (the EMD). The threefold cross-validation statistical test [20] has been selected to compute the diagnosis accuracy of both discretization techniques. Thus, the entire diagnosis workflow (discretization, model probabilities estimation, and performance evaluation) has been repeated three times, using different pairs of training and test sets at each iteration. The final results are calculated by averaging the performances from the three iterations.

For comparison purposes, both discretization techniques use a four-state segmentation of all the symptoms, having for each KPI the states labeled “low,” “normal,” “high,” and “very high.” Hence, for the EMD method, the algorithm described in Section II-C has been recursively applied twice, whereas for the PBD, the 70th, 80th, and 90th percentiles of each symptom distribution have been considered to find the three thresholds. For both methods and for each iteration, a training set of 140 data points (cases) per fault has been used, namely two thirds of the whole data set of 210 cases.

Once the discretization process has been performed, the estimation of the BN parameters, i.e., the prior probabilities and the conditional ones in (4), is carried out by means of the method described in Section II-D.

Table I summarizes the discretization obtained for the BCR symptom using both methods and their associated conditional probabilities related to “normal” and Pilot+ faulty cells,
i.e., P(S_i = s_{i,j} | “normal”) and P(S_i = s_{i,j} | Pilot+), respectively, where in this case, S_i is the BCR symptom. This table is derived from the histogram previously depicted in Fig. 6.

TABLE I
CONDITIONAL PROBABILITY TABLE FOR NORMAL AND FAULTY CELLS IN THE CASE OF EXCESS PILOT POWER PROBLEM, i.e., P(BCR|“normal”) AND P(BCR|Pilot+), RESPECTIVELY, USING (A) EMD AND (B) PBD

TABLE II
DIAGNOSIS ACCURACY COMPARISON USING PBD AND EMD DISCRETIZATION TECHNIQUES

TABLE III
DIAGNOSIS RESULTS: THE UPPER PART CORRESPONDS TO THE PERCENTAGE OF CORRECT DIAGNOSIS OF FAULTY CELLS, AND THE LOWER PART CORRESPONDS TO THE CORRECT DETECTION OF FAULTY TRIGGERED ALARMS

Fig. 8. Entropy versus boundary cut points for the neighboring ping-pong symptom.
Fig. 9. Entropy versus boundary cut points for the DL throughput symptom.
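A conditional probability table such as Table I can be read as a normalized histogram of discretized symptom values per cause. The Python sketch below shows one way such a table could be estimated and then used to rank causes. It rests on several assumptions that are not confirmed by this section: the function names are hypothetical, the smoothing constant `alpha` is an illustrative choice and not the estimation method of Section II-D, and the ranking assumes that (2) has the usual naive Bayes form (prior times the product of per-symptom conditional probabilities).

```python
from bisect import bisect_right
from collections import Counter, defaultdict

STATES = ["low", "normal", "high", "very high"]


def discretize(value, thresholds):
    """Map a continuous KPI value to one of four states using three sorted thresholds."""
    return STATES[bisect_right(thresholds, value)]


def estimate_cpt(cases, thresholds, alpha=1.0):
    """Estimate P(S = state | cause) from (cause, kpi_value) training cases.

    alpha is an assumed smoothing constant so that unseen states
    do not receive a probability of exactly zero.
    """
    counts = defaultdict(Counter)
    for cause, value in cases:
        counts[cause][discretize(value, thresholds)] += 1
    cpt = {}
    for cause, ctr in counts.items():
        total = sum(ctr.values()) + alpha * len(STATES)
        cpt[cause] = {s: (ctr[s] + alpha) / total for s in STATES}
    return cpt


def rank_causes(priors, cpts, observed_states):
    """Rank causes by normalized naive Bayes posterior (assumed form of (2)).

    priors: {cause: P(cause)}
    cpts: {symptom: {cause: {state: probability}}}
    observed_states: {symptom: state}
    """
    scores = {}
    for cause, prior in priors.items():
        p = prior
        for symptom, state in observed_states.items():
            p *= cpts[symptom][cause][state]
        scores[cause] = p
    norm = sum(scores.values())
    return sorted(((c, p / norm) for c, p in scores.items()),
                  key=lambda item: -item[1])
```

Returning the full ranked list, rather than only the top cause, matches the way the results are reported for the first and second most probable causes.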
In the following step, the remaining 70 data points per fault (i.e., one third of the whole data set) are introduced into the Execution module to perform the diagnosis. The final diagnosis is obtained using (2), where the different probabilities are selected from the tables. The table entry is defined by the row with the interval to which the symptom belongs. By computing (2) for all possible causes, one obtains the conditional posterior probabilities of all possible fault causes given the set of symptoms and, in particular, the cause with the highest probability. Table II summarizes the obtained results, where fault diagnosis and false alarm detection illustrate the ability of the system to identify faulty and nonfaulty cells (and not only to diagnose faults), respectively. One can see how the discretization impacts the inference quality. The EMD outperforms the PBD method in terms of diagnosis accuracy for both fault diagnosis and false alarm detection.

3) Detailed Results Using the EMD Method: In this section, the results using the EMD method are presented in more detail. Figs. 8 and 9 show the entropy calculated using (4) as a function of the boundary cut point for two symptoms [i.e., neighboring ping-pong and downlink (DL) throughput] for each one of the two iterations of the algorithm. The continuous gray line represents the entropy evolution as a function of the boundary cut points for the first iteration, whereas the dotted black and dashed gray lines represent the entropy evolution for the first and second subsets of the second iteration of the algorithm, respectively. The minimum value for each curve determines the best threshold and is represented in the figures as a vertical line.

Table III gives the overall performance of the automated diagnosis using the EMD technique with threefold cross-validation test for the first two causes with the highest probabilities. For each one of the threefold validation tests, 490 different faulty cells (70 points for each of the seven faults) have been introduced into the Execution module as test set. On average (computing the three different test sets), 344 among the 490 generated faults have been correctly diagnosed, namely the highest probability has been attributed to the correct fault with 70.2% diagnosis accuracy. In 88 of the remaining 146 cases, the
probability given by the BN to the correct diagnosis has been the second highest probability (i.e., in 88.2% of the cases, the right fault has been the first or the second in the list of causes ranked by probability).

Next, the ability of the system to identify nonfaulty cells (and not only to diagnose faults) has been assessed. Together with 490 faulty cells, 70 different false alarm cells have been present in each one of the three test sets. Among the different simulations, on average, 34 out of the 70 nonfaulty cells triggering alarms have been correctly identified as normal cells without faults, and 16 of the remaining cells have been identified as faulty but with a “normal” second diagnosed state (i.e., 71.4% of these cells have been identified as normal in the first or the second cause ranked by probability).

4) Convergence: The number of cases required in the BN learning phase (i.e., the number of data points for BSs triggering alarms due to faults and false alarms) is an important point. A convergence study has shown that diagnosis results converge from 85 data points onward. Hence, the choice of 140 data points per fault for training is justified.

IV. TS IN REAL UMTS NETWORKS

This section describes the application of automated TS in a real UMTS network. First, the diagnosis model is derived following the three steps below (see Section II-B):
1) identification of the fault’s causes and their associated symptoms;
2) BN construction (structure and number of states);
3) model learning using expert knowledge and data extracted from an OMC (thresholds and probabilities).
Then, FD is performed using the learned model. It should be pointed out that although certain parts of the model creation are automatically performed, the role of the radio expert and the incorporation of the expert knowledge have proven to be essential.

A. Model Creation

The starting point for generating the UMTS TS model is the identification of appropriate symptoms (counters and KPIs) that could help in the FD. Counters and KPIs recovered from the OMC are the sole available information. Once the symptoms are identified, the faults that directly or indirectly have an impact on these counters are selected. Finally, the symptom–fault relation is determined. The proposed model considers a cell to be problematic if the DCR is high, namely higher than 1%.

The identified faults that can cause a high DCR are the following.
— lack of coverage;
— uplink interference;
— lack of 3G neighbors;
— soft HO problem;
— 2G neighbors’ problem
• bad definition of 3G–2G neighbors;
• congestion in 2G neighboring cell;
• lack of 2G neighbors.
— Other: The symptoms’ values included in the model are expected to significantly change if one of the above faults is causing problems. The last cause corresponds to the rest of the faults that could originate a significant DCR, such as the following:
• terminal problems;
• hardware and software problems;
• bad parameter definition.

If new causes are added, additional symptoms and alarms may be needed to achieve a correct diagnosis. The symptoms have been discretized according to thresholds learned using the PBD method (when performing this part of the work, the EMD functionality was not yet available). For that purpose, a database with more than 500 cases has been used.

The symptoms utilized for the diagnosis model are listed below. It is noted that some of the drop call counters are associated with specific causes.
— DCR for speech calls;
— DCR due to missing neighbor relation;
— DCR during soft HO execution;
— DCR due to loss of synchronization with node B;
— DCR due to a different cause (other);
— establishment fail rate for radio resource control connections;
— establishment fail rate of radio access bearer connections for speech calls;
— soft HO fails;
— average number of radio links per cell;
— relocation failures during 3G–2G HO;
— total number of inter-Radio Access Technology (RAT) HO attempts;
— number of call drops during inter-RAT HO;
— received signal strength indicator;
— failure to add the cell to the active set. This indicator is calculated for the four neighboring cells with the highest number of HOs.

B. Learning Cases

The model construction has required manual fault diagnosis of several cells where the correct diagnosis was not previously available. From the diagnosis data, the Knowledge Builder is able to train the model and calculate thresholds and probabilities. The data used to build the model comprise counters and KPIs recovered from the OMC with a daily resolution. Several trials have been carried out (for adjusting thresholds, determining relations between indicators and causes, etc.) until the model convergence has been achieved with 97% correct diagnoses for a learning set of 77 cases.

C. Fault Diagnosis

The last step consists of applying the diagnosis model to a test set of problematic cells to evaluate its performance. Forty-two faulty cells have been selected, and their corresponding symptoms, namely counters and KPIs, have been introduced into the Execution module. Among the 42 selected cells, six to
nine cases have been selected for each type of cause, depending on how representative it is in the entire sample space, namely in the real network. In 88.1% of the cases, the correct fault has been diagnosed. In the remaining cases, i.e., in five of the cells, the diagnosis has been wrong in the first option but correct in the second one. Another important issue is that among the 37 cells correctly diagnosed, for 24 (i.e., for nearly 65%), the right fault has been diagnosed with a probability higher than 90%. These encouraging results illustrate the effectiveness of the BN approach of automated TS. More detailed results for the diagnosis phase are presented and analyzed below. The distribution of faults for the testing set is listed in Table IV. The diagnosis accuracy per fault is depicted in Table V. Table VI presents a closer look at the five cells (denoted “CELL A” to “CELL E”) for which the diagnosis has been wrong for the first diagnosed cause (and correct for the second).

The cases of wrong diagnosis in Table VI hint that the main problems are related to the identification of 2G neighbors and Coverage faults. Further effort should be invested by verifying whether certain symptoms should be removed or if

TABLE V
DIAGNOSIS ACCURACY PER FAULT

TABLE VI
FIVE CASES FOR WHICH DIAGNOSIS IS WRONG FOR THE FIRST CAUSE WITH THE HIGHEST PROBABILITY AND CORRECT FOR THE SECOND ONE

V. CONCLUSION

This paper has presented research on automated TS for UMTS carried out in the framework of the Eureka Celtic Gandalf project. The BN approach has been selected and adapted mainly based on previous work, and again, it has been found particularly effective for the automated diagnosis task. To further enhance automation and performance of the BN model, the automated learning of both KPI thresholds and model probabilities has been thoroughly investigated.

The PBD and EMD methods have been used for threshold setting. It has been shown that the EMD method achieves a discretization of the input symptoms that is closer to optimal than when using the PBD method, since the resulting model shows better performance. In a first phase, the diagnosis model has been studied on a semidynamic UMTS simulator. The simulator allows the generation of a large amount of “cheap” data required to learn the diagnosis model and to relate symptoms to faults. In a second phase, the automated TS model has been adapted to a real UMTS network utilizing counters and KPIs recovered from an OMC. In 88% of the considered cases, the correct fault has been diagnosed, and in the remaining cases, the diagnosis has been wrong in the first option but correct in the second one. These encouraging results illustrate the potential benefit of automated TS for a wireless network operator. Finally, the methodology presented in this paper can be extended to other RANs, including heterogeneous networks and core networks.

ACKNOWLEDGMENT

This work was carried out in the framework of the Eureka Celtic Gandalf project.

REFERENCES

[1] P. Lehtimäki and K. Raivio, “A knowledge-based model for analyzing GSM network performance,” in Proc. Int. Conf. Ind. Eng. Appl. Artif. Intell. Expert Syst., Bari, Italy, Jun. 2005.
[2] J. Laiho, M. Kylväjä, and A. Höglund, “Utilisation of advanced analysis methods in UMTS networks,” in Proc. IEEE Veh. Technol. Conf., Birmingham, AL, May 2002, pp. 726–730.
[3] J. Laiho, K. Raivio, P. Lehtimäki, K. Hätönen, and O. Simula, “Advanced analysis methods for 3G cellular networks,” IEEE Trans. Wireless Commun., vol. 4, no. 3, pp. 930–942, May 2005.
[4] A. J. Hoglund, K. Hatonen, and A. S. Sorvari, “A computer host-based user anomaly detection system using the self-organizing map,” in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Netw., Como, Italy, Jul. 2000, vol. 5, pp. 411–416.
[5] P. Lehtimäki and K. Raivio, “A SOM based approach for visualization of GSM network performance data,” in Proc. Int. Symp. Intell. Data Anal., Madrid, Spain, Sep. 2005.
[6] G. Ng and K. Ong, “Using a qualitative probabilistic network to explain diagnostic reasoning in an expert system for chest pain diagnosis,” in Proc. Comput. Cardiol., Sep. 2000, pp. 569–572.
[7] D. Heckerman, J. Breese, and K. Rommelse, “Decision-theoretic troubleshooting,” Commun. ACM, vol. 38, no. 3, pp. 49–57, Mar. 1995.
[8] M. Steinder and A. Sethi, “Probabilistic fault localization in communication systems using belief networks,” IEEE/ACM Trans. Netw., vol. 12, no. 5, pp. 809–822, Oct. 2004.
[9] H. Wietgrefe, “Investigation and practical assessment of alarm correlation methods for the use in GSM access networks,” in Proc. IEEE/IFIP Netw. Operations Manage. Symp., Florence, Italy, Apr. 2002, pp. 391–403.
[10] R. Barco, V. Wille, and L. Díez, “System for automatic diagnosis in cellular networks based on performance indicators,” Eur. Trans. Telecommun., vol. 16, no. 5, pp. 399–409, Oct. 2005.
[11] P. Langley, W. Iba, and K. Thompson, “An analysis of Bayesian classifiers,” in Proc. 10th Nat. Conf. Artif. Intell., 1992, pp. 223–228.
[12] R. Barco, R. Guerrero, G. Hylander, L. Nielsen, M. Partanen, and S. Patel, “Automated troubleshooting of mobile networks using Bayesian networks,” in Proc. IASTED Int. Conf. CSN, Malaga, Spain, Sep. 2002, pp. 105–110.
[13] R. Barco, L. Nielsen, R. Guerrero, G. Hylander, and S. Patel, “Automated troubleshooting of a mobile communication network using Bayesian networks,” in Proc. IEEE Int. Workshop MWCN, Stockholm, Sweden, Sep. 2002, pp. 606–610.
[14] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA: Morgan Kaufmann, 1988.
[15] R. C. Holte, “Very simple classification rules perform well on most commonly used datasets,” Mach. Learn., vol. 11, no. 1, pp. 63–90, Apr. 1993.
[16] U. M. Fayyad and K. B. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning,” in Proc. 13th Int. Joint Conf. Artif. Intell., 1993, pp. 1022–1027.
[17] Z. Altman, R. Skehill, R. Barco, L. Moltsen, R. Brennan, A. Samhat, R. Khanafer, H. Dubreil, M. Barry, and B. Solana, “The Celtic Gandalf framework,” in Proc. IEEE MELECON, Benalmádena, Spain, May 2006, pp. 595–598.
[18] A. Samhat, Z. Altman, M. Francisco, and B. Fourestié, “Semi-dynamic simulator for large scale heterogeneous wireless networks,” Int. J. Mob. Netw. Des. Innov., vol. 1, no. 3/4, pp. 269–278, 2006.
[19] S. B. Jamaa, H. Dubreil, Z. Altman, and A. Ortega, “Quality indicator matrices and their contribution to WCDMA network design,” IEEE Trans. Veh. Technol., vol. 54, no. 3, pp. 1114–1121, May 2005.
[20] R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in Proc. 14th Int. Joint Conf. Artif. Intell., San Mateo, CA, 1995, pp. 1137–1143.

Jordi Triola was born in Figueres, Spain, in 1981. He received the M.Sc. degree in telecommunications engineering from Universitat Politècnica de Catalunya (UPC), Barcelona, Spain, in 2004, and a radio communications specialization degree from École Supérieure d’Électricité (Supélec), Gif-sur-Yvette, France.
After his training period with SONDRA (a joint laboratory between the National University of Singapore and Supélec), he joined France Telecom R&D, Issy les Moulineaux, France, in 2005. He has participated in different projects on wireless network engineering, including radio network simulators and troubleshooting tools. His research interests include radio mobile network optimization, quality evaluation, and automated troubleshooting techniques.

Raquel Barco received the M.Sc. degree in telecommunication engineering and the Ph.D. degree from the University of Málaga, Málaga, Spain, in 1997 and 2007, respectively.
From 1998 to 2000, she was with the European Space Agency, Darmstadt, Germany. From 2000 to 2003, she worked part-time for Nokia Networks. Since the end of 1999, she has been with the Communication Engineering Department, University of Málaga. Her research interests include satellite and mobile communications, mainly focusing on self-regulation of radio networks.


Rana M. Khanafer received the B.Sc. degree in telecommunication from Saint-Joseph University, Beirut, Lebanon, in 1999, the M.Sc. degree in computer science from the University of Paris 6, Paris, France, in 2001, and the Ph.D. degree in computer science from “École Nationale Supérieure des Télécommunications,” Paris Cedex 13, France, in 2005.
She was an Assistant Professor from 2001 to 2004 and a Research Assistant from 2004 to 2005 with the University of Paris 6. Since 2005, she has been a Research Engineer with France Telecom R&D, Issy les Moulineaux, France, and has participated in different projects on wireless network engineering, performance evaluation, and design of traffic controls for multiservice networks. Her research interests include mobile communications, quality evaluation, and end-to-end QoS in multiservice networks.

Beatriz Solana received the Master Eng. degree in telecommunications engineering (radio communication area) from Madrid Polytechnic University, Madrid, Spain, in 2000.
Since March 2000, she has been with Telefónica I+D, Madrid, where she became an R&D Engineer with the Radio Communication Systems Group, taking part in projects related to planning/optimization tool development for 2G and 3G mobile radio systems. Likewise, she has collaborated in consultancy tasks and support radio planning activities with other companies within the Telefónica Group. She has worked on European projects under the Celtic initiative.

Lars Moltsen received the M.Sc. degree in computer science and mathematics from Aalborg University, Aalborg, Denmark, in 1996.
He is an experienced Entrepreneur, R&D Engineer, and Software Architect, technically specializing in artificial intelligence (Bayesian networks) and mobile technology (3GPP standards, in particular GSM, UMTS, and long-term evolution). From 1996 to 2000, he was with Hugin Expert, Denmark, developing software for Bayesian reasoning. From 2000 to 2003, he was a Research Specialist with Nokia Networks, working on UMTS RRM algorithms, contributing to one patent and a number of conference papers. From 2003 to 2007, he was the Managing Director of his own company, Moltsen Intelligent Software, specializing in wireless communication software and automation of processes. The company grew and was sold in February 2007 to Wirtek, Aalborg, a Danish/Romanian telecom software provider, where he is currently the Business Unit Manager of the software services business unit.

Zwi Altman (SM’98) received the B.Sc. and M.Sc. degrees in electrical engineering from Technion-Israel Institute of Technology, Haifa, Israel, in 1986 and 1989, respectively, and the Ph.D. degree in electronics from the Institut National Polytechnique de Toulouse, Toulouse, France, in 1994.
He was a Laureate of the Lavoisier scholarship from the French Foreign Ministry in 1994, and from 1994 to 1996, he was a Post-Doctoral Research Fellow with the University of Illinois at Urbana-Champaign. Since 1996, he has been with France Telecom R&D, Issy les Moulineaux, France, and has participated in different projects on wireless network engineering. He was the Project Coordinator of the Eureka Celtic Gandalf project that dealt with the automation of management tasks in heterogeneous wireless networks. His research interests include mobile communications, autonomic networking, and data mining.
Dr. Altman was on the winning team of the 2003 Innovation Prize of France Telecom R&D and was the corecipient of the Wheeler Award for the Best Application Paper of the IEEE Antennas and Propagation Society in 2005.

Pedro Lázaro received the M.Sc. degree in telecommunication engineering from the University of Málaga, Málaga, Spain, in 1997.
From 1997 to 1999, he was with the European Space Agency, Darmstadt, Germany. From 1999 to 2001, he was with the International Telecommunication Union, Geneva, Switzerland. Since 2001, he has been with the Communication Engineering Department, University of Málaga. His research interests include satellite and mobile communications, mainly focusing on self-regulation of radio networks.