2-LTE Access Fault Diagnosis ISSUE1.03 (Drive Test)

LTE Access Fault Analysis
LTE Access Fault Diagnosis
www.huawei.com
Copyright © Huawei Technologies Co., Ltd. All rights reserved.
Confidential Information of Huawei. No Spreading Without Permission

Objectives
⚫ Upon completion of this course, you will be able to:
 Understand the UE initial access flow in depth
 Describe the typical access fault scenarios
 Describe how to locate an access fault
 Apply typical methods for fault analysis

Contents
1. Common Access Problem and Influence Factors
2. Access Problem Analysis and Case Study

Common Access Problem
Source | Problem | Identification Method
Traffic KPI | Low RRC establishment success rate, low E-RAB establishment success rate, or low CSSR | 1. A KPI is below the baseline or required value. 2. A KPI deteriorates after an upgrade.
Traffic KPI | Unstable RRC connection setup success rate, E-RAB connection setup success rate, or CSSR | A KPI varies by more than 20% from that of the previous day or the same day of the previous week.
Traffic KPI | An abrupt increase or drop in access requests | The number of RRC connection setup attempts in a cell increases or drops abruptly, while the RRC and E-RAB setup success rates stay normal and the maximum number of users in the cell remains basically stable.
Traffic KPI | Sleeping cells | No UEs can access a cell that was previously accessible, either all of a sudden or after the number of UEs in the cell gradually falls to zero.
⚫ Note: Access-related KPIs do not cover all access problems. For example, no KPI is available for the case where a UE fails to find an LTE network and therefore cannot initiate an access request.
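The 20% fluctuation rule for unstable KPIs can be sketched as follows. This is a minimal illustration, assuming the variation is measured as a relative change against each reference value; the function name and the sample values are hypothetical and not part of any Huawei tool.

```python
def is_unstable(current, previous_day, same_day_last_week, threshold=0.20):
    """Return True if the KPI differs by more than `threshold` (relative)
    from the previous day or from the same day of the previous week."""
    def relative_change(reference):
        # Assumption: variation is relative to the (non-zero) reference value.
        return abs(current - reference) / reference

    return (relative_change(previous_day) > threshold
            or relative_change(same_day_last_week) > threshold)


# Hypothetical RRC setup success rates, in percent.
print(is_unstable(75.0, 98.0, 97.5))  # True: about 23% below the previous day
print(is_unstable(97.0, 98.0, 97.5))  # False: within 20% of both references
```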

Common Access Problem (Cont.)
Source | Problem | Identification Method
Drive test KPI | Unsatisfactory call setup success rate | The drive test results show that the CSSR is below the baseline or required value.
Drive test KPI | Long attach delay | The drive test results show that the average attach delay exceeds the baseline or required value.
Drive test KPI | Long idle-to-active delay | The drive test results show that the average idle-to-active delay exceeds the baseline or required value.
Complaints | Failures of calls or data services | 1. No signal bar is displayed on a UE. 2. Signal bars are displayed on a UE, but the user still cannot make a call or perform a data service.

Common Factors of Access Fault
⚫ RF
 RF planning issues: 1. Wrong PRACH parameters 2. Improper TA planning
 Resource issues: 1. Limited air interface resources 2. CPU overload
 Coverage issues: 1. Weak coverage 2. Cross coverage
⚫ Parameters & Channel
 Parameter issues: 1. Wrong parameter settings 2. Improper parameter settings
 RF channel & interference: 1. High VSWR 2. High RSSI or RSSI imbalance
 Transmission issues: 1. Wrong parameter settings 2. High BER
⚫ Device
 UE issues: 1. Top UE problem
 eNodeB issues: 1. eNodeB fault 2. Sleeping cell
 EPC issues: 1. Wrong subscription data 2. EPC abnormal

Contents
1. Common Access Problem and Influence Factors
2. Access Problem Analysis and Case Study

General Analysis Procedure
Action | Prerequisite | Purpose
Performing scope identification, KPI trend analysis, and cause resolution | None | 1. Determine whether the problem is a top-cell problem or a network-wide problem. 2. Analyze the major causes of access failures and come up with priority actions.
Checking operation logs, device faults, alarms, and external events | None | 1. Check operation logs to identify parameter modifications and operations that may cause problems. 2. Check whether there are device faults that result in access problems or whether access-related alarms are reported.
Checking parameters | None | 1. Analyze whether the core access parameters are set properly on, and are consistent between, the eNodeB and the EPC.
Checking network planning and optimization | Perform this action based on the analysis results provided by action 1. | 1. Check whether the access problem is caused by improper parameter configurations. 2. Check whether an access problem is caused directly by resource congestion or indirectly by improper parameter configurations. 3. Check whether an access failure is caused by weak coverage.

General Analysis Procedure (Cont.)
Action | Prerequisite | Purpose
Checking RF channels | For top-cell problems, perform this action. | 1. Check whether RF channels are normal. 2. Check for uplink interference.
Checking top UE types and top users | For top-cell problems, check top users. For network-wide problems, check top UE types. | 1. Check whether access-related KPI deterioration happens on individual user(s). 2. Check whether the access problem is caused by a certain type of UEs.
Checking the EPC | Perform this action based on the analysis results provided by action 1. | Check whether an access failure is caused by EPC faults.
Checking transmission | Perform this action based on the analysis results provided by action 1. | Check whether the access failure is caused by transmission faults.

Relevant Data Source
⚫ Performance Counters
 Performance counters can be obtained easily and allow data collection after the event. However, they provide only a rough problem identification result.
 Performance counters help determine whether an access problem occurs in the RRC connection setup phase or in the E-RAB connection setup phase, and what the main causes of the problem are. They can be used for rapid problem identification and rough location of the faulty NE.
⚫ Trace signaling
 Signaling messages help accurately identify the phase in which an access failure occurs. This is useful for checking compatibility problems between UEs and the EPC, analyzing problems discovered during drive tests, and reproducing an access problem. It is good practice to perform signaling tracing over the Uu and S1 interfaces on the faulty node on which an access problem is discovered by analyzing performance counters and alarms or during drive tests.

Relevant Data Source (Cont.)
⚫ Drive Test Data
 Drive test data provides the signal strength and scheduling information of the UEs (depending on the drive test tools and UEs). An access problem can be accurately identified by comparing the drive test data with the signaling messages on the eNodeB side.

Action 1: Scope Identification
Data Source | Analysis Method | Solution
Performance counters | 1. Identify the scope of the problem. 2. Identify whether the access problem occurs in the RRC connection setup phase or the E-RAB connection setup phase by analyzing the RRC connection success rates and the E-RAB connection success rates. 3. If the access problem occurs in the RRC connection setup phase, collect the data of RRC connection setup success rates in different scenarios by the cause of the RRC connection setup. | Continue to perform other steps in action 1.
⚫ Procedure introduction: Not all actions need to be performed. Identifying whether an access problem occurs only in the top cells or on the entire network helps determine the prerequisites for choosing the actions needed to solve the problem, which avoids unnecessary actions. A top-cell problem and a network-wide problem are defined as follows:
 Top-cell problem: The access KPI on the network improves significantly, or meets the requirement, after the top 10 cells with the lowest access success rates and the top 10 cells with the largest number of access failures are excluded.
 Network-wide problem: The access KPI on the network remains basically unchanged after the top 10 cells with the lowest access success rates and the top 10 cells with the largest number of access failures are excluded.
 To filter out the top cells, set proper thresholds on the FMA tool for the access success rate and for the number of access failures.
 If the access problem occurs in the top cells, perform action 5 to check weak coverage, improper network planning parameters, and congestion, action 6 to check the RF channels, and action 7 to check top users. If it occurs on the entire network, perform action 7 to check top UEs of certain types.
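The top-cell versus network-wide definition can be sketched in code. This is only an illustration under stated assumptions: the record layout, the function names, and the `min_gain` cut-off for a "significant" improvement are hypothetical, since the slides do not quantify "significantly improved".

```python
def success_rate(cells):
    """Aggregate access success rate over a list of cell records."""
    attempts = sum(c["attempts"] for c in cells)
    successes = sum(c["successes"] for c in cells)
    return successes / attempts


def classify_scope(cells, required_rate, top_n=10, min_gain=0.01):
    """Return 'top-cell' or 'network-wide'.

    Each record is a dict with 'name', 'attempts' (> 0) and 'successes'.
    min_gain is a hypothetical cut-off for a significant improvement.
    """
    lowest_rate = sorted(cells, key=lambda c: c["successes"] / c["attempts"])[:top_n]
    most_failures = sorted(cells, key=lambda c: c["attempts"] - c["successes"],
                           reverse=True)[:top_n]
    excluded = {c["name"] for c in lowest_rate} | {c["name"] for c in most_failures}
    remaining = [c for c in cells if c["name"] not in excluded]

    before = success_rate(cells)
    after = success_rate(remaining)
    if after >= required_rate or after - before >= min_gain:
        return "top-cell"
    return "network-wide"


# Hypothetical counters; top_n=1 keeps the example small.
one_bad_cell = [
    {"name": "A", "attempts": 1000, "successes": 600},
    {"name": "B", "attempts": 1000, "successes": 990},
    {"name": "C", "attempts": 1000, "successes": 995},
    {"name": "D", "attempts": 1000, "successes": 992},
]
all_cells_bad = [{"name": n, "attempts": 1000, "successes": 900} for n in "ABCD"]

print(classify_scope(one_bad_cell, required_rate=0.98, top_n=1))   # top-cell
print(classify_scope(all_cells_bad, required_rate=0.98, top_n=1))  # network-wide
```

The sketch assumes that more cells exist than are excluded; otherwise the remaining set is empty and no rate can be computed.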

Action 1: KPI Trend Analysis
Data Source | Analysis Method | Solution
Performance counters | 1. Analyze the KPI trend by day. 2. Analyze the KPI trend by hour. 3. Analyze related KPIs. | 1. If the KPI deteriorates suddenly, check whether the deterioration is caused by an upgrade, abnormal operations, external interference, or a burst of services. 2. If the KPI deteriorates gradually, check whether the service volume is increasing or a new type of UE has been released.
⚫ The access problem scenarios are categorized as deterioration scenarios and optimization/new deployment scenarios to facilitate selection of the top cells.
 Deterioration scenario: The access success rate of the live network deteriorates all of a sudden and stays persistently at a low level. In this scenario, perform action 2 first to check the operation logs and external events, such as changes in parameter configurations, enabling of a new feature, a version upgrade, and a network swap. If the cause is currently unclear and cannot be located, the problem is also grouped into this scenario for further analysis.
 Optimization/new deployment scenario: The access success rate on the live network stays below the requirement and needs to be improved. If the problem occurs in a new deployment or network swap scenario, KPI mapping needs to be performed.
⚫ Procedure introduction:
 Analyze the recent trend of access KPIs (including the access success rate and the number of access attempts) to determine whether the KPI deteriorates suddenly or gradually.
 If the KPI deteriorates suddenly, check whether an upgrade on the EPC or RAN side or a network cutover has been performed, or whether the network topology has changed, and perform action 2 to check the operation logs and alarms.
 If the KPI deteriorates gradually, check whether the number of users increases gradually and whether new types of UEs have been released.
 Other KPIs can also be correlated for analyzing the access problem.

Action 1: Cause Resolution and KPI Correlation
Procedure | Access Failure Cause | Criteria | Description | Prevailed Action
Random access | No UEs access the network. | The cell status is normal and UEs have been accessing the network, but the following situation occurs suddenly or gradually: L.Traffic.User.Max = 0 or L.RRC.ConnReq.Att = 0 | Sleeping cell | None
RRC success rate deterioration | L.RRC.SetupFail.Rej | L.Traffic.User.Max is close to or reaches the product specifications. | Resource congestion | Check congestion
RRC success rate deterioration | L.RRC.SetupFail.Rej | RRC connection setup rejections include L.RRC.SetupFail.ResFail.PUCCH. | 1. PUCCH parameter configurations are limited. 2. PUCCH resources fail to be expanded. 3. The number of UEs is large. | Check congestion
RRC success rate deterioration | L.RRC.SetupFail.Rej | RRC connection setup rejections include L.RRC.SetupFail.Rej.FlowCtrl. | The number of online UEs exceeds the CAPS specifications of a single eNodeB and VS.BBUBoard.CPUload.Max is less than 80%. | Check congestion
⚫ Classify problems based on access failure values and associated KPIs so that the actions to take and their sequence can be determined. You are advised to focus on the causes and associated KPIs of critical problems and to determine, based on the provided standards, whether inventory optimization is required.


Action 1: Cause Resolution and KPI Correlation (Cont.)
Procedure | Access Failure Cause | Criteria | Description | Prevailed Action
RRC success rate deterioration | L.RRC.SetupFail.NoReply | L.UL.Interference.Avg ≥ -105 dBm | 1. Interference exists. 2. Top users or UEs are abnormal. | Check for interference. Check for exceptions on top users or UEs.
RRC success rate deterioration | BBP VS.Board.CPUload.Max is about 90%. | BBP VS.Board.CPUload.Max is about 90%. | The BBP CPU usage is high due to congestion. | Check for congestion.
The number of RRC connections abruptly increases. | L.RRC.ConnReq.Att abruptly increases. | The number of RRC connections (L.RRC.ConnReq.Att) abruptly increases, the RRC and E-RAB setup success rates are normal, and L.Traffic.User.Max does not suddenly increase. | 1. TA planning is improper. 2. NAS messages on the EPC are abnormal. 3. UE processing is abnormal. | Check parameters. Check for EPC exceptions. Check for exceptions on top users or UEs.


Action 1: Cause Resolution and KPI Correlation (Cont.)
Procedure | Access Failure Cause | Criteria | Description | Prevailed Action
E-RAB setup success rate deterioration | L.E-RAB.FailEst.MME | None | 1. E-RAB setup failures occur due to EPC exceptions. 2. The eNodeB does not receive any response from the UE, and the timer at the eNodeB for the air interface and S1 interface is longer than the context setup timeout timer on the EPC. As a result, the EPC releases the UE's context before the setup completes. | Check for EPC exceptions. Check for exceptions on top users or UEs.
E-RAB setup success rate deterioration | L.E-RAB.FailEst.TNL | None | 1. IPPATH is not configured or is incorrectly configured. 2. The SCTP link is intermittently disconnected. | Check for transmission exceptions.
E-RAB setup success rate deterioration | L.E-RAB.FailEst.NoReply | L.UL.Interference.Avg ≥ -105 dBm | 1. Interference exists. 2. Top users or UEs are abnormal. | Check for interference. Check for exceptions on top users or UEs.


Action 1: Cause Resolution and KPI Correlation (Cont.)
Procedure | Access Failure Cause | Criteria | Description | Prevailed Action
E-RAB setup success rate deterioration | L.E-RAB.FailEst.NoRadioRes | None | 1. The system exceeds the license capacity. 2. Parameter configurations are incorrect. | Check congestion.
E-RAB setup success rate deterioration | L.E-RAB.FailEst.SecurModeFail | None | 1. The security mode configuration fails due to UE incompatibility. 2. The EPC is abnormal. 3. eNodeB integrity or encryption algorithms are incorrectly configured. | Check for exceptions on top users or UEs. Check for EPC exceptions.
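The cause-to-action correlation in the tables above lends itself to a simple lookup. The sketch below is an illustration only: the mapping copies the counter names and prevailed actions from the tables, while the function name, the ordering rule (largest failure count first), and the sample counts are assumptions.

```python
# Failure-cause counter -> prevailed actions, as listed in the correlation tables.
PREVAILED_ACTIONS = {
    "L.RRC.SetupFail.Rej": ["Check congestion"],
    "L.RRC.SetupFail.NoReply": ["Check for interference",
                                "Check for exceptions on top users or UEs"],
    "L.E-RAB.FailEst.MME": ["Check for EPC exceptions",
                            "Check for exceptions on top users or UEs"],
    "L.E-RAB.FailEst.TNL": ["Check for transmission exceptions"],
    "L.E-RAB.FailEst.NoReply": ["Check for interference",
                                "Check for exceptions on top users or UEs"],
    "L.E-RAB.FailEst.NoRadioRes": ["Check congestion"],
    "L.E-RAB.FailEst.SecurModeFail": ["Check for exceptions on top users or UEs",
                                      "Check for EPC exceptions"],
}


def prioritized_actions(failure_counts):
    """Order actions by the failure count of the cause that triggers them.

    Assumption: the cause with the most failures is handled first.
    Causes with a zero count and unknown counters are ignored.
    """
    ordered = []
    by_count = sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True)
    for cause, count in by_count:
        if count <= 0:
            continue
        for action in PREVAILED_ACTIONS.get(cause, []):
            if action not in ordered:
                ordered.append(action)
    return ordered


# Hypothetical counter values for one measurement period.
counts = {"L.E-RAB.FailEst.TNL": 120, "L.E-RAB.FailEst.MME": 30,
          "L.E-RAB.FailEst.NoReply": 0}
for action in prioritized_actions(counts):
    print(action)
```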

Action 2: Check Operation Log
Data Source | Analysis Method | Solution
Operation logs (top 10 sites for a network-wide problem and top 10 cells for a top-cell problem) | 1. In a sudden KPI deterioration scenario, check for abnormal operations that may have been performed within the week before the deterioration occurred. 2. In a gradual KPI deterioration scenario, check for abnormal operations that may have been performed within the latest week. 3. If a problem occurs on the entire network, check for abnormal operations that may have been performed at all sites. 4. Abnormal operations include but are not limited to addition, removal, blocking, activation, and deactivation. 5. If an operation is performed in batches, it needs to be checked on the MAE because the operation details are not available on the eNodeB side. | Check whether the operation can be rolled back. If yes, check whether the KPI improves after the rollback.
⚫ Procedure introduction: Perform this action to check whether a KPI change is caused by an operation. This action checks for a chronological correspondence between a KPI change and a performed operation, and is therefore mandatory in a KPI deterioration scenario. This guide lists the external events that are already known to cause KPI changes and the alarms that are reported when a KPI change occurs. Other events or alarms that are chronologically correlated with a KPI change also require special attention.

Action 2: Check Device Fault & Alarm
Data Source | Analysis Method | Solution
Alarm and fault logs (top 10 sites for a network-wide problem and top 10 cells for a top-cell problem) | 1. In a sudden KPI deterioration scenario, check the alarms and device fault logs within the week before the deterioration occurred, as well as active alarms and faults. 2. In a gradual KPI deterioration scenario, check the alarms and device fault logs within the latest week, as well as active alarms and faults. 3. Check for abnormal operations that may have been performed at all sites if a problem occurs on the entire network. | Analyze the impact of alarms and device faults on an access KPI. Then, clear these alarms by referring to the alarm and fault handling guide and check whether the KPI improves after the alarms are cleared.
⚫ Procedure introduction: Perform this action to check for device faults and alarms related to a KPI change. A fault that directly affects the KPI, or the alarm for such a fault, must be handled first, with a higher priority than a fault or alarm that is barely related to the KPI change.

Typical Alarm for Access Problem
Module | Alarm/Event Name | Alarm/Event Impact
License | ALM-26812 System Dynamic Traffic Exceeding Licensed Limit | Services that exceed the license capacity cannot be accessed. eRAN3.0: If the number of users exceeds the licensed maximum, the eNodeB allows these UEs to access the network and immediately releases them. As a result, the number of E-RAB connection setup attempts and releases increases. eRAN6.0: If the number of users exceeds the licensed maximum, the E-RAB setup fails. The eNodeB responds to the MME with a UE context setup failure with the cause radio-resources-not-available.
Cell | ALM-29241 Cell Reconfiguration Failed | The PDSCH power configuration change does not take effect. As a result, the cell coverage does not meet the requirement.
Cell | ALM-29245 Cell Blocked | The cell cannot provide services.
Cell | ALM-29240 Cell Unavailable | The cell cannot provide services.

Typical Alarm for Access Problem (Cont.)
Module | Alarm/Event Name | Alarm/Event Impact
Board | ALM-26200 Board Hardware Fault | The board cannot work properly and services carried over this board may be interrupted. The board cannot perform all the designed functions and the board reliability degrades. If this problem persists, services carried over this board may be interrupted.
Board | ALM-26202 Board Overload | The access success rates and service quality may deteriorate. If this problem persists, the maintenance operations on this board may respond slowly and even fail due to operation timeout. The test operations and tracing tasks of lower priorities may be suspended or terminated.
S1 Interface | ALM-29207 eNodeB Control Plane Transmission Interruption | All the SCTP links in the eNodeB are faulty, resulting in failures such as S1 and X2 link setup failure, cell activation failure, and network access failure of users.
S1 Interface | ALM-25888 SCTP Link Fault | The SCTP link cannot process signaling.
S1 Interface | ALM-25889 SCTP Link Congestion | The services are interrupted because the data cannot be transmitted due to insufficient space in the sending buffer.

Typical Alarm for Access Problem (Cont.)
Module | Alarm/Event Name | Alarm/Event Impact
RF Channel | ALM-26529 RF Unit VSWR Threshold Crossed | The return loss at the antenna port is excessive. As a result, the RF unit automatically switches off the TX channel, and the ongoing services carried on the TX channel are interrupted.
RF Channel | ALM-26532 RF Unit Hardware Fault | The RF unit may work improperly. The ongoing services carried on the RF unit may be interrupted.
RF Channel | ALM-29207 eNodeB Control Plane Transmission Interruption | All the SCTP links in the eNodeB are faulty, resulting in failures such as S1 and X2 link setup failure, cell activation failure, and network access failure of users.
RF Channel | ALM-26521 RF Unit RX Channel RTWP/RSSI Too Low | The receive sensitivity of the RFU decreases, the demodulation performance of the cell deteriorates, and the uplink coverage shrinks. If the RTWP/RSSI on all RX channels of the cell is too low, the ongoing services of the cell may be interrupted.
RF Channel | ALM-26522 RF Unit RX Channel RTWP/RSSI Unbalanced | The receive sensitivity of the RFU decreases, the demodulation performance of the cell deteriorates, and the uplink coverage shrinks.

Cause of High VSWR
⚫ VSWR: Voltage Standing Wave Ratio. It indicates whether the feeder impedance is matched. The normal VSWR range is 1 to 1.5. If the current VSWR exceeds a specified threshold, the eNodeB generates the relevant alarm.
⚫ Alarm generation:
 The cell is activated
 The RF power is more than 34 dBm per channel
 The VSWR is more than a specified threshold
⚫ Possible causes:
 Incorrect VSWR alarm threshold
 RF unit hardware fault
 Wrong RRU/RFU connection
 Bad feeder quality or nonstandard feeder installation
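To relate VSWR values to the return loss at the antenna port, the standard transmission-line relations can be used: the reflection coefficient magnitude is (VSWR - 1) / (VSWR + 1), and the return loss in dB is -20·log10 of that magnitude. The sketch below applies these general formulas; they are not taken from the eNodeB documentation, and the function names are illustrative.

```python
import math


def vswr_to_return_loss_db(vswr):
    """Convert a VSWR greater than 1 to return loss in dB."""
    gamma = (vswr - 1.0) / (vswr + 1.0)  # reflection coefficient magnitude
    return -20.0 * math.log10(gamma)


def return_loss_db_to_vswr(return_loss_db):
    """Convert a positive return loss in dB back to VSWR."""
    gamma = 10.0 ** (-return_loss_db / 20.0)
    return (1.0 + gamma) / (1.0 - gamma)


for vswr in (1.5, 2.0):
    print(f"VSWR {vswr}: return loss {vswr_to_return_loss_db(vswr):.2f} dB")
```

A VSWR of 1.5, the upper end of the normal range quoted above, corresponds to a return loss of about 14 dB, and a VSWR of 2.0 to about 9.5 dB. A lower return loss means more power is reflected back from the feeder.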

Solution for High VSWR
⚫ Step 1: Check whether the VSWR alarm threshold is correct (default is 2dB).
 Note: The RRU TX channel is closed automatically if the VSWR is extremely high (more than the post-processing threshold).
[Figure: two VSWR thresholds, the threshold for VSWR alarm generation and the threshold for TX shutdown]
⚫ Step 2: If the alarm threshold is correct, check whether the relevant feeder installation and RRU connections meet the standard.
⚫ Step 3: After the feeder tuning, if the TX channel is closed, activate the TX channel again.
 MOD TXBRANCH

Cause of Low RSSI
⚫ Alarm generation: The RSSI is less than a specified value.
⚫ Incorrect RX attenuation setting
 If no TMA is used, the attenuation should be 0 dB
 If a 12 dB TMA is used, the attenuation should be 4 to 11 dB
 If a 24 dB TMA is used, the attenuation should be 11 to 22 dB
⚫ Feeder problem
 Bad feeder quality causes additional loss
⚫ RRU fault
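The RX attenuation rule above can be checked mechanically. This is a minimal sketch: the table holds the three ranges quoted on the slide, while the function name and the idea of keying the table by TMA gain are assumptions for illustration.

```python
# Expected RX attenuation range in dB, keyed by TMA gain in dB (0 = no TMA).
EXPECTED_RX_ATTENUATION_DB = {0: (0, 0), 12: (4, 11), 24: (11, 22)}


def rx_attenuation_ok(tma_gain_db, attenuation_db):
    """Return True if the configured attenuation fits the TMA in use.

    Raises KeyError for a TMA gain that the slide does not cover.
    """
    low, high = EXPECTED_RX_ATTENUATION_DB[tma_gain_db]
    return low <= attenuation_db <= high


print(rx_attenuation_ok(0, 0))    # True: no TMA, no attenuation
print(rx_attenuation_ok(12, 0))   # False: a 12 dB TMA needs 4 to 11 dB
print(rx_attenuation_ok(24, 15))  # True: within 11 to 22 dB
```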

Solutions for Low RSSI
⚫ Step 1: Check whether the RRU RX attenuation is correct.
⚫ Step 2: If the attenuation is correct, check the feeder installation and connection.

Cause of Imbalance of RSSI
⚫ Alarm generation: The difference between the RSSI of the main RX channel and the RSSI of the diversity RX channel exceeds 10 dB.
⚫ Possible causes
 High interference
 RRU cross connections

Solution of Imbalance of RSSI
⚫ UL interference check
 From the web LMT: Perform spectrum detection to evaluate UL interference
 From the MAE client: Perform interference detection monitoring
 Find the interference source
⚫ Check the RRU connections and avoid cross connections, as shown below
[Figure: ANT1 and ANT2 connected to RRU1 and RRU2]

Action 3: Check Parameters
Parameter | Source | Impact | Recommended Value
Number of initial PDCCH symbols | MML | If the switch for dynamic adjustment of the number of OFDM symbols occupied by the PDCCH is turned off and this parameter is set to 1, the peak throughput of a single user increases. However, if the bandwidth of the cell is lower than 3 MHz, user access is affected. | If the switch for dynamic adjustment of the number of OFDM symbols occupied by the PDCCH is turned on, preferably set this parameter to 1. If the switch is turned off, preferably set this parameter to 3.
Encryption algorithm | MML | In the ENodeBCipherCap MO, the PrimaryCipherAlgo, SecondCipherAlgo, and ThirdCipherAlgo parameters must be set to different values. | PrimaryCipherAlgo = AES, SecondCipherAlgo = Snow3G, ThirdCipherAlgo = NULL
Integrity algorithm | MML | In the ENodeBIntegrityCap MO, the PrimaryIntegrityAlgo, SecondIntegrityAlgo, and ThirdIntegrityAlgo parameters must be set to different values. | PrimaryIntegrityAlgo = AES, SecondIntegrityAlgo = Snow3G, ThirdIntegrityAlgo = NULL

Action 3: Check Parameters (Cont.)
Parameter | Source | Impact | Recommended Value
S1 message waiting timer | MML | If this parameter is set to a small value, the eNodeB may declare a timeout before the MME has had time to respond to the message. If this parameter is set to a large value, when exceptions occur but no response messages from the MME are received, system resources will be occupied for a long period of time. | 20s
Uu message waiting timer | MML | This parameter sets the length of the timer for the eNodeB waiting for the UE to send the Uu response message. If this parameter is set to a small value, the eNodeB may declare a timeout before the UE has had time to respond to the message. If this parameter is set to a large value, when exceptions occur but no response messages from the UE are received, system resources will be occupied for a long period of time. | 35s

Action 3: Check Parameters (Cont.)
Parameter | Source | Recommended Value
AMBR | S1 signaling (MME) | Set this parameter to a value greater than 0. If AMBR is set to 0, users cannot access the network.
ARP | S1 signaling (MME) | Set this parameter to a non-zero value. If ARP is set to 0, users cannot access the network.
Network mode | Configurations on the UE side | Set this parameter to Auto (rather than WCDMA-only, GSM-only, or LTE-only) if inter-RAT interoperability is enabled on a network. Set this parameter to LTE-only if UEs always camp on a WCDMA or GSM network when this parameter is set to Auto.
APN configuration | Configurations on the UE side | Dynamic APN configurations are preferred.

Action 4: Weak Coverage Check
Data Source | Analysis method | Solution
Drive test log | If the drive test log shows that the downlink RSRP is below -119 dBm, the problem is caused by weak coverage. | 1. If only top UEs are experiencing weak coverage, clarify this point to the operator. 2. If a large number of UEs are experiencing weak coverage, perform RF tuning.
⚫ Procedure introduction: This action checks whether an access problem is caused by weak coverage by analyzing the top cells. If only top UEs are experiencing weak coverage, perform the following closed-loop actions to solve this problem.

Action 4: Check Congestion
Data Source | Analysis method | Solution
Performance counters | Check the relevant resources, including PRB, CCE, CPU usage, license, PUCCH usage, and the number of active users. | Locate the congestion bottleneck and perform capacity expansion.
⚫ Relevant thresholds for congestion prevention
 PRB usage < 70%
 CCE usage < 70%
 CPU usage < 60%
 PUCCH usage < 70%
⚫ This action targets top cells and is performed in any of the following scenarios:
 1. L.RRC.SetupFail.ResFail returns a non-zero value and the number of UEs is limited.
 2. L.RRC.SetupFail.NoReply returns a non-zero value and the BBP CPU usage is high.
 3. L.RRC.SetupFail.Rej.FlowCtrl returns a non-zero value.
 4. L.RRC.ConnReq.Msg.disc.FlowCtrl returns a non-zero value.
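The congestion prevention thresholds can be applied to a set of usage figures as sketched below. The threshold values are those listed above; the dictionary keys, the function name, and the sample usage values are hypothetical.

```python
# Usage limits in percent, from the congestion prevention thresholds.
CONGESTION_THRESHOLDS = {"PRB": 70.0, "CCE": 70.0, "CPU": 60.0, "PUCCH": 70.0}


def congested_resources(usage_percent):
    """Return the resources whose usage is at or above its threshold.

    Resources missing from usage_percent are treated as not congested.
    """
    return [name for name, limit in CONGESTION_THRESHOLDS.items()
            if usage_percent.get(name, 0.0) >= limit]


# Hypothetical busy-hour usage of one top cell.
usage = {"PRB": 82.5, "CCE": 41.0, "CPU": 63.0, "PUCCH": 55.0}
print(congested_resources(usage))  # ['PRB', 'CPU']
```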

Action 5: Check Interference
Data Source | Analysis method | Solution
Interference traffic statistics | Check the uplink interference traffic statistics of cells in idle hours. If L.UL.Interference.Avg is -105 dBm or greater, UL interference is likely to exist. | Interference analysis
Interference monitoring | Perform real-time interference monitoring. If the interference power on each RB is more than -129 dBm, interference can be considered to exist. |
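The two interference criteria can be expressed as simple checks. This is a sketch under assumptions: the function names and sample values are hypothetical, and because the slide does not say whether every RB or any RB must exceed -129 dBm, the second function only lists the RBs above that level and leaves the judgment to the engineer.

```python
AVG_INTERFERENCE_LIMIT_DBM = -105.0  # L.UL.Interference.Avg criterion
PER_RB_LIMIT_DBM = -129.0            # real-time per-RB monitoring criterion


def counter_indicates_interference(avg_interference_dbm):
    """True if the idle-hour average interference is -105 dBm or greater."""
    return avg_interference_dbm >= AVG_INTERFERENCE_LIMIT_DBM


def rbs_above_limit(per_rb_dbm):
    """Return the indexes of RBs whose interference power exceeds -129 dBm."""
    return [i for i, power in enumerate(per_rb_dbm) if power > PER_RB_LIMIT_DBM]


print(counter_indicates_interference(-101.3))            # True
print(counter_indicates_interference(-112.0))            # False
print(rbs_above_limit([-131.0, -127.5, -120.0, -130.2]))  # [1, 2]
```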

Action 6: Top UE Check
Data Source | Analysis method | Solution
Nastar (choose the top 10 sites for a network-wide problem) | Use the Capabilities of the Top Users function provided by the Nastar to check the ratio of top UE type problems to total exceptions. If the ratio of exceptions due to the top 1 UE type is twice higher than that of normal UE types, the problem is a top UE type problem. 1. Collect logs of the top 10 sites and statistics about the UE capability recorded in CHRs. Then, generate the ratio of each UE capability to obtain the top 1 UE type. 2. Calculate the proportion of exceptions generated by the top 1 UE type and the proportions of exceptions generated by other top UE types based on the statistical results of the CHRs. | 1. Check for known issues in the current UE version. 2. Use the UE to reproduce the problem.
⚫ Procedure introduction: If a problem is a network-wide problem, check whether it is caused by a certain type of UEs. This action helps check whether an access KPI deterioration occurs due to bad performance of a type of UEs or due to incompatibility between these UEs and Huawei-built networks. If the problem is directly caused by these top UEs, check whether a known issue has already been identified for this UE version through the Headquarters' IOT channel. Then, reproduce the problem by using UEs of that version.
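The top UE type criterion can be sketched as follows. The slide wording is open to interpretation, so this example makes explicit assumptions: the top 1 UE type is the one with the most exceptions, its exception rate is exceptions divided by CHR records, and it is compared against the combined rate of all other UE types with a factor of 2. The function name and sample statistics are hypothetical.

```python
def top_ue_type_check(stats, factor=2.0):
    """Return (top_type, is_top_ue_type_problem).

    stats maps a UE type to a (records, exceptions) tuple taken from CHRs.
    At least two UE types with non-zero record counts are expected.
    """
    top_type = max(stats, key=lambda t: stats[t][1])
    top_records, top_exceptions = stats[top_type]
    other_records = sum(r for t, (r, e) in stats.items() if t != top_type)
    other_exceptions = sum(e for t, (r, e) in stats.items() if t != top_type)

    top_rate = top_exceptions / top_records
    other_rate = other_exceptions / other_records
    return top_type, top_rate > factor * other_rate


# Hypothetical CHR statistics: UE type -> (records, exceptions).
skewed = {"TypeA": (1000, 120), "TypeB": (5000, 100), "TypeC": (4000, 80)}
even = {"TypeA": (5000, 100), "TypeB": (5000, 90)}

print(top_ue_type_check(skewed))  # ('TypeA', True): 12% against 2% for the rest
print(top_ue_type_check(even))    # ('TypeA', False): 2% against 1.8%
```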

Action 7: Check EPC Exceptions
Data Source | Analysis method | Solution
CHR and Uu/S1 signaling | E-RAB setup failures: Analyze the following problems by using CHRs and Uu/S1 signaling. 1. Failure of an eNodeB to respond to a context setup request: Check the AS-layer integrity protection and encryption algorithm configuration on the eNodeB by using standard signaling. Check whether the IP address at the transmission layer on the EPC, the AMBR, and the ARP are correct. 2. Abnormal release initiated by the MME: According to CHRs or Uu/S1 signaling, check whether the EPC delivers the release command too early, or whether the EPC delivers the release command after its context setup waiting timer expires because the eNodeB waits on the air interface for a long period of time. For such problems, check the length of the context setup waiting timer with the EPC personnel. Abnormal NAS: 1. Authentication failure 2. NAS security activation failure | 1. If EPC exceptions exist, locate the problem with EPC personnel. 2. If UE exceptions exist, perform the top users/UE check. 3. If air interface scheduling is inappropriate, perform the weak coverage and interference checks.
⚫ The abnormal E-RAB setup signaling process is as follows: After the EPC delivers the context setup request message of a UE to the eNodeB, either the eNodeB fails to respond to the message, or the EPC delivers the command to release the UE's context before the eNodeB sends the context setup response.

Action 8: Check Transmission
Data Source | Analysis method | Solution
Performance counters | The cause value of an access failure is L.E-RAB.FailEst.TNL. Analyze the quality of the SCTP link by using the counters VS.SCTPLnk.Cong.Dur, VS.SCTPLnk.Cong, VS.SCTPLnk.Unavail.Dur, and VS.SCTPLnk.Unavail. | 1. Reproduce the access problem caused by transmission. 2. Troubleshoot the transmission faults.
Alarm | Check for ALM-25888 SCTP Link Fault, ALM-25886 IP Path Fault, and ALM-29240 Cell Unavailable. | Clear the reported alarms by referring to the Alarm/Event References.
Parameter settings | Check whether the settings of VLAN, DSCP, IPRT, IPPATH, SCTP, and other transmission parameters are the same as the planned settings. | Reconfigure the parameters that are not currently configured as planned.
⚫ Procedure introduction: This action is required if traffic data analysis identifies L.E-RAB.FailEst.TNL, or a low S1 signaling setup success rate at the top sites, as the cause of an access problem. A transmission fault must always be cleared first. Alarms or faults that are not closely related to the access success rate can be handled with a lower priority than transmission faults.

Action 8: Check Transmission (Cont.)
Data Source | Analysis method | Solution
Signaling messages over the S1 interface | If a context setup fails, use the S1 interface signaling messages to check whether the value of the transportLayerAddress field in the INITIAL_CONTEXT_SETUP_REQ message is consistent with the peer IP address of the IP path. | 1. If an IP address is not configured at the eNodeB transmission layer, configure an IP address for the IP path of the eNodeB. 2. If an IP address is not configured as planned at the eNodeB transmission layer, reconfigure an IP address for the IP path of the eNodeB. 3. If an IP address is not configured as planned at the EPC transmission layer, contact the EPC engineer to reconfigure the IP address for the IP path of the eNodeB.

Case 1 – IP Path Configuration Leads to Low Access Ratio
⚫ Description: In one live network, the E-RAB setup success rate is very low, at almost 50%.
⚫ Alarm info: None
⚫ Analysis:
 The E-RAB release performance statistics show that most failures are caused by "transport resource not available".
 The message tracing confirms this result.
⚫ Tracing message
[Figure: tracing message]

Case 1 – IP Path Configuration Leads to Low Access Ratio (Cont.)
⚫ The problem can now be narrowed down to the IP path configuration. However, the eNodeB configuration contains a valid IP path and the connection is valid. It is also odd that the problem occurs only part of the time.
⚫ Analyzing the messages again, we find that the "UE context setup request" message delivers varying GTP-U addresses, and that these addresses are not all identical to the one in the eNodeB configuration.
⚫ Message tracing
[Figure: message tracing]

Case 1: IP Path Configuration Leads to Low Access Ratio (Cont.)
⚫ Conclusion: The SGW delivers multiple GTP-U addresses to the
eNodeB, but only one IP path to the SGW is configured. As a result,
part of the E-RAB setups fail.
⚫ Solution: The EPC engineer confirms that the SGW pool function
is used, so the SGW has multiple IP addresses. After IP paths are
added for all of these addresses, the problem is cleared.
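The Case 1 finding can be checked systematically by comparing every GTP-U address seen in the trace with the peer addresses of the configured IP paths. The following is a minimal sketch; the function names and the sample addresses are hypothetical, and the traced addresses are assumed to be already decoded to dotted strings.

```python
from collections import Counter


def uncovered_gtpu_addresses(traced_addresses, ip_path_peers):
    """Return {address: occurrences} for traced addresses without an IP path."""
    peers = set(ip_path_peers)
    counts = Counter(traced_addresses)
    return {addr: n for addr, n in counts.items() if addr not in peers}


def uncovered_share(traced_addresses, ip_path_peers):
    """Share of traced setup requests whose GTP-U address has no IP path."""
    if not traced_addresses:
        return 0.0
    missing = uncovered_gtpu_addresses(traced_addresses, ip_path_peers)
    return sum(missing.values()) / len(traced_addresses)
```

If the SGW pool alternates evenly between two addresses and only one of them has an IP path, the uncovered share is 0.5, which would be consistent with a setup ratio of about 50%.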

Case 2: Wrong EPC Parameters Lead to E-RAB Setup Failure
⚫ Description: In one live network, the E-RAB setup success ratio is low,
about 85%.
⚫ Analysis: We first check the failure causes. Most failures have the
cause "MME related", but a check on the MME side shows nothing
abnormal. We therefore select some top cells and trace the S1
messages during the busy hour.


Case 2: Wrong EPC Parameters Lead to E-RAB Setup Failure (Cont.)
⚫ The traced failure message carries the failure cause "semantic error",
which means that the previous message contains an error.


Case 2: Wrong EPC Parameters Lead to E-RAB Setup Failure (Cont.)
⚫ We go on to check the previous message and find that the QCI
delivered by the MME is wrong.
⚫ Solution: This is an MME bug. The MME sometimes delivers a
wrong QCI. After a software upgrade, the problem is solved.
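The QCI check in Case 2 can be automated over decoded setup requests. The sketch below assumes that only the standardized QCIs 1 to 9 are valid in this network; a real network may allow additional values, so the allowed set is a parameter. The dictionary layout of the decoded items is hypothetical.

```python
# Assumption: only QCI 1..9 are expected in this network.
DEFAULT_ALLOWED_QCIS = frozenset(range(1, 10))


def find_invalid_qci(erab_items, allowed_qcis=DEFAULT_ALLOWED_QCIS):
    """Return (erab_id, qci) pairs whose QCI is outside the allowed set.

    erab_items: iterable of dicts with the hypothetical keys 'erab_id' and 'qci'.
    """
    return [
        (item["erab_id"], item["qci"])
        for item in erab_items
        if item["qci"] not in allowed_qcis
    ]
```

Running this over the message that precedes each "semantic error" failure shows quickly whether an unexpected QCI is the common factor.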

Case 3: Multi-mode UE Attach Failure
(Figures: Uu trace and S1 trace)
⚫ Description: During the commissioning of one live network, we find that
the UE attach fails.
⚫ Message tracing: The trace shows that most of the access procedure is
normal. After the E-RAB setup, however, the MME releases the connection with
the cause "normal release".

Case 3: Multi-mode UE Attach Failure (Cont.)
⚫ Analysis
 Because most of the procedure is normal and the eNodeB considers the release
a normal release, we conclude that the problem is caused by a NAS failure.
 We therefore continue to analyze the preceding NAS messages. In the DL NAS
message, the MME responds to the attach with the cause
"MSC-temporarily-not-reachable". This confirms the root cause of the
attach failure.


Case 3: Multi-mode UE Attach Failure (Cont.)
⚫ Analysis (Cont.)
 Why does the UE need to attach to the MSC? The UE is a Huawei E398,
a multi-mode (GSM/UMTS/LTE) device, so we assume that it performs
a combined attach. The attach request message verifies this
assumption.


Case 3: Multi-mode UE Attach Failure (Cont.)
⚫ Conclusion:
 The current EPS network has no CS domain configuration, so the MME replies with a
PS-only attach accept and also informs the UE that the MSC is not reachable.
⚫ Temporary solution:
 Change the UE attach mode to PS only.
 Add the CS domain configuration in the EPC.
⚫ Final solution:
 Update the MME so that it is compatible with combined attach even when there is no CS domain.
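The Case 3 reasoning can be expressed as a small classifier over decoded NAS fields. This is a sketch only: it assumes the trace tool exposes the EPS attach type, the EPS attach result, and the EMM cause as integers, and it uses the code points defined in 3GPP TS 24.301 (attach type or result 1 = EPS only, 2 = combined EPS/IMSI; EMM cause 16 = MSC temporarily not reachable). The function name and return strings are hypothetical.

```python
EPS_ONLY = 1                      # EPS attach / EPS only result
COMBINED_EPS_IMSI = 2             # combined EPS/IMSI attach
EMM_CAUSE_MSC_NOT_REACHABLE = 16  # "MSC temporarily not reachable"


def diagnose_attach(attach_type, attach_result, emm_cause=None):
    """Classify one attach attempt from its decoded NAS fields."""
    if attach_type != COMBINED_EPS_IMSI:
        return "not a combined attach"
    if attach_result == COMBINED_EPS_IMSI:
        return "combined attach accepted for PS and CS"
    if emm_cause == EMM_CAUSE_MSC_NOT_REACHABLE:
        return "PS-only accept: MSC temporarily not reachable"
    return "PS-only accept: other or missing EMM cause"
```

A multi-mode UE such as the one in this case would fall into the third branch: it requests a combined attach, receives a PS-only result, and sees EMM cause 16.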

Case 4: Low Access Success Rate Due to Improper SRS Subframe Configuration
⚫ Problem: The onsite KPI monitoring engineer discovers a
decrease in access success rates at one site between September
24 and October 7.

Troubleshooting Process
(Figure callouts)
 The performance counters exported by using the FMA show that the value of
L.RRC.SetupFail.ResFail is equal to that of L.RRC.SetupFail.Rej, which indicates
that the RRC connection setup failures are caused by failures to allocate
resources for RRC connections.
 The maximum number of users in this cell is 14, which is below the allowed limit.

Action 1: performing scope identification, KPI trend analysis, and cause resolution
Analysis results:
1. The onsite KPI monitoring engineer discovers an extremely low
setup success rate of RRC connections and E-RAB connections
at one site during the monitoring period. Therefore, this
problem is a top-cell problem.
2. The cause of the RRC connection setup failures is
L.RRC.SetupFail.ResFail, which indicates that this problem is
caused by failures to allocate resources for RRC connections.
3. It is confirmed that this problem coincides with a horse race
held in the site area and that the top cell covers the horse race
course.
The maximum number of users in this cell is 14, which is below
the allowed limit. Therefore, based on the analysis in action 1,
only the required actions need to be performed.
Status: OK

Troubleshooting Process (Cont.)
Action 2: checking operation logs, device faults, alarms, and external events
Analysis results: No operation alarms are reported for this cell within the week that
precedes the occurrence of this problem.
Status: OK
Action 3: checking parameters
Analysis results: The frontline engineers compare the configuration file of this eNodeB with
those of eNodeBs whose cells are normal and find that the SRSSUBFRAMECFG parameter is set
to SC9 at this site and to SC3 at other sites.
Status: OK
Action 4: checking version differences and known issues
Analysis results: The problem has been identified, so this action is not required.
Status: /
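The parameter comparison in Action 3 amounts to a key-by-key difference between two configurations. A minimal sketch follows, assuming each configuration has already been parsed into a flat dictionary of parameter name to value; the parsing step and the function name are hypothetical.

```python
def diff_parameters(problem_site, reference_site):
    """Return {name: (problem_value, reference_value)} for differing parameters.

    A parameter missing on one side is reported with None for that side.
    """
    names = sorted(set(problem_site) | set(reference_site))
    return {
        name: (problem_site.get(name), reference_site.get(name))
        for name in names
        if problem_site.get(name) != reference_site.get(name)
    }
```

Applied to this case, the result would contain a single entry, `{"SRSSUBFRAMECFG": ("SC9", "SC3")}`, which is the difference the frontline engineers found.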
⚫ Parameter analysis:
 As described in the impact of core parameters, this SRS configuration does not allow
the SRS resources of this cell to be increased or decreased. Therefore, the number of users
that this cell supports cannot be adjusted dynamically.
 Before the horse race, the cell serves a limited number of users, so no
SRS resource increase is required. During the horse race, the number of users
increases significantly, but the SRS resources cannot be increased.
⚫ Description of the SRSSUBFRAMECFG parameter in the LTE access parameters:
 If SrsSubframeCfg is set to SC9, the cell subframe cycle is 10 ms and the offset is 0.
This cycle and offset do not allow a migration of the cell's SRS subframe
configuration. Therefore, the SRS resources of this cell cannot be increased or decreased.
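The difference between SC3 and SC9 can be made concrete by counting the cell-specific SRS subframes in one 10 ms radio frame. The sketch below assumes FDD and uses period and offset values from the srs-SubframeConfig table in 3GPP TS 36.211; only three entries are included, and they should be verified against the specification before use. The function name is hypothetical.

```python
# srs-SubframeConfig -> (period in subframes, transmission offsets), FDD only.
# Partial table; entries to be verified against 3GPP TS 36.211.
SRS_SUBFRAME_CONFIG_FDD = {
    0: (1, (0,)),
    3: (5, (0,)),
    9: (10, (0,)),
}


def srs_subframes_per_frame(config_index):
    """Count cell-specific SRS subframes within one 10 ms radio frame."""
    period, offsets = SRS_SUBFRAME_CONFIG_FDD[config_index]
    return sum(1 for subframe in range(10) if subframe % period in offsets)
```

Under these assumptions SC3 provides two SRS subframes per frame and SC9 only one, which illustrates why the SC9 site has fewer SRS resources available when the number of users rises.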

Case 5: Drastic Increase in the Number of RRC Connection Setup Attempts Due to Improper TAC Planning
⚫ Problem description: Although the setup success rates of RRC
connections and E-RAB connections are normal in some areas of a site, the
number of RRC connection setup attempts increases drastically
while the number of E-RAB connection setup attempts remains the
same.

Troubleshooting Process
Action 1: performing scope identification, KPI trend analysis, and cause resolution
Analysis results:
1. Analysis of the daily performance counters
shows that the drastic increase in RRC
connection setup attempts takes place only
at the top sites, not on the entire
network.
2. Although the setup success rates of
RRC connections and E-RAB connections are normal,
the number of RRC connection setup
attempts increases drastically while the
number of E-RAB connection setup
attempts remains the same. Based on experience,
this pattern is caused by improper
TAC planning.
Status: OK
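The pattern used in Action 1 (RRC attempts surge while E-RAB attempts stay flat) can be screened for per cell. The following is a sketch only; the thresholds `surge_ratio` and `stable_tolerance` are hypothetical values chosen for illustration and are not taken from the course.

```python
def is_signaling_only_surge(rrc_today, rrc_baseline, erab_today, erab_baseline,
                            surge_ratio=2.0, stable_tolerance=0.2):
    """Return True if RRC attempts surged while E-RAB attempts stayed stable.

    surge_ratio: minimum growth factor of RRC attempts (hypothetical default).
    stable_tolerance: maximum relative change of E-RAB attempts (hypothetical).
    """
    if rrc_baseline <= 0 or erab_baseline <= 0:
        return False  # no usable baseline for comparison
    rrc_growth = rrc_today / rrc_baseline
    erab_change = abs(erab_today - erab_baseline) / erab_baseline
    return rrc_growth >= surge_ratio and erab_change <= stable_tolerance
```

Cells flagged by such a screen are candidates for the TAC/TAL planning check rather than for coverage or capacity analysis.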

Troubleshooting Process (Cont.)
Action 2: checking operation logs, device faults, alarms, and external events
Analysis results: No exception is found.
Status: OK
Action 3: checking parameters
Analysis results: The eNodeB parameter configurations of the problematic site are compared
with those of the normal sites, and no difference is found.
Status: OK
Action 4: checking version differences and known issues
Analysis results: No known issue that could result in this problem is found in the related
release notes and preventive guides.
Status: OK

Troubleshooting Process (Cont.)
Action 5: checking network planning and optimization
Analysis results: A check of the network topology finds that the top
sites (in the red circles in the figure) differ from the nearby sites
in TAC planning and belong to different TALs. The top sites are therefore
discretely distributed relative to their neighbors, with improper TAC and TAL
planning.
Status: OK
⚫ Analysis:
 The setup success rates of RRC connections and E-RAB connections have not decreased
at the top sites. Therefore, this problem is not caused by weak coverage.
 No difference is found in the network parameter configurations between the top sites
and the normal sites. The only difference is that the two top sites are newly deployed.
 The TAC/TAL of the two top sites in the red circles are improper because they are not
planned according to geographical position (the sector number is used as the TAC number).
 A tracking area update (TAU) is required when a UE moves from a cell of one TAL
to a cell of another TAL. UEs moving in the edge areas of the two top sites therefore
reselect frequently between cells of different TALs, and each such reselection triggers a TAU.
A TAU from idle mode requires an RRC connection but normally no E-RAB setup. As a result,
the number of RRC connection setup attempts increases while the number of E-RAB
connection setup attempts remains the same.
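The planning check in Action 5 can be approximated by searching for sites whose tracking area differs from that of every neighbor. This is a sketch under stated assumptions: each site is assumed to have a single area identifier (TAC or TAL ID), the neighbor relations are assumed to be available as a dictionary, and all names are hypothetical.

```python
def find_area_islands(site_area, neighbors):
    """Return the sorted sites whose area ID differs from all of their neighbors.

    site_area: {site: area_id}, where area_id is a TAC or TAL identifier.
    neighbors: {site: [neighbor sites]}.
    """
    islands = []
    for site, area in site_area.items():
        neighbor_areas = [site_area[n] for n in neighbors.get(site, []) if n in site_area]
        # A site is an island only if it has neighbors and none share its area.
        if neighbor_areas and all(a != area for a in neighbor_areas):
            islands.append(site)
    return sorted(islands)
```

Sites returned by this check sit inside a foreign tracking area, so UEs crossing their coverage are likely to perform a TAU in each direction.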

Thank you
www.huawei.com
