2-LTE Access Fault Diagnosis ISSUE1.03 (Drive Test)
LTE Access Fault Diagnosis
www.huawei.com
Objectives
⚫ Upon completion of this course, you will be able to:
Gain a deep understanding of the UE initial access flow
Contents
1. Common Access Problem and Influence Factors
Common Access Problem
Source: Traffic KPI

Problem: Low RRC establishment success rate, low E-RAB establishment success rate, or low CSSR
Identification method:
1. A KPI is below the baseline or required value.
2. A KPI deteriorates after an upgrade.

Problem: Unstable RRC connection setup success rate, E-RAB connection setup success rate, or CSSR
Identification method: A KPI varies by more than 20% from that of the previous day or the same day of the previous week.

Problem: An abrupt increase or drop in access requests
Identification method: The measured number of RRC connection setup attempts in a cell increases or drops abruptly, while the RRC and E-RAB setup success rates stay normal and the measured maximum number of users in the cell remains basically stable.

Problem: Sleeping cells
Identification method: Suddenly no UEs can access a cell that was previously accessible, or the number of UEs in the cell gradually drops to zero.

⚫ Note: Access-related KPIs do not cover all access problems. For example, no KPI is available for the case where a UE fails to find an LTE network and therefore cannot initiate an access request.
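The 20% instability rule for traffic KPIs can be sketched in a few lines. This is only an illustration: it assumes the 20% figure means a relative change against the reference value, which the slide does not state, and the function name is hypothetical.

```python
def kpi_is_unstable(today: float, reference: float, threshold: float = 0.20) -> bool:
    """Return True if today's KPI deviates from the reference by more than `threshold`.

    `reference` is the value of the previous day or of the same day of the
    previous week. Treating the 20% as a relative change is an assumption.
    """
    if reference == 0:
        # A relative change is undefined; any non-zero value counts as a change.
        return today != 0
    return abs(today - reference) / abs(reference) > threshold


# RRC setup success rate dropped from 98% to 75%: flagged as unstable.
print(kpi_is_unstable(75.0, 98.0))
# 98% to 97% is within normal variation.
print(kpi_is_unstable(97.0, 98.0))
```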
Common Access Problem (Cont.)
Source: Drive test KPI

Problem: Unsatisfactory call setup success rate
Identification method: The drive test results show that the CSSR is below the baseline or required value.

Problem: Long attach delay
Identification method: The drive test results show that the average attach delay exceeds the baseline or required value.

Problem: Long idle-to-active delay
Identification method: The drive test results show that the average idle-to-active delay exceeds the baseline or required value.

Source: Complaints

Problem: Failures of calls or data services
Identification method:
1. No signal bar is displayed on a UE.
2. Signal bars are displayed on a UE, but the user still cannot make a call or perform a data service.
Common Factors of Access Fault
RF
RF planning issues:
1. Wrong PRACH parameters
2. Improper TA planning
Resource issues:
1. Limited air interface resources
2. CPU overload
Coverage issues:
1. Weak coverage
2. Cross coverage

Device
UE issues:
1. Top UE problem
eNodeB issues:
1. eNodeB fault
2. Sleeping cell
EPC issues:
1. Wrong subscription data
2. EPC abnormal
General Analysis Procedure
Step: Performing scope identification, KPI trend analysis, and cause resolution
Prerequisite: None
Purpose:
1. Determine whether the problem is a top-cell problem or a network-wide problem.
2. Analyze the major causes of access failures and determine priority actions.

Step: Checking operation logs, device faults, alarms, and external events
Prerequisite: None
Purpose:
1. Check operation logs to identify parameter modifications and operations that may cause problems.
2. Check whether device faults result in access problems or whether access-related alarms are reported.

Step: Checking parameters
Prerequisite: None
Purpose:
1. Analyze whether the core access parameters are set properly on the eNodeB and EPC and are consistent between them.

Step: Checking network planning and optimization
Prerequisite: Perform this action based on the analysis results provided by action 1.
Purpose:
1. Check whether the access problem is caused by improper parameter configurations.
2. Check whether an access problem is caused directly by resource congestion or indirectly by improper parameter configurations.
3. Check whether an access failure is caused by weak coverage.
General Analysis Procedure (Cont.)
Action: Checking RF channels
Prerequisite: For top-cell problems, perform this action.
Purpose:
1. Check whether RF channels are normal.
2. Check for uplink interference.

Action: Checking top UE types and top users
Prerequisite: For top-cell problems, check top users. For network-wide problems, check top UE types.
Purpose:
1. Check whether access-related KPI deterioration happens on individual users.
2. Check whether the access problem is caused by a certain type of UE.

Action: Checking the EPC
Prerequisite: Perform this action based on the analysis results provided by action 1.
Purpose: Check whether an access failure is caused by EPC faults.

Action: Checking transmission
Prerequisite: Perform this action based on the analysis results provided by action 1.
Purpose: Check whether the access failure is caused by transmission faults.
Relevant Data Source
⚫ Performance counters
Performance counters can be obtained easily and allow data collection after the event. However, they provide only a rough problem identification result.
Performance counters help determine whether an access problem occurs in the RRC connection setup phase or in the E-RAB connection setup phase, and indicate the main causes of the problem. They can be used for rapid problem identification and rough location of the faulty NE.
⚫ Signaling tracing
Signaling messages help accurately identify the phase in which an access failure occurs. This is useful for checking compatibility problems between UEs and the EPC, for analyzing problems discovered during drive tests, and for reproducing an access problem. It is good practice to perform signaling tracing over the Uu and S1 interfaces on the faulty node where the access problem was discovered, whether by analyzing performance counters and alarms or during drive tests.
Relevant Data Source (Cont.)
⚫ Drive test data
The drive test data provides the signal strength and scheduling information of the UEs (depending on the drive test tools and UEs). An access problem can be accurately identified by comparing the drive test data with the signaling messages on the eNodeB side.
Action 1: Scope Identification
Data source: Performance counters
Analysis method:
1. Identify the scope of the problem.
2. Identify whether the access problem occurs in the RRC connection setup phase or the E-RAB connection setup phase by analyzing the RRC connection success rates and the E-RAB connection success rates.
3. If the access problem occurs in the RRC connection setup phase, collect the data of RRC connection setup success rates in different scenarios by the cause of the RRC connection setup
Solution: Continue to perform other steps in action 1.
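Locating the failing phase from success rates can be sketched as follows. This is an illustration under assumptions: the attempt and success counts are passed in as plain numbers, and the 99% baseline is a hypothetical value, not one given in this course.

```python
def failing_phases(rrc_att: int, rrc_succ: int,
                   erab_att: int, erab_succ: int,
                   baseline: float = 0.99) -> list:
    """Return the access phases whose setup success rate is below `baseline`.

    `baseline` is a hypothetical target; use the value agreed for the network.
    Phases with zero attempts are skipped because no rate can be computed.
    """
    phases = []
    if rrc_att > 0 and rrc_succ / rrc_att < baseline:
        phases.append("RRC connection setup")
    if erab_att > 0 and erab_succ / erab_att < baseline:
        phases.append("E-RAB connection setup")
    return phases


# RRC success rate is 95.0%, E-RAB success rate is about 99.8%.
print(failing_phases(1000, 950, 900, 898))
```

With these sample numbers only the RRC connection setup phase is reported, so the next step would be to break the RRC failures down by cause.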
Action 1: KPI Trend Analysis
Phase: Random access
Symptom: No UEs access the network.
Condition: The cell status is normal and UEs access the network. The following situation suddenly or gradually occurs: L.Traffic.User.Max = 0 or L.RRC.ConnReq.Att = 0
Cause: Sleeping cell
Action: None

Phase: RRC success rate deterioration
Counter: L.RRC.SetupFail.Rej

Condition: L.Traffic.User.Max is close to or reaches the product specifications.
Cause: Resource congestion
Action: Check congestion

Condition: The RRC connection setup rejections include L.RRC.SetupFail.ResFail.PUCCH.
Cause:
1. PUCCH parameter configurations are limited.
2. PUCCH resources fail to be expanded.
3. The number of UEs is large.
Action: Check congestion

Condition: The RRC connection setup rejections include L.RRC.SetupFail.Rej.FlowCtrl.
Cause: The number of online UEs exceeds the CAPS specifications of a single eNodeB and VS.BBUBoard.CPUload.Max is less than 80%.
Action: Check congestion
⚫ Classify problems based on access failure values and associated KPIs so that actions
to be taken and the sequence of taking actions can be determined. You are advised
to focus on causes and associated KPIs of critical problems and determine whether
inventory optimization is required based on provided standards.
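The classification can be sketched as a small rule function over one cell's counter values. This is a minimal illustration, not a tool behavior: the `user_spec` argument (the product specification for the number of users) and the 90% "close to specification" margin are hypothetical, and the order in which the rules are tested is a design choice of this sketch.

```python
def classify_rrc_problem(c: dict, user_spec: int, near_spec: float = 0.9) -> str:
    """Map one cell's counters to a problem class from the KPI trend table.

    `c` maps counter names to values for one measurement period.
    `user_spec` and `near_spec` are assumptions made for this sketch.
    """
    if c["L.Traffic.User.Max"] == 0 or c["L.RRC.ConnReq.Att"] == 0:
        return "sleeping cell"
    if c.get("L.RRC.SetupFail.ResFail.PUCCH", 0) > 0:
        return "PUCCH resource congestion"
    if c.get("L.RRC.SetupFail.Rej.FlowCtrl", 0) > 0:
        return "flow control"
    if (c.get("L.RRC.SetupFail.Rej", 0) > 0
            and c["L.Traffic.User.Max"] >= near_spec * user_spec):
        return "resource congestion"
    return "unclassified"


cell = {"L.Traffic.User.Max": 390, "L.RRC.ConnReq.Att": 9000,
        "L.RRC.SetupFail.Rej": 120}
print(classify_rrc_problem(cell, user_spec=400))
```

Every class except "sleeping cell" and "unclassified" leads to the congestion check.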
Action 2: Check Operation Log
Data source: Operation logs (operation logs of the top 10 sites for a network-wide problem and of the top 10 cells for a top-cell problem)
Analysis method:
1. In a sudden KPI deterioration scenario, check for abnormal operations that may have been performed within the week before the deterioration occurred.
2. In a gradual KPI deterioration scenario, check for abnormal operations that may have been performed within the latest week.
3. If a problem occurs on the entire network, check for abnormal operations that may have been performed at all sites.
4. Abnormal operations include but are not limited to addition, removal, blocking, activation, and deactivation.
5. If an operation is performed in batches, check it on the MAE because the operation details are not available on the eNodeB side.
Solution: Check whether the operation can be rolled back. If yes, check whether the KPI improves after the rollback.
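For the sudden-deterioration scenario, selecting the operations to review amounts to a time-window filter. The sketch below assumes the operation log has already been exported as (timestamp, command) pairs; that format and the sample entries are illustrative, not the actual log layout.

```python
from datetime import datetime, timedelta


def operations_before(ops, kpi_drop_time, days=7):
    """Return the operations performed within `days` days before the KPI drop.

    `ops` is a list of (timestamp, command) tuples; this structure is an
    assumption made for the sketch.
    """
    window_start = kpi_drop_time - timedelta(days=days)
    return [(t, cmd) for t, cmd in ops if window_start <= t <= kpi_drop_time]


drop = datetime(2024, 5, 20, 12, 0)
log = [
    (datetime(2024, 5, 10, 9, 0), "MOD CELL"),    # older than one week
    (datetime(2024, 5, 18, 22, 30), "BLK CELL"),  # inside the window
    (datetime(2024, 5, 21, 8, 0), "ACT CELL"),    # after the KPI drop
]
for t, cmd in operations_before(log, drop):
    print(t.isoformat(), cmd)
```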
Action 2: Check Device Fault & Alarm
Data source: Alarm and fault logs (top 10 sites for a network-wide problem and top 10 cells for a top-cell problem)
Analysis method:
1. In a sudden KPI deterioration scenario, check the alarms and device fault logs within the week before the deterioration occurred, as well as active alarms and faults.
2. In a gradual KPI deterioration scenario, check the alarms and device fault logs within the latest week, as well as active alarms and faults.
3. If a problem occurs on the entire network, check for abnormal operations that may have been performed at all sites.
Solution: Analyze the impact of alarms and device faults on an access KPI. Then, clear these alarms by referring to the alarm and fault handling guide and check whether the KPI improves after the alarms are cleared.
⚫ Procedure introduction: Perform this action to check for device faults and alarms related to a KPI change. A fault that directly affects the KPI, or the alarm for such a fault, must be handled first, with a higher priority than faults or alarms that are barely related to the KPI change.
Typical Alarm for Access Problem
Module: License

Alarm/Event name: ALM-26812 System Dynamic Traffic Exceeding Licensed Limit
Alarm/Event impact: Services that exceed the license capacity cannot be accessed.
eRAN3.0: If the number of users exceeds the licensed maximum, the eNodeB allows these UEs to access the network and immediately releases them. As a result, the number of E-RAB connection setup attempts and releases increases.
eRAN6.0: If the number of users exceeds the licensed maximum, the E-RAB setup fails. The eNodeB responds to the MME with a UE context setup failure with the cause radio-resources-not-available.

Module: Cell

Alarm/Event name: ALM-29241 Cell Reconfiguration Failed
Alarm/Event impact: The PDSCH power configuration change does not take effect. As a result, the cell coverage does not meet the requirement.

Alarm/Event name: ALM-29245 Cell Blocked
Alarm/Event impact: The cell cannot provide services.

Alarm/Event name: ALM-29240 Cell Unavailable
Alarm/Event impact: The cell cannot provide services.
Typical Alarm for Access Problem (Cont.)
Module: Board

Alarm/Event name: ALM-26200 Board Hardware Fault
Alarm/Event impact: The board cannot work properly and services carried over this board may be interrupted. The board cannot perform all the designed functions and the board reliability degrades. If this problem persists, services carried over this board may be interrupted.

Alarm/Event name: ALM-26202 Board Overload
Alarm/Event impact: The access success rates and service quality may deteriorate. If this problem persists, maintenance operations on this board may respond slowly or even fail due to operation timeout. The test operations and tracing tasks of lower priorities may be suspended or terminated.

Module: S1 interface

Alarm/Event name: ALM-29207 eNodeB Control Plane Transmission Interruption
Alarm/Event impact: All the SCTP links in the eNodeB are faulty, resulting in failures such as S1 and X2 link setup failure, cell activation failure, and network access failure of users.

Alarm/Event name: ALM-25888 SCTP Link Fault
Alarm/Event impact: The SCTP link cannot process signaling.

Alarm/Event name: ALM-25889 SCTP Link Congestion
Alarm/Event impact: The services are interrupted because the data cannot be transmitted due to insufficient space in the sending buffer.
Typical Alarm for Access Problem (Cont.)
Module: RF channel

Alarm/Event name: ALM-26529 RF Unit VSWR Threshold Crossed
Alarm/Event impact: The return loss at the antenna port is excessive. As a result, the RF unit automatically switches off the TX channel, and the ongoing services carried on the TX channel are interrupted.

Alarm/Event name: ALM-26532 RF Unit Hardware Fault
Alarm/Event impact: The RF unit may work improperly. The ongoing services carried on the RF unit may be interrupted.

Alarm/Event name: ALM-29207 eNodeB Control Plane Transmission Interruption
Alarm/Event impact: All the SCTP links in the eNodeB are faulty, resulting in failures such as S1 and X2 link setup failure, cell activation failure, and network access failure of users.

Alarm/Event name: ALM-26521 RF Unit RX Channel RTWP/RSSI Too Low
Alarm/Event impact: The receive sensitivity of the RFU decreases, the demodulation performance of the cell deteriorates, and the uplink coverage shrinks. If the RTWP/RSSI on all RX channels of the cell is too low, the ongoing services of the cell may be interrupted.

Alarm/Event name: ALM-26522 RF Unit RX Channel RTWP/RSSI Unbalanced
Alarm/Event impact: The receive sensitivity of the RFU decreases, the demodulation performance of the cell deteriorates, and the uplink coverage shrinks.
Cause of High VSWR
⚫ VSWR: Voltage Standing Wave Ratio. It indicates whether the feeder impedance is matched. The normal VSWR range is 1 to 1.5. If the current VSWR exceeds a specified threshold, the eNodeB generates the relevant alarm.
⚫ Alarm generation:
Cell is activated
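VSWR and return loss describe the same mismatch. With reflection coefficient magnitude |Γ|, the standard relations are RL = -20·log10(|Γ|) and VSWR = (1 + |Γ|) / (1 - |Γ|). The sketch below converts between the two; the function names are illustrative and are not part of any eNodeB tool.

```python
import math


def vswr_from_return_loss(rl_db: float) -> float:
    """Convert a return loss in dB (positive number) to VSWR."""
    gamma = 10 ** (-rl_db / 20)  # reflection coefficient magnitude
    return (1 + gamma) / (1 - gamma)


def return_loss_from_vswr(vswr: float) -> float:
    """Convert a VSWR (greater than 1) to return loss in dB."""
    gamma = (vswr - 1) / (vswr + 1)
    return -20 * math.log10(gamma)


# A return loss of 14 dB is just inside the normal VSWR range of 1 to 1.5.
print(round(vswr_from_return_loss(14.0), 3))
# The upper end of the normal range, VSWR 1.5, is about 13.98 dB return loss.
print(round(return_loss_from_vswr(1.5), 2))
```

A lower return loss means more reflected power and therefore a higher VSWR.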
Solution for High VSWR
⚫ Step 1: Check whether the VSWR alarm threshold is correct (default is 2dB).
Note: The RRU TX channel is automatically closed if the VSWR is extremely high (more than the post-processing threshold).
⚫ Step 2: If the alarm threshold is correct, check whether the relevant feeder installation and RRU connections meet the standard.
⚫ Step 3: After the feeder is adjusted, if the TX channel is closed, activate the TX channel again:
MOD TXBRANCH
Cause of Low RSSI
⚫ Alarm generation: The RSSI is less than a specified value.
⚫ Feeder problem
Bad feeder quality causes additional loss.
RRU fault
Solutions for Low RSSI
⚫ Step 1: Check whether the RRU RX attenuation is correct
and connection
Cause of Imbalance of RSSI
⚫ Alarm generation: The difference between the RSSI of the main RX channel and the RSSI of the diversity RX channel exceeds 10 dB.
⚫ Possible causes
High interference
RRU cross connections
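The imbalance condition compares the two RX branches. A minimal sketch, assuming both RSSI values are available in dBm (the function name and sample values are illustrative):

```python
def rssi_imbalanced(main_dbm: float, diversity_dbm: float,
                    threshold_db: float = 10.0) -> bool:
    """Return True if the main and diversity RSSI differ by more than the threshold."""
    return abs(main_dbm - diversity_dbm) > threshold_db


# A 13 dB gap between the branches meets the alarm condition.
print(rssi_imbalanced(-95.0, -108.0))
# A 4 dB gap does not.
print(rssi_imbalanced(-100.0, -104.0))
```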
Solution of Imbalance of RSSI
⚫ UL interference check
From the web LMT: Perform spectrum detection to evaluate UL interference.
From the MAE client: Perform interference detection monitoring.
Find the interference source.
(Figure: ANT1 and ANT2 connected to RRU1 and RRU2)
Action 3: Check Parameters
Parameter: Number of initial PDCCH symbols
Source: MML
Impact: If the switch for dynamic adjustment of the number of OFDM symbols occupied by the PDCCH is turned off and this parameter is set to 1, the peak throughput of a single user increases. However, if the bandwidth of the cell is lower than 3 MHz, user access is affected.
Recommended value: If the switch for dynamic adjustment of the number of OFDM symbols occupied by the PDCCH is turned on, preferably set this parameter to 1. If the switch is turned off, preferably set this parameter to 3.

Parameter: Encryption algorithm
Source: MML
Impact: In the ENodeBCipherCap MO, the PrimaryCipherAlgo, SecondCipherAlgo, and ThirdCipherAlgo parameters must be set to different values.
Recommended value: PrimaryCipherAlgo = AES, SecondCipherAlgo = Snow3G, ThirdCipherAlgo = NULL

Parameter: Primary integrity algorithm
Source: MML
Impact: In the ENodeBIntegrityCap MO, the PrimaryIntegrityAlgo, SecondIntegrityAlgo, and ThirdIntegrityAlgo parameters must be set to different values.
Recommended value: PrimaryIntegrityAlgo = AES, SecondIntegrityAlgo = Snow3G, ThirdIntegrityAlgo = NULL
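The rule that the three algorithm priorities must differ is easy to check on exported configuration data. The sketch below assumes the parameter values have already been read into strings; how they are exported is outside its scope, and the function name is hypothetical.

```python
def algorithms_distinct(primary: str, second: str, third: str) -> bool:
    """Return True if the three algorithm priorities are set to different values."""
    return len({primary, second, third}) == 3


# Recommended cipher setting: AES, Snow3G, NULL.
print(algorithms_distinct("AES", "Snow3G", "NULL"))
# Invalid setting: the same algorithm configured twice.
print(algorithms_distinct("AES", "AES", "NULL"))
```

The same check applies to the integrity algorithm parameters.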
Action 3: Check Parameters (Cont.)
Parameter: S1 message waiting timer
Source: MML
Impact: If this parameter is set to a small value, the eNodeB may determine that a timeout has occurred before the MME has had time to respond to the message. If this parameter is set to a large value, system resources are occupied for a long period of time when exceptions occur and no response messages are received from the MME.
Recommended value: 20s

Parameter: Uu message waiting timer
Source: MML
Impact: This parameter sets the length of the timer for the eNodeB waiting for the UE to send the Uu response message. If this parameter is set to a small value, the eNodeB may determine that a timeout has occurred before the UE has had time to respond to the message. If this parameter is set to a large value, system resources are occupied for a long period of time when exceptions occur and no response messages are received from the UE.
Recommended value: 35s
Action 3: Check Parameters (Cont.)
Parameter: AMBR
Source: S1 signaling (MME)
Recommended value: Set this parameter to a value greater than 0. If AMBR is set to 0, users cannot access the network.
Action 4: Weak Coverage Check
Action 4: Check Congestion
Data source: Performance counters
Analysis method: Check the relevant resources, including PRB, CCE, CPU usage, license, PUCCH usage, and the number of active users.
Solution: Locate the congestion bottleneck and perform capacity expansion.
⚫ This action targets top cells and is performed in any of the following scenarios:
1. L.RRC.SetupFail.ResFail returns a non-zero value and the number of UEs is limited.
2. L.RRC.SetupFail.NoReply returns a non-zero value and the BBP CPU usage is high.
3. L.RRC.SetupFail.Rej.FlowCtrl returns a non-zero value.
4. L.RRC.ConnReq.Msg.disc.FlowCtrl returns a non-zero value.
Action 5: Check Interference
Action 6 : Top UE Check
Data source: Nastar (choose the top 10 sites for a network-wide problem)
Analysis method: Use the Capabilities of the Top Users function provided by the Nastar to check the ratio of top UE type problems to total exceptions. If the ratio of exceptions due to the top 1 UE type is twice higher than that of normal UE types, the problem is a top UE type problem.
1. Collect logs of the top 10 sites and statistics about the UE capability recorded in CHRs. Then, generate the ratio of each UE capability to obtain the top 1 UE type.
2. Calculate the proportion of exceptions generated by the top 1 UE type and the proportions of exceptions generated by other top UE types based on the statistical results of the CHR.
Solution:
1. Check for known issues in the current UE version.
2. Use the UE to reproduce the problem.
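The "twice higher" comparison can be sketched as a count over exception records. This is an interpretation, not the Nastar algorithm: it assumes each exception in the CHR statistics is tagged with a UE type, and it compares the top 1 type against the mean count of the remaining types.

```python
from collections import Counter
from typing import List, Optional


def top_ue_type(exception_ue_types: List[str], factor: float = 2.0) -> Optional[str]:
    """Return the top 1 UE type if it stands out, else None.

    A type stands out when its exception count is more than `factor` times
    the mean count of the other types (an assumed reading of the rule).
    """
    ranked = Counter(exception_ue_types).most_common()
    if len(ranked) < 2:
        return None  # nothing to compare against
    top_type, top_count = ranked[0]
    others = [count for _, count in ranked[1:]]
    mean_other = sum(others) / len(others)
    return top_type if top_count > factor * mean_other else None


records = ["TypeA"] * 60 + ["TypeB"] * 10 + ["TypeC"] * 10
print(top_ue_type(records))
```

In this sample TypeA accounts for 60 of 80 exceptions, six times the mean of the other types, so it is reported as a top UE type problem.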
Action 7: Check EPC Exceptions
Data source: CHR and Uu/S1 signaling
Analysis method:
E-RAB setup failures: Analyze the following problems by using CHRs and Uu/S1 signaling.
1. Failure of an eNodeB to respond to a context setup request: Check the AS-layer integrity protection and encryption algorithm configuration on the eNodeB by using standard signaling. Check whether the transport-layer IP address on the EPC, the AMBR, and the ARP are correct.
2. Abnormal release initiated by the MME: According to CHRs or Uu/S1 signaling, check whether the EPC delivers the release command too early, or whether the EPC delivers the release command after its context setup waiting timer expires because the eNodeB waits on the air interface for a long period of time. For such problems, check the length of the context waiting timer with the EPC personnel.
Abnormal NAS:
1. Authentication failure
2. NAS security activation failure
Solution:
1. If EPC exceptions exist, locate the problem with the EPC personnel.
2. If UE exceptions exist, perform the top users/UE check.
3. If air interface scheduling is inappropriate, perform the weak coverage and interference checks.
⚫ The abnormal E-RAB setup signaling process is as follows: After the EPC delivers the context setup request message of a UE to the eNodeB, the eNodeB fails to respond to the message, or the EPC delivers the command to release the UE's context before the eNodeB responds to the context setup request.
Action 8: Check Transmission
Data source: Performance counters
Analysis method: The cause value of an access failure is L.E-RAB.FailEst.TNL. Analyze the quality of the SCTP link by using the counters VS.SCTPLnk.Cong.Dur, VS.SCTPLnk.Cong, VS.SCTPLnk.Unavail.Dur, and VS.SCTPLnk.Unavail.
Solution:
1. Reproduce the access problem caused by transmission.
2. Troubleshoot the transmission faults.

Data source: Alarms
Analysis method: Check for ALM-25888 SCTP Link Fault, ALM-25886 IP Path Fault, and ALM-29240 Cell Unavailable.
Solution: Clear the reported alarms by referring to the Alarm/Event References.

Data source: Parameter settings
Analysis method: Check whether the settings of VLAN, DSCP, IPRT, IPPATH, SCTP, and other transmission parameters are the same as the planned settings.
Solution: Reconfigure the parameters that are not currently configured as planned.
⚫ Procedure introduction: This action is required if traffic data analysis identifies L.E-RAB.FailEst.TNL as the cause, or a low S1 signaling message setup success rate, at the top sites for an access problem. A transmission fault must always be cleared first. Alarms or faults that are not closely related to the access success rate can be handled with a lower priority than transmission faults.
Action 8: Check Transmission (Cont.)
Data source: Signaling messages over the S1 interface
Analysis method: If a context setup fails, use the S1 interface signaling messages to check whether the value of the transportLayerAddress field in the INITIAL_CONTEXT_SETUP_REQ message is consistent with the peer IP address of the IP path.
Solution:
1. If an IP address is not configured at the eNodeB transmission layer, configure an IP address for the IP path of the eNodeB.
2. If an IP address is not configured as planned at the eNodeB transmission layer, reconfigure an IP address for the IP path of the eNodeB.
3. If an IP address is not configured as planned at the EPC transmission layer, contact the EPC engineer for a reconfiguration of an IP address for the IP path of the eNodeB.
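The consistency check between the signaled address and the configured IP path can be sketched as follows. It assumes the trace tool shows transportLayerAddress as a string of hex octets (4 octets for IPv4, 16 for IPv6) and that the peer IP address of the IP path is known in text form; the function name and sample addresses are illustrative.

```python
import ipaddress


def address_matches(tla_hex: str, ip_path_peer: str) -> bool:
    """Compare a hex-encoded transportLayerAddress with the IP path peer address."""
    # ip_address() accepts packed bytes of length 4 (IPv4) or 16 (IPv6).
    signaled = ipaddress.ip_address(bytes.fromhex(tla_hex))
    return signaled == ipaddress.ip_address(ip_path_peer)


# 0A0A0A01 decodes to 10.10.10.1, which matches the configured peer.
print(address_matches("0A0A0A01", "10.10.10.1"))
# A different signaled address points to a configuration mismatch.
print(address_matches("0A0A0A02", "10.10.10.1"))
```

A mismatch indicates that either the eNodeB IP path or the EPC transport configuration deviates from the plan.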
⚫ Analysis:
Checking the performance statistics of E-RAB release shows that most of the failures are caused by transport resource not available.
⚫ Tracing message
⚫ Message tracing
Case 3 – Multi-mode UE Attach Failure
(Figure: Uu trace and S1 trace)
⚫ Analysis
Since most of the procedures are normal and the eNodeB considers the release normal, the problem is located as a NAS failure.
⚫ Analysis (Cont.)
Why would the UE need to attach to the MSC? The UE model is a Huawei E398, a multi-mode UE (GSM/UMTS/LTE), so we assume that this UE performs a combined attach. The attach request message confirms this assumption.
⚫ Conclusion:
There is no CS domain configuration in the current EPS network, so the MME replies with a PS-only attach accept and also indicates that the MSC is not reachable.
⚫ Temporary solution:
Change the UE attach mode to PS only
Add the CS domain configuration in the EPC
⚫ Final solution:
Upgrade the MME to be compatible with combined attach even when there is no CS domain
Troubleshooting Process
Troubleshooting Process (Cont.)
Action: Action 2: checking operation logs, device faults, alarms, and external events
Analysis results: No operations or alarms for this cell were reported within the week preceding the occurrence of this problem.
Status: OK

Action: Action 3: checking parameters
Analysis results: The frontline engineers compared the configuration file of this eNodeB with those of eNodeBs whose cells are normal and found that the SRSSUBFRAMECFG parameter is set to SC9 at this site and to SC3 at other sites.
Status: OK

Action: Action 4: checking version differences and known issues
Analysis results: This problem has been identified, and this action is not required.
Status: /

⚫ Parameter analysis:
As described in the impact of core parameters, this SRS configuration does not allow SRS resources of this cell to decrease or increase. Therefore, the number of users of this cell cannot be dynamically adjusted.
Before the horse race event, the cell had a limited number of users and therefore an SRS resource increase was not required. However, the number of users increased significantly during the horse race, yet the SRS resource increase could not be performed.
⚫ Description of the SRSSUBFRAMECFG parameter in the LTE access parameters:
If SrsSubframeCfg is set to SC9, the cell subframe cycle is 10 ms and the offset is 0. The cell subframe cycle and offset do not allow a cell migration. Therefore, this cell does not allow its resources to increase or decrease.
Troubleshooting Process
Action: Action 1: performing scope identification, KPI trend analysis, and cause resolution
Analysis results:
1. Analysis of daily performance counters shows that the drastic increase of RRC connection setup attempts takes place only at the top sites, rather than on the entire network.
2. Despite normal setup success rates of RRC connections and E-RAB connections, the number of RRC connection setup attempts increases drastically and the number of E-RAB connection setup attempts remains the same. Therefore, experience suggests that this problem is caused by improper tracking area planning.
Status: OK
Troubleshooting Process (Cont.)
Action: Action 5: checking network planning and optimization
Analysis results: Checking the network topology shows that the top sites (in the red circles in the following figure) differ from the nearby sites in terms of TAC planning and belong to different TALs. Therefore, the top sites are discretely distributed, with improper TAC and TAL planning. For details, see the following figure.
Status: OK
⚫ Analysis:
The setup success rates of RRC connections and E-RAB connections have not decreased at the top sites. Therefore, this problem is not caused by weak coverage.
No difference is found in the network parameter configurations between the top sites and normal sites. The only difference is that the two top sites are newly deployed.
The TAC/TAL of the two top sites in the red circle are improper because they are not planned by geographical position (the sector number is used as the TAC number).
A tracking area update (TAU) is required when a UE moves from a cell of one TAL to a cell of another TAL. UEs moving in the edge areas of the two top sites frequently reselect between cells that belong to different TALs, and each such reselection triggers a TAU. As a result, the number of RRC connection setup attempts increases.
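The effect of TAL planning on signaling load can be illustrated by counting TAL changes along a UE's reselection path. The cell names and TAL numbers below are made up for the example, and the model is simplified to one TAU per change of TAL.

```python
def count_tau(path, tal_of):
    """Count the cell reselections along `path` that cross a TAL boundary.

    `path` is the ordered list of cells a UE camps on; `tal_of` maps each
    cell to its tracking area list. Both are illustrative inputs.
    """
    return sum(1 for a, b in zip(path, path[1:]) if tal_of[a] != tal_of[b])


# A UE moving back and forth at the edge of a newly deployed site X.
path = ["A", "X", "A", "X", "B"]

# Improper plan: site X sits in a different TAL from its neighbors.
print(count_tau(path, {"A": 1, "B": 1, "X": 2}))
# Plan by geographical position: X shares the TAL of its neighbors.
print(count_tau(path, {"A": 1, "B": 1, "X": 1}))
```

With the improper plan every reselection in this path crosses a TAL boundary, so each one adds an RRC connection setup attempt; with the geographically planned TAL none does.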
Thank you