SmartCare SEQ Analyst
V200R002C01
Web Service Quality
Assessment and Optimization
Guide
Issue 01
Date 2012-03-24
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
Purpose
This document describes the web service key quality indicators (KQIs) used in the SmartCare
SEQ Analyst system and the methods of assessing and optimizing web service quality with
these KQIs. In addition, it provides analysis for service failures.
Intended Audience
This document is intended for:
Technical support engineers
Maintenance engineers
Change History
Changes between document issues are cumulative. The latest document issue contains all the
changes in earlier issues.
Issue 01 (2012-03-24)
This issue is the first official release.
Contents
3 Assessment
3.1 Involved KQIs
3.2 Baseline
3.3 Method
3.3.1 Procedure
4 Locating Problems
4.1 Method
4.2 Procedure
4.2.1 Page Response Success Rate
4.2.2 Page Response Delay
4.2.3 Page Browsing Success Rate
4.2.4 Page Browsing Delay
4.2.5 Page Download Throughput
To use the data service, a subscriber must first complete attachment and activation, which
attach the subscriber to the PS network, obtain the IP address used for interworking with the
network, and obtain the quality of service (QoS) information for bearer channel
establishment. Some smartphones perform attachment and activation immediately after
power-on, regardless of whether the subscriber starts to use services.
The signaling used for attachment and activation is public signaling and is not associated with
any specific web browsing service. Therefore, public signaling interactions are not counted in
the KQIs for web browsing services. The web browsing service flow is the flow associated
with data interaction.
Step 1 Enter a website address in the address bar of the web browser or click a hyperlink. If the IP
address of the domain name is not cached on the computer, the browser sends a domain name
system (DNS) request. The DNS server replies with a response containing the IP address
corresponding to the domain name.
The DNS request sent before the page request is answered is called the first DNS request. If no
response is returned to the first DNS request, the page fails to open.
Step 2 After the IP address is obtained, the browser sends a TCP link setup request, and the server
replies with a Connect Reply message. The browser then sends a Connect ACK. After this
three-way handshake, the TCP link is set up.
The TCP link setup request sent before the page request is answered is called the first TCP link
setup request. If no response is returned to the first TCP link setup request, the page fails to open.
Step 3 After the TCP link is set up, the browser sends a GET request for the page data. The server
replies with a 200 OK message, indicating that the page request has been answered
successfully.
The GET request sent before the page request is answered is called the first GET request. If no
response is returned to the first GET request, the page fails to open.
Step 4 After the page request is answered successfully, the page data starts to be downloaded.
During the download, the browser may send additional DNS requests, TCP link setup
requests, and GET/POST requests.
Any failure response to a DNS request, TCP link setup request, or GET/POST request is
counted as an object download failure for the page.
Step 5 All objects on the page have been downloaded when the last data packet reaches the browser.
----End
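The first three steps above determine whether a page opens at all. The following minimal Python sketch shows one way to record that outcome for a single page attempt. The names PageAttempt and page_response_stage are hypothetical and are not part of SEQ Analyst; the sketch assumes that a missing first DNS request means the address was cached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageAttempt:
    # Hypothetical record for one page attempt.
    # None = step not sent or not reached, True = answered, False = failed.
    first_dns_ok: Optional[bool]
    first_tcp_ok: Optional[bool]
    first_get_ok: Optional[bool]


def page_response_stage(attempt: PageAttempt) -> str:
    """Return 'responded' or the stage at which the page failed to open."""
    # A missing DNS request is assumed to mean the IP address was cached.
    if attempt.first_dns_ok is False:
        return "first_dns_failed"
    if attempt.first_tcp_ok is not True:
        return "first_tcp_failed"
    if attempt.first_get_ok is not True:
        return "first_get_failed"
    return "responded"
```

Steps 4 and 5 affect only object download and therefore do not change this outcome.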
Definition
Page Response Success Rate = Page response success count/Web service attempts
After a user enters www.vodafone.com in the address bar of the web browser, if the tab
changes to "Welcome to the Corporate Website of Vodafone Group Plc", the web service
request is considered successfully responded to, even though content on the requested web
page is only partially displayed.
Measurement Points
Associated KPIs
The response to a web page request includes three stages: DNS request, TCP link setup
request, and GET response. The corresponding KPIs are First DNS Query Success Rate, First
TCP Connect Success Rate, and First GET Success Rate.
Before using a DNS request for performance measurement, you must determine whether
the DNS request belongs to a web service. To do so, match the domain name in the DNS
request against the hosts and IP addresses of historical web services. If a match is found, the
DNS request is attributed to the matching web service.
One host may correspond to multiple service types. For example, www.sina.com provides web
browsing and streaming media services. SEQ Analyst uses a proportional matching method. For
example, if the web browsing service is used 10 times and the streaming media service is used
5 times, DNS requests are allocated to the web browsing and streaming media services at a
ratio of 10:5. This method may be inaccurate.
One IP address/port may correspond to multiple service types. For example, the IP address/port
53.122.67/80 of www.sina.com may be used for web browsing and streaming media services. SEQ
Analyst uses the same proportional matching method. For example, if the web browsing service is
used 10 times and the streaming media service is used 5 times, IP addresses/ports are allocated
to the web browsing and streaming media services at a ratio of 10:5. This method may be
inaccurate.
SEQ Analyst automatically learns the mappings between hosts, IP addresses, port numbers, and
services. The longer SEQ Analyst runs, the more accurately it can associate hosts, IP
addresses, port numbers, and services.
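The proportional matching described above can be illustrated with a short Python sketch. It only demonstrates the 10:5 arithmetic; the function name allocate_proportionally and the service labels are hypothetical, and the actual SEQ Analyst implementation is not documented here.

```python
from typing import Dict


def allocate_proportionally(total: int, usage: Dict[str, int]) -> Dict[str, float]:
    """Split `total` ambiguous requests across services by historical usage."""
    usage_sum = sum(usage.values())
    if usage_sum == 0:
        # No history yet: nothing can be attributed (assumption of this sketch).
        return {service: 0.0 for service in usage}
    return {service: total * count / usage_sum for service, count in usage.items()}


# 30 ambiguous DNS requests, historical usage ratio 10:5
shares = allocate_proportionally(30, {"web_browsing": 10, "streaming": 5})
```

With these inputs, 20 requests are attributed to web browsing and 10 to streaming media. Because the split is statistical, an individual request can still be attributed to the wrong service, which is why the method may be inaccurate.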
Data Source
Data used for calculation is obtained from packets captured over the Gb, Iu-PS, Gn, and Gi
interfaces.
Formula
A page request is responded successfully when the browser receives a 200 OK message
responding to the first GET request.
Page Response Success Rate includes three associated KPIs. They are First DNS Query
Success Rate, First TCP Connect Success Rate, and First GET Success Rate.
Page Response Success Rate = Page Response Successes/Page Requests
First DNS Query Success Rate (MS) = First DNS Query Successes (MS)/First DNS
Query Requests (MS)
First TCP Connect Success Rate = First TCP Connect Successes/First TCP Connect
Requests
First GET Success Rate = First GET Successes/First GET Requests
Page Requests = First DNS Failures + First TCP Connect Failures + First GET Requests
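The formulas above can be combined as in the following Python sketch. It assumes that Page Response Successes equals First GET Successes, which follows from the statement that a page request is answered when the first GET request receives a 200 OK. The function name and the sample counter values are hypothetical.

```python
from typing import Optional


def page_response_success_rate(first_dns_failures: int,
                               first_tcp_failures: int,
                               first_get_requests: int,
                               first_get_successes: int) -> Optional[float]:
    """Page Response Success Rate from first-transaction counters."""
    # Page Requests = First DNS Failures + First TCP Connect Failures
    #                 + First GET Requests
    page_requests = first_dns_failures + first_tcp_failures + first_get_requests
    if page_requests == 0:
        return None  # no attempts in the period, so the rate is undefined
    return first_get_successes / page_requests


rate = page_response_success_rate(20, 30, 950, 900)
```

In this example, 1000 page requests were attempted (20 + 30 + 950) and 900 were answered, so the rate is 0.9. Attempts that fail at the DNS or TCP stage never produce a GET request, which is why they are added to the denominator separately.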
Definition
Page Response Delay is the delay from the time a user enters a URL and presses Enter, clicks a
hyperlink, or opens the default homepage until the requested web page is opened.
Measurement Points
Data Source
Data used for calculation is obtained from packets captured over the Gb, Iu-PS, Gn, and Gi
interfaces.
Formula
For a page request, Page Response Delay is the delay from sending a DNS request (if there is
no DNS request, it is the TCP link setup request) till receiving a 200 OK responding to the
GET request.
Page Response Delay includes three associated KPIs. They are First DNS Query Delay, First
TCP Connect Delay, and First GET Response Delay.
Page Response Delay = Page Response Time – Page Request Time
First DNS Query Delay (MS) = First DNS Response Success Time (MS) – First DNS
Query Request Time (MS)
First TCP Connect Delay = First TCP Connect Success Time – First TCP Connect
Request Time
First GET Response Delay = First GET Response Success Time – First GET Request
Time
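The start-point rule above (DNS request if present, otherwise TCP link setup request) can be sketched in Python as follows. Timestamps are assumed to be integers in milliseconds, and the function name is hypothetical.

```python
from typing import Optional


def page_response_delay_ms(first_dns_request_ms: Optional[int],
                           first_tcp_request_ms: int,
                           first_get_ok_ms: int) -> int:
    """Delay from the first request of the page to the 200 OK for the first GET."""
    # If there is no DNS request (address cached), timing starts at TCP setup.
    if first_dns_request_ms is not None:
        start = first_dns_request_ms
    else:
        start = first_tcp_request_ms
    return first_get_ok_ms - start


with_dns = page_response_delay_ms(1000, 1040, 1420)
without_dns = page_response_delay_ms(None, 1040, 1420)
```

Here the delay is 420 ms when a DNS request is sent and 380 ms when the address is cached.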
Definition
Page Browsing Success Rate is the rate at which the requested webpage is displayed after a
user sends a request.
Measurement Points
Associated KPIs
KPIs associated with Page Browsing Success Rate include Post Success Rate, GET Success
Rate, DNS Query Success Rate (MS) and TCP Connect Success Rate.
DNS Query Success Rate and TCP Connect Success Rate are the rates associated with web browsing services.
Data Source
Data used for calculation is obtained from packets captured over the Gb, Iu-PS, Gn, and Gi
interfaces.
Formula
For a web page, if the download success rate of all objects is equal to or larger than 90%, the
web page is successfully displayed.
Page Browsing Success Rate includes four associated KPIs. They are DNS Query Success
Rate, TCP Connect Success Rate, GET Success Rate and Post Success Rate.
Page Browsing Success Rate = Page Browsing Success Times/Page Request Times
DNS Query Success Rate (MS) = DNS Query Successes (MS)/DNS Query Requests (MS)
TCP Connect Success Rate = TCP Connect Success Times/TCP Connect Request Times
GET Success Rate = GET Success Times/GET Request Times
Post Success Rate = Post Success Times/Post Request Times
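The 90% rule above can be sketched in Python as follows. The function names are hypothetical, and treating a page with no recorded objects as not displayed is an assumption of this sketch rather than documented behavior.

```python
from typing import Iterable, Optional, Tuple


def page_displayed(objects_ok: int, objects_total: int,
                   threshold: float = 0.9) -> bool:
    """A page counts as displayed if at least 90% of its objects downloaded."""
    if objects_total == 0:
        return False  # assumption: no objects recorded means not displayed
    return objects_ok / objects_total >= threshold


def page_browsing_success_rate(pages: Iterable[Tuple[int, int]]) -> Optional[float]:
    """pages: (objects_ok, objects_total) per page request."""
    results = [page_displayed(ok, total) for ok, total in pages]
    if not results:
        return None
    return sum(results) / len(results)


browsing_rate = page_browsing_success_rate([(9, 10), (8, 10), (40, 40), (19, 20)])
```

In this example three of the four pages reach the 90% threshold, so the rate is 0.75.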
Definition
Page Browsing Delay is the delay for a web page to be completely displayed.
Measurement Points
Associated KPIs
KPIs associated with Page Browsing Delay include DNS Query Delay (MS), TCP Connect
Delay and GET Response Delay.
Data Source
Data used for calculation is obtained from packets captured over the Gb, Iu-PS, Gn, and Gi
interfaces.
Formula
For a page request, Page Browsing Delay is the time from sending a DNS request (if there is
no DNS request, it is the TCP link setup request) till receiving the last downloaded data
packet.
Page Browsing Delay includes three associated KPIs. They are DNS Query Delay (MS), TCP
Connect Delay, and GET Response Delay.
Page Browsing Delay = Page Browsing Success Time – Page Request Time
DNS Query Delay (MS) = Average value of all (DNS Success Response Time (MS) –
DNS Request Time (MS))
TCP Connect Delay = Average value of all (TCP Link Setup Success Time – TCP Link
Setup Request Time)
GET Response Delay = Average value of all (GET Success Response Time – GET
Request Time)
Definition
Page Download Throughput is the average speed for a page to be downloaded.
Measurement Points
Associated KPIs
KPIs associated with the Page Download Throughput include TCP Packet Loss Rate, TCP
Retransmission Rate, Fragmentation Rate and TCP RTT.
Data Source
Data used for calculation is obtained from packets captured over the Gb, Iu-PS, Gn, and Gi
interfaces.
Formula
Page Download Throughput = Total Page Size/Total Page Browsing Delay
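The throughput formula is a ratio of totals, not an average of per-page speeds. The following Python sketch shows the calculation, assuming page sizes in bytes and delays in seconds and reporting the result in kbps; the function name and units are assumptions of this sketch.

```python
from typing import Iterable, Optional, Tuple


def page_download_throughput_kbps(pages: Iterable[Tuple[int, float]]) -> Optional[float]:
    """pages: (page_size_bytes, page_browsing_delay_s) per successfully browsed page."""
    total_bytes = 0
    total_delay_s = 0.0
    for size_bytes, delay_s in pages:
        total_bytes += size_bytes
        total_delay_s += delay_s
    if total_delay_s == 0:
        return None  # no measurable download time
    # bytes -> bits, then bits per second -> kbps
    return total_bytes * 8 / total_delay_s / 1000


throughput = page_download_throughput_kbps([(250_000, 4.0), (125_000, 2.0)])
```

In this example, 375,000 bytes are downloaded in 6 seconds, which gives 500 kbps. Dividing totals weights large pages more heavily than small ones.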
3 Assessment
3.2 Baseline
If the carrier has a service quality assessment baseline, agree on the baseline with the carrier.
If the carrier does not have such a baseline, use the KQIs calculated by the system as the
reference. In that case, the assessment standard is the calculated KQI value ± 10%.
Table 1.1 describes the web service quality assessment baseline used in the Philippines project.
Index                       Benchmark
Page Response Success Rate  > 95%       95% - 90%       90% - 85%       85% - 80%      < 80%
Page Response Delay         < 500 ms    500 ms - 1 s    1 s - 3 s       3 s - 5 s      > 5 s
Page Browsing Success Rate  > 95%       95% - 90%       90% - 80%       80% - 70%      < 70%
Page Browsing Delay         < 1 s       1 s - 3 s       3 s - 6 s       6 s - 10 s     > 10 s
Page Download Throughput    > 500 kbps  500 - 200 kbps  200 - 100 kbps  100 - 40 kbps  < 40 kbps
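The baseline table can be applied mechanically, as in the following Python sketch. The band numbers 1 (best) to 5 (worst) are a hypothetical labeling, because the table does not name its columns. How values that fall exactly on a boundary are treated is also an assumption: here they are placed in the worse band.

```python
# Thresholds copied from the baseline table; True means higher is better.
BASELINE = {
    "page_response_success_rate_pct": ([95, 90, 85, 80], True),
    "page_response_delay_ms": ([500, 1000, 3000, 5000], False),
    "page_browsing_success_rate_pct": ([95, 90, 80, 70], True),
    "page_browsing_delay_ms": ([1000, 3000, 6000, 10000], False),
    "page_download_throughput_kbps": ([500, 200, 100, 40], True),
}


def baseline_band(kqi: str, value: float) -> int:
    """Return the benchmark column (1 = best, 5 = worst) that a KQI value falls in."""
    thresholds, higher_is_better = BASELINE[kqi]
    for band, limit in enumerate(thresholds, start=1):
        better = value > limit if higher_is_better else value < limit
        if better:
            return band
    return 5
```

For example, a Page Response Success Rate of 92% falls in band 2, and a Page Browsing Delay of 4 s falls in band 3.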
If the KQIs calculated by the system are reasonable (for example, the success rate is 90%, the
delay is within tens of milliseconds, and the rate is 1 Mbps for 3G), only minor optimization is
required. The baseline must be set on the basis of project requirements. An overly strict
baseline may make future optimization harder.
The methods for determining the baseline have not been verified and will be revised as
experience accumulates during the project.
This baseline can be used only to assess service quality. It is not the goal of service
optimization. The matching method may introduce errors, which result in errors in KQI
calculation.
3.3 Method
Calculate the KQIs of web browsing services in the live network, and then compare them with
target KQIs to check whether the baseline standard has been reached. You can also compare
the KQIs with KQI history to check whether the service quality has been getting worse.
3.3.1 Procedure
Step 1 Check the quality of web browsing service.
Log in to HUAWEI SmartCare SEQ Analyst. Click SQM. The Service Quality Analysis page
is displayed. Click the WEB tab, and specify values such as Time Period, Area, and Access
Type. Click Query. The query results are displayed as follows:
This figure shows average values of five KQIs for the web browsing service, changes
compared with those in the last period, and overall trends compared with last period.
Step 2 Assess KQIs.
Compare the KQIs in the live network with the baseline.
Check whether the KQIs are the same as before or worse than before. If the KQIs have
deteriorated, list the generated alarms when providing the assessment results.
----End
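The two comparisons in Step 2 can be sketched in Python as follows. The function name and the returned keys are hypothetical. The caller states whether a higher value is better (success rates, throughput) or worse (delays).

```python
from typing import Dict


def assess_kqi(current: float, baseline: float, previous: float,
               higher_is_better: bool = True) -> Dict[str, bool]:
    """Compare a live KQI with its baseline and with the last period."""
    if higher_is_better:
        meets_baseline = current >= baseline
        worse_than_last_period = current < previous
    else:
        meets_baseline = current <= baseline
        worse_than_last_period = current > previous
    return {
        "meets_baseline": meets_baseline,
        "worse_than_last_period": worse_than_last_period,
    }
```

For example, a success rate of 92% against a baseline of 90% and a previous value of 94% meets the baseline but has deteriorated, so alarms should be listed with the assessment results.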
4 Locating Problems
4.1 Method
Calculate KQIs of the web browsing service, analyze poor KQIs and KPIs, and check whether
a certain poor KPI leads to the poor KQI. If yes, analyze only this KPI to solve the problem.
KPIs associated with success rate
Calculate the proportion of each failure cause; look for failure patterns by location, device,
APN, browser, and website; identify the scenarios in which failures occur; and then perform
optimization.
If the timeout is caused by a network failure, perform correlated multi-interface analysis to
locate the network segment that causes the timeout.
KPIs associated with delay
Analyze the spectrum diagram and the KPIs with longer delay.
Look for patterns by location, device, APN, browser, and website.
If no pattern is found, the problem may lie in transmission. In that case, analyze TCP
performance across the correlated interfaces, including retransmission, packet loss, and
RTT.
KPIs associated with rate
Analyze the spectrum diagram and the KPIs with lower rate.
Look for patterns by location, device, APN, browser, and website.
If no pattern is found, the problem may lie in transmission. In that case, analyze TCP
performance across the correlated interfaces, including retransmission, packet loss,
fragmentation, and RTT.
4.2 Procedure
Average values of KQIs and historical trends are displayed on the web browsing service
quality analysis page, as shown in Figure 1.1.
You can view the changes in KQIs compared with the last period. If a KQI does not reach the
standard or is lower than in the last period, check its historical trend. Click the
worst KQI to view its analysis page.
For details about the causes contributing to the failure, see chapter 5 "Failure Cause Analysis."
Figure 1.1 Failure causes for KQI Page Response Success Rate
Click service failure times, and the detailed failure information of the corresponding area
is displayed. You can further analyze the failure by adding fields and conditions, as
shown in Figure 1.2.
----End
Perform the following steps to analyze the KPIs of Page Response Success Rate:
For failures 1001 to 1005, a KPI analysis enables you to obtain more detailed xDRs.
Step 1 Perform failure cause analysis in terms of types, aspects, and a failure cause code.
Analyze failure causes of the KPIs of Page Response Success Rate with the same analysis
procedure as that of the KQI Page Response Success Rate.
The difference is that the detailed information for the KPI is organized by flow (the 4-tuple
of a TCP or DNS exchange is considered one flow) rather than by page.
Step 2 Perform response timeout analysis.
The timeout may be caused by no response from the server, long delay, or packet loss.
Therefore, perform correlated multi-interface analysis by comparing the transaction
(DNS, TCP link setup, and GET) timeout counts on each interface, and then locate
the segment where the failure occurs. Figure 1.1 shows the failure analysis.
For example, if there are 900 requests on the Gn interface and 1000 requests on the Iu
interface, some packets are lost between the Gn and Iu interfaces. Therefore,
most timeout failures occur between the Gn and Iu interfaces.
----End
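The request-count comparison in the example above can be sketched in Python as follows. The function name is hypothetical, and the interfaces are assumed to be listed in uplink order, from the device side to the server side.

```python
from typing import List, Tuple


def lossy_segments(counts: List[Tuple[str, int]]) -> List[Tuple[str, str, int]]:
    """Return (from_interface, to_interface, lost_requests) for each segment
    where fewer uplink requests are seen on the next interface."""
    segments = []
    for (near, near_count), (far, far_count) in zip(counts, counts[1:]):
        if far_count < near_count:
            segments.append((near, far, near_count - far_count))
    return segments


# Uplink request counts observed for the same transactions on each interface
observed = lossy_segments([("Iu", 1000), ("Gn", 900), ("Gi", 900)])
```

With the counts from the example, the only lossy segment is Iu to Gn, with 100 requests missing. No loss is seen between Gn and Gi.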
Figure 1.1 Frequency segment with comparatively more services and longer delay
Analysis results by location, device, website, APN and browser are displayed in the lower part
of the page.
The delay distribution is displayed after you click the tab of any analysis aspect. Look for
failure patterns in the delay distribution.
Click the success count. The detailed failure information of the corresponding location is
displayed. You can further analyze the failure by customizing group analysis.
----End
Perform the following steps to analyze the KPIs of Page Response Delay:
Step 1 Perform analysis in various aspects.
The analysis method for the KPIs of Page Response Delay is the same as that for the KQI
Page Response Delay. The difference lies in that the detailed information for the KPI is in the
form of flow.
----End
In addition to packet retransmission and RTT, packet loss and fragmentation are also analyzed
in the GET transactions.
Packet loss consists of uplink packet loss and downlink packet loss.
First check the uplink packet loss rate. If the uplink packet loss rate exceeds 3%, check the
uplink packet retransmission rate.
If the uplink packet retransmission rate exceeds 3%, the network connectivity above the
interface is abnormal.
If the uplink packet retransmission rate is no higher than 3%, the network connectivity
below the interface is abnormal.
Then check the downlink packet loss rate. If the downlink packet loss rate exceeds 3%, check
the downlink packet retransmission rate.
If the downlink packet retransmission rate exceeds 3%, the network connectivity below
the interface is abnormal.
If the downlink packet retransmission rate is no higher than 3%, the network
connectivity above the interface is abnormal.
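The uplink and downlink checks above form a small decision rule, sketched here in Python. The function name and return labels are hypothetical; rates are expressed as fractions, so 3% is 0.03.

```python
def locate_packet_loss(direction: str, loss_rate: float,
                       retransmission_rate: float,
                       threshold: float = 0.03) -> str:
    """Return 'normal', 'above_interface', or 'below_interface'."""
    if direction not in ("uplink", "downlink"):
        raise ValueError("direction must be 'uplink' or 'downlink'")
    if loss_rate <= threshold:
        return "normal"  # the loss rate does not exceed 3%
    high_retransmission = retransmission_rate > threshold
    if direction == "uplink":
        # Uplink: high retransmission points above the interface.
        return "above_interface" if high_retransmission else "below_interface"
    # Downlink: high retransmission points below the interface.
    return "below_interface" if high_retransmission else "above_interface"
```

The rule is mirrored between the two directions: for the same retransmission result, the uplink and downlink checks point to opposite sides of the measurement interface.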
You may also perform a multi-interface analysis. For example, if the uplink packet loss rate is
5% on the Iu interface and 10% on the Gn interface, the difference indicates that packets are
lost between the Iu and Gn interfaces.
To perform packet fragmentation analysis, focus on the packet fragmentation rate between the
Gn and Iu interfaces. If the fragmentation rate exceeds 10%, the MTUs configured on NEs are
inappropriate.
Possible Causes
The 1001 failure causes are as follows:
The server releases the connection unexpectedly, which is a main cause for 1001 failures.
After receiving the RST request, the device still sends request messages to the server.
The user stops using the service (for example, closes the browser). This type of scenario
should be excluded from the scenarios for the 1001 failure.
Cause Analysis
Analyze the causes of 1001 failure from the following aspects:
Analyze the causes of the 1001 failure by server, because the 1001 failure occurs mostly
due to server-initiated connection release. Identify servers that release connections
unexpectedly based on the FIN flag (FIN = 0).
Analyze the cause from the aspect of the device and the browser based on the FIN flag.
Possible Causes
The 1002 failure causes are as follows:
The device fails to receive the downlink packet. It is possible that the downlink packet is
delayed or lost due to poor radio quality.
The device fails to receive the response message from the server and releases the
connection.
The user actively releases the connection before HTTP sessions are complete in any of
the following scenarios:
− The user releases the connection without reason.
− The user releases the connection due to poor radio quality.
− The user releases the connection because the network connectivity above the
interface is abnormal and the packet transmission delay is unusually long.
After using the web browsing service, the user actively releases the connection.
Cause Analysis
When the network quality is good, users rarely release service connections before HTTP
sessions are complete. Therefore we focus on the first two causes.
Count the cells in which the downlink retransmission rate is not 0 and the downlink RTT
exceeds 0.5 s, and calculate their failure rates to locate the cells with poor service
quality.
Count the hosts or server IP addresses whose uplink retransmission rate is not 0 and whose
downlink RTT is no more than 0.5 s, and calculate their failure rates to locate the servers
with poor service quality.
Count the hosts or server IP addresses whose xDRs contain HTTP transaction requests
while there is no uplink packet that contains payload, and count their failure rate to
locate the servers with poor service quality.
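The first filter above (cells with downlink retransmission and a downlink RTT above 0.5 s) can be sketched in Python as follows. The record fields cell, dl_retrans_rate, dl_rtt_s, requests, and failures are hypothetical names for values taken from exported xDR statistics. They are not SEQ Analyst field names.

```python
from typing import Dict, List, Tuple


def suspect_cells(cells: List[Dict]) -> List[Tuple[str, float]]:
    """Return (cell, failure_rate) for cells matching the 1002 radio-side
    criteria, sorted with the worst failure rate first."""
    matched = []
    for c in cells:
        if c["dl_retrans_rate"] > 0 and c["dl_rtt_s"] > 0.5 and c["requests"] > 0:
            matched.append((c["cell"], c["failures"] / c["requests"]))
    return sorted(matched, key=lambda item: item[1], reverse=True)


ranked = suspect_cells([
    {"cell": "A", "dl_retrans_rate": 0.02, "dl_rtt_s": 0.8, "requests": 100, "failures": 5},
    {"cell": "B", "dl_retrans_rate": 0.00, "dl_rtt_s": 0.9, "requests": 100, "failures": 50},
    {"cell": "C", "dl_retrans_rate": 0.04, "dl_rtt_s": 0.6, "requests": 100, "failures": 10},
])
```

In this example cell B is excluded because it has no downlink retransmission, and cell C is ranked ahead of cell A. The server-side filters follow the same shape with the host or server IP address as the key and the conditions swapped accordingly.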
Possible Causes
The 1003 failure causes are as follows:
The server releases the connection without giving response to the transaction request
from the device.
The server has released the connection, while the device fails to receive the release
message from the server and sends a transaction request to the server.
The server has released the connection and the device has received the release message,
but the device still sends a transaction request to the server.
The link setup delay is unusually long, and the server releases the connection before the
link setup is complete.
Cause Analysis
If the server releases the connection unexpectedly, and there is only one HTTP
transaction request in the flow xDR, count the hosts or server IP addresses, request times,
failure times, and failure ratio to identify patterns.
If the failure occurs below the interface, count the cells whose TCP link setup delay
exceeds 5 s and that have relatively many 1003 failures to locate the cells with poor
service quality. If all the cells have a large number of 1003 failures, the network
connectivity below the interface is abnormal.
Possible Causes
The 1004 failure causes are as follows:
The user actively releases the connection.
There is packet loss below the interface.
There is packet loss above the interface.
Cause Analysis
When the network quality is good, users rarely release service connections before HTTP
sessions are complete. Therefore we focus on the last two causes.
Count the cells of which the downlink retransmission rate is not 0, and count the failure
rate to locate the cell with poor quality.
Count the hosts with more packet losses above the interface to locate the hosts with poor
service quality. If all the hosts have a large number of packet losses, the network
connectivity above the interface is abnormal.
Possible Causes
The 1005 failure causes are as follows:
The server fails to respond before the device's timer (30 seconds) expires.
The packet is lost and the retransmitted packet fails to reach the device before the
device's timer expires.
Cause Analysis
Analyze the cause from the aspect of the server, and identify host names and IP
addresses with high failure rates.
If the 1005 failures are common for all the services, analyze the 1005 failure rate for
other services and the TCP packet loss rate to locate the problem.