Network Performance Score
White paper | Version 01.00 | Dr. Jens Berger, Johanna Sochos, Dr. Marija Stoilkovic
CONTENTS
1 Introduction
2 Basic structure
3 Voice telephony
3.1 Voice telephony contributors
3.1.1 Accessibility and retainability – success ratio
3.1.2 Call setup time
3.1.3 Perceptual objective listening quality analysis (POLQA) in line with ITU-T P.863
3.2 Contributors transformation to a percentage scale
4 Data services
4.1 Data transfer services
4.1.1 Availability/accessibility – HTTP UL/DL success ratio
4.1.2 Data transfer performance – HTTP DL/UL throughput
4.1.3 Data transfer services – contribution and weighting
4.2 Video streaming services
4.2.1 Typical video streaming service structure
4.2.2 Video streaming services performance contributors
4.2.3 Video streaming services – contribution and weighting
4.3 HTTP browsing and social media
4.3.1 HTTP browsing
4.3.2 Browsing/social media – contribution and weighting
5 Regions and final aggregation categorization
6 Point score application
7 Summary
Appendix A
Appendix B
Customer support
Technical support – where and when you need it
Up-to-date information and upgrades
Regional contact
1 INTRODUCTION
There is a demand for an efficient method of calculating an overall score that reflects the perceived technical performance of a network or one of its subsets, such as a region, a period of time or a technology. The method described here considers and weights the key performance indicators (KPI) for a wide range of services that are essential for and representative of the service quality and combines them into an overall performance score. This score can be calculated for individual regions such as cities, highways and popular areas. The scores of the individual regions are then aggregated into an overall network performance score.
The score can be expressed either on a percentage scale or on an equivalent point scale; between these two scales, there is a transformation of the applied weightings for the individual services. Sections 2 Basic structure through 5 Regions and final aggregation categorization explain the KPIs used and the structure of the score based on the percentage scale. The point score is explained separately in section 6 Point score application.
The scoring mechanism allows very efficient comparison of operators in a market, of different measurement campaigns in regions and countries, or of performance before and after the deployment of new technology or software. The transparent structure of the score allows efficient drilldown to the region, service or even the KPIs responsible for a nonoptimal overall score.
The scoring methodology is in line with the current state of ETSI TR 103 559 V1.1.1 and is available in Rohde & Schwarz mobile network testing (MNT) products as network performance score version 1.1.
This score only takes into account the technical performance of the services; other dimensions of user satisfaction such as billing, tariffs and support quality are not considered.
2 BASIC STRUCTURE
The structure of the network performance score is highly transparent and consists of different layers of weighting and accumulation.
On the technical side, the score is based on telephony and data services subscores, each of which is scaled separately from 0 % to 100 %. Each of these two subscores consists of a set of comprehensive KPIs or contributors. Today, the subscores have weightings of 40 % telephony and 60 % data services and form a complete network score.
[Figure: layered score structure – per regional category (e.g. connecting roads, spots), the telephony and data services subscores are each scaled from 0 % to 100 % and weighted into the category score]
The number, categorization and weighting of these regions are flexible and can be defined to meet regional or national needs. This regional categorization is described in section 5 Regions and final aggregation categorization.
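To make the layered structure concrete, here is a minimal Python sketch of the aggregation; the region names, subscore values and regional weights are invented for illustration, and only the 40 %/60 % service split is taken from the text above.

```python
# Per-region subscores, each already scaled to 0 % to 100 % (invented values).
regions = {
    "cities":           {"telephony": 92.0, "data": 85.0},
    "connecting_roads": {"telephony": 88.0, "data": 78.0},
    "spots":            {"telephony": 90.0, "data": 82.0},
}

SERVICE_WEIGHTS = {"telephony": 0.40, "data": 0.60}   # lower weighting layer
REGION_WEIGHTS = {"cities": 0.50, "connecting_roads": 0.25, "spots": 0.25}

def region_score(subscores):
    """Combine the telephony and data services subscores of one region."""
    return sum(SERVICE_WEIGHTS[s] * value for s, value in subscores.items())

# Upper weighting layer: aggregate the regional scores into the overall score.
nps = sum(REGION_WEIGHTS[r] * region_score(sub) for r, sub in regions.items())
print(f"network performance score = {nps:.1f} %")
```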
3 VOICE TELEPHONY
Note: Successfully established calls include completed calls and established but dropped calls. Attempts include completed, dropped and failed calls.
Call drop ratio (CDR) is the percentage of telephone calls that were cut off due to technical reasons before the speaking parties finished their conversation and before one of them intentionally hung up. This KPI is measured as a percentage of all successfully established calls. Typical scores are < 10 %.
The formula only considers completed and dropped calls as successfully established:

$$\mathrm{CDR} = \frac{\text{number of dropped calls}}{\text{number of completed calls} + \text{number of dropped calls}}$$
Call setup time (CST) average is the overall average performance of the network or of the applied selection of measurements. This value is calculated as the average of all measured CSTs for all completed and dropped calls.
Note: There is a difference between the CST described above and the shorter call setup times measured at the signaling level.
CST > 15 s ratio is a KPI used to identify poor performers. Usually, the contribution of this KPI is important due to the fact that users have a very negative perception when something goes wrong while they consider exceptional service the norm.
CST 10th percentile is the threshold below which the shortest 10 % of CST values fall. This score rewards best performers and gives an indication of the shortest CST reachable in a technology or region. The 10th percentile value also indicates a good practice expectation of what is possible with respect to the CST for a region, technology or weighting.
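A brief sketch of how these three CST contributors could be computed from a set of measured setup times; the values in seconds are invented for the example.

```python
import numpy as np

cst = np.array([3.1, 4.0, 2.8, 5.2, 16.4, 3.5, 2.9, 4.4, 3.8, 17.0])

cst_average = cst.mean()                       # CST average
cst_gt_15s_ratio = (cst > 15).mean() * 100     # CST > 15 s ratio in %
cst_p10 = np.percentile(cst, 10)               # CST 10th percentile

print(f"average = {cst_average:.1f} s, >15 s ratio = {cst_gt_15s_ratio:.0f} %, "
      f"10th percentile = {cst_p10:.1f} s")
```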
3.1.3 Perceptual objective listening quality analysis (POLQA) in line with ITU-T P.863
To assess the speech quality provided by mobile operators, three KPIs are defined based on the ITU-T P.863 (POLQA) MOS. ITU-T P.863 is used in its super-wideband/fullband mode to assess HD voice/wideband channels correctly. The applied ITU-T P.863 algorithm considers the full audio spectrum applied to EVS-SWB and FB codecs as deployed in VoLTE.
The POLQA scores are evaluated per speech sample, each call delivering multiple individual scores into the analysis in both directions of the conversation (half-duplex).
The absolute POLQA score depends on the test speech signal used. It is recommended to use the reference signals specified in ITU-T P.501 Annex D and to apply the same signal for all measurements in a campaign for comparability. A good example is the English test sample “EN_fm_P501”, which is part of the Rohde & Schwarz system installation.
Voice MOS average measures the overall, average speech quality performance of a network or a selection of measurements. This KPI is the plain average of all considered POLQA scores without any further preselection, exclusion or weighting. Typical MOS scores are around 3.0 for narrowband channels and 3.5 to 3.7 for wideband (HD voice) channels.
Note: Silence in the audio channel is not part of the mean opinion score (MOS); it is indirectly considered, either by failed calls due to silence or by dropped calls where silence typically occurs shortly before a call drops.
Voice MOS < 1.6 ratio is the ratio of very bad speech samples. Considering the strong negative perception of low quality, this score explicitly penalizes high ratios of bad samples.
Voice MOS 90th percentile is the threshold above which the best 10 % of voice MOS values fall. It rewards good performers, keeping in mind that users perceive very good performance very positively. It also gives an indication of the MOS scores that are attainable with a given setup or technology (based on the applied selection of measurement data).
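The voice MOS contributors follow the same pattern as the CST sketch above; here applied to per-sample POLQA scores, with the MOS values invented for illustration.

```python
import numpy as np

mos = np.array([3.6, 3.8, 1.4, 3.5, 4.1, 3.7, 2.9, 3.9, 1.5, 3.6])

voice_mos_average = mos.mean()                        # voice MOS average
voice_mos_lt_1_6_ratio = (mos < 1.6).mean() * 100     # voice MOS < 1.6 ratio in %
voice_mos_p90 = np.percentile(mos, 90)                # voice MOS 90th percentile
```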
3.2 Contributors transformation to a percentage scale
The transformation applies a linear weighting of 0 % to 100 % between a bad and a good threshold. Scores outside of these boundaries stay saturated.
[Figure: linear transformation of a KPI value to the 0 % to 100 % scale, saturating below the bad and above the good threshold]
Generally, 0 % is assigned to the bad threshold and 100 % is assigned to the good threshold. The assignment depends on the contributor: for the call success ratio or the average speech quality, a high score is used as the good threshold; for the call drop ratio or the average call setup time, a high score is considered bad and is used as the bad threshold. The terms bad and good refer to the contribution in terms of quality.
For each contributor, bad and good thresholds are defined in the same way as the weighting in the telephony subscore. The table shows these thresholds as applied in the initial Rohde & Schwarz SwissQual implementation.
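As a sketch, the transformation with saturation can be written as follows; the thresholds in the examples are illustrative (the CDR pair matches the worked example in section 6, the MOS pair is invented).

```python
def to_percentage(value, bad, good):
    """Linear 0 % to 100 % transformation between the bad and good thresholds,
    saturating outside these boundaries. For contributors where a high value
    is bad (e.g. CDR or CST average), simply pass bad > good."""
    score = (value - bad) / (good - bad) * 100.0
    return max(0.0, min(100.0, score))

print(to_percentage(2.0, bad=10.0, good=0.0))    # 2 % CDR  -> 80.0 %
print(to_percentage(3.8, bad=1.6, good=4.5))     # MOS 3.8  -> ~75.9 %
print(to_percentage(12.0, bad=10.0, good=0.0))   # 12 % CDR -> 0.0 % (saturated)
```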
4 DATA SERVICES
The data services subscore consists of three areas of contributors addressing different types of services and characterizing different types of requests in a network:
►► (Plain) data transfer (HTTP) (25 %)
►► Video streaming (22 %)
►► HTTP browsing (38 %) and social media (15 %)
Similar to voice telephony, each area consists of a set of individual contributors quantifying the typical midrange, rewarding outstanding performers and giving an extra bonus for operators with little or no bad performance.
For the transfer performance, multiple connections are opened and the transfer rate is measured for a given time. This test is also known as the capacity test.
Successfully completed tests are tests with ErrorCode = 0 (state = success). Attempts include successfully completed tests and tests with ErrorCode ≠ 0 (state = test timeout, HTTP request timeout, file transfer failed, etc.).
The total duration of an HTTP transfer test is set to 10 s, with a connection lost timeout of 3 s and a transfer duration of 7 s (see Table 15: HTTP DL test – multi-connection and Table 16: HTTP UL test – multi-connection). These parameters are the same for both upload and download tests. The active transfer duration starts only after all configured TCP sockets have been connected.
As an indicator of the average data throughput, the mean data rate (MDR) is calculated. For an HTTP transfer test, the MDR is calculated as the sum of all transferred bytes during the test's active transfer period divided by the transfer time. It is calculated separately for upload and download data.

$$\text{MDR per test} = \frac{\sum \text{transferred bytes during the active transfer period}}{\text{active transfer time}}$$

In the calculation, all tests with ErrorCode = 0 are considered. Tests where the TCP connection could not be established for any or all sockets, tests for which the server is not responding (ErrorCode ≠ 0) or tests classified as system release are excluded.
HTTP DL/UL throughput average is the main score and quantifies the average transfer rate in Mbit/s across a network or an applied selection of measurement data.

$$\text{HTTP DL/UL throughput average} = \frac{\sum \text{mean data rates per test}}{\text{number of tests}}$$
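A brief sketch of the MDR and throughput average calculation; the record structure and field names are assumptions for illustration, while the ErrorCode = 0 filter and the active-period definition follow the text above.

```python
from dataclasses import dataclass

@dataclass
class HttpTransferTest:
    error_code: int                 # 0 = success; anything else is excluded
    bytes_transferred: int          # bytes moved during the active transfer period
    active_transfer_time_s: float   # starts after all TCP sockets are connected

def mdr_mbit_s(test):
    """Mean data rate of one test in Mbit/s."""
    return test.bytes_transferred * 8 / 1e6 / test.active_transfer_time_s

tests = [
    HttpTransferTest(0, 87_500_000, 7.0),   # 100 Mbit/s
    HttpTransferTest(0, 26_250_000, 7.0),   # 30 Mbit/s
    HttpTransferTest(5, 0, 7.0),            # failed test, excluded
]
valid = [t for t in tests if t.error_code == 0]
throughput_average = sum(mdr_mbit_s(t) for t in valid) / len(valid)
print(f"HTTP throughput average = {throughput_average:.1f} Mbit/s")  # 65.0
```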
HTTP DL/UL throughput 10th percentile is a KPI that measures the poor performance of a network, i.e. the data rate below which the worst 10 % of transfers fall. It captures the negative perception caused by very slow transfers, which is not well reflected in the average throughput.
Example:
There are 100 DL tests, 85 with 120 Mbit/s and 15 with just 0.5 Mbit/s. The average MDR is considered good at 102 Mbit/s even though 15 % of the tests have a much lower value. The 10th percentile MDR indicates this with a score of 0.5 Mbit/s. Consequently, such a network is rated lower than one delivering 102 Mbit/s consistently in all tests.
HTTP DL/UL throughput 90th percentile is a KPI that evaluates the good performance of the network, i.e. the best 10 % of transfers are above this value. The goal of this KPI is to find the maximum performance of the network or the selected measurements. The 90th percentile value is preferred over the absolute maximum (which is just a single test) and is considered a more reliable KPI for showing the network's full capacity.
4.1.3 Data transfer services – contribution and weighting
The individual contributors are rescaled on a 0 % to 100 % scale as described in section
3.2 Contributors transformation to a percentage scale.
The HTTP data transfer performance contributes 25 % to the data services subscore.
4.2 Video streaming services
In YouTube – and in all video services – there is a basic difference between live video and video on demand (VoD). In the VoD case, the video is completely stored on the server and is usually completely – or mostly – downloaded to the device; there are many techniques, progressive download being the most common one. Live video is not available as a file. It is sent almost in real time to the device, in practice in short portions of a few seconds each. If VoD is used as the test case, the video is largely buffered on the phone and outages in the connection can easily be bridged. Live video is much more sensitive since an interruption in the data flow will lead to freezing after a short time. VoD is the less sensitive test case and leads to average or higher video quality since there is less freezing. Live video is more sensitive and reflects the continuity of the data flow provided by the network.
Consideration of live video streams is best practice for network benchmarking today.
After the playout starts, the perceived video quality is considered as the main contributor. The perceived video quality is determined by compression artifacts, rescaling effects, lower frame rates and freezing (stalling) during the display. The perceived quality is measured by ITU-T J.343.1, which combines all possible degradations into one video MOS on a common scale from 1 to 5. ITU-T J.343.1 is especially recommended by ETSI TS 102 250-2 for evaluating mobile streaming services. The testing methodology for YouTube and other video streaming services is described in ETSI TR 101 578.
Video success ratio considers all tests that achieve the defined display time of the video. These tests are classified as completed. The typical display time applied for live YouTube streams is 45 s.

$$\text{Video success ratio} = \frac{\text{number of successfully completed tests}}{\text{number of attempts}}$$

Attempts include tests with the following states: completed, failed and dropped.

$$\text{Video success ratio} = \frac{\#\,\text{completed}}{\#\,\text{completed} + \#\,\text{failed} + \#\,\text{dropped}}$$
The status failed or dropped is defined by timeouts because, unlike for telephony, there is no ongoing signaling information available. A video streaming test is considered failed if no picture is displayed within a defined timeout (connection timeout). This timeout therefore defines the maximum length of the video access phase. A timeout value of 30 s is used. A video streaming test is considered dropped if 15 s of subsequent freezing (video pause) is observed. This stream lost timeout is considered the maximum time a viewer is willing to wait for the video to resume.
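A sketch of the outcome classification just described; the timeout values come from the text, while the function signature and the handling of tests that end early without a freeze (final branch) are assumptions for illustration.

```python
CONNECTION_TIMEOUT_S = 30.0    # max length of the video access phase
STREAM_LOST_TIMEOUT_S = 15.0   # subsequent freezing before a test counts as dropped
TARGET_DISPLAY_TIME_S = 45.0   # typical display time for live YouTube streams

def classify_video_test(ttfp_s, longest_freeze_s, displayed_s):
    """Classify a video streaming test as completed, failed or dropped."""
    if ttfp_s is None or ttfp_s > CONNECTION_TIMEOUT_S:
        return "failed"        # no picture within the connection timeout
    if longest_freeze_s >= STREAM_LOST_TIMEOUT_S:
        return "dropped"       # viewer gives up after 15 s of freezing
    if displayed_s >= TARGET_DISPLAY_TIME_S:
        return "completed"     # defined display time reached
    return "dropped"           # assumption: other early terminations count as dropped

print(classify_video_test(4.2, 2.0, 45.0))   # completed
print(classify_video_test(None, 0.0, 0.0))   # failed
```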
Video setup average is the average value of all measured times to first picture (TTFP) for all completed and dropped tests. It quantifies the average length of the video access phase.
Video setup > 10 s ratio is the ratio of attempts where the TTFP lasts longer than 10 s. This performance is considered to have a negative impact on the user experience and to add to the perceived degradation.
Video MOS average is calculated for all successfully completed tests. It is defined as the average of the already averaged video MOS (ITU-T J.343.1) per test. It incorporates all visible degradations during the video display into a MOS.
Video MOS 10th percentile is the threshold below which the lowest 10 % of video MOS values fall. This KPI evaluates poor network performance in terms of video quality. The percentile values focus on bad and very good performers. This KPI is calculated taking into account only completed tests.
The performance of video streaming services contributes 22 % to the data services
subscore.
4.3 HTTP browsing and social media
The HTTP browsing tests considered in this scoring methodology should access a set of different pages. This set has to be a mix of static and dynamic pages, where the Alexa rating gives a valid indication of the most popular websites.
No matter which pages are used, all HTTP tests are considered equally by the contributors (KPIs) for HTTP browsing performance:
►► Browsing success ratio
►► Browsing duration average
►► Activity duration > 6 s ratio
Browsing success ratio is the ratio of data tests with status OK (ErrorCode = 0) to all tests. The status OK is given if a website (including all items) is downloaded completely and does not exceed the time limit (typically set to 15 s). The criterion for failing an HTTP browsing test is ErrorCode ≠ 0 (status: test timeout, cancelled, etc.).

$$\text{Browsing success ratio} = \frac{\text{number of successful tests}}{\text{number of successful tests} + \text{number of failed tests}}$$
Unlike HTTP transfer, browsing is not performed in a controlled environment. The selected live internet pages often result in different transfer times. A well-chosen set of different pages minimizes this problem by averaging, as explained above.
Browsing duration > 6 s ratio is a KPI that measures the percentage of webpages that were downloaded in a time interval longer than 6 s and shorter than the 15 s test timeout. It provides a further distinction between operators.
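A sketch of the browsing contributors computed from per-test records; the record values are invented, and taking successful tests as the denominator for the > 6 s ratio is an assumption of this sketch.

```python
TEST_TIMEOUT_S = 15.0

tests = [(0, 2.1), (0, 3.4), (0, 7.2), (1, 15.0), (0, 1.8)]  # (ErrorCode, duration in s)
ok_durations = [d for code, d in tests if code == 0]

browsing_success_ratio = len(ok_durations) / len(tests) * 100          # 80 %
browsing_duration_average = sum(ok_durations) / len(ok_durations)      # ~3.6 s
gt_6s_ratio = (sum(1 for d in ok_durations if 6.0 < d < TEST_TIMEOUT_S)
               / len(ok_durations) * 100)                              # 25 %
```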
To mimic a typical Facebook or Dropbox user, the test includes different typical subsequent tasks or actions before leaving the site and finishing the test. A social media session can involve actions such as opening the home page, opening several posts, commenting on posts, liking posts and creating posts.
To take posting to social media into account in this scoring methodology, the test includes creating or uploading a post that includes transferring a 1 Mbyte media file.
Since most social media actions only transfer a minimal amount of data, the throughput is not an important indicator. Instead, the main results of the test are the durations of the individual actions and the entire session as well as the action success rates.
The test success ratio (task completed in a defined time) and the data transfer time for uploading the 1 Mbyte file are used as contributors.
This test is similar to an HTTP transfer test. Therefore, it is not necessary to extract more KPIs from this test. The critical part of this test – compared to plain HTTP transfer tests – is the performance of the connection to the Facebook or Dropbox server and the potential data rate restrictions imposed by the server/network in case of traffic.
Social media success ratio is the ratio of data tests with status OK (ErrorCode = 0) to all tests. ErrorCode = 0 is assigned to all tests completed without exceeding the time limit (defined timeout = 30 s). The criterion for failed tests is ErrorCode ≠ 0 (status: cancelled, service unavailable; test result: initialization failed, test timeout, service unavailable, etc.).

$$\text{Social media success ratio} = \frac{\text{number of successful tests}}{\text{number of successful tests} + \text{number of failed tests}}$$
Social media duration average measures the network (including the CDN) and server performance in this application test. It differs from HTTP browsing tests in that only one DNS request is sent and only a single object is uploaded to one location. It differs from HTTP transfer upload in that the server is a third-party server and may, like the content delivery network, apply its own data transfer restrictions.
This KPI is calculated for successfully completed tests only (ErrorCode = 0, status OK).
Social media duration > 15 s ratio is a KPI that measures the percentage of uploads that were performed in a time interval longer than 15 s and shorter than the 30 s test timeout. It contributes to more objective scoring by introducing a further distinction between operators.
The performance of HTTP browsing contributes 38 % and the performance of the social
media applications contributes 15 % to the data services subscore.
5 REGIONS AND FINAL AGGREGATION CATEGORIZATION
This regional categorization can be seen as an intermediate weighting layer where the network performance score is calculated separately for each regional category, weighted according to the importance of each region and then aggregated into a final score.
Depending on the operators or the country, not all categories apply. In addition, the score's reliability depends on the available number of measurements: if there is only a limited number of measurements, using fewer categories leads to more reliable scores.
There are advantages to applying this regional weighting as a separate layer during postprocessing. The first one is the flexible assignment of categories and weightings; different views can be applied, e.g. one based on traffic and another based on the number of subscribers. Another advantage is that in the case of nonoptimal performance, it is possible to immediately drill down to the underperforming category or region.
The architecture of the implementation allows different weightings on the lower layers based on the regional category. This means that an individual weighting for telephony and data can be applied for each region. The contribution of individual KPIs and their thresholds can be adjusted per region to serve special customer needs. For public use, it is recommended to use the same low-layer weightings for all regional categories for transparency reasons.
6 POINT SCORE APPLICATION
The point score is based on exactly the same KPIs and thresholds as the percentage score. Instead of scaling the subscores (e.g. of telephony or cities) from 0 % to 100 % on each level, the number of achievable points is specified and implicitly contains all weightings applied later.
Example:
Consider a 2 % call drop ratio in the road category (see section 3.2 Contributors transformation to a percentage scale).
On the percentage scale, this 2 % CDR would be rated as 80 % after applying the thresholds and the linear equation. This 80 % would contribute 80 % × 0.375 = 30 % to the telephony score.
If telephony is weighted by 40 % in the mixed telephony and data score, the CDR contributes 30 % × 0.4 = 12 % to the overall percentage score for the road category, which is the next aggregation layer.
If the road category is weighted by 25 % in the overall network performance score, the 2 % CDR in the road category contributes 12 % × 0.25 = 3 % to the overall percentage.
There is a direct dependency between the percentage and the point scale at this final level. A percentage score of e.g. 91.5 % is equivalent to 915 points. In the example, the contribution of the 2 % CDR in the road category is equivalent to 30 points (3 %). The maximum number of points to be reached by CDR in the road category would be 37.5 (0 % CDR ⇒ 100 % × 0.375 × 0.4 × 0.25 = 3.75 % ⇒ 37.5 points).
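The worked example can be reproduced in a few lines; the CDR weighting within telephony (0.375), the telephony weighting (0.4) and the road category weighting (0.25) come from the example itself, and the CDR thresholds (bad 10 %, good 0 %) are inferred from the stated 80 % rating.

```python
cdr = 2.0                                   # measured call drop ratio in %
rated = (cdr - 10.0) / (0.0 - 10.0)         # linear transformation -> 0.80
points = 1000 * rated * 0.375 * 0.4 * 0.25  # walk down all weighting layers
print(points)                               # 30.0 points (of 37.5 achievable)
```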
If individual category weightings are defined, the number of points the categories contribute to the overall network performance score can be directly calculated and presented.
The following tables present the contributions of all individual KPIs to the overall network performance score based on the individual weightings for the measured regional categories.
The following table shows the maximum scoring points for data services. In total, data services can contribute up to 600 points; data transfer, for example, can contribute 150 points.
The following tables give an indication of the improvement in points if a KPI is changed across all regions by a certain margin. It is anticipated that the improvement stays between the bad and the good limit. An improvement outside the limits will have no effect due to saturation.
The following formula is applied:

$$\text{Improvement in points} = 1000\,\text{points} \times \text{weighting in overall score} \times \frac{\text{change in KPI}}{\text{good limit} - \text{bad limit}}$$
| KPI | Weighting in data services | Weighting in overall score | Max. points | Bad limit | Good limit | Change in KPI | Improvement in points |
|---|---|---|---|---|---|---|---|
| Video MOS average | 0.0363 | 0.022 | 21.8 | 3 | 4.5 | 0.1 MOS | 1.5 |
| Video MOS 10th percentile | 0.0363 | 0.022 | 21.8 | 2 | 4 | 0.1 MOS | 1.1 |
| Video setup average | 0.0099 | 0.006 | 5.9 | 7.0 s | 2.0 s | –1 s | 1.2 |
| Video setup > 10 s ratio | 0.0099 | 0.006 | 5.9 | 5 % | 0 % | –1.0 % | 1.2 |
| Browsing success ratio | 0.25333 | 0.152 | 152 | 80 % | 100 % | 1.0 % | 7.6 |
| Browsing duration average | 0.10857 | 0.065 | 65.1 | 6.0 s | 1.0 s | –1 s | 13 |
| Browsing duration > 6 s ratio | 0.0181 | 0.011 | 10.9 | 15 % | 0 % | 1 % | 0.73 |
| Social media success ratio | 0.100005 | 0.060 | 60.0 | 80 % | 100 % | 1.0 % | 3.0 |
| Social media duration average | 0.042855 | 0.02575 | 25.75 | 15 s | 3 s | –1 s | 2.14 |
| Social media duration > 15 s ratio | 0.00714 | 0.004 | 4.25 | 5 % | 0 % | 1 % | 0.8 |
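One row of the table can be checked against the improvement formula; the values used here (browsing duration average: overall weighting 0.065, bad limit 6.0 s, good limit 1.0 s, change –1 s) are taken from that row.

```python
def improvement_in_points(weighting_overall, change, good_limit, bad_limit):
    """Improvement formula from above, assuming 1000 total points."""
    return 1000 * weighting_overall * change / (good_limit - bad_limit)

print(improvement_in_points(0.065, -1.0, good_limit=1.0, bad_limit=6.0))  # 13.0
```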
7 SUMMARY
From the analysis documented in this white paper, it can be seen that a robust and scalable methodology for quantifying a network's performance on a local, regional or national scale can be produced by taking into account all factors that affect the delivery of services to end users and applying appropriate weighting.
The value of such a methodology is that mobile network performance can be expressed in a single, integrated metric that can be independently and transparently compared on a national and international level. These results can be used to develop a program of network improvement actions to achieve the desired level of performance.
APPENDIX A
When testing HTTP browsing, several factors need to be considered, including the success ratio and the access and download times (which depend heavily on the website structure) and the connections to the content delivery network (CDN). Today's popular websites are highly dynamic, which means that content and advertisements change within short periods of time. Therefore, multiple different websites are included in benchmarking campaigns to diversify and average the sites' individual behaviors.
Typically, five to eight different websites are used in one benchmarking campaign. They are continuously observed and can be replaced if the applied rules are no longer met.
Depending on the focus of a benchmarking campaign, only global or only local favorites can be selected for testing. It is best practice to have a mix of global and local favorites. Examples of global favorites are www.google.com and www.wikipedia.org; examples of local favorites are local newspapers, newsfeeds, sports sites and common local services such as banks.
Websites to be included in a measurement campaign are preferably among the top ranking sites listed on Alexa. Technically, the websites should have an average complexity and meet the following criteria; otherwise, one of the next highest ranking websites should be chosen.
Criteria include:
►► Websites of services that are predominantly accessed via a dedicated app on a
smartphone should not be selected. For example, Facebook, YouTube and similar
websites/services are typically not accessed through a mobile browser and should not
be used for mobile benchmarking campaigns.
►► Websites with a very simple structure and a small amount of data should not be
selected. Examples include login and landing pages that offer a service or further
selection only. A user would consider such pages as service access points and not
browsing experiences.
Results obtained by using websites that do not meet the above criteria or whose content changes so that they no longer meet the above criteria are invalidated during postprocessing and used neither in the reporting nor to calculate the network performance score.
Since websites may have to be changed during a measurement campaign, it is not a given that the same websites will be used throughout the entire campaign. Regular screening and the potential replacement of websites ensure the use of websites that adhere to the predefined selection criteria and that measurement results are as close as possible to the user's perception while browsing.
For each campaign, we decided to include two of the most popular websites, such as www.google.com and www.wikipedia.org. If available, a Google website with a local domain should be chosen. The remaining websites are chosen for their complexity to reflect the distribution of a large population of websites.
A couple of spare websites of appropriate size and complexity also have to be selected so that websites can be changed during a campaign in case a chosen website no longer fulfills the defined criteria.
A-1.3 Best practice for selecting websites
There is an established procedure for the selection and continuous use of websites in
benchmarking campaigns:
►► Definition of test cases and measurement campaigns:
During the definition of a measurement campaign, a set of websites is selected and
proposed by Rohde & Schwarz MNT. These websites must fulfill the selection criteria as
listed above when accessed from the Rohde & Schwarz MNT office in Switzerland.
►► Precampaign local test measurements:
Since content and especially advertisements are delivered individually for the local market and differently for individual MNOs, the selected websites have to be checked in the local market prior to starting the measurement campaign, using subscriptions of local providers, to ensure that they meet the above listed selection criteria. Measurement files have to be provided and the list has to be confirmed by Rohde & Schwarz MNT experts. If confirmation fails, alternative websites have to be proposed and tested.
►► In-campaign sanity checks:
Contents of websites are subject to regular change. It can easily happen that a website
suddenly no longer fulfills the selection criteria. The local test team is obliged to
monitor whether there are any severe changes in the measurement results.
A good practice is to monitor the success ratio for the selected websites. In environments with good RF coverage, the success ratio is > 90 % when downloading a website within the defined timeout (typically 15 s). If the success ratio falls below this threshold within an observation period of a few hours, the Rohde & Schwarz MNT experts have to be informed and – if possible – the website will need to be replaced by another one. In addition, measurement files have to be provided on short notice for offline analysis.
Based on our statistical analysis of more than 200 Alexa websites in June and July 2018, the following conclusions have been drawn:
►► The size distribution of the most popular webpages decreases exponentially. 73 % of all webpages are smaller than 2 Mbyte and 83.5 % are smaller than 3 Mbyte. Most of the small websites are simply landing pages of search engines, social networks, email login pages, etc. These landing and login pages will not be used in benchmarking campaigns in accordance with the criteria defined in A-1.2 Selection criteria for websites.
[Figure: distribution of website sizes – relative occurrence in % vs. website size in kbyte]
►► A typical website consists of text content and several web resources, such as images, external JavaScript files, external style sheets and other related files. The website is correctly rendered in the web browser only if each of the resources is successfully downloaded and parsed. Ideally, it is desirable to reduce the number of HTTP requests made or required to display the website completely. The reason for this is that several often time-consuming processes happen when downloading web resources. The average number of resources (HTTP requests) is approximately 74 across all tested websites. 75 % of all tested websites have fewer than 100 resources.
Fig. 6: Distribution of the number of resources in websites [histogram: relative occurrence in % vs. number of resources]
Fig. 7: Distribution of the number of images in websites [histogram: relative occurrence in % vs. number of images]
The methodology used and the measurement setup define the voice and data scenarios
to be followed during data collection to obtain meaningful results for:
►► Accessibility (e.g. call setup time (CST))
►► Retainability (e.g. call drop ratio (CDR))
►► Integrity (e.g. handover success ratio (HOSR))
►► Air interface logging
►► Voice call testing
►► Speech testing POLQA, narrowband and wideband
►► Video streaming (intrusive and non-intrusive) such as YouTube
There are many other KPIs registered during the drive test and the collected data can be
accessed via a very convenient interface on the SmartAnalytics report platform.
A-2.2 Voice test settings
Table 15: HTTP DL test – multi-connection

| Parameter | Value |
|---|---|
| Upload size | 1 |
| Upload size unit | Mbyte |
| Remote file | to be defined |
A-2.4 Rohde & Schwarz measurement systems
SmartBenchmarker is a solution for drive test based quality of experience benchmarking
campaigns.
Fig. 9: SmartBenchmarker
A-2.5 NPS campaign setup in SmartBenchmarker
To easily set up an NPS measurement campaign with all timings and other settings as recommended, Rohde & Schwarz offers an NPS campaign template. In SmartBenchmarker, in the campaigns section, choose the NPS icon in the top right corner.
This will open the first step of the NPS campaign setup workflow: Basic info. Here, you can enter the campaign name and other basic properties as well as create a list of regional categories that will be part of the campaign.
In the second step, all project-dependent parts of the jobs can be configured. These are mainly the URLs for browsing and HTTP transfer. In the top section, it is possible to activate options to reduce data usage (see section A-2.1.2 Data session) and enable packet capture. Both settings are recommended.
The newly created items appear in the job and campaign lists. It is possible to edit them manually afterwards, but they then lose their status of being NPS compliant. A warning is displayed in the edit screen.
A-2.6 Postprocessing
SmartAnalytics calculates the NPS for the overall network and can drill it down by use case, service, technology and other variables. It is not only an integrated quality score for the overall network quality, it is also the ideal entry point into deeper analysis for network optimization because the NPS makes it obvious at first glance where the most potential for improvement can be found.
SmartAnalytics offers many ways to display the NPS and the contributing KPIs, including the full list of KPIs and the remaining potential on a point scale.
APPENDIX B
The main differences between NPS V1.0 and NPS V1.1 are:
►► New thresholds for the HTTP UL/DL throughput KPIs
►► Introduction of two new KPIs as defined by ETSI TR 103 559 V1.1.1: activity duration for the HTTP browsing test and the social media test
►► TCP round trip time (RTT) in the HTTP browsing test replaced by the activity duration KPI, keeping the same weighting
►► The weightings of the social media KPIs modified to be in line with ETSI TR 103 559 V1.1.1
Table 21: NPS V1.0 HTTP data transfer contributors (Rohde & Schwarz implementation)
Table 23: NPS V1.0 HTTP browsing and social media contributors (Rohde & Schwarz implementation)
The following table shows the maximum scoring points for data services. In total, data
services can contribute up to 600 points and e.g. data transfer can contribute 150 points.
Table 25: NPS V1.0 data contributors category weighting
B-1.4 Abbreviations
| Abbreviation | Designation |
|---|---|
| 3GPP | 3rd generation partnership project: globally recognized specifications and standards for GSM and eventually 3G network deployments |
| CA | carrier aggregation: technology used in LTE to improve data throughput |
| DC | dual carrier: technology used in WCDMA to improve data throughput |
| CSFB | circuit switched fallback: technology that allows LTE devices to fall back to WCDMA networks in order to establish phone calls when VoLTE is not available. CSFB was specified in 3GPP Release 8. CSFB requires a software upgrade of the operator's core and radio network. CSFB is often seen as an interim solution for LTE operators. Voice over LTE (VoLTE) is considered to be the long-term goal for the delivery of voice services on LTE networks. |
| ETSI | European Telecommunications Standards Institute: independent, nonprofit standardization organization in the telecommunications industry with members across five continents |
| EU | end user |
| GSM | global system for mobile communications: normally operating in 900 MHz and 1800 MHz bands |
| IRAT | inter-radio access technology: allows handover and cell change between different technologies, such as 3G and 2G, depending on the covered area of each technology |
| ITU-T | International Telecommunication Union – Telecommunication Standardization Sector |
| KPI | key performance indicator |
| KQI | key quality indicator |
| MNO | mobile network operator |
| MIMO | multiple input, multiple output: antenna technology for wireless communications in which multiple antennas are used at both the transmitter and the destination receiver. The antennas at each end of the communications circuit are combined to minimize errors and optimize data speed. |
| NGN | next generation network |
| NP | network performance |
| NQDI | network quality data investigator (NQDI Classic): postprocessing system that maximizes the potential of data collected by QualiPoc and diversity products for network and service optimization and benchmarking |
| OpCo | operating company related to the customer group |
| OMG | object management group |
| PM | performance monitoring |
| QoS | quality of service |
| RFS | ready for service |
| RAT | radio access technology: the underlying physical connection method for a wireless based communications network. Many UEs support several RATs in one device, such as Bluetooth®, Wi-Fi, 2G, 3G or LTE. |
| SA | service availability |
| SPoC | single point of contact |
| SUA | service unavailability |
| UE | user equipment (usually mobile phones, smartphones or modems) |
| WCDMA | wideband code division multiple access: ITU IMT-2000 family of 3G standards |
The Bluetooth® word mark and logos are registered trademarks owned by Bluetooth SIG, Inc. and any use of such marks by Rohde & Schwarz is under license.
Regional contact
Europe, Africa, Middle East
Phone +49 89 4129 12345
customersupport@rohde-schwarz.com
North America
Phone 1-888-TEST-RSA (1-888-837-8772)
customer.support@rsa.rohde-schwarz.com
Latin America
Phone +1-410-910-7988
customersupport.la@rohde-schwarz.com
Asia/Pacific
Phone +65 65 13 04 88
customersupport.asia@rohde-schwarz.com
China
Phone +86-800-810-8228 /+86-400-650-5896
customersupport.china@rohde-schwarz.com
Rohde & Schwarz
The Rohde & Schwarz electronics group offers innovative solutions in the following business fields: test and measurement, broadcast and media, secure communications, cybersecurity, monitoring and network testing. Founded more than 80 years ago, the independent company, which is headquartered in Munich, Germany, has an extensive sales and service network with locations in more than 70 countries.
www.rohde-schwarz.com
www.rohde-schwarz.com/mnt