Web Testing
A Methodology
with
Tools and Considerations
Department of Transportation and Logistics
Chalmers
Göteborg 2001-02-06
Preface
It has been a challenge working in a new field, web application testing, and it has been great to see
the striking resemblance between the software testing process and the development process of any
project. Planning is certainly important in any kind of project.
Most projects, at some point, require the hands of several persons to progress. So did this one. We
would therefore like to thank the persons who at these points helped us.
First we would like to thank our tutors, as well as project owners, at Sigma nBit AB in Gothenburg,
Sweden, Peter Nielsen and Lars Patriksson, for their help getting the process started, and for sharing
their insights throughout the project.
We would also like to thank all the people who helped evaluate the Test Priority Sheet at different
stages of its development: Jesper Almström, Anders Averö, Magnus Edvardsson, Dan Jonsson,
Helena Olsson and Kenny Rubinsson.
Finally, we would like to thank our tutor at the Department of Transportation and Logistics, Chalmers, Ola
Hultkrantz, for his guidance in the final stages of this Master's thesis.
Sammanfattning
The result of the work is the Test Priority Sheet, a matrix consisting of the areas where a need for testing may exist, together with the factors we consider to govern the need for testing of web applications. We have grouped these areas to test under different so-called test types. The test type describes where the focus lies when testing these areas, and also what should be considered overall when testing web applications, and to a large extent also when testing other software. These test types are:
• Functionality
• Usability
• Server interface
• Compatibility
• Performance
• Security
The factors we consider to govern the need for testing, and thus where the testing emphasis should be placed, are complexity and aim. The complexity of a web application tells us how it is built, that is, which components and technologies are used. Knowing this, we know what there is to test. Aim is a combination of purpose and target group, what you want to achieve with your application.
There are strong parallels between our factors, complexity and aim, and the factors used in risk analysis: the likelihood of an error and the effect (cost) of an error. We also use our factors in the same way as in risk analysis, i.e. we multiply the factors with each other and obtain a risk, or priority, value. To obtain numerical values, and to highlight what we consider important in the assessment of the different areas, we have designed two questions to answer for each area, one for each factor.
Regarding the tools for automated testing, we evaluated two types of tools from two different manufacturers.
Robot and QuickTest are tools for functional testing, while TestManager and LoadTest are tools for performance testing. Rational TestManager is in fact a tool that is also used for requirements management and similar activities, but it now also contains the functionality of the previously stand-alone tool Rational LoadTest. The Astra series from Mercury Interactive is developed specifically for testing web applications, whereas Rational's products can also be used for testing other software.
We have evaluated the functionality of these tools and also compared them with each other. Our conclusions can briefly be summarized as follows:
• The user interface must not undergo major changes for the tools to be useful
• An obvious area of use is data-driven testing, where input data can be parameterized
• Using tools for functional testing may cause the tester to concentrate on finding errors in automatically generated scripts or in the tool itself, rather than in the application under test
• Getting full functionality out of the tools requires training, experience and programming skills
• Performance tests should in most cases be automated because of the difficulty of performing them manually.
Abstract
The process of establishing a methodology for web application testing resulted in the Test Priority
Sheet. This methodology helps determine the most important areas to test in any web application, as
well as being a tool for prioritizing when time is short. The methodology is general enough to be useful for
any web application. For this kind of tool to be used, it must be short and easy to use. The
Test Priority Sheet meets these requirements.
Testing is never really completed. Testing can only show the presence of errors, not their absence.
Due to the characteristics of the Internet, time is often short, creating a need to prioritize the
testing effort. To be able to do this, you need to know which factors set the need for testing. The
factors are:
• Complexity
• Aim
Complexity is a factor concerning the architecture of a web application and the components it
is built from. Aim is a combination of purpose and target group, concerning what you mean to
achieve with the application.
Today, web applications are sometimes categorized based on their interactivity. For testing, this
categorization is not useful. Knowing the complexity of a site does not mean that we know what to
test. The reason for this is that similarities between two web applications of different complexity
categories are sometimes greater than within a single category. Two applications that look the same
might in fact differ greatly in how they are built. Thus, even when combining complexity and aim,
there is no way to tell beforehand what one needs to test. Instead, every part of an application needs
to be separately considered. All these areas where errors might occur are grouped under the
appropriate test type. The test types are:
• Functionality
• Usability
• Server side interface
• Client side compatibility
• Performance
• Security
To separately consider all areas where errors might occur, we apply a risk-based approach, where
complexity represents the likelihood of an error occurring and aim represents the impact, or cost, of an error. For each area
under the test types there are two questions to answer, one for each factor. For example, Forms is an
area under functionality where the following questions are to be answered:
• Are forms present? To what extent and how advanced?
• How critical are forms for the site and how sensitive are users to problems with forms?
Numerical answers to questions such as these are multiplied with each other, creating a risk-based
order of priority when compared to other values established the same way. All this is compiled into a
matrix, where the rows consist of the areas to test and the columns of the factors to consider. This is
the Test Priority Sheet.
Further, tools for automating tests have been evaluated. Two types of tools were evaluated. First, tools
for mainly functional tests, and second, tools for performance tests. The tools are used for script
creation, script execution and result analysis. Functional testing tools are mainly used for regression
tests, since when a test has been made, it can be rerun with minimal manual effort. Performance
testing tools are used in order to evaluate how the tested application reacts to high numbers of
simultaneous users. The tool simulates a number of actual users and captures and displays data on the
performance of the site.
In short, the conclusions drawn regarding these tools are the following:
• Since web applications rely heavily on the graphical interface, and the tools depend on it,
the GUI should have reached a rather high level of stability before these tools are used
• An obvious use of the tools is when performing data-driven tests. By parameterizing the input
to the application, the test effort may be greatly reduced. These tests are of interest even if the
GUI is not finished, because of the resources saved
• Straight replay of recorded tests results in a low rate of bug detection. This further suggests
using these tools for regression testing or data-driven tests.
• Using the tools often focuses testers on weaknesses in the tool rather than in the tested application
• The tools are not easily learned. In order to take full advantage of them and create useful and
maintainable test cases, experience and training on the tools are needed, as well as good programming
skills. The architecture of the tested application should also be well understood
• Performance testing tools are of obvious use, since performing performance tests manually is
extremely difficult.
Table of Contents
Preface
Sammanfattning
Abstract
Table of Contents
Introduction
  Background
  Goal
  Problem definition
  Method
  Target group
  Delimitations
Web Applications
  1.1 The recent evolution of Internet
  1.2 The present state of the web
  1.3 Classification of web sites
  1.4 Web applications
  1.5 User Issues
Fundamentals for Testing
  2.1 The Process
    2.1.1 Test planning
    2.1.2 Test case design & implementation
    2.1.3 Test execution & evaluation
    2.1.4 Test phases
    2.1.5 Test types
  2.2 Structuring Test Types
  2.3 Test Approaches
    2.3.1 Walkthroughs and inspections
    2.3.2 White-box testing
    2.3.3 Black-box testing
    2.3.4 Gray-box testing
  2.4 Prioritizing
    2.4.1 Why prioritize?
Introduction
This report is a Master's thesis at the Department of Transportation and Logistics at Chalmers,
Gothenburg. The assignment was given by Sigma nBiT AB in Gothenburg and the work was carried out at
Sigma nBiT AB. The thesis covers software testing in general and web application testing in
particular.
Background
The Internet is today an important competitive tool. In this environment it is an absolute must that your
web site performs the way it is supposed to, in order to maintain customer satisfaction. To be confident in one's
web application, thorough testing is needed before the application is released on the web. Sigma nBiT
AB has, as one of many areas, software testing as a key area of expertise. The company wishes to
broaden its knowledge in the growing area of web application testing, and this Master's thesis is a step
taken in that direction.
Goal
The assignment can be divided into two parts.
The goal of the first part of the assignment was to establish a web application test methodology that
would simplify web application testing. The methodology should be general in character and easy
to use, but still comprehensive, covering all important aspects of web application testing. The
methodology should serve as a means of prioritizing where to allocate testing resources during the
development of a web application.
The goal of the second part of the assignment was to gain knowledge of automated tools for web
application testing. Sigma nBiT wishes to broaden its knowledge of this type of tool and to build
up experience with them within the company. This report is an early step in that direction, evaluating
some tools offered on the market. The goal is to recommend if, where and when tools of this type
may be used in the test process, as well as which manufacturer offers the best product.
Problem definition
In order to reach the goals, certain sub problems have been defined that can be presented as follows:
• Categories of web sites
How do web sites differ in how they are used? Is the site aimed at the general public or
internal co-workers?
• Technologies used in modern web site construction
What builds up a web site today? What technologies and techniques are used in web design?
• How does the process of testing web applications differ from test of traditional software?
• Are tools for automated tests useful when testing web applications?
What kinds of tests are the tools useful for and when throughout the process may they be
used?
• Which manufacturer offers the best tools? Do they differ in functionality?
Method
The area of web application testing is relatively new, since web applications in themselves are a
relatively new phenomenon. Because of this, the number of books written on the subject is limited.
Instead, much of the information needed to complete this Master's thesis was gathered
using the medium in question, the Internet. Of course, books were still a major source of information.
Besides the search for information in books and on the Internet, some interviews were conducted and
a number of persons were involved in evaluating our first steps towards a useful methodology. These
evaluations were mainly informative, giving many good insights on the way to the final result.
As for the automated testing tools, evaluation copies were obtained when needed. Tools from
different vendors were compared, testing the same web applications.
Target group
This report is mainly aimed at personnel in an organization where software testing is an activity. It is
intended to serve as background material for testing web-based applications. The reader of this
report is assumed to have basic knowledge of concepts related to information technology and web
components. If any terms used in these two areas are unfamiliar to the reader,
information may be found at www.webopedia.com.
Delimitations
Activities such as test management are not covered in this report. The methodology to be established
covers all important areas to test. Under security issues, though, we only consider the general need of
security. Security issues are a large area where much has been written. It will not play a major part in
this report. We will not cover connected area to web, such as wap applications, in this report.
_Chapter One_________
Web Applications
_________________________________________________________________________________
The Internet is still relatively new, which creates a need to discuss certain issues before tackling the aim of
the project: the making of a web application test methodology.
Picture 1.1. The growth of Internet. The graph shows the growth after 01/1995.
Many think of the World Wide Web as the Internet. Today, when you are on the net you are most likely
visiting a WWW site, but the WWW was not released until 1991.
The market for certain offers is limited, making it necessary for new companies to have something
different or better to offer. Together with the fact that the web now reaches almost all possible target
groups, it makes it inevitable that new businesses enter the web. New businesses demand new
features. All this adds to the growing complexity of the sites on the web.
Not long ago most sites did not offer much of interactivity. Today the possibilities for interactivity are
endless. The development of new techniques for Internet makes it easier every year to make reality of
your ideas regarding your site. Special features are performed on your browser and live events are
broadcast over the world, through the web.
The possibility of real-time publishing in many cases sets the pace for web site development. With the
growing complexity and the demand for rapid deployment, web site development tends to lack testing
effort even as the need for it, in fact, increases.
The growth of Internet, together with the increasing number of personal computers in the world,
makes for an increase in accessibility, meaning Internet is now available to more or less everyone.
This makes the Internet even more interesting to new companies which in addition means that it will
keep growing.
The first classification is based on the different business purposes of a commercial web site. James
Ho (Evaluating the World Wide Web: A Global Study of Commercial Sites, 1997) classifies these
purposes of commercial web sites into three categories: promotion, provision and processing.
Promotion is information about products and services that are part of the company's business,
whereas provision is information about, for instance, an environmental care program the company
may sponsor. Processing refers to regular business transactions.
Although this classification is meant to show the purposes with one commercial web site, we believe
it can also be used to categorize the main purpose of a web site. For instance, a company’s on-line
catalogue would be a promotional site, a private person’s homepage may be considered a provisional
site and, of course, a web site for banking services may be considered a site for processing.
Another classification is based on the degree of interactivity the web site offers. Thomas A. Powell
(1998) classifies web sites into five categories:
This classification is derived from the need of methodology during the development of web sites. The
classification is useful also for the testing process, not only for the need of methodology but also for
how extensive the testing must be. For instance, for a static web site the demands may be, besides that
the information is correct and up-to-date, that the source code is correct and that the load capacity of
the server is great enough, i.e. the server can handle a large enough number of visitors at the same
time. There is no need to go much deeper in this case. For the other extreme, Web-Based Software
Application, the requirements are much greater, where, for instance, security is of great importance.
These two classifications are two major ways of showing distinctions between web sites. Together
they provide information about interactivity and purpose which gives us an idea on the site’s
complexity.
Another problem that always irritates users on the web is broken links. We do not think there is
anyone with some web-browsing experience who has not encountered this. It is a recurring
error that will continue to haunt the web for as long as pages are moved or taken off the Internet.
These relatively small errors should not be too difficult to remove, so there is no excuse for
having broken links on a site for more than a short period of time.
As for the dramatically increasing use of Internet banking services, one must feel secure; otherwise
no one would want to make transactions over the web. Still today, many Internet users are skeptical
towards exposing personal information, which should suggest even higher demands on security.
_Chapter Two_________
Fundamentals for Testing
_________________________________________________________________________________
Testing is the process of verifying that a product meets all requirements. A test is never complete.
When testing software the goal should never be a product completely free from defects, because it’s
impossible. The average is 16 faults per 1000 lines of code (Peter Nielsen, 2000) when the
programmer has tested his code and believes it to be correct. A larger project may contain
millions of lines of code, which makes it impossible to find all the faults present. Far too often
products are released on the market with poor quality. Errors are often uncovered by users, and in that
stage the cost of removing errors is extensive.
Studies show that testing often represents between 30 and 50% of the software development cost (RUP [1]).
In order to reduce testing costs, a structured and well defined way of testing needs to be implemented.
Certain projects may appear too small to justify extensive testing. However, one should consider the
impact of errors and not the size of the project.
Important to remember is that, unfortunately, testing only shows the presence of errors, not the
absence of them.
The figure below (Fig 2.2.) gives a general explanation about both the development process and the
testing process but it also shows the relation between these two processes. In real life these two
processes, as stated earlier, should be viewed and handled as one. The colors in the figure are to show
relations between phases. For example, system testing of an application is to check that it meets
the requirements specified at the start of the development. A large part of integration testing is to
check the logical design done in the design phase of the development. The relationships between the
phases are based partly on the V-model as presented by Mark Fewster and Dorothy Graham (Software
Test Automation, 1999).
[1] The Rational Unified Process is a software engineering process developed by Rational. It provides an approach
to assigning tasks and responsibilities in the development process.
As mentioned above, a well defined and understood way of testing is essential to make the process of
testing as effective as possible. In order to produce software products with high quality one has to
view the testing process as a planned and systematic way of performing activities. The activities
included are Test Management, Test Planning, Test Case Design & Implementation and Test
Execution & Evaluation. Test management will not be further discussed in this report.
In order to create a complete test plan, resources need to be identified and allocated.
Resources include
A good test plan should also include stop criteria. These can be very intricate to define, since the
actual quality of the software is difficult to determine. Some common criteria used are (Rick Hower,
2000):
Since it is very rare that every error in today's complex software is found, one could go on testing forever if
stop criteria are not used. Specific criteria should therefore be defined for each separate test case in the
process.
The outcome of test planning should of course be the test plan, which will function as the backbone,
providing the strategies to be used throughout the test process.
The design of test cases is based on what is to be tested. Features to be tested often present a unique
need and the testing should be done in small sections to cope with the differences in test case design
that occur due to this. When testing a single feature there are a number of things to consider; how
does it work, what may cause it to crash, what are the possible variables?
Both data input and user actions should be exercised in ways that test the designed logic, so that we get
answers to the questions: do we get the expected answer, and what happens when wrong input is used? If,
for instance, you are prompted to enter your age in a field, the logic behind it may expect a number
between 0 and 100, but you might by mistake enter a letter instead. What happens then? If the
application is not designed to catch this mistake, it may crash or at least give an
unexpected answer. This makes it important to test the feature so that it does not accept the wrong
input but still, of course, accepts the expected input.
Considering the many ways to make similar mistakes, there are thousands of different inputs that
should be tested. It is not possible to test all these potential inputs; instead, one should choose inputs
that represent all possible groups of input well. This procedure is often referred to as
boundary testing.
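As an illustration of the idea, the sketch below (our own example, not taken from any particular tool) checks an age field against representative and boundary inputs rather than every possible value; the validate_age function is a stand-in for whatever validation the tested application performs.

# A minimal sketch of boundary testing an age field.
# validate_age is a stand-in for the tested application's own input validation.

def validate_age(value: str) -> bool:
    """Accept whole numbers between 0 and 100, reject everything else."""
    if not value.isdigit():
        return False
    return 0 <= int(value) <= 100

# Representative inputs: boundaries, typical values, and invalid groups.
test_cases = {
    "0": True,      # lower boundary
    "100": True,    # upper boundary
    "35": True,     # typical valid value
    "-1": False,    # just below the lower boundary
    "101": False,   # just above the upper boundary
    "abc": False,   # letters instead of digits
    "": False,      # empty input
}

for value, expected in test_cases.items():
    actual = validate_age(value)
    status = "OK" if actual == expected else "FAIL"
    print(f"{status}: input {value!r} -> {actual} (expected {expected})")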
For the results of the test to be useful, and to know when stop criteria are reached, the completeness
of the test cases is very important.
When creating tests based on the test cases, certain objectives should be addressed.
When the actual results from a test do not match the expected, certain actions have to be taken. The
first is to determine why the actual and the expected results differ. Does the error lie in the tested
application or e.g. the test script?
When errors are found, they need to be properly reported. Information on the bug needs to be
communicated so that developers and programmers can solve the problem. These reports should
include, among other things, the application name and version, the test date, the tester's name and a
description of the error that occurred.
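A minimal sketch of such a report as a data structure is shown below; the field names are our own illustration of the items listed above, not a format prescribed by any particular tool.

# A minimal sketch of a bug report record holding the items listed above.
from dataclasses import dataclass
from datetime import date

@dataclass
class BugReport:
    application: str      # application name
    version: str          # application version
    test_date: date       # when the test was run
    tester: str           # tester's name
    description: str      # description of the error that occurred

report = BugReport(
    application="Example web shop",   # hypothetical application
    version="1.2",
    test_date=date(2001, 2, 6),
    tester="A. Tester",
    description="Order form accepts a letter in the age field and crashes.",
)
print(report)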
When a bug has been properly taken care of, the software needs to be re-tested to verify that the
problems have been solved, and that the corrections made did not create new conflicts.
During the process of development and testing of software, many changes will surely be made to the
software as well as its environments. When such changes are made, there will be a need to assure that
the program still functions as required. This kind of testing is called regression testing. The difference
between a re-test and a regression test is that the latter is done when changes have been made to the
program regarding, for example, its functionality, whereas a re-test is done to test the software after bug
fixing.
functioning before being integrated with other parts. Important in the development process is that
neither of these phases is completely separated from each other. There is no definite border between
when the different phases start or end. They can be seen as an overall approximate guideline for
how to perform a successful test. Several authors, including Bill Hetzel and Hans Schaefer, describe
the test process as consisting of the following phases:
• Unit testing
• Integration testing
• System testing
• Acceptance testing
Unit testing
Also called module test. The testing done at this stage is on the isolated unit.
Integration testing
When units interact with others, one must assure that the communication between them works.
Conflicts often occur when units are developed separately or if the syntax to be used is not
communicated in a sufficient way.
System testing
When the system is complete, testing on the system as a whole can commence. Test cases with actual
user behaviour can be implemented and non-functional tests, such as usability and performance, may
be made.
Acceptance testing
The purpose is to let end users or customers decide whether to accept the system or not. Are the users
feeling comfortable with the product and does it perform the activities as required?
There are no definite borders between the types and several of them can seem overlapping with
adjacent areas. Needless to say, there are several opinions on this matter and we base the following
descriptions on authors such as Hans Schaefer, Bill Hetzel, Tim Van Tongeren and Hung Q. Nguyen.
Functionality testing
The purpose of this type of test is to ensure that every function is working according to the
specifications. Functions apply to a complete system as well as a separated unit. Within the context of
the web, functionality testing can, for example, include testing links, forms, Java Applets or ActiveX
applications.
Performance testing
To ensure that the system has the capability that is requested, its performance has to be tested.
The characteristics normally measured are execution time, response time, etc. In order to identify
bottlenecks, the system or application has to be tested under various conditions. Varying the number
of users and what the users are doing helps identify weak areas that do not show up during normal use.
When testing applications for the web this kind of testing becomes very important. Since the users are
not normally known and the number of users can vary dramatically, web applications have to be
tested thoroughly. The general way of performance testing should not vary, but the importance of this
kind of test varies (Schaefer, 2000). Testing web applications for extreme conditions is done by load
and stress testing. These two are performed to ensure that the application can withstand, for example,
a large number of simultaneous users or a large amount of data from each user. Other important
characteristics of the web are download time, network speed, etc.
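As a rough illustration of what such a load test does, the sketch below (our own simplification, not how Rational LoadTest or Astra LoadTest work internally) fires a number of simultaneous requests at a URL and records the response times; the URL and user count are hypothetical placeholders.

# A rough sketch of a load test: N simultaneous virtual users fetch one URL
# and the response times are collected. The URL and user count are placeholders.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8080/"   # hypothetical application under test
VIRTUAL_USERS = 20               # number of simultaneous simulated users

def one_request(_):
    start = time.time()
    try:
        with urllib.request.urlopen(URL, timeout=10) as response:
            response.read()
        return time.time() - start
    except OSError:
        return None  # failed requests are counted separately

with ThreadPoolExecutor(max_workers=VIRTUAL_USERS) as pool:
    results = list(pool.map(one_request, range(VIRTUAL_USERS)))

ok = [t for t in results if t is not None]
print(f"{len(ok)}/{len(results)} requests succeeded")
if ok:
    print(f"average response time: {sum(ok) / len(ok):.3f} s, worst: {max(ok):.3f} s")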
Usability
To ensure that the product will be accepted on the market, it has to appeal to users. There are several
ways to measure usability and user response. For the web this is often very important, due to users'
low acceptance of glitches in the interface or navigation. Due to the nearly complete lack of
standards for web layout, this area depends on actual usage of the site to obtain information that is as
useful as possible. Microsoft has extensive standards, down to pixel level, on where, for example,
buttons are to be placed when designing programs for Windows. The situation on the web is
exceptionally different, with almost no standards at all on how a site layout should be designed.
Compatibility testing
This refers to different settings or configurations of, for example, the client machine, the server or external
databases. On the web this can be a very intricate area to test, due to the total lack of control
over the client machine configuration or an external database. Will your site be compatible with
different browser versions, operating systems or external interfaces? Testing every combination is
normally not possible, so the usual approach is to identify the most likely combinations.
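A small sketch of that selection step is given below; the platform and browser lists are hypothetical examples of the combinations one might enumerate before picking the most likely ones to test.

# A small sketch of enumerating client configurations for compatibility testing.
# The platforms, browsers and screen sizes listed are hypothetical examples.
from itertools import product

operating_systems = ["Windows 95/98", "Windows NT", "Mac OS", "OS/2"]
browsers = ["Internet Explorer 5", "Netscape Navigator 4"]
screen_sizes = ["800x600", "1024x768"]

combinations = list(product(operating_systems, browsers, screen_sizes))
print(f"{len(combinations)} possible combinations in total")

# In practice only the most likely combinations are tested, for example the
# ones expected to dominate the site's audience.
likely = [c for c in combinations if c[0].startswith("Windows")]
for os_name, browser, screen in likely:
    print(f"test on {os_name} / {browser} / {screen}")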
Security testing
In order to persuade customers to use Internet banking services or shop over the web, security must
be high enough. One must feel safe when posting personal information on a site in order to use it.
Typical areas to test are directory setup, SSL, logins, firewalls and logfiles.
Security is an area of great importance as well as great extent, not least for the web. A lot of literature
has been written on this subject and more will come. Due to the complexity and size of this particular
subject, we will not cover this area more than the basic features and where one should put in extra
effort.
When comparing Tim Van Tongeren's (1998) way of structuring test types with Vincent Soberano's
(1998) views on the subject, one notices the similarities in how the two authors classify and
exemplify different types of tests and what is included within each type. The most interesting
divergence is in the way they present interface issues. Soberano prefers to combine Functionality
and Structural under the main heading User Interface, whereas Van Tongeren presents User Interface
separately from Functionality. Whether one is more useful than the other is difficult to determine, but
they both represent interesting approaches to how different areas relate to one another.
Thomas A. Powell presents a somewhat different approach to this matter. His grouping of tests is
interesting, though he lists unit and integration testing under functionality tests. One wonders how
familiar Powell is with traditional test methods, but he still offers an interesting view of when, during
the phases of testing, different types of tests are to be produced and executed.
Hung Q. Nguyen (2001) shares several similarities with, foremost, Van Tongeren and Soberano, mainly
in the way he presents the main areas to test. One obvious difference is that Nguyen emphasizes
Help and Installation testing as two areas that call for separate attention.
Which author has the best approach is impossible to say. They all share the same backbone but
present different ways to pinpoint vital areas when testing a web site.
Since the area of web testing is still in its infancy, there are few accepted standards and definitions to
lean on. This often makes it necessary for the process of testing web applications to be defined for
every test process done. Kathleen A. Iberle (www.stickyminds.com Step-by-Step Test Design, 2000)
notes that a unique test type list has been created for each major type of product she has worked on,
and that they have all been slightly different. The conclusion is that which test types are more
important than others, and to what degree prioritization between types has to be done, will differ
from test to test.
Figure 2.3. White-box Testing. The code and architecture are known.
Figure 2.4. Black-box Testing. Code and architecture unknown. Valid input and expected outcome are known.
The White-box approach is to a higher extent used early in the development and testing process,
before there are any visible functions to test, while the Black-box approach is used later, when
functions are visible and can be tested.
2.4 Prioritizing
What these features are differs from application to application, and they are not always obvious.
Considering the application’s purpose might help deciding the important parts of the site. Earlier we
introduced purposes of web sites that we had derived from Ho’s (1997) business purposes. These
purposes present different prioritization needs. A site for business transactions, for instance an
Internet banking service, has security requirements that must be fulfilled for us, as users, to feel confident
in the application, or we will not use it. A promotional site, on the other hand, has no apparent need of
high security in that sense. This can be translated into assessing the significance of a specific function
or the importance of a function not to fail, which leads us to risk-based analysis where some ideas
come from James Bach (2000).
Whenever we make decisions, something works in the background, considering what might
go wrong and the effects it might have. This is also the basis of risk-based analysis.
Risk-based analysis is a way of determining the order of priority between all possible errors that
might occur. Risk-based analysis takes into account the two factors mentioned above:
• The likelihood of an error occurring
• The effect, or cost, of an error
These two factors are given numeric values and are multiplied with each other, creating a risk value.
The higher the value – the higher the risk – the higher the priority. Based on this, further test
actions can be planned.
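As a small illustration (the error descriptions and numeric values below are hypothetical, chosen only to show the arithmetic), the risk values could be computed and ordered like this:

# A tiny illustration of risk-based prioritization. Likelihood and effect are
# given hypothetical values; their product is the risk value.
possible_errors = {
    "broken link on a help page": (1, 1),       # (likelihood, effect)
    "order form rejects valid input": (2, 3),
    "slow response with many users": (3, 2),
}
for name, (likelihood, effect) in sorted(
        possible_errors.items(),
        key=lambda item: item[1][0] * item[1][1],
        reverse=True):
    print(f"risk {likelihood * effect}: {name}")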
2.5 Challenges
Web applications share many similarities with traditional client/server applications, but there are also
a number of differences that create new problems when it comes to testing.
One example is that web application developers often use a great number of different techniques to
create the features on their web sites. The mix of techniques used consists of HTML, ASP, Java,
JavaScript, ActiveX and others.
Also creating problems is the wide range of user-side configurations. The web application must work
on many different combinations of hardware configurations, Operating Systems and browsers. PC,
Mac, OS/2, Windows NT, Windows 95/98, Internet Explorer, Netscape Navigator and more, make for
a great number of possible combinations.
For a traditional client/server application the number of simultaneous users often can be predicted.
For web applications this is very difficult. It is hard to know the number of hits per day the site might
get and also the variation over the day.
These are some of the challenges you encounter in web application development and testing.
Another characteristic of web application testing is the difficulty of defect tracking. Any of the many
interacting layers can be responsible for an error, or for the symptom of an error, occurring. Hung Q. Nguyen (2000)
presents what he considers to be five fundamental considerations:
• When we see an error on the client side, we are seeing the symptom of an error—not the error
itself.
• Errors may be environment-dependent and may not appear in different environments.
• Errors may be in the code or in the configuration.
• Errors may reside in any of several layers (Client/Server/Network).
• Examining the two classes of operating environments—static versus dynamic—demands
different approaches
Here, the static operating environment refers to configuration and compatibility variables, and the
dynamic environment to resource- and time-related errors.
_Chapter Three_______
_________________________________________________________________________________
The goal was to achieve an easy to follow methodology with quantifiable measures, for any tester to
follow when testing any type of web site or application. The methodology should be:
• Comprehensive
• Short
• Easy to use
The strategy on how to reach the goal was to study and establish the factors that set the testing need
for a site. We have also studied what areas there are to test in web applications. We have previously
presented different authors' views on this area, and in this chapter we will present our own views on
the subject (see 3.2).
When Powell (1998) describes the development of web sites, he divides them into five groups based
on their interactivity, as mentioned in chapter 1.1.3.
With this list, he describes the differences in how web sites are constructed and also the differences in
development strategies. With interactivity he points out differences in complexity and in the way the
sites are built. Based on the differences in complexity, the test process will most certainly differ from
site to site. Depending on the technology used and differences in how features are created,
divergences in the test approach will be evident. It is therefore clear that the complexity of the site, in
how it is built and to what extent different technologies are used, will be a main factor in determining
the test approach and to prioritize urgent areas.
Noticeable are the similarities between Powell's five categories and the three purposes we derived
from Ho's (1997) classification of web sites, or rather of their purposes.
Interesting here is that Ho points at the differences in purpose of the site and how one creates value
with web sites as the tool. When considering risk based testing as described in chapter 2.4, one can
see the connection between the value created, based on the business purpose, and the seriousness of
an error that might occur. If the site is, for example, an online banking site, the value created may be
reduced costs for personnel as well as making it easier for customers to use the bank’s services and
therefore attract more customers. The connection between error effect and the value created based on
the purpose now becomes evident.
We have previously discussed what to consider when prioritizing. In that chapter (2.4) we did not
mention interactivity, or complexity, which now seems to be an important factor when considering
what to test. However, the complexity factor tells us more about what specific components there are
to test while purpose helps us decide in what order they should be tested. We have, it would seem,
established a base to help decide what to test in a specific web application. However, purpose is not
the sole factor when prioritizing. Target group is also of importance. Who do you want to view
your web site? If your site is meant to be a place where you write down funny stories for your friends
to read, there might not be a great need to test certain things or to test thoroughly. If, on the other
hand, your web application is meant to help your company conquer the world, there is no room for
performance lapses, broken links or usability problems. Viewing possible target groups and their
characteristics, we find several groups that differ in their needs and demands on the
application in a way that gives us incentive to use them for determining the testing need. One
example of a possible categorization of target groups is presented below.
The categories are:
Together purpose and target group give a good view over the importance of certain features and the
effect of an error in any of these features.
At one point, after reading Powell’s categorization of web sites, the idea was to use a similar
categorization of web sites, combining interactivity with other factors, to create a small number of
categories. The idea at this point was to predefine a small number of test approaches, depending on
the categories, where we would exclude areas unnecessary to test from the complete list we will
present in chapter 3.2. We found that this cannot be done since, whatever the degree of interactivity
and whatever purpose one might have, the way a site is built and of what components, differs greatly.
Thus, we cannot predefine what one should test based strictly on interactivity, purpose and target
group. This did not affect the primary goal.
Depending on the factors, purpose (P) and target group (T), the occurrence of an error has different
impact. A natural step to take is therefore to view the combination of them as varying cost of an
error (C). What we get is one of the factors of Risk-based analysis. The more complex (Comp) an
application is, the more different features there are, making it more likely that an error will occur.
This reveals the other factor of Risk-based analysis, the likelihood of an error to occur (L).
In compliance with the theory of Risk-based analysis discussed in chapter 2.4.2 we get the following
formula on the risk value:
R = L(Comp) * C(P,T)
The risk value is in fact a value describing the testing need. The higher risk, the higher the need for
testing (see fig. 2.5).
Functionality
1. Links
2. Forms
3. Cookies
4. Web Indexing
5. Dynamic Interface Components
6. Programming Language
7. Databases
Usability
1. Navigation
2. Graphics
3. Content
4. General Appearance
2. Browsers
3. Settings, Preferences
4. Printers
Performance
1. Connection speed
2. Load
3. Stress
4. Continuous use
Security
1. General Security
All important issues under each test type are addressed in Appendix B.
When comparing the two versions by Hung Q. Nguyen and Tim Van Tongeren, we notice that they
both separate User Interface from Functionality, whereas Soberano lists Functionality together with
Structural design, as headings under User Interface. Noticeable here is that Soberano has separated
the functional aspect of User Interface from the Structural design, i.e. separating usability from
functionality. When comparing these two alternatives, we believe that the latter better fits our needs and
that it is more applicable to the web, based on how it is to be used. This is because of the
way the interface towards the user is made with the web as a medium. There are very few standards
for how a web site should be designed in order to make users experience the site as user friendly and
comprehensible. Web sites encountered often show poor consideration for how users respond and act
on the web. We therefore believe that usability issues should be addressed separately to ensure that
both the functionality and usability aspects are covered when testing web sites.
We have further come to the conclusion to separate user side- from server side configuration issues by
addressing Server side Interface issues separately from Client side Compatibility. When studying the
authors’ opinions in this matter, we have used a definition closely related to both Van Tongeren’s and
Soberano’s definitions. Nguyen chooses to combine client and server issues but presents database
issues separately, whereas we list the functional aspect of database issues as a part of Functionality
and configuration issues under Server side Interface.
We have chosen Nguyen's approach of including load and stress testing under Performance testing, which
seems a logical way to reason, since these types of tests actually measure the performance of the
system. Other performance issues are also discussed under this heading, such as connection speed and
continuous use.
Security issues will not be addressed at a more detailed level than General Security, since the area is
of such great extent that it would need a report of its own to be satisfyingly covered.
_Chapter Four________
The Methodology
_________________________________________________________________________________
The discussion on why we do what we do throughout the process of establishing the sought
methodology, along with other questions, such as at what point in a web application development
process the methodology can be used, is presented throughout this chapter. In Appendix C there is a
manual on how to use the Test Priority Sheet. Along with the manual are the questions that are to be
answered in the matrix. First, though, we briefly present the result of our efforts.
The resulting values are compared to each other; the highest values indicate the highest testing need and should therefore be
prioritized. Besides the testing need values, using the matrix, called Test Priority Sheet, gives you a
good idea on the overall complexity of your application and the testing effort needed throughout the
development.
                                 Complexity (0-3)   Aim (1-3)   Testing Need
Functionality
  Links
  Forms
  Cookies
  Web Indexing
  Programming Language
  Dynamic Interface Components
  Databases
Usability
  Navigation
  Graphics
  Content
  General Appearance
Performance
  Connection speed
  Load
  Stress
  Continuous use
Security
  General Security
When evaluating the results from the methodology it is important to bear in mind that the
prioritization recommended by the matrix should be a guideline and a way to identify extreme values
in either direction. Areas with high values should be tested as early as possible while low value test
areas may occasionally not be tested at all. Mid range values may be difficult to distinguish from each
other and may therefore not always be prioritized in a strict order.
These are examples of questions we believe will help determine what to test, or which areas have
the greatest need for testing. Before answering these questions, one must establish what the factors
are. For Complexity, one must consider how the application is built, of what components, both
hardware and software, and the architecture of the application. Purpose is what the application is
meant to achieve, and Target Group is, of course, the group, or groups, at which your efforts are aimed.
Based on the recommended list of test types and areas, we formed a matrix. The cells of the
matrix are supposed to hold the answers to questions like the ones presented above. To make any
real use of the matrix, the answers need to be numerical. The numbers in each row then represent
the significance of the test area in that row. When the numbers within each row are multiplied with
each other, the product becomes a numerical value representing the need for testing, as established in
chapter 3.1. The value is not in any way intended to show the effort involved in testing the specific
area; it only shows the need for testing, relative to other areas.
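A minimal sketch of this calculation is given below; the area names come from the sheet, while the numerical answers are hypothetical and would in practice come from answering the two questions for each area.

# A minimal sketch of how row values in the Test Priority Sheet combine into a
# testing-need value. The answers (complexity 0-3, aim 1-3) are hypothetical.
sheet = {
    "Links":      (2, 3),   # (complexity, aim)
    "Forms":      (3, 3),
    "Cookies":    (0, 1),   # complexity 0: not present, so nothing to test
    "Navigation": (1, 3),
    "Load":       (2, 2),
}

testing_need = {area: complexity * aim for area, (complexity, aim) in sheet.items()}

# Higher products mean a higher need for testing and therefore higher priority.
for area, need in sorted(testing_need.items(), key=lambda item: item[1], reverse=True):
    print(f"{area:12s} testing need = {need}")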
When evaluating the matrix so far, we find that it is sometimes hard to distinguish differences
between the answers for Purpose and those for Target Group. The two factors are obviously connected, and
when answering them it is often hard to decide where the line between them is drawn. When analyzing
test persons' answers, it shows that they often differ in how they answer these two
questions, depending on how they interpret and separate them. On several occasions, some have put, for
instance, a 2 under Purpose and a 3 under Target Group, while others have put the opposite. On some
questions, purpose is a more relevant approach than target group, and on others it is the other way
around. Realizing that this will probably always be the case, we consider the relation between
these two factors to be so strong that it is worth combining the two. This has been done, and the
combined factor is named Aim. This turned out to be a more comprehensive approach without losing any
information.
We soon realized that some differentiation would have to be introduced, based on which of the
main areas the questions were to be answered under. As a result, instead of two questions, twelve were
needed, two for each main area. This makes the questions easier to answer, since less work needs to
be put into understanding what to consider when giving your answer. Presented below are examples of
questions and the differences that are necessary between, for instance, functionality and performance.
Functionality
• Is the feature present - To what extent and how advanced
• How critical is the feature for the Aim of the site
Performance
• Does the site contain features that demand specific performance and to what extent
• How sensitive are users to the performance of the site
When test persons evaluated the matrix, it was noted that even within the test type groups, the
questions were still too general to be answered without interpretation and adjustment
to the individual test area. It was therefore considered whether the use of the methodology would be
simplified even further by constructing and using separate questions for each and every test area in the
matrix. This would also reduce the amount of subjective interpretation and misunderstanding in
use, which would otherwise be apparent. An example of where the same question may be difficult to ask,
or answer, is Usability. When defining the complexity of graphics and content, interpretation difficulties
become obvious. If, on the other hand, they could be addressed separately and in a way that suits each area,
the effort needed to determine the value for that specific area would be reduced. Separate questions were
therefore produced and evaluated, with a positive result.
First, overall testing requirements may be established using the methodology on the application as a
whole. For instance, the project leader and the test leader go through the Test Priority Sheet
considering the established architecture and requirements of the application.
A second step is to view the points of integration and consider which requirements can and should
be verified at those points. Integration is usually done on several levels, such as integrating single
units with each other as well as putting larger systems together by integrating subsystems consisting
of previously integrated units. The functionality that is being put together at the different levels must
of course be tested, and having planned the development thoroughly enables the testing of that
function to be planned at the start of the project.
Third, having used the Test Priority Sheet at unit level, the project leader and test leader are able to
give valid advice to developers on what to test at unit level as well, even before coding has started.
Together these three steps constitute a ground for thorough test planning. The effort required for
testing throughout the development of the application can be established at the start of the project,
making it easier to have a valid time and cost estimate for the project.
establishing the most important areas to test making sure you do not spend valuable time testing areas
where an error would perhaps not cause a problem.
General
In whatever way you use the Test Priority Sheet, it is important to be familiar with the area
definitions as well as the questions, so that no mistakes are made because of misinterpretations. What
role interpretation plays is discussed as part of chapter 4.5. Furthermore, it is important that every
user of the methodology is well aware of the aim of the application; otherwise the testing effort might be
misguided. If the planning of unit testing is done by the developer of that unit, it is
important that the project leader communicates the aim.
4.5 Evaluation
To know whether the methodology is useful and to assure future acceptance of the methodology, it
was presented, at different stages of development, to different people who evaluated it by using it on
actual web applications as well as in web application development projects.
It was the early evaluations that led to the use of the factor Aim, instead of the factors Purpose and
Target Group. They also to some extent affected the questions to answer on the Test Priority Sheet.
The later evaluation of the methodology was done by people in a web application development
project. The results of the Test Priority Sheet differed somewhat between the test persons. This is not at all
surprising, but it still needs to be analyzed why the results differed. Their evaluations are presented in
Appendix D. We see five possible reasons behind differences such as these. They are:
1. Role
Differences in what part of the application one develops. Answers are obviously shaped by
what part of an application one is responsible for.
2. Experience
Past experience will affect what one considers being important.
3. Phase of development process
Is it applied on a single unit or on a system? Possibly, when time is short, some prioritizing
sneaks in when answering even though this is what the methodology is there for.
4. Interpretations
Imprecisely formulated or sloppily read area definitions or questions lead to different
interpretations, and thereby the answers differ.
5. Badly put question
If questions or area definitions are badly put, they are hard to answer or answers may be given
to a question that was not intended. Highly connected to Interpretations (4).
There is no value in itself in having all differences wiped out. Within a group or project, the differences
are what, in the end, ensure that every aspect of an application has been considered. However, certain
reasons behind the differences may be worth taking a look at. What role and past experience one has
must of course shape the answers. These areas are key to the use of the Test Priority Sheet
within a group. In what phase of development the Test Priority Sheet is being used is also a valid
reason for differences. Unit level and system level will most likely differ in what areas are prioritized.
Differences in result due to differences in interpretation should be minimized. There is no real danger
of missing anything if the methodology is used by a number of people within a group, where results
are discussed before they are used. However, if a single person uses the methodology, interpretations other
than what was intended may lead to certain areas being missed. Badly put questions or area definitions in
the Test Priority Sheet may have the same result. It is important that these reasons behind differences are
reduced to a minimum, and changes have been made after the evaluation.
To be able to thoroughly evaluate the methodology one must use it as it is intended, which is
throughout a web application development process. The evaluation done, however, was more static
and captured opinions only on a small part of the possible use of the methodology. Nevertheless, the
opinions are regarded as very significant. The persons performing the evaluation of the Test Priority
Sheet were in the middle of a web application development process. They were all at the same stage
of development but responsible for different features of the application. The different roles and
experience of the test persons shape the results of the evaluation. In short, what they found was:
Cons:
• Difficult to use the first time
• Area definitions sometimes unclear
• Certain areas not what they expected
• Areas can sometimes be further divided
• Needs project start-up meeting to assure same definitions
Pros:
• Shows areas of importance that would otherwise have been missed
• Useful to gain acceptance for the present testing need
• Can make developers of different units understand importance of other parts of the
application development process
When the Test Priority Sheet is used for the first time, some of these areas are hard to answer.
However, after implementing the methodology a second time, many aspects were immediately much clearer.
As for some area definitions not being what they expected, changes have been made. There was, for
example, an area simply called Indexing that was defined to be ways of making sure that different
search engines on the web found the site. Test persons intuitively wanted to answer questions on
database indexing and related areas. Therefore, the prefix Web has been added. Furthermore, database
issues were taken out from Server Interface, where it first resided, and put separately under
Functionality as Databases. Indexing is one area to consider under Databases.
Test persons also wanted some areas to be split up further. We consider it possible to split several of
the test areas further, depending on what one's area of expertise is, but we have chosen not to. Two goals
are to keep the methodology general and short, and we believe these goals may not be reached if the
test areas are split further. However, we are aware of this and trust future use will
shape the methodology if, and as, needed.
The last aspect considered a problem is also connected to area definitions. It was stated that some sort
of project start-up meeting is required to make all users of the methodology within a specific project
use the methodology with the same definitions on all areas. We realize the problem but also believe
that having slightly different ideas on what to weigh in when implementing the Test Priority Sheet
might actually further emphasize the important areas to test.
As for the positive opinions, they are well in line with what we aim at. The first, highlighting
important areas to test, has been our primary goal with this methodology. The other positive effects,
or reasons to use it, are more of side effects realized when analysing the methodology throughout its
development process. Having them presented to us when evaluating the methodology led us to seriously
consider these as areas where the methodology may be useful, as discussed in chapter 4.2.3.
4.6 Conclusion
After making certain changes based on the evaluation, we have reached our primary goal: to
produce a methodology that shows the most important areas to test.
Although the evaluation showed that the Test Priority Sheet was difficult to use the first time, we are
confident that it will be easy to use once the user has become acquainted with it. The Test Priority
Sheet is one page long and consists of a number of questions to answer, making it a fairly short
methodology to use. It covers all important areas of a web application, which makes it as
comprehensive as we wanted. Altogether, we consider our goals achieved.
4.6.1 Discussion
Web applications may be categorized based on their complexity. Today, though, the areas in need of
testing cannot be determined solely from which complexity category the application belongs to.
Even though two web applications may look the same, the way they are built may differ greatly.
The number of techniques available is almost endless; ASP, DHTML and ActiveX are just a few of the
techniques used. This means that even though we know the complexity of a web application, we do
not know what components there are to test. Still, complexity is an important factor when establishing
a web site's need for testing.
Even when the categorization is based on both complexity and aim, there is no way to tell beforehand
which areas are in need of testing. Therefore, a methodology must be designed to cope with the
characteristics of any single web application. Of course, one might argue that a methodology may be
designed with the intent only to cope with the characteristics of some types of web applications and
therefore be more precise in its recommendations. But as we found throughout our project, the
similarities between types of applications are sometimes greater than the similarities within a single
category of web applications. Thus, based only on factors such as complexity, purpose and target group,
a single-category methodology cannot be established, i.e. it will not differ much from the comprehensive
methodology.
In chapter 4.2.3 we mentioned that the use of the Test Priority Sheet varies depending on at what
phase in the development process you are and also on what application, or part of application, it is
you use the matrix on within that phase. The methodology is designed in a way that handles these
variations. By giving the score 0 to any test area’s complexity factor, the area is neglected. One may
also simply choose not to answer certain questions that one considers unnecessary. Since the method
is designed to point out relative differences in importance between test areas on one specific
application at a time, it does not matter if some questions are left unanswered. The most important
result is that essential areas to test are not missed. We believe that using relative measures is a strong
advantage with this methodology. The general character of the methodology means it may be used
throughout the development process to assess the changes in testing need for different features, as the
design and architecture changes. Also an important feature of the methodology is that it requires the
developer and/or the tester to consider the whole application thoroughly, by which one gains
important knowledge about the application.
The evaluation of the Test Priority Sheet presented in chapter 4.2.4 stresses the need for further dividing
up the test areas under each test type. As more experience is gained in web application testing in
general, as well as in how the methodology is, and should be, used, the shape of the Test Priority
Sheet may become subject to change. At this point, the methodology is deliberately made general in
its character to prevent areas in need of testing from falling between the test areas in the matrix. We have
chosen to maintain our high-level division of the areas to let future experience guide the way to an
improved methodology. Throughout future employment of the methodology, small changes in many
directions may be made to cope with the needs of specific projects. In the end, what we finally have
might very well be a number of slightly different methodologies.
One approach discussed is to make the answers to the questions stricter, i.e. giving
less room for subjective interpretations. Doing this may make it possible to use the methodology also for
comparing different projects. At this point, the methodology does not support this; it is only meant for
comparing the relative testing need within a single project. Stricter answers might increase the
number of mid-range values, but summing up the testing need values in the right column would
create a value representing the total testing need of the application. Certain other changes would, of
course, have to be made. One of them would be to establish a factor for every area representing the
effort involved in testing that area. Establishing this would enable the methodology to be
used for comparison between projects.
Chapter Five
When planning a test one has the opportunity to choose whether to perform the test manually or to
automate it. Later in this chapter (see 5.3) we will discuss benefits and problems
with automating tests, but we will start with the possibilities that lie at hand.
• Design testing: the major testing technique is the formal review, but there are some possibilities
to use automated tools.
The tools presented above might not be single tools but parts of a tool performing multiple tasks. For
instance, the tools presented above as capture/playback systems often include a comparator to check
the outcome of the tests recorded and executed using capture/playback.
Along with these tools there are test management tools to help plan and manage resources and other
important areas.
There are also test case design tools or test generation tools that help generate test cases that cover all
important aspects of an application.
Fewster and Graham (1999) talk about three categories of test generation:
• Code-based; generates tests that check that the code does what the code does. It does not check
that it does what it should.
• Interface-based; generates tests based on well-defined interfaces, such as a GUI. Can create
tests that visit every button, check-box or menu on a site.
• Specification-based; generates both input and expected output from specifications.
Another characteristic that makes a test case suitable for automation is that it is performed repeatedly.
The cost and effort of automating the test is then divided among the many re-runs of the same test.
Although a test may be re-run many times, it is almost always the first time it is run that most errors
or deficiencies are found. Even when automating tests, using one of many possible automation tools,
most errors are therefore found manually, since the test case is usually run manually before it is
automated and executed.
When part of a test team that one feels is not testing efficiently (not finding enough errors
fast enough), one might recommend automating the test process or parts thereof. This is often a
mistake. Automating an ad-hoc testing process or a poorly maintained process might instead worsen
the problems. Fewster and Graham (1999) say "automating chaos just gives faster chaos", which
highlights the possible consequences of automating what is not a structured process.
It is also important to understand that automating test execution is not automating testing. Using a
capture/playback tool as mentioned above automates input, but not comparison or verification. To
automate testing, the verification needs to be automated as well. Many tools support this but even so,
testing is not yet automated. Supporting comparison does not mean that the tool actually verifies the
outcome. To achieve this, the outcome of the executed case needs to be compared to an expected
outcome, which in turn needs to be fed into the comparator.
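To make this concrete, the sketch below (independent of any particular tool, and with invented test case names and outcomes) shows the minimal shape of automated verification: a comparator that checks the actual outcome captured during execution against an expected outcome supplied beforehand.

    # A minimal sketch: automated verification as comparison against stored expected outcomes.
    def compare(actual, expected):
        # The verdict can never be better than the expected outcome fed in.
        if actual == expected:
            return "PASS"
        return "FAIL: expected %r, got %r" % (expected, actual)

    # Hypothetical outcomes captured during execution, keyed by test case id.
    actual_outcomes = {"login": "welcome page shown", "search": "12 hits"}
    expected_outcomes = {"login": "welcome page shown", "search": "10 hits"}

    for case_id, expected in expected_outcomes.items():
        print(case_id, compare(actual_outcomes[case_id], expected))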
While on the subject of comparing actual outcome to expected outcome, it is important to point out
that even though the actual outcome might be as expected, it does not mean that the application
passes the test, or that it is free from the errors tested for. The expected outcome that is fed into the
comparator might be wrong, which in fact means that a faulty application will always be approved. To
some extent, this is also the case for manual testing, meaning that it all depends on how the test case
is written. The difference is that a poorly defined manual test may very well find many important
deficiencies or errors, which an equally poorly defined automated test will not, since it can only
perform the actions specified and verify the given expected outcome.
We will be discussing benefits and problems later on, but there is something that needs to be
addressed right away, even though it might also qualify for the benefits and problems chapter: the
differences between human testers and comparators or test execution tools. While tools only do what
they are specified to do and compare only the specified outcome, human testers performing the test
manually are able to adjust their test case to unforeseen events, such as an error dialog box, and also
to check the outcome in many more ways than tools can. When a human tester performs a test, almost any
test, he can simultaneously be said to be performing a usability test, since he has to navigate, wait,
view and understand the application being tested. These positive side effects are never achieved when
automating testing.
Automated testing, on the other hand, ensures that a test is always being run in the exact same way,
which is often important to reproduce errors or when performing regression tests.
A final note, also from Fewster and Graham (1999), is that test execution tools are not really testing
tools but re-testing tools, since the test is really being performed when the test is recorded or tried out.
Running the tests with test execution tools then means that you are re-running a test. This is perfect
for regression testing.
5.3.1 Benefits
Among the most obvious benefits is the possibility to run the same test on new versions of a program.
A program may then be tested to check for new bugs in existing features that may have been
introduced when adding new features or correcting deficiencies. This is regression testing. The same
applies to re-testing, i.e. testing the functionality of a debugged feature.
If a test has been created previously it will most likely be very easy to run it again, which is not only a
benefit, but as we stated earlier, it might also be a must for automation of some tests to be at all worth
considering.
The second benefit mentioned by Fewster and Graham is the possibility to run tests more often by
which you gain more confidence in the program. They also state that people often believe that
automating tests will mean that their tests will run faster, while in reality it tends to mean that more
tests are run on a more frequent basis.
A program may pass a test at one time but fail at another. When executing a test only once, manually or
automatically, it might fail to catch a deficiency that later on might cause an error. For web applications
this is certainly the reality, because of varying loads, connection speeds and the possibility that the
connection is lost completely for a moment.
A benefit that, once realized, is obvious is the possibility to perform tests impossible to do manually.
For web applications there are obvious examples. Load testing, for instance, is
sometimes possible but hardly advisable to do manually. Applying a load of hundreds of
users might be done manually but would require a massive administrative effort. Applying the same
load by simulating the users with a tool decreases the required effort immensely.
Sometimes certain user actions may trigger events that don’t call for any response to be shown on the
screen. A tool might then be helpful when checking that the event actually occurred.
Most likely there will always be some testing that should be done manually. Automating the tests that
are tedious and require only low skill might free resources, in the shape of skilled testers, who serve a
better purpose coming up with better test cases. Tests run manually will also be performed better if
there are fewer cases to run.
In line with running tests more often, mentioned above, a benefit also mentioned earlier is the
fact that a test will always be run exactly the same way when automated. This is important both for
regression testing and to gain confidence in the program's reliability. The reuse of tests we have
established here means a decreased cost per test run, meaning that more effort may be spent making the
automated tests as good as possible.
One of the last benefits mentioned by Fewster and Graham is that time to market can be shortened. A
fully automated set of tests may be run much faster than the same tests performed manually. This means
benefits for the release of later versions of a program, where the tests necessary for full confidence in the
release can be performed in much shorter time. This is often of great importance in the development of web
applications. The time-saving abilities hold true only if the automated test set is maintained throughout the
development of the program. The cost of maintenance is one of the most important factors when
contemplating automation.
5.3.2 Problems
What we believe to be most important to remember is that the manual testing process must be well
structured, with necessary and consistent documentation and consisting of tests that are good at finding
errors and deficiencies, before automation is considered. Without meeting these requirements, automation
will most likely cause more problems than it solves.
It is also very important to have realistic expectations and not believe everything manufacturers of
automated tools might tell you. Many tools might be very good at what they can do, but they must be
well understood before their possible benefits come true. It is almost always the case that
manufacturers downplay the effort required to make use of their tools and instead, of course,
emphasize the miraculous successes that take place using these tools. Fewster and Graham make a
good point about our human need, or wish, to believe that using new technology will solve all
our problems. Of course, it is not so.
Do not automate because you want to find additional errors that you did not find manually. Most tools
are, as we stated earlier, re-test tools, meaning they execute a test that has in fact already been run.
This means that most errors that can be found by this test have already been found. Despite this, there
are tests that might still benefit from automation in this respect, and we have already mentioned load
testing of web applications.
Automated testing is not the same as automatically creating the test scripts. In order to receive long-term
value from using tools, tests and test scripts need to be maintainable. By automatically creating the
scripts, using capture tools, one builds in a certain amount of inflexibility. Actions taken by the user
create a strict order in the script, and if the tested application is modified, the captured sequence may
no longer be valid. This often generates unacceptable maintenance costs (Cem Kaner, 1997).
The last problem we will address at this stage is related to the first consideration we mentioned in this
chapter: the need for structured and well-defined tests that are good at finding errors. This time we will
point out that the fact that a test is passed does not mean that a program is free of errors. We have
previously emphasized the importance of remembering that testing never shows the absence of errors,
only the presence of them. Automating a test that in itself is faulty means that many errors, which the
test is meant to capture, are missed. Automating a test like this means that the error is preserved into
future releases.
When the script is run, the tool goes through the script line by line and performs the actions recorded,
or written, and compares the result with the checkpoints added. The results of a test execution are
then presented in some sort of result analysis window, where the status of the test (Pass, Fail) is
shown and where possible failures or divergences have occurred.
In order to do performance tests, many manufacturers offer a load-testing tool to verify how the
application reacts to different loads and to identify bottlenecks and other weak spots in the system.
These tests, as mentioned earlier in this report, become especially important when testing web
sites, due to the uncertainty about the number of users and the user-side configuration. The tools often
offer the possibility to use scripts made with the tool type described above, but also the option to create
separate scripts for performance tests. Depending on the tool, different features are offered when
designing the test, though they all share the possibility to define the number of users to simulate.
Other options can be to define different scenarios to be run simultaneously with different user groups,
or to define different user configurations. The performance statistics can be viewed in real time as well
as after the test. The results of such tests are often presented in graphs or charts, displaying
information such as user actions, number of users, page hits or resources used over time.
Based on the above, certain factors for evaluation were established in order to compare the tools.
• Test functions or possibilities offered by the tools
• Usability of the tools
• Scripting language
• Editing of scripts
• Maintainability of scripts
• Error analysis possibilities
• Other functionalities (such as interface towards other software for enhanced testing)
The tools offer possibilities for debugging of tested applications, but in the evaluation no consideration
will be taken of how well they do it.
The two manufacturers of automated testing tools chosen for evaluation were Rational
Software and Mercury Interactive. The tools from Rational were Rational Robot 7.1 and
TestManager. Previously, Rational offered a program called LoadTest; the main functionality of that
program is now included in TestManager. We therefore aim to evaluate only the load testing
functionality of TestManager, since it offers much more. The two tools Robot and TestManager are
included in Rational Suite Enterprise and offer close integration with other Rational products such
as Rose and TestFactory. Rational's tools are designed to test client/server applications and offer
extensive web support.
The tools evaluated from Mercury Interactive were Astra QuickTest 5.0 Professional and Astra
LoadTest 4.5. These two are specifically designed for the testing of web applications. They are
lighter versions of Mercury's products WinRunner and LoadRunner, which offer the possibility to
test software other than web applications.
Common to the two manufacturers above is that they both claim that their tools create usable tests in a
rather uncomplicated way. They both encourage the use of their capture/playback
function in order to automatically create scripts by recording user actions. We will compare the tools
with each other, evaluate whether they are useful, and compare our conclusions both with authors on the
subject and with the information given by the tool manufacturers.
The main way the two tools are used is very similar. The user navigates through a site and the tool
records the actions done. After inserting checkpoints, as they are called in QuickTest, or verification
points, as they are called in Robot, the tool compares the actual result with the expected result as
defined in the check- or verification points. There are a number of different types of such points offered
by the two tools, in order to compare areas such as object properties, links and images, or content in
tables. These points may be inserted both during and after recording the script, in both Robot and
QuickTest. When inserting a check- or verification point, one defines which properties, or which data,
are to be compared in the test.
The two tools differ in how they use checkpoints and what they cover. Robot offers a wider selection
of verification points to choose from, but QuickTest offers some that Robot does not. The object
properties point is central for both tools and functions in the same way. When pointing at the object to
be tested, the tool captures the properties of that particular object, such as name, default value, state
etc. Objects can be ActiveX controls or Java objects, or items such as check boxes, edit boxes or radio
buttons. The check can be to verify that the objects appear in a specific state when the test is run.
Both tools offer the possibility to verify links and images, for both numbers and URLs, but Robot
offers the possibility to scan the entire site together with the tool SiteCheck, which displays the site in a
site map and offers more extensive analysis possibilities.
Astra QuickTest offers a handy checkpoint that uses text on a web page without it having to be
placed in a table or box. This can be very useful when pages are dynamically created and it needs
to be verified that the correct text appears when, for example, a choice in a list has been made and
the page is created based on this. Robot does not offer this possibility in such a simple way, but it is still
possible by manually writing the function.
The languages used by the two tools differ. Rational Robot uses SQABasic while Astra QuickTest
uses VBScript. SQABasic is a language based on Microsoft Basic that has been further developed by
Rational Software. If one is familiar with Basic, this language should not pose a problem to work
with. The way the scripts are presented in the two tools also differs. Rational Robot presents the script in
a more traditional code view than Astra QuickTest does. QuickTest, on the other hand, offers the
possibility to view the test in a tree view, while Robot only presents the verification points in such a
view. One major drawback of QuickTest is that the script code presented in the so-called expert view
is not as extensive as in Robot. Actions such as scrolling in a list box are not recorded in QuickTest, just
the selection made in it. One therefore has better control over the script in Robot than in QuickTest.
One feature the tools offer is the possibility of using tables in order to perform data-driven tests.
These can be used to, for example, parameterize input to, or even output from, the application. This
function may be used to test acceptance of input to a form or to test data in databases by
requesting and verifying the results. By creating tables with data presented in fields, the tools may
then read from the tables and perform repeated actions, depending on how the data in the tables is
formed and how the tool is configured to perform iterations.
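As an illustration of the data-driven idea, independent of how Robot or QuickTest implement it, the sketch below drives one hypothetical form-validation function from a small table in which each row carries both the input and the expected verdict; the function and the data are invented for the example.

    import csv, io

    # In the tools the table would live in a data sheet or datapool; here a small CSV.
    table = io.StringIO(
        "card_number,expected\n"
        "4111111111111111,accepted\n"
        "4111-1111,rejected\n"
        ",rejected\n"
    )

    def submit_card_number(value):
        # Hypothetical stand-in for the form logic under test.
        digits = value.replace("-", "")
        return "accepted" if digits.isdigit() and len(digits) == 16 else "rejected"

    for row in csv.DictReader(table):
        actual = submit_card_number(row["card_number"])
        verdict = "PASS" if actual == row["expected"] else "FAIL"
        print(verdict, repr(row["card_number"]), "->", actual)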
The two tools differ in how they utilize data tables. In Astra QuickTest the tables are created within
the tool and are presented at the bottom of the screen. There is a choice of a global or a local table. The
global table may be used by all actions in the test (in Astra QuickTest a test can be built from several
actions, where an action is a defined part or script of the test), while a local table may only be used by a
specific action. With Robot the datapools, as they are called by Rational, are created outside Robot, in
the tool TestManager, a tool for managing the test process. The creation of the tables in Robot offers
more possibilities than in QuickTest. Among them are the options to auto-fill the tables with data as well
as to determine the sequence in which the data will be retrieved (random, unique or sequential). This may
be very useful in order to test, for example, a credit card verification function, where TestManager
can auto-fill the tables with such numbers in the correct format. Among the predefined types are first
and last name, date and credit card numbers. One also has the possibility to create user-defined types
that can later be used to fill tables. Rational Robot and TestManager therefore offer a much more
extensive way to use data tables than Astra QuickTest.
During the evaluation, certain areas where the tools may be more useful than others started to take
shape. One obvious area is the possibility to parameterize input to the application. To test all possible
inputs to a form, or to test every row or column in a database, is nearly impossible to do manually. By
using tables to parameterize the input, the tool may go through combination after combination,
verifying the result with minimal manual effort. The creation of tables and datatypes for the test can
of course be demanding, but we suspect that the resources saved by running the test automatically are
often far greater.
Both tools offer the possibility to parameterize input data as described earlier. But in order to
verify results during a data-driven test, the verification point comparing the result also needs to be
data-driven. Take an example where personnel records are stored in a database and can be selected
and retrieved via a list box and displayed on a web page. If one wants to verify that the correct
record is shown for every selection, a verification- or checkpoint needs to be inserted on the page
showing the record. This verification needs to change, since a different record is shown for every
selection. QuickTest offers a useful option to parameterize the checkpoints by easily connecting them
to a data table, so that the information captured during a test is checked against a new baseline for
every selection. With Robot, the parameterization of verification points is not done in such a way.
One has to create this in a separate function, which may then be called by writing such instructions in
the script. It seems here that QuickTest has an advantage in the way these checks are made, but
one has to consider the factors mentioned earlier about building inflexibility into the tests.
When a test script, or a session of scripts, has been created, the next step is to design how the actual
load test will be run. In both tools, one defines a number of user groups that may each be
configured separately. They may for example run separate scripts, have different configurations and
differ in the number of virtual users (VUs). By applying different groups as described, different load
levels may be created on the application. The virtual users will follow their designated scripts and perform
the actions in the predefined order, and the load testing program captures transactions and gathers
information for evaluation. Different settings, such as time limits and the number of iterations to be
run, may be defined in both tools.
There is also the possibility, in both tools, to insert a form of meeting point that is used to gather the
virtual users in order to release them in a defined manner. This may be used to test a certain feature
under a specifically high load, by having the VUs perform the action simultaneously. If, for example, a
certain input form may be sensitive to high load, gathering the users and having them submit the
form at the same time may test this. These rendezvous points, as they are named in Astra LoadTest, or
synchronisation points, as they are called in TestManager, are often used with a timer function in order to
measure the performance of that specific feature during the load test. These timer functions may be
inserted wherever there is a need to measure a certain action or actions. In Astra LoadTest one may
insert a so-called transaction, which is similar to Rational TestManager's blocks. These specify certain
parts or actions in the script to be measured separately, and after the test, particular information on
these is presented for evaluation.
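The sketch below illustrates the concepts just described, namely user groups of virtual users, a rendezvous point that releases them simultaneously, and a timer around the measured transaction, using plain Python threads. It is not how Astra LoadTest or TestManager are scripted; the group sizes and the simulated transaction are invented.

    import threading, time, random

    USERS_PER_GROUP = {"browsing": 5, "ordering": 3}      # invented group sizes
    total_users = sum(USERS_PER_GROUP.values())
    rendezvous = threading.Barrier(total_users)            # gather all VUs here

    def submit_order():
        time.sleep(random.uniform(0.01, 0.05))              # stand-in for the real transaction

    def virtual_user(group, results):
        time.sleep(random.uniform(0.0, 0.1))                # each VU follows its own script first
        rendezvous.wait()                                    # ...then all hit the feature at once
        start = time.perf_counter()                          # timer around the measured feature
        submit_order()
        results.append((group, time.perf_counter() - start))

    results, threads = [], []
    for group, count in USERS_PER_GROUP.items():
        for _ in range(count):
            thread = threading.Thread(target=virtual_user, args=(group, results))
            threads.append(thread)
            thread.start()
    for thread in threads:
        thread.join()

    average = sum(duration for _, duration in results) / len(results)
    print(len(results), "virtual users, average transaction time %.1f ms" % (average * 1000))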
Functionality such as using data tables is offered by both tools. The creation and usage of those is
done in the same manner as for QuickTest and Robot, and is therefore described in the prior chapter.
In order to perform tests where verification- or checkpoints are being used, the two tools offer somewhat
different solutions. When creating tests with Astra LoadTest, one may insert checkpoints in the script
in the same manner as with QuickTest. These will then be checked against during the actual run of the
test. Rational proposes a different approach by offering the possibility to run GUI scripts (the scripts
created by Rational Robot, as described in the previous chapter) simultaneously with VU scripts in the
same test session. This is positive since one may use previously created scripts in a new session.
The scripting languages used by the two tools differ. The creation of VU scripts with Rational Robot is
done in the so-called VU scripting language. This is a language similar to C and shares much of its
syntax rules and library functions. According to Rational, if one is familiar with the C language, the
VU language should not pose a problem. Astra LoadTest uses VBScript, just as QuickTest, and
therefore has an advantage over Rational's products, which use different languages for GUI and VU
scripts. It is probably the case, though, that a specialised language offers more possibilities for creating
test cases.
When performing tests involving large numbers of virtual users, these may have to be spread between
several machines producing the load. This is possible with both tools, and it functions in the same way:
one computer acts as the master and directs the tests, while the others act as hosts for the VUs.
The most important part of these tools is the result analysis during or at the end of a test. Information
on the performance is given in the form of graphs or reports, describing requests per time unit, sent or
received bytes per time unit, resources used etc. By viewing graphs, unsatisfactory performance is
identified and can then be more thoroughly examined by viewing reports, where detailed information
on, for example, response times for individual VUs or actions is displayed.
Fig 5.3 Results showing average transaction response time in Astra LoadTest
The tools are very similar in the way they function. The main features of a performance testing tool
are present in both, though differences exist. As with the earlier described tools, Rational offers
a more complex product with advanced features and possibilities not available in the Astra tools. For
example, Rational makes it possible to set recording options depending on the server setup (for
example if a proxy server is used) or if a specific database is used. Rational also makes it possible to
include or exclude certain protocols when recording, for example HTTP, IIOP or pure Oracle
requests. However, the time it takes to use the tools from Rational efficiently is extensive. With Astra
LoadTest one may rather quickly set up a useful load scenario.
5.4.3 Conclusion
When running mainly the functional testing tools, we found that they were somewhat inconsistent in
how they captured different objects or actions, depending on the applications tested. In some tests
they were unable to catch the content of a list box, and the next time it appeared as it should. The
tools also showed some instability when running some scripts, showing differences in result after two
seemingly identical runs. Why this is so is hard to say. It can depend on several things: fluctuations in
connection speed, performance of client and server, bugs in the tool or the technology used in the
tested application. Regardless of where the problems reside, one often spends more time searching for
bugs in the tool and editing scripts than concentrating on the application being tested, due to the
instability of the tool.
These problems occurred mainly when tests were run on applications not undergoing any changes
between the tests. Some tests were done with changes to the user interface, and the tools showed
obvious problems when too large changes were made. In a software development environment,
where new releases may have a totally different user interface and a changed sequence in which
actions are to be taken, we suspect that using the tools with capture/playback will end in
heavy maintenance costs, as discussed earlier. Still, we consider the feature to be of use by supplying a
foundation for a test or just by giving useful hints on how to write the scripts manually. For the
performance testing tools, however, the use is more obvious, since these types of tests are
difficult to do without tools.
Whether one tool outperforms the other, or whether tools are of interest at all, is difficult to say. Rational,
for example, offers extensive functionality when creating datapools, but it lacks the possibility to
parameterize checkpoints in as simple a way as Mercury's Astra QuickTest and Astra LoadTest. We
believe that whether or not tools are of use depends heavily on the size and type of project. QuickTest
and LoadTest are easy to get started with and probably offer enough functionality for smaller projects
without an extensive need for managing different releases, or with applications that do not undergo
major changes during their development. Rational, on the other hand, offers not just a single testing tool
but a suite of tools intended to enhance the software development process. The possibility to manage
requirements and to keep track of releases and test results is offered. We therefore consider the
tools from Rational to be more useful in larger projects where there are demands for such
functionality. In order to make a fairer evaluation, however, one should bear in mind that Mercury
offers a set of tools called WinRunner, LoadRunner and TestDirector, as mentioned earlier in the
evaluation. In a comparison with Rational's products, these tools probably offer more of a match.
Though both manufacturers claim in their manuals that effective tests are easily created using
capture/playback, we believe that this is often not the case. When talking to representatives from
Rational, even they state that automated tools seldom automate the tests and that capture/playback is
generally not the way the tools are used.
The conclusions arrived at during this evaluation follow below. Many of these conclusions are supported
by experienced testers and authors on the subject, as well as by manufacturers of automated tools.
• Since web applications obviously rely on a graphical interface, and the tools depend on it, the
GUI should have reached a rather high level of stability before these tools are used
• An obvious use of the tools is when performing data-driven tests. By parameterizing input to
the application, the test effort may be greatly reduced. Because of the resources saved, these
tests are of interest even if the GUI is not finished
• Straight replay of recorded tests results in a low rate of bug detection. This further implies the
need for regression testing or data-driven tests when using these tools
• Using the tools often focuses testers on weaknesses in the tool rather than in the tested application
• The tools are not easily learned. In order to take full advantage of them to create useful and
maintainable test cases, experience and education on the tools is needed, as well as good
programming skills. The architecture of the application tested should also be well understood
• Testing of links throughout a site can be demanding to do manually but is easily done by a
tool. These tools verify every link and display information on broken links and orphan pages
• Performance testing tools are of obvious use, since performing performance tests manually is
extremely difficult
User interface
1. Instructions
2. Sitemap/nav.bar
3. Content
4. Colour/background
5. Images
6. Tables
7. Wrap around
Functionality
1. Links
2. Forms
3. Data verification
4. Cookies
5. Application specific functional requirements
Interface testing
1. Server interface
2. External interface
3. Error handling
Compatibility
1. Operating systems
2. Browsers
3. Video settings
4. Modem/connection speed
5. Printers
6. Combinations
Load/stress
1. Many users at the same time
2. Large amount of data from each user
3. Long period of continuous use
Security
2. Directory setup
3. SSL
4. Logins
5. Logfiles
6. Scripting languages
User interface
1. Structural
1. Navigation
2. Graphics
3. Formatting
4. Content
5. General Appearance
2. Functionality
1. Programming Language
2. Linkage testing
3. Forms
4. Cookies
5. Application-specific Transactions
Host interface
1. Server interface
2. External interface
3. Error handling
Compatibility issues
1. Operating systems
2. Browsers
3. Connection
4. Printers
5. Multimedia devices
Load/Stress
1. User traffic
2. Data volume
3. Continuous usage
Security & Encryption
1. Logins
2. SSL
3. Directory setup
4. Log Files
5. Hacker Scripts
Functional testing
1. Unit testing
2. Integration testing
3. Browser testing
4. Configuration testing
5. Delivery testing
Content testing
1. Spelling, grammar
2. Accuracy, copyright - Liability issues
3. Images
Security testing
User test
1. Usability testing
2. Beta testing
Below is a presentation of the main areas to test when developing and publishing a web site. It is a
checklist that presents the most important features to test under each area and how to perform them.
Functionality testing
1. Links
Links are perhaps the main feature of web sites. They constitute the means of transport between
pages and guide the user to certain addresses without the user knowing the actual address
itself. Linkage testing is divided into three sub-areas. First, check that the link takes you to
the page it said it would. Second, check that the link isn't broken, i.e. that the page you're linking
to exists. Third, ensure that you have no orphan pages on your site. An orphan page is a page
that has no links to it, and may therefore only be reached if you know the correct URL.
To reduce redundant testing, a link to a specific page that appears on several pages only needs to
be tested once.
This kind of test can preferably be automated, and several tools provide solutions for this (a small
sketch of such a check follows the summary below). Link testing should be done during integration
testing, when connections between pages exist.
Resources: Rational SiteCheck (http://www.rational.com/)
http://www.netmechanic.com/
http://home.snafu.de/tilman/xenulink.html
http://www.cyberspyder.com/cslnkts1.html
Summary:
• Verify that you end up at the designated page
• Verify that the link isn’t broken
• Locate orphan pages if present
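A minimal sketch of what an automated link check does, written with the Python standard library rather than any of the tools listed above; the start URL and the list of known pages are placeholders. Whether a link leads to the page it said it would still needs a content check or a human eye.

    import urllib.request, urllib.error
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links.extend(value for name, value in attrs if name == "href" and value)

    def check_links(start_url, known_pages):
        to_visit, visited, linked_to, broken = [start_url], set(), set(), []
        site = urlparse(start_url).netloc
        while to_visit:
            url = to_visit.pop()
            if url in visited:
                continue
            visited.add(url)
            try:
                html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
            except (urllib.error.URLError, ValueError) as error:
                broken.append((url, error))                 # page could not be fetched
                continue
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.links:
                target = urljoin(url, href)
                linked_to.add(target)
                if urlparse(target).netloc == site:         # only crawl within the site
                    to_visit.append(target)
        orphans = set(known_pages) - linked_to               # pages nothing links to
        return broken, orphans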
2. Forms
Forms are used to submit information from the user to the host, where it is processed
and acted upon in some way. The integrity of the submit operation should be tested
in order to verify that the information hits the server in the correct form. If default values are
used, verify the correctness of the values. If the forms are designed to accept only certain
values, this should also be tested; for example, if only certain characters should be
accepted, try to override this when testing. These controls can be made on the client side
as well as on the server side, depending on how the application is designed, for example using
scripting languages such as JScript, JavaScript or VBScript. Check that invalid inputs are
detected and handled (a small sketch follows the summary below).
Summary:
• Information hits the server in correct form
• Acceptance of invalid input
• Handling of wrong input (both client and server side)
• Optional versus mandatory fields
• Input longer than field allows
• Radio buttons
• Default values
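The sketch below shows one way such form checks could be driven automatically; the endpoint, field names and expected status codes are purely hypothetical and would have to match the actual application.

    import urllib.request, urllib.parse, urllib.error

    FORM_URL = "http://example.test/order-form"            # hypothetical endpoint

    CASES = [
        ({"name": "Anna", "age": "34"}, 200),                # valid input should be accepted
        ({"name": "Anna", "age": "abc"}, 400),               # invalid value should be rejected
        ({"name": "", "age": "34"}, 400),                    # mandatory field left empty
        ({"name": "A" * 5000, "age": "34"}, 400),            # input longer than the field allows
    ]

    def submit(fields):
        data = urllib.parse.urlencode(fields).encode()
        try:
            return urllib.request.urlopen(FORM_URL, data=data, timeout=10).getcode()
        except urllib.error.HTTPError as error:
            return error.code

    for fields, expected in CASES:
        status = submit(fields)
        print("PASS" if status == expected else "FAIL (got %s)" % status, fields)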
3. Cookies
Cookies are often used to store information about the user and his actions on a particular site.
When a user accesses a site that uses cookies, the web server sends information about the user
and stores it on the client computer in the form of a cookie. Cookies can be used to create more
dynamic and custom-made pages or to store, for example, login information. If you have designed
your site to use cookies, they need to be checked. Verify that the information that is to be
retrieved is there. If login information is stored in cookies, check that it is correctly encrypted.
If your application requires cookies, how does it respond to users who have disabled them?
Does it still function, or will the user be notified of the situation? How will
temporary cookies be handled? What will happen when cookies expire? Depending on what
cookies are used for, one should examine the possibilities for other solutions.
Summary:
• Encryption of e.g. login info
• Users denying or accepting
• Temporary and expired cookies
4. Web Indexing
There are a number of different techniques and algorithms used by different search engines to
search the Internet. Depending on how the site is designed using Meta tags, frames, HTML
syntax, dynamically created pages, passwords or different languages, your site will be
searchable in different ways.
Summary:
• Meta tags
• Frames
• HTML syntax
• Passwords
• Dynamically created pages
5. Programming Language
Differences in web programming language versions or specifications can cause serious
problems on both the client and the server side. For example, which HTML specification will be used
(e.g. 3.2 or 4.0)? How strictly? When HTML is generated dynamically it is important
to know how it is generated.
When development is done in a distributed environment where developers, for instance, are
geographically separated, this area becomes increasingly important. Make sure that
specifications are well spread throughout the development organization to avoid future
problems.
Apart from HTML, specifications for e.g. Java, JavaScript, ActiveX, VBScript or Perl
need to be verified.
There are several tools on the market for validating different programming languages. For
languages that need compiling, e.g. C++, this kind of check is often done by the compiler.
Since this kind of testing is done by static analysis tools and needs no actual running
of the code, these tests can be done as early as possible in the development process.
Language validation tools can be found in compilers, online as well as for download, free
or paid.
Resources: http://arealvalidator.com/
http://www.delorie.com/web/purify.html
Summary:
• Language specifications
• Language syntax (HTML, C++, Java, Scripting languages, SQL etc.)
6. Dynamic Interface Components
Summary:
• Do client side components (applets, ActiveX controls, JavaScript, CSS etc.) function
as intended (i.e. do the components perform the right tasks in a correct way)
• User disabling features (Java-applets, ActiveX, scripts etc.)
• Do server side components (ASP, Java-Servlets, server-side scripting etc.) function as
intended (i.e. do the components perform the right tasks in a correct way)
7. Databases
Databases play an important role in web application technology, housing the content that the
web application manages, running queries and fulfilling user requests for data storage. The
most commonly used type of database in web applications is the relational database, and it is
managed using SQL to write, retrieve and edit information. In general, there are two types
of errors that may occur: data integrity errors and output errors. Data integrity errors refer to
missing or wrong data in tables, and output errors are errors in writing, editing or reading
operations on the tables. The issue is to test the functionality of the database, not the content,
and the focus here is therefore on output errors. Verify that queries and writing, retrieving or
editing in the database are performed in a correct way (a small sketch follows the list of issues below).
Resources: Rational Robot (http://www.rational.com/)
Astra QuickTest (http://www.merc-int.com/)
Issues to test are:
• Creation of tables
• Indexing of data
• Writing and editing in tables (for example valid numbers or characters, input longer
than field etc.)
• Reading from tables
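As a small illustration of checking for output errors (write, edit and read back, then verify), the sketch below uses an in-memory SQLite database as a stand-in for whatever relational database the application actually uses; the table and the data are invented.

    import sqlite3

    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, card TEXT)")

    # Writing: insert a known record.
    connection.execute("INSERT INTO customers VALUES (?, ?, ?)",
                       (1, "Anna Svensson", "4111111111111111"))

    # Editing: update a field.
    connection.execute("UPDATE customers SET name = ? WHERE id = ?", ("Anna Larsson", 1))

    # Reading: fetch the record back and verify every field.
    row = connection.execute("SELECT id, name, card FROM customers WHERE id = ?", (1,)).fetchone()
    assert row == (1, "Anna Larsson", "4111111111111111"), "output error: %r" % (row,)
    print("write/edit/read check passed:", row)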
Usability
1. Navigation
Navigation describes the way users navigate within a page, between different user interface
controls (buttons, boxes, lists, windows etc.), or between pages via e.g. links. To determine
whether or not your page is easy to navigate through, consider the following. Is the
application's navigation intuitive? Are the main features of the site accessible from the main
page? Does the site need a site map, search engine or other navigational help? Be careful,
though, not to overdo your site. Too much information often has the opposite effect to
what was intended. Users of the web tend to be very goal-driven and scan a site very
quickly to see if it meets their expectations. If not, they quickly move on. They rarely take the
time to learn about the site's structure, and it is therefore important to keep the navigational
help as concise as possible.
Another important aspect of navigation is whether the site is consistent in its conventions regarding
page layout, navigation bars, menus, links etc. Make sure that users intuitively know that they
are still within the site by keeping the page design uniform throughout the site.
As soon as the hierarchy of the site is determined, testing of how users navigate can
commence. Have real users try to navigate using ordinary paper mock-ups describing how the
layout is done.
Summary:
• Intuitive navigation
• Main features accessible from main page
• Site map or other navigational help
• Consistent conventions (navigation bars, menus, links etc.)
2. Graphics
The graphics of a web site include images, animations, borders, colours, movie clips, fonts,
backgrounds, buttons etc. Issues to check are:
• Make sure that the graphics serve a definite purpose and that images or animations
don’t just clutter up the visual design and waste bandwidth
• Verify that fonts are consistent in style
• Suitable background colours combined with font and foreground colours. Remember
that a computer display presents contrasts exceptionally well compared to printed paper
• Three-dimensional effects on buttons often give useful cues
• When displaying large amounts of images, consider using thumbnails. Check that the
original picture appears when a thumbnail is clicked
• Size and quality of pictures, usage of compressed formats (JPG or GIF)
• Mouse-over effects
3. Content
Content testing is done to verify the correctness, accuracy and relevancy of the information
presented on the site, or in a database, in the form of text, images or animations.
Correctness is whether the information is truthful or contains misinformation. For example,
wrong prices in a price list may cause financial problems or even induce legal issues.
The accuracy of the information is whether it is free from grammatical or spelling errors. These
kinds of verifications are often done in e.g. Word or other word processors.
Remove irrelevant information from your site. This may otherwise cause misunderstandings
or confusion.
Content testing should be done as early as possible, i.e. when the information is posted.
Summary:
• Correctness
• Accuracy
• Relevancy
4. General Appearance
Does the site feel right when using it? Do you intuitively know where to look for
information? Is the design consistent throughout the site? Make sure that the design and the aim
go hand in hand. Too much design can easily turn a conservative corporate site into a
publicity stunt. Important to all kinds of usability tests is to involve external personnel that
have little or no connection to the development of the site. It is easy to get fond of one's own
solution, so having actual users evaluate the site may be critical.
Summary:
• Intuitive design
• Consistent design
• If using frames, make sure that the main area is large enough
• Consider size of pages. Several screens on the same page or links between them
• Do features on the site need help systems or will they be intuitive
2. External Interface
Several web sites have external interfaces, such as merchants verifying credit card numbers
to allow transactions to be made, or a site like http://www.pris.nu/ that compares prices and
delivery times of different merchants on the web. Verify that data sent to and retrieved from the
external interface is in the correct form.
2. Browsers
The browser is the most central component on the client side of the web. Browsers come in
different brands and versions and have different support for Java, JavaScript, ActiveX,
plugins and different HTML specifications. ActiveX, for example, is a Microsoft product and
therefore designed for Internet Explorer, while JavaScript was produced by Netscape and Java
by Sun. This substantiates the fact that compatibility problems commonly occur. Frames and
cascading style sheets may display differently in different browsers, or not at all. Different
browsers also have different settings for e.g. security or Java support.
A good way to test browser compatibility is to create a compatibility matrix where different
brands and versions of browsers are tested against a certain number of components and settings,
for example applets, scripting, ActiveX controls or cookies (a small sketch follows the summary below).
Summary:
• Internet Explorer (3.X 4.X, 5.X)
• Netscape Navigator (3.X, 4.X, 6.X)
• AOL
• Browser settings (security settings, graphics, Java etc.)
• Frames and Cascade Style sheets
• Applets, ActiveX controls, DHTML, client side scripting
• HTML specifications
• Graphics
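A compatibility matrix can be kept as simply as in the sketch below, where every cell records the outcome of testing one component in one browser version; the browsers, components and results shown are only examples and would follow from the actual test runs.

    # Browsers and components roughly mirroring the summary above; results are invented.
    browsers = ["IE 4.x", "IE 5.x", "Netscape 4.x", "Netscape 6.x"]
    components = ["Applets", "JavaScript", "ActiveX", "Cookies", "CSS"]

    results = {(browser, component): "not run" for browser in browsers for component in components}
    results[("Netscape 4.x", "ActiveX")] = "fail"          # example entry
    results[("IE 5.x", "CSS")] = "pass"                    # example entry

    print("component".ljust(12) + "".join(browser.ljust(14) for browser in browsers))
    for component in components:
        row = "".join(results[(browser, component)].ljust(14) for browser in browsers)
        print(component.ljust(12) + row)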
3. Settings, Preferences
Depending on the settings and preferences of the client machine, web applications may behave
differently. Try varying the following:
• Screen resolution (check that text and graphics alignment still works, fonts are readable
etc.)
• Colour depth (256, 16-bit, 32-bit)
4. Printing
Despite the paperless society the web was to introduce, printing is done more than ever.
Verify that pages are printable, with consideration to:
• Text and image alignment
• Colours of text, foreground and background
• Scalability to fit paper size
• Tables and borders
Performance
1. Connection speed
Users may differ greatly in connection speed. They may be on a 28.8 modem or on a T3
connection. Users expect longer download times when retrieving demos or programs, but not
when requesting a homepage. If the transaction response time is too long, users will leave the
site. Other issues to consider are time-outs on pages that require logins. If the load time is too long,
users may be thrown out due to a time-out. Database problems may occur if the connection
speed is too low, causing data loss.
Summary:
• Connection speed: 14.4, 28.8, 33.6, 56.6, ISDN, cable, DSL, T1, T3
• Time-out
2. Load
What is the estimated number of users per time period and how will it be distributed over the
period? Will there be peak loads and how will the system react? Can your site handle a large
number of users requesting a certain page? Load testing is done to measure the performance at
a given load level, to assure that the site works within the requirements for performance. The load
level may be a certain number of users using your site at the same time, or large amounts of
data transactions from users, such as online ordering.
Resources: Rational TestManager (http://www.rational.com/)
Astra LoadTest (http://www.merc-int.com/)
Summary:
• Many users requesting a certain page at the same time or using the site simultaneously
• Large amount of data from users
3. Stress
Stress testing is done in order to actually break a site or a certain feature, to determine how the
system reacts. Stress tests are designed to push and test system limitations and determine
whether the system recovers gracefully from crashes. Hackers often stress systems by
providing loads of wrong in-data until the system crashes, and then gain access to it during start-up.
Typical areas to test are forms, logins or other information transaction components.
Resources: Rational TestManager (http://www.rational.com/)
Astra LoadTest (http://www.merc-int.com/)
Summary:
• Performance of memory, CPU, file handling etc.
• Error in software, hardware, memory errors (leakage, overwrite or pointers)
4. Continuous use
Is the application, or are certain features, going to be used only during certain periods of time, or
will it be used continuously 24 hours a day, 7 days a week? Test that the application is able to
perform under those conditions. Will downtime be allowed, or is that out of the question?
Verify that the application is able to meet the requirements and does not run out of memory or
disk space.
Security
Security is an area of immense extent and would need extensive writing to be fairly covered. We
will do no more than point out the most central elements to test. First, make sure that you have a
correct directory setup. You don't want users to be able to browse through directories on your
server.
Logins are very common on today's web sites, and they must be error free. Make sure to test both
valid and invalid login names and passwords. Are they case sensitive? Is there a limit to how
many tries are allowed? Can the login be bypassed by typing the URL of an inner page directly in the
browser? (A small sketch of such checks follows the summary below.)
Is there a time-out limit on your site? What happens when it is exceeded? Are users still able
to navigate through the site?
Logfiles are very important in order to maintain security at the site. Verify that relevant
information is written to the logfiles and that the information is traceable.
When secure socket layers are used, verify that the encryption is done correctly and check the
integrity of the information.
Scripts on the server often constitute security holes and are often exploited by hackers. Test that it
isn't possible to plant or edit scripts on the server without authorisation.
Summary:
• Directory setup
• Logins
• Time-out
• Logfiles
• SSL
• Scripting Languages
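The login checks mentioned above could be scripted along the lines of the sketch below; the URLs, credentials and the assumption that unauthorised requests answer with 401 or 403 are all hypothetical and must be adapted to the actual site.

    import urllib.request, urllib.parse, urllib.error

    SITE = "http://example.test"                           # hypothetical site

    def status_of(url, fields=None):
        data = urllib.parse.urlencode(fields).encode() if fields else None
        try:
            return urllib.request.urlopen(url, data=data, timeout=10).getcode()
        except urllib.error.HTTPError as error:
            return error.code

    checks = [
        ("valid login accepted",
         status_of(SITE + "/login", {"user": "anna", "pw": "secret"}) == 200),
        ("wrong password rejected",
         status_of(SITE + "/login", {"user": "anna", "pw": "wrong"}) in (401, 403)),
        ("password is case sensitive",
         status_of(SITE + "/login", {"user": "anna", "pw": "SECRET"}) in (401, 403)),
        ("inner page cannot be reached without login",
         status_of(SITE + "/admin/report") in (401, 403)),
    ]
    for name, ok in checks:
        print("PASS" if ok else "FAIL", "-", name)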
Test area                        Complexity (0-3)   Aim (1-3)   Testing Need
Functionality
  Links
  Forms
  Cookies
  Web Indexing
  Programming Language
  Dynamic Interface Components
  Databases
Usability
  Navigation
  Graphics
  Content
  General Appearance
Performance
  Connection speed
  Load
  Stress
  Continuous use
Security
  General Security
User Manual
What it does
This tool is designed to help prioritize testing efforts. It will help distinguish the most important areas
of your application.
How it does it
Area-specific questions are answered with numerical values that are multiplied with each other,
creating a Testing Need score. The value is compared to other values established in the same way.
The higher the value, the higher the need for testing.
Step-by-Step
Before starting to use the Test Priority Sheet you need to establish certain factors. You need to be
familiar with the following:
Complexity:
  0 = Not Present
  1 = Low
  2 = Medium
  3 = High

Aim:
  1 = Not Important (Low Impact of Failure)
  2 = Medium Importance (Medium Impact of Failure)
  3 = High Importance (High Impact of Failure)
There are two questions presented under each test area below. The first is to be answered under
Complexity and the second under Aim.
Multiply the values on the same row with each other and put the product under Testing Need. Do this
for all rows.
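The calculation itself is simple multiplication, but a short script makes the procedure concrete.
The sketch below uses made-up Complexity and Aim scores for a few areas, computes the Testing
Need for each, and sorts the areas, breaking ties on the higher Aim value as recommended under
Result below.

    # Test Priority Sheet sketch (Python): Testing Need = Complexity (0-3) x Aim (1-3).
    # The scores are made-up example values.
    scores = {
        "Links":                        (3, 1),
        "Forms":                        (2, 3),
        "Dynamic Interface Components": (3, 3),
        "Navigation":                   (3, 3),
        "General Security":             (1, 3),
    }

    testing_need = {area: complexity * aim for area, (complexity, aim) in scores.items()}

    # Highest Testing Need first; ties broken by the higher Aim value.
    for area in sorted(testing_need, key=lambda a: (testing_need[a], scores[a][1]), reverse=True):
        complexity, aim = scores[area]
        print(f"{area:30s} {complexity} x {aim} = {testing_need[area]}")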
Result
Compare the values in the Testing Need column with each other. The higher the value, the higher the
testing need. If several rows have the same value, higher priority should often be given to the
area with the higher Aim value.
The questions
Functionality
Links:
• Are links present and to what extent?
• Are links critical and how sensitive is the target group to bad links?
Forms:
• Are forms present? To what extent and how advanced?
• How critical are forms for the site and how sensitive are users to problems with forms?
Cookies:
• Are cookies used and to what extent?
• How critical is the use of cookies?
Web Indexing:
• How advanced and extensive is the use of techniques affecting search engine indexing?
• How severe will the effects of problems with web indexing be?
Programming Language:
• To what extent are different programming languages used?
• How critical are the effects that problems with programming languages can have?
Dynamic Interface Components:
• How extensive and how advanced is the use of different technologies?
• How critical are the functions of these components, will users notice failures of them, and
how will they react?
Databases:
• How complex is the database architecture?
• How critical are databases to the aim of the application and how will users respond to failure?
Usability
Navigation:
• How complex is the structure of the site?
• How severe will the effects of navigation difficulties be?
Graphics:
• How extensive is the use of different graphical elements?
• How critical is the use of graphics and how sensitive will users be to problems with graphics?
Content:
• To what extent does the site contain complex and sensitive information?
• How severe will the effects of errors in the content be?
General Appearance:
• How advanced is the site layout?
• How sensitive are users to poor appearance?
Performance
Connection Speed:
• To what extent are there features on the site sensitive to connection speed?
• To what extent will users differ in connection speed and how sensitive will they be to bad
performance?
Load:
• Are there features on the site that are sensitive to high loads?
• Will the number of users reach high load levels?
Stress:
• Are there features on the site that are sensitive to stress situations?
• How likely is it that stress situations will occur, and how severe are the effects they may cause?
Continuous use:
• Are there features on the site that are sensitive to continuous use?
• Will the site or feature be used continuously, and what will the effects of problems with the
feature be?
Security
General security:
• How advanced is the security?
• How critical is the security of the site?
                                 Complexity   Aim     Testing
                                 (0-3)        (1-3)   Need
Functionality
  Links                              3          1        3
  Forms                              2          3        6
  Cookies                            0          -        0
  Indexing
  Programming Language               1          2        2
  Dynamic Interface Components       3          3        9
Usability
  Navigation                         3          3        9
  Graphics                           3          1        3
  Content                            3          3        9
  General Appearance                 3          1        3
Performance
  Connection speed                   3          3        9
  Load                               2          3        6
  Stress                             2          3        6
  Continuous use                     3          3        9
Security
  General Security                   1          3        3
                                 Complexity   Aim     Testing   Comments
                                 (0-3)        (1-3)   Need
Functionality
  Links                              1          3        3      Functions, admin. docs.
  Forms                              2          3        6
  Cookies                            0          1
  Indexing                           3          3        9      Database searches
  Programming Language               2          3        6
  Dynamic Interface Components       2          3        6
Usability
  Navigation                         3          3        9
  Graphics                           1          3        3
  Content                            2          3        6
  General Appearance
Performance
  Connection speed                   1          3        3
  Load                               2          3        6
  Stress
  Continuous use                     3          2        6
Security
  General Security                   1          2        2
                                 Complexity   Aim     Testing
                                 (0-3)        (1-3)   Need
Functionality
  Links                              1          3        3
  Forms                              3          3        9
  Cookies                            0                   0
  Indexing                           0                   0
  Programming Language               3          1        3
  Dynamic Interface Components       3          3        9
Usability
  Navigation                         2          3        6
  Graphics                           1          2        2
  Content                            2          3        6
  General Appearance                 1          2        2
Performance
  Connection speed                   3          3        9
  Load                               2          2        4
  Stress                             3          1        3
  Continuous use                     3          3        9
Security
  General Security                   2          3        6
References:
Internet:
Nguyen, Hung Q.; Testing Web-Based Applications, Analyzing and reproducing errors in web
environment; 2000; http://www.stickyminds.com/
Soberano, Vincent; White Paper on Web Testing, Meeting the Unique Challenges of Testing Web
Applications; 1998; http://members.spree.com/oceansurfer/webtesting.htm
Books:
Fewster, Mark and Graham, Dorothy; Software Test Automation, Effective use of test execution tools;
1999; Addison-Wesley, Harlow; ISBN 0-201-33140-3
Hetzel, Bill; The Complete Guide to Software Testing, 2nd edition; 1988; Wiley-QED, New York;
ISBN 0-471-56567-9
Nguyen, Hung Q.; Testing Applications on the Web, Test Planning for Internet-Based Systems; 2001;
Wiley, New York; ISBN 0-471-39470-X
Powell, Thomas A.; Web Site Engineering, Beyond Web Page Design; 1998; Prentice Hall;
ISBN 0-13-650920-7
Compendium:
Schaefer, Hans; Testning, company-internal course, Sigma Systems AB, 4-6 September 2000; ITQ
Paper:
Kaner, Cem; Improving the Maintainability of Automated Test Suites; 1997; presented at Quality
Week 1997
Other:
Nielsen, Peter; from a Sigma nBit AB internal course in Test & Verification, November 2000
Hower, Rick; Beyond Broken Links; Internet Systems, July 1997;
http://www.dbmsmag.com/9707i03.html
Powers, Mike; Why Test the Web? How Much Should You Test?; January 2000;
http://www.data-dimensions.com/testersnet/wince.htm
Mosley, Daniel J.; Client-Server Software Testing on the Desktop and the Web; 2000; Prentice Hall
PTR, Upper Saddle River; ISBN 0-13-183880-6
Perry, William E.; Effective Methods for Software Testing, 2nd edition; 2000; Wiley, New York;
ISBN 0-471-35418-X
Earls, Alan; True Test of the Web; InformationWeek, 25 January 1999, issue 718, p. 1A
Useful resources:
http://www.stickyminds.com
http://www.io.com/~wazmo/qa.html
http://www.mtsu.edu/~storm
http://www.pcwebopaedia.com
http://www.softwareqatest.com
http://www.data-dimensions.com
http://www.rational.com
http://www.merc-int.com
http://www.compuware.com
http://www.microsoft.com/enable
http://www.io.com/~wazmo