Testing The Suitability of Markov Chains As Web Usage Models
the official web site for the School of Engineering and Applied Science at Southern Methodist University. The results
provided us with empirical evidence that Markov chains can
provide fairly accurate models of web usage.
1. Introduction
With the prevalence of the world wide web and people's
reliance on it in society today, ensuring satisfactory performance and reliability of web servers and web sites is becoming increasingly important. Because of the user focus and
the large size of the web, exhaustive testing suitable for individual web pages or small web sites, such as various link
checkers, or traditional coverage-based testing techniques,
need to be modified or used selectively to remain practical or feasible. A good candidate for effective, large-scale
web testing and quality assurance is statistical testing and
related reliability analysis [5, 10]. Because of the close resemblance between web applications and the state transition
mechanism in Markov chains [3], statistical testing based on
actual usage patterns and frequencies captured in Markov
chains is a natural choice for web testing and quality assurance [2].
To test the suitability of Markov chains for modeling web
usage, we gathered information about web link usage frequencies from web logs. In particular, we propose to use
a small set of tests to check the conformance of these actual usage frequencies to the so-called memoryless property
that all Markov chains satisfy. We applied this approach to
Proceedings of the 27th Annual International Computer Software and Applications Conference (COMPSAC03)
0730-3157/03 $ 17.00 2003 IEEE
a system component). The states are interconnected to reflect the functional or structural interconnections in the system, and the state transition probabilities tell us how likely such transitions are to take
place. An operational sequence consisting of visits to multiple states can be constructed by following the state transitions. The likelihood of a particular sequence
can be calculated as the product of its individual state transition probabilities. Therefore, Markov chains
can be used to ensure performance and reliability based on
the usage scenarios and frequencies of target customers.
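The product rule for sequence likelihood can be illustrated with a minimal sketch. The state names below are borrowed from the example chain, but the transition probabilities and the helper `sequence_probability` are hypothetical, chosen only for illustration rather than read off the figure.

```python
from math import prod

# Hypothetical transition probabilities, chosen only for illustration;
# they are not taken from the paper's figure.
transitions = {
    ("CSE home", "Courses"): 0.3,
    ("Courses", "Programs"): 0.4,
    ("Programs", "Exit"): 0.2,
}


def sequence_probability(states, transitions):
    """Multiply the transition probabilities along consecutive state pairs."""
    return prod(transitions[(a, b)] for a, b in zip(states, states[1:]))


path = ["CSE home", "Courses", "Programs", "Exit"]
p = sequence_probability(path, transitions)
print(f"{p:.3f}")
```

A transition missing from the dictionary raises a `KeyError` in this sketch; treating it as probability zero instead would be an equally reasonable design choice.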
[Figure: example Markov chain usage model with states CSE home, Other info, Courses, Programs, and Exit; edges are labeled with transition probabilities.]
Rank  Page                          #hits
1     /index.html                   19418
2     /justin/inline.html            5140
3     /justin/inline m.rail.html     4733
4     /ce/index.html                 4013
5     /co/cams/index.html            3472
6     /ce/seas/index.html            2890
7     /co/cams/index.html            2571
8     /ce/smu/index.html             2422
9     /cse/index.html                2257
10    /netech/index.html             2230
4. A Case Study
The above approach for validating the Markovian property for web applications was applied in a case study.
We used web logs covering 26 days of data from the
School of Engineering and Applied Science at Southern Methodist University (SMU/SEAS) web site at
www.seas.smu.edu.
Rank  Page                                #hits  pij
1     /cse/index.html                      1258  0.165
2     /ce/seas/index.html                  1145  0.150
3     /ee/index.html                       1140  0.150
4     /disted/index.html                    547  0.072
5     /gradadmissions/index.html            433  0.057
6     /gradinfo.html                        314  0.041
7     /ecommerce/index.html                 305  0.040
8     /me/index.html                        281  0.037
9     /infodata.html                        239  0.031
10    /ugradinfo.html                       221  0.029
11    /textonly.html                        188  0.025
12    /hear.htm                             170  0.022
13    /students.html                        159  0.021
14    /disted/ship/index.html               150  0.020
15    /building.html                        137  0.018
16    /env/index.html                       132  0.017
17    /co/index.html                        118  0.016
18    /seasnews.html                        116  0.015
19    /contactseas.html                     115  0.015
20    /recruit/index.html                   109  0.014
      15 outlinks with ref counts < 100     334  0.044
      total over all 35 outlinks           7611  1
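The pij column is consistent with dividing each outlink's hit count by the total over all 35 outlinks. A minimal sketch of that calculation for the first five rows follows; the variable names are illustrative, and the rounding to three decimals is an assumption made to match the table's precision.

```python
# Hit counts for the first five outlinks, copied from the table above.
hits = {
    "/cse/index.html": 1258,
    "/ce/seas/index.html": 1145,
    "/ee/index.html": 1140,
    "/disted/index.html": 547,
    "/gradadmissions/index.html": 433,
}
total_hits = 7611  # total over all 35 outlinks, as reported in the table

# Estimate each transition probability as a relative frequency.
p = {page: round(count / total_hits, 3) for page, count in hits.items()}

for page, prob in p.items():
    print(f"{page:30s} {prob:.3f}")
```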
Rank  Page                                               #hits
1     www.smu.edu/academics.html                          1912
2     www.smu.edu/graduate/                                146
3     www.smu.edu/sitemap.html                             105
4     www.smu.edu/admissions/academics/engineering.html    100
5     search.yahoo.com/bin/search                           49
6     search.msn.com/results.asp                            44
7     www.goto.com/d/search/p/netscape/                     43
8     netnd2.aol.com/results.adp                            38
9     voled.doded.mil/dantes/dl/extdeg.htm                  28
10    ink.yahoo.com/bin/query                               22
                                 inlink
outlink (/ = www.seas.smu.edu)   all (pij)  empty  /academics.html
/cse/index.html                  0.165      0.159  0.256
/ce/seas/index.html              0.150      0.168  0.073
/ee/index.html                   0.150      0.153  0.213
/disted/index.html               0.072      0.069  0.049
/gradadmissions/index.html       0.057      0.057  0.055
/gradinfo.html                   0.041      0.040  0.036
/ecommerce/index.html            0.040      0.041  0.045
/me/index.html                   0.037      0.025  0.074
/infodata.html                   0.031      0.032  0.027
/ugradinfo.html                  0.029      0.019  0
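Under the memoryless property, the probability of following an outlink should not depend on the inlink through which the page was reached. As a rough descriptive check, and not the statistical tests used in this paper, the sketch below computes the largest absolute deviation of each inlink-specific column from the overall pij column. The assignment of the two conditional columns to the inlinks "empty" and "/academics.html" follows the column order of the table above and is an assumption; `max_abs_deviation` is a hypothetical helper.

```python
# Overall transition probabilities for the ten outlinks, copied from the table.
overall = [0.165, 0.150, 0.150, 0.072, 0.057, 0.041, 0.040, 0.037, 0.031, 0.029]

# Transition probabilities conditioned on the inlink. Which column belongs to
# which inlink is assumed from the order in which the columns appear.
given_inlink = {
    "empty": [0.159, 0.168, 0.153, 0.069, 0.057, 0.040, 0.041, 0.025, 0.032, 0.019],
    "/academics.html": [0.256, 0.073, 0.213, 0.049, 0.055, 0.036, 0.045, 0.074, 0.027, 0.0],
}


def max_abs_deviation(conditional, reference):
    """Largest absolute gap between a conditional column and the reference."""
    return max(abs(c - r) for c, r in zip(conditional, reference))


deviation = {
    inlink: max_abs_deviation(probs, overall)
    for inlink, probs in given_inlink.items()
}

for inlink, d in deviation.items():
    print(f"{inlink}: {d:.3f}")
```

A small deviation is compatible with the memoryless assumption for that inlink, while a large one points to history dependence; judging whether a gap is statistically significant would also require the underlying hit counts.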
Acknowledgments
This research is supported in part by NSF grants
9733588 and 0204345, THECB/ATP grants 003613-0030-1999 and 003613-0030-2001, and Nortel Networks.
We also thank the other members of our research group,
Gunes Koru, Sunita Rudraraju, Li Ma, and Sudipti Mishra,
for their comments and suggestions.
References
[1] B. Behlandorf. Running a Perfect Web Site with Apache, 2nd Ed. MacMillan Computer Publishing, New York, 1996.
[2] C. Kallepalli and J. Tian. Measuring and modeling usage and reliability for statistical web testing. IEEE Trans. on Software Engineering, 27(11):1023-1036, Nov. 2001.
[3] S. Karlin and H. M. Taylor. A First Course in Stochastic Processes, 2nd Ed. Academic Press, New York, 1975.
[4] A. L. Montgomery and C. Faloutsos. Identifying web browsing trends and patterns. IEEE Computer, 34(7):94-95, July 2001.
[5] J. D. Musa. Software Reliability Engineering. McGraw-Hill, New York, 1998.
[6] R. R. Sarukkai. Link prediction and path analysis using Markov chains. In Proc. 9th International World Wide Web Conference, Amsterdam, the Netherlands, May 2000.
[7] M. Spiliopoulou. Web usage mining for web site evaluation. Communications of the ACM, 43(8):127-134, Aug. 2000.
[8] J. Tian. Integrating time domain and input domain analyses of software reliability using tree-based models. IEEE Trans. on Software Engineering, 21(12):945-958, Dec. 1995.
[9] K. S. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications, 2nd Edition. John Wiley and Sons, New York, 2001.
[10] J. A. Whittaker and M. G. Thomason. A Markov chain model for statistical software testing. IEEE Trans. on Software Engineering, 20(10):812-824, Oct. 1994.