Web Caching Alg
© Vladimir V. Prischepa
Chelyabinsk State University
vladimir@csu.ac.ru
Abstract

In this paper we analyse the effectiveness of the LFU-K replacement policy for caching on proxy servers and present the results of an analysis of traces taken from real proxy servers, which reveals a set of properties of network traffic. On the basis of this analysis we conclude that the LFU-K policy, which uses information about dynamic changes in document popularity, is well suited to Web caching. The scheme of the LFU-K policy is given, together with the results of experiments comparing its effectiveness with that of the most popular replacement algorithms.

1. Introduction.

In the last decade the intensive growth of information volumes on the World Wide Web has led to the problem of network load. The development of technical means of data transmission and their introduction into practice do not keep pace with the growth of the Internet. Hence there is a need to find other approaches to the problem of network load.

The common solution that requires no additional technical means is caching of Web objects (text, images, etc.). Web documents are subdivided into two types: static and dynamic. Most Web objects are static documents, which can be stored in a cache for further use. When such a document is accessed again, the proxy server checks whether it has been modified on the source site; if it has not, the user receives the page directly from the cache.

Caching is effective enough to be used at different levels: the Internet browser, the proxy server of a local network, and the Internet proxy server. Proxies serving a large set of clients show the highest caching effectiveness. This is because the requests of many Web users are correlated (common interests). Research has shown that 25-40% of all requested documents account for 70% of user requests [2]. Moreover, the networks of homogeneous organisations (a university, a corporation, etc.) have the highest correlation of requests.

Caching Web objects on proxy servers, like any other kind of caching, faces the problem of finite cache storage and hence the need to make room for new documents. A replacement policy determines which object is to be removed from the cache. Selecting an effective policy can considerably increase caching effectiveness, reducing network traffic by 20% or more.

The difficulty of selecting a replacement policy stems from the following specific characteristics of network traffic [4]:
- The HTTP protocol gives access only to whole files, i.e. a proxy server cache can satisfy a user's request only if the file has been stored completely (it cannot store incomplete objects).
- Documents stored in the proxy server cache differ in size, from several bytes to hundreds of megabytes.
- The stream of requests to the cache is the sum of the request streams of hundreds or thousands of users.

Because document sizes differ, a further metric of effectiveness, Byte Hit Rate, is needed in addition to Hit Rate, the basic metric of policy effectiveness. Byte Hit Rate is computed as the ratio of the number of bytes served from the proxy cache to the total number of bytes requested by users.

Because of the difference in document sizes, Byte Hit Rate is the most adequate metric of policy effectiveness, as it is the one that shows the savings in network traffic.
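The difference between Hit Rate and Byte Hit Rate can be illustrated with a minimal Python sketch. The `Request` record, its field names and the three-entry log are hypothetical and serve only as an example; they are not taken from the analysed traces.

```python
from dataclasses import dataclass


@dataclass
class Request:
    url: str    # requested object (hypothetical example value)
    size: int   # object size in bytes
    hit: bool   # True if the object was served from the proxy cache


def hit_rate(requests):
    """Fraction of requests served from the cache."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if r.hit) / len(requests)


def byte_hit_rate(requests):
    """Bytes served from the cache divided by all bytes requested."""
    total = sum(r.size for r in requests)
    if total == 0:
        return 0.0
    return sum(r.size for r in requests if r.hit) / total


# Two small objects are hits, one large object is a miss.
log = [
    Request("/a.html", 1_000, True),
    Request("/b.gif", 1_000, True),
    Request("/c.zip", 8_000, False),
]

hr = hit_rate(log)        # 2 of 3 requests are hits
bhr = byte_hit_rate(log)  # 2000 of 10000 bytes come from the cache
```

In this example two thirds of the requests are hits, yet only one fifth of the bytes is served from the cache, which is why a policy that looks good by Hit Rate may save little traffic.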
[Figure 1. Percentage of requests for objects depending on their size (BU trace). Pie chart; size classes: less than 1 Kbyte, 1-5 Kbyte, 5-100 Kbyte, more than 100 Kbyte. Slice values (order not preserved): 1.8%, 26.5%, 26.2%, 45.5%.]

[Figure 2. Percentage of requests for the most popular objects depending on their size. Pie chart; size classes: less than 1 Kbyte, 1-5 Kbyte, 5-100 Kbyte, more than 100 Kbyte. Slice values (order not preserved): 1.2%, 23.3%, 27.6%, 47.9%.]

[Figure 3. Percentage of traffic in the whole BU trace, depending on the size of objects. Pie chart by object size class. Slice values (order not preserved): 1.6%, 7.5%, 52.5%, 38.4%.]

[Figure 4. Comparison of the theoretical distribution (Zipf 80-20) and the request distribution in the real BU trace. Horizontal axis: page number; vertical axis: accesses.]

[Figure 5. Byte Hit Rate of the BU trace. Horizontal axis: cache size; policies: LRU, SIZE, GD_SIZE, GDSF, LFU-1.]

[Figure 6. Byte Hit Rate of the CSU trace. Horizontal axis: cache size; policies: LRU, GD_SIZE, GDSF, LFU-1.]
For example, the Boston University trace that we analysed includes more than 250 thousand requests and clearly shows changes in many of the popular objects. We took the 600 most popular objects in the first half and in the second half of the trace, compared them, and found that two thirds of the objects are different.

Thus we conclude that it is necessary to use replacement policies that can track changes in object popularity. The LRU policy does follow changes in request probability over time, but it is quite primitive and does not respond properly to sharp changes in popularity. It should be pointed out that several special replacement policies for proxy servers based on changes in object popularity over time have been developed. For example, the Pitkow/Recker policy [6] aims to keep documents accessed by users within the current 24 hours; other documents are marked as outdated and removed. But it is evident that changes in document popularity are more complicated.

3. LFU-1 algorithm.

We have suggested using the LFU-1 algorithm for proxy caching [7]. This algorithm takes into account the specific character of changes in object popularity. LFU-1 turned out to be sufficient to show high effectiveness in Byte Hit Rate.

Let us consider a sequence of document requests (a later request corresponds to a smaller index value), where r_i is the number of the document requested:

r_1, r_2, …, r_h, r_{h+1}, …, r_m, …

[Rating formula in terms of ν(i) and s(i)]

Here ν(i) is the number of occurrences of page number i among the latest h requests and s(i) is the number of occurrences of page number i among the latest m requests.

Using the formula above one can compute the ratings of all the pages in the cache. If there is no room in the cache, the least accessed document is replaced.

It should be pointed out that the main advantage of this algorithm is that it both measures object popularity over quite a long period of user accesses (the latest m requests) and reacts quickly to the appearance of new popular objects (the latest h requests).

The parameters of the LFU-K algorithm were selected as follows: the value of m was derived from analytical estimates [7], and the value of h was derived from empirical results. In our experiment we set h=3000.

4. Experiment results.

We have compared the LFU-1 policy with the most popular policies on traces from real proxy servers (see Figure 5 and Figure 6). Results for two traces are given: the traces of Boston University (BU) and Chelyabinsk State University (CSU).

As Figures 5 and 6 show, the LFU-1 policy achieves a higher Byte Hit Rate than the other policies.

5. Conclusion.

In this paper we have investigated the effectiveness of the LFU-K policy for caching on a proxy server. The analysis of the traces demonstrates the disadvantages of basic replacement policies in real systems. In comparison with popular replacement policies, the LFU-1 algorithm shows higher efficiency. The results of the experiments allow us to conclude that the LFU-K algorithm is suitable for proxy servers on the Web.

Evaluation of the effectiveness of the LFU-2 algorithm is of special interest for future work.

We plan to try to raise the effectiveness of the caching mechanism by means of combined replacement policies.
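As an illustration of the counters ν(i) and s(i) from Section 3, the following Python sketch maintains occurrence counts over two sliding windows of the request stream and picks a replacement candidate. It is a sketch under stated assumptions, not the algorithm as specified in [7]: the class `WindowCounts`, the functions `rating` and `choose_victim`, the `weight` parameter and, in particular, the weighted-sum rating are hypothetical stand-ins for the rating formula of Section 3.

```python
from collections import Counter, deque


class WindowCounts:
    """Occurrence counts of pages among the latest `window` requests."""

    def __init__(self, window):
        self.window = window
        self.recent = deque()    # the latest requests, oldest on the left
        self.counts = Counter()  # page -> occurrences inside the window

    def record(self, page):
        self.recent.append(page)
        self.counts[page] += 1
        if len(self.recent) > self.window:
            old = self.recent.popleft()  # request that left the window
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def count(self, page):
        return self.counts.get(page, 0)


def rating(page, short, long, weight=0.5):
    # ASSUMPTION: a weighted sum of relative frequencies in the short (h)
    # and long (m) windows; the rating formula of Section 3 may differ.
    nu = short.count(page) / short.window
    s = long.count(page) / long.window
    return weight * nu + (1 - weight) * s


def choose_victim(cached_pages, short, long):
    """Return the cached page with the lowest rating."""
    return min(cached_pages, key=lambda p: rating(p, short, long))


# Toy windows h=3 and m=6 (the experiments in the paper use h=3000).
short, long = WindowCounts(3), WindowCounts(6)
for page in ["a", "a", "a", "b", "c", "c"]:
    short.record(page)
    long.record(page)

victim = choose_victim(["a", "c"], short, long)
```

Here page "a" was popular earlier but does not occur among the latest three requests, so it is rated below page "c" and chosen for replacement.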