Bucket Membership - Performance Tests
Conclusions
The system was exercised with the bucket membership tests. The results show the
following:
- up to 800 000 relevant documents are supported when the number of
categories is low (<= 10 000)
- with 15 000 categories, the number of relevant documents the system can
handle drops to 600 000
- increasing the number of relevant categories decreases the maximum number of
relevant documents in the system
- both the number of categories and the number of documents influence
performance, but the number of categories has the bigger impact
- with 1 million relevant documents the system crashes
- adding irrelevant documents to the system has a minor impact on
performance
- adding irrelevant categories to the system has a small impact up to 60 000;
beyond that, performance decreases
- the tests were performed only with an even distribution of documents in the
category tree (all leaf categories hold the same number of documents)
We are also going to measure other values, and we would like to see whether the
system behaves linearly or non-linearly. We are going to change both factors (number
of categories and number of documents) together as well as independently.
Structure of category tree
All Documents
  +source
    +A
      +A
    +B
      +B
    ………………….
  +target
    +A
      +A
    +B
      +B
    ………………….
The source and target trees have the same structure, as shown above. The target leaf
categories have the leaf categories from the source subtree in their Include Primary
Bucket. The category counts quoted in this document refer to the number of leaf
categories. An intermediate category is created for each leaf category. Additionally,
a mirror category is created in the target tree for each source category. This means
that the system contains four times as many categories as the quoted number. E.g. 10
leaf categories mean that there are 40 categories in the system.
Documents are attached to the leaves of the source tree. The whole target tree and the
intermediate categories in both the target and source trees do not contain documents.
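The factor of four can be checked with a short calculation. The following is a minimal sketch, assuming the layout described above (one intermediate category per leaf, and a mirror of every source category in the target tree) and ignoring the root nodes. The function name total_categories is hypothetical and only used for illustration.

```python
def total_categories(leaf_categories: int) -> int:
    """Total categories in the system for a quoted number of leaf categories.

    Assumption: root nodes (All Documents, source, target) are not counted,
    which matches the "10 leaf categories -> 40 categories" example.
    """
    source_leaves = leaf_categories
    source_intermediate = leaf_categories   # one intermediate per source leaf
    target_leaves = leaf_categories         # mirror of each source leaf
    target_intermediate = leaf_categories   # mirror of each source intermediate
    return (source_leaves + source_intermediate
            + target_leaves + target_intermediate)


print(total_categories(10))      # 40
print(total_categories(15000))   # 60000
```

Under these assumptions, a test quoted at 15 000 leaf categories corresponds to 60 000 categories in the system.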
Testing
The purpose of the tests is to see how the Bucket Membership categories-documents
extractor behaves depending on the number of categories and documents.
Changing number of documents
In this test we examine how the system behaves when the number of documents
changes while the number of category leaves is fixed at 10.
[Chart: time in seconds (0 to 1400) versus documents per category]
Changing number of leaf categories
In this test we examine how the system behaves when the number of leaf categories
changes while each leaf category holds exactly 1 document. This means that both
factors change.
Category leaf number    Time        Seconds
100                     00:00:04    4
1000                    00:00:35    35
2000                    00:01:06    66
5000                    00:02:43    163
10000                   00:05:30    330
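One way to judge whether these timings grow linearly is to divide each time by the number of leaf categories. The following is a minimal sketch using only the figures from the table above. The helper name seconds_per_category is hypothetical.

```python
# Measurements copied from the table above: (leaf categories, seconds).
measurements = [(100, 4), (1000, 35), (2000, 66), (5000, 163), (10000, 330)]


def seconds_per_category(rows):
    """Return (leaf categories, seconds per leaf category) for each row."""
    return [(n, seconds / n) for n, seconds in rows]


for n, rate in seconds_per_category(measurements):
    # Print the cost in milliseconds per leaf category.
    print(f"{n:>6} leaf categories: {rate * 1000:.1f} ms per category")
```

For these five measurements the cost per leaf category stays between roughly 33 and 40 ms, which suggests close to linear growth in this range. This is only a reading of the table, not a statement about larger category counts.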
[Charts: time in seconds versus category leaf number (100, 1000, 2000, 5000, 10000)]
In this test we change the number of leaf categories and the number of
documents at the same time. It is just a generic test.
[Chart: time in seconds (0 to 1200) versus number of irrelevant categories]
Adding irrelevant categories to the system has almost no influence up
to 600 000 categories. Above this number the performance decreases. More
investigation is needed, but at the moment this is far beyond the number of
categories we expect in the system.
[Chart: time in seconds (0 to 600) versus number of irrelevant documents]
The results show that increasing the number of irrelevant documents
in the system has no clear influence on performance.
In this test we have 10 000 leaf categories and we change the number of
documents in the leaves of the source tree.
[Chart: time in seconds (0 to 1200) versus documents per category (25, 50, 100, 200, 400, 800)]
In this test we have 15 000 leaf categories and we change the number of
documents in the leaves of the source tree.