
Bucket Membership – Performance Tests

Conclusions

The system was run through the bucket membership performance tests. The results show the following:
- up to 800 000 relevant documents are supported with a lower number of categories (<= 10 000)
- with 15 000 categories, the number of documents the system can handle drops to 600 000 relevant documents
- increasing the number of relevant categories decreases the maximum number of relevant documents in the system
- both the number of categories and the number of documents influence performance, but the number of categories has a bigger impact
- with 1 million relevant documents the system crashes
- adding irrelevant documents to the system has a minor impact on performance
- adding irrelevant categories to the system has a small impact up to 60 000 irrelevant categories; above that, performance decreases
- the tests were performed only with an even distribution of documents in the category tree (all leaf categories have the same number of documents)

The Results of Tests

Expected numbers to test

A real system has:
- 300 000 documents
- 10 000 categories

Extra numbers to test (future target):
- 1 000 000 documents
- 15 000 categories

We are also going to measure other values to see whether the system behaves linearly or non-linearly. Both factors (number of documents and number of categories) will be changed together as well as independently.

Structure of category tree

All Documents
  +source
    +A
      +A
    +B
      +B
    ...
  +target
    +A
      +A
    +B
      +B
    ...

The source and target trees have the same structure, as shown above. Each target leaf category has the leaf categories from the source subtree in its Include Primary Bucket. The numbers of categories given in the tests refer to leaf categories. For each leaf category an intermediate category is created, and for each source category a mirror category is created in the target tree. As a result, the system contains four times as many categories as the stated number: e.g. 10 leaf categories mean that there are 40 categories in the system.
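
The category arithmetic above can be checked with a small model of the test tree. This is a minimal Python sketch, not the system's actual data model. The `Category` class and the `build_test_tree` and `count_nodes` helpers are hypothetical. The sketch also assumes that the structural nodes "All Documents", "source" and "target" are not included in the category counts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in the test category tree (hypothetical model)."""
    name: str
    children: list[Category] = field(default_factory=list)

    def add(self, child: Category) -> Category:
        """Attach `child` under this node and return it."""
        self.children.append(child)
        return child


def build_test_tree(leaf_count: int) -> Category:
    """Build the source/target tree for `leaf_count` leaf categories.

    Every leaf gets its own intermediate parent, and the target subtree
    mirrors the source subtree one-to-one.
    """
    root = Category("All Documents")
    for subtree_name in ("source", "target"):
        subtree = root.add(Category(subtree_name))
        for i in range(leaf_count):
            intermediate = subtree.add(Category(f"{subtree_name}/intermediate-{i}"))
            intermediate.add(Category(f"{subtree_name}/leaf-{i}"))
    return root


def count_nodes(node: Category) -> int:
    """Count `node` together with all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children)


# Assumption: "All Documents", "source" and "target" are not counted.
STRUCTURAL_NODES = 3

for leaves in (10, 10_000):
    print(leaves, count_nodes(build_test_tree(leaves)) - STRUCTURAL_NODES)
```

Under these assumptions, the loop reports 40 categories for 10 leaves and 40 000 for 10 000 leaves, which matches the four-times rule.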

Documents in Include Primary Bucket

Documents in the system are attached to the leaves of the source tree. The whole target tree and the intermediate categories in both the target and source trees contain no documents.
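
The following sketch shows one way documents could reach the target tree. It is a hypothetical Python model, not the extractor's real implementation. It assumes that each target leaf's Include Primary Bucket lists source leaf categories by name. It also assumes that a target leaf's documents are the union of the documents attached to those source leaves. Intermediate and target categories hold no documents of their own, so they do not appear in `source_leaf_docs`.

```python
from __future__ import annotations

from collections.abc import Mapping


def resolve_bucket_membership(
    source_leaf_docs: Mapping[str, set[str]],
    include_primary_bucket: Mapping[str, list[str]],
) -> dict[str, set[str]]:
    """Return the set of documents each target leaf category receives.

    source_leaf_docs: documents attached directly to each source leaf.
    include_primary_bucket: for each target leaf, the source leaves named
    in its Include Primary Bucket (hypothetical representation).
    """
    membership: dict[str, set[str]] = {}
    for target_leaf, included_sources in include_primary_bucket.items():
        docs: set[str] = set()
        for source_leaf in included_sources:
            # Source leaves without documents simply contribute nothing.
            docs |= source_leaf_docs.get(source_leaf, set())
        membership[target_leaf] = docs
    return membership


source_docs = {"source/A": {"doc-1", "doc-2"}, "source/B": {"doc-3"}}
buckets = {"target/A": ["source/A"], "target/B": ["source/B"]}
print(resolve_bucket_membership(source_docs, buckets))
```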

Testing

The purpose of the tests is to see how the Bucket Membership categories-documents extractor behaves depending on the number of categories and documents.

Changing number of documents

In this test we check how the system behaves when the number of documents changes while the number of leaf categories is fixed at 10.

Documents per category   Time (hh:mm:ss)   Seconds
1000                     00:00:02          2
10000                    00:00:10          10
100000                   00:01:37          97
200000                   00:03:38          218
300000                   00:06:53          413
400000                   00:11:08          668
600000                   00:21:49          1309
1000000                  out of memory

[Chart: time in seconds vs. documents per category]

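One way to judge whether the growth in this test is linear is to compute a local scaling exponent between consecutive rows of the table. The exponent is the slope of log(time) against log(documents). A value near 1 means linear growth between those two points, and a value above 1 means super-linear growth. The following Python sketch uses the measured values above and leaves out the out-of-memory row. `local_exponents` is an illustrative helper. Because each point is a single measurement, the exponents are only a rough indication.

```python
import math

# (documents per category, seconds), taken from the table above.
RUNS = [(1000, 2), (10000, 10), (100000, 97), (200000, 218),
        (300000, 413), (400000, 668), (600000, 1309)]


def local_exponents(points):
    """Return (upper size, exponent) for each pair of consecutive points.

    The exponent is log(t2 / t1) / log(n2 / n1), i.e. the slope of the
    curve on a log-log plot between the two measurements.
    """
    return [
        (n2, math.log(t2 / t1) / math.log(n2 / n1))
        for (n1, t1), (n2, t2) in zip(points, points[1:])
    ]


for size, exponent in local_exponents(RUNS):
    print(f"up to {size:>7} documents per category: exponent {exponent:.2f}")
```

With these numbers the exponent stays below 1 up to 100 000 documents per category. It is about 1.2 between 100 000 and 200 000 and roughly 1.6 to 1.7 above 200 000, which suggests super-linear growth in the upper range.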
Changing number of leaf categories

In this test we check how the system behaves when the number of leaf categories changes while the number of documents per leaf category is fixed at 1. This means that both factors (number of categories and total number of documents) change.

Category leaf number   Time (hh:mm:ss)   Seconds
100                    00:00:04          4
1000                   00:00:35          35
2000                   00:01:06          66
5000                   00:02:43          163
10000                  00:05:30          330

[Chart: time in seconds vs. category leaf number]
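
Dividing the time by the number of leaf categories gives the cost per leaf, which makes the shape of this curve easier to read. A roughly constant value means that time grows linearly with the number of leaf categories. Below is a minimal Python sketch using the values from the table above. `ms_per_leaf` is an illustrative helper.

```python
# (category leaf number, seconds) with 1 document per leaf category, from the table above.
RUNS_1_DOC = [(100, 4), (1000, 35), (2000, 66), (5000, 163), (10000, 330)]


def ms_per_leaf(runs):
    """Return (leaf count, milliseconds per leaf category) for each run."""
    return [(leaves, 1000 * seconds / leaves) for leaves, seconds in runs]


for leaves, cost in ms_per_leaf(RUNS_1_DOC):
    print(f"{leaves:>6} leaves: {cost:.1f} ms per leaf category")
```

Here the cost settles at about 33 ms per leaf category from 2000 leaves upward. Within this range, and with one document per leaf, the growth therefore looks close to linear.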

An additional test was run with 10 documents per leaf category.

Category leaf number   Time (hh:mm:ss)   Seconds
100                    00:00:04          4
1000                   00:00:33          33
2000                   00:01:08          68
5000                   00:02:49          169
10000                  00:05:43          343

[Chart: time in seconds vs. category leaf number]
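
Comparing this table with the previous one shows how little the tenfold increase in documents per leaf changes the time. The Python sketch below uses the two tables to compute the relative difference for each leaf count. `relative_change` is an illustrative helper.

```python
# Seconds per category leaf number, from the two tables above.
ONE_DOC_PER_LEAF = {100: 4, 1000: 35, 2000: 66, 5000: 163, 10000: 330}
TEN_DOCS_PER_LEAF = {100: 4, 1000: 33, 2000: 68, 5000: 169, 10000: 343}


def relative_change(base, other):
    """Return (other - base) / base for every key present in `base`."""
    return {key: (other[key] - base[key]) / base[key] for key in base}


for leaves, change in relative_change(ONE_DOC_PER_LEAF, TEN_DOCS_PER_LEAF).items():
    print(f"{leaves:>6} leaves: {change:+.1%}")
```

The differences stay within about 6% in either direction. This fits the conclusion that the number of categories has a bigger impact than the number of documents.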

Changing number of categories and number of documents

In this test we change the number of leaf categories and the number of documents per leaf category at the same time. It is just a generic test.

Category leaf number   Documents per leaf category   Time (hh:mm:ss)   Seconds
100                    100                           00:00:04          4
200                    200                           00:00:09          9
300                    300                           00:00:17          17
400                    400                           00:00:26          26
500                    500                           00:00:36          36
1000                   1000                          00:01:41          101
2000                   2000                          00:07:25          445

Adding irrelevant categories


In this test we look at the influence of adding irrelevant categories (without any intermediate/additional nodes) to the system. The test uses 10 000 leaf categories and 100 documents per leaf.

Number of irrelevant categories   Time (hh:mm:ss)   Seconds
1000                              00:07:10          430
5000                              00:06:56          416
10000                             00:06:58          418
20000                             00:07:25          445
40000                             00:07:43          463
60000                             00:08:07          487
100000                            00:09:31          571
200000                            00:15:53          953

[Chart: time in seconds vs. number of irrelevant categories]

We can see that adding irrelevant categories to the system has almost no influence up to 60 000 irrelevant categories. Above this number the performance decreases. More investigation is needed, but at the moment this is far beyond the expected number of categories in the system.
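
The slowdown can be expressed relative to the smallest run, which has 1000 irrelevant categories. Below is a short Python sketch using the values from the table above. `slowdown` is an illustrative helper.

```python
# Number of irrelevant categories -> seconds, from the table above.
IRRELEVANT_CATEGORIES = {1000: 430, 5000: 416, 10000: 418, 20000: 445,
                         40000: 463, 60000: 487, 100000: 571, 200000: 953}


def slowdown(runs, baseline_key):
    """Return each run's time divided by the baseline run's time."""
    baseline = runs[baseline_key]
    return {key: seconds / baseline for key, seconds in runs.items()}


for count, factor in slowdown(IRRELEVANT_CATEGORIES, 1000).items():
    print(f"{count:>6} irrelevant categories: x{factor:.2f}")
```

Up to 60 000 irrelevant categories the time stays within about 13% of the baseline. At 200 000 it is more than twice as long.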

Adding irrelevant documents


In this test we look at the influence of adding irrelevant documents (documents that are not in any include bucket) to the system. The test uses 10 000 leaf categories and 100 documents per leaf.

Number of irrelevant documents   Time (hh:mm:ss)   Seconds
100000                           00:07:04          424
200000                           00:07:09          429
400000                           00:07:12          432
500000                           00:08:02          482
600000                           00:07:55          475
800000                           00:08:25          505
1000000                          00:08:52          532
2000000                          00:08:13          493

[Chart: time in seconds vs. number of irrelevant documents]

On the basis of these results, increasing the number of irrelevant documents in the system has no clear influence on performance.
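
One way to make "no clear influence" concrete is to check whether the time grows monotonically with the number of irrelevant documents. The Python sketch below uses the table above to list the steps where the time went down. It also reports the largest increase over the 100 000 baseline. `decreases` is an illustrative helper.

```python
# (number of irrelevant documents, seconds), from the table above.
IRRELEVANT_DOCUMENTS = [(100000, 424), (200000, 429), (400000, 432), (500000, 482),
                        (600000, 475), (800000, 505), (1000000, 532), (2000000, 493)]


def decreases(runs):
    """Return the pairs of consecutive runs where the time went down."""
    return [(n1, n2) for (n1, t1), (n2, t2) in zip(runs, runs[1:]) if t2 < t1]


baseline = IRRELEVANT_DOCUMENTS[0][1]
worst = max(seconds for _, seconds in IRRELEVANT_DOCUMENTS)
print("largest increase over baseline:", f"{worst / baseline - 1:.0%}")
print("time decreased between:", decreases(IRRELEVANT_DOCUMENTS))
```

For these measurements the time goes down twice: from 500 000 to 600 000 and from 1 000 000 to 2 000 000 irrelevant documents. The largest increase over the baseline is about 25%.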

Testing a real case (10k categories)

In this test we use 10 000 leaf categories and vary the number of documents per leaf in the source tree.

Documents per category   Time (hh:mm:ss)   Seconds
25                       00:06:06          366
50                       00:06:22          382
100                      00:07:22          442
200                      00:10:05          605
400                      00:11:16          675
800                      00:17:56          1076
1000                     out of memory

[Chart: time in seconds vs. documents per category]

Testing a real case (15k categories)

In this test we use 15 000 leaf categories and vary the number of documents per leaf in the source tree.

Documents per category   Time (hh:mm:ss)   Seconds
25                       00:09:06          546
50                       00:09:51          591
100                      00:12:00          720
200                      00:16:01          961
400                      00:17:10          1021
800                      out of memory
1000                     out of memory

[Chart: time in seconds vs. documents per category]
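
Putting the two real-case tables side by side shows how time scales with the number of leaf categories at a fixed number of documents per category. The Python sketch below divides the 15k times by the 10k times for the rows measured in both tests. It then compares the result with the ratio of leaf categories, 15 000 / 10 000 = 1.5. `time_ratios` is an illustrative helper.

```python
# Documents per category -> seconds, from the 10k and 15k tables above.
TEN_K = {25: 366, 50: 382, 100: 442, 200: 605, 400: 675, 800: 1076}
FIFTEEN_K = {25: 546, 50: 591, 100: 720, 200: 961, 400: 1021}

LEAF_RATIO = 15_000 / 10_000


def time_ratios(slow, fast):
    """Return slow[key] / fast[key] for every key measured in both runs."""
    return {key: slow[key] / fast[key] for key in slow.keys() & fast.keys()}


for docs, ratio in sorted(time_ratios(FIFTEEN_K, TEN_K).items()):
    print(f"{docs:>3} documents per category: x{ratio:.2f} (leaf ratio x{LEAF_RATIO:.1f})")
```

The ratios fall between about 1.49 and 1.63, close to the 1.5 ratio of leaf categories. In these runs, time therefore grows roughly in proportion to the number of leaf categories.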
