

A Distributed System for Biometrical Identification Based on Big Data Analysis

Conference Paper · March 2019


DOI: 10.1109/CREBUS.2019.8840079



George Popov, Ognyan Nakov, Elizabeth Mihaylova
Faculty of Computer Systems and Technologies, Technical University of Sofia, Bulgaria
popovg@tu-sofia.bg, nakov@tu-sofia.bg, elizabeth.mihailova@gmail.com

Abstract: This article explores the design and construction of national and larger international biometric identification databases. It presents an analytical study of the resource efficiency of a system comprising multiple mobile local biometric readers connected to a central database through a limited communication channel. The suggested approach accelerates the computation by storing only part of the sign information of each template locally. The presented analytical models can be used for subsequent optimization.

Keywords: Biometrical recognition, Big data analysis, Cache hit optimization

I. INTRODUCTION

A trend in modern recognition systems is that most computational operations are performed on local devices, with the aim of unloading the central database and the connection channel of the devices. For example, until about 2010 face detection was performed on a PC, DVR, or NVR at the central unit.

Today, analytic features are transferred to front devices such as intelligent face detection cameras, fingerprint scanners, hand or palm recognition readers, and so on [1,2,3].

Although these modern devices have a significant amount of internal memory, and some store up to 100,000 biometric templates, they cannot be applied directly to high-volume information tasks such as national and global databases with biometric information [4,5]. The reason is that these scanners are often realized as mobile devices that use a GPRS connection with limited channel capacity.

Terminal devices for bio-identification used in a large national or international system (e.g., the EU) implicitly implement an LRU algorithm, because they update their internal memory with data that depends on the location of their use. For example, a biometric terminal operating on the border between Bulgaria and Turkey will keep the information of all people who cross this border regularly.

Another important factor in biometric identification is the preparation and application of implantable materials in the human body and their subsequent influence on the measurement sensors [6,7,8]. The use of optimization models in production would make it possible to recognize and correctly interpret the resulting biometric bias.

II. SYSTEM DESCRIPTION

Consider a system with the following structure:

- A central remote database RD, realized as a big data server DS holding a large number of templates, e.g. a national bio-identification data storage system;

- Mobile local devices MLD, each with a biometric reader and its own memory, called local memory LM in this paper, on which the bio-identification is carried out. These devices turn to the RD to load templates.

The system stores a large number $N_s$ of templates (more than tens of millions), and each image is defined by a template with a volume of information $q$, represented by $n$ signs, sufficient for identification with probability $P_{id}$.

Fig. 1. The architecture of the system for pattern recognition

The functional algorithm is as follows: the scanned biometrical information (data) is recognized by the processor(s) based on pattern (template) analysis. The processor(s) have cache and RAM, which together are called local memory, with volume $V_{LM}$ and access time $T_{LM}$. The more commonly used templates are likely to be present there. If the templates are absent from the local memory, they are loaded from external memory, i.e. the remote database, with unlimited volume and access time $T_{RD}$ (Fig. 1).

The system maintains templates in memory by means of an LRU algorithm or a working set, i.e. the images identified with the highest relative frequency are kept in the LM. The system algorithm, specified by a stochastic timed Petri net, is shown in Fig. 2 and Table I.
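The LRU maintenance of the local memory described in the system description can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the class name `LocalMemory`, the `fetch_from_rd` callback and the tiny capacity are hypothetical, and a template is reduced to an opaque value keyed by an identifier.

```python
from collections import OrderedDict


class LocalMemory:
    """Hypothetical LRU template store standing in for the LM of one MLD."""

    def __init__(self, capacity, fetch_from_rd):
        self.capacity = capacity            # number of templates that fit (V_LM / q)
        self.fetch_from_rd = fetch_from_rd  # assumed callback that queries the RD
        self.templates = OrderedDict()      # template_id -> template, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, template_id):
        if template_id in self.templates:
            self.hits += 1
            self.templates.move_to_end(template_id)  # mark as most recently used
            return self.templates[template_id]
        self.misses += 1
        template = self.fetch_from_rd(template_id)   # slow path over the channel
        self.templates[template_id] = template
        if len(self.templates) > self.capacity:
            self.templates.popitem(last=False)       # evict the least recently used
        return template

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Usage: a terminal that mostly sees the same few travellers.
lm = LocalMemory(capacity=2, fetch_from_rd=lambda tid: f"template-{tid}")
for tid in [1, 2, 1, 3, 1, 2]:
    lm.get(tid)
print(lm.hits, lm.misses, sorted(lm.templates))
```

With a capacity of two templates, the frequently requested identifier 1 stays resident while 2 and 3 displace each other, which is the location-dependent behaviour the border-terminal example describes.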
Fig. 2. Memory reference model for LM and RD through a stochastic timed Petri net

TABLE I. Description of transitions and places of the PN model

Transition | Type | Description | Position | Type
t1 | Normal | A request for pattern recognition | P1 | Ready state
T2 | Stochastic with pdf p1, timed with time $T_{LM}$ | Local system hit. Reference time to LM | P2 | Memory reference
T3 | Stochastic with pdf p2, timed with time $T_{LM}$ | Local system miss. Reference time to LM | P3 | RD data reference
T4 | Timed with time $T_{RD}$ | Reference time to RD | P4 | End of MR
t5 | Normal | Auxiliary transition | - | -

III. ANALYTICAL MODELING

If the processing of the LM information requires time $T_{LM}$ and the processing associated with the RD requires time $T_{RD}$, then

$$T_{RD} \gg T_{LM}, \qquad (1)$$

due to delayed network access to the RD. The access time is given in the scientific literature [6,7]:

$$T_{ref} = (1 - p_{hit})\,T_{RD} + T_{LM} + T_s \qquad (2)$$

where:
$p_{hit}$ - the hit ratio of the LM;
$T_{RD}$ - time to make an RD access when there is a miss (or, with multi-level cache, average memory reference time for the next-lower cache);
$T_{LM}$ - the time to reference the LM (should be the same for hits and misses);
$T_s$ - other various secondary system effects.

According to the geometric probability, $p_{hit}$ is:

$$p_{hit} = \frac{N_{LM}}{N_{LM} + N_s} \approx \frac{V_{LM}/q}{N_s} = \frac{V_{LM}}{q N_s}, \qquad (3)$$

where:
$N_{LM}$ - number of patterns in the LM;
$N_s$ - total number of patterns in the system;
$V_{LM}$ - LM volume;
$q$ - volume of one pattern.

If only a part of the attributes describing a template is loaded in the LM (for example, $n$ of $k$), each stored template occupies $(n/k)\,q$, so the number of templates that fit in the LM increases $k/n$ times, which reduces the likelihood of a reference to the RD:

$$p_{hit} = \frac{k}{n}\,\frac{V_{LM}}{q N_s} \qquad (4)$$

Naturally, if fewer signs are stored, the probability of recognition $P_{id}$ decreases. If the probability of recognition by one sign is $p_i$, then the probability of recognition by $n$ signs is:

$$P_{id}(n) = 1 - \prod_{i=1}^{n} (1 - p_i) \qquad (5)$$

Assuming that all signs are equal from the point of view of identification, $P_{id}$ is:

$$P_{id}(n) = 1 - (1 - p_i)^n \qquad (6)$$

Dependencies (4) and (5) are shown in Fig. 3 for the case where the LM holds 100 thousand complete templates and the RD holds 10 million templates, i.e.

$$\frac{N_{LM}}{N_s} = \frac{100000}{10000000} = 0.01 \qquad (7)$$

Fig. 3. Graphical interpretation of $P_{id}$, $P_{id}/n$ and $p_{hit}$ as a function of the number of checked signs

It can be seen that when the number of signs stored in the local system increases, $p_{hit}$ decreases because more memory is used per template. On the other hand, $P_{id}$ does not increase significantly.

To reduce the number of comparisons, a tree algorithm is suggested. First, one sign is compared. If there is a match, the next sign of the selected template is checked (see Fig. 4); otherwise, the next template is compared from the beginning.

The number of comparisons (or calculations) as a function of the sign depth is given in Fig. 5.
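The sign-by-sign comparison with early exit described for the tree algorithm can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function name `identify` is hypothetical, templates are modelled as equal-length tuples of signs, and exact equality of signs stands in for whatever similarity test a real matcher would use.

```python
def identify(scanned, templates):
    """Return (index of the first fully matching template or None, comparisons made)."""
    comparisons = 0
    for index, template in enumerate(templates):
        for scanned_sign, stored_sign in zip(scanned, template):
            comparisons += 1
            if scanned_sign != stored_sign:
                break                      # mismatch: move on to the next template
        else:
            return index, comparisons      # every sign matched
    return None, comparisons


# Usage with three stored templates of three signs each.
templates = [(1, 2, 3), (4, 5, 6), (4, 5, 7)]
match, cost = identify((4, 5, 7), templates)
print(match, cost)  # prints "2 7"
```

In this small example the match is found after 7 sign comparisons, whereas comparing every sign of every template would take 9; the saving grows when most templates are rejected on their first sign.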
Fig. 4. A tree algorithm for pattern comparison

Fig. 5. Number of comparisons (or calculations) depending on step depth n

Fig. 5 shows that most of the calculations are performed in the LM of the MLD (light gray area), which unloads the remote data storage and the communication channel (dark gray area).

Therefore, the processing time $T_{proc}$ for one request for template comparison with information volume $q$ is:

$$T_{proc} = T_{fetch} + T_{exec} = q\,T_{ref} + \frac{q}{p} = q\left(T_{ref} + \frac{1}{p}\right), \qquad (8)$$

where:
$T_{ref}$ - reference time for loading a template from memory (2);
$T_{fetch}$ - time to fetch the information for one template;
$T_{exec}$ - time to execute the comparison code;
$q$ - the amount of information contained in the template;
$p$ - the performance of the processor(s).

The acceleration $S_{LM}$ of the system from the use of the LM is

$$S_{LM} = \frac{T_{RD} + T_s}{(1 - p_{hit})\,T_{RD} + T_{LM} + T_s} \qquad (9)$$

Assuming $T_s = 0$:

$$S_{LM} = \frac{T_{RD}}{(1 - p_{hit})\,T_{RD} + T_{LM}} \qquad (10)$$

At probability $p_{hit} = 1$, the acceleration $S_{LM}$ is:

$$S_{LM} = \frac{T_{RD}}{T_{LM}} \qquad (11)$$

Conversely, at $p_{hit} = 0$ the acceleration is lower than 1:

$$S_{LM} = \frac{T_{RD}}{T_{RD} + T_{LM}} \qquad (12)$$

Fig. 6 shows the efficiency $S_{LM}$ of using the LM as a function of the probability $p_{hit}$ at the ratio $T_{RD}/T_{LM} = 10$.

Fig. 6. Efficiency of caching depending on $p_{hit}$

The break-even point of efficiency is $p_{hit} = T_{LM}/T_{RD}$, i.e. 0.1 for the ratio above. It follows from (10) that below this hit ratio $S_{LM} < 1$, so an LM with too small a volume degrades the performance of the system, as shown in Fig. 6.

IV. CONCLUSIONS AND FUTURE WORKS

The assumption (6) that all signs are equal from the point of view of identification is the worst case. In practice the situation is better, because some signs carry more information than others.

Based on formulas (3), (8) and (10), it is possible to determine analytically what part of the signs of the templates is optimal to keep in the LM.

For example, the number of signs representing a given template can depend on the frequency of its use by the local device. Fig. 7 gives such an arrangement, in which an LM sized for 100,000 complete templates accommodates more than 278,000, which dramatically reduces references to the RD and increases system productivity.

Fig. 7. Partition of LM for different groups of templates containing different percentage of signs
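The trade-off between hit ratio, identification probability and acceleration can be explored numerically. The sketch below is only an illustration under stated assumptions: it takes a template cut to n of k signs to occupy (n/k)q, so that k/n times more templates fit in the LM, caps the hit ratio at 1, and uses formulas (6) and (10). The values of k, the per-sign probability and the timing ratio are hypothetical and are not taken from the paper's figures.

```python
def p_hit(n, k, v_lm, q, n_s):
    """Hit ratio when n of k signs per template are kept in the LM.

    Assumption: a template cut to n of k signs occupies (n / k) * q,
    so k / n times more templates fit; the ratio is capped at 1.
    """
    return min(1.0, (k / n) * v_lm / (q * n_s))


def p_id(n, p_sign):
    """Formula (6): identification probability with n equally informative signs."""
    return 1.0 - (1.0 - p_sign) ** n


def speedup(hit, t_rd, t_lm):
    """Formula (10): acceleration S_LM with T_s = 0."""
    return t_rd / ((1.0 - hit) * t_rd + t_lm)


# Hypothetical values: the LM fits 100,000 complete templates, the RD holds 10 million.
K, Q, N_S = 20, 1.0, 10_000_000
V_LM = 100_000 * Q
T_RD, T_LM, P_SIGN = 10.0, 1.0, 0.3

for n in (1, 2, 5, 10, 20):
    hit = p_hit(n, K, V_LM, Q, N_S)
    print(n, round(hit, 3), round(p_id(n, P_SIGN), 3), round(speedup(hit, T_RD, T_LM), 3))
```

Each printed row gives n, the hit ratio, the identification probability and the acceleration, so the rows can be scanned for the smallest n that still meets a required identification probability.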

ACKNOWLEDGMENT
This work has been accomplished with the financial
support of the MES by the Grant No. D01-221/03.12.2018
for NCDSC – part of the Bulgarian National Roadmap on
RIs.

REFERENCES

[1] Jain, Anil K.; Ross, Arun (2008). "Introduction to Biometrics". In Jain, AK; Flynn; Ross, A. Handbook of Biometrics. Springer. pp. 1–22. ISBN 978-0-387-71040-2.
[2] Trifonov R., Gotseva D., Angelov V., Analysis of Data Mining Evaluation Methods' Efficiency, International Journal of Development Research, Vol. 07, Issue 11, November 2017, ISSN: 2230-9926, pp. 16880-16884
[3] Nakov, O., D. Gotseva, V. Gancheva, Database Server Optimization
and Analysis, Proceedings of the 6th International Scientific
Conference “Computer Science’2011”, 2011, pp. 455-460, ISBN:
978-954-438-914-7.
[4] R. Ilieva, K. Anguelov and M. Nikolov, "A Cynefin Framework for
Agile Decision Making of AI BOTS," 2018 International Conference
on High Technology for Sustainable Development (HiTech), Sofia,
2018, pp. 1-4., IEEE, doi: 10.1109/HiTech.2018.8566411
[5] Dimitrov, K., Laskov, L., A comparative analysis of thermopile
sensors for biomedical applications, 14th INTERNATIONAL
ENGINEERING CONFERENCE on ‘Communications,
Electromagnetics and Medical Applications’(CEMA’19) Sofia,
Bulgaria, October 17th-19th, 2019
[6] Todorov, G., Todorov, T., Ivanov, I., Valtchev, S., Klaassens, B., Tuning techniques for kinetic MEMS energy harvesters, INTELEC, International Telecommunications Energy Conference, Oct 8-12, 2011
[7] Yankov, E., Nikolova, M., Dechev, D., Ivanov, N., Hikov, T., Valkov, S., Dimitrova, V., Yordanov, M., Petrov, P. (2018). Changes in the Mechanical Properties of Ti Samples with TiN and TiN/TiO2 Coatings Deposited by Different PVD Methods. IOP Conference Series: Materials Science and Engineering, 416, 012062. doi: 10.1088/1757-899X/416/1/012062.
[8] John L. Hennessy; David A. Patterson (16 September 2011). Computer Architecture: A Quantitative Approach. Elsevier. pp. B–12. ISBN 978-0-12-383872-8. Retrieved 25 March 2012.
