
Technical White Paper

Behaviometrics

Measuring FAR/FRR/EER in Continuous Authentication


Draft version 12/22/2009
Table of Contents

Background
Calculating FAR/FRR/EER in the field of Biometrics
Calculating FAR/FRR/EER on Behavio
Training the behavioral profiles
Introducing level of Trust
1.1.1 How trust can be applied
1.1.2 A practical example of trust
Processing the collected behaviors
1.1.3 Making Behavio utilize the collected behaviors
Calculating the FAR/FRR/EER ratios
Delimitations
Results

Background
The word “Behaviometrics” derives from the terms “behavioral” and “biometrics”. “Behavioral” refers
to the way a person behaves, and “biometrics”, in an information security context, refers to
technologies and methods that measure and analyze biological characteristics of the human body
for authentication purposes, for example fingerprints, retina patterns and voice patterns.
In other words, Behaviometrics, or behavioral biometrics, uses measurable behavior to
recognize or verify the identity of a person. Behaviometrics focuses on behavioral patterns rather than
physical attributes.
After a user has been verified with traditional security techniques, such as passwords, Behaviometrics
can extend the protection beyond the login. It can continuously monitor the user
during the whole working session to create an ongoing authentication process.
The purpose of this paper is to present a methodology for calculating the performance of a
continuous behavioral authentication system. Standard procedures used for biometrics are not
sufficient since they were not developed to authenticate users continuously. Hence a new methodology
is needed.
A biometric authentication system decides whether a user is accepted into a system. If a user who
should not be accepted is accepted, it is called a false accept. If a user who should be accepted is
rejected, it is called a false reject. The proportion of impostor attempts that are falsely
accepted is called the false accept rate (FAR), while the proportion of correct-user attempts that
are falsely rejected is called the false reject rate (FRR).
A behavioral continuous authentication system uses a set of behavioral traits to calculate a similarity
ratio between the current user’s behavior and the expected behavior. The similarity is combined with a
threshold so that if the similarity drops below the set threshold the user is detected as an impostor.
Because the similarity is gathered over time, and because the decision to accept or reject a user
depends on a threshold, the older methods are not sufficient.

Calculating FAR/FRR/EER in the field of Biometrics

Biometric systems generally separate impostors from the correct user by comparing a score against
a threshold. The score expresses how similar a sample and a template are; the higher the score, the
more similar they are. The threshold is a cut-off: all scores above it are considered to come from the
correct user, while all scores below it are considered to come from an impostor.

[Figure: Classifying Samples by using a Threshold. Score (0 to 100) for Sample 1 to Sample 5.]

The performance of a biometrical system is usually measured in terms of false accept rate (FAR),
false reject rate (FRR) and equal error rate (EER). The false accept rate is the percentage of invalid
inputs that are incorrectly accepted (match between input and a non-matching template). The false
reject rate is the percentage of valid inputs that are incorrectly rejected (fails to detect a match
between input and matching template).
The equal error rate (EER) is the point at which the false accept rate and the false reject rate
intersect, that is, the point at which the FAR and FRR have the same value. It is used as a single
indicator of the accuracy of the system.
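
The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `far_frr`, the made-up scores and the choice to treat a score exactly on the threshold as an accept are assumptions, not part of any Behavio tool.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) in percent for one threshold.

    Assumption: a score at or above the threshold counts as an accept.
    """
    # Impostor samples that were wrongly accepted.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    # Correct-user samples that were wrongly rejected.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = 100.0 * false_accepts / len(impostor_scores)
    frr = 100.0 * false_rejects / len(genuine_scores)
    return far, frr


# Made-up scores on a 0-100 scale, for illustration only.
genuine = [82, 91, 67, 74, 58]
impostor = [35, 48, 62, 21, 15]

print(far_frr(genuine, impostor, threshold=60))
```

With these sample scores one impostor sample lies above the threshold and one correct-user sample lies below it, so both rates are 20%.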

[Figure: ROC Curve. FAR and FRR as Accept / Reject Ratio (%) plotted against Threshold level
(0 to 100), with the EER marked.]

In theory, the correct users should always score higher than the impostors. A single threshold could
then be used to separate the correct user from the impostors.
In general, the matching algorithm makes a decision based on a threshold which determines how
close to a template the input needs to be for it to be considered a match. If the threshold is
lowered, there will be fewer false rejects but more false accepts. Correspondingly, a higher
threshold will reduce the false accept rate but increase the false reject rate.
In some cases impostor patterns generate scores that are higher than those of the correct user’s
patterns. For that reason, however the threshold is chosen, some classification errors occur.

Depending on the threshold, anywhere from none to all of the impostor patterns are falsely
accepted by the system. The choice of threshold value becomes a problem if the score distributions
of the correct user and the impostors overlap.
When biometric systems are compared, some of them specify only a FAR value. A single FAR
without the corresponding FRR is not sufficient, since the system with the lowest
FAR may have a high FRR.
When the threshold is adjustable, there is no reasonable way to decide which system
performs better just by looking at FAR and FRR values at individual thresholds. To get a
threshold-independent performance measurement the EER can be used. The lower the EER, the more
accurate the system is considered to be.
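
A threshold-independent figure can be approximated by sweeping the threshold and taking the point where FAR and FRR are closest. The sketch below assumes made-up scores and hypothetical helper names (`error_rates`, `estimate_eer`); with finite data the two curves rarely cross exactly, so the mean of the two rates at the closest point is used as the estimate.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) in percent; scores >= threshold are accepted."""
    far = 100.0 * sum(s >= threshold for s in impostor) / len(impostor)
    frr = 100.0 * sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def estimate_eer(genuine, impostor, thresholds):
    """Return (threshold, eer) at the threshold where |FAR - FRR| is smallest."""
    best = None
    for t in thresholds:
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        # Keep the first threshold with the smallest gap seen so far.
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2.0)
    return best[1], best[2]


# Made-up scores on a 0-100 scale, for illustration only.
genuine = [82, 91, 67, 74, 58, 88, 79, 95, 63, 71]
impostor = [35, 48, 62, 21, 15, 44, 66, 30, 52, 27]

threshold, eer = estimate_eer(genuine, impostor, range(0, 101))
print(threshold, eer)
```

Comparing two systems on their estimated EER avoids having to pick a common threshold for both.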

Calculating FAR/FRR/EER on Behavio


To calculate the FAR/FRR/EER ratings for Behavio, our desktop behaviometric security solution, a
test group of 40 users was selected. They used Behavio in a real-world environment for 3
months. A behavioral monitor installed on each subject’s computer collected behavioral
data on the specific applications used and the associated keyboard and mouse events.

Training the behavioral profiles


In the beginning the profile will be empty and Behavio has to learn the behavior of the user. At an
early stage it is difficult to differentiate between persons so initially Behavio assumes that it is the
correct user handling the computer.
In order to handle the evolution of the user’s behavior the system has to tolerate small shifts and
gradually make the necessary changes in the profile.

The amount of training of the behavioral profiles is measured in insertions. An insertion is a
keystroke event of some sort. Each time the profile is updated with a key press statistic or a key
flight statistic, the number of insertions increases by one. In principle, a typical keystroke (moving
between two letters) triggers 5 events, so it would increase the number of insertions by 5. In the
real world, however, the yield is lower: factors such as the quarantine, pauses and immeasurable
samples have to be taken into account.
The amount of time it takes to train the profiles varies depending on how fast the user is typing and
how much the user is using the computer.
Our studies show that a typical user…
…has an average interval of 200 to 600 milliseconds between keystrokes.
…has to type about 0.5 keystrokes to get 1 insertion.
This means that an average user has to type about 10 000 keystrokes to achieve 20 000
insertions. For a fast typist this requires roughly 30 minutes, and for a slow typist about 100
minutes, of active writing.
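
The arithmetic behind these estimates can be checked with a short calculation. The constant and function names below are illustrative; the figures (0.5 keystrokes per insertion, 20 000 insertions, 200 to 600 ms between keystrokes) are the ones quoted above.

```python
KEYSTROKES_PER_INSERTION = 0.5  # study figure quoted above
TARGET_INSERTIONS = 20_000      # profile completeness level


def training_minutes(interval_ms, insertions=TARGET_INSERTIONS):
    """Minutes of active typing needed at a given keystroke interval."""
    keystrokes = insertions * KEYSTROKES_PER_INSERTION
    return keystrokes * interval_ms / 1000.0 / 60.0


print(round(training_minutes(200), 1))  # fast typist, 200 ms per keystroke
print(round(training_minutes(600), 1))  # slow typist, 600 ms per keystroke
```

The fast typist comes out at about 33 minutes, which is the "roughly 30 minutes" quoted above, and the slow typist at 100 minutes.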

Introducing level of Trust


Trust can be measured as a percentage. At 100% the system fully trusts that it is the correct user;
conversely, if the trust level reaches 0% the system triggers detection.

[Diagram: the Similarity (Confidence Ratio) is compared with the Threshold. Above the threshold (+)
there is an increase of trust; below it (-) there is a decrease of trust.]
Figure 1: The concept of trust
The similarity score is mapped onto a trust model by using a threshold. If the score is above the
threshold the trust increases, and if it is below the threshold the trust decreases.
Staying above the threshold raises the trust level toward 100%; the further the score is above the
threshold, the quicker the trust reaches its maximum.
Some users will be detected faster than others, for two reasons:
1) How much the behavior differs between the active user and the template.
If the difference is large, detection will be faster than if the difference is small.
2) How much trust the previous user has achieved.
If the previous user has built up full trust, it will take longer for the
incorrect user to reach the not-trusted level.
Abnormal actions can also trigger a decrease in trust, such as using key combinations that have
never been used before or repeatedly providing immeasurable samples (compare this to pushing your
forehead against a fingerprint scanner).

1.1.1 How trust can be applied


These numbers are just examples to demonstrate how trust could be calculated:
• A user passed a test (i.e. is over the threshold), which should increase the trust
  +1% (can be tied to level of success)
• A user failed a test (i.e. is below the threshold)
  -1% (can be tied to level of breach)
• A user hit a key that has never been used before; is it really the same user?
  -1%
• A user triggers the immeasurable sample alert
  -2%

1.1.2 A practical example of trust


Event                               Change in trust   Trust
Passed test (10 times)              +10%              10%
Failed test                         -1%               9%
Key that has not been used          -1%               8%
Passed a test                       +1%               9%
Triggers invalid sample (3 times)   -6%               3%
Failed test (3 times)               -3%               0% (Detection!)

Table 1: A practical example of trust
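
The sequence in Table 1 can be replayed with a minimal trust model. The sketch below uses the example adjustments from section 1.1.1. The event names, the starting trust of 0% (implied by the first row of the table) and the clamping of trust to the 0 to 100% range are assumptions made for illustration, not a description of how Behavio is implemented.

```python
# Example adjustments in percentage points, from section 1.1.1.
ADJUSTMENTS = {
    "passed_test": +1,
    "failed_test": -1,
    "unused_key": -1,
    "immeasurable_sample": -2,
}


def replay(events, trust=0):
    """Apply events in order and return (final_trust, detected).

    Assumptions: trust is clamped to 0..100, and detection is triggered
    when a decrease brings the trust level down to 0.
    """
    for event in events:
        change = ADJUSTMENTS[event]
        trust = max(0, min(100, trust + change))
        if change < 0 and trust == 0:
            return trust, True
    return trust, False


# The event sequence from Table 1.
events = (
    ["passed_test"] * 10
    + ["failed_test", "unused_key", "passed_test"]
    + ["immeasurable_sample"] * 3
    + ["failed_test"] * 3
)
print(replay(events))
```

The replay ends at 0% with detection triggered on the last failed test, matching the final row of the table.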

Processing the collected behaviors
Normally Behavio installs hooks into the system to retrieve keyboard and mouse data. The retrieved
data then goes through the comparison engine, which calculates the probability that the
sample belongs to the profile. The calculated statistics are handed over to the validator, which
determines whether it is the correct user or not.

[Diagram components: System, Behavioral Monitor, Profile, Comparator, Validator]
Figure 2: Logical presentation of system flow

1.1.3 Making Behavio utilize the collected behaviors


To have Behavio evaluate the previously collected data, the monitor is replaced by a database
reader which reads the data gathered from the test group. The comparator then
performs its calculations in the normal manner, but instead of passing the statistics to the
validator it stores them so that they can be analyzed in a FAR/FRR tool.

[Diagram components: Behavioral Data, Behavioral Data Reader, Profile, Comparator, Validator,
Statistics, BehavioSec FAR/FRR tool]
Figure 3: Logical presentation of system flow after modifications

Calculating the FAR/FRR/EER ratios
The FAR/FRR tool calculates the ratios by measuring the amount of time that the level of trust has
been above and below the threshold. There is no upper or lower cap on how much trust a user can
gain or lose.
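
One way to read this time-based measurement is sketched below. The trace format (a list of duration and trust-level segments), the function names and the sample values are assumptions made for illustration; the actual FAR/FRR tool is not described in more detail here. In this reading, the FRR is the share of the correct user's time spent below the threshold, and the FAR is the share of another user's time spent at or above it.

```python
def percent_of_time(trace, predicate):
    """Percentage of total duration whose trust level satisfies predicate.

    trace: list of (duration_seconds, trust_level) segments (assumed format).
    """
    total = sum(duration for duration, _ in trace)
    matching = sum(duration for duration, level in trace if predicate(level))
    return 100.0 * matching / total


def time_based_rates(genuine_trace, impostor_trace, threshold):
    """Return (FAR, FRR) in percent, measured over time rather than attempts."""
    # Correct user's time spent below the threshold counts as false reject.
    frr = percent_of_time(genuine_trace, lambda level: level < threshold)
    # Other users' time spent at or above the threshold counts as false accept.
    far = percent_of_time(impostor_trace, lambda level: level >= threshold)
    return far, frr


# Made-up traces: (seconds, trust level). Values are illustrative only.
genuine_trace = [(90, 80), (10, 40)]
impostor_trace = [(5, 60), (95, 20)]

print(time_based_rates(genuine_trace, impostor_trace, threshold=50))
```

Repeating the calculation over a range of thresholds gives FAR and FRR curves from which an EER can be read.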

Delimitations
For this paper the trial is limited to keyboard input. Only users with a completeness level
of 20 000 insertions have a behavioral profile generated and compared against. Only
measurable samples are used.

Results
The results of the trial show that there is a significant difference in behavior between users.
During the trial Behavio achieved an EER of 3.05% for the users in the test group. With
a false reject rate under 1%, only 4.3% of the input from other users would be falsely accepted, and
at a false accept rate below 5%, 0% of the input from the correct user is incorrectly rejected.

[Figure: Ratio (%) plotted against Threshold (0 to 100).]
