User Identification Through Keystroke Biometrics

Seminar Report

INTRODUCTION:
The increasing use of automated information systems, together with our pervasive use of computers, has greatly simplified our lives while making us overwhelmingly dependent on computers and digital networks. Technological achievements over the past decade have improved network services, particularly in performance, reliability, and availability, and have significantly reduced operating costs. Some recently developed authentication mechanisms require users to perform a particular action and then examine some behavioral aspect of that action. The traditional method of signature verification falls into this category: handwritten signatures are extremely difficult to forge without the assistance of some copier. A number of identification solutions based on verifying some physiological or behavioral aspect of a person, known as BIOMETRICS, have emerged. Biometrics, the physical traits and behavioral characteristics that make each of us unique, are a natural choice for identity verification because, unlike keys or passwords, they cannot be lost, stolen, or overheard, and in the absence of physical damage they offer a potentially foolproof way of determining someone's identity. Physiological (i.e., static) characteristics, such as fingerprints, are good candidates for verification because they are unique across a large section of the population. Indispensable to all biometric systems is that they recognize a living person; they encompass both physiological and behavioral characteristics.
www.seminarsonly.com

Biometrics is of two kinds. One deals with the physical traits of the user and the other deals with the behavioral traits of the user. Retinal scanning, fingerprint scanning, face recognition, voice recognition and DNA testing come under the former category, while typing rhythm comes under the latter category. Physiological characteristics such as fingerprints are relatively stable physical features that are unalterable without causing trauma to the individual. Behavioral traits, on the other hand, have some physiological basis, but also react to a person's psychological makeup.

Most systems use a personal identification code to authenticate the user. In these systems, the possibility of a malicious user gaining access to the code cannot be ruled out. However, combining the personal identification code with biometrics provides a robust user authentication system. Authentication using the typing rhythm of the user on a keyboard or a keypad takes advantage of the fact that each user has a distinctive manner of typing the keys. It makes use of the inter-stroke gap that exists between consecutive characters of the user identification code.

When evaluating any authentication system, one needs to consider the false acceptance rate and the false rejection rate. The False Acceptance Rate (FAR) is the percentage of unauthorised users accepted by the system, and the False Rejection Rate (FRR) is the percentage of authorised users rejected by the system. An increase in one of these metrics generally comes with a decrease in the other, and vice versa. The level of error must be controlled by the use of a suitable threshold, such that authorised users are accepted and unauthorised users are rejected. In this technique, the standard deviation of the user's entries during the training period is used as the threshold. Establishing the threshold correctly is important, since too strict a threshold would make entry difficult even for the legitimate user, while a lax threshold would allow unauthorised entry.
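
The trade-off between FAR and FRR can be made concrete with a minimal sketch. It assumes that each login attempt has already been reduced to a single distance score (smaller means closer to the reference rhythm); the function name, the scores, and the thresholds are illustrative assumptions, not measurements from this report.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) in percent for one acceptance threshold.

    An attempt is accepted when its distance score is <= threshold.
    The score is a hypothetical stand-in for any timing distance.
    """
    false_accepts = sum(1 for s in impostor_scores if s <= threshold)
    false_rejects = sum(1 for s in genuine_scores if s > threshold)
    far = 100.0 * false_accepts / len(impostor_scores)
    frr = 100.0 * false_rejects / len(genuine_scores)
    return far, frr


# Illustrative distance scores (assumed, not measured).
genuine = [5, 8, 12, 15, 22]
impostor = [18, 25, 30, 41, 55]

for threshold in (10, 20, 30):
    far, frr = far_frr(genuine, impostor, threshold)
    print(f"threshold={threshold}: FAR={far:.0f}%  FRR={frr:.0f}%")
```

With these made-up scores, the strict threshold rejects most genuine attempts while the lax one admits most impostors, which is the balance the threshold has to strike.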

KEYSTROKE BIOMETRICS ON A KEYBOARD


An authentication system based on the keystroke pattern and a measure of the inter-stroke gap can be easily implemented. One major drawback of using other biometrics for authentication is the overhead incurred. Both the storage space and the cost incurred in using typing characteristics for authentication are comparatively small. As the security mechanism is not visible, unauthorized users have no indication that the security measure exists. Further, the operating system does not have to perform any task other than maintaining the database entry of each user and running the program every time someone logs onto the system. The time gap between consecutive keystrokes is a unique characteristic of the user. The typing rhythm is self-tuned by the user to suit his needs. As the keyboard has duplicate keys, the typing rhythm also depends on whether the user is left-handed or right-handed. Both the FAR and the FRR depend to some extent on the deviation allowed from the reference level and on the number of characters in the identification code. It has been observed that allowing only a small deviation lowers the FAR to almost nil but at the same time tends to increase the FRR. This is because the typing rhythm of the user depends to some extent on the mental state of the user. A balance has to be struck taking both of these factors into consideration.

Keystroke dynamics include several different measurements that can be recorded when the user presses keys on the keyboard. Possible measurements include:
1) Latency between consecutive keystrokes.
2) Duration of the keystroke (hold time).
3) Overall typing speed.
4) Frequency of errors (how often the user has to use backspace).
5) The habit of using additional keys on the keyboard, for example writing numbers with the numpad.
6) The order in which keys are released when writing capital letters (whether Shift or the letter key is released first).
7) The force used when hitting keys while typing (requires a special keyboard).
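
The first two measurements can be derived from raw key events, as in the following minimal sketch. It assumes that each keystroke is available as a (key, press time, release time) tuple in milliseconds; this tuple layout, the function name, and the sample timestamps are assumptions made for illustration.

```python
def keystroke_features(events):
    """Derive timing features from (key, press_ms, release_ms) tuples.

    events must be ordered by press time. The tuple layout is an
    assumption made for this sketch.
    """
    # Hold time: how long each key stays depressed.
    hold_times = [release - press for _, press, release in events]
    # Latency: release of one key to press of the next. It can be negative
    # when the next key is pressed before the previous one is released.
    latencies = [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]
    # Down-to-down time: press of one key to press of the next.
    down_down = [nxt[1] - cur[1] for cur, nxt in zip(events, events[1:])]
    return hold_times, latencies, down_down


# Hypothetical timestamps in milliseconds for typing "cat".
sample = [("c", 0, 90), ("a", 150, 230), ("t", 310, 400)]
holds, lats, dd = keystroke_features(sample)
print(holds)  # [90, 80, 90]
print(lats)   # [60, 80]
print(dd)     # [150, 160]
```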

IMPLEMENTATION DETAILS
When a user types his authentication code, there is a particular rhythm or manner in which the code is typed. Provided this rhythm does not change abruptly, its uniqueness can be used as an additional security constraint. It has been shown experimentally that the manner of typing the same code varies from user to user, so it can serve as a suitable biometric. Further, if the user knows beforehand about the existence of this mechanism, he can intentionally introduce a rhythm to suit his needs.

The mechanism: When the user logs onto the system for the first time, a database entry is created for the user. He is then put through a training period, which consists of 15-20 iterations. During this time, the inter-stroke timings of all the keys of the identification code are obtained, and their mean and standard deviation are calculated. The standard deviation gives the user some leeway when typing the code. The system incurs the additional overhead of maintaining the database, which contains all the users' information. These details can also be incorporated into the system's password files in order to avoid this additional overhead. The inter-stroke interval between the keys is measured in milliseconds. The system's delay routine can serve this purpose: the delay routine measures in milliseconds, and the amount of delay accumulated between successive strokes can be used as a counter to record the time interval.

As in any other system, a new user is asked to register in order to add his name to the database. The only difference is that he has to go through a training period of about 15-20 iterations, during which the reference level and the deviation for the user are obtained. The reference level that we chose is the mean of the training period, and the rounded standard deviation is used as the leeway allotted per user. These values are stored in the user's database entry.

Fig. 1. Authentication. Dotted areas are added to the normal authentication procedure.

The mean and the standard deviation can be determined by using the relationships given below, where x(i) is the inter-stroke timing recorded in the i-th training iteration and n is the number of iterations:

$$\text{Mean} = \frac{1}{n}\sum_{i=1}^{n} x(i)$$

$$\text{Standard Deviation} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl[x(i) - \text{Mean}\bigr]^{2}}$$

Once the database entry has been created for the user, it can be used in all further references to the user. The next time the user tries to log in, the entered inter-stroke timings are obtained along with the password. A combination of both is used as a security check of the user.
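
As a minimal sketch of this calculation, the function below computes the mean and the rounded standard deviation separately for each inter-stroke gap of the code. The function name, the list-of-lists layout, and the sample timings are assumptions made for illustration; only the two formulas come from the text.

```python
import math


def build_profile(training_samples):
    """Return per-gap (mean, rounded standard deviation) pairs.

    training_samples is a list of attempts; each attempt is a list of
    inter-stroke intervals in milliseconds, one per gap in the code.
    """
    n = len(training_samples)
    profile = []
    for gap_values in zip(*training_samples):  # all attempts for one gap
        mean = sum(gap_values) / n
        variance = sum((x - mean) ** 2 for x in gap_values) / n
        profile.append((mean, round(math.sqrt(variance))))
    return profile


# Three hypothetical training attempts for a code with two gaps.
attempts = [[100, 200], [110, 220], [120, 240]]
print(build_profile(attempts))  # [(110.0, 8), (220.0, 16)]
```

A real training period would use the 15-20 iterations described earlier; three attempts are used here only to keep the arithmetic easy to follow.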

ALGORITHM:
The algorithm given in the following section gives the details of obtaining the authorization for a particular user. The algorithm assumes that the database already exists in the system and that a system delay routine is available.

Input: User database, user name, password.
Output: Acceptance of the user if registered, or registration of a new user.

main()
{
    if (user == new) {
        // register the user and add the user to the database
        obtain the password;
        add_user(database);
    } else {
        read(user);
        read(deviation);
        if (usercount < 20) {
            // still in the training period: use a fixed deviation
            deviation = 30;
            increment usercount;
            add_userpass(database);
        } else {
            read(deviation);
        }
        if (check(user, password, deviation)) {
            login;
        } else {
            exit(0);
        }
    }
}
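
The control flow above can be sketched in runnable form as follows. This is one possible reading of the pseudocode, not the report's implementation: the dictionary layout of the database, the unit of the fixed deviation (taken to be milliseconds), the per-gap comparison inside the check, and the choice to learn only from accepted attempts are all assumptions.

```python
import math

TRAINING_ATTEMPTS = 20   # from the pseudocode: usercount < 20
DEFAULT_DEVIATION = 30   # fixed tolerance while training (assumed to be ms)


def login(database, user, password, timings):
    """Return 'registered', 'accepted', or 'rejected'.

    database maps user name -> record dict (layout assumed for this sketch).
    timings is the list of inter-stroke intervals (ms) for this attempt.
    """
    record = database.get(user)
    if record is None:  # new user: register
        database[user] = {"password": password, "samples": [timings]}
        return "registered"
    samples = record["samples"]
    if password != record["password"] or len(timings) != len(samples[0]):
        return "rejected"
    n = len(samples)
    columns = list(zip(*samples))  # one tuple of timings per gap
    means = [sum(col) / n for col in columns]
    if n < TRAINING_ATTEMPTS:  # training period: fixed deviation
        deviations = [DEFAULT_DEVIATION] * len(means)
    else:  # afterwards: learned, rounded standard deviation
        deviations = [
            round(math.sqrt(sum((x - m) ** 2 for x in col) / n))
            for col, m in zip(columns, means)
        ]
    ok = all(abs(t - m) <= d for t, m, d in zip(timings, means, deviations))
    if ok and n < TRAINING_ATTEMPTS:
        samples.append(timings)  # assumption: learn only from accepted attempts
    return "accepted" if ok else "rejected"


db = {}
print(login(db, "alice", "s3cret", [120, 200, 150]))  # registered
print(login(db, "alice", "s3cret", [130, 190, 160]))  # accepted
print(login(db, "alice", "s3cret", [300, 190, 160]))  # rejected
```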

ANALYSIS OF INTER-KEYSTROKE TIMING OF USER CODE


A graph is plotted between keystrokes and keystroke timing. The x axis indicates the number of inter-keystrokes and negative y axis indicates the inter-keystroke timing in milliseconds.

Graph 1: Inter-keystroke timing analysis when the user is accepted

Graph 2: Inter-keystroke timing when the user is not legal, or is not following his rhythmic behavior.

User accepted:
Graph 1 shows the inter-keystroke timing analysis when the user is accepted. Here it can easily be seen that when the user is authentic and types in his normal rhythm, his timings fall within the predefined ranges. The current inter-keystroke timing lies close to the inter-keystroke time stored in the database, well within the predefined ranges. With suitable ranges, FAR and FRR can be reduced to a great extent so that only the legal user gets access to the system. The +R boundary and the -R boundary give the desired range. In the graph, the yellow line indicates the current pattern of typing the access code on the keyboard, the green line indicates the keystroke pattern according to the reference level, and the red and blue lines indicate the positive and the negative ranges. The ranges can be decided by the standard deviation method.
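
The comparison that the graphs visualise can be sketched as a per-gap band check. The sketch assumes that the range R for each gap is the rounded standard deviation stored at training time; the function name and all numeric values are hypothetical.

```python
def out_of_range_gaps(current, reference, ranges):
    """Return indices of gaps whose timing leaves the [ref - R, ref + R] band."""
    flagged = []
    for i, (t, ref, r) in enumerate(zip(current, reference, ranges)):
        lower, upper = ref - r, ref + r  # the -R and +R boundaries
        if not lower <= t <= upper:
            flagged.append(i)
    return flagged


reference = [110, 220, 180]  # hypothetical stored means (ms)
ranges = [8, 16, 12]         # hypothetical rounded standard deviations (ms)

print(out_of_range_gaps([105, 230, 185], reference, ranges))  # []
print(out_of_range_gaps([150, 225, 140], reference, ranges))  # [0, 2]
```

An empty result corresponds to the accepted case; any flagged gap corresponds to a current pattern that crosses one of the boundaries.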

User not accepted:


Graph 2 shows the inter-keystroke timing when the user is not legal or is not following his usual rhythm when typing the access code. It can easily be seen that when the user is not legal, his typing pattern for the access code falls outside the predefined ranges.

THE CURRENT STATE OF KEYSTROKE DYNAMICS


Keystroke verification techniques can be classified as either static or continuous. Static verification approaches analyze keystroke characteristics only at specific times, for example during the login sequence. Static approaches provide more robust user verification than simple passwords, but do not provide continuous security; they cannot detect a substitution of the user after the initial verification. Continuous verification, in contrast, monitors the user's typing behavior throughout the course of the interaction.

Researchers have been studying the use of habitual patterns in a user's typing behavior for identification since as early as 1980. To our knowledge, Gaines was the first to investigate the possibility of using keystroke timings for authentication. Experiments were conducted with a very small population of seven secretaries. A test of statistical independence of their profiles was carried out using the t-test under the hypothesis that the means of the digraph times at both sessions were the same, but the variances different. While the approaches of Gaines and Leggett address a number of problems inherent in identity verification via keystroke timings, there was considerable room for improvement. For example, the pooled variance estimate used is meaningful only when there is homogeneity of variance across all reference digraph latencies; however, studies by Mahar show that there is significant variability in the way typists produce each digraph, and hence the use of a pooled estimate of digraph latency variability is inappropriate.

An additional limitation of the digraph-latency-based technique is the use of a single low-pass temporal filter for all typists for the removal of outliers. The rationale for this approach is that digraphs with abnormally long latencies are not likely to be representative of the authorized user's typing. While this seems a reasonable assumption, it has recently been shown that one filter value for all typists does not yield optimal performance. Furthermore, empirical data from Gentner suggests that the median inter-key latency of expert typists is approximately 96 ms, while that of novice typists is near 825 ms. Therefore, the 500 ms low-pass filter used excludes many keystrokes typical of a novice typist while including many keystrokes that are not representative of an expert typist. Studies showed that the use of digraph-specific measures of variability instead of one low-pass filter can lead to measurable improvements in verification accuracy.

Moreover, this approach to keystroke verification uses the key down-to-down time as the base unit of measure, but this measure may be further divided into two orthogonal components: the total time the first key is depressed (i.e., keystroke duration), and the time between the release of one key and the press of the next (i.e., keystroke latency). Previous work used these two components in verification systems. However, the initial sample sets did not provide enough data to ascertain whether the use of the two separate orthogonal digraph components added significant predictive power to the more traditional key down-to-down measure. Substantially improved performance results were achieved using the bivariate measure of latency with an appropriate distance measure.
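
The difference between one fixed low-pass filter and a digraph-specific outlier rule can be illustrated with a minimal sketch. The 500 ms cutoff is the value quoted above; the rule of dropping samples more than k standard deviations from the digraph's own mean, the value of k, and the sample latencies are assumptions chosen for illustration and are not the method of the cited studies.

```python
import statistics


def lowpass_filter(latencies, cutoff_ms=500):
    """Drop latencies above one fixed cutoff, the same for every typist."""
    return [x for x in latencies if x <= cutoff_ms]


def per_digraph_filter(latencies, k=1.5):
    """Drop latencies more than k standard deviations from this digraph's mean.

    k is an assumed tuning parameter for this sketch.
    """
    if len(latencies) < 2:
        return list(latencies)
    mean = statistics.mean(latencies)
    sd = statistics.pstdev(latencies)
    return [x for x in latencies if abs(x - mean) <= k * sd]


# Hypothetical latencies (ms) of one novice typist for the digraph "th";
# the last value is a pause, not typical typing.
novice_th = [780, 820, 850, 830, 2400]

print(lowpass_filter(novice_th))      # [] (every sample is discarded)
print(per_digraph_filter(novice_th))  # [780, 820, 850, 830]
```

With these made-up numbers the fixed cutoff throws away all of the novice's data, while the digraph-specific rule removes only the pause.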

Some neural network approaches have also been undertaken in the last few years. While the back-propagation models used yield favorable performance results on small databases, neural networks have a fundamental limitation in that each time a new user is introduced into the database, the network must be retrained. For applications such as access control, the training requirements are prohibitively expensive and time consuming. Furthermore, in situations where there is a high turnover of users, the down time associated with retraining can be significant. A promising research effort in applying keystroke dynamics as a static authentication method is the work of Joyce and Gupta. Their approach is relatively simple and yields impressive results. The performance results reported here are based on a database of profiles collected over a period of 11 months. Data for 63 users was collected on a variety of Sun Workstations at NYU and Bell Communications Research. Typing proficiency was not a requirement in this study although almost all participants were familiar with computers. Unlike previous studies in which the observers had complete control over the collection of the data, participants ran the experiment from their own machines at their convenience. Participants downloaded and executed the experiment on their local machines and the results were automatically encoded and electronically mailed back. Figure (1) shows an example of a profile received for a user in the data set. An alternate representation showing plots of the covariance matrices (of the keystroke latencies for a particular feature set) for different users over different time intervals is shown in figure 2.

Fig.1. Example reference profile. The top n most frequent features in the pattern vector are shown on the X-axis. The user's keystroke latencies, as well as keystroke durations, are graphed above. The graphs show that on average, the user depresses keys for a longer period than it takes him/her to type them.

Data Extraction
To evaluate the behavior and performance of each of the classifiers presented, we developed a C++ toolkit for analyzing the data. The toolkit was built using the XView library routines and serves as a front-end to the main recognition engine. The toolkit is helpful in diagnosing system behavior and can generate graphical output for both the MATLAB and Gnuplot systems. Figure (3) is from the main panel of the toolkit. The data extraction toolkit provides a quick way to establish rough properties of the data set by partitioning the users into distinct groups. Our clustering criterion represents a heuristic approach that is guided by intuition. Users are clustered into groups comprising (possibly) disjoint feature sets in which the features in each set are pairwise correlated.

Fig.2. Plots (a) and (b) depict the covariance matrices for the same user at two different time intervals across the same set of features. Plot (c) shows the covariance matrix for a different user over the same set of features.

Feature sets are determined through Factor Analysis (FA). Factor analysis seeks a lower dimensional representation that accounts for the correlation among features. This idea partitions the database of users into subsets whose in-class members are "similar" in typing rhythm over a particular set of features and whose cross-class members are dissimilar in the corresponding sense. For example, members of group i may exhibit strong individualistic typing patterns for features in the set S = {th; ate; st; ion}, whereas members of group j may be more distinctive over the features S = {ere; on; wy}.

Fig. 3. To automate the data selection and extraction process, a system toolkit was designed to assist in the visualization, tuning, and overall analysis of the data. A graphical user interface with various tunable options allows the operator to diagnose the performance of each of the classifiers in detail. The above is a snapshot from the main panel of the interface.

FUTURE DEVELOPMENTS
The keystroke technique requires the training period to be kept separate from the functional period of the system. This does not account for behavioural changes of the individual. System performance can be further enhanced if the training period is extended throughout the life of the system and the reference level is constantly updated to reflect the behavioural changes of the user. Biometrics using the behavioural aspects of an individual are generally prone to error, because human behaviour is unpredictable at any given time. Even if we neglect the freak incidents (termed noise), which might be due to the fluctuating moods of the person, one has to account for the changes that occur in the long run. This is where an adaptive algorithm becomes useful. This algorithm can be a simple value predictor based on ordinary curve fitting, or it can be an application of an artificial neural network (ANN). Though the first approach is easy to model, the second approach provides a more accurate analysis due to the inherent compatibility between the ANN approach and the behavioural aspects of a normal individual.

The implementation was done on a comparatively small scale and the results obtained were based on limited observation. It can be extended to a larger scale, such as a campus network, over a considerably longer period of time. This would provide more convincing evidence of the FAR and FRR and allow a better comparison with existing applications.
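
One very simple way to keep the reference level up to date is an exponential moving average over accepted attempts, sketched below. This is a swapped-in illustration of the adaptive idea, not the curve-fitting or ANN approach mentioned above; the function name and the smoothing factor alpha are assumptions.

```python
def update_reference(reference, accepted_timings, alpha=0.1):
    """Blend one accepted attempt into the stored reference level.

    alpha (assumed) controls how quickly the profile follows the user:
    0 never adapts, 1 replaces the reference with the latest attempt.
    """
    return [
        (1 - alpha) * ref + alpha * t
        for ref, t in zip(reference, accepted_timings)
    ]


reference = [100.0, 200.0]  # hypothetical stored means (ms)
reference = update_reference(reference, [110.0, 190.0], alpha=0.5)
print(reference)  # [105.0, 195.0]
```

Updating only after accepted logins keeps a rejected, possibly hostile, attempt from pulling the reference towards an intruder's rhythm.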

APPLICATIONS
Keystroke dynamics has many applications in the computer security arena. One area where the use of a static approach to keystroke dynamics may be particularly appealing is in restricting root-level access to the master server hosting a Kerberos [21] key database. Any user accessing the server is prompted to type a few words or a pass phrase in conjunction with his/her username and password. Access is granted if his/her typing pattern matches that of the claimed identity within a reasonable threshold. This safeguard is effective as there is usually no remote access allowed to the server, and the only entry point is via console login. Alternatively, dynamic or continuous monitoring of users while they access highly restricted documents or execute tasks in environments where the user must be "alert" at all times (for example, air traffic control) is an ideal scenario for the application of a keystroke authentication system. Keystroke dynamics may be used to detect uncharacteristic typing rhythm (brought on by drowsiness, fatigue, etc.) in the user and notify third parties.

ADVANTAGES
1) Verification is based on how a person types, in particular their rhythm. Even if intruders guess the correct password, they cannot type it in with the proper rhythm.
2) It is a transparent system, which uses the familiar keyboard. Users adapt to this method quickly because it is easy and unobtrusive.
3) The keystroke biometrics technique is software based, requiring no external hardware. It is just as easy as installing a new program.
4) System administrators using this technique can easily expand the number of users who are required to authenticate themselves, thereby integrating with and strengthening the existing security systems.

CONCLUSION:
Keystroke biometrics is a foolproof security solution. Even if an unauthorized user discovers the access code, he cannot gain access to the system unless he can also reproduce the typing rhythm. The disadvantage of this biometric is that if the user loses his rhythm abruptly, he will not be able to access the system.

REFERENCES:
1) IEEE Transactions on Pattern Analysis and Machine Intelligence.
2) Electronics For You.
3) IEEE Spectrum.
4) www.ieee.org

ABSTRACT:
Global access to information and resources is becoming an integral part of nearly every aspect of our lives. Unfortunately, with this global network access comes an increased chance of malicious attack and intrusion. In an effort to confront the new threats unveiled by the networking revolution of the past few years, reliable, rapid, and unintrusive means for automatically recognizing the identity of individuals are now being sought. Verification of user identity, also known as user authentication, is necessary for secure financial transactions. In this paper we examine an emerging non-static keystroke biometric technique that aims to identify users by analyzing habitual rhythm patterns in the way they type. Keystroke biometrics provides a foolproof authentication solution.

CONTENTS:
1) Introduction.
2) Keystroke biometrics on a keyboard.
3)
   A : Implementation details.
   B : Algorithm.
4) Analysis of inter-keystroke timing of user code.
5) The current state of Keystroke Dynamics.
6) Future developments.
7) Applications.
8) Advantages.
9) Conclusion.
10) References.
