Statistical Models Based Password Candidates Generation For Specified Language Used in Wireless LAN Security Audit

Jan Krekan, Matus Pleva and Lubomir Dobos
Department of Electronics and Multimedia Communications,
Faculty of Electrical Engineering and Informatics,
Technical University of Kosice,
Park Komenskeho 13, Kosice, Slovakia
Email: Jan.Krekan@tuke.sk, Matus.Pleva@tuke.sk, Lubomir.Dobos@tuke.sk

Abstract: This article describes a new method, proposed as a best practice, for creating very effective password candidate lists for a specified language. Such lists can also be used to test the security level of wireless networks protected by the WPA/WPA2 PSK standards. The main principle of the technique is to create a statistical model of the target language and use it to generate password candidates in a controlled order for the security audit of a wireless network. The list starts with the more probable combinations and proceeds to the less probable ones, so the approach can be seen as sorting the Brute-force candidates according to the specified language, or as predicting the usage of letter combinations from the statistics of that language. Tests have shown that generating the more probable combinations first improves the procedure: it finds about 70% of passwords about 15 times faster than a common Brute-force attack, compared with about 20% effectiveness of older dictionary attacks.

I. INTRODUCTION

Efficient protection of information stored on or transferred over wireless networks is very important. Anyone who wants to protect information in an organization effectively needs methods for testing the strength of the data protection used in a complex community system, in order to prevent unauthorized access to that information. These methods are also used for recovering passwords that were lost (or changed by an attacker), for forensic examination and for data recovery.

The main motivation of this research is the WLAN security audit task of the international FP7 project INDECT [1]. Since testing WPA handshakes (described in the next section) is a very computationally intensive process, even with GPU acceleration, efficiency is crucial. The new approach tries to predict the passwords (not only dictionary ones) chosen by the WLAN user, given that the mother tongue affects passwords in all security applications, not only in WPA handshakes. This approach is important when dictionary attacks are not effective, for reasons described later in the paper.

Research by leading companies that develop password recovery and forensic solutions showed [2], [3], [4] that many well-selected passwords which are not common words or phrases (non-dictionary words) tend to be "remember-able". When something is "remember-able" or "human friendly", it is possible to create a model of "meaningful" passwords and try them instead of testing all possible combinations of characters (Brute-force attack) [5], [6]. Leading companies have implemented special techniques to reveal these passwords, called Advanced dictionary search attack [2], Smart Force Attack [3] or Xieve™ [4], for major languages such as Arabic, Dutch, English, French, German, Italian, Portuguese, Russian and Spanish, but there is only a very limited possibility that another language will be added on demand.

This article describes a new technique which allows very efficient lists of possible passwords to be created for any language. The password candidates are derived from a word-list which contains as many meaningful words of the required language as possible (a text corpus of the kind used for typical language modeling). This word-list serves as input to an open source statistical algorithm available for research [7], which then generates combinations of characters (the password candidates list) based on that language. The algorithm was developed and presented at last year's Passwords^12 conference in Oslo [8].

The second very important feature of this algorithm is that the generated password candidates list does not need to be sorted afterwards, because the candidates are produced from the most probable passwords to the less probable ones.

The total boost in password recovery performance for a newly chosen language using this statistical approach is a speedup of around 14.33 times (1433%) at 70% success, compared with the standard recovery methods described below.

This article describes research carried out during the development of a WLAN security audit solution. Security audits are described in the next section, followed by statistics of password usage. Then the new statistical method of password candidate generation is proposed, and finally the conclusion and future work are discussed.
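
The 1433% figure quoted in the introduction corresponds to a speedup factor of about 14.33 and can be reproduced from the candidate counts reported in Table II. The following Python sketch is only a worked check of that arithmetic. The names TABLE_II and speedup are illustrative and not part of the original tooling, and the sketch assumes that the "all password candidates" row counts lowercase-only strings (26 letters), which matches the reported totals.

```python
# Candidate counts copied from Table II:
# {password length: (all candidates, candidates tested for 70% success)}
TABLE_II = {
    5: (11_881_376, 1_564_923),
    6: (308_915_776, 30_712_430),
    7: (8_031_810_176, 570_967_275),
    8: (208_827_064_576, 14_572_531_443),
}


def speedup(all_candidates: int, tested: int) -> float:
    """Speedup versus exhaustive search: keyspace size over candidates tested."""
    return all_candidates / tested


for length, (total, tested) in TABLE_II.items():
    # Assumption: the keyspace is 26**length (lowercase letters only).
    assert total == 26 ** length
    share = 100 * tested / total
    print(f"{length} chars: {share:.2f}% of the list tested, "
          f"speedup {speedup(total, tested):.2f}x")
```

Under these assumptions the speedup is simply the inverse of the fraction of the candidate list that has to be tested before 70% of the passwords are found.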

978-1-4799-0944-5/13/$31.00 ©2013 IEEE

Authorized licensed use limited to: University of West London. Downloaded on August 13,2023 at 04:07:05 UTC from IEEE Xplore. Restrictions apply.
[Figure 2: block diagram. The password and the ESSID/SSID enter the WPA PSK generation step (4096 x hashing); the result is compared with the captured data.]

Fig. 2. WPA PSK hash key cracking using the password, the SSID and a multiple hashing function.
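
The key derivation summarized in Fig. 2 can be sketched in Python. This is a minimal illustration, not the tooling used in the paper. It assumes the standard WPA/WPA2-PSK derivation (PBKDF2 with HMAC-SHA1, the SSID as salt, 4096 iterations, a 256-bit key), and the passphrase and SSID values are made up. The comparison against the captured handshake (deriving the session keys and checking the message integrity code) is omitted. The second function, whose name is also hypothetical, repeats the exhaustive-search estimate for 8-character passwords over 36 symbols at an assumed rate of 50 000 candidates per second.

```python
import hashlib


def wpa_psk(passphrase: str, ssid: str) -> bytes:
    """Derive the 256-bit WPA/WPA2 pre-shared key from a passphrase and SSID."""
    return hashlib.pbkdf2_hmac(
        "sha1",
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),  # the SSID acts as the salt
        4096,                  # iteration count ("4096 x hashing" in Fig. 2)
        32,                    # key length in bytes
    )


def exhaustive_search_days(alphabet_size: int, length: int,
                           rate_per_second: float) -> float:
    """Time in days needed to try every password of the given length."""
    return alphabet_size ** length / rate_per_second / 86_400


if __name__ == "__main__":
    # Hypothetical candidate and network name, for illustration only.
    print(wpa_psk("mestecko", "ExampleNet").hex())
    print(f"{exhaustive_search_days(36, 8, 50_000):.0f} days")
```

Because every candidate costs 4096 HMAC-SHA1 iterations and the SSID is mixed in as salt, keys cannot be precomputed once for all networks, which is why the order in which candidates are tried matters so much.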

Fig. 1. Authentication and Association state machine of the WLAN client [10]

II. WLAN SECURITY AUDIT METHODS OF WPA-PSK PROTECTED NETWORKS

To start the security audit of a WLAN network, the auditor must obtain the WPA-PSK 4-way handshake [9] from the communication of clients which are connecting to the audited access point. This can be done in two ways: by passive sniffing, or by a deauthentication attack [10].

A. Passive sniffing

First, an application such as kismet¹ is used to identify the channel on which the desired network is broadcasting. When the channel is identified, a sniffer such as airodump-ng² starts to capture the traffic broadcast on that channel. The captured data are then passed to an application such as Aircrack-ng³ or the professional GPU-accelerated EWSA⁴ to see whether they contain the desired information. This method is quite slow, since the auditor must wait for a legitimate user to connect to the network: that is the moment when the data required for the audit are transmitted.

B. Deauthentication attack

The first step is similar to the previously discussed method: the channel used by the network is identified.

Secondly, a sniffer such as Commview for WiFi⁵ or the airodump-ng mentioned earlier is used to see whether the access point has active clients connected to it. This attack is applicable only when at least one client is connected to the access point or network (group of access points) being audited. When at least one client is connected to the network, the deauthentication attack may be launched.

The deauthentication attack is performed by sending deauthentication packets to the client that is connected to the desired AP. Alternatively, a strong noise source in the transmission channel for a while could cause all clients to lose connectivity [11]. To perform the deauthentication attack it is necessary to know the MAC address of the client that is going to be disconnected. Next, the disconnect packets are sent using tools such as aireplay-ng⁶ or Commview for WiFi's Node re-association tool.

When the attack is successful, the disconnected station simply reconnects and the desired WPA handshake is captured. If the handshake is not captured, the process needs to be repeated until it succeeds. Once the handshake is captured, the WLAN network security audit may be launched.

C. Handshake decryption

Because WPA uses robust encryption without known weaknesses, the only way to recover a password is by trying password candidates one by one. This process is very slow because each password must be hashed multiple times (more than 4000 times), as described in [9], [10] or [12] and depicted in simplified form in Figure 2.

A single Intel QX 9550 processor (quad core) can test around 2000 passwords per second. At such a low speed, a Brute-force attack (i.e. trying all possible character combinations) does not make sense, as the minimum allowed WPA password length is eight characters.

Let us assume that the password is composed of lowercase letters and digits and that its length is 8 characters. The total number of passwords which must be tested to guarantee that the password will be found is 36^8, which is 2 821 109 907 456. GPU powered methods could test around 50 000 passwords per second, which means that the total time required for password recovery would be 653 days [12]. This is the main reason why this paper discusses faster statistical methods for performing security audits of WPA/WPA2-PSK protected networks.

III. COMMON PASSWORDS STATISTICS

The data summarized in Table I show the usual behavior of people when they choose passwords for authentication. The passwords for this research were revealed from a server authentication file for which people were not forced to choose "secure" passwords (for example, secure passwords should contain at least one special character or number).

1 http://www.kismetwireless.net/
2 http://www.aircrack-ng.org/doku.php?id=airodump-ng
3 http://www.aircrack-ng.org/
4 http://www.elcomsoft.com/ewsa.html
5 https://www.tamos.com/products/commwifi/
6 http://www.aircrack-ng.org/doku.php?id=aireplay-ng

TABLE I
PASSWORD STATISTICS FROM A SMALL SLOVAK SERVER SUITE, REVEALED PREVIOUSLY; THE USERS WERE AFTERWARDS NOTIFIED TO CHANGE THEM BECAUSE THE PASSWORD POLICY WAS TIGHTENED

Password type           Found   Percent of all
dictionary based        34      38.6%
numbers only            5       5.6%
dictionary mutations    6       6.8%
language statistics     22      25.0%
total found             67      76.1%
total passwords         88      100%

As Table I shows, 34 of the 88 passwords could be revealed using a trivial dictionary attack. Searching numbers only found 5 passwords, and standard dictionary mutations found only 6 more.

After the standard dictionary and number based tests, 43 passwords remained unrevealed. After the new statistical method for password candidate generation was implemented and tested, 22 of the remaining 43 passwords were found, which is more than 50% of the remaining passwords.

By analyzing passwords from a much larger Slovak password database published on the Internet [14], we came to results very similar to those from the previously mentioned source.

The database contained more than 20 000 passwords in total. The top 8 most used passwords could be revealed using any regular dictionary. Of course, all 8 of those passwords could also be revealed using our statistical approach.

These most used passwords include first names, surnames, simple words and slang words. People often try to secure their passwords by appending digits to the end of the word [15]. This cannot be considered secure, because one can take our statistically generated password candidates and feed them to a WPA password recovery application such as EWSA, which supports password mutations including digit mutation.

These statistics show clearly why we decided to implement this technique: as mentioned earlier, the Slovak language is very complicated for the standard word permutation methods, and this approach allows effective password testing for such flective languages as well [16].

IV. CREATING A STATISTICAL TABLE FOR A CERTAIN LANGUAGE

In order to create a statistical table which describes a mathematical model of a given language, it is necessary to build a dictionary which contains as many words of that language as possible. In our experiment we used the complex Slovak wordlist, which contained all words including their inflectional affixes (prefixes and suffixes), produced using the Aspell [17] command aspell -l cs dump master | aspell -l cs expand | tr ' ' '\n' > sk.dic. After that the diacritics can be removed using cstocs⁷, iconv or the Diakritika tool. In our case the diacritics were removed and the wordlist was sorted so that it contained only unique words after conversion.

Flective languages such as Slovak and Czech are more complicated than, for example, English, because they allow the creation of words which are the result of word mangling [18]. Consider the word "mesto", in English a town. To say a small town, a Slovak speaker will use "mestecko". This example clearly shows the difference: while English uses a second word to express a property of the following word, languages like Slovak can express it by appending some characters to the original word [19]. This is the reason why the leading companies which develop software for auditing WPA-PSK protected networks do not support those languages in their statistical attack implementations [4].

The technique that we used treats text as a series of characters. The process of creating a statistical table for a given language starts by counting the characters used in the wordlist which serves as input [13]. Then the wordlist is read to create a table which includes information about the lengths of the words and about the frequencies of occurrence of character sequences in the input words.

The program we used to create this table, John the Ripper, comes from the open source security community [7]. The attack in John which implements this statistical approach is called the Incremental attack and is described in detail in [8]. The exact command used to build the statistical table in our test was: john --make-charset=custom.chr. The actual wordlist which served as input was located in the file john.pot. The actual password candidates list based on the newly created table was generated using: john -i=custom --stdout > candidates.txt.

After the password candidates list was generated, it was clear that it contained the original words as well as words derived from the original ones, as explained in the example. A table of success shows clearly that this method is also very efficient for languages like Slovak.

For 8-character passwords, the password candidates list achieved a success of 70% of passwords found within the first 6.97% of the whole candidates list. In other words, roughly 7 of 10 passwords can be recovered using this method about 14 times faster than by trying all possible 8-character combinations (Brute-force attack). The success of the algorithm for different password lengths is shown in Table II for a real password list of Slovak users (different from, and larger than, the set described in the previous section). We compare the success of the password candidates list generated by our method at the point where more than 70% of passwords were found, and we can see that the speedup grows with the number of characters. We plan to repeat these tests with different test sets (real passwords publicly available) and to continue testing passwords longer than 8 characters.

7 available as executable in Linux distributions or as a Perl package
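
The idea of building character statistics from a wordlist and emitting candidates from most to least probable can be illustrated with a small Python sketch. It is a deliberately simplified first-order model (first-character and character-pair frequencies combined with a best-first search) and is not the algorithm implemented by John the Ripper's Incremental mode. The function names and the four-word toy wordlist are assumptions made for this example.

```python
import heapq
import math
from collections import Counter, defaultdict


def train(words):
    """Count first characters and character-pair transitions in a wordlist."""
    first = Counter()
    pairs = defaultdict(Counter)
    for word in words:
        if not word:
            continue
        first[word[0]] += 1
        for a, b in zip(word, word[1:]):
            pairs[a][b] += 1
    return first, pairs


def _costs(counter):
    """Turn raw counts into costs: -log(relative frequency)."""
    total = sum(counter.values())
    return {ch: -math.log(n / total) for ch, n in counter.items()}


def candidates(first, pairs, length):
    """Yield strings of the given length, most probable first.

    Best-first search over prefixes: costs are non-negative, so a complete
    string popped from the heap is at least as probable as anything left.
    """
    step = {a: _costs(c) for a, c in pairs.items()}
    heap = [(cost, ch) for ch, cost in _costs(first).items()]
    heapq.heapify(heap)
    while heap:
        cost, prefix = heapq.heappop(heap)
        if len(prefix) == length:
            yield prefix
            continue
        for ch, extra in step.get(prefix[-1], {}).items():
            heapq.heappush(heap, (cost + extra, prefix + ch))


if __name__ == "__main__":
    # Toy wordlist (diacritics already removed), for illustration only.
    first, pairs = train(["mesto", "mestecko", "mesta", "most"])
    for candidate in candidates(first, pairs, 5):
        print(candidate)
```

On this toy wordlist the generator emits, besides the input words "mesto" and "mesta", strings such as "meste" and "mosta", which are not in the input but follow its character statistics. A real table would be built from a full wordlist and would use richer statistics than single character pairs.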

TABLE II
SPEEDUP OF THE STATISTICAL ATTACK PREPARED FOR THE SLOVAK LANGUAGE COMPARED WITH THE BRUTE-FORCE ATTACK, FOR 70% OF REVEALED PASSWORDS. THE TEST SET COMES FROM A REAL PASSWORD LIST OF A HACKED SLOVAK WEBHOSTING SERVICE AVAILABLE ON THE WEB [14]; THE SUCCESS OF A COMMON DICTIONARY ATTACK ON THE WHOLE TEST SET IS ALSO SHOWN

Password length [# of chars]                  5             6               7                 8
All password candidates                       11 881 376    308 915 776     8 031 810 176     208 827 064 576
Passwords found (our method)                  1113          1173            673               903
Candidates tested (for 70% success)           1 564 923     30 712 430      570 967 275       14 572 531 443
Dictionary attack success                     34.78%        16.90%          18.52%            12.02%
Overall speedup versus Brute-force attack     7.59          10.06           14.07             14.33

V. CONCLUSION

Research on password usage in Slovakia showed that the total number of weak passwords is still high. What is worse, many weak passwords are used by organizations, schools and companies rather than by individuals. For example, many organizations still have not applied a password policy which prevents users from having simple passwords such as words, names or numbers.

This is the reason why the statistical methods described in this paper are important. They allow password strength to be tested in a very efficient way, which:
1) Saves time and costs by making the password audit of wireless networks much faster.
2) Allows testing and revealing passwords that seem strong and unbreakable to traditional attacks but not to the attack described in this paper.
3) Finally, also saves energy of the infrastructure.

Another important fact that should be mentioned is that tracking down illegal activities done via badly secured WiFi networks is almost impossible. Finally, it should be said that properly chosen passwords which are regularly changed are one of the key aspects of a secure computing environment.

Recently we developed a 7 times faster evaluation algorithm for finding passwords in the password candidates lists (on a Xeon server with 12 GB RAM, for the 8-character password list), so we plan to continue the tests for 9 to 16 characters (a typical password length for a non-experienced user) and, depending on the results and time consumption, up to the maximum WPA length of 63 characters.

Finally, we are considering including word occurrence statistics from language models derived from the huge Slovak databases used for continuous speech recognition [20], [21] in the phase of generating the character combination statistics produced by John the Ripper. Comparing the results will show whether character combinations from widely used words are more probable for constructing passwords than the rare ones.

ACKNOWLEDGMENT

The research presented in this paper was supported by the R&D Operational Program funded by the ERDF under project numbers ITMS-26220220155 (40%) & ITMS-26220220141 (40%) and by the EU ICT project INDECT (FP7 - 218086) (20%).

REFERENCES

[1] http://www.indect-project.eu/ Accessed May 2013.
[2] http://www.crackpassword.com/ Accessed Nov 2011.
[3] http://www.lastbit.com Accessed Dec 2011.
[4] http://www.lostpassword.com/attacks.htm Accessed Nov 2012.
[5] D. Florencio, C. Herley, A large-scale study of web password habits, 16th International World Wide Web Conference, WWW2007, Banff, AB, Canada, 8–12 May, pp. 657–666, 2007.
[6] G.B. Duggan, H. Johnson, B. Grawemeyer, Rational security: Modelling everyday password use, International Journal of Human Computer Studies, Vol. 70 (6), pp. 415–431, 2012.
[7] http://www.openwall.com/ Accessed Jan 2013.
[8] S. Marechal, Distributable probabilistic candidate password generators, Passwords^12 conference, Dec. 3 - 5, University of Oslo, Norway, presentation (2012) http://openwall.com/presentations/Passwords12-Probabilistic-Models/ Accessed Jan 2013.
[9] I.P. Mavridis, A.-I.E. Androulakis, A.B. Halkias, P. Mylonas, Real-Life Paradigms of Wireless Network Security Attacks, Informatics (PCI), 2011 15th Panhellenic Conference on, pp. 112–116, Sept. 30 – Oct. 2, 2011.
[10] J. Krekan, L. Dobos, J. Papaj, Intrusion detection methods in wireless network systems, Journal of Electrical and Electronics Engineering, 4 (1), pp. 79–82, 2011.
[11] J. Papaj, Modification of DSR to implement SSV to the mobile ad-hoc network, SCYR, FEI TU, pp. 221–223, 2009.
[12] J. Krekan, L. Dobos, M. Pleva, Accelerated GPU Powered methods for auditing security of wireless networks using probabilistic password generation, Journal of Electrical and Electronics Engineering, 5 (1), pp. 111–114, 2012.
[13] M. Weir, S. Aggarwal, B. De Medeiros, B. Glodek, Password cracking using probabilistic context-free grammars, Proceedings - IEEE Symposium on Security and Privacy, art. no. 5207658, pp. 391–405, 2009.
[14] http://blackhole.sk/analyza-hesiel-zo-serveru-miesto-sk (in Slovak) Accessed Jan 2012.
[15] N. Christin, L.F. Cranor, Encountering stronger password requirements: User attitudes and behaviors, ACM International Conference Proceeding Series, 6th Symposium on Usable Privacy and Security, SOUPS 2010, Redmond, WA, USA, July 14–16, art. no. 2, 2010.
[16] J. Stas, D. Hladek, J. Juhar, D. Zlacky, Analysis of Morph-Based Language Modeling and Speech Recognition in Slovak, Advances in Electrical and Electronic Engineering, Vol. 10 (4), pp. 291–296, 2012.
[17] http://gpsfreemaps.net/navody/security/komplexni-cesky-a-slovensky-wordlist-ke-stazeni (in Czech) Accessed May 2012.
[18] D. Hladek, J. Stas, Text mining and processing for corpora creation in Slovak language, Journal of Computer Science and Control Systems, Vol. 3, no. 1, pp. 65–68, 2010.
[19] S. Ondas, J. Juhar, A. Cizmar, Extracting Sentence Elements for the Natural Language Understanding Based on Slovak National Corpus, Analysis of Verbal and Nonverbal Communication and Enactment, Lecture Notes in Computer Science, Vol. 6800, pp. 171–177, 2011.
[20] D. Hladek, J. Stas, J. Juhar, Rule-based morphological tagger for an inflectional language, International Training School on Cognitive Behavioural Systems, Lecture Notes in Computer Science, Vol. 7403, pp. 208–215, 2012.
[21] J. Stas, et al., Slovak language model from internet text data, Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces, Lecture Notes in Computer Science, Vol. 6456, pp. 340–346, 2011.
