2013 International Conference on Recent Trends in Information Technology (ICRTIT)

Pattern Analysis of Cipher Text: A Combined Approach

Shivendra Mishra
MTech-Embedded System, Amrita School of Engineering
Amrita VishwaVidyapeetham
Bangalore, India
E-mail: shivendra_mishra@comsoc.org

Dr. Aniruddha Bhattacharjya
CSE Department, Amrita School of Engineering
Amrita VishwaVidyapeetham
Bangalore, India
E-mail: a_bhattacharjya@blr.amrita.edu

Abstract—In this paper, we propose and implement a combined approach for identifying the cipher behind a given unknown sample of cipher text. In the first part of the system, cipher text samples are generated randomly using different cipher algorithms. In the second part, the system analyzes each sample through a) Block Length/Stream Detection, b) Entropy/Reoccurrence Analysis and c) a Dictionary and Decision tree based approach. All these blocks analyze the sample simultaneously. The block length/stream detection is done by counting the block length and comparing it with the patterns of known samples, whereas the entropy/reoccurrence analysis and the Dictionary/Decision tree based approaches rely on the characterization of a large data set. In the third part, the results of the different blocks are compared and a final result is generated.

Finally, we analyze the system with unknown cipher text samples of AES, DES and Blowfish, which are generated in random fashion and given as input to the system.

Keywords— AES, DES, Blowfish, Cipher-text, pattern analysis, Decision tree.

I. INTRODUCTION

The problem of data classification is not new, whether it concerns meaningful text or encrypted text, which has no apparent meaning. To define the problem addressed in this paper, we consider the scenario of fig.[1.a], where an ISP is connected to the WWW at one end and provides services to multiple subscribers at the other end. These subscribers are free to establish secure communication with any node in the network, as shown in fig.[1.b]; the traffic may be either encrypted or unencrypted. Since encrypted data comes from complex algorithms that are regarded as unbreakable, it has been widely used for communication by malicious parties [1]. Thus, there is a need for traffic analysis in the network, which will help to find such communications.

This type of analysis can be classified as a cipher-text-only attack or, more precisely, an unknown cipher-text-only attack. In this case, the type of cipher text, i.e. the algorithm used for encrypting the plaintext, is unknown. For example, consider unknown traffic going through the network shown in fig.[1.b], which has to be analyzed by this system. However, some authors also classify this type of analysis as a side channel attack, where the data is captured from the network and may be available in the form of electromagnetic radiation, power consumption, sound, or timing information.

Pattern analysis of cipher text has been done through various methods, such as support vector machines, decision trees, data mining, and neural networks. A system was introduced in [2], where the authors use a support vector machine to classify the cipher. This classification is done through the analysis of document vectors. However, the accuracy of that system is average, at 60-90%. Other systems for the same purpose have been shown in [3] and [4]. The authors of [3] explain a data mining technique used as the pattern classifier; however, the reported accuracy of that system is average. A very recent work [4] develops a system using a decision tree based on the C4.5 algorithm; the accuracy of this system is 70-75%. Further detailed publications on pattern analysis are [5], [6] and [7], where the authors explain different methods of general pattern analysis using SVM, KNN and LDA.

Apart from these, there are a few other methods of pattern analysis:

1) Multiple Histogram and Bitwise Histogram Method
In the multiple histogram method, blocks from multiple samples are taken to generate a histogram, which is built according to the bit position in the block. However, in the multiple histogram method the probabilities of occurrence of the bits are drawn as the histogram.

2) Gaussian Mixture Model and Hamming Distance Based Method
The Gaussian mixture model defines the pattern with the help of the PDF of the observed variable, using a multivariate Gaussian mixture density, whereas the Hamming distance based method defines the pattern by measuring the distance between two cipher text blocks.

3) Mode and Method Classification of Cipher Text
To find the encryption method, it is important to find the mode of encryption; a histogram is one of the ways to find the mode of encryption. Similar methods are explained in [12].
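To make the Hamming distance based method concrete, the following minimal Java sketch counts the bit positions in which two cipher text blocks differ. The class and method names (`BlockDistance`, `hammingDistance`, `normalizedDistance`) are hypothetical and not taken from the cited works; the normalization to the range [0, 1] is an added assumption for easier comparison across block sizes.

```java
// Hypothetical helper, not part of the original system: Hamming distance
// between two equal-length cipher text blocks, measured in bits.
final class BlockDistance {
    private BlockDistance() {}

    /** Number of bit positions in which the two blocks differ. */
    static int hammingDistance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("blocks must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // XOR leaves a 1 in every bit position where the two bytes differ.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    /** Distance divided by the block size in bits (assumed normalization). */
    static double normalizedDistance(byte[] a, byte[] b) {
        int distance = hammingDistance(a, b);
        return a.length == 0 ? 0.0 : distance / (8.0 * a.length);
    }
}
```

For two statistically unrelated blocks the normalized distance is expected to lie near 0.5, so values far from 0.5 over many block pairs would hint at structure in the cipher text.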

ISBN:978-1-4799-1024-3/13/$31.00 ©2013 IEEE 393



II. PROPOSED METHOD

As mentioned in the introduction, the existing systems are still lacking in terms of accuracy. This paper therefore proposes and implements a combined approach to pattern analysis, which combines the best existing systems to gain higher accuracy. The existing systems are modified to resolve consistency and integrity issues with each other and are coupled together to work simultaneously. The systems explained in [2] and [4] have been combined and merged into the system as block c) Dictionary/Decision tree based approach.

Figure1. Scenario of Problem definition

Hence, this system consists of three sub-modules: a) Block length/stream detection, b) Entropy/Reoccurrence analysis, c) Dictionary/Decision tree based approach.

The system has the following advantages over the existing systems:

• It achieves higher accuracy rates when compared to existing systems.
• It can classify block/stream ciphers, and it has an option to generate random cipher-text samples for analysis (it includes AES, DES and Blowfish).

A. Generation of Unknown Cipher Text
The unknown cipher text is generated using the AES, DES and Blowfish algorithms. N unique keys are used to generate the Qi samples. However, it is found that using the same key for generating the samples increases the confidence value of the pattern analysis.

The use of side channel information has also been proposed to increase the confidence interval of the pattern analysis of ciphers such as AES and TDES. However, this work does not focus much on pattern analysis using side channel information. A sample of side channel information can be seen in [14].

B. Introduction to Block Length/Stream Detection
The cipher text can be generated using either a block cipher or a stream cipher. Widely used block ciphers are AES, 3DES, RC5 and Blowfish, whereas HC-256, Rabbit and VEST are examples of stream ciphers. The goal of this module is to classify the cipher text into one of these two types. Most stream ciphers encrypt the plain text in small chunks, for example one bit at a time, whereas block ciphers perform encryption on bigger blocks, such as 128 bits or more. In such a scenario, a large data set given to blocks two and three can almost classify whether the given sample comes from a stream cipher or a block cipher. A flow-chart for this purpose is given in fig.[2], and the details are discussed in section II.

Figure2. Flow-Chart for Stream/Block Detection

C. Entropy/Reoccurrence Analysis
Entropy is defined as the measure of uncertainty in a random variable [8], whereas reoccurrence is defined as a clear pattern in which some event happens at a constant time interval. Hence, this module deals with the analysis of random as well as periodic occurrences in the given sample.

D. Dictionary/Decision Tree Based Approach
The decision tree is mainly used in the field of data mining, where a decision is suggested based on the statistics of a large data set. Similarly, such a large set of knowledge can be stored in dictionaries, which can be referred to for any complex decision that needs analysis of multiple attributes. However, such a dictionary can also be created by other methods. Hence, this module deals with the creation of such decision trees and dictionaries.

E. Parameters Extraction and Results
Based on the analysis phase explained above, the mode, confidence level and entropy of the given sample are generated, and a matrix of these values is passed on for statistics generation.
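As an illustration of the entropy measure described in subsection C, the sketch below estimates the empirical entropy of a cipher text sample, treating each byte as one symbol and reporting the result in nats (natural logarithm). Treating bytes as symbols and the names `SampleEntropy` and `entropyNats` are assumptions made for this example; the paper does not fix the symbol size here.

```java
// Hypothetical helper: empirical entropy of a sample over byte symbols.
final class SampleEntropy {
    private SampleEntropy() {}

    /** Empirical entropy of the byte values in the sample, in nats. */
    static double entropyNats(byte[] sample) {
        if (sample.length == 0) {
            return 0.0;
        }
        // Count how often each of the 256 possible byte values occurs.
        int[] counts = new int[256];
        for (byte b : sample) {
            counts[b & 0xFF]++;
        }
        double entropy = 0.0;
        for (int count : counts) {
            if (count == 0) {
                continue; // unused symbols contribute nothing
            }
            double p = (double) count / sample.length;
            entropy -= p * Math.log(p); // Math.log is the natural logarithm
        }
        return entropy;
    }
}
```

With byte symbols the value ranges from 0 (a single repeated byte) up to ln 256, about 5.545 nats, for a perfectly uniform sample, so a large cipher text sample whose estimate stays well below that bound shows a measurable pattern.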


III. SYSTEM ARCHITECTURE

There are three parts in the system architecture, as shown in fig.[3]: a) sample generation, b) pattern analysis, c) result. The first part is responsible for sample generation. The AES, DES and Blowfish algorithms have been implemented to generate the random samples.

However, we can also explain the system from the analysis perspective, which divides the system into a) the modeling phase of the cipher text and b) the modeling phase of the model. We now look at the architecture in detail.

Fig. [3]: System Architecture

A. Sample Generation
The sample generation module has three separate sub-modules:

1) AES Algorithm
As part of the sample generation, the AES algorithm has been implemented. AES was published as FIPS 197 by NIST in November 2001. The algorithm uses ten rounds and a 128-bit key. Each round consists of the sub-bytes, shift-rows, mix-columns and add-round-key steps. AES always processes the data in matrix form. First, a state, which is the matrix form of the given plain text, is generated. Next, an add-round-key step is applied and the state is given to the first round of AES. Finally, the mix-columns step is skipped in the last round and the final state, the cipher text, is generated.

2) DES
DES is another symmetric key block cipher, which was published as a proposed standard in 1975 and adopted as FIPS 46 in 1977. This algorithm uses a 64-bit plain text block and a 56-bit key, and the plain text goes through an initial and a final permutation before and after the rounds. Each round consists of a mixer and a swapper, which XOR the key with the data and swap it again.

3) Blowfish
Blowfish is another cipher, for which no effective cryptanalysis is known to date. It has sixteen rounds with large S-boxes. It supports a 64-bit block size and a flexible key length from 32 to 448 bits.

Finally, all these sub-modules work simultaneously and generate a set of sample cipher texts for analysis, which is given to the second module.

B. Analysis
The analysis module consists of three sub-modules:

1) Length/Stream Detection
Pattern analysis of block ciphers such as AES is always considered a tough task, since AES achieves an almost uniform distribution of symbols in the cipher text. However, cipher-text samples of block ciphers such as Blowfish and DES are found to show patterns when a large set of data is analyzed.

Figure4. Flow-Chart for Pattern Analysis

So, this module identifies the pattern of the cipher text based on the bitwise histogram. The flow-chart for this module is drawn in fig.[4], and the algorithm is explained below:


Step [1]: Get the generated samples of the cipher text;
Step [2]: Convert the given sample into an array of bits;
Step [3]: Group the bits and generate multiple such states;
Step [4]: Generate the histogram for these states sequentially;
Step [5]: Compare the histogram with the samples of known ciphers;
Step [5.a]: If the sample matches above the threshold, show the cipher name and the confidence value as the result.
Step [5.b]: If the sample match is below the threshold, reshuffle the sample and go to Step [3].

2) Entropy/Reoccurrence Analysis
Suppose that the sub-module receives arguments such as (α, β, P, T), and let S be a function defined through α. Then,

$$S_n = S T^n, \quad n = 1, 2, 3, 4, \ldots$$

Here, the function S can be considered a mapping function, where one value is coded to another.

The unit of entropy is known as the nat, and the entropy of a random space based on the parameters (α, β, P) is defined as:

$$H_P(S) = -\sum_{a} P(S = a) \log P(S = a)$$

If the sample has been partitioned and $Q_i$ is defined as

$$Q_i = \{\omega : S(\omega) = a_i\} = S^{-1}(\{a_i\})$$

then

$$H_P(Q) = -\sum_{i=1}^{\|Q\|} P(Q_i) \log P(Q_i)$$

In such a scenario, the entropy rate is defined as in [8].

3) Dictionary/Decision Tree Based Method
This sub-module detects the pattern based on the following methods:

a) Decision Tree
In this module [4], the characterization of the cipher text is done in training and testing phases, and the extracted information is given to the C4.5 algorithm for classification. The C4.5 classifier classifies the data based on the nature observed. In the training phase the entropy of the data is analyzed, whereas in the testing phase the decision tree is generated and pruning is performed based on the threshold set for each branch.

This module considers sets of bits as symbols, and each new symbol in the reading stage of the cipher text defines a new root in the tree. The following algorithm explains in detail how the tree is generated in this module:

Step[1]: Read the sample of the given cipher.
Step[2]: Define the symbol for the incoming string.
Step[3]: Check for the incoming symbol in the tree (classify according to C4.5).
Step[4]: Define the right/left parenthesis of insertion according to the classification.
Step[5]: Go to Step[1] until the end of the sample.

b) Dictionary Based Method
This approach [2] is implemented using support vector machines. In this method, the cipher text is considered as a document or bag of words. Since the sample is large in size and complex in nature, it is divided into sets of bits. A corpus is generated for the different classes and given to the dictionary for further reference. A newly arriving sample is then matched against the existing knowledge available in the dictionary.

c) Side Channel Information
As mentioned earlier, the output of block ciphers such as AES is almost perfectly uniform and hence does not leave much of a pattern that could be analyzed through a small set of cipher text. Therefore, side channel information is an important term in the analysis module, and it can be loaded into the system on an optional basis. We consider timing/power consumption information in the analysis module of the system. The timing information increases the confidence level of the analysis and hence the accuracy of the system. A method to obtain side channel information is shown in [9].

The following algorithm increases the accuracy through the side channel information:
Step[1]: Load the timing/power consumption data of the different algorithms into the system.
Step[2]: Generate the unknown sample and feed it to the analysis module. Load the timing/power consumption information of the unknown sample into the system.
Step[3]: Forward the side channel information to the entropy/reoccurrence analysis module.
Step[3.a]: If the data matches, show the cipher with the confidence interval.

C. Result
In this module, the analyzed parameters are passed to the APIs to generate the graph showing the confidence interval of the match.

IV. IMPLEMENTATION DETAILS OF SYSTEM
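As one illustration of how the bitwise-histogram comparison of Section III.B (Steps [1] to [5.a]) might be realized, the Java sketch below builds a per-bit-position histogram over fixed-size blocks and compares it against a reference histogram of a known cipher. Everything named here is hypothetical: the class `BitwiseHistogram`, its methods, and in particular the similarity measure (one minus the mean absolute difference), which the paper does not specify. The reshuffling of Step [5.b] is left out.

```java
// Hypothetical sketch of Steps [2]-[5.a]; the similarity measure is an assumption.
final class BitwiseHistogram {
    private BitwiseHistogram() {}

    /** For each bit position in a block, the fraction of blocks whose bit is 1. */
    static double[] bitPositionFrequencies(byte[] sample, int blockBytes) {
        if (blockBytes <= 0) {
            throw new IllegalArgumentException("blockBytes must be positive");
        }
        int blocks = sample.length / blockBytes; // a trailing partial block is ignored
        double[] freq = new double[blockBytes * 8];
        if (blocks == 0) {
            return freq;
        }
        for (int blk = 0; blk < blocks; blk++) {
            for (int bit = 0; bit < freq.length; bit++) {
                int value = sample[blk * blockBytes + bit / 8] & 0xFF;
                // Bit 0 is the most significant bit of the first byte of the block.
                if (((value >> (7 - bit % 8)) & 1) == 1) {
                    freq[bit] += 1.0;
                }
            }
        }
        for (int bit = 0; bit < freq.length; bit++) {
            freq[bit] /= blocks;
        }
        return freq;
    }

    /** Similarity in [0, 1]: one minus the mean absolute difference. */
    static double similarity(double[] h1, double[] h2) {
        if (h1.length != h2.length) {
            throw new IllegalArgumentException("histograms must have equal length");
        }
        if (h1.length == 0) {
            return 1.0;
        }
        double sum = 0.0;
        for (int i = 0; i < h1.length; i++) {
            sum += Math.abs(h1[i] - h2[i]);
        }
        return 1.0 - sum / h1.length;
    }

    /** Step [5.a]: true if the unknown sample matches the reference above the threshold. */
    static boolean matches(byte[] unknown, double[] reference, int blockBytes, double threshold) {
        return similarity(bitPositionFrequencies(unknown, blockBytes), reference) >= threshold;
    }
}
```

In use, one reference histogram per known cipher would be computed from its generated samples, and the unknown sample would be reported as the cipher with the highest similarity, with that similarity serving as a rough confidence value.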


The system is implemented using the Java Cryptographic Architecture (JCA). This architecture supports secure programming concepts in addition to object oriented programming concepts. These secure programming concepts include a) pointer unavailability, i.e. it does not support arbitrary memory location addressing, in order to ensure security; b) byte code verification, i.e. the architecture checks class files for security breaches before they are executed; c) library support and resource access, i.e. it supports the APIs of third party algorithms, which can be called and used.

To maintain the integrity of the message, this architecture provides various algorithms such as MD2, MD5, SHA-256, SHA-384 and SHA-512. In order to ensure integrity, MD5 has been used in the sample generation module of the system, see fig.[3]. This architecture supports automatic key generation, i.e. a key can be generated simply by passing the key length and the cipher name to the KeyGenerator class. The key is changed for each encryption algorithm, and N different keys have been used for generating the samples. These N different keys help to find out the exact accuracy of the system.

The following steps have been followed to generate the cipher text in the sample generator module of the system (for writing code):

Step[1]: Create a class and then a function inside it, and get the plain text in UTF8 format.
Step[2]: Pass the cipher name and the key length to the KeyGenerator, and store the key in a key data type.
Step[3]: Get the cipher object, e.g. get an AES object, and define the corresponding parameters.
Step[4]: Initialize the cipher with the encryption mode and the key.
Step[5]: Pass the plain text and receive the cipher text in byte form, which is combined with the other samples of the cipher texts.

This set of samples is passed to the analysis module, where the classification of unknown samples is done. The following procedure has been used to generate the decision tree in the analysis module:

Step[1]: Group the set of symbols to get Qi.
Step[2]: Define the parenthesis of the tree based on the classifier and generate the child node if the threshold is met, as given in [13].
Step[3]: Generate the tree till the end of the samples.
Step[4]: Compare with the existing learnt model and generate the confidence interval.

Similarly, the length, mode and entropy are found according to the procedures explained in II.B.

A matrix of the parameters extracted by the analysis module is passed to the result generation module, where the histogram is generated.

The histogram generation is done using the Java Media Framework (JMF), whereas the analysis is done using both architectures, JMF and JCA. The JMF supports generating the histogram and entropy for given data, which has been used for implementing the analysis and result modules of the system [10, 11].

The system implementation follows the definitions and algorithms explained in the system architecture section. The following algorithm explains the procedure for passing the pattern data for generating results using the JMF:

Step[1]: Model the pattern data into an n*n matrix.
Step[2]: Pass the modeled data to the histogram API and store the returned data.
Step[3]: Open the returned data file.

This file can be used for further comparison and modeling with the other sample histograms.

A. Modeling and Requirement of the System
The system has the following hardware and software requirements:

1) Hardware Requirements
The hardware requirements of the system include the following: a) 256MB RAM; b) P4 Processor; c) 5GB HD

2) Software Requirements
These are the software requirements for the system: a) java SDK 1.6;

The different UML diagrams of the system are shown in fig.[5] and fig.[6] to help understand the interaction and the visual model of the system.

Figure5. Use-Case Diagram of system
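The sample generation steps of this section (Step[1] to Step[5]) can be sketched with the standard JCA classes `KeyGenerator` and `Cipher` from `javax.crypto`. This is a minimal sketch under stated assumptions, not the authors' code: the class `SampleGenerator` and the method `generateSample` are hypothetical names, the transformation `ECB/PKCS5Padding` is assumed because the paper does not state the mode or padding, `StandardCharsets` requires Java 7 or later, and the MD5 integrity step is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical sketch of the sample generator; mode and padding are assumptions.
final class SampleGenerator {
    private SampleGenerator() {}

    /**
     * Encrypts plainText under a freshly generated key.
     * algorithm: "AES", "DES" or "Blowfish"; keyBits: for example 128, 56 or 128.
     */
    static byte[] generateSample(String algorithm, int keyBits, String plainText) {
        try {
            // Step[1]: get the plain text in UTF-8 form.
            byte[] plain = plainText.getBytes(StandardCharsets.UTF_8);

            // Step[2]: pass the cipher name and key length to the KeyGenerator.
            KeyGenerator keyGenerator = KeyGenerator.getInstance(algorithm);
            keyGenerator.init(keyBits);
            SecretKey key = keyGenerator.generateKey();

            // Step[3]: get the cipher object (ECB with PKCS5 padding is assumed here).
            Cipher cipher = Cipher.getInstance(algorithm + "/ECB/PKCS5Padding");

            // Step[4]: initialize the cipher with the encryption mode and the key.
            cipher.init(Cipher.ENCRYPT_MODE, key);

            // Step[5]: pass the plain text and receive the cipher text as bytes.
            return cipher.doFinal(plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher not available: " + algorithm, e);
        }
    }
}
```

Calling `generateSample` N times yields N samples under N different keys, as described above. Under the assumed padding, AES samples come out as multiples of 16 bytes and DES or Blowfish samples as multiples of 8 bytes, which is the kind of length regularity a block length check can look for.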


Figure6. Sequence Diagram of the system

V. RESULTS OBTAINED

The various screenshots and the result statistics of the system are shown in fig. [7].

TABLE I
PATTERN ANALYSIS PERFORMANCE OF THE SYSTEM

Threshold (Ti) | Number of Samples (N) | Bit Length of the Sample (Qi) | Mode Confidence (Cm) | Classification Confidence (Cc) | Entropy Rate H(Qi)
10 | 10 | 128 | 86% | 83% | 31.77
55 | 200 | 256 | 70% | 64% | 22
78 | 700 | 512 | 90% | 87.3% | 25
99 | 2000 | 1024 | 90.8% | 89.1% | 18

[Table I] lists the different parameters extracted in the different scenarios. It can be statistically compared with the works published in [2], [3] and [4].

Figure 7. Results

VI. CONCLUSIONS
Pattern analysis of cipher text has a wide range of applications. The system can be used by a network administrator to analyze the flow and type of traffic in the network. Additionally, identification of the encryption algorithm is the first step of cryptanalysis, which then moves towards the analysis for the key. The system may also have a wide range of military applications. This type of analysis may also provide a model of breaches in encryption algorithms. In other words, it motivates the development of more cryptanalysis techniques in the field of (unknown) cipher-text-only attacks.

Future work in this area can be effective cryptanalysis of the AES algorithm, since AES shows an almost uniform distribution of the cipher text, which makes it difficult to analyze.

REFERENCES
[1] Center of Excellence (Defence Against Terrorism), "Response to Cyber Terrorism," IOS Press, May 2008.
[2] Dileep A.D. et al., "Identification of Block Ciphers Using Support Vector Machines," in Proceedings of IEEE International Conference on Neural Networks, IJCNN'2006, pp. 2696–2701.
[3] Pejman Khadivi et al., "Cipher Text classification with Data Mining," in Proceedings of 4th International Symposium on Advanced Networks and Telecommunication Systems, ISANTS'2010, pp. 64–66.
[4] Manjula R. et al., "Identification of Encryption Algorithm Using Decision Tree," Springer-Verlag Berlin Heidelberg, 2011, pp. 237–246.
[5] Suhaila O. Sharif et al., "Classifying Encryption Algorithms Using Pattern Recognition Techniques."
[6] Richard P. Lippmann et al., "Pattern Classification Using Neural Networks."
[7] R.O. Duda et al., "Pattern Classification," Wiley Interscience, 2000.
[8] Robert M. Gray, "Entropy and Information Theory," Springer Verlag, USA, 2013.
[9] Paul Kocher, Joshua Jaffe, and Benjamin Jun, "Differential power analysis," Lecture Notes in Computer Science, vol. 1666, pp. 388–397, 1999.
[10] Oracle Java Documentation on JMF <http://www.oracle.com/technetwork/java/javase/documentation-138769.html>, 2011 (accessed on 17-03-2013).
[11] Oracle Java Documentation on JCA <http://docs.oracle.com/javase/6/docs/technotes/guides/security/crypto/CryptoSpec.html>, 2011 (accessed on 17-03-2013).
[12] Eli Biham, "Cryptanalysis of multiple modes of operation," in ASIACRYPT '94: Proceedings of the 4th International Conference on the Theory and Applications of Cryptology, London, UK, 1995, pp. 278–292, Springer-Verlag.
[13] Quinlan, J. R., "C4.5: Programs for Machine Learning," Morgan Kaufmann Publishers, 1993.
[14] Side Channel Information sample <http://en.wikipedia.org/wiki/File:Power_attack.png>, 2010 (accessed on 17-03-2013).
