Introduction NS Report

GROUP

Break-Substitution Cipher

Introduction: Substitution ciphers are one of the oldest and simplest methods of encrypting text. Historically, the substitution cipher (sometimes called a replacement cipher) has been used many times, and it provides a good illustration of basic cryptography. The name comes from the fact that each letter of the message we want to encrypt is substituted by another character (e.g. a letter or a symbol). Anyone who knows which plaintext character has been mapped to which ciphertext character can easily decrypt the message. For example, to encrypt the following: Substitution Cipher
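
A minimal sketch of the idea in standard C++ (not our Qt program's code): the key is assumed to be a 26-letter string whose position i holds the cipher letter for plaintext letter 'A' + i. The key kExampleKey and the functions encrypt and decrypt are hypothetical names used only for illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical example key: kExampleKey[i] is the cipher letter for 'A' + i.
// Any permutation of the 26 letters would work; this one is only illustrative.
const std::string kExampleKey = "QWERTYUIOPASDFGHJKLZXCVBNM";

// Replaces every letter with its key letter; other characters pass through.
std::string encrypt(const std::string& plain, const std::string& key) {
    std::string out = plain;
    for (char& c : out) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isalpha(u)) c = key[std::toupper(u) - 'A'];
    }
    return out;
}

// Builds the inverse key (inverse[key[i]] = 'A' + i) and applies it.
std::string decrypt(const std::string& cipher, const std::string& key) {
    std::string inverse(26, '?');
    for (int i = 0; i < 26; ++i) inverse[key[i] - 'A'] = static_cast<char>('A' + i);
    std::string out = cipher;
    for (char& c : out) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isalpha(u)) c = inverse[std::toupper(u) - 'A'];
    }
    return out;
}
```

For example, encrypt("Substitution Cipher", kExampleKey) gives "LXWLZOZXZOGF EOHITK", and decrypt turns it back into "SUBSTITUTION CIPHER" (this sketch does not preserve letter case).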

How it can be cracked:


Brute-Force: Finding the key of a substitution cipher by brute force is practically impossible and a waste of both time and resources. The reason is simple arithmetic: using the 26 English letters only, the key space has 26! ≈ 2^88 (about 4 × 10^26) possible keys, which is far too many to test even if the attacker uses hundreds of powerful computers.

Letter Frequency Analysis: This method exploits a major weakness of substitution ciphers. Because each plaintext symbol always maps to the same ciphertext symbol, the statistical properties of the plaintext are preserved in the ciphertext. In English, 'E' is the most frequent letter (about 12.51%), 'T' comes second (9.25%), then 'A' (8.04%). How does that help? Easy: by simple counting we can find the most frequent symbol in the ciphertext and replace it with the most frequent letter in English, then do the same with the second most frequent, and so on. The method can be generalized to pairs, triples, or even quadruples of letters. For instance, in English a U almost always follows the letter Q, and this behavior can be exploited to detect the substitutes for Q and U.

Letter   Frequency     Letter   Frequency
E        12.51%        M        2.53%
T         9.25%        F        2.30%
A         8.04%        P        2.00%
O         7.60%        G        1.96%
I         7.26%        W        1.92%
N         7.09%        Y        1.73%
S         6.54%        B        1.54%
R         6.12%        V        0.99%
H         5.49%        K        0.67%
L         4.14%        X        0.19%
D         3.99%        J        0.16%
C         3.06%        Q        0.11%
U         2.71%        Z        0.09%
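
The counting and the naive "rank against rank" replacement can be sketched in standard C++ as follows. The helper names countLetters, rankByFrequency, and naiveDecrypt are hypothetical, and kEnglishOrder is simply read off the table above. Real texts rarely follow this order exactly, so the output is only a first guess.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>

// English letters ordered by the frequency table (most to least frequent).
const std::string kEnglishOrder = "ETAOINSRHLDCUMFPGWYBVKXJQZ";

// Counts how often each letter A-Z occurs (case-insensitive).
std::array<int, 26> countLetters(const std::string& text) {
    std::array<int, 26> counts{};
    for (unsigned char c : text)
        if (std::isalpha(c)) ++counts[std::toupper(c) - 'A'];
    return counts;
}

// Returns the 26 cipher letters sorted from most to least frequent.
// stable_sort on an alphabetical start breaks ties alphabetically.
std::string rankByFrequency(const std::string& text) {
    const std::array<int, 26> counts = countLetters(text);
    std::string letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::stable_sort(letters.begin(), letters.end(), [&](char a, char b) {
        return counts[a - 'A'] > counts[b - 'A'];
    });
    return letters;
}

// Naive attack: the i-th most frequent cipher letter is guessed to be the
// i-th most frequent English letter; guesses are written in lowercase.
std::string naiveDecrypt(const std::string& cipher) {
    const std::string ranked = rankByFrequency(cipher);
    std::array<char, 26> guess{};
    for (int i = 0; i < 26; ++i)
        guess[ranked[i] - 'A'] = static_cast<char>(std::tolower(kEnglishOrder[i]));
    std::string out = cipher;
    for (char& c : out) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isalpha(u)) c = guess[std::toupper(u) - 'A'];
    }
    return out;
}
```

For example, naiveDecrypt("XXX YYZ") returns "eee tta": X is the most frequent cipher letter, so it is guessed to be e.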

Implementation:
First of all, we would like to mention the programming language we used: C++ with the Qt framework. When we started planning our steps, we deliberately avoided looking too deeply at the Internet so that we would not simply copy ideas from it; we tried to do almost 100% of the implementation our own way. We began with the design of the project and tried to keep it simple and easy to use. Here is a screenshot of our first design:

After that we started working on the cracking itself. Our plan was first to take full advantage of letter frequency analysis (LFA). So the first thing we did was to count how often each letter appears in the ciphertext: we pick the first letter, count how many times it is repeated, and store that count, then do the same for every other letter and keep all the counts in variables. When that was finished, we relied on LFA to replace the most repeated letter with E, the second with T, the third with A, the fourth with O, and the fifth and last with I. We ignored N to keep only five (the "big five"). At the beginning we thought this order would always hold, but we discovered that this is not always the case; we discuss that later on.
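
A sketch of the big-five step, assuming the cipher letters have already been ranked by frequency (most frequent first) and following the report's convention that cracked letters are lowercase while unknown letters stay uppercase. The function applyBigFive and its bigFive parameter are hypothetical; the parameter only exists so that a different order can be tried.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// ranked: the 26 cipher letters (uppercase) sorted by descending frequency.
// bigFive: the plaintext letters assigned to the top five ranks, in order.
std::string applyBigFive(const std::string& cipher, const std::string& ranked,
                         const std::string& bigFive = "etaoi") {
    std::string out;
    out.reserve(cipher.size());
    for (unsigned char c : cipher) {
        if (!std::isalpha(c)) { out += static_cast<char>(c); continue; }
        const char upper = static_cast<char>(std::toupper(c));
        const std::size_t rank = ranked.find(upper);
        // Top-five cipher letters become the guessed plaintext letter (lowercase);
        // every other letter stays as an uppercase unknown.
        out += rank < bigFive.size() ? bigFive[rank] : upper;
    }
    return out;
}
```

With the ranking "QWERTYUIOPASDFGHJKLZXCVBNM" (Q most frequent), applyBigFive("qwe rty", ranking) returns "eta oiY": Y is only the sixth-ranked letter, so it stays unknown.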

After replacing the five most repeated letters, we got a text in which only those five letters could be read. Of course that was not enough, so we looked at what we had in order to proceed. We checked which correct words appeared after this operation and found these: eat, ate, tea, tie. These words did not help much, so we collected the most common three-letter words:

the, and, for, are, but, not, you, all, any, can,
had, her, was, one, our, out, day, get, has, him, his,
how, man, new, now, old, see, two, way, who, boy,
did, its, let, put, say, she, too, use

Then we picked a word with exactly one missing letter, such as "the": we already have T and E and only need H, so it is a perfect candidate. We used regular expressions to make the program find any word that starts with T, continues with an unknown letter, and ends with E, written T?e, where ? represents the missing letter. This produced hits. Say that after replacing the big five letters we got "tKe": the program takes that word, since it matches T?e, extracts the letter in the second position (K in this case), and replaces EVERY K in our partial plaintext with H. Now our dictionary contains six letters. We then took other words with one missing letter. For example, before this step we knew only one letter of the word "her", but now we know two, so we can find the letter R, add it to our dictionary, and use it to hunt for other words. The risk of this technique is that other words can match the same regular expression: instead of "the" we might find "tee", "tie", or "toe", so instead of H we would get E, I, or O. Because the three-letter words could not reveal every letter, we moved on to four-letter words, following the order of the most frequent English letters so that each newly found letter could be used to crack more words.
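
The T?e search can be sketched with std::regex from the C++ standard library. This is an illustrative version, not our program's code: because cracked letters are lowercase and unknown letters uppercase, the "?" is written as the capture group ([A-Z]), and \b restricts the match to whole words. The function name findMissingLetter is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns the unknown (uppercase) cipher letter captured by group 1 of the
// first word matching `pattern`, or '\0' if no word matches.
char findMissingLetter(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m[1].str()[0];  // the letter in the "?" position
    return '\0';
}
```

For example, searching "aKo tKe tXe" with the pattern R"(\bt([A-Z])e\b)" returns 'K', after which every K would be replaced by h. Because only the first match is used, a wrong match (for instance a "tie" whose i was not cracked) would give a wrong letter.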
We truly had a real problem: if the ciphertext does not contain the words we picked to check with, we have no chance of finding the letter. For example, if the ciphertext contains no "the", we will never find the letter H from it. So, first, we used the most common words, so that the chance of these words not appearing is almost zero. We also used more than one word to find each letter, to make sure that at least one of those words is present and the letter gets found. Note: in the cracked text, uppercase letters represent unknown letters, and lowercase letters represent letters that we have cracked.
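
One way to combine several words is sketched below under our own assumption of a simple majority vote (the report does not fix a voting rule): every pattern that targets the same missing letter adds one vote per match, and the most voted uppercase letter wins. voteMissingLetter is a hypothetical name.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Each pattern must capture the unknown letter in group 1, e.g. "the" as
// \bt([A-Z])e\b and "that" as \bt([A-Z])at\b (both target the letter h).
// Returns the letter with the most votes, or '\0' if nothing matched.
char voteMissingLetter(const std::string& text, const std::vector<std::string>& patterns) {
    std::map<char, int> votes;
    for (const std::string& p : patterns) {
        const std::regex re(p);
        for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
            ++votes[(*it)[1].str()[0]];
    }
    char best = '\0';
    int bestCount = 0;
    for (const auto& [letter, count] : votes)  // ties go to the alphabetically first letter
        if (count > bestCount) { best = letter; bestCount = count; }
    return best;
}
```

With the patterns for "the" and "that", the text "tXe tKat tKe" yields 'K' (two votes against one for X), so a single stray match no longer decides the letter.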

After all that work we faced the problem mentioned earlier: the order of the most repeated letters is not always the same. Should we then change everything? If even one of the big five letters is out of order, the whole result is ruined, so we cannot bet on a fixed order. That is why we let the user check whether the cracked text is readable, nice, and clean; if it is not, the user can click another button to change the order of the big five letters. To help the user recognize the real order, we also show the accuracy: each time the user clicks one of these buttons, the order changes and the accuracy number updates automatically, until the real order is found. Each of the buttons above calls a different function. We never found E and T out of order, so we only vary the remaining letters.
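
Since E and T were never found out of order, only A, O, and I need to be permuted, which gives 3! = 6 possible orders. A small sketch that lists them with std::next_permutation; the function name bigFiveOrders is hypothetical, and each returned string is meant to serve as the order for the big-five replacement.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Lists every big-five order that keeps e and t in first and second place.
std::vector<std::string> bigFiveOrders() {
    std::string tail = "aio";  // sorted, so next_permutation visits all 3! orders
    std::vector<std::string> orders;
    do {
        orders.push_back("et" + tail);
    } while (std::next_permutation(tail.begin(), tail.end()));
    return orders;
}
```

The result is etaio, etaoi, etiao, etioa, etoai, etoia; "etaoi" is the order taken directly from the frequency table.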

Even so, we found no approach that works one hundred percent in all cases. In our tests, to recover almost 95% of the plaintext, the user must enter on average at least 8-10 pages of ciphertext. We know this seems long, but after many tries we found that this amount of text is very likely to contain all the patterns we need for a high accuracy rate. The last thing we added is a very powerful function that can bring the result up to 99.99%; it may seem basic or old-fashioned, but it works like a charm. Yes, the Replace function lets you manually replace any letter with the letter you want. Sometimes, after the program has done its job and brought you to 90%, you check the plaintext and find the word "eQample", which you instantly recognize as "example": the program made all the other changes but got stuck somehow and did not find the letter X. The Replace function lets you change ALL Qs in the plaintext to the right letter, bringing the result to 99%.
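
The manual Replace step amounts to a global letter substitution. A minimal sketch, assuming the user supplies the unknown (uppercase) letter and the plaintext letter it should become; replaceLetter is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Replaces every occurrence of the unknown cipher letter (uppercase) with the
// chosen plaintext letter, written in lowercase to mark it as cracked.
std::string replaceLetter(std::string text, char unknown, char plain) {
    const char from = static_cast<char>(std::toupper(static_cast<unsigned char>(unknown)));
    const char to = static_cast<char>(std::tolower(static_cast<unsigned char>(plain)));
    for (char& c : text)
        if (c == from) c = to;
    return text;
}
```

For example, replaceLetter("eQample", 'q', 'X') returns "example"; the case of the user's input does not matter.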

Accuracy:

To calculate the accuracy of the plaintext we used a simple algorithm. After the program finishes replacing the letters it found, it calls a function that counts every lowercase letter in the plaintext box, since lowercase letters mark the letters we cracked. It then counts all the letters in the text, uppercase and lowercase together, divides the number of lowercase letters by that total, and multiplies the result by 100.
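
In formula form, accuracy = 100 × lowercase / (lowercase + uppercase). A sketch in C++, under the assumption that only letters are counted (spaces, digits, and punctuation are ignored); accuracy is a hypothetical function name.

```cpp
#include <cctype>
#include <string>

// Percentage of letters that are lowercase (cracked); 0 if there are no letters.
double accuracy(const std::string& text) {
    int cracked = 0, total = 0;
    for (unsigned char c : text) {
        if (std::islower(c)) ++cracked;  // lowercase = cracked letter
        if (std::isalpha(c)) ++total;    // every letter, either case
    }
    return total == 0 ? 0.0 : 100.0 * cracked / total;
}
```

For example, accuracy("ab CD") returns 50: two of the four letters are cracked.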

The Final design:


Screenshots:

Here we entered a ciphertext in its box and pressed button 1, but the result was only 82%. Note: '@' stands for 'a'; we got a strange error when we tried to use 'a' itself, so we used '@' instead.

We hope for a good mark and wish the best of luck to all of us.
