
2013 International Conference on Recent Trends in Information Technology (ICRTIT)

Highly Secure DNA-based Audio Steganography


Shyamasree C M, Sheena Anees
Department of Computer Science, KMEA Engineering College
Aluva
shyamasreecm@gmail.com,sheenanees@gmail.com

Abstract— Security is an important criterion for information in transit as well as in storage. This paper proposes a highly secure method to hide secret messages and prevent unauthorized access. The proposed method works in three levels: a single level of encryption and two levels of steganography. The first level uses a DNA-based Playfair encryption. The second level hides the encrypted secret file in a randomly generated DNA sequence. In the third level, the embedded DNA is hidden in an audio file. The main objective of the proposed method is that no one should be able to detect the existence of the secret file.

Keywords— Information security, steganography, cryptography, DNA computing, Playfair encryption.

I. INTRODUCTION
The explosive growth of computer systems and their interconnection via networks such as the internet has led to a heightened need for information and data security. The growing use of the internet has led to a continuous increase in the amount of data that is exchanged and stored in various digital media [1]. In the current internet community, data security is limited by attacks made on data communication [2]. Security is the main concern of any type of communication. The objective of secure communication is that the actual data should not be revealed to any third party.

Steganographic techniques are among the most successful techniques for hiding critical information in ways that prevent the detection of hidden messages. While cryptography scrambles the message so that it cannot be understood, steganography hides the data so that it cannot be observed [1]. In steganography the existence of the information is hidden, so it is hard for attackers to discover that hidden information exists. Steganography is the science of communicating secret data in an appropriate multimedia carrier, e.g., image, audio, and video files. It rests on the assumption that if the feature is visible, the point of attack is evident; thus the goal is always to conceal the very existence of the embedded data [3].

DNA computing is a new method of simulating the biomolecular structure of DNA and computing by means of molecular biological technology, a novel field with growth potential. In a pioneering study, Adleman demonstrated the first DNA computation. It marked the beginning of a new stage in the era of information. DNA (deoxyribonucleic acid) is the genetic material of all life forms. It is a biological macromolecule made up of nucleotides. Each nucleotide contains a single base. There are four kinds of bases: adenine (A), thymine (T; replaced by uracil, U, in RNA), cytosine (C) and guanine (G). In a double-helix DNA string, the two strands are complementary in terms of sequence, that is, A pairs with T and C pairs with G according to the Watson-Crick rules [4].

A number of methods have been proposed over the last decade for encoding information using deoxyribonucleic acid (DNA), giving rise to the emerging area of DNA data embedding. Since a DNA sequence is conceptually equivalent to a sequence of quaternary symbols (bases), DNA data embedding (diversely called DNA watermarking or DNA steganography) can be seen as a digital communications problem where channel errors are analogous to mutations of DNA bases [5].

DNA was proposed for computation by Adleman in 1994, and many approaches have been investigated since. A single-strand DNA consists of four different base nucleotides: adenine (A), thymine (T), cytosine (C) and guanine (G). After being attached to deoxyribose, these nucleotides can be strung together to generate long sequences. DNA computing offers a new approach to cryptography [6]. Leonard Adleman [13] proposed the computational ability of DNA. A new field of cryptography has since evolved, namely DNA cryptography, in which DNA is used as the information carrier. The high randomness and storage capacity of DNA make it efficient as a basic computational tool [14], [15], [16].

Mona Sabry et al. [7] proposed a significant modification to the old Playfair cipher by introducing DNA-based and amino-acid-based structure into the core of the ciphering process. The binary form of the data is transformed into sequences of DNA nucleotides; these nucleotides then pass through a Playfair encryption process based on amino-acid structure. The relationship between the nucleotide sequences of genes and the amino acid sequences of proteins is determined by the rules of translation, known collectively as the genetic code. The genetic code consists of three-letter words called codons, each formed from a sequence of three nucleotides (e.g. ACT, CAG, TTT). Since there are 4 bases in 3-letter combinations, there are 4^3 = 64 possible codons. These encode the twenty standard amino acids, giving most amino acids more than one possible codon [7].
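The codon count quoted above follows from simple enumeration. As a minimal sketch in Python (the variable names are illustrative and are not taken from [7], and the RNA-style alphabet A, C, G, U is assumed so that it matches the paper's tables), every ordered triple of bases can be listed:

```python
from itertools import product

# Assumed alphabet: the four bases as written in the paper's tables.
BASES = "ACGU"

# Every ordered triple of bases is one codon: 4 * 4 * 4 = 64 in total.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]

print(len(codons))  # 64
print(codons[:4])   # ['AAA', 'AAC', 'AAG', 'AAU']
```

With 64 codons available and fewer target symbols, several codons necessarily map to the same symbol, which is why a decoder needs extra information to recover the original codon.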

ISBN: 978-1-4799-1024-3/13/$31.00 ©2013 IEEE


Catherine Taylor Clelland et al. proposed a hiding technique using DNA. DNA steganography takes its idea from the microdot. The microdot is a means of concealing messages (i.e., steganography) that was developed by Professor Zapp and used by German spies in the Second World War to transmit secret information. Clelland et al. took the microdot a step further and developed a DNA-based, doubly steganographic technique for sending secret messages: a DNA-encoded message is first camouflaged within the enormous complexity of human genomic DNA and then further concealed by confining this sample to a microdot [8]. Building on this technique, several methods have been proposed in the field of DNA steganography [9], [10], [11]. One of the latest techniques was proposed by Amal Khalifa [12], in which the carrier is a randomly generated DNA sequence. The high randomness and effective storage capacity of DNA give it high potential as a cover medium for hiding secret information.

Most steganography work has been carried out on different cover media such as text, image, audio or video [17]. Steganography and encryption are both used to ensure data confidentiality; however, the main difference between them is that with encryption anybody can see that both parties are communicating in secret. Steganography hides the existence of a secret message, and in the best case nobody can see that the two parties are communicating in secret [18].

Audio files are considered excellent carriers for steganography because of the redundancy they contain. Audio steganography requires a text or audio secret message to be embedded within a cover audio message. Owing to this redundancy, the cover audio before steganography and the stego audio after steganography remain perceptually the same. However, audio steganography is considered more difficult than video steganography because the Human Auditory System (HAS) is more sensitive than the Human Visual System (HVS) [18]. Least Significant Bit (LSB) modification is the simplest and most efficient technique used for audio steganography [18].

The objective of this paper is to come up with an efficient method to preserve the security of secret messages in a text file against unauthorized access by hiding the presence of the text file. The method proposed in this paper uses three levels of security: encryption, DNA steganography and audio steganography. For encrypting the text file, the proposed method uses a DNA-based Playfair encryption algorithm. In the second level, the encrypted secret file is hidden in a randomly generated DNA sequence. In the third level, the DNA sequence in which the encrypted file is embedded is hidden inside an audio file using the Least Significant Bit modification technique.

The motivation for this paper is given in the next section. Section III gives a detailed description of the encryption phase. Section IV gives a detailed overview of the DNA steganography phase of the proposed method. Details of audio steganography are given in Section V. The extraction of the original text file from the stego audio file is described in Section VI. Section VII gives the experimental results of the proposed method.

II. MOTIVATION
Information security is one of the important fields in which a great deal of research is taking place. Steganography is a main method of providing security.

Owing to its immense storage capacity and high randomness, DNA is now used in the field of steganography. This can be considered a recent technique in steganography. DNA-based algorithms can be used in various fields such as job scheduling for clusters, GPU applications, multi-core architectures, etc. [20], [21], [22], and hence there is a need for more new techniques in this field.

Since the Human Auditory System (HAS) is more sensitive than the Human Visual System (HVS) [23], and since audio files are redundant and highly available, audio steganography is of high importance in the field of steganography. Hence more techniques remain to be found in this field [18].

III. DNA BASED ENCRYPTION
This section gives a detailed overview of the encryption technique used to convert the secret file into an encrypted DNA sequence.

Encryption is the first level of the proposed method. DNA-based Playfair encryption is used to hide the secret message stored in a file called the secret file. Text files can be used as the secret file. This secret file is the input to the encryption algorithm.

DNA digital coding is used to convert the raw data in the secret file into a DNA sequence. From a computational point of view, any DNA sequence can be encoded using a binary coding scheme, in which anything can be encoded by a combination of the two states 0 and 1. Therefore, the simplest way to encode the 4 nucleotide bases (A, U, C, G) is with the four digits 0 (00), 1 (01), 2 (10) and 3 (11). There are 4! = 24 possible coding patterns. Among these 24 patterns, only 8 (0123/CUAG, 0123/CAUG, 0123/GUAC, 0123/GAUC, 0123/UCGA, 0123/UGCA, 0123/ACGU and 0123/AGCU), which are topologically identical, fit the complementary rule of the nucleotide bases. It is suggested that the coding pattern in accordance with the sequence of molecular weight, 0123/CUAG, is the best coding pattern for the nucleotide bases [25], as illustrated in table 1 [12].
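As a minimal sketch of DNA digital coding, the following Python fragment converts ASCII text to a base sequence and back. It assumes the mapping printed in table 1 (A = 00, C = 01, G = 10, U = 11); the function names text_to_dna and dna_to_text are hypothetical and are not part of the authors' implementation.

```python
# Mapping assumed from table 1 as printed: A=00, C=01, G=10, U=11.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "U"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def text_to_dna(text: str) -> str:
    """Encode ASCII text as a DNA string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_text(dna: str) -> str:
    """Decode a DNA string produced by text_to_dna back to ASCII text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


encoded = text_to_dna("Hi")
print(encoded)               # CAGACGGC
print(dna_to_text(encoded))  # Hi
```

Each 8-bit character becomes four bases, so the DNA form of a file is four times as long as its character count. Swapping in a different coding pattern, such as 0123/CUAG, only requires changing the BITS_TO_BASE dictionary.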

TABLE 1: DNA DIGITAL CODING [12]

DNA Nucleotide   Decimal   Binary
A                0         00
C                1         01
G                2         10
U                3         11

The process of DNA based encryption has the following steps:

1. Convert the contents of the secret file into binary form by taking the ASCII value of each character.
2. Group the binary data into groups of two bits and code each group into a DNA base using table 1.
3. Group the letters of the DNA sequence into groups of three (i.e., codons) and map the codons into amino acids using table 2.
4. Store the ambiguity numbers in a file.
5. Encrypt the amino acid sequence using the Playfair cipher method.
6. Convert the amino acids in the encrypted sequence into binary form by taking their ASCII values.
7. Group the binary form of the encrypted amino acid sequence into groups of two bits and code the groups into a DNA sequence using table 1.
8. Store the resulting sequence in a text file.

Table 2 is used to map the DNA sequence into an amino acid sequence. There are 1 to 6 possible codons representing each amino acid in table 2. This creates ambiguity during decryption as to which codon should be selected for a particular amino acid. To counter this problem, ambiguity numbers are used: 0, 1, 2, 3, 4 and 5, corresponding to the first, second, third, fourth, fifth and sixth codon respectively. These ambiguity numbers are stored in a separate file to be used in the decryption process [7].

Hence the output of the DNA based encryption is two files: one file containing the ambiguity numbers needed for later decryption, and the other containing the encrypted DNA sequence.

TABLE 2: A MAPPING OF THE DNA CODONS INTO 26 AMINO ACIDS

Codons                         Character   Codons                         Character
GCU, GCC, GCA, GCG             A           AAU, AAC                       N
UAA, UAG, UGA                  B           UUA, UUG                       O
UGU, UGC                       C           CCU, CCC, CCA, CCG             P
GAU, GAC                       D           CAA, CAG                       Q
GAA, GAG                       E           CGU, CGC, CGA, CGG, AGA, AGG   R
UUU, UUC                       F           UCU, UCC, UCA, UCG, AGU, AGC   S
GGU, GGC, GGA, GGG             G           ACU, ACC, ACA, ACG             T
CAU, CAC                       H           AGA, AGG                       U
AUU, AUC, AUA                  I           GUU, GUC, GUA, GUG             V
                               J           UGG                            W
AAA, AGG                       K           AGU, AGC                       X
UUA, UUG, CUU, CUC, CUA, CAG   L           UAU                            Y
AUG                            M           UAC                            Z

TABLE 3: A LEGAL TWO-BY-TWO COMPLEMENTARY RULE [

TOKEN   COMPLEMENT
AA      TC
TC      CG
CG      TG
TG      GC
GC      TT
TT      TA
TA      GT
GT      GG
GG      AT
AT      CT
CT      CC
CC      CA
CA      AC
AC      AG
AG      GA
GA      AA

IV. DNA STEGANOGRAPHY
This section describes the second level of the proposed method, DNA steganography. The encrypted DNA sequence obtained from the first level is hidden inside a randomly generated reference DNA. A two-by-two generic complementary rule is used for hiding one DNA sequence in another. A condition for applying the proposed algorithm for DNA steganography is that the randomly generated reference DNA sequence should have a length
which is double the length of the DNA sequence to be hidden in the reference DNA. Table 3 shows the legal complementary rule used in the proposed algorithm. For each pair of bases there is a complement. A double complement can be obtained by taking the complement of the complement. In the same way a triple complement, a quadruple complement and so on can be taken.

In the proposed algorithm for DNA steganography, the bases of the encrypted DNA sequence, which is the input to the algorithm, are treated one base at a time, while the bases of the randomly generated reference DNA are treated as base pairs. For each DNA base in the input, the corresponding base pair in the reference sequence is processed. The DNA sequence to be hidden is represented by DS and the reference DNA is represented by Ref. The algorithm has the following steps:

1. Let the length of DS be N; hence the length of Ref is 2N.
2. For i = 1 to N by step 1 and j = 1 to 2N - 1 by step 2:
   If DSi is 'A', do not change rj and rj+1 of Ref.
   If DSi is 'C', change ([rj], [rj+1]) to C([rj], [rj+1]).
   If DSi is 'G', change ([rj], [rj+1]) to CC([rj], [rj+1]).
   If DSi is 'U', change ([rj], [rj+1]) to CCC([rj], [rj+1]).
   Here C denotes the complement of a base pair under table 3, CC the double complement and CCC the triple complement.
3. Now the new DNA sequence is Ref.

The original reference DNA should be stored for later use in the extraction process. The processed reference DNA is the output of DNA steganography: it is the embedded DNA sequence in which the actual encrypted DNA sequence is hidden. So in this level of steganography two files are to be stored, one containing the reference DNA sequence that is randomly generated in the algorithm, and the other containing the output of the DNA steganography, which is the embedded DNA sequence.

V. AUDIO STEGANOGRAPHY
This section describes the third level of the proposed method, audio steganography. The embedded DNA sequence obtained as the output of DNA steganography is hidden in an audio file to hide the existence of the secret data.

Least Significant Bit (LSB) modification is used for audio steganography. A password is provided in addition to the embedded DNA sequence to provide additional security. This password is encrypted and hidden in the audio file along with the embedded DNA sequence.

Audio files used should be in .AU format since it is the standard audio file format used by Sun, Unix and Java. The audio file is read in binary format, and the encrypted password and the embedded DNA sequence are stored in the lower half of each byte of the audio file. This creates no noticeable distortion of the sound.

The process of audio steganography has the following steps:
1. Read the embedded DNA sequence and the audio file in binary format.
2. Read the password.
3. Encrypt the password and the embedded sequence together using DES to form the cipher.
4. Sample the audio file.
5. Encode the length of the cipher in the lower half of the first 32 audio samples.
6. Encode the cipher in the lower half of the remaining audio samples.

VI. EXTRACTION OF SECRET FILE FROM AUDIO DATA
This section describes how the secret file is extracted from the stego audio file. The extraction process reverses all the processes that were carried out to hide the secret file containing the secret message.

After the encryption phase and the two steganography phases, there are three files as output. The first file contains the ambiguity numbers from the encryption phase, which are necessary in the decryption phase. The second file contains the reference DNA used in the DNA steganography phase, which is needed to extract the actual DNA sequence. The third file is the audio file in which the embedded DNA sequence is hidden. The password used in the audio steganography phase must be remembered in order to extract the embedded DNA sequence from the audio file.

As the first step, the stego audio file is sampled. From the first 32 samples, the length of the cipher embedded in the file is decoded. Then the cipher itself is decoded from the remaining samples. This cipher is decrypted to obtain the embedded DNA sequence.

To extract the actual DNA sequence from the embedded DNA sequence, a comparison between the reference DNA and the embedded sequence is needed. The DNA bases in both DNA sequences are treated as pairs of bases. The length of both DNA sequences is the same. The reference DNA sequence is annotated as Ref, the embedded sequence is annotated as DC and the output sequence which is the
actual DNA sequence embedded in the reference DNA is annotated as OP. The extraction process has the following steps:
1. Let the length of the DNA sequences be N.
2. Set k = 1.
3. For i = 1 to N by step 2:
   If ([ri], [ri+1]) of DC is equal to ([di], [di+1]) of Ref, OPk is 'A'.
   If ([ri], [ri+1]) of DC is equal to C([di], [di+1]), OPk is 'C'.
   If ([ri], [ri+1]) of DC is equal to CC([di], [di+1]), OPk is 'G'.
   If ([ri], [ri+1]) of DC is equal to CCC([di], [di+1]), OPk is 'U'.
   Set k = k + 1.
4. Now the actual DNA sequence is OP.

This actual DNA sequence is then decrypted to obtain the secret file. The ambiguity numbers obtained from the encryption phase are needed for decryption. The decryption process has the following steps:
1. Convert the DNA sequence into binary form using table 1.
2. Group the binary data into groups of 8 bits and convert each group into the letter with the corresponding ASCII value to obtain the encrypted amino acid sequence.
3. Decrypt this amino acid sequence using the Playfair cipher method to obtain the actual amino acid sequence.
4. Using the ambiguity numbers and table 2, convert each amino acid in the sequence into the corresponding DNA codon to obtain the DNA sequence.
5. Using table 1, convert the DNA sequence into binary form.
6. Group the binary form into groups of 8 bits and convert them into letters using the corresponding ASCII values.
7. Store the output of step 6 to obtain the actual secret file.

VII. EXPERIMENTAL EVALUATION
The software tool used for the evaluation in this work is JDK 1.7. The algorithms were implemented successfully and results were obtained.
The purpose of this experimental evaluation is to assess the performance of the proposed method. Text files are used to conduct the experimental evaluation. The results of hiding text files of different sizes are given in table 4.

TABLE 4: RESULTS OF HIDING TEXT FILES

Size of text file containing   Size of file after DNA   Number of audio samples
secret message (KB)            steganography (KB)       required to embed the file
1                              10.7                     10968
2                              21.4                     21976
3                              32.1                     32920
4                              42.6                     43704
5                              53.3                     54680
6                              64.0                     65632
7                              74.7                     76512

Advantages of the proposed method:
• The proposed method takes advantage of both encryption and steganography.
• Since the last step is to hide the secret text file in an audio file, it will not attract unwanted attention.
• It is difficult to prove that a secret message exists and to interpret the message inside the audio file, since the message has gone through encryption as well as two steps of steganography.

VIII. CONCLUSION
This paper proposed a method to hide secret messages stored in text files from unauthorized access. The proposed method has three levels. The first level is DNA based encryption of the secret file. In level 2 the encrypted file is hidden in a randomly generated DNA sequence. This embedded DNA sequence is hidden in an audio file in level 3.
The proposed method was tested on different secret messages stored in different text files. Advantages of the proposed method are listed. In conclusion, in the proposed scheme the message is encrypted, hidden in a DNA sequence and, as the third step, hidden in an audio file.

REFERENCES
[1] Venkataraman S, Abraham Ajith, "Significance of Steganography on Data Security", International Conference on Information Security, Coding and Computing, pp. 347-351, 2004.
[2] Bankar Priyanka R, Katariya Vrushabh, Patil Komal K, Shashikant M Pingle, "Audio Steganography using LSB", First International Conference on Recent Trends in Engineering & Technology, IJECSCSE, 2012.
[3] Abbas Cheddad, Joan Condell, Kevin Curran, Paul Mc Kevitt, "Digital Image Steganography: Survey and Analysis of Current Methods", Signal Processing, vol. 90, no. 3, pp. 727-752, March 2010.
[4] Cui Guang-zhao, Li Haobin, Li Xiaoguang, "DNA Computing and Its Application to Information Security Field", IEEE ICNC, pp. 148-152, 2009.
[5] Balado F, "Capacity of DNA Data Embedding Under Substitution Mutation", IEEE Transactions on Information Theory, vol. 59, no. 2, pp. 928-941, 2013.
[6] Xing Wang and Qiang Zhang, "DNA computing based Cryptography", 2009.
[7] Mona Sabry, Mohamed Hashem, Taymoor Nazmy, and Mohamed Essam Khalifa, "A DNA and Amino Acids Based Implementation of Playfair Cipher", vol. 8, no. 3, 2010.
[8] Catherine Taylor Clelland, Viviana Risca, and Carter Bancroft, "Hiding messages in DNA microdots", vol. 399, 1999.
[9] Andre Leier, Christoph Richter, Wolfgang Banzhaf, and Hilmar Rauhe, "Cryptography with DNA binary strands", vol. 57, 2000.
[10] Magdy Saeb, Eman El-Abd, and Mohamed E El-Zanaty, "On Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques", 2007.
[11] Hayam Mousa, Kamel Moustafa, Waiel Abdel-Wahed, and Mohiy Hadhoud, "Data Hiding Based on Contrast Mapping Using DNA Medium", vol. 8, no. 2, 2011.
[12] Amal Khalifa, Ahmed Atito, "High-Capacity DNA-based Steganography", 2012.
[13] L. Adleman, "Molecular computation of solutions to combinatorial problems", Science, JSTOR, vol. 266, pp. 1021-1025, 1994.
[14] G. Cui, Y. Liu, and X. Zhang, "New direction of data storage: DNA molecular storage technology", Computer Engineering and Application, vol. 42, no. 26, pp. 29-32, 2006.
[15] J. Chen, "A DNA-based, biomolecular cryptography design", in IEEE International Symposium on Circuits and Systems (ISCAS), pp. 822-825, 2003.
[16] Beenish Anam, Kazi Sakib, Md. Alamgir Hossain, Keshav Dahal, "Review on the Advancements of DNA Cryptography", 1 Oct 2010.
[17] Kumar D, Singh S, "Secret Data Writing Using DNA Sequences", IEEE International Conference on ETNCC, pp. 402-405, 2011.
[18] Asad M, Gilani J, Khalid A, "An Enhanced Least Significant Bit Modification Technique for Audio Steganography", IEEE International Conference on ICCNIT, pp. 143-147, 2011.
[19] Battacharya S, Yuangfang Gao, Korapally V, Othman M.T, Grant S.A, Kleiboeker S.B, Gangopadhyay K, "Optimization of Design and Fabrication processes for Realization of a PDMS-SOG-Silicon DNA amplification chip", IEEE Journal of Microelectromechanical Systems, pp. 401-410, 2007.
[20] Bourkerche A, de Melo A.C.M.A, Walter M.E.T, Melo R.C.F, Santana M.N.P, Batista R.B, "A performance evaluation of a local DNA sequence alignment algorithm on a cluster of workstations", IEEE IPDPS, 2004.
[21] Tumeo A, Villa O, "Accelerating DNA Analysis applications on GPU clusters", IEEE SASP, pp. 71-76, 2010.
[22] Leroy Hood and David Galas, "The digital code of DNA", vol. 421, no. 6921, 2003.
[23] Asad Muhammad, Gilani Junaid, Khalid Adnan, "Three layered Model for Audio Steganography", IEEE ICET, pp. 1-6, 2012.
