Hiding Message Into DNA Sequence Through DNA Coding and Chaotic Maps
DOI 10.1007/s11517-014-1177-3
Original Article
Received: 14 April 2013 / Accepted: 27 June 2014 / Published online: 15 July 2014
© International Federation for Medical and Biological Engineering 2014
742 Med Biol Eng Comput (2014) 52:741–747
The principle of hiding data in a DNA sequence is as follows: a reference sequence S is randomly selected from a publicly available DNA sequence database; only the sender and the receiver are aware of the reference sequence. The sender transforms S into a new sequence S′ by embedding the secret message M into it. After the sequence S′ is sent to the receiver together with other real sequences, the receiver examines the received sequences to identify S′, recovers the secret message M, and converts S′ back to S.

Shiu et al. [13] proposed three data hiding methods to embed the secret message into a reference DNA sequence. Methods 1, 2, and 3 are the insertion method, the complementary pair method, and the substitution method, respectively. The results show that Method 2 offers the worst capacity, while Method 3 shows the best capacity. Method 1 is the most robust, and Method 3 is the least. Of the three methods, only Method 3 has zero payload, which indicates that the encrypted reference sequence can confuse an attacker and is difficult to identify exactly.

Guo et al. [4] proposed a new DNA sequence-based data hiding scheme. They establish an injective mapping between one complementary rule and two secret bits in a message. Based on this mapping mechanism, the proposed scheme can hide two secret bits of a message by replacing one character, which greatly improves the embedding capacity. Robustness and security analyses show that the probability of an attacker’s making a successful recovery of the hidden data is negligible. According to the experimental results, the scheme has a stable and efficient embedding capacity with a low modification rate, and the fake DNA sequence does not need to expand the length of the reference DNA sequence.

In this paper, we propose an improved data hiding method based on the substitution method in Ref. [13] and take four measures to enhance the robustness and enlarge the hiding capacity: ➀ encoding the secret message by DNA coding to reduce its size; ➁ encrypting the secret message by Chebyshev chaotic maps to enhance the robustness; ➂ generating the hiding locations randomly by the PWLCM system to resist attacks; ➃ using the complementary rule to hide 2 bits in one nucleotide to enlarge the hiding capacity. Experimental results demonstrate the effectiveness.

2 Encode and encrypt secret message

2.1 Encode secret message using DNA coding

Each DNA sequence contains four nucleic acid bases, which are A (adenine), C (cytosine), G (guanine), and T (thymine), where A and T, and C and G, are complementary pairs. In binary code, 0 and 1 are complementary, so 0 (00) and 3 (11) are a complementary pair, and 1 (01) and 2 (10) are a complementary pair too. Of the 4! = 24 possible codings, only 8 satisfy the complementary rule; for example, 0123 can be encoded as CTAG, CATG, GTAC, GATC, TCGA, TGCA, ACGT, or AGCT. Here, we select one of them to encode the secret message.

DNA coding is an alternative to traditional ASCII coding. Nucleotides are used as a quaternary code, and each letter is denoted by three nucleotides. Each DNA-coded letter is therefore only 6 bits long, which is shorter than the 8 bits of ASCII coding. In Ref. [3], Clelland et al. proposed the translation table from alphabets to DNA nucleotides, as shown in Table 1.

If we use C, T, A, G to denote 0 (00), 1 (01), 2 (10), 3 (11), use CGA (001110) to denote the letter A, use CCA (000010) to denote the letter B, etc., then the secret message to be hidden can be encoded as a DNA sequence; for example, “AB” is encoded as CGACCA (001110000010). Conversely, decoding the sequence yields the secret message.

Although the DNA coding in Table 1 can express only capital letters, numbers, and several punctuation marks, it is still sufficient to encode the secret message M; if the length of M is L, the length of the encoded message M′ is 3L.

2.2 Encrypt the DNA sequence by Chebyshev maps

After encoding the secret message M by DNA coding to get M′, we can further generate a DNA sequence by Chebyshev maps to encrypt it.

The Chebyshev maps can be described as follows:
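As an aside on the coding of Section 2.1, the short Python sketch below enumerates the codings that satisfy the complementary rule and encodes a message with the coding C, T, A, G = 00, 01, 10, 11. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the letter table holds only the four codons quoted in the text of this paper (A, B, O, K), since the full Table 1 is not reproduced here.

```python
from itertools import permutations

# Watson-Crick complements, as stated in Section 2.1.
BASE_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_codings():
    """List the base orders for digits 0, 1, 2, 3 in which the complementary
    digit pairs (0, 3) and (1, 2) are assigned complementary bases."""
    valid = []
    for perm in permutations("ACGT"):
        pair_03_ok = BASE_COMPLEMENT[perm[0]] == perm[3]
        pair_12_ok = BASE_COMPLEMENT[perm[1]] == perm[2]
        if pair_03_ok and pair_12_ok:
            valid.append("".join(perm))
    return valid


# Coding selected in the text: C, T, A, G denote 00, 01, 10, 11.
BITS = {"C": "00", "T": "01", "A": "10", "G": "11"}

# Partial letter table (assumption: only the codons quoted in the text).
CODONS = {"A": "CGA", "B": "CCA", "O": "GGC", "K": "AAG"}
LETTERS = {codon: letter for letter, codon in CODONS.items()}


def encode(message):
    """Replace each letter by its three-nucleotide codon."""
    return "".join(CODONS[ch] for ch in message)


def decode(dna):
    """Split the sequence into codons and look each one up."""
    return "".join(LETTERS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def to_bits(dna):
    """Write a nucleotide sequence as a bit string, 2 bits per base."""
    return "".join(BITS[base] for base in dna)


if __name__ == "__main__":
    print(len(complementary_codings()))
    print(encode("AB"), to_bits(encode("AB")))
```

A message of L letters becomes 3L nucleotides, that is 6L bits, compared with 8L bits in ASCII.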
Fig. 1 Plot and distribution of PWLCM system. a The plot of PWLCM system. b The distribution of x and q with 1,000 iterations
hiding locations. If the randomly selected reference DNA sequence is long enough, the value of p_i is determined by the length L of the secret message. If p_i ∈ {1, 2, 3, 4}, the values can be generated by Eq. (7). If p_i ∈ {1, 2} or p_i ∈ {1, 2, 3}, the values can be generated by similar equations.

\[
p_i =
\begin{cases}
1, & y_i \in (-1, -0.5] \\
2, & y_i \in (-0.5, 0] \\
3, & y_i \in (0, 0.5] \\
4, & y_i \in (0.5, 1]
\end{cases}
\tag{7}
\]

Step 2. For i from 1 to 15:
if s_i′ is the same as s_i, then set m_j = 0 and j = j + 1;
else if s_i′ is the same as C(s_i), then set m_j = 1 and j = j + 1;

Step 3. Concatenate all m_k, 1 ≤ k ≤ j − 1, to form M. M is the secret message. Set every s_i′ to s_i to recover S′ back to the reference sequence S.

4.2 Improved message hiding method
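The interval mapping of Eq. (7) and the 2-bit substitution of the improved method can be sketched together in Python. This is a minimal sketch under stated assumptions rather than the authors' code: all function names are hypothetical, the payload is passed as the encrypted nucleotide sequence XM′, and the i-th symbol is taken to be hidden at the 1-based position 1 + p_1 + ... + p_i, which is one reading of the index condition in the hiding steps. The complementary rule is the one applied in Section 5.1, (AT)(CA)(GC)(TG), where a pair (xy) means C(x) = y.

```python
# Complementary rule (AT)(CA)(GC)(TG): C(A)=T, C(C)=A, C(G)=C, C(T)=G.
RULE = {"A": "T", "C": "A", "G": "C", "T": "G"}
# Coding used in the text: C, T, A, G denote 00, 01, 10, 11.
BITS = {"C": "00", "T": "01", "A": "10", "G": "11"}
BASE_OF_BITS = {bits: base for base, bits in BITS.items()}


def location_from_y(y):
    """Eq. (7): map y in (-1, 1] to a relative location in {1, 2, 3, 4}."""
    if not -1 < y <= 1:
        raise ValueError("y must lie in (-1, 1]")
    if y <= -0.5:
        return 1
    if y <= 0:
        return 2
    if y <= 0.5:
        return 3
    return 4


def apply_rule(base, times):
    """Apply the complementary rule C to a base the given number of times."""
    for _ in range(times):
        base = RULE[base]
    return base


def absolute_positions(relative):
    """Return the 1-based positions 1 + p_1 + ... + p_i for each i."""
    positions, total = [], 1
    for p in relative:
        total += p
        positions.append(total)
    return positions


def embed(reference, payload, relative):
    """Hide one payload symbol (2 bits) per selected nucleotide."""
    seq = list(reference)
    for pos, symbol in zip(absolute_positions(relative), payload):
        if pos > len(seq):
            raise ValueError("reference sequence is too short")
        k = int(BITS[symbol], 2)  # 00 -> 0, 01 -> 1, 10 -> 2, 11 -> 3
        seq[pos - 1] = apply_rule(seq[pos - 1], k)
    return "".join(seq)


def recover(fake, reference, relative):
    """Read the payload back by counting how often C was applied."""
    symbols = []
    for pos in absolute_positions(relative):
        for k in range(4):
            if apply_rule(reference[pos - 1], k) == fake[pos - 1]:
                symbols.append(BASE_OF_BITS[format(k, "02b")])
                break
    return "".join(symbols)


if __name__ == "__main__":
    S = "ACGGAATTGCTTCAGT"       # reference sequence from Section 5.1
    XM = "CACGCT"                # encrypted payload from Section 5.1
    P = [2, 1, 4, 3, 2, 3]       # relative locations from Section 5.1
    fake = embed(S, XM, P)
    print(fake, recover(fake, S, P) == XM)
```

Because the rule is a single cycle through all four bases, exactly one count k in {0, 1, 2, 3} matches at each hidden position, so recovery is unambiguous and the reference sequence is restored by copying s_j back over s_j′.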
j = i;
if j ≠ 1 + Σ_{k=1}^{i} p_k, do not change s_i;
if j = 1 + Σ_{k=1}^{i} p_k, then
do case
case m_i′ = 00, set s_j′ = s_j;
case m_i′ = 01, set s_j′ = C(s_j);
case m_i′ = 10, set s_j′ = C(C(s_j));
case m_i′ = 11, set s_j′ = C(C(C(s_j)));
end case.

Step 5. After the above steps, sequence S has been changed into sequence S′; then send S′ to the receiver.

4.3 Improved message recovery method

The receiver can use the following algorithm to recover the hidden message.

Input: A fake DNA sequence S′ = s_1′, s_2′, ..., s_{L_S}′, a reference DNA sequence S = s_1, s_2, ..., s_{L_S}, the complementary rule, and the keys to generate sequence X.
Output: The hidden secret message M.

Step 1. Generate the relative hiding locations P = {p_1, p_2, ..., p_3L} with the same key.

Step 2. Initialize i to 1.

Step 3. For each element s_i′ ∈ S′, do the following operation:
j = i;
if j = 1 + Σ_{k=1}^{i} p_k, then
do case
case s_j′ = s_j, m_i′ = 00;
case C(s_j′) = s_j, m_i′ = 11;
case C(C(s_j′)) = s_j, m_i′ = 10;
case C(C(C(s_j′))) = s_j, m_i′ = 01;
end case.

Step 4. After Step 3, we obtain XM′ = {m_1′, m_2′, ..., m_3L′}; then decrypt it by Eq. (5) to get M′, and decode M′ by DNA coding to get the secret message M.

5 Experiments and comparisons

5.1 Simulation results

Here, we set the plain text “OK” as the secret message M; according to Table 1, the result of DNA coding is M′ = GGCAAG. Assume that S = ACGGAATTGCTTCAGT. We use C, T, A, G to denote 00, 01, 10, 11, and the complementary rule (AT)(CA)(GC)(TG) is applied. We set w = 4.2926871 and z0 = 0.2182135678932276 in Eq. (1) to get X = GTCAAG, then XM′ = X ⊕ M′ = CACGCT. We use Eq. (7) to generate the relative hiding locations P = {2, 1, 4, 3, 2, 3}. The process of transforming S to S′ is shown in Fig. 2, and S′ can be transformed back into S by the inverse transformation while extracting the secret message.

Fig. 2 Process of message hiding

5.2 Analysis of key space

For an intruder to discover the secret message, the following information is necessary: ➀ the reference DNA sequence, ➁ the complementary rule, ➂ the sequence X, and ➃ the sequence P.

For ➀, there are roughly 163 million DNA sequences available publicly. Thus, the probability for an attacker to make a successful guess is 1/(1.63 × 10^8), so the key space is S_DNA = 1.63 × 10^8.

For ➁, the number of legal complementary rules should be considered. There are six legal complementary rules, as follows: (AT)(TC)(CG)(GA), (AT)(TG)(GC)(CA), (AC)(CT)(TG)(GA), (AC)(CG)(GT)(TA), (AG)(GT)(TC)(CA), and (AG)(GC)(CT)(TA). The probability of a correct guess is 1/6, so the key space is 6.

For ➂, the initial condition z0 ∈ [0, 1] varies with a precision of 10^−16, so the key space for z0 is S_z0 = 10^16. To ensure a large divergence of a chaotic trajectory from the initial condition, the number of iterations n should be relatively large but not too large; it is usually set to n ∈ [100, 1,000]. This enlarges the key space of each of Eqs. (1) and (6) by a factor of 9 × 10^2, so the total key space is S_n = (9 × 10^2)^2 = 8.1 × 10^5.

The parameter w varies in the chaotic region between 2 and 6 with a step of 10^−7, so S_w = 4 × 10^7.

For ➃, the initial condition y0 ∈ [0, 1] varies with a precision of 10^−16, so the key space for y0 is S_y0 = 10^16. Because q_i ≈ y_i/2 and y_i ∈ [0, 1], S_qi = S_y0 = 10^16.

Finally, the algorithm has S = 6 S_DNA S_z0 S_n S_w S_y0 S_qi = 3.5575 × 10^70 combinations of the secret keys. Even though the fake DNA sequence is identified, it is still
Cracking probability: 1/(1.63 × 10^8) × 1/6 | 1/(1.63 × 10^8) × 1/6 × 1/24 | 1/(1.63 × 10^8) × 1/6 × 1/10^16 × 1/(1.8 × 10^5) × 1/(4 × 10^7) × 1/10^32
virtually impossible for the sequence to be correctly recovered without the correct keys.

5.3 Security comparison

We compare the methods in Refs. [4, 13] with our method, and the results are shown in Table 2. With the measures taken, the cracking probability of the proposed method is much lower than those of the compared methods.

5.4 Hiding capacity

With the substitution method in Ref. [13], each nucleotide can carry only one bit. To enlarge the hiding capacity, we encode the secret message by 6-bit DNA coding instead of 8-bit ASCII coding, and each nucleotide carries 2 bits through the complementary rule. We use the same eight DNA sequences as in Refs. [4, 13], taken from Web site [14], as the test sample; the comparison of the maximal hiding capacity is shown in Table 3, from which we can see that the hiding capacity of the improved substitution method is greater than that of the substitution method in Ref. [13]. The results for different settings of p_i are also shown in Table 3.

6 Conclusion

An improved method for hiding data in a DNA sequence is designed. Four measures have been taken to enhance the robustness and enlarge the hiding capacity. The plain message is encoded by DNA coding and encrypted by a pseudorandom sequence generated by Chebyshev chaotic maps. The random hiding locations are generated by the PWLCM system to resist attacks. The complementary rule is applied to hide 2 bits in one nucleotide to enlarge the hiding capacity. Experimental results indicate that the hiding capacity of the proposed method is greater than that of the competing method, and that it also performs better in robustness. In contrast with some traditional data hiding schemes that use images as the carrier, the proposed method is easy to implement and hard to detect.

Acknowledgments The research is supported by the National Natural Science Foundation of China (No. 61363082) and the Minority Nationality Technology Talent Cultivation Plan of Xinjiang (No. 201123116).

References

1. Awad A, Assad SE, Wang QX, Vlădeanu C, Bakhache B (2008) Comparative study of 1-D chaotic generators for digital data encryption. IAENG Int J Comput Sci 35(4):483–488
2. Chang CC, Lu TC, Chang YF, Lee RCT (2007) Reversible data hiding schemes for deoxyribonucleic acid (DNA) medium. Int J Innov Comput Inf Control 3(5):1145–1160
3. Clelland CT, Risca V, Bancroft C (1999) Hiding messages in DNA microdots. Nature 399(6736):533–534
4. Guo C, Chang CC, Wang ZH (2012) A new data hiding scheme based on DNA sequence. Int J Innov Comput Inf Control 8(1(A)):139–149
5. Lee CF, Huang YL (2012) An efficient image interpolation increasing payload in reversible data hiding. Expert Syst Appl 39(8):6712–6719
6. Leier A, Richter C, Banzhaf W, Rauhe H (2000) Cryptography with DNA binary strands. BioSystems 57(1):13–22
7. Liu HJ, Wang XY, Zhu QL (2011) Asynchronous anti-noise hyper chaotic secure communication system based on dynamic delay and state variables switching. Phys Lett A 375(30–31):2828–2835
8. Liu HJ, Wang XY, Kadir A (2012) Image encryption using DNA complementary rule and chaotic maps. Appl Soft Comput 12(5):1457–1466
9. Mali SN, Patil PM, Jalnekar RM (2012) Robust and secured image-adaptive data hiding. Digit Signal Proc 22(2):314–323
10. Peterson I (2001) Hiding in DNA, Muse, p 22
11. Qian ZX, Zhang XP (2012) Lossless data hiding in JPEG bitstream. J Syst Softw 85(2):309–313
12. Shimanovsky B, Feng J, Potkonjak M (2002) Hiding data in DNA, the 5th international workshop on information hiding. Lect Notes Comput Sci 2578:373–386
13. Shiu HJ, Ng KL, Fang JF, Lee RCT, Huang CH (2010) Data hiding methods based upon DNA sequences. Inf Sci 180(11):2196–2208
14. Website, NCBI Database: http://www.ncbi.nlm.nih.gov/
15. European Bioinformatics Institute. http://www.ebi.ac.uk/
16. Yang WJ, Chung KL, Liao HYM (2012) Efficient reversible data hiding for color filter array images. Inf Sci 190(1):208–226