A DNA-based Data Hiding Technique With Low Modification Rates

Multimed Tools Appl
DOI 10.1007/s11042-012-1176-z
Ying-Hsuan Huang & Chin-Chen Chang & Chun-Yu Wu
© Springer Science+Business Media, LLC 2012
Abstract In 2010, Shiu et al. proposed three DNA-based reversible data hiding schemes with high embedding capacity. However, their schemes did not address the DNA modification rate or the expansion of the DNA sequence. Therefore, we propose a novel reversible data hiding scheme based on the histogram technique to overcome these weaknesses of Shiu et al.'s schemes. The proposed scheme transforms the DNA sequence into a binary string and then combines several bits into a decimal integer. These decimal integers are used to generate a histogram, and the scheme then uses a histogram technique to embed secret data. The experimental results show that, for the same embedding capacity, the modification rate of the proposed scheme is about 69 percentage points lower than that of Shiu et al.'s insertion scheme. In addition, the length of the DNA sequence remains unchanged in the proposed scheme.
Keywords DNA · Reversible data hiding · DNA modification rate · Histogram
1 Introduction
Reversible data hiding schemes embed secret data into cover media such as images [2, 5, 7], video [3, 15, 16], audio [14, 17], and DNA sequences [1, 4, 6, 8–13]. Each medium has its own advantages for applications. For example, about 163 million DNA sequences are currently openly available, and this abundance helps assure the security and robustness of DNA-based information hiding methods [4]. In other words, because DNA sequences are so plentiful, they are a satisfactory cover medium.

Y.-H. Huang
Department of Computer Science and Engineering, National Chung Hsing University, Taichung 40227, Taiwan, Republic of China
C.-C. Chang (*)
Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan, Republic of China
e-mail: alan3c@gmail.com
C.-C. Chang
Department of Computer Science and Information Engineering, Asia University, Taichung 41354, Taiwan, Republic of China
C.-Y. Wu
Department of Computer Science and Information Engineering, Chung Cheng University, Chiayi 62102, Taiwan, Republic of China
A DNA sequence includes four nucleotides, adenine (A), thymine (T), cytosine (C), and guanine (G), as well as a non-labeled nucleotide (N), as shown in Fig. 1 [9]. Non-labeled nucleotides are rare in DNA sequences, so if one is modified, hackers may suspect that the DNA sequence carries secret data. Furthermore, experts have pointed out that more than 99 % of human DNA is the same across the population [6]. Therefore, if many nucleotides are modified or the length of the DNA sequence is expanded, hackers can detect that the DNA sequence differs from the original sequence.
Peterson [10] and Shimanovsky et al. [11] proposed two data hiding schemes that use the
DNA sequence. Their schemes efficiently utilize the DNA sequence to hide secret data.
nnnnnnnnnn nnnnnnnnnn nnnncccacc ctcctcccaa ataaaaccca ataacccaat

taacatgtac aggggataaa ttttaagcta ttaaattttt cctccttccc ctctccctcc
ctttccccct tctcttcctc ttttttccat cagcctatta atttatcacc taaccatccc
tccatcactt tcttctttct ttcgtctctg cccagcacct tttacctttt tagcactttt
tagaatagaa actgaaaata atcttgatct taaacataac gtgttagaaa aattgaatgt
gttttttgag aagagatatc ttgtccttgt atccacatat cattgtgata cttgaacctg
tctcaaaaca gaagtagaac tatgattttt aacactaatt tcaatcttta gtgaatagac
tttcctttcc cagccaccct gatgagagag aacagaacac ttaaacacaa gtctggtagt
tctgatacca cttacccaag ttgagtgcct tttatggttc ccagtggcca tgatgatttc
tattcctttt caagtttgta agatcttggt tggtaatttt tgtagcagcc aggagtttgg
gtcctagtaa ctgaacctta tacccttttt tttttttttc cttcctctcc aggtgtctgc
tctgggacca ccttgctcta tttatccttt tttgtatggt gtttcccttg tcaattcatg
taatgctgtt cattttagtt attaagatat gaatttaatc cagtgaaggg ggcaagtatg
tgtgtgtctc tctgaatttt ttaaagggga ataatatttc tcacttgagg aaactggctt
tcacagtcca tcctctaact tttttctttt ttcaatttga actggccaac taaggagatc
acatttagta cttgctagtt gatacttacc ttctcccatt ctgagtgagt cctcctgtgg
ccccctctcc caaaggtgca ctgagtcaca atgcgattag cagctgcctg tgcactcttg
cacatggcac actaaaccat ggggtgttca aggttcagct taggaaggac tttcaggaaa
agatggagac cctttcccta ctccaatcat tgaatagttg cttgtaactg gtattaatgt
tataaatgat tgtctctaag tctgtctcag ttccaaacca gatggagttt tgattttaga
aaatcattca gtgaaatgta tttcctgtgc ctggtagtca gtgatcactt accaaaaaaa
gtgggggggg gtgggagaat aaaaataaac ttatgataat tcacatacac ttaaccttta
ctgagtgagt aataccaaaa agggaatgaa gcacgctctg ctgaccagca tttccaaatg
gattccagga atgtggttag agttgccaaa gtaatctctg aatcctgcca aaagcttcct
ttgggatatt ttttaggtta gggaaagcga gggatatgaa ggtgtgtata tgaatatgtt
Fig. 1 Part of the DNA sequence [9]

However, the original DNA sequence cannot be recovered in their schemes. In 2007, Chang et al. [1] proposed two DNA-based reversible data hiding schemes in which the secret data can be extracted and the original DNA sequence can be restored. However, their schemes need a compression technique to embed secret data, and Coltuc and Chassery [2], Hong et al. [5], and Jin et al. [7] pointed out that compression-based data hiding is complicated. In 2010, Shiu et al. proposed three reversible data hiding schemes based on DNA sequences [12]: the insertion method, the complementary pair method, and the substitution method. These schemes have good embedding capacity, but they suffer from a high modification rate. In addition, the insertion method requires several keys to be transmitted to the receiver; the complementary pair method expands the sequence significantly while embedding the secret message; and the substitution method requires the original DNA sequence to be transmitted to the receiver for extracting the secret message. In [12], the complementary pair method expands the DNA sequence more than the other two methods do, which means it can easily attract the attention of hackers. To overcome the weaknesses of the insertion and substitution methods, this paper adopts a binary coding rule and the histogram technique. Furthermore, the proposed scheme can effectively control the modification rate.
2 Related work
In this section, the insertion and substitution methods proposed in [12] are described. Each method requires a binary coding rule that is known to both the sender and the receiver; the rule is shown in Table 1. The original DNA sequence is denoted by $S = \{s_1, s_2, \ldots, s_i\}$, where i is the number of nucleotides, and the secret data are denoted by $M = \{m_1, m_2, \ldots, m_r\}$, where r is the number of secret bits.
2.1 Insertion method [12]
2.1.1 Embedding phase
Step 1: Use the binary coding rule to transform the DNA sequence $S = \{s_1, s_2, \ldots, s_i\}$ into a binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 2: Divide the binary string B into $\lceil 2i/n \rceil$ segments of n bits each. For example, if n is 3, the first segment is $\{b_1, b_2, b_3\}$.
Step 3: Insert the secret bits $M = \{m_1, m_2, \ldots, m_r\}$, one per segment, at the beginning of each segment, and combine the segments into a new binary string $B' = \{m_1, b_1, b_2, b_3, m_2, b_4, \ldots, b_{2i}\}$.
Step 4: Use the inverse binary coding rule to transform the new binary string B′ into the stego DNA sequence $S' = \{s'_1, s'_2, \ldots, s'_{i+(r/2)}\}$.
Table 1 Binary coding rule
Nucleotide | Binary code
A | 00
T | 01
C | 10
G | 11
After receiving the stego DNA sequence S′ and the two keys n and r, the receiver can extract the secret data and recover the original DNA sequence.
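The embedding steps above can be sketched in a few lines of Python. This is a minimal sketch under our reading of Step 3, not the implementation of [12]: we assume the r secret bits go into the first r segments, and that r is even so that the stego bit string decodes to whole nucleotides. The function names `to_bits`, `to_dna`, and `insertion_embed` are hypothetical.

```python
# Binary coding rule of Table 1 and its inverse.
ENCODE = {"A": "00", "T": "01", "C": "10", "G": "11"}
DECODE = {code: base for base, code in ENCODE.items()}


def to_bits(seq: str) -> str:
    """Step 1: replace every nucleotide by its 2-bit code."""
    return "".join(ENCODE[base] for base in seq)


def to_dna(bits: str) -> str:
    """Step 4: read the bit string back two bits at a time."""
    return "".join(DECODE[bits[k:k + 2]] for k in range(0, len(bits), 2))


def insertion_embed(seq: str, secret: str, n: int = 3) -> str:
    """Put secret bit m_k in front of the k-th n-bit segment (k = 1..r)."""
    if len(secret) % 2:
        # Assumption: r is even, so the stego bits form whole nucleotides.
        raise ValueError("this sketch needs an even number of secret bits")
    bits = to_bits(seq)                                          # Step 1
    segments = [bits[k:k + n] for k in range(0, len(bits), n)]   # Step 2
    if len(secret) > len(segments):
        raise ValueError("more secret bits than segments")
    # Step 3: the first r segments each gain one leading secret bit.
    stego_bits = "".join(
        (secret[k] if k < len(secret) else "") + segment
        for k, segment in enumerate(segments)
    )
    return to_dna(stego_bits)                                    # Step 4


print(insertion_embed("AATCG", "10"))  # CAATCG
```

In this toy run the stego sequence is one nucleotide longer (i + r/2 = 6), and because every inserted bit shifts all later bits, four of the first five positions differ from the original.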
2.1.2 Extraction and recovery phase
Step 1: Apply the binary coding rule to transform the stego DNA sequence $S' = \{s'_1, s'_2, \ldots, s'_{i+(r/2)}\}$ into a binary string $B' = \{b'_1, b'_2, \ldots, b'_{2i+r}\}$.
Step 2: Divide the binary string B′ into $\lceil (2i+r)/(n+1) \rceil$ segments of n+1 bits each. For example, the first segment is $\{b'_1, b'_2, b'_3, b'_4\}$.
Step 3: Extract and delete the first bit of each segment to obtain the secret data. For example, the first secret bit is $b'_1$ and the new segment is $\{b'_2, b'_3, b'_4\}$. After extracting the secret data and deleting the first bit of each segment, combine these segments into the original binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 4: Use the inverse binary coding rule to transform B into the original DNA sequence S.
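A matching sketch of the extraction and recovery steps follows. It assumes, as in the embedding sketch, that only the first r segments carry a secret bit, so the receiver uses both keys n and r; `insertion_extract` is a hypothetical name.

```python
ENCODE = {"A": "00", "T": "01", "C": "10", "G": "11"}  # Table 1
DECODE = {code: base for base, code in ENCODE.items()}


def to_bits(seq: str) -> str:
    return "".join(ENCODE[base] for base in seq)


def to_dna(bits: str) -> str:
    return "".join(DECODE[bits[k:k + 2]] for k in range(0, len(bits), 2))


def insertion_extract(stego: str, n: int, r: int) -> tuple[str, str]:
    """Return (secret bits, recovered DNA) using the keys n and r."""
    bits = to_bits(stego)                      # Step 1
    secret, kept, pos = [], [], 0
    for _ in range(r):                         # Step 2: segments of n + 1 bits
        segment = bits[pos:pos + n + 1]
        secret.append(segment[0])              # Step 3: first bit is secret
        kept.append(segment[1:])               # remaining bits are original
        pos += n + 1
    kept.append(bits[pos:])                    # bits after the r-th segment were untouched
    return "".join(secret), to_dna("".join(kept))  # Step 4


print(insertion_extract("CAATCG", n=3, r=2))  # ('10', 'AATCG')
```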
2.2 Substitution method [12]
2.2.1 Embedding phase
Step 1: Construct the substitution rule shown in Fig. 2. For example, if $s_1$ is A, applying the substitution rule gives $C(s_1)$ = C.
Step 2: Randomly select the embedding positions, denoted by $E = \{e_1, e_2, \ldots, e_r\}$, where r is the number of secret bits.
Step 3: Substitute the nucleotides using the substitution rule. If the position j of nucleotide $s_j$ ($j = 1, 2, \ldots, i$) equals a selected position $e_k$ ($k = 1, 2, \ldots, r$) and the corresponding secret bit is 1, set $s_j$ to $C(s_j)$. If j equals $e_k$ and the secret bit is 0, keep $s_j$ unchanged. Otherwise, if j does not equal any selected position $e_k$, set $s_j$ to $C(C(s_j))$.
Fig. 2 Substitution rule
$s_j$ | $C(s_j)$
A | C
C | G
G | T
T | A
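The substitution embedding can be sketched as follows. This is a minimal illustration of the three steps as described here, not the code of [12]; positions are 0-based, the secret bits are assumed to be assigned to the positions in ascending order, and `random_positions` and `substitution_embed` are hypothetical names.

```python
import random

# Fig. 2: the substitution rule C(s).
RULE = {"A": "C", "C": "G", "G": "T", "T": "A"}


def random_positions(length: int, r: int, seed: int) -> list[int]:
    """Step 2: draw r distinct 0-based positions, in ascending order."""
    return sorted(random.Random(seed).sample(range(length), r))


def substitution_embed(seq: str, secret: str, positions: list[int]) -> str:
    """Step 3: bit 1 -> C(s_j), bit 0 -> s_j, other positions -> C(C(s_j))."""
    if len(set(positions)) != len(secret):
        raise ValueError("need exactly one distinct position per secret bit")
    # Assumption: secret bits are assigned to positions in ascending order.
    bit_at = dict(zip(sorted(positions), secret))
    out = []
    for j, base in enumerate(seq):
        if j in bit_at:
            out.append(RULE[base] if bit_at[j] == "1" else base)
        else:
            out.append(RULE[RULE[base]])
    return "".join(out)


print(substitution_embed("ATCGA", "10", [0, 3]))  # CCTGG
print(substitution_embed("ATCGA", "10", random_positions(5, 2, seed=1)))
```

Because $C(C(s))$ never equals s (A→G, C→T, G→A, T→C), every nucleotide outside the embedding positions changes, which is consistent with the near-100 % modification rates later reported for this method.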
2.2.2 Extraction and recovery phase
When the receiver has both the original DNA sequence $S = \{s_1, s_2, \ldots, s_i\}$ and the stego DNA sequence $S' = \{s'_1, s'_2, \ldots, s'_i\}$, the secret data can be extracted. For each j ($j = 1, 2, \ldots, i$), if $s'_j$ is the same as $s_j$, the secret bit 0 is extracted; if $s'_j$ is the same as $C(s_j)$, the secret bit 1 is extracted. A nucleotide with $s'_j = C(C(s_j))$ carries no secret bit. Because the receiver already holds the original sequence, the secret data are extracted successfully once this comparison is complete.
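The comparison above reduces to a single pass over both sequences. The sketch below assumes 0-based indexing and the substitution rule of Fig. 2; `substitution_extract` is a hypothetical name.

```python
RULE = {"A": "C", "C": "G", "G": "T", "T": "A"}  # Fig. 2


def substitution_extract(original: str, stego: str) -> str:
    """Compare S' with S position by position and read off the secret bits."""
    bits = []
    for s, s_stego in zip(original, stego):
        if s_stego == s:
            bits.append("0")            # unchanged nucleotide: bit 0
        elif s_stego == RULE[s]:
            bits.append("1")            # C(s_j): bit 1
        # otherwise s_stego == C(C(s_j)): no bit was embedded here
    return "".join(bits)


print(substitution_extract("ATCGA", "CCTGG"))  # 10
```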
3 Proposed scheme
From the previous section, we see that the insertion and substitution methods have high modification rates, and the insertion method also expands the DNA sequence. These changes make it easy to detect that the DNA sequence has been modified. Therefore, we propose a reversible data hiding scheme based on Chang et al.'s binary coding rule [1] and Tseng and Hsieh's histogram method [13] that decreases the modification rate and keeps the length of the DNA sequence unchanged.
3.1 Embedding phase
Step 1: Set the mark of four nucleotide types (A, T, C, and G) and that of the non-labeled
nucleotide (N) to 0 and 1, respectively.
Step 2: Extract the nucleotides whose marks are equal to 0 to embed secret data. Denote them by $S = \{s_1, s_2, \ldots, s_i\}$, where $s_k$ ($k = 1, 2, \ldots, i$) is the kth embeddable nucleotide and i is the number of embeddable nucleotides.
Step 3: Using the binary coding rule, transform the embeddable nucleotides S into a binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$. The binary coding rule of the proposed scheme is listed in Table 1.
Step 4: Convert every 2t bits into a decimal integer $p_j$ ($j = 1, 2, \ldots, \lfloor 2i/2t \rfloor$), where the threshold t controls the hiding capacity and the modification rate. In this step, the n residual bits $\{b_{\lfloor 2i/2t \rfloor \cdot 2t+1}, b_{\lfloor 2i/2t \rfloor \cdot 2t+2}, \ldots, b_{\lfloor 2i/2t \rfloor \cdot 2t+n}\}$ cannot be converted, where n is defined as
$$n = 2i \bmod 2t. \tag{1}$$
Step 5: Compile the decimal integers $P = \{p_1, p_2, \ldots, p_{\lfloor 2i/2t \rfloor}\}$ into a histogram, and then find the most frequently appearing decimal integer h, the least frequently appearing decimal integer L1, and the second least frequently appearing decimal integer L2.
Step 6: To create the hiding space, if a decimal integer $p_j$ equals L1, set $p_j$ to L2 and record a location-map value of 1. If $p_j$ equals L2, leave it unchanged and record a location-map value of 0. Otherwise, if $p_j$ equals neither L1 nor L2, leave it unchanged and record no location-map value. To allow the original DNA sequence to be recovered, the location map must be concealed in the DNA sequence together with the secret message.
Step 7: Embed the location map followed by the secret message into the integers equal to h. If $p_j$ equals h and the bit to be embedded is 0, $p_j$ does not change; if $p_j$ equals h and the bit is 1, set $p_j$ to L1. This yields the new decimal integers $P' = \{p'_1, p'_2, \ldots, p'_{\lfloor 2i/2t \rfloor}\}$.
Step 8: Transform the new decimal integers P′ into a new binary string, and append the residual bits to obtain the stego binary string $B' = \{b'_1, b'_2, \ldots, b'_{2i}\}$.
Step 9: Use the inverse binary coding rule to transform the stego binary string B′ into the stego nucleotides $S' = \{s'_1, s'_2, \ldots, s'_i\}$.
Step 10: Scan the original DNA sequence from the beginning, with k initialized to 1. If the mark of the current nucleotide is 0, append the stego nucleotide $s'_k$ to the stego DNA sequence and increase k by one; if the mark is 1, append a non-labeled nucleotide (N).
After receiving the stego DNA sequence and the three keys, namely the most frequently appearing decimal integer h, the least frequently appearing decimal integer L1, and the second least frequently appearing decimal integer L2, the receiver can extract the secret data and recover the original DNA sequence.
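Steps 1 to 5 can be sketched as follows. The paper does not specify how ties are broken or whether integers that never occur count as "least frequent", so this sketch makes two labeled assumptions: all $2^{2t}$ possible integers are considered (an absent value has frequency 0), and ties go to the smaller integer. `to_integers` and `select_keys` are hypothetical names.

```python
from collections import Counter

ENCODE = {"A": "00", "T": "01", "C": "10", "G": "11"}  # Table 1


def to_integers(seq: str, t: int) -> tuple[list[int], str]:
    """Steps 1-4: drop N, encode the rest, cut into 2t-bit integers."""
    bases = [b for b in seq.upper() if b != "N"]        # nucleotides with mark 0
    bits = "".join(ENCODE[b] for b in bases)            # Step 3
    width = 2 * t
    full = len(bits) - len(bits) % width                # n = 2i mod 2t, Eq. (1)
    values = [int(bits[k:k + width], 2) for k in range(0, full, width)]
    return values, bits[full:]                          # integers, residual bits


def select_keys(values: list[int], t: int) -> tuple[int, int, int]:
    """Step 5: the peak h and the two least frequent values L1, L2."""
    counts = Counter(values)                  # missing values count as 0
    domain = range(2 ** (2 * t))              # every possible 2t-bit integer
    h = max(domain, key=lambda v: (counts[v], -v))
    rest = sorted((v for v in domain if v != h), key=lambda v: (counts[v], v))
    return h, rest[0], rest[1]


values, residual = to_integers("AAAATTCG", t=1)
print(values, repr(residual))    # [0, 0, 0, 0, 1, 1, 2, 3] ''
print(select_keys(values, t=1))  # (0, 2, 3)
```

With t = 2 and a sequence whose length is not a multiple of t, for instance `to_integers("AAAANTTCGA", 2)`, the N is skipped and the last two bits are returned as residual bits.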
We give an example to illustrate the embedding procedure. Let the two secret bits be {0, 0}. Suppose that the eight nucleotides are {A, A, A, A, T, T, C, G} and the threshold t is 1. These nucleotides are transformed into the binary string B = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1}. Then, every two bits are converted into a decimal integer, giving {0, 0, 0, 0, 1, 1, 2, 3}. These decimal integers are compiled into a histogram. The most frequently appearing decimal integer is 0, the least frequently appearing is 2, and the second least frequently appearing is 3; since 2 and 3 each appear once, either could serve as L1, and here L1 = 2 and L2 = 3. Therefore, the decimal integer equal to 2 is modified to 3 and its location-map value is set to 1, while the decimal integer equal to 3 remains unchanged and its location-map value is set to 0. This gives the location map l = {1, 0}.
The decimal integers equal to 0 are used to embed the location map and the secret bits, that is, the bit sequence {1, 0, 0, 0}. Because the first value of the location map is 1, the first decimal integer is modified to 2. Because the remaining embedded bits are 0, the corresponding decimal integers remain unchanged. We obtain the stego decimal integers {2, 0, 0, 0, 1, 1, 3, 3}. Finally, the stego decimal integers are transformed into the stego DNA sequence {C, A, A, A, T, T, G, G}.
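Steps 6 to 10 can be sketched in Python and checked against the example above. This is our illustration, not the authors' code; the names `histogram_embed`, `to_dna`, and `reinsert_n` are hypothetical, and we assume that once the payload is exhausted the remaining peak values are simply left at h.

```python
ENCODE = {"A": "00", "T": "01", "C": "10", "G": "11"}  # Table 1
DECODE = {code: base for base, code in ENCODE.items()}


def histogram_embed(values, h, l1, l2, secret):
    """Steps 6-7: shift L1 to L2, then embed location map + secret at h."""
    location_map, shifted = [], []
    for p in values:                                   # Step 6
        if p == l1:
            shifted.append(l2)
            location_map.append(1)
        elif p == l2:
            shifted.append(l2)
            location_map.append(0)
        else:
            shifted.append(p)
    payload = location_map + list(secret)
    if len(payload) > shifted.count(h):
        raise ValueError("not enough peak values for the payload")
    bits = iter(payload)
    stego = []
    for p in shifted:                                  # Step 7
        if p == h:
            stego.append(l1 if next(bits, 0) == 1 else h)  # exhausted -> keep h
        else:
            stego.append(p)
    return stego, location_map


def to_dna(values, t, residual=""):
    """Steps 8-9: 2t-bit binary per integer, append residual bits, decode."""
    bits = "".join(format(v, f"0{2 * t}b") for v in values) + residual
    return "".join(DECODE[bits[k:k + 2]] for k in range(0, len(bits), 2))


def reinsert_n(original, stego_bases):
    """Step 10: put N back wherever the original sequence has one."""
    it = iter(stego_bases)
    return "".join("N" if b in "Nn" else next(it) for b in original)


stego, lmap = histogram_embed([0, 0, 0, 0, 1, 1, 2, 3], h=0, l1=2, l2=3, secret=[0, 0])
print(stego, lmap)         # [2, 0, 0, 0, 1, 1, 3, 3] [1, 0]
print(to_dna(stego, t=1))  # CAAATTGG
```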
3.2 Extraction and recovery phase
Step 1: Set the mark of the four nucleotide types and that of the non-labeled nucleotide to 0 and 1, respectively.
Step 2: Extract the stego nucleotides $S' = \{s'_1, s'_2, \ldots, s'_i\}$ whose marks are equal to 0 to retrieve the secret data and recover the original nucleotides.
Step 3: Use the binary coding rule to transform the stego nucleotides S′ into a binary string $B' = \{b'_1, b'_2, \ldots, b'_{2i}\}$.
Step 4: Convert every 2t bits of B′ into a decimal integer $p'_j$ ($j = 1, 2, \ldots, \lfloor 2i/2t \rfloor$). In this step, the n residual bits $\{b'_{\lfloor 2i/2t \rfloor \cdot 2t+1}, b'_{\lfloor 2i/2t \rfloor \cdot 2t+2}, \ldots, b'_{\lfloor 2i/2t \rfloor \cdot 2t+n}\}$ cannot be converted, where n is obtained by Eq. (1).
Table 2 Seven DNA sequences [9]
Sequence | Length of DNA sequence | Actual number of nucleotides | Number of non-labeled nucleotides
AC153526 | 200,117 | 200,117 | 0
AC167221 | 204,841 | 204,841 | 0
AC168874 | 206,488 | 205,188 | 1,300
AC168897 | 200,203 | 195,017 | 5,186
AC168901 | 191,456 | 191,206 | 250
AC168907 | 194,226 | 193,417 | 809
AC168908 | 218,028 | 217,110 | 918
Step 5: If the decimal integer $p'_j$ equals h, the embedded bit 0 is extracted. If $p'_j$ equals L1, the embedded bit 1 is extracted and $p'_j$ is set to h. In this step, the location map and the secret data are completely extracted.
Step 6: If $p'_j$ equals L2 and the corresponding location-map value is 1, set $p'_j$ to L1. If $p'_j$ equals L2 and the location-map value is 0, $p'_j$ remains unchanged. We then obtain the original decimal integers $P = \{p_1, p_2, \ldots, p_{\lfloor 2i/2t \rfloor}\}$.
Step 7: Transform the original decimal integers P into a binary string, and append the residual bits to obtain the original binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 8: Use the inverse binary coding rule to transform the original binary string B into the restored nucleotides $S = \{s_1, s_2, \ldots, s_i\}$.
Step 9: If the mark of nucleotide is equal to 0, add the restored nucleotide sk in the original
DNA sequence and increase k by one, where the initial value of k is one.
Otherwise, if the mark of nucleotide is equal to 1, add a non-labeled nucleotide
(N) in the original DNA sequence.
We give an example to illustrate the extraction and recovery procedures. When the receiver receives the stego DNA sequence {C, A, A, A, T, T, G, G}, the stego decimal integers {2, 0, 0, 0, 1, 1, 3, 3} are obtained by Steps 2 to 4. Because the first decimal integer equals 2 (= L1), the embedded bit 1 is extracted, and the decimal integer is modified to 0 (= h). Because the second to fourth decimal integers equal 0, the three embedded bits {0, 0, 0} are extracted. After these steps, the embedded bits {1, 0, 0, 0}, that is, the location map {1, 0} followed by the secret data {0, 0}, are extracted successfully.
Table 3 Comparison of hiding capacity (HC), modification rate (MR), h, L1, and L2 using different t values
Sequence | HC (bits), t = 2 | MR (%), t = 2 | h, t = 2 | L1, t = 2 | L2, t = 2 | HC (bits), t = 3 | MR (%), t = 3 | h, t = 3 | L1, t = 3 | L2, t = 3
AC153526 | 5,591 | 4.43 | 5 | 11 | 14 | 2,380 | 2.14 | 21 | 62 | 11
AC167221 | 4,331 | 4.17 | 0 | 11 | 14 | 2,029 | 1.86 | 0 | 22 | 41
AC168874 | 5,937 | 4.8 | 5 | 11 | 14 | 2,519 | 2.34 | 21 | 62 | 43
AC168897 | 4,498 | 4.07 | 0 | 11 | 14 | 2,131 | 1.98 | 0 | 43 | 62
AC168901 | 5,214 | 4.08 | 0 | 11 | 14 | 2,179 | 1.91 | 0 | 43 | 58
AC168907 | 1,551 | 4.2 | 8 | 11 | 4 | 1,501 | 1.87 | 0 | 62 | 27
AC168908 | 6,639 | 4.54 | 5 | 11 | 14 | 2,743 | 2.07 | 21 | 62 | 59
Table 4 Distortion control
Sequence | Hiding capacity (bits), t = 2 | MR (%), t = 2 | Hiding capacity (bits), t = 3 | MR (%), t = 3
AC153526 | 2,380 | 2.89 | 2,380 | 2.14
AC167221 | 2,029 | 3.01 | 2,029 | 1.86
AC168874 | 2,519 | 3.13 | 2,519 | 2.34
AC168897 | 2,131 | 2.91 | 2,131 | 1.98
AC168901 | 2,179 | 2.52 | 2,179 | 1.91
AC168907 | 1,501 | 4.20 | 1,501 | 1.87
AC168908 | 2,743 | 2.74 | 2,743 | 2.07
The location map is then used to recover the DNA sequence. Because the first value of the location map is 1, the seventh decimal integer, which equals 3 (= L2), is modified to 2 (= L1). Because the second value of the location map is 0, the eighth decimal integer, which also equals 3, remains unchanged. The original decimal integers {0, 0, 0, 0, 1, 1, 2, 3} are thus obtained. Finally, the original DNA sequence {A, A, A, A, T, T, C, G} is computed using the inverse binary coding rule.
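The extraction and recovery steps can be sketched as follows and checked against the example above. The paper does not state how the receiver knows where the location map ends; this sketch assumes its length equals the number of integers equal to L2 after Step 5, which holds because Step 6 of embedding records one map value for every integer that ends up equal to L2. Any unused peaks show up as trailing 0 bits, so the receiver still needs to know the secret length. All function names are hypothetical.

```python
ENCODE = {"A": "00", "T": "01", "C": "10", "G": "11"}  # Table 1
DECODE = {code: base for base, code in ENCODE.items()}


def to_values(seq: str, t: int) -> tuple[list[int], str]:
    """Steps 2-4: drop N, encode, and cut into 2t-bit integers plus residual bits."""
    bits = "".join(ENCODE[b] for b in seq.upper() if b != "N")
    full = len(bits) - len(bits) % (2 * t)
    return [int(bits[k:k + 2 * t], 2) for k in range(0, full, 2 * t)], bits[full:]


def histogram_extract(stego, h, l1, l2):
    """Steps 5-6: return (original integers, location map, secret bits)."""
    bits, restored = [], []
    for p in stego:                                    # Step 5
        if p == h:
            bits.append(0)
            restored.append(h)
        elif p == l1:
            bits.append(1)
            restored.append(h)
        else:
            restored.append(p)
    # Assumption: the map has one entry per value now equal to L2.
    map_len = restored.count(l2)
    location_map, secret = bits[:map_len], bits[map_len:]
    entries = iter(location_map)
    original = [l1 if p == l2 and next(entries) == 1 else p for p in restored]  # Step 6
    return original, location_map, secret


def to_dna(values, t, residual=""):
    """Steps 7-8: back to bits, append residual bits, decode with Table 1."""
    bits = "".join(format(v, f"0{2 * t}b") for v in values) + residual
    return "".join(DECODE[bits[k:k + 2]] for k in range(0, len(bits), 2))


values, residual = to_values("CAAATTGG", t=1)
original, lmap, secret = histogram_extract(values, h=0, l1=2, l2=3)
print(lmap, secret)                              # [1, 0] [0, 0]
print(to_dna(original, t=1, residual=residual))  # AAAATTCG
```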
4 Experimental results
Three reversible data hiding schemes, namely the proposed scheme, the insertion scheme, and the substitution scheme, were implemented to compare their performance. The seven DNA sequences listed in Table 2 are used as test sequences. In 2010, Liao [8] proposed a difference rate formula to calculate the difference between DNA sequences. However, Liao's formula does not consider the expansion problem. Therefore, we modified it into the modification rate defined in Eqs. (2) and (3).
Table 5 Comparison of modification rate for Shiu et al.'s two schemes and the proposed scheme
Sequence | Insertion scheme MR (%) | Insertion scheme capacity (bits) | Substitution scheme MR (%) | Substitution scheme capacity (bits) | Proposed scheme MR (%) | Proposed scheme capacity (bits)
AC153526 | 74.56 | 5,591 | 98.58 | 5,591 | 4.43 | 5,591
AC167221 | 74.37 | 4,331 | 98.96 | 4,331 | 4.17 | 4,331
AC168874 | 73.99 | 5,937 | 97.95 | 5,937 | 4.80 | 5,937
AC168897 | 72.47 | 4,498 | 96.27 | 4,498 | 4.07 | 4,498
AC168901 | 74.23 | 5,214 | 98.50 | 5,214 | 4.08 | 5,214
AC168907 | 74.54 | 1,551 | 99.01 | 1,551 | 4.20 | 1,551
AC168908 | 73.96 | 6,639 | 98.05 | 6,639 | 4.54 | 6,639
Average | 74.01 | 4,823 | 98.18 | 4,823 | 4.33 | 4,823
Table 6 Comparison of the expansion of DNA sequence of Shiu et al.'s two schemes and the proposed scheme
Sequence | Insertion scheme [12] expansion (nucleotides) | Insertion scheme [12] capacity (bits) | Substitution scheme [12] expansion (nucleotides) | Substitution scheme [12] capacity (bits) | Proposed scheme expansion (nucleotides) | Proposed scheme capacity (bits)
AC153526 | 2,796 | 5,591 | 0 | 5,591 | 0 | 5,591
AC167221 | 2,166 | 4,331 | 0 | 4,331 | 0 | 4,331
AC168874 | 2,969 | 5,937 | 0 | 5,937 | 0 | 5,937
AC168897 | 2,249 | 4,498 | 0 | 4,498 | 0 | 4,498
AC168901 | 2,607 | 5,214 | 0 | 5,214 | 0 | 5,214
AC168907 | 1,135 | 1,551 | 0 | 1,551 | 0 | 1,551
AC168908 | 3,320 | 6,639 | 0 | 6,639 | 0 | 6,639
$$\text{Modification rate} = \frac{\sum_{j=1}^{i} d_j}{i} \times 100\%, \tag{2}$$
where
$$d_j = \begin{cases} 0, & \text{if } s'_j = s_j, \\ 1, & \text{if } s'_j \neq s_j. \end{cases} \tag{3}$$
In the above equations, i is the length of the original DNA sequence, $s_j$ is the j-th nucleotide of S, and $s'_j$ is the j-th nucleotide of S′.
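Eqs. (2) and (3) translate directly into code. The sketch below follows the sums as written, over j = 1, ..., i only, so any nucleotides appended beyond the original length are not counted; the function name `modification_rate` and the short test sequences are illustrative.

```python
def modification_rate(original: str, stego: str) -> float:
    """Eqs. (2)-(3): percentage of positions j = 1..i with s'_j != s_j."""
    i = len(original)
    if len(stego) < i:
        raise ValueError("stego sequence is shorter than the original")
    # d_j = 1 when the j-th nucleotides differ; positions beyond i are not summed.
    changed = sum(1 for s, s_stego in zip(original, stego) if s != s_stego)
    return 100.0 * changed / i


print(modification_rate("AAAATTCG", "CAAATTGG"))  # 25.0
print(modification_rate("ATCGA", "CCTGG"))        # 80.0
```

The first call uses the eight-nucleotide example of Section 3.1, where two of the eight nucleotides change.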
Table 3 shows the hiding capacity, modification rate, h, L1, and L2 of the proposed scheme. It shows that the hiding capacity of the proposed scheme with t = 2 is greater than that with t = 3. However, at the same hiding capacity, Table 4 shows that the modification rate with t = 2 is higher than that with t = 3. Table 5 shows that the average modification rates of the insertion and substitution methods are 74.01 % and 98.18 %, respectively. In the substitution method developed by Shiu et al. [12], all nucleotides outside the embedding positions are still changed, so its modification rate is very high. In the insertion method developed by Shiu et al. [12], the DNA sequence is transformed into a binary sequence, and the secret bits are inserted into that binary sequence to generate a stego binary sequence. Because each inserted bit shifts all subsequent bits, the structure of the stego binary sequence differs from that of the original binary sequence. Therefore, the modification rate of the insertion method exceeds that of our scheme. On average, the modification rate of the proposed scheme (4.33 %) is about 69 percentage points lower than that of Shiu et al.'s insertion method (74.01 %).
Table 7 Comparison of the requirements of related reversible data hiding techniques based on DNA sequence
Requirement | Chang et al.'s schemes [1] | Insertion scheme [12] | Substitution scheme [12] | Proposed scheme
Compression technique | Yes | No | No | No
Expansion (nucleotides) | No | r/2 | No | No
Original DNA sequence | No | No | Yes | No
Number of keys | No | Two keys (n, r) | No | Three keys (h, L1, L2)
[Figure: bar charts comparing the number of original and stego nucleotides (y-axis: Number) for each nucleotide type A, T, C, G, and N (x-axis: Nucleotide)]
Fig. 3 Results of histogram-based security analysis. (a) AC153526 (b) AC167221 (c) AC168874 (d) AC168897
Table 6 indicates that the stego DNA sequence obtained by the proposed scheme has the same length as the original DNA sequence; in other words, the proposed scheme does not add any extra nucleotides. By contrast, the insertion method lengthens the stego DNA sequence when the secret data are embedded.
The requirements of our proposed scheme and the other schemes are listed in Table 7. The two schemes of Chang et al. [1] require a compression technique to embed the secret data in the DNA sequence, so their computation cost is high. The insertion method must lengthen the DNA sequence to embed the secret data, and in the substitution method the receiver must have the original DNA sequence to extract the secret data. Our proposed scheme uses only simple operations and three keys to achieve reversible data hiding, and the length of the DNA sequence remains unchanged after the secret data are embedded.
This study analyzes the security of the proposed scheme using histogram analysis and measures its robustness against the cropping attack. Figures 3 and 4 summarize the results of the security and robustness analyses, respectively.
Figure 3 shows the histogram analysis results of the proposed scheme. The number of stego nucleotides of each type is close to that of the original nucleotides, so histogram analysis alone is unlikely to reveal that the stego DNA sequence carries secret data.
The proposed scheme is also evaluated under the cropping attack, and Fig. 4 shows the results of the robustness analysis. The experimental results indicate that robustness is satisfactory at low cropping ratios, because most of the nucleotides carrying secret data are not destroyed and most of the secret data can therefore still be extracted.
5 Conclusions
In this paper, we proposed a novel reversible data hiding scheme that embeds secret data using the histogram technique. The proposed scheme requires neither a compression technique nor an expansion technique, and the sender needs to send only three keys (h, L1, and L2) to the receiver. The experimental results show that the modification rate of the proposed scheme is about 17 times lower than that of Shiu et al.'s insertion scheme (4.33 % versus 74.01 % on average). Moreover, the proposed scheme keeps the length of the DNA sequence unchanged to avoid attracting the attention of hackers.
[Figure: accuracy extraction ratio (y-axis, 0.5 to 0.95) versus cropping ratio (x-axis, 0.4 to 0.1) for AC153526, AC167221, AC168874, and AC168897]
Fig. 4 Results of robustness analysis
References
1. Chang CC, Lu TC, Chang YF, Lee RCT (2007) Reversible data hiding schemes for deoxyribonucleic acid
(DNA) medium. Int J Innov Comput Inf Control 3(5):1145–1160
2. Coltuc D, Chassery JM (2007) Very fast watermarking by reversible contrast mapping. IEEE Signal
Process Lett 14(4):255–258
3. Farias MCQ, Carli M, Mitra SK (2005) Objective video quality metric based on data hiding. IEEE Trans
Consum Electron 51(3):983–992
4. Guo C, Chang CC, Wang ZH (2012) A new data hiding scheme based on DNA sequence. Int J Innov
Comput Inf Control 8(1):1–11
5. Hong W, Chen TS, Shiu CW (2009) Reversible data hiding for high quality images using modification of
prediction errors. J Syst Softw 82(11):1833–1842
6. Human Genome Project Information: http://www.ornl.gov/sci/techresources/Human_Genome/research/
sequencing.shtml. Accessed 15 November 2011
7. Jin HL, Fujiyoshi M, Kiya H (2007) Lossless data hiding in the spatial domain for high quality images.
IEICE Trans Fundam Electron Commun Comput Sci E90-A(4):771–777
8. Liao SR (2010) Information hiding schemes applied to biological gene sequences. Master thesis,
Chaoyang University of Technology
9. NCBI Database: http://www.ncbi.nlm.nih.gov/. Accessed 14 June 2010
10. Peterson I (2001) Hiding in DNA. Muse: 22
11. Shimanovsky B, Feng J, Potkonjak M (2002) Hiding data in DNA. Revised Papers from the 5th
International Workshop on Information Hiding. Lecture Notes Comput Sci 2578:373–386
12. Shiu HJ, Ng KL, Fang JF, Lee RCT, Huang CH (2010) Data hiding methods based upon DNA sequences.
Inform Sci 180(11):2196–2208
13. Tseng HW, Hsieh CP (2009) Prediction-based reversible data hiding. Inform Sci 179(14):2460–2469
14. Wu ZJ, Gao W, Yang W (2009) LPC parameters substitution for speech information hiding. J China Univ
Posts Telecommun 16(6):103–112
15. Wu M, Liu BD (2003) Data hiding in image and video: Part I—Fundamental issues and solutions. IEEE
Trans Image Process 12(6):685–695
16. Wu M, Yu H, Liu BD (2003) Data hiding in image and video: Part II—Fundamental issues and solutions.
IEEE Trans Image Process 12(6):696–705
17. Xu S, Zhang P, Wang P, Yang H (2009) Performance analysis of data hiding in MPEG-4 AAC audio.
Tsinghua Sci Technol 14(1):55–61
Ying-Hsuan Huang received the MS degree in Information Management from Chaoyang University of Technology, Taiwan. He is currently pursuing the Ph.D. degree in Computer Science and Engineering at National Chung Hsing University. His research interests include data hiding, secret sharing, watermarking, and image processing.
Chin-Chen Chang received his Ph.D. degree in computer engineering from National Chiao Tung University.
His first degree is Bachelor of Science in Applied Mathematics and master degree is Master of Science in
computer and decision sciences. Both were awarded in National Tsing Hua University. Dr. Chang served in
National Chung Cheng University from 1989 to 2005. His current title is Chair Professor in Department of
Information Engineering and Computer Science, Feng Chia University, from Feb. 2005. Prior to joining Feng
Chia University, Professor Chang was an associate professor in Chiao Tung University, professor in National
Chung Hsing University, chair professor in National Chung Cheng University. He had also been Visiting
Researcher and Visiting Scientist to Tokyo University and Kyoto University, Japan. During his service in
Chung Cheng, Professor Chang served as Chairman of the Institute of Computer Science and Information
Engineering, Dean of College of Engineering, Provost and then Acting President of Chung Cheng University
and Director of Advisory Office in Ministry of Education, Taiwan. Professor Chang has won many research
awards and honorary positions by and in prestigious organizations both nationally and internationally. He is
currently a Fellow of IEEE and a Fellow of IEE, UK. And since his early years of career development, he
consecutively won Outstanding Talent in Information Sciences of the R. O. C., Acer Dragon Award of the
Ten Most Outstanding Talents, Outstanding Scholar Award of the R. O. C., Outstanding Engineering
Professor Award of the R. O. C., Distinguished Research Awards of National Science Council of the R. O.
C., Top Fifteen Scholars in Systems and Software Engineering of the Journal of Systems and Software, and so
on. On numerous occasions, he was invited to serve as Visiting Professor, Chair Professor, Honorary
Professor, Honorary Director, Honorary Chairman, Distinguished Alumnus, Distinguished Researcher, Re-
search Fellow by universities and research institutes. His current research interests include database design,
computer cryptography, image compression and data structures.
Chun-Yu Wu received the MS degree in Computer Science and Information Engineering from Chung Cheng
University, Taiwan. His research interests include data hiding.
