
Multimedia Tools and Applications

https://doi.org/10.1007/s11042-019-08233-5

A new steganography algorithm based on video sparse representation

Arash Jalali 1 & Hassan Farsi 1

Received: 9 January 2019 / Revised: 24 August 2019 / Accepted: 13 September 2019

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
Steganography has attracted great interest for a long time, and many methods have been
widely used over the years. Recently, there has been a growing interest in the use
of sparse representation in signal processing. Sparse representation can efficiently model
signals in different applications to facilitate processing. Much of the previous work
focused on image and audio sparse representation for steganography. In this paper, a new
steganography scheme based on video sparse representation (VSR) is proposed. To obtain a
proper dictionary, the KSVD algorithm is applied to the DCT coefficients of the Y component of the
video (cover) frames. Both the I and Q components of the video frames are used for secure message
insertion. The aim is to hide secret messages in the non-zero coefficients of the sparse
representation of the DCT coefficients of the I and Q components of the video frames. Several
experiments are performed to evaluate the performance of the proposed algorithm in terms of
metrics such as peak signal-to-noise ratio (PSNR), the hiding ratio (HR), the bit error rate (BER)
and similarity (Sim) of the secret message, and also runtime. The simulation results show that the
proposed method exhibits appropriate invisibility and robustness.

Keywords Discrete Cosine Transform (DCT) . Dictionary Learning . KSVD . OMP . Sparse
Representation . Video Sparse Representation . Video Steganography

1 Introduction

Steganography deals with the ability of embedding data into a digital cover with minimum amount
of perceivable degradation which makes the hidden data invisible to a human observer [71]. There
are several requirements in steganography, including the high payload of hidden data, imperceptible
distortion, security and reliability [75]. To achieve practical covert communication, valuable
information is first hidden in a host multimedia signal, such as text, a digital image, audio or video, and

* Hassan Farsi
hfarsi@birjand.ac.ir

1
Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran

then secretly transmitted to the receiver. A problem arises because traditional text- and image-based
steganography techniques offer limited capacity: they can carry only small payloads, which makes
hiding a large amount of data very difficult. This motivates the use of video for data hiding.
Digital video files can serve as suitable hosts, and the use of video as a cover (host) for the
secure message overcomes the capacity problem. Information can be hidden in any frame of a video,
so video offers a large capacity to store information [70].
Many steganography techniques have been developed and are basically categorized
into two main branches, namely spatial domain techniques and transform domain techniques
[17]. Transform domain techniques alter the frequency coefficients instead of directly
manipulating the image pixels, which keeps distortion at a minimum level; these techniques are
therefore more secure and are generally preferred. In terms of embedding capacity, however,
spatial domain techniques provide more capacity [7].
With advancements in mathematics, linear representation methods have been well studied and
have recently received considerable attention [3, 5]. The sparse representation method is the most
representative methodology of these methods. It can efficiently model signals in different appli-
cations to facilitate processing, especially in signal processing, image processing such as image
denoising, deblurring, inpainting, restoration, classification, segmentation, and super-resolution,
machine learning, computer vision, and visual tracking [11, 18, 19, 21, 27, 36, 51, 57, 83, 88].
Sparse representation revolves around designing a matrix D ∈ R^{m×n}, called a dictionary, so as
to obtain appropriate sparse representations y ≈ Dx for a class of signals y ∈ R^m given
through a set of samples. The (also unknown) representation vectors x ∈ R^n are sparse, meaning
that they have only a few nonzero coefficients. Thus, each signal is a linear combination of a few
atoms, i.e., of a few dictionary columns.
dictionary learning approaches for signal processing tasks, largely due to their utilization in
developing new techniques based on sparse representation. Compared to conventional tech-
niques employing manually defined dictionaries, such as Discrete Fourier Transform (DFT),
Discrete Cosine Transform (DCT), and Discrete Wavelet Transform (DWT), dictionary
learning aims at obtaining a dictionary adaptively from the data so as to support optimal
sparse representation of the data. In contrast to conventional algorithms, a dictionary-based
learning algorithm provides a more flexible representation of a signal.
The most successful classes of algorithms in this field are based on greedy approaches and
convex relaxation. These algorithms are guaranteed to find the sparsest solution under certain
sets of conditions. In this context, the foremost difficulty is the
computation of sparse representations of signals: the problem is to find the sparsest among
the infinite number of solutions. Two prominent applications of dictionary learning are
steganography and steganalysis, where challenging problems have seen promising new solutions
based on sparse representation with learned dictionaries [41].
Several limitations of previous works motivate this research:

a) Traditional text- and image-based steganography techniques offer limited embedding capacity.
b) Some previous works focused only on image and audio sparse representation for steganography.
c) Some video steganography techniques presented schemes only for raw video formats.
d) There are still open challenges in steganography regarding several requirements, such as
high payload of hidden data, imperceptible distortion, security and robustness.

In this paper, we study the video steganography based on sparse representation as an appealing
approach in which a new theory is used for steganography. The rest of the paper is organized as
follows: Section 2 presents related work and section 3 describes sparse representation for
application of video steganography. Section 4 deals with our proposed method. Section 5
presents experimental results, brief analysis and discussion of the proposed method. Summary
and conclusions drawn from the study have been given in section 6.

2 Related work

According to [52], steganography is concealing the existence of a message by hiding information
in various carriers. One of the early techniques for steganography was proposed in [42]. The
basic idea is to distribute the secure message over a wide range of frequencies of the host
(multimedia) signal.
Many researchers have used the discrete cosine transform (DCT) or the discrete wavelet
transform (DWT) coefficients to embed the secure data for image steganography [47, 67].
In [1], a new strategy was proposed to improve embedding efficiency of LSB based
steganography by processing both secret image and cover image to achieve high similarities
between the secret bit-stream and the cover image LSB plane. This strategy produces stego-
images that have minimal distortion, high embedding efficiency, reasonably good stego-image
quality and robustness against steganalysis tools.
In [80], a binary image steganographic scheme was applied to preserve the quality of image
and to increase embedding capacity effectively through local texture patterns.
While much of the initial work was focused on image steganography [1, 15, 16, 56, 80],
recently several methods have been reported for embedding audio and video information in
video sequences and there are numerous papers about data embedding in videos. Video data
hiding presents a more challenging task compared to image data hiding. Digital video is a very
promising host candidate that can carry a large amount of data and its potential for secret
communications is largely unexplored. Since a video is formed from a sequence of frames, it
presents the data hider with the possibility to embed and send a large amount of data.
Some of video steganography techniques presented schemes in raw videos [68].
But embedding data in raw video is time-consuming that needs a lot of computer
process. In many papers data has been hidden in compressed domain. Some of these
papers considered data hiding in DCT coefficients [37, 38]. Hu et al. [38] have
presented an algorithm that embeds one bit in each qualified block in the H.264 video
standard. A few papers have utilized temporal features of the video [10, 55]. Mansouri and
Khademi [55] have proposed a steganographic method that uses temporal features of the video
for data hiding. In [78], the authors proposed a data hiding algorithm to embed
compressed video and audio data into video signal. The message data is embedded
in the DCT domain, by modifying the projections of the 8 × 8 host block DCT
coefficients. The data hiding rate is two bits per 8 × 8 block. The authors demonstrate
robustness to additive Gaussian noise and motion JPEG compression.
More recently, the authors in [14] have presented a technique for hiding audio in video.
They have used multidimensional lattice structures to embed an 8 KHz speech signal, and the
data hiding rate is about 1%. In [84], for hiding data inside the video, the least significant bit
(LSB) method was used, which is considering an appropriate technique for hiding all letters of
the message with pixels of frames of the video in a randomized cyclic manner.

Recently, there has been a growing interest in the use of sparse representation in signal
processing [33, 69]. The key idea of sparse representation is to approximate a signal via a
linear combination of atoms from an over-complete dictionary. So, the signal can be recon-
structed by aggregating atoms of the over-complete dictionary with their corresponding sparse
coefficients, obtained via l1 relaxation [8, 95] or matching pursuit algorithms [40, 82].
The earlier works of sparse representation [25] mainly concentrated on the predefined
parametric over-complete dictionary. Afterwards, Wright et al. [32] found out that the dictio-
nary can be directly chosen from training examples. More commonly, the dictionary learned
from training examples has shown more effectiveness in modeling images [74]. The past
decade has witnessed the explosion of approaches on learning the nonparametric dictionary
from a set of training examples. At the beginning, several approaches [92, 93] were proposed
for learning reconstructive dictionary, aiming at faithfully reconstructing signals. A represen-
tative approach is K-SVD [20]. In exploring the reconstructive capability of the learned
dictionary, K-SVD approach has been successfully applied for many image reconstruction
tasks, such as image denoising [85] and image compression [81]. This form of sparse structure
also, arises naturally in many other applications [26, 49, 89, 94]. For example, audio signals
are sparse in the frequency domain, especially for the sounds representing tones.
In objects recognition, sparse coding (SC) [90, 91] methods have been demonstrated effectively,
which describe each local feature as a linear combination of a small number of visual words.
Sparse coding, as a basic technique in compressive sensing theory [31], has also been
introduced into speech recognition and speech feature enhancement [23, 34, 35].
Image processing can exploit a sparsity property in the discrete cosine domain, i.e., many
DCT coefficients of images are zero or small enough to be regarded as zero. This type of
sparsity property has enabled intensive research on signal and data processing, such as
dimension reduction in data science, wideband sensing in CRNs, data collection in large scale
wireless sensor networks (WSNs), ultra-wideband (UWB) communication systems, and
channel estimation and feedback in massive MIMO [43–45, 73].
Recent years have witnessed a growing interest in the search for sparse representations of signals
for steganography and steganalysis. The authors in [65, 66] have proposed new image steganog-
raphy techniques based on compressive sensing. In [41], a new universal image steganalysis system
based on double sparse representation classifier (DSRC) was proposed. In [2], a new stegano-
graphic algorithm based on discrete wavelet transform was introduced, that uses the K-SVD
algorithm for dictionary learning [76, 77]. Following the dictionary leaning step, a sparse repre-
sentation of the wavelet coefficients on the learned dictionary is used to embed the secret data.
In [6], a novel speech steganography method was presented using discrete wavelet transform
and sparse decomposition. The proposed method also yielded improvements on both stego signal
quality and embedding capacity. The goal was to present a speech steganography method which
was less detectable, so more secure, against steganalyzers. The novelty of the work came from the
employment of sparse representation (SR) of wavelet coefficients for secret data embedding.
In [13], a new steganographic method was proposed using region-level sparse representation
for high-density reversible data hiding (RDH). In this work, the similarity of adjacent
pixels in sub-image regions was utilized for reversible data hiding and represented by region-level
sparse representation. The text message was hidden in the cover image. Then encryption
and decryption processes were used in the proposed steganographic technique.
In [86], a novel biometric authentication information hiding method based on the sparse
representation technique was proposed for enhancing the security of biometric information
transmitted in the network. In the method, for exploiting the correlation between the cover and

biometric images, the sparse representation was used. Thus, the biometric image was divided into
two parts. The first part was the reconstructed image, and the other part was the residual image. The
residual image and sparse representation coefficients were embedded into the cover image.
In [4], the authors addressed the use of sparse representation to securely hide a message
within non-overlapping blocks of a given color image in the wavelet domain. They used one of
the color bands to estimate a proper dictionary. In the paper, their proposed algorithm was
called SDW-steg, where ‘SD’, ‘W’, and ‘steg’ referred to the ‘sparse decomposition’, ‘wavelet
transform’, and ‘steganography’, respectively.
In [48], a patch-level sparse representation method for reversible data hiding in encrypted
image was proposed according to the correlation between neighbor pixels. In this method, each
patch was linearly represented by some atoms in an over-complete dictionary, the leading
residual errors were encoded and embedded within the cover image, and the learned dictionary
was also embedded into the encrypted image.
In [87], the authors focused on security and payload capacity enhancement of an image
steganography system for an audio message by using compressed sensing theory. However, in
order to utilize compressed sensing, the audio message was first converted to an equivalent
grayscale image which was sparsified using 2D-DCT and thresholding. The compressed image
was embedded in chaotically chosen pixels of the cover image. At receiver the compressed
sensing reconstruction algorithm was used to reconstruct the grayscale image which was then
converted back to the audio message.
This paper [12] provided an overview of a proposed dissertation work of highly efficient
steganography using patch level sparse representation. In this paper a method for reversible
data hiding in encrypted images was also proposed. The proposed method created a sparse
space to accommodate some additional data by compressing the LSBs of the encrypted image.
In [39], a novel framework of performing multimedia data hiding was proposed using an
over-complete dictionary. Based on the fact that the sparse representation of the host signal has
a lower dimension than the signal in its original domain, an informed sparse-domain data hiding
system was established by modifying the coefficients of the atoms that had not participated in
representing the host signal, and a single support modification-based data hiding system was
proposed and analyzed as an example.

3 Sparsity

Sparse representation can efficiently model signals in different applications to facilitate processing.
It revolves around designing a matrix D ∈ R^{m×n}, called a dictionary, so as to obtain proper
sparse representations y ≈ Dx for a class of signals y ∈ R^m given through a set of samples. The
(also unknown) representation vectors x ∈ R^n are sparse, meaning that they have only a few nonzero
coefficients. Thus, each signal is a linear combination of a few atoms, i.e., of a few dictionary columns.

3.1 Sparse representation

The central object of the sparse model is a matrix D ∈ R^{m×n} called the dictionary. Its columns are
named atoms; we denote by d_j the j-th column of matrix D, and we assume that all atoms are
normalized:

\| d_j \| = 1, \quad j = 1, \ldots, n. \qquad (1)

where ‖·‖ is the Euclidean norm and, in most cases, the dictionary is over-complete, which
means m < n. A vector y ∈ R^m, here called a signal, has a sparse representation if it can be written
as a linear combination of a few atoms, i.e.,

y = Dx = \sum_{j=1}^{n} x_j d_j = \sum_{j \in S} x_j d_j, \qquad (2)

where most of the coefficients x_j are zero. We call S = {j | x_j ≠ 0} the support of the signal.
In the sparse representation problem, the dictionary D and the signal y are given and we
want to find the sparse vector x that solves relation (2). The problem is of interest
when x is the sparsest solution, i.e., when the problem is

\min_{x} \|x\|_0 \quad \text{s.t.} \quad y = Dx \qquad (3)

When m < n, the system y = Dx has an infinite number of solutions, from which we should pick
the sparsest.
The approximate sparse representation problem can then be modeled in two ways. In the
first, we aim to minimize the representation error given by:
\min_{x} \| y - Dx \|^2 \quad \text{s.t.} \quad \|x\|_0 \le s \qquad (4)

where s is the number of atoms involved in the representation; the choice of s is mostly
based on a trial-and-error approach.
The second representation problem establishes an error bound ε with the following equation
and looks for the sparsest representation
\min_{x} \|x\|_0 \quad \text{s.t.} \quad \| y - Dx \| \le \varepsilon \qquad (5)

Both of the above problems are NP-hard, hence we are interested in fast algorithms that give subop-
timal solutions. There are many algorithms for solving the sparse representation problems (4)
and (5), based on suboptimal solutions. Although they cannot find the optimal solution in
general, some of them are guaranteed to succeed under certain conditions. Greedy algorithms
and convex relaxation techniques are the most important classes of them [9, 50, 53].
The most popular greedy algorithms are Orthogonal Matching Pursuit (OMP) and Orthog-
onal Least Squares (OLS) [30, 46, 54, 72]. Convex relaxation techniques replace the l0
norm in (3) with the l1 norm, thus obtaining the following convex optimization problem:

\min_{x} \|x\|_1 \quad \text{s.t.} \quad y = Dx \qquad (6)

Besides the greedy and convex relaxation approaches, there are other families of sparse
representation algorithms, such as those in [22, 24, 63, 64, 79]. The OMP algorithm is
summarized in Fig. 1.
According to the (4) and (5) relations, two stopping criteria can be used in this algorithm.
The first one is maximum number of atoms s; once this level is attained, the algorithm stops
regardless of the error. The second one is an error bound ε and grows the support S until the
actual error becomes smaller than the bound, regardless of the support size.
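As a concrete illustration of the OMP procedure in Fig. 1 with these two stopping criteria, the following Python sketch may help; it is a minimal implementation assuming only NumPy, and the function name and interface are ours rather than the paper's.

```python
import numpy as np

def omp(D, y, s, eps):
    """Minimal Orthogonal Matching Pursuit sketch (cf. Fig. 1).
    D: dictionary (m x n) with unit-norm columns, y: signal of length m,
    s: maximum number of atoms, eps: error bound."""
    n = D.shape[1]
    x = np.zeros(n)
    support = []                                  # selected atom indices (the support S)
    residual = y.astype(float).copy()
    coeffs = np.zeros(0)
    while len(support) < s and np.linalg.norm(residual) > eps:
        j = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with the residual
        support.append(j)
        # least-squares fit of y on the selected atoms, then update the residual
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x
```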

3.2 Dictionary learning

A significant issue when using sparse representations is learning dictionaries from the training
signals available for a specific application.

Fig. 1 Orthogonal Matching Pursuit (OMP) algorithm

Two subjects that are vital in dictionary learning (DL) algorithms are sparse coding and dictionary update.
Learning a dictionary (LD) is a hard problem; for example, as an adaptive process, a
learned dictionary should have more atoms where the density of training signals is higher and
fewer atoms where the density is lower. Nevertheless, there are several good algorithms for it.
We can formulate dictionary learning as an optimization problem in which the representation
error is minimized under a sparsity constraint. Thus, we may work with the following
approximate model in place of relation (2):
y = Dx + v, \qquad (7)

where x is the sparse code and v is noise, typically Gaussian with small variance. In this
model, we should find an appropriate dictionary, meaning the best dictionary D
adapted to the class of signals at hand, called the training signals. This adaptation can be
stated as an optimization process, i.e., dictionary learning.
Dictionary learning consists of solving the following optimization problem.
\min_{D, X} \| Y - DX \|_F^2 \quad \text{s.t.} \quad \| x_l \|_0 \le s, \; l = 1:N, \qquad \| d_j \| = 1, \; j = 1:n \qquad (8)

where the variables are the dictionary D ∈ R^{m×n} and the sparse representation matrix X ∈ R^{n×N},
whose columns have at most s nonzero elements (that is, the sparsity level is s), and Y ∈ R^{m×N} is
the matrix whose columns are the training signals, so that the number of training signals is N.
Also, E is the representation error (or residual) for dictionary D and sparse representations
X. It is defined as in (9):
E = Y - DX \qquad (9)

so the objective in (8) is

\| E \|_F = \sqrt{ \sum_{i=1}^{m} \sum_{l=1}^{N} e_{il}^2 } \qquad (10)

and we have

\| E \|_F^2 = \sum_{l=1}^{N} \| e_l \|^2 = \sum_{l=1}^{N} \| y_l - D x_l \|^2 \qquad (11)

The DL problem in (8) is divided into two subproblems. One is the sparse coding (or
representation) problem

\min_{X} \| Y - DX \|_F^2 \quad \text{s.t.} \quad \| x_l \|_0 \le s, \; l = 1:N \qquad (12)

where the dictionary is fixed and the sparse representations have to be computed. The second
is that of dictionary update.

\min_{D,(X)} \| Y - DX \|_F^2 \quad \text{s.t.} \quad X_{\Omega} = \text{fixed}, \qquad \| d_j \| = 1, \; j = 1:n \qquad (13)

Here, the sparsity pattern is fixed and the dictionary is a variable.


When the representation matrix X is fixed and only the dictionary is variable, the problem is
not convex, despite the objective being a convex function, because of the normalization constraints.
With only the sparsity pattern fixed, the problem is bi-quadratic, hence again not convex. Thus, finding an
optimal solution may be hard. Only when both the dictionary D and the sparsity pattern are fixed
is the problem convex, and the representations can then be computed independently for each
signal, e.g., via the OMP algorithm.

There are many dictionary learning algorithms optimizing the representation error problem.
The most successful techniques rely on iteratively solving both sparse coding and dictionary
update problems optimally or sub-optimally. The OMP is often the choice for sparse coding,
which is also efficient in KSVD algorithm for dictionary learning. The K-SVD dictionary
update is detailed in Fig. 2.
A simple strategy for overcoming the difficulties of the optimization process, caused by
non-convexity and by the huge size of the problem, is to iteratively improve one
variable while keeping the others fixed. Thus, the two main operations are iterated for a
sufficiently large number of iterations K. First, keeping the current dictionary D fixed, the sparse
representations X are computed; this is the sparse coding step and is typically solved with
simple algorithms like OMP, as detailed in Fig. 1. Then, in the dictionary update step, the
dictionary is improved, while the nonzero pattern of the representations is kept fixed.
An initial dictionary is necessary to start the alternate optimization by performing the sparse
coding step. The simplest initialization is either to give random values to the atoms or
randomly selecting n signals to serve as initial atoms.
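A minimal sketch of this alternation is given below; it reuses the hypothetical omp function from the previous sketch, and the initialization by randomly selected signals, the iteration count, and the rank-1 SVD update per atom follow the description above, but the exact parameters are assumptions rather than the paper's settings.

```python
import numpy as np

def ksvd(Y, n_atoms, s, n_iter=10, eps=1e-6):
    """Minimal K-SVD sketch (cf. Fig. 2): alternate sparse coding (OMP) and
    per-atom rank-1 SVD updates. Y: training signals (m x N), N >= n_atoms."""
    m, N = Y.shape
    rng = np.random.default_rng(0)
    D = Y[:, rng.choice(N, n_atoms, replace=False)].astype(float)   # init with random signals
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # sparse coding step: one OMP problem per training signal
        X = np.column_stack([omp(D, Y[:, l], s, eps) for l in range(N)])
        # dictionary update step: refine each atom with the signals that use it
        for j in range(n_atoms):
            users = np.nonzero(X[j, :])[0]
            if users.size == 0:
                continue
            # residual without the contribution of atom j
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, S_, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                    # updated (unit-norm) atom
            X[j, users] = S_[0] * Vt[0, :]       # updated coefficients on the fixed support
    return D, X
```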

Fig. 2 K-SVD dictionary update algorithm



4 Proposed method: Video steganography based on sparse representation

In this section, we give a detailed explanation of video steganography based on video sparse
representation (VSR). The proposed method hides secret messages in the non-zero coefficients of
the sparse representation of the DCT coefficients of the I and Q components of video frames. We
now explain the related techniques and the embedding and extraction procedures of the proposed method.

4.1 Related techniques

One of the color space systems is known as YIQ, which is used for both image and video
signals. The main advantage of this format is that grayscale information is separated from color
data, which is useful for isolating the gray-level information in an image. In this system, image
data consists of three components: luminance (Y), hue (I), and saturation (Q). The first
component, luminance, represents grayscale information (or brightness of the color), while
the last two components make up chrominance (or color information). The human eye is more
sensitive to the Y component, so it needs to be represented more accurately; the I and Q components
are less perceptually sensitive and therefore need not be as accurate. The following relation
converts the RGB color space to the YIQ color space:
\begin{bmatrix} Y \\ I \\ Q \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.274 & -0.322 \\ 0.211 & -0.523 & 0.312 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \qquad (14)
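For illustration, a small NumPy sketch of relation (14) and its inverse (the inverse is used later, in step 15 of the embedding procedure); the only operation involved is a per-pixel matrix multiplication.

```python
import numpy as np

# RGB-to-YIQ conversion matrix of eq. (14)
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def rgb_to_yiq(frame):
    """frame: (H, W, 3) RGB array -> (H, W, 3) YIQ array, per eq. (14)."""
    return frame @ RGB2YIQ.T

def yiq_to_rgb(frame):
    """Inverse conversion, YIQ back to RGB."""
    return frame @ np.linalg.inv(RGB2YIQ).T
```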

Two-Dimensional Discrete Cosine Transform (2D-DCT) and the corresponding inverse trans-
formation (2D-IDCT) are defined as (15) and (16) respectively [22].
F(m,n) = \frac{2}{\sqrt{MN}}\, C(m)\,C(n) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, \cos\frac{(2x+1)m\pi}{2M}\, \cos\frac{(2y+1)n\pi}{2N} \qquad (15)

f(x,y) = \frac{2}{\sqrt{MN}} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} C(m)\,C(n)\, F(m,n)\, \cos\frac{(2x+1)m\pi}{2M}\, \cos\frac{(2y+1)n\pi}{2N} \qquad (16)

where F(m, n) is the 2D-DCT of the signal f(x, y), and C(m), C(n) = 1/\sqrt{2} for m, n = 0 and C(m), C(n) = 1 otherwise.
In this paper the algorithm is a transform domain method based on discrete cosine transform
(DCT). DCT coefficients are usually sparse so they can sparsely be represented over a proper
dictionary.
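As a hedged illustration of eqs. (15) and (16), the sketch below uses SciPy's orthonormal type-II/III DCT, whose separable form matches the normalization given above; the random example frame and the threshold used to count near-zero coefficients are illustrative assumptions only.

```python
import numpy as np
from scipy.fft import dct, idct

def dct2(block):
    """2D-DCT of eq. (15), computed as separable 1D orthonormal transforms."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    """2D-IDCT of eq. (16)."""
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

# Illustrative check: count coefficients that are (near) zero. A real Y plane
# would be far sparser in the DCT domain than this random stand-in.
frame = np.random.rand(240, 350)
C = dct2(frame)
sparsity = np.mean(np.abs(C) < 1e-2)   # fraction of near-zero coefficients (example threshold)
```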
We use the Y component of the cover video frames for dictionary learning. We apply the 2D-DCT
to each grayscale frame (Y component) of the video cover signal and build the proper dictionary
using the KSVD technique. The two other components (I and Q) are used for secure message
insertion. As mentioned, the goal is to hide the secret data in the non-zero coefficients of the
sparse representation of the DCT coefficients of the I and Q components of the video frames.

4.2 Embedding procedure

The proposed VSR steganography algorithm operates in the DCT domain. As mentioned
previously, the video frames are sparse in the DCT domain. Our algorithm uses this fact to
insert the secret message bit stream m into the sparse representation coefficients of the DCT of
the I and Q components of the video frames. The block diagram of the proposed method is depicted
in Fig. 3. The functionality of each block is described in the following 16 steps:
& Step 1 (input 1): Select some training video files for dictionary learning.

We use K (25 or more) video streams that contain a wide range of scenarios.

& Step 2 (framing 1): Take or extract all the frames of video streams in AVI format.

In this step, the training video x(t) is divided into N non-overlapping frames x_n, each frame
consisting of (or resized to) M × N pixels. For example, the nth frame, x_n, is represented as:

x_n = \begin{bmatrix} x_n(1,1) & \cdots & x_n(1,N) \\ \vdots & \ddots & \vdots \\ x_n(M,1) & \cdots & x_n(M,N) \end{bmatrix}_{M \times N} \qquad (17)

These frames collectively form the new data matrix X, whose nth column is the vectorized x_n. The size and
the length of the frames are M × N and L = M·N, respectively, and this information is assumed to be available
at the receiver side.
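As an illustrative sketch of steps 1 and 2, the frames of an AVI file could be extracted and stacked into the data matrix X of eq. (17) as follows; OpenCV is assumed here only for video decoding, and the target frame size is an example.

```python
import cv2
import numpy as np

def extract_frames(path, size=(350, 240)):
    """Read an AVI file, resize each frame and stack the vectorized frames
    as columns of the data matrix X (eq. 17). size is (width, height)."""
    cap = cv2.VideoCapture(path)
    columns = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # OpenCV decodes to BGR
        frame = cv2.resize(frame, size)
        columns.append(frame.reshape(-1))                # vectorize the frame
    cap.release()
    return np.column_stack(columns)
```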

& Step 3 (changing color space 1): Do RGB to YIQ conversion for each frame.

This step is performed by using eq. (14) which has been introduced previously.

& Step 4 (applying 2D-DCT): Apply 2D-DCT to Y component of each frame.

Invertible transforms such as discrete Fourier transform (DFT), the discrete wavelet transform
(DWT), or the discrete cosine transform (DCT) can be easily calculated. The DCT transforms

Fig. 3 Block diagram of embedding procedure



the Y component of training video frames into the low and high frequency coefficients. The
low frequency coefficients of the DCT are the most important part of a frame, while the high
frequency coefficients influence the nuance of that frame or details. This step uses eq. (15) to
calculate DCT coefficients of Y component of each frame.

& Step 5 (dictionary learning): Perform dictionary learning based on KSVD algorithm.

We use K (25 or more) video streams that contain a wide range of scenarios; each of them has
185 frames, and each frame consists of (or is resized to) 350 × 240 pixels. In total, 4625 Y
components of frames are used for dictionary learning. There are many dictionary learning
algorithms optimizing the representation error problem. The most successful techniques rely
on iteratively solving both sparse coding and dictionary update problems optimally or sub-
optimally. The OMP is often the choice for sparse coding, which is also efficient in KSVD
algorithm for dictionary learning.
As mentioned in section 3, we employ the KSVD dictionary learning algorithm in this step.
The KSVD algorithm finds dictionary atoms over which the signals admit sparse representations.
The K-SVD dictionary update is detailed in Fig. 2, and the dictionary learning process is performed
according to the steps introduced in that figure. In this process we use the DCT coefficients of the Y
component of all frames to estimate the proper dictionary, over which the I and Q
components of each frame can be represented sparsely.

& Step 6 (dictionary): Proper dictionary is exploited according to the K-SVD dictionary
update algorithm as described in Fig. 2.

After applying the KSVD algorithm to the training videos (Step 5), the dictionary is updated iteratively.
The output of the KSVD algorithm, after dictionary learning and updating, is a proper dictionary
matrix D over which the training set can be sparsely represented. We also use the estimated
dictionary D to represent the I and Q components sparsely in step 11. The KSVD algorithm reconstructs the
dictionary at the receiver side, in the same way as at the sender side, by using the same input
training videos and the same frame size. It is worth mentioning that although modifications of the I
and Q components of frames due to secret message embedding by our method may in general lead
to changes of the Y component, this effect is negligible. Thus, the
dictionary is well reconstructed at the receiver side. In our preliminary experiments over 4625 Y
components, we note that the l2-norm of the difference between the dictionaries reconstructed at the
receiver side and at the sender side is on the order of 10^{-9}. Therefore, for algorithm simplicity and to
reduce computation time, we use the sender-side dictionary D at the receiver side.

& Step 7 (input 2): Select a video file as cover.

In this step we select a video file as the host for secret message embedding.

& Step 8 (framing 2): Take or extract all the frames of the cover video in AVI format.

This step is the same as Step 2, which was described previously.

& Step 9 (changing color space 2): Do RGB to YIQ conversion for each frame of cover video.

& Step 10 (applying 2D-DCT): Apply 2D-DCT to Y, I and Q components of the video
frames, separately.

Steps 9 and 10 are similar to steps 3 and 4, respectively.

& Step 11 (sparse representation): Select both I and Q components, which are exploited from
step 10 and then calculate the sparse representation of them using OMP algorithm
(presented in Fig. 1) and exploited dictionary from step 6.

The most popular greedy algorithm is Orthogonal Matching Pursuit (OMP), summarized in section 3,
Fig. 1. According to relations (4) and (5), two stopping criteria can be used in this algorithm.
The first one is the maximum number of atoms s; once this level is attained, the algorithm stops
regardless of the error. The second one is an error bound ε, which grows the support S until the
actual error becomes smaller than the bound, regardless of the support size. After finding a proper
dictionary, both the I and Q components of the host video frames are represented sparsely over the
estimated dictionary D. The inputs are the dictionary D, the signal y (the I and Q components
separately), and the termination values, namely the number of atoms s and the stopping error ε.
After applying the OMP algorithm to the I and Q components, the output S, which is
the sparse representation of these components, is calculated. S is the sparse representation
coefficient matrix; the secret message will be inserted into its non-zero elements.
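As a usage illustration of this step, reusing the hypothetical rgb_to_yiq, dct2, and omp helpers sketched earlier (the cover frame, the dictionary D, and the termination values s and ε below are example inputs, not the paper's tuned parameters):

```python
# frame_rgb: one cover frame as an (H, W, 3) array; D: the learned dictionary
yiq = rgb_to_yiq(frame_rgb)
S_I = omp(D, dct2(yiq[:, :, 1]).reshape(-1), s=20, eps=1e-3)   # sparse code of the I component
S_Q = omp(D, dct2(yiq[:, :, 2]).reshape(-1), s=20, eps=1e-3)   # sparse code of the Q component
```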

& Step 12 (insertion): Perform the insertion operation by adding the secret message to the
non-zero sparse coefficients.

Sparse representation of a frame represents it as a linear combination of a set of
elementary structural elements. The non-zero elements of the sparse representation of
that frame reflect the importance of each structural element in frame reconstruction.
A secret message inserted into the non-zero coefficients only slightly changes the contribution of
the related elements to frame reconstruction. The zero coefficients are not used for
data insertion, because any change in them may lead to perceptible distortions in the
stego-frame. Different methods can be used for data insertion into the non-zero
coefficients. We use the LSB method, one of the basic steganography methods, for data
embedding. In the proposed method, the LSBs of the fractional part of the non-zero coefficients
are substituted with the secret message bits. We do not change the integer part
of the non-zero coefficients, so that less distortion is introduced into the cover video
frames. If the number of LSBs used for data embedding increases, the hidden data
extraction errors that may occur at the receiver side decrease, but this may lead to
higher distortion.
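A hedged sketch of this insertion rule and its matching extraction is given below; the 16-bit fixed-point representation of the fractional part and the use of a single LSB per coefficient are our own assumptions for illustration, not the paper's exact parameters.

```python
import numpy as np

def embed_bits(coeffs, bits, n_frac_bits=16):
    """Illustrative LSB embedding into the fractional part of the non-zero
    sparse coefficients (one message bit per coefficient). Note: if a modified
    coefficient became exactly zero, extraction would lose synchronization."""
    stego = coeffs.astype(float).copy()
    scale = 1 << n_frac_bits
    idx = np.flatnonzero(coeffs)                 # zero coefficients are left untouched
    for k, bit in zip(idx, bits):
        mag = abs(stego[k])
        integer = np.floor(mag)
        frac = min(int(round((mag - integer) * scale)), scale - 1)
        frac = (frac & ~1) | int(bit)            # replace the LSB of the fraction
        stego[k] = np.sign(stego[k]) * (integer + frac / scale)
    return stego

def extract_bits(coeffs, n_bits, n_frac_bits=16):
    """Matching extraction: read back the LSB of the fractional part."""
    scale = 1 << n_frac_bits
    out = []
    for k in np.flatnonzero(coeffs)[:n_bits]:
        mag = abs(coeffs[k])
        frac = min(int(round((mag - np.floor(mag)) * scale)), scale - 1)
        out.append(frac & 1)
    return out
```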

& Step 13 (frame reconstruction): Reconstruct both I and Q component frames.

Following the hidden secret message insertion stage, the DCT coefficients of the I and Q components
of the stego-frames are reconstructed from their sparse representation coefficient matrices as

\hat{X}_Q = D \hat{S}_Q \qquad (18)

where \hat{X}_Q denotes the DCT coefficients of the Q component of the stego-video frame, D is the
estimated dictionary, and \hat{S}_Q is the sparse representation of the Q component of the stego-video
frame. The same applies to the I component.
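Continuing the earlier sketches, relation (18) followed by the 2D-IDCT of step 14 amounts to the following two lines; D, S_Q_stego, and idct2 are the hypothetical names used above, and the 240 × 350 frame size is only an example.

```python
# Sketch of eq. (18) and the subsequent inverse transform of step 14.
X_Q_hat = (D @ S_Q_stego).reshape(240, 350)   # DCT plane of the stego Q component
Q_stego = idct2(X_Q_hat)                      # back to the pixel domain, eq. (16)
```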

& Step 14 (2D-IDCT): Take the Y component from step 10 and the reconstructed I and Q
components from step 13, and then apply 2D-IDCT to them.

This step is the inverse of step 4. Relation 16 is used to calculate the inverse discrete cosine
transform of stego-video components.

& Step 15 (changing color space 3): Perform the process of color conversion, changing YIQ
to RGB color space for each frame.
& Step 16 (de-framing): Re-constitute the frames for saving stego-video.

4.3 Extraction procedure

At the receiver side, the embedded secret message can be extracted based on the received
stego-video and some assumed available information such as dictionary D, the termination
values including number of atoms, s, and stopping error, ε. Figure 4 shows the extraction
procedure for proposed algorithm. The method is explained in the following 6 steps.
& Step 1 (input 1): Recall the stego video.

In this step, the stego-video received from the sender side is recalled.

& Step 2 (framing 1): Take or extract all the frames of stego video.

Fig. 4 Block diagram of de-embedding (extraction) procedure



Table 1 List of the number of frames and size of the different cover videos

Video (Cover) Name   Number of video frames   Frame Dimension   Video Size (KBs)
Missile.avi          30                       155×208           283.5
Blossom.avi          30                       216×192           349.5
Flower.avi           30                       350×240           454.5

Same as step 7 of the embedding procedure, the stego-video is recalled and divided into N
non-overlapping frames.

& Step 3 (changing color space): Do RGB to YIQ conversion for each frame of stego video.
& Step 4 (applying 2D-DCT): Apply 2D-DCT to I and Q components of the video frames,
separately.
& Step 5 (sparse representation): Select both I and Q components in the DCT domain, which
are exploited from step 4 and then calculate the sparse representation of them using OMP
algorithm (introduced in Fig. 1) and exploited dictionary from sender side.
& Step 6 (extraction): Perform this step by extraction of the secret message from non-zero
sparse coefficients.
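Putting the earlier hypothetical helpers together (extract_frames, rgb_to_yiq, dct2, omp, extract_bits), the extraction procedure could be sketched end-to-end as follows; the frame size, the termination parameters, and the bit-ordering convention are assumptions for illustration.

```python
def extract_message(stego_path, D, s, eps, n_bits):
    """Hedged end-to-end extraction sketch following steps 1-6 above."""
    X = extract_frames(stego_path)                            # steps 1-2: recall and framing
    bits = []
    for col in X.T:
        yiq = rgb_to_yiq(col.reshape(240, 350, 3) / 255.0)    # step 3: RGB -> YIQ
        for comp in (1, 2):                                   # I and Q components
            coeffs = omp(D, dct2(yiq[:, :, comp]).reshape(-1), s, eps)   # steps 4-5
            bits.extend(extract_bits(coeffs, n_bits - len(bits)))        # step 6
            if len(bits) >= n_bits:
                return bits[:n_bits]
    return bits
```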

5 Simulation results and discussion

In this section, we report our simulation results to evaluate performance of the proposed method.
We use 25 video streams that contain a wide range of scenarios, each of which has 185 frames, and
each frame consists of (or is resized to) 350 × 240 pixels. In total, 4625 Y components of frames
are used for dictionary learning. Also, some binary images of different sizes are used as the secure
message for the experiments. The final version of the algorithm is tested on ".avi" video formats.
Therefore, three video files, Missile.avi, Blossom.avi, and Flower.avi, are used as cover videos
throughout the experiments. Table 1 provides the number of frames, frame dimensions, and
size of the video files used. The experiments are conducted on these test video files to assess
the performance of the proposed method. The number of frames of each cover video is 30. For
dictionary learning, the KSVD algorithm is applied to the frames' Y component, and then the OMP
algorithm is used for sparse representation of the I and Q components.

Fig. 5 RGB, YIQ, Y, I and Q components representation of first frame for Missile.avi (top), Blossom.avi
(middle), and Flower.avi (bottom) videos

Fig. 6 Performing 2D-DCT of Y component and 2D-DCT of I and Q components of first frame for Missile.avi
(top), Blossom.avi (middle), and Flower.avi (bottom) videos

In the following figures, the simulation results step by step are presented. Figure 5 shows
the RGB and YIQ color space, Y, I, and Q components of the first frame, for each video in the
Table 1. Then, 2D-DCT of Y component and 2D-DCT of I and Q components of first frame for
them are shown in Fig. 6.
Figure 7 illustrates five frames (1st, 7th, 15th, 23rd, and 30th) of the cover videos.
The stego videos are shown in Fig. 8 which represents the corresponding retrieved
video frames. Secret message and extracted messages are shown in Figs. 9 and 10
respectively.
Since the host signal (cover) is distorted whenever a steganography method is used, we are
eager to evaluate whether the distortion level is acceptable. Several metrics are generally
used for this purpose. Therefore, we evaluate the performance of the algorithm in terms of metrics
such as peak signal-to-noise ratio (PSNR), the hiding ratio (HR), the bit error rate (BER) and
similarity (Sim) of the secret message, and also runtime.
Peak Signal-to-Noise Ratio (PSNR) is a common metric utilized to calculate the difference
between the cover and stego signals. The PSNR can be calculated as follows [29, 61]:

\mathrm{PSNR} = 10 \times \log_{10}\!\left( \frac{255^2}{\mathrm{MSE}} \right) \; (\mathrm{dB}) \qquad (19)

Fig. 7 Representation of cover video (Frame 1, 7, 15, 23 and 30) for Missile.avi (top), Blossom.avi (middle), and
Flower.avi (bottom) videos

Fig. 8 Representation of stego video (Frame 1, 7, 15, 23 and 30) for Missile.avi (top), Blossom.avi (middle), and
Flower.avi (bottom) videos

\mathrm{MSE} = \frac{ \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{h} \left[ C(i,j,k) - S(i,j,k) \right]^2 }{m \, n \, h} \qquad (20)

where C and S represent the host and stego video frames, m and n indicate the frame dimensions,
and h represents the RGB color channels (k = 1, 2, and 3). On the other hand, the embedding capacity
is a major factor that any method tries to increase while preserving high transparency.
According to [58], the hiding ratio (HR) is calculated by the following relation:

\mathrm{HR} = \frac{\text{size of embedded secure message}}{\text{cover video size}} \times 100 \qquad (21)
To further evaluate the performance of any steganographic algorithm in terms of robustness,
two objective metrics including bit error rate (BER) and similarity (Sim) are used. These
metrics are used to determine whether the secret messages are retrieved from the stego videos
successfully or corrupted during the communication by comparing the concealed and extracted
covert data. The BER and Sim are computed in the following formulas [59, 62]:

Fig. 9 Secret message



Fig. 10 Extracted message from Missile.avi (left), Blossom.avi (middle), and Flower.avi (right) stego videos

\mathrm{BER} = \frac{ \sum_{i=1}^{a} \sum_{j=1}^{b} \left[ M(i,j) \oplus \hat{M}(i,j) \right] }{a \times b} \times 100 \qquad (22)

\mathrm{Sim} = \frac{ \sum_{i=1}^{a} \sum_{j=1}^{b} M(i,j)\, \hat{M}(i,j) }{ \sqrt{ \sum_{i=1}^{a} \sum_{j=1}^{b} M(i,j)^2 } \; \sqrt{ \sum_{i=1}^{a} \sum_{j=1}^{b} \hat{M}(i,j)^2 } } \qquad (23)

where M and \hat{M} are the concealed and extracted hidden data, and a and b are the dimensions of the
hidden data.
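For reference, a small NumPy sketch of the PSNR, BER, and Sim computations defined in eqs. (19), (20), (22) and (23); inputs are assumed to be 8-bit video arrays and binary message arrays of matching shapes.

```python
import numpy as np

def psnr(cover, stego):
    """Peak signal-to-noise ratio over the RGB frames, eqs. (19)-(20), for 8-bit data."""
    mse = np.mean((cover.astype(float) - stego.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def ber(msg, msg_hat):
    """Bit error rate (%) between concealed and extracted binary messages, eq. (22)."""
    return 100.0 * np.mean(np.asarray(msg) != np.asarray(msg_hat))

def sim(msg, msg_hat):
    """Normalized correlation between concealed and extracted messages, eq. (23)."""
    m, mh = np.asarray(msg, float), np.asarray(msg_hat, float)
    return np.sum(m * mh) / (np.sqrt(np.sum(m ** 2)) * np.sqrt(np.sum(mh ** 2)))
```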
Another parameter for performance analysis of a steganography algorithm is runtime.
Runtime measures the amount of time taken to extract the secure message with respect to
the secure message size. Runtime is mathematically formulated as given below:

\mathrm{Runtime} = (\mathrm{SMET}) \times (\mathrm{SMS}) \qquad (24)
where SMET and SMS parameters are secure message extraction time and secure message size,
respectively. In Table 2, the runtime values for three different video cover are calculated.
The experimental results obtained from the Missile.avi, Blossom.avi, and Flower.avi video
streams for the proposed algorithm with respect to the peak signal-to-noise ratio (PSNR) measurement
are shown in Fig. 11. The computed PSNR values indicate how closely a reconstructed image
approximates the original image; the underlying idea is to compute a value that reflects the quality
of the reconstructed image. Reconstructed images with higher PSNR values are evaluated as better.
To extend the simulation results, we calculate and tabulate the evaluation parameters in
Tables 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14, for 12 different video streams, separately.
Then, the proposed algorithm is tested under different types of attacks such as Gaussian
noise with the zero mean and variance = 0.05, salt & pepper noise with the density = 0.05, and
median filtering. For the algorithm to be robust, higher Sim and lower BER values
must be obtained. Table 15 illustrates the robustness of the proposed algorithm under these
attacks in terms of the computed Sim and BER values:
To measure the performance of our proposed method, we compare our algorithm with other
conventional schemes. Therefore, the comparison is drawn for existing [28, 60, 68, 84] and

Table 2 Runtime measurement for three different video host

Video (Cover) Name Frame Dimension Video Size (KBs) Secret Message Size (KBs) Runtime (ms)

Missile.avi 155×208 283.5 45.5 25.06


Blossom.avi 216×192 349.5 45.5 39.11
Flower.avi 350×240 454.5 45.5 41.8

Fig. 11 Comparison of PSNR value per frame for 3 different video host

Table 3 Tabulation of performance evaluation of proposed algorithm for missile.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

283.5 25.4 24.83 17.05 0.999 8.95


283.5 53.1 37.11 23.74 0.997 18.73
283.5 87.9 40.75 31.56 0.998 31.00

Table 4 Tabulation of performance evaluation of proposed algorithm for blossom.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

349.5 25.4 38.67 25.31 0.996 7.27


349.5 53.1 52.26 30.71 0.995 15.19
349.5 87.9 55.38 37.69 0.993 25.15

Table 5 Tabulation of performance evaluation of proposed algorithm for flower.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

454.5 25.4 24.83 32.03 0.995 5.59


454.5 53.1 37.11 36.42 0.992 11.68
454.5 87.9 40.75 42.51 0.990 19.34

Table 6 Tabulation of performance evaluation of proposed algorithm for Beauty.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

429 25.4 33.25 33.22 0.990 6.27


429 53.1 41.13 38.81 0.988 14.32
429 87.9 50.07 47.78 0.987 27.11

Table 7 Tabulation of performance evaluation of proposed algorithm for vehicle.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

317.3 25.4 44.96 30.82 0.988 9.43


317.3 53.1 57.19 40.99 0.984 19.02
317.3 87.9 69.23 57.14 0.980 29.55

Table 8 Tabulation of performance evaluation of proposed algorithm for athletic.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

400.7 25.4 30.09 19.38 0.996 5.90


400.7 53.1 39.55 30.05 0.994 13.58
400.7 87.9 48.70 38.56 0.991 21.36

Table 9 Tabulation of performance evaluation of proposed algorithm for ship.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

295.5 25.4 47.13 20.25 0.993 6.17


295.5 53.1 58.63 33.44 0.990 13.57
295.5 87.9 70.51 42.17 0.988 22.47

Table 10 Tabulation of performance evaluation of proposed algorithm for tennis.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

210 25.4 50.88 29.45 0.987 8.16


210 53.1 62.19 35.96 0.981 19.56
210 87.9 73.20 43.70 0.979 26.66

Table 11 Tabulation of performance evaluation of proposed algorithm for office.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

525.1 25.4 19.88 17.47 0.995 6.35


525.1 53.1 31.14 24.66 0.992 12.44
525.1 87.9 43.27 31.75 0.989 20.01

Table 12 Tabulation of performance evaluation of proposed algorithm for calendar.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

427.6 25.4 71.45 63.18 0.992 7.20


427.6 53.1 82.36 75.50 0.988 15.53
427.6 87.9 90.24 81.37 0.987 28.72

Table 13 Tabulation of performance evaluation of proposed algorithm for park.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

211.5 25.4 43.61 22.96 0.999 8.88


211.5 53.1 51.94 29.81 0.997 14.35
211.5 87.9 63.83 35.13 0.994 21.17

Table 14 Tabulation of performance evaluation of proposed algorithm for hotel.avi video host

Video cover Size (KBs) Secret Message Size (KBs) Runtime (ms) BER (%) Sim HR

357.1 25.4 26.19 35.08 0.998 9.16


357.1 53.1 39.81 45.52 0.997 19.90
357.1 87.9 44.44 58.16 0.993 26.91

proposed work in Fig. 12. In this figure, the x-axis shows the frame number and the y-axis
shows the PSNR value obtained from the simulation. As depicted, the PSNR value of the proposed
work is improved compared to the existing works. For this test, we use the Missile.avi video
frames as cover. The plots also show that our proposed algorithm produces a stego video in which
the hidden secret message is highly imperceptible to human vision.

6 Conclusion

In this paper, we have proposed a new steganography algorithm based on video sparse representation
(VSR). To obtain a proper dictionary, the KSVD algorithm was applied to the DCT coefficients of
the Y component of the video (cover) frames. Then, both the I and Q components of the video frames
were selected and their sparse representations were calculated using the OMP technique. Several
experiments were performed to evaluate the performance of the proposed algorithm in terms of
metrics such as peak signal-to-noise ratio (PSNR), the hiding ratio (HR), the bit error rate (BER)
and similarity (Sim) of the secret message, and also runtime. The simulation results show that the
proposed method exhibits good invisibility. Moreover, the experimental results showed that the
proposed algorithm is robust against several attacks. The results of comparison also show that the

Table 15 BER and Sim values of the proposed algorithm under different attacks

Attack Type Gaussian noise Salt and Pepper noise Median Filtering
Variance = 0.05 Density = 0.05

Video Name BER (%) Sim BER (%) Sim BER (%) Sim

Missile.avi 24.07 0.9672 22.88 0.9699 19.92 0.9887


Blossom.avi 29.39 0.9507 30.42 0.9600 26.35 0.9796
Flower.avi 37.78 0.9311 36.19 0.9483 33.10 0.9680
Beauty.avi 28.12 0.9601 25.36 0.9633 22.33 0.9324
Vehicle.avi 32.25 0.9842 30.52 0.9879 28.41 0.9512
Athletic.avi 40.26 0.9944 37.13 0.9972 34.73 0.9641
Ship.avi 32.38 0.9899 29.48 0.9918 25.15 0.9606
Tennis.avi 26.45 0.9853 23.88 0.9878 21.63 0.9570
Office.avi 32.58 0.9927 30.75 0.9952 28.79 0.9644
Calendar.avi 29.91 0.9905 26.66 0.9931 22.39 0.9599
Park.avi 38.16 0.9968 34.29 0.9994 30.40 0.9677
Hotel.avi 34.55 0.9960 31.11 0.9995 29.87 0.9640

Fig. 12 PSNR Comparison for our method with the methods in [59–62, 68, 70, 84], and [28]

proposed algorithm outperformed other existing algorithms [29, 61, 68, 84] in terms of PSNR.
For future work, we would like to improve the embedding payload of the proposed algorithm
while preserving the video quality by using optimization techniques such as GA and PSO. We would
also like to investigate new techniques to enhance the security of the algorithm.

References

1. Abdulla AA, Sellahewa H, Jassim SA (2019) Improving embedding efficiency for digital steganography by
exploiting similarities between secret and cover images. Multimed Tools Appl:1–25
2. Ahani S, Ghaemmaghami S (2010) Image steganography based on sparse decomposition in wavelet space.
IEEE International Conference on Information Theory and Information Security, Beijing, pp 632–637
3. Ahani S, Ghaemmaghami S (2015) Colour image steganography method based on sparse representation. IET
Image Process 6:496–505
4. Ahani S, Ghaemmaghami S (2015) Color image steganography method based on sparse representation. IET
Image Process (6):496–505
5. Ahani S, Ghaemmaghami S, Wang ZJ (2015) A Sparse Representation-Based Wavelet Domain Speech
Steganography Method. IEEE/ACM Transactions on Audio, Speech, and Language Processing 1:80–91
6. Ahani S, Ghaemmaghami S, Wang ZJ (2015) A Sparse Representation-Based Wavelet Domain Speech
Steganography Method. IEEE Trans Audio Speech Lang Process (1):80–91
7. Akram MZ, Azizah AM, Shayma SM (2011) High watermarking capacity based on spatial domain
technique. Inf Technol J 10(7):1367–1373
8. Aziz Sbai SM, Aissa-El-Bey A, Pastor D (2012) Underdetermined source separation of finite alphabet signals
via l1 minimization. 11th International Conference on Information Science, Signal Processing and their
Applications (ISSPA), Montreal, pp 625–628
9. Blanchard JD, Tanner J (2015) Performance comparisons of greedy algorithms in compressed sensing.
Numerical Linear Algebra with Applications (2):254–282
10. Budhia U, Kundur D, Zourntos T (2006) Digital Video Steganalysis Exploiting Statistical Visibility in the
Temporal Domain. IEEE Transactions on Information Forensics and Security (4):502–516
11. Cai J, Ji H, Liu C, Shen Z (2009) Blind motion deblurring from a single image using sparse approximation.
IEEE Conference on Computer Vision and Pattern Recognition, Miami, pp 104–111
12. Cao X, Du L, Wei X, Meng D, Guo X (2015) High Capacity Reversible Data Hiding in Encrypted Images
by Patch-Level Sparse Representation. IEEE Transactions on Cybernetics 5:1132–1143

13. Cao X, Du L, Wei X, Meng D, Guo X (2016) High Capacity Reversible Data Hiding in Encrypted Images
by Patch-Level Sparse Representation. IEEE Transactions on Cybernetics 5:1132–1143
14. Chae JJ, Manjunath BS (1999) Data hiding in video. Proceedings 1999 International Conference on Image
Processing (Cat. 99CH36348), Kobe, pp 311–315
15. Chandramouli R, Memon N (2001) Analysis of LSB image steganography techniques. IEEE International
Conference on Image Processing:1019–1022
16. Cheddad A, Condell J, Curran K, McKevitt P (2008) Biometric Inspired Digital Image Steganography.
15th Annual IEEE International Conference and Workshop on the Engineering of Computer Based
Systems, pp 159–168
17. Cheddad A, Condell J, Curran K, McKevitt P (2010) Digital image steganography: Survey and analysis of
current methods. Signal Process (3):727–752
18. Chen Y, Nasrabadi NM, Tran TD (2013) Hyperspectral Image Classification via Kernel Sparse
Representation. IEEE Trans Geosci Remote Sens (1):217–231
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Arash Jalali received the B.Sc. degree in electronics engineering from the Ferdowsi University of Mashhad, Iran, in 2010, and the M.Sc. degree in communication engineering from Sharif University of Technology, Tehran, Iran, in 2013. He is currently a Ph.D. student in the Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran. His research interests include image processing, steganography, pattern recognition, digital signal processing, and sparse representation. His email address is a_jalali1367@birjand.ac.ir.

Hassan Farsi received the B.Sc. and M.Sc. degrees from Sharif University of Technology, Tehran, Iran, in 1992 and 1995, respectively. He began his Ph.D. studies at the Centre for Communication Systems Research (CCSR), University of Surrey, Guildford, UK, in 2000 and received the Ph.D. degree in 2004. His research interests include speech, image, and video processing for wireless communications. He is currently an associate professor of communication engineering in the Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran. His email address is hfarsi@birjand.ac.ir.
