Supervised by: Dr. Longin Jan Latecki

Overview
- Introduction
  - Clandestine Communication
  - Digital Applications of Steganography
- Background
  - Uncompressed Images
  - Compressed Images
  - Steganalysis
- The Images Used
- Finding and Extracting Messages from Bitmaps
- Detecting Messages in jpegs
- Future Work

Introduction: Clandestine Communication
- Cryptography scrambles the message into ciphertext
- Steganography hides the message in unexpected places

Introduction: Digital Applications of Steganography
Messages can be hidden in digital data:
- MS Word documents (doc)
- Web pages (htm)
- Executables (exe)
- Sound files (mp3, wav, cda)
- Video files (mpeg, avi)
- Digital images (bmp, gif, jpg)

Background: Uncompressed Images
- Grayscale bitmap images (bmp)
  - 256 shades of intensity, from black to white
  - Can be obtained from color images
  - Arranged into a 2-D matrix
- Messages are hidden in the least significant bits (lsb)
  - Matrix values change only slightly
  - We are interested in the bit patterns that form messages

Character   Integer     Binary
Space       32          00100000
0 to 9      48 to 57    00110000 to 00111001
A to Z      65 to 90    01000001 to 01011010
a to z      97 to 122   01100001 to 01111010

Example: Length = 12, Message = Hello Stego!
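A minimal sketch of lsb embedding and extraction for the example message above. The slides do not give the bit layout, so this assumes each character is written as 8 bits, most significant bit first, one bit per pixel in row-major order from the top-left pixel. The names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(pixels, message):
    """Hide message in the lsbs of a 2-D grayscale matrix.

    Assumed layout: 8 bits per character, most significant bit first,
    written row by row starting at the top-left pixel.
    """
    bits = [(ord(ch) >> shift) & 1 for ch in message for shift in range(7, -1, -1)]
    height, width = len(pixels), len(pixels[0])
    if len(bits) > height * width:
        raise ValueError("message too long for this image")
    stego = [row[:] for row in pixels]  # copy so the cover image is untouched
    for i, bit in enumerate(bits):
        r, c = divmod(i, width)
        stego[r][c] = (stego[r][c] & ~1) | bit  # clear the lsb, then set it to the message bit
    return stego


def extract_lsb(pixels, length):
    """Read `length` characters back from the lsbs using the same bit order."""
    flat = [value for row in pixels for value in row]
    chars = []
    for k in range(length):
        byte = 0
        for value in flat[8 * k: 8 * k + 8]:
            byte = (byte << 1) | (value & 1)
        chars.append(chr(byte))
    return "".join(chars)


if __name__ == "__main__":
    cover = [[(r * 16 + c) * 7 % 256 for c in range(16)] for r in range(8)]
    stego = embed_lsb(cover, "Hello Stego!")
    print(extract_lsb(stego, 12))  # Hello Stego!
    # each pixel value changes by at most 1, so the change is invisible
    print(max(abs(a - b) for ra, rb in zip(cover, stego) for a, b in zip(ra, rb)))
```

Because only bit 0 of each pixel changes, every matrix value moves by at most one intensity level, which is why the message cannot be seen in the image.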
Background: Compressed Images
- Grayscale jpeg images (jpg)
- Joint Photographic Experts Group (jpeg) compression
  - Converts the image to the YCbCr color space
  - Divides it into 8x8 blocks
  - Uses the Discrete Cosine Transform (DCT) to obtain frequency coefficients
  - Scales the coefficients by quantization, which removes some frequencies; at a high quality setting the loss is not noticeable
  - Applies Huffman coding
- The process affects the image's statistical properties

Background: Steganalysis

The Images Used
- From the Star Trek website (www.startrek.com)
- 1,000 color jpeg images
- 320x240 or 240x320 pixels
- There will be Klingons

Finding and Extracting Messages from Bitmaps: Problem
- Messages can be hidden in the lsbs
- A message may be anywhere in the image
- The message cannot be seen in the image
- Processing the images by hand would take far too long

Finding and Extracting Messages from Bitmaps: Procedure
- Inject messages into the images
- Take a Boolean snapshot of even and odd pixels (the lsb of each pixel)
- Construct a string of all possible characters
  - An n-pixel image has n - 7 individual character enumerations, one for each starting pixel of an 8-bit window (320 x 240 - 7 = 76,793)
- Use character properties to match a message pattern in the enumerated string
  - Define a message (a pattern of message characters)
  - Define message characters (the characters used in messages)
  - Use stego stems (patterns)
- A test can be performed faster by using tiled samples
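To make the DCT and quantization steps of the compressed-images slide concrete, here is a small sketch of an orthonormal 2-D DCT-II on one 8x8 block, after the usual jpeg level shift of 128, followed by quantization. Real jpeg encoders divide by an 8x8 quantization table scaled by the quality setting; the single step size `step` used here is a simplifying assumption, and the function names are hypothetical.

```python
import math

N = 8  # jpeg block size


def alpha(k):
    """Normalization factor that makes the DCT-II orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct2(block):
    """2-D DCT-II of one 8x8 block after the jpeg level shift (subtract 128)."""
    shifted = [[value - 128 for value in row] for row in block]
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (shifted[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def quantize(coeffs, step):
    """Divide by one uniform step size and round; small coefficients become 0.

    Simplification: real encoders use a per-frequency quantization table.
    """
    return [[round(c / step) for c in row] for row in coeffs]


if __name__ == "__main__":
    # a smooth horizontal ramp: its energy sits in a few low frequencies
    ramp = [[100 + 4 * y for y in range(N)] for _ in range(N)]
    q = quantize(dct2(ramp), step=16)
    nonzero = sum(1 for row in q for c in row if c != 0)
    print(nonzero, "of 64 quantized coefficients are non-zero")
```

Because the transform is orthonormal, it preserves the block's energy; the information is lost only in the rounding inside `quantize`, which is what removes the weak high frequencies.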
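The bitmap procedure can be sketched as follows. The Boolean snapshot is the lsb plane (odd pixel = True), and sliding an 8-bit window over it yields the n - 7 enumerated characters. The slides do not define how a pattern is scored, so `best_run` is a hypothetical stand-in rather than the original TestMsg code: a message starting at bit offset p appears at enumerated positions p, p + 8, p + 16, ..., so each of the 8 phases is scanned for its longest run of message characters (space, digits and letters, as in the character table).

```python
# Message characters from the character table: space, digits, upper- and lower-case letters.
MESSAGE_CHARS = frozenset(
    " 0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
)


def lsb_snapshot(pixels):
    """Boolean snapshot of a 2-D grayscale matrix: True where the pixel is odd."""
    return [value % 2 == 1 for row in pixels for value in row]


def enumerate_chars(bits):
    """Every character formed by 8 consecutive lsbs (MSB first): n - 7 of them."""
    chars = []
    for start in range(len(bits) - 7):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | int(bit)
        chars.append(chr(byte))
    return chars


def best_run(chars):
    """Hypothetical pattern score: longest run of message characters in any phase.

    Returns (score, text).
    """
    best_len, best_text = 0, ""
    for phase in range(8):
        run = []
        for ch in chars[phase::8] + [None]:  # None is a sentinel that ends the last run
            if ch in MESSAGE_CHARS:
                run.append(ch)
            else:
                if len(run) > best_len:
                    best_len, best_text = len(run), "".join(run)
                run = []
    return best_len, best_text


if __name__ == "__main__":
    width, height = 40, 10
    flat = [(i * 73 + 11) % 256 for i in range(width * height)]  # stand-in cover pixels
    start = 53  # arbitrary start, deliberately not a multiple of 8
    bits = [(ord(ch) >> s) & 1 for ch in "Hello Stego" for s in range(7, -1, -1)]
    for i, bit in enumerate(bits):
        flat[start + i] = (flat[start + i] & ~1) | bit
    pixels = [flat[r * width:(r + 1) * width] for r in range(height)]
    chars = enumerate_chars(lsb_snapshot(pixels))
    print(len(chars))  # 393, that is 400 - 7
    print(best_run(chars))
```

Neighbouring pixels can happen to decode to message characters, so the recovered text may carry a few extra characters at its head or tail.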
Steganography is the art and science of communicating in a way which hides the existence of the communication. In contrast to cryptography, where the "enemy" is allowed to detect, intercept and modify messages without being able to violate certain security premises guaranteed by a cryptosystem, the goal of steganography is to hide messages inside other "harmless" messages in a way that does not allow any "enemy" to even detect that there is a second secret message present [Markus Kuhn 1995-07-03].
Finding and Extracting Messages from Bitmaps: Observation
- Only linear, unencrypted messages were considered
- Trial performed on 100 grayscale bitmaps: 97 clean, 3 stego
- Finding and extracting a message took an average of 9 seconds per image, with 100% accuracy (no training; cold start)
  - Occasionally some garbage text appeared at the head or tail of the message
- Testing for a message took an average of 3 seconds per image, with 100% accuracy
- Clean images had pattern scores of less than 10
- Stego images had pattern scores of 31 or more

Finding and Extracting Messages from Bitmaps: Conclusion
- Messages are detectable and extractable in non-encrypted, uncompressed images
- Linear messages can be found in any direction, at the cost of more computation
- This method can be foiled by hashing the message into the image

Detecting Messages in jpegs: Problem
- An enumeration scheme cannot be used to detect or find a message
- Because of encoding schemes and encryption, it may only be possible to detect a message, not extract it
- The message cannot be seen in the image
- The statistical properties of an image change when a message is injected

Detecting Messages in jpegs: Procedure
- Obtain the 4-level 2-D wavelet decomposition of each image
- Obtain the orientation decomposition of frequency-space statistics
- 72 features plus the class (0 = clean, 1 = stego)
  - The mean, variance, skewness and kurtosis of the subband coefficients and of the error in predicting each subband
  - 4 statistics x 3 orientations (V, H, D) x 3 scales x 2 (coefficients and prediction errors) = 72
- Normalize the data to [0, 1] by min-max scaling
- Train a Fisher Linear Discriminant (FLD)
- Test the FLD threshold

[Sample data slides: one image's 72 raw feature values followed by its class (0 = clean), the same vector after 0-1 min-max normalization, and the feature names. Each name combines a statistic (mean, var, skw, krt), an orientation (V, H, D for coefficients; Ev, Eh, Ed for prediction errors) and a scale (12, 23, 34), running from meanV 12 to krtEd 34, followed by class.]

Detecting Messages in jpegs: Observation
- Trials performed on 2,000 images: 1,000 clean and 1,000 stego
- Random selection of 1,000 instances without replacement (500 from each class)
- Messages in the stego images were of sufficient size
- Results show very high accuracy
  - Bior3.1 wavelet: true negatives 100%, true positives 98.6%
  - Rbio5.5 wavelet: true negatives 99.8%, true positives 98.8%

Detecting Messages in jpegs: Conclusion
- Messages of sufficient size can be detected in stego images with great accuracy
- The improved accuracy may be due to
  - A large training set: 1,000 instances split 800/200 (500 per class, split 400/100)
  - A restricted domain with many similar images

Detecting Messages in jpegs: Problems
- The authors did not handle the log-of-zero problem
  - Zeros were replaced with a small value
- Different jpeg sizes need different message sizes
  - Dynamic message injection

Detecting Messages in jpegs: Other Classifiers
- Tests were run with J4.8, SMO, Logistic and Naive Bayes on the bior3.1 and rbio5.5 features, with an 80/20 split and default settings

Results

Future Work
- Find optimal stems
  - Pattern matching
  - Text mining
  - Cryptanalysis
- Optimize the TestMsg code
  - C/assembly code

References
- Petitcolas, F.A.P., Anderson, R., Kuhn, M.G., "Information Hiding - A Survey", July 1999, URL: http://www.cl.cam.ac.uk/~fapp2/publications/ieee99-infohiding.pdf (11/26/01 17:00)
- Farid, Hany, "Detecting Steganographic Messages in Digital Images", Department of Computer Science, Dartmouth College, Hanover, NH 03755
- Moby Words II, Copyright (c) 1988-93, Grady Ward. All Rights Reserved.
- Lyu, Siwei and Farid, Hany, "Steganalysis Using Color Wavelet Statistics and One-Class Support Vector Machines", Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
- Farid, Hany, "Detecting Hidden Messages Using Higher-Order Statistical Models", Department of Computer Science, Dartmouth College, Hanover, NH 03755
- Spy Vs. Spy by Antonio Prohias, from MAD Magazine

Have a good Winter Break!
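For the jpeg Procedure slide, here is a sketch of how the 72 feature names and the four statistics might be computed for one subband. The exact moment definitions (population moments, non-excess kurtosis) are assumptions, as is the log-ratio form of the prediction error, which follows the style of Farid's papers. `log_error` shows the "replaced with a small value" fix from the Problems slide; the floor of 1e-6 and all function names are hypothetical.

```python
import math

STATS = ("mean", "var", "skw", "krt")
ORIENTATIONS = ("V", "H", "D")
SCALES = ("12", "23", "34")


def feature_names():
    """72 names in slide order: per scale, coefficient stats then prediction-error stats."""
    names = []
    for scale in SCALES:
        names += [f"{s}{o} {scale}" for s in STATS for o in ORIENTATIONS]
        names += [f"{s}E{o.lower()} {scale}" for s in STATS for o in ORIENTATIONS]
    return names


def four_moments(values):
    """Mean, variance, skewness and kurtosis (population moments, non-excess kurtosis)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:  # constant subband: skewness and kurtosis are undefined
        return mean, 0.0, 0.0, 0.0
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def log_error(actual, predicted, floor=1e-6):
    """log2|actual| - log2|predicted|, with zero magnitudes replaced by `floor`.

    Without the floor, a zero coefficient or a zero prediction makes log2 fail.
    """
    return math.log2(max(abs(actual), floor)) - math.log2(max(abs(predicted), floor))


if __name__ == "__main__":
    coefficients = [0.0, 1.5, -2.0, 0.25, 3.0]  # stand-in subband values
    predictions = [0.5, 1.0, -1.5, 0.0, 2.5]  # stand-in linear predictions
    errors = [log_error(a, p) for a, p in zip(coefficients, predictions)]
    print(len(feature_names()))  # 72
    print(four_moments(coefficients))
    print(four_moments(errors))
```

The two calls to `four_moments` give the 8 values that one subband contributes: 4 for its coefficients and 4 for its prediction errors.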
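The last steps of the jpeg Procedure slide (0-1 min-max normalization, training a Fisher Linear Discriminant, and testing its threshold) can be sketched in pure Python. Two details are assumptions, not taken from the slides: the threshold is the midpoint of the two projected class means, and a small `ridge` term is added to the within-class scatter matrix so that it can always be inverted. The function names are hypothetical, and this is not the code behind the reported results.

```python
def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.

    Also returns the per-column minima and maxima so that the same
    scaling can be reapplied to test data.
    """
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    scaled = [[(x - lo) / (hi - lo) if hi > lo else 0.0
               for x, lo, hi in zip(row, lows, highs)]
              for row in rows]
    return scaled, lows, highs


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def project(w, x):
    """Project a feature vector onto the discriminant direction."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_fld(clean, stego, ridge=1e-6):
    """Two-class FLD: w = Sw^-1 (mu_stego - mu_clean), threshold at the midpoint."""
    d = len(clean[0])
    mu0 = [sum(col) / len(clean) for col in zip(*clean)]
    mu1 = [sum(col) / len(stego) for col in zip(*stego)]
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((clean, mu0), (stego, mu1)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]  # within-class scatter
    w = solve(sw, [m1 - m0 for m0, m1 in zip(mu0, mu1)])
    threshold = (project(w, mu0) + project(w, mu1)) / 2
    return w, threshold


def classify(w, threshold, x):
    """1 = stego if the projection falls on the stego side of the threshold, else 0 = clean."""
    return 1 if project(w, x) > threshold else 0
```

Since the within-class scatter matrix is positive definite, stego vectors project above clean ones on average, so `classify` only needs a single comparison. With the 800/200 split, `min_max_normalize` would be fitted on the 800 training rows and its returned minima and maxima reused to scale the 200 test rows before calling `classify`.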