7

I came across a link that shows how to hide a number of files inside an image file: http://lifehacker.com/282119/hide-files-inside-of-jpeg-images. There is more discussion on detection here: http://ask.metafilter.com/119943/How-to-detect-RARsEXEs-hidden-in-JPGs

What is a good way to programmatically detect whether an image file has other files hidden inside it? Should I try unzipping the file to see if other files come out of it?

I'm not bound to a particular language, but something that works well on the JVM would be great.

Update

One Approach:

Would something like this work (suggested by someone on MetaFilter)?

$ cat orig.jpg test.zip > stacked.jpg
$ file stacked.jpg 
stacked.jpg: JPEG image data, JFIF standard 1.01
$ convert stacked.jpg stripped.jpg  # this is an ImageMagick command
$ ls -l
 11483 orig.jpg
322399 stacked.jpg
 11484 stripped.jpg
310916 test.zip

I could use JMagick for this approach.

5
  • I've updated the link. You are right, the hidden files would not be in the metadata. However, the problem still stands: how can I detect that the image file contains hidden files?
    – Jayson
    Commented Jan 22, 2013 at 3:38
  • You can't by magic. You could guess how the files were hidden in a given instance, but that can vary completely from one instance to another; someone could create a different hiding method, for example.
    – mmgp
    Commented Jan 22, 2013 at 3:57
  • Yes, you can detect that by magic - en.wikipedia.org/wiki/Magic_number_(programming)
    – Tesseract
    Commented Jan 22, 2013 at 4:10
  • @SpiderPig are you referring to magic numbers that identify file formats? I can simply remove them.
    – mmgp
    Commented Jan 22, 2013 at 4:14
  • @mmgp I've updated the question with one of the approaches I found on the internet
    – Jayson
    Commented Jan 22, 2013 at 4:15

3 Answers

2

Great question!

If all you want to check for is a RAR or ZIP file appended to the end of an image file, then running it through the unrar or unzip command is the easiest way to do it.
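For the ZIP case, a minimal sketch of that check using Python's standard zipfile module instead of the unzip command (my substitution, not something from the question). The module locates the archive through the end-of-central-directory record at the end of the file, so it tolerates image data in front of the archive. The function name list_appended_zip is made up for illustration.

```python
import io
import zipfile


def list_appended_zip(data: bytes) -> list[str]:
    """Return member names if `data` ends in a readable ZIP archive, else []."""
    buf = io.BytesIO(data)
    # is_zipfile looks for the end-of-central-directory record near the end
    # of the stream, so bytes prepended to the archive do not hide it.
    if not zipfile.is_zipfile(buf):
        return []
    with zipfile.ZipFile(buf) as zf:
        return zf.namelist()


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        print(list_appended_zip(fh.read()))
```

An empty result only means no ZIP structure was recognized at the end of the file; it says nothing about RAR or any other container.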

If you want a faster but less exact check, you can check for some of the special file format signatures that indicate certain types of files. The usual UNIX tool to identify file format is file. It uses a database of binary file signatures, whose format is defined in the magic(5) man page. It won’t find a RAR file for you at the end of a JPEG, because it only looks at the start of files to try to identify them quickly, but you might be able to modify its source code to do what you want. You could also reuse its database of file signatures. If you look at the archive file part of its database in the Rar files section, it shows this:

# RAR archiver (Greg Roelofs, [email protected])
0   string      Rar!        RAR archive data,

which indicates that if your JPEG file contains the four bytes Rar! that would be suspicious. But you would have to examine the Rar file format spec in detail to check whether more of the Rar file structure is present to avoid false positives—this web page also contains the four bytes Rar! but there are no hidden files attached to it :P

But if someone knows the details of your automated checks, they could easily work around them. The simplest workaround would be to reverse all the bytes of the files before appending them to the JPEG. Then none of your signatures would catch the reversed version of the file.


If someone really wants to hide a file inside an image, there are all sorts of ways to do that that you won’t be able to detect easily. The general term for this is “steganography.” The Wikipedia page, for example, shows a picture of trees that has a picture of a cat hidden inside it. For simpler steganographic methods, there are statistical tests that can indicate something funny has been done to a picture, but if someone spends a lot of time to come up with their own method to hide other files inside images, you won’t be able to detect it.

3
  • 2
    @mmgp Please stop commenting on this thread. Your rude and unhelpful comments are not appreciated by anyone here.
    – andrewdotn
    Commented Jan 22, 2013 at 4:21
  • @andrew thanks. I'm not at all planning to tackle steganography from all aspects as illustrated by that tree-cat pic. However, I'm looking for ways to find if there is a completely separate file hidden inside the image. Sure, to begin with I don't know what file format could be hidden but I can target different formats one-by-one. If I target RAR and it is actually at the end of the JPEG then what might the options be? Can I examine JPEG bit-by-bit to see if it has a RAR in it? How can I do this?
    – Jayson
    Commented Jan 22, 2013 at 4:29
  • @Jayson In the case where there’s a RAR file appended, whether it’s appended to a JPEG, a PNG, or anything else doesn’t really matter. The archive part is outside the part defined by the image file format. RAR files start with the string Rar!, so you could scan byte-by-byte until you hit that, and then treat the bytes from then on as a RAR file—but the unrar tool already does that. To do something much more complicated you’d basically have to reimplement unrar in Java :/
    – andrewdotn
    Commented Jan 22, 2013 at 4:40
0

You could search for the file signature: http://en.wikipedia.org/wiki/List_of_file_signatures. For example, for a 7z file the signature is 37 7A BC AF 27 1C, for RAR files it's 52 61 72 21 1A 07 00, and for ZIP it's 50 4B 03 04. Take a look at a compressed file in a hex editor, e.g. HxD.
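A rough sketch of that search in Python, using only the three signatures listed above. The names SIGNATURES and find_signatures are made up for illustration, and the scan skips offset 0 on the assumption that the file really does start with an image header.

```python
# Signatures as listed above; extend the table for other formats.
SIGNATURES = {
    "7z": bytes.fromhex("377ABCAF271C"),
    "rar": bytes.fromhex("526172211A0700"),
    "zip": bytes.fromhex("504B0304"),
}


def find_signatures(data: bytes, start: int = 1) -> list[tuple[int, str]]:
    """Return (offset, format) pairs for every signature found at or after `start`."""
    hits = []
    for name, sig in SIGNATURES.items():
        pos = data.find(sig, start)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(sig, pos + 1)
    return sorted(hits)
```

Treat every hit as a hint only. Short signatures such as the 4-byte ZIP one can occur by chance inside compressed image data, so a hit should be followed by an attempt to actually parse the archive from that offset.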

1
  • Of course it only works for files hidden in the way the video demonstrates.
    – Tesseract
    Commented Jan 22, 2013 at 4:18
0

To see if there's any metadata or other information appended to the file, you could decode the image and re-encode it to see if the size decreases dramatically. For a JPEG file you would want to do something like a lossless rotate that retains the original DCT data, otherwise the file size might change just through encoding differences.

A smaller result wouldn't be proof of hidden data, but it would be an indicator that you need to take a closer look.
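A cheaper variant of the same idea, and not the decode/re-encode described above, is to walk the JPEG marker structure until the end-of-image (EOI) marker and count whatever follows it. The sketch below is plain Python with no image library; the function names are made up, and it assumes a baseline or progressive JPEG with ordinary marker layout.

```python
import struct


def jpeg_end_offset(data: bytes) -> int:
    """Return the offset just past the EOI marker of the JPEG stream in `data`."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos, n = 2, len(data)
    while pos < n:
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        while pos < n and data[pos] == 0xFF:  # skip fill bytes
            pos += 1
        if pos >= n:
            break
        marker = data[pos]
        pos += 1
        if marker == 0xD9:  # EOI: the image ends here
            return pos
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            continue  # standalone markers carry no length field
        if pos + 2 > n:
            break
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 2:
            raise ValueError("corrupt segment length")
        pos += length  # skips EXIF thumbnails and other segment payloads
        if marker == 0xDA:  # SOS: entropy-coded data follows the header
            while True:
                i = data.find(b"\xff", pos)
                if i < 0 or i + 1 >= n:
                    raise ValueError("truncated JPEG: no EOI marker")
                nxt = data[i + 1]
                if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                    pos = i + 2  # stuffed byte or restart marker
                    continue
                pos = i  # a real marker; let the outer loop handle it
                break
    raise ValueError("truncated JPEG: no EOI marker")


def trailing_bytes(data: bytes) -> bytes:
    """Return everything stored after the end of the JPEG image."""
    return data[jpeg_end_offset(data):]
```

As with the size comparison, a non-empty tail is a reason to look closer rather than proof: some cameras and tools legitimately append extra data after the image.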

You never shared your motivation for asking the question, but I'm going to guess that it's about accepting image uploads on a public site. In that case you really shouldn't care whether the submitted image contains extraneous data; you should just cleanse the input regardless. The decode/re-encode process would be perfect for this.

10
  • I don't see how this could work, honestly. You are assuming the file can be decoded, but what if I (as the one who hid the data) removed the data necessary for the file to be decoded? I don't have any problem handling the files, because I know how I removed them.
    – mmgp
    Commented Jan 22, 2013 at 3:59
  • @mmgp, I thought we were starting with the assumption that we had a valid image file. Obviously if you invent your own image file format you can hide anything you want.
    Commented Jan 22, 2013 at 4:41
  • @mmgp, I apologize, my answer was unclear and you were reacting to that. What I meant to say was to decode the image part of the data, not the unknown part. I've slightly changed the wording to make that clear.
    Commented Jan 22, 2013 at 4:44
  • The problem is getting the image part of the data if you don't know the actual format of the data. Even if we take the simplest image formats, like the ones by netpbm, and simply exchange the first line with the second line, the ready tools won't attempt to read it since it fails the simplest of the tests that is done to attempt to identify it. After we settle on a lot of pre-conditions, then the question might be answerable. As it stands it can't, because we can make up any hiding process, and it doesn't need to invent a new format, just scramble it a little.
    – mmgp
    Commented Jan 22, 2013 at 4:47
  • 3
    @mmgp, I didn't see anything in the question that required deciphering the hidden content. It was merely a question of determining if there was hidden content, on a file that is masquerading as a valid image file. Creating a file that isn't a valid image is beyond the scope of the question as well. Your misunderstanding of the question borders on trolling.
    Commented Jan 22, 2013 at 16:24
