If the file in question was a plain text file (as Linux understands it, i.e. UTF-8) and the filesystem you copied with `dd` is neither encrypted nor compressed, use `strings` on the image.

> For each file given, GNU `strings` prints the printable character sequences that are at least 4 characters long (or the number given with the options below) and are followed by an unprintable character.

(source: `man 1 strings`)
You want something like:

```
strings -aw -e S -n 512 ddimage >extracted
```

(or `pv ddimage | strings -aw -e S -n 512 >extracted` to see the progress). Then `extracted` will be a file you can view with `less`, search with `grep`, etc. In my tests `-e S` was crucial for detecting UTF-8 text with multi-byte characters.
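Why `-e S` matters can be seen on a tiny synthetic image (the file `/tmp/utf8demo.bin` is a made-up stand-in for your real image):

```shell
# UTF-8 text with multi-byte characters ("café naïve résumé text")
# surrounded by binary bytes. -e S scans single 8-bit bytes, so the
# bytes of multi-byte UTF-8 characters count as printable instead of
# terminating the sequence.
printf '\001\002caf\303\251 na\303\257ve r\303\251sum\303\251 text\000\003' > /tmp/utf8demo.bin
strings -a -e S -n 10 /tmp/utf8demo.bin
# prints: café naïve résumé text
```

With the default encoding (`-e s`, 7-bit) the accented characters would break the string into shorter pieces, some of which could fall under the `-n` threshold.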
There are possible problems:

The data you seek may be fragmented and scattered around the image, not necessarily in sequence. There may be old versions of it, and there may be fragments of other files (garbage, including text-like fragments of binary files), all of these possibly interleaved: a textual jigsaw puzzle. Consider using the `-s` (`--output-separator`) option, but keep in mind that if two unrelated fragments are strictly adjacent in the image, you won't get a separator between them; they will look like one bigger chunk.
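The separator behaviour can be demonstrated on a toy file (file name and separator string here are arbitrary):

```shell
# Two text chunks separated by NUL bytes; -s/--output-separator makes
# strings emit the given marker instead of a plain newline after each
# extracted sequence, so chunk boundaries become visible.
printf 'first chunk here\000\000second chunk here' > /tmp/sepdemo.bin
strings -a -n 5 -s '@@@' /tmp/sepdemo.bin
```

Note `--output-separator` requires a reasonably recent GNU binutils (2.31 or later).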
If the filesystem you copied with `dd` was on an SSD and TRIM was performed after the mishap but before you copied, then there is a risk the data you want to recover is gone. This is a bad scenario.

On the other hand, if the filesystem was on an SSD and TRIM was performed before the mishap, with no TRIM between the mishap and the copy, then the TRIM may have wiped out unrelated old data (old versions of files etc.), but not the data you're after. In effect you will get less garbage. This is a good scenario.

As you can see, an SSD may be a disadvantage or an advantage. For an HDD these scenarios do not apply. Virtual disks may support something similar to TRIM.
`-n 512` tells the tool to print sequences at least 512 bytes long. The manual says "characters", but my tests with UTF-8 multi-byte characters show it is definitely bytes. The lower the number, the more garbage you will get. On the other hand, you should not exceed the block size used by the imaged filesystem, which is at least 512 (the lowest common sector size for block devices). You said nothing about the filesystem; its block size may be e.g. 4096 or 8192. The point is that your file may be fragmented, and an `-n` higher than the block size will miss a textual block if it happens to sit between non-textual data. If your file was tiny (smaller than the `-n` you used), you might miss it completely. Similarly, you may miss the tail of the desired file if that part happens not to be adjacent to other text.
Still, `-n 512` should allow you to find almost all existing remnants of the file (unwanted garbage and fragmentation may be bigger problems than missing part(s)). Unless…
In the beginning I wrote "the filesystem […] neither encrypted nor compressed". An encrypted or compressed filesystem would not store textual data in its plain form, so `strings` would be useless. I guess some other features of some filesystems may lower your chances or cause extra garbage.
`extracted` may be relatively huge. I performed `strings -awe S -n 4096` on my system drive, which is about 477 GiB; the output was over 10 GiB. Some filtering is advised, e.g. `grep -av '[[:lower:]][[:upper:]]'` is a reasonable filter (but note it will also filter out lines containing `kHz`, `kB`, `macOS` or `MacGyver`); in my case it reduced 10+ GiB to 6 GiB.
I note your entire image is about 850 MB, not that huge. Your `extracted` won't be bigger. It may still be too big for "manual" inspection, though, even after filtering. Eventually you will probably need a good text editor or pager (one capable of handling large text files) to interactively search for strings you know were in the file you want to recover. This way you will hopefully locate the relevant fragments.
Consider copying `extracted` to `/dev/shm` (or use `vmtouch -l`) to speed up your work.
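A minimal sketch of the `/dev/shm` idea, using a stand-in file (since `extracted` is whatever your own run produced):

```shell
# /dev/shm is a tmpfs, i.e. RAM-backed: repeated greps and pager
# searches over a file stored there avoid disk I/O entirely.
printf 'some recovered text\n' > /tmp/extracted.demo   # stand-in for extracted
cp /tmp/extracted.demo /dev/shm/extracted.demo
grep -c 'recovered' /dev/shm/extracted.demo
# prints: 1
```

`vmtouch -l extracted` achieves a similar effect by locking the file's pages into the page cache in place, without making a copy (note `vmtouch` is usually a separate package).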