Recover txt files based on known strings

Question

I have lost a number of txt files, which contain important personal information. I accidentally deleted them from the hard disk. I am not sure which folder they were in. I am not sure what filenames they had (at least not all of them), but I know some keywords that are likely to be in them. For example, I know most of them contain the string diary (you can guess why these files are important to me).

As far as I can understand, I can't use file carving tools like Scalpel, since they rely on identifying files based on their headers and footers, but txt files have neither.

So I guess my only option is to search for these known strings in the raw dump.

I have a dump of the FAT32 partition, a 150GB img file, created with dd.

As far as I understand, FAT32 uses clusters of 4K. So any file smaller than 4K, which is the case for most of the txt files I am looking for, will be in one cluster. Some of them will span two or more clusters, perhaps contiguous, perhaps not.

So I think I need a tool that can go through each cluster on the image and grep for a list of keywords. If a cluster contains a match, it should be copied to a file, maybe just cluster001.txt, cluster002.txt, etc. Then I can manually piece these clusters together.

Do my reasoning and ideas make sense?

What tools can I use to achieve this?

Is it possible to consider mounting the image using the loopback system in Linux (something like: mount -o loop,ro image.img /mnt)? Then you could use the standard tools to search the filesystem. — carveone, Commented Dec 30, 2013 at 13:41
Why can't you use normal file recovery software to determine if the data even still exists on the partition? — Ramhound, Commented Dec 30, 2013 at 13:59
I don't know what 'normal' file recovery software is? TestDisk only lets me recover files if I know their location. Scalpel uses file carving, which I believe is not possible for txt files. — Mads Skjern, Commented Dec 30, 2013 at 20:37

Dennis · Accepted Answer · 2013-12-30 21:51:24Z

I don't know of any file recovery tool that selects files based on a specific string they contain, but these three methods should work:

When a file on a FAT32 partition is erased, its filename doesn't get overwritten. Only the first byte of the 8.3 filename gets set to E5, marking the file as deleted. This won't affect the extension, so TXT files are still easily recognizable.

You can use any file recovery tool that lets you specify an extension (e.g., Recuva), recover all TXT files and then search for diary in all recovered files.

Since text files are (usually) small, recovering the text files shouldn't take much time (probably less than finding them). For a 150 GB partition, this should be rather quick.
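
The deleted-entry marker described above can also be looked for directly in the dump. This is only a minimal sketch, assuming GNU grep; the helper name find_deleted_txt is made up for illustration. It prints the byte offsets of sequences that look like the start of a deleted 8.3 directory entry (first byte 0xE5, seven more printable name bytes, extension TXT). The same byte pattern can occur inside ordinary file data, so expect false positives.

```shell
#!/bin/sh
# find_deleted_txt IMAGE (hypothetical helper): print the byte offset of every
# sequence that looks like a deleted 8.3 directory entry with extension TXT.
find_deleted_txt() {
    # 0xE5 (octal 345) replaces the first byte of a deleted file name.
    marker=$(printf '\345')
    # [ -~] matches one printable ASCII byte; the name field has 7 more bytes
    # before the 3-byte extension. LC_ALL=C makes grep work on raw bytes.
    LC_ALL=C grep -abo "${marker}[ -~]\{7\}TXT" "$1" | cut -d: -f1
}
```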

Programs like PhotoRec identify files by their content and attempt to recover them. While it is true that text files don't have any headers, PhotoRec still manages to identify them (by exclusion, I suppose).

Again, you could recover all text files and then search for diary in all recovered files.
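
Searching the recovered files can be done with a recursive grep. A small sketch, assuming the recovery tool wrote its output into one directory and that you keep your keywords in a text file, one per line; the function name list_matches and both arguments are placeholders, not something the recovery tools provide.

```shell
#!/bin/sh
# list_matches DIR KEYWORDS_FILE (hypothetical helper): print the files under
# DIR that contain at least one of the keywords, ignoring case.
list_matches() {
    # -r recurse, -l print only file names, -i ignore case,
    # -F treat keywords as fixed strings, -f read them from a file.
    # Note: an empty line in KEYWORDS_FILE would match every file.
    grep -rliF -f "$2" -- "$1" | sort
}
```

For example, list_matches recovered/ keywords.txt would list every recovered file mentioning any keyword, so only those need to be read by hand.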

Identifying text files by their content will take longer than by their extension, but it will also find files whose directory entry has been overwritten.

Since you don't expect the text files to be big, you could also search for diary in the partition dump and recover the cluster containing it:
```
sudo bash -c '
    IMG="<imgfile>"   # the partition dump, or a device such as /dev/sda3
    for OFFSET in $(grep -Pabio diary "$IMG" | cut -d: -f 1); do
        ((CLUSTER = OFFSET / 4096))   # byte offset -> index of the 4096-byte cluster
        dd if="$IMG" of=cluster$CLUSTER.txt bs=4096 skip=$CLUSTER count=1
    done
'
```
How it works:
- grep -Pabio diary "$IMG" | cut -d: -f 1 prints the byte offset of every occurrence of the string diary in the image file.
  
  The -a switch makes grep treat the binary image as text, -b prints the byte offset of each match, and -o prints only the matched string rather than the whole line. The -i switch makes the search case-insensitive. The -P switch turns on Perl-compatible Regular Expressions. This is needed because of a bug in some versions of (GNU) grep that makes case-insensitive searches unbearably slow unless you use PCRE.
- ((CLUSTER = OFFSET / 4096)) calculates the offset in clusters from the offset in bytes.
- dd if="$IMG" of=cluster$CLUSTER.txt bs=4096 skip=$CLUSTER count=1 writes the cluster at offset X to a file named clusterX.txt.
By its nature, this will work only for files that fit in one cluster. You can increase count to recover more than one cluster and decrease CLUSTER to recover previous clusters as well.
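
The 4096 used above is the cluster size assumed in the question; the real value is recorded in the FAT32 boot sector at the start of the partition dump. The following is a sketch, assuming the image begins with a standard FAT32 boot sector. The function names read_le and fat32_geometry are made up for illustration; the field offsets are the usual BPB ones (bytes per sector at 11, sectors per cluster at 13, reserved sectors at 14, number of FATs at 16, sectors per FAT at 36).

```shell
#!/bin/sh
# read_le IMAGE OFFSET NBYTES: print the little-endian unsigned integer
# stored in NBYTES bytes at byte OFFSET of IMAGE.
read_le() {
    val=0
    mult=1
    for b in $(od -An -v -tu1 -j"$2" -N"$3" "$1"); do
        val=$((val + b * mult))
        mult=$((mult * 256))
    done
    echo "$val"
}

# fat32_geometry IMAGE: print the cluster size and the byte offset at which
# the data region (the clusters holding file contents) begins.
fat32_geometry() {
    bps=$(read_le "$1" 11 2)     # bytes per sector
    spc=$(read_le "$1" 13 1)     # sectors per cluster
    rsvd=$(read_le "$1" 14 2)    # reserved sectors before the first FAT
    nfats=$(read_le "$1" 16 1)   # number of FAT copies
    fatsz=$(read_le "$1" 36 4)   # sectors per FAT (FAT32 field)
    echo "cluster_size=$((bps * spc)) data_start=$(((rsvd + nfats * fatsz) * bps))"
}
```

If cluster_size is not 4096, use the reported value for bs and in the division. If data_start is not a multiple of cluster_size, the blocks that dd counts from the start of the image will not line up with real clusters, so a short file may end up split across two output files.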

To recover three clusters (one before and one after the cluster containing diary), make the following changes:
```
((CLUSTER = OFFSET / 4096 - 1))

dd ... count=3
```
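
Since the question mentions a list of keywords rather than a single one, the search step can be generalized. This is a hedged sketch, not a tested recovery tool: matching_clusters and its arguments are hypothetical names, the keywords are assumed to be in a file with one per line, and GNU grep is assumed. It prints each matching cluster number once, even if several keywords hit the same cluster, so each cluster only needs to be extracted with dd once.

```shell
#!/bin/sh
# matching_clusters IMAGE KEYWORDS_FILE [CLUSTER_SIZE] (hypothetical helper):
# print, in ascending order and without duplicates, the index of every
# CLUSTER_SIZE-byte block of IMAGE that contains the start of a keyword match.
matching_clusters() {
    size=${3:-4096}   # defaults to the 4096 bytes assumed in the loop above
    # -a binary as text, -b byte offsets, -o only the match, -i ignore case,
    # -F fixed strings, -f read the keywords from a file.
    LC_ALL=C grep -aboiF -f "$2" "$1" | cut -d: -f1 |
        awk -v s="$size" '{ print int($1 / s) }' | sort -n -u
}
```

Each printed number can then be passed as skip= to the dd command shown earlier.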

Thank you for the great answer, complete and elaborate :) I won't have time to try this before tomorrow. — Mads Skjern, Commented Dec 30, 2013 at 20:46
In fact I already tried your second suggestion: recovering all files (or all txt files) with TestDisk, but I could not find a way to recover all files, except manually browsing through the system, and undeleting each one. — Mads Skjern, Commented Dec 30, 2013 at 20:47
When PhotoRec asks for a partition, you can specify which file types you want to recover in File Opt. Press the right arrow key twice, then Enter. — Dennis, Commented Dec 30, 2013 at 21:16
This worked great! Btw you may need to add z to the grep options, if you get a message saying it has run out of memory. Also, for those who don't know you can put /dev/sda3 in place of <imgfile> — guillefix, Commented Oct 29, 2017 at 13:17

GabrielB · Answer · 2019-05-25 08:09:21Z

Old question, but might be useful to someone someday...

With WinHex you can run a “simultaneous search” for several keywords or expressions across an entire partition. It then displays a list of hits and, if the file system has been analysed correctly, indicates which file each hit belongs to, even if the file has been deleted. This is not always reliable on FAT32, though; it works better on an NTFS partition.
