
Data recovery from file collections

I am presently working as a summer intern.

My first objective is, given a collection of files, to recover the files and identify the file types present in it.
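To make the identification step concrete, here is a minimal sketch that labels files by their leading "magic" bytes rather than by extension. The signature table is a small illustrative subset, and the names `identify_bytes` and `identify_collection` are hypothetical helpers, not part of any forensic toolkit.

```python
from pathlib import Path

# Leading "magic" bytes for a few common formats.
# Illustrative subset only; a real signature database is far larger.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip-container"),           # docx/xlsx/odt are ZIP archives
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy doc/xls/ppt
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF8", "gif"),
]


def identify_bytes(head: bytes) -> str:
    """Return a coarse type label for the first bytes of a file."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    # HTML has no fixed binary signature, so fall back to a text check.
    stripped = head.lstrip().lower()
    if stripped.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


def identify_collection(root) -> dict:
    """Walk a directory tree and count files per detected type."""
    counts = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            with path.open("rb") as fh:
                label = identify_bytes(fh.read(64))
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Calling `identify_collection("some_directory")` would return a dictionary mapping each label to a file count. Checking content instead of extensions matters here because recovered files often have missing or wrong names.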

For sample pseudo-forensic data I have been using Digital Corpora.

To recover files I have been using The Sleuth Kit, but it does not seem to do the job: I always get error messages about the format of the data being used.

It would be great to get suggestions, links to tutorials, or software that helps with recovering data.

I am also looking for links to download pseudo-forensic data similar to the one above.

Specifically, I would like to recover all data that is in text-bearing formats (Word, PDF, emails, HTML, etc.), unify it into one single plain-text file, and then use natural language processing to determine the places the person was associated with. I have some ideas for the natural language processing aspect of the problem but need help with the data recovery. How is this task best done?
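The unification step described above could look roughly like the sketch below. It uses only the Python standard library, so it handles plain text and HTML only; Word and PDF files would need a separate extraction library and are deliberately left out. The function names `html_to_text` and `unify`, and the `=====` separator format, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(markup: str) -> str:
    """Strip tags from an HTML string and return the remaining text."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def unify(root, out_path) -> int:
    """Append the text of every .txt/.html/.htm file under root to out_path."""
    out_path = Path(out_path)
    written = 0
    with out_path.open("w", encoding="utf-8") as out:
        for path in sorted(Path(root).rglob("*")):
            # Skip directories and the output file itself.
            if not path.is_file() or path.resolve() == out_path.resolve():
                continue
            suffix = path.suffix.lower()
            if suffix not in (".txt", ".html", ".htm"):
                continue
            raw = path.read_text(encoding="utf-8", errors="replace")
            text = html_to_text(raw) if suffix in (".html", ".htm") else raw
            # Keep the source path so later NLP results can be traced back.
            out.write(f"===== {path} =====\n{text}\n\n")
            written += 1
    return written
```

Keeping a per-file header in the unified output is a design choice: it lets any place name found later be attributed to the document it came from.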