5

I have a Windows 10 PC and one of my hard discs broke completely (as in not recognized by the BIOS, makes funny noises and cannot be used). It was not the system disc but contained mostly private data. I don't exactly remember what was on the disc and that is the problem. In order to evaluate the loss (and to evaluate if further data recovery procedures on the hard disc are worth the price) I would like to get all possible clues about what was on the disc (file names, possibly with folder names would suffice).

I thought that maybe Windows search with all its file indexing might have some information about what was on the disc stored somewhere and that information, even if only partial, could be retrieved somehow.

Is it possible to access the Windows search database and retrieve a list of files with paths from the lost drive and if so how?

Please note that this question is not about data recovery of the broken hard disc, but about what information Windows keeps about files that were on hard drives (but aren't at the present). The Windows file search index seems like a potential place where such information could be stored.

7
  • 1
    Very interesting question. I have never considered using the disk indexing to recover names of files (or even CRCs?) that were no longer available. I think the information in the search indexing database thingamajigger probably has what you want but Microsoft doesn't publish this information. Everyone below is smart enough but isn't actually trying to answer your specific question. I too can read a FAT file table entry but that isn't what you are asking. Commented Jan 21, 2021 at 20:47
  • 1
    You state it's, "not accessible at all". If you cannot access the drive, you can't recover any data from it. If this is not the system drive, you might try to access the Search index at "C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb" (default), but if that's the drive gone bad, you're out of luck. See eprints.whiterose.ac.uk/75046/1/… for reading the edb file. Commented Jan 22, 2021 at 2:43
  • @DrMoishePippik This question is not about data recovery as such. Thanks for the link to the Windows.edb file. I have difficulties accessing it. Windows won't let me because "Search is using it". Do you know in what format the data is stored in that file? Commented Jan 22, 2021 at 8:17
  • I see that the second link could be helpful. I will search for software reading and analyzing the Windows.edb file (it's 1 GB in size). If it's not too difficult I might even try to create a Python script to analyze the file myself. Commented Jan 22, 2021 at 8:25
  • Boot from another drive to access a file that would be locked by Windows, or use the Volume Shadow Copy Service to do so, as do drive imaging software. Commented Jan 22, 2021 at 17:01

3 Answers

1

With the help of a comment of DrMoishe Pippik I succeeded in accessing the Windows search database.

The search content is stored by default at

C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb

It's quite a big file (1 GB in my case) and typically cannot be copied somewhere else while Windows Search is using it. To release it, open Task Manager, look for "Microsoft Windows Search Indexer" and end the "Windows Search" process. Then copy the Windows.edb file somewhere safe.
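The copy step can also be scripted. This is only a minimal Python sketch: it assumes the default path given above and that Windows Search has already been stopped. The function name and the destination folder are made up for illustration.

```python
import shutil
from pathlib import Path

# Default location of the search index, as given above.
DEFAULT_EDB = Path(r"C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb")


def copy_search_index(dest_dir, src=DEFAULT_EDB):
    """Copy the search database into dest_dir and return the new path.

    Hypothetical helper: it assumes Windows Search has been stopped first,
    otherwise the file is normally locked and the copy fails.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    try:
        shutil.copy2(src, target)
    except PermissionError as exc:
        raise RuntimeError(
            f"{src} is locked or not readable; stop Windows Search "
            "(or run with administrator rights) and try again"
        ) from exc
    return target
```

Calling `copy_search_index(r"D:\edb_backup")` would then place a copy in that (hypothetical) folder, and all further analysis can work on the copy instead of the live file.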

Information about the file format of the search database can be found, for example, in this document by Howard Chivers, who uses the wdsCarve software, which does not seem to be available for download. There are also this and this article by Joachim Metz. Joachim Metz also seems to be the main contributor to libesedb on GitHub, which however has no binary releases for Windows and is marked experimental.

In short, the Windows search database seems to be based on the Extensible Storage Engine (ESE) Database File (EDB) format, a proprietary, undocumented Windows file format with additional obfuscation and compression.

Finally, I found a project by Jeonghyeon Kim from 2018 called WinSearchDBAnalyzer, with sources (additional dependency: WinForms) on GitHub. According to the blog it's free (to use) and there are binaries for Windows available. Additionally, with Microsoft's Visual Studio Community edition I could easily build the program myself.

Usage is straightforward: select a Windows.edb file location and then check some flags for what to search. It took a while (~5 minutes) and then presented about 100k entries in a table. Sorting by file location is simple, and metadata is shown for each file.

However, coming back to my original intent, the number of files from the lost hard drive still present in the Windows search database was disappointingly small. Of the hundreds of thousands of files on that hard drive, at most a tenth was contained (and those were the things I still remembered anyway), so it was much less helpful in the end than I had hoped. Still, it's a viable way to access at least some meta information about the content of hard drives that are no longer present and accessible.
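To judge how much of a lost drive survives in the index, the entries can be grouped by folder. This is a minimal sketch, assuming the paths have somehow been brought into a plain list of strings (one full path per entry). The export step, the drive letter `E:` and the function name are assumptions for illustration, not features of WinSearchDBAnalyzer that I have verified.

```python
from collections import Counter
from pathlib import PureWindowsPath


def summarize_drive(paths, drive="E:", depth=1):
    """Count indexed entries per folder for one drive letter.

    paths: iterable of full Windows paths (strings).
    drive: drive letter of the lost disk (assumed, adjust as needed).
    depth: how many folder levels below the drive root to group by.
    """
    counts = Counter()
    for raw in paths:
        p = PureWindowsPath(raw)
        if p.drive.upper() != drive.upper():
            continue  # entry belongs to another drive
        # parts[0] is the root such as "E:\", the last part is the file name
        folders = p.parts[1:-1][:depth]
        key = "\\".join(folders) if folders else "(drive root)"
        counts[key] += 1
    return counts
```

Printing `summarize_drive(paths).most_common()` gives a quick overview of which top-level folders of the lost drive left traces in the index and how many entries each one has.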

0

This answer relates to Trilarion's answer to his own question.

The interesting document linked by Dr. Moishe Pippik describes the scope of indexing. That could explain the low percentage of files indexed.

Furthermore you might test what happens if you present a file with random content to the indexer. Will this file appear in the database or not?
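Such a probe file could be created with a few lines of Python. This is just a sketch; the file name and size are arbitrary choices. Put the file into a folder that is covered by indexing, give the indexer some time, and then check whether the name shows up in a copy of Windows.edb.

```python
import os
from pathlib import Path


def write_random_file(folder, name="indexer_probe.bin", size=4096):
    """Create a file of random bytes, so it is very unlikely to match any known format."""
    path = Path(folder) / name
    path.write_bytes(os.urandom(size))
    return path
```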

An unknown file format cannot be read by the indexer. As far as I know, in older Windows versions the indexer could be extended with compiled code that enabled it to read a file format that one had created oneself.

It may be that files the indexer technically cannot read do not appear in the index at all.

-1

First thing is to be very precise and detail-oriented. You said "as in not accessible at all". Is that regular English grammar? Next thing is that "not accessible at all" is not precise. Many would say that once the drive letter is still there and the content is marked as raw their drive is "not accessible at all".

My definition would be that if the drive powers up with no abnormality but any read attempt issued by the computer to the drive fails it has to be considered as "not accessible at all".

This is a totally different understanding of the same expression.

The first things you can do are very simple. Preferably using a Linux machine, connect the drive and verify its presence with the lsblk command. Then generate a log file using smartmontools and analyze it. Next, if the log file content does not speak against it, try to duplicate the drive using ddrescue and its log file feature. That will provide you with a hopefully mostly complete duplicate on a healthy drive and a list of the areas that could not be copied.
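The steps above can be written down as a small Python sketch that only assembles the command lines. The device name `/dev/sdX` and the file names are placeholders, and the ddrescue options shown (`-d` for direct disc access, `-r3` for three retry passes) are one common choice, not the only sensible one.

```python
import shlex


def rescue_commands(device="/dev/sdX", image="rescue.img", mapfile="rescue.map"):
    """Return the commands for the steps above as argument lists.

    device, image and mapfile are placeholders; pick the device only
    after checking the lsblk output.
    """
    return [
        # Is the drive listed at all?
        ["lsblk"],
        # Print all SMART information; redirect it to a file to keep a log.
        ["smartctl", "-a", device],
        # Copy the drive to an image; the mapfile records which areas failed.
        ["ddrescue", "-d", "-r3", device, image, mapfile],
    ]


# Print the commands so they can be reviewed before running them by hand.
for cmd in rescue_commands():
    print(shlex.join(cmd))
```

Printing instead of executing is deliberate here: with a failing drive it is worth reading every command, and especially the device name, before running anything as root.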

If you are paranoid, quickly duplicate the duplicate and work on the second copy. Depending on your budget you would run different recovery products against the second copy. A free and open source file-carving program like Photorec, which recognizes files by their signatures, produces results without metadata such as directory and file structures, but its usable output would help you remember what you had stored on the drive.

This analysis can be done without harm on a Windows machine too, using your second copy. If the information gathered is not sufficient, the real work starts.

You should first learn how the most commonly used partition schemes work, such as the old Intel/MBR one, and learn GPT as well. The free and open source software Testdisk is a nice tool to support you.

Knowing which file system you used when formatting the drive, you would then need to learn that file system's format, for instance NTFS. As NTFS is a bit difficult to start with, you would rather practice with FAT file systems. Once you are familiar with them, you can give it a try by overwriting part or all of the FAT (file allocation table(s)) of a FAT file system. Then a little bit of programming would be required to search for the remains. You should then learn how defragmentation affects your success.
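As a starting point for that kind of practice, here is a minimal sketch of how a cluster chain is followed in an intact FAT16 table. It assumes the raw bytes of one FAT copy have already been read from the image; locating the FAT via the boot sector is left out, and the function name is made up.

```python
import struct


def fat16_chain(fat, start_cluster):
    """Follow a FAT16 cluster chain and return the cluster numbers in order.

    fat: raw bytes of one FAT copy (2 bytes per entry, little-endian).
    """
    chain = []
    seen = set()
    entries = len(fat) // 2
    cluster = start_cluster
    # Clusters 0 and 1 are reserved; stop on out-of-range values and on loops.
    while 2 <= cluster < entries and cluster not in seen:
        chain.append(cluster)
        seen.add(cluster)
        (nxt,) = struct.unpack_from("<H", fat, cluster * 2)
        if nxt >= 0xFFF7:
            break  # 0xFFF7 = bad cluster, 0xFFF8..0xFFFF = end of chain
        cluster = nxt
    return chain
```

A fragmented file shows up as a chain that is not consecutive, for example `[2, 5, 3]`. Once the FAT itself is overwritten this information is gone, and the clusters have to be matched by their content instead.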

After understanding a simple file system implementation you would move on to (presumably) NTFS. Your only advantage over a recovery program is that you can use what you still remember of your drive's content.

That can yield a success but there is no guarantee.

In 2001 I examined a failed XP system drive that still used a FAT file system. The only thing I recovered (that mattered at all) was the invitation list (an xls file) for his wedding, with the invitations already sent out. There was no way to reconstruct the file automatically; I had to look through the clusters that followed. As it was an xls file, it was not compressed. I could simply see where the second cluster of the file was, and it was not the next one in linear order. Today I would never succeed at that task because of the compression used in Excel xlsx files.

1
  • Sorry for the confusion about "not accessible at all". That was my way of saying that the drive should be excluded from the question. I meant that the computer (the BIOS) doesn't recognize it and no data can be retrieved from it. That's also not what I aim at in this question. Commented Jan 22, 2021 at 8:20
