0

On my Windows 10 system, if I search for a common word in a folder with many documents, I get a long list of results. I would like to be able to sort the results based on the number of hits in each document. Looking through the list of available columns I see Relevance as the only one that might be related but the result in my case is the same for all items, 890. Nothing else in the list seems applicable.

It this possible? Does anyone know if Windows Search even has this metric stored internally? I'd even be willing to code something in vb.net to get to it.

Thanks.

2 Answers 2

1

This feature is not built into Windows Search, though it might have that frequency data stored internally in Extensible Storage Engine format. The data is located in C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Windows.edb, and there are auxiliary indexes in subfolders of Search.

Though Nirsoft's free tool, ESEDatabaseView, can be used to view that type of file, Windows.edb is locked by the Windows Search Engine, when in use, so the best you can do with that data is to stop indexing and Search to unlock it, leaving you with out-of-date data.

Rather than reinventing Search, you might be better served by a more configurable engine, such as free DocFetcher. That tool, for example, enables "fuzzy" searches and weighting searches by proximity of words or importance of a key term. For example, you might search for the word "Ahab", in proximity to "whale".

DocFetcher Search Results

Since DocFetcher reports the "Score" of each item, you have insight into the closeness of the match, frequency in the document, etc.

5
  • A very good candidate to be marked as the answer. I'll check it out. Commented Sep 18, 2021 at 7:24
  • Got the portable version and run my search. The top two documents with equal score had a difference in the word frequency by 11 (29 vs 40 hits). Maybe non English search confuses it. Still neat though. I might reach out to developers. Commented Sep 18, 2021 at 8:14
  • Developers pointed me to the FAQ which explains how the score is calculated which is close but quite right for my needs. Nonetheless the tool is very cool so I've up-voted your answer. Commented Sep 19, 2021 at 4:37
  • @DudenamedBen, Please post a reference to that Windows Search result-scoring algorithm, either in your question, or in an answer. That might be useful knowledge for the community. and I'd upvote it. Commented Sep 19, 2021 at 19:04
  • Here is a quote from DocFetcher developers regarding their scoring, the score doesn't reflect the number of hits in a document, but the number of hits in the document relative to its total length. Commented Sep 22, 2021 at 1:41
0

For those on Windows 10 and comfortable with Linux and the command line I have the following alternative solution based on https://askubuntu.com/a/1131185/1350649

  • Open Windows Subsystem for Linux (WSL) prompt

  • Install tools if missing sudo apt install unoconv

  • Run the following to get a sorted list based on word frequency

    for i in *.odt ; do R=`unoconv --stdout -f text $i | grep -w -o "word" | wc -l`; echo $R $i; done | sort -n

This isn't fast but it gets the job done.

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .