I am looking for a tool that will be faster than grep, maybe a multi-threaded grep, or something similar... I have been looking at a bunch of indexers, but I am not sold that I need an index...
I have about 100 million text files that I need to grep for exact string matches. When a string matches, I need the name of the file where the match was found.
i.e.: grep -r 'exact match' > filepaths.log
It's about 4TB of data. I started my first search 6 days ago and grep is still running. I have another dozen searches to go, and I can't wait 2 months to retrieve all these filenames =]
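Since a dozen more searches are still to come, one option worth considering is to look for all of the strings in a single pass over the data instead of walking the 4TB once per string. Below is a minimal sketch, assuming GNU grep; the function name `search_many` and the patterns file (one search string per line) are hypothetical names used only for illustration.

```shell
# Hypothetical helper: search a tree once for several fixed strings.
# $1 = directory to search, $2 = file with one search string per line (assumed layout).
# Prints one "path:matched string" line per distinct (file, string) pair.
search_many() {
    dir=$1
    patterns_file=$2
    # LC_ALL=C : byte-wise matching, avoids multibyte locale handling
    # -r       : recurse into the directory
    # -F       : treat every pattern as a fixed string, not a regex
    # -o       : print only the part of the line that matched
    # -H       : always prefix the output with the file name
    # -f       : read the patterns from a file
    LC_ALL=C grep -r -F -o -H -f "$patterns_file" "$dir" | LC_ALL=C sort -u
}
```

Usage would look like `search_many /data patterns.txt > filepaths.log`. The trade-off is that `-o` makes grep read each file to the end rather than stopping at the first hit, so whether this beats a dozen separate runs depends on how large the files are. Paths that contain a colon would also make the `path:string` output ambiguous.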
I've reviewed the following. However, I don't think I need all the bells and whistles these indexers come with; I just need the filename where the match occurred...
- dtSearch
- Terrier
- Lucene
- Xapian
- Recoll
- Sphinx
After spending hours reading about all those engines, my head is spinning, and I wish I just had a multi-threaded grep lol. Any ideas and/or suggestions are greatly appreciated!
PS: I am running CentOS 6.5
EDIT: Searching for multi-threaded grep returns several items. My question is: is a multi-threaded grep the best option for what I am doing?
EDIT2: After some tweaking, this is what I have come up with, and it is going much faster than the regular grep. I still wish it was faster though... I am watching my disk I/O wait, and it's not building up yet. I may do some more tweaking, and I'm definitely still interested in any suggestions =]
find . -type f -print0 | xargs -0 -n10 -P4 grep -m 1 -H -l 'search string'
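A possible further tweak of the command above, as a minimal sketch assuming GNU find, xargs and grep. Because the match is an exact string, `-F` lets grep skip regex handling. `-l` already prints only the file name and stops reading a file at its first match, so `-m 1` and `-H` can be dropped. The function name `list_matching_files`, the batch size of 1000 and the default of 4 jobs are illustrative guesses to tune, not measured values.

```shell
# Hypothetical helper: list files under a directory that contain a fixed string.
# $1 = directory, $2 = exact string to look for, $3 = parallel grep jobs (default 4).
list_matching_files() {
    dir=$1
    needle=$2
    jobs=${3:-4}
    # -print0 / -0 : NUL-separated names, safe for spaces and newlines in paths
    # -n 1000      : pass up to 1000 files to each grep, fewer process start-ups than -n10
    # -P "$jobs"   : run that many grep processes at once
    # LC_ALL=C     : byte-wise matching, inherited by every grep that xargs starts
    # -F           : fixed string, not a regex
    # -l           : print only the file name and stop at the first match in that file
    # -e           : keeps a needle starting with '-' from being read as an option
    find "$dir" -type f -print0 |
        LC_ALL=C xargs -0 -n 1000 -P "$jobs" grep -F -l -e "$needle"
}
```

Usage would look like `list_matching_files . 'search string' 4 > filepaths.log`. With `-P`, several grep processes write to the same output at once, so it seems prudent to sort and sanity-check the resulting list afterwards. If the disk I/O wait starts building up, adding more jobs is unlikely to help.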