
I have a list of 307 web-page URLs. Is it possible to tell Google (or another search engine) to search a specific keyword EXACTLY only on those 307 websites?

I have read some advice suggesting that I cycle through the list with a script or similar and perform a Google search for each list item. But wouldn't that make Google think I am a bot and block the searches?

Another suggested method was to download those 307 web pages with wget (perhaps using a script) or similar and then search the downloaded pages locally. But that would take a long time, while a Google search would be almost instantaneous.

Doesn't Google have a built-in method to search from a list of URLs?

  • Fetching the pages and grepping them should be pretty fast; can't you spare a few seconds? Google can search for an exact keyword if the search term is inside quotes, and can limit the search to specific websites with the site: option.
    – simlev
    Commented Jul 26, 2017 at 9:14
  • A common approach when performing multiple Google searches through a script is to insert (possibly random) delays in order not to look suspicious.
    – simlev
    Commented Jul 26, 2017 at 9:29

1 Answer


Several questions are being asked here; I'll try to answer them all in order.

Is it possible to tell Google (or another search engine) to search a specific keyword EXACTLY only on those 307 websites?

Have a look at Google's search options. You can look for an exact term by quoting it "like this". You can then filter by domain (not the same as URL!) with the site: operator, combining multiple domains with OR. In your case, you would build the search string in this format:

"keyword" site:site1.com site:site2.com ...site:site307.com

...cycle through the list with a script... But wouldn't Google think you are a bot?

A common approach when performing multiple Google searches through a script is to insert (possibly random) delays in order not to look suspicious.
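
As a rough illustration only (this is not a supported interface: automated querying is against Google's terms of service, the result-page markup changes, and you may get captcha-blocked regardless), such a loop might look like the following bash sketch, assuming list.txt holds one domain per line:

while read -r site; do
    # %22keyword%22 is the URL-encoded quoted search term; a browser-like User-Agent avoids the simplest bot checks
    curl -s -A 'Mozilla/5.0' "https://www.google.com/search?q=%22keyword%22+site:${site}" > "result_${site}.html"
    sleep "$((5 + RANDOM % 10))"    # random 5-14 second pause between queries
done < list.txt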


...download those 307 webpages with wget... That would take a long time.

If you need to check those websites every now and then, you could consider saving a local copy of the web pages and periodically refreshing it; a local search should then be very fast. Otherwise, Google is going to be faster because it works with cached results and doesn't have to wait for connection and download. But, unless a website is down or experiencing serious problems, it should be all over in 30 seconds or so. Supposing you have a list of URLs in a file called list.txt, you would just have to run:

cat "list.txt" | parallel 'wget -q -O - {} | grep keyword' to see matching contents or:
cat "list.txt" | parallel 'if wget -q -O - {} | grep -q keyword; then echo {}; fi' for urls or: cat "list.txt" | parallel 'if wget -q -O - {} | grep keyword; then echo {}; fi' to show both.


Doesn't Google have a built-in method to search from a list of URLs?

Yes, there's Custom Search:

With Google Custom Search, you can:
- Create custom search engines that search across a specified collection of sites or pages
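
Once you've created a custom engine containing your 307 sites, you can also query it programmatically through the Custom Search JSON API. A minimal sketch, where YOUR_API_KEY and YOUR_CX are placeholders for the credentials Google issues you:

curl -s "https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_CX&q=%22keyword%22"

The response is JSON listing the matching pages from your custom engine.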

