You are asking several questions here; I'll try to answer them all in order.

Is it possible to tell Google (or another search engine) to search a specific keyword EXACTLY only on those 307 websites?

Have a look at Google's search operators. You can search for an exact term by quoting it "like this". You can then restrict results by domain (not the same as a URL!) with the site: operator. Note that Google treats multiple site: filters as an AND (no page lives on two domains at once), so to cover a list of sites they must be joined with OR. In your case, you would build the search string in this format:

"keyword" site:site1.com site:site2.com ...site:site307.com

...cycle through the list with a script... But wouldn't Google think you are a bot?

A common approach when performing multiple Google searches through a script is to insert (possibly random) delays in order not to look suspicious.
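
As a rough illustration, a minimal sketch of that approach might look like the following. The file name queries.txt, the User-Agent string, and the result file names are all assumptions, and Google may rate-limit or block scripted queries regardless, so treat this as a sketch rather than a robust scraper:

#!/bin/bash
# Minimal sketch, assuming queries.txt holds one URL-encoded query per line.
# The random sleep spaces requests 5-15 seconds apart.
n=0
while IFS= read -r query; do
    n=$(( n + 1 ))
    wget -q -O "result_${n}.html" --user-agent="Mozilla/5.0" \
        "https://www.google.com/search?q=${query}"
    sleep $(( RANDOM % 11 + 5 ))   # random delay between 5 and 15 seconds
done < queries.txt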


...download those 307 webpages with wget... That would take a long time.

If you need to check those websites every now and then, you could consider saving a local copy of the webpages and periodically refreshing it; a local search would then be very fast. Otherwise, Google is going to be faster, because it works with cached results and doesn't have to wait for connections and downloads. But unless a website is down or experiencing serious problems, the whole run should be over in 30 seconds or so. Supposing you have a list of URLs in a file called list.txt, you would just have to run:

cat "list.txt" | parallel 'wget -q -O - {} | grep keyword' to see matching contents or:
cat "list.txt" | parallel 'if wget -q -O - {} | grep -q keyword; then echo {}; fi' for urls or: cat "list.txt" | parallel 'if wget -q -O - {} | grep keyword; then echo {}; fi' to show both.


Doesn't Google have a built-in method to search from a list of URLs?

Yes, there's Custom Search:

With Google Custom Search, you can:

  • Create custom search engines that search across a specified collection of sites or pages
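
Once such an engine is set up, it can also be queried from a script through the Custom Search JSON API. Here is a minimal sketch, where YOUR_API_KEY and YOUR_CSE_ID are placeholders for the credentials you create in the Google developer console and the Custom Search control panel (%22keyword%22 is the URL-encoded form of the quoted exact term):

wget -q -O - "https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_CSE_ID&q=%22keyword%22"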
