I have a list of several thousand URLs, and I'd like to search each of these pages for a given word. How can I do this programmatically on Windows, preferably using VBScript or PowerShell?

2 Answers

Edit: The original question didn't specify VBScript & Powershell. I'm leaving this Python suggestion in hopes that someone in the future will benefit.

What is the quickest way to do this programmatically on Windows? I guess 'quickest' is a function of your abilities.

With my skills, I would whip up a Python script for that, as that would be the quickest way for me. The script, as I would write it, would look something like this:

import urllib.request

search_string = ""                 # String you're searching for
sites_with_str = []                # URLs whose pages contain search_string

with open(r"c:\sites.txt") as url_file:   # 'with' closes the read handle for us
    for line in url_file:
        site = line.strip()
        if not site:
            continue               # Skip blank lines
        try:
            with urllib.request.urlopen(site, timeout=30) as response:
                html = response.read().decode("utf-8", errors="replace")
        except (OSError, ValueError) as err:
            print("Failed: " + site + " (" + str(err) + ")")
            continue
        if search_string in html:
            sites_with_str.append(site)

# Print out the sites with the search string in them
print("\n\nSites Containing Search String \"" + search_string + "\":")
for site in sites_with_str:
    print(site)

That's Python 3 using only the standard library: urllib.request grabs each page, and pages are decoded as UTF-8 with bad bytes replaced, which is a simplification. And obviously it'd require a little recursive function and some HTML parsing if you wanted to search all pages within each site referenced in the input file.
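
For the "search all pages within each site" case, one possible starting point is pulling the links out of a fetched page so they can be queued up in turn. This is only a minimal sketch, assuming Python 3's standard html.parser and urllib.parse modules; the names LinkCollector and same_site_links are made up for illustration, and a real crawler would also want a set of already-visited URLs and a depth limit.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(base_url, html):
    """Return absolute links found in html that stay on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)   # Resolve relative links
        # Keep only links on the same host, without duplicates
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

Each URL returned by same_site_links could then be fetched and searched the same way as the URLs read from the input file.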

  • Thanks for the suggestion. I've updated my question to indicate VBScript or Powershell. Commented Jul 12, 2011 at 16:34
  • @Mark -- cries Commented Jul 12, 2011 at 16:42
  • Yes, I'm crying too, not having access to a real OS ;) Commented Jul 12, 2011 at 16:48
  • @Mark and you're being forced to not use Python?? What a sadistic situation my friend :P Commented Jul 12, 2011 at 16:48

I solved my own problem, in case anyone else faces the same requirement:

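# One WebClient is reused for every request
$webClient = new-object System.Net.WebClient
$webClient.Headers.Add("user-agent", "PowerShell Script")

# The input file holds one URL per line
$info = get-content c:\path\to\file\urls.txt

foreach ($i in $info) {
  $output = ""

  # Time how long each page takes to download
  $startTime = get-date
  $output = $webClient.DownloadString($i)
  $endTime = get-date

  # -like is case-insensitive; put the word to search for between the wildcards
  if ($output -like "*some dirty word*") {
    "Success`t`t" + $i + "`t`t" + ($endTime - $startTime).TotalSeconds + " seconds"
  }

}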
