I'm trying to scrape this web page because there is no other way to automatically be alerted when its contents change:

https://airsdk.harman.com/runtime

Downloading the page with cURL works fine (after which its contents can be parsed), but Invoke-WebRequest and the DownloadFile/DownloadString methods of System.Net.WebClient all fail with an error saying that the web server returned a 404.
Checking in Chrome confirms that the page always responds with a 404, but also returns content, which is what I want.

Using PowerShell 5.1, is there a way to instruct Invoke-WebRequest to ignore the spurious 404 error, or some method by which I can get the response data regardless?

1 Answer

In PowerShell 7, Invoke-WebRequest has a -SkipHttpErrorCheck switch that makes it behave the way you want in your use case.

Invoke-WebRequest https://airsdk.harman.com/runtime -SkipHttpErrorCheck -OutFile C:\install\test.html
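
If you would rather keep the page in a variable than write it to a file, a minimal sketch along these lines should work in PowerShell 7. The function name Get-PageIgnoringStatus is made up for illustration; only -SkipHttpErrorCheck and the StatusCode and Content properties come from Invoke-WebRequest itself.

```powershell
# Hypothetical helper (the name is an assumption); requires PowerShell 7 or later.
function Get-PageIgnoringStatus {
    param([Parameter(Mandatory)][string]$Uri)

    # -SkipHttpErrorCheck returns the response object even for 4xx/5xx statuses
    # instead of throwing a terminating error.
    $response = Invoke-WebRequest -Uri $Uri -SkipHttpErrorCheck

    # Return both the status and the body so the caller can decide what to do.
    [pscustomobject]@{
        StatusCode = $response.StatusCode
        Content    = $response.Content
    }
}
```

Calling Get-PageIgnoringStatus -Uri 'https://airsdk.harman.com/runtime' and reading the Content property of the result gives you the HTML to parse, while StatusCode lets you see that the server still reported an error.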

In PowerShell 5.1, use curl.exe. If you're on Windows 10 v1803 or later, curl.exe ships with the OS; on an earlier version you need to download it manually.

curl.exe https://airsdk.harman.com/runtime --output C:\install\abc.html

Remember to specify the .exe, because in PowerShell 5.1 curl without it is just an alias for Invoke-WebRequest.

If you don't want to use curl.exe, the remaining option is to wrap the call in try/catch and access the response through the exception. That does not download the page to a file directly, and the response object exposes less information than you would probably like.

Try { 
    Invoke-WebRequest https://airsdk.harman.com/runtime -ErrorAction Stop 
} Catch { 
    $_.Exception.Response 
}


IsMutuallyAuthenticated : False
Cookies                 : {}
Headers                 : {Connection, Vary, X-Content-Type-Options, X-XSS-Protection...}
SupportsHeaders         : True
ContentLength           : 1123
ContentEncoding         :
ContentType             : text/html;charset=UTF-8
CharacterSet            : UTF-8
Server                  :
LastModified            : 08.10.2021 19:01:01
StatusCode              : NotFound
StatusDescription       :
ProtocolVersion         : 1.1
ResponseUri             : https://airsdk.harman.com/runtime
Method                  : GET
IsFromCache             : False
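
For what it's worth, in PowerShell 5.1 the response object inside the exception usually still carries the body, so it can often be read from its stream. The following is a sketch, not a guaranteed recipe: the function names Read-StreamText and Get-PageDespiteHttpError are invented here, and it assumes the error is a System.Net.WebException whose response stream can be rewound (Invoke-WebRequest may already have read it once).

```powershell
# Hypothetical helper: read a stream from the start and return its text.
function Read-StreamText {
    param([Parameter(Mandatory)][System.IO.Stream]$Stream)

    # Rewind if possible, because the stream may already have been consumed.
    if ($Stream.CanSeek) { $Stream.Position = 0 }

    $reader = New-Object System.IO.StreamReader($Stream)
    try { $reader.ReadToEnd() } finally { $reader.Dispose() }
}

# Hypothetical helper for Windows PowerShell 5.1: return the body even when
# the server answers with an error status such as 404.
function Get-PageDespiteHttpError {
    param([Parameter(Mandatory)][string]$Uri)

    try {
        (Invoke-WebRequest -Uri $Uri -UseBasicParsing -ErrorAction Stop).Content
    } catch [System.Net.WebException] {
        $response = $_.Exception.Response
        # No response at all (DNS failure, timeout, ...): rethrow the error.
        if ($null -eq $response) { throw }
        Read-StreamText -Stream $response.GetResponseStream()
    }
}
```

With that in place, something like Get-PageDespiteHttpError -Uri 'https://airsdk.harman.com/runtime' | Set-Content C:\install\test.html would save the body to a file. If the stream turns out not to be seekable on your system, the result may be empty, in which case curl.exe remains the safer choice.
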
  • I had no idea curl.exe shipped with Windows! That's fine, I'll just use that. I didn't want to include it as a dependency but if it's there already, and it is, I'll leverage it instead. Thanks!
    – seagull
    Commented Oct 8, 2021 at 21:58
  • @seagull you're welcome! If it answered your question satisfactorily, please consider hitting that checkmark icon to mark the question as solved :).
    – SimonS
    Commented Oct 8, 2021 at 22:22
