
I've been searching all around for a Python 3.x code sample to get HTTP Header information.

Something as simple as PHP's get_headers equivalent is hard to find in Python. Or maybe I'm just not sure how best to wrap my head around it.

In essence, I would like to write something that tells me whether a URL exists or not, something along the lines of:

h = get_headers(url)
if(h[0] == 200)
{
   print("Bingo!")
}

So far, I tried

h = http.client.HTTPResponse('http://docs.python.org/')

But I always got an error.

  • possible duplicate of Grabbing headers from webpage with python
    – DocMax
    Commented Feb 19, 2013 at 5:36
  • @DocMax: Possible, but the accepted answer there doesn't deal with response codes or exception handling.
    – johnsyweb
    Commented Feb 19, 2013 at 6:06
  • Good point. I had overlooked this subtlety. To other reviewers, please keep this in mind. I would rescind my vote now were it possible.
    – DocMax
    Commented Feb 19, 2013 at 6:27

4 Answers


To get an HTTP response code in Python 3, use the urllib.request module:

>>> import urllib.request
>>> response = urllib.request.urlopen(url)
>>> response.getcode()
200
>>> if response.getcode() == 200:
...     print('Bingo')
... 
Bingo

The returned HTTPResponse object gives you access to all of the headers as well. For example:

>>> response.getheader('Server')
'Apache/2.2.16 (Debian)'
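
If you need all of the headers rather than one at a time, the same HTTPResponse object offers getheaders(). A minimal sketch (the URL is taken from the question):

import urllib.request

response = urllib.request.urlopen('http://docs.python.org/')
# getheaders() returns a list of (name, value) tuples.
for name, value in response.getheaders():
    print('{}: {}'.format(name, value))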

If the call to urllib.request.urlopen() fails, an HTTPError exception is raised. You can handle this to get the response code:

import urllib.request
try:
    response = urllib.request.urlopen(url)
    if response.getcode() == 200:
        print('Bingo')
    else:
        print('The response code was not 200, but: {}'.format(
            response.getcode()))
except urllib.error.HTTPError as e:
    print('''An error occurred: {}
The response code was {}'''.format(e, e.getcode()))
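
Putting the pieces together, here is a sketch of a url_exists() helper in the spirit of PHP's get_headers() (the function name and the choice to treat only 200 as success are assumptions for illustration, not part of the standard library):

import urllib.request
import urllib.error

def url_exists(url):
    """Return True if the URL responds with HTTP 200, False otherwise."""
    try:
        response = urllib.request.urlopen(url)
        return response.getcode() == 200
    except urllib.error.HTTPError:
        # 4xx/5xx responses raise HTTPError instead of returning normally.
        return False
    except urllib.error.URLError:
        # Covers DNS failures, refused connections, and the like.
        return False

if url_exists('http://docs.python.org/'):
    print('Bingo!')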
  • Thank you! This is the answer. I got it after researching more based on the initial answer Ali-Akber provided. Thanks again!
    – Adib
    Commented Feb 19, 2013 at 4:35
  • So, the danger with this code is that if the page is a 404, it gives me a massive error ending with (urllib.error.HTTPError: HTTP Error 404: Not Found). What's a good way to get around that problem?
    – Adib
    Commented Feb 19, 2013 at 4:57
  • I've updated my answer to demonstrate how to handle a urllib.error.HTTPError.
    – johnsyweb
    Commented Feb 19, 2013 at 5:17
  • With the requests library, you have response.status_code and response.ok (True/False) available. But that's no guarantee that the response is correct. I sometimes get 200 but still an invalid response (e.g. HTML where I expect JSON).
    – Roland
    Commented Jun 26, 2023 at 16:13
  • This is what I do: if response.ok and response.status_code == 200 and len(response.text) > 0:. Still no guarantee of getting JSON back (e.g. when you query a JSON API); you should still validate or try to parse it (see the sketch after these comments). Check my json_from_response() function for a complete solution: git.mxchange.org/?p=fba.git;a=blob;f=fba/http/…
    – Roland
    Commented Jun 26, 2023 at 16:17
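
Following up on the validation point raised in the comments above, a minimal sketch with the requests library that checks response.ok and then tries to parse the body as JSON (the URL is a placeholder):

import requests

response = requests.get('http://www.example.com/api')  # placeholder URL
if response.ok:  # True for any status code below 400
    try:
        data = response.json()  # raises ValueError if the body is not valid JSON
    except ValueError:
        print('Got a successful status, but the body was not valid JSON.')
    else:
        print(data)
else:
    print('Request failed with status {}'.format(response.status_code))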

You can use the requests module to check it:

import requests
url = "http://www.example.com/"
res = requests.get(url)
if res.status_code == 200:
    print("bingo")

You can also inspect the headers before downloading the whole content of the webpage by making a HEAD request instead of a GET, as shown below.
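
A minimal sketch of that HEAD approach (allow_redirects=True is an assumption so that redirects resolve to the final status):

import requests

url = "http://www.example.com/"
# HEAD asks the server for the status line and headers only, not the body.
res = requests.head(url, allow_redirects=True)
if res.status_code == 200:
    print("bingo")
    print(res.headers.get("Content-Type"))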

  • You may also wish to check res.ok (True/False).
    – Roland
    Commented Jun 26, 2023 at 16:13

For Python 2.x

urllib, urllib2 or httplib can be used here. Note, however, that urllib and urllib2 are built on top of httplib, so if you plan to perform this check many times (thousands of them), it is better to use httplib directly. Additional documentation and examples are here.

Example code:

import httplib
try:
    h = httplib.HTTPConnection("www.google.com")
    h.connect()  # Raises an exception if the host cannot be reached.
except Exception as ex:
    print "Could not connect to page."

For Python 3.x

A similar story to urllib (or urllib2) and httplib from Python 2.x applies to the urllib.request and http.client libraries in Python 3.x. Again, http.client should be quicker. For more documentation and examples, look here.

Example code:

import http.client

try:
    conn = http.client.HTTPConnection("www.google.com")
    conn.connect()  # Raises an exception if the connection fails.
except Exception as ex:
    print("Could not connect to page.")

and if you want to check the status code, you would need to replace

conn.connect()

with

conn.request("GET", "/index.html")  # Could also use "HEAD" instead of "GET".
res = conn.getresponse()
if res.status == 200 or res.status == 302:  # Specify codes here.
    print("Page Found!")

Note, in both examples, if you would like to catch the specific exception relating to when the URL doesn't exist, rather than all of them, catch the socket.gaierror exception instead (see the socket documentation).
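
A minimal sketch of that narrower handling (the hostname is deliberately bogus so the lookup fails):

import http.client
import socket

try:
    conn = http.client.HTTPConnection("no-such-host.invalid")
    conn.connect()
except socket.gaierror:
    # Raised when the hostname cannot be resolved, i.e. the URL's host
    # does not exist; any other error still propagates.
    print("Could not resolve host.")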

  • Thanks for this answer. For the current project, I think the URLLIB would work. But once I go through examining many urls at once, I might use your method. Thanks again!
    – Adib
    Commented Feb 19, 2013 at 4:36
  • I've added a Python 3.x solution. Should work now - it's not much different to the original answer, as you can see.
    – Akyidrian
    Commented Feb 19, 2013 at 18:12
    Thanks! This is informative. I've gone through the documentation in the Python 3.3 library, but I guess the problem I originally had was finding the right syntax, since I kept getting confused between the Python 2.x and Python 3.x functions and calls.
    – Adib
    Commented Feb 19, 2013 at 21:06

You can use the urllib2 library:

import urllib2
if urllib2.urlopen(url).code == 200:
    print "Bingo"
  • This is fine for Python 2. The question is tagged with python-3.x, though.
    – johnsyweb
    Commented Feb 19, 2013 at 4:27
  • This is for Python 2.x, but it did help a lot! Thank you
    – Adib
    Commented Feb 19, 2013 at 4:33
