
I've been searching all around for a Python 3.x code sample to get HTTP Header information.

Something as simple as PHP's get_headers equivalent is hard to find in Python. Or maybe I'm just not sure how best to wrap my head around it.

In essence, I would like to write something that tells me whether a URL exists or not, something along the lines of:

h = get_headers(url)
if(h[0] == 200)
{
   print("Bingo!")
}

So far, I tried

h = http.client.HTTPResponse('http://docs.python.org/')

But I always got an error.

  • possible duplicate of Grabbing headers from webpage with python
    – DocMax
    Commented Feb 19, 2013 at 5:36
  • @DocMax: Possible, but the accepted answer there doesn't deal with response codes or exception handling.
    – johnsyweb
    Commented Feb 19, 2013 at 6:06
  • Good point. I had overlooked this subtlety. To other reviewers, please keep this in mind. I would rescind my vote now were it possible.
    – DocMax
    Commented Feb 19, 2013 at 6:27

4 Answers


To get an HTTP response code in Python 3, use the urllib.request module:

>>> import urllib.request
>>> response = urllib.request.urlopen(url)
>>> response.getcode()
200
>>> if response.getcode() == 200:
...     print('Bingo')
... 
Bingo

The returned HTTPResponse object gives you access to all of the headers as well. For example:

>>> response.getheader('Server')
'Apache/2.2.16 (Debian)'
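
If you need all of the headers rather than one at a time, the same HTTPResponse object offers getheaders(). A minimal sketch (the URL is taken from the question):

import urllib.request

response = urllib.request.urlopen('http://docs.python.org/')
# getheaders() returns a list of (name, value) tuples.
for name, value in response.getheaders():
    print('{}: {}'.format(name, value))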

If the call to urllib.request.urlopen() fails, an HTTPError exception is raised. You can handle this to get the response code:

import urllib.request
try:
    response = urllib.request.urlopen(url)
    if response.getcode() == 200:
        print('Bingo')
    else:
        print('The response code was not 200, but: {}'.format(
            response.getcode()))
except urllib.error.HTTPError as e:
    print('''An error occurred: {}
The response code was {}'''.format(e, e.getcode()))
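
Putting the pieces together, here is a sketch of a url_exists() helper in the spirit of PHP's get_headers() (the function name and the choice to treat only 200 as success are assumptions for illustration, not part of the standard library):

import urllib.request
import urllib.error

def url_exists(url):
    """Return True if the URL responds with HTTP 200, False otherwise."""
    try:
        response = urllib.request.urlopen(url)
        return response.getcode() == 200
    except urllib.error.HTTPError:
        # 4xx/5xx responses raise HTTPError instead of returning normally.
        return False
    except urllib.error.URLError:
        # Covers DNS failures, refused connections, and the like.
        return False

if url_exists('http://docs.python.org/'):
    print('Bingo!')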
  • Thank you! This is the answer. I got it after researching more based on the initial answer Ali-Akber provided. Thanks again!
    – Adib
    Commented Feb 19, 2013 at 4:35
  • So, the danger with this code is that if the page is a 404, it gives me a massive error ending with (urllib.error.HTTPError: HTTP Error 404: Not Found). What's a good way to get around that problem?
    – Adib
    Commented Feb 19, 2013 at 4:57
  • I've updated my answer to demonstrate how to handle a urllib.error.HTTPError.
    – johnsyweb
    Commented Feb 19, 2013 at 5:17
  • With the requests library, you have response.status_code and response.ok (True/False) available. But that's no guarantee that the response is correct. I sometimes get 200 but still an invalid response (e.g. HTML where I expect JSON).
    – Roland
    Commented Jun 26, 2023 at 16:13
  • This is what I do: if response.ok and response.status_code == 200 and len(response.text) > 0:. Still no guarantee of getting JSON back (e.g. when you query a JSON API); you should still validate or try to parse it (see the sketch after these comments). Check my json_from_response() function for a complete solution: git.mxchange.org/?p=fba.git;a=blob;f=fba/http/…
    – Roland
    Commented Jun 26, 2023 at 16:17
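
Following up on the validation point raised in the comments above, a minimal sketch with the requests library that checks response.ok and then tries to parse the body as JSON (the URL is a placeholder):

import requests

response = requests.get('http://www.example.com/api')  # placeholder URL
if response.ok:  # True for any status code below 400
    try:
        data = response.json()  # raises ValueError if the body is not valid JSON
    except ValueError:
        print('Got a successful status, but the body was not valid JSON.')
    else:
        print(data)
else:
    print('Request failed with status {}'.format(response.status_code))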

You can use the requests module to check it:

import requests
url = "http://www.example.com/"
res = requests.get(url)
if res.status_code == 200:
    print("bingo")

You can also inspect the headers before downloading the whole content of the webpage by making a HEAD request instead of a GET, as shown below.
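
A minimal sketch of that HEAD approach (allow_redirects=True is an assumption so that redirects resolve to the final status):

import requests

url = "http://www.example.com/"
# HEAD asks the server for the status line and headers only, not the body.
res = requests.head(url, allow_redirects=True)
if res.status_code == 200:
    print("bingo")
    print(res.headers.get("Content-Type"))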

  • You may also wish to check res.ok (True/False).
    – Roland
    Commented Jun 26, 2023 at 16:13

For Python 2.x

urllib, urllib2 or httplib can be used here. Note, however, that urllib and urllib2 are built on top of httplib, so if you plan to perform this check many times (thousands of them), it is better to use httplib directly. Additional documentation and examples are here.

Example code:

import httplib
try:
    h = httplib.HTTPConnection("www.google.com")
    h.connect()  # Raises an exception if the host cannot be reached.
except Exception as ex:
    print "Could not connect to page."

For Python 3.x

A similar story to urllib (or urllib2) and httplib from Python 2.x applies to the urllib.request and http.client libraries in Python 3.x. Again, http.client should be quicker. For more documentation and examples, look here.

Example code:

import http.client

try:
    conn = http.client.HTTPConnection("www.google.com")
    conn.connect()  # Raises an exception if the connection fails.
except Exception as ex:
    print("Could not connect to page.")

and if you want to check the status code, you would need to replace

conn.connect()

with

conn.request("GET", "/index.html")  # Could also use "HEAD" instead of "GET".
res = conn.getresponse()
if res.status == 200 or res.status == 302:  # Specify codes here.
    print("Page Found!")

Note, in both examples, if you would like to catch the specific exception relating to when the URL doesn't exist, rather than all of them, catch the socket.gaierror exception instead (see the socket documentation).
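
A minimal sketch of that narrower handling (the hostname is deliberately bogus so the lookup fails):

import http.client
import socket

try:
    conn = http.client.HTTPConnection("no-such-host.invalid")
    conn.connect()
except socket.gaierror:
    # Raised when the hostname cannot be resolved, i.e. the URL's host
    # does not exist; any other error still propagates.
    print("Could not resolve host.")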

  • Thanks for this answer. For the current project, I think the URLLIB would work. But once I go through examining many urls at once, I might use your method. Thanks again!
    – Adib
    Commented Feb 19, 2013 at 4:36
  • I've added a Python 3.x solution. Should work now - it's not much different to the original answer, as you can see.
    – Akyidrian
    Commented Feb 19, 2013 at 18:12
    Thanks! This is informative. I've gone through the documentation in the Python 3.3 library, but I guess the problem I originally had was finding the right syntax, since I kept getting confused between the Python 2.x and Python 3.x functions and calls.
    – Adib
    Commented Feb 19, 2013 at 21:06

You can use the urllib2 library:

import urllib2
if urllib2.urlopen(url).code == 200:
    print "Bingo"
  • This is fine for Python 2. The question is tagged with python-3.x, though.
    – johnsyweb
    Commented Feb 19, 2013 at 4:27
  • This is for Python 2.x, but it did help a lot! Thank you
    – Adib
    Commented Feb 19, 2013 at 4:33
