0

I'd like a sanity check on this Python script. My goal is to input a list of urls and get a byte size, giving me an indicator if the url is good or bad.

import urllib2
import shutil

urls = (LIST OF URLS)

def getUrl(urls):
    for url in urls:
        file_name = url.replace('https://','').replace('.','_').replace('/','_')
        try:
            response = urllib2.urlopen(url)
        except urllib2.HTTPError, e:
            print e.code
        except urllib2URLError, e:
            print e.args
        print urls, len(response.read())
        with open(file_name,'wb') as out_file:
            shutil.copyfileobj(response, out_file)
getUrl(urls)

The problem I am having is my output looks like:

(LIST OF URLS) 22511
(LIST OF URLS) 56472
(LIST OF URLS) 8717
...

How would I make only one url appear with the byte size?
Is there a better way to get these results?

4
  • do you mind indent the code ? please
    – Sérgio
    Commented Jun 2, 2015 at 13:43
  • seconded. I'd have done it, if there wasn't some manual HTML in between. Commented Jun 2, 2015 at 13:44
  • I think you meant print url, ... rather than print urls, ...
    – GP89
    Commented Jun 2, 2015 at 13:45
  • Thank you all. I'm sleep deprived and new to stackoverflow. I was overlooking the 's'! Commented Jun 2, 2015 at 18:02

2 Answers 2

2

Try

print url, len(response.read())

Instead of

print urls, len(response.read())

You are printing the list each time. Just print the current item.

There are some alternate ways to determine a pages size described here and here there is little point me duplicating that information here.

Edit

Perhaps you would consider using requests instead of urllib2.

You can easily extract only the content-length from the HEAD request and avoid a full GET. e.g.

import requests

h = requests.head('http://www.google.com')

print h.headers['content-length'] 

HEAD request using urllib2 or httplib2 detailed here.

1
  • Thank you for the links. I'm reading up on requests and it does look like it would fit my needs better. Cheers! Commented Jun 2, 2015 at 18:10
2

How would I make only one url appear with the byte size?

Obviously: don't

print urls, ...

but

print url, ...

Not the answer you're looking for? Browse other questions tagged or ask your own question.