I am downloading a file using Python urllib2. How do I check how large the file size is?

Question

And if it is too large, how do I stop the download? I don't want to download files that are larger than 12MB.

request = urllib2.Request(ep_url)
request.add_header('User-Agent',random.choice(agents))
thefile = urllib2.urlopen(request).read()

Community · Accepted Answer · 2017-05-23 11:48:21Z

There's no need to drop down to httplib as bobince did. You can do all of that with urllib2 directly:

>>> import urllib2
>>> f = urllib2.urlopen("http://dalkescientific.com")
>>> f.headers.items()
[('content-length', '7535'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.14'),
 ('last-modified', 'Sun, 09 Mar 2008 00:27:43 GMT'), ('connection', 'close'),
 ('etag', '"19fa87-1d6f-447f627da7dc0"'), ('date', 'Wed, 28 Oct 2009 19:59:10 GMT'),
 ('content-type', 'text/html')]
>>> f.headers["Content-Length"]
'7535'
>>>

If you use httplib then you may have to implement redirect handling, proxy support, and the other nice things that urllib2 does for you.
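To act on that header rather than just print it, the value has to be parsed defensively, since it may be missing or malformed. A minimal sketch, assuming a headers object with a dict-style `get()` (as `f.headers` has); the names `declared_length`, `is_too_large` and `MAX_BYTES` are made up for illustration:

```python
def declared_length(headers):
    """Return the Content-Length as an int, or None if absent or malformed."""
    raw = headers.get("Content-Length")
    if raw is None:
        return None
    try:
        length = int(raw)
    except ValueError:
        return None
    return length if length >= 0 else None


MAX_BYTES = 12 * 1024 * 1024  # the 12MB limit from the question


def is_too_large(headers, limit=MAX_BYTES):
    """True only when the server declares a size above the limit."""
    length = declared_length(headers)
    return length is not None and length > limit
```

With urllib2 this would be called as `is_too_large(f.headers)` right after `urlopen()` and before any `read()`. A missing header yields False, so it only screens out downloads the server admits are large.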

bobince · Accepted Answer · 2009-10-28 11:36:19Z

You could say:

maxlength= 12*1024*1024
thefile= urllib2.urlopen(request).read(maxlength+1)
if len(thefile)==maxlength+1:
    raise ThrowToysOutOfPramException()

but then of course you've still read 12MB of unwanted data. If you want to minimise the risk of this happening you can check the HTTP Content-Length header, if present (it might not be). But to do that you need to drop down to httplib instead of the more general urllib.

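import httplib, urlparse

u= urlparse.urlparse(ep_url)
cn= httplib.HTTPConnection(u.netloc)
# ua is the User-Agent string, e.g. random.choice(agents) from the question
cn.request('GET', u.path, headers= {'User-Agent': ua})
r= cn.getresponse()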

try:
    l= int(r.getheader('Content-Length', '0'))
except ValueError:
    l= 0
if l>maxlength:
    raise IAmCrossException()

thefile= r.read(maxlength+1)
if len(thefile)==maxlength+1:
    raise IAmStillCrossException()

You can check the length before asking to get the file too, if you prefer. This is basically the same as above, except using the method 'HEAD' instead of 'GET'.
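Because the header can be absent or wrong, the limit is safest when enforced while reading. The sketch below is a chunked variant of the `read(maxlength+1)` idea: it works on any object with a `read(n)` method and gives up as soon as the running total passes the limit, so at most one extra chunk is fetched. `read_capped` and `TooLargeError` are hypothetical names, and the `io.BytesIO` object merely stands in for a real response:

```python
import io


class TooLargeError(Exception):
    """Hypothetical exception raised when the body exceeds the limit."""


def read_capped(response, max_bytes, chunk_size=64 * 1024):
    """Read a file-like response in chunks, aborting once max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break  # end of body
        total += len(chunk)
        if total > max_bytes:
            raise TooLargeError("body exceeds %d bytes" % max_bytes)
        chunks.append(chunk)
    return b"".join(chunks)


# Stand-in for a network response: any object with read(n) works.
small = read_capped(io.BytesIO(b"x" * 100), max_bytes=1024)
```

The caller should close the response after catching `TooLargeError` so the rest of the body is not transferred.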

This is a better solution, since Content-Length is not reliable (Someone may incorrectly set it) — Taha Jahangir, Commented Aug 27, 2011 at 7:32

Community · Accepted Answer · 2017-05-23 12:19:36Z


You can check the Content-Length in a HEAD request first, but be warned that this header doesn't have to be set. See "How do you send a HEAD HTTP request in Python 2?"
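One commonly used way to send a HEAD request without leaving urllib2 is to subclass `Request` and override `get_method()`. This is a sketch rather than the linked answer's exact code; `HeadRequest` is a made-up name, and the import fallback is only there so the snippet also runs on Python 3:

```python
try:
    import urllib2 as urlrequest  # Python 2, as used in this thread
except ImportError:
    import urllib.request as urlrequest  # Python 3 equivalent


class HeadRequest(urlrequest.Request):
    """Request subclass that asks for headers only (hypothetical name)."""

    def get_method(self):
        return "HEAD"


# Building the request does not touch the network; urlopen() would.
req = HeadRequest("http://example.com/file.zip")
# response = urlrequest.urlopen(req)
# size = response.headers.get("Content-Length")  # may be None if not sent
```

Some servers reject HEAD or answer it without a Content-Length, so a size limit on the later GET is still worth keeping.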

answered Oct 28, 2009 at 11:24 by SeriousCallersOnly · edited May 23, 2017 at 12:19 by Community Bot

How do I check the content-length in the HEAD request? Is this considered downloading headers?
– TIMEX
Commented Oct 28, 2009 at 11:26
Doing a HEAD request is at best theoretical if you want to use urllib/urllib2. Those modules only support GET and POST requests.
– Andrew Dalke
Commented Oct 28, 2009 at 19:58


Gourneau · Accepted Answer · 2014-03-31 02:51:15Z


This works only if the server sends a Content-Length header; otherwise getheader() returns None and int() raises TypeError:

import urllib2
req = urllib2.urlopen("http://example.com/file.zip")
total_size = int(req.info().getheader('Content-Length'))

answered Dec 4, 2011 at 18:52 by Gourneau · edited Mar 31, 2014 at 2:51

you don't need .strip(): 1. getheader() already returns stripped version 2. int() doesn't care about leading/trailing whitespace.
– jfs
Commented Mar 28, 2014 at 19:08
Also, there is no point to use int(info().getheader()) if you don't set the default value: ValueError from int is less appropriate than KeyError from req.headers (note: req.info() is req.headers)
– jfs
Commented Mar 28, 2014 at 19:08
@Gourneau - Would this still work if the url specified is ftp:// url?
– Pankaj Parashar
Commented Sep 17, 2014 at 9:22
@PankajParashar Nope, "Content-Length" is pulled out of the HTTP header, so only works with HTTP. This might be what you need though stackoverflow.com/a/5241914/56069
– Gourneau
Commented Sep 18, 2014 at 5:24
