1

I want to download a file with urllib2 and display a progress bar while it downloads. How can I get the number of bytes downloaded so far?

My current code is:

import urllib2
ul = urllib2.urlopen('http://www.file.com/blafoo.iso')  # urlopen needs the scheme
data = ul.read()  # blocks until the whole body has been received

or

open('file.iso', 'wb').write(ul.read())  # binary mode, so the ISO is not corrupted

The data is only written to the file once the whole download has been received from the website. How can I access the size of the data downloaded so far?

Thanks for your help

1
  • Have you tried urllib.urlretrieve?
    – Inbar Rose
    Commented Aug 6, 2012 at 14:47
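
The comment's suggestion could look roughly like the sketch below. It is written for Python 3, where the function lives at urllib.request.urlretrieve (in Python 2 it is urllib.urlretrieve). The helper names format_progress, report and download are made up for illustration; only urlretrieve and its reporthook argument come from the standard library.

```python
import sys
from urllib.request import urlretrieve  # Python 2: from urllib import urlretrieve


def format_progress(block_num, block_size, total_size):
    """Build a one-line progress message from urlretrieve's hook arguments."""
    downloaded = block_num * block_size
    if total_size > 0:
        # The last block is usually partial, so cap the count at the total.
        downloaded = min(downloaded, total_size)
        return "%d / %d bytes (%d%%)" % (
            downloaded, total_size, downloaded * 100 // total_size)
    # total_size is -1 when the server sent no Content-Length header,
    # so only an approximate byte count (whole blocks) can be shown.
    return "%d bytes" % downloaded


def report(block_num, block_size, total_size):
    """Progress hook: called once at the start and again after each block."""
    sys.stdout.write("\r" + format_progress(block_num, block_size, total_size))
    sys.stdout.flush()


def download(url, filename):
    """Download url to filename, printing progress as blocks arrive."""
    urlretrieve(url, filename, reporthook=report)
    sys.stdout.write("\n")
```

Calling download('http://www.file.com/blafoo.iso', 'file.iso') would then print a running byte count, assuming the question's placeholder URL with an http:// scheme added.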

3 Answers

7

Here's an example of a text progress bar using the awesome requests library and the progressbar library:

import requests
import progressbar

ISO = "http://www.ubuntu.com/start-download?distro=desktop&bits=32&release=lts"
CHUNK_SIZE = 1024 * 1024 # 1MB

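# stream=True defers the body so it can be read chunk by chunk
r = requests.get(ISO, stream=True)
total_size = int(r.headers['content-length'])
pbar = progressbar.ProgressBar(maxval=total_size).start()

file_contents = b""  # iter_content yields byte strings
for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
    file_contents += chunk
    pbar.update(len(file_contents))
pbar.finish()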

This is what I see in the console while running:

$ python requests_progress.py
 90% |############################   |

Edit: some notes:

  • Not all servers send a Content-Length header; when it is missing, you can't show a percentage.
  • You might not want to read the whole file in memory if it's big. You can write the chunks to a file, or somewhere else.
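
Both notes can be handled together. The following is a minimal sketch, not part of the original answer: save_chunks and describe are hypothetical helper names, and the in-memory buffer merely stands in for a real file and a real download. It writes each chunk as it arrives instead of accumulating the whole body, and falls back to a plain byte count when the total size is unknown.

```python
import io


def describe(bytes_written, total_size):
    """Return a percentage when the total is known, else a plain byte count."""
    if total_size:
        return "%3d%%" % (bytes_written * 100 // total_size)
    return "%d bytes" % bytes_written


def save_chunks(chunks, output, total_size=None, on_progress=None):
    """Write each chunk to output as it arrives and return the byte count.

    chunks      -- any iterable of byte strings
    total_size  -- expected size in bytes, or None if the server did not say
    on_progress -- optional callback(bytes_written, total_size)
    """
    bytes_written = 0
    for chunk in chunks:
        if not chunk:
            continue  # nothing to write for an empty chunk
        output.write(chunk)
        bytes_written += len(chunk)
        if on_progress is not None:
            on_progress(bytes_written, total_size)
    return bytes_written


# Stand-in for a real download: three chunks written to an in-memory file.
buffer = io.BytesIO()
seen = []
written = save_chunks(
    [b"abcd", b"efgh", b"ij"], buffer, total_size=10,
    on_progress=lambda done, total: seen.append(describe(done, total)))
```

With requests, the chunks argument would presumably be r.iter_content(chunk_size=CHUNK_SIZE) from a request made with stream=True, output a file opened in 'wb' mode, and total_size the integer value of r.headers.get('content-length') when that header is present.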
3
  • I like the progress bar library! Reading a whole ISO image into memory isn't a good idea though. Also some additional handling is needed for when the Content-length header is missing (the server isn't required to send it).
    – user634175
    Commented Aug 6, 2012 at 16:02
  • added notes about content-length and file in memory
    – jterrace
    Commented Aug 6, 2012 at 16:41
  • Got it working with a progressbar. case sensitive for the content-length :) Now it's quite awesome! Thanks again
    Commented Aug 6, 2012 at 18:13
4

You can call info() on the response returned by urllib2.urlopen, which gives you the page's meta-information (its headers), and then use getheaders to access Content-Length.

For example, let's calculate the download size of the Ubuntu 12.04 ISO:

>>> import urllib2
>>> info = urllib2.urlopen('http://mirror01.th.ifl.net/releases//precise/ubuntu-12.04-desktop-i386.iso')
>>> size = int(info.info().getheaders("Content-Length")[0])
>>> size/1024/1024
701
>>>
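
For readers on Python 3, where getheaders is no longer available on the headers object, a rough equivalent might look like the sketch below. The content_length helper is a made-up name and the header value is invented for the demonstration. In Python 3 the headers returned by urllib.request.urlopen(url).headers behave like the email.message.Message used here, so header lookup is case-insensitive.

```python
from email.message import Message


def content_length(headers):
    """Return Content-Length as an int, or None if missing or not a number."""
    value = headers.get("Content-Length")
    try:
        return int(value)
    except (TypeError, ValueError):
        # TypeError: header absent (value is None); ValueError: not numeric.
        return None


# Demonstration with a hand-built header object instead of a real response.
headers = Message()
headers["content-length"] = "734003200"  # invented value, exactly 700 MiB
size = content_length(headers)
size_mib = size // (1024 * 1024)  # // mirrors Python 2's integer size/1024/1024
```

Returning None instead of raising lets the caller decide whether to show a percentage or only a running byte count.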
1
import urllib2
from contextlib import closing

CHUNK_SIZE = 8192

with open('file.iso', 'wb') as output:  # Binary mode, otherwise you'll corrupt the file
    # urllib2 responses are not context managers in Python 2, so wrap them in closing()
    with closing(urllib2.urlopen('http://www.file.com/blafoo.iso')) as ul:
        bytes_read = 0
        while True:
            data = ul.read(CHUNK_SIZE)
            if not data:  # An empty read means EOF
                break
            bytes_read += len(data)  # Update the progress bar with this value
            output.write(data)
