
I'm unsure how best to download a file in Python. One way is:

import urllib.request
urllib.request.urlretrieve('http://www.example.com/file.tar', 'file.tar')

Another way would be:

import urllib.request

# Set as appropriate
userAgent = ....

req = urllib.request.Request('http://www.example.com/file.tar', headers={'User-Agent': userAgent})
response = urllib.request.urlopen(req)

# Save the file
f = open('file.tar', 'wb')
f.write(response.read())
f.close()

I'm not sure which method to use. I'll be downloading many files (with patterned filenames) in a loop, and I would also like to be able to set a User-Agent header. It's not critical, but I'd like to.

EDIT: I forgot to mention that I prefer the first method, but I don't know how to set the User-Agent header with urlretrieve.

  • I don't understand. What is wrong with your second approach if you want to customize the headers? Does it not work?
    – jdi
    Commented Mar 27, 2012 at 23:16
  • It works, but then I'm unsure what urllib.request.urlretrieve is for. Also, I'd need to create a response object on each iteration if I put it in a loop. The code is also a lot longer, so I thought there must be a way to use urlretrieve and set the headers. After all, urlretrieve saves many lines.
    – s5s
    Commented Mar 27, 2012 at 23:19
  • urlretrieve is exactly what the docs say: a higher-level function, to be used simply to copy a network resource to a local file. You don't get much control, such as setting headers, which is why you have to drop down to a Request object. You are doing the manual process of urlretrieve, so it does require a few more lines.
    – jdi
    Commented Mar 27, 2012 at 23:22
  • I understand. That's what I was unsure about.
    – s5s
    Commented Mar 27, 2012 at 23:24
  • I would strongly suggest using python-requests.org (a higher-level wrapper around urllib3). It not only provides a simpler API, but also does things like keepalive and session management for you, without any extra work. If you're retrieving lots of files from the same server, keepalive might be a substantial performance benefit (see the sketch after this list). Commented Mar 27, 2012 at 23:27
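As the last comment suggests, a requests.Session reuses the underlying connection (keepalive) across requests, which helps when fetching many files from the same server. Here is a minimal sketch, assuming the requests library; the URL pattern and User-Agent value are hypothetical:

import requests

# Hypothetical values -- set as appropriate
user_agent = 'MyDownloader/1.0'
session = requests.Session()
session.headers.update({'User-Agent': user_agent})

# Download a series of files with patterned names, reusing one connection
for i in range(10):
    url = 'http://www.example.com/file%d.tar' % i
    response = session.get(url)
    with open('file%d.tar' % i, 'wb') as f:
        f.write(response.content)  # .content is raw bytes, suitable for a tar file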

2 Answers


I am moving what started as comments to an answer...

Your second example is pretty much doing what it needs to: making a Request object with a custom header and then reading the result into a local file.

urlretrieve is a higher-level function, so it does only what the docs say: downloads a network resource to a local file and tells you where the file is. If you don't like the slightly lower-level approach of your second example and you want more high-level functionality, you can look into using the Requests library.
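That said, if you do want to keep using urlretrieve, one approach not mentioned above (a sketch, assuming Python 3's urllib.request; the User-Agent value is hypothetical) is to install a module-level opener with your headers. urlretrieve then uses it for every subsequent call:

import urllib.request

# Build an opener that sends a custom User-Agent and install it globally
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'MyDownloader/1.0')]  # hypothetical value
urllib.request.install_opener(opener)

# urlretrieve now sends the custom header on every call
urllib.request.urlretrieve('http://www.example.com/file.tar', 'file.tar')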


As @jdi said, you can use the requests library. This is also mentioned in https://docs.python.org/2/library/urllib2.html. You will need to install the library with pip, e.g.

pip install requests

My code looks like this:

import requests

def download_file(url):
    response = requests.get(url)
    return response.content  # raw bytes, so binary files such as a tar archive are not corrupted

It couldn't be easier.
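If the files are large, a streamed download keeps memory use flat by writing the response in chunks. This is a sketch using the requests library's stream mode; the helper name and chunk size are hypothetical:

import requests

def download_file(url, filename):
    # stream=True fetches the body in chunks instead of loading it all into memory
    with requests.get(url, stream=True) as response:
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)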
