
I'm unsure how best to download a file in Python. One way is:

import urllib.request
urllib.request.urlretrieve('http://www.example.com/file.tar', 'file.tar')

Another way would be:

import urllib.request

# Set as appropriate
userAgent = ....

req = urllib.request.Request('http://www.example.com/file.tar', headers={'User-Agent': userAgent})
response = urllib.request.urlopen(req)

# Save the file
f = open('file.tar', 'wb')
f.write(response.read())
f.close()

I'm not sure which method to use. I'll be downloading many files (with patterned filenames) in a loop, and I would also like to be able to set a User-Agent header. It's not critical, but I'd like to.

EDIT: I forgot to mention that I prefer the first method, but I don't know how to set the User-Agent header with urlretrieve.

  • I don't understand. What is wrong with your second approach if you want to customize the headers? Does it not work?
    – jdi
    Commented Mar 27, 2012 at 23:16
  • It works, but then I'm unsure what urllib.request.urlretrieve is for. Also, I'd need to create a response object on each iteration if I put it in a loop. The code is also a lot longer, so I thought there must be a way to use urlretrieve and set the headers. After all, urlretrieve saves many lines.
    – s5s
    Commented Mar 27, 2012 at 23:19
  • urlretrieve is exactly what the docs say: a higher-level function, to be used simply to copy a network resource to a local file. You don't get much control, such as setting headers, which is why you have to drop down to a Request object. You are doing the manual process of urlretrieve, so it does require a few more lines.
    – jdi
    Commented Mar 27, 2012 at 23:22
  • I understand. That's what I was unsure about.
    – s5s
    Commented Mar 27, 2012 at 23:24
  • I would strongly suggest using python-requests.org (a higher-level wrapper around urllib3). It not only provides a simpler API, but also does things like keepalive and session management for you, without any extra work. If you're retrieving lots of files from the same server, keepalive might be a substantial performance benefit (see the sketch after this list). Commented Mar 27, 2012 at 23:27
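As the last comment suggests, a requests.Session reuses the underlying connection (keepalive) across requests, which helps when fetching many files from the same server. Here is a minimal sketch, assuming the requests library; the URL pattern and User-Agent value are hypothetical:

import requests

# Hypothetical values -- set as appropriate
user_agent = 'MyDownloader/1.0'
session = requests.Session()
session.headers.update({'User-Agent': user_agent})

# Download a series of files with patterned names, reusing one connection
for i in range(10):
    url = 'http://www.example.com/file%d.tar' % i
    response = session.get(url)
    with open('file%d.tar' % i, 'wb') as f:
        f.write(response.content)  # .content is raw bytes, suitable for a tar file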

2 Answers


I am moving what started as comments to an answer...

Your second example is pretty much doing what it needs to: making a Request object with a custom header and then reading the result into a local file.

urlretrieve is a higher-level function, so it does only what the docs say: downloads a network resource to a local file and tells you where the file is. If you don't like the slightly lower-level approach of your second example and you want more high-level functionality, you can look into using the Requests library.
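That said, if you do want to keep using urlretrieve, one approach not mentioned above (a sketch, assuming Python 3's urllib.request; the User-Agent value is hypothetical) is to install a module-level opener with your headers. urlretrieve then uses it for every subsequent call:

import urllib.request

# Build an opener that sends a custom User-Agent and install it globally
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'MyDownloader/1.0')]  # hypothetical value
urllib.request.install_opener(opener)

# urlretrieve now sends the custom header on every call
urllib.request.urlretrieve('http://www.example.com/file.tar', 'file.tar')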


As @jdi said, you can use the requests library. This is also mentioned in https://docs.python.org/2/library/urllib2.html. You will need to install the library with pip, e.g.

pip install requests

My code looks like this:

import requests

def download_file(url):
    response = requests.get(url)
    return response.content  # raw bytes, so binary files such as a tar archive are not corrupted

It couldn't be easier.
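If the files are large, a streamed download keeps memory use flat by writing the response in chunks. This is a sketch using the requests library's stream mode; the helper name and chunk size are hypothetical:

import requests

def download_file(url, filename):
    # stream=True fetches the body in chunks instead of loading it all into memory
    with requests.get(url, stream=True) as response:
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)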
