
How would you mass-download files from a web page and rename them using the link text (description) of each href?

The idea is that the downloaded files get descriptive names, unlike the original file names, which are anything but descriptive.

For example, given that a web page contains the following link

<a href='http://www.example.com/docs/ex160.pdf'>Advanced Foo Bar</a>

Ideally, I would like to save it as "Advanced Foo Bar.pdf", but even "Advanced Foo Bar" would be fine, as I can use a bulk renaming utility to add the .pdf extension to the hundred or so files I have to download.

I have been using the FlashGotAll extension for Firefox to download, and it works splendidly for bulk downloading, except there is no built-in renaming function.

I can also fire up Linux (or use Cygwin) and use curl or wget, if need be, for this solution.

1 Answer


Assuming that the HTML content is as clean as your example (i.e. only one href per line, not split across several lines, no mix of HREF and href, consistent quoting, etc.), you can download the page and run

prompt$ grep www.example.com the_page.html | sed "s/.*href='\([^']\+\)'>\([^<]*\)<.*/wget -O '\2.pdf' \1/" | tee files_to_download
wget -O 'Advanced Foo Bar.pdf' http://www.example.com/docs/ex160.pdf
...
prompt$

Edit files_to_download if applicable, and then download by running sh files_to_download.
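If the HTML is a bit messier (extra attributes inside the anchor tag, several tags on one line), a sketch that first isolates each anchor with grep -o before building the wget commands may be more forgiving. This is only a sketch under the same single-quoted-href assumption; page.html stands in for whatever page you downloaded, and it only generates the command list rather than downloading anything:

```shell
# Stand-in for the real downloaded page (replace with your own page.html).
cat > page.html <<'EOF'
<p><a href='http://www.example.com/docs/ex160.pdf'>Advanced Foo Bar</a></p>
EOF

# 1. grep -o pulls each <a ...>text</a> anchor out onto its own line,
#    even if the surrounding line contains other markup.
# 2. sed rewrites each anchor into one wget command, using the link
#    text as the output file name.
grep -o "<a[^>]*href='[^']*'[^>]*>[^<]*</a>" page.html \
  | sed "s/.*href='\([^']*\)'[^>]*>\([^<]*\)<.*/wget -O '\2.pdf' '\1'/" \
  > files_to_download

cat files_to_download
```

As with the one-liner above, review files_to_download before running it with sh.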

