
I am using Windows XP and am trying to use wget to download all the pictures (and some other files) from my website, whose hosting will be closed down in about two weeks (so I need to hurry).

I wonder why I can download specific files with no problem, but when it comes to downloading everything from that site automatically, it just doesn't work.

If I try this line, for example:

wget -r http://*the site's name*/lang2.JPG

It works just fine: it creates a folder (named after the website) and downloads the picture (lang2.JPG) into it.

However, when I try this one:

wget -r http://*the site's name*

it doesn’t do anything. I only get these lines in the command window:

HTTP request sent, awaiting response... 403 Forbidden
2009-12-02 09:54:33 ERROR 403: Forbidden

Why is it that downloading a particular picture from my site works, but downloading all the files automatically is forbidden?

1 Answer


This is mainly because Wget just wasn't designed for this sort of operation...

Wget is one of the best and simplest tools for downloading files when you know the absolute path; for example, it might work if you tried index.html, index.htm, default.htm or default.html (or others). However, it isn't a full web browser, and recursion or anything more advanced can cause problems.

Based on your previous questions and my understanding, I highly recommend you ask your ISP/host for FTP credentials or other access details and simply download all the content that way. Failing that, take a look at HTTrack: give it the website address and it should be able to download EVERYTHING to a local folder, keeping the directory structure the same as on your host. You should get what you want working in a fraction of the time it would take with wget.
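
For reference, a minimal HTTrack command line might look like the following (the site URL and output folder here are placeholders, and HTTrack also ships with a Windows GUI, WinHTTrack, which may be easier to use on XP):

httrack "http://*the site's name*/" -O "C:\mysite"

The -O option sets the local folder the mirror is written to; by default HTTrack follows the site's internal links and preserves its directory structure.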

  • Thank you, Wil!!! It's already the second time you've encouraged me to go the HTTrack way, and this time I think I'll take your advice.
    – brilliant
    Commented Dec 2, 2009 at 2:57
  • wil, shame on you. wget does just fine mirroring sites -- recursive download is what it's designed for. the error he's getting is the server saying "you can't access this", whatever the server understands "this" to be. that may be due to the server filtering out traffic based on the agent-string wget is supplying, or maybe there isn't an index.html on the site and the server is set not to list the files in the webroot. ...
    Commented Dec 2, 2009 at 11:58
  • ... whatever it is, it's not the fault of wget. it's likely that an understanding of how the site is configured would let us craft some command-line magic to make this work for him (one guess at such a command is sketched below). (but since he's left us in the dark about the site, we're left to guess.)
    Commented Dec 2, 2009 at 12:00
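
Following up on the diagnosis in the comments: if the 403 really is the server filtering on wget's default User-Agent, a command along these lines might get the recursive download working (the agent string is only an example, and -e robots=off matters only if robots.txt is blocking the crawl; neither helps if the webroot simply has no index page to recurse from):

wget -r --user-agent="Mozilla/5.0" -e robots=off --page-requisites --convert-links http://*the site's name*/

Here -r enables recursion, --page-requisites also fetches images and other embedded files, and --convert-links rewrites links so the copy can be browsed locally.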
