wget won't ignore no-follow attributes

Question

I'm using the following command to download all files from a webpage:

wget --recursive "http://example.com"

This sometimes gives me the following error:

no-follow attribute found in www.example.com. Will not follow any links on this page

According to gnu.org, I have to add -e robots=off --wait 0.25 to my command.

My final command looks like this (I don't want span-hosts):

wget --recursive -e robots=off --wait 0.5 "http://example.com"

However, I am still getting the above error. What can I do to ignore those attributes?

Gokydzi · Accepted Answer · 2022-09-18 05:25:52Z

4

The option you already found is the right one. You just have to try it in this form:

wget -r -erobots=off "your_url"
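
For context, -e runs its argument as if it were a line in wget's startup file, so the same setting can live in ~/.wgetrc instead of on the command line. A minimal sketch, assuming a per-user ~/.wgetrc; the wait value is simply copied from the command in the question, not a recommendation:

```
# ~/.wgetrc (per-user startup file; location is an assumption)

# Equivalent to -e robots=off: wget stops honoring robots.txt
# and the "nofollow" robots convention.
robots = off

# Equivalent to --wait 0.5 from the question: pause between retrievals.
wait = 0.5
```

With this in place, a plain wget --recursive "http://example.com" should pick up both settings. Note that this applies to every wget run for that user, which may not be what you want.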

Nils André · Accepted Answer · 2021-04-16 11:09:20Z

0

The message is a bug: wget does in fact follow the links despite printing "Will not follow any links on this page".

This has been fixed on the master branch, so the fix should be included in the next release of wget.

See this for more details.

Matti vL · Accepted Answer · 2020-12-03 05:55:05Z

-2

In my case I had a syntax error in --follow-tags. Removing the syntax error let wget continue despite the no-follow attribute.

1

The OP isn't using --follow-tags, though
– Chris Davies
Commented Dec 5, 2020 at 17:53
