
I'm trying to download all the PDF files from an entire website (two sites, actually):

They're in French and they're giving me trouble. I tried using wget by running:

wget -A pdf -m -p -E -k -K -np https://concours-maths-cpge.fr/
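
A variant worth trying in case the server's robots.txt excludes crawlers or the server refuses wget's default User-Agent (both added flags are standard GNU wget options; this is a sketch, not a confirmed fix):

# -e robots=off tells wget to ignore robots.txt; --user-agent masquerades as a browser
wget -A pdf -m -p -E -k -K -np -e robots=off --user-agent="Mozilla/5.0" https://concours-maths-cpge.fr/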

I also tried using lynx by following a guide.

I tried using other solutions, like DownThemAll (a Firefox add-on).

None of them work; it's as if there are no PDF files on these two websites.

Any help is appreciated.

I dual-boot Manjaro and Windows 10, so an OS-dependent solution is no problem.

  • The main page looks to be dynamically generated by JavaScript and you need something far smarter than a simple HTML parser to deal with it.
    – Mokubai
    Commented Mar 22, 2020 at 12:18
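
If the page really is built by JavaScript, neither wget nor a plain crawler will ever see the links. One workaround is to dump the JavaScript-rendered DOM with a headless browser and feed any PDF URLs it contains to wget. A rough sketch, assuming a Chromium binary is on the PATH and that the rendered page contains absolute links ending in .pdf:

# Render the page (executing its JavaScript), extract PDF URLs, download them
chromium --headless --dump-dom 'https://concours-maths-cpge.fr/' \
  | grep -oE 'https?://[^"]+\.pdf' \
  | sort -u \
  | wget -i - -P pdfs/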

2 Answers


Have you tried this? https://www.freedownloadmanager.org/fr/

It seems that it can download all (freely) available files from a website, provided you have the proper credentials. You need to authenticate first, I presume.
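
If authentication is indeed the hurdle, wget can also send credentials directly; --user/--password and --load-cookies are standard GNU wget options (USER, PASS, and cookies.txt below are placeholders, not values from the question):

# HTTP credentials:
wget --user=USER --password=PASS -r -np -A pdf https://concours-maths-cpge.fr/
# Or reuse a logged-in browser session exported as a Netscape-format cookies.txt:
wget --load-cookies cookies.txt -r -np -A pdf https://concours-maths-cpge.fr/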

  • Most sites block that tool from working (simultaneous downloads); I tried it last week and got a website access-denied error.
    – Moab
    Commented Mar 22, 2020 at 12:20
  • I don't have that particular software installed, but I tried using Internet Download Manager Grabber and JDownloader2; both failed, so I guess another download manager probably won't work either, but I'll try it. EDIT: Yep, didn't work. Commented Mar 22, 2020 at 12:22
  • @Moab Is it because of the simultaneous streams? If so, it should have an option in settings that allows changing how many download streams it uses.
    – JW0914
    Commented Mar 22, 2020 at 12:22
  • I'll experiment later; I hadn't used it in many years until last week, and I gave up too soon.
    – Moab
    Commented Mar 22, 2020 at 12:27
  • I assume the server only allows one download at a time from any given IP; at least, that's how I would stop mass downloading of my site if I had one.
    – Moab
    Commented Mar 22, 2020 at 12:30
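
If per-IP throttling like that is the blocker, the crawl can at least be slowed down so it no longer looks like mass downloading; --wait, --random-wait, and --limit-rate are standard GNU wget options:

# Pause roughly 2 s (randomized) between requests and cap bandwidth at 200 KB/s
wget -r -np -A pdf --wait=2 --random-wait --limit-rate=200k https://concours-maths-cpge.fr/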

Though I disagree with what I'm about to suggest, you could always use HTTrack Website Copier. The software crawls the given website and downloads everything unless you specify otherwise, for example by file type (zip, jpg, png, or pdf).

The reason I find this a bit of an overkill is that it keeps sending requests to the target website and might cause problems for the other side. But it does work in most cases.
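
For reference, HTTrack also ships as a command-line tool, and its documented "+pattern" filters can keep the download focused; a sketch (note that HTTrack does not execute JavaScript either, so it may hit the same wall as wget on these sites):

# Mirror into ./mirror, accept PDFs, limit to 2 simultaneous connections to be polite
httrack "https://concours-maths-cpge.fr/" -O ./mirror "+*.pdf" -c2 -v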

  • I agree with @FiddlingAway: it's a swat-a-fly-with-a-steamroller kind of tool, but it does what it advertises. Worth a try, especially if you're in a hurry.
    – user1019780
    Commented Mar 22, 2020 at 14:36
