
I'm trying to download all the PDF files from an entire website (two sites, actually):

They're in French and they're giving me trouble. I tried using wget by running:

wget -A pdf -m -p -E -k -K -np https://concours-maths-cpge.fr/
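
A variant worth trying in case the server's robots.txt excludes crawlers or the server refuses wget's default User-Agent (both added flags are standard GNU wget options; this is a sketch, not a confirmed fix):

# -e robots=off tells wget to ignore robots.txt; --user-agent masquerades as a browser
wget -A pdf -m -p -E -k -K -np -e robots=off --user-agent="Mozilla/5.0" https://concours-maths-cpge.fr/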

I also tried using lynx by following a guide.

I tried using other solutions, like DownThemAll (a Firefox add-on).

None of them work; it's as if there are no PDF files on these two websites.

Any help is appreciated.

I dual-boot Manjaro and Windows 10, so an OS-dependent solution is no problem.

  • The main page looks to be dynamically generated by JavaScript and you need something far smarter than a simple HTML parser to deal with it.
    – Mokubai
    Commented Mar 22, 2020 at 12:18
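
If the page really is built by JavaScript, neither wget nor a plain crawler will ever see the links. One workaround is to dump the JavaScript-rendered DOM with a headless browser and feed any PDF URLs it contains to wget. A rough sketch, assuming a Chromium binary is on the PATH and that the rendered page contains absolute links ending in .pdf:

# Render the page (executing its JavaScript), extract PDF URLs, download them
chromium --headless --dump-dom 'https://concours-maths-cpge.fr/' \
  | grep -oE 'https?://[^"]+\.pdf' \
  | sort -u \
  | wget -i - -P pdfs/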

2 Answers


Have you tried this? https://www.freedownloadmanager.org/fr/

It seems that it can download all (freely) available files from a website, provided you have the proper credentials. You need to authenticate first, I presume.
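
If authentication is indeed the hurdle, wget can also send credentials directly; --user/--password and --load-cookies are standard GNU wget options (USER, PASS, and cookies.txt below are placeholders, not values from the question):

# HTTP credentials:
wget --user=USER --password=PASS -r -np -A pdf https://concours-maths-cpge.fr/
# Or reuse a logged-in browser session exported as a Netscape-format cookies.txt:
wget --load-cookies cookies.txt -r -np -A pdf https://concours-maths-cpge.fr/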

  • Most sites block that tool from working (simultaneous downloads); I tried it last week and got a website access-denied error.
    – Moab
    Commented Mar 22, 2020 at 12:20
  • I don't have that particular software installed, but I tried using Internet Download Manager Grabber and JDownloader2; both failed, so I guess another download manager probably won't work either, but I'll try it. EDIT: Yep, didn't work. Commented Mar 22, 2020 at 12:22
  • @Moab Is it because of the simultaneous streams? If so, it should have an option in settings that allows changing how many download streams it uses.
    – JW0914
    Commented Mar 22, 2020 at 12:22
  • I'll experiment later; I hadn't used it in many years until last week, and I gave up too soon.
    – Moab
    Commented Mar 22, 2020 at 12:27
  • I assume the server only allows one download at a time from any given IP; at least, that's how I would stop mass downloading of my site if I had one.
    – Moab
    Commented Mar 22, 2020 at 12:30
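
If per-IP throttling like that is the blocker, the crawl can at least be slowed down so it no longer looks like mass downloading; --wait, --random-wait, and --limit-rate are standard GNU wget options:

# Pause roughly 2 s (randomized) between requests and cap bandwidth at 200 KB/s
wget -r -np -A pdf --wait=2 --random-wait --limit-rate=200k https://concours-maths-cpge.fr/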

Though I disagree with what I'm about to suggest, you could always use HTTrack Website Copier. The software crawls the given website and downloads everything unless you specify otherwise, for example by file type (zip, jpg, png, or pdf).

The reason I find this a bit of an overkill is that it keeps sending requests to the target website and might cause problems for the other side. But it does work in most cases.
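
For reference, HTTrack also ships as a command-line tool, and its documented "+pattern" filters can keep the download focused; a sketch (note that HTTrack does not execute JavaScript either, so it may hit the same wall as wget on these sites):

# Mirror into ./mirror, accept PDFs, limit to 2 simultaneous connections to be polite
httrack "https://concours-maths-cpge.fr/" -O ./mirror "+*.pdf" -c2 -v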

  • I agree with @FiddlingAway: it's a swat-a-fly-with-a-steamroller kind of tool, but it does what it advertises. Worth a try, especially if you're in a hurry.
    – user1019780
    Commented Mar 22, 2020 at 14:36
