
I need to download all files in a domain folder, say https://example.com/folder/subfolder. The files in the subfolder don't follow a unique increment; the file names are random strings. I want to download all the files in the subfolder using wget or any other method. Please give details.

I tried the answer here. It only downloads the index.html file. I tried another option in that answer with the --reject option, but it doesn't download anything.

  • Does this answer your question? Download ALL Folders, SubFolders, and Files using Wget
    – harrymc
    Commented Jan 2, 2022 at 8:37
  • I tried that; it only downloads the index.html file. I tried another option in that answer with the --reject option, but it doesn't download anything.
    – J C
    Commented Jan 2, 2022 at 8:44
  • 1
    Does each index.html file have a list of all the files at each folder level? I would suggest using Powershell, to read that file to obtain the file names and create the urls to download. Commented Jan 2, 2022 at 9:26
  • The page contains a "Load more" option, and the index file only has entries up to the "Load more" button link, which has no href attribute.
    – J C
    Commented Jan 2, 2022 at 9:33

1 Answer


As far as I am aware, wget only works with links that:

  • Explicitly have a standard href attribute.

  • Are present in a given HTML document. The server generates this document, so not every technically available file is necessarily listed for wget to download.

Furthermore, you should probably look at the raw page source (e.g. in your browser). If the page builds its links with JavaScript, you may be out of luck, as wget does not execute JavaScript.

If a link is listed in the raw HTML, but without a standard href attribute, you can still parse the page for links, just not with wget. You would likely need to write your own script with something like Windows PowerShell or Python (possibly with requests) and BeautifulSoup.
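As a rough sketch of that approach, the snippet below scans an index page for file URLs stored outside standard href attributes, using only Python's standard library (no requests or BeautifulSoup needed for the parsing step). The attribute names (data-url, data-src) and the sample HTML are assumptions for illustration; you would adjust them to match the real page's markup.

```python
# Sketch (not the asker's exact page): collect file URLs held in non-href
# attributes such as data-url, using only the standard library.
from html.parser import HTMLParser
from urllib.parse import urljoin

class NonHrefLinkParser(HTMLParser):
    """Collect URLs stored in non-href attributes (attribute names assumed)."""
    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("data-url", "data-src") and value:
                # Resolve relative paths against the site root
                self.links.append(urljoin(self.base, value))

# Hypothetical index-page fragment, standing in for the real download
sample = """
<div class="file" data-url="/folder/subfolder/a1b2c3.pdf"></div>
<div class="file" data-url="/folder/subfolder/x9y8z7.pdf"></div>
"""
parser = NonHrefLinkParser("https://example.com")
parser.feed(sample)
print(parser.links)
```

Once the list is verified against the real page, each URL could be fetched with urllib.request.urlretrieve or any HTTP client.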


Note that in some rare cases, if the links are generated entirely by JavaScript, you may even need Selenium to save a fully rendered page before processing it for file links. Python has a Selenium package, and I have personally had good luck with the stand-alone "Marmaduke" builds (zip files) of Ungoogled Chromium from Woolyss for browser automation.
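A minimal sketch of that Selenium route might look like the following. The marker string and the regex-based link scan are assumptions for illustration; the fetch helper requires `pip install selenium` plus a Chrome/Chromium binary (with the Marmaduke builds, point Selenium at the unzipped binary via ChromeOptions.binary_location).

```python
import re

def extract_file_links(html, marker="/folder/subfolder/"):
    """Pure-string scan: collect quoted absolute URLs containing the marker.

    The marker and regex are assumptions; adjust them to whatever the real
    rendered page actually contains.
    """
    pattern = r'["\'](https?://[^"\']*' + re.escape(marker) + r'[^"\']*)["\']'
    return sorted(set(re.findall(pattern, html)))

def fetch_rendered_html(url):
    """Load `url` in headless Chrome and return the post-JavaScript HTML."""
    from selenium import webdriver  # imported lazily; needs selenium installed
    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)
        return driver.page_source  # HTML after JavaScript has run
    finally:
        driver.quit()
```

Usage would then be extract_file_links(fetch_rendered_html("https://example.com/folder/subfolder")), followed by downloading each returned URL.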

