wget is a command-line (CLI) tool designed for exactly this.
- Download wget. The official website only provides source code, so you'll probably want a third-party Windows build (latest version, the EXE, and most likely the x64 one).
- Go to the folder where you downloaded wget.exe and [shift] + [right click] on the background of the folder. Then click "Open PowerShell Window Here".
- Now we can run commands. For example, type
.\wget.exe --help
and press enter. This should print a bunch of text about how to use wget.
Before we keep going, it's important to understand why "download all files in a webpage's directory" is kind of impossible, and how wget manages to do it anyway. On your local computer, you can open a folder and see all the files inside it. There is an HTTP extension for this (WebDAV), but almost every web server has it turned off. Some web servers have a sort-of alternative: they will automatically generate a directory index, which is just a normal HTML page containing links to every file in the directory. If the server in question does this for you, great, but it might not, for a few reasons (a quick way to check is shown right after this list):
- The server admin has them turned off (ex: with the Options -Indexes setting in Apache)
- The folder you are interested in already has a default page set (so you see that instead of the directory listing)
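By the way, you can usually tell whether a server generates an index for a particular directory just by requesting the directory URL itself: if indexes are on, you get back an HTML page of links (wget saves it as index.html by default). The URL below is only a placeholder for the directory you actually care about:
.\wget.exe "https://www.example.com/foo/"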
Ok, so we've established that we need to know the names of files to download them, and we don't have a way to just list all the files in a directory. wget can do something clever, though. It can start at a given page, find everything that page references (images, links, etc.), then find everything those pages reference, and so on. This process is referred to as "crawling" a website, and it's how search engines find things. A nice side effect of how this works is that if the server you are working with does happen to have directory indexes turned on, wget can make use of those (since an index is just a page of links).
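If you want to preview what a crawl would touch before actually downloading anything, wget has a --spider mode that walks the links without saving the files. A minimal dry-run sketch (the URL and the depth of 2 are just example values):
.\wget.exe "https://www.example.com/foo/example.html" --spider --recursive --level=2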
Now we have to write our wget command. wget has a lot of options, because there are a lot of trade-offs when crawling a website. If you crawl too fast, you might overwhelm the server and get banned. If you don't have any conditions for where to stop, you might wind up trying to download the whole internet (though wget does have default settings to prevent that). A politer, rate-limited variant is shown after the breakdown below.
.\wget.exe "https://www.example.com/foo/example.html" --recursive --no-parent --level=5
Breaking this down:
- start at https://www.example.com/foo/example.html
- --recursive - do the crawling thing
- --no-parent - never download (or even look at) a page outside https://www.example.com/foo/
- --level=5 - max out at 5 levels of links deep
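About the "crawl too fast and get banned" trade-off mentioned earlier: wget has built-in throttling flags. Here's a variant of the same command with a pause between requests and a bandwidth cap (the specific numbers are just examples of something reasonably polite, not values the server asked for):
.\wget.exe "https://www.example.com/foo/example.html" --recursive --no-parent --level=5 --wait=1 --random-wait --limit-rate=200k
--wait=1 waits a second between requests, --random-wait varies that delay a bit, and --limit-rate=200k caps the download speed at roughly 200 KB/s.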
That works pretty well if everything is in foo. It sounds like your starting point (example.html) might not be in foo, though. The simple (but inefficient) option is to just let wget download the whole site and delete the directories you don't want afterwards. By default, wget won't look at anything outside the domain (www.example.com) you give it, so this might work well enough for you:
.\wget.exe "https://www.example.com/example.html" --recursive --level=5