
I am using wget to download all images from a website, and it works fine, but it stores the original hierarchy of the site with all the subfolders, so the images are dotted around. Is there a way to just download all the images into a single folder? The syntax I'm using at the moment is:

wget -r -A jpeg,jpg,bmp,gif,png http://www.somedomain.com

7 Answers


Try this:

wget -nd -r -P /save/location -A jpeg,jpg,bmp,gif,png http://www.somedomain.com

Here is some more information:

-nd prevents the creation of a directory hierarchy (i.e. no directories).

-r enables recursive retrieval. See Recursive Download for more information.

-P sets the directory prefix where all files and directories are saved to.

-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma separated list (as seen above). See Types of Files for more information.
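For instance (the URL and save path below are placeholders, not taken from the question), the same flags can be combined with --no-parent so the crawl doesn't climb above the starting directory, and with -l inf to lift the default recursion depth:

wget -nd -r -l inf --no-parent -P ./images -A jpeg,jpg,bmp,gif,png http://www.somedomain.com/photos/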

  • This didn't actually work for me. My save location was "." and it copied the whole site hierarchy there.
    Commented Dec 7, 2012 at 9:57
  • @ButtleButkus It sounds like you need to mess around a bit more with the accept -A option; see the Wget documentation about types of files. Also, if you're downloading to the current directory, you can remove the directory prefix -P option. If you're downloading a single file type, such as only jpgs, use something like wget -r -A.jpg http://www.domain.com. Look at the advanced examples that the Wget documentation provides.
    – Jon
    Commented Dec 8, 2012 at 0:29
  • Adding -nd to the above makes it work. You can also specify multiple -A flags, such as -A "*foo*" -A "*bar*".
    – Yablargo
    Commented Dec 14, 2015 at 18:57
  • Don't forget to use --level=inf or --level=9999999999, because wget's default maximum recursion depth of 5 will otherwise cut the job short.
    – user619271
    Commented Oct 3, 2017 at 10:46
  • This is a useful answer, but I think it does not handle duplicate file names when directories get squashed. Would there be a way to save files as -1.jpg, -2.jpg and so forth? Here they get downloaded as .jpg.1, and then deleted because that's not part of the whitelist. [A possible workaround is sketched after this list.]
    Commented Nov 17, 2021 at 21:21
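Regarding the duplicate-filename comment above, here is a sketch of one possible workaround; it is not part of the original answer, and the ./tree and ./flat paths are arbitrary placeholders. The idea is to keep wget's directory tree (no -nd) and then flatten it manually, renaming on collision:

# download with the hierarchy intact, then flatten by hand
wget -r -P ./tree -A jpeg,jpg,bmp,gif,png http://www.somedomain.com
mkdir -p ./flat
find ./tree -type f | while read -r f; do
    name=$(basename "$f")
    dest="./flat/$name"
    n=1
    # on a name clash, insert -1, -2, ... before the extension
    while [ -e "$dest" ]; do
        dest="./flat/${name%.*}-$n.${name##*.}"
        n=$((n + 1))
    done
    mv "$f" "$dest"
done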
wget -nd -r -l 2 -A jpg,jpeg,png,gif http://t.co
  • -nd: no directories (save all files to the current directory; -P directory changes the target directory)
  • -r -l 2: recursive level 2
  • -A: accepted extensions
wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off example.tumblr.com/page/{1..2}
  • -H: span hosts (wget doesn't download files from different domains or subdomains by default)
  • -p: page requisites (includes resources like images on each page)
  • -e robots=off: execute the command robots=off as if it were part of the .wgetrc file. This turns off the robot exclusion, which means you ignore robots.txt and the robot meta tags (you should know the implications this comes with, take care).

Example: Get all .jpg files from an example directory listing:

$ wget -nd -r -l 1 -A jpg http://example.com/listing/
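When -H is in play, it is usually worth limiting which hosts may be followed; -D (--domains) takes a comma-separated whitelist. A sketch (the domains here are placeholders):

wget -nd -H -D example.com,cdn.example.com -p -A jpg,jpeg,png,gif -e robots=off http://example.com/page/1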

I wrote a shellscript that solves this problem for multiple websites: https://github.com/eduardschaeli/wget-image-scraper

(Scrapes images from a list of urls with wget)
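The linked script is not reproduced here, but a rough sketch of the same idea, assuming a urls.txt with one URL per line and an ./images target directory, could look like this:

while read -r url; do
    # flatten each site's images into a single shared directory
    wget -nd -r -l 1 -P ./images -A jpg,jpeg,png,gif -e robots=off "$url"
done < urls.txt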

  • Worked great. Thanks.
    Commented Oct 25, 2016 at 14:37

Try this one:

wget -nd -r -P /save/location/ -A jpeg,jpg,bmp,gif,png http://www.domain.com

and wait while it deletes all the extra files that don't match the accept list (they are downloaded first, then removed).

  • It is not working for me: wget -nd -r -P /Users/duraiamuthan/Downloads/images/ -A jpeg,jpg,bmp,gif,png http://www.forbes.com/profile/mark-zuckerberg/
    – Vivo
    Commented Nov 30, 2014 at 13:35

According to the man page the -P flag is:

-P prefix --directory-prefix=prefix Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is . (the current directory).

This means that it only specifies the destination where the directory tree will be saved; it does not flatten the tree into just one directory. As mentioned before, the -nd flag is what actually does that.

@Jon In the future it would be beneficial to describe what each flag does, so we understand how the command works.
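To illustrate the difference (the paths and domain are placeholders): with -P alone, wget still recreates the remote tree under the prefix; adding -nd drops every file directly into it.

wget -r -P /save/location -A jpg http://www.domain.com      # saved as /save/location/www.domain.com/.../photo.jpg
wget -nd -r -P /save/location -A jpg http://www.domain.com   # saved as /save/location/photo.jpg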


The proposed solutions are perfect for downloading the images, and if it is enough for you to save all the files in the directory you are using, you're done. But if you want to save all the images in a specified directory without reproducing the entire hierarchical tree of the site, try adding "cut-dirs" to the line proposed by Jon.

wget -r -P /save/location -A jpeg,jpg,bmp,gif,png http://www.boia.de --cut-dirs=1 --cut-dirs=2 --cut-dirs=3

In this case, cut-dirs will prevent wget from creating sub-directories down to the 3rd level of depth in the website's hierarchical tree, saving all the files in the directory you specified. You can use 'cut-dirs' with higher numbers if you are dealing with sites with a deep structure.
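Note that --cut-dirs only strips remote path components; the host-name directory itself is removed with -nH. A sketch combining the two (the depth of 3 and the URL are placeholders):

wget -r -nH --cut-dirs=3 -P /save/location -A jpeg,jpg,bmp,gif,png http://www.somedomain.com/a/b/c/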


The wget utility retrieves files from the World Wide Web using widely used protocols like HTTP, HTTPS and FTP. It is a freely available package licensed under the GNU GPL. It can be installed on any Unix-like operating system, as well as on Windows and macOS. It's a non-interactive command line tool. The main feature of wget is its robustness: it's designed to keep working over slow or unstable network connections, automatically resumes a download where it left off after a network problem, downloads files recursively, and keeps trying until a file has been retrieved completely.

Install wget on a Linux machine: sudo apt-get install wget

Create a folder where you want to download the files: sudo mkdir myimages && cd myimages

Right-click on the webpage; for example, if you want an image's location, right-click on the image and copy the image location. If there are multiple images, then follow the below:

If there are 20 images to download from the web all at once, the range starts from 0 to 19.

wget http://joindiaspora.com/img{0..19}.jpg
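If one of those downloads is interrupted, the resume-and-retry behaviour described above can be made explicit with -c (continue), -t (tries) and -T (timeout); the values here are only examples:

wget -c -t 10 -T 30 http://joindiaspora.com/img{0..19}.jpg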

  • Your answer explains what wget is and how to use it to download sequentially-numbered images... neither of which is related to the original question.
    – Alastair
    Commented Feb 19, 2014 at 17:30
