Timeline for How to crawl using wget to download ONLY HTML files (ignore images, css, js)
Current License: CC BY-SA 3.0
10 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Apr 11, 2017 at 15:44 | answer added | Spir | | timeline score: 13 |
| Apr 28, 2014 at 4:57 | vote: accept | Nathan J.B. | | |
| Jan 31, 2014 at 18:00 | answer added | Nathan J.B. | | timeline score: 24 |
| Jan 31, 2014 at 17:51 | history: edited | Nathan J.B. | CC BY-SA 3.0 | moooore details |
| Jan 31, 2014 at 17:40 | review: Close votes | | | completed Feb 4, 2014 at 15:13 |
| Jan 31, 2014 at 17:36 | comment added | Nathan J.B. | | I've tried using `--accept=html`, but it downloads CSS files and then deletes them. I want to prevent them from ever downloading. A headers request is fine, though; e.g. I notice `Length: 558 [text/css]` on the files I don't want. If I could stop the request when the header doesn't return `text/html`, I'd be elated. |
| Jan 31, 2014 at 17:31 | history: edited | Nathan J.B. | CC BY-SA 3.0 | more info |
| Jan 31, 2014 at 17:26 | comment added | Ƭᴇcʜιᴇ007 | | Opposite: Exclude list of specific files in wget |
| Jan 31, 2014 at 17:12 | comment added | ernie | | What's the command you've tried so far? If the naming of files is consistent, you should be able to use the `-R` flag. Alternatively, you could use the `--ignore-tags` flag and ignore `script` and `img` tags. |
| Jan 31, 2014 at 17:12 | history: asked | Nathan J.B. | CC BY-SA 3.0 | |
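The flags discussed in the comments above can be combined into a rough sketch like the following (the URL and extension list are placeholders, not from the original thread). Note the asker's caveat: `-R`/`--reject` still fetches matching files and deletes them afterwards, whereas `--ignore-tags` stops wget from following the referencing tags in the first place, so the assets are never requested:

```shell
# Sketch assembled from the comment suggestions; https://example.com/ is a placeholder.
# --ignore-tags keeps wget from following <img>, <link>, and <script> references,
# so stylesheets, scripts, and images are never requested at all.
# --reject alone would still download matching files and then delete them.
wget --recursive --level=2 \
     --ignore-tags=img,link,script \
     --reject 'css,js,png,jpg,jpeg,gif' \
     https://example.com/
```

Whether `--ignore-tags` is enough on its own depends on the site: assets pulled in by other means (e.g. CSS `@import` or inline JavaScript) are not covered by tag filtering, which is why combining it with `--reject` is a reasonable belt-and-braces approach.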