Paul Bradshaw
Leanpub.com/scrapingforjournalists*
Scraping
in 60 mins
How do you scrape?
Aron Pilhofer, News Rewired
WYSIWYG tools: OutWit Hub, Apify
Browser extensions: Web Scraper, Grepsr
Google Sheets’ =IMPORT functions
Workbench Data, IFTTT, OpenRefine
Morph.io
Scraping tools
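To make concrete what a table-importing tool such as Google Sheets' =IMPORTHTML does, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not how Sheets itself works: the class name TableExtractor, the helper extract_rows and the sample HTML are all made up for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every table cell, grouped by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text pieces of the cell being read, or None outside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the rows of all tables found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample page, standing in for HTML you would download.
SAMPLE_HTML = """
<table>
  <tr><th>Festival</th><th>Year</th></tr>
  <tr><td>Example Fest</td><td>2019</td></tr>
</table>
"""

for row in extract_rows(SAMPLE_HTML):
    print(row)
```

The sketch treats all tables on a page as one list of rows and ignores nested tables, which is usually enough to check whether a page's data is reachable before choosing a tool.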
OutWit Hub
Chrome extensions:
Edit column >
Add column by fetching URLs…
https://ifttt.com/channels
https://apify.com/apify/google-search-scraper
https://app.workbenchdata.com/workflows/
app.workbenchdata.com/workflows/22852
/22850
/25739
https://onlinejournalismblog.com/2013/09/18/ethics-in-data-journalism-mass-data-gathering-scraping-foi-and-deception/
Robots.txt
http://www.tcij.org/robots.txt
Database rights
Data copyright
Terms & conditions
Legal considerations
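A site's robots.txt can be checked in code before any scraping starts. The following is a minimal sketch using Python's standard urllib.robotparser module. The rules in SAMPLE_RULES, the "my-scraper" user agent and the example.org URLs are invented for illustration and are not the contents of any real site's file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so this works on text you already hold.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented rules: every crawler is asked to stay out of /private/.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_RULES, "my-scraper", "http://example.org/private/page.html"))
print(is_allowed(SAMPLE_RULES, "my-scraper", "http://example.org/public/page.html"))
```

To test a live site instead, RobotFileParser also offers set_url() and read() to download the file. Passing robots.txt shows only what the site asks of crawlers; database rights, copyright and the terms and conditions still need checking separately.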
Scraping in 60 minutes (CIJ Summer School 2019)
https://moveplanner.zoopla.co.uk/terms-and-conditions
Treat like any source:
build in "too good to be true" (TGTBT) checks
Seek second sources
Seek right of reply/confirmation
Data is just a lead
http://www.storybench.org/to-scrape-or-not-to-scrape-the-technical-and-ethical-challenges-of-collecting-data-off-the-web/
https://www.mediawiki.org/wiki/API:Main_page
Does it have an API?
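When a site does offer an API, asking it for structured data is usually simpler and more stable than scraping its pages. A minimal sketch against the MediaWiki Action API, using only Python's standard library: the search term, the User-Agent string and the SAMPLE_RESPONSE data are assumptions made up for illustration, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Action API endpoint for mediawiki.org; other MediaWiki sites have their own api.php.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def build_search_url(term, limit=5):
    """Build a full-text search query URL that returns JSON."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_titles(payload):
    """Pull the page titles out of a decoded search response."""
    return [hit["title"] for hit in payload.get("query", {}).get("search", [])]


def search_titles(term, limit=5):
    """Run the search over the network (not called in this sketch)."""
    request = Request(
        build_search_url(term, limit),
        headers={"User-Agent": "scraping-demo/0.1 (hypothetical contact details)"},
    )
    with urlopen(request) as response:
        return extract_titles(json.load(response))


# Invented response in the shape the search module returns, for offline use.
SAMPLE_RESPONSE = {"query": {"search": [{"title": "API:Main page"}, {"title": "API:Search"}]}}

print(build_search_url("web scraping"))
print(extract_titles(SAMPLE_RESPONSE))
```

Keeping the URL building and the response parsing in separate functions means both can be checked without touching the network, and search_titles() is only needed once the query looks right.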
https://github.com/BBC-Data-Unit/music-festivals
Thank you.
