Like many others, I have a process for bookmarking web pages to read later. My requirements for web page bookmarking are:
- Ability to bookmark pages must be available from all (within reason) platforms - PC/browser, mobile device, etc.
- Bookmarks must be centrally stored (implicit from the first requirement) so that I can read the bookmarks from anywhere/any device
- Full text of web pages must be stored
Bonus features would be:
- Bookmarks and page content should be full text searchable
- Maintain an archive indefinitely
- Distinguish between what's read vs. unread
- Bookmarked page content is cleaned up, e.g. ads eliminated, unnecessary html removed, pages better formatted for reading
My current process (which addresses most of these requirements) is as follows:
- I set up a Gmail account with 2 labels, "Bookmarks Unread" and "Bookmarks Read"
- Gmail filters are set up so that, depending on the form of the address (using Gmail's '+string' functionality in addresses), the incoming bookmark gets labeled appropriately
- On each of my browsers/devices, I have an address book entry for MyGmailAccount+BookmarksUnread@gmail.com and MyGmailAccount+BookmarksRead@gmail.com.
- If I want to clean up the page content, I use the Readability bookmarklet which does a great job of giving me the essential content only
- Anywhere I have Firefox, I use the Send Page by Email extension which, with 2 clicks, allows me to send the cleaned-up Readability page URL and content to one of the above email addresses.
- Where I don't have Firefox (e.g. iPhone or other mobile device) I use the native ability to send the current link via email (most/all apps have them, including the browser, RSS readers, NYTimes, etc.). In most cases (unless it's built into the particular app), this won't include the page body.
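The '+string' routing in the steps above can be sketched in a few lines. This is only an illustration of the logic the Gmail filters apply, not how Gmail implements it; the function name `label_for_recipient` and the `TAG_TO_LABEL` table are made up for the example.

```python
from typing import Optional

# Hypothetical table: the '+string' part of the address -> the Gmail label
# that the corresponding filter would apply.
TAG_TO_LABEL = {
    "bookmarksunread": "Bookmarks Unread",
    "bookmarksread": "Bookmarks Read",
}


def label_for_recipient(address: str) -> Optional[str]:
    """Return the label implied by the '+string' in an address, or None."""
    local_part, _, _domain = address.partition("@")
    _user, plus, tag = local_part.partition("+")
    if not plus:
        # No '+string' present, so no bookmark filter would match.
        return None
    # Compare case-insensitively, since the address may be typed either way.
    return TAG_TO_LABEL.get(tag.lower())
```

For example, `label_for_recipient("MyGmailAccount+BookmarksUnread@gmail.com")` gives "Bookmarks Unread", while a plain `MyGmailAccount@gmail.com` gives `None` and the message would land in the inbox unlabeled.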
The process is almost perfect. I've got the central access and ubiquitous access of Gmail as the storage mechanism, full text searchability (due to Gmail, but of course only for the URLs I send from that Firefox extension), a cleaned up page due to Readability, ability to read offline (assuming I use an IMAP client against Gmail) and permanent archiving of content, including what's been read vs. unread.
The missing pieces are:
- The Send Page by Email Firefox extension seems to send only the first X bytes (or some portion) of a web page. UPDATE: I was incorrect. The limitation is that sendemail.exe (which the extension uses to send the email) has a 16k limit on the body text passed in.
- Where I don't have Firefox, I can only send the link, so no full text search at all in those cases.
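One possible way to close both gaps is a small script that builds the bookmark email itself, so the full page text always ends up in the message body. The following is a minimal sketch using only the Python standard library; the function names (`html_to_text`, `build_bookmark_email`) are made up for the example, and the text extraction is a crude stand-in for Readability, not an equivalent of it.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML page to its text, one fragment per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_bookmark_email(url, title, html, sender, recipient):
    """Build a message whose body is the URL followed by the page text."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient  # e.g. the '+BookmarksUnread' address
    msg["Subject"] = title
    msg.set_content(url + "\n\n" + html_to_text(html))
    return msg
```

The resulting message could then be handed to `smtplib` (for instance `smtplib.SMTP_SSL(...).send_message(msg)`), assuming the account allows SMTP logins. Because the body is assembled in the script rather than passed to sendemail.exe, the 16k limit would not apply on this path.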
Instapaper looks interesting because it does page clean-up, but it's not clear to me that it actually stores the page content or is full text searchable. In addition, they state in their FAQ that their storage isn't meant to be long term.
Thoughts on addressing the gaps and improving this process further?