24

Academic blogs and websites are emerging as an important component of academic discourse.

However, I often lack confidence that the content will remain accessible in the short term (e.g., 5 years) let alone the longer term (10, 20, 50, 100 years).

An important function of journals is to archive the content and provide a stable citation system.

In particular, I'm worried about:

  • academics who change employment where the site is hosted by the previous employer
  • academics who die or lose interest in their content (e.g., domain names lapsing; site hosting fees ending)
  • Internet services that close down

Question: What can an academic blogger do to ensure that their blog content remains archived and accessible in the longer term?

1

4 Answers

10

Digital preservation is an evolving area, and preserving websites over the long term is one of its most problematic parts. A few of the challenges are: the dynamic nature of contemporary websites (especially blogs, where content is updated regularly and there are interactive components such as commenting); hyperlinking (links and images will eventually break, and anything you point to outside your own site is out of your control); and the instability of Web formats (websites can look better in one browser than another even today, let alone over time).

There are, however, several things you can do to help preserve your website:

  1. Back up: keep your content stored in more than one location (on your server, on your hard drive, on an external hard drive, in the cloud, etc.) and don't use an external service as your primary storage location, as any service can collapse at any time (GeoCities was a huge service when it was shuttered);
  2. Keep up to date with changes in file formats and browser software: when browsers are updated for HTML 5, 6, or whatever may replace HTML in the future, will they be backwards compatible enough to display the blog you're authoring today, or will you need to migrate your website to a contemporary format?;
  3. Utilize pockets of expertise on campus: archivists on many campuses have been working towards digital preservation solutions, and your university's archivist (especially at larger universities) may be able to provide solutions for the long-term preservation of your website; and
  4. Consider normalizing your website to a preservation format: although creating a copy of your website as a PDF may mean losses in functionality (in terms of interactivity), it is a way to preserve the content and appearance of the site in a file format that is considered a safer bet for long-term preservation.
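
As a rough illustration of point 1, here is a minimal Python sketch that copies an exported site into several backup locations and checks each copy against a checksum. The function names (`sha256_of`, `mirror`) and the idea that your blog exists as a folder of exported files are my assumptions for the example, not part of any particular blogging platform.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def mirror(source: Path, destinations: Iterable[Path]) -> int:
    """Copy every file under `source` into each destination and verify it.

    Destinations are assumed to lie outside `source`.
    Returns the number of file copies that passed the checksum comparison.
    """
    # Collect the file list once, before anything is written.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    verified = 0
    for dest_root in destinations:
        for src_file in files:
            target = dest_root / src_file.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, target)  # copies contents and timestamps
            if sha256_of(src_file) != sha256_of(target):
                raise IOError(f"Checksum mismatch for {target}")
            verified += 1
    return verified
```

You would call something like `mirror(Path("blog-export"), [Path("/mnt/external/blog"), Path("cloud-sync/blog")])`, where all of those paths are placeholders for wherever your own copies live.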

The Library of Congress also provides some tips on how to design preservable websites, including following available web and accessibility standards, embedding metadata, and maintaining stable URLs.

Taylor, N. (2012, February 6). Designing Preservable Websites, Redux. Retrieved May 23, 2012, from http://blogs.loc.gov/digitalpreservation/2012/02/designing-preservable-websites-redux/

1
  • Just wanted to suggest RSS as an archive format for blogs. Unlike pdf, the text content of RSS is readable by computers, which could be important if one wanted to republish the content at a later time...
    – Nico Burns
    Commented Jun 28, 2014 at 21:34
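
To illustrate the point about RSS being machine-readable, here is a small sketch using only Python's standard library. The feed below is a made-up sample, and `extract_posts` is a hypothetical helper name; a real feed may carry extra elements (for example full post bodies in a namespaced element) that this ignores.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Made-up RSS 2.0 feed used purely for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.org/first</link>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.org/second</link>
      <description>More text.</description>
    </item>
  </channel>
</rss>
"""


def extract_posts(rss_text: str) -> List[Dict[str, str]]:
    """Pull title, link and description out of every <item> in the feed."""
    root = ET.fromstring(rss_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return posts


if __name__ == "__main__":
    for post in extract_posts(SAMPLE_FEED):
        print(post["title"], "->", post["link"])
```

Once the posts are plain dictionaries like this, republishing them somewhere else is mostly a matter of writing them back out in the new format.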
4
  1. keep up-to-date backups so that you've got copies in at least two geographical locations (e.g. one at home, one at work) of everything you want to keep.
  2. route everything via your own personal domain: so that even when things are hosted elsewhere (current university website, pre-print archives, whatever), the URL people see and bookmark is the one on your own personal domain. That way, their bookmarks will still work when you change your affiliation away from your current university.
  3. pick good URIs, and then stick to them.
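
A sketch of what point 2 can look like in practice: your personal domain keeps a small table mapping stable paths to wherever the content currently lives, and only the table changes when you move. The hostnames and paths below are invented placeholders, and `resolve` is a hypothetical helper; in a real setup the lookup would sit behind whatever issues HTTP redirects on your domain.

```python
from typing import Dict, Optional

# Stable path on your own domain -> current hosting location.
# All targets here are placeholders, not real services.
REDIRECTS: Dict[str, str] = {
    "/blog": "https://current-university.example/~me/blog",
    "/papers": "https://preprints.example/author/me",
}


def resolve(path: str, redirects: Dict[str, str] = REDIRECTS) -> Optional[str]:
    """Map a stable path to its current target, or None if unknown.

    The longest matching prefix wins, and a prefix only matches on a
    whole path segment (so "/blogroll" does not match "/blog").
    """
    for prefix in sorted(redirects, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return redirects[prefix] + path[len(prefix):]
    return None


if __name__ == "__main__":
    print(resolve("/blog/2012/archiving"))
```

When you change affiliation, you edit the right-hand side of `REDIRECTS` and every bookmark to your own domain keeps working.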
2

I believe you have a number of options, which I'm stating in no particular order:

  1. You can use the Wayback Machine, which strives to store snapshots of web pages across time. You can also use the more personalized Archive-It service, which lets you manage your own collection while sharing it with the public; I think this is mostly used by institutions.

  2. Alternatively, you can host your own blog on a licensed domain name, where you've prepaid the fees for enough years in advance.
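
For option 1, a hedged sketch of how you might check and request snapshots yourself instead of waiting for the crawler. It assumes the Internet Archive's "Save Page Now" address (`https://web.archive.org/save/` followed by the page URL) and its availability endpoint (`https://archive.org/wayback/available?url=...`) behave as I understand them; the helper names and the sample response are made up for illustration, and the code itself makes no network requests.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def save_request_url(page_url: str) -> str:
    """Address to visit to ask the Wayback Machine to capture a page."""
    return "https://web.archive.org/save/" + page_url


def availability_query_url(page_url: str) -> str:
    """Address to query to see whether a snapshot of the page exists."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Extract the closest snapshot URL from an availability response.

    Assumes the JSON shape {"archived_snapshots": {"closest": {...}}};
    returns None when no available snapshot is reported.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    # Illustrative response only, not real archive data.
    sample = '{"archived_snapshots": {}}'
    print(availability_query_url("https://example.org/post-1"))
    print(closest_snapshot(sample))
```

Looping over your own list of post URLs with these helpers would show which pages have no snapshot yet, and those are the ones to submit for saving.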

Lastly, a very simple suggestion: if the content of the website is tending towards the academic quality/nature of a book, why not publish it as a short collection of essays, which you can then disseminate freely through open web libraries etc.?

3
  • 5
    > WordPress.com/Blogspot, which I don't believe are going to shutdown anytime in future.

    You mean, not like Myspace or GeoCities? ;) Anyone pretending to know what the Internet is going to be in 10, 20, 50, 100 years is a very good psychic :)
    – user102
    Commented May 23, 2012 at 8:02
  • 1
    Yep, I guess I haven't been on the net long enough to realize that nothing is infallible! Edited answer accordingly...
    – TCSGrad
    Commented May 23, 2012 at 8:10
  • 3
    (a) Wayback Machine: my blog has around 200 unique pages, but only 20 of them are archived on the Wayback Machine, so it appears to archive content inadequately; (b) paying ahead of time for domain and site presumably has limits (a hosting company could shut down, and how many years can you easily pay in advance: 10? 20?); (c) the book idea seems like a viable strategy for some content.
    Commented May 23, 2012 at 11:34
2

Unfortunately, I guess it's not possible to rely on any private service to ensure long-term accessibility, since any service can shut down (in the same way that any book publisher can disappear, making it impossible to print new editions of old books).

For your own blog, you basically need to keep a local copy in addition to the public version. For instance, if you have a WordPress blog, you can back up every new article locally as you publish it. In my case, I have a local version of my own website running on my personal computer (with a web server and database), and I just sync my local copy with the server it's hosted on, so that there are always two versions of it. The likelihood of losing both at the same time is low enough to make it safe. If you have a decent Internet connection, you can also run your own server at home and back it up online, so that there are always at least two versions in two different places.
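
If you keep two copies like this, it helps to check now and then that they still agree. Here is a minimal sketch, assuming both copies are reachable as ordinary directories (for example the server copy downloaded or mounted locally); `manifest` and `diff_copies` are names I made up for the example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_copies(local: Path, server: Path) -> Dict[str, List[str]]:
    """Report files missing from either copy or differing between them."""
    a, b = manifest(local), manifest(server)
    return {
        "only_local": sorted(a.keys() - b.keys()),
        "only_server": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

An empty report means the two versions match; anything listed is a file to sync before you rely on either copy as a backup.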

For other people's blogs, that's why it's important, when you cite a blog, not only to put a link but also to include the text you're quoting, or some text you find interesting. Then, if the original source disappears, a copy still exists in many other places. That's the reason why, on the SE network, you are asked not to post only links but also (a description of) the content of the link. In other words, copyright problems aside, it's a good thing to copy!
