Professors' personal web pages hosted by their institutions are crucial sources of information in two ways:

  1. to disseminate useful and practical but not publishable information (especially in systems programming)

  2. to disseminate supplemental data, source code, and software. Google killed Google Code, and people are apprehensive about the long-term fate of GitHub since it was acquired by Microsoft. As such, a lot of people put source code on their university web pages. Now we have Zenodo and FigShare, but those are relatively new.

    • E.g., I wanted to do an experiment involving an older system (published ~10 years ago), and according to the author, its source code was deleted by their old institution when they left to teach at a different university. Reviewers will probably object that my evaluation is incomplete since it lacks the older system, but there is no extant copy of that system to compare against!

I have three questions:

  1. Is it normal to delete these pages when the professor moves or retires? I can't find it stated anywhere in official university policy. What's the policy at your institution?

  2. What was the point of hosting that in the first place if it is going to be deleted eventually? If the purpose was to disseminate information, the fact that it will disappear so quickly defeats the purpose of having it at all. In the long run, it just leads to the creation of dead links.

  3. What are the main constraints the university is facing that cause them to stop hosting that data? If the issue is cost, how does hosting ~100KiB each for even as many as 10,000 past and present faculty (total ~1GiB) become prohibitive for a university?
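
As a rough check of that estimate, here is a minimal sketch of the arithmetic. It assumes every page takes about 100KiB, which is my own ballpark figure rather than a measurement, and the function name is purely illustrative.

```python
KIB = 1024        # bytes in one KiB
GIB = 1024 ** 3   # bytes in one GiB


def total_hosting_bytes(faculty_count: int, kib_per_page: int) -> int:
    """Total storage in bytes if every faculty page has the same assumed size."""
    return faculty_count * kib_per_page * KIB


# Assumed figures from the question: 10,000 faculty at ~100KiB each.
total = total_hosting_bytes(10_000, 100)
print(f"{total} bytes = {total / GIB:.2f} GiB")
```

Under those assumptions the total comes out slightly under 1GiB, which is consistent with the rough figure above.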

In my experience there are two common situations.

  1. The university's provision for web-hosting is centrally managed, and tied to the user's university-wide access credentials. These are themselves tied to HR's employee database. Once someone leaves the university, this automatically triggers processes that lead to their computer accounts being deactivated. A consequence of this is that faculty websites disappear. Probably nobody ever consciously made the decision that this is what should happen - it just arises naturally from the way the web hosting is implemented.

  2. Faculty websites are hosted on an ad hoc basis, using a server run by a department or similar. Typically such websites persist for longer. However, at some point the system dies and is replaced by something newer. It is left to individual users to port their website from the old system to the new system, and so websites that no longer have an active owner disappear by default. Again, this is largely driven by practicalities - the alternative would be paying someone to deal with it, and no single person in the organisation cares enough to find the money for that.

There are also a number of practical, legal and security concerns associated with hosting unmaintained content. Information may be out of date and misleading (e.g., course syllabi that are no longer correct) or legally questionable (e.g. due to changes in privacy laws). There is also the risk that such websites (ab)use components or services that have not been updated for a significant period, and are known attack vectors. All in all, from the university's corporate perspective there is little sense in retaining a site that nobody is willing to take responsibility for.

A university website is not an archive, and the people working at the IT department are not professional archivists. If you want to seriously think about preserving knowledge from people no longer there (or no longer interested in maintaining it), you need a different place to store it and different people to maintain it and make it accessible.

For example, in my field it is important to preserve data, and now there is an international network of national level data archives that professionally preserve and maintain accessibility of that data. This only happened after each university tried to do this on its own, with very mixed results. I remember a story of data stored on punch-cards in the university basement. Punch-cards are cardboard and mice will eat bits of cardboard. You can imagine the rest of the story.

Why would you not delete the webpages of people who are no longer in your employ?

  • Hosting their webpages makes it seem like they still work at your institution, but they are no longer in your employ.
  • By extension, you save the time of people looking for the professor, since they won't, e.g., compose emails to the professor's @myinstitution.edu email address only to have them bounce.
  • You can't easily modify those webpages, because they're personal webpages.
  • Even if the source code is worth hosting, it might not be licensed for you to distribute. The professor could (and arguably should) bring it with them to wherever they are now.
  • "What was the point of hosting that in the first place if it is going to be deleted eventually?" In the long run we are all dead, C++/Python/whatever language the code is written in will become a historical relic, and the Sun will go nova, so what was the point of hosting anything in the first place?

Personal webpages for professors have their roots in smart individuals fiddling with the Internet well before it became standard for universities and businesses to have their own massive webpages. Most were home-spun and might even have been written in plain HTML.

This is not a good archival system.

In the last 15 years, universities have built out giant websites and systems. As those sites have become more sophisticated, a faculty member is unlikely to get a page like www.podunk.edu/faculty/JSmith anymore.

I have never run across a webpage that has been deleted, but it does not surprise me that some IT departments have begun to trim these leftover webpages, which date from 20 years and probably two or three massive system overhauls ago. (It could even have been an accident in 2018 that erased emeritus Professor Smith's webpage, last updated in 2009, without anyone noticing.)

While yes, universities are repositories of knowledge, it may just not have occurred to whoever was in charge that anything like that might be lost if they cleaned up those old webpages.

What was the point of hosting that in the first place if it is going to be deleted eventually?

Just because something might be gone later doesn't mean there's no reason to host it now. Nobody in 2004 was there to say "Professor Smith, we are going to delete this webpage when you retire in 2020, so don't post it now."

What are the main constraints the university is facing that cause them to stop hosting that data?

Obviously, these aren't huge storage demands.

Given that it's 2023, and that both university administrations and many older faculty didn't catch on to "the internet" until the last 10-20 years or so, there is really no long-term precedent for how to deal with the question of faculty work posted on their university web pages. The main approach has been to try not to count this as anything official, anyway. There is still confusion about literal publishing (= on the internet) versus "publishing" in the once-sensible, but now-archaic, meaning of "passing peer-review for a 'journal'".

I myself see this as somewhat analogous to "the problem" of what to do with books/scrolls before the idea of "library" was established... :)

A disturbing (to me) difference is the ephemerality of stuff on the internet. Indeed, lots of stuff is not really meant to be looked at more than once, if at all, and many things are of-the-moment. The concept of long-term or even permanently-useful things does not seem to be in harmony with "the internet", as we see it currently.

On another hand, given the lack of precedent, possibly there is a (near-?) future equilibrium that none of us has the imagination to see?

Since I've written lots of mathematical stuff, most of it aiming to be helpful to other people, I do have an impulse to try to keep it available "after I'm gone". At this point, I do not see a reliable means to do that. Some conventional books, ok. Do all of us have to allocate some of our estates to server fees for our life's work? "The state" has paid for libraries in the past (though "books" are apparently less popular in some quarters than they once were), so an author of a serious book did not have to allocate money to their book's continued existence.

Another plausible attitude is that it's just as well to lose "old stuff", since an accumulation of that old stuff imposes an ever-increasing burden on younger people. :)

To add to the excellent answers already posted:

One of the functions of professors' web pages is to give prospective students and faculty a glimpse into the areas of research and publishing that the particular university department is involved in. It would be deeply disappointing to join a university and realize that the majority of the interesting and pertinent research goals being discussed belong to faculty no longer associated with the university.

(Yes, one could get some idea by checking when each page was last updated, but that is a relatively tedious and manual process, and it may be difficult to tell the difference between a faculty member who rarely updates their pages and one who left last year after being denied tenure.)

In addition, orphaned pages can no longer be edited to, for example, remove links to retracted or superseded papers; in some cases stale information can be worse than no information, for example sharing source code with known and widely exploited security holes.


As yet unmentioned: copyright and distribution rights.

At most universities, the professor retains the copyright to material that they produce. The university does not have the legal authority to continue distributing a professor's materials without their explicit consent. If the professor does not continue making it available, then the university cannot do so in their place.

For example, my employment contract grants my university a one-year, non-exclusive license to use all of the teaching materials I have used in a course taught at the university. This is the "hit by a bus" clause: if I were to be hit by a bus or otherwise suddenly removed from my teaching position, my department would have a one-year window to continue teaching with my materials, but after that they would need to figure something else out.

