Suppose we maintain a massive electronic library of texts/photos/videos etc., and want to ensure that these files remain readable indefinitely far into the future. [Update] One of the major problems with digital libraries is program rot: due to bugs in content-creation and/or content-playback software (and due to feature removal from playback software),
many documents can be reproduced only on particular versions of the software, of the OS, and of the computer hardware.
So we:
- Keep snapshots of versions of OS/software which are known to read these files without errors.
- Keep snapshots of VM implementations which are known to run these OS/software versions without errors.
However, this is obviously not enough: for the best result, we would also need to preserve the CPU versions on which the VM implementation runs!
The only exit from this vicious circle seems to be “a virtual CPU”: a “virtual instruction set” which is:
- Powerful enough so that one can recompile the VM mentioned above to run on this pseudo-CPU.
- Simple enough so that one can write a very simple interpreter for this instruction set (e.g., in pseudo-code, though it had better be compilable, so that it can periodically be checked to still work!).
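To make the second requirement concrete, here is a minimal sketch of what such an interpreter could look like. The instruction set shown (a hypothetical five-opcode register machine) is purely illustrative, not an existing standard; a real candidate would also need memory and I/O, but the point is that the whole interpreter stays small enough to rewrite from scratch in an afternoon:

```python
# A toy "virtual instruction set": a register machine with five opcodes.
# Illustrative sketch only; a real archival ISA would also need memory and I/O.
HALT, LOADI, ADD, SUB, JNZ = range(5)

def run(program, nregs=8):
    """Interpret a program given as a list of (opcode, *operands) tuples."""
    regs = [0] * nregs
    pc = 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == HALT:
            return regs
        elif op == LOADI:    # LOADI r, imm   : regs[r] = imm
            regs[args[0]] = args[1]
        elif op == ADD:      # ADD r, a, b    : regs[r] = regs[a] + regs[b]
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == SUB:      # SUB r, a, b    : regs[r] = regs[a] - regs[b]
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == JNZ:      # JNZ r, target  : jump to target if regs[r] != 0
            if regs[args[0]] != 0:
                pc = args[1]

# Demo: compute 5 * 3 by repeated addition.
prog = [
    (LOADI, 0, 5),   # r0 = loop counter
    (LOADI, 1, 3),   # r1 = addend
    (LOADI, 2, 0),   # r2 = accumulator
    (LOADI, 3, 1),   # r3 = constant 1
    (ADD, 2, 2, 1),  # r2 += r1          (loop body, pc = 4)
    (SUB, 0, 0, 3),  # r0 -= 1
    (JNZ, 0, 4),     # repeat while r0 != 0
    (HALT,),
]
regs = run(prog)     # regs[2] now holds 15
```

An interpreter of this size is the thing one would publish as human-readable (pseudo-)code alongside the library; everything else (the VM, the OS, the readers) would be compiled down to this instruction set and stored as an opaque blob.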
The goal is that N (or N²) years in the future, a librarian should be able to quickly rewrite this “sample interpreter” in whatever programming language is available at that time. After this, the library becomes readable. (In other words, all one needs to provide is:
- General human-readable instructions on how to navigate the library;
- The human-readable (pseudo-)code of the interpreter.
- A blob containing the compiled VM, the OS and the reader programs.
- A blob containing the library.)
Of course, in the best of worlds, such a CPU architecture would already be available!
Question: is it available? If not, how close is it to being available?