23

This answer suggests that arXiv, as its name implies, is archival. One of Beall's criteria is that a publisher is potentially predatory if it

Has no policies or practices for digital preservation, meaning that if the journal ceases operations, all of the content disappears from the internet.

The only thing I can find about digital preservation on the arXiv website is

arXiv submissions are meant to be available in perpetuity. Thus, arXiv has high technical standards for the files that are submitted.

While it is good that the articles are in a format that will allow access in perpetuity, the primer says nothing about what happens if arXiv ceases operations. What is arXiv's policy regarding digital preservation?

15
  • 4
    I would presume that the arXiv is simply too big to fail.
    – Arno
    Commented Jun 18, 2015 at 12:30
  • 4
    @Arno I would presume that perpetuity is a long time for things to "just work".
    – StrongBad
    Commented Jun 18, 2015 at 12:41
  • 10
    I don't really understand the relevance of Beall's criteria to arXiv. As far as I understand, arXiv does not have an editorial board that operates in a similar manner to a journal's editorial board, nor does it conduct peer review, for example... Is this meant to suggest that arXiv is a predatory publisher?
    – user9646
    Commented Jun 18, 2015 at 13:25
  • 3
    @NajibIdrissi I used Beall's criteria because it nicely described what I was looking for, not because I think of arXiv as predatory.
    – StrongBad
    Commented Jun 18, 2015 at 13:42
  • 3
    StackExchange is the same way. If StackExchange were to implode tomorrow, well, our questions are gone. But you should also weigh the likelihood of that versus the severity of that.
    – Compass
    Commented Jun 18, 2015 at 14:27

3 Answers

18

From their FAQ:

What are CUL's preservation strategies?

Digital preservation refers to a range of managed activities to support the long-term maintenance of bitstreams. These activities ensure that digital objects are usable (intact and readable), retaining all quantities of authenticity, accuracy, and functionality deemed to be essential when articles (and other associated materials) were ingested. Formats accepted by arXiv have been selected based on their archival value (TeX/LaTeX, PDF, HTML) and the ability to process all source files is actively monitored. The underlying bits are protected by standard backup procedures at the Cornell campus. Off-site backup facilities in New York City provide geographic redundancy. The complete content is replicated at arXiv's mirror sites around the world, and additional managed tape backups are taken at Los Alamos National Laboratory. CUL has an archival repository to support preservation of critical content from institutional resources, including arXiv. We anticipate storing all arXiv documents, both in source and processed form, in this repository. There will be ongoing incremental ingest of new material. We expect that CUL will bear the preservation costs for arXiv, leveraging the archival infrastructure developed for the library system.

It looks like they're relying on a) multiple offsite mirrors; b) periodic stored backups at LANL; and c) deposit in the institutional repository at Cornell.

It's a little unclear if that deposit is actually happening yet or is still part of a long-term plan, but it's worth noting that the arXiv program director is also the librarian responsible for Cornell's digital preservation work, so it's unlikely to have been forgotten about!

1
  • 1
    The link is now dead, and I couldn't find up-to-date information. Does someone know if there is info available somewhere about their current archival policies?
    – a3nm
    Commented Jul 29, 2022 at 17:06
12

This is answered in the FAQ for the arXiv Membership Program:

CUL [Cornell University Library] has an archival repository to support preservation of critical content from institutional resources, including arXiv. We anticipate storing all arXiv documents, both in source and processed form, in this repository. There will be ongoing incremental ingest of new material. We expect that CUL will bear the preservation costs for arXiv, leveraging the archival infrastructure developed for the library system.

The same FAQ also describes the current funding model (until some years ago arXiv was funded and run entirely by the Cornell University Library; funding now also comes from the Simons Foundation and other participating university libraries).

As for Cornell's digital preservation policies, I cannot find a full description online (probably just due to my weak google-fu today), but this would be the person to contact and ask.

2
  • The link is now dead, and I couldn't find up-to-date information. Does someone know if there is info available somewhere about their current archival policies?
    – a3nm
    Commented Jul 29, 2022 at 17:06
  • I would suggest emailing the arXiv staff. There is a contact link on the arXiv help page.
    Commented Jul 29, 2022 at 17:21
6

I asked arXiv on October 4, 2023:

I am involved in an overlay journal whose articles are published on arXiv, and we are asked by the Free Journal Network about our long-term archival policies.

For this reason I would like to know if there is a long-term archival policy for arXiv articles: are there backups or continuity plans in place to ensure that the articles are preserved if arXiv is unable to continue operation?

Also, is there publicly available information about this? I could not find anything in the FAQ or in the help about long-term preservation.

I got the following reply one day later:

Thank you for reaching out. We are committed to permanently archiving the content that is submitted to and hosted by arXiv and operate under policies designed to ensure long-term access to the scholarly record. However, at this time, arXiv is not engaged in formal preservation activities. We hope to implement a preservation strategy in the future.

So the answer seems to be that arXiv does not currently have a formal long-term preservation strategy.

That said, in practice, in addition to the current bulk data access options, there appears to be a downloadable copy of the data at https://archive.org/details/arxiv-bulk (though it was apparently last updated in 2020).
