
If you go to arXiv's recent-submissions pages (now, 8 December 2020, 19:24 Italian time), there is, for example, the article "Exploring the Usage of Online Food Delivery Data for Intra-Urban Job and Housing Mobility Detection and Characterization".

This article appears under the heading "8 December".

But if you click on it, the abstract page says it was "Submitted on 4 Dec 2020".

Why is there a lag between the submission time and the heading under which it appears among the recent publications?

I guess the arXiv staff have to approve the article before it is published. Is my guess correct?


4 Answers


I guess the arXiv staff have to approve the article before it is published. Is my guess correct?

Yes, arXiv moderates all submissions before posting them (there is some description of this on arXiv's website). This is nowhere near as rigorous as peer review; rather, someone just flips through the submission to ensure that it is (1) not complete junk and (2) classified correctly. This approach is used by many preprint platforms (SSRN, SocArXiv, etc.).

  • I suspect the real reason is that all software updated that way in 1993, and nobody has offered to pay to change it.
    Commented Dec 8, 2020 at 21:31
  • I suspect they would be overwhelmed by spam, misclassified articles, and other junk if they eliminated human moderation and let authors post articles immediately.
    – cag51
    Commented Dec 9, 2020 at 0:15
  • That's an awesome link, thanks; I've added it to the answer. I do not know whether the initial flagging is done by a human or an algorithm. But I suspect that the time delay is important regardless: without it, bots could get real-time feedback on whether their auto-generated spam "paper" was flagged or not. Even Amazon reviews have a delay of at least a few hours, probably for this reason.
    – cag51
    Commented Dec 9, 2020 at 0:57
  • @AnonymousPhysicist The submission system has been updated more than once since 1993.
    – user151413
    Commented Dec 9, 2020 at 2:15
  • @AnonymousPhysicist: It’s pretty clearly not a legacy software issue, but a deliberate choice in their submission/moderation/announcement process. The arXiv web interface provides several automated services (e.g. proofreading the arXiv-processed PDF) which are at least as non-trivial from a software point of view as posting the papers live would be. On the other hand, the delay serves several useful functions: besides moderation, as others mention, it gives authors a “second thoughts” period to pause or withdraw a submission; and batching releases is also (arguably) useful to readers.
    – PLL
    Commented Dec 9, 2020 at 14:29

To complement the other answer: all submissions to arXiv pass through a rudimentary moderation process, which checks that submitted manuscripts satisfy arXiv's basic technical and content guidelines. In general, this process is quite fast, and by itself it does not account for the 4-day lag observed.

In general, submissions received before 14:00 (US Eastern time) appear online at 20:00 the same evening. However, there are no announcements on Friday or Saturday evenings: the batch that closes at 14:00 on Friday is announced on Sunday evening, and anything submitted after that waits for Monday's deadline. So some of the articles appearing on Monday were submitted on Friday.

Announcement time (US Eastern) | Submissions received (US Eastern)
Sunday 20:00                   | Thursday 14:00 to Friday 14:00
Monday 20:00                   | Friday 14:00 to Monday 14:00
Tuesday 20:00                  | Monday 14:00 to Tuesday 14:00
Wednesday 20:00                | Tuesday 14:00 to Wednesday 14:00
Thursday 20:00                 | Wednesday 14:00 to Thursday 14:00

In fact, since articles appear in the daily digests in order of submission time, some authors make a point of submitting immediately after the cut-off. This ensures that their article appears at the top of the list of new articles in the next announcement. As a result, it is common to see articles submitted a few seconds after 14:00 on Friday among Monday's new articles.

The specific example given by the OP appeared under the Tuesday 8 December heading but was submitted on Friday at 15:18:37 UTC, i.e. 10:18 US Eastern time, before that day's 14:00 deadline. So the usual schedule does not account for the 4-day lag in this case.
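The cut-off rule can be written as a small function that predicts where a submission should land. This is a minimal Python sketch, not arXiv's code, assuming the schedule on arXiv's availability page: a 14:00 US Eastern cut-off on weekdays, announcements at 20:00 Eastern from Sunday to Thursday (so the Thursday-to-Friday batch is announced on Sunday evening), and no holidays. It also assumes that the listing heading is the day after the 20:00 announcement, which matches the OP seeing Monday evening's batch under "8 December". The function names (`next_cutoff`, `announcement`, `listing_date`) are hypothetical, and the code needs Python 3.9+ with a time-zone database available.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs a time-zone database

# Assumed schedule (holidays ignored): weekday cut-off at 14:00 US Eastern,
# announcements at 20:00 US Eastern from Sunday to Thursday.
ET = ZoneInfo("America/New_York")
CUTOFF = time(14, 0)
ANNOUNCE = time(20, 0)


def next_cutoff(submitted: datetime) -> datetime:
    """First weekday 14:00 ET cut-off strictly after the (timezone-aware) submission time."""
    t = submitted.astimezone(ET)
    day = t.date()
    while True:
        cutoff = datetime.combine(day, CUTOFF, tzinfo=ET)
        # weekday(): Monday == 0 ... Friday == 4, so 5 and 6 are the weekend
        if day.weekday() <= 4 and t < cutoff:
            return cutoff
        day += timedelta(days=1)


def announcement(submitted: datetime) -> datetime:
    """Predicted announcement time (US Eastern) for a submission."""
    cutoff = next_cutoff(submitted)
    # The batch closing on Friday is announced on Sunday; every other batch the same evening.
    extra_days = 2 if cutoff.weekday() == 4 else 0
    return datetime.combine(cutoff.date() + timedelta(days=extra_days), ANNOUNCE, tzinfo=ET)


def listing_date(submitted: datetime) -> date:
    """Assumption: listings are headed with the day after the 20:00 ET announcement."""
    return announcement(submitted).date() + timedelta(days=1)


if __name__ == "__main__":
    # The OP's example: submitted Fri 4 Dec 2020 15:18:37 UTC (10:18 US Eastern).
    submitted = datetime(2020, 12, 4, 15, 18, 37, tzinfo=timezone.utc)
    print(announcement(submitted))  # 2020-12-06 20:00:00-05:00
    print(listing_date(submitted))  # 2020-12-07
```

Under these assumptions the OP's paper would normally have been listed on Monday 7 December, one day before it actually appeared, which leaves one day of the lag to be explained by something specific to this submission. A submission made a few seconds after Friday's 14:00 cut-off, by contrast, lands in Monday evening's announcement.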

Some submissions (about 15%, according to a blog post) are flagged for additional moderation checks. There are various reasons for a submission to be flagged; common ones include one of the authors having a problematic submission history, or the submission being flagged by automated plagiarism checks. This must have happened to the example given by the OP, most likely because it was a PDF-only submission (for which some of the automated checks done on TeX submissions are not available).

  • Could also be that the author edited after the initial submission, say on the weekend, which would have bumped it by one day? I'm actually not sure whether arXiv keeps the timestamp of the initial submission in this case, but I think I recall correctly that it does delay the announcement.
    – Kyle
    Commented Dec 10, 2020 at 17:19
  • @Kyle If it keeps the timestamp, it will produce a new version. So I strongly suspect that it won't. (Things were different in the past, when you could submit just after the deadline and keep your position in the daily announcements even if you updated later. But even then, I believe that the timestamp would be the one of the last submission.)
    – user151413
    Commented Dec 26, 2020 at 21:52

This is mostly, but not entirely, because of the weekend.

The real delay from moderation is very short: you need to submit before 2pm Eastern time in order to be included in the following day's mailing. However, since there are no mailings at weekends, that means you need to submit before 2pm Eastern on Friday to be included in Monday's mailing. So it is quite normal for a paper to be submitted on a Friday afternoon (e.g. 4th Dec 2020) and appear on a Tuesday (e.g. 8th Dec 2020).

In this case, however, the paper was submitted on Friday morning, so would normally have appeared on the 7th, and there must have been some other reason, specific to this submission, for the additional one-day delay.


The other answers are correct that articles are reviewed (moderated) by humans, but only about 15% are flagged for additional moderation checks, which take more time. This reasoning doesn't explain why "trusted users" who have submitted many papers to arXiv without any problems don't get a fast-track process.

The first time you ever submit to arXiv, someone has to "vouch" for your paper, which is a manual process that takes a lot of time. After that, the amount of time spent on moderating the articles is extremely small (I've been involved in the process in the past). One answer has said that articles submitted before 14:00 US Eastern time appear online at 20:00 the same working day, which leaves only 6 hours between the deadline and the announcement. Many articles are not looked at, or are barely looked at, during those 6 hours, because they don't need to be "vouched" for. There are many tags (subject categories), such as "quantum physics" and "discrete mathematics", and the moderators for each tag vary somewhat in how strictly they check whether papers are appropriate for it, but there are certainly some tags for which the papers are barely looked at.

So 6 hours seems reasonable for articles that need to be examined, usually because the user has very little history of past submissions, or because they are submitting from a hotmail address rather than a princeton.edu address (for example). But what about all those articles that are not really being examined much? I don't know whether arXiv has made any public statement about the reason for the 6-hour delay in those cases, but I like the fact that the paper doesn't get published online immediately: I often find myself wanting to make last-minute changes a couple of hours after submitting, so I very much like the extra hours of leeway. But that's just me.
