99

8 Scientific Papers That Were Rejected Before Going on to Win a Nobel Prize

Funding Analysis: Researchers Say NIH Grant Funding Allocation Seems No Better Than Lottery

The same paper resubmitted to the same journal after several years often ends up rejected due to 'serious methodological errors'

For people whose profession revolves around making order out of seemingly random observations, scientists sure are inconsistent at judging the work of other scientists. Why? It certainly doesn't seem to be like this at all levels. For example, according to the GRE's website,

For the Analytical Writing section, each essay receives a score from two trained raters, using a six-point holistic scale. In holistic scoring, raters are trained to assign scores on the basis of the overall quality of an essay in response to the assigned task. If the two assigned scores differ by more than one point on the scale, the discrepancy is adjudicated by a third GRE reader. Otherwise, the two scores on each essay are averaged.

This implies that it's uncommon for two assigned scores to differ by more than one point on the scale, i.e. GRE essay raters usually agree. Similarly, as far as I know, undergraduate thesis readers, MS thesis readers and even PhD thesis readers don't usually come to diametrically opposed judgments on the piece of work. Yet once it gets to research-level material, peer reviewers no longer seem to agree. Why?
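Just to make the quoted rule concrete, here is a toy sketch of the two-rater procedure as I read it; the function name `essay_score`, the `adjudicate` callable, and the way the third score is used are my own simplifying assumptions, not anything documented by ETS:

```python
def essay_score(rater1: int, rater2: int, adjudicate=None) -> float:
    """Toy model of the two-rater rule quoted above.

    rater1, rater2: holistic scores on the 0-6 scale.
    adjudicate: a callable returning a third trained reader's score;
    consulted only when the first two scores differ by more than one point.
    """
    if abs(rater1 - rater2) > 1:
        # Discrepancy: the quoted passage says a third GRE reader adjudicates.
        # How that third score is combined is not spelled out there; returning
        # it directly is an assumption made purely for illustration.
        return float(adjudicate())
    # Otherwise the two scores on the essay are averaged.
    return (rater1 + rater2) / 2
```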

15
  • 45
    Thanks for the links, @Allure. I've been rejected soo much I'm sure that my Nobel prize is in the mail. I'd better start preparing my acceptance speech. :)
    – user96258
    Commented Aug 13, 2018 at 7:08
  • 8
Because the space is very sparse and untrodden, and thus there are few statistics in novel areas to judge how good the work is? Commented Aug 13, 2018 at 7:26
  • 36
It's worth noting that in the case of the NIH grant allocations, the actual conclusion was that rank within the top 20% was random, but that reviewers did agree generally on which applications were in the top 20%. That's more agreement than your GRE example. Commented Aug 13, 2018 at 8:54
  • 8
    We hear about only the few cases where peer review went wrong. But in most cases it does what it is supposed to.
    – GEdgar
    Commented Aug 13, 2018 at 12:23
  • 9
    Nature only introduced peer review in 1967, so the 1930s papers rejected by Nature were rejected by the editor, not as a result of peer review. Commented Aug 13, 2018 at 21:37

11 Answers

65

The biggest difference is that, up to PhD thesis level, the person doing the assessing is more of an expert than the person being assessed. In almost all these cases there is an agreed set of standard skills, techniques and knowledge that any assessor can be expected to possess and any assessee is being measured against.

This isn't so true of a PhD thesis, but in the end, once a supervisor/thesis committee has green-lit a student, almost all PhD theses are passed.

It's definitely not true higher up. In almost all cases the person being reviewed will be more of an expert in their work than anyone doing the reviewing. The only exceptions will be direct competitors, and they will be excluded. We are talking right at the edge of human knowledge, where different people have different knowledge and skill sets.

I'm quite surprised that the GRE scores are so consistent. It's long been known that essay marking is pretty arbitrary (see for example Diederich 1974[1]). Mind you, 1 mark on a 6-mark scale is nearly 17% – a pretty big difference. In our degree a 70 and above is a 1st class degree – the best mark there is, whereas 55 is a 2:2, a degree that won't get you an interview for most graduate jobs. Losing that much on a grant assessment will almost certainly lose you the grant.

But even to obtain this level of consistency, the graders must have been given a pretty prescriptive grading rubric. In research, no such rubric exists; there are no pre-defined criteria against which a piece of research is measured, and any attempt to lay one down would more or less defeat the whole point of research.

10
  • 17
    @Mehrdad Already you run into issues. What does "reproducibility" mean for a math theorem? What does "correctness" mean for a philosophical text? And I think everyone would be hard-pressed to give numerical grades for each of these categories given a paper. And there's no threshold above which a paper is accepted and below which the paper is refused.
    – user9646
    Commented Aug 13, 2018 at 12:27
  • 7
    The footnote expanding "Diederich 1974" is missing. Commented Aug 13, 2018 at 14:20
  • 2
    I believe [1] is "Diederich, P. B. (1974). Measuring growth in English. Urbana, IL: National Council of Teachers of English." ( eric.ed.gov/?id=ED097702 ) Another related paper by the author, with a slightly more useful abstract: Diederich, P. B., French, J. W. and Carlton, S. T. (1961), FACTORS IN JUDGMENTS OF WRITING ABILITY. ETS Research Bulletin Series, 1961: i-93. doi:10.1002/j.2333-8504.1961.tb00286.x (which says "free access" but may just be my university)
    – BurnsBA
    Commented Aug 13, 2018 at 16:10
  • 11
@mehrdad - your 2, 3, and 4 are all extremely subjective.
    – Mazura
    Commented Aug 13, 2018 at 18:22
  • 4
    "The only exceptions will be direct competitors, and they will be excluded." - depending on the field, reviewing your direct competitors' work can be very common, for exactly the described reasons. Commented Aug 16, 2018 at 5:19
88

Good question. Hard to answer. Some thoughts:

  • reviewers are not trained
  • reviewers are anonymous
  • reviewers receive little feedback on their performance
  • reviewers are also authors, competing for the same funds/ prestige
  • reviewers are specialized in a narrow discipline
  • reviewers are volunteers
  • reviewers are scarce
  • the review system lacks an external (independent) control system (audit)
  • reviewers are humans, with their own personal interests, emotions, capabilities

Considering these observations, it is unrealistic to expect two review reports to be aligned. The difficult decision then falls to the associate editor, who is also a volunteer and not specialized in the author’s field.

That leaves the question of why this is accepted when, outside science, it wouldn’t be. Honestly, I don’t know. Just some guesses:

  • Science is a powerful isolated sector with its own rules?
  • The current system works for established research groups?
  • Journals do not have the funding to train and attract qualified professionals/scientists as reviewers?
  • There is no easy solution or alternative?

Added based on comment:

  • reviewers are busy scientists
  • reviewers are not rewarded career-wise for conducting reviews

6
  • 9
Good list! I would add that reviewers might not have much time and rush a paper review, even though they volunteered. Some people also might not be very invested in doing a thorough review, because they do not benefit directly. Neither is probably often the case, but both seem likely to happen.
    – Ian
    Commented Aug 13, 2018 at 10:42
  • 2
    Something new is already on its way! researchers.one
    – Peaceful
    Commented Aug 13, 2018 at 15:37
  • Thank you, @Ian. I fully agree and added your points at the bottom of my post.
    – user93911
    Commented Aug 14, 2018 at 7:28
Thank you @Peaceful. I follow you here. Science is changing. Actually, I believe this is a very interesting time, where scientists across the entire globe can and will shape the future of science.
    – user93911
    Commented Aug 14, 2018 at 7:31
  • 1
    "Jourals do not have the funding to train and attract qualified professionals/scientists as reviewers?" I would say that there isn't demand on journals to do this and they're not going to do it until the demand forces them to. Commented Aug 15, 2018 at 17:33
51

With respect to the problem of good papers being rejected, a factor that doesn't seem to have been mentioned yet is that the consequences of accepting a bogus paper are much worse than those of rejecting a good paper. If a good paper is rejected, it can always be resubmitted to a different journal. And if the authors first revise according to the reviewer comments, the version that ends up getting published may well be better written than the one that was rejected. All that's lost is time.

But if a bogus paper is accepted, other scientists may see it in the literature, assume its results to be valid, and build their own work upon it. This could result in significant lost time on their part, as experiments that depend on the bogus result don't work out as they should (which at least may lead to the bogus paper being retracted if the errors are bad enough). Or maybe they'll avoid researching along a line that would have worked, because the bogus paper implies it wouldn't, or worse, they'll end up with inaccurate results themselves and end up putting another paper with bad data into the literature. All of these are far worse outcomes than just needing to resubmit a paper, so false negatives are preferred to false positives when reviewing.

3
  • This is really the best answer and makes the most sense and includes the least cynicism. Not sure why this isn't hugely upvoted since it really does explain the core challenges of the process (danger of accepting a bad paper).
    – user65852
    Commented Aug 13, 2018 at 20:30
  • 7
This is definitely a good perspective and argument. However, many bogus papers currently pass the review process as well. I can point them out in my field. And good papers, or good papers with contradicting results, don’t make it. I have an example where people got killed because a published paper became the standard (in medicine) and opposing papers were rejected (resulting in very angry and disappointed scientists). Personally I believe a scientist is never relieved from judging the quality of published papers. I do not rely on the fact that a paper is published; I use my own judgement.
    – user93911
    Commented Aug 14, 2018 at 7:45
In addition, there is nothing to suggest that the rejected papers weren't flawed at the time of the original submission, and that only after they had been revised were they clear enough for publication. It would be rare IMO that an author would not substantially revise the writing/presentation after rejection: after all, the odds are that if you submit the same paper twice, the result will be the same twice. I’m actually surprised only 8 such papers have been found. Commented Aug 18, 2018 at 18:08
19

This won't really answer your question, I realize, but I'd like to address your first example - rejected papers that later led to Nobel prizes.

Sometimes a piece of work is Frame Breaking and it leads to a Paradigm Shift within a field. This has happened many times in history, since at least Copernicus and Galileo. Einstein's early work on relativity was rejected by the physics/astronomy hoi oligoi as it was too different from the belief in the Aether at the time. The most prominent members of the field reject a radically new idea, and their students, who are pervasively represented, usually go along.

It has been said that revolutions in physics require the death or retirement of the most respected researchers so that the ideas of the young can get a fair hearing and come to the fore.

That is in fact an explanation of at least some of the eight papers referenced in your first link.

I don't think that many of us write paradigm changing papers, but it occasionally happens. The truly brilliant (not guilty) among us often must labor in near silence and obscurity for most of a generation. The next generation may celebrate them, or it may take even longer.

When a reviewer is faced with a truly frame-breaking paper they, by definition, have no frame of reference in which to evaluate it. It is orthogonal to their entire way of thinking. "This must be nonsense" is the all-too-natural response.

Read, for example, the short Wikipedia biography of Ramanujan.

4
  • 3
    Thomas Kuhn's book "The Structure of Scientific Revolutions" is the classic reading on this point.
    – WBT
    Commented Aug 13, 2018 at 17:15
  • This is the most relevant answer to the direct question - Nobel worthy papers are radical or revolutionary by nature. This naturally makes their review all the more sceptical an affair.
    – J...
    Commented Aug 15, 2018 at 22:07
  • It's not just physics where revolutions require a changing of the guard. This happens in geology and chemistry as well. Commented Aug 16, 2018 at 13:19
  • @PeterShor, true, and not even just science.
    – Buffy
    Commented Aug 16, 2018 at 13:20
15

Different tasks, different results.

All the GRE graders have to do is assign scores, though they do so for dozens or hundreds of essays. They receive clear guidance and examples about what score given essays should probably receive. So it’s basically checking boxes to justify a small set of results.

A peer review analysis is fundamentally different since you’re asking for a much more technically difficult task. They have to evaluate if the analysis is accurate, not if it’s responsive to a prompt. There’s no set of examples to draw on either. So the focus of peer review can be very different for different reviewers who may have different sets of expertise and certainly will have their own points of view.

1
"There’s no set of examples to draw on either" – what about all the already-published papers?
    – Allure
    Commented Aug 16, 2018 at 0:03
12

To compare academic peer review to GRE grading -- that makes apples and oranges look all but identical. Let's step a little closer:

Similarly, as far as I know, undergraduate thesis readers, MS thesis readers and even PhD thesis readers don't usually come to diametrically opposed judgments on the piece of work.

That is certainly not always true, and it is highly field-dependent. In certain parts of academia it is a standard grad-student horror story that Committee Member A insists that the thesis be cast in terms of Theoretical Perspective X, while Committee Member B insists that the thesis be cast in terms of Theoretical Perspective Y, where X and Y may be intellectually incompatible or sociologically incompatible: i.e., each theory has rejection of the other as a central tenet. This is more common in the humanities, where the relation of "theory" to the rest of the work is rather different, but it is not unheard of in the sciences either.

As a frequent committee member, I also happen to know that coming to a consensus judgment is a sociological phenomenon as well as an intellectual one -- i.e., some differences in judgment are limited only to the private discussion following the defense and other differences in judgment are never verbalized at all.

This is helpful in understanding the disparity in peer review: in peer review, the different referees are (in my experience, at least) never in direct communication with each other, and in fact may not be seeing each other's verdicts at all: as a referee, I believe that I have never been shown another referee report. In fact,

Who watches the watchmen?

There is no aspect of the academic process that makes me feel like a lone masked vigilante more than being a referee. Surely people who do GRE grading go through some lengthy training process of repeated practice evaluating, feedback on those evaluations, discussion of the larger goals, and so forth. There is nothing like this for academic referees. We get no practice, and there is very little evaluation of our work. If I turn in what is (I guess!) an unusually comprehensive report unusually quickly, I will often get a "Hey, thanks!" email from the editor. In the (thankfully rather small) number of instances where my referee reports were months overdue, I either heard nothing from the editors (I am ashamed to say that once I figured out on my own that a paper I thought I had had for a few months had actually been with me for an entire year) or got carefully polite pleas for me to turn in the report. I have never gotten any negative feedback after the fact. Unlike GRE graders, referees are volunteers.

I find (again, in my experience and in my academic field of mathematics) that referees are almost never given instructions that amount to any more than "1) Use your best judgment. 2) We are a really good journal and want you to impose high standards." I also notice that 2) is said for journals of wildly differing quality. What does it mean to "impose high standards"? I take that directive seriously and fire my shots into the dark as carefully as I can, but....of course that is ridiculously, maximally subjective.

1
  • 4
    On the other hand, your third link is pretty alarming. It describes a systematic process of resubmitting papers that had been accepted and published by prestigious journals within the last three years to the same journal that published them. In the majority of cases, the journals did not recognize that they had published the papers before. I find that very surprising. Commented Aug 13, 2018 at 17:03
12

Submission overload

We write more and more, and the typical submission quality seems to be going down. This has various causes, including bad incentives, in particular in China. If your salary depends directly on the papers you get accepted, quantity beats quality...

IMHO we are close to a tipping point now. Many of the expert reviewers refuse almost any reviewing request, because so many submissions are so sloppy that it's quite annoying to review them. It should be the other way around: most submissions should be of such high quality that you enjoy reading them and can focus on the details. So more and more experts are just annoyed. They delegate more of the reviewing to students, or simply refuse. But that means the remaining reviewers get more requests, and more bad papers. This can tip quickly, just like most ecosystems.

So the editors need to find other reviewers, and we get fewer and fewer expert reviewers. This also opens doors to scams and schemes. Multimedia Tools and Applications, for example, seems to have fallen prey to an editor- and reviewer-manipulation scheme.

So what's the solution? I don't know.

  • Make the handling editor and the reviewer names public and thus accountable on accepted papers - this used to be quite common; expert reviewers tend to stand publicly to their reviews if they accept the paper (this used to be an actual endorsement of the work in some fields). This makes corruption and scheming easier to discover with modern analysis. But this will likely just make it harder to find expert reviewers...
  • Do the first review only with one reviewer (to reduce the load on the reviewers), possible outcomes "early reject" and "full review".
  • Require authors to do 5 reviews per accepted publication (so the experts cannot just stop reviewing completely, unless they retire - first publication is "free", but you cannot keep on rejecting review requests)?
  • Actually pay the expert reviewers! Once you are confident that a submission is worth it, that should be an option given Elsevier's absurd profit margin. I believe the main problem is the bureaucracy involved (who decides if a review is high quality?) and the different wage costs in different countries. Nevertheless, combined with the above pre-review, this will increase the "worth" of good reviews. But this, in turn, increases the risk of schemes to make money from this...
  • Review reviews, and give out best reviewer awards.
  • Punish repeated rejects with a delay to make spamming costly.
  • Put a limit on the number of submissions per author and year (probably will just mean the submissions go elsewhere - so it may help a particular journal, but not the entire community)?
  • Ban financial incentives - if your country/university has such a direct financial incentive, you aren't allowed to submit to certain top journals. Then these bad incentives will be quickly abolished, because such incentive schemes tend to demand publication in exactly those top journals. But it may just make the payments become delayed or hidden... I don't know. I am not an expert on policy. These are just some ideas.
8

Contributing a point beyond other answers:

Different levels of effort going into the review leads to different outcomes.

Papers are often written such that on a first-pass read they come across as "pretty good", even if a more critical deep read and/or check of references would expose gaping holes, serious methodological issues, and alternative explanations for the results observed. Sometimes an even more effortful review can find that these issues don't actually matter in the particular case applicable to that specific paper (though the author should generally add this to the paper text itself).

While reviewers are incentivized to do a good job by the general knowledge that the system depends on it, specific instances are generally not incentivized, and reviews sometimes get left to the last minute with a reviewer who's short on sleep and long on other tasks, and who doesn't put in the effort for a good review. Thus, the result could be very different even for the same paper reviewed by the same reviewer at a different time. With no visibility into the factors affecting that outcome, it seems random.

7

The fundamental difference between grading GRE essays and reviewing scientific papers submitted for publication has been cogently discussed in several previous answers. The fundamental difference between reviewing grant applications and papers for publication has not.

Publications. When a paper is submitted to a journal it is usually supposed to be a finished product, or at least one finished step toward a defined goal. It is truly difficult to find reviewers who are able to assess the importance of a paper and to find every gap in reasoning or every imperfection in technique, but at least reviewers of journal articles have the results of a piece of research at hand.

A potential Nobel paper may cover material so new or so far off the beaten track that it will be especially difficult to review fairly. A paper resubmitted after several years may have been based on procedures or techniques that have been considerably refined in the meantime. So maybe they were state of the art at the time of original submission, but are now 'seriously flawed' in terms of currently available methods. So it is hardly surprising that reviewers don't score 100% on those tasks.

However, even though reviewers are unpaid, overworked volunteers, working with no specialized training or feedback on the details of reviewing, I think it is surprising how well journal reviewing works in practice.

Grants. By contrast, making judgments about research grants is an entirely different kind of activity. Some years ago (when US federal funding was available at a much higher level than it is now), I spent several years at a federal agency with a reasonably large budget for supporting basic and applied research in a variety of scientific fields. So I will try to address this part of the picture briefly. I will begin by saying that I am not at all surprised that a panel of research scientists would find the funding of NIH (or any other US government agency) to be 'no better than a lottery'.

Generally speaking, if you know exactly what you are doing, how long it will take, and how much it will cost, you're not doing research. Reviewers can often be useful in assessing a proposer's track record of success and providing a rough idea whether the proposer is competent to undertake research in a particular area. (I should add that most program directors are well aware of the standards, biases, foibles, and strengths of the reviewers they use. I was seldom surprised by the contents of a requested review, but the few surprises were extremely valuable.) However, reviewer input is only a part of the picture.

Going beyond reviewer input, program directors in granting agencies have to take other factors into account. To some degree they must consider financial, political, and infrastructural factors. 'Political' usually means that that money was appropriated or donated specifically to support a particular scientific goal. Infrastructural concerns may center on developing technologies that are agency goals, training graduate students in fields where there are not enough researchers, whether the institution requesting the grant has the sophistication for adequate stewardship, and so on.

In the US, agencies such as NIH, NSF, DoE, EPA, various defense agencies, and various privately funded foundations may have very different goals. However clearly these agency missions and objectives may be spelled out in 'requests for proposals', they are often ignored by grant applicants, who might make a better case for their work if the appropriate connections were made clear.

In spite of these constraints on awarding of grants, program directors strive to support nothing but the highest quality science, and I believe they usually succeed at that. In my experience, almost all of them view themselves as scientists first and agency bureaucrats second. Often that success comes with the considerable help of reviewers, but sometimes not.

6

To address the aspect of:

The same paper resubmitted to the same journal after several years often ends up rejected due to 'serious methodological errors'

In about one third of the papers I reviewed, I identified fundamental flaws that could not be addressed by revising the paper (you would have to write a new paper instead). Some examples just to give you a taste:

  • The entire analysis was a self-fulfilling prophecy, i.e., the result was an assumption.
  • A proposed characteristic was a roundabout way of measuring some much more trivial property. (This happened twice already.)
  • A proposed model ignored the dominant mechanism for what it was supposed to model.
  • The entire study was about understanding an artefact of a well-known beginner’s mistake (without identifying that mistake).

While I may have been wrong about these things, the authors never addressed my concerns, be it in a rebuttal or in a version of the paper published in another journal (which, in most of these cases, never happened) – which is something they should do even if I am wrong.

Now these issues may seem like they should have been easy to spot, but evidently they weren’t: I spotted some of these flaws only when writing up the actual review, and I witnessed (and performed) quite a few jaw droppings when discussing papers with co-reviewing colleagues¹ whom I knew to be thorough. Also, in some cases I saw reports of other referees who were otherwise exhaustive but did not spot the issues.

So, to conclude: even fundamental flaws are difficult to spot. A given reviewer has only a comparatively small chance of spotting a given flaw in a paper. Therefore there is a considerable chance that all of the reviewers fail.
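To put rough numbers on that last point (the values below are purely illustrative, not data from any study): if each of n reviewers independently spots a given flaw with probability p, the chance that it slips past all of them is (1-p)^n.

```python
# Illustrative only: the detection probabilities and reviewer counts below
# are made-up values, not measurements of any real review process.
def prob_all_reviewers_miss(p_spot: float, n_reviewers: int) -> float:
    """Chance that a flaw slips past every reviewer, assuming each reviewer
    independently spots it with probability p_spot."""
    return (1 - p_spot) ** n_reviewers

# Even with a 50% individual detection rate and two reviewers,
# one flaw in four goes unnoticed; at a 30% rate it is roughly half.
print(prob_all_reviewers_miss(0.5, 2))  # 0.25
print(prob_all_reviewers_miss(0.3, 2))  # 0.49
```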


¹ Yes, that’s a thing in my field and fully accepted by the journals.

2
  • I am curious about the "discussing papers with co-reviewing colleagues" part, since (as I mentioned in my answer) that is absolutely not a thing in my field (mathematics). (Although in fact, in my field the most common number of referees is one.) How does that work? Commented Aug 14, 2018 at 13:37
  • 1
@PeteL.Clark: Essentially, a referee is allowed to seek the input of colleagues as long as they ensure that the reviewed material is kept confidential and they name the additional reviewers when submitting the review (policy example). Typically, an advisor takes an advisee onto the team.
    – Wrzlprmft
    Commented Aug 14, 2018 at 15:24
4

When I was in grad school in computer science I noticed a few challenges in the paper review process. Most of these things I observed through the experiences of others:

  1. Journals and conferences use the community of prior authors to review papers. The people who publish the most are working very hard. They may not be able to devote enough time to review a paper well. If the material in the paper is novel, unfamiliar, dull, highly technical, or lacking clarity then those factors make it less likely to be reviewed carefully.
  2. The reviewer who was originally selected by the publication may hand the review task off to a graduate student. The student may just be learning to read and digest research papers in the field. If the material is difficult then the inexperienced reviewer may not be able to review it adequately and may not have learned to qualify their judgements according to their degree of understanding.
  3. The most qualified reviewers often have strong viewpoints about the subject matter and may have a very different approach. The cognitive and emotional dissonance one feels when trying to understand the application of unfamiliar tools to familiar problems can be too much to overcome. Of course this hard work can lead to benefits for both parties when the process works.
  4. A successful new approach can even be felt as threatening to other researchers or to the community as a whole. Real breakthroughs will disrupt the lines of research that others are pursuing. That is one reason that strong claims will be held to a higher standard of justification.
  5. Authors compete for limited space in journals and conferences. I've heard authors say they feel a paper was rejected due to an unfair review by a self-interested rival. Of course they usually can't know this for sure.
  6. Sometimes reviewers make valid comments but the strong emotional attachment the author feels to their work causes them to reject the feedback. This can deadlock the process, making it difficult to get work published due to uncorrected flaws, even when the work is strong. I have read about this happening even to some famous papers.

Publishing papers is an essential professional activity for most of the authors, but reviewing papers is an act of community service. That sums up a lot of the specific problems.
