-6

Up-votes and down-votes are intended to indicate whether the Stack Exchange community feels a particular answer "is useful", as explained in the FAQ: "When should I vote?".

How many up-votes does an answer need for those up-votes to be statistically significant?

For example

  • can I rely on an answer with one up-vote? And with 100 up-votes? If they're different, where's the line?
  • what if the question has a small number of views? Do the answers to the above change?

And related: what is a statistically significant number of down-votes?

EDIT: I'm aware this can't be answered with a single number applicable to all of the many topics across the site. Perhaps one would need something as hefty as ML to get meaningful results.

EDIT: The question originally mentioned posts. I meant to ask about answers only, not questions.

  • 8
This is impossible to answer. Popular posts tend to get a lot of upvotes, and some very high-quality material inevitably goes entirely unnoticed. One could say the voting system is functional when, on average, a higher post score correlates with higher quality. If, as a frequenter of a site, you feel inclined to trust what posts with higher scores say, then it's working in your case, but that should never replace a healthy amount of skepticism about what's written, of course. (Not the downvoter, BTW)
    – M.A.R.
    Commented Dec 27, 2019 at 18:40
  • 3
    A post's score is a signal but it isn't proof.
    – BSMP
    Commented Dec 27, 2019 at 20:38
  • 2
    You could ask that on CrossValidated. Commented Dec 27, 2019 at 20:48
  • @PeterMortensen can you elaborate? do you mean you think this is a chameleon question?
    – joel
    Commented Dec 28, 2019 at 0:11
  • @BSMP yes, statistical significance isn't about proof
    – joel
    Commented Dec 28, 2019 at 18:27
  • There really is no answer. I checked on Network Engineering, and there are exactly six Gold Great Answer badges awarded. Five of those took years to each reach 100 votes, but one did it in three days (look at the award dates vs. the answer dates). All were legitimate answers to legitimate questions, and I have no doubt about the validity of the votes. Each of the answers will still occasionally receive a new up vote, which is why the five took years to reach 100 votes.
    – Ron Maupin
    Commented Dec 29, 2019 at 5:26

5 Answers

12

It's impossible to give a number. It will differ widely per site; a site like Community Building gets 33 visits per day, far fewer than Stack Overflow, so there are no answers with 100 upvotes there. Just looking at the traffic might not be enough, either; on some sites, users are more inclined to vote than on others.

And even on Stack Overflow, a popular tag like [java] or [c#] will see much more voting than, say, [kotlin]. The best way to check whether you can rely on an answer is to actually try it out. A very popular answer with hundreds of upvotes might not work for you because your case is special. Don't forget to upvote the answer (and probably the question) if it works for you!

  • 1
    Random Tidbit: There is a question with 100 upvotes though. To be fair, it was migrated from UX, which is much larger than Community Building.
    – Andy
    Commented Dec 27, 2019 at 18:42
  • 2
    Note that on many sites there isn’t really a way to test if an answer works.
    – Alex
    Commented Dec 27, 2019 at 20:04
  • 1
    And even on SO and the Java tag, "many" votes are mostly a thing of the past...
    – GhostCat
    Commented Dec 27, 2019 at 20:27
11

It depends a lot on context.

50 upvotes on an answer within a few days isn't necessarily surprising. It happens all the time with posts that make it to the Hot Network Questions list.

On the flip side, a few upvotes on a really bad question or answer could be highly unusual (maybe even a sign of voting fraud; I've seen it happen).

The popularity of the topic of the Q&A will also influence the number of votes, and not necessarily in a positive way.

There is also the fact that more users have upvote privileges than downvote privileges, and many users are reluctant to use their downvote privileges for one reason or another. This means that a question some people find useful may get mostly upvotes even if most people do not find it useful.

Additionally, the time frame in which a post was written can have a huge impact. The best solution from 5 years ago may not be the best solution today, but after 5 years of upvotes it could take years for the better answer to rise to the top. On a busy site, even the time of day the post was made may have an impact, at least in the short term.


Evaluating the significance of the number of votes would require accounting for all of these variables, and probably more.

  • 1
    Completely agree.
    – Travis J
    Commented Dec 27, 2019 at 19:42
  • "not necessarily in a positive way". Can you give an example?
    – joel
    Commented Dec 27, 2019 at 23:34
  • 2
    @JoelBerkeley Not sure about any good examples, but posts about popular topics may get a lot of votes, even if the content isn't that great. Commented Dec 27, 2019 at 23:39
  • oh, right, I thought you meant positive as in positive gradient as opposed to as in desirable
    – joel
    Commented Dec 27, 2019 at 23:42
  • Exactly. As long as a question gets enough views over time, the vote count will eventually go up.
    – Mast
    Commented Dec 27, 2019 at 23:42
4

There is no real answer to this question, unless the answer involves a LOT of criteria (which would be wrong, since answers are supposed to be clear and concise).

Let's talk about why.

Basically, some questions are popular and some aren't. This doesn't necessarily bear any relationship to how good the question is; it usually just means that the question is hard to answer, or that the group it is targeted at has a small population. That isn't necessarily a bad thing.

How does this relate to the question? Well, a post can only get so many views. Even if the answer is an extremely well written, superb, best-in-all-of-Stack-Exchange kind of answer, its votes will always stay bottlenecked by the popularity of the question.

Another thing you might want to factor in is the site it is on. Is it on a high-traffic site, like Stack Overflow or Meta Stack Exchange? Or on a low-traffic, lesser-known site? (I'm not going to name any names.)

Taking that into consideration, my philosophy is: always judge a post based on a) how helpful it seems to me, and then b) the number of votes it has relative to the posts around it.

Note that the above philosophy is still prone to failure, since everybody has personal preferences. A post that is amazing in an objective context might be disliked by somebody for reasons known only to them, or it may have been posted only recently, leaving it with a score of -1 while the posts around it have higher numbers.


I could go on and on about other things that need to be taken into consideration; however, I don't want to bore you. The point here is: there is no clear, concise, one-line philosophy to follow. Pick the one that suits you best, or better yet, make up your own!

2

Unfortunately, no number can do that.

First of all, the average vote count is really a function of time and, more importantly, of the average number of views.

Here on MSE, hitting the 200-reputation daily cap is something that often just happens. During my years on Stack Overflow it was really hard work (most of the time, at least) to hit that limit. Hitting 200 on MSE before lunch happened frequently; on SO I achieved that maybe 2 or 3 days out of the more than 1500 days I spent there... And the reason? More views, but overall fewer people writing questions and answers.

Therefore it doesn't mean much if an answer has 3 or 5 or 10 upvotes. Even on purely technical questions I have seen, for example, a person like Jon Skeet come in a few seconds later, writing mostly the same content as the first answer. Yet Jon saw 10 upvotes in 2 hours, while the other answerer sits at 1 or 2.

Thus: there is no universal rule. Sometimes a complicated answer gets only 1 downvote while all the newbies upvote a simpler answer, even though the more complicated answer was better...

  • 2
Yes, it is way too complicated. There are also positive feedback loops - previous votes affecting future votes (pile-on upvoting or downvoting), and more attention to some posts over others due to vote counts (sorting) and the "Related" questions and "Hot Network Questions" algorithms. Commented Dec 27, 2019 at 23:48
0

How many up-votes does an answer need for those up-votes to be statistically significant?

On getting views and votes, and what they mean ...

Many factors affect votes:

  • The season, holidays, day of the week, time of day, current events (school, sports on TV, national fireworks celebrations, big news, etc.), even disasters and connectivity.

    On an inactive site, most of the active members may get to see the question and its answers over a period of a week or two, negating many of the above effects.

    On a busy site the question may be pushed down the list, especially if it gets downvotes and one fast answer (posted in pursuit of an Enlightened badge, or an Explainer, Refiner, or Illuminator badge, or the Lifejacket, Lifeboat, and possibly the Guru badge); with one quite good answer that is doing well, others may be discouraged from entering a race where another answer has a head start.

  • Viewers may be attracted by the title of the question, the tags, and the first two sentences (when they can see them), even the user's name and reputation. Votes, number of answers, views, and status [Open/Closed] may also affect future visits. To vote one must visit; questions that don't attract views can't receive votes. Under certain circumstances a question may enter the Hot Network Questions list, though not all sites gain such exposure; the results of such exposure vary enormously.

    This is what your question looks like:

    [Screenshot: the question as it appears to a viewer in the question list.]
    All of the above, and anything I missed, affects receiving a visit (and a potential vote).

  • Having arrived at the question (and since you ask only about votes on answers), the question requires at least one clear answer that tackles it well. Since you only ask about "vote number per view count" (to determine "answer quality"), the answer doesn't need to be complete or great; it only needs to seem OK and be favorably received. The Bandwagon Effect can come into play, in many flavours.

    [Note: Those aren't my standards, they seem to be what is proposed by your question.]

  • Voting in political science involves what are called "input, output and throughput legitimacy", "negative and positive legitimacy", and "instrumental and substantive legitimacy".

    Those are:

    • Who can vote and how: by reputation, 15 to vote up, 100 to vote down (down-voting an answer costs 1 reputation, and dropping below 100 means losing the privilege), and 1,000 to see the separate up- and down-vote counts instead of the aggregate total. The "throughput legitimacy" can refer both to the correct counting of votes and, more frequently, to vote reversals.

    • Negative legitimacy refers to people studying the question and answer and voting correctly while positive legitimacy refers to people holding appropriate tag badges.

    • Instrumental legitimacy is the availability of an expert on the subject to answer your question (regardless of reputation, tags, or votes), while substantive legitimacy is people's (including the author's) belief that the author provides a definitive answer. If someone wrote a book or paper about the thing you are asking, that is often the case; but even Einstein made mistakes.

It's about more than simply ability and motivation, turnout decisions, and the quality of vote choice. Great questions (and their answers) can have few or many views, while good or bad answers can have many up- or down-votes, or none at all. Unless a team of experts double-checks each Q&A, you need to do your own research/homework once you have a hint, unless what is written is obviously correct.

  • See the Dempster–Shafer theory:

    "Belief functions base degrees of belief (or confidence, or trust) for one question on the probabilities for a related question. The degrees of belief themselves may or may not have the mathematical properties of probabilities; how much they differ depends on how closely the two questions are related. Put another way, it is a way of representing epistemic plausibilities but it can yield answers that contradict those arrived at using probability theory.".

    • and the Transferable Belief Model:

      "The transferable belief model (TBM) is an elaboration on the Dempster–Shafer theory (DST) of evidence developed by Philippe Smets who proposed his approach as a response to Zadeh’s example against Dempster's rule of combination. In contrast to the original DST the TBM propagates the open-world assumption that relaxes the assumption that all possible outcomes are known.

      Lotfi Zadeh describes an information fusion problem. A patient has an illness that can be caused by three different factors, A, B or C. Doctor 1 says that the patient's illness is very likely to be caused by A (very likely, meaning probability p = 0.95), but B is also possible though not likely (p = 0.05). Doctor 2 says that the cause is very likely C (p = 0.95), but B is also possible though not likely (p = 0.05). How is one to form one's own opinion from this?

      Bayesian updating of the first opinion with the second (or the other way round) implies certainty that the cause is B. Dempster's rule of combination leads to the same result. This can be seen as paradoxical, since although the two doctors point at different causes, A and C, they both agree that B is not likely. (For this reason the standard Bayesian approach is to adopt Cromwell's rule and avoid the use of 0 or 1 as probabilities.)".

    More votes (up or down) from more people don't make something more correct or incorrect; they simply move the score further above or below zero, unless every vote is correct.

    • Distilled, it degrades to Segal's law, an adage that states:

      "A man with a watch knows what time it is. A man with two watches is never sure."

      "In reality a man possessing one watch has no idea whether it is the correct time unless he is able to compare it to a known standard, in which case he effectively has more than one watch. This situation is not made any worse by having two watches.

      One might even think that it is better since if the two watches are in approximate agreement one might assume that both are working and an average of them will yield the correct time to within some accuracy depending on the specification of the timepieces.

      While this is true, the probability of knowing the right time is still exactly the same as with one watch. This is because the probability of all combinations of states of the two watches needs to be taken into account.".

  • In the paper: "Assessing the reliability of crowdsourced labels via Twitter" (.PDF), by Noor Jamaludeen, Vishnu Unnikrishnan, Maya S. Sekeran, Majed Ali, Le Anh Trang, and Myra Spiliopoulou they write:

    "We propose a new annotation tool for tweet sentiment labeling, that capitalizes on topic-specific expertise of Twitter users. We derive topics from the tweets and use them to derive topic-based reliability scores for the annotators. These scores we use in a weighting scheme for the annotated tweets. This allows us to exploit the fact that an annotator may be more reliable for tweets belonging to a certain topic than to other topics. ... We compare our model with Kappa Weighted Voting and Majority Voting as baseline methods, and show that our approach performs well and is robust when up to 30% of the annotators is not reliable.".

    As explained in the first half of this answer, Stack Exchange doesn't use voter reliability or tag expertise; the ability to vote is strictly privilege-based. Sometimes you simply need to assess the author and decide whether it's worth the time and effort to confirm their answer. No one is required to do that before they vote, but you may need to do it before you rely on the answer.
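
Zadeh's example quoted above is easy to check numerically. The sketch below implements Dempster's rule of combination for the special case where every focal element is a single hypothesis, which is all the example needs; `dempster_combine` is a name made up for this illustration, not a library function:

```python
def dempster_combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule of combination for mass functions whose
    focal elements are single hypotheses, as in Zadeh's example."""
    # Mass assigned to contradictory pairs (empty intersections).
    conflict = sum(m1[a] * m2[b] for a in m1 for b in m2 if a != b)
    # Renormalize the agreeing mass by the non-conflicting remainder.
    return {h: m1.get(h, 0.0) * m2.get(h, 0.0) / (1.0 - conflict)
            for h in set(m1) | set(m2)}

# Zadeh's two doctors and three possible causes A, B, C.
doctor1 = {"A": 0.95, "B": 0.05}
doctor2 = {"C": 0.95, "B": 0.05}
combined = dempster_combine(doctor1, doctor2)
# combined["B"] is 1.0 (up to rounding): the rule becomes certain the
# cause is B, even though both doctors considered B very unlikely.
```

The 99.75% of the joint mass that lands on contradictory pairs (A with C, A with B, B with C) is discarded and the remaining 0.25%, all of it on B, is renormalized to 1; this is the behaviour that the TBM's open-world assumption was proposed to relax.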

Can I rely on an answer with one up-vote? And with 100 up-votes? If they're different, where's the line? What if the question has a small number of views? Do the answers to the above change?

Sometimes only ~10% of the views will be accompanied by upvotes without any downvotes, and often that is an indicator that the answer is correct, but it would be foolish to treat anything as an indicator of certainty.

And related: What is a statistically significant number of down-votes?

Sometimes ~10% downvotes relative to upvotes indicates some problem, but darned if you can spot it when you don't know the answer; it's also possible for those votes to be incorrect (or to come from competing answerers).

An answer with a few downvotes and no upvotes can be correct, even accepted.

The badges Tenacious and Unsung Hero are awarded for zero score answers. Some sites, such as here or Space.SE, have awarded zero of those badges while other sites, such as Physics.SE (220) or Chemistry.SE (12) have a small number of those badges awarded. On Stack Overflow they've awarded 78.3K of those badges. At the same time, accepted answers can clearly be wrong.

Condorcet's jury theorem explains:

"The assumptions of the simplest version of the theorem are that a group wishes to reach a decision by majority vote. One of the two outcomes of the vote is correct, and each voter has an independent probability p of voting for the correct decision. The theorem asks how many voters we should include in the group. The result depends on whether p is greater than or less than 1/2:

      1. If p is greater than 1/2 (each voter is more likely to vote correctly), then adding more voters increases the probability that the majority decision is correct. In the limit, the probability that the majority votes correctly approaches 1 as the number of voters increases.

      2. On the other hand, if p is less than 1/2 (each voter is more likely to vote incorrectly), then adding more voters makes things worse: the optimal jury consists of a single voter.".
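
The theorem's two regimes can be reproduced with a few lines of Python; `majority_correct` is a name made up for this sketch, and n is assumed odd so there are no ties:

```python
from math import comb

def majority_correct(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters,
    each correct with probability p, picks the correct outcome.
    Assumes n is odd, so a strict majority always exists."""
    return sum(comb(n, k) * p**k * (1.0 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# p > 1/2: adding voters helps, e.g. majority_correct(0.6, 101)
# exceeds majority_correct(0.6, 11), which already exceeds 0.6.
# p < 1/2: adding voters hurts, e.g. majority_correct(0.4, 101)
# is below majority_correct(0.4, 11), which is already below 0.4.
```

The caveat for vote counts is that the theorem assumes independent voters who are each more likely right than wrong; the pile-on voting and popularity effects discussed above break both assumptions.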

  • In the paper: "Measuring Voter Decision Strategies in Political Behavior and Public Opinion Research" (22 Mar 2018), by Richard R Lau, Mona S Kleinberg, and Tessa M Ditonto, in the AAPOR journal "Public Opinion Quarterly", Volume 82, Issue S1, 2018, Pages 911–936, they write:

    "Broadly speaking, a decision strategy is “a set of mental and physical operations that an individual uses to reach a decision” (Lau and Redlawsk 2006, p. 30; see also Payne, Bettman, and Johnson [1993]; Lau [2003]; Redlawsk and Lau [2013]). At the very least, decision strategies involve plans for gathering relevant information (from the external environment, and/or by search through memory), evaluating that information, and choosing among alternative courses of action.

    Lau and Redlawsk described four broad types (or, in their words “models”) of decision strategies that are employed by citizens in making vote decisions. These four strategies differ in how much information is gathered (depth of search), and how evenly that search is distributed across alternatives (comparability of search)—the two major dimensions identified by psychologists across which various decision strategies differ (e.g., Jacoby et al. 1987; Ford et al. 1989; Payne, Bettman, and Johnson 1993).

    ...

    We also propose a fifth possible type of decision-making, one that gets more attention in the popular press than among psychologists, and is colloquially referred to as “going with your gut.” Keeping this common colloquial label, strategy 5, Gut decision-making, is strictly affective, usually unconscious, and involves no deliberate external searching for information. It should surely be associated with shallow information search, with no effort whatsoever to compare alternatives on anything other than how they make you “feel” (Dane, Rockmann, and Pratt 2012). Allegedly, it often provides very good decisions—or at least choices that, retrospectively, decision-makers feel good about.".

    As you can see, the quality of each vote (up / down, even abstain) can vary greatly.

  • Thanks for such a detailed answer. While I read it, I'll mention that "vote number per view count" isn't what I meant. Only that view count will influence vote count. For example, I don't expect them to necessarily be linearly correlated. I'll see if I can make that clearer in my post
    – joel
    Commented Dec 28, 2019 at 23:22
    Joel, (counting single sentences and one image each as paragraphs) the first 9 paragraphs cover "when will you generate a view" -- zero views = zero votes; nowhere do I suggest "a linear correlation", in fact efforts have been made to clarify each aspect. --- Since I retired last year at 20K flair I use reputation earned to measure the seconds devoted to each Q&A, season's greetings.
    – Rob
    Commented Dec 28, 2019 at 23:54
