
Three-vote close and reopen has been something I've been thinking about for ... a really long time. I'm really happy to finally be able to present the results of this test to you all. This post is long, so the short version is - we tested three-vote closure in two ways, and in general, it seems to help sites in both cases. I've tried to keep this post relatively neutral because I know there are many factors I can't control for that could have impacted this test, but the change appears to be neutral at worst and beneficial for many sites.

Three-vote-close/reopen is not, as I mentioned in the original announcement of this test, a perfect solution. It presumes that there are at least three people using the site who are willing to put in the time and effort to review posts and close the ones that need to be closed. Without that, this test can't be successful. We are looking into other options that will help with that. As Philippe mentioned in his 2021 Q4 Roadmap blog post, I'll be investigating an idea to weight votes for close reasons other than duplicates - this may be an alternative or addition to 3-vote close for sites in various situations. I look forward to talking with y'all about that in the coming weeks.

The Numbers

Note: The query I'm using checks dates relatively - meaning I input days before today (e.g. GETUTCDATE()-210) rather than stating a specific date. Because I didn't document which date I ran the queries on, and thus the exact date range, there is some variation in whether some sites met certain criteria.
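For anyone who wants to poke at SEDE themselves, here's a minimal sketch of what that relative-date filtering looks like - this is an illustration of the approach, not the exact query I used, and the table/column names are just the standard SEDE ones:

```sql
-- Minimal sketch of the relative-date window described above (assumed, not
-- the exact query behind this post). The window is expressed in days before
-- "today", so the results drift depending on the day the query is run.
SELECT COUNT(*) AS QuestionsAsked
FROM Posts
WHERE PostTypeId = 1                       -- questions only
  AND CreationDate >= GETUTCDATE() - 210   -- start of the "before" window
  AND CreationDate <  GETUTCDATE() - 150;  -- end of the "before" window
```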

I've since reworked the queries a bit so that I can show the data over time, which has made it possible for me to include some charts at the bottom of this post - while I'm only including a few for illustrative purposes here, I'll post them for each site when I make the site-specific results posts on the relevant sites’ metas.

Days checked

  • Before: 210 to 150 days before Monday 27 Sept 2021 (roughly 27 Feb to 30 Apr 2021)
  • After: 90 to 30 days before Monday 27 Sept 2021 (roughly 29 Jun to 28 Aug 2021)

Table 1 - low review completed group

| Site | % Reviewed Before | % Reviewed After | Change | % Closed by Mods Before | % Closed by Mods After | Change |
|---|---|---|---|---|---|---|
| Stack Overflow en Español | 45% | 64% | ↑ 19 | 41.9% | 14.4% | ↓ 27.5 |
| English Language & Usage | 68% | 95% | ↑ 27 | 10.2% | 18.6% | ↑ 8.4 |
| Server Fault | 51% | 46% | ↓ 5 | 10.1% | 5.1% | ↓ 4.9 |
| Software Engineering | 45% | 54% | ↑ 9 | 38.8% | 12.3% | ↓ 25.5 |
| Arduino | 47% | 97% | ↑ 50 | 61.3% | 94.7% | ↑ 33.4 |
| Home Improvement | 28% | 54% | ↑ 26 | 23.2% | 5.4% | ↓ 17.7 |

Table 2 - high moderator-led closure group

| Site | % Reviewed Before | % Reviewed After | Change | % Closed by Mods Before | % Closed by Mods After | Change |
|---|---|---|---|---|---|---|
| Stack Overflow em Português | 86% | 78% | ↓ 8 | 95.5% | 64.2% | ↓ 31.2 |
| Cross Validated | 94% | 93% | ↓ 1 | 97.1% | 92.1% | ↓ 5 |
| WordPress Development | 61% | 65% | ↑ 4 | 76.5% | 51.7% | ↓ 24.8 |
| Drupal Answers | 75% | 82% | ↑ 7 | 97.4% | 64.1% | ↓ 33.4 |
| Artificial Intelligence | 86% | 82% | ↓ 4 | 97.5% | 38.2% | ↓ 59.2 |
| Software Recs | 86% | 90% | ↑ 4 | 100% | 100% | 0 |
| Anime & Manga | 64% | 73% | ↑ 9 | 98.7% | 85.1% | ↓ 13.6 |

Percentages of completed reviews

For the first six sites (Table 1), we were primarily looking for improvement in the review completion percentages. Of those, during the time periods checked, five improved and one got worse, but only slightly.

These increases are welcome, certainly, but this doesn't seem to be enough on some sites to really get the reviews completed consistently. English Language & Usage and Arduino both made it up to the 90% completion range but the other four sites are still under 70%. While it's likely that some questions will always age out of review due to varying interest in reviewing, I would prefer that sites were above 80-90% completion as much as possible.

That said, the improvement was seen and it seems to be stable. I'll be interested in seeing how this changes over the next six months and whether it can help other sites as much as it did these. What was nice to see was that, even on the remaining seven sites (Table 2) where this wasn't the focus, four saw an improvement in completions, and of the three that dropped, two of the reductions were relatively small.

Percentages closed by mods

We want this number to go down. For the most part, it does, particularly on the sites where this was the focus of the study (Table 2). What we're seeing, though, is that the change is happening but not always by enough to meaningfully lighten the moderators' load. While the reduction is significant on some sites, it's minimal or non-existent on others, such as Software Recs and Cross Validated.

The two sites that show an uptick (English and Arduino) also show an increase in review completion, so it's possible that moderators are picking up more reviews after two votes. The query I'm using for this doesn't check whether the mod cast the "final" vote - only whether they voted at all. Arduino had a new moderator join the team on July 20th, right in the middle of the window I'm looking at - so this likely explains the increase. When I look at a similar 60-day window before that, I see mods handling only 44.5% of post closures.
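To make that "voted at all" distinction concrete, here's a hypothetical sketch of how moderator participation in closures could be approximated on SEDE. It is not the query used for this post: it assumes a hand-maintained list of moderator account ids (the ids below are placeholders) and assumes that the user recorded on a "Post Closed" history row was one of the users who voted to close.

```sql
-- Hypothetical sketch only (not the query behind the tables above). Assumes:
--   * @ModIds holds the site's moderator account ids (placeholder values)
--   * the user recorded on a "Post Closed" row (PostHistoryTypeId = 10)
--     participated in the closure
DECLARE @ModIds TABLE (UserId int);
INSERT INTO @ModIds (UserId) VALUES (11111), (22222);   -- placeholder ids

SELECT
    COUNT(DISTINCT ph.PostId) AS QuestionsClosed,
    COUNT(DISTINCT CASE WHEN m.UserId IS NOT NULL THEN ph.PostId END)
        AS ClosedWithModParticipation
FROM PostHistory ph
LEFT JOIN @ModIds m
       ON m.UserId = ph.UserId
WHERE ph.PostHistoryTypeId = 10                -- 10 = post closed
  AND ph.CreationDate >= GETUTCDATE() - 90     -- the "after" window
  AND ph.CreationDate <  GETUTCDATE() - 30;
```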

In the tables above, while I'm focusing on "closed by mods," I did look at the overall close and reopen numbers to see if there was any change in reopens - there are generally very small numbers of questions reopened, so I'm not certain the data is as valuable. What I do think is worth looking at is whether the number of reopened questions changed relative to the number of closed ones, though even that doesn't necessarily indicate that users were doing more of the reopening.

Table 3 (all test sites)

| Site | Questions Closed Before | Questions Reopened Before | Questions Closed After | Questions Reopened After |
|---|---|---|---|---|
| Stack Overflow en Español | 1172 | 20 | 1588 | 43 |
| English Language & Usage | 1010 | 23 | 1395 | 128 |
| Server Fault | 447 | 1 | 468 | 7 |
| Software Engineering | 363 | 4 | 341 | 7 |
| Arduino | 93 | 1 | 414 | 12 |
| Home Improvement | 108 | 1 | 129 | 4 |
| Stack Overflow em Português | 3983 | 77 | 2658 | 29 |
| Cross Validated | 1478 | 99 | 1034 | 63 |
| WordPress Development | 557 | 9 | 395 | 5 |
| Drupal Answers | 272 | 5 | 153 | 11 |
| Artificial Intelligence | 118 | 0 | 68 | 2 |
| Software Recs | 37 | 3 | 44 | 2 |
| Anime & Manga | 77 | 1 | 67 | 2 |

I'd say that, with the exception of English and a small uptick on Drupal, most of these numbers were relatively consistent, so we can safely say that reopen votes generally aren’t affected.
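If you want to eyeball that reopen-to-close ratio yourself, a small sketch along these lines would do it - again, this is an assumed approach and not the exact query behind Table 3:

```sql
-- Sketch: reopens as a fraction of closures over a 60-day window (assumed
-- approach, not the exact query used for Table 3).
SELECT
    SUM(CASE WHEN PostHistoryTypeId = 10 THEN 1 ELSE 0 END) AS Closed,
    SUM(CASE WHEN PostHistoryTypeId = 11 THEN 1 ELSE 0 END) AS Reopened,
    1.0 * SUM(CASE WHEN PostHistoryTypeId = 11 THEN 1 ELSE 0 END)
        / NULLIF(SUM(CASE WHEN PostHistoryTypeId = 10 THEN 1 ELSE 0 END), 0)
        AS ReopenedPerClosed
FROM PostHistory
WHERE PostHistoryTypeId IN (10, 11)            -- 10 = closed, 11 = reopened
  AND CreationDate >= GETUTCDATE() - 90
  AND CreationDate <  GETUTCDATE() - 30;
```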

Pretty Pictures:

A few notes - I'm choosing a few sites to show these images for, since graphs over time can do more to show trends than my static 60-day numbers, which can fluctuate for many reasons - time of year, changes to the review process, COVID, etc. The charts cover February 2020 through late September 2021. You can click on any of these images to view the full-sized version.

Reviews completing

These charts show percentages on the vertical axis (note that when sites hit 100% the graph tops out at 120%) over time, in weeks, on the horizontal axis. The "pending" status indicates reviews that are currently in the review queue, so only the last couple of weeks should show anything still pending.

English Language & Usage

EL&U is sort of the best-case scenario for these tests - it's what I'd hope would happen on any site. There's a relatively clear indication that the site has struggled to keep up with review tasks over time, but this change leads to consistent completion of tasks. There are more "leave open" reviews (the light blue line at the bottom), too - while this could mean a number of things, it does indicate that people aren't just closing everything now that fewer votes are needed to close.

Graph of English Language & Usage close review status from 2020-02-01 to 2021-09-27. It has four lines plotted over time, representing completed reviews, reviews completed with the question closed, reviews completed with the question left open, and pending reviews. It shows large swings up and down in task completion until early May 2021, after which completed reviews stay consistently near 100%.

Stack Overflow en Español

SO en Español is quite different - while there's an uptick and a trend in improved percentages initially, in recent weeks there seems to be a drop in completed reviews - it'd be worth understanding what's changed there. Hopefully this is a temporary drop and we'll see it pop back up over time.

Graph of Stack Overflow en Español close review status from 2020-02-01 to 2021-09-27. It has four lines plotted over time, representing completed reviews, reviews completed with the question closed, reviews completed with the question left open, and pending reviews. It shows completed tasks generally between 25-55% until May 2021, after which the rate averages between 53-76%, before taking a dip in late August 2021.

Software Engineering

From the early parts of the chart, it's clear that Software Engineering has been struggling with low completion rates for review tasks for a while. While this change wasn't a huge win that brought the site up to 100% completion, it's clear from the numbers that their lows now are about the same or above their highs before the change - so this feels like a reasonable improvement overall.

Graph of Software Engineering close review status from 2020-02-01 to 2021-09-27. It has four lines plotted over time, representing completed reviews, reviews completed with the question closed, reviews completed with the question left open, and pending reviews. In the early part of the graph, completed tasks sit between 20 and 50%, but after May 2021 the completion rates are between 45 and 75%.

Server Fault

Server Fault had a much more modest response to the change. While their completion rates are consistently above 40% now, which they haven't been since August of 2020, they're also rarely above 65%. This graph leaves me wondering what happened in August 2020 to impact the reviewing on the site.

Graph of Server Fault close review status from 2020-02-01 to 2021-09-27. It has four lines plotted over time, representing completed reviews, reviews completed with the question closed, reviews completed with the question left open, and pending reviews. Before August 2020, the site consistently completed between 50-90% of review tasks. Between August 2020 and April 2021, the completions swing up and down between 20-80%, but most frequently sit between 20-40%. After April 2021, the completions are consistently between 40-70%.

Moderator vs User completion

These graphs cover roughly the same period of time as the prior ones - for reasons™, this uses a lookback in days (GETUTCDATE()-605 from 28 September 2021) rather than being set to a specific date range. These graphs show the number of posts closed on the vertical axis and time in weeks on the horizontal axis.
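As a rough idea of how the weekly buckets behind graphs like these can be produced, here's a sketch using that 605-day lookback - an assumed approach rather than the exact query, and it doesn't attempt the moderator-vs-user split the actual graphs in this section show:

```sql
-- Sketch: questions closed per week over a 605-day lookback (assumed
-- approach; the real graphs also split closures by moderator vs. user).
SELECT
    DATEADD(WEEK, DATEDIFF(WEEK, 0, CreationDate), 0) AS WeekStarting,
    COUNT(DISTINCT PostId)                            AS QuestionsClosed
FROM PostHistory
WHERE PostHistoryTypeId = 10                          -- post closed
  AND CreationDate >= GETUTCDATE() - 605
GROUP BY DATEADD(WEEK, DATEDIFF(WEEK, 0, CreationDate), 0)
ORDER BY WeekStarting;
```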

In the grand scheme of things, this felt like a smaller "win" than the low completion rates. While Table 3 above shows a consistent reduction in total closures, most of these sites saw an increase in reviews completed, which perhaps indicates that fewer posts were nominated for closure. This could mean that fewer posts were close-worthy or that mods were less likely to unilaterally close a post after this change went into effect.

It's worth saying at this point that we don't ever expect moderators to close zero questions - there will always be times, particularly on small to medium-sized sites, where moderators see posts that clearly should be closed, whether those are duplicates, blatantly out-of-scope questions, or low-quality questions that need clarification. Unlike review completion, it's OK if mods are doing 10-50% of the closures - what we don't want to see is them doing all of the work on their own.

Stack Overflow em Português

This was the largest site we ran these tests on, and Table 3 above shows a reduction in questions closed during the 60-day period reviewed - that seems to be confirmed in the trends here, though the control period appears to have had higher-than-average closure numbers (~450-500 per week vs ~300-450 per week). Regardless of the total number of posts closed, the graphs clearly show that this is the first time in recent history that users have consistently contributed 50-150 closures per week, whereas before May 2021, users generally closed between 10-60 questions per week. Moderators are still doing a decent amount of closing, but they've got some help from community members now.

Graph of Stack Overflow em Português questions closed and reopened over time starting in February 2020 until September 2021 with lines for total questions closed, questions closed by moderators, questions closed by users, as well as lines for these three groups for reopens. The contents of the graph are explained in the text.

Artificial Intelligence

In contrast, AI was one of the smaller sites we tested this on, and I see a good indication of increased community participation. I'm presenting two graphs here, as an outlier week in early 2021 makes it difficult to see the change in recent months, so the second graph narrows the lookback to 235 days. There has been a reduction in questions closed but, with only a 4-point drop in the percentage of review tasks completed, that could mean that fewer questions needed to be closed or that fewer questions were nominated for closure.

Graph of Artificial Intelligence questions closed and reopened over time from February 2020 until September 2021, with lines for total questions closed, questions closed by moderators, and questions closed by users, as well as lines for these three groups for reopens. Until May 2021, the bulk of the 10-40 question closures per week were done by the moderators. There is a huge outlier data point at 137 closures in a week in January 2021 that makes the other points very crowded - read the next image for a more detailed description of the recent data.

The same graph as before, but it begins in March 2021 to cut off the outlier data point. Prior to May 2021, most of the closures were handled by moderators, but afterwards there is an increase in community participation in both closure and reopening.

WordPress

This graph seems to show that WordPress has made a pretty sizable change in community closing over the past few months - prior to this change, the community was closing between 1-25 questions per week on average, with mods doing the bulk of the 35-75 weekly closures. While the number of closures per week still varies a lot, community members are now more frequently closing 25-60 per week while the moderators act on 10-50, a much more even balance.

Because the closures vary so much on WordPress, I'm also including a smoothed monthly view, which shows the community participation a bit more clearly.

Graph of WordPress questions closed and reopened over time from February 2020 until September 2021, with lines for total questions closed, questions closed by moderators, and questions closed by users, as well as lines for these three groups for reopens. Closures per week vary widely, between about 25 and 125, but in general the moderators are doing most of those closures. There are a few points where users do more, but the big change happens starting in May 2021, when users actually started handling more of the total closures.

The same graph as above but, due to the wide variation in week-to-week numbers, plotted by month. The result is a much smoother graph showing between 175-400 closures per month, mostly done by mods until the users start doing the bulk of the closures after May 2021.

Software Recommendations

Having run a similar test on Hardware Recommendations, I was curious whether this change would benefit Software Recs - while it's possible that users are doing more to help out by casting that first and second vote to close, it seems like the moderators are still doing the bulk of the work to actually close questions. I'll be interested to look into this more closely as I'm concerned that the results on this site may indicate that there's a site size or engagement level at which it's simply unlikely that the community can do the work of closing posts, even with a lower number of votes needed.

As you may have noticed in the table above, both before and after this change, moderators were closing 100% of the posts closed on Software Recs, and this seems to be the general case. The line that pops up above the bottom on this one isn't community closures like it is on many of the other sites - it's moderator reopens. Because of the small number of post closures and the large variation, I'm showing this one by month rather than by week.

Graph of Software Recommendations questions closed and reopened over time from February 2020 until September 2021, with lines for total questions closed, questions closed by moderators, and questions closed by users, as well as lines for these three groups for reopens. The graph shows that around 10-35 questions are closed per month, trending downwards since January 2021, after which closures are 10-20 per month. The lines for all closures and moderator closures are identical except in two months (both prior to May 2021) where the moderators closed one fewer post.


I have a ton more graphs I can share but this post is huge as it is - I'll be making site-specific posts over the next week or so with the info above as well as additional info where it was requested. I'll also be using the information I have to start 3-vote close/reopen on the other sites that have requested it. In general, I don't see any indication of negative impact.

I'd like to thank you all for your patience as I got this project moving and the data assembled. I'd also like to say thanks so much to Shog9 for the queries I've used to make the tables and graphs above as well as Nicolas Chabanovsky for creating a bunch of new queries that I'll be using in the per-site posts. Additionally, thanks go to Slate for helping me review the queries and ensure they were showing what we intended.

  • 11
    Thank you for the detailed results. I can see how much work this involved. Making it possible to make data-driven decisions is worth it though, I promise :)
    – ColleenV
    Commented Sep 28, 2021 at 17:53
  • 14
    (Also, you did a great job on the image descriptions; that's not as easy as some might think)
    – ColleenV
    Commented Sep 28, 2021 at 18:09
  • 10
    @ColleenV Thanks! And I really appreciate that you noticed the descriptions! It really is hard to do without being too repetitive but reliance on images is something I wanted to avoid - so I appreciate it!
    – Catija
    Commented Sep 28, 2021 at 18:46
  • 2
    Wow, I can't believe 100% of the closures on SoftwareRecs are by mods. I need to visit that site more. Do you have data on whether any of the questions closed were at least flagged for closure by normal users below 3k reputation? Or are moderators there literally the only users closing things?
    – TylerH
    Commented Sep 28, 2021 at 19:24
  • 1
@TylerH, maybe we need to raise more custom flags or comment flags there to keep mods busy, so community users get the chance to do some closing as well. But on a more serious note, an interview with that site's moderator team might provide some insight on what exactly happens, and more importantly, how that specific moderation team feels about the current state of affairs.
    – Luuklag
    Commented Sep 28, 2021 at 19:41
  • @TylerH That's not unexpected for a small site that gets an average of 3.4k visits a day. Also, the site only has 19 non-mod users with 3k+ rep and some of them (didn't check how many) are not active.
    – 41686d6564
    Commented Sep 28, 2021 at 19:42
  • 2
@Catija I notice that for some sites you give ballpark figures of the number of posts regularly closed. Would it be possible to provide these sorts of figures for all sites in the test? It could shed some light on how problematic a high mod closure percentage is, for example. If there are hundreds to thousands of questions closed per week, or only about 10, that obviously makes a huge difference in the workload.
    – Luuklag
    Commented Sep 28, 2021 at 19:43
  • @41686d6564 Yeah, I suspect as much for a small site that mods will do the lion's share of the closing, but 100% is remarkable (especially if no one is flagging for closure, either).
    – TylerH
    Commented Sep 28, 2021 at 19:43
  • 2
    @Luuklag Table 3 has much more specific numbers - those are over the two 60 day periods I checked. The numbers attached to the graphs are just my estimates based on the points I see, so they're actually less precise. But I'll have that info in the individual posts I make on sites and they'll be linked to here when I make them. I just couldn't go over 13 sites twice in this post or else it would have been way too much.
    – Catija
    Commented Sep 28, 2021 at 20:00
  • 2
Ah yeah, thanks for that Catija. So it appears that although the mods on Software Rec. close 100% of the questions, they on average close fewer than one question a day. I don't see that as very problematic.
    – Luuklag
    Commented Sep 28, 2021 at 20:05
  • "The query I'm using for this doesn't check whether the mod cast the "final" vote - only whether they voted at all." - If there are enough users working the queues on a site it's always fair if the moderator reviews last (with the equivalent vote of a regular user) or earlier if it's undeniable that it should be closed. --- It cuts out "users assisting with moderation" if the moderator hammers the review, we see that (and the opposite, no users reviewing) on some sites; something to be avoided both for the sake of the site in question and other sites where the habit may follow the users.
    – Rob
    Commented Sep 29, 2021 at 4:46
  • 2
    @Rob I do have a query (one of Nicolas's) that identifies how many votes mods cast between 1-5 in a given month. Here's what it looks like on SOPt, for example, and on Software Recs - if I'd looked at something like this before testing on SR, I would have known that only single-vote close would have made a difference there, since nearly every vote by a mod is the first vote to close. This graph gets really messy the fewer votes/month there are on a site when the mods' voting varies a lot.
    – Catija
    Commented Sep 29, 2021 at 4:57
Catija, thanks for the reply; Nicolas has already suggested one-vote close at SR.SE. Links to the queries on SEDE would be easier to use than 1010x593-pixel images, which are effectively thumbnails. This appears to be on track for modifications: data.stackexchange.com/stackoverflow/query/987014/…
    – Rob
    Commented Sep 29, 2021 at 9:08
@Catija I'm having difficulty reading the post for the following reason: right before Table 1, there's no clear definition of what the two columns % Reviewed Before and % Reviewed After mean (the expressions aren't self-explanatory by themselves). This leads the reader to start considering the data only to find themselves second-guessing the exact meaning of the first columns. For ease of reading, I would suggest leading the table with a clear definition in prose of those two columns.
    – bad_coder
    Commented Sep 29, 2021 at 13:56
  • 5
    @bad_coder The title of the table should explain it - they're completed reviews.
    – Catija
    Commented Sep 29, 2021 at 14:06
