104

Summary

With the introduction of new, simplified graduation requirements, we're preparing to shut down the venerable Community Eval review process. Years of waning participation, combined with a general lack of clarity as to how these evaluations mesh with the normal graduation process, have turned this system into an anachronism at best and a waste of time at worst.

This leaves open the question: how should we address the needs that motivated this system's creation in the first place... and do we even need to?

Background

Years ago, Grace Note tossed out a complicated question for discussion:

How can the community pull off a self-evaluation that would be meaningful? What would be the process? How would it be organized? Can we use meta and/or chat? The process and the results should be transparent so the community can learn from the experience.

At the time, we – the six members of the community management team – were manually reviewing every beta site periodically, checking traffic, user engagement, and Q&A quality. Our biggest concern at the time, born from experience on some of the earliest sites, was that these young communities would devolve into content farms or discussion boards, or would simply grow quiet and gradually decay. So every few weeks, one of us would grab an analytics report and a pile of random questions and try to read the tea leaves...

By the end of 2011, it was clear this was neither scalable nor even particularly useful; the results were secret, emailed to a handful of people within the company and quickly forgotten about. Meanwhile, Robert found himself explaining again and again that we were not going to be shutting down otherwise-healthy sites that were simply a bit small...

So one day, in one of our regular team meetings, we hatched this crazy idea: instead of quietly judging site quality in some secret star chamber and then trying to hand-wave away questions about our inscrutable process, why not ask the folks who might actually know something about the topic to take a good hard look at what they'd built and... decide for themselves if they were really making the 'Net better.

So Grace Note raised the matter for discussion, while I kicked off an ad hoc experiment on Gardening.

A year and a couple dozen further experiments later, we built an automated process for this on top of the then-new review system...

The problems

Anti-introspective

Looking back at how this was automated, it seems clear that we'd already lost track of why we were doing it in the first place. What we wanted was an opportunity for the community to gather and engage in some collective introspection; what we got was a parallel voting system, the results of which were divorced from any discussion or long-term artifacts.

For internal use, it offered the seductive convenience of scanning a row of numbers. Whether they meant anything is anyone's guess, but they looked serious. As for whether they helped the community gauge its own progress or face up to persistent problems... It's instructive to compare the last “experimental” quality eval post on Japanese Language with the first automated site eval post a year later: slightly less voting, 100% less discussion. This became a pattern...

Unclear goals

The problem with trying to kill two birds with one stone is that you often miss both birds and lose your stone.

  • Our internal goal was to see how the sites stacked up against the greater Internet - was the site adding to it, or just imitating? Hence the instructions referring to Google searches and the associated comparisons.
  • Our goal for going public was to draw the community's focus away from numbers they couldn't easily change and onto quality issues that they could. Hence the talk of closing and editing.

Nothing wrong with these goals, but nothing in the actual process guided reviewers toward either one - instead, it just generated more numbers. Even the intrepid members who did step up to discuss problems they observed often ignored both goals and took the opportunity to talk about whatever else was on their minds; at no point did the system attempt to guide or correct anyone's chosen strategy – if you wanted to go down the list voting "satisfactory" for everything and then call it a day, you could... and many reviewers did just that.

The questions

In conclusion, we're not happy with how this system is working. It frequently fails to provoke any useful introspection, and perhaps worse, when it does prompt thoughtful responses, those writeups tend to sit unaddressed and undiscussed next to a pile of numbers from folks punching buttons.

So...

Do any of you still find these useful?

...and if so, how so? Is there a baby lurking somewhere in this murky water?

Is this sort of forced introspection even needed anymore?

A lot has changed in the past three years. There are more review queues, more opportunities to identify problems and more tools to address them, better exposure of meta discussions, and better up-front vetting for site creation itself. Custom off-topic reasons promote discussion and resolution on the questions themselves, and stats offer experienced members a bird's-eye view of problems as they develop. And there's a new process for graduation and, hopefully, clearer criteria for site closures, reducing the fear and doubt that so often led to the site-stats fixation in the past...

Your thoughts?

10
  • 9
    I think I'd be the jerk and say the eval process wasn't as useful as it was meant to be, while clearly having nothing better in mind. At the very least, as someone who just got their main site <cough> graduated, I'll think about it. The eval process garnered reckless votes. IMO, the satisfactory vote option was the worst part. Whenever someone was undecided, they voted the stuff as satisfactory. The worst of the questions ended up piling up ten-something satisfactory votes, and then, even with a net score of -5 or so, they went unnoticed.
    – M.A.R.
    Commented Jun 19, 2015 at 20:19
  • 5
    Jon Ericson's post on the subject seems relevant. (Linking since not everyone here reads meta.Puzzling).
    – user259867
    Commented Jun 19, 2015 at 20:42
  • 24
    For me the problems begin with Area51. It does not feel like the site is a good tool for deciding which sites to launch in the first place and which not to. But that is another discussion...
    – juergen d
    Commented Jun 19, 2015 at 21:08
  • 3
    At some point they started to feel just too random to be useful to me. With 10 questions, the randomness due to question selection pretty much makes the numbers that come out of the process almost useless. Commented Jun 20, 2015 at 11:45
  • These evals remain entirely devoid of any usefulness.
    – user154510
    Commented Jun 21, 2015 at 7:13
  • 1
    One thing I noticed from having my edits reviewed is that they get reviewed by people who have no expertise in the area in which I am writing. People get a random review item, it looks good, they approve; but later it turns out my edit was wrong. Suggestions to fix that: allow users to specify tags in which they are more proficient, and have them review only questions with those tags; somehow encourage them to skip questions whose subject matter they are less familiar with; and if someone has posted in a thread that has something needing review, that person should see it.
    – Alex
    Commented Jun 22, 2015 at 11:16
  • Do you know that your automated process made stackexchange sites a sentient being? Commented Jun 22, 2015 at 16:48
  • The bigger issue / bottom line is that SE needs a more straightforward, transparent mechanism to gauge beta site health and for beta sites to graduate; otherwise it's endless beta limbo. Some beta sites are several years old. There needs to be some urgency on SE's side to deal with very old beta sites in some way, and it hasn't been there for ages... It's the feeling of continually being in the middle of a (nonexistent!) queue and going neither up nor down!
    – vzn
    Commented Jun 22, 2015 at 23:45
  • @vzn The post Graduation, site closure, and a clearer outlook on the health of SE sites is precisely about that
    – user259867
    Commented Jun 23, 2015 at 0:19
  • @Alex What kind of edits are you suggesting that turn out to be "wrong" later? For the most part, edits should improve presentation without changing the meaning. In any case, this isn't on topic under this question: if you'd like to request such a feature, you should post a separate feature request.
    – user259867
    Commented Jun 23, 2015 at 0:23

15 Answers

53

In a minority of the evaluations I've participated in, the evaluation has prompted some useful discussion -- maybe as meta posts directly on the evaluation post, or more likely as other meta posts (is this really within our scope? how can we improve that pattern in answers? are these the right tags? etc) or in chat. People can do that any time, of course, and often do, but sometimes the evaluation kick-starts it. That's good -- but it's also a minority of the cases. Most of the time it's as you say -- a few people try to do the review, maybe getting hung up on the requirements ("how does this fare in Google?" always hinders me), but mostly a stats post gets made and people don't really discuss it.

I think the introspection is important, and there's a lot of variation in how much of this communities do on their own. So we should do something, but not this.

How about having a set of questions, or categories of question, that should be asked at various stages in a site's development? You have the "7 essential questions" of early beta, so maybe something like that but less prescriptive. Make it milestone-based, not time-based (mostly), because every site is different. And here's where I wave my hands a little because I don't have metrics, but something like this: What questions should sites be asking themselves around the time they get question #500? #2000? The first Deputy badge? 200 active users ("active" to be defined)? 500? Five users at over 4k rep? 50 total hot network questions? 200 completed first-post reviews? 500 completed VLQ reviews? Two years in beta?

When a site reaches a triggering milestone, or when it's been "a while" (TBD) since the last activity of this sort, perhaps the moderators could receive a notification -- "hey, you reached such-and-such milestone, this might be a good time to talk about X if you aren't already, here's a template for a possible meta post (if you want it to come from a CM just ask)". And let the moderators take it from there. Their users might already be handling all this just fine, but if not, this can be a useful, no-obligation, help.
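For what it's worth, the trigger check itself could be tiny. Here's a minimal sketch (in Python, purely illustrative) of how "did we just cross a milestone?" might be detected by diffing two snapshots of site stats; the SiteStats fields, the thresholds, and the milestone list are all assumptions for the example, not an existing feature.

```python
from dataclasses import dataclass

@dataclass
class SiteStats:
    questions: int
    active_users: int
    users_over_4k_rep: int
    months_in_beta: int

# Illustrative milestone table: (label, predicate) pairs; thresholds are guesses.
MILESTONES = [
    ("question #500", lambda s: s.questions >= 500),
    ("question #2000", lambda s: s.questions >= 2000),
    ("200 active users", lambda s: s.active_users >= 200),
    ("five users over 4k rep", lambda s: s.users_over_4k_rep >= 5),
    ("two years in beta", lambda s: s.months_in_beta >= 24),
]

def newly_reached(previous: SiteStats, current: SiteStats) -> list[str]:
    """Milestones crossed between the previous check and this one."""
    return [label for label, reached in MILESTONES
            if reached(current) and not reached(previous)]

if __name__ == "__main__":
    before = SiteStats(questions=480, active_users=180, users_over_4k_rep=4, months_in_beta=23)
    after = SiteStats(questions=510, active_users=205, users_over_4k_rep=4, months_in_beta=24)
    for label in newly_reached(before, after):
        print(f"Reached {label} -- maybe time to prompt the mods for a meta discussion.")
```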

1
  • 8
    I like this. I wish this had been implemented before (and I hope it's implemented in the future).
    – HDE 226868
    Commented Jun 19, 2015 at 21:36
33

The problem I have had with Community Evaluations is, in part, what you state: they don't seem to generate much discussion.

But it's more than that, and it may in fact be the reason for the lack of discussion: the evaluations didn't seem very actionable. (Maybe that's what "Unclear goals" is intended to mean?) That's really part of the problem, I think, with longer-lived but still small beta sites in general: it's very unclear what actions can be taken to improve the site's size, quality, et cetera, and something like a Community Evaluation didn't help.

Think of it as a review at work.

A bad review:

  • Lists the things you did last year
  • Grades those things
  • With an inconsistent and/or hidden set of standards

A good review:

  • Gives examples of behaviors, not just "things"
  • Points to what can/should be done to improve
  • Has consistent and fully disclosed standards

Community evals were much closer to the former: a bunch of evaluations, each of us undoubtedly with different standards, and no clear 'actions' to improve - that's left to the reader to figure out.


So, what's actionable in this answer?

What would be useful, in my opinion, would be something a bit more focused on improvements in the community.

I don't think there's a way to do this without some involvement from SE staff, though, or else some people who actually know how to grow a community. When you think about our sites that are in the middle - sites with potential to grow, but that aren't actually there yet, and haven't been for a while - I bet you find a lot of people passionate about their sites and their topics of interest, but who aren't specialists in community development. That's the thing: we all have opinions as to what can be done, but between a lack of time and a lack of actual knowledge, it's very hard to get things going.

That's where SE's community managers come in. They know something about community development, and can be a useful neutral observer to point out the flaws and point out things that can be done that perhaps don't appear obvious to users. Sites see them occasionally, and the attention is often very beneficial. Obviously there's not an unlimited number of CMs, and they have their own time constraints - but I think a limited amount of involvement can help here.

The Community Evaluation might be useful as a starting point for those discussions - maybe a Mechanical Turk approach, asking us to find similar posts on other sites, for example - but I think the discussions need someone like a CM to start them and provide some feedback.

If CMs don't really have time to do this on a practical level for the many beta sites that undoubtedly need the help, perhaps users who do know something about community development (and we have lots of those too) can volunteer to help with other sites, and then moderators of beta sites can put their site on a queue to get the help of those volunteers?

7
  • 1
    Re: using the evaluation tool to start discussions, one possible idea would be to take the results of the self-evaluation and open a Meta discussion for any question/answer which gets a bad review. (Ideally IMHO this is how we should be doing the self-evaluations anyway: community discussion on Meta or in chat. That doesn't produce a row of numbers you can scan, but it may lead to more actionable goals, or at least more community action in general.)
    – voretaq7
    Commented Jun 19, 2015 at 21:12
  • 1
    I'm not sure that would work very well if automated, unfortunately; I feel like the reason the current Community Evaluation Results post rarely gets much discussion is that it is automated, and nobody feels like they need to reply to a machine; while when I post something on meta, it always gets a reply, even if it's not very interesting, because people feel like they need to reply.
    – Joe
    Commented Jun 19, 2015 at 21:17
  • 4
    I think part of what keeps the current stuff from getting discussed is there's no obvious place TO discuss it (there's no text box on the self-evaluation so people just click the buttons and move on with their lives). By opening a discussion on Meta maybe people would reply to that - it could be automated or done by a CM in response to the self-eval results, but at least it's obvious that it's a subject for discussion rather than just some buttons to mash.
    – voretaq7
    Commented Jun 19, 2015 at 21:21
  • @voretaq7 Well, they do automatically post a single "Results" post. That rarely gets a reply, though. Certainly it would be nice to have more 'comments' type things on the review, though.
    – Joe
    Commented Jun 19, 2015 at 21:22
  • Yeah, short-circuiting the whole thing and just having a CM pick a couple of posts for discussion may be a better route to go (or for sites with a semi-active chat dropping a post in there as discussion fodder).
    – voretaq7
    Commented Jun 19, 2015 at 21:24
  • I really like the suggestion that these self-evals should be generating community feedback to feed to a CM who can synthesize them into more directed and personal Meta discussion.
    – Air
    Commented Jun 19, 2015 at 21:32
  • 3
    @voretaq7 We actually tried something like that on Japanese before the automated evals, but the end result was more a focus on "how can we fix these specific needs improvement questions" rather than "how can we improve the general quality of our site". In other words, even when there was a review-driven discussion, it was mostly focused around the 10 random questions picked by the community team/tool rather than an ongoing discussion about the site.
    – Troyen
    Commented Jun 19, 2015 at 21:44
29

I don't find the existing evaluation process useful at all.

For the purpose of introspection, I would find a "guest view" of the site useful. What does the site look like from the outside? What do the people with no relation to the Stack Exchange network look for when they land there, and what do they find?

So, I propose a different automatic process:

  1. Once every 3 months, post a list of the 20 questions between 3 and 6 months old that got the most visits from search (a rough sketch of this selection follows the notes below). View counts are not useful for this; they are distorted by Hot Network Questions and favor clickbait titles.

  2. Right there in the automatic meta post, provide anonymous feedback data for those questions and answers to them. (If there are several answers, it's enough to give stats for the one that has the most feedback).

  3. Leave the rest to the community: they will ponder the results, and probably fix a post or two. This is not meant to make decisions about the fate of the site, but to encourage good housekeeping habits — something that's lacking in a few places around the network.


Notes:

In principle, (2) is already available, but only to those with mod tool access and to the tiny minority that uses Data Explorer. (1) is not currently available.

If visits from search are not available for technical reasons, pick the questions with the most feedback (total number of anonymous votes on the question and its answers).

The specific parameters above (how often, how many questions, and the age range) are tentative.
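To make step (1) concrete, here is a rough sketch of the selection logic in Python. The question records and the search_visits field are hypothetical - as noted, per-question search-referral counts aren't currently exposed - so this only illustrates the filtering and ranking being proposed.

```python
from datetime import datetime, timedelta

def eval_sample(questions, now, top_n=20):
    """Top `top_n` questions between 3 and 6 months old, ranked by search visits."""
    newest = now - timedelta(days=90)    # at least ~3 months old
    oldest = now - timedelta(days=180)   # at most ~6 months old
    window = [q for q in questions if oldest <= q["created"] <= newest]
    return sorted(window, key=lambda q: q["search_visits"], reverse=True)[:top_n]

if __name__ == "__main__":
    # Hypothetical records; per-question search-referral counts aren't public.
    questions = [
        {"id": 101, "created": datetime(2015, 3, 1), "search_visits": 420},
        {"id": 102, "created": datetime(2015, 2, 10), "search_visits": 1300},
        {"id": 103, "created": datetime(2015, 6, 15), "search_visits": 990},  # too recent
    ]
    for q in eval_sample(questions, now=datetime(2015, 7, 1)):
        print(f"question {q['id']}: {q['search_visits']} search visits")
```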

2
  • 3
    This is a good idea. "Do people see our best when they visit?" is an important question that has a higher probability of actively engaging the core users (that care about the site). Also, the result is immediately actionable: if our visitors see crap, we better fix it!
    – Raphael
    Commented Jun 22, 2015 at 9:00
  • 1
    Addition: don't put the same question(s) up for discussion every time.
    – Raphael
    Commented Jun 22, 2015 at 9:07
17

In their current state, these self-evaluations ask people to do work for no clear outcome. You start with a task-oriented queue to complete which ideally triggers multiple epiphanies to manifest in harmony on the meta site, leading to brilliance and actionable steps toward realistic goals. Maybe we weren't that optimistic about these when we rolled them out, but that's okay.

They don't work as intended.

What I like about these is:

  • They prod the community to look at how other sites in the topic space are doing
  • They could prod the community to look at what other sites in the topic space are doing

If we have a site about mayonnaise, we want it to be as good as, or better than, any other site about mayonnaise. On only very rare occasions has a topic come together in questionable quality on our engine, and quality is something we shouldn't need robots to remind us to check. If quality tanks and goes unnoticed, we've got far bigger problems to solve.

There are all sorts of bias going into how people rate the random sampling of questions, but that doesn't matter nearly as much as the fact that the number of people participating in these is exceedingly small to begin with.

Let's keep this, but:

  • Kill the 'rate our quality' queue altogether. People rate the quality of their content every day by voting. It's presumptuous of us to ask them to do it in this manner and the results aren't useful to us or them. In fact, when the results are overly critical, this exercise can be actively harmful as it leads to self-deprecation.

  • Open a discussion instead asking people to take a look at the front page of their site, which is a better representation of the kinds of questions people are more likely to care about, and do some casual searching.

  • Ask people to share their thoughts as an answer to the discussion. What are we good at? What could we change for the better? How's our relevance to the topic for the Internet at large? Is there anything in our way that we could change?

The value in the ensuing discussion would be light years ahead of trying to figure out why people rated a random sampling of questions the way that they did. We turn this away from being a call to action to accomplish units of work and into more of a "it's that time again, to gather together and talk about what we're doing."

Change the frequency to yearly. Anyone is welcome to open a meta discussion about the direction a site is headed at any time, and a year for this seems long enough for there to be enough new things to talk about.

Once modified, I don't see a reason to not run this on all of our sites. Asking folks to come together if only to say "Hey, are we still relevant?" is definitely not an exercise in futility.

6
  • 5
    I disagree with the front page being representative of something important. On a slow site, it's a bunch of questions bumped by Community or by the since-deleted non-answers posted under them. The non-answers do correlate with the sort of questions on which anonymous visitors land (since they are the ones posting them), but this is a roundabout way to identify those. Also, if a problem with review is an exceedingly small number of participants, then wouldn't the same apply to any meta discussion? If the main site's relevance is in question, its meta is going to be in thick layers of dust.
    – user259867
    Commented Jun 20, 2015 at 18:29
  • 1
    I like that the call to action is "discuss on meta" rather than "complete this queue", but if users aren't already clicking through the community bulletin I don't see how putting a "time to discuss" robopost there is going to drive any more traffic to meta. Commented Jun 20, 2015 at 19:32
  • Once-a-year discussion about where the site is headed would be nice to have, of course. On active graduated sites, such discussions tend to come up during moderator election, which is quite possibly the time of highest meta engagement. Consider adding a discussion topic either at that time, or somewhat earlier.
    – user259867
    Commented Jun 20, 2015 at 22:16
  • 1
    This idea, combined with Slim's "outsider homepage" idea, has some value. Let's think about this as a replacement for the oft-forgotten "greatest hits" route.
    – Shog9
    Commented Jun 21, 2015 at 0:23
  • 1
    @Idisagree It's .. not perfect, but it's generally a sampling of stuff that people found valuable enough to do something with. We can put filters on that to exclude dust balls that community bumps without any evidence of active interest by someone else (a vote, an edit, even a comment)
    – user50049
    Commented Jun 22, 2015 at 6:41
  • 1
    @Idisagree Also, there seems like some value in putting some of the 'dust' in the mix, especially if it's getting kind of thick. Meta being a ghost town is something we'd have to look at. Timing these with elections (perhaps a month or two prior) would be ideal, but we ... well, we don't always know we're having an election that far in advance. The community bulletin usually does a good job of driving folks there, I'm open to other ideas for (gentle) nudges.
    – user50049
    Commented Jun 22, 2015 at 6:45
16

Wait, there is a Site Self-Evaluation review queue?

Not until reading your post, going to a respective beta site, looking on meta, reading the answer to look at the ratings, and then re-reading the post did I realize there was a temporary review queue that gets opened for this process.

I literally had no clue before this. I am familiar with the overall exchange. I have over a thousand posts and visits. Apparently I have helped millions.

I treated the beta exchange just like any other exchange, visiting frequently even though the stream of questions was slow. I poked around a little, but overall I expected the beta to have fewer features.

I would have used it... had I known

What I am getting at is that there needs to be more exposure for this evaluation process. Simply posting it on the meta of sites with low traffic and opening a queue for a week leaves it rather in the shadows. I believe this can be seen in the amount of feedback (it is rather low).

Having been just now exposed to this process and having a fresh perspective on it, these are some of my first reactions.

Has this been tried on SO for example to get an idea of what a massive site looks like?

How was the 7 day window chosen?

Why is it a baseline 10 questions regardless of size?

What type of criteria is used to select the random questions?

These are not meant to be overly critical; they are just what first came to mind about the process.

I really like the nuance that the beta sites contain. It is an interesting place to see the model of Stack Exchange being applied. I feel that most people who participate in beta sites feel similarly and want them to succeed.

Notify me of the review queue

As a result of that inclination, I believe that it would make sense to ping every active user over the past quarter for their respective beta in their notification inbox with a link to the site evaluation review queue.

These users want to interact with the site, but sometimes when there aren't many questions it can be difficult to have interactions. Giving them a notification of the evaluation review queue in their inbox will allow them an opportunity to interact with the beta site.

For example, the most recent evaluation on beer beta only had 8 reviewers (at most 10 if some skipped, which was unlikely). Yet that quarter saw 288 different users obtain at least some reputation gain. Notifying those 288 users of the review queue would not only have provided more value for your testing statistics, it also would have driven traffic to the site.

tmd;dr (too much drivel; didn't read): I didn't know until now that there was an actual review queue for these. I have seen the posts but never clicked every link to find the queue. I don't think these should be given up on just yet, because there is room to improve - specifically by notifying involved users when the review queue is open.

14

The introspection itself, and the inclusion of the users in that process, I consider fairly useful—as long as you get participation.

What I like about the current, automated self-evaluations:

  1. The fact that they're scheduled means community members don't have to think about introducing a "call to action" themselves (nor do pro tems) to get it happening.
  2. Random samples are great. Users need to come out of their SME corners to get an idea of the site's overall value (and challenges).
  3. Looking back at questions in the self-evaluation is a lot less charged than evaluating them when they first come in. Everyone's got less skin in the game, and you get the benefit of hindsight, which maybe prompts users who participate to start thinking more long-term about questions as they come in.

What I don't like:

  1. The fact that they're posted by the Community user makes them trivial to ignore. Somewhat paradoxically, there's extra friction involved in sending your opinions out into empty space.
  2. I know Meta is a horrifying monstrosity, but site eval threads are particularly ugly; they're cluttered, they age poorly, it's hard to have separable discussions and voting ends up being counter-productive.
  3. Ten questions is too small a random sample.
  4. Some questions just aren't that important. I don't want to be distracted by some troubleshooting question that got seven views and is just intrinsically not searchable. Sometimes I wish I could mulligan the random sample...

If we accept that this approach to introspection is very useful (TBD!) then I think we should grab a larger sample; the less practical it becomes to enter the discussion with a play-by-play of every item on the list, the more likely users are to focus on the big picture when answering.

When users just list their thoughts on each question from the queue in their answers, those thoughts aren't separable. A great point about a single question can get hidden in the middle of a bunch of forgettable running commentary. Better for them to use the eval thread as a place to synthesize a conclusion rather than pick nits about each item.

An even stronger nudge in this direction might involve hacking together a way to comment on review items within the self-eval queue without having those comments show up on the post itself, leaving the Meta thread for tl;dr and Big Ideas.

11

As already stated in HDE 226868’s answer, a bunch of random questions is very unlikely to spark any constructive discussion.

So, a naïve thought would be to hold regular community surveys, asking questions like: “Are we closing too many questions?”, “Are we friendly to newcomers?”, “What policy should we change?” and so on. But without any examples, a lot of people will just select the most neutral option, and it does not help you to identify issues and the like.

Thinking a bit further, one would need specialised surveys with selected examples. For instance, show people a few closed questions whose on-topicness was disputed (i.e., that drew a considerable number of leave-open or reopen votes) and ask what they think about them. The next month, something similar with unanswered questions. This might be worth trying, but it can only capture issues that are known to affect many sites.

For individual issues, you need classical Meta posts again: somebody identifies a potential issue, collects a few exemplary posts, and asks a Meta question about them. But then – and now we come full circle – how do we spark these? I can only see one solution: make per-site metas more attractive, e.g., by informing users about votes. Stack Exchange thrives on gamification, but per-site metas are somewhat exempt from it.

3
  • The trick is getting people to focus on the general trend illustrated by the example rather than nitpick the example. The latter happens all too often and the underlying issue rarely gets discussed/resolved.
    – Troyen
    Commented Jun 19, 2015 at 21:47
  • Not sure dwelling on controversial closures does much to give an impression of the site overall. Most questions should be neither closed nor controversial, I would think.
    – Air
    Commented Jun 19, 2015 at 21:53
  • 3
    @Air: If I understand correctly, the whole point of this question is that the purpose of obtaining “an impression of the site overall” is gone now and thus only the purpose of “drawing the community's focus […] onto quality issues that they could [change]” remains. — I agree with you that most questions are neither closeworthy nor controversial but those questions do not require any action (unless you count the question of how to get more of them).
    – Wrzlprmft
    Commented Jun 19, 2015 at 22:14
9

Summary


Suggestions

One idea I had would be to completely cut out the statistics "answer". Stop having people rank the questions in the queue. Just show them 10 random questions - change them for each person, perhaps - and encourage them to write up an answer to a meta post if (and only if) they notice anything useful. That's the nicest suggestion I have if the Evaluations are kept around (which they won't be).

Huh, that kind of sounds like typical meta activity: A user notices something interesting that needs to be discussed and brings it up on meta. So even this bit of the Evaluations doesn't add anything.

The only possible issue is that there aren't a lot of these "big picture" questions on a meta. There's typically not a lot of introspection.

This brings me to the second part...


Is this sort of forced introspection even needed anymore?

Yes. Sort of. But not forced. And there needs to be a better way...

Let's delegate this kind of thing to the members of each site as a whole.

Cut out the idea that this kind of thing is periodic. (I link to Grace Note's answer purely to point to official policy.) There's no reason why, say, Aviation should need an evaluation in Month 1 and not Month 2. Let the people of each site choose when they want to be introspective.

If people think that there needs to be deeper introspection, then let them have it. But if they think that all is well, then, hold off. It would be a waste of everyone's time.

As for the format of this, just have people look around and see what issues they find. Post answers on a meta question, or post separate meta questions for each issue. Cut the queue and the rankings. Have a Site Self-Evaluation be as informal as it can be - simply a period of time where everyone agrees to dig something up. That's all. Perhaps have the mods set up a few canonical meta posts about different topics, but that's it. Period. Let the people choose.

So, there should be an official evaluation that each site chooses to have, but everything about it is up to them to decide. Maybe SE should suggest a few meta posts to have - like the Real Essential Questions of Every Beta - but that's it.

2
  • I think it's very difficult for someone to hold 10 questions—which may be lengthy and technical—in their mind, while checking search results, and come back at the end with organized thoughts, without scoring the questions as they go along. I know I'd keep a scorecard on scratch paper or in a spreadsheet if I had to.
    – Air
    Commented Jun 19, 2015 at 21:39
  • @Air I'm not even suggesting that 10 questions have to be used - or that any set number has to be used. A user could poke around at will.
    – HDE 226868
    Commented Jun 19, 2015 at 21:39
9

Being a pro-tem moderator on a beta site where there is no "standard" review process (neither Code Review nor Programming Puzzles and Code Golf have them), I can hardly claim to be an expert on them. What I can say is that on Code Review we have had a number of self-introspections, and as a community we have rallied and improved as a result.

These introspections have been the result of feedback from "manual" evaluations by the CMs that were performed in "secret" and then announced to us. That review, for example, inspired this response: We're on a mission!

The efforts resulting from that "mission" have more than doubled the activity on the site, removed a number of warning signs, and then later resulted in the announcement that at some point, the site will "graduate".

It is clear that introspection can be beneficial, but it is also clear that it does not need to be triggered by the automated/review system.

Similarly, a number of sites have repeatedly seen clean bills of health from the self-review system, but have no indication that they are ready for "graduation" either.

Promotion from a beta site to a full site can be influenced by a number of factors, and the self-review is one of them; but, in the long run, it is not used in a deterministic way by the CMs when considering a site promotion. It is just one of many factors that are considered, and, even when considered, it is not necessarily treated as a go/no-go barrier: sites with low participation or unflattering self-reviews have been promoted anyway.

The best value that I can see from a site review is as a trigger for further introspection and discussion. Let's face it, 10 questions on a decent beta site with a few thousand questions per year is not exactly a good sample. It is not a tool that is particularly useful for more than that. More interesting than the actual evaluation would be the participation in the evaluation - how enthusiastic and engaged the community is in the process.

Bottom line: a site that has an active and invested community is probably going to have a good evaluation anyway, and a site that does not have that community probably does not have a great review regardless.

Having "looked around" at various beta-site self evaluations, there does not appear to be much in the way of discussion or follow-up on the evaluations. They do not change anything....

Recently promoted sites:

Stable but not promoted sites:

New Sites:

In all, I would suggest that participation in the evaluations is very low. 30-or-so people responding to 10 random questions on sites with thousands of members is hardly useful. The resulting participation in discussions is also... disappointing.

Bottom line is that the review process does little to promote any meaningful discussion; it provides marginal value in terms of objective numbers, and the results are not used systematically by the CMs anyway.

There is no reason to continue with them other than to reinforce that participation in the meta-aspects of the site is low.

Further, in my experience, there are other, better ways to encourage community growth and question-and-answer quality. Those ways rely on community engagement and standards - things which cause good reviews, rather than being caused by auto-reviews.

1
  • ..................no pun? Commented Jun 22, 2015 at 17:37
6

On a podcast after manual site evaluations started (around the 8:40 mark), Robert asked:

What if we just literally let [small sites] exist forever? What does that do in terms of making the internet a better place?

This is the situation we are now in. Since we keep launching more sites and rarely close any down, monitoring represents a scaling problem for Community Managers. Automated self-evaluations solve the scaling problem by, in theory, asking the community to do the heavy lifting of comparing the site's questions to other pages on the internet. Unfortunately, our current system is needlessly complex.

So I'd like to propose the following changes:

  1. Select 5 questions instead of 10. Part of the problem I have with these evaluations is that it just takes a long time to look at so many questions. Cutting the number in half hurts the sample, but also makes the task less daunting. We might even go back to quarterly evaluations if sample size is a concern.

  2. Don't separate the review from the writeup. I enjoyed going through questions in the review queue, but I kinda wanted to be able to write something when I first evaluated the page; instead, I had to wait for the review to end and the results to be posted. I'd rather post the sample and encourage people to post their analysis immediately.

  3. Don't use numbers that lack meaning. It was great to look at the statistics at the end of an evaluation until I discovered that every user has an incompatible model of what the ratings mean. Numeric results require some sort of objective criteria to ground any post analysis.

  4. Assume we have good SEO. One of the remarkable things I discovered when doing these evaluations was that Google loves to provide our pages as search results. That might not be true forever, but for now it doesn't make a lot of sense to ask users to go through the exercise of searching for the page on our site. Rather, I think it's helpful to ask users to search for competing answers that are better than the ones on the Stack Exchange site.

  5. Provide some objective measure of how well the page was received by regular users and anonymous visitors. Often both groups have similar opinions, but it can be revealing when they differ.

In order to test these ideas, I've posted a test on meta.Aviation.
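As a rough illustration of point 5, here is a sketch of one way registered-user votes and anonymous "helpful" feedback could be compared to flag posts where the two audiences disagree. The record layout, the field names, and the 0.4 divergence threshold are assumptions made up for the example; the real data would have to come from the moderator tools or Data Explorer.

```python
def approval(up, down):
    """Fraction of positive signals, or None when there is no signal at all."""
    total = up + down
    return up / total if total else None

def flag_divergent(posts, threshold=0.4):
    """Posts where registered-user and anonymous-visitor approval differ a lot."""
    flagged = []
    for p in posts:
        users = approval(p["user_up"], p["user_down"])
        visitors = approval(p["anon_helpful"], p["anon_not_helpful"])
        if users is not None and visitors is not None and abs(users - visitors) > threshold:
            flagged.append((p["id"], round(users, 2), round(visitors, 2)))
    return flagged

if __name__ == "__main__":
    sample = [
        # Loved by regulars, panned by visitors -- the interesting case.
        {"id": 1, "user_up": 23, "user_down": 0, "anon_helpful": 4, "anon_not_helpful": 9},
        # Both audiences roughly agree -- nothing to discuss.
        {"id": 2, "user_up": 10, "user_down": 2, "anon_helpful": 30, "anon_not_helpful": 5},
    ]
    for post_id, users, visitors in flag_divergent(sample):
        print(f"post {post_id}: user approval {users}, visitor approval {visitors}")
```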

5
  • 1
    Interesting query, but anonymous feedback would be more meaningful if, instead of looking at the last 90 days, it took the posts between 90 and 180 days old - i.e., evaluating not as things happen but after a while. Then one would be in a position to obtain a stronger signal: I tried the posts with at least 3 votes of either kind and got one in the sample that's rated 23:0 by users but 4:9 by visitors.
    – user259867
    Commented Jul 9, 2015 at 1:38
  • There has got to be a better way of making the task manageable than by reducing the sample size. The point is to evaluate the site, not the sample; if the latter's not statistically significant, you might as well not even bother. And although we can't really use a formula to determine what's significant in terms of open-ended, qualitative feedback, my strong feeling as a participant is that a random sample of five questions is unlikely to be representative of an entire site. Especially a site that's doing what it's supposed to and producing lots of questions over the sampled period.
    – Air
    Commented Jul 9, 2015 at 22:55
  • @Air: I'm coming to the same conclusion. The principal thing I'd like to ask is "do the question and its answers improve on what the internet has to offer?". So the only thing we need to ask of people is to try to find some better answer. I can probably ditch the anonymous feedback portion (or relegate it to an informational query) and just focus on finding better answers elsewhere. My plan is to do a separate experiment on a different site and expand the sample back to 10. (10 still might not be significant on high-volume sites, however. More than that would limit diversity of feedback.) Commented Jul 9, 2015 at 23:08
  • We could also reduce the population of interest by disqualifying answered questions that don't meet a minimum threshold of views. If we look at a site via the analogy of a city, there are always going to be parts that are ugly, unimpressive and out-of-the-way. The value of the city, its impact and impression on the world, are much more determined by its well-traveled parts. In terms of site evaluation, that suggests weighting items for inclusion according to their views - or applying a minimum threshold.
    – Air
    Commented Jul 9, 2015 at 23:21
  • Of course, there is some risk there of missing the particular problem of worthy questions that potentially could have a lot of exposure not getting that exposure for lack of curation via edits. That's something an evaluation would hopefully call out. So I don't know (hard problem is hard).
    – Air
    Commented Jul 9, 2015 at 23:23
5

I will not be at all sorry to see the Community/Site Evaluation process shut down because I perceived little value in them on a site I moderate: Let's get critical: Feb 2015 Site Self-Evaluation

I think whatever replaces it should be kept very simple.

I think a very useful set of metrics for sites to aspire to is already available in Area 51 and we should do more with them.

I would prefer to see per site Meta discussions, with community manager input to get more cross site perspective, on simple questions like:

  • Why are our questions per day very low?
  • Why are our 200+ rep user numbers low?
  • Why are our answers per question low?
  • Why are our visits per day low?

and be prepared to congratulate ourselves when metrics are looking good:

  • Why do we stay well into the 90s for percentage of questions answered?
  • How do we keep our 2,000+ and 3,000+ rep users engaged and keen?

If there are any better metrics that come to light then by all means add a couple more into the mix for discussion.

2
  • 2
    What kind of answers do you expect to those questions? "Q: Why are our 200+ rep user numbers low?" "A: because few of the site's users have earned 200 rep."
    – user259867
    Commented Jun 20, 2015 at 18:31
  • 1
    @Idisagree The questions are expected to result in discussion and new ideas. If a site has thousands of people getting to 50 rep and few going beyond 200, then I would be keen to know why that category of user is sampling but not engaging with our community. Perhaps we are "mean" to newbies, maybe we occupy a topic space where everyone feels they only have a question or two to ask, or maybe they find our Q&A format confusing/unsuitable for whatever reason, etc. The answers that emerge will only be known once the question is asked.
    – PolyGeo
    Commented Jun 21, 2015 at 8:09
4

As it currently stands, the questions are rather useless. It might be better to have some kind of survey instead. For instance, the following types of things might be of interest:

  1. How many questions have you searched this site for in the last month?
  2. Do you find the quality of questions to be high?

Some questions need to be qualitative as well, such as:

  1. What do you think the single best thing of this site is?
  2. How do you think we can improve?

These qualitative answers would be posted.

Asking some questions like that, and providing all of the information, could be a useful way to gather input from a wider audience. Maybe do it once a quarter; sending out a notice to the people who frequent the site, asking them to take part, would be nice as well.

2
  • 4
    Users who participate in self-evaluation tend to be the most invested in the site's success. I'd be concerned that the answers to a general survey like this would be an exercise in confirmation bias and wishful thinking.
    – Air
    Commented Jun 19, 2015 at 21:27
  • 1
    Note that I mentioned people who frequent the site. I agree that a general survey wouldn't be helpful, but I do think it'd be good to get some of the moderately active users on board. Maybe people who have asked/ answered a question recently? I'm not sure what the criteria would be, but I definitely agree that having a general survey wouldn't be as useful. Commented Jun 20, 2015 at 1:52
3

How about having others judge your site?

Beta sites could each judge the quality of questions and answers for other sites - but not pairwise, so that the possibility of quid pro quo doesn't even arise.

Yes, you potentially can't judge whether answers (and, to some extent, questions) make sense, but that's not the point. Look at whether they're structured well, whether comments are not answers and answers are not comments, etc.

1
  • 1
    If the Stack Exchange folks really think Hot Network Questions shows worthwhile questions, maybe we could ask (a subset of) users clicking through HNQ from another SE site to do the judging. Commented Jun 20, 2015 at 19:34
3

With the old system, I usually found the sample rather unsatisfying. Here is one idea on how to

  • make people read questions they can say something about,
  • make it clear their input/action is valuable, and
  • maybe improve site quality.

Introspection can be gained by tracking what users do in response to what I propose.

So, here's the idea. Every day (week, month, ...) send every active¹ user a notification with a question link, asking them to review the post and then edit, vote and answer (if possible). Draw this question randomly from the pool of questions in tags this user likes which probably need attention (few views and votes, no upvoted answers).


  1. They should have high enough rep to do all the community moderation, and should have visited the site in the last interval.
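A minimal sketch of the drawing logic described above, in Python. The "needs attention" thresholds, the field names, and the send_inbox_notification stub are all hypothetical; they only illustrate the filter-then-random-draw idea.

```python
import random

def needs_attention(q, max_views=100, max_score=1):
    """Heuristic for 'probably needs attention': few views, few votes, no upvoted answer."""
    return (q["views"] <= max_views
            and q["score"] <= max_score
            and q["best_answer_score"] <= 0)

def pick_question_for(user, questions):
    """Randomly draw one neglected question from the tags this user follows."""
    pool = [q for q in questions
            if needs_attention(q) and set(q["tags"]) & set(user["followed_tags"])]
    return random.choice(pool) if pool else None

def send_inbox_notification(user, question):
    # Stand-in for whatever delivery mechanism the site would actually use.
    print(f"@{user['name']}: please take a look at question {question['id']}")

if __name__ == "__main__":
    users = [{"name": "alice", "followed_tags": ["grammar"]}]
    questions = [
        {"id": 7, "tags": ["grammar"], "views": 40, "score": 0, "best_answer_score": 0},
        {"id": 8, "tags": ["etymology"], "views": 30, "score": 1, "best_answer_score": 0},
    ]
    for user in users:
        question = pick_question_for(user, questions)
        if question:
            send_inbox_notification(user, question)
```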
2

I never liked the self evaluation because I found myself rating my own answers. It's very difficult to be totally neutral about your own answer.

I also found it hard to use the suggested methodology of doing a Google search and comparing the answers. Content farms have a lot of answers, but you could write a small essay about why those answers might be technically correct yet lacking in references or disclaimers about where the answer applies.
