50

The first thing that struck me about the recent blog post was this lop-sided bar chart.

[Image: bar chart of frustrations]

I didn't think it would be productive to post a comment on it, so I'm airing my frustrations with this bar chart.

The chart only measures the number of complaints, not the impact that resolving them would have.

There's no acknowledgement that the bottom item, 'Review Queues', could have a significantly larger impact on the top item, 'Unwelcoming Community', despite the latter being roughly 30x as prominent.
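
To make this concrete: a toy calculation with entirely invented numbers (the shares loosely echo the chart's lopsided shape; the impact weights are pure assumption), showing how a frequency-only ranking can invert once impact is considered:

```python
# Invented numbers: "share" mimics the chart's lopsided shape,
# "impact" is a hypothetical payoff from resolving each item.
items = {
    "Unwelcoming community": {"share": 0.150, "impact": 1.0},
    "Review queues":         {"share": 0.005, "impact": 50.0},
}

# Rank by share * impact instead of share alone.
ranked = sorted(items.items(),
                key=lambda kv: kv[1]["share"] * kv[1]["impact"],
                reverse=True)
for name, v in ranked:
    print(f"{name:<24} share={v['share']:.1%}  "
          f"weighted priority={v['share'] * v['impact']:.3f}")
```

A chart of counts alone cannot distinguish between these two scenarios.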

Has Stack Overflow made any comment on this? Is it paranoia to think the data was deliberately presented this way, to suggest that this isn't an issue?

  • 45
    What is also interesting is if "being called an unwelcoming community" will be lumped in with "unwelcoming community" because, well, dumb language processing robots. Commented Nov 27, 2019 at 13:51
  • 3
    @ChrissaysReinstateMonica yeah I've not even gotten into how things could have been recategorised in an unconsciously biased way. Nor that just asking what people want is an awful way to get feedback: "A person is smart. People are dumb, panicky dangerous animals and you know it." Commented Nov 27, 2019 at 13:55
  • 23
    You know what personally ticked me off the most? There's an entire 20% of data missing, and that's assuming you can only pick one. There's no way 20% of the data is in the long tail, either. Commented Nov 27, 2019 at 13:56
  • 13
    It does not even properly measure the number of complaints... because "unwelcoming community" can mean just about anything... but mostly it translates to "moderated community" Commented Nov 27, 2019 at 14:07
  • 32
    @SébastienRenauld Those 20% missing are probably "not satisfied with SE management" and "not satisfied with firing Monica" Commented Nov 27, 2019 at 14:09
  • 7
    @ResistanceIsFutile "overmoderation" is 7.1% and "welcoming backlash" is 0.5%, so you're probably quite right. The top result should be "not getting what I want right now <stamps foot> <pouts>"
    – gbjbaanb
    Commented Nov 27, 2019 at 14:24
  • 5
    I have no concept of what "barrier to participation" means - that you can't post 100 questions immediately after signing up? That you're not allowed to moderate until a certain rep level? That your questions get closed?
    – cegfault
    Commented Nov 27, 2019 at 14:30
  • 3
    @cegfault being banned after posting one poor question too many? Commented Nov 27, 2019 at 14:31
  • 3
    I also have no concept of what "unwelcoming community" means - are we talking about people closing questions, or the hostility felt on meta because staff are ignoring us? Is it personal issues? Social issues? Technical issues?
    – cegfault
    Commented Nov 27, 2019 at 14:32
  • 3
    This is what we are going to get in the future... surveys with a bunch of data that can be interpreted to anyone's liking... and SE will undoubtedly find appropriate data to back their future decisions. Commented Nov 27, 2019 at 14:34
  • 4
    @cegfault the blog does actually cover this: "artifact quality (outdated answers, poorly framed questions, etc.)" Commented Nov 27, 2019 at 14:36
  • 18
    We could argue about the chart and what's behind it. But most people know what's behind it: Thin air. The goal of this chart is to justify decisions (trying to avoid words like "agenda" or "ideology" here).
    – Marco13
    Commented Nov 27, 2019 at 14:56
  • 8
    I'm not that knowledgeable about statistics, but what I've learned makes this chart stink. There's an "Other" category, yet the total still adds up to less than 100%. If people selected nothing at all, that shouldn't count towards the 100% anyway.
    – Gloweye
    Commented Nov 27, 2019 at 15:14
  • 4
    @ChrissaysReinstateMonica I am pretty sure, when you ask on MSE, the biggest problem we have right now is: we, the community, feel very unwelcomed by the company. So, they got that perfectly right: this part perfectly shows how unwelcome the community feels treated by the company! Right?! That is for sure what they meant to say?! ;-)
    – GhostCat
    Commented Nov 27, 2019 at 16:49
  • 3
    @Raedwald Decisions are being made and their stance is "look, this is the data". If the data is junk, and they know it is, then it is actively misleading people at the very least. Commented Nov 28, 2019 at 14:16

4 Answers

53

I'm afraid I can't answer your specific questions, but I think it's on-topic to point out some more fundamental problems with the chart.

The chart doesn't meet even the most basic requirements for any presentation of survey data:

  • How many people were surveyed?
  • How were they selected?
  • Does SO Inc. claim the chart is representative? Of which group(s)?
  • What's the margin of error?
  • Why is the data presented with 0.1% precision, although the margin of error is likely much larger? (See the sketch just below for a sense of scale.)
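
To give the last two bullets a sense of scale, here's a minimal sketch using the usual normal approximation for a survey proportion. The sample sizes and the 15% share are hypothetical, since the blog post states neither:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p
    estimated from n survey responses (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes: even 10,000 responses give a margin
# of error several times coarser than the 0.1% precision shown.
for n in (500, 2000, 10000):
    print(f"n={n:>6}: +/- {margin_of_error(0.15, n):.1%}")
```

Until n is published, quoting results to a tenth of a percent is false precision.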

Many other problems are mentioned in @curious's excellent answer.

The blog post by Sara Chipps and Juan M is silent on all of these questions, and (as far as I can tell) it doesn't provide links to sources that could answer them.

In a nutshell: In its current form, the chart is useless. It's pseudo-science.

39

Allow me to piggyback on your question, since I wanted to ask about this. For the record, I'm not saying I would have done better, because qualitative research is harder than it looks, but I too would like some answers to better make sense of that chart. I apologize for the lack of citations; this is off the top of my head. Feel free to comment if I got something wrong.

Some questions and comments:

Inductive and deductive coding

Inductive coding means that you build codes from the data, and eventually draw themes from them to try to make sense of what's going on. With deductive coding, the researcher starts with preset categories and fits the data into those boxes. This can be helpful if you want to make a comparison with something else or are working within an existing framework, among other uses, but it's not something I've used often. Which method was used here?

Inter-rater reliability

Judging from Yaakov's answer here, it seems plausible that the data was coded by hand (and kudos to Yaakov for keeping communication open with Meta):

For a number of months we have been doing this by hand (yes, a few people have looked at many thousands of these responses, and assigned them to one of many dozens of categories).

(Note: it's not clear whether this applies to the Site Satisfaction Survey, nor whether machine learning was used on this data.)

I'm not sure how many people "a few" is, nor whether they coded the data in a way that allows reliability to be assessed (by having multiple people code the same items and checking for agreement), or whether they simply needed a few people because there was a lot of data. Was inter-rater reliability taken into account, and if so, how?
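
If responses really were double-coded, agreement is cheap to quantify. Here's a minimal Cohen's kappa sketch; the category labels and ratings are made up for illustration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters,
    corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical double-coded survey responses:
a = ["unwelcoming", "voting", "unwelcoming", "quality", "voting", "quality"]
b = ["unwelcoming", "quality", "unwelcoming", "quality", "voting", "voting"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # 0.50, i.e. moderate agreement
```

Publishing a figure like this alongside the chart would make the coding auditable.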

Themes

Themes are supposed to give a sense of the data by capturing the underlying idea behind it; often, a theme is an actual sentence. Currently, the themes are really vague. What does it mean to be frustrated with voting? Is one frustrated about downvotes? That they are anonymous? The lack of votes? The accumulation of votes over time, or from HNQ? It's fine to categorize information, but it should be looked into further to better understand what exactly is going on and to point to the actual problems; then you can get to the essence of things.

What's not being said

Another important feature of qualitative research is trying to make sense of what's not being said. What's missing from the bar chart? There is no interpretation included in the post as far as I'm aware.

Sampling

According to the blog post, the data comes from a representative sample of users. I would like more explanation of the sampling method. What characteristics were used? Location? Age? Reputation? Something else? Is the sample self-selecting, in that people may be more likely to answer a survey when they have a dissatisfaction to voice?

Bunching all responses in the chart regardless of user background

Given how privileges scale, I highly doubt that new users find review queues annoying, so it seems all responses are bunched together. It's critical to gather data from new users, but it should probably be analyzed separately from the data collected from intermediate users, experienced users, moderators, etc.

To illustrate my point, the director of public Q&A wrote a great article about how it feels to get ganged up on: it's possible that no individual comment is rude, yet the sheer amount of disagreement is overwhelming. A sturdier approach would be to categorize people by what we know of their experience of the site and compare the responses across those categories (sketched below). Can we make better sense of some responses, such as "unwelcoming community", with other data, research, and experiences? (e.g. literature about being ganged up on? Flags for unkind, rude behavior? Rude comment classification? Interviews?)
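
As a rough sketch of what that comparison could look like (the responses and experience bands below are entirely made up):

```python
from collections import Counter, defaultdict

# Hypothetical coded responses, tagged with a crude experience band.
responses = [
    ("new", "barrier to participation"),
    ("new", "unwelcoming community"),
    ("new", "unwelcoming community"),
    ("established", "review queues"),
    ("established", "review queues"),
    ("established", "voting"),
]

by_band = defaultdict(Counter)
for band, theme in responses:
    by_band[band][theme] += 1

# A single bunched-together chart hides exactly this kind of split.
for band, counts in sorted(by_band.items()):
    total = sum(counts.values())
    for theme, k in counts.most_common():
        print(f"{band:>12} | {theme:<26} {k/total:.0%}")
```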

Self-reports

Self-reports were coded without taking potential bias into account. "I felt unwelcome, put on the spot, like I didn't belong" is not coded as an unwelcoming atmosphere, or as a divide between users that needs to be fixed; it lands under "unwelcoming community". Why not take some time to look at that data again and try to get to the essence of it, so we can all work together towards a common goal?

Taking your audience into account

Who is the bar chart for? Is it for company use? For experienced users? For new users? If you had a bad experience with Stack Overflow, then the bar chart is comforting: it validates that it's not you, it's us. If you're part of the community, well, you were just classified as unwelcoming. You, the community, are unwelcoming. Can we please stop digging the hole between the community and the company any deeper?

Overall, both anonymous and registered users are highly satisfied with Stack Overflow and tell us that their favorite things about our community include finding solutions to their problems, vast access to information, and the knowledgeable people who participate.

Would the company mind throwing the community a bone there? Was the positive data coded? Spending some time analyzing what the company, its volunteers, and its contributors are doing properly can also be worthwhile.

  • 9
    "If the problem is truly the community, then do you solve that by increasing the divide and hoping that only problematic people will leave?" Don't you have Twitter?
    – dfhwze
    Commented Nov 27, 2019 at 15:51
  • 7
    @dfhwze I do. I took a deep breath and decided against writing "remove their problematic selves".
    – curious
    Commented Nov 27, 2019 at 15:53
28

I agree; the chart lacks essential "properties". That bar chart is one of those over-simplified "things" that managers love so much. Why? Because it allows you to project into it whatever you want.

The problem is: without knowing exactly in which context the underlying data was compiled, or what questions were asked (exactly; the slightest variation in wording can lead to different outcomes), it is hard to derive conclusions from that bar chart, or to identify invalid decisions that aren't supported by the underlying data.

To me, it is not even clear whether that bar chart is derived from the "well-known" yearly SO developer survey, or if it is solely based on that not-yet-public Site Satisfaction Survey. Or a mix of both.

Who can tell us, for example, what the big "unwelcoming" line is really about? As I wrote in a comment, one could go so far as to say: "looking at this chart, the community members think that the community itself isn't welcomed". (Probably true; who around here feels "welcomed" by SE Inc. lately?)

And sure, then artifact quality is mentioned. But is that coming from newbies who find some answers too technical or complicated? From power users who worry about "do my homework" questions and "here, let me answer the Java null pointer exception thing for the one-millionth time" answers? Or from moderators who really lack tooling support?

We don't know. So we would need to trust SE Inc. about how the data was collected, and how earnestly it was interpreted; or whether something else happened entirely: some higher-up manager handed down a vision, and the data scientists at SE Inc. were tasked with finding data to support it.

Trusting SE Inc.? Too bad; it is 2019, so that won't be happening again any time soon.


And after working for 20 years in a large US-based company, I don't think that happened by accident. Strange how SE Inc., with its 300+ employees, manages to repeat the well-known mistakes of those "legacy" companies with their out-of-touch vice presidents and senior executives.

  • 3
    And then: downvoters are welcome to give quick feedback on what aspects made them "vote down". I am always interested to learn something. Downvotes alone don't help with that.
    – GhostCat
    Commented Nov 27, 2019 at 17:30
  • 3
    "looking at this chart, the community members think that the community itself isn't welcomed" So much this. It's a strange loop... :P
    – curious
    Commented Nov 27, 2019 at 17:35
6

Aren't you maybe expecting too much from this chart?

Could they have added more details on the methodology? Sure. Could they have provided a dump of the original data set? Sure. Should this sway people who are willing to trust data but unwilling to trust the company? Absolutely not. All this chart is doing is repeating what they already said, except this time in bars and axes instead of words.

We’ve learned a lot from you, and we work to distill your feedback into themes. One theme we examine is what you find most frustrating about using Stack Overflow.
– Introducing "The Loop": A Foundation in Listening

They made claims about what "you" (i.e. everyone who took the survey) found most frustrating, and explained how they analysed the survey data to come to those conclusions. And then they showed you a chart of the relevant results because charts make everything more sciencey.

The chart "only measures the number of complaints" because that's exactly what they were looking for here, and they explicitly took the data set and filtered out everything that was not that. You're not seeing nuance, because they weren't looking for nuance. You're not seeing impact, because they weren't looking for impact.

If they were actually looking for either of those things they wouldn't (or at least shouldn't) be using this chart to do that because that's not what it's for.

  • 1
    I guess I was hoping they'd look at the problem holistically and not just the loudest complaint. Commented Nov 27, 2019 at 23:16
  • 1
    @Pureferret Who says they're not? "One theme we examine" is a far cry from "The only thing we care about".
    – goldPseudo
    Commented Nov 27, 2019 at 23:26
  • 1
    They say they're using this to form future decisions based on themes drawn from the chart; by omission, they're saying they're not looking at the problem holistically. Commented Nov 28, 2019 at 14:33
