83

Given the new CoC changes, it's clear that StackExchange the company wants to make a Big Deal of gendered language.

So, out of curiosity, how many interactions will this actually affect? What percentage of comments on a technical site like StackOverflow use gendered language? Has the company done any research into how often these comments result in the misgendering of another user?

(I'd like to note that I'm in no way trying to declare pronouns unimportant here. I'm only curious, given responses like this, what the actual percentage of interactions that may be affected will look like.)

8
  • 1
    You could also add they and them in your query. Commented Oct 11, 2019 at 16:05
  • As that would be a comment that wouldn't have to change, given the new CoC, I'm not sure how useful that would be @πάνταῥεῖ
    – scohe001
    Commented Oct 11, 2019 at 16:07
  • 1
    It's an indicator of gendered vs. gender-neutral usage in comments, though. Commented Oct 11, 2019 at 16:09
  • 6
    Anyways your research confirms my POV that Stack Overflow is just doing makeup instead of covering core problems at the moment. Commented Oct 11, 2019 at 16:17
  • 3
    @πάνταῥεῖ It’s impossible to distinguish “gender neutral they” from “they that doesn’t refer to a person” without more sophisticated tools.
    – ColleenV
    Commented Oct 11, 2019 at 19:31
  • 4
    I don't think I've ever used a gendered pronoun on SE. Maybe the occasional singular they, but even that would be rare. I think I refer to 3rd parties only when it's OP (who is called OP) or @ mentioning a different user. Apart from I, we, or you, I don't need to use pronouns here much if at all.
    – Max A.
    Commented Oct 13, 2019 at 3:32
  • 1
    @MaxA. from here it looks like you have 4 instances of using gendered language in the lifetime of your Workplace account (though it looks like genders were known in each of those cases, so none of those would be in violation of the CoC).
    – scohe001
    Commented Oct 13, 2019 at 13:38
  • @scohe001 That's cool, I didn't know that query functionality existed. Very cool. It should be noted that all 4 are from before the change though. It's easy to be more careful once the change is in place. At least for me.
    – Max A.
    Commented Oct 13, 2019 at 23:44

2 Answers

82

According to this query on StackOverflow comments from the last month, 0.645% (4059/629548) of comments contained gendered language. That's ~1/150.

If we call a 10k+ user "active" if they've commented in the last month, this query shows that active 10k+ StackOverflow users comment an average of 27 times a month (and the share of gendered comments drops to 0.557%, or closer to ~1/200). So an active 10k+ user will need to alter the way they write in, on average, 1 comment every ~7 months.
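For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It only uses the figures quoted above; the variable names are mine and nothing here comes from the SEDE queries themselves.

```python
# Figures quoted in this answer.
gendered_all = 4059        # comments flagged as gendered, last month
total_all = 629548         # all comments, last month
rate_all = gendered_all / total_all

comments_per_month = 27    # average for an "active" 10k+ user
rate_10k = 0.00557         # gendered share among those users (0.557%)

# Expected gendered comments per month for one active 10k+ user,
# and the implied gap between two such comments.
gendered_per_month = comments_per_month * rate_10k
months_per_gendered_comment = 1 / gendered_per_month

print(f"overall rate: {rate_all:.3%} (about 1 in {1 / rate_all:.0f})")
print(f"10k+ rate: about 1 in {1 / rate_10k:.0f}")
print(f"one gendered comment every {months_per_gendered_comment:.1f} months")
```

The unrounded values come out to roughly 1 in 155, 1 in 180, and one comment every 6.6 months, which is consistent with the rounded ~1/150, ~1/200, and ~7 months above.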

Curious about what your past comments look like for the lifetime of your account? Use this query to see your stats and this query for a list of gendered comments you've made with links. (Thanks to @Stevoisiak, and through them, MSE chat for putting those together!)

Note that these queries don't take into account deleted comments, which may very well be the worst offenders.

23
  • 22
    This actually shows how relatively small the problem is on main. Chat is a whole different problem though Commented Oct 11, 2019 at 15:48
  • 32
    @Zoethetransgirl from what I've read and seen, the outrage seems to be focused on not being able to answer an innocent C++ question without tip-toeing around the new CoC. Chat is (and has always been) more of a free-for-all and extremely difficult to moderate. I wouldn't even know how to go about analyzing that.
    – scohe001
    Commented Oct 11, 2019 at 15:51
  • 12
    I don't think I personally ever used gendered pronouns on Stack Exchange. I usually say "the OP" or singular their/they. Commented Oct 11, 2019 at 15:55
  • 3
    Don't bother analyzing chat. There's no decent way to analyze it with the public tools at hand, maybe unless you download a data dump (or <strike>bribe</strike> ask an employee nicely to compile some stats). But yeah, chat is slightly more open, and I imagine the problems might be smaller there (because stating pronouns, if it shows up, is better in chat than on main because of the format). No idea though, that's just my experience. Hopefully though, this calms down some of the concerns Commented Oct 11, 2019 at 15:55
  • 5
    It would be interesting to break it down by technical / non-technical site, to see if there's a measurable difference Commented Oct 11, 2019 at 16:14
  • 1
    @AurélienGasser the most awesome part of SEDE is that anyone can switch the site on the query with a few clicks. I don't have the time at the moment to play around with checking different sites (my goal here was just SO), but I'd encourage you to! And if you find something interesting, by all means make it an answer here :)
    – scohe001
    Commented Oct 11, 2019 at 16:19
  • 8
    This is a nice answer, but be aware that this searches over comments that have not been deleted. Most of the egregious stuff gets deleted & often quickly. Lots of other comments get deleted, too, so it isn't clear if the final proportion is higher or lower. Commented Oct 11, 2019 at 16:25
  • 4
    Ooh good call @gung. I've added a disclaimer.
    – scohe001
    Commented Oct 11, 2019 at 16:27
  • 1
    @rjzii Varies a lot, but there's always some activity in chat. After the CoC changes hit, activity exploded (we're talking thousands of messages in one room in a day - that's very unusual for at least two of these rooms that're normally big and active, but nowhere near 2k+ messages per day on a normal basis). I don't have any exact numbers for you though, and as far as I know, there's no proper statistics tools for chat, at least for us mortals. Commented Oct 11, 2019 at 20:00
  • 2
    I noticed that we were only querying the SO main site but found this great cross site query tool. I have modified it to include your query across all sites. Meta.SE is excluded for now because the many hypothetical use cases obscure actual uses. Across all sites (other than Meta.SE) gendered pronouns are in 1.37% of comments last month.
    – egerardus
    Commented Oct 11, 2019 at 21:26
  • 1
    @Gero might be worth making another answer here to expand on that a little and give your data some more eyes. Honestly that's pretty interesting
    – scohe001
    Commented Oct 11, 2019 at 23:34
  • 1
    Thanks for the query - here's an example of a comment I left that was marked as "gendered", but it wasn't actually directed at a person. ell.stackexchange.com/questions/32578/… Here's me once again referring to the pronouns without actually using them: ell.stackexchange.com/questions/32711/… I don't know if you can exclude the pronouns when they're in quotes, but it may help reduce false positives.
    – ColleenV
    Commented Oct 14, 2019 at 3:17
  • 2
    That's the figure for Stack Overflow. Run the query on Parenting, and the figure jumps up to 29%. Commented Oct 14, 2019 at 15:54
  • 4
    @scohe001 I don't know how this translates to SO, but in my activity on Math SE, I found that each of the 7 times I used a gendered pronoun in comments, it referred to either external figures (e.g. historical mathematicians) or to fictional figures introduced in the question (e.g. Alice and Bob, who want to securely share a message) - so, while a query definitely can't figure out this sort of data, the number of uses of gendered pronouns to refer to a user of the site might be a relatively small percentage of the total number of uses of gendered pronouns. Commented Oct 14, 2019 at 16:59
  • 1
    "Note that these queries don't take into account deleted comments, which may very well be the worst offenders." Gendering is not offending in general. I would not expect the ratio of gendered to non-gendered comments to change for deleted and not deleted posts. Commented Oct 15, 2019 at 8:41
19

I was curious about the impact of the pronoun "discussion", so I ran your query by day for the last month. I also threw in a comparison to neutral pronouns as a point of reference.

As you noted this does not include deleted comments.

Here are "gendered" comments vs. "neutral" comments as a percentage of total comments by day for the last month (up to the last SEDE dump):

[Chart: gendered_comments_by_day]

It's hard to see much change, though a person declaring their pronouns as "he/him" or "she/her" will result in more uses of gendered pronouns rather than fewer.

I will try to update this when the next SEDE data is available.
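The per-day breakdown described above can be sketched offline in Python. This is not the SEDE query itself: the word lists, the `daily_percentages` helper, and the sample comments are all assumptions made for illustration, and the real query may match a different set of words.

```python
import re
from collections import defaultdict

# Assumed word lists; the actual SEDE query may match a different set.
GENDERED = re.compile(r"\b(he|him|his|she|her|hers)\b", re.IGNORECASE)
NEUTRAL = re.compile(r"\b(they|them|their|theirs)\b", re.IGNORECASE)

def daily_percentages(comments):
    """comments: iterable of (date_string, text) pairs.
    Returns {date: (gendered_pct, neutral_pct)}."""
    counts = defaultdict(lambda: [0, 0, 0])  # total, gendered, neutral
    for day, text in comments:
        row = counts[day]
        row[0] += 1
        if GENDERED.search(text):
            row[1] += 1
        if NEUTRAL.search(text):
            row[2] += 1
    return {
        day: (100 * g / total, 100 * n / total)
        for day, (total, g, n) in sorted(counts.items())
    }

# Made-up sample data, for illustration only.
sample = [
    ("2019-10-10", "He should post the full stack trace."),
    ("2019-10-10", "Try clearing the cache first."),
    ("2019-10-11", "They said it compiles on their machine."),
    ("2019-10-11", "The header is missing."),
]
result = daily_percentages(sample)
for day, (g, n) in result.items():
    print(day, f"gendered {g:.1f}%", f"neutral {n:.1f}%")
```

The word-boundary match keeps "the" or "header" from counting as pronouns, but a plain pattern like this still can't tell whether a pronoun refers to another user, a third party, or something quoted, so the percentages are an upper bound at best.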

3
  • 8
    This isn't necessarily accurate. Using he as a gender-neutral pronoun is more common among older and foreign users because that's what they were taught. Labeling all usage of he as gendered is not comprehensive.
    – ohmu
    Commented Oct 11, 2019 at 18:33
  • 2
    @user369450 yep, that is a major factor in the discussion and why I wanted to try this query
    – egerardus
    Commented Oct 11, 2019 at 18:41
  • 4
    @user369450 In that case, it seems like even less of a problem considering the already extremely low number of comments containing that verbiage. Commented Oct 11, 2019 at 20:05
