-341

Earlier this week, Stack Exchange released guidance to moderators on how to moderate AI-generated content. What does this guidance include?

26
  • 13
    Please upvote this. Downvote the answer all you want, but the question should remain on the front page.
    – terdon
    Commented May 30, 2023 at 20:29
  • 61
    @terdon: That'd mean that this would belong in a blog post rather than here on Meta Stack Exchange. If we're voting on it, then we're tacitly discussing the quality of the policy, which is not a very good or agreeable policy. I have no intent on upvoting this question just because it's related to policy; it's up to the company to make the conversation inviting and the policy open for discussion/feedback.
    – Makoto
    Commented May 30, 2023 at 20:32
  • 10
    @Makoto since this has been posted as a Q&A, we can vote on the A and Q separately. The Q is fine, had it not been posted by a member of staff it would have been upvoted. It's the answer we have issues with.
    – terdon
    Commented May 30, 2023 at 20:34
  • 8
    @terdon if there's a need for a feature to keep company posts at the top, the company can will it into existence. we don't need to use our votes to do that.
    – Kevin B
    Commented May 30, 2023 at 20:34
  • 32
    I downvoted this question itself, simply because I'm angry at the way that the situation was handled, and because I disagree with everything that's happened so far. The post speaks for itself, and deserves votes based on this. Commented May 30, 2023 at 20:44
  • 14
It feels to me like there's really mixed/unfocused messaging going on here. The question title and body focus on "moderat[ing] AI-generated content", but the policy answer is primarily about user suspensions, and gives little to no concrete guidance on the aforementioned moderation of AI-generated content. Can you update the title & question to reflect the actual policy you've posted? It's genuinely really confusing to me what your main aim is here because the answer and question title seem so disconnected.
    – zcoop98
    Commented May 30, 2023 at 22:09
  • 25
    The current lock notice states "It is still accepting answers, comments and other interactions" (emphasis mine). If that is not the case, please change the lock type to the appropriate one to prevent any new answers being posted (I'm fairly sure I've seen this being used previously on some posts), rather than just deleting them all afterwards. If I'm mistaken and such an appropriate lock type doesn't exist, then please at least update the question text to explicitly state for members to not post any answers. Commented May 30, 2023 at 22:52
  • 13
    another weird thing: Your question post is written in a way that nobody would have really asked that question. How would anyone but mods know that this happened earlier this week? And why would mods be wondering about something that presumably you would/should have made clear to them?
    – starball
    Commented May 30, 2023 at 23:18
  • 10
    Side note: could the chatgpt tag be added? It's locked so I can't, but... that is pretty directly connected to this post
    – cocomac
    Commented May 30, 2023 at 23:22
  • 38
    Please discuss this policy here.
    – Ryan M
    Commented May 31, 2023 at 2:03
  • 13
    Echoing an earlier comment, why isn't this featured? Is it because it lacks the details necessary for meaningful discussion?
    – QHarr
    Commented May 31, 2023 at 4:18
  • 6
One more step toward making Stack Overflow the first Q&A platform where answers are one hundred percent AI-generated. Commented May 31, 2023 at 8:31
  • 15
    I wonder what will happen first, humans leaving SE or SE replacing humans with AI that generates high-noise low-signal engagement.
    – Akixkisu
    Commented May 31, 2023 at 9:35
  • 19
    If future content is heavily AI-generated, there will be no market for SE to license the content to AI generators for training... The only way for SE content to be worth anything is for it to NOT be AI generated material.
    – Jon Custer
    Commented May 31, 2023 at 13:57
  • 11
    If I were staff, I would not feel safe hurling myself onto the pitchforks here. Whatever response they are planning, it's almost certainly not going to materialize as comments here. Just sayin'.
    – tripleee
    Commented Jun 1, 2023 at 17:36

2 Answers

16

The policy given to moderators in private, which was later made public:

May 29, 2023

Please stop use of GPT detectors for content moderation on Stack Exchange

In light of significant research on the efficacy and flaws of GPT detectors, we’ve been studying the impacts of these detectors on Stack Exchange. And I’ve got some bad news to share.

We made two major discoveries, which I’ll discuss in more detail in this post. To summarize, though, we have strong evidence that:

  1. none of the GPT detectors work as advertised on Stack Exchange data, and they have egregious false detection rates on our platform, and
  2. they are creating an alarming native-language bias in suspensions on the platform.

The tl;dr: We are now asking you to stop using GPT detectors to substantiate claims of GPT usage and to stop suspending due to AI-generated content in general. Based on what we now know, we can’t admit GPT detectors’ results as reasonable evidence of GPT usage. To be clear: as of today, Philippe has instructed our team that GPT detectors will not be acceptable to establish reason-to-block. We will revisit this decision if evidence emerges that the issues with these tools have been resolved.

GPT detectors do not work as advertised and produce unreliable results

Based on significant and credible research that implies GPT detectors return excessive false positives, we conducted our own survey of GPT detector efficacy on Stack Exchange posts authored before ChatGPT’s November release. This survey concluded that GPT detectors misclassify 32% (+/- 6%) of non-GPT posts on Stack Exchange sites as having been written by GPT. This error rate is too high to be tenable for day-to-day use, and is certainly too high to establish cause for messaging or suspension.
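
As a rough illustration of how a survey statistic of this shape can be computed (a sketch with invented counts, since the underlying data was not shared): score a detector against posts known to predate ChatGPT, where every "GPT" verdict is by definition a false positive, then attach a normal-approximation 95% confidence interval.

    import math

    def fp_rate_with_ci(flagged, total):
        """False-positive rate with a normal-approximation 95% CI."""
        p = flagged / total
        margin = 1.96 * math.sqrt(p * (1 - p) / total)
        return p, margin

    # Invented counts, chosen only to reproduce the shape of the figure above.
    p, m = fp_rate_with_ci(flagged=74, total=231)
    print(f"{p:.0%} +/- {m:.0%}")  # -> 32% +/- 6%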

We also discovered that GPT detectors often disagree with each other as to whether a post is GPT-created. This means that strategies relying on either of two detectors returning a positive result could lead to a baseline error rate of around 50%, and the incorrect detection rate gets worse as more detectors are added. In other words, 50% of posts detected as positive using an “either detector” strategy could be incorrectly classified as AI-written.

However, we did not find a way to reduce the error rate far enough by combining GPT detector results. Relying on multiple GPT detectors returning positives at the same time will return an error rate in the range of 12% (+/- 7%). We expect that combining the results of additional detectors will not return more meaningful results. This is still too high for operational usage, particularly at the volume GPT messages and suspensions are issued on the platform today.
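
For intuition on why these combination strategies behave this way, here is a minimal sketch assuming independent detectors that each misclassify roughly 32% of human-written posts. Independence is an idealization; real detectors are correlated, which is part of why measured rates can sit a bit above the naive product.

    p_fp = 0.32  # assumed per-detector false-positive rate on human posts

    def or_strategy_fp(p, n):
        """P(at least one of n independent detectors fires) on a human post."""
        return 1 - (1 - p) ** n

    def and_strategy_fp(p, n):
        """P(all n independent detectors fire) on a human post."""
        return p ** n

    print(or_strategy_fp(p_fp, 2))   # ~0.54: near the ~50% "either detector" figure
    print(and_strategy_fp(p_fp, 2))  # ~0.10: within the 12% (+/- 7%) band quoted above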

If the situation changes and a GPT detector is released that meets our criteria for operational efficacy, we will revisit this policy change.

GPT detectors are creating bias against non-native English speakers

Even more alarmingly, recent research shows that GPT detectors exhibit overwhelming bias against non-native English speakers, and misclassify their writing as authored by GPT at a rate of ~45-80%. Available detectors are also unanimous in this misclassification around ~19% of the time. This risks unjustly targeting non-native English speakers on the platform, who comprise a significant majority of participants on some of our largest sites. (I would also venture to say that non-native English speakers comprise the majority of participants on a majority of sites across the platform.)

We conducted a survey of the site using our best available information, on the assumption that biases against non-native speakers (a feature we cannot analyze) would show up as biases for and against specific countries (a feature we can analyze).

Our results strongly suggest that actual mod messages sent about GPT to users on the platform are significantly and inappropriately biased, likely driven by biases in the underlying tools used to flag GPT answers. To be clear, we do not believe that moderators are consciously acting with any sort of bias, but that the tools that they are using for this task have that bias built into them.

Regions of the world that appear to be unjustly targeted by GPT detectors in use on our platform include southeast Asia, the Middle East, and most of Africa. Regions of the world unjustly benefited by GPT detectors include North America, Europe, and Oceania.

The effect sizes are not small. For example, Pakistan receives 3.6x more suspensions for GPT usage than its baseline participation rate should imply. Bangladesh receives 2.7x more suspensions than the base participation rate justifies. India receives 1.75x more suspensions. On the other hand, the United States receives 0.6x the suspension rate, alongside Sweden, Great Britain, and Australia. While we unfortunately can’t share the raw data behind our conclusions, we hope that these data points help convince you that our alarm is justified.
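
To make the arithmetic behind multipliers like these concrete (the numbers below are invented placeholders, not the company's data): divide each country's share of GPT suspensions by its share of overall participation, where 1.0x would mean suspensions track participation exactly.

    # Invented placeholder shares; the raw data was not released.
    participation_share = {"Pakistan": 0.010, "United States": 0.200}
    suspension_share    = {"Pakistan": 0.036, "United States": 0.120}

    for country, base in participation_share.items():
        ratio = suspension_share[country] / base
        print(f"{country}: {ratio:.1f}x")  # Pakistan: 3.6x, United States: 0.6x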

Bear in mind that part of the reason we suspect GPT detectors create this bias is that they detect patterns in language that non-native speakers rely on. Therefore, we also cannot accept structural similarities between a user’s answers alone as an indicator of GPT usage.

Only self-admission, freely given, can count as evidence of GPT usage on the platform

In light of the above, there are no tools currently available (at least so far as we are aware) that can identify GPT usage successfully. Therefore, the scope of admissible evidence for establishing GPT usage is, in the current state of things, very thin.

  • GPT detectors cannot be counted as evidence for the reasons above.
  • User behavior, such as answer timing, could be correlated with GPT usage, but a hunch alone does not a suspension make. Instead, this signal has to be taken as a prompt to review the objective quality of the user’s contributions.
  • You may consider content that has obviously been copy-pasted from ChatGPT, such as a literal mention of ChatGPT's knowledge cutoff date or phrases like "If this didn't help, you can use a forum like Stack Overflow and ask for help there", to be self-admission that the post is AI-generated.

Therefore, the only admissible evidence we can currently permit for GPT usage is self-admission by the author of the posts, freely given. Please note that “freely given” is important here: please do not, under any circumstances, try to trick users into admitting GPT usage, lure them into saying it, or otherwise coerce a response. Even a user saying they have used GPT in general may not count unless they specifically say they have used it here, or for this contribution.

As a final reminder, suspensions should only be issued for real behavior, actually known to be malfeasance. We just can’t endorse kicking users off the platform on the basis of hunches, intuitions, guesses, or untested/untestable heuristics.


We are aware that this leaves the state of GPT enforcement on the site in pretty rough shape. The silver lining is that these tools don’t really seem to have been working as designed anyway.

Here are the key points we’d like you to take away from this post:

  • We are now requiring you to ignore and/or decline reports using online GPT detectors or intuition as their basis. Remember that users have honed their instincts as to what a GPT contribution ‘sounds’ like based on input and signals from GPT detectors that are now known to be bad.
  • Do not direct users to GPT detectors or encourage them to act on their results in any way. Users, as well as moderators, should not be using these tools to determine post and user outcomes.
  • The only admissible evidence for GPT usage is self-admission; however, you should not solicit self-admission from users.
  • Remember that mod messages and suspensions are for real, verifiable malfeasance only, and should not be enacted on the basis of hunches, guesses, intuition, or unvetted heuristics.
  • Effective immediately, Philippe has revoked the temporary policy change allowing moderators to jump to 30 day suspensions for GPT usage. In light of the current evidence, we simply do not trust that the tooling exists to support suspensions with the required degree of confidence. Although this policy was only formally enacted on Stack Overflow, any network sites that have informally been working under such a policy should also now consider that policy overturned, and should no longer be acting on it.

As far as the existing suspensions go, we will let active suspensions that were issued for a time period of less than 90 days expire organically. We’ll only review these suspensions if users contact us. For suspensions that were issued for a time period of 90 days or greater, we’ll manually review them to determine whether they can be removed (for example, confirming that the user wasn’t simply flooding bad content).

Low quality content

You may still naturally deal with content abuse in the ways we always have: if someone is posting a flood of poor-quality content, suspend them for low-quality content. Just don’t make it about GPT or AI-generated content; make it about the quality being poor on a repeated basis. This includes things like repeated answers that don’t address the question.

If someone is posting a flood of content to promote their tool/site/blog, use the astroturfing policy we have.

One final note: The amount of content deleted for GPT usage is truly astounding across the network. While we don’t doubt users have been using GPT and other language models to post answers, if you’ve been using GPT detectors to mark individual answers for deletion, we’d ask you to consider carefully whether any such content needs undeleting. Keep in mind, false positives are dramatically more likely for authentic writing by users who do not natively speak English, and may run as high as 80% for some detectors; in these cases, users may not deserve to have their content removed. Additionally, it remains possible that a false positive on one answer means false positives on other answers by that user are more likely to occur.

If you have questions, let us know in the answers below; Philippe is also willing to schedule a meeting in the TL if needed to discuss it further.

2
  • 3
    should this also be locked to prevent editing? (the source post is locked)
    – starball
    Commented Jul 26, 2023 at 20:45
  • 2
    @starball I'll flag it and request to add the lock. P.S. The title of the policy given to mods was this: "Please stop use of GPT detectors for content moderation on Stack Exchange" (according to Laurel's comment). Commented Jul 27, 2023 at 3:18
-440

We recently performed a set of analyses on the current approach to AI-generated content moderation. The conclusions of these analyses strongly indicate to us that AI-generated content is not being properly identified across the network, and that the potential for false positives is very high. Through no fault of the moderators themselves, we also suspect that there have been biases for or against residents of specific countries as a potential result of the heuristics being applied to these posts. Finally, internal evidence strongly suggests that the overapplication of suspensions for AI-generated content may be turning away a large number of legitimate contributors to the site.

In order to help mitigate the issue, we've asked moderators to apply a very strict standard of evidence when determining whether a post is AI-authored before deciding to suspend a user. This standard of evidence excludes the use of moderators' best guesses based on users' writing styles and behavioral indicators, because we could not validate that these indicators actually succeed in identifying AI-generated posts. This standard would exclude most suspensions issued to date.

We've also identified that current GPT detectors have an unacceptably high false positive rate for content on our network and should not be regarded as reliable indicators of GPT authorship. While these aren't the sole tools that moderators rely upon to identify AI-generated content, some of the heuristics used have been developed with their assistance.

We've reminded moderators that suspensions (and typically mod messages as well) are for real, verifiable malfeasance only, and should not be enacted on the basis of hunches, guesses, intuition, or unverified heuristics. Therefore, we are not confident that either GPT detectors or best-guess heuristics can be used to definitively identify suspicious content for the purposes of suspension.

As always, moderators who identify that a user has a problematic pattern of low-quality posts should continue to act on such users as they otherwise would. The signals moderators currently use to determine that a post was authored with the help of AI can, in some cases, reliably indicate that the content quality is poor, and moderators should feel free to review posts on that basis. If someone is repeatedly contributing low-quality content, we already have policies in place to help handle it, including a suspension reason that can be used in those cases.

75
  • 179
    This post does not match the guidance given to moderators. Commented May 30, 2023 at 19:52
  • 141
    Can we please have concrete numbers for the "unacceptably high false positive rate" from your research?
    – E_net4
    Commented May 30, 2023 at 19:55
  • 151
    @E_net4 We mods didn't even get them. That said, they ran it on old questions, proving that there's a bias in the detectors (which there are independent studies already confirming). However, they've never run any actual checks to prove that there is a false positive rate in suspensions, as most suspensions do not rely only on the output of GPT detectors. They've been bashed for this internally, and have so far ignored that feedback Commented May 30, 2023 at 19:57
  • 115
    "In order to help mitigate the issue, we've asked moderators to apply a very strict standard of evidence to determining whether a post is AI-authored when deciding to suspend a user." The guidance I have received is: "We are now asking you to stop [...] suspending due to AI-generated content in general". How do I reconcile these two statements?
    – Andy
    Commented May 30, 2023 at 19:57
  • 89
    "We've reminded moderators that [...]Therefore, we are not confident that either GPT detectors or best-guess heuristics can be used to definitively identify suspicious content for the purposes of suspension." This is exactly why SO had the policy of ChatGPT being banned... because one can't rely on anything about it being accurate. This is exactly the danger of "AI" and this policy seems counterintuitive because it seems to invite more "AI" into the site instead of less.
    – TylerH
    Commented May 30, 2023 at 19:59
  • 86
    We protested this update vehemently and you went ahead with it anyway. Apparently it's not clear to the team how much this will negatively impact the quality of the network. If there's anything we can do to help you see the light, please say so.
    – Mast
    Commented May 30, 2023 at 20:12
  • 87
    @Andy (and Philippe), we were not "asked", we were "required". I don't know why Philippe is suddenly turning into a spin doctor here, and I am guessing this isn't his choice, but let's be very clear: we were not asked, we were required and told what to do. That is a very significant difference.
    – terdon
    Commented May 30, 2023 at 20:14
  • 50
    For everyone else equally infuriated by this decision, strike discussions have started up. Commented May 30, 2023 at 20:14
  • 83
    We moderators have been assured that there are distressing "indicators" that our handling of suspensions has been suboptimal. This answer would be substantially improved with concrete exposition of those indicators and how they are unfair to community members or harm Stack Overflow's business interests. Commented May 30, 2023 at 20:33
  • 142
    You seem to be completely oblivious of how we actually moderate AI content. I suggest you actually come talk to us, and the moderators, and look into the chat rooms we use for this curation, to understand that we are reasonable in our moderation efforts, and we don't just throw around accusations and suspensions without being on steady ground. Commented May 30, 2023 at 20:48
  • 78
    @Andreasdetestscensorship Also worth noting that they've gotten mountains of feedback pointing out all of this, and have ignored all of it. They know it isn't how we operate, they've been told that AI content isn't reliable, but it doesn't matter to them anymore. They've provided some surface-level stats internally, but none showing that the FP rate exists in suspensions, nor any stats to back up any of their other internal claims Commented May 30, 2023 at 20:55
  • 82
    The whole premise is flawed because it starts from "regular users might use biased AI detectors" and the conclusion is "mods can't be trusted to judge from many kinds of signals whether a post is AI-written" which is the kind of fallacy that's hard to take as an honest mistake. Painting this all with "in the name of inclusivity" just drives it further home how cynical this all is. Not sure it's worth arguing about the face value of the "arguments" made by the company (for want of a better word). Commented May 30, 2023 at 21:26
  • 98
    This post is manipulative. You have been told time and time again how you are wrong. Moderators have been very clear in the fact that you say one thing behind locked doors, and another in public. As such, I consider this post offensive. Commented May 30, 2023 at 23:07
  • 99
    99% of moderation is performed by regular users like me, but this post provides zero guidance for regular users like me. You haven’t even answered your own question, which was “What does this guidance include?” Commented May 31, 2023 at 2:31
  • 107
    Is a user who has never or hardly ever used capitalization and punctuation, has through the months or years of their activity written answers full of typos and "try this" in lieu of explanations of their solutions, and suddenly learned to write with perfect grammar and spelling, full-sentenced, plausible-sounding but conceptually entirely wrong answers not evidence enough? If not, then what is?
    – CodeCaster
    Commented May 31, 2023 at 8:56
