
Use of ChatGPT-generated text in posts on Stack Overflow is temporarily banned.
However, the reasons for the ban really apply to much or all of the network, and certainly to sites that are similar in nature to Stack Overflow.

I suggest (temporarily) banning ChatGPT network-wide.

  • 12
    So far, if someone copies text generated by ChatGPT without attribution, it's at the very least plagiarism, so you could argue most of it is already banned. When attribution is given, though, that becomes a bit harder, but it's probably just a case of up/downvoting as appropriate (or deleting if it doesn't answer the actual question because someone only pasted, e.g., the title)?
    – Tinkeringbell Mod
    Commented Dec 5, 2022 at 13:19
  • 56
    @Tinkeringbell the problem is when it looks like an answer, smells like an answer, could be an answer, but is actually completely incorrect... And then that user dumps a load of those AI-generated non-answers on a load of questions... That's in a very tiny nutshell why the decision was made on SO to just ban it.
    – Cerbrus
    Commented Dec 5, 2022 at 13:22
  • 7
    Technically? Impossible. But as is often the case, there are systematic offenders that are relatively easy to recognize (manually)
    – Cerbrus
    Commented Dec 5, 2022 at 13:33
  • 26
    @Tinkeringbell it's not that you don't have the power, it's that if 2000 users posted an answer every 3 minutes just after you went to bed you'd have a lot of clean up to do. We'd like to avoid that in the first place, ideally by making as many people as possible aware that they shouldn't do it. Commented Dec 5, 2022 at 13:44
  • 4
    @RobertLongson so basically, an awareness campaign that these are things that are already not okay to do? I could live with that.
    – Tinkeringbell Mod
    Commented Dec 5, 2022 at 13:58
  • 42
    The problem is they're deeply plausible but incorrect answers, and they're a ton of work to ferret out. I'd be tempted to just throw them a year's suspension for being a jerk and wasting people's time. I'd even be inclined to destroy their accounts for posting nonsense. Commented Dec 5, 2022 at 14:33
  • 7
    @Tinkeringbell Whether copying the work of an ML model is plagiarism is an area of debate, not only for language models but for image generators too, and most voices seem to lean toward "a model is a tool and tools don't have authorship". If there's no policy, suspending someone for plagiarism because they used a language model to help answer something doesn't seem appropriate to me. We need a specific policy for this situation.
    – Erik A
    Commented Dec 5, 2022 at 14:58
  • 14
    @Cerbrus interesting - under Restrictions it says "[You may not] (v) represent that output from the Services was human-generated when it is not"
    – VLAZ
    Commented Dec 5, 2022 at 15:12
  • 5
    I just got it from skimming. Technically, the service doesn't allow you to share output as if it were your own creation, which is what answerers have been doing. But there is probably nothing the service can really do. At most, they'd cancel the plan for that account, if they bother at all. But then you can probably just register a new account and continue. There is nothing they can really do about output already in the wild. IMO, the clause is there just to cover themselves: if somebody says "Some output from your service was used for <some abuse>", they can say it's not their responsibility.
    – VLAZ
    Commented Dec 5, 2022 at 15:18
  • 36
    We have begun internal discussions to identify options for addressing this issue. We’re also reading what folks write about the topic on their individual sites, as one piece of assessing the overall impact. While we evaluate, we hope that folks on network sites feel comfortable establishing per-site policies responsive to their communities’ needs.
    – Slate StaffMod
    Commented Dec 5, 2022 at 20:24
  • 22
    @JoshL1516 Because there are simply too many incorrect answers being generated to properly distinguish between them. We do not have the manpower for the level of quality control needed.
    – Mast
    Commented Dec 5, 2022 at 20:42
  • 11
    IANAL: Plagiarism, as used on our sites, is simply presenting a work as your own. If you didn't create it, it's not your work. Not being a legal person, ChatGPT doesn't own the copyright, and it seems that the user receiving the response is given what amounts to the Unlicense, or perhaps becomes the owner, even as far as copyrights are concerned. But it's not a question of whether or not it's a copyright violation, or against whatever license you have to use the content. It's really a binary decision: did you create it [not plagiarism], or did someone (something?) else create it [plagiarism]?
    – Chindraba
    Commented Dec 5, 2022 at 22:36
  • 7
    @Slate At some point (probably sooner rather than later) it may be necessary to clearly state on the answering page that posting AI-generated answers (on sites that don't allow them) is not acceptable and can result in account suspension. There are many policies that are not clearly stated, and people post garbage because of that, but AI answers are far harder to detect. There needs to be a clear signal, and then there will be no surprises if someone gets a suspension. This could also significantly reduce the influx of AI-generated answers. Commented Dec 8, 2022 at 10:55
  • 11
    @tuskiomi ok thanks. Important note: ChatGPT's sharing and publication policy requires that "The role of AI in formulating the content is clearly disclosed in a way that no reader could possibly miss, and that a typical reader would find sufficiently easy to understand."
    – starball
    Commented Dec 11, 2022 at 6:21
  • 11
    @starball if only people would follow policy, lol. They don't. They want easy rep, and they'll do anything for that. So this crappy ChatGPT thing is the jackpot for them: way to write answers that look smart, get upvotes, and hard to detect it's not legit. Commented Dec 11, 2022 at 8:22

15 Answers


With due consideration, we've decided that no network-wide, general policy banning ChatGPT or other AI-generated content is necessary or helpful at this time. However, as detailed in the answer to "Is attribution required for machine-generated text when posting on Stack Exchange?", we do consider AI-generated content to be "the work of others", and the requirements for referencing must be followed for all such content on the network.

I want to be clear: I am not in any way intending to downplay the significance of ChatGPT, nor the disruption it has caused to the platform over the last few weeks.

Instead, we're going to stand by the comment I left on this post on December 5th:

While we evaluate, we hope that folks on network sites feel comfortable establishing per-site policies responsive to their communities’ needs.

Each site on the network is going to be impacted by ChatGPT (and its future iterations) in different ways. Of all the sites on the network, Stack Overflow was hit by far the hardest. However, we are measuring its impact both on Stack Overflow and across the network -- and, the impact of ChatGPT is currently diminishing everywhere. Some sites will see more or less activity on a given day, but outside Stack Overflow, it appears to be leveling off to a very slow trickle. On Stack Overflow, its usage rate is still falling quickly.

Because sites are impacted to such different degrees by the usage of ChatGPT, we encourage sites to create these policies as they become an issue. A blanket policy does no good if affected communities are not simultaneously developing the methods they use to combat the material problems they face. Instead, it risks being actively unproductive, by setting an expectation that sites will purge this content without giving them targeted tools to do so.

Our work internally progresses on identifying these posts and making our systems more resilient to issues like this in the future. We recognize that this is a shot across the bow, and the problem isn't going to go away in the long term. But for now, it seems we've weathered this storm mostly intact. As always, we'll reevaluate this decision in the future, if the circumstances warrant it.

And, of course, if any site experiences a volume of GPT posts that are cumbersome to manage, or a site needs any other support managing an influx of unwanted content, we are always happy to help apply the tools we have at our disposal.

  • 18
    In very simple terms, this feels like you're protecting Stack Overflow while throwing the rest of the network under a bus and letting it get run over. Without a clear policy, terrible low-quality AI-generated content will flood the sites, eventually bringing them down. And I really don't like that. It's sad, and I see it as a terrible mistake. Commented Dec 20, 2022 at 7:02
  • 17
    @ShadowWizardChasingStars Isn't Slate saying that in fact, ChatGPT-generated content isn't actually flooding non-SO sites?
    – Adám
    Commented Dec 20, 2022 at 7:22
  • 4
    @Adám perhaps it's not flooding them now, but seeing there is no policy against it, and seeing a clear request to ban it officially declined, I'm 100% sure people will start doing exactly that at some point. It's easy rep. Commented Dec 20, 2022 at 7:30
  • 8
    @ShadowWizardChasingStars But they only declined to ban it now. They promise to reevaluate this decision in the future, if the circumstances warrant it. So if people start doing it (as you are so sure they will do), further actions may be taken.
    – Adám
    Commented Dec 20, 2022 at 7:51
  • 19
    @Shadow If a community is experiencing a serious disruption in their ability to manage content quality, we will happily support that community using the tools we have at our disposal. Rest assured, the plan here is not to throw network sites to the dogs. Rather, sites need to develop their own guidance specific to how GPT and large language models are misused in their communities - policies we plan to help support. Overall, though, the median daily volume is very, very low on all ~180 non-SO network sites & decreasing. And I'm monitoring that situation actively for unwanted changes.
    – Slate StaffMod
    Commented Dec 20, 2022 at 8:23
  • 2
    @ShadowWizardChasingStars: If users are already unmotivated to try to prevent low-quality AI-generated content from flooding their site, then it's hard to imagine how the announcement of any network-wide policy will spur them into action. And I doubt the people posting this rubbish are paying any attention to policies. Commented Dec 21, 2022 at 17:04
  • 3
    @ShadowTheSpringWizard No; Stack Overflow is protecting Stack Overflow, because the ban arose out of community consensus there. As a Stack Overflow user, the process is that I flag suspected ChatGPT content via the normal flagging mechanism, then a Stack Overflow mod acts on that like any other flag. Each other Stack site is free to follow the same process and come to the same conclusions individually. Commented Mar 27, 2023 at 18:43
  • 3
    Is this officially endorsed by StackExchange? pipedream.com/apps/stack-exchange/integrations/openai Commented Mar 31, 2023 at 3:40

I'd advocate for a hard line on this.

Something that's utter random garbage is actually less harmful than almost-correct garbage that takes an expert to work out.

If you're using the output of a machine learning or AI tool WITHOUT verification and/or disclosure, and we're going to have to waste time working this out, then the user clearly isn't here with the right intentions. As such, I'd be inclined to treat the user as severely as needed, and I'd be tempted to start with longer suspensions.

It's a good way to prevent folks who are clearly wasting people's time from wasting more of it.

  • 9
    so you're ok to use ChatGPT in an answer provided it has the proper attributions? Commented Dec 5, 2022 at 16:28
  • 4
    If it's garbage, we'd delete it anyway and suspend the user if there's a bunch of poor-quality contributions, but this seems a more polite way to do it Commented Dec 5, 2022 at 16:38
  • 4
    so you're ok to use ChatGPT in an answer provided it has the proper attributions and it's a correct answer? Commented Dec 5, 2022 at 16:39
  • 11
    and it's verified to be a workable/correct answer. Commented Dec 5, 2022 at 16:40
  • 6
    Ok then I think your answer and mine are similar. Commented Dec 5, 2022 at 16:41
  • 27
    @Franck You can't give proper attribution: ChatGPT doesn't identify its sources. And the text it emits may be derived from the work of a few authors or thousands (if not millions). The topic of AI plagiarism is discussed in academia.stackexchange.com/q/191197
    – PM 2Ring
    Commented Dec 5, 2022 at 17:21
  • 28
    My initial reaction to "I'd be tempted to start with longer suspensions" was that it might be too harsh but the fact is a) any user who does that must know exactly what he's doing, b) said user would at least suspect that doing so might be against the rules, and c) the user will have done that multiple times already as it's not easy to identify that behavior with one or two answers; in most cases, one needs a stream of AI-generated answers to spot the pattern. Based on that, I don't really think such punishment is too harsh.
    – 41686d6564
    Commented Dec 5, 2022 at 20:47
  • 2
    Exactly - that's the same line of thinking I had Commented Dec 6, 2022 at 0:16
  • 8
    I fully support a complete ban. Even if AI were able to generate correct answers, we could just ask the AI instead. There's no need for posting such content on the network. We need to leave Stack Exchange for problems only people can solve. Anything else will not work in the long run and will cause more harm than good. Commented Dec 6, 2022 at 11:43
  • 2
    @ResistanceIsFutile "Even if AI would be able to generate correct answers, then we could just ask AI instead". No. Only true if AI is always correct. Commented Dec 6, 2022 at 21:04
  • 1
    @FranckDernoncourt Nothing is ever completely correct. If you could get excellent accuracy, then you would probably ask the AI first. Just like answers on SO are not always correct, even some posted by experts. Commented Dec 6, 2022 at 21:06
  • @PM2Ring good point, I just meant giving attribution to ChatGPT. Commented Dec 6, 2022 at 21:06
  • 1
    @FranckDernoncourt to be fair - the main advantage here is we can take a look, decide it's crap and delete it faster. Commented Dec 7, 2022 at 1:46
  • 3
    Well, posting nonsense is something commonsensically bad. It's grounds for potential account deletion. I don't think there's an argument for posting trash being acceptable in any way, even if it's plausible garbage, rather than the outcome of a cat walking all over the keyboard Commented Dec 13, 2022 at 9:24
  • 1
    @Topcode I am not going to focus on rest of SE because some rules may be different on some sites. But idea behind SO was not creating duplicate of documentation that can be easily found elsewhere, though some duplication will always exist. Idea was creating high quality repository of knowledge and answers to common (or not so common) questions related to programming. So when you have a problem you can easily find solution because someone before already had that problem and it was solved on SO. Commented Dec 14, 2022 at 13:10

Let me back this up and look at the broader picture here. We're about to enter a time that Star Trek once hinted at, but is now here and ready for your use: artificially generated content. The Internet has now reached maturity and search engines can run as much training data down your throat as you can handle. Quite literally we're watching the true next iteration of the Internet be born right now. Google can show you what already exists. AIs can generate almost anything your mind can dream up.

The problem there is ownership. The US helped the Internet in its infancy by making an environment where you can "Fair Use" just about everything.

[O]nline intermediaries that host or republish speech are protected against a range of laws that might otherwise be used to hold them legally responsible for what others say and do.

If you post a meme on SE that's using a copyrighted image, all SE has to do is take it down to avoid liability. With such a low bar to clear, it's allowed Fair Use to thrive. But... what do we do when these AIs start generating content that is wholly based on the works of others?

Allen's victory prompted lively discussions on Twitter, Reddit, and the Midjourney Discord server about the nature of art and what it means to be an artist. Some commenters think human artistry is doomed thanks to AI and that all artists are destined to be replaced by machines. Others think art will evolve and adapt with new technologies that come along, citing synthesizers in music. It's a hot debate that Wired covered in July.

And will these tools drown out actual users?

Established artist communities are at a tough crossroads because they fear non-AI artwork getting drowned out by an unlimited supply of AI-generated art, and yet the tools have also become notably popular among some of their members.

These are from September, involving art communities, but all ChatGPT is doing is basically a fancier search than Google can serve up. And this problem isn't going to go away because we're using machine learning everywhere. If an AI can't do it now, just wait.

For my fellow mods and me on Stack Overflow, the root problem boils down to two issues:

ChatGPT is a parrot

Parrots are very smart birds and they can mimic sounds very well. But parrots cannot truly talk: they emulate the sounds they hear without comprehending what they're saying.

ChatGPT is better than any chatbot we've seen. It writes in natural language, not the stilted text that typifies such systems. It generates what appears at first blush to be quality content. But we've noted that ChatGPT is doing what a lot of inexperienced users on Stack Overflow do: try to be the best-sounding parrot. Someone asked ChatGPT whether it should be allowed to answer Stack Overflow questions and posted the result in a now-deleted answer on our ruling. I do have to admit it's amusing:

I am writing to express my extreme disapproval of the idea of allowing ChatGPT answers on Stack Overflow. This would be an irresponsible move and would fundamentally undermine the integrity of the platform.

Polly want a cracker? (fascinating English.SE etymology lesson there)

ChatGPT is not a writer or a programmer; it's just copying other smart-sounding sources that look highly relevant. It can (to its inventors' credit) write simple, passable code. But ChatGPT doesn't know what SQL injection is, and another mod experimenting with it found it will merrily give you code suggestions that use it. Why? For too long that was how a lot of people wrote database code on the Internet, and you can still find that poor advice everywhere. That's why a lot of folks on Stack Overflow will incessantly warn you about not doing that.
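To make the danger concrete, here is a minimal, hypothetical sketch (Python with an in-memory SQLite table; the table, data, and input are invented for illustration) contrasting the injectable pattern a model trained on old tutorials might reproduce with the parameterized form:

```python
import sqlite3

# In-memory database with a hypothetical users table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

# A classic injection payload supplied as "user input".
user_input = "' OR '1'='1"

# UNSAFE: string interpolation lets attacker-controlled input rewrite the query.
unsafe_query = f"SELECT role FROM users WHERE name = '{user_input}'"
print(conn.execute(unsafe_query).fetchall())  # leaks every row

# SAFE: a parameterized query treats the input as data, not as SQL.
safe_query = "SELECT role FROM users WHERE name = ?"
print(conn.execute(safe_query, (user_input,)).fetchall())  # no match
```

The two queries look nearly identical on the page, which is exactly why a system that only optimizes for plausible-looking text can't tell them apart.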

The various AI image generators can afford to be parrots. In fact, that's what the users want. "Paint me a picture of a cat riding a unicorn carrying a shotgun in the style of Vincent Van Gogh". That doesn't work so well when you need something like "How do I create a contact form on my WordPress blog?" That's not something I would trust ChatGPT to answer. It might give you workable code that someone will use to gain control of said blog.

Dishonesty

I won't link the person who said this, but this is a real Twitter post:

So I started a new stackoverflow account and I am plugging random questions without answers into https://chat.openai.com/chat and pasting the answer. So far, after 9 answers in 1.5 hours it has 1 accepted and 3 upvotes and a reputation of 62...

And

6 accepted answers, of the 26, 11 upvotes, 5 downvotes. I am not checking the answers in any way. I'll give it a rest for now and see how we do tomorrow...😄

I'm sure ChatGPT (or Google if you wanna go old school) can tell you where to find this user, but that second statement is downright scary: they're not checking the answers in any way. In other words, they also are not a programmer. I've already gone over why that part is a problem, but there's another one here: this user admits they didn't write the code. From the Meta Stack Overflow FAQ on non-English content:

Translating a question for a non-English speaker sets them and all participants up for a poor experience, due to the OP not being able to follow and respond to feedback from comments, understand answers, or get assistance from the Help Center.

The inverse of that is true here. A user comes in and asks a question. Someone posts an AI-generated answer without understanding anything about the question. What if the questioner posts a comment asking a clarifying question? What if the AI did something confusing in their answer? This user cannot interact in a meaningful way with said questioner. And, going back to the art site debacle, who owns it if it's been copied? I mean, we don't use Creative Commons for no reason. This butts up against plagiarism in a most uncomfortable way.

TL;DR What do we do about it?

Answers generated by an AI should be considered as being written by the AI. That means you can quote them like any other source, but you must attribute them to the AI, just like any other source, and not use a bulk-copied AI quote as the body of an answer. This way, we avoid the thorny issue of people running to the latest AI to get answers so they can copy-paste them as their own. We have plagiarism tools (current and forthcoming) in this wheelhouse, so we don't need to reinvent any wheels.

If the AI gives a bad answer, we have votes for that.

  • 2
    Another piece using the parrot analogy: On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (ACM. 2021-03-01) Commented Dec 12, 2022 at 16:22
  • @Machavity I may be asking a lot, but can you provide a reference that actually provides a concrete example of a question that causes ChatGPT to produce code that's vulnerable to a SQL injection attack, or any other "standard" code vulnerability for that matter? The reason being that I really want to put together a blog post showing such an answer, side by side with asking ChatGPT "Why is SQL injection bad?" This will be an extremely visible and comprehensible example of ChatGPT's complete inability to understand and reason about what it's saying.
    – dgnuff
    Commented Apr 14 at 6:11
    This problem is not limited to AI-generated answers. Answers get additional upvotes just because they already have many of them; see for example stackoverflow.com/a/70517348/3027266 (scroll down in the comments) Commented Jun 19 at 7:19

I believe any AI-generated content needs to be banned network-wide, because it is an attribution/plagiarism nightmare scenario.

Some of the other answers have focused on the difficulties with moderating AI-generated content for correctness, accuracy, usefulness, etc., and I don't meaningfully disagree with those answers. But I do think there is a big, big problem, especially for a website network like Stack Exchange, which is the near-impossibility of properly attributing credit for the words that the AI is producing.

The main problem is that the vast majority of publicly available AI-generation algorithms, including ChatGPT, do not properly credit the sources that were used to train the AI's internal model. This means that any answers generated by the AI are, de facto, plagiarism.

Many of these AI algorithms/organizations sidestep the plagiarism issues in ways that avoid obvious legal culpability: ChatGPT, for example, refers to this data as coming from "human AI trainers". But it remains the case that any use of these algorithms will constitute plagiarism until/unless stringent rules are applied to these organizations such that they A) directly attribute every text that was used to train the AI model, B) only obtain these texts from authors who gave explicit, documented consent to have their texts used in these models, and C) document these contributions and make that data publicly available.

That doesn't necessarily mean it will never be appropriate to reference an AI-generated text per the normal Stack Exchange rules about quoting sources/using quote blocks in an answer (I have some skepticism about how often that will be appropriate), but I do think that this means—pending major ethical overhauls in AI-generated content—it will never be appropriate for AI-generated text to manifest as the body of an answer posted on this network.

EDIT: I don't know why this answer in particular is attracting users who seem ignorant about the relationship between Neural-Network AIs and actual biological human cognition, but to address a few recurring comments I've been seeing:

No, Neural Net AI Algorithms are not capable of Original Thought in the same way that humans are

Addendum: I am not saying that any/all AI Algorithms are not/will never be capable of original thought, but Neural Networks certainly are not.

Neural Networks get their name from a conceptual similarity between how biological neurons function and the abstract model of "neurons" deployed in Neural Networks. Unfortunately, this has led to false equivalencies between the two, implying that human neurons and AI neurons are essentially equivalent or "the same, but varying in speed/power/etc." It needs to be understood that this is fundamentally untrue.

There's a lot of technical reasons why the comparison isn't particularly cogent, but to reduce it down to the simplest form possible: the biggest reason that Neural Networks cannot produce original thought is because they're not trying to. Neural Networks are designed with 'emulation' of existing data as an end-goal. To use ChatGPT as an example, its goal is not to create original ideas, its goal is to produce text that its model detects, with high consensus, is similar to the text that a human* has already produced. Stable Diffusion and other AI-Art-Generating algorithms operate on similar principles, attempting not to produce original paintings, but instead to produce an image that is similar to images that humans have already created.

* as defined by the model's training data, which I am assuming consists entirely of human-produced content but it should be acknowledged this may not be the case since the sloppiness with which sources are pulled into the model can in many cases pull in other AI-generated content, which would train the model not to talk like a human but instead talk like an AI
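As a toy illustration of that "reproduce what the training data already contains" objective, here is a minimal sketch, using a hypothetical ten-word corpus and bigram frequency counts in place of learned weights; a real model is vastly more sophisticated, but the objective is the same: emit only continuations that already occur in the training data.

```python
import random
from collections import defaultdict

# Toy corpus standing in for training data (invented example text).
corpus = "the cat sat on the mat and the cat ran".split()

# Record, for each word, every word that followed it in the corpus.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start, length, rng):
    """Emit text by sampling only continuations seen in the corpus."""
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:  # nothing ever followed this word; the model is stuck
            break
        out.append(rng.choice(options))
    return " ".join(out)

print(generate("the", 5, random.Random(0)))
```

By construction, every word pair this "model" emits already appears somewhere in its training data; it can recombine, but never originate.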

A really good case study for this "don't produce something original, produce something that resembles something that has already been created" function of neural networks is a test I performed, using Stable Diffusion, where I described a hypothetical D&D character ("Githyanki Woman in Red Trenchcoat") and had the algorithm try to generate an image. These were the produced results, which give us some important insights. The algorithm attempted to produce 4 images, and of those 4 images, two of them clearly modeled Cosplay photos (with some distressing facial distortion, which I'm guessing is the influence of the 'Githyanki' species modifier) and the other two generated... images of physical tabletop tokens, stand and everything.

It's not difficult to figure out what happened. The metadata of 'Githyanki' (a rare but playable humanoid species in some editions of Dungeons and Dragons) connected the algorithm to images that were created in Heroforge and other TTRPG token creators, along with cosplay photos of people modelling Critical Role characters (or their own original D&D characters). This is how the algorithm tried to establish 'similarity' between the images it created and the images it associates with the prompt provided.

But it's also very clear that none of the images produced are images that a human artist would create. In fact, the only way a human would produce any of those images themselves is if they had attempted to cheat the prompt—grabbing an image from a token generating website or a cosplay photoshoot and passing it off as their own work.

In other words, the AI did an exemplary job of replicating what a human committing blatant plagiarism might attempt to produce.


The reason I'm going into this long explanation and case study, aside from reinforcing my point about modern AI-generation algorithms only really being capable of mass-scale plagiarism, is to emphasize that these algorithms are not capable of original thought, and why arguments like "it's just like a human brain!" or "humans don't have to cite their training, why should AI have to?" are invalid. Some prompts are easier for the AI to replicate than others; certainly my example prompt broke the limits of what the AI was capable of replicating. But it's critical to understand that even when the AI does a much better job of reproducing what the prompt asked it to generate, it's still doing the exact same thing as when it broke under the weight of my request: copying the data it found in its training data.

Now, the ethics of these algorithms can be solved: these algorithms could purge their databases, begin taking data only from artists who have explicitly consented to have their art/writing ingested, properly cite each work ingested, and make that data publicly available and easy to access, solving the widespread plagiarism I started out by addressing. I would even go a step further and argue that to be truly ethical the algorithm would also have to be able to cite the specific works whose influences composed the specific resulting output, but I'm guessing that might be technologically infeasible.

Important: something being "technologically infeasible" is not the same thing as saying "so we don't/shouldn't have to care about it". If AI-Art or AI-Writing can't be done ethically, I would argue it shouldn't be done at all, and whether or not running these algorithms ethically is a thing that can be done isn't in my purview.

But we need to dispose of this notion that these algorithms are, in any meaningful sense, engaging in original thought. They're not, and they're not truly replicating human thought, either at the macro level of human consciousness or at the micro level of individual brain neurons. Neural Networks are designed to emulate and reproduce existing works based on what the algorithm detects as being similar to those existing works, and they can be in some cases very uncanny replications, but they aren't original creations.

  • Does the term "plagiarism" make any sense for copying from a work not subject to copyright? Commented Dec 12, 2022 at 20:33
  • 9
    @JonathanReez So Plagiarism and Copyright are orthogonal concepts. Works that are not subject to copyright can still be plagiarized (used without crediting the source), and you can commit copyright infringement without committing plagiarism (cited the source but used too much of the copyrighted work and violated Fair Use). Important Distinction.
    – Xirema
    Commented Dec 13, 2022 at 1:08
  • @JonathanReez Beyond the question of whether any of the works used in the training algorithms for these AI are, in fact, subject to Copyright (they might or might not be; and the fact that we're not sure is kind of the root of the problem given that, again, these works are not properly cited), failure to properly cite those works would still constitute plagiarism regardless.
    – Xirema
    Commented Dec 13, 2022 at 1:11
  • 4
    AI writing text by learning from other texts isn’t any different from humans doing the exact same thing. You don’t have to attribute text you write to every single book on the subject that you’ve read. The tricky part is that prior to ~2021 any tool whatsoever was considered fair game to use to help you write text - spell checker, Google Translate, thesaurus, tools that help rephrase things, etc. But all of a sudden it’s claimed that this particular tool goes too far and no longer counts as a “tool”. Commented Dec 13, 2022 at 1:29
  • We consider plagiarism to be bad because it allows credit to be stolen. No such concern exists for tools because they don’t require human input. If you can write amazing novels using ChatGPT, why should you give your tool any credit? Commented Dec 13, 2022 at 1:32
  • 9
    @JonathanReez "AI writing text by learning from other texts isn’t any different from humans doing the exact same thing." This is not true. It's just flat-out completely false. Proselytizers for AI-generated content will sometimes make this claim because they want to capitalize on hype around AI and/or are jonesing for valuable Venture Capital funding, but the neural networks that power these algorithms are extremely unlike human thinking, and should not be treated as though they are performing original thought.
    – Xirema
    Commented Dec 13, 2022 at 1:41
  • 2
    Um… humans don’t learn how to write text or code by reading the works of others? That’s certainly news to me. Humans need a significantly lower number of samples to learn something but the general principle is the same. There’s nothing magical about how our brain works, it’s just a neural network. Commented Dec 13, 2022 at 1:46
  • 8
    @JonathanReez This isn't a debate about how Humans learn, it's about how AI learns, and specifically, how Neural Networks operate. Mass-scale Matrix Multiplication is not analogous to human learning.
    – Xirema
    Commented Dec 13, 2022 at 5:07
  • 5
    The brain uses the biological equivalent of matrix multiplication to achieve the same result. My question is why it's fair for a human to read a few articles and then write their own "original" article on the subject, but not fair for an AI to do the same thing. You seem to assume that the human brain does some "magical" process, while in reality it "auto-completes" text in a fashion not dissimilar to ChatGPT. Commented Dec 13, 2022 at 5:14
  • @JonathanReez: I think the point that is being made here (underneath all the "AIs are not like humans" chatter, which IMHO is frankly just irrelevant and distracting), is that humans are normally expected to cite their sources, and ChatGPT is currently unable to do so. If you cite your sources, then as a rule, that is generally understood to be enough to defeat a charge of plagiarism. (There may be copyright issues if the text is very similar to the original, but that's a separate issue as Xirema's comment acknowledges.)
    – Kevin
    Commented Dec 22, 2022 at 9:55
  • @Kevin people rarely cite their sources on Stack Overflow, and many sites like Politics have a rampant lack of source attribution by humans Commented Dec 22, 2022 at 12:35
  • @Kevin I wouldn't have had to add that section if I didn't keep getting comments from people insisting otherwise. There are already quite a few deleted comments on this answer from people doing that...
    – Xirema
    Commented Dec 22, 2022 at 17:09
  • @JonathanReez Which is why we don't want to introduce a tool that will make the problem exponentially worse.
    – Xirema
    Commented Dec 22, 2022 at 17:10
  • 2
    @JonathanReez Humans think about things. Transformers do not think about things. The precise mechanics of human learning aren't relevant, because this distinction is enough to explain a lot of the difference between human output and ChatGPT output. (Ask ChatGPT not to plagiarise, and it'll tell you it's not plagiarising while plagiarising just as much.)
    – wizzwizz4
    Commented Dec 29, 2022 at 20:02
  • 1
    @JonathanReez IQ tests don't test critical thinking. If even Nobel prize winners, who have made novel contributions to the body of human knowledge, can fail to apply critical-thinking, how would "capacity to solve IQ puzzles" be indicative? (But I suppose I'm just making your point again. As a bonus, I haven't proof-read this comment, so I betcha I've made it a third time, too.) Regardless, experts tend to think about their subject matter, and those are the people who are answering questions on Stack Exchange.
    – wizzwizz4
    Commented Dec 29, 2022 at 22:40
31

I fully support a network-wide AI ban. A permanent one.

For now, the most prominent issue with AI answers is that they are mostly incorrect. Because they read as well-written, elaborate answers, they don't resemble the usual low-quality answers, which are otherwise poorly written or contain nothing but code snippets (on Stack Overflow).

Because of that, they are harder to moderate: common flags like "low quality" and "not an answer" don't apply. Yes, incorrect answers can be downvoted, but that requires some expertise in the subject matter.

A user who posts plenty of such answers can also avoid an answer ban by collecting upvotes from other users deceived by the initial good impression.

Overall, incorrect AI-generated answers are evidently harmful.

Why a permanent ban?

If we take a look beyond incorrect answers, it will become clearer why a permanent ban is necessary.

Let's say we allow users to post AI-generated answers that they have checked for correctness (ignoring for the moment other potential issues mentioned in other answers).

The user is an expert and can verify the answer's correctness

If the user has the expertise to verify the correctness of an answer, then they also have the expertise to write one in full. Grammatical errors were never an obstacle when answer content was otherwise good, and there are plenty of users who edit such answers, so improved language is no advantage in this scenario. There are also other tools that can correct language errors.

Because users who have the expertise don't need AI, even if we allowed such answers to be posted there would be an extremely small number of them, as verifying the AI-generated content would probably take more time than writing the answer from scratch.

We would not lose any good content posted and verified by experts in such cases.

AI becomes so good it gives correct answers

This will probably never come true, but let's pretend that AI will generate correct answers, if not always, then most of the time.

If that were true, people could ask the AI directly instead of posting the question on Stack Exchange, just as people are currently expected to read the documentation and use Google or other search engines before simply asking questions.

We don't need to allow AI-generated answers if AI can solve the problem directly.


Another reason we need to permanently ban AI is that AI didn't magically appear: it is trained on human-generated content. Stack Exchange strives for quality, and to sustain that we need to preserve human-generated content. If we allow AI answers, no matter how good they are, we will start losing experts willing to share their knowledge.

If you can post in a few seconds whatever the AI gave you, why would you bother adding original content? Over time, fewer and fewer people will be motivated to participate, and in the end Stack Exchange will become a collection of AI-generated posts that nobody reads anyway, because it would be much faster to ask the AI directly.

And when you can effortlessly get virtual Internet points, they will lose value fast.

7
  • 5
    "If the user has the expertise to verify the correctness of an answer, then they also have the expertise to fully write one." While that is true, the user might don't want to take time to write it fully, so leave drafting to AI. After that they can take some time to brush up the draft, then answer it - Does this should also be banned? And if this happened how could you find out that it was actually made by AI?
    – Skye-AT
    Commented Dec 8, 2022 at 3:00
  • 1
    @Skye-AT Yes, using AI should be banned in all cases. At some point, when an expert uses AI as help, it may be impossible to tell whether the answer is AI or not. And we don't need to push and enforce the policy on anything even remotely suspicious. But the policy needs to be clear, without exceptions. Why? Because when you say it is admissible to use AI as long as you verify it, it is not just about the code; it is also about the explanation. ChatGPT is a great BS writer; in other words, it produces plausible explanations that only experts can verify, and sometimes even that can be hard. Commented Dec 8, 2022 at 8:35
  • 2
    We cannot allow people to post wrong answers just because they verified the code. Also, even experts can fail to verify an area they don't know well, so this too poses a danger. Yes, there are human-written answers that are both incorrect and well written, but AI will only increase their number. Commented Dec 8, 2022 at 8:38
  • 1
    You give people an inch and they will take a mile. We need clear AI ban. Commented Dec 8, 2022 at 8:40
  • "when you can effortlessly get virtual Internet points, they will lose value fast." Does rep have any value? Commented Dec 12, 2022 at 16:17
  • 2
    @Franck Dernoncourt: Yes, it does. There is a market for it, driven directly or indirectly by real monies. Much of it is hidden; it would be nice to know more about it. There is some evidence, but more is needed. E.g., it is suspected that HR drones use it as a hard filter on job applications, presumably driving some of the plagiarism on Stack Overflow. Commented Dec 12, 2022 at 16:27
  • And also driving the plagiarism on, e.g. Medium, presumably because an online presence, e.g. in the form of a blog, is required or presumed to be required to get a (good) job. Commented Dec 12, 2022 at 16:33
24

I asked this question directly to ChatGPT (the text of the OP's question):

Use of ChatGPT generated text for posts on Stack Overflow is temporarily banned. However, the reasons for the ban really apply to much or all of the network, and certainly for sites that are similar in nature to Stack Overflow.

I suggest (temporarily) banning ChatGPT network-wide.

It turns out that ChatGPT can give correct (or correct-looking) answers even for subjects of sites that are not at all like Stack Overflow. I asked an unanswered question from Judaism.SE and got a fairly good answer.

Its answer was:

This is a problem because it can mislead people who are looking for help, and it can also give an unfair advantage to ChatGPT users who are also answerers on Stack Overflow.

So, even ChatGPT itself agrees with the ban. Ban away!

1
14

Let's make this broader.

Ban any form of AI-based content creation from the network.

This would cover both ChatGPT and any AI art-based creations.

I don't think discussion about them, in the legal or ethical sense, should be excised entirely from the network; there are plenty of places in which experts could weigh in and help the larger Internet community navigate this whole thing.

But someone sharing their AI art here? Someone using a bot to answer a question? Get that crap off the network.


Related: I follow a ton of artists on Twitter (literally the only reason I haven't left the platform yet), and I can see that they're worried and frustrated that AI art is stealing their style and that people are actively getting paid for it. While I believe that the future of AI could be bright, the fact that these models are trained on data without consent is something I morally object to, and I have held that opinion since it came onto my radar.

8
  • Would a network-wide ban include chat? Maybe I can't be too critical of it because I've posted AI images to chat, but chat isn't supposed to be serious academic prose like much of the network.
    – Laurel
    Commented Dec 12, 2022 at 18:07
  • 3
    @Laurel I can't see any reason to ban AI-generated content in chat. The MSE main room, the Tavern, has already had a lot of AI-generated images for several months, and everyone is fine with it. The same can be done with actual messages, within the CoC and ToS of course. Commented Dec 12, 2022 at 18:38
  • @Laurel: I'd defer to those who are more expert on that. If Chat's content is meant to be covered by copyright that could get complex. If it's an objective conversation about it then I don't think I'd care too much, personally.
    – Makoto
    Commented Dec 12, 2022 at 20:13
  • 1
    While I basically agree, there are a lot of exceptions. Besides for Shadow's, there are also e.g. pfps.
    – Adám
    Commented Dec 12, 2022 at 20:33
  • 1
    @Adám: If you're saying that someone should be allowed to use an AI-generated piece of art as their avatar or in their profile, I'm going to firmly disagree with that. No reason to make carve-outs there, after all.
    – Makoto
    Commented Dec 12, 2022 at 20:37
  • 1
    @Makoto It'll be their responsibility to deal with copyright (but that's already the case with pfps). I should be allowed to use "AI" to generate a pfp e.g. based on all the photos and images I created myself.
    – Adám
    Commented Dec 12, 2022 at 20:40
  • @Adám: It's been a minute since I've had to deal with this, but it's not the individual that receives a DMCA complaint, it's the site. They'll delegate all of the appeal process down so that yes, if you own the copyright on all the things then you can deal with it then, but it's not the case that these DMCA complaints are targeted so specifically to the individual.
    – Makoto
    Commented Dec 12, 2022 at 20:47
  • pfps = profile pictures Commented Dec 13, 2022 at 5:52
14

I agree with this proposal. We are operating under a similar policy on the Politics Stack Exchange site. While moderating and removing answers posted using this tool, I've noticed that the ChatGPT tool is particularly good at generating content which seems eminently plausible to someone without knowledge of the topic, which in some cases has led to arguments developing in comment threads. In one case, the poster of the AI-generated content appeared to be feeding the critical comments back into ChatGPT and posting its replies.

Furthermore, if AI-generated answers were allowed, this would jeopardise the Stack Exchange reputation system, which is meant to be a decent measure of expertise and trust that the community places in a user. Users obtaining reputation through posting AI-generated content would not, in my opinion, have demonstrated that they are worthy of that trust or measure of expertise.

4
  • 1
    If high-rep users "have demonstrated that they are worthy of that trust or measure of expertise", why not trust them to only copy ChatGPT when it's correct? Commented Dec 16, 2022 at 17:15
  • 1
    @FranckDernoncourt that sounds so... soulless. As a high-rep user, don't you take some pride in the humanity of your style and thought process?
    – starball
    Commented Dec 16, 2022 at 18:06
  • 2
    @starball no pride, just knowledge exchange or improving one's reasoning ability. Commented Dec 16, 2022 at 18:07
  • 1
    Politics is primarily about persuasion and "oughts", not facts and "ises" - using the term "plausible" muddies the waters. If someone can be tricked into debating one's own political position with an overgrown language model that performs no critical thinking, that should be taken as a sign that the human in that exchange has also not met a minimum standard of critical thinking. Commented Mar 27, 2023 at 18:50
13

I'm surprised it hasn't been mentioned that Quora has a bot that asks questions that humans then get suckered into answering. It's awful. I've abandoned Quora (turned off all email notifications) in response, and I think most people who learn about this are similarly turned off.

We should explicitly ban bot-generated answers AND questions, across SE, forthwith.

5
13

Sadly, this got another major hit, probably the final blow: What is the network policy regarding AI Generated content?

In short, moderators are no longer allowed to suspend users when they suspect the posts are generated by ChatGPT, or other AI software, not even when using any of the existing detectors.

This can have only one result: ChatGPT answers are going to flood all the sites, and there's nothing to stop it anymore. This is a historical moment, the beginning of the end of Stack Exchange, which lost direction utterly and completely.

4
  • @Laurel you did not fix grammar; you made a major change in my answer. Are moderators still allowed to delete answers, and just not suspend the users? If so, you should clarify that; if not, you should roll back the change. Commented May 31, 2023 at 9:46
  • You can revert if you want.
    – Laurel
    Commented May 31, 2023 at 9:48
  • 2
    @Laurel but why did you change it in the first place? If it was correct, why did you remove it? Commented May 31, 2023 at 9:49
  • 1
    We can delete "low quality" answers. That's pretty standard procedure on a lot of sites, going back before AI.
    – Laurel
    Commented May 31, 2023 at 9:55
8

I like this, but I'm going to split hairs and take the stance that this shouldn't really be "new" policy; it should instead just be messaging that reinforces existing policy on this new kind of "source" for content (for lack of a better term).

Plagiarism, simply put, is posting content that is not your own as your own. This holds regardless of whether a given work's license (or the AI's terms of use, in this case) allows or disallows the practice: passing off anything that is not yours as yours is plagiarism, and is not allowed.

Because our network includes many different sites that work in unique ways, I think each individual site should be free to come up with its own decision regarding whether it officially allows or bars ChatGPT as a "source" for answers. But no network site allows plagiarized content, which is what wholesale copies of AI-generated responses are.

If this messaging would help deter users, even in some minor way, from flooding sites with AI-generated answers posted as their own, or would give moderators across the network more confidence in dealing with such content, then that sounds like a completely worthwhile message to send.

3
  • 4
    "But no network site allows plagiarized content, which is what wholesale copies of AI-generated responses are." this is...debatable. It's a tool - you give it input, it gives you output. The rights to both are given to you. But even if that wasn't specified, it's no more plagiarism than using other tools. Since tools can't own content. The TOS of the tool does state that you should not be presenting the content as your own but that's hardly enforceable by non-affiliates. But mostly, the whole legality doesn't matter that much - the output tends to be garbage. Should be the driving factor.
    – VLAZ
    Commented Dec 5, 2022 at 18:26
  • 2
    @VLAZ Firstly: I'm no expert here, so I genuinely don't know if the way I understand all this is the best way, or whether it will hold up in light of whatever new AIs come out tomorrow and what they can do. But, especially after reading Machavity's answer, I lean towards his idea that we can't treat these answers as "tool generated"; we have to consider them as plagiarized, precisely because they include zero actual input from the author. That's the key distinction. Either a post is from the author to some degree or it's not. These aren't.
    – zcoop98
    Commented Dec 5, 2022 at 23:35
  • 5
    I'd also make the argument that using AI as a "tool" means combining its response with user input, to come up with an end-result post that might incorporate ideas or pieces from the generated response but which is still a user-driven presentation of ideas. Obviously that line is extremely blurry (e.g., how much user-generated content is needed for the result to be "user-driven"?), but wholesale copies of AI responses are well outside that discussion in my view, because the amount of user input they include is exactly zero percent, which makes them de facto plagiarism.
    – zcoop98
    Commented Dec 5, 2022 at 23:40
4

I believe a permanent ban on AI-generated content is necessary to preserve the quality of answers on Stack Exchange.

Perhaps SE should go further. The GPT neural net (NN) was trained on data scraped from many sources, including Stack Overflow and related sites. As others have noted, this happened without explicit permission, attribution, or compensation. I didn't consent for them to use everything I ever posted on the internet for this purpose. Perhaps SE should create an opt-in model so that human creators need to give explicit permission to use their contributions for NN training. The default legal position then becomes, "No. You can't use my creativity to train a proprietary, competing tool". Thoughts?

7
  • 1
    I wonder if it's possible to legally block AI training from using CC BY SA licensed content (which includes the entirety of SE). As I understand it, if they attribute the source, they can use the content. If you disagree, the only way to find out who's right is probably to sue. (Do they attribute their sources? Not that I've found, but if that's the only legal qualm then they could continue what they're doing if they properly attributed their sources.)
    – Laurel
    Commented Dec 13, 2022 at 16:13
  • 1
    @Laurel question is how they scrape the training data, in what tools, etc. Commented Dec 13, 2022 at 16:19
  • @Laurel law.stackexchange.com/q/11183/31 Commented Dec 16, 2022 at 17:20
  • @FranckDernoncourt's link (somewhat) addresses the attribution side of things. However, that's not the core of the issue. Stack Exchange posts are licensed under CC BY SA, which cannot be revoked. You can tell someone they can't use your stuff, but it's an empty threat legally if you've licensed it under a license that allows that use. Or was this suggesting a new license for future contributions? Are there any licenses out there like that or is it unenforceable?
    – Laurel
    Commented Dec 16, 2022 at 17:56
  • @Laurel "In the case of a share-alike-licensed dataset, must all models trained on it be redistributed under the same or similar license?" Commented Dec 16, 2022 at 17:57
  • @FranckDernoncourt And if they distribute it under a compatible license?
    – Laurel
    Commented Dec 16, 2022 at 18:02
  • @Laurel I think it's ok, unless perhaps if the license excludes derivative work. Commented Dec 16, 2022 at 18:04
3

I'm not advocating for or against bans here -- I agree overall that there are valid reasons to ban ChatGPT answers, especially copy/paste/verbatim. But I do think some of the answers here are (and this may not be the best word), "overreacting" to certain aspects of the issue:

  • Position: All ChatGPT content should be banned.

    Does this include grammar checks done by ChatGPT, verified by the editor, and modified as needed by the editor? I've found ChatGPT to be a good tool (when it's up) for quickly suggesting better grammar in several posts that needed improvements.

    • Example 1: Before and after. IIRC, I'm fairly certain I did change one word from the ChatGPT suggestion -- I felt "the issue persists" was better than "this issue persists".

    • Example 2: Before and after: I definitely made some small tweaks to the ChatGPT suggestion to make it slightly more concise, but I can't recall exactly what they were.

    I do a lot of grammar/clarity edits (including, frequently, on my own mistakes). While I certainly could have cleaned up these two posts on my own, ChatGPT saved a lot of effort in these cases, and I believe the posts (and site) are better for it. In no way did ChatGPT suggest an answer here, or rely on any training other than proper English grammar.

    Should this use be banned as well?

    Counterpoint: ChatGPT can still make grammar mistakes that change the meaning of the post. If the editor doesn't carefully review the changes to assure consistent meaning, then there's a risk of a bad (and disallowed) edit.

    Counterpoint-to-counterpoint: And yet, honest mistakes in editing happen even with human intervention (perhaps even more so).

    I'm not saying I have the answer here, just raising the question of whether (or not) there are acceptable and positive uses of ChatGPT.

  • Position: Attribution issues are a nightmare, since ChatGPT can't tell us where it learned something.

    I struggle with this one as well. Sure, this is a problem, but isn't it a problem for many posts by humans as well? Unless we came up with an answer purely by finding a website with the answer, we typically don't remember where we found the information originally.

    Take the Stack Overflow answer to How to permanently export a variable in Linux?. At some point, the user who posted this had to learn how to do it themself, but it's unlikely they remember when or where. I certainly don't remember where I learned how to do that. Why would we expect an AI to be able to cite/attribute its source for where it learned something when most of us don't know ourselves?

    The problem, of course, is when there is only a single source of the information. Even so, if I read something in an article several weeks ago, and the topic comes up in a month in a Super User question, am I going to remember where I read it? I'll try to search the web to find it again so that I can attribute it, but most people wouldn't even go that far. By that point, it has become part of my personal "memory" and "training", regardless of whether the information came from a single source or from many.

1

Let me go a little deep in support of this question.

Firstly, Stack Exchange would be useless if ChatGPT were allowed network-wide. Remember that SE is a network of communities in which everyone seeks to learn from each other, not just the questioner. Even answerers learn from improvements suggested by other users. All of this would be pointless if AI could automatically answer every question with little error, and ChatGPT seems to have achieved that goal.

I hear you saying "But we've got Google!". Well, for starters, most of my programming answers began as Google searches, but those mostly led back to Stack Overflow and related pages.

Secondly, AI is still not perfect. So although the error rate is small, we're talking about a huge number of answers here, which means a small rate still translates to a (somewhat) large absolute amount.

Thirdly, most of the issues raised on SO about AI pretty much apply to the entire network.

That's all I've got to say.

4
  • 1
    "Stack Exchange would be useless if ChatGPT was allowed network-wide" why? The output is either satisfactory to the asker or it isn't, and in the case it isn't it can still ask the question.
    – Braiam
    Commented Dec 13, 2022 at 17:27
  • 3
    "This would be useless if AI could automatically answer all questions, with little error, and ChatGPT has seemed to achieve this goal." =.= have you been following the discussion on MSO?
    – starball
    Commented Dec 13, 2022 at 20:20
  • @Braiam Note that AI gets better nearly every week, and if that rate of improvement holds, then, if your comment is right, the percentage of questions asked here will drop very low Commented Dec 14, 2022 at 5:10
  • @starball I've been reading it, yes Commented Dec 14, 2022 at 5:11
-50

I disagree: we shouldn't blindly discriminate against AI. If ChatGPT provides a correct answer, then it shouldn't be deleted, provided it has the proper attributions. No need to waste human time if some AI can do it. However, if an SE user tends to post incorrect answers, then they should be banned.

Example of attribution that we could require users to add:

This answer was (partly) written by ChatGPT, but I have the expertise to confirm it is correct.

Notes:

  • The most upvoted answer is pretty much the same as this answer.
  • SO banned ChatGPT on the basis that it can generate a wrong answer, and Puzzling SE banned ChatGPT on the basis that it can generate a correct answer.

Follow-ups:

0
