
This question is concerned with finding a possible middle ground in the ongoing debate about banning and detecting AI-generated content, so it will probably get criticized from all sides. I'm prepared.

Nevertheless, I ask myself whether it would be possible, at the same time, to:

  • ban (identify and remove) AI-generated content that is copied from somewhere as-is, without any fact checking or improvement whatsoever, and
  • keep content that, among other sources, draws on AI-generated content, fact checks it, improves whatever can be improved, and potentially adds the author's own ideas, and
  • do both with a reasonably low error rate when differentiating between the two?

I think this is very difficult, because people who just copy & paste AI-generated content can always claim that they did the fact checking without really doing it. While some people would certainly fact check AI content, there are even more lazy people out there, so one cannot expect that only one of the two behaviors will occur at any given time.

It would be desirable because we decided to ban simply copy & pasted AI content on the grounds that the quality is too low, yet in many comments we nevertheless agreed that fact-checked and polished/drawn-upon AI-generated content is not a problem. A reliable way to differentiate the two would therefore be helpful.

On the other hand, the company recently made a U-turn on banning AI content, questioning the accuracy of AI-generated content detection altogether (leading to a moderator strike). If we are not able to reliably detect AI-generated content at all, there is also no hope of differentiating between the two usages. And even if we could, the second group, AI-generated but human fact-checked and edited content, likely sits somewhere between AI-generated and human-generated content in its characteristics, so one would expect it to be even harder to detect or differentiate.

So maybe this cannot be achieved, there is no middle ground, and you either have to ban all AI-generated content or none (meaning the strike will end with one side giving up). Or it is actually somehow possible, and that is the reason for this question.

I think that the largest differences are:

  • Human-checked and edited AI-generated content will contain a working solution, while AI-generated content alone will typically not work (otherwise it would be high quality and much less controversial)
  • Human-checked and edited AI-generated content might be more precise and to the point than AI-generated content alone (because humans can trim the content more effectively)

If we had to fact check each answer ourselves just to be sure it was already fact checked, this solution would be too inefficient (and typically not doable by moderators on, for example, Stack Overflow). However, maybe we could require that people describe how they made sure their answer actually works. Is this even possible? Can one credibly prove that one's own answer is more than something copy & pasted from some obscure source on the internet?

Maybe personal history can be taken into account (somebody with a long track record of well-received answers might be less likely to just copy & paste), to gain more confidence. Or perhaps the time between posting answers, or the number of saved drafts (even though these might not be reliable)?
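
To make the idea more concrete, here is a minimal sketch (in Python) of what such a confidence heuristic could look like. Everything in it is a hypothetical illustration: the data fields, the ten-minute threshold and the equal weighting are my own assumptions, not an existing site feature or API.

    # Minimal sketch of a confidence heuristic based on a user's history.
    # All field names and thresholds are hypothetical illustrations.

    from dataclasses import dataclass
    from datetime import timedelta
    from typing import List


    @dataclass
    class AnswerEvent:
        """One posted answer: when it was posted and how it was received."""
        seconds_since_previous_answer: float  # gap to the user's previous answer
        score: int                            # net votes the answer received


    def fact_check_confidence(history: List[AnswerEvent]) -> float:
        """Return a rough 0..1 confidence that the user fact checks their answers.

        Combines two weak signals mentioned above:
          * track record: share of previously well-received answers
          * posting tempo: very short gaps between answers suggest copy & paste
        """
        if not history:
            return 0.0  # no track record, no confidence either way

        well_received = sum(1 for e in history if e.score > 0) / len(history)
        unhurried = sum(
            1 for e in history
            if e.seconds_since_previous_answer > timedelta(minutes=10).total_seconds()
        ) / len(history)

        # Equal weighting is arbitrary; a real system would need calibration.
        return 0.5 * well_received + 0.5 * unhurried


    # Example: a user with mostly upvoted answers posted at a calm pace.
    events = [AnswerEvent(1800, 3), AnswerEvent(3600, 5), AnswerEvent(120, -1)]
    print(round(fact_check_confidence(events), 2))  # 0.67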

I think this question is interesting in a broader context too, because copy & pasted AI-generated answers are not the only low-quality answers. It might improve the quality of answers in general if we had a way to be more certain how thoroughly they were fact checked before posting.

For the sake of this question I would exclude issues with attribution and instead assume that all sources are always properly attributed. This question is just about the content.

I searched within Is there a list of ChatGPT or other AI-related discussions and policies for our sites? for similar discussions. The closest matches are still How can we determine whether an answer used ChatGPT?, ChatGPT assisted questions, and Is re-worded ChatGPT answer allowed?, which touch upon the broader subject of detectability.

  • 19
    This whole idea requires so many safeguards that it's doubtful there are any real upsides to allowing GPT answers. My position: users who can verify the correctness of an answer don't need GPT, and people who can't shouldn't answer that question in the first place. My position in full is in an answer to this very similar question.
    – markalex
    Commented Jun 16, 2023 at 13:15
  • 9
    How do you differentiate an original answer from a plagiarized answer? Or an answer that is original but incorrect? Or an answer that was correct at one point but is now outdated? The problem isn't distinguishing good-quality AI posts from not-good-quality AI posts; the problem is that it is difficult to effectively curate content under the current system when the volume of posts gets too large. Using AI to improve or help write a post is not inherently a problem, and I could do it in a way nearly impossible to detect.
    – ColleenV
    Commented Jun 16, 2023 at 14:09
  • 1
    @ColleenV "that was correct at one point but is now outdated" This question is only concerned with answers that are correct at the time of posting. People cannot look into the future. "Using AI to improve or help write a post is not inherently a problem and I could do it in a way nearly impossible to detect." Okay, then could you maybe write in an answer how you do that so that it is nearly impossible to detect? Then everyone should do the same but I don't believe it yet. Commented Jun 16, 2023 at 15:34
  • 13
    You've missed my point entirely. The current system doesn't scale. Stack Overflow content is full of bad posts. Questions that should be closed but aren't. Plagiarized answers. Incorrect answers. Outdated answers. If SO can't solve that problem, it can't solve the problem of bad quality AI posts either. The only reason anyone is talking about AI generated posts is because the popularity of it has caused the system that was held together with volunteer manual labor and optimism to finally be overwhelmed.
    – ColleenV
    Commented Jun 16, 2023 at 15:39
  • 1
    @NoDataDumpNoContribution I don't understand. An AI-assisted post doesn't look like it was written by AI because it wasn't; it was written by a human with AI help. If you don't want a post to be detected by a human as AI-generated, don't copy and paste AI generated text. There are no reliable automated ways to detect AI generated text.
    – ColleenV
    Commented Jun 16, 2023 at 16:51
  • 7
    The trouble with AI is it's still as dumb as a rock. I just asked Firefly for "an old black & white photograph of three Victorian gentlemen and a large-toothed fish they caught in a foggy swamp." I'm not sure it quite got the idea - i.sstatic.net/POJKF.jpg The difficulty is differentiating dumb users from dumb AI [LLM]. AI never checks its work. Humans are expected to, but if they write the 'English' bits out themselves then it becomes difficult, because unless you fully test it yourself, you can't see the big fish with teeth. It's still dead easy to spot the copy/pasters, though.
    – Tetsujin
    Commented Jun 16, 2023 at 16:54
  • 2
    @ColleenV "An AI-assisted post doesn't look like it was written by AI because it wasn't; it was written by a human with AI help." I see. For you it's a binary thing. A text is either written by a human or an AI, even if a human was inspired by an AI it's still human and both are completely distinct. I think of it rather as a continuum, there are purely human written texts, purely AI written texts, and texts that are collaborative between humans and AI and therefore everything in between exists too. To me there are more grey levels to you more black and white. Just different ways to look at it. Commented Jun 16, 2023 at 20:26
  • 2
    @ggorlen "Which comments? Who is "we"? " I will print and reference them tomorrow. They definitely exist. We means the community. Again and again I've seen comments that using GPT as inspiration and doing fact checking isn't a problem. Haven't you seen them too? We can discuss that too, but it would be a different question. This question explores these two usages of AI content. Commented Jun 16, 2023 at 20:35
  • 7
    @NoDataDumpNoContribution Detecting GPT is easy: brand new accounts spamming 25 answers in the span of an hour in totally random tags, all of which have perfect grammar, at least half of which are flat-out wrong, many of which don't even address the question being asked and are poorly formatted, and all of which sound exactly like ChatGPT. "It looks like you're trying to do X". This pattern never happened before LLMs came out; now it happens constantly. The poster can't even tell you much of anything about the answer when you ask for clarification, because they're just trying to farm rep.
    – ggorlen
    Commented Jun 16, 2023 at 22:15
  • 9
    I keep hearing virtually the same proposal talking about the value add of LLMs on Stack Overflow, but I've seen zero evidence of a positive contribution it's made. Even if it happened to answer something correctly, why not just go to the LLM and ask it directly? There's no point in turning SO into a dumping ground of stale LLM answers. Not being able to post LLM answers on SO doesn't stop you from using LLMs in any way, but posting LLM answers on SO makes it much harder to get human help, which is more necessary than ever.
    – ggorlen
    Commented Jun 16, 2023 at 22:28
  • 2
    @ggorlen See some examples of my experience meta.stackexchange.com/questions/387575/…. I could contribute hundreds of correct answers being assisted by ChatGPT4 in my field of study, but haven't done so quite simply because it currently is not allowed... Commented Jun 18, 2023 at 11:40
  • 2
    There was a post about using AI to fix grammar, where this usage was considered acceptable. This is precedent that human-AI coauthored posts are acceptable to some extent. (I'm fairly sure, going forward, I'm going to at least ask ChatGPT for feedback on my posts---something like AI peer review.) Commented Jun 18, 2023 at 12:46
  • 2
    @TomWenseleers Please re-read my post above. "Even if it happened to answer something correctly, why not just go to the LLM and ask it directly?" Is it because LLMs are so often wrong that getting a correct answer is luck? I'm talking about net impact here, which is clearly negative. Getting the right answer by chance 30% of the time on a good day doesn't cut it for me (and even if it did, just ask the LLM directly...). Anyone can cherry pick a good LLM answer, but that doesn't overcome the impact of thousands of garbage answers flooding the site and overwhelming the signal to noise ratio.
    – ggorlen
    Commented Jun 18, 2023 at 14:55
  • 3
    @RebeccaJ.Stones Using AI to help people ask questions was a disaster on SO
    – ggorlen
    Commented Jun 18, 2023 at 15:02
  • 3
    @TomWenseleers There's no getting stuck--nobody is telling you not to use LLMs, just don't pollute the training data and drown out the human input factor that got us this far. We're having the same discussion in two threads and I'm mostly repeating my points. Thanks for humoring me--we'll have to agree to disagree. Like I said in the other thread--you won. LLM answers will flood SO with the support of the company. I don't see them going back to the LLM ban, because it hurts their bottom line of maximizing "engagement". Please enjoy your victory and I'll see how wrong I am as time goes on.
    – ggorlen
    Commented Jun 18, 2023 at 19:38

3 Answers


This was in fact being achieved accurately prior to the policy change. The change was a panicked reaction, made in error.

  • 4
    Is this an answer? I asked how, in order to learn the details. You don't say anything about how that might have been accomplished before. Commented Jun 16, 2023 at 20:18
  • 4
    There is no objective list of criteria that can be used to directly identify AI content; it's something you do through reading and understanding what generated content looks like and, more importantly, what it does not. It is not a mathematical formula or a set of phrases that we can just regex for. Being unable to code a solution that detects it accurately doesn't mean our existing methods, a.k.a. using our brains, are ineffective at identifying these differences.
    – Kevin B
    Commented Jun 16, 2023 at 20:23
  • 2
    No, it doesn't mean that, although typically people try to find explanations, even approximate ones, because there must be some; we do not simply throw dice. In this case I think it's even more difficult because the amount of AI influence can vary. A mixture of AI and human content should be even more difficult to detect. But nevertheless you say that it is possible with high confidence through human learning, right? Maybe you have a theory of how that worked? How did people tell the difference between fully AI-generated and only AI-assisted content? Commented Jun 16, 2023 at 20:42
  • I don't think we've thus far tried to differentiate between the two, given our policy was a blanket ban on AI, not excluding AI assisted content. AI assisted content varies, as AI-assisted code would be much harder to differentiate from just wrong or correct code. AI "assisted" text on the other hand would be no more difficult to detect than what we've been detecting.
    – Kevin B
    Commented Jun 16, 2023 at 20:44
  • 4
    After all, AI assisted text is really just AI text. If you take an AI text response and modify it to look human, it's no longer AI assisted text. AI gave you an idea, and you formulated that idea in your own words.
    – Kevin B
    Commented Jun 16, 2023 at 20:55
  • 2
    "I don't think we've thus far tried to differentiate between the two" this probably means that the answer should rather be that we don't know because we never tried it before. "AI gave you an idea, and you formulated that idea in your own words. " But the idea is still the same? Is it only about the used words then? I thought these language models are especially good in emulating the words of humans. I thought the weakness of AI is that it cannot fact check, much less that it cannot emulate human speech well enough. Commented Jun 16, 2023 at 20:58
  • "I thought these language models are especially good in emulating the words of humans." that's certainly the claim, but in practice... not so much. It's great at being unpredictable, however that unpredictable nature leads to being identifiable if left unmodified. If you instead take an idea or solution presented to you by ai and provide it as an answer in your own words, that's no more detectable than someone doing the same from docs. It can certainly be problematic, in that it might still be presenting false information, but that's not what the majority of problematic cases are doing.
    – Kevin B
    Commented Jun 16, 2023 at 21:51
  • 4
    The goal of the old policy was to deal with the worst of the worst. If someone takes an idea presented to them by AI and presents it in their own words, that's not something that's going to be detected as easily as someone tabbing through a half dozen questions and copy/pasting an answer. That doesn't mean we shouldn't still take action against the users that are just tabbing through a half dozen questions and copy pasting an answer.
    – Kevin B
    Commented Jun 16, 2023 at 21:52

For programming-related questions I would say it is very easy to detect whether ChatGPT has been used responsibly & the answer is not just a quick copy and paste:

(1) it should feature reproducible code with output or a benchmark

Since ChatGPT can't yet run code, that would be a simple and easy check that the poster actually ran the code and verified the answer (a minimal illustration of such a snippet is sketched further below).

and it should

(2) feature links to trustworthy sources

E.g. Google Scholar articles etc., which would also signal that the user took some effort to do extra verification, as ChatGPT doesn't do that by default (unless the ScholarAI or the web browsing plugin is used, but that doesn't always work so well).

Simple checks like this would ensure the accuracy of programming-related answers partially based on ChatGPT4, GPT-based GitHub Copilot code, or the GPT-based Bing search engine (which many now have as a default). These are also the narrow conditions under which I would allow some usage of generative AI on SO.
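
Purely as an illustration of criterion (1), a hypothetical answer snippet could bundle a small, self-contained piece of code together with the output the poster actually observed when running it. The function and values below are made up for the example.

    # Hypothetical illustration of criterion (1): a self-contained, reproducible
    # snippet posted together with the output the answerer actually observed.

    def moving_average(values, window):
        """Return the simple moving average of `values` over `window` items."""
        if window <= 0:
            raise ValueError("window must be positive")
        return [
            sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)
        ]

    print(moving_average([1, 2, 3, 4, 5], 3))
    # Output observed when running the snippet:
    # [2.0, 3.0, 4.0]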

See also my answer here, where I shared some experiences with the use of ChatGPT4 for answering Stack Overflow questions & the responses I received (including questions for which I never received a good answer on SO, not even after 5 years, despite the question being highly ranked).

It would seem unavoidable that at some stage SO will have to transition to allowing responsible use of AI on its platform. Cf. most academic journals, which now allow the responsible use of generative AI, e.g. PNAS. The quality of ChatGPT4 has also become much better than that of the free tier, ChatGPT3.5; see e.g. here for some recent maths benchmarks. Most discussion on SO about ChatGPT is highly outdated and mainly concerns ChatGPT3. Of course, as in all guidelines on responsible use of AI, the use of ChatGPT should then be disclosed (in line with the referencing requirements), which is currently almost impossible, as doing that often results in the answer being deleted. Doing so would also enable other users to scrutinize those answers a little more if need be.


Here are my rules of thumb. Please note that in the end the final call should come from someone with experience reading a lot of GPT-generated content, who will recognize it. But these are generally giveaways:

  1. Is it factually wrong? This doesn't necessarily make it unchecked AI, but it's a bad answer regardless. Editing the answer should remove this.
  2. Does it contain a lot that is not useful? Often ChatGPT will give a general introduction to the topic at hand. Editing the answer should remove this.
  3. Can the user respond well to comments? Oftentimes, if the user does not know what they are writing about, they will not be able to respond accurately and usefully to any comments. And don't get me started on GPT-generated comments... those are obvious.
  4. Does it have some of ChatGPT's favorite starters? GPT loves "Here's an example of how you could use" and similar starters. Editing will generally remove those, as they aren't very useful.

Note that these signals don't by themselves prove whether a post is unedited GPT output or not. The best way to tell is to have a human who has seen a lot of it (like our moderators) make the call; a rough first-pass triage along these lines is sketched below, but it can only flag candidates for that human review.
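
Purely as an illustration, the mechanical part of points 2 and 4 could be approximated by a crude keyword filter like the sketch below (in Python). The phrase list and threshold are assumptions on my part; such a filter can only surface candidates for a human to review, not make the final call.

    # Rough first-pass triage sketch based on points 2 and 4 above.
    # The phrase list and threshold are hypothetical.

    import re

    # Boilerplate openers and filler often seen in unedited GPT output.
    SUSPECT_PHRASES = [
        r"here'?s an example of how you could use",
        r"as an ai language model",
        r"i hope this helps",
        r"in conclusion",
        r"it'?s important to note that",
    ]


    def looks_like_unedited_gpt(post_text: str, threshold: int = 2) -> bool:
        """Flag a post for human review if it contains several suspect phrases."""
        text = post_text.lower()
        hits = sum(1 for pattern in SUSPECT_PHRASES if re.search(pattern, text))
        return hits >= threshold


    example = (
        "Here's an example of how you could use the requests library. "
        "It's important to note that error handling is omitted. I hope this helps!"
    )
    print(looks_like_unedited_gpt(example))  # True: flag for a moderator to check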
