6

Update, Feb 2024

Stack Exchange has rolled out a new Help Center page network-wide that makes it clear that uncited AI-generated content is a violation of the Code of Conduct. Our copy is at /help/ai-policy.

A complete ban continues to be up to individual sites. Given the current votes on the two answers here (which I interpret as "cautiously allow" vs. "ban"), we'll allow AI content if it is properly cited and correct. Repeated posting of low-quality answers (including AI-generated ones that are incorrect, in particular when posted by an unwary user who hasn't fully vetted the output) will be handled as usual.


Original question, Dec 2022

The Stack Exchange network has been abuzz with talk of OpenAI's ChatGPT and its use to generate answers on the network. Similar but smaller-scale talk has happened in the past around other AI answer generation, and surely more will come.

Here at DS.SE we perhaps have a special relationship with ChatGPT, but I think the general concern holds: reasonable-sounding but incorrect answers from the tool are harder for the community to moderate and put stress on our resources. There are additional questions around ownership/licensing/copyright for the answers and the source data used to train the tool.

I have seen a few of these answers on this site. Some of them are superficially convincing, and on topics outside my specialty, so that I cannot say for certain whether they are correct.

How should DS.SE handle answers generated by ChatGPT?


4
  • Is there a current policy now? Commented May 20, 2023 at 23:03
  • Can we get a practical answer on what to do when we spot a case of bad LLM usage in a given answer? (Downvote? Flag, and under what justification?) What if the 'wrongness' is not so obvious? Can we get guidelines on what to do with repeat offenders? Commented Jun 10, 2023 at 14:38
  • @lcrmorin, this is actively being discussed, with a moderation strike having been instigated by (but not solely about) a policy from SE effectively saying that moderators cannot do anything except in extremely narrow circumstances. Do the usual, at least: downvote anything wrong, flag for the usual reasons, and flag repeat offenders of those issues. IMO, flag suspected genAI with a custom message, but know that we (diamond mods) are not currently allowed to act on those; the flags will be useful datapoints in our discussions.
    – Ben Reiniger Mod
    Commented Jun 17, 2023 at 17:11
  • To update (rather late) from my last comment: one of the strike outcomes was a collaboration between moderators and the company to produce and maintain a list of heuristics allowable for the identification of AI content: meta.stackexchange.com/q/391847/829685. All of the other action options from my last comment still stand, but uncited AI content can now also be flagged as such, and diamond mods will apply the heuristics in determining any actions to be taken; noting in the flag why you think it's AI-generated will be helpful for this.
    – Ben Reiniger Mod
    Commented Feb 26 at 17:00

2 Answers

5

ACL has recently published its take on the topic, which I find well thought through and sufficiently nuanced (in contrast to an undifferentiated strict ban, for example). While ACL is certainly a different platform from DSSE and hence faces different potential issues with regard to Large Language Models (LLMs), I still think these rules could mostly be adopted for DSSE:

  • Assistance purely with the language of the paper. When generative models are used for paraphrasing or polishing the author’s original content, rather than for suggesting new content - they are similar to tools like Grammarly, spell checkers, dictionary and synonym tools, which have all been perfectly acceptable for years. If the authors are not sufficiently fluent to notice when the generated output does not match their intended ideas, using such tools without further checking could yield worse results than simpler-but-more-accurate English. The use of tools that only assist with language, like Grammarly or spell checkers, does not need to be disclosed.
  • Short-form input assistance. Even though predictive keyboards or tools like smart compose in google docs are also powered by generative language models, nobody objected to them, since hardly anyone would try to use them to generate a long, unique and coherent text: it would simply not be practical. Similarly to language tools above, the use of such tools does not need to be disclosed in response to the writing assistance question.
  • Literature search. Generative text models may be used as search assistants, e.g. to identify relevant literature. However, we expect the authors to read and discuss such references, just like the references identified by a regular search engine or a semantic literature recommendation tool. The usual requirements for citation accuracy and thoroughness of literature reviews apply; beware of the possible biases in suggested citations.
  • Low-novelty text. Some authors may feel that describing widely known concepts is a waste of their time and can be automated. They should specify where such text was used, and convince the reviewers that the generation was checked to be accurate and is accompanied by relevant and appropriate citations (e.g., using block quotes for verbatim copying). If the generation copies text verbatim from existing work, the authors need to acknowledge all relevant citations: both the source of the text used and the source of the idea(s).
  • New ideas. In cases where the model suggested new research ideas that would deserve co-authorship or acknowledgement from a human colleague, and which you then developed yourself (e.g. topics to discuss, framing of the problem) - we would suggest acknowledging the use of the model, and checking whether there are known sources for any ideas that are not widely known, or that you would not have included but for the help of the model.
  • New ideas + new text: a contributor of both ideas and their execution seems to us like the definition of a co-author, which the models cannot be. While the norms around the use of generative AI in research are being established, we would discourage such use in ACL submissions. If you choose to go down this road, you are welcome to make the case to the reviewers that this should be allowed, and that the new content is in fact correct, coherent, original and does not have missing citations. Note that, as our colleagues at ICML point out, currently it is not even clear who should take the credit for the generated text: the developers of the model, the authors of the training data, or the user who generated it.

An argument for stricter rules on DSSE might be that most answers are low-novelty, since this is not a site for publishing novel research. However, I suggest we still treat text generated by LLMs such as ChatGPT like any other source, e.g. Wikipedia, and require users to quote it explicitly. The downside of this approach is that, unlike a reference to a website, the quoted material cannot be validated. To mitigate that problem we could require users to include the prompt they used (though I am unsure what level of compliance we can expect here).
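
For illustration, a quoted and attributed LLM passage in an answer might look roughly like the sketch below; the model name, prompt and quoted sentence are placeholders I made up, not a proposed official format:

```markdown
> Gradient boosting builds the model stage-wise: each new tree is fit to the
> residual errors of the ensemble trained so far.

*The quote above was generated with ChatGPT (prompt: "Explain gradient
boosting in one sentence"). I have checked it against the scikit-learn user
guide and it matches my own understanding.*
```

The exact formatting matters less than the three ingredients: an explicit quote, the tool used, and the prompt.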

Besides answers in natural language, the rules also need to cover code. ACL states the following with regard to code:

A separate, but related issue is use of generative models for writing code. ACL submissions may be accompanied by code, which counts as supplementary materials that the reviewers are not obliged to check and consider, but they may do so if they wish. The use of code assistants such as Copilot is also a relatively new practice, and the norms around that are not fully established. For now, we ask the authors to acknowledge the use of such systems and the scope thereof, e.g. in the README files accompanying the code attachments or repositories. We also ask the authors to check for potential plagiarism. Note that the Copilot in particular is currently the subject of a piracy lawsuit, and may have suggested snippets of code with licenses incompatible with yours. The use of code assistance does not obviate the requirements of authors to ensure the correctness of their methods and results.

However, since purely coding-related questions are out of scope for DSSE anyway, I do not think we need strict restrictions with regard to code, beyond requiring a note when the code was generated entirely by an LLM.
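
For what it's worth, a minimal sketch of such a disclosure in an answer could look like this; the tool, library version and snippet are invented placeholders:

```markdown
The snippet below was generated with GitHub Copilot; I ran it against
scikit-learn 1.3 and checked the result on my own data.

    from sklearn.linear_model import LogisticRegression

    # X_train, y_train, X_test, y_test are assumed to be prepared already
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(model.score(X_test, y_test))
```

This keeps the disclosure attached to the code itself, so later readers see it even if they skip the surrounding prose.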

Generally, and on the softer side of the argument, I think that as a site on which LLMs are themselves on-topic, we need to be open to the innovation our field brings to the table. And if these rules lead to problems, they can be fixed by re-evaluating whatever is decided now (e.g. in 3 to 6 months).

4

My humble opinion: on DSSE we already struggle to have enough people checking other users' answers and voting. This is partly because the topics on DSSE are so diverse that people often don't feel comfortable voting, in particular when they are not sure whether an answer is correct.

Generated answers would obviously worsen this issue, because there would be even more answers that we're not sure about. And sometimes nobody would even see them, but Google would index them and might lead somebody to them later. So in the end this would mostly decrease the overall quality of the answers found on the site.

Needless to say, the generated answers I've seen so far are mediocre at best: they look vaguely meaningful, but they are very shallow and don't properly address the question. Another disadvantage is that quite often on DSSE the OP needs a follow-up or clarification, and that is going to be difficult if even the user who generated the answer doesn't understand it.

Basically I think we should follow the ban.

