ACL has recently published their take on the topic, which I find well thought through and sufficiently nuanced (in contrast to an undifferentiated strict ban, for example). While ACL is certainly a different platform from DSSE and hence faces different potential issues with regard to Large Language Models (LLMs), I still think these rules could mostly be adopted for DSSE:
- Assistance purely with the language of the paper. When generative models are used for paraphrasing or polishing the author's original content, rather than for suggesting new content, they are similar to tools like Grammarly, spell checkers, and dictionary and synonym tools, which have all been perfectly acceptable for years. If the authors are not sufficiently fluent to notice when the generated output does not match their intended ideas, using such tools without further checking could yield worse results than simpler-but-more-accurate English. The use of tools that only assist with language, like Grammarly or spell checkers, does not need to be disclosed.
- Short-form input assistance. Even though predictive keyboards or tools like Smart Compose in Google Docs are also powered by generative language models, nobody objected to them, since hardly anyone would try to use them to generate a long, unique, and coherent text: it would simply not be practical. Similarly to the language tools above, the use of such tools does not need to be disclosed in response to the writing assistance question.
- Literature search. Generative text models may be used as search assistants, e.g. to identify relevant literature. However, we expect the authors to read and discuss such references, just like the references identified by a regular search engine or a semantic literature recommendation tool. The usual requirements for citation accuracy and thoroughness of literature reviews apply; beware of the possible biases in suggested citations.
- Low-novelty text. Some authors may feel that describing widely known concepts is a waste of their time and can be automated. They should specify where such text was used, and convince the reviewers that the generation was checked to be accurate and is accompanied by relevant and appropriate citations (e.g., using block quotes for verbatim copying). If the generation copies text verbatim from existing work, the authors need to acknowledge all relevant citations: both the source of the text used and the source of the idea(s).
- New ideas. In cases where the model suggested new research ideas that would deserve co-authorship or acknowledgement from a human colleague, and which you then developed yourself (e.g. topics to discuss, framing of the problem), we would suggest acknowledging the use of the model, and checking whether there are known sources for any ideas that are not widely known, or that you would not have included but for the help of the model.
- New ideas + new text: a contributor of both ideas and their execution seems to us like the definition of a co-author, which the models cannot be. While the norms around the use of generative AI in research are being established, we would discourage such use in ACL submissions. If you choose to go down this road, you are welcome to make the case to the reviewers that this should be allowed, and that the new content is in fact correct, coherent, original and does not have missing citations. Note that, as our colleagues at ICML point out, currently it is not even clear who should take the credit for the generated text: the developers of the model, the authors of the training data, or the user who generated it.
An argument for stricter rules on DSSE might be that most answers are low-novelty, since this is not a site to publish novel research. However, I suggest we still treat text generated by LLMs, such as ChatGPT, like any other source (e.g. Wikipedia) and require users to quote it explicitly. The downside of this approach is that, in contrast to sources like a website, the reference cannot be validated. To mitigate that problem, we could require users to include the prompt they used. (Though I am unsure what level of compliance we can expect here.)
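As a minimal sketch, assuming no particular format has been decided yet, such an attribution in an answer could look like this (the quoted text and prompt are purely illustrative):

```markdown
According to ChatGPT:

> Overfitting occurs when a model learns noise in the training data
> instead of the underlying pattern, so it performs well on the
> training set but poorly on unseen data.

Prompt: "Explain overfitting in one sentence."
```

This keeps the generated text clearly separated from the user's own words and lets readers judge how much of the answer depends on the model.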
Besides answers in natural language, the rules also need to cover code. ACL states the following with regard to code:
A separate, but related issue is the use of generative models for writing code. ACL submissions may be accompanied by code, which counts as supplementary material that the reviewers are not obliged to check and consider, but they may do so if they wish. The use of code assistants such as Copilot is also a relatively new practice, and the norms around it are not fully established. For now, we ask the authors to acknowledge the use of such systems and the scope thereof, e.g. in the README files accompanying the code attachments or repositories. We also ask the authors to check for potential plagiarism. Note that Copilot in particular is currently the subject of a piracy lawsuit and may have suggested snippets of code with licenses incompatible with yours. The use of code assistance does not obviate the requirement that authors ensure the correctness of their methods and results.
However, since purely coding-related questions are out of scope for DSSE anyway, I do not think we need strict restrictions on code, except for requiring a note when the code was generated entirely by an LLM.
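One possible form of such a note, again as a sketch rather than a fixed format (the code and wording are hypothetical):

```markdown
**Disclosure:** the code below was generated entirely by ChatGPT and tested by me.

    # Min-max normalize a pandas column
    df["x_norm"] = (df["x"] - df["x"].min()) / (df["x"].max() - df["x"].min())
```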
Generally, and on the softer side of arguments, I think that as a site on which LLMs are on-topic, we need to be open to the innovation our field brings to the table. And if these rules lead to problems, they could be fixed by a re-evaluation of whatever is decided now (e.g. in 3 to 6 months).