
One of the main criticisms of ChatGPT is that it frequently gives incorrect answers that nonetheless appear professionally written, and therefore correct. As a result of this, and of a sudden inundation of low-quality AI-generated answers, Stack Overflow has banned its use. Other Stack Exchange sites may follow.

How good should a generative question-answering system or a language model be to be allowed to write answers on a Stack Exchange site? (either with or without a human checking the correctness of the machine-generated answer)

  • 24
    Re "One of the main criticism against using ChatGPT is that sometimes it gives bad answer": Wrong. One of the main criticism is that most of the time, it gives bad answers.
    – 41686d6564
    Commented Dec 13, 2022 at 1:41
  • 2
    @41686d6564standsw.Palestine meta.stackexchange.com/q/384410/178179 Commented Dec 13, 2022 at 1:44
  • 8
    Also, counterpoint: suppose some AI reaches 99% accuracy (which is better than humans), still, why would the poster blindly copy and paste the answer without validating it first?
    – 41686d6564
    Commented Dec 13, 2022 at 1:45
  • 3
    @41686d6564standsw.Palestine validating by who? Commented Dec 13, 2022 at 1:47
  • 8
    @41686d6564standsw.Palestine the accuracy of human answers on Stackoverflow is definitely lower than 99% :-) Commented Dec 13, 2022 at 1:55
  • 2
    I guess they need an approved list of calculators: bbc.com/news/education-27391683 satsuite.collegeboard.org/sat/what-to-bring-do/… apstudents.collegeboard.org/exam-policies-guidelines/… --- Should math.SE and mathoverflow.SE allow the use of calculators? --- Should writing.SE allow computer spell check? --- Should astronomy.SE allow telescopes, or should observations be made with the naked eye?
    – Rob
    Commented Dec 13, 2022 at 2:04
  • Isn't generative usually taken in the Chomsky sense of the term?
    – bad_coder
    Commented Dec 13, 2022 at 2:16
  • 2
    @bad_coder I use it in the NLP sense (writing answers from scratch instead of extracting them from some passage). Commented Dec 13, 2022 at 2:22
  • 1
    It will be good enough when AI has a soul. Or, in my humble opinion, never. Commented Dec 13, 2022 at 7:56
  • 6
    "validating by who?" by the poster. It asks "why would the poster blindly copy and paste the answer without validating it first". There is no change of the subject until it reaches "validating". Moreover, the main problem which has repeatedly been cited that blatantly incorrect content was posted where the answerer either knowingly or completely without care copy/pasted the generated content. Often in rapid succession, as well, just to nail down that their motive is not to provide good content but just any content. The context around this discussion seems to be quite clear.
    – VLAZ
    Commented Dec 13, 2022 at 7:57
  • 3
    @VLAZ I see, then "why would the poster blindly copy and paste the answer without validating it first" is answered by your comment. Commented Dec 13, 2022 at 10:10
  • But not by your question, where you put forward the premise that there should be some metric allowing such blind copy/pasting.
    – VLAZ
    Commented Dec 13, 2022 at 10:40
  • 3
    @VLAZ Not necessarily blind. Good point, I've edited the question to clarify it. SO banned ChatGPT regardless of human verification. Commented Dec 13, 2022 at 10:46
  • 3
    I don't understand why this was closed. Why not discuss this topic? Commented Apr 13, 2023 at 2:49
  • 2
    @RebeccaJ.Stones Same, no idea. Good timing to reopen given today's release of Hugging Face Introduces StackLLaMA: A 7B Parameter Language Model Based on LLaMA and Trained on Data from Stack Exchange Using RLHF, and the quality of GPT-4. Commented Apr 13, 2023 at 2:50

3 Answers

24

I would argue that the potential quality of the answers generated automatically (using ChatGPT-like services) is irrelevant.

  • If such answers are mostly¹ bad -> they should not be allowed.
  • If the answers are generally¹ good -> then the ChatGPT answer could have been obtained by the user themselves, much as with a search engine; a user could have asked ChatGPT directly instead of asking on Stack Exchange. Nobody would find a "screenshot" of Google search results posted as an answer useful.

Therefore, I don't see a place for fully automatically-generated answers on Stack Exchange at all. Thus, however accurate those services become, I don't see the value in allowing them on Stack Exchange² as answers³.


(part of this answer is rewritten from my answer on Academia SE Meta)

¹ mostly/generally are very subjective, and determining the exact number might be hard; however, for this answer it is irrelevant.

² I can easily see a service, different from Stack Exchange, that is very valuable to users by providing automatically generated answers, potentially curated by human users. It is just a different service and business.

³ Stack Exchange can probably use ChatGPT-like services in "Ask Wizard" or other mechanisms. But not as an answer itself.

  • 2
    Shouldn't we then require users to check ChatGPT today before posting a question? It's already very good at coding questions, so it seems reasonable to require users to try the ChatGPT solution before going on Stack Overflow. Commented Dec 13, 2022 at 4:17
  • 4
    @JonathanReez do we already require users to check Google? I don't see how it is different in that regard. Commented Dec 13, 2022 at 4:23
  • 2
    My current interpretation of the rules is that no, we don't require them to check Google. We only disallow questions that have already been asked on SE itself, even if they were answered on other sites. But then... why would ChatGPT be any different? Why would we mandate users to check ChatGPT (or a future version of it) if we don't mandate it for Google? Commented Dec 13, 2022 at 4:25
  • 1
    @JonathanReez I don't see why we should mandate users to check ChatGPT as well. Commented Dec 13, 2022 at 4:25
  • 2
    But... that's literally what your answer states? thus, a user could have asked ChatGPT the question directly without asking it on Stack Exchange. Nobody would find posting a "screenshot" of Google search results as an answer useful. Commented Dec 13, 2022 at 4:26
  • 5
    So, either the question shouldn't have been asked at all because it's trivially answered by ChatGPT+, or it's a valid question and any valid answer should be acceptable, including an autogenerated one. I struggle to understand why it would matter whether or not the correct answer is human-generated. Commented Dec 13, 2022 at 4:27
  • 1
    Nope. Not at all. My answer states that neither Google search screenshots nor ChatGPT-generated answers have a place on SE. I am not saying anything about mandating users to check Google or ChatGPT. They could. They can also ask on Quora/Habrahabr/Reddit/you-name-it. Commented Dec 13, 2022 at 4:27
  • 6
    Ok, then I remain confused as to why we should care whether a valid/correct answer is human-generated or machine-generated. We don't accept screenshots from Google but in practice we do accept quotes found in Google with a reference to the source. Commented Dec 13, 2022 at 4:31
  • 1
    We may not enforce that users search before asking, but we do ask them to do so. The first point of the How to Ask page is "search and research". When you ask a question you are also presented with a generated list of similar questions on the site. So, we may not require you to search before asking a question, but we strongly recommend that you do so.
    – Braiam
    Commented Apr 14, 2023 at 15:36
  • 1
    Except you are forgetting that you still need to ask ChatGPT the right questions to get decent answers. Many don't know how to do that. It also doesn't yet run the code it produces or run benchmarks, which in an SO answer would be required. So there is still big scope for canonical GPT-derived and verified answers to be featured on SO. Commented Jun 17, 2023 at 17:53
  • 1
    @TomWenseleers I see that as a plus in some twisted way. If chatgpt can't understand your question, a human could have a problem with it. Something like "does this question have all the reasonable elements to be answered?"
    – Braiam
    Commented Jun 21, 2023 at 14:08
  • 1
    @Braiam I agree - I definitely ask any question first to ChatGPT4 now. Usually that already gets you some useful pointers or manages to answer your question entirely. It's rare now with ChatGPT4 not to get an answer at least as good as what you might get as a reply on SO. It usually rephrases your question in a clearer way as well. So it really helps to guide your thinking. Commented Jun 21, 2023 at 14:49
11

How good should a generative question-answering system or a language model be to be allowed to write answers on a Stack Exchange site?

Let's imagine it is perfect.

Why would anyone ask on Stack Overflow then? Why not ask the AI directly?

Technology changes fast; we already have problems with outdated answers. If AI could give us correct answers, why would we create a static collection of AI-generated answers that may not stay up to date?

Searching is always an issue, too. Why do people dump their questions on Stack Overflow without searching? Because searching takes time. If we allow AI-generated content, there will be even more of it, which makes searching even harder. And again, why would anyone search or ask on Stack Overflow if they can just ask the AI?


But most likely it will never be that good. And it does not matter, because it is simply not an appropriate tool for the Stack Exchange platform. We want human knowledge and insight. Human reasoning. That is something AI cannot give us.

  • "We want human knowledge and insight. That is something AI cannot give us." it can. But therein lies the problem. An AI can examine a large volume of human-generated content then summarise the insights they've expressed. Let's say humans agree that feature X is inconvenient because of A, B, and C. Therefore experienced people suggest feature Y which has less potential downsides. An AI can indeed relay the gist here. But only for things humans have expressed an opinion on. Let's say 10 years later such AI summaries are the norm. There are less humans sharing their experience. Less to sum up.
    – VLAZ
    Commented Dec 13, 2022 at 7:47
  • @VLAZ Yes, AI is trained on human data, and while there are situations where it can appropriately summarize that knowledge, there will always be situations where it cannot, because we are encountering a completely new problem that requires a new approach and solution. Maybe it can give some ideas and trigger people to go in a particular direction they would not otherwise consider, but that is all. Commented Dec 13, 2022 at 8:16
  • 1
    "Let's imagine it is perfect." -> that's the easiest case indeed :) "We want human knowledge and insight. That is something AI cannot give us." -> Why not? QA systems can be pretty good to query human knowledge in at least some domains or question types. Commented Dec 13, 2022 at 22:43
  • @FranckDernoncourt Because you want the human knowledge that is not already out there. You want the human ability to solve new problems, and there will always be such problems. Commented Dec 14, 2022 at 7:03
  • 1
    @ResistanceIsFutile so you mean human reasoning then? Commented Dec 14, 2022 at 7:05
  • @FranckDernoncourt Yes, I thought that "insight" covers it. Commented Dec 14, 2022 at 8:47
  • 2
    "Why not ask the AI directly?" what happens if I don't know that the AI (actually ML) exist? Or the AI is behind a paywall? Or the AI is overloaded? There's a time misalocation problem that SE asynchronous nature solves. Also, sometimes my queries are not as useful as someone that reads my question and compiles a better query. The AI might be perfect, but it can't be better than the inputs presented either.
    – Braiam
    Commented Apr 14, 2023 at 15:32
  • 1
    @ResistanceIsFutile Well, practically all the answers with R & Python & Matlab code that I look at have reproducible code... You still need to run the code to have a benchmark on your machine & be able to compare to other methods, which is often what's requested. And I would just allow ChatGPT or GitHub Copilot assisted answers for these sorts of cases... Commented Jun 17, 2023 at 18:33
  • 1
    @ResistanceIsFutile Which is why regular users, maybe dependent on their reputation in a particular field, should be given more powers to kick out the bad answers. Also, I'm not saying GPT should be allowed everywhere on the site. But for programming, given that >80% of all software developers now use it already, it's clear you won't stop it. And with a blanket ban, people would post GPT answers anyway & not disclose it, which would be worse... You should also be able to train much better AI bots to do a lot of the moderation; that would scale. Commented Jun 17, 2023 at 19:22
  • 1
    @ResistanceIsFutile ChatGPT4 is pretty good at rating SO answers actually. If I ask it to rate the answers here e.g. stats.stackexchange.com/questions/130661/… it gives correct comments about each answer & gives the given answers all a 6/10 except mine which gets a 9/10, which seems quite correct to me... Probably too expensive still to do this at scale, but just saying... Commented Jun 17, 2023 at 19:36
  • 1
    @ResistanceIsFutile I posted some examples here: meta.stackexchange.com/questions/387575/… Commented Jun 17, 2023 at 20:48
  • 1
    @ResistanceIsFutile I felt obliged to share my experiences with using ChatGPT4 & generative AI, which are totally at odds with everything I read here. But yes, I'll also leave it at that. That the SO community does with that input what they want. Most of the answers & replies of others on AI are totally outdated - they have no clue what they are talking about really... Commented Jun 18, 2023 at 10:17
  • 1
    @ResistanceIsFutile Weren't you on strike actually? Or you make an exception when it concerns bashing the potential of AI? :-) Commented Jun 18, 2023 at 10:24
  • 1
    @TomWenseleers I am on strike on main sites. Meta sites are different as they are place of discussion about how sites work or should work. Commented Jun 18, 2023 at 10:28
  • 1
    @ResistanceIsFutile Thanks - I see - for me it was the first time posting things here. But when it comes to discussing ChatGPT it all feels more like activism to me, more than a reasoned data-based discussion. I find posting here super frustrating. I think I'll retreat back to Stack Overflow & Cross Validated. Or ChatGPT4: always up for a chat & no votes to close questions there... Commented Jun 18, 2023 at 10:32
6

How good should a generative question-answering system or a language model be to be allowed to write answers on a Stack Exchange site?

Put simply, we assume a Q&A generation system would be training/learning from SO answers and other sources; hence there'd be no gain (theoretically?) in letting it feed its output back into its own learning data/corpus (or cross-post parts from outside SO). I don't see a threshold beyond which we could consider AI-generated content original.
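The feedback-loop concern above can be sketched with a toy example (the corpus and its `source` field are hypothetical, purely for illustration): if machine-generated answers are allowed onto the site, they must somehow be excluded from future training data, or the model ends up learning from its own output.

```python
# Toy illustration of the training-feedback-loop concern: before reusing a
# Q&A dump as training data, filter out answers flagged as machine-generated,
# otherwise the model trains on its own earlier output.
# The "source" field is a hypothetical label, not a real SE data-dump field.

corpus = [
    {"question": "How do I reverse a list?", "answer": "Use list.reverse().", "source": "human"},
    {"question": "How do I reverse a list?", "answer": "You can use reversed().", "source": "model"},
    {"question": "What does 'is' compare?", "answer": "Object identity.", "source": "human"},
]

def training_examples(corpus):
    """Keep only human-written answers as training data."""
    return [ex for ex in corpus if ex["source"] == "human"]

clean = training_examples(corpus)
print(len(clean))  # 2
```

Of course, the hard part in practice is that no reliable `source` label exists for undisclosed AI-generated posts, which is precisely why the feedback loop is difficult to break.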

Accuracy in that case would assume the AI is producing relevant answers that haven't already been posted by humans on the SE repository or elsewhere; it has yet to be demonstrated that ChatGPT does more than collate on-topic content and compose syntactically coherent sentences.

In conclusion, ChatGPT would make sense as a SaaS Q&A oracle, not as a participant in a human problem-solving site.

  • 2
    it has yet to be demonstrated that ChatGPT is doing more than collating on-topic content and composing syntactically coherent sentences => has this been demonstrated for humans? After all we have a similar neural network in our brains. Commented Dec 13, 2022 at 4:21
  • 2
    @JonathanReez of course it has, usually it's called creativity.
    – bad_coder
    Commented Dec 13, 2022 at 4:27
  • 2
    What test can conclusively prove the presence or lack of creativity in a given individual/computer system? Are you sure ChatGPT wouldn't pass said test better than a large percentage of humans? Commented Dec 13, 2022 at 4:30
  • 2
    ChatGPT generates plenty of coherent, reasonable text that's never existed before, yes. Your answer seems to be hung up on collating on-topic content and composing syntactically coherent sentences and AFAIK there's no reason to believe humans aren't doing the exact same thing internally. We're nothing but fancy auto-completion and prediction neural nets. Commented Dec 13, 2022 at 4:49
  • 1
    @JonathanReez your claims lack proof.
    – bad_coder
    Commented Dec 13, 2022 at 4:55
  • 2
    1. Could you add a reference for the idea that humans aren't collating on-topic content and composing syntactically coherent sentences internally? 2. What kind of proof would you like? ChatGPT is free and open for all to test. Commented Dec 13, 2022 at 4:57
  • 2
    Here you go => what percentage of software engineers can do better than that? Surely not 100%? Note that the problems are all original and never seen before by the AI. Commented Dec 13, 2022 at 5:02
  • 2
    @Jonathan ChatGPT generates plausible text, consistent with its training data & input, but it doesn't know what it's talking about, and it has no way of representing or evaluating the truth of its utterances. Sure, it can say true things, but it can also say complete nonsense, and it can't tell the difference. Eg, i.sstatic.net/0WsuA.png
    – PM 2Ring
    Commented Dec 13, 2022 at 10:08
  • 3
    @PM2Ring sure but a lot of humans likewise write text without really understanding what they’re talking about. That’s how you get poor StackOverflow answers. ChatGPT is estimated to have an IQ of 80 at this point. Commented Dec 13, 2022 at 14:14
  • 2
    @JonathanReez sounds like we need much stronger tools to use against low quality content and be much more assertive in dealing with it. Rather than using the low quality content to justify more low quality content.
    – VLAZ
    Commented Dec 13, 2022 at 14:52
  • 2
    @VLAZ OP's question basically asks what the IQ of a future GPT should be in order for it to be acceptable on Stack Overflow. In theory a future system might have an IQ of 130 and thus beat 99% of human engineers at writing Stack Overflow answers. What then? Commented Dec 13, 2022 at 15:06
  • 3
    @JonathanReez arxiv.org/pdf/1906.00077.pdf "When writing a summary, humans tend to choose content from one or two sentences and merge them into a single summary sentence". So humans are also often just collating on-topic content :) Commented Dec 13, 2022 at 22:39
  • 2
    @FranckDernoncourt people seem to believe they have a magical "soul" that generates "true inspiration", as opposed to ChatGPT which lacks one. Commented Dec 13, 2022 at 22:59
  • 1
    I've had ChatGPT4 write down a verbal problem statement as a differential equation system and solve it correctly using Wolfram Alpha. I would count that under reasoning capability. This answer was pretty good too, with prompt just being original question plus 2 extra ones to ask for some extra details: stats.stackexchange.com/questions/76925/… Commented Jun 17, 2023 at 17:57
  • 1
    @JonathanReez Well they did give ChatGPT4 an IQ test & it scored 155 apparently, which would be top 1%, scientificamerican.com/article/…. I think that's a bit of an overestimate, but the vast knowledge helps of course. And for answering many university entrance exams it ends up in the top10%, openai.com/research/gpt-4. All of the info on SO on ChatGPT is terribly outdated. And any question about it that's remotely positive gets closed down immediately. Commented Jun 17, 2023 at 19:52
