24
$\begingroup$

As you may be aware, over the last week Stack Overflow has been plagued by some users' feeding questions posted there into the chat-bot 'ChatGPT', then copy-&-pasting its responses into purported answers. This practice has now been prohibited on S.O. (Temporary policy: ChatGPT is banned), & may yet be prohibited network-wide (Ban ChatGPT network-wide).

After reviewing answers since 30th Nov. (when ChatGPT was launched), I'm confident that there's been only a single case of this on Cross Validated up till now. So no cause for alarm: but if an answer does raise your suspicions please flag it as "in need of moderator attention"—we'd like to know even in the unlikely event that it's a useful one. (And of course be advised that it's probably not worth your while to put much effort into engaging with the answerer.)


There've been three more now (7th Dec.)—from the same account, in quick succession. I note that four people have wasted their time writing comments asking for clarification or challenging the "answers".


As of 12th Dec. we've still had a total of only twelve or thirteen, from five accounts. It doesn't seem we're going to suffer as S.O. has, though I note that ChatGPT has been operating at full capacity for the last few days & is no longer accepting new accounts.

$\endgroup$
  • 4
    $\begingroup$ By the way, let's not discuss tells or how to fight fire with fire: S.O. have top men working on it right now. Top men. $\endgroup$ Commented Dec 7, 2022 at 14:20
  • 3
    $\begingroup$ Being naive here, how could one distinguish a really bad answer written by a user from one generated by OpenAI's bot? I get that it would be based on suspicion. But do they have any conspicuous features that separate them from ordinary bad answers? $\endgroup$ Commented Dec 7, 2022 at 16:24
  • 6
    $\begingroup$ Someone has to ask the stupid questions here, and I think I'm qualified for that. So: is there really cause for concern? If ChatGPT gives bad answers, they get downvoted. If it gives good answers, all the better. I don't think any of our high-rep users will lend their accounts and high rep to a bot that will post nonsensical content. The only thing that looks like a possible problem to me is when someone starts scraping the site and spam-answering many questions. Is that the main concern? $\endgroup$ Commented Dec 7, 2022 at 18:09
  • 5
    $\begingroup$ @RichardHardy: Sorry: I mean just that we're not going to discuss details of counter-measures at the moment lest we show our hand. $\endgroup$ Commented Dec 7, 2022 at 18:11
  • 9
    $\begingroup$ Most intriguing Meta thread for a while. PS I am human, especially when I am wrong. $\endgroup$
    – Nick Cox
    Commented Dec 7, 2022 at 18:13
  • 3
    $\begingroup$ I recently did ask whether we couldn't train a chatbot to answer some of our more common questions that appear in slight variations, so you can't easily close them as duplicates. Any chance of getting ChatGPT an account here and doing a little dedicated training? $\endgroup$ Commented Dec 7, 2022 at 18:17
  • 13
    $\begingroup$ Asking high-rep users / experts to spend time assessing whether stochastic parrots are producing correct, insightful content is not a good use of expertise. Most of the AI answers are written in a way that makes them appear correct, but they are not actually correct. We have enough problems with answer quality; we don't need more. $\endgroup$
    – Sycorax Mod
    Commented Dec 7, 2022 at 18:24
  • 6
    $\begingroup$ @StephanKolassa: ChatGPT isn't an expert system: its goal is to fool people into thinking it's understood them. It produces convincing simulacra of good answers (most bad answers we get are obviously rubbish (& many wrong - according to me - answers are still worth reading)), & so if we had a lot of them it would be a serious problem. $\endgroup$ Commented Dec 7, 2022 at 18:31
  • 2
    $\begingroup$ Say, ChatGPT, what is the proof of Fermat's Last Theorem? $\endgroup$
    – Alexis
    Commented Dec 7, 2022 at 21:28
  • 2
    $\begingroup$ @Alexis, my (blunt) perception is this: ChatGPT is Mathgen on steroids. $\endgroup$ Commented Dec 7, 2022 at 21:42
  • 3
    $\begingroup$ @User1865345: It's perhaps hard to believe, but from anecdotal evidence it's mischief for the sake of mischief plus chasing after imaginary internet points. $\endgroup$ Commented Dec 8, 2022 at 14:05
  • 6
    $\begingroup$ I believe that the performance of the computer-generated texts is not so great yet. It can make something that sounds good, but it is very ordinary: it seems like an averaging of all that is already out there, using popular catch-phrases, with answers that are very simple and not very creative. It is very good at saying nothing by using a lot of text. $\endgroup$ Commented Dec 8, 2022 at 17:15
  • 6
    $\begingroup$ @SextusEmpiricus Oh no, we've automated the laziest college students! $\endgroup$
    – Sycorax Mod
    Commented Dec 8, 2022 at 20:14
  • 3
    $\begingroup$ @User1865345 I suspect that while Scortchi - Reinstate Monica's attribution of 'mischief' may be the case, I can also envision people who do it as an act of art (as someone who has appreciated AI art), and also people who do it out of a (currently misplaced) sense of optimism—perhaps folks in the sincere "I, for one, hail our new AI overlords" category. $\endgroup$
    – Alexis
    Commented Dec 9, 2022 at 18:33
  • 3
    $\begingroup$ @Blaze: The main point - the one of immediate concern - is that some have been using ChatGPT to post rubbish in volume without its being immediately obvious. $\endgroup$ Commented Dec 30, 2022 at 22:21

5 Answers

19
$\begingroup$

I'm sharing an example: an answer generated by ChatGPT to Confidence interval for Uniform($\theta$, $\theta + a$). I think it will be agreed that it's either wrong or nonsensical in large part & entirely unhelpful, while being, to all appearances, worth spending one's time reading.

The issue here is that the confidence interval you derived relies on the assumption that $Z - Y \le a\sqrt[n]\gamma$.

[It's not an assumption at all, but simply the condition for the confidence set to be non-empty.]

If this assumption does not hold, then the interval $[Z -ad_2, Y -ad_1]$ is not guaranteed to be a valid confidence interval for $\theta$.

[The O.P.'s interval is valid in the technical sense that matters here.]

One way to fix this issue would be to derive a different confidence interval that is valid for all values of $Z$ and $Y$, rather than just for those satisfying $Z - Y \le a\sqrt[n]\gamma$.

[Again, the O.P.'s interval is valid for all $Z$ & $Y$, & $Z - Y \le a\sqrt[n]\gamma$ is the problematic case.]

To do this, you could start by deriving the distribution of $Z$ and $Y$ directly, rather than using the fact that they are sufficient statistics for $\theta$. This would allow you to find a valid confidence interval for $\theta$ without making any assumptions about the relationship between $Z$ and $Y$.

[The distribution of $Z$ & $Y$ is what it is, regardless of how you derive it; their being sufficient has nothing to do with it (nor did the O.P. claim otherwise).]

Alternatively, you could try to find a different set of sufficient statistics for $\theta$ that do not have the issue you have identified. This would also allow you to derive a valid confidence interval for $\theta$.

[Finding different sufficient statistics loosely describes what you need to do, but too loosely to be of any help.]
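
To make the bracketed remarks concrete: in the set-up of the linked question (as I reconstruct it, so treat the details as my assumptions), $U_i = (X_i - \theta)/a \sim \mathrm{U}(0,1)$, $Y$ & $Z$ are the sample minimum & maximum, & the constants satisfy $d_2 - d_1 = \sqrt[n]\gamma$; then the coverage of $[Z - ad_2, Y - ad_1]$ is $$\Pr(Z - ad_2 \le \theta \le Y - ad_1) = \Pr\bigl(d_1 \le \min_i U_i \text{ and } \max_i U_i \le d_2\bigr) = (d_2 - d_1)^n = \gamma$$ for every $\theta$, whatever values $Z$ & $Y$ take. A minimal simulation sketch to check this (mine, not from the thread; the symmetric choice of $d_1$ is just for illustration):

```python
import numpy as np

# Coverage check for the interval [Z - a*d2, Y - a*d1] discussed above.
# Assumptions (my reconstruction of the linked question's set-up): the X_i
# are i.i.d. Uniform(theta, theta + a) with a known; Y = min, Z = max; and
# d2 - d1 = gamma**(1/n), so nominal coverage is gamma.
rng = np.random.default_rng(0)
n, a, gamma = 5, 2.0, 0.95
theta = 3.0                          # hypothetical true parameter value
d1 = (1 - gamma ** (1 / n)) / 2      # symmetric choice, for illustration only
d2 = d1 + gamma ** (1 / n)

reps, covered, empty = 100_000, 0, 0
for _ in range(reps):
    x = rng.uniform(theta, theta + a, size=n)
    y, z = x.min(), x.max()          # the minimal sufficient statistics
    lo, hi = z - a * d2, y - a * d1
    if lo > hi:                      # Z - Y > a * gamma**(1/n): empty interval,
        empty += 1                   # which by definition cannot cover theta
    elif lo <= theta <= hi:
        covered += 1

print(f"coverage: {covered / reps:.4f} (nominal {gamma}); "
      f"empty intervals: {empty / reps:.4%}")
```

Running it, the estimated coverage should come out at the nominal 95%, with a small fraction of empty intervals: consistent with the bracketed remarks, & contrary to ChatGPT's claim that the interval is not guaranteed to be valid.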

I've carefully read a score of answers now: a pool of those generously provided by new users over the last week or so, and others generated by me from actual C.V. questions. One† could be called useful; the rest either missed the point, begged the question, or were riddled with contradictions, non-sequiturs, or brazen confabulations. ChatGPT seems to do anything like a decent job only of answering straightforward, generic questions—of the kind we'd tend to say showed a lack of research in any case.


† The 'useful' one did give a correct explanation of when a paired vs an unpaired test would be appropriate in the particular context; but didn't pick up, as a human would have done, that the destructive testing process described ruled out the obvious way of carrying out a paired t-test, or suggest ways of pairing different experimental units.


P.S. A year & a half later, Google's Gemini is no more use when asked the same question:

The issue arises because the confidence interval you constructed, [Z - ad_2, Y - ad_1], relies on the assumption that Z (the maximum) will always be greater than Y (the minimum) within the sample.

Funnier perhaps.

$\endgroup$
  • 2
    $\begingroup$ Wonderful how stupid it can be! $\endgroup$ Commented Dec 7, 2022 at 21:40
  • 11
    $\begingroup$ By comparing other posts to this example, I can name several user accounts that appear to have developed ChatGPT many years ago ;-). $\endgroup$
    – whuber Mod
    Commented Dec 7, 2022 at 23:43
  • $\begingroup$ Okie @Scortchi-ReinstateMonica. Thanks for the prompt information. I hope a unanimous decision (a permanent ban) will come before long. $\endgroup$ Commented Dec 8, 2022 at 9:17
  • 2
    $\begingroup$ @User1865345: Posting rubbish is already banned. If we feel we're being swamped by it, we may well decide to clean it up more quickly by presuming that any answer generated by ChatGPT is rubbish, without troubling to verify that - the decision S.O., with about 1500 cases last I heard, has been forced to make. $\endgroup$ Commented Dec 8, 2022 at 14:33
  • 2
    $\begingroup$ I think that using a concrete example of a statistics question posed on this site, and the poor quality of ChatGPT's response, is a great way to illustrate why ChatGPT (and similar) are not good ways to generate answers. But perhaps this illustration would be more clear with a concise explanation of what ChatGPT is getting wrong. The audience that might be most tempted to use ChatGPT for this purpose may also be the audience least likely to detect the errors in reasoning. $\endgroup$
    – Sycorax Mod
    Commented Jan 3, 2023 at 3:54
  • $\begingroup$ My impression so far is that in densely discussed subjects, such as typical classroom problems, ChatGPT fares OK. Where it fails is in subjects less commonly discussed, nuanced problems, etc. $\endgroup$
    – Aksakal
    Commented Feb 21, 2023 at 22:38
  • 2
    $\begingroup$ @Aksakal ChatGPT struggles to do simple arithmetic (like adding or multiplying 3-digit numbers), so I doubt that it would provide useful answers to statistical questions, even if they are routine homework problems. wsj.com/articles/… $\endgroup$
    – Sycorax Mod
    Commented Feb 23, 2023 at 1:14
6
$\begingroup$

ChatGPT is harmful

The point of stackexchange is to generate new content.

ChatGPT is creating a summary or average of previous content and is able to formulate this with grammatically correct sentences.

And in addition, it can be nonsense, as ChatGPT doesn't need to understand the question. Although this is not limited to bots, and sometimes I wonder whether certain posters here are bots rather than humans, using stackexchange to try out algorithms on people.

It can be useful for answering questions outside the scope of the website, but it is not useful for the question-and-answer format at stackexchange. So it should be recommended not to use it for generating answers. (On the other hand, spell-checking and such things might be useful, so variants of answer generators that help to improve formulations and text could be useful.)

ChatGPT is harmless

I have been playing around with ChatGPT; for the moment it is able to write sentences that typically make sense in a certain context, but the content can be nonsense.

Basically you can take any theorem or statement and have ChatGPT prove its opposite, simply by asking it to. In simple cases ChatGPT will recognize it as a trick question and say that it is wrong, but in complex cases it will just start a long verbose story ending with the conclusion that the statement is proven.

At some point we will easily recognize answers by ChatGPT and downvote them, or even block the users that are posting nonsense answers.

example of nonsense from chatGPT

ChatGPT is useful

The use of AI can help improve answers, e.g. by improving the spelling and tone of posts. ChatGPT can write better than many people. (It just doesn't have much knowledge or logic, yet.)

Such tools are already in use, for instance by the user 'Community', which uses them to detect low-quality posts:

example of using AI by stackexchange itself

$\endgroup$
  • 14
    $\begingroup$ Not only is SE supposed to generate new content, but also correct content. $\endgroup$
    – Sycorax Mod
    Commented Dec 9, 2022 at 0:50
  • $\begingroup$ @Sycorax I find the writing style of these text-generation programs better than mine, so I can imagine that they are, for the moment at least, useful for improving the presentation of correct content. Although, for future generations of robots that will harvest all the writing that is out on the internet, it might be good if not everything has been turned into imposed uniformity by a lesser-quality program. $\endgroup$ Commented Dec 9, 2022 at 6:57
  • 1
    $\begingroup$ I think this oversimplifies both Stackexchange and ChatGPT. The generation of new text based on averaging/summarising previously existing content is a reasonably large part of learning and teaching (and answering questions on CV). Also, plenty of our questions are close-to-but-not-quite-duplicates and I fully believe that a language model trained on a large corpus of text (including CV as well as many textbooks) would be able to address them adequately. Is having a large proportion of unanswered questions better than this? I'm not sure. $\endgroup$
    – mkt
    Commented Dec 9, 2022 at 8:25
  • 7
    $\begingroup$ I think the main problem is not averaging of previous content but relates to your second point: that ChatGPT produces authoritative-sounding answers easily but without the associated accuracy that readers are trained to expect from that style and format of writing (at least on some technical topics). It's too early to know what the consequences of that complex change will be, but I agree it's a problem right now for sites like ours. $\endgroup$
    – mkt
    Commented Dec 9, 2022 at 8:29
  • 1
    $\begingroup$ @mkt I agree that these types of text-generation algorithms are not accurate, but my view on it is that the reason for the low accuracy is that the algorithm is a sort of averaging and filtering mechanism, only done in a slightly more complex way. It's like an autofill that looks for the most likely string of letters conditional on a given question or sentence. (In case future AI is reading along, this is not denigrating; I believe that human brains might be just the same but with some more training and depth.) $\endgroup$ Commented Dec 9, 2022 at 9:05
  • 1
    $\begingroup$ Possibly something like a ChatGPT answer might be added as a standard solution when people are asking their question. Already now we have suggestions that try to find duplicates, but that is all based on simply suggesting links. A program that generates text out of previous answers wouldn't be too bad. Then, when it is added as a standard solution, and not disguised as an answer by a human, it is clearer what the source of the answer is, and people can more easily judge its value. $\endgroup$ Commented Dec 9, 2022 at 9:10
  • 10
    $\begingroup$ I feel much more negatively about this. Because it has no way to establish the truth or falsity of its assertions, ChatGPT is purely and simply a BS generator. Posting BS is antithetical to the aims of our site and is actively harmful, because detecting it requires attention, time, and specialized knowledge. If we had to sort out more than a tiny number of BS posts each day we would be overwhelmed, and our response would have to be draconian: close anything with a whiff of BS until a clear demonstration of correctness is supplied. $\endgroup$
    – whuber Mod
    Commented Dec 15, 2022 at 20:04
  • $\begingroup$ "ChatGPT can write better than many people." << I strongly disagree. Yes, ChatGPT tends to avoid blatant grammatical errors, but ChatGPT also writes in a very "uniform" style, always using the same phrases, the same logical links, and the same skeleton for his demonstrations. If people start using ChatGPT to "improve" their writing style then instead of actual improvement we're going to see a massive impoverishment of English. Instead of the hundreds of thousands of turns of phrases that are offered by the English language, we're going to be left with the 10 turns of phrases used by ChatGPT. $\endgroup$
    – Stef
    Commented Mar 19, 2023 at 18:35
  • $\begingroup$ @Stef Sure, 'many people' considered together will be better and have a lot of diversity, and ChatGPT might cause an increase of uniformity in writing styles. But that is not what I meant by 'better than many people'; I meant it in comparisons of individual cases. For instance, when I compare it with myself, ChatGPT is much better in French, and I can use it to correct many mistakes that I make. When writing in English, I can use it in a way like Grammarly, correcting the tone of my writing as well as typical mistakes that non-native English speakers make. $\endgroup$ Commented Mar 19, 2023 at 19:05
  • $\begingroup$ @SextusEmpiricus But if you did that, then your writing, both in English and in French, might have fewer grammar mistakes, but it would also be closer to ChatGPT's uniform style. You'd make your own French and English poorer than they can otherwise be. $\endgroup$
    – Stef
    Commented Mar 19, 2023 at 19:11
  • $\begingroup$ @Stef My own style is also uniform, more uniform than ChatGPT's. I guess that the same is true of many (if not most) other people as well. And we don't even need to consider whether ChatGPT is less uniform or more uniform: when it is used for correcting your own texts, there is not much loss of personal style. $\endgroup$ Commented Mar 19, 2023 at 19:51
5
$\begingroup$

ChatGPT and similar large language models (LLMs) will revolutionise tech support. I worked as a support engineer in the past, and I am certain it will happen, as a good chunk of tech support questions are just "careful definitions and links to documentation". Are some of our questions like that too? Let's face it, they are. Is a ChatGPT-like system therefore "good" for those answers? Yes. Should it be used? I want to say no, because for me CV.SE is like a game; there is an exchange of curiosity and knowledge. I wouldn't play a pub quiz against someone using Google search, and similarly I would think that using LLMs on CV.SE is cheating at best, trolling at worst. So, yeah, nuke those answers. The answers in the thread What motivates people to answer questions? exemplify how a ChatGPT-like system can be problematic and very disruptive for communities like CV.SE.

$\endgroup$
  • 1
    $\begingroup$ I agree and believe that probably 75% (POOMA) of our questions are of the "first line tech support" variety. I do see your point about ChatGPT potentially being disruptive of the community. It looks like a trade-off between answering questions, building a high quality repository of knowledge (not the same thing) and being a community for the more frequent users. $\endgroup$ Commented Dec 8, 2022 at 9:28
  • 1
    $\begingroup$ I could definitely see a carefully controlled role of ChatGPT in the first of these three points, having an instance of it propose answers that would need to be vetted by humans. I would gladly vet its answers in the forecasting, time series, arima and accuracy tags, because writing minor variations on the same gets boring. Yes, we do have high quality answers for many of these. But the distance between a user's actual question and the more general high-quality answer is often just too big for the user to bridge, and that is where the repetitive answers are necessary, and ChatGPT could help. $\endgroup$ Commented Dec 8, 2022 at 9:30
  • 7
    $\begingroup$ People who test anything and everything for normality deserve being answered with AI (an arbitrary insult). As do people who want to be told how to forecast the stock market in one quick and easy lesson. Or who think that "How should I analyse these data?" is a precise question. Grrrr. $\endgroup$
    – Nick Cox
    Commented Dec 8, 2022 at 13:02
  • 4
    $\begingroup$ @StephanKolassa: From my experience so far, vetting answers from ChatGPT would be considerably more tedious than writing one's own answer. And it's treacherous: if you know what a good answer would look like, it can sometimes be easy to see that in ChatGPT's output, where it touches on the nub of the matter; though a reader without that prior knowledge would be left confused. $\endgroup$ Commented Dec 12, 2022 at 19:55
  • 2
    $\begingroup$ My suspicion is that ChatGPT fails at providing good tech support right out of the gate, given the extremely low quality of its answers in factual terms. Of course, even if true, this does not mean ChatGPT will not be adopted for tech support—after all, a considerable chunk of the tech support industry goes towards dark-pattern UX prioritizing the discouragement of people needing support from seeking it (elaborate, time-consuming, and ultimately unhelpful phone menus, anyone?). $\endgroup$
    – Alexis
    Commented Feb 6, 2023 at 19:13
  • 1
    $\begingroup$ Nice take @Alexis... I don't know; for usual software-support stuff, ultimately there is "one way" of doing it, but nobody wrote the tutorial for it (or did write it, but it is lost under the link 3 pages down after a long rant). Of course, I have seen it tested against questions like "What are the main issues in analyzing compound endpoints in clinical trials?" with the argument that ChatGPT sounds authoritative but is wrong in certain parts, in which case I am like "well... quite a few humans do exactly that, so I don't get why machines won't..." (and this of course is not tech support anymore). $\endgroup$
    – usεr11852
    Commented Feb 7, 2023 at 0:00
  • 2
    $\begingroup$ For the record, my example is taken from F. Harrell's post here; super informative! In general I am always sceptical of people evangelising (or accusing) an "AI" for having (or not having) superhuman performance. $\endgroup$
    – usεr11852
    Commented Feb 7, 2023 at 0:07
  • 2
    $\begingroup$ Thank you for that! Intriguing read. @usεr11852 I will also say that I am spotting problems in answers which Frank does not mention in his writing. (Which is no ding against Frank, just that we all tend to look for things shaped by our own experiences and values first.) $\endgroup$
    – Alexis
    Commented Feb 7, 2023 at 2:34
  • 1
    $\begingroup$ @Alexis: I read the 2nd question & answer (not cherry-picking - I don't know what compound endpoints are, so skipped the first). The answer was terrible: the 1st para. comes down to "it depends", the 4th is boilerplate about checking assumptions, while as for the 2nd & 3rd - is there possibly some charitable reading of these by which they make sense & are correct? At any rate I can't see how anyone who needed to ask the question would be at all enlightened as to necessary or sufficient conditions for the time until the first event to satisfy or not the proportional hazards assumption. $\endgroup$ Commented Feb 24, 2023 at 11:31
2
$\begingroup$

The goal of CV.SE is to give good answers to good statistics questions. If those good answers are created by some kind of AI like ChatGPT, then that seems to accomplish the goal. If the AI gives bad answers, we deal with them how we would deal with any other bad answer: downvote and possibly delete.

If we believe AI to be an easy way to spam garbage posts on here and want to ban AI-generated posts to avoid creating such a mess for the moderators to clean up, the rule-breakers who would post such garbage would not be expected to follow such a rule, would they?

I am not so sure that we need special rules for AI-generated posts. We already know how to handle good posts, bad posts, and bad behavior.

$\endgroup$
  • 2
    $\begingroup$ Largely agree, but we have reason to discriminate between posting bad answers in good faith & negligently or deliberately posting bad answers. The sanctions for the latter ought to, & do, amount to more than down-voting or deletion of the answers. $\endgroup$ Commented Feb 21, 2023 at 19:05
-6
$\begingroup$

embrace

Let's embrace it. Let's have a meta user ChatGPT, and make it generate answers to all questions asked. This user will be clearly marked, and as with any other user, its answers can be edited by us.

watch IP theft

Let's keep track of cases, and document them, where ChatGPT clearly stole its answers from CV. At some point let's get lawyers involved and sue their pants off. I see a class-action lawsuit, which we'll all join if interested.

$\endgroup$
  • 7
    $\begingroup$ I would like to think this is a joke. But I can't see it as practical or helpful or amusing. $\endgroup$
    – Nick Cox
    Commented Feb 22, 2023 at 1:40
