55
$\begingroup$

The user (or a bot) https://mathoverflow.net/users/496085/canvas123 appears to be posting ChatGPT-generated text on MathOverflow.

Two “answers” by this user already got accepted (!):

The question Does every monoidal abelian category admit an exact, lax monoidal functor to abelian groups? has an answer by this user with all the telltale signs of being computer-generated. In particular, the “commutative diagrams” are nonsensical, and the poster is “unaware” of this.

$\endgroup$
18
  • 14
    $\begingroup$ This seems to be a recent topic on several other sites as well. For example this post on Meta Stack Exchange: Ban ChatGPT network-wide (and the questions linked there). Or Meta Stack Overflow: Temporary policy: ChatGPT is banned (and the questions linked there). $\endgroup$ Commented Dec 8, 2022 at 21:54
  • 12
    $\begingroup$ The relevant user has been suspended and their posts deleted. Looking forward, it's worth having a discussion about our attitude toward this sort of thing in the future. $\endgroup$
    – Tim Campion Mod
    Commented Dec 9, 2022 at 0:46
  • 1
    $\begingroup$ @MartinSleziak: is the upshot from your links that there is already some automatic AI answer-detection applied to all stack exchange sites? Or is there anything more MO could do? $\endgroup$ Commented Dec 9, 2022 at 3:58
  • 6
    $\begingroup$ Also under discussion at math.stackexchange meta: What is our policy on AI-generated content? $\endgroup$ Commented Dec 9, 2022 at 5:52
  • 4
    $\begingroup$ @SamHopkins I did not see anything about automatic detection. (Of course, I might have missed something. There was a lot on this topic recently.) There is this question with an answer from SO mod: How can we enforce the ChatGPT ban?. They say that they do not want to share information about detection of such posts publicly, but they share some such info with other mods. (I do not know whether the MO mods are members of the moderators team, but they definitely have access to Teachers' Lounge.) $\endgroup$ Commented Dec 9, 2022 at 6:28
  • 1
    $\begingroup$ @MartinSleziak: gotcha. I guess I was reading between the lines with that moderator statement about detection of AI-generated content. $\endgroup$ Commented Dec 10, 2022 at 16:26
  • 2
    $\begingroup$ Also on The Register - the online news site for the IT industry. $\endgroup$ Commented Dec 10, 2022 at 19:55
  • 1
    $\begingroup$ Is Are some Galois representations vector bundles? another example? $\endgroup$
    – LSpice
    Commented Dec 11, 2022 at 0:07
  • 2
    $\begingroup$ @LSpice I guess the moderators have their own ways of knowing, but oddly the question you link to seems less accurate than the chatbot, e.g. mis-spelling 'cyclotomic' as 'cyclomatic'. It's possible that a chatbot was used to generate parts of the question but not all of it, although that seems less likely than either 'none' or 'all' $\endgroup$
    – Yemon Choi
    Commented Dec 11, 2022 at 15:48
  • 4
    $\begingroup$ What did their answers say before they were deleted? $\endgroup$ Commented Dec 12, 2022 at 1:48
  • 2
    $\begingroup$ @TymaGaidash You can still see the deleted answers although unfortunately you need a relatively high reputation threshold (10k) to do this. We did consider whether the answers should not be deleted (for posterity's sake) but ultimately decided that wouldn't be fair to the individual questioners, as it would distract from their actual questions. But Martin Brandenburg's discussion below of one of the answers is quite typical, I think. You can also experiment for yourself if you're curious (but please don't post this stuff as answers here). $\endgroup$
    – Tim Campion Mod
    Commented Dec 13, 2022 at 19:40
  • 1
    $\begingroup$ One of the questions was mine, and I read the answer and found it acceptable (it was actually a request for references). How did you figure out it was GPT? And how did you understand it in general? $\endgroup$ Commented Dec 22, 2022 at 12:49
  • 3
    $\begingroup$ @MarsaultChabat: All three “references” provided in the “answer” were entirely fictional: papers with such names simply do not exist. How exactly do you find this acceptable? $\endgroup$ Commented Dec 22, 2022 at 17:44
  • 2
    $\begingroup$ Very strange. My experience with it shows clearly that it is nowhere near the level of generating any sensible mathematics. For example, it repeats the same logical error again and again no matter how many times I tell it that it is an error, this kind of thing. $\endgroup$ Commented Dec 26, 2022 at 7:31
  • 3
    $\begingroup$ Unfortunately, ChatGPT answers are being posted more and more on MO ... The most recent example is mathoverflow.net/users/497569/amit-luthra, who deleted all of their "answers" right after I called them out. A few weeks ago something similar happened. I suspect the number of undetected cases is huge. $\endgroup$ Commented Jan 9, 2023 at 18:25

2 Answers

56
$\begingroup$

It's worth having a discussion about AI-generated content. At this point, the conclusions of such a discussion are pretty clear: the current state of the art in AI-generated MathOverflow answers is complete garbage. Not only that, but the possibility invites abusing the site by serially posting low-quality and plagiarized content, and in a way which is particularly prone to wasting everybody's time.

In principle, AI-generated content may one day be capable of adding value to the site, but that day is not today. If that day comes, appropriate attribution of AI-generated content will certainly be required. In the present, please refrain from wasting people's time.

$\endgroup$
8
  • 13
    $\begingroup$ Not to be facile or short-sighted, but, since one of the appeals of utilities like ChatGPT is that anyone can use them, what utility could ChatGPT-generated answers, even if sensical, ever add? Rather than my asking a question and having someone else feed it to a bot for an answer, I could just do that myself. $\endgroup$
    – LSpice
    Commented Dec 10, 2022 at 18:42
  • 9
    $\begingroup$ @LSpice So if ChatGPT became sufficiently advanced, you could imagine adding an automatic answer from ChatGPT that people could vote on as if it were a regular user. $\endgroup$ Commented Dec 10, 2022 at 20:10
  • 12
    $\begingroup$ @AbhimanyuPallaviSudhir So basically this XKCD? $\endgroup$ Commented Dec 11, 2022 at 16:37
  • 3
    $\begingroup$ @AbhimanyuPallaviSudhir My instinctive thought is that en.wikipedia.org/wiki/Brandolini%27s_law will apply $\endgroup$
    – Yemon Choi
    Commented Dec 12, 2022 at 5:03
  • 5
    $\begingroup$ @LSpice One can imagine a future in which the competent AIs typically charge for their services, but under certain circumstances provide some free services. Sort of like, I dunno, human mathematicians get paid a salary, yet sometimes post answers on MO for free. $\endgroup$ Commented Dec 13, 2022 at 21:43
  • 1
    $\begingroup$ @TimothyChow, aha, so we need to focus on training AI not to understand mathematics better, but to enjoy the reputation and procrastination boost as humans do. 😄 $\endgroup$
    – LSpice
    Commented Dec 14, 2022 at 2:13
  • 7
    $\begingroup$ @LSpice: Regarding your first question, writing good prompts/queries for such AI’s is itself a non-trivial skill — just like e.g. using Google well. (Indeed, I find it useful to think of GPT not as “a very stupid essay-writer”, but as “a rather clever search engine”.) I can imagine, some years down the line, a situation where mathematician A tries their question on a GPT-like AI, doesn’t get a good answer, so posts the question here, but then person B (who knows the AI better) is able to rewrite the question into a form which gets a good answer from the AI. $\endgroup$ Commented Dec 19, 2022 at 14:31
  • 1
    $\begingroup$ One more data point for what it's worth: I have just submitted to ChatGPT a nine-question exam consisting entirely of questions taken from recent exams in my sophomore-level intermediate economics class. Every single one of the answers was pure gibberish, though they all appeared at first glance like they might be meaningful. A student who turned in this exam would have earned a score of zero. Obviously math and economics are different subjects, but these are problems with unambiguous right and wrong answers, and so far more like math than they are like (say) history or literature. $\endgroup$ Commented Apr 4, 2023 at 4:32
24
$\begingroup$

At first I did not agree with the title "ChatGPT strikes MathOverflow", because we need to verify (somehow) that the answers were indeed created by ChatGPT. I have tested ChatGPT with questions similar to the one about monoidal abelian categories, and although the results were certainly impressive (not correct, but good-looking on a superficial level), they were not even close to what the user canvas123 has posted. The answers didn't contain formulas, let alone commutative diagrams.

But then I realized that you can actually add requirements in your query. I asked ChatGPT to give a "detailed proof", which produced formulas as well. Then I asked it to "include commutative diagrams as well", which indeed happened (see below).

Interestingly, the line breaks in the commutative diagrams were done with a single \, where a double \\ is required. The same was the case in the answer by canvas123. For the record, I have edited their diagrams accordingly to make them more readable, because I was curious whether the claim that they are nonsensical was correct. After the edit, they were a bit less nonsensical (but still wrong).

Given this, it is very likely that canvas123 has indeed used ChatGPT.

It is well known that ChatGPT often produces wrong answers to complicated, technical questions. Personally, I am still amazed by this tool, since sometimes the answers turn out to be correct, and it will probably get much better in the coming months and years. At some point, it will become very hard to distinguish its answers from those written by human beings.


Here is ChatGPT's (wrong) answer to "Write down a detailed proof that every monoidal abelian category admit an exact, lax monoidal functor to abelian groups. Include commutative diagrams as well." (since the model is stochastic, you will probably get a different answer, though):

First, let us construct an abelian category $\mathcal{C}'$ as follows: the objects of $\mathcal{C}'$ are the same as those of $\mathcal{C}$, and the morphisms of $\mathcal{C}'$ are the same as those of $\mathcal{C}$ except that we require them to be strictly additive, i.e. $f + g = h$ implies $f = h-g$ and $g = h-f$. Composition in $\mathcal{C}'$ is the same as in $\mathcal{C}$, and the identity morphism on an object $c$ is the same as the identity morphism in $\mathcal{C}$.

It is straightforward to verify that $\mathcal{C}'$ is indeed an abelian category. To see that it is monoidal, we define the monoidal product $\otimes'$ on $\mathcal{C}'$ as follows: given two objects $c_1, c_2$ in $\mathcal{C}'$, we define $c_1 \otimes' c_2 = c_1 \otimes c_2$ in $\mathcal{C}$. Given two morphisms $f,g$ in $\mathcal{C}'$, we define $f \otimes' g = f \otimes g$ in $\mathcal{C}$. It is straightforward to check that $\otimes'$ is associative and that $I' = I$ is the identity object.

Since $\mathcal{C}'$ is an abelian category, it admits an exact, faithful functor $F: \mathcal{C}' \to \mathbf{Ab}$ to the category of abelian groups. Since $\mathcal{C}'$ is monoidal and $F$ is a functor, it is also lax monoidal, meaning that it preserves the monoidal structure up to natural isomorphism. In other words, there exist natural isomorphisms $$F(c_1 \otimes' c_2) \cong F(c_1) \otimes F(c_2)$$ for all objects $c_1, c_2$ in $\mathcal{C}'$, and $$F(f \otimes' g) \cong F(f) \otimes F(g)$$ for all morphisms $f,g$ in $\mathcal{C}'$. These natural isomorphisms can be represented by the following commutative diagrams: $$\begin{array}{ccc} F(c_1 \otimes' c_2) & \stackrel{\cong}{\longrightarrow} & F(c_1) \otimes F(c_2) \ \downarrow & & \downarrow \ F(c_1) \otimes F(c_2) & \stackrel{\cong}{\longrightarrow} & F(c_1) \otimes F(c_2) \end{array}$$

When you replace \ by \\ in the latter diagram, it looks like this:

$$\begin{array}{ccc} F(c_1 \otimes' c_2) & \stackrel{\cong}{\longrightarrow} & F(c_1) \otimes F(c_2) \\ \downarrow & & \downarrow \\ F(c_1) \otimes F(c_2) & \stackrel{\cong}{\longrightarrow} & F(c_1) \otimes F(c_2) \end{array}$$

$\endgroup$
13
  • 33
    $\begingroup$ I have so far been unable to get ChatGPT to correctly prove $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$, whether or not I use LaTeX or add suggestive extra questions. It either forgets about non-diagonal entries, or turns an $a_{i,j}$ into $a_{i,i}$ somewhere. When I tell it about its errors, it "recognizes" them, sometimes elaborating on a counterexample I provide, but then it repeats the same errors again. I have a hunch that it treats lexically similar "words" as semantically similar, which dooms it to uselessness in mathematics. $\endgroup$ Commented Dec 10, 2022 at 6:10
  • 13
    $\begingroup$ An alternative conclusion is that rhetoric and poetry are significantly easier than (even quite straightforward) mathematics. Flattering, but not very convincing. I'm skeptical about general AI, but the prospect of having an AI help out with mathematical proofs actually sounds quite plausible to me! $\endgroup$ Commented Dec 10, 2022 at 6:13
  • 15
    $\begingroup$ @darijgrinberg: I disagree with your last statement: natural language is very tolerant to local ambiguity (and even local falseness), whereas the mathematical one is not (and, in fact, the absence of precision at every point of a statement can even lead to catastrophic results). The problem is that the AI methods developed lately are insensitive to details, and tend to capture only the "general shape of a thing", while these subtle and minute details that are discarded by AI tend to form most of the modern mathematical thinking. $\endgroup$
    – Alex M.
    Commented Dec 10, 2022 at 12:47
  • 17
    $\begingroup$ @darijgrinberg " I have a hunch that it treats lexically similar "words" as semantically similar" - by complete coincidence reading this made me think of marking HW and some referee work I should be doing. $\endgroup$
    – Yemon Choi
    Commented Dec 10, 2022 at 16:27
  • 2
    $\begingroup$ @AlexM.: Not sure what statement of mine you're referring to, as I think we are agreeing. What I mean is that AI for proofs seems both realistic and useful to me, but ChatGPT is definitely not it. The best part about proofs is that you can use automated reinforcement learning, which is likely much faster than anything that relies on human feedback. But that was clearly not the optimization target for ChatGPT. $\endgroup$ Commented Dec 10, 2022 at 17:09
  • 9
    $\begingroup$ As I remarked in CV too, this is Mathgen on steroids. $\endgroup$ Commented Dec 11, 2022 at 8:53
  • 5
    $\begingroup$ Somebody pointed out to me that ChatGPT seems to know something about what it means to prove things in Lean. I haven't explored this too much, but you can ask it to give a proof of XYZ in Lean, and it will give you actual Lean code (which may or may not actually work). $\endgroup$
    – Tim Campion Mod
    Commented Dec 11, 2022 at 19:55
  • 2
    $\begingroup$ @TimCampion Sounds like ChatGPT is cheating off of Lean's test. $\endgroup$
    – Todd Trimble Mod
    Commented Dec 12, 2022 at 6:56
  • $\begingroup$ @darijgrinberg I'm very far from knowing how GPT/ChatGPT work, but I've heard that such AI systems do have finite memory (as in Markov chains). $\endgroup$
    – Pablo H
    Commented Dec 13, 2022 at 14:02
  • 4
    $\begingroup$ I have failed to get a sensible answer when I ask it to two-color a cycle graph of five vertices. It happily reports the requirements of two-coloring (adjacent vertices have different colors). Then it happily suggests colorings like red,blue,red,blue,red. When I point out that the first and last are now red, it agrees it is incorrect, and offers another wrong one, ad infinitum. All in excellent English. So already the five-vertex cycle graph is something over the horizon of its inference capabilities. $\endgroup$ Commented Dec 14, 2022 at 14:00
  • 2
    $\begingroup$ I had quite a lot of conversations with ChatGPT today feeding it some unsolved problems, and some solved ones from here math.utah.edu/~bestvina/eprints/questions-updated.pdf and having a look at the answers. It is quite interesting but mostly non-nonsensical. However interestingly enough, when given an example and then asking ChatGPT to break it down to the definitions and pointing out the errors, it seems to be capable of adapting the answer. (it won't be stored for the next user though). $\endgroup$ Commented Dec 15, 2022 at 21:37
  • 1
    $\begingroup$ @Loreno, nonsensical or non-nonsensical? $\endgroup$ Commented Dec 17, 2022 at 18:23
  • 2
    $\begingroup$ nonsensical of course $\endgroup$ Commented Dec 17, 2022 at 20:27
