2

If there are two significantly similar pieces of fiction, but no word-for-word identical sentences, is it possible to use artificial intelligence to prove the similarities (or to reverse engineer the process, in the case where artificial intelligence was used to write one and the other was used as a prompt to train the program)? What is the law concerning non-consenting parties whose work was used to “teach” the machine?

  • AFAICT, the use of copyrighted materials in training AIs is legally a bit ambiguous. I lean toward "it's allowed since it's not forbidden, and it's not forbidden since it wasn't foreseen".
    – MSalters
    Commented Nov 21, 2022 at 9:34

2 Answers

2

In such a case, a court would generally look at the result. If the later work is found to have "substantial similarity" to the earlier work, or to be a derivative work of the earlier work, and if fair use (in a US court) or another exception to copyright (elsewhere, including fair dealing in the UK) is not found, then the court might well find that there was an infringement of copyright.

Note that both "substantial similarity" and "derivative work" are intentionally somewhat vague terms in copyright law, allowing case-by-case decisions and flexibility.

The court will probably care little about whether or how an AI was used. It is the output, not the input, that will be the issue, I think. I do not know of a case with exactly this fact pattern, however.

  • An independently created work where the author had no access to the original is not copying. If an AI program or an un-artificial intelligence created a very similar poem having no access to the original it is not copying. The input matters. Commented Nov 22, 2022 at 0:08
  • @George White It is true that creating a work that is very similar, even perhaps identical, to an earlier work without any use of that work is not copying. It is also true that where the similarity is substantial, a court will usually find prima facie copying and leave it to the defendant to prove lack of access. There have been cases of infringement found with no proof of access. But in the scenario here, there was access, so the detailed nature of the access and the use or non-use of an AI will probably weigh little with the court. Commented Nov 22, 2022 at 0:16
  • I generally agree with your point about AI. But it is not true that the input doesn’t matter. In the case of alleged copying by a person, it is hard to establish one way or the other whether they were exposed to the original. In the case of a program, it could be provable that inputting only poems A-G will output poem Z. Then the input matters more than the output. Commented Nov 22, 2022 at 0:23
  • @GeorgeWhite So if that roomful of monkeys banging away at keyboards (en.wikipedia.org/wiki/Infinite_monkey_theorem) finishes my novel before I do, I'm hosed? Commented Nov 30, 2022 at 18:25
  • Change the ending Commented Nov 30, 2022 at 21:27
0

There seem to be a number of questions here. One is: suppose a program actually generates a literary text, which happens to be sufficiently like another, human-created text. Then who holds copyright? The human-created text is protected by copyright; the machine-created text is not, in the US, unless and until the courts say that computer-generated text is copyrightable (unlikely, without an act of Congress). The monkey-selfie case clarified that only human outputs are protected.

The second question is whether a program could be used to address the question of whether there was copying. Logically, if it seems that two works are similar, the similarity could be a coincidence, or A might have copied from B (or the opposite). This is a factual question which ideally involves expert testimony (which incidentally can only be provided by a human). The expert can and really should rely on some computer program that instantiates scientific knowledge about the structure of texts. The expert could testify that it is highly probable that certain similarities are coincidental (the presence of the sentence "Well, then, let's go!") or that it is highly improbable that the texts arose independently (the text "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way—in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only"). Scientific evidence is no more legally obligatory in copyright infringement cases than DNA is in criminal cases; it's just harder to disagree with a one-sided, cold, hard scientific argument. Without such evidence, courts may rely on ideas like "striking similarity", "probative similarity" or "substantial similarity" as a subjective standard for determining whether copying took place. A program would instead give you a number, such as a p-value.
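To make "a program would give you a number" concrete, here is a minimal sketch in Python. It is purely illustrative and is not a method any court or expert is known to use: the function names (`shingles`, `jaccard`, `empirical_p_value`), the choice of three-word sequences, and the idea of a reference set of independently written texts are all assumptions made for the example. It measures how many three-word sequences two texts share, then reports how often texts in the reference set are at least that similar to the original.

```python
import re

def shingles(text, n=3):
    """Return the set of word n-grams found in `text` (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def empirical_p_value(suspect, original, reference_texts, n=3):
    """Return (observed similarity, share of reference texts at least as similar).

    `reference_texts` is a hypothetical sample of texts assumed to have been
    written independently of `original`.
    """
    target = shingles(original, n)
    observed = jaccard(shingles(suspect, n), target)
    scores = [jaccard(shingles(t, n), target) for t in reference_texts]
    at_least = sum(s >= observed for s in scores)
    # Add-one smoothing so a small sample never reports exactly zero.
    return observed, (at_least + 1) / (len(scores) + 1)

original = "it was the best of times it was the worst of times"
suspect = "It was the best of times, it was the worst of times, truly"
references = ["Well, then, let's go!", "the cat sat on the mat",
              "we went to the market today"]
print(empirical_p_value(suspect, original, references))
```

The second number is only loosely a p-value: it means something only if the reference texts are a fair sample of independent writing in the same genre, and with three reference texts it cannot go below 0.25. An expert would still have to explain and defend every such choice.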

Such expert testimony would be subject to an admissibility challenge, where the opposing side would argue that the supposedly scientific methodology was no better than entrail-divination. So far, no court has allowed ChatGPT dialogues to be admitted as expert testimony. Courts routinely allow experts to base their testimony on the output of statistical programs, because such programs are reliable for that purpose. Ultimately, any computer-assisted testifying requires the expert to understand the logical basis for asserting "this was copied" vs. "this is just a coincidence".
