
Stack Exchange has announced that it will introduce, as a preview, a new form of question-asking assistance:

Through an updated semantic search experience, after a user searches or asks a question in the Stack Overflow search bar, we can leverage AI to provide a results summary that draws from multiple high-quality answers on Stack Overflow, in addition to providing the traditional search results list of questions and answers.

An accompanying screenshot shows a heading saying "Search results", followed by a summarizing text that is, as I understand it, produced by an Artificial Intelligence agent. Below that follows a list of links, with a header "Sources", that lists the answers used to produce the summary.

It is relatively clear that the list is covered under fair use. But what about the summarizing text? To produce it, the whole content of all listed answers needs to be processed, making it adapted material. Would this form of processing be considered "fair use"?

Note this is different from using large databases (in this case, the whole of the SO content) to train the AI. The already-trained AI processes a relatively small number of answers (maybe up to 50) to give a summary of their content.

The practical difference this would make lies in the fact that answers on the Stack Exchange network are licensed by their authors under a CC-BY-SA license. While the attribution requirement is obviously fulfilled by the listing of the sources, the question remains whether the share-alike clause needs to be respected. Must the result summary text also be licensed under a CC-BY-SA license?


This Q&A does give a general overview of how to determine fair use, but does not explain how to apply the tests (especially no. 3, substantiality) to the above case.


2 Answers

2

The standard answer to a question about fair use is to recite the fair use defense, with a heavy dose of "it depends on whether...". There are non-legal questions which are outside the scope of Law SE (the exact technology for extracting text summaries from a small database and how one might tweak the numbers in a huge database given an analysis of a small set of texts). The sample extract-texts are highly probable word sequences when writing about the particular topic, and therefore not demonstrably derived from any particular source.

The processing that SE does in order to create this database is within the scope of the license granted to the network, so SE creating the summary is allowed. The summary texts are sufficiently associated with protected source text. If the content in question is "created by SE", then it cannot be copied without permission. However, copyright protection only applies to human-created content. The fruits of bot-labor are not protected by copyright, therefore SE cannot sue users for copying text that it created with an AI. In other words, the generated text is outside the scope of copyright law and licensing requirements.

1

It’s unlikely the summary is an infringement

First, it can’t be a derivative or an original work because it has no human author.

If it did have a human author, it would most likely be an original work, not a derivative. A summary of a copyrighted work is an original work of creation, not a derivative of the summarised work. Unless it’s a copy.

The test for whether it’s a copy is if a substantial portion of the subject matter has been copied - a qualitative test. This is why I say it’s unlikely rather than being certain - some summaries will cross the line into being copies, but most won’t.

As you observe, the CC-BY-SA gives SE the right to make copies anyway, so long as they give attribution, which they have.

Because these summaries are machine generated, there is no copyright in them, so there is no need for a licence for anyone to use them. If they are infringing copies, then they are copies of the original work(s) that have already been licensed by the original authors.

2
  • My use of the word "derivative" was probably wrong. The license itself speaks of "adapted material" to cover software-based transformation. I've added a link to the question.
    – ccprog
    Commented Jul 27, 2023 at 22:47
  • @ccprog adapted material IS a derivative work.
    – Trish
    Commented Jul 28, 2023 at 0:13
