Is there an SPDX or other widely accepted license header syntax/format that prohibits machine learning?

My company's GitHub license statement includes:

"Licensee is not granted the right to, and Licensee shall not ...[use company's open source code to] ... train Artificial Intelligence (AI) models, including language models, programming models, or other type of model, or any other automated or manual training of deep / multi-layer neural networks ... the intent of this License is for human collaboration and teamwork, and not for any AI or machine training use"

But I have no doubt that LLM training crawlers ignore this. I'm looking for a simple, one-line header, in SPDX or another syntax, that they would observe. Failing that, it should at least be clear to reasonable, impartial observers that they should have observed it.

I found one other relevant thread, but it does not ask the same question.

Edit: regarding fair use, for every link saying LLM training may be fair use (e.g. the one in @DaleM's answer), there is another saying it may not. So I would like to clarify my question: hypothetically, setting aside the issue of fair use and assuming that at some point source code licenses must be factored in (as they are for humans), is there a tentative or emerging simple header line that can be added to source code files that effectively says "training prohibited", even if it might later be shot down in the courts?

  • 3
    Your main question might be a better fit for opensource.stackexchange.com, since it doesn't seem to be so much a legal question as one about available licenses, and the folks on OpenSource might know better (might, I don't know, so please make sure it would even be on topic there).
    – terdon
    Commented Mar 26 at 16:33
  • @terdon, thank you for that advice Commented Mar 26 at 16:47
  • 3
    I don't believe this question would be accepted on opensource.SE. Prohibiting use by LLMs is discrimination against a field of endeavor, which would make any license with such a prohibition not open source, and so anyone asking OS.SE how to do this would just be shown the door.
    – jwodder
    Commented Mar 26 at 16:54
  • 1
    @jwodder, well I have been to open source conferences where a presentation broke down as this point was argued. The tech equivalent of a food fight. Maybe I can post there very gently like "in the event common sense does not prevail, what can we do to prepare" :-) Commented Mar 26 at 17:07
  • @terdon, update - I want to add there are other similar discussions going on, here is one on opensource.stackexchange.com Commented May 23 at 14:37

1 Answer

No

It’s quite likely there never will be.

It is an open question whether copying for the purpose of training an AI model is fair use. If it is, then any prohibition by the copyright holder will have no legal effect: fair use is not copyright infringement, so no license is needed and the license terms never come into play.

There are a number of ongoing cases. When they’re decided, we’ll know. For what it’s worth, I think the courts will decide that it is fair use.
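
For reference, here is what a header would look like mechanically if someone wanted to try anyway. SPDX has no listed identifier for "training prohibited", but its syntax does allow custom identifiers of the form `LicenseRef-<name>` for licenses that are not on the SPDX License List. The sketch below is only an illustration under assumptions: the identifier `LicenseRef-NoAITraining` and both function names are hypothetical, and nothing here suggests that any crawler actually looks for such a line.

```python
import re

# Hypothetical custom identifier; it is NOT on the SPDX License List.
HYPOTHETICAL_ID = "LicenseRef-NoAITraining"

# Matches an SPDX header line inside a comment, tolerating a trailing
# block-comment terminator such as "*/" or "-->".
SPDX_LINE = re.compile(
    r"SPDX-License-Identifier:\s*(?P<expr>.+?)\s*(?:\*/|-->)?\s*$"
)


def spdx_expression(source_text):
    """Return the first SPDX license expression found in the text, or None."""
    for line in source_text.splitlines():
        match = SPDX_LINE.search(line)
        if match:
            return match.group("expr")
    return None


def uses_custom_license(expression):
    """True if any token in the expression is a custom LicenseRef- identifier."""
    tokens = re.split(r"[\s()]+", expression)
    return any(token.startswith("LicenseRef-") for token in tokens)


example_file = "# SPDX-License-Identifier: " + HYPOTHETICAL_ID + "\nprint('hello')\n"
expression = spdx_expression(example_file)
print(expression, uses_custom_license(expression))
```

A `LicenseRef-` identifier only points at license text that you supply yourself, so a header like this would at most make the restriction easy to detect. Whether it has any legal effect still depends on the fair use question above.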

