40

Stack Exchange is now partnering with OpenAI. While the company has received a commitment from OpenAI that attribution will be maintained, and the company has not relicensed our contributions (as of 2024-05-14), OpenAI clearly wants to do things (like training a transformer model) that aren't compatible with attribution.

If Stack Exchange were to end up liable for license violations then – after a period of 30 days, during which they could only "cure" the violation by making OpenAI destroy their latest model – Stack Exchange would no longer be authorised to use our contributions. So… in this hypothetical, what's the procedure for having our contributions removed from Stack Exchange?

The procedure is not:

  • DMCA takedown notice. DMCA's OCILLA "safe harbor" provision doesn't apply if Stack Exchange is the knowing violator, so we don't need to "under penalty of perjury" anything.
  • Mass deletion. The software doesn't let you delete well-received posts, and mass-deletion is currently considered vandalism.
  • Blanking edits (except as an interim measure). This is also currently considered vandalism, though I suppose we could change the rules if it came to that.
  • GDPR erasure request. Most of the time, contributions are not personal data: all this can achieve is waiving the attribution requirement for your contributions, pursuant to CC BY-SA 4.0 §3(a)(3), though it won't waive the ShareAlike provisions.
10
  • Attn staff: user1937198 has identified a potential license violation with much weaker preconditions than in the original version of this question. This might not be as hypothetical a situation as I thought.
    – wizzwizz4
    Commented May 8 at 17:30
  • 1
    The potential legal issue I pointed out is not a license violation but a GDPR violation. Rather than losing the right to host content, Stack Overflow only faces fines of up to 10% of annual revenue. Commented May 9 at 9:46
  • @user1937198 Yes, but per §3(a)(3) of the CC BY-SA 4.0 license, it is also a license violation in the event someone requests dissociation.
    – wizzwizz4
    Commented May 9 at 15:06
  • Why not a DMCA notice? Wouldn't that be the normal way? You are the original creator. You believe that the license is moot, so you send them a DMCA notice. They may disagree and then you sue them. Maybe this question would also be more suitable for law.SE. Commented May 10 at 8:20
  • 1
    @NoDataDumpNoContribution DMCA takedown notices are for when somebody else posted content that infringes on your own copyright. More generally, DMCA takedown is a process initiated when a copyright owner finds content online which was uploaded there without their permission. I very much doubt that uploading the content themselves would then count as "without permission". Moreover, the DMCA takedown procedure is to send a notice to a service, which the service can refuse. After such refusal you can pursue legal action. DMCA takedown notices don't carry obligations by themselves.
    – VLAZ
    Commented May 10 at 8:31
  • @VLAZ Technically I don't see much difference between "uploaded illegally" and "did not remove when it became illegal". The important point is that it is available in both cases although the creator thinks it shouldn't be, and the DMCA takedown notice is a step in the direction of rectifying that. If there were a dead-sure way to remove content just because someone wants that, that would be an even greater risk, and courts must be the last arbiter there. Who else can? Commented May 10 at 8:34
  • @NoDataDumpNoContribution I'm just explaining what DMCA takedowns are. Whether you can "withdraw consent" is something not really covered AFAIK. If that were the case, you still have to contend with the perpetual and irrevocable license you agreed to by signing up. All that's to say there is no <s>magic</s> legal "erase content" button to press. If you really want to pursue a legal path, that'd most likely involve legal action, which can be costly, even if you do get a court to rule in your favour. And there is no guarantee it would.
    – VLAZ
    Commented May 10 at 8:40
  • @NoDataDumpNoContribution I checked Law.SE for DMCA and withdrawing consent. The only really relevant thing I found was Can I get my answers deleted from Stack Exchange?
    – VLAZ
    Commented May 10 at 8:51
  • @VLAZ I kind of know how DMCA works, I did it once. I mentioned it with regard to the case of this question here. wizzwizz4 does not "withdraw consent" as far as I can see. The case is instead a possible automatic ending of the license by a violation of it. And in such a general case the recommended way to deal with violations may indeed end up in court. I just don't understand why wizzwizz4 rules DMCA out in this question. And yes, I believe the problem is indeed the other license as for example Franck Dernoncourt mentions in an answer below. However, it's downvoted. Commented May 10 at 9:18
  • @NoDataDumpNoContribution I ruled DMCA out because that's not what it's for. If it works anyway (even for pseudonymous users), that'd be an acceptable answer.
    – wizzwizz4
    Commented May 10 at 13:40

2 Answers

19

First, there needs to be a license violation.

The use of intellectual property in training AI/ML models is still an open legal question, at least in the United States. Historically, text and data mining has generally been seen as fair use. Although a case-by-case assessment may still be needed, it seems consistent to consider the use of protected works for the training of AI/ML models as text and data mining and, therefore, fair use. As such, copyright protections wouldn't apply.

Instead of intellectual property, you could consider this as a case of contract law. According to the University of Arkansas libraries, US federal courts have recognized CC licenses as legally binding contracts. If the license has been violated, then the contract has been broken.

Enforcement of intellectual property laws in the United States is difficult. All avenues require registration of the work with the Copyright Office. That is, you cannot go before the Copyright Claims Board or start a federal court case without first registering copyright as a prerequisite. This can get expensive, and depending on the age of the content, even registration may not allow you to recoup your legal fees if you win your case. Contract laws can be enforced through other courts, such as state courts. Since you aren't making intellectual property claims, you also do not necessarily need to register your work. This would depend on the exact strategies and you would need to consult a lawyer to determine which course is best.

Now, let's assume that you are going forward with one of these claims - either an intellectual property violation or a breach of contract. Either way, the next step will be to look at the specific terms of the license. In the case of the newest content on Stack Exchange, that is the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. Older content may be under an older license, but we'll focus on the 4.0 license to keep it simple. Similar concepts can be applied.

Stack Exchange has not violated the license, so you cannot take action. They are continuing to host the content. Each post is attributed to the original creator and the most recent editor (if any); the history page records the editors and their specific changes; and the specific license each post is available under is displayed in the "share" menu. All of the elements of attribution are satisfied by Stack Exchange.

There's also an open question about whether you are dual-licensing your content to SE. Some people interpret Section 6 of the Terms of Service as granting two licenses. They specifically point to the "Subscriber Content" section, which talks about the CC BY-SA 4.0 license and "the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content". Because these rights are inherently granted in CC BY-SA 4.0, I interpret this as an enumeration of the basic rights needed for the company to function should the public-facing license ever change, but this has never been addressed or tested.

Unless you can demonstrate that the company is sharing the content with other people and not maintaining these elements of attribution, then Stack Exchange is not in violation of the terms.

As the content creator, you would have to examine if the recipients of the content continue to follow the terms of the license agreement. Recipients, whether they receive the data by scraping the web pages, the API, the SEDE, or some other mechanism, all receive it under the CC BY-SA terms.

Unfortunately, there are ambiguities and open questions about the CC attribution clauses. In CC BY-SA, Section 3(a)(2) says that the person using the content may "satisfy the conditions ... in any reasonable manner based on the medium, means, and context" and it gives the example of "providing a URI or hyperlink to a resource that includes the required information". There may not be a need to explicitly attribute the individual posters: being able to connect a generated block of text to hyperlinks at the SE question level, where the rest of the licensing and attribution data can be found, may be enough.

If anyone, SE or otherwise, violates the license, the process would be:

  1. Follow the Creative Commons advice and send a request to either correct the violation or remove the content.
  2. If the content is not removed or there is a dispute, consult with a lawyer. They can review the situation and tell you what the next steps could be. This will likely cost money, although some lawyers do offer free consultations. There may be added expenses associated with filing court cases, documentation, copyright registration, and so on.
5
  • 1
    What if I can prove that I'm the author of certain content but there is no way to prove that I actively agreed on any Terms of Service? I haven't signed anything or used some means to prove my RL identity to the site, so how can some Terms of Service be legally binding? I can just say that I wasn't informed about them and the network can't prove otherwise. Even if they send out an email or have you click some "I accept" button, that's still not proof that the particular information has reached the intended person. Maybe it was someone else clicking the "I agree" button without my knowledge.
    – Lundin
    Commented May 8 at 6:55
  • 1
    I think that the OP was probably thinking about this comment about the UK ICO guidance that apparently states that "because SO was responsible for the initial consent, they are also responsible for getting any third partys they sold the data to to delete any personal data on request" Commented May 8 at 10:03
  • 7
    @Lundin I'm not sure how other jurisdictions view it, but in the US, creating an account constitutes acceptance of the terms. Terms usually have clauses allowing revision, so you're also agreeing to continue to accept the terms as they change. By simply logging in and posting content, you're agreeing to the terms of service. Ignorance through failing to read and understand the terms is not a defense. And with IP logging and SSO, if you're claiming that you posted the posts, it's reasonable to assert that the same person created the account. These would be seen as very weak arguments. Commented May 8 at 10:46
  • 1
    @SPArcheon That would probably be a good question for Law. There's a conflict here between GDPR and the attribution requirements of CC BY-SA. The license terms require that the license holder make the request to remove the attribution information. I'm not nearly familiar enough with UK (or any European) law to know how that would fall out. Commented May 8 at 10:48
  • 1
    law.stackexchange.com/q/91785/41063
    – endolith
    Commented May 11 at 17:52
-11

How do I remove my contributions from Stack Exchange, in the event of a license violation?

Stack Exchange currently dual-licenses user content (and it's been like this for over 13 years), which allows them to sell user content without attributing users. Therefore, if OpenAI don't do attributions, Stack Exchange still doesn't violate the license.

14
  • 5
    If that were the intention of the ToS, it wouldn't say “as reasonably necessary to”. The only thing they've listed that comes close to this is “Aggregate data to provide product optimization”, and training a transformer model clearly doesn't fall under that. Besides, they don't have the right to sublicense except under CC BY-SA 4.0.
    – wizzwizz4
    Commented May 7 at 16:54
  • 11
    This remains an open question. Some people are convinced there is a dual license. I believe that it is an enumeration of the minimum rights that SE needs to function, which are currently covered by CC BY-SA 4.0. Unless SE legal clarifies or someone tests it in court, we may never know for certain if there's one license or two. Commented May 7 at 16:56
  • 10
    I don't believe that we've agreed to a second licence which permits SE (or their partners) to republish the content we created without giving adequate CC attribution. OTOH, I agree that the situation is legally murky, and meta.stackexchange.com/questions/388571/…
    – PM 2Ring
    Commented May 7 at 19:18
  • 1
    @PM2Ring "I don't believe that..." It may be the case that you don't believe it but it is also true. The TOS is the only binding document for all of SE. A dual licensing is either in there or it isn't. The voting on this answer may represent more wishful thinking than actual legal expertise and I wish we would post these questions on law.SE in order to get more expert opinions. Commented May 10 at 8:24
  • 2
    @NoDataDumpNoContribution A contract is a meeting of the minds, and contract ambiguities are generally resolved in favour of the party who didn't draft the contract. It's not obvious that there's a dual license, and if there is a dual license, the other one is not a carte blanche "you can do anything with this". The version of the ToS from 13 years ago looks even less like a dual license. Representatives from Stack Exchange have consistently said "CC BY-SA/wiki", but afaik none of them have ever said there's a dual license. I have never intended to grant a dual license to Stack Exchange.
    – wizzwizz4
    Commented May 10 at 13:45
  • 1
    Believing there's a dual license is a valid opinion, but it's also a minority opinion – not a fact. Anything is possible in a court of law, so we shouldn't dismiss this view out-of-hand, but representing it as a fact is inaccurate. I expect that's why this answer has been downvoted.
    – wizzwizz4
    Commented May 10 at 13:46
  • 1
    @wizzwizz4 I'm not a lawyer and I don't want to give any legal advice. Everything is possible. To me it looks like the dual license is and was obvious enough. My contributions were always with this in mind (even though I also didn't realize it until some time in but that's my fault). Do with it what you want but my opinion is that votes on this answer may be more what people desire to be true. But maybe I'm wrong and people really looked at the TOS and said: clearly no dual license. Who knows. Commented May 10 at 14:07
  • @NoDataDumpNoContribution I never knew there was a dual license until now. I always thought it was CC BY-SA, as it says at the bottom of every page and in the ToS. The "commercial exploitation" language that implies another license was added to the agreement at some later time, without my knowledge.
    – endolith
    Commented May 11 at 18:26
  • 1
    @endolith We can find out if it was added later by looking up the version of the TOS at the time point when you signed up. And another question would be if you got notified every time they changed their TOS with the disclaimer to end your usership should you not agree to the changes. I don't remember if I got such a message in 2018, because I didn't keep these messages. Commented May 11 at 18:39
  • 1
    I signed up in 2009-03, and I don't see any ToS at that time in Internet Archive. In 2010, it said "You grant Stack Overflow the … license to … create derivative works and store such Subscriber Content and to allow others to do so (“Content License”) in order to provide the Services" which doesn't to me imply that they can sell the content under a non-CC license, just that they are free to cache files and such in order to run the servers. I ran the ToS through ChatGPT and it doesn't think so either:
    – endolith
    Commented May 12 at 22:23
  • 1
    @endolith I wouldn't put any weight to the output of running a ToS through an LLM. and you've (again) missed the words "without limitation"
    – starball
    Commented May 12 at 22:27
  • @super-starball-ultra Indeed, I don't see those words anywhere in the Subscriber Content section. Where do you see them? Are you claiming that they invalidate the words "in order to provide the Services"?
    – endolith
    Commented May 12 at 22:32
  • @endolith You really use an LLM as a substitute for a lawyer? Not sure if that can end well. That is probably the core of the no generative AI policy here. You cannot trust them. Commented May 12 at 23:20
  • 1
    @NoDataDumpNoContribution Just use an LLM Judge and that'll be fine. Commented May 14 at 19:44
