This is something I've been curious about for some time. Say I programmed a web crawler to download a bunch of pictures of pets from pet websites. (I'm not asking about the legality of this, but as long as I stay within the TOS I think this would be fine.) Many of these images would be under copyright.

What if I then took these images and trained an ML model on them? What if I then sold the ML model? What if the source of the images stated that they could be used only for non-commercial purposes?

1 Answer

Overview

There are at least two issues here:

  1. Is the use of the images a use of copyrighted content of a sort that normally requires permission? Or is the output a derivative work (or works) requiring permission to create?

  2. If the answer to 1 is "yes", would an exception to copyright, such as "fair use", apply?

I ignore, as the question asks, any violations of the TOS document(s) of the site(s) where the images were obtained.

Does Copyright Apply at All?

Depending on exactly how the images are used, and what is contained in the model derived from them, there is an argument that there is no copyright issue here at all.

Images posted to the net (or otherwise published) have an implicit license for anyone to view them at the very least. Copyright never protects ideas.

If the original images cannot be re-extracted from the model, then it is hard to assert that the model is a "copy" of any or all of those images. If removing any one image from the training set would result in much the same model, it is hard to claim that any ultimate output work is a derivative work of that image, or that it is "based on" that image.
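To make the "removing any one image" point concrete, here is a toy sketch. It is not how a real image model is trained: the "model" is just the per-pixel mean of the training set, and the data, the sizes, and the function names `train` and `leave_one_out_shift` are all made up for illustration. It only shows what it means for a model to be nearly unchanged when a single training item is dropped.

```python
import random

def train(images):
    """Toy 'model' (an assumption for illustration): the per-pixel mean."""
    n = len(images)
    return [sum(pixel) / n for pixel in zip(*images)]

def leave_one_out_shift(images, index):
    """Largest per-pixel change in the model when one image is dropped."""
    full = train(images)
    reduced = train(images[:index] + images[index + 1:])
    return max(abs(a - b) for a, b in zip(full, reduced))

random.seed(0)
# Hypothetical data: 1000 "images" of 16 pixels, each value in [0, 1).
images = [[random.random() for _ in range(16)] for _ in range(1000)]
shift = leave_one_out_shift(images, index=0)
print(f"largest change after removing one image: {shift:.6f}")
```

For this toy model the change is bounded by 1/(n-1) of the pixel range, so it shrinks as the training set grows. Whether a real trained network behaves similarly for any given image is an empirical question, and it is a separate question how a court would weigh that.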

Based on this, there seems to be no valid copyright claim at all, without even getting into the tricky land of copyright exceptions. If this is correct, no permission is needed and the copyright owners of the original images have no control over the model or its output.

Fair Use

Assuming for purposes of argument that the model does constitute a "copy" or that the output images are derivatives of the source images, then does fair use apply? Note that fair use is a strictly US legal concept, so this question is only relevant if suit is brought in a US court, not in the courts of some other country.

Fair use is defined in 17 USC 107. Under it a court deciding on a fair use claim must consider the four statutory factors, and may consider other unspecified factors as well. The four statutory factors are:

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.

Note that all four factors must always be considered. In some cases, however, some factors weigh more strongly than others. Now in this case:

  1. The use is for a commercial purpose, which leans against fair use. But the use seems to be highly transformative, and that leans toward fair use.
  2. The copyrighted works are presumably creative and not factual, which leans against fair use.
  3. If the various images were separately contributed, the whole of each is being used, which leans against fair use. However, if they formed parts of a single work and were all created by a single author (John Blow's Ten Thousand Pets I Have Known) then this factor would need to be evaluated based on how much of that work was used. But assuming each source image was independent, this leans against fair use, it seems to me.
  4. This depends on how the output is expected to be used. If the model will be used to produce images which might compete against the source images (should those ever be marketed) this leans against fair use. If it is to be used to produce 3-D models that have a quite different market, this probably leans toward fair use.

It is not at all clear whether a fair use claim would succeed or not. Details of the facts would matter, and it might well be a judgement call for the court considering the issue. I don't find any published case that is on point here.

Other US Issues

In a US court, a copyright suit cannot be brought unless the work has been registered first (if the work is a "US work", that is one published in the US by a US national). If registration is not made until after the claimed infringement, statutory damages and attorney's fees may not be available. In the absence of statutory damages, there would need to be evidence of commercial value and actual damages, or of profits made by the alleged infringer. Most online pet photos are not registered with the US Copyright Office.

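There is also a 3-year statute of limitations for civil copyright claims (17 USC 507(b)); the 5-year period applies only to criminal proceedings. The period is measured from when the claim accrues, which is generally the date of the infringement.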

Other Exceptions to Copyright

Countries other than the US have various exceptions to copyright of their own. Some have exceptions for "research". The UK has "fair dealing", which is somewhat similar to US fair use but is more limited. India, for example, specifies more than 27 separate exceptions. Such exceptions would need to be researched for each country of interest, if one is relying on an exception.

Use of Public Domain images

One could, instead, use images already in the public domain. Images published in the US, say in books or magazines, prior to 1964, are now in the public domain under US law if the copyright was not renewed 28 years after publication (plus or minus 1 year). Estimates are that less than 10% of publications were so renewed. Project Gutenberg includes lists of all renewals (originally published by the US Copyright Office). With these one can determine whether a copyright was renewed.

Works published in the US prior to 1925 are now all in the PD, and that cutoff moves forward by a year each January 1 as the 95-year term expires for later works.

Any PD work could be scanned and used. That is more work than a download, but there is no copyright issue and no TOS issue either.
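The two rules above can be written down as a small check. This is only a sketch of those two rules as stated here, for works first published in the US; it ignores every other way a work can be in or out of copyright. The function name `us_pd_status` and its parameters are made up for illustration, `renewed` is assumed to come from looking the work up in the renewal records, and `cutoff_year` is passed in because the cutoff changes over time.

```python
def us_pd_status(pub_year, renewed, cutoff_year):
    """Rough public-domain check for a work first published in the US.

    pub_year:    year of first US publication
    renewed:     True if a renewal was found in the renewal records
    cutoff_year: works published before this year are all PD
                 (supplied by the caller; it advances over time)
    """
    if pub_year < cutoff_year:
        return "public domain (published before cutoff)"
    if pub_year < 1964 and not renewed:
        return "public domain (not renewed)"
    return "may still be under copyright"

print(us_pd_status(1910, renewed=False, cutoff_year=1925))
print(us_pd_status(1950, renewed=False, cutoff_year=1925))
print(us_pd_status(1950, renewed=True, cutoff_year=1925))
```

Anything that comes back as "may still be under copyright" would need a closer look before use.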

Conclusion

Copyright probably does not prevent the sort of use described in the question, but if it does, an exception to copyright such as fair use may or may not apply. Images not protected by copyright could be used.

  • "If removing any one image from the training set would result in much the same model, it is hard to claim that any ultimate output work is a derivative work of that image" The word "much" is doing a lot of work here. The trained weights will always be dependent on the input data; to what extent any particular output is dependent on any particular input is impractical (perhaps impossible) to determine. One can see many examples of the generation of such "derivative" images at this X does not exist.
    – Dave
    Commented Nov 6, 2021 at 11:36
  • In particular this art work does not exist has images that seem to include themes from copyrighted art, in a similar way to music that has been found infringing. It seems an open question how copyright applies here.
    – Dave
    Commented Nov 6, 2021 at 11:38
