3

I am considering using audio soundtracks as training data for a machine learning model. The audio files are speech excerpts taken from TV programs posted on streaming platforms such as YouTube. The trained model can output sounds quite similar (in terms of style, tonality, etc.) to the samples used during training. By the nature of the model, it can produce results that are quite similar (but not identical) to the training samples when given the right prompt. However, these "right prompts" are generally kept hidden from users.

Could anyone explain how copyright law applies in these situations:

  1. The output of the trained model will be used commercially on a website.
  2. The website has a "donation" option to help finance the service, such as its backend servers. Is this considered "non-commercial" usage and fair use? Many thanks!
2
  • Which jurisdiction are you interested in? Some jurisdictions provide explicit exceptions/conditions for this.
    – xngtng
    Commented Jun 9, 2021 at 11:09
  • 1
    As this service will be run on a globally accessible website, am I correct to understand that there is a more universal set of regulations? Or, if such regulations depend on where the owner of the website/service resides, then I would be interested in those of the United States.
    Commented Jun 9, 2021 at 18:38

2 Answers

2

The only thing close to a world-wide copyright law is the Berne Copyright Convention, but it leaves many details to the laws of individual countries.

"Fair use" is a concept only under US copyright law. There is a somewhat similar but narrower exception to copyright known as "fair dealing" in the UK and some other countries, mostly Commonwealth countries. There are somewhat similar exceptions under the laws of other countries, mostly for educational and news reporting purposes.

A copyright owner may sue for infringement in any country where the infringing work has been published or distributed, or other acts of (alleged) infringement have occurred. It is often easier to enforce a judgement if one sues in the country where the defendant lives or works, or some place that has jurisdiction over the defendant.

Under US law the question of whether a service is commercial or not has only limited impact on whether it may rely on fair use or not.

If the copyright owner has published the audio files, that gives anyone permission to use them, at least under US law and, I think, under the laws of most if not all other nations. The right to use a work, or to authorize its use, is not one of the rights of the copyright owner under 17 USC 106, nor under article 5 of Berne.

Since anyone has the right to use these audio files, anyone may use them as input to an algorithm or automated process. That the output of such an algorithm and its associated data set could be similar audio, when this is not the normal or expected output, would not make this a form of copying in my view, but I don't know of a case exactly on point for this. If I am correct, there is no copyright infringement here, and no need to even discuss any fair use claims.

1
  • The critical question would be if either a trained CNN or the output of such is a derivative work of the input. The most relevant cases could be the various music copyright cases, but the differences are significant.
    – Dave
    Commented Jun 11, 2021 at 8:54
0

It depends on what the material is and how you get it. First, I assume that you are only mining material posted by the copyright holder – as you may know, a lot of pirated material is available and being available on Youtube is not a guarantee that the material is non-infringing (it's only a guarantee that the uploader claimed that it is non-infringing). Second, it depends on the license terms. Youtube says that

You are not allowed to: access, reproduce, download, distribute, transmit, broadcast, display, sell, license, alter, modify or otherwise use any part of the Service or any Content except: (a) as expressly authorized by the Service; or (b) with prior written permission from YouTube and, if applicable, the respective rights holders

In the past it was easier to determine whether permission had been granted, but not so much anymore. However, the uploader grants other users a license to access the content only through the service, and not by other data-mining techniques. So data-mining from Youtube would be without permission, hence potentially infringement.

W.r.t. US copyright law, this brief touches on aspects of fair use and data mining. It gives the impression that data mining is allowed under fair use, without mentioning whether any cases of data mining were found not to be fair use and, if so, what distinguishes such uses (this is where rules like the "one chapter / 40 page" rule came into existence). It's more likely than not that it is indeed allowed under fair use, though that does not prevent Youtube from blocking you for violating their TOS.
