
For my bachelor's thesis, I need to download ~10 TB of data from Twitch via TwitchLeecher. I would like to use it for deep learning (emotion recognition), which means processing the data and deleting it afterward. I don't plan on publishing the trained network itself, making any profit from it, or mentioning their name (should I mention them?).

I asked the developer of TwitchLeecher beforehand, and he advised me to abandon the project or else get a good lawyer or ask every streamer and Twitch for permission. I've tried asking a couple of the streamers and Twitch, and got no answer at all (which was to be expected, I guess, and will probably not change).

Is this fair use, and is there a feasible way to make sure I don't get into legal trouble for this?

  • There are two concerns: (a) May you analyze the data? Probably so, as argued in the answers. (b) May you download the data? That depends not only on copyright but also on Twitch's ToS. 10 TB sounds like it may run afoul of reasonable-use policies.
    – amon
    Commented Apr 16, 2019 at 16:36
  • @amon I agree on (b). I've been looking at the ToS for a while now, and there seem to be multiple clauses that would restrict my use, mainly: circumventing the restrictions on copying data by using automation (TwitchLeecher).
    – WhatAMesh
    Commented Apr 16, 2019 at 18:48

2 Answers


The first question is whose law you are concerned with, since in principle you might have violated copyright law in any country, and might be sued under the laws of multiple countries. The US has a concept of "fair use", which is notoriously difficult to apply. If you are sued in the US, you can defend against the allegation by arguing certain things: telegraphically, these include the purpose and character of use, the nature of the work, substantiality in relation to the whole, and the effect on the market. Plus there is a fifth factor to be considered: transformativeness. The court then weighs these factors to decide whether the use is "fair". By reading existing case law on the topic (conveniently available from the US Copyright Office), you might develop a fact-based opinion of the risk, but you would be vastly better off hiring an attorney who specializes in US copyright law to do an analysis for you. Do not hire a programmer to give you legal advice (and do not hire an attorney to debug code).

You would "fail" on the test of substantiality, in that you are copying a highly substantial portion of the original work(s). You would "win" on the purpose of use (research, especially non-profit research, and commentary are the underlying purposes that drive fair use law). It's not clear how you would fare with respect to the nature of the work, a factor intended to distinguish the extremes of "news report" and "literature and artistic work", where copying news is at the fair use end of the spectrum. It is also not clear how you would fare on "effect on market", but probably not badly: are you avoiding some licensing fee? Coupled with the transformativeness consideration, you are most likely having no effect on the market, since the product that you will distribute is not the original work but a scientific conclusion about the work.

Germany has different laws, and this article would be relevant if you cared about Germany. There was a change in the law that expanded the analog of fair use pertaining to research use. That law allows 15 percent of a work to be reproduced, distributed and made available to the public for the purpose of non-commercial scientific research. That, by the way, does not apply to what you are planning to do (unless you also publish quotes); for personal scientific research, you may reproduce up to 75 percent. Since this is a new law, only a year old, you could become part of the cutting edge in testing its limits. So the standard disclaimer applies: ask your attorney. But note section 60d of the law, which legalized data mining and is squarely on point:

(1) In order to enable the automatic analysis of large numbers of works (source material) for scientific research, it shall be permissible

  1. to reproduce the source material, including automatically and systematically, in order to create, particularly by means of normalisation, structuring and categorisation, a corpus which can be analysed and

  2. to make the corpus available to the public for a specifically limited circle of persons for their joint scientific research, as well as to individual third persons for the purpose of monitoring the quality of scientific research.

In such cases, the user may only pursue non-commercial purposes.

(2) If database works are used pursuant to subsection (1), this shall constitute customary use in accordance with section 55a, first sentence. If insubstantial parts of databases are used pursuant to subsection (1), this shall be deemed consistent with the normal utilisation of the database and with the legitimate interests of the producer of the database within the meaning of section 87b (1), second sentence, and section 87e.

(3) Once the research work has been completed, the corpus and the reproductions of the source material shall be deleted; they may no longer be made available to the public. It shall, however, be permissible to transmit the corpus and the reproductions of the source material to the institutions referred to in sections 60e and 60f for the purpose of long-term storage.

  • @Putvi Upvote/Downvote doesn't mean "agree" or "Disagree". It just means "This answer was useful" or "this answer was not useful" respectively.
    – Brandin
    Commented Apr 17, 2019 at 4:57

As long as you are using it for your own use and do not distribute anything, you are fine.

  • This answer is too short. If you wanted to answer in this way, a comment would have been better. To supply an answer, you need to expand on it; otherwise it will not be accepted as a good answer.
    – Brandin
    Commented Apr 17, 2019 at 4:55
