I'm trying to build a large dataset of images for training a machine learning model. My idea is to generate these images by grabbing frames off YouTube videos. My question: would this violate copyright laws if the frame comes from, say, a movie? Would I be safe to openly distribute this dataset for other researchers?

  • It depends in large part on your jurisdiction, so I think you should include that information. Even then, I'm not sure if this would be on topic here; if it's not, it might fit on Law or Open Source.
    – David Z
    Commented Nov 9, 2017 at 4:41
  • Have you read YouTube's policies about botscraping and research use? Why are you asking the internet at large when the actual company is right there to email and call?
    – Nij
    Commented Nov 9, 2017 at 6:46
  • Doesn't Google "botscrape" the entire internet (this very stack exchange post included) constantly so it knows what information exists where? That's exactly what I'm trying to do: look at the content offered by YouTube so I can learn something from it. Commented Nov 9, 2017 at 13:28
  • In addition to copyright questions and Terms of Service questions (which, regardless of what Google does, merit attention), there are additional ethical questions to consider. This article is a good example of the ethical implications of using public data for research. I also recommend reading the Association of Internet Researchers' 2012 ethics statement.
    – user60728
    Commented Nov 10, 2017 at 2:21
  • @rodrigo-silveira and they can afford the lawyers when there is a dispute, which you cannot
    – Greg
    Commented Nov 10, 2017 at 16:55

1 Answer

According to this: http://tubularinsights.com/youtube-copyright-ownership/

Individuals, organizations, and companies still retain at least some rights to their own videos most of the time.

So yes, if you use licensed material (e.g. grabbing frames from videos of movies) without first seeking permission, and the original owners find out, you could get into trouble.

You could handpick the videos you want to scrape (or create playlists of them), making sure that their content is not commercially licensed or is used with the permission of the original creators. (I think you typically document this in the publication that follows or describes your dataset, if you choose to write one.)
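As a rough illustration of the handpicking step, here is a minimal Python sketch that filters a hand-built list of candidate videos before any frames are extracted. Everything in it is an assumption made for illustration: the `VideoRecord` fields, the license labels, and the helper names are hypothetical and do not correspond to any YouTube API, and passing this filter says nothing about whether a given use is actually lawful in your jurisdiction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoRecord:
    """Hypothetical metadata you record by hand for each candidate video."""
    video_id: str
    title: str
    uploader: str
    license: str  # hypothetical label, e.g. "creative-commons" or "standard"


# Assumption: only videos with these labels are kept without explicit permission.
ALLOWED_LICENSES = frozenset({"creative-commons"})


def select_videos(candidates, allowed=ALLOWED_LICENSES, permitted_ids=frozenset()):
    """Keep videos whose license label is allowed, or whose creator
    gave explicit permission (listed by video id in permitted_ids)."""
    return [
        v for v in candidates
        if v.license in allowed or v.video_id in permitted_ids
    ]


def source_lines(videos):
    """Build one line per kept video, for listing sources alongside the dataset."""
    return [
        f'"{v.title}" by {v.uploader} (video {v.video_id}, license: {v.license})'
        for v in videos
    ]


candidates = [
    VideoRecord("a1", "Street scenes", "Alice", "creative-commons"),
    VideoRecord("b2", "Movie clip", "SomeStudio", "standard"),
    VideoRecord("c3", "Lab footage", "Bob", "standard"),
]

# Assumption: Bob gave written permission for video "c3".
kept = select_videos(candidates, permitted_ids=frozenset({"c3"}))
for line in source_lines(kept):
    print(line)
```

Keeping the list of sources next to the extracted frames makes it easier to describe where the images came from in whatever publication accompanies the dataset.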

  • Many YouTube videos are Creative Commons-licensed, and even though these licenses typically don't require asking permission, there is nearly always an attribution component that's required. If the images were just being used for an internal research project, I'm not sure how that would play out, but best practice would suggest that credit should be given to the creator of each video.
    – user60728
    Commented Nov 10, 2017 at 2:25
