I am a college student aspiring to learn Chinese.

YouTube has a wealth of resources available for learning Chinese - instructors, videos with subtitles, example sentences, movies... I was wondering about the legality of writing down, recording, learning and even reproducing sentences I've learned from YouTube, as I don't want to violate copyright.

According to this article, it seems that all YouTube videos are copyrighted. I've researched copyright, but I've had difficulty applying what I've found to this very niche situation. This Quora question is somewhat related - someone asks whether dictionary example sentences are copyrighted.

The problem I face is this: if someone has content in a YouTube video, does that automatically make it copyrighted? Can sentences be copyrighted? Or is it just the "video" (the collection of ideas) that is copyrighted, and not the "content" inside the video, provided that content was not copyrighted in the first place? After all, any sentence conveys an idea in a specific combination of words. How can a sentence be copyrighted, even though that seems to be exactly the condition for copyright - a specific way of expressing a certain idea?

In one case, I want to learn/copy example sentences from this video, which is a compilation of common Chinese sentences/phrases. In another case, the sentences are already transcribed by this YouTuber. Thank you. I'm willing to elaborate more if needed.

2 Answers


This is where "fair use" comes in (in the US). Any content on YT is protected by copyright. Transcribing a movie would be creating a derivative work, which requires permission from the copyright owner. The same goes for transcribing half of a movie. The minimal individual components of a text, the words and phrases, are not protected by copyright. A sentence, or two, or ten, gets you into the realm of fair use analysis. There are four core factors, plus a "tie-breaker" about transformativeness. For example, taking only a tiny bit is consistent with fair use, taking it for an educational or research purpose favors fair use, a non-profit purpose is a vote for fair use, and a transcription is highly transformative. Of course, it's impossible to tell in advance whether your plan is actually fair use, because fair use is a defense: you get sued for infringement first, and then you raise fair use in response. So it is legally more involved than the case where you actually did not copy or transcribe anything (they got the wrong guy, or you are actually the original author).

Dictionary sentences are relevant because dictionaries generally copy sentences from existing literature, though often they are out of copyright at this point. The greatest risk of copyright infringement would be if you transcribed the entirety of Red Cliff, which relates to the "substantiality" consideration.

  • I've read and appreciate your reply. I've added some more context which will hopefully make my situation more clear. I think being general though, is also useful so that people with differing situations can gain insight into their own situation. Commented Mar 18, 2021 at 5:30

Generally speaking, copyright applies to creative works. A single sentence can be subject to copyright if there is some amount of creativity in the way it is formulated. It is not subject to copyright if it is the obvious, natural way to phrase an idea. For example, a translation of “What time is it?” on its own is not subject to copyright, but a one-sentence poem typically would be. A translation of a “mundane” sentence into a fictional language might plausibly be subject to copyright.

The extent of copyright is not defined by the Berne Convention, which merely states in Article 2 that it covers (among other things)

every production in the literary, scientific and artistic domain, whatever may be the mode or form of its expression, such as books, pamphlets and other writings; lectures, addresses, sermons and other works of the same nature; dramatic or dramatico-musical works

The exact threshold at which copyright starts to apply is up to each national legislation, and is mostly a matter of jurisprudence. For example, in the United States, 17 USC §102 merely states that

Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression (…). Works of authorship include the following categories: (1) literary works; (…)

“Literary works” is defined in 17 USC §101 as

“Literary works” are works, other than audiovisual works, expressed in words, numbers, or other verbal or numerical symbols or indicia

It does not define when such a work meets the condition of being a “work of authorship”. US House report 94-1476 discusses the meaning of this expression, which replaced the phrase “all the writings of an author” that was present in US law before 1976.

The phrase "original works of authorship," which is purposely left undefined, is intended to incorporate without change the standard of originality established by the courts under the present copyright statute. This standard does not include requirements of novelty, ingenuity, or esthetic merit, and there is no intention to enlarge the standard of copyright protection to require them. (…)

The historic expansion of copyright has also applied to forms of expression which, although in existence for generations or centuries, have only gradually come to be recognized as creative and worthy of protection. (…)

The term "literary works" does not connote any criterion of literary merit or qualitative value: it includes catalogs, directories, and similar factual, reference, or instructional works and compilations of data. It also includes computer data bases, and computer programs to the extent that they incorporate authorship in the programmer's expression of original ideas, as distinguished from the ideas themselves.

Note the key idea: while the requirements are low, they do include “creativ[ity]” as judged by current standards.

