
I am currently interested in doing research that involves running various measurements and algorithms on the most common words in the English language. I have found a few good word lists online that would be suitable, but I am concerned that they may be subject to copyright and would like to be sure of their status before I use them. The lists are simple text files with one word per line.

I understand that collections of words such as dictionaries are subject to copyright because they contain a large number of words along with their definitions, which can be considered to require original work and creativity. But what about just the words, without any definitions or additional information?

I have seen this other Law Stack Exchange question, which mentions lists of words, but its author seems to have also wanted to use some short definitions, and I am not certain the answer applies to plain word lists.

Is a list of common words copyrightable? If so, would it still be considered fair use to use the word list to generate data not related to the words themselves?

  • Name your jurisdiction.
    – TRiG
    Commented Oct 7, 2019 at 10:22
  • Depending on jurisdiction, database rights might actually be the relevant right rather than copyright.
    – Jasper
    Commented Oct 7, 2019 at 12:44
  • Relevant Wikipedia article: en.wikipedia.org/wiki/Sweat_of_the_brow
    – Golden Cuy
    Commented Oct 8, 2019 at 1:55
  • If you are only using the list, and not redistributing that list or a derivative work based on that list, then I don't see how its copyrightability would be relevant. Fair use would not be relevant either; that would only be needed if you actually copied or redistributed the list (assuming it was copyrightable in the first place).
    – Brandin
    Commented Oct 9, 2019 at 14:42

2 Answers


Depending on your jurisdiction, such lists may be protected, but not by copyright.

For example, in Germany there was a court decision that scanning all the country’s phone books and selling them on CD constituted “unfair competition” and was illegal, while hiring 1000 typists to type in all this information manually would not have been.

Databases are protected in many jurisdictions, and a list of the 1000 most commonly used English words could reasonably be called a database.


The words themselves are not protected by copyright, because they are "facts" (facts of the English language; moreover, the list-maker did not create the words). Lists of words created by an algorithm are "facts" and lack the speck of creativity that makes web pages protected. The corpora that underlie the lists are protected, as is the program that filters them to produce token counts, but the resulting table of information is not; see Feist Publications, Inc. v. Rural Telephone Service Co.
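To illustrate what "created by an algorithm" means here, a frequency-ranked word list can be produced by a purely mechanical count over a corpus. The sketch below is a minimal example under stated assumptions: the helper name `top_words` is hypothetical, and it uses a deliberately simple tokenization rule (lowercase the text, treat each run of ASCII letters or apostrophes as one word). Real frequency lists are usually built from much larger corpora with more careful tokenization.

```python
import re
from collections import Counter


def top_words(text: str, n: int) -> list[str]:
    """Return the n most frequent words in text (hypothetical helper).

    Tokenization is an assumption: lowercase everything and treat each
    run of ASCII letters or apostrophes as a single word.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # most_common orders equal counts by first occurrence in the text.
    return [word for word, _ in counts.most_common(n)]


corpus = "The cat sat on the mat. The dog sat on the log."
print(top_words(corpus, 3))  # prints ['the', 'sat', 'on']
```

Given the same corpus and tokenization rule, the procedure always yields the same ranking; the only inputs are the source text and the counting rule, not any selection or arrangement by the list-maker.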

  • You seem to be assuming one particular jurisdiction without ever naming it. It is also not clear why you treat that jurisdiction as the one and only, considering that the OP has not named any particular jurisdiction. In particular, in 99.5% of all jurisdictions, the reference you cited is completely and utterly irrelevant.
    Commented Oct 7, 2019 at 10:47
  • "The list-maker didn't create the words" is beside the point. Few people would try to argue that a novel is not protected by copyright because the writer did not create any of the words in it. The arrangement of the words (and whether or not that is "trivial" and/or "common knowledge") is the important point.
    – alephzero
    Commented Oct 7, 2019 at 11:10
  • The sequence of most common English words is the output of a well-defined algorithm, not a creative work. It's hard to see how any jurisdiction with a reasonable definition of copyright would protect that list.
    – asgallant
    Commented Oct 7, 2019 at 16:18
  • @Acccumulation Which Supreme Court? There are at least a few, and nothing says this question is about English law (only about the use of lists of English words; for all we know it could be comparative research taking place in Thailand comparing the most used English words to Korean words), and this site has users all around the world. In all likelihood it's about US law, since the questioner's profile says they're in the US, but it's silly to assume a US-centric view for every question on the site.
    – Delioth
    Commented Oct 7, 2019 at 19:21
  • Regarding "any jurisdiction with a reasonable definition of copyright": in fact, I'm not aware of any jurisdiction with what I would consider a reasonable definition or implementation of copyright.
    – dotancohen
    Commented Oct 7, 2019 at 20:02
