Situation

Suppose I have built a computer program which fetches articles from websites, rewrites them and posts them.

Since I'm rewriting the article, I think copyright law is not violated, because information cannot be copyrighted on its own.

Questions

  1. Do I need to give credit to the original article writing website even after rephrasing the article?
  2. Can websites declare web scraping to be illegal? What if the program mimics human behaviour?
11
  • 1
    Related: Paraphrasing of copyrighted material Commented May 2, 2022 at 11:25
  • 3
    "Giving credit" to the original source doesn't generally shield you from copyright claims. In the US, at least, you have to get explicit permission from the copyright holder unless the copied material falls under "fair use" (in which case you don't even necessarily need to cite the original source.) More on copyright infringement vs. plagiarism. Commented May 2, 2022 at 11:55
  • 1
    "Giving credit" deals with the issue of plagiarism. That is a separate issue. To avoid violating copyright, you need a license from the copyright holder.
    – Roland
    Commented May 2, 2022 at 13:16
  • 1
    @Michael Seifert Giving proper attribution is often a significant factor in whether a court finds fair use or not. Uncredited use is rather less likely to be found to be fair, although credit is not an absolute requirement of fair use. Commented May 2, 2022 at 13:41
  • @Roland Under US law, if fair use applies, there is no infringement and no license is required. The same is true for fair dealing in the UK, and various "exceptions to copyright" in various countries. Commented May 2, 2022 at 13:44

2 Answers

4

Ideas are not Subject to Copyright

Copyright does not protect ideas. This is true in the US, in the UK, and under the copyright laws of every country that I know of. Article 2 paragraph 8 of the Berne Copyright Convention reads:

The protection of this Convention shall not apply to news of the day or to miscellaneous facts having the character of mere items of press information.

If the ideas of a work have been so rewritten or recast as not to constitute a derivative work, the original author has no rights over the new work, which becomes a separate work with its own copyright. In such a case there is no legal requirement for any credit or acknowledgement, at least not under copyright law. Also, a work whose copyright has expired, or which is for some other reason in the public domain and not protected by copyright, may legally be used without acknowledgement of the author, or even under a false designation of authorship.

Plagiarism

Passing someone else's work off as one's own is generally considered to be plagiarism. Some people consider that using significant parts of another's work without proper credit is also plagiarism.

Plagiarism is not a legal matter. It is considered highly improper in the academic and journalistic worlds, and may carry serious consequences there. It is considered unethical by many in other situations as well. However, it does not constitute copyright infringement, and copyright law cannot be used to prevent or punish plagiarism that is not also infringement.

Works Created by an Automated Process or Script

I tend to doubt that an automated process can (at the current state of the art) truly extract facts and re-express them to a degree that would constitute a new, non-infringing work. I also doubt that an automated process could alter a work enough for fair use, fair dealing, or any similar exception to reliably apply.

The US Copyright Office Compendium of Copyright practice (an official publication of the US Copyright Office) states in item 307:

The U.S. Copyright Office will register an original work of authorship, provided that the work was created by a human being.

The copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the mind.” Trade-Mark Cases, 100 U.S. 82, 94 (1879). Because copyright law is limited to “original intellectual conceptions of the author,” the Office will refuse to register a claim if it determines that a human being did not create the work. Burrow-Giles Lithographic Co. v. Sarony, 111 U.S. 53, 58 (1884)

Similar legal limits on AI authorship apply in many other countries.

Fair Use

Fair use is a specifically US legal concept, and generally does not apply in any other country, although I understand that Israel has closely followed US law in this matter.

Fair use is defined by 17 USC 107. That law specifies four factors which a court must consider in deciding whether a use is a fair use. Particularly important is whether the new work will harm actual or potential markets for the original, and whether it will serve as a replacement for the original. US courts also often consider whether a new work is "transformative", that is, whether it serves a significantly different purpose than the original does. For example, in a popular song, lyrics are often intended to have an emotional effect. In a textbook on verse, the same lyrics may be used to demonstrate poetic technique, rhyme, meter, etc. That would be a transformative use.

The presence of proper attribution or credit is often a significant factor in the decision by a court as to whether a use is fair. Using another's work without proper credit is significantly less likely to be found to be a fair use, although credit is not an absolute requirement of fair use.

See "Is this copyright infringement? Is it fair use? What if I don't make any money off it?" and the various tagged questions on this site for many more details on fair use.

Fair dealing and Other Exceptions to Copyright

In the UK and some Commonwealth countries, there is a doctrine known as "fair dealing". It is somewhat similar to fair use, but is generally more limited. In other countries there are various "exceptions to copyright". Some countries have a few broad exceptions, some have many narrower exceptions. India, for example, has more than 28 separate exceptions. What is covered varies from country to country. Exceptions for teaching, comment and analysis, and news reporting are common.

Article 9, paragraph 2 of the Berne Copyright Convention (linked above) recognizes such exceptions, stating:

It shall be a matter for legislation in the countries of the Union to permit the reproduction of such works in certain special cases, provided that such reproduction does not conflict with a normal exploitation of the work and does not unreasonably prejudice the legitimate interests of the author.

The convention goes on to state, in article 10, that:

(1) It shall be permissible to make quotations from a work which has already been lawfully made available to the public, provided that their making is compatible with fair practice, and their extent does not exceed that justified by the purpose, including quotations from newspaper articles and periodicals in the form of press summaries.

(2) It shall be a matter for legislation in the countries of the Union, and for special agreements existing or to be concluded between them, to permit the utilization, to the extent justified by the purpose, of literary or artistic works by way of illustration in publications, broadcasts or sound or visual recordings for teaching, provided such utilization is compatible with fair practice.

(3) Where use is made of works in accordance with the preceding paragraphs of this Article, mention shall be made of the source, and of the name of the author if it appears thereon.

Thus article 10 paragraph 3 of the Berne Copyright Convention establishes an international norm that works used under an exception to copyright, such as fair use or fair dealing, shall be properly credited.

Web-Scraping

The law on computer scraping is still under development, and varies from country to country. If a site operator makes it clear to users that scraping is unwelcome, it may be unlawful, depending on the rules of the country or countries involved. When a Terms of Service (TOS) document constitutes a binding contract or agreement that users must accept, and when such an agreement prohibits scraping or other automated access, that prohibition may be enforceable.
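One common way a site operator signals that automated access is unwelcome is a robots.txt file. The answer above does not discuss robots.txt, and whether ignoring it has any legal effect is a separate question that varies by country, so the following is only a minimal sketch of how a scraper could check that signal before fetching a page. The function name `may_fetch`, the user-agent string, and the example rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url.

    This checks only the operator's robots.txt signal. It says nothing about
    a site's Terms of Service or about copyright in the fetched content.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: every crawler is asked to stay out of /articles/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /articles/
"""

if __name__ == "__main__":
    for page in ("https://example.com/articles/1", "https://example.com/about"):
        print(page, may_fetch(EXAMPLE_ROBOTS, "example-bot", page))
```

A program that mimics human behaviour in order to get around such a signal is still ignoring it; the check only tells you what the operator has asked for.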

In Craigslist Inc. v. 3Taps Inc., 942 F.Supp.2d 962 (N.D. Cal. 2013), a US Federal district court held that sending a cease-and-desist letter and enacting an IP address block is sufficient notice of online trespassing, which a plaintiff can use to claim a violation of the Computer Fraud and Abuse Act (CFAA). However, that decision has been criticized by many, and was not a Circuit court or Supreme Court case.

In the recent case of Van Buren v. United States, 593 U.S. ___ (2021), the US Supreme Court narrowed the application of the language in the CFAA making access that "exceeds authorization" criminal. The case of HiQ Labs, Inc. v. LinkedIn Corp. raised the question of whether scraping a public website after a cease-and-desist letter has been sent constitutes a violation of the CFAA (this was the fact pattern in Craigslist v. 3Taps). The Supreme Court sent the case back to the Ninth Circuit for reconsideration in light of Van Buren. The Ninth Circuit reaffirmed its prior decision that when the website had been made publicly accessible, the CFAA did not apply, even in the face of a C&D letter. This seems to overrule 3Taps. Note that other means of prohibiting scraping may still be legally sound and enforceable. See "hiQ Labs v. LinkedIn" from the National Law Review. (This article and the decision it reports were brought to my attention via a comment by user Michael Seifert.)

The article "Web Scraping Watch: Cases Set to Clarify Application of the Computer Fraud and Abuse Act" discusses these cases in more detail, but does not include the latest ruling in the HiQ Labs case.

Conclusion

Unless the results of the "rewrite" done by the "program" are sufficiently original to be a new work using the same ideas, rather than a quotation or a derivative work, they will need to qualify under fair use or some other exception to copyright (unless permission has been obtained). This may well require a proper attribution of the original article. In any case, such credit is considered to be ethically mandatory by many.

The web-scraping done to obtain the initial data may or may not be lawful, depending on the contents of any TOS document, and whether the relevant laws make such a document enforceable, which is still not a fully settled point under the law, and which varies by country.

Personally, I would think giving proper credit much easier and safer than trying to justify not doing so, but that is not law, just my opinion.

3
  • FYI, HiQ v. LinkedIn was re-affirmed a couple weeks ago (the Ninth Circuit was instructed to reconsider it after the Van Buren decision but came to the same conclusion as it did initially.) Commented May 2, 2022 at 15:35
  • @Michael Seifert Thank you, that is quite interesting. That seems to settle the applicability of the CFAA to most scraping of publicly available sites, but does not say that other methods of prohibiting scraping may not be legally valid. Commented May 2, 2022 at 15:44
  • @Michael Seifert I have updated my answer, citing your source. Commented May 2, 2022 at 15:56
1

This sounds like an untested edge case. If you harvest text from a website and redistribute the work in exactly the same form, that is copyright infringement (I assume that doesn't need explaining). If you manually reformat the text in some artistic fashion, you still infringed copyright though you do hold the copyright in the artistic element that you added. Changing the original does not negate the original author's copyright. You can create derivative works of various types such as translating into Spanish, respelling words and so on.

You propose a sophisticated "translation" that algorithmically turns one string into another string, which looks like "creating a derivative work". However, it is clear that if a person reads an article and then writes another article that presents the same information, that is not infringement (under the theory that you are copying the ideas, not the text). The law has not developed bright lines that distinguish "expression" (via language) from "ideas", nor has philosophy. Legal doctrines on this distinction vary between countries.

It is utterly clear that your algorithm requires wholesale copying of the original text, and copying is the very act which requires permission. US law complicates the matter by adding a vague "fair use" doctrine which says that sometimes you can copy without permission. However, a legal finding of copyright infringement is also based on a comparative analysis of the texts, using the concept of "substantial similarity". It would be fatal to your case if the plaintiff could reverse the translation. Your goal would be to sufficiently obfuscate the machine translation so that substantial similarity could not be found.
