31

I am reading through these notes, trying to piece together a picture of what the rules/laws are regarding Wikipedia content:

I have no desire or intention to do this myself, but I am wondering: if I create a site like Wikipedia, which has "freely licensed/usable content", will someone else clone my project, put it under a new domain, change some fonts and colors, add a subscription fee and maybe some ads, and try to rank higher on Google Search so that they become the dominant provider of my underlying content?

Wikipedia is a parallel case. How does Wikipedia prevent someone from downloading a bulk dump of the whole site, putting it under their own custom domain like freeknowledgefoo.com, placing some ads on the pages, and adding a subscription fee? They might then redesign a few things slightly (changing fonts and colors), and for whatever reason end up ranking higher than Wikipedia itself on Google and being used by default instead of Wikipedia. That would be a horrible scenario: all the work is put into Wikipedia, but all the money is made by some copycat site.

How does Wikipedia prevent that?

How could I prevent that if I am offering free data dumps of various kinds, but also have my own UI to view the data (like NYC's open data site)? People can download the dump and run the site themselves, or use my main website, which hosts and shows the data. How do I prevent users from just deploying my project on their own domain, slightly changing things, and then running off with the future?

Is this what the "ShareAlike" license is for?

Creative Commons Deed

This is a human-readable summary of the full license below.

You are free:

  • to Share—to copy, distribute and transmit the work, and
  • to Remix—to adapt the work

for any purpose, even commercially.

Under the following conditions:

  • Attribution—You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work.)
  • Share Alike—If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

With the understanding that:

  • Waiver—Any of the above conditions can be waived if you get permission from the copyright holder.

  • Other Rights—In no way are any of the following rights affected by the license:

  • your fair dealing or fair use rights;

  • the author's moral rights; and

  • rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.

  • Notice—For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do that is with a link to https://creativecommons.org/licenses/by-sa/3.0/

I don't see anything saying you can't do what I describe. But maybe the fact that you have to have "attribution" is the key? I don't quite know how best to approach this situation. On one hand I would like to release data which is free and open source. On the other hand I don't want someone else to then run off with all of it and try and outcompete me with the same product I guess.

11
  • 3
    It would also become swiftly outdated, since it is the active user core that adds/ updates the contained information. Commented Aug 21, 2022 at 10:18
  • 6
    Wikipedia dumps are available for download at dumps.wikimedia.org Commented Aug 21, 2022 at 12:55
  • 17
    In future, please do not post images of text here, such as scans or screenshots of text. If the text cannot be copied and pasted, please use OCR or simply retype the text, or just link. I have replaced the image with text. Images do not work for screen readers, for SE's own scan and search functions, or for external search engines. Commented Aug 21, 2022 at 14:54
  • 2
    wikiwand.com basically does this Commented Aug 22, 2022 at 0:11
  • 17
    Fun fact: this does happen with StackExchange. There are websites in a Q&A format that copy part of their content from here.
    – quarague
    Commented Aug 22, 2022 at 6:53

5 Answers

62

It's allowed by the Creative Commons Attribution-ShareAlike license, and intentionally so. The Wikimedia Foundation wants things like this to be possible; that is part of the goal of open content. (This license is also used on Stack Exchange content, so the same applies to e.g. this answer.)

However, it is important to remember that this is not a public-domain equivalent license. If you copy from a Wikipedia article (or an SE post), you must comply with the "Attribution" and "ShareAlike" requirements.

  • Attribution: You must give credit to the author. For Wikipedia articles, which typically have many authors, a link to the page is sufficient; editors agree to this in addition to the license when they save their edits. For Stack Exchange content, a link to the post itself should be enough. (To get a direct URL for an answer, click the "share" link below the post.) To be entirely safe, and as a courtesy, it would also be a good idea to include the poster's username.
  • ShareAlike: If you modify the content, you must release your modified version under the same or a compatible license. You can't copy this answer, add more information (or translate it into another language, or make any other change), and keep an all-rights-reserved copyright on it, or release it into the public domain; your version must also be released under CC BY-SA.

As long as you follow these requirements, copying is allowed and encouraged.

2
  • 1
    Perhaps elaborate on attribution in your answer? Most people "forget" to provide it. For Stack Exchange content, it is 9 out of 10 who "forget" to provide it. I don't know of Wikimedia content (I have never really encountered it). Commented Aug 24, 2022 at 11:27
  • 1
    @PeterMortensen What is the basis for saying that 9 out of 10 people "forget" to provide StackExchange attribution? For programming and looking something up in Stackoverflow, in particular, I often may say "I found out how to do XYZ on this Stackoverflow post", but that doesn't automatically mean that the code I wrote therefore also must be CC licensed. You have to look at such situations carefully to see if what you wrote is actually "based on" or "derivative" of the Stackoverflow post or not.
    – Brandin
    Commented Oct 17, 2022 at 6:50
53

"they end up ranking higher than Wikipedia itself on Google". That's a highly unlikely scenario, unless you could somehow convince all current users of Wikipedia that switching to a paid page is better (for the same content!). Also, Google's high ranking of Wikipedia articles (together with often embedding parts of an article in its results) is hard coded and unlikely to change without the explicit interaction of a programmer at Google. As mentioned in this article, Google is a large donor to the Wikimedia Foundation, paying several million dollars a year. This was, according to the article, done to "help reduce the pagerank of widespread, uneditable Wikipedia clones that were ostensibly ad farms."

So indeed, such pages did exist and many still do. The List of Forks known to Wikipedia is almost endless. Some just clone a particular set of articles (for a particular topic); others really just wrap a new layout around the same content, adding poor navigation and advertisements. Many of the clones fail to follow the CC license requirements, giving the impression that the content is their own. This is a clear copyright violation.

9
  • 2
    Actually, it happens all the time.
    – Nemo
    Commented Aug 21, 2022 at 18:09
  • 3
    @Nemo Which part do you mean? The copying? Yes, sure. But as far as I can tell, the above-mentioned effort of Google to fight against this type of ad farming has worked quite well. I have never been directed to a Wikipedia clone instead of the "true" one.
    – PMF
    Commented Aug 21, 2022 at 19:02
  • 4
    Thinking about it, Google earns most of its money also with ads placed next to search results. So maybe this effort wasn't only out of generosity. They want companies to place ads with them, not with the competitors (Wikipedia clones with ads are somewhat competing with Google).
    – PMF
    Commented Aug 21, 2022 at 19:05
  • 2
    They rip off Wikipedia, add adverts, and have sometimes made it into search results.
    – Jasen
    Commented Aug 22, 2022 at 6:56
  • 5
    In my experience, wikiwand mostly tends to make it into search results with stuff that got deleted from Wikipedia. If something is on wikiwand but not on Wikipedia, there's probably a good reason why Wikipedia deleted it (and I think Google Search does not return old revisions of Wikipedia pages).
    – gerrit
    Commented Aug 22, 2022 at 8:04
16

Wikipedia does not make any attempt to stop people from creating and placing online copies of Wikipedia, or parts of it. Indeed, it encourages people to do so. Not only does the CC-BY-SA 3.0 license explicitly allow this, but Wikipedia also provides free "dumps" on a regular basis. Using these, anyone can copy the data behind any or all of Wikipedia's articles. It also provides free access to the software on which Wikipedia runs (MediaWiki) and instructions on how to set it up and how to load it with data from a dump.

The legal requirements imposed are to provide attribution and to offer the content under the same (or a compatible) license, that is, the CC-BY-SA 3.0 license.

The license requirement is simple: just include a note that the site is under the CC-BY-SA 3.0 license and a link to the text of the license. Attribution is harder, because Wikipedia articles often have many authors. But it is generally agreed that one valid way to provide attribution is to link to the history page for the source article on Wikipedia. One could also copy the list of editors from the history, and post that for each article.
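As a rough illustration of the two requirements just described, here is a minimal Python sketch that builds a plain-text footer for a copied article: a license note with a link to the license text, plus a link to the article's history page for attribution. The function names and the footer wording are hypothetical, chosen only for this example; the history link assumes the usual MediaWiki `index.php?title=...&action=history` URL form, and whether a given footer is legally sufficient is a separate question this sketch does not settle.

```python
from urllib.parse import quote

# Link to the license text, as given in the Creative Commons deed quoted in the question.
LICENSE_URL = "https://creativecommons.org/licenses/by-sa/3.0/"


def wikipedia_history_url(title: str, lang: str = "en") -> str:
    """Return the history-page URL for a Wikipedia article (hypothetical helper)."""
    # MediaWiki page titles use underscores in place of spaces.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"https://{lang}.wikipedia.org/w/index.php?title={encoded}&action=history"


def attribution_footer(title: str, lang: str = "en") -> str:
    """Build a plain-text footer with attribution and license notice (illustrative wording)."""
    return (
        f'This page reuses text from the Wikipedia article "{title}". '
        f"List of authors: {wikipedia_history_url(title, lang)} "
        f"The text is available under the CC BY-SA 3.0 license: {LICENSE_URL}"
    )


if __name__ == "__main__":
    print(attribution_footer("Free content"))
```

A mirror would emit something like this on every copied page. Keeping a local copy of the editor list, as suggested above, would be an alternative to linking out.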

Many online copies of Wikipedia do not fully comply with this license, and are infringements of copyright. But the copyright is held by the individual contributors, not the Wikimedia Foundation, and few if any of them care to sue over such infringements.

I don't quite know how best to approach this situation. On one hand I would like to release data which is free and open source. On the other hand I don't want someone else to then run off with all of it and try and outcompete me with the same product I guess.

Those desires are essentially contradictory. If you post a work under a permissive license, such as the CC-BY-SA 3.0 license (or the very similar version 4.0 license, under which this post and all of Stack Exchange (SE) are offered), you are saying that anyone in the world is free to copy the work for any purpose, including to sell access to it. One could use the CC-BY-NC-SA license, which forbids commercial reuse, or some other license that imposes restrictions that seem good to the person doing the posting. But attempting to check up on copies, and to get commercial ones taken down or sued, would be a full-time job for a team of people if the site becomes at all popular. Wikipedia and SE have chosen not to try.

3

As well as forks, there are also "mirrors". These are like forks, but update regularly to match changes in Wikipedia, so don't end up becoming nothing but a one-off snapshot of the site. Again, nothing to stop you from charging for it. You may not even get your capital investment back though, and the ongoing expenses for a mirror would be higher too, due to higher energy and data usage from continually updating to keep up with Wikipedia. Even more so if you are also saving the intermediate revisions. "Sure you can download Wikipedia, got several multi-terabyte hard drives sitting around?"

1
  • 6
    This doesn't appear to address the legal question, which is whether it is permissible to charge money for access to content taken from Wikipedia.
    – feetwet
    Commented Aug 21, 2022 at 14:37
3

Yes, it can be done legally. However, creating a mirror is not that simple nowadays, as the complexity of the pages has increased through the years (images, Wikidata, Lua…). You would also need to keep them frequently updated to be competitive.

And, most importantly, you must correctly include the license of the content (CC-BY-SA for the text, varied for the images) and credit the authors. The author is not "Wikipedia" but each of the contributors to that page, so you need a list of the authors. Linking to the Wikipedia history might be enough, or you might need your own copy (e.g. they could delete the page you are showing!).

Then, getting to the main issue, you are too expensive. You are providing a service (a website with Wikipedia content) for a fee that Wikipedia itself provides for free. And I would bet it does so better than you (faster to load, more up to date…).

If you are offered a paid option and exactly the same one (and perhaps even better) for free, which one would you choose? :)

You would need to provide an additional value that people were willing to pay for. Perhaps, your website is available in a local community with no access to the internet. Or you printed this encyclopedia in paper form. Or you certify that your version does not contain any wrong fact. Or you made a deal with Chinese authorities so your website is available from China while Wikipedia is not.

Or a nuclear war ensued and yours is the only remaining copy of Wikipedia.

This is the same question that is sometimes raised with Free Software. You could repackage and resell for a hefty sum an image program, a web browser, or even a full operating system with little more than changing their names. But you must acknowledge who their authors are (you cannot pass it as made by you), and the right of anyone that receives it from you to give it to others for free.

So in the end the options to get money from that are generally

  • Get paid a nominal fee for the disks with the media (hardly anybody would pay more)
  • Get paid for providing support (helping people with issues, keep servers running…)
  • Get paid for developing it (there is a payment for the development of a feature, but from then on it is available to everyone for free)
2
  • 1
    This doesn't appear to address the legal question, which is whether it is permissible to charge money for access to content taken from Wikipedia.
    – Trish
    Commented Aug 21, 2022 at 17:35
  • 4
    @Trish: it's in the very first line: Yes. Then I admit I focused on the question of why you are unlikely to get rich with it (and probably not even cover your costs). You legally can, but it is not a viable business.
    – Ángel
    Commented Aug 21, 2022 at 17:39
