122

On Stack Overflow, we've suspected the plagiarism rabbit hole was deep... but we've had some users who are plumbing the depths of that pit and finding out that it's far deeper than anyone realized.

Up until now, it's been fine to leave them under moderator flags. You typically don't see a ton of them at once, and a user with a pattern will get dealt with in short order. But we now have 900+ moderator flags on SO, with at least half (probably more like 2/3 now) being plagiarism. There are two problems with this:

  1. Plagiarism flags are slow to handle. Assuming you got enough data from the flag ("this post is plagiarized from here [link]"), you now get the fun of figuring out if the post was copied. Sometimes it's a slam dunk and pure copy-paste. Sometimes it's mix-and-match posts. Sometimes it's simply not properly attributed. And sometimes you just can't tell. And if it has a lot of votes, you'll need a post disassociation from a CM on top of that.
  2. It clogs the queue so other flags get lost. If there's something important flagged on SO, it may not get handled in a timely fashion.

I certainly don't want to even imply our newfound plagiarism flaggers' efforts are creating any issues. I've seen some seriously messed up stuff from years ago that we were able to fix thanks to them. I want this stuff flagged, regardless of how many flags it adds to the queue. We just need to be able to categorize it better.

With a dedicated flag, we could also improve the process. Add sources for the material as part of the flagging process. A surprisingly common tactic to gain rep is to find a duplicate question, go to the duplicate, and copy the best answer(s). If the source post(s) are on the site, give us a console where we can see a comparison side-by-side. I don't think that would be terribly hard, but it would give us a leg up on removing copied materials. It also forces flaggers to provide a link to the source material (uncommon, but I've seen more than a few that are just "This answer is copied from another answer").
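To make the side-by-side idea a little more concrete, here is a minimal sketch of what such a comparison could compute. It uses Python's standard `difflib` module purely as an illustration; the function names `similarity` and `comparison_table` are hypothetical and nothing here reflects how Stack Exchange would actually build the console.

```python
import difflib


def similarity(candidate: str, source: str) -> float:
    """Return a ratio between 0.0 and 1.0 describing how alike two post bodies are."""
    return difflib.SequenceMatcher(None, candidate, source).ratio()


def comparison_table(candidate: str, source: str) -> str:
    """Build an HTML table that shows both posts side by side, line by line."""
    return difflib.HtmlDiff().make_table(
        source.splitlines(),
        candidate.splitlines(),
        fromdesc="Suspected source",
        todesc="Flagged post",
    )
```

A pure copy-paste would score at or near 1.0, while mix-and-match posts would land somewhere in between and still need a human to look at the table.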

It might also be useful to make a case-by-case disassociation system for CMs. That way we're not building these massive error-prone lists by hand so we can open one ticket to them. It would take some of the tedium out of the process and guide all SE moderators through handling plagiarism ("Would you like to delete this post? Request a disassociation?").

12
  • 4
    Related proposal on MSO (for on-site duplicate answers) How about adding a "duplicate answer" manual flag type? Commented Sep 6, 2022 at 3:57
  • 6
    Duplicate answer is not the same thing. Assessing whether something is repeating the same thing as another answer generally requires subject-matter expertise (unless it's just plain copy-pasta), and while it's something that I'm fine with users flagging for moderators to review, it's not anywhere near the same as plagiarism and shouldn't be conflated with it. @Henry Commented Sep 6, 2022 at 4:01
  • 3
    I agree that plagiarism and duplicated answers are different things @Cody especially with regard to "equivalent" answers that are not identical. IMO that doesn't change the fact that the proposals are related in that they both propose a new flag workflow which would (1) help organise flag queue by providing a new category of flags (which helps with handling and tracking) and (2) require the flagger to provide a content origin when flagging a post because sometimes these types of flags do not include a link to the source. Commented Sep 6, 2022 at 4:06
  • 12
    Yeah, I see the connection, @Henry. I guess what I'm saying is, I think it would actually be a significant drawback if people started misusing any new "plagiarized" flag type as a way of flagging answers that they thought duplicated/repeated other answers on the page... Commented Sep 6, 2022 at 4:15
  • 1
    @HenryEcker To some extent they're in the same wheelhouse, but we're getting to where we do need more specialized flags on some sites (especially SO). The problem on "late retreads" is that people are more likely to just say it's the same as another answer without specifying which answer they mean.
    – Machavity
    Commented Sep 6, 2022 at 13:58
  • 3
    I'm sorry, I'm a bit confused. The only thing I meant to indicate with my initial comment was that these proposals are related. I was noting the relationship because (1) there are good points made there which also apply to this proposal; (2) the workflows for both of these flags have similar issues both wrt. handling time and the way that people flag these types of posts; (3) if developers take the time to implement this there are other types of flags which could benefit from a "requires supplemental link" mechanism. Commented Sep 6, 2022 at 15:00
  • 2
    Also, having just gone through the process of creating a new closure reason, there was significant emphasis on providing guidance on how to use the reason in the UI itself. I assume that this is similar for flag descriptions, where it is important to consider what this flag should not be used for and what flaggers should do instead. Like, for example, differentiating plagiarism (exactly copied text without attribution) and functionally equivalent answers (retreads). I hoped that providing a related proposal with similar issues would help in this regard as well. Commented Sep 6, 2022 at 15:00
  • 2
    Excellent suggestion! The side-by-side comparison could probably be done using an iframe, which would also permit comparison of non-SE material, eg Wikipedia articles.
    – PM 2Ring
    Commented Sep 7, 2022 at 0:08
  • 3
    "I think it would actually be a significant drawback if people started misusing any new "plagiarized" flag type as a way of flagging answers that they thought duplicated/repeated other answers on the page." Agreed. So - why don't we have both? "Plagiarized" and - not "duplicate" (since that's ambiguous), but "redundant". Commented Sep 7, 2022 at 20:11
  • @KarlKnechtel right, a duplicate answer is not necessarily plagiarized. Two users might see a question "How do I reticulate the spline in F#?" and independently consult, summarize, and cite a post on the Spline Reticulation in Functional Languages blog. We don't need both answers, but neither is plagiarized. Commented Sep 12, 2022 at 13:57
  • Well, yes, but I was also thinking of the situation where two users independently think about a problem that has only one realistic solution, and write out the relatively simple solution in substantially the same way. Off the topic of this post, but also a significant problem, at least on Stack Overflow. Commented Sep 12, 2022 at 16:01
  • 2
    Maybe also allow non-moderators to review plagiarism flags through the Low Quality Posts queue, similarly to what we already do with NAA flags. Commented Nov 4, 2022 at 16:52

2 Answers

57

UPDATE

We've built and will be releasing a plagiarism flag on 22 March 2023 (on SO only). For more info, see this MSO post - Plagiarism flag and moderator tooling launching to Stack Overflow


We can see this is a high priority issue for you all. Back in September, Bella asked the moderators to escalate situations where they found users who had five or more instances of plagiarism for us to have a better understanding of the scope of the issue that mods and users face when trying to deal with possible plagiarism.

After a few days the situation was clear and she reconnected with the moderators. I think her response was, “Wow. Are y’all ok?!” No. She and I know you are not.

We got together with jkm (one of our new PMs) and the Community Enablement team she's managing, who are working on improving tooling for moderators and CMs since this concern would fall to them to solve. Together, we wanted more information about two things:

  1. Volume of flags
  2. Time to resolve

One thing that she recognizes is that there's a huge backlog of asks as tools for moderators haven't been touched much over the last few years - as such, she wants to be cautious about balancing addressing emergent pain points with long-standing needs for improvements.

Our investigation process

jkm worked with Nicolas to identify some keywords like plagiar and without attribution, to categorize which flags are likely related to plagiarism. They did this to better understand the scale of the problem. Meanwhile, Bella and I chatted with the moderators to understand what problems this influx of flags was causing for them and what would make handling the flags more efficient.
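As a rough illustration of the keyword approach described above, a sketch along these lines would bucket custom flag text. The two keyword fragments come from this post; the case-insensitive substring matching and the function names `looks_like_plagiarism_flag` and `split_flags` are assumptions, not the query that was actually run.

```python
# Keyword fragments mentioned in the post; the matching logic is an assumption.
PLAGIARISM_KEYWORDS = ("plagiar", "without attribution")


def looks_like_plagiarism_flag(flag_text: str) -> bool:
    """Return True if the custom flag text contains any keyword fragment."""
    lowered = flag_text.lower()
    return any(keyword in lowered for keyword in PLAGIARISM_KEYWORDS)


def split_flags(flag_texts):
    """Partition flag texts into (likely plagiarism, everything else)."""
    likely, other = [], []
    for text in flag_texts:
        (likely if looks_like_plagiarism_flag(text) else other).append(text)
    return likely, other
```

Matching on the fragment `plagiar` catches "plagiarism", "plagiarized", and "plagiarised" alike, but a check like this will miss flags that only say "copied from another answer", so any count it produces is a lower bound.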

Our findings

The TL;DR is that the general volume is low, but flags related to plagiarism have been increasing and taking much more time to resolve - sometimes hours per flag, as many cases of plagiarism are not one-offs, so moderators are often investigating all of a user's answer history.

We realized there are two big issues that we eventually need to find solutions for:

  1. Lots of posters may not be aware of what plagiarism is and the Stack Exchange policies around it.
  2. There are 15 years of content that we need to review for possible instances of plagiarism.

These two aspects will require different solutions. Because the concerns y'all have brought to us are centered around your ability to easily handle the flags you're getting, we feel like starting with the second issue is more pressing.

The second issue can be split into two areas and this is where we are focusing:

  • The high volume of custom plagiarism flags is making it harder for mods to find other potentially important flags in the "moderator attention" flags category.
  • Mods often have to ask CMs for support in handling these flags as they don’t have the tools they need to investigate or resolve flags related to plagiarism.

Our [tentative] plan

The solutions we’re working on are:

New plagiarism flag type for questions & answers

This will require flaggers to include a link to the original source and give them a space to add an explanation. This will also allow for these flags to be bucketed into the same category on the flags dashboard, separating them from other custom flags and giving us a better idea of the volume. This will only be enabled on Stack Overflow initially but other sites will be able to request it if they have need.
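For readers who think in code, the shape of such a flag could be sketched as below. This is only an illustration of "link required, explanation optional"; the class name `PlagiarismFlag`, its fields, and the URL check are hypothetical and do not describe the real implementation.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class PlagiarismFlag:
    """Hypothetical model of a plagiarism flag with a mandatory source link."""

    post_id: int
    source_url: str
    explanation: str = ""  # assumed optional: a space to add an explanation

    def __post_init__(self):
        # Reject flags that do not carry something that looks like a web link.
        parsed = urlparse(self.source_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("a plagiarism flag needs a link to the original source")
```

Under this sketch, a flag that says only "This answer is copied from another answer" would be rejected at creation time rather than landing in the queue without a source.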

Allow moderators to deny reputation for deleted plagiarized posts

Our system automatically allows the poster to keep reputation earned when a post is over 60 days old and has a score of 3 or higher. In the case of plagiarism, this means that moderators often have to request that CMs ‘disassociate’ older posts to ensure the user isn't earning reputation for content they didn't create. When posts are disassociated, they're no longer connected to the poster's account, meaning that the fact they've had posts deleted as plagiarism disappears. So we need a solution that removes reputation without disassociation. Because we want to allow users to fix their own posts and get them undeleted, this solution would only impact the user's reputation while the post is deleted.
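The rule described above can be written down as a small sketch. The 60-day and score-of-3 thresholds come from this post; the exact boundary handling (strictly more than 60 days), the `rep_denied` parameter, and the function name are assumptions used only to illustrate the proposed moderator override.

```python
# Thresholds as stated in the post; exact boundary handling is an assumption.
MIN_AGE_DAYS = 60
MIN_SCORE = 3


def keeps_reputation_when_deleted(age_days: int, score: int, rep_denied: bool = False) -> bool:
    """Return True if the author keeps reputation from a deleted post.

    rep_denied models the proposed moderator option to deny reputation
    for a post deleted as plagiarism (hypothetical parameter).
    """
    if rep_denied:
        return False
    return age_days > MIN_AGE_DAYS and score >= MIN_SCORE
```

Because the denial only applies while the post is deleted, undeleting a fixed post would simply stop this check from being relevant; nothing about the author's link to the post changes.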

Make it easier for moderators to see when users have a history of plagiarism

Once flags have been handled, it becomes difficult for mods to see how common it is that a user has plagiarism flags validated on their posts. In order to simplify future investigation for cases of repeated plagiarism, we want to ensure these handled flags are easy to see. We think that one of the best ways to achieve this is by improving the flagged posts for user moderator page. This page shows all posts a user has that have ever been flagged but it's not sortable and can't be filtered. We think having this page look more like the flags raised by user page will be more useful for moderators in many cases.

Create a post notice explaining why the post was deleted and guiding the author

Often, moderators leave comments linking to the original sources to notify the poster and have an indication for high-reputation users why the post was deleted. We are going to investigate adding a post notice to answers deleted as plagiarism that will notify the poster and give them helpful information about how to properly attribute their post on our network and get attention to have it undeleted if the issues are fixed.


These are the initial changes that have been planned, and we’ll be monitoring the following over the next two months:

  • Time that moderators are spending on flags related to plagiarism
  • % of CM escalations related to plagiarism
  • Answers to this linked meta question

Feedback

If you have any thoughts or concerns about what we are planning, please let us know in the answers to this linked question. As far as the timeline goes, while we're working on design concepts for this (as you can see above), we will be picking this project up as Winter Bash wraps up.

This isn’t a perfect or holistic solution. In other words, we don’t expect to fully solve the plagiarism problem with these changes, but we do hope they make a noticeable difference in how easily flags related to plagiarism can be identified and in how long it takes to investigate and resolve them. We think these changes will be iterative, meaning that if we are not seeing meaningful movement in either area, there’s always room to investigate further. Otherwise, we’ll be excited to tackle some of the other areas like suspicious voting, sock puppets, etc. that we know have been asked for consistently through moderator surveys, Meta posts, and Mod working groups.

7
  • 2
    is this supposed to be strictly SO or the plan is to eventually make it available on some other sites in the network?
    – gnat
    Commented Mar 22, 2023 at 6:55
  • ...nevermind, upon more careful re-reading I found the answer right here, in this very post: "This will only be enabled on Stack Overflow initially but other sites will be able to request it if they have need"
    – gnat
    Commented Mar 22, 2023 at 19:25
  • But why does each site have to make an individual request for this useful tool? Commented Sep 4, 2023 at 8:51
  • @ElementsinSpace because most sites don't have any plagiarism at all and adding a flag for it overcomplicates the UI with unnecessary options.
    – Catija
    Commented Sep 4, 2023 at 21:03
  • 1
    Well, I recently encountered some blatant plagiarism on a small site, I flagged and a moderator promptly deleted it, but the plagiarist got to keep all the rep (+14 votes). This just seems wrong. Commented Sep 4, 2023 at 23:41
  • @Catija Maybe my sites are just special, but it's extremely common for people (even high rep users!) to plagiarize on my site (two examples, same question). It's extremely pervasive and obvious, but nobody ever really flags about it but me. Maybe people think they get a free pass as long as they're copying from a dictionary.
    – Laurel
    Commented Sep 6, 2023 at 15:38
  • 1
    @Laurel That's why we allow sites to request the flag type be added - but many sites don't have a need for it as far as we can tell, so we would rather avoid adding it to all sites... that said, I'd also say that much of this isn't necessarily "plagiarism" so much as it's likely people who don't know how to properly cite content.
    – Catija
    Commented Sep 6, 2023 at 16:02
8

People who want to help out on Stack Overflow should check out the Stack Apps post about Guttenberg: "Guttenberg - A bot searching for plagiarism on Stack Overflow".

"What is Guttenberg?
Guttenberg is a bot that searches for plagiarism or duplicated answers on Stack Overflow. It's currently running in SOBotics under the user Guttenberg."

There's an additional command available for moderators and room owners that checks all posts of a particular user for plagiarism.

There is also source code available on GitHub, if you want to help out there. See above link for more information.

2
