57

Could we automatically reject a suggested edit if the edit summary is only an email address? We all know it is very common for spam bots to enter an email address into the one-line field that follows a post, and an edit summary should never be just an email address.

In the past few months I've seen lots of edits that have been spam and nothing else, where the edit summary has been just an email address. A few examples below:

From my reviews alone, at least 19 suggested edits were spam with just an email address as the summary.

This isn't a catch-all solution to the problem, but it would get rid of a lot of the spam that at least Arqade is getting in the form of suggested edits. There is no downside to refusing edits whose summary is only an email address.
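To make the proposal concrete, here is a minimal sketch of the check in Python. The function name and the deliberately loose email pattern are assumptions for illustration; this is not how Stack Exchange implements anything.

```python
import re

# Assumption: a deliberately simple "looks like an email" pattern,
# not a full RFC 5322 validator. It only needs to catch summaries
# that consist of a single email-like token.
EMAIL_ONLY = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email_only_summary(summary: str) -> bool:
    """Return True if the whole summary, ignoring surrounding
    whitespace, is one email-like token and nothing else."""
    return EMAIL_ONLY.fullmatch(summary.strip()) is not None
```

A summary that merely mentions an address alongside a real description would not match, since the pattern has to cover the entire string.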

23
  • 2
    Good: All of the examples were rejected. Bad: All of the examples contained BBCode. Commented Nov 18, 2013 at 9:43
  • 1
    Assuming we closed this avenue wouldn't they just change to something else? Commented Nov 18, 2013 at 9:43
  • 2
    @michaelb958 I couldn't find any spam suggestions without bbcode.
    – 3ventic
    Commented Nov 18, 2013 at 9:43
  • 2
    @3ventic Nothing against you. I just tend to cringe whenever I see that bracketed abomination anywhere. Bad spambots! No biscuit! Commented Nov 18, 2013 at 9:44
  • 1
    @RichardTingle I don't think the spam is aimed at SE judging by the use of suggested edits (why suggest if barely anyone will see it) and bbcode, which doesn't even work here.
    – 3ventic
    Commented Nov 18, 2013 at 9:45
  • 3
    You do make a fair point. But how is a generic bot even able to suggest edits on a non-generic site? I would have expected it to have to be a Stack Exchange-specific bot to work with the Stack Exchange interface. Commented Nov 18, 2013 at 9:47
  • No point, in my opinion: the spammer will notice this at some point and just change the algorithm to use something else, e.g. "code formatting". They're not stupid, just evil. Commented Nov 18, 2013 at 9:49
  • 1
    I've seen tons of spam that looks "proper", so I don't think they're that stupid. Commented Nov 18, 2013 at 9:52
  • 2
    The nice thing about SE is that people review the suggested edits before they go live. There is no way 4 people are accepting spam changes.
    – Wouter J
    Commented Nov 18, 2013 at 13:06
  • 2
    The nice thing about code is that it does the job without needing a few people. It would also keep the spam from being recorded in the suggested edits history.
    – 3ventic
    Commented Nov 18, 2013 at 13:38
  • Just a note that this is still being reviewed. From what I've been able to tell, it would block only one specific bot, and it would be easy for them to fix. What might be useful is a setting for a simple regex to test the reason against, which could help keep out corner cases - but we're going to see what this change does before adding any more complexity.
    – user50049
    Commented Dec 11, 2013 at 6:42
  • Why are the links broken? Commented Sep 10, 2017 at 13:33
  • 1
    @3ventic Apparently, edits that are rejected as spam aren't visible for users who don't have an account on that site. I asked a question about that here. Commented Sep 16, 2017 at 10:38
  • 1
    I'm changing this question over to [status-completed], since it looks like [status-review] was added several years ago by Tim Post (who wanted to look at the question in more detail at the time) - but the actual change was already implemented.
    – Slate StaffMod
    Commented Jan 24, 2023 at 21:57
  • 1
    @Sonic Ah, fair enough. Still - the request ("fix the problem with email-only edit summary spam") has, at its most basic level, been satisfied by an alternative approach, so it still seems like it fits the spirit of [status-completed] to me. If the problem is still ongoing, has become more severe, and/or there's evidence the solution implemented in 2013 has degraded, I'd recommend opening a new question.
    – Slate StaffMod
    Commented Jan 25, 2023 at 5:43

2 Answers

35
+50

As Shog said, you're pretty spot on with your observations.

In fact, we had not fully realized just how much of a problem anon spam suggested edits actually were until we got the new system in place.

The good news is it's working, and it's working very well. Behold, 1800 things that didn't annoy people as inbox notifications over the last seven days alone (caution: fragile CSV - sorry!). We are now blocking more spam than we're flagging or rejecting on average, and that is very good news.

gelatinous glazed meat-like substance abuse

... and now continuing on into December:

It's the most, wonderful time, of the year ...

... and finally, most recently, through July 2014:

[graph of suggested edit spam through July 2014]

... where we see signal collected from users and moderators over the last few weeks now keeping a massive onslaught of it out of the queue almost entirely (well, a little more than 85% of it). There are very few false positives, less than 1%, but the actual number is harder to crunch as it's based on manual review. As you can see from the latest graph, the tweaking and fidgeting we've been doing has really paid off; in fact, you can see a lull where it's obvious that someone shut off a botnet. Yet the work you all needed to do remained about the same.

It's still early as far as testing and data goes, but the single most profound and immediate impact that the system had was drastically reducing the amount of spam that suggested edit reviewers see. You can see how the system ramped up to meet an onslaught around the 17th; this has repeated more than a few times. We're still adjusting things to tighten it down even more.

The thing is, almost half of the suggested edit spam that we see comes from several distinct networks, which we've traced back to an extremely stupid bot that tests which link formats are accepted (does BBCode, HTML, etc. work? That's what it's investigating) and makes basic link-plant attempts, often with broken HTML.

What we're doing now is keeping a history of these odd posts that spam networks like to target, and if a new spammer is caught targeting one of those, we dish out a much heavier penalty in the system, one that keeps out only anonymous edits from that neighborhood.
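As a rough illustration of the idea (not the actual system, whose internals aren't described here), a target history with an escalated penalty might be sketched like this. The class name, the penalty values, and the multiplier are all hypothetical.

```python
from collections import Counter


class TargetHistory:
    """Hypothetical sketch: remember posts that spam edits have hit
    before, and penalize new spam on those posts more heavily."""

    def __init__(self, base_penalty: float = 1.0, repeat_multiplier: float = 5.0):
        # Both values are made-up defaults for illustration only.
        self._hits = Counter()
        self._base = base_penalty
        self._multiplier = repeat_multiplier

    def record_spam(self, post_id: int) -> None:
        """Note that a spam edit targeted this post."""
        self._hits[post_id] += 1

    def penalty_for(self, post_id: int) -> float:
        """Heavier penalty if the post is already a known target."""
        if self._hits[post_id] > 0:
            return self._base * self._multiplier
        return self._base
```

The point of the design is that the extra weight attaches to the targeted post's history, so a previously unseen spammer still gets the heavier penalty when it goes after a known target.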

I'd like to study this more before we start examining the edit summary to a greater extent - just to be sure that this change wouldn't be effective against just this family of very broken bots before we do it. We're still considering inserting regex-based checks that we can tweak on a per-site basis in strategic places that directly influence the behavior of the spam protection layer, but we're not yet certain if it would really help. Once a human realizes the bot broke, it's trivial to just change strings, making these very distinct 'breeds' of bots harder to study and track.
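A per-site regex check of the kind being considered could look roughly like the sketch below. The site keys, the patterns, and the function name are assumptions made up for illustration; nothing here reflects real configuration.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical network-wide default: a summary that is only an email-like token.
DEFAULT_PATTERNS: List[Pattern[str]] = [
    re.compile(r"^\s*[^@\s]+@[^@\s]+\.[^@\s]+\s*$"),
]

# Hypothetical per-site additions, keyed by a made-up site identifier.
SITE_PATTERNS: Dict[str, List[Pattern[str]]] = {
    "gaming": [re.compile(r"\[url=", re.IGNORECASE)],  # BBCode link attempts
}


def summary_is_suspicious(site: str, summary: str) -> bool:
    """Check a summary against the defaults plus any site-specific patterns."""
    patterns = DEFAULT_PATTERNS + SITE_PATTERNS.get(site, [])
    return any(p.search(summary) for p in patterns)
```

Keeping the patterns in data rather than code is what would make per-site tweaking cheap, though as noted above a spammer can change strings just as cheaply.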

Until then ...

Remember, every time you reject this crap - you're helping every site in the network block subsequent attempts - that might make you feel a little better about doing it. The system is specifically designed to pick up on signals that the communities and moderators send and learn from it.

Putting [status-review] on this because it is interesting and now I'm curious to look at it more in-depth, so please hang tight :)

Rejecting these with a message like "Please describe the scope of your changes to this post, we don't require your email address for submission" would probably confuse bots but not legitimate editors that are conditioned to somehow 'sign' their contributions - but I need to look a bit deeper.
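For illustration only, a validator that returns that message instead of silently dropping the edit might look like this. The function name and the loose email pattern are assumptions, and the message is the one quoted above.

```python
import re
from typing import Optional

# Message text taken from the paragraph above.
REJECTION_MESSAGE = (
    "Please describe the scope of your changes to this post, "
    "we don't require your email address for submission"
)

# Assumption: a loose "looks like an email" pattern, not a strict validator.
_EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def rejection_reason(summary: str) -> Optional[str]:
    """Return the message to show if the summary is only an email
    address, or None if the summary can go through."""
    if _EMAIL_LIKE.fullmatch(summary.strip()):
        return REJECTION_MESSAGE
    return None
```

A human who habitually signs edits gets a hint about what to write, while a bot is unlikely to act on it.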

4
  • 5
    That CSV is awesome.
    – Undo
    Commented Nov 25, 2013 at 3:34
  • 3
    I wish I were 35 to get that cologne Commented Nov 25, 2013 at 3:55
  • 3
    The colors in the latest graph are reversed, any reason? Commented Jul 2, 2014 at 7:02
  • 2
    @TimPost It's been a couple of years - how's it working out? :) Commented Jun 29, 2016 at 1:42
30

Huh. So yeah, this does appear to be a pretty decent heuristic.

Here's the outcome of suggested edits with email-like comments network-wide for the past 90 days (R = rejected, A = approved):

 R  A Site Name 
199 0 StackOverflow
 26 0 SuperUser 
 71 0 ServerFault
 10 0 Cooking   
 14 0 Home Improvement
  3 0 Game Developers
116 0 Gaming    
  3 0 GIS       
 18 0 Mathematics
  5 0 Photography
  5 0 Statistical Analysis
  4 0 Web Apps  
 14 0 Webmasters
172 0 Apple     
 10 0 Theoretical Computer Science
 41 0 English Language and Usage
 21 0 Personal Finance and Money
  1 0 Role-playing Games
 14 0 TeX - LaTeX
 77 0 Ubuntu    
 34 0 Unix and Linux
 72 0 WordPress 
  4 0 Programmers
  3 0 OnStartups
 31 0 Homebrew  
  3 0 IT Security
 26 0 Writers   
  8 0 Electronics and Robotics
  6 0 Graphic Design
  1 0 Database Administrators
 15 0 Science Fiction
  3 0 Code Review
  3 0 Code Golf 
  9 0 Drupal Answers
  2 0 SharePoint
 33 0 Musical Practice and Performance
  1 0 Japanese Language and Usage
  2 0 Gardening and Landscaping
  8 0 Travel    
  3 0 Signal Processing
 60 0 Bitcoin   
  1 0 LEGO®     
  2 0 Spanish Language and Usage
 12 0 Movies    
  1 0 Martial Arts
  6 0 Sports    
  5 0 Raspberry Pi
 25 0 Salesforce
 12 0 Patents   
  2 0 User Experience
  1 0 Robotics  
 13 0 ExpressionEngine
 15 0 Magento   
  6 0 English Language Learners
  8 0 Tridion Stack Exchange
 14 0 Blender Stack Exchange
 10 0 MathOverflow

We just rolled out a more effective spam-fighting system for suggested edits, so I would like to wait and see how much these numbers drop before going further... But this does seem like a simple and effective place to attack.
