47

Achtung! Read this first:

Please use spam flags responsibly. Spam flags can have severe consequences for a user without any moderator action. Don't just spam spam flags; look at the user's activity first. If they're engaged in frequent borderline spamming, flag for moderator review and say what you found, rather than dropping the hammer on a user who might be a useful contributor who simply failed to add a disclaimer of affiliation.

Flags are there to help you help the moderators help us all. Please use them appropriately!


So, recently I've been experimenting with using the Data Explorer to find posts of... actionably low quality, shall we say. After trying a lot of badly-constructed queries with somewhat mixed results, the most recent attempt seems to have been rather effective:

Seek and Destroy: Spam spam URLs spam baked beans and spam

A list of almost 500 users that are suspicious. Certainly not all are spamming--but after looking at a few selected arbitrarily from the list, I'd wager that very, very few are unambiguously non-spam.

To give you an idea, one of the first users I looked at had a grand total of 17 answers all promoting the same site, with little to no other content in the answer. I left a comment to that effect, flagged an answer for moderator attention with the same explanation, then spam-flagged a few others for good measure. The user account has since been gloriously destroyed with righteous vengeance.

There are a few clearly legitimate users like this one who seem to merely have a habit of giving relevant links as answers with perhaps less detail than would be ideal; but most others I saw were promoting a single site/product/&c., with no disclosure of affiliation.

And lest you think that the example I mentioned of 17 spam answers was just an outlier, here's the smoking ruins of what used to be a spam account with 39 answers--good work, everyone! I'd glanced at a half-dozen of them and all were blatantly promoting one of two products.

Is this worth dealing with? I don't think I have the stomach to go through more than a few of these.


Edit: Some other folks have stepped up to improve or expand upon my (quick and dirty) query, which is pretty awesome! Check the most recent queries list to see what people have been up to.

As another aside, I don't know if anyone else has tried doing this on other sites yet, but I ran my query on both SU and SF. After inspecting a few users chosen arbitrarily, I found nothing other than people giving helpful, relevant links to things they clearly had no affiliation with. There may still be some spam users in there, but they aren't the majority. Looks like SO is by far the biggest spam target in the SE family, which isn't surprising, but good to know.


Spam-hunting 2: Electric Boogaloo

Anyone up for a bit more? Valiant spam-fighter Scorpi0 below has been trying more queries, and it looks like the most recent may still have some material worth inspecting.

This is probably the last gasp for this method of spam-hunting, at least until the next data dump is added to the Data Explorer and things aren't painfully clogged with the bajillion spam accounts already sent to the Great Meat Tin in the Sky.

26
  • Forgive my ignorance, but what does spam-flagging do?
    – Thursagen
    Commented Jul 25, 2011 at 4:56
  • 1
    @HamandBacon throws in a 100 rep penalty if it gets flagged by 6 people Commented Jul 25, 2011 at 5:00
  • @Ham and Bacon: I can't find a reference at the moment, but if memory serves me spam flags act as a downvote (without cost to the flagger) and, if enough spam flags are added to a post, it's nuked with an extra rep penalty (-100 or something?) to the user. It's pretty harsh.
    – McCannot
    Commented Jul 25, 2011 at 5:01
  • @camccann it's in the privilege page. Go to your privileges and go to flagging.
    – Thursagen
    Commented Jul 25, 2011 at 5:04
  • @Ham and Bacon: Derp derp derp, of course it's in the most sensible place which is the only place I didn't think to look. Sigh.
    – McCannot
    Commented Jul 25, 2011 at 5:05
  • 5
    Consider opening a new question/feature request asking them to evaluate your query in terms of adding it to their heuristics for poor quality questions and users.
    – Pollyanna
    Commented Jul 25, 2011 at 5:41
  • @Adam Davis: Hm, dunno. Not only is the query trivial, it also (by definition) only picks things up after the fact; it's comparing the ratio of "short posts containing a URL" to "total posts" by each user. I suspect that by the time it would be useful, it'd be time to just destroy the account.
    – McCannot
    Commented Jul 25, 2011 at 5:47
  • Wow, that search is incredible. I just flagged all the posts that I could and there are still plenty more. Good catch!! Commented Jul 25, 2011 at 6:22
  • 6
    Phew ... glad I saw this. I thought we may have been in the middle of a spamming zombie apocalypse. The flags have been (mostly) spot on, thanks for the great work!!!
    – user50049
    Commented Jul 25, 2011 at 9:26
  • 1
    @Tim Post: Haha! I was a bit worried that posting this might result in an avalanche of spam flags that would give the moderators a heart attack... glad to know you're keeping up with it!
    – McCannot
    Commented Jul 25, 2011 at 13:37
  • Yee haw! Kill that spam! Loving the query. Commented Jul 25, 2011 at 13:47
  • 1
    I am a deputy dawg! Yee haw! Commented Jul 25, 2011 at 15:39
  • 3
    After looking at the moderator queue today, I had a feeling I'd find something like this on Meta. Thanks for organizing a clean up effort! Commented Jul 25, 2011 at 16:37
  • 7
    @Bill the Lizard: It's my pleasure. What are users for, after all, if not to occasionally create massive amounts of extra work for moderators?
    – McCannot
    Commented Jul 25, 2011 at 17:24
  • Ran out of flags! Will be back tomorrow :) Commented Jul 25, 2011 at 17:48
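
For anyone who wants to try the idea offline: the comment above describes the query as comparing, per user, the ratio of "short posts containing a URL" to "total posts". Below is a minimal Python sketch of that ratio, not the actual Data Explorer query. The function name, the input shape (pairs of user id and post body), and all three thresholds are assumptions made for illustration; the thread does not give the real values.

```python
from collections import defaultdict

# Hypothetical thresholds; the thread does not state the values the real query uses.
MAX_SHORT_BODY_CHARS = 300   # what counts as a "short" post
MIN_POSTS = 5                # ignore users with too few posts to judge
MIN_RATIO = 0.8              # share of short URL posts that makes a user suspicious


def suspicious_users(posts, max_len=MAX_SHORT_BODY_CHARS,
                     min_posts=MIN_POSTS, min_ratio=MIN_RATIO):
    """Return {user_id: ratio} for users whose posts are mostly short and contain a URL.

    `posts` is an iterable of (user_id, body) pairs.
    """
    total = defaultdict(int)
    short_with_url = defaultdict(int)

    for user_id, body in posts:
        total[user_id] += 1
        has_url = "http://" in body or "https://" in body
        if has_url and len(body) <= max_len:
            short_with_url[user_id] += 1

    flagged = {}
    for user_id, count in total.items():
        if count < min_posts:
            continue
        ratio = short_with_url[user_id] / count
        if ratio >= min_ratio:
            flagged[user_id] = ratio
    return flagged
```

As the comment says, this only picks things up after the fact, so the result is a list of accounts to inspect by hand, not a list of accounts to flag.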

4 Answers

17

Seek and Destroy: Spam Users who are Spamming URLs is a query which groups by user and by URL.

I use a brute-force solution to find whatever sits between an http:// and a /, or between an http:// and a ", and this works pretty well.
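
To make the brute-force idea concrete, here is a rough Python sketch of it, not the query itself. It takes whatever sits between each http:// and the next / or ", then counts occurrences per user and domain. The function names and the (user id, body) input shape are hypothetical, and it assumes post bodies are stored as HTML, so links normally appear inside a quoted href.

```python
from collections import Counter

PREFIX = "http://"


def extract_domains(body):
    """Return the text between each 'http://' and the next '/' or '"'."""
    domains = []
    start = body.find(PREFIX)
    while start != -1:
        begin = start + len(PREFIX)
        # Take whichever terminator comes first; if neither exists, run to the end.
        ends = [i for i in (body.find("/", begin), body.find('"', begin)) if i != -1]
        end = min(ends) if ends else len(body)
        if end > begin:
            domains.append(body[begin:end].lower())
        start = body.find(PREFIX, end)
    return domains


def count_by_user_and_domain(posts):
    """Count how often each user links to each domain.

    `posts` is an iterable of (user_id, body) pairs.
    """
    counts = Counter()
    for user_id, body in posts:
        for domain in extract_domains(body):
            counts[(user_id, domain)] += 1
    return counts
```

Being brute force, it misses https:// links and will over-capture a bare URL that is not followed by a slash or a quote, so treat the counts as a starting point for manual review.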

Edit

I removed the filter on body length and added a filter on user reputation. This shows all the new spamming users, yaaaaaa!

A new one: Seek and Destroy: Auto Promoter

It retrieves users who link to their own website a lot more than necessary. High-reputation users can show up in these lists too, so keep your eyes open!

2
  • After looking at your query a bit I'm undecided--the results it gives are much nicer, but I think it's also missing some results that mine picks up. I made a few attempts at extracting URLs by other means but found no significant improvements. For the time being, I think it makes sense to keep both around. Maybe edit the title on yours to make it clear that it's a variation with better detail but more false negatives?
    – McCannot
    Commented Jul 25, 2011 at 18:06
  • 2
    this is awesome, well done :) Commented Jul 31, 2011 at 20:17
14

Awesome query!

Please keep the following things in mind:

  • Spam flags are only suitable for single spam posts. Please always check the whole account, and if there is more than one spam answer/question, flag one for Moderator Attention, requesting either deletion of the account or manual cleanup (via the mods) of all answers (good practice is to offer coffee or donuts as a bribe).
  • Not all spammers are obvious, some are hiding behind URL-Shorteners and redirects. Also keep an eye out for Ad-Spammers. I've written a GreaseMonkey Script which will display some of the better known Ad-Providers in a small info field when visiting the site. Ad-Spammers do also need to be flagged.
  • Another possible help for the hunt is the Web Of Trust, which will display user-ratings of the visited page.
2
  • 3
    Feel free to flag anything that contains shortened URLs as spam, though. There's no excuse for that. At a minimum, you should replace the obfuscated URL with its full equivalent. Commented Jul 25, 2011 at 7:51
  • 1
    Flagged and bribed as appropriate :) Commented Jul 25, 2011 at 16:11
7

Very nice. It is important to deal with spam posts properly, especially the kind that come across as relevant, helpful answers with good intentions towards the asker, because those are the ones that can mislead people.

I just picked a random user from that list, and the first one I picked had 14 answers, all of which promote the same thing. https://stackoverflow.com/users/91095/ahmad

I picked a few more and they all seemed like legit users not promoting anything. Perhaps it would be beneficial to have a query which searches for repetitions of one link by the same user instead of many links that could be to many different sites. I'm not sure how you'd write that query though.

1
  • Yeah, if I could search for repeated links (or better, repeated links to a domain) I would have. Maybe there's a way to do it, but I'm far from an expert on T-SQL hackery.
    – McCannot
    Commented Jul 25, 2011 at 5:11
3

Please treat this as a catch-all for domains that should probably be blacklisted.

Format

  • Domain URL (no link, just the FQDN)
  • Number of times it showed up
  • Number of individual accounts posting it

Note: only domains that are obviously the frequent fruit of link planting should be listed here. MSDN would be a good example of something that should not be listed here. I kicked off one e-mail to team@stackoverflow recommending a domain; it seems like we could cut down on the noise they receive by creating and reviewing a list instead.
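
If someone wants to build the list mechanically, the format above (domain, number of times it showed up, number of individual accounts posting it) could be tallied along these lines. This is only a sketch: the function name, the (user id, domain) input pairs, and the `min_accounts` cutoff are assumptions, and the whitelist is an illustrative set of clearly legitimate domains such as MSDN, not a vetted list.

```python
from collections import defaultdict

# Illustrative whitelist of clearly legitimate domains; an assumption, not a vetted list.
WHITELIST = {
    "msdn.microsoft.com",
    "github.com",
    "code.google.com",
    "en.wikipedia.org",
    "python.org",
    "oracle.com",
}


def blacklist_candidates(links, min_accounts=2):
    """Return (domain, times_seen, distinct_accounts) rows for manual review.

    `links` is an iterable of (user_id, domain) pairs. `min_accounts` is a
    hypothetical cutoff: only domains posted by at least that many accounts
    are reported.
    """
    occurrences = defaultdict(int)
    accounts = defaultdict(set)

    for user_id, domain in links:
        domain = domain.lower()
        if domain in WHITELIST:
            continue
        occurrences[domain] += 1
        accounts[domain].add(user_id)

    rows = [
        (domain, occurrences[domain], len(accounts[domain]))
        for domain in occurrences
        if len(accounts[domain]) >= min_accounts
    ]
    # Most frequently posted domains first, ties broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

The output still needs a human look before anything goes up the chain, since a domain posted by several accounts is not automatically spam.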

6
  • For what it's worth--based on one of my other attempts at a query to find spam posts, the clearly-legit domains I noticed the most were SE sites, various project hosting sites (github.com, code.google.com, &c.), Wikipedia, and official sites for programming languages and tools (e.g. MSDN, python.org, oracle.com, &c.).
    – McCannot
    Commented Jul 25, 2011 at 13:09
  • @McCannot - I noticed several (so far in today's hunt for mystery meat) that repeated themselves across user accounts. All accounts were blatantly spamming the same domain. It's those cases I hope to send up the chain to be outright banned, since the planting was really egregious.
    – user50049
    Commented Jul 25, 2011 at 13:21
  • Yeah, I saw one or two of those in my initial sampling, which really made me grind my teeth. That's actually part of what made me decide it was a big enough clean-up job to post here.
    – McCannot
    Commented Jul 25, 2011 at 13:35
  • 3
    A whitelist would probably be helpful here. code.google.com, wikipedia etc are not likely to be spam links, and a user who repeatedly links to them may just be someone who thinks you have to cite a source in every answer. Commented Jul 25, 2011 at 19:06
  • @Kate - that's why I noted review and gave MSDN as an example (as it is frequently cited). I'm looking for the ezandroidappcreator.com links, as a purely meta example (I have no idea if that domain even exists)
    – user50049
    Commented Jul 26, 2011 at 7:28
  • @Tim bontq_com has had some dummy users posting links to their site (all flagged as spam now) Commented Jul 28, 2011 at 14:39
