TL;DR: Yes, it can.


The New Answers to Old Questions page in the moderator tools is one of the best aids to moderation and has often been described as a place to find flaggable answers. There are nearly 9000 new answers to old questions per week.

A few users (myself included) used to spend a lot of time aggressively patrolling those answers, and we often ran out of flags while plenty of NAAs ("not an answer" posts) were still flowing in. We faced a few issues:

  • The NAAs piled up too quickly for users to notice and were lost in the pages of history, as the tool listed only 500 posts (25 pages × 20 posts per page).
  • The tool was not a real-time feed of late answers.
  • The tool was available only to users above 10k reputation, yet a lot of users below 10k were interested in curating this content.

After examining the Stack Exchange API, we (i.e., the SOBotics team) wrote a chatbot that could report these posts to us. That way we could solve all the issues we faced! Thus, Natty was born.
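As a rough sketch of the idea (not Natty's actual code), the snippet below picks late answers out of a batch of answer objects. The `answer_id`, `question_id` and `creation_date` fields (epoch seconds) follow the shape of Stack Exchange API answer objects; the 30-day threshold, the function name and the `question_dates` lookup are assumptions made for illustration.

```python
from datetime import timedelta

# Assumption: an answer is "late" if posted 30+ days after its question.
LATE_THRESHOLD = timedelta(days=30)


def find_late_answers(answers, question_dates, threshold=LATE_THRESHOLD):
    """Return the answers posted at least `threshold` after their question.

    answers: list of dicts with `answer_id`, `question_id`, `creation_date`.
    question_dates: hypothetical lookup of question_id -> creation_date.
    """
    late = []
    for answer in answers:
        asked = question_dates.get(answer["question_id"])
        if asked is None:
            continue  # question date unknown; skip rather than guess
        age = timedelta(seconds=answer["creation_date"] - asked)
        if age >= threshold:
            late.append(answer)
    return late


# Made-up sample data: answer 1 is years late, answer 2 is a few hours old.
answers = [
    {"answer_id": 1, "question_id": 10, "creation_date": 1_486_000_000},
    {"answer_id": 2, "question_id": 11, "creation_date": 1_486_000_000},
]
question_dates = {10: 1_400_000_000, 11: 1_485_990_000}

late_ids = [a["answer_id"] for a in find_late_answers(answers, question_dates)]
print(late_ids)
```

A bot built this way would call such a filter on every polling cycle and report the survivors to chat, which is what makes the feed real-time and visible to users below 10k.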

It worked well, but after looking at some stats we realized that most of the detected posts were quite obvious:

Tps percentage

(tp indicates a true positive, i.e. a flaggable post; fp a false positive, i.e. a wrongly detected post; tn a post that was not detected but was reported manually; and ne a badly formatted post in need of an edit.)

A large number of posts (868) caught with a detection value above 7.0 were all flaggable! Only 3 posts (all of them really badly formatted answers) were wrongly detected. (The one manually reported post was missed because the API had made the bot back off for a minute.)
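To put those figures in perspective, here is the arithmetic as a small sketch. It reads 868 as the true-positive count and 3 as the false-positive count for posts scoring above 7.0; the function name is just for illustration.

```python
def precision(true_positives, false_positives):
    """Fraction of detected posts that really were flaggable."""
    detected = true_positives + false_positives
    return true_positives / detected if detected else 0.0


# Figures quoted above for posts with a detection value above 7.0.
tp, fp = 868, 3
score = precision(tp, fp)
print(f"{score:.2%}")  # 99.66%
```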

A few filters were performing really well compared to the rest:

Individual filters

So we decided that it would be better to flag those automatically instead of waiting for a human to revisit and flag them. Every time we flagged, we used to add an automatic comment from a list of canned comments. Ideally, this would have taught the OP that their post was NAA, and they would have deleted it. In reality, the 4-5 hour delay between posting and flagging meant that the OP was no longer around to see our comments.

The initial plan was to flag the obvious NAAs and leave a canned comment on each post. Choosing the comment was simple, since three categories of NAA were being detected: link-only posts (which were flagged as VLQ), "me too" answers by users with <50 rep, and NAAs by users with >50 rep. Link-only posts were detected because they consisted only of a link; for the other two, the reputation of the user, as returned by the API, determined the category. If the bot detects the post as non-English or gibberish, it doesn't leave any comment.
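The selection logic described above could look roughly like this. It is a sketch under assumptions, not the bot's real code: the names are hypothetical, the order of the checks is a guess, and a user with exactly 50 rep is placed in the higher bucket because the text only says "<50" and ">50".

```python
from enum import Enum


class CommentType(Enum):
    LINK_ONLY = "link_only"            # post flagged as VLQ
    ME_TOO_LOW_REP = "me_too_low_rep"  # NAA by a user under 50 rep
    NAA_HIGH_REP = "naa_high_rep"      # NAA by a user with more rep


REP_CUTOFF = 50  # assumption: exactly 50 rep counts as the higher bucket


def pick_comment(is_link_only, is_gibberish_or_non_english, reputation):
    """Return the CommentType to post, or None to leave no comment."""
    if is_gibberish_or_non_english:
        return None  # flagged, but no comment is left
    if is_link_only:
        return CommentType.LINK_ONLY
    if reputation < REP_CUTOFF:
        return CommentType.ME_TOO_LOW_REP
    return CommentType.NAA_HIGH_REP


print(pick_comment(False, False, 1))
```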

After talking to Shog9, a community manager, about this system, he advised us to:

  1. Improve the comments.
  2. Delete the comments if the flags are declined or remain unattended for an extended period of time.

After we tried to improve the comments, Shog himself provided us with a set of comments.

What are we doing at the moment?

At the moment we are:

  1. Flagging obvious NAAs.
  2. Adding a relevant comment on the post.
  3. Deleting the comment if the post is still there after a day.
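The third step can be pictured as a simple age check, sketched below. The function and constant names are hypothetical, and how the bot actually learns that the post still exists is not covered here.

```python
from datetime import datetime, timedelta, timezone

# One day, as stated in the list above.
COMMENT_LIFETIME = timedelta(days=1)


def should_delete_comment(commented_at, post_still_exists, now):
    """True if the bot's comment has outlived its purpose."""
    if not post_still_exists:
        return False  # nothing left to clean up
    return now - commented_at >= COMMENT_LIFETIME


commented_at = datetime(2017, 2, 6, 12, 0, tzinfo=timezone.utc)
later = commented_at + timedelta(hours=25)
print(should_delete_comment(commented_at, True, later))
```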

What problem are we trying to solve?

Every flagged post goes directly to the Low Quality Posts queue (a few after a delay of 15 minutes). Many posts that could have been autoflagged sat around for 6 or more hours until a human checked and flagged them; autoflagging would have cleaned them up within an hour. In addition, NAAs enter the moderator queue after an hour and are prioritized by when the post was flagged, not by when it was created. Commenting as soon as such an answer is posted makes the user aware of the rules, so the OP either deletes the NAA or reworks the post into an actual answer to the question. All in all, it helps clear NAAs faster.

Why am I not running it under my own account instead of a bot account?

Initially it was decided that I would run it on my own account, as I could then retract any bad flags and delete wrong comments. But then something happened and I could no longer use my account. At the moment, the only disadvantage of the bot account is that bad flags cannot be retracted. There is a Room Owner-only command to delete bad comments.

Are there a few examples of true positives and false positives?

Sentinel is a small web dashboard that was written to track all the posts that we have detected. Most of the true positives can be seen there. A few of them are:

The 3 false positives were:

More Stats

Every day we detect more than 150 bad posts, of which nearly 125 are flagged. On November 25th, the number rose to 227 flagged posts out of 269, with 30 of them between 09:00 and 10:00 UTC. The graph for the past week is:

Graph for the past week

Date,       Detected, Flagged 
2017-01-30, 176,      160
2017-01-31, 191,      170
2017-02-01, 213,      190
2017-02-02, 201,      181
2017-02-03, 255,      226
2017-02-04, 122,      110
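For a quick read of the table, the share of detected posts that ended up flagged can be computed per day and overall. The snippet below only does arithmetic on the numbers listed above; the variable names are illustrative.

```python
# (date, detected, flagged) rows copied from the table above.
rows = [
    ("2017-01-30", 176, 160),
    ("2017-01-31", 191, 170),
    ("2017-02-01", 213, 190),
    ("2017-02-02", 201, 181),
    ("2017-02-03", 255, 226),
    ("2017-02-04", 122, 110),
]

# Share of detected posts that were flagged, per day.
daily_rates = {date: flagged / detected for date, detected, flagged in rows}
for date, rate in daily_rates.items():
    print(f"{date}: {rate:.1%}")

# Same share over the whole period.
overall_rate = sum(r[2] for r in rows) / sum(r[1] for r in rows)
print(f"overall: {overall_rate:.1%}")
```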

Of the 1036 NAA flags this week, 100 (about 10%) would have been flagged and auto-commented.

Can't SE directly stop such posts?

I asked about this on Meta Stack Exchange when Tim requested ideas for the "Second Iteration of the Stack Exchange Quality Project".


Now that you know the details of this system, what are your thoughts/comments/ideas?

Title courtesy: Can a machine be taught to flag comments automatically?

  • 6
    Is the second link in false positives correct? The link is sentinel.erwaysoftware.com/posts/4805. It's a valid link, but it seems out of place.
    – Tas
    Commented Feb 6, 2017 at 22:35
  • 2
    I'm just curious about what are your "Blacklisted words" ?
    – Fabich
    Commented Feb 6, 2017 at 22:55
  • 4
    Why not have the bots rate them as constructive and non-constructive?
    – Machavity Mod
    Commented Feb 7, 2017 at 0:37
  • @Lordofdark You can find a list of the filters with the most caught posts here: sentinel.erwaysoftware.com/reasons Have a look at the reasons starting with "Contains blacklisted word - " to see the most popular blacklisted words/phrases.
    – FelixSFD
    Commented Feb 7, 2017 at 5:51
  • 1
    @Tas Yes, the post is this one: stackoverflow.com/a/40973019/4099593. The original post, as we captured it from the API, is this: sentinel.erwaysoftware.com/posts/4805. The link was just to show that it was edited within the grace period. Commented Feb 7, 2017 at 6:44
  • 7
    Mission accomplished, auto comment, OP understood and deleted it, yeah truly some nice work that will save us all some time and make SO a better place. Commented Feb 7, 2017 at 7:56
  • 1
    @petter Thanks. There are many more such examples. stackoverflow.com/questions/33272967/… :) Commented Feb 7, 2017 at 8:24
  • I like the comment text for NAA - "Every post here is expected to be an explicit attempt to answer this question". Heck yes! This question, not just "a" question. But from what I've seen that's not how NAA works currently.
    – Gimby
    Commented Feb 7, 2017 at 10:22
  • @Gimby, if the problem with the post is not immediately obvious, then it is best to use the "other" option and explain why it is an NAA. Commented Feb 7, 2017 at 10:26
  • 2
    This is really impressive. I've often thought that you could train many different AIs to flag posts by having each one focus on a specific kind of bad post. Well done. Commented Feb 7, 2017 at 20:24
  • 1
    Thanks @Bill :) .... SmokeDetector detects spam posts and casts Spam (or) Rude/Abusive flags. Natty flags NAAs and VLQs. (We need to write one to flag for moderator attention and we've covered all flag types) Commented Feb 7, 2017 at 20:56

2 Answers


I feel that we are relying too much on human interaction. If we can identify these posts with high accuracy (95%+), we should block them before they enter the system. Of course, this could be done in steps: first auto-flags or a quality filter that sends these posts to the queue; then, if that works as intended (deletions + edits ~98%), we could move towards warning and then blocking.

  • 8
    I agree with this. The heuristics for sending a post to the LQPQ must be improved. This has been requested before, too. Commented Feb 6, 2017 at 15:53
  • 1
    What does 95% correlate to in concrete terms? 5, 50, 500, or 5000? If it's 5 or 50, that's a much easier call to make than 5,000.
    – TylerH
    Commented Feb 6, 2017 at 22:48
  • 21
    If you just want to keep this noise away from SO, I suspect that immediate blocking may be less effective than delayed removal, simply because immediate feedback will let the users trying to post these non-answers tweak them until they get past the filter. If instead you're hoping to actually teach these users how to use SO properly, I suspect that will take more than just an error message. Remember, we're talking about users who, for whatever reason, are confused enough to try to post a question as an answer. To them, an error message just looks like an arbitrary obstacle to get around. Commented Feb 6, 2017 at 23:41
  • @TylerH It's statistics fetishism. The usual cutoffs are 50, 90, 95, 99, etc. percent, used to determine p-values for statistical significance. Quite arbitrary.
    – Braiam
    Commented Feb 7, 2017 at 0:55
  • @IlmariKaronen not necessarily, since the filter will simply ask you to make sure that you are answering the question.
    – Braiam
    Commented Feb 7, 2017 at 0:56

I am in full support of this. I joined the chat and within about 10 minutes ran out of flags on low-quality posts. I was being picky with my flags, and the majority of the posts I viewed were TPs. For a few, I personally didn't know what to do, so I left them alone. I really like the work you have done and I am all for it!

I don't flag much, so I only have 10 flags per day; that works out to about 1 flag a minute when I joined for the first time.

  • 8
    The purpose of Natty (and our other bots) is not to increase our "helpful flag"-count, but to help the community to keep SO nice.
    – FelixSFD
    Commented Feb 12, 2017 at 16:13
