
Heat detector is a StackApp whose goal is to catch offensive, rude, or snarky comments, as well as comments that lead to abusive content (offensive posts, spam posts, or earlier offensive comments). It runs a bot that queries the Stack Exchange API and retrieves all comments posted on Stack Overflow.

The automatic detection is based on a mix of regular expressions and NLP (Natural Language Processing). When the system classifies a comment as "rude", it is sent to a dedicated chat room for manual review and feedback.



To avoid storing comments in the chat transcript (which could unearth buried fights), the comment is automatically deleted from chat before the deletion timeout expires (a 2-minute window).

The bot also has a test function and a report function, which make it possible to test different comments and to train the classifiers on uncaught content.


The comments API is queried every 1.5 to 3 minutes, depending on traffic. The objective is to fetch all new comments within a single page of results (max 100 comments), keeping API calls to a minimum.
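As a rough illustration of traffic-dependent polling, the sketch below picks the next delay from how full the last page was. The 90 s and 180 s bounds and the page size of 100 come from the description above; the function name `next_interval`, the proportional rule, and the 70% fill target are assumptions made for this example, not the bot's actual logic.

```python
MIN_INTERVAL_S = 90    # 1.5 minutes, from the description
MAX_INTERVAL_S = 180   # 3 minutes, from the description
PAGE_SIZE = 100        # max comments per API page, from the description

TARGET_FILL = 0.7      # assumed safety margin so a burst still fits in one page


def next_interval(current_s: float, comments_in_page: int) -> float:
    """Return the next polling delay in seconds (hypothetical rule).

    The delay is scaled so that, at the observed comment rate, the next
    page would be about TARGET_FILL full, then clamped to the allowed range.
    """
    if comments_in_page == 0:
        # No traffic at all: wait as long as allowed.
        return MAX_INTERVAL_S
    fill = comments_in_page / PAGE_SIZE
    proposed = current_s * TARGET_FILL / fill
    return max(MIN_INTERVAL_S, min(MAX_INTERVAL_S, proposed))
```

A full page after a 120 s wait pushes the delay down to the 90 s floor, while a nearly empty page pushes it up to the 180 s ceiling.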

All comments are processed by both RegEx and machine learning algorithms (NLP), and each contributes to a final score. If the score is above a defined threshold, the comment is sent to chat.
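The scoring step can be pictured with a minimal sketch like the following. The point values, the 0.5 probability cut-off, the threshold, and the names `Evidence`, `final_score`, and `should_report` are all hypothetical; the post does not state the real weights.

```python
from dataclasses import dataclass, field
from typing import List

NLP_POINTS = 3   # hypothetical points added per classifier that predicts "bad"
THRESHOLD = 4    # hypothetical reporting threshold


@dataclass
class Evidence:
    regex_score: int                                            # points from RegEx matches
    nlp_bad_probs: List[float] = field(default_factory=list)    # one "bad" probability per classifier


def final_score(e: Evidence) -> int:
    """Add a fixed bonus for every NLP classifier that leans towards 'bad'."""
    score = e.regex_score
    for prob in e.nlp_bad_probs:
        if prob > 0.5:
            score += NLP_POINTS
    return score


def should_report(e: Evidence) -> bool:
    """Report only when the score is strictly above the threshold."""
    return final_score(e) > THRESHOLD
```

With these assumed numbers, a weak RegEx hit plus one confident classifier is enough to report, while a single classifier vote on its own is not.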


Before RegEx detection is applied, the comment is pre-processed by removing usernames, HTML tags, and repeated characters. Similarly, before NLP classification, code blocks, links, and characters outside the normal ASCII range are removed.
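A minimal sketch of the two clean-up passes, assuming the comment body arrives as HTML with `<code>` and `<a>` tags. The function names and the exact patterns are illustrative only and are not taken from the bot's source.

```python
import re


def prepare_for_regex(html: str) -> str:
    """Strip HTML tags, @username pings, and runs of 3+ identical characters."""
    text = re.sub(r"<[^>]+>", " ", html)               # HTML tags
    text = re.sub(r"@\w+", " ", text)                  # @username pings
    text = re.sub(r"([^\d\s])\1{2,}", r"\1", text)     # "sooooo" -> "so" (digits left alone)
    return re.sub(r"\s+", " ", text).strip()


def prepare_for_nlp(html: str) -> str:
    """Strip code blocks, links, and anything outside printable ASCII."""
    text = re.sub(r"<code>.*?</code>", " ", html, flags=re.DOTALL)    # code blocks
    text = re.sub(r"<a\b[^>]*>.*?</a>", " ", text, flags=re.DOTALL)   # anchor links
    text = re.sub(r"https?://\S+", " ", text)                         # bare URLs
    text = re.sub(r"[^\x20-\x7E]", " ", text)                         # non-printable / non-ASCII
    return re.sub(r"\s+", " ", text).strip()
```

Collapsing repeated characters makes stretched spellings match the same pattern as the plain word, which keeps the RegEx files shorter.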

Regular expressions

The RegEx system is split across three external text files: high-, medium-, and low-scoring RegEx.
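One way to picture the three tiers is sketched below. The file format (one pattern per line, `#` for comment lines), the point values per tier, and the sample patterns are assumptions for illustration; in-memory lists stand in for the three external files.

```python
import re

TIER_POINTS = {"high": 3, "medium": 2, "low": 1}   # hypothetical weights


def compile_tier(lines):
    """Compile one case-insensitive pattern per non-empty, non-comment line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def regex_score(text, tiers):
    """Sum the tier weight of every pattern that matches the text."""
    score = 0
    for name, patterns in tiers.items():
        hits = sum(1 for p in patterns if p.search(text))
        score += hits * TIER_POINTS[name]
    return score


# Stand-ins for the contents of the three external files (made-up patterns).
tiers = {
    "high": compile_tier([r"\bidiots?\b"]),
    "medium": compile_tier(["# a comment line", r"\bshut up\b"]),
    "low": compile_tier([r"\blearn to (read|code)\b"]),
}
```

Keeping the patterns in external files means they can be tuned without rebuilding the bot.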

Natural language processing (NLP)

Currently, two different NLP systems are used. The score is increased on the basis of each system's predicted value.

NLP Feed

The classifier's feed is divided into two categories: a good feed, which is a download of old (>90 days) comments from the comments API, and a bad feed, which is composed of:

  1. A feed provided by SE; these comments were reviewed manually and almost 50% were removed because they did not seem to be in scope

  2. Comments captured incorrectly or correctly during testing

  3. Twitter feed provided by Laurel

Currently the feed contains 2000 good Stack Overflow comments and 2000 (1000 Stack Overflow, 1000 Twitter) bad comments.


The classifier runs with these primary settings:

useWordFrequency: false (comments are too short to use word frequency)
lowercaseTokens: true
NormalizeDocLength: false
Stemmer: IteratedLovinsStemmer
Stopwords: Standard English stopwords
Tokenizer: NGramTokenizer (min 1, max 3)
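
To make the tokenizer setting concrete, here is a minimal sketch of word n-gram extraction with sizes 1 to 3 and lowercasing. It is not the library's implementation; it omits stemming and stopword removal, and the function name `word_ngrams` is made up for this example.

```python
def word_ngrams(text, n_min=1, n_max=3, lowercase=True):
    """Return all word n-grams of size n_min..n_max, shortest first."""
    words = (text.lower() if lowercase else text).split()
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n words across the comment.
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams
```

For the comment "You are WRONG" this yields the three single words, the two word pairs, and the full three-word phrase, so short insulting phrases become features in their own right.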

The settings were chosen on the basis of cross-validation and classification results on the training set. Testing on the training set correctly classified 99.075% of instances.

Classifier result

Apache OpenNLP

The standard Apache OpenNLP DocumentCategorizerME is used with these non-default settings:

FeatureGenerator: BagOfWordsFeatureGenerator
WhitespaceTokenizer: WhitespaceTokenizer.INSTANCE

Testing on the training set correctly classified 99.823% of instances, with 7 comments out of 4000 misclassified.

OpenNLP result

Other classifiers

Other classifiers, such as J48, SMO, and SGDText, have been tested but did not provide good enough results.


The bot runs under the user Queen, the same user that SOCVFinder runs under. The main reason is to share calls to the comments API, since SOCVFinder already queries it to fetch possible duplicates. The bot is present in the SOBotics chat room.

Source code

Source code is available on GitHub.


This StackApp is currently in a testing phase, to work out the right RegEx and especially to improve the feed by adding new bad Stack Overflow comments that replace the Twitter feed comments. The objective is to run on 3000 good/bad comments, all from Stack Overflow.

  • 5
    This is fantastic. I'm very glad to see someone taking this on, as I believe that the comments on SO get inappropriately mean with some regularity. The description here mentions a chat room where potentially troublesome comments are sent. I'm not sure if you are in need of more assistance, but I'd be happy to help with reviewing/flagging if you'd like. Commented Sep 1, 2016 at 3:18
  • 1
    @MichaelOhlrogge You are welcome. We try to keep everything we do as public and transparent as possible; stop by the SOCVFinder room. Since the bot is still under testing, the feed is only shown if an RO with notify is in the room, to make sure that the feed is not used inappropriately; see our policy FAQ. Commented Sep 1, 2016 at 9:31
  • 1
    The room is now SOBotics; the FAQ and other info are available at sobotics.org. Commented Jul 15, 2017 at 12:42
  • @PetterFriberg are you actively scanning these days, or relying on flagging? Interested because I've been building a comment scanner. Ping me in Charcoal or Sobotics, or somewhere else if you want to discuss.
    – CalvT
    Commented Feb 8, 2019 at 17:13
  • 1
    @PetterFriberg can you list out the possible feedback commands that the heat detector accepts... Commented Oct 19, 2020 at 5:55
  • Accepted commands listed on the GitHub wiki page: tp true positive aka "harassment/bigotry/abuse" flag, fp false positive aka not offensive, nc "unfriendly or unkind" flag, sk skip aka unsure how to classify this, fn false negative (used for reports). Commented Nov 25, 2021 at 22:41

