
Just took a look at this question and was struck by the lovely list of related questions:

What's wrong with...

Should we add this title to the low-quality posts filter?

Or should we have some logic that checks how similar a title is to other titles (and to how many of them), and adds to the low-quality score when both the similarity and the number of near-duplicates are high?


In a comment on his answer, kiamlaluno suggests using the Levenshtein distance between titles as a measure of how similar a title is to existing ones. I believe this would work well: if a title is very similar to (has a small Levenshtein distance from) more than some (arbitrary) number of other titles, that would be a flag for low quality.

With that in place, instead of "a question with that title already exists", we could show "several questions with very similar titles already exist; please use a more descriptive title".
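A minimal Python sketch of the Levenshtein idea, assuming titles are compared case-insensitively and the edit distance is normalized by the longer title's length, so that a similarity of 1.0 means identical. The constants `SIMILARITY_THRESHOLD` and `MAX_SIMILAR_TITLES` and the helper names are hypothetical stand-ins for the "arbitrary number" above, not anything Stack Exchange is known to use.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in 0..1; 1.0 means identical titles."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical thresholds: the question deliberately leaves them open.
SIMILARITY_THRESHOLD = 0.8
MAX_SIMILAR_TITLES = 3


def count_similar_titles(title: str, existing_titles: list) -> int:
    """How many existing titles are at least SIMILARITY_THRESHOLD similar."""
    return sum(1 for t in existing_titles
               if similarity(title, t) >= SIMILARITY_THRESHOLD)


def title_warning(title: str, existing_titles: list):
    """Return the suggested warning text, or None if the title looks fine."""
    if count_similar_titles(title, existing_titles) >= MAX_SIMILAR_TITLES:
        return ("Several questions with very similar titles already exist; "
                "please use a more descriptive title.")
    return None
```

Comparing a new title against every existing title costs one edit-distance computation per stored title, so a real implementation would presumably need some cheaper pre-filter first; this sketch ignores that.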

  • Pending migration to Stack Overflow. Not enough questions with that title in that list to be a problem. Commented Sep 4, 2012 at 11:59
  • Nice title. I had just opened the question to flag it as off-topic. But it's a nice question too.
    – Himanshu
    Commented Sep 4, 2012 at 12:01
  • @hims056 - Great. Working as designed ;)
    – Oded
    Commented Sep 4, 2012 at 12:03
  • In this case, the questions may well not be duplicates. I'm not saying they aren't simply "I've got a problem, but I don't know what it is", but how many ways can you title a question when something isn't working in an SQL query?
    – Fluffeh
    Commented Sep 4, 2012 at 12:09
  • @Fluffeh - You can be much more descriptive than "something's wrong".
    – Oded
    Commented Sep 4, 2012 at 12:10
  • Well, there is that one guy who has "syntax errors" in the title. That's a little more descriptive.
    – casperOne
    Commented Sep 4, 2012 at 12:10
  • @Oded True, but it is much easier to be descriptive in a language where you can say functionName isn't working. Not disagreeing, just saying it can be hard for someone new to SQL; it's not as if the error messages are generally amazingly descriptive.
    – Fluffeh
    Commented Sep 4, 2012 at 12:11
  • On the topic of that particular question, I'd point out the guy who was rude, got his answer downvoted, then had a hissy fit, downvoted all the other answers and finally deleted his answer. Is there a protocol or suggested response to that sort of behavior?
    – Fluffeh
    Commented Sep 4, 2012 at 12:15
  • what is wrong with my update query/ update query not working, but not much else
    – prusswan
    Commented Sep 4, 2012 at 12:21
  • @Fluffeh: Ignore and let the serial downvoting detector take care of the downvotes if the delete failed? SO is better off without such people in any case. Commented Sep 4, 2012 at 13:04
  • @Fluffeh SQL Server errors are normally extremely descriptive. What's unclear about most of them is that the person writing the query doesn't have a clue about databases or queries. The deceptive English-like qualities of SQL make folks think they can just write out whatever they want and it should work.
    – JNK
    Commented Sep 4, 2012 at 13:13
  • It's not only SQL. title:"What's wrong with*"
    – user138231
    Commented Sep 4, 2012 at 13:25
  • @Chichiray: Nice, but don't stop there. Commented Sep 4, 2012 at 13:39
  • 1590 rows returned
    – sth
    Commented Sep 4, 2012 at 14:37
  • Heh, I noticed this in chat almost a year back, before they put in that restriction: i.sstatic.net/2IrnA.png
    – Yi Jiang
    Commented Sep 5, 2012 at 23:48



I tried asking a question using "What's wrong with this SQL query?" as the title, and I got the following error messages.

[Screenshot: the error messages shown when posting with that title]

What should be added to the low-quality filter is every variant of the title, or the set of words common to all the variants. Considering that those questions are a tiny fraction of all questions, I am not sure anything needs to be done.
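A small sketch of the "set of words common to all the variants" idea, assuming a title's words are its lower-cased alphanumeric tokens (apostrophes kept, so "what's" stays one word). `words` and `common_words` are hypothetical helpers; the answer does not say how the filter would then use the resulting word set.

```python
import re


def words(title: str) -> set:
    """Lower-cased word set of a title; punctuation other than ' is dropped."""
    return set(re.findall(r"[a-z0-9']+", title.lower()))


def common_words(variants: list) -> set:
    """Words that appear in every variant of a title."""
    if not variants:
        return set()
    result = words(variants[0])
    for title in variants[1:]:
        result &= words(title)  # keep only words shared so far
    return result


variants = [
    "What's wrong with this SQL query?",
    "What's wrong with my SQL query",
    "what's wrong with this sql query??",
]
print(sorted(common_words(variants)))
# ['query', 'sql', "what's", 'with', 'wrong']
```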

  • Personally I would like something more general - if people are writing many variants of the same title, that's what I would like to be caught.
    – Oded
    Commented Sep 4, 2012 at 13:50
  • Something more general would compare the Levenshtein distance of the titles.
    – avpaderno
    Commented Sep 4, 2012 at 15:02
  • Argh, I didn't notice: they all vary by incredibly minor differences in punctuation, capitalization or wording. I wonder if that's how they got through.
    – Zelda
    Commented Sep 4, 2012 at 18:25
  • @BenBrocka - Exactly right. That's why I am looking for something more general than word search. And yes - as kiamlaluno says, the Levenshtein distance of the title is where I was going with...
    – Oded
    Commented Sep 4, 2012 at 19:06

Although duplicate titles are strictly verboten, the check is apparently case-sensitive, as this query returns results (some of them posted after Marc Gravell's answer).

select top 15 lower(title), count(*)
  from posts
 where posttypeid = 1
 group by lower(title)
 order by count(*) desc
lower(title)                                         count(*)
---------------------------------------------------- --------
object reference not set to an instance of an object 44
help with sql query                                  35
regular expression                                   33
mysql query help                                     31
mysql syntax error                                   28
mysql query problem                                  27
mysql query optimization                             24
jquery selector                                      24
regular expression help                              23
sql query problem                                    23
database design question                             20
jquery selector question                             19
jquery autocomplete                                  17
what does this code do?                              17
jquery validation                                    17

There are some unimaginative MySQL and jQuery users around...

Judging by these results and your own, a quick win might be to lower-case everything and strip out all punctuation when doing this check, though it would be harder on the indexes.
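A rough Python sketch of that normalization, assuming "punctuation" means the ASCII characters in `string.punctuation` and that runs of whitespace are collapsed as well. `normalize_title` and `duplicate_groups` are hypothetical names; the grouping mirrors what the SQL query does with `lower(title)`, but on the fully normalized key, so titles differing only in case, punctuation or spacing collide.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_title(title: str) -> str:
    """Lower-case, drop ASCII punctuation, and collapse whitespace runs."""
    return " ".join(title.lower().translate(_STRIP_PUNCT).split())


def duplicate_groups(titles, min_count=2):
    """Normalized titles shared by at least min_count posts, most common first."""
    counts = Counter(normalize_title(t) for t in titles)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


titles = [
    "MySQL query help",
    "mysql query help!",
    "MySQL  Query Help?",
    "jQuery selector",
    "jquery selector.",
]
print(duplicate_groups(titles))
# [('mysql query help', 3), ('jquery selector', 2)]
```

Note that stripping the apostrophe turns "What's" into "whats", which is harmless here because every title is normalized the same way.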

  • I like it. I did figure that the Levenshtein distance option may be a bit too difficult in practice, but this strikes a good middle ground.
    – Oded
    Commented Sep 6, 2012 at 8:29
