replaced http://workplace.stackexchange.com/ with https://workplace.stackexchange.com/
Source Link

First, I am not too sure if this is “Workplace” specific, or a question best suited for the larger Stack Exchange world. If this question can be moved to a more appropriate place, please do so.

So this morning I came across this bizarre/stilted question that refers to events from 2010:

Beyond the very stilted wording & references to 2010, something else stood out to me: The text itself has tons of odd Unicode “gremlins” in it, mainly in what appear to be spaces but are actually something else. Perhaps non-breaking spaces? Who knows. It’s one of those things I bet I could dissect deeper, but I had better things to do today.

Anyway, my hunch that it was fake was confirmed by the user jonsca who discovered the question is lifted from some H.R. management text or test. Kudos to jonsca!

So anyway, knowing this text was littered with an inhuman amount of Unicode cruft that made any attempt to edit it futile, is there any automated way the Stack Exchange filtering system can detect stuff like this & perhaps flag it right away? Or is that part of the CAPTCHA mechanism that pops up every now & then?

It seems to me that the clean text of a valid, hand-typed question is easy to tell apart from the Unicode junk seen in this post. But it is unclear to me whether Stack Exchange actually detects stuff like this & it still comes through.

Basically: This was obviously not a real question, even just from the formatting. How can we combat this stuff? Considering the clear data discrepancies at the Unicode level, can better filtering be put in place to catch junk like this before it hits the system?
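
As a rough illustration of the kind of check being asked about, here is a minimal Python sketch. It is not how Stack Exchange actually filters posts; the function names `find_gremlins` and `gremlin_ratio` are made up for this example, and treating the Unicode categories `Zs` (space separators) and `Cf` (invisible format characters) as suspicious is an assumption about what the “gremlins” in that post were.

```python
import unicodedata

# Assumption: "gremlins" are characters that render as blank or invisible
# but are not a plain ASCII space (e.g. U+00A0 NO-BREAK SPACE, U+200B).
SUSPECT_CATEGORIES = {"Zs", "Cf"}  # space separators and format characters


def find_gremlins(text):
    """Return (index, code point, name) for each suspicious character."""
    hits = []
    for i, ch in enumerate(text):
        if ch == " ":
            continue  # an ordinary ASCII space is fine
        if unicodedata.category(ch) in SUSPECT_CATEGORIES:
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


def gremlin_ratio(text):
    """Share of gremlins among gremlins plus plain ASCII spaces."""
    gremlins = len(find_gremlins(text))
    total = gremlins + text.count(" ")
    return gremlins / total if total else 0.0
```

A post could then be flagged for human review when `gremlin_ratio(body)` exceeds some threshold. Any such cut-off is hypothetical and would need tuning, since legitimately pasted text can also contain a few non-breaking spaces.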

added 185 characters in body
Source Link
Giacomo1968
  • 9.9k
  • 10
  • 8

Source Link
Giacomo1968

Detecting fake/spam questions via bizarre Unicode gremlins?

First, I am not too sure if this is “Workplace” specific, or a question best suited for the larger Stack Exchange world. If this question can be moved to a more appropriate place, please do so.

So this morning I came across this bizarre/stilted question that refers to events from 2010:

Beyond the very stilted wording & references to 2010, something else stood out to me: The text itself has tons of odd Unicode “gremlins” in it, mainly in what appear to be spaces but are actually something else. Perhaps non-breaking spaces? Who knows. It’s one of those things I bet I could dissect deeper, but I had better things to do today.

So anyway, knowing this text was littered with an inhuman amount of Unicode cruft that made any attempt to edit it futile, is there any automated way the Stack Exchange filtering system can detect stuff like this & perhaps flag it right away? Or is that part of the CAPTCHA mechanism that pops up every now & then?

It seems to me that the clean text of a valid, hand-typed question is easy to tell apart from the Unicode junk seen in this post. But it is unclear to me whether Stack Exchange actually detects stuff like this & it still comes through.

Basically: This was obviously not a real question, even just from the formatting. How can we combat this stuff?