Timeline for Detecting fake/spam questions via bizarre unicode gremlins?
Current License: CC BY-SA 3.0
12 events
when | what | by | license | comment | |
---|---|---|---|---|---|
Apr 13, 2017 at 12:48 | history | edited | CommunityBot | | replaced http://workplace.stackexchange.com/ with https://workplace.stackexchange.com/ |
May 27, 2014 at 18:03 | answer | added | IDrinkandIKnowThings | timeline score: 3 | |
May 27, 2014 at 8:16 | comment | added | gnat | meta.stackexchange.com/questions/213201/… We spoke about weird unicode stuff in posts a while ago at Whiteboard chat at Programmers. Back then, guys brought up examples of legitimate posts with unicode, if memory serves these were from Islam.SE and some other smaller SE sites | |
May 27, 2014 at 1:53 | comment | added | jmac | While I'd love to discuss it further Jake, continuing the discussion of odd character sets would be better suited for The Workplace Chat. (Don't want to dilute the message to the super-important community team!) | |
May 27, 2014 at 1:51 | comment | added | Giacomo1968 | @jmac Understood. I worked on a Chinese website 5 years ago & was stunned how little I truly understood about character set issues before that. | |
May 27, 2014 at 1:44 | comment | added | jmac | I absolutely see the concern, but my guesstimation is that this wasn't done intentionally (in the sense of the person maliciously doing it to try to skirt spam filters or the like), but rather due to the encoding used on the site it was copied from. If copying from sites that use non-ascii characters results in weird unicode spaces, any filter may end up punishing other legitimately copy-pasted resources in questions (which would be bad). Asian languages are...problematic...when it comes to encoding (this seems to be from Pakistan). | |
May 27, 2014 at 1:41 | comment | added | Giacomo1968 | @jmac Okay, obviously you are right. But just so you understand, pretty much every seemingly empty space in the original post was what I believe to be a unicode U+00A0 non-breaking space. | |
May 27, 2014 at 1:38 | comment | added | jmac | Obviously they can search for certain character codes in submissions to filter stuff like this out. The question is whether or not that would also filter out content that does have value for the site. That's something for the community team to look at, as they probably won't share the nitty gritty of the spam algorithm for (hopefully) obvious reasons. At any rate, thanks for pointing it out, and the community team will see it. | |
May 27, 2014 at 1:34 | comment | added | Giacomo1968 | @jmac Thanks! But the question I mention is given how inhuman the basic text formatting is—littered with gremlins—is there some way to automatically look out for this stuff? | |
May 27, 2014 at 1:32 | history | edited | Giacomo1968 | CC BY-SA 3.0 | added 185 characters in body |
May 27, 2014 at 1:31 | comment | added | jmac | This is a copy-paste from somewhere else. We have had it before. Developers/Community Managers review our meta, so they will see this as-is. Don't worry! | |
May 27, 2014 at 1:25 | history | asked | Giacomo1968 | CC BY-SA 3.0 |
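
The comments above float the idea of searching submissions for certain character codes, such as the U+00A0 non-breaking space. As a rough illustration only, and not a description of how Stack Exchange's spam detection works, a minimal Python sketch of that idea might look like the following. The function names `find_gremlins` and `suspect_space_ratio` and the choice of Unicode categories are assumptions made for this example.

```python
import unicodedata

# Assumption for this sketch: treat space separators (Zs) other than the
# plain ASCII space, plus invisible format characters (Cf), as "gremlins".
SUSPECT_CATEGORIES = {"Zs", "Cf"}


def find_gremlins(text):
    """Return (index, code point, Unicode name) for each suspect character."""
    hits = []
    for index, char in enumerate(text):
        if char == " ":
            continue  # an ordinary ASCII space is never suspect
        if unicodedata.category(char) in SUSPECT_CATEGORIES:
            name = unicodedata.name(char, "UNKNOWN")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


def suspect_space_ratio(text):
    """Share of space-like characters that are not a plain ASCII space."""
    spaces = [c for c in text if unicodedata.category(c) == "Zs"]
    if not spaces:
        return 0.0
    return sum(c != " " for c in spaces) / len(spaces)


sample = "a\u00a0b c"
print(find_gremlins(sample))
print(suspect_space_ratio(sample))
```

As jmac and gnat point out, legitimately copy-pasted or non-English posts can contain the same characters, so a check like this is probably better suited to flagging a post for human review than to rejecting it outright.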