107

Last week I received an email saying "I saw on Stack Overflow that you were interested in.." followed by the content of a question I asked. The email suggested that I star a bug / feature request on the Chromium website. The request is totally legit and I don't mind it at all; my main concern is that my email address is somehow exposed on Stack Overflow.

I emailed this guy back and asked him where he got my email address from. He replied: "I made a long list of folks to email to tell about the bug report, so I don't remember specifics of any of them."

I'm sure that I'll get a lot of responses here saying that Stack keeps our emails private, but the facts are:

  1. I never published or used this email address on Stack Overflow besides my login.
  2. My user name is short and unique to Stack Overflow. (I don't use it on other sites, which eliminates the chance that a Google search will show my email address for a specific user. I don't think someone would search for a 3-letter username and expect to find anything.)
  3. Googling my user name and my email address returns zero results.

Is it possible that the Stack Overflow team is using users' emails for their own benefit?

I don't know how relevant this is, but this guy's about.me site says that he is an ex-Googler.

Update:

A grace period on the bounty has started, so I'm adding here what I've got so far. Even if my SO login email is buried somewhere in the Stack pages (which I still doubt), finding it, as we all figured out, isn't an easy task. I don't think that anyone in his right mind (besides us :) ) would waste more than a minute to find someone's email for something as minor as starring a bug.

Possible explanations:

  1. There's a dead-simple MD5 cracker that can recover the original string from an MD5 hash.
  2. The person who emailed me gained some kind of access to the SO user list.

I've forwarded the email, as requested, to one of the SO team; maybe that will help us solve this mystery.

Update 2:

One of the users in the comments (@Arjan):

2 hours ago I left a comment on one of his answers (he is referring to the person who emailed me; he found his SO username) to get his attention for this very question, and now the whole answer is gone. Okay, it was not a very good answer. But assuming he deleted it himself, he knows about this very question now, but he did not respond -- yet

The comment above was written on the 24th of Nov. The user in question got his first 100 rep Association Bonus on Meta on the 24th of Nov. He definitely saw the comment, deleted his answer, read this question and chose to ignore it.

Update 3:

You should all read hwlau's answer, which makes perfect sense and describes how vulnerable MD5 is and how people can use that to their advantage.

135
  • 32
    I had a person who contacted and followed me through several of my public and private social profiles, going so far as to guess my email address that, while easy to guess, wasn't disclosed on my SO profile (and by guess I mean sending a dummy email first just to see if it worked), just so he could "chat". I found it disconcerting, to say the least. Commented Nov 18, 2013 at 14:18
  • 3
    I don't know if this is still the case but this could be related meta.stackexchange.com/questions/21117/…
    – Marc-Andre
    Commented Nov 18, 2013 at 14:20
  • 8
    The particular question might not be the issue. If you have identifying info in any question or answer, it's easily linked to that one through your profile. A single link to a bug tracker, personal repository, etc, is all it really takes. True, you'd need to be somewhat... persistent... to find it, but people be crazy.
    – Geobits
    Commented Nov 18, 2013 at 14:34
  • 2
    @Sha yes you do have Gravatar, it's still linked in this account of yours and another one - decoding one of those hashes will lead to your email address. (e.g. gravatar.com/avatar/…) Commented Nov 18, 2013 at 14:38
  • 2
    How simple/feasible/realistic is that @ShaWizDowArd? Any clue? Afaik, you could only really confirm whether the address you're thinking of has a certain Gravatar. Not obtain the full email address from it.
    – Bart
    Commented Nov 18, 2013 at 14:39
  • 3
    This OP specifically. A personal project was posted(with dropbox link) on a different question. The project's header files just might contain something identifiable. A username that may be used elsewhere. A typical auto-inserted "Created by ..."
    – Geobits
    Commented Nov 18, 2013 at 15:00
  • 6
    @Sha Looks like you posted a YouTube video in answer to a question back in June, and in the description for the video is a link to your GitHub account, which contains your email address. Could that explain it?
    – JMK
    Commented Nov 18, 2013 at 16:08
  • 6
    It's very hard to keep from leaking information about ourselves. Have a hard look at this question. Should you have perhaps anonymized the user-agent string?
    – ale
    Commented Nov 18, 2013 at 16:13
  • 3
    Does your email address begin shawn.gragg? Googling the user agent brings back this and filling in the captcha reveals it starts shawn.gr... and googling "Shawn McGraggerSoft" reveals a surname. Commented Nov 18, 2013 at 16:54
  • 2
    Dammit. The "Sha" made it so tempting. Grr
    – Bart
    Commented Nov 18, 2013 at 17:05
  • 4
    OK, this is really weird. Several people have tried their Google- and other fu on this and came up with nothing, right? And the "cracking the gravatar hash" theory sounds super unlikely.
    – Pekka
    Commented Nov 18, 2013 at 20:39
  • 2
    I fear the only way will be to plainly ask the guy who wrote you "How did you find my email?" again, not taking "I made a list" as answer. He may or may not answer but it's worth a shot. Commented Nov 18, 2013 at 23:55
  • 10
    This is intriguing - would you forward the offending email to [email protected] please? No one outside the technical staff will see it. Commented Nov 26, 2013 at 22:23
  • 7
    This is like reading an exciting book :)
    – Stijn
    Commented Nov 27, 2013 at 9:03
  • 3
    No updates? This has had me on the edge of my seat for weeks! Commented Dec 9, 2013 at 18:47

6 Answers

35

The user that contacted you performed a search, found your question, then visited your user page. He had no other contact with any of your other content. That's all the information our logs show.

As others have said, it's possible the user has access to an MD5-to-email lookup. To mitigate the ease of obtaining Gravatar hashes, we've already removed them from data dumps and data.stackexchange.com, and we're working on a solution for the remaining places the hashes are exposed (i.e. on the sites themselves and their APIs).

2
  • (Just in case it helps you and you missed it: the email was sent on Fri, 15 Nov 2013 10:09:57 -0800 (PST). I'm curious how much time passed since visiting Sha's profile, but I understand that's not information you'll make public.)
    – Arjan
    Commented Jan 30, 2014 at 19:55
  • @Arjan I'm also very curious about this. I don't see why Jarrod won't post it here.
    – Segev
    Commented Jan 30, 2014 at 20:22
94
+25

Here's one way they could have done it.

I was able to link you to a GitHub account based on a Dropbox link you posted (with a quick Google to make the connection). From there, I cloned your repository and your email address was visible in the raw Git commit data.

I also found a MozillaZine account, but I couldn't find any Chromium bug tracker connections.

To respect your privacy, I won't be any more specific (unless you ask otherwise).

Someone really wanted to talk to you (or maybe they found an easier way).

4
  • 3
    @Chris I've found the GitHub connected to the dropbox link. Unfortunately that doesn't solve the mystery. I don't have a GitHub account with the email in question. You are still getting +1 for outsmarting us all ;) The email should start with "na".
    – Segev
    Commented Nov 27, 2013 at 9:46
  • @Sha Ah, ok. Yeah, the GitHub account was under a Gmail address. Commented Nov 27, 2013 at 16:28
  • 4
    @Chris the email in question is also under gmail but like I said starts with na.
    – Segev
    Commented Nov 27, 2013 at 16:52
  • Also note that he left the very same message to other users on SO by just leaving a comment (and even posting an answer, which basically also was a comment, and meanwhile has been deleted -- see comments on the question above). And commenting is so much easier than even thinking about cloning a repo. In other words: he must have an easier way?
    – Arjan
    Commented Nov 27, 2013 at 21:41
29
+100

The situation mentioned by @D.W. is very real. The simple calculation below suggests that all emails with fewer than 11 characters could already have been cracked, and I will give a possible motivation for doing so.

The data dump here already contains all the MD5-hashed emails and the corresponding user IDs. Hence, a brute-force match is possible, and there are people who have hundreds of GPUs lying around. Suppose each top GPU can do 1 GHash/s, there are 100 GPUs, and they run for 3 days. That gives a total of H = (10^9 * 100 * 3 * 86400) ~ 2.6*10^16 hashes. Also, assume the character set is only lowercase letters, numbers and some symbols, which amounts to at most 40 characters, so log(H)/log(40) = 10.25.

It means that all emails with 10 or fewer characters from major email providers could already have been cracked by brute force. Provided you have the equipment, it is easy to do and requires very little technical skill compared with "advanced skills" like dictionary attacks (which should be harder to parallelize on GPUs, I guess). Of course, more can be cracked if they spend more time, but the basic one is really just hours of setup.
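For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The hash rate, GPU count, run time and alphabet size are just the assumptions from the estimate above, not measured figures, and the constant names are made up for illustration.

```python
import math

# All four figures are assumptions taken from the estimate above.
HASH_RATE_PER_GPU = 10**9   # 1 GHash/s per GPU
GPU_COUNT = 100             # number of GPUs
RUN_SECONDS = 3 * 86400     # three days
ALPHABET_SIZE = 40          # lowercase letters, digits and a few symbols

# Total number of MD5 evaluations available in the run.
total_hashes = HASH_RATE_PER_GPU * GPU_COUNT * RUN_SECONDS

# Longest string length L for which ALPHABET_SIZE**L still fits in the budget.
max_length = math.log(total_hashes) / math.log(ALPHABET_SIZE)

print(f"total hashes: {total_hashes:.2e}")
print(f"exhaustible length: {max_length:.2f} characters")
```

Note that the budget covers the part before the @ for one fixed domain; each additional domain multiplies the work.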

About the motivation: someone might do it for fun, but it does cost time and money to run the equipment. So if they can sell the userid<->email list a few dozen times, that would be a good motivation. The list might have various uses. In particular, with the price and restrictions on career.stackoverflow, such a simple list would be very handy for companies bypassing the system and contacting candidates directly. There are various email lists out in the wild, and the person using one might not even know where it came from. Also, most of the users being contacted would not notice that the leak came from here, as their emails are likely reused elsewhere. Anyway, my point is that such a list is easy to create and there is enough motivation to do so.

Even though Stack keeps saying that emails are private, they have already released the email list, as easily decoded MD5 hashes that aren't even salted.


Edit: This attempt could have been made harder, even infeasible, if the data dump contained salted MD5 hashes, which would require attacking one email at a time, so my estimate above would apply to a single email only. But with unsalted hashes, the GPU only needs to blindly hash all combinations and send the results back to the CPU, which then compares them against all user emails at once. So it is faster by a factor of #users, which makes the extraction feasible. The limitation would be memory bandwidth, which I did not include in my estimate, but that only slows the results down by a small factor depending on the hardware.
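To illustrate why the missing salt matters, here is a toy sketch in Python (on the CPU, with a tiny alphabet). The function names, the sample addresses and the example.com domain are all hypothetical; the only point is that each candidate is hashed once and then checked against every target hash with a single set lookup.

```python
import hashlib
from itertools import product


def md5_hex(text):
    """Return the hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def crack_unsalted(target_hashes, alphabet, max_len, domain):
    """Hash each candidate once and test it against all targets at the same time."""
    targets = set(target_hashes)
    found = {}
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            email = "".join(chars) + "@" + domain
            digest = md5_hex(email)
            if digest in targets:  # one lookup covers every user in the dump
                found[digest] = email
    return found


# Toy data: made-up addresses on a made-up domain.
known = ["ab@example.com", "cab@example.com"]
hashes = [md5_hex(e) for e in known]
recovered = crack_unsalted(hashes, "abc", 3, "example.com")
print(sorted(recovered.values()))
```

With a per-user salt, the digest would have to be recomputed for every (candidate, user) pair, so the inner loop would run once per user instead of once in total.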

16
  • 6
    "with the price and restriction on career.stackoverflow, such a simple list would be very handy for the companies bypassing the system and directly contacting with the candidates" -- and in case you missed it: the sender's startup is indeed about some website to match candidates with jobs...
    – Arjan
    Commented Dec 18, 2013 at 23:16
  • @Arjan Yup, I don't know the details, I am just speculating here. My point is that I believe everything that is possible, motivated and easy would be done by someone. The easier, the more people will do that.
    – unsym
    Commented Dec 19, 2013 at 1:21
  • @Arjan: I thought they claimed it was just for some feature request on a bug tracker? Commented Dec 19, 2013 at 20:09
  • Indeed, the email to Sha and the comments to other people at Stack Overflow were about a bug report, @Chris. However, the author of that email and those comments is a co-founder of a startup company that advertises giving recruiters "a complete view" of candidates... That, plus his silence here though he surely knows we're wondering how he got the email address, makes me quite suspicious. But no proof at all, of course. Could be something else for sure.
    – Arjan
    Commented Dec 19, 2013 at 23:07
  • 1
    @hwlau It all adds up. If it's as easy as hwlau describes, it makes perfect sense. Scan the SO db, extract all the MD5s you can find, then use all the CPU/GPU power you have to crack them, and end up with a gigantic list of programmers' emails (perfect job candidates). I can imagine that such a list is worth a LOT of money. On the way, I'll promote a bug report I care about. I don't have that user's MD5... I'll just comment on his question... I don't have that user's MD5... I'll just comment on his question... I have Sha's MD5 on my list, cool, I'll just email him instead.
    – Segev
    Commented Dec 20, 2013 at 5:09
  • 1
    But why would someone use such a valuable list to ask for upvotes in a bug database?
    – Pekka
    Commented Dec 20, 2013 at 5:34
  • 1
    Imaginary internet points are really important to people, didn't you know?
    – Joe
    Commented Dec 20, 2013 at 6:00
  • @Joe "Earned ten of the most meaningless points on the 'Net" - Shog9♦
    – unsym
    Commented Dec 20, 2013 at 6:03
    No proof of course, I'm not accusing anyone.. We are just speculating here.. I'm just saying that after reading all the info here, especially this answer, and then reading the first title of this guy's startup page... it all feels too suspicious.
    – Segev
    Commented Dec 20, 2013 at 6:28
  • @Pëkka: Maybe the whole reason he was contacting people was to verify the email addresses, before selling them to recruiters. (If Arjan's guess is correct, then this might make sense). Commented Dec 20, 2013 at 16:38
  • Or, @Pëkka, simply because he didn't realize he was exposing having such list? (If there is such list, of course.) If I would have created such list, I would also have created a user script to easily find me the email address when just clicking any gravatar. Or maybe a script to show the email address next to a gravatar, and then at some point forgetting that it was not public information I was looking at...
    – Arjan
    Commented Dec 21, 2013 at 10:24
  • (In case you missed it, @Pëkka, I'm worried by that guy's startup company promising recruiters "a complete view" of candidates, and that guy not responding here but cleaning up his answers/comments instead. Hoping there's a much simpler explanation...)
    – Arjan
    Commented Dec 21, 2013 at 10:33
    Ah: since November Stack Exchange has been salting the hashes used for generating identicons for users who do not have a Gravatar account. (So the old data dumps are becoming valuable, if hashes were used to get email addresses. But then, @Sha does have a Gravatar account, so the hashing is not related to this very question.)
    – Arjan
    Commented Dec 21, 2013 at 12:02
  • @Arjan What makes you say that I have a Gravatar account? I don't remember signing up for one.. I thought the Avatar pics I got on other SE websites are something default that the system gives to any user that doesn't upload a picture.
    – Segev
    Commented Dec 21, 2013 at 13:16
  • @Sha, my bad: I somehow forgot about the option to upload an image to SE. So when I saw that your avatar here on Meta is clearly not some generated identicon, I boldly assumed you uploaded a custom image to a Gravatar account... (Which is stupid to think, as I am using the SE upload function myself as well, with a slightly altered Gravatar identicon.) As an aside, after re-reading Unexpectedly changing identicon: maybe the hashes for identicons are only salted as soon as one clicks the link to change the image.
    – Arjan
    Commented Dec 21, 2013 at 13:23
25

The mail address in your profile is private, only mods and SE employees can see it and they are not allowed to share it with third parties. Sharing PII like mail addresses is covered by the moderator agreement, and each access to that information is logged as well. SE doesn't sell mail addresses or something like that as far as I know.

The usual way someone gets your email address if it isn't published in your profile is by correlating your account with another one that has the mail associated. If your user profile contains enough information to identify you, the mail address could then be taken from your Twitter or Facebook account or any other source that can be somehow connected to your account.

In your specific case I can't actually see how this would have worked, though. I don't see enough personal information in your profile.

9
  • 1
    Whilst true in the general sense, this user's profile has very little identifying information. A short meaningless username, no website, no location, nothing personal in the bio.
    – Matt
    Commented Nov 18, 2013 at 14:22
  • @Matt I just added that as I noticed it as well Commented Nov 18, 2013 at 14:22
    Maybe something he revealed in one of his questions. He said: and the content of the question I asked.
    – Marc-Andre
    Commented Nov 18, 2013 at 14:25
  • I started my profile a while back and to my recollection never posted anything personal on my profile. This is the question the email referred to: stackoverflow.com/questions/19895922/…
    – Segev
    Commented Nov 18, 2013 at 14:32
  • 3
    @Sha The Github link contains your full name Commented Nov 18, 2013 at 14:34
  • 2
    If that is him, and he's not simply using the code @MadScientist.
    – Bart
    Commented Nov 18, 2013 at 14:35
  • 8
    @Matt That's just someone's repository, I was just using his code.
    – Segev
    Commented Nov 18, 2013 at 14:37
  • 2
    @MartijnPieters The OP explicitly states in the comment above yours that it's not him. Am I missing something?
    – Bart
    Commented Nov 18, 2013 at 15:42
  • @Bart: I missed that comment. Commented Nov 18, 2013 at 15:45
24

There are some technical aspects of the Stack Exchange platform that could, at least in principle, allow someone to deduce your email address. I somehow doubt that it's what happened to you, but I'll document it here for posterity, since I haven't seen it written down anywhere else.

In particular, the MD5 hash of your email address is made publicly visible in data dumps and via Data Explorer (e.g., try this query). Moreover, for many people it will be possible to recover your email address, given its MD5 hash. For many people, their email address has relatively low entropy: there are a handful of common options for the domain name, and one can enumerate many common possibilities for the username (one can also use your profile and your StackExchange username to help seed candidates for the username). This provides a large list of candidate email addresses for you. Then, one can use offline dictionary search to test each one against the known MD5 hash, to see if any of those have the same MD5 as your known MD5 hash. Due to the improvements in use of GPUs for password cracking, there are existing tools for this sort of MD5 reversing that are surprisingly efficient. Based upon past analyses, I would expect that many (but not all) users could have their email address recovered.
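A minimal sketch of that kind of offline dictionary search, in Python. The seed names, suffixes and domains below are made-up placeholders, and the helper names are hypothetical; a real cracking tool would use far larger word lists and a GPU.

```python
import hashlib
from itertools import product


def md5_hex(text):
    """Hex MD5 of the trimmed, lowercased string."""
    return hashlib.md5(text.strip().lower().encode("utf-8")).hexdigest()


def candidate_emails(seeds, suffixes, domains):
    """Combine guessed usernames, common suffixes and common domains."""
    for seed, suffix, domain in product(seeds, suffixes, domains):
        yield f"{seed}{suffix}@{domain}"


def dictionary_search(target_hash, candidates):
    """Return the first candidate whose MD5 matches the target, else None."""
    for email in candidates:
        if md5_hex(email) == target_hash:
            return email
    return None


# Hypothetical seeds derived from a profile name, plus placeholder domains.
seeds = ["jdoe", "john.doe", "johnd"]
suffixes = ["", "1", "84"]
domains = ["example.com", "example.org"]

target = md5_hex("john.doe84@example.org")
match = dictionary_search(target, candidate_emails(seeds, suffixes, domains))
print(match)
```

The search only succeeds if the real address happens to be in the candidate list, which is why low-entropy addresses are the ones at risk.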

I haven't seen any place that publicly discloses this risk. I learned about it on a post I made to MSO a while ago that has since been auto-deleted.

That said, while this is technically possible, I don't know whether it's likely that anyone will bother to go to these lengths just to learn your email address. They'd have to download an MD5 cracker, configure it to generate plausible email addresses, tweak it by hand, and leave it running for hours. That doesn't sound super-plausible to me.

Bottom line: my expectation would be that there's some other explanation -- that this is not the cause of the email you got. However, I suppose it's possible, in theory.

4
  • Like I said in the comments, I doubt that someone made that effort just to get my email plus my email isn't a dictionary word.. still, you wrote some great information here that I wasn't aware of. Thanks.
    – Segev
    Commented Nov 19, 2013 at 5:38
  • I suspect that the spammer got hold of the mail addresses from somewhere else and calculated the MD5 hash of each address. Commented Nov 19, 2013 at 13:24
  • It wasn't a spammer, @Johannes. It was a legit request referencing a specific SO question (that's what's so weird)
    – Pekka
    Commented Nov 23, 2013 at 0:04
  • Please merge this with your other answer as much of it overlaps.
    – Caleb
    Commented Dec 19, 2013 at 20:07
16

Due to a technical weakness in the gravatar system, if you use an auto-generated Gravatar, in many situations it would be technically possible to derive your email address from the Gravatar you use. Whether anyone will go to the lengths necessary to do so is less clear, but it is technically possible.

Here's how. The URL to your Gravatar contains an MD5 hash of your email address. There is no salt in the MD5 hash, so it is possible to do a dictionary attack: guess many possibilities for your email, and test each one to see whether it is correct. If your email address is guessable (and many people's email addresses are), then this would be one way that someone could learn your email address, if you use the default Gravatar.
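As a rough illustration in Python: Gravatar's classic scheme takes the MD5 of the trimmed, lowercased address and puts it in the avatar URL, so anyone who sees the URL can test a guess offline. The function names and the sample addresses are hypothetical.

```python
import hashlib


def gravatar_hash(email):
    """MD5 of the trimmed, lowercased address, as used in Gravatar URLs."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def gravatar_url(email):
    """Build the avatar URL that ends up embedded in a page."""
    return "https://www.gravatar.com/avatar/" + gravatar_hash(email)


def hash_from_url(url):
    """Pull the hash back out of an avatar URL, ignoring any query string."""
    path = url.split("?", 1)[0]
    return path.rstrip("/").rsplit("/", 1)[-1]


def confirms_guess(url, guess):
    """True if the guessed address produces the hash seen in the URL."""
    return hash_from_url(url) == gravatar_hash(guess)


# Made-up address: the page only exposes the URL, never the address itself.
url = gravatar_url(" Jane.Doe@Example.com ") + "?s=32&d=identicon"
print(confirms_guess(url, "jane.doe@example.com"))
print(confirms_guess(url, "john.doe@example.com"))
```

This only confirms or rejects a guess; the address is never read directly from the hash.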

I once wrote a detailed technical explanation of this on Meta.StackOverflow, but it looks like the question was deleted (some folks took exception to my characterization of this as a privacy weakness or felt it was covered by prior discussion and downvoted the question, so it ended up with a total score of -2: +11 upvotes, -13 downvotes; as a result it eventually got auto-deleted by the Community user), so you won't be able to read it unless you have > 10K rep on MSO. You can find a less detailed explanation here, but that post is old, so it doesn't take into account the potential for GPU-based dictionary attacks, which potentially change the risk calculus, and it misses some relevant links and analysis.

If I recall correctly, I believe this only applies to people who use the default auto-generated Gravatar. If you upload your own avatar image, it doesn't apply to you. It looks like you currently have a custom avatar image, so assuming this is not something new for you, I suspect this doesn't apply to you and so probably wasn't the way they got your email address. But I'm just trying to document this for posterity, in case anyone else runs across this situation.

9
  • Although I don't use a Gravatar on Stack Overflow (as people mentioned in the comments), I have a default Gravatar on Stack design and other Stack sites I used once or twice. But my email address is not a dictionary word, so I'm assuming you'd need one hell of a brute-force attack to discover it.. Doing all that just to get someone's email? Idk..
    – Segev
    Commented Nov 18, 2013 at 17:27
  • Yeah. Agreed, @Sha. Sounds pretty implausible -- doesn't seem very likely to be what's going on in your situation...
    – D.W.
    Commented Nov 18, 2013 at 17:32
  • 2
    @Sha It doesn't matter if the gravatar is default or not. Merely using it is enough to leak the hash. Back in 2011 I ran a cracker on those hashes, and recovered 27k Commented Nov 23, 2013 at 13:09
  • @CodesInChaos How much time do you think it would take to crack my hash to find my email address? (I'm just curious if that is really the way that guy got my email address)
    – Segev
    Commented Nov 23, 2013 at 13:17
  • @Sha The main issue with cracking is obtaining the word lists and configuring the cracker. For somebody who is into password cracking this is pretty easy; for others it's probably several hours of work. The cracker then attacks all hashes at the same time. It'll find many email addresses within the first few hours, and then become slower and slower. Whether your address would be among those cracked depends on how guessable it is, and how good the dictionary used by the cracker is. Commented Nov 23, 2013 at 13:34
  • @CodesInChaos And if the email is something like [email protected] Is that even possible to crack using just the hash?
    – Segev
    Commented Nov 23, 2013 at 15:32
  • @Sha If it doesn't contain any patterns (yours does), 10 chars have 52 bits of entropy. This is outside of the range I tried with my CPU cracker, but with a GPU cracker it might be possible. With a slightly shorter name or by exploiting patterns it becomes even cheaper. That said, it's pretty unlikely that somebody bothered to brute force 10 characters to recover your email. Commented Nov 23, 2013 at 15:56
  • Very interesting. What makes you say that mine contains patterns? It only contains 7 chars btw.
    – Segev
    Commented Nov 23, 2013 at 16:06
  • 3
    Please merge this with your other answer as much of it overlaps.
    – Caleb
    Commented Dec 19, 2013 at 20:07
