614

We'd first like to take a moment to thank everyone for their patience while we put this together. Your restraint was a very big help in us handling this incident with the degree of diligence that all of you deserve; thank you for waiting so patiently as we worked to resolve it.

tl;dr:

On 2016-11-28 it was brought to our attention that we were unintentionally exposing the email addresses and phone numbers of users that filled out a Developer Story. The information wasn't actually displayed by browsers, but was present in the page's HTML source markup. The bug causing this had existed since the Developer Story private beta, but was only exposed once the beta period switched to public on 2016-10-11.

The bug was immediately fixed, and we spent quite a bit of time working with all major search engines and archival services in order to ensure that the accidentally disclosed information was either suppressed in results pending reindexing, or purged subsequent to reindexing.

This bug affected a very small percentage of Stack Overflow users, limited specifically to users that had filled out a Developer Story prior to 2016-11-28. Discovery of the information was possible only through very specific searches containing the user's email address (if known) or phone number (if known).

We strongly believe that any potential for inconvenience due to this bug has been mitigated, and that there's no additional cause for concern regarding any accidental disclosure of email addresses or phone numbers. We sincerely apologize if anyone was inconvenienced in any way as we corrected the bug and worked to mitigate any potential lingering effects.


Now, the longer version.

What happened, and when?

On 2016-11-28 at approximately 10:30 UTC, a user reported that search results for their phone number showed their Stack Overflow Jobs public CV as the first response. Upon examination, we realized that users’ email addresses and telephone numbers were accidentally disclosed within the HTML markup that renders the CV page. Neither email addresses nor phone numbers were actually displayed, but the HTML code that forms the page contained this information.

The bug was introduced when the Developer Story CV view was checked into our codebase. It was present on 2016-10-11 and also during the private beta; however, the view wasn't accessible to crawlers until the public beta.

How did this happen?

While porting some of our legacy Careers 2.0 code over to the new integrated Stack Overflow Jobs platform, a view that was originally programmed to render a PDF copy of a user’s CV was reused to render an HTML version of the CV. As the goal was to create a view that rendered similarly to the PDF version, this seemed to be an ideal choice.

A bug caused the user’s phone number and email address to be included in the HTML source for viewers who were neither the user nor an employer attempting to contact the user. It went unnoticed because the information wasn’t actually displayed on the page; it was only included as part of the page's source.
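
To illustrate the failure mode described above, here is a minimal sketch in Python. The markup, the use of the `hidden` attribute, and the names `CV_HTML`, `VisibleText`, and `visible_text` are all hypothetical, since the actual view code isn't shown in this post. The only point is that text a browser doesn't display is still fully present in the source that is served to every client and crawler.

```python
from html.parser import HTMLParser

# Hypothetical markup: contact details sit in an element the browser hides.
CV_HTML = """
<div class="cv">
  <h1>Jane Developer</h1>
  <div class="contact" hidden>
    <span>jane@example.com</span>
    <span>+1 555 0100</span>
  </div>
  <p>Ten years of backend work.</p>
</div>
"""

# Elements without a closing tag; they must not affect the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class VisibleText(HTMLParser):
    """Collects only text outside elements carrying the `hidden` attribute."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth or any(name == "hidden" for name, _ in attrs):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Approximate what a reader would see on screen for this simple markup."""
    parser = VisibleText()
    parser.feed(html)
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print("on screen :", visible_text(CV_HTML))
    print("in source :", "jane@example.com" in CV_HTML)
```

A check that only looks at the displayed text passes here, while a plain substring search over the served source finds the address. That is the same view a search engine or archive crawler has of the page.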

What did we do?

Once alerted to the bug, we immediately corrected it. Our second priority was to get in touch with major search engines in order to get the accidentally disclosed information out of their indexes.

We also notified the Internet Archive of the accidental disclosure; they obliged our request to suppress any archived URLs, up to the date the bug was corrected, that could contain this information in the HTML source.

What is the impact of the disclosure?

We believe that the impact of this disclosure is minimal in the context of any harm or inconvenience for users affected by the bug.

All major search engines have either suppressed this info in results or re-crawled us at our request, thus purging the information from their indexes. (Many elected to simply re-index prior to the usual 120-day interval.)

Is there anything I need to do?

No. While the information disclosed is personally-identifiable, it was:

  • Not actually printed to the screen, it was only visible in the HTML source of the CV page.

  • Not easily correlatable; you need to already know a phone number or email address in order to turn it up.

We don't anticipate any lingering impact or potential inconvenience for any users that were affected. If you have additional concerns, please contact us privately and we'll be happy to discuss them.

What did we learn from this, and how are we doing things differently?

Personally-identifiable information (PII) is something that every developer needs to handle with care. Fortunately, recognizing PII when you see it isn't all that difficult; if information you're handling can be used to identify, contact, or locate a single person, or to identify an individual in context, it needs to be treated with care.

That's great when you're building new things, but extremely mature code bases have dark and dusty corners where light doesn't often shine. It's extremely important, if not critical, to know when you're working with something that transmits personally-identifiable information in any way.

We've implemented (and recommend others implement) the following scheme to make sure something like this can't happen again:

  • Regular code audits to identify any places where PII is stored and / or transmitted, and regular review of the necessity of each instance found. If there's a chunk of code that shows you your email address on a route that's no longer used after other changes, get rid of it.

  • Identification of PII in the code base and database, so developers immediately know if the code they're working with stores or transmits PII and precisely the kind of information that needs to be considered.

  • Ensuring that the definition of personally-identifiable information is disambiguated entirely, so that there's no question or subjective interpretation of what should be treated differently at all.
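
A minimal sketch of what the identification step in the list above could look like, in Python. Everything here is hypothetical and chosen for illustration: the `pii` helper, the `Profile` fields, and the `leaked_pii` check are assumptions, not the actual code or schema. The idea is that PII is tagged where the data is declared, so both developers and automated checks can find it.

```python
from dataclasses import dataclass, field, fields


def pii(kind):
    """Mark a dataclass field as PII of the given kind (hypothetical convention)."""
    return field(metadata={"pii": kind})


@dataclass
class Profile:
    # Hypothetical model; field names are illustrative only.
    display_name: str
    headline: str
    email: str = pii("email")
    phone: str = pii("phone")


def pii_values(obj):
    """Return {field_name: value} for every non-empty field tagged as PII."""
    return {
        f.name: getattr(obj, f.name)
        for f in fields(obj)
        if "pii" in f.metadata and getattr(obj, f.name)
    }


def leaked_pii(obj, rendered_html):
    """Names of PII fields whose raw value appears anywhere in the output."""
    return sorted(
        name
        for name, value in pii_values(obj).items()
        if str(value) in rendered_html
    )


if __name__ == "__main__":
    profile = Profile(
        "Jane Developer", "Backend engineer", "jane@example.com", "+1 555 0100"
    )
    public_html = "<h1>Jane Developer</h1><p>Backend engineer</p>"
    buggy_html = public_html + "<span hidden>jane@example.com</span>"
    print(leaked_pii(profile, public_html))
    print(leaked_pii(profile, buggy_html))
```

A check like `leaked_pii` could run in a test against the page as served to an anonymous visitor. It compares against the full source rather than the displayed text, so hidden markup is covered too. It only catches values that appear verbatim, so it complements audits rather than replacing them.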

I'd like to reiterate, we believe you don't have anything to worry about, and this bug only potentially impacted those that filled out a Developer Story between 2016-10-11 and 2016-11-28. And, again, the surface to take advantage of this bug was quite small, and required prior knowledge of the information that was accidentally disclosed.

But we take our responsibility as custodians of your information and trust very seriously; now that we've taken every possible measure to mitigate any potential inconvenience to those affected, we feel that we owe it to you to be as transparent about what happened as possible.

If you have any additional concerns, please contact us privately to discuss them.

71
  • 47
    "this bug only potentially impacted those that filled out a Developer Story between 2016-10-11 and 2016-11-28" Just to clarify, did we have to explicitly add this information to a "Developer Story", or was our existing information from the Careers site automatically migrated into a "Developer Story"? If I remember correctly, there was some kind of automatic migration done of the information you already had from me. Commented Jan 5, 2017 at 17:19
  • 71
    It took exactly 6-8 weeks to fix it...
    – Jongware
    Commented Jan 5, 2017 at 17:45
  • 7
    @CodyGray Migrated users were also affected, but the vast majority of them had their stories set to private upon migration, so there wasn't a public link in which this bug would have affected them. It really is confined to folks that actively tinkered with / updated their Dev stories and made them public.
    – user50049
    Commented Jan 5, 2017 at 18:03
  • 11
    Although I see the concept of "no huge harm done", aren't you planning to make this post featured? Commented Jan 5, 2017 at 18:22
  • 120
    Discovery of the information was possible only through very specific searches -> Well, a crafty person could have simply clicked "view page source" and noticed the information was there. Who knows if it was extracted from all the published developer stories by a bot? There is no practical way of knowing if someone did, but it's entirely feasible given that the recruiting industry seems to have plenty of actors with a rather flexible sense of ethics, relatively high value of the information, and past efforts of scraping this sort of information from SO. Commented Jan 5, 2017 at 18:40
  • 69
    @Carpetsmoker That's one of the first things we thought, too. We took a look at our load balancer / access logs as we were fixing the bug, and from what we could tell, it was only the normal symphony of crawlers. We of course can't be 100% certain but it doesn't look like that scenario happened. Could a shady recruiter have updated their user scripts after the bug was reported publicly? Possibly, but it was fixed really quickly - I'm doubtful.
    – user50049
    Commented Jan 5, 2017 at 18:55
  • 16
    Double points for going the extra mile then. Strange you didn't mention that in the post mortem though. Commented Jan 5, 2017 at 18:59
  • 10
    I don't think it's plausible that this was abused after the bug was found. But I could imagine that someone looks at the source to see if they can gain any additional info, and if they bump into an e-mail address, they start poking around in a targeted way. Obviously they wouldn't inform SO about this, they'll just silently keep crawling the pages. (I'm not saying that this happened, or that it's likely or not; but this scenario seems more likely to me than the exploit after day 0.) Commented Jan 5, 2017 at 19:08
  • 40
    Thanks for the post mortem. That said, if I understand what happened, "you need to already know a phone number or email address in order to turn it up" is not correct. One could have retrieved the phone number from the page source, so the only information required to get the phone number was the name of the person.
    – ken2k
    Commented Jan 6, 2017 at 9:54
  • 13
    One question that might be worth discussion is whether the original meta post raising this bug should have been temporarily deleted or somehow hidden. It sounds like the overall handling of the issue was great, but the chance of some recruiter manually going and grabbing info from candidates in the meantime (as Carpetsmoker suggests) must have been hugely increased by that meta post Commented Jan 6, 2017 at 10:16
  • 15
    @TimPost "This bug affected a very small percentage of Stack Overflow users, limited specifically to users that had filled out a Developer Story prior to 2016-11-28. Discovery of the information was possible only through very specific searches containing the user's email address (if known) or phone number (if known)": this is incorrect. The information was exposed to anybody visiting a developer story page, and someone who knew about it could have found this info from everyone they knew the name of.
    – Magisch
    Commented Jan 6, 2017 at 12:37
  • 13
    @BenAaronson When the bug was first reported, there was some discussion even by some users about whether the report should be deleted. We discussed it internally as well, but it was already out there getting attention on MSO and we didn't want it to appear that we were hiding it. The team frantically worked to fix the bug to minimize any impact.
    – Taryn
    Commented Jan 6, 2017 at 14:06
  • 20
    Oh, come on. I can't believe someone took the suggestion in my first comment seriously.
    – Glorfindel
    Commented Jan 6, 2017 at 14:42
  • 18
    @bluefeet IMO that was the right call, but I wonder if it might be worth having an explicit policy on this for next time. You could even make a new question/announcement replacing the hidden one which explains that a question was removed, why, roughly what it covered (without potentially harmful details), and saying that it will be unhidden once the issue is dealt with. I think if there was a policy like that, publicly agreed on in advance, for specific cases like this, most people would see that as sufficiently transparent. Commented Jan 6, 2017 at 16:05
  • 51
    My phone # was exposed all this time and no one called me. Not even a single recruiter cold call. :( Considering legal action against SE for false advertising
    – Pekka
    Commented Jan 9, 2017 at 0:37

3 Answers

140

Frequently Asked Questions

What telephone number was leaked?

The telephone number listed under your Job Match preferences, which was expected to be only used as a way for employers to contact you if you have expressed interest in being contacted or when you’ve applied for a job.

What email address was leaked?

The email address you provided for employers to contact you.

Did this affect me if my Developer Story was private?

No. If your developer story was set to private, the information was not exposed.

Did this affect me if my Careers profile was migrated into a Developer Story?

[Awaiting answer]

How many users were affected?

[Awaiting answer]

Were archive services other than the Wayback Machine notified?

[Awaiting answer]

Does Stack Overflow have an established policy for responding to unauthorized disclosure of user private data?

[Awaiting answer]

Was the policy followed in this specific instance?

[Awaiting answer]

Will I be notified if my account was individually affected?

[Awaiting answer]

Have you informed the relevant data protection offices in the countries in which Stack Overflow operates?

[Awaiting answer]

What policies and procedures are in place to ensure the privacy of users' data during development and in general?

[Awaiting answer]

I'm not sure when I filled out my developer story. How can I tell if my account was affected?

[Awaiting answer]

8
  • 54
    Users: please add common questions which don't need to be discussed privately. Staff: please fill the blanks.
    – Oriol
    Commented Jan 6, 2017 at 21:15
  • 19
    Good idea, I'll keep an eye on this. Having a single post with questions that don't directly relate to specific information (which we'd need to discuss privately) helps make sure folks see these all in one place. Thanks for the initiative here!
    – user50049
    Commented Jan 8, 2017 at 3:31
  • Comments are not for extended discussion; this conversation has been moved to chat.
    – Brad Larson Mod
    Commented Jan 13, 2017 at 18:52
  • 2
    @BradLarson Is the team looking at answering these? Commented Jan 15, 2017 at 14:45
  • 2
    @PhilipWhitehouse - I'm just a moderator, I don't know anything more than you do about these.
    – Brad Larson Mod
    Commented Jan 16, 2017 at 3:09
  • 1
    @PhilipWhitehouse Just got back to Manila from NYC (which was almost 24 hours of traveling), I'll try my best to get these filled in this week. Not an oversight, just really crunched on time (and we've got a toddler that's still on EST :/)
    – user50049
    Commented Jan 17, 2017 at 15:46
  • @TimPost If your toddler can stick to a timezone that sounds like progress from what I hear :) Thanks for responding. Commented Jan 18, 2017 at 17:28
  • 5
    @TimPost - It looks like you never got around to answering these. Surely there's some answers to these questions after 8 months... :) Commented Aug 2, 2017 at 16:31
57

While the information disclosed is personally-identifiable, it was:

Not actually printed to the screen, it was only visible in the HTML source of the CV page.

Not easily correlatable; you need to already know a phone number or email address in order to turn it up.

The post could do without these weasel words. Personal information was served up and transmitted to clients. Period. The fact that "it was only in the source" makes little difference - as evidenced by the fact that it was indexed by Google, saved by archive.org, and available to anybody who bothered to look. Maybe this would fly elsewhere, but it's pretty ridiculous and insulting to our intelligence to try it here where the audience is specifically professional software developers.

15
  • 35
    Did you bother to read the whole text? They ensured it's no longer indexed by search engines and that it's no longer available on the Internet Archive. Newsflash: bugs happen, and sometimes they have grave consequences.
    – user247702
    Commented Jan 7, 2017 at 20:34
  • 25
    Yes, I read the whole thing. My issue isn't with the fact of the bug occurring but with the wording of the report. I've edited the post to try to make this more clear.
    – nobody
    Commented Jan 7, 2017 at 20:50
  • 29
    With all due respect, the issue was solved. Likewise, this did mean that other than through searching, the average user couldn't find the info. They made no attempt to cover up the fact that search engines could indeed index the information. As a Web developer myself, I don't believe that this is an insult to the intelligence of anyone.
    – Leo Wilson
    Commented Jan 7, 2017 at 21:05
  • 35
    @LeoWilson: The fact is that the last quoted sentence is a blatant lie: you wouldn't need to search based on prior knowledge of the phone number; any search that accesses that Developer Story page would lead to downloading the phone number (mutatis mutandis for the email address). Furthermore, it is almost certain that some recruiters still have some developer story content stored in browser cache, and can now extract the personal contact information from those. I'm sorry for you that you don't recognize the whitewash.
    – Ben Voigt
    Commented Jan 8, 2017 at 17:15
  • 5
    @BenVoigt Yes, so a hacker who knows HTML and who knew about this glitch, or just so happened to be looking through the code, got your information. How many people did that? They were talking about the average user. I don't take information breaches lightly, but let's be honest, this really isn't the worst that could have happened. Plus, there are much more efficient ways for advanced spammers to get people's personal information. If you want to quit the site, that's your right in most Western democracies, but seriously, they made one mistake. Cut them some slack.
    – Leo Wilson
    Commented Jan 8, 2017 at 23:29
  • 24
    @LeoWilson: What actually happened isn't that bad (the information leaked is generally publicly listed in various directories) so they shouldn't need to try to whitewash it. And if it isn't a conspiracy to mislead users, that's actually worse, because it means the people responsible for the system don't understand how websites work, which would make me question whether they did after all manage to fix it completely.
    – Ben Voigt
    Commented Jan 9, 2017 at 0:25
  • 23
    @LeoWilson: They made two mistakes: Implementing a bug and misrepresenting its effects. This answer is commenting on the latter, and there is little reason to disagree. One does not need any prior knowledge to gain access to the leaked information, and correlate user name, real name, phone number, and email address. This is evidenced by the fact, that PII did show up in search indexes of search engines, that weren't specifically implemented to exploit this bug. Commented Jan 9, 2017 at 12:37
  • 10
    A while back Stack Overflow changed their Terms of Service in an effort to stop recruiters (or if you're a cynic, Jobs competitors) from scraping this kind of information. That means at least some people were looking at the source of these pages to write scrapers, and I'd expect at least some of them noticed. Commented Jan 9, 2017 at 20:12
  • 24
    For what it's worth, when I read through the posting, I had exactly the same thought when I hit the spot with the wrong statement. The fact that the comments tried to paper over the fact that the statement was wrong made it worse, as it was clearly wrong, and the author knew it to be wrong. This was probably handled well, but not described in a trustworthy way. Commented Jan 9, 2017 at 20:28
  • The two points taken together are relevant. When there is a data leak, how likely that leak is to have been noticed and exploited is relevant. The fact it was only buried in source means nobody would spot it by eye, only by crawling. The fact you had to know what to search for is another. These are clearly not posted as excuses but as information to educate you on the severity of the breach.
    – Mr. Boy
    Commented Jan 11, 2017 at 11:31
  • 4
    @Mr.Boy: This is incorrect. You didn't have to notice the leak to exploit it. And you didn't have to know what to look for either. Unless you have more information than we do, there is no indication that search engines were specifically trained to exploit this leak. They were just doing business as usual. In other words: The information was there for everyone to pick up, knowingly or by chance. This is the true scope of the leak, and the Post Mortem made no attempt to document it as such. Commented Jan 11, 2017 at 12:11
  • 11
    I expect a very crude email harvesting bot will download webpages, search for an @ symbol anywhere in the markup, parse the text surrounding the @ to see if it's an email address, save any valid email addresses to a database and then follow any links on the page and repeat. These bots will already be out there crawling the web and don't require any specific knowledge of Stack Overflow or this particular issue that occurred. For that reason I agree it's a bit disingenuous to imply that no harm would have been done unless a human was specifically looking at the page source.
    – rdans
    Commented Jan 11, 2017 at 12:33
  • 8
    @LeoWilson "so a hacker who knows HTML" huh? There is no hacking involved. View Source is a feature that has been built into web browsers for over a decade (Can't recall if Navigator had the feature). Commented Jan 11, 2017 at 14:58
  • 5
    The first of the two statements is perfectly reasonable, the second somewhat dubious. On balance though, these guys are doing an awesome job of transparency, and the leak wasn't that bad.
    – N. McA.
    Commented Jan 11, 2017 at 17:37
  • Seeing as this is a pretty technical site full of pretty technical people, I don't think defining the exact scope and vulnerability of the problem is "weasely". Describing a situation like this shouldn't just be "We leaked your information, sorry." I guarantee if they had described it that way, people would be asking if it was visible, which pages it was on, how easy it would be to find, and so on. They admitted the data was indexed/archived, they aren't trying to hide anything. I read this as a report of the exact facts; "Not actually visible on the screen" is just another important fact. Commented Aug 18, 2017 at 15:15
57

We sincerely apologize if anyone was inconvenienced in any way as we corrected the bug and worked to mitigate any potential lingering effects.

What I hoped to read at this point is a text like:

We sincerely apologize for our failure to keep your personal data confidential.

We have now corrected the mistake and are undoing its effects to the best of our abilities.

21
  • 27
    There's always a critic. Imo the team handled this bug as well as they could possibly have.
    – Cerbrus
    Commented Jan 10, 2017 at 0:09
  • 59
    Personally, I do not care about any inconvenience that occurred while they corrected the bug. I do care about sensitive data being exposed unintentionally, and without my permission. The wording of the post above suggests to me that the team is unaware of what the main cause of concern is in this case: Exposing user data. Not inconvenience. The sentence "Discovery of the information was possible only through very specific searches containing the user's email address (if known) or phone number (if known)." casts additional doubt on the team's ability to accurately assess and correct the issue.
    – mat
    Commented Jan 10, 2017 at 0:19
  • 24
    Mat, saying "our failure to keep your personal data confidential." is a perfect guarantee to make readers panic. That's all they see, then. In the end, what matters is that this issue was handled properly. Let's not nitpick about something as trivial as how they worded the post-mortem. SE is damn well aware how important privacy is, but they didn't have to write this wall of text. They came out and fessed up, full transparency. Let's give them the benefit of the doubt (That is, if you have any reason to doubt their intentions)
    – Cerbrus
    Commented Jan 10, 2017 at 7:33
  • 10
    That's not at all what I regard "full transparency". Consider for example the following suggestion: "In the time period between date X and date Y, the following information was publicly accessible in the source code of the following subset of Stackoverflow pages (...) The information appeared in the source code and was indexed by major search engines as well as the Internet Archive. The attributes were not displayed when the pages were rendered on screen." The potential of users panicking should not prevent telling the truth. That's also not "nitpicking", but suggesting how to build trust.
    – mat
    Commented Jan 10, 2017 at 8:34
  • 11
    What about the post suggests SO is trying to hide the indexing? "we spent quite a bit of time working with all major search engines and archival services in order to ensure that the accidentally disclosed information was either suppressed in results pending reindexing, or purged subsequent to reindexing." They're well aware it was indexed and they've put significant effort into getting it removed. They're not "not telling the truth". They're not claiming it was never indexed. You can try "build trust" all you like, but the harder you try, the more reasons will people find not to trust you.
    – Cerbrus
    Commented Jan 10, 2017 at 8:43
  • 12
    You are now arguing against statements I did not make, and so I do not defend them. (For example, I did not claim that they stated it was never indexed, or were not aware of this.) I only say that this is not what I regard "full transparency", and my impression from how this post is written is that the team does not take personal information and its accidental public exposure as seriously as would in my view be justified. To me, this means I will never use the developer stories, and have strong doubts about future services where personal data is involved. I might have tried them out otherwise!
    – mat
    Commented Jan 10, 2017 at 8:51
  • 3
    There's always option 2: Don't provide that kind of personal data if you're not comfortable with it getting leaked somewhere, anywhere.
    – Cerbrus
    Commented Jan 10, 2017 at 8:54
  • 3
    Yes, this attitude towards privacy is really a shame. I wrote an extensive complaint about this incident. On Facebook. Just kidding. They did the best they could, and I hardly know a site that I consider as trustworthy as stackoverflow. They know about their responsibility. And they have a name to lose. Of course, mistakes like this shouldn't happen, but they do happen, and I'm confident that this incident causes an even higher awareness among the people who are responsible for this community (i.e. something like this is unlikely to happen again soon or at all).
    – Marco13
    Commented Jan 11, 2017 at 1:49
  • 3
    @Marco13: I'd like to agree, except I cannot. Although not documented in great detail, the measures taken to prevent this sort of leak in the future are based on developer trust alone. There is no infrastructure in place to enforce this, nor is it planned to implement one. Unless the system protects sensitive data, it will get leaked again. Maybe not due to a bug in the system, but rather a coordinated attack. Commented Jan 11, 2017 at 12:24
  • 1
    @IInspectable I'm not an expert here, and not familiar with what could be set up as a technical infrastructure to prevent this. This is a site where personal information (for Job negotiations) is part of the profile. Regardless of the tech infrastructure, there will always be developers responsible for the confidentiality. And for me, personally, there are reasons to assume that the people working for SO know their stuff, and take such things seriously. And ... sorry, it may sound polemic, but absolute safety can only be achieved by never connecting your PC to the internet at all....
    – Marco13
    Commented Jan 11, 2017 at 14:06
  • 3
    @Marco13: You seem to be confused, which information was leaked here. It wasn't information you see in the public profile. And this isn't about personal information either. It is about personally-identifiable information. If you need help imagining, how data can be secured: It can be encrypted, with the encryption key only being available to authorized parties. It could be stored in a location with restricted access. Nothing fancy, really, operating systems have been doing this literally for decades. Commented Jan 11, 2017 at 14:48
  • 3
    @Marco13: See, that's precisely where we disagree. The proposed changes rely on developer trust alone. This does not prevent against accidentally leaking data. Tool- (or operating system-)enforced access control, on the other hand, would. Commented Jan 11, 2017 at 16:07
  • 1
    @DavidB: That's true, and completely undisputed. To me, it is important to see how the team reacts when this happens. For example, do I get (as in this case), 3 screens of contradicting information together with a conditional apology for an issue that does not bother me in the least? Or (as I would consider ideal), do I get a clear statement about everything that happened and could have happened together with an indication that the team is well aware of how sensitive the issue of personal information is, and truly sorry for having failed to live up to their high standards on one occasion?
    – mat
    Commented Jan 12, 2017 at 8:49
  • 13
    I fully agree with this post. The phrase "for any inconvenience", or worse, "if any inconvenience", is terribly weaselly officialese that almost sounds like a backhanded inversion of what it should be. It almost sounds like it's my fault for feeling inconvenienced (if I should feel so!), and the apology is for my reaction, not for the causing incident. (It's similar to "I'm sorry you had to find out", or "I'm sorry you feel that way".) Don't worry about panicking me, my reaction is my choice. Please just talk straight to me.
    – Kerrek SB
    Commented Jan 12, 2017 at 12:32
  • 2
    I agree with this, if only because "inconvenience" sounds just about as fitting in this context as giving a seafood restaurant a bad rating because the owner's wife plays golf. It's the kind of canned wording that stands out so much as a non-sequitur that it hurts.
    – BoltClock
    Commented Jan 14, 2017 at 4:28
