We'd first like to take a moment to thank everyone for their patience while we put this together. Your restraint was a very big help in us handling this incident with the degree of diligence that all of you deserve; thank you for waiting so patiently as we worked to resolve it.
tl;dr:
On 2016-11-28 it was brought to our attention that we were unintentionally exposing email addresses and phone numbers of users that filled out a Developer Story. The information wasn't actually printed to browsers, but was present in the page's HTML source markup. The bug causing this existed since the Developer Story private beta, but was actually exposed once the beta period switched to public on 2016-10-11.
The bug was immediately fixed, and we spent quite a bit of time working with all major search engines and archival services in order to ensure that the accidentally disclosed information was either suppressed in results pending reindexing, or purged subsequent to reindexing.
This bug affected a very small percentage of Stack Overflow users, limited specifically to users that had filled out a Developer Story prior to 2016-11-28. Discovery of the information was possible only through very specific searches containing the user's email address (if known) or phone number (if known).
We strongly believe that any potential for inconvenience due to this bug has been mitigated, and that there's no additional cause for concern regarding any accidental disclosure of email addresses or phone numbers. We sincerely apologize if anyone was inconvenienced in any way as we corrected the bug and worked to mitigate any potential lingering effects.
Now, the longer version.
What happened, and when?
On 2016-11-28 at approximately 10:30 UTC-0, a user reported that search results for their phone number showed their Stack Overflow Jobs public CV as the first response. Upon examination, we realized that user’s email addresses and telephone numbers were accidentally disclosed within HTML markup that renders the CV page. Neither email addresses or phone numbers were actually rendered, but the HTML code that forms the page contained this information.
The bug was introduced when the Developer Story CV view was checked into our codebase. It was there on 10/11, but also present during the private beta, however the view wasn't yet accessible to crawlers until the public beta.
How did this happen?
While porting some of our legacy Careers 2.0 code over to the new integrated Stack Overflow Jobs platform, a view that was originally programmed to render a PDF copy of a user’s CV was reused to render an HTML version of the CV. As the goal was to create a view that rendered similarly as the PDF version, this seemed to be an ideal choice.
A bug that caused the user’s phone number and email address to render in the HTML source for people that weren’t the user or an employer attempting to contact the user went unnoticed, because the information wasn’t actually rendered on the page. The information was only included as part of the source of the page.
What did we do?
Once alerted to the bug, we immediately corrected the bug. Our second priority was to get in touch with major search engines in order to get the accidentally disclosed information out of their indexes.
We also notified the Internet Archive of the accidental disclosure, who obliged our request to suppress any archived URLs up to the date that the bug was corrected that could contain this information in the HTML source.
What is the impact of the disclosure?
We believe that the impact of this disclosure is minimal in the context of any harm or inconvenience for users affected by the bug.
All major search engines have either suppressed this info in results, or re-crawled us at our request thus purging the information from indexes. (Many elected to simply re-index prior to the usual 120 day interval.)
Is there anything I need to do?
No. While the information disclosed is personally-identifiable, it was:
Not actually printed to the screen, it was only visible in the HTML source of the CV page.
Not easily correlatable; you need to already know a phone number or email address in order to turn it up.
We don't anticipate any lingering impact or potential inconvenience for any users that were affected. If you have additional concerns, please contact us privately and we'll be happy to discuss them.
What did we learn from this, and how are we doing things differently?
Personally-identifiable information (PII) is something that every developer needs to handle with care. Fortunately, recognizing PII when you see it isn't all that difficult; if information you're handling can be used to identify, contact, or locate a single person, or to identify an individual in context, it needs to be treated with care.
That's great when you're building new things, but extremely mature code bases have dark and dusty corners where light doesn't often shine. It's extremely important, if not critical, to know when you're working with something that in any way transmits personally-identifiable information in any way.
We've implemented (and recommend others implement) the following scheme to make sure something like this can't happen again:
Regular code audits to identify any places where PII is stored and / or transmitted, and regular review of the necessity of each instance found. If there's a chunk of code that shows you your email address on a route that's no longer used after other changes, get rid of it.
Identification of PII in the code base and database, so developers immediately know if the code they're working with stores or transmits PII and precisely the kind of information that needs to be considered.
Ensuring that the definition of personally-identifiable information is disambiguated entirely, so that there's no question or subjective interpretation of what should be treated differently at all.
I'd like to reiterate, we believe you don't have anything to worry about, and this bug only potentially impacted those that filled out a Developer Story between 2016-10-11 and 2016-11-28. And, again, the surface to take advantage of this bug was quite small, and required prior knowledge of the information that was accidentally disclosed.
But we take our responsibility as custodians of your information and trust very seriously; now that we've taken every possible measure to mitigate any potential inconvenience to those affected, we feel that we owe it to you to be as transparent about what happened as possible.
If you have any additional concerns, please contact us privately to discuss them.
Discovery of the information was possible only through very specific searches
-> Well, a crafty person could have simply clicked "view page source" and noticed the information was there. Who knows if it was extracted from all the published developer stories by a bot? There is no practical way of knowing if someone did, but it's entirely feasible given that the recruiting industry seems to have plenty of actors with a rather flexible sense of ethics, relatively high value of the information, and past efforts of scraping this sort of information from SO.