
We wanted to share some of what's been happening on the network behind the scenes lately. We do our best to keep fires from being visible, but that's not always possible, and when it happens my goal is to be as open as possible.

In short, we've enabled captchas for anonymous users when they search Q&A. If you'd like more details, read on!


So, attackers. In recent weeks we've been hit with attacks targeting our search engine, varying and escalating in both complexity and volume. In short, someone is attempting to DDoS us from over 10,000 endpoints. This has caused abnormally high load on the search clusters used both for site search and for search on /jobs.

While we initially tried to play nice with what looked like crawlers making questionable choices, after we deployed changes to robots.txt instructing them to back off, the abuse continued, and the user agent/IP range hits against it made it obvious this was a malicious attack against our infrastructure.

Ultimately, this was the cause of a brief site outage on Monday (October 19, 2020): a cascading impact of search queries (which normally execute quickly) being run while a database connection was held open. When those queries suddenly took far longer, the result was connection pool exhaustion across our web tier and collateral damage outside search in the wider Q&A application. We have removed this interaction by eagerly closing connections in those paths (less optimal, but safer).
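
For readers curious what "eagerly closing connections" can look like in practice, here is a minimal sketch in Python (purely illustrative; our stack is .NET and none of the names below come from our code): the quick database work checks a connection out of the pool and returns it before the slow search call runs, instead of holding it for the whole request.

```python
import queue
import sqlite3
import time

# A tiny "pool" of three connections stands in for a real connection pool.
pool = queue.Queue()
for _ in range(3):
    pool.put(sqlite3.connect(":memory:", check_same_thread=False))

def slow_search(text):
    """Stand-in for a search call that has suddenly become slow."""
    time.sleep(0.5)
    return [f"result for {text!r}"]

def handle_search_request(text):
    conn = pool.get(timeout=1)        # check a connection out of the pool
    try:
        conn.execute("SELECT 1")      # the quick database work the request needs
    finally:
        pool.put(conn)                # eager release: hand it back *before* the
                                      # slow external call below even starts
    return slow_search(text)          # no connection is held while this runs

print(handle_search_request("connection pooling"))
```

The trade-off is the one noted above: a connection may have to be re-acquired later in the request, which is less optimal, but a slow search can no longer starve the pool for everything else.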

In addition, we have enabled captchas for anonymous users hitting our search endpoints, to eliminate the traffic coming from automated bots while still allowing legitimate user traffic. We'll now require a captcha to be solved when hitting search anonymously (the solve is valid for 5 minutes; we may tweak this). It's not a usability tradeoff we'd like to make, but it is a necessary one at this phase. Our search resources simply are not infinite, and scaling them to 10-100x their current size only to serve botnets isn't a good use of anyone's resources either.
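
As a rough sketch of how a gate like this can work (illustrative only; the signed-token approach, secret, and names below are assumptions, not a description of our implementation): solving the captcha mints a short-lived token, and anonymous search requests without a valid token get sent back to the captcha page.

```python
import hashlib
import hmac
import time

# Hypothetical values; not Stack Exchange's actual secret, TTL, or token format.
SECRET = b"replace-with-a-real-server-side-secret"
CAPTCHA_TTL_SECONDS = 5 * 60  # "valid for 5 minutes, we may tweak this"

def issue_token(now=None):
    """Mint a short-lived token right after the user passes the captcha."""
    expires = int((now or time.time()) + CAPTCHA_TTL_SECONDS)
    sig = hmac.new(SECRET, str(expires).encode(), hashlib.sha256).hexdigest()
    return f"{expires}.{sig}"

def token_is_valid(token, now=None):
    """Check the token on each anonymous search request; failure means 'show the captcha'."""
    try:
        expires_str, sig = token.split(".", 1)
    except (AttributeError, ValueError):
        return False
    expected = hmac.new(SECRET, expires_str.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and (now or time.time()) < int(expires_str)

token = issue_token()
assert token_is_valid(token)                               # within 5 minutes: search is served
assert not token_is_valid(token, now=time.time() + 600)    # after expiry: captcha again
```

The 5-minute window mentioned above maps to the TTL constant here, so tightening or loosening it would be a one-line change in a setup like this.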

We hope this results in only minimal disruption; if anyone has any specific questions please let us know and we’ll answer ’em if we can.

  • What makes search so much more costly than, say, the home page or /questions? Isn't it only a matter of time before the botnets redirect to those endpoints and cause further havoc? If answering this reveals too much, then don't feel obliged to answer.
    – rene
    Commented Oct 21, 2020 at 19:54
  • @rene Probably the one great irony of databases: searching for something that doesn't exist (where you have to look at all records) is more costly than searching for something that does. In the automated attacks I've seen on the site at work, they range from gibberish to SQL injection. Even if DDoS wasn't their goal, that's the net effect. (A small illustration of this cost asymmetry appears after these comments.)
    – Machavity
    Commented Oct 21, 2020 at 20:02
  • While our .NET app, Redis, and SQL infrastructure have gotten a lot of scale love over the years, search simply hasn't because it hasn't (yet) been a pain point. We think Elastic can probably perform far better than it currently does on our hardware (and we're working on that), but at the moment it isn't at the same level of serving scale as the rest of the site under intense load. Commented Oct 21, 2020 at 20:08
  • Is this change supposed to be temporary, or permanent? If the former, is there a timeline to remove it, or is it just 6-8 weeks? Commented Oct 21, 2020 at 20:39
  • Also, this doesn't appear to work for Meta Stack Exchange. I was able to search it without solving a CAPTCHA in a private window, but when I tried to search Stack Overflow, I was prompted for one. Commented Oct 21, 2020 at 20:43
  • For the moment, we plan for it to be an indefinite change, with no plans to roll it back unless we institute other compensating mitigations for the core issue. It's intentionally on Stack Overflow and Stack Exchange (the main site) only for now, but if there is an escalation, there's a good chance it'd be applied globally. We're trying to be as light as possible on mitigations, but messaging and preparing as if it's global. Commented Oct 21, 2020 at 20:46
  • I think Captcha is broken in China and some other countries, so those users must simply go elsewhere: stackoverflow.com/a/61669633/3648282 meta.stackexchange.com/q/153561/282094 - it's unfortunate that some may need to hide and perhaps be poor at searching. --- I know when I search I need to do a bit of refining to get a good return of hits; once that's sorted out I can browse through the results and see if I can find what I'm looking for. If IP addresses are always searching and never looking at the results, that's a tip-off, as are nonsense queries; but there's a limit to what can be checked, too.
    – Rob
    Commented Oct 22, 2020 at 10:44
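
To make the cost asymmetry mentioned in the comments above concrete, here is a small, self-contained sketch (SQLite is used only because it's easy to run; the production search stack is Elasticsearch, and this is not Stack Exchange's code): a lookup an index can answer stays cheap, while a wildcard search for gibberish has to examine every row before it can say "no results".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_title ON posts(title)")
conn.executemany("INSERT INTO posts(title) VALUES (?)",
                 [(f"question about topic {i}",) for i in range(10_000)])

# A lookup the index can answer: the planner reports an index search.
print(conn.execute("EXPLAIN QUERY PLAN SELECT * FROM posts WHERE title = ?",
                   ("question about topic 42",)).fetchall())

# A gibberish wildcard query: the planner reports a full scan of the table,
# because every row must be examined before it can answer "no results".
print(conn.execute("EXPLAIN QUERY PLAN SELECT * FROM posts WHERE title LIKE ?",
                   ("%xq7zz gibberish%",)).fetchall())
```

The first query plan reports an index search, the second a full table scan, which is the "look at all records" cost described in the comment above.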

1 Answer


The Google captchas can be really, really annoying if Google doesn't like you, and certain privacy-preserving measures you can take in your browser can make this dramatically worse. If you don't like to be tracked, you're almost certainly getting much more annoying captchas. I mean the ones where you have to click on lots of images; they fade out extremely slowly and reveal more images you have to handle. And then you never know how many of these captchas Google will show you. Personally, if I see one of these and the images fade slowly when clicked, I'll just close the page and go somewhere else. That is far too annoying and time-consuming, especially as you don't know how long it will take in this particular case.

A very important point here is that your experience with these captchas won't tell you how other users will experience them. The difference between the light and the really annoying settings is vast. It's the difference between almost not noticing them (the single-checkbox version) and just closing the tab because you can't be bothered to click and wait for stupid images for an entire minute or so.

Search is a very important part of the sites. Well, a large part of that is handled by people using Google before they arrive here, but for users who are about to ask a question, the site search is something they really should use more, not less, so putting a potential barrier here is not ideal. We ask new users to make sure they're not asking duplicate questions; making the search more annoying is somewhat contradictory, then. This maybe doesn't matter so much on SO, where registration is already mandatory, but it would be annoying if this has to be extended to other sites.

It might be useful to actually explain on the captcha page that you could also log in or register an account instead of solving the captcha. If there are any settings to make the captchas less annoying, that would help as well.

  • This is good feedback - thanks! I'll ask the community team about changing the copy on our captcha pages to help people out in that regard. If users have issues with the Google captcha, that is unfortunate, and I agree it's not ideal, but we have a limited set of tools to combat some of these things, and when the alternative is an offline service or site, we do have to make some trade-offs for the greater good. I promise we make as few trade-offs that affect users as we possibly can; my job is to make sure you just don't see or worry about these things whenever we can prevent it. Commented Oct 21, 2020 at 22:28
  • @Nick surely there are alternatives to Google Captcha. Why not use such an alternative, at least for something as common and crucial as search? Commented Oct 21, 2020 at 22:56
  • @ShadowWizardWearingMask I'm seriously asking: such as? Captcha is something we use across the site and have for some time; it has the widest support of anything I'm aware of, and is a known quantity. What unproven thing (from our perspective) would we be looking at to replace it with? I'm always open to better alternatives, but is there one that works for all users, making it a net win over Captcha? I think it's natural to assume a lot of people have solved this problem... but they haven't once you dig in, and so options that cover so many cases are few in this department. Commented Oct 21, 2020 at 23:49
  • If it helps, I think it's worth noting that a captcha has been present for anonymous searches for a little over 6 years now. Users just didn't typically hit it until the 2nd search, based on throttle rules we have set. The increased enforcement is the tweak; the concept of a captcha for anonymous searches isn't a totally new thing. The bot problem has increased (dramatically), but it's not starting from zero. Commented Oct 21, 2020 at 23:53
  • @NickCraver I understand that this is a necessary evil, and I appreciate that SE is trying to be as open to anonymous users as possible in a rather hostile environment. Commented Oct 22, 2020 at 6:53
  • @Nick huh, as usual you're right. Digging into it, the best option was visualCaptcha, but it's no longer developed, so I guess it's not a good option. What about developing your own Captcha mechanism? Commented Oct 22, 2020 at 12:30
  • @ShadowWizardWearingMask I'd say that's just not something we'd ever be likely to wade into. There are 10 gazillion problems to solve on the internet, and we have to choose our battles. Taking on botnets for the world isn't a core competency we should expend a lot of resources on. The goal here is to protect the user experience as much as possible... not infinitely chase the rabbit hole. Think of it this way: even if we did that, would we build a better one? That's unlikely - Google probably has more people than our entire engineering staff (maybe with a multiplier) working on just this problem. Commented Oct 22, 2020 at 22:49

