There has been some confusion during recent incidents that we’d like to address. Part of it stems from our status pages and banners, which we’re going to update. The rest concerns our status page and when it gets updated.
New high availability configuration
In the past, if the site hit a major error and crashed, or if a large enough DDoS attack brought it down, users saw a generic page saying the site was down for maintenance. Since moving to Cloudflare as our CDN, we now use one of their features to automatically redirect traffic to our secondary data center, which runs in read-only mode. As a result, even if the primary site goes down, users can still access the site, albeit in a degraded (read-only) fashion. When the application recovers in the primary site, traffic should automatically be directed back.
Because of the way we used read-only mode during planned maintenance in the past, the wording around read-only mode is now a bit confusing. We’ll be updating it in the coming weeks to reflect the new setup.
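To make the failover behavior in this section concrete, here is a minimal Python sketch of the routing decision. It is an illustration only, not our Cloudflare configuration: the `Origin` class, the `choose_origin` function, and the origin names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Origin:
    """Hypothetical model of a data center that can serve the site."""
    name: str
    healthy: bool
    read_only: bool


def choose_origin(primary: Origin, secondary: Origin) -> Optional[Origin]:
    """Prefer the primary; fall back to the read-only secondary."""
    if primary.healthy:
        return primary
    if secondary.healthy:
        return secondary
    return None  # Neither origin can serve traffic.


# Primary is down, so traffic goes to the read-only secondary.
primary = Origin("primary", healthy=False, read_only=False)
secondary = Origin("secondary", healthy=True, read_only=True)
chosen = choose_origin(primary, secondary)
print(chosen.name, "read-only" if chosen.read_only else "read-write")

# Once the primary recovers, the same check sends traffic back to it.
primary.healthy = True
print(choose_origin(primary, secondary).name)
```

Because the decision is re-evaluated against current health, no separate "fail back" step is needed in this sketch: traffic returns to the primary as soon as it reports healthy again.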
StackStatus
We use a tool called FireHydrant for incident management; www.stackstatus.net is one of the features built into it. Because FireHydrant is an incident management tool rather than an application monitoring tool, StackStatus is not automatically updated whenever there is site instability. It is only updated when someone internally declares a major incident. One of the thresholds for declaring a major incident is the site being down for longer than 5 minutes. That said, infrequent updates have been a problem in the past, and we are working to improve transparency during incidents by updating StackStatus more often.
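The 5-minute threshold mentioned above can be written as a simple check. This is a sketch for illustration: a major incident is declared by a person, not by code, and the function and constant names here are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Threshold taken from the text: down for longer than 5 minutes.
MAJOR_INCIDENT_THRESHOLD = timedelta(minutes=5)


def exceeds_downtime_threshold(
    down_since: Optional[datetime],
    now: datetime,
    threshold: timedelta = MAJOR_INCIDENT_THRESHOLD,
) -> bool:
    """Return True if the site has been down for longer than the threshold."""
    if down_since is None:
        return False  # The site is currently up.
    return now - down_since > threshold


outage_start = datetime(2024, 1, 1, 12, 0, 0)
print(exceeds_downtime_threshold(outage_start, outage_start + timedelta(minutes=3)))
print(exceeds_downtime_threshold(outage_start, outage_start + timedelta(minutes=6)))
```

An outage shorter than the threshold would therefore not, on this criterion alone, lead to a StackStatus update.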
New Health Check Dashboard
Given how StackStatus is used, and after listening to feedback from the community, we’ve built a new dashboard that shows live data directly from our application monitoring tool: s.tk/stackstatusdashboard. A link to the dashboard is already up on www.stackstatus.net. The dashboard shows the number of health check errors reported by our monitoring software, which runs tests from the vendor’s data centers around the world. If our monitoring software detects any errors, they will be reflected in the new dashboard. If no errors are detected, the dashboard will be blank, which is the state we’d like it to remain in!
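As a rough picture of what the dashboard counts, the following Python sketch tallies failed health checks by test location. The input format, the location names, and the `summarize_errors` function are assumptions made for illustration; they do not describe our monitoring tool's actual data or API.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_errors(results: Iterable[Tuple[str, bool]]) -> Counter:
    """Count failed health checks per test location.

    Each result is a (location, ok) pair; only failures are counted.
    """
    return Counter(location for location, ok in results if not ok)


# Hypothetical sample of health check results.
sample = [
    ("frankfurt", True),
    ("sydney", False),
    ("virginia", True),
    ("sydney", False),
]

errors = summarize_errors(sample)
if errors:
    for location, count in errors.items():
        print(f"{location}: {count} error(s)")
else:
    print("(blank: no errors detected)")
```

When every check passes, the tally is empty, which corresponds to the blank dashboard described above.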