137

Update 2019-04-23: The maintenance went as planned.


Current status: We have hit some unexpected issues and have rescheduled; the title and dates below are updated. The work listed below for Monday/Tuesday has already been completed, and the rest of the work will be completed on the day we perform the failover.


tl;dr: Planned service interruption that will impact all Stack Overflow/Stack Exchange sites, Jobs, Chat, and Teams. All sites will be read-only for up to an hour during the maintenance. Enterprise cloud hosted instances will not be impacted.

Short Version:

There will be a service degradation for up to an hour this upcoming week - possibly April 23rd, 2019 at 23:30 UTC (7:30 PM US/Eastern). During that time questions and answers will still display, job listings will still work, and job ads will still display. However, the site will be "read only," i.e. people won't appear logged in, won't be able to add or edit job listings, apply for jobs, create, edit or vote on questions/comments/answers, reputation won't change, etc. This should minimize the disruption to the majority of casual readers. We will display a banner on the sites stating we're 'read only' for maintenance. We expect that the site will be in a read-only state for less than an hour.
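For anyone who wants to check the window against their own time zone, here is a minimal sketch of the UTC-to-Eastern conversion quoted above. It assumes Python 3.9+ with time zone data available (on some systems that means installing the `tzdata` package), and it assumes `America/New_York` is the right IANA zone name for "US/Eastern"; neither detail comes from the announcement itself.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Planned start of the maintenance window, as stated in the post (UTC).
start_utc = datetime(2019, 4, 23, 23, 30, tzinfo=timezone.utc)

# Assumption: "US/Eastern" corresponds to the IANA zone "America/New_York".
start_eastern = start_utc.astimezone(ZoneInfo("America/New_York"))

# Show the local wall-clock time for the start of the window.
print(start_eastern.strftime("%Y-%m-%d %I:%M %p %Z"))
```

Swapping in another zone name (for example `Europe/London`) should give the local start time for that region.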

Longer, More Technical Version of What’s Happening

Background

Our primary database servers are currently running on Windows Server 2012. We have two Windows Failover Clusters, one for Stack Overflow and one for Stack Exchange (Careers). Each cluster contains three database servers. We will be upgrading the servers to Windows Server 2016. During the service interruption, we will be performing a failover from the servers still on Windows Server 2012 to the servers already on Windows Server 2016.

What we'll be doing

As mentioned, we are using Windows Failover Clustering, along with SQL Server Always On Availability Groups and Distributed Availability Groups, to keep our data in sync across various servers while giving us redundancy in multiple locations (NY and CO). Starting next week, we will be upgrading the operating systems across these servers to Windows Server 2016.

This upgrade involves many moving pieces, but at a high level we will be doing the following next week:

  • Monday: we will be removing a NY server (currently a secondary) from an existing Windows Failover Cluster. The server will get a clean install of Windows Server 2016, a new Failover Cluster will be created, SQL Server 2017 will be reinstalled, and we will create new Availability Groups and new Distributed Availability Groups. Once this is done, the server should start to receive data from the current primary SQL Server, i.e. the one still in the old 2012 cluster.

  • Tuesday: another NY Secondary will follow the same path as the one on Monday.

  • Wednesday: the remote secondaries in CO will be removed from the old 2012 clusters, rebuilt, and put into the new 2016 failover clusters.

At this point, we will have a GO / NO-GO on the failover. If everything goes according to plan and we feel comfortable, then we will perform the failover (scheduled maintenance). If anything gets delayed or if there are unexpected issues, then we will push the maintenance to later.

We will not be moving forward with the failover until we are comfortable.
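To make the GO / NO-GO logic concrete, here is a small illustrative sketch. The `Step` class, the `go_for_failover` function, and the done/not-done flags are all hypothetical and are not part of any tooling described in this post; the sketch only models the rule that the failover proceeds when every preparation step is complete and no issues remain open.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One preparation step from the plan (hypothetical model)."""
    name: str
    done: bool


def go_for_failover(steps, open_issues):
    """Return True only if every step is done and no issues remain open."""
    return all(step.done for step in steps) and not open_issues


# Illustrative status values only; they are not a live report.
steps = [
    Step("Monday: rebuild first NY secondary on Windows Server 2016", True),
    Step("Tuesday: rebuild second NY secondary", True),
    Step("Wednesday: rebuild CO secondaries", False),
]

decision = "GO" if go_for_failover(steps, open_issues=[]) else "NO-GO"
print(decision)
```

With one step still outstanding, as in the illustrative data, the decision comes out as NO-GO, which corresponds to pushing the maintenance to later.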

When we perform the maintenance, we will be pointing the applications to the new 2016 servers and performing a SQL failover of the Distributed Availability Groups. We expect that the site will be in a read-only state for about an hour. During this time, we will be making progress announcements and updates on @StackStatus, so follow along there if you're interested.

This is a very complicated move that we are making, which has been fully tested in a lab environment, but you can never be sure of anything during these types of operations. As Nick Craver said:

Everyone has a plan until they get punched in the mouth - Mike Tyson

Questions or concerns?

Please post a comment or answer below; I'll do my best to address any concerns between now and the maintenance window.

35
  • 42
    I'll cross some fingers for you :P
    – Tim Stone
    Commented Apr 12, 2019 at 19:12
  • 12
    @TimStone We need more than that. :)
    – Taryn
    Commented Apr 12, 2019 at 19:15
  • 11
    Alright alright, I'll go buy some rum or something too
    – Tim Stone
    Commented Apr 12, 2019 at 19:17
  • 111
    As you also take out chat, any chance we might be watching all of this in a live stream? Or do you expect us to go outside, get some fresh air and ... shrug talk to real people during that hour?
    – rene
    Commented Apr 12, 2019 at 19:20
  • 6
    @rene I don't think there will be a livestream this time around, unless something changes between now and maintenance day. There are a ton of moving parts and that adds another layer of stuff.
    – Taryn
    Commented Apr 12, 2019 at 19:24
  • 31
    Wait... will I have to... go to sleep!?
    – Miriam
    Commented Apr 12, 2019 at 21:35
  • 26
    Looking forward to the upgrade to Server 2019 in six to eight weeks.
    Commented Apr 13, 2019 at 1:29
  • 9
    @MichaelHampton that was the initial plan, but we ran into a lot of issues with it so decided 2016 got us the improvements we wanted on the SQL side. Also the biggest pain is leaving 2012, it wouldn’t be as complicated if we were on 2012 R2.
    – Taryn
    Commented Apr 13, 2019 at 1:32
  • 52
    🔺 Thanks for plenty of notice.
    – Rob
    Commented Apr 13, 2019 at 2:03
  • 8
    Banner text: We're 'read only' for 60 to 80 minutes for maintenance.
    – TGrif
    Commented Apr 13, 2019 at 18:33
  • 4
    Why not take the AWS route, migrate to the cloud?
    – The Onin
    Commented Apr 14, 2019 at 6:19
  • 8
    Ha! The AskDifferent, AskUbuntu and Unix StackExchange sites are running on Windows Server :)
    – Malekai
    Commented Apr 16, 2019 at 9:11
  • 4
    Why are you using Windows Server?
    – Vikki
    Commented Apr 16, 2019 at 19:18
  • 3
    @AndrewMorton Of course we'll share. We hit a significant bug within SQL Server and the Distributed AGs. Microsoft worked with us to get past the issue, but a fix will be released in the next CU.
    – Taryn
    Commented Apr 21, 2019 at 16:05
  • 3
    I hope this is over soon. What will become of me during these hours?? Where will I go? What will I be??
    Commented Apr 22, 2019 at 17:12

3 Answers

23

Just don't take too long. I don't know how long I'll be able to wait without commenting on, answering, or editing a post.

It is my life at this point.

14

How do we know if the maintenance has been finished already or not yet carried out?

2
  • 6
    We'll update this post once done or if delayed beyond April 18th.
    – Taryn
    Commented Apr 18, 2019 at 12:34
  • It will be tweeted!
    Commented Apr 21, 2019 at 5:06
13

Once maintenance is complete, will we be treated to a blow by blow story, with personal drama, difficulties overcome, and maybe even a touch of tragedy and romance?

1
  • 13
    Yes I will blog about it. If you have been following me or Nick on twitter, you'll notice things are not going well.
    – Taryn
    Commented Apr 19, 2019 at 21:52
