Google Cloud shows it can break things for lots of customers – not just one at a time

Deleted about 40 networks that services needed, causing late Thursday fun

In the week after its astounding deletion of Australian pension fund UniSuper's entire account, you might think Google Cloud would be on its very best behavior.

Nope.

At 15:22 last Thursday, US Pacific Time, Google Cloud ran "maintenance automation intended to shutdown an unused network control component in a single location." Which is fair enough.

It worked! Unfortunately, it also worked in other locations – about 40 of them, by The Register's count.

The result was two hours and forty-eight minutes during which users of 33 Google Cloud services – including biggies like Compute Engine and Kubernetes Engine – experienced the following symptoms:

  • New VM instances were provisioned without network connectivity, hence unable to establish network connections;
  • Migrated/restarted VMs lost network connectivity;
  • Virtual networking configurations (firewalls, network load balancers etc.) could not be updated;
  • Partial packet loss for certain VPC network flows was observed in us-central1 and us-east1;
  • Cloud NAT Dynamic Port Allocation (DPA) experienced allocation failures;
  • Creation of new GKE nodes and nodepools experienced failures.

Other services that needed VMs in Google Compute Engine or network configuration updates "were not able to successfully complete operations during this time."

The incident was over by 18:10 Pacific Time.

The Register can only imagine how US-based customers felt as they wound down for the day only to see their cloudy connectivity evaporate. Asian users copped it first thing on Friday morning.

Google has told customers that the incident was caused by a bug in the maintenance automation, and that the problem went away once the component it had shut down was restarted.

The automation tool has been sin-binned "until appropriate safeguards are put in place."

And Google has told clients "There is no risk of a recurrence of this outage at the moment." Which isn't very reassuring, given the org's recent rotten track record.

The search giant's cloud limb has promised to reveal more info about the mess. If there's anything juicy in it, we'll let you know. ®
