
I have been developing an app that needs to run a cron task every minute. We handle our cron tasks with Spring Boot scheduling. However, I am a little worried about the following question:

One part of our product depends on this task being highly available: if it fails for even one minute, it will have a great impact on our processes and customers. The question is: is Google Cloud App Engine reliable enough to support these processes so that our product won't be easily affected? And if Google Cloud App Engine does fail, what options do we have for handling an application that cannot be down for even a single minute?

  • You're probably more likely to lose your connection to Google Cloud App Engine than for GCAE itself to fail.
    Commented Feb 13, 2020 at 3:25
  • Meaning GCAE is highly reliable and very unlikely to fail, so the cron jobs I have attached to my scheduler will be safe?
    – Juan
    Commented Feb 13, 2020 at 14:41

1 Answer


If you need high availability where one minute of downtime is not acceptable, a single cloud provider is not enough. You need multiple providers to have high availability at that level, and even then it's still a matter of hoping that no issue affects several providers at the same time. You also need internal processes and procedures in place, and those are far more challenging and demanding than the choice of cloud providers. In most cases you also need to ensure that anything supporting the highly available module is itself highly available.
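As a rough illustration of why dependencies and redundancy matter, the sketch below combines availabilities under the simplifying assumption that failures are independent, which real outages often are not. The class name, method names, and the input percentages are hypothetical and chosen only for illustration.

```java
public class CompositeAvailability {

    /** Availability of a chain in which every component must be up (independent failures assumed). */
    public static double serial(double... availabilities) {
        double result = 1.0;
        for (double a : availabilities) {
            result *= a;
        }
        return result;
    }

    /** Availability of redundant replicas where at least one must be up (independent failures assumed). */
    public static double parallel(double... availabilities) {
        double allDown = 1.0;
        for (double a : availabilities) {
            allDown *= (1.0 - a);
        }
        return 1.0 - allDown;
    }

    public static void main(String[] args) {
        double provider = 0.9995; // illustrative figure, not a quoted guarantee
        double database = 0.999;  // illustrative figure for a supporting dependency

        // The scheduled task only works if both the platform and its dependency are up.
        System.out.printf("platform + dependency in series: %.6f%n", serial(provider, database));

        // The same task deployed on two providers, either of which can run it.
        System.out.printf("two providers in parallel:       %.8f%n", parallel(provider, provider));
    }
}
```

With these inputs the serial chain comes out at roughly 99.85%, lower than either component alone, while the parallel pair is above 99.9999% on paper. The parallel figure is optimistic because it ignores correlated failures and the failover machinery itself.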

When faced with the price tag for true high availability, most organizations discover they don't really need it. Once a real cost-benefit analysis is done, downtime tends not to look so bad. The less downtime you can accept, the more it costs to ensure you stay within that limit, and that cost scales exponentially. If accepting an hour of downtime a year costs $D, allowing only a few minutes of downtime a year is going to cost 10-20 times more. The price paid to prevent losses can quickly eclipse the actual losses from downtime.

To give you an idea of just how extreme avoiding one minute of downtime is: a 99.9% uptime SLA still allows about a minute and a half of downtime per day. A 99.99% SLA allows about a minute of downtime per week, and a 99.999% SLA allows about five minutes per year. It's very easy to have all sorts of SLAs with impressive-looking numbers, but a minute of downtime is an extremely short window. Everything has to be automated to maintain that level of uptime; you need to detect and mitigate issues without human interaction. App Engine only offers a default SLA of 99.95%, which on its own wouldn't meet your needs if one minute of downtime is an issue.
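The downtime figures above follow from simple arithmetic: the allowed downtime is the fraction of time not covered by the SLA, multiplied by the length of the period. A minimal sketch, with a hypothetical class and method name:

```java
public class DowntimeBudget {

    static final double MINUTES_PER_DAY = 24 * 60;
    static final double MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY;
    static final double MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY;

    /** Minutes of downtime an uptime percentage permits over a period of the given length. */
    public static double allowedDowntimeMinutes(double slaPercent, double periodMinutes) {
        return (1.0 - slaPercent / 100.0) * periodMinutes;
    }

    public static void main(String[] args) {
        double[] slas = {99.9, 99.95, 99.99, 99.999};
        for (double sla : slas) {
            // Print the budget for each SLA over a day, a week, and a year.
            System.out.printf("%.3f%%: %.2f min/day, %.2f min/week, %.2f min/year%n",
                    sla,
                    allowedDowntimeMinutes(sla, MINUTES_PER_DAY),
                    allowedDowntimeMinutes(sla, MINUTES_PER_WEEK),
                    allowedDowntimeMinutes(sla, MINUTES_PER_YEAR));
        }
    }
}
```

By the same formula, 99.95% measured over a 30-day month leaves a budget of about 21.6 minutes, far more than the single minute the question can tolerate.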

  • Thank you for your answer; it has made me think about what I didn't want to. Still, can we say that an App Engine app in the cloud is unlikely to go down?
    – Juan
    Commented Feb 14, 2020 at 14:19
  • @Juan I added the App Engine SLA to the answer. They will probably be fine the vast majority of the time, but high availability is about much more than that. You can look up their outage history to get a better idea of how often they run into problems.
    – Ryathal
    Commented Feb 14, 2020 at 15:37
