Cloud Computing:

Power & Frailty

For all its operational, scale, and cost advantages, cloud computing is an (un)surprisingly fragile technology. 

Part 2: The Myth of Painless Operational Recovery

When the cloud services your business runs on suffer an outage, parts of your operations are disrupted. Naturally, assuming you have set up your cloud architecture using best practices, you'll go through a set of recovery and failover protocols. And boom, you're back up and running, right? Well... not exactly.

Operational recovery doesn't happen at the flip of a magic switch. The truth is, when the cloud fails and a business's operations are interrupted, it takes quite some time to identify whether the cloud service provider is at fault or whether operations stopped for other reasons. After eliminating self-induced issues, someone in your business will finally call up your cloud service provider to ask whether the cloud services you use are actually down.
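To make that triage step a little more concrete, here is a minimal sketch in Python of how a first guess at "is it us or is it them?" might be automated. Everything in it is an assumption for illustration: the ProbeResult type, the triage function, and the probe names are hypothetical, and a real setup would feed it results from your own monitoring rather than hand-written values.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    name: str       # hypothetical label, e.g. "app health endpoint"
    internal: bool  # True if the probe targets something you operate yourself
    ok: bool        # True if the probe succeeded


def triage(results):
    """Return a rough first guess at where the fault lies."""
    internal_failed = [r.name for r in results if r.internal and not r.ok]
    provider_failed = [r.name for r in results if not r.internal and not r.ok]

    # A failing provider-side probe is checked first: if the provider is
    # down, probes of your own stack will usually fail as a side effect.
    if provider_failed:
        return "likely provider-side: " + ", ".join(provider_failed)
    if internal_failed:
        return "likely self-induced: " + ", ".join(internal_failed)
    return "no fault detected"


if __name__ == "__main__":
    sample = [
        ProbeResult("app health endpoint", internal=True, ok=False),
        ProbeResult("provider storage API", internal=False, ok=True),
    ]
    print(triage(sample))
```

Even with something like this in place, the answer is only a hint. Someone still has to confirm it with the provider, which is exactly where the time goes.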

 

On the other side of the world, there's already a scramble happening to bring the service back up. Trust us when we tell you that it's sheer chaos (we used to work at major cloud computing companies). Unfortunately, the cloud engineers don't immediately know what exactly went wrong to cause the service outage. It takes time and diagnosis to identify the root cause of the problem. Then it takes even more time to fix the problem, which may include rebooting instances or sometimes even resetting a whole region or sub-region of servers. That's why downtime can last anywhere from a few unnoticeable minutes to record-setting weeks.

And what are businesses doing while their cloud service providers are troubleshooting? Waiting. That's it. All they can do is wait. And bleed cash.

But wait, what about high-availability architecture? Isn't that enough to spare you the pain of downtime? Read on, my friend.


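Cloud Outage Root-Cause Breakdown (2010, 2013, 2016)

Data Courtesy of Uptime Institute 2017.

Human Error: 22.7% (2010), 22.5% (2013), 24.8% (2016)

UPS System Failure: 24.4% (2010), 20.0% (2013), 19.3% (2016)

Cyber Crimes (DDoS, etc.): 2.7% (2010), 18.2% (2013), 22.7% (2016)

Water, Cooling, CRAC Failure: 15.9% (2010), 12.7% (2013), 11.22% (2016)

Weather-induced Failure: 13.6% (2010), 12.8% (2013), 10.8% (2016)

Generator Failure: 11.1% (2010), 7.8% (2013), 6.3% (2016)

IT Equipment Failure: 5.7% (2010), 4.8% (2013), 4.5% (2016)

Other Causes: 2.5% (2010), 1.2% (2013), 1.8% (2016)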

Copyright 2017 Black Swan Technology, Inc.
