Microsoft’s Failure Highlights Problems in Cloud Computing

Many Microsoft users were left unable to access key online services because of a major service failure. Hotmail, Office 365 and SkyDrive were among the services affected. Microsoft was still examining the cause of the problem on Friday morning, but released a statement saying that it seemed to be related to the Internet’s DNS address system.

Cloud Computing - Reliability

Problems like this call into question the reliability of cloud computing services, and raise doubts about how well such services compete with local storage. What’s particularly embarrassing is that Microsoft lost the use of Office 365, the company’s competitor to Google’s array of online apps.

The very same service also went offline for a brief period in mid-August, less than 60 days after it had been launched. The latest failure is believed to have lasted around two and a half hours, between 3:00 a.m. GMT and 5:30 a.m. GMT on Sep 8th. In a blog post at 6:49 a.m. GMT, Microsoft commented, “We have completed propagating our DNS configuration changes around the world, and have restored service for most customers.” The Domain Name System (DNS) is the system responsible for translating domain names, such as the host part of a URL, into IP addresses. Try looking up the IP addresses of your favorite websites and you’ll have an idea of how it works.
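For readers who want to try such a lookup, here is a minimal sketch in Python using the standard library's `socket.getaddrinfo`. The function name `resolve` is only illustrative, and the addresses returned depend on the resolver your machine is configured to use.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses the resolver gives for a name."""
    # getaddrinfo asks the operating system's resolver, which normally
    # queries DNS for anything that is not already an IP address.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("example.com")` returns a list of address strings. If the name cannot be resolved, as would happen to many clients during a DNS failure like the one described, the call raises `socket.gaierror` instead.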

Microsoft was not alone in experiencing problems with its cloud-based services. Google Docs also went offline for a short period on Wednesday. But since Microsoft’s Office 365 is a premium service (users are charged .50 a month), users may expect a more reliable setup.

Problems in the Industry

Shifting applications from installed software to a cloud-based “software as a service” model has been a huge trend in the computing industry for the past few years. These cloud-based systems are seen as easier to manage, easier to scale up and down as necessary, and more secure.

But a significant number of high-profile cloud failures have led some to question the long-term reliability of such services. Among these failures was that of Amazon’s EC2, the company’s remote computing service, which enables businesses to rent extra processing power and storage. The system failed in April 2011, with a drastic impact on some of the web’s biggest sites, including Foursquare and Reddit.

Another failure in August affected some of the very same sites. Ken Moody, data center services manager at the Cloud Computing Center, comments, "There will be an element of confidence shaken." Moody says that the future of cloud computing relies on IT companies’ ability to spread the risk among users.

"People should look at smaller data centers which are divided up where resilience could be guaranteed. Our service level agreements are 99.99% because we don't put everything into one large data center," Moody said. Moody said that no one, not even its users, knew how Microsoft’s cloud computing systems were structured. Building future confidence in such platforms may depend on the sharing of more information. Moody says, "There's a requirement for transparency and communication to prospective clients."

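To put the 99.99% service level Moody cites in perspective, the sketch below converts an availability percentage into a downtime allowance and back. It assumes a 365-day year and, for the comparison, a 30-day month. The article does not say over what period any of these agreements are measured, so those periods, like the function names, are assumptions made for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_percent, period_days=365):
    """Minutes of downtime permitted over the period at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * MINUTES_PER_DAY


def achieved_availability(downtime_minutes, period_days=30):
    """Availability percentage over the period, given total downtime in minutes."""
    total_minutes = period_days * MINUTES_PER_DAY
    return (1 - downtime_minutes / total_minutes) * 100
```

Under these assumptions, `allowed_downtime_minutes(99.99)` comes to roughly 52.6 minutes a year, while an outage of about two and a half hours (150 minutes) works out to `achieved_availability(150)` of roughly 99.65% for that month.
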
BBC