Cloud Wars
Cloud

After 3 Cloud Failures in 12 Months, Microsoft Fortifies Azure Reliability

By Bob Evans | July 17, 2019 | 5 Mins Read

For Azure customers, the good news is that Microsoft’s global cloud infrastructure has delivered an average uptime of 99.995% for its core compute services over the past 12 months.

The not-so-good news for those same customers is that over those same 12 months, the Azure cloud has “experienced three unique and significant incidents that impacted customers.”

In a blog entry posted earlier this week, Azure CTO Mark Russinovich addressed the issue head-on. Russinovich spelled out the broad nature of those failures—what he referred to as “incidents”—as well as the remedial steps Microsoft is taking to ensure that such problems become even less frequent.

The Three Azure Failures

Here’s how Russinovich described the three failures:

“However, at the scale Azure operates, we recognize that uptime alone does not tell the full story. We experienced three unique and significant incidents that impacted customers during this time period, a datacenter outage in the South Central US region in September 2018, Azure Active Directory (Azure AD) Multi-Factor Authentication (MFA) challenges in November 2018, and DNS maintenance issues in May 2019.”

As we all know, breakdowns, challenges, issues, outages, failures, incidents and imperfections are inescapable until we humans achieve a state of full perfection, which Gartner predicts will occur at 12:37pm Pacific Time on Oct. 17 in the year 5852.

Until then, customers and prospects need to push back hard on Microsoft—the world’s #1 cloud vendor—and the entire tech community to do everything possible to deliver relentlessly enhanced reliability, security and availability.

Toward that end, here are some of the steps Russinovich outlined in his blog post:

“Improve our understanding”

“Outages and other service incidents are a challenge for all public cloud providers, and we continue to improve our understanding of the complex ways in which factors such as operational processes, architectural designs, hardware issues, software flaws, and human factors can align to cause service incidents.”

“Multiple failures” and “intricate interactions”
[Photo: Azure CTO Mark Russinovich]

“All three of the incidents mentioned were the result of multiple failures that only through intricate interactions led to a customer-impacting outage. In response, we are creating better ways to mitigate incidents through steps such as redundancies in our platform, quality assurance throughout our release pipeline, and automation in our processes.

“The capability of continuous, real-time improvement is one of the great advantages of cloud services, and while we will never eliminate all such risks, we are deeply focused on reducing both the frequency and the impact of service issues while being transparent with our customers, partners, and the broader industry.”

Within Russinovich’s CTO office, Microsoft has created a Quality Engineering team that will work closely with the existing Site Reliability Engineering team to explore and create innovative reliability solutions.

Safe deployment

Aimed at ensuring “that all code and configuration changes go through a cycle of specific stages,” Microsoft has expanded this initiative to include software-defined infrastructure changes such as networking and DNS, Russinovich wrote.
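
To make the idea of a staged cycle concrete, here is a minimal Python sketch of a rollout gate. It is not Microsoft's implementation: the stage names, the `apply` and `healthy` callbacks, and the `staged_rollout` function are all hypothetical, chosen only to show a change moving through stages in order and halting at the first stage that looks unhealthy.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def staged_rollout(
    change: str,
    stages: Sequence[str],
    apply: Callable[[str, str], None],
    healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Apply `change` to each stage in order, stopping at the first unhealthy one.

    Returns (stages that passed, stage where the rollout halted or None).
    """
    completed: List[str] = []
    for stage in stages:
        apply(change, stage)      # push the code or config change to this stage
        if not healthy(stage):    # health signal is an assumption of this sketch
            return completed, stage
        completed.append(stage)
    return completed, None


# Hypothetical usage: a DNS config change that fails its health check in "pilot".
applied: List[str] = []
done, halted = staged_rollout(
    "dns-config-v2",
    ["canary", "pilot", "broad"],
    apply=lambda change, stage: applied.append(stage),
    healthy=lambda stage: stage != "pilot",
)
print(done, halted)
```

In this run the change never reaches the "broad" stage, which is the point of the gate: a bad networking or DNS change is caught while its blast radius is still small.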

Storage-account level failover

This one’s worth reading in full:

“During the September 2018 datacenter outage, several storage stamps were physically damaged, requiring their immediate shut down. Because it is our policy to prioritize data retention over time-to-restore, we chose to endure a longer outage to ensure that we could restore all customer data successfully. A number of you have told us that you want more flexibility to make this decision for your own organizations, so we are empowering customers by previewing the ability to initiate your own failover at the storage-account level.”

Expanding availability zones

In Azure’s 10 largest regions, availability zones provide “an additional reliability option for the majority of our customers,” the blog post says. Microsoft is planning to expand availability zones over the next 18 months to its next 10 largest Azure regions.

Project Tardigrade

Looking to spot and prevent hardware failures or memory leaks before they happen, this effort will enable Azure to freeze virtual machines for a few seconds and shift workloads to healthy systems, Russinovich wrote.

Low to zero-impact maintenance

Including hot patching, live migration and in-place migration, these novel approaches aim to require zero downtime for customers. 

Fault injection and stress testing

I also recommend reading this one in Russinovich’s own words:

“Validating that systems will perform as designed in the face of failures is possible only by subjecting them to those failures. We’re increasingly fault injecting our services before they go to production, both at a small scale with service-specific load stress and failures, but also at regional and AZ scale with full region and AZ failure drills in our private canary regions. Our plan is to eventually make these fault injection services available to customers so that they can perform the same validation on their own applications and services.”
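
For readers who have not seen fault injection up close, the following Python sketch shows the basic pattern at the smallest possible scale. It assumes nothing about Azure's internal tooling: `InjectedFault`, `with_faults`, and `call_with_retries` are hypothetical names, and the "service" is just a local function. The idea is to wrap a dependency so it fails on purpose, then check that the calling code copes.

```python
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a dependency failure."""


def with_faults(func: Callable[[], T], failure_rate: float, rng: random.Random) -> Callable[[], T]:
    """Return a version of `func` that fails with probability `failure_rate`."""
    def wrapper() -> T:
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return func()
    return wrapper


def call_with_retries(func: Callable[[], T], attempts: int) -> T:
    """Call `func` up to `attempts` times, re-raising the last injected fault."""
    last_error: Optional[InjectedFault] = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as err:
            last_error = err
    if last_error is None:
        raise ValueError("attempts must be at least 1")
    raise last_error


def fetch_status() -> str:
    return "ok"  # stand-in for a real service call


# A seeded generator keeps the drill repeatable from run to run.
flaky = with_faults(fetch_status, failure_rate=0.5, rng=random.Random(42))
try:
    print(call_with_retries(flaky, attempts=5))
except InjectedFault:
    print("all attempts failed; the caller needs a fallback")
```

The region-scale and AZ-scale drills Russinovich describes are the same idea applied to entire datacenters rather than a single function call.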

Clearly, pushing reliability upward from 99.995% is a big challenge. But implicit as well as explicit in Microsoft’s promise to customers of its Azure cloud is that Microsoft’s size, scale, technological expertise and financial resources will shield those customers from the disruptive chaos of modern enterprise technology.
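
To put that percentage in perspective, a quick back-of-the-envelope calculation in Python converts an uptime figure into minutes of downtime per year. This is only arithmetic on the headline number: it assumes a 365-day year, and it treats 99.995% as a single annual figure even though Microsoft reports it as an average across core compute services. The function name is mine, not Microsoft's.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Minutes of downtime per year implied by an uptime percentage."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.995, 99.999):
    print(f"{pct}% uptime -> {downtime_minutes_per_year(pct):.1f} minutes/year")
```

At 99.995%, the budget works out to roughly 26 minutes a year, and reaching "five nines" (99.999%) would shrink it to about 5 minutes. That leaves very little room for incidents like the three described above.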

And if Microsoft intends to retain its #1 spot in the Cloud Wars, all of its cloud customers, both those who’ve been affected by the three “incidents” and those who haven’t, will be demanding that the plans outlined by Russinovich become reality.

And that they do so quickly.

 


