Friday, June 23, 2017

Azure VM SLA and high availability confusion

Abstract

Since Microsoft Azure announced the Single Instance SLA for Azure Virtual Machines, people have been confused about whether availability sets with multiple VMs are still required.
This post addresses the various queries I have received about Azure VM availability and SLA.
Disclaimer: All views below are personal and in no way represent the company I work for.

What is SLA?

SLA stands for Service Level Agreement. It is a contract between the service provider (here, Microsoft Azure) and the end user (any person or organization that uses Microsoft Azure services) that defines the level of service expected from the service provider.
If the promised level of service is not delivered, the end user may file a claim for compensation, backed by valid reasons, proof and evidence. If the claim is valid and justified, the service provider is obligated to approve it and compensate the end user; for Azure, this compensation takes the form of service credits.
The SLA percentage tells you how much availability to expect. Many online tools convert an SLA percentage into the downtime it allows. For example, an SLA of 99.90% allows approximately the following downtime (unavailability) of the service:
  • Daily: 1m 26.4s
  • Weekly: 10m 4.8s
  • Monthly: 43m 49.7s
  • Yearly: 8h 45m 57.0s
SLA percentages of 99.9% and above are the general, industry-accepted standard. The reference link used for the calculation is here.
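The conversion from SLA percentage to downtime is simple arithmetic: the allowed downtime for a period is the length of that period multiplied by (100 - SLA)/100. Below is a minimal Python sketch of that calculation, assuming a month of 365.2425/12 days and a year of 365.2425 days (the average Gregorian year), which is the assumption that reproduces the figures listed above. The function names are illustrative only.

```python
from typing import Dict

# Period lengths in seconds. Month and year use the average Gregorian year
# of 365.2425 days (an assumption that matches the figures in this post).
SECONDS_PER_DAY = 86_400
PERIODS = {
    "Daily": SECONDS_PER_DAY,
    "Weekly": 7 * SECONDS_PER_DAY,
    "Monthly": 365.2425 / 12 * SECONDS_PER_DAY,
    "Yearly": 365.2425 * SECONDS_PER_DAY,
}


def downtime_budget(sla_percent: float) -> Dict[str, float]:
    """Return the maximum allowed downtime, in seconds, for each period."""
    unavailable_fraction = 1 - sla_percent / 100
    return {name: length * unavailable_fraction for name, length in PERIODS.items()}


def format_duration(seconds: float) -> str:
    """Format seconds like '8h 45m 57.0s', omitting leading zero units."""
    seconds = round(seconds, 1)  # round first so we never print '60.0s'
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours >= 1:
        parts.append(f"{int(hours)}h")
    if hours >= 1 or minutes >= 1:
        parts.append(f"{int(minutes)}m")
    parts.append(f"{secs:.1f}s")
    return " ".join(parts)


if __name__ == "__main__":
    for period, seconds in downtime_budget(99.9).items():
        print(f"{period}: {format_duration(seconds)}")
```

Passing a different percentage, for example `downtime_budget(99.95)`, shows how much smaller the downtime budget becomes at the availability-set SLA level.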

Recommendation for 99.95% availability for Azure VM – Availability sets

If you want the 99.95% SLA for an Azure VM deployment, you must run at least 2 instances of Azure VM in an availability set.
P.S. Azure VMs with availability sets can be provisioned from the portal.
Irrespective of the type of storage disk used by the Azure VMs, the 99.95% SLA applies as long as you run at least 2 instances in an availability set. This means that if I want high availability for my crucial workloads on Azure VMs, I should run at least 2 VMs in an availability set.
2 VMs that are not in an availability set do not get this 99.95% SLA. 😊

What is Azure VM - Single Instance SLA?

Customers had some very important, specific demands:
- “We want to run applications on Azure that are not designed to run on multiple VMs. How do we ensure reliability for such applications running on a single Azure VM?”
- “The application I run on premises today is not designed to scale out. Also, scaling and managing the application is too expensive and cumbersome for me. What alternatives do I have to move such an application to Azure immediately?”
The answer is Azure VM Single Instance SLA.
If you run a single instance of an Azure VM with premium disks, you get a 99.9% SLA. A guaranteed level of reliability from the service provider is always better than running a VM on standard disks with no SLA at all.
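To make the two rules described so far concrete, here is a small Python sketch that encodes them as this post states them (as of June 2017): at least 2 VMs in an availability set get 99.95% regardless of disk type, and a single VM gets 99.9% only when all of its disks are premium. The function `vm_sla_percent` and its parameters are hypothetical, not an Azure API, and the official Azure SLA document remains the authority on the current terms.

```python
from typing import Optional


# Hypothetical helper encoding the Azure VM SLA rules as described in this
# post (June 2017). It is not an Azure API; check the current SLA document.
def vm_sla_percent(vms_in_availability_set: int, all_disks_premium: bool) -> Optional[float]:
    """Return the SLA percentage a VM qualifies for, or None if it has no SLA.

    vms_in_availability_set: number of running VMs in the same availability
        set as this VM (0 if the VM is not in an availability set).
    all_disks_premium: True only if every disk attached to the VM is Premium.
    """
    # Rule 1: two or more instances in an availability set -> 99.95%,
    # irrespective of the disk type.
    if vms_in_availability_set >= 2:
        return 99.95
    # Rule 2: otherwise the VM is treated as a single instance, which is
    # covered (99.9%) only when all of its disks are Premium.
    if all_disks_premium:
        return 99.9
    # Anything else: no SLA.
    return None


if __name__ == "__main__":
    scenarios = {
        "2 VMs in an availability set, standard disks": (2, False),
        "single VM, all premium disks": (0, True),
        "single VM, standard disks": (0, False),
    }
    for label, args in scenarios.items():
        print(f"{label}: {vm_sla_percent(*args)}")
```

The sketch treats any VM outside an availability set as a single instance, so it evaluates each such VM on its own disks.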

Does Single Instance SLA mean High Availability?

No. The basic idea behind high availability is redundancy. A single instance has no redundancy, hence no high availability. That said, "no high availability" does not mean the VM and its applications may go down at any time. Your VM will be up 99.9% of the time during its running lifetime. When Microsoft Azure plans maintenance, it sends you a notification at least 5 days in advance, so you can plan for the downtime of applications running on a single Azure VM.
If you want to go beyond the 99.9% SLA, implement availability sets and get the 99.95% SLA. If you want auto-scaling, go for VM Scale Sets. If you want 99.999% availability and beyond, add a DR and backup strategy.
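To see why redundancy is the core of high availability, here is an idealized Python calculation (not how Azure computes its SLA): a group of instances is up as long as at least one instance is up, so its availability is 1 minus the probability that all instances are down at once. The sketch assumes instances fail independently, which real deployments only approximate; the function name is illustrative.

```python
def parallel_availability(*availabilities: float) -> float:
    """Availability of a group that stays up while at least one member is up.

    Idealized model: assumes the members fail independently of each other.
    """
    probability_all_down = 1.0
    for availability in availabilities:
        probability_all_down *= 1.0 - availability
    return 1.0 - probability_all_down


if __name__ == "__main__":
    one_vm = parallel_availability(0.999)
    two_vms = parallel_availability(0.999, 0.999)
    print(f"1 VM:  {one_vm:.4%}")
    print(f"2 VMs: {two_vms:.4%}")
```

In this model two VMs at 99.9% each give 1 - 0.001 × 0.001 = 99.9999%, far above the 99.95% that Azure commits to for two VMs in an availability set. Real failures are not fully independent (shared infrastructure, platform updates, regional outages), so treat the model as an optimistic illustration of why redundancy helps, not as a prediction.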

Summary

1. Today, Microsoft Azure is the only cloud provider that offers a Single Instance SLA.
2. The Single Instance SLA is guaranteed only when the VM uses Premium storage for all of its disks.
3. The Single Instance SLA does not mean you get high availability for applications hosted on the VM.
4. To get high availability, run more than 1 VM. [This doubles the cost, but there is a way to reduce the cost and still achieve HA; of course, that is another blog for another day 😊].
5. Most important: don't tell customers that “Single Instance SLA means your application will never be down and it provides built-in HA”. 😐