Calculating availability with cloud service SLAs

When designing a cloud solution, it helps to understand the service-level agreements (SLAs) that cloud providers offer for individual services and how they affect your overall system availability.

For example, a service offering

  • 99.9% uptime can be down for roughly 43 mins 47 seconds per month
  • 99.99% uptime can be down for roughly 4 mins 22 seconds per month
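The conversion from an SLA percentage to allowed downtime can be sketched in a few lines of Python. This is only an illustration: the function name and the assumption that a month is 365/12 days long are mine, and a different month length or rounding rule shifts the result by a few seconds.

```python
# Assumption: an "average" month of 365/12 days, i.e. 43,800 minutes.
MINUTES_PER_MONTH = 365 * 24 * 60 / 12


def monthly_downtime_minutes(sla_percent: float) -> float:
    """Return the downtime (in minutes per month) that an SLA still allows."""
    return (100 - sla_percent) / 100 * MINUTES_PER_MONTH


for sla in (99.9, 99.99):
    total_seconds = round(monthly_downtime_minutes(sla) * 60)
    minutes, seconds = divmod(total_seconds, 60)
    print(f"{sla}% uptime -> about {minutes} min {seconds} s of downtime per month")
```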

Systems are rarely built on a single cloud service. For example, a typical web application is likely to have four layers: Azure Front Door, a Web App, an API and a Database.

Sample Web Application (N-Tier)

Let's assume each cloud service has the following SLA:

  • Azure Front Door -> SLA 99.99%
  • Web App -> SLA 99.95%
  • API (Web App) -> SLA 99.95%
  • Database -> SLA 99.99%

Composite SLA

A system is considered up and running only if all four services are running (let's leave user/app errors out for simplicity). To calculate the overall SLA of the system, a worst-case scenario is considered, i.e. each service goes out one after another, as below:

Composite SLA 

The overall SLA (aka composite SLA) is calculated simply by multiplying all four SLAs together as follows:

Composite SLA = 99.99% x 99.95% x 99.95% x 99.99% = 99.88%
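The same multiplication can be written as a small Python helper. This is a minimal sketch: the function name and the dictionary of tiers are illustrative, and the SLA values are the ones assumed in this article.

```python
import math


def composite_sla(slas_percent):
    """Multiply the SLAs of services that must all be up (a serial chain)."""
    # Convert each percentage to a fraction, multiply, convert back to percent.
    return math.prod(s / 100 for s in slas_percent) * 100


tiers = {
    "Azure Front Door": 99.99,
    "Web App": 99.95,
    "API (Web App)": 99.95,
    "Database": 99.99,
}

print(f"Composite SLA = {composite_sla(tiers.values()):.2f}%")
```

Note that every extra service in the chain can only lower the composite figure, since each factor is below 1.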

Multi-Region Deployments

One way to improve uptime is to deploy the application across multiple regions, which also helps it withstand region-wide outages. Our new architecture would look like this (Azure Front Door is a global service, so it is shared between the regions):

Multi Region Deployment

Now let's calculate overall system SLA:

We will first calculate our SLAs for Region 1 and Region 2

Region 1 SLA = 99.95% x 99.95% x 99.99% = 99.89%
Region 2 SLA = 99.89% (same as region 1)

Since these regions run services in parallel, the composite SLA for Region 1 & Region 2 is calculated as follows (all values are percentages):

Composite SLA: for Parallel Deployment (multi-region)
  1. Region 1 Unavailability (R1) = 100-99.89 = 0.11
  2. Region 2 Unavailability (R2) = 100-99.89 = 0.11
  3. Multi-Region Unavailability (OU = (R1 * R2)/100) = (0.11 * 0.11)/100 = 0.000121
  4. Multi-Region Availability (OA = 100-OU) = 100-0.000121 = 99.999879

This gives an overall availability of 99.999879% for the two regions running in parallel.
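The parallel calculation can be sketched in Python as below. The function name is illustrative, and the formula assumes the regions fail independently of each other, so the combined deployment is down only when every region is down at the same time.

```python
def parallel_sla(slas_percent):
    """Availability (in percent) of redundant deployments, any one of which suffices."""
    unavailable = 1.0
    for sla in slas_percent:
        # Probability that this deployment is down, as a fraction.
        unavailable *= 1 - sla / 100
    return (1 - unavailable) * 100


region_sla = 99.89  # rounded per-region SLA from the calculation above
print(f"Multi-Region Availability = {parallel_sla([region_sla, region_sla]):.6f}%")
```

Passing a third region to the same function shows how quickly the unavailability shrinks as redundancy is added.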

The last step is to calculate composite availability with Azure Front Door as:

-> SLA for Azure Front Door x SLA for Multi-Region Availability

-> 99.99% x 99.999879% = 99.9899%

This is equivalent to a monthly downtime of about 4 mins 26 seconds
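The whole multi-region calculation can be chained together as in the sketch below. The helper names are my own, the month is assumed to be 365/12 days long, and intermediate values are not rounded, so the last digits may differ slightly from figures worked out by hand.

```python
import math


def series(*availabilities):
    """All components must be up: multiply the availabilities (as fractions)."""
    return math.prod(availabilities)


def parallel(*availabilities):
    """Any one component suffices: 1 minus the product of the unavailabilities."""
    return 1 - math.prod(1 - a for a in availabilities)


# SLA values assumed in this article, expressed as fractions.
front_door, web_app, api, database = 0.9999, 0.9995, 0.9995, 0.9999

region = series(web_app, api, database)          # one region's stack
multi_region = parallel(region, region)          # two identical regions
overall = series(front_door, multi_region)       # shared Front Door in front

# Assumption: an average month of 365/12 days.
downtime_minutes = (1 - overall) * 365 * 24 * 60 / 12

print(f"Overall availability: {overall * 100:.4f}%")
print(f"Monthly downtime: {downtime_minutes:.2f} minutes")
```

Because Azure Front Door sits in series with both regions, its own SLA caps the overall figure no matter how many regions are added behind it.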

Overall SLA for Azure Front Door x SLA for Multi-Region Availability

Here is a simple spreadsheet you can use to perform SLA calculations (download here)

Sample Spreadsheet
