Availability Service Level Calculation

Calculating the composite availability SLA for your stack
The following guide will help you calculate the composite availability of your own stack.
The guide has two parts, depending on what you are after:
- The actuarial science behind the calculation (which is just the probability of "something" being available or unavailable)
- An SLA calculation guide, through to the maximum possible downtime.
The Actuarial science
Calculating service levels is purely an assessment of risk, that is, the probability of failure, treated as a mathematical problem.
Suggestion: skip this part if you are only interested in the availability percentages, and go straight to "Calculating your downtime or availability percentages".
Let us consider the sample space for the following architecture:

SLA summary for Azure services taken independently
- Azure DNS: 100% availability (so it is removed from consideration in this problem, as it does not skew the calculation)
- Azure Front Door: 99.99% availability or 0.0001 probability of going down
- Azure App Service: 99.95% availability or 0.0005 probability of going down
Note: Although App Service is declared with an SLA of 99.95%, with the general availability (GA) of zone redundancy that should increase to 99.99%, but this has not been documented yet. For this calculation, the documented 99.95% is used.
Sample spaces for the probability:
Mutually exclusive events
- App Service Region 1 (AR1) is down but Azure Front Door (FD) is up
- App Service Region 2 (AR2) is down but FD is up
Independent events
- AR1 and AR2 are both down
- FD is down
- FD is down, or AR1 and AR2 are both down
Treating "AR1 is down" and "AR2 is down" as mutually exclusive events (either region can be down, but not both simultaneously):
P(AR1 and AR2) = P(AR1 ∩ AR2) = 0
Thereby the probability of both mutually exclusive events occurring is 0.
For the same mutually exclusive events, the probability of either occurring is:
P(AR1 or AR2) = P(AR1 ∪ AR2) = P(AR1) + P(AR2) - P(AR1 ∩ AR2)
= P(AR1) + P(AR2) - 0 = P(AR1) + P(AR2)
Substituting the values:
- Probability of AR1 being down: 0.0005
- Probability of AR2 being down: 0.0005
Probability of either being down:
P(AR1 or AR2) = 0.0005 + 0.0005 = 0.001
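A minimal Python sketch of the step above, assuming region outages are mutually exclusive exactly as stated, so the overlap term P(AR1 ∩ AR2) is 0. `union_probability` is a hypothetical helper name, not part of any Azure tooling:

```python
def union_probability(p_a: float, p_b: float, p_a_and_b: float) -> float:
    """Inclusion-exclusion for two events: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_a_and_b

# Probability of each region being down, from the 99.95% App Service SLA.
p_ar1_down = 0.0005
p_ar2_down = 0.0005

# Mutually exclusive assumption: both regions are never down together, so the overlap is 0.
p_either_down = union_probability(p_ar1_down, p_ar2_down, 0.0)
print(f"P(AR1 ∪ AR2) = {p_either_down:.4f}")  # prints P(AR1 ∪ AR2) = 0.0010
```

Passing a non-zero third argument turns the same helper into the general two-event case used later for FD and the two regions.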
Calculating the probability of operating without a regional outage
Two independent events:
- Azure Front Door being available: 1 - 0.0001 = 0.9999
- Neither AR1 nor AR2 being down: 1 - 0.001 = 0.999
Overall probability of Front Door being available with neither region down (that is, never having to operate on a single region):
P(FD up ∩ no region down) = P(FD up) × P(no region down) = 0.9999 × 0.999 = 0.9989001
In percentage: 99.89001%.
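The product of the two independent probabilities above can be checked in a few lines of Python; the variable names are illustrative only:

```python
# Independent events: P(A ∩ B) = P(A) × P(B).
p_fd_up = 1 - 0.0001           # Azure Front Door available
p_no_region_down = 1 - 0.001   # neither AR1 nor AR2 down (mutually exclusive case above)

p_combined = p_fd_up * p_no_region_down
print(f"{p_combined:.7f} = {p_combined * 100:.5f}%")
```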
Overall availability/unavailability
Overall unavailability is the scenario where FD is down, or AR1 and AR2 are both down.
Treating AR1 and AR2 being down as independent events (written AR1||AR2 below):
P(AR1||AR2) = P(AR1 ∩ AR2) = P(AR1) × P(AR2) = 0.0005 × 0.0005 = 0.00000025
FD being down is independent of AR1 and AR2 both being down:
P(FD ∩ AR1||AR2) = P(FD) × P(AR1||AR2) = 0.0001 × 0.00000025 = 0.000000000025
FD being down and both regions being down are not mutually exclusive (both can happen at once), so by inclusion-exclusion the probability of either occurring is:
P(FD ∪ AR1||AR2) = P(FD) + P(AR1||AR2) - P(FD ∩ AR1||AR2) = 0.0001 + 0.00000025 - 0.000000000025 = 0.000100249975 ≈ 0.00010025
Overall probability of availability = 1 - 0.00010025 = 0.99989975
In percentage: availability = 99.989975%
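As a cross-check on the inclusion-exclusion result, the sketch below enumerates all eight up/down combinations of FD, AR1 and AR2 using exact fractions, under the same assumption that the three components fail independently. The names `P_DOWN`, `stack_is_down` and `exact_unavailability` are hypothetical, chosen for this illustration:

```python
from fractions import Fraction
from itertools import product

# Probability of each component being down, taken from the SLAs above.
P_DOWN = {
    "FD": Fraction(1, 10_000),   # 99.99%
    "AR1": Fraction(5, 10_000),  # 99.95%
    "AR2": Fraction(5, 10_000),  # 99.95%
}

def stack_is_down(fd_down: bool, ar1_down: bool, ar2_down: bool) -> bool:
    # Unavailable if Front Door is down, or both regions are down.
    return fd_down or (ar1_down and ar2_down)

def exact_unavailability() -> Fraction:
    total = Fraction(0)
    # Enumerate all 2^3 outcomes; independence lets us multiply per-component probabilities.
    for fd, ar1, ar2 in product([False, True], repeat=3):
        p = Fraction(1)
        for name, down in (("FD", fd), ("AR1", ar1), ("AR2", ar2)):
            p *= P_DOWN[name] if down else 1 - P_DOWN[name]
        if stack_is_down(fd, ar1, ar2):
            total += p
    return total

u = exact_unavailability()
print(f"unavailability = {float(u):.12f}")
print(f"availability   = {float(1 - u) * 100:.9f}%")
```

The enumeration gives exactly 0.000100249975, the inclusion-exclusion value before rounding; the P(FD ∩ AR1||AR2) term is small enough that dropping it leaves 0.99989975 unchanged at the precision shown.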
Calculating your downtime or availability percentages
The simplified calculations below apply the probability rules described above to compute the composite availability of the stack.
Note: A few examples are given below to demonstrate the approach.
Stack for a stateless web application
SLA calculation guide for the following architecture:

SLA summary for Azure services taken independently
- Akamai: 99.999% availability or 0.00001 probability of going down
- Azure DNS: 100% availability (so it is removed from consideration in this problem, as it does not skew the calculation)
- Azure Front Door: 99.99% availability or 0.0001 probability of going down
- Azure App Service: 99.95% availability or 0.0005 probability of going down
Azure App Service being down in both regions simultaneously, as independent events:
0.05% × 0.05% = 0.000025%
So the availability across both regions is 99.999975%
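The same redundancy step as a small Python function, a sketch assuming the two regions fail independently as stated; `parallel_availability` is a hypothetical helper:

```python
def parallel_availability(*availabilities: float) -> float:
    """Redundant copies: the group is down only if every copy is down at once
    (assumes independent failures)."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down

app_service_region = 0.9995  # 99.95% SLA per region
both_regions = parallel_availability(app_service_region, app_service_region)
print(f"{both_regions * 100:.6f}%")  # prints 99.999975%
```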
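The stack is available only if Akamai, Azure Front Door and App Service (across both regions) are all available; if any one of them is down, the stack is down:
99.999% × 99.99% × 99.999975% ≈ 99.988975% (99.9889% to four decimal places, rounded down)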
The overall SLA of the stack is
99.9889%
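The series step can be sketched the same way. Multiplying availabilities assumes the three layers fail independently. The final truncation to four decimal places is an assumption about how 99.9889% was obtained from 99.988975%; truncating, unlike rounding, never overstates the SLA:

```python
import math

def series_availability(*availabilities: float) -> float:
    """Components that must all be up: multiply their availabilities (independent failures)."""
    return math.prod(availabilities)

akamai = 0.99999
front_door = 0.9999
app_service_two_regions = 0.99999975  # two-region App Service figure from above

stack = series_availability(akamai, front_door, app_service_two_regions)
print(f"exact:     {stack * 100:.6f}%")
# Truncate to four decimal places so the stated figure is never higher than the computed one.
truncated = math.floor(stack * 100 * 10_000) / 10_000
print(f"truncated: {truncated:.4f}%")
```

`math.prod` needs Python 3.8 or later.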
Stack for a stateless web application through a private link with regional Redis cache
SLA calculation guide for the following architecture:

SLA summary for Azure services taken independently
- Akamai: 99.999% (this could well be 100%, something to validate contractually)
- Azure DNS: 100% availability (so it is removed from consideration in this problem, as it does not skew the calculation)
- Azure Front Door: 99.99% availability or 0.0001 probability of going down
- Azure App Service: 99.95% availability or 0.0005 probability of going down
- Azure Private Link: 99.99% availability or 0.0001 probability of going down
- Azure Redis (single region, any Standard tier instance): 99.9% availability or 0.001 probability of going down
Although Redis is used here as a cache (read/write-through) and should not "really" affect the SLA, it is included in this calculation for the purpose of demonstration.
Composite availability of App Service and Redis within a region (inclusive of Private Link):
99.95% × 99.99% × 99.9% ≈ 99.84%
Unavailability of a region: 0.16% (100 - 99.84)
Unavailability of App Service, Private Link and Redis in both regions simultaneously:
0.16% × 0.16% = 0.000256%
Composite availability of App Service and Redis over two regions: 99.999744%
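A sketch of the per-region and two-region steps, assuming independent failures within and across regions. It also shows that rounding the regional figure to 99.84% before squaring, as done above, makes no difference at six decimal places:

```python
import math

app_service = 0.9995
private_link = 0.9999
redis_standard = 0.999

# Within one region every component must be up, so multiply the availabilities.
region = math.prod([app_service, private_link, redis_standard])
print(f"single region: {region * 100:.4f}%")

# Across two regions the tier is down only if both regions are down at the same time.
two_regions_exact = 1 - (1 - region) ** 2
two_regions_rounded = 1 - (1 - 0.9984) ** 2  # using the rounded 99.84% from the text
print(f"two regions (exact input):   {two_regions_exact * 100:.6f}%")
print(f"two regions (rounded input): {two_regions_rounded * 100:.6f}%")
```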
Composite availability of the stack (Akamai, Front Door, and App Service plus Redis across both regions, all in series):
99.999% × 99.99% × 99.999744% ≈ 99.9887%
The overall SLA of the stack is
99.9887%
Follow the approach in the examples above to calculate the composite availability of the stack you deploy, using the SLAs that match its configuration (e.g. Premium and Standard instance tiers have different SLAs).
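One way to apply the approach to another stack is to describe it as a tree of series and parallel groups and evaluate it recursively. This is a minimal sketch under the same independence assumption as the examples; `Series`, `Parallel` and `availability` are hypothetical names, and the leaf values are whatever SLAs apply to your chosen tiers:

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Series:
    """Every child must be up (e.g. Front Door in front of the app tier)."""
    parts: tuple[Node, ...]

@dataclass(frozen=True)
class Parallel:
    """At least one child must be up (e.g. the same tier deployed in two regions)."""
    parts: tuple[Node, ...]

Node = Union[float, Series, Parallel]

def availability(node: Node) -> float:
    if isinstance(node, Series):
        return math.prod(availability(p) for p in node.parts)
    if isinstance(node, Parallel):
        return 1 - math.prod(1 - availability(p) for p in node.parts)
    return node  # leaf: one component's SLA as a fraction

# Second example: App Service, Private Link and Redis in each region; Akamai and Front Door in front.
region = Series((0.9995, 0.9999, 0.999))
stack = Series((0.99999, 0.9999, Parallel((region, region))))
print(f"{availability(stack) * 100:.4f}%")
```

Here the tree reproduces the second example end to end (99.9887%). Changing a leaf value, or adding a third region to the `Parallel` group, updates the result without redoing each step by hand.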
Downtime calculation
- For a 24-hour period, the maximum allowed downtime (error budget) for an availability of 99.9887% is 9.76 seconds: $\frac{100 - 99.9887}{100} \times 24 \times 3600 \approx 9.76\ \text{s}$
- For a month, the maximum allowed downtime is ~5 minutes (about 293 seconds over a 30-day month).
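The error-budget arithmetic above as a small Python function. The 30-day month is an assumption; substitute 31 days, or an average month length, as your SLA terms require. `downtime_budget_seconds` is a hypothetical helper:

```python
SECONDS_PER_DAY = 24 * 3600

def downtime_budget_seconds(availability_percent: float, days: float) -> float:
    """Maximum downtime (error budget) over `days` for a given availability percentage."""
    return (100 - availability_percent) / 100 * days * SECONDS_PER_DAY

daily = downtime_budget_seconds(99.9887, 1)
monthly = downtime_budget_seconds(99.9887, 30)  # assumes a 30-day month
print(f"per day:   {daily:.2f} s")              # prints per day:   9.76 s
print(f"per month: {monthly:.0f} s (~{monthly / 60:.1f} min)")
```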


