12/02/2021 | Press release | Distributed by Public on 12/02/2021 03:32
What are SLOs? Here's a guide to service-level objectives, how they work, and how they help DevOps teams automate and deliver better software.
As organizations adopt microservices-based architecture, service-level objectives (SLOs) have become a vital way for teams to set specific, measurable targets that ensure users are receiving agreed-upon service levels. SLOs, together with service-level indicators (SLIs), deliver the performance promised in service-level agreements (SLAs) and other business-level objectives (BLOs) while staying within error budgets.
But what are SLOs? And why have SLOs and SLIs become so important as teams automate processes to consistently meet SLAs and error budgets? To get a better handle on this, let's start with some definitions.
SLOs are best understood as part of a framework for tracking service levels that also includes service-level agreements (SLAs), service-level indicators (SLIs), and error budgets.
SLAs, or service-level agreements, are contracts signed between a vendor and customer that guarantee a certain measurable level of service. They are often drawn up with specific financial consequences if the vendor fails to provide the guaranteed service. SLAs are usually composed of many individual SLOs to help formalize the details of what is being promised. For example, an SLA between a web host provider and customer can guarantee 99.95% uptime for all web services of a company over a year.
As defined by Gartner, service-level objectives are an agreed-upon target within an SLA that must be achieved for each activity, function, and process to provide the best opportunity for customer success. In layman's terms, SLOs represent the performance or health of a service. These can include business metrics, such as conversion rates, uptime, and availability; service metrics, such as application performance; or technical metrics, such as dependencies on third-party services, underlying CPU, and the cost of running a service. For example, if the SLA for a website is 99.95% uptime, its corresponding SLO could be 99.95% availability of the login services. Organizations commonly use SLOs in production environments to ensure released code stays within error budgets.
Error budgets are an allowance for a certain amount of failure or technical debt within an SLO. For example, if your SLO guarantees 99.5% availability of a website over a year, your error budget is 0.5%, or about 44 hours of downtime over that year. Error budgets allow development teams to make informed decisions between new development vs operations and polishing existing software. Properly set and defined SLOs should have error budgets that give developers space to innovate without impacting operations.
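The error-budget arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a Dynatrace API: the function names `error_budget` and `budget_remaining` are hypothetical, and the sketch assumes availability is measured as uptime over a 365-day window.

```python
def error_budget(slo_target_pct: float, window_days: float = 365.0) -> dict:
    """Return the error budget implied by an availability SLO.

    Assumption: availability is time-based, so the budget can be
    expressed as minutes of allowed downtime in the window.
    """
    if not 0.0 < slo_target_pct <= 100.0:
        raise ValueError("SLO target must be in the range (0, 100]")
    budget_pct = 100.0 - slo_target_pct
    window_minutes = window_days * 24 * 60
    return {
        "budget_pct": budget_pct,
        "budget_minutes": window_minutes * budget_pct / 100.0,
    }


def budget_remaining(slo_target_pct: float,
                     downtime_minutes: float,
                     window_days: float = 365.0) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    total = error_budget(slo_target_pct, window_days)["budget_minutes"]
    if total == 0:
        raise ValueError("A 100% SLO leaves no error budget to spend")
    return 1.0 - downtime_minutes / total


if __name__ == "__main__":
    budget = error_budget(99.5)
    print(f"Budget: {budget['budget_pct']:.2f}% "
          f"({budget['budget_minutes']:.0f} minutes per year)")
    print(f"Remaining after 1314 minutes down: "
          f"{budget_remaining(99.5, 1314.0):.0%}")
```

A 99.5% target over a year leaves 0.5% of the window, or 2,628 minutes, as budget. A team that has already used half of that may choose to slow releases until the budget recovers.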
SLIs provide the actual metrics and measurements that indicate whether you are meeting your SLO. Most SLIs are measured in percentages to express the service level delivered. For example, if your SLO is to deliver 99.5% availability, the actual measurement may be 99.8%, which means you're meeting your agreements and you have happy customers. To gain an understanding of long-term trends, you can visually represent SLIs in a histogram that shows actual performance in the overall context of your SLOs.
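As a rough illustration of how an SLI is compared with its SLO, the sketch below computes an availability SLI as the share of successful requests and checks it against a target. The names `availability_sli`, `check_slo`, and `SloCheck` are hypothetical, and counting good versus total requests is only one of several ways to define an availability SLI.

```python
from dataclasses import dataclass


@dataclass
class SloCheck:
    sli_pct: float      # measured service level, in percent
    target_pct: float   # SLO target, in percent
    met: bool           # True when the SLI is at or above the target


def availability_sli(good_events: int, total_events: int) -> float:
    """Percentage of events that were served successfully."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    return 100.0 * good_events / total_events


def check_slo(good_events: int, total_events: int,
              target_pct: float) -> SloCheck:
    """Compare the measured SLI against the SLO target."""
    sli = availability_sli(good_events, total_events)
    return SloCheck(sli_pct=sli, target_pct=target_pct, met=sli >= target_pct)


if __name__ == "__main__":
    # 9,980 successful requests out of 10,000 against a 99.5% target.
    result = check_slo(9_980, 10_000, target_pct=99.5)
    print(f"SLI {result.sli_pct:.1f}% vs SLO {result.target_pct}%: "
          f"{'met' if result.met else 'missed'}")
```

With 9,980 good requests out of 10,000, the SLI is 99.8%, which meets a 99.5% objective, matching the example above.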
To learn more about how Dynatrace does SLOs, check out the on-demand performance clinic, Getting started with SLOs in Dynatrace.
In short, service-level objectives ensure reliability. Generally, SLOs are important because they:
Cloud-native software and its supporting tools and infrastructure generate a diversity of metrics and data points every second that indicate a system's state and performance. Service-level objectives define or support a set of higher-level business goals, which you can measure by leveraging the data and insights from observability tools.
The goal of SLOs is to deliver more reliable, resilient, and responsive services that meet or exceed user expectations. Reliability and responsiveness are often measured in nines on the way to 100%. For example, an objective for system availability can be:
Each decimal point closer to 100 usually involves greater cost and complexity to achieve. Users may require a certain level of responsiveness, after which they can no longer detect a difference. Setting SLOs is part science and part art, striking a balance between statistical perfection and realistic goals.
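To make the cost of each additional nine concrete, the following sketch converts availability targets into allowed downtime per year. The helper names are hypothetical, and the sketch assumes a 365-day year and time-based availability.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float,
                             window_minutes: float = MINUTES_PER_YEAR) -> float:
    """Minutes of downtime permitted in the window at a given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return window_minutes * (100.0 - availability_pct) / 100.0


def downtime_table(targets=(99.0, 99.9, 99.99, 99.999)):
    """Return (target, allowed minutes per year) pairs for each target."""
    return [(t, allowed_downtime_minutes(t)) for t in targets]


if __name__ == "__main__":
    for target, minutes in downtime_table():
        print(f"{target}% availability allows about "
              f"{minutes:.1f} minutes of downtime per year")
```

Each extra nine cuts the allowance by a factor of ten: roughly 5,256 minutes per year at 99%, 526 at 99.9%, 53 at 99.99%, and 5 at 99.999%.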
You can set SLOs based on individual indicators, such as batch throughput, request latency, and failures-per-second. You can also create SLOs based on aggregate indicators, for example, the application performance index (Apdex), an industry standard that measures user satisfaction based on a variety of metrics.
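As an example of an aggregate indicator, the standard Apdex score classifies each response as satisfied (at or below a threshold T), tolerating (above T, up to 4T), or frustrated (above 4T), then computes (satisfied + tolerating / 2) / total. The sketch below assumes response times in seconds; the function name `apdex` and the 0.5-second threshold are illustrative choices, and individual monitoring tools may extend the calculation with other signals such as errors.

```python
from typing import Sequence


def apdex(response_times_s: Sequence[float], threshold_s: float) -> float:
    """Apdex score in [0, 1] for the given response times and threshold T."""
    if threshold_s <= 0:
        raise ValueError("threshold_s must be positive")
    if not response_times_s:
        raise ValueError("at least one response time is required")

    # Satisfied: response time at or below T.
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    # Tolerating: above T but no more than 4T. Anything slower is frustrated.
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    return (satisfied + tolerating / 2) / len(response_times_s)


if __name__ == "__main__":
    samples = [0.2, 0.4, 0.9, 1.5, 3.0]  # seconds, made-up sample data
    print(f"Apdex(T=0.5s) = {apdex(samples, threshold_s=0.5):.2f}")
```

With the made-up samples above, two requests are satisfied, two are tolerating, and one is frustrated, giving a score of (2 + 1) / 5 = 0.60. An SLO could then be stated as keeping that score above a chosen floor.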
Gathering and analyzing metrics over time will help you determine the overall effectiveness of your SLOs so you can tune them as your processes mature and improve. These trends also help you adjust business objectives and SLAs.
Service-level objectives define what good service means over a specific duration of time based on the measurements of SLIs. Here are some best practices to help you achieve the goals set out in your SLOs:
It's important to consider SLOs as an ongoing process and commitment to deliver optimal performance. IT workloads and end-user expectations are continually changing. An SLO designed for the workload requirements right now may not be equally valid for future performance requirements.
Keep SLOs simple, few, and realistic. Avoid absolute targets, such as 100%, that are unachievable. You may also set a stricter internal SLO that acts as a safety margin, so that the less strict SLO target agreed with end users is still met when the internal one is missed.
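One way to picture an internal safety margin is as a two-tier check: a stricter internal target that raises an early warning, and a less strict external target that counts as an actual breach. The sketch below is illustrative only; the function name `classify_sli` and the example targets are assumptions, not values from any product.

```python
def classify_sli(sli_pct: float,
                 internal_target_pct: float,
                 external_target_pct: float) -> str:
    """Classify a measured SLI against internal and external SLO targets.

    Returns "ok" when the stricter internal target is met, "warning" when
    only the external target is met, and "breach" when neither is met.
    """
    if internal_target_pct < external_target_pct:
        raise ValueError(
            "internal target should be at least as strict as the external one"
        )
    if sli_pct >= internal_target_pct:
        return "ok"
    if sli_pct >= external_target_pct:
        return "warning"
    return "breach"


if __name__ == "__main__":
    # Hypothetical targets: 99.9% internally, 99.5% agreed with end users.
    for measured in (99.95, 99.7, 99.2):
        status = classify_sli(measured, 99.9, 99.5)
        print(f"SLI {measured}% -> {status}")
```

The gap between the two targets is the buffer: a "warning" gives the team time to react before the externally agreed level is at risk.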
As more organizations adopt microservices, creating measurable SLOs is becoming more important to consistently deliver reliable, resilient, and responsive software that meets agreed-upon service levels. SLOs also help teams assess release risk and make decisions.
Microservices architecture means there are far more apps, tools, and cloud-based infrastructure components that influence an application's performance and availability. This makes developing effective SLOs more challenging. Dynatrace makes it easy to create and manage SLOs with out-of-the-box SLO templates and guidance for setting up SLOs with the right metrics, combined with automatic, AI-powered analytics and root-cause problem detection.
SLOs also set the stage for automating processes so you can speed up issue discovery and remediation before customers are impacted.
Ready for a deeper look at how to use SLOs for automation? Join us for the on-demand performance clinic, Automating SLOs as code - from Ops to Dev with Dynatrace.