Datadog Inc.

10/31/2024 | News release | Distributed by Public on 10/31/2024 11:26

Best practices for monitoring cloud costs with Datadog Scorecards

To ensure that your organization's cloud spend is efficient, you need granular visibility into what comprises your costs, what causes them to change, and how the cloud services and resources you use are enabling your business goals. Extending your visibility and monitoring your cloud costs more closely can position you to successfully adopt FinOps, a framework that can help you maximize the value you get from your cloud spend. As you evolve your cloud cost monitoring, you can mature your FinOps practice, empowering your teams to manage their costs more actively and identifying the trends that illustrate your organization's cost-management improvements.

Datadog Scorecards evaluate your organization's compliance against your observability, reliability, and documentation standards, and ensure visibility into each service's ownership data. By adding custom rules to your scorecards, you can easily track your FinOps evolution and share data to report that progress to stakeholders throughout the organization.

In this post, we'll describe several best practices that will help you initiate and mature your organization's FinOps practice. We'll show you how Scorecards and Workflow Automation can jump-start your organization's adoption and compliance by helping you collect cloud cost data and showcase the effectiveness of your optimizations as your FinOps practice matures.

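We'll look at how you can tag your resources for complete cost allocation, maximize your use of commitment-based discounts, and alert on unexpected cost changes.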

Tag your resources for complete cost allocation

A key capability for a successful FinOps practice is cost allocation: attributing the component costs in your cloud bill to the specific business units or activities (such as teams, departments, or applications) that incurred those costs. Some cloud costs are shared (such as Kubernetes clusters that host services used by different teams) and can be difficult to allocate. But developing your organization's cost allocation capability early in your FinOps journey is critical, because even partial allocation can reveal paths toward optimizing your spend. It's important to allocate costs as much as you can based on the data you have and then aim to increase your cost allocation as your FinOps practice matures.

Adding a team tag to your cloud infrastructure components is a productive step in increasing your cost allocation. This tag allows you to isolate each team's costs from the organization's overall cloud spend, enabling cost ownership, a key principle in the FinOps framework. You can use Scorecards to help you track your organization's use of the team tag, which can help you understand how well you're able to allocate costs. In the screenshot below, a new custom rule is being added to the Cloud Costs scorecard to track usage of the team tag.

If your organization is new to FinOps (in what the FinOps Maturity Model calls the "crawl" phase), you should aim to allocate at least 50 percent of your costs. As your FinOps practice matures, you should increase this target, aiming for at least 80 percent in the intermediate ("walk") phase of the maturity model.

You can use Datadog Workflow Automation to automatically evaluate your services' use of the team tag and determine your overall cost allocation capability. The Workflow blueprint titled "At least 80 percent of a service's costs are allocated using the team tag" calculates the percentage of each service's costs that are tagged appropriately. The workflow automatically updates the outcome for each service, assigning it a PASSING result if 80 percent or more of its costs are tagged, or a FAILING result if the percentage of costs is too low.

In the screenshot below, the Scorecards view shows that 57 percent of services have the team tag applied. Note that one of the services has an outcome of SKIPPED, which indicates that no cost data was available for that service. One of the services with a FAILING outcome is expanded to explain the outcome and show recommended remediation steps.
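The outcome logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the blueprint's actual implementation: the line-item shape (`cost`, `tags`), the function names, and the sample data are all hypothetical, and we assume that a service with no cost data maps to SKIPPED.

```python
from typing import Optional

def allocation_percent(line_items: list) -> Optional[float]:
    """Return the share of cost (0-100) carried by items with a non-empty team tag.

    Returns None when there is no cost data to evaluate.
    """
    total = sum(item["cost"] for item in line_items)
    if total <= 0:
        return None
    tagged = sum(
        item["cost"] for item in line_items if item.get("tags", {}).get("team")
    )
    return 100.0 * tagged / total

def outcome(line_items: list, threshold: float = 80.0) -> str:
    """Map a service's cost line items to a scorecard-style outcome."""
    pct = allocation_percent(line_items)
    if pct is None:
        return "SKIPPED"
    return "PASSING" if pct >= threshold else "FAILING"

# Hypothetical cost line items for one service.
checkout = [
    {"cost": 700.0, "tags": {"team": "payments"}},
    {"cost": 200.0, "tags": {"team": "payments"}},
    {"cost": 100.0, "tags": {}},  # untagged spend
]

print(allocation_percent(checkout), outcome(checkout))
```

Lowering `threshold` to 50.0 would model the "crawl" target instead of the "walk" target.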

Datadog provides resources to help you improve your tagging and increase your cost allocation. The cost visibility enabled through team tags complements the scoped resource visibility provided by Datadog Teams, which can help teams understand their costs in context. And you can use Datadog Tag Pipelines to help standardize the tags on your cloud resources. You can also use tools from your cloud provider to help enforce tagging, such as an AWS tag policy or Service Control Policy (SCP).
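As a rough illustration of the AWS tag policy option, the sketch below builds a policy document for the team tag and prints it as JSON. The overall shape (`tags`, `tag_key`, `tag_value`, `enforced_for`, and the `@@assign` operator) follows our understanding of the AWS Organizations tag policy syntax; the team names and the enforced resource type are placeholders, so check the AWS documentation before applying anything like this. Note also that, as far as we understand, a tag policy standardizes how a key is spelled and which values are allowed, while requiring the tag to be present is typically handled with an SCP.

```python
import json

def team_tag_policy(allowed_teams: list, enforced_resource_types: list) -> dict:
    """Build a tag policy document that standardizes the `team` tag.

    Both arguments are placeholders supplied by the caller.
    """
    return {
        "tags": {
            "team": {
                # Enforce this exact capitalization of the key.
                "tag_key": {"@@assign": "team"},
                # Restrict the tag to known team names.
                "tag_value": {"@@assign": allowed_teams},
                # Block noncompliant tagging on these resource types.
                "enforced_for": {"@@assign": enforced_resource_types},
            }
        }
    }

policy = team_tag_policy(["payments", "search"], ["ec2:instance"])
print(json.dumps(policy, indent=2))
```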

Some resources may remain untaggable, for example your cloud provider's monitoring, logging, and support services. Although you may not reach 100 percent cost allocation, each increase you make improves your position to track and optimize your costs.

Maximize your use of commitment-based discounts

While using less of the cloud services and resources you rely on is one path to cost optimization, paying less for what you use (by adopting commitment-based discounts) is another path. Discount programs such as Amazon EC2 Reserved Instances allow you to reduce the cost of a service by committing to and prepaying for an amount of usage of that service. They can provide substantial savings, but require that you accurately project your future utilization of the service: if you overcommit, you'll pay for service you don't end up using.
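To make the overcommitment risk concrete, here is a small worked example. The rates and hours are invented for illustration and do not reflect any real price list; the model simply assumes that committed hours are paid for whether or not they are used, and that usage beyond the commitment is billed at the on-demand rate.

```python
def monthly_cost_cents(used_hours: int, committed_hours: int,
                       on_demand_cents: int, committed_cents: int) -> int:
    """Total cost in cents for one month under a simple commitment model."""
    # The commitment is paid in full regardless of actual usage.
    commitment = committed_hours * committed_cents
    # Any usage above the commitment falls back to the on-demand rate.
    overflow = max(0, used_hours - committed_hours) * on_demand_cents
    return commitment + overflow

ON_DEMAND = 10  # hypothetical: 10 cents per hour
COMMITTED = 6   # hypothetical: 6 cents per hour with a commitment

on_demand_only = monthly_cost_cents(500, 0, ON_DEMAND, COMMITTED)
well_sized = monthly_cost_cents(500, 500, ON_DEMAND, COMMITTED)
overcommitted = monthly_cost_cents(500, 1000, ON_DEMAND, COMMITTED)

print(on_demand_only, well_sized, overcommitted)  # 5000 3000 6000
```

With the same 500 hours of usage, a well-sized commitment costs less than on-demand pricing, while committing to twice the needed hours costs more than having no commitment at all.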

You should take full advantage of these programs, but you need a deep understanding of your cloud costs to manage the risk of overpaying. The FinOps Maturity Model specifies a goal for organizations in the "crawl" phase to purchase enough discounts to cover just 60 percent of their eligible cloud spend. This can help new FinOps practitioners mitigate the risk of overcommitting until they have gained enough experience to safely purchase more discounts. As your FinOps practice matures into the "walk" phase, you'll have greater cost allocation and visibility, and you should aim to increase your discount purchases enough to cover 70 percent of your eligible spend.

The Workflow blueprint titled "Discount program coverage is at least 70 percent for a service" queries the proportion of discounted costs to overall costs. You can create a workflow based on this blueprint, and then add a custom rule to your Scorecards to surface the amount of discount coverage for each service, as well as your overall score rolled up from coverage data across all services.
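The coverage calculation itself is a simple ratio, sketched below. This is not the blueprint's query: the function names, the per-service figures, and the choice to return `None` when a service has no eligible spend are assumptions made for illustration. The 60 and 70 percent targets are the "crawl" and "walk" goals mentioned above.

```python
from typing import Optional

# Coverage targets by maturity phase, as described in this post.
TARGETS = {"crawl": 60.0, "walk": 70.0}

def discount_coverage(discounted_cost: float, eligible_cost: float) -> Optional[float]:
    """Percentage of eligible spend that is covered by discounts."""
    if eligible_cost <= 0:
        return None  # nothing eligible, so coverage is undefined
    return 100.0 * discounted_cost / eligible_cost

def meets_target(coverage: Optional[float], phase: str) -> bool:
    """True when coverage is known and at or above the phase's target."""
    return coverage is not None and coverage >= TARGETS[phase]

# Hypothetical (discounted, eligible) spend per service.
services = {
    "checkout": (650.0, 1000.0),
    "search": (300.0, 400.0),
    "batch": (0.0, 0.0),
}

for name, (discounted, eligible) in services.items():
    cov = discount_coverage(discounted, eligible)
    print(name, cov, meets_target(cov, "crawl"), meets_target(cov, "walk"))
```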

Alert on unexpected cost changes

The speed and flexibility of the cloud enable teams to innovate quickly and operate reliable, performant applications. But while the cloud's elastic nature helps you ensure your applications' performance (for example, by instantly triggering serverless functions and fluidly managing your autoscaling groups), it can also lead to unexpected increases in your cloud spending. To avoid the panic of receiving a monthly bill that's higher than you expected, you should rely on real-time alerting to identify rising cloud costs while you still have time to mitigate them.

Cost Monitors for Datadog Cloud Cost Management allow you to alert on unexpected changes in your rate of cloud spend. Cost monitors proactively notify your engineering and FinOps teams, enabling them to investigate the cause of an increase and mitigate its impact on your monthly bill by optimizing the affected resource. You can alert on cost anomalies, as illustrated in the screenshot below. You can also alert on costs that rise above a threshold, forecasts that predict future costs above a limit you specify, or cost changes in comparison to the previous day, week, or month.
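Two of the alert conditions described above, a fixed threshold and a change relative to the previous period, reduce to simple arithmetic. The sketch below illustrates that arithmetic only; it is not how Datadog cost monitors are implemented or configured, and the function names, parameters, and figures are hypothetical. Anomaly and forecast alerts rely on statistical models and are not covered here.

```python
from typing import Optional

def percent_change(current: float, previous: float) -> Optional[float]:
    """Percentage change from the previous period, or None if undefined."""
    if previous == 0:
        return None
    return 100.0 * (current - previous) / previous

def alert_reasons(current: float, previous: float,
                  max_cost: Optional[float] = None,
                  max_increase_pct: Optional[float] = None) -> list:
    """Return which of the configured conditions the current cost violates."""
    reasons = []
    if max_cost is not None and current > max_cost:
        reasons.append("threshold")
    change = percent_change(current, previous)
    if max_increase_pct is not None and change is not None and change > max_increase_pct:
        reasons.append("change")
    return reasons

# Hypothetical daily spend: 1,300 today versus 1,000 yesterday.
print(alert_reasons(1300.0, 1000.0, max_cost=1500.0, max_increase_pct=20.0))
```

In this example the spend is still under the 1,500 threshold, but the 30 percent day-over-day increase exceeds the 20 percent limit, so only the change condition fires.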

Scorecards can help you ensure that each of your services has at least one cost monitor to help you track its costs. Using a blueprint, you can easily build a workflow that automatically checks your services' cost monitors and displays the outcome on a scorecard. In the screenshot below, a scorecard shows the rule's outcome for each service and reports that only 25 percent of services have at least one associated cost monitor.
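The roll-up behind a rule like this can be sketched as follows. The mapping of services to monitor names is invented sample data (chosen so that the result matches the 25 percent figure above), and the function is an illustration rather than the blueprint's actual logic.

```python
def monitor_coverage(monitors_by_service: dict) -> tuple:
    """Return per-service outcomes and the percentage of services passing.

    A service passes when it has at least one associated cost monitor.
    """
    outcomes = {
        service: "PASSING" if monitors else "FAILING"
        for service, monitors in monitors_by_service.items()
    }
    passing = sum(1 for result in outcomes.values() if result == "PASSING")
    score = 100.0 * passing / len(outcomes) if outcomes else 0.0
    return outcomes, score

# Hypothetical inventory: only one of four services has a cost monitor.
inventory = {
    "checkout": ["checkout-daily-cost-change"],
    "search": [],
    "batch": [],
    "auth": [],
}

outcomes, score = monitor_coverage(inventory)
print(outcomes)
print(score)  # 25.0
```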

You can easily make your scorecards visible to stakeholders throughout your organization (for example, by sharing reports via Slack) to spotlight your teams' compliance with this and other best practices and cultivate another key FinOps capability: reporting and analytics.

Adopt best practices for continual cost optimization

Even if your FinOps practice is well established, the best practices we've highlighted in this post can help you achieve continuous optimization by improving your cloud cost monitoring and management. Scorecards give you visibility into your cost optimization status and let you track your progress and share updates across the organization to highlight your evolving FinOps capabilities. If you're not already using Datadog, you can start today with a free 14-day trial.