IBM - International Business Machines Corporation

08/24/2023 | Press release | Distributed by Public on 08/24/2023 12:39

Announcing automatic scaling for Ingress ALBs on IBM Cloud Kubernetes Service clusters

Calculating the load on your applications can be time-consuming and challenging, as traffic varies with regional business hours and major holidays. Preliminary testing of IBM Cloud Ingress ALBs showed a baseline capacity of approximately 20,000 connections per second. However, if your applications received traffic exceeding this limit, the number of ALB pods in the cluster had to be increased manually. This manual approach is neither convenient nor able to handle varying load, as it requires you to pick a static replica count, which might waste resources when the cluster is underutilized or cause request timeouts during peak usage.

On 28 July 2023, we introduced managed autoscaling for IBM Cloud Kubernetes Service Ingress ALBs. This feature allows you to dynamically and automatically adjust the number of ALB pods based on the actual load. Therefore, you can minimize your resource waste and preserve your ability to serve increased traffic demand.

What is horizontal pod autoscaling?

Horizontal pod autoscaling (HPA) is a Kubernetes concept. In simple terms, HPA horizontally scales an application in response to changes in load. In practice, pods are launched or terminated automatically.

The load can be determined by many factors. Most of the time, CPU utilization is a good measure, but it is possible to configure other factors, as well. When configured, Kubernetes tries to keep the average load of all pods within a specified range by adjusting the replica count of the deployment. This helps by scaling up the replica count in more demanding times and scaling back in less-active hours.
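Under the hood, an HPA is an ordinary Kubernetes resource that references a deployment and a target metric. As a minimal illustration (the deployment name `my-app` and the thresholds here are hypothetical, not the ALB configuration), an HPA that keeps average CPU utilization near 70% might look like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa            # hypothetical example, not the managed ALB HPA
spec:
  scaleTargetRef:             # the deployment whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # keep average CPU usage of all pods near 70% of requests
```

Kubernetes periodically compares the observed average against the target and adjusts `replicas` between the configured bounds.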

To take advantage of HPA on IBM Cloud Kubernetes Service Ingress ALB deployments, a set of CLI commands was introduced. In the next sections, we demonstrate how you can configure autoscaling on resource metrics like CPU usage, or even on custom metrics for more advanced use cases.

Enable ALB autoscaling based on CPU average utilization

As the ALB pods are predominantly CPU-heavy processes, using the CPU average utilization as the basis of autoscaling configuration is a good choice for most use cases.

To enable HPA based on average CPU utilization, you can use the following CLI command:

$ ibmcloud ks ingress alb autoscale set -c <cluster_name_or_id> --alb <alb_id> --max-replicas <max> --min-replicas <min> --cpu-average-utilization <target>

To determine the desired CPU average utilization, refer to our documentation for guidance. For high availability purposes, it is not recommended to set the minimum replica count below two. It is also not recommended to set the maximum replica count above the number of worker nodes in your cluster, as the excess ALB pods cannot be scheduled due to anti-affinity rules.

For example, if you would like to configure a maximum of 12 replicas with a target CPU average utilization of 600%, you can use the following command:

$ ibmcloud ks ingress alb autoscale set -c <cluster_name_or_id> --alb public-cr-alb1 --max-replicas 12 --min-replicas 2 --cpu-average-utilization 600

Keep in mind that it may take up to 10 minutes for the HPA resource to be deployed.

To verify that your HPA resource is configured as intended, you can use the kubectl get horizontalpodautoscaler command:

$ kubectl get horizontalpodautoscaler -n kube-system
NAME             REFERENCE                   TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
public-cr-alb1   Deployment/public-cr-alb1   11%/600%   2         12        2          131m

You can use the same command to determine whether the HPA is working as intended. The following output was captured on the same cluster at a later time, after utilization had increased and more requests were being served by the ALB. As you can see, the HPA automatically increased the number of ALB replicas as the load increased:

NAME             REFERENCE                   TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
public-cr-alb1   Deployment/public-cr-alb1   418%/600%   2         12        5          3h

Enable ALB autoscaling based on custom metrics

The ALBs expose various metrics, including request statistics and Nginx process metrics. These metrics can be captured and aggregated by a metric collector like Prometheus and made available to Kubernetes HPA by using Prometheus Adapter.
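As a sketch of how such a metric could be surfaced, a Prometheus Adapter rule along the following lines would derive a request-rate metric from the NGINX request counter. This is a hypothetical configuration: the series name, labels, and rate window all depend on how your Prometheus instance scrapes the ALB metrics, so adjust them to match your setup.

```yaml
# Hypothetical prometheus-adapter rules entry (assumed series and label names);
# adjust seriesQuery and the resource overrides to match your Prometheus data.
rules:
- seriesQuery: 'nginx_ingress_controller_requests{namespace!="",ingress!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}   # map the namespace label to the Namespace resource
      ingress: {resource: "ingress"}       # map the ingress label to the Ingress resource
  name:
    as: "nginx_ingress_controller_requests_rate"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

With a rule like this in place, the per-Ingress request rate becomes queryable through the Kubernetes custom metrics API, which is what the HPA consumes.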

Depending on your use case, you might want to scale your ALBs based on metrics other than CPU average utilization; for example, the number of incoming requests per second or the number of established connections. We provide a way for you to fine-tune your scaling with custom metrics.

The following example demonstrates how to set up autoscaling based on custom metrics; you can use it as a starting point for designing your own custom metrics-based setup. As prerequisites, we deployed an application, exposed it with an Ingress resource (named alb-autoscale-example-ingress), and set up monitoring by using Prometheus. We also configured Prometheus Adapter to expose the nginx_ingress_controller_requests_rate metric through the Kubernetes custom metrics API. If you would like to follow along, you can find the sample resources and instructions in the IBMCloud/kube-samples repository.

To verify whether the configured custom metrics are available through the Kubernetes custom metrics API, we can issue the following command:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/alb-autoscale-example/ingress/alb-autoscale-example-ingress/nginx_ingress_controller_requests_rate"
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta2",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Ingress",
        "namespace": "alb-autoscale-example",
        "name": "alb-autoscale-example-ingress",
        "apiVersion": "networking.k8s.io/v1"
      },
      "metric": {
        "name": "nginx_ingress_controller_requests_rate",
        "selector": null
      },
      "timestamp": "2023-08-08T09:11:02Z",
      "value": "0"
    }
  ]
}

To enable ALB autoscaling based on custom metrics, we can use the same command as before, but this time we define the --custom-metrics-file flag instead of --cpu-average-utilization. The custom metrics file must contain a MetricSpec array, which is injected directly into the HPA configuration.

Our custom metrics file is named custom-metrics.yaml and has the following content:

- type: Object
  object:
    metric:
      name: nginx_ingress_controller_requests_rate
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: alb-autoscale-example-ingress
      namespace: alb-autoscale-example
    target:
      type: Value
      value: 2k
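When applied, the contents of this file are injected into the HPA's spec.metrics field. Conceptually, the resulting HPA resource would look like the following sketch (this is an illustration assembled from the values in this example, not actual cluster output; the replica bounds match the command in the next step):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: public-cr-alb1         # managed by IBM Cloud; shown only as a sketch
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: public-cr-alb1
  minReplicas: 1
  maxReplicas: 12
  metrics:                     # the MetricSpec array from custom-metrics.yaml
  - type: Object
    object:
      metric:
        name: nginx_ingress_controller_requests_rate
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: alb-autoscale-example-ingress
        namespace: alb-autoscale-example
      target:
        type: Value
        value: 2k              # scale up when the request rate exceeds 2000/s
```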

Let's apply our custom metrics configuration for autoscaling:

$ ibmcloud ks ingress alb autoscale set -c <cluster_name_or_id> --alb <alb_id> --min-replicas 1 --max-replicas 12 --custom-metrics-file custom-metrics.yaml
Setting autoscaling configuration for ...
OK

Now we can use the kubectl get horizontalpodautoscaler command again to see whether autoscaling works as expected:

$ kubectl get horizontalpodautoscaler -n kube-system public-cr-alb1
NAME             REFERENCE                   TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
public-cr-alb1   Deployment/public-cr-alb1   0/2k      1         12        2          2h

Because there are no requests and we configured the minimum replica count as one, the HPA scales the ALB down. As the request-per-second rate increases, the HPA scales the ALB back up automatically:

$ kubectl get horizontalpodautoscaler -n kube-system public-cr-alb1
NAME             REFERENCE                   TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
public-cr-alb1   Deployment/public-cr-alb1   399700m/2k   1         12        3          2h

More information

To learn more about autoscaling in general, you can check out the Horizontal Pod Autoscaling page in the official Kubernetes documentation.

Check out "Dynamically scaling ALBs with autoscaler" in our official documentation.

Contact us

If you have questions, engage our team via Slack by registering here and join the discussion in the #general channel on our public IBM Cloud Kubernetes Service Slack.

IBM Cloud Kubernetes Service Engineer
Software Engineer, IBM Cloud Kubernetes Service