This article is part of a series on KubeCon Europe 2018.

  1. Autoscaling for Kubernetes workloads
    1. The old and new autoscalers
    2. Upcoming features
    3. Conclusion

Technologies like containers, clusters, and Kubernetes offer the prospect of rapidly scaling the available computing resources to match variable demands placed on the system. Actually implementing that scaling can be a challenge, though. During KubeCon + CloudNativeCon Europe 2018, Frederic Branczyk from CoreOS (now part of Red Hat) held a packed session to introduce a standard and officially recommended way to scale workloads automatically in Kubernetes clusters.

Kubernetes has had an autoscaler since the early days, but only recently did the community implement a more flexible and extensible mechanism to make decisions on when to add more resources to fulfill workload requirements. The new API integrates not only the Prometheus project, which is popular in Kubernetes deployments, but also any arbitrary monitoring system that implements the standardized APIs.

The old and new autoscalers

[Photo: Frederic Branczyk]

Branczyk first covered the history of the autoscaler architecture and how it has evolved through time. Kubernetes, since version 1.2, features a horizontal pod autoscaler (HPA), which dynamically allocates resources depending on the detected workload. When the load becomes too high, the HPA increases the number of pod replicas and, when the load goes down again, it removes superfluous copies. In the old HPA, a component called Heapster would pull usage metrics from the internal cAdvisor monitoring daemon and the HPA controller would then scale workloads up or down based on those metrics.
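At its core, the scaling decision is a ratio. The Kubernetes HPA documentation describes it as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The Go sketch below applies that formula and clamps the result to the configured minimum and maximum. It is a simplified illustration, not the controller's code: the helper name is hypothetical, and it ignores the tolerance band and stabilization behavior of the real HPA.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a hypothetical helper that scales the current replica
// count by how far the observed metric is from its target, rounds up, and
// clamps the result to [minReplicas, maxReplicas]. It ignores the real
// controller's tolerance and stabilization logic.
func desiredReplicas(current int, currentMetric, targetMetric float64, minReplicas, maxReplicas int) int {
	if current <= 0 || targetMetric <= 0 {
		// Nothing sensible to compute; fall back to the floor.
		return minReplicas
	}
	desired := int(math.Ceil(float64(current) * currentMetric / targetMetric))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// Four pods averaging 90% CPU against an 80% target: ceil(4*90/80) = 5.
	fmt.Println(desiredReplicas(4, 90, 80, 1, 10))
	// Load drops to 20%: ceil(4*20/80) = 1.
	fmt.Println(desiredReplicas(4, 20, 80, 1, 10))
}
```

Rounding up errs on the side of capacity: a fractional need of 4.5 pods becomes five.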

Unfortunately, the controller would only make decisions based on CPU utilization, even though Heapster provides other metrics like disk, memory, or network usage. According to Branczyk, while in theory any workload can be converted to a CPU-bound problem, this is an inconvenient limitation, especially when implementing higher-level service level agreements. For example, an arbitrary agreement like "process 95% of requests within 100 milliseconds" would be difficult to represent as a CPU-usage problem. Another limitation is that the Heapster API was only loosely defined and never officially adopted as part of the larger Kubernetes API. Heapster also required the help of a storage backend like InfluxDB or Google's Stackdriver to store samples, which made deploying an HPA challenging.

[Figure: Architecture diagram]

In late 2016, the "autoscaling special interest group" (SIG autoscaling) decided that the pipeline needed a redesign that would allow scaling based on arbitrary metrics from external monitoring systems. The result is that Kubernetes 1.6 shipped with a new API specification defining how the autoscaler integrates with those systems. Having learned from the Heapster experience, the developers specified the new API, but did not implement it for any specific system. This shifts the responsibility for maintenance to the monitoring vendors: instead of "dumping" their glue code in Heapster, vendors now have to maintain their own adapter conforming to a well-defined API to get certified.

The new specification defines core metrics like CPU, memory, and disk usage. Kubernetes provides a canonical implementation of those metrics through the metrics server, a stripped-down version of Heapster. The metrics server provides the core metrics required by Kubernetes so that scheduling, autoscaling, and things like kubectl top work out of the box. This means that any Kubernetes 1.8 cluster now supports autoscaling on those metrics without extra setup: for example, minikube and Google Kubernetes Engine both offer a native metrics server without an external database or monitoring system.

In terms of configuration syntax, the change is minimal. Here is an example of how to configure the autoscaler in earlier Kubernetes releases, taken from the OpenShift Container Platform documentation:

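    apiVersion: extensions/v1beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: frontend
    spec:
      scaleRef:
        kind: DeploymentConfig
        name: frontend
        apiVersion: v1
        subresource: scale
      minReplicas: 1
      maxReplicas: 10
      cpuUtilization:
        # average CPU usage, as a percentage of the CPU requested by the pods
        targetPercentage: 80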

The new API configuration is more flexible:

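    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: hpa-resource-metrics-cpu
    spec:
      scaleTargetRef:
        # ReplicationController lives in the core "v1" API group
        apiVersion: v1
        kind: ReplicationController
        name: hello-hpa-cpu
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          # target 50% of the CPU requested by the pods
          targetAverageUtilization: 50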

Notice how the cpuUtilization field is replaced by a more flexible metrics field that targets CPU utilization, but can support other core metrics like memory usage.
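A utilization target like the 50 above is a percentage of the CPU the pods request, not of the node's capacity. The Go sketch below shows that arithmetic in simplified form. It assumes usage and requests are summed across all pods before dividing, which is our understanding of how the upstream replica calculator does it; the type and function names and the sample numbers are hypothetical.

```go
package main

import "fmt"

// podCPU holds one pod's observed CPU usage and its CPU request, both in
// millicores.
type podCPU struct {
	usageMilli   int64
	requestMilli int64
}

// averageUtilization returns total usage as an integer percentage of total
// requests. The boolean is false when no pod declares a CPU request, since
// utilization is undefined in that case.
func averageUtilization(pods []podCPU) (int64, bool) {
	var usage, requests int64
	for _, p := range pods {
		usage += p.usageMilli
		requests += p.requestMilli
	}
	if requests == 0 {
		return 0, false
	}
	return usage * 100 / requests, true
}

func main() {
	pods := []podCPU{{usageMilli: 300, requestMilli: 500}, {usageMilli: 400, requestMilli: 500}}
	util, ok := averageUtilization(pods)
	// 700m used out of 1000m requested: 70%, above a 50% target.
	fmt.Println(util, ok, util > 50)
}
```

This is also why pods without CPU requests cannot be autoscaled on utilization: there is nothing to divide by.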

The ultimate goal of the new API, however, is to support arbitrary metrics, through the custom metrics API. This behaves like the core metrics, except that Kubernetes does not ship or define a set of custom metrics directly, which is where systems like Prometheus come in. Branczyk demonstrated the k8s-prometheus-adapter, which connects any Prometheus metric to the Kubernetes HPA, allowing the autoscaler to add new pods to reduce request latency, for example. Those metrics are bound to Kubernetes objects (e.g. pods or nodes), but an "external metrics API" was also introduced in the last two months to allow arbitrary metrics, not tied to any object, to influence autoscaling. This could allow Kubernetes to scale up a workload to deal with a larger load on an external message broker service, for example.
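The difference between the two APIs comes down to whether a metric can be attached to an object. The Go sketch below illustrates the idea under stated assumptions: it treats a series as belonging to a pod when it carries namespace and pod labels, a common convention in Kubernetes scrape configurations, and sums duplicate series per pod. The label names, the sample type, and the summing rule are hypothetical simplifications, not the adapter's actual (configurable) rules. Anything left over, such as a broker's queue depth, is the kind of data the external metrics API exists for.

```go
package main

import "fmt"

// sample is a hypothetical, simplified Prometheus sample: a label set and a
// value.
type sample struct {
	labels map[string]string
	value  float64
}

// bindToPods splits samples into per-pod values (keyed "namespace/pod") and
// samples that cannot be tied to any pod.
func bindToPods(samples []sample) (perPod map[string]float64, unbound []sample) {
	perPod = make(map[string]float64)
	for _, s := range samples {
		ns, okNS := s.labels["namespace"]
		pod, okPod := s.labels["pod"]
		if okNS && okPod {
			perPod[ns+"/"+pod] += s.value
			continue
		}
		unbound = append(unbound, s)
	}
	return perPod, unbound
}

func main() {
	samples := []sample{
		{labels: map[string]string{"namespace": "shop", "pod": "web-1"}, value: 180},
		{labels: map[string]string{"namespace": "shop", "pod": "web-2"}, value: 220},
		// A message-broker metric with no pod to attach it to.
		{labels: map[string]string{"queue": "orders"}, value: 1200},
	}
	perPod, unbound := bindToPods(samples)
	fmt.Println(perPod["shop/web-1"], perPod["shop/web-2"], len(unbound))
}
```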

Here is an example of the custom metrics API pulling metrics from Prometheus to make sure that each pod handles around 200 requests per second:

      metrics:
      - type: Pods
        pods:
          metricName: http_requests
          # aim for an average of about 200 requests per second per pod
          targetAverageValue: 200

Here http_requests is a metric exposed by the Prometheus server that tracks how many requests each pod is processing. To avoid putting too much load on each pod, the HPA then keeps this number close to the target value by spawning or killing pods as appropriate.

Upcoming features

The SIG seems to have wrapped everything up quite neatly. The next step is to deprecate Heapster: as of 1.10, all critical parts of Kubernetes use the new API, so a discussion is under way in another group (SIG instrumentation) to finish moving away from the older design.

Another thing the community is looking into is vertical scaling. Horizontal scaling is fine for certain workloads, like caching servers or application frontends, but database servers, most notably, are harder to scale by just adding more replicas; in this case what an autoscaler should do is increase the size of the replicas instead of their number. Kubernetes supports this through the vertical pod autoscaler (VPA). It is less practical than the HPA because there is a physical limit to the size of individual servers that the autoscaler cannot exceed, while the HPA can scale up as long as you add new servers. According to Branczyk, the VPA is also more "complicated and fragile, so a lot more thought needs to go into that." As a result, the VPA is currently in alpha. It is not fully compatible with the HPA and is relevant only in cases where the HPA cannot do the job: for example, workloads where there is only a single pod or a fixed number of pods like StatefulSets.
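The talk did not detail how the VPA picks a size, but the physical limit mentioned above is easy to show. As a rough illustration only, the hypothetical Go function below recommends a CPU request from a usage history: a high percentile of observed usage plus a safety margin, capped at the capacity of the largest node. The 90th percentile, the 15% margin, and the sample data are assumptions; the real VPA recommender is considerably more elaborate.

```go
package main

import (
	"fmt"
	"sort"
)

// recommendRequest returns a CPU request in millicores: the given percentile
// of observed usage, plus marginPct percent, capped at nodeCapacityMilli
// because a pod cannot grow beyond the machine it runs on.
func recommendRequest(usageMilli []int64, percentile int, marginPct int64, nodeCapacityMilli int64) int64 {
	if len(usageMilli) == 0 {
		return 0
	}
	sorted := append([]int64(nil), usageMilli...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// Nearest-rank percentile in integer arithmetic: ceil(p*n/100).
	rank := (percentile*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	rec := sorted[rank-1] * (100 + marginPct) / 100
	if rec > nodeCapacityMilli {
		rec = nodeCapacityMilli
	}
	return rec
}

func main() {
	usage := []int64{200, 250, 300, 320, 400, 450, 500, 600, 700, 800}
	// p90 of 10 samples is the 9th value (700m); +15% gives 805m.
	fmt.Println(recommendRequest(usage, 90, 15, 4000))
	// On 500m nodes, the same workload is capped at 500m.
	fmt.Println(recommendRequest(usage, 90, 15, 500))
}
```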

Branczyk gave a set of predictions for other improvements that could come down the pipeline. One issue he identified is that, while the HPA and VPA can scale pods, there is a separate Cluster Autoscaler (CA) that manages nodes, which are the actual machines running the pods. The CA allows a cluster to move pods between nodes to remove underutilized nodes, or to create new nodes to respond to demand. It's similar to the HPA, except that the HPA cannot provision new hardware resources like physical machines on its own: it only creates new pods on existing nodes. The idea here is to combine the two projects into a single one to keep a uniform interface for what is really the same functionality: scaling a workload by giving it more resources.
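The scale-down half of the CA's job can be sketched as a utilization check over nodes. In the hypothetical Go example below, a node whose pods request less than half of its allocatable CPU is reported as a removal candidate. The 0.5 threshold mirrors what we understand to be the cluster autoscaler's default scale-down utilization threshold, and the node data is invented. A real CA must also confirm that the evicted pods fit elsewhere before draining a node, which this sketch does not attempt.

```go
package main

import "fmt"

// node summarizes one machine: CPU requested by its pods versus CPU it can
// allocate, in millicores.
type node struct {
	name             string
	requestedMilli   int64
	allocatableMilli int64
}

// underutilized returns the names of nodes whose requested CPU is below the
// given fraction of allocatable CPU.
func underutilized(nodes []node, threshold float64) []string {
	var out []string
	for _, n := range nodes {
		if n.allocatableMilli <= 0 {
			continue // skip nodes reporting no capacity
		}
		if float64(n.requestedMilli)/float64(n.allocatableMilli) < threshold {
			out = append(out, n.name)
		}
	}
	return out
}

func main() {
	nodes := []node{
		{"node-a", 3500, 4000}, // 87.5% requested
		{"node-b", 800, 4000},  // 20% requested: candidate for removal
		{"node-c", 2000, 4000}, // exactly 50%: kept
	}
	fmt.Println(underutilized(nodes, 0.5))
}
```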

Another hope is that OpenMetrics will emerge as a standard for metrics across vendors. This process seems to be well under way with Kubernetes already using the Prometheus library, which serves as a basis for the standard, and with commercial vendors like Datadog supporting the Prometheus API as well. Another area of possible standardization is the gRPC protocol used in some Kubernetes clusters to communicate between microservices. Those endpoints can now expose metrics through "interceptors" that get executed before the request is passed to the application. One of those interceptors is the go-grpc-prometheus adapter, which enables Prometheus to scrape metrics from any gRPC-enabled service. The ultimate goal is to have standard metrics deployed across an entire cluster, allowing the creation of reusable dashboards, alerts, and autoscaling mechanisms in a uniform system.

Conclusion

This session was one of the most popular of the conference, which shows a deep interest in this key feature of Kubernetes deployments. It was great to see Branczyk, who is involved with the Prometheus project as well, work on standardization so other systems can work with Kubernetes.

The speed at which APIs change is impressive; in only a few months, the community upended a fundamental component of Kubernetes and replaced it with a new API that users will need to become familiar with. Given the flexibility and clarity of the new API, it is a small cost to pay to represent business logic inside such a complex system. Any simplification will surely be welcome in the maelstrom of APIs and subsystems that Kubernetes has become.

A video of the talk and slides [PDF] are available. SIG autoscaling members Marcin Wielgus and Solly Ross presented introduction (video) and deep-dive (video) talks that might be interesting to our readers who want all the gory details about Kubernetes autoscaling.

This article first appeared in the Linux Weekly News.
