

Learn to scale apps up or down.

1 - Scaling overview

Learn about how Kf scales apps.

Kf leverages the Kubernetes Horizontal Pod Autoscaler (HPA) to automatically scale the number of Pods in an App. When autoscaling is enabled for an App, an HPA object is created and bound to the App object. The HPA then dynamically calculates the target scale and sets it for the App.

Kf Apps are also compatible with HPA policies created outside of Kf.

How Kf scaling works

The number of Pods that are deployed for a Kf App is controlled by its underlying Deployment object’s replicas field. The target number of Deployment replicas is set through the App’s replicas field.

Scaling can be done manually with the kf scale command. This command is disabled when autoscaling is enabled to avoid conflicting targets.
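As a minimal sketch, manual scaling can be wrapped in a small shell helper. The --instances flag, the helper name scale_app, and the app name my-app are assumptions here rather than something this page documents; check kf scale --help for the flags your Kf version supports.

```shell
#!/usr/bin/env bash
# scale_app APP_NAME INSTANCES
# Hypothetical helper: sets a fixed instance count for an App.
# Assumes `kf scale` accepts an --instances flag; verify with `kf scale --help`.
scale_app() {
  local app="$1" instances="$2"
  kf scale "$app" --instances "$instances"
}

# Example (requires a targeted Kf space):
#   scale_app my-app 3
```

Because kf scale is disabled while autoscaling is enabled, a helper like this is only useful for Apps that are scaled manually.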

How the Kubernetes Horizontal Pod Autoscaler works

The Horizontal Pod Autoscaler (HPA) is implemented as a Kubernetes API resource (the HPA object) and a control loop (the HPA controller) which periodically calculates the number of desired replicas based on current resource utilization. The HPA controller then passes the number to the target object that implements the Scale subresource. The actual scaling is delegated to the underlying object and its controller. You can find more information in the Kubernetes documentation.

How the Autoscaler determines when to scale

Periodically, the HPA controller queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. The controller obtains the metrics from the resource metrics API for each Pod. Then the controller calculates the utilization value as a percentage of the equivalent resource request. The desired number of replicas is then calculated from the ratio of the current percentage to the desired percentage. You can read more about the autoscaling algorithm in the Kubernetes documentation.
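The ratio described above corresponds to the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). The sketch below works through it with integer percentages. The function name and the sample numbers are illustrative, and the sketch ignores the controller's tolerance band and the configured minimum and maximum instance limits.

```shell
#!/usr/bin/env bash
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL_PERCENT TARGET_UTIL_PERCENT
# Illustrative only: prints ceil(current * util / target) using integer math.
desired_replicas() {
  local current="$1" util="$2" target="$3"
  # Adding (target - 1) before dividing rounds the quotient up.
  echo $(( (current * util + target - 1) / target ))
}

desired_replicas 4 90 60   # prints 6: usage is 1.5x the target, so scale up
desired_replicas 4 30 60   # prints 2: usage is half the target, so scale down
```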


Kf’s built-in autoscaling uses HPA v1, which supports only CPU as the target metric.

How the Kubernetes Horizontal Pod Autoscaler works with Kf

When autoscaling is enabled for a Kf App, the Kf controller will create an HPA object based on the scaling limits and rules specified on the App. Then the HPA controller fetches the specs from the HPA object and scales the App accordingly.

The HPA object is deleted if autoscaling is disabled or if the corresponding App is deleted.

2 - Manage Autoscaling

Learn to use autoscaling for your app.

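Kf supports two primary autoscaling modes: built-in autoscaling, configured with kf commands, and advanced autoscaling, configured through Kubernetes Horizontal Pod Autoscalers.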

Built-in autoscaling

Kf Apps can be automatically scaled based on CPU usage. You can configure autoscaling limits for your Apps and the target CPU usage for each App instance. Kf automatically scales your Apps up and down in response to demand.

By default, autoscaling is disabled. Follow the steps below to enable autoscaling.

View Apps

You can view the autoscaling status for an App using the kf apps command. If autoscaling is enabled for an App, the Instances column includes the autoscaling status.

$ kf apps

Name   Instances              Memory  Disk  CPU
app1   4 (autoscaled 4 to 5)  256Mi   1Gi   100m
app2   1                      256Mi   1Gi   100m

Autoscaling is enabled for app1 with min-instances set to 4 and max-instances set to 5. Autoscaling is disabled for app2.

Update autoscaling limits

You can update the instance limits using the kf update-autoscaling-limits command.

kf update-autoscaling-limits app-name min-instances max-instances

Create autoscaling rule

You can create autoscaling rules using the kf create-autoscaling-rule command.

kf create-autoscaling-rule app-name CPU min-threshold max-threshold

Delete autoscaling rules

You can delete all autoscaling rules with the kf delete-autoscaling-rules command. Kf supports only one autoscaling rule.

kf delete-autoscaling-rules app-name

Enable and disable autoscaling

Enable autoscaling with kf enable-autoscaling and disable it with kf disable-autoscaling. Disabling autoscaling preserves its configuration, including limits and rules.

kf enable-autoscaling app-name

kf disable-autoscaling app-name
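The commands in this section can be chained into one setup step. The following is a sketch, not an official workflow: the helper name configure_autoscaling, the app name my-app, and the numeric values are all hypothetical, and the threshold values are passed through unchanged to kf create-autoscaling-rule.

```shell
#!/usr/bin/env bash
# configure_autoscaling APP MIN_INSTANCES MAX_INSTANCES MIN_THRESHOLD MAX_THRESHOLD
# Hypothetical helper that runs the commands from this page in order and
# stops at the first one that fails.
configure_autoscaling() {
  local app="$1" min="$2" max="$3" low="$4" high="$5"
  kf update-autoscaling-limits "$app" "$min" "$max" &&      # instance limits
    kf create-autoscaling-rule "$app" CPU "$low" "$high" && # the single CPU rule
    kf enable-autoscaling "$app" &&                         # turn autoscaling on
    kf apps                                                 # check the Instances column
}

# Example with illustrative values (requires a targeted Kf space):
#   configure_autoscaling my-app 2 10 30 80
```

Because Kf supports only one autoscaling rule, an App that already has a rule may need kf delete-autoscaling-rules run first.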

Advanced autoscaling

Kf Apps support the Kubernetes Horizontal Pod Autoscaler interface and will therefore work with HPAs created using kubectl.

Kubernetes HPA policies are less restrictive than Kf’s built-in support for autoscaling.

They include support for:

  • Scaling on memory, CPU, or disk usage.
  • Scaling based on custom metrics, such as traffic load or queue length.
  • Scaling on multiple metrics.
  • The ability to tune reactivity to smooth out rapid scaling.

Using custom HPAs with apps

You can follow the Kubernetes HPA walkthrough to learn how to set up autoscalers.

When you create the HPA, make sure to set the scaleTargetRef to be your application:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-scaler
  namespace: SPACE_NAME
spec:
  scaleTargetRef:
    apiVersion: kf.dev/v1alpha1
    kind: App
    name: APP_NAME
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60


  • Don’t use Kf’s built-in autoscaling together with an HPA.
  • When you use an HPA, kf apps shows the current number of instances but doesn’t show that the App is being autoscaled.
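One way to respect the first caveat is to turn off built-in autoscaling before applying a custom HPA. The sketch below assumes that kf already targets the Space, that the Space corresponds to a Kubernetes namespace of the same name (as the SPACE_NAME placeholder in the manifest suggests), and that the manifest is saved in a file such as hpa.yaml. The helper name and all argument values are hypothetical.

```shell
#!/usr/bin/env bash
# use_custom_hpa APP_NAME SPACE_NAME MANIFEST_FILE
# Hypothetical helper: disables Kf's built-in autoscaling for the App,
# applies a user-written HPA manifest, then lists HPAs in the namespace.
use_custom_hpa() {
  local app="$1" space="$2" manifest="$3"
  kf disable-autoscaling "$app" &&
    kubectl apply -f "$manifest" &&
    kubectl get hpa --namespace "$space"
}

# Example (requires kf and kubectl configured for the same cluster):
#   use_custom_hpa my-app my-space hpa.yaml
```

If built-in autoscaling was never enabled for the App, the first command can be skipped.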

3 - Managing Resources for Apps

Learn to set resources on apps.

When you create an app, you can optionally specify how much of each resource an instance of the application will receive when it runs.

Kf simplifies the Kubernetes model of resources and provides defaults that should work for most I/O-bound applications out of the box.

Resource types

Kf supports three types of resources: memory, CPU, and ephemeral disk.

  • Memory specifies the amount of RAM an application receives when running. If the application exceeds this amount, the container is restarted.
  • Ephemeral disk specifies how much an application can write to a local disk. If an application exceeds this amount then it may not be able to write more.
  • CPU specifies the number of CPUs an application receives when running.


Resources are specified using four values in the manifest:

  • memory sets the guaranteed minimum an app will receive and the maximum it’s permitted to use.
  • disk_quota sets the guaranteed minimum an app will receive and the maximum it’s permitted to use.
  • cpu sets the guaranteed minimum an app will receive.
  • cpu-limit sets the maximum CPU an app can use.


- name: "example"
  disk_quota: 512M
  memory: 512M
  cpu: 200m
  cpu-limit: 2000m


Memory and ephemeral storage are both set to 1Gi if not specified.

CPU defaults to one of the following:

  • 1/10th of a CPU if the platform operator hasn’t overridden it.
  • A CPU value proportionally scaled by the amount of memory requested.
  • A minimum CPU value set by the platform operator.

Resource units

Memory and disk

Cloud Foundry uses the units T, G, M, and K to represent powers of two. Kubernetes uses the units Ti, Gi, Mi, and Ki for the same quantities, and also supports the larger units Pi and Ei.

Kf allows you to specify memory and disk using either set of units.
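To make the correspondence concrete, the sketch below rewrites a Cloud Foundry style quantity as its Kubernetes equivalent by appending the binary suffix. The function name is hypothetical, and the sketch handles only the uppercase single-letter units named above.

```shell
#!/usr/bin/env bash
# to_k8s_quantity QUANTITY
# Illustrative only: maps Cloud Foundry units (K, M, G, T) to the
# Kubernetes power-of-two units (Ki, Mi, Gi, Ti).
to_k8s_quantity() {
  local value="$1"
  case "$value" in
    *[KMGT]) echo "${value}i" ;;  # Cloud Foundry unit: append "i"
    *)       echo "$value" ;;     # already Kubernetes style, or no unit
  esac
}

to_k8s_quantity 512M   # prints 512Mi
to_k8s_quantity 1Gi    # prints 1Gi
```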


CPU

Kf and Kubernetes use the unit m for CPU, representing milli-CPU cores (thousandths of a core).
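As a quick worked example, the values from the sample manifest (cpu: 200m and cpu-limit: 2000m) convert to whole cores as follows. The function name is hypothetical, and the sketch expects an integer followed by m.

```shell
#!/usr/bin/env bash
# millicores_to_cores QUANTITY
# Illustrative only: converts a milli-CPU quantity such as 200m to cores.
millicores_to_cores() {
  local milli="${1%m}"   # strip the trailing "m"
  printf '%d.%03d\n' $(( milli / 1000 )) $(( milli % 1000 ))
}

millicores_to_cores 200m    # prints 0.200 (one fifth of a core)
millicores_to_cores 2000m   # prints 2.000 (two full cores)
```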

Sidecar overhead

When Kf schedules your app’s container as a Kubernetes Pod, it may bundle additional containers with your app to provide additional functionality. Your application will likely also have an Istio sidecar, which is responsible for networking.

These containers specify their own resource requests and limits, which add overhead to running your application.

Best practices

  • All applications should set memory and disk quotas.
  • CPU-intensive applications should set a CPU request and limit to guarantee they’ll have the resources they need without starving other apps.
  • I/O-bound applications shouldn’t set a CPU limit so they can burst during startup.

Additional reading