Scaling
1 - Scaling overview
Kf leverages the Kubernetes Horizontal Pod Autoscaler (HPA) to automatically scale the number of Pods in an App. When autoscaling is enabled for an App, an HPA object is created and bound to the App object. The HPA then dynamically calculates the target scale and sets it for the App.
Kf Apps are also compatible with HPA policies created outside of Kf.
How Kf scaling works
The number of Pods deployed for a Kf App is controlled by its underlying Deployment object's replicas field. The target number of Deployment replicas is set through the App's replicas field.
Scaling can be done manually with the kf scale command. This command is disabled when autoscaling is enabled to avoid conflicting targets.
How the Kubernetes Horizontal Pod Autoscaler works
The Horizontal Pod Autoscaler (HPA) is implemented as a Kubernetes API resource (the HPA object) and a control loop (the HPA controller) which periodically calculates the number of desired replicas based on current resource utilization. The HPA controller then passes the number to the target object that implements the Scale subresource. The actual scaling is delegated to the underlying object and its controller. You can find more information in the Kubernetes documentation.
How the Autoscaler determines when to scale
Periodically, the HPA controller queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. The controller obtains the metrics from the resource metrics API for each Pod. Then the controller calculates the utilization value as a percentage of the equivalent resource request. The desired number of replicas is then calculated based on the ratio of current percentage and desired percentage. You can read more about the autoscaling algorithm in the Kubernetes documentation.
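The calculation described above amounts to desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). A minimal shell sketch with illustrative numbers (not output from a real cluster):

```shell
# desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
# Illustrative values: 4 replicas averaging 90% CPU, with a 60% target.
current_replicas=4
current_cpu=90
target_cpu=60
# Integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"
```

With these numbers the autoscaler would scale the App up to 6 replicas, bringing the average utilization back toward the 60% target.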
Metrics
Kf uses HPA v1, which only supports CPU as the target metric.
How the Kubernetes Horizontal Autoscaler works with Kf
When autoscaling is enabled for a Kf App, the Kf controller will create an HPA object based on the scaling limits and rules specified on the App. Then the HPA controller fetches the specs from the HPA object and scales the App accordingly.
The HPA object is deleted if autoscaling is disabled or if the corresponding App is deleted.
2 - Manage Autoscaling
Kf supports two primary autoscaling modes:
- Built-in autoscaling similar to Cloud Foundry.
- Advanced autoscaling through the Kubernetes Horizontal Pod Autoscaler (HPA).
Built-in autoscaling
Kf Apps can be automatically scaled based on CPU usage. You can configure autoscaling limits for your Apps and the target CPU usage for each App instance. Kf automatically scales your Apps up and down in response to demand.
By default, autoscaling is disabled. Follow the steps below to enable autoscaling.
View Apps
You can view the autoscaling status for an App using the kf apps command. If autoscaling is enabled for an App, the Instances column includes the autoscaling status.
$ kf apps
Name Instances Memory Disk CPU
app1 4 (autoscaled 4 to 5) 256Mi 1Gi 100m
app2 1 256Mi 1Gi 100m
Autoscaling is enabled for app1 with min-instances set to 4 and max-instances set to 5. Autoscaling is disabled for app2.
Update autoscaling limits
You can update the instance limits using the kf update-autoscaling-limits command.
kf update-autoscaling-limits app-name min-instances max-instances
Create autoscaling rule
You can create autoscaling rules using the kf create-autoscaling-rule command.
kf create-autoscaling-rule app-name CPU min-threshold max-threshold
Delete autoscaling rules
You can delete all autoscaling rules with the kf delete-autoscaling-rules command. Kf only supports one autoscaling rule.
kf delete-autoscaling-rules app-name
Enable and disable autoscaling
Autoscaling can be enabled by using enable-autoscaling and disabled by using disable-autoscaling. When autoscaling is disabled, the configurations, including limits and rules, are preserved.
kf enable-autoscaling app-name
kf disable-autoscaling app-name
Advanced autoscaling
Kf Apps support the Kubernetes Horizontal Pod Autoscaler interface and will therefore work with HPAs created using kubectl.
Kubernetes HPA policies are less restrictive than Kf’s built-in support for autoscaling.
They include support for:
- Scaling on memory, CPU, or disk usage.
- Scaling based on custom metrics, such as traffic load or queue length.
- Scaling on multiple metrics.
- The ability to tune reactivity to smooth out rapid scaling.
Using custom HPAs with apps
You can follow the Kubernetes HPA walkthrough to learn how to set up autoscalers.
When you create the HPA, make sure to set the scaleTargetRef to be your application:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-scaler
  namespace: SPACE_NAME
spec:
  scaleTargetRef:
    apiVersion: kf.dev/v1alpha1
    kind: App
    name: APP_NAME
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60
Caveats
- You shouldn’t use Kf autoscaling with an HPA.
- When you use an HPA, kf apps will show the current number of instances, but it won't show that the App is being autoscaled.
3 - Managing Resources for Apps
When you create an app, you can optionally specify how much of each resource an instance of the application will receive when it runs.
Kf simplifies the Kubernetes model of resources and provides defaults that should work for most I/O bound applications out of the box.
Resource types
Kf supports three types of resources: memory, CPU, and ephemeral disk.
- Memory specifies the amount of RAM an application receives when running. If the application exceeds this amount, the container is restarted.
- Ephemeral disk specifies how much an application can write to a local disk. If an application exceeds this amount, it may not be able to write more.
- CPU specifies the number of CPUs an application receives when running.
Manifest
Resources are specified using four values in the manifest:
- memory sets the guaranteed minimum an app will receive and the maximum it's permitted to use.
- disk_quota sets the guaranteed minimum an app will receive and the maximum it's permitted to use.
- cpu sets the guaranteed minimum an app will receive.
- cpu-limit sets the maximum CPU an app can use.
Example:
applications:
- name: "example"
  disk_quota: 512M
  memory: 512M
  cpu: 200m
  cpu-limit: 2000m
Defaults
Memory and ephemeral storage are both set to 1Gi if not specified.
CPU defaults to one of the following:
- 1/10th of a CPU if the platform operator hasn’t overridden it.
- A CPU value proportionally scaled by the amount of memory requested.
- A minimum CPU value set by the platform operator.
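As an illustration of the proportional default, the sketch below assumes a hypothetical operator-configured ratio of 100m CPU per GiB of requested memory and a 50m minimum; neither number is a documented Kf default, they are only placeholders:

```shell
# Hypothetical example: default CPU scaled proportionally to requested memory.
# The ratio (100m per GiB) and minimum (50m) are illustrative assumptions.
memory_gib=4
ratio_millicpu_per_gib=100
min_millicpu=50
cpu=$(( memory_gib * ratio_millicpu_per_gib ))
# Never go below the operator-configured minimum.
if [ "$cpu" -lt "$min_millicpu" ]; then cpu=$min_millicpu; fi
echo "${cpu}m"
```

Under these assumptions, a 4Gi app would default to 400m of CPU, while a very small app would still be guaranteed the 50m floor.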
Resource units
Memory and disk
Cloud Foundry used the units T, G, M, and K to represent powers of two. Kubernetes uses the units Ei, Pi, Ti, Gi, Mi, and Ki for the same. Kf allows you to specify memory and disk using either set of units.
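Note that in Kubernetes notation the two suffix styles differ slightly in magnitude: Mi is binary (powers of two) while a bare M is decimal (powers of ten). A quick comparison of the byte values:

```shell
# 512Mi (binary) vs 512M interpreted as decimal megabytes in Kubernetes notation.
mi_bytes=$(( 512 * 1024 * 1024 ))   # 512 * 2^20
m_bytes=$(( 512 * 1000 * 1000 ))    # 512 * 10^6
echo "$mi_bytes $m_bytes"
```

At this scale the binary quantity is roughly 5% larger than the decimal one, which can matter when setting quotas close to an app's actual usage.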
CPU
Kf and Kubernetes use the unit m for CPU, representing milli-CPU cores (thousandths of a core).
Sidecar overhead
When Kf schedules your app's container as a Kubernetes Pod, it may bundle additional containers with your app to provide extra functionality. Your application will also likely have an Istio sidecar, which is responsible for networking.
These containers supply their own resource requests and limits and are overhead associated with running your application.
Best practices
- All applications should set memory and disk quotas.
- CPU intensive applications should set a CPU request and limit to guarantee they’ll have the resources they need without starving other apps.
- I/O bound applications shouldn’t set a CPU limit so they can burst during startup.
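For example, a manifest following these practices for an I/O bound app might look like the following sketch (the app name and values are illustrative, not recommendations for any particular workload):

```yaml
applications:
- name: "io-bound-example"  # illustrative name
  memory: 512M              # always set a memory quota
  disk_quota: 512M          # always set a disk quota
  cpu: 100m                 # guaranteed minimum CPU
  # no cpu-limit: I/O bound apps can burst during startup
```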