What is Kubernetes AutoScaling?

Kubernetes AutoScaling automatically adjusts capacity based on traffic and resource usage so your applications stay:

  • Available under load
  • Cost‑efficient during low usage
  • Stable in production

Autoscaling works at two layers:

  • Cluster Infrastructure → Nodes (VMs / servers)
  • Workloads → Pods (application instances)

Two scaling directions:

  • Horizontal scaling → Add or remove instances
  • Vertical scaling → Increase or decrease resources per instance

Success

In production, horizontal scaling is preferred first for availability and fault tolerance. Vertical scaling is mainly used for tuning.


Scaling Types — Simple View

Cluster Infrastructure Scaling

  • Horizontal → changes the number of nodes (example: add worker nodes)
  • Vertical → changes node size (example: increase VM CPU/RAM)

Workload Scaling

  • Horizontal → changes the number of pods (example: increase replicas)
  • Vertical → changes pod resources (example: increase CPU/memory limits)
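
Both workload knobs are plain fields on a Deployment. A minimal sketch, where the name myapp, the image, and the numbers are placeholders: replicas is the horizontal knob, the container's resources block is the vertical knob.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3                  # horizontal knob: how many pods
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:1.0       # placeholder image
        resources:             # vertical knob: resources per pod
          requests:
            cpu: 250m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi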

Scaling Basics (Pre‑Kubernetes Concept)

Core Concept

Vertical scaling - Increase CPU/RAM on same server - Usually requires restart - Has an upper limit

Horizontal scaling - Add more servers - Share load - Better resilience

Kubernetes applies the same ideas to pods and nodes.


Manual vs Automated Scaling

Manual Scaling

Manual Commands

# Add a worker node by hand (node-level scaling)
kubeadm join ...
# Change the replica count by hand (pod-level horizontal scaling)
kubectl scale deployment app --replicas=5
# Edit requests/limits by hand (pod-level vertical scaling)
kubectl edit deployment app

Warning

Manual scaling is acceptable for testing — not safe for production spikes.


Autoscaling Components Overview

Kubernetes autoscaling uses:

  • HPA → Scale pod count
  • VPA → Scale pod resources
  • Cluster Autoscaler → Scale nodes
  • In‑Place Resize → Resize pod resources without recreation (feature‑gated)

Metrics Requirement (Critical for Autoscaling)

Required

  • Metrics Server installed
  • CPU & memory requests defined
  • Limits recommended
Verify:

kubectl top pods
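
If Metrics Server is not installed yet, the upstream release manifest is the usual starting point; verify the manifest against your cluster version before applying.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get deployment metrics-server -n kube-system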

Failure

Without resource requests → autoscalers cannot compute utilization.
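
The requests that utilization is measured against live in each container spec. A minimal fragment with placeholder values:

containers:
- name: myapp
  resources:
    requests:
      cpu: 200m        # HPA computes CPU utilization against this value
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi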


Horizontal Pod Autoscaler (HPA)

What HPA Does

HPA automatically scales the number of pods based on observed metrics.

  • Reads metrics continuously
  • Compares with target
  • Adds/removes pods

Success

The most commonly used autoscaler for stateless production workloads.


How HPA Works

Abstract

Traffic ↑ → CPU ↑ → Metrics Server → HPA → More replicas → Load spreads → CPU ↓ → Scale down
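
The replica target comes from the ratio of the current metric to the target value (this is the documented HPA formula; the numbers below are only an illustration):

desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )

For example, 4 replicas averaging 90% CPU against a 60% target gives ceil(4 × 90 / 60) = 6 replicas.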


Supported Metrics

  • CPU
  • Memory
  • Custom metrics
  • External metrics
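
Custom and external metrics are only visible to HPA through a metrics adapter (for example, Prometheus Adapter); the metric name below is a placeholder. A sketch of a Pods-type metric entry for an autoscaling/v2 HPA:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # assumed to be exposed by an adapter
    target:
      type: AverageValue
      averageValue: "100"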

HPA Requirements

Note

  • Metrics Server
  • Resource requests set
  • Deployment / ReplicaSet / StatefulSet target

HPA Creation

Imperative

Example

kubectl autoscale deployment myapp --cpu-percent=50 --min=2 --max=10
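
The command above creates an HPA object named after the Deployment, which can then be inspected:

kubectl get hpa myapp
kubectl describe hpa myapp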

Declarative

Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 12
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
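
autoscaling/v2 also accepts an optional behavior section under spec to dampen scale-down and avoid flapping. A sketch with illustrative values:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes before removing pods
    policies:
    - type: Pods
      value: 1
      periodSeconds: 60               # remove at most 1 pod per minute
  scaleUp:
    stabilizationWindowSeconds: 0     # scale up immediately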

HPA Best Practices

Success

  • minReplicas > 1
  • Set safe maxReplicas
  • Use readiness probes
  • Load test thresholds
  • Prefer autoscaling/v2
  • Combine with Cluster Autoscaler

❌ Don’t

Danger

  • Run without requests
  • Leave max unlimited
  • Use for databases blindly

Vertical Pod Autoscaler (VPA)

What VPA Does

VPA automatically adjusts CPU and memory requests/limits of pods based on usage history.

Abstract

HPA = more pods
VPA = bigger pods

Note

VPA is not built‑in — must be installed.


Why VPA

Manual vertical scaling requires:

Example

# Inspect current usage
kubectl top pod
# Hand-edit requests/limits
kubectl edit deployment

Which causes:

Danger

  • Manual monitoring
  • Pod restart
  • Operational risk

VPA Components

Note

Recommender - Analyzes historical + live metrics - Suggests CPU/memory

Updater - Finds mis-sized pods - Evicts when needed

Admission Controller - Injects recommended resources at pod creation


Install VPA

Install Controllers

kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler.yaml

Verify:

kubectl get pods -n kube-system | grep vpa

VPA Modes

VPA behavior depends on update mode.

Update Modes

Off - Only recommendations - No pod changes

Initial - Apply only at pod creation - No evictions

Recreate - Evict pods to apply changes - Causes restart

Auto - Currently behaves like Recreate - Future: will prefer in-place resize


VPA Example

VPA Resource

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: myapp
      minAllowed:
        cpu: 250m
      maxAllowed:
        cpu: "2"

Check recommendations:

kubectl describe vpa myapp-vpa
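
The raw recommendation can also be read from the object's status once the Recommender has gathered enough history:

kubectl get vpa myapp-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'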

VPA vs HPA — Quick Production Difference

Abstract

VPA → Changes pod size → May restart pods
HPA → Changes pod count → No restarts


When to Use VPA

Tip

  • Databases
  • JVM apps
  • ML jobs
  • Stateful workloads
  • Resource tuning

❌ Don’t Use VPA For

Warning

  • Traffic spikes
  • Latency‑critical APIs
  • Non‑restartable apps

VPA Best Practices

Success

  • Start Off mode first
  • Review recommendations
  • Set min/max bounds
  • Use PDB
  • Monitor evictions
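
A sketch of the "start Off mode first" practice, reusing the earlier myapp-vpa example; only the update policy differs, so the VPA records recommendations without evicting pods:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # quoted so YAML does not parse it as a boolean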

In‑Place Pod Resize

What It Is

Allows CPU/memory changes without recreating the pod.

Default behavior:

  • Resource change → Pod deleted → New pod created

With feature:

  • Resource change → Pod resized → Less disruption

Feature Gate Requirement

In-place resize requires enabling the feature gate on control plane and kubelet.

Feature Gate Required

Enable on the kube-apiserver and kubelet:

--feature-gates=InPlacePodVerticalScaling=true

If not enabled → Kubernetes falls back to delete & recreate pod behavior.


Resize Policy (Per Resource)

You can define resize behavior per resource type.

Resize Policy Example

resources:
  requests:
    cpu: "1"
    memory: "256Mi"
  limits:
    cpu: "2"
    memory: "512Mi"

resizePolicy:
- resourceName: cpu
  restartPolicy: NotRequired
- resourceName: memory
  restartPolicy: RestartContainer

Meaning:

  • CPU change → no restart required
  • Memory change → container restart required
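
A sketch of resizing a running pod under this policy, with a hypothetical pod name; depending on the Kubernetes version, the patch is applied directly or through the resize subresource:

# Direct patch (versions where the feature gate alone enables resize)
kubectl patch pod myapp-pod --patch \
  '{"spec":{"containers":[{"name":"myapp","resources":{"requests":{"cpu":"1500m"},"limits":{"cpu":"2"}}}]}}'

# Newer versions route resizes through a dedicated subresource
kubectl patch pod myapp-pod --subresource resize --patch \
  '{"spec":{"containers":[{"name":"myapp","resources":{"requests":{"cpu":"1500m"},"limits":{"cpu":"2"}}}]}}'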

Limitations

Warning

  • CPU & memory only
  • QoS cannot change
  • No init/ephemeral containers
  • Cannot reduce memory below usage
  • No Windows support

In‑Place Resize Best Practices

Success

  • Test first
  • Define resizePolicy
  • Monitor status
  • Use readiness probes

Production Rules — HPA vs VPA

Abstract

HPA → Handle spikes
VPA → Tune resources

Quote

Use HPA for demand scaling
Use VPA for right‑sizing
Use In‑Place Resize to reduce disruption