What is Kubernetes AutoScaling?

Kubernetes AutoScaling automatically adjusts capacity based on traffic and resource usage so your applications stay:

  • Available under load
  • Cost‑efficient during low usage
  • Stable in production

Autoscaling works at two layers:

  • Cluster Infrastructure → Nodes (VMs / servers)
  • Workloads → Pods (application instances)

Two scaling directions:

  • Horizontal scaling → Add or remove instances
  • Vertical scaling → Increase or decrease resources per instance

Success

In production, horizontal scaling is preferred first for availability and fault tolerance. Vertical scaling is mainly used for tuning.


Scaling Types — Simple View

Cluster Infrastructure Scaling

  • Horizontal → changes the number of nodes (example: add worker nodes)
  • Vertical → changes node size (example: increase VM CPU/RAM)

Workload Scaling

  • Horizontal → changes the number of pods (example: increase replicas)
  • Vertical → changes pod resources (example: increase CPU/memory limits)
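
Both workload knobs are plain fields on a Deployment. A minimal sketch, where the name myapp, the image, and the numbers are placeholders: replicas is the horizontal knob, the container's resources block is the vertical knob.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3                  # horizontal knob: how many pods
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:1.0       # placeholder image
        resources:             # vertical knob: resources per pod
          requests:
            cpu: 250m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi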

Scaling Basics (Pre‑Kubernetes Concept)

Core Concept

Vertical scaling - Increase CPU/RAM on same server - Usually requires restart - Has an upper limit

Horizontal scaling - Add more servers - Share load - Better resilience

Kubernetes applies the same ideas to pods and nodes.


Manual vs Automated Scaling

Manual Scaling

Manual Commands

# Add a worker node by hand (node-level scaling)
kubeadm join ...
# Change the replica count by hand (pod-level horizontal scaling)
kubectl scale deployment app --replicas=5
# Edit requests/limits by hand (pod-level vertical scaling)
kubectl edit deployment app

Warning

Manual scaling is acceptable for testing — not safe for production spikes.


Autoscaling Components Overview

Kubernetes autoscaling uses:

  • HPA → Scale pod count
  • VPA → Scale pod resources
  • Cluster Autoscaler → Scale nodes
  • In‑Place Resize → Resize pod resources without recreation (feature‑gated)

Metrics Requirement (Critical for Autoscaling)

Required

  • Metrics Server installed
  • CPU & memory requests defined
  • Limits recommended
Verify:

kubectl top pods
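
If Metrics Server is not installed yet, the upstream release manifest is the usual starting point; verify the manifest against your cluster version before applying.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get deployment metrics-server -n kube-system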

Failure

Without resource requests → autoscalers cannot compute utilization.
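
The requests that utilization is measured against live in each container spec. A minimal fragment with placeholder values:

containers:
- name: myapp
  resources:
    requests:
      cpu: 200m        # HPA computes CPU utilization against this value
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi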


Horizontal Pod Autoscaler (HPA)

What HPA Does

HPA automatically scales the number of pods based on observed metrics.

  • Reads metrics continuously
  • Compares with target
  • Adds/removes pods

Success

The most commonly used autoscaler for stateless production workloads.


How HPA Works

Abstract

Traffic ↑ → CPU ↑ → Metrics Server → HPA → More replicas → Load spreads → CPU ↓ → Scale down
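
The replica target comes from the ratio of the current metric to the target value (this is the documented HPA formula; the numbers below are only an illustration):

desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )

For example, 4 replicas averaging 90% CPU against a 60% target gives ceil(4 × 90 / 60) = 6 replicas.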


Supported Metrics

  • CPU
  • Memory
  • Custom metrics
  • External metrics
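
Custom and external metrics are only visible to HPA through a metrics adapter (for example, Prometheus Adapter); the metric name below is a placeholder. A sketch of a Pods-type metric entry for an autoscaling/v2 HPA:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # assumed to be exposed by an adapter
    target:
      type: AverageValue
      averageValue: "100"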

HPA Requirements

Note

  • Metrics Server
  • Resource requests set
  • Deployment / ReplicaSet / StatefulSet target

HPA Creation

Imperative

Example

kubectl autoscale deployment myapp --cpu-percent=50 --min=2 --max=10
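
The command above creates an HPA object named after the Deployment, which can then be inspected:

kubectl get hpa myapp
kubectl describe hpa myapp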

Declarative

Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 12
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
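
autoscaling/v2 also accepts an optional behavior section under spec to dampen scale-down and avoid flapping. A sketch with illustrative values:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes before removing pods
    policies:
    - type: Pods
      value: 1
      periodSeconds: 60               # remove at most 1 pod per minute
  scaleUp:
    stabilizationWindowSeconds: 0     # scale up immediately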

HPA Best Practices

Success

  • minReplicas > 1
  • Set safe maxReplicas
  • Use readiness probes
  • Load test thresholds
  • Prefer autoscaling/v2
  • Combine with Cluster Autoscaler

❌ Don’t

Danger

  • Run without requests
  • Leave max unlimited
  • Use for databases blindly

Vertical Pod Autoscaler (VPA)

What VPA Does

VPA automatically adjusts CPU and memory requests/limits of pods based on usage history.

Abstract

HPA = more pods
VPA = bigger pods

Note

VPA is not built‑in — must be installed.


Why VPA

Manual vertical scaling requires:

Example

# Inspect current usage
kubectl top pod
# Hand-edit requests/limits
kubectl edit deployment

Which causes:

Danger

  • Manual monitoring
  • Pod restart
  • Operational risk

VPA Components

Note

Recommender - Analyzes historical + live metrics - Suggests CPU/memory

Updater - Finds mis-sized pods - Evicts when needed

Admission Controller - Injects recommended resources at pod creation


Install VPA

Install Controllers

kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler.yaml

Verify:

kubectl get pods -n kube-system | grep vpa

VPA Modes

VPA behavior depends on update mode.

Update Modes

Off - Only recommendations - No pod changes

Initial - Apply only at pod creation - No evictions

Recreate - Evict pods to apply changes - Causes restart

Auto - Currently behaves like Recreate - Future: will prefer in-place resize


VPA Example

VPA Resource

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: myapp
      minAllowed:
        cpu: 250m
      maxAllowed:
        cpu: "2"

Check recommendations:

kubectl describe vpa myapp-vpa
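
The raw recommendation can also be read from the object's status once the Recommender has gathered enough history:

kubectl get vpa myapp-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'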

VPA vs HPA — Quick Production Difference

Abstract

VPA → Changes pod size → May restart pods
HPA → Changes pod count → No restarts


When to Use VPA

Tip

  • Databases
  • JVM apps
  • ML jobs
  • Stateful workloads
  • Resource tuning

❌ Don’t Use VPA For

Warning

  • Traffic spikes
  • Latency‑critical APIs
  • Non‑restartable apps

VPA Best Practices

Success

  • Start Off mode first
  • Review recommendations
  • Set min/max bounds
  • Use PDB
  • Monitor evictions
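
A sketch of the "start Off mode first" practice, reusing the earlier myapp-vpa example; only the update policy differs, so the VPA records recommendations without evicting pods:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # quoted so YAML does not parse it as a boolean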

In‑Place Pod Resize

What It Is

Allows CPU/memory changes without recreating the pod.

Default behavior:

  • Resource change → Pod deleted → New pod created

With feature:

  • Resource change → Pod resized → Less disruption

Feature Gate Requirement

In-place resize requires enabling the feature gate on control plane and kubelet.

Feature Gate Required

Enable on the kube-apiserver and kubelet:

--feature-gates=InPlacePodVerticalScaling=true

If not enabled → Kubernetes falls back to delete & recreate pod behavior.


Resize Policy (Per Resource)

You can define resize behavior per resource type.

Resize Policy Example

resources:
  requests:
    cpu: "1"
    memory: "256Mi"
  limits:
    cpu: "2"
    memory: "512Mi"

resizePolicy:
- resourceName: cpu
  restartPolicy: NotRequired
- resourceName: memory
  restartPolicy: RestartContainer

Meaning:

  • CPU change → no restart required
  • Memory change → container restart required
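
A sketch of resizing a running pod under this policy, with a hypothetical pod name; depending on the Kubernetes version, the patch is applied directly or through the resize subresource:

# Direct patch (versions where the feature gate alone enables resize)
kubectl patch pod myapp-pod --patch \
  '{"spec":{"containers":[{"name":"myapp","resources":{"requests":{"cpu":"1500m"},"limits":{"cpu":"2"}}}]}}'

# Newer versions route resizes through a dedicated subresource
kubectl patch pod myapp-pod --subresource resize --patch \
  '{"spec":{"containers":[{"name":"myapp","resources":{"requests":{"cpu":"1500m"},"limits":{"cpu":"2"}}}]}}'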

Limitations

Warning

  • CPU & memory only
  • QoS cannot change
  • No init/ephemeral containers
  • Cannot reduce memory below usage
  • No Windows support

In‑Place Resize Best Practices

Success

  • Test first
  • Define resizePolicy
  • Monitor status
  • Use readiness probes

Production Rules — HPA vs VPA

Abstract

HPA → Handle spikes
VPA → Tune resources

Quote

Use HPA for demand scaling
Use VPA for right‑sizing
Use In‑Place Resize to reduce disruption