What is Kubernetes Autoscaling?
Kubernetes autoscaling automatically adjusts capacity based on traffic and resource usage so your applications stay:
- Available under load
- Cost‑efficient during low usage
- Stable in production
Autoscaling works at two layers:
- Cluster Infrastructure → Nodes (VMs / servers)
- Workloads → Pods (application instances)
Two scaling directions:
- Horizontal scaling → Add or remove instances
- Vertical scaling → Increase or decrease resources per instance
Success
In production, horizontal scaling is preferred first for availability and fault tolerance. Vertical scaling is mainly used for tuning.
Scaling Types — Simple View
Cluster Infrastructure Scaling
| Type | What Changes | Example |
|---|---|---|
| Horizontal | Number of nodes | Add worker nodes |
| Vertical | Node size | Increase VM CPU/RAM |
Workload Scaling
| Type | What Changes | Example |
|---|---|---|
| Horizontal | Number of pods | Increase replicas |
| Vertical | Pod resources | Increase CPU/memory limits |
Scaling Basics (Pre‑Kubernetes Concept)
Core Concept
Vertical scaling:
- Increase CPU/RAM on the same server
- Usually requires a restart
- Has an upper limit

Horizontal scaling:
- Add more servers
- Share the load
- Better resilience
Kubernetes applies the same ideas to pods and nodes.
Manual vs Automated Scaling
Manual Scaling
Manual Commands
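A minimal manual-scaling sketch with kubectl; the Deployment and StatefulSet names are illustrative:

```bash
# Manually set the replica count of a Deployment (name "web" is an example)
kubectl scale deployment web --replicas=5

# Confirm the new replica count
kubectl get deployment web

# StatefulSets can be scaled the same way
kubectl scale statefulset web-db --replicas=3
```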
Warning
Manual scaling is acceptable for testing — not safe for production spikes.
Autoscaling Components Overview
Kubernetes autoscaling uses:
- HPA → Scale pod count
- VPA → Scale pod resources
- Cluster Autoscaler → Scale nodes
- In‑Place Resize → Resize pod resources without recreation (feature‑gated)
Metrics Requirement (Critical for Autoscaling)
Failure
Without resource requests → autoscalers cannot compute utilization (usage is measured as a percentage of the requested value).
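A pod template fragment showing the requests that autoscalers use as the baseline; the container name, image, and values are illustrative:

```yaml
# Deployment pod template fragment (illustrative values)
containers:
  - name: web
    image: nginx:1.27
    resources:
      requests:
        cpu: 250m        # HPA reports CPU utilization as a percentage of this value
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
```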
Horizontal Pod Autoscaler (HPA)
What HPA Does
HPA automatically scales the number of pods based on observed metrics.
- Reads metrics continuously
- Compares with target
- Adds/removes pods
Success
Most used autoscaler for stateless production workloads.
How HPA Works
Abstract
Traffic ↑ → CPU ↑ → Metrics Server → HPA → More replicas → Load spreads → CPU ↓ → Scale down
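Under the hood, HPA applies the documented formula desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). For example, 4 replicas averaging 90% CPU against a 70% target scale to ceil(4 × 90 / 70) = 6 replicas.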
Supported Metrics
- CPU
- Memory
- Custom metrics
- External metrics
HPA Requirements
Note
- Metrics Server
- Resource requests set
- Deployment / RS / StatefulSet target
HPA Creation
Imperative
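A typical imperative command; the Deployment name and thresholds are illustrative:

```bash
# Target 70% average CPU utilization, between 2 and 10 replicas
kubectl autoscale deployment web --cpu-percent=70 --min=2 --max=10

# Inspect the resulting HPA
kubectl get hpa web
```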
Declarative
Example
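A minimal autoscaling/v2 manifest, assuming a Deployment named web and illustrative thresholds:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```

Apply it with `kubectl apply -f web-hpa.yaml` and watch it with `kubectl get hpa -w`.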
HPA Best Practices
Success
- minReplicas > 1
- Set safe maxReplicas
- Use readiness probes
- Load test thresholds
- Prefer autoscaling/v2
- Combine with Cluster Autoscaler
❌ Don’t
Danger
- Run without requests
- Leave max unlimited
- Use for databases blindly
Vertical Pod Autoscaler (VPA)
What VPA Does
VPA automatically adjusts CPU and memory requests/limits of pods based on usage history.
Abstract
HPA = more pods
VPA = bigger pods
Note
VPA is not built‑in — must be installed.
Why VPA
Manual vertical scaling means adjusting CPU/memory requests by hand, which causes:
Danger
- Manual monitoring
- Pod restart
- Operational risk
VPA Components
Note
| Component | Role |
|---|---|
| Recommender | Analyzes historical and live metrics; suggests CPU/memory values |
| Updater | Finds mis-sized pods and evicts them when needed |
| Admission Controller | Injects the recommended resources at pod creation |
Install VPA
Install Controllers
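The controllers ship in the kubernetes/autoscaler repository; the bundled script is the documented install path (run from the vertical-pod-autoscaler directory):

```bash
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
```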
Verify:
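The default install places the controllers in kube-system:

```bash
# Expect vpa-recommender, vpa-updater and vpa-admission-controller pods
kubectl get pods -n kube-system | grep vpa
```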
VPA Modes
VPA behavior depends on update mode.
Update Modes
| Mode | Behavior |
|---|---|
| Off | Only produces recommendations; no pod changes |
| Initial | Applies recommendations only at pod creation; no evictions |
| Recreate | Evicts pods to apply changes; causes restarts |
| Auto | Currently behaves like Recreate; will prefer in-place resize in the future |
VPA Example
VPA Resource
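A sketch of a VPA object in recommendation-only (Off) mode, assuming a Deployment named web; the bounds are illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"          # only produce recommendations, never evict
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "1"
          memory: 1Gi
```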
Check recommendations:
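Recommendations appear in the object's status:

```bash
# Look for the Recommendation block with lowerBound / target / upperBound values
kubectl describe vpa web-vpa
```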
VPA vs HPA — Quick Production Difference
Abstract
VPA → Changes pod size → May restart pods
HPA → Changes pod count → No restarts
When to Use VPA
Tip
- Databases
- JVM apps
- ML jobs
- Stateful workloads
- Resource tuning
❌ Don’t Use VPA For
Warning
- Traffic spikes
- Latency‑critical APIs
- Non‑restartable apps
VPA Best Practices
Success
- Start Off mode first
- Review recommendations
- Set min/max bounds
- Use PDB
- Monitor evictions
In‑Place Pod Resize
What It Is
Allows CPU/memory changes without recreating the pod.
Default behavior:
- Resource change → Pod deleted → New pod created
With feature:
- Resource change → Pod resized → Less disruption
Feature Gate Requirement
In-place resize requires enabling the InPlacePodVerticalScaling feature gate on the control plane and kubelet.
If not enabled → Kubernetes falls back to delete & recreate pod behavior.
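A sketch of enabling the gate; the exact wiring depends on how the control plane and kubelets are provisioned (kubeadm manifests, managed-cluster settings, etc.):

```bash
# Add to the kube-apiserver and kubelet startup flags
--feature-gates=InPlacePodVerticalScaling=true
```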
Resize Policy (Per Resource)
You can define resize behavior per resource type.
Resize Policy Example
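A container-level sketch matching the meaning described next; CPU resizes in place while memory changes restart the container (names and values are illustrative):

```yaml
# Pod spec fragment (illustrative)
containers:
  - name: web
    image: nginx:1.27
    resizePolicy:
      - resourceName: cpu
        restartPolicy: NotRequired       # apply CPU changes without a restart
      - resourceName: memory
        restartPolicy: RestartContainer  # memory changes restart the container
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
```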
Meaning:
- CPU change → no restart required
- Memory change → container restart required
Limitations
Warning
- CPU & memory only
- QoS cannot change
- No init/ephemeral containers
- Cannot reduce memory below usage
- No Windows support
In‑Place Resize Best Practices
Success
- Test first
- Define resizePolicy
- Monitor status
- Use readiness probes
Production Rules — HPA vs VPA
Abstract
HPA → Handle spikes
VPA → Tune resources
Quote
Use HPA for demand scaling
Use VPA for right‑sizing
Use In‑Place Resize to reduce disruption