What Is Kubernetes Autoscaling?
Kubernetes autoscaling automatically adjusts capacity based on traffic and resource usage so your applications stay:
- Available under load
- Cost-efficient during low usage
- Stable in production
Autoscaling works at two layers:
- Cluster Infrastructure → Nodes (VMs / servers)
- Workloads → Pods (application instances)
Two scaling directions:
- Horizontal scaling → Add or remove instances
- Vertical scaling → Increase or decrease resources per instance
Success
In production, horizontal scaling is preferred first for availability and fault tolerance. Vertical scaling is mainly used for tuning.
Scaling Types - Simple View
Cluster Infrastructure Scaling
| Type | What Changes | Example |
|---|---|---|
| Horizontal | Number of nodes | Add worker nodes |
| Vertical | Node size | Increase VM CPU/RAM |
Workload Scaling
| Type | What Changes | Example |
|---|---|---|
| Horizontal | Number of pods | Increase replicas |
| Vertical | Pod resources | Increase CPU/memory limits |
Scaling Basics (Pre-Kubernetes Concept)
Core Concept
- Vertical scaling: increase CPU/RAM on the same server; usually requires a restart; has an upper limit
- Horizontal scaling: add more servers to share the load; better resilience
Kubernetes applies the same ideas to pods and nodes.
Manual vs Automated Scaling
Manual Scaling
Manual Commands
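As a sketch, assuming a Deployment named `web` (the name is illustrative), manual scaling looks like this:

```bash
# Horizontal: change the replica count by hand
kubectl scale deployment web --replicas=5

# Vertical: change resource requests/limits by hand (triggers a rollout)
kubectl set resources deployment web --requests=cpu=200m,memory=256Mi

# Confirm the result
kubectl get deployment web
```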
Warning
Manual scaling is acceptable for testing; it is not safe for production spikes.
Autoscaling Components Overview
Kubernetes autoscaling uses:
- HPA → Scales pod count
- VPA → Scales pod resources
- Cluster Autoscaler → Scales nodes
- In-Place Resize → Resizes pod resources without recreation (feature-gated)
Metrics Requirement (Critical for Autoscaling)
Failure
Without resource requests, autoscalers cannot compute utilization, because utilization is measured as a percentage of the requested amount.
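For example, HPA computes CPU utilization as current usage divided by the requested CPU, so every container in the target workload needs a block like this (values are illustrative):

```yaml
# Container spec fragment: autoscalers measure utilization against requests
resources:
  requests:
    cpu: 100m       # HPA's CPU utilization % is relative to this value
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi
```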
Horizontal Pod Autoscaler (HPA)
What HPA Does
HPA automatically scales the number of pods based on observed metrics. It:
- Reads metrics continuously
- Compares them against the target
- Adds or removes pods accordingly
Success
The most widely used autoscaler for stateless production workloads.
How HPA Works
Abstract
Traffic ↑ → CPU ↑ → Metrics Server → HPA → More replicas → Load spreads → CPU ↓ → Scale down
Supported Metrics
- CPU
- Memory
- Custom metrics
- External metrics
HPA Requirements
Note
- Metrics Server installed
- Resource requests set on containers
- A Deployment, ReplicaSet, or StatefulSet as the scale target
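If Metrics Server is not already running, a common install sketch uses the official manifest (verify the release URL against your cluster version):

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm metrics are flowing
kubectl top nodes
kubectl top pods
```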
HPA Creation
Imperative
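A minimal imperative sketch, assuming a Deployment named `web` (illustrative name):

```bash
# Target 70% average CPU, between 2 and 10 replicas
kubectl autoscale deployment web --cpu-percent=70 --min=2 --max=10
```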
Declarative
Example
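A declarative sketch using the autoscaling/v2 API; the names `web-hpa` and `web` are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web               # target workload (illustrative name)
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # keep average CPU near 70% of requests
```

Apply it with `kubectl apply -f hpa.yaml`.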
After Creating the HPA
Once the HPA is created, verify and monitor its behavior.
Check HPA Status
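```bash
# Lists current vs target metrics, min/max bounds, and replica count
kubectl get hpa
```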
Watch HPA in Real Time
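Assuming the HPA is named `web-hpa` (illustrative):

```bash
# Streams updates as the HPA reacts to load
kubectl get hpa web-hpa --watch
```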
This shows:
- Current CPU utilization
- Replica count changes
- Automatic scaling events
View Detailed HPA Information
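```bash
# Shows target metrics, conditions, and recent scaling events
kubectl describe hpa web-hpa
```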
HPA Best Practices
Success
- minReplicas > 1
- Set safe maxReplicas
- Use readiness probes
- Load test thresholds
- Prefer autoscaling/v2
- Combine with Cluster Autoscaler
Don't
Danger
- Run without requests
- Leave max unlimited
- Use for databases blindly
Vertical Pod Autoscaler (VPA)
What VPA Does
VPA automatically adjusts CPU and memory requests/limits of pods based on usage history.
Abstract
HPA = more pods
VPA = bigger pods
Note
VPA is not built-in; it must be installed separately.
Why VPA
Manual vertical scaling requires editing resource requests and redeploying, which causes:
Danger
- Manual monitoring
- Pod restart
- Operational risk
VPA Components
Note
- Recommender: analyzes historical and live metrics; suggests CPU/memory values
- Updater: finds mis-sized pods; evicts them when needed
- Admission Controller: injects recommended resources at pod creation
Install VPA
Install Controllers
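The upstream autoscaler project ships an install script; a common install sketch (check the project README for version compatibility):

```bash
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
```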
Verify:
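```bash
# Should list the recommender, updater, and admission controller pods
kubectl get pods -n kube-system | grep vpa
```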
VPA Modes
VPA behavior depends on update mode.
Update Modes
- Off: recommendations only; no pod changes
- Initial: applied only at pod creation; no evictions
- Recreate: evicts pods to apply changes; causes restarts
- Auto: currently behaves like Recreate; will prefer in-place resize in the future
VPA Example
VPA Resource
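A sketch of a VPA object, starting in `Off` mode as the best practices below suggest; `web-vpa` and `web` are illustrative names:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web               # target workload (illustrative name)
  updatePolicy:
    updateMode: "Off"       # recommendations only; change after reviewing them
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
```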
Check recommendations:
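```bash
# Look for the Recommendation section: target, lower bound, upper bound
kubectl describe vpa web-vpa
```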
VPA vs HPA - Quick Production Difference
Abstract
VPA → Changes pod size → May restart pods
HPA → Changes pod count → No restarts
When to Use VPA
Tip
- Databases
- JVM apps
- ML jobs
- Stateful workloads
- Resource tuning
Don't Use VPA For
Warning
- Traffic spikes
- Latency-critical APIs
- Non-restartable apps
VPA Best Practices
Success
- Start Off mode first
- Review recommendations
- Set min/max bounds
- Use PDB
- Monitor evictions
In-Place Pod Resize
What It Is
Allows CPU/memory changes without recreating the pod.
Default behavior:
- Resource change → Pod deleted → New pod created
With the feature enabled:
- Resource change → Pod resized in place → Less disruption
Feature Gate Requirement
In-place resize requires the InPlacePodVerticalScaling feature gate to be enabled on the control plane and on each kubelet.
If it is not enabled, Kubernetes falls back to the delete-and-recreate behavior.
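One way to enable it on the kubelet, as a configuration-file sketch (control-plane components take the equivalent `--feature-gates=InPlacePodVerticalScaling=true` flag; the gate's maturity stage depends on your Kubernetes version):

```yaml
# KubeletConfiguration fragment
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  InPlacePodVerticalScaling: true
```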
Resize Policy (Per Resource)
You can define resize behavior per resource type.
Resize Policy Example
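A pod sketch with a per-resource `resizePolicy`; the pod name and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
  - name: app
    image: nginx
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # resize CPU without restarting
    - resourceName: memory
      restartPolicy: RestartContainer  # memory change restarts the container
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
```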
Meaning:
- CPU change → no restart required
- Memory change → container restart required
Limitations
Warning
- CPU & memory only
- QoS cannot change
- No init/ephemeral containers
- Cannot reduce memory below usage
- No Windows support
In-Place Resize Best Practices
Success
- Test first
- Define resizePolicy
- Monitor status
- Use readiness probes
Production Rules - HPA vs VPA
Abstract
HPA → Handle spikes
VPA → Tune resources
Quote
Use HPA for demand scaling
Use VPA for right-sizing
Use In-Place Resize to reduce disruption