9.3 Configure High Availability in Kubernetes
Abstract
High Availability (HA) in Kubernetes removes single points of failure from the cluster control plane.
In production, HA is required because losing a single control plane node should not stop cluster management, scheduling, controller reconciliation, or API access.
Why High Availability Is Needed
If a single control plane node fails:
- existing pods may continue running on worker nodes
- users may still access running applications
- kubectl access may fail because the API server is unavailable
- failed pods may not be recreated
- new pods may not be scheduled
- controllers cannot reconcile desired state
- cluster operations become unavailable
Warning
Running applications may survive a control plane failure temporarily, but the cluster cannot properly heal, scale, schedule, or accept management requests until the control plane is restored.
What HA Protects
A production HA Kubernetes design should provide redundancy for:
| Area | Why It Matters |
|---|---|
| Control plane nodes | Avoid losing cluster management |
| API servers | Keep kubectl and API access available |
| Controller managers | Keep reconciliation running |
| Schedulers | Keep pod scheduling available |
| etcd | Protect cluster state |
| Worker nodes | Keep workloads available |
| Load balancer | Provide a stable API endpoint |
Success
A highly available cluster avoids a single point of failure across both control plane and worker components.
Control Plane Components in HA
A control plane node commonly runs:
- kube-apiserver
- kube-controller-manager
- kube-scheduler
- etcd
In an HA setup, these components run on multiple control plane nodes.
Control Plane Node 1
├── kube-apiserver
├── kube-controller-manager
├── kube-scheduler
└── etcd
Control Plane Node 2
├── kube-apiserver
├── kube-controller-manager
├── kube-scheduler
└── etcd
API Server High Availability
The kube-apiserver can run in active-active mode.
That means multiple API servers can be running at the same time.
kubectl / API clients
↓
Load Balancer :6443
↓
┌───────────────┬───────────────┐
│ API Server 1 │ API Server 2 │
└───────────────┴───────────────┘
Note
API servers are stateless and keep all cluster state in etcd, so multiple instances can safely serve requests at the same time.
API Server Load Balancer
With multiple control plane nodes, clients should not connect directly to one master node.
Instead, configure a load balancer in front of API servers.
Example endpoint:
Your kubeconfig should point to the load balancer:
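As an illustration, the sketch below writes a minimal kubeconfig cluster entry whose server field is the load balancer address rather than an individual control plane node. The DNS name k8s-api.example.internal, the cluster name ha-cluster, and the CA path are hypothetical placeholders, not values taken from this guide.

```shell
# Hypothetical load balancer endpoint; replace with your own DNS name or VIP.
API_ENDPOINT="https://k8s-api.example.internal:6443"

# Write a minimal cluster entry to a scratch file for inspection.
KUBECONFIG_SKETCH="$(mktemp)"
cat > "$KUBECONFIG_SKETCH" <<EOF
apiVersion: v1
kind: Config
clusters:
- name: ha-cluster            # hypothetical cluster name
  cluster:
    server: ${API_ENDPOINT}   # the load balancer, not a single node
    certificate-authority: /etc/kubernetes/pki/ca.crt   # assumed CA path
EOF

# Show which API endpoint clients would use.
grep 'server:' "$KUBECONFIG_SKETCH"
```

An existing kubeconfig can be pointed at the same address with kubectl config set-cluster ha-cluster --server=https://k8s-api.example.internal:6443.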
Tip
Use a highly available load balancer such as HAProxy, NGINX, a cloud load balancer, or a virtual IP solution.
Controller Manager and Scheduler HA
The kube-controller-manager and kube-scheduler must not actively perform the same work at the same time.
They run in active-standby mode using leader election.
| Component | HA Mode | Reason |
|---|---|---|
| kube-apiserver | Active-active | Can process independent requests |
| kube-controller-manager | Active-standby | Prevent duplicate reconciliation |
| kube-scheduler | Active-standby | Prevent duplicate scheduling |
| etcd | Distributed quorum | Protect cluster state |
Warning
If multiple schedulers or controller managers act as leaders at the same time, duplicate or conflicting actions may occur.
Leader Election
Leader election ensures that only one instance of the controller manager and one instance of the scheduler is active at a time.
Typical flow:
- All instances start
- Each tries to acquire a lease
- One becomes the leader
- Others remain standby
- If the leader fails, another instance takes over
Example controller manager options:
kube-controller-manager \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s
Note
--leader-elect defaults to true in kube-controller-manager and kube-scheduler, so leader election is active unless it is explicitly disabled.
Leader Election Timing
| Option | Purpose | Common Default |
|---|---|---|
| --leader-elect | Enables leader election | true |
| --leader-elect-lease-duration | How long standby instances wait after the last observed renewal before trying to take over the lease | 15s |
| --leader-elect-renew-deadline | How long the active leader keeps trying to renew the lease before it stops leading | 10s |
| --leader-elect-retry-period | How long instances wait between attempts to acquire or renew the lease | 2s |
Tip
Do not tune leader election values casually in production. Incorrect values can cause unnecessary failovers or slow recovery.
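The relationship between the timing options above can be sketched as a small consistency check plus a rough failover estimate. The function names are hypothetical, the 1.2 jitter factor is an assumption based on the client-go leader election library, and the estimate ignores clock skew and API latency, so treat the result as an approximation only.

```shell
# Values in whole seconds, matching the common defaults listed above.
LEASE_DURATION=15
RENEW_DEADLINE=10
RETRY_PERIOD=2

# Hypothetical helper: succeeds only if the timings are ordered sensibly.
# Assumes a jitter factor of 1.2 (written as 12/10 to stay in integer math).
check_leader_election_timing() {
  lease=$1; renew=$2; retry=$3
  [ "$lease" -gt "$renew" ] && [ $((renew * 10)) -gt $((retry * 12)) ]
}

# Hypothetical helper: rough worst-case seconds before a standby takes over
# after the leader stops renewing (the lease must expire, then the next retry).
estimate_failover_seconds() {
  echo $(( $1 + $2 ))
}

if check_leader_election_timing "$LEASE_DURATION" "$RENEW_DEADLINE" "$RETRY_PERIOD"; then
  echo "timings ok, failover within roughly $(estimate_failover_seconds "$LEASE_DURATION" "$RETRY_PERIOD")s"
else
  echo "timings inconsistent"
fi
```

With the defaults, a standby should take over within roughly 17 seconds of the leader disappearing. Shortening the lease speeds up recovery but makes spurious failovers more likely when the API server is slow.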
etcd in HA
etcd stores all Kubernetes cluster state.
Examples of data stored in etcd:
- nodes
- pods
- deployments
- services
- secrets
- configmaps
- RBAC objects
- cluster configuration
Danger
If etcd data is lost and no backup exists, the Kubernetes cluster state may be unrecoverable.
etcd Access from API Server
The kube-apiserver is the only Kubernetes control plane component that directly communicates with etcd.
Example API server configuration:
kube-apiserver \
  --etcd-servers=https://10.240.0.10:2379,https://10.240.0.11:2379,https://10.240.0.12:2379 \
  --etcd-cafile=/var/lib/kubernetes/ca.pem \
  --etcd-certfile=/var/lib/kubernetes/apiserver-etcd-client.crt \
  --etcd-keyfile=/var/lib/kubernetes/apiserver-etcd-client.key
Note
The API server can connect to any healthy etcd member from the configured list.
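A small sketch of how the --etcd-servers value could be assembled from a list of member addresses, so that every API server receives the same complete list. The helper name etcd_servers_flag is hypothetical; the IP addresses and client port 2379 are the ones used in the example above.

```shell
# Hypothetical helper: join etcd member IPs into an --etcd-servers value.
etcd_servers_flag() {
  list=""
  for ip in "$@"; do
    # Prefix a comma only when the list already has an entry.
    list="${list:+$list,}https://${ip}:2379"
  done
  printf '%s\n' "--etcd-servers=${list}"
}

etcd_servers_flag 10.240.0.10 10.240.0.11 10.240.0.12
```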
HA Control Plane Topologies
There are two common HA control plane topologies:
- Stacked etcd topology
- External etcd topology
Stacked etcd Topology
In a stacked topology, each control plane node also runs an etcd member.
Control Plane Node 1
├── kube-apiserver
├── kube-controller-manager
├── kube-scheduler
└── etcd
Control Plane Node 2
├── kube-apiserver
├── kube-controller-manager
├── kube-scheduler
└── etcd
Advantages
- easier to set up
- easier to manage
- fewer servers required
- common for smaller HA clusters
Disadvantages
- control plane and etcd fail together on the same node
- losing a node reduces both API capacity and etcd fault tolerance
- higher risk during failures
Warning
Stacked topology is simpler, but a failed node removes both a control plane instance and an etcd member.
External etcd Topology
In an external etcd topology, etcd runs on separate dedicated nodes.
Control Plane Nodes
├── kube-apiserver
├── kube-controller-manager
└── kube-scheduler
External etcd Nodes
└── etcd cluster
Advantages
- less risky for etcd availability
- control plane failure does not directly remove etcd members
- better separation of responsibilities
- preferred for stronger production isolation
Disadvantages
- harder to set up
- requires more servers
- more certificates and networking configuration
- more operational complexity
Success
External etcd topology is safer for critical production clusters because etcd is isolated from control plane node failures.
Stacked vs External etcd
| Area | Stacked etcd | External etcd |
|---|---|---|
| Setup complexity | Lower | Higher |
| Server count | Fewer | More |
| Management effort | Easier | Harder |
| Failure isolation | Lower | Higher |
| Production safety | Medium | Higher |
| Best for | Small/medium HA clusters | Critical production clusters |
Recommended Production Design
A simple production HA design includes:
Users / kubectl / API clients
↓
Control Plane Load Balancer
↓
┌────────────────┬────────────────┐
│ master-01 │ master-02 │
│ API Server │ API Server │
│ Controller Mgr │ Controller Mgr │
│ Scheduler │ Scheduler │
│ etcd │ etcd │
└────────────────┴────────────────┘
↓
┌────────────────┬────────────────┐
│ worker-01 │ worker-02 │
└────────────────┴────────────────┘
Note
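The diagram shows two control plane nodes for brevity. For production, use three or more control plane nodes and an odd number of etcd members, because a two-member etcd cluster loses quorum when either member fails.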
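If the cluster is built with kubeadm (an assumption; this section does not prescribe an installer), the load balancer address is typically supplied when the first control plane node is initialized. The sketch below only prints the command rather than running it. The endpoint k8s-api.example.internal:6443 and the helper name are hypothetical.

```shell
# Hypothetical load balancer address (DNS name or VIP plus port).
CONTROL_PLANE_ENDPOINT="k8s-api.example.internal:6443"

# Hypothetical helper: print the kubeadm command for the first control plane node.
# --control-plane-endpoint sets the shared API address used by all nodes;
# --upload-certs shares control plane certificates with nodes that join later.
print_kubeadm_init() {
  printf 'kubeadm init --control-plane-endpoint "%s" --upload-certs\n' "$1"
}

print_kubeadm_init "$CONTROL_PLANE_ENDPOINT"
```

Additional control plane nodes would then join with the kubeadm join command printed by kubeadm init, using its --control-plane flag.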
Minimum Production HA Considerations
| Component | Recommendation |
|---|---|
| Control plane nodes | At least 3 for stronger HA |
| etcd members | Odd number, commonly 3 or 5 |
| API endpoint | Load balancer or virtual IP |
| Worker nodes | Multiple workers across failure zones |
| etcd backup | Scheduled and tested |
| Certificates | Properly managed and rotated |
| Monitoring | Required for API server, etcd, nodes, and controllers |
Danger
A two-member etcd cluster needs both members for quorum, so it tolerates no failures. Prefer an odd number of etcd members, commonly 3 or 5.
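The quorum arithmetic behind this recommendation can be sketched as follows: a cluster of n members needs floor(n/2) + 1 members to agree, and tolerates n minus that many failures. The function names are hypothetical.

```shell
# Members required for quorum: floor(n / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while quorum is still reachable.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print a small summary for common cluster sizes.
for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Two members tolerate no failures, and four members tolerate no more failures than three do, which is why odd sizes are preferred.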
Load Balancer Best Practices
Use a stable API endpoint:
The load balancer should:
- check API server health
- forward TCP traffic on port 6443
- avoid sending traffic to failed control plane nodes
- be highly available itself
- use DNS or VIP for a stable endpoint
Example HAProxy-style backend concept:
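# TCP passthrough: TLS is terminated by the API servers, not by HAProxy
frontend kubernetes-api
    bind *:6443
    mode tcp
    default_backend kube-apiserver
# "check" enables health checks so failed nodes are taken out of rotation
backend kube-apiserver
    mode tcp
    balance roundrobin
    server master1 master1:6443 check
    server master2 master2:6443 check
    server master3 master3:6443 check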
Tip
The API load balancer is part of the control plane availability design. Do not make it a new single point of failure.
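A load balancer health check can be approximated from the command line. The sketch below assumes curl is installed and that the API servers expose the /readyz endpoint to unauthenticated clients, which is common but not guaranteed. The helper names are hypothetical; the host names in the usage note are the ones from the HAProxy example.

```shell
# Hypothetical helper: succeed if an API server answers /readyz with HTTP 2xx.
# -k skips certificate verification for brevity; prefer --cacert with the cluster CA.
apiserver_ready() {
  curl -kfs --max-time 3 -o /dev/null "https://$1:6443/readyz"
}

# Hypothetical helper: print the status of each backend passed as an argument.
report_backends() {
  for host in "$@"; do
    if apiserver_ready "$host"; then
      echo "$host ready"
    else
      echo "$host NOT ready"
    fi
  done
}
```

Calling report_backends master1 master2 master3 prints one line per backend, which is a quick way to compare what the load balancer should see with what it actually routes to.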
etcd Backup Best Practices
For self-managed clusters, back up etcd regularly.
Example command:
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
Verify snapshot:
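One way to inspect a snapshot is the snapshot status subcommand, which reports its hash, revision, key count, and size. The sketch below wraps it in a hypothetical helper and assumes that either etcdutl (shipped with etcd 3.5 and later) or etcdctl is on the PATH. The file name snapshot.db matches the save command above.

```shell
# Hypothetical helper: print metadata for an etcd snapshot file.
verify_snapshot() {
  if [ ! -f "$1" ]; then
    echo "snapshot file not found: $1" >&2
    return 1
  fi
  if command -v etcdutl >/dev/null 2>&1; then
    # etcd 3.5+ moved offline snapshot operations to etcdutl.
    etcdutl snapshot status "$1" --write-out=table
  else
    ETCDCTL_API=3 etcdctl snapshot status "$1" --write-out=table
  fi
}
```

Run it as verify_snapshot snapshot.db. A readable status only shows that the file is intact; it does not replace a full restore test.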
Warning
Backups are only useful if restore procedures are tested.
Production Best Practices
Recommended
- Use multiple control plane nodes
- Place a load balancer in front of API servers
- Use leader election for scheduler and controller manager
- Use an odd number of etcd members
- Back up etcd regularly
- Test etcd restore procedures
- Spread nodes across availability zones when possible
- Monitor API server, etcd, scheduler, and controller manager
- Keep certificates and kubeconfigs secure
- Avoid running user workloads on control plane nodes
- Use infrastructure as code for repeatable HA setup
Do's
- Use HA for production clusters
- Use a stable API endpoint in kubeconfig
- Use multiple API servers behind a load balancer
- Use at least three etcd members for reliable quorum
- Monitor leader election and failover behavior
- Protect etcd with TLS
- Schedule etcd backups
- Test failure scenarios before production rollout
Don'ts
- Don't rely on a single control plane node in production
- Don't point kubeconfig directly to one master node
- Don't run etcd without backups
- Don't use even-numbered etcd clusters for critical production
- Don't expose etcd publicly
- Don't disable leader election
- Don't make the load balancer a single point of failure
- Don't run production workloads on control plane nodes unless explicitly designed
Failure
HA is not just “adding another master.” You must also design API access, leader election, etcd quorum, backups, and failure recovery.
Common Failure Scenarios
| Failure | Impact | HA Mitigation |
|---|---|---|
| One API server fails | API requests still work | Load balancer routes to healthy API servers |
| Active scheduler fails | New pod scheduling pauses briefly | Standby scheduler becomes leader |
| Active controller manager fails | Reconciliation pauses briefly | Standby controller manager becomes leader |
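| One worker fails | Pods on that worker become unavailable | Workload controllers such as ReplicaSets recreate pods on healthy workers |
| One etcd member fails | Cluster state stays available while quorum remains | Run three or five etcd members so quorum survives one or two failures |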
| Load balancer fails | API access may fail | HA load balancer or VIP |
Troubleshooting Commands
Check nodes:
Check control plane pods:
Check component endpoints:
Check leader election leases:
Describe a lease:
kubectl describe lease kube-controller-manager -n kube-system
kubectl describe lease kube-scheduler -n kube-system
Check API server static pod manifest:
Check etcd members:
Check etcd health:
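The checks above could be run as follows on a kubeadm-style cluster. This is a sketch under assumptions: the etcd certificate paths are the kubeadm defaults used in the backup example, the manifest path follows the kubeadm static pod layout, and both wrapper function names are hypothetical. The etcdctl commands are meant to run on a control plane node where etcdctl and the certificates are available.

```shell
# Hypothetical wrapper: etcdctl with kubeadm-default TLS settings (assumed paths).
etcdctl_kubeadm() {
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    "$@"
}

# Hypothetical wrapper: run the HA checks in order.
ha_checks() {
  kubectl get nodes -o wide                          # nodes
  kubectl get pods -n kube-system -o wide            # control plane pods
  kubectl get endpoints kubernetes -n default        # API server endpoints
  kubectl get leases -n kube-system                  # leader election leases
  cat /etc/kubernetes/manifests/kube-apiserver.yaml  # API server static pod manifest
  etcdctl_kubeadm member list --write-out=table      # etcd members
  etcdctl_kubeadm endpoint health --cluster          # etcd health
}
```

Each command can also be run on its own; the wrapper only fixes the order so that node and pod status are checked before etcd.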
Tip
In kubeadm clusters, control plane components usually run as static pods under /etc/kubernetes/manifests.
Summary
Quote
- HA removes single points of failure from the Kubernetes control plane
- API servers run active-active behind a load balancer
- Scheduler and controller manager run active-standby using leader election
- etcd protects cluster state and requires quorum
- Stacked topology is easier but has higher failure risk
- External etcd topology is safer but more complex
- Production clusters require backups, monitoring, secure certificates, and tested recovery