9.4 etcd in High Availability

Abstract

etcd is the distributed key-value store used by Kubernetes to store all cluster state.

In a High Availability (HA) Kubernetes cluster, etcd must also be highly available because losing etcd means losing the source of truth for the cluster.

What is etcd?

etcd is a:

distributed key-value store
reliable datastore
strongly consistent system
secure and fast backend for Kubernetes state

Kubernetes stores critical cluster data in etcd, including:

Data Type	Examples
Workloads	Pods, Deployments, ReplicaSets
Networking	Services, Endpoints, NetworkPolicies
Security	Secrets, ServiceAccounts, RBAC
Configuration	ConfigMaps, API objects
Cluster state	Nodes, leases, controllers, scheduler state

Danger

If etcd data is lost and no backup exists, the Kubernetes cluster may not be recoverable.

Why etcd Needs HA

A single etcd node is a single point of failure.

If one etcd server stores all cluster state and it fails:

the Kubernetes API server cannot serve requests
cluster state cannot be read or updated
controllers cannot reconcile resources
new Pods cannot be scheduled
workloads that are already running keep running, but nothing can be changed
recovery depends on backup availability

Success

In production, etcd should run as a cluster with multiple members.

Distributed etcd Cluster

In HA mode, etcd runs on multiple servers.

Each etcd member stores a copy of the same data.

etcd-1  ←→  etcd-2  ←→  etcd-3

All members cooperate to keep data consistent.

Note

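Any etcd member can accept a read request. By default reads are linearizable, so the member confirms through the leader that its data is current before answering. Writes are always coordinated through the leader.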

Consistency in etcd

etcd provides strong consistency.

That means:

all members agree on the same data
writes are replicated before being committed
clients get reliable cluster state
Kubernetes does not see conflicting data

Client writes key=name value=John
        ↓
Leader accepts write
        ↓
Followers replicate write
        ↓
Write is committed after quorum

Tip

Strong consistency is important because Kubernetes controllers depend on accurate cluster state.

Read and Write Behavior

Reads

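Any member can accept a read. By default etcd serves linearizable reads: the member first confirms through the leader that its data is up to date, so the result reflects all committed writes. Serializable reads (etcdctl get --consistency=s) are answered from the member's local data without that check; they are faster but may return stale data.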

Writes

Writes are handled by the etcd leader.

If a write request reaches a follower:

follower forwards the request to the leader
leader processes the write
leader replicates the write to followers
write is committed after quorum is reached

Note

etcd may receive write requests on any member, but internally the leader coordinates the write.
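The two read modes can be tried from the command line. The following is a minimal sketch, assuming etcdctl v3 is on the PATH and that endpoint and TLS options are supplied separately; the function names are illustrative and not part of etcd.

```bash
# Linearizable read (etcdctl's default): the answer reflects every write
# committed before the read, so it only works while quorum is available.
read_linearizable() {
  ETCDCTL_API=3 etcdctl get "$1" --consistency=l
}

# Serializable read: answered from the contacted member's local data.
# It does not need quorum, but the value may be stale.
read_serializable() {
  ETCDCTL_API=3 etcdctl get "$1" --consistency=s
}
```

For example, read_serializable name can still answer on a member that is cut off from the leader, whereas read_linearizable name would time out there.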

Leader and Followers

An etcd cluster has:

Role	Description
Leader	Handles writes and coordinates replication
Followers	Replicate data and participate in quorum

Example:

etcd-1  → Leader
etcd-2  → Follower
etcd-3  → Follower

If the leader fails, the remaining members elect a new leader.

Warning

If quorum is lost, etcd stops accepting writes (and linearizable reads) until a majority of members can reach each other again.

Raft Consensus Protocol

etcd uses the Raft consensus algorithm.

Raft is responsible for:

leader election
log replication
quorum-based writes
failover when the leader is unavailable

Leader election flow:

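members start as followers, without a leader
each member starts a randomized election timer
the member whose timer expires first becomes a candidate and requests votes
a candidate that receives votes from a majority becomes the leader
the leader sends periodic heartbeats, which reset the followers' timers
if heartbeats stop, a follower's timer expires and a new election starts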

Abstract

Raft helps etcd members agree on one correct cluster state.

Write Replication Flow

When a write happens:

Write request
    ↓
Leader receives request
    ↓
Leader sends update to followers
    ↓
Majority confirms
    ↓
Write is committed
    ↓
Clients can read the committed value

Example

In a 3-member etcd cluster, a write is committed when at least 2 members agree.
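The majority rule in this flow can be sketched as a small calculation. The function below is a simplified illustration, not etcd's implementation: it takes the last log index stored on each member (leader included) and returns the highest index that a majority already holds. Real Raft also requires the entry to belong to the leader's current term, which this sketch ignores. The name commit_index is an assumption made for the example.

```bash
# Highest log index that is stored on a majority of members.
# Arguments: the last replicated log index of each member, leader included.
commit_index() {
  members=$#
  majority=$(( members / 2 + 1 ))
  # Sort indexes from highest to lowest; the value at position "majority"
  # is present on at least that many members.
  printf '%s\n' "$@" | sort -rn | sed -n "${majority}p"
}

commit_index 10 10 7   # prints 10: two of three members hold index 10
commit_index 10 8 7    # prints 8: only the leader holds 9 and 10
```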

Quorum

Quorum is the minimum number of etcd members required for the cluster to make decisions.

Formula:

Quorum = floor(N/2) + 1

Where N is the number of etcd members and floor rounds down to the nearest whole number.

etcd Members	Quorum	Fault Tolerance
1	1	0
2	2	0
3	2	1
4	3	1
5	3	2
6	4	2
7	4	3

Note

Fault tolerance means how many members can fail while still keeping quorum.
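The table can be reproduced with shell arithmetic, whose integer division rounds down. This is a small sketch; the function names are chosen for illustration.

```bash
# Majority quorum for a cluster of N members: floor(N/2) + 1.
quorum() {
  # Shell arithmetic uses integer division, which gives the floor.
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while the rest still form a quorum.
fault_tolerance() {
  echo $(( $1 - $(quorum "$1") ))
}

# Print members, quorum, and fault tolerance for cluster sizes 1 through 7.
quorum_table() {
  for n in 1 2 3 4 5 6 7; do
    printf '%s\t%s\t%s\n' "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
  done
}
```

Calling quorum_table prints one tab-separated row per cluster size, which makes it easy to see that moving from 3 to 4 members raises quorum but not fault tolerance.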

Why 2 etcd Members Is Not Enough

A 2-member etcd cluster has quorum of 2.

If one member fails:

Remaining members = 1
Required quorum = 2

Result:

no quorum
writes cannot be committed
cluster becomes unhealthy

Failure

A 2-member etcd cluster gives no HA benefit: losing either member breaks quorum, so it is less available than a single member.

Odd Number of etcd Members

Odd numbers are preferred:

3 members
5 members
7 members

Why?

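Because an even-sized cluster needs a larger quorum than the odd-sized cluster one member smaller, yet tolerates no more failures. For example, 4 members tolerate 1 failure, the same as 3, while adding one more machine that can fail. An even-sized cluster can also split into two equal halves, leaving neither side with quorum.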

Success

Use an odd number of etcd members in production. Common choices are 3 or 5.

Odd vs Even Example

A 6-member cluster has quorum of 4.

If a network partition splits it into 3 + 3:

Partition A = 3 members
Partition B = 3 members
Required quorum = 4

Neither side has quorum.

Result:

cluster cannot make progress
writes fail

A 7-member cluster has quorum of 4.

If split into 4 + 3:

Partition A = 4 members → quorum exists
Partition B = 3 members → no quorum

Cluster can continue on the majority side.

Warning

Even-numbered etcd clusters lose quorum on both sides when a network partition splits the members evenly.
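The two partition examples can be checked with a short sketch. It assumes a clean two-way split in which every member lands on exactly one side; the function names are illustrative.

```bash
# Succeeds (exit status 0) if $2 members out of a $1-member cluster
# still form a majority.
has_quorum() {
  [ "$2" -ge $(( $1 / 2 + 1 )) ]
}

# Report which sides of a two-way split can keep committing writes.
# Arguments: total members, members on side A.
describe_split() {
  split_total=$1
  side_a=$2
  side_b=$(( split_total - side_a ))
  for side in "$side_a" "$side_b"; do
    if has_quorum "$split_total" "$side"; then
      echo "$side of $split_total members: quorum"
    else
      echo "$side of $split_total members: no quorum"
    fi
  done
}

describe_split 6 3   # neither side has quorum
describe_split 7 4   # only the 4-member side has quorum
```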

Recommended etcd Cluster Size

Size	Recommendation
1 member	Only for labs/dev
2 members	Avoid
3 members	Good production minimum
5 members	Better fault tolerance
7+ members	Usually unnecessary; each extra member adds replication overhead to writes

Tip

For most production Kubernetes clusters, 3 or 5 etcd members are enough.

etcd Topologies in Kubernetes

There are two common Kubernetes HA etcd topologies:

Stacked etcd topology
External etcd topology

Stacked etcd Topology

In stacked topology, etcd runs on the same nodes as the control plane.

master-1
  ├── kube-apiserver
  ├── controller-manager
  ├── scheduler
  └── etcd

master-2
  ├── kube-apiserver
  ├── controller-manager
  ├── scheduler
  └── etcd

master-3
  ├── kube-apiserver
  ├── controller-manager
  ├── scheduler
  └── etcd

Advantages

easier to set up
fewer servers
easier to manage
common with kubeadm HA clusters

Disadvantages

losing a control plane node also loses an etcd member
lower failure isolation
more risk during node failures

Warning

Stacked topology is simpler but couples control plane failure with etcd failure.

External etcd Topology

In external etcd topology, etcd runs on dedicated nodes separate from the control plane.

Control Plane Nodes
  ├── kube-apiserver
  ├── controller-manager
  └── scheduler

External etcd Nodes
  ├── etcd-1
  ├── etcd-2
  └── etcd-3

Advantages

better failure isolation
safer for critical production clusters
control plane node failure does not remove etcd member
easier to scale or maintain etcd separately

Disadvantages

harder to set up
more servers
more certificates
more networking and operational complexity

Success

External etcd is preferred for strict production environments where cluster state must be isolated and protected.

Stacked vs External etcd

Area	Stacked etcd	External etcd
Complexity	Lower	Higher
Server count	Fewer	More
Failure isolation	Lower	Higher
Cost	Lower	Higher
Operational effort	Easier	Harder
Production safety	Good	Better

etcd Ports

etcd uses two important ports:

Port	Purpose
`2379`	Client communication, used by kube-apiserver
`2380`	Peer communication between etcd members

Note

In HA etcd, firewall rules must allow kube-apiserver to reach port 2379 on every etcd member, and every etcd member to reach port 2380 on its peers.

Installing etcd Manually

Typical manual setup steps:

wget -q --https-only \
  "https://github.com/etcd-io/etcd/releases/download/v3.3.9/etcd-v3.3.9-linux-amd64.tar.gz"

tar -xvf etcd-v3.3.9-linux-amd64.tar.gz

mv etcd-v3.3.9-linux-amd64/etcd* /usr/local/bin/

mkdir -p /etc/etcd /var/lib/etcd

cp ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/

Warning

Use an etcd version that is supported by your Kubernetes release. Do not upgrade etcd in production without checking compatibility and taking a backup first.

etcd Service Configuration

Example systemd-style etcd configuration:

ExecStart=/usr/local/bin/etcd \
  --name=${ETCD_NAME} \
  --cert-file=/etc/etcd/kubernetes.pem \
  --key-file=/etc/etcd/kubernetes-key.pem \
  --peer-cert-file=/etc/etcd/kubernetes.pem \
  --peer-key-file=/etc/etcd/kubernetes-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem \
  --peer-trusted-ca-file=/etc/etcd/ca.pem \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls=https://${INTERNAL_IP}:2380 \
  --listen-peer-urls=https://${INTERNAL_IP}:2380 \
  --listen-client-urls=https://${INTERNAL_IP}:2379,https://127.0.0.1:2379 \
  --advertise-client-urls=https://${INTERNAL_IP}:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=master-1=https://${MASTER1_IP}:2380,master-2=https://${MASTER2_IP}:2380,master-3=https://${MASTER3_IP}:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd

Note

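The --initial-cluster flag lists the name and peer URL of every initial member. Each name must match that member's --name value, and every member must be started with the same list.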
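Because the --initial-cluster value must be identical on every member, it can help to generate it once rather than type it by hand. The helper below is a sketch; the function name and the example IP addresses are assumptions, and it hard-codes https and peer port 2380 as used in the configuration above.

```bash
# Build the --initial-cluster value from name=ip pairs.
# Each name must equal the --name of the corresponding member.
initial_cluster() {
  result=""
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    # Prefix a comma only when result already holds an entry.
    result="${result:+$result,}${name}=https://${ip}:2380"
  done
  echo "$result"
}

initial_cluster master-1=10.240.0.10 master-2=10.240.0.11 master-3=10.240.0.12
```

The output can be stored in a variable and passed unchanged to --initial-cluster on each node.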

Important etcd Flags

Flag	Purpose
`--name`	Unique etcd member name
`--data-dir`	etcd data directory
`--listen-client-urls`	Client listener URL
`--advertise-client-urls`	Client URL advertised to clients
`--listen-peer-urls`	Peer listener URL
`--initial-advertise-peer-urls`	Peer URL advertised to members
`--initial-cluster`	List of all initial etcd members
`--initial-cluster-state`	`new` for new cluster, `existing` for joining
`--cert-file`	TLS certificate for client traffic
`--peer-cert-file`	TLS certificate for peer traffic

Using etcdctl

etcdctl is the CLI tool used to interact with etcd.

Set API version:

export ETCDCTL_API=3

Put a key:

etcdctl put name John

Get a key:

etcdctl get name

List keys:

etcdctl get / --prefix --keys-only

Tip

In Kubernetes troubleshooting, etcdctl is commonly used for snapshots, member checks, and health checks.

Check etcd Cluster Health

Check endpoint health:

ETCDCTL_API=3 etcdctl endpoint health --cluster

Check endpoint status:

ETCDCTL_API=3 etcdctl endpoint status --cluster -w table

List members:

ETCDCTL_API=3 etcdctl member list

Note

In clusters secured with TLS, add the --cacert, --cert, and --key options, and --endpoints if etcd is not listening on the default address.

Example with certificates:

ETCDCTL_API=3 etcdctl endpoint health --cluster \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
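Repeating the TLS flags on every command is error-prone. etcdctl also reads its options from ETCDCTL_-prefixed environment variables, so they can be set once per shell session. The sketch below assumes the kubeadm certificate paths used above; the function name is illustrative.

```bash
# Export connection settings so later etcdctl commands need no TLS flags.
# Optional argument: comma-separated endpoint list.
etcdctl_env() {
  export ETCDCTL_API=3
  export ETCDCTL_ENDPOINTS="${1:-https://127.0.0.1:2379}"
  # Paths assume a kubeadm-managed control plane node.
  export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
  export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
  export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
}
```

After running etcdctl_env, a plain etcdctl endpoint health --cluster uses these settings. Avoid passing the matching flags as well, since etcdctl may reject a setting supplied both ways.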

etcd Backup

Back up etcd regularly.

ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Verify snapshot:

ETCDCTL_API=3 etcdctl snapshot status snapshot.db

Danger

etcd backups must be tested. A backup without a tested restore process is not a reliable recovery plan.
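A regular backup is easier to keep up when it is wrapped in a script. The following is a minimal sketch, assuming the kubeadm certificate paths shown above; the function name and the default backup directory are hypothetical and should be adapted.

```bash
# Save a timestamped snapshot and check that it can be read back.
# Optional argument: target directory (the default is a hypothetical path).
backup_etcd() {
  backup_dir=${1:-/var/backups/etcd}
  snapshot="$backup_dir/snapshot-$(date +%Y%m%d-%H%M%S).db"
  mkdir -p "$backup_dir" || return 1

  ETCDCTL_API=3 etcdctl snapshot save "$snapshot" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key >&2 || return 1

  # Print hash, revision, key count, and size; fails on an unreadable file.
  ETCDCTL_API=3 etcdctl snapshot status "$snapshot" -w table >&2 || return 1

  # Only the snapshot path goes to stdout so callers can capture it.
  echo "$snapshot"
}
```

A cron job or systemd timer could call backup_etcd and copy the resulting file off the node. The status check confirms the file is readable, but it does not replace a real restore test.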

Kubernetes API Server and etcd

The kube-apiserver talks to etcd using the --etcd-servers option.

Example:

kube-apiserver \
  --etcd-servers=https://10.240.0.10:2379,https://10.240.0.11:2379,https://10.240.0.12:2379 \
  --etcd-cafile=/var/lib/kubernetes/ca.pem \
  --etcd-certfile=/var/lib/kubernetes/apiserver-etcd-client.crt \
  --etcd-keyfile=/var/lib/kubernetes/apiserver-etcd-client.key

Note

The kube-apiserver should be configured with multiple etcd endpoints for HA.

Production Best Practices

Recommended

Use 3 or 5 etcd members
Use an odd number of etcd members
Enable TLS for client and peer communication
Keep etcd behind private networking
Do not expose etcd publicly
Use fast disks, preferably SSD
Monitor etcd latency, leader changes, and disk usage
Take regular snapshots
Test snapshot restore
Keep etcd members close in network latency
Avoid running heavy workloads on etcd nodes
Use external etcd for critical production clusters when possible

Do's

Use odd-numbered etcd clusters
Use at least 3 members for HA
Protect etcd with TLS
Restrict etcd client access (port 2379) to the API servers
Monitor quorum and leader status
Back up etcd before upgrades
Test restore in a non-production environment
Use reliable disks and stable networking

Don'ts

Don't use 2 etcd members for production HA
Don't expose port 2379 publicly
Don't ignore quorum requirements
Don't run etcd on slow or unstable disks
Don't manually edit etcd data unless absolutely necessary
Don't upgrade etcd without checking Kubernetes compatibility
Don't skip backup validation
Don't place all etcd members in the same failure domain

Failure

etcd HA depends on quorum. Adding more nodes does not help if they are poorly distributed or frequently partitioned.

Common Failure Scenarios

Failure	Impact	Mitigation
One etcd member fails in 3-node cluster	Cluster continues	Replace failed member quickly
Two etcd members fail in 3-node cluster	Quorum lost	Restore member or recover from backup
Network partition	Minority side stops processing writes	Use odd number and stable networking
Disk latency increases	API server may become slow	Use SSD and monitor disk I/O
etcd certificate expires	API server cannot connect	Monitor and rotate certificates
Backup missing	Recovery becomes difficult	Schedule and test snapshots
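Certificate expiry from the table above is one failure that can be caught early. The sketch below uses the openssl x509 -checkend option; the function name and the 30-day threshold in the usage note are assumptions for illustration.

```bash
# Succeeds (exit status 0) if the certificate in $1 expires within $2 days.
# Any openssl error, such as an unreadable file, also counts as "needs attention".
cert_expires_within() {
  cert=$1
  days=$2
  # "openssl x509 -checkend N" exits 0 when the certificate is still valid
  # N seconds from now, and non-zero when it will have expired by then.
  if openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

On a kubeadm node this could be run in a loop over /etc/kubernetes/pki/etcd/*.crt, printing a warning for every file where cert_expires_within "$file" 30 succeeds.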

Troubleshooting Commands

Check etcd pods in kubeadm clusters:

kubectl get pods -n kube-system | grep etcd

Check etcd static pod manifest:

cat /etc/kubernetes/manifests/etcd.yaml

Check etcd logs:

kubectl logs -n kube-system etcd-<node-name>

Check members:

ETCDCTL_API=3 etcdctl member list

Check endpoint status:

ETCDCTL_API=3 etcdctl endpoint status --cluster -w table

Check endpoint health:

ETCDCTL_API=3 etcdctl endpoint health --cluster

Check kube-apiserver etcd config:

grep etcd /etc/kubernetes/manifests/kube-apiserver.yaml
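When only the endpoint list is needed, for example to pass it to etcdctl --endpoints, the value can be pulled out of the manifest directly. This is a sketch that assumes the flag appears on one line as --etcd-servers=<list>; the function name is illustrative.

```bash
# Print the etcd endpoints configured for kube-apiserver, one per line.
# Optional argument: path to the manifest (defaults to the kubeadm location).
apiserver_etcd_endpoints() {
  manifest=${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}
  # Keep only the text after "--etcd-servers=", then split on commas.
  sed -n 's/.*--etcd-servers=//p' "$manifest" | tr ',' '\n'
}
```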

Exam and Practical Notes

CKA Focus

For certification-style tasks, focus on:

identifying etcd endpoints
checking etcd health
taking etcd snapshots
restoring from etcd snapshots
understanding quorum
knowing why an odd number of members is recommended
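Restoring is the one task in this list without a command elsewhere in this section, so here is a minimal sketch for a single-member restore. It assumes etcdctl v3 with the snapshot restore subcommand; the function name and example paths are hypothetical.

```bash
# Restore a snapshot into a new, empty data directory.
# Arguments: snapshot file, new data directory.
restore_etcd_snapshot() {
  snapshot=$1
  new_data_dir=$2
  # Never restore over an existing directory; keep the old data untouched.
  if [ -e "$new_data_dir" ]; then
    echo "refusing to restore into existing path: $new_data_dir" >&2
    return 1
  fi
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir="$new_data_dir"
}
```

A typical call would be restore_etcd_snapshot snapshot.db /var/lib/etcd-restore. On a kubeadm node, etcd then has to be pointed at the new directory, usually by changing the data volume path in /etc/kubernetes/manifests/etcd.yaml. Multi-member clusters need extra restore flags such as --name, --initial-cluster, and --initial-advertise-peer-urls, and newer etcd releases provide the restore command through etcdutl, so check the documentation for the version in use.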

Quick Reference

Task	Command
Put key	`etcdctl put name John`
Get key	`etcdctl get name`
List keys	`etcdctl get / --prefix --keys-only`
Member list	`etcdctl member list`
Endpoint health	`etcdctl endpoint health --cluster`
Endpoint status	`etcdctl endpoint status --cluster -w table`
Snapshot save	`etcdctl snapshot save snapshot.db`
Snapshot status	`etcdctl snapshot status snapshot.db`

Summary

Quote

etcd stores all Kubernetes cluster state
etcd HA depends on Raft consensus and quorum
Writes are coordinated by a leader
A majority of members must agree before writes are committed
Avoid 2-member etcd clusters
Use odd-numbered clusters, usually 3 or 5 members
Use stacked etcd for simpler setup
Use external etcd for stronger production isolation
Always secure, monitor, back up, and test restore for etcd