9.4 etcd in High Availability

Abstract

etcd is the distributed key-value store used by Kubernetes to store all cluster state.

In a High Availability (HA) Kubernetes cluster, etcd must also be highly available because losing etcd means losing the source of truth for the cluster.


What is etcd?

etcd is a:

  • distributed key-value store
  • reliable datastore
  • strongly consistent system
  • secure and fast backend for Kubernetes state

Kubernetes stores critical cluster data in etcd, including:

| Data Type | Examples |
|---|---|
| Workloads | Pods, Deployments, ReplicaSets |
| Networking | Services, Endpoints, NetworkPolicies |
| Security | Secrets, ServiceAccounts, RBAC |
| Configuration | ConfigMaps, API objects |
| Cluster state | Nodes, leases, controllers, scheduler state |

Danger

If etcd data is lost and no backup exists, the Kubernetes cluster may not be recoverable.


Why etcd Needs HA

A single etcd node is a single point of failure.

If one etcd server stores all cluster state and it fails:

  • Kubernetes API may stop working
  • cluster state cannot be read or updated
  • controllers cannot reconcile resources
  • new scheduling decisions may fail
  • recovery depends on backup availability

Success

In production, etcd should run as a cluster with multiple members.


Distributed etcd Cluster

In HA mode, etcd runs on multiple servers.

Each etcd member stores a copy of the same data.

etcd-1  ←→  etcd-2  ←→  etcd-3

All members cooperate to keep data consistent.

Note

You can read from any etcd member, but writes are coordinated through a leader.


Consistency in etcd

etcd provides strong consistency.

That means:

  • all members converge on the same committed data
  • a write is committed only after a majority of members have replicated it
  • clients get reliable cluster state
  • Kubernetes does not see conflicting data

Example write flow:

  1. Client writes key=name value=John
  2. Leader accepts write
  3. Followers replicate write
  4. Write is committed after quorum

Tip

Strong consistency is important because Kubernetes controllers depend on accurate cluster state.


Read and Write Behavior

Reads

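By default, etcd reads are linearizable: any member can answer a read, but it first confirms through the leader that its data is up to date, so clients never see stale state. etcd also offers serializable reads, which a member answers from its local copy without contacting the leader; these are faster but may return slightly stale data.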
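As a minimal sketch of the two read modes, the wrappers below pass etcdctl's --consistency flag (l for linearizable, which is the default, or s for serializable). The function names and the ETCDCTL override variable are illustrative assumptions, not part of etcd.

```shell
# Path to the etcdctl binary; overridable so the wrappers can be dry-run (assumption).
ETCDCTL="${ETCDCTL:-etcdctl}"

# Linearizable read: confirmed against the leader before it is answered.
get_linearizable() {
  "$ETCDCTL" get "$1" --consistency=l
}

# Serializable read: answered from the contacted member's local data, may be stale.
get_serializable() {
  "$ETCDCTL" get "$1" --consistency=s
}
```

For example, get_serializable name reads the key written earlier with etcdctl put name John without involving the leader.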

Writes

Writes are handled by the etcd leader.

If a write request reaches a follower:

  1. follower forwards the request to the leader
  2. leader processes the write
  3. leader replicates the write to followers
  4. write is committed after quorum is reached

Note

etcd may receive write requests on any member, but internally the leader coordinates the write.


Leader and Followers

An etcd cluster has:

| Role | Description |
|---|---|
| Leader | Handles writes and coordinates replication |
| Followers | Replicate data and participate in quorum |

Example:

etcd-1  → Leader
etcd-2  → Follower
etcd-3  → Follower

If the leader fails, the remaining members elect a new leader.

Warning

If quorum is lost, etcd cannot safely process writes.


Raft Consensus Protocol

etcd uses the Raft consensus algorithm.

Raft is responsible for:

  • leader election
  • log replication
  • quorum-based writes
  • failover when leader is unavailable

Leader election flow:

  1. members start without a leader
  2. each member starts an election timer
  3. one member requests votes
  4. majority votes elect a leader
  5. leader sends heartbeats
  6. if heartbeats stop, a new election starts

Abstract

Raft helps etcd members agree on one correct cluster state.
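The election steps above can be illustrated with a toy model. This is not Raft itself: it ignores terms, log comparison, and split votes, and simply assumes that the reachable member whose timer fires first becomes the candidate and that every reachable member votes for it. The function name elect_leader and the name:timeout_ms argument format are assumptions made for this sketch.

```shell
# Toy election: first argument is the configured cluster size,
# remaining arguments are "name:timeout_ms" for each reachable member.
elect_leader() {
  local cluster_size="$1"; shift
  local quorum=$(( cluster_size / 2 + 1 ))
  local votes=$#              # assumption: every reachable member grants its vote
  local candidate="" best="" entry name timeout

  # The member with the shortest election timeout requests votes first.
  for entry in "$@"; do
    name="${entry%%:*}"
    timeout="${entry##*:}"
    if [ -z "$best" ] || [ "$timeout" -lt "$best" ]; then
      best="$timeout"
      candidate="$name"
    fi
  done

  # A leader is elected only with votes from a majority of the full cluster.
  if [ "$votes" -ge "$quorum" ]; then
    echo "$candidate"
  else
    echo "no leader (votes=$votes, quorum=$quorum)"
    return 1
  fi
}

elect_leader 3 etcd-1:250 etcd-2:180 etcd-3:300   # prints etcd-2
elect_leader 3 etcd-1:250 etcd-3:300              # prints etcd-1 (etcd-2 is down)
```

With only one reachable member out of three, the function reports that no leader can be elected, which matches the quorum rule.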


Write Replication Flow

When a write happens:

  1. Client sends a write request
  2. Leader receives the request
  3. Leader sends the update to followers
  4. A majority of members confirm
  5. Write is committed
  6. Committed data is returned consistently to subsequent reads

Example

In a 3-member etcd cluster, a write is committed when at least 2 members agree.


Quorum

Quorum is the minimum number of etcd members required for the cluster to make decisions.

Formula:

Quorum = floor(N/2) + 1

Where N is the number of etcd members and floor() rounds down to the nearest integer.

| etcd Members | Quorum | Fault Tolerance |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
| 6 | 4 | 2 |
| 7 | 4 | 3 |

Note

Fault tolerance means how many members can fail while still keeping quorum.
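The quorum table can be reproduced with shell integer arithmetic, where division already rounds down. The helper names quorum and fault_tolerance are illustrative.

```shell
# Quorum: smallest majority of N members (integer division rounds down).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Fault tolerance: members that can fail while a quorum still remains.
fault_tolerance() {
  echo $(( $1 - $1 / 2 - 1 ))
}

for n in 1 2 3 4 5 6 7; do
  printf '%s members: quorum=%s, tolerates %s failure(s)\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

The output shows that going from 3 to 4 members, or from 5 to 6, raises the quorum but not the number of failures tolerated.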


Why 2 etcd Members Is Not Enough

A 2-member etcd cluster has quorum of 2.

If one member fails:

Remaining members = 1
Required quorum = 2

Result:

  • no quorum
  • writes cannot be committed
  • cluster becomes unhealthy

Failure

A 2-member etcd cluster gives no real HA benefit: it tolerates zero failures, the same as a single member, yet has twice as many machines that can fail.


Odd Number of etcd Members

Odd numbers are preferred:

  • 3 members
  • 5 members
  • 7 members

Why?

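Adding one member to an odd-sized cluster raises the quorum without raising fault tolerance: 3 and 4 members both tolerate one failure, and 5 and 6 both tolerate two. An even-sized cluster can also be split into two equal halves by a network partition, leaving neither half with quorum.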

Success

Use an odd number of etcd members in production. Common choices are 3 or 5.


Odd vs Even Example

A 6-member cluster has quorum of 4.

If a network partition splits it into 3 + 3:

Partition A = 3 members
Partition B = 3 members
Required quorum = 4

Neither side has quorum.

Result:

  • cluster cannot make progress
  • writes fail

A 7-member cluster has quorum of 4.

If split into 4 + 3:

Partition A = 4 members → quorum exists
Partition B = 3 members → no quorum

Cluster can continue on the majority side.

Warning

Even-numbered etcd clusters can lose quorum on both sides when a network partition splits them into equal halves.
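The two partition examples can be checked with a small helper that reports which side of a two-way split, if any, still holds a majority. The function name partition_outcome is an assumption for this sketch.

```shell
# Given the member counts on each side of a two-way partition,
# print which side keeps quorum: "A", "B", or "none".
partition_outcome() {
  local side_a="$1" side_b="$2"
  local quorum=$(( (side_a + side_b) / 2 + 1 ))
  if [ "$side_a" -ge "$quorum" ]; then
    echo "A"
  elif [ "$side_b" -ge "$quorum" ]; then
    echo "B"
  else
    echo "none"
  fi
}

partition_outcome 3 3   # 6 members split 3 + 3: prints none
partition_outcome 4 3   # 7 members split 4 + 3: prints A
```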


Recommended etcd Cluster Size

| Size | Recommendation |
|---|---|
| 1 member | Only for labs/dev |
| 2 members | Avoid |
| 3 members | Good production minimum |
| 5 members | Better fault tolerance |
| 7+ members | Usually unnecessary |

Tip

For most production Kubernetes clusters, 3 or 5 etcd members is enough.


etcd Topologies in Kubernetes

There are two common Kubernetes HA etcd topologies:

  1. Stacked etcd topology
  2. External etcd topology

Stacked etcd Topology

In stacked topology, etcd runs on the same nodes as the control plane.

master-1
  ├── kube-apiserver
  ├── controller-manager
  ├── scheduler
  └── etcd

master-2
  ├── kube-apiserver
  ├── controller-manager
  ├── scheduler
  └── etcd

master-3
  ├── kube-apiserver
  ├── controller-manager
  ├── scheduler
  └── etcd

Advantages

  • easier to set up
  • fewer servers
  • easier to manage
  • common with kubeadm HA clusters

Disadvantages

  • losing a control plane node also loses an etcd member
  • lower failure isolation
  • more risk during node failures

Warning

Stacked topology is simpler but couples control plane failure with etcd failure.


External etcd Topology

In external etcd topology, etcd runs on dedicated nodes separate from the control plane.

Control Plane Nodes
  ├── kube-apiserver
  ├── controller-manager
  └── scheduler

External etcd Nodes
  ├── etcd-1
  ├── etcd-2
  └── etcd-3

Advantages

  • better failure isolation
  • safer for critical production clusters
  • control plane node failure does not remove etcd member
  • easier to scale or maintain etcd separately

Disadvantages

  • harder to set up
  • more servers
  • more certificates
  • more networking and operational complexity

Success

External etcd is preferred for strict production environments where cluster state must be isolated and protected.


Stacked vs External etcd

| Area | Stacked etcd | External etcd |
|---|---|---|
| Complexity | Lower | Higher |
| Server count | Fewer | More |
| Failure isolation | Lower | Higher |
| Cost | Lower | Higher |
| Operational effort | Easier | Harder |
| Production safety | Good | Better |

etcd Ports

etcd uses two important ports:

| Port | Purpose |
|---|---|
| 2379 | Client communication, used by kube-apiserver |
| 2380 | Peer communication between etcd members |

Note

In HA etcd, firewall rules must allow client traffic on port 2379 (from the kube-apiserver instances) and peer traffic on port 2380 (between etcd members).
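One way to confirm that the ports are reachable is a plain TCP connect test. The sketch below assumes a bash shell with /dev/tcp support and the coreutils timeout command; port_open is a hypothetical helper, and the IP addresses in the usage note are placeholders.

```shell
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, running `for ip in 10.240.0.10 10.240.0.11 10.240.0.12; do port_open "$ip" 2380 && echo "$ip peer port reachable"; done` from one etcd node checks the peer port on every member. A successful connect only shows that the firewall allows the traffic; it does not verify TLS or etcd health.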


Installing etcd Manually

Typical manual setup steps:

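# Download the etcd release (v3.3.9 is only an example; use the version that matches your Kubernetes release)
wget -q --https-only \
  "https://github.com/etcd-io/etcd/releases/download/v3.3.9/etcd-v3.3.9-linux-amd64.tar.gz"

# Unpack the archive
tar -xvf etcd-v3.3.9-linux-amd64.tar.gz

# Install the etcd and etcdctl binaries (run as root or with sudo)
mv etcd-v3.3.9-linux-amd64/etcd* /usr/local/bin/

# Create the configuration and data directories
mkdir -p /etc/etcd /var/lib/etcd

# Copy the CA certificate and the TLS certificate/key pair used by etcd
cp ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/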

Warning

Use an etcd version compatible with your Kubernetes release. Do not upgrade etcd in production without checking compatibility and taking a backup first.


etcd Service Configuration

Example systemd-style etcd configuration:

ExecStart=/usr/local/bin/etcd \
  --name=${ETCD_NAME} \
  --cert-file=/etc/etcd/kubernetes.pem \
  --key-file=/etc/etcd/kubernetes-key.pem \
  --peer-cert-file=/etc/etcd/kubernetes.pem \
  --peer-key-file=/etc/etcd/kubernetes-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem \
  --peer-trusted-ca-file=/etc/etcd/ca.pem \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls=https://${INTERNAL_IP}:2380 \
  --listen-peer-urls=https://${INTERNAL_IP}:2380 \
  --listen-client-urls=https://${INTERNAL_IP}:2379,https://127.0.0.1:2379 \
  --advertise-client-urls=https://${INTERNAL_IP}:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=master-1=https://${MASTER1_IP}:2380,master-2=https://${MASTER2_IP}:2380,master-3=https://${MASTER3_IP}:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd

Note

The --initial-cluster flag lists the name and peer URL of every member in the initial cluster. Each member's --name must match one of the names in this list.
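Because the same --initial-cluster value must be passed to every member, it can help to generate it once. The sketch below assumes peers listen on port 2380 over HTTPS, as in the configuration above; the helper name initial_cluster and the IP addresses are illustrative.

```shell
# Join "name=ip" pairs into the comma-separated value expected by --initial-cluster.
initial_cluster() {
  local out="" pair
  for pair in "$@"; do
    # ${pair%%=*} is the member name, ${pair#*=} is its IP address.
    out="${out:+$out,}${pair%%=*}=https://${pair#*=}:2380"
  done
  echo "$out"
}

initial_cluster master-1=10.240.0.10 master-2=10.240.0.11 master-3=10.240.0.12
# prints master-1=https://10.240.0.10:2380,master-2=https://10.240.0.11:2380,master-3=https://10.240.0.12:2380
```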


Important etcd Flags

| Flag | Purpose |
|---|---|
| --name | Unique etcd member name |
| --data-dir | etcd data directory |
| --listen-client-urls | URLs this member listens on for client traffic |
| --advertise-client-urls | Client URLs this member advertises to the rest of the cluster and to clients |
| --listen-peer-urls | URLs this member listens on for peer traffic |
| --initial-advertise-peer-urls | Peer URLs this member advertises to the other members |
| --initial-cluster | List of all initial etcd members |
| --initial-cluster-state | new for a new cluster, existing when joining an existing cluster |
| --cert-file | TLS certificate for client traffic |
| --peer-cert-file | TLS certificate for peer traffic |

Using etcdctl

etcdctl is the CLI tool used to interact with etcd.

Set the API version (etcdctl v3.4 and later already default to API v3; older releases default to v2):

export ETCDCTL_API=3

Put a key:

etcdctl put name John

Get a key:

etcdctl get name

List keys:

etcdctl get / --prefix --keys-only

Tip

In Kubernetes troubleshooting, etcdctl is commonly used for snapshots, member checks, and health checks.


Check etcd Cluster Health

Check endpoint health:

ETCDCTL_API=3 etcdctl endpoint health --cluster

Check endpoint status:

ETCDCTL_API=3 etcdctl endpoint status --cluster -w table

List members:

ETCDCTL_API=3 etcdctl member list

Note

In secured clusters, include --cacert, --cert, and --key options.

Example with certificates:

ETCDCTL_API=3 etcdctl endpoint health --cluster \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
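Typing the certificate flags on every command is error-prone, so a small wrapper can help. The sketch below assumes the kubeadm certificate paths shown above; the names etcdctl_tls, PKI_DIR, ENDPOINTS, and the ETCDCTL override are illustrative, not part of etcd.

```shell
# Binary and certificate locations; override via environment if your layout differs.
ETCDCTL="${ETCDCTL:-etcdctl}"
PKI_DIR="${PKI_DIR:-/etc/kubernetes/pki/etcd}"

# Run any etcdctl subcommand with the endpoint and TLS flags filled in.
etcdctl_tls() {
  ETCDCTL_API=3 "$ETCDCTL" \
    --endpoints="${ENDPOINTS:-https://127.0.0.1:2379}" \
    --cacert="$PKI_DIR/ca.crt" \
    --cert="$PKI_DIR/server.crt" \
    --key="$PKI_DIR/server.key" \
    "$@"
}
```

With the wrapper defined, etcdctl_tls member list or etcdctl_tls endpoint status --cluster -w table work without repeating the flags, and etcdctl_tls get /registry --prefix --keys-only lists the keys Kubernetes stores under its default /registry prefix.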

etcd Backup

Back up etcd regularly.

ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Verify snapshot:

ETCDCTL_API=3 etcdctl snapshot status snapshot.db

Danger

etcd backups must be tested. A backup without a tested restore process is not a reliable recovery plan.
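A restore test can be sketched as follows. It uses etcdctl snapshot restore with --data-dir and restores into a new directory rather than over the live one. The function name restore_snapshot, the ETCDCTL override, and the target path in the usage note are assumptions; newer etcd releases provide the same operation through the separate etcdutl tool, so check which one your version ships.

```shell
# Path to the etcdctl binary; overridable for dry runs (assumption).
ETCDCTL="${ETCDCTL:-etcdctl}"

# Restore a snapshot file into a fresh data directory.
restore_snapshot() {
  local snapshot="$1" data_dir="$2"
  # Never restore on top of an existing directory, especially the live data dir.
  if [ -e "$data_dir" ]; then
    echo "refusing to restore: $data_dir already exists" >&2
    return 1
  fi
  ETCDCTL_API=3 "$ETCDCTL" snapshot restore "$snapshot" --data-dir="$data_dir"
}
```

For example, restore_snapshot snapshot.db /var/lib/etcd-from-backup writes a new data directory. etcd must then be pointed at that directory, which in a kubeadm cluster typically means updating the data volume path in /etc/kubernetes/manifests/etcd.yaml.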


Kubernetes API Server and etcd

The kube-apiserver talks to etcd using the --etcd-servers option.

Example:

kube-apiserver \
  --etcd-servers=https://10.240.0.10:2379,https://10.240.0.11:2379,https://10.240.0.12:2379 \
  --etcd-cafile=/var/lib/kubernetes/ca.pem \
  --etcd-certfile=/var/lib/kubernetes/apiserver-etcd-client.crt \
  --etcd-keyfile=/var/lib/kubernetes/apiserver-etcd-client.key

Note

The kube-apiserver should be configured with multiple etcd endpoints for HA.
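To see which etcd endpoints an API server is actually using, the flag value can be extracted from its manifest or unit file. This is a minimal sketch; etcd_endpoints is a hypothetical helper, and it assumes the flag is written as --etcd-servers=... on a single line.

```shell
# Print one etcd endpoint per line from a file containing --etcd-servers=...
etcd_endpoints() {
  grep -o -- '--etcd-servers=[^ "]*' "$1" | cut -d= -f2- | tr ',' '\n'
}
```

On a kubeadm control plane node, etcd_endpoints /etc/kubernetes/manifests/kube-apiserver.yaml prints the configured endpoints, which can then be passed to etcdctl with --endpoints.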


Production Best Practices

Recommended

  • Use 3 or 5 etcd members
  • Use an odd number of etcd members
  • Enable TLS for client and peer communication
  • Keep etcd behind private networking
  • Do not expose etcd publicly
  • Use fast disks, preferably SSD
  • Monitor etcd latency, leader changes, and disk usage
  • Take regular snapshots
  • Test snapshot restore
  • Keep etcd members close in network latency
  • Avoid running heavy workloads on etcd nodes
  • Use external etcd for critical production clusters when possible

Do's

  • Use odd-numbered etcd clusters
  • Use at least 3 members for HA
  • Protect etcd with TLS
  • Restrict etcd access to API servers only
  • Monitor quorum and leader status
  • Back up etcd before upgrades
  • Test restore in a non-production environment
  • Use reliable disks and stable networking

Don'ts

  • Don't use 2 etcd members for production HA
  • Don't expose port 2379 publicly
  • Don't ignore quorum requirements
  • Don't run etcd on slow or unstable disks
  • Don't manually edit etcd data unless absolutely necessary
  • Don't upgrade etcd without checking Kubernetes compatibility
  • Don't skip backup validation
  • Don't place all etcd members in the same failure domain

Failure

etcd HA depends on quorum. Adding more nodes does not help if they are poorly distributed or frequently partitioned.


Common Failure Scenarios

| Failure | Impact | Mitigation |
|---|---|---|
| One etcd member fails in 3-node cluster | Cluster continues | Replace failed member quickly |
| Two etcd members fail in 3-node cluster | Quorum lost | Restore member or recover from backup |
| Network partition | Minority side stops processing writes | Use odd number and stable networking |
| Disk latency increases | API server may become slow | Use SSD and monitor disk I/O |
| etcd certificate expires | API server cannot connect | Monitor and rotate certificates |
| Backup missing | Recovery becomes difficult | Schedule and test snapshots |

Troubleshooting Commands

Check etcd pods in kubeadm clusters:

kubectl get pods -n kube-system | grep etcd

Check etcd static pod manifest:

cat /etc/kubernetes/manifests/etcd.yaml

Check etcd logs:

kubectl logs -n kube-system etcd-<node-name>

Check members:

ETCDCTL_API=3 etcdctl member list

Check endpoint status:

ETCDCTL_API=3 etcdctl endpoint status --cluster -w table

Check endpoint health:

ETCDCTL_API=3 etcdctl endpoint health --cluster

Check kube-apiserver etcd config:

grep etcd /etc/kubernetes/manifests/kube-apiserver.yaml

Exam and Practical Notes

CKA Focus

For certification-style tasks, focus on:

  • identifying etcd endpoints
  • checking etcd health
  • taking etcd snapshots
  • restoring from etcd snapshots
  • understanding quorum
  • knowing why odd number of members is recommended

Quick Reference

| Task | Command |
|---|---|
| Put key | etcdctl put name John |
| Get key | etcdctl get name |
| List keys | etcdctl get / --prefix --keys-only |
| Member list | etcdctl member list |
| Endpoint health | etcdctl endpoint health --cluster |
| Endpoint status | etcdctl endpoint status --cluster -w table |
| Snapshot save | etcdctl snapshot save snapshot.db |
| Snapshot status | etcdctl snapshot status snapshot.db |

Summary

Quote

  • etcd stores all Kubernetes cluster state
  • etcd HA depends on Raft consensus and quorum
  • Writes are coordinated by a leader
  • A majority of members must agree before writes are committed
  • Avoid 2-member etcd clusters
  • Use odd-numbered clusters, usually 3 or 5 members
  • Use stacked etcd for simpler setup
  • Use external etcd for stronger production isolation
  • Always secure, monitor, back up, and test restore for etcd