5.3 Kubernetes Backup & Restore

What Should You Back Up?

In production, three things actually matter for recovery and must be protected:

🔹 1. Resource Configuration

Deployments
Services
ConfigMaps
Secrets
Ingress
RBAC objects

🔹 2. Cluster State (etcd)

Nodes
Pods
All Kubernetes objects
Cluster metadata

🔹 3. Persistent Volumes (Application Data)

Abstract

If the cluster is lost:

YAML files restore workloads
etcd restores cluster state
PV backups restore business data

Backup Resource Configuration

Declarative (Best Practice)

Store all YAML manifests in Git and apply them from the repository:

kubectl apply -f app/

Benefits:

Version controlled
Reusable
Team-friendly
Git becomes your backup

Success

GitOps-style configuration storage is the safest production approach.

Export Live Resources (Safety Net)

If objects were created imperatively:

kubectl get all --all-namespaces -o yaml > cluster-backup.yaml

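This queries the API server and exports resource definitions. Note that `kubectl get all` covers only a subset of kinds (mainly workloads and Services); ConfigMaps, Secrets, Ingress, and RBAC objects must be requested explicitly.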
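
One way to cover kinds that `get all` leaves out is to loop over an explicit list and write one file per kind. The following is a minimal sketch: the function name `export_resources`, the `KINDS` list, and the output layout are assumptions for illustration, not part of any standard tooling.

```shell
#!/bin/sh
# Sketch: export selected resource kinds to one YAML file per kind.
# KINDS and the output layout are illustrative assumptions; adjust to your cluster.
KINDS="deployments services configmaps secrets ingresses roles rolebindings clusterroles clusterrolebindings"

export_resources() {
  out_dir=$1
  mkdir -p "$out_dir" || return 1
  for kind in $KINDS; do
    # One file per kind makes a failed or empty export easy to spot.
    if ! kubectl get "$kind" --all-namespaces -o yaml > "$out_dir/$kind.yaml"; then
      echo "export failed for $kind" >&2
      return 1
    fi
  done
  # Exported Secrets are only base64-encoded: treat out_dir as sensitive.
}
```

A call such as `export_resources /backup/resources` would then produce files like `/backup/resources/secrets.yaml`. The export still contains live status fields, so it is a safety net rather than a replacement for manifests in Git.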

Warning

This does NOT back up persistent volume data.

Tools for Resource Backup

Velero
Cloud-native backup operators
GitOps pipelines

Tip

In managed clusters (EKS/GKE/AKS), API-based backup is usually required because etcd access is restricted.

Backup etcd (Cluster State Backup)

etcd is the Kubernetes key-value database.

It stores:

All cluster objects
Secrets
Nodes
Cluster metadata

Instead of exporting YAML, you can snapshot etcd.

Working with ETCDCTL & ETCDUTL

Note

Always use ETCDCTL_API=3 for Kubernetes clusters. It is the default since etcd 3.4, so setting it explicitly only matters on older etcdctl builds.

Taking etcd Backup

Option 1 — Live Snapshot (etcdctl)

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backup/etcd-snapshot.db

Required Flags

--endpoints → etcd endpoint
--cacert → CA certificate
--cert → client certificate
--key → client key

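Verify snapshot (etcdutl reads the file offline, so no endpoint or certificate flags are needed; `etcdctl snapshot status` is deprecated since etcd 3.5):

etcdutl snapshot status /backup/etcd-snapshot.db --write-out=table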

Success

This creates a portable .db snapshot of the entire cluster state.
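
For scheduled backups it may help to wrap the snapshot in a small function that timestamps the file and checks it right away. This is a sketch under assumptions: `take_etcd_snapshot`, `BACKUP_DIR`, and the file naming scheme are hypothetical, while the endpoint and certificate paths are the ones used above.

```shell
#!/bin/sh
# Sketch: timestamped etcd snapshot followed by a status check.
# BACKUP_DIR and the naming scheme are assumptions for illustration.
BACKUP_DIR="${BACKUP_DIR:-/backup}"

take_etcd_snapshot() {
  snap="$BACKUP_DIR/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$snap" || return 1
  # Offline check of the saved file; a non-zero exit means it is unreadable.
  etcdutl snapshot status "$snap" --write-out=table >&2 || return 1
  # Print the snapshot path so callers (cron jobs, upload steps) can use it.
  echo "$snap"
}
```

The function prints only the snapshot path on standard output, so a caller can capture it with `snap=$(take_etcd_snapshot)` and pass it to an upload or encryption step.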

Option 2 — File-Level Backup (etcdutl)

etcdutl backup \
  --data-dir /var/lib/etcd \
  --backup-dir /backup/etcd-backup

This copies:

etcd backend database
WAL files

Note

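etcdutl backup reads the data directory directly, so it works even if etcd is not running. Running it while etcd is stopped avoids copying files in the middle of a write.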

Restoring etcd from snapshot

Step 1 — Stop kube-apiserver

mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 30

Note

Moving a static pod manifest out of /etc/kubernetes/manifests makes the kubelet stop that component automatically.
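
The fixed `sleep 30` is a guess at how long the kubelet needs. A polling loop can instead wait until the container has actually gone. The sketch below assumes `crictl` is configured for the node's runtime; the function name `wait_for_container_stop` and the default timeout are hypothetical.

```shell
#!/bin/sh
# Sketch: wait until no running container matches a name, instead of a fixed sleep.
# The function name and the default timeout are illustrative assumptions.
wait_for_container_stop() {
  name=$1
  timeout=${2:-60}
  waited=0
  # `crictl ps --name <pattern> -q` prints the IDs of matching running containers.
  # Caveat: a crictl error also yields empty output, so confirm crictl works first.
  while [ -n "$(crictl ps --name "$name" -q 2>/dev/null)" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "$name still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

After moving the manifest, `wait_for_container_stop kube-apiserver 60` returns as soon as the API server container is gone, or fails after 60 seconds.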

Step 2 — Restore etcd Snapshot

etcdutl snapshot restore /backup/etcd-snapshot.db \
  --data-dir /var/lib/etcd-from-backup

This creates a new restored data directory (the target must be empty or not exist yet):

/var/lib/etcd-from-backup

Step 3 — Update etcd Static Pod Config

vi /etc/kubernetes/manifests/etcd.yaml

VERY IMPORTANT: Modify ONLY the `hostPath` path of the `etcd-data` volume inside the `volumes` section.

🔴 OLD Configuration

volumes:
- hostPath:
    path: /etc/kubernetes/pki/etcd
    type: DirectoryOrCreate
  name: etcd-certs
- hostPath:
    path: /var/lib/etcd        # OLD directory
    type: DirectoryOrCreate
  name: etcd-data

🟢 NEW Configuration (After Restore)

volumes:
- hostPath:
    path: /etc/kubernetes/pki/etcd
    type: DirectoryOrCreate
  name: etcd-certs
- hostPath:
    path: /var/lib/etcd-from-backup   # NEW restored directory
    type: DirectoryOrCreate
  name: etcd-data
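
The same edit can be scripted so that only the `hostPath` line is touched. This is a minimal sketch, assuming the manifest uses the layout shown above; `set_etcd_hostpath` and its optional backup directory argument are hypothetical names, and the new path is assumed not to contain `|` or `&`.

```shell
#!/bin/sh
# Sketch: point the etcd-data hostPath at the restored directory.
# The pattern matches only a line of the form "path: /var/lib/etcd", so
# "mountPath: /var/lib/etcd" and "--data-dir=/var/lib/etcd" are left alone.
set_etcd_hostpath() {
  manifest=$1
  new_dir=$2
  backup_dir=${3:-/tmp}   # keep the copy OUTSIDE the manifests directory
  tmp=$(mktemp) || return 1
  if ! sed "s|^\([[:space:]]*path:[[:space:]]*\)/var/lib/etcd[[:space:]]*\$|\1$new_dir|" \
      "$manifest" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  cp "$manifest" "$backup_dir/$(basename "$manifest").bak" || return 1
  # Overwrite in place so the manifest keeps its owner and permissions.
  cat "$tmp" > "$manifest" || return 1
  rm -f "$tmp"
}
```

A typical call would be `set_etcd_hostpath /etc/kubernetes/manifests/etcd.yaml /var/lib/etcd-from-backup`. Running it a second time changes nothing, because the already updated line no longer matches the pattern.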

❗ Why Do We Change ONLY `volumes.hostPath.path`?

This is extremely important.

There are two different layers:

Field	Meaning	Location
`hostPath.path`	Directory on the node (host machine)	Physical storage
`volumeMounts.mountPath`	Directory inside container	Container filesystem

🔍 Volume Mount Section (DO NOT CHANGE)

In the same file you will see:

volumeMounts:
- mountPath: /var/lib/etcd
  name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
  name: etcd-certs

🚨 This must NOT be changed.

🧠 Why Not Change mountPath?

Because the etcd container is started with:

--data-dir=/var/lib/etcd

That path is inside the container.

If you change mountPath, etcd will not find its database and will fail to start.
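
Before editing anything, it can be useful to confirm which path the container really expects. The sketch below reads the `--data-dir` argument from the manifest; the function name `etcd_container_data_dir` is an assumption, and the pattern assumes the flag appears as a `- --data-dir=...` list entry, as it does in the manifest discussed here.

```shell
#!/bin/sh
# Sketch: print the --data-dir value that the etcd container is started with.
# The function name is an assumption for illustration.
etcd_container_data_dir() {
  # Matches a command-line entry such as "    - --data-dir=/var/lib/etcd"
  # and prints only the part after the "=".
  sed -n 's|^[[:space:]]*-[[:space:]]*--data-dir=||p' "$1"
}
```

Running `etcd_container_data_dir /etc/kubernetes/manifests/etcd.yaml` should print the same path as the `mountPath` of the `etcd-data` volume. If the two differ, etcd starts with an empty data directory.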

📦 What Actually Happens

After restore:

Host machine:

/var/lib/etcd-from-backup

Container view:

/var/lib/etcd

Mapping:

Host: /var/lib/etcd-from-backup
        ↓ mounted as
Container: /var/lib/etcd

The container does NOT know the host path changed.

That is why:

✅ Change hostPath.path
❌ Do NOT change mountPath

⚠️ What Happens If You Change mountPath?

If you modify:

mountPath: /var/lib/etcd-from-backup

Then:

etcd still expects /var/lib/etcd
Database not found
etcd crashes
Control plane fails

Success

Always change only the hostPath during restore.

Step 4 - Restart Control Plane Components

Restart kube-apiserver

mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/

Wait about 60 seconds for the API server to come back.

Restart controller-manager

mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sleep 20
mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/

Restart scheduler

mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
sleep 20
mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/

Restart kubelet

systemctl restart kubelet
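
The move-out, wait, move-back pattern used above for the controller-manager and scheduler can be wrapped in one helper. This is a sketch only: `restart_static_pod`, `MANIFEST_DIR`, `HOLD_DIR`, and `PAUSE` are hypothetical names, with defaults that mirror the paths and the 20 second pause used in this section.

```shell
#!/bin/sh
# Sketch: restart a static pod by moving its manifest out and back.
# MANIFEST_DIR, HOLD_DIR and PAUSE are assumptions for illustration.
MANIFEST_DIR="${MANIFEST_DIR:-/etc/kubernetes/manifests}"
HOLD_DIR="${HOLD_DIR:-/tmp}"
PAUSE="${PAUSE:-20}"

restart_static_pod() {
  file="$1.yaml"
  if [ ! -f "$MANIFEST_DIR/$file" ]; then
    echo "no manifest: $MANIFEST_DIR/$file" >&2
    return 1
  fi
  mv "$MANIFEST_DIR/$file" "$HOLD_DIR/$file" || return 1
  sleep "$PAUSE"   # give the kubelet time to notice and stop the pod
  mv "$HOLD_DIR/$file" "$MANIFEST_DIR/$file"
}
```

With this in place, `restart_static_pod kube-controller-manager && restart_static_pod kube-scheduler` performs the two restarts in sequence.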

Monitor Restore

watch crictl ps

Ensure:

All control plane containers are Running
etcd is healthy
kube-apiserver is Running
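
The "etcd is healthy" check can also be automated rather than read off the screen. The following sketch polls `etcdctl endpoint health` with the same endpoint and certificates as the backup command; the function name `wait_for_etcd_healthy` and the retry defaults are assumptions.

```shell
#!/bin/sh
# Sketch: poll etcd until it reports healthy or the retry budget runs out.
# The function name and the retry/interval defaults are assumptions.
wait_for_etcd_healthy() {
  tries=${1:-30}
  interval=${2:-5}
  while [ "$tries" -gt 0 ]; do
    # `endpoint health` exits non-zero while the endpoint is unhealthy.
    if ETCDCTL_API=3 etcdctl \
        --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        endpoint health >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "etcd did not become healthy" >&2
  return 1
}
```

Calling `wait_for_etcd_healthy 30 5` waits up to roughly two and a half minutes, which is usually enough for the restored etcd static pod to start.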

Verify Restore

kubectl get deployments,services --all-namespaces
kubectl get pods --all-namespaces
kubectl get nodes

Success

All resources should reflect the cluster as it was at snapshot time; objects created after the snapshot will be missing.

Persistent Volume Backup

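etcd does NOT store application data. It holds only the PersistentVolume and PersistentVolumeClaim objects, not the contents of the volumes.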

For stateful workloads:

Use cloud disk snapshots (EBS, GCE PD, Azure Disk)
Use CSI VolumeSnapshots
Use storage-native backup tools
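
As an illustration of the CSI VolumeSnapshot option, the sketch below renders a snapshot manifest for one PVC. All names are hypothetical (`render_volume_snapshot` and every argument value), and it assumes the cluster has a CSI driver with snapshot support, the snapshot CRDs installed, and an existing VolumeSnapshotClass.

```shell
#!/bin/sh
# Sketch: render a CSI VolumeSnapshot manifest for one PVC.
# The function name and all argument values are hypothetical.
render_volume_snapshot() {
  snap_name=$1
  namespace=$2
  pvc=$3
  class=$4
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: $snap_name
  namespace: $namespace
spec:
  volumeSnapshotClassName: $class
  source:
    persistentVolumeClaimName: $pvc
EOF
}
```

The output can be piped straight to the API server, for example `render_volume_snapshot db-snap prod db-data csi-snapclass | kubectl apply -f -`, or committed to Git alongside the other manifests.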

Danger

Without PV backups, application data loss is permanent.

Comparing Backup Methods

Method	Protects	Best For
Git/YAML	App definitions	GitOps
kubectl get -o yaml	Resource configs	Quick backup
etcdctl snapshot	Full cluster state	Self-managed clusters
etcdutl backup	Raw etcd files	Advanced DR
PV snapshot	Application data	Stateful apps

Production Best Practices

DO This

Store manifests in Git
Schedule automated etcd snapshots
Backup persistent volumes separately
Encrypt backup files
Store backups off-cluster
Test restore regularly
Document restore runbook
Monitor etcd health
Verify snapshot integrity
Maintain off-site copies
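
For the "Verify snapshot integrity" and off-cluster storage items, a checksum stored next to each backup file gives a cheap way to detect corruption after a copy. This is a sketch, assuming GNU or BusyBox `sha256sum` is available; the function names and the `.sha256` suffix are hypothetical.

```shell
#!/bin/sh
# Sketch: record a SHA-256 checksum next to a backup file and verify it later.
# Function names and the ".sha256" suffix are assumptions; requires sha256sum.
write_checksum() {
  # Run from the file's directory so the checksum file stores a relative name
  # and still verifies after both files are copied elsewhere together.
  ( cd "$(dirname "$1")" && \
    sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

verify_checksum() {
  # Succeeds only if the file still matches the recorded checksum.
  ( cd "$(dirname "$1")" && \
    sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}
```

A backup job could call `write_checksum /backup/etcd-snapshot.db` right after the snapshot, and a restore runbook could start with `verify_checksum /backup/etcd-snapshot.db` before touching the cluster.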

❌ DON'T

Danger

Don’t rely only on YAML exports
Don’t skip PV backups
Don’t store backups on same node
Don’t skip restore testing
Don’t expose etcd without TLS
Don’t modify the volumeMounts mountPath during an etcd restore

Disaster Recovery Flow

Abstract

Identify failure type (state vs data)
Restore etcd snapshot (if control plane issue)
Restore persistent volumes (if data issue)
Reapply manifests if needed
Validate cluster health
Monitor workloads

🎯 Interview Memory

Question

etcd stores cluster state
YAML + Git = config backup
ETCDCTL_API=3 required
etcdctl snapshot save
Stop API server before restore
etcdutl snapshot restore
Update hostPath.path (not mountPath or --data-dir) in etcd.yaml
Static pod restart behavior
PV backups are separate

Question

We change only hostPath.path because it defines the data location on the host. mountPath defines where that data appears inside the container. etcd expects /var/lib/etcd inside the container, so changing mountPath breaks etcd startup.

Final Production Takeaway

Quote

Backups are useless without restore testing.

Protect:

Configuration (Git)
Cluster State (etcd)
Application Data (Persistent Volumes)

Automate backups. Test restores. Protect production.

In static pod restore, only change the host path — never the container path.
