5.3 Kubernetes Backup & Restore
What Should You Back Up?
In production, you must protect three things that actually matter for recovery:
🔹 1. Resource Configuration
- Deployments
- Services
- ConfigMaps
- Secrets
- Ingress
- RBAC objects
🔹 2. Cluster State (etcd)
- Nodes
- Pods
- All Kubernetes objects
- Cluster metadata
🔹 3. Persistent Volumes (Application Data)
Abstract
If the cluster is lost:
- YAML files restore workloads
- etcd restores cluster state
- PV backups restore business data
Backup Resource Configuration
Declarative (Best Practice)
Store all YAML manifests in Git:
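A minimal sketch of this workflow (repository layout, file names, and the example Deployment are all illustrative):

```shell
# Illustrative: keep cluster manifests in a version-controlled repo
set -e
repo=$(mktemp -d)/k8s-manifests
git init -q "$repo"
cd "$repo"
mkdir -p base/app

# In practice you would copy your real manifests here; this one is an example
cat > base/app/deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # example workload
spec:
  replicas: 2
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
      - name: web
        image: nginx:1.25
EOF

git add .
git -c user.email=ops@example.com -c user.name=ops \
  commit -q -m "Baseline cluster configuration"
```

Pushing the repo to a remote (and pulling it during disaster recovery) is what turns Git into the backup.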
Benefits:
- Version controlled
- Reusable
- Team-friendly
- Git becomes your backup
Success
GitOps-style configuration storage is the safest production approach.
Export Live Resources (Safety Net)
If objects were created imperatively:
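One common pattern, sketched here with an illustrative resource list and output file name:

```bash
# Export live resource definitions to YAML (resource list is an example;
# extend it to cover whatever your workloads actually use)
kubectl get deployments,services,configmaps,secrets,ingress \
  --all-namespaces -o yaml > cluster-resources-backup.yaml
```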
This queries the API server and exports resource definitions.
Warning
This does NOT back up persistent volume data.
Tools for Resource Backup
- Velero
- Cloud-native backup operators
- GitOps pipelines
Tip
In managed clusters (EKS/GKE/AKS), API-based backup is usually required because etcd access is restricted.
Backup etcd (Cluster State Backup)
etcd is the Kubernetes key-value database.
It stores:
- All cluster objects
- Secrets
- Nodes
- Cluster metadata
Instead of exporting YAML, you can snapshot etcd.
Working with ETCDCTL & ETCDUTL
Note
Always use ETCDCTL_API=3 for Kubernetes clusters.
Taking etcd Backup
Option 1 – Live Snapshot (etcdctl)
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /backup/etcd-snapshot.db
Required Flags
- `--endpoints` – etcd endpoint
- `--cacert` – CA certificate
- `--cert` – client certificate
- `--key` – client key
Verify snapshot:
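With etcd v3.5+ the offline subcommands live in `etcdutl`; on older releases the same check is `etcdctl snapshot status`:

```bash
etcdutl snapshot status /backup/etcd-snapshot.db --write-out=table
```

The output reports the snapshot's hash, revision, total keys, and size.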
Success
This creates a portable .db snapshot of the entire cluster state.
Option 2 – File-Level Backup (etcdutl)
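A sketch of the offline copy using etcd v3.5's `etcdutl` (the directories are examples):

```bash
# Copy the etcd backend database and WAL files to a backup directory
etcdutl backup \
  --data-dir /var/lib/etcd \
  --backup-dir /backup/etcd-files
```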
This copies:
- etcd backend database
- WAL files
Note
etcdutl backup works even if etcd is not running.
Restoring etcd from snapshot
Step 1 – Stop kube-apiserver
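On a kubeadm cluster:

```bash
# kubelet stops the kube-apiserver pod once its static pod manifest disappears
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
```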
Note
Moving static pod manifest stops the component automatically.
Step 2 – Restore etcd Snapshot
This creates a new restored data directory:
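Using the snapshot taken earlier (the target directory must match the hostPath you configure in Step 3):

```bash
etcdutl snapshot restore /backup/etcd-snapshot.db \
  --data-dir /var/lib/etcd-from-backup
```

On older etcd releases the same operation is `ETCDCTL_API=3 etcdctl snapshot restore …`.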
Step 3 – Update etcd Static Pod Config
VERY IMPORTANT: You must modify ONLY the hostPath inside the volumes section.
🔴 OLD Configuration
volumes:
- hostPath:
path: /etc/kubernetes/pki/etcd
type: DirectoryOrCreate
name: etcd-certs
- hostPath:
path: /var/lib/etcd # OLD directory
type: DirectoryOrCreate
name: etcd-data
🟢 NEW Configuration (After Restore)
volumes:
- hostPath:
path: /etc/kubernetes/pki/etcd
type: DirectoryOrCreate
name: etcd-certs
- hostPath:
path: /var/lib/etcd-from-backup # NEW restored directory
type: DirectoryOrCreate
name: etcd-data
❓ Why Do We Change ONLY volumes.hostPath.path?
This is extremely important.
There are two different layers:
| Section | Meaning | Location |
|---|---|---|
| `hostPath.path` | Directory on the node (host machine) | Physical storage |
| `volumeMounts.mountPath` | Directory inside the container | Container filesystem |
📌 Volume Mount Section (DO NOT CHANGE)
In the same file you will see:
volumeMounts:
- mountPath: /var/lib/etcd
name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
name: etcd-certs
🚨 This must NOT be changed.
🧠 Why Not Change mountPath?
Because etcd container is started with:
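In a kubeadm cluster, the etcd static pod (`/etc/kubernetes/manifests/etcd.yaml`) typically passes:

```yaml
containers:
- command:
  - etcd
  - --data-dir=/var/lib/etcd    # path as seen inside the container
```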
That path is inside the container.
If you change mountPath, etcd will not find its database and will fail to start.
📦 What Actually Happens
After restore:

- Host machine: `/var/lib/etcd-from-backup`
- Container view: `/var/lib/etcd`
- Mapping: `/var/lib/etcd-from-backup` (host) → `/var/lib/etcd` (container)
The container does NOT know the host path changed.
That is why:

- ✅ Change `hostPath.path`
- ❌ Do NOT change `mountPath`
⚠️ What Happens If You Change mountPath?
If you modify the `mountPath` (for example, pointing it at the restored host directory), then:

- etcd still expects its database at `/var/lib/etcd` inside the container
- The database is not found
- etcd crashes
- The control plane fails
Success
Always change only the hostPath during restore.
Step 4 – Restart Control Plane Components
Restart kube-apiserver
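Move the manifest back; the kubelet recreates the pod automatically:

```bash
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
```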
Wait ~60 seconds.
Restart controller-manager
mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sleep 20
mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/
Restart scheduler
mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
sleep 20
mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/
Restart kubelet
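On systemd-based nodes:

```bash
systemctl restart kubelet
```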
Monitor Restore
Ensure:
- All control plane pods = Running
- etcd is healthy
- kube-apiserver is Running
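One way to watch the control plane come back (assumes kubectl access has returned):

```bash
watch -n 2 kubectl get pods -n kube-system
```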
Verify Restore
kubectl get deployments,services --all-namespaces
kubectl get pods --all-namespaces
kubectl get nodes
Success
All resources should reflect snapshot time.
Persistent Volume Backup
etcd does NOT store application data.
For stateful workloads:
- Use cloud disk snapshots (EBS, GCE PD, Azure Disk)
- Use CSI VolumeSnapshots
- Use storage-native backup tools
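A CSI VolumeSnapshot sketch (the class and PVC names are examples; this requires the snapshot CRDs and a snapshot controller to be installed):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass   # example snapshot class
  source:
    persistentVolumeClaimName: app-data    # example PVC holding app data
```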
Danger
Without PV backups, application data loss is permanent.
Comparing Backup Methods
| Method | Protects | Best For |
|---|---|---|
| Git/YAML | App definitions | GitOps |
| kubectl export | Resource configs | Quick backup |
| etcdctl snapshot | Full cluster state | Self-managed clusters |
| etcdutl backup | Raw etcd files | Advanced DR |
| PV snapshot | Application data | Stateful apps |
Production Best Practices
DO This
- Store manifests in Git
- Schedule automated etcd snapshots
- Backup persistent volumes separately
- Encrypt backup files
- Store backups off-cluster
- Test restore regularly
- Document restore runbook
- Monitor etcd health
- Verify snapshot integrity
- Maintain off-site copies
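Scheduling automated snapshots can be as simple as a cron entry on a control plane node (paths and schedule are illustrative; note that `%` must be escaped in crontab):

```bash
# /etc/cron.d/etcd-backup – daily snapshot at 02:00 (illustrative)
0 2 * * * root ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backup/etcd-$(date +\%F).db
```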
❌ DON'T
Danger
- Don't rely only on YAML exports
- Don't skip PV backups
- Don't store backups on the same node
- Don't skip restore testing
- Don't expose etcd without TLS
- Don't modify the volumeMount path
Disaster Recovery Flow
Abstract
- Identify failure type (state vs data)
- Restore etcd snapshot (if control plane issue)
- Restore persistent volumes (if data issue)
- Reapply manifests if needed
- Validate cluster health
- Monitor workloads
🎯 Interview Memory
Question
- etcd stores cluster state
- YAML + Git = config backup
- ETCDCTL_API=3 required
- etcdctl snapshot save
- Stop API server before restore
- etcdutl snapshot restore
- Update the restored data path (hostPath) in etcd.yaml
- Static pod restart behavior
- PV backups are separate
Question
We change only hostPath.path because it defines the data location on the host.
mountPath defines where data appears inside the container.
etcd expects /var/lib/etcd inside container.
Changing mountPath breaks etcd startup.
Final Production Takeaway
Quote
Backups are useless without restore testing.
Protect:
- Configuration (Git)
- Cluster State (etcd)
- Application Data (Persistent Volumes)
Automate backups. Test restores. Protect production.
- In a static pod restore, only change the host path – never the container path.