5.3 Kubernetes Backup & Restore
1️⃣ What Should You Back Up?
In production, you must protect three things that actually matter for recovery:
🔹 1. Resource Configuration
- Deployments
- Services
- ConfigMaps
- Secrets
- Ingress
- RBAC objects
🔹 2. Cluster State (etcd)
- Nodes
- Pods
- All Kubernetes objects
- Cluster metadata
🔹 3. Persistent Volumes (Application Data)
Abstract
If the cluster is lost:
- YAML files restore workloads
- etcd restores cluster state
- PV backups restore business data
2️⃣ Backup Method 1: Resource Configuration (Recommended)
Declarative (Best Practice)
Store all YAML manifests in Git:
Benefits:
- Version controlled
- Reusable
- Team-friendly
- Git becomes your backup
Success
GitOps-style configuration storage is the safest production approach.
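As a rough sketch of the Git workflow above, the function below commits a directory of manifests. The directory name `k8s-manifests` and the commit message are placeholders chosen for illustration.

```shell
# Commit every manifest in a directory to Git.
# Assumption: the directory is, or may become, a Git repository.
commit_manifests() {
  dir="${1:-k8s-manifests}"   # hypothetical directory name
  (
    cd "$dir" || exit 1
    # Initialise the repository only if it does not exist yet.
    [ -d .git ] || git init -q
    git add -A
    # Commit only when something is actually staged.
    git diff --cached --quiet || git commit -q -m "Update Kubernetes manifests"
  )
}
```

Call it as `commit_manifests ./k8s-manifests`, then push to a remote so the copy lives outside the cluster.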
Export Live Resources (Safety Net)
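If objects were created imperatively, export them from the live cluster with `kubectl get <resource> -o yaml` and redirect the output to a file.
This queries the API server and exports resource definitions.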
Warning
This does NOT back up persistent volume data.
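The export described above might look like the following sketch. The output file name is a placeholder. Note that `kubectl get all` covers only the common workload types, so ConfigMaps, Secrets, Ingresses, PVCs and namespaced RBAC objects are listed explicitly here.

```shell
# Export live resource definitions from every namespace to one YAML file.
# Assumption: kubectl is configured with read access to all listed kinds.
export_resources() {
  out="${1:-cluster-resources.yaml}"   # hypothetical output file
  kinds="all,configmaps,secrets,ingresses,persistentvolumeclaims"
  kinds="$kinds,serviceaccounts,roles,rolebindings"
  kubectl get "$kinds" --all-namespaces -o yaml > "$out"
}
```

The resulting file contains Secrets in base64 form, so treat it as sensitive and store it encrypted.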
Tools for Resource Backup
- Velero
- Cloud-native backup operators
- GitOps pipelines
Tip
In managed clusters (EKS/GKE/AKS), API-based backup is usually required because etcd access is restricted.
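For API-based backup with Velero, a minimal sketch could wrap the three common operations as shown below. It assumes Velero is already installed with a backup storage location configured; the backup, schedule and namespace names are placeholders.

```shell
# One-off backup of a single namespace.
velero_backup_namespace() {
  velero backup create "$1" --include-namespaces "$2"
}

# Recurring backup at 02:00 every day; each backup expires after 30 days.
velero_schedule_daily() {
  velero schedule create "$1" --schedule "0 2 * * *" --ttl 720h
}

# Restore everything contained in a named backup.
velero_restore() {
  velero restore create --from-backup "$1"
}
```

For example, `velero_backup_namespace nightly shop` backs up the hypothetical `shop` namespace under the name `nightly`.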
3️⃣ Backup Method 2: etcd Snapshot (Cluster State Backup)
etcd is the Kubernetes key-value database.
It stores:
- All cluster objects
- Secrets
- Nodes
- Cluster metadata
Instead of exporting YAML, you can snapshot etcd.
4️⃣ Working with ETCDCTL & ETCDUTL
Verify etcd Version
Run `etcdctl version` and `etcdutl version` to check which client and utility versions are installed.
Note
Kubernetes uses the etcd v3 API. etcdctl 3.4 and later default to v3; with older etcdctl releases, set ETCDCTL_API=3 explicitly.
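A small sketch of the version check follows. The second function assumes a kubeadm cluster, where the etcd static pod carries the label `component=etcd`; on other setups that label may differ.

```shell
# Print the versions of the locally installed etcd tools.
show_etcd_versions() {
  etcdctl version
  etcdutl version
}

# Print the etcd server image used by the control plane.
# Assumption: kubeadm-style static pod labelled component=etcd.
etcd_server_image() {
  kubectl -n kube-system get pods -l component=etcd \
    -o jsonpath='{.items[0].spec.containers[0].image}'
}
```

Keeping the client tools on the same minor version as the server avoids surprises with snapshot formats and flags.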
5️⃣ Taking an etcd Backup
Option 1: Live Snapshot (etcdctl)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backup/etcd-snapshot.db
Required Flags
- --endpoints: etcd endpoint
- --cacert: CA certificate
- --cert: client certificate
- --key: client key
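Verify the snapshot with `etcdutl snapshot status /backup/etcd-snapshot.db` (on etcd releases older than 3.5, use `etcdctl snapshot status`).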
Success
This creates a portable .db snapshot of the entire cluster state.
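To avoid overwriting the previous snapshot, one option is to timestamp the file name and verify it in the same step. The sketch below reuses the kubeadm default certificate paths from the command above; the `etcd-YYYYmmdd-HHMMSS.db` naming scheme is an assumption for illustration.

```shell
# Take a timestamped snapshot, verify it, and print its path.
etcd_snapshot() {
  backup_dir="${1:-/backup}"
  file="$backup_dir/etcd-$(date +%Y%m%d-%H%M%S).db"
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$file" || return 1
  # Prints the hash, revision, key count and size of the snapshot.
  etcdutl snapshot status "$file" --write-out=table || return 1
  echo "$file"
}
```

The function returns non-zero if either the save or the verification fails, which makes it safe to call from a scheduled job that alerts on failure.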
Option 2: File-Level Backup (etcdutl)
`etcdutl backup` copies the contents of the etcd data directory:
- etcd backend database
- WAL files
Note
etcdutl backup works even if etcd is not running.
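A file-level backup might be invoked as in the sketch below. The flag names `--data-dir` and `--backup-dir` reflect my understanding of the legacy `backup` subcommand in etcd 3.5; confirm them with `etcdutl backup --help` on your release. The destination path is a placeholder.

```shell
# Offline, file-level copy of the etcd data directory.
# Assumption: an etcdutl release that still ships the `backup` subcommand.
etcd_file_backup() {
  data_dir="${1:-/var/lib/etcd}"
  backup_dir="${2:-/backup/etcd-files}"   # hypothetical destination
  etcdutl backup --data-dir "$data_dir" --backup-dir "$backup_dir"
}
```

Because this works on files rather than through the etcd API, no endpoint or certificate flags are involved.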
6️⃣ Restoring etcd
Using etcdutl (Recommended)
`etcdutl snapshot restore` writes the snapshot into a new data directory; the existing etcd data directory is left untouched.
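A minimal restore sketch, assuming the snapshot path used earlier and a hypothetical target directory `/var/lib/etcd-restored`. The explicit existence check is an added safeguard so the failure reason is obvious.

```shell
# Restore a snapshot into a fresh data directory.
etcd_restore() {
  snapshot="${1:-/backup/etcd-snapshot.db}"
  target="${2:-/var/lib/etcd-restored}"   # hypothetical target directory
  # The target must not exist yet; restoring over old data is refused.
  if [ -e "$target" ]; then
    echo "refusing to restore: $target already exists" >&2
    return 1
  fi
  etcdutl snapshot restore "$snapshot" --data-dir "$target"
}
```

After the restore, etcd still has to be pointed at the new directory before the restored state takes effect.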
Manual Restore Process
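Step 1: Stop the API Server
On kubeadm-style clusters the API server runs as a static pod, so moving its manifest out of the kubelet's manifest directory stops it.
Step 2: Restore the Snapshot
Restore the snapshot into a new data directory with etcdutl snapshot restore.
Step 3: Update the etcd Config
Change the etcd data directory setting (on kubeadm, the hostPath volume in the etcd static pod manifest) so that it points at the restored directory.
Step 4: Restart Services
Return the API server manifest, or restart the services, and let the control plane come back up.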
Warning
Always pass the certificate and endpoint flags when using etcdctl against a TLS-secured cluster. etcdutl works on local files and needs neither.
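The four steps can be sketched as one function for a kubeadm-style control-plane node. This is an illustration under stated assumptions, not a tested runbook: it assumes static pod manifests in `/etc/kubernetes/manifests`, an etcd manifest whose hostPath volume is `path: /var/lib/etcd`, and a hypothetical holding directory for parked manifests.

```shell
# Manual etcd restore on a kubeadm-style control-plane node (run as root).
# On failure the API server manifest stays parked so the step can be retried.
manual_etcd_restore() {
  snapshot="$1"
  new_dir="$2"
  manifests="${3:-/etc/kubernetes/manifests}"
  parked="${4:-/etc/kubernetes/manifests-parked}"   # hypothetical holding dir

  # Step 1: stop the API server by moving its static pod manifest away.
  mkdir -p "$parked" || return 1
  mv "$manifests/kube-apiserver.yaml" "$parked/" || return 1

  # Step 2: restore the snapshot into a new data directory.
  etcdutl snapshot restore "$snapshot" --data-dir "$new_dir" || return 1

  # Step 3: point the etcd hostPath volume at the restored directory.
  # The edited file is built outside the manifest directory so the kubelet
  # never picks up a half-written manifest.
  sed "s#path: /var/lib/etcd\$#path: $new_dir#" \
    "$manifests/etcd.yaml" > "$parked/etcd.yaml" || return 1
  mv "$parked/etcd.yaml" "$manifests/etcd.yaml" || return 1

  # Step 4: bring the API server back; the kubelet recreates both pods.
  mv "$parked/kube-apiserver.yaml" "$manifests/" || return 1
}
```

A call such as `manual_etcd_restore /backup/etcd-snapshot.db /var/lib/etcd-restored` changes only the hostPath line; the container's `mountPath` and `--data-dir` stay as they are, so etcd sees the restored data at its usual location.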
7️⃣ Persistent Volume Backup
etcd does NOT store application data.
For stateful workloads:
- Use cloud disk snapshots (EBS, GCE PD, Azure Disk)
- Use CSI VolumeSnapshots
- Use storage-native backup tools
Danger
Without PV backups, application data loss is permanent.
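For the CSI VolumeSnapshot option, a sketch like the one below generates a snapshot manifest for a single PVC. It assumes the CSI snapshot CRDs and controller are installed and that a VolumeSnapshotClass exists; the class name `csi-snapclass` and the snapshot naming scheme are placeholders.

```shell
# Print a CSI VolumeSnapshot manifest for one PersistentVolumeClaim.
volume_snapshot_manifest() {
  pvc="$1"
  class="${2:-csi-snapclass}"   # hypothetical VolumeSnapshotClass name
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ${pvc}-snap-$(date +%Y%m%d)
spec:
  volumeSnapshotClassName: ${class}
  source:
    persistentVolumeClaimName: ${pvc}
EOF
}
```

Pipe the output to the cluster in the PVC's namespace, for example `volume_snapshot_manifest data-db | kubectl apply -n shop -f -`, where `data-db` and `shop` are hypothetical names.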
8️⃣ Comparing Backup Methods
| Method | Protects | Best For |
|---|---|---|
| Git/YAML | App definitions | GitOps |
| kubectl get -o yaml | Resource configs | Quick backup |
| etcdctl snapshot | Full cluster state | Self-managed clusters |
| etcdutl backup | Raw etcd files | Advanced DR |
| PV snapshot | Application data | Stateful apps |
9️⃣ Production Best Practices
DO This
- Store manifests in Git
- Schedule automated etcd snapshots
- Back up persistent volumes separately
- Encrypt backup files
- Store backups off-cluster
- Test restore regularly
- Document a recovery runbook
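Scheduled snapshots accumulate quickly, so a retention step usually goes with them. The sketch below keeps the newest N files and deletes the rest. It assumes snapshot names of the form `etcd-YYYYmmdd-HHMMSS.db`, which sort chronologically; that naming scheme is an illustration, not something the tools enforce.

```shell
# Keep the newest N snapshot files in a directory and delete the rest.
prune_snapshots() {
  dir="$1"
  keep="${2:-7}"
  # Newest first by name, then skip the first $keep entries.
  ls -1 "$dir"/etcd-*.db 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
      rm -f -- "$old"
    done
}
```

A cron entry such as `0 */6 * * * /usr/local/bin/etcd-backup.sh` (a hypothetical script that takes a snapshot, copies it off-cluster, and then calls `prune_snapshots /backup 7`) would cover the scheduling and retention items in the list above.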
🔟 Production Do & Don't
✅ DO
Tip
- Verify snapshot integrity
- Maintain off-site copies
- Automate backup schedules
- Monitor etcd health
❌ DON'T
Danger
- Don't rely only on YAML exports
- Don't ignore PV backups
- Don't store backups on the same node
- Don't skip restore testing
- Don't expose etcd without TLS
1️⃣1️⃣ Disaster Recovery Flow
Abstract
- Identify failure type (state vs data)
- Restore etcd snapshot (if control plane issue)
- Restore persistent volumes (if data issue)
- Reapply manifests if needed
- Validate cluster health
- Monitor workloads
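The validation step at the end of the flow above could be sketched as follows. It assumes it runs on a control-plane node with the kubeadm default certificate paths and a working kubeconfig.

```shell
# Basic post-restore health checks; stops at the first failing check.
validate_cluster() {
  # Control plane: etcd answers and the API server reports ready.
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health || return 1
  kubectl get --raw='/readyz?verbose' || return 1
  # Workloads: nodes have rejoined and pods are being scheduled.
  kubectl get nodes || return 1
  kubectl get pods --all-namespaces
}
```

These checks only show that the control plane responds; application-level verification of restored data still has to be done per workload.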
🎯 Interview Memory
Question
- etcd stores cluster state
- YAML + Git = config backup
- etcdctl snapshot save
- etcdutl snapshot restore
- Stop API server before restore
- PV backups are separate
Final Production Takeaway
Quote
Backups are useless without restore testing.
Protect:
- Configuration (Git)
- Cluster State (etcd)
- Application Data (Persistent Volumes)
Automate backups. Test restores. Protect production.