12.04 Network Troubleshooting
Kubernetes network troubleshooting means checking how traffic flows between Pods, Services, CoreDNS, CNI, and kube-proxy. For CKA revision, troubleshoot layer by layer instead of guessing.
Goal
Troubleshoot Kubernetes network failures step by step:
- Confirm Pods are healthy
- Test direct Pod-to-Pod connectivity
- Check Service selectors and endpoints
- Verify DNS/CoreDNS
- Check CNI plugin
- Check kube-proxy rules
- Validate network policies, ports, and node routes
Kubernetes Network Components
| Component | Purpose | Troubleshooting focus |
|---|---|---|
| Pod | Application runtime unit | IP, status, logs, container port |
| Service | Stable access point for Pods | selector, port, targetPort, endpoints |
| CoreDNS | Cluster DNS / service discovery | DNS Pods, kube-dns service, /etc/resolv.conf |
| CNI plugin | Pod network setup and Pod IP allocation | CNI Pods, /etc/cni/net.d, Pod CIDR |
| kube-proxy | Service rule management | iptables/IPVS rules, kube-proxy logs |
| NetworkPolicy | Traffic allow/deny rules | namespace, podSelector, ingress/egress rules |
Note
Pods are temporary. Services provide stable access. CoreDNS resolves names. CNI gives Pods network connectivity. kube-proxy makes Service traffic reach the correct Pods.
Network Troubleshooting Flow
Application cannot connect
│
├── 1. Check Pod status
├── 2. Test direct Pod IP connectivity
├── 3. Check Service definition
├── 4. Check Service endpoints / EndpointSlices
├── 5. Test Service ClusterIP
├── 6. Test DNS resolution
├── 7. Check CoreDNS
├── 8. Check CNI plugin
├── 9. Check kube-proxy
└── 10. Check NetworkPolicies and firewall rules
Tip
For exams and production incidents, isolate the problem first:
- Pod IP works but Service IP fails → Service or kube-proxy issue
- Service IP works but DNS name fails → CoreDNS issue
- Pod IP fails across nodes → CNI/routing issue
- Pod IP fails only for one app → app/container port issue
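The isolation rules above can be sketched as a small shell helper. This is only an illustration: the function name `likely_issue` and the `ok`/`fail` argument convention are assumptions made for this example, not part of any Kubernetes tool.

```shell
# Hypothetical helper: map three manual test results to the likely fault area.
# Arguments: <pod-ip result> <service-ip result> <dns result>, each "ok" or "fail".
likely_issue() {
  pod_ip=$1; svc_ip=$2; dns=$3
  if [ "$pod_ip" != ok ]; then
    # Direct Pod traffic fails: look below the Service layer first.
    echo "Pod, app port, CNI, or routing"
  elif [ "$svc_ip" != ok ]; then
    # Pod IP works but the ClusterIP does not.
    echo "Service or kube-proxy"
  elif [ "$dns" != ok ]; then
    # ClusterIP works but the name does not resolve.
    echo "CoreDNS"
  else
    echo "no network fault found at these layers"
  fi
}
```

For example, `likely_issue ok fail fail` points at the Service or kube-proxy, because the Pod IP test already passed.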
Step 1: Understand the Traffic Path
Example application:
Before troubleshooting, identify:
- source Pod
- destination Service or Pod
- namespace
- port
- protocol
- expected DNS name
- expected endpoint Pod
Ask first
- Is the issue from outside the cluster or inside the cluster?
- Is the issue with IP connectivity or DNS?
- Does direct Pod IP work?
- Does Service ClusterIP work?
- Are endpoints created for the Service?
Step 2: Check Pod Status
For a specific app:
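A minimal sketch of the Pod status check, assuming the backend Pods carry a label such as `app=hostnames` (the label used by the example Service later in this page). The function name `pod_status` is hypothetical.

```shell
# Hypothetical helper: list the Pods of one app with their IPs and nodes.
# $1 = label selector (for example app=hostnames)
# $2 = namespace (defaults to "default")
pod_status() {
  kubectl get pods -n "${2:-default}" -l "$1" -o wide
}
```

Usage: `pod_status app=hostnames`.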
Example:
NAME READY STATUS RESTARTS AGE IP NODE
hostnames-6f78f87b8f-abcdf 1/1 Running 0 5m 10.244.1.5 node-1
hostnames-6f78f87b8f-hijk1 1/1 Running 0 5m 10.244.2.3 node-2
Check:
- READY should be 1/1
- STATUS should be Running
- RESTARTS should not be increasing
- Pod IP should exist
- Pods should be on expected nodes
Warning
Do not debug Service or DNS first if the backend Pods are not healthy.
Step 3: Check Pod Logs and Events
For previous crashed container logs:
Follow logs:
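The log and event checks for this step might look like the following sketch. The wrapper names are hypothetical; the `kubectl` subcommands and the `--previous` and `-f` flags are standard.

```shell
# $1 = Pod name, $2 = namespace (defaults to "default")

# Events and container state (look at the Events section at the bottom).
pod_events()        { kubectl describe pod "$1" -n "${2:-default}"; }

# Logs of the current container.
pod_logs()          { kubectl logs "$1" -n "${2:-default}"; }

# Logs of the previous, crashed container instance.
pod_logs_previous() { kubectl logs "$1" -n "${2:-default}" --previous; }

# Stream logs as they are written.
pod_logs_follow()   { kubectl logs "$1" -n "${2:-default}" -f; }
```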
Common Pod-level issues
- App is not listening on the expected port
- Container is restarting
- Readiness probe is failing
- Wrong container image or config
- App binds only to 127.0.0.1 instead of 0.0.0.0
Step 4: Test Direct Pod-to-Pod Connectivity
First get Pod IPs:
Or get only Pod IPs:
Run a temporary BusyBox Pod:
Inside BusyBox:
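One possible way to run this step non-interactively is sketched below. The function names and the temporary Pod name `nettest` are assumptions for illustration; `-T 5` is the BusyBox `wget` timeout in seconds.

```shell
# Print only the Pod IPs for a label selector.
# $1 = label selector, $2 = namespace (defaults to "default")
pod_ips() {
  kubectl get pods -n "${2:-default}" -l "$1" \
    -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}'
}

# Fetch <pod-ip>:<port> from a throwaway BusyBox Pod that is deleted afterwards.
# $1 = Pod IP, $2 = container port
test_from_busybox() {
  kubectl run nettest --rm -i --restart=Never --image=busybox -- \
    wget -qO- -T 5 "$1:$2"
}
```

Usage: `pod_ips app=hostnames`, then `test_from_busybox <pod-ip> <target-port>` for each address. To test cross-node traffic specifically, compare a target Pod on the same node as the test Pod with one on a different node.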
Interpretation
If direct Pod IP works, the CNI network path is probably working.
If direct Pod IP fails across nodes, check CNI, routing, node firewall, or NetworkPolicy.
Step 5: Check Service Definition
Important fields:
apiVersion: v1
kind: Service
metadata:
name: hostnames
spec:
selector:
app: hostnames
ports:
- port: 80
targetPort: 9376
Check:
- selector matches Pod labels
- port is the Service port
- targetPort matches the container listening port
- namespace is correct
- Service type is correct
Most common Service issue
The Service selector does not match the Pod labels. If the selector matches no Pods, the Service has no endpoints and cannot forward traffic.
Step 6: Verify Pod Labels Match Service Selectors
Check Service selector:
Check Pod labels:
Or:
Expected result: backend Pods should be listed.
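The comparison above could be run with helpers like these. The function names are hypothetical; the idea is to read the selector from the Service and then ask for Pods using exactly that selector.

```shell
# Print the selector stored on the Service.
# $1 = Service name, $2 = namespace (defaults to "default")
svc_selector() {
  kubectl get svc "$1" -n "${2:-default}" -o jsonpath='{.spec.selector}'
}

# Show all Pods in a namespace together with their labels.
# $1 = namespace (defaults to "default")
pod_labels() {
  kubectl get pods -n "${1:-default}" --show-labels
}

# List the Pods that a given selector actually matches.
# $1 = selector in key=value form, $2 = namespace (defaults to "default")
pods_for_selector() {
  kubectl get pods -n "${2:-default}" -l "$1" -o wide
}
```

If `pods_for_selector app=hostnames` returns nothing while the Pods are running, the labels and the selector disagree.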
Selector mismatch example
Service selector:
Pod label:
This will fail because hostnames and hostname are different.
Step 7: Check Endpoints / EndpointSlices
Modern Kubernetes uses EndpointSlices, but the older Endpoints resource is still useful for quick checks.
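Both views can be queried together, as in this sketch. The wrapper name `svc_endpoints` is hypothetical; the `kubernetes.io/service-name` label is the one EndpointSlices carry to link back to their Service.

```shell
# Show the Endpoints object and the EndpointSlices for one Service.
# $1 = Service name, $2 = namespace (defaults to "default")
svc_endpoints() {
  kubectl get endpoints "$1" -n "${2:-default}"
  kubectl get endpointslices -n "${2:-default}" -l "kubernetes.io/service-name=$1"
}
```

A healthy Service lists the Pod IPs paired with the targetPort; an empty list usually means a selector mismatch or Pods that are not Ready.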
Expected:
Success
If endpoints exist, the Service has discovered matching backend Pods.
Danger
If endpoints are empty, traffic to the Service will fail even if the Service object exists.
Step 8: Test Service ClusterIP
Get Service IP:
Example:
Test from inside the cluster:
Inside BusyBox:
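A sketch of the ClusterIP test, assuming the Service exposes its first port over HTTP as in the hostnames example. The function name `test_service` and the temporary Pod name `svctest` are hypothetical.

```shell
# Look up the ClusterIP and first Service port, then fetch it from a throwaway Pod.
# $1 = Service name, $2 = namespace of the Service (defaults to "default")
test_service() {
  svc_ip=$(kubectl get svc "$1" -n "${2:-default}" -o jsonpath='{.spec.clusterIP}')
  svc_port=$(kubectl get svc "$1" -n "${2:-default}" -o jsonpath='{.spec.ports[0].port}')
  kubectl run svctest --rm -i --restart=Never --image=busybox -- \
    wget -qO- -T 5 "$svc_ip:$svc_port"
}
```

Note that this uses the Service port (80 in the example), not the targetPort (9376) used when testing a Pod IP directly.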
Tip
If Pod IP works but Service ClusterIP does not work, check kube-proxy and Service rules.
Step 9: Test DNS Resolution
Test the Service name:
Test FQDN:
kubectl run -it --rm --restart=Never busybox --image=busybox -- nslookup hostnames.default.svc.cluster.local
Common Service DNS formats:
<service-name>
<service-name>.<namespace>
<service-name>.<namespace>.svc
<service-name>.<namespace>.svc.cluster.local
Note
Pods in the same namespace can usually call a Service by its short name. Cross-namespace calls should use <service>.<namespace> or the full FQDN.
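To test every name form in turn, the list can be generated with a small helper. This is a sketch: the function name `svc_dns_names` is hypothetical, and the cluster domain is assumed to be `cluster.local` unless a third argument overrides it.

```shell
# Print the four DNS name forms for a Service, shortest first.
# $1 = Service name, $2 = namespace (defaults to "default"),
# $3 = cluster domain (assumed to be cluster.local)
svc_dns_names() {
  svc=$1; ns=${2:-default}; domain=${3:-cluster.local}
  printf '%s\n' "$svc" "$svc.$ns" "$svc.$ns.svc" "$svc.$ns.svc.$domain"
}
```

Each printed name can then be passed to `nslookup` from inside a test Pod. If only the longer forms resolve, the search path in the Pod's /etc/resolv.conf is the first thing to check.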
Step 10: Check Pod DNS Configuration
From inside a Pod:
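The check can be scripted roughly as follows. The function names are hypothetical, and the sketch assumes the cluster DNS Service is named `kube-dns` in `kube-system`, as referenced elsewhere in this page. Pods that use a node-local DNS cache or a non-default dnsPolicy may legitimately show a different nameserver.

```shell
# Print /etc/resolv.conf from a running Pod.
# $1 = Pod name, $2 = namespace (defaults to "default")
pod_resolv_conf() {
  kubectl exec "$1" -n "${2:-default}" -- cat /etc/resolv.conf
}

# Print the ClusterIP of the kube-dns Service.
kube_dns_ip() {
  kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
}

# Succeed (exit 0) if the Pod's nameserver line equals the kube-dns ClusterIP.
nameserver_matches() {
  dns_ip=$(kube_dns_ip)
  pod_resolv_conf "$1" "$2" | grep -qxF "nameserver $dns_ip"
}
```

Usage: `nameserver_matches <pod-name> && echo "nameserver OK"`.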
Expected style:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Check:
- nameserver points to kube-dns/CoreDNS Service IP
- search domains are present
- namespace in search path is correct
Bug
If /etc/resolv.conf has the wrong nameserver, check kubelet DNS settings and the kube-dns Service.
Step 11: Troubleshoot CoreDNS
CoreDNS usually runs in the kube-system namespace.
Check CoreDNS Pods:
Check CoreDNS Service:
Check EndpointSlice:
Check logs:
Check CoreDNS ConfigMap:
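The CoreDNS checks listed above can be grouped into one pass, as in this sketch. It assumes a kubeadm-style cluster where the CoreDNS Pods carry the label `k8s-app=kube-dns`, the Service is named `kube-dns`, and the Corefile lives in a ConfigMap named `coredns`. The function name is hypothetical.

```shell
# Run the CoreDNS checks in order: Pods, Service, EndpointSlices, logs, ConfigMap.
coredns_check() {
  kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
  kubectl get svc kube-dns -n kube-system
  kubectl get endpointslices -n kube-system -l kubernetes.io/service-name=kube-dns
  kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
  kubectl get configmap coredns -n kube-system -o yaml
}
```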
CoreDNS Corefile
Warning
If CoreDNS Pods are running but DNS fails, verify the kube-dns Service has endpoints and that Pods use the correct nameserver in /etc/resolv.conf.
Step 12: Check CNI Plugin
CNI is responsible for Pod networking and Pod IP assignment.
Check CNI Pods:
Common CNI Pods:
- Calico
- Flannel
- Weave
- Cilium
Check CNI logs:
Check node CNI files:
Check Pod CIDR:
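A sketch of these CNI checks follows. The function names are hypothetical. Because different CNI plugins install into different namespaces, the Pod listing searches all namespaces and filters by the plugin names listed above. `node_cni_files` must be run on the node itself, not through kubectl.

```shell
# List CNI-related Pods in any namespace, filtered by common plugin names.
cni_pods() {
  kubectl get pods -A -o wide | grep -Ei 'calico|flannel|weave|cilium'
}

# Show recent logs of one CNI Pod.
# $1 = namespace, $2 = Pod name
cni_logs() {
  kubectl logs -n "$1" "$2" --tail=50
}

# On the node: list the CNI configuration files.
node_cni_files() {
  ls -l /etc/cni/net.d/
}

# Print each node with the Pod CIDR assigned to it (if the cluster assigns one).
pod_cidrs() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
}
```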
CNI failure signs
- Pods stuck in ContainerCreating
- failed to create pod sandbox
- Pod IP missing
- Pod-to-Pod traffic fails across nodes
- CNI DaemonSet Pods are not running on all nodes
Step 13: Check kube-proxy
kube-proxy programs rules so Service traffic reaches backend Pods.
Check kube-proxy Pods:
Check logs:
Check ConfigMap:
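Look for the mode field in the kube-proxy configuration, for example iptables or ipvs. An empty value means kube-proxy uses its platform default.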
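These kube-proxy checks might be run as follows. The sketch assumes a kubeadm-style cluster where kube-proxy runs as Pods labelled `k8s-app=kube-proxy` and stores its configuration in a ConfigMap named `kube-proxy`. The function names are hypothetical.

```shell
# List kube-proxy Pods with the nodes they run on.
kube_proxy_pods() {
  kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
}

# Show recent kube-proxy logs.
kube_proxy_logs() {
  kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=50
}

# Print the mode line from the kube-proxy ConfigMap.
kube_proxy_mode() {
  kubectl get configmap kube-proxy -n kube-system -o yaml \
    | grep -E '^[[:space:]]*mode:'
}
```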
Note
kube-proxy can use iptables or ipvs. In troubleshooting, first identify which mode your cluster uses.
Step 14: Verify Service Rules
Example IPVS output:
Success
If kube-proxy rules show Service IP forwarding to Pod IPs, Service routing is likely configured correctly.
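The rule check itself happens on a node and usually needs root. The sketch below covers both modes. The function names are hypothetical, and `ipvsadm` is assumed to be installed on the node, which is not always the case.

```shell
# iptables mode: show the saved rules that mention a Service ClusterIP.
# $1 = ClusterIP of the Service
iptables_rules_for() {
  iptables-save | grep -F -- "$1"
}

# IPVS mode: show one virtual server and its real servers (the backend Pods).
# $1 = ClusterIP, $2 = Service port
ipvs_rules_for() {
  ipvsadm -Ln -t "$1:$2"
}
```

In either mode, the thing to look for is the ClusterIP and Service port mapping to the Pod IPs and targetPort seen in the endpoints list.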
Step 15: Check NetworkPolicy
If Pod IP, Service IP, or DNS works from some Pods or namespaces but not from others, check NetworkPolicies.
Common mistakes:
- default deny policy blocks traffic
- wrong namespace selector
- wrong pod selector
- missing egress DNS rule to CoreDNS
- missing port in ingress/egress rule
Warning
NetworkPolicy is enforced by the CNI plugin. If the CNI does not support NetworkPolicy, the policy may exist but not be enforced.
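A short sketch of how the policies in play can be listed and inspected. The function names are hypothetical.

```shell
# List every NetworkPolicy in the cluster.
list_netpols() {
  kubectl get networkpolicy -A
}

# Describe the policies in one namespace, including selectors and allowed ports.
# $1 = namespace (defaults to "default")
describe_netpols() {
  kubectl describe networkpolicy -n "${1:-default}"
}
```

When reading the output, check both directions: ingress rules in the destination namespace and egress rules in the source namespace, including DNS egress to CoreDNS.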
Step 16: Check Node-Level Network
On the affected node:
Check firewall rules:
Check listening ports:
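The node-level checks could be run with standard Linux tools, as sketched here. The function names are hypothetical, the commands run on the node itself (typically as root), and nodes that use nftables or another firewall front end need the equivalent command instead of `iptables`.

```shell
# Interfaces and addresses, including any overlay or tunnel interfaces.
node_interfaces() {
  ip addr show
}

# Routing table: look for routes to other nodes' Pod CIDRs.
node_routes() {
  ip route show
}

# Firewall rules in the filter table, with packet counters.
node_firewall() {
  iptables -L -n -v
}

# Listening TCP and UDP sockets with owning processes.
node_listeners() {
  ss -tulnp
}
```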
Tip
If only cross-node Pod traffic fails, check node routes, overlay tunnel interfaces, firewall rules, and CNI DaemonSet health.
Quick Debug Decision Table
| What works? | What fails? | Likely issue |
|---|---|---|
| Pod IP | Service IP | kube-proxy or Service rules |
| Service IP | DNS name | CoreDNS or Pod /etc/resolv.conf |
| Same-node Pod IP | Cross-node Pod IP | CNI routing/overlay issue |
| DNS from one namespace | DNS from another namespace | namespace/search path/policy issue |
| Service object exists | Service has no endpoints, app access fails | Service selector or Pod readiness |
| Pod is Running | Pod is not listed as an endpoint, Service access fails | readiness probe failing |
| In-cluster access | External access via NodePort/LoadBalancer/Ingress | external routing/firewall/LB issue |
| Every other Pod | One Pod | Pod-specific issue: app config or NetworkPolicy |
CKA-Focused Command Cheat Sheet
Production Best Practices
Do
- Use Services instead of hardcoding Pod IPs.
- Use readiness probes so Services route only to healthy Pods.
- Monitor CoreDNS latency, errors, and restarts.
- Run at least two CoreDNS replicas in production.
- Keep CNI DaemonSet healthy on every node.
- Monitor kube-proxy logs and rule sync failures.
- Use NetworkPolicies intentionally and document them.
- Use clear labels because Services depend on selectors.
- Validate DNS, Service, and Pod connectivity after every CNI upgrade.
- Keep Pod CIDR and Service CIDR non-overlapping.
Don't
- Do not point applications directly to Pod IPs.
- Do not ignore empty Service endpoints.
- Do not change CNI configs manually without understanding cluster impact.
- Do not assume DNS is broken before testing Service ClusterIP.
- Do not create broad default-deny NetworkPolicies without required allow rules.
- Do not forget DNS egress rules when using strict NetworkPolicies.
- Do not mix multiple CNIs unless the architecture explicitly supports it.
Exam Revision Checklist
[ ] Are source and destination Pods running?
[ ] Do Pods have IP addresses?
[ ] Can source Pod reach destination Pod IP?
[ ] Does the Service selector match Pod labels?
[ ] Does the Service have endpoints or EndpointSlices?
[ ] Does Service ClusterIP work?
[ ] Does Service DNS name resolve?
[ ] Is /etc/resolv.conf inside the Pod correct?
[ ] Are CoreDNS Pods running?
[ ] Does kube-dns Service have endpoints?
[ ] Are CNI Pods running on all nodes?
[ ] Is kube-proxy running on all nodes?
[ ] Are iptables/IPVS rules created?
[ ] Are NetworkPolicies blocking traffic?
[ ] Are node routes/firewalls blocking traffic?
Fast Troubleshooting Example
# 1. Check Pods and IPs
kubectl get pods -A -o wide
# 2. Check Service
kubectl get svc hostnames -o yaml
# 3. Check endpoints
kubectl get endpoints hostnames
kubectl get endpointslices -l kubernetes.io/service-name=hostnames
# 4. Test from temporary Pod
kubectl run -it --rm --restart=Never busybox --image=busybox -- sh
# Inside busybox
wget -qO- <pod-ip>:<target-port>
wget -qO- <service-cluster-ip>:<service-port>
nslookup hostnames
nslookup hostnames.default.svc.cluster.local
# 5. Check DNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
# 6. Check kube-proxy
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy
# 7. Check CNI
kubectl get pods -n kube-system
Quote
Network troubleshooting in Kubernetes is simple when done in order: Pod → Service → Endpoints → DNS → CNI → kube-proxy → NetworkPolicy.