12.04 Network Troubleshooting

Kubernetes network troubleshooting means checking how traffic flows between Pods, Services, CoreDNS, CNI, and kube-proxy. For CKA revision, troubleshoot layer by layer instead of guessing.

Goal

Troubleshoot Kubernetes network failures step by step:

Confirm Pods are healthy
Test direct Pod-to-Pod connectivity
Check Service selectors and endpoints
Verify DNS/CoreDNS
Check CNI plugin
Check kube-proxy rules
Validate network policies, ports, and node routes

Kubernetes Network Components

Component	Purpose	Troubleshooting focus
Pod	Application runtime unit	IP, status, logs, container port
Service	Stable access point for Pods	selector, port, targetPort, endpoints
CoreDNS	Cluster DNS / service discovery	DNS Pods, kube-dns service, `/etc/resolv.conf`
CNI plugin	Pod network setup and Pod IP allocation	CNI Pods, `/etc/cni/net.d`, Pod CIDR
kube-proxy	Service rule management	iptables/IPVS rules, kube-proxy logs
NetworkPolicy	Traffic allow/deny rules	namespace, podSelector, ingress/egress rules

Note

Pods are temporary. Services provide stable access. CoreDNS resolves names. CNI gives Pods network connectivity. kube-proxy makes Service traffic reach the correct Pods.

Network Troubleshooting Flow

Application cannot connect
│
├── 1. Check Pod status
├── 2. Test direct Pod IP connectivity
├── 3. Check Service definition
├── 4. Check Service endpoints / EndpointSlices
├── 5. Test Service ClusterIP
├── 6. Test DNS resolution
├── 7. Check CoreDNS
├── 8. Check CNI plugin
├── 9. Check kube-proxy
└── 10. Check NetworkPolicies and firewall rules

Tip

For exams and production incidents, isolate the problem first:

Pod IP works but Service IP fails → Service or kube-proxy issue
Service IP works but DNS name fails → CoreDNS issue
Pod IP fails across nodes → CNI/routing issue
Pod IP fails only for one app → app/container port issue
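
The isolation rules above can be written down as a small decision helper. This is only a sketch: the function name `classify_failure` and its `ok`/`fail` arguments are invented for illustration, and you supply the three results yourself after testing a Pod IP, the Service ClusterIP, and the DNS name.

```shell
#!/bin/sh
# classify_failure POD_IP SERVICE_IP DNS
# Each argument is "ok" or "fail", taken from your own manual tests.
# The function name and argument convention are hypothetical.
classify_failure() {
  pod_ip=$1 svc_ip=$2 dns=$3
  if [ "$pod_ip" = fail ]; then
    echo "Pod network: check the app port, CNI, routing, NetworkPolicy"
  elif [ "$svc_ip" = fail ]; then
    echo "Service layer: check selector, endpoints, kube-proxy"
  elif [ "$dns" = fail ]; then
    echo "DNS layer: check CoreDNS and /etc/resolv.conf"
  else
    echo "All three layers respond: check the application itself"
  fi
}
```

For example, `classify_failure ok fail ok` points at the Service layer, which matches the first rule in the list.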

Step 1: Understand the Traffic Path

Example application:

User
 │
 ▼
Web Service
 │
 ▼
Web Pod
 │
 ▼
DB Service
 │
 ▼
DB Pod

Before troubleshooting, identify:

source Pod
destination Service or Pod
namespace
port
protocol
expected DNS name
expected endpoint Pod

Ask first

Is the issue from outside the cluster or inside the cluster?
Is the issue with IP connectivity or DNS?
Does direct Pod IP work?
Does Service ClusterIP work?
Are endpoints created for the Service?

Step 2: Check Pod Status

kubectl get pods -A -o wide

For a specific app:

kubectl get pods -l app=hostnames -o wide

Example:

NAME                         READY   STATUS    RESTARTS   AGE   IP          NODE
hostnames-6f78f87b8f-abcdf   1/1     Running   0          5m    10.244.1.5  node-1
hostnames-6f78f87b8f-hijk1   1/1     Running   0          5m    10.244.2.3  node-2

Check:

READY should be 1/1
STATUS should be Running
RESTARTS should not be increasing
Pod IP should exist
Pods should be on expected nodes

Warning

Do not debug Service or DNS first if the backend Pods are not healthy.
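
With many Pods, the READY and STATUS checks are easier to apply with a filter. The sketch below assumes the column layout shown in the example output (NAME, READY, STATUS first); `unhealthy_pods` is a made-up helper name, not a kubectl feature.

```shell
#!/bin/sh
# unhealthy_pods: reads `kubectl get pods -o wide` output on stdin and prints
# the name of every Pod that is not Running or not fully Ready.
# Hypothetical helper. It assumes the columns NAME READY STATUS ..., so do not
# pipe `-A` output into it (that adds a NAMESPACE column in front).
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")          # READY looks like "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

Usage: `kubectl get pods -l app=hostnames -o wide | unhealthy_pods`. Empty output means every listed Pod is Running and Ready. Completed Job Pods would also be listed, since their status is not Running.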

Step 3: Check Pod Logs and Events

kubectl logs <pod-name>
kubectl describe pod <pod-name>

For previous crashed container logs:

kubectl logs <pod-name> --previous

Follow logs:

kubectl logs <pod-name> -f

Common Pod-level issues

App is not listening on the expected port
Container is restarting
Readiness probe is failing
Wrong container image or config
App binds only to 127.0.0.1 instead of 0.0.0.0
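
The loopback-binding problem can be spotted from the container's listening sockets. The sketch below assumes the image ships `netstat` (many minimal images do not); `listen_scope` is a hypothetical helper that reads `netstat -tln` output.

```shell
#!/bin/sh
# listen_scope PORT: reads `netstat -tln` output on stdin and reports whether
# PORT is listening, and if so whether it is bound to loopback only.
# Hypothetical helper; assumes the usual netstat column layout
# (Local Address is field 4, State is the last field).
listen_scope() {
  awk -v port="$1" '
    $NF == "LISTEN" && $4 ~ (":" port "$") {
      found = 1
      if ($4 !~ /^127\.|^::1:/) reachable = 1
    }
    END {
      if (!found) print "not listening"
      else if (reachable) print "reachable from other Pods"
      else print "loopback only"
    }'
}
```

Usage: `kubectl exec <pod-name> -- netstat -tln | listen_scope 9376`. A result of "loopback only" means the app must be reconfigured to bind to 0.0.0.0.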

Step 4: Test Direct Pod-to-Pod Connectivity

First get Pod IPs:

kubectl get pods -o wide

Or get only Pod IPs:

kubectl get pods -l app=hostnames -o jsonpath='{.items[*].status.podIP}'

Run a temporary BusyBox Pod:

kubectl run -it --rm --restart=Never busybox --image=busybox -- sh

Inside BusyBox:

for ip in 10.244.1.5 10.244.2.3; do
  # -T 2 keeps a blocked connection from hanging the loop
  wget -qO- -T 2 "$ip:9376"
done

Interpretation

If direct Pod IP works, the CNI network path is probably working.

If direct Pod IP fails across nodes, check CNI, routing, node firewall, or NetworkPolicy.

Step 5: Check Service Definition

kubectl get svc hostnames -o yaml

Important fields:

apiVersion: v1
kind: Service
metadata:
  name: hostnames
spec:
  selector:
    app: hostnames
  ports:
    - port: 80
      targetPort: 9376

Check:

selector matches Pod labels
port is the Service port
targetPort matches the container listening port
namespace is correct
Service type is correct

Most common Service issue

The Service selector does not match the Pod labels. When no Pod matches the selector, the Service's endpoint list stays empty and traffic sent to the Service has nowhere to go.

Step 6: Verify Pod Labels Match Service Selectors

Check Service selector:

kubectl describe svc hostnames

Check Pod labels:

kubectl get pods --show-labels

Or:

kubectl get pods -l app=hostnames

Expected result: backend Pods should be listed.

Selector mismatch example

Service selector:

selector:
  app: hostnames

Pod label:

labels:
  app: hostname

This will fail because hostnames and hostname are different.
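
A mismatch like this is easy to miss by eye. The sketch below compares a selector with a Pod's labels, both written as comma-separated key=value lists (the format of the LABELS column from `--show-labels`). `selector_matches` is a hypothetical helper, and it only handles simple equality selectors.

```shell
#!/bin/sh
# selector_matches SELECTOR LABELS
# Both arguments are comma-separated key=value lists.
# Prints "match" when every selector pair is present in LABELS, otherwise
# prints the first missing pair and returns 1. Hypothetical helper.
selector_matches() {
  selector=$1 labels=",$2,"
  old_ifs=$IFS; IFS=,
  for pair in $selector; do
    case $labels in
      *",$pair,"*) ;;                                  # pair found
      *) IFS=$old_ifs; echo "no match: $pair"; return 1 ;;
    esac
  done
  IFS=$old_ifs
  echo "match"
}
```

With the example above, `selector_matches app=hostnames app=hostname` reports the missing pair `app=hostnames`.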

Step 7: Check Endpoints / EndpointSlices

Modern Kubernetes uses EndpointSlices, but endpoints is still useful for quick checks.

EndpointSlices:

kubectl get endpointslices -l kubernetes.io/service-name=hostnames
kubectl get endpointslices -l kubernetes.io/service-name=hostnames -o yaml

Endpoints:

kubectl get endpoints hostnames
kubectl describe endpoints hostnames

Expected:

NAME        ENDPOINTS                         AGE
hostnames   10.244.1.5:9376,10.244.2.3:9376   1h

Success

If endpoints exist, the Service has discovered matching backend Pods.

Danger

If endpoints are empty, traffic to the Service will fail even if the Service object exists.

Step 8: Test Service ClusterIP

Get Service IP:

kubectl get svc hostnames

Example:

NAME        TYPE        CLUSTER-IP      PORT(S)
hostnames   ClusterIP   10.96.120.10    80/TCP

Test from inside the cluster:

kubectl run -it --rm --restart=Never busybox --image=busybox -- sh

Inside BusyBox:

wget -qO- 10.96.120.10:80

Tip

If Pod IP works but Service ClusterIP does not work, check kube-proxy and Service rules.

Step 9: Test DNS Resolution

Test the Service name:

kubectl run -it --rm --restart=Never busybox --image=busybox -- nslookup hostnames

Test FQDN:

kubectl run -it --rm --restart=Never busybox --image=busybox -- nslookup hostnames.default.svc.cluster.local

Common Service DNS formats:

<service-name>
<service-name>.<namespace>
<service-name>.<namespace>.svc
<service-name>.<namespace>.svc.cluster.local

Note

Pods in the same namespace can usually reach a Service by its short name. Cross-namespace calls should use <service>.<namespace> or the full FQDN.
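
To make the four name forms concrete, the sketch below prints them for a given Service and namespace. `service_dns_names` is a hypothetical helper, and it assumes the default cluster domain `cluster.local` unless a third argument is given.

```shell
#!/bin/sh
# service_dns_names SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Prints the DNS names a Pod can use for the Service, shortest first.
# Hypothetical helper; CLUSTER_DOMAIN defaults to cluster.local.
service_dns_names() {
  svc=$1 ns=$2 domain=${3:-cluster.local}
  printf '%s\n' \
    "$svc" \
    "$svc.$ns" \
    "$svc.$ns.svc" \
    "$svc.$ns.svc.$domain"
}
```

Running `service_dns_names hostnames default` ends with `hostnames.default.svc.cluster.local`, the FQDN used in the nslookup test above.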

Step 10: Check Pod DNS Configuration

From inside a Pod:

kubectl exec -it <pod-name> -- cat /etc/resolv.conf

Expected style:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Check:

nameserver points to kube-dns/CoreDNS Service IP
search domains are present
namespace in search path is correct

Bug

If /etc/resolv.conf has the wrong nameserver, check the kubelet clusterDNS setting, the Pod's dnsPolicy, and the kube-dns Service.
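
The resolv.conf checks can be scripted. The sketch below is a hypothetical helper, `check_resolv_conf`, that compares the file against the kube-dns ClusterIP you pass in. It assumes the default `cluster.local` domain, and a Pod using hostNetwork or a non-default dnsPolicy may legitimately differ.

```shell
#!/bin/sh
# check_resolv_conf EXPECTED_DNS_IP: reads a resolv.conf on stdin and reports
# the first problem found, or "ok". Hypothetical helper.
# Assumes the cluster domain is cluster.local.
check_resolv_conf() {
  awk -v want="$1" '
    $1 == "nameserver" { seen = 1; if ($2 == want) ok = 1 }
    $1 == "search"     { if ($0 ~ /svc\.cluster\.local/) search = 1 }
    END {
      if (!seen)        print "no nameserver line"
      else if (!ok)     print "nameserver is not " want
      else if (!search) print "search path lacks svc.cluster.local"
      else              print "ok"
    }'
}

# Example use against a live cluster (not run here):
#   dns_ip=$(kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}')
#   kubectl exec <pod-name> -- cat /etc/resolv.conf | check_resolv_conf "$dns_ip"
```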

Step 11: Troubleshoot CoreDNS

CoreDNS usually runs in the kube-system namespace.

Check CoreDNS Pods:

kubectl get pods -n kube-system -l k8s-app=kube-dns

Check CoreDNS Service:

kubectl get svc -n kube-system kube-dns

Check EndpointSlice:

kubectl get endpointslice -n kube-system -l kubernetes.io/service-name=kube-dns

Check logs:

kubectl logs -n kube-system -l k8s-app=kube-dns

Check CoreDNS ConfigMap:

kubectl get configmap coredns -n kube-system -o yaml

CoreDNS Corefile

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    reload
}

Warning

If CoreDNS Pods are running but DNS fails, verify the kube-dns Service has endpoints and that Pods use the correct nameserver in /etc/resolv.conf.

Step 12: Check CNI Plugin

CNI is responsible for Pod networking and Pod IP assignment.

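Check CNI Pods. Many installs run them in kube-system, but some use a dedicated namespace (for example kube-flannel or calico-system), so use -A if nothing shows up:

kubectl get pods -n kube-system

Common CNI plugins: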

Calico
Flannel
Weave
Cilium

Check CNI logs:

kubectl logs -n kube-system <cni-pod-name>

Check node CNI files:

ls -l /etc/cni/net.d/
ls -l /opt/cni/bin/

Check Pod CIDR:

kubectl get nodes -o wide
kubectl describe node <node-name> | grep -i cidr

CNI failure signs

Pods stuck in ContainerCreating
failed to create pod sandbox
Pod IP missing
Pod-to-Pod traffic fails across nodes
CNI DaemonSet Pods are not running on all nodes
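
The last sign, a CNI DaemonSet that is not running on every node, shows up as READY being lower than DESIRED in `kubectl get ds`. The sketch below filters that output; `daemonsets_short` is a hypothetical helper and assumes the standard column order NAME, DESIRED, CURRENT, READY.

```shell
#!/bin/sh
# daemonsets_short: reads `kubectl get ds -n <namespace>` output on stdin and
# prints every DaemonSet whose READY count is below DESIRED.
# Hypothetical helper; do not pipe `-A` output into it, because the extra
# NAMESPACE column shifts the fields.
daemonsets_short() {
  awk 'NR > 1 && $4 != $2 { printf "%s: %s of %s ready\n", $1, $4, $2 }'
}
```

Usage: `kubectl get ds -n kube-system | daemonsets_short`. The same check covers kube-proxy, which also runs as a DaemonSet in most clusters.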

Step 13: Check kube-proxy

kube-proxy programs rules so Service traffic reaches backend Pods.

Check kube-proxy Pods:

kubectl get pods -n kube-system -l k8s-app=kube-proxy

Check logs:

kubectl logs -n kube-system -l k8s-app=kube-proxy

Check ConfigMap:

kubectl get configmap kube-proxy -n kube-system -o yaml

Look for mode:

mode: "iptables"

or:

mode: "ipvs"

Note

kube-proxy can use iptables or ipvs. In troubleshooting, first identify which mode your cluster uses.
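
The mode can be pulled out of the ConfigMap directly. The sketch below is a hypothetical helper, `kube_proxy_mode`. It treats an empty `mode: ""` as the platform default, which on Linux has been iptables in the releases I am aware of; confirm this for your version.

```shell
#!/bin/sh
# kube_proxy_mode: reads the kube-proxy ConfigMap YAML on stdin and prints the
# configured proxy mode. Hypothetical helper.
# An empty or missing value is reported as the default.
kube_proxy_mode() {
  awk '$1 == "mode:" { gsub(/"/, "", $2); mode = $2; found = 1 }
       END {
         if (found && mode != "") print mode
         else print "default (iptables on Linux)"
       }'
}
```

Usage: `kubectl get configmap kube-proxy -n kube-system -o yaml | kube_proxy_mode`.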

Step 14: Verify Service Rules

iptables mode:

sudo iptables -t nat -L -n -v | grep <service-name>
sudo iptables -t nat -L -n -v | grep <cluster-ip>

IPVS mode:

sudo ipvsadm -Ln

Example IPVS output:

TCP  10.0.1.175:80 rr
  -> 10.244.0.5:9376 Masq 1 0 0
  -> 10.244.0.6:9376 Masq 1 0 0

Success

If kube-proxy rules show Service IP forwarding to Pod IPs, Service routing is likely configured correctly.
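
In IPVS mode, the backends for one Service can be compared directly with its endpoints. The sketch below extracts them from `ipvsadm -Ln` output; `ipvs_backends` is a hypothetical helper and assumes the output layout shown in the example above.

```shell
#!/bin/sh
# ipvs_backends VIP:PORT: reads `ipvsadm -Ln` output on stdin and prints the
# real-server address of every backend listed under that virtual service.
# Hypothetical helper; handles TCP and UDP virtual services.
ipvs_backends() {
  awk -v vip="$1" '
    $1 == "TCP" || $1 == "UDP" { in_svc = ($2 == vip); next }
    in_svc && $1 == "->" && $2 != "RemoteAddress:Port" { print $2 }'
}
```

Usage: `sudo ipvsadm -Ln | ipvs_backends 10.0.1.175:80`. The printed addresses should match the output of `kubectl get endpoints <service-name>`; an empty result means kube-proxy has not programmed the Service.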

Step 15: Check NetworkPolicy

If Pod IP, Service IP, or DNS access works from some Pods or namespaces but not from others, check NetworkPolicies.

kubectl get networkpolicy -A
kubectl describe networkpolicy <policy-name> -n <namespace>

Common mistakes:

default deny policy blocks traffic
wrong namespace selector
wrong pod selector
missing egress DNS rule to CoreDNS
missing port in ingress/egress rule

Warning

NetworkPolicy is enforced by the CNI plugin, not by the API server. If the CNI does not support NetworkPolicy, the policy object can be created but has no effect on traffic.
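
The "missing egress DNS rule" mistake is common enough to deserve an example. The sketch below prints a NetworkPolicy that allows DNS egress to CoreDNS. The function name `dns_egress_policy` and the policy name `allow-dns-egress` are invented for illustration. It assumes CoreDNS Pods carry the `k8s-app=kube-dns` label in kube-system (as in Step 11) and that namespaces carry the automatic `kubernetes.io/metadata.name` label, which recent Kubernetes releases set.

```shell
#!/bin/sh
# dns_egress_policy [NAMESPACE]: prints a NetworkPolicy manifest that lets
# every Pod in NAMESPACE (default: "default") send DNS queries to CoreDNS.
# Hypothetical names: the function and the policy name "allow-dns-egress".
dns_egress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: ${1:-default}
spec:
  podSelector: {}          # all Pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}
```

Usage: `dns_egress_policy default | kubectl apply -f -`. Both UDP and TCP are listed because DNS falls back to TCP for large responses.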

Step 16: Check Node-Level Network

On the affected node:

ip addr
ip route
ping <other-node-ip>
ping <pod-ip>
curl -k https://<api-server-ip>:6443/healthz

Check firewall rules:

sudo iptables -L -n -v
sudo iptables -t nat -L -n -v

Check listening ports:

ss -lntup

Tip

If only cross-node Pod traffic fails, check node routes, overlay tunnel interfaces, firewall rules, and CNI DaemonSet health.

Quick Debug Decision Table

What works?	What fails?	Likely issue
Pod IP	Service IP	kube-proxy or Service rules
Service IP	DNS name	CoreDNS or Pod `/etc/resolv.conf`
Same-node Pod IP	Cross-node Pod IP	CNI routing/overlay issue
DNS from one namespace	DNS from another namespace	namespace/search path/policy issue
Service object exists	App access (Service has no endpoints)	Service selector or Pod readiness
Pod is Running	Service access (Pod is not listed as an endpoint)	readiness probe failing
In-cluster access	External access (NodePort/LoadBalancer/Ingress)	external routing/firewall/LB issue
All other Pods	One specific Pod	App config or NetworkPolicy (Pod-specific issue)

CKA-Focused Command Cheat Sheet

Pods:

kubectl get pods -A -o wide
kubectl describe pod <pod>
kubectl logs <pod>
kubectl logs <pod> --previous

Services:

kubectl get svc -A
kubectl get svc <svc> -o yaml
kubectl describe svc <svc>
kubectl get endpoints <svc>
kubectl get endpointslices -l kubernetes.io/service-name=<svc>

DNS:

kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl get svc -n kube-system kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl exec -it <pod> -- cat /etc/resolv.conf
kubectl run -it --rm --restart=Never busybox --image=busybox -- nslookup kubernetes.default.svc.cluster.local

CNI:

kubectl get pods -n kube-system
ls -l /etc/cni/net.d/
ls -l /opt/cni/bin/
ip addr
ip route

kube-proxy:

kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy
kubectl get configmap kube-proxy -n kube-system -o yaml
sudo iptables -t nat -L -n -v
sudo ipvsadm -Ln

Production Best Practices

Do

Use Services instead of hardcoding Pod IPs.
Use readiness probes so Services route only to healthy Pods.
Monitor CoreDNS latency, errors, and restarts.
Run at least two CoreDNS replicas in production.
Keep CNI DaemonSet healthy on every node.
Monitor kube-proxy logs and rule sync failures.
Use NetworkPolicies intentionally and document them.
Use clear labels because Services depend on selectors.
Validate DNS, Service, and Pod connectivity after every CNI upgrade.
Keep Pod CIDR and Service CIDR non-overlapping.
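
The non-overlap rule for Pod and Service CIDRs can be checked with plain shell arithmetic. The sketch below handles IPv4 only, and `ip_to_int` and `cidrs_overlap` are hypothetical helper names. It relies on the fact that two CIDR blocks overlap exactly when they agree on the shorter of the two prefixes.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: prints the IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1                 # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidrs_overlap CIDR1 CIDR2: returns 0 when the two IPv4 ranges overlap,
# 1 otherwise. Hypothetical helper; no input validation.
cidrs_overlap() {
  a=$(ip_to_int "${1%/*}"); alen=${1#*/}
  b=$(ip_to_int "${2%/*}"); blen=${2#*/}
  # Compare only the bits covered by the shorter prefix.
  len=$alen
  if [ "$blen" -lt "$len" ]; then len=$blen; fi
  shift_by=$(( 32 - len ))
  [ $(( a >> shift_by )) -eq $(( b >> shift_by )) ]
}
```

Usage with illustrative values: `cidrs_overlap 10.244.0.0/16 10.96.0.0/12 && echo overlap || echo "no overlap"`. Substitute the Pod and Service CIDRs your cluster was actually created with.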

Don't

Do not point applications directly to Pod IPs.
Do not ignore empty Service endpoints.
Do not change CNI configs manually without understanding cluster impact.
Do not assume DNS is broken before testing Service ClusterIP.
Do not create broad default-deny NetworkPolicies without required allow rules.
Do not forget DNS egress rules when using strict NetworkPolicies.
Do not mix multiple CNIs unless the architecture explicitly supports it.

Exam Revision Checklist

[ ] Are source and destination Pods running?
[ ] Do Pods have IP addresses?
[ ] Can source Pod reach destination Pod IP?
[ ] Does the Service selector match Pod labels?
[ ] Does the Service have endpoints or EndpointSlices?
[ ] Does Service ClusterIP work?
[ ] Does Service DNS name resolve?
[ ] Is /etc/resolv.conf inside the Pod correct?
[ ] Are CoreDNS Pods running?
[ ] Does kube-dns Service have endpoints?
[ ] Are CNI Pods running on all nodes?
[ ] Is kube-proxy running on all nodes?
[ ] Are iptables/IPVS rules created?
[ ] Are NetworkPolicies blocking traffic?
[ ] Are node routes/firewalls blocking traffic?

Fast Troubleshooting Example

# 1. Check Pods and IPs
kubectl get pods -A -o wide

# 2. Check Service
kubectl get svc hostnames -o yaml

# 3. Check endpoints
kubectl get endpoints hostnames
kubectl get endpointslices -l kubernetes.io/service-name=hostnames

# 4. Test from temporary Pod
kubectl run -it --rm --restart=Never busybox --image=busybox -- sh

# Inside busybox
wget -qO- <pod-ip>:<target-port>
wget -qO- <service-cluster-ip>:<service-port>
nslookup hostnames
nslookup hostnames.default.svc.cluster.local

# 5. Check DNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns

# 6. Check kube-proxy
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy

# 7. Check CNI
kubectl get pods -n kube-system

Quote

Network troubleshooting in Kubernetes is simple when done in order: Pod → Service → Endpoints → DNS → CNI → kube-proxy → NetworkPolicy.