12.04 Network Troubleshooting

Kubernetes network troubleshooting means checking how traffic flows between Pods, Services, CoreDNS, CNI, and kube-proxy. For CKA revision, troubleshoot layer by layer instead of guessing.

Goal

Troubleshoot Kubernetes network failures step by step:

  1. Confirm Pods are healthy
  2. Test direct Pod-to-Pod connectivity
  3. Check Service selectors and endpoints
  4. Verify DNS/CoreDNS
  5. Check CNI plugin
  6. Check kube-proxy rules
  7. Validate network policies, ports, and node routes

Kubernetes Network Components

| Component | Purpose | Troubleshooting focus |
|---|---|---|
| Pod | Application runtime unit | IP, status, logs, container port |
| Service | Stable access point for Pods | selector, port, targetPort, endpoints |
| CoreDNS | Cluster DNS / service discovery | DNS Pods, kube-dns service, /etc/resolv.conf |
| CNI plugin | Pod network setup and Pod IP allocation | CNI Pods, /etc/cni/net.d, Pod CIDR |
| kube-proxy | Service rule management | iptables/IPVS rules, kube-proxy logs |
| NetworkPolicy | Traffic allow/deny rules | namespace, podSelector, ingress/egress rules |

Note

Pods are temporary. Services provide stable access. CoreDNS resolves names. CNI gives Pods network connectivity. kube-proxy makes Service traffic reach the correct Pods.


Network Troubleshooting Flow

Application cannot connect
├── 1. Check Pod status
├── 2. Test direct Pod IP connectivity
├── 3. Check Service definition
├── 4. Check Service endpoints / EndpointSlices
├── 5. Test Service ClusterIP
├── 6. Test DNS resolution
├── 7. Check CoreDNS
├── 8. Check CNI plugin
├── 9. Check kube-proxy
└── 10. Check NetworkPolicies and firewall rules

Tip

For exams and production incidents, isolate the problem first:

  • Pod IP works but Service IP fails → Service or kube-proxy issue
  • Service IP works but DNS name fails → CoreDNS issue
  • Pod IP fails across nodes → CNI/routing issue
  • Pod IP fails only for one app → app/container port issue
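
The isolation rules above can be written down as a small decision helper. This is only a sketch: `classify_layer` is a hypothetical function name, and its three arguments are `ok` or `fail` values you record by hand after running the Pod IP, Service IP, and DNS tests described later in this guide.

```shell
# classify_layer POD_IP SERVICE_IP DNS
# Each argument is "ok" or "fail", noted manually from the connectivity tests.
# The first failing layer (checked bottom-up) decides where to look.
classify_layer() (
  pod_ip=$1 svc_ip=$2 dns=$3
  if [ "$pod_ip" = fail ]; then
    echo "Pod IP unreachable: check the app port, CNI, routing, NetworkPolicy"
  elif [ "$svc_ip" = fail ]; then
    echo "Service IP unreachable: check selector, endpoints, kube-proxy"
  elif [ "$dns" = fail ]; then
    echo "DNS name fails: check CoreDNS and /etc/resolv.conf"
  else
    echo "All three layers respond: look at the application itself"
  fi
)
```

For example, `classify_layer ok fail ok` points at the Service or kube-proxy rather than DNS.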

Step 1: Understand the Traffic Path

Example application:

User → Web Service → Web Pod → DB Service → DB Pod

Before troubleshooting, identify:

  • source Pod
  • destination Service or Pod
  • namespace
  • port
  • protocol
  • expected DNS name
  • expected endpoint Pod

Ask first

  • Is the issue from outside the cluster or inside the cluster?
  • Is the issue with IP connectivity or DNS?
  • Does direct Pod IP work?
  • Does Service ClusterIP work?
  • Are endpoints created for the Service?

Step 2: Check Pod Status

kubectl get pods -A -o wide

For a specific app:

kubectl get pods -l app=hostnames -o wide

Example:

NAME                         READY   STATUS    RESTARTS   AGE   IP          NODE
hostnames-6f78f87b8f-abcdf   1/1     Running   0          5m    10.244.1.5  node-1
hostnames-6f78f87b8f-hijk1   1/1     Running   0          5m    10.244.2.3  node-2

Check:

  • READY should show every container ready (1/1 for a single-container Pod)
  • STATUS should be Running
  • RESTARTS should not be increasing
  • Pod IP should exist
  • Pods should be on expected nodes
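
On a long Pod list, the READY and STATUS checks can be filtered rather than read by eye. The helper below is a rough sketch: `flag_unhealthy_pods` is a hypothetical name, and it assumes single-namespace output, where NAME is the first column (with `-A` the columns shift by one).

```shell
# flag_unhealthy_pods: reads `kubectl get pods -o wide` output on stdin
# (single namespace, so NAME is column 1) and prints Pods that are not
# fully ready or whose STATUS is not Running.
flag_unhealthy_pods() {
  awk 'NR > 1 {
    split($2, r, "/")                    # READY looks like "1/1"
    if (r[1] != r[2] || $3 != "Running")
      print $1 ": READY=" $2 " STATUS=" $3
  }'
}
```

Typical use would be `kubectl get pods -l app=hostnames -o wide | flag_unhealthy_pods`. Note that Pods from finished Jobs (STATUS Completed) are also flagged by this simple filter.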

Warning

Do not debug Service or DNS first if the backend Pods are not healthy.


Step 3: Check Pod Logs and Events

kubectl logs <pod-name>
kubectl describe pod <pod-name>

For previous crashed container logs:

kubectl logs <pod-name> --previous

Follow logs:

kubectl logs <pod-name> -f

Common Pod-level issues

  • App is not listening on the expected port
  • Container is restarting
  • Readiness probe is failing
  • Wrong container image or config
  • App binds only to 127.0.0.1 instead of 0.0.0.0
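
The loopback-binding case is easy to miss because the app looks healthy from inside its own container. As a sketch, the hypothetical helper `listen_scope` below classifies the output of `ss -lnt`. It assumes `ss` is available in the container image, which is often not true for minimal images.

```shell
# listen_scope PORT
# Reads `ss -lnt` output on stdin (Local Address:Port is column 4) and reports
# whether PORT is bound to loopback only, to another address, or not at all.
listen_scope() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, a, ":")
      if (a[n] != port) next
      addr = substr($4, 1, length($4) - length(a[n]) - 1)
      if (addr == "127.0.0.1" || addr == "[::1]") loopback = 1
      else other = 1
    }
    END {
      if (other)         print "listening on a non-loopback address"
      else if (loopback) print "loopback only"
      else               print "not listening"
    }'
}
```

A possible invocation is `kubectl exec <pod-name> -- ss -lnt | listen_scope 9376`. A result of `loopback only` means other Pods cannot reach the port even though the process is running.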

Step 4: Test Direct Pod-to-Pod Connectivity

First get Pod IPs:

kubectl get pods -o wide

Or get only Pod IPs:

kubectl get pods -l app=hostnames -o jsonpath='{.items[*].status.podIP}'

Run a temporary BusyBox Pod:

kubectl run -it --rm --restart=Never busybox --image=busybox -- sh

Inside BusyBox:

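# Try each backend Pod IP on the container port; -T 2 gives up after 2 seconds
for ip in 10.244.1.5 10.244.2.3; do
  wget -qO- -T 2 "$ip:9376" || echo "FAILED: $ip"
done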

Interpretation

If direct Pod IP works, the CNI network path is probably working.

If direct Pod IP fails across nodes, check CNI, routing, node firewall, or NetworkPolicy.


Step 5: Check Service Definition

kubectl get svc hostnames -o yaml

Important fields:

apiVersion: v1
kind: Service
metadata:
  name: hostnames
spec:
  selector:
    app: hostnames
  ports:
    - port: 80
      targetPort: 9376

Check:

  • selector matches Pod labels
  • port is the Service port
  • targetPort matches the container listening port
  • namespace is correct
  • Service type is correct

Most common Service issue

The Service selector does not match the Pod labels. If the selector matches no Pods, the Service still exists, but its endpoint list stays empty and nothing receives the traffic.


Step 6: Verify Pod Labels Match Service Selectors

Check Service selector:

kubectl describe svc hostnames

Check Pod labels:

kubectl get pods --show-labels

Or:

kubectl get pods -l app=hostnames

Expected result: backend Pods should be listed.

Selector mismatch example

Service selector:

selector:
  app: hostnames

Pod label:

labels:
  app: hostname

This will fail because hostnames and hostname are different.
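
A one-character mismatch like this is hard to spot by eye, so the comparison can be done mechanically. The sketch below uses a hypothetical helper, `selector_matches`, which takes the selector and the Pod labels as comma-separated `key=value` lists, the same shape as the LABELS column of `kubectl get pods --show-labels`. It only covers equality selectors.

```shell
# selector_matches SELECTOR LABELS
# Both arguments are comma-separated key=value lists, e.g. "app=hostnames".
# Returns 0 only if every selector pair appears verbatim in the labels.
selector_matches() (
  selector=$1 labels=$2
  IFS=,                          # split the selector on commas (subshell only)
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;            # this pair is present, keep checking
      *) exit 1 ;;               # missing key or different value: no match
    esac
  done
  exit 0
)
```

With the example above, `selector_matches "app=hostnames" "app=hostname"` returns a non-zero status, which is exactly the situation that leaves the Service without endpoints.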


Step 7: Check Endpoints / EndpointSlices

Modern Kubernetes tracks Service backends in EndpointSlices. The older Endpoints object is still handy for a quick check, although recent releases mark it as deprecated.

kubectl get endpointslices -l kubernetes.io/service-name=hostnames
kubectl get endpointslices -l kubernetes.io/service-name=hostnames -o yaml
kubectl get endpoints hostnames
kubectl describe endpoints hostnames

Expected:

NAME        ENDPOINTS                         AGE
hostnames   10.244.1.5:9376,10.244.2.3:9376   1h

Success

If endpoints exist, the Service has discovered matching backend Pods.

Danger

If endpoints are empty, traffic to the Service will fail even if the Service object exists.
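
When checking several Services, the empty case can be picked out of the `kubectl get endpoints` table automatically. This is a minimal sketch with a hypothetical helper name, `endpoints_status`. It relies on kubectl printing `<none>` in the ENDPOINTS column when a Service has no backends.

```shell
# endpoints_status: reads `kubectl get endpoints [<svc>]` output on stdin and
# reports whether the ENDPOINTS column (column 2) lists any backends.
endpoints_status() {
  awk 'NR > 1 {
    if ($2 == "<none>") print $1 ": no endpoints (check selector and readiness)"
    else                print $1 ": endpoints " $2
  }'
}
```

For example: `kubectl get endpoints hostnames | endpoints_status`.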


Step 8: Test Service ClusterIP

Get Service IP:

kubectl get svc hostnames

Example:

NAME        TYPE        CLUSTER-IP      PORT(S)
hostnames   ClusterIP   10.96.120.10    80/TCP

Test from inside the cluster:

kubectl run -it --rm --restart=Never busybox --image=busybox -- sh

Inside BusyBox:

wget -qO- 10.96.120.10:80

Tip

If Pod IP works but Service ClusterIP does not work, check kube-proxy and Service rules.


Step 9: Test DNS Resolution

Test the Service name (if nslookup gives odd results with a recent BusyBox image, try pinning --image=busybox:1.28):

kubectl run -it --rm --restart=Never busybox --image=busybox -- nslookup hostnames

Test FQDN:

kubectl run -it --rm --restart=Never busybox --image=busybox -- nslookup hostnames.default.svc.cluster.local

Common Service DNS formats:

<service-name>
<service-name>.<namespace>
<service-name>.<namespace>.svc
<service-name>.<namespace>.svc.cluster.local

Note

Pods in the same namespace as the Service can usually call it by its short name. Cross-namespace calls should use <service>.<namespace> or the full FQDN.
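
To avoid typos when testing each form, the names can be generated from the Service and namespace. The helper below is a sketch: `service_dns_names` is a hypothetical name, and the default cluster domain `cluster.local` is an assumption that holds for most clusters but is configurable.

```shell
# service_dns_names SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Prints the four name forms, shortest first. The cluster domain defaults to
# cluster.local (an assumption; some clusters use a different domain).
service_dns_names() (
  svc=$1 ns=$2 domain=${3:-cluster.local}
  printf '%s\n' "$svc" "$svc.$ns" "$svc.$ns.svc" "$svc.$ns.svc.$domain"
)
```

Inside a debug Pod, something like `for n in $(service_dns_names hostnames default); do nslookup "$n"; done` then shows which forms resolve. The short name is expected to work only from the Service's own namespace.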


Step 10: Check Pod DNS Configuration

From inside a Pod:

kubectl exec -it <pod-name> -- cat /etc/resolv.conf

Expected style:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Check:

  • nameserver points to kube-dns/CoreDNS Service IP
  • search domains are present
  • namespace in search path is correct

Bug

If /etc/resolv.conf has the wrong nameserver, check kubelet DNS settings and the kube-dns Service.
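
The nameserver check can be scripted by comparing the Pod's resolv.conf with the kube-dns ClusterIP. The sketch below uses a hypothetical helper, `check_resolv_conf`. It assumes Pods point directly at the kube-dns Service, which will not be the case on clusters that run a node-local DNS cache.

```shell
# check_resolv_conf EXPECTED_DNS_IP
# Reads a Pod's /etc/resolv.conf on stdin; succeeds (exit 0) if some
# nameserver line matches the expected kube-dns ClusterIP.
check_resolv_conf() {
  awk -v want="$1" '
    $1 == "nameserver" && $2 == want { found = 1 }
    END { exit found ? 0 : 1 }'
}
```

One way to use it: fetch the expected IP with `kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}'`, then run `kubectl exec <pod-name> -- cat /etc/resolv.conf | check_resolv_conf <that-ip>` and check the exit status.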


Step 11: Troubleshoot CoreDNS

CoreDNS usually runs in the kube-system namespace.

Check CoreDNS Pods:

kubectl get pods -n kube-system -l k8s-app=kube-dns

Check CoreDNS Service:

kubectl get svc -n kube-system kube-dns

Check EndpointSlice:

kubectl get endpointslices -n kube-system -l kubernetes.io/service-name=kube-dns

Check logs:

kubectl logs -n kube-system -l k8s-app=kube-dns

Check CoreDNS ConfigMap:

kubectl get configmap coredns -n kube-system -o yaml

CoreDNS Corefile

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    reload
}

Warning

If CoreDNS Pods are running but DNS fails, verify the kube-dns Service has endpoints and that Pods use the correct nameserver in /etc/resolv.conf.


Step 12: Check CNI Plugin

CNI is responsible for Pod networking and Pod IP assignment.

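Check CNI Pods (many plugins run in kube-system; some installs use their own namespace, such as kube-flannel or calico-system):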

kubectl get pods -n kube-system

Common CNI plugins:

  • Calico
  • Flannel
  • Weave
  • Cilium

Check CNI logs:

kubectl logs -n kube-system <cni-pod-name>

Check node CNI files:

ls -l /etc/cni/net.d/
ls -l /opt/cni/bin/

Check Pod CIDR:

kubectl get nodes -o wide
kubectl describe node <node-name> | grep -i cidr

CNI failure signs

  • Pods stuck in ContainerCreating
  • failed to create pod sandbox
  • Pod IP missing
  • Pod-to-Pod traffic fails across nodes
  • CNI DaemonSet Pods are not running on all nodes

Step 13: Check kube-proxy

kube-proxy programs rules so Service traffic reaches backend Pods.

Check kube-proxy Pods:

kubectl get pods -n kube-system -l k8s-app=kube-proxy

Check logs:

kubectl logs -n kube-system -l k8s-app=kube-proxy

Check ConfigMap:

kubectl get configmap kube-proxy -n kube-system -o yaml

Look for mode:

mode: "iptables"

or:

mode: "ipvs"

Note

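kube-proxy can run in iptables or ipvs mode, and newer releases also offer nftables. An empty mode value ("") means the platform default, which is iptables on Linux. Identify the mode first, because it determines which rules you inspect.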
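
The mode can be pulled straight out of the ConfigMap instead of scrolling through the YAML. This is a sketch with a hypothetical helper name, `proxy_mode`. It assumes a kubeadm-style ConfigMap, where the kube-proxy configuration is embedded as text and contains a `mode:` line.

```shell
# proxy_mode: reads the kube-proxy ConfigMap YAML on stdin and prints the
# configured mode. An empty value is reported as the Linux default, iptables.
proxy_mode() {
  awk '$1 == "mode:" {
    gsub(/"/, "", $2)                  # strip the surrounding quotes
    print ($2 == "" ? "iptables (default)" : $2)
    exit
  }'
}
```

For example: `kubectl get configmap kube-proxy -n kube-system -o yaml | proxy_mode`. If nothing is printed, the ConfigMap has no `mode:` line and the kube-proxy logs are the better source.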


Step 14: Verify Service Rules

Run these on a node. In iptables mode:

sudo iptables -t nat -L -n -v | grep <service-name>
sudo iptables -t nat -L -n -v | grep <cluster-ip>

In IPVS mode:

sudo ipvsadm -Ln

Example IPVS output:

TCP  10.96.120.10:80 rr
  -> 10.244.1.5:9376 Masq 1 0 0
  -> 10.244.2.3:9376 Masq 1 0 0

Success

If kube-proxy rules show Service IP forwarding to Pod IPs, Service routing is likely configured correctly.
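
On a busy node the IPVS table is long, so it helps to extract only the backends of the Service under investigation and compare them with the Service's endpoints. The helper below is a sketch with a hypothetical name, `ipvs_backends`. It handles TCP and UDP virtual services only.

```shell
# ipvs_backends VIP:PORT
# Reads `ipvsadm -Ln` output on stdin and prints the real servers listed
# under the given virtual service (TCP or UDP entries only).
ipvs_backends() {
  awk -v vip="$1" '
    $1 == "TCP" || $1 == "UDP" { in_svc = ($2 == vip); next }
    in_svc && $1 == "->"       { print $2 }'
}
```

For example: `sudo ipvsadm -Ln | ipvs_backends <cluster-ip>:80`. The printed addresses should match the Pod IP and targetPort pairs shown by `kubectl get endpoints`.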


Step 15: Check NetworkPolicy

If Pod IP, Service IP, or DNS access works from some Pods or namespaces but not from others, check NetworkPolicies.

kubectl get networkpolicy -A
kubectl describe networkpolicy <policy-name> -n <namespace>

Common mistakes:

  • default deny policy blocks traffic
  • wrong namespace selector
  • wrong pod selector
  • missing egress DNS rule to CoreDNS
  • missing port in ingress/egress rule

Warning

NetworkPolicy is enforced by the CNI plugin, not by the API server. If the CNI does not support NetworkPolicy (Flannel on its own, for example), the policy object is accepted but has no effect on traffic.
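
The missing DNS egress rule is the most common of these mistakes once a default-deny egress policy is in place. As an illustration, the hypothetical helper `dns_egress_policy` below prints a manifest that allows DNS to CoreDNS. The policy name `allow-dns-egress` is made up, and the sketch assumes CoreDNS Pods carry the `k8s-app=kube-dns` label and that the `kube-system` namespace has the automatic `kubernetes.io/metadata.name` label.

```shell
# dns_egress_policy NAMESPACE
# Prints a NetworkPolicy manifest that lets every Pod in NAMESPACE send DNS
# queries (UDP and TCP port 53) to Pods labeled k8s-app=kube-dns in kube-system.
# The policy name allow-dns-egress is a hypothetical choice.
dns_egress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}
```

It could be applied with `dns_egress_policy default | kubectl apply -f -`. Because NetworkPolicies are additive, this only adds an allowance and does not loosen any other rule.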


Step 16: Check Node-Level Network

On the affected node:

ip addr
ip route
ping <other-node-ip>
ping <pod-ip>
curl -k https://<api-server-ip>:6443/healthz

Check firewall rules:

sudo iptables -L -n -v
sudo iptables -t nat -L -n -v

Check listening ports:

ss -lntup

Tip

If only cross-node Pod traffic fails, check node routes, overlay tunnel interfaces, firewall rules, and CNI DaemonSet health.
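
For the route check, one quick test is whether the node has a route for the Pod CIDR of the other node. The helper below is a sketch with a hypothetical name, `has_route`. It assumes the CNI installs one route per node Pod CIDR (as Flannel typically does); plugins that allocate smaller address blocks will show different prefixes.

```shell
# has_route CIDR
# Reads `ip route` output on stdin; succeeds (exit 0) if a route whose
# destination is exactly CIDR exists.
has_route() {
  awk -v cidr="$1" '$1 == cidr { found = 1 } END { exit found ? 0 : 1 }'
}
```

For example, on node-1: `ip route | has_route 10.244.2.0/24 || echo "no route to node-2 Pod CIDR"`, using the CIDR reported by `kubectl describe node <node-name> | grep -i cidr`.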


Quick Debug Decision Table

| What works? | What fails? | Likely issue |
|---|---|---|
| Pod IP | Service IP | kube-proxy or Service rules |
| Service IP | DNS name | CoreDNS or Pod /etc/resolv.conf |
| Same-node Pod IP | Cross-node Pod IP | CNI routing/overlay issue |
| DNS from one namespace | DNS from another namespace | namespace/search path/policy issue |
| Service object exists | App access (Service has no endpoints) | Service selector or Pod readiness |
| Pod is Running | Service access (Pod is not listed as an endpoint) | readiness probe failing |
| In-cluster access | External access via NodePort/LoadBalancer/Ingress | external routing/firewall/LB issue |
| Every other Pod | One Pod | Pod-specific issue: app config or NetworkPolicy |

CKA-Focused Command Cheat Sheet

kubectl get pods -A -o wide
kubectl describe pod <pod>
kubectl logs <pod>
kubectl logs <pod> --previous
kubectl get svc -A
kubectl get svc <svc> -o yaml
kubectl describe svc <svc>
kubectl get endpoints <svc>
kubectl get endpointslices -l kubernetes.io/service-name=<svc>
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl get svc -n kube-system kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl exec -it <pod> -- cat /etc/resolv.conf
kubectl run -it --rm --restart=Never busybox --image=busybox -- nslookup kubernetes.default.svc.cluster.local
kubectl get pods -n kube-system
ls -l /etc/cni/net.d/
ls -l /opt/cni/bin/
ip addr
ip route
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy
kubectl get configmap kube-proxy -n kube-system -o yaml
sudo iptables -t nat -L -n -v
sudo ipvsadm -Ln

Production Best Practices

Do

  • Use Services instead of hardcoding Pod IPs.
  • Use readiness probes so Services route only to healthy Pods.
  • Monitor CoreDNS latency, errors, and restarts.
  • Run at least two CoreDNS replicas in production.
  • Keep CNI DaemonSet healthy on every node.
  • Monitor kube-proxy logs and rule sync failures.
  • Use NetworkPolicies intentionally and document them.
  • Use clear labels because Services depend on selectors.
  • Validate DNS, Service, and Pod connectivity after every CNI upgrade.
  • Keep Pod CIDR and Service CIDR non-overlapping.

Don't

  • Do not point applications directly to Pod IPs.
  • Do not ignore empty Service endpoints.
  • Do not change CNI configs manually without understanding cluster impact.
  • Do not assume DNS is broken before testing Service ClusterIP.
  • Do not create broad default-deny NetworkPolicies without required allow rules.
  • Do not forget DNS egress rules when using strict NetworkPolicies.
  • Do not mix multiple CNIs unless the architecture explicitly supports it.

Exam Revision Checklist

[ ] Are source and destination Pods running?
[ ] Do Pods have IP addresses?
[ ] Can source Pod reach destination Pod IP?
[ ] Does the Service selector match Pod labels?
[ ] Does the Service have endpoints or EndpointSlices?
[ ] Does Service ClusterIP work?
[ ] Does Service DNS name resolve?
[ ] Is /etc/resolv.conf inside the Pod correct?
[ ] Are CoreDNS Pods running?
[ ] Does kube-dns Service have endpoints?
[ ] Are CNI Pods running on all nodes?
[ ] Is kube-proxy running on all nodes?
[ ] Are iptables/IPVS rules created?
[ ] Are NetworkPolicies blocking traffic?
[ ] Are node routes/firewalls blocking traffic?

Fast Troubleshooting Example

# 1. Check Pods and IPs
kubectl get pods -A -o wide

# 2. Check Service
kubectl get svc hostnames -o yaml

# 3. Check endpoints
kubectl get endpoints hostnames
kubectl get endpointslices -l kubernetes.io/service-name=hostnames

# 4. Test from temporary Pod
kubectl run -it --rm --restart=Never busybox --image=busybox -- sh

# Inside busybox
wget -qO- <pod-ip>:<target-port>
wget -qO- <service-cluster-ip>:<service-port>
nslookup hostnames
nslookup hostnames.default.svc.cluster.local

# 5. Check DNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns

# 6. Check kube-proxy
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy

# 7. Check CNI
kubectl get pods -n kube-system

Quote

Network troubleshooting in Kubernetes is simple when done in order: Pod → Service → Endpoints → DNS → CNI → kube-proxy → NetworkPolicy.