12.01 Troubleshooting Application Failure

Troubleshooting application failure in Kubernetes means checking the full request path from the user to the backend dependency.

For a typical two-tier app:

User → web-service → web pod → db-service → db pod

Goal

Find where the traffic or application flow breaks:

  • user cannot reach service
  • Service selector does not match Pod labels
  • Service has no endpoints
  • Pod is not running
  • container is restarting
  • application logs show dependency errors
  • backend database/service is unavailable

Troubleshooting Flow

Start from the user-facing side and move inward.

1. Check external access
2. Check web Service
3. Check Service endpoints
4. Check web Pod status
5. Check web Pod events
6. Check web Pod logs
7. Check previous logs if container restarted
8. Check DB Service
9. Check DB Pod
10. Check DB logs and connectivity

Tip

Always draw the application path before troubleshooting. It prevents random debugging and helps you isolate the exact failing layer.


Step 1: Check Application Accessibility

For a web application exposed through a NodePort, test access using curl.

curl http://<node-ip>:<node-port>

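From inside the cluster (for example, from a node or another Pod), test the Service directly on its ClusterIP and Service port:

curl http://<service-ip>:<service-port>

If the request fails with a timeout or a connection error, move to the Service layer.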

Note

A failed curl does not always mean the Pod is broken. It may be a Service, selector, endpoint, port, or network policy issue.
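A timeout and a failed connection usually point at different layers, and curl's exit code tells them apart. The following is a minimal sketch: `probe_url` is a hypothetical helper name and the 5-second limit is an arbitrary choice. Exit codes 7 (failed to connect) and 28 (operation timed out) are curl's documented values.

```shell
# probe_url URL: print the HTTP status, or classify the curl failure.
probe_url() {
  url=$1
  rc=0
  # -s hides progress output, -o /dev/null discards the body,
  # -w prints only the HTTP status code, --max-time caps the total wait.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || rc=$?
  case $rc in
    0)  echo "HTTP $code" ;;
    7)  echo "could not connect (curl exit 7)" ;;
    28) echo "timed out (curl exit 28)" ;;
    *)  echo "curl failed (exit $rc)" ;;
  esac
}

# Usage, with the same placeholders as the command above:
# probe_url "http://<node-ip>:<node-port>"
```

As a rough guide, a timeout often suggests a firewall or a wrong IP address, while a failed connection often suggests that nothing is accepting traffic on that port.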


Step 2: Check the Web Service

Inspect the Service configuration.

kubectl describe service web-service

Check these fields carefully:

| Field | What to verify |
|---|---|
| Selector | Must match Pod labels |
| Type | ClusterIP, NodePort, or LoadBalancer as expected |
| Port | Service port exposed internally |
| TargetPort | Container port where app listens |
| NodePort | External node port if Service is NodePort |
| Endpoints | Must show backend Pod IP and port |

Example describe output (relevant fields) for a Service that has a backing Pod:

Selector:    name=webapp-mysql
TargetPort: 8080/TCP
Endpoints:  10.32.0.6:8080

Healthy Service

A healthy Service should show valid endpoints. If Endpoints is empty, the Service is not connected to any matching Pod.

Common Service failure

If the Service selector does not match the Pod labels, Kubernetes will not create endpoints for that Service.
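The endpoint check can also be scripted. The sketch below assumes kubectl is configured for the right cluster and namespace; `service_backends` is a hypothetical helper name. It reads the Pod IPs from the Service's Endpoints object with a JSONPath query and reports when the list is empty.

```shell
# service_backends SERVICE: print the Pod IPs behind a Service,
# or return non-zero when the Service has no ready endpoints.
service_backends() {
  svc=$1
  ips=$(kubectl get endpoints "$svc" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "$svc has no ready endpoints"
    return 1
  fi
  echo "$svc -> $ips"
}

# Usage:
# service_backends web-service
# To see the selector the Service uses:
# kubectl get svc web-service -o jsonpath='{.spec.selector}'
```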


Step 3: Compare Service Selector and Pod Labels

Check the Service selector:

kubectl describe service web-service

Check Pod labels:

kubectl get pod web --show-labels

Or describe the Pod:

kubectl describe pod web

Example:

# Service selector
selector:
  name: webapp-mysql
# Pod labels
labels:
  name: webapp-mysql

Tip

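Every key/value pair in the Service selector must appear on the Pod with exactly the same key and value; extra labels on the Pod are fine. Even a small mismatch like app=web vs name=web leaves the Service without endpoints.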
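The matching rule can be illustrated with a small pure-shell check. This is only a sketch of the rule, not how Kubernetes implements it: `selector_matches` is a hypothetical helper that takes the selector and the Pod labels as comma-separated key=value strings, the form shown in the LABELS column of kubectl get pod web --show-labels.

```shell
# selector_matches SELECTOR LABELS
# Succeeds only if every key=value pair in SELECTOR is present in LABELS.
selector_matches() {
  selector=$1
  labels=$2
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do          # split the selector on commas
    case ",$labels," in
      *",$pair,"*) ;;                # pair found among the Pod labels
      *) IFS=$old_ifs
         echo "missing on Pod: $pair"
         return 1 ;;
    esac
  done
  IFS=$old_ifs
  echo "selector matches"
}

# Usage:
# selector_matches "name=webapp-mysql" "name=webapp-mysql,tier=frontend"
# selector_matches "app=web" "name=web"
```

To let the cluster do the comparison, pass the selector to kubectl get pods -l name=webapp-mysql and check whether the expected Pod is listed.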


Step 4: Check Pod Status

List Pods and check status, readiness, and restarts.

kubectl get pods

Example output:

NAME   READY   STATUS    RESTARTS   AGE
web    1/1     Running   5          50m

| Field | Meaning |
|---|---|
| READY | Containers ready inside the Pod |
| STATUS | Current Pod state |
| RESTARTS | Number of container restarts |
| AGE | How long the Pod has existed |

Restarts matter

A Pod can show Running but still have many restarts. That usually means the app is crashing and restarting repeatedly.
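The restart count and the reason the last container instance ended are both recorded in the Pod status. The sketch below pulls them out with a JSONPath template; `restart_summary` is a hypothetical helper name, and the reason and exit code fields are empty if the container has never terminated.

```shell
# restart_summary POD: one line per container with its restart count
# and the reason and exit code of its last terminated instance.
restart_summary() {
  pod=$1
  kubectl get pod "$pod" -o jsonpath='{range .status.containerStatuses[*]}{.name}{" restarts="}{.restartCount}{" lastReason="}{.lastState.terminated.reason}{" exitCode="}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Usage:
# restart_summary web
```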


Step 5: Describe the Pod

Use describe to check scheduling, image pull, container creation, and runtime events.

kubectl describe pod web

Look at the Events section:

Normal  Scheduled  default-scheduler  Successfully assigned webapp-mysql to worker-1
Normal  Pulling    kubelet             Pulling image "simple-webapp-mysql"
Normal  Pulled     kubelet             Successfully pulled image
Normal  Created    kubelet             Created container
Normal  Started    kubelet             Started container

Healthy Pod events

Healthy events usually show:

  • scheduled successfully
  • image pulled
  • container created
  • container started

Problem indicators

Watch for event reasons or container states like:

  • FailedScheduling
  • ImagePullBackOff
  • ErrImagePull
  • CrashLoopBackOff
  • CreateContainerConfigError
  • FailedMount
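When the describe output is long, the warnings for one Pod can be listed on their own. This is a sketch assuming the Pod lives in the current namespace; `pod_warnings` is a hypothetical helper name, and the field selector keys are the ones the Events API supports.

```shell
# pod_warnings POD: list only Warning events for the given Pod, oldest first.
pod_warnings() {
  pod=$1
  kubectl get events \
    --field-selector "involvedObject.name=$pod,type=Warning" \
    --sort-by=.lastTimestamp
}

# Usage:
# pod_warnings web
```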

Step 6: Check Current Logs

Check application logs from the running container.

kubectl logs web

Follow logs in real time:

kubectl logs web -f

Example successful web logs:

GET / HTTP/1.1 200
GET /static/img/success.jpg HTTP/1.1 200
GET /favicon.ico HTTP/1.1 404

Note

A 404 for /favicon.ico is usually harmless. Focus on application errors, database errors, authentication failures, and repeated 500 responses.


Step 7: Check Previous Logs

If the Pod restarted, current logs may not show why the previous container failed.

Use:

kubectl logs web --previous

Example:

Some Database Error application exiting!

Important

Use --previous when the container has restarted. This is one of the fastest ways to find the real crash reason.
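kubectl logs --previous returns an error when there is no earlier container instance, so a small wrapper can fall back to the current logs. The sketch below shows one way to do that; `crash_logs` is a hypothetical helper name and the 50-line tail is an arbitrary choice.

```shell
# crash_logs POD: show the tail of the previous container's logs,
# falling back to the current container if there was no restart.
crash_logs() {
  pod=$1
  kubectl logs "$pod" --previous --tail=50 2>/dev/null \
    || kubectl logs "$pod" --tail=50
}

# Usage:
# crash_logs web
```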


Step 8: Check Database Service

After confirming the web Pod is running, check the backend Service.

kubectl describe service db-service

Verify:

  • selector matches DB Pod labels
  • target port matches DB container port
  • endpoints exist
  • service name used by the app is correct

List the endpoints directly:

kubectl get endpoints db-service

DB Service issue

If the DB Service has no endpoints, the web app may fail even if the web Pod itself is healthy.
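To confirm that the Service name the app uses actually resolves, a lookup can be run from a throwaway Pod. This is a sketch under assumptions: the Pod name `dns-check` and the `busybox:1.28` image are illustrative choices, `check_service_dns` is a hypothetical helper name, and the cluster must be able to pull the image.

```shell
# check_service_dns SERVICE: resolve a Service name from inside the cluster
# using a temporary Pod that is deleted when the command finishes.
check_service_dns() {
  svc=$1
  kubectl run dns-check --rm -i --restart=Never \
    --image=busybox:1.28 -- nslookup "$svc"
}

# Usage:
# check_service_dns db-service
```

The temporary Pod runs in the current namespace, so the short name resolves the same way it would for an app Pod in that namespace.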


Step 9: Check Database Pod

kubectl get pods
kubectl describe pod db
kubectl logs db

Check for:

  • database container not starting
  • incorrect environment variables
  • missing Secrets
  • PVC mount failures
  • authentication errors
  • database readiness failures

Common database failures

  • wrong DB password
  • missing Secret
  • incorrect Service name
  • PVC not bound
  • database port mismatch
  • app starts before DB is ready

Useful Commands

curl http://<node-ip>:<node-port>
curl http://<service-ip>:<service-port>
kubectl get svc
kubectl describe svc web-service
kubectl get endpoints web-service
kubectl get pods
kubectl get pods -o wide
kubectl describe pod web
kubectl get pod web --show-labels
kubectl logs web
kubectl logs web -f
kubectl logs web --previous
kubectl exec -it web -- sh

Common Root Causes

| Symptom | Likely cause | What to check |
|---|---|---|
| curl times out | Service/NodePort/network issue | Service type, NodePort, firewall |
| Service has no endpoints | Selector mismatch | Service selector and Pod labels |
| Pod is Pending | Scheduling/resource issue | kubectl describe pod events |
| Pod is ImagePullBackOff | Image issue | image name, tag, registry auth |
| Pod is CrashLoopBackOff | App crash | kubectl logs --previous |
| App returns DB error | Backend issue | DB Service, DB Pod, DB credentials |
| Pod restarts repeatedly | App/runtime failure | restarts count, previous logs |
| Service port works internally, not externally | Exposure issue | NodePort, LoadBalancer, firewall |

Production Troubleshooting Checklist

Do

  • Start from the user-facing endpoint and move layer by layer.
  • Confirm Service selectors match Pod labels.
  • Check Endpoints before blaming the Pod.
  • Check Pod RESTARTS, not only STATUS.
  • Use kubectl describe for events.
  • Use kubectl logs --previous for restarted containers.
  • Validate database/service dependencies.
  • Keep readiness and liveness probes configured.
  • Use meaningful labels like app, tier, and component.
  • Keep application logs structured and searchable.

Don't

  • Do not restart Pods blindly without checking logs first.
  • Do not assume Running means healthy.
  • Do not expose databases directly outside the cluster unless required.
  • Do not use random labels that make Service selection unclear.
  • Do not ignore restart counts.
  • Do not hardcode backend Pod IPs; use Services.
  • Do not store DB credentials directly in manifests.

Production Best Practices

Design for easier troubleshooting

  • Use readinessProbe so traffic only reaches ready Pods.
  • Use livenessProbe to restart unhealthy containers safely.
  • Use startupProbe for slow-starting apps.
  • Expose apps through Services, not Pod IPs.
  • Use separate Services for web and database tiers.
  • Add resource requests and limits.
  • Store credentials in Secrets.
  • Use centralized logging and monitoring.

Readiness vs liveness

Do not use the same probe blindly for both.

  • Readiness controls whether traffic is sent to the Pod.
  • Liveness controls whether the container should be restarted.
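The difference is easier to see in a manifest. The sketch below writes a Pod spec with separate startup, readiness, and liveness probes to a file. The Pod name, label, image, and port 8080 follow the examples earlier on this page; the /ready and /healthz paths, the timings, and the file name web-probes.yaml are assumptions for illustration, and the app would need to serve those paths.

```shell
# Write a Pod manifest with distinct probes to web-probes.yaml.
cat > web-probes.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    name: webapp-mysql
spec:
  containers:
  - name: web
    image: simple-webapp-mysql
    ports:
    - containerPort: 8080
    startupProbe:            # allows a slow start: up to 30 checks, 5s apart
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      failureThreshold: 30
    readinessProbe:          # failing removes the Pod from Service endpoints
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    livenessProbe:           # failing makes the kubelet restart the container
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
EOF

# Validate without creating anything (needs kubectl on the PATH):
# kubectl apply --dry-run=client -f web-probes.yaml
```

A readiness endpoint can reasonably fail while a dependency such as the database is down, which only stops traffic. Tying liveness to the same dependency check would restart a container that is not itself broken.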

Quick Decision Tree

User cannot access app
├── curl NodePort/LoadBalancer fails?
│   └── Check Service type, port, node firewall
├── Service has no endpoints?
│   └── Check selector and Pod labels
├── Pod not ready or restarting?
│   └── Check describe + logs + previous logs
├── Web logs show DB error?
│   └── Check DB Service, DB Pod, DB credentials
└── Everything looks healthy?
    └── Check NetworkPolicy, DNS, app config, ingress/load balancer

Example Investigation

# 1. Test application access
curl http://<node-ip>:<node-port>

# 2. Check web service
kubectl describe svc web-service

# 3. Check endpoints
kubectl get endpoints web-service

# 4. Check pod state
kubectl get pods

# 5. Check pod events
kubectl describe pod web

# 6. Check current logs
kubectl logs web

# 7. Check previous failed container logs
kubectl logs web --previous

# 8. Check backend service
kubectl describe svc db-service

# 9. Check database pod
kubectl describe pod db
kubectl logs db

Quote

Troubleshooting Kubernetes applications is not guessing. Follow the traffic path and verify each object one by one.