12.01 Troubleshooting Application Failure

Troubleshooting application failure in Kubernetes means checking the full request path from the user to the backend dependency.

For a typical two-tier app:

User
  ↓
web-service
  ↓
web pod
  ↓
db-service
  ↓
db pod

Goal

Find where the traffic or application flow breaks:

user cannot reach service
Service selector does not match Pod labels
Service has no endpoints
Pod is not running
container is restarting
application logs show dependency errors
backend database/service is unavailable

Troubleshooting Flow

Start from the user-facing side and move inward.

1. Check external access
2. Check web Service
3. Check Service endpoints
4. Check web Pod status
5. Check web Pod events
6. Check web Pod logs
7. Check previous logs if container restarted
8. Check DB Service
9. Check DB Pod
10. Check DB logs and connectivity

Tip

Always draw the application path before troubleshooting. It prevents random debugging and helps you isolate the exact failing layer.
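The flow above can be scripted so that checks run in order and stop at the first failing layer. This is a minimal sketch: the `run_checks` helper and its `label::command` argument format are illustrative assumptions, not part of kubectl.

```shell
# run_checks: run each check in order and stop at the first one that fails.
# Each argument has the form "label::command" (a format invented for this sketch).
run_checks() {
  for step in "$@"; do
    label=${step%%::*}   # text before the first "::"
    cmd=${step#*::}      # text after the first "::"
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "OK: $label"
    else
      echo "FAIL: $label"
      return 1
    fi
  done
}

# Usage (assumes the Service and Pod names used in this section):
# run_checks \
#   "web Service exists::kubectl get svc web-service" \
#   "web Pod exists::kubectl get pod web" \
#   "db Service exists::kubectl get svc db-service"
```

Stopping at the first failure keeps the output focused on the outermost broken layer, which matches the outside-in order of the flow.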

Step 1: Check Application Accessibility

For a web application exposed through a NodePort, test access using curl.

curl http://<node-ip>:<node-port>

Example (illustrative node IP and NodePort; substitute your own values):

curl http://192.168.1.10:30080

If the request fails with a timeout or a connection error, move to the Service layer.

Note

A failed curl does not always mean the Pod is broken. It may be a Service, selector, endpoint, port, or network policy issue.
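How curl fails is itself a clue. The sketch below bounds the request with timeouts and maps curl's documented exit codes (6, 7, 28) to a likely layer. The helper names and the suggested next steps are assumptions made for illustration.

```shell
# classify_curl_exit: translate a curl exit code into a likely failing layer.
classify_curl_exit() {
  case "$1" in
    0)  echo "connected: check the HTTP status and application logs" ;;
    6)  echo "DNS failure: hostname could not be resolved" ;;
    7)  echo "connection refused or unreachable: check Service, NodePort, endpoints" ;;
    28) echo "timeout: check firewall, NetworkPolicy, node reachability" ;;
    *)  echo "other curl error ($1): see the curl man page" ;;
  esac
}

# probe_url: request a URL with bounded timeouts and report what happened.
probe_url() {
  rc=0
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    --connect-timeout 5 --max-time 10 "$1") || rc=$?
  echo "http_status=$code curl_exit=$rc"
  classify_curl_exit "$rc"
}

# Usage:
# probe_url "http://<node-ip>:<node-port>"
```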

Step 2: Check the Web Service

Inspect the Service configuration.

kubectl describe service web-service

Check these fields carefully:

Field	What to verify
`Selector`	Must match Pod labels
`Type`	`ClusterIP`, `NodePort`, or `LoadBalancer` as expected
`Port`	Service port exposed internally
`TargetPort`	Container port where app listens
`NodePort`	External node port if Service is NodePort
`Endpoints`	Must show backend Pod IP and port

Example Service output (trimmed), showing a Service that has an endpoint:

Selector:    name=webapp-mysql
TargetPort: 8080/TCP
Endpoints:  10.32.0.6:8080

Healthy Service

A healthy Service shows at least one endpoint (Pod IP and port). If Endpoints shows <none>, the Service is not connected to any matching, ready Pod.

Common Service failure

If the Service selector does not match the Pod labels, Kubernetes will not create endpoints for that Service. By default, Pods that match but are not Ready are also left out of the endpoints.
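The endpoint check can be reduced to a pass/fail test. A minimal sketch, assuming the Service lives in the current namespace; the `service_has_endpoints` name is hypothetical.

```shell
# service_has_endpoints SERVICE
# Succeeds only if the Service lists at least one ready backend address.
service_has_endpoints() {
  ips=$(kubectl get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "$1 endpoints: $ips"
  else
    echo "$1 has no ready endpoints"
    return 1
  fi
}

# Usage:
# service_has_endpoints web-service
```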

Step 3: Compare Service Selector and Pod Labels

Check the Service selector:

kubectl describe service web-service

Check Pod labels:

kubectl get pod web --show-labels

Or describe the Pod:

kubectl describe pod web

Example:

# Service selector
selector:
  name: webapp-mysql

# Pod labels
labels:
  name: webapp-mysql

Tip

The Service selector and Pod labels must match exactly. Even a small mismatch like app=web vs name=web can break traffic.
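The comparison can also be done mechanically instead of by eye. The sketch below checks that every `key=value` pair in a selector appears among a Pod's labels. The `selector_matches_labels` helper is hypothetical, and both arguments are assumed to be comma-separated lists in the form printed by `--show-labels`.

```shell
# selector_matches_labels SELECTOR LABELS
# Both arguments are comma-separated key=value lists, e.g. "name=webapp-mysql".
# Succeeds only if every selector pair appears verbatim among the labels.
selector_matches_labels() {
  old_ifs=$IFS
  IFS=,
  for pair in $1; do
    case ",$2," in
      *",$pair,"*) ;;   # this pair is present, keep checking
      *) IFS=$old_ifs; echo "missing label: $pair"; return 1 ;;
    esac
  done
  IFS=$old_ifs
  echo "selector matches"
}

# Usage:
# selector_matches_labels "name=webapp-mysql" "name=webapp-mysql,tier=frontend"
```

To ask the cluster directly, `kubectl get pods -l name=webapp-mysql` lists the Pods that a selector would pick up; an empty result points to the same mismatch.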

Step 4: Check Pod Status

List Pods and check status, readiness, and restarts.

kubectl get pods

Example output:

NAME   READY   STATUS    RESTARTS   AGE
web    1/1     Running   5          50m

Field	Meaning
`READY`	Containers ready inside the Pod
`STATUS`	Current Pod state
`RESTARTS`	Number of container restarts
`AGE`	How long the Pod has existed

Restarts matter

A Pod can show Running but still have many restarts. That usually means the app is crashing and restarting repeatedly.
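With many Pods, the readiness and restart columns can be filtered rather than scanned. A minimal sketch that reads `kubectl get pods --no-headers` output; the `flag_unhealthy_pods` helper and the restart threshold are assumptions for illustration.

```shell
# flag_unhealthy_pods MAX_RESTARTS
# Reads `kubectl get pods --no-headers` output on stdin and prints Pods that
# are not fully ready or have restarted more than MAX_RESTARTS times.
flag_unhealthy_pods() {
  awk -v max="$1" '{
    split($2, ready, "/")                       # READY column, e.g. "1/1"
    if (ready[1] != ready[2] || $4 + 0 > max)   # RESTARTS is the 4th column
      print $1 ": ready=" $2 " status=" $3 " restarts=" $4
  }'
}

# Usage:
# kubectl get pods --no-headers | flag_unhealthy_pods 3
```

Pods that finished normally (for example `Completed` Jobs) also show fewer ready containers than total, so they will appear in this list and can be ignored.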

Step 5: Describe the Pod

Use describe to check scheduling, image pull, container creation, and runtime events.

kubectl describe pod web

Look at the Events section:

Normal  Scheduled  default-scheduler  Successfully assigned default/web to worker-1
Normal  Pulling    kubelet             Pulling image "simple-webapp-mysql"
Normal  Pulled     kubelet             Successfully pulled image
Normal  Created    kubelet             Created container
Normal  Started    kubelet             Started container

Healthy Pod events

Healthy events usually show:

scheduled successfully
image pulled
container created
container started

Problem indicators

Watch for event reasons and container states like:

FailedScheduling
ImagePullBackOff
ErrImagePull
CrashLoopBackOff
CreateContainerConfigError
FailedMount
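The problem indicators listed above can be pulled out of long `describe` output with a filter. A minimal sketch; the `problem_events` helper is hypothetical, and the pattern only covers the reasons named in this section.

```shell
# problem_events: read `kubectl describe pod` or `kubectl get events` output
# on stdin and keep only lines that mention a known failure reason.
# "BackOff" also matches ImagePullBackOff and CrashLoopBackOff.
problem_events() {
  grep -E 'FailedScheduling|ErrImagePull|BackOff|CreateContainerConfigError|FailedMount'
}

# Usage (grep exits non-zero when nothing matches, i.e. no problems found):
# kubectl describe pod web | problem_events
# kubectl get events --field-selector involvedObject.name=web,type=Warning
```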

Step 6: Check Current Logs

Check application logs from the running container.

kubectl logs web

Follow logs in real time:

kubectl logs web -f

Example successful web logs:

GET / HTTP/1.1 200
GET /static/img/success.jpg HTTP/1.1 200
GET /favicon.ico HTTP/1.1 404

Note

A 404 for /favicon.ico is usually harmless. Focus on application errors, database errors, authentication failures, and repeated 500 responses.
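To separate harmless 404s from server errors, the log can be reduced to a count of 5xx responses. A minimal sketch, assuming the status code is the last field on each line as in the sample log above; the `count_5xx` helper is hypothetical, and real log formats may need a different field.

```shell
# count_5xx: read access-log lines on stdin and print how many end in a
# 5xx status code. Assumes the status code is the last field on the line.
count_5xx() {
  awk '$NF ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }'
}

# Usage:
# kubectl logs web --tail=200 | count_5xx
```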

Step 7: Check Previous Logs

If the Pod restarted, current logs may not show why the previous container failed.

Use:

kubectl logs web --previous

Example:

Some Database Error application exiting!

Important

Use --previous when the container has restarted. This is one of the fastest ways to find the real crash reason.
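The restart check and the previous-log lookup can be combined. A minimal sketch, assuming a single-container Pod (only the first container status is inspected); the `previous_logs_if_restarted` helper is hypothetical.

```shell
# previous_logs_if_restarted POD
# Prints the last lines from the previous container instance, but only if the
# first container in the Pod has restarted at least once.
previous_logs_if_restarted() {
  restarts=$(kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].restartCount}')
  if [ "${restarts:-0}" -gt 0 ]; then
    kubectl logs "$1" --previous --tail=50
  else
    echo "no restarts recorded for $1"
  fi
}

# Usage:
# previous_logs_if_restarted web
```

For multi-container Pods, add `-c <container-name>` to the `kubectl logs` call to pick the container that restarted.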

Step 8: Check Database Service

After confirming the web Pod is running, check the backend Service.

kubectl describe service db-service

Verify:

selector matches DB Pod labels
target port matches DB container port
endpoints exist
service name used by the app is correct

List the endpoints directly to confirm the Service has a backend:

kubectl get endpoints db-service

DB Service issue

If the DB Service has no endpoints, the web app may fail even if the web Pod itself is healthy.
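One item on the list above, whether the app uses the correct Service name, can be checked from the web Pod's environment. A minimal sketch: the variable name `DB_Host` in the usage line is a hypothetical example, and the container image is assumed to include the `env` command.

```shell
# app_points_at_service POD ENV_VAR SERVICE
# Compares the value of ENV_VAR inside POD with the expected Service name.
app_points_at_service() {
  value=$(kubectl exec "$1" -- env | sed -n "s/^$2=//p")
  if [ "$value" = "$3" ]; then
    echo "$2=$value matches Service $3"
  else
    echo "$2=${value:-<unset>} does not match Service $3"
    return 1
  fi
}

# Usage (DB_Host is an assumed variable name; use the one your app reads):
# app_points_at_service web DB_Host db-service
```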

Step 9: Check Database Pod

kubectl get pods
kubectl describe pod db
kubectl logs db

Check for:

database container not starting
incorrect environment variables
missing Secrets
PVC mount failures
authentication errors
database readiness failures

Common database failures

wrong DB password
missing Secret
incorrect Service name
PVC not bound
database port mismatch
app starts before DB is ready
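Two of the failures above, an unbound PVC and a missing Secret, can be confirmed with direct lookups. A minimal sketch; the helper names and the `db-data` and `db-credentials` object names are hypothetical.

```shell
# pvc_is_bound PVC_NAME
# Prints the claim's phase and succeeds only if it is Bound.
pvc_is_bound() {
  phase=$(kubectl get pvc "$1" -o jsonpath='{.status.phase}')
  echo "$1 phase: ${phase:-unknown}"
  [ "$phase" = "Bound" ]
}

# secret_exists NAME
# Succeeds only if the Secret can be read in the current namespace.
secret_exists() {
  kubectl get secret "$1" >/dev/null 2>&1
}

# Usage (object names are placeholders):
# pvc_is_bound db-data
# secret_exists db-credentials || echo "Secret db-credentials is missing"
```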

Useful Commands

Accessibility

curl http://<node-ip>:<node-port>
curl http://<service-ip>:<service-port>

Services

kubectl get svc
kubectl describe svc web-service
kubectl get endpoints web-service

Pods

kubectl get pods
kubectl get pods -o wide
kubectl describe pod web
kubectl get pod web --show-labels

Logs

kubectl logs web
kubectl logs web -f
kubectl logs web --previous

Debug shell

kubectl exec -it web -- sh

Common Root Causes

Symptom	Likely cause	What to check
`curl` times out	Service/NodePort/network issue	Service type, NodePort, firewall
Service has no endpoints	Selector mismatch	Service selector and Pod labels
Pod is `Pending`	Scheduling/resource issue	`kubectl describe pod` events
Pod is `ImagePullBackOff`	Image issue	image name, tag, registry auth
Pod is `CrashLoopBackOff`	App crash	`kubectl logs --previous`
App returns DB error	Backend issue	DB Service, DB Pod, DB credentials
Pod restarts repeatedly	App/runtime failure	restarts count, previous logs
Service port works internally, not externally	Exposure issue	NodePort, LoadBalancer, firewall
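The Pod-related rows of the table can be expressed as a small lookup from status to next command. A minimal sketch; the `next_step_for_status` helper is hypothetical and covers only the statuses named in this section.

```shell
# next_step_for_status STATUS POD
# Prints the command this section recommends for a given Pod status.
next_step_for_status() {
  case "$1" in
    Pending|ImagePullBackOff|ErrImagePull)
      echo "kubectl describe pod $2" ;;       # scheduling or image issue: read events
    CrashLoopBackOff)
      echo "kubectl logs $2 --previous" ;;    # app crash: read the last container's logs
    Running)
      echo "kubectl logs $2" ;;               # running: read current logs
    *)
      echo "kubectl describe pod $2" ;;       # anything else: start with events
  esac
}

# Usage:
# next_step_for_status CrashLoopBackOff web
```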

Production Troubleshooting Checklist

Do

Start from the user-facing endpoint and move layer by layer.
Confirm Service selectors match Pod labels.
Check Endpoints before blaming the Pod.
Check Pod RESTARTS, not only STATUS.
Use kubectl describe for events.
Use kubectl logs --previous for restarted containers.
Validate database/service dependencies.
Keep readiness and liveness probes configured.
Use meaningful labels like app, tier, and component.
Keep application logs structured and searchable.

Don't

Do not restart Pods blindly without checking logs first.
Do not assume Running means healthy.
Do not expose databases directly outside the cluster unless required.
Do not use random labels that make Service selection unclear.
Do not ignore restart counts.
Do not hardcode backend Pod IPs; use Services.
Do not store DB credentials directly in manifests.

Production Best Practices

Design for easier troubleshooting

Use readinessProbe so traffic only reaches ready Pods.
Use livenessProbe to restart unhealthy containers safely.
Use startupProbe for slow-starting apps.
Expose apps through Services, not Pod IPs.
Use separate Services for web and database tiers.
Add resource requests and limits.
Store credentials in Secrets.
Use centralized logging and monitoring.

Readiness vs liveness

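Do not blindly reuse the same probe for both. A check that fails when a dependency such as the database is down should mark the Pod unready, not restart it.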

Readiness controls whether traffic is sent to the Pod.
Liveness controls whether the container should be restarted.
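Whether a Pod actually has these probes can be read from its spec. A minimal sketch, assuming a single-container Pod (only the first container is inspected); the `probe_configured` helper is hypothetical.

```shell
# probe_configured POD PROBE
# PROBE is readinessProbe, livenessProbe, or startupProbe.
# Succeeds only if the first container in POD defines that probe.
probe_configured() {
  spec=$(kubectl get pod "$1" -o jsonpath="{.spec.containers[0].$2}")
  if [ -n "$spec" ]; then
    echo "$2 configured on $1"
  else
    echo "$2 missing on $1"
    return 1
  fi
}

# Usage:
# probe_configured web readinessProbe
# probe_configured web livenessProbe
```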

Quick Decision Tree

User cannot access app
│
├── curl NodePort/LoadBalancer fails?
│   └── Check Service type, port, node firewall
│
├── Service has no endpoints?
│   └── Check selector and Pod labels
│
├── Pod not ready or restarting?
│   └── Check describe + logs + previous logs
│
├── Web logs show DB error?
│   └── Check DB Service, DB Pod, DB credentials
│
└── Everything looks healthy?
    └── Check NetworkPolicy, DNS, app config, ingress/load balancer
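For the last branch of the tree, a request from inside the cluster helps separate Service and DNS problems from external exposure problems. A minimal sketch: the Pod name `net-debug`, the `busybox:1.36` image, and the port in the usage line are assumptions, and the cluster must be able to pull that image.

```shell
# in_cluster_get URL
# Starts a short-lived busybox Pod, fetches URL from inside the cluster,
# and deletes the Pod when the command exits.
in_cluster_get() {
  kubectl run net-debug --rm -i --restart=Never --image=busybox:1.36 -- \
    wget -q -O - -T 5 "$1"
}

# Usage (replace 8080 with the Service port):
# in_cluster_get http://web-service:8080
# kubectl get networkpolicy
```

If the in-cluster request succeeds while the external curl fails, the Service and DNS are working and the problem is more likely in NodePort, load balancer, or firewall configuration. If it fails by name but works by Service IP, look at cluster DNS.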

Example Investigation

# 1. Test application access
curl http://<node-ip>:<node-port>

# 2. Check web service
kubectl describe svc web-service

# 3. Check endpoints
kubectl get endpoints web-service

# 4. Check pod state
kubectl get pods

# 5. Check pod events
kubectl describe pod web

# 6. Check current logs
kubectl logs web

# 7. Check previous failed container logs
kubectl logs web --previous

# 8. Check backend service
kubectl describe svc db-service

# 9. Check database pod
kubectl describe pod db
kubectl logs db

Quote

Troubleshooting Kubernetes applications is not guessing. Follow the traffic path and verify each object one by one.