12.01 Troubleshooting Application Failure
Troubleshooting application failure in Kubernetes means checking the full request path from the user to the backend dependency.
For a typical two-tier app, that path runs from the user through the web Service and web Pod to the DB Service and DB Pod.
Goal
Find where the traffic or application flow breaks:
- user cannot reach service
- Service selector does not match Pod labels
- Service has no endpoints
- Pod is not running
- container is restarting
- application logs show dependency errors
- backend database/service is unavailable
Troubleshooting Flow
Start from the user-facing side and move inward.
1. Check external access
2. Check web Service
3. Check Service endpoints
4. Check web Pod status
5. Check web Pod events
6. Check web Pod logs
7. Check previous logs if container restarted
8. Check DB Service
9. Check DB Pod
10. Check DB logs and connectivity
Tip
Always draw the application path before troubleshooting. It prevents random debugging and helps you isolate the exact failing layer.
Step 1: Check Application Accessibility
For a web application exposed through a NodePort, test access using curl.
Example:
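A minimal sketch of that check, assuming `curl` is installed on the client. `check_access` is a hypothetical helper name, and `<node-ip>` and `<node-port>` are placeholders for a node address and the Service's NodePort:

```shell
# check_access URL
# Prints the HTTP status code (000 if no response was received).
# curl's own error message, if any, goes to stderr.
check_access() {
  curl --silent --show-error --max-time 5 \
       --output /dev/null --write-out '%{http_code}\n' "$1"
}

# Usage (substitute real values for the placeholders):
# check_access "http://<node-ip>:<node-port>"
```

The 5-second `--max-time` is an arbitrary choice so that a silent network drop shows up as a quick timeout instead of a hang.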
If the request fails with a timeout or connection error, move to the Service layer.
Note
A failed curl does not always mean the Pod is broken. It may be a Service, selector, endpoint, port, or network policy issue.
Step 2: Check the Web Service
Inspect the Service configuration.
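One way to pull the key fields, sketched under the assumption that the Service is named `web-service` (the name used later in this document) and lives in the current namespace. `svc_summary` is a hypothetical helper:

```shell
# svc_summary SERVICE
# Prints the Service type, its selector, and one line per port mapping.
svc_summary() {
  kubectl get svc "$1" -o jsonpath='{.spec.type}{"\n"}{.spec.selector}{"\n"}{range .spec.ports[*]}{.port}{" -> "}{.targetPort}{" (nodePort: "}{.nodePort}{")\n"}{end}'
}

# Usage:
# svc_summary web-service
# Full human-readable view, including Endpoints:
# kubectl describe svc web-service
```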
Check these fields carefully:
| Field | What to verify |
|---|---|
| `Selector` | Must match Pod labels |
| `Type` | ClusterIP, NodePort, or LoadBalancer as expected |
| `Port` | Service port exposed internally |
| `TargetPort` | Container port where app listens |
| `NodePort` | External node port if Service is NodePort |
| `Endpoints` | Must show backend Pod IP and port |
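Example Service issue: `kubectl describe svc` reports `Endpoints: <none>`, so the Service has no backend Pod to send traffic to.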
Healthy Service
A healthy Service should show valid endpoints. If Endpoints is empty, the Service is not connected to any matching Pod.
Common Service failure
If the Service selector does not match the Pod labels, Kubernetes will not create endpoints for that Service.
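The endpoint check can be scripted. This is a sketch, assuming the Service is in the current namespace; `has_endpoints` is a hypothetical helper name:

```shell
# has_endpoints SERVICE
# Succeeds and prints the endpoint IPs if the Service has at least one;
# otherwise prints a hint and returns 1.
has_endpoints() {
  ips=$(kubectl get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "$1 endpoints: $ips"
  else
    echo "$1 has no endpoints; compare its selector with the Pod labels"
    return 1
  fi
}

# Usage:
# has_endpoints web-service
```

Only ready addresses are counted here, so a Pod that matches the selector but fails its readiness probe also produces an empty result.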
Step 3: Compare Service Selector and Pod Labels
Check the Service selector, then check the Pod labels (or describe the Pod), and compare the two.
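A sketch of that comparison, assuming the Service and Pods share the current namespace. `pods_for_service` is a hypothetical helper; it turns the Service selector into a label query and lists whichever Pods it actually selects:

```shell
# pods_for_service SERVICE
# Prints the Service selector, then the Pods that match it.
pods_for_service() {
  sel=$(kubectl get svc "$1" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  sel=${sel%,}   # drop the trailing comma left by the template
  if [ -z "$sel" ]; then
    echo "$1 has no selector"
    return 1
  fi
  echo "selector: $sel"
  kubectl get pods -l "$sel" --show-labels
}

# Usage:
# pods_for_service web-service
# To see the labels on every Pod, matched or not:
# kubectl get pods --show-labels
```

If the second command lists no Pods, the selector and the labels disagree, and the labels shown by `kubectl get pods --show-labels` indicate which side to fix.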
Tip
The Service selector and Pod labels must match exactly. Even a small mismatch like app=web vs name=web can break traffic.
Step 4: Check Pod Status
List Pods and check status, readiness, and restarts.
The output has these columns:
| Field | Meaning |
|---|---|
| `READY` | Containers ready inside the Pod |
| `STATUS` | Current Pod state |
| `RESTARTS` | Number of container restarts |
| `AGE` | How long the Pod has existed |
Restarts matter
A Pod can show Running but still have many restarts. That usually means the app is crashing and restarting repeatedly.
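To surface restarts quickly, the Pod list can be filtered on the `RESTARTS` column. A sketch, assuming the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE); `restarting_pods` is a hypothetical helper:

```shell
# restarting_pods
# Prints name, status, and restart count for Pods that have restarted.
restarting_pods() {
  kubectl get pods --no-headers |
    awk '$4 > 0 { print $1, $3, "restarts=" $4 }'
}

# Usage:
# restarting_pods
```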
Step 5: Describe the Pod
Use describe to check scheduling, image pull, container creation, and runtime events.
Look at the Events section:
    Normal  Scheduled  default-scheduler  Successfully assigned webapp-mysql to worker-1
    Normal  Pulling    kubelet            Pulling image "simple-webapp-mysql"
    Normal  Pulled     kubelet            Successfully pulled image
    Normal  Created    kubelet            Created container
    Normal  Started    kubelet            Started container
Healthy Pod events
Healthy events usually show:
- scheduled successfully
- image pulled
- container created
- container started
Problem indicators
Watch for events like:
- `FailedScheduling`
- `ImagePullBackOff`
- `ErrImagePull`
- `CrashLoopBackOff`
- `CreateContainerConfigError`
- `FailedMount`
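To see only the problem events for one Pod, the event list can be filtered by object name and type. A sketch, assuming the Pod is in the current namespace; `pod_warnings` is a hypothetical helper:

```shell
# pod_warnings POD
# Prints only Warning events recorded for the Pod, oldest first.
pod_warnings() {
  kubectl get events \
    --field-selector "involvedObject.name=$1,type=Warning" \
    --sort-by=.metadata.creationTimestamp
}

# Usage:
# pod_warnings web
# Full detail, including Normal events:
# kubectl describe pod web
```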
Step 6: Check Current Logs
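Check application logs from the running container, and follow them in real time while you reproduce the failing request.
In a healthy web container, the logs show incoming requests answered with successful status codes.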
Note
A 404 for /favicon.ico is usually harmless. Focus on application errors, database errors, authentication failures, and repeated 500 responses.
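The log checks in this step can be sketched as a small filter. `app_errors` is a hypothetical helper, and the search patterns are assumptions that should be adapted to the application's log format:

```shell
# app_errors POD
# Prints log lines that look like failures, ignoring favicon requests.
app_errors() {
  kubectl logs "$1" |
    grep -vF 'favicon.ico' |
    grep -Ei 'error|exception|denied| 500 '
}

# Usage:
# app_errors web
# To stream logs while reproducing the request:
# kubectl logs -f web
```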
Step 7: Check Previous Logs
If the Pod restarted, current logs may not show why the previous container failed.
Use the `--previous` flag of `kubectl logs` to read the logs of the last terminated container.
Important
Use --previous when the container has restarted. This is one of the fastest ways to find the real crash reason.
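A sketch that picks the right log command automatically, assuming a single-container Pod (only the first container's restart count is read). `crash_logs` is a hypothetical helper:

```shell
# crash_logs POD
# Prints the previous container's logs if the first container has restarted.
crash_logs() {
  restarts=$(kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].restartCount}')
  if [ "${restarts:-0}" -gt 0 ]; then
    echo "$1 restarted $restarts time(s); previous container logs:"
    kubectl logs "$1" --previous
  else
    echo "$1 has not restarted; use: kubectl logs $1"
  fi
}

# Usage:
# crash_logs web
```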
Step 8: Check Database Service
After confirming the web Pod is running, check the backend Service.
Verify:
- selector matches DB Pod labels
- target port matches DB container port
- endpoints exist
- service name used by the app is correct
DB Service issue
If the DB Service has no endpoints, the web app may fail even if the web Pod itself is healthy.
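The port check from the list above can be sketched as follows. It assumes the Service uses a numeric `targetPort` (not a named port) and that the DB container declares its `containerPort`; both are optional in real manifests. `db_port_check` is a hypothetical helper:

```shell
# db_port_check SERVICE POD
# Compares the Service's first targetPort with the Pod's first containerPort.
db_port_check() {
  target=$(kubectl get svc "$1" -o jsonpath='{.spec.ports[0].targetPort}')
  listen=$(kubectl get pod "$2" \
    -o jsonpath='{.spec.containers[0].ports[0].containerPort}')
  if [ "$target" = "$listen" ]; then
    echo "targetPort $target matches containerPort $listen"
  else
    echo "mismatch: targetPort=$target containerPort=$listen"
    return 1
  fi
}

# Usage:
# db_port_check db-service db
```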
Step 9: Check Database Pod
Check for:
- database container not starting
- incorrect environment variables
- missing Secrets
- PVC mount failures
- authentication errors
- database readiness failures
Common database failures
- wrong DB password
- missing Secret
- incorrect Service name
- PVC not bound
- database port mismatch
- app starts before DB is ready
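Storage problems from the lists above can be spotted by filtering the claim list. A sketch, assuming the default `kubectl get pvc` column order (NAME, then STATUS); `unbound_pvcs` is a hypothetical helper:

```shell
# unbound_pvcs
# Prints PersistentVolumeClaims whose STATUS is not Bound.
unbound_pvcs() {
  kubectl get pvc --no-headers | awk '$2 != "Bound" { print $1, $2 }'
}

# Usage:
# unbound_pvcs
# Related checks (names are placeholders):
# kubectl get secret <secret-name>
# kubectl describe pod <db-pod-name>
```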
Useful Commands
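One possible way to bundle the most common read-only commands into a single overview. `snapshot` is a hypothetical helper, and the 20-line event tail is an arbitrary choice:

```shell
# snapshot [NAMESPACE]
# Prints Pods, Services, Endpoints, and recent events for one namespace.
snapshot() {
  ns=${1:-default}
  for kind in pods services endpoints; do
    echo "== $kind =="
    kubectl get "$kind" -n "$ns" -o wide
  done
  echo "== recent events =="
  kubectl get events -n "$ns" --sort-by=.metadata.creationTimestamp | tail -n 20
}

# Usage:
# snapshot            # default namespace
# snapshot staging    # "staging" is a placeholder namespace name
```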
Common Root Causes
| Symptom | Likely cause | What to check |
|---|---|---|
| `curl` times out | Service/NodePort/network issue | Service type, NodePort, firewall |
| Service has no endpoints | Selector mismatch | Service selector and Pod labels |
| Pod is `Pending` | Scheduling/resource issue | `kubectl describe pod` events |
| Pod is `ImagePullBackOff` | Image issue | image name, tag, registry auth |
| Pod is `CrashLoopBackOff` | App crash | `kubectl logs --previous` |
| App returns DB error | Backend issue | DB Service, DB Pod, DB credentials |
| Pod restarts repeatedly | App/runtime failure | restarts count, previous logs |
| Service port works internally, not externally | Exposure issue | NodePort, LoadBalancer, firewall |
Production Troubleshooting Checklist
Do
- Start from the user-facing endpoint and move layer by layer.
- Confirm Service selectors match Pod labels.
- Check `Endpoints` before blaming the Pod.
- Check Pod `RESTARTS`, not only `STATUS`.
- Use `kubectl describe` for events.
- Use `kubectl logs --previous` for restarted containers.
- Validate database/service dependencies.
- Keep readiness and liveness probes configured.
- Use meaningful labels like `app`, `tier`, and `component`.
- Keep application logs structured and searchable.
Don't
- Do not restart Pods blindly without checking logs first.
- Do not assume `Running` means healthy.
- Do not expose databases directly outside the cluster unless required.
- Do not use random labels that make Service selection unclear.
- Do not ignore restart counts.
- Do not hardcode backend Pod IPs; use Services.
- Do not store DB credentials directly in manifests.
Production Best Practices
Design for easier troubleshooting
- Use `readinessProbe` so traffic only reaches ready Pods.
- Use `livenessProbe` to restart unhealthy containers safely.
- Use `startupProbe` for slow-starting apps.
- Expose apps through Services, not Pod IPs.
- Use separate Services for web and database tiers.
- Add resource requests and limits.
- Store credentials in Secrets.
- Use centralized logging and monitoring.
Readiness vs liveness
Do not use the same probe blindly for both.
- Readiness controls whether traffic is sent to the Pod.
- Liveness controls whether the container should be restarted.
Quick Decision Tree
    User cannot access app
    │
    ├── curl NodePort/LoadBalancer fails?
    │   └── Check Service type, port, node firewall
    │
    ├── Service has no endpoints?
    │   └── Check selector and Pod labels
    │
    ├── Pod not ready or restarting?
    │   └── Check describe + logs + previous logs
    │
    ├── Web logs show DB error?
    │   └── Check DB Service, DB Pod, DB credentials
    │
    └── Everything looks healthy?
        └── Check NetworkPolicy, DNS, app config, ingress/load balancer
Example Investigation
    # 1. Test application access
    curl http://<node-ip>:<node-port>

    # 2. Check web service
    kubectl describe svc web-service

    # 3. Check endpoints
    kubectl get endpoints web-service

    # 4. Check pod state
    kubectl get pods

    # 5. Check pod events
    kubectl describe pod web

    # 6. Check current logs
    kubectl logs web

    # 7. Check previous failed container logs
    kubectl logs web --previous

    # 8. Check backend service
    kubectl describe svc db-service

    # 9. Check database pod
    kubectl describe pod db
    kubectl logs db
Quote
Troubleshooting Kubernetes applications is not guessing. Follow the traffic path and verify each object one by one.