Cheveo Blog

Fix CrashLoopBackOff Systematically: 7 Causes, 1 Workflow

The 7 most common causes of CrashLoopBackOff in Kubernetes - with kubectl commands, real outputs and a decision tree that finds each one in under 5 minutes.

Clemens Christen · Certified Kubernetes Administrator (CKA)

TL;DR - CrashLoopBackOff isn’t a bug, it’s a state: the container started, crashed and is waiting for the next restart attempt. The cause is never in Kubernetes - always in the container, the manifest, or the environment. This workflow finds 90% of root causes in under 5 minutes: read the exit code, look at previous logs, check events, then test the most common of the 7 causes.

What CrashLoopBackOff really means

The term confuses more engineers than necessary. CrashLoopBackOff means:

  1. Kubernetes started the container
  2. The container terminated with a non-zero exit code
  3. Kubelet tried to restart it
  4. Crash again
  5. Kubelet now waits with exponential backoff until the next attempt (10s, 20s, 40s, 80s, … up to 5 minutes)

So the container isn’t running. The current attempt’s logs are empty because no restart has happened yet. You need the logs from the previous crash - this is the most important insight for everything that follows.
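The backoff schedule can be sketched in a few lines of shell (assuming the documented 10s base and 5-minute cap):

```shell
# Sketch of the backoff schedule: the delay starts at 10s, doubles after
# every crash, and is capped at 300s (5 minutes).
backoff() {
  local crashes=$1 delay=10
  while [ "$crashes" -gt 1 ]; do
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then delay=300; fi
    crashes=$((crashes - 1))
  done
  echo "${delay}s"
}
backoff 1   # 10s
backoff 4   # 80s
backoff 7   # 300s
```

Kubelet resets this backoff once the container has run cleanly for 10 minutes - which is why a flaky pod can oscillate between Running and CrashLoopBackOff for hours.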

The 3-step workflow

Whatever the cause, these three commands always run first.

Step 1: Read the exit code

kubectl describe pod <name>

Scroll to the Last State section:

Last State:     Terminated
  Reason:       Error
  Exit Code:    137
  Started:      Mon, 04 May 2026 14:20:15 +0200
  Finished:     Mon, 04 May 2026 14:20:18 +0200

The exit code is the most important piece of information:

Exit code   Meaning
0           Clean exit (shouldn't end up in CrashLoop, check restartPolicy)
1           Generic application error - read the logs
2           Misuse of shell builtins - usually a typo in the command
126         Command not executable - permission issue
127         Command not found - wrong path or missing binary
137         SIGKILL - usually OOMKilled (check the Reason)
139         SIGSEGV - segmentation fault, native code crashed
143         SIGTERM - cleanly terminated from outside, check liveness probe
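The codes above 128 follow the standard Unix convention - exit code = 128 + the number of the fatal signal - so you can decode them mechanically. A small sketch:

```shell
# Decode an exit code: values above 128 mean the process was killed by a
# signal (code - 128); anything else is the application's own exit status.
explain_exit() {
  local code=$1
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application exit code $code"
  fi
}
explain_exit 137   # killed by signal 9
explain_exit 143   # killed by signal 15
```

Signal 9 is SIGKILL, signal 15 is SIGTERM - which is exactly why 137 and 143 dominate the table above.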

Step 2: Read previous logs

kubectl logs <pod> --previous

The --previous switch isn’t optional. Without it you see the logs of the current (not yet started) container - i.e. nothing. With it you see the logs of the last crash.

Step 3: Check system events

kubectl get events --sort-by=.lastTimestamp -n <namespace> | tail -20

System events show things the container itself can’t log: OOMKilled by memory limit, FailedMount on volumes, BackOff counts.

The 7 most common causes

1. Application error (exit code 1)

By far the most common cause. The code throws an exception at startup. Reasons:

  • Missing or wrong environment variable
  • Database not reachable (typical race-condition symptom)
  • Migration script fails
  • Config file doesn’t exist or has the wrong format

Fix workflow: kubectl logs <pod> --previous, read the exception, fix it. For DB race conditions: an initContainer with wait-for-db in front of the app.

2. OOMKilled (exit code 137, Reason: OOMKilled)

Container exceeded its memory limit. Kernel killed it with SIGKILL.

kubectl describe pod <name> | grep -A 2 "Last State"
# Reason: OOMKilled
# Exit Code: 137

Fix: Raise the memory limit or find the memory leak in the code. Blindly doubling the limit only treats the symptom - a real leak just crashes the pod again later. kubectl top pod shows real-time consumption. For the full diagnosis including JVM/Node/Go gotchas, see the OOMKilled cheatsheet and the OOMKilled article.

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # based on real usage, not gut feeling
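One JVM-specific trap worth a line here: without container-aware flags, HotSpot sizes its heap from the node's memory, not the container limit. A hedged sketch (the flags are standard HotSpot options, available since JDK 10 / 8u191):

```yaml
# Assumption: HotSpot JVM. JAVA_TOOL_OPTIONS is picked up automatically;
# MaxRAMPercentage caps the heap relative to the container's memory limit.
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0"   # heap <= 75% of the 512Mi limit
```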

3. Liveness probe fails (exit code 143, often with empty logs)

The liveness probe fails, kubelet terminates the container with SIGTERM. If the probe fails on the very first run, you usually see no logs at all.

Classic fail: initialDelaySeconds is too low. A Spring Boot app needs 30-60s to start, the probe fires after 10s and kills the container.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60   # not 10
  periodSeconds: 10
  failureThreshold: 3
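For slow starters there is a cleaner alternative to inflating initialDelaySeconds: a startupProbe (Kubernetes 1.18+), which suppresses the liveness probe until it succeeds. A sketch, assuming the same /healthz endpoint:

```yaml
# The liveness probe only takes over once the startup probe has passed.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 24   # allows up to 120s of startup before the first kill
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```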

Diagnosis:

kubectl describe pod <name> | grep -A 5 "Liveness"
kubectl get events | grep -i "liveness"

4. Command not found (exit code 127)

The container image doesn’t have the command specified in the manifest. Or the path is wrong:

command: ["/usr/local/bin/myapp"]   # does this actually exist?

Quick test:

kubectl debug -it <pod> --image=busybox --target=<container>
# in the debug container:
ls /proc/1/root/usr/local/bin/

The ephemeral debug container shares the filesystem namespace, so you can see exactly what paths the original container had.

5. ConfigMap or Secret missing (exit code 1, often “no such file”)

The pod manifest mounts a ConfigMap that doesn’t exist. The container starts, can’t find the file, crashes.

kubectl get configmap -n <namespace>
kubectl get secret -n <namespace>

Common cause: namespace confusion. The pod is in prod, the ConfigMap is in default.

6. Wrong volume permissions (exit code 1, “permission denied”)

A mounted PV is owned by a UID/GID the container user can't read. Frequently happens when migrating from Docker Compose to Kubernetes.

Fix: set securityContext.fsGroup so kubelet adjusts permissions on mount:

spec:
  securityContext:
    fsGroup: 1000
  containers:
    - name: app
      image: ...

7. Race condition with the database (exit code 1, “connection refused”)

The app starts before the database, can’t connect, crashes. The restart might catch the DB - or not.

Fix: an initContainer with a wait script:

initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command: ['sh', '-c', 'until nc -zv postgres 5432; do echo waiting for db; sleep 2; done']

Or better: application-level retry with exponential backoff in the code itself. initContainer is the bandage, not the cure.
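A minimal sketch of such a retry wrapper in shell - though the real fix belongs in the application's DB client, not the entrypoint:

```shell
# Hypothetical retry helper: runs a command until it succeeds, sleeping with
# exponential backoff between attempts, and gives up after $1 tries.
retry() {
  local max=$1; shift
  local attempt=1 delay=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then return 1; fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
retry 3 true && echo "succeeded"      # succeeded
# retry 5 nc -z "$DB_HOST" 5432      # the wait-for-db case from above
```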

Decision tree

Exit code 137 + Reason OOMKilled?  ->  Cause 2 (memory)
                                  v no
Logs --previous completely empty?  ->  Cause 3 (liveness probe)
                                  v no
Logs say "no such file"?           ->  Cause 5 (ConfigMap/Secret)
                                  v no
Logs say "permission denied"?      ->  Cause 6 (volume permissions)
                                  v no
Logs say "command not found"?      ->  Cause 4 (command/path)
                                  v no
Logs say "connection refused"?     ->  Cause 7 (race condition)
                                  v no
                                       Cause 1 (application error)
                                       -> Read logs carefully, check stack trace
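The same tree as a shell sketch you could drop into a triage script - inputs are the exit code, the Reason from kubectl describe, and the --previous log text:

```shell
# Map (exit code, Reason, previous logs) to the most likely cause above.
diagnose() {
  local code=$1 reason=$2 logs=$3
  if [ "$code" = "137" ] && [ "$reason" = "OOMKilled" ]; then
    echo "cause 2: memory"; return
  fi
  if [ -z "$logs" ]; then echo "cause 3: liveness probe"; return; fi
  case "$logs" in
    *"no such file"*)       echo "cause 5: ConfigMap/Secret" ;;
    *"permission denied"*)  echo "cause 6: volume permissions" ;;
    *"command not found"*)  echo "cause 4: command/path" ;;
    *"connection refused"*) echo "cause 7: race condition" ;;
    *)                      echo "cause 1: application error" ;;
  esac
}
diagnose 137 OOMKilled "java heap dump"   # cause 2: memory
diagnose 1 Error "connection refused"     # cause 7: race condition
```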

What the workshops cover that this article doesn’t

This workflow handles the most common cases. What isn’t in the 7 causes:

  • JVM container with hidden OOM: the Java heap explodes but the kernel doesn’t see it as OOM. You only see exit code 1 with Killed.
  • Network policy blocking init egress: the app wants to fetch configs from Vault at startup, NetworkPolicy only allows inbound, app crashes without a log.
  • Cluster autoscaler killing the node during pod start: pod gets evicted before liveness probe runs, restart on a new node.

These patterns take more than a memorised command sequence - they take an understanding of the system. That's the difference between "guess and try" and "find the root cause in 3 minutes".

What’s next

In our Kubernetes Debugging Workshop we replay 8 real production incidents - including the three edge cases above - and drill the workflow until it sticks. 1 day, 8 hours, after which you solve CrashLoopBackOff systematically instead of by guessing.

Before you book: also have a look at our kubectl Debugging Cheatsheet for the 12 most important commands as a complete workflow. Also in the debugging series: OOMKilled in Kubernetes for exit-137 cases and Pod Pending: 23 causes for stuck pods.

1-Day Intensive Workshop

Kubernetes Debugging - systematic, not guesswork

Replay real production incidents, internalise kubectl workflows, find root causes in minutes.

View workshop details
Free · 30 minutes

Need a second opinion on your cluster?

Book a free 30-minute Kubernetes health check. We review your setup and give concrete recommendations, no sales pitch.

Book a slot