Cheveo Cheatsheet

OOMKilled Cheatsheet

12 kubectl commands for diagnosis and right-sizing - plus runtime limits for JVM, Node, Go and Python. Printable, free, no signup.


The workflow

1 Confirm
2 Measure
3 Compare
4 Fix
5 Verify

Confirm

1

Reason & exit code

kubectl describe pod <name> | grep -A 5 "Last State"

→ Shows reason (OOMKilled), exit code (137) and timestamp of the last crash.
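Exit code 137 follows the usual shell convention: a process killed by a signal reports 128 plus the signal number, and 137 - 128 = 9 is SIGKILL, the signal the kernel OOM killer sends. A minimal Python sketch of that decoding, assuming a Linux host; the function name describe_exit_code is made up for illustration:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(1))    # exited with status 1
```

Any SIGKILL produces 137, not only an OOM kill, so the exit code should be read together with the reason field.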

2

Reason from JSON

kubectl get pod <name> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

→ One-line output, ideal for scripts and alerts.
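For scripts that need the container name as well as the reason, the same fields can be read from the full output of kubectl get pod <name> -o json. The sketch below is one possible approach, not a fixed recipe: SAMPLE_POD is invented sample data trimmed to the relevant fields, and oom_killed_containers is a hypothetical helper name.

```python
# Invented sample, shaped like `kubectl get pod <name> -o json` output.
SAMPLE_POD = {
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            },
            # A container that never restarted has an empty lastState.
            {"name": "sidecar", "lastState": {}},
        ]
    }
}


def oom_killed_containers(pod: dict) -> list:
    """Return names of containers whose last termination was an OOM kill."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


print(oom_killed_containers(SAMPLE_POD))  # ['app']
```

In a real script the dictionary would come from json.loads on the kubectl output.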

3

Cluster-wide OOMs

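kubectl get events -A --field-selector reason=OOMKilling --sort-by=.lastTimestamp | tail -20

→ Lists the 20 most recent OOMKilling events across all namespaces. These are node-level events (emitted, for example, by node-problem-detector) and expire with the event TTL, one hour by default, so an empty result does not prove there were no OOM kills.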

Measure usage

4

Live usage

kubectl top pod <name> --containers

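→ Current memory (working set) per container - the most important live number. Requires metrics-server, and is a point-in-time sample, so short spikes can be missed.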

5

Top memory hogs

kubectl top pods -A --sort-by=memory | head -20

→ Sorts every pod cluster-wide by memory consumption.

6

Cgroup peak (v2)

kubectl exec <pod> -- cat /sys/fs/cgroup/memory.peak

→ Peak usage in bytes since container start - the key number for right-sizing. Needs cgroup v2 on Linux 5.19 or later, and resets when the container restarts.

Limits & requests

7

Show limits

kubectl describe pod <name> | grep -A 3 "Limits\|Requests"

→ Shows the limits and requests actually applied to the pod - often different from what you expect, for example when a LimitRange injects defaults.

8

Resources structured

kubectl get pod <name> -o jsonpath='{range .spec.containers[*]}{.name}: {.resources}{"\n"}{end}'

→ Clean overview of every container in the pod.
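The cgroup peak from step 6 is in bytes, while limits are written as Kubernetes quantities such as 1Gi, so comparing the two needs a unit conversion. A small sketch under stated assumptions: parse_memory and utilisation are hypothetical helper names, and only the common memory suffixes are handled, not the full Kubernetes quantity grammar.

```python
# Binary (power-of-two) and decimal suffixes commonly used for memory.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes."""
    # Binary suffixes are checked first so 'Mi' is never read as 'M'.
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count


def utilisation(peak_bytes: int, limit: str) -> float:
    """Fraction of the configured limit that the measured peak reached."""
    return peak_bytes / parse_memory(limit)


print(parse_memory("1Gi"))               # 1073741824
print(utilisation(805306368, "1Gi"))     # 0.75
```

A ratio close to 1.0 means the container was running at its limit and the peak is probably truncated by it.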

9

Node allocation

kubectl describe node <node> | grep -A 5 "Allocated resources"

→ Total requests and limits against the node's allocatable capacity - reveals overcommitted nodes.

Fix & verify

10

Set the limit

kubectl set resources deployment/<name> --limits=memory=1Gi --requests=memory=512Mi

→ Quick patch that triggers a rolling restart - under GitOps, commit the change to the manifest instead, or the next sync reverts it.

11

Watch the rollout

kubectl rollout status deployment/<name> -w

→ Waits until new pods are stable, or surfaces errors.

12

Validate under load

watch -n 15 kubectl top pod <name> --containers

→ Live monitor, refreshed every 15 seconds via watch (kubectl top has no -w flag) - usage under load should stay below the new limit.

Runtime memory limits

Rule of thumb: runtime heap = container limit × 0.75. The rest is for stack, native, JIT, threads.
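As a worked example of the rule, the sketch below derives the heap size from a container limit and formats it as the legacy JVM and Node.js flags. The names heap_mib and runtime_flags are illustrative only, and the 0.75 default is simply the rule of thumb above, not a measured value.

```python
def heap_mib(limit_mib: int, fraction: float = 0.75) -> int:
    """Heap budget in MiB for a container limit given in MiB."""
    return int(limit_mib * fraction)


def runtime_flags(limit_mib: int) -> dict:
    """Format the heap budget as JVM and Node.js command-line flags."""
    heap = heap_mib(limit_mib)
    return {
        "java": f"-Xmx{heap}m",
        "node": f"--max-old-space-size={heap}",
    }


print(heap_mib(1024))       # 768
print(runtime_flags(1024))  # {'java': '-Xmx768m', 'node': '--max-old-space-size=768'}
```

Workloads with heavy native or off-heap usage may need a smaller fraction than 0.75.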

Runtime             Setting                              Example
Java 8u191+ / 10+   -XX:MaxRAMPercentage=75.0            1Gi container → 768Mi heap
Java (legacy)       -Xmx<size>                           -Xmx768m
Node.js             --max-old-space-size=<MB>            --max-old-space-size=768
Go 1.19+            GOMEMLIMIT                           GOMEMLIMIT=900MiB (soft limit on total Go memory, not only heap)
Python              resource.setrlimit(RLIMIT_AS, ...)   in init code
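The Python row is the least self-explanatory, so here is a hedged sketch of what "in init code" could look like. It assumes a Linux container on cgroup v2, where the limit is exposed in /sys/fs/cgroup/memory.max; the helper names are hypothetical. RLIMIT_AS caps virtual address space rather than resident memory, so the right value may well be higher than the container limit and needs testing per application.

```python
import resource
from pathlib import Path
from typing import Optional

# cgroup v2 file holding the container memory limit ("max" means unlimited).
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")


def parse_cgroup_limit(raw: str) -> Optional[int]:
    """Parse the content of memory.max into bytes, or None if unlimited."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def read_cgroup_limit(path: Path = CGROUP_V2_LIMIT) -> Optional[int]:
    """Read the container limit; None if unlimited or not on cgroup v2."""
    try:
        return parse_cgroup_limit(path.read_text())
    except OSError:
        return None


def cap_address_space(max_bytes: int) -> None:
    """Lower the soft RLIMIT_AS so oversized allocations raise MemoryError."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # soft limit cannot exceed hard
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))


# Typical use at process start-up:
#     limit = read_cgroup_limit()
#     if limit is not None:
#         cap_address_space(limit)
```

The benefit is a catchable MemoryError inside the process instead of a SIGKILL from the kernel.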

Quick decisions

Reason: OOMKilled, peak ≈ limit
→ Measure the peak & set the limit to peak + 30%
Usage grows over days
→ Likely a memory leak in code - profile before raising the limit
Multiple pods OOM at once
→ Check node pressure
JVM/Node/Go container
→ Set runtime heap
Heap dump < container usage
→ Off-heap / native memory
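The first decision (measure, then add 30%) is simple arithmetic, sketched below with integer math to avoid rounding surprises. Rounding the result up to a multiple of 64Mi is an assumption made here for tidy manifests, not part of the rule, and recommend_limit_mib is a hypothetical name.

```python
def recommend_limit_mib(peak_bytes: int, headroom_pct: int = 30,
                        step_mib: int = 64) -> int:
    """Peak plus headroom, rounded up to the next multiple of step_mib."""
    mib = 2**20
    # Work in units of 1/100 byte so the percentage stays an integer.
    target = peak_bytes * (100 + headroom_pct)
    step = step_mib * mib * 100
    steps = -(-target // step)  # ceiling division
    return steps * step_mib


# A measured peak of 700 MiB plus 30% is 910 MiB, rounded up to 960 MiB.
print(recommend_limit_mib(700 * 2**20))  # 960
```

The result can be passed to step 10 as --limits=memory=960Mi.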