Read-only first

OOM killer logs on Linux

Confirm whether the kernel killed a process for memory pressure before restarting workloads or changing limits.

Safest first command

journalctl -k --since '24 hours ago' --no-pager | grep -iE 'out of memory|oom-killer|killed process'

Before you run this

Expected output: Kernel journal lines that name OOM events, killed process IDs, and memory cgroup context. The command prints nothing if no OOM lines match within the last 24 hours.

When not to use it: The command only reads logs, so the risk is acting on it too early. Do not restart memory-heavy workloads until you have captured which process was killed and whether the host or a container hit its limit.

Expected output example

kernel: Out of memory: Killed process 4242 (python) total-vm:2048000kB, anon-rss:980000kB

How to read the result

The killed process line names the victim, not always the root cause. Pair kernel logs with current memory and process snapshots.
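As a rough aid for reading many such lines, the sketch below pulls the victim PID, command name, and anon-rss out of each "Killed process" line. The function name parse_oom_victims is hypothetical, and the pattern assumes the line layout shown in the example above; a command name that contains a closing parenthesis would not parse cleanly.

```shell
#!/usr/bin/env bash
# Sketch: extract "pid comm anon_rss_kB" from kernel "Killed process" lines.
# parse_oom_victims is a hypothetical helper name, not a standard tool.
parse_oom_victims() {
  # Reads log lines on stdin; prints one summary line per matching input line.
  sed -nE 's/.*Killed process ([0-9]+) \(([^)]*)\).*anon-rss:([0-9]+)kB.*/\1 \2 \3/p'
}

# Example with the sample line from this page:
echo 'kernel: Out of memory: Killed process 4242 (python) total-vm:2048000kB, anon-rss:980000kB' \
  | parse_oom_victims
# prints: 4242 python 980000
```

To use it on live data, pipe the journalctl command from this page into parse_oom_victims instead of the echo line.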

What to check next

Killed process lines exist

Means: The kernel recently reclaimed memory by killing a process.

Next step: Capture memory snapshot and top consumers.

Find OOM Killer Lines in the Kernel Journal

No OOM lines but memory is full

Means: Memory pressure may be ongoing without the kernel having killed anything yet, or the OOM events may fall outside the 24-hour window.

Next step: Check current memory and swap.

Take a Memory Pressure Snapshot
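One way to put numbers on "memory is full" is to read /proc/meminfo directly. This is a minimal sketch, not a replacement for free -h: mem_summary is a hypothetical helper, it assumes the MemTotal, MemAvailable, SwapTotal, and SwapFree fields are present, and the /proc/pressure/memory file only exists on kernels built with pressure stall information.

```shell
#!/usr/bin/env bash
# Sketch: summarize available memory and used swap from a meminfo-style file.
# mem_summary is a hypothetical helper; it defaults to /proc/meminfo.
mem_summary() {
  local file="${1:-/proc/meminfo}"
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    $1 == "SwapTotal:"    { stotal = $2 }
    $1 == "SwapFree:"     { sfree = $2 }
    END {
      if (total > 0)
        printf "mem_available_pct=%d\n", avail * 100 / total
      printf "swap_used_kB=%d\n", stotal - sfree
    }
  ' "$file"
}

# Read-only checks; each is skipped if the file is not present on this host.
if [ -r /proc/meminfo ]; then
  mem_summary
fi
if [ -r /proc/pressure/memory ]; then
  # Shows how much time tasks spent stalled waiting on memory.
  cat /proc/pressure/memory
fi
```

A low mem_available_pct together with rising swap_used_kB suggests pressure that has not yet produced an OOM kill. To look for older kills, widen the --since value in the journalctl command.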

One process dominates memory

Means: A leak or workload spike may be driving pressure.

Next step: Inspect owner, service, and logs before restart.

Show Top Memory Processes
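Inspecting the owner can stay read-only as well. The sketch below prints the user, resident memory, and command line for one PID, plus its cgroup path when /proc exposes it. describe_pid is a hypothetical helper name, and the ps options assume a procps-style ps.

```shell
#!/usr/bin/env bash
# Sketch: describe who owns a PID without signalling or restarting it.
# describe_pid is a hypothetical helper name.
describe_pid() {
  local pid="$1"
  # PID, owning user, resident memory in kB, and full command line (no header).
  ps -o pid=,user=,rss=,args= -p "$pid"
  # The cgroup path usually reveals the systemd unit or container, if any.
  if [ -r "/proc/$pid/cgroup" ]; then
    cat "/proc/$pid/cgroup"
  fi
}

# Example: describe the current shell.
describe_pid "$$"
```

On systemd hosts, systemctl status followed by the PID shows the unit the process belongs to, and journalctl _PID= followed by the PID filters that process's own log lines.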

OOM decision tree

Read kernel OOM lines, then compare current memory, swap, and top processes. If containers or cgroups are involved, check their limits before changing host memory settings.

  1. journalctl -k --since '24 hours ago' --no-pager | grep -iE 'out of memory|oom-killer|killed process'
  2. free -h
  3. ps -eo pid,comm,%mem,%cpu --sort=-%mem | head
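The three steps above can be saved to files in one pass so the evidence survives a later restart. This is a sketch under assumptions: capture_oom_evidence and the default directory name are hypothetical, and commands that are not installed are simply skipped.

```shell
#!/usr/bin/env bash
# Sketch: save the three read-only checks to files before any restart.
# capture_oom_evidence and the default directory name are hypothetical.
capture_oom_evidence() {
  local dir="${1:-oom-evidence-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$dir" || return 1
  # Output may contain private infrastructure details, so restrict access.
  chmod 700 "$dir"
  if command -v journalctl >/dev/null 2>&1; then
    journalctl -k --since '24 hours ago' --no-pager 2>/dev/null \
      | grep -iE 'out of memory|oom-killer|killed process' > "$dir/kernel-oom.txt"
  fi
  if command -v free >/dev/null 2>&1; then
    free -h > "$dir/free.txt"
  fi
  # An empty top-mem.txt means this ps does not support --sort.
  ps -eo pid,comm,%mem,%cpu --sort=-%mem 2>/dev/null | head > "$dir/top-mem.txt"
  # Print the directory so the caller knows where the files went.
  echo "$dir"
}
```

Calling capture_oom_evidence with no arguments writes to a timestamped directory under the current working directory; pass a path to choose another location.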

Bad fixes to avoid

Do not restart the victim process before saving logs and current memory state. Do not assume the killed process caused the pressure; it may only have been selected as the victim.

Common causes

  • Application memory leak
  • Container memory limit
  • Host memory exhaustion
  • Swap pressure
  • Batch job spike

What not to change yet

  • Do not restart before capturing evidence.
  • Do not raise limits without knowing host capacity.
  • Do not ignore cgroup or container limits.
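To check a cgroup limit before changing anything, the cgroup v2 memory files can be read directly. The sketch below assumes cgroup v2; cgroup_mem_report is a hypothetical helper, and the directory you pass it depends on how the service or container is placed on your host.

```shell
#!/usr/bin/env bash
# Sketch: print the memory limit, current usage, and OOM counters
# for one cgroup v2 directory. cgroup_mem_report is a hypothetical helper.
cgroup_mem_report() {
  local cg="$1"
  local f
  if [ -z "$cg" ] || [ ! -d "$cg" ]; then
    echo "usage: cgroup_mem_report <cgroup-directory>" >&2
    return 1
  fi
  # memory.max is the hard limit ("max" means unlimited),
  # memory.events includes the oom and oom_kill counters.
  for f in memory.max memory.current memory.events; do
    if [ -r "$cg/$f" ]; then
      echo "== $f"
      cat "$cg/$f"
    fi
  done
}
```

A nonzero oom_kill count in memory.events, with memory.current close to memory.max, points at the cgroup limit rather than host memory exhaustion. The cgroup path for a given process can be read from /proc/PID/cgroup and is relative to the cgroup mount point, commonly /sys/fs/cgroup.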

Stop and escalate if

  • The next step could interrupt users, remove data, or lock out access.
  • The output includes secrets, customer data, or private infrastructure details.
  • You cannot explain the blast radius of the repair command.
