Cybersecurity Triage

Find Paths Repeatedly Returning 404

You need to identify missing paths that are being requested repeatedly.

Command

awk '$9==404 {count[$7]++} END {for (path in count) if (count[path] >= 3) print count[path], path}' ./fixtures/nginx/access.log | sort -nr | head

What changed

Nothing changes. The command only reads the log: it counts 404 responses per requested path and hides any path seen fewer than three times.
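To make the moving parts easier to see, the one-liner above can be unpacked into a small shell function. This is a minimal sketch, not part of the lesson's fixtures: the function name `repeated_404_paths` and its optional minimum-hits argument are hypothetical, and the field numbers assume nginx's default "combined" log format, where field 7 is the request path and field 9 is the status code.

```shell
# Hypothetical helper: repeated_404_paths LOGFILE [MIN_HITS]
# Assumes the nginx "combined" log format (whitespace-split fields):
#   field 7 = request path, field 9 = HTTP status code.
repeated_404_paths() {
  awk -v min="${2:-3}" '
    $9 == 404 { count[$7]++ }      # tally every path that returned 404
    END {
      for (path in count)
        if (count[path] >= min)    # hide paths below the threshold
          print count[path], path
    }
  ' "$1" | sort -nr | head         # highest counts first, at most ten lines
}
```

Calling `repeated_404_paths ./fixtures/nginx/access.log` should behave like the original command; passing a second argument such as `5` raises the threshold without editing the awk program.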

Danger

safe

When to use it

Use this when you need to distinguish harmless broken links from suspicious or noisy repeated requests.

When not to use it

Do not treat every repeated 404 as malicious; it may be a stale link, deployment mistake, or crawler behavior.
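One rough way to add context before drawing conclusions is to count how many distinct client addresses requested each repeated path. The sketch below is an illustration under assumptions, not a verdict on intent: the function name `repeated_404_sources` and its optional threshold are hypothetical, and it again assumes the nginx "combined" format, with the client address in field 1.

```shell
# Hypothetical helper: repeated_404_sources LOGFILE [MIN_HITS]
# Output columns: hits  distinct_clients  path
# Assumes the nginx "combined" log format:
#   field 1 = client IP, field 7 = request path, field 9 = status code.
repeated_404_sources() {
  awk -v min="${2:-3}" '
    $9 == 404 {
      hits[$7]++
      if (!seen[$7, $1]++) clients[$7]++   # count each (path, IP) pair once
    }
    END {
      for (path in hits)
        if (hits[path] >= min)
          print hits[path], clients[path], path
    }
  ' "$1" | sort -nr | head
}
```

A path requested by many different clients may point toward a stale link or a deployment mistake, while many hits from a single address may be a crawler or a probe. Neither pattern is conclusive on its own.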

Undo or recovery

No undo needed because the command is read-only.

Expected output

Up to ten lines, sorted by count in descending order, each showing a 404 count followed by the requested path.

demo script

Disposable terminal steps

  1. awk '$9==404 {print $1, $7}' ./fixtures/nginx/access.log | head
  2. awk '$9==404 {count[$7]++} END {for (path in count) if (count[path] >= 3) print count[path], path}' ./fixtures/nginx/access.log | sort -nr | head
  3. awk '$9==404 {count[$1]++} END {for (ip in count) print count[ip], ip}' ./fixtures/nginx/access.log | sort -nr | head

simulated output

What it looks like

disposable vessel
::fixture-ready::
$ awk '$9==404 {print $1, $7}' ./fixtures/nginx/access.log | head
203.0.113.44 /missing
203.0.113.44 /missing
203.0.113.44 /missing
203.0.113.44 /wp-login.php
203.0.113.44 /wp-admin
::exit-code::0
$ awk '$9==404 {count[$7]++} END {for (path in count) if (count[path] >= 3) print count[path], path}' ./fixtures/nginx/access.log | sort -nr | head
3 /missing
::exit-code::0
$ awk '$9==404 {count[$1]++} END {for (ip in count) print count[ip], ip}' ./fixtures/nginx/access.log | sort -nr | head
5 203.0.113.44
::exit-code::0

YouTube Short

Repeated 404s are a clue.

A single 404 is boring. Repeated 404s on the same path are worth grouping so you can tell broken links from noisy probing.

LinkedIn hook

One missing URL is normal. A repeated missing URL is a signal.

Question: Do you review repeated 404s as broken-link cleanup, security triage, or both?

experiments

A/B tests to run

Metric: short_click_through_rate

A: One 404 is boring. Repeated 404s are a signal.

B: Find the missing paths people keep requesting.