Cybersecurity Triage
Find Paths Repeatedly Returning 404
You need to identify missing paths that are being requested repeatedly.
Command
awk '$9==404 {count[$7]++} END {for (path in count) if (count[path] >= 3) print count[path], path}' ./fixtures/nginx/access.log | sort -nr | head
What changed
Nothing changes on disk. The command only reads the log, counts 404 responses per requested path, and hides any path with fewer than three misses.
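The field numbers in the command only line up if the log uses nginx's default combined format, where the request path is field 7 and the status code is field 9. A minimal sketch of the same count with the cutoff exposed as a variable follows. The sample log lines and the variable name min are made up for illustration and are not part of the fixture.

```shell
# Build a throwaway log in the combined format (sample data, not real traffic).
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.44 - - [10/Oct/2024:13:55:36 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.44 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.44 - - [10/Oct/2024:13:55:38 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.44 - - [10/Oct/2024:13:55:39 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"
198.51.100.7 - - [10/Oct/2024:13:55:40 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"
EOF

# min is a hypothetical variable name; 3 matches the threshold in the command above.
repeated=$(awk -v min=3 '$9 == 404 { count[$7]++ }
  END { for (path in count) if (count[path] >= min) print count[path], path }' "$log" | sort -nr)

echo "$repeated"
rm -f "$log"
```

Raising min quiets a noisy log; lowering it to 1 shows every 404 path, including one-off misses.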
Danger
safe
When to use it
Use this when distinguishing harmless broken links from suspicious or noisy repeated requests.
When not to use it
Do not treat every repeated 404 as malicious; it may be a stale link, deployment mistake, or crawler behavior.
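One hedged way to start telling these cases apart is to count how many distinct client addresses requested each repeated path. Many clients hitting the same missing path can point toward a stale link, while one client hitting it over and over deserves a closer look. This is a sketch that assumes the combined log format (client address in field 1); the sample log lines are invented for illustration.

```shell
# Throwaway sample log: /old-page is requested by three clients,
# /wp-login.php is requested four times by a single client.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.7 - - [10/Oct/2024:14:00:01 +0000] "GET /old-page HTTP/1.1" 404 153 "-" "Mozilla/5.0"
192.0.2.15 - - [10/Oct/2024:14:00:02 +0000] "GET /old-page HTTP/1.1" 404 153 "-" "Mozilla/5.0"
203.0.113.9 - - [10/Oct/2024:14:00:03 +0000] "GET /old-page HTTP/1.1" 404 153 "-" "Mozilla/5.0"
203.0.113.44 - - [10/Oct/2024:14:00:04 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.44 - - [10/Oct/2024:14:00:05 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.44 - - [10/Oct/2024:14:00:06 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.44 - - [10/Oct/2024:14:00:07 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.0"
EOF

# For each 404 path: total hits, then distinct client addresses, then the path.
# seen[path, ip] ensures each client is counted once per path.
summary=$(awk '$9 == 404 { hits[$7]++; if (!seen[$7, $1]++) clients[$7]++ }
  END { for (p in hits) if (hits[p] >= 3) print hits[p], clients[p], p }' "$log" | sort -nr)

echo "$summary"
rm -f "$log"
```

The second column is only a hint. A crawler or a shared proxy can make many users look like one client, so treat the numbers as a prompt for further review rather than a verdict.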
Undo or recovery
No undo needed because the command is read-only.
Expected output
Up to ten lines, sorted by count in descending order, each showing a 404 count followed by the requested path.
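In the combined format, field 7 includes any query string, so /search?q=a and /search?q=b are counted as different paths. If that splits the counts too finely, a possible variation strips everything from the first question mark before counting. This is a sketch with invented sample lines, not a change to the command above.

```shell
# Throwaway sample log: three 404s on the same path with different query strings.
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.44 - - [10/Oct/2024:15:00:01 +0000] "GET /search?q=a HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.44 - - [10/Oct/2024:15:00:02 +0000] "GET /search?q=b HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.44 - - [10/Oct/2024:15:00:03 +0000] "GET /search?q=c HTTP/1.1" 404 153 "-" "curl/8.0"
203.0.113.44 - - [10/Oct/2024:15:00:04 +0000] "GET /about HTTP/1.1" 404 153 "-" "curl/8.0"
EOF

# Copy field 7, drop the query string, then count by the bare path.
grouped=$(awk '$9 == 404 { path = $7; sub(/\?.*/, "", path); count[path]++ }
  END { for (p in count) if (count[p] >= 3) print count[p], p }' "$log" | sort -nr)

echo "$grouped"
rm -f "$log"
```

Without the sub() call, none of these paths would reach the threshold of three.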
demo script
Disposable terminal steps
awk '$9==404 {print $1, $7}' ./fixtures/nginx/access.log | head
awk '$9==404 {count[$7]++} END {for (path in count) if (count[path] >= 3) print count[path], path}' ./fixtures/nginx/access.log | sort -nr | head
awk '$9==404 {count[$1]++} END {for (ip in count) print count[ip], ip}' ./fixtures/nginx/access.log | sort -nr | head
simulated output
What it looks like
::fixture-ready::
$ awk '$9==404 {print $1, $7}' ./fixtures/nginx/access.log | head
203.0.113.44 /missing
203.0.113.44 /missing
203.0.113.44 /missing
203.0.113.44 /wp-login.php
203.0.113.44 /wp-admin
::exit-code::0
$ awk '$9==404 {count[$7]++} END {for (path in count) if (count[path] >= 3) print count[path], path}' ./fixtures/nginx/access.log | sort -nr | head
3 /missing
::exit-code::0
$ awk '$9==404 {count[$1]++} END {for (ip in count) print count[ip], ip}' ./fixtures/nginx/access.log | sort -nr | head
5 203.0.113.44
::exit-code::0
YouTube Short
Repeated 404s are a clue.
A single 404 is boring. Repeated 404s on the same path are worth grouping so you can tell broken links from noisy probing.
LinkedIn hook
One missing URL is normal. A repeated missing URL is a signal.
Question: Do you review repeated 404s as broken-link cleanup, security triage, or both?
experiments
A/B tests to run
Metric: short_click_through_rate
A: One 404 is boring. Repeated 404s are a signal.
B: Find the missing paths people keep requesting.