Cybersecurity Triage

Find the IPs Creating the Most 4xx Noise

You need to identify which client IPs are generating the most 4xx (client error) responses in a web access log.

Command

awk '$9 ~ /^4/ {count[$1]++} END {for (ip in count) print count[ip], ip}' ./fixtures/nginx/access.log | sort -nr | head

What changed

Nothing changes on the system. The command only reads the log, counts the lines whose status code starts with 4, and prints up to ten of the busiest 4xx source IPs.
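
A minimal sketch of the same counter, spelled out with comments. It assumes the log uses nginx's combined format, so field 1 is the client IP and field 9 is the status code. The function name count_4xx_by_ip is hypothetical, and the stricter pattern (exactly three digits starting with 4) is a swap-in for the lesson's /^4/, not part of the original command.

```shell
# Hypothetical helper: count 4xx responses per client IP.
# Assumes combined log format: $1 = client IP, $9 = status code.
count_4xx_by_ip() {
  # $1: path to the access log
  awk '
    # Tally one hit per line whose status is a three-digit 4xx code.
    $9 ~ /^4[0-9][0-9]$/ { count[$1]++ }
    # After the last line, print "count ip" for every IP seen.
    END { for (ip in count) print count[ip], ip }
  ' "$1" | sort -nr | head   # highest counts first, at most ten lines
}
```

Usage, assuming the lesson's fixture path: count_4xx_by_ip ./fixtures/nginx/access.log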

Danger

safe

When to use it

Use this during defensive web-log triage when you want to tell ordinary missing-page errors apart from a small number of clients producing most of the failed requests.
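
One way to make that separation easier to see is to split each client's count by status code, so a client collecting only 404s reads differently from one collecting 403s or 405s. This is a sketch under the same assumption as the main command (combined log format, field 9 is the status); the function name status_breakdown_by_ip is hypothetical.

```shell
# Hypothetical helper: 4xx counts per (client IP, status code) pair.
# Assumes combined log format: $1 = client IP, $9 = status code.
status_breakdown_by_ip() {
  # $1: path to the access log
  awk '
    # Key the counter on "ip status" so each pair gets its own tally.
    $9 ~ /^4[0-9][0-9]$/ { count[$1 " " $9]++ }
    END { for (key in count) print count[key], key }
  ' "$1" | sort -nr   # highest counts first
}
```

Usage, assuming the lesson's fixture path: status_breakdown_by_ip ./fixtures/nginx/access.log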

When not to use it

Do not block an IP from this count alone; review paths, time windows, and business context first.
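
A rough sketch of that review for a single client: which paths it hit with a 4xx status, and in which hours. It assumes combined log format, where field 4 looks like [10/Oct/2024:13:55:36 and field 7 is the request path. The function names paths_for_ip and hours_for_ip are hypothetical, and business context still has to come from outside the log.

```shell
# Hypothetical helper: 4xx paths requested by one client, most frequent first.
# Assumes combined log format: $1 = client IP, $7 = path, $9 = status code.
paths_for_ip() {
  # $1: path to the access log, $2: client IP to inspect
  awk -v ip="$2" '
    $1 == ip && $9 ~ /^4[0-9][0-9]$/ { count[$7]++ }
    END { for (path in count) print count[path], path }
  ' "$1" | sort -nr
}

# Hypothetical helper: the same client's 4xx responses bucketed by hour.
hours_for_ip() {
  # $1: path to the access log, $2: client IP to inspect
  awk -v ip="$2" '
    $1 == ip && $9 ~ /^4[0-9][0-9]$/ {
      # Drop the leading "[" and keep day/month/year:hour,
      # e.g. "[10/Oct/2024:13:55:36" becomes "10/Oct/2024:13".
      hour = substr($4, 2, 14)
      count[hour]++
    }
    END { for (h in count) print count[h], h }
  ' "$1" | sort -nr
}
```

Usage, assuming the lesson's fixture path and an IP taken from the count: paths_for_ip ./fixtures/nginx/access.log 203.0.113.44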

Undo or recovery

No undo needed because the command is read-only.

Expected output

Up to ten lines, each showing a 4xx request count followed by the source IP address, sorted from the highest count to the lowest.

demo script

Disposable terminal steps

  1. awk '$9 ~ /^4/ {print $1, $7, $9}' ./fixtures/nginx/access.log | head
  2. awk '$9 ~ /^4/ {count[$1]++} END {for (ip in count) print count[ip], ip}' ./fixtures/nginx/access.log | sort -nr | head
  3. awk '$9 ~ /^4/ {print $1, $7, $9}' ./fixtures/nginx/access.log

simulated output

What it looks like

disposable vessel
::fixture-ready::
$ awk '$9 ~ /^4/ {print $1, $7, $9}' ./fixtures/nginx/access.log | head
203.0.113.44 /missing 404
203.0.113.44 /missing 404
203.0.113.44 /missing 404
203.0.113.44 /wp-login.php 404
203.0.113.44 /wp-admin 404
203.0.113.45 /admin 403
203.0.113.45 /login 403
203.0.113.46 /api/profile 405
203.0.113.46 /api/profile 405
::exit-code::0
$ awk '$9 ~ /^4/ {count[$1]++} END {for (ip in count) print count[ip], ip}' ./fixtures/nginx/access.log | sort -nr | head
5 203.0.113.44
2 203.0.113.46
2 203.0.113.45
::exit-code::0
$ awk '$9 ~ /^4/ {print $1, $7, $9}' ./fixtures/nginx/access.log
203.0.113.44 /missing 404
203.0.113.44 /missing 404
203.0.113.44 /missing 404
203.0.113.44 /wp-login.php 404
203.0.113.44 /wp-admin 404
203.0.113.45 /admin 403
203.0.113.45 /login 403
203.0.113.46 /api/profile 405
203.0.113.46 /api/profile 405
::exit-code::0

YouTube Short

Find the loudest 4xx source.

When the access log gets noisy, count 4xx responses by IP first. It shows which clients are generating the most failed requests without changing anything on the server.

LinkedIn hook

One address can turn a normal access log into a wall of failed requests.

Question: When web logs get noisy, do you group failures by IP or by URL first?

experiments

A/B tests to run

Metric: short_click_through_rate

A: One IP can explain the whole 4xx spike.

B: Find the noisiest client in your access log.