Cybersecurity Triage

Find Clients Repeating the Same Path

You need to find source-IP and path pairs that appear five or more times in a web access log.

Command

awk '{key=$1 " " $7; count[key]++} END {for (k in count) if (count[k] >= 5) print count[k], k}' ./fixtures/nginx/access.log | sort -nr | head

What changed

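Nothing changes. The command reads the log and prints each source-IP and path combination that appears at least five times, with its count. It assumes the default nginx combined log format, where whitespace splitting puts the client address in $1 and the request path in $7.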
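
The one-liner can be spelled out as a small shell function with the threshold as a parameter. This is only a sketch: the names repeat_pairs and min are illustrative and not part of the lesson, and the field positions assume the nginx combined log format.

```shell
# Sketch: the same count as the one-liner, with a configurable threshold.
# repeat_pairs and min are illustrative names, not part of the lesson.
# Assumes nginx "combined" format: $1 = client address, $7 = request path.
repeat_pairs() {
  # $1 = log file, $2 = minimum count (defaults to 5, as in the lesson)
  awk -v min="${2:-5}" '
    { count[$1 " " $7]++ }        # tally each "IP path" pair
    END {
      for (k in count)            # awk array order is unspecified,
        if (count[k] >= min)      # so sorting happens afterwards
          print count[k], k
    }
  ' "$1" | sort -nr | head
}
```

Calling repeat_pairs ./fixtures/nginx/access.log 5 should behave like the command above; a lower threshold such as 3 surfaces shorter bursts.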

Danger

safe

When to use it

Use this when looking for repeated polling, scraping, broken clients, or suspicious request loops.

When not to use it

Do not use this alone to decide intent; repeated requests can come from health checks, retries, or caches.
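
One way to add context before judging intent is to put the response status in the key and, optionally, skip a path you already know is benign. The sketch below assumes the nginx combined log format, where $9 is the status code for well-formed request lines. The names repeat_pairs_status and skip_path are hypothetical.

```shell
# Sketch: add the response status to each key so that a health check
# answered with 200 reads differently from a probe answered with 404.
# Assumes nginx "combined" format ($9 = status for well-formed requests).
# repeat_pairs_status and skip_path are hypothetical names.
repeat_pairs_status() {
  # $1 = log file, $2 = minimum count (default 5), $3 = path to ignore (optional)
  awk -v min="${2:-5}" -v skip_path="${3:-}" '
    skip_path != "" && $7 == skip_path { next }   # drop the known-benign path
    { count[$1 " " $7 " " $9]++ }                 # key: IP, path, status
    END {
      for (k in count)
        if (count[k] >= min)
          print count[k], k
    }
  ' "$1" | sort -nr | head
}
```

For example, repeat_pairs_status ./fixtures/nginx/access.log 5 /health would leave the /health requests out of the count. The status column is a hint, not a verdict: a retrying client and a scanner can both produce repeated errors.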

Undo or recovery

No undo needed because the command is read-only.

Expected output

One line per repeated pair: the count, then the source IP and requested path, sorted with the highest count first and limited to ten lines.

demo script

Disposable terminal steps

  1. awk '{print $1, $7}' ./fixtures/nginx/access.log | head
  2. awk '{key=$1 " " $7; count[key]++} END {for (k in count) if (count[k] >= 5) print count[k], k}' ./fixtures/nginx/access.log | sort -nr | head
  3. awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' ./fixtures/nginx/access.log | sort -nr | head
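
To try the steps above without the fixture file, a throwaway log can be generated first. This is a sketch only: make_sample_log is a hypothetical helper, and every log line it writes is made-up sample data in nginx combined format using documentation-only address ranges.

```shell
# Sketch: build a throwaway log so the steps can be tried without the fixture.
# make_sample_log is a hypothetical helper; all lines are made-up sample data.
make_sample_log() {
  log=$(mktemp) || return 1
  i=0
  while [ "$i" -lt 5 ]; do      # five identical health-check requests
    echo '198.51.100.30 - - [01/Jan/2025:00:00:00 +0000] "GET /health HTTP/1.1" 200 2 "-" "curl"' >> "$log"
    i=$((i + 1))
  done
  i=0
  while [ "$i" -lt 3 ]; do      # three requests for a missing page
    echo '203.0.113.44 - - [01/Jan/2025:00:00:01 +0000] "GET /missing HTTP/1.1" 404 0 "-" "-"' >> "$log"
    i=$((i + 1))
  done
  echo "$log"                   # print the path of the generated file
}
```

Capture the path with log=$(make_sample_log), substitute "$log" for ./fixtures/nginx/access.log in each step, and delete the file with rm -f "$log" when finished.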

simulated output

What it looks like

disposable vessel
::fixture-ready::
$ awk '{print $1, $7}' ./fixtures/nginx/access.log | head
198.51.100.10 /
198.51.100.11 /docs
198.51.100.12 /api/search
203.0.113.44 /missing
203.0.113.44 /missing
203.0.113.44 /missing
203.0.113.44 /wp-login.php
203.0.113.44 /wp-admin
203.0.113.45 /admin
203.0.113.45 /login
::exit-code::0
$ awk '{key=$1 " " $7; count[key]++} END {for (k in count) if (count[k] >= 5) print count[k], k}' ./fixtures/nginx/access.log | sort -nr | head
5 198.51.100.30 /health
::exit-code::0
$ awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' ./fixtures/nginx/access.log | sort -nr | head
5 203.0.113.44
5 198.51.100.30
3 198.51.100.25
2 203.0.113.46
2 203.0.113.45
2 198.51.100.24
1 198.51.100.23
1 198.51.100.22
1 198.51.100.21
1 198.51.100.12
::exit-code::0

YouTube Short

Find IP-path repeaters.

Counting requests by IP is useful, but pairing IP with path shows a tighter pattern: one client repeating one URL again and again.

LinkedIn hook

The suspicious pattern is sometimes one client hammering one URL.

Question: Do you group web traffic by IP alone, or by IP plus path?

experiments

A/B tests to run

Metric: youtube_retention_15s

A: One IP hammering one URL is easy to miss.

B: Pair the client with the path.