
Hosting Operations

Compare Source and Backup File Lists

You need to compare relative file paths in source and backup directories.

Command

comm -3 <(find source -type f | sed 's#^source/##' | sort) <(find backup -type f | sed 's#^backup/##' | sort)

What changed

Nothing changes on disk. The command only reads file names: it strips the source/ and backup/ prefixes, sorts both lists, and compares them.
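The one-liner relies on process substitution, which bash, zsh, and ksh provide but plain POSIX sh does not. As a rough sketch, the same comparison can be written with temporary files. The function names list_rel and compare_trees are hypothetical, and forcing LC_ALL=C is an assumption made so that sort and comm agree on ordering.

```shell
# Hypothetical helper: print file paths relative to directory $1, sorted.
list_rel() {
    # Run find from inside the directory, then strip the leading "./".
    (cd "$1" && find . -type f | sed 's#^\./##' | LC_ALL=C sort)
}

# Hypothetical helper: compare_trees SRC DST prints the same two-column
# report as the lesson's command, using temp files instead of <( ... ).
compare_trees() {
    src_list=$(mktemp) || return 1
    dst_list=$(mktemp) || { rm -f "$src_list"; return 1; }
    list_rel "$1" > "$src_list"
    list_rel "$2" > "$dst_list"
    # -3 hides paths present in both lists.
    LC_ALL=C comm -3 "$src_list" "$dst_list"
    rm -f "$src_list" "$dst_list"
}
```

Usage would be: compare_trees source backup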

Danger

safe

When to use it

Use it when checking whether a backup contains the same file paths as the source tree.

When not to use it

Do not use it to prove that files have identical contents, because it compares paths only. For content checks, also compare checksums or review rsync's itemized changes (--itemize-changes).
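A checksum comparison might look like the sketch below. It assumes GNU coreutils sha256sum is installed (on macOS, shasum -a 256 is the usual substitute), and the function names sum_tree and content_diff are hypothetical.

```shell
# Hypothetical helper: print "<sha256>  ./relative/path" for every file
# under directory $1, sorted by path.
sum_tree() {
    (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}

# Hypothetical helper: content_diff SRC DST diffs the two checksum lists.
# Empty output and exit status 0 mean the same paths with the same contents.
content_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    sum_tree "$1" > "$a"
    sum_tree "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Usage would be: content_diff source backup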

Undo or recovery

No undo needed because this command is read-only.

Expected output

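Paths present on only one side of the comparison. Paths found only in source start at the left margin; paths found only in backup are indented by one tab. Paths present on both sides are not printed.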
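Because comm -3 separates the two sides only by a leading tab, a labelled report can be easier to read. The sketch below uses comm -23 (lines only in the first list) and comm -13 (lines only in the second list). The function names rel_list and report_sides are hypothetical.

```shell
# Hypothetical helper: print file paths relative to directory $1, sorted.
rel_list() {
    (cd "$1" && find . -type f | sed 's#^\./##' | LC_ALL=C sort)
}

# Hypothetical helper: report_sides SRC DST prints each side under a label.
report_sides() {
    src_list=$(mktemp) || return 1
    dst_list=$(mktemp) || { rm -f "$src_list"; return 1; }
    rel_list "$1" > "$src_list"
    rel_list "$2" > "$dst_list"
    echo "Only in source:"
    LC_ALL=C comm -23 "$src_list" "$dst_list"   # missing from the backup
    echo "Only in backup:"
    LC_ALL=C comm -13 "$src_list" "$dst_list"   # extra files in the backup
    rm -f "$src_list" "$dst_list"
}
```

Usage would be: report_sides source backup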

demo script

Disposable terminal steps

  1. find source backup -type f | sort
  2. comm -3 <(find source -type f | sed 's#^source/##' | sort) <(find backup -type f | sed 's#^backup/##' | sort)
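To run these steps without touching real data, a throwaway fixture can be built first. The sketch below creates empty files with the same paths as the simulated listing. The function name make_fixture is hypothetical, and empty file contents are an assumption, since the comparison looks at paths only.

```shell
# Hypothetical helper: make_fixture ROOT creates source/ and backup/ trees
# under ROOT with the file paths used in the demo.
make_fixture() {
    root=$1
    mkdir -p "$root/source/app" "$root/source/assets" "$root/source/content" \
             "$root/backup/app" "$root/backup/content" "$root/backup/tmp" || return 1
    for f in \
        source/app/config.yml source/assets/logo.svg \
        source/content/about.md source/content/index.md \
        backup/.snapshot backup/app/config.yml backup/content/index.md \
        backup/old-report.csv backup/tmp/empty.cache
    do
        # Create an empty file; contents do not matter for a path comparison.
        : > "$root/$f" || return 1
    done
}
```

Usage would be: demo=$(mktemp -d) && make_fixture "$demo" && cd "$demo"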

simulated output

What it looks like

$ find source backup -type f | sort
backup/.snapshot
backup/app/config.yml
backup/content/index.md
backup/old-report.csv
backup/tmp/empty.cache
source/app/config.yml
source/assets/logo.svg
source/content/about.md
source/content/index.md
::exit-code::0
$ comm -3 <(find source -type f | sed 's#^source/##' | sort) <(find backup -type f | sed 's#^backup/##' | sort)
	.snapshot
assets/logo.svg
content/about.md
	old-report.csv
	tmp/empty.cache
::exit-code::0

YouTube Short

Compare backup file lists.

Strip the source and backup prefixes, sort both lists, then use comm to see what exists on only one side.

LinkedIn hook

A backup can be missing files and still look plausible at a glance.

Question: Do you compare backup file presence separately from file contents?

experiments

A/B tests to run

Metric: save_rate

A: Missing files hide easily.

B: Compare relative paths.