Web Server Rescue

Find Broken Internal Links in Built HTML

You need to list the root-relative href paths in the built HTML whose targets do not exist in the static build.

Command

grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | while read -r path; do test -e "public${path}" || echo "$path"; done | sort -u

What changed

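Nothing changes on disk. The command extracts every root-relative href value (one that starts with /) from the .html files under public, tests whether that path exists under public, and prints the paths that do not.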
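
To make the stages easier to follow, here is the same pipeline spread over several lines, with one comment per stage. This is only a sketch: the function name list_missing_links and its optional directory argument are illustrative and not part of the original command.

```shell
# Hypothetical wrapper around the one-liner above; the name and the
# optional build-directory argument are assumptions for illustration.
list_missing_links() {
  build_dir="${1:-public}"
  # 1. Print only the matched href="/..." attributes, without file names.
  grep -Rho --include='*.html' 'href="/[^"]*"' "$build_dir" |
    # 2. Strip the href=" prefix and the closing quote, leaving the path.
    sed 's#href="##;s#"##' |
    # 3. Print the path when nothing exists at that location in the build.
    while read -r path; do
      test -e "${build_dir}${path}" || echo "$path"
    done |
    # 4. Sort the results and drop duplicates.
    sort -u
}
```

Calling list_missing_links with no argument checks the public directory, the same as the original command.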

Danger

safe

When to use it

Use before deploys, after URL changes, or after moving content between sections.
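
For the pre-deploy case, one possible approach is to wrap the check so that it fails when anything is missing. The sketch below assumes a hypothetical function name, check_links, and returns a non-zero status when at least one target is missing, so a deploy step can be chained after it.

```shell
# Hypothetical pre-deploy gate; check_links is an illustrative name.
# Returns 1 when any root-relative href target is missing, 0 otherwise.
check_links() {
  build_dir="${1:-public}"
  missing=$(
    grep -Rho --include='*.html' 'href="/[^"]*"' "$build_dir" |
      sed 's#href="##;s#"##' |
      while read -r path; do
        test -e "${build_dir}${path}" || echo "$path"
      done |
      sort -u
  )
  if [ -n "$missing" ]; then
    # Report on stderr so the list does not mix with normal output.
    echo "Broken internal links:" >&2
    echo "$missing" >&2
    return 1
  fi
  echo "No broken internal links found in ${build_dir}."
}
```

A possible use is check_links public && ./deploy.sh, where deploy.sh stands in for whatever deploy step you actually run.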

When not to use it

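Do not use it as-is for JavaScript-routed apps, where routes do not map to files in the build, or for remote URLs, which need an HTTP check rather than a file test. It also reports false positives for hrefs that carry a query string or fragment (for example /page.html#top) and for protocol-relative URLs (//host/path), because test -e treats the whole string as a file path. Adapt the path logic for those cases.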
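
One way to adapt the path logic for hrefs with query strings, fragments, or protocol-relative URLs is sketched below. The function name list_missing_targets is hypothetical, and the two extra filter stages are assumptions about what a typical static site needs. It does not decode percent-encoded paths and does not help with JavaScript-routed apps.

```shell
# Hypothetical variant of the original pipeline; the name and the two
# extra filter stages are illustrative assumptions.
list_missing_targets() {
  build_dir="${1:-public}"
  grep -Rho --include='*.html' 'href="/[^"]*"' "$build_dir" |
    sed 's#href="##;s#"##' |
    # Skip protocol-relative URLs such as //cdn.example.com/lib.js.
    grep -v '^//' |
    # Drop the query string and fragment; only the path maps to a file.
    sed 's/[?#].*$//' |
    while read -r path; do
      test -e "${build_dir}${path}" || echo "$path"
    done |
    sort -u
}
```

With this variant, a link to /blog/post.html#intro is checked as /blog/post.html, and a missing target is reported once by its bare path even when several links point at it with different query strings.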

Undo or recovery

No undo needed because this command is read-only.

Expected output

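One line per root-relative path that is linked from the built HTML but does not exist in the public directory, sorted and de-duplicated. No output means every linked target exists.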

demo script

Disposable terminal steps

  1. grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | sort -u
  2. grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | while read -r path; do test -e "public${path}" || echo "$path"; done | sort -u

simulated output

What it looks like

disposable vessel
::fixture-ready::
$ grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | sort -u
/
/assets/site.css
/blog/post.html
/missing.html
::exit-code::0
$ grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | while read -r path; do test -e "public${path}" || echo "$path"; done | sort -u
/missing.html
::exit-code::0
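
To try the two steps against a throwaway build, a fixture along these lines should reproduce the simulated output above. The file contents are assumptions chosen to produce those four hrefs, and make_fixture is a hypothetical helper name. The real fixture behind the simulated run is not shown in this lesson.

```shell
# Hypothetical fixture builder; the HTML contents are assumptions picked
# to match the hrefs in the simulated output.
make_fixture() {
  dir=$(mktemp -d)
  mkdir -p "$dir/public/assets" "$dir/public/blog"
  # An empty stylesheet is enough: only its existence is tested.
  : > "$dir/public/assets/site.css"
  cat > "$dir/public/index.html" <<'EOF'
<link rel="stylesheet" href="/assets/site.css">
<a href="/blog/post.html">Post</a>
<a href="/missing.html">Missing</a>
EOF
  cat > "$dir/public/blog/post.html" <<'EOF'
<a href="/">Home</a>
EOF
  # Print the temporary directory so the caller can cd into it.
  echo "$dir"
}
```

Run cd "$(make_fixture)" and then the two numbered steps. The temporary directory can be deleted afterwards.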

YouTube Short

Find broken internal links.

Extract root-relative hrefs from generated HTML and test whether each target exists in the build.

LinkedIn hook

A broken internal link is easiest to catch before it becomes a 404.

Question: Do you check for broken internal links in the built output or in the source files?

experiments

A/B tests to run

Metric: share_rate

A: Broken links before deploy.

B: Audit the generated HTML.