Web Server Rescue
Find Broken Internal Links in Built HTML
Read-only, can be slow
You need to list root-relative href paths (links starting with /) that do not exist in the static build.
Command
grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | while read -r path; do test -e "public${path}" || echo "$path"; done | sort -u
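The one-liner above checks the raw href text as a file path. As a result, a link such as /about.html#team or /search.html?q=x is reported as broken even when the page exists, and a protocol-relative link such as //cdn.example.com/app.js is checked as a local file. The stricter variant below is a sketch, not the page's original command. It assumes double-quoted href attributes, and it treats a link to a directory as valid only when that directory contains index.html. broken_internal_links is a hypothetical helper name. Percent-encoded characters (%20) are still not decoded.

```shell
# Stricter broken-link check (sketch, hypothetical helper name).
# Assumptions: double-quoted hrefs; "?query" and "#fragment" suffixes are
# ignored; "//host/..." links are external; a directory link needs index.html.
broken_internal_links() {
  root=${1:-public}   # build root; root-relative paths resolve against it
  # Extract hrefs, reduce each to a bare path, drop protocol-relative URLs,
  # then test each path against the build root.
  grep -Rho --include='*.html' 'href="/[^"]*"' "$root" |
    sed 's|^href="||; s|"$||; s|[?#].*$||' |
    grep -v '^//' |
    while read -r path; do
      target="${root}${path}"
      if [ -d "$target" ]; then
        # A directory only "exists" as a page if it has an index file.
        [ -e "$target/index.html" ] || echo "$path"
      else
        [ -e "$target" ] || echo "$path"
      fi
    done |
    sort -u
}
```

Run it as `broken_internal_links public`. The index.html rule is a design choice: many static hosts serve /blog/ from blog/index.html, so an empty directory is reported rather than silently accepted.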
Before you run this
System impact: Read-only. Reads every HTML file under public and runs one existence check per link, so it can be slow on large builds.
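When not to use it: Do not use it for JavaScript-routed apps or remote URLs without adapting the path logic. It also flags links that carry a #fragment or ?query suffix or percent-encoded characters, and protocol-relative //host links, because it checks the raw href text as a file path.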
Expected output
Root-relative paths linked from HTML that do not exist in the public directory.
System impact
Read-only, can be slow. Nothing is modified: the command extracts root-relative links from the HTML and checks whether each target path exists under public.
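On large builds, point the grep at the smallest useful subdirectory (for example public/blog) but keep the test prefix as public, because root-relative links resolve from the site root.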
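Scoping only narrows which files are scanned. The links found still resolve against the whole build root. A minimal sketch, assuming the build root is public and the section is one of its subdirectories. broken_links_in_section is a hypothetical helper name.

```shell
# Scan one section but resolve links against the full build root (sketch).
broken_links_in_section() {
  root=$1      # build root, e.g. public (no trailing slash)
  section=$2   # subdirectory to scan, relative to $root, e.g. blog
  grep -Rho --include='*.html' 'href="/[^"]*"' "$root/$section" |
    sed 's#href="##;s#"##' |
    while read -r path; do
      # Root-relative paths always start at the site root, not the section.
      test -e "${root}${path}" || echo "$path"
    done |
    sort -u
}
```

Example: `broken_links_in_section public blog`.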
Recovery / rollback: no state is changed.
When to use it
Use before deploys, after URL changes, or after moving content between sections.
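To use the check as a pre-deploy gate, its output has to become an exit status. A minimal sketch, assuming your CI runs it after the static build and fails the job on a non-zero exit. check_links_or_fail is a hypothetical name, and the check itself is the same one as the command on this page.

```shell
# Pre-deploy gate (sketch): same check as the one-liner, but any broken
# link makes the function return 1 so a CI step can fail on it.
check_links_or_fail() {
  root=${1:-public}
  broken=$(
    grep -Rho --include='*.html' 'href="/[^"]*"' "$root" |
      sed 's#href="##;s#"##' |
      while read -r path; do
        test -e "${root}${path}" || echo "$path"
      done |
      sort -u
  )
  if [ -n "$broken" ]; then
    echo "Broken internal links under $root:" >&2
    printf '%s\n' "$broken" >&2
    return 1
  fi
  echo "No broken internal links under $root."
}
```

In a CI script: `check_links_or_fail public || exit 1`.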
Command transcript
This sanitized transcript shows the commands and output shape without exposing host details.
$ grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | sort -u
/
/assets/site.css
/blog/post.html
/missing.html
$ grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | while read -r path; do test -e "public${path}" || echo "$path"; done | sort -u
/missing.html
Related commands
Find HTML Pages Missing from the Sitemap
A page can exist in the build but never make it into the sitemap.
find public -name '*.html' -print | sed 's#^public#https://example.com#' | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done
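The sed in that command maps public/blog/index.html to https://example.com/blog/index.html. If the sitemap lists that page as https://example.com/blog/, every section index is reported as missing. The sketch below rewrites a trailing /index.html to / before the lookup. It assumes that sitemap style, one `<loc>URL</loc>` per entry on a single line, and https://example.com as a placeholder base. pages_missing_from_sitemap is a hypothetical name.

```shell
# Sitemap check that compares index pages as directory URLs (sketch).
# Assumptions: sitemap lists .../blog/ rather than .../blog/index.html;
# each <loc>URL</loc> is on one line; pass the root without a trailing slash.
pages_missing_from_sitemap() {
  root=${1:-public}
  base=${2:-https://example.com}   # placeholder base URL
  find "$root" -name '*.html' -print |
    sed "s#^$root#$base#; s#/index\.html\$#/#" |
    while read -r url; do
      # -F matches literally; the <loc> wrapper stops /blog/ matching /blog/post.html
      grep -qF "<loc>$url</loc>" "$root/sitemap.xml" || echo "$url"
    done
}
```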
Find Pages Missing Canonical Links
Canonical tags are easy to drop when templates branch.
find public -name '*.html' -print | while read -r f; do grep -qi 'rel="canonical"' "$f" || echo "$f"; done
Find Feed Links Missing from the Sitemap
Your feed can advertise URLs that the sitemap never lists.
grep -o '<link>https://example.com/[^<]*</link>' public/feed.xml | sed 's#<link>##;s#</link>##' | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done
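That pattern only matches RSS-style `<link>URL</link>` elements. Atom feeds put the URL in an href attribute and also carry a rel="self" link that points at the feed itself. The sketch below collects both forms, skips the self link, and checks each URL against the sitemap's `<loc>` entries. feed_links_missing_from_sitemap is a hypothetical name, and https://example.com is a placeholder.

```shell
# Feed-vs-sitemap check for RSS or Atom feeds (sketch).
# Assumptions: double-quoted href attributes, one <loc>URL</loc> per entry,
# and $base as the site URL (https://example.com is a placeholder).
feed_links_missing_from_sitemap() {
  feed=$1
  sitemap=$2
  base=${3:-https://example.com}
  {
    # RSS: <link>https://example.com/post.html</link>
    grep -o "<link>$base/[^<]*</link>" "$feed" |
      sed 's#<link>##; s#</link>##'
    # Atom: <link href="https://example.com/post.html"/>, minus rel="self"
    grep -o '<link [^>]*>' "$feed" |
      grep -v 'rel="self"' |
      grep -o "href=\"$base/[^\"]*\"" |
      sed 's#^href="##; s#"$##'
  } |
    sort -u |
    while read -r url; do
      grep -qF "<loc>$url</loc>" "$sitemap" || echo "$url"
    done
}
```

Example: `feed_links_missing_from_sitemap public/feed.xml public/sitemap.xml`.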
Find Pages Missing Meta Descriptions
Missing descriptions are usually a content template problem, not a mystery.
find public -name '*.html' -print | while read -r f; do grep -qi 'name="description"' "$f" || echo "$f"; done
Find Pages Missing og:title
Social previews often fail because one template missed Open Graph tags.
find public -name '*.html' -print | while read -r f; do grep -qi 'property="og:title"' "$f" || echo "$f"; done
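The canonical, description, and og:title checks each walk the same files. The sketch below does a single pass and labels each miss. It uses the same case-insensitive patterns, so it shares their assumption of double-quoted attribute values. missing_head_tags is a hypothetical name.

```shell
# One pass per page, reporting every missing head tag (sketch).
# Patterns match the three single-tag commands; double quotes assumed.
missing_head_tags() {
  root=${1:-public}
  find "$root" -name '*.html' -print | while read -r f; do
    grep -qi 'rel="canonical"' "$f" || echo "$f: canonical"
    grep -qi 'name="description"' "$f" || echo "$f: description"
    grep -qi 'property="og:title"' "$f" || echo "$f: og:title"
  done
}
```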
Study mapping
Use this as independent command practice: read the notes, predict the output, then compare it with the example before using a real shell.
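To practice without touching a real site, you can build a throwaway tree with the same links as the transcript. A minimal sketch: make_link_practice and practice_broken_links are hypothetical helper names, and everything lives under a fresh mktemp -d directory.

```shell
# Hypothetical practice fixture: a temp "public" tree with the transcript's
# links (/, /assets/site.css, /blog/post.html, /missing.html).
make_link_practice() {
  dir=$(mktemp -d)
  mkdir -p "$dir/public/blog" "$dir/public/assets"
  printf 'body{}\n' > "$dir/public/assets/site.css"
  printf '<a href="/">home</a>\n' > "$dir/public/blog/post.html"
  printf '<link href="/assets/site.css"><a href="/blog/post.html">post</a> <a href="/missing.html">gone</a>\n' > "$dir/public/index.html"
  echo "$dir"
}

# Runs this page's command inside the fixture, in a subshell so the
# caller's working directory is unchanged.
practice_broken_links() {
  (
    cd "$1" || exit 1
    grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | while read -r path; do test -e "public${path}" || echo "$path"; done | sort -u
  )
}
```

Usage: `dir=$(make_link_practice); practice_broken_links "$dir"`. Predict the output first; following the transcript, it should print /missing.html. Remove the tree afterwards with `rm -rf "$dir"`.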
Useful for
- LPIC-1 style command-line practice
- LFCS style performance-based tasks
- Linux+ style troubleshooting review
Independent study support only. No affiliation, endorsement, exam dumps, or real exam questions.