Hosting Operations
Find HTML Pages Missing from the Sitemap
Read-only, can be slow
You need to compare generated HTML files against sitemap URLs.
Command
find public -name '*.html' -print | sed 's#^public#https://example.com#' | while read -r url; do grep -q "$url" public/sitemap.xml || echo "$url"; done
Expected output
Generated page URLs that do not appear in sitemap.xml.
System impact
Read-only, can be slow. Nothing changes. The command maps file paths to URLs and prints pages absent from the sitemap.
It runs grep against sitemap.xml once per HTML file, so on large sites point find at the smallest useful subdirectory of public.
Recovery / rollback: no state is changed.
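The path-to-URL mapping described above can also be done in a single pass. The following is a minimal sketch, not the command from this page: it extracts every <loc> value once and compares whole lines as fixed strings, so dots in URLs are not read as regex wildcards and one URL cannot match as a prefix of another. The function name missing_from_sitemap is hypothetical, and the sketch assumes each sitemap URL is written as <loc>URL</loc> with no surrounding whitespace.

```shell
# Sketch: list page URLs under a site directory that are absent from its sitemap.
# Assumptions: hypothetical function name; <loc>URL</loc> has no inner whitespace;
# site_dir contains no '#' characters (it is used inside a sed expression).
missing_from_sitemap() {
    site_dir=$1    # for example: public
    base_url=$2    # for example: https://example.com
    locs=$(mktemp) || return 1

    # Extract every <loc> value once, one URL per line.
    grep -o '<loc>[^<]*</loc>' "$site_dir/sitemap.xml" |
        sed 's#<loc>##; s#</loc>##' > "$locs"

    # Map file paths to URLs, then keep only URLs with no exact match in the list.
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
    find "$site_dir" -name '*.html' -print |
        sed "s#^$site_dir#$base_url#" |
        grep -Fxv -f "$locs"

    rm -f "$locs"
}
```

A call such as missing_from_sitemap public https://example.com prints one URL per line. Because the match is exact, a page listed with a different scheme, host, or trailing slash is still reported.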
When to use it
Use when new routes are not appearing in sitemap output.
When not to use it
Do not treat every result as an error; drafts and private pages may be intentionally omitted.
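One way to keep intentional omissions out of the report is to filter the output through a list of known exceptions. This is a sketch under assumptions: the function name filter_known_omissions and the ignore file (one exact URL per line) are hypothetical and not part of the original command.

```shell
# Sketch: drop intentionally omitted URLs from a candidate list read on stdin.
# Assumption: ignore_file is a hypothetical plain-text file, one exact URL per line.
filter_known_omissions() {
    ignore_file=$1
    # -F fixed strings, -x whole-line match, -v keep only lines NOT in the list.
    grep -Fxv -f "$ignore_file"
}
```

It can be appended to the main pipeline, for example ... | filter_known_omissions sitemap-ignore.txt. Note that grep exits with status 1 when every candidate was filtered out, which here means nothing unexpected is missing.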
Command transcript
This sanitized transcript shows the commands and output shape without exposing host details.
$ find public -name '*.html' -print
public/about.html
public/draft.html
public/blog/post.html
public/index.html
$ find public -name '*.html' -print | sed 's#^public#https://example.com#' | while read -r url; do grep -q "$url" public/sitemap.xml || echo "$url"; done
https://example.com/draft.html
https://example.com/index.html
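A common reason for an index.html page to be reported is that the sitemap lists the directory URL (for example https://example.com/) rather than the file. Whether that applies to this transcript is an assumption. If it does, a small normalizing step before the comparison avoids the false positive; the function name normalize_index_urls is hypothetical.

```shell
# Sketch: rewrite .../index.html URLs to their directory form, reading stdin.
# Assumption: the sitemap lists directory URLs such as https://example.com/ .
normalize_index_urls() {
    # Only a trailing /index.html is rewritten; other URLs pass through unchanged.
    sed 's#/index\.html$#/#'
}
```

It would sit between the sed that builds URLs and the loop that checks them against sitemap.xml.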
Next steps
Related commands
Find Feed Links Missing from the Sitemap
Your feed can advertise URLs that the sitemap never lists.
grep -o '<link>https://example.com/[^<]*</link>' public/feed.xml | sed 's#<link>##;s#</link>##' | while read -r url; do grep -q "$url" public/sitemap.xml || echo "$url"; done
Find Pages Missing Canonical Links
Canonical tags are easy to drop when templates branch.
find public -name '*.html' -print | while read -r f; do grep -qi 'rel="canonical"' "$f" || echo "$f"; done
Find Pages Missing Meta Descriptions
Missing descriptions are usually a content template problem, not a mystery.
find public -name '*.html' -print | while read -r f; do grep -qi 'name="description"' "$f" || echo "$f"; done
Find Pages Missing og:title
Social previews often fail because one template missed Open Graph tags.
find public -name '*.html' -print | while read -r f; do grep -qi 'property="og:title"' "$f" || echo "$f"; done
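The canonical, meta description, and og:title checks above share one shape, so they can be combined into a single walk over the site. This is a minimal sketch; the function name report_missing_tags and the "file: missing pattern" output format are illustrative assumptions, and like the one-liners it only looks for the attribute text, not for well-formed tags.

```shell
# Sketch: report which of three head-tag patterns each HTML file lacks.
# Assumptions: hypothetical function name and output format.
report_missing_tags() {
    site_dir=$1    # for example: public
    find "$site_dir" -name '*.html' -print | while IFS= read -r f; do
        # Same case-insensitive patterns as the three separate commands.
        for pattern in 'rel="canonical"' 'name="description"' 'property="og:title"'; do
            grep -qi -- "$pattern" "$f" || printf '%s: missing %s\n' "$f" "$pattern"
        done
    done
}
```

Running report_missing_tags public prints one line per file and missing pattern, and prints nothing when every page has all three.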
Find Logs Missing Logrotate Coverage
The biggest log risk is often the file no policy mentions.
find /var/log -type f -name '*.log' -printf '%p\n' | while read -r log; do grep -Rqs -- "$log" /etc/logrotate.conf /etc/logrotate.d || grep -Rqs -- "$(dirname "$log")/[*].log" /etc/logrotate.conf /etc/logrotate.d || printf '%s\n' "$log"; done
Study mapping
Use this as independent command practice: read the notes, predict the output, then compare it with the example before using a real shell.
Useful for
- LPIC-1 style command-line practice
- LFCS style performance tasks
- Linux+ style troubleshooting review
Independent study support only. No affiliation, endorsement, exam dumps, or real exam questions.