Hosting Operations

Find HTML Pages Missing from the Sitemap

You need to compare generated HTML files against sitemap URLs.

Command

find public -name '*.html' -print | sed 's#^public#https://example.com#' | while IFS= read -r url; do grep -qF -- "$url" public/sitemap.xml || echo "$url"; done

What changed

Nothing changes on disk. The command rewrites each file path under public/ into its public URL and prints every URL that does not appear in public/sitemap.xml.
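The same mapping can be sketched as a small reusable function. This is a minimal sketch, not the lesson's own method: the name missing_from_sitemap and its three arguments are illustrative assumptions, and it assumes each sitemap entry is a plain <loc>…</loc> element on a single line. Unlike the one-liner, it compares whole URLs rather than substrings.

```shell
# Sketch: print the URL of every generated page that the sitemap does not list.
# missing_from_sitemap and its arguments are hypothetical names for illustration.
# Usage: missing_from_sitemap BUILD_DIR BASE_URL SITEMAP_FILE
missing_from_sitemap() (
  build_dir=$1
  base_url=$2
  sitemap=$3

  # URLs declared by the sitemap, one per line (assumes simple <loc>URL</loc> entries).
  listed=$(grep -o '<loc>[^<]*</loc>' "$sitemap" | sed -e 's#^<loc>##' -e 's#</loc>$##')

  find "$build_dir" -name '*.html' -print | sort | while IFS= read -r file; do
    # Map the file path to its public URL by swapping the build dir for the base URL.
    rel=${file#"$build_dir"}
    url=$base_url$rel
    # -x matches the whole line and -F disables regex, so "." is a literal dot.
    printf '%s\n' "$listed" | grep -qxF -- "$url" || printf '%s\n' "$url"
  done
)
```

For the layout used in this lesson, the call would be: missing_from_sitemap public https://example.com public/sitemap.xml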

Danger

safe

When to use it

Use it when new routes are not appearing in the generated sitemap.

When not to use it

Do not treat every result as an error; drafts and private pages may be intentionally omitted.
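One way to keep intentional omissions out of the report is to filter the results through a list of known exceptions. The sketch below assumes a hypothetical file, here called sitemap-ignore.txt, with one full URL per line; neither the file nor the function name comes from the lesson.

```shell
# Sketch: drop URLs that are deliberately left out of the sitemap.
# filter_intentional and the ignore-file format are assumptions for illustration.
# Reads candidate URLs on stdin; $1 is a file with one ignored URL per line.
filter_intentional() {
  # -v inverts the match, -x requires a whole-line match, -F treats patterns as
  # literal strings, and -f reads the patterns from the given file.
  grep -vxF -f "$1"
}
```

Pipe the output of the lesson's command into it, for example: ... | filter_intentional sitemap-ignore.txt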

Undo or recovery

No undo needed because this command is read-only.

Expected output

Generated page URLs that do not appear in sitemap.xml.

demo script

Disposable terminal steps

  1. find public -name '*.html' -print
  2. find public -name '*.html' -print | sed 's#^public#https://example.com#' | while IFS= read -r url; do grep -qF -- "$url" public/sitemap.xml || echo "$url"; done

simulated output

What it looks like

$ find public -name '*.html' -print
public/about.html
public/draft.html
public/blog/post.html
public/index.html
::exit-code::0
$ find public -name '*.html' -print | sed 's#^public#https://example.com#' | while IFS= read -r url; do grep -qF -- "$url" public/sitemap.xml || echo "$url"; done
https://example.com/draft.html
https://example.com/index.html
::exit-code::0
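
The sample above reports index.html. One possible cause, an assumption here because the sitemap contents are not shown, is that the sitemap lists the directory URL (https://example.com/) instead of the file name. If that applies to your site, a small normalization step before the comparison avoids the false positive. The function name canonical_url is illustrative.

```shell
# Sketch: rewrite ".../index.html" to ".../" so directory-style sitemap
# entries match. canonical_url is a hypothetical helper name.
# Reads URLs on stdin and writes normalized URLs on stdout.
canonical_url() {
  # Only a trailing /index.html is rewritten; all other URLs pass through unchanged.
  sed 's#/index\.html$#/#'
}
```

Insert it after the sed step that maps paths to URLs, so the comparison against sitemap.xml uses the normalized form.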

YouTube Short

Which pages missed the sitemap?

Map generated HTML paths to public URLs, then print any URL missing from sitemap.xml.

LinkedIn hook

A page can exist in the build but never make it into the sitemap.

Question: Do you compare generated pages against sitemap output?

experiments

A/B tests to run

Metric: save_rate

A: Built but not in sitemap.

B: Compare files to sitemap URLs.