Hosting Operations

Find Feed Links Missing from the Sitemap

You need to compare feed item links against sitemap entries.

Command

grep -o 'https://example.com/[^<]*' public/feed.xml | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done

What changed

Nothing changes. The command extracts feed links and prints any URL absent from the sitemap.
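
The loop checks each URL with grep, which matches substrings: https://example.com/blog would count as present if the sitemap only lists https://example.com/blog/post.html. As a minimal sketch of a stricter variant, the comparison can be done as a set difference on sorted URL lists. The function name missing_from_sitemap, the optional prefix argument, and the assumption that sitemap URLs also end at the next "<" are illustrative and not part of the original command.

```shell
# Sketch: print feed URLs that have no exact match among sitemap URLs.
# Assumes every URL starts with the prefix and ends before the next "<".
# The prefix is used as a grep pattern, so "." matches any character.
missing_from_sitemap() {
  feed=$1
  sitemap=$2
  prefix=${3:-https://example.com/}
  tmp=$(mktemp -d) || return 1
  # Extract and de-duplicate the URLs from each file
  grep -o "${prefix}[^<]*" "$feed" | LC_ALL=C sort -u > "$tmp/feed"
  grep -o "${prefix}[^<]*" "$sitemap" | LC_ALL=C sort -u > "$tmp/sitemap"
  # comm -23 keeps only lines unique to the first sorted file
  LC_ALL=C comm -23 "$tmp/feed" "$tmp/sitemap"
  rm -rf "$tmp"
}
```

Calling missing_from_sitemap public/feed.xml public/sitemap.xml prints the missing URLs in sorted order rather than feed order, and each URL is reported once even if the feed repeats it.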

Danger

Safe. The command only reads public/feed.xml and public/sitemap.xml and writes nothing.

When to use it

Use after changing feed generation, permalink formats, or sitemap filters.

When not to use it

Do not use it for feeds with multiple domains without adapting the URL filter.
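
One possible way to adapt the URL filter for several domains is to run the same check once per host. This is a sketch under assumptions: the function name check_hosts is hypothetical, and the host names passed to it are placeholders for whatever domains the feed actually uses.

```shell
# Sketch: report feed links missing from the sitemap for several hosts.
# Arguments: feed file, sitemap file, then one or more host names.
check_hosts() {
  feed=$1
  sitemap=$2
  shift 2
  for host in "$@"; do
    # Extract links for this host only, up to the next "<"
    grep -o "https://${host}/[^<]*" "$feed" | while read -r url; do
      # -F matches the URL literally rather than as a regular expression
      grep -qF "$url" "$sitemap" || echo "$url"
    done
  done
}
```

For example, check_hosts public/feed.xml public/sitemap.xml example.com blog.example.com would list missing links for example.com first, then for the hypothetical blog.example.com.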

Undo or recovery

No undo needed because this command is read-only.

Expected output

Feed item URLs that do not appear in sitemap.xml, one per line. No output means every extracted feed link is present in the sitemap.

demo script

Disposable terminal steps

  1. grep -o 'https://example.com/[^<]*' public/feed.xml
  2. grep -o 'https://example.com/[^<]*' public/feed.xml | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done

simulated output

What it looks like

$ grep -o 'https://example.com/[^<]*' public/feed.xml
https://example.com/blog/post.html
https://example.com/news/missing.html
$ grep -o 'https://example.com/[^<]*' public/feed.xml | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done
https://example.com/news/missing.html
