Linux Survival Basics
List URLs from a Sitemap
You need to inspect the loc entries inside a sitemap without opening an XML viewer.
Command
grep -o '<loc>[^<]*</loc>' public/sitemap.xml | sed 's#<loc>##;s#</loc>##'
What changed
Nothing changes on disk. grep -o prints each <loc>...</loc> match on its own line, and sed strips the opening and closing tags so only the URL text remains.
Danger
safe
When to use it
Use when checking sitemap contents during static-site deploys.
When not to use it
Do not use it as an XML validator; it is a quick inspection command.
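Two limits of the pattern are worth seeing once. The sketch below uses made-up sample strings (the `wrapped` and `escaped` variables are illustrative, not from a real site) and assumes a POSIX-style shell with grep and sed. It shows that a loc value wrapped across lines is skipped, and that XML entities come out undecoded.

```shell
# Hypothetical sample: a <loc> whose value is wrapped across lines.
# grep matches one line at a time, so the pattern never sees both tags together.
wrapped='<url><loc>
https://example.com/wrapped.html
</loc></url>'
wrapped_count=$(printf '%s\n' "$wrapped" | grep -c '<loc>[^<]*</loc>' || true)
echo "wrapped matches: $wrapped_count"

# Hypothetical sample: a URL containing an escaped ampersand.
# The pipeline prints the entity exactly as written; it does not decode XML.
escaped='<url><loc>https://example.com/?a=1&amp;b=2</loc></url>'
escaped_url=$(printf '%s\n' "$escaped" | grep -o '<loc>[^<]*</loc>' | sed 's#<loc>##;s#</loc>##')
echo "$escaped_url"
```

If either case matters for your sitemap, a real XML parser is the better tool.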
Undo or recovery
No undo needed because this command is read-only.
Expected output
One sitemap URL per line.
demo script
Disposable terminal steps
sed -n '1,12p' public/sitemap.xml
grep -o '<loc>[^<]*</loc>' public/sitemap.xml | sed 's#<loc>##;s#</loc>##'
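The steps above assume public/sitemap.xml already exists. As a minimal sketch for trying them without touching a real site, the block below builds a throwaway fixture first. The temp directory layout and the three example.com URLs are assumptions chosen to mirror the simulated output, and mktemp -d is assumed to be available.

```shell
# Build a throwaway sitemap in a temp dir (hypothetical fixture, not a real site).
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/public"
cat > "$tmpdir/public/sitemap.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
  <url><loc>https://example.com/blog/post.html</loc></url>
</urlset>
EOF

# grep -o keeps only the matched <loc>...</loc> text; sed removes the tags.
urls=$(grep -o '<loc>[^<]*</loc>' "$tmpdir/public/sitemap.xml" | sed 's#<loc>##;s#</loc>##')
printf '%s\n' "$urls"

# Clean up the disposable fixture.
rm -rf "$tmpdir"
```

Capturing the result in a variable is only for convenience here; piping straight to the terminal works the same way.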
simulated output
What it looks like
::fixture-ready::
$ sed -n '1,12p' public/sitemap.xml
<loc>https://example.com/</loc>
<loc>https://example.com/about.html</loc>
<loc>https://example.com/blog/post.html</loc>
::exit-code::0
$ grep -o '<loc>[^<]*</loc>' public/sitemap.xml | sed 's#<loc>##;s#</loc>##'
https://example.com/
https://example.com/about.html
https://example.com/blog/post.html
::exit-code::0
YouTube Short
Print sitemap URLs.
For a quick sitemap sanity check, extract loc values and read the URLs line by line.
LinkedIn hook
Before comparing sitemap coverage, print the URLs plainly.
Question: Do you inspect sitemap output after changing routes?
experiments
A/B tests to run
Metric: completion_rate
A: List sitemap URLs.
B: Make XML readable fast.