Linux Survival Basics

List URLs from a Sitemap

You need to inspect the <loc> entries inside a sitemap without opening an XML viewer.

Command

grep -o '<loc>[^<]*</loc>' public/sitemap.xml | sed 's#<loc>##;s#</loc>##'

What changed

Nothing changes on disk. The command prints the text of each <loc> element in the sitemap to standard output.

Danger

safe

When to use it

Use when checking sitemap contents during static-site deploys.

When not to use it

Do not use it as an XML validator; it is a quick inspection command that assumes each <loc> element opens and closes on the same line.
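
Because grep works line by line, a <loc> whose text is wrapped onto its own line is silently skipped. The following is a minimal sketch of one workaround, not part of the original lesson: it flattens the file first and trims padding around the URL. The function name list_sitemap_urls_flat is hypothetical, and the result is still text matching rather than XML parsing, so entities such as &amp; are printed undecoded.

```shell
# Hypothetical helper (not from the lesson): tolerate <loc> elements
# whose text is wrapped onto separate lines.
list_sitemap_urls_flat() {
  # Delete newlines so every <loc>...</loc> pair lands on one line,
  # extract the pairs, then strip the tags and any padding whitespace.
  tr -d '\n' < "$1" \
    | grep -o '<loc>[^<]*</loc>' \
    | sed 's#<loc>[[:space:]]*##;s#[[:space:]]*</loc>##'
}
```

Usage: list_sitemap_urls_flat public/sitemap.xml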

Undo or recovery

No undo needed because this command is read-only.

Expected output

One sitemap URL per line.
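
Since the output is one URL per line, it pipes cleanly into other line-oriented tools. As a rough sketch under that assumption, the helpers below count the URLs and flag duplicates; the function names are hypothetical and not part of the lesson.

```shell
# Hypothetical helper: print the text of every <loc> element, one per line.
list_sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' "$1" | sed 's#<loc>##;s#</loc>##'
}

# Hypothetical helper: print how many URLs the sitemap lists.
# tr strips the padding that some wc implementations add.
count_sitemap_urls() {
  list_sitemap_urls "$1" | wc -l | tr -d ' '
}

# Hypothetical helper: print each URL that appears more than once.
duplicate_sitemap_urls() {
  list_sitemap_urls "$1" | sort | uniq -d
}
```

Usage: count_sitemap_urls public/sitemap.xml prints the URL count, and duplicate_sitemap_urls public/sitemap.xml prints nothing when every URL is unique.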

demo script

Disposable terminal steps

  1. sed -n '1,12p' public/sitemap.xml
  2. grep -o '<loc>[^<]*</loc>' public/sitemap.xml | sed 's#<loc>##;s#</loc>##'

simulated output

What it looks like

disposable vessel
::fixture-ready::
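$ sed -n '1,12p' public/sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
  <url><loc>https://example.com/blog/post.html</loc></url>
</urlset>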
::exit-code::0
$ grep -o '<loc>[^<]*</loc>' public/sitemap.xml | sed 's#<loc>##;s#</loc>##'
https://example.com/
https://example.com/about.html
https://example.com/blog/post.html
::exit-code::0

YouTube Short

Print sitemap URLs.

For a quick sitemap sanity check, extract loc values and read the URLs line by line.

LinkedIn hook

Before comparing sitemap coverage, print the URLs plainly.

Question: Do you inspect sitemap output after changing routes?

experiments

A/B tests to run

Metric: completion_rate

A: List sitemap URLs.

B: Make XML readable fast.