Linux Survival Basics

Read-only, can be slow

List URLs from a Sitemap

You need to inspect the loc entries inside a sitemap without opening an XML viewer.

Command

grep -o '<loc>[^<]*</loc>' public/sitemap.xml | sed 's#<loc>##;s#</loc>##'

Before you run this

System impact: Read-only. Can be slow on very large sitemap files.

When not to use it: Do not use it as an XML validator; it is a quick inspection command.

Expected output

One sitemap URL per line.
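
Because the output is one URL per line, it pipes cleanly into wc, sort, and uniq. A minimal sketch, assuming a hypothetical helper named sitemap_urls and a throwaway sample file created with mktemp; neither is part of the original command:

```shell
# Hypothetical helper: print each <loc> value on its own line.
sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' "$1" | sed 's#<loc>##;s#</loc>##'
}

# Throwaway sample so the sketch runs anywhere; swap in public/sitemap.xml.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
</urlset>
EOF

# Count the entries, then list any URL that appears more than once.
sitemap_urls "$sample" | wc -l
sitemap_urls "$sample" | sort | uniq -d
```

An empty result from the last line means no URL is listed twice.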

System impact

Read-only, can be slow. Nothing changes. The command extracts URL text from sitemap loc elements.

On busy systems, point it at one sitemap file at a time rather than looping over a whole directory of them.

Recovery / rollback: no state is changed.

When to use it

Use when checking sitemap contents during static-site deploys.
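
During a deploy, the same extraction can back a quick presence check. A rough sketch, assuming a hypothetical function name sitemap_has_url and made-up sample URLs; adapt the list to the pages you actually expect:

```shell
# Hypothetical deploy check: succeed only if the exact URL is listed.
sitemap_has_url() {
  # $1 = sitemap file, $2 = URL. -F matches literally, -x requires a whole-line match.
  grep -o '<loc>[^<]*</loc>' "$1" | sed 's#<loc>##;s#</loc>##' | grep -qxF "$2"
}

# Throwaway sample so the sketch runs anywhere; swap in public/sitemap.xml.
sitemap=$(mktemp)
cat > "$sitemap" <<'EOF'
<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
</urlset>
EOF

for url in https://example.com/ https://example.com/pricing.html; do
  if sitemap_has_url "$sitemap" "$url"; then
    echo "ok      $url"
  else
    echo "missing $url"
  fi
done
```

Matching the whole line with -x keeps https://example.com/about from passing just because https://example.com/about.html is present.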

When not to use it

Do not use it as an XML validator; it is a quick inspection command.
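
Two concrete limits of the text-matching approach, shown on a made-up sample file: XML entities stay encoded, and a <loc> element wrapped across lines is skipped without any warning because grep matches one line at a time. This is an illustrative sketch, not a replacement for a real XML parser.

```shell
# Throwaway sample with one entity-encoded URL and one wrapped <loc>.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<urlset>
  <url><loc>https://example.com/search?q=a&amp;page=2</loc></url>
  <url><loc>
    https://example.com/wrapped.html
  </loc></url>
</urlset>
EOF

# Prints only the first URL, with &amp; left undecoded.
grep -o '<loc>[^<]*</loc>' "$sample" | sed 's#<loc>##;s#</loc>##'

# Optional: decode &amp; for display. Other entities would need their own rules.
grep -o '<loc>[^<]*</loc>' "$sample" | sed 's#<loc>##;s#</loc>##;s#&amp;#\&#g'
```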

Command transcript

This sanitized transcript shows the commands and output shape without exposing host details.

$ sed -n '1,12p' public/sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
  <url><loc>https://example.com/blog/post.html</loc></url>
</urlset>

$ grep -o '<loc>[^<]*</loc>' public/sitemap.xml | sed 's#<loc>##;s#</loc>##'

https://example.com/
https://example.com/about.html
https://example.com/blog/post.html

Next steps

Related commands

Hosting Operations (can be slow)

Find Feed Links Missing from the Sitemap

Your feed can advertise URLs that the sitemap never lists.

grep -o '<link>https://example.com/[^<]*</link>' public/feed.xml | sed 's#<link>##;s#</link>##' | while read -r url; do grep -qF "<loc>$url</loc>" public/sitemap.xml || echo "$url"; done

Hosting Operations (can be slow)

Find HTML Pages Missing from the Sitemap

A page can exist in the build but never make it into the sitemap.

find public -type f -name '*.html' -print | sed 's#^public#https://example.com#' | while read -r url; do grep -qF "<loc>$url</loc>" public/sitemap.xml || echo "$url"; done

Linux Survival Basics (can be slow)

Find the Exact Log Line Before You Scroll

The error was there. The useful part was knowing exactly where it was.

grep -inE 'error|failed|denied|timeout' /var/log/nginx/error.log

Linux Survival Basics (can be slow)

Count Failures by Test File

Turn noisy test logs into a ranked failure list.

grep -RhoE '[A-Za-z0-9_./-]+\.(test|spec)\.(js|ts|py|rb)' logs/ | sort | uniq -c | sort -nr | head

Linux Survival Basics (read-only)

Show the Real User Cron Jobs

Cron problems often hide behind comments, blank lines, and copied folklore.

crontab -l | sed -n '/^[[:space:]]*#/d;/^[[:space:]]*$/d;p'

Study mapping

Use this as independent command practice: read the notes, predict the output, then compare it with the example before using a real shell.

  • lpic1:103-gnu-unix-commands
  • lfcs:essential-commands
  • linuxplus:automation-scripting
  • linuxplus:provisional
  • risk:read-only

Useful for

  • LPIC-1 style command-line practice
  • LFCS style performance tasks
  • Linux+ style troubleshooting review

Independent study support only. No affiliation, endorsement, exam dumps, or real exam questions.