Hosting Operations

Read-only, can be slow

Check robots.txt for a Sitemap Line

You need to confirm robots.txt advertises the sitemap URL.

Command

grep -n '^Sitemap:' public/robots.txt

Before you run this

System impact: Read-only. The command reads a single file, so it adds little load even on a busy host.

Expected output

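The line number and the Sitemap directive from robots.txt, for example 3:Sitemap: https://example.com/sitemap.xml. No output and exit status 1 mean that no line begins with Sitemap:. The match is case-sensitive and anchored to the start of the line.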
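In a build script the exit status is often more useful than the printed line. The following is a minimal sketch, not part of the original command: the helper name has_sitemap_line is hypothetical, and the -i flag is added on the assumption that some robots.txt files spell the field name in lowercase.

```shell
# Hypothetical helper: succeed if the given robots.txt has a Sitemap line.
# -q suppresses output; -i is an assumption that also accepts "sitemap:".
has_sitemap_line() {
  grep -qi '^Sitemap:' "$1"
}

# Report the result without printing the matching lines.
if has_sitemap_line public/robots.txt; then
  echo "Sitemap directive found"
else
  echo "No Sitemap directive in public/robots.txt" >&2
fi
```

A missing file also takes the else branch, because grep returns a non-zero status when it cannot read its input.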

System impact

Read-only. Nothing changes. The command searches public/robots.txt for lines that begin with Sitemap: and prints each match with its line number.

The command is already scoped to a single file, so no further narrowing is needed on busy systems.

Recovery / rollback: no state is changed.

When to use it

Use after changing sitemap paths, domains, or static host routing.
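After a domain or path change, the directive can still be present while pointing at the old location. The following is a rough sketch of one way to catch that, assuming a hypothetical helper named sitemap_urls_off_domain and an expected base URL that you supply; neither comes from the original command.

```shell
# Hypothetical helper: print every Sitemap URL in a robots.txt file that does
# not start with the expected base URL. Both arguments are illustrative.
sitemap_urls_off_domain() {
  robots_file=$1
  expected_base=$2
  # Drop the "Sitemap:" field name and leading whitespace, then trim trailing
  # whitespace (including a stray carriage return).
  sed -n 's/^Sitemap:[[:space:]]*//p' "$robots_file" |
    sed 's/[[:space:]]*$//' |
    while IFS= read -r url; do
      case $url in
        "$expected_base"*) ;;        # URL sits under the expected base
        *) printf '%s\n' "$url" ;;   # URL points somewhere else
      esac
    done
}
```

A call such as sitemap_urls_off_domain public/robots.txt https://example.com/ prints nothing when every directive is under the expected base. Including the trailing slash in the base keeps a look-alike host such as example.com.other.test from passing.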

When not to use it

Do not use it to validate every robots rule; it only checks the sitemap directive.

Command transcript

This sanitized transcript shows the commands and output shape without exposing host details.

$ cat public/robots.txt

User-agent: *
Disallow: /admin
Sitemap: https://example.com/sitemap.xml

$ grep -n '^Sitemap:' public/robots.txt

3:Sitemap: https://example.com/sitemap.xml

Next steps

Related commands

Hosting Operations, can be slow

Find HTML Pages Missing from the Sitemap

A page can exist in the build but never make it into the sitemap.

# -F makes grep match each URL as a literal string instead of a regular expression.
find public -name '*.html' -print | sed 's#^public#https://example.com#' | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done

Hosting Operations, can be slow

Find Feed Links Missing from the Sitemap

Your feed can advertise URLs that the sitemap never lists.

# -F makes grep match each URL as a literal string instead of a regular expression.
grep -o '<link>https://example.com/[^<]*</link>' public/feed.xml | sed 's#<link>##;s#</link>##' | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done

Hosting Operations, can be slow

Find Pages Marked noindex

A leftover noindex can hide a page after launch.

grep -Rni --include='*.html' 'noindex' public

Hosting Operations, can be slow

Find Duplicate Page Titles

Duplicate titles make a static site harder to scan in search results and browser tabs.

# Prints every title with its count, highest first; counts above 1 are duplicates.
grep -Rho --include='*.html' '<title>[^<]*</title>' public | sed 's#<title>##;s#</title>##' | sort | uniq -c | sort -nr

Hosting Operations, can be slow

Find Pages Missing Canonical Links

Canonical tags are easy to drop when templates branch.

find public -name '*.html' -print | while read -r f; do grep -qi 'rel="canonical"' "$f" || echo "$f"; done

Study mapping

Use this as independent command practice: read the notes, predict the output, then compare it with the example before using a real shell.

  • lpic1:103-gnu-unix-commands
  • lfcs:essential-commands
  • lfcs:operations-deployment
  • lfcs:services-logs
  • linuxplus:automation-scripting
  • linuxplus:provisional
  • risk:read-only

Useful for

  • LPIC-1 style command-line practice
  • LFCS style performance tasks
  • Linux+ style troubleshooting review

Independent study support only. No affiliation, endorsement, exam dumps, or real exam questions.