Hosting Operations
Read-only, can be slow
Check robots.txt for a Sitemap Line
You need to confirm robots.txt advertises the sitemap URL.
Command
grep -n '^Sitemap:' public/robots.txt
Before you run this
System impact: Read-only. robots.txt is a single small file, so this check is normally fast; load only becomes a concern if you point the same search at large logs, directories, or filesystems.
When not to use it: Do not use it to validate every robots rule; it only checks the sitemap directive.
Expected output
The line number and Sitemap directive from robots.txt.
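If you want the check to fail loudly in a build script, one option is to wrap the same grep in a small function. This is a minimal sketch, not part of the original command: the function name check_sitemap_line is hypothetical, and it assumes the path to robots.txt is passed as the first argument (defaulting to public/robots.txt).

```shell
# Hypothetical helper: print matching Sitemap lines with line numbers,
# or return a non-zero status with a message when none are found.
check_sitemap_line() {
  file="${1:-public/robots.txt}"
  # grep exits 0 on a match and non-zero on no match or a read error.
  if ! grep -n '^Sitemap:' "$file"; then
    echo "no Sitemap line found in $file" >&2
    return 1
  fi
}
```

Call it as `check_sitemap_line public/robots.txt` and use the exit status to stop a deploy step when the directive is missing.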
System impact
Read-only, can be slow. Nothing changes. The command searches robots.txt for a Sitemap directive.
Scope this to the smallest useful path or service on busy systems.
Recovery / rollback: no state is changed.
When to use it
Use after changing sitemap paths, domains, or static host routing.
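After a path or domain change, the directive can still be present while pointing at a file the build no longer produces. The sketch below extends the check under stated assumptions: the function name sitemap_files_exist is hypothetical, the site origin and build directory are passed in by you, and every advertised URL is assumed to start with that origin and to contain no spaces.

```shell
# Hypothetical helper: map each advertised sitemap URL to a file under the
# build directory and report any file that does not exist.
# Arguments: robots.txt path, site origin, build root.
sitemap_files_exist() {
  robots="$1"; origin="$2"; root="$3"
  rc=0
  # Print only the URL part of each Sitemap line.
  for url in $(sed -n 's/^Sitemap:[[:space:]]*//p' "$robots"); do
    # Strip the origin so the remainder is a path relative to the build root.
    rel="${url#"$origin"}"
    if [ ! -f "$root$rel" ]; then
      echo "missing: $root$rel"
      rc=1
    fi
  done
  return "$rc"
}
```

For the example site this would be `sitemap_files_exist public/robots.txt https://example.com public`. A URL on a different origin is not stripped, so it is reported as missing, which is usually what you want to notice after a domain change.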
Command transcript
This sanitized transcript shows the commands and output shape without exposing host details.
$ cat public/robots.txt
User-agent: *
Disallow: /admin
Sitemap: https://example.com/sitemap.xml
$ grep -n '^Sitemap:' public/robots.txt
3:Sitemap: https://example.com/sitemap.xml
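The pattern in the transcript is strict: it matches only a line that starts with exactly `Sitemap:`. Crawlers generally treat the field name case-insensitively, so a file that uses `sitemap:` or indents the line would pass a crawler but fail this check. A more lenient variant, offered as a sketch (the function name sitemap_lines_lenient is hypothetical):

```shell
# Hypothetical helper: match the Sitemap field regardless of case,
# leading whitespace, or spaces before the colon.
sitemap_lines_lenient() {
  grep -niE '^[[:space:]]*sitemap[[:space:]]*:' "${1:-public/robots.txt}"
}
```

Use the strict form when you want to enforce one house style, and the lenient form when you only care whether a crawler is likely to see the directive.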
Next steps
Related commands
Find HTML Pages Missing from the Sitemap
A page can exist in the build but never make it into the sitemap.
find public -name '*.html' -print | sed 's#^public#https://example.com#' | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done
Find Feed Links Missing from the Sitemap
Your feed can advertise URLs that the sitemap never lists.
grep -o '<link>https://example.com/[^<]*</link>' public/feed.xml | sed 's#<link>##;s#</link>##' | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done
Find Pages Marked noindex
A leftover noindex can hide a page after launch.
grep -Rni --include='*.html' 'noindex' public
Find Duplicate Page Titles
Duplicate titles make a static site harder to scan in search results and browser tabs.
grep -Rho --include='*.html' '<title>[^<]*</title>' public | sed 's#<title>##;s#</title>##' | sort | uniq -c | sort -nr
Find Pages Missing Canonical Links
Canonical tags are easy to drop when templates branch.
find public -name '*.html' -print | while read -r f; do grep -qi 'rel="canonical"' "$f" || echo "$f"; done
Study mapping
Use this as independent command practice: read the notes, predict the output, then compare it with the example before using a real shell.
Useful for
- LPIC-1 style command-line practice
- LFCS style performance tasks
- Linux+ style troubleshooting review
Independent study support only. No affiliation, endorsement, exam dumps, or real exam questions.