Hosting Operations
Find Pages Marked noindex
You need to identify generated HTML pages that contain noindex directives.
Command
grep -Rni --include='*.html' 'noindex' public
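The command above prints every line that mentions noindex, including ordinary prose such as a blog post that explains noindex. The following is a narrower sketch. It lists only the files where noindex appears inside a <meta> tag, and it assumes each tag sits on a single line. The function name list_noindex_pages is hypothetical.

```shell
# Hypothetical helper: list HTML files under DIR (default: public) whose
# <meta ...> tags mention noindex. Assumes each <meta> tag is on one line.
list_noindex_pages() {
  dir=${1:-public}
  # -R recurse, -i ignore case, -l print each matching file name once
  grep -Ril --include='*.html' '<meta[^>]*noindex' "$dir"
}
```

For example, list_noindex_pages public prints public/draft.html for the build shown in the transcript below.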
Before you run this
System impact: Read-only. On a large build directory, the recursive search can cause noticeable disk I/O.
When not to use it: Do not assume every noindex is wrong; some pages should intentionally stay out of search.
Expected output
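One line per match in the form path:line-number:text, for example public/draft.html:4:<meta name="robots" content="noindex,nofollow">. Matching is case-insensitive (-i), so NoIndex and NOINDEX are reported too.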
System impact
Read-only, can be slow. Nothing changes; the command only reads the generated HTML under public and reports lines that mention noindex.
On busy systems, scope the search to the smallest useful directory, such as a single section of public.
Recovery / rollback: no state is changed.
When to use it
Use before launch, after moving draft pages, or when a page is not appearing in search.
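Because some pages should intentionally stay out of search, a pre-launch check can compare the noindex pages against an allowlist instead of failing on every match. This is a minimal sketch under stated assumptions. The function name check_noindex is hypothetical, the allowlist is passed as arguments relative to the build directory, and each <meta> tag is assumed to sit on one line.

```shell
# Hypothetical pre-launch gate: succeed only if every page with a noindex
# <meta> tag is on the allowlist given as arguments (paths relative to DIR).
# Assumes one <meta> tag per line; file names containing newlines are not handled.
check_noindex() {
  dir=$1
  shift
  unexpected=$(grep -Ril --include='*.html' '<meta[^>]*noindex' "$dir" |
    while IFS= read -r f; do
      rel=${f#"$dir"/}            # strip the "DIR/" prefix before comparing
      allowed=no
      for a in "$@"; do
        if [ "$rel" = "$a" ]; then allowed=yes; fi
      done
      if [ "$allowed" = no ]; then printf '%s\n' "$f"; fi
    done)
  if [ -n "$unexpected" ]; then
    printf 'Unexpected noindex pages:\n%s\n' "$unexpected" >&2
    return 1
  fi
}
```

In a build script, check_noindex public draft.html || exit 1 stops the launch if any page other than draft.html is marked noindex.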
Command transcript
This sanitized transcript shows the commands and output shape without exposing host details.
$ find public -name '*.html' -print
public/about.html
public/draft.html
public/blog/post.html
public/index.html
$ grep -Rni --include='*.html' 'noindex' public
public/draft.html:4:<meta name="robots" content="noindex,nofollow">
Related commands
Find Duplicate Page Titles
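Duplicate titles make a static site harder to scan in search results and browser tabs. The command prints each title with its count; titles with a count above 1 are duplicates and sort to the top.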
grep -Rho --include='*.html' '<title>[^<]*</title>' public | sed 's#<title>##;s#</title>##' | sort | uniq -c | sort -nr
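The pipeline above counts titles but drops the file names (-h), so it does not say which pages share a title. The following sketch keeps the names and prints only the duplicated titles, each followed by the files that use it. It assumes each <title> element fits on one line and is written in lowercase, as the original pattern does. The function name duplicate_titles is hypothetical.

```shell
# Hypothetical helper: print every <title> used by more than one page under DIR,
# followed by the files that use it. Assumes each <title> is on one line.
duplicate_titles() {
  dir=${1:-public}
  # Without -h, grep -R prefixes each match with "file:".
  grep -Ro --include='*.html' '<title>[^<]*</title>' "$dir" |
    awk '{
      i = index($0, ":<title>")        # split "file:<title>text</title>"
      file = substr($0, 1, i - 1)
      t = substr($0, i + 8)            # 8 = length of ":<title>"
      sub(/<\/title>$/, "", t)
      count[t]++
      files[t] = files[t] "  " file "\n"
    }
    END {
      for (t in count)
        if (count[t] > 1) printf "%d %s\n%s", count[t], t, files[t]
    }'
}
```

The order of the duplicated titles in the output is not defined, because awk iterates over its arrays in an unspecified order.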
Find HTML Pages Missing from the Sitemap
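A page can exist in the build but never make it into the sitemap. Replace https://example.com with the site's base URL before running this; -F makes grep treat each URL as a fixed string, so the dots are not regex wildcards.
find public -name '*.html' -print | sed 's#^public#https://example.com#' | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done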
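The loop above searches the sitemap text for each URL, so a URL that is a prefix of a longer listed URL still counts as present. The following stricter sketch extracts the <loc> values and compares whole URLs with comm. It assumes one <loc> element per line, that the sitemap is at DIR/sitemap.xml as in the original command, and that neither DIR nor BASE contains a # character. The function name missing_from_sitemap is hypothetical.

```shell
# Hypothetical helper: list built pages whose URL is not an exact <loc> entry
# in DIR/sitemap.xml. BASE is an assumption; pass your site's origin.
missing_from_sitemap() {
  dir=${1:-public}
  base=${2:-https://example.com}
  work=$(mktemp -d) || return 1
  # URLs listed in the sitemap (assumes one <loc> per line).
  sed -n 's#.*<loc>\([^<]*\)</loc>.*#\1#p' "$dir/sitemap.xml" | sort -u > "$work/listed"
  # URLs the build produced, derived from file paths.
  find "$dir" -name '*.html' | sed "s#^$dir#$base#" | sort -u > "$work/built"
  # comm -23 keeps lines found only in the first file: built but not listed.
  comm -23 "$work/built" "$work/listed"
  rm -rf "$work"
}
```

Sites that publish directory URLs (for example /blog/post/ instead of /blog/post.html) need an extra rewrite step before the comparison.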
Find Pages Missing Canonical Links
Canonical tags are easy to drop when templates branch.
find public -name '*.html' -print | while read -r f; do grep -qi 'rel="canonical"' "$f" || echo "$f"; done
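The loop above runs one grep per file. grep -L ("files without match") performs the same per-file test in a single call. The following is a minimal sketch. The function name pages_missing is hypothetical, and it keeps the original case-insensitive, double-quoted attribute pattern.

```shell
# Hypothetical helper: list HTML files under DIR that do not contain PATTERN.
# -R recurse, -i ignore case, -L print names of files with no match.
pages_missing() {
  pattern=$1
  dir=${2:-public}
  grep -RiL --include='*.html' -- "$pattern" "$dir"
}
```

For example, pages_missing 'rel="canonical"' public gives the same list as the loop. The same call works for the og:title and description checks below.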
Find Pages Missing og:title
Social previews often fail because one template missed Open Graph tags.
find public -name '*.html' -print | while read -r f; do grep -qi 'property="og:title"' "$f" || echo "$f"; done
Find Pages Missing Meta Descriptions
Missing descriptions are usually a content template problem, not a mystery.
find public -name '*.html' -print | while read -r f; do grep -qi 'name="description"' "$f" || echo "$f"; done
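Running the canonical, og:title, and description loops separately produces three lists to cross-check. The following sketch prints one line per page that lacks any of the three tags and names the missing ones. It reuses the patterns from the commands above and shares their limits (double-quoted attributes, case-insensitive match). The function name audit_head_tags is hypothetical.

```shell
# Hypothetical helper: one report line per page missing any of the checked tags.
# Patterns mirror the one-liners above. File names with newlines are not handled.
audit_head_tags() {
  dir=${1:-public}
  find "$dir" -name '*.html' | sort | while IFS= read -r f; do
    missing=""
    grep -qi 'rel="canonical"' "$f" || missing="$missing canonical"
    grep -qi 'property="og:title"' "$f" || missing="$missing og:title"
    grep -qi 'name="description"' "$f" || missing="$missing description"
    # Print only pages with at least one gap, e.g. "public/about.html: canonical".
    if [ -n "$missing" ]; then printf '%s:%s\n' "$f" "$missing"; fi
  done
}
```

Sorting the file list keeps the report stable between runs, which makes it easier to compare the output before and after a template fix.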
Study mapping
Use this as independent command practice: read the notes, predict the output, then compare it with the example before using a real shell.
Useful for
- LPIC-1 style command-line practice
- LFCS style performance tasks
- Linux+ style troubleshooting review
Independent study support only. No affiliation, endorsement, exam dumps, or real exam questions.