Hosting Operations

Read-only, can be slow

Find Duplicate Page Titles

You need to find repeated HTML title text across a built static site.

Command

grep -Rho --include='*.html' '<title>[^<]*</title>' public | sed 's#<title>##;s#</title>##' | sort | uniq -c | sort -nr

Before you run this

System impact: Read-only. Can create I/O load when the build directory holds many HTML files.

When not to use it: Do not treat it as a complete SEO audit; it only checks the <title> element.

Expected output

A count-sorted list of title strings, with duplicates showing counts greater than one.

System impact

Read-only, can be slow. Nothing changes. The command reads HTML files, extracts title text, and counts repeats.

On busy systems, point the command at the smallest useful path, such as one subdirectory of public, instead of the whole build.

Recovery / rollback: no state is changed.

When to use it

Use before publishing a static site or after template changes that may duplicate page titles.
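For a pre-publish check, the same pipeline can be wrapped so that it fails when any title repeats. The sketch below is one possible way to do that, assuming GNU grep and a POSIX-style shell; the function name check_duplicate_titles and its optional directory argument are hypothetical and not part of the original command.

```shell
# Hypothetical helper: report duplicated titles and return non-zero if any exist.
# Assumes GNU grep (for -R, -h, -o and --include).
check_duplicate_titles() {
  site_dir="${1:-public}"
  # uniq -d prints each repeated line once and drops unique lines.
  dupes=$(grep -Rho --include='*.html' '<title>[^<]*</title>' "$site_dir" \
    | sed 's#<title>##;s#</title>##' \
    | sort | uniq -d)
  if [ -n "$dupes" ]; then
    printf 'Duplicate titles found:\n%s\n' "$dupes" >&2
    return 1
  fi
  return 0
}
```

Called as check_duplicate_titles public, it could sit in front of a deploy step so that a non-zero status stops the publish.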

When not to use it

Do not treat it as a complete SEO audit; it only checks the <title> element, and the pattern only matches titles written on a single line with no attributes on the opening tag.
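The pattern in the command matches only a <title> element that opens and closes on one line and has no attributes, so pages outside that shape are silently left out of the count. A minimal sketch for listing those pages follows, assuming GNU grep; the function name list_unmatched_titles is hypothetical.

```shell
# Hypothetical helper: list HTML files where the single-line <title> pattern finds nothing.
# Assumes GNU grep; -L prints the names of files that contain no matching line.
# Read the output rather than the exit status, which has differed between GNU grep releases.
list_unmatched_titles() {
  site_dir="${1:-public}"
  grep -RL --include='*.html' '<title>[^<]*</title>' "$site_dir"
}
```

Any file it prints either has no title at all or has one the duplicate check cannot see, so it is worth inspecting by hand.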

Command transcript

This sanitized transcript shows the commands and output shape without exposing host details.

$ find public -maxdepth 3 -name '*.html' -print

public/about.html
public/draft.html
public/blog/post.html
public/index.html

$ grep -Rho --include='*.html' '<title>[^<]*</title>' public | sed 's#<title>##;s#</title>##' | sort | uniq -c | sort -nr

      2 Demo Site
      1 Post One
      1 Draft Page
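
The count shows that a title repeats but not which files carry it. One way to find them is sketched below, assuming GNU grep; the function name files_with_title and its arguments are hypothetical.

```shell
# Hypothetical helper: list the HTML files whose <title> is exactly the given text.
# Assumes GNU grep; -F treats the title as a fixed string, -l prints file names only.
files_with_title() {
  title="$1"
  site_dir="${2:-public}"
  grep -RlF --include='*.html' "<title>${title}</title>" "$site_dir"
}
```

For the transcript above, files_with_title 'Demo Site' public would list the pages that share that title.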

Next steps

Related commands

Web Server Rescue, can be slow

Find Broken Internal Links in Built HTML

A broken internal link is easiest to catch before it becomes a 404.

grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##' | while read -r path; do test -e "public${path}" || echo "$path"; done | sort -u

Hosting Operations, can be slow

Find HTML Pages Missing from the Sitemap

A page can exist in the build but never make it into the sitemap.

find public -name '*.html' -print | sed 's#^public#https://example.com#' | while read -r url; do grep -qF "$url" public/sitemap.xml || echo "$url"; done

Hosting Operations, can be slow

Find Pages Missing og:title

Social previews often fail because one template missed Open Graph tags.

find public -name '*.html' -print | while read -r f; do grep -qi 'property="og:title"' "$f" || echo "$f"; done

Hosting Operations, can be slow

Count Request IDs in Error Lines

Repeated request IDs can connect separate error lines to one failing path.

grep -Ei 'error|timeout|fatal|exception' fixtures/incidents/app.log | awk '{for (i=1;i<=NF;i++) if ($i ~ /^request_id=/) print $i}' | sort | uniq -c | sort -nr

Hosting Operations, can be slow

Find Pages Marked noindex

A leftover noindex can hide a page after launch.

grep -Rni --include='*.html' 'noindex' public

Study mapping

Use this as independent command practice: read the notes, predict the output, then compare it with the example before using a real shell.

  • lpic1:103-gnu-unix-commands
  • lfcs:essential-commands
  • lfcs:operations-deployment
  • lfcs:services-logs
  • linuxplus:automation-scripting
  • linuxplus:provisional
  • risk:read-only

Useful for

  • LPIC-1 style command-line practice
  • LFCS style performance tasks
  • Linux+ style troubleshooting review

Independent study support only. No affiliation, endorsement, exam dumps, or real exam questions.