Hosting Operations

Read-only, can be slow

Find Duplicate Page Titles

You need to find repeated HTML title text across a built static site.

Command

grep -Rho --include='*.html' '<title>[^<]*</title>' public | sed 's#<title>##;s#</title>##' | sort | uniq -c | sort -nr
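The count does not show which pages share a title. The sketch below lists the files behind each duplicate. It assumes the same one-line <title> layout that the command relies on. dup_title_files is a hypothetical helper name, and its directory argument defaults to public.

```shell
# Hypothetical helper: print each duplicated title, then the files that use it.
# Assumes <title>...</title> sits on one line, like the counting command.
dup_title_files() {
  root=${1:-public}
  # -H forces "file:match" output so the file name is always present.
  grep -RHo --include='*.html' '<title>[^<]*</title>' "$root" |
    awk '{
      i = index($0, ":<title>")        # split at the first ":<title>"
      file = substr($0, 1, i - 1)
      title = substr($0, i + 8)        # 8 = length(":<title>")
      sub(/<\/title>$/, "", title)
      n[title]++
      files[title] = files[title] "\n  " file
    }
    END { for (t in n) if (n[t] > 1) printf "%s%s\n", t, files[t] }'
}
```

Run it as dup_title_files public. Titles come out in no fixed order, because awk's for-in loop does not define one.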

Before you run this

System impact: Read-only. Can be slow on large sites, because it reads every .html file under public.

When not to use it: Do not treat it as a complete SEO audit; it only checks the <title> element.

Expected output

A count-sorted list of title strings, with duplicates showing counts greater than one.
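To see only the duplicates, filter on the count column before the final sort. The sketch below wraps the pipeline in a hypothetical only_dup_titles function. It uses awk '$1 > 1' instead of uniq -cd because POSIX does not guarantee that uniq accepts -c and -d together.

```shell
# Hypothetical helper: the counting pipeline, limited to titles seen more than once.
only_dup_titles() {
  root=${1:-public}
  # uniq -c prefixes each title with its count; awk keeps counts above 1.
  grep -Rho --include='*.html' '<title>[^<]*</title>' "$root" |
    sed 's#<title>##;s#</title>##' |
    sort | uniq -c |
    awk '$1 > 1' |
    sort -nr
}
```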

System impact

Read-only, can be slow. Nothing is modified. The command reads every .html file under public, extracts the title text, and counts repeats.

On large sites, point it at the smallest useful directory, such as public/blog instead of public.

Recovery / rollback: no state is changed.

When to use it

Use before publishing a static site or after template changes that may duplicate page titles.
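To use this as a pre-publish check, turn the result into an exit status. This is a minimal sketch, and check_unique_titles is a hypothetical name. uniq -d prints one copy of each title that appears more than once.

```shell
# Hypothetical pre-publish gate: return 1 if any title appears on more than one page.
check_unique_titles() {
  root=${1:-public}
  dups=$(grep -Rho --include='*.html' '<title>[^<]*</title>' "$root" |
    sed 's#<title>##;s#</title>##' | sort | uniq -d)
  if [ -n "$dups" ]; then
    # Report the offending titles on stderr so build logs show them.
    printf 'Duplicate titles:\n%s\n' "$dups" >&2
    return 1
  fi
  echo "All titles are unique."
}
```

In a build script, check_unique_titles public || exit 1 stops the publish step when duplicates appear.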

When not to use it

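Do not treat it as a complete SEO audit; it only checks the <title> element. The pattern also misses titles that span several lines, use uppercase tags (<TITLE>), or carry attributes such as <title lang="en">. This is because grep matches line by line and the pattern is literal and case-sensitive.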
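If your templates wrap the title across lines, uppercase the tag, or add attributes to it, a more tolerant extraction helps. This is a sketch under stated assumptions. titles_tolerant is a hypothetical name, and it flattens each file to one line with tr before matching. It still relies on regular expressions rather than an HTML parser, so unusual markup can still confuse it.

```shell
# Hypothetical helper: tolerant title counting, one file at a time.
# tr joins each file into one line so titles split across lines still match;
# grep -i accepts <TITLE>, and <title[^>]*> allows attributes on the tag.
titles_tolerant() {
  root=${1:-public}
  find "$root" -type f -name '*.html' | while IFS= read -r f; do
    tr '\n' ' ' < "$f" |
      grep -io '<title[^>]*>[^<]*</title>' |
      # Drop the tags, squeeze whitespace, and trim both ends.
      sed 's/<[^>]*>//g; s/[[:space:]][[:space:]]*/ /g; s/^ //; s/ $//'
  done | sort | uniq -c | sort -nr
}
```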


Command transcript

This sanitized transcript shows the commands and output shape without exposing host details.


$ find public -maxdepth 3 -name '*.html' -print

public/about.html
public/draft.html
public/blog/post.html
public/index.html

$ grep -Rho --include='*.html' '<title>[^<]*</title>' public | sed 's#<title>##;s#</title>##' | sort | uniq -c | sort -nr

      2 Demo Site
      1 Post One
      1 Draft Page


These are the commands shown in the sanitized transcript.

Commands shown

find public -maxdepth 3 -name '*.html' -print
grep -Rho --include='*.html' '<title>[^<]*</title>' public | sed 's#<title>##;s#</title>##' | sort | uniq -c | sort -nr


Related commands

Web Server Rescue (can be slow)

Find Broken Internal Links in Built HTML

A broken internal link is easiest to catch before it becomes a 404.

grep -Rho --include='*.html' 'href="/[^"]*"' public | sed 's#href="##;s#"##;s/[?#].*//' | grep -v '^//' | sort -u | while read -r path; do test -e "public${path}" || echo "$path"; done
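Some links need special handling. Links with a query string or fragment, such as /about.html?ref=1 or /blog/#top, do not map directly to a file. Protocol-relative links that start with // point to other hosts. A trailing-slash link such as /blog/ passes test -e whenever the directory exists, even with no index.html inside. The sketch below handles all three. broken_internal_links is a hypothetical name. Treating a trailing-slash link as that directory's index.html is an assumption about how the site is served.

```shell
# Hypothetical helper: check root-relative links against the built files.
broken_internal_links() {
  root=${1:-public}
  grep -Rho --include='*.html' 'href="/[^"]*"' "$root" |
    # Strip the attribute wrapper, then any ?query or #fragment.
    sed 's/^href="//; s/"$//; s/[?#].*//' |
    # Protocol-relative links (//host/...) are external; skip them.
    grep -v '^//' |
    sort -u |
    while IFS= read -r path; do
      case $path in
        */) target="$root${path}index.html" ;;  # /blog/ -> blog/index.html
        *)  target="$root$path" ;;
      esac
      [ -e "$target" ] || echo "$path"
    done
}
```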

Hosting Operations (can be slow)

Find HTML Pages Missing from the Sitemap

A page can exist in the build but never make it into the sitemap.

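find public -name '*.html' -print | sed 's#^public#https://example.com#' | while read -r url; do grep -qF "<loc>$url</loc>" public/sitemap.xml || echo "$url"; done

Replace https://example.com with the site's base URL. The -F flag makes grep match the URL literally, so its dots are not treated as regex wildcards. Matching the whole <loc> element prevents one URL from matching as a prefix of a longer one.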
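Running one grep per page gets slow on large sites. An alternative sketch compares the two lists as sorted sets with comm. pages_missing_from_sitemap and the BASE_URL variable are hypothetical, and https://example.com remains a placeholder for the site's real origin. Like the one-liner, it assumes the sitemap lists pages as <loc> URLs that end in .html.

```shell
# Hypothetical helper: built pages that have no <loc> entry in the sitemap.
# BASE_URL is an assumption; set it to the site's real origin.
pages_missing_from_sitemap() {
  root=${1:-public}
  base=${BASE_URL:-https://example.com}
  tmpdir=$(mktemp -d)
  find "$root" -type f -name '*.html' | sed "s#^$root#$base#" | sort > "$tmpdir/pages"
  grep -o '<loc>[^<]*</loc>' "$root/sitemap.xml" |
    sed 's#<loc>##; s#</loc>##' | sort > "$tmpdir/sitemap"
  # comm -23 keeps lines found only in the first (sorted) file.
  comm -23 "$tmpdir/pages" "$tmpdir/sitemap"
  rm -rf "$tmpdir"
}
```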

Hosting Operations (can be slow)

Find Pages Missing og:title

Social previews often fail because one template missed Open Graph tags.

find public -name '*.html' -print | while read -r f; do grep -qi 'property="og:title"' "$f" || echo "$f"; done
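grep can run the per-file loop itself: -L lists the files that contain no matching line. The sketch below wraps that in a hypothetical missing_og_title function and sorts the output so it is stable. Like the loop, it misses tags written with single quotes, such as property='og:title'.

```shell
# Hypothetical helper: .html files with no og:title property.
# -R recurses, -L lists non-matching files, -i ignores case.
missing_og_title() {
  grep -RLi --include='*.html' 'property="og:title"' "${1:-public}" | sort
}
```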

Hosting Operations (can be slow)

Count Request IDs in Error Lines

Repeated request IDs can connect separate error lines to one failing path.

grep -Ei 'error|timeout|fatal|exception' fixtures/incidents/app.log | awk '{for (i=1;i<=NF;i++) if ($i ~ /^request_id=/) print $i}' | sort | uniq -c | sort -nr
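Once an ID stands out, the next step is usually to read every line it appears on. In the sketch below, top_request_lines is a hypothetical name, and the default log path is the fixture path from the command above. When two IDs tie for the top count, sort order decides which one is used.

```shell
# Hypothetical follow-up: print every log line carrying the request ID
# that appears most often on error lines.
top_request_lines() {
  log=${1:-fixtures/incidents/app.log}
  top=$(grep -Ei 'error|timeout|fatal|exception' "$log" |
    awk '{for (i=1;i<=NF;i++) if ($i ~ /^request_id=/) print $i}' |
    sort | uniq -c | sort -nr | awk 'NR == 1 {print $2}')
  [ -n "$top" ] || return 1
  # -F matches literally; -w stops request_id=r2 from matching request_id=r22.
  grep -Fw -- "$top" "$log"
}
```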

Hosting Operations (can be slow)

Find Pages Marked noindex

A leftover noindex can hide a page after launch.

grep -Rni --include='*.html' 'noindex' public
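The plain search also matches pages that merely mention noindex in body text. A narrower sketch lists only files where the word appears inside a <meta> tag. noindex_pages is a hypothetical name, and it assumes each meta tag sits on one line. It cannot see noindex sent in an X-Robots-Tag response header.

```shell
# Hypothetical helper: files whose <meta> tags mention noindex.
# -l prints file names only; -i also catches NAME="ROBOTS" CONTENT="NOINDEX".
noindex_pages() {
  grep -Rli --include='*.html' '<meta[^>]*noindex' "${1:-public}" | sort
}
```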

Study mapping

Use this as independent command practice: read the notes, predict the output, then compare it with the example before using a real shell.
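When you move to a real shell, a throwaway copy of the transcript's site keeps the comparison exact. The four file names come from the transcript. Which page carries which title is an assumption, chosen to match the example counts. make_title_sandbox is a hypothetical name.

```shell
# Hypothetical practice fixture: four pages shaped like the transcript.
make_title_sandbox() {
  dir=${1:-sandbox}
  mkdir -p "$dir/public/blog"
  # Title-to-page mapping is assumed; the counts match the example output.
  printf '<html><head><title>Demo Site</title></head></html>\n' > "$dir/public/index.html"
  printf '<html><head><title>Demo Site</title></head></html>\n' > "$dir/public/about.html"
  printf '<html><head><title>Draft Page</title></head></html>\n' > "$dir/public/draft.html"
  printf '<html><head><title>Post One</title></head></html>\n' > "$dir/public/blog/post.html"
}
```

Run make_title_sandbox /tmp/practice, then cd /tmp/practice and run the duplicate-title command.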
