
Hosting Operations

Find Pages Missing Canonical Links

You need to list generated HTML pages that do not include a canonical link.

Command

find public -name '*.html' -print | while read -r f; do grep -qi 'rel="canonical"' "$f" || echo "$f"; done

What changed

Nothing changes. The command reads every HTML file under public and prints the path of each one that does not contain rel="canonical".
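The loop above reads paths line by line, so a path containing a newline or leading whitespace could be misread. A minimal sketch of the same presence check that hands paths to the shell as arguments instead; the function name list_missing_canonical is a hypothetical helper, not part of the lesson:

```shell
# Sketch: same check as the lesson command, but paths are passed as
# arguments via -exec ... {} +, so spaces and newlines in names survive.
# list_missing_canonical is a hypothetical name; the root defaults to public.
list_missing_canonical() {
  find "${1:-public}" -name '*.html' -exec sh -c '
    for f do
      # Print the path only when the file has no rel="canonical" text.
      grep -qi "rel=\"canonical\"" "$f" || printf "%s\n" "$f"
    done
  ' sh {} +
}
```

Call it as list_missing_canonical public, or pass another output directory as the first argument.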

Danger

safe

When to use it

Use after layout edits, generator upgrades, or route additions.
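If the check should run automatically after each of those changes, one option is to wrap it so that it fails when anything is missing. This is a sketch under assumptions: check_canonicals is a hypothetical name, and how it is wired into a build is left to the reader.

```shell
# Sketch of a post-build gate: returns non-zero when any page lacks the tag.
# check_canonicals is a hypothetical helper; the root defaults to public.
check_canonicals() {
  # Collect the paths the lesson command would print.
  missing=$(find "${1:-public}" -name '*.html' -print | while read -r f; do
    grep -qi 'rel="canonical"' "$f" || echo "$f"
  done)
  if [ -n "$missing" ]; then
    # Report on stderr and signal failure to the caller.
    printf 'Missing canonical link:\n%s\n' "$missing" >&2
    return 1
  fi
}
```

Running check_canonicals public after the generator finishes would stop a script that uses set -e as soon as a page without the tag appears.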

When not to use it

Do not use it to validate whether canonical URLs are correct; it only checks presence.
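Presence here means the literal text rel="canonical" with double quotes. A template that emits rel='canonical' or rel=canonical would be reported as missing. A hedged sketch of a looser per-file test follows; has_canonical is a hypothetical name, and the pattern still checks presence only, not the URL:

```shell
# Sketch: presence test that also accepts single-quoted and unquoted
# attribute values. has_canonical is a hypothetical helper name.
has_canonical() {
  # Matches rel="canonical", rel='canonical', and rel=canonical,
  # case-insensitively. It does not match rel="alternate canonical".
  grep -qiE "rel=[\"']?canonical" "$1"
}
```

Inside the loop, has_canonical "$f" || echo "$f" would replace the original grep call.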

Undo or recovery

No undo needed because this command is read-only.

Expected output

One path per line for each HTML file that lacks rel="canonical". If every page has it, the command prints nothing.

demo script

Disposable terminal steps

  1. grep -Rni --include='*.html' 'rel="canonical"' public
  2. find public -name '*.html' -print | while read -r f; do grep -qi 'rel="canonical"' "$f" || echo "$f"; done
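To try the two steps without touching a real site, a throwaway directory can be built first. The sketch below is an assumption about what such a fixture might contain: make_fixture is a hypothetical helper, the file names mirror the simulated output, and the page contents and example.com URLs are invented for illustration.

```shell
# Sketch: build a disposable site under the directory given as $1.
# Three pages carry a canonical link; about.html deliberately does not.
make_fixture() {
  mkdir -p "$1/public/blog"
  for page in index draft blog/post; do
    printf '<html><head>\n<link rel="canonical" href="https://example.com/%s">\n</head></html>\n' \
      "$page" > "$1/public/$page.html"
  done
  printf '<html><head>\n<title>About</title>\n</head></html>\n' > "$1/public/about.html"
}
```

For example, dir=$(mktemp -d), then make_fixture "$dir" and cd "$dir" before running the steps; remove the directory afterwards.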

simulated output

What it looks like

$ grep -Rni --include='*.html' 'rel="canonical"' public
public/draft.html:5:
public/blog/post.html:4:
public/index.html:5:
::exit-code::0
$ find public -name '*.html' -print | while read -r f; do grep -qi 'rel="canonical"' "$f" || echo "$f"; done
public/about.html
::exit-code::0

YouTube Short

Find missing canonicals.

Scan the generated pages instead of trusting assumptions. This loop prints every HTML file that is missing the canonical tag.

LinkedIn hook

Canonical tags are easy to drop when templates branch.

Question: Have you had one template path silently drop canonical tags?

experiments

A/B tests to run

Metric: completion_rate

A: Find missing canonicals.

B: Check generated HTML directly.