Content Audit for SEO: The Complete Checklist to Find and Fix What’s Killing Your Rankings

Most sites don’t have a traffic problem. They have a content quality problem buried under years of accumulation. A structured content audit surfaces the specific pages dragging down your organic equity — and gives you a prioritized action plan to fix them.

This guide covers every major content audit signal, from duplicate pages and missing H1 tags to JavaScript rendering issues, obtrusive ads, and high bounce rates. Whether you’re auditing a 50-page site or a 50,000-URL domain, the diagnostic framework stays the same: inventory, analyze, triage, act.

What a Content Audit Actually Measures

A content audit is a systematic evaluation of every indexable URL on your site, scored against signals that affect crawlability, relevance, user experience, and E-E-A-T. The output is not a list of problems — it’s a triage matrix that separates pages worth investing in from pages that are silently consuming crawl budget and diluting domain-level quality signals.

The audit categories below map to four impact areas: indexation integrity, on-page optimization, content quality, and technical rendering. Tackle them in that order for the highest ROI.

Duplicate Content: Internal and External

Internal Pages with Content Duplicated on Other Pages

Internal duplication occurs when two or more pages on your domain target the same search intent with substantially similar copy. Google does not penalize duplicate content with a manual action in most cases, but it does force a ranking election — and it usually picks the wrong page. The result is keyword cannibalization: your pages compete against each other, split link equity, and underperform relative to a single consolidated authority page.

Identify internal duplicates using Screaming Frog’s “Near Duplicate” report or Semrush’s Site Audit, which flags pages above a configurable similarity threshold (typically 80%+). For each cluster of duplicates, your decision is binary: consolidate into one canonical URL with 301 redirects, or implement rel=canonical pointing the weaker variants to the authoritative version.
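Crawlers like Screaming Frog compute near-duplicate scores internally, but the underlying idea is simple enough to sketch. The following is a minimal, assumption-laden illustration (not any tool's actual algorithm) using word-shingle Jaccard similarity, with the 0.80 threshold mirroring the typical default mentioned above; `pages` is a hypothetical mapping of URL to extracted body text:

```python
def shingles(text, k=5):
    """Break text into overlapping k-word shingles."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}

def similarity(a, b, k=5):
    """Jaccard similarity of two pages' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def duplicate_clusters(pages, threshold=0.80):
    """Return page pairs whose body copy meets the similarity threshold."""
    urls = sorted(pages)
    return [(u, v, round(similarity(pages[u], pages[v]), 2))
            for i, u in enumerate(urls)
            for v in urls[i + 1:]
            if similarity(pages[u], pages[v]) >= threshold]
```

Every pair this flags still needs a human decision: consolidate with a 301 or point a rel=canonical at the stronger variant.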

External Pages with Content Duplicated on Other Websites

Syndicated content, scraped copy, or republished articles that appear verbatim on third-party domains create an external duplication signal. If Google indexes the external version before yours — or assigns it higher authority — your original page loses the ranking credit it earned.

Audit external duplication using Copyscape or the Siteliner “External Duplicate Content” report. Where your content has been scraped, submit a DMCA takedown. Where you’ve syndicated deliberately, ensure the canonical tag on the third-party page points back to your original URL — Google’s guidance on handling cross-domain content duplication explains the correct implementation in detail.

Under-Optimized Pages

Under-optimization is not a single issue — it’s a cluster of on-page deficiencies that collectively suppress a page’s ranking ceiling. The most common patterns found in content audits are missing target keywords in the title tag, meta description, or first 100 words; absence of LSI terms and semantic entities that top-ranking competitors naturally include; and word count significantly below the topical depth of current SERP leaders.

Pages with high impression counts but low click-through rates or high bounce rates indicate that the page is visible but failing to satisfy the user. These are your highest-priority under-optimized pages: they already have indexation and some ranking equity, meaning optimization investment compounds faster here than on brand-new content.

Export your Google Search Console Performance report, filter for pages with more than 500 impressions per month and an average position between 6 and 20, and cross-reference with your on-page audit. These are your “striking distance” pages — small optimizations to title tags, semantic coverage, and internal linking often push them into top-5 positions within 30 to 60 days.
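The striking-distance filter described above can be scripted over a GSC Performance export. A sketch assuming the export has already been loaded into a list of dicts with `page`, `impressions`, and `position` keys (column names vary by export tool, so adjust to your CSV):

```python
def striking_distance(rows, min_impressions=500, min_pos=6, max_pos=20):
    """Return GSC rows for striking-distance pages, biggest opportunity first."""
    hits = [r for r in rows
            if r["impressions"] >= min_impressions
            and min_pos <= r["position"] <= max_pos]
    # Sort descending by impressions so the largest opportunities surface first
    return sorted(hits, key=lambda r: r["impressions"], reverse=True)
```

The output becomes your work queue: title tag, semantic coverage, and internal-linking fixes applied top to bottom.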

Heading Tag Errors

Pages with Missing H1 Tags

The H1 tag is the primary on-page signal telling Google what a page is about. Omitting it forces the crawler to infer topic relevance from surrounding context, a weaker signal that can lead to misclassified indexing. Research suggests that 59.5% of websites are missing an H1 tag on at least one page, which can confuse search engines about the page's main topic.

Use Screaming Frog or Semrush to export a full list of pages flagged “Missing H1.” Prioritize by organic impressions: fix high-impression pages first, as these already receive crawl attention and will respond fastest to corrections.

Pages Containing Multiple H1s

Multiple H1s on a single page fragment the topical signal. While Google has stated it can process pages with multiple H1s, the practical effect is reduced clarity about the primary topic — particularly for AI-driven answer engines that extract structured meaning from heading hierarchies to determine inclusion eligibility.

The fix is straightforward: audit your H1 inventory using your crawler of choice, then restructure pages so each carries exactly one H1 (the primary topic) with H2s and H3s used for subtopics and supporting sections.
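A crawler report is the practical route at scale, but the check itself is simple enough to sketch with Python's standard library: count the H1s in each page's HTML and bucket the result into the two error classes covered above.

```python
from html.parser import HTMLParser

class H1Counter(HTMLParser):
    """Count <h1> opening tags while parsing a page."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.count += 1

def h1_status(html):
    """Classify a page as 'missing h1', 'ok', or 'multiple h1s'."""
    parser = H1Counter()
    parser.feed(html)
    if parser.count == 0:
        return "missing h1"
    return "ok" if parser.count == 1 else "multiple h1s"
```

Run this over your crawl's stored HTML and you have the same "Missing H1" / "Multiple H1" export a commercial crawler produces.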

Click-Through Rate Issues

Top Pages Have Low Click-Through Rates

A page ranking in position 3 for a high-volume keyword but generating a below-average CTR is losing hundreds of potential organic sessions every week. Low CTR at high positions is almost always a title tag or meta description problem: the SERP snippet fails to communicate unique value or match the emotional register of the query intent.

High impressions with a low CTR signal a need to improve the title and description; high traffic with a high bounce rate signals a need to add internal links and improve the content itself.

Pull your GSC data filtered by average position 1–10 and sort by CTR ascending. For each page significantly below the expected CTR for its position (roughly 2.8% for position 5 on desktop), audit the title tag for specificity, emotional resonance, and alignment with the dominant query intent. Roll out revised titles and measure the CTR change in GSC's Performance report, or use a dedicated SEO split-testing tool.
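This comparison is easy to automate against a benchmark curve. In the sketch below, only the position-5 figure (2.8%) comes from the text above; the other benchmark values are illustrative placeholders you would replace with your own industry data:

```python
# Illustrative desktop CTR benchmarks by position. Only position 5 (2.8%)
# comes from the text above; swap in your own benchmark curve.
EXPECTED_CTR = {1: 0.25, 2: 0.12, 3: 0.08, 4: 0.05, 5: 0.028}

def underperforming_ctr(rows, benchmarks=EXPECTED_CTR, tolerance=0.5):
    """Flag pages whose CTR falls below tolerance x the benchmark for their position."""
    flagged = []
    for r in rows:
        pos = round(r["position"])
        expected = benchmarks.get(pos)
        if expected is None or r["impressions"] == 0:
            continue
        ctr = r["clicks"] / r["impressions"]
        if ctr < tolerance * expected:
            flagged.append((r["page"], round(ctr, 3), expected))
    return flagged
```

The `tolerance` factor keeps the queue focused on pages dramatically below expectation rather than every minor shortfall.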

Landing Pages Are Under-Optimized

Landing pages — specifically those driving paid or organic acquisition — are the highest-revenue pages on most sites. Under-optimization here directly compounds conversion loss on top of ranking suppression. Common audit findings on landing pages include: keyword absent from the H1, thin body copy that fails to establish topical depth, missing FAQ schema, and no internal links from high-authority blog content to the landing page.

Audit landing pages for both SEO signals (on-page optimization score, target keyword presence, page speed) and conversion signals (CTA clarity, trust elements, Core Web Vitals). Both sets of issues depress performance and both are fixable within a single sprint.

Technical Content Issues

Content Design

Content design refers to the structural and visual framework through which information is delivered. From a content audit perspective, poor content design manifests as dense unbroken paragraphs that elevate bounce rates, absent subheadings that prevent scannable consumption, missing visual anchors (images, diagrams, data tables) where competitors include them, and formatting that degrades on mobile.

A high dwell time on desktop combined with a high mobile bounce rate is frequently a content design problem, not a keyword problem. Audit mobile rendering for each top-10 page and compare layout quality against the top three SERP competitors.

Transient Content and Seasonality

Transient content — pages built for time-limited campaigns, discontinued products, or past events — creates indexation debt when left live without proper handling. These pages accumulate crawl budget consumption and, in some cases, a diluted quality signal for the broader domain.

Your content audit should flag pages with declining traffic trajectories over a 12-month window in GA4, especially those tied to seasonal topics. Decision framework: redirect to a relevant evergreen equivalent if the URL has backlinks; noindex if the URL has zero external links; delete if it serves no current user or business purpose.
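The decision framework above reduces to a small function. This is a queue-builder sketch, not a substitute for reviewing each URL; the boolean inputs are hypothetical fields you would populate from your backlink tool and business context:

```python
def transient_content_action(has_backlinks, serves_current_purpose):
    """Apply the transient-content decision framework to a declining URL."""
    if has_backlinks:
        return "301 redirect to evergreen equivalent"
    if serves_current_purpose:
        return "noindex"
    return "delete"
```

Applied across a GA4 export of declining pages, this produces the redirect / noindex / delete worklist in one pass.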

Content Quality

Content quality in Google’s current evaluation model maps directly to E-E-A-T: Experience, Expertise, Authoritativeness, and Trustworthiness. The December 2025 core update emphasized the Experience pillar more than ever — sites with thin content, duplicate meta descriptions, or missing author bios lost visibility even if they had strong backlinks.

Score each page against four quality indicators: Does it demonstrate first-hand experience or original data? Does it carry an identified author with verifiable credentials? Is the publication and last-reviewed date displayed? Does it link to primary sources? Pages that fail three or more of these checks are under-optimized for quality signals regardless of keyword placement.
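The four-indicator scoring above can be recorded per page and tallied automatically. A minimal sketch, assuming each page's audit record is a dict of check name to boolean (field names are hypothetical):

```python
QUALITY_CHECKS = (
    "first_hand_experience",   # original data or demonstrated experience
    "identified_author",       # author with verifiable credentials
    "dates_displayed",         # publication and last-reviewed dates shown
    "links_primary_sources",   # outbound links to primary sources
)

def failed_checks(page):
    """Count how many of the four quality indicators a page fails."""
    return sum(not page.get(check, False) for check in QUALITY_CHECKS)

def needs_quality_rework(page):
    """Per the rule above: failing three or more checks flags the page."""
    return failed_checks(page) >= 3
```

The scoring itself still requires human judgment per check; the code only aggregates the verdicts consistently across hundreds of pages.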

Rendering and JavaScript Checks

Content Is Present with JavaScript Turned Off

Google can render JavaScript, but Googlebot processes the initial HTML first and defers rendering to a separate queue, so JS-dependent content can be indexed later, and sometimes incompletely, relative to server-rendered HTML. For critical page content (headings, body text, product descriptions, internal links), relying on JavaScript execution means delayed or incomplete indexation.

Google’s own JavaScript SEO documentation confirms: your most important content must be present in the initial HTML or SSR output, not only after hydration — especially for large navigations, product grids, and article bodies.

Audit this by disabling JavaScript in your browser (Chrome DevTools → Settings → Disable JavaScript) and loading your key pages. Any content that disappears is at indexation risk. The fix is server-side rendering (SSR) or static site generation (SSG) for content-critical elements.
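The browser test above can also be automated: fetch the raw HTML with a client that executes no JavaScript (curl, urllib, or your crawler's stored response) and verify that critical phrases are present before any rendering. A sketch operating on an already-fetched HTML string:

```python
import re

def present_without_js(raw_html, critical_phrases):
    """Report which critical phrases exist in the server-rendered HTML.

    Anything missing here only enters the page via JavaScript and is
    at risk of delayed or incomplete indexation.
    """
    # Crude tag strip: adequate for a spot check, not a full HTML parser.
    text = re.sub(r"<[^>]+>", " ", raw_html).lower()
    return {phrase: phrase.lower() in text for phrase in critical_phrases}
```

A `False` for any phrase marks that element as a candidate for SSR or SSG.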

Content Is Not Served Within an iFrame

iFrame content is generally not indexed as part of the parent page. If your primary body copy, navigation, or product descriptions are rendered inside iFrames — common in legacy CMS setups and some third-party embed integrations — that content does not contribute to the parent page’s relevance signal. Audit iFrame usage using Screaming Frog’s “Contains iFrames” filter and migrate critical content to native HTML.

Fetch and Render: Does the DOM Match?

A DOM mismatch occurs when the HTML source Google receives differs materially from what a browser renders after JavaScript execution — often caused by dynamic content injection. If your canonical tag, H1, or main body text only appear post-render, Google may index the wrong version of the page or miss the content entirely.

Validate DOM consistency using Google Search Console’s URL Inspection tool: compare the rendered screenshot against your live page. Any structural discrepancy between the two indicates a rendering issue requiring engineering attention.

Content Integrity Checks

Lorem Ipsum and Adult Content Check

Lorem ipsum placeholder text left in published pages signals low production quality — to both users and crawlers. More critically, it occupies word count without contributing relevance signals, potentially tipping a page below the quality threshold that triggers Google’s “thin content” classification.

Adult content appearing on pages outside explicitly designated sections can trigger SafeSearch filtering and undermine sitewide trust signals, removing the page from significant portions of the addressable audience.

Audit for both using a combination of site crawls (Screaming Frog can search for custom strings, including “lorem ipsum”) and manual spot-checks on recently published or templated pages.

Heading IDs

Heading IDs (id attributes on H2/H3 elements that act as fragment anchors) enable deep linking, support structured navigation in long-form content, and improve eligibility for Google's featured snippet "jump to section" formatting. Pages with heading IDs correctly implemented let users link to and return to specific sections, which supports stronger engagement metrics.

Audit heading ID implementation using your site crawler or browser DevTools. For long-form guides and resource pages, ensure every major H2 carries a unique, descriptive ID that reflects the section topic.
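For a scripted version of this check, the standard-library HTML parser can collect each H2's id attribute and count the gaps. A minimal sketch:

```python
from html.parser import HTMLParser

class HeadingIdAudit(HTMLParser):
    """Record each H2's id attribute (None when missing)."""
    def __init__(self):
        super().__init__()
        self.h2_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.h2_ids.append(dict(attrs).get("id"))

def h2s_missing_ids(html):
    """Return how many H2s on the page lack an id attribute."""
    parser = HeadingIdAudit()
    parser.feed(html)
    return sum(1 for hid in parser.h2_ids if not hid)
```

A nonzero count flags the page for an ID pass; the `h2_ids` list itself shows which sections are unlinked.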

Incorrect Filetypes

Incorrect filetypes surface most commonly as PDFs indexed instead of HTML equivalents, or as downloadable assets (DOCX, XLSX) mistakenly left crawlable. PDFs can rank well, but they lack the internal linking capability, structured data support, and analytics tracking granularity of HTML pages. If your content strategy depends on PDFs ranking for informational keywords, consider migrating that content to HTML and redirecting the PDF URL.

Screaming Frog’s “Non-HTML” filter surfaces all non-HTML URLs in your crawl. Cross-reference against your robots.txt to confirm which filetypes should be excluded from indexation.

Paywalls

Paywalled content requires specific implementation to avoid indexation issues. Google's guidance on paywalls is clear: lead-in content (the visible portion before the paywall triggers) must be substantial enough to establish relevance, and structured data (e.g. NewsArticle or Article with isAccessibleForFree: false, plus a hasPart block marking the paywalled section) must be implemented to signal paywall status accurately.

Inconsistent paywall implementations, where Googlebot receives the full article but the markup above is missing, risk being treated as cloaking, which carries manual-action exposure; paywalls enforced only in client-side JavaScript are also trivially bypassed when JS is disabled. Audit all paywalled URLs for consistent enforcement and correct markup across crawlers and JS-disabled states.
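An audit script can verify the structured-data half of this check by scanning each page's JSON-LD for the paywall declaration. A sketch that handles a single JSON-LD object per script tag (real pages may nest arrays or @graph structures, which this deliberately ignores):

```python
import json
import re

def declares_paywall(html):
    """Check whether any JSON-LD block sets isAccessibleForFree to false."""
    pattern = r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>'
    for match in re.finditer(pattern, html, re.DOTALL):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue
        # The value may appear as a JSON boolean or the string "False"
        value = data.get("isAccessibleForFree")
        if value is False or str(value).lower() == "false":
            return True
    return False
```

Paywalled URLs returning False here are missing the markup that distinguishes legitimate flexible sampling from cloaking.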

User Behavior Signals

Pages with High Bounce Rates

Bounce rate, or in GA4 terms the complement of engagement rate (bounce rate = 1 − engagement rate), is a composite signal that surfaces multiple distinct underlying problems. A high bounce rate on an informational article suggests the content failed to answer the user's question quickly; low dwell time indicates the content is unengaging or difficult to read. Both are user signals widely believed to feed back into how Google validates ranking positions.

The diagnostic framework for high-bounce pages: first, confirm intent alignment — does the page content match the dominant search intent for the query driving traffic? Second, evaluate above-the-fold design — does the page immediately signal that the user’s question will be answered? Third, check page speed — Core Web Vitals failures frequently masquerade as content quality issues. A page that loads in 4+ seconds generates elevated bounce rates regardless of content quality.

Pull your GA4 Engagement report, filter for pages with an engagement rate below 40% and more than 200 monthly sessions, and build a triage queue sorted by traffic volume. High-traffic, high-bounce pages deliver the largest ROI from optimization.

Site Contains Obtrusive Ads

Google's page experience guidelines penalize pages whose ads interfere with content consumption, and the Ad Experience Report flags violations of the Better Ads Standards: full-page interstitials triggered on load, sticky ads that cover more than 30% of screen real estate, and auto-playing video ads without user initiation. Beyond the algorithmic penalty risk, obtrusive ads directly suppress dwell time and elevate bounce rates, compounding the ranking suppression across behavioral and technical signals simultaneously.

Audit ad placement using Google's Ad Experience Report and the "Opportunities" section of PageSpeed Insights. Relocate intrusive ad placements to non-content-blocking positions and verify compliance via a re-review in the Ad Experience Report.

Content Audit Action Framework

The output of a content audit is not a list of issues — it is a prioritized action matrix. Assign each audited page one of five statuses:

Keep: High-performing pages with strong engagement, current content, and correct optimization. Monitor quarterly.

Update: Pages with good topical authority but stale statistics, missing semantic coverage, or weak on-page signals. Refresh and republish with a clear “last updated” timestamp.

Consolidate: Duplicate or near-duplicate pages competing for the same intent. Merge into one canonical URL; 301 redirect the weaker variants.

Rewrite: Pages with correct topic targeting but fundamentally misaligned content — wrong format, wrong intent match, insufficient depth. Treat as new content creation.

Remove: Pages with zero traffic, no backlinks, no business purpose, and no recovery pathway. Delete and 301 redirect if the URL has any inbound link equity.
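The five-status framework above can be encoded as a first-pass classifier over each page's audit record. Real triage needs human review, so treat this as a queue-builder; every field name below is a hypothetical audit attribute, not a tool's API:

```python
def triage_status(page):
    """Assign a first-pass audit status from a page's audit record.

    Fields (all hypothetical): in_duplicate_cluster, monthly_traffic,
    backlinks, has_business_purpose, wrong_intent_or_format, is_stale.
    """
    if page["in_duplicate_cluster"]:
        return "Consolidate"
    if (page["monthly_traffic"] == 0 and page["backlinks"] == 0
            and not page["has_business_purpose"]):
        return "Remove"
    if page["wrong_intent_or_format"]:
        return "Rewrite"
    if page["is_stale"]:
        return "Update"
    return "Keep"
```

The branch order matters: consolidation and removal decisions pre-empt content rework, mirroring the priority logic of the framework.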

Frequently Asked Questions

Q: How often should I run a content audit? Quarterly audits are appropriate for most sites under 1,000 pages. Larger sites or those in highly competitive niches benefit from rolling monthly audits segmented by content type. Run an immediate audit after any major Google core update — the December 2025 update, for example, disproportionately affected sites with thin content and weak E-E-A-T signals.

Q: What tools do I need to run a content audit? The minimum viable stack is Screaming Frog (technical crawl), Google Search Console (indexation and CTR data), and GA4 (engagement and behavioral data). For sites above 10,000 pages, add Semrush or Ahrefs for bulk on-page analysis, cannibalization detection, and backlink data per URL.

Q: How do I prioritize which content issues to fix first? Prioritize by the intersection of traffic impact and fix effort. Striking-distance pages (positions 6–15 with 500+ monthly impressions) deliver fast ranking gains from small optimizations. High-traffic, high-bounce pages deliver fast engagement improvements from intent and design fixes. Duplicate content consolidations protect existing equity. Technical rendering issues (JavaScript, iFrames) should be fixed early as they can suppress entire page categories silently.

Q: Does deleting pages hurt SEO? Removing genuinely low-quality pages with no traffic and no backlinks typically improves domain-level quality signals over time by concentrating crawl budget on higher-value URLs. Always apply a 301 redirect from the deleted URL to the most topically relevant live page — do not return 404s on URLs that have any inbound link equity.

Q: What’s the difference between a content audit and a technical SEO audit? A technical SEO audit focuses on crawlability, indexation, Core Web Vitals, and structured data — the infrastructure layer. A content audit focuses on the quality, relevance, optimization, and behavioral performance of the pages themselves. Both are necessary and overlap in areas like rendering (JavaScript, iFrames) and on-page signals (H1s, heading structure). Run them in sequence: technical audit first to ensure your content can be correctly crawled and indexed, then content audit to optimize what’s visible.

Next Steps

If you’ve identified issues across several of these categories, start with the highest-impact, lowest-effort fixes: missing H1 tags, striking-distance page title optimizations, and internal duplicate consolidations. These three changes alone, applied systematically, tend to generate measurable ranking movement within 30 to 60 days.

For a deeper framework on topical authority and how your content audit findings connect to cluster-level optimization, explore our guides on information architecture and semantic SEO. A content audit without a content strategy to feed its findings into is a one-time exercise — the compounding organic equity comes from treating it as a recurring system.

About the author

SEO Strategist with 16 years of experience