
# CLAUDE.md

## Overview

Technical SEO auditor for crawlability fundamentals: robots.txt validation, XML sitemap analysis, and URL accessibility checking.

## Quick Start

```bash
# Install dependencies
pip install -r scripts/requirements.txt

# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com

# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Async URL crawl (check accessibility of sitemap URLs)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
```

## Scripts

| Script | Purpose | Key Output |
|---|---|---|
| `robots_checker.py` | Parse and validate robots.txt | User-agent rules, disallow patterns, sitemap declarations |
| `sitemap_validator.py` | Validate XML sitemap structure | URL count, lastmod dates, size limits, syntax errors |
| `sitemap_crawler.py` | Async URL accessibility check | HTTP status codes, response times, broken links |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |
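The shared module is what enforces polite crawling across the async scripts. Its actual interface isn't documented here; as a rough illustration, a rate limiter of the kind `base_client.py` bundles might look like this (hypothetical interface; see the module for the real one):

```python
import asyncio
import time

class RateLimiter:
    """Minimal async rate limiter, akin to what base_client.py provides
    (hypothetical interface; the real class may differ)."""

    def __init__(self, delay: float = 1.0):
        self.delay = delay          # minimum seconds between requests
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # Serialize callers so the delay holds across concurrent tasks
        async with self._lock:
            elapsed = time.monotonic() - self._last
            if elapsed < self.delay:
                await asyncio.sleep(self.delay - elapsed)
            self._last = time.monotonic()
```

Each request coroutine would `await limiter.wait()` before fetching, so a crawl delay (see Configuration below) is honored even under high concurrency.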

## Robots.txt Checker

```bash
# Basic analysis
python scripts/robots_checker.py --url https://example.com

# Test specific URL against rules
python scripts/robots_checker.py --url https://example.com --test-url /admin/

# Output JSON
python scripts/robots_checker.py --url https://example.com --json
```

Checks performed:

- Syntax validation
- User-agent rule parsing
- Disallow/Allow pattern analysis
- Sitemap declarations
- Critical resource access (CSS/JS/images)
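For reference, the core rule test behind `--test-url` can be reproduced with Python's standard `urllib.robotparser`. `robots_checker.py` goes further (syntax and pattern analysis), but a minimal sketch of the essential check looks like this:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Test a specific path against the rules for a given user agent
print(rp.can_fetch("Googlebot", "https://example.com/admin/"))  # True/False

# Sitemap declarations, if any (Python 3.8+)
print(rp.site_maps())
```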

## Sitemap Validator

```bash
# Validate sitemap
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Include sitemap index parsing
python scripts/sitemap_validator.py --url https://example.com/sitemap_index.xml --follow-index
```

Validation rules:

- XML syntax correctness
- URL count limit (50,000 max per sitemap)
- File size limit (50 MB max uncompressed)
- Lastmod date format validation
- Sitemap index structure
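A condensed sketch of the URL-count and lastmod checks, using only the standard library (the real script also validates syntax, file size, and index structure):

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
LASTMOD = re.compile(r"^\d{4}-\d{2}-\d{2}")  # W3C Datetime starts with YYYY-MM-DD

with urllib.request.urlopen("https://example.com/sitemap.xml") as resp:
    tree = ET.parse(resp)

urls = tree.findall("sm:url", NS)
if len(urls) > 50_000:
    print(f"ERROR: {len(urls)} URLs exceeds the 50,000-per-sitemap limit")

for url in urls:
    lastmod = url.findtext("sm:lastmod", default="", namespaces=NS)
    if lastmod and not LASTMOD.match(lastmod):
        print(f"WARNING: malformed lastmod {lastmod!r}")
```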

## Sitemap Crawler

```bash
# Crawl all URLs in sitemap
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml

# Limit concurrent requests
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --concurrency 10

# Sample mode (check subset)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --sample 100
```

Output includes:

- HTTP status codes per URL
- Response times
- Redirect chains
- Broken links (4xx, 5xx)
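Under the hood this is a bounded concurrent fetch. A minimal sketch of the pattern with `aiohttp` (assuming that is the HTTP client pulled in by `requirements.txt`; the script's actual implementation may differ):

```python
import asyncio
import aiohttp

async def check(session, sem, url):
    # The semaphore bounds in-flight requests, mirroring --concurrency
    async with sem:
        try:
            async with session.head(url, allow_redirects=True) as resp:
                return url, resp.status
        except aiohttp.ClientError as exc:
            return url, f"error: {exc}"

async def crawl(urls, concurrency=10):
    sem = asyncio.Semaphore(concurrency)
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        return await asyncio.gather(*(check(session, sem, u) for u in urls))

results = asyncio.run(crawl(["https://example.com/", "https://example.com/missing"]))
for url, status in results:
    print(status, url)
```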

## Output Format

All scripts support a `--json` flag for structured output:

```json
{
  "url": "https://example.com",
  "status": "valid|invalid|warning",
  "issues": [
    {
      "type": "error|warning|info",
      "message": "Description",
      "location": "Line or URL"
    }
  ],
  "summary": {}
}
```
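This makes the checkers easy to drive from other tooling. For example, assuming the JSON report is written to stdout, errors can be filtered like so:

```python
import json
import subprocess

# Run a checker in JSON mode and surface only the errors
# (paths assume the skill's code/ directory as the working directory)
proc = subprocess.run(
    ["python", "scripts/robots_checker.py", "--url", "https://example.com", "--json"],
    capture_output=True, text=True, check=True,
)
report = json.loads(proc.stdout)
errors = [i for i in report["issues"] if i["type"] == "error"]
print(f"{report['status']}: {len(errors)} error(s)")
```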

## Common Issues Detected

| Category | Issue | Severity |
|---|---|---|
| Robots.txt | Missing sitemap declaration | Medium |
| Robots.txt | Blocking CSS/JS resources | High |
| Robots.txt | Overly broad disallow rules | Medium |
| Sitemap | URLs returning 404 | High |
| Sitemap | Missing lastmod dates | Low |
| Sitemap | Exceeds 50,000 URL limit | High |
| Sitemap | Non-canonical URLs included | Medium |

## Configuration

Environment variables (optional):

```bash
# Rate limiting
CRAWL_DELAY=1.0          # Seconds between requests
MAX_CONCURRENT=20        # Async concurrency limit
REQUEST_TIMEOUT=30       # Request timeout in seconds
```
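Inside the scripts these would typically be read with the documented defaults, presumably via `ConfigManager`; a sketch of the lookup:

```python
import os

# Optional environment overrides; defaults mirror the documented values
CRAWL_DELAY = float(os.environ.get("CRAWL_DELAY", "1.0"))
MAX_CONCURRENT = int(os.environ.get("MAX_CONCURRENT", "20"))
REQUEST_TIMEOUT = int(os.environ.get("REQUEST_TIMEOUT", "30"))
```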