CLAUDE.md

Overview

Technical SEO auditor for crawlability fundamentals: robots.txt validation, XML sitemap analysis, and URL accessibility checking.

Quick Start

# Install dependencies
pip install -r scripts/requirements.txt

# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com

# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Async URL crawl (check accessibility of the URLs listed in the sitemap)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml

Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| robots_checker.py | Parse and validate robots.txt | User-agent rules, disallow patterns, sitemap declarations |
| sitemap_validator.py | Validate XML sitemap structure | URL count, lastmod dates, size limits, syntax errors |
| sitemap_crawler.py | Async check URL accessibility | HTTP status codes, response times, broken links |
| base_client.py | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

Robots.txt Checker

# Basic analysis
python scripts/robots_checker.py --url https://example.com

# Test specific URL against rules
python scripts/robots_checker.py --url https://example.com --test-url /admin/

# Output JSON
python scripts/robots_checker.py --url https://example.com --json

Checks performed:

  • Syntax validation
  • User-agent rule parsing
  • Disallow/Allow pattern analysis
  • Sitemap declarations
  • Critical resource access (CSS/JS/images)
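The rule parsing and URL testing listed above can be sketched with the standard library alone. This is a minimal illustration, not the code in robots_checker.py: the `analyze_robots` helper and the sample robots.txt content are hypothetical, and Python's `RobotFileParser` applies first-match rule ordering, which may differ from how a given crawler resolves Allow/Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a live fetch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /assets/
Sitemap: https://example.com/sitemap.xml
"""


def analyze_robots(robots_txt: str, test_paths: list[str], user_agent: str = "*") -> dict:
    """Parse robots.txt text; report sitemap declarations and per-path access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        # site_maps() returns None when no Sitemap line is declared.
        "sitemaps": parser.site_maps() or [],
        "access": {path: parser.can_fetch(user_agent, path) for path in test_paths},
    }


report = analyze_robots(ROBOTS_TXT, ["/admin/", "/assets/site.css", "/blog/post"])
print(report)
```

An empty `sitemaps` list would correspond to the "Missing sitemap declaration" issue, and a `False` entry for a CSS or JS path to the "Blocking CSS/JS resources" issue.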

Sitemap Validator

# Validate sitemap
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Include sitemap index parsing
python scripts/sitemap_validator.py --url https://example.com/sitemap_index.xml --follow-index

Validation rules:

  • XML syntax correctness
  • URL count limit (50,000 max per sitemap)
  • File size limit (50 MB max, uncompressed)
  • Lastmod date format validation
  • Sitemap index structure
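As a rough sketch of how these rules might be checked, the example below parses a sitemap with `xml.etree.ElementTree` and collects issues. It is not the implementation in sitemap_validator.py: `validate_sitemap` and `is_valid_lastmod` are hypothetical names, the issue dictionaries simply mirror the JSON shape described under Output Format, and the lastmod check is a simplification that rejects the shorter W3C forms such as `YYYY-MM`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, uncompressed


def is_valid_lastmod(value: str) -> bool:
    """Accept YYYY-MM-DD or a full ISO 8601 timestamp (simplified check)."""
    try:
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def validate_sitemap(xml_bytes: bytes) -> list[dict]:
    """Return a list of issue dicts for a single (non-index) sitemap."""
    issues = []
    if len(xml_bytes) > MAX_BYTES:
        issues.append({"type": "error", "message": "File exceeds 50 MB", "location": "file"})
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        issues.append({"type": "error", "message": f"XML syntax error: {exc}", "location": "file"})
        return issues

    urls = root.findall(f"{SITEMAP_NS}url")
    if len(urls) > MAX_URLS:
        issues.append({"type": "error", "message": "More than 50,000 URLs", "location": "file"})

    for url in urls:
        loc = url.findtext(f"{SITEMAP_NS}loc", default="").strip()
        lastmod = url.findtext(f"{SITEMAP_NS}lastmod")
        if lastmod is None:
            issues.append({"type": "info", "message": "Missing lastmod", "location": loc})
        elif not is_valid_lastmod(lastmod.strip()):
            issues.append({"type": "warning", "message": "Invalid lastmod format", "location": loc})
    return issues


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/a</loc><lastmod>15/01/2024</lastmod></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>"""

for issue in validate_sitemap(sample):
    print(issue)
```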

Sitemap Crawler

# Crawl all URLs in sitemap
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml

# Limit concurrent requests
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --concurrency 10

# Sample mode (check subset)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --sample 100

Output includes:

  • HTTP status codes per URL
  • Response times
  • Redirect chains
  • Broken links (4xx, 5xx)
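The effect of `--concurrency` can be pictured as a semaphore that bounds how many URL checks run at once. The sketch below assumes nothing about sitemap_crawler.py itself: `check_urls` is a hypothetical helper, and `fake_fetch` stands in for a real HTTP client so the example runs without network access.

```python
import asyncio
import time


async def check_urls(urls, fetch_status, concurrency=10):
    """Check URLs with at most `concurrency` requests in flight at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def check_one(url):
        async with semaphore:
            started = time.perf_counter()
            status = await fetch_status(url)
            elapsed = time.perf_counter() - started
        return {
            "url": url,
            "status": status,
            "response_time": elapsed,
            "broken": status >= 400,  # 4xx and 5xx count as broken
        }

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(check_one(u) for u in urls))


async def fake_fetch(url):
    """Stand-in for an HTTP request; a real client call would go here."""
    await asyncio.sleep(0)
    return 404 if url.endswith("/missing") else 200


results = asyncio.run(
    check_urls(
        ["https://example.com/", "https://example.com/missing"],
        fake_fetch,
        concurrency=2,
    )
)
for row in results:
    print(row["url"], row["status"], "broken" if row["broken"] else "ok")
```

Passing the fetcher in as a parameter keeps the concurrency logic testable in isolation; redirect-chain tracking is left out of this sketch.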

Output Format

All scripts support the --json flag for structured output:

{
  "url": "https://example.com",
  "status": "valid|invalid|warning",
  "issues": [
    {
      "type": "error|warning|info",
      "message": "Description",
      "location": "Line or URL"
    }
  ],
  "summary": {}
}
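To make the schema concrete, here is one way such a report could be assembled. The `build_report` helper is hypothetical, and two details are assumptions rather than documented behavior: that the top-level status follows the most severe issue type, and that `summary` holds per-type counts (the schema above leaves `summary` unspecified).

```python
import json


def build_report(url: str, issues: list[dict]) -> dict:
    """Assemble a report dict in the shape of the documented --json output."""
    types = {issue["type"] for issue in issues}
    # Assumption: the overall status follows the most severe issue type.
    if "error" in types:
        status = "invalid"
    elif "warning" in types:
        status = "warning"
    else:
        status = "valid"
    # Assumption: summary holds per-type counts.
    summary = {
        t: sum(1 for issue in issues if issue["type"] == t)
        for t in ("error", "warning", "info")
    }
    return {"url": url, "status": status, "issues": issues, "summary": summary}


example = build_report(
    "https://example.com",
    [{"type": "warning", "message": "Missing sitemap declaration", "location": "robots.txt"}],
)
print(json.dumps(example, indent=2))
```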

Common Issues Detected

| Category | Issue | Severity |
|----------|-------|----------|
| Robots.txt | Missing sitemap declaration | Medium |
| Robots.txt | Blocking CSS/JS resources | High |
| Robots.txt | Overly broad disallow rules | Medium |
| Sitemap | URLs returning 404 | High |
| Sitemap | Missing lastmod dates | Low |
| Sitemap | Exceeds 50,000 URL limit | High |
| Sitemap | Non-canonical URLs included | Medium |

Configuration

Environment variables (optional):

# Rate limiting
CRAWL_DELAY=1.0          # Seconds between requests
MAX_CONCURRENT=20        # Async concurrency limit
REQUEST_TIMEOUT=30       # Request timeout seconds
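A script could read these variables along the following lines. This is only a sketch: the actual `ConfigManager` in base_client.py is not shown here, so `CrawlSettings` and `load_settings` are hypothetical names, and treating the values above as defaults when a variable is unset is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlSettings:
    crawl_delay: float      # seconds between requests
    max_concurrent: int     # async concurrency limit
    request_timeout: float  # request timeout in seconds


def load_settings(env=None) -> CrawlSettings:
    """Read crawl settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    # Assumption: the documented example values double as defaults.
    return CrawlSettings(
        crawl_delay=float(env.get("CRAWL_DELAY", 1.0)),
        max_concurrent=int(env.get("MAX_CONCURRENT", 20)),
        request_timeout=float(env.get("REQUEST_TIMEOUT", 30)),
    )


settings = load_settings({"CRAWL_DELAY": "0.5"})
print(settings)
```

Accepting a plain mapping makes the loader easy to exercise without touching the real process environment.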

Notion Output (Required)

IMPORTANT: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

Database Configuration

| Field | Value |
|-------|-------|
| Database ID | 2c8581e5-8a1e-8035-880b-e38cefc2f3ef |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Technical SEO, On-page SEO, Performance, Schema/Structured Data, Sitemap, Robots.txt, Content, Local SEO |
| Priority | Select | Critical, High, Medium, Low |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: [TYPE]-YYYYMMDD-NNN |
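For orientation, the properties above might map onto a Notion page payload roughly as follows. The property value shapes follow the public Notion API; everything else is illustrative: `make_audit_id` and `build_properties` are hypothetical helpers, the `ROBOTS` type code and the sample title are made up, and the sketch assumes the brackets in `[TYPE]` mark a placeholder rather than literal characters.

```python
from datetime import date


def make_audit_id(audit_type: str, found: date, sequence: int) -> str:
    """Format an ID as TYPE-YYYYMMDD-NNN (brackets assumed to be placeholder notation)."""
    return f"{audit_type}-{found:%Y%m%d}-{sequence:03d}"


def build_properties(title, site_url, category, priority, found: date, audit_id) -> dict:
    """Build the `properties` object for a page in the audit log database."""
    return {
        "Issue": {"title": [{"text": {"content": title}}]},
        "Site": {"url": site_url},
        "Category": {"select": {"name": category}},
        "Priority": {"select": {"name": priority}},
        "Found Date": {"date": {"start": found.isoformat()}},
        "Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
    }


found = date(2026, 2, 13)
audit_id = make_audit_id("ROBOTS", found, 1)  # "ROBOTS" is a hypothetical type code
properties = build_properties(
    title="example.com Robots.txt 점검 (2026-02-13)",  # hypothetical Korean title + date
    site_url="https://example.com",
    category="Robots.txt",
    priority="High",
    found=found,
    audit_id=audit_id,
)
print(audit_id)
```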

Language Guidelines

  • Report content in Korean (한국어)
  • Keep technical English terms as-is (e.g., SEO Audit, Core Web Vitals, Schema Markup)
  • URLs and code remain unchanged

Example MCP Call

mcp-cli call notion/API-post-page '{"parent": {"database_id": "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"}, "properties": {...}}'