# CLAUDE.md

## Overview
Technical SEO auditor for crawlability fundamentals: robots.txt validation, XML sitemap analysis, and URL accessibility checking.
## Quick Start

```bash
# Install dependencies
pip install -r scripts/requirements.txt

# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com

# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Async URL crawl (check sitemap URL accessibility)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
```
## Scripts

| Script | Purpose | Key Output |
|---|---|---|
| `robots_checker.py` | Parse and validate robots.txt | User-agent rules, disallow patterns, sitemap declarations |
| `sitemap_validator.py` | Validate XML sitemap structure | URL count, lastmod dates, size limits, syntax errors |
| `sitemap_crawler.py` | Async check of URL accessibility | HTTP status codes, response times, broken links |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |
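`base_client.py` provides the shared async plumbing. As a rough illustration of the RateLimiter's role, here is a minimal delay-based sketch — an assumption about the design, not the actual implementation:

```python
import asyncio
import time

class RateLimiter:
    """Sketch: enforce a minimum delay between requests.
    Hypothetical -- the real base_client.py implementation may differ."""

    def __init__(self, delay: float = 1.0):
        self.delay = delay
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # Serialize callers, then sleep out whatever is left of the delay
        async with self._lock:
            remaining = self.delay - (time.monotonic() - self._last)
            if remaining > 0:
                await asyncio.sleep(remaining)
            self._last = time.monotonic()

async def main():
    limiter = RateLimiter(delay=0.1)
    start = time.monotonic()
    for _ in range(3):
        await limiter.wait()
    print(f"3 throttled calls took {time.monotonic() - start:.2f}s")

asyncio.run(main())
```

The lock makes the limiter safe to share across concurrent tasks: each task queues up and inherits the delay from the previous request, whichever task made it.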
## Robots.txt Checker

```bash
# Basic analysis
python scripts/robots_checker.py --url https://example.com

# Test specific URL against rules
python scripts/robots_checker.py --url https://example.com --test-url /admin/

# Output JSON
python scripts/robots_checker.py --url https://example.com --json
```
Checks performed:
- Syntax validation
- User-agent rule parsing
- Disallow/Allow pattern analysis
- Sitemap declarations
- Critical resource access (CSS/JS/images)
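A rule test like `--test-url` can be approximated with the standard library's `urllib.robotparser` (the script's own parser presumably reports richer diagnostics). The robots.txt content below is a hypothetical sample:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Equivalent of --test-url: check paths against the parsed rules
for path in ("/admin/secret", "/blog/post"):
    verdict = "allowed" if rp.can_fetch("*", path) else "disallowed"
    print(f"{path}: {verdict}")

# Sitemap declarations (Python 3.8+)
print("sitemaps:", rp.site_maps())
```

Note that `urllib.robotparser` only answers allow/disallow questions; syntax validation and critical-resource checks need the custom logic in `robots_checker.py`.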
## Sitemap Validator

```bash
# Validate sitemap
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Include sitemap index parsing
python scripts/sitemap_validator.py --url https://example.com/sitemap_index.xml --follow-index
```
Validation rules:
- XML syntax correctness
- URL count limit (50,000 max per sitemap)
- File size limit (50MB max uncompressed)
- Lastmod date format validation
- Sitemap index structure
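For illustration, a subset of these rules can be checked with nothing more than `xml.etree`. This sketch is an assumption about the approach, not the validator's actual code, and it uses `datetime.fromisoformat` as a rough stand-in for full W3C datetime parsing:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # protocol limit per sitemap file

# Hypothetical sitemap with one valid and one malformed lastmod
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/about</loc><lastmod>not-a-date</lastmod></url>
</urlset>"""

def validate_sitemap(xml_text):
    """Return a list of issue messages (illustrative subset of checks)."""
    issues = []
    root = ET.fromstring(xml_text)  # raises ParseError on invalid XML
    urls = root.findall(f"{NS}url")
    if len(urls) > MAX_URLS:
        issues.append(f"URL count {len(urls)} exceeds {MAX_URLS} limit")
    for url in urls:
        lastmod = url.find(f"{NS}lastmod")
        if lastmod is None:
            issues.append(f"missing lastmod: {url.find(f'{NS}loc').text}")
            continue
        try:
            datetime.fromisoformat(lastmod.text)
        except ValueError:
            issues.append(f"bad lastmod {lastmod.text!r}")
    return issues

print(validate_sitemap(SITEMAP_XML))
```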
## Sitemap Crawler

```bash
# Crawl all URLs in sitemap
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml

# Limit concurrent requests
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --concurrency 10

# Sample mode (check subset)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --sample 100
```
Output includes:
- HTTP status codes per URL
- Response times
- Redirect chains
- Broken links (4xx, 5xx)
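A minimal sketch of the crawl loop, assuming semaphore-bounded concurrency and stdlib HTTP (the real script may use a dedicated async HTTP client); the `--concurrency` flag corresponds to the semaphore size here:

```python
import asyncio
import urllib.error
import urllib.request

def is_broken(status):
    """4xx/5xx responses (or no response at all) count as broken links."""
    return status is None or status >= 400

def check_url(url, timeout=30.0):
    """HEAD one URL; return its status and a broken/ok classification."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as e:
        status = e.code          # 4xx/5xx still carry a status code
    except urllib.error.URLError:
        status = None            # DNS failure, refused connection, timeout
    return {"url": url, "status": status, "broken": is_broken(status)}

async def crawl(urls, concurrency=10):
    """Check all URLs, bounding in-flight requests with a semaphore."""
    sem = asyncio.Semaphore(concurrency)
    async def bounded(url):
        async with sem:
            return await asyncio.to_thread(check_url, url)
    return await asyncio.gather(*(bounded(u) for u in urls))
```

HEAD requests avoid downloading page bodies, which keeps a 50,000-URL crawl cheap; redirect chains and response-time tracking would layer on top of this skeleton.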
## Output Format

All scripts support a `--json` flag for structured output:

```json
{
  "url": "https://example.com",
  "status": "valid|invalid|warning",
  "issues": [
    {
      "type": "error|warning|info",
      "message": "Description",
      "location": "Line or URL"
    }
  ],
  "summary": {}
}
```
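A sketch of how a script might assemble this shape. Deriving `status` from the worst issue type, and the `summary` counters, are assumptions — the exact rules aren't documented here:

```python
import json

def make_report(url, issues):
    """Assemble the shared report shape (status derivation is assumed)."""
    types = {i["type"] for i in issues}
    if "error" in types:
        status = "invalid"
    elif "warning" in types:
        status = "warning"
    else:
        status = "valid"
    return {
        "url": url,
        "status": status,
        "issues": issues,
        "summary": {
            "errors": sum(i["type"] == "error" for i in issues),
            "warnings": sum(i["type"] == "warning" for i in issues),
        },
    }

report = make_report("https://example.com", [
    {"type": "warning", "message": "Missing sitemap declaration",
     "location": "robots.txt"},
])
print(json.dumps(report, indent=2))
```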
## Common Issues Detected
| Category | Issue | Severity |
|---|---|---|
| Robots.txt | Missing sitemap declaration | Medium |
| Robots.txt | Blocking CSS/JS resources | High |
| Robots.txt | Overly broad disallow rules | Medium |
| Sitemap | URLs returning 404 | High |
| Sitemap | Missing lastmod dates | Low |
| Sitemap | Exceeds 50,000 URL limit | High |
| Sitemap | Non-canonical URLs included | Medium |
## Configuration

Environment variables (optional):

```bash
# Rate limiting
CRAWL_DELAY=1.0     # Seconds between requests
MAX_CONCURRENT=20   # Async concurrency limit
REQUEST_TIMEOUT=30  # Request timeout in seconds
```
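One way the shared ConfigManager might read these — the variable names and defaults come from the block above, but the helper itself is illustrative:

```python
import os

def load_config():
    """Read the optional tuning knobs, falling back to documented defaults."""
    return {
        "crawl_delay": float(os.getenv("CRAWL_DELAY", "1.0")),
        "max_concurrent": int(os.getenv("MAX_CONCURRENT", "20")),
        "request_timeout": float(os.getenv("REQUEST_TIMEOUT", "30")),
    }

print(load_config())
```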