# CLAUDE.md

## Overview
Technical SEO auditor for crawlability fundamentals: robots.txt validation, XML sitemap analysis, and URL accessibility checking.
## Quick Start

```bash
# Install dependencies
pip install -r scripts/requirements.txt

# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com

# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Async URL crawl (check sitemap URL accessibility)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
```
## Scripts

| Script | Purpose | Key Output |
|---|---|---|
| `robots_checker.py` | Parse and validate robots.txt | User-agent rules, disallow patterns, sitemap declarations |
| `sitemap_validator.py` | Validate XML sitemap structure | URL count, lastmod dates, size limits, syntax errors |
| `sitemap_crawler.py` | Async check of URL accessibility | HTTP status codes, response times, broken links |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |
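`base_client.py` provides the shared async plumbing. As a rough illustration of the RateLimiter's role, here is a minimal delay-based sketch — an assumption about the design, not the actual implementation:

```python
import asyncio
import time

class RateLimiter:
    """Sketch: enforce a minimum delay between requests.
    Hypothetical -- the real base_client.py implementation may differ."""

    def __init__(self, delay: float = 1.0):
        self.delay = delay
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # Serialize callers, then sleep out whatever is left of the delay
        async with self._lock:
            remaining = self.delay - (time.monotonic() - self._last)
            if remaining > 0:
                await asyncio.sleep(remaining)
            self._last = time.monotonic()

async def main():
    limiter = RateLimiter(delay=0.1)
    start = time.monotonic()
    for _ in range(3):
        await limiter.wait()
    print(f"3 throttled calls took {time.monotonic() - start:.2f}s")

asyncio.run(main())
```

The lock makes the limiter safe to share across concurrent tasks: each task queues up and inherits the delay from the previous request, whichever task made it.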
## Robots.txt Checker

```bash
# Basic analysis
python scripts/robots_checker.py --url https://example.com

# Test specific URL against rules
python scripts/robots_checker.py --url https://example.com --test-url /admin/

# Output JSON
python scripts/robots_checker.py --url https://example.com --json
```
Checks performed:
- Syntax validation
- User-agent rule parsing
- Disallow/Allow pattern analysis
- Sitemap declarations
- Critical resource access (CSS/JS/images)
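A rule test like `--test-url` can be approximated with the standard library's `urllib.robotparser` (the script's own parser presumably reports richer diagnostics). The robots.txt content below is a hypothetical sample:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Equivalent of --test-url: check paths against the parsed rules
for path in ("/admin/secret", "/blog/post"):
    verdict = "allowed" if rp.can_fetch("*", path) else "disallowed"
    print(f"{path}: {verdict}")

# Sitemap declarations (Python 3.8+)
print("sitemaps:", rp.site_maps())
```

Note that `urllib.robotparser` only answers allow/disallow questions; syntax validation and critical-resource checks need the custom logic in `robots_checker.py`.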
## Sitemap Validator

```bash
# Validate sitemap
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Include sitemap index parsing
python scripts/sitemap_validator.py --url https://example.com/sitemap_index.xml --follow-index
```
Validation rules:
- XML syntax correctness
- URL count limit (50,000 max per sitemap)
- File size limit (50MB max uncompressed)
- Lastmod date format validation
- Sitemap index structure
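For illustration, a subset of these rules can be checked with nothing more than `xml.etree`. This sketch is an assumption about the approach, not the validator's actual code, and it uses `datetime.fromisoformat` as a rough stand-in for full W3C datetime parsing:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # protocol limit per sitemap file

# Hypothetical sitemap with one valid and one malformed lastmod
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/about</loc><lastmod>not-a-date</lastmod></url>
</urlset>"""

def validate_sitemap(xml_text):
    """Return a list of issue messages (illustrative subset of checks)."""
    issues = []
    root = ET.fromstring(xml_text)  # raises ParseError on invalid XML
    urls = root.findall(f"{NS}url")
    if len(urls) > MAX_URLS:
        issues.append(f"URL count {len(urls)} exceeds {MAX_URLS} limit")
    for url in urls:
        lastmod = url.find(f"{NS}lastmod")
        if lastmod is None:
            issues.append(f"missing lastmod: {url.find(f'{NS}loc').text}")
            continue
        try:
            datetime.fromisoformat(lastmod.text)
        except ValueError:
            issues.append(f"bad lastmod {lastmod.text!r}")
    return issues

print(validate_sitemap(SITEMAP_XML))
```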
## Sitemap Crawler

```bash
# Crawl all URLs in sitemap
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml

# Limit concurrent requests
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --concurrency 10

# Sample mode (check subset)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --sample 100
```
Output includes:
- HTTP status codes per URL
- Response times
- Redirect chains
- Broken links (4xx, 5xx)
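A minimal sketch of the crawl loop, assuming semaphore-bounded concurrency and stdlib HTTP (the real script may use a dedicated async HTTP client); the `--concurrency` flag corresponds to the semaphore size here:

```python
import asyncio
import urllib.error
import urllib.request

def is_broken(status):
    """4xx/5xx responses (or no response at all) count as broken links."""
    return status is None or status >= 400

def check_url(url, timeout=30.0):
    """HEAD one URL; return its status and a broken/ok classification."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as e:
        status = e.code          # 4xx/5xx still carry a status code
    except urllib.error.URLError:
        status = None            # DNS failure, refused connection, timeout
    return {"url": url, "status": status, "broken": is_broken(status)}

async def crawl(urls, concurrency=10):
    """Check all URLs, bounding in-flight requests with a semaphore."""
    sem = asyncio.Semaphore(concurrency)
    async def bounded(url):
        async with sem:
            return await asyncio.to_thread(check_url, url)
    return await asyncio.gather(*(bounded(u) for u in urls))
```

HEAD requests avoid downloading page bodies, which keeps a 50,000-URL crawl cheap; redirect chains and response-time tracking would layer on top of this skeleton.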
## Output Format

All scripts support a `--json` flag for structured output:

```json
{
  "url": "https://example.com",
  "status": "valid|invalid|warning",
  "issues": [
    {
      "type": "error|warning|info",
      "message": "Description",
      "location": "Line or URL"
    }
  ],
  "summary": {}
}
```
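A sketch of how a script might assemble this shape. Deriving `status` from the worst issue type, and the `summary` counters, are assumptions — the exact rules aren't documented here:

```python
import json

def make_report(url, issues):
    """Assemble the shared report shape (status derivation is assumed)."""
    types = {i["type"] for i in issues}
    if "error" in types:
        status = "invalid"
    elif "warning" in types:
        status = "warning"
    else:
        status = "valid"
    return {
        "url": url,
        "status": status,
        "issues": issues,
        "summary": {
            "errors": sum(i["type"] == "error" for i in issues),
            "warnings": sum(i["type"] == "warning" for i in issues),
        },
    }

report = make_report("https://example.com", [
    {"type": "warning", "message": "Missing sitemap declaration",
     "location": "robots.txt"},
])
print(json.dumps(report, indent=2))
```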
## Common Issues Detected
| Category | Issue | Severity |
|---|---|---|
| Robots.txt | Missing sitemap declaration | Medium |
| Robots.txt | Blocking CSS/JS resources | High |
| Robots.txt | Overly broad disallow rules | Medium |
| Sitemap | URLs returning 404 | High |
| Sitemap | Missing lastmod dates | Low |
| Sitemap | Exceeds 50,000 URL limit | High |
| Sitemap | Non-canonical URLs included | Medium |
## Configuration

Environment variables (optional):

```bash
# Rate limiting
CRAWL_DELAY=1.0     # Seconds between requests
MAX_CONCURRENT=20   # Async concurrency limit
REQUEST_TIMEOUT=30  # Request timeout in seconds
```
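One way the shared ConfigManager might read these — the variable names and defaults come from the block above, but the helper itself is illustrative:

```python
import os

def load_config():
    """Read the optional tuning knobs, falling back to documented defaults."""
    return {
        "crawl_delay": float(os.getenv("CRAWL_DELAY", "1.0")),
        "max_concurrent": int(os.getenv("MAX_CONCURRENT", "20")),
        "request_timeout": float(os.getenv("REQUEST_TIMEOUT", "30")),
    }

print(load_config())
```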