Files
Andrew Yim 7c5efea817 Restructure skill numbering: SEO 11-30, GTM 60-69, reserve 19-28 for future skills
Renumber 12 existing skills to new ranges:
- SEO: 11→13, 12→18, 13→16, 14→17, 15→14, 16→15, 17→29, 18→30, 19→12
- GTM: 20→60, 21→61, 22→62

Update cross-references in gateway architect/builder skills, GTM guardian
README, CLAUDE.md (skill tables + directory layout), and AGENTS.md
(domain routing ranges).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 02:37:39 +09:00

163 lines
4.5 KiB
Markdown

# CLAUDE.md
## Overview
Technical SEO auditor for crawlability fundamentals: robots.txt validation, XML sitemap analysis, and URL accessibility checking.
## Quick Start
```bash
# Install dependencies
pip install -r scripts/requirements.txt
# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com
# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml
# Async URL crawl (check sitemap URLs accessibility)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
```
## Scripts
| Script | Purpose | Key Output |
|--------|---------|------------|
| `robots_checker.py` | Parse and validate robots.txt | User-agent rules, disallow patterns, sitemap declarations |
| `sitemap_validator.py` | Validate XML sitemap structure | URL count, lastmod dates, size limits, syntax errors |
| `sitemap_crawler.py` | Async check URL accessibility | HTTP status codes, response times, broken links |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |
## Robots.txt Checker
```bash
# Basic analysis
python scripts/robots_checker.py --url https://example.com
# Test specific URL against rules
python scripts/robots_checker.py --url https://example.com --test-url /admin/
# Output JSON
python scripts/robots_checker.py --url https://example.com --json
```
**Checks performed**:
- Syntax validation
- User-agent rule parsing
- Disallow/Allow pattern analysis
- Sitemap declarations
- Critical resource access (CSS/JS/images)
## Sitemap Validator
```bash
# Validate sitemap
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml
# Include sitemap index parsing
python scripts/sitemap_validator.py --url https://example.com/sitemap_index.xml --follow-index
```
**Validation rules**:
- XML syntax correctness
- URL count limit (50,000 max per sitemap)
- File size limit (50MB max uncompressed)
- Lastmod date format validation
- Sitemap index structure
## Sitemap Crawler
```bash
# Crawl all URLs in sitemap
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
# Limit concurrent requests
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --concurrency 10
# Sample mode (check subset)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --sample 100
```
**Output includes**:
- HTTP status codes per URL
- Response times
- Redirect chains
- Broken links (4xx, 5xx)
## Output Format
All scripts support `--json` flag for structured output:
```json
{
"url": "https://example.com",
"status": "valid|invalid|warning",
"issues": [
{
"type": "error|warning|info",
"message": "Description",
"location": "Line or URL"
}
],
"summary": {}
}
```
## Common Issues Detected
| Category | Issue | Severity |
|----------|-------|----------|
| Robots.txt | Missing sitemap declaration | Medium |
| Robots.txt | Blocking CSS/JS resources | High |
| Robots.txt | Overly broad disallow rules | Medium |
| Sitemap | URLs returning 404 | High |
| Sitemap | Missing lastmod dates | Low |
| Sitemap | Exceeds 50,000 URL limit | High |
| Sitemap | Non-canonical URLs included | Medium |
## Configuration
Environment variables (optional):
```bash
# Rate limiting
CRAWL_DELAY=1.0 # Seconds between requests
MAX_CONCURRENT=20 # Async concurrency limit
REQUEST_TIMEOUT=30 # Request timeout seconds
```
## Notion Output (Required)
**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.
### Database Configuration
| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |
### Required Properties
| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Technical SEO, On-page SEO, Performance, Schema/Structured Data, Sitemap, Robots.txt, Content, Local SEO |
| Priority | Select | Critical, High, Medium, Low |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: [TYPE]-YYYYMMDD-NNN |
### Language Guidelines
- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., SEO Audit, Core Web Vitals, Schema Markup)
- URLs and code remain unchanged
### Example MCP Call
```bash
mcp-cli call notion/API-post-page '{"parent": {"database_id": "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"}, "properties": {...}}'
```