our-claude-skills/.claude/commands/seo-crawl-budget.md
Andrew Yim 2aa9d098cb Fix SEO skills 19-34, add global slash commands, update installer (#4)
* Fix SEO skill 34 bugs, Korean labels, and transition Ahrefs refs to our-seo-agent

P0: Fix report_aggregator.py: wrong SKILL_REGISTRY[33] mapping, missing
CATEGORY_WEIGHTS for 7 categories, and a `break` bug in health score parsing
that exited the loop even on parse failure.

P1: Remove VIEW tab references from skill 20, expand skill 32 docs, and
replace Ahrefs MCP references across all 14 skills (19-28, 31-34)
with our-seo-agent CLI data source references.

P2: Fix Korean labels in executive_report.py and dashboard_generator.py,
add tenacity to base requirements, sync skill 34 base_client.py with
canonical version from skill 12.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Claude Code slash commands for SEO skills 19-34 and fix stale paths

Create 14 new slash command files for skills 19-28, 31-34 so they
appear as /seo-* commands in Claude Code. Also fix stale directory
paths in 8 existing commands (skills 12-18, 29-30) that referenced
pre-renumbering skill directories.

Update .gitignore to track .claude/commands/ while keeping other
.claude/ files ignored.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add 8 slash commands, enhance reference-curator with depth/output options

- Add slash commands: ourdigital-brand-guide, notion-writer, notebooklm-agent,
  notebooklm-automation, notebooklm-studio, notebooklm-research,
  reference-curator, multi-agent-guide
- Add --depth (light/standard/deep/full) with Firecrawl parameter mapping
- Add --output with ~/Documents/reference-library/ default and user confirmation
- Increase --max-sources default from 10 to 100
- Rename /reference-curator-pipeline to /reference-curator
- Simplify web-crawler-orchestrator label to web-crawler in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Convert slash command script paths to absolute for global availability

Symlinked all 39 project commands to ~/.claude/commands/ so they work
from any project directory. Converted 126 relative custom-skills/ paths
to absolute /Users/ourdigital/Projects/our-claude-skills/custom-skills/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update installer to support global slash command installation

Add symlink-based global command setup so all 39 custom skills work from
any project directory. New --commands flag for quick re-sync, updated
--validate/--update/--uninstall to handle symlinks, and expanded skill
listing to cover all 7 domains.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add user guides in English and Korean for all 52 custom skills

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 14:37:54 +09:00


---
description: Crawl budget optimization and log analysis
---
# SEO Crawl Budget
Server access log analysis, bot profiling, and crawl budget waste identification.
## Triggers
- "crawl budget", "log analysis", "크롤 예산"
## Capabilities
1. **Log Parsing** - Parse Nginx, Apache, CloudFront access logs (streaming for >1GB files)
2. **Bot Identification** - Googlebot, Yeti/Naver, Bingbot, Daumoa/Kakao, and others by User-Agent (see the sketch after this list)
3. **Per-Bot Profiling** - Crawl frequency, depth distribution, status codes, crawl patterns
4. **Waste Detection** - Parameter URLs, low-value pages, redirect chains, soft 404s, duplicate URLs
5. **Orphan Pages** - Pages in sitemap but never crawled, crawled but not in sitemap
6. **Optimization Plan** - robots.txt suggestions, URL parameter handling, noindex recommendations
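
As a concrete illustration of capabilities 1-3, here is a minimal sketch (not the skill's actual `log_parser.py`): it streams a possibly gzipped Nginx combined-format log line by line, so even multi-gigabyte files never load fully into memory, and tallies requests per crawler by User-Agent substring. The bot substrings are the publicly known ones; everything else is an assumption of this sketch.

```python
# bot_tally.py (illustrative sketch, not the skill's log_parser.py):
# stream an optionally gzipped Nginx combined-format access log and
# count requests per known search-engine crawler.
import gzip
import re
import sys
from collections import Counter

# Well-known crawler User-Agent substrings; anything else is "other".
BOT_SIGNATURES = {
    "Googlebot": "googlebot",
    "Yeti/Naver": "yeti",
    "Bingbot": "bingbot",
    "Daumoa/Kakao": "daumoa",
}

# Nginx "combined" format:
#   ip - user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def classify(user_agent: str) -> str:
    ua = user_agent.lower()
    for label, needle in BOT_SIGNATURES.items():
        if needle in ua:
            return label
    return "other"

def tally(path: str) -> Counter:
    # gzip.open and open both support text mode, so streaming works either way
    opener = gzip.open if path.endswith(".gz") else open
    counts: Counter = Counter()
    with opener(path, "rt", errors="replace") as fh:
        for line in fh:
            match = LINE_RE.match(line)
            if match:
                counts[classify(match.group("ua"))] += 1
    return counts

if __name__ == "__main__":
    # usage: python bot_tally.py access.log[.gz]
    for bot, count in tally(sys.argv[1]).most_common():
        print(f"{bot}\t{count}")
```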
## Scripts
```bash
# Parse Nginx access log
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/log_parser.py \
  --log-file /var/log/nginx/access.log --json

# Parse Apache log, filter by Googlebot
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/log_parser.py \
  --log-file /var/log/apache2/access.log --format apache --bot googlebot --json

# Parse gzipped log in streaming mode
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/log_parser.py \
  --log-file access.log.gz --streaming --json

# Full crawl budget analysis with sitemap comparison
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/crawl_budget_analyzer.py \
  --log-file access.log --sitemap https://example.com/sitemap.xml --json

# Waste identification only
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/crawl_budget_analyzer.py \
  --log-file access.log --scope waste --json

# Orphan page detection
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/crawl_budget_analyzer.py \
  --log-file access.log --sitemap https://example.com/sitemap.xml --scope orphans --json
```
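
The `--scope orphans` run above reduces, at its core, to a two-way set difference between sitemap URLs and URLs actually requested by bots. A minimal sketch of that comparison follows; it assumes a flat `<urlset>` sitemap and a plain text file with one crawled URL per line (both input conventions are assumptions of this sketch, not `crawl_budget_analyzer.py`'s real interface):

```python
# orphan_check.py (illustrative sketch): compare sitemap URLs against
# URLs seen in the access log. Comparing path components sidesteps
# http/https and host-alias mismatches.
import sys
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit
from urllib.request import urlopen

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_paths(sitemap_url: str) -> set[str]:
    """Fetch a flat <urlset> sitemap (not a sitemap index) and return
    the path of every <loc> entry."""
    tree = ET.parse(urlopen(sitemap_url))
    return {urlsplit(loc.text.strip()).path for loc in tree.iter(f"{SITEMAP_NS}loc")}

def crawled_paths(urls_file: str) -> set[str]:
    """One crawled URL (or bare path) per line, however it was
    extracted from the parsed log."""
    with open(urls_file) as fh:
        return {urlsplit(line.strip()).path for line in fh if line.strip()}

if __name__ == "__main__":
    # usage: python orphan_check.py https://example.com/sitemap.xml crawled_urls.txt
    in_sitemap, crawled = sitemap_paths(sys.argv[1]), crawled_paths(sys.argv[2])
    print("In sitemap, never crawled:", sorted(in_sitemap - crawled)[:20])
    print("Crawled, not in sitemap:", sorted(crawled - in_sitemap)[:20])
```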
## Output
- Bot request counts, status code distribution, top crawled URLs per bot
- Crawl waste breakdown (parameter URLs, redirects, soft 404s, duplicates)
- Orphan page lists (in sitemap not crawled, crawled not in sitemap)
- Efficiency score (0-100) with optimization recommendations (an illustrative scoring sketch follows this list)
- Saved to Notion SEO Audit Log (Category: Crawl Budget, Audit ID: CRAWL-YYYYMMDD-NNN)
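
The formula behind the 0-100 efficiency score is internal to `crawl_budget_analyzer.py` and not documented here. Purely as a hypothetical illustration of how such a score could be derived, one plausible scheme penalizes each waste category in proportion to its share of total bot requests; the categories mirror the waste breakdown above, but the weights and formula are invented for this sketch:

```python
# Hypothetical illustration only: neither the weights nor the formula
# are taken from crawl_budget_analyzer.py.
WASTE_WEIGHTS = {            # assumed penalty weight per waste category
    "parameter_urls": 0.30,
    "redirect_chains": 0.25,
    "soft_404s": 0.25,
    "duplicates": 0.20,
}

def efficiency_score(total_bot_requests: int, waste_counts: dict[str, int]) -> int:
    """Map each waste category's share of bot requests to a 0-100 score,
    where 100 means no detected waste."""
    if total_bot_requests <= 0:
        return 0
    penalty = sum(WASTE_WEIGHTS.get(category, 0.0) * count / total_bot_requests
                  for category, count in waste_counts.items())
    return max(0, round(100 * (1 - penalty)))

# 20% parameter-URL waste plus 5% soft 404s lands in the low 90s here.
print(efficiency_score(10_000, {"parameter_urls": 2_000, "soft_404s": 500}))
```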