our-claude-skills/.claude/commands/seo-crawl-budget.md
Andrew Yim 2aa9d098cb Fix SEO skills 19-34, add global slash commands, update installer (#4)
* Fix SEO skill 34 bugs, Korean labels, and transition Ahrefs refs to our-seo-agent

P0: Fix report_aggregator.py: wrong SKILL_REGISTRY[33] mapping, missing
CATEGORY_WEIGHTS for 7 categories, and a `break` bug in health score parsing
that exited the loop even on parse failure.

P1: Remove VIEW tab references from skill 20, expand skill 32 docs, and
replace Ahrefs MCP references across all 14 skills (19-28, 31-34)
with our-seo-agent CLI data source references.

P2: Fix Korean labels in executive_report.py and dashboard_generator.py,
add tenacity to base requirements, sync skill 34 base_client.py with
canonical version from skill 12.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Claude Code slash commands for SEO skills 19-34 and fix stale paths

Create 14 new slash command files for skills 19-28, 31-34 so they
appear as /seo-* commands in Claude Code. Also fix stale directory
paths in 8 existing commands (skills 12-18, 29-30) that referenced
pre-renumbering skill directories.

Update .gitignore to track .claude/commands/ while keeping other
.claude/ files ignored.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add 8 slash commands, enhance reference-curator with depth/output options

- Add slash commands: ourdigital-brand-guide, notion-writer, notebooklm-agent,
  notebooklm-automation, notebooklm-studio, notebooklm-research,
  reference-curator, multi-agent-guide
- Add --depth (light/standard/deep/full) with Firecrawl parameter mapping
- Add --output with ~/Documents/reference-library/ default and user confirmation
- Increase --max-sources default from 10 to 100
- Rename /reference-curator-pipeline to /reference-curator
- Simplify web-crawler-orchestrator label to web-crawler in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Convert slash command script paths to absolute for global availability

Symlinked all 39 project commands to ~/.claude/commands/ so they work
from any project directory. Converted 126 relative custom-skills/ paths
to absolute /Users/ourdigital/Projects/our-claude-skills/custom-skills/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update installer to support global slash command installation

Add symlink-based global command setup so all 39 custom skills work from
any project directory. New --commands flag for quick re-sync, updated
--validate/--update/--uninstall to handle symlinks, and expanded skill
listing to cover all 7 domains.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add user guides in English and Korean for all 52 custom skills

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 14:37:54 +09:00


---
description: Crawl budget optimization and log analysis
---
# SEO Crawl Budget
Server access log analysis, bot profiling, and crawl budget waste identification.
## Triggers
- "crawl budget", "log analysis", "크롤 예산"
## Capabilities
1. **Log Parsing** - Parse Nginx, Apache, CloudFront access logs (streaming for >1GB files)
2. **Bot Identification** - Googlebot, Yeti/Naver, Bingbot, Daumoa/Kakao, and others by User-Agent (see the sketch after this list)
3. **Per-Bot Profiling** - Crawl frequency, depth distribution, status codes, crawl patterns
4. **Waste Detection** - Parameter URLs, low-value pages, redirect chains, soft 404s, duplicate URLs
5. **Orphan Pages** - Pages in sitemap but never crawled, crawled but not in sitemap
6. **Optimization Plan** - robots.txt suggestions, URL parameter handling, noindex recommendations
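
As a concrete illustration of capabilities 1-3, here is a minimal sketch (not the skill's actual `log_parser.py`): it streams a possibly gzipped Nginx combined-format log line by line, so even multi-gigabyte files never load fully into memory, and tallies requests per crawler by User-Agent substring. The bot substrings are the publicly known ones; everything else is an assumption of this sketch.

```python
# bot_tally.py (illustrative sketch, not the skill's log_parser.py):
# stream an optionally gzipped Nginx combined-format access log and
# count requests per known search-engine crawler.
import gzip
import re
import sys
from collections import Counter

# Well-known crawler User-Agent substrings; anything else is "other".
BOT_SIGNATURES = {
    "Googlebot": "googlebot",
    "Yeti/Naver": "yeti",
    "Bingbot": "bingbot",
    "Daumoa/Kakao": "daumoa",
}

# Nginx "combined" format:
#   ip - user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def classify(user_agent: str) -> str:
    ua = user_agent.lower()
    for label, needle in BOT_SIGNATURES.items():
        if needle in ua:
            return label
    return "other"

def tally(path: str) -> Counter:
    # gzip.open and open both support text mode, so streaming works either way
    opener = gzip.open if path.endswith(".gz") else open
    counts: Counter = Counter()
    with opener(path, "rt", errors="replace") as fh:
        for line in fh:
            match = LINE_RE.match(line)
            if match:
                counts[classify(match.group("ua"))] += 1
    return counts

if __name__ == "__main__":
    # usage: python bot_tally.py access.log[.gz]
    for bot, count in tally(sys.argv[1]).most_common():
        print(f"{bot}\t{count}")
```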
## Scripts
```bash
# Parse Nginx access log
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/log_parser.py \
  --log-file /var/log/nginx/access.log --json

# Parse Apache log, filter by Googlebot
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/log_parser.py \
  --log-file /var/log/apache2/access.log --format apache --bot googlebot --json

# Parse gzipped log in streaming mode
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/log_parser.py \
  --log-file access.log.gz --streaming --json

# Full crawl budget analysis with sitemap comparison
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/crawl_budget_analyzer.py \
  --log-file access.log --sitemap https://example.com/sitemap.xml --json

# Waste identification only
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/crawl_budget_analyzer.py \
  --log-file access.log --scope waste --json

# Orphan page detection
python /Users/ourdigital/Projects/our-claude-skills/custom-skills/32-seo-crawl-budget/code/scripts/crawl_budget_analyzer.py \
  --log-file access.log --sitemap https://example.com/sitemap.xml --scope orphans --json
```
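
The `--scope orphans` run above reduces, at its core, to a two-way set difference between sitemap URLs and URLs actually requested by bots. A minimal sketch of that comparison follows; it assumes a flat `<urlset>` sitemap and a plain text file with one crawled URL per line (both input conventions are assumptions of this sketch, not `crawl_budget_analyzer.py`'s real interface):

```python
# orphan_check.py (illustrative sketch): compare sitemap URLs against
# URLs seen in the access log. Comparing path components sidesteps
# http/https and host-alias mismatches.
import sys
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit
from urllib.request import urlopen

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_paths(sitemap_url: str) -> set[str]:
    """Fetch a flat <urlset> sitemap (not a sitemap index) and return
    the path of every <loc> entry."""
    tree = ET.parse(urlopen(sitemap_url))
    return {urlsplit(loc.text.strip()).path for loc in tree.iter(f"{SITEMAP_NS}loc")}

def crawled_paths(urls_file: str) -> set[str]:
    """One crawled URL (or bare path) per line, however it was
    extracted from the parsed log."""
    with open(urls_file) as fh:
        return {urlsplit(line.strip()).path for line in fh if line.strip()}

if __name__ == "__main__":
    # usage: python orphan_check.py https://example.com/sitemap.xml crawled_urls.txt
    in_sitemap, crawled = sitemap_paths(sys.argv[1]), crawled_paths(sys.argv[2])
    print("In sitemap, never crawled:", sorted(in_sitemap - crawled)[:20])
    print("Crawled, not in sitemap:", sorted(crawled - in_sitemap)[:20])
```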
## Output
- Bot request counts, status code distribution, top crawled URLs per bot
- Crawl waste breakdown (parameter URLs, redirects, soft 404s, duplicates)
- Orphan page lists (in sitemap not crawled, crawled not in sitemap)
- Efficiency score (0-100) with optimization recommendations (an illustrative scoring sketch follows this list)
- Saved to Notion SEO Audit Log (Category: Crawl Budget, Audit ID: CRAWL-YYYYMMDD-NNN)
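
The formula behind the 0-100 efficiency score is internal to `crawl_budget_analyzer.py` and not documented here. Purely as a hypothetical illustration of how such a score could be derived, one plausible scheme penalizes each waste category in proportion to its share of total bot requests; the categories mirror the waste breakdown above, but the weights and formula are invented for this sketch:

```python
# Hypothetical illustration only: neither the weights nor the formula
# are taken from crawl_budget_analyzer.py.
WASTE_WEIGHTS = {            # assumed penalty weight per waste category
    "parameter_urls": 0.30,
    "redirect_chains": 0.25,
    "soft_404s": 0.25,
    "duplicates": 0.20,
}

def efficiency_score(total_bot_requests: int, waste_counts: dict[str, int]) -> int:
    """Map each waste category's share of bot requests to a 0-100 score,
    where 100 means no detected waste."""
    if total_bot_requests <= 0:
        return 0
    penalty = sum(WASTE_WEIGHTS.get(category, 0.0) * count / total_bot_requests
                  for category, count in waste_counts.items())
    return max(0, round(100 * (1 - penalty)))

# 20% parameter-URL waste plus 5% soft 404s lands in the low 90s here.
print(efficiency_score(10_000, {"parameter_urls": 2_000, "soft_404s": 500}))
```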