---
name: seo-crawl-budget
description: |
  Crawl budget optimization and log analysis.
  Triggers: crawl budget, log analysis, bot crawling, Googlebot, crawl waste, orphan pages, crawl efficiency.
---

# Crawl Budget Optimizer

Analyze server access logs to identify crawl budget waste and generate optimization recommendations for search engine bots.

## Capabilities

1. **Log Analysis**: Parse Nginx/Apache/CloudFront access logs to extract bot crawl data
2. **Bot Profiling**: Per-bot behavior analysis (Googlebot, Yeti, Bingbot, Daumoa)
3. **Waste Detection**: Parameter URLs, redirect chains, soft 404s, duplicate URL variants
4. **Orphan Pages**: Pages in sitemap but uncrawled, and crawled pages not in sitemap
5. **Recommendations**: Prioritized action items for crawl budget optimization

## Workflow

1. Parse server access log with `log_parser.py`
2. Run crawl budget analysis with `crawl_budget_analyzer.py`
3. Compare with sitemap URLs for orphan page detection
4. Optionally compare with Ahrefs page history data
5. Generate Korean-language report with recommendations
6. Save to Notion SEO Audit Log database

## Tools Used

- **Ahrefs**: `site-explorer-pages-history` for indexed page comparison
- **Notion**: Save audit report to database `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **WebSearch**: Current best practices and bot documentation

## Output

All reports are saved to the OurDigital SEO Audit Log with:

- Category: Crawl Budget
- Audit ID format: CRAWL-YYYYMMDD-NNN
- Content in Korean with technical English terms preserved
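
## Example: Log Parsing Sketch

The Log Analysis, Bot Profiling, and Waste Detection capabilities above can be illustrated with a minimal sketch. It is not the actual contents of `log_parser.py` or `crawl_budget_analyzer.py`, which are not shown here. It assumes the Nginx/Apache "combined" log format (CloudFront logs are tab-separated and would need a different parser), and the names `LINE_RE`, `BOT_TOKENS`, `identify_bot`, `summarize`, and `SAMPLE` are hypothetical. The sample log lines are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed input: Nginx/Apache "combined" access log format.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Lowercase user-agent substrings for the bots this skill profiles.
# Matching on the user agent alone can be spoofed; a production check
# would also verify the client IP (for example via reverse DNS).
BOT_TOKENS = {
    "Googlebot": "googlebot",
    "Yeti": "yeti",
    "Bingbot": "bingbot",
    "Daumoa": "daumoa",
}


def identify_bot(user_agent):
    """Return the bot name for a user-agent string, or None if no token matches."""
    ua = user_agent.lower()
    for name, token in BOT_TOKENS.items():
        if token in ua:
            return name
    return None


def summarize(lines):
    """Count total hits, parameter-URL hits, and 3xx hits per bot."""
    hits, param_hits, redirect_hits = Counter(), Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not follow the combined format
        bot = identify_bot(m["ua"])
        if bot is None:
            continue  # not one of the profiled bots
        hits[bot] += 1
        if urlsplit(m["url"]).query:
            param_hits[bot] += 1  # URL carries a query string
        if m["status"].startswith("3"):
            redirect_hits[bot] += 1  # bot was served a redirect
    return {
        bot: {
            "hits": n,
            "parameter_urls": param_hits[bot],
            "redirects": redirect_hits[bot],
        }
        for bot, n in hits.items()
    }


# Made-up sample lines for illustration only.
SAMPLE = [
    '66.249.66.1 - - [10/Oct/2024:13:55:36 +0000] "GET /products?sort=price HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2024:13:55:40 +0000] "GET /old-page HTTP/1.1" '
    '301 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2024:13:56:02 +0000] "GET /about HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (compatible; Yeti/1.1; +https://naver.me/spd)"',
    '198.51.100.9 - - [10/Oct/2024:13:56:10 +0000] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    for bot, stats in summarize(SAMPLE).items():
        print(bot, stats)
```

A high share of `parameter_urls` or `redirects` relative to `hits` for a given bot is one rough signal of crawl waste. Soft 404s, redirect chains, and orphan page detection need additional inputs (response bodies, redirect targets, sitemap URLs) that this sketch does not cover.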