diff --git a/AGENTS.md b/AGENTS.md
index 3bd648f..730d14f 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -115,15 +115,17 @@ Task 2: general-purpose - "Implement the planned skill"  # Needs Task 1 result
 ## Domain-Specific Routing
 
-### SEO Skills (11-32)
+### SEO Skills (11-34)
 
 - Use **Explore** to understand existing SEO script patterns
 - Python scripts in these skills follow `base_client.py` patterns (RateLimiter, ConfigManager, BaseAsyncClient)
 - `11-seo-comprehensive-audit` orchestrates skills 12-18 for unified audits
 - Skills 19-28 provide advanced SEO capabilities (keyword strategy, SERP analysis, position tracking, link building, content strategy, e-commerce, KPI framework, international SEO, AI visibility, knowledge graph)
 - Skills 31-32 cover competitor intelligence and crawl budget optimization
+- Skill 33 provides site migration planning (pre-migration baseline, redirect mapping, risk assessment, post-migration monitoring)
+- Skill 34 aggregates outputs from all SEO skills into executive reports, HTML dashboards, and Korean-language summaries
 - All SEO skills integrate with Ahrefs MCP tools and output to the Notion SEO Audit Log database
-- Slash commands available: `/seo-keyword-strategy`, `/seo-serp-analysis`, `/seo-position-tracking`, `/seo-link-building`, `/seo-content-strategy`, `/seo-ecommerce`, `/seo-kpi-framework`, `/seo-international`, `/seo-ai-visibility`, `/seo-knowledge-graph`, `/seo-competitor-intel`, `/seo-crawl-budget`
+- Slash commands available: `/seo-keyword-strategy`, `/seo-serp-analysis`, `/seo-position-tracking`, `/seo-link-building`, `/seo-content-strategy`, `/seo-ecommerce`, `/seo-kpi-framework`, `/seo-international`, `/seo-ai-visibility`, `/seo-knowledge-graph`, `/seo-competitor-intel`, `/seo-crawl-budget`, `/seo-migration-planner`, `/seo-reporting-dashboard`
 
 ### GTM Skills (60-69)
@@ -202,7 +204,7 @@ For long-running tasks, use `run_in_background: true`:
 
 ```
 # Good candidates for background execution:
-- Full skill audit across all 50 skills
+- Full skill audit across all 52 skills
 - Running Python tests on multiple skills
 - Generating comprehensive documentation
diff --git a/CLAUDE.md b/CLAUDE.md
index 3a011c1..642ca3c 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -7,7 +7,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 **GitHub**: https://github.com/ourdigital/our-claude-skills
 
 This is a Claude Skills collection repository containing:
-- **custom-skills/**: 50 custom skills for OurDigital workflows, SEO, GTM, Jamie Brand, NotebookLM, Notion, Reference Curation, and Multi-Agent Collaboration
+- **custom-skills/**: 52 custom skills for OurDigital workflows, SEO, GTM, Jamie Brand, NotebookLM, Notion, Reference Curation, and Multi-Agent Collaboration
 - **example-skills/**: Reference examples from Anthropic's official skills repository
 - **official-skills/**: Notion integration skills (3rd party)
 - **reference/**: Skill format requirements documentation
@@ -35,7 +35,7 @@ This is a Claude Skills collection repository containing:
 | 09 | ourdigital-backoffice | Business document creation | "create proposal", "견적서" |
 | 10 | ourdigital-skill-creator | Meta skill for creating skills | "create skill", "init skill" |
 
-### SEO Tools (11-32)
+### SEO Tools (11-34)
 
 | # | Skill | Purpose | Trigger |
 |---|-------|---------|---------|
@@ -61,6 +61,8 @@ This is a Claude Skills collection repository containing:
 | 30 | seo-gateway-builder | Gateway page content | "build gateway page" |
 | 31 | seo-competitor-intel | Competitor profiling, benchmarking, threats | "competitor analysis", "competitive intel" |
 | 32 | seo-crawl-budget | Log analysis, bot profiling, crawl waste | "crawl budget", "log analysis" |
+| 33 | seo-migration-planner | Site migration planning, redirect mapping | "site migration", "domain move", "사이트 이전" |
+| 34 | seo-reporting-dashboard | Executive reports, HTML dashboards, aggregation | "SEO report", "SEO dashboard", "보고서" |
 
 ### GTM/GA Tools (60-69)
@@ -221,6 +223,8 @@ our-claude-skills/
 │   ├── 30-seo-gateway-builder/
 │   ├── 31-seo-competitor-intel/
 │   ├── 32-seo-crawl-budget/
+│   ├── 33-seo-migration-planner/
+│   ├── 34-seo-reporting-dashboard/
 │   │
 │   ├── 60-gtm-audit/
 │   ├── 61-gtm-manager/
diff --git a/README.md b/README.md
index b25ae59..8d9c997 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 > **Internal R&D Repository** - This repository is restricted for internal use only.
 
-A collection of **50 custom Claude Skills** for OurDigital workflows, Jamie Plastic Surgery Clinic brand management, SEO/GTM tools, NotebookLM automation, Notion integrations, reference documentation curation, and multi-agent collaboration.
+A collection of **52 custom Claude Skills** for OurDigital workflows, Jamie Plastic Surgery Clinic brand management, SEO/GTM tools, NotebookLM automation, Notion integrations, reference documentation curation, and multi-agent collaboration.
 
 ## Quick Install
@@ -35,7 +35,7 @@ cd our-claude-skills/custom-skills/_ourdigital-shared
 | 09 | `ourdigital-backoffice` | Business document creation |
 | 10 | `ourdigital-skill-creator` | Meta skill for creating/managing skills |
 
-### SEO Tools (11-32)
+### SEO Tools (11-34)
 
 | # | Skill | Purpose |
 |---|-------|---------|
@@ -61,6 +61,8 @@ cd our-claude-skills/custom-skills/_ourdigital-shared
 | 30 | `seo-gateway-builder` | Gateway page content generation |
 | 31 | `seo-competitor-intel` | Competitor profiling, benchmarking, threat scoring |
 | 32 | `seo-crawl-budget` | Log analysis, bot profiling, crawl waste detection |
+| 33 | `seo-migration-planner` | Site migration planning, redirect mapping, monitoring |
+| 34 | `seo-reporting-dashboard` | Executive reports, HTML dashboards, data aggregation |
 
 ### GTM/GA Tools (60-69)
@@ -148,7 +150,7 @@ our-claude-skills/
 │   │
 │   ├── 00-our-settings-audit/
 │   ├── 01-10 (OurDigital core)
-│   ├── 11-32 (SEO tools)
+│   ├── 11-34 (SEO tools)
 │   ├── 60-62 (GTM/GA tools)
 │   ├── 31-32 (Notion tools)
 │   ├── 40-45 (Jamie clinic)
diff --git a/custom-skills/19-seo-keyword-strategy/desktop/SKILL.md b/custom-skills/19-seo-keyword-strategy/desktop/SKILL.md
index 5db8269..0554377 100644
--- a/custom-skills/19-seo-keyword-strategy/desktop/SKILL.md
+++ b/custom-skills/19-seo-keyword-strategy/desktop/SKILL.md
@@ -1,7 +1,10 @@
 ---
 name: seo-keyword-strategy
 description: |
-  Keyword strategy and research for SEO campaigns. Triggers: keyword research, keyword analysis, keyword gap, search volume, keyword clustering, intent classification.
+  Keyword strategy and research for SEO campaigns.
+  Triggers: keyword research, keyword analysis, keyword gap, search volume,
+  keyword clustering, intent classification, 키워드 전략, 키워드 분석,
+  키워드 리서치, 검색량 분석, 키워드 클러스터링.
 ---
 
 # SEO Keyword Strategy & Research
diff --git a/custom-skills/20-seo-serp-analysis/code/CLAUDE.md b/custom-skills/20-seo-serp-analysis/code/CLAUDE.md
index 9bf0a95..8fac2ac 100644
--- a/custom-skills/20-seo-serp-analysis/code/CLAUDE.md
+++ b/custom-skills/20-seo-serp-analysis/code/CLAUDE.md
@@ -59,11 +59,11 @@ python scripts/naver_serp_analyzer.py --keywords-file keywords.txt --json
 ```
 
 **Capabilities**:
-- Naver section detection (블로그, 카페, 지식iN, 스마트스토어, 브랜드존, VIEW탭)
+- Naver section detection (블로그, 카페, 지식iN, 스마트스토어, 브랜드존, 도서, 숏폼, 인플루언서)
 - Section priority mapping (which sections appear above fold)
 - Content type distribution per section
 - Brand zone presence detection
-- VIEW tab content analysis
+- Shortform/influencer content analysis
 
 ## Ahrefs MCP Tools Used
diff --git a/custom-skills/20-seo-serp-analysis/code/scripts/naver_serp_analyzer.py b/custom-skills/20-seo-serp-analysis/code/scripts/naver_serp_analyzer.py
index cbd1983..7e61640 100644
--- a/custom-skills/20-seo-serp-analysis/code/scripts/naver_serp_analyzer.py
+++ b/custom-skills/20-seo-serp-analysis/code/scripts/naver_serp_analyzer.py
@@ -80,13 +80,6 @@ NAVER_SECTION_SELECTORS: dict[str, list[str]] = {
         "type_brand",
         "sc_new.sp_brand",
     ],
-    "view_tab": [
-        "sp_view",
-        "view_widget",
-        "sc_new.sp_view",
-        "type_view",
-        "api_subject_view",
-    ],
     "news": [
         "sp_nnews",
         "news_widget",
@@ -132,6 +125,26 @@ NAVER_SECTION_SELECTORS: dict[str, list[str]] = {
         "type_ad",
         "nx_ad",
     ],
+    "books": [
+        "sp_book",
+        "sc_new.sp_book",
+        "type_book",
+        "api_subject_book",
+        "nx_book",
+    ],
+    "shortform": [
+        "sp_shortform",
+        "sc_new.sp_shortform",
+        "type_shortform",
+        "sp_shorts",
+        "type_shorts",
+    ],
+    "influencer": [
+        "sp_influencer",
+        "sc_new.sp_influencer",
+        "type_influencer",
+        "api_subject_influencer",
+    ],
 }
 
 # Section display names in Korean
@@ -141,13 +154,15 @@ SECTION_DISPLAY_NAMES: dict[str, str] = {
     "blog": "블로그",
     "cafe": "카페",
     "knowledge_in": "지식iN",
     "smart_store": "스마트스토어",
     "brand_zone": "브랜드존",
-    "view_tab": "VIEW",
     "news": "뉴스",
     "encyclopedia": "백과사전",
     "image": "이미지",
     "video": "동영상",
     "place": "플레이스",
     "ad": "광고",
+    "books": "도서",
+    "shortform": "숏폼",
+    "influencer": "인플루언서",
 }
 
 # Default headers for Naver requests
@@ -199,7 +214,6 @@ class NaverSerpResult:
     above_fold_sections: list[str] = field(default_factory=list)
     ad_count: int = 0
     dominant_section: str = ""
-    has_view_tab: bool = False
     has_place_section: bool = False
     timestamp: str = ""
@@ -485,7 +499,6 @@ class NaverSerpAnalyzer:
         ad_count = sum(s.item_count for s in ad_sections) if ad_sections else 0
 
         # Check special sections
-        has_view = any(s.section_type == "view_tab" for s in sections)
         has_place = any(s.section_type == "place" for s in sections)
         dominant = self._find_dominant_section(sections)
@@ -499,7 +512,6 @@ class NaverSerpAnalyzer:
             above_fold_sections=above_fold,
             ad_count=ad_count,
             dominant_section=dominant,
-            has_view_tab=has_view,
             has_place_section=has_place,
         )
         return result
@@ -534,7 +546,6 @@ def print_rich_report(result: NaverSerpResult) -> None:
     summary_table.add_row("Brand Zone", "Yes" if result.brand_zone_present else "No")
     if result.brand_zone_brand:
         summary_table.add_row("Brand Name", result.brand_zone_brand)
-    summary_table.add_row("VIEW Tab", "Yes" if result.has_view_tab else "No")
     summary_table.add_row("Place Section", "Yes" if result.has_place_section else "No")
     summary_table.add_row("Dominant Section", result.dominant_section or "N/A")
 
     console.print(summary_table)
diff --git a/custom-skills/20-seo-serp-analysis/desktop/SKILL.md b/custom-skills/20-seo-serp-analysis/desktop/SKILL.md
index d20a850..b7ca47a 100644
--- a/custom-skills/20-seo-serp-analysis/desktop/SKILL.md
+++ b/custom-skills/20-seo-serp-analysis/desktop/SKILL.md
@@ -17,7 +17,7 @@ Analyze search engine result page composition for Google and Naver. Detect SERP
 2. **Competitor Position Mapping** - Extract domains, positions, content types for top organic results
 3. **Opportunity Scoring** - Score SERP opportunity (0-100) based on feature landscape and competition
 4. **Search Intent Validation** - Infer intent (informational, navigational, commercial, transactional, local) from SERP composition
-5. **Naver SERP Composition** - Detect sections (blog, cafe, knowledge iN, Smart Store, brand zone, VIEW tab), map section priority, analyze brand zone presence
+5. **Naver SERP Composition** - Detect sections (blog, cafe, knowledge iN, Smart Store, brand zone, books, shortform, influencer), map section priority, analyze brand zone presence
 
 ## MCP Tool Usage
@@ -53,7 +53,7 @@ WebFetch: Fetch Naver SERP HTML for section analysis
 ### 2. Naver SERP Analysis
 
 1. Fetch Naver search page for the target keyword
-2. Detect SERP sections (blog, cafe, knowledge iN, Smart Store, brand zone, VIEW tab, news, encyclopedia)
+2. Detect SERP sections (blog, cafe, knowledge iN, Smart Store, brand zone, news, encyclopedia, books, shortform, influencer)
 3. Map section priority (above-fold order)
 4. Check brand zone presence and extract brand name
 5. Count items per section
diff --git a/custom-skills/25-seo-kpi-framework/desktop/SKILL.md b/custom-skills/25-seo-kpi-framework/desktop/SKILL.md
index 846997e..c5203a4 100644
--- a/custom-skills/25-seo-kpi-framework/desktop/SKILL.md
+++ b/custom-skills/25-seo-kpi-framework/desktop/SKILL.md
@@ -2,7 +2,8 @@ name: seo-kpi-framework
 description: |
   SEO KPI and performance framework for unified metrics, health scores, ROI, and period-over-period reporting.
-  Triggers: SEO KPI, performance report, health score, SEO metrics, ROI, baseline, targets.
+  Triggers: SEO KPI, performance report, health score, SEO metrics, ROI,
+  baseline, targets, SEO 성과 지표, KPI 대시보드, SEO 성과 보고서.
 ---
 
 # SEO KPI & Performance Framework
diff --git a/custom-skills/27-seo-ai-visibility/desktop/SKILL.md b/custom-skills/27-seo-ai-visibility/desktop/SKILL.md
index 576baa4..0c8f483 100644
--- a/custom-skills/27-seo-ai-visibility/desktop/SKILL.md
+++ b/custom-skills/27-seo-ai-visibility/desktop/SKILL.md
@@ -57,6 +57,55 @@ All reports are saved to the OurDigital SEO Audit Log:
 - **Audit ID Format**: AI-YYYYMMDD-NNN
 - **Language**: Korean (technical terms in English)
 
+## Output Format
+
+```json
+{
+  "domain": "example.com",
+  "impressions": {
+    "total": 15000,
+    "trend": "increasing",
+    "period": "30d"
+  },
+  "mentions": {
+    "total": 450,
+    "positive": 320,
+    "neutral": 100,
+    "negative": 30,
+    "sentiment_score": 0.72
+  },
+  "share_of_voice": {
+    "domain_sov": 12.5,
+    "competitors": {
+      "competitor1.com": 18.3,
+      "competitor2.com": 15.1
+    }
+  },
+  "cited_pages": [
+    {"url": "https://example.com/guide", "citations": 45},
+    {"url": "https://example.com/faq", "citations": 28}
+  ],
+  "cited_domains": [
+    {"domain": "example.com", "citations": 120},
+    {"domain": "competitor1.com", "citations": 95}
+  ],
+  "recommendations": [
+    "Create more FAQ-style content for AI citation capture",
+    "Add structured data to improve AI answer extraction"
+  ],
+  "audit_id": "AI-20250115-001",
+  "timestamp": "2025-01-15T14:30:00"
+}
+```
+
+## Limitations
+
+- Requires Ahrefs Brand Radar API access (not available in basic plans)
+- AI search landscape changes rapidly; data may not reflect real-time state
+- Share of Voice metrics are relative to tracked competitor set only
+- Sentiment analysis based on AI-generated text, not user perception
+- Cannot distinguish between different AI engines (ChatGPT, Gemini, Perplexity) without Brand Radar
+
 ## Example Queries
 
 - "example.com의 AI 검색 가시성을 분석해줘"
diff --git a/custom-skills/28-seo-knowledge-graph/desktop/SKILL.md b/custom-skills/28-seo-knowledge-graph/desktop/SKILL.md
index 37f75ce..180e783 100644
--- a/custom-skills/28-seo-knowledge-graph/desktop/SKILL.md
+++ b/custom-skills/28-seo-knowledge-graph/desktop/SKILL.md
@@ -1,7 +1,10 @@
 ---
 name: seo-knowledge-graph
 description: |
-  Knowledge Graph and entity SEO analysis. Triggers: knowledge panel, entity SEO, knowledge graph, PAA, FAQ schema, Wikipedia, Wikidata, brand entity.
+  Knowledge Graph and entity SEO analysis.
+  Triggers: knowledge panel, entity SEO, knowledge graph, PAA, FAQ schema,
+  Wikipedia, Wikidata, brand entity, 지식 그래프, 엔티티 SEO,
+  지식 패널, 브랜드 엔티티, 위키데이터.
 ---
 
 # Knowledge Graph & Entity SEO
@@ -69,9 +72,60 @@ All reports must be saved to the OurDigital SEO Audit Log database.
 
 Report content should be written in Korean (한국어), keeping technical English terms as-is.
+## Output Format
+
+```json
+{
+  "entity_name": "OurDigital",
+  "knowledge_panel": {
+    "present": false,
+    "attributes": {}
+  },
+  "entity_presence": {
+    "wikipedia": false,
+    "wikidata": false,
+    "wikidata_qid": null,
+    "naver_encyclopedia": false,
+    "naver_knowledge_in": false,
+    "google_knowledge_panel": false
+  },
+  "entity_schema": {
+    "organization_count": 2,
+    "person_count": 1,
+    "same_as_links": ["https://linkedin.com/...", "https://facebook.com/..."],
+    "same_as_count": 2,
+    "issues": [
+      "Duplicate Organization schemas with inconsistent names",
+      "Placeholder image in Organization schema",
+      "Only 2 sameAs links (recommend 6+)"
+    ]
+  },
+  "paa_questions": [],
+  "faq_schema_present": false,
+  "entity_completeness_score": 12,
+  "recommendations": [
+    "Create Wikidata entity for brand recognition",
+    "Add 4-6 more sameAs social profile links",
+    "Replace placeholder image with actual brand logo",
+    "Consolidate duplicate Organization schemas",
+    "Add FAQPage schema to relevant pages"
+  ],
+  "audit_id": "KG-20250115-001",
+  "timestamp": "2025-01-15T14:30:00"
+}
+```
+
+## Limitations
+
+- Google Knowledge Panel detection via search results is not guaranteed (personalization, location-based)
+- Direct Google scraping may be blocked (403/429); prefer WebSearch tool
+- Wikipedia/Wikidata creation requires meeting notability guidelines
+- PAA questions vary by location and device
+- Entity completeness scoring is heuristic-based
+
 ## Reference Scripts
 
 Located in `code/scripts/`:
-- `knowledge_graph_analyzer.py` -- Knowledge Panel and entity presence analysis
-- `entity_auditor.py` -- Entity SEO signals and PAA/FAQ audit
-- `base_client.py` -- Shared async client utilities
+- `knowledge_graph_analyzer.py` — Knowledge Panel and entity presence analysis
+- `entity_auditor.py` — Entity SEO signals and PAA/FAQ audit
+- `base_client.py` — Shared async client utilities
diff --git a/custom-skills/31-seo-competitor-intel/desktop/SKILL.md b/custom-skills/31-seo-competitor-intel/desktop/SKILL.md
index be468a3..a971e06 100644
--- a/custom-skills/31-seo-competitor-intel/desktop/SKILL.md
+++ b/custom-skills/31-seo-competitor-intel/desktop/SKILL.md
@@ -1,7 +1,10 @@
 ---
 name: seo-competitor-intel
 description: |
-  Competitor intelligence and SEO benchmarking. Triggers: competitor analysis, competitive intelligence, competitor comparison, threat assessment, market position, benchmarking.
+  Competitor intelligence and SEO benchmarking.
+  Triggers: competitor analysis, competitive intelligence, competitor comparison,
+  threat assessment, market position, benchmarking, 경쟁사 분석,
+  경쟁 인텔리전스, 벤치마킹, 경쟁사 비교.
 ---
 
 # SEO Competitor Intelligence & Benchmarking
diff --git a/custom-skills/32-seo-crawl-budget/desktop/SKILL.md b/custom-skills/32-seo-crawl-budget/desktop/SKILL.md
index e6cee04..d0d4fd9 100644
--- a/custom-skills/32-seo-crawl-budget/desktop/SKILL.md
+++ b/custom-skills/32-seo-crawl-budget/desktop/SKILL.md
@@ -1,39 +1,142 @@
 ---
 name: seo-crawl-budget
 description: |
-  Crawl budget optimization and log analysis. Triggers: crawl budget, log analysis, bot crawling, Googlebot, crawl waste, orphan pages, crawl efficiency.
+  Crawl budget optimization and server log analysis for search engine bots.
+  Triggers: crawl budget, log analysis, bot crawling, Googlebot, crawl waste,
+  orphan pages, crawl efficiency, 크롤 예산, 로그 분석, 크롤 최적화.
 ---
 
 # Crawl Budget Optimizer
 
-Analyze server access logs to identify crawl budget waste and generate optimization recommendations for search engine bots.
+Analyze server access logs to identify crawl budget waste and generate optimization recommendations for search engine bots (Googlebot, Yeti/Naver, Bingbot, Daumoa/Kakao).
 
 ## Capabilities
 
-1. **Log Analysis**: Parse Nginx/Apache/CloudFront access logs to extract bot crawl data
-2. **Bot Profiling**: Per-bot behavior analysis (Googlebot, Yeti, Bingbot, Daumoa)
-3. **Waste Detection**: Parameter URLs, redirect chains, soft 404s, duplicate URL variants
-4. **Orphan Pages**: Pages in sitemap but uncrawled, and crawled pages not in sitemap
-5. **Recommendations**: Prioritized action items for crawl budget optimization
+### Log Analysis
+- Parse Nginx combined, Apache combined, and CloudFront log formats
+- Support for gzip/bzip2 compressed logs
+- Streaming parser for files >1GB
+- Date range filtering
+- Custom format via regex
+
+### Bot Profiling
+- Identify bots by User-Agent: Googlebot (and variants), Yeti (Naver), Bingbot, Daumoa (Kakao), Applebot, DuckDuckBot, Baiduspider
+- Per-bot metrics: requests/day, requests/hour, unique URLs crawled
+- Status code distribution per bot (200, 301, 404, 500)
+- Crawl depth distribution
+- Crawl pattern analysis (time of day, days of week)
+- Most crawled URLs per bot
+
+### Waste Detection
+- **Parameter URLs**: ?sort=, ?filter=, ?page=, ?utm_* consuming crawl budget
+- **Redirect chains**: Multiple redirects consuming crawl slots
+- **Soft 404s**: 200 status pages with error/empty content
+- **Duplicate URLs**: www/non-www, http/https, trailing slash variants
+- **Low-value pages**: Thin content pages, noindex pages being crawled
+
+### Orphan Page Detection
+- Pages in sitemap but never crawled by bots
+- Pages crawled but not in sitemap
+- Crawled pages with no internal links pointing to them
 
 ## Workflow
 
-1. Parse server access log with `log_parser.py`
-2. Run crawl budget analysis with `crawl_budget_analyzer.py`
-3. Compare with sitemap URLs for orphan page detection
-4. Optionally compare with Ahrefs page history data
-5. Generate Korean-language report with recommendations
-6. Save to Notion SEO Audit Log database
+### Step 1: Obtain Server Access Logs
+Request or locate server access logs from the target site. Supported formats:
+- Nginx: `/var/log/nginx/access.log`
+- Apache: `/var/log/apache2/access.log`
+- CloudFront: Downloaded from S3 or CloudWatch
 
-## Tools Used
+### Step 2: Parse Access Logs
+```bash
+python scripts/log_parser.py --log-file access.log --json
+python scripts/log_parser.py --log-file access.log.gz --streaming --json
+python scripts/log_parser.py --log-file access.log --bot googlebot --json
+```
 
-- **Ahrefs**: `site-explorer-pages-history` for indexed page comparison
-- **Notion**: Save audit report to database `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
-- **WebSearch**: Current best practices and bot documentation
+### Step 3: Crawl Budget Analysis
+```bash
+python scripts/crawl_budget_analyzer.py --log-file access.log --sitemap https://example.com/sitemap.xml --json
+python scripts/crawl_budget_analyzer.py --log-file access.log --scope waste --json
+python scripts/crawl_budget_analyzer.py --log-file access.log --scope orphans --json
+python scripts/crawl_budget_analyzer.py --log-file access.log --scope bots --json
+```
 
-## Output
+### Step 4: Cross-Reference with Ahrefs (Optional)
+Use `site-explorer-pages-history` to compare indexed pages vs crawled pages.
 
-All reports are saved to the OurDigital SEO Audit Log with:
-- Category: Crawl Budget
-- Audit ID format: CRAWL-YYYYMMDD-NNN
-- Content in Korean with technical English terms preserved
+### Step 5: Generate Recommendations
+Prioritized action items:
+1. robots.txt optimization (block parameter URLs, low-value paths)
+2. URL parameter handling (Google Search Console settings)
+3. Noindex/nofollow for low-value pages
+4. Redirect chain resolution (reduce 301 → 301 → 200 to 301 → 200)
+5. Internal linking improvements for orphan pages
+
+### Step 6: Report to Notion
+Save Korean-language report to SEO Audit Log database.
+
+## MCP Tools Used
+
+| Tool | Purpose |
+|------|---------|
+| Ahrefs `site-explorer-pages-history` | Compare indexed pages with crawled pages |
+| Notion | Save audit report to database |
+| WebSearch | Current bot documentation and best practices |
+
+## Output Format
+
+```json
+{
+  "log_file": "access.log",
+  "analysis_period": {"from": "2025-01-01", "to": "2025-01-31"},
+  "total_bot_requests": 150000,
+  "bots": {
+    "googlebot": {
+      "requests": 80000,
+      "unique_urls": 12000,
+      "avg_requests_per_day": 2580,
+      "status_distribution": {"200": 70000, "301": 5000, "404": 3000, "500": 2000}
+    },
+    "yeti": {"requests": 35000},
+    "bingbot": {"requests": 20000},
+    "daumoa": {"requests": 15000}
+  },
+  "waste": {
+    "parameter_urls": {"count": 5000, "pct_of_crawls": 3.3},
+    "redirect_chains": {"count": 2000, "pct_of_crawls": 1.3},
+    "soft_404s": {"count": 1500, "pct_of_crawls": 1.0},
+    "total_waste_pct": 8.5
+  },
+  "orphan_pages": {
+    "in_sitemap_not_crawled": [],
+    "crawled_not_in_sitemap": []
+  },
+  "recommendations": [],
+  "efficiency_score": 72,
+  "timestamp": "2025-01-01T00:00:00"
+}
+```
+
+## Limitations
+
+- Requires actual server access logs (not available via standard web crawling)
+- Log format auto-detection may need manual format specification for custom formats
+- CloudFront logs have a different field structure than Nginx/Apache
+- Large log files (>10GB) may need pre-filtering before analysis
+- Bot identification relies on User-Agent strings which can be spoofed
+
+## Notion Output (Required)
+
+All audit reports MUST be saved to the OurDigital SEO Audit Log:
+- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
+- **Category**: Crawl Budget
+- **Audit ID Format**: CRAWL-YYYYMMDD-NNN
+- **Language**: Korean with technical English terms (Crawl Budget, Googlebot, robots.txt)
+
+## Reference Scripts
+
+Located in `code/scripts/`:
+- `log_parser.py` — Server access log parser with bot identification
+- `crawl_budget_analyzer.py` — Crawl budget efficiency analysis
+- `base_client.py` — Shared async client utilities
diff --git a/custom-skills/33-seo-migration-planner/README.md b/custom-skills/33-seo-migration-planner/README.md
new file mode 100644
index 0000000..8e61652
--- /dev/null
+++ b/custom-skills/33-seo-migration-planner/README.md
@@ -0,0 +1,91 @@
+# SEO Migration Planner
+
+SEO 사이트 이전 계획 및 모니터링 도구 - 사전 위험 평가, 리디렉트 매핑, 이전 후 트래픽/인덱싱 추적.
+
+## Overview
+
+Pre-migration risk assessment, redirect mapping, URL inventory, crawl baseline capture, and post-migration traffic/indexation monitoring for site migrations. Supports domain moves, platform changes, URL restructuring, HTTPS migrations, and subdomain consolidation.
+
+## Dual-Platform Structure
+
+```
+33-seo-migration-planner/
+├── code/                              # Claude Code version
+│   ├── CLAUDE.md                      # Action-oriented directive
+│   ├── commands/
+│   │   └── seo-migration-planner.md   # Slash command
+│   └── scripts/
+│       ├── migration_planner.py       # Pre-migration planning
+│       ├── migration_monitor.py       # Post-migration monitoring
+│       ├── base_client.py             # Shared async utilities
+│       └── requirements.txt
+│
+├── desktop/                           # Claude Desktop version
+│   ├── SKILL.md                       # MCP-based workflow
+│   ├── skill.yaml                     # Extended metadata
+│   └── tools/
+│       ├── ahrefs.md                  # Ahrefs MCP tools
+│       ├── firecrawl.md               # Firecrawl MCP tools
+│       └── notion.md                  # Notion MCP tools
+│
+└── README.md
+```
+
+## Quick Start
+
+### Claude Code
+```bash
+/seo-migration-planner https://example.com --type domain-move --new-domain https://new-example.com
+```
+
+### Python Script
+```bash
+pip install -r code/scripts/requirements.txt
+
+# Pre-migration planning
+python code/scripts/migration_planner.py --domain https://example.com --type domain-move --new-domain https://new-example.com --json
+
+# Post-migration monitoring
+python code/scripts/migration_monitor.py --domain https://new-example.com --migration-date 2025-01-15 --baseline baseline.json --json
+```
+
+## Features
+
+### Pre-Migration Planning
+- URL inventory via Firecrawl crawl
+- Ahrefs traffic/keyword/backlink baseline
+- Per-URL risk scoring (0-100)
+- Redirect map generation (301 mappings)
+- Type-specific pre-migration checklist (Korean)
+
+### Post-Migration Monitoring
+- Pre vs post traffic comparison
+- Redirect health check (broken, chains, loops)
+- Indexation change tracking
+- Keyword ranking monitoring
+- Recovery timeline estimation
+- Automated alert generation
+
+## Migration Types
+
+| Type | Description |
+|------|-------------|
+| `domain-move` | Old domain -> new domain |
+| `platform` | CMS/framework migration |
+| `url-restructure` | Path/slug changes |
+| `https` | HTTP -> HTTPS |
+| `subdomain` | Subdomain -> subfolder |
+
+## Notion Output
+
+Reports are saved to the OurDigital SEO Audit Log database:
+- **Title**: `사이트 이전 계획 - [domain] - YYYY-MM-DD`
+- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
+- **Audit ID Format**: MIGR-YYYYMMDD-NNN
+
+## Triggers
+
+- site migration, domain move, redirect mapping
+- platform migration, URL restructuring
+- HTTPS migration, subdomain consolidation
+- 사이트 이전, 도메인 이전, 리디렉트 매핑
diff --git a/custom-skills/33-seo-migration-planner/code/CLAUDE.md b/custom-skills/33-seo-migration-planner/code/CLAUDE.md
new file mode 100644
index 0000000..fcbd167
--- /dev/null
+++ b/custom-skills/33-seo-migration-planner/code/CLAUDE.md
@@ -0,0 +1,150 @@
+# CLAUDE.md
+
+## Overview
+
+SEO site migration planning and monitoring tool for comprehensive pre-migration risk assessment, redirect mapping, URL inventory, crawl baseline capture, and post-migration traffic/indexation monitoring. Supports domain moves, platform changes, URL restructuring, HTTPS migrations, and subdomain consolidation. Captures full URL inventory via Firecrawl crawl, builds traffic/keyword baselines via Ahrefs, generates redirect maps with per-URL risk scoring, and tracks post-launch recovery with automated alerts.
+ +## Quick Start + +```bash +pip install -r scripts/requirements.txt + +# Pre-migration planning +python scripts/migration_planner.py --domain https://example.com --type domain-move --new-domain https://new-example.com --json + +# Post-migration monitoring +python scripts/migration_monitor.py --domain https://new-example.com --migration-date 2025-01-15 --baseline baseline.json --json +``` + +## Scripts + +| Script | Purpose | Key Output | +|--------|---------|------------| +| `migration_planner.py` | Pre-migration baseline + redirect map + risk assessment | URL inventory, redirect map, risk scores, checklist | +| `migration_monitor.py` | Post-migration traffic comparison, redirect health, indexation tracking | Traffic delta, broken redirects, ranking changes, alerts | +| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient | + +## Migration Planner + +```bash +# Domain move planning +python scripts/migration_planner.py --domain https://example.com --type domain-move --new-domain https://new-example.com --json + +# Platform migration (e.g., WordPress to headless) +python scripts/migration_planner.py --domain https://example.com --type platform --json + +# URL restructuring +python scripts/migration_planner.py --domain https://example.com --type url-restructure --json + +# HTTPS migration +python scripts/migration_planner.py --domain http://example.com --type https --json + +# Subdomain consolidation +python scripts/migration_planner.py --domain https://blog.example.com --type subdomain --new-domain https://example.com/blog --json +``` + +**Capabilities**: +- URL inventory via Firecrawl crawl (capture all URLs + status codes) +- Ahrefs top-pages baseline (traffic, keywords per page) +- Redirect map generation (old URL -> new URL mapping) +- Risk scoring per URL (based on traffic + backlinks + keyword rankings) +- Pre-migration checklist generation +- Support for migration types: + - Domain move (old domain -> new domain) + - Platform 
change (CMS/framework swap) + - URL restructuring (path/slug changes) + - HTTPS migration (HTTP -> HTTPS) + - Subdomain consolidation (subdomain -> subfolder) + +## Migration Monitor + +```bash +# Post-launch traffic comparison +python scripts/migration_monitor.py --domain https://new-example.com --migration-date 2025-01-15 --baseline baseline.json --json + +# Monitor with custom period +python scripts/migration_monitor.py --domain https://new-example.com --migration-date 2025-01-15 --baseline baseline.json --json + +# Quick redirect health check +python scripts/migration_monitor.py --domain https://new-example.com --migration-date 2025-01-15 --json +``` + +**Capabilities**: +- Post-launch traffic comparison (pre vs post, by page group) +- Redirect chain/loop detection +- 404 monitoring for high-value pages +- Indexation tracking (indexed pages before vs after) +- Ranking change tracking for priority keywords +- Recovery timeline estimation +- Alert generation for traffic drops >20% + +## Ahrefs MCP Tools Used + +| Tool | Purpose | +|------|---------| +| `site-explorer-metrics` | Current organic metrics (traffic, keywords) | +| `site-explorer-metrics-history` | Historical metrics for pre/post comparison | +| `site-explorer-top-pages` | Top performing pages for baseline | +| `site-explorer-pages-by-traffic` | Pages ranked by traffic for risk scoring | +| `site-explorer-organic-keywords` | Keyword rankings per page | +| `site-explorer-referring-domains` | Referring domains per page for risk scoring | +| `site-explorer-backlinks-stats` | Backlink overview for migration impact | + +## Output Format + +```json +{ + "domain": "example.com", + "migration_type": "domain-move", + "baseline": { + "total_urls": 1250, + "total_traffic": 45000, + "total_keywords": 8500, + "top_pages": [] + }, + "redirect_map": [ + { + "source": "https://example.com/page-1", + "target": "https://new-example.com/page-1", + "status_code": 301, + "priority": "critical" + } + ], + "risk_assessment": 
{ + "high_risk_urls": 45, + "medium_risk_urls": 180, + "low_risk_urls": 1025, + "overall_risk": "medium" + }, + "pre_migration_checklist": [], + "timestamp": "2025-01-01T00:00:00" +} +``` + +## Notion Output (Required) + +**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database. + +### Database Configuration + +| Field | Value | +|-------|-------| +| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` | +| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef | + +### Required Properties + +| Property | Type | Description | +|----------|------|-------------| +| Issue | Title | Report title (Korean + date) | +| Site | URL | Target website URL | +| Category | Select | SEO Migration | +| Priority | Select | Based on risk level | +| Found Date | Date | Report date (YYYY-MM-DD) | +| Audit ID | Rich Text | Format: MIGR-YYYYMMDD-NNN | + +### Language Guidelines + +- Report content in Korean (한국어) +- Keep technical English terms as-is (e.g., Redirect Map, Risk Score, Traffic Baseline, Indexation) +- URLs and code remain unchanged diff --git a/custom-skills/33-seo-migration-planner/code/commands/seo-migration-planner.md b/custom-skills/33-seo-migration-planner/code/commands/seo-migration-planner.md new file mode 100644 index 0000000..41dbf13 --- /dev/null +++ b/custom-skills/33-seo-migration-planner/code/commands/seo-migration-planner.md @@ -0,0 +1,27 @@ +--- +name: seo-migration-planner +description: | + SEO site migration planning and monitoring. Pre-migration risk assessment, redirect mapping, + crawl baseline, and post-migration traffic/indexation monitoring. + Triggers: site migration, domain move, redirect mapping, platform migration, URL restructuring, 사이트 이전. +allowed-tools: + - Bash + - Read + - Write + - WebFetch + - WebSearch +--- + +# SEO Migration Planner + +Run the migration planning or monitoring workflow based on the user's request. 
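+
+The planning and monitoring workflows hand off through the planner's baseline JSON (`--baseline baseline.json`). A minimal sketch of that contract, using field names from the skill's documented Output Format with illustrative sample values, not real data:
+
+```python
+# Illustrative baseline shaped like the skill's documented Output Format.
+baseline_raw = {
+    "domain": "example.com",
+    "migration_type": "domain-move",
+    "baseline": {"total_urls": 1250, "total_traffic": 45000, "total_keywords": 8500},
+    "redirect_map": [
+        {
+            "source": "https://example.com/page-1",
+            "target": "https://new-example.com/page-1",
+            "status_code": 301,
+            "priority": "critical",
+        }
+    ],
+}
+
+# migration_monitor.py reads the nested "baseline" block (falling back to the
+# top level) and reduces the redirect map to source/target pairs.
+baseline = baseline_raw.get("baseline", baseline_raw)
+redirect_map = [
+    {"source": r.get("source", ""), "target": r.get("target", "")}
+    for r in baseline_raw.get("redirect_map", [])
+]
+
+print(baseline["total_urls"])     # 1250
+print(redirect_map[0]["target"])  # https://new-example.com/page-1
+```
+
+If `--baseline` is omitted, the monitor skips the redirect health check and reports traffic, indexation, and rankings only.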
+ +## Pre-Migration Planning +```bash +python custom-skills/33-seo-migration-planner/code/scripts/migration_planner.py --domain [URL] --type [TYPE] --json +``` + +## Post-Migration Monitoring +```bash +python custom-skills/33-seo-migration-planner/code/scripts/migration_monitor.py --domain [URL] --migration-date [DATE] --json +``` diff --git a/custom-skills/33-seo-migration-planner/code/scripts/base_client.py b/custom-skills/33-seo-migration-planner/code/scripts/base_client.py new file mode 100644 index 0000000..132e2c2 --- /dev/null +++ b/custom-skills/33-seo-migration-planner/code/scripts/base_client.py @@ -0,0 +1,172 @@ +""" +Base Client - Shared async client utilities +=========================================== +Purpose: Rate-limited async operations for API clients +Python: 3.10+ +""" + +import asyncio +import logging +import os +from asyncio import Semaphore +from datetime import datetime +from typing import Any, Callable, TypeVar + +from dotenv import load_dotenv +from tenacity import ( + retry, + stop_after_attempt, + wait_exponential, + retry_if_exception_type, +) + +# Load environment variables +load_dotenv() + +# Logging setup +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(levelname)s - %(message)s", +) + +T = TypeVar("T") + + +class RateLimiter: + """Rate limiter using token bucket algorithm.""" + + def __init__(self, rate: float, per: float = 1.0): + self.rate = rate + self.per = per + self.tokens = rate + self.last_update = datetime.now() + self._lock = asyncio.Lock() + + async def acquire(self) -> None: + async with self._lock: + now = datetime.now() + elapsed = (now - self.last_update).total_seconds() + self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per)) + self.last_update = now + + if self.tokens < 1: + wait_time = (1 - self.tokens) * (self.per / self.rate) + await asyncio.sleep(wait_time) + self.tokens = 0 + else: + self.tokens -= 1 + + +class BaseAsyncClient: + """Base class for async API clients 
with rate limiting.""" + + def __init__( + self, + max_concurrent: int = 5, + requests_per_second: float = 3.0, + logger: logging.Logger | None = None, + ): + self.semaphore = Semaphore(max_concurrent) + self.rate_limiter = RateLimiter(requests_per_second) + self.logger = logger or logging.getLogger(self.__class__.__name__) + self.stats = { + "requests": 0, + "success": 0, + "errors": 0, + "retries": 0, + } + + @retry( + stop=stop_after_attempt(3), + wait=wait_exponential(multiplier=1, min=2, max=10), + retry=retry_if_exception_type(Exception), + ) + async def _rate_limited_request( + self, + coro: Callable[[], Any], + ) -> Any: + async with self.semaphore: + await self.rate_limiter.acquire() + self.stats["requests"] += 1 + try: + result = await coro() + self.stats["success"] += 1 + return result + except Exception as e: + self.stats["errors"] += 1 + self.logger.error(f"Request failed: {e}") + raise + + async def batch_requests( + self, + requests: list[Callable[[], Any]], + desc: str = "Processing", + ) -> list[Any]: + try: + from tqdm.asyncio import tqdm + has_tqdm = True + except ImportError: + has_tqdm = False + + async def execute(req: Callable) -> Any: + try: + return await self._rate_limited_request(req) + except Exception as e: + return {"error": str(e)} + + tasks = [execute(req) for req in requests] + + if has_tqdm: + results = [] + for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc): + result = await coro + results.append(result) + return results + else: + return await asyncio.gather(*tasks, return_exceptions=True) + + def print_stats(self) -> None: + self.logger.info("=" * 40) + self.logger.info("Request Statistics:") + self.logger.info(f" Total Requests: {self.stats['requests']}") + self.logger.info(f" Successful: {self.stats['success']}") + self.logger.info(f" Errors: {self.stats['errors']}") + self.logger.info("=" * 40) + + +class ConfigManager: + """Manage API configuration and credentials.""" + + def __init__(self): + load_dotenv() + + 
@property + def google_credentials_path(self) -> str | None: + seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json") + if os.path.exists(seo_creds): + return seo_creds + return os.getenv("GOOGLE_APPLICATION_CREDENTIALS") + + @property + def pagespeed_api_key(self) -> str | None: + return os.getenv("PAGESPEED_API_KEY") + + @property + def notion_token(self) -> str | None: + return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY") + + def validate_google_credentials(self) -> bool: + creds_path = self.google_credentials_path + if not creds_path: + return False + return os.path.exists(creds_path) + + def get_required(self, key: str) -> str: + value = os.getenv(key) + if not value: + raise ValueError(f"Missing required environment variable: {key}") + return value + + +# Singleton config instance +config = ConfigManager() diff --git a/custom-skills/33-seo-migration-planner/code/scripts/migration_monitor.py b/custom-skills/33-seo-migration-planner/code/scripts/migration_monitor.py new file mode 100644 index 0000000..f9a80c9 --- /dev/null +++ b/custom-skills/33-seo-migration-planner/code/scripts/migration_monitor.py @@ -0,0 +1,909 @@ +""" +Migration Monitor - Post-Migration Traffic & Indexation Monitoring +================================================================== +Purpose: Post-migration traffic comparison, redirect health checks, + indexation tracking, ranking change monitoring, and alert generation. 
+Python: 3.10+ + +Usage: + python migration_monitor.py --domain https://new-example.com --migration-date 2025-01-15 --baseline baseline.json --json + python migration_monitor.py --domain https://new-example.com --migration-date 2025-01-15 --json +""" + +import argparse +import asyncio +import json +import logging +import sys +from dataclasses import dataclass, field, asdict +from datetime import datetime, timedelta +from typing import Any +from urllib.parse import urlparse + +from base_client import BaseAsyncClient, config + +logger = logging.getLogger(__name__) + + +# --------------------------------------------------------------------------- +# Data classes +# --------------------------------------------------------------------------- + +@dataclass +class TrafficComparison: + """Traffic comparison between pre- and post-migration periods.""" + page_group: str = "" + pre_traffic: int = 0 + post_traffic: int = 0 + change_pct: float = 0.0 + change_absolute: int = 0 + status: str = "stable" # improved / stable / declined / critical + + +@dataclass +class RedirectHealth: + """Health status of a single redirect.""" + source: str = "" + target: str = "" + status_code: int = 0 + chain_length: int = 0 + is_broken: bool = False + final_url: str = "" + error: str = "" + + +@dataclass +class IndexationStatus: + """Indexation comparison before and after migration.""" + pre_count: int = 0 + post_count: int = 0 + change_pct: float = 0.0 + missing_pages: list[str] = field(default_factory=list) + new_pages: list[str] = field(default_factory=list) + deindexed_count: int = 0 + + +@dataclass +class RankingChange: + """Ranking change for a keyword.""" + keyword: str = "" + pre_position: int = 0 + post_position: int = 0 + change: int = 0 + url: str = "" + search_volume: int = 0 + + +@dataclass +class MigrationAlert: + """Alert for significant post-migration issues.""" + alert_type: str = "" # traffic_drop, redirect_broken, indexation_drop, ranking_loss + severity: str = "info" # info / 
warning / critical + message: str = "" + metric_value: float = 0.0 + threshold: float = 0.0 + affected_urls: list[str] = field(default_factory=list) + + +@dataclass +class MigrationReport: + """Complete post-migration monitoring report.""" + domain: str = "" + migration_date: str = "" + days_since_migration: int = 0 + traffic_comparison: list[TrafficComparison] = field(default_factory=list) + redirect_health: list[RedirectHealth] = field(default_factory=list) + indexation: IndexationStatus | None = None + ranking_changes: list[RankingChange] = field(default_factory=list) + recovery_estimate: dict[str, Any] = field(default_factory=dict) + alerts: list[MigrationAlert] = field(default_factory=list) + timestamp: str = "" + errors: list[str] = field(default_factory=list) + + +# --------------------------------------------------------------------------- +# Monitor +# --------------------------------------------------------------------------- + +class MigrationMonitor(BaseAsyncClient): + """Monitors post-migration SEO health using Ahrefs and Firecrawl MCP tools.""" + + # Alert thresholds + TRAFFIC_DROP_WARNING = 0.20 # 20% drop + TRAFFIC_DROP_CRITICAL = 0.40 # 40% drop + RANKING_DROP_THRESHOLD = 5 # 5+ position drop + INDEXATION_DROP_WARNING = 0.10 # 10% indexation loss + + def __init__(self): + super().__init__(max_concurrent=5, requests_per_second=2.0) + + @staticmethod + def _extract_domain(url: str) -> str: + """Extract bare domain from URL or return as-is if already bare.""" + if "://" in url: + parsed = urlparse(url) + return parsed.netloc.lower().replace("www.", "") + return url.lower().replace("www.", "") + + async def _call_ahrefs(self, tool: str, params: dict[str, Any]) -> dict: + """Simulate Ahrefs MCP call. 
In production, routed via MCP bridge."""
+ self.logger.info(f"Ahrefs MCP call: {tool} | params={params}")
+ return {"tool": tool, "params": params, "data": {}}
+
+ async def _call_firecrawl(self, tool: str, params: dict[str, Any]) -> dict:
+ """Simulate Firecrawl MCP call. In production, routed via MCP bridge."""
+ self.logger.info(f"Firecrawl MCP call: {tool} | params={params}")
+ return {"tool": tool, "params": params, "data": {}}
+
+ # ------------------------------------------------------------------
+ # Traffic Comparison
+ # ------------------------------------------------------------------
+
+ async def compare_traffic(
+ self, domain: str, migration_date: str
+ ) -> list[TrafficComparison]:
+ """Compare traffic before and after migration date."""
+ domain = self._extract_domain(domain)
+ mig_date = datetime.strptime(migration_date, "%Y-%m-%d")
+ days_since = (datetime.now() - mig_date).days
+
+ # Pre-migration window: same length as the post period, minimum 30 days
+ pre_start = (mig_date - timedelta(days=max(days_since, 30))).strftime("%Y-%m-%d")
+ pre_end = (mig_date - timedelta(days=1)).strftime("%Y-%m-%d")
+ post_start = migration_date
+ post_end = datetime.now().strftime("%Y-%m-%d")
+
+ self.logger.info(
+ f"Comparing traffic for {domain}: "
+ f"pre={pre_start}..{pre_end} vs post={post_start}..{post_end}"
+ )
+
+ # Fetch pre-migration metrics history
+ pre_resp = await self._call_ahrefs(
+ "site-explorer-metrics-history",
+ {"target": domain, "date_from": pre_start, "date_to": pre_end},
+ )
+ pre_data = pre_resp.get("data", {}).get("data_points", [])
+
+ # Fetch post-migration metrics history
+ post_resp = await self._call_ahrefs(
+ "site-explorer-metrics-history",
+ {"target": domain, "date_from": post_start, "date_to": post_end},
+ )
+ post_data = post_resp.get("data", {}).get("data_points", [])
+
+ # Calculate averages
+ pre_avg_traffic = 0
+ if pre_data:
+ pre_avg_traffic = int(
+ sum(int(p.get("organic_traffic", 0)) for p in pre_data) / len(pre_data)
+ )
+
+
post_avg_traffic = 0 + if post_data: + post_avg_traffic = int( + sum(int(p.get("organic_traffic", 0)) for p in post_data) / len(post_data) + ) + + # Overall comparison + change_pct = 0.0 + if pre_avg_traffic > 0: + change_pct = ((post_avg_traffic - pre_avg_traffic) / pre_avg_traffic) * 100 + + status = "stable" + if change_pct > 5: + status = "improved" + elif change_pct < -40: + status = "critical" + elif change_pct < -20: + status = "declined" + + comparisons = [ + TrafficComparison( + page_group="Overall", + pre_traffic=pre_avg_traffic, + post_traffic=post_avg_traffic, + change_pct=round(change_pct, 2), + change_absolute=post_avg_traffic - pre_avg_traffic, + status=status, + ) + ] + + # Fetch top pages comparison + pre_pages_resp = await self._call_ahrefs( + "site-explorer-pages-by-traffic", + {"target": domain, "limit": 50}, + ) + top_pages = pre_pages_resp.get("data", {}).get("pages", []) + + for page in top_pages[:20]: + page_url = page.get("url", "") + page_traffic = int(page.get("traffic", 0)) + # In production, would compare with baseline data + comparisons.append( + TrafficComparison( + page_group=page_url, + pre_traffic=0, # Would be populated from baseline + post_traffic=page_traffic, + change_pct=0.0, + change_absolute=0, + status="stable", + ) + ) + + self.logger.info( + f"Traffic comparison for {domain}: " + f"pre={pre_avg_traffic:,} -> post={post_avg_traffic:,} " + f"({change_pct:+.1f}%)" + ) + return comparisons + + # ------------------------------------------------------------------ + # Redirect Health Check + # ------------------------------------------------------------------ + + async def check_redirects( + self, redirect_map: list[dict[str, str]] + ) -> list[RedirectHealth]: + """Verify redirect health: check for broken redirects, chains, and loops.""" + health_results: list[RedirectHealth] = [] + + self.logger.info(f"Checking {len(redirect_map)} redirects for health...") + + for entry in redirect_map: + source = entry.get("source", "") + 
expected_target = entry.get("target", "") + + if not source: + continue + + # Use Firecrawl to check the redirect + resp = await self._call_firecrawl( + "firecrawl_scrape", + {"url": source, "formats": ["links"]}, + ) + + result_data = resp.get("data", {}) + final_url = result_data.get("final_url", "") + status_code = int(result_data.get("status_code", 0)) + redirect_chain = result_data.get("redirect_chain", []) + chain_length = len(redirect_chain) + + is_broken = ( + status_code >= 400 + or status_code == 0 + or (final_url and final_url != expected_target and status_code != 301) + ) + + health = RedirectHealth( + source=source, + target=expected_target, + status_code=status_code, + chain_length=chain_length, + is_broken=is_broken, + final_url=final_url, + error="" if not is_broken else f"Expected {expected_target}, got {final_url} ({status_code})", + ) + health_results.append(health) + + broken_count = sum(1 for h in health_results if h.is_broken) + chain_count = sum(1 for h in health_results if h.chain_length > 1) + + self.logger.info( + f"Redirect health check complete: " + f"{broken_count} broken, {chain_count} chains detected " + f"out of {len(health_results)} redirects" + ) + return health_results + + # ------------------------------------------------------------------ + # Indexation Tracking + # ------------------------------------------------------------------ + + async def track_indexation( + self, domain: str, pre_baseline: dict[str, Any] | None = None + ) -> IndexationStatus: + """Compare indexed pages before and after migration.""" + domain = self._extract_domain(domain) + + self.logger.info(f"Tracking indexation for {domain}") + + # Fetch current metrics + metrics_resp = await self._call_ahrefs( + "site-explorer-metrics", {"target": domain} + ) + current_pages = int(metrics_resp.get("data", {}).get("pages", 0)) + + # Get pre-migration count from baseline + pre_count = 0 + if pre_baseline: + pre_count = int(pre_baseline.get("total_urls", 0)) + + 
change_pct = 0.0 + if pre_count > 0: + change_pct = ((current_pages - pre_count) / pre_count) * 100 + + # Fetch current top pages to detect missing ones + pages_resp = await self._call_ahrefs( + "site-explorer-top-pages", {"target": domain, "limit": 500} + ) + current_page_urls = set() + for page in pages_resp.get("data", {}).get("pages", []): + url = page.get("url", "") + if url: + current_page_urls.add(url) + + # Compare with baseline URL inventory + missing_pages: list[str] = [] + if pre_baseline: + baseline_urls = pre_baseline.get("url_inventory", []) + for url_entry in baseline_urls: + url = url_entry if isinstance(url_entry, str) else url_entry.get("url", "") + if url and url not in current_page_urls: + missing_pages.append(url) + + status = IndexationStatus( + pre_count=pre_count, + post_count=current_pages, + change_pct=round(change_pct, 2), + missing_pages=missing_pages[:100], # Cap at 100 for readability + deindexed_count=len(missing_pages), + ) + + self.logger.info( + f"Indexation for {domain}: " + f"pre={pre_count:,} -> post={current_pages:,} " + f"({change_pct:+.1f}%), {len(missing_pages)} missing" + ) + return status + + # ------------------------------------------------------------------ + # Ranking Tracking + # ------------------------------------------------------------------ + + async def track_rankings( + self, domain: str, priority_keywords: list[str] | None = None + ) -> list[RankingChange]: + """Track ranking changes for priority keywords.""" + domain = self._extract_domain(domain) + + self.logger.info(f"Tracking rankings for {domain}") + + # Fetch current keyword rankings + kw_resp = await self._call_ahrefs( + "site-explorer-organic-keywords", + {"target": domain, "limit": 200}, + ) + current_keywords = kw_resp.get("data", {}).get("keywords", []) + + ranking_changes: list[RankingChange] = [] + for kw_data in current_keywords: + keyword = kw_data.get("keyword", "") + + # If priority keywords specified, filter + if priority_keywords and 
keyword.lower() not in [k.lower() for k in priority_keywords]:
+ continue
+
+ current_pos = int(kw_data.get("position", 0))
+ previous_pos = int(kw_data.get("previous_position", current_pos))
+ volume = int(kw_data.get("search_volume", 0))
+ url = kw_data.get("url", "")
+
+ change = previous_pos - current_pos # Positive = improved
+
+ ranking_changes.append(
+ RankingChange(
+ keyword=keyword,
+ pre_position=previous_pos,
+ post_position=current_pos,
+ change=change,
+ url=url,
+ search_volume=volume,
+ )
+ )
+
+ # Sort by signed change, ascending (biggest drops first)
+ ranking_changes.sort(key=lambda r: r.change)
+
+ self.logger.info(
+ f"Tracked {len(ranking_changes)} keyword rankings for {domain}"
+ )
+ return ranking_changes
+
+ # ------------------------------------------------------------------
+ # Recovery Estimation
+ # ------------------------------------------------------------------
+
+ def estimate_recovery(
+ self, traffic_data: list[TrafficComparison], migration_type: str = "domain-move"
+ ) -> dict[str, Any]:
+ """Estimate recovery timeline based on traffic comparison data."""
+ overall = next(
+ (t for t in traffic_data if t.page_group == "Overall"), None
+ )
+
+ if not overall:
+ return {
+ "estimated_weeks": "unknown",
+ "confidence": "low",
+ "message": "트래픽 데이터 부족으로 회복 기간 추정 불가",
+ }
+
+ change_pct = overall.change_pct
+
+ # Base recovery timelines by migration type (weeks)
+ base_timelines = {
+ "domain-move": 16, # 4 months
+ "platform": 8, # 2 months
+ "url-restructure": 12, # 3 months
+ "https": 4, # 1 month
+ "subdomain": 10, # 2.5 months
+ }
+ base_weeks = base_timelines.get(migration_type, 12)
+
+ if change_pct >= 0:
+ # No traffic drop — recovery already achieved or in progress
+ return {
+ "estimated_weeks": 0,
+ "confidence": "high",
+ "current_recovery_pct": 100.0,
+ "message": "트래픽 손실 없음 — 이전 성공적으로 진행 중",
+ }
+ elif change_pct > -20:
+ # Minor drop — quick recovery expected
+ estimated_weeks = max(int(base_weeks * 0.5), 2)
+ confidence =
"high" + recovery_pct = round(100 + change_pct, 1) + elif change_pct > -40: + # Moderate drop — standard recovery timeline + estimated_weeks = base_weeks + confidence = "medium" + recovery_pct = round(100 + change_pct, 1) + else: + # Severe drop — extended recovery + estimated_weeks = int(base_weeks * 1.5) + confidence = "low" + recovery_pct = round(100 + change_pct, 1) + + return { + "estimated_weeks": estimated_weeks, + "confidence": confidence, + "current_recovery_pct": recovery_pct, + "traffic_change_pct": change_pct, + "migration_type": migration_type, + "message": ( + f"현재 트래픽 {change_pct:+.1f}% 변동. " + f"예상 회복 기간: {estimated_weeks}주 (신뢰도: {confidence}). " + f"현재 회복률: {recovery_pct:.1f}%" + ), + } + + # ------------------------------------------------------------------ + # Alert Generation + # ------------------------------------------------------------------ + + def generate_alerts(self, report: MigrationReport) -> list[MigrationAlert]: + """Generate alerts for significant post-migration issues.""" + alerts: list[MigrationAlert] = [] + + # Traffic drop alerts + for tc in report.traffic_comparison: + if tc.page_group == "Overall": + abs_change = abs(tc.change_pct) / 100.0 + if tc.change_pct < 0 and abs_change >= self.TRAFFIC_DROP_CRITICAL: + alerts.append(MigrationAlert( + alert_type="traffic_drop", + severity="critical", + message=( + f"심각한 트래픽 하락: {tc.change_pct:+.1f}% " + f"(이전 전 {tc.pre_traffic:,} -> 이전 후 {tc.post_traffic:,})" + ), + metric_value=tc.change_pct, + threshold=-self.TRAFFIC_DROP_CRITICAL * 100, + )) + elif tc.change_pct < 0 and abs_change >= self.TRAFFIC_DROP_WARNING: + alerts.append(MigrationAlert( + alert_type="traffic_drop", + severity="warning", + message=( + f"트래픽 하락 감지: {tc.change_pct:+.1f}% " + f"(이전 전 {tc.pre_traffic:,} -> 이전 후 {tc.post_traffic:,})" + ), + metric_value=tc.change_pct, + threshold=-self.TRAFFIC_DROP_WARNING * 100, + )) + + # Broken redirect alerts + broken_redirects = [r for r in report.redirect_health if r.is_broken] + 
if broken_redirects: + severity = "critical" if len(broken_redirects) > 10 else "warning" + alerts.append(MigrationAlert( + alert_type="redirect_broken", + severity=severity, + message=( + f"깨진 리디렉트 {len(broken_redirects)}건 감지. " + f"고가치 페이지의 링크 에퀴티 손실 위험." + ), + metric_value=float(len(broken_redirects)), + threshold=1.0, + affected_urls=[r.source for r in broken_redirects[:20]], + )) + + # Redirect chain alerts + chain_redirects = [r for r in report.redirect_health if r.chain_length > 1] + if chain_redirects: + alerts.append(MigrationAlert( + alert_type="redirect_chain", + severity="warning", + message=( + f"리디렉트 체인 {len(chain_redirects)}건 감지. " + f"크롤 효율성 및 링크 에퀴티에 영향." + ), + metric_value=float(len(chain_redirects)), + threshold=1.0, + affected_urls=[r.source for r in chain_redirects[:20]], + )) + + # Indexation drop alerts + if report.indexation: + idx = report.indexation + if idx.pre_count > 0: + idx_drop = abs(idx.change_pct) / 100.0 + if idx.change_pct < 0 and idx_drop >= self.INDEXATION_DROP_WARNING: + alerts.append(MigrationAlert( + alert_type="indexation_drop", + severity="warning" if idx_drop < 0.30 else "critical", + message=( + f"인덱싱 감소: {idx.change_pct:+.1f}% " + f"(이전 전 {idx.pre_count:,} -> 이전 후 {idx.post_count:,}페이지). 
" + f"디인덱싱된 페이지: {idx.deindexed_count}건" + ), + metric_value=idx.change_pct, + threshold=-self.INDEXATION_DROP_WARNING * 100, + affected_urls=idx.missing_pages[:20], + )) + + # Ranking loss alerts + significant_drops = [ + r for r in report.ranking_changes + if r.change < -self.RANKING_DROP_THRESHOLD and r.search_volume > 100 + ] + if significant_drops: + alerts.append(MigrationAlert( + alert_type="ranking_loss", + severity="warning" if len(significant_drops) < 20 else "critical", + message=( + f"주요 키워드 {len(significant_drops)}개의 순위 하락 감지 " + f"(5포지션 이상 하락, 검색량 100+)" + ), + metric_value=float(len(significant_drops)), + threshold=float(self.RANKING_DROP_THRESHOLD), + affected_urls=[r.url for r in significant_drops[:20]], + )) + + # Sort alerts by severity + severity_order = {"critical": 0, "warning": 1, "info": 2} + alerts.sort(key=lambda a: severity_order.get(a.severity, 3)) + + self.logger.info(f"Generated {len(alerts)} migration alerts") + return alerts + + # ------------------------------------------------------------------ + # Orchestrator + # ------------------------------------------------------------------ + + async def run( + self, + domain: str, + migration_date: str, + baseline_file: str | None = None, + migration_type: str = "domain-move", + ) -> MigrationReport: + """Orchestrate full post-migration monitoring pipeline.""" + timestamp = datetime.now().isoformat() + mig_date = datetime.strptime(migration_date, "%Y-%m-%d") + days_since = (datetime.now() - mig_date).days + + report = MigrationReport( + domain=self._extract_domain(domain), + migration_date=migration_date, + days_since_migration=days_since, + timestamp=timestamp, + ) + + # Load baseline if provided + baseline: dict[str, Any] | None = None + redirect_map_data: list[dict[str, str]] = [] + if baseline_file: + try: + with open(baseline_file, "r", encoding="utf-8") as f: + baseline_raw = json.load(f) + baseline = baseline_raw.get("baseline", baseline_raw) + redirect_map_data = [ + {"source": 
r.get("source", ""), "target": r.get("target", "")} + for r in baseline_raw.get("redirect_map", []) + ] + self.logger.info(f"Loaded baseline from {baseline_file}") + except Exception as e: + msg = f"Failed to load baseline file: {e}" + self.logger.error(msg) + report.errors.append(msg) + + try: + # Step 1: Traffic comparison + self.logger.info("Step 1/5: Comparing pre/post traffic...") + report.traffic_comparison = await self.compare_traffic( + domain, migration_date + ) + + # Step 2: Redirect health check + if redirect_map_data: + self.logger.info("Step 2/5: Checking redirect health...") + report.redirect_health = await self.check_redirects(redirect_map_data) + else: + self.logger.info( + "Step 2/5: Skipping redirect check (no baseline redirect map)" + ) + + # Step 3: Indexation tracking + self.logger.info("Step 3/5: Tracking indexation changes...") + report.indexation = await self.track_indexation(domain, baseline) + + # Step 4: Ranking tracking + self.logger.info("Step 4/5: Tracking keyword rankings...") + report.ranking_changes = await self.track_rankings(domain) + + # Step 5: Recovery estimation + self.logger.info("Step 5/5: Estimating recovery timeline...") + report.recovery_estimate = self.estimate_recovery( + report.traffic_comparison, migration_type + ) + + # Generate alerts + report.alerts = self.generate_alerts(report) + + self.logger.info( + f"Migration monitoring complete: " + f"{days_since} days since migration, " + f"{len(report.alerts)} alerts generated" + ) + + except Exception as e: + msg = f"Migration monitoring pipeline error: {e}" + self.logger.error(msg) + report.errors.append(msg) + + return report + + +# --------------------------------------------------------------------------- +# Output helpers +# --------------------------------------------------------------------------- + +def _format_text_report(report: MigrationReport) -> str: + """Format monitoring report as human-readable text.""" + lines: list[str] = [] + lines.append("=" * 70) + 
lines.append(" SEO MIGRATION MONITORING REPORT")
+    lines.append(f" Domain: {report.domain}")
+    lines.append(f" Migration Date: {report.migration_date}")
+    lines.append(f" Days Since Migration: {report.days_since_migration}")
+    lines.append(f" Generated: {report.timestamp}")
+    lines.append("=" * 70)
+
+    # Alerts
+    if report.alerts:
+        lines.append("")
+        lines.append("--- ALERTS ---")
+        for alert in report.alerts:
+            icon = {"critical": "[!]", "warning": "[*]", "info": "[-]"}.get(
+                alert.severity, "[-]"
+            )
+            lines.append(f" {icon} [{alert.severity.upper()}] {alert.message}")
+            if alert.affected_urls:
+                for url in alert.affected_urls[:5]:
+                    lines.append(f"     - {url}")
+                if len(alert.affected_urls) > 5:
+                    lines.append(f"     ... and {len(alert.affected_urls) - 5} more")
+
+    # Traffic comparison
+    if report.traffic_comparison:
+        lines.append("")
+        lines.append("--- TRAFFIC COMPARISON ---")
+        lines.append(
+            f" {'Page Group':<40} {'Pre':>10} {'Post':>10} {'Change':>10} {'Status':>10}"
+        )
+        lines.append(" " + "-" * 83)
+        for tc in report.traffic_comparison:
+            group = tc.page_group[:38]
+            lines.append(
+                f" {group:<40} {tc.pre_traffic:>10,} {tc.post_traffic:>10,} "
+                f"{tc.change_pct:>+9.1f}% {tc.status:>10}"
+            )
+
+    # Redirect health
+    if report.redirect_health:
+        broken = [r for r in report.redirect_health if r.is_broken]
+        chains = [r for r in report.redirect_health if r.chain_length > 1]
+        healthy = [r for r in report.redirect_health if not r.is_broken and r.chain_length <= 1]
+
+        lines.append("")
+        lines.append("--- REDIRECT HEALTH ---")
+        lines.append(f" Total Redirects: {len(report.redirect_health):,}")
+        lines.append(f" Healthy: {len(healthy):,}")
+        lines.append(f" Broken: {len(broken):,}")
+        lines.append(f" Chains (>1 hop): {len(chains):,}")
+
+        if broken:
+            lines.append("")
+            lines.append(" Broken Redirects:")
+            for r in broken[:10]:
+                lines.append(f"   [{r.status_code}] {r.source} -> {r.target}")
+                if r.error:
+                    lines.append(f"     Error: {r.error}")
+
+    # Indexation
+    if report.indexation:
+        idx = report.indexation
+        lines.append("")
+        lines.append("--- INDEXATION STATUS ---")
+        lines.append(f" Pre-Migration Pages: {idx.pre_count:,}")
+        lines.append(f" Post-Migration Pages: {idx.post_count:,}")
+        lines.append(f" Change: {idx.change_pct:+.1f}%")
+        lines.append(f" De-indexed Pages: {idx.deindexed_count:,}")
+
+        if idx.missing_pages:
+            lines.append("")
+            lines.append(" Missing Pages (top 10):")
+            for page in idx.missing_pages[:10]:
+                lines.append(f"   - {page}")
+
+    # Ranking changes
+    if report.ranking_changes:
+        lines.append("")
+        lines.append("--- RANKING CHANGES ---")
+        drops = [r for r in report.ranking_changes if r.change < 0]
+        gains = [r for r in report.ranking_changes if r.change > 0]
+
+        lines.append(f" Total Tracked: {len(report.ranking_changes)}")
+        lines.append(f" Improved: {len(gains)}")
+        lines.append(f" Declined: {len(drops)}")
+
+        if drops:
+            lines.append("")
+            lines.append(" Biggest Drops:")
+            lines.append(
+                f" {'Keyword':<30} {'Pre':>6} {'Post':>6} {'Change':>8} {'Volume':>8}"
+            )
+            lines.append(" " + "-" * 61)
+            for r in drops[:15]:
+                kw = r.keyword[:28]
+                lines.append(
+                    f" {kw:<30} {r.pre_position:>6} {r.post_position:>6} "
+                    f"{r.change:>+7} {r.search_volume:>8,}"
+                )
+
+    # Recovery estimate
+    if report.recovery_estimate:
+        est = report.recovery_estimate
+        lines.append("")
+        lines.append("--- RECOVERY ESTIMATE ---")
+        lines.append(f" {est.get('message', 'N/A')}")
+        weeks = est.get("estimated_weeks", "unknown")
+        confidence = est.get("confidence", "unknown")
+        lines.append(f" Estimated Weeks: {weeks}")
+        lines.append(f" Confidence: {confidence}")
+
+    if report.errors:
+        lines.append("")
+        lines.append("--- ERRORS ---")
+        for err in report.errors:
+            lines.append(f" - {err}")
+
+    lines.append("")
+    lines.append("=" * 70)
+    return "\n".join(lines)
+
+
+def _serialize_report(report: MigrationReport) -> dict:
+    """Convert report to JSON-serializable dict."""
+    output: dict[str, Any] = {
+        "domain": report.domain,
+        "migration_date": report.migration_date,
+        "days_since_migration": report.days_since_migration,
+        "traffic_comparison": [asdict(t) for t in report.traffic_comparison],
+        "redirect_health": [asdict(r) for r in report.redirect_health],
+        "indexation": asdict(report.indexation) if report.indexation else None,
+        "ranking_changes": [asdict(r) for r in report.ranking_changes],
+        "recovery_estimate": report.recovery_estimate,
+        "alerts": [asdict(a) for a in report.alerts],
+        "timestamp": report.timestamp,
+    }
+    if report.errors:
+        output["errors"] = report.errors
+    return output
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Migration Monitor - Post-migration SEO monitoring and alerting",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""\
+Examples:
+  python migration_monitor.py --domain https://new-example.com --migration-date 2025-01-15 --baseline baseline.json --json
+  python migration_monitor.py --domain https://new-example.com --migration-date 2025-01-15 --json
+        """,
+    )
+    parser.add_argument(
+        "--domain",
+        required=True,
+        help="Domain to monitor (post-migration URL)",
+    )
+    parser.add_argument(
+        "--migration-date",
+        required=True,
+        help="Migration date in YYYY-MM-DD format",
+    )
+    parser.add_argument(
+        "--baseline",
+        type=str,
+        default=None,
+        help="Path to baseline JSON file from migration_planner.py",
+    )
+    parser.add_argument(
+        "--type",
+        choices=["domain-move", "platform", "url-restructure", "https", "subdomain"],
+        default="domain-move",
+        help="Migration type for recovery estimation (default: domain-move)",
+    )
+    parser.add_argument(
+        "--json",
+        action="store_true",
+        default=False,
+        help="Output in JSON format",
+    )
+    parser.add_argument(
+        "--output",
+        type=str,
+        default=None,
+        help="Save output to file path",
+    )
+    return parser.parse_args(argv)
+
+
+async def async_main(args: argparse.Namespace) -> None:
+    monitor = MigrationMonitor()
+
+    report = await monitor.run(
+        domain=args.domain,
+        migration_date=args.migration_date,
+        baseline_file=args.baseline,
+        migration_type=args.type,
+    )
+
+    if args.json:
+        output_str = json.dumps(_serialize_report(report), indent=2, ensure_ascii=False)
+    else:
+        output_str = _format_text_report(report)
+
+    if args.output:
+        with open(args.output, "w", encoding="utf-8") as f:
+            f.write(output_str)
+        logger.info(f"Migration report saved to {args.output}")
+    else:
+        print(output_str)
+
+    monitor.print_stats()
+
+
+def main() -> None:
+    args = parse_args()
+    asyncio.run(async_main(args))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/custom-skills/33-seo-migration-planner/code/scripts/migration_planner.py b/custom-skills/33-seo-migration-planner/code/scripts/migration_planner.py
new file mode 100644
index 0000000..694cc61
--- /dev/null
+++ b/custom-skills/33-seo-migration-planner/code/scripts/migration_planner.py
@@ -0,0 +1,754 @@
+"""
+Migration Planner - SEO Site Migration Planning
+================================================
+Purpose: Pre-migration risk assessment, redirect mapping, URL inventory,
+         crawl baseline capture, and checklist generation for site migrations.
+Python: 3.10+ + +Usage: + python migration_planner.py --domain https://example.com --type domain-move --new-domain https://new-example.com --json + python migration_planner.py --domain https://example.com --type platform --json + python migration_planner.py --domain https://example.com --type url-restructure --json + python migration_planner.py --domain http://example.com --type https --json + python migration_planner.py --domain https://blog.example.com --type subdomain --new-domain https://example.com/blog --json +""" + +import argparse +import asyncio +import json +import logging +import sys +from dataclasses import dataclass, field, asdict +from datetime import datetime +from typing import Any +from urllib.parse import urlparse + +from base_client import BaseAsyncClient, config + +logger = logging.getLogger(__name__) + + +# --------------------------------------------------------------------------- +# Data classes +# --------------------------------------------------------------------------- + +@dataclass +class MigrationURL: + """A single URL in the migration inventory with associated metrics.""" + url: str = "" + traffic: int = 0 + keywords: int = 0 + backlinks: int = 0 + risk_score: float = 0.0 + redirect_target: str = "" + status_code: int = 200 + priority: str = "low" # critical / high / medium / low + + +@dataclass +class MigrationBaseline: + """Pre-migration baseline snapshot of the site.""" + domain: str = "" + total_urls: int = 0 + total_traffic: int = 0 + total_keywords: int = 0 + total_referring_domains: int = 0 + top_pages: list[dict[str, Any]] = field(default_factory=list) + url_inventory: list[MigrationURL] = field(default_factory=list) + + +@dataclass +class RedirectMap: + """A single redirect mapping entry.""" + source: str = "" + target: str = "" + status_code: int = 301 + priority: str = "low" # critical / high / medium / low + risk_score: float = 0.0 + + +@dataclass +class RiskAssessment: + """Aggregated risk assessment for the migration.""" 
+ high_risk_urls: int = 0 + medium_risk_urls: int = 0 + low_risk_urls: int = 0 + overall_risk: str = "low" # critical / high / medium / low + top_risk_urls: list[dict[str, Any]] = field(default_factory=list) + risk_factors: list[str] = field(default_factory=list) + + +@dataclass +class MigrationPlan: + """Complete migration plan output.""" + migration_type: str = "" + domain: str = "" + new_domain: str = "" + baseline: MigrationBaseline | None = None + redirect_map: list[RedirectMap] = field(default_factory=list) + risk_assessment: RiskAssessment | None = None + pre_migration_checklist: list[dict[str, Any]] = field(default_factory=list) + timestamp: str = "" + errors: list[str] = field(default_factory=list) + + +# --------------------------------------------------------------------------- +# Migration types +# --------------------------------------------------------------------------- + +MIGRATION_TYPES = { + "domain-move": "Domain Move (old domain -> new domain)", + "platform": "Platform Change (CMS/framework migration)", + "url-restructure": "URL Restructuring (path/slug changes)", + "https": "HTTPS Migration (HTTP -> HTTPS)", + "subdomain": "Subdomain Consolidation (subdomain -> subfolder)", +} + + +# --------------------------------------------------------------------------- +# Planner +# --------------------------------------------------------------------------- + +class MigrationPlanner(BaseAsyncClient): + """Plans site migrations using Firecrawl for crawling and Ahrefs for SEO data.""" + + def __init__(self): + super().__init__(max_concurrent=5, requests_per_second=2.0) + + @staticmethod + def _extract_domain(url: str) -> str: + """Extract bare domain from URL or return as-is if already bare.""" + if "://" in url: + parsed = urlparse(url) + return parsed.netloc.lower().replace("www.", "") + return url.lower().replace("www.", "") + + @staticmethod + def _normalize_url(url: str) -> str: + """Ensure URL has a scheme.""" + if not url.startswith(("http://", 
"https://")): + return f"https://{url}" + return url + + # ------------------------------------------------------------------ + # MCP wrappers (return dicts; Claude MCP bridge fills these) + # ------------------------------------------------------------------ + + async def _call_ahrefs(self, tool: str, params: dict[str, Any]) -> dict: + """Simulate Ahrefs MCP call. In production, routed via MCP bridge.""" + self.logger.info(f"Ahrefs MCP call: {tool} | params={params}") + return {"tool": tool, "params": params, "data": {}} + + async def _call_firecrawl(self, tool: str, params: dict[str, Any]) -> dict: + """Simulate Firecrawl MCP call. In production, routed via MCP bridge.""" + self.logger.info(f"Firecrawl MCP call: {tool} | params={params}") + return {"tool": tool, "params": params, "data": {}} + + # ------------------------------------------------------------------ + # URL Inventory + # ------------------------------------------------------------------ + + async def crawl_url_inventory(self, domain: str) -> list[MigrationURL]: + """Crawl the site via Firecrawl to capture all URLs and status codes.""" + url = self._normalize_url(domain) + self.logger.info(f"Crawling URL inventory for {url}") + + resp = await self._call_firecrawl( + "firecrawl_crawl", + {"url": url, "limit": 5000, "scrapeOptions": {"formats": ["links"]}}, + ) + + crawl_data = resp.get("data", {}) + pages = crawl_data.get("pages", []) + + inventory: list[MigrationURL] = [] + for page in pages: + migration_url = MigrationURL( + url=page.get("url", ""), + status_code=int(page.get("status_code", 200)), + ) + inventory.append(migration_url) + + if not inventory: + # Fallback: create a single entry for the domain + inventory.append(MigrationURL(url=url, status_code=200)) + self.logger.warning( + "Firecrawl returned no pages; created placeholder entry. " + "Verify Firecrawl MCP is configured." 
+ ) + else: + self.logger.info(f"Crawled {len(inventory)} URLs from {domain}") + + return inventory + + # ------------------------------------------------------------------ + # Ahrefs Baseline + # ------------------------------------------------------------------ + + async def fetch_top_pages_baseline( + self, domain: str, limit: int = 500 + ) -> list[dict[str, Any]]: + """Fetch top pages with traffic and keyword data from Ahrefs.""" + domain = self._extract_domain(domain) + self.logger.info(f"Fetching top pages baseline for {domain}") + + resp = await self._call_ahrefs( + "site-explorer-top-pages", + {"target": domain, "limit": limit}, + ) + + pages_raw = resp.get("data", {}).get("pages", []) + top_pages: list[dict[str, Any]] = [] + for page in pages_raw: + top_pages.append({ + "url": page.get("url", ""), + "traffic": int(page.get("traffic", 0)), + "keywords": int(page.get("keywords", 0)), + "top_keyword": page.get("top_keyword", ""), + "position": int(page.get("position", 0)), + }) + + self.logger.info(f"Fetched {len(top_pages)} top pages for {domain}") + return top_pages + + async def fetch_site_metrics(self, domain: str) -> dict[str, Any]: + """Fetch overall site metrics from Ahrefs.""" + domain = self._extract_domain(domain) + + metrics_resp = await self._call_ahrefs( + "site-explorer-metrics", {"target": domain} + ) + metrics = metrics_resp.get("data", {}) + + backlinks_resp = await self._call_ahrefs( + "site-explorer-backlinks-stats", {"target": domain} + ) + backlinks = backlinks_resp.get("data", {}) + + return { + "organic_traffic": int(metrics.get("organic_traffic", 0)), + "organic_keywords": int(metrics.get("organic_keywords", 0)), + "referring_domains": int(backlinks.get("referring_domains", 0)), + } + + async def fetch_page_backlinks(self, url: str) -> int: + """Fetch backlink count for a specific URL.""" + resp = await self._call_ahrefs( + "site-explorer-backlinks-stats", {"target": url} + ) + return int(resp.get("data", {}).get("referring_domains", 
0)) + + async def fetch_page_keywords(self, url: str) -> list[dict[str, Any]]: + """Fetch keyword rankings for a specific URL.""" + resp = await self._call_ahrefs( + "site-explorer-organic-keywords", + {"target": url, "limit": 100}, + ) + return resp.get("data", {}).get("keywords", []) + + # ------------------------------------------------------------------ + # Risk Assessment + # ------------------------------------------------------------------ + + def assess_url_risk(self, url_data: MigrationURL) -> float: + """Score risk for a single URL based on traffic, backlinks, and keywords. + + Risk score 0-100: + - Traffic weight: 40% (high traffic = high risk if migration fails) + - Backlinks weight: 30% (external links break if redirect fails) + - Keywords weight: 30% (ranking loss risk) + """ + # Normalize each factor to 0-100 + # Traffic: 1000+ monthly visits = high risk + traffic_score = min((url_data.traffic / 1000) * 100, 100) if url_data.traffic > 0 else 0 + + # Backlinks: 50+ referring domains = high risk + backlinks_score = min((url_data.backlinks / 50) * 100, 100) if url_data.backlinks > 0 else 0 + + # Keywords: 20+ rankings = high risk + keywords_score = min((url_data.keywords / 20) * 100, 100) if url_data.keywords > 0 else 0 + + risk = ( + traffic_score * 0.40 + + backlinks_score * 0.30 + + keywords_score * 0.30 + ) + + return round(min(max(risk, 0), 100), 1) + + def classify_priority(self, risk_score: float) -> str: + """Classify URL priority based on risk score.""" + if risk_score >= 75: + return "critical" + elif risk_score >= 50: + return "high" + elif risk_score >= 25: + return "medium" + else: + return "low" + + # ------------------------------------------------------------------ + # Redirect Map + # ------------------------------------------------------------------ + + def generate_redirect_map( + self, + url_inventory: list[MigrationURL], + migration_type: str, + new_domain: str | None = None, + ) -> list[RedirectMap]: + """Generate redirect mappings 
based on migration type.""" + redirect_map: list[RedirectMap] = [] + + for url_entry in url_inventory: + source = url_entry.url + if not source: + continue + + parsed = urlparse(source) + path = parsed.path + + # Determine target URL based on migration type + if migration_type == "domain-move" and new_domain: + new_parsed = urlparse(self._normalize_url(new_domain)) + target = f"{new_parsed.scheme}://{new_parsed.netloc}{path}" + + elif migration_type == "https": + target = source.replace("http://", "https://") + + elif migration_type == "subdomain" and new_domain: + # e.g., blog.example.com/page -> example.com/blog/page + new_parsed = urlparse(self._normalize_url(new_domain)) + target = f"{new_parsed.scheme}://{new_parsed.netloc}{new_parsed.path.rstrip('/')}{path}" + + elif migration_type == "url-restructure": + # Placeholder: URL restructuring requires custom mapping rules + # In practice, user provides a mapping CSV or pattern + target = source # Will need manual mapping + + elif migration_type == "platform": + # Platform change: URLs may stay the same or change + target = source # Will need verification post-migration + + else: + target = source + + redirect_entry = RedirectMap( + source=source, + target=target, + status_code=301, + priority=url_entry.priority, + risk_score=url_entry.risk_score, + ) + redirect_map.append(redirect_entry) + + # Sort by risk score descending (highest risk first) + redirect_map.sort(key=lambda r: r.risk_score, reverse=True) + + self.logger.info( + f"Generated {len(redirect_map)} redirect mappings " + f"for {migration_type} migration" + ) + return redirect_map + + # ------------------------------------------------------------------ + # Checklist + # ------------------------------------------------------------------ + + def generate_checklist(self, migration_type: str) -> list[dict[str, Any]]: + """Generate pre-migration checklist based on migration type.""" + # Common checklist items for all migration types + common_items = [ + 
{"step": 1, "category": "Baseline", "task": "URL 인벤토리 크롤링 완료", "description": "Firecrawl로 전체 URL 목록 및 상태 코드 캡처", "status": "pending"}, + {"step": 2, "category": "Baseline", "task": "트래픽 베이스라인 캡처", "description": "Ahrefs에서 페이지별 트래픽, 키워드, 백링크 데이터 수집", "status": "pending"}, + {"step": 3, "category": "Baseline", "task": "Google Search Console 데이터 내보내기", "description": "현재 인덱싱 상태, 사이트맵 현황, 크롤 통계 기록", "status": "pending"}, + {"step": 4, "category": "Baseline", "task": "Google Analytics 벤치마크 저장", "description": "이전 전 30일/90일 트래픽 데이터 스냅샷 저장", "status": "pending"}, + {"step": 5, "category": "Redirects", "task": "Redirect 맵 생성", "description": "모든 URL에 대한 301 리디렉트 매핑 완료", "status": "pending"}, + {"step": 6, "category": "Redirects", "task": "고위험 URL 우선 검증", "description": "트래픽/백링크 기준 상위 URL 리디렉트 수동 확인", "status": "pending"}, + {"step": 7, "category": "Technical", "task": "robots.txt 업데이트 준비", "description": "새 도메인/구조에 맞는 robots.txt 작성", "status": "pending"}, + {"step": 8, "category": "Technical", "task": "XML 사이트맵 업데이트 준비", "description": "새 URL 구조 반영한 사이트맵 생성", "status": "pending"}, + {"step": 9, "category": "Technical", "task": "Canonical 태그 업데이트 계획", "description": "모든 페이지의 canonical URL이 새 주소를 가리키도록 변경", "status": "pending"}, + {"step": 10, "category": "Technical", "task": "Internal link 업데이트 계획", "description": "사이트 내부 링크가 새 URL을 직접 가리키도록 변경", "status": "pending"}, + {"step": 11, "category": "Monitoring", "task": "모니터링 대시보드 설정", "description": "이전 후 트래픽, 인덱싱, 리디렉트 상태 모니터링 준비", "status": "pending"}, + {"step": 12, "category": "Monitoring", "task": "알림 임계값 설정", "description": "트래픽 20% 이상 하락 시 알림 설정", "status": "pending"}, + ] + + # Type-specific items + type_specific: dict[str, list[dict[str, Any]]] = { + "domain-move": [ + {"step": 13, "category": "Domain", "task": "새 도메인 DNS 설정", "description": "DNS A/CNAME 레코드 설정 및 전파 확인", "status": "pending"}, + {"step": 14, "category": "Domain", "task": "Google Search Console에 새 도메인 등록", "description": "새 도메인 속성 추가 및 소유권 확인", 
"status": "pending"}, + {"step": 15, "category": "Domain", "task": "도메인 변경 알림 (GSC Change of Address)", "description": "Search Console에서 주소 변경 도구 실행", "status": "pending"}, + {"step": 16, "category": "Domain", "task": "SSL 인증서 설치", "description": "새 도메인에 유효한 SSL 인증서 설치", "status": "pending"}, + ], + "platform": [ + {"step": 13, "category": "Platform", "task": "URL 구조 매핑 확인", "description": "새 플랫폼에서 동일한 URL 구조 유지 여부 확인", "status": "pending"}, + {"step": 14, "category": "Platform", "task": "메타 태그 이전 확인", "description": "Title, Description, Open Graph 태그 동일 여부 확인", "status": "pending"}, + {"step": 15, "category": "Platform", "task": "구조화된 데이터 이전", "description": "JSON-LD Schema Markup 동일 여부 확인", "status": "pending"}, + {"step": 16, "category": "Platform", "task": "스테이징 환경 테스트", "description": "스테이징에서 전체 크롤링 및 리디렉트 테스트 실행", "status": "pending"}, + ], + "url-restructure": [ + {"step": 13, "category": "URL", "task": "URL 패턴 매핑 문서화", "description": "기존 → 신규 URL 패턴 규칙 문서화", "status": "pending"}, + {"step": 14, "category": "URL", "task": "정규식 리디렉트 규칙 작성", "description": "서버 레벨 리디렉트 규칙 (nginx/Apache) 작성", "status": "pending"}, + {"step": 15, "category": "URL", "task": "Breadcrumb 업데이트", "description": "새 URL 구조에 맞게 Breadcrumb 네비게이션 수정", "status": "pending"}, + ], + "https": [ + {"step": 13, "category": "HTTPS", "task": "SSL 인증서 설치 및 확인", "description": "유효한 SSL 인증서 설치 (Let's Encrypt 또는 상용 인증서)", "status": "pending"}, + {"step": 14, "category": "HTTPS", "task": "Mixed Content 점검", "description": "HTTP로 로드되는 리소스 (이미지, CSS, JS) 식별 및 수정", "status": "pending"}, + {"step": 15, "category": "HTTPS", "task": "HSTS 헤더 설정", "description": "Strict-Transport-Security 헤더 활성화", "status": "pending"}, + ], + "subdomain": [ + {"step": 13, "category": "Subdomain", "task": "서브도메인 → 서브폴더 매핑", "description": "서브도메인 경로를 서브폴더 경로로 매핑", "status": "pending"}, + {"step": 14, "category": "Subdomain", "task": "서버 리디렉트 규칙 설정", "description": "서브도메인에서 메인 도메인으로의 301 리디렉트 규칙", "status": "pending"}, + 
{"step": 15, "category": "Subdomain", "task": "DNS 설정 업데이트", "description": "서브도메인 DNS 레코드 유지 (리디렉트용)", "status": "pending"}, + ], + } + + checklist = common_items.copy() + if migration_type in type_specific: + checklist.extend(type_specific[migration_type]) + + self.logger.info( + f"Generated {len(checklist)} checklist items for {migration_type} migration" + ) + return checklist + + # ------------------------------------------------------------------ + # Orchestrator + # ------------------------------------------------------------------ + + async def run( + self, + domain: str, + migration_type: str, + new_domain: str | None = None, + ) -> MigrationPlan: + """Orchestrate full migration planning pipeline.""" + timestamp = datetime.now().isoformat() + plan = MigrationPlan( + migration_type=migration_type, + domain=self._extract_domain(domain), + new_domain=self._extract_domain(new_domain) if new_domain else "", + timestamp=timestamp, + ) + + try: + # Step 1: Crawl URL inventory + self.logger.info("Step 1/6: Crawling URL inventory via Firecrawl...") + url_inventory = await self.crawl_url_inventory(domain) + + # Step 2: Fetch Ahrefs baseline + self.logger.info("Step 2/6: Fetching Ahrefs top pages baseline...") + top_pages = await self.fetch_top_pages_baseline(domain) + site_metrics = await self.fetch_site_metrics(domain) + + # Step 3: Enrich URL inventory with Ahrefs data + self.logger.info("Step 3/6: Enriching URLs with traffic/backlink data...") + top_pages_map: dict[str, dict] = {} + for page in top_pages: + page_url = page.get("url", "") + if page_url: + top_pages_map[page_url] = page + + for url_entry in url_inventory: + page_data = top_pages_map.get(url_entry.url, {}) + url_entry.traffic = int(page_data.get("traffic", 0)) + url_entry.keywords = int(page_data.get("keywords", 0)) + + # Step 4: Risk assessment per URL + self.logger.info("Step 4/6: Scoring risk per URL...") + for url_entry in url_inventory: + url_entry.risk_score = self.assess_url_risk(url_entry) + 
url_entry.priority = self.classify_priority(url_entry.risk_score) + + # Build baseline + baseline = MigrationBaseline( + domain=self._extract_domain(domain), + total_urls=len(url_inventory), + total_traffic=site_metrics.get("organic_traffic", 0), + total_keywords=site_metrics.get("organic_keywords", 0), + total_referring_domains=site_metrics.get("referring_domains", 0), + top_pages=top_pages[:50], # Store top 50 for reference + url_inventory=url_inventory, + ) + plan.baseline = baseline + + # Step 5: Generate redirect map + self.logger.info("Step 5/6: Generating redirect map...") + plan.redirect_map = self.generate_redirect_map( + url_inventory, migration_type, new_domain + ) + + # Build risk assessment summary + high_risk = sum(1 for u in url_inventory if u.risk_score >= 75) + medium_risk = sum(1 for u in url_inventory if 25 <= u.risk_score < 75) + low_risk = sum(1 for u in url_inventory if u.risk_score < 25) + + # Determine overall risk level + if high_risk > len(url_inventory) * 0.2: + overall_risk = "critical" + elif high_risk > len(url_inventory) * 0.1: + overall_risk = "high" + elif medium_risk > len(url_inventory) * 0.3: + overall_risk = "medium" + else: + overall_risk = "low" + + # Top risk URLs + sorted_urls = sorted(url_inventory, key=lambda u: u.risk_score, reverse=True) + top_risk = [ + { + "url": u.url, + "risk_score": u.risk_score, + "traffic": u.traffic, + "keywords": u.keywords, + "backlinks": u.backlinks, + } + for u in sorted_urls[:20] + ] + + # Risk factors + risk_factors: list[str] = [] + if high_risk > 0: + risk_factors.append( + f"{high_risk}개 고위험 URL (트래픽/백링크 손실 위험)" + ) + if baseline.total_traffic > 10000: + risk_factors.append( + f"월간 오가닉 트래픽 {baseline.total_traffic:,}회 — 이전 실패 시 큰 영향" + ) + if baseline.total_referring_domains > 500: + risk_factors.append( + f"참조 도메인 {baseline.total_referring_domains:,}개 — 리디렉트 누락 시 링크 에퀴티 손실" + ) + if migration_type == "domain-move": + risk_factors.append( + "도메인 변경은 가장 위험한 이전 유형 — 최소 3-6개월 회복 예상" + ) + 
elif migration_type == "url-restructure": + risk_factors.append( + "URL 구조 변경 시 모든 내부/외부 링크 영향 — 정규식 리디렉트 필수" + ) + + plan.risk_assessment = RiskAssessment( + high_risk_urls=high_risk, + medium_risk_urls=medium_risk, + low_risk_urls=low_risk, + overall_risk=overall_risk, + top_risk_urls=top_risk, + risk_factors=risk_factors, + ) + + # Step 6: Generate checklist + self.logger.info("Step 6/6: Generating pre-migration checklist...") + plan.pre_migration_checklist = self.generate_checklist(migration_type) + + self.logger.info( + f"Migration plan complete: {len(url_inventory)} URLs inventoried, " + f"{len(plan.redirect_map)} redirects mapped, " + f"overall risk: {overall_risk}" + ) + + except Exception as e: + msg = f"Migration planning pipeline error: {e}" + self.logger.error(msg) + plan.errors.append(msg) + + return plan + + +# --------------------------------------------------------------------------- +# Output helpers +# --------------------------------------------------------------------------- + +def _format_text_report(plan: MigrationPlan) -> str: + """Format migration plan as human-readable text report.""" + lines: list[str] = [] + lines.append("=" * 70) + lines.append(" SEO MIGRATION PLAN") + lines.append(f" Domain: {plan.domain}") + if plan.new_domain: + lines.append(f" New Domain: {plan.new_domain}") + lines.append(f" Migration Type: {MIGRATION_TYPES.get(plan.migration_type, plan.migration_type)}") + lines.append(f" Generated: {plan.timestamp}") + lines.append("=" * 70) + + if plan.baseline: + b = plan.baseline + lines.append("") + lines.append("--- BASELINE ---") + lines.append(f" Total URLs: {b.total_urls:,}") + lines.append(f" Organic Traffic: {b.total_traffic:,}") + lines.append(f" Organic Keywords: {b.total_keywords:,}") + lines.append(f" Referring Domains: {b.total_referring_domains:,}") + + if plan.risk_assessment: + r = plan.risk_assessment + lines.append("") + lines.append("--- RISK ASSESSMENT ---") + lines.append(f" Overall Risk: 
{r.overall_risk.upper()}") + lines.append(f" High Risk URLs: {r.high_risk_urls:,}") + lines.append(f" Medium Risk: {r.medium_risk_urls:,}") + lines.append(f" Low Risk: {r.low_risk_urls:,}") + if r.risk_factors: + lines.append("") + lines.append(" Risk Factors:") + for factor in r.risk_factors: + lines.append(f" - {factor}") + if r.top_risk_urls: + lines.append("") + lines.append(" Top Risk URLs:") + for url_info in r.top_risk_urls[:10]: + lines.append( + f" [{url_info['risk_score']:.0f}] {url_info['url']} " + f"(traffic={url_info['traffic']:,}, kw={url_info['keywords']})" + ) + + if plan.redirect_map: + lines.append("") + lines.append(f"--- REDIRECT MAP ({len(plan.redirect_map)} entries) ---") + # Show top 20 by risk + for i, rmap in enumerate(plan.redirect_map[:20], 1): + lines.append( + f" {i:>3}. [{rmap.priority.upper():>8}] " + f"{rmap.source} -> {rmap.target}" + ) + if len(plan.redirect_map) > 20: + lines.append(f" ... and {len(plan.redirect_map) - 20} more entries") + + if plan.pre_migration_checklist: + lines.append("") + lines.append("--- PRE-MIGRATION CHECKLIST ---") + for item in plan.pre_migration_checklist: + status_marker = "[ ]" if item["status"] == "pending" else "[x]" + lines.append( + f" {status_marker} Step {item['step']}: {item['task']}" + ) + lines.append(f" {item['description']}") + + if plan.errors: + lines.append("") + lines.append("--- ERRORS ---") + for err in plan.errors: + lines.append(f" - {err}") + + lines.append("") + lines.append("=" * 70) + return "\n".join(lines) + + +def _serialize_plan(plan: MigrationPlan) -> dict: + """Convert plan to JSON-serializable dict.""" + output: dict[str, Any] = { + "domain": plan.domain, + "new_domain": plan.new_domain, + "migration_type": plan.migration_type, + "baseline": None, + "redirect_map": [asdict(r) for r in plan.redirect_map], + "risk_assessment": asdict(plan.risk_assessment) if plan.risk_assessment else None, + "pre_migration_checklist": plan.pre_migration_checklist, + "timestamp": 
plan.timestamp, + } + + if plan.baseline: + output["baseline"] = { + "domain": plan.baseline.domain, + "total_urls": plan.baseline.total_urls, + "total_traffic": plan.baseline.total_traffic, + "total_keywords": plan.baseline.total_keywords, + "total_referring_domains": plan.baseline.total_referring_domains, + "top_pages": plan.baseline.top_pages, + "url_inventory": [asdict(u) for u in plan.baseline.url_inventory], + } + + if plan.errors: + output["errors"] = plan.errors + + return output + + +# --------------------------------------------------------------------------- +# CLI +# --------------------------------------------------------------------------- + +def parse_args(argv: list[str] | None = None) -> argparse.Namespace: + parser = argparse.ArgumentParser( + description="SEO Migration Planner - Pre-migration risk assessment and redirect mapping", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog="""\ +Examples: + python migration_planner.py --domain https://example.com --type domain-move --new-domain https://new-example.com --json + python migration_planner.py --domain https://example.com --type platform --json + python migration_planner.py --domain https://example.com --type url-restructure --json + python migration_planner.py --domain http://example.com --type https --json + python migration_planner.py --domain https://blog.example.com --type subdomain --new-domain https://example.com/blog --json + """, + ) + parser.add_argument( + "--domain", + required=True, + help="Target website URL or domain to plan migration for", + ) + parser.add_argument( + "--type", + required=True, + choices=["domain-move", "platform", "url-restructure", "https", "subdomain"], + help="Migration type", + ) + parser.add_argument( + "--new-domain", + type=str, + default=None, + help="New domain/URL (required for domain-move and subdomain types)", + ) + parser.add_argument( + "--json", + action="store_true", + default=False, + help="Output in JSON format", + ) + 
parser.add_argument(
+        "--output",
+        type=str,
+        default=None,
+        help="Save output to file path",
+    )
+    return parser.parse_args(argv)
+
+
+async def async_main(args: argparse.Namespace) -> None:
+    # Validate required arguments for specific types
+    if args.type in ("domain-move", "subdomain") and not args.new_domain:
+        logger.error(f"--new-domain is required for {args.type} migration type")
+        sys.exit(1)
+
+    planner = MigrationPlanner()
+
+    plan = await planner.run(
+        domain=args.domain,
+        migration_type=args.type,
+        new_domain=args.new_domain,
+    )
+
+    if args.json:
+        output_str = json.dumps(_serialize_plan(plan), indent=2, ensure_ascii=False)
+    else:
+        output_str = _format_text_report(plan)
+
+    if args.output:
+        with open(args.output, "w", encoding="utf-8") as f:
+            f.write(output_str)
+        logger.info(f"Migration plan saved to {args.output}")
+    else:
+        print(output_str)
+
+    planner.print_stats()
+
+
+def main() -> None:
+    args = parse_args()
+    asyncio.run(async_main(args))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/custom-skills/33-seo-migration-planner/code/scripts/requirements.txt b/custom-skills/33-seo-migration-planner/code/scripts/requirements.txt
new file mode 100644
index 0000000..e473ced
--- /dev/null
+++ b/custom-skills/33-seo-migration-planner/code/scripts/requirements.txt
@@ -0,0 +1,8 @@
+# 33-seo-migration-planner dependencies
+requests>=2.31.0
+aiohttp>=3.9.0
+pandas>=2.1.0
+tenacity>=8.2.0
+tqdm>=4.66.0
+python-dotenv>=1.0.0
+rich>=13.7.0
diff --git a/custom-skills/33-seo-migration-planner/desktop/SKILL.md b/custom-skills/33-seo-migration-planner/desktop/SKILL.md
new file mode 100644
index 0000000..e7a3a42
--- /dev/null
+++ b/custom-skills/33-seo-migration-planner/desktop/SKILL.md
@@ -0,0 +1,171 @@
+---
+name: seo-migration-planner
+description: |
+  SEO site migration planning and monitoring.
+  Triggers: site migration, domain move, redirect mapping, platform migration, URL restructuring, HTTPS migration, subdomain consolidation, 사이트 이전, 도메인 이전, 리디렉트 매핑.
+---
+
+# SEO Migration Planner & Monitor
+
+## Purpose
+
+Comprehensive site migration planning and post-migration monitoring for SEO: crawl-based URL inventory, traffic/keyword baseline capture via Ahrefs, redirect map generation with per-URL risk scoring, pre-migration checklist creation, and post-launch traffic/indexation/ranking recovery tracking with automated alerts. Supports domain moves, platform changes, URL restructuring, HTTPS migrations, and subdomain consolidation.
+
+## Core Capabilities
+
+1. **URL Inventory** - Crawl entire site via Firecrawl to capture all URLs and status codes
+2. **Traffic Baseline** - Capture per-page traffic, keywords, and backlinks via Ahrefs
+3. **Redirect Map Generation** - Create old URL -> new URL mappings with 301 redirect rules
+4. **Risk Scoring** - Score each URL (0-100) based on traffic, backlinks, and keyword rankings
+5. **Pre-Migration Checklist** - Generate type-specific migration checklist (Korean)
+6. **Post-Migration Traffic Comparison** - Compare pre vs post traffic by page group
+7. **Redirect Health Check** - Detect broken redirects, chains, and loops
+8. **Indexation Tracking** - Monitor indexed page count changes and missing pages
+9. **Ranking Monitoring** - Track keyword position changes for priority keywords
+10. **Recovery Estimation** - Estimate traffic recovery timeline based on migration type
+11. **Alert Generation** - Flag traffic drops >20%, broken redirects, indexation loss
+
+## MCP Tool Usage
+
+### Ahrefs for SEO Baseline & Monitoring
+```
+mcp__ahrefs__site-explorer-metrics: Current organic metrics (traffic, keywords)
+mcp__ahrefs__site-explorer-metrics-history: Historical metrics for pre/post comparison
+mcp__ahrefs__site-explorer-top-pages: Top performing pages for baseline
+mcp__ahrefs__site-explorer-pages-by-traffic: Pages ranked by traffic for risk scoring
+mcp__ahrefs__site-explorer-organic-keywords: Keyword rankings per page
+mcp__ahrefs__site-explorer-referring-domains: Referring domains for risk scoring
+mcp__ahrefs__site-explorer-backlinks-stats: Backlink overview for migration impact
+```
+
+### Firecrawl for URL Inventory & Redirect Verification
+```
+mcp__firecrawl__firecrawl_crawl: Crawl entire site for URL inventory
+mcp__firecrawl__firecrawl_scrape: Verify individual redirect health
+```
+
+### Notion for Report Storage
+```
+mcp__notion__notion-create-pages: Save reports to SEO Audit Log
+```
+
+### Perplexity for Migration Best Practices
+```
+mcp__perplexity__search: Research migration best practices and common pitfalls
+```
+
+## Workflow
+
+### Pre-Migration Planning
+1. Accept target domain, migration type, and new domain (if applicable)
+2. Crawl URL inventory via Firecrawl (capture all URLs + status codes)
+3. Fetch Ahrefs top pages baseline (traffic, keywords, backlinks per page)
+4. Fetch site-level metrics (total traffic, keywords, referring domains)
+5. Enrich URL inventory with Ahrefs traffic/backlink data
+6. Score risk per URL (0-100) based on traffic weight (40%), backlinks (30%), keywords (30%)
+7. Generate redirect map (old URL -> new URL) based on migration type
+8. Aggregate risk assessment (high/medium/low URL counts, overall risk level)
+9. Generate pre-migration checklist (common + type-specific items, in Korean)
+10. Save baseline and plan to Notion
+
+### Post-Migration Monitoring
+1. Accept domain, migration date, and optional baseline JSON
+2. Compare pre vs post traffic using Ahrefs metrics history
+3. Check redirect health via Firecrawl (broken, chains, loops)
+4. Track indexation changes (pre vs post page count, missing pages)
+5. Track keyword ranking changes for priority keywords
+6. Estimate recovery timeline based on traffic delta and migration type
+7. Generate alerts for significant issues (traffic >20% drop, broken redirects, etc.)
+8. Save monitoring report to Notion
+
+## Output Format
+
+### Planning Report
+```markdown
+## SEO 사이트 이전 계획: [domain]
+
+### 베이스라인
+- 전체 URL 수: [count]
+- 오가닉 트래픽: [traffic]
+- 오가닉 키워드: [keywords]
+- 참조 도메인: [count]
+
+### 위험 평가
+- 전체 위험도: [HIGH/MEDIUM/LOW]
+- 고위험 URL: [count]개
+- 중위험 URL: [count]개
+- 저위험 URL: [count]개
+
+### 리디렉트 맵 (상위 위험 URL)
+| Source URL | Target URL | Risk Score | Priority |
+|------------|------------|------------|----------|
+
+### 사전 체크리스트
+- [ ] Step 1: ...
+- [ ] Step 2: ...
+```
+
+### Monitoring Report
+```markdown
+## SEO 이전 모니터링 보고서: [domain]
+### 이전일: [date] | 경과일: [N]일
+
+### 알림
+- [severity] [message]
+
+### 트래픽 비교
+| Page Group | Pre | Post | Change | Status |
+|------------|-----|------|--------|--------|
+
+### 리디렉트 상태
+- 전체: [count] | 정상: [count] | 깨짐: [count] | 체인: [count]
+
+### 인덱싱 현황
+- 이전 전: [count] | 이전 후: [count] | 변화: [pct]%
+
+### 회복 예상
+- 예상 기간: [weeks]주
+- 현재 회복률: [pct]%
+```
+
+## Risk Scoring Methodology
+
+| Factor | Weight | Scale |
+|--------|--------|-------|
+| Traffic | 40% | 1,000+ monthly visits = high risk |
+| Backlinks | 30% | 50+ referring domains = high risk |
+| Keywords | 30% | 20+ keyword rankings = high risk |
+
+### Priority Classification
+
+| Risk Score | Priority | Action |
+|------------|----------|--------|
+| 75-100 | Critical | Manual redirect verification required |
+| 50-74 | High | Priority redirect with monitoring |
+| 25-49 | Medium | Standard redirect |
+| 0-24 | Low | Batch redirect |
+
+## Alert Thresholds
+
+| Alert
Type | Threshold | Severity | +|------------|-----------|----------| +| Traffic drop | >20% | warning; >40% critical | +| Broken redirects | >0 | warning; >10 critical | +| Redirect chains | >0 | warning | +| Indexation loss | >10% | warning; >30% critical | +| Ranking drop | >5 positions (volume 100+) | warning; >20 keywords critical | + +## Limitations + +- Ahrefs data has ~24h freshness lag +- Firecrawl crawl limited to 5,000 URLs per run +- Redirect chain detection depends on Firecrawl following redirects +- Recovery estimation is heuristic-based on industry averages +- URL restructuring requires manual mapping rules (no auto-pattern detection) + +## Notion Output (Required) + +All reports MUST be saved to OurDigital SEO Audit Log: +- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` +- **Properties**: Issue (title), Site (url), Category ("SEO Migration"), Priority, Found Date, Audit ID +- **Language**: Korean with English technical terms +- **Audit ID Format**: MIGR-YYYYMMDD-NNN diff --git a/custom-skills/33-seo-migration-planner/desktop/skill.yaml b/custom-skills/33-seo-migration-planner/desktop/skill.yaml new file mode 100644 index 0000000..594a85f --- /dev/null +++ b/custom-skills/33-seo-migration-planner/desktop/skill.yaml @@ -0,0 +1,10 @@ +name: seo-migration-planner +description: | + SEO site migration planning and monitoring. Triggers: site migration, domain move, redirect mapping, platform migration, URL restructuring, 사이트 이전. 
+allowed-tools: + - mcp__ahrefs__* + - mcp__firecrawl__* + - mcp__notion__* + - mcp__perplexity__* + - WebSearch + - WebFetch diff --git a/custom-skills/33-seo-migration-planner/desktop/tools/ahrefs.md b/custom-skills/33-seo-migration-planner/desktop/tools/ahrefs.md new file mode 100644 index 0000000..0a48279 --- /dev/null +++ b/custom-skills/33-seo-migration-planner/desktop/tools/ahrefs.md @@ -0,0 +1,37 @@ +# Ahrefs + +> MCP tool documentation for migration planner skill + +## Available Commands + +- `site-explorer-metrics` - Get current organic metrics (traffic, keywords) for a domain +- `site-explorer-metrics-history` - Get historical organic metrics for pre/post comparison +- `site-explorer-top-pages` - Get top performing pages by traffic for baseline +- `site-explorer-pages-by-traffic` - Get pages ranked by organic traffic for risk scoring +- `site-explorer-organic-keywords` - Get keyword rankings per page +- `site-explorer-referring-domains` - Get referring domain list for risk scoring +- `site-explorer-backlinks-stats` - Get backlink overview for migration impact assessment + +## Configuration + +- Requires Ahrefs MCP server configured in Claude Desktop +- API access via `mcp__ahrefs__*` tool prefix + +## Examples + +``` +# Get site baseline metrics +mcp__ahrefs__site-explorer-metrics(target="example.com") + +# Get top pages for risk scoring +mcp__ahrefs__site-explorer-top-pages(target="example.com", limit=500) + +# Get traffic history for pre/post comparison +mcp__ahrefs__site-explorer-metrics-history(target="example.com", date_from="2025-01-01") + +# Get backlink stats for a specific page +mcp__ahrefs__site-explorer-backlinks-stats(target="https://example.com/important-page") + +# Get keyword rankings +mcp__ahrefs__site-explorer-organic-keywords(target="example.com", limit=200) +``` diff --git a/custom-skills/33-seo-migration-planner/desktop/tools/firecrawl.md b/custom-skills/33-seo-migration-planner/desktop/tools/firecrawl.md new file mode 100644 index 
0000000..7700cfa --- /dev/null +++ b/custom-skills/33-seo-migration-planner/desktop/tools/firecrawl.md @@ -0,0 +1,29 @@ +# Firecrawl + +> MCP tool documentation for URL inventory crawling and redirect verification + +## Available Commands + +- `firecrawl_crawl` - Crawl entire site to capture all URLs and status codes for migration inventory +- `firecrawl_scrape` - Scrape individual pages to verify redirect health (status codes, chains, final URL) + +## Configuration + +- Requires Firecrawl MCP server configured in Claude Desktop +- API access via `mcp__firecrawl__*` tool prefix + +## Examples + +``` +# Crawl full site for URL inventory +mcp__firecrawl__firecrawl_crawl(url="https://example.com", limit=5000, scrapeOptions={"formats": ["links"]}) + +# Verify a redirect +mcp__firecrawl__firecrawl_scrape(url="https://old-example.com/page", formats=["links"]) +``` + +## Notes + +- Crawl limit defaults to 5,000 URLs per run +- For larger sites, run multiple crawls with path-based filtering +- Redirect verification returns status_code, final_url, and redirect_chain diff --git a/custom-skills/33-seo-migration-planner/desktop/tools/notion.md b/custom-skills/33-seo-migration-planner/desktop/tools/notion.md new file mode 100644 index 0000000..433a731 --- /dev/null +++ b/custom-skills/33-seo-migration-planner/desktop/tools/notion.md @@ -0,0 +1,46 @@ +# Notion + +> MCP tool documentation for saving migration planning and monitoring reports + +## Available Commands + +- `notion-create-pages` - Create new pages in the SEO Audit Log database +- `notion-update-page` - Update existing audit entries +- `notion-query-database-view` - Query existing reports +- `notion-search` - Search across Notion workspace + +## Configuration + +- Database ID: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` +- All reports saved with Category: "SEO Migration" +- Audit ID format: MIGR-YYYYMMDD-NNN + +## Examples + +``` +# Create migration planning report +mcp__notion__notion-create-pages( + 
database_id="2c8581e5-8a1e-8035-880b-e38cefc2f3ef", + properties={ + "Issue": {"title": [{"text": {"content": "사이트 이전 계획 - example.com - 2025-01-15"}}]}, + "Site": {"url": "https://example.com"}, + "Category": {"select": {"name": "SEO Migration"}}, + "Priority": {"select": {"name": "High"}}, + "Found Date": {"date": {"start": "2025-01-15"}}, + "Audit ID": {"rich_text": [{"text": {"content": "MIGR-20250115-001"}}]} + } +) + +# Create post-migration monitoring report +mcp__notion__notion-create-pages( + database_id="2c8581e5-8a1e-8035-880b-e38cefc2f3ef", + properties={ + "Issue": {"title": [{"text": {"content": "이전 모니터링 보고서 - new-example.com - 2025-02-01"}}]}, + "Site": {"url": "https://new-example.com"}, + "Category": {"select": {"name": "SEO Migration"}}, + "Priority": {"select": {"name": "Critical"}}, + "Found Date": {"date": {"start": "2025-02-01"}}, + "Audit ID": {"rich_text": [{"text": {"content": "MIGR-20250201-001"}}]} + } +) +``` diff --git a/custom-skills/34-seo-reporting-dashboard/README.md b/custom-skills/34-seo-reporting-dashboard/README.md new file mode 100644 index 0000000..7bd8390 --- /dev/null +++ b/custom-skills/34-seo-reporting-dashboard/README.md @@ -0,0 +1,78 @@ +# SEO Reporting Dashboard + +SEO 종합 보고서 및 대시보드 생성 도구 - 모든 SEO 스킬 결과를 집계하여 이해관계자용 보고서와 인터랙티브 HTML 대시보드를 생성합니다. + +## Overview + +Aggregates outputs from all SEO skills (11-33) into executive reports with interactive HTML dashboards, trend analysis, and Korean-language executive summaries. This is the PRESENTATION LAYER that sits on top of skill 25 (KPI Framework) and all other skill outputs. + +## Relationship to Skill 25 (KPI Framework) + +Skill 25 establishes KPI baselines, targets, and health scores for a single domain. 
Skill 34 builds on top of skill 25 by: +- Aggregating outputs from ALL SEO skills (not just KPIs) +- Generating visual HTML dashboards with Chart.js +- Producing audience-specific Korean executive summaries +- Providing cross-skill priority analysis + +## Dual-Platform Structure + +``` +34-seo-reporting-dashboard/ +├── code/ # Claude Code version +│ ├── CLAUDE.md # Action-oriented directive +│ ├── commands/ +│ │ └── seo-reporting-dashboard.md # Slash command +│ └── scripts/ +│ ├── base_client.py # Shared async utilities +│ ├── report_aggregator.py # Collect + normalize skill outputs +│ ├── dashboard_generator.py # HTML dashboard with Chart.js +│ ├── executive_report.py # Korean executive summary +│ └── requirements.txt +│ +├── desktop/ # Claude Desktop version +│ ├── SKILL.md # MCP-based workflow +│ ├── skill.yaml # Extended metadata +│ └── tools/ +│ ├── ahrefs.md # Ahrefs tool docs +│ └── notion.md # Notion tool docs +│ +└── README.md +``` + +## Quick Start + +### Claude Code +```bash +pip install -r code/scripts/requirements.txt + +# Aggregate all skill outputs +python code/scripts/report_aggregator.py --domain https://example.com --json + +# Generate HTML dashboard +python code/scripts/dashboard_generator.py --report report.json --output dashboard.html + +# Generate Korean executive report +python code/scripts/executive_report.py --report report.json --audience c-level --output report.md +``` + +## Features + +- Cross-skill report aggregation (skills 11-33) +- Interactive HTML dashboard with Chart.js charts +- Korean-language executive summaries +- Audience-specific reporting (C-level, marketing, technical) +- Notion integration for reading past audits and writing reports +- Mobile-responsive dashboard layout + +## Requirements + +- Python 3.10+ +- Dependencies: `pip install -r code/scripts/requirements.txt` +- Notion API token (for database access) +- Ahrefs API token (for fresh data pull) + +## Triggers + +- SEO report, SEO dashboard, executive summary +- 보고서, 
대시보드, 종합 보고서, 성과 보고서 +- performance report, reporting dashboard diff --git a/custom-skills/34-seo-reporting-dashboard/code/CLAUDE.md b/custom-skills/34-seo-reporting-dashboard/code/CLAUDE.md new file mode 100644 index 0000000..bf87129 --- /dev/null +++ b/custom-skills/34-seo-reporting-dashboard/code/CLAUDE.md @@ -0,0 +1,173 @@ +# CLAUDE.md + +## Overview + +SEO reporting dashboard and executive report generator. Aggregates outputs from all SEO skills (11-33) into stakeholder-ready reports with interactive HTML dashboards, trend analysis, and Korean-language executive summaries. This is the PRESENTATION LAYER that sits on top of skill 25 (KPI Framework) and all other skill outputs, providing a unified view of SEO performance across all audit dimensions. + +## Quick Start + +```bash +pip install -r scripts/requirements.txt + +# Aggregate outputs from all SEO skills +python scripts/report_aggregator.py --domain https://example.com --json + +# Generate HTML dashboard +python scripts/dashboard_generator.py --report aggregated_report.json --output dashboard.html + +# Generate Korean executive report +python scripts/executive_report.py --report aggregated_report.json --audience c-level --output report.md +``` + +## Scripts + +| Script | Purpose | Key Output | +|--------|---------|------------| +| `report_aggregator.py` | Collect and normalize outputs from all SEO skills | Unified aggregated report, cross-skill health score, priority issues | +| `dashboard_generator.py` | Generate interactive HTML dashboard with Chart.js | Self-contained HTML file with charts and responsive layout | +| `executive_report.py` | Korean-language executive summary generation | Markdown report tailored to audience level | +| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient | + +## Report Aggregator + +```bash +# Aggregate all skill outputs for a domain +python scripts/report_aggregator.py --domain https://example.com --json + +# Specify output directory to scan 
+python scripts/report_aggregator.py --domain https://example.com --output-dir ./audit_outputs --json + +# Filter by date range +python scripts/report_aggregator.py --domain https://example.com --from 2025-01-01 --to 2025-03-31 --json + +# Save to file +python scripts/report_aggregator.py --domain https://example.com --json --output report.json +``` + +**Capabilities**: +- Scan for recent audit outputs from skills 11-33 (JSON files or Notion entries) +- Normalize data formats across skills into unified structure +- Merge findings by domain/date +- Compute cross-skill health scores with weighted dimensions +- Identify top-priority issues across all audits +- Timeline of audit history +- Support for both local file scanning and Notion database queries + +## Dashboard Generator + +```bash +# Generate HTML dashboard from aggregated report +python scripts/dashboard_generator.py --report aggregated_report.json --output dashboard.html + +# Custom title +python scripts/dashboard_generator.py --report aggregated_report.json --output dashboard.html --title "OurDigital SEO Dashboard" +``` + +**Capabilities**: +- Generate self-contained HTML dashboard (uses Chart.js from CDN) +- Health score gauge chart +- Traffic trend line chart +- Keyword ranking distribution bar chart +- Technical issues breakdown pie chart +- Competitor comparison radar chart +- Mobile-responsive layout with CSS grid +- Export as single .html file (no external dependencies) + +## Executive Report + +```bash +# C-level executive summary (Korean) +python scripts/executive_report.py --report aggregated_report.json --audience c-level --output report.md + +# Marketing team report +python scripts/executive_report.py --report aggregated_report.json --audience marketing --output report.md + +# Technical team report +python scripts/executive_report.py --report aggregated_report.json --audience technical --output report.md + +# Output to Notion instead of file +python scripts/executive_report.py --report 
aggregated_report.json --audience c-level --format notion +``` + +**Capabilities**: +- Korean-language executive summary generation +- Key wins and concerns identification +- Period-over-period comparison narrative +- Priority action items ranked by impact +- Stakeholder-appropriate language (non-technical for C-level) +- Support for C-level, marketing team, and technical team audiences +- Markdown output format + +## Ahrefs MCP Tools Used + +| Tool | Purpose | +|------|---------| +| `site-explorer-metrics` | Fresh current organic metrics snapshot | +| `site-explorer-metrics-history` | Historical metrics for trend visualization | + +## Output Format + +```json +{ + "domain": "example.com", + "report_date": "2025-01-15", + "overall_health": 72, + "health_trend": "improving", + "skills_included": [ + {"skill_id": 11, "skill_name": "comprehensive-audit", "audit_date": "2025-01-14"}, + {"skill_id": 25, "skill_name": "kpi-framework", "audit_date": "2025-01-15"} + ], + "category_scores": { + "technical": 85, + "on_page": 70, + "performance": 60, + "content": 75, + "links": 68, + "local": 65, + "keywords": 72, + "competitor": 58 + }, + "top_issues": [ + {"severity": "critical", "category": "performance", "description": "CLS exceeds threshold on mobile"}, + {"severity": "high", "category": "technical", "description": "12 pages with noindex tag incorrectly set"} + ], + "top_wins": [ + {"category": "links", "description": "Domain Rating increased by 3 points"}, + {"category": "keywords", "description": "15 new keywords entered top 10"} + ], + "timeline": [ + {"date": "2025-01-15", "skill": "kpi-framework", "health_score": 72}, + {"date": "2025-01-14", "skill": "comprehensive-audit", "health_score": 70} + ], + "audit_id": "DASH-20250115-001", + "timestamp": "2025-01-15T14:30:00" +} +``` + +## Notion Output (Required) + +**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database. 
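The required properties in this section map directly onto a Notion page-creation payload. A minimal Python sketch of building that payload for an aggregated report (the trend-to-priority mapping and the `build_audit_properties` helper name are illustrative assumptions — the schema only says Priority follows the health trend; the actual write goes through `mcp__notion__notion-create-pages` or the `notion-client` SDK):

```python
def build_audit_properties(report: dict) -> dict:
    """Build the SEO Audit Log property payload for an aggregated report.

    The Priority mapping below is an assumption; the database schema only
    states that Priority is based on the overall health trend.
    """
    priority = {"declining": "High", "stable": "Medium"}.get(
        report.get("health_trend", "stable"), "Low"
    )
    title = f"SEO 종합 보고서 - {report['domain']} - {report['report_date']}"
    return {
        "Issue": {"title": [{"text": {"content": title}}]},
        "Site": {"url": f"https://{report['domain']}"},
        "Category": {"select": {"name": "SEO Dashboard"}},
        "Priority": {"select": {"name": priority}},
        "Found Date": {"date": {"start": report["report_date"]}},
        "Audit ID": {"rich_text": [{"text": {"content": report["audit_id"]}}]},
    }


props = build_audit_properties({
    "domain": "example.com",
    "report_date": "2025-01-15",
    "health_trend": "improving",
    "audit_id": "DASH-20250115-001",
})
```

The payload dict can be passed unchanged as the `properties` argument of the Notion MCP call or SDK `pages.create`, keeping the property schema in one testable place.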
+ +### Database Configuration + +| Field | Value | +|-------|-------| +| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` | +| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef | + +### Required Properties + +| Property | Type | Description | +|----------|------|-------------| +| Issue | Title | Report title (Korean + date) | +| Site | URL | Audited website URL | +| Category | Select | SEO Dashboard | +| Priority | Select | Based on overall health trend | +| Found Date | Date | Report date (YYYY-MM-DD) | +| Audit ID | Rich Text | Format: DASH-YYYYMMDD-NNN | + +### Language Guidelines + +- Report content in Korean (한국어) +- Keep technical English terms as-is (e.g., Health Score, Domain Rating, Core Web Vitals, Chart.js) +- URLs and code remain unchanged diff --git a/custom-skills/34-seo-reporting-dashboard/code/commands/seo-reporting-dashboard.md b/custom-skills/34-seo-reporting-dashboard/code/commands/seo-reporting-dashboard.md new file mode 100644 index 0000000..9f4572f --- /dev/null +++ b/custom-skills/34-seo-reporting-dashboard/code/commands/seo-reporting-dashboard.md @@ -0,0 +1,30 @@ +--- +name: seo-reporting-dashboard +description: | + SEO reporting dashboard and executive report generation. Aggregates data from all SEO skills + into stakeholder-ready reports and interactive HTML dashboards. + Triggers: SEO report, SEO dashboard, executive summary, 보고서, 대시보드, performance report. 
+allowed-tools: + - Bash + - Read + - Write + - WebFetch + - WebSearch +--- + +# SEO Reporting Dashboard + +## Generate HTML Dashboard +```bash +python custom-skills/34-seo-reporting-dashboard/code/scripts/dashboard_generator.py --report [JSON] --output dashboard.html +``` + +## Generate Executive Report (Korean) +```bash +python custom-skills/34-seo-reporting-dashboard/code/scripts/executive_report.py --report [JSON] --audience c-level --output report.md +``` + +## Aggregate All Skill Outputs +```bash +python custom-skills/34-seo-reporting-dashboard/code/scripts/report_aggregator.py --domain [URL] --json +``` diff --git a/custom-skills/34-seo-reporting-dashboard/code/scripts/base_client.py b/custom-skills/34-seo-reporting-dashboard/code/scripts/base_client.py new file mode 100644 index 0000000..e26ed4a --- /dev/null +++ b/custom-skills/34-seo-reporting-dashboard/code/scripts/base_client.py @@ -0,0 +1,169 @@ +""" +Base Client - Shared async client utilities +=========================================== +Purpose: Rate-limited async operations for API clients +Python: 3.10+ +""" + +import asyncio +import logging +import os +from asyncio import Semaphore +from datetime import datetime +from typing import Any, Callable, TypeVar + +from dotenv import load_dotenv +from tenacity import ( + retry, + stop_after_attempt, + wait_exponential, + retry_if_exception_type, +) + +load_dotenv() + +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(levelname)s - %(message)s", +) + +T = TypeVar("T") + + +class RateLimiter: + """Rate limiter using token bucket algorithm.""" + + def __init__(self, rate: float, per: float = 1.0): + self.rate = rate + self.per = per + self.tokens = rate + self.last_update = datetime.now() + self._lock = asyncio.Lock() + + async def acquire(self) -> None: + async with self._lock: + now = datetime.now() + elapsed = (now - self.last_update).total_seconds() + self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per)) + 
self.last_update = now + + if self.tokens < 1: + wait_time = (1 - self.tokens) * (self.per / self.rate) + await asyncio.sleep(wait_time) + self.tokens = 0 + else: + self.tokens -= 1 + + +class BaseAsyncClient: + """Base class for async API clients with rate limiting.""" + + def __init__( + self, + max_concurrent: int = 5, + requests_per_second: float = 3.0, + logger: logging.Logger | None = None, + ): + self.semaphore = Semaphore(max_concurrent) + self.rate_limiter = RateLimiter(requests_per_second) + self.logger = logger or logging.getLogger(self.__class__.__name__) + self.stats = { + "requests": 0, + "success": 0, + "errors": 0, + "retries": 0, + } + + @retry( + stop=stop_after_attempt(3), + wait=wait_exponential(multiplier=1, min=2, max=10), + retry=retry_if_exception_type(Exception), + ) + async def _rate_limited_request( + self, + coro: Callable[[], Any], + ) -> Any: + async with self.semaphore: + await self.rate_limiter.acquire() + self.stats["requests"] += 1 + try: + result = await coro() + self.stats["success"] += 1 + return result + except Exception as e: + self.stats["errors"] += 1 + self.logger.error(f"Request failed: {e}") + raise + + async def batch_requests( + self, + requests: list[Callable[[], Any]], + desc: str = "Processing", + ) -> list[Any]: + try: + from tqdm.asyncio import tqdm + has_tqdm = True + except ImportError: + has_tqdm = False + + async def execute(req: Callable) -> Any: + try: + return await self._rate_limited_request(req) + except Exception as e: + return {"error": str(e)} + + tasks = [execute(req) for req in requests] + + if has_tqdm: + results = [] + for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc): + result = await coro + results.append(result) + return results + else: + return await asyncio.gather(*tasks, return_exceptions=True) + + def print_stats(self) -> None: + self.logger.info("=" * 40) + self.logger.info("Request Statistics:") + self.logger.info(f" Total Requests: {self.stats['requests']}") + 
self.logger.info(f" Successful: {self.stats['success']}") + self.logger.info(f" Errors: {self.stats['errors']}") + self.logger.info("=" * 40) + + +class ConfigManager: + """Manage API configuration and credentials.""" + + def __init__(self): + load_dotenv() + + @property + def google_credentials_path(self) -> str | None: + seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json") + if os.path.exists(seo_creds): + return seo_creds + return os.getenv("GOOGLE_APPLICATION_CREDENTIALS") + + @property + def pagespeed_api_key(self) -> str | None: + return os.getenv("PAGESPEED_API_KEY") + + @property + def notion_token(self) -> str | None: + return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY") + + def validate_google_credentials(self) -> bool: + creds_path = self.google_credentials_path + if not creds_path: + return False + return os.path.exists(creds_path) + + def get_required(self, key: str) -> str: + value = os.getenv(key) + if not value: + raise ValueError(f"Missing required environment variable: {key}") + return value + + +config = ConfigManager() diff --git a/custom-skills/34-seo-reporting-dashboard/code/scripts/dashboard_generator.py b/custom-skills/34-seo-reporting-dashboard/code/scripts/dashboard_generator.py new file mode 100644 index 0000000..8a37367 --- /dev/null +++ b/custom-skills/34-seo-reporting-dashboard/code/scripts/dashboard_generator.py @@ -0,0 +1,745 @@ +""" +Dashboard Generator - Interactive HTML SEO dashboard with Chart.js +================================================================== +Purpose: Generate a self-contained HTML dashboard from aggregated SEO + report data, with responsive charts for health scores, traffic + trends, keyword rankings, issue breakdowns, and competitor radar. 
+Python: 3.10+ + +Usage: + python dashboard_generator.py --report aggregated_report.json --output dashboard.html + python dashboard_generator.py --report aggregated_report.json --output dashboard.html --title "My SEO Dashboard" +""" + +import argparse +import json +import logging +import sys +from dataclasses import dataclass, field +from datetime import datetime +from pathlib import Path +from typing import Any + +from jinja2 import Template + +logger = logging.getLogger(__name__) + +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(levelname)s - %(message)s", +) + + +# --------------------------------------------------------------------------- +# Data classes +# --------------------------------------------------------------------------- + +@dataclass +class DashboardConfig: + """Configuration for dashboard generation.""" + title: str = "SEO Reporting Dashboard" + domain: str = "" + date_range: str = "" + theme: str = "light" + chart_options: dict[str, Any] = field(default_factory=dict) + + +# --------------------------------------------------------------------------- +# HTML template +# --------------------------------------------------------------------------- + +DASHBOARD_TEMPLATE = """ + +
+                <table>
+                    <thead>
+                        <tr>
+                            <th>Date</th>
+                            <th>Skill</th>
+                            <th>Category</th>
+                            <th>Score</th>
+                            <th>Issues</th>
+                        </tr>
+                    </thead>
+                    <tbody>
+                        {% for entry in timeline %}
+                        <tr>
+                            <td>{{ entry.date }}</td>
+                            <td>{{ entry.skill }}</td>
+                            <td>{{ entry.category }}</td>
+                            <td>{{ entry.health_score }}</td>
+                            <td>{{ entry.issues_count }}</td>
+                        </tr>
+                        {% endfor %}
+                    </tbody>
+                </table>