Add SEO skills 19-28, 31-32 with full Python implementations
12 new skills: Keyword Strategy, SERP Analysis, Position Tracking, Link Building, Content Strategy, E-Commerce SEO, KPI Framework, International SEO, AI Visibility, Knowledge Graph, Competitor Intel, and Crawl Budget. ~20K lines of Python across 25 domain scripts. Updated skill 11 pipeline table and repo CLAUDE.md. Enhanced skill 18 local SEO workflow from jamie.clinic audit. Note: Skill 26 hreflang_validator.py pending (content filter block). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CLAUDE.md
@@ -35,7 +35,7 @@ This is a Claude Skills collection repository containing:
| 09 | ourdigital-backoffice | Business document creation | "create proposal", "견적서" |
| 10 | ourdigital-skill-creator | Meta skill for creating skills | "create skill", "init skill" |

### SEO Tools (11-32)

| # | Skill | Purpose | Trigger |
|---|-------|---------|---------|
@@ -47,22 +47,20 @@ This is a Claude Skills collection repository containing:
| 16 | seo-schema-validator | Structured data validation | "validate schema", "JSON-LD" |
| 17 | seo-schema-generator | Schema markup creation | "generate schema", "create JSON-LD" |
| 18 | seo-local-audit | NAP, GBP, citations | "local SEO", "Google Business Profile" |
| 19 | seo-keyword-strategy | Keyword expansion, intent, clustering, gaps | "keyword research", "keyword strategy" |
| 20 | seo-serp-analysis | Google/Naver SERP features, competitor positions | "SERP analysis", "SERP features" |
| 21 | seo-position-tracking | Rank monitoring, visibility scores, alerts | "rank tracking", "position monitoring" |
| 22 | seo-link-building | Backlink audit, toxic links, link gaps | "backlink audit", "link building" |
| 23 | seo-content-strategy | Content audit, decay, briefs, clusters | "content strategy", "content audit" |
| 24 | seo-ecommerce | Product page audit, product schema | "e-commerce SEO", "product SEO" |
| 25 | seo-kpi-framework | Unified KPIs, health scores, ROI | "SEO KPI", "SEO performance" |
| 26 | seo-international | Hreflang, content parity, multi-language | "international SEO", "hreflang" |
| 27 | seo-ai-visibility | AI search citations, brand radar, SOV | "AI visibility", "AI search" |
| 28 | seo-knowledge-graph | Entity SEO, Knowledge Panel, PAA | "knowledge graph", "entity SEO" |
| 29 | seo-gateway-architect | Gateway page strategy | "SEO strategy", "gateway pages" |
| 30 | seo-gateway-builder | Gateway page content | "build gateway page" |
| 31 | seo-competitor-intel | Competitor profiling, benchmarking, threats | "competitor analysis", "competitive intel" |
| 32 | seo-crawl-budget | Log analysis, bot profiling, crawl waste | "crawl budget", "log analysis" |

### GTM/GA Tools (60-69)
@@ -209,9 +207,20 @@ our-claude-skills/
│ ├── 16-seo-schema-validator/
│ ├── 17-seo-schema-generator/
│ ├── 18-seo-local-audit/
│ ├── 19-seo-keyword-strategy/
│ ├── 20-seo-serp-analysis/
│ ├── 21-seo-position-tracking/
│ ├── 22-seo-link-building/
│ ├── 23-seo-content-strategy/
│ ├── 24-seo-ecommerce/
│ ├── 25-seo-kpi-framework/
│ ├── 26-seo-international/
│ ├── 27-seo-ai-visibility/
│ ├── 28-seo-knowledge-graph/
│ ├── 29-seo-gateway-architect/
│ ├── 30-seo-gateway-builder/
│ ├── 31-seo-competitor-intel/
│ ├── 32-seo-crawl-budget/
│ │
│ ├── 60-gtm-audit/
│ ├── 61-gtm-manager/
@@ -34,9 +34,38 @@ python scripts/seo_audit_orchestrator.py --url https://example.com --json
| 2 | On-Page SEO | `13-seo-on-page-audit/code/scripts/page_analyzer.py` |
| 3 | Core Web Vitals | `14-seo-core-web-vitals/code/scripts/pagespeed_client.py` |
| 4 | Schema Validation | `16-seo-schema-validator/code/scripts/schema_validator.py` |
| 5 | Local SEO | `18-seo-local-audit/` (prompt-driven — see Stage 5 notes below) |
| 6 | Search Console | `15-seo-search-console/code/scripts/gsc_client.py` |

## Stage 5: Local SEO — Key Requirements

Stage 5 is prompt-driven and requires **Business Identity extraction as a mandatory first step**:

1. Extract Korean name, English name, address, phone from website JSON-LD schema markup (`Organization`/`Hospital`/`LocalBusiness`)
2. Check website footer, contact page, and schema `sameAs` for GBP, Naver Place, and Kakao Map URLs
3. Use layered search fallback if listing URLs are not found on the website
4. Follow `18-seo-local-audit/code/CLAUDE.md` for the full workflow
5. **Korean market priorities**: GBP and Naver Smart Place are both Critical; Kakao Map is High; US-centric directories (Yelp, Yellow Pages) are Low
6. **Important**: GBP and Naver Map are JS-rendered. Report unfound listings as "not discoverable via web search" — not "does not exist"
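The layered search fallback in requirement 3 can be sketched as a small helper that returns queries in priority order. The function name and tuple shape are illustrative, not part of the repo's scripts; the actual skill issues these queries via WebSearch:

```python
def fallback_queries(korean_name, district, phone):
    """Yield (platform, query) pairs in the priority order used by Stage 5."""
    return [
        ("gbp",   f'"{korean_name}" "{district}" Google Maps'),
        ("gbp",   f'"{phone}" site:google.com/maps'),
        ("naver", f'"{korean_name}" site:map.naver.com'),
        ("naver", f'"{korean_name}" 네이버 지도 {district}'),
        ("kakao", f'"{korean_name}" site:place.map.kakao.com'),
    ]
```

Running the first match-producing query and stopping mirrors the "try in order, stop when found" rule in the skill's workflow.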

## Extended SEO Skills Pipeline

Beyond the 6 core audit stages, additional specialized skills are available for deeper analysis:

| Skill | Audit ID | Purpose | Command |
|-------|----------|---------|---------|
| 19 - Keyword Strategy | KW | Seed expansion, intent classification, keyword gaps | `/seo-keyword-strategy` |
| 20 - SERP Analysis | SERP | Google/Naver SERP features, competitor positions | `/seo-serp-analysis` |
| 21 - Position Tracking | RANK | Rank monitoring, visibility scores, alerts | `/seo-position-tracking` |
| 22 - Link Building | LINK | Backlink audit, toxic links, link gaps | `/seo-link-building` |
| 23 - Content Strategy | CONTENT | Content audit, decay detection, briefs | `/seo-content-strategy` |
| 24 - E-Commerce SEO | ECOM | Product page audit, product schema | `/seo-ecommerce` |
| 25 - SEO KPI Framework | KPI | Unified KPIs, health scores, ROI | `/seo-kpi-framework` |
| 26 - International SEO | INTL | Hreflang validation, content parity | `/seo-international` |
| 27 - AI Visibility | AI | AI search citations, brand radar, SOV | `/seo-ai-visibility` |
| 28 - Knowledge Graph | KG | Entity SEO, Knowledge Panel, PAA | `/seo-knowledge-graph` |
| 31 - Competitor Intel | COMP | Competitor profiling, benchmarking | `/seo-competitor-intel` |
| 32 - Crawl Budget | CRAWL | Log analysis, bot profiling, waste | `/seo-crawl-budget` |

## Health Score Weights

| Category | Weight |
@@ -62,10 +62,37 @@ python "$SKILLS/14-seo-core-web-vitals/code/scripts/pagespeed_client.py" --url $
# Stage 4: Schema Validation
python "$SKILLS/16-seo-schema-validator/code/scripts/schema_validator.py" --url $URL --json

# Stage 5: Local SEO (see detailed instructions below)
# Stage 6: Search Console (requires GSC API credentials)
```

### Stage 5: Local SEO — Detailed Instructions

Stage 5 is prompt-driven (no script). Follow this sequence:

1. **Extract Business Identity from website (MANDATORY FIRST)**
   - WebFetch the homepage and parse JSON-LD `<script type="application/ld+json">` tags
   - Extract from `Organization`, `Hospital`, or `LocalBusiness` schema: Korean name, English name, address, telephone
   - Check `sameAs` array for GBP, Naver Place, Kakao Map URLs

2. **Check website for listing links**
   - Scrape footer, contact page, about page for links matching:
     - GBP: `maps.app.goo.gl/*`, `google.com/maps/place/*`, `g.page/*`
     - Naver Place: `naver.me/*`, `map.naver.com/*/place/*`, `m.place.naver.com/*`
     - Kakao Map: `place.map.kakao.com/*`, `kko.to/*`
   - Check embedded iframes for Google Maps Place IDs or Naver Map embeds

3. **Layered search fallback (if links not found on website)**
   - GBP: Search `"[Korean Name]" "[district]" Google Maps`, then `"[phone]" site:google.com/maps`
   - Naver: Search `"[Korean Name]" site:map.naver.com`, then `"[Korean Name]" 네이버 지도 [district]`
   - Kakao: Search `"[Korean Name]" site:place.map.kakao.com`

4. **Follow `18-seo-local-audit/code/CLAUDE.md` workflow** for the full audit (Steps 2-7)

5. **Important language**: Distinguish **"not discoverable via web search"** from **"does not exist."** GBP and Naver Map are JS-rendered; WebFetch cannot extract their listing data. Absence in search results does not confirm absence of the listing.

6. **Korean market priorities**: GBP and Naver Smart Place are both Critical. Kakao Map is High. US-centric directories (Yelp, Yellow Pages) are Low priority for Korean businesses.
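The listing-URL patterns in step 2 translate directly into a small matcher over candidate hrefs. This is a sketch using plain regexes; `classify_listing_url` is a hypothetical helper, not one of the repo's scripts:

```python
import re

# Regex forms of the Stage 5 listing-URL patterns above.
LISTING_PATTERNS = {
    "gbp": [r"maps\.app\.goo\.gl/", r"google\.com/maps/place/", r"(?:^|//|\.)g\.page/"],
    "naver": [r"naver\.me/", r"map\.naver\.com/.+/place/", r"m\.place\.naver\.com/"],
    "kakao": [r"place\.map\.kakao\.com/", r"kko\.to/"],
}

def classify_listing_url(url):
    """Return 'gbp', 'naver', 'kakao', or None for a candidate link."""
    for platform, patterns in LISTING_PATTERNS.items():
        if any(re.search(p, url) for p in patterns):
            return platform
    return None
```

Any non-None result short-circuits the layered search fallback in step 3 for that platform.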

## Health Score (Weighted 0-100)

| Category | Weight |
@@ -2,109 +2,253 @@

## Overview

Local SEO auditor for Korean-market businesses with physical locations. Covers business identity extraction, GBP optimization, Naver Smart Place, Kakao Map, NAP consistency, local citations, and LocalBusiness schema validation.

## Workflow

### Step 0: Business Identity (MANDATORY FIRST STEP)

Before any audit work, establish the official business identity.

**Sources (in priority order):**
1. Website schema markup (JSON-LD `Organization`, `Hospital`, `LocalBusiness`) — the `name` field is authoritative
2. Contact page / About page
3. Footer (address, phone, social links)
4. User-provided information (known GBP URL, Naver Place URL, etc.)

**Data to collect:**

| Field | Example |
|-------|---------|
| Official name (Korean) | 제이미성형외과의원 |
| Official name (English) | Jamie Plastic Surgery Clinic |
| Brand/display name | Jamie Clinic |
| Website URL | https://www.jamie.clinic |
| Address (Korean) | 서울특별시 강남구 ... |
| Phone | 02-XXX-XXXX |
| Known GBP URL | (if available) |
| Known Naver Place URL | (if available) |
| Known Kakao Map URL | (if available) |

**How to extract:**

```
WebFetch homepage → parse JSON-LD script tags → extract name, address, telephone, sameAs
WebFetch /contact or /about → extract NAP from page content
Check footer for social links, map embeds, place listing URLs
```
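The JSON-LD extraction step above can be sketched with the standard library alone. This is a minimal illustration that assumes the homepage HTML has already been fetched; class and function names are hypothetical, not part of the skill's scripts:

```python
import json
from html.parser import HTMLParser

class JsonLdParser(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._buf = None
        self.blocks = []
    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buf = []
    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None
    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

def extract_identity(html):
    """Return name/address/telephone/sameAs from the first business-like schema block."""
    parser = JsonLdParser()
    parser.feed(html)
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than failing the audit
        if isinstance(data, dict) and data.get("@type") in ("Organization", "Hospital", "LocalBusiness"):
            return {key: data.get(key) for key in ("name", "address", "telephone", "sameAs")}
    return None
```

A production version would also handle `@graph` wrappers and `@type` arrays; this sketch only covers the flat single-object case.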

Look specifically for these URL patterns in `sameAs`, footer links, or embedded iframes:
- GBP: `maps.app.goo.gl/*`, `google.com/maps/place/*`, `g.page/*`
- Naver Place: `naver.me/*`, `map.naver.com/*/place/*`, `m.place.naver.com/*`
- Kakao Map: `place.map.kakao.com/*`, `kko.to/*`

### Step 1: Website NAP Extraction

Scrape header, footer, contact page, about page for NAP mentions. Cross-reference with schema markup. Establish the **canonical NAP** baseline (the single source of truth for this audit).

### Step 2: GBP Verification & Audit

**Layered discovery (try in order, stop when found):**
1. Use provided GBP URL (from Step 0 or user input)
2. Check website for GBP link (footer, contact page, schema `sameAs`, embedded Google Maps iframe with Place ID)
3. WebSearch: `"[Korean Name]" "[City/District]" Google Maps`
4. WebSearch: `"[English Name]" Google Maps [City]`
5. WebSearch: `"[exact phone number]" site:google.com/maps`

**Important**: Google Maps is JS-rendered — WebFetch cannot extract business data from the listing page itself. Use WebSearch to find the listing URL, then verify details via search result snippets.

**If found — audit checklist (score /10):**
- [ ] Business name matches canonical NAP
- [ ] Address is complete and accurate
- [ ] Phone number matches
- [ ] Business hours are current
- [ ] Primary + secondary categories are appropriate
- [ ] Business description is complete
- [ ] 10+ photos uploaded (exterior, interior, products/services)
- [ ] Posts are recent (within 7 days)
- [ ] Reviews are responded to
- [ ] Q&A section is active

**If NOT found after all attempts:** Report as **"not discoverable via web search"** — this is distinct from "does not exist." The listing may exist but be unfindable through non-JS search methods.

### Step 3: Naver Smart Place Verification & Audit

**Layered discovery (try in order, stop when found):**
1. Use provided Naver Place URL (from Step 0 or user input)
2. Check website for Naver Place link (footer, contact page, schema `sameAs`, `naver.me/*` or `map.naver.com/*/place/*` patterns)
3. WebSearch: `"[Korean Name]" site:map.naver.com`
4. WebSearch: `"[Korean Name]" 네이버 지도 [district]`
5. WebSearch: `"[Korean Name]" 네이버 스마트플레이스`
6. WebSearch: `"[exact phone number]" site:map.naver.com`

**Important**: Naver Map is JS-rendered — WebFetch cannot extract data from the listing page. Use WebSearch for discovery, verify via search result snippets.

**If found — audit checklist (score /10):**
- [ ] Business name matches canonical NAP
- [ ] Address is complete and accurate
- [ ] Phone number matches
- [ ] Business hours are current
- [ ] Place is "claimed" (owner-managed / 업주 등록)
- [ ] Keywords/tags are set
- [ ] Booking/reservation link present
- [ ] Recent blog reviews linked
- [ ] Photos uploaded and current
- [ ] Menu/service/price information present

**If NOT found after all attempts:** Report as **"not discoverable via web search"** (not "does not exist" or "not registered").

### Step 4: Kakao Map Verification

**Discovery:**
1. Use provided Kakao Map URL (from Step 0)
2. Check website for Kakao Map link (`place.map.kakao.com/*`, `kko.to/*`)
3. WebSearch: `"[Korean Name]" site:place.map.kakao.com`
4. WebSearch: `"[Korean Name]" 카카오맵 [district]`

**If found:** Verify NAP consistency against canonical NAP.

### Step 5: Citation Discovery

**Korean market platform priorities:**

| Platform | Priority | Market |
|----------|----------|--------|
| Google Business Profile | Critical | Global |
| Naver Smart Place (네이버 스마트플레이스) | Critical | Korea |
| Kakao Map (카카오맵) | High | Korea |
| Industry-specific directories | High | Varies |
| Apple Maps | Medium | Global |
| Bing Places | Low | Global |

**Korean medical/cosmetic industry directories:**
- 강남언니 (Gangnam Unni)
- 바비톡 (Babitalk)
- 성예사 (Sungyesa)
- 굿닥 (Goodoc)
- 똑닥 (Ddocdoc)
- 모두닥 (Modoodoc)
- 하이닥 (HiDoc)

**Discovery methods:**
- Phone number search across platforms
- Korean business name + district search
- English business name search
- Address search

### Step 6: NAP Consistency Report

Cross-reference all discovered sources against the canonical NAP from Step 1.

**Common inconsistency points to check:**
- Building/landmark names (e.g., "EHL빌딩" vs "엔와이빌딩") — the authoritative source is the **business registration certificate** (사업자등록증), not the website alone
- Phone format variations (02-XXX-XXXX vs +82-2-XXX-XXXX vs 02XXXXXXX)
- Address format (road-name vs lot-number / 도로명 vs 지번)
- Korean vs English name spelling variations
- Suite/floor number omissions
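Phone format variation is the most mechanical check in the list above. A minimal normalizer, assuming Korean numbers and a simple `+82` prefix rule; function names are illustrative:

```python
import re

def normalize_phone_kr(phone):
    """Reduce Korean phone formats (02-XXX-XXXX, +82-2-..., 02XXXXXXXX) to bare digits with a leading 0."""
    digits = re.sub(r"\D", "", phone)       # strip hyphens, spaces, parentheses, '+'
    if digits.startswith("82"):
        digits = "0" + digits[2:]           # +82-2-... -> 02...
    return digits

def phones_match(a, b):
    """True if two phone strings refer to the same number after normalization."""
    return normalize_phone_kr(a) == normalize_phone_kr(b)
```

Name and address comparisons need fuzzier matching (romanization, road-name vs lot-number addresses) and stay manual in this skill.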

### Step 7: LocalBusiness Schema Validation

Validate JSON-LD completeness:
- @type (LocalBusiness, Hospital, or appropriate subtype)
- name (Korean and/or English)
- address (PostalAddress with Korean format)
- telephone
- openingHours / openingHoursSpecification
- geo (GeoCoordinates — latitude, longitude)
- sameAs (should include GBP, Naver Place, Kakao Map, social profiles)
- url
- image

Use schema generator skill (17) for creating/fixing markup.
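The completeness check above can be sketched as a field walk over the parsed JSON-LD. `missing_fields` is a hypothetical helper; it treats `openingHoursSpecification` as satisfying the `openingHours` requirement, per the list above:

```python
# Fields required by Step 7, in report order.
REQUIRED = ["@type", "name", "address", "telephone", "openingHours",
            "geo", "sameAs", "url", "image"]

# Accepted alternatives for a required field.
ALTERNATIVES = {"openingHours": ["openingHoursSpecification"]}

def missing_fields(schema):
    """Return the required fields absent (or empty) in a parsed JSON-LD dict."""
    missing = []
    for field in REQUIRED:
        candidates = [field] + ALTERNATIVES.get(field, [])
        if not any(schema.get(c) for c in candidates):
            missing.append(field)
    return missing
```

The returned list feeds the "Missing fields" line of the report template.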

## Scoring

| Component | Weight | Max Score |
|-----------|--------|-----------|
| Business Identity completeness | 5% | /10 |
| NAP Consistency | 20% | /10 |
| GBP Optimization | 20% | /10 |
| Naver Smart Place | 20% | /10 |
| Kakao Map presence | 10% | /10 |
| Citations (directories) | 10% | /10 |
| LocalBusiness Schema | 15% | /10 |

**Overall Local SEO Score** = weighted average, normalized to /100.
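Given the weights above, the overall score is a straightforward weighted sum. A sketch, assuming each component score uses the 0-10 scale from the table; the component keys are illustrative names for the table rows:

```python
# Weights from the Scoring table (must sum to 1.0).
WEIGHTS = {
    "business_identity": 0.05,
    "nap_consistency": 0.20,
    "gbp": 0.20,
    "naver_smart_place": 0.20,
    "kakao_map": 0.10,
    "citations": 0.10,
    "schema": 0.15,
}

def overall_local_seo_score(scores):
    """scores: component -> 0..10. Returns the weighted score on a 0-100 scale."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(WEIGHTS[c] * s for c, s in scores.items()) * 10, 1)
```

A perfect 10 on every component yields exactly 100; zeroing GBP alone costs 20 points, matching its 20% weight.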

## Output Format

```markdown
## Local SEO Audit: [Business Name]
**Date**: YYYY-MM-DD
**Website**: [URL]

### Business Identity
| Field | Value |
|-------|-------|
| Korean Name | ... |
| English Name | ... |
| Brand Name | ... |
| Address | ... |
| Phone | ... |

### NAP Consistency: X/10
| Source | Name | Address | Phone | Status |
|--------|------|---------|-------|--------|
| Website | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| GBP | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| Naver Place | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| Kakao Map | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |

### GBP Optimization: X/10
- [x] Completed items
- [ ] Missing items
**GBP URL**: [URL or "not discoverable"]

### Naver Smart Place: X/10
- [x] Completed items
- [ ] Missing items
**Naver Place URL**: [URL or "not discoverable"]

### Kakao Map: X/10
**Status**: Found/Not discoverable
**Kakao Map URL**: [URL or "not discoverable"]

### Citations: X/10
| Platform | Found | NAP Match |
|----------|-------|-----------|
| 강남언니 | Yes/No | OK/Issue |
| ... | | |

### LocalBusiness Schema: X/10
- Present: Yes/No
- Valid: Yes/No
- Missing fields: [list]

### Overall Score: XX/100 (Grade)

### Priority Actions
1. [Highest impact recommendation]
2. ...
```
## Common Issues

| Issue | Impact | Fix |
|-------|--------|-----|
| NAP inconsistency | High | Update all directories to match canonical NAP |
| Missing Naver Smart Place | Critical | Register and claim via smartplace.naver.com |
| Unclaimed Naver Place | High | Claim ownership via 네이버 스마트플레이스 |
| Missing GBP listing | Critical | Create via business.google.com |
| Building name mismatch | Medium | Align to business registration certificate |
| No LocalBusiness schema | Medium | Add JSON-LD markup with sameAs links |
| Missing GeoCoordinates | Medium | Add lat/lng to schema |
| No sameAs in schema | Medium | Add GBP, Naver, Kakao, social URLs |
## Notes

- GBP and Naver Map are JS-rendered — WebFetch cannot extract listing data directly. Always use WebSearch for discovery.
- "Not discoverable via web search" != "does not exist." Always use this precise language.
- For Korean businesses, Naver Smart Place is as important as GBP (often more so for domestic traffic).
- Citation discovery is limited to publicly searchable data.

## Notion Output (Required)
@@ -123,20 +267,13 @@ Required properties:
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Local SEO |
| Priority | Select | Critical, High, Medium, Low |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: LOCAL-YYYYMMDD-NNN |
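The `LOCAL-YYYYMMDD-NNN` Audit ID can be generated mechanically. A minimal sketch; the helper name is illustrative and the sequence number `NNN` is assumed to be supplied by the caller:

```python
from datetime import date

def audit_id(seq, today=None):
    """Build a LOCAL-YYYYMMDD-NNN audit ID (NNN is a zero-padded sequence number)."""
    d = (today or date.today()).strftime("%Y%m%d")
    return f"LOCAL-{d}-{seq:03d}"
```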

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., SEO Audit, GBP, NAP, Schema Markup)
- URLs and code remain unchanged
@@ -1,125 +1,239 @@
|
|||||||
---
|
---
|
||||||
name: seo-local-audit
|
name: seo-local-audit
|
||||||
description: |
|
description: |
|
||||||
Local business SEO auditor for NAP consistency, Google Business Profile, and citations.
|
Local business SEO auditor for Korean-market businesses. Covers business identity extraction,
|
||||||
Triggers: local SEO, NAP audit, Google Business Profile, GBP optimization, local citations.
|
NAP consistency, Google Business Profile, Naver Smart Place, Kakao Map, local citations,
|
||||||
|
and LocalBusiness schema validation.
|
||||||
|
Triggers: local SEO, NAP audit, Google Business Profile, GBP optimization, local citations,
|
||||||
|
네이버 스마트플레이스, 카카오맵, 로컬 SEO.
|
||||||
---
|
---
|
||||||
|
|
||||||
# SEO Local Audit
|
# SEO Local Audit
|
||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
|
|
||||||
Audit local business SEO: NAP (Name, Address, Phone) consistency, Google Business Profile optimization, local citations, and LocalBusiness schema markup.
|
Audit local business SEO for Korean-market businesses: business identity extraction, NAP consistency, GBP optimization, Naver Smart Place, Kakao Map, local citations, and LocalBusiness schema markup.
|
||||||
|
|
||||||
## Core Capabilities
|
## Core Capabilities
|
||||||
|
|
||||||
1. **NAP Consistency** - Cross-platform verification
|
1. **Business Identity** - Extract official names, address, phone from website schema/content
|
||||||
2. **GBP Optimization** - Profile completeness check
|
2. **NAP Consistency** - Cross-platform verification against canonical NAP
3. **GBP Optimization** - Layered discovery + profile completeness audit
4. **Naver Smart Place** - Layered discovery + listing completeness audit
5. **Kakao Map** - Presence verification + NAP check
6. **Citation Audit** - Korean-first directory presence
7. **Schema Validation** - LocalBusiness JSON-LD markup

## MCP Tool Usage

```
mcp__firecrawl__scrape: Extract NAP and schema from website
mcp__perplexity__search: Find citations, GBP, Naver Place listings
mcp__notion__create-page: Save audit findings
```

## Workflow

### Step 0: Business Identity (MANDATORY FIRST STEP)

Before any audit, establish the official business identity.

**Sources (in priority order):**

1. Website schema markup (JSON-LD `Organization`, `Hospital`, `LocalBusiness`) — `name` field is authoritative
2. Contact page / About page
3. Footer (address, phone, social links)
4. User-provided information

**Data to collect:**

| Field | Example |
|-------|---------|
| Official name (Korean) | 제이미성형외과의원 |
| Official name (English) | Jamie Plastic Surgery Clinic |
| Brand/display name | Jamie Clinic |
| Website URL | https://www.jamie.clinic |
| Address (Korean) | 서울특별시 강남구 ... |
| Phone | 02-XXX-XXXX |
| Known GBP URL | (if available) |
| Known Naver Place URL | (if available) |
| Known Kakao Map URL | (if available) |

Look for these URL patterns in `sameAs`, footer links, or embedded iframes:

- GBP: `maps.app.goo.gl/*`, `google.com/maps/place/*`, `g.page/*`
- Naver Place: `naver.me/*`, `map.naver.com/*/place/*`, `m.place.naver.com/*`
- Kakao Map: `place.map.kakao.com/*`, `kko.to/*`
### Step 1: Website NAP Extraction

Scrape header, footer, contact page, about page. Cross-reference with schema markup. Establish the **canonical NAP** baseline.

### Step 2: GBP Verification & Audit

**Layered discovery (try in order, stop when found):**

1. Use provided GBP URL (from Step 0 or user input)
2. Check website for GBP link (footer, contact, schema `sameAs`, embedded Google Maps iframe)
3. Search: `"[Korean Name]" "[City/District]" Google Maps`
4. Search: `"[English Name]" Google Maps [City]`
5. Search: `"[exact phone number]" site:google.com/maps`

**Important**: Google Maps is JS-rendered — scraping tools cannot extract business data. Use search for discovery, verify via search result snippets.

**If found — audit checklist (score /10):**

- [ ] Business name matches canonical NAP
- [ ] Address is complete and accurate
- [ ] Phone number matches
- [ ] Business hours are current
- [ ] Primary + secondary categories appropriate
- [ ] Business description complete
- [ ] 10+ photos uploaded
- [ ] Posts are recent (within 7 days)
- [ ] Reviews are responded to
- [ ] Q&A section is active

**If NOT found:** Report as **"not discoverable via web search"** (distinct from "does not exist").

### Step 3: Naver Smart Place Verification & Audit

**Layered discovery (try in order, stop when found):**

1. Use provided Naver Place URL (from Step 0 or user input)
2. Check website for Naver Place link (footer, contact, schema `sameAs`)
3. Search: `"[Korean Name]" site:map.naver.com`
4. Search: `"[Korean Name]" 네이버 지도 [district]`
5. Search: `"[Korean Name]" 네이버 스마트플레이스`
6. Search: `"[exact phone number]" site:map.naver.com`

**Important**: Naver Map is JS-rendered — scraping tools cannot extract data. Use search for discovery, verify via snippets.

**If found — audit checklist (score /10):**

- [ ] Business name matches canonical NAP
- [ ] Address is complete and accurate
- [ ] Phone number matches
- [ ] Business hours are current
- [ ] Place is "claimed" (owner-managed / 업주 등록)
- [ ] Keywords/tags are set
- [ ] Booking/reservation link present
- [ ] Recent blog reviews linked
- [ ] Photos uploaded and current
- [ ] Menu/service/price information present

**If NOT found:** Report as **"not discoverable via web search"** (not "does not exist" or "not registered").

### Step 4: Kakao Map Verification

**Discovery:**

1. Use provided Kakao Map URL (from Step 0)
2. Check website for Kakao Map link (`place.map.kakao.com/*`, `kko.to/*`)
3. Search: `"[Korean Name]" site:place.map.kakao.com`
4. Search: `"[Korean Name]" 카카오맵 [district]`

**If found:** Verify NAP consistency against canonical NAP.

### Step 5: Citation Discovery

**Korean market platform priorities:**

| Platform | Priority | Market |
|----------|----------|--------|
| Google Business Profile | Critical | Global |
| Naver Smart Place (네이버 스마트플레이스) | Critical | Korea |
| Kakao Map (카카오맵) | High | Korea |
| Industry-specific directories | High | Varies |
| Apple Maps | Medium | Global |
| Bing Places | Low | Global |

**Korean medical/cosmetic industry directories:**

- 강남언니 (Gangnam Unni)
- 바비톡 (Babitalk)
- 성예사 (Sungyesa)
- 굿닥 (Goodoc)
- 똑닥 (Ddocdoc)
- 모두닥 (Modoodoc)
- 하이닥 (HiDoc)

### Step 6: NAP Consistency Report

Cross-reference all sources against canonical NAP.

**Common inconsistency points:**

- Building/landmark names — authoritative source is the **business registration certificate** (사업자등록증)
- Phone format variations (02-XXX-XXXX vs +82-2-XXX-XXXX)
- Address format (road-name vs lot-number / 도로명 vs 지번)
- Korean vs English name spelling variations
- Suite/floor number omissions
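The phone-format variation above is mechanical to neutralize before comparing sources. A minimal normalization sketch (the function name is illustrative, not from the skill's scripts):

```python
import re

def normalize_phone(phone: str) -> str:
    """Normalize Korean phone formats so 02-XXX-XXXX and +82-2-XXX-XXXX compare equal."""
    digits = re.sub(r"\D", "", phone)  # strip dashes, spaces, and the leading +
    # Convert the international prefix (+82) back to the domestic leading zero.
    if digits.startswith("82"):
        digits = "0" + digits[2:]
    return digits

print(normalize_phone("02-123-4567") == normalize_phone("+82-2-123-4567"))  # True
```

Comparing normalized digit strings rather than raw text keeps formatting-only differences out of the mismatch report, so only genuine NAP conflicts are flagged.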
### Step 7: LocalBusiness Schema Validation

Validate JSON-LD completeness: @type, name, address, telephone, openingHours, geo (GeoCoordinates), sameAs (GBP, Naver, Kakao, social), url, image.
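The completeness check above can be sketched in a few lines, assuming the JSON-LD block has already been parsed into a dict (helper name illustrative):

```python
REQUIRED_FIELDS = [
    "@type", "name", "address", "telephone", "openingHours",
    "geo", "sameAs", "url", "image",
]

def missing_schema_fields(schema: dict) -> list[str]:
    """Return LocalBusiness JSON-LD fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not schema.get(f)]

sample = {"@type": "LocalBusiness", "name": "Jamie Clinic", "url": "https://www.jamie.clinic"}
print(missing_schema_fields(sample))
```

The missing-field list feeds directly into the "Missing fields" line of the output format below.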
## Scoring

| Component | Weight | Max Score |
|-----------|--------|-----------|
| Business Identity completeness | 5% | /10 |
| NAP Consistency | 20% | /10 |
| GBP Optimization | 20% | /10 |
| Naver Smart Place | 20% | /10 |
| Kakao Map presence | 10% | /10 |
| Citations (directories) | 10% | /10 |
| LocalBusiness Schema | 15% | /10 |

**Overall Local SEO Score** = weighted average, normalized to /100.
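The weighting above can be sketched directly; the component keys here are illustrative names, not identifiers from the scripts.

```python
# Weights mirror the scoring table; they sum to 1.0.
WEIGHTS = {
    "business_identity": 0.05,
    "nap_consistency": 0.20,
    "gbp": 0.20,
    "naver_place": 0.20,
    "kakao_map": 0.10,
    "citations": 0.10,
    "schema": 0.15,
}

def overall_score(component_scores: dict[str, float]) -> float:
    """Weighted average of /10 component scores, normalized to /100."""
    total = sum(WEIGHTS[k] * component_scores.get(k, 0.0) for k in WEIGHTS)
    return round(total * 10, 1)

print(overall_score({k: 10.0 for k in WEIGHTS}))  # 100.0
```

Because the weights sum to 1.0, a site scoring 10/10 on every component lands exactly at 100.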
## Output Format

```markdown
## Local SEO Audit: [Business]

### Business Identity

| Field | Value |
|-------|-------|
| Korean Name | ... |
| English Name | ... |
| Address | ... |
| Phone | ... |

### NAP Consistency: X/10

| Source | Name | Address | Phone | Status |
|--------|------|---------|-------|--------|
| Website | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| GBP | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| Naver Place | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| Kakao Map | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |

### GBP Score: X/10
[Checklist results]

### Naver Smart Place: X/10
[Checklist results]

### Kakao Map: X/10
[Status + NAP check]

### Citations: X/10

| Platform | Found | NAP Match |
|----------|-------|-----------|
| ... | | |

### LocalBusiness Schema: X/10
- Present: Yes/No
- Valid: Yes/No
- Missing fields: [list]

### Overall Score: XX/100 (Grade)

### Priority Actions
1. [Recommendations]
```

## Notes

- GBP and Naver Map are JS-rendered — scraping tools cannot extract listing data. Always use search for discovery.
- "Not discoverable via web search" != "does not exist." Always use this precise language.
- For Korean businesses, Naver Smart Place is as important as GBP (often more so for domestic traffic).

## Notion Output (Required)

All audit reports MUST be saved to OurDigital SEO Audit Log:

- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (Local SEO), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: LOCAL-YYYYMMDD-NNN
custom-skills/19-seo-keyword-strategy/code/CLAUDE.md (new file, 139 lines)
# CLAUDE.md

## Overview

Keyword strategy and research tool for SEO campaigns. Expands seed keywords via Ahrefs APIs, classifies search intent, clusters topics, performs competitor keyword gap analysis, and supports Korean market keyword discovery including Naver autocomplete.

## Quick Start

```bash
# Install dependencies
pip install -r scripts/requirements.txt

# Keyword research from seed keyword
python scripts/keyword_researcher.py --keyword "치과 임플란트" --country kr --json

# Keyword gap analysis vs competitor
python scripts/keyword_gap_analyzer.py --target https://example.com --competitor https://competitor.com --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `keyword_researcher.py` | Expand seed keywords, classify intent, cluster topics | Keyword list with volume, KD, intent, clusters |
| `keyword_gap_analyzer.py` | Find competitor keyword gaps | Gap keywords with opportunity scores |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Keyword Researcher

```bash
# Basic expansion
python scripts/keyword_researcher.py --keyword "dental implant" --json

# Korean market with suffix expansion
python scripts/keyword_researcher.py --keyword "치과 임플란트" --country kr --korean-suffixes --json

# With volume-by-country comparison
python scripts/keyword_researcher.py --keyword "dental implant" --country kr --compare-global --json

# Output to file
python scripts/keyword_researcher.py --keyword "치과 임플란트" --country kr --output report.json
```

**Capabilities**:
- Seed keyword expansion (matching terms, related terms, search suggestions)
- Korean suffix expansion (추천, 가격, 후기, 잘하는곳, 부작용, 전후)
- Search intent classification (informational/navigational/commercial/transactional)
- Keyword clustering into topic groups
- Volume-by-country comparison (Korea vs global)
- Keyword difficulty scoring
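The Korean suffix expansion is the simplest of the capabilities above: each seed is combined with common modifier suffixes before volume lookup. A minimal sketch (the helper name is illustrative; `keyword_researcher.py` may implement it differently):

```python
# Common Korean modifier suffixes: recommendation, price, reviews,
# "good at it" places, side effects, before/after.
KOREAN_SUFFIXES = ["추천", "가격", "후기", "잘하는곳", "부작용", "전후"]

def expand_korean(seed: str) -> list[str]:
    """Append common Korean modifier suffixes to a seed keyword."""
    return [f"{seed} {suffix}" for suffix in KOREAN_SUFFIXES]

print(expand_korean("치과 임플란트"))
```

Each expanded phrase is then sent through the same volume/KD lookup as the seed, which is why `--korean-suffixes` multiplies the candidate set rather than the API surface.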
## Keyword Gap Analyzer

```bash
# Find gaps vs one competitor
python scripts/keyword_gap_analyzer.py --target https://example.com --competitor https://competitor.com --json

# Multiple competitors
python scripts/keyword_gap_analyzer.py --target https://example.com --competitor https://comp1.com --competitor https://comp2.com --json

# Filter by minimum volume
python scripts/keyword_gap_analyzer.py --target https://example.com --competitor https://competitor.com --min-volume 100 --json
```

**Capabilities**:
- Identify keywords competitors rank for but target doesn't
- Opportunity scoring based on volume, KD, and competitor positions
- Segment gaps by intent type
- Prioritize low-KD high-volume opportunities
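One way to combine volume, KD, and competitor positions into a single number is sketched below. This is an illustrative formula only; the actual scoring inside `keyword_gap_analyzer.py` may differ.

```python
def opportunity_score(volume: int, kd: float, best_competitor_position: int) -> float:
    """
    Illustrative scoring sketch: reward volume, penalize difficulty, and boost
    keywords where a competitor already ranks well (proof the SERP is winnable).
    """
    volume_factor = volume ** 0.5           # diminishing returns on raw volume
    difficulty_factor = 1 / (1 + kd / 10)   # KD 0 -> 1.0, KD 90 -> 0.1
    position_factor = 1.5 if best_competitor_position <= 3 else 1.0
    return round(volume_factor * difficulty_factor * position_factor, 2)

print(opportunity_score(100, 0, 1))   # 15.0
print(opportunity_score(100, 90, 10))  # 1.0
```

The square root on volume keeps one huge-volume keyword from drowning out a cluster of mid-volume, low-KD gaps, which matches the "prioritize low-KD high-volume opportunities" intent above.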
## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `keywords-explorer-overview` | Get keyword metrics (volume, KD, CPC) |
| `keywords-explorer-matching-terms` | Find matching keyword variations |
| `keywords-explorer-related-terms` | Discover semantically related keywords |
| `keywords-explorer-search-suggestions` | Get autocomplete suggestions |
| `keywords-explorer-volume-by-country` | Compare volume across countries |
| `keywords-explorer-volume-history` | Track volume trends over time |
| `site-explorer-organic-keywords` | Get competitor keyword rankings |

## Output Format

All scripts support `--json` flag for structured output:

```json
{
  "seed_keyword": "치과 임플란트",
  "country": "kr",
  "total_keywords": 150,
  "clusters": [
    {
      "topic": "임플란트 가격",
      "keywords": [...],
      "total_volume": 12000
    }
  ],
  "keywords": [
    {
      "keyword": "치과 임플란트 가격",
      "volume": 5400,
      "kd": 32,
      "cpc": 2.5,
      "intent": "commercial",
      "cluster": "임플란트 가격"
    }
  ],
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Keyword Research |
| Priority | Select | Based on opportunity score |
| Found Date | Date | Research date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: KW-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Keyword Difficulty, Search Volume, CPC)
- URLs and code remain unchanged
scripts/base_client.py (new file, 207 lines)
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
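`batch_requests` expects zero-argument callables, so callers typically build them with lambdas in a loop. The late-binding closure pitfall matters there: a bare `lambda: fetch(n)` would capture the final loop value in every closure. A self-contained sketch of the correct pattern (`fetch` is a stand-in for a real rate-limited call):

```python
import asyncio

async def fetch(n: int) -> int:
    await asyncio.sleep(0)  # stand-in for a rate-limited API call
    return n * 2

async def main() -> list[int]:
    # Bind the loop variable via a default argument; each closure
    # then carries its own value of n.
    requests = [lambda n=n: fetch(n) for n in range(3)]
    return await asyncio.gather(*(req() for req in requests))

print(asyncio.run(main()))  # [0, 2, 4]
```

The same list of callables can be handed to `BaseAsyncClient.batch_requests`, which wraps each call with the semaphore, token bucket, and retry logic above.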
scripts/keyword_gap_analyzer.py (new file, 584 lines)
|
"""
|
||||||
|
Keyword Gap Analyzer - Competitor keyword gap analysis with opportunity scoring
|
||||||
|
===============================================================================
|
||||||
|
Purpose: Identify keywords competitors rank for but target site doesn't,
|
||||||
|
score opportunities, and prioritize by volume/difficulty ratio.
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Optional
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Logging
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
logger = logging.getLogger("keyword_gap_analyzer")
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Intent classification patterns (shared with keyword_researcher)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
INTENT_PATTERNS: dict[str, list[str]] = {
|
||||||
|
"transactional": [
|
||||||
|
r"구매|구입|주문|buy|order|purchase|shop|deal|discount|coupon|할인|쿠폰",
|
||||||
|
r"예약|booking|reserve|sign\s?up|register|등록|신청",
|
||||||
|
],
|
||||||
|
"commercial": [
|
||||||
|
r"가격|비용|얼마|price|cost|pricing|fee|요금",
|
||||||
|
r"추천|best|top\s?\d|review|비교|compare|vs|versus|후기|리뷰|평점|평가",
|
||||||
|
r"잘하는곳|잘하는|맛집|업체|병원|추천\s?병원",
|
||||||
|
],
|
||||||
|
"navigational": [
|
||||||
|
r"^(www\.|http|\.com|\.co\.kr|\.net)",
|
||||||
|
r"공식|official|login|로그인|홈페이지|사이트|website",
|
||||||
|
r"고객센터|contact|support|customer\s?service",
|
||||||
|
],
|
||||||
|
"informational": [
|
||||||
|
r"방법|how\s?to|what\s?is|why|when|where|who|which",
|
||||||
|
r"뜻|의미|정의|definition|meaning|guide|tutorial",
|
||||||
|
r"효과|부작용|증상|원인|차이|종류|type|cause|symptom|effect",
|
||||||
|
r"전후|before\s?and\s?after|결과|result",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Dataclasses
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class OrganicKeyword:
|
||||||
|
"""A keyword that a domain ranks for organically."""
|
||||||
|
|
||||||
|
keyword: str
|
||||||
|
position: int = 0
|
||||||
|
volume: int = 0
|
||||||
|
kd: float = 0.0
|
||||||
|
cpc: float = 0.0
|
||||||
|
url: str = ""
|
||||||
|
traffic: int = 0
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class GapKeyword:
|
||||||
|
"""A keyword gap between target and competitor(s)."""
|
||||||
|
|
||||||
|
keyword: str
|
||||||
|
volume: int = 0
|
||||||
|
kd: float = 0.0
|
||||||
|
cpc: float = 0.0
|
||||||
|
intent: str = "informational"
|
||||||
|
opportunity_score: float = 0.0
|
||||||
|
competitor_positions: dict[str, int] = field(default_factory=dict)
|
||||||
|
competitor_urls: dict[str, str] = field(default_factory=dict)
|
||||||
|
avg_competitor_position: float = 0.0
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
return asdict(self)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class GapAnalysisResult:
|
||||||
|
"""Complete gap analysis result."""
|
||||||
|
|
||||||
|
target: str
|
||||||
|
competitors: list[str] = field(default_factory=list)
|
||||||
|
country: str = "kr"
|
||||||
|
total_gaps: int = 0
|
||||||
|
total_opportunity_volume: int = 0
|
||||||
|
gaps_by_intent: dict[str, int] = field(default_factory=dict)
|
||||||
|
top_opportunities: list[GapKeyword] = field(default_factory=list)
|
||||||
|
all_gaps: list[GapKeyword] = field(default_factory=list)
|
||||||
|
target_keyword_count: int = 0
|
||||||
|
competitor_keyword_counts: dict[str, int] = field(default_factory=dict)
|
||||||
|
timestamp: str = ""
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
return {
|
||||||
|
"target": self.target,
|
||||||
|
"competitors": self.competitors,
|
||||||
|
"country": self.country,
|
||||||
|
"total_gaps": self.total_gaps,
|
||||||
|
"total_opportunity_volume": self.total_opportunity_volume,
|
||||||
|
"gaps_by_intent": self.gaps_by_intent,
|
||||||
|
"top_opportunities": [g.to_dict() for g in self.top_opportunities],
|
||||||
|
"all_gaps": [g.to_dict() for g in self.all_gaps],
|
||||||
|
"target_keyword_count": self.target_keyword_count,
|
||||||
|
"competitor_keyword_counts": self.competitor_keyword_counts,
|
||||||
|
"timestamp": self.timestamp,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# MCP Helper
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def call_mcp_tool(tool_name: str, params: dict) -> dict:
|
||||||
|
"""
|
||||||
|
Call an Ahrefs MCP tool and return parsed JSON response.
|
||||||
|
|
||||||
|
In production this delegates to the MCP bridge. For standalone usage
|
||||||
|
it invokes the Claude CLI with the appropriate tool call.
|
||||||
|
"""
|
||||||
|
logger.info(f"Calling MCP tool: {tool_name} with params: {json.dumps(params, ensure_ascii=False)}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
cmd = [
|
||||||
|
"claude",
|
||||||
|
"--print",
|
||||||
|
"--output-format", "json",
|
||||||
|
"-p",
|
||||||
|
(
|
||||||
|
f"Call the tool mcp__claude_ai_Ahrefs__{tool_name} with these parameters: "
|
||||||
|
f"{json.dumps(params, ensure_ascii=False)}. Return ONLY the raw JSON result."
|
||||||
|
),
|
||||||
|
]
|
||||||
|
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
|
||||||
|
|
||||||
|
if result.returncode != 0:
|
||||||
|
logger.warning(f"MCP tool {tool_name} returned non-zero exit code: {result.returncode}")
|
||||||
|
logger.debug(f"stderr: {result.stderr}")
|
||||||
|
return {"error": result.stderr, "keywords": [], "items": []}
|
||||||
|
|
||||||
|
try:
|
||||||
|
return json.loads(result.stdout)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
return {"raw": result.stdout, "keywords": [], "items": []}
|
||||||
|
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
logger.error(f"MCP tool {tool_name} timed out")
|
||||||
|
return {"error": "timeout", "keywords": [], "items": []}
|
||||||
|
except FileNotFoundError:
|
||||||
|
logger.warning("Claude CLI not found - returning empty result for standalone testing")
|
||||||
|
return {"keywords": [], "items": []}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Utility functions
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def extract_domain(url: str) -> str:
|
||||||
|
"""Extract clean domain from URL."""
|
||||||
|
if not url.startswith(("http://", "https://")):
|
||||||
|
url = f"https://{url}"
|
||||||
|
parsed = urlparse(url)
|
||||||
|
domain = parsed.netloc or parsed.path
|
||||||
|
domain = domain.lower().strip("/")
|
||||||
|
if domain.startswith("www."):
|
||||||
|
domain = domain[4:]
|
||||||
|
return domain
|
||||||
|
|
||||||
|
|
||||||
|
def classify_intent(keyword: str) -> str:
|
||||||
|
"""Classify search intent based on keyword patterns."""
|
||||||
|
keyword_lower = keyword.lower().strip()
|
||||||
|
for intent, patterns in INTENT_PATTERNS.items():
|
||||||
|
for pattern in patterns:
|
||||||
|
if re.search(pattern, keyword_lower, re.IGNORECASE):
|
||||||
|
return intent
|
||||||
|
return "informational"
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
# KeywordGapAnalyzer
# ---------------------------------------------------------------------------


class KeywordGapAnalyzer:
    """Analyze keyword gaps between a target site and its competitors."""

    def __init__(self, country: str = "kr", min_volume: int = 0):
        self.country = country
        self.min_volume = min_volume

    def get_organic_keywords(self, domain: str, limit: int = 1000) -> list[OrganicKeyword]:
        """
        Fetch organic keywords for a domain via Ahrefs site-explorer-organic-keywords.
        Returns a list of OrganicKeyword entries.
        """
        clean_domain = extract_domain(domain)
        logger.info(f"Fetching organic keywords for: {clean_domain} (limit={limit})")

        result = call_mcp_tool("site-explorer-organic-keywords", {
            "target": clean_domain,
            "country": self.country,
            "limit": limit,
            "mode": "domain",
        })

        keywords: list[OrganicKeyword] = []
        for item in result.get("keywords", result.get("items", [])):
            if not isinstance(item, dict):
                continue
            kw = OrganicKeyword(
                keyword=item.get("keyword", item.get("term", "")),
                position=int(item.get("position", item.get("rank", 0)) or 0),
                volume=int(item.get("volume", item.get("search_volume", 0)) or 0),
                kd=float(item.get("keyword_difficulty", item.get("kd", 0)) or 0),
                cpc=float(item.get("cpc", item.get("cost_per_click", 0)) or 0),
                url=item.get("url", item.get("best_position_url", "")),
                traffic=int(item.get("traffic", item.get("estimated_traffic", 0)) or 0),
            )
            if kw.keyword:
                keywords.append(kw)

        logger.info(f"Found {len(keywords)} organic keywords for {clean_domain}")
        return keywords

    def find_gaps(
        self,
        target_keywords: list[OrganicKeyword],
        competitor_keyword_sets: dict[str, list[OrganicKeyword]],
    ) -> list[GapKeyword]:
        """
        Identify keywords that competitors rank for but the target doesn't.

        A gap keyword is one that appears in at least one competitor's keyword
        set but not in the target's keyword set.
        """
        # Build target keyword set for fast lookup
        target_kw_set: set[str] = {kw.keyword.lower().strip() for kw in target_keywords}

        # Collect all competitor keywords with their positions
        gap_map: dict[str, GapKeyword] = {}

        for comp_domain, comp_keywords in competitor_keyword_sets.items():
            for ckw in comp_keywords:
                kw_lower = ckw.keyword.lower().strip()

                # Skip if target already ranks for this keyword
                if kw_lower in target_kw_set:
                    continue

                # Skip below minimum volume
                if ckw.volume < self.min_volume:
                    continue

                if kw_lower not in gap_map:
                    gap_map[kw_lower] = GapKeyword(
                        keyword=ckw.keyword,
                        volume=ckw.volume,
                        kd=ckw.kd,
                        cpc=ckw.cpc,
                        intent=classify_intent(ckw.keyword),
                        competitor_positions={},
                        competitor_urls={},
                    )

                gap_map[kw_lower].competitor_positions[comp_domain] = ckw.position
                gap_map[kw_lower].competitor_urls[comp_domain] = ckw.url

                # Keep the highest volume and the lowest known KD across competitors
                if ckw.volume > gap_map[kw_lower].volume:
                    gap_map[kw_lower].volume = ckw.volume
                if ckw.kd > 0 and (gap_map[kw_lower].kd == 0 or ckw.kd < gap_map[kw_lower].kd):
                    gap_map[kw_lower].kd = ckw.kd

        gaps = list(gap_map.values())

        # Calculate average competitor position for each gap
        for gap in gaps:
            positions = list(gap.competitor_positions.values())
            gap.avg_competitor_position = round(
                sum(positions) / len(positions), 1
            ) if positions else 0.0

        logger.info(f"Found {len(gaps)} keyword gaps")
        return gaps

    def score_opportunities(self, gaps: list[GapKeyword]) -> list[GapKeyword]:
        """
        Score each gap keyword by opportunity potential.

        Formula:
            opportunity_score = (volume_score * 0.4) + (kd_score * 0.3) +
                                (position_score * 0.2) + (intent_score * 0.1)

        Where:
        - volume_score: normalized 0-100 based on max volume in set
        - kd_score: inverted (lower KD = higher score), normalized 0-100
        - position_score: based on avg competitor position (lower = easier to compete)
        - intent_score: commercial/transactional get higher scores
        """
        if not gaps:
            return gaps

        # Find max volume for normalization (guard against all-zero volumes)
        max_volume = max(g.volume for g in gaps)
        max_volume = max(max_volume, 1)

        intent_scores = {
            "transactional": 100,
            "commercial": 80,
            "informational": 40,
            "navigational": 20,
        }

        for gap in gaps:
            # Volume score (0-100)
            volume_score = (gap.volume / max_volume) * 100

            # KD score (inverted: low KD = high score)
            kd_score = max(0, 100 - gap.kd)

            # Position score (competitors ranking 1-10 means realistic opportunity)
            if gap.avg_competitor_position <= 10:
                position_score = 90
            elif gap.avg_competitor_position <= 20:
                position_score = 70
            elif gap.avg_competitor_position <= 50:
                position_score = 50
            else:
                position_score = 30

            # Intent score
            intent_score = intent_scores.get(gap.intent, 40)

            # Combined score
            gap.opportunity_score = round(
                (volume_score * 0.4) +
                (kd_score * 0.3) +
                (position_score * 0.2) +
                (intent_score * 0.1),
                1,
            )

        # Sort by opportunity score descending
        gaps.sort(key=lambda g: g.opportunity_score, reverse=True)

        logger.info(f"Scored {len(gaps)} gap keywords by opportunity")
        return gaps

    def analyze(self, target_url: str, competitor_urls: list[str]) -> GapAnalysisResult:
        """
        Orchestrate full keyword gap analysis:
        1. Fetch organic keywords for target
        2. Fetch organic keywords for each competitor
        3. Identify gaps
        4. Score opportunities
        5. Compile results
        """
        target_domain = extract_domain(target_url)
        competitor_domains = [extract_domain(url) for url in competitor_urls]

        logger.info(
            f"Starting gap analysis: {target_domain} vs {', '.join(competitor_domains)}"
        )

        # Step 1: Fetch target keywords
        target_keywords = self.get_organic_keywords(target_domain)

        # Step 2: Fetch competitor keywords
        competitor_keyword_sets: dict[str, list[OrganicKeyword]] = {}
        competitor_keyword_counts: dict[str, int] = {}

        for comp_domain in competitor_domains:
            comp_keywords = self.get_organic_keywords(comp_domain)
            competitor_keyword_sets[comp_domain] = comp_keywords
            competitor_keyword_counts[comp_domain] = len(comp_keywords)

        # Step 3: Find gaps
        gaps = self.find_gaps(target_keywords, competitor_keyword_sets)

        # Step 4: Score opportunities
        scored_gaps = self.score_opportunities(gaps)

        # Step 5: Calculate intent distribution
        gaps_by_intent: dict[str, int] = {}
        for gap in scored_gaps:
            gaps_by_intent[gap.intent] = gaps_by_intent.get(gap.intent, 0) + 1

        # Step 6: Compile result
        result = GapAnalysisResult(
            target=target_domain,
            competitors=competitor_domains,
            country=self.country,
            total_gaps=len(scored_gaps),
            total_opportunity_volume=sum(g.volume for g in scored_gaps),
            gaps_by_intent=gaps_by_intent,
            top_opportunities=scored_gaps[:50],
            all_gaps=scored_gaps,
            target_keyword_count=len(target_keywords),
            competitor_keyword_counts=competitor_keyword_counts,
            timestamp=datetime.now().isoformat(),
        )

        logger.info(
            f"Gap analysis complete: {result.total_gaps} gaps found, "
            f"total opportunity volume {result.total_opportunity_volume:,}"
        )
        return result

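The weighted formula in score_opportunities can be checked by hand. A standalone worked example with illustrative inputs (volume 1,200 against a set maximum of 5,000, KD 18, competitors averaging a top-10 position, commercial intent):

```python
# Worked example of the opportunity formula (illustrative inputs only)
volume_score = (1200 / 5000) * 100      # 24.0 - normalized against the set max
kd_score = max(0, 100 - 18)             # 82   - lower difficulty scores higher
position_score = 90                     # avg competitor position <= 10
intent_score = 80                       # "commercial" in the intent table

opportunity_score = round(
    volume_score * 0.4 + kd_score * 0.3 + position_score * 0.2 + intent_score * 0.1,
    1,
)
# opportunity_score == 60.2
```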
# ---------------------------------------------------------------------------
# Plain-text report formatter
# ---------------------------------------------------------------------------


def format_text_report(result: GapAnalysisResult) -> str:
    """Format gap analysis result as a human-readable text report."""
    lines: list[str] = []
    lines.append("=" * 75)
    lines.append("Keyword Gap Analysis Report")
    lines.append(f"Target: {result.target}")
    lines.append(f"Competitors: {', '.join(result.competitors)}")
    lines.append(f"Country: {result.country.upper()} | Date: {result.timestamp[:10]}")
    lines.append("=" * 75)
    lines.append("")

    # Overview
    lines.append("## Overview")
    lines.append(f" Target keywords: {result.target_keyword_count:,}")
    for comp, count in result.competitor_keyword_counts.items():
        lines.append(f" {comp} keywords: {count:,}")
    lines.append(f" Keyword gaps found: {result.total_gaps:,}")
    lines.append(f" Total opportunity volume: {result.total_opportunity_volume:,}")
    lines.append("")

    # Intent distribution
    if result.gaps_by_intent:
        lines.append("## Gaps by Intent")
        for intent, count in sorted(result.gaps_by_intent.items(), key=lambda x: x[1], reverse=True):
            pct = (count / result.total_gaps) * 100 if result.total_gaps else 0
            lines.append(f" {intent:<15}: {count:>5} ({pct:.1f}%)")
        lines.append("")

    # Top opportunities
    if result.top_opportunities:
        lines.append("## Top Opportunities (by score)")
        header = f" {'Keyword':<35} {'Vol':>8} {'KD':>6} {'Score':>7} {'Intent':<15} {'Competitors'}"
        lines.append(header)
        lines.append(" " + "-" * 90)

        for gap in result.top_opportunities[:30]:
            kw_display = gap.keyword[:33] if len(gap.keyword) > 33 else gap.keyword
            comp_positions = ", ".join(
                f"{d}:#{p}" for d, p in gap.competitor_positions.items()
            )
            comp_display = comp_positions[:30] if len(comp_positions) > 30 else comp_positions

            lines.append(
                f" {kw_display:<35} {gap.volume:>8,} {gap.kd:>6.1f} "
                f"{gap.opportunity_score:>7.1f} {gap.intent:<15} {comp_display}"
            )
        lines.append("")

    # Quick wins (low KD, high volume)
    quick_wins = [g for g in result.all_gaps if g.kd <= 30 and g.volume >= 100]
    quick_wins.sort(key=lambda g: g.volume, reverse=True)
    if quick_wins:
        lines.append("## Quick Wins (KD <= 30, Volume >= 100)")
        lines.append(f" {'Keyword':<35} {'Vol':>8} {'KD':>6} {'Intent':<15}")
        lines.append(" " + "-" * 64)
        for gap in quick_wins[:20]:
            kw_display = gap.keyword[:33] if len(gap.keyword) > 33 else gap.keyword
            lines.append(
                f" {kw_display:<35} {gap.volume:>8,} {gap.kd:>6.1f} {gap.intent:<15}"
            )
        lines.append("")

    return "\n".join(lines)

# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def main():
    parser = argparse.ArgumentParser(
        description="Keyword Gap Analyzer - Find competitor keyword opportunities",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python keyword_gap_analyzer.py --target https://example.com --competitor https://comp.com --json
  python keyword_gap_analyzer.py --target example.com --competitor comp1.com --competitor comp2.com --min-volume 100 --json
  python keyword_gap_analyzer.py --target example.com --competitor comp.com --country us --output gaps.json
""",
    )
    parser.add_argument(
        "--target",
        required=True,
        help="Target website URL or domain",
    )
    parser.add_argument(
        "--competitor",
        action="append",
        required=True,
        dest="competitors",
        help="Competitor URL or domain (can be repeated)",
    )
    parser.add_argument(
        "--country",
        default="kr",
        help="Target country code (default: kr)",
    )
    parser.add_argument(
        "--min-volume",
        type=int,
        default=0,
        help="Minimum search volume filter (default: 0)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="output_json",
        help="Output results as JSON",
    )
    parser.add_argument(
        "--output",
        type=str,
        default=None,
        help="Write output to file (path)",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Enable verbose/debug logging",
    )

    args = parser.parse_args()

    if args.verbose:
        logging.getLogger().setLevel(logging.DEBUG)

    # Run analysis
    analyzer = KeywordGapAnalyzer(
        country=args.country,
        min_volume=args.min_volume,
    )
    result = analyzer.analyze(args.target, args.competitors)

    # Format output
    if args.output_json:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
    else:
        output = format_text_report(result)

    # Write or print
    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(output)
        logger.info(f"Output written to: {args.output}")
    else:
        print(output)

    return 0


if __name__ == "__main__":
    sys.exit(main())
@@ -0,0 +1,656 @@
"""
Keyword Researcher - Seed keyword expansion, intent classification, and topic clustering
========================================================================================
Purpose: Expand seed keywords via Ahrefs APIs, classify search intent,
         cluster topics, and support Korean market keyword discovery.
Python: 3.10+
"""

import argparse
import json
import logging
import re
import subprocess
import sys
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import Optional

# ---------------------------------------------------------------------------
# Logging
# ---------------------------------------------------------------------------
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("keyword_researcher")

# ---------------------------------------------------------------------------
# Constants - Korean suffix expansion
# ---------------------------------------------------------------------------
KOREAN_SUFFIXES: list[str] = [
    "추천",        # recommendation
    "가격",        # price
    "후기",        # review
    "잘하는곳",    # "place that does it well"
    "부작용",      # side effects
    "전후",        # before/after
    "비용",        # cost
    "추천 병원",   # recommended clinic/hospital
    "후기 블로그",  # review blog
    "방법",        # method / how to
    "종류",        # types
    "비교",        # comparison
    "효과",        # effect
    "주의사항",    # precautions
    "장단점",      # pros and cons
]

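Downstream, each seed is combined with every entry in this table to form candidate queries (see expand_korean_suffixes below). A minimal standalone sketch with a hypothetical seed and a two-item subset of the table:

```python
# Candidate-query generation from a seed plus the suffix table
# (two-item illustrative subset of KOREAN_SUFFIXES; the seed is hypothetical)
suffixes = ["추천", "가격"]  # recommendation, price
seed = "라식"               # LASIK

variations = [f"{seed} {suffix}" for suffix in suffixes]
# variations == ["라식 추천", "라식 가격"]
```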
# ---------------------------------------------------------------------------
# Intent classification patterns
# ---------------------------------------------------------------------------
INTENT_PATTERNS: dict[str, list[str]] = {
    "transactional": [
        r"구매|구입|주문|buy|order|purchase|shop|deal|discount|coupon|할인|쿠폰",
        r"예약|booking|reserve|sign\s?up|register|등록|신청",
    ],
    "commercial": [
        r"가격|비용|얼마|price|cost|pricing|fee|요금",
        r"추천|best|top\s?\d|review|비교|compare|vs|versus|후기|리뷰|평점|평가",
        r"잘하는곳|잘하는|맛집|업체|병원|추천\s?병원",
    ],
    "navigational": [
        r"^(www\.|http|\.com|\.co\.kr|\.net)",
        r"공식|official|login|로그인|홈페이지|사이트|website",
        r"고객센터|contact|support|customer\s?service",
    ],
    "informational": [
        r"방법|how\s?to|what\s?is|why|when|where|who|which",
        r"뜻|의미|정의|definition|meaning|guide|tutorial",
        r"효과|부작용|증상|원인|차이|종류|type|cause|symptom|effect",
        r"전후|before\s?and\s?after|결과|result",
    ],
}

# ---------------------------------------------------------------------------
# Dataclasses
# ---------------------------------------------------------------------------


@dataclass
class KeywordEntry:
    """Single keyword with its metrics and classification."""

    keyword: str
    volume: int = 0
    kd: float = 0.0
    cpc: float = 0.0
    intent: str = "informational"
    cluster: str = ""
    source: str = ""
    country_volumes: dict[str, int] = field(default_factory=dict)

    def to_dict(self) -> dict:
        data = asdict(self)
        if not data["country_volumes"]:
            del data["country_volumes"]
        return data


@dataclass
class KeywordCluster:
    """Group of semantically related keywords."""

    topic: str
    keywords: list[str] = field(default_factory=list)
    total_volume: int = 0
    avg_kd: float = 0.0
    primary_intent: str = "informational"

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass
class ResearchResult:
    """Full research result container."""

    seed_keyword: str
    country: str
    total_keywords: int = 0
    total_volume: int = 0
    clusters: list[KeywordCluster] = field(default_factory=list)
    keywords: list[KeywordEntry] = field(default_factory=list)
    timestamp: str = ""

    def to_dict(self) -> dict:
        return {
            "seed_keyword": self.seed_keyword,
            "country": self.country,
            "total_keywords": self.total_keywords,
            "total_volume": self.total_volume,
            "clusters": [c.to_dict() for c in self.clusters],
            "keywords": [k.to_dict() for k in self.keywords],
            "timestamp": self.timestamp,
        }

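KeywordEntry.to_dict drops the country_volumes key when the map is empty, which keeps JSON output compact for the common case. A self-contained sketch of that serialization pattern using a local, trimmed-down mirror of the dataclass (the name `Entry` is illustrative):

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Entry:
    # Local mirror of KeywordEntry's to_dict behavior (illustrative)
    keyword: str
    volume: int = 0
    country_volumes: dict[str, int] = field(default_factory=dict)

    def to_dict(self) -> dict:
        data = asdict(self)
        if not data["country_volumes"]:
            del data["country_volumes"]  # omit empty maps from JSON output
        return data

# An empty map is dropped; a populated map is kept as-is.
```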
# ---------------------------------------------------------------------------
# MCP Helper - calls Ahrefs MCP tools via subprocess
# ---------------------------------------------------------------------------


def call_mcp_tool(tool_name: str, params: dict) -> dict:
    """
    Call an Ahrefs MCP tool and return the parsed JSON response.

    In production this delegates to the MCP bridge. For standalone usage
    it invokes the Claude CLI with the appropriate tool call.
    """
    logger.info(f"Calling MCP tool: {tool_name} with params: {json.dumps(params, ensure_ascii=False)}")

    try:
        cmd = [
            "claude",
            "--print",
            "--output-format", "json",
            "-p",
            f"Call the tool mcp__claude_ai_Ahrefs__{tool_name} with these parameters: {json.dumps(params, ensure_ascii=False)}. Return ONLY the raw JSON result.",
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)

        if result.returncode != 0:
            logger.warning(f"MCP tool {tool_name} returned non-zero exit code: {result.returncode}")
            logger.debug(f"stderr: {result.stderr}")
            return {"error": result.stderr, "keywords": [], "items": []}

        try:
            return json.loads(result.stdout)
        except json.JSONDecodeError:
            return {"raw": result.stdout, "keywords": [], "items": []}

    except subprocess.TimeoutExpired:
        logger.error(f"MCP tool {tool_name} timed out")
        return {"error": "timeout", "keywords": [], "items": []}
    except FileNotFoundError:
        logger.warning("Claude CLI not found - returning empty result for standalone testing")
        return {"keywords": [], "items": []}

# ---------------------------------------------------------------------------
# KeywordResearcher
# ---------------------------------------------------------------------------


class KeywordResearcher:
    """Expand seed keywords, classify intent, and cluster topics."""

    def __init__(self, country: str = "kr", korean_suffixes: bool = False, compare_global: bool = False):
        self.country = country
        self.korean_suffixes = korean_suffixes
        self.compare_global = compare_global
        self._seen: set[str] = set()

    # ---- Keyword expansion via Ahrefs MCP ----

    def expand_keywords(self, seed: str) -> list[KeywordEntry]:
        """
        Expand a seed keyword using the Ahrefs matching-terms, related-terms,
        and search-suggestions endpoints.
        """
        all_keywords: list[KeywordEntry] = []

        # 1. Matching terms
        logger.info(f"Fetching matching terms for: {seed}")
        matching = call_mcp_tool("keywords-explorer-matching-terms", {
            "keyword": seed,
            "country": self.country,
            "limit": 100,
        })
        for item in matching.get("keywords", matching.get("items", [])):
            kw = self._parse_keyword_item(item, source="matching-terms")
            if kw and kw.keyword not in self._seen:
                self._seen.add(kw.keyword)
                all_keywords.append(kw)

        # 2. Related terms
        logger.info(f"Fetching related terms for: {seed}")
        related = call_mcp_tool("keywords-explorer-related-terms", {
            "keyword": seed,
            "country": self.country,
            "limit": 100,
        })
        for item in related.get("keywords", related.get("items", [])):
            kw = self._parse_keyword_item(item, source="related-terms")
            if kw and kw.keyword not in self._seen:
                self._seen.add(kw.keyword)
                all_keywords.append(kw)

        # 3. Search suggestions
        logger.info(f"Fetching search suggestions for: {seed}")
        suggestions = call_mcp_tool("keywords-explorer-search-suggestions", {
            "keyword": seed,
            "country": self.country,
            "limit": 50,
        })
        for item in suggestions.get("keywords", suggestions.get("items", [])):
            kw = self._parse_keyword_item(item, source="search-suggestions")
            if kw and kw.keyword not in self._seen:
                self._seen.add(kw.keyword)
                all_keywords.append(kw)

        # 4. Add the seed itself if not already present
        if seed not in self._seen:
            self._seen.add(seed)
            overview = call_mcp_tool("keywords-explorer-overview", {
                "keyword": seed,
                "country": self.country,
            })
            seed_entry = self._parse_keyword_item(overview, source="seed")
            if seed_entry:
                seed_entry.keyword = seed
                all_keywords.insert(0, seed_entry)

        logger.info(f"Expanded to {len(all_keywords)} keywords from Ahrefs APIs")
        return all_keywords

    def expand_korean_suffixes(self, seed: str) -> list[KeywordEntry]:
        """
        Generate keyword variations by appending common Korean suffixes.
        Each variation is checked against Ahrefs for volume data.
        """
        suffix_keywords: list[KeywordEntry] = []

        for suffix in KOREAN_SUFFIXES:
            variation = f"{seed} {suffix}"
            if variation in self._seen:
                continue

            logger.info(f"Checking Korean suffix variation: {variation}")
            overview = call_mcp_tool("keywords-explorer-overview", {
                "keyword": variation,
                "country": self.country,
            })
            kw = self._parse_keyword_item(overview, source="korean-suffix")
            if kw:
                kw.keyword = variation
                if kw.volume > 0:
                    self._seen.add(variation)
                    suffix_keywords.append(kw)
                else:
                    # Even if no volume data, include as zero-volume for completeness
                    entry = KeywordEntry(
                        keyword=variation,
                        volume=0,
                        kd=0.0,
                        cpc=0.0,
                        intent=self.classify_intent(variation),
                        source="korean-suffix",
                    )
                    self._seen.add(variation)
                    suffix_keywords.append(entry)

        logger.info(f"Korean suffix expansion yielded {len(suffix_keywords)} variations")
        return suffix_keywords

    def get_volume_by_country(self, keyword: str) -> dict[str, int]:
        """
        Get search volume breakdown by country for a keyword.
        Useful for comparing Korean vs global demand.
        """
        logger.info(f"Fetching volume-by-country for: {keyword}")
        result = call_mcp_tool("keywords-explorer-volume-by-country", {
            "keyword": keyword,
        })

        volumes: dict[str, int] = {}
        for item in result.get("countries", result.get("items", [])):
            if isinstance(item, dict):
                country_code = item.get("country", item.get("code", ""))
                volume = item.get("volume", item.get("search_volume", 0))
                if country_code and volume:
                    volumes[country_code.lower()] = int(volume)

        return volumes

    # ---- Intent classification ----

    def classify_intent(self, keyword: str) -> str:
        """
        Classify search intent based on keyword patterns.
        Priority: transactional > commercial > navigational > informational
        """
        keyword_lower = keyword.lower().strip()

        for intent, patterns in INTENT_PATTERNS.items():
            for pattern in patterns:
                if re.search(pattern, keyword_lower, re.IGNORECASE):
                    return intent

        return "informational"

    # ---- Keyword clustering ----

    def cluster_keywords(self, keywords: list[KeywordEntry]) -> list[KeywordCluster]:
        """
        Group keywords into topic clusters using shared n-gram tokens.
        Uses a simple token overlap approach: keywords sharing significant
        tokens (2+ character words) are grouped together.
        """
        if not keywords:
            return []

        # Extract meaningful tokens from each keyword
        def tokenize(text: str) -> set[str]:
            tokens = set()
            for word in re.split(r"\s+", text.strip().lower()):
                if len(word) >= 2:
                    tokens.add(word)
            return tokens

        # Build token-to-keyword mapping
        token_map: dict[str, list[int]] = {}
        kw_tokens: list[set[str]] = []

        for i, kw in enumerate(keywords):
            tokens = tokenize(kw.keyword)
            kw_tokens.append(tokens)
            for token in tokens:
                if token not in token_map:
                    token_map[token] = []
                token_map[token].append(i)

        # Find the most common significant tokens (cluster anchors)
        token_freq = sorted(token_map.items(), key=lambda x: len(x[1]), reverse=True)

        assigned: set[int] = set()
        clusters: list[KeywordCluster] = []

        for token, indices in token_freq:
            # Skip single-occurrence tokens or very common stop-like tokens
            if len(indices) < 2:
                continue

            # Gather unassigned keywords that share this token
            cluster_indices = [i for i in indices if i not in assigned]
            if len(cluster_indices) < 2:
                continue

            # Create the cluster
            cluster_kws = [keywords[i].keyword for i in cluster_indices]
            cluster_volumes = [keywords[i].volume for i in cluster_indices]
            cluster_kds = [keywords[i].kd for i in cluster_indices]
            cluster_intents = [keywords[i].intent for i in cluster_indices]

            # Determine primary intent by frequency
            intent_counts: dict[str, int] = {}
            for intent in cluster_intents:
                intent_counts[intent] = intent_counts.get(intent, 0) + 1
            primary_intent = max(intent_counts, key=intent_counts.get)

            cluster = KeywordCluster(
                topic=token,
                keywords=cluster_kws,
                total_volume=sum(cluster_volumes),
                avg_kd=round(sum(cluster_kds) / len(cluster_kds), 1) if cluster_kds else 0.0,
                primary_intent=primary_intent,
            )
            clusters.append(cluster)

            for i in cluster_indices:
                assigned.add(i)
                keywords[i].cluster = token

        # Assign unclustered keywords to an "other" cluster
        unclustered = [i for i in range(len(keywords)) if i not in assigned]
        if unclustered:
            other_kws = [keywords[i].keyword for i in unclustered]
            other_volumes = [keywords[i].volume for i in unclustered]
            other_kds = [keywords[i].kd for i in unclustered]

            other_cluster = KeywordCluster(
                topic="(unclustered)",
                keywords=other_kws,
                total_volume=sum(other_volumes),
                avg_kd=round(sum(other_kds) / len(other_kds), 1) if other_kds else 0.0,
                primary_intent="informational",
            )
            clusters.append(other_cluster)

            for i in unclustered:
                keywords[i].cluster = "(unclustered)"

        # Sort clusters by total volume descending
        clusters.sort(key=lambda c: c.total_volume, reverse=True)

        logger.info(f"Clustered {len(keywords)} keywords into {len(clusters)} clusters")
        return clusters

    # ---- Full analysis orchestration ----

def analyze(self, seed_keyword: str) -> ResearchResult:
|
||||||
|
"""
|
||||||
|
Orchestrate a full keyword research analysis:
|
||||||
|
1. Expand seed via Ahrefs
|
||||||
|
2. Optionally expand Korean suffixes
|
||||||
|
3. Classify intent for all keywords
|
||||||
|
4. Optionally fetch volume-by-country
|
||||||
|
5. Cluster keywords into topics
|
||||||
|
6. Compile results
|
||||||
|
"""
|
||||||
|
logger.info(f"Starting keyword research for: {seed_keyword} (country={self.country})")
|
||||||
|
|
||||||
|
# Step 1: Expand keywords
|
||||||
|
keywords = self.expand_keywords(seed_keyword)
|
||||||
|
|
||||||
|
# Step 2: Korean suffix expansion
|
||||||
|
if self.korean_suffixes:
|
||||||
|
suffix_keywords = self.expand_korean_suffixes(seed_keyword)
|
||||||
|
keywords.extend(suffix_keywords)
|
||||||
|
|
||||||
|
# Step 3: Classify intent for all keywords
|
||||||
|
for kw in keywords:
|
||||||
|
if not kw.intent or kw.intent == "informational":
|
||||||
|
kw.intent = self.classify_intent(kw.keyword)
|
||||||
|
|
||||||
|
# Step 4: Volume-by-country comparison
|
||||||
|
if self.compare_global and keywords:
|
||||||
|
# Fetch for the seed and top volume keywords
|
||||||
|
top_keywords = sorted(keywords, key=lambda k: k.volume, reverse=True)[:10]
|
||||||
|
for kw in top_keywords:
|
||||||
|
volumes = self.get_volume_by_country(kw.keyword)
|
||||||
|
kw.country_volumes = volumes
|
||||||
|
|
||||||
|
# Step 5: Cluster keywords
|
||||||
|
clusters = self.cluster_keywords(keywords)
|
||||||
|
|
||||||
|
# Step 6: Compile result
|
||||||
|
result = ResearchResult(
|
||||||
|
seed_keyword=seed_keyword,
|
||||||
|
country=self.country,
|
||||||
|
total_keywords=len(keywords),
|
||||||
|
total_volume=sum(kw.volume for kw in keywords),
|
||||||
|
clusters=clusters,
|
||||||
|
keywords=sorted(keywords, key=lambda k: k.volume, reverse=True),
|
||||||
|
timestamp=datetime.now().isoformat(),
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
f"Research complete: {result.total_keywords} keywords, "
|
||||||
|
f"{len(result.clusters)} clusters, "
|
||||||
|
f"total volume {result.total_volume}"
|
||||||
|
)
|
||||||
|
return result
|
||||||
|
|
||||||
|
# ---- Internal helpers ----
|
||||||
|
|
||||||
|
def _parse_keyword_item(self, item: dict, source: str = "") -> Optional[KeywordEntry]:
|
||||||
|
"""Parse an Ahrefs API response item into a KeywordEntry."""
|
||||||
|
if not item or "error" in item:
|
||||||
|
return None
|
||||||
|
|
||||||
|
keyword = item.get("keyword", item.get("term", item.get("query", "")))
|
||||||
|
if not keyword:
|
||||||
|
return None
|
||||||
|
|
||||||
|
volume = int(item.get("volume", item.get("search_volume", 0)) or 0)
|
||||||
|
kd = float(item.get("keyword_difficulty", item.get("kd", 0)) or 0)
|
||||||
|
cpc = float(item.get("cpc", item.get("cost_per_click", 0)) or 0)
|
||||||
|
|
||||||
|
return KeywordEntry(
|
||||||
|
keyword=keyword,
|
||||||
|
volume=volume,
|
||||||
|
kd=round(kd, 1),
|
||||||
|
cpc=round(cpc, 2),
|
||||||
|
intent=self.classify_intent(keyword),
|
||||||
|
source=source,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Plain-text report formatter
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def format_text_report(result: ResearchResult) -> str:
|
||||||
|
"""Format research result as a human-readable text report."""
|
||||||
|
lines: list[str] = []
|
||||||
|
lines.append("=" * 70)
|
||||||
|
lines.append(f"Keyword Strategy Report: {result.seed_keyword}")
|
||||||
|
lines.append(f"Country: {result.country.upper()} | Date: {result.timestamp[:10]}")
|
||||||
|
lines.append("=" * 70)
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
lines.append("## Overview")
|
||||||
|
lines.append(f" Total keywords discovered: {result.total_keywords}")
|
||||||
|
lines.append(f" Topic clusters: {len(result.clusters)}")
|
||||||
|
lines.append(f" Total search volume: {result.total_volume:,}")
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
# Clusters summary
|
||||||
|
if result.clusters:
|
||||||
|
lines.append("## Top Clusters")
|
||||||
|
lines.append(f" {'Cluster':<25} {'Keywords':>8} {'Volume':>10} {'Avg KD':>8} {'Intent':<15}")
|
||||||
|
lines.append(" " + "-" * 66)
|
||||||
|
for cluster in result.clusters[:15]:
|
||||||
|
lines.append(
|
||||||
|
f" {cluster.topic:<25} {len(cluster.keywords):>8} "
|
||||||
|
f"{cluster.total_volume:>10,} {cluster.avg_kd:>8.1f} "
|
||||||
|
f"{cluster.primary_intent:<15}"
|
||||||
|
)
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
# Top keywords
|
||||||
|
if result.keywords:
|
||||||
|
lines.append("## Top Keywords (by volume)")
|
||||||
|
lines.append(f" {'Keyword':<40} {'Vol':>8} {'KD':>6} {'CPC':>7} {'Intent':<15} {'Cluster':<15}")
|
||||||
|
lines.append(" " + "-" * 91)
|
||||||
|
for kw in result.keywords[:30]:
|
||||||
|
kw_display = kw.keyword[:38] if len(kw.keyword) > 38 else kw.keyword
|
||||||
|
cluster_display = kw.cluster[:13] if len(kw.cluster) > 13 else kw.cluster
|
||||||
|
lines.append(
|
||||||
|
f" {kw_display:<40} {kw.volume:>8,} {kw.kd:>6.1f} "
|
||||||
|
f"{kw.cpc:>7.2f} {kw.intent:<15} {cluster_display:<15}"
|
||||||
|
)
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
# Intent distribution
|
||||||
|
intent_dist: dict[str, int] = {}
|
||||||
|
for kw in result.keywords:
|
||||||
|
intent_dist[kw.intent] = intent_dist.get(kw.intent, 0) + 1
|
||||||
|
if intent_dist:
|
||||||
|
lines.append("## Intent Distribution")
|
||||||
|
for intent, count in sorted(intent_dist.items(), key=lambda x: x[1], reverse=True):
|
||||||
|
pct = (count / len(result.keywords)) * 100 if result.keywords else 0
|
||||||
|
lines.append(f" {intent:<15}: {count:>5} ({pct:.1f}%)")
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CLI
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Keyword Researcher - Expand, classify, and cluster keywords",
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||||
|
epilog="""
|
||||||
|
Examples:
|
||||||
|
python keyword_researcher.py --keyword "치과 임플란트" --country kr --json
|
||||||
|
python keyword_researcher.py --keyword "dental implant" --compare-global --json
|
||||||
|
python keyword_researcher.py --keyword "치과 임플란트" --korean-suffixes --output report.json
|
||||||
|
""",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--keyword",
|
||||||
|
required=True,
|
||||||
|
help="Seed keyword to expand and research",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--country",
|
||||||
|
default="kr",
|
||||||
|
help="Target country code (default: kr)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--korean-suffixes",
|
||||||
|
action="store_true",
|
||||||
|
help="Enable Korean suffix expansion (추천, 가격, 후기, etc.)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--compare-global",
|
||||||
|
action="store_true",
|
||||||
|
help="Fetch volume-by-country comparison for top keywords",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--json",
|
||||||
|
action="store_true",
|
||||||
|
dest="output_json",
|
||||||
|
help="Output results as JSON",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--output",
|
||||||
|
type=str,
|
||||||
|
default=None,
|
||||||
|
help="Write output to file (path)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--verbose",
|
||||||
|
action="store_true",
|
||||||
|
help="Enable verbose/debug logging",
|
||||||
|
)
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if args.verbose:
|
||||||
|
logging.getLogger().setLevel(logging.DEBUG)
|
||||||
|
|
||||||
|
# Run analysis
|
||||||
|
researcher = KeywordResearcher(
|
||||||
|
country=args.country,
|
||||||
|
korean_suffixes=args.korean_suffixes,
|
||||||
|
compare_global=args.compare_global,
|
||||||
|
)
|
||||||
|
result = researcher.analyze(args.keyword)
|
||||||
|
|
||||||
|
# Format output
|
||||||
|
if args.output_json:
|
||||||
|
output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
|
||||||
|
else:
|
||||||
|
output = format_text_report(result)
|
||||||
|
|
||||||
|
# Write or print
|
||||||
|
if args.output:
|
||||||
|
with open(args.output, "w", encoding="utf-8") as f:
|
||||||
|
f.write(output)
|
||||||
|
logger.info(f"Output written to: {args.output}")
|
||||||
|
else:
|
||||||
|
print(output)
|
||||||
|
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main())
|
||||||
@@ -0,0 +1,20 @@
# 19-seo-keyword-strategy dependencies
# Install: pip install -r requirements.txt

# HTTP & Async
requests>=2.31.0
aiohttp>=3.9.0

# Data Processing
pandas>=2.1.0

# NLP / Text Similarity
scikit-learn>=1.3.0

# Retry & Progress
tenacity>=8.2.0
tqdm>=4.66.0

# Environment & CLI
python-dotenv>=1.0.0
rich>=13.7.0
custom-skills/19-seo-keyword-strategy/desktop/SKILL.md (new file, 91 lines)
@@ -0,0 +1,91 @@
---
name: seo-keyword-strategy
description: |
  Keyword strategy and research for SEO campaigns. Triggers: keyword research, keyword analysis, keyword gap, search volume, keyword clustering, intent classification.
---

# SEO Keyword Strategy & Research

## Purpose

Expand seed keywords, classify search intent, cluster topics, and identify competitor keyword gaps for comprehensive keyword strategy development.

## Core Capabilities

1. **Keyword Expansion** - Matching terms, related terms, search suggestions
2. **Korean Market** - Suffix expansion, Naver autocomplete, Korean intent patterns
3. **Intent Classification** - Informational, navigational, commercial, transactional
4. **Topic Clustering** - Group keywords into semantic clusters
5. **Gap Analysis** - Find competitor keywords missing from target site

## MCP Tool Usage

### Ahrefs for Keyword Data

```
mcp__ahrefs__keywords-explorer-overview: Get keyword metrics
mcp__ahrefs__keywords-explorer-matching-terms: Find keyword variations
mcp__ahrefs__keywords-explorer-related-terms: Discover related keywords
mcp__ahrefs__keywords-explorer-search-suggestions: Autocomplete suggestions
mcp__ahrefs__keywords-explorer-volume-by-country: Country volume comparison
mcp__ahrefs__site-explorer-organic-keywords: Competitor keyword rankings
```

### Web Search for Naver Discovery

```
WebSearch: Naver autocomplete and trend discovery
```

## Workflow

### 1. Seed Keyword Expansion
1. Input seed keyword (Korean or English)
2. Query Ahrefs matching-terms and related-terms
3. Get search suggestions for long-tail variations
4. Apply Korean suffix expansion if Korean market
5. Deduplicate and merge results
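The suffix and dedupe steps above can be sketched in a few lines of Python. The suffix list here is a small sample for illustration, not the skill's full set:

```python
# Illustrative sketch of steps 4-5: suffix expansion and deduplication.
# KOREAN_SUFFIXES is a sample; the actual script may ship a longer list.
KOREAN_SUFFIXES = ["추천", "가격", "후기", "비용", "순위"]


def expand_korean_suffixes(seed: str, suffixes: list[str] = KOREAN_SUFFIXES) -> list[str]:
    """Append common Korean modifier suffixes to a seed keyword."""
    return [f"{seed} {suffix}" for suffix in suffixes]


def dedupe_keywords(keywords: list[str]) -> list[str]:
    """Order-preserving, case-insensitive deduplication (step 5)."""
    seen: set[str] = set()
    unique: list[str] = []
    for kw in keywords:
        key = kw.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(kw)
    return unique
```

Merging Ahrefs results with suffix expansions and then deduplicating keeps the first-seen casing while dropping exact repeats.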

### 2. Intent Classification & Clustering
1. Classify each keyword by search intent
2. Group keywords into topic clusters
3. Identify pillar topics and supporting terms
4. Calculate cluster-level metrics (total volume, avg KD)
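A minimal rule-based classifier illustrates step 1. The marker lists below are illustrative assumptions, not the exact patterns the skill's scripts use:

```python
# Sketch of rule-based intent classification. Marker lists are examples only.
INTENT_PATTERNS = {
    "transactional": ["buy", "price", "구매", "가격", "예약"],
    "commercial": ["best", "review", "vs", "추천", "후기", "비교"],
    "navigational": ["login", "official", "홈페이지"],
}


def classify_intent(keyword: str) -> str:
    """Return the first intent whose marker appears in the keyword."""
    kw = keyword.lower()
    for intent, markers in INTENT_PATTERNS.items():
        if any(marker in kw for marker in markers):
            return intent
    return "informational"  # default when no marker matches
```

Real classifiers often also weigh SERP composition; the substring approach here is the cheapest first pass.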

### 3. Gap Analysis
1. Pull organic keywords for target and competitors
2. Identify keywords present in competitors but missing from target
3. Score opportunities by volume/difficulty ratio
4. Prioritize by intent alignment with business goals
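Steps 2-3 reduce to set arithmetic plus a volume/difficulty ratio. The scoring formula below is a plausible sketch, not necessarily the exact one the scripts use:

```python
# Sketch of gap detection (step 2) and opportunity scoring (step 3).
def keyword_gaps(target_kws: set[str], competitor_kws: set[str]) -> set[str]:
    """Keywords competitors rank for that the target site does not."""
    return competitor_kws - target_kws


def opportunity_score(volume: int, kd: float) -> float:
    """Higher volume and lower difficulty yield a higher score."""
    return round(volume / (kd + 1.0), 1)  # +1 guards against KD of 0
```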

## Output Format

```markdown
## Keyword Strategy Report: [seed keyword]

### Overview
- Total keywords discovered: [count]
- Topic clusters: [count]
- Total search volume: [sum]

### Top Clusters
| Cluster | Keywords | Total Volume | Avg KD |
|---------|----------|--------------|--------|
| ... | ... | ... | ... |

### Top Opportunities
| Keyword | Volume | KD | Intent | Cluster |
|---------|--------|----|--------|---------|
| ... | ... | ... | ... | ... |

### Keyword Gaps (vs competitors)
| Keyword | Volume | Competitor Position | Opportunity Score |
|---------|--------|---------------------|-------------------|
| ... | ... | ... | ... |
```

## Notion Output (Required)

All audit reports MUST be saved to OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category, Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: KW-YYYYMMDD-NNN
custom-skills/19-seo-keyword-strategy/desktop/skill.yaml (new file, 9 lines)
@@ -0,0 +1,9 @@
name: seo-keyword-strategy
description: |
  Keyword strategy and research for SEO campaigns. Triggers: keyword research, keyword analysis, keyword gap, search volume, keyword clustering, intent classification.

allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
custom-skills/20-seo-serp-analysis/code/CLAUDE.md (new file, 132 lines)
@@ -0,0 +1,132 @@
# CLAUDE.md

## Overview

SERP analysis tool for understanding search result landscapes. Detects Google SERP features (featured snippets, PAA, knowledge panels, local pack, video, ads), analyzes Naver SERP composition (blog, cafe, knowledge iN, Smart Store, brand zone, VIEW tab), maps competitor positions, and scores SERP feature opportunities.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Google SERP analysis
python scripts/serp_analyzer.py --keyword "치과 임플란트" --country kr --json

# Naver SERP analysis
python scripts/naver_serp_analyzer.py --keyword "치과 임플란트" --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `serp_analyzer.py` | Google SERP feature detection and competitor mapping | SERP features, competitor positions, opportunity scores |
| `naver_serp_analyzer.py` | Naver SERP composition analysis | Section distribution, content type mapping |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## SERP Analyzer (Google)

```bash
# Single keyword analysis
python scripts/serp_analyzer.py --keyword "dental implant cost" --json

# Korean market
python scripts/serp_analyzer.py --keyword "치과 임플란트 가격" --country kr --json

# Multiple keywords from file
python scripts/serp_analyzer.py --keywords-file keywords.txt --country kr --json

# Output to file
python scripts/serp_analyzer.py --keyword "dental implant" --output serp_report.json
```

**Capabilities**:
- SERP feature detection (featured snippet, PAA, knowledge panel, local pack, video carousel, ads, image pack, site links)
- Competitor position mapping per keyword
- Content type distribution analysis (blog, product, service, news, video)
- SERP feature opportunity scoring
- Search intent validation from SERP composition
- SERP volatility assessment
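Opportunity scoring can be sketched as a weighted sum over detected SERP features. The weights below are illustrative assumptions, not the values `serp_analyzer.py` actually uses:

```python
# Hypothetical feature weights for a 0-100 opportunity score.
FEATURE_WEIGHTS = {
    "featured_snippet": 25,  # winnable position-zero slot
    "people_also_ask": 15,
    "local_pack": 20,
    "video_carousel": 10,
}


def serp_opportunity(features: dict[str, bool]) -> int:
    """Sum weights of features present on the SERP, capped at 100."""
    score = sum(w for name, w in FEATURE_WEIGHTS.items() if features.get(name))
    return min(score, 100)
```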

## Naver SERP Analyzer

```bash
# Analyze Naver search results
python scripts/naver_serp_analyzer.py --keyword "치과 임플란트" --json

# Analyze multiple keywords
python scripts/naver_serp_analyzer.py --keywords-file keywords.txt --json
```

**Capabilities**:
- Naver section detection (블로그, 카페, 지식iN, 스마트스토어, 브랜드존, VIEW탭)
- Section priority mapping (which sections appear above the fold)
- Content type distribution per section
- Brand zone presence detection
- VIEW tab content analysis
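Section detection boils down to matching known class patterns against the fetched SERP HTML. A stdlib-only sketch follows; the selector sample is a subset of the patterns the script checks, and the script itself parses with BeautifulSoup rather than regex:

```python
import re

# Sample of Naver SERP class patterns (illustrative subset).
SAMPLE_SELECTORS = {"blog": ["sp_blog"], "cafe": ["sp_cafe"], "ad": ["sp_nad"]}


def detect_sections(html: str) -> list[str]:
    """Return section keys whose class patterns appear in the page's class attributes."""
    classes = " ".join(re.findall(r'class="([^"]*)"', html))
    return [
        section
        for section, patterns in SAMPLE_SELECTORS.items()
        if any(pattern in classes for pattern in patterns)
    ]
```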

## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `serp-overview` | Get SERP results for a keyword |
| `keywords-explorer-overview` | Get keyword metrics and SERP features |
| `site-explorer-organic-keywords` | Map competitor positions |

## Output Format

```json
{
  "keyword": "치과 임플란트",
  "country": "kr",
  "serp_features": {
    "featured_snippet": true,
    "people_also_ask": true,
    "local_pack": true,
    "knowledge_panel": false,
    "video_carousel": false,
    "ads_top": 3,
    "ads_bottom": 2
  },
  "competitors": [
    {
      "position": 1,
      "url": "https://example.com/page",
      "domain": "example.com",
      "title": "...",
      "content_type": "service_page"
    }
  ],
  "opportunity_score": 72,
  "intent_signals": "commercial",
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | SERP Analysis |
| Priority | Select | Based on opportunity score |
| Found Date | Date | Analysis date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: SERP-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., SERP, Featured Snippet, PAA)
- URLs and code remain unchanged
custom-skills/20-seo-serp-analysis/code/scripts/base_client.py (new file, 207 lines)
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,682 @@
|
|||||||
|
"""
|
||||||
|
Naver SERP Analyzer - Naver search result composition analysis
|
||||||
|
==============================================================
|
||||||
|
Purpose: Analyze Naver SERP section distribution, content type mapping,
|
||||||
|
brand zone detection, and VIEW tab content analysis.
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python naver_serp_analyzer.py --keyword "치과 임플란트" --json
|
||||||
|
python naver_serp_analyzer.py --keywords-file keywords.txt --json
|
||||||
|
python naver_serp_analyzer.py --keyword "치과 임플란트" --output naver_report.json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from dataclasses import asdict, dataclass, field
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import requests
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
from rich.console import Console
|
||||||
|
from rich.table import Table
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Logging
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
logger = logging.getLogger(__name__)
console = Console()

# ---------------------------------------------------------------------------
# Constants - Naver SERP Section Identifiers
# ---------------------------------------------------------------------------

# CSS class / id patterns used to detect Naver SERP sections
NAVER_SECTION_SELECTORS: dict[str, list[str]] = {
    "blog": [
        "sp_blog",
        "blog_widget",
        "sc_new.sp_blog",
        "api_subject_blog",
        "type_blog",
        "blog_exact",
    ],
    "cafe": [
        "sp_cafe",
        "cafe_widget",
        "sc_new.sp_cafe",
        "api_subject_cafe",
        "type_cafe",
    ],
    "knowledge_in": [
        "sp_kin",
        "kin_widget",
        "sc_new.sp_kin",
        "api_subject_kin",
        "type_kin",
        "nx_kin",
    ],
    "smart_store": [
        "sp_nshop",
        "shopping_widget",
        "sc_new.sp_nshop",
        "api_subject_shopping",
        "type_shopping",
        "smartstore",
    ],
    "brand_zone": [
        "sp_brand",
        "brand_area",
        "brand_zone",
        "type_brand",
        "sc_new.sp_brand",
    ],
    "view_tab": [
        "sp_view",
        "view_widget",
        "sc_new.sp_view",
        "type_view",
        "api_subject_view",
    ],
    "news": [
        "sp_nnews",
        "news_widget",
        "sc_new.sp_nnews",
        "api_subject_news",
        "type_news",
        "group_news",
    ],
    "encyclopedia": [
        "sp_encyclopedia",
        "sc_new.sp_encyclopedia",
        "api_subject_encyclopedia",
        "type_encyclopedia",
        "nx_encyclopedia",
    ],
    "image": [
        "sp_image",
        "image_widget",
        "sc_new.sp_image",
        "api_subject_image",
        "type_image",
    ],
    "video": [
        "sp_video",
        "video_widget",
        "sc_new.sp_video",
        "api_subject_video",
        "type_video",
    ],
    "place": [
        "sp_local",
        "local_widget",
        "sc_new.sp_local",
        "type_place",
        "place_section",
        "loc_map",
    ],
    "ad": [
        "sp_nad",
        "sp_tad",
        "ad_section",
        "type_powerlink",
        "type_ad",
        "nx_ad",
    ],
}

# Section display names in Korean
SECTION_DISPLAY_NAMES: dict[str, str] = {
    "blog": "블로그",
    "cafe": "카페",
    "knowledge_in": "지식iN",
    "smart_store": "스마트스토어",
    "brand_zone": "브랜드존",
    "view_tab": "VIEW",
    "news": "뉴스",
    "encyclopedia": "백과사전",
    "image": "이미지",
    "video": "동영상",
    "place": "플레이스",
    "ad": "광고",
}

# Default headers for Naver requests
NAVER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


# ---------------------------------------------------------------------------
# Data Classes
# ---------------------------------------------------------------------------


@dataclass
class NaverSection:
    """A detected section within Naver SERP."""

    section_type: str  # blog, cafe, knowledge_in, smart_store, etc.
    display_name: str = ""
    position: int = 0  # Order of appearance (1-based)
    item_count: int = 0  # Number of items in the section
    is_above_fold: bool = False  # Appears within first ~3 sections
    has_more_link: bool = False  # Section has "more results" link
    raw_html_snippet: str = ""  # Short HTML snippet for debugging

    def __post_init__(self):
        if not self.display_name:
            self.display_name = SECTION_DISPLAY_NAMES.get(
                self.section_type, self.section_type
            )


@dataclass
class NaverSerpResult:
    """Complete Naver SERP analysis result for a keyword."""

    keyword: str
    sections: list[NaverSection] = field(default_factory=list)
    section_order: list[str] = field(default_factory=list)
    brand_zone_present: bool = False
    brand_zone_brand: str = ""
    total_sections: int = 0
    above_fold_sections: list[str] = field(default_factory=list)
    ad_count: int = 0
    dominant_section: str = ""
    has_view_tab: bool = False
    has_place_section: bool = False
    timestamp: str = ""

    def __post_init__(self):
        if not self.timestamp:
            self.timestamp = datetime.now().isoformat()


# ---------------------------------------------------------------------------
# Naver SERP Analyzer
# ---------------------------------------------------------------------------


class NaverSerpAnalyzer:
    """Analyzes Naver search result page composition."""

    NAVER_SEARCH_URL = "https://search.naver.com/search.naver"

    def __init__(self, timeout: int = 30):
        self.timeout = timeout
        self.logger = logging.getLogger(self.__class__.__name__)
        self.session = requests.Session()
        self.session.headers.update(NAVER_HEADERS)

    # ----- Data Fetching -----

    def fetch_serp(self, keyword: str) -> str:
        """
        Fetch Naver search results HTML for a given keyword.

        Returns the raw HTML string of the search results page.
        """
        self.logger.info(f"Fetching Naver SERP for '{keyword}'")

        params = {
            "where": "nexearch",
            "sm": "top_hty",
            "fbm": "0",
            "ie": "utf8",
            "query": keyword,
        }

        try:
            response = self.session.get(
                self.NAVER_SEARCH_URL,
                params=params,
                timeout=self.timeout,
            )
            response.raise_for_status()
            self.logger.info(
                f"Fetched {len(response.text):,} bytes "
                f"(status={response.status_code})"
            )
            return response.text

        except requests.RequestException as exc:
            self.logger.error(f"Failed to fetch Naver SERP: {exc}")
            return ""

    # ----- Section Detection -----

    def detect_sections(self, html: str) -> list[NaverSection]:
        """
        Identify Naver SERP sections from HTML structure.

        Scans the HTML for known CSS class names and IDs that correspond
        to Naver's SERP section types.
        """
        if not html:
            return []

        soup = BeautifulSoup(html, "lxml")
        sections: list[NaverSection] = []
        position = 0

        # Strategy 1: Look for section containers with known class names
        # Naver uses <div class="sc_new sp_XXX"> and <section> elements
        all_sections = soup.find_all(
            ["div", "section"],
            class_=re.compile(
                r"(sc_new|api_subject|sp_|type_|_widget|group_|nx_)"
            ),
        )

        seen_types: set[str] = set()

        for element in all_sections:
            classes = " ".join(element.get("class", []))
            element_id = element.get("id", "")
            search_text = f"{classes} {element_id}".lower()

            for section_type, selectors in NAVER_SECTION_SELECTORS.items():
                if section_type in seen_types:
                    continue

                matched = False
                for selector in selectors:
                    if selector.lower() in search_text:
                        matched = True
                        break

                if matched:
                    position += 1
                    seen_types.add(section_type)

                    # Count items within the section
                    item_count = self._count_section_items(element, section_type)

                    # Check for "more" link
                    has_more = bool(
                        element.find("a", class_=re.compile(r"(more|_more|btn_more)"))
                        or element.find("a", string=re.compile(r"(더보기|전체보기)"))
                    )

                    # Get short HTML snippet for debugging
                    snippet = str(element)[:200] if element else ""

                    section = NaverSection(
                        section_type=section_type,
                        position=position,
                        item_count=item_count,
                        is_above_fold=(position <= 3),
                        has_more_link=has_more,
                        raw_html_snippet=snippet,
                    )
                    sections.append(section)

        # Strategy 2: Fallback - scan entire HTML text for section markers
        if not sections:
            self.logger.warning(
                "No sections found via DOM parsing; "
                "falling back to text pattern matching"
            )
            sections = self._fallback_text_detection(html)

        return sections

    def _count_section_items(self, element: Any, section_type: str) -> int:
        """Count the number of result items within a section element."""
        # Common item container patterns
        item_selectors = [
            "li",
            ".api_txt_lines",
            ".total_tit",
            ".detail_box",
            ".item",
            ".lst_total > li",
        ]

        for selector in item_selectors:
            items = element.select(selector)
            if items and len(items) > 0:
                return len(items)

        # Fallback: count links that look like results
        links = element.find_all("a", href=True)
        result_links = [
            a
            for a in links
            if a.get("href", "").startswith("http")
            and "naver.com/search" not in a.get("href", "")
        ]
        return len(result_links) if result_links else 0

    def _fallback_text_detection(self, html: str) -> list[NaverSection]:
        """Detect sections by scanning raw HTML text for known markers."""
        sections: list[NaverSection] = []
        position = 0
        html_lower = html.lower()

        for section_type, selectors in NAVER_SECTION_SELECTORS.items():
            for selector in selectors:
                if selector.lower() in html_lower:
                    position += 1
                    sections.append(
                        NaverSection(
                            section_type=section_type,
                            position=position,
                            item_count=0,
                            is_above_fold=(position <= 3),
                        )
                    )
                    break

        return sections

    # ----- Section Priority Analysis -----

    def analyze_section_priority(
        self, sections: list[NaverSection]
    ) -> list[str]:
        """
        Determine above-fold section order.

        Returns ordered list of section types that appear in the first
        visible area of the SERP (approximately top 3 sections).
        """
        sorted_sections = sorted(sections, key=lambda s: s.position)
        above_fold = [s.section_type for s in sorted_sections if s.is_above_fold]
        return above_fold

    # ----- Brand Zone Detection -----

    def check_brand_zone(self, html: str) -> tuple[bool, str]:
        """
        Detect brand zone presence and extract brand name if available.

        Returns (is_present, brand_name).
        """
        if not html:
            return False, ""

        soup = BeautifulSoup(html, "lxml")

        # Look for brand zone container
        brand_selectors = [
            "sp_brand",
            "brand_area",
            "brand_zone",
            "type_brand",
        ]

        for selector in brand_selectors:
            brand_el = soup.find(
                ["div", "section"],
                class_=re.compile(selector, re.IGNORECASE),
            )
            if brand_el:
                # Try to extract brand name from the section
                brand_name = ""
                title_el = brand_el.find(
                    ["h2", "h3", "strong", "a"],
                    class_=re.compile(r"(tit|title|name|brand)", re.IGNORECASE),
                )
                if title_el:
                    brand_name = title_el.get_text(strip=True)

                return True, brand_name

        # Text-based fallback
        if "brand_zone" in html.lower() or "sp_brand" in html.lower():
            return True, ""

        return False, ""

    # ----- Dominant Section -----

    def _find_dominant_section(self, sections: list[NaverSection]) -> str:
        """Find the section with the most items (excluding ads)."""
        non_ad = [s for s in sections if s.section_type != "ad"]
        if not non_ad:
            return ""
        return max(non_ad, key=lambda s: s.item_count).section_type

    # ----- Main Analysis Orchestrator -----

    def analyze(self, keyword: str) -> NaverSerpResult:
        """
        Orchestrate full Naver SERP analysis for a single keyword.

        Steps:
        1. Fetch Naver search results page
        2. Detect SERP sections
        3. Analyze section priority
        4. Check brand zone presence
        5. Compile results
        """
        html = self.fetch_serp(keyword)

        if not html:
            self.logger.error(f"No HTML content for keyword '{keyword}'")
            return NaverSerpResult(keyword=keyword)

        sections = self.detect_sections(html)
        above_fold = self.analyze_section_priority(sections)
        brand_present, brand_name = self.check_brand_zone(html)

        # Build section order
        section_order = [s.section_type for s in sorted(sections, key=lambda x: x.position)]

        # Count ads
        ad_sections = [s for s in sections if s.section_type == "ad"]
        ad_count = sum(s.item_count for s in ad_sections) if ad_sections else 0

        # Check special sections
        has_view = any(s.section_type == "view_tab" for s in sections)
        has_place = any(s.section_type == "place" for s in sections)
        dominant = self._find_dominant_section(sections)

        result = NaverSerpResult(
            keyword=keyword,
            sections=sections,
            section_order=section_order,
            brand_zone_present=brand_present,
            brand_zone_brand=brand_name,
            total_sections=len(sections),
            above_fold_sections=above_fold,
            ad_count=ad_count,
            dominant_section=dominant,
            has_view_tab=has_view,
            has_place_section=has_place,
        )
        return result


# ---------------------------------------------------------------------------
# Output Helpers
# ---------------------------------------------------------------------------


def result_to_dict(result: NaverSerpResult) -> dict[str, Any]:
    """Convert NaverSerpResult to a JSON-serializable dictionary."""
    d = asdict(result)
    # Remove raw HTML snippets from JSON output to keep it clean
    for section in d.get("sections", []):
        section.pop("raw_html_snippet", None)
    return d


def print_rich_report(result: NaverSerpResult) -> None:
    """Print a human-readable report using rich."""
    console.rule(f"[bold blue]Naver SERP Analysis: {result.keyword}")
    console.print(f"[dim]Timestamp: {result.timestamp}[/dim]")
    console.print()

    # Summary
    summary_table = Table(title="Summary", show_lines=True)
    summary_table.add_column("Metric", style="cyan")
    summary_table.add_column("Value", style="green")
    summary_table.add_row("Total Sections", str(result.total_sections))
    summary_table.add_row("Ad Count", str(result.ad_count))
    summary_table.add_row("Brand Zone", "Yes" if result.brand_zone_present else "No")
    if result.brand_zone_brand:
        summary_table.add_row("Brand Name", result.brand_zone_brand)
    summary_table.add_row("VIEW Tab", "Yes" if result.has_view_tab else "No")
    summary_table.add_row("Place Section", "Yes" if result.has_place_section else "No")
    summary_table.add_row("Dominant Section", result.dominant_section or "N/A")
    console.print(summary_table)
    console.print()

    # Section Details
    if result.sections:
        section_table = Table(title="Detected Sections", show_lines=True)
        section_table.add_column("#", style="bold")
        section_table.add_column("Section", style="cyan")
        section_table.add_column("Display Name", style="magenta")
        section_table.add_column("Items", style="green")
        section_table.add_column("Above Fold", style="yellow")
        section_table.add_column("More Link", style="dim")

        for s in sorted(result.sections, key=lambda x: x.position):
            section_table.add_row(
                str(s.position),
                s.section_type,
                s.display_name,
                str(s.item_count),
                "Yes" if s.is_above_fold else "No",
                "Yes" if s.has_more_link else "No",
            )
        console.print(section_table)
        console.print()

    # Above-Fold Sections
    if result.above_fold_sections:
        console.print("[bold]Above-Fold Section Order:[/bold]")
        for i, sec in enumerate(result.above_fold_sections, 1):
            display = SECTION_DISPLAY_NAMES.get(sec, sec)
            console.print(f"  {i}. {display} ({sec})")
        console.print()

    # Section Order
    if result.section_order:
        console.print("[bold]Full Section Order:[/bold]")
        order_str = " -> ".join(
            SECTION_DISPLAY_NAMES.get(s, s) for s in result.section_order
        )
        console.print(f"  {order_str}")

    console.rule()


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Naver SERP composition analysis",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python naver_serp_analyzer.py --keyword "치과 임플란트" --json
  python naver_serp_analyzer.py --keywords-file keywords.txt --json
  python naver_serp_analyzer.py --keyword "치과 임플란트" --output report.json
""",
    )
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument(
        "--keyword",
        type=str,
        help="Single keyword to analyze",
    )
    group.add_argument(
        "--keywords-file",
        type=str,
        help="Path to file with one keyword per line",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="json_output",
        help="Output results as JSON",
    )
    parser.add_argument(
        "--output",
        type=str,
        help="Write JSON results to file",
    )
    return parser


def load_keywords(filepath: str) -> list[str]:
    """Load keywords from a text file, one per line."""
    path = Path(filepath)
    if not path.exists():
        logger.error(f"Keywords file not found: {filepath}")
        sys.exit(1)
    keywords = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            kw = line.strip()
            if kw and not kw.startswith("#"):
                keywords.append(kw)
    logger.info(f"Loaded {len(keywords)} keywords from {filepath}")
    return keywords


def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    analyzer = NaverSerpAnalyzer()

    # Collect keywords
    if args.keyword:
        keywords = [args.keyword]
    else:
        keywords = load_keywords(args.keywords_file)

    if not keywords:
        logger.error("No keywords to analyze")
        sys.exit(1)

    results: list[dict[str, Any]] = []

    for kw in keywords:
        console.print(f"\n[bold]Analyzing Naver SERP:[/bold] {kw}")
        result = analyzer.analyze(kw)

        if args.json_output or args.output:
            results.append(result_to_dict(result))
        else:
            print_rich_report(result)

    # JSON output
    if args.json_output:
        output_data = results[0] if len(results) == 1 else results
        print(json.dumps(output_data, ensure_ascii=False, indent=2))

    if args.output:
        output_data = results[0] if len(results) == 1 else results
        output_path = Path(args.output)
        with open(output_path, "w", encoding="utf-8") as fh:
            json.dump(output_data, fh, ensure_ascii=False, indent=2)
        logger.info(f"Results written to {output_path}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,9 @@
# 20-seo-serp-analysis dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0

891 custom-skills/20-seo-serp-analysis/code/scripts/serp_analyzer.py Normal file
@@ -0,0 +1,891 @@
"""
SERP Analyzer - Google SERP feature detection and competitor mapping
====================================================================
Purpose: Analyze Google SERP features, map competitor positions,
         classify content types, and score SERP opportunities.
Python: 3.10+

Usage:
    python serp_analyzer.py --keyword "치과 임플란트" --country kr --json
    python serp_analyzer.py --keywords-file keywords.txt --country kr --json
    python serp_analyzer.py --keyword "dental implant" --output serp_report.json
"""

import argparse
import json
import logging
import re
import subprocess
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any
from urllib.parse import urlparse

from rich.console import Console
from rich.table import Table

# ---------------------------------------------------------------------------
# Logging
# ---------------------------------------------------------------------------
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
console = Console()

# ---------------------------------------------------------------------------
# Data Classes
# ---------------------------------------------------------------------------


@dataclass
class SerpFeatures:
    """Tracks presence and count of Google SERP features."""

    featured_snippet: bool = False
    people_also_ask: bool = False
    local_pack: bool = False
    knowledge_panel: bool = False
    video_carousel: bool = False
    image_pack: bool = False
    site_links: bool = False
    ads_top: int = 0
    ads_bottom: int = 0
    shopping: bool = False

    @property
    def feature_count(self) -> int:
        """Count of boolean features that are present."""
        count = 0
        for f in [
            self.featured_snippet,
            self.people_also_ask,
            self.local_pack,
            self.knowledge_panel,
            self.video_carousel,
            self.image_pack,
            self.site_links,
            self.shopping,
        ]:
            if f:
                count += 1
        return count

    @property
    def has_ads(self) -> bool:
        return self.ads_top > 0 or self.ads_bottom > 0


@dataclass
class CompetitorPosition:
    """A single competitor entry in the SERP."""

    position: int
    url: str
    domain: str
    title: str = ""
    content_type: str = "unknown"
    is_featured: bool = False
    has_sitelinks: bool = False
    estimated_traffic_share: float = 0.0


@dataclass
class SerpResult:
    """Complete SERP analysis result for a keyword."""

    keyword: str
    country: str = "us"
    search_volume: int = 0
    keyword_difficulty: float = 0.0
    cpc: float = 0.0
    serp_features: SerpFeatures = field(default_factory=SerpFeatures)
    competitors: list[CompetitorPosition] = field(default_factory=list)
    opportunity_score: int = 0
    intent_signals: str = "informational"
    content_type_distribution: dict[str, int] = field(default_factory=dict)
    volatility: str = "stable"
    timestamp: str = ""

    def __post_init__(self):
        if not self.timestamp:
            self.timestamp = datetime.now().isoformat()


# ---------------------------------------------------------------------------
# Content Type Classifiers
# ---------------------------------------------------------------------------

# URL path patterns that hint at content type
URL_CONTENT_PATTERNS: dict[str, list[str]] = {
    "blog": [
        r"/blog/",
        r"/post/",
        r"/article/",
        r"/news/",
        r"/magazine/",
        r"/journal/",
        r"/column/",
        r"/story/",
        r"\d{4}/\d{2}/",
    ],
    "product": [
        r"/product/",
        r"/item/",
        r"/shop/",
        r"/store/",
        r"/buy/",
        r"/p/",
        r"/goods/",
        r"/catalog/",
    ],
    "service": [
        r"/service",
        r"/solution",
        r"/treatment",
        r"/procedure",
        r"/pricing",
        r"/consultation",
    ],
    "news": [
        r"/news/",
        r"/press/",
        r"/media/",
        r"/release/",
        r"news\.",
        r"press\.",
    ],
    "video": [
        r"youtube\.com/watch",
        r"youtu\.be/",
        r"vimeo\.com/",
        r"/video/",
        r"/watch/",
    ],
    "forum": [
        r"/forum/",
        r"/community/",
        r"/discuss",
        r"/thread/",
        r"/question/",
        r"/answers/",
    ],
    "wiki": [
        r"wikipedia\.org",
        r"/wiki/",
        r"namu\.wiki",
    ],
}

# Title keywords that hint at content type
TITLE_CONTENT_PATTERNS: dict[str, list[str]] = {
    "blog": ["블로그", "후기", "리뷰", "review", "guide", "가이드", "팁", "tips"],
    "product": ["구매", "가격", "buy", "price", "shop", "할인", "sale", "최저가"],
    "service": ["상담", "치료", "진료", "병원", "클리닉", "clinic", "treatment"],
    "news": ["뉴스", "속보", "보도", "news", "기사", "report"],
    "video": ["영상", "동영상", "video", "youtube"],
    "comparison": ["비교", "vs", "versus", "compare", "차이", "best"],
}

# CTR distribution by position (approximate click-through rates)
CTR_BY_POSITION: dict[int, float] = {
    1: 0.316,
    2: 0.158,
    3: 0.110,
    4: 0.080,
    5: 0.062,
    6: 0.049,
    7: 0.040,
    8: 0.034,
    9: 0.029,
    10: 0.025,
}


# ---------------------------------------------------------------------------
|
||||||
|
# SERP Analyzer
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
class SerpAnalyzer:
|
||||||
|
"""Analyzes Google SERP features, competitor positions, and opportunities."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
self.logger = logging.getLogger(self.__class__.__name__)
|
||||||
|
|
||||||
|
# ----- Data Fetching -----
|
||||||
|
|
||||||
|
def get_serp_data(self, keyword: str, country: str = "us") -> dict[str, Any]:
|
||||||
|
"""
|
||||||
|
Fetch SERP data via Ahrefs serp-overview MCP tool.
|
||||||
|
|
||||||
|
Uses subprocess to invoke the Ahrefs MCP tool. Falls back to a
|
||||||
|
structured placeholder when the MCP tool is unavailable (e.g., in
|
||||||
|
standalone / CI environments).
|
||||||
|
"""
|
||||||
|
self.logger.info(f"Fetching SERP data for '{keyword}' (country={country})")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Attempt MCP tool call via subprocess
|
||||||
|
cmd = [
|
||||||
|
"claude",
|
||||||
|
"mcp",
|
||||||
|
"call",
|
||||||
|
"ahrefs",
|
||||||
|
"serp-overview",
|
||||||
|
json.dumps({"keyword": keyword, "country": country}),
|
||||||
|
]
|
||||||
|
result = subprocess.run(
|
||||||
|
cmd,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=60,
|
||||||
|
)
|
||||||
|
if result.returncode == 0 and result.stdout.strip():
|
||||||
|
data = json.loads(result.stdout)
|
||||||
|
self.logger.info("Successfully fetched SERP data via MCP")
|
||||||
|
return data
|
||||||
|
except (subprocess.TimeoutExpired, FileNotFoundError, json.JSONDecodeError) as exc:
|
||||||
|
self.logger.warning(f"MCP call unavailable ({exc}), using keyword metrics fallback")
|
||||||
|
|
||||||
|
# Fallback: try Ahrefs keywords-explorer-overview
|
||||||
|
try:
|
||||||
|
cmd_kw = [
|
||||||
|
"claude",
|
||||||
|
"mcp",
|
||||||
|
"call",
|
||||||
|
"ahrefs",
|
||||||
|
"keywords-explorer-overview",
|
||||||
|
json.dumps({"keyword": keyword, "country": country}),
|
||||||
|
]
|
||||||
|
result_kw = subprocess.run(
|
||||||
|
cmd_kw,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=60,
|
||||||
|
)
|
||||||
|
if result_kw.returncode == 0 and result_kw.stdout.strip():
|
||||||
|
data = json.loads(result_kw.stdout)
|
||||||
|
self.logger.info("Fetched keyword overview via MCP")
|
||||||
|
return data
|
||||||
|
except (subprocess.TimeoutExpired, FileNotFoundError, json.JSONDecodeError) as exc:
|
||||||
|
self.logger.warning(f"Keywords-explorer MCP also unavailable ({exc})")
|
||||||
|
|
||||||
|
# Return empty structure when no MCP tools available
|
||||||
|
self.logger.warning(
|
||||||
|
"No MCP data source available. Run inside Claude Desktop "
|
||||||
|
"or provide data via --input flag."
|
||||||
|
)
|
||||||
|
return {
|
||||||
|
"keyword": keyword,
|
||||||
|
"country": country,
|
||||||
|
"serp": [],
|
||||||
|
"serp_features": {},
|
||||||
|
"metrics": {},
|
||||||
|
}
|
||||||
|
|
||||||
|
    # ----- Feature Detection -----

    def detect_features(self, serp_data: dict[str, Any]) -> SerpFeatures:
        """
        Identify SERP features from Ahrefs response data.

        Handles both the structured 'serp_features' dict returned by
        keywords-explorer-overview and the raw SERP items list from
        serp-overview.
        """
        features = SerpFeatures()

        # -- Method 1: structured serp_features from Ahrefs --
        sf = serp_data.get("serp_features", {})
        if isinstance(sf, dict):
            features.featured_snippet = sf.get("featured_snippet", False)
            features.people_also_ask = sf.get("people_also_ask", False)
            features.local_pack = sf.get("local_pack", False)
            features.knowledge_panel = sf.get("knowledge_panel", False) or sf.get(
                "knowledge_graph", False
            )
            features.video_carousel = sf.get("video", False) or sf.get(
                "video_carousel", False
            )
            features.image_pack = sf.get("image_pack", False) or sf.get(
                "images", False
            )
            features.site_links = sf.get("sitelinks", False) or sf.get(
                "site_links", False
            )
            features.shopping = sf.get("shopping_results", False) or sf.get(
                "shopping", False
            )
            features.ads_top = int(sf.get("ads_top", 0) or 0)
            features.ads_bottom = int(sf.get("ads_bottom", 0) or 0)

        # -- Method 2: infer from raw SERP items list --
        serp_items = serp_data.get("serp", [])
        if isinstance(serp_items, list):
            for item in serp_items:
                item_type = str(item.get("type", "")).lower()
                if "featured_snippet" in item_type or item.get("is_featured"):
                    features.featured_snippet = True
                if "people_also_ask" in item_type or "paa" in item_type:
                    features.people_also_ask = True
                if "local" in item_type or "map" in item_type:
                    features.local_pack = True
                if "knowledge" in item_type:
                    features.knowledge_panel = True
                if "video" in item_type:
                    features.video_carousel = True
                if "image" in item_type:
                    features.image_pack = True
                if item.get("sitelinks"):
                    features.site_links = True
                if "shopping" in item_type:
                    features.shopping = True
                if "ad" in item_type:
                    pos = item.get("position", 0)
                    if pos <= 4:
                        features.ads_top += 1
                    else:
                        features.ads_bottom += 1

        return features

    # ----- Competitor Mapping -----

    def map_competitors(self, serp_data: dict[str, Any]) -> list[CompetitorPosition]:
        """Extract competitor positions and domains from SERP data."""
        competitors: list[CompetitorPosition] = []
        serp_items = serp_data.get("serp", [])

        if not isinstance(serp_items, list):
            return competitors

        for item in serp_items:
            url = item.get("url", "")
            if not url:
                continue

            # Skip ads for organic mapping
            item_type = str(item.get("type", "")).lower()
            if "ad" in item_type:
                continue

            parsed = urlparse(url)
            domain = parsed.netloc.replace("www.", "")
            position = int(item.get("position", len(competitors) + 1))
            title = item.get("title", "")

            content_type = self.classify_content_type(item)
            traffic_share = CTR_BY_POSITION.get(position, 0.01)

            comp = CompetitorPosition(
                position=position,
                url=url,
                domain=domain,
                title=title,
                content_type=content_type,
                is_featured=bool(item.get("is_featured")),
                has_sitelinks=bool(item.get("sitelinks")),
                estimated_traffic_share=round(traffic_share, 4),
            )
            competitors.append(comp)

        # Sort by position
        competitors.sort(key=lambda c: c.position)
        return competitors

    # ----- Content Type Classification -----

    def classify_content_type(self, result: dict[str, Any]) -> str:
        """
        Classify a SERP result as blog/product/service/news/video/forum/wiki
        based on URL patterns and title keywords.
        """
        url = result.get("url", "").lower()
        title = result.get("title", "").lower()

        scores: dict[str, int] = {}

        # Score from URL patterns
        for ctype, patterns in URL_CONTENT_PATTERNS.items():
            for pattern in patterns:
                if re.search(pattern, url):
                    scores[ctype] = scores.get(ctype, 0) + 2
                    break

        # Score from title patterns
        for ctype, keywords in TITLE_CONTENT_PATTERNS.items():
            for kw in keywords:
                if kw.lower() in title:
                    scores[ctype] = scores.get(ctype, 0) + 1

        if not scores:
            # Heuristic: fall back to known authority domains
            parsed = urlparse(url)
            domain = parsed.netloc.lower()
            if any(d in domain for d in ["wikipedia", "namu.wiki"]):
                return "wiki"
            if any(d in domain for d in ["youtube", "vimeo"]):
                return "video"
            if any(d in domain for d in ["naver.com", "tistory.com", "brunch.co.kr"]):
                return "blog"
            return "service_page"

        # Return highest scoring type
        return max(scores, key=scores.get)  # type: ignore[arg-type]

    # ----- Opportunity Scoring -----

    def calculate_opportunity_score(
        self,
        features: SerpFeatures,
        positions: list[CompetitorPosition],
    ) -> int:
        """
        Score SERP opportunity from 0-100.

        Higher scores indicate better opportunity to rank or gain features.

        Factors (additive):
        - Featured snippet available but could be captured +15
        - PAA present (related question opportunity) +10
        - No knowledge panel (less SERP real estate taken) +10
        - Low ad count (more organic visibility) +10
        - Few sitelinks in top results +5
        - Content diversity (various domains in top 10) +10
        - No video carousel (opportunity to add video) +5
        - Top results are blogs (easier to outrank) +10
        - Image pack absent (image SEO opportunity) +5
        - Shopping absent for commercial keywords +5
        - Top positions lacking schema/rich results +5

        Penalty factors (subtractive):
        - Knowledge panel dominates -15
        - Heavy ad presence (4+ top ads) -10
        - Single domain dominates top 5 -10
        """
        score = 50  # Base score

        # -- Positive signals --
        if features.featured_snippet:
            score += 15
        if features.people_also_ask:
            score += 10
        if not features.knowledge_panel:
            score += 10
        if features.ads_top <= 1:
            score += 10
        elif features.ads_top <= 2:
            score += 5
        if not features.video_carousel:
            score += 5
        if not features.image_pack:
            score += 5
        if not features.shopping:
            score += 5

        # Domain diversity in top 10
        if positions:
            top10_domains = {p.domain for p in positions[:10]}
            if len(top10_domains) >= 8:
                score += 10
            elif len(top10_domains) >= 5:
                score += 5

        # Blog-heavy top results (easier to compete)
        blog_count = sum(
            1 for p in positions[:5] if p.content_type == "blog"
        )
        if blog_count >= 3:
            score += 10
        elif blog_count >= 2:
            score += 5

        # Sitelinks reduce available space
        sitelink_count = sum(1 for p in positions[:5] if p.has_sitelinks)
        if sitelink_count <= 1:
            score += 5

        # Single domain dominance penalty
        domain_counts: dict[str, int] = {}
        for p in positions[:5]:
            domain_counts[p.domain] = domain_counts.get(p.domain, 0) + 1
        if any(c >= 3 for c in domain_counts.values()):
            score -= 10

        # -- Negative signals --
        if features.knowledge_panel:
            score -= 15
        if features.ads_top >= 4:
            score -= 10
        elif features.ads_top >= 3:
            score -= 5

        # Clamp to 0-100
        return max(0, min(100, score))

    # ----- Intent Validation -----

    def validate_intent(
        self,
        features: SerpFeatures,
        positions: list[CompetitorPosition],
    ) -> str:
        """
        Infer search intent from SERP composition.

        Returns one of: informational, navigational, commercial, transactional, local
        """
        signals: dict[str, int] = {
            "informational": 0,
            "navigational": 0,
            "commercial": 0,
            "transactional": 0,
            "local": 0,
        }

        # Feature-based signals
        if features.featured_snippet:
            signals["informational"] += 3
        if features.people_also_ask:
            signals["informational"] += 2
        if features.knowledge_panel:
            signals["informational"] += 2
            signals["navigational"] += 2
        if features.local_pack:
            signals["local"] += 5
        if features.shopping:
            signals["transactional"] += 4
        if features.has_ads:
            signals["commercial"] += 2
            signals["transactional"] += 1
        if features.ads_top >= 3:
            signals["transactional"] += 2
        if features.image_pack:
            signals["informational"] += 1
        if features.video_carousel:
            signals["informational"] += 1

        # Content type signals from top results
        for pos in positions[:10]:
            ct = pos.content_type
            if ct == "blog":
                signals["informational"] += 1
            elif ct == "product":
                signals["transactional"] += 2
            elif ct == "service":
                signals["commercial"] += 1
            elif ct == "news":
                signals["informational"] += 1
            elif ct == "video":
                signals["informational"] += 1
            elif ct == "wiki":
                signals["informational"] += 2
            elif ct == "forum":
                signals["informational"] += 1
            elif ct == "comparison":
                signals["commercial"] += 2

        # Navigational: single domain dominates top 3
        if positions:
            top3_domains = [p.domain for p in positions[:3]]
            if len(set(top3_domains)) == 1:
                signals["navigational"] += 5

        # Return highest signal
        return max(signals, key=signals.get)  # type: ignore[arg-type]

    # ----- Content Type Distribution -----

    def _content_type_distribution(
        self, positions: list[CompetitorPosition]
    ) -> dict[str, int]:
        """Count content types across top organic results."""
        dist: dict[str, int] = {}
        for p in positions[:10]:
            dist[p.content_type] = dist.get(p.content_type, 0) + 1
        return dict(sorted(dist.items(), key=lambda x: x[1], reverse=True))

    # ----- Volatility Assessment -----

    def _assess_volatility(self, serp_data: dict[str, Any]) -> str:
        """
        Assess SERP volatility based on available signals.

        Returns: stable, moderate, volatile
        """
        # Check if Ahrefs provides a volatility/movement score
        metrics = serp_data.get("metrics", {})
        if isinstance(metrics, dict):
            volatility_score = metrics.get("serp_volatility", None)
            if volatility_score is not None:
                if volatility_score < 3:
                    return "stable"
                elif volatility_score < 7:
                    return "moderate"
                else:
                    return "volatile"

        # Heuristic: if many results have recent dates, the SERP is more volatile
        serp_items = serp_data.get("serp", [])
        if isinstance(serp_items, list) and serp_items:
            recent_count = 0
            for item in serp_items[:10]:
                last_seen = item.get("last_seen", "")
                if last_seen:
                    try:
                        dt = datetime.fromisoformat(last_seen.replace("Z", "+00:00"))
                        if (datetime.now(dt.tzinfo) - dt).days < 30:
                            recent_count += 1
                    except (ValueError, TypeError):
                        pass
            if recent_count >= 5:
                return "volatile"
            elif recent_count >= 3:
                return "moderate"

        return "stable"

    # ----- Main Analysis Orchestrator -----

    def analyze(self, keyword: str, country: str = "us") -> SerpResult:
        """
        Orchestrate full SERP analysis for a single keyword.

        Steps:
        1. Fetch SERP data from Ahrefs MCP
        2. Detect SERP features
        3. Map competitor positions
        4. Classify content types
        5. Calculate opportunity score
        6. Validate search intent
        7. Assess volatility
        """
        serp_data = self.get_serp_data(keyword, country)

        features = self.detect_features(serp_data)
        positions = self.map_competitors(serp_data)
        opportunity = self.calculate_opportunity_score(features, positions)
        intent = self.validate_intent(features, positions)
        content_dist = self._content_type_distribution(positions)
        volatility = self._assess_volatility(serp_data)

        # Extract keyword metrics if available
        metrics = serp_data.get("metrics", {})
        search_volume = int(metrics.get("search_volume", 0) or 0)
        keyword_difficulty = float(metrics.get("keyword_difficulty", 0) or 0)
        cpc = float(metrics.get("cpc", 0) or 0)

        result = SerpResult(
            keyword=keyword,
            country=country,
            search_volume=search_volume,
            keyword_difficulty=keyword_difficulty,
            cpc=cpc,
            serp_features=features,
            competitors=positions,
            opportunity_score=opportunity,
            intent_signals=intent,
            content_type_distribution=content_dist,
            volatility=volatility,
        )
        return result

# ---------------------------------------------------------------------------
# Output Helpers
# ---------------------------------------------------------------------------


def result_to_dict(result: SerpResult) -> dict[str, Any]:
    """Convert SerpResult to a JSON-serializable dictionary."""
    return asdict(result)


def print_rich_report(result: SerpResult) -> None:
    """Print a human-readable report using rich."""
    console.rule(f"[bold blue]SERP Analysis: {result.keyword}")
    console.print(f"[dim]Country: {result.country} | Timestamp: {result.timestamp}[/dim]")
    console.print()

    # Metrics
    if result.search_volume or result.keyword_difficulty:
        metrics_table = Table(title="Keyword Metrics", show_lines=True)
        metrics_table.add_column("Metric", style="cyan")
        metrics_table.add_column("Value", style="green")
        metrics_table.add_row("Search Volume", f"{result.search_volume:,}")
        metrics_table.add_row("Keyword Difficulty", f"{result.keyword_difficulty:.1f}")
        metrics_table.add_row("CPC", f"${result.cpc:.2f}")
        console.print(metrics_table)
        console.print()

    # SERP Features
    feat = result.serp_features
    feat_table = Table(title="SERP Features", show_lines=True)
    feat_table.add_column("Feature", style="cyan")
    feat_table.add_column("Present", style="green")
    feat_table.add_row("Featured Snippet", _bool_icon(feat.featured_snippet))
    feat_table.add_row("People Also Ask", _bool_icon(feat.people_also_ask))
    feat_table.add_row("Local Pack", _bool_icon(feat.local_pack))
    feat_table.add_row("Knowledge Panel", _bool_icon(feat.knowledge_panel))
    feat_table.add_row("Video Carousel", _bool_icon(feat.video_carousel))
    feat_table.add_row("Image Pack", _bool_icon(feat.image_pack))
    feat_table.add_row("Site Links", _bool_icon(feat.site_links))
    feat_table.add_row("Shopping", _bool_icon(feat.shopping))
    feat_table.add_row("Ads (top)", str(feat.ads_top))
    feat_table.add_row("Ads (bottom)", str(feat.ads_bottom))
    console.print(feat_table)
    console.print()

    # Competitors
    if result.competitors:
        comp_table = Table(title="Top Competitors", show_lines=True)
        comp_table.add_column("#", style="bold")
        comp_table.add_column("Domain", style="cyan")
        comp_table.add_column("Type", style="magenta")
        comp_table.add_column("CTR Share", style="green")
        comp_table.add_column("Featured", style="yellow")
        for c in result.competitors[:10]:
            comp_table.add_row(
                str(c.position),
                c.domain,
                c.content_type,
                f"{c.estimated_traffic_share:.1%}",
                _bool_icon(c.is_featured),
            )
        console.print(comp_table)
        console.print()

    # Content Distribution
    if result.content_type_distribution:
        dist_table = Table(title="Content Type Distribution (Top 10)", show_lines=True)
        dist_table.add_column("Content Type", style="cyan")
        dist_table.add_column("Count", style="green")
        for ct, count in result.content_type_distribution.items():
            dist_table.add_row(ct, str(count))
        console.print(dist_table)
        console.print()

    # Summary
    opp_color = "green" if result.opportunity_score >= 60 else (
        "yellow" if result.opportunity_score >= 40 else "red"
    )
    console.print(f"Opportunity Score: [{opp_color}]{result.opportunity_score}/100[/{opp_color}]")
    console.print(f"Search Intent: [bold]{result.intent_signals}[/bold]")
    console.print(f"SERP Volatility: [bold]{result.volatility}[/bold]")
    console.rule()


def _bool_icon(val: bool) -> str:
    """Return Yes/No string for boolean values."""
    return "Yes" if val else "No"

# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Google SERP feature detection and competitor mapping",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python serp_analyzer.py --keyword "치과 임플란트" --country kr --json
  python serp_analyzer.py --keywords-file keywords.txt --country kr --output report.json
""",
    )
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument(
        "--keyword",
        type=str,
        help="Single keyword to analyze",
    )
    group.add_argument(
        "--keywords-file",
        type=str,
        help="Path to file with one keyword per line",
    )
    parser.add_argument(
        "--country",
        type=str,
        default="us",
        help="Country code for SERP (default: us)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="json_output",
        help="Output results as JSON",
    )
    parser.add_argument(
        "--output",
        type=str,
        help="Write JSON results to file",
    )
    return parser


def load_keywords(filepath: str) -> list[str]:
    """Load keywords from a text file, one per line."""
    path = Path(filepath)
    if not path.exists():
        logger.error(f"Keywords file not found: {filepath}")
        sys.exit(1)
    keywords = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            kw = line.strip()
            if kw and not kw.startswith("#"):
                keywords.append(kw)
    logger.info(f"Loaded {len(keywords)} keywords from {filepath}")
    return keywords


def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    analyzer = SerpAnalyzer()

    # Collect keywords
    if args.keyword:
        keywords = [args.keyword]
    else:
        keywords = load_keywords(args.keywords_file)

    if not keywords:
        logger.error("No keywords to analyze")
        sys.exit(1)

    results: list[dict[str, Any]] = []

    for kw in keywords:
        console.print(f"\n[bold]Analyzing:[/bold] {kw}")
        result = analyzer.analyze(kw, args.country)

        if args.json_output or args.output:
            results.append(result_to_dict(result))
        else:
            print_rich_report(result)

    # JSON output
    if args.json_output:
        output_data = results[0] if len(results) == 1 else results
        print(json.dumps(output_data, ensure_ascii=False, indent=2))

    if args.output:
        output_data = results[0] if len(results) == 1 else results
        output_path = Path(args.output)
        with open(output_path, "w", encoding="utf-8") as fh:
            json.dump(output_data, fh, ensure_ascii=False, indent=2)
        logger.info(f"Results written to {output_path}")


if __name__ == "__main__":
    main()
121
custom-skills/20-seo-serp-analysis/desktop/SKILL.md
Normal file
@@ -0,0 +1,121 @@
---
name: seo-serp-analysis
description: |
  SERP analysis for Google and Naver search results.
  Triggers: SERP analysis, search results, featured snippet, SERP features, Naver SERP, 검색결과 분석, SERP 분석.
---

# SEO SERP Analysis

## Purpose

Analyze search engine result page composition for Google and Naver. Detect SERP features (featured snippets, PAA, knowledge panels, local pack, video, ads), map competitor positions, score SERP feature opportunities, and analyze Naver section distribution.

## Core Capabilities

1. **Google SERP Feature Detection** - Identify featured snippets, PAA, knowledge panels, local pack, video carousel, ads, image pack, site links, shopping
2. **Competitor Position Mapping** - Extract domains, positions, and content types for top organic results
3. **Opportunity Scoring** - Score SERP opportunity (0-100) based on feature landscape and competition
4. **Search Intent Validation** - Infer intent (informational, navigational, commercial, transactional, local) from SERP composition
5. **Naver SERP Composition** - Detect sections (blog, cafe, knowledge iN, Smart Store, brand zone, VIEW tab), map section priority, analyze brand zone presence

## MCP Tool Usage

### Ahrefs for SERP Data
```
mcp__ahrefs__serp-overview: Get SERP results and features for a keyword
mcp__ahrefs__keywords-explorer-overview: Get keyword metrics, volume, difficulty, and SERP feature flags
mcp__ahrefs__site-explorer-organic-keywords: Map competitor keyword positions
```

### Notion for Report Storage
```
mcp__notion__notion-create-pages: Save analysis report to SEO Audit Log database
mcp__notion__notion-update-page: Update existing report entries
```

### Web Tools for Naver Analysis
```
WebSearch: Discover Naver search trends
WebFetch: Fetch Naver SERP HTML for section analysis
```

## Workflow

### 1. Google SERP Analysis
1. Fetch SERP data via `mcp__ahrefs__serp-overview` for the target keyword and country
2. Detect SERP features (featured snippet, PAA, local pack, knowledge panel, video, ads, images, shopping)
3. Map competitor positions from organic results (domain, URL, title, position)
4. Classify the content type of each result (blog, product, service, news, video)
5. Calculate an opportunity score (0-100) based on the feature landscape
6. Validate search intent from SERP composition
7. Assess SERP volatility
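
For quick triage outside the full script, the weighting above compresses into a few lines. This sketch is illustrative only; it covers the feature signals but not the domain-diversity and blog-share factors the companion script also scores:

```python
# Illustrative sketch of the documented weighting; not the full scorer.
def opportunity_score(features: dict) -> int:
    score = 50  # base score
    score += 15 if features.get("featured_snippet") else 0       # snippet capturable
    score += 10 if features.get("people_also_ask") else 0        # PAA opportunity
    score += 10 if not features.get("knowledge_panel") else -15  # panel takes real estate
    score += 10 if features.get("ads_top", 0) <= 1 else 0        # light ad load
    return max(0, min(100, score))

print(opportunity_score({"featured_snippet": True, "people_also_ask": True, "ads_top": 3}))  # 85
```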

### 2. Naver SERP Analysis
1. Fetch the Naver search page for the target keyword
2. Detect SERP sections (blog, cafe, knowledge iN, Smart Store, brand zone, VIEW tab, news, encyclopedia)
3. Map section priority (above-fold order)
4. Check brand zone presence and extract the brand name
5. Count items per section
6. Identify the dominant content section
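
Section detection can be sketched as substring checks against known section markers. The class names below are hypothetical placeholders; real Naver markup changes periodically, so any marker list needs regular review:

```python
# Hypothetical markers; verify against current Naver markup before relying on them.
SECTION_MARKERS = {
    "blog": ["sp_nblog"],
    "cafe": ["sp_ncafe"],
    "kin": ["sp_nkin"],          # Knowledge iN
    "brand_zone": ["brand_area"],
}

def detect_sections(html: str) -> list[str]:
    """Return section labels whose markers appear in the fetched SERP HTML."""
    return [
        section
        for section, markers in SECTION_MARKERS.items()
        if any(m in html for m in markers)
    ]

print(detect_sections('<div class="sp_nblog">...</div><div class="brand_area">...</div>'))  # ['blog', 'brand_zone']
```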

### 3. Report Generation
1. Compile results into structured JSON
2. Generate a Korean-language report
3. Save to the Notion SEO Audit Log database

## Output Format

```json
{
  "keyword": "치과 임플란트",
  "country": "kr",
  "serp_features": {
    "featured_snippet": true,
    "people_also_ask": true,
    "local_pack": true,
    "knowledge_panel": false,
    "video_carousel": false,
    "ads_top": 3,
    "ads_bottom": 2
  },
  "competitors": [
    {
      "position": 1,
      "url": "https://example.com/page",
      "domain": "example.com",
      "title": "...",
      "content_type": "service_page"
    }
  ],
  "opportunity_score": 72,
  "intent_signals": "commercial",
  "timestamp": "2025-01-01T00:00:00"
}
```

## Common SERP Features

| Feature | Impact | Opportunity |
|---------|--------|-------------|
| Featured Snippet | High visibility above organic | Optimize content format for snippet capture |
| People Also Ask | Related question visibility | Create FAQ content targeting PAA |
| Local Pack | Dominates local-intent SERPs | Optimize Google Business Profile |
| Knowledge Panel | Reduces organic CTR | Focus on brand queries and schema |
| Video Carousel | Visual SERP real estate | Create video content for the keyword |
| Shopping | Transactional intent signal | Product feed optimization |

## Limitations

- Ahrefs SERP data may have a delay (not real-time)
- Naver SERP HTML structure changes periodically
- Brand zone detection depends on HTML class patterns
- Cannot detect personalized SERP results

## Notion Output (Required)

All audit reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category, Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: SERP-YYYYMMDD-NNN
14
custom-skills/20-seo-serp-analysis/desktop/skill.yaml
Normal file
@@ -0,0 +1,14 @@
# Skill metadata (extracted from SKILL.md frontmatter)

name: seo-serp-analysis
description: |
  SERP analysis for Google and Naver. Triggers: SERP analysis, search results, featured snippet, SERP features, Naver SERP.

# Optional fields
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch

# triggers: []  # TODO: Extract from description
15
custom-skills/20-seo-serp-analysis/desktop/tools/ahrefs.md
Normal file
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
15
custom-skills/20-seo-serp-analysis/desktop/tools/notion.md
Normal file
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
148
custom-skills/21-seo-position-tracking/code/CLAUDE.md
Normal file
@@ -0,0 +1,148 @@
# CLAUDE.md

## Overview

Position tracking tool for monitoring keyword rankings via Ahrefs Rank Tracker. Monitors ranking positions, detects position changes with threshold alerts, calculates visibility scores weighted by search volume, compares against competitors, and segments by brand/non-brand keywords.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Track positions for a project
python scripts/position_tracker.py --target https://example.com --json

# Generate ranking report
python scripts/ranking_reporter.py --target https://example.com --period 30 --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `position_tracker.py` | Monitor keyword ranking positions and detect changes | Position data, change alerts, visibility scores |
| `ranking_reporter.py` | Generate ranking performance reports with trends | Trend analysis, segment reports, competitor comparison |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Position Tracker

```bash
# Get current positions
python scripts/position_tracker.py --target https://example.com --json

# With change-threshold alerts (flag positions that moved ±5 or more)
python scripts/position_tracker.py --target https://example.com --threshold 5 --json

# Filter by keyword segment
python scripts/position_tracker.py --target https://example.com --segment brand --json

# Compare with competitors
python scripts/position_tracker.py --target https://example.com --competitor https://comp1.com --json
```

**Capabilities**:
- Current ranking position retrieval via Ahrefs Rank Tracker
- Position change detection with configurable threshold alerts
- Visibility score calculation (weighted by search volume)
- Brand vs non-brand keyword segmentation
- Competitor rank comparison
- Keyword segment grouping (by intent, cluster, landing page)
|
||||||
|
|
||||||
|
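The volume-weighted visibility score listed above can be sketched standalone. This is a minimal sketch with an illustrative CTR curve; the shipped tracker builds a fuller position 1-100 table (`CTR_WEIGHTS`) and reads positions from Ahrefs.

```python
# Illustrative CTR curve (position -> expected organic CTR); values are
# assumptions for the sketch, not measured figures.
CTR = {1: 0.300, 2: 0.150, 3: 0.100, 10: 0.018}

def visibility(positions: list[tuple[int, int]]) -> float:
    """positions: (rank, monthly_volume) pairs; returns a 0-100 score."""
    total_volume = sum(volume for _, volume in positions)
    if total_volume == 0:
        return 0.0
    weighted = sum(volume * CTR.get(rank, 0.0005) for rank, volume in positions)
    # Normalize so that "everything at position 1" scores 100.
    return weighted / (total_volume * CTR[1]) * 100.0

print(round(visibility([(1, 1000), (10, 1000)]), 1))  # → 53.0
```

Because the score is normalized against the all-position-1 ceiling, it stays comparable across projects with very different keyword counts.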
## Ranking Reporter

```bash
# 30-day ranking report
python scripts/ranking_reporter.py --target https://example.com --period 30 --json

# Quarterly comparison
python scripts/ranking_reporter.py --target https://example.com --period 90 --json

# Export with competitor comparison
python scripts/ranking_reporter.py --target https://example.com --competitor https://comp1.com --period 30 --json
```

**Capabilities**:
- Period-over-period ranking trends (improved/declined/stable)
- Top movers (biggest position gains/losses)
- Visibility score trend over time
- Segment-level performance breakdown
- Competitor overlap and position comparison
- Average position by keyword group

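The improved/declined/stable breakdown and top-mover selection above amount to a diff over two position snapshots. A sketch with hypothetical keyword data (positive delta means the keyword moved up):

```python
# Hypothetical snapshots: keyword -> position at the start/end of the period.
old = {"implant cost": 8, "dental clinic": 15, "veneers": 4}
new = {"implant cost": 5, "dental clinic": 15, "veneers": 12}

deltas = {kw: old[kw] - new[kw] for kw in old}  # positive = improved
improved = [kw for kw, d in deltas.items() if d > 0]
declined = [kw for kw, d in deltas.items() if d < 0]
stable = [kw for kw, d in deltas.items() if d == 0]

# Top movers: largest absolute position change first.
top_movers = sorted(deltas.items(), key=lambda item: -abs(item[1]))[:2]

print(improved, declined, stable)  # → ['implant cost'] ['veneers'] ['dental clinic']
print(top_movers)                  # → [('veneers', -8), ('implant cost', 3)]
```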
## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `rank-tracker-overview` | Get rank tracking overview for project |
| `rank-tracker-competitors-overview` | Compare against competitors |
| `rank-tracker-competitors-pages` | Competitor page-level rankings |
| `rank-tracker-competitors-stats` | Competitor ranking statistics |
| `rank-tracker-serp-overview` | SERP details for tracked keywords |
| `management-projects` | List Ahrefs projects |
| `management-project-keywords` | Get tracked keywords for project |

## Output Format

```json
{
  "target": "https://example.com",
  "total_keywords": 250,
  "visibility_score": 68.5,
  "positions": {
    "top3": 15,
    "top10": 48,
    "top20": 92,
    "top50": 180,
    "top100": 230
  },
  "changes": {
    "improved": 45,
    "declined": 30,
    "stable": 155,
    "new": 12,
    "lost": 8
  },
  "alerts": [
    {
      "keyword": "치과 임플란트 가격",
      "old_position": 5,
      "new_position": 15,
      "change": -10,
      "volume": 5400
    }
  ],
  "segments": {
    "brand": {"keywords": 30, "avg_position": 2.1},
    "non_brand": {"keywords": 220, "avg_position": 24.5}
  },
  "timestamp": "2025-01-01T00:00:00"
}
```

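A downstream consumer can triage the `alerts` array of this output by drop magnitude and volume. The sketch below uses the same severity tiers as the tracker's `PositionAlert` (20/10/5); the sample alert data is hypothetical:

```python
import json

raw = """
{"alerts": [{"keyword": "implant price", "old_position": 5,
             "new_position": 15, "change": -10, "volume": 5400}]}
"""
report = json.loads(raw)

def severity(change: int) -> str:
    """Map a position delta to the tracker's alert tiers (20/10/5)."""
    magnitude = abs(change)
    if magnitude >= 20:
        return "critical"
    if magnitude >= 10:
        return "high"
    if magnitude >= 5:
        return "medium"
    return "low"

# Keep only ranking drops (negative change), largest volume first.
drops = sorted(
    (a for a in report["alerts"] if a["change"] < 0),
    key=lambda a: -a["volume"],
)
for alert in drops:
    print(alert["keyword"], severity(alert["change"]))  # → implant price high
```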
## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Tracked website URL |
| Category | Select | Position Tracking |
| Priority | Select | Based on visibility trend |
| Found Date | Date | Tracking date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: RANK-YYYYMMDD-NNN |

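The `RANK-YYYYMMDD-NNN` audit ID above can be built from the tracking date plus a per-day sequence number. A sketch; how the sequence counter is persisted between runs is left open here and is an implementation choice:

```python
from datetime import date

def audit_id(day: date, sequence: int) -> str:
    """Build an audit ID like RANK-20250101-001 (sequence is 1-based)."""
    return f"RANK-{day.strftime('%Y%m%d')}-{sequence:03d}"

print(audit_id(date(2025, 1, 1), 1))  # → RANK-20250101-001
```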
### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Visibility Score, SERP, Rank Tracker)
- URLs and code remain unchanged
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,786 @@
|
|||||||
|
"""
|
||||||
|
Position Tracker - Keyword Ranking Monitor via Ahrefs Rank Tracker
|
||||||
|
==================================================================
|
||||||
|
Purpose: Monitor keyword positions, detect changes, calculate visibility scores
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python position_tracker.py --target https://example.com --json
|
||||||
|
python position_tracker.py --target https://example.com --threshold 5 --json
|
||||||
|
python position_tracker.py --target https://example.com --segment brand --json
|
||||||
|
python position_tracker.py --target https://example.com --competitor https://comp1.com --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import math
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Optional
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CTR curve weights for visibility score (position 1-100)
|
||||||
|
# Based on industry-standard organic CTR curves
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
CTR_WEIGHTS: dict[int, float] = {
|
||||||
|
1: 0.300,
|
||||||
|
2: 0.150,
|
||||||
|
3: 0.100,
|
||||||
|
4: 0.070,
|
||||||
|
5: 0.050,
|
||||||
|
6: 0.038,
|
||||||
|
7: 0.030,
|
||||||
|
8: 0.025,
|
||||||
|
9: 0.020,
|
||||||
|
10: 0.018,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Positions 11-20 get diminishing CTR
|
||||||
|
for _p in range(11, 21):
|
||||||
|
CTR_WEIGHTS[_p] = round(0.015 - (_p - 11) * 0.001, 4)
|
||||||
|
|
||||||
|
# Positions 21-50 get minimal CTR
|
||||||
|
for _p in range(21, 51):
|
||||||
|
CTR_WEIGHTS[_p] = round(max(0.005 - (_p - 21) * 0.0001, 0.001), 4)
|
||||||
|
|
||||||
|
# Positions 51-100 get near-zero CTR
|
||||||
|
for _p in range(51, 101):
|
||||||
|
CTR_WEIGHTS[_p] = 0.0005
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Data classes
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
@dataclass
|
||||||
|
class KeywordPosition:
|
||||||
|
"""Single keyword ranking position."""
|
||||||
|
keyword: str
|
||||||
|
position: int
|
||||||
|
previous_position: Optional[int] = None
|
||||||
|
change: int = 0
|
||||||
|
volume: int = 0
|
||||||
|
url: str = ""
|
||||||
|
intent: str = "informational"
|
||||||
|
is_brand: bool = False
|
||||||
|
|
||||||
|
def __post_init__(self):
|
||||||
|
if self.previous_position is not None:
|
||||||
|
self.change = self.previous_position - self.position
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class VisibilityScore:
|
||||||
|
"""Weighted visibility score based on CTR curve."""
|
||||||
|
score: float = 0.0
|
||||||
|
top3: int = 0
|
||||||
|
top10: int = 0
|
||||||
|
top20: int = 0
|
||||||
|
top50: int = 0
|
||||||
|
top100: int = 0
|
||||||
|
total_keywords: int = 0
|
||||||
|
|
||||||
|
@property
|
||||||
|
def distribution(self) -> dict:
|
||||||
|
return {
|
||||||
|
"top3": self.top3,
|
||||||
|
"top10": self.top10,
|
||||||
|
"top20": self.top20,
|
||||||
|
"top50": self.top50,
|
||||||
|
"top100": self.top100,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class PositionAlert:
|
||||||
|
"""Alert for significant position change."""
|
||||||
|
keyword: str
|
||||||
|
old_position: int
|
||||||
|
new_position: int
|
||||||
|
change: int
|
||||||
|
volume: int = 0
|
||||||
|
severity: str = "medium"
|
||||||
|
|
||||||
|
def __post_init__(self):
|
||||||
|
abs_change = abs(self.change)
|
||||||
|
if abs_change >= 20:
|
||||||
|
self.severity = "critical"
|
||||||
|
elif abs_change >= 10:
|
||||||
|
self.severity = "high"
|
||||||
|
elif abs_change >= 5:
|
||||||
|
self.severity = "medium"
|
||||||
|
else:
|
||||||
|
self.severity = "low"
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CompetitorComparison:
|
||||||
|
"""Competitor ranking comparison result."""
|
||||||
|
competitor: str
|
||||||
|
overlap_keywords: int = 0
|
||||||
|
competitor_better: int = 0
|
||||||
|
target_better: int = 0
|
||||||
|
avg_position_gap: float = 0.0
|
||||||
|
top_gaps: list = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class SegmentData:
|
||||||
|
"""Keyword segment aggregation."""
|
||||||
|
name: str
|
||||||
|
keywords: int = 0
|
||||||
|
avg_position: float = 0.0
|
||||||
|
visibility: float = 0.0
|
||||||
|
improved: int = 0
|
||||||
|
declined: int = 0
|
||||||
|
stable: int = 0
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class TrackingResult:
|
||||||
|
"""Complete position tracking result."""
|
||||||
|
target: str
|
||||||
|
total_keywords: int = 0
|
||||||
|
visibility_score: float = 0.0
|
||||||
|
visibility: Optional[VisibilityScore] = None
|
||||||
|
positions: list[KeywordPosition] = field(default_factory=list)
|
||||||
|
changes: dict = field(default_factory=lambda: {
|
||||||
|
"improved": 0, "declined": 0, "stable": 0, "new": 0, "lost": 0,
|
||||||
|
})
|
||||||
|
alerts: list[PositionAlert] = field(default_factory=list)
|
||||||
|
segments: dict[str, SegmentData] = field(default_factory=dict)
|
||||||
|
competitors: list[CompetitorComparison] = field(default_factory=list)
|
||||||
|
timestamp: str = ""
|
||||||
|
|
||||||
|
def __post_init__(self):
|
||||||
|
if not self.timestamp:
|
||||||
|
self.timestamp = datetime.now().isoformat()
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
"""Convert to JSON-serializable dictionary."""
|
||||||
|
result = {
|
||||||
|
"target": self.target,
|
||||||
|
"total_keywords": self.total_keywords,
|
||||||
|
"visibility_score": round(self.visibility_score, 2),
|
||||||
|
"positions": self.visibility.distribution if self.visibility else {},
|
||||||
|
"changes": self.changes,
|
||||||
|
"alerts": [asdict(a) for a in self.alerts],
|
||||||
|
"segments": {
|
||||||
|
k: asdict(v) for k, v in self.segments.items()
|
||||||
|
},
|
||||||
|
"competitors": [asdict(c) for c in self.competitors],
|
||||||
|
"keyword_details": [asdict(p) for p in self.positions],
|
||||||
|
"timestamp": self.timestamp,
|
||||||
|
}
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Position Tracker
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
class PositionTracker(BaseAsyncClient):
|
||||||
|
"""Track keyword ranking positions via Ahrefs Rank Tracker."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
super().__init__(
|
||||||
|
max_concurrent=5,
|
||||||
|
requests_per_second=2.0,
|
||||||
|
logger=logger,
|
||||||
|
)
|
||||||
|
self.brand_terms: list[str] = []
|
||||||
|
|
||||||
|
def _extract_domain_brand(self, target: str) -> list[str]:
|
||||||
|
"""Extract brand terms from the target domain name."""
|
||||||
|
parsed = urlparse(target)
|
||||||
|
hostname = parsed.hostname or target
|
||||||
|
# Remove TLD and www prefix
|
||||||
|
parts = hostname.replace("www.", "").split(".")
|
||||||
|
brand_parts = []
|
||||||
|
for part in parts:
|
||||||
|
if part not in ("com", "co", "kr", "net", "org", "io", "ai", "www"):
|
||||||
|
brand_parts.append(part.lower())
|
||||||
|
# Also split camelCase and hyphenated forms
|
||||||
|
if "-" in part:
|
||||||
|
brand_parts.extend(part.lower().split("-"))
|
||||||
|
return list(set(brand_parts))
|
||||||
|
|
||||||
|
async def get_project_keywords(self, target: str) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Fetch tracked keywords from Ahrefs management-project-keywords.
|
||||||
|
|
||||||
|
Uses Ahrefs MCP tool: management-project-keywords
|
||||||
|
Returns list of keyword dicts with keyword, volume, intent info.
|
||||||
|
"""
|
||||||
|
logger.info(f"Fetching project keywords for: {target}")
|
||||||
|
|
||||||
|
# Step 1: Get project list to find matching project
|
||||||
|
projects = await self._call_ahrefs_projects(target)
|
||||||
|
if not projects:
|
||||||
|
logger.warning(f"No Ahrefs project found for {target}. Using rank-tracker-overview directly.")
|
||||||
|
return []
|
||||||
|
|
||||||
|
project_id = projects[0].get("id", "")
|
||||||
|
|
||||||
|
# Step 2: Fetch keywords for the project
|
||||||
|
keywords_data = await self._call_ahrefs_project_keywords(project_id)
|
||||||
|
return keywords_data
|
||||||
|
|
||||||
|
async def _call_ahrefs_projects(self, target: str) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Call Ahrefs management-projects MCP tool.
|
||||||
|
In production, this calls the MCP tool. For standalone, reads from config/cache.
|
||||||
|
"""
|
||||||
|
# Simulated MCP call structure - in production this calls:
|
||||||
|
# mcp__ahrefs__management-projects
|
||||||
|
logger.info("Calling Ahrefs management-projects...")
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
result = subprocess.run(
|
||||||
|
["mcp-cli", "call", "ahrefs/management-projects", json.dumps({})],
|
||||||
|
capture_output=True, text=True, timeout=30,
|
||||||
|
)
|
||||||
|
if result.returncode == 0:
|
||||||
|
return json.loads(result.stdout).get("projects", [])
|
||||||
|
except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
|
||||||
|
pass
|
||||||
|
# Return empty if MCP not available - caller handles gracefully
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def _call_ahrefs_project_keywords(self, project_id: str) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Call Ahrefs management-project-keywords MCP tool.
|
||||||
|
"""
|
||||||
|
logger.info(f"Calling Ahrefs management-project-keywords for project: {project_id}")
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
result = subprocess.run(
|
||||||
|
["mcp-cli", "call", "ahrefs/management-project-keywords",
|
||||||
|
json.dumps({"project_id": project_id})],
|
||||||
|
capture_output=True, text=True, timeout=30,
|
||||||
|
)
|
||||||
|
if result.returncode == 0:
|
||||||
|
return json.loads(result.stdout).get("keywords", [])
|
||||||
|
except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
|
||||||
|
pass
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def get_current_positions(self, target: str) -> list[KeywordPosition]:
|
||||||
|
"""
|
||||||
|
Fetch current keyword positions via Ahrefs rank-tracker-overview.
|
||||||
|
|
||||||
|
Returns list of KeywordPosition objects with current and previous positions.
|
||||||
|
"""
|
||||||
|
logger.info(f"Fetching current positions for: {target}")
|
||||||
|
self.brand_terms = self._extract_domain_brand(target)
|
||||||
|
|
||||||
|
raw_data = await self._call_rank_tracker_overview(target)
|
||||||
|
positions: list[KeywordPosition] = []
|
||||||
|
|
||||||
|
for item in raw_data:
|
||||||
|
keyword = item.get("keyword", "")
|
||||||
|
current_pos = item.get("position", 0)
|
||||||
|
prev_pos = item.get("previous_position")
|
||||||
|
volume = item.get("volume", 0)
|
||||||
|
url = item.get("url", "")
|
||||||
|
intent = item.get("intent", "informational")
|
||||||
|
|
||||||
|
# Determine if brand keyword
|
||||||
|
is_brand = self._is_brand_keyword(keyword)
|
||||||
|
|
||||||
|
kp = KeywordPosition(
|
||||||
|
keyword=keyword,
|
||||||
|
position=current_pos,
|
||||||
|
previous_position=prev_pos,
|
||||||
|
volume=volume,
|
||||||
|
url=url,
|
||||||
|
intent=intent,
|
||||||
|
is_brand=is_brand,
|
||||||
|
)
|
||||||
|
positions.append(kp)
|
||||||
|
|
||||||
|
logger.info(f"Retrieved {len(positions)} keyword positions")
|
||||||
|
return positions
|
||||||
|
|
||||||
|
async def _call_rank_tracker_overview(self, target: str) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Call Ahrefs rank-tracker-overview MCP tool.
|
||||||
|
"""
|
||||||
|
logger.info(f"Calling Ahrefs rank-tracker-overview for: {target}")
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
result = subprocess.run(
|
||||||
|
["mcp-cli", "call", "ahrefs/rank-tracker-overview",
|
||||||
|
json.dumps({"target": target})],
|
||||||
|
capture_output=True, text=True, timeout=60,
|
||||||
|
)
|
||||||
|
if result.returncode == 0:
|
||||||
|
data = json.loads(result.stdout)
|
||||||
|
return data.get("keywords", data.get("results", []))
|
||||||
|
except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
|
||||||
|
pass
|
||||||
|
return []
|
||||||
|
|
||||||
|
def _is_brand_keyword(self, keyword: str) -> bool:
|
||||||
|
"""Check if a keyword is brand-related based on domain name."""
|
||||||
|
keyword_lower = keyword.lower()
|
||||||
|
for term in self.brand_terms:
|
||||||
|
if term in keyword_lower:
|
||||||
|
return True
|
||||||
|
return False
|
||||||
|
|
||||||
|
def detect_changes(
|
||||||
|
self,
|
||||||
|
positions: list[KeywordPosition],
|
||||||
|
threshold: int = 3,
|
||||||
|
) -> tuple[dict, list[PositionAlert]]:
|
||||||
|
"""
|
||||||
|
Detect significant position changes and generate alerts.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
positions: List of current keyword positions with previous data
|
||||||
|
threshold: Minimum position change to trigger an alert
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (change_summary_dict, list_of_alerts)
|
||||||
|
"""
|
||||||
|
changes = {
|
||||||
|
"improved": 0,
|
||||||
|
"declined": 0,
|
||||||
|
"stable": 0,
|
||||||
|
"new": 0,
|
||||||
|
"lost": 0,
|
||||||
|
}
|
||||||
|
alerts: list[PositionAlert] = []
|
||||||
|
|
||||||
|
for kp in positions:
|
||||||
|
if kp.previous_position is None:
|
||||||
|
changes["new"] += 1
|
||||||
|
continue
|
||||||
|
|
||||||
|
if kp.position == 0 and kp.previous_position > 0:
|
||||||
|
changes["lost"] += 1
|
||||||
|
alert = PositionAlert(
|
||||||
|
keyword=kp.keyword,
|
||||||
|
old_position=kp.previous_position,
|
||||||
|
new_position=0,
|
||||||
|
change=-kp.previous_position,
|
||||||
|
volume=kp.volume,
|
||||||
|
)
|
||||||
|
alerts.append(alert)
|
||||||
|
continue
|
||||||
|
|
||||||
|
change = kp.change # positive = improved, negative = declined
|
||||||
|
if change > 0:
|
||||||
|
changes["improved"] += 1
|
||||||
|
elif change < 0:
|
||||||
|
changes["declined"] += 1
|
||||||
|
else:
|
||||||
|
changes["stable"] += 1
|
||||||
|
|
||||||
|
# Generate alert if change exceeds threshold
|
||||||
|
if abs(change) >= threshold:
|
||||||
|
alert = PositionAlert(
|
||||||
|
keyword=kp.keyword,
|
||||||
|
old_position=kp.previous_position,
|
||||||
|
new_position=kp.position,
|
||||||
|
change=change,
|
||||||
|
volume=kp.volume,
|
||||||
|
)
|
||||||
|
alerts.append(alert)
|
||||||
|
|
||||||
|
# Sort alerts by severity (critical first) then by volume (high first)
|
||||||
|
severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
|
||||||
|
alerts.sort(key=lambda a: (severity_order.get(a.severity, 4), -a.volume))
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
f"Changes detected - improved: {changes['improved']}, "
|
||||||
|
f"declined: {changes['declined']}, stable: {changes['stable']}, "
|
||||||
|
f"new: {changes['new']}, lost: {changes['lost']}"
|
||||||
|
)
|
||||||
|
logger.info(f"Alerts generated: {len(alerts)} (threshold: {threshold})")
|
||||||
|
|
||||||
|
return changes, alerts
|
||||||
|
|
||||||
|
def calculate_visibility(self, positions: list[KeywordPosition]) -> VisibilityScore:
|
||||||
|
"""
|
||||||
|
Calculate weighted visibility score based on CTR curve.
|
||||||
|
|
||||||
|
Visibility = sum(keyword_volume * ctr_weight_for_position) / sum(keyword_volume)
|
||||||
|
Score normalized to 0-100 scale.
|
||||||
|
"""
|
||||||
|
vis = VisibilityScore()
|
||||||
|
total_weighted = 0.0
|
||||||
|
total_volume = 0
|
||||||
|
|
||||||
|
for kp in positions:
|
||||||
|
if kp.position <= 0 or kp.position > 100:
|
||||||
|
continue
|
||||||
|
|
||||||
|
vis.total_keywords += 1
|
||||||
|
volume = max(kp.volume, 1) # Avoid zero volume
|
||||||
|
total_volume += volume
|
||||||
|
|
||||||
|
# Position bucket counting
|
||||||
|
if kp.position <= 3:
|
||||||
|
vis.top3 += 1
|
||||||
|
if kp.position <= 10:
|
||||||
|
vis.top10 += 1
|
||||||
|
if kp.position <= 20:
|
||||||
|
vis.top20 += 1
|
||||||
|
if kp.position <= 50:
|
||||||
|
vis.top50 += 1
|
||||||
|
if kp.position <= 100:
|
||||||
|
vis.top100 += 1
|
||||||
|
|
||||||
|
# Weighted visibility
|
||||||
|
ctr = CTR_WEIGHTS.get(kp.position, 0.0005)
|
||||||
|
total_weighted += volume * ctr
|
||||||
|
|
||||||
|
if total_volume > 0:
|
||||||
|
# Normalize: max possible is if all keywords were position 1
|
||||||
|
max_possible = total_volume * CTR_WEIGHTS[1]
|
||||||
|
vis.score = (total_weighted / max_possible) * 100.0
|
||||||
|
else:
|
||||||
|
vis.score = 0.0
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
f"Visibility score: {vis.score:.2f} | "
|
||||||
|
f"Top3: {vis.top3}, Top10: {vis.top10}, Top20: {vis.top20}"
|
||||||
|
)
|
||||||
|
|
||||||
|
return vis
|
||||||
|
|
||||||
|
def segment_keywords(
|
||||||
|
self,
|
||||||
|
positions: list[KeywordPosition],
|
||||||
|
filter_segment: Optional[str] = None,
|
||||||
|
) -> dict[str, SegmentData]:
|
||||||
|
"""
|
||||||
|
Segment keywords into brand/non-brand and by intent type.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
positions: List of keyword positions
|
||||||
|
filter_segment: Optional filter - 'brand', 'non_brand', or intent type
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dictionary of segment name to SegmentData
|
||||||
|
"""
|
||||||
|
segments: dict[str, list[KeywordPosition]] = {
|
||||||
|
"brand": [],
|
||||||
|
"non_brand": [],
|
||||||
|
}
|
||||||
|
intent_segments: dict[str, list[KeywordPosition]] = {}
|
||||||
|
|
||||||
|
for kp in positions:
|
||||||
|
# Brand segmentation
|
||||||
|
if kp.is_brand:
|
||||||
|
segments["brand"].append(kp)
|
||||||
|
else:
|
||||||
|
segments["non_brand"].append(kp)
|
||||||
|
|
||||||
|
# Intent segmentation
|
||||||
|
intent_key = kp.intent.lower() if kp.intent else "informational"
|
||||||
|
if intent_key not in intent_segments:
|
||||||
|
intent_segments[intent_key] = []
|
||||||
|
intent_segments[intent_key].append(kp)
|
||||||
|
|
||||||
|
# Merge intent segments into main segments
|
||||||
|
for intent_key, kps in intent_segments.items():
|
||||||
|
segments[f"intent_{intent_key}"] = kps
|
||||||
|
|
||||||
|
# Calculate segment stats
|
||||||
|
result: dict[str, SegmentData] = {}
|
||||||
|
for seg_name, kps in segments.items():
|
||||||
|
if filter_segment and seg_name != filter_segment:
|
||||||
|
continue
|
||||||
|
|
||||||
|
if not kps:
|
||||||
|
continue
|
||||||
|
|
||||||
|
active_positions = [kp for kp in kps if kp.position > 0]
|
||||||
|
avg_pos = (
|
||||||
|
sum(kp.position for kp in active_positions) / len(active_positions)
|
||||||
|
if active_positions else 0.0
|
||||||
|
)
|
||||||
|
|
||||||
|
vis = self.calculate_visibility(kps)
|
||||||
|
|
||||||
|
improved = sum(1 for kp in kps if kp.change > 0)
|
||||||
|
declined = sum(1 for kp in kps if kp.change < 0)
|
||||||
|
stable = sum(1 for kp in kps if kp.change == 0 and kp.previous_position is not None)
|
||||||
|
|
||||||
|
result[seg_name] = SegmentData(
|
||||||
|
name=seg_name,
|
||||||
|
keywords=len(kps),
|
||||||
|
avg_position=round(avg_pos, 1),
|
||||||
|
visibility=round(vis.score, 2),
|
||||||
|
improved=improved,
|
||||||
|
declined=declined,
|
||||||
|
stable=stable,
|
||||||
|
)
|
||||||
|
|
||||||
|
return result
|
||||||
|
|
||||||
|
    async def compare_competitors(
        self,
        target: str,
        competitors: list[str],
    ) -> list[CompetitorComparison]:
        """
        Compare ranking positions against competitors.

        Uses Ahrefs rank-tracker-competitors-overview MCP tool.
        """
        comparisons: list[CompetitorComparison] = []

        for competitor in competitors:
            logger.info(f"Comparing with competitor: {competitor}")
            comp_data = await self._call_competitors_overview(target, competitor)

            comparison = CompetitorComparison(competitor=competitor)

            if comp_data:
                comparison.overlap_keywords = comp_data.get("overlap_keywords", 0)
                comparison.competitor_better = comp_data.get("competitor_better", 0)
                comparison.target_better = comp_data.get("target_better", 0)
                comparison.avg_position_gap = comp_data.get("avg_position_gap", 0.0)

                # Extract top gaps (keywords where competitor outranks us most)
                top_gaps = comp_data.get("top_gaps", [])
                comparison.top_gaps = top_gaps[:10]

            comparisons.append(comparison)

        return comparisons

    async def _call_competitors_overview(self, target: str, competitor: str) -> dict:
        """
        Call Ahrefs rank-tracker-competitors-overview MCP tool.
        """
        logger.info("Calling Ahrefs rank-tracker-competitors-overview...")
        try:
            import subprocess
            result = subprocess.run(
                ["mcp-cli", "call", "ahrefs/rank-tracker-competitors-overview",
                 json.dumps({"target": target, "competitor": competitor})],
                capture_output=True, text=True, timeout=60,
            )
            if result.returncode == 0:
                return json.loads(result.stdout)
        except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
            pass
        return {}

    async def analyze(
        self,
        target: str,
        threshold: int = 3,
        competitors: Optional[list[str]] = None,
        segment_filter: Optional[str] = None,
    ) -> TrackingResult:
        """
        Orchestrate full position tracking analysis.

        Args:
            target: Target website URL
            threshold: Position change threshold for alerts
            competitors: List of competitor URLs to compare
            segment_filter: Optional segment filter (brand, non_brand, intent_*)

        Returns:
            Complete TrackingResult with all analysis data
        """
        logger.info(f"Starting position tracking analysis for: {target}")
        logger.info(f"Threshold: {threshold}, Competitors: {competitors or 'none'}")

        result = TrackingResult(target=target)

        # Step 1: Fetch current positions
        positions = await self.get_current_positions(target)

        if not positions:
            logger.warning("No position data retrieved. Check Ahrefs project configuration.")
            return result

        result.positions = positions
        result.total_keywords = len(positions)

        # Step 2: Detect changes and generate alerts
        changes, alerts = self.detect_changes(positions, threshold)
        result.changes = changes
        result.alerts = alerts

        # Step 3: Calculate visibility score
        visibility = self.calculate_visibility(positions)
        result.visibility = visibility
        result.visibility_score = visibility.score

        # Step 4: Segment keywords
        segments = self.segment_keywords(positions, segment_filter)
        result.segments = segments

        # Step 5: Compare with competitors (if provided)
        if competitors:
            comp_results = await self.compare_competitors(target, competitors)
            result.competitors = comp_results

        logger.info(f"Analysis complete. Total keywords: {result.total_keywords}")
        logger.info(f"Visibility score: {result.visibility_score:.2f}")

        return result


# ---------------------------------------------------------------------------
# Output formatters
# ---------------------------------------------------------------------------
def format_text_report(result: TrackingResult) -> str:
    """Format tracking result as human-readable text report."""
    lines = []
    lines.append("=" * 60)
    lines.append(f"Position Tracking Report: {result.target}")
    lines.append(f"Timestamp: {result.timestamp}")
    lines.append("=" * 60)

    # Visibility overview
    lines.append(f"\nVisibility Score: {result.visibility_score:.2f}/100")
    lines.append(f"Total Keywords Tracked: {result.total_keywords}")

    if result.visibility:
        vis = result.visibility
        lines.append("\nPosition Distribution:")
        lines.append(f"  Top 3: {vis.top3}")
        lines.append(f"  Top 10: {vis.top10}")
        lines.append(f"  Top 20: {vis.top20}")
        lines.append(f"  Top 50: {vis.top50}")
        lines.append(f"  Top 100: {vis.top100}")

    # Changes summary
    ch = result.changes
    lines.append("\nPosition Changes:")
    lines.append(f"  Improved: {ch.get('improved', 0)}")
    lines.append(f"  Declined: {ch.get('declined', 0)}")
    lines.append(f"  Stable: {ch.get('stable', 0)}")
    lines.append(f"  New: {ch.get('new', 0)}")
    lines.append(f"  Lost: {ch.get('lost', 0)}")

    # Alerts
    if result.alerts:
        lines.append(f"\nAlerts ({len(result.alerts)}):")
        lines.append("-" * 60)
        for alert in result.alerts[:20]:
            direction = "UP" if alert.change > 0 else "DOWN"
            lines.append(
                f"  [{alert.severity.upper()}] {alert.keyword}: "
                f"{alert.old_position} -> {alert.new_position} "
                f"({direction} {abs(alert.change)}) | Vol: {alert.volume}"
            )

    # Segments
    if result.segments:
        lines.append("\nSegments:")
        lines.append("-" * 60)
        for name, seg in result.segments.items():
            lines.append(
                f"  {name}: {seg.keywords} keywords, "
                f"avg pos {seg.avg_position}, "
                f"vis {seg.visibility}"
            )

    # Competitors
    if result.competitors:
        lines.append("\nCompetitor Comparison:")
        lines.append("-" * 60)
        for comp in result.competitors:
            lines.append(f"  vs {comp.competitor}:")
            lines.append(f"    Overlap: {comp.overlap_keywords} keywords")
            lines.append(f"    We win: {comp.target_better}")
            lines.append(f"    They win: {comp.competitor_better}")
            lines.append(f"    Avg gap: {comp.avg_position_gap:.1f}")

    lines.append("\n" + "=" * 60)
    return "\n".join(lines)


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Position Tracker - Monitor keyword rankings via Ahrefs Rank Tracker",
    )
    parser.add_argument(
        "--target",
        required=True,
        help="Target website URL (e.g., https://example.com)",
    )
    parser.add_argument(
        "--threshold",
        type=int,
        default=3,
        help="Position change threshold for alerts (default: 3)",
    )
    parser.add_argument(
        "--segment",
        choices=["brand", "non_brand", "intent_informational",
                 "intent_commercial", "intent_transactional", "intent_navigational"],
        default=None,
        help="Filter results by keyword segment",
    )
    parser.add_argument(
        "--competitor",
        action="append",
        dest="competitors",
        default=[],
        help="Competitor URL to compare (repeatable)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="json_output",
        help="Output in JSON format",
    )
    parser.add_argument(
        "--output",
        type=str,
        default=None,
        help="Save output to file path",
    )
    return parser.parse_args()


async def main():
    args = parse_args()

    tracker = PositionTracker()

    result = await tracker.analyze(
        target=args.target,
        threshold=args.threshold,
        competitors=args.competitors,
        segment_filter=args.segment,
    )

    if args.json_output:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
    else:
        output = format_text_report(result)

    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(output)
        logger.info(f"Output saved to: {args.output}")
    else:
        print(output)

    tracker.print_stats()


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,728 @@
"""
|
||||||
|
Ranking Reporter - Ranking Performance Reports with Trends
|
||||||
|
==========================================================
|
||||||
|
Purpose: Generate ranking reports with trend analysis, top movers, and competitor comparison
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python ranking_reporter.py --target https://example.com --period 30 --json
|
||||||
|
python ranking_reporter.py --target https://example.com --period 90 --json
|
||||||
|
python ranking_reporter.py --target https://example.com --competitor https://comp1.com --period 30 --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
from typing import Optional
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# CTR weights for impact scoring (same as position_tracker)
|
||||||
|
CTR_WEIGHTS: dict[int, float] = {
|
||||||
|
1: 0.300, 2: 0.150, 3: 0.100, 4: 0.070, 5: 0.050,
|
||||||
|
6: 0.038, 7: 0.030, 8: 0.025, 9: 0.020, 10: 0.018,
|
||||||
|
}
|
||||||
|
for _p in range(11, 21):
|
||||||
|
CTR_WEIGHTS[_p] = round(0.015 - (_p - 11) * 0.001, 4)
|
||||||
|
for _p in range(21, 51):
|
||||||
|
CTR_WEIGHTS[_p] = round(max(0.005 - (_p - 21) * 0.0001, 0.001), 4)
|
||||||
|
for _p in range(51, 101):
|
||||||
|
CTR_WEIGHTS[_p] = 0.0005
|
||||||
|
|
||||||
|
|
||||||
|
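# Illustrative values of the piecewise curve built above: CTR_WEIGHTS[11] = 0.015
# falling to CTR_WEIGHTS[20] = 0.006, then CTR_WEIGHTS[21] = 0.005 tapering to
# 0.0021 at position 50, with a flat 0.0005 tail for positions 51-100.
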
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------
@dataclass
class PositionSnapshot:
    """A single position measurement at a point in time."""
    date: str
    position: int
    volume: int = 0
    url: str = ""


@dataclass
class RankingTrend:
    """Keyword ranking trend over time."""
    keyword: str
    positions_over_time: list[PositionSnapshot] = field(default_factory=list)
    trend_direction: str = "stable"  # improved, declined, stable, new, lost
    avg_position: float = 0.0
    current_position: int = 0
    start_position: int = 0
    total_change: int = 0
    volume: int = 0
    intent: str = "informational"
    is_brand: bool = False

    def compute_trend(self):
        """Compute trend direction and average from position history."""
        if not self.positions_over_time:
            self.trend_direction = "stable"
            return

        positions = [s.position for s in self.positions_over_time if s.position > 0]
        if not positions:
            self.trend_direction = "lost"
            return

        self.avg_position = sum(positions) / len(positions)
        self.current_position = positions[-1]
        self.start_position = positions[0]
        self.total_change = self.start_position - self.current_position

        # Determine trend using linear regression direction
        if len(positions) >= 2:
            n = len(positions)
            x_mean = (n - 1) / 2.0
            y_mean = sum(positions) / n
            numerator = sum((i - x_mean) * (p - y_mean) for i, p in enumerate(positions))
            denominator = sum((i - x_mean) ** 2 for i in range(n))

            if denominator > 0:
                slope = numerator / denominator
                # Negative slope means position number decreasing = improving
                if slope < -0.5:
                    self.trend_direction = "improved"
                elif slope > 0.5:
                    self.trend_direction = "declined"
                else:
                    self.trend_direction = "stable"
        else:
            self.trend_direction = "stable"

        if self.volume == 0 and self.positions_over_time:
            self.volume = self.positions_over_time[-1].volume


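# Worked example for RankingTrend.compute_trend above: snapshots at positions
# [12, 10, 9, 7] give a least-squares slope of -8.0 / 5.0 = -1.6, which is
# below -0.5, so trend_direction becomes "improved", with
# total_change = 12 - 7 = +5 and avg_position = 9.5.
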
@dataclass
class TopMover:
    """Keyword with significant position change."""
    keyword: str
    position_change: int
    current_position: int = 0
    previous_position: int = 0
    volume: int = 0
    impact_score: float = 0.0
    direction: str = "improved"

    def calculate_impact(self):
        """Calculate impact score: volume * CTR delta."""
        old_ctr = CTR_WEIGHTS.get(self.previous_position, 0.0005) if self.previous_position > 0 else 0.0
        new_ctr = CTR_WEIGHTS.get(self.current_position, 0.0005) if self.current_position > 0 else 0.0
        ctr_delta = abs(new_ctr - old_ctr)
        self.impact_score = round(self.volume * ctr_delta, 2)
        self.direction = "improved" if self.position_change > 0 else "declined"


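# Worked example for TopMover.calculate_impact above: a 1,000-volume keyword
# moving from position 8 (CTR 0.025) to position 3 (CTR 0.100) has
# ctr_delta = 0.075, so impact_score = 1000 * 0.075 = 75.0 and
# direction = "improved" (position_change = 8 - 3 = +5).
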
@dataclass
class SegmentReport:
    """Performance breakdown for a keyword segment."""
    segment_name: str
    total_keywords: int = 0
    avg_position: float = 0.0
    avg_position_change: float = 0.0
    visibility_score: float = 0.0
    improved_count: int = 0
    declined_count: int = 0
    stable_count: int = 0
    top_gainers: list[TopMover] = field(default_factory=list)
    top_losers: list[TopMover] = field(default_factory=list)


@dataclass
class CompetitorReport:
    """Competitor comparison for a reporting period."""
    competitor: str
    our_visibility: float = 0.0
    their_visibility: float = 0.0
    overlap_keywords: int = 0
    keywords_we_lead: int = 0
    keywords_they_lead: int = 0
    notable_gaps: list[dict] = field(default_factory=list)


@dataclass
class RankingReport:
    """Complete ranking performance report."""
    target: str
    period_days: int = 30
    period_start: str = ""
    period_end: str = ""
    total_keywords: int = 0
    current_visibility: float = 0.0
    previous_visibility: float = 0.0
    visibility_change: float = 0.0
    trend_summary: dict = field(default_factory=lambda: {
        "improved": 0, "declined": 0, "stable": 0, "new": 0, "lost": 0,
    })
    top_gainers: list[TopMover] = field(default_factory=list)
    top_losers: list[TopMover] = field(default_factory=list)
    segments: list[SegmentReport] = field(default_factory=list)
    competitors: list[CompetitorReport] = field(default_factory=list)
    keyword_trends: list[RankingTrend] = field(default_factory=list)
    timestamp: str = ""

    def __post_init__(self):
        if not self.timestamp:
            self.timestamp = datetime.now().isoformat()
        if not self.period_end:
            self.period_end = datetime.now().strftime("%Y-%m-%d")
        if not self.period_start:
            start = datetime.now() - timedelta(days=self.period_days)
            self.period_start = start.strftime("%Y-%m-%d")

    def to_dict(self) -> dict:
        """Convert to JSON-serializable dictionary."""
        return {
            "target": self.target,
            "period": {
                "days": self.period_days,
                "start": self.period_start,
                "end": self.period_end,
            },
            "total_keywords": self.total_keywords,
            "visibility": {
                "current": round(self.current_visibility, 2),
                "previous": round(self.previous_visibility, 2),
                "change": round(self.visibility_change, 2),
            },
            "trend_summary": self.trend_summary,
            "top_gainers": [asdict(m) for m in self.top_gainers],
            "top_losers": [asdict(m) for m in self.top_losers],
            "segments": [asdict(s) for s in self.segments],
            "competitors": [asdict(c) for c in self.competitors],
            "keyword_trends": [
                {
                    "keyword": t.keyword,
                    "trend_direction": t.trend_direction,
                    "avg_position": round(t.avg_position, 1),
                    "current_position": t.current_position,
                    "start_position": t.start_position,
                    "total_change": t.total_change,
                    "volume": t.volume,
                }
                for t in self.keyword_trends
            ],
            "timestamp": self.timestamp,
        }


# ---------------------------------------------------------------------------
# Ranking Reporter
# ---------------------------------------------------------------------------
class RankingReporter(BaseAsyncClient):
    """Generate ranking performance reports with trend analysis."""

    def __init__(self):
        super().__init__(
            max_concurrent=5,
            requests_per_second=2.0,
            logger=logger,
        )

    def _extract_domain_brand(self, target: str) -> list[str]:
        """Extract brand terms from the target domain name."""
        parsed = urlparse(target)
        hostname = parsed.hostname or target
        parts = hostname.replace("www.", "").split(".")
        brand_parts = []
        for part in parts:
            if part not in ("com", "co", "kr", "net", "org", "io", "ai", "www"):
                brand_parts.append(part.lower())
                if "-" in part:
                    brand_parts.extend(part.lower().split("-"))
        return list(set(brand_parts))

    async def get_historical_positions(
        self,
        target: str,
        period_days: int = 30,
    ) -> list[RankingTrend]:
        """
        Fetch historical position data from Ahrefs rank-tracker-overview
        with date range parameters.

        Returns list of RankingTrend objects with position snapshots over time.
        """
        logger.info(f"Fetching historical positions for {target} ({period_days} days)")
        brand_terms = self._extract_domain_brand(target)

        end_date = datetime.now().strftime("%Y-%m-%d")
        start_date = (datetime.now() - timedelta(days=period_days)).strftime("%Y-%m-%d")

        raw_data = await self._call_rank_tracker_historical(target, start_date, end_date)

        trends: dict[str, RankingTrend] = {}
        for item in raw_data:
            keyword = item.get("keyword", "")
            if keyword not in trends:
                is_brand = any(term in keyword.lower() for term in brand_terms)
                trends[keyword] = RankingTrend(
                    keyword=keyword,
                    volume=item.get("volume", 0),
                    intent=item.get("intent", "informational"),
                    is_brand=is_brand,
                )

            snapshot = PositionSnapshot(
                date=item.get("date", end_date),
                position=item.get("position", 0),
                volume=item.get("volume", 0),
                url=item.get("url", ""),
            )
            trends[keyword].positions_over_time.append(snapshot)

        # Sort snapshots by date and compute trends
        for trend in trends.values():
            trend.positions_over_time.sort(key=lambda s: s.date)
            trend.compute_trend()

        logger.info(f"Retrieved trends for {len(trends)} keywords")
        return list(trends.values())

    async def _call_rank_tracker_historical(
        self, target: str, start_date: str, end_date: str,
    ) -> list[dict]:
        """Call Ahrefs rank-tracker-overview with date range."""
        logger.info(f"Calling Ahrefs rank-tracker-overview ({start_date} to {end_date})...")
        try:
            import subprocess
            result = subprocess.run(
                ["mcp-cli", "call", "ahrefs/rank-tracker-overview",
                 json.dumps({
                     "target": target,
                     "date_from": start_date,
                     "date_to": end_date,
                 })],
                capture_output=True, text=True, timeout=60,
            )
            if result.returncode == 0:
                data = json.loads(result.stdout)
                return data.get("keywords", data.get("results", []))
        except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
            pass
        return []

    def calculate_trends(self, trends: list[RankingTrend]) -> dict:
        """
        Compute overall trend summary from keyword trends.

        Returns dict with improved/declined/stable/new/lost counts.
        """
        summary = {
            "improved": 0,
            "declined": 0,
            "stable": 0,
            "new": 0,
            "lost": 0,
        }
        for trend in trends:
            direction = trend.trend_direction
            if direction in summary:
                summary[direction] += 1
            else:
                summary["stable"] += 1

        logger.info(
            f"Trend summary: improved={summary['improved']}, "
            f"declined={summary['declined']}, stable={summary['stable']}"
        )
        return summary

    def find_top_movers(
        self,
        trends: list[RankingTrend],
        limit: int = 10,
    ) -> tuple[list[TopMover], list[TopMover]]:
        """
        Find keywords with biggest position gains and losses.

        Returns tuple of (top_gainers, top_losers) sorted by impact score.
        """
        gainers: list[TopMover] = []
        losers: list[TopMover] = []

        for trend in trends:
            if not trend.positions_over_time or len(trend.positions_over_time) < 2:
                continue

            first_pos = trend.start_position
            last_pos = trend.current_position

            if first_pos <= 0 or last_pos <= 0:
                continue

            change = first_pos - last_pos  # positive = improved

            mover = TopMover(
                keyword=trend.keyword,
                position_change=change,
                current_position=last_pos,
                previous_position=first_pos,
                volume=trend.volume,
            )
            mover.calculate_impact()

            if change > 0:
                gainers.append(mover)
            elif change < 0:
                losers.append(mover)

        # Sort by impact score descending
        gainers.sort(key=lambda m: m.impact_score, reverse=True)
        losers.sort(key=lambda m: m.impact_score, reverse=True)

        logger.info(f"Top movers: {len(gainers)} gainers, {len(losers)} losers")
        return gainers[:limit], losers[:limit]

    def _calculate_visibility_score(self, trends: list[RankingTrend], use_start: bool = False) -> float:
        """Calculate visibility score from trends (current or start positions)."""
        total_weighted = 0.0
        total_volume = 0

        for trend in trends:
            pos = trend.start_position if use_start else trend.current_position
            if pos <= 0 or pos > 100:
                continue
            volume = max(trend.volume, 1)
            total_volume += volume
            ctr = CTR_WEIGHTS.get(pos, 0.0005)
            total_weighted += volume * ctr

        if total_volume > 0:
            max_possible = total_volume * CTR_WEIGHTS[1]
            return (total_weighted / max_possible) * 100.0
        return 0.0

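    # Worked example for _calculate_visibility_score above: a single keyword at
    # position 3 with volume 1000 gives total_weighted = 1000 * 0.100 = 100.0
    # and max_possible = 1000 * 0.300 = 300.0, so the score is
    # (100.0 / 300.0) * 100 = 33.33 (rounded).
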
    def generate_segment_report(self, trends: list[RankingTrend]) -> list[SegmentReport]:
        """
        Generate performance breakdown by keyword segment.

        Segments include: brand, non_brand, and by intent type.
        """
        segment_map: dict[str, list[RankingTrend]] = {}

        for trend in trends:
            # Brand segment
            brand_key = "brand" if trend.is_brand else "non_brand"
            if brand_key not in segment_map:
                segment_map[brand_key] = []
            segment_map[brand_key].append(trend)

            # Intent segment
            intent_key = f"intent_{trend.intent.lower()}" if trend.intent else "intent_informational"
            if intent_key not in segment_map:
                segment_map[intent_key] = []
            segment_map[intent_key].append(trend)

        reports: list[SegmentReport] = []
        for seg_name, seg_trends in sorted(segment_map.items()):
            if not seg_trends:
                continue

            active = [t for t in seg_trends if t.current_position > 0]
            avg_pos = sum(t.current_position for t in active) / len(active) if active else 0.0
            avg_change = sum(t.total_change for t in seg_trends) / len(seg_trends) if seg_trends else 0.0

            vis = self._calculate_visibility_score(seg_trends, use_start=False)

            improved = sum(1 for t in seg_trends if t.trend_direction == "improved")
            declined = sum(1 for t in seg_trends if t.trend_direction == "declined")
            stable = sum(1 for t in seg_trends if t.trend_direction == "stable")

            # Get top movers within segment
            seg_gainers, seg_losers = self.find_top_movers(seg_trends, limit=5)

            report = SegmentReport(
                segment_name=seg_name,
                total_keywords=len(seg_trends),
                avg_position=round(avg_pos, 1),
                avg_position_change=round(avg_change, 1),
                visibility_score=round(vis, 2),
                improved_count=improved,
                declined_count=declined,
                stable_count=stable,
                top_gainers=seg_gainers,
                top_losers=seg_losers,
            )
            reports.append(report)

        return reports

    async def compare_with_competitor(
        self,
        target: str,
        competitor: str,
        period_days: int = 30,
    ) -> CompetitorReport:
        """
        Period-over-period comparison with a competitor.

        Uses Ahrefs rank-tracker-competitors-stats for detailed comparison.
        """
        logger.info(f"Comparing {target} vs {competitor} over {period_days} days")

        comp_data = await self._call_competitors_stats(target, competitor)

        report = CompetitorReport(competitor=competitor)

        if comp_data:
            report.our_visibility = comp_data.get("target_visibility", 0.0)
            report.their_visibility = comp_data.get("competitor_visibility", 0.0)
            report.overlap_keywords = comp_data.get("overlap_keywords", 0)
            report.keywords_we_lead = comp_data.get("target_better", 0)
            report.keywords_they_lead = comp_data.get("competitor_better", 0)

            # Extract notable gaps
            gaps = comp_data.get("keyword_gaps", [])
            report.notable_gaps = [
                {
                    "keyword": g.get("keyword", ""),
                    "our_position": g.get("target_position", 0),
                    "their_position": g.get("competitor_position", 0),
                    "volume": g.get("volume", 0),
                }
                for g in gaps[:15]
            ]

        return report

    async def _call_competitors_stats(self, target: str, competitor: str) -> dict:
        """Call Ahrefs rank-tracker-competitors-stats MCP tool."""
        logger.info("Calling Ahrefs rank-tracker-competitors-stats...")
        try:
            import subprocess
            result = subprocess.run(
                ["mcp-cli", "call", "ahrefs/rank-tracker-competitors-stats",
                 json.dumps({"target": target, "competitor": competitor})],
                capture_output=True, text=True, timeout=60,
            )
            if result.returncode == 0:
                return json.loads(result.stdout)
        except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
            pass
        return {}

    async def generate_report(
        self,
        target: str,
        period_days: int = 30,
        competitors: Optional[list[str]] = None,
    ) -> RankingReport:
        """
        Orchestrate full ranking performance report generation.

        Args:
            target: Target website URL
            period_days: Reporting period in days
            competitors: List of competitor URLs to compare

        Returns:
            Complete RankingReport with trends, movers, segments, and comparisons
        """
        logger.info(f"Generating ranking report for: {target} ({period_days} days)")

        report = RankingReport(target=target, period_days=period_days)

        # Step 1: Fetch historical position data
        trends = await self.get_historical_positions(target, period_days)

        if not trends:
            logger.warning("No historical data retrieved. Check Ahrefs project configuration.")
            return report

        report.keyword_trends = trends
        report.total_keywords = len(trends)

        # Step 2: Calculate trend summary
        report.trend_summary = self.calculate_trends(trends)

        # Step 3: Calculate visibility scores (current vs period start)
        report.current_visibility = self._calculate_visibility_score(trends, use_start=False)
        report.previous_visibility = self._calculate_visibility_score(trends, use_start=True)
        report.visibility_change = report.current_visibility - report.previous_visibility

        # Step 4: Find top movers
        gainers, losers = self.find_top_movers(trends, limit=10)
        report.top_gainers = gainers
        report.top_losers = losers

        # Step 5: Generate segment reports
        report.segments = self.generate_segment_report(trends)

        # Step 6: Compare with competitors
        if competitors:
            for competitor in competitors:
                comp_report = await self.compare_with_competitor(
                    target, competitor, period_days,
                )
                report.competitors.append(comp_report)

        logger.info(f"Report complete. Keywords: {report.total_keywords}")
        logger.info(
            f"Visibility: {report.previous_visibility:.2f} -> "
            f"{report.current_visibility:.2f} ({report.visibility_change:+.2f})"
        )

        return report


# ---------------------------------------------------------------------------
|
||||||
|
# Output formatters
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
def format_text_report(report: RankingReport) -> str:
|
||||||
|
"""Format ranking report as human-readable text."""
|
||||||
|
lines = []
|
||||||
|
lines.append("=" * 60)
|
||||||
|
lines.append(f"Ranking Performance Report: {report.target}")
|
||||||
|
lines.append(f"Period: {report.period_start} ~ {report.period_end} ({report.period_days} days)")
|
||||||
|
lines.append(f"Generated: {report.timestamp}")
|
||||||
|
lines.append("=" * 60)
|
||||||
|
|
||||||
|
# Visibility trend
|
||||||
|
lines.append(f"\nVisibility Score:")
|
||||||
|
lines.append(f" Current: {report.current_visibility:.2f}")
|
||||||
|
lines.append(f" Previous: {report.previous_visibility:.2f}")
|
||||||
|
change_sign = "+" if report.visibility_change >= 0 else ""
|
||||||
|
lines.append(f" Change: {change_sign}{report.visibility_change:.2f}")
|
||||||
|
|
||||||
|
# Trend summary
|
||||||
|
ts = report.trend_summary
|
||||||
|
lines.append(f"\nKeyword Trends ({report.total_keywords} total):")
|
||||||
|
lines.append(f" Improved: {ts.get('improved', 0)}")
|
||||||
|
lines.append(f" Declined: {ts.get('declined', 0)}")
|
||||||
|
lines.append(f" Stable: {ts.get('stable', 0)}")
|
||||||
|
lines.append(f" New: {ts.get('new', 0)}")
|
||||||
|
lines.append(f" Lost: {ts.get('lost', 0)}")
|
||||||
|
|
||||||
|
# Top gainers
|
||||||
|
if report.top_gainers:
|
||||||
|
lines.append(f"\nTop Gainers:")
|
||||||
|
lines.append("-" * 60)
|
||||||
|
for m in report.top_gainers:
|
||||||
|
lines.append(
|
||||||
|
f" {m.keyword}: {m.previous_position} -> {m.current_position} "
|
||||||
|
f"(+{m.position_change}) | Vol: {m.volume} | Impact: {m.impact_score}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Top losers
|
||||||
|
if report.top_losers:
|
||||||
|
lines.append(f"\nTop Losers:")
|
||||||
|
lines.append("-" * 60)
|
||||||
|
for m in report.top_losers:
|
||||||
|
lines.append(
|
||||||
|
f" {m.keyword}: {m.previous_position} -> {m.current_position} "
|
||||||
|
f"({m.position_change}) | Vol: {m.volume} | Impact: {m.impact_score}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Segments
|
||||||
|
if report.segments:
|
||||||
|
lines.append(f"\nSegment Breakdown:")
|
||||||
|
lines.append("-" * 60)
|
||||||
|
for seg in report.segments:
|
||||||
|
lines.append(
|
||||||
|
f" {seg.segment_name}: {seg.total_keywords} kw, "
|
||||||
|
f"avg pos {seg.avg_position}, vis {seg.visibility_score}, "
|
||||||
|
f"improved {seg.improved_count} / declined {seg.declined_count}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Competitors
|
||||||
|
if report.competitors:
|
||||||
|
lines.append(f"\nCompetitor Comparison:")
|
||||||
|
lines.append("-" * 60)
|
||||||
|
for comp in report.competitors:
|
||||||
|
lines.append(f" vs {comp.competitor}:")
|
||||||
|
lines.append(f" Our visibility: {comp.our_visibility:.2f}")
|
||||||
|
lines.append(f" Their visibility: {comp.their_visibility:.2f}")
|
||||||
|
lines.append(f" Overlap: {comp.overlap_keywords} keywords")
|
||||||
|
lines.append(f" We lead: {comp.keywords_we_lead}")
|
||||||
|
lines.append(f" They lead: {comp.keywords_they_lead}")
|
||||||
|
if comp.notable_gaps:
|
||||||
|
lines.append(f" Notable gaps:")
|
||||||
|
for gap in comp.notable_gaps[:5]:
|
||||||
|
lines.append(
|
||||||
|
f" {gap['keyword']}: us #{gap['our_position']} "
|
||||||
|
f"vs them #{gap['their_position']} (vol: {gap['volume']})"
|
||||||
|
)
|
||||||
|
|
||||||
|
lines.append("\n" + "=" * 60)
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CLI
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
def parse_args() -> argparse.Namespace:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Ranking Reporter - Generate ranking performance reports with trends",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--target",
|
||||||
|
required=True,
|
||||||
|
help="Target website URL (e.g., https://example.com)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--period",
|
||||||
|
type=int,
|
||||||
|
default=30,
|
||||||
|
help="Reporting period in days (default: 30)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--competitor",
|
||||||
|
action="append",
|
||||||
|
dest="competitors",
|
||||||
|
default=[],
|
||||||
|
help="Competitor URL to compare (repeatable)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--json",
|
||||||
|
action="store_true",
|
||||||
|
dest="json_output",
|
||||||
|
help="Output in JSON format",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--output",
|
||||||
|
type=str,
|
||||||
|
default=None,
|
||||||
|
help="Save output to file path",
|
||||||
|
)
|
||||||
|
return parser.parse_args()
|
||||||
|
|
||||||
|
|
||||||
|
async def main():
|
||||||
|
args = parse_args()
|
||||||
|
|
||||||
|
reporter = RankingReporter()
|
||||||
|
|
||||||
|
report = await reporter.generate_report(
|
||||||
|
target=args.target,
|
||||||
|
period_days=args.period,
|
||||||
|
competitors=args.competitors,
|
||||||
|
)
|
||||||
|
|
||||||
|
if args.json_output:
|
||||||
|
output = json.dumps(report.to_dict(), ensure_ascii=False, indent=2)
|
||||||
|
else:
|
||||||
|
output = format_text_report(report)
|
||||||
|
|
||||||
|
if args.output:
|
||||||
|
with open(args.output, "w", encoding="utf-8") as f:
|
||||||
|
f.write(output)
|
||||||
|
logger.info(f"Output saved to: {args.output}")
|
||||||
|
else:
|
||||||
|
print(output)
|
||||||
|
|
||||||
|
reporter.print_stats()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
asyncio.run(main())
|
||||||
@@ -0,0 +1,8 @@
# 21-seo-position-tracking dependencies
requests>=2.31.0
aiohttp>=3.9.0
pandas>=2.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
107	custom-skills/21-seo-position-tracking/desktop/SKILL.md	Normal file
@@ -0,0 +1,107 @@
---
name: seo-position-tracking
description: |
  Keyword position tracking and ranking monitoring via Ahrefs Rank Tracker.
  Triggers: rank tracking, position monitoring, keyword rankings, visibility score, ranking report, 키워드 순위, 순위 추적.
---

# SEO Position Tracking

## Purpose

Monitor keyword ranking positions, detect significant changes, calculate visibility scores, and compare against competitors using Ahrefs Rank Tracker data. Provides actionable alerts for ranking drops and a segment-level performance breakdown.

## Core Capabilities

1. **Position Monitoring** - Retrieve current keyword ranking positions from Ahrefs Rank Tracker projects
2. **Change Detection** - Detect significant position changes with configurable threshold alerts (severity: critical/high/medium/low)
3. **Visibility Scoring** - Calculate weighted visibility scores using a CTR-curve model (position 1 = 30%, position 2 = 15%, etc.)
4. **Brand/Non-brand Segmentation** - Automatically classify keywords by brand relevance and search intent type
5. **Competitor Comparison** - Compare keyword overlap, position gaps, and visibility scores against competitors
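The CTR-curve weighting above can be sketched as follows. The shares for positions 1 and 2 come from the list; the rest of the curve is an assumed decay for illustration, not the values used by the real scripts:

```python
# Hypothetical CTR curve: click-through share by ranking position.
# Positions 1-2 match the figures quoted above; the tail is an assumed decay.
CTR_CURVE = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05,
             6: 0.04, 7: 0.03, 8: 0.025, 9: 0.02, 10: 0.015}


def visibility_score(keywords: list[dict]) -> float:
    """Volume-weighted visibility: sum(volume * CTR(position)) / sum(volume) * 100."""
    total_volume = sum(kw["volume"] for kw in keywords) or 1
    weighted = sum(
        kw["volume"] * CTR_CURVE.get(kw["position"], 0.0)
        for kw in keywords
        if kw["position"] is not None
    )
    return round(weighted / total_volume * 100, 2)


kws = [
    {"keyword": "seo audit", "position": 1, "volume": 1000},
    {"keyword": "rank tracker", "position": 11, "volume": 500},
]
print(visibility_score(kws))  # only the #1 keyword contributes: 1000*0.30/1500*100 = 20.0
```

Keywords ranking outside the modeled curve (here, beyond position 10) contribute zero, which is why visibility reacts sharply to top-10 dropouts.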
## MCP Tool Usage

### Ahrefs Rank Tracker Tools

```
mcp__ahrefs__rank-tracker-overview: Get rank tracking overview with current positions
mcp__ahrefs__rank-tracker-competitors-overview: Compare rankings against competitors
mcp__ahrefs__rank-tracker-competitors-pages: Competitor page-level ranking data
mcp__ahrefs__rank-tracker-competitors-stats: Detailed competitor ranking statistics
mcp__ahrefs__rank-tracker-serp-overview: SERP details for tracked keywords
mcp__ahrefs__management-projects: List available Ahrefs projects
mcp__ahrefs__management-project-keywords: Get tracked keywords for a project
```

### Notion for Report Storage

```
mcp__notion__notion-create-pages: Save tracking reports to SEO Audit Log
mcp__notion__notion-update-page: Update existing tracking entries
```
## Workflow

### Phase 1: Data Collection
1. Identify the Ahrefs project via `management-projects`
2. Retrieve tracked keywords via `management-project-keywords`
3. Fetch current positions via `rank-tracker-overview`
4. Fetch competitor data via `rank-tracker-competitors-overview` (if requested)

### Phase 2: Analysis
1. Detect position changes against the previous period
2. Generate alerts for changes exceeding the threshold
3. Calculate visibility score weighted by search volume and CTR curve
4. Segment keywords into brand/non-brand and by intent type
5. Compare positions against each competitor
### Phase 3: Reporting
1. Compile position distribution (top3/top10/top20/top50/top100)
2. Summarize changes (improved/declined/stable/new/lost)
3. List alerts sorted by severity and search volume
4. Generate segment-level breakdown
5. Save the report to the Notion SEO Audit Log database
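The position-distribution step (Phase 3, item 1) can be sketched as cumulative buckets, where `top3` is a subset of `top10`, and so on:

```python
def position_distribution(positions: list[int]) -> dict[str, int]:
    """Count keywords in cumulative ranking buckets (top3 ⊆ top10 ⊆ ... ⊆ top100)."""
    buckets = {"top3": 3, "top10": 10, "top20": 20, "top50": 50, "top100": 100}
    return {
        name: sum(1 for p in positions if p <= limit)
        for name, limit in buckets.items()
    }


print(position_distribution([1, 2, 5, 18, 44, 120]))
# {'top3': 2, 'top10': 3, 'top20': 4, 'top50': 5, 'top100': 5}
```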
## Output Format

```json
{
  "target": "https://example.com",
  "total_keywords": 250,
  "visibility_score": 68.5,
  "positions": {
    "top3": 15,
    "top10": 48,
    "top20": 92,
    "top50": 180,
    "top100": 230
  },
  "changes": {
    "improved": 45,
    "declined": 30,
    "stable": 155,
    "new": 12,
    "lost": 8
  },
  "alerts": [
    {
      "keyword": "example keyword",
      "old_position": 5,
      "new_position": 15,
      "change": -10,
      "volume": 5400,
      "severity": "high"
    }
  ],
  "segments": {
    "brand": {"keywords": 30, "avg_position": 2.1},
    "non_brand": {"keywords": 220, "avg_position": 24.5}
  }
}
```
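The `segments` block in the output can be produced by a simple brand/non-brand split. Here `brand_terms` is an illustrative parameter; the skill derives brand relevance automatically:

```python
def segment_keywords(keywords: list[dict], brand_terms: set[str]) -> dict[str, dict]:
    """Split keywords into brand/non-brand and report count plus average position."""
    segments: dict[str, list[int]] = {"brand": [], "non_brand": []}
    for kw in keywords:
        name = "brand" if any(t in kw["keyword"].lower() for t in brand_terms) else "non_brand"
        segments[name].append(kw["position"])
    return {
        name: {
            "keywords": len(positions),
            "avg_position": round(sum(positions) / len(positions), 1) if positions else 0.0,
        }
        for name, positions in segments.items()
    }


sample = [
    {"keyword": "ourdigital seo", "position": 2},
    {"keyword": "seo audit tool", "position": 14},
    {"keyword": "rank tracker", "position": 9},
]
print(segment_keywords(sample, {"ourdigital"}))
# {'brand': {'keywords': 1, 'avg_position': 2.0}, 'non_brand': {'keywords': 2, 'avg_position': 11.5}}
```

Brand keywords typically rank far better than non-brand ones (as in the sample output above), which is why the two are reported separately.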
## Notion Output (Required)

All tracking reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (Position Tracking), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: RANK-YYYYMMDD-NNN
@@ -0,0 +1,8 @@
name: seo-position-tracking
description: |
  Keyword position tracking and ranking monitoring. Triggers: rank tracking, position monitoring, keyword rankings, visibility score, ranking report.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
144	custom-skills/22-seo-link-building/code/CLAUDE.md	Normal file
@@ -0,0 +1,144 @@
# CLAUDE.md

## Overview

Link building diagnosis tool for backlink profile analysis, toxic link detection, competitor link gap identification, and link velocity tracking. Supports Korean platform link mapping (Naver Blog, Naver Cafe, Tistory, Brunch, Korean news sites).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Backlink profile audit
python scripts/backlink_auditor.py --url https://example.com --json

# Link gap analysis vs competitors
python scripts/link_gap_finder.py --target https://example.com --competitor https://competitor.com --json
```
## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `backlink_auditor.py` | Analyze backlink profile, detect toxic links | DR, referring domains, anchor distribution, toxic links |
| `link_gap_finder.py` | Find link gap opportunities vs competitors | Domains linking to competitors but not the target |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |
## Backlink Auditor

```bash
# Full backlink audit
python scripts/backlink_auditor.py --url https://example.com --json

# Check link velocity (new/lost over time)
python scripts/backlink_auditor.py --url https://example.com --velocity --json

# Find broken backlinks for recovery
python scripts/backlink_auditor.py --url https://example.com --broken --json

# Korean platform link analysis
python scripts/backlink_auditor.py --url https://example.com --korean-platforms --json
```

**Capabilities**:
- Domain Rating (DR) and backlink stats overview
- Referring domain analysis (count, DR distribution, country distribution)
- Anchor text distribution analysis (branded, exact-match, generic, naked URL)
- Toxic link detection (PBN patterns, spammy domains, link farms)
- Link velocity tracking (new/lost referring domains over time)
- Broken backlink recovery opportunities
- Korean platform mapping (Naver Blog, Naver Cafe, Tistory, Brunch, Korean news)
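The anchor-text buckets listed above can be sketched with a simple classifier. Here `brand_terms`, `target_keywords`, and the generic-anchor set are illustrative parameters, not values taken from `backlink_auditor.py`:

```python
def classify_anchor(anchor: str, brand_terms: set[str], target_keywords: set[str]) -> str:
    """Bucket an anchor text into the categories used in the distribution report."""
    text = anchor.strip().lower()
    if text.startswith(("http://", "https://", "www.")):
        return "naked_url"
    if any(term in text for term in brand_terms):
        return "branded"
    if text in target_keywords:
        return "exact_match"
    if any(kw in text for kw in target_keywords):
        return "partial_match"
    if text in {"click here", "here", "website", "read more", "link"}:
        return "generic"
    return "other"


print(classify_anchor("click here", {"ourdigital"}, {"seo audit"}))          # generic
print(classify_anchor("seo audit checklist", {"ourdigital"}, {"seo audit"}))  # partial_match
```

A skewed distribution (e.g. an unusually high exact-match share) is one of the signals the toxic-link detection looks for.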
## Link Gap Finder

```bash
# Gap vs one competitor
python scripts/link_gap_finder.py --target https://example.com --competitor https://comp1.com --json

# Multiple competitors
python scripts/link_gap_finder.py --target https://example.com --competitor https://comp1.com --competitor https://comp2.com --json

# Filter by minimum DR
python scripts/link_gap_finder.py --target https://example.com --competitor https://comp1.com --min-dr 30 --json
```

**Capabilities**:
- Find domains linking to competitors but not to the target
- Score link opportunities by DR, traffic, and relevance
- Categorize link sources (editorial, directory, forum, blog, news)
- Prioritize by feasibility and impact
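At its core, the gap computation is a set difference over referring domains; a minimal sketch, ranking candidates by how many competitors they already link to:

```python
def find_link_gaps(
    target_refdomains: set[str],
    competitor_refdomains: dict[str, set[str]],
) -> list[dict]:
    """Domains that link to at least one competitor but not to the target."""
    gaps: dict[str, list[str]] = {}
    for competitor, domains in competitor_refdomains.items():
        for domain in domains - target_refdomains:  # the actual gap: theirs minus ours
            gaps.setdefault(domain, []).append(competitor)
    return sorted(
        ({"domain": d, "linked_competitors": comps, "competitor_count": len(comps)}
         for d, comps in gaps.items()),
        key=lambda g: g["competitor_count"],
        reverse=True,
    )


gaps = find_link_gaps(
    {"a.com"},
    {"comp1.com": {"a.com", "b.com", "c.com"}, "comp2.com": {"b.com"}},
)
print([g["domain"] for g in gaps])  # ['b.com', 'c.com'] - b.com links to both competitors
```

The real script then layers DR, traffic, and category scoring on top of this raw set difference.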
## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-all-backlinks` | Get all backlinks for a target |
| `site-explorer-backlinks-stats` | Backlink statistics overview |
| `site-explorer-referring-domains` | List referring domains |
| `site-explorer-anchors` | Anchor text distribution |
| `site-explorer-broken-backlinks` | Find broken backlinks |
| `site-explorer-domain-rating` | Get Domain Rating |
| `site-explorer-domain-rating-history` | DR trend over time |
| `site-explorer-refdomains-history` | Referring domains trend |
| `site-explorer-linked-domains` | Domains linked from target |
## Output Format

```json
{
  "url": "https://example.com",
  "domain_rating": 45,
  "backlink_stats": {
    "total_backlinks": 12500,
    "referring_domains": 850,
    "dofollow_ratio": 0.72
  },
  "anchor_distribution": {
    "branded": 35,
    "exact_match": 12,
    "partial_match": 18,
    "generic": 20,
    "naked_url": 15
  },
  "toxic_links": [...],
  "korean_platforms": {
    "naver_blog": 45,
    "naver_cafe": 12,
    "tistory": 23,
    "brunch": 5
  },
  "link_velocity": {
    "new_last_30d": 120,
    "lost_last_30d": 35
  },
  "timestamp": "2025-01-01T00:00:00"
}
```
## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Link Building |
| Priority | Select | Based on toxic link count and gap size |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: LINK-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Domain Rating, Referring Domains, Backlinks)
- URLs and code remain unchanged
1079	custom-skills/22-seo-link-building/code/scripts/backlink_auditor.py	Normal file
File diff suppressed because it is too large
207	custom-skills/22-seo-link-building/code/scripts/base_client.py	Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using the token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,802 @@
|
|||||||
|
"""
|
||||||
|
Link Gap Finder - Competitor link gap analysis
|
||||||
|
===============================================
|
||||||
|
Purpose: Identify link building opportunities by finding domains that link
|
||||||
|
to competitors but not to the target site via Ahrefs MCP.
|
||||||
|
Python: 3.10+
|
||||||
|
Usage:
|
||||||
|
python link_gap_finder.py --target https://example.com --competitor https://comp1.com --json
|
||||||
|
python link_gap_finder.py --target https://example.com --competitor https://comp1.com --competitor https://comp2.com --min-dr 30 --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
import pandas as pd
|
||||||
|
from rich.console import Console
|
||||||
|
from rich.table import Table
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Logging
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
logger = logging.getLogger("link_gap_finder")
|
||||||
|
console = Console()
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Constants
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
AHREFS_BASE = "https://api.ahrefs.com/v3"
|
||||||
|
|
||||||
|
# Source category detection patterns
|
||||||
|
SOURCE_CATEGORY_PATTERNS: dict[str, list[str]] = {
|
||||||
|
"news": [
|
||||||
|
"news", "press", "media", "journal", "herald", "times", "post",
|
||||||
|
"gazette", "tribune", "daily", "chosun", "donga", "joongang",
|
||||||
|
"hani", "khan", "yna", "yonhap", "reuters", "bloomberg",
|
||||||
|
"techcrunch", "verge", "wired", "arstechnica", "bbc", "cnn",
|
||||||
|
],
|
||||||
|
"blog": [
|
||||||
|
"blog", "wordpress", "medium.com", "tistory.com", "brunch.co.kr",
|
||||||
|
"blog.naver.com", "tumblr", "blogger", "substack", "ghost.io",
|
||||||
|
"velog.io", "dev.to",
|
||||||
|
],
|
||||||
|
"forum": [
|
||||||
|
"forum", "community", "discuss", "reddit.com", "quora.com",
|
||||||
|
"stackexchange", "stackoverflow", "cafe.naver.com", "dcinside",
|
||||||
|
"fmkorea", "clien", "ppomppu", "theqoo", "ruliweb",
|
||||||
|
],
|
||||||
|
"directory": [
|
||||||
|
"directory", "listing", "yellowpages", "yelp", "bbb.org",
|
||||||
|
"clutch.co", "g2.com", "capterra", "trustpilot", "glassdoor",
|
||||||
|
"dmoz", "aboutus", "hotfrog", "manta", "superpages",
|
||||||
|
],
|
||||||
|
"edu_gov": [
|
||||||
|
".edu", ".gov", ".ac.kr", ".go.kr", ".or.kr",
|
||||||
|
],
|
||||||
|
"social": [
|
||||||
|
"facebook.com", "twitter.com", "x.com", "linkedin.com",
|
||||||
|
"instagram.com", "youtube.com", "pinterest.com", "tiktok.com",
|
||||||
|
],
|
||||||
|
"korean_platform": [
|
||||||
|
"naver.com", "daum.net", "kakao.com", "tistory.com",
|
||||||
|
"brunch.co.kr", "zum.com", "nate.com",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Dataclasses
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class LinkOpportunity:
|
||||||
|
"""A single link building opportunity from gap analysis."""
|
||||||
|
domain: str
|
||||||
|
dr: float = 0.0
|
||||||
|
traffic: int = 0
|
||||||
|
linked_competitors: list[str] = field(default_factory=list)
|
||||||
|
competitor_count: int = 0
|
||||||
|
not_linked_target: bool = True
|
||||||
|
category: str = "other"
|
||||||
|
feasibility_score: float = 0.0
|
||||||
|
impact_score: float = 0.0
|
||||||
|
overall_score: float = 0.0
|
||||||
|
backlinks_to_competitors: int = 0
|
||||||
|
country: str = ""
|
||||||
|
top_anchor: str = ""
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class GapSummary:
    """Summary statistics for the gap analysis."""

    total_opportunities: int = 0
    avg_dr: float = 0.0
    high_dr_count: int = 0
    category_breakdown: dict[str, int] = field(default_factory=dict)
    top_countries: list[dict[str, Any]] = field(default_factory=list)
    total_competitor_refdomains: dict[str, int] = field(default_factory=dict)
    target_refdomains_count: int = 0


@dataclass
class LinkGapResult:
    """Complete link gap analysis result."""

    target_url: str
    target_domain: str = ""
    competitor_urls: list[str] = field(default_factory=list)
    competitor_domains: list[str] = field(default_factory=list)
    target_dr: float = 0.0
    opportunities: list[LinkOpportunity] = field(default_factory=list)
    summary: GapSummary | None = None
    top_opportunities: list[LinkOpportunity] = field(default_factory=list)
    issues: list[dict[str, str]] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)
    timestamp: str = ""


# ---------------------------------------------------------------------------
# LinkGapFinder
# ---------------------------------------------------------------------------


class LinkGapFinder(BaseAsyncClient):
    """Find link building opportunities by analyzing competitor backlink gaps."""

    def __init__(self, **kwargs):
        super().__init__(max_concurrent=5, requests_per_second=2.0, **kwargs)
        self.session: aiohttp.ClientSession | None = None

    # -- Ahrefs MCP helper ---------------------------------------------------

    async def _call_ahrefs(
        self, endpoint: str, params: dict[str, Any]
    ) -> dict[str, Any]:
        """
        Call an Ahrefs API endpoint.

        In MCP context this calls mcp__ahrefs__<endpoint>.
        For standalone use, falls back to the REST API with a token.
        """
        # Only resolve the token when an HTTP session exists; in MCP context
        # there is no session and the REST branch is skipped entirely.
        api_token = config.get_required("AHREFS_API_TOKEN") if self.session else None

        if self.session and api_token:
            url = f"{AHREFS_BASE}/{endpoint}"
            headers = {"Authorization": f"Bearer {api_token}"}
            async with self.session.get(url, headers=headers, params=params) as resp:
                resp.raise_for_status()
                return await resp.json()

        logger.warning(
            f"Ahrefs call to '{endpoint}' - use MCP tool "
            f"mcp__ahrefs__{endpoint.replace('-', '_')} in Claude Desktop"
        )
        return {"endpoint": endpoint, "params": params, "data": [], "note": "mcp_stub"}

    # -- Core methods --------------------------------------------------------

    async def get_referring_domains(
        self, url: str, limit: int = 1000
    ) -> list[dict[str, Any]]:
        """Fetch referring domains for a given URL/domain."""
        target = urlparse(url).netloc or url
        result = await self._call_ahrefs(
            "site-explorer-referring-domains",
            {"target": target, "mode": "domain", "limit": limit, "order_by": "domain_rating:desc"},
        )
        domains = result.get("data", result.get("refdomains", []))
        if isinstance(domains, dict):
            domains = domains.get("refdomains", [])
        return domains if isinstance(domains, list) else []

    async def get_domain_rating(self, url: str) -> float:
        """Fetch Domain Rating for a URL."""
        target = urlparse(url).netloc or url
        result = await self._call_ahrefs(
            "site-explorer-domain-rating",
            {"target": target},
        )
        data = result.get("data", result) if isinstance(result, dict) else {}
        return float(data.get("domain_rating", 0.0))

    async def get_domain_metrics(self, url: str) -> dict[str, Any]:
        """Fetch comprehensive domain metrics."""
        target = urlparse(url).netloc or url
        result = await self._call_ahrefs(
            "site-explorer-backlinks-stats",
            {"target": target, "mode": "domain"},
        )
        data = result.get("data", result) if isinstance(result, dict) else {}
        return {
            "total_backlinks": data.get("live", 0),
            "referring_domains": data.get("live_refdomains", 0),
            "dofollow": data.get("live_dofollow", 0),
        }

    def find_gaps(
        self,
        target_domains: set[str],
        competitor_domain_maps: dict[str, set[str]],
    ) -> list[dict[str, Any]]:
        """
        Find domains linking to competitors but not to the target.

        Returns a list of gap domains with metadata about which
        competitors they link to.
        """
        # Collect all competitor referring domains
        all_competitor_domains: dict[str, list[str]] = {}

        for comp_name, comp_domains in competitor_domain_maps.items():
            for domain in comp_domains:
                domain_lower = domain.lower()
                if domain_lower not in all_competitor_domains:
                    all_competitor_domains[domain_lower] = []
                all_competitor_domains[domain_lower].append(comp_name)

        # Find gaps: in competitor set but not in target set
        target_set_lower = {d.lower() for d in target_domains}
        gaps = []

        for domain, linked_comps in all_competitor_domains.items():
            if domain not in target_set_lower:
                gaps.append({
                    "domain": domain,
                    "linked_competitors": linked_comps,
                    "competitor_count": len(set(linked_comps)),
                })

        # Sort by number of competitors linking (more = higher priority)
        gaps.sort(key=lambda g: g["competitor_count"], reverse=True)
        return gaps

    def score_opportunities(
        self,
        gaps: list[dict[str, Any]],
        refdomains_data: dict[str, list[dict[str, Any]]],
        total_competitors: int,
    ) -> list[LinkOpportunity]:
        """
        Score gap opportunities by DR, traffic, relevance, and feasibility.

        Scoring factors:
        - DR weight: Higher DR = more impactful link
        - Competitor overlap: More competitors linking = easier to acquire
        - Category bonus: Editorial/news links valued higher
        - Traffic bonus: Higher traffic domains valued more
        """
        # Build a lookup of domain metadata from competitor refdomains
        domain_metadata: dict[str, dict[str, Any]] = {}
        for comp_url, domains in refdomains_data.items():
            for rd in domains:
                d = rd.get("domain", rd.get("domain_from", "")).lower()
                if d and d not in domain_metadata:
                    domain_metadata[d] = {
                        "dr": rd.get("domain_rating", rd.get("dr", 0)),
                        "traffic": rd.get("organic_traffic", rd.get("traffic", 0)),
                        "backlinks": rd.get("backlinks", 0),
                        "country": rd.get("country", ""),
                    }

        opportunities = []

        for gap in gaps:
            domain = gap["domain"]
            meta = domain_metadata.get(domain, {})

            dr = meta.get("dr", 0)
            traffic = meta.get("traffic", 0)
            comp_count = gap["competitor_count"]

            # Category detection
            category = self._detect_category(domain)

            # Feasibility score (0-100)
            # Higher if: more competitors link (social proof), blog/forum (easier outreach)
            feasibility = min(100, (
                (comp_count / max(total_competitors, 1)) * 40  # Competitor overlap
                + (30 if category in ("blog", "forum", "directory") else 10)  # Category ease
                + (20 if dr < 60 else 5)  # Lower DR = easier to get a link from
                + (10 if traffic > 0 else 0)  # Active site bonus
            ))

            # Impact score (0-100)
            # Higher if: high DR, high traffic, editorial/news
            impact = min(100, (
                min(dr, 100) * 0.4  # DR weight (40%)
                + min(traffic / 1000, 30)  # Traffic weight (up to 30)
                + (20 if category in ("news", "edu_gov") else 5)  # Authority bonus
                + (comp_count / max(total_competitors, 1)) * 10  # Validation
            ))

            # Overall score = weighted average
            overall = round(feasibility * 0.4 + impact * 0.6, 1)

            opp = LinkOpportunity(
                domain=domain,
                dr=dr,
                traffic=traffic,
                linked_competitors=gap["linked_competitors"],
                competitor_count=comp_count,
                not_linked_target=True,
                category=category,
                feasibility_score=round(feasibility, 1),
                impact_score=round(impact, 1),
                overall_score=overall,
                backlinks_to_competitors=meta.get("backlinks", 0),
                country=meta.get("country", ""),
            )
            opportunities.append(opp)

        # Sort by overall score descending
        opportunities.sort(key=lambda o: o.overall_score, reverse=True)
        return opportunities

    def categorize_sources(
        self, opportunities: list[LinkOpportunity]
    ) -> dict[str, list[LinkOpportunity]]:
        """Group opportunities by source category."""
        categorized: dict[str, list[LinkOpportunity]] = {}
        for opp in opportunities:
            cat = opp.category
            if cat not in categorized:
                categorized[cat] = []
            categorized[cat].append(opp)
        return categorized

    # -- Orchestration -------------------------------------------------------

    async def analyze(
        self,
        target_url: str,
        competitor_urls: list[str],
        min_dr: float = 0,
        country_filter: str = "",
        limit: int = 1000,
    ) -> LinkGapResult:
        """Orchestrate the full link gap analysis."""
        target_domain = urlparse(target_url).netloc or target_url
        comp_domains = [urlparse(c).netloc or c for c in competitor_urls]

        logger.info(f"Starting link gap analysis: {target_domain} vs {comp_domains}")

        result = LinkGapResult(
            target_url=target_url,
            target_domain=target_domain,
            competitor_urls=competitor_urls,
            competitor_domains=comp_domains,
            timestamp=datetime.now().isoformat(),
        )

        # Phase 1: Fetch target DR and referring domains concurrently
        logger.info("Phase 1: Fetching target data...")
        target_dr, target_refdomains = await asyncio.gather(
            self.get_domain_rating(target_url),
            self.get_referring_domains(target_url, limit=limit),
            return_exceptions=True,
        )

        result.target_dr = target_dr if isinstance(target_dr, (int, float)) else 0
        target_rd_list = target_refdomains if isinstance(target_refdomains, list) else []
        target_domain_set = {
            rd.get("domain", rd.get("domain_from", "")).lower()
            for rd in target_rd_list
            if rd.get("domain", rd.get("domain_from", ""))
        }

        # Phase 2: Fetch competitor referring domains (parallel)
        comp_rd_results = await asyncio.gather(
            *(self.get_referring_domains(c, limit=limit) for c in competitor_urls),
            return_exceptions=True,
        )
        comp_results: dict[str, list] = {}
        for comp_url, comp_rd in zip(competitor_urls, comp_rd_results):
            if isinstance(comp_rd, Exception):
                logger.error(f"Failed to fetch refdomains for {comp_url}: {comp_rd}")
                comp_results[comp_url] = []
            else:
                comp_results[comp_url] = comp_rd if isinstance(comp_rd, list) else []

        # Build competitor domain maps
        competitor_domain_maps: dict[str, set[str]] = {}
        for comp_url, rd_list in comp_results.items():
            comp_domain = urlparse(comp_url).netloc or comp_url
            competitor_domain_maps[comp_domain] = {
                rd.get("domain", rd.get("domain_from", "")).lower()
                for rd in rd_list
                if rd.get("domain", rd.get("domain_from", ""))
            }

        # Phase 3: Find gaps
        logger.info("Phase 3: Finding link gaps...")
        raw_gaps = self.find_gaps(target_domain_set, competitor_domain_maps)
        logger.info(f"Found {len(raw_gaps)} gap domains")

        # Phase 4: Score opportunities
        logger.info("Phase 4: Scoring opportunities...")
        opportunities = self.score_opportunities(
            raw_gaps, comp_results, len(competitor_urls)
        )

        # Apply filters
        if min_dr > 0:
            opportunities = [o for o in opportunities if o.dr >= min_dr]

        if country_filter:
            country_lower = country_filter.lower()
            opportunities = [
                o for o in opportunities
                if o.country.lower() == country_lower or not o.country
            ]

        result.opportunities = opportunities
        result.top_opportunities = opportunities[:50]

        # Phase 5: Build summary
        logger.info("Phase 5: Building summary...")
        result.summary = self._build_summary(
            opportunities, comp_results, len(target_rd_list)
        )

        # Phase 6: Generate issues and recommendations
        self._generate_issues(result)
        self._generate_recommendations(result)

        logger.info(f"Link gap analysis complete: {len(opportunities)} opportunities found")
        return result

    # -- Helpers -------------------------------------------------------------

    @staticmethod
    def _detect_category(domain: str) -> str:
        """Detect the category of a domain based on known patterns."""
        domain_lower = domain.lower()

        for category, patterns in SOURCE_CATEGORY_PATTERNS.items():
            for pattern in patterns:
                if pattern in domain_lower:
                    return category

        # Fallback heuristics
        if domain_lower.endswith((".edu", ".ac.kr", ".gov", ".go.kr")):
            return "edu_gov"

        return "other"

    def _build_summary(
        self,
        opportunities: list[LinkOpportunity],
        comp_results: dict[str, list],
        target_rd_count: int,
    ) -> GapSummary:
        """Build summary statistics from opportunities."""
        summary = GapSummary()
        summary.total_opportunities = len(opportunities)
        summary.target_refdomains_count = target_rd_count

        if opportunities:
            dr_values = [o.dr for o in opportunities if o.dr > 0]
            summary.avg_dr = round(sum(dr_values) / max(len(dr_values), 1), 1)
            summary.high_dr_count = sum(1 for o in opportunities if o.dr >= 50)

            # Category and country breakdowns
            cat_counts: dict[str, int] = {}
            country_counts: dict[str, int] = {}
            for opp in opportunities:
                cat_counts[opp.category] = cat_counts.get(opp.category, 0) + 1
                if opp.country:
                    country_counts[opp.country] = country_counts.get(opp.country, 0) + 1

            summary.category_breakdown = dict(
                sorted(cat_counts.items(), key=lambda x: x[1], reverse=True)
            )
            summary.top_countries = sorted(
                [{"country": k, "count": v} for k, v in country_counts.items()],
                key=lambda x: x["count"], reverse=True,
            )[:10]

        # Competitor refdomain counts
        for comp_url, rd_list in comp_results.items():
            comp_domain = urlparse(comp_url).netloc or comp_url
            summary.total_competitor_refdomains[comp_domain] = len(rd_list)

        return summary

    def _generate_issues(self, result: LinkGapResult) -> None:
        """Generate issues based on the gap analysis."""
        issues = []

        if result.summary:
            # Large gap warning
            if result.summary.total_opportunities > 500:
                issues.append({
                    "type": "warning",
                    "category": "link_gap",
                    "message": (
                        f"Large link gap: {result.summary.total_opportunities} domains "
                        "link to competitors but not to you"
                    ),
                })

            # High-DR gap
            if result.summary.high_dr_count > 50:
                issues.append({
                    "type": "error",
                    "category": "authority_gap",
                    "message": (
                        f"{result.summary.high_dr_count} high-authority domains (DR 50+) "
                        "link to competitors but not to you"
                    ),
                })

            # Category-specific gaps
            news_gap = result.summary.category_breakdown.get("news", 0)
            if news_gap > 20:
                issues.append({
                    "type": "warning",
                    "category": "pr_gap",
                    "message": f"{news_gap} news/media domains link to competitors - consider digital PR",
                })

            edu_gap = result.summary.category_breakdown.get("edu_gov", 0)
            if edu_gap > 5:
                issues.append({
                    "type": "info",
                    "category": "edu_gov_gap",
                    "message": f"{edu_gap} .edu/.gov domains link to competitors - high-authority opportunity",
                })

        result.issues = issues

    def _generate_recommendations(self, result: LinkGapResult) -> None:
        """Generate actionable recommendations."""
        recs = []

        if not result.opportunities:
            recs.append("No significant link gaps found. Consider expanding the competitor list.")
            result.recommendations = recs
            return

        # Top opportunities by category
        categorized = self.categorize_sources(result.top_opportunities[:100])

        if "news" in categorized:
            news_count = len(categorized["news"])
            top_news = [o.domain for o in categorized["news"][:3]]
            recs.append(
                f"Pursue {news_count} news/media link opportunities. "
                f"Top targets: {', '.join(top_news)}. "
                "Strategy: create newsworthy content, press releases, expert commentary."
            )

        if "blog" in categorized:
            blog_count = len(categorized["blog"])
            recs.append(
                f"Target {blog_count} blog/content site opportunities via guest posting, "
                "collaborative content, and expert interviews."
            )

        if "directory" in categorized:
            dir_count = len(categorized["directory"])
            recs.append(
                f"Submit to {dir_count} relevant directories and listing sites. "
                "Low effort, moderate impact for local SEO signals."
            )

        if "forum" in categorized:
            forum_count = len(categorized["forum"])
            recs.append(
                f"Engage in {forum_count} forum/community sites with helpful answers "
                "and resource sharing. Build presence before linking."
            )

        if "korean_platform" in categorized:
            kr_count = len(categorized["korean_platform"])
            recs.append(
                f"Build presence on {kr_count} Korean platforms (Naver, Tistory, Brunch). "
                "Critical for Korean SERP visibility."
            )

        if "edu_gov" in categorized:
            eg_count = len(categorized["edu_gov"])
            recs.append(
                f"Target {eg_count} .edu/.gov link opportunities through scholarship "
                "programs, research partnerships, or government resource contributions."
            )

        # Multi-competitor overlap
        multi_comp = [o for o in result.top_opportunities if o.competitor_count >= 2]
        if multi_comp:
            recs.append(
                f"{len(multi_comp)} domains link to multiple competitors but not to you. "
                "These are high-priority targets as they validate industry relevance."
            )

        # Quick wins: high feasibility, moderate impact
        quick_wins = [
            o for o in result.opportunities[:100]
            if o.feasibility_score >= 60 and o.impact_score >= 30
        ]
        if quick_wins:
            recs.append(
                f"Prioritize {len(quick_wins)} quick-win opportunities with high "
                "feasibility and moderate impact for the fastest link acquisition."
            )

        result.recommendations = recs


# ---------------------------------------------------------------------------
# Output Formatting
# ---------------------------------------------------------------------------


def format_rich_output(result: LinkGapResult) -> None:
    """Display gap analysis results using Rich tables."""
    console.print(f"\n[bold cyan]Link Gap Analysis: {result.target_domain}[/bold cyan]")
    console.print(f"[dim]vs {', '.join(result.competitor_domains)}[/dim]")
    console.print(f"[dim]Timestamp: {result.timestamp}[/dim]\n")

    # Summary
    if result.summary:
        summary_table = Table(title="Summary", show_header=True, header_style="bold magenta")
        summary_table.add_column("Metric", style="cyan")
        summary_table.add_column("Value", style="green")
        summary_table.add_row("Target DR", str(result.target_dr))
        summary_table.add_row("Target Referring Domains", str(result.summary.target_refdomains_count))
        summary_table.add_row("Total Gap Opportunities", str(result.summary.total_opportunities))
        summary_table.add_row("Avg Opportunity DR", str(result.summary.avg_dr))
        summary_table.add_row("High-DR Opportunities (50+)", str(result.summary.high_dr_count))

        for comp, count in result.summary.total_competitor_refdomains.items():
            summary_table.add_row(f"  {comp} Refdomains", str(count))

        console.print(summary_table)

    # Category breakdown
    if result.summary and result.summary.category_breakdown:
        cat_table = Table(title="\nCategory Breakdown", show_header=True, header_style="bold magenta")
        cat_table.add_column("Category", style="cyan")
        cat_table.add_column("Count", style="green")
        for cat, count in result.summary.category_breakdown.items():
            cat_table.add_row(cat, str(count))
        console.print(cat_table)

    # Top opportunities
    if result.top_opportunities:
        opp_table = Table(
            title=f"\nTop Opportunities (showing {min(25, len(result.top_opportunities))})",
            show_header=True,
            header_style="bold magenta",
        )
        opp_table.add_column("Domain", style="cyan", max_width=35)
        opp_table.add_column("DR", style="green", justify="right")
        opp_table.add_column("Category", style="yellow")
        opp_table.add_column("Comps", justify="right")
        opp_table.add_column("Score", style="bold green", justify="right")
        opp_table.add_column("Feasibility", justify="right")
        opp_table.add_column("Impact", justify="right")

        for opp in result.top_opportunities[:25]:
            opp_table.add_row(
                opp.domain[:35],
                str(int(opp.dr)),
                opp.category,
                str(opp.competitor_count),
                f"{opp.overall_score:.1f}",
                f"{opp.feasibility_score:.0f}",
                f"{opp.impact_score:.0f}",
            )
        console.print(opp_table)

    # Issues
    if result.issues:
        console.print("\n[bold red]Issues:[/bold red]")
        for issue in result.issues:
            icon_map = {"error": "[red]ERROR[/red]", "warning": "[yellow]WARN[/yellow]", "info": "[blue]INFO[/blue]"}
            icon = icon_map.get(issue["type"], "[dim]INFO[/dim]")
            console.print(f"  {icon} [{issue['category']}] {issue['message']}")

    # Recommendations
    if result.recommendations:
        console.print("\n[bold green]Recommendations:[/bold green]")
        for i, rec in enumerate(result.recommendations, 1):
            console.print(f"  {i}. {rec}")

    console.print()


def result_to_dict(result: LinkGapResult) -> dict[str, Any]:
    """Convert a gap result to a JSON-serializable dict."""
    return {
        "target_url": result.target_url,
        "target_domain": result.target_domain,
        "target_dr": result.target_dr,
        "competitor_urls": result.competitor_urls,
        "competitor_domains": result.competitor_domains,
        "summary": asdict(result.summary) if result.summary else None,
        "opportunities": [asdict(o) for o in result.opportunities],
        "top_opportunities": [asdict(o) for o in result.top_opportunities],
        "issues": result.issues,
        "recommendations": result.recommendations,
        "timestamp": result.timestamp,
    }


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def parse_args() -> argparse.Namespace:
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Link Gap Finder - Identify link building opportunities vs competitors",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python link_gap_finder.py --target https://example.com --competitor https://comp1.com --json
  python link_gap_finder.py --target https://example.com --competitor https://comp1.com --competitor https://comp2.com --min-dr 30 --json
  python link_gap_finder.py --target https://example.com --competitor https://comp1.com --country kr --output gap_report.json
""",
    )
    parser.add_argument("--target", required=True, help="Target URL or domain")
    parser.add_argument(
        "--competitor", action="append", required=True,
        help="Competitor URL or domain (can be repeated)",
    )
    parser.add_argument(
        "--min-dr", type=float, default=0,
        help="Minimum DR filter for opportunities (default: 0)",
    )
    parser.add_argument(
        "--country", default="",
        help="Filter by country code (e.g., kr, us, jp)",
    )
    parser.add_argument(
        "--limit", type=int, default=1000,
        help="Max referring domains to fetch per site (default: 1000)",
    )
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", "-o", help="Save output to file")
    return parser.parse_args()


async def main() -> None:
    """Main entry point."""
    args = parse_args()

    finder = LinkGapFinder()

    try:
        result = await finder.analyze(
            target_url=args.target,
            competitor_urls=args.competitor,
            min_dr=args.min_dr,
            country_filter=args.country,
            limit=args.limit,
        )

        if args.json or args.output:
            output_data = result_to_dict(result)
            json_str = json.dumps(output_data, indent=2, ensure_ascii=False)

            if args.output:
                with open(args.output, "w", encoding="utf-8") as f:
                    f.write(json_str)
                logger.info(f"Report saved to {args.output}")

            if args.json:
                print(json_str)
        else:
            format_rich_output(result)

        finder.print_stats()

    except KeyboardInterrupt:
        logger.warning("Analysis interrupted by user")
        sys.exit(1)
    except Exception as e:
        logger.error(f"Analysis failed: {e}")
        if args.json:
            print(json.dumps({"error": str(e)}, indent=2))
        sys.exit(1)


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,8 @@
# 22-seo-link-building dependencies
requests>=2.31.0
aiohttp>=3.9.0
pandas>=2.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
116 custom-skills/22-seo-link-building/desktop/SKILL.md Normal file
@@ -0,0 +1,116 @@
---
name: seo-link-building
description: |
  Link building diagnosis and backlink analysis tool.
  Triggers: backlink audit, link building, referring domains, toxic links, link gap, broken backlinks, 백링크 분석, 링크빌딩.
---

# SEO Link Building Diagnosis

## Purpose

Analyze backlink profiles, detect toxic links, find competitor link gaps, track link velocity, and map Korean platform links. Provides actionable link building recommendations.

## Core Capabilities

1. **Backlink Profile Audit** - DR, referring domains, dofollow ratio, anchor distribution
2. **Toxic Link Detection** - PBN patterns, spam domains, link farm identification
3. **Competitor Link Gap Analysis** - Domains linking to competitors but not to the target
4. **Link Velocity Tracking** - New/lost referring domains over time
5. **Broken Backlink Recovery** - Find and reclaim broken high-DR backlinks
6. **Korean Platform Mapping** - Naver Blog, Cafe, Tistory, Brunch, Korean news

## MCP Tool Usage

### Ahrefs for Backlink Data
```
mcp__ahrefs__site-explorer-all-backlinks: Get all backlinks for a target
mcp__ahrefs__site-explorer-backlinks-stats: Backlink statistics overview
mcp__ahrefs__site-explorer-referring-domains: List referring domains
mcp__ahrefs__site-explorer-anchors: Anchor text distribution
mcp__ahrefs__site-explorer-broken-backlinks: Find broken backlinks
mcp__ahrefs__site-explorer-domain-rating: Get Domain Rating
mcp__ahrefs__site-explorer-domain-rating-history: DR trend over time
mcp__ahrefs__site-explorer-refdomains-history: Referring domains trend
mcp__ahrefs__site-explorer-linked-domains: Domains linked from target
```

### Notion for Report Storage
```
mcp__notion__notion-create-pages: Save audit report to SEO Audit Log
mcp__notion__notion-update-page: Update existing audit entries
```

## Workflow

### 1. Backlink Profile Audit

1. Fetch Domain Rating via `site-explorer-domain-rating`
2. Get backlink stats via `site-explorer-backlinks-stats`
3. Retrieve referring domains via `site-explorer-referring-domains`
4. Analyze anchor distribution via `site-explorer-anchors`
5. Detect toxic links (PBN patterns, spam keywords, suspicious TLDs)
6. Map Korean platform links from referring domains
7. Report with issues and recommendations
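Step 5 can be sketched as a rule-based scorer over each referring domain. The keyword and TLD lists, weights, and DR threshold below are illustrative assumptions; a real audit tunes them per niche:

```python
# Illustrative heuristics for toxic-link detection; lists and weights are
# assumptions, not fixed rules.
SPAM_KEYWORDS = ("casino", "viagra", "loan", "betting")
SUSPICIOUS_TLDS = (".xyz", ".top", ".click", ".loan")


def toxicity_score(domain: str, dr: int, dofollow_ratio: float) -> int:
    """Return a 0-100 risk score for a referring domain."""
    score = 0
    if any(k in domain for k in SPAM_KEYWORDS):
        score += 40
    if domain.endswith(SUSPICIOUS_TLDS):
        score += 30
    if dr < 5:                 # very weak domains often sit in link farms
        score += 20
    if dofollow_ratio > 0.95:  # unnaturally high dofollow share (PBN pattern)
        score += 10
    return min(score, 100)
```

Domains scoring above a chosen cutoff feed the "Toxic Links (Top 10)" table in the report, with each triggered rule recorded as the reason.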
### 2. Link Gap Analysis

1. Fetch target referring domains
2. Fetch competitor referring domains (parallel)
3. Compute set difference (competitor - target)
4. Score opportunities by DR, traffic, category
5. Categorize sources (news, blog, forum, directory, Korean platform)
6. Rank by feasibility and impact
7. Report top opportunities with recommendations
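Steps 2-4 reduce to a set difference plus a scoring pass. This is a minimal sketch assuming referring-domain maps keyed by domain with `(DR, monthly_traffic)` values; the `dr + traffic/1000` scoring formula is an illustrative assumption:

```python
# Illustrative sketch: domains linking to any competitor but not the target,
# scored and sorted. Input maps: {domain: (dr, monthly_traffic)}.
def link_gap(target: dict, competitors: list[dict]) -> list[dict]:
    """Return scored link-gap opportunities, best first."""
    gaps: dict[str, float] = {}
    for comp in competitors:
        for domain, (dr, traffic) in comp.items():
            if domain not in target:
                # Keep the best score seen across competitors.
                gaps[domain] = max(gaps.get(domain, 0.0), dr + traffic / 1000)
    return sorted(
        ({"domain": d, "score": round(s, 1)} for d, s in gaps.items()),
        key=lambda row: row["score"],
        reverse=True,
    )
```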
### 3. Link Velocity Check

1. Fetch refdomains-history for last 90 days
2. Calculate new/lost referring domains per period
3. Determine velocity trend (growing/stable/declining)
4. Flag declining velocity as issue
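Steps 2-3 can be sketched over the refdomains-history series. Assuming a date-ordered list of referring-domain counts (one per snapshot), with an illustrative 2% tolerance band for "stable":

```python
# Illustrative sketch: classify referring-domain velocity from a
# date-ordered list of counts (oldest first).
def velocity_trend(history: list[int], tolerance: float = 0.02) -> str:
    """Return 'growing', 'stable', or 'declining' for a count series."""
    if len(history) < 2:
        return "stable"
    first, last = history[0], history[-1]
    change = (last - first) / max(first, 1)
    if change > tolerance:
        return "growing"
    if change < -tolerance:
        return "declining"
    return "stable"
```

A `"declining"` result is what the workflow flags as an issue in step 4.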
### 4. Broken Backlink Recovery

1. Fetch broken backlinks via `site-explorer-broken-backlinks`
2. Sort by DR (highest value first)
3. Recommend 301 redirects or content recreation
## Output Format

```markdown
## Link Building Audit: [domain]

### Overview
- Domain Rating: [DR]
- Referring Domains: [count]
- Dofollow Ratio: [ratio]
- Toxic Links: [count] ([risk level])

### Anchor Distribution
| Type | Count | % |
|------|-------|---|
| Branded | [n] | [%] |
| Exact Match | [n] | [%] |
| Generic | [n] | [%] |
| Naked URL | [n] | [%] |

### Toxic Links (Top 10)
| Domain | Risk Score | Reason |
|--------|-----------|--------|

### Korean Platform Links
| Platform | Count |
|----------|-------|

### Link Velocity
| Period | New | Lost |
|--------|-----|------|

### Recommendations
1. [Priority actions]
```
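The anchor-distribution buckets in the template can be derived from `site-explorer-anchors` output with a small classifier. A minimal sketch, assuming the brand name and target-keyword set are supplied by the caller:

```python
import re


def classify_anchor(anchor: str, brand: str, target_keywords: set[str]) -> str:
    """Bucket an anchor text as naked_url, branded, exact_match, or generic."""
    text = anchor.strip().lower()
    if re.match(r"https?://|www\.", text):
        return "naked_url"
    if brand.lower() in text:
        return "branded"
    if text in target_keywords:
        return "exact_match"
    return "generic"
```

Counting each bucket over all anchors, weighted by referring-domain count, fills in the Count and % columns.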
## Notion Output (Required)

All audit reports MUST be saved to OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (Link Building), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: LINK-YYYYMMDD-NNN
custom-skills/22-seo-link-building/desktop/skill.yaml (new file, 8 lines)
@@ -0,0 +1,8 @@
name: seo-link-building
description: |
  Link building diagnosis and backlink analysis. Triggers: backlink audit, link building, referring domains, toxic links, link gap, broken backlinks.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
custom-skills/22-seo-link-building/desktop/tools/ahrefs.md (new file, 70 lines)
@@ -0,0 +1,70 @@
# Ahrefs

## Tools Used

### site-explorer-all-backlinks
- **Purpose**: Get all backlinks for a target domain
- **Parameters**: target, mode (domain/prefix/url), limit, order_by
- **Returns**: List of backlinks with source URL, domain, DR, anchor, dofollow status

### site-explorer-backlinks-stats
- **Purpose**: Backlink statistics overview
- **Parameters**: target, mode
- **Returns**: Total backlinks, referring domains, dofollow/nofollow counts

### site-explorer-referring-domains
- **Purpose**: List all referring domains
- **Parameters**: target, mode, limit, order_by
- **Returns**: Domains with DR, backlinks count, traffic, country

### site-explorer-anchors
- **Purpose**: Anchor text distribution
- **Parameters**: target, mode, limit, order_by
- **Returns**: Anchor texts with backlink and referring domain counts

### site-explorer-broken-backlinks
- **Purpose**: Find broken backlinks for recovery
- **Parameters**: target, mode, limit, order_by
- **Returns**: Broken links with source, target URL, HTTP code, DR

### site-explorer-domain-rating
- **Purpose**: Get Domain Rating for a target
- **Parameters**: target
- **Returns**: Domain Rating value and Ahrefs rank

### site-explorer-domain-rating-history
- **Purpose**: DR trend over time
- **Parameters**: target, date_from
- **Returns**: Historical DR data points

### site-explorer-refdomains-history
- **Purpose**: Referring domains trend over time
- **Parameters**: target, mode, date_from
- **Returns**: Historical referring domain counts

### site-explorer-linked-domains
- **Purpose**: Domains linked from the target
- **Parameters**: target, mode, limit
- **Returns**: Outgoing linked domains with counts

## Configuration

- Ahrefs MCP tools are available via the `mcp__ahrefs__*` prefix
- No API key needed when using MCP (handled by tool server)
- Rate limits: follow Ahrefs plan limits (typically 500 rows/request)

## Examples

```
# Get backlink stats
mcp__ahrefs__site-explorer-backlinks-stats(target="example.com", mode="domain")

# Get referring domains sorted by DR
mcp__ahrefs__site-explorer-referring-domains(target="example.com", mode="domain", limit=500, order_by="domain_rating:desc")

# Get anchor text distribution
mcp__ahrefs__site-explorer-anchors(target="example.com", mode="domain", limit=200)

# Find broken backlinks
mcp__ahrefs__site-explorer-broken-backlinks(target="example.com", mode="domain", limit=100)
```
custom-skills/22-seo-link-building/desktop/tools/notion.md (new file, 39 lines)
@@ -0,0 +1,39 @@
# Notion

## Tools Used

### notion-create-pages
- **Purpose**: Save link building audit reports to SEO Audit Log
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Required Properties**:
  - Issue (title): Report title in Korean with date
  - Site (url): Audited website URL
  - Category (select): "Link Building"
  - Priority (select): Critical / High / Medium / Low
  - Found Date (date): YYYY-MM-DD
  - Audit ID (rich_text): LINK-YYYYMMDD-NNN

### notion-update-page
- **Purpose**: Update existing audit entries with follow-up findings

## Configuration

- Notion MCP tools available via `mcp__notion__*` prefix
- Authentication handled by MCP tool server

## Examples

```
# Create a link building audit report
mcp__notion__notion-create-pages(
    parent={"database_id": "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"},
    properties={
        "Issue": {"title": [{"text": {"content": "백링크 프로필 분석 - example.com (2025-01-15)"}}]},
        "Site": {"url": "https://example.com"},
        "Category": {"select": {"name": "Link Building"}},
        "Priority": {"select": {"name": "High"}},
        "Found Date": {"date": {"start": "2025-01-15"}},
        "Audit ID": {"rich_text": [{"text": {"content": "LINK-20250115-001"}}]}
    }
)
```
@@ -0,0 +1,24 @@
# WebSearch

## Tools Used

### WebSearch
- **Purpose**: Research link building strategies, competitor insights, and industry best practices
- **Usage**: Supplement Ahrefs data with web research for context

### WebFetch
- **Purpose**: Fetch specific web pages for content analysis and link prospecting
- **Usage**: Verify link opportunities, check page content relevance

## Examples

```
# Research link building strategies for a niche
WebSearch("link building strategies for SaaS companies 2025")

# Research Korean link building opportunities
WebSearch("네이버 블로그 백링크 전략 2025")

# Check if a target page is relevant for outreach
WebFetch("https://example.com/resources", "What topics does this page cover?")
```
custom-skills/23-seo-content-strategy/code/CLAUDE.md (new file, 142 lines)
@@ -0,0 +1,142 @@
# CLAUDE.md

## Overview

Content strategy tool for SEO-driven content planning. Performs content inventory via sitemap crawl and Ahrefs top pages, scores content performance, detects content decay, analyzes topic gaps vs competitors, maps topic clusters, and generates content briefs. Supports Korean content patterns (Naver Blog format, review/후기 content).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Content audit
python scripts/content_auditor.py --url https://example.com --json

# Content gap analysis
python scripts/content_gap_analyzer.py --target https://example.com --competitor https://competitor.com --json

# Generate content brief
python scripts/content_brief_generator.py --keyword "치과 임플란트 비용" --url https://example.com --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `content_auditor.py` | Content inventory, performance scoring, decay detection | Content inventory with scores and decay flags |
| `content_gap_analyzer.py` | Topic gap analysis and cluster mapping vs competitors | Missing topics, cluster map, editorial calendar |
| `content_brief_generator.py` | Generate SEO content briefs with outlines | Brief with outline, keywords, word count targets |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Content Auditor

```bash
# Full content audit
python scripts/content_auditor.py --url https://example.com --json

# Detect decaying content
python scripts/content_auditor.py --url https://example.com --decay --json

# Filter by content type
python scripts/content_auditor.py --url https://example.com --type blog --json
```

**Capabilities**:
- Content inventory via sitemap crawl + Ahrefs top-pages
- Performance scoring (traffic, rankings, backlinks)
- Content decay detection (pages losing traffic over time)
- Content type classification (blog, product, service, landing, resource)
- Word count and freshness assessment
- Korean content format analysis (Naver Blog style, 후기/review content)
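Content decay detection reduces to comparing traffic snapshots over time. A minimal sketch, assuming monthly traffic counts ordered oldest-first; the -20% threshold is an illustrative default, not the script's actual cutoff:

```python
# Illustrative sketch of decay detection over monthly traffic snapshots
# (oldest first). The -20% threshold is an assumed default.
def decay_rate(monthly_traffic: list[int]) -> float:
    """Fractional traffic change from the first period to the last."""
    if len(monthly_traffic) < 2 or monthly_traffic[0] == 0:
        return 0.0
    return (monthly_traffic[-1] - monthly_traffic[0]) / monthly_traffic[0]


def is_decaying(monthly_traffic: list[int], threshold: float = -0.20) -> bool:
    """Flag a page whose traffic fell by more than the threshold."""
    return decay_rate(monthly_traffic) < threshold
```

Pages flagged this way populate the `decaying_content` list in the audit output, with `decay_rate` stored alongside.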
## Content Gap Analyzer

```bash
# Gap analysis vs competitor
python scripts/content_gap_analyzer.py --target https://example.com --competitor https://comp1.com --json

# With topic cluster mapping
python scripts/content_gap_analyzer.py --target https://example.com --competitor https://comp1.com --clusters --json
```

**Capabilities**:
- Topic gap identification vs competitors
- Topic cluster mapping (pillar + cluster pages)
- Content freshness comparison
- Content volume comparison
- Editorial calendar generation with priority scoring
- Korean content opportunity detection

## Content Brief Generator

```bash
# Generate brief for keyword
python scripts/content_brief_generator.py --keyword "치과 임플란트 비용" --url https://example.com --json

# With competitor analysis
python scripts/content_brief_generator.py --keyword "dental implant cost" --url https://example.com --competitors 5 --json
```

**Capabilities**:
- Content outline generation with H2/H3 structure
- Target keyword list (primary + secondary + LSI)
- Word count recommendation based on top-ranking pages
- Competitor content analysis (structure, word count, topics covered)
- Internal linking suggestions
- Korean content format recommendations
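The word-count recommendation can be sketched as a statistic over the top-ranking pages' word counts. Both the median-plus-10% target and the 1500-word fallback below are illustrative assumptions:

```python
import statistics


# Illustrative sketch: target word count from competitor page lengths.
# The median * 1.1 target and the 1500-word fallback are assumptions.
def recommend_word_count(competitor_word_counts: list[int]) -> int:
    """Suggest a target word count slightly above the competitor median."""
    if not competitor_word_counts:
        return 1500  # assumed default when no competitor data is available
    return round(statistics.median(competitor_word_counts) * 1.1)
```

The median resists outliers (one 10,000-word pillar page will not inflate the target the way a mean would).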
## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-top-pages` | Get top performing pages |
| `site-explorer-pages-by-traffic` | Pages ranked by organic traffic |
| `site-explorer-organic-keywords` | Keywords per page |
| `site-explorer-organic-competitors` | Find content competitors |
| `site-explorer-best-by-external-links` | Best content by links |

## Output Format

```json
{
  "url": "https://example.com",
  "content_inventory": {
    "total_pages": 150,
    "by_type": {"blog": 80, "product": 40, "service": 20, "other": 10},
    "avg_performance_score": 45
  },
  "decaying_content": [...],
  "top_performers": [...],
  "gaps": [...],
  "clusters": [...],
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Content Strategy |
| Priority | Select | Based on gap severity |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: CONTENT-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is
- URLs and code remain unchanged
@@ -0,0 +1,207 @@
"""
|
||||||
|
Base Client - Shared async client utilities
|
||||||
|
===========================================
|
||||||
|
Purpose: Rate-limited async operations for API clients
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
from asyncio import Semaphore
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any, Callable, TypeVar
|
||||||
|
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
from tenacity import (
|
||||||
|
retry,
|
||||||
|
stop_after_attempt,
|
||||||
|
wait_exponential,
|
||||||
|
retry_if_exception_type,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Load environment variables
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
# Logging setup
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
|
||||||
|
T = TypeVar("T")
|
||||||
|
|
||||||
|
|
||||||
|
class RateLimiter:
|
||||||
|
"""Rate limiter using token bucket algorithm."""
|
||||||
|
|
||||||
|
def __init__(self, rate: float, per: float = 1.0):
|
||||||
|
"""
|
||||||
|
Initialize rate limiter.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
rate: Number of requests allowed
|
||||||
|
per: Time period in seconds (default: 1 second)
|
||||||
|
"""
|
||||||
|
self.rate = rate
|
||||||
|
self.per = per
|
||||||
|
self.tokens = rate
|
||||||
|
self.last_update = datetime.now()
|
||||||
|
self._lock = asyncio.Lock()
|
||||||
|
|
||||||
|
async def acquire(self) -> None:
|
||||||
|
"""Acquire a token, waiting if necessary."""
|
||||||
|
async with self._lock:
|
||||||
|
now = datetime.now()
|
||||||
|
elapsed = (now - self.last_update).total_seconds()
|
||||||
|
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
|
||||||
|
self.last_update = now
|
||||||
|
|
||||||
|
if self.tokens < 1:
|
||||||
|
wait_time = (1 - self.tokens) * (self.per / self.rate)
|
||||||
|
await asyncio.sleep(wait_time)
|
||||||
|
self.tokens = 0
|
||||||
|
else:
|
||||||
|
self.tokens -= 1
|
||||||
|
|
||||||
|
|
||||||
|
class BaseAsyncClient:
|
||||||
|
"""Base class for async API clients with rate limiting."""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
max_concurrent: int = 5,
|
||||||
|
requests_per_second: float = 3.0,
|
||||||
|
logger: logging.Logger | None = None,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Initialize base client.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
max_concurrent: Maximum concurrent requests
|
||||||
|
requests_per_second: Rate limit
|
||||||
|
logger: Logger instance
|
||||||
|
"""
|
||||||
|
self.semaphore = Semaphore(max_concurrent)
|
||||||
|
self.rate_limiter = RateLimiter(requests_per_second)
|
||||||
|
self.logger = logger or logging.getLogger(self.__class__.__name__)
|
||||||
|
self.stats = {
|
||||||
|
"requests": 0,
|
||||||
|
"success": 0,
|
||||||
|
"errors": 0,
|
||||||
|
"retries": 0,
|
||||||
|
}
|
||||||
|
|
||||||
|
@retry(
|
||||||
|
stop=stop_after_attempt(3),
|
||||||
|
wait=wait_exponential(multiplier=1, min=2, max=10),
|
||||||
|
retry=retry_if_exception_type(Exception),
|
||||||
|
)
|
||||||
|
async def _rate_limited_request(
|
||||||
|
self,
|
||||||
|
coro: Callable[[], Any],
|
||||||
|
) -> Any:
|
||||||
|
"""Execute a request with rate limiting and retry."""
|
||||||
|
async with self.semaphore:
|
||||||
|
await self.rate_limiter.acquire()
|
||||||
|
self.stats["requests"] += 1
|
||||||
|
try:
|
||||||
|
result = await coro()
|
||||||
|
self.stats["success"] += 1
|
||||||
|
return result
|
||||||
|
except Exception as e:
|
||||||
|
self.stats["errors"] += 1
|
||||||
|
self.logger.error(f"Request failed: {e}")
|
||||||
|
raise
|
||||||
|
|
||||||
|
async def batch_requests(
|
||||||
|
self,
|
||||||
|
requests: list[Callable[[], Any]],
|
||||||
|
desc: str = "Processing",
|
||||||
|
) -> list[Any]:
|
||||||
|
"""Execute multiple requests concurrently."""
|
||||||
|
try:
|
||||||
|
from tqdm.asyncio import tqdm
|
||||||
|
has_tqdm = True
|
||||||
|
except ImportError:
|
||||||
|
has_tqdm = False
|
||||||
|
|
||||||
|
async def execute(req: Callable) -> Any:
|
||||||
|
try:
|
||||||
|
return await self._rate_limited_request(req)
|
||||||
|
except Exception as e:
|
||||||
|
return {"error": str(e)}
|
||||||
|
|
||||||
|
tasks = [execute(req) for req in requests]
|
||||||
|
|
||||||
|
if has_tqdm:
|
||||||
|
results = []
|
||||||
|
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
|
||||||
|
result = await coro
|
||||||
|
results.append(result)
|
||||||
|
return results
|
||||||
|
else:
|
||||||
|
return await asyncio.gather(*tasks, return_exceptions=True)
|
||||||
|
|
||||||
|
def print_stats(self) -> None:
|
||||||
|
"""Print request statistics."""
|
||||||
|
self.logger.info("=" * 40)
|
||||||
|
self.logger.info("Request Statistics:")
|
||||||
|
self.logger.info(f" Total Requests: {self.stats['requests']}")
|
||||||
|
self.logger.info(f" Successful: {self.stats['success']}")
|
||||||
|
self.logger.info(f" Errors: {self.stats['errors']}")
|
||||||
|
self.logger.info("=" * 40)
|
||||||
|
|
||||||
|
|
||||||
|
class ConfigManager:
|
||||||
|
"""Manage API configuration and credentials."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
@property
|
||||||
|
def google_credentials_path(self) -> str | None:
|
||||||
|
"""Get Google service account credentials path."""
|
||||||
|
# Prefer SEO-specific credentials, fallback to general credentials
|
||||||
|
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
|
||||||
|
if os.path.exists(seo_creds):
|
||||||
|
return seo_creds
|
||||||
|
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def pagespeed_api_key(self) -> str | None:
|
||||||
|
"""Get PageSpeed Insights API key."""
|
||||||
|
return os.getenv("PAGESPEED_API_KEY")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def custom_search_api_key(self) -> str | None:
|
||||||
|
"""Get Custom Search API key."""
|
||||||
|
return os.getenv("CUSTOM_SEARCH_API_KEY")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def custom_search_engine_id(self) -> str | None:
|
||||||
|
"""Get Custom Search Engine ID."""
|
||||||
|
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def notion_token(self) -> str | None:
|
||||||
|
"""Get Notion API token."""
|
||||||
|
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
|
||||||
|
|
||||||
|
def validate_google_credentials(self) -> bool:
|
||||||
|
"""Validate Google credentials are configured."""
|
||||||
|
creds_path = self.google_credentials_path
|
||||||
|
if not creds_path:
|
||||||
|
return False
|
||||||
|
return os.path.exists(creds_path)
|
||||||
|
|
||||||
|
def get_required(self, key: str) -> str:
|
||||||
|
"""Get required environment variable or raise error."""
|
||||||
|
value = os.getenv(key)
|
||||||
|
if not value:
|
||||||
|
raise ValueError(f"Missing required environment variable: {key}")
|
||||||
|
return value
|
||||||
|
|
||||||
|
|
||||||
|
# Singleton config instance
|
||||||
|
config = ConfigManager()
|
||||||
@@ -0,0 +1,716 @@
"""
|
||||||
|
Content Auditor - SEO Content Inventory & Performance Analysis
|
||||||
|
==============================================================
|
||||||
|
Purpose: Build content inventory, score performance, detect decay,
|
||||||
|
classify content types, and analyze Korean content patterns.
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from dataclasses import asdict, dataclass, field
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
from typing import Any
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
import requests
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Data classes
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ContentPage:
|
||||||
|
"""Single content page with performance metrics."""
|
||||||
|
url: str
|
||||||
|
title: str = ""
|
||||||
|
content_type: str = "other"
|
||||||
|
word_count: int = 0
|
||||||
|
traffic: int = 0
|
||||||
|
keywords_count: int = 0
|
||||||
|
backlinks: int = 0
|
||||||
|
performance_score: float = 0.0
|
||||||
|
last_modified: str = ""
|
||||||
|
is_decaying: bool = False
|
||||||
|
decay_rate: float = 0.0
|
||||||
|
korean_pattern: str = ""
|
||||||
|
topics: list[str] = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ContentInventory:
|
||||||
|
"""Aggregated content inventory summary."""
|
||||||
|
total_pages: int = 0
|
||||||
|
by_type: dict[str, int] = field(default_factory=dict)
|
||||||
|
avg_performance_score: float = 0.0
|
||||||
|
avg_word_count: float = 0.0
|
||||||
|
pages: list[ContentPage] = field(default_factory=list)
|
||||||
|
freshness_distribution: dict[str, int] = field(default_factory=dict)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ContentAuditResult:
|
||||||
|
"""Full content audit result."""
|
||||||
|
url: str
|
||||||
|
timestamp: str = ""
|
||||||
|
content_inventory: ContentInventory = field(default_factory=ContentInventory)
|
||||||
|
top_performers: list[ContentPage] = field(default_factory=list)
|
||||||
|
decaying_content: list[ContentPage] = field(default_factory=list)
|
||||||
|
korean_content_analysis: dict[str, Any] = field(default_factory=dict)
|
||||||
|
recommendations: list[str] = field(default_factory=list)
|
||||||
|
errors: list[str] = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# URL pattern rules for content type classification
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
CONTENT_TYPE_PATTERNS = {
|
||||||
|
"blog": [
|
||||||
|
r"/blog/", r"/post/", r"/posts/", r"/article/", r"/articles/",
|
||||||
|
r"/news/", r"/magazine/", r"/stories/", r"/insights/",
|
||||||
|
r"/블로그/", r"/소식/", r"/뉴스/",
|
||||||
|
],
|
||||||
|
"product": [
|
||||||
|
r"/product/", r"/products/", r"/shop/", r"/store/",
|
||||||
|
r"/item/", r"/goods/", r"/catalog/",
|
||||||
|
r"/제품/", r"/상품/", r"/쇼핑/",
|
||||||
|
],
|
||||||
|
"service": [
|
||||||
|
r"/service/", r"/services/", r"/solutions/", r"/offering/",
|
||||||
|
r"/진료/", r"/서비스/", r"/시술/", r"/치료/",
|
||||||
|
],
|
||||||
|
"landing": [
|
||||||
|
r"/lp/", r"/landing/", r"/campaign/", r"/promo/",
|
||||||
|
r"/event/", r"/이벤트/", r"/프로모션/",
|
||||||
|
],
|
||||||
|
"resource": [
|
||||||
|
r"/resource/", r"/resources/", r"/guide/", r"/guides/",
|
||||||
|
r"/whitepaper/", r"/ebook/", r"/download/", r"/faq/",
|
||||||
|
r"/help/", r"/support/", r"/가이드/", r"/자료/",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
KOREAN_CONTENT_PATTERNS = {
|
||||||
|
"naver_blog_style": [
|
||||||
|
r"후기", r"리뷰", r"체험", r"솔직후기", r"방문후기",
|
||||||
|
r"사용후기", r"이용후기",
|
||||||
|
],
|
||||||
|
"listicle": [
|
||||||
|
r"추천", r"베스트", r"TOP\s*\d+", r"\d+선", r"\d+가지",
|
||||||
|
r"모음", r"정리", r"비교",
|
||||||
|
],
|
||||||
|
"how_to": [
|
||||||
|
r"방법", r"하는\s*법", r"하는\s*방법", r"가이드",
|
||||||
|
r"따라하기", r"시작하기", r"알아보기",
|
||||||
|
],
|
||||||
|
"informational": [
|
||||||
|
r"이란", r"뜻", r"의미", r"차이", r"비교",
|
||||||
|
r"장단점", r"효과", r"부작용", r"비용", r"가격",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# ContentAuditor
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
class ContentAuditor(BaseAsyncClient):
|
||||||
|
"""Content auditor using Ahrefs API and sitemap crawling."""
|
||||||
|
|
||||||
|
def __init__(self, max_concurrent: int = 5, requests_per_second: float = 2.0):
|
||||||
|
super().__init__(max_concurrent=max_concurrent, requests_per_second=requests_per_second)
|
||||||
|
self.session: aiohttp.ClientSession | None = None
|
||||||
|
|
||||||
|
async def _ensure_session(self) -> aiohttp.ClientSession:
|
||||||
|
if self.session is None or self.session.closed:
|
||||||
|
timeout = aiohttp.ClientTimeout(total=30)
|
||||||
|
self.session = aiohttp.ClientSession(timeout=timeout)
|
||||||
|
return self.session
|
||||||
|
|
||||||
|
async def close(self) -> None:
|
||||||
|
if self.session and not self.session.closed:
|
||||||
|
await self.session.close()
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Ahrefs data retrieval
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
async def get_top_pages(self, url: str, limit: int = 100) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Retrieve top pages via Ahrefs site-explorer-top-pages.
|
||||||
|
|
||||||
|
Returns list of dicts with keys: url, traffic, keywords, value, top_keyword.
|
||||||
|
"""
|
||||||
|
self.logger.info(f"Fetching top pages from Ahrefs for {url}")
|
||||||
|
target = urlparse(url).netloc or url
|
||||||
|
try:
|
||||||
|
# Ahrefs MCP call: site-explorer-top-pages
|
||||||
|
# In MCP context this would be called by the agent.
|
||||||
|
# Standalone fallback: use REST API if AHREFS_API_KEY is set.
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
self.logger.warning("AHREFS_API_KEY not set; returning empty top pages")
|
||||||
|
return []
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/top-pages",
|
||||||
|
params={"target": target, "limit": limit, "select": "url,traffic,keywords,value,top_keyword"},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
pages = data.get("pages", data.get("items", []))
|
||||||
|
self.logger.info(f"Retrieved {len(pages)} top pages")
|
||||||
|
return pages
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Ahrefs top-pages lookup failed: {exc}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def get_pages_by_traffic(self, url: str, limit: int = 100) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Retrieve pages sorted by organic traffic via Ahrefs site-explorer-pages-by-traffic.
|
||||||
|
|
||||||
|
Returns list of dicts with keys: url, traffic, keywords, top_keyword.
|
||||||
|
"""
|
||||||
|
self.logger.info(f"Fetching pages-by-traffic from Ahrefs for {url}")
|
||||||
|
target = urlparse(url).netloc or url
|
||||||
|
try:
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
self.logger.warning("AHREFS_API_KEY not set; returning empty traffic pages")
|
||||||
|
return []
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/pages-by-traffic",
|
||||||
|
params={"target": target, "limit": limit, "select": "url,traffic,keywords,top_keyword"},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
pages = data.get("pages", data.get("items", []))
|
||||||
|
self.logger.info(f"Retrieved {len(pages)} pages by traffic")
|
||||||
|
return pages
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Ahrefs pages-by-traffic lookup failed: {exc}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
    # ------------------------------------------------------------------
    # Sitemap crawling
    # ------------------------------------------------------------------

    async def crawl_sitemap(self, url: str) -> list[str]:
        """Discover URLs from sitemap.xml."""
        sitemap_urls_to_try = [
            f"{url.rstrip('/')}/sitemap.xml",
            f"{url.rstrip('/')}/sitemap_index.xml",
            f"{url.rstrip('/')}/post-sitemap.xml",
        ]
        discovered: list[str] = []
        session = await self._ensure_session()

        for sitemap_url in sitemap_urls_to_try:
            try:
                async with session.get(sitemap_url) as resp:
                    if resp.status != 200:
                        continue
                    text = await resp.text()
                soup = BeautifulSoup(text, "lxml-xml")

                # Sitemap index
                sitemaps = soup.find_all("sitemap")
                if sitemaps:
                    for sm in sitemaps:
                        loc = sm.find("loc")
                        if loc:
                            child_urls = await self._parse_sitemap(session, loc.text.strip())
                            discovered.extend(child_urls)
                else:
                    urls = soup.find_all("url")
                    for u in urls:
                        loc = u.find("loc")
                        if loc:
                            discovered.append(loc.text.strip())

                if discovered:
                    self.logger.info(f"Discovered {len(discovered)} URLs from {sitemap_url}")
                    break
            except Exception as exc:
                self.logger.debug(f"Failed to fetch {sitemap_url}: {exc}")

        return list(set(discovered))

    async def _parse_sitemap(self, session: aiohttp.ClientSession, sitemap_url: str) -> list[str]:
        """Parse a single sitemap XML and return URLs."""
        urls: list[str] = []
        try:
            async with session.get(sitemap_url) as resp:
                if resp.status != 200:
                    return urls
                text = await resp.text()
            soup = BeautifulSoup(text, "lxml-xml")
            for u in soup.find_all("url"):
                loc = u.find("loc")
                if loc:
                    urls.append(loc.text.strip())
        except Exception as exc:
            self.logger.debug(f"Failed to parse sitemap {sitemap_url}: {exc}")
        return urls

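Both sitemap helpers extract `<loc>` values with BeautifulSoup's `lxml-xml` parser. For readers without lxml installed, the same extraction can be sketched with the standard library; the sample XML and URLs below are illustrative only:

```python
import xml.etree.ElementTree as ET

SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
</urlset>"""

# Sitemap files are namespaced, so findall needs the namespace map.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(SITEMAP)
urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
print(urls)  # ['https://example.com/', 'https://example.com/blog/post-1']
```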
    # ------------------------------------------------------------------
    # Content type classification
    # ------------------------------------------------------------------

    @staticmethod
    def classify_content_type(url: str, title: str = "") -> str:
        """
        Classify content type based on URL path patterns and title.

        Returns one of: blog, product, service, landing, resource, other.
        """
        combined = f"{url.lower()} {title.lower()}"
        scores: dict[str, int] = {}

        for ctype, patterns in CONTENT_TYPE_PATTERNS.items():
            score = 0
            for pattern in patterns:
                if re.search(pattern, combined, re.IGNORECASE):
                    score += 1
            if score > 0:
                scores[ctype] = score

        if not scores:
            return "other"
        return max(scores, key=scores.get)

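`classify_content_type` depends on the module-level `CONTENT_TYPE_PATTERNS` table defined earlier in the file. A minimal standalone sketch of the same scoring loop, using a hypothetical two-type pattern table:

```python
import re

# Hypothetical patterns for illustration; the module's real table is CONTENT_TYPE_PATTERNS.
SAMPLE_PATTERNS = {
    "blog": [r"/blog/", r"/post/", r"guide"],
    "product": [r"/product/", r"/shop/", r"\bbuy\b"],
}

def classify(url: str, title: str = "") -> str:
    combined = f"{url.lower()} {title.lower()}"
    # Count how many patterns of each type match, keep only non-zero scores.
    scores = {
        ctype: sum(1 for p in patterns if re.search(p, combined, re.IGNORECASE))
        for ctype, patterns in SAMPLE_PATTERNS.items()
    }
    scores = {k: v for k, v in scores.items() if v > 0}
    return max(scores, key=scores.get) if scores else "other"

print(classify("https://example.com/blog/seo-guide"))  # blog
print(classify("https://example.com/about"))           # other
```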
    # ------------------------------------------------------------------
    # Performance scoring
    # ------------------------------------------------------------------

    @staticmethod
    def score_performance(page: ContentPage) -> float:
        """
        Compute composite performance score (0-100) from traffic, keywords, backlinks.

        Weights:
        - Traffic: 50% (log-scaled, 10k+ traffic = max)
        - Keywords count: 30% (log-scaled, 500+ = max)
        - Backlinks: 20% (log-scaled, 100+ = max)
        """
        import math

        traffic_score = min(100, (math.log10(max(page.traffic, 1)) / math.log10(10000)) * 100)
        keywords_score = min(100, (math.log10(max(page.keywords_count, 1)) / math.log10(500)) * 100)
        backlinks_score = min(100, (math.log10(max(page.backlinks, 1)) / math.log10(100)) * 100)

        composite = (traffic_score * 0.50) + (keywords_score * 0.30) + (backlinks_score * 0.20)
        return round(min(100, max(0, composite)), 1)

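A worked example of the log-scaled weighting in `score_performance`: 1,000 traffic against the 10k cap gives log10(1000)/log10(10000) = 0.75 of the maximum.

```python
import math

def log_scaled(value: int, cap: int) -> float:
    # Mirrors one term of score_performance: log-scaled, capped at 100.
    return min(100.0, (math.log10(max(value, 1)) / math.log10(cap)) * 100)

traffic_score = log_scaled(1000, 10_000)  # ~75.0
keywords_score = log_scaled(500, 500)     # 100.0 (exactly at the cap)
backlinks_score = log_scaled(1, 100)      # 0.0 (log10(1) = 0)

composite = traffic_score * 0.50 + keywords_score * 0.30 + backlinks_score * 0.20
print(round(composite, 1))  # 67.5
```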
    # ------------------------------------------------------------------
    # Content decay detection
    # ------------------------------------------------------------------

    @staticmethod
    def detect_decay(pages: list[ContentPage], threshold: float = -20.0) -> list[ContentPage]:
        """
        Flag pages with declining traffic trend.

        Uses a simple heuristic: pages with low performance score relative to
        their keyword count indicate potential decay. In production, historical
        traffic data from Ahrefs metrics-history would be used.

        Args:
            pages: List of content pages with metrics.
            threshold: Decay rate threshold (percentage decline).

        Returns:
            List of pages flagged as decaying.
        """
        decaying: list[ContentPage] = []
        for page in pages:
            # Heuristic: high keyword count but low traffic suggests decay
            if page.keywords_count > 10 and page.traffic < 50:
                page.is_decaying = True
                page.decay_rate = -50.0 if page.traffic == 0 else round(
                    -((page.keywords_count * 10 - page.traffic) / max(page.keywords_count * 10, 1)) * 100, 1
                )
                if page.decay_rate <= threshold:
                    decaying.append(page)
            elif page.performance_score < 20 and page.keywords_count > 5:
                page.is_decaying = True
                page.decay_rate = round(-max(30, 100 - page.performance_score * 2), 1)
                if page.decay_rate <= threshold:
                    decaying.append(page)

        decaying.sort(key=lambda p: p.decay_rate)
        return decaying

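The first branch of `detect_decay` treats roughly 10x the keyword count as the traffic a page "should" earn, and turns the shortfall into a negative decay rate. The formula in isolation:

```python
def decay_rate(keywords_count: int, traffic: int) -> float:
    # Same shape as detect_decay's first branch: shortfall vs ~10x keyword count.
    if traffic == 0:
        return -50.0
    expected = keywords_count * 10
    return round(-((expected - traffic) / max(expected, 1)) * 100, 1)

print(decay_rate(20, 10))  # -95.0  (expected 200, actual 10)
print(decay_rate(15, 0))   # -50.0  (zero-traffic floor)
```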
    # ------------------------------------------------------------------
    # Freshness assessment
    # ------------------------------------------------------------------

    @staticmethod
    def analyze_freshness(pages: list[ContentPage]) -> dict[str, int]:
        """
        Categorize pages by freshness based on last_modified dates.

        Returns distribution: fresh (< 3 months), aging (3-12 months),
        stale (> 12 months), unknown (no date).
        """
        now = datetime.now()
        distribution = {"fresh": 0, "aging": 0, "stale": 0, "unknown": 0}

        for page in pages:
            if not page.last_modified:
                distribution["unknown"] += 1
                continue
            try:
                # Normalize timezone suffix, then try common date formats
                raw = page.last_modified.replace("+00:00", "").replace("Z", "")
                for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
                    try:
                        modified = datetime.strptime(raw, fmt)
                        break
                    except ValueError:
                        continue
                else:
                    distribution["unknown"] += 1
                    continue

                age = now - modified
                if age < timedelta(days=90):
                    distribution["fresh"] += 1
                elif age < timedelta(days=365):
                    distribution["aging"] += 1
                else:
                    distribution["stale"] += 1
            except Exception:
                distribution["unknown"] += 1

        return distribution

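The bucketing in `analyze_freshness` reduces to a three-way age comparison. A minimal sketch with fixed, illustrative dates:

```python
from datetime import datetime, timedelta

def freshness_bucket(last_modified: datetime, now: datetime) -> str:
    # Same thresholds as analyze_freshness: 90 days, then 365 days.
    age = now - last_modified
    if age < timedelta(days=90):
        return "fresh"
    if age < timedelta(days=365):
        return "aging"
    return "stale"

now = datetime(2025, 6, 1)
print(freshness_bucket(datetime(2025, 5, 1), now))  # fresh
print(freshness_bucket(datetime(2024, 9, 1), now))  # aging
print(freshness_bucket(datetime(2023, 1, 1), now))  # stale
```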
    # ------------------------------------------------------------------
    # Korean content pattern identification
    # ------------------------------------------------------------------

    @staticmethod
    def identify_korean_patterns(pages: list[ContentPage]) -> dict[str, Any]:
        """
        Detect Korean content patterns across pages.

        Identifies Naver Blog style review content, listicles,
        how-to guides, and informational content patterns.

        Returns summary with counts and example URLs per pattern.
        """
        results: dict[str, Any] = {
            "total_korean_content": 0,
            "patterns": {},
        }

        korean_urls: set[str] = set()
        for pattern_name, keywords in KOREAN_CONTENT_PATTERNS.items():
            matches: list[dict[str, str]] = []
            for page in pages:
                combined = f"{page.url} {page.title}"
                for keyword in keywords:
                    if re.search(keyword, combined, re.IGNORECASE):
                        matches.append({"url": page.url, "title": page.title, "matched_keyword": keyword})
                        korean_urls.add(page.url)
                        break

            results["patterns"][pattern_name] = {
                "count": len(matches),
                "examples": matches[:5],
            }

        # Count all matched URLs, not just the truncated per-pattern examples
        results["total_korean_content"] = len(korean_urls)

        return results

    # ------------------------------------------------------------------
    # Orchestration
    # ------------------------------------------------------------------

    async def audit(
        self,
        url: str,
        detect_decay_flag: bool = False,
        content_type_filter: str | None = None,
        limit: int = 200,
    ) -> ContentAuditResult:
        """
        Run full content audit: inventory, scoring, decay, Korean patterns.

        Args:
            url: Target website URL.
            detect_decay_flag: Whether to run decay detection.
            content_type_filter: Filter by content type (blog, product, etc.).
            limit: Maximum pages to analyze.

        Returns:
            ContentAuditResult with inventory, top performers, decay, analysis.
        """
        result = ContentAuditResult(
            url=url,
            timestamp=datetime.now().isoformat(),
        )

        self.logger.info(f"Starting content audit for {url}")

        # 1. Gather pages from Ahrefs and sitemap
        top_pages_data, traffic_pages_data, sitemap_urls = await asyncio.gather(
            self.get_top_pages(url, limit=limit),
            self.get_pages_by_traffic(url, limit=limit),
            self.crawl_sitemap(url),
        )

        # 2. Merge and deduplicate pages
        page_map: dict[str, ContentPage] = {}

        for item in top_pages_data:
            page_url = item.get("url", "")
            if not page_url:
                continue
            page_map[page_url] = ContentPage(
                url=page_url,
                title=item.get("top_keyword", ""),
                traffic=int(item.get("traffic", 0)),
                keywords_count=int(item.get("keywords", 0)),
                backlinks=int(item.get("value", 0)),
            )

        for item in traffic_pages_data:
            page_url = item.get("url", "")
            if not page_url:
                continue
            if page_url in page_map:
                existing = page_map[page_url]
                existing.traffic = max(existing.traffic, int(item.get("traffic", 0)))
                existing.keywords_count = max(existing.keywords_count, int(item.get("keywords", 0)))
            else:
                page_map[page_url] = ContentPage(
                    url=page_url,
                    title=item.get("top_keyword", ""),
                    traffic=int(item.get("traffic", 0)),
                    keywords_count=int(item.get("keywords", 0)),
                )

        # Add sitemap URLs not already present
        for s_url in sitemap_urls:
            if s_url not in page_map:
                page_map[s_url] = ContentPage(url=s_url)

        # 3. Classify and score
        all_pages: list[ContentPage] = []
        for page in page_map.values():
            page.content_type = self.classify_content_type(page.url, page.title)
            page.performance_score = self.score_performance(page)
            all_pages.append(page)

        # 4. Filter by content type if requested
        if content_type_filter:
            all_pages = [p for p in all_pages if p.content_type == content_type_filter]

        # 5. Build inventory
        by_type: dict[str, int] = {}
        for page in all_pages:
            by_type[page.content_type] = by_type.get(page.content_type, 0) + 1

        avg_score = (
            sum(p.performance_score for p in all_pages) / len(all_pages)
            if all_pages else 0.0
        )
        avg_word_count = (
            sum(p.word_count for p in all_pages) / len(all_pages)
            if all_pages else 0.0
        )

        freshness = self.analyze_freshness(all_pages)

        result.content_inventory = ContentInventory(
            total_pages=len(all_pages),
            by_type=by_type,
            avg_performance_score=round(avg_score, 1),
            avg_word_count=round(avg_word_count, 1),
            pages=sorted(all_pages, key=lambda p: p.performance_score, reverse=True)[:limit],
            freshness_distribution=freshness,
        )

        # 6. Top performers
        result.top_performers = sorted(all_pages, key=lambda p: p.performance_score, reverse=True)[:20]

        # 7. Decay detection
        if detect_decay_flag:
            result.decaying_content = self.detect_decay(all_pages)

        # 8. Korean content analysis
        result.korean_content_analysis = self.identify_korean_patterns(all_pages)

        # 9. Recommendations
        result.recommendations = self._generate_recommendations(result)

        self.logger.info(
            f"Audit complete: {len(all_pages)} pages, "
            f"{len(result.top_performers)} top performers, "
            f"{len(result.decaying_content)} decaying"
        )

        return result

    @staticmethod
    def _generate_recommendations(result: ContentAuditResult) -> list[str]:
        """Generate actionable recommendations from audit data."""
        recs: list[str] = []
        inv = result.content_inventory

        # Low average score
        if inv.avg_performance_score < 30:
            recs.append(
                "전체 콘텐츠 평균 성과 점수가 낮습니다 ({:.0f}/100). "
                "상위 콘텐츠 패턴을 분석하여 저성과 페이지를 개선하세요.".format(inv.avg_performance_score)
            )

        # Stale content
        stale = inv.freshness_distribution.get("stale", 0)
        total = inv.total_pages or 1
        if stale / total > 0.3:
            recs.append(
                f"오래된 콘텐츠가 {stale}개 ({stale * 100 // total}%)입니다. "
                "콘텐츠 업데이트 또는 통합을 고려하세요."
            )

        # Decaying content
        if len(result.decaying_content) > 5:
            recs.append(
                f"트래픽이 감소하는 콘텐츠가 {len(result.decaying_content)}개 감지되었습니다. "
                "상위 감소 페이지부터 콘텐츠 리프레시를 진행하세요."
            )

        # Content type balance
        blog_count = inv.by_type.get("blog", 0)
        if blog_count == 0:
            recs.append(
                "블로그 콘텐츠가 없습니다. SEO 트래픽 확보를 위해 "
                "블로그 콘텐츠 전략을 수립하세요."
            )

        # Korean content opportunities
        korean = result.korean_content_analysis
        review_count = korean.get("patterns", {}).get("naver_blog_style", {}).get("count", 0)
        if review_count == 0:
            recs.append(
                "후기/리뷰 콘텐츠가 없습니다. 한국 시장에서 후기 콘텐츠는 "
                "전환율에 큰 영향을 미치므로 후기 콘텐츠 생성을 권장합니다."
            )

        if not recs:
            recs.append("현재 콘텐츠 전략이 양호합니다. 지속적인 모니터링을 권장합니다.")

        return recs


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="SEO Content Auditor - inventory, scoring, and decay detection",
    )
    parser.add_argument("--url", required=True, help="Target website URL")
    parser.add_argument("--decay", action="store_true", help="Enable content decay detection")
    parser.add_argument("--type", dest="content_type", help="Filter by content type (blog, product, service, landing, resource)")
    parser.add_argument("--limit", type=int, default=200, help="Maximum pages to analyze (default: 200)")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", help="Save output to file")
    return parser


def format_text_report(result: ContentAuditResult) -> str:
    """Format audit result as human-readable text."""
    lines: list[str] = []
    lines.append(f"## Content Audit: {result.url}")
    lines.append(f"**Date**: {result.timestamp[:10]}")
    lines.append("")

    inv = result.content_inventory
    lines.append("### Content Inventory")
    lines.append(f"- Total pages: {inv.total_pages}")
    lines.append(f"- Average performance score: {inv.avg_performance_score}/100")
    lines.append(f"- Content types: {json.dumps(inv.by_type, ensure_ascii=False)}")
    lines.append(f"- Freshness: {json.dumps(inv.freshness_distribution, ensure_ascii=False)}")
    lines.append("")

    lines.append("### Top Performers")
    for i, page in enumerate(result.top_performers[:10], 1):
        lines.append(f"  {i}. [{page.performance_score:.0f}] {page.url} (traffic: {page.traffic})")
    lines.append("")

    if result.decaying_content:
        lines.append("### Decaying Content")
        for i, page in enumerate(result.decaying_content[:10], 1):
            lines.append(f"  {i}. [{page.decay_rate:+.0f}%] {page.url} (traffic: {page.traffic})")
        lines.append("")

    if result.korean_content_analysis.get("patterns"):
        lines.append("### Korean Content Patterns")
        for pattern_name, data in result.korean_content_analysis["patterns"].items():
            lines.append(f"  - {pattern_name}: {data['count']} pages")
        lines.append("")

    lines.append("### Recommendations")
    for i, rec in enumerate(result.recommendations, 1):
        lines.append(f"  {i}. {rec}")

    return "\n".join(lines)


async def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    auditor = ContentAuditor()
    try:
        result = await auditor.audit(
            url=args.url,
            detect_decay_flag=args.decay,
            content_type_filter=args.content_type,
            limit=args.limit,
        )

        if args.json:
            output = json.dumps(asdict(result), ensure_ascii=False, indent=2, default=str)
        else:
            output = format_text_report(result)

        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            logger.info(f"Output saved to {args.output}")
        else:
            print(output)

    finally:
        await auditor.close()
        auditor.print_stats()


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,738 @@
"""
Content Brief Generator - SEO Content Brief Creation
=====================================================
Purpose: Generate detailed SEO content briefs with outlines,
keyword lists, word count targets, and internal linking suggestions.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import math
import re
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urlparse

import aiohttp
import requests
from bs4 import BeautifulSoup

from base_client import BaseAsyncClient, config

logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

@dataclass
class OutlineSection:
    """A single heading section in the content outline."""
    heading: str
    level: int = 2  # H2 or H3
    talking_points: list[str] = field(default_factory=list)
    target_words: int = 200
    keywords_to_include: list[str] = field(default_factory=list)


@dataclass
class CompetitorPageAnalysis:
    """Analysis of a single competitor page for the target keyword."""
    url: str
    title: str = ""
    word_count: int = 0
    headings: list[dict[str, str]] = field(default_factory=list)
    topics_covered: list[str] = field(default_factory=list)
    content_type: str = ""
    has_images: bool = False
    has_video: bool = False
    has_faq: bool = False
    has_table: bool = False


@dataclass
class ContentBrief:
    """Complete SEO content brief."""
    primary_keyword: str
    secondary_keywords: list[str] = field(default_factory=list)
    lsi_keywords: list[str] = field(default_factory=list)
    target_word_count: int = 1500
    word_count_range: tuple[int, int] = (1200, 1800)
    suggested_title: str = ""
    meta_description: str = ""
    outline: list[OutlineSection] = field(default_factory=list)
    competitor_analysis: list[CompetitorPageAnalysis] = field(default_factory=list)
    internal_links: list[dict[str, str]] = field(default_factory=list)
    content_format: str = "blog"
    korean_format_recommendations: list[str] = field(default_factory=list)
    search_intent: str = "informational"
    notes: list[str] = field(default_factory=list)
    timestamp: str = ""


# ---------------------------------------------------------------------------
# Search intent patterns
# ---------------------------------------------------------------------------

INTENT_PATTERNS = {
    "transactional": [
        r"buy", r"purchase", r"price", r"cost", r"order", r"shop",
        r"구매", r"주문", r"가격", r"비용", r"할인", r"쿠폰",
    ],
    "navigational": [
        r"login", r"sign in", r"official", r"website",
        r"로그인", r"공식", r"홈페이지",
    ],
    "commercial": [
        r"best", r"top", r"review", r"compare", r"vs",
        r"추천", r"비교", r"후기", r"리뷰", r"순위",
    ],
    "informational": [
        r"what", r"how", r"why", r"guide", r"tutorial",
        r"이란", r"방법", r"가이드", r"효과", r"원인",
    ],
}

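A table shaped like INTENT_PATTERNS can be applied as a first-match lookup; this sketch uses a reduced copy of the table, and the module's own resolver (defined later in the file) may weigh matches differently:

```python
import re

# Reduced, illustrative copy of INTENT_PATTERNS; first bucket with any match wins.
PATTERNS = {
    "transactional": [r"buy", r"price", r"구매", r"가격"],
    "commercial": [r"best", r"review", r"추천", r"후기"],
    "informational": [r"what", r"how", r"방법", r"가이드"],
}

def classify_intent(keyword: str) -> str:
    for intent, patterns in PATTERNS.items():
        if any(re.search(p, keyword, re.IGNORECASE) for p in patterns):
            return intent
    return "informational"  # default when nothing matches

print(classify_intent("best running shoes"))  # commercial
print(classify_intent("강남 피부과 가격"))  # transactional
```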
# ---------------------------------------------------------------------------
# Korean content format recommendations
# ---------------------------------------------------------------------------

KOREAN_FORMAT_TIPS = {
    "transactional": [
        "가격 비교표를 포함하세요 (경쟁사 가격 대비)",
        "실제 비용 사례를 3개 이상 제시하세요",
        "결제 방법 및 할인 정보를 명확히 안내하세요",
        "CTA(행동 유도) 버튼을 여러 위치에 배치하세요",
    ],
    "commercial": [
        "네이버 블로그 스타일의 솔직한 후기 톤을 사용하세요",
        "장단점을 균형 있게 비교하세요",
        "실제 사용 사진 또는 전후 비교 이미지를 포함하세요",
        "별점 또는 점수 평가 체계를 추가하세요",
        "FAQ 섹션을 포함하세요 (네이버 검색 노출에 유리)",
    ],
    "informational": [
        "핵심 정보를 글 상단에 요약하세요 (두괄식 구성)",
        "전문 용어는 쉬운 설명을 병기하세요",
        "인포그래픽 또는 도표를 활용하세요",
        "관련 콘텐츠 내부 링크를 3-5개 포함하세요",
        "전문가 인용 또는 출처를 명시하세요 (E-E-A-T 강화)",
    ],
    "navigational": [
        "공식 정보와 연락처를 최상단에 배치하세요",
        "지도 임베드를 포함하세요 (네이버 지도/구글 맵)",
        "영업시간, 주소, 전화번호를 명확히 표시하세요",
    ],
}


# ---------------------------------------------------------------------------
# ContentBriefGenerator
# ---------------------------------------------------------------------------

class ContentBriefGenerator(BaseAsyncClient):
    """Generate comprehensive SEO content briefs."""

    def __init__(self, max_concurrent: int = 5, requests_per_second: float = 2.0):
        super().__init__(max_concurrent=max_concurrent, requests_per_second=requests_per_second)
        self.session: aiohttp.ClientSession | None = None

    async def _ensure_session(self) -> aiohttp.ClientSession:
        if self.session is None or self.session.closed:
            timeout = aiohttp.ClientTimeout(total=30)
            headers = {
                "User-Agent": "Mozilla/5.0 (compatible; SEOContentBrief/1.0)",
            }
            self.session = aiohttp.ClientSession(timeout=timeout, headers=headers)
        return self.session

    async def close(self) -> None:
        if self.session and not self.session.closed:
            await self.session.close()

    # ------------------------------------------------------------------
    # Analyze top ranking results
    # ------------------------------------------------------------------

    async def analyze_top_results(
        self,
        keyword: str,
        site_url: str | None = None,
        num_competitors: int = 5,
    ) -> list[CompetitorPageAnalysis]:
        """
        Analyze top ranking pages for a keyword using Ahrefs SERP data.

        Each ranking URL is then fetched directly for on-page analysis;
        if Ahrefs data is unavailable, an empty list is returned.
        """
        self.logger.info(f"Analyzing top results for: {keyword}")
        results: list[CompetitorPageAnalysis] = []

        # Try Ahrefs organic keywords to find ranking pages
        try:
            api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
            if api_key:
                resp = requests.get(
                    "https://api.ahrefs.com/v3/serp-overview",
                    params={"keyword": keyword, "select": "url,title,position,traffic"},
                    headers={"Authorization": f"Bearer {api_key}"},
                    timeout=30,
                )
                if resp.status_code == 200:
                    data = resp.json()
                    serp_items = data.get("positions", data.get("items", []))[:num_competitors]
                    for item in serp_items:
                        analysis = CompetitorPageAnalysis(
                            url=item.get("url", ""),
                            title=item.get("title", ""),
                        )
                        results.append(analysis)
        except Exception as exc:
            self.logger.warning(f"Ahrefs SERP lookup failed: {exc}")

        # Fetch and analyze each page
        session = await self._ensure_session()
        for analysis in results[:num_competitors]:
            if not analysis.url:
                continue
            try:
                async with session.get(analysis.url) as resp:
                    if resp.status != 200:
                        continue
                    html = await resp.text()
                self._analyze_page_content(analysis, html)
            except Exception as exc:
                self.logger.debug(f"Failed to fetch {analysis.url}: {exc}")

        self.logger.info(f"Analyzed {len(results)} competitor pages")
        return results

    @staticmethod
    def _analyze_page_content(analysis: CompetitorPageAnalysis, html: str) -> None:
        """Parse HTML and extract content metrics."""
        soup = BeautifulSoup(html, "html.parser")

        # Title
        title_tag = soup.find("title")
        if title_tag and not analysis.title:
            analysis.title = title_tag.get_text(strip=True)

        # Word count (visible text only)
        for tag in soup(["script", "style", "nav", "header", "footer"]):
            tag.decompose()
        visible_text = soup.get_text(separator=" ", strip=True)
        analysis.word_count = len(visible_text.split())

        # Headings
        headings: list[dict[str, str]] = []
        for level in range(1, 7):
            for h in soup.find_all(f"h{level}"):
                text = h.get_text(strip=True)
                if text:
                    headings.append({"level": f"H{level}", "text": text})
        analysis.headings = headings

        # Content features
        analysis.has_images = len(soup.find_all("img")) > 2
        analysis.has_video = bool(soup.find("video") or soup.find("iframe", src=re.compile(r"youtube|vimeo")))
        analysis.has_faq = bool(
            soup.find(string=re.compile(r"FAQ|자주\s*묻는\s*질문|Q\s*&\s*A", re.IGNORECASE))
            or soup.find("script", type="application/ld+json", string=re.compile(r"FAQPage"))
        )
        analysis.has_table = bool(soup.find("table"))

        # Topics covered (from H2 headings)
        analysis.topics_covered = [
            h["text"] for h in headings if h["level"] == "H2"
        ][:15]

    # ------------------------------------------------------------------
    # Extract content outline
    # ------------------------------------------------------------------

    def extract_outline(
        self,
        keyword: str,
        top_results: list[CompetitorPageAnalysis],
    ) -> list[OutlineSection]:
        """
        Build recommended H2/H3 outline by aggregating competitor headings.

        Identifies common topics across top-ranking pages and structures
        them into a logical outline.
        """
        # Collect all H2 headings
        h2_topics: dict[str, int] = {}
        h3_by_h2: dict[str, list[str]] = {}

        for result in top_results:
            current_h2 = ""
            for heading in result.headings:
                text = heading["text"].strip()
                if heading["level"] == "H2":
                    current_h2 = text
                    h2_topics[text] = h2_topics.get(text, 0) + 1
                elif heading["level"] == "H3" and current_h2:
                    if current_h2 not in h3_by_h2:
                        h3_by_h2[current_h2] = []
                    h3_by_h2[current_h2].append(text)

        # Sort H2s by frequency (most common topics first)
        sorted_h2s = sorted(h2_topics.items(), key=lambda x: x[1], reverse=True)

        # Build outline
        outline: list[OutlineSection] = []
        target_word_count = self.calculate_word_count(top_results)
        words_per_section = target_word_count // max(len(sorted_h2s), 5)

        for h2_text, frequency in sorted_h2s[:8]:
            section = OutlineSection(
                heading=h2_text,
                level=2,
                target_words=words_per_section,
                talking_points=[],
            )

            # Add H3 subtopics as talking points
            if h2_text in h3_by_h2:
                unique_h3s = list(dict.fromkeys(h3_by_h2[h2_text]))[:5]
                for h3_text in unique_h3s:
                    section.talking_points.append(h3_text)

            outline.append(section)

        # Ensure FAQ section if common
        faq_count = sum(1 for r in top_results if r.has_faq)
        if faq_count >= 2 and not any("FAQ" in s.heading or "질문" in s.heading for s in outline):
            outline.append(OutlineSection(
                heading="자주 묻는 질문 (FAQ)",
                level=2,
                target_words=300,
                talking_points=[
                    f"{keyword} 관련 자주 묻는 질문 5-7개",
                    "Schema markup (FAQPage) 적용 권장",
                ],
            ))

        return outline

    # ------------------------------------------------------------------
    # Keyword suggestions
    # ------------------------------------------------------------------

    async def suggest_keywords(self, primary_keyword: str) -> dict[str, list[str]]:
        """
        Generate primary, secondary, and LSI keyword suggestions.

        Uses Ahrefs related keywords and matching terms data.
        """
        self.logger.info(f"Generating keyword suggestions for: {primary_keyword}")
        result: dict[str, list[str]] = {
            "primary": [primary_keyword],
            "secondary": [],
            "lsi": [],
        }

        try:
            api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
            if not api_key:
                self.logger.warning("AHREFS_API_KEY not set; returning basic keywords only")
                return result

            # Matching terms
            resp = requests.get(
                "https://api.ahrefs.com/v3/keywords-explorer/matching-terms",
                params={"keyword": primary_keyword, "limit": 20, "select": "keyword,volume,difficulty"},
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=30,
            )
            if resp.status_code == 200:
                data = resp.json()
                terms = data.get("keywords", data.get("items", []))
                for term in terms:
                    kw = term.get("keyword", "")
                    if kw and kw.lower() != primary_keyword.lower():
                        result["secondary"].append(kw)

            # Related terms (LSI)
            resp2 = requests.get(
                "https://api.ahrefs.com/v3/keywords-explorer/related-terms",
                params={"keyword": primary_keyword, "limit": 15, "select": "keyword,volume"},
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=30,
            )
            if resp2.status_code == 200:
                data2 = resp2.json()
                related = data2.get("keywords", data2.get("items", []))
                for term in related:
                    kw = term.get("keyword", "")
                    if kw and kw not in result["secondary"]:
                        result["lsi"].append(kw)

        except Exception as exc:
            self.logger.warning(f"Keyword suggestion lookup failed: {exc}")

        return result

    # ------------------------------------------------------------------
    # Word count calculation
    # ------------------------------------------------------------------

    @staticmethod
    def calculate_word_count(top_results: list[CompetitorPageAnalysis]) -> int:
        """
        Calculate target word count based on the top 5 ranking pages.

        Returns the average word count of the top 5, rounded to the
        nearest 100 and clamped to the 800-5000 range.
        """
        word_counts = [r.word_count for r in top_results[:5] if r.word_count > 100]

        if not word_counts:
            return 1500  # Default fallback

        avg = sum(word_counts) / len(word_counts)
        # Round to nearest 100
        target = round(avg / 100) * 100
        return max(800, min(5000, target))

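The averaging rule above can be exercised in isolation. This is a minimal sketch of the same round-to-nearest-100-and-clamp logic as a free function; the function name is chosen here for illustration and is not part of the script:

```python
def target_word_count(word_counts: list[int]) -> int:
    """Average the top 5 non-thin pages, round to the nearest 100, clamp to 800-5000."""
    counts = [c for c in word_counts[:5] if c > 100]
    if not counts:
        return 1500  # default when no usable competitor data
    avg = sum(counts) / len(counts)
    return max(800, min(5000, round(avg / 100) * 100))
```

For example, competitors at 1234, 2100, and 1800 words average to roughly 1711, which rounds to a 1700-word target; a single 20,000-word outlier is clamped to 5000.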
    # ------------------------------------------------------------------
    # Internal linking suggestions
    # ------------------------------------------------------------------

    async def suggest_internal_links(
        self,
        keyword: str,
        site_url: str,
    ) -> list[dict[str, str]]:
        """
        Find related existing pages on the site for internal linking.

        Uses Ahrefs organic keywords to find pages ranking for related terms.
        """
        self.logger.info(f"Finding internal link opportunities for {keyword} on {site_url}")
        links: list[dict[str, str]] = []
        target = urlparse(site_url).netloc or site_url

        try:
            api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
            if not api_key:
                return links

            resp = requests.get(
                "https://api.ahrefs.com/v3/site-explorer/organic-keywords",
                params={
                    "target": target,
                    "limit": 50,
                    "select": "keyword,url,position,traffic",
                },
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=30,
            )
            if resp.status_code != 200:
                return links

            data = resp.json()
            keywords_data = data.get("keywords", data.get("items", []))

            # Find pages ranking for related keywords
            keyword_lower = keyword.lower()
            keyword_words = set(keyword_lower.split())

            seen_urls: set[str] = set()
            for item in keywords_data:
                kw = item.get("keyword", "").lower()
                url = item.get("url", "")

                if not url or url in seen_urls:
                    continue

                # Check keyword relevance via word overlap
                kw_words = set(kw.split())
                overlap = keyword_words & kw_words
                if overlap and kw != keyword_lower:
                    links.append({
                        "url": url,
                        "anchor_text": kw,
                        "relevance": f"{len(overlap)}/{len(keyword_words)} word overlap",
                        "current_traffic": str(item.get("traffic", 0)),
                    })
                    seen_urls.add(url)

            links.sort(key=lambda l: int(l.get("current_traffic", "0")), reverse=True)

        except Exception as exc:
            self.logger.warning(f"Internal link suggestion failed: {exc}")

        return links[:10]

    # ------------------------------------------------------------------
    # Search intent detection
    # ------------------------------------------------------------------

    @staticmethod
    def detect_search_intent(keyword: str) -> str:
        """Classify keyword search intent."""
        keyword_lower = keyword.lower()
        scores: dict[str, int] = {}

        for intent, patterns in INTENT_PATTERNS.items():
            score = sum(1 for p in patterns if re.search(p, keyword_lower, re.IGNORECASE))
            if score > 0:
                scores[intent] = score

        if not scores:
            return "informational"
        return max(scores, key=scores.get)

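`detect_search_intent` relies on the module-level `INTENT_PATTERNS` table, which sits outside this hunk. A self-contained sketch of the same count-the-pattern-hits scheme, using an illustrative stand-in table (not the script's actual patterns):

```python
import re

# Illustrative stand-in for the module's INTENT_PATTERNS table.
INTENT_PATTERNS = {
    "informational": [r"\bhow\b", r"\bwhat\b", r"방법", r"가이드"],
    "commercial": [r"\bbest\b", r"\breview\b", r"추천", r"비교"],
    "transactional": [r"\bbuy\b", r"\bprice\b", r"구매", r"가격"],
}

def detect_search_intent(keyword: str) -> str:
    """Count pattern hits per intent; fall back to informational."""
    keyword_lower = keyword.lower()
    scores = {
        intent: sum(1 for p in patterns if re.search(p, keyword_lower, re.IGNORECASE))
        for intent, patterns in INTENT_PATTERNS.items()
    }
    scores = {intent: s for intent, s in scores.items() if s > 0}
    if not scores:
        return "informational"
    return max(scores, key=scores.get)
```

"best laptop review" matches two commercial patterns and nothing else, so it classifies as commercial; a keyword matching no pattern defaults to informational.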
    # ------------------------------------------------------------------
    # Orchestration
    # ------------------------------------------------------------------

    async def generate(
        self,
        keyword: str,
        site_url: str,
        num_competitors: int = 5,
    ) -> ContentBrief:
        """
        Generate a comprehensive SEO content brief.

        Args:
            keyword: Primary target keyword.
            site_url: Target website URL.
            num_competitors: Number of competitor pages to analyze.

        Returns:
            ContentBrief with outline, keywords, and recommendations.
        """
        self.logger.info(f"Generating content brief for: {keyword}")

        # Detect search intent
        intent = self.detect_search_intent(keyword)

        # Run analyses in parallel
        top_results_task = self.analyze_top_results(keyword, site_url, num_competitors)
        keywords_task = self.suggest_keywords(keyword)
        internal_links_task = self.suggest_internal_links(keyword, site_url)

        top_results, keyword_data, internal_links = await asyncio.gather(
            top_results_task, keywords_task, internal_links_task,
        )

        # Calculate word count target
        target_word_count = self.calculate_word_count(top_results)
        word_count_min = int(target_word_count * 0.8)
        word_count_max = int(target_word_count * 1.2)

        # Build outline
        outline = self.extract_outline(keyword, top_results)

        # Generate title and meta description suggestions
        suggested_title = self._generate_title(keyword, intent)
        meta_description = self._generate_meta_description(keyword, intent)

        # Korean format recommendations
        korean_tips = KOREAN_FORMAT_TIPS.get(intent, KOREAN_FORMAT_TIPS["informational"])

        brief = ContentBrief(
            primary_keyword=keyword,
            secondary_keywords=keyword_data.get("secondary", [])[:10],
            lsi_keywords=keyword_data.get("lsi", [])[:10],
            target_word_count=target_word_count,
            word_count_range=(word_count_min, word_count_max),
            suggested_title=suggested_title,
            meta_description=meta_description,
            outline=outline,
            competitor_analysis=top_results,
            internal_links=internal_links,
            content_format=self._suggest_format(intent, top_results),
            korean_format_recommendations=korean_tips,
            search_intent=intent,
            timestamp=datetime.now().isoformat(),
        )

        self.logger.info(
            f"Brief generated: {len(outline)} sections, "
            f"{target_word_count} target words, "
            f"{len(keyword_data.get('secondary', []))} secondary keywords"
        )

        return brief

    @staticmethod
    def _generate_title(keyword: str, intent: str) -> str:
        """Generate a suggested title based on keyword and intent."""
        templates = {
            "informational": "{keyword} - 완벽 가이드 (2025년 최신)",
            "commercial": "{keyword} 추천 TOP 10 비교 (전문가 리뷰)",
            "transactional": "{keyword} 가격 비교 및 구매 가이드",
            "navigational": "{keyword} - 공식 안내",
        }
        template = templates.get(intent, templates["informational"])
        return template.format(keyword=keyword)

    @staticmethod
    def _generate_meta_description(keyword: str, intent: str) -> str:
        """Generate a suggested meta description."""
        templates = {
            "informational": (
                f"{keyword}에 대해 알아야 할 모든 것을 정리했습니다. "
                "전문가가 알려주는 핵심 정보와 실용적인 가이드를 확인하세요."
            ),
            "commercial": (
                f"{keyword} 비교 분석! 장단점, 가격, 실제 후기를 "
                "한눈에 비교하고 최적의 선택을 하세요."
            ),
            "transactional": (
                f"{keyword} 최저가 비교 및 구매 방법을 안내합니다. "
                "합리적인 가격으로 구매하는 팁을 확인하세요."
            ),
            "navigational": (
                f"{keyword} 공식 정보 및 이용 안내. "
                "정확한 정보를 빠르게 확인하세요."
            ),
        }
        return templates.get(intent, templates["informational"])

    @staticmethod
    def _suggest_format(intent: str, results: list[CompetitorPageAnalysis]) -> str:
        """Suggest content format based on intent and competitor analysis."""
        if intent == "commercial":
            return "listicle"
        if intent == "informational":
            return "guide"
        if intent == "transactional":
            return "landing"

        # Check competitor patterns
        avg_word_count = (
            sum(r.word_count for r in results) / len(results) if results else 0
        )
        if avg_word_count > 3000:
            return "comprehensive_guide"
        return "blog"

# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="SEO Content Brief Generator",
    )
    parser.add_argument("--keyword", required=True, help="Primary target keyword")
    parser.add_argument("--url", required=True, help="Target website URL")
    parser.add_argument("--competitors", type=int, default=5, help="Number of competitor pages to analyze (default: 5)")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", help="Save output to file")
    return parser

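The CLI contract can be checked without hitting any API, since `parse_args` accepts an explicit argv list. A self-contained sketch mirroring the parser above:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the CLI defined above.
    parser = argparse.ArgumentParser(description="SEO Content Brief Generator")
    parser.add_argument("--keyword", required=True, help="Primary target keyword")
    parser.add_argument("--url", required=True, help="Target website URL")
    parser.add_argument("--competitors", type=int, default=5)
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--output")
    return parser

args = build_parser().parse_args(
    ["--keyword", "임플란트 비용", "--url", "https://example.com", "--json"]
)
```

With no `--competitors` flag the default of 5 applies, `--json` flips the store-true flag, and `--output` stays `None` so the report goes to stdout.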
def format_text_report(brief: ContentBrief) -> str:
    """Format content brief as human-readable text."""
    lines: list[str] = []
    lines.append(f"## Content Brief: {brief.primary_keyword}")
    lines.append(f"**Date**: {brief.timestamp[:10]}")
    lines.append(f"**Search Intent**: {brief.search_intent}")
    lines.append(f"**Content Format**: {brief.content_format}")
    lines.append("")

    lines.append("### Target Metrics")
    lines.append(f"- Word count: {brief.target_word_count} ({brief.word_count_range[0]}-{brief.word_count_range[1]})")
    lines.append(f"- Suggested title: {brief.suggested_title}")
    lines.append(f"- Meta description: {brief.meta_description}")
    lines.append("")

    lines.append("### Keywords")
    lines.append(f"- **Primary**: {brief.primary_keyword}")
    if brief.secondary_keywords:
        lines.append(f"- **Secondary**: {', '.join(brief.secondary_keywords[:8])}")
    if brief.lsi_keywords:
        lines.append(f"- **LSI**: {', '.join(brief.lsi_keywords[:8])}")
    lines.append("")

    lines.append("### Content Outline")
    for section in brief.outline:
        prefix = "##" if section.level == 2 else "###"
        lines.append(f"  {prefix} {section.heading} (~{section.target_words}w)")
        for point in section.talking_points:
            lines.append(f"    - {point}")
    lines.append("")

    if brief.competitor_analysis:
        lines.append(f"### Competitor Analysis ({len(brief.competitor_analysis)} pages)")
        for comp in brief.competitor_analysis:
            lines.append(f"  - **{comp.title or comp.url}**")
            lines.append(f"    Word count: {comp.word_count} | Headings: {len(comp.headings)}")
            features = []
            if comp.has_images:
                features.append("images")
            if comp.has_video:
                features.append("video")
            if comp.has_faq:
                features.append("FAQ")
            if comp.has_table:
                features.append("table")
            if features:
                lines.append(f"    Features: {', '.join(features)}")
        lines.append("")

    if brief.internal_links:
        lines.append(f"### Internal Linking Suggestions ({len(brief.internal_links)})")
        for link in brief.internal_links[:7]:
            lines.append(f"  - [{link['anchor_text']}]({link['url']})")
        lines.append("")

    if brief.korean_format_recommendations:
        lines.append("### Korean Content Format Recommendations")
        for tip in brief.korean_format_recommendations:
            lines.append(f"  - {tip}")

    return "\n".join(lines)

async def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    generator = ContentBriefGenerator()
    try:
        brief = await generator.generate(
            keyword=args.keyword,
            site_url=args.url,
            num_competitors=args.competitors,
        )

        if args.json:
            output = json.dumps(asdict(brief), ensure_ascii=False, indent=2, default=str)
        else:
            output = format_text_report(brief)

        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            logger.info(f"Output saved to {args.output}")
        else:
            print(output)

    finally:
        await generator.close()
        generator.print_stats()


if __name__ == "__main__":
    asyncio.run(main())

@@ -0,0 +1,694 @@
"""
Content Gap Analyzer - Topic Gap Detection & Cluster Mapping
=============================================================
Purpose: Identify content gaps vs competitors, build topic clusters,
         and generate prioritized editorial calendars.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import math
import re
import sys
from collections import defaultdict
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta
from typing import Any
from urllib.parse import urlparse

import requests
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

from base_client import BaseAsyncClient, config

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------


@dataclass
class TopicGap:
    """A topic present in competitors but missing from target."""
    topic: str
    competitor_urls: list[str] = field(default_factory=list)
    competitor_keywords: list[str] = field(default_factory=list)
    estimated_traffic: int = 0
    priority_score: float = 0.0
    difficulty: str = "medium"
    content_type_suggestion: str = "blog"


@dataclass
class TopicCluster:
    """Topic cluster with pillar and supporting cluster pages."""
    pillar_topic: str
    pillar_keyword: str = ""
    cluster_topics: list[str] = field(default_factory=list)
    cluster_keywords: list[str] = field(default_factory=list)
    total_volume: int = 0
    coverage_score: float = 0.0


@dataclass
class CalendarEntry:
    """Prioritized editorial calendar entry."""
    topic: str
    priority: str = "medium"
    target_date: str = ""
    content_type: str = "blog"
    target_word_count: int = 1500
    primary_keyword: str = ""
    estimated_traffic: int = 0
    cluster_name: str = ""
    notes: str = ""


@dataclass
class ContentGapResult:
    """Full content gap analysis result."""
    target_url: str
    competitor_urls: list[str] = field(default_factory=list)
    timestamp: str = ""
    target_topics_count: int = 0
    competitor_topics_count: int = 0
    gaps: list[TopicGap] = field(default_factory=list)
    clusters: list[TopicCluster] = field(default_factory=list)
    calendar: list[CalendarEntry] = field(default_factory=list)
    content_volume_comparison: dict[str, int] = field(default_factory=dict)
    korean_opportunities: list[dict[str, Any]] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

# ---------------------------------------------------------------------------
# Korean opportunity patterns
# ---------------------------------------------------------------------------

KOREAN_OPPORTUNITY_PATTERNS = [
    {"pattern": r"후기|리뷰", "label": "review_content", "description": "후기/리뷰 콘텐츠"},
    {"pattern": r"비용|가격|견적", "label": "pricing_content", "description": "비용/가격 정보 콘텐츠"},
    {"pattern": r"비교|차이", "label": "comparison_content", "description": "비교 콘텐츠"},
    {"pattern": r"추천|베스트|TOP", "label": "recommendation_content", "description": "추천/리스트 콘텐츠"},
    {"pattern": r"방법|하는\s*법|가이드", "label": "how_to_content", "description": "가이드/방법 콘텐츠"},
    {"pattern": r"부작용|주의|위험", "label": "safety_content", "description": "안전/부작용 정보"},
    {"pattern": r"효과|결과|전후", "label": "results_content", "description": "효과/결과 콘텐츠"},
]

|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# ContentGapAnalyzer
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
class ContentGapAnalyzer(BaseAsyncClient):
|
||||||
|
"""Analyze content gaps between target and competitor sites."""
|
||||||
|
|
||||||
|
def __init__(self, max_concurrent: int = 5, requests_per_second: float = 2.0):
|
||||||
|
super().__init__(max_concurrent=max_concurrent, requests_per_second=requests_per_second)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Ahrefs data retrieval
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
async def get_competitor_topics(self, competitor_url: str, limit: int = 100) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Get top pages and keywords for a competitor via Ahrefs.
|
||||||
|
|
||||||
|
Returns list of dicts: url, traffic, keywords, top_keyword, title.
|
||||||
|
"""
|
||||||
|
self.logger.info(f"Fetching competitor topics for {competitor_url}")
|
||||||
|
target = urlparse(competitor_url).netloc or competitor_url
|
||||||
|
|
||||||
|
try:
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
self.logger.warning("AHREFS_API_KEY not set; returning empty competitor topics")
|
||||||
|
return []
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/top-pages",
|
||||||
|
params={
|
||||||
|
"target": target,
|
||||||
|
"limit": limit,
|
||||||
|
"select": "url,traffic,keywords,value,top_keyword",
|
||||||
|
},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
pages = data.get("pages", data.get("items", []))
|
||||||
|
self.logger.info(f"Retrieved {len(pages)} competitor topics from {competitor_url}")
|
||||||
|
return pages
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Failed to get competitor topics for {competitor_url}: {exc}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def get_target_keywords(self, target_url: str, limit: int = 200) -> set[str]:
|
||||||
|
"""Get the set of keywords the target site already ranks for."""
|
||||||
|
self.logger.info(f"Fetching target keywords for {target_url}")
|
||||||
|
target = urlparse(target_url).netloc or target_url
|
||||||
|
|
||||||
|
try:
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
return set()
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/organic-keywords",
|
||||||
|
params={"target": target, "limit": limit, "select": "keyword,position,traffic"},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
keywords = data.get("keywords", data.get("items", []))
|
||||||
|
return {kw.get("keyword", "").lower() for kw in keywords if kw.get("keyword")}
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Failed to get target keywords: {exc}")
|
||||||
|
return set()
|
||||||
|
|
||||||
|
async def get_organic_competitors(self, target_url: str, limit: int = 10) -> list[str]:
|
||||||
|
"""Discover organic competitors via Ahrefs."""
|
||||||
|
self.logger.info(f"Discovering organic competitors for {target_url}")
|
||||||
|
target = urlparse(target_url).netloc or target_url
|
||||||
|
|
||||||
|
try:
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
return []
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/organic-competitors",
|
||||||
|
params={"target": target, "limit": limit},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
competitors = data.get("competitors", data.get("items", []))
|
||||||
|
return [c.get("domain", "") for c in competitors if c.get("domain")]
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Failed to discover competitors: {exc}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
    # ------------------------------------------------------------------
    # Gap analysis
    # ------------------------------------------------------------------

    async def find_topic_gaps(
        self,
        target_url: str,
        competitor_urls: list[str],
    ) -> tuple[list[TopicGap], set[str], dict[str, int]]:
        """
        Identify topics covered by competitors but missing from target.

        Returns:
            - List of TopicGap objects.
            - Set of target keywords (for reference).
            - Content volume comparison dict.
        """
        # Gather target keywords
        target_keywords = await self.get_target_keywords(target_url)

        # Gather competitor data in parallel
        competitor_tasks = [self.get_competitor_topics(c_url) for c_url in competitor_urls]
        competitor_results = await asyncio.gather(*competitor_tasks, return_exceptions=True)

        # Build competitor topic map
        competitor_topic_map: dict[str, TopicGap] = {}
        content_volume: dict[str, int] = {target_url: len(target_keywords)}

        for c_url, c_result in zip(competitor_urls, competitor_results):
            if isinstance(c_result, Exception):
                self.logger.warning(f"Error fetching {c_url}: {c_result}")
                continue

            pages = c_result if isinstance(c_result, list) else []
            content_volume[c_url] = len(pages)

            for page in pages:
                top_keyword = page.get("top_keyword", "").strip().lower()
                if not top_keyword:
                    continue

                # Skip if target already covers this keyword
                if top_keyword in target_keywords:
                    continue

                # Check for fuzzy matches (keyword contained in target set)
                is_covered = any(
                    top_keyword in tk or tk in top_keyword
                    for tk in target_keywords
                    if len(tk) > 3
                )
                if is_covered:
                    continue

                if top_keyword not in competitor_topic_map:
                    competitor_topic_map[top_keyword] = TopicGap(
                        topic=top_keyword,
                        estimated_traffic=int(page.get("traffic", 0)),
                    )

                gap = competitor_topic_map[top_keyword]
                gap.competitor_urls.append(page.get("url", c_url))
                gap.competitor_keywords.append(top_keyword)
                gap.estimated_traffic = max(gap.estimated_traffic, int(page.get("traffic", 0)))

        # Score gaps
        gaps = list(competitor_topic_map.values())
        for gap in gaps:
            competitor_count = len(set(gap.competitor_urls))
            traffic_score = min(100, math.log10(max(gap.estimated_traffic, 1)) / math.log10(10000) * 100)
            competition_score = (competitor_count / max(len(competitor_urls), 1)) * 100
            gap.priority_score = round((traffic_score * 0.6) + (competition_score * 0.4), 1)

            # Difficulty estimation
            if competitor_count >= 3:
                gap.difficulty = "high"
            elif competitor_count >= 2:
                gap.difficulty = "medium"
            else:
                gap.difficulty = "low"

            # Content type suggestion
            gap.content_type_suggestion = self._suggest_content_type(gap.topic)

        gaps.sort(key=lambda g: g.priority_score, reverse=True)
        return gaps, target_keywords, content_volume

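The gap score blends log-scaled traffic (saturating at 10,000 estimated monthly visits) with the share of competitors covering the topic, at a 60/40 weighting. Isolated as a pure function for illustration (the function and parameter names here are not from the script):

```python
import math

def gap_priority(traffic: int, covering: int, total_competitors: int) -> float:
    """0-100 score: 60% log-scaled traffic, 40% share of competitors covering the topic."""
    traffic_score = min(100, math.log10(max(traffic, 1)) / math.log10(10000) * 100)
    competition_score = (covering / max(total_competitors, 1)) * 100
    return round(traffic_score * 0.6 + competition_score * 0.4, 1)
```

A topic all three competitors cover with 10k+ traffic scores the maximum 100.0; a topic one of three competitors covers with 100 visits scores 50 × 0.6 + 33.3 × 0.4 ≈ 43.3.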
    @staticmethod
    def _suggest_content_type(topic: str) -> str:
        """Suggest content type based on topic keywords."""
        topic_lower = topic.lower()
        if any(w in topic_lower for w in ["how to", "guide", "tutorial", "방법", "가이드"]):
            return "guide"
        if any(w in topic_lower for w in ["best", "top", "review", "추천", "후기", "비교"]):
            return "listicle"
        if any(w in topic_lower for w in ["what is", "이란", "뜻", "의미"]):
            return "informational"
        if any(w in topic_lower for w in ["cost", "price", "비용", "가격"]):
            return "landing"
        return "blog"

    # ------------------------------------------------------------------
    # Topic cluster mapping
    # ------------------------------------------------------------------

    def build_topic_clusters(
        self,
        topics: list[str],
        n_clusters: int | None = None,
        min_cluster_size: int = 3,
    ) -> list[TopicCluster]:
        """
        Group topics into a pillar/cluster structure using TF-IDF + hierarchical clustering.

        Args:
            topics: List of topic strings.
            n_clusters: Number of clusters (auto-detected if None).
            min_cluster_size: Minimum topics per cluster.

        Returns:
            List of TopicCluster objects.
        """
        if len(topics) < min_cluster_size:
            self.logger.warning("Too few topics for clustering")
            return []

        # Vectorize topics
        vectorizer = TfidfVectorizer(
            max_features=500,
            stop_words="english",
            ngram_range=(1, 2),
        )

        try:
            tfidf_matrix = vectorizer.fit_transform(topics)
        except ValueError as exc:
            self.logger.warning(f"TF-IDF vectorization failed: {exc}")
            return []

        # Auto-detect cluster count
        if n_clusters is None:
            n_clusters = max(2, min(len(topics) // 5, 15))
        n_clusters = min(n_clusters, len(topics) - 1)

        # Hierarchical clustering with cosine distance
        clustering = AgglomerativeClustering(
            n_clusters=n_clusters,
            metric="cosine",
            linkage="average",
        )
        labels = clustering.fit_predict(tfidf_matrix.toarray())

        # Build cluster objects
        cluster_map: dict[int, list[str]] = defaultdict(list)
        for topic, label in zip(topics, labels):
            cluster_map[label].append(topic)

        clusters: list[TopicCluster] = []
        for label, cluster_topics in sorted(cluster_map.items()):
            if len(cluster_topics) < min_cluster_size:
                continue

            # Pick the longest topic as pillar (usually broader)
            pillar = max(cluster_topics, key=len)
            subtopics = [t for t in cluster_topics if t != pillar]

            cluster = TopicCluster(
                pillar_topic=pillar,
                pillar_keyword=pillar,
                cluster_topics=subtopics[:20],
                cluster_keywords=list(subtopics[:20]),
                total_volume=0,
                coverage_score=0.0,
            )
            clusters.append(cluster)

        clusters.sort(key=lambda c: len(c.cluster_topics), reverse=True)
        return clusters

    # ------------------------------------------------------------------
    # Editorial calendar generation
    # ------------------------------------------------------------------

    def generate_calendar(
        self,
        gaps: list[TopicGap],
        clusters: list[TopicCluster],
        weeks_ahead: int = 12,
        entries_per_week: int = 2,
    ) -> list[CalendarEntry]:
        """
        Generate prioritized editorial calendar from gaps and clusters.

        Args:
            gaps: List of topic gaps (sorted by priority).
            clusters: List of topic clusters.
            weeks_ahead: Number of weeks to plan.
            entries_per_week: Content pieces per week.

        Returns:
            List of CalendarEntry objects.
        """
        calendar: list[CalendarEntry] = []
        today = datetime.now()

        # Build cluster lookup
        topic_to_cluster: dict[str, str] = {}
        for cluster in clusters:
            for topic in cluster.cluster_topics:
                topic_to_cluster[topic] = cluster.pillar_topic
            topic_to_cluster[cluster.pillar_topic] = cluster.pillar_topic

        # Prioritize: pillar topics first, then by priority score
        pillar_topics = {c.pillar_topic for c in clusters}
        pillar_gaps = [g for g in gaps if g.topic in pillar_topics]
        other_gaps = [g for g in gaps if g.topic not in pillar_topics]
        ordered_gaps = pillar_gaps + other_gaps

        # Word count targets based on content type
        word_count_map = {
            "guide": 2500,
            "listicle": 2000,
            "informational": 1800,
            "landing": 1200,
            "blog": 1500,
        }

        max_entries = weeks_ahead * entries_per_week
        week_offset = 0
        slot_in_week = 0

        for gap in ordered_gaps[:max_entries]:
            target_date = today + timedelta(weeks=week_offset, days=slot_in_week * 3)

            # Determine priority label
            if gap.priority_score >= 70:
                priority = "high"
            elif gap.priority_score >= 40:
                priority = "medium"
            else:
                priority = "low"

            entry = CalendarEntry(
                topic=gap.topic,
                priority=priority,
                target_date=target_date.strftime("%Y-%m-%d"),
                content_type=gap.content_type_suggestion,
                target_word_count=word_count_map.get(gap.content_type_suggestion, 1500),
                primary_keyword=gap.topic,
                estimated_traffic=gap.estimated_traffic,
                cluster_name=topic_to_cluster.get(gap.topic, "uncategorized"),
            )
            calendar.append(entry)

            slot_in_week += 1
            if slot_in_week >= entries_per_week:
                slot_in_week = 0
                week_offset += 1

        return calendar

    # ------------------------------------------------------------------
    # Korean opportunity detection
    # ------------------------------------------------------------------

    @staticmethod
    def detect_korean_opportunities(gaps: list[TopicGap]) -> list[dict[str, Any]]:
        """Detect Korean-market content opportunities in gaps."""
        opportunities: list[dict[str, Any]] = []

        for gap in gaps:
            for pattern_info in KOREAN_OPPORTUNITY_PATTERNS:
                if re.search(pattern_info["pattern"], gap.topic, re.IGNORECASE):
                    opportunities.append({
                        "topic": gap.topic,
                        "pattern": pattern_info["label"],
                        "description": pattern_info["description"],
                        "estimated_traffic": gap.estimated_traffic,
                        "priority_score": gap.priority_score,
                    })
                    break

        opportunities.sort(key=lambda o: o["priority_score"], reverse=True)
        return opportunities

    # ------------------------------------------------------------------
    # Orchestration
    # ------------------------------------------------------------------

    async def analyze(
        self,
        target_url: str,
        competitor_urls: list[str],
        build_clusters: bool = False,
    ) -> ContentGapResult:
        """
        Run full content gap analysis.

        Args:
            target_url: Target website URL.
            competitor_urls: List of competitor URLs.
            build_clusters: Whether to build topic clusters.

        Returns:
            ContentGapResult with gaps, clusters, and calendar.
        """
        result = ContentGapResult(
            target_url=target_url,
            competitor_urls=competitor_urls,
            timestamp=datetime.now().isoformat(),
        )

        self.logger.info(
            f"Starting gap analysis: {target_url} vs {len(competitor_urls)} competitors"
        )

        # 1. Find topic gaps
        gaps, target_keywords, content_volume = await self.find_topic_gaps(
            target_url, competitor_urls
        )

        result.gaps = gaps
        result.target_topics_count = len(target_keywords)
        result.competitor_topics_count = sum(content_volume.get(c, 0) for c in competitor_urls)
        result.content_volume_comparison = content_volume

        # 2. Build topic clusters if requested
        if build_clusters and gaps:
            all_topics = [g.topic for g in gaps]
            result.clusters = self.build_topic_clusters(all_topics)

        # 3. Generate editorial calendar
        result.calendar = self.generate_calendar(gaps, result.clusters)

        # 4. Detect Korean opportunities
        result.korean_opportunities = self.detect_korean_opportunities(gaps)

        # 5. Recommendations
        result.recommendations = self._generate_recommendations(result)

        self.logger.info(
            f"Gap analysis complete: {len(gaps)} gaps, "
            f"{len(result.clusters)} clusters, "
            f"{len(result.calendar)} calendar entries"
        )

        return result

    @staticmethod
    def _generate_recommendations(result: ContentGapResult) -> list[str]:
        """Generate strategic recommendations from gap analysis."""
        recs: list[str] = []

        gap_count = len(result.gaps)
        if gap_count > 50:
            recs.append(
                f"경쟁사 대비 {gap_count}개의 콘텐츠 격차가 발견되었습니다. "
                "우선순위 상위 20개 주제부터 콘텐츠 생성을 시작하세요."
            )
        elif gap_count > 20:
            recs.append(
                f"{gap_count}개의 콘텐츠 격차가 있습니다. "
                "높은 트래픽 기회부터 순차적으로 콘텐츠를 생성하세요."
            )
        elif gap_count > 0:
            recs.append(
                f"{gap_count}개의 콘텐츠 격차가 발견되었습니다. "
                "비교적 적은 격차이므로 빠른 시일 내 모두 커버할 수 있습니다."
            )

        if result.clusters:
            recs.append(
                f"{len(result.clusters)}개의 토픽 클러스터를 구성했습니다. "
                "필러 콘텐츠부터 작성하여 내부 링크 구조를 강화하세요."
            )

        if result.korean_opportunities:
            recs.append(
                f"한국어 시장 기회가 {len(result.korean_opportunities)}개 발견되었습니다. "
                "후기, 비용, 비교 콘텐츠는 한국 검색 시장에서 높은 전환율을 보입니다."
            )

        high_priority = [g for g in result.gaps if g.priority_score >= 70]
        if high_priority:
            top_topics = ", ".join(g.topic for g in high_priority[:3])
            recs.append(
                f"최우선 주제: {top_topics}. "
                "이 주제들은 높은 트래픽 잠재력과 경쟁사 커버리지를 가지고 있습니다."
            )

        if not recs:
            recs.append("경쟁사 대비 콘텐츠 커버리지가 양호합니다. 기존 콘텐츠 최적화에 집중하세요.")

        return recs


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="SEO Content Gap Analyzer - topic gaps, clusters, calendar",
    )
    parser.add_argument("--target", required=True, help="Target website URL")
    parser.add_argument(
        "--competitor", action="append", dest="competitors", required=True,
        help="Competitor URL (can be repeated)",
    )
    parser.add_argument("--clusters", action="store_true", help="Build topic clusters")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", help="Save output to file")
    return parser


def format_text_report(result: ContentGapResult) -> str:
    """Format gap analysis result as human-readable text."""
    lines: list[str] = []
    lines.append(f"## Content Gap Analysis: {result.target_url}")
    lines.append(f"**Date**: {result.timestamp[:10]}")
    lines.append(f"**Competitors**: {', '.join(result.competitor_urls)}")
    lines.append("")

    lines.append("### Content Volume Comparison")
    for site, count in result.content_volume_comparison.items():
        lines.append(f"  - {site}: {count} topics")
    lines.append("")

    lines.append(f"### Topic Gaps ({len(result.gaps)} found)")
    for i, gap in enumerate(result.gaps[:20], 1):
        lines.append(
            f"  {i}. [{gap.priority_score:.0f}] {gap.topic} "
            f"(traffic: {gap.estimated_traffic}, difficulty: {gap.difficulty})"
        )
    lines.append("")

    if result.clusters:
        lines.append(f"### Topic Clusters ({len(result.clusters)})")
        for i, cluster in enumerate(result.clusters, 1):
            lines.append(f"  {i}. **{cluster.pillar_topic}** ({len(cluster.cluster_topics)} subtopics)")
            for sub in cluster.cluster_topics[:5]:
                lines.append(f"     - {sub}")
        lines.append("")

    if result.calendar:
        lines.append(f"### Editorial Calendar ({len(result.calendar)} entries)")
        for entry in result.calendar[:15]:
            lines.append(
                f"  - [{entry.target_date}] {entry.topic} "
                f"({entry.content_type}, {entry.target_word_count}w, priority: {entry.priority})"
            )
        lines.append("")

    if result.korean_opportunities:
        lines.append(f"### Korean Market Opportunities ({len(result.korean_opportunities)})")
        for opp in result.korean_opportunities[:10]:
            lines.append(f"  - {opp['topic']} ({opp['description']})")
        lines.append("")

    lines.append("### Recommendations")
    for i, rec in enumerate(result.recommendations, 1):
        lines.append(f"  {i}. {rec}")

    return "\n".join(lines)


async def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    analyzer = ContentGapAnalyzer()
    result = await analyzer.analyze(
        target_url=args.target,
        competitor_urls=args.competitors,
        build_clusters=args.clusters,
    )

    if args.json:
        output = json.dumps(asdict(result), ensure_ascii=False, indent=2, default=str)
    else:
        output = format_text_report(result)

    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(output)
        logger.info(f"Output saved to {args.output}")
    else:
        print(output)

    analyzer.print_stats()


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,11 @@
# 23-seo-content-strategy dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
pandas>=2.1.0
scikit-learn>=1.3.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
138 custom-skills/23-seo-content-strategy/desktop/SKILL.md Normal file
@@ -0,0 +1,138 @@
---
name: seo-content-strategy
description: |
  Content strategy and planning for SEO. Triggers: content audit, content strategy, content gap, topic clusters, content brief, editorial calendar, content decay, 콘텐츠 전략, 콘텐츠 감사.
---

# SEO Content Strategy

## Purpose

Audit existing content performance, identify topic gaps vs competitors, map topic clusters, detect content decay, and generate SEO content briefs. Supports Korean content patterns (Naver Blog format, 후기/review content, 추천 listicles).

## Core Capabilities

1. **Content Audit** - Inventory, performance scoring, decay detection
2. **Content Gap Analysis** - Topic gaps vs competitors, cluster mapping
3. **Content Brief Generation** - Outlines, keywords, word count targets
4. **Editorial Calendar** - Prioritized content creation schedule
5. **Korean Content Patterns** - Naver Blog style, 후기, 추천 format analysis

## MCP Tool Usage

### Ahrefs for Content Data
```
site-explorer-top-pages: Get top performing pages
site-explorer-pages-by-traffic: Pages ranked by organic traffic
site-explorer-organic-keywords: Keywords per page
site-explorer-organic-competitors: Find content competitors
site-explorer-best-by-external-links: Best content by backlinks
keywords-explorer-matching-terms: Secondary keyword suggestions
keywords-explorer-related-terms: LSI keyword suggestions
serp-overview: Analyze top ranking results for a keyword
```

### WebSearch for Content Research
```
WebSearch: Research content topics and competitor strategies
WebFetch: Analyze competitor page content and structure
```

### Notion for Report Storage
```
notion-create-pages: Save audit reports to SEO Audit Log
```

## Workflow

### 1. Content Audit
1. Crawl sitemap to discover all content URLs
2. Fetch top pages data from Ahrefs (traffic, keywords, backlinks)
3. Classify content types (blog, product, service, landing, resource)
4. Score each page's performance (0-100 composite)
5. Detect decaying content (traffic decline patterns)
6. Analyze freshness distribution (fresh/aging/stale)
7. Identify Korean content patterns (후기, 추천, 방법 formats)
8. Generate recommendations
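
The freshness and decay steps above can be sketched as follows. This is a minimal illustration; the 180/365-day thresholds and the decay formula are assumptions for the sketch, not the audit script's exact values:

```python
from datetime import date

def classify_freshness(last_modified: date, today: date) -> str:
    """Bucket a page by age: fresh (<6 months), aging (<12 months), stale."""
    age_days = (today - last_modified).days
    if age_days <= 180:
        return "fresh"
    if age_days <= 365:
        return "aging"
    return "stale"

def decay_rate(traffic_then: int, traffic_now: int) -> float:
    """Fractional traffic decline; positive values indicate a decaying page."""
    if traffic_then == 0:
        return 0.0
    return (traffic_then - traffic_now) / traffic_then
```

Pages with a high positive `decay_rate` and an aging/stale freshness bucket are the refresh candidates surfaced in step 5.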

### 2. Content Gap Analysis
1. Gather target site keywords from Ahrefs
2. Gather competitor top pages and keywords
3. Identify topics present in competitors but missing from target
4. Score gaps by priority (traffic potential + competitor coverage)
5. Build topic clusters using TF-IDF + hierarchical clustering
6. Generate editorial calendar with priority and dates
7. Detect Korean market content opportunities
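
At its core, step 3 is a set difference over normalized topics, with gaps ranked by how many competitors cover them (a simplified sketch; the analyzer's real priority score also blends traffic estimates):

```python
def find_gaps(target_topics: set[str], competitor_topics: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Return (topic, competitor_coverage) pairs for topics the target does not cover."""
    gaps: dict[str, int] = {}
    for topics in competitor_topics.values():
        for topic in topics - target_topics:
            gaps[topic] = gaps.get(topic, 0) + 1
    # Topics covered by more competitors rank higher.
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)

target = {"seo audit", "link building"}
competitors = {
    "a.com": {"seo audit", "keyword research", "content brief"},
    "b.com": {"keyword research", "link building"},
}
# "keyword research" ranks first: missing from the target, covered by both competitors
print(find_gaps(target, competitors))
```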

### 3. Content Brief Generation
1. Analyze top 5-10 ranking pages for target keyword
2. Extract headings, word counts, content features (FAQ, images, video)
3. Build recommended H2/H3 outline from competitor patterns
4. Suggest primary, secondary, and LSI keywords
5. Calculate target word count (avg of top 5 +/- 20%)
6. Find internal linking opportunities on the target site
7. Detect search intent (informational, commercial, transactional, navigational)
8. Add Korean format recommendations based on intent
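
Step 5's word count target is a straightforward calculation and can be sketched directly (assuming `competitor_counts` is already ordered by ranking position):

```python
def word_count_target(competitor_counts: list[int], top_n: int = 5) -> tuple[int, int]:
    """Target range = mean word count of the top-N ranking pages, +/- 20%."""
    sample = competitor_counts[:top_n]
    avg = sum(sample) / len(sample)
    return round(avg * 0.8), round(avg * 1.2)

low, high = word_count_target([2400, 1800, 2100, 1500, 2200])
print(low, high)  # 1600 2400
```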

## Output Format

```markdown
## Content Audit: [domain]

### Content Inventory
- Total pages: [count]
- By type: blog [n], product [n], service [n], other [n]
- Average performance score: [score]/100

### Top Performers
1. [score] [url] (traffic: [n])
...

### Decaying Content
1. [decay rate] [url] (traffic: [n])
...

### Content Gaps vs Competitors
1. [priority] [topic] (est. traffic: [n], difficulty: [level])
...

### Topic Clusters
1. **[Pillar Topic]** ([n] subtopics)
   - [subtopic 1]
   - [subtopic 2]

### Editorial Calendar
- [date] [topic] ([type], [word count], priority: [level])
...

### Recommendations
1. [Priority actions]
```

## Common Issues

| Issue | Impact | Fix |
|-------|--------|-----|
| No blog content | High | Build blog content strategy with topic clusters |
| Content decay (traffic loss) | High | Refresh and update declining pages |
| Missing competitor topics | Medium | Create content for high-priority gaps |
| No 후기/review content | Medium | Add Korean review-style content for conversions |
| Stale content (>12 months) | Medium | Update or consolidate outdated pages |
| No topic clusters | Medium | Organize content into pillar/cluster structure |
| Missing FAQ sections | Low | Add FAQ schema for featured snippet opportunities |

## Limitations

- Ahrefs API required for traffic and keyword data
- Competitor analysis limited to publicly available content
- Content decay detection uses a heuristic without historical data in standalone mode
- Topic clustering requires a minimum of 3 topics per cluster
- Word count analysis requires accessible competitor pages (no JS rendering)

## Notion Output (Required)

All audit reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category, Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: CONTENT-YYYYMMDD-NNN
8 custom-skills/23-seo-content-strategy/desktop/skill.yaml Normal file
@@ -0,0 +1,8 @@
name: seo-content-strategy
description: |
  Content strategy and planning for SEO. Triggers: content audit, content strategy, content gap, topic clusters, content brief, editorial calendar, content decay.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
136 custom-skills/24-seo-ecommerce/code/CLAUDE.md Normal file
@@ -0,0 +1,136 @@
# CLAUDE.md

## Overview

E-commerce SEO audit tool for product page optimization, product schema validation, category taxonomy analysis, and marketplace presence checking. Supports Naver Smart Store optimization and Korean marketplace platforms (Coupang, Gmarket, 11번가).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# E-commerce SEO audit
python scripts/ecommerce_auditor.py --url https://example.com --json

# Product schema validation
python scripts/product_schema_checker.py --url https://example.com/product --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `ecommerce_auditor.py` | Full e-commerce SEO audit | Product page issues, category structure, marketplace presence |
| `product_schema_checker.py` | Validate product structured data | Schema completeness, errors, rich result eligibility |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## E-Commerce Auditor

```bash
# Full audit
python scripts/ecommerce_auditor.py --url https://example.com --json

# Product page audit only
python scripts/ecommerce_auditor.py --url https://example.com --scope products --json

# Category taxonomy analysis
python scripts/ecommerce_auditor.py --url https://example.com --scope categories --json

# Check Korean marketplace presence
python scripts/ecommerce_auditor.py --url https://example.com --korean-marketplaces --json
```

**Capabilities**:
- Product page SEO audit (titles, meta descriptions, image alt text, H1 structure)
- Category taxonomy analysis (depth, breadcrumb implementation, faceted navigation)
- Duplicate content detection (parameter URLs, product variants, pagination)
- Pagination/infinite scroll SEO validation (rel=prev/next, canonical tags)
- Internal linking structure for product discovery
- Naver Smart Store optimization checks
- Korean marketplace presence (Coupang, Gmarket, 11번가 product listing detection)
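
The duplicate content detection listed above boils down to canonicalizing URLs and grouping those that collapse to the same page. A minimal sketch; the parameter list here is a common-example assumption, not the auditor's actual configuration:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse
from collections import defaultdict

# Hypothetical set of tracking/facet parameters that do not change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid", "sort", "ref"}

def canonicalize(url: str) -> str:
    """Drop tracking/facet query parameters so variant URLs collapse together."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

def duplicate_groups(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by canonical form; groups with >1 member are duplicate candidates."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    return {canon: dupes for canon, dupes in groups.items() if len(dupes) > 1}
```

Each surviving group is a candidate for a shared `rel=canonical` target.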

## Product Schema Checker

```bash
# Validate single product page
python scripts/product_schema_checker.py --url https://example.com/product/123 --json

# Batch validate from sitemap
python scripts/product_schema_checker.py --sitemap https://example.com/product-sitemap.xml --sample 50 --json
```

**Capabilities**:
- Product schema validation (Product, Offer, AggregateRating, Review, BreadcrumbList)
- Required property completeness check (name, image, description, offers, price, availability)
- Optional property recommendations (brand, sku, gtin, mpn, review, aggregateRating)
- Rich result eligibility assessment
- Price and availability markup validation
- Merchant listing schema support
- Korean market: Naver Shopping structured data requirements
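
The completeness check above can be sketched as a walk over a parsed Product JSON-LD object. The required/recommended lists follow Google's Product structured data guidance in simplified form and are an assumption here, not the checker's exact rule set:

```python
REQUIRED = ["name", "image", "offers"]
RECOMMENDED = ["description", "brand", "sku", "aggregateRating", "review"]
OFFER_REQUIRED = ["price", "priceCurrency", "availability"]

def check_product_schema(schema: dict) -> dict[str, list[str]]:
    """Return missing required/recommended properties for a Product JSON-LD object."""
    missing = [p for p in REQUIRED if p not in schema]
    recommended = [p for p in RECOMMENDED if p not in schema]
    offer = schema.get("offers") or {}
    if isinstance(offer, dict):
        missing += [f"offers.{p}" for p in OFFER_REQUIRED if p not in offer]
    return {"missing_required": missing, "missing_recommended": recommended}
```

A page with an empty `missing_required` list is a candidate for rich result eligibility; missing recommended properties only reduce snippet quality.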

## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-pages-by-traffic` | Identify top product/category pages |
| `site-explorer-organic-keywords` | Product page keyword performance |

## Output Format

```json
{
  "url": "https://example.com",
  "product_pages_audited": 50,
  "issues": {
    "critical": [...],
    "high": [...],
    "medium": [...],
    "low": [...]
  },
  "category_structure": {
    "max_depth": 4,
    "breadcrumbs_present": true,
    "faceted_nav_issues": [...]
  },
  "schema_validation": {
    "pages_with_schema": 42,
    "pages_without_schema": 8,
    "common_errors": [...]
  },
  "korean_marketplaces": {
    "naver_smart_store": {"found": true, "url": "..."},
    "coupang": {"found": false},
    "gmarket": {"found": false}
  },
  "score": 65,
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | E-Commerce SEO |
| Priority | Select | Based on issue severity |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: ECOM-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Schema Markup, Product, Offer)
- URLs and code remain unchanged
207 custom-skills/24-seo-ecommerce/code/scripts/base_client.py Normal file
@@ -0,0 +1,207 @@
|
|||||||
|
"""
|
||||||
|
Base Client - Shared async client utilities
|
||||||
|
===========================================
|
||||||
|
Purpose: Rate-limited async operations for API clients
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
from asyncio import Semaphore
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any, Callable, TypeVar
|
||||||
|
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
from tenacity import (
|
||||||
|
retry,
|
||||||
|
stop_after_attempt,
|
||||||
|
wait_exponential,
|
||||||
|
retry_if_exception_type,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Load environment variables
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
# Logging setup
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
|
||||||
|
T = TypeVar("T")
|
||||||
|
|
||||||
|
|
||||||
|
class RateLimiter:
|
||||||
|
"""Rate limiter using token bucket algorithm."""
|
||||||
|
|
||||||
|
def __init__(self, rate: float, per: float = 1.0):
|
||||||
|
"""
|
||||||
|
Initialize rate limiter.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
rate: Number of requests allowed
|
||||||
|
per: Time period in seconds (default: 1 second)
|
||||||
|
"""
|
||||||
|
self.rate = rate
|
||||||
|
self.per = per
|
||||||
|
self.tokens = rate
|
||||||
|
self.last_update = datetime.now()
|
||||||
|
self._lock = asyncio.Lock()
|
||||||
|
|
||||||
|
async def acquire(self) -> None:
|
||||||
|
"""Acquire a token, waiting if necessary."""
|
||||||
|
async with self._lock:
|
||||||
|
now = datetime.now()
|
||||||
|
elapsed = (now - self.last_update).total_seconds()
|
||||||
|
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
|
||||||
|
self.last_update = now
|
||||||
|
|
||||||
|
if self.tokens < 1:
|
||||||
|
wait_time = (1 - self.tokens) * (self.per / self.rate)
|
||||||
|
await asyncio.sleep(wait_time)
|
||||||
|
self.tokens = 0
|
||||||
|
else:
|
||||||
|
self.tokens -= 1
|
||||||
|
|
||||||
|
|
||||||
|
class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)

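The essential pattern in `batch_requests` is bounded concurrency with errors captured as values rather than raised. A stdlib-only sketch of that pattern (the names `bounded_gather` and `_demo` are hypothetical; the real method adds rate limiting, retries, and optional tqdm progress):

```python
import asyncio


async def bounded_gather(coros, max_concurrent: int = 5) -> list:
    """Run coroutines concurrently, at most max_concurrent in flight;
    exceptions become {"error": ...} results so one failure cannot sink the batch."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run(c):
        async with sem:
            try:
                return await c
            except Exception as e:
                return {"error": str(e)}

    # gather preserves input order, so results line up with requests
    return await asyncio.gather(*(run(c) for c in coros))


async def _demo() -> list:
    async def ok(n: int) -> int:
        await asyncio.sleep(0)
        return n * 2

    async def bad():
        raise ValueError("boom")

    return await bounded_gather([ok(1), ok(2), bad()])


results = asyncio.run(_demo())
```

The error-as-value convention matches the inner `execute` helper above, which is what lets a partial batch still produce a usable report.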
class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
1046  custom-skills/24-seo-ecommerce/code/scripts/ecommerce_auditor.py  (Normal file)
File diff suppressed because it is too large
@@ -0,0 +1,805 @@
"""
Product Schema Checker
======================
Purpose: Validate Product structured data (JSON-LD, Microdata, RDFa)
for Google and Naver rich result eligibility.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import re
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse

import aiohttp
from bs4 import BeautifulSoup
from rich.console import Console
from rich.table import Table

from base_client import BaseAsyncClient, config

logger = logging.getLogger(__name__)
console = Console()

# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

@dataclass
class SchemaProperty:
    """Single property within a schema object."""
    name: str
    value: Any
    required: bool
    valid: bool
    error: str = ""


@dataclass
class ProductSchema:
    """Validation result for one product schema on a page."""
    url: str
    schema_type: str  # Product, Offer, AggregateRating, etc.
    properties: list[dict]  # list of SchemaProperty as dicts
    is_valid: bool = False
    rich_result_eligible: bool = False
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


@dataclass
class SchemaCheckResult:
    """Complete schema check result for one or more pages."""
    urls_checked: int = 0
    pages_with_schema: int = 0
    pages_without_schema: int = 0
    schemas: list[dict] = field(default_factory=list)
    common_errors: list[str] = field(default_factory=list)
    common_warnings: list[str] = field(default_factory=list)
    naver_shopping_issues: list[dict] = field(default_factory=list)
    score: int = 0
    timestamp: str = ""

    def calculate_score(self) -> int:
        """Score 0-100 based on schema completeness."""
        if self.urls_checked == 0:
            self.score = 0
            return 0
        coverage = self.pages_with_schema / self.urls_checked
        valid_schemas = sum(1 for s in self.schemas if s.get("is_valid"))
        validity_rate = valid_schemas / max(len(self.schemas), 1)
        eligible = sum(1 for s in self.schemas if s.get("rich_result_eligible"))
        eligibility_rate = eligible / max(len(self.schemas), 1)
        self.score = int(coverage * 40 + validity_rate * 35 + eligibility_rate * 25)
        return self.score

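The score is a weighted blend: schema coverage is worth 40 points, validity 35, and rich-result eligibility 25. Isolated as a pure function (hypothetical name `weighted_score`; the dataclass method computes the three rates itself):

```python
def weighted_score(coverage: float, validity: float, eligibility: float) -> int:
    """Same 40/35/25 weighting used by SchemaCheckResult.calculate_score,
    truncated to an int exactly as int() does."""
    return int(coverage * 40 + validity * 35 + eligibility * 25)


# Perfect site: every page has schema, every schema valid and eligible.
perfect = weighted_score(1.0, 1.0, 1.0)
# Half the pages covered, 80% of schemas valid, a quarter eligible.
partial = weighted_score(0.5, 0.8, 0.25)
```

Note that `int()` truncates rather than rounds, so a site scoring 54.25 reports 54; that matches the production code's behavior.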
# ---------------------------------------------------------------------------
# Schema requirements
# ---------------------------------------------------------------------------

PRODUCT_REQUIRED = {"name", "image", "description"}
PRODUCT_RECOMMENDED = {
    "brand", "sku", "gtin", "gtin8", "gtin13", "gtin14", "mpn",
    "offers", "review", "aggregateRating", "color", "material",
}

OFFER_REQUIRED = {"price", "priceCurrency", "availability"}
OFFER_RECOMMENDED = {
    "url", "priceValidUntil", "itemCondition", "seller",
    "shippingDetails", "hasMerchantReturnPolicy",
}

AGGREGATE_RATING_REQUIRED = {"ratingValue", "reviewCount"}
AGGREGATE_RATING_RECOMMENDED = {"bestRating", "worstRating", "ratingCount"}

REVIEW_REQUIRED = {"author", "reviewRating"}
REVIEW_RECOMMENDED = {"datePublished", "reviewBody", "name"}

BREADCRUMB_REQUIRED = {"itemListElement"}

AVAILABILITY_VALUES = {
    "https://schema.org/InStock",
    "https://schema.org/OutOfStock",
    "https://schema.org/PreOrder",
    "https://schema.org/BackOrder",
    "https://schema.org/Discontinued",
    "https://schema.org/InStoreOnly",
    "https://schema.org/OnlineOnly",
    "https://schema.org/LimitedAvailability",
    "https://schema.org/SoldOut",
    "http://schema.org/InStock",
    "http://schema.org/OutOfStock",
    "http://schema.org/PreOrder",
    "http://schema.org/BackOrder",
    "http://schema.org/Discontinued",
    "InStock", "OutOfStock", "PreOrder", "BackOrder", "Discontinued",
}

ITEM_CONDITION_VALUES = {
    "https://schema.org/NewCondition",
    "https://schema.org/UsedCondition",
    "https://schema.org/RefurbishedCondition",
    "https://schema.org/DamagedCondition",
    "http://schema.org/NewCondition",
    "http://schema.org/UsedCondition",
    "http://schema.org/RefurbishedCondition",
    "NewCondition", "UsedCondition", "RefurbishedCondition",
}


# ---------------------------------------------------------------------------
# Main checker
# ---------------------------------------------------------------------------

class ProductSchemaChecker(BaseAsyncClient):
    """Validate Product structured data on e-commerce pages."""

    def __init__(
        self,
        max_concurrent: int = 10,
        requests_per_second: float = 5.0,
        timeout: int = 30,
    ):
        super().__init__(max_concurrent=max_concurrent, requests_per_second=requests_per_second)
        self.timeout = aiohttp.ClientTimeout(total=timeout)
        self.headers = {
            "User-Agent": (
                "Mozilla/5.0 (compatible; ProductSchemaChecker/1.0; "
                "+https://ourdigital.org)"
            ),
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7",
        }

    # ------------------------------------------------------------------
    # Page fetching
    # ------------------------------------------------------------------

    async def _fetch_page(self, session: aiohttp.ClientSession, url: str) -> str:
        """Fetch page HTML."""
        try:
            async with session.get(url, headers=self.headers, timeout=self.timeout,
                                   allow_redirects=True, ssl=False) as resp:
                return await resp.text(errors="replace")
        except Exception as exc:
            self.logger.warning(f"Failed to fetch {url}: {exc}")
            return ""

    # ------------------------------------------------------------------
    # Schema extraction
    # ------------------------------------------------------------------

    def extract_schemas(self, html: str, page_url: str) -> list[dict]:
        """Extract all structured data from HTML (JSON-LD, Microdata, RDFa)."""
        schemas: list[dict] = []
        soup = BeautifulSoup(html, "lxml")

        # --- JSON-LD ---
        for script in soup.find_all("script", attrs={"type": "application/ld+json"}):
            try:
                text = script.string or script.get_text()
                if not text:
                    continue
                data = json.loads(text)
                if isinstance(data, list):
                    for item in data:
                        if isinstance(item, dict):
                            schemas.append(item)
                elif isinstance(data, dict):
                    # Handle @graph
                    if "@graph" in data:
                        for item in data["@graph"]:
                            if isinstance(item, dict):
                                schemas.append(item)
                    else:
                        schemas.append(data)
            except (json.JSONDecodeError, TypeError) as exc:
                self.logger.debug(f"JSON-LD parse error on {page_url}: {exc}")

        # --- Microdata ---
        for item_scope in soup.find_all(attrs={"itemscope": True}):
            item_type = item_scope.get("itemtype", "")
            if "Product" in item_type or "Offer" in item_type:
                microdata = self._parse_microdata(item_scope)
                if microdata:
                    schemas.append(microdata)

        return schemas

    def _parse_microdata(self, element) -> dict:
        """Parse microdata from an itemscope element."""
        result: dict[str, Any] = {}
        item_type = element.get("itemtype", "")
        if item_type:
            type_name = item_type.rstrip("/").split("/")[-1]
            result["@type"] = type_name

        for prop in element.find_all(attrs={"itemprop": True}, recursive=True):
            name = prop.get("itemprop", "")
            if not name:
                continue
            # Nested itemscope
            if prop.get("itemscope") is not None:
                result[name] = self._parse_microdata(prop)
            elif prop.name == "meta":
                result[name] = prop.get("content", "")
            elif prop.name == "link":
                result[name] = prop.get("href", "")
            elif prop.name == "img":
                result[name] = prop.get("src", "")
            elif prop.name == "time":
                result[name] = prop.get("datetime", prop.get_text(strip=True))
            else:
                result[name] = prop.get_text(strip=True)

        return result

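The JSON-LD branch has to handle three payload shapes: a single object, a top-level array, and an object wrapping its nodes in `@graph`. That normalization can be shown with stdlib `json` alone (the helper `flatten_jsonld` is hypothetical; `extract_schemas` additionally locates the `<script>` tags via BeautifulSoup):

```python
import json


def flatten_jsonld(raw: str) -> list[dict]:
    """Flatten a JSON-LD payload (object, array, or @graph wrapper) to a
    flat list of dict nodes, dropping non-dict entries."""
    data = json.loads(raw)
    if isinstance(data, list):
        return [d for d in data if isinstance(d, dict)]
    if isinstance(data, dict):
        if "@graph" in data:
            return [d for d in data["@graph"] if isinstance(d, dict)]
        return [data]
    return []


payload = (
    '{"@context": "https://schema.org", "@graph": ['
    '{"@type": "Product", "name": "Widget"}, '
    '{"@type": "BreadcrumbList"}]}'
)
nodes = flatten_jsonld(payload)
```

Flattening `@graph` early means every downstream validator can assume it receives one plain node dict at a time.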
    # ------------------------------------------------------------------
    # Validation methods
    # ------------------------------------------------------------------

    def validate_product_schema(self, schema_data: dict, page_url: str) -> ProductSchema:
        """Validate a Product schema object."""
        ps = ProductSchema(
            url=page_url,
            schema_type="Product",
            properties=[],
        )

        # Check required properties
        for prop_name in PRODUCT_REQUIRED:
            value = schema_data.get(prop_name)
            valid = bool(value)
            error = "" if valid else f"Missing required property: {prop_name}"
            sp = SchemaProperty(
                name=prop_name, value=value, required=True, valid=valid, error=error,
            )
            ps.properties.append(asdict(sp))
            if not valid:
                ps.errors.append(error)

        # Check recommended properties
        for prop_name in PRODUCT_RECOMMENDED:
            value = schema_data.get(prop_name)
            sp = SchemaProperty(
                name=prop_name, value=value if value else None,
                required=False, valid=bool(value),
                error="" if value else f"Missing recommended property: {prop_name}",
            )
            ps.properties.append(asdict(sp))
            if not value:
                ps.warnings.append(f"Missing recommended property: {prop_name}")

        # Validate offers
        offers = schema_data.get("offers")
        if offers:
            if isinstance(offers, list):
                for offer in offers:
                    offer_errors = self.validate_offer_schema(offer)
                    ps.errors.extend(offer_errors["errors"])
                    ps.warnings.extend(offer_errors["warnings"])
            elif isinstance(offers, dict):
                offer_errors = self.validate_offer_schema(offers)
                ps.errors.extend(offer_errors["errors"])
                ps.warnings.extend(offer_errors["warnings"])
        else:
            ps.errors.append("Missing 'offers' property (required for rich results)")

        # Validate aggregateRating
        agg_rating = schema_data.get("aggregateRating")
        if agg_rating and isinstance(agg_rating, dict):
            rating_result = self.validate_aggregate_rating(agg_rating)
            ps.errors.extend(rating_result["errors"])
            ps.warnings.extend(rating_result["warnings"])

        # Validate reviews
        review = schema_data.get("review")
        if review:
            reviews = review if isinstance(review, list) else [review]
            for r in reviews[:5]:  # Check up to 5 reviews
                if isinstance(r, dict):
                    review_result = self.validate_review_schema(r)
                    ps.errors.extend(review_result["errors"])
                    ps.warnings.extend(review_result["warnings"])

        ps.is_valid = len(ps.errors) == 0
        ps.rich_result_eligible = self.check_rich_result_eligibility(schema_data)

        return ps

    def validate_offer_schema(self, offer_data: dict) -> dict[str, list[str]]:
        """Validate an Offer schema object."""
        errors: list[str] = []
        warnings: list[str] = []

        for prop_name in OFFER_REQUIRED:
            value = offer_data.get(prop_name)
            if not value:
                errors.append(f"Offer missing required property: {prop_name}")

        # Validate price format
        price = offer_data.get("price")
        if price is not None:
            price_str = str(price).replace(",", "").strip()
            if not re.match(r"^\d+(\.\d+)?$", price_str):
                errors.append(f"Invalid price format: '{price}' (must be numeric)")
            elif float(price_str) <= 0:
                warnings.append(f"Price is zero or negative: {price}")

        # Validate priceCurrency
        currency = offer_data.get("priceCurrency", "")
        valid_currencies = {"KRW", "USD", "EUR", "JPY", "CNY", "GBP"}
        if currency and currency.upper() not in valid_currencies:
            warnings.append(f"Unusual currency code: {currency}")

        # Validate availability
        availability = offer_data.get("availability", "")
        if availability and availability not in AVAILABILITY_VALUES:
            errors.append(
                f"Invalid availability value: '{availability}'. "
                f"Use schema.org values like https://schema.org/InStock"
            )

        # Validate itemCondition
        condition = offer_data.get("itemCondition", "")
        if condition and condition not in ITEM_CONDITION_VALUES:
            warnings.append(f"Invalid itemCondition: '{condition}'")

        # Check recommended
        for prop_name in OFFER_RECOMMENDED:
            if not offer_data.get(prop_name):
                warnings.append(f"Offer missing recommended property: {prop_name}")

        return {"errors": errors, "warnings": warnings}

    def validate_aggregate_rating(self, rating_data: dict) -> dict[str, list[str]]:
        """Validate AggregateRating schema."""
        errors: list[str] = []
        warnings: list[str] = []

        for prop_name in AGGREGATE_RATING_REQUIRED:
            value = rating_data.get(prop_name)
            if value is None:
                errors.append(f"AggregateRating missing required: {prop_name}")

        # Validate ratingValue range
        rating_value = rating_data.get("ratingValue")
        best_rating = rating_data.get("bestRating", 5)
        worst_rating = rating_data.get("worstRating", 1)
        if rating_value is not None:
            try:
                rv = float(rating_value)
                br = float(best_rating)
                wr = float(worst_rating)
                if rv < wr or rv > br:
                    errors.append(
                        f"ratingValue ({rv}) outside range [{wr}, {br}]"
                    )
            except (ValueError, TypeError):
                errors.append(f"Invalid ratingValue format: {rating_value}")

        # Validate reviewCount
        review_count = rating_data.get("reviewCount")
        if review_count is not None:
            try:
                rc = int(review_count)
                if rc < 0:
                    errors.append(f"Negative reviewCount: {rc}")
            except (ValueError, TypeError):
                errors.append(f"Invalid reviewCount format: {review_count}")

        for prop_name in AGGREGATE_RATING_RECOMMENDED:
            if not rating_data.get(prop_name):
                warnings.append(f"AggregateRating missing recommended: {prop_name}")

        return {"errors": errors, "warnings": warnings}

    def validate_review_schema(self, review_data: dict) -> dict[str, list[str]]:
        """Validate Review schema."""
        errors: list[str] = []
        warnings: list[str] = []

        # Author validation
        author = review_data.get("author")
        if not author:
            errors.append("Review missing required: author")
        elif isinstance(author, dict):
            author_name = author.get("name", "")
            if not author_name:
                errors.append("Review author missing 'name' property")
        elif isinstance(author, str):
            if len(author.strip()) == 0:
                errors.append("Review author is empty string")

        # reviewRating validation
        review_rating = review_data.get("reviewRating")
        if not review_rating:
            errors.append("Review missing required: reviewRating")
        elif isinstance(review_rating, dict):
            rv = review_rating.get("ratingValue")
            if rv is None:
                errors.append("reviewRating missing ratingValue")

        for prop_name in REVIEW_RECOMMENDED:
            if not review_data.get(prop_name):
                warnings.append(f"Review missing recommended: {prop_name}")

        return {"errors": errors, "warnings": warnings}

    def validate_breadcrumb(self, schema_data: dict) -> dict[str, list[str]]:
        """Validate BreadcrumbList schema."""
        errors: list[str] = []
        warnings: list[str] = []

        items = schema_data.get("itemListElement")
        if not items:
            errors.append("BreadcrumbList missing itemListElement")
            return {"errors": errors, "warnings": warnings}

        if not isinstance(items, list):
            errors.append("itemListElement should be an array")
            return {"errors": errors, "warnings": warnings}

        for i, item in enumerate(items):
            if not isinstance(item, dict):
                errors.append(f"Breadcrumb item {i} is not an object")
                continue
            position = item.get("position")
            if position is None:
                errors.append(f"Breadcrumb item {i} missing 'position'")
            name = item.get("name") or (
                item.get("item", {}).get("name")
                if isinstance(item.get("item"), dict) else None
            )
            if not name:
                warnings.append(f"Breadcrumb item {i} missing 'name'")

        return {"errors": errors, "warnings": warnings}

    # ------------------------------------------------------------------
    # Rich result eligibility
    # ------------------------------------------------------------------

    def check_rich_result_eligibility(self, schema_data: dict) -> bool:
        """Assess Google rich result eligibility for Product schema."""
        # Must have name, image, and offers with price
        if not schema_data.get("name"):
            return False
        if not schema_data.get("image"):
            return False

        offers = schema_data.get("offers")
        if not offers:
            return False

        offer_list = offers if isinstance(offers, list) else [offers]
        for offer in offer_list:
            if not isinstance(offer, dict):
                continue
            if offer.get("price") and offer.get("priceCurrency") and offer.get("availability"):
                return True

        return False

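The eligibility gate is a pure predicate over the node: name, image, and at least one offer carrying price, currency, and availability. Restated as a standalone function (hypothetical name `eligible`; the sample product dicts below are illustrative, not from the source):

```python
def eligible(product: dict) -> bool:
    """Rich-result gate: name, image, and at least one complete offer."""
    if not product.get("name") or not product.get("image"):
        return False
    offers = product.get("offers")
    if not offers:
        return False
    offer_list = offers if isinstance(offers, list) else [offers]
    return any(
        isinstance(o, dict)
        and o.get("price") and o.get("priceCurrency") and o.get("availability")
        for o in offer_list
    )


complete = {
    "name": "Widget",
    "image": "https://example.com/w.jpg",
    "offers": {"price": "19900", "priceCurrency": "KRW",
               "availability": "https://schema.org/InStock"},
}
missing_offer = {"name": "Widget", "image": "https://example.com/w.jpg"}
```

Keeping this separate from `is_valid` is deliberate: a schema can pass validation (no hard errors) yet still miss the extra offer fields Google needs before it will render a rich snippet.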
    # ------------------------------------------------------------------
    # Naver Shopping requirements
    # ------------------------------------------------------------------

    def check_naver_shopping_requirements(self, schema_data: dict, page_url: str) -> list[dict]:
        """Check Naver Shopping specific schema requirements."""
        issues: list[dict] = []

        # Naver Shopping requires Product name in Korean for Korean market
        name = schema_data.get("name", "")
        korean_chars = len(re.findall(r"[\uac00-\ud7af]", str(name)))
        if korean_chars == 0 and name:
            issues.append({
                "url": page_url,
                "type": "naver_product_name",
                "severity": "medium",
                "message": "Product name has no Korean characters",
                "recommendation": "Include Korean product name for Naver Shopping visibility.",
            })

        # Naver prefers specific category mapping
        if not schema_data.get("category"):
            issues.append({
                "url": page_url,
                "type": "naver_category",
                "severity": "low",
                "message": "Missing 'category' property for Naver Shopping categorization",
                "recommendation": "Add category property matching Naver Shopping category taxonomy.",
            })

        # Naver requires image
        image = schema_data.get("image")
        if not image:
            issues.append({
                "url": page_url,
                "type": "naver_image",
                "severity": "high",
                "message": "Missing product image (required for Naver Shopping)",
                "recommendation": "Add at least one high-quality product image URL.",
            })
        elif isinstance(image, str):
            if not image.startswith("http"):
                issues.append({
                    "url": page_url,
                    "type": "naver_image_url",
                    "severity": "medium",
                    "message": "Product image URL is relative (should be absolute)",
                    "recommendation": "Use absolute URLs for product images.",
                })

        # Naver requires price in KRW
        offers = schema_data.get("offers")
        if offers:
            offer_list = offers if isinstance(offers, list) else [offers]
            for offer in offer_list:
                if isinstance(offer, dict):
                    currency = offer.get("priceCurrency", "")
                    if currency and currency.upper() != "KRW":
                        issues.append({
                            "url": page_url,
                            "type": "naver_currency",
                            "severity": "medium",
                            "message": f"Price currency is {currency}, not KRW",
                            "recommendation": "For Naver Shopping, provide price in KRW.",
                        })

        # Check brand/manufacturer
        if not schema_data.get("brand") and not schema_data.get("manufacturer"):
            issues.append({
                "url": page_url,
                "type": "naver_brand",
                "severity": "low",
                "message": "Missing brand/manufacturer (helpful for Naver Shopping filters)",
                "recommendation": "Add brand or manufacturer property.",
            })

        return issues

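The Korean-name heuristic works by counting characters in the Hangul Syllables Unicode block (U+AC00 to U+D7AF). Pulled out on its own (hypothetical helper `korean_char_count`; the example product names are illustrative):

```python
import re

# Hangul Syllables block -- composed Korean characters like 가..힣
HANGUL = re.compile(r"[\uac00-\ud7af]")


def korean_char_count(text: str) -> int:
    """Count Hangul syllables, the same range the Naver name check uses."""
    return len(HANGUL.findall(text))


mixed = korean_char_count("무선 이어폰 Pro")     # Korean words plus a Latin suffix
latin = korean_char_count("Wireless Earbuds")   # no Hangul at all
```

A limitation worth knowing: this range covers composed syllables only, so isolated jamo (U+1100 block) would not be counted, though product names virtually always use composed forms.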
    # ------------------------------------------------------------------
    # Orchestrator
    # ------------------------------------------------------------------

    async def check(
        self,
        urls: list[str] | None = None,
        sitemap_url: str | None = None,
        sample_size: int = 50,
    ) -> SchemaCheckResult:
        """Run schema validation on URLs or sitemap."""
        result = SchemaCheckResult(timestamp=datetime.now().isoformat())
        target_urls: list[str] = []

        async with aiohttp.ClientSession() as session:
            if sitemap_url:
                # Fetch URLs from sitemap
                target_urls = await self._urls_from_sitemap(session, sitemap_url, sample_size)
            if urls:
                target_urls.extend(urls)

            target_urls = list(set(target_urls))[:sample_size]
            result.urls_checked = len(target_urls)
            self.logger.info(f"Checking {len(target_urls)} URLs for Product schema")

            error_counter: dict[str, int] = {}
            warning_counter: dict[str, int] = {}

            for url in target_urls:
                html = await self._fetch_page(session, url)
                if not html:
                    result.pages_without_schema += 1
                    continue

                schemas = self.extract_schemas(html, url)
                product_schemas = [
                    s for s in schemas
                    if self._get_schema_type(s) in ("Product", "ProductGroup")
                ]
                breadcrumb_schemas = [
                    s for s in schemas
                    if self._get_schema_type(s) == "BreadcrumbList"
                ]

                if not product_schemas:
                    result.pages_without_schema += 1
                    continue

                result.pages_with_schema += 1

                for ps_data in product_schemas:
                    ps = self.validate_product_schema(ps_data, url)
                    result.schemas.append(asdict(ps))

                    for err in ps.errors:
                        error_counter[err] = error_counter.get(err, 0) + 1
                    for warn in ps.warnings:
                        warning_counter[warn] = warning_counter.get(warn, 0) + 1

                    # Naver Shopping checks
                    naver_issues = self.check_naver_shopping_requirements(ps_data, url)
                    result.naver_shopping_issues.extend(naver_issues)

                # Validate breadcrumbs
                for bc_data in breadcrumb_schemas:
                    bc_result = self.validate_breadcrumb(bc_data)
                    for err in bc_result["errors"]:
                        error_counter[err] = error_counter.get(err, 0) + 1

        # Aggregate common errors/warnings
        result.common_errors = sorted(
            error_counter.keys(),
            key=lambda k: error_counter[k],
            reverse=True,
        )[:20]
        result.common_warnings = sorted(
            warning_counter.keys(),
            key=lambda k: warning_counter[k],
            reverse=True,
        )[:20]

        result.calculate_score()
        return result

    async def _urls_from_sitemap(
        self,
        session: aiohttp.ClientSession,
        sitemap_url: str,
        limit: int,
    ) -> list[str]:
        """Fetch product URLs from sitemap."""
        urls: list[str] = []
        try:
            async with session.get(sitemap_url, headers=self.headers,
                                   timeout=self.timeout, ssl=False) as resp:
                if resp.status != 200:
                    return urls
                text = await resp.text(errors="replace")
                soup = BeautifulSoup(text, "lxml-xml")

                # Handle sitemap index
                sitemapindex = soup.find_all("sitemap")
                if sitemapindex:
                    for sm in sitemapindex[:3]:
                        loc = sm.find("loc")
                        if loc:
                            child_urls = await self._urls_from_sitemap(session, loc.text.strip(), limit)
                            urls.extend(child_urls)
                        if len(urls) >= limit:
                            break
                else:
                    for tag in soup.find_all("url"):
                        loc = tag.find("loc")
                        if loc:
                            urls.append(loc.text.strip())
                        if len(urls) >= limit:
                            break
        except Exception as exc:
            self.logger.warning(f"Sitemap parse failed: {exc}")

        return urls[:limit]

@staticmethod
|
||||||
|
def _get_schema_type(schema: dict) -> str:
|
||||||
|
"""Get the @type from a schema dict, handling various formats."""
|
||||||
|
schema_type = schema.get("@type", "")
|
||||||
|
if isinstance(schema_type, list):
|
||||||
|
return schema_type[0] if schema_type else ""
|
||||||
|
return str(schema_type)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CLI output helpers
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def print_rich_report(result: SchemaCheckResult) -> None:
|
||||||
|
"""Print a rich-formatted report to the console."""
|
||||||
|
console.print(f"\n[bold cyan]Product Schema Validation Report[/bold cyan]")
|
||||||
|
console.print(f"Timestamp: {result.timestamp}")
|
||||||
|
console.print(f"URLs checked: {result.urls_checked}")
|
||||||
|
|
||||||
|
# Coverage
|
||||||
|
coverage = (result.pages_with_schema / max(result.urls_checked, 1)) * 100
|
||||||
|
cov_color = "green" if coverage >= 90 else "yellow" if coverage >= 50 else "red"
|
||||||
|
console.print(f"Schema coverage: [{cov_color}]{coverage:.0f}%[/{cov_color}] "
|
||||||
|
f"({result.pages_with_schema}/{result.urls_checked})")
|
||||||
|
|
||||||
|
# Score
|
||||||
|
score_color = "green" if result.score >= 80 else "yellow" if result.score >= 50 else "red"
|
||||||
|
console.print(f"[bold {score_color}]Score: {result.score}/100[/bold {score_color}]")
|
||||||
|
|
||||||
|
# Validity summary
|
||||||
|
valid = sum(1 for s in result.schemas if s.get("is_valid"))
|
||||||
|
eligible = sum(1 for s in result.schemas if s.get("rich_result_eligible"))
|
||||||
|
total = len(result.schemas)
|
||||||
|
|
||||||
|
table = Table(title="Schema Summary")
|
||||||
|
table.add_column("Metric", style="bold")
|
||||||
|
table.add_column("Value", justify="right")
|
||||||
|
table.add_row("Total schemas found", str(total))
|
||||||
|
table.add_row("Valid schemas", str(valid))
|
||||||
|
table.add_row("Rich result eligible", str(eligible))
|
||||||
|
table.add_row("Pages without schema", str(result.pages_without_schema))
|
||||||
|
console.print(table)
|
||||||
|
|
||||||
|
# Common errors
|
||||||
|
if result.common_errors:
|
||||||
|
console.print(f"\n[bold red]Common Errors ({len(result.common_errors)}):[/bold red]")
|
||||||
|
for err in result.common_errors[:10]:
|
||||||
|
console.print(f" [red]-[/red] {err}")
|
||||||
|
|
||||||
|
# Common warnings
|
||||||
|
if result.common_warnings:
|
||||||
|
console.print(f"\n[bold yellow]Common Warnings ({len(result.common_warnings)}):[/bold yellow]")
|
||||||
|
for warn in result.common_warnings[:10]:
|
||||||
|
console.print(f" [yellow]-[/yellow] {warn}")
|
||||||
|
|
||||||
|
# Naver Shopping issues
|
||||||
|
if result.naver_shopping_issues:
|
||||||
|
console.print(f"\n[bold magenta]Naver Shopping Issues ({len(result.naver_shopping_issues)}):[/bold magenta]")
|
||||||
|
seen: set[str] = set()
|
||||||
|
for issue in result.naver_shopping_issues:
|
||||||
|
key = f"{issue['type']}:{issue['message']}"
|
||||||
|
if key not in seen:
|
||||||
|
seen.add(key)
|
||||||
|
console.print(f" [{issue.get('severity', 'medium')}] {issue['message']}")
|
||||||
|
console.print(f" [dim]{issue['recommendation']}[/dim]")
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Main
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Product Schema Checker - Validate e-commerce structured data",
|
||||||
|
)
|
||||||
|
group = parser.add_mutually_exclusive_group(required=True)
|
||||||
|
group.add_argument("--url", nargs="+", help="Product page URL(s) to validate")
|
||||||
|
group.add_argument("--sitemap", help="Sitemap URL to fetch product pages from")
|
||||||
|
parser.add_argument(
|
||||||
|
"--sample",
|
||||||
|
type=int,
|
||||||
|
default=50,
|
||||||
|
help="Max URLs to check from sitemap (default: 50)",
|
||||||
|
)
|
||||||
|
parser.add_argument("--json", action="store_true", help="Output as JSON")
|
||||||
|
parser.add_argument("--output", type=str, help="Save output to file")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
checker = ProductSchemaChecker()
|
||||||
|
result = asyncio.run(
|
||||||
|
checker.check(
|
||||||
|
urls=args.url,
|
||||||
|
sitemap_url=args.sitemap,
|
||||||
|
sample_size=args.sample,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
if args.json:
|
||||||
|
output = json.dumps(asdict(result), indent=2, ensure_ascii=False, default=str)
|
||||||
|
if args.output:
|
||||||
|
with open(args.output, "w", encoding="utf-8") as f:
|
||||||
|
f.write(output)
|
||||||
|
console.print(f"[green]Results saved to {args.output}[/green]")
|
||||||
|
else:
|
||||||
|
print(output)
|
||||||
|
else:
|
||||||
|
print_rich_report(result)
|
||||||
|
if args.output:
|
||||||
|
output = json.dumps(asdict(result), indent=2, ensure_ascii=False, default=str)
|
||||||
|
with open(args.output, "w", encoding="utf-8") as f:
|
||||||
|
f.write(output)
|
||||||
|
console.print(f"\n[green]JSON results also saved to {args.output}[/green]")
|
||||||
|
|
||||||
|
checker.print_stats()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
@@ -0,0 +1,9 @@
# 24-seo-ecommerce dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
custom-skills/24-seo-ecommerce/desktop/SKILL.md (new file, 156 lines)
@@ -0,0 +1,156 @@
---
name: seo-ecommerce
description: |
  E-commerce SEO audit and optimization for product pages, product schema, category taxonomy,
  and Korean marketplace presence.
  Triggers: product SEO, e-commerce audit, product schema, category SEO, Smart Store, marketplace SEO,
  상품 SEO, 이커머스 감사, 쇼핑몰 SEO.
---

# E-Commerce SEO Audit

## Purpose

Audit e-commerce sites for product page optimization, structured data validation, category taxonomy health, duplicate content issues, and Korean marketplace presence (Naver Smart Store, Coupang, Gmarket, 11번가).

## Core Capabilities

1. **Product Page SEO Audit** - Title, meta description, H1, image alt text, internal links, canonical tags
2. **Product Schema Validation** - Product, Offer, AggregateRating, Review, BreadcrumbList structured data
3. **Category Taxonomy Analysis** - Depth check, breadcrumbs, faceted navigation handling
4. **Duplicate Content Detection** - Parameter variants, product variants, pagination issues
5. **Korean Marketplace Presence** - Naver Smart Store, Coupang, Gmarket, 11번가

## MCP Tool Usage

### Ahrefs for Product Page Discovery

```
mcp__ahrefs__site-explorer-pages-by-traffic: Identify top product and category pages
mcp__ahrefs__site-explorer-organic-keywords: Product page keyword performance
```

### WebSearch for Marketplace Checks

```
WebSearch: Search for brand presence on Korean marketplaces
WebFetch: Fetch and analyze marketplace listing pages
```

### Notion for Report Storage

```
mcp__notion__notion-create-pages: Save audit report to SEO Audit Log database
```

## Workflow

### 1. Product Page Audit

1. Discover product pages via Ahrefs pages-by-traffic or sitemap
2. For each product page check:
   - Title tag: contains product name, under 60 chars
   - Meta description: includes price/feature info, under 155 chars
   - Single H1 with product name
   - All product images have descriptive alt text
   - Canonical tag present and correct
   - Sufficient internal links (related products, breadcrumbs)
   - Open Graph tags for social sharing
3. Score severity: critical/high/medium/low
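The per-page checks in step 2 translate directly into code. Below is a minimal sketch assuming a fetched HTML string parsed with BeautifulSoup; the `audit_product_page` helper and its issue dictionaries are illustrative, not this skill's actual script:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def audit_product_page(html: str) -> list[dict]:
    """Run a few of the on-page checks from the workflow above (sketch)."""
    soup = BeautifulSoup(html, "html.parser")
    issues = []

    # Title tag: present and under 60 chars
    title = soup.title.get_text(strip=True) if soup.title else ""
    if not title:
        issues.append({"severity": "critical", "issue": "missing title"})
    elif len(title) > 60:
        issues.append({"severity": "medium", "issue": "title over 60 chars"})

    # Exactly one H1
    h1s = soup.find_all("h1")
    if len(h1s) != 1:
        issues.append({"severity": "high", "issue": f"{len(h1s)} H1 tags (expected 1)"})

    # Canonical tag present
    if not soup.find("link", rel="canonical"):
        issues.append({"severity": "high", "issue": "missing canonical tag"})

    # All images carry alt text
    missing_alt = [img for img in soup.find_all("img") if not img.get("alt")]
    if missing_alt:
        issues.append({"severity": "high", "issue": f"{len(missing_alt)} images without alt text"})

    return issues

html = "<html><head><title>Widget</title></head><body><h1>Widget</h1><img src='w.jpg'></body></html>"
print(audit_product_page(html))
```

The sample page above would be flagged for a missing canonical tag and an image without alt text.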
### 2. Product Schema Validation

1. Extract JSON-LD and Microdata from product pages
2. Validate Product type: name, image, description (required)
3. Validate Offer: price, priceCurrency, availability (required)
4. Validate AggregateRating: ratingValue, reviewCount (required)
5. Validate Review: author, reviewRating (required)
6. Check BreadcrumbList implementation
7. Assess Google rich result eligibility
8. Check Naver Shopping specific requirements (Korean name, KRW price, absolute image URLs)
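Steps 2-5 boil down to checking required properties per schema type. A minimal sketch; the field lists mirror the requirements stated above, and the `missing_fields` helper is illustrative rather than the checker's real code:

```python
# Required properties per schema type (as listed in the validation steps above)
REQUIRED = {
    "Product": ["name", "image", "description"],
    "Offer": ["price", "priceCurrency", "availability"],
    "AggregateRating": ["ratingValue", "reviewCount"],
    "Review": ["author", "reviewRating"],
}

def missing_fields(schema: dict) -> list[str]:
    """Return required properties absent (or empty) in a JSON-LD schema dict."""
    stype = schema.get("@type", "")
    if isinstance(stype, list):  # @type may be a list in valid JSON-LD
        stype = stype[0] if stype else ""
    return [f for f in REQUIRED.get(stype, []) if not schema.get(f)]

product = {"@type": "Product", "name": "Widget", "image": "https://ex.com/w.jpg"}
print(missing_fields(product))  # -> ['description']
```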
### 3. Category Taxonomy Analysis

1. Crawl category pages from sitemap or homepage navigation
2. Measure taxonomy depth (warn if > 4 levels)
3. Check breadcrumb presence on every category page
4. Identify faceted navigation URLs that are indexable without proper canonicals
5. Count child category links for structure assessment
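The depth warning in step 2 can be approximated from URL structure alone. A sketch, assuming depth equals the number of non-empty path segments (sites with non-hierarchical URLs would need the breadcrumb trail instead):

```python
from urllib.parse import urlparse

def category_depth(url: str) -> int:
    """Depth = number of non-empty path segments in the category URL."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])

urls = [
    "https://shop.example.com/men/shoes/running/trail/waterproof",
    "https://shop.example.com/women/bags",
]
# Flag anything deeper than 4 levels, per step 2 above
too_deep = [u for u in urls if category_depth(u) > 4]
print(too_deep)  # -> ['https://shop.example.com/men/shoes/running/trail/waterproof']
```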
### 4. Duplicate Content Detection

1. Group URLs by base path (stripping query parameters)
2. Identify parameter variants (?color=, ?size=, ?sort=)
3. Detect product variant URL duplicates (e.g., /product-red vs /product-blue)
4. Flag paginated pages missing self-referencing canonicals
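Steps 1-2 (group by base path, surface parameter variants) can be sketched with the standard library; `group_variants` is an illustrative helper, not the skill's implementation:

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_variants(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by scheme://host/path; bases with >1 URL are duplicate candidates."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for url in urls:
        p = urlparse(url)
        # Strip query string and fragment by rebuilding from the parsed parts
        groups[f"{p.scheme}://{p.netloc}{p.path}"].append(url)
    return {base: variants for base, variants in groups.items() if len(variants) > 1}

urls = [
    "https://ex.com/shirt?color=red",
    "https://ex.com/shirt?color=blue",
    "https://ex.com/pants",
]
print(group_variants(urls))
```

Here only `https://ex.com/shirt` survives the filter, with its two color variants listed; each such group should then be checked for a shared canonical.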
### 5. Korean Marketplace Presence

1. Extract brand name from site (og:site_name or title)
2. Search each marketplace for brand products:
   - Naver Smart Store (smartstore.naver.com)
   - Coupang (coupang.com)
   - Gmarket (gmarket.co.kr)
   - 11번가 (11st.co.kr)
3. Check Naver Smart Store-specific SEO elements
4. Verify naver-site-verification meta tag
5. Check Korean content ratio for Naver visibility
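Step 4, the verification-tag check, is a one-tag lookup. A minimal sketch with BeautifulSoup on a hypothetical HTML snippet:

```python
from bs4 import BeautifulSoup

def has_naver_verification(html: str) -> bool:
    """Check for a non-empty naver-site-verification meta tag."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": "naver-site-verification"})
    return bool(tag and tag.get("content"))

html = '<head><meta name="naver-site-verification" content="abc123"></head>'
print(has_naver_verification(html))  # -> True
```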
## Output Format

```markdown
## E-Commerce SEO Audit: [domain]

### Score: [0-100]/100

### Product Page Issues
- **Critical**: [count] issues
- **High**: [count] issues
- **Medium**: [count] issues
- **Low**: [count] issues

#### Top Issues
1. [severity] [issue_type] - [message]
   Recommendation: [fix]

### Category Structure
- Categories found: [count]
- Max depth: [number]
- Breadcrumbs present: [count]
- Faceted navigation issues: [count]

### Schema Validation
- Pages with schema: [count]/[total]
- Valid schemas: [count]
- Rich result eligible: [count]
- Common errors: [list]

### Korean Marketplaces
- Naver Smart Store: [Found/Not Found]
- Coupang: [Found/Not Found]
- Gmarket: [Found/Not Found]
- 11번가: [Found/Not Found]

### Recommendations
1. [Priority fixes ordered by impact]
```

## Common Issues

| Issue | Impact | Fix |
|-------|--------|-----|
| Missing Product schema | High | Add JSON-LD Product with offers |
| No canonical on product variants | High | Add self-referencing canonical |
| Images without alt text | High | Add product name to alt text |
| Category depth > 4 levels | Medium | Flatten taxonomy |
| Missing breadcrumbs | Medium | Add BreadcrumbList schema and visible nav |
| Faceted nav creating duplicates | High | Use canonical or noindex on filtered pages |
| Missing Naver verification | Medium | Add naver-site-verification meta tag |
| Price not in KRW for Korean market | Medium | Add KRW pricing to schema |

## Limitations

- Cannot access logged-in areas (member-only products)
- Marketplace search results may vary by region/IP
- Large catalogs require sampling (default 50 pages)
- Cannot validate JavaScript-rendered product content without a headless browser

## Notion Output (Required)

All audit reports MUST be saved to the OurDigital SEO Audit Log:

- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (E-Commerce SEO), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: ECOM-YYYYMMDD-NNN
custom-skills/24-seo-ecommerce/desktop/skill.yaml (new file, 8 lines)
@@ -0,0 +1,8 @@
name: seo-ecommerce
description: |
  E-commerce SEO audit and optimization. Triggers: product SEO, e-commerce audit, product schema, category SEO, Smart Store, marketplace SEO.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
custom-skills/24-seo-ecommerce/desktop/tools/ahrefs.md (new file, 15 lines)
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
custom-skills/24-seo-ecommerce/desktop/tools/notion.md (new file, 15 lines)
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
custom-skills/24-seo-ecommerce/desktop/tools/websearch.md (new file, 15 lines)
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
custom-skills/25-seo-kpi-framework/code/CLAUDE.md (new file, 148 lines)
@@ -0,0 +1,148 @@
# CLAUDE.md

## Overview

SEO KPI and performance framework for unified metrics aggregation across all SEO dimensions. Establishes baselines, sets targets (30/60/90-day), generates executive summaries with health scores, provides tactical breakdowns, estimates ROI using Ahrefs traffic cost, and supports period-over-period comparison (MoM, QoQ).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Aggregate KPIs
python scripts/kpi_aggregator.py --url https://example.com --json

# Generate performance report
python scripts/performance_reporter.py --url https://example.com --period monthly --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `kpi_aggregator.py` | Aggregate KPIs across all SEO dimensions | Unified KPI dashboard, health score, baselines |
| `performance_reporter.py` | Generate period-over-period performance reports | Trend analysis, executive summary, tactical breakdown |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## KPI Aggregator

```bash
# Full KPI aggregation
python scripts/kpi_aggregator.py --url https://example.com --json

# Set baselines
python scripts/kpi_aggregator.py --url https://example.com --set-baseline --json

# Compare against baseline
python scripts/kpi_aggregator.py --url https://example.com --baseline baseline.json --json

# With ROI estimation
python scripts/kpi_aggregator.py --url https://example.com --roi --json
```

**Capabilities**:
- Unified KPI taxonomy across 7 dimensions:
  - Traffic KPIs (organic sessions, organic traffic value, traffic trend)
  - Ranking KPIs (visibility score, avg position, top10 keywords count)
  - Engagement KPIs (bounce rate, pages/session, avg session duration)
  - Technical KPIs (crawl errors, page speed score, mobile usability)
  - Content KPIs (indexed pages, content freshness score, thin content ratio)
  - Link KPIs (domain rating, referring domains, link velocity)
  - Local KPIs (GBP visibility, review score, citation accuracy)
- Multi-source data aggregation from Ahrefs and other skill outputs
- Baseline establishment and target setting (30/60/90-day)
- Overall health score (0-100) with weighted dimensions
- ROI estimation using Ahrefs organic traffic cost
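The weighted health score named in the capabilities list is, in essence, a weighted average of per-dimension scores. A sketch of that idea; the dimension weights here are hypothetical placeholders, not the values actually used by `kpi_aggregator.py`:

```python
# Hypothetical dimension weights (sum to 1.0); the real weights live in kpi_aggregator.py
WEIGHTS = {
    "traffic": 0.25, "rankings": 0.20, "links": 0.15, "technical": 0.15,
    "content": 0.10, "engagement": 0.10, "local": 0.05,
}

def health_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each 0-100)."""
    total = sum(WEIGHTS[d] * dimension_scores.get(d, 0.0) for d in WEIGHTS)
    return round(total, 1)

scores = {"traffic": 80, "rankings": 70, "links": 60, "technical": 85,
          "content": 70, "engagement": 65, "local": 80}
print(health_score(scores))
```

A missing dimension contributes zero, which deliberately drags the overall score down rather than silently renormalizing the weights.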

## Performance Reporter

```bash
# Monthly report
python scripts/performance_reporter.py --url https://example.com --period monthly --json

# Quarterly report
python scripts/performance_reporter.py --url https://example.com --period quarterly --json

# Custom date range
python scripts/performance_reporter.py --url https://example.com --from 2025-01-01 --to 2025-03-31 --json

# Executive summary only
python scripts/performance_reporter.py --url https://example.com --period monthly --executive --json
```

**Capabilities**:
- Executive summary generation (health score, trend arrows, key wins/concerns)
- Period-over-period comparison (MoM, QoQ, YoY)
- Trend direction indicators (up = improving, down = declining, stable)
- Top wins and concerns identification
- Tactical breakdown with actionable next steps
- Target vs. actual comparison with progress %
- Traffic value change (ROI proxy)

## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-metrics` | Current organic metrics |
| `site-explorer-metrics-history` | Historical metrics trends |
| `site-explorer-metrics-by-country` | Country-level breakdown |
| `site-explorer-domain-rating-history` | DR trend over time |
| `site-explorer-total-search-volume-history` | Total keyword volume trend |

## Output Format

```json
{
  "url": "https://example.com",
  "health_score": 72,
  "health_trend": "improving",
  "kpis": {
    "traffic": {"organic_traffic": 15000, "traffic_value_usd": 45000, "trend": "up"},
    "rankings": {"visibility_score": 68, "avg_position": 18.5, "top10_count": 48},
    "links": {"domain_rating": 45, "referring_domains": 850, "velocity": "+15/month"},
    "technical": {"health_score": 85, "crawl_errors": 12},
    "content": {"indexed_pages": 320, "freshness_score": 70},
    "engagement": {"bounce_rate": 45, "pages_per_session": 2.8},
    "local": {"gbp_visibility": 80, "review_score": 4.5}
  },
  "targets": {
    "30_day": {},
    "60_day": {},
    "90_day": {}
  },
  "executive_summary": {
    "top_wins": [],
    "top_concerns": [],
    "recommendations": []
  },
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | SEO KPI & Performance |
| Priority | Select | Based on health score trend |
| Found Date | Date | Report date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: KPI-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., KPI, ROI, Domain Rating, Visibility Score)
- URLs and code remain unchanged
custom-skills/25-seo-kpi-framework/code/scripts/base_client.py (new file, 207 lines)
@@ -0,0 +1,207 @@
"""
|
||||||
|
Base Client - Shared async client utilities
|
||||||
|
===========================================
|
||||||
|
Purpose: Rate-limited async operations for API clients
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
from asyncio import Semaphore
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any, Callable, TypeVar
|
||||||
|
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
from tenacity import (
|
||||||
|
retry,
|
||||||
|
stop_after_attempt,
|
||||||
|
wait_exponential,
|
||||||
|
retry_if_exception_type,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Load environment variables
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
# Logging setup
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
|
||||||
|
T = TypeVar("T")
|
||||||
|
|
||||||
|
|
||||||
|
class RateLimiter:
|
||||||
|
"""Rate limiter using token bucket algorithm."""
|
||||||
|
|
||||||
|
def __init__(self, rate: float, per: float = 1.0):
|
||||||
|
"""
|
||||||
|
Initialize rate limiter.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
rate: Number of requests allowed
|
||||||
|
per: Time period in seconds (default: 1 second)
|
||||||
|
"""
|
||||||
|
self.rate = rate
|
||||||
|
self.per = per
|
||||||
|
self.tokens = rate
|
||||||
|
self.last_update = datetime.now()
|
||||||
|
self._lock = asyncio.Lock()
|
||||||
|
|
||||||
|
async def acquire(self) -> None:
|
||||||
|
"""Acquire a token, waiting if necessary."""
|
||||||
|
async with self._lock:
|
||||||
|
now = datetime.now()
|
||||||
|
elapsed = (now - self.last_update).total_seconds()
|
||||||
|
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
|
||||||
|
self.last_update = now
|
||||||
|
|
||||||
|
if self.tokens < 1:
|
||||||
|
wait_time = (1 - self.tokens) * (self.per / self.rate)
|
||||||
|
await asyncio.sleep(wait_time)
|
||||||
|
self.tokens = 0
|
||||||
|
else:
|
||||||
|
self.tokens -= 1
|
||||||
|
|
||||||
|
|
||||||
|
class BaseAsyncClient:
|
||||||
|
"""Base class for async API clients with rate limiting."""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
max_concurrent: int = 5,
|
||||||
|
requests_per_second: float = 3.0,
|
||||||
|
logger: logging.Logger | None = None,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Initialize base client.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
max_concurrent: Maximum concurrent requests
|
||||||
|
requests_per_second: Rate limit
|
||||||
|
logger: Logger instance
|
||||||
|
"""
|
||||||
|
self.semaphore = Semaphore(max_concurrent)
|
||||||
|
self.rate_limiter = RateLimiter(requests_per_second)
|
||||||
|
self.logger = logger or logging.getLogger(self.__class__.__name__)
|
||||||
|
self.stats = {
|
||||||
|
"requests": 0,
|
||||||
|
"success": 0,
|
||||||
|
"errors": 0,
|
||||||
|
"retries": 0,
|
||||||
|
}
|
||||||
|
|
||||||
|
@retry(
|
||||||
|
stop=stop_after_attempt(3),
|
||||||
|
wait=wait_exponential(multiplier=1, min=2, max=10),
|
||||||
|
retry=retry_if_exception_type(Exception),
|
||||||
|
)
|
||||||
|
async def _rate_limited_request(
|
||||||
|
self,
|
||||||
|
coro: Callable[[], Any],
|
||||||
|
) -> Any:
|
||||||
|
"""Execute a request with rate limiting and retry."""
|
||||||
|
async with self.semaphore:
|
||||||
|
await self.rate_limiter.acquire()
|
||||||
|
self.stats["requests"] += 1
|
||||||
|
try:
|
||||||
|
result = await coro()
|
||||||
|
self.stats["success"] += 1
|
||||||
|
return result
|
||||||
|
except Exception as e:
|
||||||
|
self.stats["errors"] += 1
|
||||||
|
self.logger.error(f"Request failed: {e}")
|
||||||
|
raise
|
||||||
|
|
||||||
|
async def batch_requests(
|
||||||
|
self,
|
||||||
|
requests: list[Callable[[], Any]],
|
||||||
|
desc: str = "Processing",
|
||||||
|
) -> list[Any]:
|
||||||
|
"""Execute multiple requests concurrently."""
|
||||||
|
try:
|
||||||
|
from tqdm.asyncio import tqdm
|
||||||
|
has_tqdm = True
|
||||||
|
except ImportError:
|
||||||
|
has_tqdm = False
|
||||||
|
|
||||||
|
async def execute(req: Callable) -> Any:
|
||||||
|
try:
|
||||||
|
return await self._rate_limited_request(req)
|
||||||
|
except Exception as e:
|
||||||
|
return {"error": str(e)}
|
||||||
|
|
||||||
|
tasks = [execute(req) for req in requests]
|
||||||
|
|
||||||
|
if has_tqdm:
|
||||||
|
results = []
|
||||||
|
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
|
||||||
|
result = await coro
|
||||||
|
results.append(result)
|
||||||
|
return results
|
||||||
|
else:
|
||||||
|
return await asyncio.gather(*tasks, return_exceptions=True)
|
||||||
|
|
||||||
|
def print_stats(self) -> None:
|
||||||
|
"""Print request statistics."""
|
||||||
|
self.logger.info("=" * 40)
|
||||||
|
self.logger.info("Request Statistics:")
|
||||||
|
self.logger.info(f" Total Requests: {self.stats['requests']}")
|
||||||
|
self.logger.info(f" Successful: {self.stats['success']}")
|
||||||
|
self.logger.info(f" Errors: {self.stats['errors']}")
|
||||||
|
self.logger.info("=" * 40)
|
||||||
|
|
||||||
|
|
||||||
|
class ConfigManager:
|
||||||
|
"""Manage API configuration and credentials."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
@property
|
||||||
|
def google_credentials_path(self) -> str | None:
|
||||||
|
"""Get Google service account credentials path."""
|
||||||
|
# Prefer SEO-specific credentials, fallback to general credentials
|
||||||
|
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
|
||||||
|
if os.path.exists(seo_creds):
|
||||||
|
return seo_creds
|
||||||
|
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def pagespeed_api_key(self) -> str | None:
|
||||||
|
"""Get PageSpeed Insights API key."""
|
||||||
|
return os.getenv("PAGESPEED_API_KEY")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def custom_search_api_key(self) -> str | None:
|
||||||
|
"""Get Custom Search API key."""
|
||||||
|
return os.getenv("CUSTOM_SEARCH_API_KEY")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def custom_search_engine_id(self) -> str | None:
|
||||||
|
"""Get Custom Search Engine ID."""
|
||||||
|
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def notion_token(self) -> str | None:
|
||||||
|
"""Get Notion API token."""
|
||||||
|
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
|
||||||
|
|
||||||
|
def validate_google_credentials(self) -> bool:
|
||||||
|
"""Validate Google credentials are configured."""
|
||||||
|
creds_path = self.google_credentials_path
|
||||||
|
if not creds_path:
|
||||||
|
return False
|
||||||
|
return os.path.exists(creds_path)
|
||||||
|
|
||||||
|
def get_required(self, key: str) -> str:
|
||||||
|
"""Get required environment variable or raise error."""
|
||||||
|
value = os.getenv(key)
|
||||||
|
if not value:
|
||||||
|
raise ValueError(f"Missing required environment variable: {key}")
|
||||||
|
return value
|
||||||
|
|
||||||
|
|
||||||
|
# Singleton config instance
|
||||||
|
config = ConfigManager()
|
||||||
@@ -0,0 +1,758 @@
|
|||||||
|
"""
|
||||||
|
KPI Aggregator - Unified SEO KPI aggregation across all dimensions
|
||||||
|
==================================================================
|
||||||
|
Purpose: Aggregate KPIs from Ahrefs and other sources into a unified
|
||||||
|
dashboard with health scores, baselines, targets, and ROI.
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Data classes
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class KpiMetric:
|
||||||
|
"""Single KPI metric with trend and target information."""
|
||||||
|
name: str
|
||||||
|
value: float
|
||||||
|
previous_value: float | None = None
|
||||||
|
change_pct: float | None = None
|
||||||
|
trend: str = "stable" # up, down, stable
|
||||||
|
target_30d: float | None = None
|
||||||
|
target_60d: float | None = None
|
||||||
|
target_90d: float | None = None
|
||||||
|
|
||||||
|
def compute_trend(self) -> None:
|
||||||
|
"""Compute trend direction and change percentage."""
|
||||||
|
if self.previous_value is not None and self.previous_value != 0:
|
||||||
|
self.change_pct = round(
|
||||||
|
((self.value - self.previous_value) / abs(self.previous_value)) * 100, 2
|
||||||
|
)
|
||||||
|
if self.change_pct > 2.0:
|
||||||
|
self.trend = "up"
|
||||||
|
elif self.change_pct < -2.0:
|
||||||
|
self.trend = "down"
|
||||||
|
else:
|
||||||
|
self.trend = "stable"
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class KpiDimension:
    """A dimension grouping multiple KPI metrics."""
    name: str
    metrics: list[KpiMetric] = field(default_factory=list)
    weight: float = 0.0
    score: float = 0.0

    def compute_score(self) -> float:
        """Compute dimension score (0-100) based on metrics health."""
        if not self.metrics:
            self.score = 0.0
            return self.score
        metric_scores = []
        for m in self.metrics:
            if m.trend == "up":
                metric_scores.append(80.0)
            elif m.trend == "stable":
                metric_scores.append(60.0)
            else:
                metric_scores.append(35.0)
            # Boost score if value is positive and non-zero
            if m.value and m.value > 0:
                metric_scores[-1] = min(100.0, metric_scores[-1] + 10.0)
        self.score = round(sum(metric_scores) / len(metric_scores), 1)
        return self.score


@dataclass
class HealthScore:
    """Overall SEO health score."""
    overall: float = 0.0
    dimensions: dict[str, float] = field(default_factory=dict)
    trend: str = "stable"


@dataclass
class RoiEstimate:
    """ROI estimation from Ahrefs traffic cost."""
    traffic_value_usd: float = 0.0
    traffic_value_change: float = 0.0
    estimated_monthly_value: float = 0.0


@dataclass
class KpiResult:
    """Complete KPI aggregation result."""
    url: str = ""
    health_score: float = 0.0
    health_trend: str = "stable"
    kpis: dict[str, Any] = field(default_factory=dict)
    targets: dict[str, Any] = field(default_factory=dict)
    roi: RoiEstimate | None = None
    baseline_comparison: dict[str, Any] | None = None
    executive_summary: dict[str, Any] = field(default_factory=dict)
    timestamp: str = ""
    errors: list[str] = field(default_factory=list)
# ---------------------------------------------------------------------------
# Dimension weights
# ---------------------------------------------------------------------------

DIMENSION_WEIGHTS = {
    "traffic": 0.25,
    "rankings": 0.20,
    "technical": 0.20,
    "content": 0.15,
    "links": 0.15,
    "local": 0.05,
}
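These weights drive the overall health score as a weighted average, normalized by whichever dimensions are present. A minimal sketch of that math (the helper name `weighted_health` is illustrative):

```python
# Sketch of the weighted health-score math used by calculate_health_score():
# each dimension score (0-100) is multiplied by its weight, then normalized
# by the total weight of the dimensions actually present.
DIMENSION_WEIGHTS = {
    "traffic": 0.25, "rankings": 0.20, "technical": 0.20,
    "content": 0.15, "links": 0.15, "local": 0.05,
}

def weighted_health(scores: dict[str, float]) -> float:
    weighted = sum(scores[d] * DIMENSION_WEIGHTS[d] for d in scores)
    total = sum(DIMENSION_WEIGHTS[d] for d in scores)
    return round(weighted / total, 1) if total else 0.0

print(weighted_health({"traffic": 80.0, "rankings": 60.0}))  # 71.1
```

Normalizing by the present weights means a partial run (say, traffic and rankings only) still yields a 0-100 score rather than one deflated by missing dimensions.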
# ---------------------------------------------------------------------------
# KPI Aggregator
# ---------------------------------------------------------------------------

class KpiAggregator(BaseAsyncClient):
    """Aggregate SEO KPIs across all dimensions from Ahrefs data."""

    AHREFS_BASE = "https://api.ahrefs.com/v3"

    def __init__(self, api_token: str | None = None):
        super().__init__(max_concurrent=3, requests_per_second=2.0)
        self.api_token = api_token or config.get_required("AHREFS_API_TOKEN")
        self.headers = {
            "Authorization": f"Bearer {self.api_token}",
            "Accept": "application/json",
        }

    # ----- Ahrefs API helpers -----

    async def _ahrefs_get(
        self, session: aiohttp.ClientSession, endpoint: str, params: dict
    ) -> dict:
        """Make an authenticated GET request to the Ahrefs API."""
        url = f"{self.AHREFS_BASE}/{endpoint}"
        async with session.get(url, headers=self.headers, params=params) as resp:
            if resp.status != 200:
                text = await resp.text()
                self.logger.warning(f"Ahrefs {endpoint} returned {resp.status}: {text}")
                return {"error": f"HTTP {resp.status}", "detail": text}
            return await resp.json()

    # ----- Dimension collectors -----
    async def get_traffic_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect traffic KPIs via site-explorer metrics."""
        dim = KpiDimension(name="traffic", weight=DIMENSION_WEIGHTS["traffic"])
        try:
            data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            if "error" not in data:
                metrics = data.get("metrics", data)
                organic = metrics.get("organic", {})
                organic_traffic = organic.get("traffic", 0)
                traffic_value_raw = organic.get("cost", 0)
                # Convert raw cost (cents) to USD
                traffic_value_usd = traffic_value_raw / 100.0 if traffic_value_raw else 0.0
                dim.metrics.append(
                    KpiMetric(name="organic_traffic", value=float(organic_traffic))
                )
                dim.metrics.append(
                    KpiMetric(name="traffic_value_usd", value=round(traffic_value_usd, 2))
                )
            else:
                dim.metrics.append(KpiMetric(name="organic_traffic", value=0.0))
                dim.metrics.append(KpiMetric(name="traffic_value_usd", value=0.0))
        except Exception as exc:
            self.logger.error(f"Traffic KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="organic_traffic", value=0.0))
        dim.compute_score()
        return dim
    async def get_ranking_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect ranking KPIs via site-explorer metrics."""
        dim = KpiDimension(name="rankings", weight=DIMENSION_WEIGHTS["rankings"])
        try:
            data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            if "error" not in data:
                metrics = data.get("metrics", data)
                organic = metrics.get("organic", {})
                keywords_total = organic.get("keywords", 0)
                # Estimate top-10 keywords as ~20% of total keywords
                top10_estimate = int(keywords_total * 0.20)
                # Visibility score heuristic: traffic relative to keyword count
                traffic = organic.get("traffic", 0)
                visibility = min(100.0, (traffic / max(keywords_total, 1)) * 10)
                dim.metrics.append(
                    KpiMetric(name="visibility_score", value=round(visibility, 1))
                )
                dim.metrics.append(
                    KpiMetric(name="top10_keywords", value=float(top10_estimate))
                )
                dim.metrics.append(
                    KpiMetric(name="total_keywords", value=float(keywords_total))
                )
            else:
                dim.metrics.append(KpiMetric(name="visibility_score", value=0.0))
                dim.metrics.append(KpiMetric(name="top10_keywords", value=0.0))
        except Exception as exc:
            self.logger.error(f"Ranking KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="visibility_score", value=0.0))
        dim.compute_score()
        return dim
    async def get_link_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect link KPIs via domain-rating and metrics."""
        dim = KpiDimension(name="links", weight=DIMENSION_WEIGHTS["links"])
        try:
            # Domain rating
            dr_data = await self._ahrefs_get(
                session,
                "site-explorer/domain-rating",
                {"target": url},
            )
            domain_rating = 0.0
            if "error" not in dr_data:
                domain_rating = float(
                    dr_data.get("domain_rating", dr_data.get("domainRating", 0))
                )
            dim.metrics.append(
                KpiMetric(name="domain_rating", value=round(domain_rating, 1))
            )

            # Referring domains from metrics
            metrics_data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            ref_domains = 0
            if "error" not in metrics_data:
                metrics = metrics_data.get("metrics", metrics_data)
                ref_domains = metrics.get("refdomains", 0)
            dim.metrics.append(
                KpiMetric(name="referring_domains", value=float(ref_domains))
            )
        except Exception as exc:
            self.logger.error(f"Link KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="domain_rating", value=0.0))
            dim.metrics.append(KpiMetric(name="referring_domains", value=0.0))
        dim.compute_score()
        return dim
    async def get_technical_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect technical KPIs (estimated from available data)."""
        dim = KpiDimension(name="technical", weight=DIMENSION_WEIGHTS["technical"])
        try:
            data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            if "error" not in data:
                metrics = data.get("metrics", data)
                organic = metrics.get("organic", {})
                pages_crawled = metrics.get("pages", organic.get("pages", 0))
                # Heuristic: technical health score from available data
                has_traffic = organic.get("traffic", 0) > 0
                has_pages = pages_crawled > 0
                tech_score = 50.0
                if has_traffic:
                    tech_score += 25.0
                if has_pages:
                    tech_score += 25.0
                dim.metrics.append(
                    KpiMetric(name="technical_health_score", value=round(tech_score, 1))
                )
                dim.metrics.append(
                    KpiMetric(name="pages_crawled", value=float(pages_crawled))
                )
            else:
                dim.metrics.append(KpiMetric(name="technical_health_score", value=50.0))
                dim.metrics.append(KpiMetric(name="pages_crawled", value=0.0))
        except Exception as exc:
            self.logger.error(f"Technical KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="technical_health_score", value=50.0))
        dim.compute_score()
        return dim
    async def get_content_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect content KPIs from available metrics."""
        dim = KpiDimension(name="content", weight=DIMENSION_WEIGHTS["content"])
        try:
            data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            if "error" not in data:
                metrics = data.get("metrics", data)
                organic = metrics.get("organic", {})
                pages = metrics.get("pages", organic.get("pages", 0))
                keywords = organic.get("keywords", 0)
                # Content freshness heuristic
                freshness = min(100.0, (keywords / max(pages, 1)) * 5) if pages else 0.0
                dim.metrics.append(
                    KpiMetric(name="indexed_pages", value=float(pages))
                )
                dim.metrics.append(
                    KpiMetric(name="content_freshness_score", value=round(freshness, 1))
                )
                dim.metrics.append(
                    KpiMetric(name="keywords_per_page", value=round(keywords / max(pages, 1), 2))
                )
            else:
                dim.metrics.append(KpiMetric(name="indexed_pages", value=0.0))
                dim.metrics.append(KpiMetric(name="content_freshness_score", value=0.0))
        except Exception as exc:
            self.logger.error(f"Content KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="indexed_pages", value=0.0))
        dim.compute_score()
        return dim
    async def get_local_kpis(self, url: str) -> KpiDimension:
        """Placeholder for local KPIs (requires external data)."""
        dim = KpiDimension(name="local", weight=DIMENSION_WEIGHTS["local"])
        dim.metrics.append(KpiMetric(name="gbp_visibility", value=0.0))
        dim.metrics.append(KpiMetric(name="review_score", value=0.0))
        dim.metrics.append(KpiMetric(name="citation_accuracy", value=0.0))
        dim.compute_score()
        return dim
    # ----- Health score -----

    def calculate_health_score(self, dimensions: list[KpiDimension]) -> HealthScore:
        """Calculate weighted health score across all dimensions."""
        health = HealthScore()
        total_weight = 0.0
        weighted_sum = 0.0

        for dim in dimensions:
            dim.compute_score()
            health.dimensions[dim.name] = dim.score
            weighted_sum += dim.score * dim.weight
            total_weight += dim.weight

        if total_weight > 0:
            health.overall = round(weighted_sum / total_weight, 1)
        else:
            health.overall = 0.0

        # Determine trend from metric trends across dimensions
        up_count = sum(
            1 for d in dimensions
            for m in d.metrics if m.trend == "up"
        )
        down_count = sum(
            1 for d in dimensions
            for m in d.metrics if m.trend == "down"
        )
        if up_count > down_count:
            health.trend = "improving"
        elif down_count > up_count:
            health.trend = "declining"
        else:
            health.trend = "stable"

        return health
    # ----- Targets -----

    def set_targets(self, dimensions: list[KpiDimension]) -> dict[str, Any]:
        """Calculate 30/60/90-day targets (5%/10%/20% improvement)."""
        targets = {"30_day": {}, "60_day": {}, "90_day": {}}
        growth_rates = {"30_day": 0.05, "60_day": 0.10, "90_day": 0.20}

        for dim in dimensions:
            for metric in dim.metrics:
                if metric.value and metric.value > 0:
                    for period, rate in growth_rates.items():
                        key = f"{dim.name}.{metric.name}"
                        # For metrics where lower is better (e.g. bounce rate),
                        # improvement means a decrease
                        if metric.name in ("bounce_rate", "crawl_errors", "thin_content_ratio"):
                            target_val = metric.value * (1 - rate)
                        else:
                            target_val = metric.value * (1 + rate)
                        targets[period][key] = round(target_val, 2)
                    metric.target_30d = targets["30_day"].get(f"{dim.name}.{metric.name}")
                    metric.target_60d = targets["60_day"].get(f"{dim.name}.{metric.name}")
                    metric.target_90d = targets["90_day"].get(f"{dim.name}.{metric.name}")
        return targets
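The target math above reduces to a single rule: grow each metric by 5%/10%/20%, but shrink "lower is better" metrics by the same rates. A standalone sketch (the helper name `targets_for` is illustrative):

```python
# Sketch of the 30/60/90-day target math in set_targets().
GROWTH = {"30_day": 0.05, "60_day": 0.10, "90_day": 0.20}
LOWER_IS_BETTER = {"bounce_rate", "crawl_errors", "thin_content_ratio"}

def targets_for(name: str, value: float) -> dict[str, float]:
    # Lower-is-better metrics improve by decreasing, so flip the sign
    sign = -1 if name in LOWER_IS_BETTER else 1
    return {p: round(value * (1 + sign * rate), 2) for p, rate in GROWTH.items()}

print(targets_for("organic_traffic", 1000.0))
print(targets_for("bounce_rate", 60.0))
```

A traffic metric of 1000 thus targets 1050/1100/1200, while a bounce rate of 60 targets 57/54/48.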
    # ----- ROI estimation -----

    def estimate_roi(self, traffic_dim: KpiDimension) -> RoiEstimate:
        """Estimate ROI from Ahrefs traffic cost data."""
        roi = RoiEstimate()
        for metric in traffic_dim.metrics:
            if metric.name == "traffic_value_usd":
                roi.traffic_value_usd = metric.value
                roi.estimated_monthly_value = metric.value
                if metric.previous_value is not None:
                    roi.traffic_value_change = round(
                        metric.value - metric.previous_value, 2
                    )
        return roi
    # ----- Baseline comparison -----

    def compare_baseline(
        self, current: list[KpiDimension], baseline: dict[str, Any]
    ) -> dict[str, Any]:
        """Compare current KPIs against a stored baseline."""
        comparison = {}
        baseline_kpis = baseline.get("kpis", {})

        for dim in current:
            dim_baseline = baseline_kpis.get(dim.name, {})
            dim_comparison = {}
            for metric in dim.metrics:
                baseline_val = None
                if isinstance(dim_baseline, dict):
                    baseline_val = dim_baseline.get(metric.name)
                if baseline_val is not None:
                    metric.previous_value = float(baseline_val)
                    metric.compute_trend()
                    dim_comparison[metric.name] = {
                        "current": metric.value,
                        "baseline": baseline_val,
                        "change_pct": metric.change_pct,
                        "trend": metric.trend,
                    }
                else:
                    dim_comparison[metric.name] = {
                        "current": metric.value,
                        "baseline": None,
                        "change_pct": None,
                        "trend": "no_baseline",
                    }
            comparison[dim.name] = dim_comparison
        return comparison
    # ----- Executive summary -----

    def generate_executive_summary(
        self, dimensions: list[KpiDimension], health: HealthScore
    ) -> dict[str, Any]:
        """Generate executive summary with wins, concerns, and recommendations."""
        wins = []
        concerns = []
        recommendations = []

        for dim in dimensions:
            for metric in dim.metrics:
                if metric.trend == "up" and metric.change_pct and metric.change_pct > 5:
                    wins.append(
                        f"{dim.name}/{metric.name}: +{metric.change_pct}% improvement"
                    )
                elif metric.trend == "down" and metric.change_pct and metric.change_pct < -5:
                    concerns.append(
                        f"{dim.name}/{metric.name}: {metric.change_pct}% decline"
                    )

        # Generate recommendations based on dimension scores
        for dim in dimensions:
            if dim.score < 50:
                recommendations.append(
                    f"Priority: Improve {dim.name} dimension (score: {dim.score}/100)"
                )
            elif dim.score < 70:
                recommendations.append(
                    f"Monitor: {dim.name} dimension needs attention (score: {dim.score}/100)"
                )

        if not wins:
            wins.append("No significant improvements detected in this period")
        if not concerns:
            concerns.append("No significant declines detected in this period")
        if not recommendations:
            recommendations.append("All dimensions performing well - maintain current strategy")

        return {
            "health_score": health.overall,
            "health_trend": health.trend,
            "top_wins": wins[:5],
            "top_concerns": concerns[:5],
            "recommendations": recommendations[:5],
        }
    # ----- Main orchestration -----

    async def aggregate(
        self,
        url: str,
        include_roi: bool = False,
        baseline_path: str | None = None,
        set_baseline: bool = False,
    ) -> KpiResult:
        """Orchestrate full KPI aggregation across all dimensions."""
        result = KpiResult(url=url, timestamp=datetime.now().isoformat())
        dimensions: list[KpiDimension] = []

        async with aiohttp.ClientSession() as session:
            # Collect all dimensions concurrently
            tasks = [
                self.get_traffic_kpis(session, url),
                self.get_ranking_kpis(session, url),
                self.get_link_kpis(session, url),
                self.get_technical_kpis(session, url),
                self.get_content_kpis(session, url),
            ]
            gathered = await asyncio.gather(*tasks, return_exceptions=True)

            for item in gathered:
                if isinstance(item, Exception):
                    result.errors.append(str(item))
                    self.logger.error(f"Dimension error: {item}")
                else:
                    dimensions.append(item)

        # Local KPIs (no API call needed)
        local_dim = await self.get_local_kpis(url)
        dimensions.append(local_dim)

        # Load baseline if provided
        if baseline_path:
            try:
                baseline_data = json.loads(Path(baseline_path).read_text())
                result.baseline_comparison = self.compare_baseline(dimensions, baseline_data)
            except Exception as exc:
                result.errors.append(f"Baseline load error: {exc}")

        # Calculate health score
        health = self.calculate_health_score(dimensions)
        result.health_score = health.overall
        result.health_trend = health.trend

        # Build KPI dictionary
        for dim in dimensions:
            result.kpis[dim.name] = {
                "score": dim.score,
                "weight": dim.weight,
                "metrics": {m.name: asdict(m) for m in dim.metrics},
            }

        # Set targets
        targets = self.set_targets(dimensions)
        result.targets = targets

        # ROI estimation
        if include_roi:
            traffic_dim = next((d for d in dimensions if d.name == "traffic"), None)
            if traffic_dim:
                roi = self.estimate_roi(traffic_dim)
                result.roi = roi

        # Executive summary
        result.executive_summary = self.generate_executive_summary(dimensions, health)

        # Save baseline if requested
        if set_baseline:
            baseline_out = {
                "url": url,
                "timestamp": result.timestamp,
                "kpis": {},
            }
            for dim in dimensions:
                baseline_out["kpis"][dim.name] = {
                    m.name: m.value for m in dim.metrics
                }
            baseline_file = f"baseline_{url.replace('https://', '').replace('/', '_')}.json"
            Path(baseline_file).write_text(json.dumps(baseline_out, indent=2))
            self.logger.info(f"Baseline saved to {baseline_file}")

        return result
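The collector fan-out in `aggregate` relies on `asyncio.gather(..., return_exceptions=True)`, so one failing dimension never aborts the others. A minimal sketch of that pattern with stand-in coroutines (`collect` and `run` are illustrative, not part of the module):

```python
# Sketch of the gather pattern in aggregate(): collectors run concurrently
# and exceptions come back as results instead of propagating.
import asyncio

async def collect(name: str, fail: bool = False) -> str:
    if fail:
        raise RuntimeError(f"{name} failed")
    return name

async def run() -> tuple[list[str], list[str]]:
    results = await asyncio.gather(
        collect("traffic"), collect("rankings", fail=True), return_exceptions=True
    )
    # Split successes from captured exceptions, as aggregate() does
    ok = [r for r in results if not isinstance(r, Exception)]
    errors = [str(r) for r in results if isinstance(r, Exception)]
    return ok, errors

print(asyncio.run(run()))  # (['traffic'], ['rankings failed'])
```

This is why `KpiResult.errors` can be non-empty while the health score is still computed from the dimensions that did succeed.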
# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------

def format_text_report(result: KpiResult) -> str:
    """Format a KPI result as a human-readable text report."""
    lines = []
    lines.append("=" * 60)
    lines.append(f"SEO KPI Dashboard: {result.url}")
    lines.append(f"Timestamp: {result.timestamp}")
    lines.append("=" * 60)
    lines.append("")

    # Health score
    lines.append(f"Overall Health Score: {result.health_score}/100 ({result.health_trend})")
    lines.append("-" * 40)

    # Dimensions
    for dim_name, dim_data in result.kpis.items():
        lines.append(f"\n[{dim_name.upper()}] Score: {dim_data['score']}/100 (weight: {dim_data['weight']})")
        metrics = dim_data.get("metrics", {})
        for m_name, m_data in metrics.items():
            trend_arrow = {"up": "^", "down": "v", "stable": "=", "no_baseline": "?"}.get(
                m_data.get("trend", "stable"), "="
            )
            val = m_data.get("value", 0)
            change = m_data.get("change_pct")
            change_str = f" ({change:+.1f}%)" if change is not None else ""
            lines.append(f"  {trend_arrow} {m_name}: {val}{change_str}")

    # Targets
    if result.targets:
        lines.append("\n" + "-" * 40)
        lines.append("TARGETS")
        for period, targets in result.targets.items():
            if targets:
                lines.append(f"\n  {period}:")
                for key, val in list(targets.items())[:10]:
                    lines.append(f"    {key}: {val}")

    # ROI
    if result.roi:
        lines.append("\n" + "-" * 40)
        lines.append("ROI ESTIMATE")
        lines.append(f"  Traffic Value (USD): ${result.roi.traffic_value_usd:,.2f}")
        lines.append(f"  Monthly Value: ${result.roi.estimated_monthly_value:,.2f}")
        lines.append(f"  Value Change: ${result.roi.traffic_value_change:,.2f}")

    # Executive summary
    if result.executive_summary:
        lines.append("\n" + "-" * 40)
        lines.append("EXECUTIVE SUMMARY")
        lines.append(f"  Health: {result.executive_summary.get('health_score', 0)}/100")
        lines.append(f"  Trend: {result.executive_summary.get('health_trend', 'stable')}")
        lines.append("\n  Top Wins:")
        for win in result.executive_summary.get("top_wins", []):
            lines.append(f"    + {win}")
        lines.append("\n  Top Concerns:")
        for concern in result.executive_summary.get("top_concerns", []):
            lines.append(f"    - {concern}")
        lines.append("\n  Recommendations:")
        for rec in result.executive_summary.get("recommendations", []):
            lines.append(f"    > {rec}")

    # Errors
    if result.errors:
        lines.append("\n" + "-" * 40)
        lines.append("ERRORS:")
        for err in result.errors:
            lines.append(f"  ! {err}")

    lines.append("\n" + "=" * 60)
    return "\n".join(lines)
def serialize_result(result: KpiResult) -> dict:
    """Serialize a KpiResult to a JSON-safe dictionary."""
    data = {
        "url": result.url,
        "health_score": result.health_score,
        "health_trend": result.health_trend,
        "kpis": result.kpis,
        "targets": result.targets,
        "executive_summary": result.executive_summary,
        "timestamp": result.timestamp,
        "errors": result.errors,
    }
    if result.roi:
        data["roi"] = asdict(result.roi)
    if result.baseline_comparison:
        data["baseline_comparison"] = result.baseline_comparison
    return data
# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------

def parse_args() -> argparse.Namespace:
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="SEO KPI Aggregator - Unified metrics dashboard"
    )
    parser.add_argument(
        "--url", required=True, help="Target URL or domain to analyze"
    )
    parser.add_argument(
        "--set-baseline", action="store_true",
        help="Save current KPIs as a baseline file"
    )
    parser.add_argument(
        "--baseline", type=str, default=None,
        help="Path to baseline JSON file for comparison"
    )
    parser.add_argument(
        "--roi", action="store_true",
        help="Include ROI estimation from traffic cost"
    )
    parser.add_argument(
        "--json", action="store_true",
        help="Output results as JSON"
    )
    parser.add_argument(
        "--output", type=str, default=None,
        help="Save output to file path"
    )
    return parser.parse_args()
async def main() -> None:
    """Main entry point."""
    args = parse_args()

    aggregator = KpiAggregator()
    result = await aggregator.aggregate(
        url=args.url,
        include_roi=args.roi,
        baseline_path=args.baseline,
        set_baseline=args.set_baseline,
    )

    if args.json:
        output = json.dumps(serialize_result(result), indent=2, ensure_ascii=False)
    else:
        output = format_text_report(result)

    if args.output:
        Path(args.output).write_text(output, encoding="utf-8")
        logger.info(f"Output saved to {args.output}")
    else:
        print(output)

    aggregator.print_stats()
if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,801 @@
"""
Performance Reporter - Period-over-period SEO performance reports
================================================================
Purpose: Generate executive summaries, trend analysis, tactical breakdowns,
and target-vs-actual comparison from Ahrefs historical data.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import sys
from dataclasses import dataclass, field, asdict
from datetime import datetime, timedelta
from pathlib import Path
from typing import Any

import aiohttp

from base_client import BaseAsyncClient, config

logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

@dataclass
class TrendData:
    """Single trend data point for a metric."""
    period: str
    value: float
    change_pct: float | None = None
    direction: str = "stable"  # up, down, stable


@dataclass
class WinConcern:
    """A notable win or concern from performance analysis."""
    category: str
    description: str
    impact: str = "medium"  # high, medium, low
    action: str = ""


@dataclass
class TargetProgress:
    """Target vs actual progress tracking."""
    kpi_name: str
    target: float
    actual: float
    progress_pct: float = 0.0

    def compute_progress(self) -> None:
        """Compute progress percentage toward target."""
        if self.target and self.target != 0:
            self.progress_pct = round((self.actual / self.target) * 100, 1)
        else:
            self.progress_pct = 0.0


@dataclass
class PerformanceReport:
    """Complete performance report."""
    url: str = ""
    period: str = "monthly"
    date_from: str = ""
    date_to: str = ""
    health_score: float = 0.0
    health_trend: str = "stable"
    trends: dict[str, list[TrendData]] = field(default_factory=dict)
    wins: list[WinConcern] = field(default_factory=list)
    concerns: list[WinConcern] = field(default_factory=list)
    executive_summary: dict[str, Any] = field(default_factory=dict)
    tactical_breakdown: dict[str, Any] = field(default_factory=dict)
    target_progress: list[TargetProgress] = field(default_factory=list)
    traffic_value_change: float = 0.0
    timestamp: str = ""
    errors: list[str] = field(default_factory=list)


# ---------------------------------------------------------------------------
# Period helpers
# ---------------------------------------------------------------------------

PERIOD_DAYS = {
    "monthly": 30,
    "quarterly": 90,
    "yearly": 365,
}
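`TargetProgress.compute_progress` is a straight actual-over-target percentage with a zero-target guard. A standalone sketch (the helper name `progress_pct` is illustrative):

```python
# Sketch of TargetProgress.compute_progress(): progress toward a target as a
# simple actual/target percentage, guarding against a zero target.
def progress_pct(actual: float, target: float) -> float:
    return round((actual / target) * 100, 1) if target else 0.0

print(progress_pct(850.0, 1000.0))  # 85.0
```

Note the result can exceed 100 when a target is beaten; the reporter leaves that as-is rather than capping it.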
def get_date_range(
    period: str, date_from: str | None = None, date_to: str | None = None
) -> tuple[str, str]:
    """Compute date range from period or explicit dates."""
    if date_from and date_to:
        return date_from, date_to
    end = datetime.now()
    days = PERIOD_DAYS.get(period, 30)
    start = end - timedelta(days=days)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


def get_previous_range(
    date_from: str, date_to: str
) -> tuple[str, str]:
    """Compute the previous period of equal length for comparison."""
    start = datetime.strptime(date_from, "%Y-%m-%d")
    end = datetime.strptime(date_to, "%Y-%m-%d")
    delta = end - start
    prev_end = start - timedelta(days=1)
    prev_start = prev_end - delta
    return prev_start.strftime("%Y-%m-%d"), prev_end.strftime("%Y-%m-%d")

# ---------------------------------------------------------------------------
# Performance Reporter
# ---------------------------------------------------------------------------

class PerformanceReporter(BaseAsyncClient):
    """Generate period-over-period SEO performance reports from Ahrefs."""

    AHREFS_BASE = "https://api.ahrefs.com/v3"

    def __init__(self, api_token: str | None = None):
        super().__init__(max_concurrent=3, requests_per_second=2.0)
        self.api_token = api_token or config.get_required("AHREFS_API_TOKEN")
        self.headers = {
            "Authorization": f"Bearer {self.api_token}",
            "Accept": "application/json",
        }

    async def _ahrefs_get(
        self, session: aiohttp.ClientSession, endpoint: str, params: dict
    ) -> dict:
        """Make an authenticated GET request to the Ahrefs API."""
        url = f"{self.AHREFS_BASE}/{endpoint}"
        async with session.get(url, headers=self.headers, params=params) as resp:
            if resp.status != 200:
                text = await resp.text()
                self.logger.warning(f"Ahrefs {endpoint} returned {resp.status}: {text}")
                return {"error": f"HTTP {resp.status}", "detail": text}
            return await resp.json()

    # ----- Data collectors -----

    async def get_metrics_history(
        self,
        session: aiohttp.ClientSession,
        url: str,
        date_from: str,
        date_to: str,
    ) -> list[dict]:
        """Fetch historical metrics via site-explorer-metrics-history."""
        data = await self._ahrefs_get(
            session,
            "site-explorer/metrics-history",
            {
                "target": url,
                "mode": "domain",
                "date_from": date_from,
                "date_to": date_to,
            },
        )
        if "error" in data:
            self.logger.warning(f"Metrics history error: {data}")
            return []
        return data.get("metrics", data.get("data", []))

    async def get_dr_history(
        self,
        session: aiohttp.ClientSession,
        url: str,
        date_from: str,
        date_to: str,
    ) -> list[dict]:
        """Fetch domain rating history."""
        data = await self._ahrefs_get(
            session,
            "site-explorer/domain-rating-history",
            {
                "target": url,
                "date_from": date_from,
                "date_to": date_to,
            },
        )
        if "error" in data:
            return []
        return data.get("domain_rating_history", data.get("data", []))

    async def get_current_metrics(
        self, session: aiohttp.ClientSession, url: str
    ) -> dict:
        """Fetch current snapshot metrics."""
        data = await self._ahrefs_get(
            session,
            "site-explorer/metrics",
            {"target": url, "mode": "domain"},
        )
        if "error" in data:
            return {}
        return data.get("metrics", data)

    async def get_volume_history(
        self,
        session: aiohttp.ClientSession,
        url: str,
        date_from: str,
        date_to: str,
    ) -> list[dict]:
        """Fetch total search volume history."""
        data = await self._ahrefs_get(
            session,
            "site-explorer/total-search-volume-history",
            {
                "target": url,
                "date_from": date_from,
                "date_to": date_to,
            },
        )
        if "error" in data:
            return []
        return data.get("total_search_volume_history", data.get("data", []))

    # ----- Analysis methods -----

    def calculate_period_comparison(
        self, current_data: list[dict], previous_data: list[dict], metric_key: str
    ) -> list[TrendData]:
        """Compare metric values between current and previous period."""
        trends = []

        def avg_metric(data_list: list[dict], key: str) -> float:
            vals = []
            for entry in data_list:
                val = entry.get(key)
                if val is None:
                    organic = entry.get("organic", {})
                    val = organic.get(key)
                if val is not None:
                    vals.append(float(val))
            return sum(vals) / len(vals) if vals else 0.0

        current_avg = avg_metric(current_data, metric_key)
        previous_avg = avg_metric(previous_data, metric_key)

        change_pct = None
        direction = "stable"
        if previous_avg and previous_avg != 0:
            change_pct = round(((current_avg - previous_avg) / abs(previous_avg)) * 100, 2)
            if change_pct > 2.0:
                direction = "up"
            elif change_pct < -2.0:
                direction = "down"

        trends.append(TrendData(
            period=metric_key,
            value=round(current_avg, 2),
            change_pct=change_pct,
            direction=direction,
        ))
        return trends

    def identify_wins(
        self, current: dict, previous: dict
    ) -> list[WinConcern]:
        """Identify significant positive changes between periods."""
        wins = []
        metric_labels = {
            "traffic": "Organic Traffic",
            "cost": "Traffic Value",
            "keywords": "Keyword Count",
            "refdomains": "Referring Domains",
        }

        for key, label in metric_labels.items():
            curr_val = self._extract_metric(current, key)
            prev_val = self._extract_metric(previous, key)
            if prev_val and prev_val > 0 and curr_val > prev_val:
                change_pct = ((curr_val - prev_val) / prev_val) * 100
                if change_pct >= 5.0:
                    impact = "high" if change_pct >= 20 else ("medium" if change_pct >= 10 else "low")
                    wins.append(WinConcern(
                        category=label,
                        description=f"{label} increased by {change_pct:+.1f}% ({prev_val:,.0f} -> {curr_val:,.0f})",
                        impact=impact,
                        action=f"Continue current {label.lower()} strategy",
                    ))
        return wins

    def identify_concerns(
        self, current: dict, previous: dict
    ) -> list[WinConcern]:
        """Identify significant negative changes between periods."""
        concerns = []
        metric_labels = {
            "traffic": "Organic Traffic",
            "cost": "Traffic Value",
            "keywords": "Keyword Count",
            "refdomains": "Referring Domains",
        }

        for key, label in metric_labels.items():
            curr_val = self._extract_metric(current, key)
            prev_val = self._extract_metric(previous, key)
            if prev_val and prev_val > 0 and curr_val < prev_val:
                change_pct = ((curr_val - prev_val) / prev_val) * 100
                if change_pct <= -5.0:
                    impact = "high" if change_pct <= -20 else ("medium" if change_pct <= -10 else "low")
                    actions = {
                        "Organic Traffic": "Investigate traffic sources and algorithm updates",
                        "Traffic Value": "Review keyword targeting and content quality",
                        "Keyword Count": "Expand content coverage and optimize existing pages",
                        "Referring Domains": "Strengthen link building outreach campaigns",
                    }
                    concerns.append(WinConcern(
                        category=label,
                        description=f"{label} decreased by {change_pct:.1f}% ({prev_val:,.0f} -> {curr_val:,.0f})",
                        impact=impact,
                        action=actions.get(label, f"Review {label.lower()} strategy"),
                    ))
        return concerns

    def _extract_metric(self, data: dict, key: str) -> float:
        """Extract a metric value from nested Ahrefs response."""
        if key in data:
            return float(data[key])
        organic = data.get("organic", {})
        if key in organic:
            return float(organic[key])
        return 0.0

    def generate_executive_summary(
        self,
        wins: list[WinConcern],
        concerns: list[WinConcern],
        health_score: float,
        health_trend: str,
        traffic_value_change: float,
    ) -> dict[str, Any]:
        """Generate high-level executive summary."""
        summary = {
            "health_score": health_score,
            "health_trend": health_trend,
            "traffic_value_change_usd": round(traffic_value_change, 2),
            "total_wins": len(wins),
            "total_concerns": len(concerns),
            "top_wins": [
                {"category": w.category, "description": w.description, "impact": w.impact}
                for w in sorted(wins, key=lambda x: {"high": 0, "medium": 1, "low": 2}.get(x.impact, 3))[:5]
            ],
            "top_concerns": [
                {"category": c.category, "description": c.description, "impact": c.impact}
                for c in sorted(concerns, key=lambda x: {"high": 0, "medium": 1, "low": 2}.get(x.impact, 3))[:5]
            ],
            "overall_assessment": "",
        }

        if health_score >= 75:
            summary["overall_assessment"] = "Strong performance - focus on maintaining momentum"
        elif health_score >= 50:
            summary["overall_assessment"] = "Moderate performance - targeted improvements needed"
        else:
            summary["overall_assessment"] = "Needs attention - prioritize fundamental improvements"

        return summary

    def generate_tactical_breakdown(
        self, current: dict, wins: list[WinConcern], concerns: list[WinConcern]
    ) -> dict[str, Any]:
        """Generate actionable next steps per dimension."""
        breakdown = {
            "traffic": {
                "status": "needs_review",
                "actions": [],
            },
            "rankings": {
                "status": "needs_review",
                "actions": [],
            },
            "links": {
                "status": "needs_review",
                "actions": [],
            },
            "content": {
                "status": "needs_review",
                "actions": [],
            },
            "technical": {
                "status": "needs_review",
                "actions": [],
            },
        }

        traffic = self._extract_metric(current, "traffic")
        keywords = self._extract_metric(current, "keywords")
        refdomains = self._extract_metric(current, "refdomains")

        # Traffic actions
        if traffic > 0:
            breakdown["traffic"]["status"] = "active"
            breakdown["traffic"]["actions"].append("Monitor top landing pages for traffic drops")
            breakdown["traffic"]["actions"].append("Identify new keyword opportunities in adjacent topics")
        else:
            breakdown["traffic"]["actions"].append("Establish organic traffic baseline with content strategy")

        # Rankings actions
        if keywords > 0:
            breakdown["rankings"]["status"] = "active"
            breakdown["rankings"]["actions"].append(
                f"Optimize pages for {int(keywords)} tracked keywords"
            )
            breakdown["rankings"]["actions"].append("Target featured snippets for top-performing queries")
        else:
            breakdown["rankings"]["actions"].append("Begin keyword research and content mapping")

        # Links actions
        if refdomains > 0:
            breakdown["links"]["status"] = "active"
            breakdown["links"]["actions"].append("Analyze top referring domains for partnership opportunities")
            breakdown["links"]["actions"].append("Monitor for lost backlinks and reclaim valuable links")
        else:
            breakdown["links"]["actions"].append("Develop link acquisition strategy with digital PR")

        # Content actions
        breakdown["content"]["actions"].append("Audit content freshness and update older pages")
        breakdown["content"]["actions"].append("Identify content gaps using competitor analysis")

        # Technical actions
        breakdown["technical"]["actions"].append("Run technical SEO audit for crawl issues")
        breakdown["technical"]["actions"].append("Verify Core Web Vitals pass thresholds")

        # Enrich with win/concern context
        for w in wins:
            cat_lower = w.category.lower()
            if "traffic" in cat_lower and breakdown.get("traffic"):
                breakdown["traffic"]["status"] = "improving"
            if "keyword" in cat_lower and breakdown.get("rankings"):
                breakdown["rankings"]["status"] = "improving"
            if "domain" in cat_lower or "link" in cat_lower:
                breakdown["links"]["status"] = "improving"

        for c in concerns:
            cat_lower = c.category.lower()
            if "traffic" in cat_lower and breakdown.get("traffic"):
                breakdown["traffic"]["status"] = "declining"
                breakdown["traffic"]["actions"].insert(0, c.action)
            if "keyword" in cat_lower and breakdown.get("rankings"):
                breakdown["rankings"]["status"] = "declining"
                breakdown["rankings"]["actions"].insert(0, c.action)

        return breakdown

    def compare_targets(
        self, current: dict, targets: dict
    ) -> list[TargetProgress]:
        """Compare current metrics against saved targets."""
        progress_list = []
        for key, target_val in targets.items():
            parts = key.split(".")
            metric_name = parts[-1] if len(parts) > 1 else key
            actual = self._extract_metric(current, metric_name)
            if actual == 0.0 and len(parts) > 1:
                # Try alternate key resolution
                actual = current.get(key, 0.0)
                if isinstance(actual, dict):
                    actual = 0.0
            tp = TargetProgress(
                kpi_name=key,
                target=float(target_val),
                actual=float(actual),
            )
            tp.compute_progress()
            progress_list.append(tp)
        return progress_list

    # ----- Main orchestration -----

    async def report(
        self,
        url: str,
        period: str = "monthly",
        date_from: str | None = None,
        date_to: str | None = None,
        executive_only: bool = False,
        targets_path: str | None = None,
    ) -> PerformanceReport:
        """Orchestrate full performance report generation."""
        report = PerformanceReport(
            url=url,
            period=period,
            timestamp=datetime.now().isoformat(),
        )

        # Determine date ranges
        report.date_from, report.date_to = get_date_range(period, date_from, date_to)
        prev_from, prev_to = get_previous_range(report.date_from, report.date_to)

        async with aiohttp.ClientSession() as session:
            # Fetch current and previous period data concurrently
            tasks = [
                self.get_metrics_history(session, url, report.date_from, report.date_to),
                self.get_metrics_history(session, url, prev_from, prev_to),
                self.get_current_metrics(session, url),
                self.get_dr_history(session, url, report.date_from, report.date_to),
                self.get_volume_history(session, url, report.date_from, report.date_to),
            ]
            results = await asyncio.gather(*tasks, return_exceptions=True)

        current_history = results[0] if not isinstance(results[0], Exception) else []
        previous_history = results[1] if not isinstance(results[1], Exception) else []
        current_snapshot = results[2] if not isinstance(results[2], Exception) else {}
        dr_history = results[3] if not isinstance(results[3], Exception) else []
        volume_history = results[4] if not isinstance(results[4], Exception) else []

        for i, r in enumerate(results):
            if isinstance(r, Exception):
                report.errors.append(f"Data fetch error [{i}]: {r}")

        # Calculate trends for key metrics
        for metric_key in ["traffic", "keywords", "cost", "refdomains"]:
            if current_history or previous_history:
                trend = self.calculate_period_comparison(
                    current_history if isinstance(current_history, list) else [],
                    previous_history if isinstance(previous_history, list) else [],
                    metric_key,
                )
                report.trends[metric_key] = [asdict(t) for t in trend]

        # Build previous snapshot for comparison
        previous_snapshot = {}
        if isinstance(previous_history, list) and previous_history:
            for entry in previous_history:
                for key in ("traffic", "cost", "keywords", "refdomains"):
                    val = entry.get(key)
                    if val is None:
                        organic = entry.get("organic", {})
                        val = organic.get(key)
                    if val is not None:
                        if key not in previous_snapshot:
                            previous_snapshot[key] = []
                        previous_snapshot[key].append(float(val))
            # Average the values
            previous_snapshot = {
                k: sum(v) / len(v) for k, v in previous_snapshot.items() if v
            }

        # Identify wins and concerns
        if isinstance(current_snapshot, dict):
            report.wins = self.identify_wins(current_snapshot, previous_snapshot)
            report.concerns = self.identify_concerns(current_snapshot, previous_snapshot)
        else:
            report.wins = []
            report.concerns = []

        # Calculate health score (simple heuristic)
        traffic = self._extract_metric(current_snapshot, "traffic") if isinstance(current_snapshot, dict) else 0
        keywords = self._extract_metric(current_snapshot, "keywords") if isinstance(current_snapshot, dict) else 0
        score_components = []
        if traffic > 0:
            score_components.append(min(100, traffic / 100))
        if keywords > 0:
            score_components.append(min(100, keywords / 50))
        if dr_history:
            latest_dr = dr_history[-1] if isinstance(dr_history, list) else {}
            dr_val = latest_dr.get("domain_rating", latest_dr.get("domainRating", 0))
            score_components.append(float(dr_val))
        report.health_score = round(
            sum(score_components) / max(len(score_components), 1), 1
        )

        # Health trend
        win_count = len(report.wins)
        concern_count = len(report.concerns)
        if win_count > concern_count:
            report.health_trend = "improving"
        elif concern_count > win_count:
            report.health_trend = "declining"
        else:
            report.health_trend = "stable"

        # Traffic value change (Ahrefs reports cost in US cents; divide by 100 for USD)
        curr_cost = self._extract_metric(current_snapshot, "cost") if isinstance(current_snapshot, dict) else 0
        prev_cost = previous_snapshot.get("cost", 0)
        report.traffic_value_change = round((curr_cost - prev_cost) / 100.0, 2)

        # Executive summary
        report.executive_summary = self.generate_executive_summary(
            report.wins, report.concerns,
            report.health_score, report.health_trend,
            report.traffic_value_change,
        )

        if not executive_only:
            # Tactical breakdown
            report.tactical_breakdown = self.generate_tactical_breakdown(
                current_snapshot if isinstance(current_snapshot, dict) else {},
                report.wins, report.concerns,
            )

            # Target comparison
            if targets_path:
                try:
                    targets_data = json.loads(Path(targets_path).read_text())
                    # Use 30-day targets by default
                    target_set = targets_data.get("30_day", targets_data)
                    report.target_progress = self.compare_targets(
                        current_snapshot if isinstance(current_snapshot, dict) else {},
                        target_set,
                    )
                except Exception as exc:
                    report.errors.append(f"Targets load error: {exc}")

        return report

# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------

def format_text_report(report: PerformanceReport) -> str:
    """Format performance report as human-readable text."""
    lines = []
    lines.append("=" * 60)
    lines.append(f"SEO Performance Report: {report.url}")
    lines.append(f"Period: {report.period} ({report.date_from} to {report.date_to})")
    lines.append(f"Generated: {report.timestamp}")
    lines.append("=" * 60)

    # Executive Summary
    lines.append("\nEXECUTIVE SUMMARY")
    lines.append("-" * 40)
    es = report.executive_summary
    lines.append(f"  Health Score: {es.get('health_score', 0)}/100")
    trend_arrow = {"improving": "^", "declining": "v", "stable": "="}.get(
        es.get("health_trend", "stable"), "="
    )
    lines.append(f"  Trend: {trend_arrow} {es.get('health_trend', 'stable')}")
    lines.append(f"  Traffic Value Change: ${es.get('traffic_value_change_usd', 0):,.2f}")
    lines.append(f"  Assessment: {es.get('overall_assessment', 'N/A')}")

    # Wins
    lines.append(f"\n  Top Wins ({es.get('total_wins', 0)} total):")
    for w in es.get("top_wins", []):
        impact_marker = {"high": "!!!", "medium": "!!", "low": "!"}.get(w.get("impact", "low"), "!")
        lines.append(f"    {impact_marker} [{w.get('category', '')}] {w.get('description', '')}")

    # Concerns
    lines.append(f"\n  Top Concerns ({es.get('total_concerns', 0)} total):")
    for c in es.get("top_concerns", []):
        impact_marker = {"high": "!!!", "medium": "!!", "low": "!"}.get(c.get("impact", "low"), "!")
        lines.append(f"    {impact_marker} [{c.get('category', '')}] {c.get('description', '')}")

    # Trends
    if report.trends:
        lines.append("\nTRENDS")
        lines.append("-" * 40)
        for metric_name, trend_list in report.trends.items():
            for t in trend_list:
                if isinstance(t, dict):
                    dir_arrow = {"up": "^", "down": "v", "stable": "="}.get(
                        t.get("direction", "stable"), "="
                    )
                    change_str = f" ({t.get('change_pct', 0):+.1f}%)" if t.get("change_pct") is not None else ""
                    lines.append(f"  {dir_arrow} {metric_name}: {t.get('value', 0):,.2f}{change_str}")

    # Tactical Breakdown
    if report.tactical_breakdown:
        lines.append("\nTACTICAL BREAKDOWN")
        lines.append("-" * 40)
        for dim_name, dim_data in report.tactical_breakdown.items():
            status = dim_data.get("status", "unknown")
            status_marker = {
                "improving": "^", "declining": "v", "active": "=", "needs_review": "?"
            }.get(status, "?")
            lines.append(f"\n  [{dim_name.upper()}] Status: {status_marker} {status}")
            for action in dim_data.get("actions", [])[:3]:
                lines.append(f"    > {action}")

    # Target Progress
    if report.target_progress:
        lines.append("\nTARGET PROGRESS")
        lines.append("-" * 40)
        for tp in report.target_progress:
            if isinstance(tp, TargetProgress):
                bar_filled = int(min(tp.progress_pct, 100) / 5)
                bar = "#" * bar_filled + "-" * (20 - bar_filled)
                lines.append(
                    f"  {tp.kpi_name}: [{bar}] {tp.progress_pct:.0f}% "
                    f"(actual: {tp.actual:,.0f} / target: {tp.target:,.0f})"
                )

    # Errors
    if report.errors:
        lines.append("\nERRORS")
        lines.append("-" * 40)
        for err in report.errors:
            lines.append(f"  ! {err}")

    lines.append("\n" + "=" * 60)
    return "\n".join(lines)


def serialize_report(report: PerformanceReport) -> dict:
    """Serialize PerformanceReport to JSON-safe dictionary."""
    data = {
        "url": report.url,
        "period": report.period,
        "date_from": report.date_from,
        "date_to": report.date_to,
        "health_score": report.health_score,
        "health_trend": report.health_trend,
        "trends": report.trends,
        "wins": [asdict(w) for w in report.wins],
        "concerns": [asdict(c) for c in report.concerns],
        "executive_summary": report.executive_summary,
        "tactical_breakdown": report.tactical_breakdown,
        "target_progress": [asdict(tp) for tp in report.target_progress],
        "traffic_value_change": report.traffic_value_change,
        "timestamp": report.timestamp,
        "errors": report.errors,
    }
    return data

# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------

def parse_args() -> argparse.Namespace:
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="SEO Performance Reporter - Period-over-period analysis"
    )
    parser.add_argument(
        "--url", required=True, help="Target URL or domain"
    )
    parser.add_argument(
        "--period", choices=["monthly", "quarterly", "yearly", "custom"],
        default="monthly", help="Report period (default: monthly)"
    )
    parser.add_argument(
        "--from", dest="date_from", type=str, default=None,
        help="Start date (YYYY-MM-DD) for custom period"
    )
    parser.add_argument(
        "--to", dest="date_to", type=str, default=None,
        help="End date (YYYY-MM-DD) for custom period"
    )
    parser.add_argument(
        "--executive", action="store_true",
        help="Generate executive summary only"
    )
    parser.add_argument(
        "--targets", type=str, default=None,
        help="Path to targets JSON file for progress comparison"
    )
    parser.add_argument(
        "--json", action="store_true",
        help="Output results as JSON"
    )
    parser.add_argument(
        "--output", type=str, default=None,
        help="Save output to file path"
    )
    return parser.parse_args()


async def main() -> None:
    """Main entry point."""
    args = parse_args()

    reporter = PerformanceReporter()
    result = await reporter.report(
        url=args.url,
        period=args.period,
        date_from=args.date_from,
        date_to=args.date_to,
        executive_only=args.executive,
        targets_path=args.targets,
    )

    if args.json:
        output = json.dumps(serialize_report(result), indent=2, ensure_ascii=False)
    else:
        output = format_text_report(result)

    if args.output:
        Path(args.output).write_text(output, encoding="utf-8")
        logger.info(f"Output saved to {args.output}")
    else:
        print(output)

    reporter.print_stats()


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,8 @@
# 25-seo-kpi-framework dependencies
requests>=2.31.0
aiohttp>=3.9.0
pandas>=2.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
107	custom-skills/25-seo-kpi-framework/desktop/SKILL.md (new file)
@@ -0,0 +1,107 @@
---
name: seo-kpi-framework
description: |
  SEO KPI and performance framework for unified metrics, health scores, ROI, and period-over-period reporting.
  Triggers: SEO KPI, performance report, health score, SEO metrics, ROI, baseline, targets.
---

# SEO KPI & Performance Framework

## Purpose

Aggregate SEO KPIs across all dimensions into a unified dashboard. Establish baselines, set 30/60/90-day targets, generate executive summaries with health scores, provide tactical breakdowns, estimate ROI using Ahrefs traffic cost, and support period-over-period comparison (MoM, QoQ, YoY).

## Core Capabilities

1. **KPI Aggregation** - Unified metrics across 7 dimensions (traffic, rankings, links, technical, content, engagement, local)
2. **Health Scoring** - Weighted 0-100 score with trend direction
3. **Baseline & Targets** - Establish baselines and set 30/60/90-day growth targets
4. **ROI Estimation** - Traffic value from Ahrefs organic cost
5. **Performance Reporting** - Period-over-period comparison with executive summary
6. **Tactical Breakdown** - Actionable next steps per dimension

## MCP Tool Usage

### Ahrefs for SEO Metrics
```
mcp__ahrefs__site-explorer-metrics: Current organic metrics snapshot
mcp__ahrefs__site-explorer-metrics-history: Historical trend data
mcp__ahrefs__site-explorer-metrics-by-country: Country-level breakdown
mcp__ahrefs__site-explorer-domain-rating-history: Domain rating trend
mcp__ahrefs__site-explorer-total-search-volume-history: Keyword volume trend
```

### Notion for Report Storage
```
mcp__notion__*: Save reports to SEO Audit Log database
```

## Workflow

### 1. KPI Aggregation
1. Fetch site-explorer-metrics for current organic data
2. Extract traffic, ranking, link, technical, content metrics
3. Calculate dimension scores with weights (traffic 25%, rankings 20%, technical 20%, content 15%, links 15%, local 5%)
4. Compute overall health score (0-100)
5. Set 30/60/90-day targets (5%/10%/20% improvement)
6. Estimate ROI from Ahrefs traffic cost (divide raw cost by 100 for USD)
### 2. Performance Reporting
|
||||||
|
1. Determine date range from period (monthly/quarterly/yearly/custom)
|
||||||
|
2. Fetch metrics-history for current and previous period
|
||||||
|
3. Calculate period-over-period changes
|
||||||
|
4. Identify wins (>5% improvement) and concerns (>5% decline)
|
||||||
|
5. Generate executive summary with trend arrows
|
||||||
|
6. Create tactical breakdown with actionable next steps
|
||||||
|
7. Compare against targets if provided
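Steps 3 and 4 can be sketched as a small classifier (a hypothetical helper, not from the actual script; the 5% threshold is the one stated above):

```python
def period_over_period(current: dict, previous: dict, threshold: float = 0.05):
    """Classify each metric as a win (>5% up) or a concern (>5% down).

    Metrics missing from the previous period (or zero) are skipped to
    avoid division by zero. Changes are returned as percentages.
    """
    wins, concerns = [], []
    for metric, cur in current.items():
        prev = previous.get(metric)
        if not prev:
            continue
        change = (cur - prev) / prev
        if change > threshold:
            wins.append((metric, round(change * 100, 1)))
        elif change < -threshold:
            concerns.append((metric, round(change * 100, 1)))
    return wins, concerns
```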

## Output Format

```markdown
## SEO KPI Dashboard: [domain]

### Health Score: [score]/100 ([trend])

### KPI Summary
| Dimension | Score | Key Metric | Trend |
|-----------|-------|------------|-------|
| Traffic | [score] | [organic_traffic] | [arrow] |
| Rankings | [score] | [visibility] | [arrow] |
| Links | [score] | [DR] | [arrow] |
| Technical | [score] | [health] | [arrow] |
| Content | [score] | [indexed_pages] | [arrow] |

### Executive Summary
- Top Wins: [list]
- Top Concerns: [list]
- Recommendations: [list]

### Targets (30/60/90 day)
[Target table with progress bars]
```

## Key Metrics

| Dimension | Metrics | Source |
|-----------|---------|--------|
| Traffic | Organic traffic, traffic value (USD) | site-explorer-metrics |
| Rankings | Visibility score, top10 keywords | site-explorer-metrics |
| Links | Domain rating, referring domains | domain-rating, metrics |
| Technical | Pages crawled, technical health | site-explorer-metrics |
| Content | Indexed pages, freshness score | site-explorer-metrics |
| Local | GBP visibility, review score | External data |

## Limitations

- Local KPIs require external GBP data (not available via Ahrefs)
- Engagement KPIs (bounce rate, session duration) require Google Analytics
- Technical health is estimated heuristically from available data
- ROI is estimated from Ahrefs traffic cost, not actual revenue

## Notion Output (Required)

All reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category, Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: KPI-YYYYMMDD-NNN
8
custom-skills/25-seo-kpi-framework/desktop/skill.yaml
Normal file
@@ -0,0 +1,8 @@
name: seo-kpi-framework
description: |
  SEO KPI and performance framework. Triggers: SEO KPI, performance report, health score, SEO metrics, ROI, baseline, targets.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
15
custom-skills/25-seo-kpi-framework/desktop/tools/ahrefs.md
Normal file
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
15
custom-skills/25-seo-kpi-framework/desktop/tools/notion.md
Normal file
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
149
custom-skills/26-seo-international/code/CLAUDE.md
Normal file
@@ -0,0 +1,149 @@
# CLAUDE.md

## Overview

International SEO audit tool for multi-language and multi-region website optimization. Validates hreflang tags (bidirectional, self-referencing, x-default), analyzes URL structure patterns (ccTLD vs subdomain vs subdirectory), audits content parity across language versions, checks detected content language against the declared language, and analyzes international redirect logic. Supports Korean expansion patterns (ko→ja, ko→zh, ko→en).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Hreflang validation
python scripts/hreflang_validator.py --url https://example.com --json

# Full international SEO audit
python scripts/international_auditor.py --url https://example.com --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `hreflang_validator.py` | Validate hreflang tag implementation | Hreflang errors, missing bidirectional links, x-default issues |
| `international_auditor.py` | Full international SEO audit | URL structure, content parity, redirect logic, language detection |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Hreflang Validator

```bash
# Validate hreflang for the homepage
python scripts/hreflang_validator.py --url https://example.com --json

# Validate with sitemap-based discovery
python scripts/hreflang_validator.py --url https://example.com --sitemap https://example.com/sitemap.xml --json

# Check specific pages
python scripts/hreflang_validator.py --urls-file pages.txt --json
```

**Capabilities**:
- Hreflang tag extraction from HTML head, HTTP headers, and XML sitemap
- Bidirectional validation (if page A→B, then B→A must exist)
- Self-referencing check (each page should reference itself)
- x-default tag verification
- Language/region code validation (ISO 639-1 + ISO 3166-1)
- Conflicting hreflang detection
- Missing language version detection
- Return tag validation (confirmation links from alternate pages)
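The bidirectional check can be sketched as follows (a minimal illustration of the rule "if A→B, then B→A must exist"; the map shape and function name are assumptions, not the validator's actual API):

```python
def missing_bidirectional(hreflang_map: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Given {page_url: {lang: alternate_url}}, return (page, alternate)
    pairs where the alternate page does not link back to the page."""
    missing = []
    for page, alternates in hreflang_map.items():
        for lang, alt_url in alternates.items():
            if alt_url == page:          # self-reference, nothing to confirm
                continue
            back = hreflang_map.get(alt_url, {})
            if page not in back.values():
                missing.append((page, alt_url))
    return missing
```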

## International Auditor

```bash
# Full international audit
python scripts/international_auditor.py --url https://example.com --json

# URL structure analysis
python scripts/international_auditor.py --url https://example.com --scope structure --json

# Content parity check
python scripts/international_auditor.py --url https://example.com --scope parity --json

# Korean expansion focus
python scripts/international_auditor.py --url https://example.com --korean-expansion --json
```

**Capabilities**:
- URL structure analysis (ccTLD vs subdomain vs subdirectory)
  - Recommendation engine based on business context
- Content parity audit across language versions
  - Page count comparison per language
  - Key page availability check (home, about, contact, products)
  - Content freshness comparison across languages
- Language/locale detection vs declared language
  - HTML lang attribute check
  - Content-Language header check
  - Actual content language detection
- International redirect logic audit
  - IP-based redirect detection
  - Accept-Language redirect behavior
  - Geo-redirect best practices (suggest, don't force)
- Korean expansion patterns (ko→ja, ko→zh, ko→en)
  - Priority market recommendations for Korean businesses
  - CJK-specific URL encoding issues
  - Regional search engine considerations (Naver, Baidu, Yahoo Japan)

## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-metrics-by-country` | Country-level traffic distribution |
| `site-explorer-organic-keywords` | Keywords by country filter |

## Output Format

```json
{
  "url": "https://example.com",
  "url_structure": "subdirectory",
  "languages_detected": ["ko", "en", "ja"],
  "hreflang_validation": {
    "total_pages_checked": 50,
    "errors": [],
    "warnings": [],
    "missing_bidirectional": [],
    "missing_self_reference": [],
    "x_default_present": true
  },
  "content_parity": {
    "ko": {"pages": 150, "freshness_score": 90},
    "en": {"pages": 120, "freshness_score": 75},
    "ja": {"pages": 80, "freshness_score": 60}
  },
  "redirect_logic": {
    "ip_based_redirect": false,
    "language_based_redirect": true,
    "is_forced": false
  },
  "score": 68,
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | International SEO |
| Priority | Select | Based on hreflang error count |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: INTL-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., hreflang, x-default, ccTLD)
- URLs and code remain unchanged
207
custom-skills/26-seo-international/code/scripts/base_client.py
Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using the token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise an error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
File diff suppressed because it is too large
@@ -0,0 +1,10 @@
# 26-seo-international dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
langdetect>=1.0.9
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
124
custom-skills/26-seo-international/desktop/SKILL.md
Normal file
@@ -0,0 +1,124 @@
---
name: seo-international
description: |
  International SEO audit and hreflang validation for multi-language and multi-region websites.
  Triggers: hreflang, international SEO, multi-language, multi-region, content parity, x-default, ccTLD, 다국어 SEO.
---

# International SEO Audit

## Purpose

Audit international SEO implementation: hreflang tags, URL structure patterns, content parity across language versions, redirect logic, and Korean expansion strategies. Identify issues preventing proper multi-language indexing.

## Core Capabilities

1. **Hreflang Validation** - Bidirectional links, self-reference, x-default, language code validation
2. **URL Structure Analysis** - ccTLD vs subdomain vs subdirectory pattern detection
3. **Content Parity Audit** - Page count comparison, key page availability across languages
4. **Redirect Logic Audit** - IP-based and Accept-Language redirects, forced redirect detection
5. **Korean Expansion** - Priority markets (ja, zh, en), CJK URL issues, regional search engines

## MCP Tool Usage

### Ahrefs for Country Metrics
```
mcp__ahrefs__site-explorer-metrics-by-country: Country-level traffic distribution
mcp__ahrefs__site-explorer-organic-keywords: Keywords filtered by country
```

### Notion for Report Storage
```
mcp__notion__notion-create-pages: Save audit report to SEO Audit Log database
```

### WebSearch for Best Practices
```
WebSearch: Research hreflang implementation guides and regional search engine requirements
```

## Workflow

### 1. Hreflang Validation
1. Fetch the target URL and extract hreflang tags (HTML head, HTTP headers)
2. If a sitemap is provided, also extract xhtml:link hreflang entries from the XML sitemap
3. Validate language codes (ISO 639-1) and region codes (ISO 3166-1)
4. Check bidirectional links (if A references B, B must reference A)
5. Verify self-referencing tags on each page
6. Check x-default tag presence and validity
7. Detect conflicting hreflang entries for the same language-region
8. Report all errors with severity levels
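Step 3's code check can be sketched as a syntactic validator (a simplification for illustration: hreflang additionally permits ISO 15924 script subtags such as `zh-Hans`, which this pattern rejects):

```python
import re

# ISO 639-1 language, optional ISO 3166-1 alpha-2 region, or literal x-default.
_HREFLANG_RE = re.compile(r"^(?:x-default|[a-z]{2}(?:-[A-Z]{2})?)$")

def is_valid_hreflang(code: str) -> bool:
    """Shape check only; does not confirm the code is an assigned ISO value
    (that requires a lookup table of registered codes)."""
    return bool(_HREFLANG_RE.match(code))
```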

### 2. URL Structure Analysis
1. Crawl known language versions of the site
2. Classify the pattern: ccTLD (example.kr), subdomain (ko.example.com), subdirectory (example.com/ko/)
3. Check consistency across all language versions
4. Provide a recommendation based on business context
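The classification in step 2 can be sketched as a heuristic (an illustrative helper, not the auditor's real logic; the two-letter-TLD test is a rough proxy for "country-code TLD"):

```python
from urllib.parse import urlparse

def classify_structure(url: str, lang: str) -> str:
    """Classify how a language version is encoded in a URL:
    ccTLD (example.kr), subdomain (ko.example.com), or
    subdirectory (example.com/ko/)."""
    parsed = urlparse(url)
    host, path = parsed.hostname or "", parsed.path
    if host.split(".")[0] == lang:
        return "subdomain"
    if f"/{lang}/" in path or path.rstrip("/").endswith(f"/{lang}"):
        return "subdirectory"
    if len(host.rsplit(".", 1)[-1]) == 2:  # two-letter TLD like .kr, .jp
        return "ccTLD"
    return "unknown"
```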

### 3. Content Parity Audit
1. Discover all language versions from hreflang tags
2. Count pages per language version
3. Check availability of key pages (home, about, contact, products/services)
4. Compare content freshness (last-modified dates) across versions
5. Flag significant gaps in content availability

### 4. Redirect Logic Audit
1. Test the URL with different Accept-Language headers (ko, en, ja, zh)
2. Check whether redirects are forced (no way to override) or suggested (banner/popup)
3. Flag forced geo/language redirects as an anti-pattern
4. Recommend the proper implementation (suggest, do not force)
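Step 1 can be sketched as a probe that replays the same request with different Accept-Language headers (a hypothetical helper, not the auditor's actual code; the injectable `fetch` parameter exists only so the sketch can be exercised offline):

```python
def _default_fetch(url: str, lang: str):
    import requests  # listed in scripts/requirements.txt; deferred import
    return requests.get(url, headers={"Accept-Language": lang},
                        allow_redirects=False, timeout=10)

def probe_language_redirects(url, languages=("ko", "en", "ja", "zh"),
                             fetch=_default_fetch):
    """Record whether the server answers each language with a redirect
    (3xx) and to where. A forced redirect shows up as a bare 3xx; a
    'suggest' implementation returns 200 and offers the switch client-side."""
    results = {}
    for lang in languages:
        resp = fetch(url, lang)
        results[lang] = {
            "status": resp.status_code,
            "redirects": 300 <= resp.status_code < 400,
            "location": resp.headers.get("Location"),
        }
    return results
```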

### 5. Korean Expansion Analysis (Optional)
1. Analyze current traffic by country via Ahrefs
2. Recommend priority target markets for Korean businesses
3. Check for CJK-specific URL encoding issues
4. Advise on regional search engines (Naver, Baidu, Yahoo Japan)

## Output Format

```markdown
## 다국어 SEO 감사: [domain]

### Hreflang 검증
- 검사 페이지 수: [count]
- 오류: [count] (심각 [count], 경고 [count])
- 양방향 링크 누락: [list]
- 자기참조 누락: [list]
- x-default: [있음/없음]

### URL 구조
- 패턴: [ccTLD/subdomain/subdirectory]
- 일관성: [양호/비일관]
- 권장사항: [recommendation]

### 콘텐츠 동등성
| 언어 | 페이지 수 | 핵심 페이지 | 최신성 점수 |
|------|----------|------------|-----------|
| ko | 150 | 5/5 | 90 |
| en | 120 | 4/5 | 75 |

### 리다이렉트 로직
- IP 기반 리다이렉트: [있음/없음]
- 언어 기반 리다이렉트: [있음/없음]
- 강제 리다이렉트: [있음/없음] (없어야 정상)

### 종합 점수: [score]/100

### 권장 조치사항
1. [Priority fixes in Korean]
```

## Notion Output (Required)

All audit reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (International SEO), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: INTL-YYYYMMDD-NNN

## Limitations

- Cannot detect server-side IP-based redirects without proxy testing
- Content language detection requires sufficient text content
- Large sites (10,000+ pages) require a sampling approach
- Sitemap-based hreflang validation requires XML sitemap access
8
custom-skills/26-seo-international/desktop/skill.yaml
Normal file
@@ -0,0 +1,8 @@
name: seo-international
description: |
  International SEO audit and hreflang validation. Triggers: hreflang, international SEO, multi-language, multi-region, content parity, x-default, ccTLD.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
43
custom-skills/26-seo-international/desktop/tools/ahrefs.md
Normal file
@@ -0,0 +1,43 @@
# Ahrefs

## Tools Used

### site-explorer-metrics-by-country
- **Purpose**: Get country-level organic traffic distribution
- **Usage**: Analyze which countries drive traffic to identify international SEO opportunities
- **Parameters**: `target` (domain), `country` (optional filter)
- **Example**:
  ```
  mcp__ahrefs__site-explorer-metrics-by-country:
    target: example.com
  ```

### site-explorer-organic-keywords
- **Purpose**: Get organic keyword rankings filtered by country
- **Usage**: Analyze keyword performance in specific markets
- **Parameters**: `target` (domain), `country` (ISO country code)
- **Example**:
  ```
  mcp__ahrefs__site-explorer-organic-keywords:
    target: example.com
    country: kr
  ```

## Configuration

- Ahrefs MCP server must be connected in Claude Desktop
- API access requires an active Ahrefs subscription

## Common Patterns

### Country Traffic Analysis
1. Call `site-explorer-metrics-by-country` to get the traffic distribution
2. Identify top countries by organic traffic share
3. Compare with hreflang implementation coverage
4. Flag countries with traffic but no localized version

### Keyword Gap by Market
1. Call `site-explorer-organic-keywords` with a country filter
2. Compare keyword counts across target markets
3. Identify markets with low keyword coverage
4. Recommend content localization priorities
51
custom-skills/26-seo-international/desktop/tools/notion.md
Normal file
@@ -0,0 +1,51 @@
# Notion

## Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

## Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title in Korean with date |
| Site | URL | Audited website URL |
| Category | Select | "International SEO" |
| Priority | Select | Based on hreflang error severity |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: INTL-YYYYMMDD-NNN |

## Example: Create Audit Report

```
mcp__notion__notion-create-pages:
  pages:
    - parent_id: "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"
      parent_type: "database"
      title: "다국어 SEO 감사 - example.com (2025-01-15)"
      properties:
        Site:
          url: "https://example.com"
        Category:
          select:
            name: "International SEO"
        Priority:
          select:
            name: "High"
        Found Date:
          date:
            start: "2025-01-15"
        Audit ID:
          rich_text:
            - text:
                content: "INTL-20250115-001"
```

## Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (hreflang, x-default, ccTLD, subdomain)
- URLs and code remain unchanged
@@ -0,0 +1,40 @@
# WebSearch

## Purpose

Search the web for current international SEO best practices, hreflang implementation guides, and regional search engine requirements.

## Common Search Queries

### Hreflang Best Practices
```
WebSearch: "hreflang implementation best practices 2025"
WebSearch: "hreflang common errors fix"
WebSearch: "x-default hreflang when to use"
```

### Regional Search Engines
```
WebSearch: "Naver SEO requirements Korean websites"
WebSearch: "Baidu SEO China market entry"
WebSearch: "Yahoo Japan SEO vs Google Japan"
```

### International URL Structure
```
WebSearch: "ccTLD vs subdomain vs subdirectory international SEO"
WebSearch: "Google recommendations international targeting"
```

### Korean Market Expansion
```
WebSearch: "Korean business international SEO Japan market"
WebSearch: "CJK URL encoding SEO best practices"
```

## Usage Pattern

1. Search for domain-specific international SEO intelligence
2. Verify current Google documentation on hreflang
3. Research regional search engine requirements for target markets
4. Find competitor international SEO strategies
147
custom-skills/27-seo-ai-visibility/code/CLAUDE.md
Normal file
@@ -0,0 +1,147 @@
# CLAUDE.md

## Overview

AI search visibility and brand radar tool for tracking how a brand appears in AI-generated search answers. Monitors AI answer citations, tracks share of voice in AI search versus competitors, analyzes cited domains and pages, and tracks impressions/mentions history. Uses Ahrefs Brand Radar APIs for comprehensive AI visibility monitoring.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# AI visibility tracking
python scripts/ai_visibility_tracker.py --target example.com --json

# AI citation analysis
python scripts/ai_citation_analyzer.py --target example.com --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `ai_visibility_tracker.py` | Track brand visibility in AI search results | AI impressions, mentions, share of voice, trends |
| `ai_citation_analyzer.py` | Analyze AI answer citations and source pages | Cited domains, cited pages, AI response analysis |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## AI Visibility Tracker

```bash
# Current visibility overview
python scripts/ai_visibility_tracker.py --target example.com --json

# With competitor comparison
python scripts/ai_visibility_tracker.py --target example.com --competitor comp1.com --competitor comp2.com --json

# Historical trend (impressions/mentions)
python scripts/ai_visibility_tracker.py --target example.com --history --json

# Share of voice analysis
python scripts/ai_visibility_tracker.py --target example.com --sov --json
```

**Capabilities**:
- AI impressions overview (how often the brand appears in AI answers)
- AI mentions overview (brand mention frequency across AI engines)
- Share of Voice in AI search vs competitors
- Impressions history over time (trend tracking)
- Mentions history over time
- SOV history and trend analysis
- Competitor AI visibility comparison
|
||||||
|
|
||||||
|
## AI Citation Analyzer
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Analyze AI citations for brand
|
||||||
|
python scripts/ai_citation_analyzer.py --target example.com --json
|
||||||
|
|
||||||
|
# Cited domains analysis
|
||||||
|
python scripts/ai_citation_analyzer.py --target example.com --cited-domains --json
|
||||||
|
|
||||||
|
# Cited pages analysis
|
||||||
|
python scripts/ai_citation_analyzer.py --target example.com --cited-pages --json
|
||||||
|
|
||||||
|
# AI response content analysis
|
||||||
|
python scripts/ai_citation_analyzer.py --target example.com --responses --json
|
||||||
|
```
|
||||||
|
|
||||||
|
**Capabilities**:
|
||||||
|
- AI response analysis (how the brand appears in AI-generated answers)
|
||||||
|
- Cited domains analysis (which source domains AI engines reference)
|
||||||
|
- Cited pages analysis (which specific URLs get cited)
|
||||||
|
- Citation sentiment and context analysis
|
||||||
|
- Citation frequency ranking
|
||||||
|
- Competitor citation comparison
|
||||||
|
- Recommendation generation for improving AI visibility
|
||||||
|
|
||||||
|
## Ahrefs MCP Tools Used
|
||||||
|
|
||||||
|
| Tool | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `brand-radar-ai-responses` | Get AI-generated responses mentioning brand |
|
||||||
|
| `brand-radar-cited-domains` | Domains cited in AI answers |
|
||||||
|
| `brand-radar-cited-pages` | Specific pages cited in AI answers |
|
||||||
|
| `brand-radar-impressions-history` | Brand impression trend over time |
|
||||||
|
| `brand-radar-impressions-overview` | Current impression metrics |
|
||||||
|
| `brand-radar-mentions-history` | Brand mention trend over time |
|
||||||
|
| `brand-radar-mentions-overview` | Current mention metrics |
|
||||||
|
| `brand-radar-sov-history` | Share of voice trend |
|
||||||
|
| `brand-radar-sov-overview` | Current share of voice |
|
||||||
|
|
||||||
|
## Output Format
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"target": "example.com",
|
||||||
|
"impressions": {
|
||||||
|
"total": 15000,
|
||||||
|
"trend": "increasing",
|
||||||
|
"change_pct": 12.5
|
||||||
|
},
|
||||||
|
"mentions": {
|
||||||
|
"total": 850,
|
||||||
|
"trend": "stable",
|
||||||
|
"change_pct": 2.1
|
||||||
|
},
|
||||||
|
"share_of_voice": {
|
||||||
|
"brand_sov": 18.5,
|
||||||
|
"competitors": [
|
||||||
|
{"domain": "comp1.com", "sov": 25.3},
|
||||||
|
{"domain": "comp2.com", "sov": 15.8}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"cited_domains": [...],
|
||||||
|
"cited_pages": [...],
|
||||||
|
"ai_responses_sample": [...],
|
||||||
|
"recommendations": [...],
|
||||||
|
"timestamp": "2025-01-01T00:00:00"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
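As a minimal consumer sketch of this envelope (the helper name `sov_gap` and the inline sample are illustrative, not part of the scripts), the share-of-voice block can be reduced to a single gap metric against the leading competitor:

```python
import json

def sov_gap(report: dict) -> float:
    """Gap between the top competitor's share of voice and the brand's own SOV."""
    sov = report["share_of_voice"]
    top = max(c["sov"] for c in sov["competitors"])
    return round(top - sov["brand_sov"], 1)

# Sample payload matching the Output Format above
report = json.loads("""{
  "target": "example.com",
  "share_of_voice": {
    "brand_sov": 18.5,
    "competitors": [
      {"domain": "comp1.com", "sov": 25.3},
      {"domain": "comp2.com", "sov": 15.8}
    ]
  }
}""")

print(sov_gap(report))  # 6.8
```

A positive gap means at least one competitor out-ranks the brand in AI answers; feeding it into the Priority property of the audit report is one plausible use.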
## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Tracked website URL |
| Category | Select | AI Search Visibility |
| Priority | Select | Based on SOV trend |
| Found Date | Date | Report date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: AI-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., AI Search, Share of Voice, Brand Radar)
- URLs and code remain unchanged
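A minimal sketch of mapping the property table above onto a Notion page payload. The helper `build_audit_properties` is hypothetical (not part of the scripts), and the nested value shapes assume the standard Notion pages API property format:

```python
from datetime import date

DATABASE_ID = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"

def build_audit_properties(site_url: str, priority: str, seq: int, today: date) -> dict:
    """Build a Notion page `properties` dict matching the Required Properties table."""
    audit_id = f"AI-{today:%Y%m%d}-{seq:03d}"  # Format: AI-YYYYMMDD-NNN
    return {
        "Issue": {"title": [{"text": {"content": f"AI 검색 가시성 리포트 {today:%Y-%m-%d}"}}]},
        "Site": {"url": site_url},
        "Category": {"select": {"name": "AI Search Visibility"}},
        "Priority": {"select": {"name": priority}},
        "Found Date": {"date": {"start": today.isoformat()}},
        "Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
    }

props = build_audit_properties("https://example.com", "High", 1, date(2025, 1, 1))
print(props["Audit ID"]["rich_text"][0]["text"]["content"])  # AI-20250101-001
```

The payload would then be posted as `{"parent": {"database_id": DATABASE_ID}, "properties": props}`; verify the exact value shapes against the Notion API reference before use.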
@@ -0,0 +1,611 @@

"""
|
||||||
|
AI Citation Analyzer - Brand Radar Citation Analysis
|
||||||
|
=====================================================
|
||||||
|
Purpose: Analyze how a brand is cited in AI-generated search answers,
|
||||||
|
including cited domains, cited pages, and AI response content.
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python ai_citation_analyzer.py --target example.com --json
|
||||||
|
python ai_citation_analyzer.py --target example.com --cited-domains --json
|
||||||
|
python ai_citation_analyzer.py --target example.com --cited-pages --json
|
||||||
|
python ai_citation_analyzer.py --target example.com --responses --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
# Add parent to path for base_client import
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent))
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Data classes
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class AiResponse:
|
||||||
|
"""An AI-generated response that mentions the brand."""
|
||||||
|
query: str = ""
|
||||||
|
response_text: str = ""
|
||||||
|
brand_mentioned: bool = False
|
||||||
|
sentiment: str = "neutral" # positive, neutral, negative
|
||||||
|
source_engine: str = ""
|
||||||
|
date: str = ""
|
||||||
|
url: str = ""
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CitedDomain:
|
||||||
|
"""A domain cited in AI-generated answers."""
|
||||||
|
domain: str = ""
|
||||||
|
citation_count: int = 0
|
||||||
|
topics: list[str] = field(default_factory=list)
|
||||||
|
share_pct: float = 0.0
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CitedPage:
|
||||||
|
"""A specific page cited in AI-generated answers."""
|
||||||
|
url: str = ""
|
||||||
|
title: str = ""
|
||||||
|
citation_count: int = 0
|
||||||
|
context: str = ""
|
||||||
|
topics: list[str] = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CitationAnalysisResult:
|
||||||
|
"""Complete citation analysis result."""
|
||||||
|
target: str = ""
|
||||||
|
ai_responses: list[AiResponse] = field(default_factory=list)
|
||||||
|
cited_domains: list[CitedDomain] = field(default_factory=list)
|
||||||
|
cited_pages: list[CitedPage] = field(default_factory=list)
|
||||||
|
sentiment_summary: dict = field(default_factory=dict)
|
||||||
|
citation_ranking: list[dict] = field(default_factory=list)
|
||||||
|
competitor_citations: list[dict] = field(default_factory=list)
|
||||||
|
recommendations: list[str] = field(default_factory=list)
|
||||||
|
timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
"""Convert result to dictionary."""
|
||||||
|
return {
|
||||||
|
"target": self.target,
|
||||||
|
"ai_responses": [asdict(r) for r in self.ai_responses],
|
||||||
|
"cited_domains": [asdict(d) for d in self.cited_domains],
|
||||||
|
"cited_pages": [asdict(p) for p in self.cited_pages],
|
||||||
|
"sentiment_summary": self.sentiment_summary,
|
||||||
|
"citation_ranking": self.citation_ranking,
|
||||||
|
"competitor_citations": self.competitor_citations,
|
||||||
|
"recommendations": self.recommendations,
|
||||||
|
"timestamp": self.timestamp,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# MCP tool caller helper
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def call_mcp_tool(tool_name: str, params: dict) -> dict:
|
||||||
|
"""
|
||||||
|
Call an Ahrefs MCP tool and return the parsed JSON response.
|
||||||
|
|
||||||
|
In Claude Desktop / Claude Code environments the MCP tools are invoked
|
||||||
|
directly by the AI agent. This helper exists so that the script can also
|
||||||
|
be executed standalone via subprocess for testing purposes.
|
||||||
|
"""
|
||||||
|
logger.info(f"Calling MCP tool: {tool_name} with params: {params}")
|
||||||
|
try:
|
||||||
|
cmd = ["claude", "mcp", "call", "ahrefs", tool_name, json.dumps(params)]
|
||||||
|
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
|
||||||
|
if result.returncode == 0 and result.stdout.strip():
|
||||||
|
return json.loads(result.stdout.strip())
|
||||||
|
logger.warning(f"MCP tool {tool_name} returned non-zero or empty: {result.stderr}")
|
||||||
|
return {}
|
||||||
|
except (subprocess.TimeoutExpired, json.JSONDecodeError, FileNotFoundError) as exc:
|
||||||
|
logger.warning(f"MCP call failed ({exc}). Returning empty dict.")
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# AI Citation Analyzer
# ---------------------------------------------------------------------------

class AiCitationAnalyzer(BaseAsyncClient):
    """Analyze AI answer citations and source pages for a brand."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.logger = logging.getLogger(self.__class__.__name__)

    # ---- AI Responses ----

    async def get_ai_responses(self, target: str) -> list[AiResponse]:
        """Fetch AI-generated responses mentioning the brand via brand-radar-ai-responses."""
        self.logger.info(f"Fetching AI responses for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-ai-responses",
            {"target": target},
        )
        responses: list[AiResponse] = []
        if not data:
            return responses

        items = data if isinstance(data, list) else data.get("responses", data.get("data", []))
        for item in items:
            if isinstance(item, dict):
                responses.append(AiResponse(
                    query=item.get("query", item.get("keyword", "")),
                    response_text=item.get("response_text", item.get("answer", item.get("text", ""))),
                    brand_mentioned=item.get("brand_mentioned", True),
                    sentiment=item.get("sentiment", "neutral"),
                    source_engine=item.get("source_engine", item.get("engine", "")),
                    date=item.get("date", ""),
                    url=item.get("url", ""),
                ))
        return responses
    # ---- Cited Domains ----

    async def get_cited_domains(self, target: str) -> list[CitedDomain]:
        """Fetch domains cited in AI answers via brand-radar-cited-domains."""
        self.logger.info(f"Fetching cited domains for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-cited-domains",
            {"target": target},
        )
        domains: list[CitedDomain] = []
        if not data:
            return domains

        items = data if isinstance(data, list) else data.get("domains", data.get("data", []))
        for item in items:
            if isinstance(item, dict):
                domains.append(CitedDomain(
                    domain=item.get("domain", ""),
                    citation_count=item.get("citation_count", item.get("citations", item.get("count", 0))),
                    topics=item.get("topics", []),
                    share_pct=item.get("share_pct", item.get("share", 0.0)),
                ))
        return domains

    # ---- Cited Pages ----

    async def get_cited_pages(self, target: str) -> list[CitedPage]:
        """Fetch specific pages cited in AI answers via brand-radar-cited-pages."""
        self.logger.info(f"Fetching cited pages for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-cited-pages",
            {"target": target},
        )
        pages: list[CitedPage] = []
        if not data:
            return pages

        items = data if isinstance(data, list) else data.get("pages", data.get("data", []))
        for item in items:
            if isinstance(item, dict):
                pages.append(CitedPage(
                    url=item.get("url", ""),
                    title=item.get("title", ""),
                    citation_count=item.get("citation_count", item.get("citations", item.get("count", 0))),
                    context=item.get("context", item.get("snippet", "")),
                    topics=item.get("topics", []),
                ))
        return pages
    # ---- Sentiment Analysis ----

    @staticmethod
    def analyze_response_sentiment(responses: list[AiResponse]) -> dict:
        """
        Analyze the sentiment distribution of AI responses.

        Returns a summary with counts and percentages for each sentiment category.
        """
        if not responses:
            return {
                "total": 0,
                "positive": 0,
                "neutral": 0,
                "negative": 0,
                "positive_pct": 0.0,
                "neutral_pct": 0.0,
                "negative_pct": 0.0,
                "overall_sentiment": "unknown",
            }

        total = len(responses)
        positive = sum(1 for r in responses if r.sentiment == "positive")
        neutral = sum(1 for r in responses if r.sentiment == "neutral")
        negative = sum(1 for r in responses if r.sentiment == "negative")

        positive_pct = round((positive / total) * 100, 1)
        neutral_pct = round((neutral / total) * 100, 1)
        negative_pct = round((negative / total) * 100, 1)

        # Determine overall sentiment
        if positive_pct >= 60:
            overall = "positive"
        elif negative_pct >= 40:
            overall = "negative"
        elif positive_pct > negative_pct:
            overall = "leaning_positive"
        elif negative_pct > positive_pct:
            overall = "leaning_negative"
        else:
            overall = "neutral"

        return {
            "total": total,
            "positive": positive,
            "neutral": neutral,
            "negative": negative,
            "positive_pct": positive_pct,
            "neutral_pct": neutral_pct,
            "negative_pct": negative_pct,
            "overall_sentiment": overall,
        }
    # ---- Citation Ranking ----

    @staticmethod
    def rank_citations(items: list[CitedDomain] | list[CitedPage]) -> list[dict]:
        """Rank cited domains or pages by citation frequency."""
        if not items:
            return []

        ranked = sorted(items, key=lambda x: x.citation_count, reverse=True)
        total_citations = sum(item.citation_count for item in ranked)

        result = []
        for rank, item in enumerate(ranked, 1):
            entry = asdict(item)
            entry["rank"] = rank
            entry["share_of_citations"] = (
                round((item.citation_count / total_citations) * 100, 1)
                if total_citations > 0
                else 0.0
            )
            result.append(entry)

        return result
    # ---- Competitor Citation Comparison ----

    async def compare_competitor_citations(
        self, target: str, competitors: list[str]
    ) -> list[dict]:
        """Compare citation profiles between the target and competitors."""
        self.logger.info(f"Comparing citations for {target} vs {competitors}")
        results = []

        all_domains = [target] + competitors
        for domain in all_domains:
            cited_domains = await self.get_cited_domains(domain)
            cited_pages = await self.get_cited_pages(domain)

            total_domain_citations = sum(d.citation_count for d in cited_domains)
            total_page_citations = sum(p.citation_count for p in cited_pages)
            unique_domains = len(cited_domains)
            unique_pages = len(cited_pages)

            results.append({
                "domain": domain,
                "is_target": domain == target,
                "total_domain_citations": total_domain_citations,
                "total_page_citations": total_page_citations,
                "unique_cited_domains": unique_domains,
                "unique_cited_pages": unique_pages,
                "top_cited_domain": cited_domains[0].domain if cited_domains else "",
                "top_cited_page": cited_pages[0].url if cited_pages else "",
            })

        # Sort by total page citations, descending
        results.sort(key=lambda x: x["total_page_citations"], reverse=True)
        return results
    # ---- Recommendations ----

    @staticmethod
    def generate_recommendations(result: CitationAnalysisResult) -> list[str]:
        """Generate actionable recommendations for improving AI citations."""
        recs: list[str] = []

        # Based on citation count
        total_page_citations = sum(p.citation_count for p in result.cited_pages)
        if total_page_citations == 0:
            recs.append(
                "AI 검색 엔진에서 인용된 페이지가 없습니다. "
                "고품질 원본 콘텐츠(연구 데이터, 종합 가이드, 전문가 인사이트)를 "
                "발행하여 AI 엔진의 인용 대상이 되도록 하세요."
            )
        elif total_page_citations < 10:
            recs.append(
                f"인용된 페이지 수가 {total_page_citations}건으로 적습니다. "
                "FAQ, How-to, 비교 분석 등 AI가 참조하기 쉬운 "
                "구조화된 콘텐츠를 추가하세요."
            )

        # Based on domain diversity
        if result.cited_domains:
            target_domains = [d for d in result.cited_domains if d.domain == result.target]
            if not target_domains:
                recs.append(
                    "타깃 도메인이 AI 인용 도메인 목록에 포함되지 않았습니다. "
                    "도메인 권위(Domain Authority) 향상과 "
                    "Schema Markup(JSON-LD) 적용을 우선 추진하세요."
                )

        # Based on sentiment
        sentiment = result.sentiment_summary
        if sentiment.get("negative_pct", 0) > 30:
            recs.append(
                f"AI 응답 중 부정적 언급 비율이 {sentiment['negative_pct']}%입니다. "
                "브랜드 평판 관리와 긍정적 콘텐츠 확대가 필요합니다. "
                "고객 리뷰, 성공 사례, 수상 내역 등을 강화하세요."
            )
        elif sentiment.get("overall_sentiment") == "positive":
            recs.append(
                "AI 응답에서 브랜드 언급이 전반적으로 긍정적입니다. "
                "이 긍정적 이미지를 활용하여 더 많은 키워드에서 "
                "AI 인용을 확대하세요."
            )

        # Content strategy recommendations
        if result.cited_pages:
            top_pages = sorted(result.cited_pages, key=lambda p: p.citation_count, reverse=True)[:3]
            top_topics = set()
            for page in top_pages:
                top_topics.update(page.topics)
            if top_topics:
                topics_str = ", ".join(list(top_topics)[:5])
                recs.append(
                    f"가장 많이 인용되는 주제는 [{topics_str}]입니다. "
                    "이 주제들에 대한 심층 콘텐츠를 추가 제작하세요."
                )

        # E-E-A-T and structured data
        recs.append(
            "AI 인용률 향상을 위한 핵심 전략: "
            "(1) E-E-A-T 시그널 강화 - 저자 프로필, 전문가 인용, 실제 경험 콘텐츠, "
            "(2) 구조화된 데이터 적용 - FAQ, HowTo, Article Schema, "
            "(3) 콘텐츠 정확성 및 최신성 유지, "
            "(4) 원본 데이터와 독자적 연구 결과 발행."
        )

        # Competitor-based recommendations
        if result.competitor_citations:
            leader = result.competitor_citations[0]
            if not leader.get("is_target", False):
                recs.append(
                    f"인용 리더는 {leader['domain']}입니다 "
                    f"(페이지 인용 {leader['total_page_citations']}건). "
                    "해당 경쟁사의 인용된 페이지를 분석하여 "
                    "콘텐츠 갭을 파악하세요."
                )

        return recs
    # ---- Main Orchestrator ----

    async def analyze(
        self,
        target: str,
        competitors: list[str] | None = None,
        include_responses: bool = True,
        include_cited_domains: bool = True,
        include_cited_pages: bool = True,
    ) -> CitationAnalysisResult:
        """
        Orchestrate the full citation analysis.

        Args:
            target: Domain to analyze
            competitors: Optional competitor domains
            include_responses: Fetch AI response data
            include_cited_domains: Fetch cited domains
            include_cited_pages: Fetch cited pages
        """
        self.logger.info(f"Starting AI citation analysis for {target}")
        result = CitationAnalysisResult(target=target)

        # AI responses
        if include_responses:
            result.ai_responses = await self.get_ai_responses(target)
            result.sentiment_summary = self.analyze_response_sentiment(result.ai_responses)

        # Cited domains
        if include_cited_domains:
            result.cited_domains = await self.get_cited_domains(target)
            if result.cited_domains:
                result.citation_ranking = self.rank_citations(result.cited_domains)

        # Cited pages
        if include_cited_pages:
            result.cited_pages = await self.get_cited_pages(target)

        # Competitor comparison
        if competitors:
            result.competitor_citations = await self.compare_competitor_citations(
                target, competitors
            )

        # Recommendations
        result.recommendations = self.generate_recommendations(result)

        self.print_stats()
        return result
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def build_parser() -> argparse.ArgumentParser:
    """Build argument parser for CLI usage."""
    parser = argparse.ArgumentParser(
        description="AI Citation Analyzer - Analyze AI answer citations and source pages",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --target example.com --json
  %(prog)s --target example.com --cited-domains --json
  %(prog)s --target example.com --cited-pages --json
  %(prog)s --target example.com --responses --competitor comp1.com --json
  %(prog)s --target example.com --output citations.json
""",
    )
    parser.add_argument(
        "--target", required=True,
        help="Target domain to analyze (e.g., example.com)",
    )
    parser.add_argument(
        "--competitor", action="append", default=[],
        help="Competitor domain (repeatable), e.g., --competitor a.com --competitor b.com",
    )
    parser.add_argument(
        "--cited-domains", action="store_true",
        help="Include cited domains analysis",
    )
    parser.add_argument(
        "--cited-pages", action="store_true",
        help="Include cited pages analysis",
    )
    parser.add_argument(
        "--responses", action="store_true",
        help="Include AI response content analysis",
    )
    parser.add_argument(
        "--json", action="store_true",
        help="Output result as JSON to stdout",
    )
    parser.add_argument(
        "--output", type=str, default=None,
        help="Save JSON output to file path",
    )
    return parser
def print_summary(result: CitationAnalysisResult) -> None:
    """Print a human-readable summary of the citation analysis."""
    print("\n" + "=" * 60)
    print(f" AI Citation Analysis: {result.target}")
    print("=" * 60)

    # AI Responses
    if result.ai_responses:
        print(f"\n AI Responses: {len(result.ai_responses)}")
        for resp in result.ai_responses[:5]:
            engine_tag = f" [{resp.source_engine}]" if resp.source_engine else ""
            sentiment_tag = f" ({resp.sentiment})"
            print(f"   - Q: {resp.query[:60]}{engine_tag}{sentiment_tag}")
        if len(result.ai_responses) > 5:
            print(f"   ... and {len(result.ai_responses) - 5} more")

    # Sentiment Summary
    if result.sentiment_summary:
        s = result.sentiment_summary
        print(f"\n Sentiment: {s.get('overall_sentiment', 'unknown')}")
        print(f"   Positive: {s.get('positive', 0)} ({s.get('positive_pct', 0):.1f}%)")
        print(f"   Neutral:  {s.get('neutral', 0)} ({s.get('neutral_pct', 0):.1f}%)")
        print(f"   Negative: {s.get('negative', 0)} ({s.get('negative_pct', 0):.1f}%)")

    # Cited Domains
    if result.cited_domains:
        print(f"\n Cited Domains: {len(result.cited_domains)}")
        for domain in result.cited_domains[:10]:
            topics_str = ", ".join(domain.topics[:3]) if domain.topics else ""
            print(f"   {domain.domain}: {domain.citation_count} citations"
                  f"{f' [{topics_str}]' if topics_str else ''}")
        if len(result.cited_domains) > 10:
            print(f"   ... and {len(result.cited_domains) - 10} more")

    # Cited Pages
    if result.cited_pages:
        print(f"\n Cited Pages: {len(result.cited_pages)}")
        for page in result.cited_pages[:10]:
            title = page.title[:50] if page.title else page.url[:50]
            print(f"   {title}: {page.citation_count} citations")
        if len(result.cited_pages) > 10:
            print(f"   ... and {len(result.cited_pages) - 10} more")

    # Competitor Comparison
    if result.competitor_citations:
        print("\n Competitor Citation Comparison:")
        for comp in result.competitor_citations:
            marker = " <-- target" if comp.get("is_target") else ""
            print(f"   {comp['domain']}: "
                  f"domains={comp['unique_cited_domains']}, "
                  f"pages={comp['unique_cited_pages']}, "
                  f"page_citations={comp['total_page_citations']}{marker}")

    # Recommendations
    if result.recommendations:
        print("\n Recommendations:")
        for i, rec in enumerate(result.recommendations, 1):
            print(f"   {i}. {rec}")

    print("\n" + "=" * 60)
    print(f" Generated: {result.timestamp}")
    print("=" * 60 + "\n")
async def main() -> None:
    """CLI entry point."""
    parser = build_parser()
    args = parser.parse_args()

    # Determine which sections to include.
    # If no specific flags are given, include everything.
    any_specific = args.cited_domains or args.cited_pages or args.responses
    include_responses = args.responses or not any_specific
    include_cited_domains = args.cited_domains or not any_specific
    include_cited_pages = args.cited_pages or not any_specific

    analyzer = AiCitationAnalyzer(
        max_concurrent=5,
        requests_per_second=2.0,
    )

    result = await analyzer.analyze(
        target=args.target,
        competitors=args.competitor if args.competitor else None,
        include_responses=include_responses,
        include_cited_domains=include_cited_domains,
        include_cited_pages=include_cited_pages,
    )

    # Output
    if args.json or args.output:
        output_data = result.to_dict()
        json_str = json.dumps(output_data, ensure_ascii=False, indent=2)

        if args.json:
            print(json_str)

        if args.output:
            output_path = Path(args.output)
            output_path.parent.mkdir(parents=True, exist_ok=True)
            output_path.write_text(json_str, encoding="utf-8")
            logger.info(f"Report saved to {args.output}")
    else:
        print_summary(result)


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,594 @@

|
"""
|
||||||
|
AI Visibility Tracker - Brand Radar Monitoring
|
||||||
|
================================================
|
||||||
|
Purpose: Track brand visibility in AI-generated search answers
|
||||||
|
using Ahrefs Brand Radar APIs.
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python ai_visibility_tracker.py --target example.com --json
|
||||||
|
python ai_visibility_tracker.py --target example.com --competitor comp1.com --json
|
||||||
|
python ai_visibility_tracker.py --target example.com --history --json
|
||||||
|
python ai_visibility_tracker.py --target example.com --sov --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
# Add parent to path for base_client import
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent))
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

@dataclass
class ImpressionMetrics:
    """AI search impression metrics for a brand."""
    total: int = 0
    trend: str = "stable"  # increasing, decreasing, stable
    change_pct: float = 0.0
    period: str = ""
    breakdown: dict = field(default_factory=dict)


@dataclass
class MentionMetrics:
    """AI search mention metrics for a brand."""
    total: int = 0
    trend: str = "stable"
    change_pct: float = 0.0
    period: str = ""
    breakdown: dict = field(default_factory=dict)


@dataclass
class SovMetric:
    """Share of Voice metric for a single domain."""
    domain: str = ""
    sov_pct: float = 0.0
    change_pct: float = 0.0


@dataclass
class HistoryPoint:
    """Single data point in a time series."""
    date: str = ""
    value: float = 0.0


@dataclass
class CompetitorVisibility:
    """Aggregated AI visibility metrics for a competitor domain."""
    domain: str = ""
    impressions: int = 0
    mentions: int = 0
    sov: float = 0.0


@dataclass
class AiVisibilityResult:
    """Complete AI visibility tracking result."""
    target: str = ""
    impressions: ImpressionMetrics = field(default_factory=ImpressionMetrics)
    mentions: MentionMetrics = field(default_factory=MentionMetrics)
    share_of_voice: dict = field(default_factory=dict)
    impressions_history: list[HistoryPoint] = field(default_factory=list)
    mentions_history: list[HistoryPoint] = field(default_factory=list)
    sov_history: list[HistoryPoint] = field(default_factory=list)
    competitors: list[CompetitorVisibility] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict:
        """Convert result to dictionary."""
        return {
            "target": self.target,
            "impressions": asdict(self.impressions),
            "mentions": asdict(self.mentions),
            "share_of_voice": self.share_of_voice,
            "impressions_history": [asdict(h) for h in self.impressions_history],
            "mentions_history": [asdict(h) for h in self.mentions_history],
            "sov_history": [asdict(h) for h in self.sov_history],
            "competitors": [asdict(c) for c in self.competitors],
            "recommendations": self.recommendations,
            "timestamp": self.timestamp,
        }

# ---------------------------------------------------------------------------
# MCP tool caller helper
# ---------------------------------------------------------------------------

def call_mcp_tool(tool_name: str, params: dict) -> dict:
    """
    Call an Ahrefs MCP tool and return the parsed JSON response.

    In Claude Desktop / Claude Code environments the MCP tools are invoked
    directly by the AI agent. This helper exists so that the script can also
    be executed standalone via subprocess for testing purposes.
    """
    logger.info(f"Calling MCP tool: {tool_name} with params: {params}")
    try:
        cmd = ["claude", "mcp", "call", "ahrefs", tool_name, json.dumps(params)]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
        if result.returncode == 0 and result.stdout.strip():
            return json.loads(result.stdout.strip())
        logger.warning(f"MCP tool {tool_name} returned non-zero or empty: {result.stderr}")
        return {}
    except (subprocess.TimeoutExpired, json.JSONDecodeError, FileNotFoundError) as exc:
        logger.warning(f"MCP call failed ({exc}). Returning empty dict.")
        return {}

# ---------------------------------------------------------------------------
# AI Visibility Tracker
# ---------------------------------------------------------------------------

class AiVisibilityTracker(BaseAsyncClient):
    """Track brand visibility across AI-generated search results."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.logger = logging.getLogger(self.__class__.__name__)

    # ---- Impressions ----

    async def get_impressions_overview(self, target: str) -> ImpressionMetrics:
        """Fetch current AI impression metrics via brand-radar-impressions-overview."""
        self.logger.info(f"Fetching impressions overview for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-impressions-overview",
            {"target": target},
        )
        metrics = ImpressionMetrics()
        if not data:
            return metrics

        metrics.total = data.get("total_impressions", data.get("impressions", 0))
        metrics.change_pct = data.get("change_pct", data.get("change", 0.0))
        metrics.period = data.get("period", "")
        metrics.breakdown = data.get("breakdown", {})

        if metrics.change_pct > 5:
            metrics.trend = "increasing"
        elif metrics.change_pct < -5:
            metrics.trend = "decreasing"
        else:
            metrics.trend = "stable"

        return metrics

    # ---- Mentions ----

    async def get_mentions_overview(self, target: str) -> MentionMetrics:
        """Fetch current AI mention metrics via brand-radar-mentions-overview."""
        self.logger.info(f"Fetching mentions overview for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-mentions-overview",
            {"target": target},
        )
        metrics = MentionMetrics()
        if not data:
            return metrics

        metrics.total = data.get("total_mentions", data.get("mentions", 0))
        metrics.change_pct = data.get("change_pct", data.get("change", 0.0))
        metrics.period = data.get("period", "")
        metrics.breakdown = data.get("breakdown", {})

        if metrics.change_pct > 5:
            metrics.trend = "increasing"
        elif metrics.change_pct < -5:
            metrics.trend = "decreasing"
        else:
            metrics.trend = "stable"

        return metrics

    # ---- Share of Voice ----

    async def get_sov_overview(self, target: str) -> dict:
        """Fetch Share of Voice overview via brand-radar-sov-overview."""
        self.logger.info(f"Fetching SOV overview for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-sov-overview",
            {"target": target},
        )
        if not data:
            return {"brand_sov": 0.0, "competitors": []}

        brand_sov = data.get("sov", data.get("share_of_voice", 0.0))
        competitors_raw = data.get("competitors", [])
        competitors = []
        for comp in competitors_raw:
            competitors.append(SovMetric(
                domain=comp.get("domain", ""),
                sov_pct=comp.get("sov", comp.get("share_of_voice", 0.0)),
                change_pct=comp.get("change_pct", 0.0),
            ))

        return {
            "brand_sov": brand_sov,
            "competitors": [asdict(c) for c in competitors],
        }

    # ---- History ----

    async def get_impressions_history(self, target: str) -> list[HistoryPoint]:
        """Fetch impressions history via brand-radar-impressions-history."""
        self.logger.info(f"Fetching impressions history for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-impressions-history",
            {"target": target},
        )
        return self._parse_history(data)

    async def get_mentions_history(self, target: str) -> list[HistoryPoint]:
        """Fetch mentions history via brand-radar-mentions-history."""
        self.logger.info(f"Fetching mentions history for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-mentions-history",
            {"target": target},
        )
        return self._parse_history(data)

    async def get_sov_history(self, target: str) -> list[HistoryPoint]:
        """Fetch SOV history via brand-radar-sov-history."""
        self.logger.info(f"Fetching SOV history for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-sov-history",
            {"target": target},
        )
        return self._parse_history(data)

    def _parse_history(self, data: dict | list) -> list[HistoryPoint]:
        """Parse history data from MCP response into HistoryPoint list."""
        points: list[HistoryPoint] = []
        if not data:
            return points

        items = data if isinstance(data, list) else data.get("history", data.get("data", []))
        for item in items:
            if isinstance(item, dict):
                points.append(HistoryPoint(
                    date=item.get("date", item.get("period", "")),
                    value=item.get("value", item.get("impressions", item.get("mentions", item.get("sov", 0.0)))),
                ))
        return points

    # ---- Competitor Comparison ----

    async def compare_competitors(
        self, target: str, competitors: list[str]
    ) -> list[CompetitorVisibility]:
        """Aggregate AI visibility metrics for target and competitors."""
        self.logger.info(f"Comparing competitors: {competitors}")
        results: list[CompetitorVisibility] = []

        all_domains = [target] + competitors
        for domain in all_domains:
            imp = await self.get_impressions_overview(domain)
            men = await self.get_mentions_overview(domain)
            sov_data = await self.get_sov_overview(domain)

            results.append(CompetitorVisibility(
                domain=domain,
                impressions=imp.total,
                mentions=men.total,
                sov=sov_data.get("brand_sov", 0.0),
            ))

        # Sort by SOV descending
        results.sort(key=lambda x: x.sov, reverse=True)
        return results

    # ---- Trend Calculation ----

    @staticmethod
    def calculate_trends(history: list[HistoryPoint]) -> dict:
        """Determine trend direction and statistics from history data."""
        if not history or len(history) < 2:
            return {
                "direction": "insufficient_data",
                "avg_value": 0.0,
                "min_value": 0.0,
                "max_value": 0.0,
                "change_pct": 0.0,
                "data_points": len(history) if history else 0,
            }

        values = [h.value for h in history]
        first_value = values[0]
        last_value = values[-1]
        avg_value = sum(values) / len(values)
        min_value = min(values)
        max_value = max(values)

        if first_value > 0:
            change_pct = ((last_value - first_value) / first_value) * 100
        else:
            change_pct = 0.0

        if change_pct > 10:
            direction = "strongly_increasing"
        elif change_pct > 3:
            direction = "increasing"
        elif change_pct < -10:
            direction = "strongly_decreasing"
        elif change_pct < -3:
            direction = "decreasing"
        else:
            direction = "stable"

        return {
            "direction": direction,
            "avg_value": round(avg_value, 2),
            "min_value": round(min_value, 2),
            "max_value": round(max_value, 2),
            "change_pct": round(change_pct, 2),
            "data_points": len(values),
        }

    # ---- Recommendations ----

    @staticmethod
    def generate_recommendations(result: AiVisibilityResult) -> list[str]:
        """Generate actionable recommendations (in Korean, per skill spec) for improving AI visibility."""
        recs: list[str] = []

        # Impression-based recommendations
        if result.impressions.total == 0:
            recs.append(
                "AI 검색에서 브랜드 노출이 감지되지 않았습니다. "
                "E-E-A-T 시그널(경험, 전문성, 권위성, 신뢰성)을 강화하여 "
                "AI 엔진이 콘텐츠를 참조할 수 있도록 하세요."
            )
        elif result.impressions.trend == "decreasing":
            recs.append(
                "AI 검색 노출이 감소 추세입니다. 최신 콘텐츠 업데이트 및 "
                "구조화된 데이터(Schema Markup) 추가를 검토하세요."
            )
        elif result.impressions.trend == "increasing":
            recs.append(
                "AI 검색 노출이 증가 추세입니다. 현재 콘텐츠 전략을 "
                "유지하면서 추가 키워드 확장을 고려하세요."
            )

        # Mention-based recommendations
        if result.mentions.total == 0:
            recs.append(
                "AI 응답에서 브랜드 언급이 없습니다. "
                "브랜드명이 포함된 고품질 콘텐츠를 제작하고, "
                "외부 사이트에서의 브랜드 언급(Citations)을 늘리세요."
            )
        elif result.mentions.trend == "decreasing":
            recs.append(
                "AI 응답 내 브랜드 언급이 줄어들고 있습니다. "
                "콘텐츠 신선도(Freshness)와 업계 권위 신호를 점검하세요."
            )

        # SOV recommendations
        sov_value = result.share_of_voice.get("brand_sov", 0.0)
        if sov_value < 10:
            recs.append(
                f"AI 검색 Share of Voice가 {sov_value}%로 낮습니다. "
                "핵심 키워드에 대한 종합 가이드, FAQ 콘텐츠, "
                "원본 데이터/연구 자료를 발행하여 인용 가능성을 높이세요."
            )
        elif sov_value < 25:
            recs.append(
                f"AI 검색 Share of Voice가 {sov_value}%입니다. "
                "경쟁사 대비 차별화된 전문 콘텐츠와 "
                "독점 데이터 기반 인사이트를 강화하세요."
            )

        # Competitor-based recommendations
        if result.competitors:
            top_competitor = result.competitors[0]
            if top_competitor.domain != result.target and top_competitor.sov > sov_value:
                recs.append(
                    f"최대 경쟁사 {top_competitor.domain}의 SOV가 "
                    f"{top_competitor.sov}%로 앞서고 있습니다. "
                    "해당 경쟁사의 콘텐츠 전략과 인용 패턴을 분석하세요."
                )

        # General best practices
        recs.append(
            "AI 검색 최적화를 위해 다음 사항을 지속적으로 점검하세요: "
            "(1) 구조화된 데이터(JSON-LD) 적용, "
            "(2) FAQ 및 How-to 콘텐츠 발행, "
            "(3) 신뢰할 수 있는 외부 사이트에서의 백링크 확보, "
            "(4) 콘텐츠 정기 업데이트 및 정확성 검증."
        )

        return recs

    # ---- Main Orchestrator ----

    async def track(
        self,
        target: str,
        competitors: list[str] | None = None,
        include_history: bool = False,
        include_sov: bool = False,
    ) -> AiVisibilityResult:
        """
        Orchestrate full AI visibility tracking.

        Args:
            target: Domain to track
            competitors: Optional list of competitor domains
            include_history: Whether to fetch historical trends
            include_sov: Whether to include SOV analysis
        """
        self.logger.info(f"Starting AI visibility tracking for {target}")
        result = AiVisibilityResult(target=target)

        # Core metrics (always fetched)
        result.impressions = await self.get_impressions_overview(target)
        result.mentions = await self.get_mentions_overview(target)

        # Share of Voice
        if include_sov or competitors:
            result.share_of_voice = await self.get_sov_overview(target)

        # History
        if include_history:
            result.impressions_history = await self.get_impressions_history(target)
            result.mentions_history = await self.get_mentions_history(target)
            if include_sov:
                result.sov_history = await self.get_sov_history(target)

        # Competitor comparison
        if competitors:
            result.competitors = await self.compare_competitors(target, competitors)

        # Generate recommendations
        result.recommendations = self.generate_recommendations(result)

        self.print_stats()
        return result

# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def build_parser() -> argparse.ArgumentParser:
    """Build argument parser for CLI usage."""
    parser = argparse.ArgumentParser(
        description="AI Visibility Tracker - Monitor brand visibility in AI search results",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --target example.com --json
  %(prog)s --target example.com --competitor comp1.com --competitor comp2.com --json
  %(prog)s --target example.com --history --sov --json
  %(prog)s --target example.com --output report.json
""",
    )
    parser.add_argument(
        "--target", required=True,
        help="Target domain to track (e.g., example.com)",
    )
    parser.add_argument(
        "--competitor", action="append", default=[],
        help="Competitor domain (repeatable). e.g., --competitor a.com --competitor b.com",
    )
    parser.add_argument(
        "--history", action="store_true",
        help="Include historical trend data (impressions, mentions, SOV over time)",
    )
    parser.add_argument(
        "--sov", action="store_true",
        help="Include Share of Voice analysis",
    )
    parser.add_argument(
        "--json", action="store_true",
        help="Output result as JSON to stdout",
    )
    parser.add_argument(
        "--output", type=str, default=None,
        help="Save JSON output to file path",
    )
    return parser

def print_summary(result: AiVisibilityResult) -> None:
    """Print a human-readable summary of AI visibility results."""
    print("\n" + "=" * 60)
    print(f"  AI Visibility Report: {result.target}")
    print("=" * 60)

    print(f"\n  Impressions: {result.impressions.total:,}")
    print(f"    Trend: {result.impressions.trend} ({result.impressions.change_pct:+.1f}%)")

    print(f"\n  Mentions: {result.mentions.total:,}")
    print(f"    Trend: {result.mentions.trend} ({result.mentions.change_pct:+.1f}%)")

    if result.share_of_voice:
        sov = result.share_of_voice.get("brand_sov", 0.0)
        print(f"\n  Share of Voice: {sov:.1f}%")
        comp_list = result.share_of_voice.get("competitors", [])
        if comp_list:
            print("  Competitors:")
            for c in comp_list:
                print(f"    {c.get('domain', '?')}: {c.get('sov_pct', 0):.1f}%")

    if result.impressions_history:
        trend_info = AiVisibilityTracker.calculate_trends(result.impressions_history)
        print(f"\n  Impressions Trend: {trend_info['direction']}")
        print(f"    Range: {trend_info['min_value']:,.0f} - {trend_info['max_value']:,.0f}")
        print(f"    Change: {trend_info['change_pct']:+.1f}%")

    if result.competitors:
        print("\n  Competitor Comparison:")
        for cv in result.competitors:
            marker = "  <-- target" if cv.domain == result.target else ""
            print(f"    {cv.domain}: SOV={cv.sov:.1f}%, Imp={cv.impressions:,}, Men={cv.mentions:,}{marker}")

    if result.recommendations:
        print("\n  Recommendations:")
        for i, rec in enumerate(result.recommendations, 1):
            print(f"    {i}. {rec}")

    print("\n" + "=" * 60)
    print(f"  Generated: {result.timestamp}")
    print("=" * 60 + "\n")

async def main() -> None:
    """CLI entry point."""
    parser = build_parser()
    args = parser.parse_args()

    tracker = AiVisibilityTracker(
        max_concurrent=5,
        requests_per_second=2.0,
    )

    result = await tracker.track(
        target=args.target,
        competitors=args.competitor if args.competitor else None,
        include_history=args.history,
        include_sov=args.sov,
    )

    # Output
    if args.json or args.output:
        output_data = result.to_dict()
        json_str = json.dumps(output_data, ensure_ascii=False, indent=2)

        if args.json:
            print(json_str)

        if args.output:
            output_path = Path(args.output)
            output_path.parent.mkdir(parents=True, exist_ok=True)
            output_path.write_text(json_str, encoding="utf-8")
            logger.info(f"Report saved to {args.output}")
    else:
        print_summary(result)


if __name__ == "__main__":
    asyncio.run(main())
207 custom-skills/27-seo-ai-visibility/code/scripts/base_client.py Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")

class RateLimiter:
    """Rate limiter using a token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1

class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)

class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,8 @@
# 27-seo-ai-visibility dependencies
requests>=2.31.0
aiohttp>=3.9.0
pandas>=2.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
66 custom-skills/27-seo-ai-visibility/desktop/SKILL.md Normal file
@@ -0,0 +1,66 @@
---
name: seo-ai-visibility
description: |
  AI search visibility and brand radar monitoring. Tracks how a brand appears
  in AI-generated search answers using Ahrefs Brand Radar APIs.
  Triggers: AI search, AI visibility, brand radar, AI citations,
  share of voice, AI answers, AI mentions.
---

# SEO AI Visibility & Brand Radar

Monitor and analyze brand visibility in AI-generated search results. This skill uses Ahrefs Brand Radar APIs to track impressions, mentions, share of voice, cited domains, cited pages, and AI response content.

## Capabilities

### AI Visibility Tracking
- **Impressions Overview** - How often the brand appears in AI answers
- **Mentions Overview** - Brand mention frequency across AI engines
- **Share of Voice (SOV)** - Brand's share vs competitors in AI search
- **Historical Trends** - Impressions, mentions, and SOV over time
- **Competitor Comparison** - Side-by-side AI visibility metrics
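Share of Voice here is a brand's slice of all tracked AI mentions across the monitored domains. Ahrefs computes this server-side, so the sketch below is illustrative only (the input mapping of domain to mention count is an assumption, not a Brand Radar payload):

```python
def share_of_voice(mentions_by_domain: dict[str, int]) -> dict[str, float]:
    """Return each domain's share of total AI mentions, as a percentage."""
    total = sum(mentions_by_domain.values())
    if total == 0:
        # No tracked mentions at all: every domain gets 0%
        return {domain: 0.0 for domain in mentions_by_domain}
    return {
        domain: round(count / total * 100, 1)
        for domain, count in mentions_by_domain.items()
    }

sov = share_of_voice({"example.com": 120, "comp1.com": 60, "comp2.com": 20})
# example.com holds 60.0% of the tracked mentions
```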
### AI Citation Analysis
- **AI Response Analysis** - Content and sentiment of AI mentions
- **Cited Domains** - Which source domains AI engines reference
- **Cited Pages** - Specific URLs that get cited in AI answers
- **Citation Ranking** - Frequency-based ranking of citations
- **Sentiment Analysis** - Positive/neutral/negative distribution

## Workflow

1. **Input**: User provides target domain and optional competitors
2. **Data Collection**: Fetch metrics from Ahrefs Brand Radar APIs
3. **Analysis**: Calculate trends, compare competitors, analyze sentiment
4. **Recommendations**: Generate actionable Korean-language recommendations
5. **Output**: JSON report and Notion database entry
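The analysis step classifies each historical series with simple percentage-change buckets; the thresholds below mirror the ones used by `calculate_trends` in `ai_visibility_tracker.py`:

```python
def classify_trend(values: list[float]) -> str:
    """Bucket a metric series by first-to-last percentage change."""
    if len(values) < 2:
        return "insufficient_data"
    if values[0] <= 0:
        # No meaningful baseline to compute a percentage change against
        return "stable"
    change_pct = (values[-1] - values[0]) / values[0] * 100
    if change_pct > 10:
        return "strongly_increasing"
    if change_pct > 3:
        return "increasing"
    if change_pct < -10:
        return "strongly_decreasing"
    if change_pct < -3:
        return "decreasing"
    return "stable"

print(classify_trend([100, 108, 115]))  # strongly_increasing (+15%)
```

Only the first and last points decide the direction; the averages, minima, and maxima the real method also reports are descriptive extras.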
## Ahrefs MCP Tools

| Tool | Purpose |
|------|---------|
| `brand-radar-ai-responses` | AI-generated responses mentioning the brand |
| `brand-radar-cited-domains` | Domains cited in AI answers |
| `brand-radar-cited-pages` | Specific pages cited in AI answers |
| `brand-radar-impressions-history` | Impression trend over time |
| `brand-radar-impressions-overview` | Current impression metrics |
| `brand-radar-mentions-history` | Mention trend over time |
| `brand-radar-mentions-overview` | Current mention metrics |
| `brand-radar-sov-history` | Share of voice trend |
| `brand-radar-sov-overview` | Current share of voice |

## Notion Output

All reports are saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Category**: AI Search Visibility
- **Audit ID Format**: AI-YYYYMMDD-NNN
- **Language**: Korean (technical terms in English)
## Example Queries
|
||||||
|
|
||||||
|
- "example.com의 AI 검색 가시성을 분석해줘"
|
||||||
|
- "AI search visibility for example.com with competitors"
|
||||||
|
- "브랜드 레이더 분석: example.com vs competitor.com"
|
||||||
|
- "AI 인용 분석 - 어떤 페이지가 AI 답변에서 인용되나요?"
|
||||||
|
- "Share of Voice in AI search for our domain"
|
||||||
8
custom-skills/27-seo-ai-visibility/desktop/skill.yaml
Normal file
@@ -0,0 +1,8 @@
name: seo-ai-visibility
description: |
  AI search visibility and brand radar monitoring. Triggers: AI search, AI visibility, brand radar, AI citations, share of voice, AI answers, AI mentions.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
55
custom-skills/27-seo-ai-visibility/desktop/tools/ahrefs.md
Normal file
@@ -0,0 +1,55 @@
# Ahrefs Brand Radar MCP Tools

## brand-radar-impressions-overview
Get current AI search impression metrics for a target domain. Returns total impressions, change percentage, and breakdown by AI engine.

**Parameters:**
- `target` (required): Domain to analyze (e.g., "example.com")

## brand-radar-impressions-history
Get historical AI search impression data over time. Returns time-series data points with date and impression values.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-mentions-overview
Get current AI mention metrics for a target domain. Returns total mentions, change percentage, and breakdown.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-mentions-history
Get historical AI mention data over time. Returns time-series data points with date and mention values.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-sov-overview
Get Share of Voice overview in AI search for a target domain. Returns brand SOV percentage and competitor SOV data.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-sov-history
Get historical Share of Voice data over time. Returns time-series SOV data points.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-ai-responses
Get AI-generated responses that mention the brand. Returns query, response text, sentiment, and source engine for each response.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-cited-domains
Get domains cited in AI answers related to the brand. Returns domain name, citation count, topics, and share percentage.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-cited-pages
Get specific pages cited in AI answers. Returns URL, title, citation count, context snippet, and topics.

**Parameters:**
- `target` (required): Domain to analyze
44
custom-skills/27-seo-ai-visibility/desktop/tools/notion.md
Normal file
@@ -0,0 +1,44 @@
# Notion MCP Tools

## Database: OurDigital SEO Audit Log

- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **URL**: https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef

## Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title in Korean + date |
| Site | URL | Tracked website URL |
| Category | Select | "AI Search Visibility" |
| Priority | Select | Based on SOV trend (Critical, High, Medium, Low) |
| Found Date | Date | Report date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: AI-YYYYMMDD-NNN |

## Usage

Use `notion-create-pages` to save audit results:

```json
{
  "parent": {"database_id": "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"},
  "properties": {
    "Issue": {"title": [{"text": {"content": "AI 검색 가시성 분석 - example.com (2025-01-15)"}}]},
    "Site": {"url": "https://example.com"},
    "Category": {"select": {"name": "AI Search Visibility"}},
    "Priority": {"select": {"name": "Medium"}},
    "Found Date": {"date": {"start": "2025-01-15"}},
    "Audit ID": {"rich_text": [{"text": {"content": "AI-20250115-001"}}]}
  }
}
```

## Priority Guidelines

| Condition | Priority |
|-----------|----------|
| SOV decreasing >10% | Critical |
| SOV decreasing 3-10% | High |
| SOV stable, low (<10%) | Medium |
| SOV increasing or high (>25%) | Low |
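
The table above maps directly to a small helper. This sketch assumes `change` is the SOV change in percentage points; conditions the table does not cover (a small decline with mid-range SOV) default to Medium:

```python
def sov_priority(sov: float, change: float) -> str:
    """Map SOV level and trend to a Notion Priority, per the guidelines above."""
    if change < -10:          # SOV decreasing >10%
        return "Critical"
    if change < -3:           # SOV decreasing 3-10%
        return "High"
    if change > 0 or sov > 25:  # SOV increasing or high (>25%)
        return "Low"
    return "Medium"           # stable and low, or anything uncovered
```

For example, `sov_priority(8.0, -12.0)` yields "Critical", while `sov_priority(30.0, 0.0)` yields "Low".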
@@ -0,0 +1,17 @@
# WebSearch & WebFetch Tools

## WebSearch

Use web search to supplement AI visibility analysis with additional context:
- Research competitor AI optimization strategies
- Find industry benchmarks for AI search visibility
- Look up the latest AI search engine algorithm updates
- Discover best practices for AI citation optimization

## WebFetch

Use web fetch to retrieve specific pages for deeper analysis:
- Fetch competitor pages that are frequently cited in AI answers
- Retrieve structured data (Schema Markup) from cited pages
- Analyze the content structure of top-cited URLs
- Check E-E-A-T signals on referenced pages
139
custom-skills/28-seo-knowledge-graph/code/CLAUDE.md
Normal file
@@ -0,0 +1,139 @@
# CLAUDE.md

## Overview

Knowledge Graph and Entity SEO tool for analyzing brand entity presence in Google Knowledge Graph, Knowledge Panels, People Also Ask (PAA), and FAQ rich results. Checks entity attribute completeness, Wikipedia/Wikidata presence, and Korean equivalents (Naver Knowledge iN, Naver Encyclopedia). Uses WebSearch and WebFetch for data collection and Ahrefs `serp-overview` for SERP feature detection.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Knowledge Graph analysis
python scripts/knowledge_graph_analyzer.py --entity "Samsung Electronics" --json

# Entity SEO audit
python scripts/entity_auditor.py --url https://example.com --entity "Brand Name" --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `knowledge_graph_analyzer.py` | Analyze Knowledge Panel and entity presence | KP detection, entity attributes, Wikipedia/Wikidata status |
| `entity_auditor.py` | Audit entity SEO signals and PAA/FAQ presence | PAA monitoring, FAQ schema tracking, entity completeness |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Knowledge Graph Analyzer

```bash
# Analyze entity in Knowledge Graph
python scripts/knowledge_graph_analyzer.py --entity "Samsung Electronics" --json

# Check with Korean name
python scripts/knowledge_graph_analyzer.py --entity "삼성전자" --language ko --json

# Include Wikipedia/Wikidata check
python scripts/knowledge_graph_analyzer.py --entity "Samsung" --wiki --json
```

**Capabilities**:
- Knowledge Panel detection via Google search
- Entity attribute extraction (name, description, logo, type, social profiles, website)
- Entity attribute completeness scoring
- Wikipedia article presence check
- Wikidata entity presence check (QID lookup)
- Naver Encyclopedia (네이버 백과사전) presence
- Naver Knowledge iN (지식iN) presence
- Knowledge Panel comparison with competitors
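
Completeness scoring can be sketched as the share of expected attributes that are present and non-empty. The attribute list below is an illustrative assumption, not the analyzer's actual rubric:

```python
# Hypothetical set of Knowledge Panel attributes the analyzer looks for
EXPECTED_KP_ATTRIBUTES = ["name", "type", "description", "logo", "website", "social_profiles"]


def completeness_score(attributes: dict) -> int:
    """Percentage of expected Knowledge Panel attributes that are present and truthy."""
    present = sum(1 for key in EXPECTED_KP_ATTRIBUTES if attributes.get(key))
    return round(present / len(EXPECTED_KP_ATTRIBUTES) * 100)
```

An entity with five of the six attributes filled in scores 83, matching the shape of `completeness_score` in the Output Format below.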

## Entity Auditor

```bash
# Full entity SEO audit
python scripts/entity_auditor.py --url https://example.com --entity "Brand Name" --json

# PAA monitoring for brand keywords
python scripts/entity_auditor.py --url https://example.com --entity "Brand Name" --paa --json

# FAQ rich result tracking
python scripts/entity_auditor.py --url https://example.com --entity "Brand Name" --faq --json
```

**Capabilities**:
- People Also Ask (PAA) monitoring for brand-related queries
- FAQ schema presence tracking (FAQPage schema -> SERP appearance)
- Entity markup audit (Organization, Person, LocalBusiness schema on the website)
- Social profile linking validation (sameAs in schema)
- Brand SERP analysis (what appears when you search the brand name)
- Entity consistency across web properties
- Korean entity optimization (Korean Knowledge Panel, Naver profiles)
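
The sameAs validation step can be sketched by checking which expected platforms appear among the schema's `sameAs` URLs. The platform list is an assumption for illustration:

```python
from urllib.parse import urlparse

# Hypothetical platforms the audit expects to find linked via sameAs
EXPECTED_PLATFORMS = ["facebook.com", "instagram.com", "linkedin.com", "x.com", "youtube.com"]


def check_same_as(same_as_links: list[str]) -> dict[str, bool]:
    """Report which expected social platforms are linked via sameAs URLs."""
    hosts = {urlparse(u).netloc.removeprefix("www.") for u in same_as_links}
    return {platform: platform in hosts for platform in EXPECTED_PLATFORMS}
```

Missing platforms in the returned map feed directly into the audit's recommendations.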

## Data Sources

| Source | Purpose |
|--------|---------|
| WebSearch | Search for entity/brand to detect Knowledge Panel |
| WebFetch | Fetch Wikipedia, Wikidata, Naver pages |
| Ahrefs `serp-overview` | SERP feature detection for entity keywords |

## Output Format

```json
{
  "entity": "Samsung Electronics",
  "knowledge_panel": {
    "detected": true,
    "attributes": {
      "name": "Samsung Electronics",
      "type": "Corporation",
      "description": "...",
      "logo": true,
      "website": true,
      "social_profiles": ["twitter", "facebook", "linkedin"]
    },
    "completeness_score": 85
  },
  "wikipedia": {"present": true, "url": "..."},
  "wikidata": {"present": true, "qid": "Q20710"},
  "naver_encyclopedia": {"present": true, "url": "..."},
  "naver_knowledge_in": {"present": true, "entries": 15},
  "paa_questions": [...],
  "faq_rich_results": [...],
  "entity_schema_on_site": {
    "organization": true,
    "same_as_links": 5,
    "completeness": 78
  },
  "score": 75,
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Entity website URL |
| Category | Select | Knowledge Graph & Entity SEO |
| Priority | Select | Based on entity completeness |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: KG-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Knowledge Panel, Knowledge Graph, PAA)
- URLs and code remain unchanged
207
custom-skills/28-seo-knowledge-graph/code/scripts/base_client.py
Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using the token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get a required environment variable or raise an error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,902 @@
"""
Entity Auditor
===============
Purpose: Audit entity SEO signals including PAA monitoring, FAQ schema tracking,
entity markup validation, and brand SERP analysis.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import re
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import quote, urljoin, urlparse

import aiohttp
from bs4 import BeautifulSoup
from rich.console import Console
from rich.table import Table

from base_client import BaseAsyncClient, ConfigManager, config

logger = logging.getLogger(__name__)
console = Console()

# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------


@dataclass
class PaaQuestion:
    """A People Also Ask question found in the SERP."""
    question: str = ""
    keyword: str = ""
    position: int = 0
    source_url: str | None = None


@dataclass
class FaqRichResult:
    """FAQ rich result tracking entry."""
    url: str = ""
    question_count: int = 0
    appearing_in_serp: bool = False
    questions: list[str] = field(default_factory=list)
    schema_valid: bool = False


@dataclass
class EntitySchema:
    """Entity structured data found on a website."""
    type: str = ""  # Organization, Person, LocalBusiness, etc.
    properties: dict[str, Any] = field(default_factory=dict)
    same_as_links: list[str] = field(default_factory=list)
    completeness: float = 0.0
    issues: list[str] = field(default_factory=list)


@dataclass
class BrandSerpResult:
    """What appears when searching for the brand name."""
    query: str = ""
    features: list[str] = field(default_factory=list)
    paa_count: int = 0
    faq_count: int = 0
    knowledge_panel: bool = False
    sitelinks: bool = False
    social_profiles: list[str] = field(default_factory=list)
    top_results: list[dict[str, str]] = field(default_factory=list)


@dataclass
class EntityAuditResult:
    """Full entity SEO audit result."""
    url: str = ""
    entity_name: str = ""
    paa_questions: list[PaaQuestion] = field(default_factory=list)
    faq_rich_results: list[FaqRichResult] = field(default_factory=list)
    entity_schemas: list[EntitySchema] = field(default_factory=list)
    brand_serp: BrandSerpResult = field(default_factory=BrandSerpResult)
    social_profile_status: dict[str, bool] = field(default_factory=dict)
    overall_score: float = 0.0
    recommendations: list[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


# ---------------------------------------------------------------------------
# Entity Auditor
# ---------------------------------------------------------------------------


class EntityAuditor(BaseAsyncClient):
    """Audit entity SEO signals and rich result presence."""

    GOOGLE_SEARCH_URL = "https://www.google.com/search"

    HEADERS = {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    }

    PAA_KEYWORD_TEMPLATES = [
        "{entity}",
        "{entity} reviews",
        "{entity} vs",
        "what is {entity}",
        "{entity} pricing",
        "{entity} alternatives",
        "is {entity} good",
        "{entity} benefits",
        "how to use {entity}",
        "{entity} complaints",
    ]

    EXPECTED_SCHEMA_PROPERTIES = {
        "Organization": [
            "name", "url", "logo", "description", "sameAs",
            "contactPoint", "address", "foundingDate", "founder",
            "numberOfEmployees", "email", "telephone",
        ],
        "Person": [
            "name", "url", "image", "description", "sameAs",
            "jobTitle", "worksFor", "alumniOf", "birthDate",
        ],
        "LocalBusiness": [
            "name", "url", "image", "description", "sameAs",
            "address", "telephone", "openingHours", "geo",
            "priceRange", "aggregateRating",
        ],
    }

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.config = config

    # ------------------------------------------------------------------
    # PAA monitoring
    # ------------------------------------------------------------------

    async def monitor_paa(
        self,
        entity_name: str,
        keywords: list[str] | None = None,
        session: aiohttp.ClientSession | None = None,
    ) -> list[PaaQuestion]:
        """Search brand keywords and extract People Also Ask questions."""
        if keywords is None:
            keywords = [t.format(entity=entity_name) for t in self.PAA_KEYWORD_TEMPLATES]

        paa_questions: list[PaaQuestion] = []

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            for keyword in keywords:
                params = {"q": keyword, "hl": "en", "gl": "us"}
                try:
                    async with session.get(
                        self.GOOGLE_SEARCH_URL, params=params, headers=self.HEADERS,
                        timeout=aiohttp.ClientTimeout(total=20),
                    ) as resp:
                        if resp.status != 200:
                            logger.warning("Search for '%s' returned status %d", keyword, resp.status)
                            continue

                        html = await resp.text()
                        soup = BeautifulSoup(html, "lxml")

                        # PAA box selectors
                        paa_selectors = [
                            "div[data-sgrd] div[data-q]",
                            "div.related-question-pair",
                            "div[jsname] div[data-q]",
                            "div.wQiwMc",
                        ]

                        position = 0
                        for selector in paa_selectors:
                            elements = soup.select(selector)
                            for el in elements:
                                question_text = el.get("data-q", "") or el.get_text(strip=True)
                                if question_text and len(question_text) > 5:
                                    position += 1
                                    paa_questions.append(PaaQuestion(
                                        question=question_text,
                                        keyword=keyword,
                                        position=position,
                                    ))

                        # Fallback: regex for PAA-like questions
                        if not paa_questions:
                            text = soup.get_text(separator="\n")
                            q_patterns = re.findall(
                                r"((?:What|How|Why|When|Where|Who|Is|Can|Does|Do|Which)\s+[^?\n]{10,80}\??)",
                                text,
                            )
                            for i, q in enumerate(q_patterns[:8]):
                                paa_questions.append(PaaQuestion(
                                    question=q.strip(),
                                    keyword=keyword,
                                    position=i + 1,
                                ))

                except Exception as exc:
                    logger.error("PAA search failed for '%s': %s", keyword, exc)
                    continue

                # Rate limit between searches
                await asyncio.sleep(1.5)
        finally:
            if own_session:
                await session.close()

        # Deduplicate questions
        seen = set()
        unique = []
        for q in paa_questions:
            key = q.question.lower().strip()
            if key not in seen:
                seen.add(key)
                unique.append(q)

        logger.info("Found %d unique PAA questions for '%s'", len(unique), entity_name)
        return unique
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# FAQ rich result tracking
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
async def track_faq_rich_results(
|
||||||
|
self,
|
||||||
|
url: str,
|
||||||
|
session: aiohttp.ClientSession | None = None,
|
||||||
|
) -> list[FaqRichResult]:
|
||||||
|
"""Check pages for FAQPage schema and SERP appearance."""
|
||||||
|
faq_results: list[FaqRichResult] = []
|
||||||
|
domain = urlparse(url).netloc
|
||||||
|
|
||||||
|
own_session = session is None
|
||||||
|
if own_session:
|
||||||
|
session = aiohttp.ClientSession()
|
||||||
|
|
||||||
|
try:
|
||||||
|
            # Fetch the page and look for FAQ schema
            async with session.get(
                url, headers=self.HEADERS, timeout=aiohttp.ClientTimeout(total=20),
            ) as resp:
                if resp.status != 200:
                    logger.warning("Page %s returned status %d", url, resp.status)
                    return faq_results

                html = await resp.text()
                soup = BeautifulSoup(html, "lxml")

                # Find JSON-LD scripts with FAQPage
                scripts = soup.find_all("script", type="application/ld+json")
                for script in scripts:
                    try:
                        data = json.loads(script.string or "{}")
                        items = data if isinstance(data, list) else [data]

                        for item in items:
                            schema_type = item.get("@type", "")
                            if schema_type == "FAQPage" or (
                                isinstance(schema_type, list) and "FAQPage" in schema_type
                            ):
                                questions = item.get("mainEntity", [])
                                faq = FaqRichResult(
                                    url=url,
                                    question_count=len(questions),
                                    questions=[
                                        q.get("name", "") for q in questions if isinstance(q, dict)
                                    ],
                                    schema_valid=True,
                                )
                                faq_results.append(faq)

                            # Check for nested @graph
                            graph = item.get("@graph", [])
                            for g_item in graph:
                                if g_item.get("@type") == "FAQPage":
                                    questions = g_item.get("mainEntity", [])
                                    faq = FaqRichResult(
                                        url=url,
                                        question_count=len(questions),
                                        questions=[
                                            q.get("name", "") for q in questions if isinstance(q, dict)
                                        ],
                                        schema_valid=True,
                                    )
                                    faq_results.append(faq)

                    except json.JSONDecodeError:
                        continue

                # Also check for microdata FAQ markup
                faq_items = soup.select("[itemtype*='FAQPage'] [itemprop='mainEntity']")
                if faq_items and not faq_results:
                    questions = []
                    for item in faq_items:
                        q_el = item.select_one("[itemprop='name']")
                        if q_el:
                            questions.append(q_el.get_text(strip=True))
                    faq_results.append(FaqRichResult(
                        url=url,
                        question_count=len(questions),
                        questions=questions,
                        schema_valid=True,
                    ))

        except Exception as exc:
            logger.error("FAQ tracking failed for %s: %s", url, exc)
        finally:
            if own_session:
                await session.close()

        logger.info("Found %d FAQ schemas on %s", len(faq_results), url)
        return faq_results

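The JSON-LD branch above can be exercised in isolation with nothing but the standard library. A minimal sketch, assuming a fabricated FAQPage snippet (the markup and question text are made up for illustration):

```python
import json

# Hypothetical FAQPage JSON-LD, as it would appear inside a
# <script type="application/ld+json"> tag.
script_string = """
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {"@type": "Question", "name": "What is entity SEO?"},
    {"@type": "Question", "name": "Do I need a Knowledge Panel?"}
  ]
}
"""

data = json.loads(script_string or "{}")
items = data if isinstance(data, list) else [data]

# Same extraction shape as the loop above: pull question names
# from mainEntity, skipping anything that is not a dict.
questions = [
    q.get("name", "")
    for item in items
    if item.get("@type") == "FAQPage"
    for q in item.get("mainEntity", [])
    if isinstance(q, dict)
]
print(questions)
```

The `isinstance(q, dict)` guard matters in practice: real-world `mainEntity` arrays sometimes contain bare strings or nulls, which would otherwise raise on `.get()`.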
    # ------------------------------------------------------------------
    # Entity schema audit
    # ------------------------------------------------------------------

    async def audit_entity_schema(
        self,
        url: str,
        session: aiohttp.ClientSession | None = None,
    ) -> list[EntitySchema]:
        """Check Organization/Person/LocalBusiness schema on website."""
        schemas: list[EntitySchema] = []
        target_types = {"Organization", "Person", "LocalBusiness", "Corporation", "MedicalBusiness"}

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                url, headers=self.HEADERS, timeout=aiohttp.ClientTimeout(total=20),
            ) as resp:
                if resp.status != 200:
                    logger.warning("Page %s returned status %d", url, resp.status)
                    return schemas

                html = await resp.text()
                soup = BeautifulSoup(html, "lxml")

                scripts = soup.find_all("script", type="application/ld+json")
                for script in scripts:
                    try:
                        data = json.loads(script.string or "{}")
                        items = data if isinstance(data, list) else [data]

                        # Include @graph nested items
                        expanded = []
                        for item in items:
                            expanded.append(item)
                            if "@graph" in item:
                                expanded.extend(item["@graph"])

                        for item in expanded:
                            item_type = item.get("@type", "")
                            if isinstance(item_type, list):
                                matching = [t for t in item_type if t in target_types]
                                if not matching:
                                    continue
                                item_type = matching[0]
                            elif item_type not in target_types:
                                continue

                            same_as = item.get("sameAs", [])
                            if isinstance(same_as, str):
                                same_as = [same_as]

                            # Calculate completeness
                            base_type = item_type
                            if base_type == "Corporation":
                                base_type = "Organization"
                            elif base_type == "MedicalBusiness":
                                base_type = "LocalBusiness"

                            expected = self.EXPECTED_SCHEMA_PROPERTIES.get(base_type, [])
                            present = [k for k in expected if k in item and item[k]]
                            completeness = round((len(present) / len(expected)) * 100, 1) if expected else 0

                            # Check for issues
                            issues = []
                            if "name" not in item:
                                issues.append("Missing 'name' property")
                            if "url" not in item:
                                issues.append("Missing 'url' property")
                            if not same_as:
                                issues.append("No 'sameAs' links (social profiles)")
                            if "logo" not in item and base_type == "Organization":
                                issues.append("Missing 'logo' property")
                            if "description" not in item:
                                issues.append("Missing 'description' property")

                            schema = EntitySchema(
                                type=item_type,
                                properties={
                                    k: (str(v)[:100] if not isinstance(v, (list, dict)) else v)
                                    for k, v in item.items()
                                    if k != "@context"
                                },
                                same_as_links=same_as,
                                completeness=completeness,
                                issues=issues,
                            )
                            schemas.append(schema)

                    except json.JSONDecodeError:
                        continue

        except Exception as exc:
            logger.error("Entity schema audit failed for %s: %s", url, exc)
        finally:
            if own_session:
                await session.close()

        logger.info("Found %d entity schemas on %s", len(schemas), url)
        return schemas

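The completeness calculation above reduces to the ratio of expected properties that are present and truthy. A standalone sketch, using an illustrative property list (the real values live in `EXPECTED_SCHEMA_PROPERTIES` on the auditor class):

```python
# Illustrative expected-property list for an Organization node;
# the actual table is EXPECTED_SCHEMA_PROPERTIES in the auditor.
expected = ["name", "url", "logo", "sameAs", "description", "address"]

# A parsed Organization node with half the expected properties filled in.
item = {"name": "Acme", "url": "https://acme.example", "logo": "logo.png", "sameAs": []}

# Empty values (like the empty sameAs list) do not count as present,
# because the audit checks `item[k]` for truthiness, not just membership.
present = [k for k in expected if k in item and item[k]]
completeness = round((len(present) / len(expected)) * 100, 1) if expected else 0
print(completeness)
```

Note the `if expected else 0` guard: an unknown `base_type` yields an empty expected list, and the score degrades to 0 instead of dividing by zero.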
    # ------------------------------------------------------------------
    # Brand SERP analysis
    # ------------------------------------------------------------------

    async def analyze_brand_serp(
        self,
        entity_name: str,
        session: aiohttp.ClientSession | None = None,
    ) -> BrandSerpResult:
        """Analyze what appears in the SERP for a brand-name search."""
        result = BrandSerpResult(query=entity_name)

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            params = {"q": entity_name, "hl": "en", "gl": "us"}
            async with session.get(
                self.GOOGLE_SEARCH_URL, params=params, headers=self.HEADERS,
                timeout=aiohttp.ClientTimeout(total=20),
            ) as resp:
                if resp.status != 200:
                    return result

                html = await resp.text()
                soup = BeautifulSoup(html, "lxml")
                text = soup.get_text(separator=" ", strip=True).lower()

                # Detect SERP features
                feature_indicators = {
                    "knowledge_panel": ["kp-wholepage", "knowledge-panel", "kno-"],
                    "sitelinks": ["sitelinks", "site-links"],
                    "people_also_ask": ["related-question-pair", "data-q"],
                    "faq_rich_result": ["faqpage", "frequently asked"],
                    "featured_snippet": ["featured-snippet", "data-tts"],
                    "image_pack": ["image-result", "img-brk"],
                    "video_carousel": ["video-result", "vid-"],
                    "twitter_carousel": ["twitter-timeline", "g-scrolling-carousel"],
                    "reviews": ["star-rating", "aggregate-rating"],
                    "local_pack": ["local-pack", "local_pack"],
                }

                # Stringify the soup once instead of on every indicator check.
                page_source = str(soup).lower()
                for feature, indicators in feature_indicators.items():
                    for ind in indicators:
                        if ind in page_source:
                            result.features.append(feature)
                            break

                result.knowledge_panel = "knowledge_panel" in result.features
                result.sitelinks = "sitelinks" in result.features

                # Count PAA questions
                paa_elements = soup.select("div[data-q], div.related-question-pair")
                result.paa_count = len(paa_elements)
                if result.paa_count > 0 and "people_also_ask" not in result.features:
                    result.features.append("people_also_ask")

                # Detect social profiles in results
                social_domains = {
                    "twitter.com": "twitter", "x.com": "twitter",
                    "facebook.com": "facebook", "linkedin.com": "linkedin",
                    "youtube.com": "youtube", "instagram.com": "instagram",
                    "github.com": "github", "pinterest.com": "pinterest",
                }
                links = soup.find_all("a", href=True)
                for link in links:
                    href = link["href"]
                    for domain, name in social_domains.items():
                        if domain in href and name not in result.social_profiles:
                            result.social_profiles.append(name)

                # Extract top organic results
                result_divs = soup.select("div.g, div[data-sokoban-container]")[:10]
                for div in result_divs:
                    title_el = div.select_one("h3")
                    link_el = div.select_one("a[href]")
                    if title_el and link_el:
                        result.top_results.append({
                            "title": title_el.get_text(strip=True),
                            "url": link_el.get("href", ""),
                        })

        except Exception as exc:
            logger.error("Brand SERP analysis failed for '%s': %s", entity_name, exc)
        finally:
            if own_session:
                await session.close()

        return result

    # ------------------------------------------------------------------
    # Social profile link validation
    # ------------------------------------------------------------------

    async def check_social_profile_links(
        self,
        same_as_links: list[str],
        session: aiohttp.ClientSession | None = None,
    ) -> dict[str, bool]:
        """Validate sameAs URLs are accessible."""
        status: dict[str, bool] = {}
        if not same_as_links:
            return status

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            for link in same_as_links:
                try:
                    async with session.head(
                        link, headers=self.HEADERS, timeout=aiohttp.ClientTimeout(total=10),
                        allow_redirects=True,
                    ) as resp:
                        status[link] = resp.status < 400
                except Exception:
                    status[link] = False

                await asyncio.sleep(0.5)
        finally:
            if own_session:
                await session.close()

        accessible = sum(1 for v in status.values() if v)
        logger.info("Social profile links: %d/%d accessible", accessible, len(status))
        return status

    # ------------------------------------------------------------------
    # Recommendations
    # ------------------------------------------------------------------

    def generate_recommendations(self, result: EntityAuditResult) -> list[str]:
        """Generate actionable entity SEO improvement recommendations."""
        recs: list[str] = []

        # PAA recommendations
        if not result.paa_questions:
            recs.append(
                "브랜드 관련 People Also Ask(PAA) 질문이 감지되지 않았습니다. "
                "FAQ 콘텐츠를 작성하여 PAA 노출 기회를 확보하세요."
            )
        elif len(result.paa_questions) < 5:
            recs.append(
                f"PAA 질문이 {len(result.paa_questions)}개만 감지되었습니다. "
                "더 다양한 키워드에 대한 Q&A 콘텐츠를 강화하세요."
            )

        # FAQ schema recommendations
        if not result.faq_rich_results:
            recs.append(
                "FAQPage schema가 감지되지 않았습니다. "
                "FAQ 페이지에 FAQPage JSON-LD를 추가하여 Rich Result를 확보하세요."
            )
        else:
            invalid = [f for f in result.faq_rich_results if not f.schema_valid]
            if invalid:
                recs.append(
                    f"{len(invalid)}개의 FAQ schema에 유효성 문제가 있습니다. "
                    "Google Rich Results Test로 검증하세요."
                )

        # Entity schema recommendations
        if not result.entity_schemas:
            recs.append(
                "Organization/Person/LocalBusiness schema가 없습니다. "
                "홈페이지에 Organization schema JSON-LD를 추가하세요."
            )
        else:
            for schema in result.entity_schemas:
                if schema.completeness < 50:
                    recs.append(
                        f"{schema.type} schema 완성도가 {schema.completeness}%입니다. "
                        f"누락 항목: {', '.join(schema.issues[:3])}"
                    )
                if not schema.same_as_links:
                    recs.append(
                        f"{schema.type} schema에 sameAs 속성이 없습니다. "
                        "소셜 미디어 프로필 URL을 sameAs에 추가하세요."
                    )

        # Brand SERP recommendations
        serp = result.brand_serp
        if not serp.knowledge_panel:
            recs.append(
                "브랜드 검색 시 Knowledge Panel이 표시되지 않습니다. "
                "Wikipedia, Wikidata, 구조화된 데이터를 통해 엔티티 인식을 강화하세요."
            )
        if not serp.sitelinks:
            recs.append(
                "Sitelinks가 표시되지 않습니다. "
                "사이트 구조와 내부 링크를 개선하세요."
            )
        if len(serp.social_profiles) < 3:
            recs.append(
                f"SERP에 소셜 프로필이 {len(serp.social_profiles)}개만 표시됩니다. "
                "주요 소셜 미디어 프로필을 활성화하고 schema sameAs에 연결하세요."
            )

        # Social profile accessibility
        broken = [url for url, ok in result.social_profile_status.items() if not ok]
        if broken:
            recs.append(
                f"접근 불가한 소셜 프로필 링크 {len(broken)}개: "
                f"{', '.join(broken[:3])}. sameAs URL을 업데이트하세요."
            )

        if not recs:
            recs.append("Entity SEO 상태가 양호합니다. 현재 수준을 유지하세요.")

        return recs

    # ------------------------------------------------------------------
    # Scoring
    # ------------------------------------------------------------------

    def compute_score(self, result: EntityAuditResult) -> float:
        """Compute overall entity SEO score (0-100)."""
        score = 0.0

        # PAA presence (15 points)
        paa_count = len(result.paa_questions)
        if paa_count >= 10:
            score += 15
        elif paa_count >= 5:
            score += 10
        elif paa_count > 0:
            score += 5

        # FAQ schema (15 points)
        if result.faq_rich_results:
            valid_count = sum(1 for f in result.faq_rich_results if f.schema_valid)
            score += min(15, valid_count * 5)

        # Entity schema (25 points)
        if result.entity_schemas:
            best_completeness = max(s.completeness for s in result.entity_schemas)
            score += best_completeness * 0.25

        # Brand SERP features (25 points)
        serp = result.brand_serp
        if serp.knowledge_panel:
            score += 10
        if serp.sitelinks:
            score += 5
        score += min(10, len(serp.features) * 2)

        # Social profiles (10 points)
        if result.social_profile_status:
            accessible = sum(1 for v in result.social_profile_status.values() if v)
            total = len(result.social_profile_status)
            score += (accessible / total) * 10 if total > 0 else 0

        # sameAs links (10 points)
        total_same_as = sum(len(s.same_as_links) for s in result.entity_schemas)
        score += min(10, total_same_as * 2)

        return round(min(100, score), 1)

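As a sanity check on the rubric above: the per-category caps (15, 15, 25, 25, 10, 10) are intended to sum to exactly 100, which is why the final `min(100, score)` only clips rounding artifacts. A quick sketch:

```python
# Category caps taken from the compute_score comments above.
weights = {
    "paa_presence": 15,
    "faq_schema": 15,
    "entity_schema": 25,
    "brand_serp": 25,
    "social_profiles": 10,
    "same_as_links": 10,
}
total = sum(weights.values())
print(total)
```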
    # ------------------------------------------------------------------
    # Main orchestrator
    # ------------------------------------------------------------------

    async def audit(
        self,
        url: str,
        entity_name: str,
        include_paa: bool = True,
        include_faq: bool = True,
    ) -> EntityAuditResult:
        """Orchestrate full entity SEO audit."""
        result = EntityAuditResult(url=url, entity_name=entity_name)
        logger.info("Starting entity audit for '%s' at %s", entity_name, url)

        async with aiohttp.ClientSession() as session:
            # Parallel tasks: entity schema, brand SERP, FAQ
            tasks = [
                self.audit_entity_schema(url, session),
                self.analyze_brand_serp(entity_name, session),
            ]

            if include_faq:
                tasks.append(self.track_faq_rich_results(url, session))

            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Unpack results
            if not isinstance(results[0], Exception):
                result.entity_schemas = results[0]
            else:
                logger.error("Entity schema audit failed: %s", results[0])

            if not isinstance(results[1], Exception):
                result.brand_serp = results[1]
            else:
                logger.error("Brand SERP analysis failed: %s", results[1])

            if include_faq and len(results) > 2 and not isinstance(results[2], Exception):
                result.faq_rich_results = results[2]

            # PAA monitoring (sequential due to rate limits)
            if include_paa:
                result.paa_questions = await self.monitor_paa(entity_name, session=session)

            # Validate social profile links from schema
            all_same_as = []
            for schema in result.entity_schemas:
                all_same_as.extend(schema.same_as_links)
            if all_same_as:
                result.social_profile_status = await self.check_social_profile_links(
                    list(set(all_same_as)), session
                )

        # Compute score and recommendations
        result.overall_score = self.compute_score(result)
        result.recommendations = self.generate_recommendations(result)

        logger.info("Entity audit complete. Score: %.1f", result.overall_score)
        return result

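The unpacking logic above relies on `asyncio.gather(..., return_exceptions=True)` returning exceptions in place rather than raising, so one failed sub-audit cannot sink the others. A self-contained sketch of that pattern (the coroutine names are invented for illustration):

```python
import asyncio


async def schema_task() -> str:
    return "schemas"


async def serp_task() -> str:
    # Simulates a sub-audit that blows up (e.g. a blocked request).
    raise RuntimeError("blocked")


async def run() -> list:
    # With return_exceptions=True, the RuntimeError is returned as a
    # list element instead of propagating out of gather().
    return await asyncio.gather(schema_task(), serp_task(), return_exceptions=True)


results = asyncio.run(run())
print(results)
```

This is why each slot is checked with `isinstance(results[i], Exception)` before assignment: a position can hold either the sub-audit's return value or the exception it raised.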
# ---------------------------------------------------------------------------
# CLI display helpers
# ---------------------------------------------------------------------------


def display_result(result: EntityAuditResult) -> None:
    """Display audit result in rich tables."""
    console.print()
    console.print(f"[bold cyan]Entity SEO Audit: {result.entity_name}[/bold cyan]")
    console.print(f"URL: {result.url} | Score: {result.overall_score}/100")
    console.print()

    # Entity Schema table
    if result.entity_schemas:
        table = Table(title="Entity Schema Markup", show_header=True)
        table.add_column("Type", style="bold")
        table.add_column("Completeness")
        table.add_column("sameAs Links")
        table.add_column("Issues")

        for schema in result.entity_schemas:
            issues_text = "; ".join(schema.issues[:3]) if schema.issues else "None"
            table.add_row(
                schema.type,
                f"{schema.completeness}%",
                str(len(schema.same_as_links)),
                issues_text,
            )
        console.print(table)
    else:
        console.print("[red]No entity schema markup found on website![/red]")
    console.print()

    # Brand SERP table
    serp = result.brand_serp
    serp_table = Table(title="Brand SERP Analysis", show_header=True)
    serp_table.add_column("Feature", style="bold")
    serp_table.add_column("Status")

    serp_table.add_row("Knowledge Panel", "[green]Yes[/]" if serp.knowledge_panel else "[red]No[/]")
    serp_table.add_row("Sitelinks", "[green]Yes[/]" if serp.sitelinks else "[red]No[/]")
    serp_table.add_row("PAA Count", str(serp.paa_count))
    serp_table.add_row("SERP Features", ", ".join(serp.features) if serp.features else "None")
    serp_table.add_row("Social Profiles", ", ".join(serp.social_profiles) if serp.social_profiles else "None")

    console.print(serp_table)
    console.print()

    # PAA Questions
    if result.paa_questions:
        paa_table = Table(title=f"People Also Ask ({len(result.paa_questions)} questions)", show_header=True)
        paa_table.add_column("#", style="dim")
        paa_table.add_column("Question")
        paa_table.add_column("Keyword")

        for i, q in enumerate(result.paa_questions[:15], 1):
            paa_table.add_row(str(i), q.question, q.keyword)
        console.print(paa_table)
        console.print()

    # FAQ Rich Results
    if result.faq_rich_results:
        faq_table = Table(title="FAQ Rich Results", show_header=True)
        faq_table.add_column("URL")
        faq_table.add_column("Questions")
        faq_table.add_column("Valid")

        for faq in result.faq_rich_results:
            faq_table.add_row(
                faq.url[:60],
                str(faq.question_count),
                "[green]Yes[/]" if faq.schema_valid else "[red]No[/]",
            )
        console.print(faq_table)
        console.print()

    # Social Profile Status
    if result.social_profile_status:
        sp_table = Table(title="Social Profile Link Status", show_header=True)
        sp_table.add_column("URL")
        sp_table.add_column("Accessible")

        for link, accessible in result.social_profile_status.items():
            sp_table.add_row(
                link[:70],
                "[green]Yes[/]" if accessible else "[red]No[/]",
            )
        console.print(sp_table)
        console.print()

    # Recommendations
    console.print("[bold yellow]Recommendations:[/bold yellow]")
    for i, rec in enumerate(result.recommendations, 1):
        console.print(f"  {i}. {rec}")
    console.print()

# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Entity SEO Auditor",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("--url", required=True, help="Website URL to audit")
    parser.add_argument("--entity", required=True, help="Entity/brand name")
    parser.add_argument("--paa", action="store_true", default=True, help="Include PAA monitoring (default: True)")
    parser.add_argument("--no-paa", action="store_true", help="Skip PAA monitoring")
    parser.add_argument("--faq", action="store_true", default=True, help="Include FAQ tracking (default: True)")
    parser.add_argument("--no-faq", action="store_true", help="Skip FAQ tracking")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", type=str, help="Output file path")
    return parser.parse_args()


async def main() -> None:
    args = parse_args()

    auditor = EntityAuditor()
    result = await auditor.audit(
        url=args.url,
        entity_name=args.entity,
        include_paa=not args.no_paa,
        include_faq=not args.no_faq,
    )

    if args.json:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            console.print(f"[green]Output saved to {args.output}[/green]")
        else:
            print(output)
    else:
        display_result(result)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                json.dump(result.to_dict(), f, ensure_ascii=False, indent=2)
            console.print(f"[green]Output saved to {args.output}[/green]")


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,782 @@
"""
|
||||||
|
Knowledge Graph Analyzer
|
||||||
|
=========================
|
||||||
|
Purpose: Analyze entity presence in Google Knowledge Graph, Knowledge Panels,
|
||||||
|
Wikipedia, Wikidata, and Korean equivalents (Naver encyclopedia, 지식iN).
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from dataclasses import asdict, dataclass, field
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any
|
||||||
|
from urllib.parse import quote, urljoin
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
from rich.console import Console
|
||||||
|
from rich.table import Table
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, ConfigManager, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
console = Console()
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

EXPECTED_ATTRIBUTES = [
    "name",
    "type",
    "description",
    "logo",
    "website",
    "founded",
    "ceo",
    "headquarters",
    "parent_organization",
    "subsidiaries",
    "social_twitter",
    "social_facebook",
    "social_linkedin",
    "social_youtube",
    "social_instagram",
    "stock_ticker",
    "industry",
    "employees",
    "revenue",
]


@dataclass
class KnowledgePanelAttribute:
    """Single attribute extracted from a Knowledge Panel."""
    name: str
    value: str | None = None
    present: bool = False


@dataclass
class KnowledgePanel:
    """Represents a detected Knowledge Panel."""
    detected: bool = False
    entity_type: str | None = None
    attributes: list[KnowledgePanelAttribute] = field(default_factory=list)
    completeness_score: float = 0.0
    raw_snippet: str | None = None


@dataclass
class WikiPresence:
    """Wikipedia or Wikidata presence record."""
    platform: str = ""  # "wikipedia" or "wikidata"
    present: bool = False
    url: str | None = None
    qid: str | None = None  # Wikidata QID (e.g. Q20710)
    language: str = "en"


@dataclass
class NaverPresence:
    """Naver encyclopedia and 지식iN presence."""
    encyclopedia_present: bool = False
    encyclopedia_url: str | None = None
    knowledge_in_present: bool = False
    knowledge_in_count: int = 0
    knowledge_in_url: str | None = None


@dataclass
class KnowledgeGraphResult:
    """Full Knowledge Graph analysis result."""
    entity: str = ""
    language: str = "en"
    knowledge_panel: KnowledgePanel = field(default_factory=KnowledgePanel)
    wikipedia: WikiPresence = field(default_factory=lambda: WikiPresence(platform="wikipedia"))
    wikidata: WikiPresence = field(default_factory=lambda: WikiPresence(platform="wikidata"))
    naver: NaverPresence = field(default_factory=NaverPresence)
    competitors: list[dict[str, Any]] = field(default_factory=list)
    overall_score: float = 0.0
    recommendations: list[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


# ---------------------------------------------------------------------------
# Knowledge Graph Analyzer
# ---------------------------------------------------------------------------


class KnowledgeGraphAnalyzer(BaseAsyncClient):
    """Analyze entity presence in Knowledge Graph and related platforms."""

    GOOGLE_SEARCH_URL = "https://www.google.com/search"
    WIKIPEDIA_API_URL = "https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}"
    WIKIDATA_API_URL = "https://www.wikidata.org/w/api.php"
    NAVER_SEARCH_URL = "https://search.naver.com/search.naver"
    NAVER_ENCYCLOPEDIA_URL = "https://terms.naver.com/search.naver"
    NAVER_KIN_URL = "https://kin.naver.com/search/list.naver"

    HEADERS = {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    }

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.config = config

    # ------------------------------------------------------------------
    # Google entity search
    # ------------------------------------------------------------------

    async def search_entity(
        self,
        entity_name: str,
        language: str = "en",
        session: aiohttp.ClientSession | None = None,
    ) -> dict[str, Any]:
        """Search Google for entity to detect Knowledge Panel signals."""
        params = {"q": entity_name, "hl": language, "gl": "us" if language == "en" else "kr"}
        headers = {**self.HEADERS}
        if language == "ko":
            headers["Accept-Language"] = "ko-KR,ko;q=0.9"
            params["gl"] = "kr"

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                self.GOOGLE_SEARCH_URL, params=params, headers=headers, timeout=aiohttp.ClientTimeout(total=20)
            ) as resp:
                if resp.status != 200:
                    logger.warning("Google search returned status %d", resp.status)
                    return {"html": "", "status": resp.status}
                html = await resp.text()
                return {"html": html, "status": resp.status}
        except Exception as exc:
            logger.error("Google search failed: %s", exc)
            return {"html": "", "status": 0, "error": str(exc)}
        finally:
            if own_session:
                await session.close()

    # ------------------------------------------------------------------
    # Knowledge Panel detection
    # ------------------------------------------------------------------

    def detect_knowledge_panel(self, search_data: dict[str, Any]) -> KnowledgePanel:
        """Parse search results HTML for Knowledge Panel indicators."""
        html = search_data.get("html", "")
        if not html:
            return KnowledgePanel(detected=False)

        soup = BeautifulSoup(html, "lxml")
        kp = KnowledgePanel()

        # Knowledge Panel is typically in a div with class 'kp-wholepage' or 'knowledge-panel'
        kp_selectors = [
            "div.kp-wholepage",
            "div.knowledge-panel",
            "div[data-attrid='title']",
            "div.kp-header",
            "div[class*='kno-']",
            "div.osrp-blk",
        ]

        kp_element = None
        for selector in kp_selectors:
            kp_element = soup.select_one(selector)
            if kp_element:
                break

        if kp_element:
            kp.detected = True
            kp.raw_snippet = kp_element.get_text(separator=" ", strip=True)[:500]
        else:
            # Fallback: check for common KP text patterns
            text = soup.get_text(separator=" ", strip=True).lower()
            kp_indicators = [
                "wikipedia", "description", "founded", "ceo",
                "headquarters", "subsidiaries", "parent organization",
            ]
            matches = sum(1 for ind in kp_indicators if ind in text)
            if matches >= 3:
                kp.detected = True
                kp.raw_snippet = text[:500]

        return kp

    # ------------------------------------------------------------------
    # Attribute extraction
    # ------------------------------------------------------------------

    def extract_attributes(self, kp: KnowledgePanel, html: str = "") -> list[KnowledgePanelAttribute]:
        """Extract entity attributes from Knowledge Panel data."""
        attributes: list[KnowledgePanelAttribute] = []

        # Parse HTML for structured attribute data
        soup = BeautifulSoup(html, "lxml") if html else None

        attribute_patterns = {
            "name": r"^(.+?)(?:\s+is\s+|\s*[-|]\s*)",
            "type": r"(?:is\s+(?:a|an)\s+)(\w[\w\s]+?)(?:\.|,|\s+based)",
            "description": r"(?:is\s+)(.{20,200}?)(?:\.\s)",
            "founded": r"(?:founded|established|incorporated)\s*(?:in|:)?\s*(\d{4})",
            "ceo": r"(?:ceo|chief executive|chairman)\s*(?::|is)?\s*([A-Z][\w\s.]+?)(?:,|\.|;|\s{2})",
            "headquarters": r"(?:headquarters?|hq|based in)\s*(?::|is|in)?\s*([A-Z][\w\s,]+?)(?:\.|;|\s{2})",
            "stock_ticker": r"(?:stock|ticker|symbol)\s*(?::|is)?\s*([A-Z]{1,5}(?:\s*:\s*[A-Z]{1,5})?)",
            "employees": r"(?:employees?|staff|workforce)\s*(?::|is)?\s*([\d,]+)",
            "revenue": r"(?:revenue|sales)\s*(?::|is)?\s*([\$\d,.]+\s*(?:billion|million|B|M)?)",
            "industry": r"(?:industry|sector)\s*(?::|is)?\s*([\w\s&]+?)(?:\.|,|;)",
        }

        social_patterns = {
            "social_twitter": r"(?:twitter\.com|x\.com)/(\w+)",
            "social_facebook": r"facebook\.com/([\w.]+)",
            "social_linkedin": r"linkedin\.com/(?:company|in)/([\w-]+)",
            "social_youtube": r"youtube\.com/(?:@|channel/|user/)([\w-]+)",
            "social_instagram": r"instagram\.com/([\w.]+)",
        }

        full_text = kp.raw_snippet or ""
        html_text = ""
        if soup:
            html_text = soup.get_text(separator=" ", strip=True)

        combined = f"{full_text} {html_text}"

        for attr_name, pattern in attribute_patterns.items():
            match = re.search(pattern, combined, re.IGNORECASE)
            present = match is not None
            value = match.group(1).strip() if match else None
            attributes.append(KnowledgePanelAttribute(name=attr_name, value=value, present=present))

        # Social profiles
        for attr_name, pattern in social_patterns.items():
            match = re.search(pattern, combined, re.IGNORECASE)
            present = match is not None
            value = match.group(1).strip() if match else None
            attributes.append(KnowledgePanelAttribute(name=attr_name, value=value, present=present))

        # Logo detection from HTML
        logo_present = False
        if soup:
            logo_img = soup.select_one("img[data-atf], g-img img, img.kno-fb-img, img[alt*='logo']")
            if logo_img:
                logo_present = True
        attributes.append(KnowledgePanelAttribute(name="logo", value=None, present=logo_present))

        # Website detection
        website_present = False
        website_value = None
        if soup:
            site_link = soup.select_one("a[data-attrid*='website'], a.ab_button[href*='http']")
            if site_link:
                website_present = True
                website_value = site_link.get("href", "")
        attributes.append(KnowledgePanelAttribute(name="website", value=website_value, present=website_present))

        return attributes
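For a quick sanity check, the attribute regexes can be exercised standalone. A minimal sketch (two patterns copied from `extract_attributes` above; the sample snippet is invented):

```python
import re

# Two of the attribute patterns used above, applied to a made-up snippet.
patterns = {
    "founded": r"(?:founded|established|incorporated)\s*(?:in|:)?\s*(\d{4})",
    "employees": r"(?:employees?|staff|workforce)\s*(?::|is)?\s*([\d,]+)",
}
snippet = "Acme Corp was founded in 1999 and has employees: 12,500 worldwide."
extracted = {
    name: m.group(1)
    for name, pattern in patterns.items()
    if (m := re.search(pattern, snippet, re.IGNORECASE))
}
print(extracted)  # → {'founded': '1999', 'employees': '12,500'}
```

Because the patterns run against the concatenated panel snippet and page text, loose matches like these are expected; the `present` flag only records that some match was found.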
    # ------------------------------------------------------------------
    # Completeness scoring
    # ------------------------------------------------------------------

    def score_completeness(self, attributes: list[KnowledgePanelAttribute]) -> float:
        """Score attribute completeness (0-100) based on filled vs expected."""
        if not attributes:
            return 0.0

        weights = {
            "name": 10, "type": 8, "description": 10, "logo": 8, "website": 10,
            "founded": 5, "ceo": 5, "headquarters": 5, "parent_organization": 3,
            "subsidiaries": 3, "social_twitter": 4, "social_facebook": 4,
            "social_linkedin": 4, "social_youtube": 3, "social_instagram": 3,
            "stock_ticker": 3, "industry": 5, "employees": 3, "revenue": 4,
        }

        total_weight = sum(weights.values())
        earned = 0.0

        attr_map = {a.name: a for a in attributes}
        for attr_name, weight in weights.items():
            attr = attr_map.get(attr_name)
            if attr and attr.present:
                earned += weight

        return round((earned / total_weight) * 100, 1) if total_weight > 0 else 0.0
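The completeness score is a plain weighted-coverage ratio. A self-contained sketch of the same arithmetic (weights copied from `score_completeness`; the set of detected attributes is hypothetical):

```python
# Weights copied verbatim from score_completeness; they sum to 100.
weights = {
    "name": 10, "type": 8, "description": 10, "logo": 8, "website": 10,
    "founded": 5, "ceo": 5, "headquarters": 5, "parent_organization": 3,
    "subsidiaries": 3, "social_twitter": 4, "social_facebook": 4,
    "social_linkedin": 4, "social_youtube": 3, "social_instagram": 3,
    "stock_ticker": 3, "industry": 5, "employees": 3, "revenue": 4,
}

detected = {"name", "description", "website", "logo", "social_twitter"}  # hypothetical detections
earned = sum(w for name, w in weights.items() if name in detected)
score = round(earned / sum(weights.values()) * 100, 1)
print(score)  # → 42.0
```

Note that `parent_organization` and `subsidiaries` carry weight but are never produced by `extract_attributes`, so the score is effectively capped slightly below 100 in practice.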
    # ------------------------------------------------------------------
    # Wikipedia check
    # ------------------------------------------------------------------

    async def check_wikipedia(
        self,
        entity_name: str,
        language: str = "en",
        session: aiohttp.ClientSession | None = None,
    ) -> WikiPresence:
        """Check Wikipedia article existence for entity."""
        wiki = WikiPresence(platform="wikipedia", language=language)
        title = entity_name.replace(" ", "_")
        url = self.WIKIPEDIA_API_URL.format(lang=language, title=quote(title))

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(url, headers=self.HEADERS, timeout=aiohttp.ClientTimeout(total=15)) as resp:
                if resp.status == 200:
                    data = await resp.json()
                    wiki.present = data.get("type") != "disambiguation"
                    wiki.url = data.get("content_urls", {}).get("desktop", {}).get("page", "")
                    if not wiki.url:
                        wiki.url = f"https://{language}.wikipedia.org/wiki/{quote(title)}"
                    logger.info("Wikipedia article found for '%s' (%s)", entity_name, language)
                elif resp.status == 404:
                    wiki.present = False
                    logger.info("No Wikipedia article for '%s' (%s)", entity_name, language)
                else:
                    logger.warning("Wikipedia API returned status %d", resp.status)
        except Exception as exc:
            logger.error("Wikipedia check failed: %s", exc)
        finally:
            if own_session:
                await session.close()

        return wiki
    # ------------------------------------------------------------------
    # Wikidata check
    # ------------------------------------------------------------------

    async def check_wikidata(
        self,
        entity_name: str,
        session: aiohttp.ClientSession | None = None,
    ) -> WikiPresence:
        """Check Wikidata QID existence for entity."""
        wiki = WikiPresence(platform="wikidata")
        params = {
            "action": "wbsearchentities",
            "search": entity_name,
            "language": "en",
            "format": "json",
            "limit": 5,
        }

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                self.WIKIDATA_API_URL, params=params, headers=self.HEADERS,
                timeout=aiohttp.ClientTimeout(total=15),
            ) as resp:
                if resp.status == 200:
                    data = await resp.json()
                    results = data.get("search", [])
                    if results:
                        top = results[0]
                        wiki.present = True
                        wiki.qid = top.get("id", "")
                        wiki.url = top.get("concepturi", f"https://www.wikidata.org/wiki/{wiki.qid}")
                        logger.info("Wikidata entity found: %s (%s)", wiki.qid, entity_name)
                    else:
                        wiki.present = False
                        logger.info("No Wikidata entity for '%s'", entity_name)
                else:
                    logger.warning("Wikidata API returned status %d", resp.status)
        except Exception as exc:
            logger.error("Wikidata check failed: %s", exc)
        finally:
            if own_session:
                await session.close()

        return wiki
    # ------------------------------------------------------------------
    # Naver encyclopedia
    # ------------------------------------------------------------------

    async def check_naver_encyclopedia(
        self,
        entity_name: str,
        session: aiohttp.ClientSession | None = None,
    ) -> dict[str, Any]:
        """Check Naver encyclopedia (네이버 백과사전) presence."""
        result = {"present": False, "url": None}
        params = {"query": entity_name, "searchType": 0}
        headers = {
            **self.HEADERS,
            "Accept-Language": "ko-KR,ko;q=0.9",
        }

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                self.NAVER_ENCYCLOPEDIA_URL, params=params, headers=headers,
                timeout=aiohttp.ClientTimeout(total=15),
            ) as resp:
                if resp.status == 200:
                    html = await resp.text()
                    soup = BeautifulSoup(html, "lxml")
                    # Look for search result entries
                    entries = soup.select("ul.content_list li, div.search_result a, a.title")
                    if entries:
                        result["present"] = True
                        first_link = entries[0].find("a")
                        if first_link and first_link.get("href"):
                            href = first_link["href"]
                            if not href.startswith("http"):
                                href = urljoin("https://terms.naver.com", href)
                            result["url"] = href
                        else:
                            result["url"] = f"https://terms.naver.com/search.naver?query={quote(entity_name)}"
                        logger.info("Naver encyclopedia entry found for '%s'", entity_name)
                    else:
                        # Fallback: check page text for result indicators
                        text = soup.get_text()
                        if entity_name in text and "검색결과가 없습니다" not in text:
                            result["present"] = True
                            result["url"] = f"https://terms.naver.com/search.naver?query={quote(entity_name)}"
                else:
                    logger.warning("Naver encyclopedia returned status %d", resp.status)
        except Exception as exc:
            logger.error("Naver encyclopedia check failed: %s", exc)
        finally:
            if own_session:
                await session.close()

        return result
    # ------------------------------------------------------------------
    # Naver knowledge iN
    # ------------------------------------------------------------------

    async def check_naver_knowledge_in(
        self,
        entity_name: str,
        session: aiohttp.ClientSession | None = None,
    ) -> dict[str, Any]:
        """Check Naver knowledge iN (지식iN) entries."""
        result = {"present": False, "count": 0, "url": None}
        params = {"query": entity_name}
        headers = {
            **self.HEADERS,
            "Accept-Language": "ko-KR,ko;q=0.9",
        }

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                self.NAVER_KIN_URL, params=params, headers=headers,
                timeout=aiohttp.ClientTimeout(total=15),
            ) as resp:
                if resp.status == 200:
                    html = await resp.text()
                    soup = BeautifulSoup(html, "lxml")

                    # Extract total result count
                    count_el = soup.select_one("span.number, em.total_count, span.result_count")
                    count = 0
                    if count_el:
                        count_text = count_el.get_text(strip=True).replace(",", "")
                        count_match = re.search(r"(\d+)", count_text)
                        if count_match:
                            count = int(count_match.group(1))

                    # Also check for list items
                    entries = soup.select("ul.basic1 li, ul._list li, div.search_list li")
                    if count > 0 or entries:
                        result["present"] = True
                        result["count"] = count if count > 0 else len(entries)
                        result["url"] = f"https://kin.naver.com/search/list.naver?query={quote(entity_name)}"
                        logger.info("Naver 지식iN: %d entries for '%s'", result["count"], entity_name)
                    else:
                        logger.info("No Naver 지식iN entries for '%s'", entity_name)
                else:
                    logger.warning("Naver 지식iN returned status %d", resp.status)
        except Exception as exc:
            logger.error("Naver 지식iN check failed: %s", exc)
        finally:
            if own_session:
                await session.close()

        return result
    # ------------------------------------------------------------------
    # Recommendations
    # ------------------------------------------------------------------

    def generate_recommendations(self, result: KnowledgeGraphResult) -> list[str]:
        """Generate actionable recommendations based on analysis."""
        recs: list[str] = []

        kp = result.knowledge_panel
        if not kp.detected:
            recs.append(
                "Knowledge Panel이 감지되지 않았습니다. Google에 엔티티 등록을 위해 "
                "Wikipedia 페이지 생성, Wikidata 항목 추가, 구조화된 데이터(Organization schema) 구현을 권장합니다."
            )
        elif kp.completeness_score < 50:
            recs.append(
                f"Knowledge Panel 완성도가 {kp.completeness_score}%로 낮습니다. "
                "누락된 속성(소셜 프로필, 설명, 로고 등)을 보강하세요."
            )

        if not result.wikipedia.present:
            recs.append(
                "Wikipedia 문서가 없습니다. 주목할 만한 출처(reliable sources)를 확보한 후 "
                "Wikipedia 문서 생성을 고려하세요."
            )

        if not result.wikidata.present:
            recs.append(
                "Wikidata 항목이 없습니다. Wikidata에 엔티티를 등록하여 "
                "Knowledge Graph 인식을 강화하세요."
            )

        if not result.naver.encyclopedia_present:
            recs.append(
                "네이버 백과사전에 등록되어 있지 않습니다. 한국 시장 SEO를 위해 "
                "네이버 백과사전 등재를 검토하세요."
            )

        if result.naver.knowledge_in_count < 5:
            recs.append(
                "네이버 지식iN에 관련 콘텐츠가 부족합니다. Q&A 콘텐츠를 통해 "
                "브랜드 엔티티 인지도를 높이세요."
            )

        # Check social profile completeness
        attr_map = {a.name: a for a in kp.attributes}
        missing_social = []
        for soc in ["social_twitter", "social_facebook", "social_linkedin", "social_youtube"]:
            attr = attr_map.get(soc)
            if not attr or not attr.present:
                missing_social.append(soc.replace("social_", "").title())
        if missing_social:
            recs.append(
                f"소셜 프로필 연결 누락: {', '.join(missing_social)}. "
                "웹사이트 schema의 sameAs 속성에 소셜 프로필을 추가하세요."
            )

        if not recs:
            recs.append("Knowledge Graph 엔티티 상태가 양호합니다. 현재 수준을 유지하세요.")

        return recs
    # ------------------------------------------------------------------
    # Main orchestrator
    # ------------------------------------------------------------------

    async def analyze(
        self,
        entity_name: str,
        language: str = "en",
        include_wiki: bool = True,
        include_naver: bool = True,
    ) -> KnowledgeGraphResult:
        """Orchestrate full Knowledge Graph analysis."""
        result = KnowledgeGraphResult(entity=entity_name, language=language)
        logger.info("Starting Knowledge Graph analysis for '%s' (lang=%s)", entity_name, language)

        async with aiohttp.ClientSession() as session:
            # Step 1: Search entity on Google
            search_data = await self.search_entity(entity_name, language, session)

            # Step 2: Detect Knowledge Panel
            kp = self.detect_knowledge_panel(search_data)

            # Step 3: Extract attributes
            if kp.detected:
                kp.attributes = self.extract_attributes(kp, search_data.get("html", ""))
                kp.completeness_score = self.score_completeness(kp.attributes)

                # Detect entity type from attributes
                for attr in kp.attributes:
                    if attr.name == "type" and attr.present:
                        kp.entity_type = attr.value
                        break

            result.knowledge_panel = kp

            # Step 4: Wikipedia and Wikidata checks (parallel)
            if include_wiki:
                wiki_task = self.check_wikipedia(entity_name, language, session)
                wikidata_task = self.check_wikidata(entity_name, session)
                result.wikipedia, result.wikidata = await asyncio.gather(wiki_task, wikidata_task)

            # Step 5: Naver checks (parallel)
            if include_naver:
                enc_task = self.check_naver_encyclopedia(entity_name, session)
                kin_task = self.check_naver_knowledge_in(entity_name, session)
                enc_result, kin_result = await asyncio.gather(enc_task, kin_task)

                result.naver = NaverPresence(
                    encyclopedia_present=enc_result.get("present", False),
                    encyclopedia_url=enc_result.get("url"),
                    knowledge_in_present=kin_result.get("present", False),
                    knowledge_in_count=kin_result.get("count", 0),
                    knowledge_in_url=kin_result.get("url"),
                )

        # Step 6: Compute overall score
        scores = []
        if kp.detected:
            scores.append(kp.completeness_score * 0.35)
        else:
            scores.append(0)
        scores.append(20.0 if result.wikipedia.present else 0)
        scores.append(15.0 if result.wikidata.present else 0)
        scores.append(15.0 if result.naver.encyclopedia_present else 0)
        scores.append(15.0 if result.naver.knowledge_in_present else 0)
        result.overall_score = round(sum(scores), 1)

        # Step 7: Recommendations
        result.recommendations = self.generate_recommendations(result)

        logger.info("Analysis complete. Overall score: %.1f", result.overall_score)
        return result
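The overall-score weighting in `analyze` gives panel completeness up to 35 points and the four platform checks 20/15/15/15 points each. A standalone sketch of that arithmetic (function name and sample inputs are hypothetical):

```python
def overall_score(completeness: float, wikipedia: bool, wikidata: bool,
                  naver_enc: bool, naver_kin: bool) -> float:
    """Same weighting as analyze(): 35% panel + 20/15/15/15 platform points."""
    score = completeness * 0.35
    score += 20.0 if wikipedia else 0.0
    score += 15.0 if wikidata else 0.0
    score += 15.0 if naver_enc else 0.0
    score += 15.0 if naver_kin else 0.0
    return round(score, 1)

# 60% complete panel, Wikipedia + Wikidata present, no Naver presence
print(overall_score(60.0, True, True, False, False))  # → 56.0
```

A fully complete panel with all four platforms present reaches exactly 100.0, so the scale stays 0-100.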
# ---------------------------------------------------------------------------
# CLI display helpers
# ---------------------------------------------------------------------------


def display_result(result: KnowledgeGraphResult) -> None:
    """Display analysis result in a rich table."""
    console.print()
    console.print(f"[bold cyan]Knowledge Graph Analysis: {result.entity}[/bold cyan]")
    console.print(f"Language: {result.language} | Score: {result.overall_score}/100")
    console.print()

    # Knowledge Panel table
    kp = result.knowledge_panel
    table = Table(title="Knowledge Panel", show_header=True)
    table.add_column("Property", style="bold")
    table.add_column("Value")
    table.add_column("Status")

    table.add_row("Detected", str(kp.detected), "[green]OK[/]" if kp.detected else "[red]Missing[/]")
    table.add_row("Entity Type", kp.entity_type or "-", "[green]OK[/]" if kp.entity_type else "[yellow]Unknown[/]")
    table.add_row("Completeness", f"{kp.completeness_score}%", "[green]OK[/]" if kp.completeness_score >= 50 else "[red]Low[/]")

    for attr in kp.attributes:
        status = "[green]Present[/]" if attr.present else "[red]Missing[/]"
        table.add_row(f"  {attr.name}", attr.value or "-", status)

    console.print(table)
    console.print()

    # Platform presence table
    plat_table = Table(title="Platform Presence", show_header=True)
    plat_table.add_column("Platform", style="bold")
    plat_table.add_column("Present")
    plat_table.add_column("Details")

    plat_table.add_row(
        "Wikipedia",
        "[green]Yes[/]" if result.wikipedia.present else "[red]No[/]",
        result.wikipedia.url or "-",
    )
    plat_table.add_row(
        "Wikidata",
        "[green]Yes[/]" if result.wikidata.present else "[red]No[/]",
        result.wikidata.qid or "-",
    )
    plat_table.add_row(
        "Naver Encyclopedia",
        "[green]Yes[/]" if result.naver.encyclopedia_present else "[red]No[/]",
        result.naver.encyclopedia_url or "-",
    )
    plat_table.add_row(
        "Naver 지식iN",
        "[green]Yes[/]" if result.naver.knowledge_in_present else "[red]No[/]",
        f"{result.naver.knowledge_in_count} entries" if result.naver.knowledge_in_present else "-",
    )

    console.print(plat_table)
    console.print()

    # Recommendations
    console.print("[bold yellow]Recommendations:[/bold yellow]")
    for i, rec in enumerate(result.recommendations, 1):
        console.print(f"  {i}. {rec}")
    console.print()
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Knowledge Graph & Entity Analyzer",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("--entity", required=True, help="Entity name to analyze")
    parser.add_argument("--language", default="en", choices=["en", "ko", "ja", "zh"], help="Language (default: en)")
    parser.add_argument("--no-wiki", action="store_true", help="Skip Wikipedia/Wikidata check (included by default)")
    parser.add_argument("--no-naver", action="store_true", help="Skip Naver checks")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", type=str, help="Output file path")
    return parser.parse_args()
async def main() -> None:
    args = parse_args()

    analyzer = KnowledgeGraphAnalyzer()
    result = await analyzer.analyze(
        entity_name=args.entity,
        language=args.language,
        include_wiki=not args.no_wiki,
        include_naver=not args.no_naver,
    )

    if args.json:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            console.print(f"[green]Output saved to {args.output}[/green]")
        else:
            print(output)
    else:
        display_result(result)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                json.dump(result.to_dict(), f, ensure_ascii=False, indent=2)
            console.print(f"[green]Output saved to {args.output}[/green]")


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,9 @@
# 28-seo-knowledge-graph dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0