Add SEO skills 19-28, 31-32 with full Python implementations
12 new skills: Keyword Strategy, SERP Analysis, Position Tracking, Link Building, Content Strategy, E-Commerce SEO, KPI Framework, International SEO, AI Visibility, Knowledge Graph, Competitor Intel, and Crawl Budget. ~20K lines of Python across 25 domain scripts. Updated skill 11 pipeline table and repo CLAUDE.md. Enhanced skill 18 local SEO workflow from jamie.clinic audit. Note: Skill 26 hreflang_validator.py pending (content filter block). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CLAUDE.md
@@ -35,7 +35,7 @@ This is a Claude Skills collection repository containing:
| 09 | ourdigital-backoffice | Business document creation | "create proposal", "견적서" |
| 10 | ourdigital-skill-creator | Meta skill for creating skills | "create skill", "init skill" |

### SEO Tools (11-32)

| # | Skill | Purpose | Trigger |
|---|-------|---------|---------|
@@ -47,22 +47,20 @@ This is a Claude Skills collection repository containing:
| 16 | seo-schema-validator | Structured data validation | "validate schema", "JSON-LD" |
| 17 | seo-schema-generator | Schema markup creation | "generate schema", "create JSON-LD" |
| 18 | seo-local-audit | NAP, GBP, citations | "local SEO", "Google Business Profile" |
| 19 | seo-keyword-strategy | Keyword expansion, intent, clustering, gaps | "keyword research", "keyword strategy" |
| 20 | seo-serp-analysis | Google/Naver SERP features, competitor positions | "SERP analysis", "SERP features" |
| 21 | seo-position-tracking | Rank monitoring, visibility scores, alerts | "rank tracking", "position monitoring" |
| 22 | seo-link-building | Backlink audit, toxic links, link gaps | "backlink audit", "link building" |
| 23 | seo-content-strategy | Content audit, decay, briefs, clusters | "content strategy", "content audit" |
| 24 | seo-ecommerce | Product page audit, product schema | "e-commerce SEO", "product SEO" |
| 25 | seo-kpi-framework | Unified KPIs, health scores, ROI | "SEO KPI", "SEO performance" |
| 26 | seo-international | Hreflang, content parity, multi-language | "international SEO", "hreflang" |
| 27 | seo-ai-visibility | AI search citations, brand radar, SOV | "AI visibility", "AI search" |
| 28 | seo-knowledge-graph | Entity SEO, Knowledge Panel, PAA | "knowledge graph", "entity SEO" |
| 29 | seo-gateway-architect | Gateway page strategy | "SEO strategy", "gateway pages" |
| 30 | seo-gateway-builder | Gateway page content | "build gateway page" |
| 31 | seo-competitor-intel | Competitor profiling, benchmarking, threats | "competitor analysis", "competitive intel" |
| 32 | seo-crawl-budget | Log analysis, bot profiling, crawl waste | "crawl budget", "log analysis" |

### GTM/GA Tools (60-69)
@@ -209,9 +207,20 @@ our-claude-skills/
│ ├── 16-seo-schema-validator/
│ ├── 17-seo-schema-generator/
│ ├── 18-seo-local-audit/
│ ├── 19-seo-keyword-strategy/
│ ├── 20-seo-serp-analysis/
│ ├── 21-seo-position-tracking/
│ ├── 22-seo-link-building/
│ ├── 23-seo-content-strategy/
│ ├── 24-seo-ecommerce/
│ ├── 25-seo-kpi-framework/
│ ├── 26-seo-international/
│ ├── 27-seo-ai-visibility/
│ ├── 28-seo-knowledge-graph/
│ ├── 29-seo-gateway-architect/
│ ├── 30-seo-gateway-builder/
│ ├── 31-seo-competitor-intel/
│ ├── 32-seo-crawl-budget/
│ │
│ ├── 60-gtm-audit/
│ ├── 61-gtm-manager/
@@ -34,9 +34,38 @@ python scripts/seo_audit_orchestrator.py --url https://example.com --json
| 2 | On-Page SEO | `13-seo-on-page-audit/code/scripts/page_analyzer.py` |
| 3 | Core Web Vitals | `14-seo-core-web-vitals/code/scripts/pagespeed_client.py` |
| 4 | Schema Validation | `16-seo-schema-validator/code/scripts/schema_validator.py` |
| 5 | Local SEO | `18-seo-local-audit/` (prompt-driven — see Stage 5 notes below) |
| 6 | Search Console | `15-seo-search-console/code/scripts/gsc_client.py` |

## Stage 5: Local SEO — Key Requirements

Stage 5 is prompt-driven and requires **Business Identity extraction as a mandatory first step**:

1. Extract Korean name, English name, address, phone from website JSON-LD schema markup (`Organization`/`Hospital`/`LocalBusiness`)
2. Check website footer, contact page, and schema `sameAs` for GBP, Naver Place, and Kakao Map URLs
3. Use layered search fallback if listing URLs are not found on the website
4. Follow `18-seo-local-audit/code/CLAUDE.md` for the full workflow
5. **Korean market priorities**: GBP and Naver Smart Place are both Critical; Kakao Map is High; US-centric directories (Yelp, Yellow Pages) are Low
6. **Important**: GBP and Naver Map are JS-rendered. Report unfound listings as "not discoverable via web search" — not "does not exist"
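The layered search fallback in requirement 3 can be sketched as a small helper that returns queries in priority order. The function name and tuple shape are illustrative, not part of the repo's scripts; the actual skill issues these queries via WebSearch:

```python
def fallback_queries(korean_name, district, phone):
    """Yield (platform, query) pairs in the priority order used by Stage 5."""
    return [
        ("gbp",   f'"{korean_name}" "{district}" Google Maps'),
        ("gbp",   f'"{phone}" site:google.com/maps'),
        ("naver", f'"{korean_name}" site:map.naver.com'),
        ("naver", f'"{korean_name}" 네이버 지도 {district}'),
        ("kakao", f'"{korean_name}" site:place.map.kakao.com'),
    ]
```

Running the first match-producing query and stopping mirrors the "try in order, stop when found" rule in the skill's workflow.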

## Extended SEO Skills Pipeline

Beyond the 6 core audit stages, additional specialized skills are available for deeper analysis:

| Skill | Audit ID | Purpose | Command |
|-------|----------|---------|---------|
| 19 - Keyword Strategy | KW | Seed expansion, intent classification, keyword gaps | `/seo-keyword-strategy` |
| 20 - SERP Analysis | SERP | Google/Naver SERP features, competitor positions | `/seo-serp-analysis` |
| 21 - Position Tracking | RANK | Rank monitoring, visibility scores, alerts | `/seo-position-tracking` |
| 22 - Link Building | LINK | Backlink audit, toxic links, link gaps | `/seo-link-building` |
| 23 - Content Strategy | CONTENT | Content audit, decay detection, briefs | `/seo-content-strategy` |
| 24 - E-Commerce SEO | ECOM | Product page audit, product schema | `/seo-ecommerce` |
| 25 - SEO KPI Framework | KPI | Unified KPIs, health scores, ROI | `/seo-kpi-framework` |
| 26 - International SEO | INTL | Hreflang validation, content parity | `/seo-international` |
| 27 - AI Visibility | AI | AI search citations, brand radar, SOV | `/seo-ai-visibility` |
| 28 - Knowledge Graph | KG | Entity SEO, Knowledge Panel, PAA | `/seo-knowledge-graph` |
| 31 - Competitor Intel | COMP | Competitor profiling, benchmarking | `/seo-competitor-intel` |
| 32 - Crawl Budget | CRAWL | Log analysis, bot profiling, waste | `/seo-crawl-budget` |

## Health Score Weights

| Category | Weight |
@@ -62,10 +62,37 @@ python "$SKILLS/14-seo-core-web-vitals/code/scripts/pagespeed_client.py" --url $
# Stage 4: Schema Validation
python "$SKILLS/16-seo-schema-validator/code/scripts/schema_validator.py" --url $URL --json

# Stage 5: Local SEO (see detailed instructions below)
# Stage 6: Search Console (requires GSC API credentials)
```

### Stage 5: Local SEO — Detailed Instructions

Stage 5 is prompt-driven (no script). Follow this sequence:

1. **Extract Business Identity from website (MANDATORY FIRST)**
   - WebFetch the homepage and parse JSON-LD `<script type="application/ld+json">` tags
   - Extract from `Organization`, `Hospital`, or `LocalBusiness` schema: Korean name, English name, address, telephone
   - Check `sameAs` array for GBP, Naver Place, Kakao Map URLs

2. **Check website for listing links**
   - Scrape footer, contact page, about page for links matching:
     - GBP: `maps.app.goo.gl/*`, `google.com/maps/place/*`, `g.page/*`
     - Naver Place: `naver.me/*`, `map.naver.com/*/place/*`, `m.place.naver.com/*`
     - Kakao Map: `place.map.kakao.com/*`, `kko.to/*`
   - Check embedded iframes for Google Maps Place IDs or Naver Map embeds

3. **Layered search fallback (if links not found on website)**
   - GBP: Search `"[Korean Name]" "[district]" Google Maps`, then `"[phone]" site:google.com/maps`
   - Naver: Search `"[Korean Name]" site:map.naver.com`, then `"[Korean Name]" 네이버 지도 [district]`
   - Kakao: Search `"[Korean Name]" site:place.map.kakao.com`

4. **Follow `18-seo-local-audit/code/CLAUDE.md` workflow** for the full audit (Steps 2-7)

5. **Important language**: Distinguish **"not discoverable via web search"** from **"does not exist."** GBP and Naver Map are JS-rendered; WebFetch cannot extract their listing data. Absence in search results does not confirm absence of the listing.

6. **Korean market priorities**: GBP and Naver Smart Place are both Critical. Kakao Map is High. US-centric directories (Yelp, Yellow Pages) are Low priority for Korean businesses.
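The listing-URL patterns in step 2 translate directly into a small matcher over candidate hrefs. This is a sketch using plain regexes; `classify_listing_url` is a hypothetical helper, not one of the repo's scripts:

```python
import re

# Regex forms of the Stage 5 listing-URL patterns above.
LISTING_PATTERNS = {
    "gbp": [r"maps\.app\.goo\.gl/", r"google\.com/maps/place/", r"(?:^|//|\.)g\.page/"],
    "naver": [r"naver\.me/", r"map\.naver\.com/.+/place/", r"m\.place\.naver\.com/"],
    "kakao": [r"place\.map\.kakao\.com/", r"kko\.to/"],
}

def classify_listing_url(url):
    """Return 'gbp', 'naver', 'kakao', or None for a candidate link."""
    for platform, patterns in LISTING_PATTERNS.items():
        if any(re.search(p, url) for p in patterns):
            return platform
    return None
```

Any non-None result short-circuits the layered search fallback in step 3 for that platform.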

## Health Score (Weighted 0-100)

| Category | Weight |
@@ -2,109 +2,253 @@

## Overview

Local SEO auditor for Korean-market businesses with physical locations. Covers business identity extraction, GBP optimization, Naver Smart Place, Kakao Map, NAP consistency, local citations, and LocalBusiness schema validation.

## Workflow

### Step 0: Business Identity (MANDATORY FIRST STEP)

Before any audit work, establish the official business identity.

**Sources (in priority order):**
1. Website schema markup (JSON-LD `Organization`, `Hospital`, `LocalBusiness`) — the `name` field is authoritative
2. Contact page / About page
3. Footer (address, phone, social links)
4. User-provided information (known GBP URL, Naver Place URL, etc.)

**Data to collect:**

| Field | Example |
|-------|---------|
| Official name (Korean) | 제이미성형외과의원 |
| Official name (English) | Jamie Plastic Surgery Clinic |
| Brand/display name | Jamie Clinic |
| Website URL | https://www.jamie.clinic |
| Address (Korean) | 서울특별시 강남구 ... |
| Phone | 02-XXX-XXXX |
| Known GBP URL | (if available) |
| Known Naver Place URL | (if available) |
| Known Kakao Map URL | (if available) |

**How to extract:**

```
WebFetch homepage → parse JSON-LD script tags → extract name, address, telephone, sameAs
WebFetch /contact or /about → extract NAP from page content
Check footer for social links, map embeds, place listing URLs
```
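The JSON-LD extraction step above can be sketched with the standard library alone. This is a minimal illustration that assumes the homepage HTML has already been fetched; class and function names are hypothetical, not part of the skill's scripts:

```python
import json
from html.parser import HTMLParser

class JsonLdParser(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._buf = None
        self.blocks = []
    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buf = []
    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None
    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

def extract_identity(html):
    """Return name/address/telephone/sameAs from the first business-like schema block."""
    parser = JsonLdParser()
    parser.feed(html)
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than failing the audit
        if isinstance(data, dict) and data.get("@type") in ("Organization", "Hospital", "LocalBusiness"):
            return {key: data.get(key) for key in ("name", "address", "telephone", "sameAs")}
    return None
```

A production version would also handle `@graph` wrappers and `@type` arrays; this sketch only covers the flat single-object case.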

Look specifically for these URL patterns in `sameAs`, footer links, or embedded iframes:
- GBP: `maps.app.goo.gl/*`, `google.com/maps/place/*`, `g.page/*`
- Naver Place: `naver.me/*`, `map.naver.com/*/place/*`, `m.place.naver.com/*`
- Kakao Map: `place.map.kakao.com/*`, `kko.to/*`

### Step 1: Website NAP Extraction

Scrape header, footer, contact page, about page for NAP mentions. Cross-reference with schema markup. Establish the **canonical NAP** baseline (the single source of truth for this audit).

### Step 2: GBP Verification & Audit

**Layered discovery (try in order, stop when found):**
1. Use provided GBP URL (from Step 0 or user input)
2. Check website for GBP link (footer, contact page, schema `sameAs`, embedded Google Maps iframe with Place ID)
3. WebSearch: `"[Korean Name]" "[City/District]" Google Maps`
4. WebSearch: `"[English Name]" Google Maps [City]`
5. WebSearch: `"[exact phone number]" site:google.com/maps`

**Important**: Google Maps is JS-rendered — WebFetch cannot extract business data from the listing page itself. Use WebSearch to find the listing URL, then verify details via search result snippets.

**If found — audit checklist (score /10):**
- [ ] Business name matches canonical NAP
- [ ] Address is complete and accurate
- [ ] Phone number matches
- [ ] Business hours are current
- [ ] Primary + secondary categories are appropriate
- [ ] Business description is complete
- [ ] 10+ photos uploaded (exterior, interior, products/services)
- [ ] Posts are recent (within 7 days)
- [ ] Reviews are responded to
- [ ] Q&A section is active

**If NOT found after all attempts:** Report as **"not discoverable via web search"** — this is distinct from "does not exist." The listing may exist but be unfindable through non-JS search methods.

### Step 3: Naver Smart Place Verification & Audit

**Layered discovery (try in order, stop when found):**
1. Use provided Naver Place URL (from Step 0 or user input)
2. Check website for Naver Place link (footer, contact page, schema `sameAs`, `naver.me/*` or `map.naver.com/*/place/*` patterns)
3. WebSearch: `"[Korean Name]" site:map.naver.com`
4. WebSearch: `"[Korean Name]" 네이버 지도 [district]`
5. WebSearch: `"[Korean Name]" 네이버 스마트플레이스`
6. WebSearch: `"[exact phone number]" site:map.naver.com`

**Important**: Naver Map is JS-rendered — WebFetch cannot extract data from the listing page. Use WebSearch for discovery, verify via search result snippets.

**If found — audit checklist (score /10):**
- [ ] Business name matches canonical NAP
- [ ] Address is complete and accurate
- [ ] Phone number matches
- [ ] Business hours are current
- [ ] Place is "claimed" (owner-managed / 업주 등록)
- [ ] Keywords/tags are set
- [ ] Booking/reservation link present
- [ ] Recent blog reviews linked
- [ ] Photos uploaded and current
- [ ] Menu/service/price information present

**If NOT found after all attempts:** Report as **"not discoverable via web search"** (not "does not exist" or "not registered").

### Step 4: Kakao Map Verification

**Discovery:**
1. Use provided Kakao Map URL (from Step 0)
2. Check website for Kakao Map link (`place.map.kakao.com/*`, `kko.to/*`)
3. WebSearch: `"[Korean Name]" site:place.map.kakao.com`
4. WebSearch: `"[Korean Name]" 카카오맵 [district]`

**If found:** Verify NAP consistency against canonical NAP.

### Step 5: Citation Discovery

**Korean market platform priorities:**

| Platform | Priority | Market |
|----------|----------|--------|
| Google Business Profile | Critical | Global |
| Naver Smart Place (네이버 스마트플레이스) | Critical | Korea |
| Kakao Map (카카오맵) | High | Korea |
| Industry-specific directories | High | Varies |
| Apple Maps | Medium | Global |
| Bing Places | Low | Global |

**Korean medical/cosmetic industry directories:**
- 강남언니 (Gangnam Unni)
- 바비톡 (Babitalk)
- 성예사 (Sungyesa)
- 굿닥 (Goodoc)
- 똑닥 (Ddocdoc)
- 모두닥 (Modoodoc)
- 하이닥 (HiDoc)

**Discovery methods:**
- Phone number search across platforms
- Korean business name + district search
- English business name search
- Address search

### Step 6: NAP Consistency Report

Cross-reference all discovered sources against the canonical NAP from Step 1.

**Common inconsistency points to check:**
- Building/landmark names (e.g., "EHL빌딩" vs "엔와이빌딩") — the authoritative source is the **business registration certificate** (사업자등록증), not the website alone
- Phone format variations (02-XXX-XXXX vs +82-2-XXX-XXXX vs 02XXXXXXX)
- Address format (road-name vs lot-number / 도로명 vs 지번)
- Korean vs English name spelling variations
- Suite/floor number omissions
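Phone format variation is the most mechanical check in the list above. A minimal normalizer, assuming Korean numbers and a simple `+82` prefix rule; function names are illustrative:

```python
import re

def normalize_phone_kr(phone):
    """Reduce Korean phone formats (02-XXX-XXXX, +82-2-..., 02XXXXXXXX) to bare digits with a leading 0."""
    digits = re.sub(r"\D", "", phone)       # strip hyphens, spaces, parentheses, '+'
    if digits.startswith("82"):
        digits = "0" + digits[2:]           # +82-2-... -> 02...
    return digits

def phones_match(a, b):
    """True if two phone strings refer to the same number after normalization."""
    return normalize_phone_kr(a) == normalize_phone_kr(b)
```

Name and address comparisons need fuzzier matching (romanization, road-name vs lot-number addresses) and stay manual in this skill.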

### Step 7: LocalBusiness Schema Validation

Validate JSON-LD completeness:
- @type (LocalBusiness, Hospital, or appropriate subtype)
- name (Korean and/or English)
- address (PostalAddress with Korean format)
- telephone
- openingHours / openingHoursSpecification
- geo (GeoCoordinates — latitude, longitude)
- sameAs (should include GBP, Naver Place, Kakao Map, social profiles)
- url
- image

Use schema generator skill (17) for creating/fixing markup.
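The completeness check above can be sketched as a field walk over the parsed JSON-LD. `missing_fields` is a hypothetical helper; it treats `openingHoursSpecification` as satisfying the `openingHours` requirement, per the list above:

```python
# Fields required by Step 7, in report order.
REQUIRED = ["@type", "name", "address", "telephone", "openingHours",
            "geo", "sameAs", "url", "image"]

# Accepted alternatives for a required field.
ALTERNATIVES = {"openingHours": ["openingHoursSpecification"]}

def missing_fields(schema):
    """Return the required fields absent (or empty) in a parsed JSON-LD dict."""
    missing = []
    for field in REQUIRED:
        candidates = [field] + ALTERNATIVES.get(field, [])
        if not any(schema.get(c) for c in candidates):
            missing.append(field)
    return missing
```

The returned list feeds the "Missing fields" line of the report template.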

## Scoring

| Component | Weight | Max Score |
|-----------|--------|-----------|
| Business Identity completeness | 5% | /10 |
| NAP Consistency | 20% | /10 |
| GBP Optimization | 20% | /10 |
| Naver Smart Place | 20% | /10 |
| Kakao Map presence | 10% | /10 |
| Citations (directories) | 10% | /10 |
| LocalBusiness Schema | 15% | /10 |

**Overall Local SEO Score** = weighted average, normalized to /100.
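Given the weights above, the overall score is a straightforward weighted sum. A sketch, assuming each component score uses the 0-10 scale from the table; the component keys are illustrative names for the table rows:

```python
# Weights from the Scoring table (must sum to 1.0).
WEIGHTS = {
    "business_identity": 0.05,
    "nap_consistency": 0.20,
    "gbp": 0.20,
    "naver_smart_place": 0.20,
    "kakao_map": 0.10,
    "citations": 0.10,
    "schema": 0.15,
}

def overall_local_seo_score(scores):
    """scores: component -> 0..10. Returns the weighted score on a 0-100 scale."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(WEIGHTS[c] * s for c, s in scores.items()) * 10, 1)
```

A perfect 10 on every component yields exactly 100; zeroing GBP alone costs 20 points, matching its 20% weight.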

## Output Format

```markdown
## Local SEO Audit: [Business Name]
**Date**: YYYY-MM-DD
**Website**: [URL]

### Business Identity
| Field | Value |
|-------|-------|
| Korean Name | ... |
| English Name | ... |
| Brand Name | ... |
| Address | ... |
| Phone | ... |

### NAP Consistency: X/10
| Source | Name | Address | Phone | Status |
|--------|------|---------|-------|--------|
| Website | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| GBP | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| Naver Place | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| Kakao Map | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |

### GBP Optimization: X/10
- [x] Completed items
- [ ] Missing items
**GBP URL**: [URL or "not discoverable"]

### Naver Smart Place: X/10
- [x] Completed items
- [ ] Missing items
**Naver Place URL**: [URL or "not discoverable"]

### Kakao Map: X/10
**Status**: Found/Not discoverable
**Kakao Map URL**: [URL or "not discoverable"]

### Citations: X/10
| Platform | Found | NAP Match |
|----------|-------|-----------|
| 강남언니 | Yes/No | OK/Issue |
| ... | | |

### LocalBusiness Schema: X/10
- Present: Yes/No
- Valid: Yes/No
- Missing fields: [list]

### Overall Score: XX/100 (Grade)

### Priority Actions
1. [Highest impact recommendation]
2. ...
```
## Common Issues

| Issue | Impact | Fix |
|-------|--------|-----|
| NAP inconsistency | High | Update all directories to match canonical NAP |
| Missing Naver Smart Place | Critical | Register and claim via smartplace.naver.com |
| Unclaimed Naver Place | High | Claim ownership via 네이버 스마트플레이스 |
| Missing GBP listing | Critical | Create via business.google.com |
| Building name mismatch | Medium | Align to business registration certificate |
| No LocalBusiness schema | Medium | Add JSON-LD markup with sameAs links |
| Missing GeoCoordinates | Medium | Add lat/lng to schema |
| No sameAs in schema | Medium | Add GBP, Naver, Kakao, social URLs |
## Notes

- GBP and Naver Map are JS-rendered — WebFetch cannot extract listing data directly. Always use WebSearch for discovery.
- "Not discoverable via web search" != "does not exist." Always use this precise language.
- For Korean businesses, Naver Smart Place is as important as GBP (often more so for domestic traffic).
- Citation discovery is limited to publicly searchable data.

## Notion Output (Required)
@@ -123,20 +267,13 @@ Required properties:
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Local SEO |
| Priority | Select | Critical, High, Medium, Low |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: LOCAL-YYYYMMDD-NNN |
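The `LOCAL-YYYYMMDD-NNN` Audit ID can be generated mechanically. A minimal sketch; the helper name is illustrative and the sequence number `NNN` is assumed to be supplied by the caller:

```python
from datetime import date

def audit_id(seq, today=None):
    """Build a LOCAL-YYYYMMDD-NNN audit ID (NNN is a zero-padded sequence number)."""
    d = (today or date.today()).strftime("%Y%m%d")
    return f"LOCAL-{d}-{seq:03d}"
```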

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., SEO Audit, GBP, NAP, Schema Markup)
- URLs and code remain unchanged
@@ -1,125 +1,239 @@
|
|||||||
---
|
---
|
||||||
name: seo-local-audit
|
name: seo-local-audit
|
||||||
description: |
|
description: |
|
||||||
Local business SEO auditor for NAP consistency, Google Business Profile, and citations.
|
Local business SEO auditor for Korean-market businesses. Covers business identity extraction,
|
||||||
Triggers: local SEO, NAP audit, Google Business Profile, GBP optimization, local citations.
|
NAP consistency, Google Business Profile, Naver Smart Place, Kakao Map, local citations,
|
||||||
|
and LocalBusiness schema validation.
|
||||||
|
Triggers: local SEO, NAP audit, Google Business Profile, GBP optimization, local citations,
|
||||||
|
네이버 스마트플레이스, 카카오맵, 로컬 SEO.
|
||||||
---
|
---
|
||||||
|
|
||||||
# SEO Local Audit
|
# SEO Local Audit
|
||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
|
|
||||||
Audit local business SEO: NAP (Name, Address, Phone) consistency, Google Business Profile optimization, local citations, and LocalBusiness schema markup.
|
Audit local business SEO for Korean-market businesses: business identity extraction, NAP consistency, GBP optimization, Naver Smart Place, Kakao Map, local citations, and LocalBusiness schema markup.
|
||||||
|
|
||||||
## Core Capabilities
|
## Core Capabilities
|
||||||
|
|
||||||
1. **NAP Consistency** - Cross-platform verification
|
1. **Business Identity** - Extract official names, address, phone from website schema/content
|
||||||
2. **GBP Optimization** - Profile completeness check
|
2. **NAP Consistency** - Cross-platform verification against canonical NAP
3. **GBP Optimization** - Layered discovery + profile completeness audit
4. **Naver Smart Place** - Layered discovery + listing completeness audit
5. **Kakao Map** - Presence verification + NAP check
6. **Citation Audit** - Korean-first directory presence
7. **Schema Validation** - LocalBusiness JSON-LD markup

## MCP Tool Usage

```
mcp__firecrawl__scrape: Extract NAP and schema from website
mcp__perplexity__search: Find citations, GBP, Naver Place listings
mcp__notion__create-page: Save audit findings
```

## Workflow

### Step 0: Business Identity (MANDATORY FIRST STEP)

Before any audit, establish the official business identity.

**Sources (in priority order):**

1. Website schema markup (JSON-LD `Organization`, `Hospital`, `LocalBusiness`) — `name` field is authoritative
2. Contact page / About page
3. Footer (address, phone, social links)
4. User-provided information

**Data to collect:**

| Field | Example |
|-------|---------|
| Official name (Korean) | 제이미성형외과의원 |
| Official name (English) | Jamie Plastic Surgery Clinic |
| Brand/display name | Jamie Clinic |
| Website URL | https://www.jamie.clinic |
| Address (Korean) | 서울특별시 강남구 ... |
| Phone | 02-XXX-XXXX |
| Known GBP URL | (if available) |
| Known Naver Place URL | (if available) |
| Known Kakao Map URL | (if available) |

Look for these URL patterns in `sameAs`, footer links, or embedded iframes:

- GBP: `maps.app.goo.gl/*`, `google.com/maps/place/*`, `g.page/*`
- Naver Place: `naver.me/*`, `map.naver.com/*/place/*`, `m.place.naver.com/*`
- Kakao Map: `place.map.kakao.com/*`, `kko.to/*`
### Step 1: Website NAP Extraction

Scrape header, footer, contact page, about page. Cross-reference with schema markup. Establish the **canonical NAP** baseline.

### Step 2: GBP Verification & Audit

**Layered discovery (try in order, stop when found):**

1. Use provided GBP URL (from Step 0 or user input)
2. Check website for GBP link (footer, contact, schema `sameAs`, embedded Google Maps iframe)
3. Search: `"[Korean Name]" "[City/District]" Google Maps`
4. Search: `"[English Name]" Google Maps [City]`
5. Search: `"[exact phone number]" site:google.com/maps`

**Important**: Google Maps is JS-rendered — scraping tools cannot extract business data. Use search for discovery, verify via search result snippets.

**If found — audit checklist (score /10):**

- [ ] Business name matches canonical NAP
- [ ] Address is complete and accurate
- [ ] Phone number matches
- [ ] Business hours are current
- [ ] Primary + secondary categories appropriate
- [ ] Business description complete
- [ ] 10+ photos uploaded
- [ ] Posts are recent (within 7 days)
- [ ] Reviews are responded to
- [ ] Q&A section is active

**If NOT found:** Report as **"not discoverable via web search"** (distinct from "does not exist").

### Step 3: Naver Smart Place Verification & Audit

**Layered discovery (try in order, stop when found):**

1. Use provided Naver Place URL (from Step 0 or user input)
2. Check website for Naver Place link (footer, contact, schema `sameAs`)
3. Search: `"[Korean Name]" site:map.naver.com`
4. Search: `"[Korean Name]" 네이버 지도 [district]`
5. Search: `"[Korean Name]" 네이버 스마트플레이스`
6. Search: `"[exact phone number]" site:map.naver.com`

**Important**: Naver Map is JS-rendered — scraping tools cannot extract data. Use search for discovery, verify via snippets.

**If found — audit checklist (score /10):**

- [ ] Business name matches canonical NAP
- [ ] Address is complete and accurate
- [ ] Phone number matches
- [ ] Business hours are current
- [ ] Place is "claimed" (owner-managed / 업주 등록)
- [ ] Keywords/tags are set
- [ ] Booking/reservation link present
- [ ] Recent blog reviews linked
- [ ] Photos uploaded and current
- [ ] Menu/service/price information present

**If NOT found:** Report as **"not discoverable via web search"** (not "does not exist" or "not registered").

### Step 4: Kakao Map Verification

**Discovery:**

1. Use provided Kakao Map URL (from Step 0)
2. Check website for Kakao Map link (`place.map.kakao.com/*`, `kko.to/*`)
3. Search: `"[Korean Name]" site:place.map.kakao.com`
4. Search: `"[Korean Name]" 카카오맵 [district]`

**If found:** Verify NAP consistency against canonical NAP.

### Step 5: Citation Discovery

**Korean market platform priorities:**

| Platform | Priority | Market |
|----------|----------|--------|
| Google Business Profile | Critical | Global |
| Naver Smart Place (네이버 스마트플레이스) | Critical | Korea |
| Kakao Map (카카오맵) | High | Korea |
| Industry-specific directories | High | Varies |
| Apple Maps | Medium | Global |
| Bing Places | Low | Global |

**Korean medical/cosmetic industry directories:**

- 강남언니 (Gangnam Unni)
- 바비톡 (Babitalk)
- 성예사 (Sungyesa)
- 굿닥 (Goodoc)
- 똑닥 (Ddocdoc)
- 모두닥 (Modoodoc)
- 하이닥 (HiDoc)

### Step 6: NAP Consistency Report

Cross-reference all sources against canonical NAP.

**Common inconsistency points:**

- Building/landmark names — authoritative source is the **business registration certificate** (사업자등록증)
- Phone format variations (02-XXX-XXXX vs +82-2-XXX-XXXX)
- Address format (road-name vs lot-number / 도로명 vs 지번)
- Korean vs English name spelling variations
- Suite/floor number omissions
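The phone-format variation above is mechanical to neutralize before comparing sources. A minimal normalization sketch (the function name is illustrative, not from the skill's scripts):

```python
import re

def normalize_phone(phone: str) -> str:
    """Normalize Korean phone formats so 02-XXX-XXXX and +82-2-XXX-XXXX compare equal."""
    digits = re.sub(r"\D", "", phone)  # strip dashes, spaces, and the leading +
    # Convert the international prefix (+82) back to the domestic leading zero.
    if digits.startswith("82"):
        digits = "0" + digits[2:]
    return digits

print(normalize_phone("02-123-4567") == normalize_phone("+82-2-123-4567"))  # True
```

Comparing normalized digit strings rather than raw text keeps formatting-only differences out of the mismatch report, so only genuine NAP conflicts are flagged.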
### Step 7: LocalBusiness Schema Validation

Validate JSON-LD completeness: @type, name, address, telephone, openingHours, geo (GeoCoordinates), sameAs (GBP, Naver, Kakao, social), url, image.
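The completeness check above can be sketched in a few lines, assuming the JSON-LD block has already been parsed into a dict (helper name illustrative):

```python
REQUIRED_FIELDS = [
    "@type", "name", "address", "telephone", "openingHours",
    "geo", "sameAs", "url", "image",
]

def missing_schema_fields(schema: dict) -> list[str]:
    """Return LocalBusiness JSON-LD fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not schema.get(f)]

sample = {"@type": "LocalBusiness", "name": "Jamie Clinic", "url": "https://www.jamie.clinic"}
print(missing_schema_fields(sample))
```

The missing-field list feeds directly into the "Missing fields" line of the output format below.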
## Scoring

| Component | Weight | Max Score |
|-----------|--------|-----------|
| Business Identity completeness | 5% | /10 |
| NAP Consistency | 20% | /10 |
| GBP Optimization | 20% | /10 |
| Naver Smart Place | 20% | /10 |
| Kakao Map presence | 10% | /10 |
| Citations (directories) | 10% | /10 |
| LocalBusiness Schema | 15% | /10 |

**Overall Local SEO Score** = weighted average, normalized to /100.
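The weighting above can be sketched directly; the component keys here are illustrative names, not identifiers from the scripts.

```python
# Weights mirror the scoring table; they sum to 1.0.
WEIGHTS = {
    "business_identity": 0.05,
    "nap_consistency": 0.20,
    "gbp": 0.20,
    "naver_place": 0.20,
    "kakao_map": 0.10,
    "citations": 0.10,
    "schema": 0.15,
}

def overall_score(component_scores: dict[str, float]) -> float:
    """Weighted average of /10 component scores, normalized to /100."""
    total = sum(WEIGHTS[k] * component_scores.get(k, 0.0) for k in WEIGHTS)
    return round(total * 10, 1)

print(overall_score({k: 10.0 for k in WEIGHTS}))  # 100.0
```

Because the weights sum to 1.0, a site scoring 10/10 on every component lands exactly at 100.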
## Output Format

```markdown
## Local SEO Audit: [Business]

### Business Identity

| Field | Value |
|-------|-------|
| Korean Name | ... |
| English Name | ... |
| Address | ... |
| Phone | ... |

### NAP Consistency: X/10

| Source | Name | Address | Phone | Status |
|--------|------|---------|-------|--------|
| Website | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| GBP | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| Naver Place | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |
| Kakao Map | OK/Issue | OK/Issue | OK/Issue | Match/Mismatch |

### GBP Score: X/10
[Checklist results]

### Naver Smart Place: X/10
[Checklist results]

### Kakao Map: X/10
[Status + NAP check]

### Citations: X/10

| Platform | Found | NAP Match |
|----------|-------|-----------|
| ... | | |

### LocalBusiness Schema: X/10
- Present: Yes/No
- Valid: Yes/No
- Missing fields: [list]

### Overall Score: XX/100 (Grade)

### Priority Actions
1. [Recommendations]
```

## Notes

- GBP and Naver Map are JS-rendered — scraping tools cannot extract listing data. Always use search for discovery.
- "Not discoverable via web search" != "does not exist." Always use this precise language.
- For Korean businesses, Naver Smart Place is as important as GBP (often more so for domestic traffic).

## Notion Output (Required)

All audit reports MUST be saved to OurDigital SEO Audit Log:

- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (Local SEO), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: LOCAL-YYYYMMDD-NNN
custom-skills/19-seo-keyword-strategy/code/CLAUDE.md (new file, 139 lines)
# CLAUDE.md

## Overview

Keyword strategy and research tool for SEO campaigns. Expands seed keywords via Ahrefs APIs, classifies search intent, clusters topics, performs competitor keyword gap analysis, and supports Korean market keyword discovery including Naver autocomplete.

## Quick Start

```bash
# Install dependencies
pip install -r scripts/requirements.txt

# Keyword research from seed keyword
python scripts/keyword_researcher.py --keyword "치과 임플란트" --country kr --json

# Keyword gap analysis vs competitor
python scripts/keyword_gap_analyzer.py --target https://example.com --competitor https://competitor.com --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `keyword_researcher.py` | Expand seed keywords, classify intent, cluster topics | Keyword list with volume, KD, intent, clusters |
| `keyword_gap_analyzer.py` | Find competitor keyword gaps | Gap keywords with opportunity scores |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Keyword Researcher

```bash
# Basic expansion
python scripts/keyword_researcher.py --keyword "dental implant" --json

# Korean market with suffix expansion
python scripts/keyword_researcher.py --keyword "치과 임플란트" --country kr --korean-suffixes --json

# With volume-by-country comparison
python scripts/keyword_researcher.py --keyword "dental implant" --country kr --compare-global --json

# Output to file
python scripts/keyword_researcher.py --keyword "치과 임플란트" --country kr --output report.json
```

**Capabilities**:
- Seed keyword expansion (matching terms, related terms, search suggestions)
- Korean suffix expansion (추천, 가격, 후기, 잘하는곳, 부작용, 전후)
- Search intent classification (informational/navigational/commercial/transactional)
- Keyword clustering into topic groups
- Volume-by-country comparison (Korea vs global)
- Keyword difficulty scoring
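The Korean suffix expansion is the simplest of the capabilities above: each seed is combined with common modifier suffixes before volume lookup. A minimal sketch (the helper name is illustrative; `keyword_researcher.py` may implement it differently):

```python
# Common Korean modifier suffixes: recommendation, price, reviews,
# "good at it" places, side effects, before/after.
KOREAN_SUFFIXES = ["추천", "가격", "후기", "잘하는곳", "부작용", "전후"]

def expand_korean(seed: str) -> list[str]:
    """Append common Korean modifier suffixes to a seed keyword."""
    return [f"{seed} {suffix}" for suffix in KOREAN_SUFFIXES]

print(expand_korean("치과 임플란트"))
```

Each expanded phrase is then sent through the same volume/KD lookup as the seed, which is why `--korean-suffixes` multiplies the candidate set rather than the API surface.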
## Keyword Gap Analyzer

```bash
# Find gaps vs one competitor
python scripts/keyword_gap_analyzer.py --target https://example.com --competitor https://competitor.com --json

# Multiple competitors
python scripts/keyword_gap_analyzer.py --target https://example.com --competitor https://comp1.com --competitor https://comp2.com --json

# Filter by minimum volume
python scripts/keyword_gap_analyzer.py --target https://example.com --competitor https://competitor.com --min-volume 100 --json
```

**Capabilities**:
- Identify keywords competitors rank for but target doesn't
- Opportunity scoring based on volume, KD, and competitor positions
- Segment gaps by intent type
- Prioritize low-KD high-volume opportunities
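One way to combine volume, KD, and competitor positions into a single number is sketched below. This is an illustrative formula only; the actual scoring inside `keyword_gap_analyzer.py` may differ.

```python
def opportunity_score(volume: int, kd: float, best_competitor_position: int) -> float:
    """
    Illustrative scoring sketch: reward volume, penalize difficulty, and boost
    keywords where a competitor already ranks well (proof the SERP is winnable).
    """
    volume_factor = volume ** 0.5           # diminishing returns on raw volume
    difficulty_factor = 1 / (1 + kd / 10)   # KD 0 -> 1.0, KD 90 -> 0.1
    position_factor = 1.5 if best_competitor_position <= 3 else 1.0
    return round(volume_factor * difficulty_factor * position_factor, 2)

print(opportunity_score(100, 0, 1))   # 15.0
print(opportunity_score(100, 90, 10))  # 1.0
```

The square root on volume keeps one huge-volume keyword from drowning out a cluster of mid-volume, low-KD gaps, which matches the "prioritize low-KD high-volume opportunities" intent above.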
## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `keywords-explorer-overview` | Get keyword metrics (volume, KD, CPC) |
| `keywords-explorer-matching-terms` | Find matching keyword variations |
| `keywords-explorer-related-terms` | Discover semantically related keywords |
| `keywords-explorer-search-suggestions` | Get autocomplete suggestions |
| `keywords-explorer-volume-by-country` | Compare volume across countries |
| `keywords-explorer-volume-history` | Track volume trends over time |
| `site-explorer-organic-keywords` | Get competitor keyword rankings |

## Output Format

All scripts support `--json` flag for structured output:

```json
{
  "seed_keyword": "치과 임플란트",
  "country": "kr",
  "total_keywords": 150,
  "clusters": [
    {
      "topic": "임플란트 가격",
      "keywords": [...],
      "total_volume": 12000
    }
  ],
  "keywords": [
    {
      "keyword": "치과 임플란트 가격",
      "volume": 5400,
      "kd": 32,
      "cpc": 2.5,
      "intent": "commercial",
      "cluster": "임플란트 가격"
    }
  ],
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Keyword Research |
| Priority | Select | Based on opportunity score |
| Found Date | Date | Research date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: KW-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Keyword Difficulty, Search Volume, CPC)
- URLs and code remain unchanged
scripts/base_client.py (new file, 207 lines)
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
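`batch_requests` expects zero-argument callables, so callers typically build them with lambdas in a loop. The late-binding closure pitfall matters there: a bare `lambda: fetch(n)` would capture the final loop value in every closure. A self-contained sketch of the correct pattern (`fetch` is a stand-in for a real rate-limited call):

```python
import asyncio

async def fetch(n: int) -> int:
    await asyncio.sleep(0)  # stand-in for a rate-limited API call
    return n * 2

async def main() -> list[int]:
    # Bind the loop variable via a default argument; each closure
    # then carries its own value of n.
    requests = [lambda n=n: fetch(n) for n in range(3)]
    return await asyncio.gather(*(req() for req in requests))

print(asyncio.run(main()))  # [0, 2, 4]
```

The same list of callables can be handed to `BaseAsyncClient.batch_requests`, which wraps each call with the semaphore, token bucket, and retry logic above.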
scripts/keyword_gap_analyzer.py (new file, 584 lines)
|
"""
|
||||||
|
Keyword Gap Analyzer - Competitor keyword gap analysis with opportunity scoring
|
||||||
|
===============================================================================
|
||||||
|
Purpose: Identify keywords competitors rank for but target site doesn't,
|
||||||
|
score opportunities, and prioritize by volume/difficulty ratio.
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Optional
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Logging
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
logger = logging.getLogger("keyword_gap_analyzer")
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Intent classification patterns (shared with keyword_researcher)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
INTENT_PATTERNS: dict[str, list[str]] = {
|
||||||
|
"transactional": [
|
||||||
|
r"구매|구입|주문|buy|order|purchase|shop|deal|discount|coupon|할인|쿠폰",
|
||||||
|
r"예약|booking|reserve|sign\s?up|register|등록|신청",
|
||||||
|
],
|
||||||
|
"commercial": [
|
||||||
|
r"가격|비용|얼마|price|cost|pricing|fee|요금",
|
||||||
|
r"추천|best|top\s?\d|review|비교|compare|vs|versus|후기|리뷰|평점|평가",
|
||||||
|
r"잘하는곳|잘하는|맛집|업체|병원|추천\s?병원",
|
||||||
|
],
|
||||||
|
"navigational": [
|
||||||
|
r"^(www\.|http|\.com|\.co\.kr|\.net)",
|
||||||
|
r"공식|official|login|로그인|홈페이지|사이트|website",
|
||||||
|
r"고객센터|contact|support|customer\s?service",
|
||||||
|
],
|
||||||
|
"informational": [
|
||||||
|
r"방법|how\s?to|what\s?is|why|when|where|who|which",
|
||||||
|
r"뜻|의미|정의|definition|meaning|guide|tutorial",
|
||||||
|
r"효과|부작용|증상|원인|차이|종류|type|cause|symptom|effect",
|
||||||
|
r"전후|before\s?and\s?after|결과|result",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Dataclasses
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class OrganicKeyword:
|
||||||
|
"""A keyword that a domain ranks for organically."""
|
||||||
|
|
||||||
|
keyword: str
|
||||||
|
position: int = 0
|
||||||
|
volume: int = 0
|
||||||
|
kd: float = 0.0
|
||||||
|
cpc: float = 0.0
|
||||||
|
url: str = ""
|
||||||
|
traffic: int = 0
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class GapKeyword:
|
||||||
|
"""A keyword gap between target and competitor(s)."""
|
||||||
|
|
||||||
|
keyword: str
|
||||||
|
volume: int = 0
|
||||||
|
kd: float = 0.0
|
||||||
|
cpc: float = 0.0
|
||||||
|
intent: str = "informational"
|
||||||
|
opportunity_score: float = 0.0
|
||||||
|
competitor_positions: dict[str, int] = field(default_factory=dict)
|
||||||
|
competitor_urls: dict[str, str] = field(default_factory=dict)
|
||||||
|
avg_competitor_position: float = 0.0
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
return asdict(self)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class GapAnalysisResult:
|
||||||
|
"""Complete gap analysis result."""
|
||||||
|
|
||||||
|
target: str
|
||||||
|
competitors: list[str] = field(default_factory=list)
|
||||||
|
country: str = "kr"
|
||||||
|
total_gaps: int = 0
|
||||||
|
total_opportunity_volume: int = 0
|
||||||
|
gaps_by_intent: dict[str, int] = field(default_factory=dict)
|
||||||
|
top_opportunities: list[GapKeyword] = field(default_factory=list)
|
||||||
|
all_gaps: list[GapKeyword] = field(default_factory=list)
|
||||||
|
target_keyword_count: int = 0
|
||||||
|
competitor_keyword_counts: dict[str, int] = field(default_factory=dict)
|
||||||
|
timestamp: str = ""
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
return {
|
||||||
|
"target": self.target,
|
||||||
|
"competitors": self.competitors,
|
||||||
|
"country": self.country,
|
||||||
|
"total_gaps": self.total_gaps,
|
||||||
|
"total_opportunity_volume": self.total_opportunity_volume,
|
||||||
|
"gaps_by_intent": self.gaps_by_intent,
|
||||||
|
"top_opportunities": [g.to_dict() for g in self.top_opportunities],
|
||||||
|
"all_gaps": [g.to_dict() for g in self.all_gaps],
|
||||||
|
"target_keyword_count": self.target_keyword_count,
|
||||||
|
"competitor_keyword_counts": self.competitor_keyword_counts,
|
||||||
|
"timestamp": self.timestamp,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# MCP Helper
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def call_mcp_tool(tool_name: str, params: dict) -> dict:
|
||||||
|
"""
|
||||||
|
Call an Ahrefs MCP tool and return parsed JSON response.
|
||||||
|
|
||||||
|
In production this delegates to the MCP bridge. For standalone usage
|
||||||
|
it invokes the Claude CLI with the appropriate tool call.
|
||||||
|
"""
|
||||||
|
logger.info(f"Calling MCP tool: {tool_name} with params: {json.dumps(params, ensure_ascii=False)}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
cmd = [
|
||||||
|
"claude",
|
||||||
|
"--print",
|
||||||
|
"--output-format", "json",
|
||||||
|
"-p",
|
||||||
|
(
|
||||||
|
f"Call the tool mcp__claude_ai_Ahrefs__{tool_name} with these parameters: "
|
||||||
|
f"{json.dumps(params, ensure_ascii=False)}. Return ONLY the raw JSON result."
|
||||||
|
),
|
||||||
|
]
|
||||||
|
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
|
||||||
|
|
||||||
|
if result.returncode != 0:
|
||||||
|
logger.warning(f"MCP tool {tool_name} returned non-zero exit code: {result.returncode}")
|
||||||
|
logger.debug(f"stderr: {result.stderr}")
|
||||||
|
return {"error": result.stderr, "keywords": [], "items": []}
|
||||||
|
|
||||||
|
try:
|
||||||
|
return json.loads(result.stdout)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
return {"raw": result.stdout, "keywords": [], "items": []}
|
||||||
|
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
logger.error(f"MCP tool {tool_name} timed out")
|
||||||
|
return {"error": "timeout", "keywords": [], "items": []}
|
||||||
|
except FileNotFoundError:
|
||||||
|
logger.warning("Claude CLI not found - returning empty result for standalone testing")
|
||||||
|
return {"keywords": [], "items": []}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Utility functions
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def extract_domain(url: str) -> str:
|
||||||
|
"""Extract clean domain from URL."""
|
||||||
|
if not url.startswith(("http://", "https://")):
|
||||||
|
url = f"https://{url}"
|
||||||
|
parsed = urlparse(url)
|
||||||
|
domain = parsed.netloc or parsed.path
|
||||||
|
domain = domain.lower().strip("/")
|
||||||
|
if domain.startswith("www."):
|
||||||
|
domain = domain[4:]
|
||||||
|
return domain
|
||||||
|
|
||||||
|
|
||||||
|
def classify_intent(keyword: str) -> str:
|
||||||
|
"""Classify search intent based on keyword patterns."""
|
||||||
|
keyword_lower = keyword.lower().strip()
|
||||||
|
for intent, patterns in INTENT_PATTERNS.items():
|
||||||
|
for pattern in patterns:
|
||||||
|
if re.search(pattern, keyword_lower, re.IGNORECASE):
|
||||||
|
return intent
|
||||||
|
return "informational"
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
# KeywordGapAnalyzer
# ---------------------------------------------------------------------------


class KeywordGapAnalyzer:
    """Analyze keyword gaps between a target site and its competitors."""

    def __init__(self, country: str = "kr", min_volume: int = 0):
        self.country = country
        self.min_volume = min_volume

    def get_organic_keywords(self, domain: str, limit: int = 1000) -> list[OrganicKeyword]:
        """
        Fetch organic keywords for a domain via Ahrefs site-explorer-organic-keywords.
        Returns a list of OrganicKeyword entries.
        """
        clean_domain = extract_domain(domain)
        logger.info(f"Fetching organic keywords for: {clean_domain} (limit={limit})")

        result = call_mcp_tool("site-explorer-organic-keywords", {
            "target": clean_domain,
            "country": self.country,
            "limit": limit,
            "mode": "domain",
        })

        keywords: list[OrganicKeyword] = []
        for item in result.get("keywords", result.get("items", [])):
            if not isinstance(item, dict):
                continue
            kw = OrganicKeyword(
                keyword=item.get("keyword", item.get("term", "")),
                position=int(item.get("position", item.get("rank", 0)) or 0),
                volume=int(item.get("volume", item.get("search_volume", 0)) or 0),
                kd=float(item.get("keyword_difficulty", item.get("kd", 0)) or 0),
                cpc=float(item.get("cpc", item.get("cost_per_click", 0)) or 0),
                url=item.get("url", item.get("best_position_url", "")),
                traffic=int(item.get("traffic", item.get("estimated_traffic", 0)) or 0),
            )
            if kw.keyword:
                keywords.append(kw)

        logger.info(f"Found {len(keywords)} organic keywords for {clean_domain}")
        return keywords

    def find_gaps(
        self,
        target_keywords: list[OrganicKeyword],
        competitor_keyword_sets: dict[str, list[OrganicKeyword]],
    ) -> list[GapKeyword]:
        """
        Identify keywords that competitors rank for but the target doesn't.

        A gap keyword is one that appears in at least one competitor's keyword
        set but not in the target's keyword set.
        """
        # Build target keyword set for fast lookup
        target_kw_set: set[str] = {kw.keyword.lower().strip() for kw in target_keywords}

        # Collect all competitor keywords with their positions
        gap_map: dict[str, GapKeyword] = {}

        for comp_domain, comp_keywords in competitor_keyword_sets.items():
            for ckw in comp_keywords:
                kw_lower = ckw.keyword.lower().strip()

                # Skip if target already ranks for this keyword
                if kw_lower in target_kw_set:
                    continue

                # Skip below minimum volume
                if ckw.volume < self.min_volume:
                    continue

                if kw_lower not in gap_map:
                    gap_map[kw_lower] = GapKeyword(
                        keyword=ckw.keyword,
                        volume=ckw.volume,
                        kd=ckw.kd,
                        cpc=ckw.cpc,
                        intent=classify_intent(ckw.keyword),
                        competitor_positions={},
                        competitor_urls={},
                    )

                gap_map[kw_lower].competitor_positions[comp_domain] = ckw.position
                gap_map[kw_lower].competitor_urls[comp_domain] = ckw.url

                # Keep the highest volume and the lowest known KD across competitors
                if ckw.volume > gap_map[kw_lower].volume:
                    gap_map[kw_lower].volume = ckw.volume
                if ckw.kd > 0 and (gap_map[kw_lower].kd == 0 or ckw.kd < gap_map[kw_lower].kd):
                    gap_map[kw_lower].kd = ckw.kd

        gaps = list(gap_map.values())

        # Calculate average competitor position for each gap
        for gap in gaps:
            positions = list(gap.competitor_positions.values())
            gap.avg_competitor_position = round(
                sum(positions) / len(positions), 1
            ) if positions else 0.0

        logger.info(f"Found {len(gaps)} keyword gaps")
        return gaps

    def score_opportunities(self, gaps: list[GapKeyword]) -> list[GapKeyword]:
        """
        Score each gap keyword by opportunity potential.

        Formula:
            opportunity_score = (volume_score * 0.4) + (kd_score * 0.3) +
                                (position_score * 0.2) + (intent_score * 0.1)

        Where:
        - volume_score: normalized 0-100 based on max volume in set
        - kd_score: inverted (lower KD = higher score), normalized 0-100
        - position_score: based on avg competitor position (lower = easier to compete)
        - intent_score: commercial/transactional get higher scores
        """
        if not gaps:
            return gaps

        # Find max volume for normalization (guard against all-zero volumes)
        max_volume = max(g.volume for g in gaps)
        max_volume = max(max_volume, 1)

        intent_scores = {
            "transactional": 100,
            "commercial": 80,
            "informational": 40,
            "navigational": 20,
        }

        for gap in gaps:
            # Volume score (0-100)
            volume_score = (gap.volume / max_volume) * 100

            # KD score (inverted: low KD = high score)
            kd_score = max(0, 100 - gap.kd)

            # Position score (competitors ranking 1-10 means realistic opportunity)
            if gap.avg_competitor_position <= 10:
                position_score = 90
            elif gap.avg_competitor_position <= 20:
                position_score = 70
            elif gap.avg_competitor_position <= 50:
                position_score = 50
            else:
                position_score = 30

            # Intent score
            intent_score = intent_scores.get(gap.intent, 40)

            # Combined score
            gap.opportunity_score = round(
                (volume_score * 0.4) +
                (kd_score * 0.3) +
                (position_score * 0.2) +
                (intent_score * 0.1),
                1,
            )

        # Sort by opportunity score descending
        gaps.sort(key=lambda g: g.opportunity_score, reverse=True)

        logger.info(f"Scored {len(gaps)} gap keywords by opportunity")
        return gaps

    def analyze(self, target_url: str, competitor_urls: list[str]) -> GapAnalysisResult:
        """
        Orchestrate full keyword gap analysis:
        1. Fetch organic keywords for target
        2. Fetch organic keywords for each competitor
        3. Identify gaps
        4. Score opportunities
        5. Compile results
        """
        target_domain = extract_domain(target_url)
        competitor_domains = [extract_domain(url) for url in competitor_urls]

        logger.info(
            f"Starting gap analysis: {target_domain} vs {', '.join(competitor_domains)}"
        )

        # Step 1: Fetch target keywords
        target_keywords = self.get_organic_keywords(target_domain)

        # Step 2: Fetch competitor keywords
        competitor_keyword_sets: dict[str, list[OrganicKeyword]] = {}
        competitor_keyword_counts: dict[str, int] = {}

        for comp_domain in competitor_domains:
            comp_keywords = self.get_organic_keywords(comp_domain)
            competitor_keyword_sets[comp_domain] = comp_keywords
            competitor_keyword_counts[comp_domain] = len(comp_keywords)

        # Step 3: Find gaps
        gaps = self.find_gaps(target_keywords, competitor_keyword_sets)

        # Step 4: Score opportunities
        scored_gaps = self.score_opportunities(gaps)

        # Step 5: Calculate intent distribution
        gaps_by_intent: dict[str, int] = {}
        for gap in scored_gaps:
            gaps_by_intent[gap.intent] = gaps_by_intent.get(gap.intent, 0) + 1

        # Step 6: Compile result
        result = GapAnalysisResult(
            target=target_domain,
            competitors=competitor_domains,
            country=self.country,
            total_gaps=len(scored_gaps),
            total_opportunity_volume=sum(g.volume for g in scored_gaps),
            gaps_by_intent=gaps_by_intent,
            top_opportunities=scored_gaps[:50],
            all_gaps=scored_gaps,
            target_keyword_count=len(target_keywords),
            competitor_keyword_counts=competitor_keyword_counts,
            timestamp=datetime.now().isoformat(),
        )

        logger.info(
            f"Gap analysis complete: {result.total_gaps} gaps found, "
            f"total opportunity volume {result.total_opportunity_volume:,}"
        )
        return result

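The weighted formula in score_opportunities can be checked by hand. A standalone worked example with illustrative inputs (volume 1,200 against a set maximum of 5,000, KD 18, competitors averaging a top-10 position, commercial intent):

```python
# Worked example of the opportunity formula (illustrative inputs only)
volume_score = (1200 / 5000) * 100      # 24.0 - normalized against the set max
kd_score = max(0, 100 - 18)             # 82   - lower difficulty scores higher
position_score = 90                     # avg competitor position <= 10
intent_score = 80                       # "commercial" in the intent table

opportunity_score = round(
    volume_score * 0.4 + kd_score * 0.3 + position_score * 0.2 + intent_score * 0.1,
    1,
)
# opportunity_score == 60.2
```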
# ---------------------------------------------------------------------------
# Plain-text report formatter
# ---------------------------------------------------------------------------


def format_text_report(result: GapAnalysisResult) -> str:
    """Format gap analysis result as a human-readable text report."""
    lines: list[str] = []
    lines.append("=" * 75)
    lines.append("Keyword Gap Analysis Report")
    lines.append(f"Target: {result.target}")
    lines.append(f"Competitors: {', '.join(result.competitors)}")
    lines.append(f"Country: {result.country.upper()} | Date: {result.timestamp[:10]}")
    lines.append("=" * 75)
    lines.append("")

    # Overview
    lines.append("## Overview")
    lines.append(f" Target keywords: {result.target_keyword_count:,}")
    for comp, count in result.competitor_keyword_counts.items():
        lines.append(f" {comp} keywords: {count:,}")
    lines.append(f" Keyword gaps found: {result.total_gaps:,}")
    lines.append(f" Total opportunity volume: {result.total_opportunity_volume:,}")
    lines.append("")

    # Intent distribution
    if result.gaps_by_intent:
        lines.append("## Gaps by Intent")
        for intent, count in sorted(result.gaps_by_intent.items(), key=lambda x: x[1], reverse=True):
            pct = (count / result.total_gaps) * 100 if result.total_gaps else 0
            lines.append(f" {intent:<15}: {count:>5} ({pct:.1f}%)")
        lines.append("")

    # Top opportunities
    if result.top_opportunities:
        lines.append("## Top Opportunities (by score)")
        header = f" {'Keyword':<35} {'Vol':>8} {'KD':>6} {'Score':>7} {'Intent':<15} {'Competitors'}"
        lines.append(header)
        lines.append(" " + "-" * 90)

        for gap in result.top_opportunities[:30]:
            kw_display = gap.keyword[:33] if len(gap.keyword) > 33 else gap.keyword
            comp_positions = ", ".join(
                f"{d}:#{p}" for d, p in gap.competitor_positions.items()
            )
            comp_display = comp_positions[:30] if len(comp_positions) > 30 else comp_positions

            lines.append(
                f" {kw_display:<35} {gap.volume:>8,} {gap.kd:>6.1f} "
                f"{gap.opportunity_score:>7.1f} {gap.intent:<15} {comp_display}"
            )
        lines.append("")

    # Quick wins (low KD, high volume)
    quick_wins = [g for g in result.all_gaps if g.kd <= 30 and g.volume >= 100]
    quick_wins.sort(key=lambda g: g.volume, reverse=True)
    if quick_wins:
        lines.append("## Quick Wins (KD <= 30, Volume >= 100)")
        lines.append(f" {'Keyword':<35} {'Vol':>8} {'KD':>6} {'Intent':<15}")
        lines.append(" " + "-" * 64)
        for gap in quick_wins[:20]:
            kw_display = gap.keyword[:33] if len(gap.keyword) > 33 else gap.keyword
            lines.append(
                f" {kw_display:<35} {gap.volume:>8,} {gap.kd:>6.1f} {gap.intent:<15}"
            )
        lines.append("")

    return "\n".join(lines)

# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def main():
    parser = argparse.ArgumentParser(
        description="Keyword Gap Analyzer - Find competitor keyword opportunities",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python keyword_gap_analyzer.py --target https://example.com --competitor https://comp.com --json
  python keyword_gap_analyzer.py --target example.com --competitor comp1.com --competitor comp2.com --min-volume 100 --json
  python keyword_gap_analyzer.py --target example.com --competitor comp.com --country us --output gaps.json
""",
    )
    parser.add_argument(
        "--target",
        required=True,
        help="Target website URL or domain",
    )
    parser.add_argument(
        "--competitor",
        action="append",
        required=True,
        dest="competitors",
        help="Competitor URL or domain (can be repeated)",
    )
    parser.add_argument(
        "--country",
        default="kr",
        help="Target country code (default: kr)",
    )
    parser.add_argument(
        "--min-volume",
        type=int,
        default=0,
        help="Minimum search volume filter (default: 0)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="output_json",
        help="Output results as JSON",
    )
    parser.add_argument(
        "--output",
        type=str,
        default=None,
        help="Write output to file (path)",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Enable verbose/debug logging",
    )

    args = parser.parse_args()

    if args.verbose:
        logging.getLogger().setLevel(logging.DEBUG)

    # Run analysis
    analyzer = KeywordGapAnalyzer(
        country=args.country,
        min_volume=args.min_volume,
    )
    result = analyzer.analyze(args.target, args.competitors)

    # Format output
    if args.output_json:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
    else:
        output = format_text_report(result)

    # Write or print
    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(output)
        logger.info(f"Output written to: {args.output}")
    else:
        print(output)

    return 0


if __name__ == "__main__":
    sys.exit(main())
@@ -0,0 +1,656 @@
"""
Keyword Researcher - Seed keyword expansion, intent classification, and topic clustering
========================================================================================
Purpose: Expand seed keywords via Ahrefs APIs, classify search intent,
         cluster topics, and support Korean market keyword discovery.
Python: 3.10+
"""

import argparse
import json
import logging
import re
import subprocess
import sys
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import Optional

# ---------------------------------------------------------------------------
# Logging
# ---------------------------------------------------------------------------
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("keyword_researcher")

# ---------------------------------------------------------------------------
# Constants - Korean suffix expansion
# ---------------------------------------------------------------------------
KOREAN_SUFFIXES: list[str] = [
    "추천",        # recommendation
    "가격",        # price
    "후기",        # review
    "잘하는곳",    # "place that does it well"
    "부작용",      # side effects
    "전후",        # before/after
    "비용",        # cost
    "추천 병원",   # recommended clinic/hospital
    "후기 블로그",  # review blog
    "방법",        # method / how to
    "종류",        # types
    "비교",        # comparison
    "효과",        # effect
    "주의사항",    # precautions
    "장단점",      # pros and cons
]

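Downstream, each seed is combined with every entry in this table to form candidate queries (see expand_korean_suffixes below). A minimal standalone sketch with a hypothetical seed and a two-item subset of the table:

```python
# Candidate-query generation from a seed plus the suffix table
# (two-item illustrative subset of KOREAN_SUFFIXES; the seed is hypothetical)
suffixes = ["추천", "가격"]  # recommendation, price
seed = "라식"               # LASIK

variations = [f"{seed} {suffix}" for suffix in suffixes]
# variations == ["라식 추천", "라식 가격"]
```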
# ---------------------------------------------------------------------------
# Intent classification patterns
# ---------------------------------------------------------------------------
INTENT_PATTERNS: dict[str, list[str]] = {
    "transactional": [
        r"구매|구입|주문|buy|order|purchase|shop|deal|discount|coupon|할인|쿠폰",
        r"예약|booking|reserve|sign\s?up|register|등록|신청",
    ],
    "commercial": [
        r"가격|비용|얼마|price|cost|pricing|fee|요금",
        r"추천|best|top\s?\d|review|비교|compare|vs|versus|후기|리뷰|평점|평가",
        r"잘하는곳|잘하는|맛집|업체|병원|추천\s?병원",
    ],
    "navigational": [
        r"^(www\.|http|\.com|\.co\.kr|\.net)",
        r"공식|official|login|로그인|홈페이지|사이트|website",
        r"고객센터|contact|support|customer\s?service",
    ],
    "informational": [
        r"방법|how\s?to|what\s?is|why|when|where|who|which",
        r"뜻|의미|정의|definition|meaning|guide|tutorial",
        r"효과|부작용|증상|원인|차이|종류|type|cause|symptom|effect",
        r"전후|before\s?and\s?after|결과|result",
    ],
}

# ---------------------------------------------------------------------------
# Dataclasses
# ---------------------------------------------------------------------------


@dataclass
class KeywordEntry:
    """Single keyword with its metrics and classification."""

    keyword: str
    volume: int = 0
    kd: float = 0.0
    cpc: float = 0.0
    intent: str = "informational"
    cluster: str = ""
    source: str = ""
    country_volumes: dict[str, int] = field(default_factory=dict)

    def to_dict(self) -> dict:
        data = asdict(self)
        if not data["country_volumes"]:
            del data["country_volumes"]
        return data


@dataclass
class KeywordCluster:
    """Group of semantically related keywords."""

    topic: str
    keywords: list[str] = field(default_factory=list)
    total_volume: int = 0
    avg_kd: float = 0.0
    primary_intent: str = "informational"

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass
class ResearchResult:
    """Full research result container."""

    seed_keyword: str
    country: str
    total_keywords: int = 0
    total_volume: int = 0
    clusters: list[KeywordCluster] = field(default_factory=list)
    keywords: list[KeywordEntry] = field(default_factory=list)
    timestamp: str = ""

    def to_dict(self) -> dict:
        return {
            "seed_keyword": self.seed_keyword,
            "country": self.country,
            "total_keywords": self.total_keywords,
            "total_volume": self.total_volume,
            "clusters": [c.to_dict() for c in self.clusters],
            "keywords": [k.to_dict() for k in self.keywords],
            "timestamp": self.timestamp,
        }

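KeywordEntry.to_dict drops the country_volumes key when the map is empty, which keeps JSON output compact for the common case. A self-contained sketch of that serialization pattern using a local, trimmed-down mirror of the dataclass (the name `Entry` is illustrative):

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Entry:
    # Local mirror of KeywordEntry's to_dict behavior (illustrative)
    keyword: str
    volume: int = 0
    country_volumes: dict[str, int] = field(default_factory=dict)

    def to_dict(self) -> dict:
        data = asdict(self)
        if not data["country_volumes"]:
            del data["country_volumes"]  # omit empty maps from JSON output
        return data

# An empty map is dropped; a populated map is kept as-is.
```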
# ---------------------------------------------------------------------------
# MCP Helper - calls Ahrefs MCP tools via subprocess
# ---------------------------------------------------------------------------


def call_mcp_tool(tool_name: str, params: dict) -> dict:
    """
    Call an Ahrefs MCP tool and return the parsed JSON response.

    In production this delegates to the MCP bridge. For standalone usage
    it invokes the Claude CLI with the appropriate tool call.
    """
    logger.info(f"Calling MCP tool: {tool_name} with params: {json.dumps(params, ensure_ascii=False)}")

    try:
        cmd = [
            "claude",
            "--print",
            "--output-format", "json",
            "-p",
            f"Call the tool mcp__claude_ai_Ahrefs__{tool_name} with these parameters: {json.dumps(params, ensure_ascii=False)}. Return ONLY the raw JSON result.",
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)

        if result.returncode != 0:
            logger.warning(f"MCP tool {tool_name} returned non-zero exit code: {result.returncode}")
            logger.debug(f"stderr: {result.stderr}")
            return {"error": result.stderr, "keywords": [], "items": []}

        try:
            return json.loads(result.stdout)
        except json.JSONDecodeError:
            return {"raw": result.stdout, "keywords": [], "items": []}

    except subprocess.TimeoutExpired:
        logger.error(f"MCP tool {tool_name} timed out")
        return {"error": "timeout", "keywords": [], "items": []}
    except FileNotFoundError:
        logger.warning("Claude CLI not found - returning empty result for standalone testing")
        return {"keywords": [], "items": []}

# ---------------------------------------------------------------------------
# KeywordResearcher
# ---------------------------------------------------------------------------


class KeywordResearcher:
    """Expand seed keywords, classify intent, and cluster topics."""

    def __init__(self, country: str = "kr", korean_suffixes: bool = False, compare_global: bool = False):
        self.country = country
        self.korean_suffixes = korean_suffixes
        self.compare_global = compare_global
        self._seen: set[str] = set()

    # ---- Keyword expansion via Ahrefs MCP ----

    def expand_keywords(self, seed: str) -> list[KeywordEntry]:
        """
        Expand a seed keyword using the Ahrefs matching-terms, related-terms,
        and search-suggestions endpoints.
        """
        all_keywords: list[KeywordEntry] = []

        # 1. Matching terms
        logger.info(f"Fetching matching terms for: {seed}")
        matching = call_mcp_tool("keywords-explorer-matching-terms", {
            "keyword": seed,
            "country": self.country,
            "limit": 100,
        })
        for item in matching.get("keywords", matching.get("items", [])):
            kw = self._parse_keyword_item(item, source="matching-terms")
            if kw and kw.keyword not in self._seen:
                self._seen.add(kw.keyword)
                all_keywords.append(kw)

        # 2. Related terms
        logger.info(f"Fetching related terms for: {seed}")
        related = call_mcp_tool("keywords-explorer-related-terms", {
            "keyword": seed,
            "country": self.country,
            "limit": 100,
        })
        for item in related.get("keywords", related.get("items", [])):
            kw = self._parse_keyword_item(item, source="related-terms")
            if kw and kw.keyword not in self._seen:
                self._seen.add(kw.keyword)
                all_keywords.append(kw)

        # 3. Search suggestions
        logger.info(f"Fetching search suggestions for: {seed}")
        suggestions = call_mcp_tool("keywords-explorer-search-suggestions", {
            "keyword": seed,
            "country": self.country,
            "limit": 50,
        })
        for item in suggestions.get("keywords", suggestions.get("items", [])):
            kw = self._parse_keyword_item(item, source="search-suggestions")
            if kw and kw.keyword not in self._seen:
                self._seen.add(kw.keyword)
                all_keywords.append(kw)

        # 4. Add the seed itself if not already present
        if seed not in self._seen:
            self._seen.add(seed)
            overview = call_mcp_tool("keywords-explorer-overview", {
                "keyword": seed,
                "country": self.country,
            })
            seed_entry = self._parse_keyword_item(overview, source="seed")
            if seed_entry:
                seed_entry.keyword = seed
                all_keywords.insert(0, seed_entry)

        logger.info(f"Expanded to {len(all_keywords)} keywords from Ahrefs APIs")
        return all_keywords

    def expand_korean_suffixes(self, seed: str) -> list[KeywordEntry]:
        """
        Generate keyword variations by appending common Korean suffixes.
        Each variation is checked against Ahrefs for volume data.
        """
        suffix_keywords: list[KeywordEntry] = []

        for suffix in KOREAN_SUFFIXES:
            variation = f"{seed} {suffix}"
            if variation in self._seen:
                continue

            logger.info(f"Checking Korean suffix variation: {variation}")
            overview = call_mcp_tool("keywords-explorer-overview", {
                "keyword": variation,
                "country": self.country,
            })
            kw = self._parse_keyword_item(overview, source="korean-suffix")
            if kw:
                kw.keyword = variation
                if kw.volume > 0:
                    self._seen.add(variation)
                    suffix_keywords.append(kw)
                else:
                    # Even if no volume data, include as zero-volume for completeness
                    entry = KeywordEntry(
                        keyword=variation,
                        volume=0,
                        kd=0.0,
                        cpc=0.0,
                        intent=self.classify_intent(variation),
                        source="korean-suffix",
                    )
                    self._seen.add(variation)
                    suffix_keywords.append(entry)

        logger.info(f"Korean suffix expansion yielded {len(suffix_keywords)} variations")
        return suffix_keywords

    def get_volume_by_country(self, keyword: str) -> dict[str, int]:
        """
        Get search volume breakdown by country for a keyword.
        Useful for comparing Korean vs global demand.
        """
        logger.info(f"Fetching volume-by-country for: {keyword}")
        result = call_mcp_tool("keywords-explorer-volume-by-country", {
            "keyword": keyword,
        })

        volumes: dict[str, int] = {}
        for item in result.get("countries", result.get("items", [])):
            if isinstance(item, dict):
                country_code = item.get("country", item.get("code", ""))
                volume = item.get("volume", item.get("search_volume", 0))
                if country_code and volume:
                    volumes[country_code.lower()] = int(volume)

        return volumes

    # ---- Intent classification ----

    def classify_intent(self, keyword: str) -> str:
        """
        Classify search intent based on keyword patterns.
        Priority: transactional > commercial > navigational > informational
        """
        keyword_lower = keyword.lower().strip()

        for intent, patterns in INTENT_PATTERNS.items():
            for pattern in patterns:
                if re.search(pattern, keyword_lower, re.IGNORECASE):
                    return intent

        return "informational"

    # ---- Keyword clustering ----

    def cluster_keywords(self, keywords: list[KeywordEntry]) -> list[KeywordCluster]:
        """
        Group keywords into topic clusters using shared n-gram tokens.
        Uses a simple token overlap approach: keywords sharing significant
        tokens (2+ character words) are grouped together.
        """
        if not keywords:
            return []

        # Extract meaningful tokens from each keyword
        def tokenize(text: str) -> set[str]:
            tokens = set()
            for word in re.split(r"\s+", text.strip().lower()):
                if len(word) >= 2:
                    tokens.add(word)
            return tokens

        # Build token-to-keyword mapping
        token_map: dict[str, list[int]] = {}
        kw_tokens: list[set[str]] = []

        for i, kw in enumerate(keywords):
            tokens = tokenize(kw.keyword)
            kw_tokens.append(tokens)
            for token in tokens:
                if token not in token_map:
                    token_map[token] = []
                token_map[token].append(i)

        # Find the most common significant tokens (cluster anchors)
        token_freq = sorted(token_map.items(), key=lambda x: len(x[1]), reverse=True)

        assigned: set[int] = set()
        clusters: list[KeywordCluster] = []

        for token, indices in token_freq:
            # Skip single-occurrence tokens or very common stop-like tokens
            if len(indices) < 2:
                continue

            # Gather unassigned keywords that share this token
            cluster_indices = [i for i in indices if i not in assigned]
            if len(cluster_indices) < 2:
                continue

            # Create the cluster
            cluster_kws = [keywords[i].keyword for i in cluster_indices]
            cluster_volumes = [keywords[i].volume for i in cluster_indices]
            cluster_kds = [keywords[i].kd for i in cluster_indices]
            cluster_intents = [keywords[i].intent for i in cluster_indices]

            # Determine primary intent by frequency
            intent_counts: dict[str, int] = {}
            for intent in cluster_intents:
                intent_counts[intent] = intent_counts.get(intent, 0) + 1
            primary_intent = max(intent_counts, key=intent_counts.get)

            cluster = KeywordCluster(
                topic=token,
                keywords=cluster_kws,
                total_volume=sum(cluster_volumes),
                avg_kd=round(sum(cluster_kds) / len(cluster_kds), 1) if cluster_kds else 0.0,
                primary_intent=primary_intent,
            )
            clusters.append(cluster)

            for i in cluster_indices:
                assigned.add(i)
                keywords[i].cluster = token

        # Assign unclustered keywords to an "other" cluster
        unclustered = [i for i in range(len(keywords)) if i not in assigned]
        if unclustered:
            other_kws = [keywords[i].keyword for i in unclustered]
            other_volumes = [keywords[i].volume for i in unclustered]
            other_kds = [keywords[i].kd for i in unclustered]

            other_cluster = KeywordCluster(
                topic="(unclustered)",
                keywords=other_kws,
                total_volume=sum(other_volumes),
                avg_kd=round(sum(other_kds) / len(other_kds), 1) if other_kds else 0.0,
                primary_intent="informational",
            )
            clusters.append(other_cluster)

            for i in unclustered:
                keywords[i].cluster = "(unclustered)"

        # Sort clusters by total volume descending
        clusters.sort(key=lambda c: c.total_volume, reverse=True)

        logger.info(f"Clustered {len(keywords)} keywords into {len(clusters)} clusters")
        return clusters

    # ---- Full analysis orchestration ----

def analyze(self, seed_keyword: str) -> ResearchResult:
|
||||||
|
"""
|
||||||
|
Orchestrate a full keyword research analysis:
|
||||||
|
1. Expand seed via Ahrefs
|
||||||
|
2. Optionally expand Korean suffixes
|
||||||
|
3. Classify intent for all keywords
|
||||||
|
4. Optionally fetch volume-by-country
|
||||||
|
5. Cluster keywords into topics
|
||||||
|
6. Compile results
|
||||||
|
"""
|
||||||
|
logger.info(f"Starting keyword research for: {seed_keyword} (country={self.country})")
|
||||||
|
|
||||||
|
# Step 1: Expand keywords
|
||||||
|
keywords = self.expand_keywords(seed_keyword)
|
||||||
|
|
||||||
|
# Step 2: Korean suffix expansion
|
||||||
|
if self.korean_suffixes:
|
||||||
|
suffix_keywords = self.expand_korean_suffixes(seed_keyword)
|
||||||
|
keywords.extend(suffix_keywords)
|
||||||
|
|
||||||
|
# Step 3: Classify intent for all keywords
|
||||||
|
for kw in keywords:
|
||||||
|
if not kw.intent or kw.intent == "informational":
|
||||||
|
kw.intent = self.classify_intent(kw.keyword)
|
||||||
|
|
||||||
|
# Step 4: Volume-by-country comparison
|
||||||
|
if self.compare_global and keywords:
|
||||||
|
# Fetch for the seed and top volume keywords
|
||||||
|
top_keywords = sorted(keywords, key=lambda k: k.volume, reverse=True)[:10]
|
||||||
|
for kw in top_keywords:
|
||||||
|
volumes = self.get_volume_by_country(kw.keyword)
|
||||||
|
kw.country_volumes = volumes
|
||||||
|
|
||||||
|
# Step 5: Cluster keywords
|
||||||
|
clusters = self.cluster_keywords(keywords)
|
||||||
|
|
||||||
|
# Step 6: Compile result
|
||||||
|
result = ResearchResult(
|
||||||
|
seed_keyword=seed_keyword,
|
||||||
|
country=self.country,
|
||||||
|
total_keywords=len(keywords),
|
||||||
|
total_volume=sum(kw.volume for kw in keywords),
|
||||||
|
clusters=clusters,
|
||||||
|
keywords=sorted(keywords, key=lambda k: k.volume, reverse=True),
|
||||||
|
timestamp=datetime.now().isoformat(),
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
f"Research complete: {result.total_keywords} keywords, "
|
||||||
|
f"{len(result.clusters)} clusters, "
|
||||||
|
f"total volume {result.total_volume}"
|
||||||
|
)
|
||||||
|
return result
|
||||||
|
|
||||||
|
# ---- Internal helpers ----
|
||||||
|
|
||||||
|
def _parse_keyword_item(self, item: dict, source: str = "") -> Optional[KeywordEntry]:
|
||||||
|
"""Parse an Ahrefs API response item into a KeywordEntry."""
|
||||||
|
if not item or "error" in item:
|
||||||
|
return None
|
||||||
|
|
||||||
|
keyword = item.get("keyword", item.get("term", item.get("query", "")))
|
||||||
|
if not keyword:
|
||||||
|
return None
|
||||||
|
|
||||||
|
volume = int(item.get("volume", item.get("search_volume", 0)) or 0)
|
||||||
|
kd = float(item.get("keyword_difficulty", item.get("kd", 0)) or 0)
|
||||||
|
cpc = float(item.get("cpc", item.get("cost_per_click", 0)) or 0)
|
||||||
|
|
||||||
|
return KeywordEntry(
|
||||||
|
keyword=keyword,
|
||||||
|
volume=volume,
|
||||||
|
kd=round(kd, 1),
|
||||||
|
cpc=round(cpc, 2),
|
||||||
|
intent=self.classify_intent(keyword),
|
||||||
|
source=source,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Plain-text report formatter
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def format_text_report(result: ResearchResult) -> str:
|
||||||
|
"""Format research result as a human-readable text report."""
|
||||||
|
lines: list[str] = []
|
||||||
|
lines.append("=" * 70)
|
||||||
|
lines.append(f"Keyword Strategy Report: {result.seed_keyword}")
|
||||||
|
lines.append(f"Country: {result.country.upper()} | Date: {result.timestamp[:10]}")
|
||||||
|
lines.append("=" * 70)
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
lines.append("## Overview")
|
||||||
|
lines.append(f" Total keywords discovered: {result.total_keywords}")
|
||||||
|
lines.append(f" Topic clusters: {len(result.clusters)}")
|
||||||
|
lines.append(f" Total search volume: {result.total_volume:,}")
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
# Clusters summary
|
||||||
|
if result.clusters:
|
||||||
|
lines.append("## Top Clusters")
|
||||||
|
lines.append(f" {'Cluster':<25} {'Keywords':>8} {'Volume':>10} {'Avg KD':>8} {'Intent':<15}")
|
||||||
|
lines.append(" " + "-" * 66)
|
||||||
|
for cluster in result.clusters[:15]:
|
||||||
|
lines.append(
|
||||||
|
f" {cluster.topic:<25} {len(cluster.keywords):>8} "
|
||||||
|
f"{cluster.total_volume:>10,} {cluster.avg_kd:>8.1f} "
|
||||||
|
f"{cluster.primary_intent:<15}"
|
||||||
|
)
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
# Top keywords
|
||||||
|
if result.keywords:
|
||||||
|
lines.append("## Top Keywords (by volume)")
|
||||||
|
lines.append(f" {'Keyword':<40} {'Vol':>8} {'KD':>6} {'CPC':>7} {'Intent':<15} {'Cluster':<15}")
|
||||||
|
lines.append(" " + "-" * 91)
|
||||||
|
for kw in result.keywords[:30]:
|
||||||
|
kw_display = kw.keyword[:38] if len(kw.keyword) > 38 else kw.keyword
|
||||||
|
cluster_display = kw.cluster[:13] if len(kw.cluster) > 13 else kw.cluster
|
||||||
|
lines.append(
|
||||||
|
f" {kw_display:<40} {kw.volume:>8,} {kw.kd:>6.1f} "
|
||||||
|
f"{kw.cpc:>7.2f} {kw.intent:<15} {cluster_display:<15}"
|
||||||
|
)
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
# Intent distribution
|
||||||
|
intent_dist: dict[str, int] = {}
|
||||||
|
for kw in result.keywords:
|
||||||
|
intent_dist[kw.intent] = intent_dist.get(kw.intent, 0) + 1
|
||||||
|
if intent_dist:
|
||||||
|
lines.append("## Intent Distribution")
|
||||||
|
for intent, count in sorted(intent_dist.items(), key=lambda x: x[1], reverse=True):
|
||||||
|
pct = (count / len(result.keywords)) * 100 if result.keywords else 0
|
||||||
|
lines.append(f" {intent:<15}: {count:>5} ({pct:.1f}%)")
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CLI
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Keyword Researcher - Expand, classify, and cluster keywords",
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||||
|
epilog="""
|
||||||
|
Examples:
|
||||||
|
python keyword_researcher.py --keyword "치과 임플란트" --country kr --json
|
||||||
|
python keyword_researcher.py --keyword "dental implant" --compare-global --json
|
||||||
|
python keyword_researcher.py --keyword "치과 임플란트" --korean-suffixes --output report.json
|
||||||
|
""",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--keyword",
|
||||||
|
required=True,
|
||||||
|
help="Seed keyword to expand and research",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--country",
|
||||||
|
default="kr",
|
||||||
|
help="Target country code (default: kr)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--korean-suffixes",
|
||||||
|
action="store_true",
|
||||||
|
help="Enable Korean suffix expansion (추천, 가격, 후기, etc.)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--compare-global",
|
||||||
|
action="store_true",
|
||||||
|
help="Fetch volume-by-country comparison for top keywords",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--json",
|
||||||
|
action="store_true",
|
||||||
|
dest="output_json",
|
||||||
|
help="Output results as JSON",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--output",
|
||||||
|
type=str,
|
||||||
|
default=None,
|
||||||
|
help="Write output to file (path)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--verbose",
|
||||||
|
action="store_true",
|
||||||
|
help="Enable verbose/debug logging",
|
||||||
|
)
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if args.verbose:
|
||||||
|
logging.getLogger().setLevel(logging.DEBUG)
|
||||||
|
|
||||||
|
# Run analysis
|
||||||
|
researcher = KeywordResearcher(
|
||||||
|
country=args.country,
|
||||||
|
korean_suffixes=args.korean_suffixes,
|
||||||
|
compare_global=args.compare_global,
|
||||||
|
)
|
||||||
|
result = researcher.analyze(args.keyword)
|
||||||
|
|
||||||
|
# Format output
|
||||||
|
if args.output_json:
|
||||||
|
output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
|
||||||
|
else:
|
||||||
|
output = format_text_report(result)
|
||||||
|
|
||||||
|
# Write or print
|
||||||
|
if args.output:
|
||||||
|
with open(args.output, "w", encoding="utf-8") as f:
|
||||||
|
f.write(output)
|
||||||
|
logger.info(f"Output written to: {args.output}")
|
||||||
|
else:
|
||||||
|
print(output)
|
||||||
|
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main())
|
||||||
@@ -0,0 +1,20 @@
# 19-seo-keyword-strategy dependencies
# Install: pip install -r requirements.txt

# HTTP & Async
requests>=2.31.0
aiohttp>=3.9.0

# Data Processing
pandas>=2.1.0

# NLP / Text Similarity
scikit-learn>=1.3.0

# Retry & Progress
tenacity>=8.2.0
tqdm>=4.66.0

# Environment & CLI
python-dotenv>=1.0.0
rich>=13.7.0
custom-skills/19-seo-keyword-strategy/desktop/SKILL.md (new file, 91 lines)
@@ -0,0 +1,91 @@
---
name: seo-keyword-strategy
description: |
  Keyword strategy and research for SEO campaigns. Triggers: keyword research, keyword analysis, keyword gap, search volume, keyword clustering, intent classification.
---

# SEO Keyword Strategy & Research

## Purpose

Expand seed keywords, classify search intent, cluster topics, and identify competitor keyword gaps for comprehensive keyword strategy development.

## Core Capabilities

1. **Keyword Expansion** - Matching terms, related terms, search suggestions
2. **Korean Market** - Suffix expansion, Naver autocomplete, Korean intent patterns
3. **Intent Classification** - Informational, navigational, commercial, transactional
4. **Topic Clustering** - Group keywords into semantic clusters
5. **Gap Analysis** - Find competitor keywords missing from target site

## MCP Tool Usage

### Ahrefs for Keyword Data

```
mcp__ahrefs__keywords-explorer-overview: Get keyword metrics
mcp__ahrefs__keywords-explorer-matching-terms: Find keyword variations
mcp__ahrefs__keywords-explorer-related-terms: Discover related keywords
mcp__ahrefs__keywords-explorer-search-suggestions: Autocomplete suggestions
mcp__ahrefs__keywords-explorer-volume-by-country: Country volume comparison
mcp__ahrefs__site-explorer-organic-keywords: Competitor keyword rankings
```

### Web Search for Naver Discovery

```
WebSearch: Naver autocomplete and trend discovery
```

## Workflow

### 1. Seed Keyword Expansion
1. Input seed keyword (Korean or English)
2. Query Ahrefs matching-terms and related-terms
3. Get search suggestions for long-tail variations
4. Apply Korean suffix expansion if Korean market
5. Deduplicate and merge results
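The suffix and dedupe steps above can be sketched in a few lines of Python. The suffix list here is a small sample for illustration, not the skill's full set:

```python
# Illustrative sketch of steps 4-5: suffix expansion and deduplication.
# KOREAN_SUFFIXES is a sample; the actual script may ship a longer list.
KOREAN_SUFFIXES = ["추천", "가격", "후기", "비용", "순위"]


def expand_korean_suffixes(seed: str, suffixes: list[str] = KOREAN_SUFFIXES) -> list[str]:
    """Append common Korean modifier suffixes to a seed keyword."""
    return [f"{seed} {suffix}" for suffix in suffixes]


def dedupe_keywords(keywords: list[str]) -> list[str]:
    """Order-preserving, case-insensitive deduplication (step 5)."""
    seen: set[str] = set()
    unique: list[str] = []
    for kw in keywords:
        key = kw.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(kw)
    return unique
```

Merging Ahrefs results with suffix expansions and then deduplicating keeps the first-seen casing while dropping exact repeats.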

### 2. Intent Classification & Clustering
1. Classify each keyword by search intent
2. Group keywords into topic clusters
3. Identify pillar topics and supporting terms
4. Calculate cluster-level metrics (total volume, avg KD)
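A minimal rule-based classifier illustrates step 1. The marker lists below are illustrative assumptions, not the exact patterns the skill's scripts use:

```python
# Sketch of rule-based intent classification. Marker lists are examples only.
INTENT_PATTERNS = {
    "transactional": ["buy", "price", "구매", "가격", "예약"],
    "commercial": ["best", "review", "vs", "추천", "후기", "비교"],
    "navigational": ["login", "official", "홈페이지"],
}


def classify_intent(keyword: str) -> str:
    """Return the first intent whose marker appears in the keyword."""
    kw = keyword.lower()
    for intent, markers in INTENT_PATTERNS.items():
        if any(marker in kw for marker in markers):
            return intent
    return "informational"  # default when no marker matches
```

Real classifiers often also weigh SERP composition; the substring approach here is the cheapest first pass.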

### 3. Gap Analysis
1. Pull organic keywords for target and competitors
2. Identify keywords present in competitors but missing from target
3. Score opportunities by volume/difficulty ratio
4. Prioritize by intent alignment with business goals
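Steps 2-3 reduce to set arithmetic plus a volume/difficulty ratio. The scoring formula below is a plausible sketch, not necessarily the exact one the scripts use:

```python
# Sketch of gap detection (step 2) and opportunity scoring (step 3).
def keyword_gaps(target_kws: set[str], competitor_kws: set[str]) -> set[str]:
    """Keywords competitors rank for that the target site does not."""
    return competitor_kws - target_kws


def opportunity_score(volume: int, kd: float) -> float:
    """Higher volume and lower difficulty yield a higher score."""
    return round(volume / (kd + 1.0), 1)  # +1 guards against KD of 0
```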

## Output Format

```markdown
## Keyword Strategy Report: [seed keyword]

### Overview
- Total keywords discovered: [count]
- Topic clusters: [count]
- Total search volume: [sum]

### Top Clusters
| Cluster | Keywords | Total Volume | Avg KD |
|---------|----------|--------------|--------|
| ... | ... | ... | ... |

### Top Opportunities
| Keyword | Volume | KD | Intent | Cluster |
|---------|--------|----|--------|---------|
| ... | ... | ... | ... | ... |

### Keyword Gaps (vs competitors)
| Keyword | Volume | Competitor Position | Opportunity Score |
|---------|--------|---------------------|-------------------|
| ... | ... | ... | ... |
```

## Notion Output (Required)

All audit reports MUST be saved to OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category, Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: KW-YYYYMMDD-NNN
custom-skills/19-seo-keyword-strategy/desktop/skill.yaml (new file, 9 lines)
@@ -0,0 +1,9 @@
name: seo-keyword-strategy
description: |
  Keyword strategy and research for SEO campaigns. Triggers: keyword research, keyword analysis, keyword gap, search volume, keyword clustering, intent classification.

allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
custom-skills/20-seo-serp-analysis/code/CLAUDE.md (new file, 132 lines)
@@ -0,0 +1,132 @@
# CLAUDE.md

## Overview

SERP analysis tool for understanding search result landscapes. Detects Google SERP features (featured snippets, PAA, knowledge panels, local pack, video, ads), analyzes Naver SERP composition (blog, cafe, knowledge iN, Smart Store, brand zone, VIEW tab), maps competitor positions, and scores SERP feature opportunities.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Google SERP analysis
python scripts/serp_analyzer.py --keyword "치과 임플란트" --country kr --json

# Naver SERP analysis
python scripts/naver_serp_analyzer.py --keyword "치과 임플란트" --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `serp_analyzer.py` | Google SERP feature detection and competitor mapping | SERP features, competitor positions, opportunity scores |
| `naver_serp_analyzer.py` | Naver SERP composition analysis | Section distribution, content type mapping |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## SERP Analyzer (Google)

```bash
# Single keyword analysis
python scripts/serp_analyzer.py --keyword "dental implant cost" --json

# Korean market
python scripts/serp_analyzer.py --keyword "치과 임플란트 가격" --country kr --json

# Multiple keywords from file
python scripts/serp_analyzer.py --keywords-file keywords.txt --country kr --json

# Output to file
python scripts/serp_analyzer.py --keyword "dental implant" --output serp_report.json
```

**Capabilities**:
- SERP feature detection (featured snippet, PAA, knowledge panel, local pack, video carousel, ads, image pack, site links)
- Competitor position mapping per keyword
- Content type distribution analysis (blog, product, service, news, video)
- SERP feature opportunity scoring
- Search intent validation from SERP composition
- SERP volatility assessment
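Opportunity scoring can be sketched as a weighted sum over detected SERP features. The weights below are illustrative assumptions, not the values `serp_analyzer.py` actually uses:

```python
# Hypothetical feature weights for a 0-100 opportunity score.
FEATURE_WEIGHTS = {
    "featured_snippet": 25,  # winnable position-zero slot
    "people_also_ask": 15,
    "local_pack": 20,
    "video_carousel": 10,
}


def serp_opportunity(features: dict[str, bool]) -> int:
    """Sum weights of features present on the SERP, capped at 100."""
    score = sum(w for name, w in FEATURE_WEIGHTS.items() if features.get(name))
    return min(score, 100)
```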

## Naver SERP Analyzer

```bash
# Analyze Naver search results
python scripts/naver_serp_analyzer.py --keyword "치과 임플란트" --json

# Analyze multiple keywords
python scripts/naver_serp_analyzer.py --keywords-file keywords.txt --json
```

**Capabilities**:
- Naver section detection (블로그, 카페, 지식iN, 스마트스토어, 브랜드존, VIEW탭)
- Section priority mapping (which sections appear above the fold)
- Content type distribution per section
- Brand zone presence detection
- VIEW tab content analysis
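Section detection boils down to matching known class patterns against the fetched SERP HTML. A stdlib-only sketch follows; the selector sample is a subset of the patterns the script checks, and the script itself parses with BeautifulSoup rather than regex:

```python
import re

# Sample of Naver SERP class patterns (illustrative subset).
SAMPLE_SELECTORS = {"blog": ["sp_blog"], "cafe": ["sp_cafe"], "ad": ["sp_nad"]}


def detect_sections(html: str) -> list[str]:
    """Return section keys whose class patterns appear in the page's class attributes."""
    classes = " ".join(re.findall(r'class="([^"]*)"', html))
    return [
        section
        for section, patterns in SAMPLE_SELECTORS.items()
        if any(pattern in classes for pattern in patterns)
    ]
```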

## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `serp-overview` | Get SERP results for a keyword |
| `keywords-explorer-overview` | Get keyword metrics and SERP features |
| `site-explorer-organic-keywords` | Map competitor positions |

## Output Format

```json
{
  "keyword": "치과 임플란트",
  "country": "kr",
  "serp_features": {
    "featured_snippet": true,
    "people_also_ask": true,
    "local_pack": true,
    "knowledge_panel": false,
    "video_carousel": false,
    "ads_top": 3,
    "ads_bottom": 2
  },
  "competitors": [
    {
      "position": 1,
      "url": "https://example.com/page",
      "domain": "example.com",
      "title": "...",
      "content_type": "service_page"
    }
  ],
  "opportunity_score": 72,
  "intent_signals": "commercial",
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | SERP Analysis |
| Priority | Select | Based on opportunity score |
| Found Date | Date | Analysis date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: SERP-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., SERP, Featured Snippet, PAA)
- URLs and code remain unchanged
custom-skills/20-seo-serp-analysis/code/scripts/base_client.py (new file, 207 lines)
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,682 @@
|
|||||||
|
"""
|
||||||
|
Naver SERP Analyzer - Naver search result composition analysis
|
||||||
|
==============================================================
|
||||||
|
Purpose: Analyze Naver SERP section distribution, content type mapping,
|
||||||
|
brand zone detection, and VIEW tab content analysis.
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python naver_serp_analyzer.py --keyword "치과 임플란트" --json
|
||||||
|
python naver_serp_analyzer.py --keywords-file keywords.txt --json
|
||||||
|
python naver_serp_analyzer.py --keyword "치과 임플란트" --output naver_report.json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from dataclasses import asdict, dataclass, field
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import requests
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
from rich.console import Console
|
||||||
|
from rich.table import Table
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Logging
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
logger = logging.getLogger(__name__)
console = Console()

# ---------------------------------------------------------------------------
# Constants - Naver SERP Section Identifiers
# ---------------------------------------------------------------------------

# CSS class / id patterns used to detect Naver SERP sections
NAVER_SECTION_SELECTORS: dict[str, list[str]] = {
    "blog": [
        "sp_blog",
        "blog_widget",
        "sc_new.sp_blog",
        "api_subject_blog",
        "type_blog",
        "blog_exact",
    ],
    "cafe": [
        "sp_cafe",
        "cafe_widget",
        "sc_new.sp_cafe",
        "api_subject_cafe",
        "type_cafe",
    ],
    "knowledge_in": [
        "sp_kin",
        "kin_widget",
        "sc_new.sp_kin",
        "api_subject_kin",
        "type_kin",
        "nx_kin",
    ],
    "smart_store": [
        "sp_nshop",
        "shopping_widget",
        "sc_new.sp_nshop",
        "api_subject_shopping",
        "type_shopping",
        "smartstore",
    ],
    "brand_zone": [
        "sp_brand",
        "brand_area",
        "brand_zone",
        "type_brand",
        "sc_new.sp_brand",
    ],
    "view_tab": [
        "sp_view",
        "view_widget",
        "sc_new.sp_view",
        "type_view",
        "api_subject_view",
    ],
    "news": [
        "sp_nnews",
        "news_widget",
        "sc_new.sp_nnews",
        "api_subject_news",
        "type_news",
        "group_news",
    ],
    "encyclopedia": [
        "sp_encyclopedia",
        "sc_new.sp_encyclopedia",
        "api_subject_encyclopedia",
        "type_encyclopedia",
        "nx_encyclopedia",
    ],
    "image": [
        "sp_image",
        "image_widget",
        "sc_new.sp_image",
        "api_subject_image",
        "type_image",
    ],
    "video": [
        "sp_video",
        "video_widget",
        "sc_new.sp_video",
        "api_subject_video",
        "type_video",
    ],
    "place": [
        "sp_local",
        "local_widget",
        "sc_new.sp_local",
        "type_place",
        "place_section",
        "loc_map",
    ],
    "ad": [
        "sp_nad",
        "sp_tad",
        "ad_section",
        "type_powerlink",
        "type_ad",
        "nx_ad",
    ],
}

# Section display names in Korean
SECTION_DISPLAY_NAMES: dict[str, str] = {
    "blog": "블로그",
    "cafe": "카페",
    "knowledge_in": "지식iN",
    "smart_store": "스마트스토어",
    "brand_zone": "브랜드존",
    "view_tab": "VIEW",
    "news": "뉴스",
    "encyclopedia": "백과사전",
    "image": "이미지",
    "video": "동영상",
    "place": "플레이스",
    "ad": "광고",
}

# Default headers for Naver requests
NAVER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


# ---------------------------------------------------------------------------
# Data Classes
# ---------------------------------------------------------------------------


@dataclass
class NaverSection:
    """A detected section within Naver SERP."""

    section_type: str  # blog, cafe, knowledge_in, smart_store, etc.
    display_name: str = ""
    position: int = 0  # Order of appearance (1-based)
    item_count: int = 0  # Number of items in the section
    is_above_fold: bool = False  # Appears within first ~3 sections
    has_more_link: bool = False  # Section has "more results" link
    raw_html_snippet: str = ""  # Short HTML snippet for debugging

    def __post_init__(self):
        if not self.display_name:
            self.display_name = SECTION_DISPLAY_NAMES.get(
                self.section_type, self.section_type
            )


@dataclass
class NaverSerpResult:
    """Complete Naver SERP analysis result for a keyword."""

    keyword: str
    sections: list[NaverSection] = field(default_factory=list)
    section_order: list[str] = field(default_factory=list)
    brand_zone_present: bool = False
    brand_zone_brand: str = ""
    total_sections: int = 0
    above_fold_sections: list[str] = field(default_factory=list)
    ad_count: int = 0
    dominant_section: str = ""
    has_view_tab: bool = False
    has_place_section: bool = False
    timestamp: str = ""

    def __post_init__(self):
        if not self.timestamp:
            self.timestamp = datetime.now().isoformat()


# ---------------------------------------------------------------------------
# Naver SERP Analyzer
# ---------------------------------------------------------------------------


class NaverSerpAnalyzer:
    """Analyzes Naver search result page composition."""

    NAVER_SEARCH_URL = "https://search.naver.com/search.naver"

    def __init__(self, timeout: int = 30):
        self.timeout = timeout
        self.logger = logging.getLogger(self.__class__.__name__)
        self.session = requests.Session()
        self.session.headers.update(NAVER_HEADERS)

    # ----- Data Fetching -----

    def fetch_serp(self, keyword: str) -> str:
        """
        Fetch Naver search results HTML for a given keyword.

        Returns the raw HTML string of the search results page.
        """
        self.logger.info(f"Fetching Naver SERP for '{keyword}'")

        params = {
            "where": "nexearch",
            "sm": "top_hty",
            "fbm": "0",
            "ie": "utf8",
            "query": keyword,
        }

        try:
            response = self.session.get(
                self.NAVER_SEARCH_URL,
                params=params,
                timeout=self.timeout,
            )
            response.raise_for_status()
            self.logger.info(
                f"Fetched {len(response.text):,} bytes "
                f"(status={response.status_code})"
            )
            return response.text

        except requests.RequestException as exc:
            self.logger.error(f"Failed to fetch Naver SERP: {exc}")
            return ""

    # ----- Section Detection -----

    def detect_sections(self, html: str) -> list[NaverSection]:
        """
        Identify Naver SERP sections from HTML structure.

        Scans the HTML for known CSS class names and IDs that correspond
        to Naver's SERP section types.
        """
        if not html:
            return []

        soup = BeautifulSoup(html, "lxml")
        sections: list[NaverSection] = []
        position = 0

        # Strategy 1: Look for section containers with known class names
        # Naver uses <div class="sc_new sp_XXX"> and <section> elements
        all_sections = soup.find_all(
            ["div", "section"],
            class_=re.compile(
                r"(sc_new|api_subject|sp_|type_|_widget|group_|nx_)"
            ),
        )

        seen_types: set[str] = set()

        for element in all_sections:
            classes = " ".join(element.get("class", []))
            element_id = element.get("id", "")
            search_text = f"{classes} {element_id}".lower()

            for section_type, selectors in NAVER_SECTION_SELECTORS.items():
                if section_type in seen_types:
                    continue

                matched = False
                for selector in selectors:
                    if selector.lower() in search_text:
                        matched = True
                        break

                if matched:
                    position += 1
                    seen_types.add(section_type)

                    # Count items within the section
                    item_count = self._count_section_items(element, section_type)

                    # Check for "more" link
                    has_more = bool(
                        element.find("a", class_=re.compile(r"(more|_more|btn_more)"))
                        or element.find("a", string=re.compile(r"(더보기|전체보기)"))
                    )

                    # Get short HTML snippet for debugging
                    snippet = str(element)[:200] if element else ""

                    section = NaverSection(
                        section_type=section_type,
                        position=position,
                        item_count=item_count,
                        is_above_fold=(position <= 3),
                        has_more_link=has_more,
                        raw_html_snippet=snippet,
                    )
                    sections.append(section)

        # Strategy 2: Fallback - scan entire HTML text for section markers
        if not sections:
            self.logger.warning(
                "No sections found via DOM parsing; "
                "falling back to text pattern matching"
            )
            sections = self._fallback_text_detection(html)

        return sections

    def _count_section_items(self, element: Any, section_type: str) -> int:
        """Count the number of result items within a section element."""
        # Common item container patterns
        item_selectors = [
            "li",
            ".api_txt_lines",
            ".total_tit",
            ".detail_box",
            ".item",
            ".lst_total > li",
        ]

        for selector in item_selectors:
            items = element.select(selector)
            if items and len(items) > 0:
                return len(items)

        # Fallback: count links that look like results
        links = element.find_all("a", href=True)
        result_links = [
            a
            for a in links
            if a.get("href", "").startswith("http")
            and "naver.com/search" not in a.get("href", "")
        ]
        return len(result_links) if result_links else 0

    def _fallback_text_detection(self, html: str) -> list[NaverSection]:
        """Detect sections by scanning raw HTML text for known markers."""
        sections: list[NaverSection] = []
        position = 0
        html_lower = html.lower()

        for section_type, selectors in NAVER_SECTION_SELECTORS.items():
            for selector in selectors:
                if selector.lower() in html_lower:
                    position += 1
                    sections.append(
                        NaverSection(
                            section_type=section_type,
                            position=position,
                            item_count=0,
                            is_above_fold=(position <= 3),
                        )
                    )
                    break

        return sections

    # ----- Section Priority Analysis -----

    def analyze_section_priority(
        self, sections: list[NaverSection]
    ) -> list[str]:
        """
        Determine above-fold section order.

        Returns ordered list of section types that appear in the first
        visible area of the SERP (approximately top 3 sections).
        """
        sorted_sections = sorted(sections, key=lambda s: s.position)
        above_fold = [s.section_type for s in sorted_sections if s.is_above_fold]
        return above_fold

    # ----- Brand Zone Detection -----

    def check_brand_zone(self, html: str) -> tuple[bool, str]:
        """
        Detect brand zone presence and extract brand name if available.

        Returns (is_present, brand_name).
        """
        if not html:
            return False, ""

        soup = BeautifulSoup(html, "lxml")

        # Look for brand zone container
        brand_selectors = [
            "sp_brand",
            "brand_area",
            "brand_zone",
            "type_brand",
        ]

        for selector in brand_selectors:
            brand_el = soup.find(
                ["div", "section"],
                class_=re.compile(selector, re.IGNORECASE),
            )
            if brand_el:
                # Try to extract brand name from the section
                brand_name = ""
                title_el = brand_el.find(
                    ["h2", "h3", "strong", "a"],
                    class_=re.compile(r"(tit|title|name|brand)", re.IGNORECASE),
                )
                if title_el:
                    brand_name = title_el.get_text(strip=True)

                return True, brand_name

        # Text-based fallback
        if "brand_zone" in html.lower() or "sp_brand" in html.lower():
            return True, ""

        return False, ""

    # ----- Dominant Section -----

    def _find_dominant_section(self, sections: list[NaverSection]) -> str:
        """Find the section with the most items (excluding ads)."""
        non_ad = [s for s in sections if s.section_type != "ad"]
        if not non_ad:
            return ""
        return max(non_ad, key=lambda s: s.item_count).section_type

    # ----- Main Analysis Orchestrator -----

    def analyze(self, keyword: str) -> NaverSerpResult:
        """
        Orchestrate full Naver SERP analysis for a single keyword.

        Steps:
        1. Fetch Naver search results page
        2. Detect SERP sections
        3. Analyze section priority
        4. Check brand zone presence
        5. Compile results
        """
        html = self.fetch_serp(keyword)

        if not html:
            self.logger.error(f"No HTML content for keyword '{keyword}'")
            return NaverSerpResult(keyword=keyword)

        sections = self.detect_sections(html)
        above_fold = self.analyze_section_priority(sections)
        brand_present, brand_name = self.check_brand_zone(html)

        # Build section order
        section_order = [s.section_type for s in sorted(sections, key=lambda x: x.position)]

        # Count ads
        ad_sections = [s for s in sections if s.section_type == "ad"]
        ad_count = sum(s.item_count for s in ad_sections) if ad_sections else 0

        # Check special sections
        has_view = any(s.section_type == "view_tab" for s in sections)
        has_place = any(s.section_type == "place" for s in sections)
        dominant = self._find_dominant_section(sections)

        result = NaverSerpResult(
            keyword=keyword,
            sections=sections,
            section_order=section_order,
            brand_zone_present=brand_present,
            brand_zone_brand=brand_name,
            total_sections=len(sections),
            above_fold_sections=above_fold,
            ad_count=ad_count,
            dominant_section=dominant,
            has_view_tab=has_view,
            has_place_section=has_place,
        )
        return result


# ---------------------------------------------------------------------------
# Output Helpers
# ---------------------------------------------------------------------------


def result_to_dict(result: NaverSerpResult) -> dict[str, Any]:
    """Convert NaverSerpResult to a JSON-serializable dictionary."""
    d = asdict(result)
    # Remove raw HTML snippets from JSON output to keep it clean
    for section in d.get("sections", []):
        section.pop("raw_html_snippet", None)
    return d


def print_rich_report(result: NaverSerpResult) -> None:
    """Print a human-readable report using rich."""
    console.rule(f"[bold blue]Naver SERP Analysis: {result.keyword}")
    console.print(f"[dim]Timestamp: {result.timestamp}[/dim]")
    console.print()

    # Summary
    summary_table = Table(title="Summary", show_lines=True)
    summary_table.add_column("Metric", style="cyan")
    summary_table.add_column("Value", style="green")
    summary_table.add_row("Total Sections", str(result.total_sections))
    summary_table.add_row("Ad Count", str(result.ad_count))
    summary_table.add_row("Brand Zone", "Yes" if result.brand_zone_present else "No")
    if result.brand_zone_brand:
        summary_table.add_row("Brand Name", result.brand_zone_brand)
    summary_table.add_row("VIEW Tab", "Yes" if result.has_view_tab else "No")
    summary_table.add_row("Place Section", "Yes" if result.has_place_section else "No")
    summary_table.add_row("Dominant Section", result.dominant_section or "N/A")
    console.print(summary_table)
    console.print()

    # Section Details
    if result.sections:
        section_table = Table(title="Detected Sections", show_lines=True)
        section_table.add_column("#", style="bold")
        section_table.add_column("Section", style="cyan")
        section_table.add_column("Display Name", style="magenta")
        section_table.add_column("Items", style="green")
        section_table.add_column("Above Fold", style="yellow")
        section_table.add_column("More Link", style="dim")

        for s in sorted(result.sections, key=lambda x: x.position):
            section_table.add_row(
                str(s.position),
                s.section_type,
                s.display_name,
                str(s.item_count),
                "Yes" if s.is_above_fold else "No",
                "Yes" if s.has_more_link else "No",
            )
        console.print(section_table)
        console.print()

    # Above-Fold Sections
    if result.above_fold_sections:
        console.print("[bold]Above-Fold Section Order:[/bold]")
        for i, sec in enumerate(result.above_fold_sections, 1):
            display = SECTION_DISPLAY_NAMES.get(sec, sec)
            console.print(f"  {i}. {display} ({sec})")
        console.print()

    # Section Order
    if result.section_order:
        console.print("[bold]Full Section Order:[/bold]")
        order_str = " -> ".join(
            SECTION_DISPLAY_NAMES.get(s, s) for s in result.section_order
        )
        console.print(f"  {order_str}")

    console.rule()


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Naver SERP composition analysis",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python naver_serp_analyzer.py --keyword "치과 임플란트" --json
  python naver_serp_analyzer.py --keywords-file keywords.txt --json
  python naver_serp_analyzer.py --keyword "치과 임플란트" --output report.json
""",
    )
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument(
        "--keyword",
        type=str,
        help="Single keyword to analyze",
    )
    group.add_argument(
        "--keywords-file",
        type=str,
        help="Path to file with one keyword per line",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="json_output",
        help="Output results as JSON",
    )
    parser.add_argument(
        "--output",
        type=str,
        help="Write JSON results to file",
    )
    return parser


def load_keywords(filepath: str) -> list[str]:
    """Load keywords from a text file, one per line."""
    path = Path(filepath)
    if not path.exists():
        logger.error(f"Keywords file not found: {filepath}")
        sys.exit(1)
    keywords = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            kw = line.strip()
            if kw and not kw.startswith("#"):
                keywords.append(kw)
    logger.info(f"Loaded {len(keywords)} keywords from {filepath}")
    return keywords


def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    analyzer = NaverSerpAnalyzer()

    # Collect keywords
    if args.keyword:
        keywords = [args.keyword]
    else:
        keywords = load_keywords(args.keywords_file)

    if not keywords:
        logger.error("No keywords to analyze")
        sys.exit(1)

    results: list[dict[str, Any]] = []

    for kw in keywords:
        console.print(f"\n[bold]Analyzing Naver SERP:[/bold] {kw}")
        result = analyzer.analyze(kw)

        if args.json_output or args.output:
            results.append(result_to_dict(result))
        else:
            print_rich_report(result)

    # JSON output
    if args.json_output:
        output_data = results[0] if len(results) == 1 else results
        print(json.dumps(output_data, ensure_ascii=False, indent=2))

    if args.output:
        output_data = results[0] if len(results) == 1 else results
        output_path = Path(args.output)
        with open(output_path, "w", encoding="utf-8") as fh:
            json.dump(output_data, fh, ensure_ascii=False, indent=2)
        logger.info(f"Results written to {output_path}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,9 @@
# 20-seo-serp-analysis dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0

891 custom-skills/20-seo-serp-analysis/code/scripts/serp_analyzer.py Normal file
@@ -0,0 +1,891 @@
"""
SERP Analyzer - Google SERP feature detection and competitor mapping
====================================================================
Purpose: Analyze Google SERP features, map competitor positions,
         classify content types, and score SERP opportunities.
Python: 3.10+

Usage:
    python serp_analyzer.py --keyword "치과 임플란트" --country kr --json
    python serp_analyzer.py --keywords-file keywords.txt --country kr --json
    python serp_analyzer.py --keyword "dental implant" --output serp_report.json
"""

import argparse
import json
import logging
import re
import subprocess
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any
from urllib.parse import urlparse

from rich.console import Console
from rich.table import Table

# ---------------------------------------------------------------------------
# Logging
# ---------------------------------------------------------------------------
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
console = Console()

# ---------------------------------------------------------------------------
# Data Classes
# ---------------------------------------------------------------------------


@dataclass
class SerpFeatures:
    """Tracks presence and count of Google SERP features."""

    featured_snippet: bool = False
    people_also_ask: bool = False
    local_pack: bool = False
    knowledge_panel: bool = False
    video_carousel: bool = False
    image_pack: bool = False
    site_links: bool = False
    ads_top: int = 0
    ads_bottom: int = 0
    shopping: bool = False

    @property
    def feature_count(self) -> int:
        """Count of boolean features that are present."""
        count = 0
        for f in [
            self.featured_snippet,
            self.people_also_ask,
            self.local_pack,
            self.knowledge_panel,
            self.video_carousel,
            self.image_pack,
            self.site_links,
            self.shopping,
        ]:
            if f:
                count += 1
        return count

    @property
    def has_ads(self) -> bool:
        return self.ads_top > 0 or self.ads_bottom > 0


@dataclass
class CompetitorPosition:
    """A single competitor entry in the SERP."""

    position: int
    url: str
    domain: str
    title: str = ""
    content_type: str = "unknown"
    is_featured: bool = False
    has_sitelinks: bool = False
    estimated_traffic_share: float = 0.0


@dataclass
class SerpResult:
    """Complete SERP analysis result for a keyword."""

    keyword: str
    country: str = "us"
    search_volume: int = 0
    keyword_difficulty: float = 0.0
    cpc: float = 0.0
    serp_features: SerpFeatures = field(default_factory=SerpFeatures)
    competitors: list[CompetitorPosition] = field(default_factory=list)
    opportunity_score: int = 0
    intent_signals: str = "informational"
    content_type_distribution: dict[str, int] = field(default_factory=dict)
    volatility: str = "stable"
    timestamp: str = ""

    def __post_init__(self):
        if not self.timestamp:
            self.timestamp = datetime.now().isoformat()


# ---------------------------------------------------------------------------
# Content Type Classifiers
# ---------------------------------------------------------------------------

# URL path patterns that hint at content type
URL_CONTENT_PATTERNS: dict[str, list[str]] = {
    "blog": [
        r"/blog/",
        r"/post/",
        r"/article/",
        r"/news/",
        r"/magazine/",
        r"/journal/",
        r"/column/",
        r"/story/",
        r"\d{4}/\d{2}/",
    ],
    "product": [
        r"/product/",
        r"/item/",
        r"/shop/",
        r"/store/",
        r"/buy/",
        r"/p/",
        r"/goods/",
        r"/catalog/",
    ],
    "service": [
        r"/service",
        r"/solution",
        r"/treatment",
        r"/procedure",
        r"/pricing",
        r"/consultation",
    ],
    "news": [
        r"/news/",
        r"/press/",
        r"/media/",
        r"/release/",
        r"news\.",
        r"press\.",
    ],
    "video": [
        r"youtube\.com/watch",
        r"youtu\.be/",
        r"vimeo\.com/",
        r"/video/",
        r"/watch/",
    ],
    "forum": [
        r"/forum/",
        r"/community/",
        r"/discuss",
        r"/thread/",
        r"/question/",
        r"/answers/",
    ],
    "wiki": [
        r"wikipedia\.org",
        r"/wiki/",
        r"namu\.wiki",
    ],
}

# Title keywords that hint at content type
TITLE_CONTENT_PATTERNS: dict[str, list[str]] = {
    "blog": ["블로그", "후기", "리뷰", "review", "guide", "가이드", "팁", "tips"],
    "product": ["구매", "가격", "buy", "price", "shop", "할인", "sale", "최저가"],
    "service": ["상담", "치료", "진료", "병원", "클리닉", "clinic", "treatment"],
    "news": ["뉴스", "속보", "보도", "news", "기사", "report"],
    "video": ["영상", "동영상", "video", "youtube"],
    "comparison": ["비교", "vs", "versus", "compare", "차이", "best"],
}

# CTR distribution by position (approximate click-through rates)
CTR_BY_POSITION: dict[int, float] = {
    1: 0.316,
    2: 0.158,
    3: 0.110,
    4: 0.080,
    5: 0.062,
    6: 0.049,
    7: 0.040,
    8: 0.034,
    9: 0.029,
    10: 0.025,
}


# ---------------------------------------------------------------------------
|
||||||
|
# SERP Analyzer
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
class SerpAnalyzer:
|
||||||
|
"""Analyzes Google SERP features, competitor positions, and opportunities."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
self.logger = logging.getLogger(self.__class__.__name__)
|
||||||
|
|
||||||
|
# ----- Data Fetching -----
|
||||||
|
|
||||||
|
def get_serp_data(self, keyword: str, country: str = "us") -> dict[str, Any]:
|
||||||
|
"""
|
||||||
|
Fetch SERP data via Ahrefs serp-overview MCP tool.
|
||||||
|
|
||||||
|
Uses subprocess to invoke the Ahrefs MCP tool. Falls back to a
|
||||||
|
structured placeholder when the MCP tool is unavailable (e.g., in
|
||||||
|
standalone / CI environments).
|
||||||
|
"""
|
||||||
|
self.logger.info(f"Fetching SERP data for '{keyword}' (country={country})")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Attempt MCP tool call via subprocess
|
||||||
|
cmd = [
|
||||||
|
"claude",
|
||||||
|
"mcp",
|
||||||
|
"call",
|
||||||
|
"ahrefs",
|
||||||
|
"serp-overview",
|
||||||
|
json.dumps({"keyword": keyword, "country": country}),
|
||||||
|
]
|
||||||
|
result = subprocess.run(
|
||||||
|
cmd,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=60,
|
||||||
|
)
|
||||||
|
if result.returncode == 0 and result.stdout.strip():
|
||||||
|
data = json.loads(result.stdout)
|
||||||
|
self.logger.info("Successfully fetched SERP data via MCP")
|
||||||
|
return data
|
||||||
|
except (subprocess.TimeoutExpired, FileNotFoundError, json.JSONDecodeError) as exc:
|
||||||
|
self.logger.warning(f"MCP call unavailable ({exc}), using keyword metrics fallback")
|
||||||
|
|
||||||
|
# Fallback: try Ahrefs keywords-explorer-overview
|
||||||
|
try:
|
||||||
|
cmd_kw = [
|
||||||
|
"claude",
|
||||||
|
"mcp",
|
||||||
|
"call",
|
||||||
|
"ahrefs",
|
||||||
|
"keywords-explorer-overview",
|
||||||
|
json.dumps({"keyword": keyword, "country": country}),
|
||||||
|
]
|
||||||
|
result_kw = subprocess.run(
|
||||||
|
cmd_kw,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=60,
|
||||||
|
)
|
||||||
|
if result_kw.returncode == 0 and result_kw.stdout.strip():
|
||||||
|
data = json.loads(result_kw.stdout)
|
||||||
|
self.logger.info("Fetched keyword overview via MCP")
|
||||||
|
return data
|
||||||
|
except (subprocess.TimeoutExpired, FileNotFoundError, json.JSONDecodeError) as exc:
|
||||||
|
self.logger.warning(f"Keywords-explorer MCP also unavailable ({exc})")
|
||||||
|
|
||||||
|
# Return empty structure when no MCP tools available
|
||||||
|
self.logger.warning(
|
||||||
|
"No MCP data source available. Run inside Claude Desktop "
|
||||||
|
"or provide data via --input flag."
|
||||||
|
)
|
||||||
|
return {
|
||||||
|
"keyword": keyword,
|
||||||
|
"country": country,
|
||||||
|
"serp": [],
|
||||||
|
"serp_features": {},
|
||||||
|
"metrics": {},
|
||||||
|
}
|
||||||
|
|
||||||
|
    # ----- Feature Detection -----

    def detect_features(self, serp_data: dict[str, Any]) -> SerpFeatures:
        """
        Identify SERP features from Ahrefs response data.

        Handles both the structured 'serp_features' dict returned by
        keywords-explorer-overview and the raw SERP items list from
        serp-overview.
        """
        features = SerpFeatures()

        # -- Method 1: structured serp_features from Ahrefs --
        sf = serp_data.get("serp_features", {})
        if isinstance(sf, dict):
            features.featured_snippet = sf.get("featured_snippet", False)
            features.people_also_ask = sf.get("people_also_ask", False)
            features.local_pack = sf.get("local_pack", False)
            features.knowledge_panel = sf.get("knowledge_panel", False) or sf.get(
                "knowledge_graph", False
            )
            features.video_carousel = sf.get("video", False) or sf.get(
                "video_carousel", False
            )
            features.image_pack = sf.get("image_pack", False) or sf.get(
                "images", False
            )
            features.site_links = sf.get("sitelinks", False) or sf.get(
                "site_links", False
            )
            features.shopping = sf.get("shopping_results", False) or sf.get(
                "shopping", False
            )
            features.ads_top = int(sf.get("ads_top", 0) or 0)
            features.ads_bottom = int(sf.get("ads_bottom", 0) or 0)

        # -- Method 2: infer from raw SERP items list --
        serp_items = serp_data.get("serp", [])
        if isinstance(serp_items, list):
            for item in serp_items:
                item_type = str(item.get("type", "")).lower()
                if "featured_snippet" in item_type or item.get("is_featured"):
                    features.featured_snippet = True
                if "people_also_ask" in item_type or "paa" in item_type:
                    features.people_also_ask = True
                if "local" in item_type or "map" in item_type:
                    features.local_pack = True
                if "knowledge" in item_type:
                    features.knowledge_panel = True
                if "video" in item_type:
                    features.video_carousel = True
                if "image" in item_type:
                    features.image_pack = True
                if item.get("sitelinks"):
                    features.site_links = True
                if "shopping" in item_type:
                    features.shopping = True
                if "ad" in item_type:
                    pos = item.get("position", 0)
                    if pos <= 4:
                        features.ads_top += 1
                    else:
                        features.ads_bottom += 1

        return features

    # ----- Competitor Mapping -----

    def map_competitors(self, serp_data: dict[str, Any]) -> list[CompetitorPosition]:
        """Extract competitor positions and domains from SERP data."""
        competitors: list[CompetitorPosition] = []
        serp_items = serp_data.get("serp", [])

        if not isinstance(serp_items, list):
            return competitors

        for item in serp_items:
            url = item.get("url", "")
            if not url:
                continue

            # Skip ads for organic mapping
            item_type = str(item.get("type", "")).lower()
            if "ad" in item_type:
                continue

            parsed = urlparse(url)
            domain = parsed.netloc.replace("www.", "")
            position = int(item.get("position", len(competitors) + 1))
            title = item.get("title", "")

            content_type = self.classify_content_type(item)
            traffic_share = CTR_BY_POSITION.get(position, 0.01)

            comp = CompetitorPosition(
                position=position,
                url=url,
                domain=domain,
                title=title,
                content_type=content_type,
                is_featured=bool(item.get("is_featured")),
                has_sitelinks=bool(item.get("sitelinks")),
                estimated_traffic_share=round(traffic_share, 4),
            )
            competitors.append(comp)

        # Sort by position
        competitors.sort(key=lambda c: c.position)
        return competitors

    # ----- Content Type Classification -----

    def classify_content_type(self, result: dict[str, Any]) -> str:
        """
        Classify a SERP result as blog/product/service/news/video/forum/wiki
        based on URL patterns and title keywords.
        """
        url = result.get("url", "").lower()
        title = result.get("title", "").lower()

        scores: dict[str, int] = {}

        # Score from URL patterns
        for ctype, patterns in URL_CONTENT_PATTERNS.items():
            for pattern in patterns:
                if re.search(pattern, url):
                    scores[ctype] = scores.get(ctype, 0) + 2
                    break

        # Score from title patterns
        for ctype, keywords in TITLE_CONTENT_PATTERNS.items():
            for kw in keywords:
                if kw.lower() in title:
                    scores[ctype] = scores.get(ctype, 0) + 1

        if not scores:
            # Heuristic: fall back to known authority domains
            parsed = urlparse(url)
            domain = parsed.netloc.lower()
            if any(d in domain for d in ["wikipedia", "namu.wiki"]):
                return "wiki"
            if any(d in domain for d in ["youtube", "vimeo"]):
                return "video"
            if any(d in domain for d in ["naver.com", "tistory.com", "brunch.co.kr"]):
                return "blog"
            return "service_page"

        # Return highest scoring type
        return max(scores, key=scores.get)  # type: ignore[arg-type]

    # ----- Opportunity Scoring -----

    def calculate_opportunity_score(
        self,
        features: SerpFeatures,
        positions: list[CompetitorPosition],
    ) -> int:
        """
        Score SERP opportunity from 0-100.

        Higher scores indicate better opportunity to rank or gain features.

        Factors (additive):
        - Featured snippet available but could be captured +15
        - PAA present (related question opportunity) +10
        - No knowledge panel (less SERP real estate taken) +10
        - Low ad count (more organic visibility) +10
        - Few sitelinks in top results +5
        - Content diversity (various domains in top 10) +10
        - No video carousel (opportunity to add video) +5
        - Top results are blogs (easier to outrank) +10
        - Image pack absent (image SEO opportunity) +5
        - Shopping absent for commercial keywords +5
        - Top positions lacking schema/rich results +5

        Penalty factors (subtractive):
        - Knowledge panel dominates -15
        - Heavy ad presence (4+ top ads) -10
        - Single domain dominates top 5 -10
        """
        score = 50  # Base score

        # -- Positive signals --
        if features.featured_snippet:
            score += 15
        if features.people_also_ask:
            score += 10
        if not features.knowledge_panel:
            score += 10
        if features.ads_top <= 1:
            score += 10
        elif features.ads_top <= 2:
            score += 5
        if not features.video_carousel:
            score += 5
        if not features.image_pack:
            score += 5
        if not features.shopping:
            score += 5

        # Domain diversity in top 10
        if positions:
            top10_domains = {p.domain for p in positions[:10]}
            if len(top10_domains) >= 8:
                score += 10
            elif len(top10_domains) >= 5:
                score += 5

        # Blog-heavy top results (easier to compete)
        blog_count = sum(
            1 for p in positions[:5] if p.content_type == "blog"
        )
        if blog_count >= 3:
            score += 10
        elif blog_count >= 2:
            score += 5

        # Sitelinks reduce available space
        sitelink_count = sum(1 for p in positions[:5] if p.has_sitelinks)
        if sitelink_count <= 1:
            score += 5

        # Single domain dominance penalty
        domain_counts: dict[str, int] = {}
        for p in positions[:5]:
            domain_counts[p.domain] = domain_counts.get(p.domain, 0) + 1
        if any(c >= 3 for c in domain_counts.values()):
            score -= 10

        # -- Negative signals --
        if features.knowledge_panel:
            score -= 15
        if features.ads_top >= 4:
            score -= 10
        elif features.ads_top >= 3:
            score -= 5

        # Clamp to 0-100
        return max(0, min(100, score))

    # ----- Intent Validation -----

    def validate_intent(
        self,
        features: SerpFeatures,
        positions: list[CompetitorPosition],
    ) -> str:
        """
        Infer search intent from SERP composition.

        Returns one of: informational, navigational, commercial, transactional, local
        """
        signals: dict[str, int] = {
            "informational": 0,
            "navigational": 0,
            "commercial": 0,
            "transactional": 0,
            "local": 0,
        }

        # Feature-based signals
        if features.featured_snippet:
            signals["informational"] += 3
        if features.people_also_ask:
            signals["informational"] += 2
        if features.knowledge_panel:
            signals["informational"] += 2
            signals["navigational"] += 2
        if features.local_pack:
            signals["local"] += 5
        if features.shopping:
            signals["transactional"] += 4
        if features.has_ads:
            signals["commercial"] += 2
            signals["transactional"] += 1
        if features.ads_top >= 3:
            signals["transactional"] += 2
        if features.image_pack:
            signals["informational"] += 1
        if features.video_carousel:
            signals["informational"] += 1

        # Content type signals from top results
        for pos in positions[:10]:
            ct = pos.content_type
            if ct == "blog":
                signals["informational"] += 1
            elif ct == "product":
                signals["transactional"] += 2
            elif ct == "service":
                signals["commercial"] += 1
            elif ct == "news":
                signals["informational"] += 1
            elif ct == "video":
                signals["informational"] += 1
            elif ct == "wiki":
                signals["informational"] += 2
            elif ct == "forum":
                signals["informational"] += 1
            elif ct == "comparison":
                signals["commercial"] += 2

        # Navigational: single domain dominates top 3
        if positions:
            top3_domains = [p.domain for p in positions[:3]]
            if len(set(top3_domains)) == 1:
                signals["navigational"] += 5

        # Return highest signal
        return max(signals, key=signals.get)  # type: ignore[arg-type]

    # ----- Content Type Distribution -----

    def _content_type_distribution(
        self, positions: list[CompetitorPosition]
    ) -> dict[str, int]:
        """Count content types across top organic results."""
        dist: dict[str, int] = {}
        for p in positions[:10]:
            dist[p.content_type] = dist.get(p.content_type, 0) + 1
        return dict(sorted(dist.items(), key=lambda x: x[1], reverse=True))

    # ----- Volatility Assessment -----

    def _assess_volatility(self, serp_data: dict[str, Any]) -> str:
        """
        Assess SERP volatility based on available signals.

        Returns: stable, moderate, volatile
        """
        # Check if Ahrefs provides a volatility/movement score
        metrics = serp_data.get("metrics", {})
        if isinstance(metrics, dict):
            volatility_score = metrics.get("serp_volatility", None)
            if volatility_score is not None:
                if volatility_score < 3:
                    return "stable"
                elif volatility_score < 7:
                    return "moderate"
                else:
                    return "volatile"

        # Heuristic: if many results have recent dates, the SERP is more volatile
        serp_items = serp_data.get("serp", [])
        if isinstance(serp_items, list) and serp_items:
            recent_count = 0
            for item in serp_items[:10]:
                last_seen = item.get("last_seen", "")
                if last_seen:
                    try:
                        dt = datetime.fromisoformat(last_seen.replace("Z", "+00:00"))
                        if (datetime.now(dt.tzinfo) - dt).days < 30:
                            recent_count += 1
                    except (ValueError, TypeError):
                        pass
            if recent_count >= 5:
                return "volatile"
            elif recent_count >= 3:
                return "moderate"

        return "stable"

    # ----- Main Analysis Orchestrator -----

    def analyze(self, keyword: str, country: str = "us") -> SerpResult:
        """
        Orchestrate full SERP analysis for a single keyword.

        Steps:
        1. Fetch SERP data from Ahrefs MCP
        2. Detect SERP features
        3. Map competitor positions
        4. Classify content types
        5. Calculate opportunity score
        6. Validate search intent
        7. Assess volatility
        """
        serp_data = self.get_serp_data(keyword, country)

        features = self.detect_features(serp_data)
        positions = self.map_competitors(serp_data)
        opportunity = self.calculate_opportunity_score(features, positions)
        intent = self.validate_intent(features, positions)
        content_dist = self._content_type_distribution(positions)
        volatility = self._assess_volatility(serp_data)

        # Extract keyword metrics if available
        metrics = serp_data.get("metrics", {})
        search_volume = int(metrics.get("search_volume", 0) or 0)
        keyword_difficulty = float(metrics.get("keyword_difficulty", 0) or 0)
        cpc = float(metrics.get("cpc", 0) or 0)

        result = SerpResult(
            keyword=keyword,
            country=country,
            search_volume=search_volume,
            keyword_difficulty=keyword_difficulty,
            cpc=cpc,
            serp_features=features,
            competitors=positions,
            opportunity_score=opportunity,
            intent_signals=intent,
            content_type_distribution=content_dist,
            volatility=volatility,
        )
        return result

# ---------------------------------------------------------------------------
# Output Helpers
# ---------------------------------------------------------------------------


def result_to_dict(result: SerpResult) -> dict[str, Any]:
    """Convert SerpResult to a JSON-serializable dictionary."""
    return asdict(result)


def print_rich_report(result: SerpResult) -> None:
    """Print a human-readable report using rich."""
    console.rule(f"[bold blue]SERP Analysis: {result.keyword}")
    console.print(f"[dim]Country: {result.country} | Timestamp: {result.timestamp}[/dim]")
    console.print()

    # Metrics
    if result.search_volume or result.keyword_difficulty:
        metrics_table = Table(title="Keyword Metrics", show_lines=True)
        metrics_table.add_column("Metric", style="cyan")
        metrics_table.add_column("Value", style="green")
        metrics_table.add_row("Search Volume", f"{result.search_volume:,}")
        metrics_table.add_row("Keyword Difficulty", f"{result.keyword_difficulty:.1f}")
        metrics_table.add_row("CPC", f"${result.cpc:.2f}")
        console.print(metrics_table)
        console.print()

    # SERP Features
    feat = result.serp_features
    feat_table = Table(title="SERP Features", show_lines=True)
    feat_table.add_column("Feature", style="cyan")
    feat_table.add_column("Present", style="green")
    feat_table.add_row("Featured Snippet", _bool_icon(feat.featured_snippet))
    feat_table.add_row("People Also Ask", _bool_icon(feat.people_also_ask))
    feat_table.add_row("Local Pack", _bool_icon(feat.local_pack))
    feat_table.add_row("Knowledge Panel", _bool_icon(feat.knowledge_panel))
    feat_table.add_row("Video Carousel", _bool_icon(feat.video_carousel))
    feat_table.add_row("Image Pack", _bool_icon(feat.image_pack))
    feat_table.add_row("Site Links", _bool_icon(feat.site_links))
    feat_table.add_row("Shopping", _bool_icon(feat.shopping))
    feat_table.add_row("Ads (top)", str(feat.ads_top))
    feat_table.add_row("Ads (bottom)", str(feat.ads_bottom))
    console.print(feat_table)
    console.print()

    # Competitors
    if result.competitors:
        comp_table = Table(title="Top Competitors", show_lines=True)
        comp_table.add_column("#", style="bold")
        comp_table.add_column("Domain", style="cyan")
        comp_table.add_column("Type", style="magenta")
        comp_table.add_column("CTR Share", style="green")
        comp_table.add_column("Featured", style="yellow")
        for c in result.competitors[:10]:
            comp_table.add_row(
                str(c.position),
                c.domain,
                c.content_type,
                f"{c.estimated_traffic_share:.1%}",
                _bool_icon(c.is_featured),
            )
        console.print(comp_table)
        console.print()

    # Content Distribution
    if result.content_type_distribution:
        dist_table = Table(title="Content Type Distribution (Top 10)", show_lines=True)
        dist_table.add_column("Content Type", style="cyan")
        dist_table.add_column("Count", style="green")
        for ct, count in result.content_type_distribution.items():
            dist_table.add_row(ct, str(count))
        console.print(dist_table)
        console.print()

    # Summary
    opp_color = "green" if result.opportunity_score >= 60 else (
        "yellow" if result.opportunity_score >= 40 else "red"
    )
    console.print(f"Opportunity Score: [{opp_color}]{result.opportunity_score}/100[/{opp_color}]")
    console.print(f"Search Intent: [bold]{result.intent_signals}[/bold]")
    console.print(f"SERP Volatility: [bold]{result.volatility}[/bold]")
    console.rule()


def _bool_icon(val: bool) -> str:
    """Return Yes/No string for boolean values."""
    return "Yes" if val else "No"

# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Google SERP feature detection and competitor mapping",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python serp_analyzer.py --keyword "치과 임플란트" --country kr --json
  python serp_analyzer.py --keywords-file keywords.txt --country kr --output report.json
""",
    )
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument(
        "--keyword",
        type=str,
        help="Single keyword to analyze",
    )
    group.add_argument(
        "--keywords-file",
        type=str,
        help="Path to file with one keyword per line",
    )
    parser.add_argument(
        "--country",
        type=str,
        default="us",
        help="Country code for SERP (default: us)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="json_output",
        help="Output results as JSON",
    )
    parser.add_argument(
        "--output",
        type=str,
        help="Write JSON results to file",
    )
    return parser


def load_keywords(filepath: str) -> list[str]:
    """Load keywords from a text file, one per line."""
    path = Path(filepath)
    if not path.exists():
        logger.error(f"Keywords file not found: {filepath}")
        sys.exit(1)
    keywords = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            kw = line.strip()
            if kw and not kw.startswith("#"):
                keywords.append(kw)
    logger.info(f"Loaded {len(keywords)} keywords from {filepath}")
    return keywords


def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    analyzer = SerpAnalyzer()

    # Collect keywords
    if args.keyword:
        keywords = [args.keyword]
    else:
        keywords = load_keywords(args.keywords_file)

    if not keywords:
        logger.error("No keywords to analyze")
        sys.exit(1)

    results: list[dict[str, Any]] = []

    for kw in keywords:
        console.print(f"\n[bold]Analyzing:[/bold] {kw}")
        result = analyzer.analyze(kw, args.country)

        if args.json_output or args.output:
            results.append(result_to_dict(result))
        else:
            print_rich_report(result)

    # JSON output
    if args.json_output:
        output_data = results[0] if len(results) == 1 else results
        print(json.dumps(output_data, ensure_ascii=False, indent=2))

    if args.output:
        output_data = results[0] if len(results) == 1 else results
        output_path = Path(args.output)
        with open(output_path, "w", encoding="utf-8") as fh:
            json.dump(output_data, fh, ensure_ascii=False, indent=2)
        logger.info(f"Results written to {output_path}")


if __name__ == "__main__":
    main()
121
custom-skills/20-seo-serp-analysis/desktop/SKILL.md
Normal file
@@ -0,0 +1,121 @@
---
name: seo-serp-analysis
description: |
  SERP analysis for Google and Naver search results.
  Triggers: SERP analysis, search results, featured snippet, SERP features, Naver SERP, 검색결과 분석, SERP 분석.
---

# SEO SERP Analysis

## Purpose

Analyze search engine result page composition for Google and Naver. Detect SERP features (featured snippets, PAA, knowledge panels, local pack, video, ads), map competitor positions, score SERP feature opportunities, and analyze Naver section distribution.

## Core Capabilities

1. **Google SERP Feature Detection** - Identify featured snippets, PAA, knowledge panels, local pack, video carousel, ads, image pack, site links, shopping
2. **Competitor Position Mapping** - Extract domains, positions, and content types for top organic results
3. **Opportunity Scoring** - Score SERP opportunity (0-100) based on feature landscape and competition
4. **Search Intent Validation** - Infer intent (informational, navigational, commercial, transactional, local) from SERP composition
5. **Naver SERP Composition** - Detect sections (blog, cafe, knowledge iN, Smart Store, brand zone, VIEW tab), map section priority, analyze brand zone presence

## MCP Tool Usage

### Ahrefs for SERP Data
```
mcp__ahrefs__serp-overview: Get SERP results and features for a keyword
mcp__ahrefs__keywords-explorer-overview: Get keyword metrics, volume, difficulty, and SERP feature flags
mcp__ahrefs__site-explorer-organic-keywords: Map competitor keyword positions
```

### Notion for Report Storage
```
mcp__notion__notion-create-pages: Save analysis report to SEO Audit Log database
mcp__notion__notion-update-page: Update existing report entries
```

### Web Tools for Naver Analysis
```
WebSearch: Discover Naver search trends
WebFetch: Fetch Naver SERP HTML for section analysis
```

## Workflow

### 1. Google SERP Analysis
1. Fetch SERP data via `mcp__ahrefs__serp-overview` for the target keyword and country
2. Detect SERP features (featured snippet, PAA, local pack, knowledge panel, video, ads, images, shopping)
3. Map competitor positions from organic results (domain, URL, title, position)
4. Classify the content type of each result (blog, product, service, news, video)
5. Calculate an opportunity score (0-100) based on the feature landscape
6. Validate search intent from SERP composition
7. Assess SERP volatility
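
For quick triage outside the full script, the weighting above compresses into a few lines. This sketch is illustrative only; it covers the feature signals but not the domain-diversity and blog-share factors the companion script also scores:

```python
# Illustrative sketch of the documented weighting; not the full scorer.
def opportunity_score(features: dict) -> int:
    score = 50  # base score
    score += 15 if features.get("featured_snippet") else 0       # snippet capturable
    score += 10 if features.get("people_also_ask") else 0        # PAA opportunity
    score += 10 if not features.get("knowledge_panel") else -15  # panel takes real estate
    score += 10 if features.get("ads_top", 0) <= 1 else 0        # light ad load
    return max(0, min(100, score))

print(opportunity_score({"featured_snippet": True, "people_also_ask": True, "ads_top": 3}))  # 85
```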

### 2. Naver SERP Analysis
1. Fetch the Naver search page for the target keyword
2. Detect SERP sections (blog, cafe, knowledge iN, Smart Store, brand zone, VIEW tab, news, encyclopedia)
3. Map section priority (above-fold order)
4. Check brand zone presence and extract the brand name
5. Count items per section
6. Identify the dominant content section
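
Section detection can be sketched as substring checks against known section markers. The class names below are hypothetical placeholders; real Naver markup changes periodically, so any marker list needs regular review:

```python
# Hypothetical markers; verify against current Naver markup before relying on them.
SECTION_MARKERS = {
    "blog": ["sp_nblog"],
    "cafe": ["sp_ncafe"],
    "kin": ["sp_nkin"],          # Knowledge iN
    "brand_zone": ["brand_area"],
}

def detect_sections(html: str) -> list[str]:
    """Return section labels whose markers appear in the fetched SERP HTML."""
    return [
        section
        for section, markers in SECTION_MARKERS.items()
        if any(m in html for m in markers)
    ]

print(detect_sections('<div class="sp_nblog">...</div><div class="brand_area">...</div>'))  # ['blog', 'brand_zone']
```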

### 3. Report Generation
1. Compile results into structured JSON
2. Generate a Korean-language report
3. Save to the Notion SEO Audit Log database

## Output Format

```json
{
  "keyword": "치과 임플란트",
  "country": "kr",
  "serp_features": {
    "featured_snippet": true,
    "people_also_ask": true,
    "local_pack": true,
    "knowledge_panel": false,
    "video_carousel": false,
    "ads_top": 3,
    "ads_bottom": 2
  },
  "competitors": [
    {
      "position": 1,
      "url": "https://example.com/page",
      "domain": "example.com",
      "title": "...",
      "content_type": "service_page"
    }
  ],
  "opportunity_score": 72,
  "intent_signals": "commercial",
  "timestamp": "2025-01-01T00:00:00"
}
```

## Common SERP Features

| Feature | Impact | Opportunity |
|---------|--------|-------------|
| Featured Snippet | High visibility above organic | Optimize content format for snippet capture |
| People Also Ask | Related question visibility | Create FAQ content targeting PAA |
| Local Pack | Dominates local-intent SERPs | Optimize Google Business Profile |
| Knowledge Panel | Reduces organic CTR | Focus on brand queries and schema |
| Video Carousel | Visual SERP real estate | Create video content for the keyword |
| Shopping | Transactional intent signal | Product feed optimization |

## Limitations

- Ahrefs SERP data may have a delay (not real-time)
- Naver SERP HTML structure changes periodically
- Brand zone detection depends on HTML class patterns
- Cannot detect personalized SERP results

## Notion Output (Required)

All audit reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category, Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: SERP-YYYYMMDD-NNN
14
custom-skills/20-seo-serp-analysis/desktop/skill.yaml
Normal file
@@ -0,0 +1,14 @@
# Skill metadata (extracted from SKILL.md frontmatter)

name: seo-serp-analysis
description: |
  SERP analysis for Google and Naver. Triggers: SERP analysis, search results, featured snippet, SERP features, Naver SERP.

# Optional fields
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch

# triggers: []  # TODO: Extract from description
15
custom-skills/20-seo-serp-analysis/desktop/tools/ahrefs.md
Normal file
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
15
custom-skills/20-seo-serp-analysis/desktop/tools/notion.md
Normal file
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
148
custom-skills/21-seo-position-tracking/code/CLAUDE.md
Normal file
@@ -0,0 +1,148 @@
# CLAUDE.md

## Overview

Position tracking tool for monitoring keyword rankings via Ahrefs Rank Tracker. Monitors ranking positions, detects position changes with threshold alerts, calculates visibility scores weighted by search volume, compares against competitors, and segments by brand/non-brand keywords.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Track positions for a project
python scripts/position_tracker.py --target https://example.com --json

# Generate ranking report
python scripts/ranking_reporter.py --target https://example.com --period 30 --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `position_tracker.py` | Monitor keyword ranking positions and detect changes | Position data, change alerts, visibility scores |
| `ranking_reporter.py` | Generate ranking performance reports with trends | Trend analysis, segment reports, competitor comparison |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Position Tracker

```bash
# Get current positions
python scripts/position_tracker.py --target https://example.com --json

# With change-threshold alerts (flag positions that moved ±5 or more)
python scripts/position_tracker.py --target https://example.com --threshold 5 --json

# Filter by keyword segment
python scripts/position_tracker.py --target https://example.com --segment brand --json

# Compare with competitors
python scripts/position_tracker.py --target https://example.com --competitor https://comp1.com --json
```

**Capabilities**:
- Current ranking position retrieval via Ahrefs Rank Tracker
- Position change detection with configurable threshold alerts
- Visibility score calculation (weighted by search volume)
- Brand vs non-brand keyword segmentation
- Competitor rank comparison
- Keyword segment grouping (by intent, cluster, landing page)
|
||||||
|
|
||||||
|
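The volume-weighted visibility score listed above can be sketched standalone. This is a minimal sketch with an illustrative CTR curve; the shipped tracker builds a fuller position 1-100 table (`CTR_WEIGHTS`) and reads positions from Ahrefs.

```python
# Illustrative CTR curve (position -> expected organic CTR); values are
# assumptions for the sketch, not measured figures.
CTR = {1: 0.300, 2: 0.150, 3: 0.100, 10: 0.018}

def visibility(positions: list[tuple[int, int]]) -> float:
    """positions: (rank, monthly_volume) pairs; returns a 0-100 score."""
    total_volume = sum(volume for _, volume in positions)
    if total_volume == 0:
        return 0.0
    weighted = sum(volume * CTR.get(rank, 0.0005) for rank, volume in positions)
    # Normalize so that "everything at position 1" scores 100.
    return weighted / (total_volume * CTR[1]) * 100.0

print(round(visibility([(1, 1000), (10, 1000)]), 1))  # → 53.0
```

Because the score is normalized against the all-position-1 ceiling, it stays comparable across projects with very different keyword counts.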
## Ranking Reporter

```bash
# 30-day ranking report
python scripts/ranking_reporter.py --target https://example.com --period 30 --json

# Quarterly comparison
python scripts/ranking_reporter.py --target https://example.com --period 90 --json

# Export with competitor comparison
python scripts/ranking_reporter.py --target https://example.com --competitor https://comp1.com --period 30 --json
```

**Capabilities**:
- Period-over-period ranking trends (improved/declined/stable)
- Top movers (biggest position gains/losses)
- Visibility score trend over time
- Segment-level performance breakdown
- Competitor overlap and position comparison
- Average position by keyword group

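The improved/declined/stable breakdown and top-mover selection above amount to a diff over two position snapshots. A sketch with hypothetical keyword data (positive delta means the keyword moved up):

```python
# Hypothetical snapshots: keyword -> position at the start/end of the period.
old = {"implant cost": 8, "dental clinic": 15, "veneers": 4}
new = {"implant cost": 5, "dental clinic": 15, "veneers": 12}

deltas = {kw: old[kw] - new[kw] for kw in old}  # positive = improved
improved = [kw for kw, d in deltas.items() if d > 0]
declined = [kw for kw, d in deltas.items() if d < 0]
stable = [kw for kw, d in deltas.items() if d == 0]

# Top movers: largest absolute position change first.
top_movers = sorted(deltas.items(), key=lambda item: -abs(item[1]))[:2]

print(improved, declined, stable)  # → ['implant cost'] ['veneers'] ['dental clinic']
print(top_movers)                  # → [('veneers', -8), ('implant cost', 3)]
```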
## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `rank-tracker-overview` | Get rank tracking overview for project |
| `rank-tracker-competitors-overview` | Compare against competitors |
| `rank-tracker-competitors-pages` | Competitor page-level rankings |
| `rank-tracker-competitors-stats` | Competitor ranking statistics |
| `rank-tracker-serp-overview` | SERP details for tracked keywords |
| `management-projects` | List Ahrefs projects |
| `management-project-keywords` | Get tracked keywords for project |

## Output Format

```json
{
  "target": "https://example.com",
  "total_keywords": 250,
  "visibility_score": 68.5,
  "positions": {
    "top3": 15,
    "top10": 48,
    "top20": 92,
    "top50": 180,
    "top100": 230
  },
  "changes": {
    "improved": 45,
    "declined": 30,
    "stable": 155,
    "new": 12,
    "lost": 8
  },
  "alerts": [
    {
      "keyword": "치과 임플란트 가격",
      "old_position": 5,
      "new_position": 15,
      "change": -10,
      "volume": 5400
    }
  ],
  "segments": {
    "brand": {"keywords": 30, "avg_position": 2.1},
    "non_brand": {"keywords": 220, "avg_position": 24.5}
  },
  "timestamp": "2025-01-01T00:00:00"
}
```

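A downstream consumer can triage the `alerts` array of this output by drop magnitude and volume. The sketch below uses the same severity tiers as the tracker's `PositionAlert` (20/10/5); the sample alert data is hypothetical:

```python
import json

raw = """
{"alerts": [{"keyword": "implant price", "old_position": 5,
             "new_position": 15, "change": -10, "volume": 5400}]}
"""
report = json.loads(raw)

def severity(change: int) -> str:
    """Map a position delta to the tracker's alert tiers (20/10/5)."""
    magnitude = abs(change)
    if magnitude >= 20:
        return "critical"
    if magnitude >= 10:
        return "high"
    if magnitude >= 5:
        return "medium"
    return "low"

# Keep only ranking drops (negative change), largest volume first.
drops = sorted(
    (a for a in report["alerts"] if a["change"] < 0),
    key=lambda a: -a["volume"],
)
for alert in drops:
    print(alert["keyword"], severity(alert["change"]))  # → implant price high
```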
## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Tracked website URL |
| Category | Select | Position Tracking |
| Priority | Select | Based on visibility trend |
| Found Date | Date | Tracking date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: RANK-YYYYMMDD-NNN |

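The `RANK-YYYYMMDD-NNN` audit ID above can be built from the tracking date plus a per-day sequence number. A sketch; how the sequence counter is persisted between runs is left open here and is an implementation choice:

```python
from datetime import date

def audit_id(day: date, sequence: int) -> str:
    """Build an audit ID like RANK-20250101-001 (sequence is 1-based)."""
    return f"RANK-{day.strftime('%Y%m%d')}-{sequence:03d}"

print(audit_id(date(2025, 1, 1), 1))  # → RANK-20250101-001
```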
### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Visibility Score, SERP, Rank Tracker)
- URLs and code remain unchanged
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,786 @@
|
|||||||
|
"""
|
||||||
|
Position Tracker - Keyword Ranking Monitor via Ahrefs Rank Tracker
|
||||||
|
==================================================================
|
||||||
|
Purpose: Monitor keyword positions, detect changes, calculate visibility scores
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python position_tracker.py --target https://example.com --json
|
||||||
|
python position_tracker.py --target https://example.com --threshold 5 --json
|
||||||
|
python position_tracker.py --target https://example.com --segment brand --json
|
||||||
|
python position_tracker.py --target https://example.com --competitor https://comp1.com --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import math
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Optional
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CTR curve weights for visibility score (position 1-100)
|
||||||
|
# Based on industry-standard organic CTR curves
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
CTR_WEIGHTS: dict[int, float] = {
|
||||||
|
1: 0.300,
|
||||||
|
2: 0.150,
|
||||||
|
3: 0.100,
|
||||||
|
4: 0.070,
|
||||||
|
5: 0.050,
|
||||||
|
6: 0.038,
|
||||||
|
7: 0.030,
|
||||||
|
8: 0.025,
|
||||||
|
9: 0.020,
|
||||||
|
10: 0.018,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Positions 11-20 get diminishing CTR
|
||||||
|
for _p in range(11, 21):
|
||||||
|
CTR_WEIGHTS[_p] = round(0.015 - (_p - 11) * 0.001, 4)
|
||||||
|
|
||||||
|
# Positions 21-50 get minimal CTR
|
||||||
|
for _p in range(21, 51):
|
||||||
|
CTR_WEIGHTS[_p] = round(max(0.005 - (_p - 21) * 0.0001, 0.001), 4)
|
||||||
|
|
||||||
|
# Positions 51-100 get near-zero CTR
|
||||||
|
for _p in range(51, 101):
|
||||||
|
CTR_WEIGHTS[_p] = 0.0005
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Data classes
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
@dataclass
|
||||||
|
class KeywordPosition:
|
||||||
|
"""Single keyword ranking position."""
|
||||||
|
keyword: str
|
||||||
|
position: int
|
||||||
|
previous_position: Optional[int] = None
|
||||||
|
change: int = 0
|
||||||
|
volume: int = 0
|
||||||
|
url: str = ""
|
||||||
|
intent: str = "informational"
|
||||||
|
is_brand: bool = False
|
||||||
|
|
||||||
|
def __post_init__(self):
|
||||||
|
if self.previous_position is not None:
|
||||||
|
self.change = self.previous_position - self.position
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class VisibilityScore:
|
||||||
|
"""Weighted visibility score based on CTR curve."""
|
||||||
|
score: float = 0.0
|
||||||
|
top3: int = 0
|
||||||
|
top10: int = 0
|
||||||
|
top20: int = 0
|
||||||
|
top50: int = 0
|
||||||
|
top100: int = 0
|
||||||
|
total_keywords: int = 0
|
||||||
|
|
||||||
|
@property
|
||||||
|
def distribution(self) -> dict:
|
||||||
|
return {
|
||||||
|
"top3": self.top3,
|
||||||
|
"top10": self.top10,
|
||||||
|
"top20": self.top20,
|
||||||
|
"top50": self.top50,
|
||||||
|
"top100": self.top100,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class PositionAlert:
|
||||||
|
"""Alert for significant position change."""
|
||||||
|
keyword: str
|
||||||
|
old_position: int
|
||||||
|
new_position: int
|
||||||
|
change: int
|
||||||
|
volume: int = 0
|
||||||
|
severity: str = "medium"
|
||||||
|
|
||||||
|
def __post_init__(self):
|
||||||
|
abs_change = abs(self.change)
|
||||||
|
if abs_change >= 20:
|
||||||
|
self.severity = "critical"
|
||||||
|
elif abs_change >= 10:
|
||||||
|
self.severity = "high"
|
||||||
|
elif abs_change >= 5:
|
||||||
|
self.severity = "medium"
|
||||||
|
else:
|
||||||
|
self.severity = "low"
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CompetitorComparison:
|
||||||
|
"""Competitor ranking comparison result."""
|
||||||
|
competitor: str
|
||||||
|
overlap_keywords: int = 0
|
||||||
|
competitor_better: int = 0
|
||||||
|
target_better: int = 0
|
||||||
|
avg_position_gap: float = 0.0
|
||||||
|
top_gaps: list = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class SegmentData:
|
||||||
|
"""Keyword segment aggregation."""
|
||||||
|
name: str
|
||||||
|
keywords: int = 0
|
||||||
|
avg_position: float = 0.0
|
||||||
|
visibility: float = 0.0
|
||||||
|
improved: int = 0
|
||||||
|
declined: int = 0
|
||||||
|
stable: int = 0
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class TrackingResult:
|
||||||
|
"""Complete position tracking result."""
|
||||||
|
target: str
|
||||||
|
total_keywords: int = 0
|
||||||
|
visibility_score: float = 0.0
|
||||||
|
visibility: Optional[VisibilityScore] = None
|
||||||
|
positions: list[KeywordPosition] = field(default_factory=list)
|
||||||
|
changes: dict = field(default_factory=lambda: {
|
||||||
|
"improved": 0, "declined": 0, "stable": 0, "new": 0, "lost": 0,
|
||||||
|
})
|
||||||
|
alerts: list[PositionAlert] = field(default_factory=list)
|
||||||
|
segments: dict[str, SegmentData] = field(default_factory=dict)
|
||||||
|
competitors: list[CompetitorComparison] = field(default_factory=list)
|
||||||
|
timestamp: str = ""
|
||||||
|
|
||||||
|
def __post_init__(self):
|
||||||
|
if not self.timestamp:
|
||||||
|
self.timestamp = datetime.now().isoformat()
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
"""Convert to JSON-serializable dictionary."""
|
||||||
|
result = {
|
||||||
|
"target": self.target,
|
||||||
|
"total_keywords": self.total_keywords,
|
||||||
|
"visibility_score": round(self.visibility_score, 2),
|
||||||
|
"positions": self.visibility.distribution if self.visibility else {},
|
||||||
|
"changes": self.changes,
|
||||||
|
"alerts": [asdict(a) for a in self.alerts],
|
||||||
|
"segments": {
|
||||||
|
k: asdict(v) for k, v in self.segments.items()
|
||||||
|
},
|
||||||
|
"competitors": [asdict(c) for c in self.competitors],
|
||||||
|
"keyword_details": [asdict(p) for p in self.positions],
|
||||||
|
"timestamp": self.timestamp,
|
||||||
|
}
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Position Tracker
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
class PositionTracker(BaseAsyncClient):
|
||||||
|
"""Track keyword ranking positions via Ahrefs Rank Tracker."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
super().__init__(
|
||||||
|
max_concurrent=5,
|
||||||
|
requests_per_second=2.0,
|
||||||
|
logger=logger,
|
||||||
|
)
|
||||||
|
self.brand_terms: list[str] = []
|
||||||
|
|
||||||
|
def _extract_domain_brand(self, target: str) -> list[str]:
|
||||||
|
"""Extract brand terms from the target domain name."""
|
||||||
|
parsed = urlparse(target)
|
||||||
|
hostname = parsed.hostname or target
|
||||||
|
# Remove TLD and www prefix
|
||||||
|
parts = hostname.replace("www.", "").split(".")
|
||||||
|
brand_parts = []
|
||||||
|
for part in parts:
|
||||||
|
if part not in ("com", "co", "kr", "net", "org", "io", "ai", "www"):
|
||||||
|
brand_parts.append(part.lower())
|
||||||
|
# Also split camelCase and hyphenated forms
|
||||||
|
if "-" in part:
|
||||||
|
brand_parts.extend(part.lower().split("-"))
|
||||||
|
return list(set(brand_parts))
|
||||||
|
|
||||||
|
async def get_project_keywords(self, target: str) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Fetch tracked keywords from Ahrefs management-project-keywords.
|
||||||
|
|
||||||
|
Uses Ahrefs MCP tool: management-project-keywords
|
||||||
|
Returns list of keyword dicts with keyword, volume, intent info.
|
||||||
|
"""
|
||||||
|
logger.info(f"Fetching project keywords for: {target}")
|
||||||
|
|
||||||
|
# Step 1: Get project list to find matching project
|
||||||
|
projects = await self._call_ahrefs_projects(target)
|
||||||
|
if not projects:
|
||||||
|
logger.warning(f"No Ahrefs project found for {target}. Using rank-tracker-overview directly.")
|
||||||
|
return []
|
||||||
|
|
||||||
|
project_id = projects[0].get("id", "")
|
||||||
|
|
||||||
|
# Step 2: Fetch keywords for the project
|
||||||
|
keywords_data = await self._call_ahrefs_project_keywords(project_id)
|
||||||
|
return keywords_data
|
||||||
|
|
||||||
|
async def _call_ahrefs_projects(self, target: str) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Call Ahrefs management-projects MCP tool.
|
||||||
|
In production, this calls the MCP tool. For standalone, reads from config/cache.
|
||||||
|
"""
|
||||||
|
# Simulated MCP call structure - in production this calls:
|
||||||
|
# mcp__ahrefs__management-projects
|
||||||
|
logger.info("Calling Ahrefs management-projects...")
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
result = subprocess.run(
|
||||||
|
["mcp-cli", "call", "ahrefs/management-projects", json.dumps({})],
|
||||||
|
capture_output=True, text=True, timeout=30,
|
||||||
|
)
|
||||||
|
if result.returncode == 0:
|
||||||
|
return json.loads(result.stdout).get("projects", [])
|
||||||
|
except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
|
||||||
|
pass
|
||||||
|
# Return empty if MCP not available - caller handles gracefully
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def _call_ahrefs_project_keywords(self, project_id: str) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Call Ahrefs management-project-keywords MCP tool.
|
||||||
|
"""
|
||||||
|
logger.info(f"Calling Ahrefs management-project-keywords for project: {project_id}")
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
result = subprocess.run(
|
||||||
|
["mcp-cli", "call", "ahrefs/management-project-keywords",
|
||||||
|
json.dumps({"project_id": project_id})],
|
||||||
|
capture_output=True, text=True, timeout=30,
|
||||||
|
)
|
||||||
|
if result.returncode == 0:
|
||||||
|
return json.loads(result.stdout).get("keywords", [])
|
||||||
|
except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
|
||||||
|
pass
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def get_current_positions(self, target: str) -> list[KeywordPosition]:
|
||||||
|
"""
|
||||||
|
Fetch current keyword positions via Ahrefs rank-tracker-overview.
|
||||||
|
|
||||||
|
Returns list of KeywordPosition objects with current and previous positions.
|
||||||
|
"""
|
||||||
|
logger.info(f"Fetching current positions for: {target}")
|
||||||
|
self.brand_terms = self._extract_domain_brand(target)
|
||||||
|
|
||||||
|
raw_data = await self._call_rank_tracker_overview(target)
|
||||||
|
positions: list[KeywordPosition] = []
|
||||||
|
|
||||||
|
for item in raw_data:
|
||||||
|
keyword = item.get("keyword", "")
|
||||||
|
current_pos = item.get("position", 0)
|
||||||
|
prev_pos = item.get("previous_position")
|
||||||
|
volume = item.get("volume", 0)
|
||||||
|
url = item.get("url", "")
|
||||||
|
intent = item.get("intent", "informational")
|
||||||
|
|
||||||
|
# Determine if brand keyword
|
||||||
|
is_brand = self._is_brand_keyword(keyword)
|
||||||
|
|
||||||
|
kp = KeywordPosition(
|
||||||
|
keyword=keyword,
|
||||||
|
position=current_pos,
|
||||||
|
previous_position=prev_pos,
|
||||||
|
volume=volume,
|
||||||
|
url=url,
|
||||||
|
intent=intent,
|
||||||
|
is_brand=is_brand,
|
||||||
|
)
|
||||||
|
positions.append(kp)
|
||||||
|
|
||||||
|
logger.info(f"Retrieved {len(positions)} keyword positions")
|
||||||
|
return positions
|
||||||
|
|
||||||
|
async def _call_rank_tracker_overview(self, target: str) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Call Ahrefs rank-tracker-overview MCP tool.
|
||||||
|
"""
|
||||||
|
logger.info(f"Calling Ahrefs rank-tracker-overview for: {target}")
|
||||||
|
try:
|
||||||
|
import subprocess
|
||||||
|
result = subprocess.run(
|
||||||
|
["mcp-cli", "call", "ahrefs/rank-tracker-overview",
|
||||||
|
json.dumps({"target": target})],
|
||||||
|
capture_output=True, text=True, timeout=60,
|
||||||
|
)
|
||||||
|
if result.returncode == 0:
|
||||||
|
data = json.loads(result.stdout)
|
||||||
|
return data.get("keywords", data.get("results", []))
|
||||||
|
except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
|
||||||
|
pass
|
||||||
|
return []
|
||||||
|
|
||||||
|
def _is_brand_keyword(self, keyword: str) -> bool:
|
||||||
|
"""Check if a keyword is brand-related based on domain name."""
|
||||||
|
keyword_lower = keyword.lower()
|
||||||
|
for term in self.brand_terms:
|
||||||
|
if term in keyword_lower:
|
||||||
|
return True
|
||||||
|
return False
|
||||||
|
|
||||||
|
def detect_changes(
|
||||||
|
self,
|
||||||
|
positions: list[KeywordPosition],
|
||||||
|
threshold: int = 3,
|
||||||
|
) -> tuple[dict, list[PositionAlert]]:
|
||||||
|
"""
|
||||||
|
Detect significant position changes and generate alerts.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
positions: List of current keyword positions with previous data
|
||||||
|
threshold: Minimum position change to trigger an alert
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (change_summary_dict, list_of_alerts)
|
||||||
|
"""
|
||||||
|
changes = {
|
||||||
|
"improved": 0,
|
||||||
|
"declined": 0,
|
||||||
|
"stable": 0,
|
||||||
|
"new": 0,
|
||||||
|
"lost": 0,
|
||||||
|
}
|
||||||
|
alerts: list[PositionAlert] = []
|
||||||
|
|
||||||
|
for kp in positions:
|
||||||
|
if kp.previous_position is None:
|
||||||
|
changes["new"] += 1
|
||||||
|
continue
|
||||||
|
|
||||||
|
if kp.position == 0 and kp.previous_position > 0:
|
||||||
|
changes["lost"] += 1
|
||||||
|
alert = PositionAlert(
|
||||||
|
keyword=kp.keyword,
|
||||||
|
old_position=kp.previous_position,
|
||||||
|
new_position=0,
|
||||||
|
change=-kp.previous_position,
|
||||||
|
volume=kp.volume,
|
||||||
|
)
|
||||||
|
alerts.append(alert)
|
||||||
|
continue
|
||||||
|
|
||||||
|
change = kp.change # positive = improved, negative = declined
|
||||||
|
if change > 0:
|
||||||
|
changes["improved"] += 1
|
||||||
|
elif change < 0:
|
||||||
|
changes["declined"] += 1
|
||||||
|
else:
|
||||||
|
changes["stable"] += 1
|
||||||
|
|
||||||
|
# Generate alert if change exceeds threshold
|
||||||
|
if abs(change) >= threshold:
|
||||||
|
alert = PositionAlert(
|
||||||
|
keyword=kp.keyword,
|
||||||
|
old_position=kp.previous_position,
|
||||||
|
new_position=kp.position,
|
||||||
|
change=change,
|
||||||
|
volume=kp.volume,
|
||||||
|
)
|
||||||
|
alerts.append(alert)
|
||||||
|
|
||||||
|
# Sort alerts by severity (critical first) then by volume (high first)
|
||||||
|
severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
|
||||||
|
alerts.sort(key=lambda a: (severity_order.get(a.severity, 4), -a.volume))
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
f"Changes detected - improved: {changes['improved']}, "
|
||||||
|
f"declined: {changes['declined']}, stable: {changes['stable']}, "
|
||||||
|
f"new: {changes['new']}, lost: {changes['lost']}"
|
||||||
|
)
|
||||||
|
logger.info(f"Alerts generated: {len(alerts)} (threshold: {threshold})")
|
||||||
|
|
||||||
|
return changes, alerts
|
||||||
|
|
||||||
|
def calculate_visibility(self, positions: list[KeywordPosition]) -> VisibilityScore:
|
||||||
|
"""
|
||||||
|
Calculate weighted visibility score based on CTR curve.
|
||||||
|
|
||||||
|
Visibility = sum(keyword_volume * ctr_weight_for_position) / sum(keyword_volume)
|
||||||
|
Score normalized to 0-100 scale.
|
||||||
|
"""
|
||||||
|
vis = VisibilityScore()
|
||||||
|
total_weighted = 0.0
|
||||||
|
total_volume = 0
|
||||||
|
|
||||||
|
for kp in positions:
|
||||||
|
if kp.position <= 0 or kp.position > 100:
|
||||||
|
continue
|
||||||
|
|
||||||
|
vis.total_keywords += 1
|
||||||
|
volume = max(kp.volume, 1) # Avoid zero volume
|
||||||
|
total_volume += volume
|
||||||
|
|
||||||
|
# Position bucket counting
|
||||||
|
if kp.position <= 3:
|
||||||
|
vis.top3 += 1
|
||||||
|
if kp.position <= 10:
|
||||||
|
vis.top10 += 1
|
||||||
|
if kp.position <= 20:
|
||||||
|
vis.top20 += 1
|
||||||
|
if kp.position <= 50:
|
||||||
|
vis.top50 += 1
|
||||||
|
if kp.position <= 100:
|
||||||
|
vis.top100 += 1
|
||||||
|
|
||||||
|
# Weighted visibility
|
||||||
|
ctr = CTR_WEIGHTS.get(kp.position, 0.0005)
|
||||||
|
total_weighted += volume * ctr
|
||||||
|
|
||||||
|
if total_volume > 0:
|
||||||
|
# Normalize: max possible is if all keywords were position 1
|
||||||
|
max_possible = total_volume * CTR_WEIGHTS[1]
|
||||||
|
vis.score = (total_weighted / max_possible) * 100.0
|
||||||
|
else:
|
||||||
|
vis.score = 0.0
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
f"Visibility score: {vis.score:.2f} | "
|
||||||
|
f"Top3: {vis.top3}, Top10: {vis.top10}, Top20: {vis.top20}"
|
||||||
|
)
|
||||||
|
|
||||||
|
return vis
|
||||||
|
|
||||||
|
def segment_keywords(
|
||||||
|
self,
|
||||||
|
positions: list[KeywordPosition],
|
||||||
|
filter_segment: Optional[str] = None,
|
||||||
|
) -> dict[str, SegmentData]:
|
||||||
|
"""
|
||||||
|
Segment keywords into brand/non-brand and by intent type.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
positions: List of keyword positions
|
||||||
|
filter_segment: Optional filter - 'brand', 'non_brand', or intent type
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dictionary of segment name to SegmentData
|
||||||
|
"""
|
||||||
|
segments: dict[str, list[KeywordPosition]] = {
|
||||||
|
"brand": [],
|
||||||
|
"non_brand": [],
|
||||||
|
}
|
||||||
|
intent_segments: dict[str, list[KeywordPosition]] = {}
|
||||||
|
|
||||||
|
for kp in positions:
|
||||||
|
# Brand segmentation
|
||||||
|
if kp.is_brand:
|
||||||
|
segments["brand"].append(kp)
|
||||||
|
else:
|
||||||
|
segments["non_brand"].append(kp)
|
||||||
|
|
||||||
|
# Intent segmentation
|
||||||
|
intent_key = kp.intent.lower() if kp.intent else "informational"
|
||||||
|
if intent_key not in intent_segments:
|
||||||
|
intent_segments[intent_key] = []
|
||||||
|
intent_segments[intent_key].append(kp)
|
||||||
|
|
||||||
|
# Merge intent segments into main segments
|
||||||
|
for intent_key, kps in intent_segments.items():
|
||||||
|
segments[f"intent_{intent_key}"] = kps
|
||||||
|
|
||||||
|
# Calculate segment stats
|
||||||
|
result: dict[str, SegmentData] = {}
|
||||||
|
for seg_name, kps in segments.items():
|
||||||
|
if filter_segment and seg_name != filter_segment:
|
||||||
|
continue
|
||||||
|
|
||||||
|
if not kps:
|
||||||
|
continue
|
||||||
|
|
||||||
|
active_positions = [kp for kp in kps if kp.position > 0]
|
||||||
|
avg_pos = (
|
||||||
|
sum(kp.position for kp in active_positions) / len(active_positions)
|
||||||
|
if active_positions else 0.0
|
||||||
|
)
|
||||||
|
|
||||||
|
vis = self.calculate_visibility(kps)
|
||||||
|
|
||||||
|
improved = sum(1 for kp in kps if kp.change > 0)
|
||||||
|
declined = sum(1 for kp in kps if kp.change < 0)
|
||||||
|
stable = sum(1 for kp in kps if kp.change == 0 and kp.previous_position is not None)
|
||||||
|
|
||||||
|
result[seg_name] = SegmentData(
|
||||||
|
name=seg_name,
|
||||||
|
keywords=len(kps),
|
||||||
|
avg_position=round(avg_pos, 1),
|
||||||
|
visibility=round(vis.score, 2),
|
||||||
|
improved=improved,
|
||||||
|
declined=declined,
|
||||||
|
stable=stable,
|
||||||
|
)
|
||||||
|
|
||||||
|
return result
|
||||||
|
|
||||||
|
    async def compare_competitors(
        self,
        target: str,
        competitors: list[str],
    ) -> list[CompetitorComparison]:
        """
        Compare ranking positions against competitors.

        Uses Ahrefs rank-tracker-competitors-overview MCP tool.
        """
        comparisons: list[CompetitorComparison] = []

        for competitor in competitors:
            logger.info(f"Comparing with competitor: {competitor}")
            comp_data = await self._call_competitors_overview(target, competitor)

            comparison = CompetitorComparison(competitor=competitor)

            if comp_data:
                comparison.overlap_keywords = comp_data.get("overlap_keywords", 0)
                comparison.competitor_better = comp_data.get("competitor_better", 0)
                comparison.target_better = comp_data.get("target_better", 0)
                comparison.avg_position_gap = comp_data.get("avg_position_gap", 0.0)

                # Extract top gaps (keywords where competitor outranks us most)
                top_gaps = comp_data.get("top_gaps", [])
                comparison.top_gaps = top_gaps[:10]

            comparisons.append(comparison)

        return comparisons

    async def _call_competitors_overview(self, target: str, competitor: str) -> dict:
        """
        Call Ahrefs rank-tracker-competitors-overview MCP tool.
        """
        logger.info("Calling Ahrefs rank-tracker-competitors-overview...")
        try:
            import subprocess
            result = subprocess.run(
                ["mcp-cli", "call", "ahrefs/rank-tracker-competitors-overview",
                 json.dumps({"target": target, "competitor": competitor})],
                capture_output=True, text=True, timeout=60,
            )
            if result.returncode == 0:
                return json.loads(result.stdout)
        except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
            pass
        return {}

    async def analyze(
        self,
        target: str,
        threshold: int = 3,
        competitors: Optional[list[str]] = None,
        segment_filter: Optional[str] = None,
    ) -> TrackingResult:
        """
        Orchestrate full position tracking analysis.

        Args:
            target: Target website URL
            threshold: Position change threshold for alerts
            competitors: List of competitor URLs to compare
            segment_filter: Optional segment filter (brand, non_brand, intent_*)

        Returns:
            Complete TrackingResult with all analysis data
        """
        logger.info(f"Starting position tracking analysis for: {target}")
        logger.info(f"Threshold: {threshold}, Competitors: {competitors or 'none'}")

        result = TrackingResult(target=target)

        # Step 1: Fetch current positions
        positions = await self.get_current_positions(target)

        if not positions:
            logger.warning("No position data retrieved. Check Ahrefs project configuration.")
            return result

        result.positions = positions
        result.total_keywords = len(positions)

        # Step 2: Detect changes and generate alerts
        changes, alerts = self.detect_changes(positions, threshold)
        result.changes = changes
        result.alerts = alerts

        # Step 3: Calculate visibility score
        visibility = self.calculate_visibility(positions)
        result.visibility = visibility
        result.visibility_score = visibility.score

        # Step 4: Segment keywords
        segments = self.segment_keywords(positions, segment_filter)
        result.segments = segments

        # Step 5: Compare with competitors (if provided)
        if competitors:
            comp_results = await self.compare_competitors(target, competitors)
            result.competitors = comp_results

        logger.info(f"Analysis complete. Total keywords: {result.total_keywords}")
        logger.info(f"Visibility score: {result.visibility_score:.2f}")

        return result


# ---------------------------------------------------------------------------
# Output formatters
# ---------------------------------------------------------------------------
def format_text_report(result: TrackingResult) -> str:
    """Format tracking result as human-readable text report."""
    lines = []
    lines.append("=" * 60)
    lines.append(f"Position Tracking Report: {result.target}")
    lines.append(f"Timestamp: {result.timestamp}")
    lines.append("=" * 60)

    # Visibility overview
    lines.append(f"\nVisibility Score: {result.visibility_score:.2f}/100")
    lines.append(f"Total Keywords Tracked: {result.total_keywords}")

    if result.visibility:
        vis = result.visibility
        lines.append("\nPosition Distribution:")
        lines.append(f"  Top 3: {vis.top3}")
        lines.append(f"  Top 10: {vis.top10}")
        lines.append(f"  Top 20: {vis.top20}")
        lines.append(f"  Top 50: {vis.top50}")
        lines.append(f"  Top 100: {vis.top100}")

    # Changes summary
    ch = result.changes
    lines.append("\nPosition Changes:")
    lines.append(f"  Improved: {ch.get('improved', 0)}")
    lines.append(f"  Declined: {ch.get('declined', 0)}")
    lines.append(f"  Stable: {ch.get('stable', 0)}")
    lines.append(f"  New: {ch.get('new', 0)}")
    lines.append(f"  Lost: {ch.get('lost', 0)}")

    # Alerts
    if result.alerts:
        lines.append(f"\nAlerts ({len(result.alerts)}):")
        lines.append("-" * 60)
        for alert in result.alerts[:20]:
            direction = "UP" if alert.change > 0 else "DOWN"
            lines.append(
                f"  [{alert.severity.upper()}] {alert.keyword}: "
                f"{alert.old_position} -> {alert.new_position} "
                f"({direction} {abs(alert.change)}) | Vol: {alert.volume}"
            )

    # Segments
    if result.segments:
        lines.append("\nSegments:")
        lines.append("-" * 60)
        for name, seg in result.segments.items():
            lines.append(
                f"  {name}: {seg.keywords} keywords, "
                f"avg pos {seg.avg_position}, "
                f"vis {seg.visibility}"
            )

    # Competitors
    if result.competitors:
        lines.append("\nCompetitor Comparison:")
        lines.append("-" * 60)
        for comp in result.competitors:
            lines.append(f"  vs {comp.competitor}:")
            lines.append(f"    Overlap: {comp.overlap_keywords} keywords")
            lines.append(f"    We win: {comp.target_better}")
            lines.append(f"    They win: {comp.competitor_better}")
            lines.append(f"    Avg gap: {comp.avg_position_gap:.1f}")

    lines.append("\n" + "=" * 60)
    return "\n".join(lines)


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Position Tracker - Monitor keyword rankings via Ahrefs Rank Tracker",
    )
    parser.add_argument(
        "--target",
        required=True,
        help="Target website URL (e.g., https://example.com)",
    )
    parser.add_argument(
        "--threshold",
        type=int,
        default=3,
        help="Position change threshold for alerts (default: 3)",
    )
    parser.add_argument(
        "--segment",
        choices=["brand", "non_brand", "intent_informational",
                 "intent_commercial", "intent_transactional", "intent_navigational"],
        default=None,
        help="Filter results by keyword segment",
    )
    parser.add_argument(
        "--competitor",
        action="append",
        dest="competitors",
        default=[],
        help="Competitor URL to compare (repeatable)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="json_output",
        help="Output in JSON format",
    )
    parser.add_argument(
        "--output",
        type=str,
        default=None,
        help="Save output to file path",
    )
    return parser.parse_args()


async def main():
    args = parse_args()

    tracker = PositionTracker()

    result = await tracker.analyze(
        target=args.target,
        threshold=args.threshold,
        competitors=args.competitors,
        segment_filter=args.segment,
    )

    if args.json_output:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
    else:
        output = format_text_report(result)

    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(output)
        logger.info(f"Output saved to: {args.output}")
    else:
        print(output)

    tracker.print_stats()


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,728 @@
"""
|
||||||
|
Ranking Reporter - Ranking Performance Reports with Trends
|
||||||
|
==========================================================
|
||||||
|
Purpose: Generate ranking reports with trend analysis, top movers, and competitor comparison
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python ranking_reporter.py --target https://example.com --period 30 --json
|
||||||
|
python ranking_reporter.py --target https://example.com --period 90 --json
|
||||||
|
python ranking_reporter.py --target https://example.com --competitor https://comp1.com --period 30 --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
from typing import Optional
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# CTR weights for impact scoring (same as position_tracker)
|
||||||
|
CTR_WEIGHTS: dict[int, float] = {
|
||||||
|
1: 0.300, 2: 0.150, 3: 0.100, 4: 0.070, 5: 0.050,
|
||||||
|
6: 0.038, 7: 0.030, 8: 0.025, 9: 0.020, 10: 0.018,
|
||||||
|
}
|
||||||
|
for _p in range(11, 21):
|
||||||
|
CTR_WEIGHTS[_p] = round(0.015 - (_p - 11) * 0.001, 4)
|
||||||
|
for _p in range(21, 51):
|
||||||
|
CTR_WEIGHTS[_p] = round(max(0.005 - (_p - 21) * 0.0001, 0.001), 4)
|
||||||
|
for _p in range(51, 101):
|
||||||
|
CTR_WEIGHTS[_p] = 0.0005
|
||||||
|
|
||||||
|
|
||||||
|
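# Illustrative values of the piecewise curve built above: CTR_WEIGHTS[11] = 0.015
# falling to CTR_WEIGHTS[20] = 0.006, then CTR_WEIGHTS[21] = 0.005 tapering to
# 0.0021 at position 50, with a flat 0.0005 tail for positions 51-100.
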
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------
@dataclass
class PositionSnapshot:
    """A single position measurement at a point in time."""
    date: str
    position: int
    volume: int = 0
    url: str = ""


@dataclass
class RankingTrend:
    """Keyword ranking trend over time."""
    keyword: str
    positions_over_time: list[PositionSnapshot] = field(default_factory=list)
    trend_direction: str = "stable"  # improved, declined, stable, new, lost
    avg_position: float = 0.0
    current_position: int = 0
    start_position: int = 0
    total_change: int = 0
    volume: int = 0
    intent: str = "informational"
    is_brand: bool = False

    def compute_trend(self):
        """Compute trend direction and average from position history."""
        if not self.positions_over_time:
            self.trend_direction = "stable"
            return

        positions = [s.position for s in self.positions_over_time if s.position > 0]
        if not positions:
            self.trend_direction = "lost"
            return

        self.avg_position = sum(positions) / len(positions)
        self.current_position = positions[-1]
        self.start_position = positions[0]
        self.total_change = self.start_position - self.current_position

        # Determine trend using linear regression direction
        if len(positions) >= 2:
            n = len(positions)
            x_mean = (n - 1) / 2.0
            y_mean = sum(positions) / n
            numerator = sum((i - x_mean) * (p - y_mean) for i, p in enumerate(positions))
            denominator = sum((i - x_mean) ** 2 for i in range(n))

            if denominator > 0:
                slope = numerator / denominator
                # Negative slope means position number decreasing = improving
                if slope < -0.5:
                    self.trend_direction = "improved"
                elif slope > 0.5:
                    self.trend_direction = "declined"
                else:
                    self.trend_direction = "stable"
        else:
            self.trend_direction = "stable"

        if self.volume == 0 and self.positions_over_time:
            self.volume = self.positions_over_time[-1].volume


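# Worked example for RankingTrend.compute_trend above: snapshots at positions
# [12, 10, 9, 7] give a least-squares slope of -8.0 / 5.0 = -1.6, which is
# below -0.5, so trend_direction becomes "improved", with
# total_change = 12 - 7 = +5 and avg_position = 9.5.
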
@dataclass
class TopMover:
    """Keyword with significant position change."""
    keyword: str
    position_change: int
    current_position: int = 0
    previous_position: int = 0
    volume: int = 0
    impact_score: float = 0.0
    direction: str = "improved"

    def calculate_impact(self):
        """Calculate impact score: volume * CTR delta."""
        old_ctr = CTR_WEIGHTS.get(self.previous_position, 0.0005) if self.previous_position > 0 else 0.0
        new_ctr = CTR_WEIGHTS.get(self.current_position, 0.0005) if self.current_position > 0 else 0.0
        ctr_delta = abs(new_ctr - old_ctr)
        self.impact_score = round(self.volume * ctr_delta, 2)
        self.direction = "improved" if self.position_change > 0 else "declined"


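# Worked example for TopMover.calculate_impact above: a 1,000-volume keyword
# moving from position 8 (CTR 0.025) to position 3 (CTR 0.100) has
# ctr_delta = 0.075, so impact_score = 1000 * 0.075 = 75.0 and
# direction = "improved" (position_change = 8 - 3 = +5).
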
@dataclass
class SegmentReport:
    """Performance breakdown for a keyword segment."""
    segment_name: str
    total_keywords: int = 0
    avg_position: float = 0.0
    avg_position_change: float = 0.0
    visibility_score: float = 0.0
    improved_count: int = 0
    declined_count: int = 0
    stable_count: int = 0
    top_gainers: list[TopMover] = field(default_factory=list)
    top_losers: list[TopMover] = field(default_factory=list)


@dataclass
class CompetitorReport:
    """Competitor comparison for a reporting period."""
    competitor: str
    our_visibility: float = 0.0
    their_visibility: float = 0.0
    overlap_keywords: int = 0
    keywords_we_lead: int = 0
    keywords_they_lead: int = 0
    notable_gaps: list[dict] = field(default_factory=list)


@dataclass
class RankingReport:
    """Complete ranking performance report."""
    target: str
    period_days: int = 30
    period_start: str = ""
    period_end: str = ""
    total_keywords: int = 0
    current_visibility: float = 0.0
    previous_visibility: float = 0.0
    visibility_change: float = 0.0
    trend_summary: dict = field(default_factory=lambda: {
        "improved": 0, "declined": 0, "stable": 0, "new": 0, "lost": 0,
    })
    top_gainers: list[TopMover] = field(default_factory=list)
    top_losers: list[TopMover] = field(default_factory=list)
    segments: list[SegmentReport] = field(default_factory=list)
    competitors: list[CompetitorReport] = field(default_factory=list)
    keyword_trends: list[RankingTrend] = field(default_factory=list)
    timestamp: str = ""

    def __post_init__(self):
        if not self.timestamp:
            self.timestamp = datetime.now().isoformat()
        if not self.period_end:
            self.period_end = datetime.now().strftime("%Y-%m-%d")
        if not self.period_start:
            start = datetime.now() - timedelta(days=self.period_days)
            self.period_start = start.strftime("%Y-%m-%d")

    def to_dict(self) -> dict:
        """Convert to JSON-serializable dictionary."""
        return {
            "target": self.target,
            "period": {
                "days": self.period_days,
                "start": self.period_start,
                "end": self.period_end,
            },
            "total_keywords": self.total_keywords,
            "visibility": {
                "current": round(self.current_visibility, 2),
                "previous": round(self.previous_visibility, 2),
                "change": round(self.visibility_change, 2),
            },
            "trend_summary": self.trend_summary,
            "top_gainers": [asdict(m) for m in self.top_gainers],
            "top_losers": [asdict(m) for m in self.top_losers],
            "segments": [asdict(s) for s in self.segments],
            "competitors": [asdict(c) for c in self.competitors],
            "keyword_trends": [
                {
                    "keyword": t.keyword,
                    "trend_direction": t.trend_direction,
                    "avg_position": round(t.avg_position, 1),
                    "current_position": t.current_position,
                    "start_position": t.start_position,
                    "total_change": t.total_change,
                    "volume": t.volume,
                }
                for t in self.keyword_trends
            ],
            "timestamp": self.timestamp,
        }


# ---------------------------------------------------------------------------
# Ranking Reporter
# ---------------------------------------------------------------------------
class RankingReporter(BaseAsyncClient):
    """Generate ranking performance reports with trend analysis."""

    def __init__(self):
        super().__init__(
            max_concurrent=5,
            requests_per_second=2.0,
            logger=logger,
        )

    def _extract_domain_brand(self, target: str) -> list[str]:
        """Extract brand terms from the target domain name."""
        parsed = urlparse(target)
        hostname = parsed.hostname or target
        parts = hostname.replace("www.", "").split(".")
        brand_parts = []
        for part in parts:
            if part not in ("com", "co", "kr", "net", "org", "io", "ai", "www"):
                brand_parts.append(part.lower())
                if "-" in part:
                    brand_parts.extend(part.lower().split("-"))
        return list(set(brand_parts))

    async def get_historical_positions(
        self,
        target: str,
        period_days: int = 30,
    ) -> list[RankingTrend]:
        """
        Fetch historical position data from Ahrefs rank-tracker-overview
        with date range parameters.

        Returns list of RankingTrend objects with position snapshots over time.
        """
        logger.info(f"Fetching historical positions for {target} ({period_days} days)")
        brand_terms = self._extract_domain_brand(target)

        end_date = datetime.now().strftime("%Y-%m-%d")
        start_date = (datetime.now() - timedelta(days=period_days)).strftime("%Y-%m-%d")

        raw_data = await self._call_rank_tracker_historical(target, start_date, end_date)

        trends: dict[str, RankingTrend] = {}
        for item in raw_data:
            keyword = item.get("keyword", "")
            if keyword not in trends:
                is_brand = any(term in keyword.lower() for term in brand_terms)
                trends[keyword] = RankingTrend(
                    keyword=keyword,
                    volume=item.get("volume", 0),
                    intent=item.get("intent", "informational"),
                    is_brand=is_brand,
                )

            snapshot = PositionSnapshot(
                date=item.get("date", end_date),
                position=item.get("position", 0),
                volume=item.get("volume", 0),
                url=item.get("url", ""),
            )
            trends[keyword].positions_over_time.append(snapshot)

        # Sort snapshots by date and compute trends
        for trend in trends.values():
            trend.positions_over_time.sort(key=lambda s: s.date)
            trend.compute_trend()

        logger.info(f"Retrieved trends for {len(trends)} keywords")
        return list(trends.values())

    async def _call_rank_tracker_historical(
        self, target: str, start_date: str, end_date: str,
    ) -> list[dict]:
        """Call Ahrefs rank-tracker-overview with date range."""
        logger.info(f"Calling Ahrefs rank-tracker-overview ({start_date} to {end_date})...")
        try:
            import subprocess
            result = subprocess.run(
                ["mcp-cli", "call", "ahrefs/rank-tracker-overview",
                 json.dumps({
                     "target": target,
                     "date_from": start_date,
                     "date_to": end_date,
                 })],
                capture_output=True, text=True, timeout=60,
            )
            if result.returncode == 0:
                data = json.loads(result.stdout)
                return data.get("keywords", data.get("results", []))
        except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
            pass
        return []

    def calculate_trends(self, trends: list[RankingTrend]) -> dict:
        """
        Compute overall trend summary from keyword trends.

        Returns dict with improved/declined/stable/new/lost counts.
        """
        summary = {
            "improved": 0,
            "declined": 0,
            "stable": 0,
            "new": 0,
            "lost": 0,
        }
        for trend in trends:
            direction = trend.trend_direction
            if direction in summary:
                summary[direction] += 1
            else:
                summary["stable"] += 1

        logger.info(
            f"Trend summary: improved={summary['improved']}, "
            f"declined={summary['declined']}, stable={summary['stable']}"
        )
        return summary

    def find_top_movers(
        self,
        trends: list[RankingTrend],
        limit: int = 10,
    ) -> tuple[list[TopMover], list[TopMover]]:
        """
        Find keywords with biggest position gains and losses.

        Returns tuple of (top_gainers, top_losers) sorted by impact score.
        """
        gainers: list[TopMover] = []
        losers: list[TopMover] = []

        for trend in trends:
            if not trend.positions_over_time or len(trend.positions_over_time) < 2:
                continue

            first_pos = trend.start_position
            last_pos = trend.current_position

            if first_pos <= 0 or last_pos <= 0:
                continue

            change = first_pos - last_pos  # positive = improved

            mover = TopMover(
                keyword=trend.keyword,
                position_change=change,
                current_position=last_pos,
                previous_position=first_pos,
                volume=trend.volume,
            )
            mover.calculate_impact()

            if change > 0:
                gainers.append(mover)
            elif change < 0:
                losers.append(mover)

        # Sort by impact score descending
        gainers.sort(key=lambda m: m.impact_score, reverse=True)
        losers.sort(key=lambda m: m.impact_score, reverse=True)

        logger.info(f"Top movers: {len(gainers)} gainers, {len(losers)} losers")
        return gainers[:limit], losers[:limit]

    def _calculate_visibility_score(self, trends: list[RankingTrend], use_start: bool = False) -> float:
        """Calculate visibility score from trends (current or start positions)."""
        total_weighted = 0.0
        total_volume = 0

        for trend in trends:
            pos = trend.start_position if use_start else trend.current_position
            if pos <= 0 or pos > 100:
                continue
            volume = max(trend.volume, 1)
            total_volume += volume
            ctr = CTR_WEIGHTS.get(pos, 0.0005)
            total_weighted += volume * ctr

        if total_volume > 0:
            max_possible = total_volume * CTR_WEIGHTS[1]
            return (total_weighted / max_possible) * 100.0
        return 0.0

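    # Worked example for _calculate_visibility_score above: a single keyword at
    # position 3 with volume 1000 gives total_weighted = 1000 * 0.100 = 100.0
    # and max_possible = 1000 * 0.300 = 300.0, so the score is
    # (100.0 / 300.0) * 100 = 33.33 (rounded).
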
    def generate_segment_report(self, trends: list[RankingTrend]) -> list[SegmentReport]:
        """
        Generate performance breakdown by keyword segment.

        Segments include: brand, non_brand, and by intent type.
        """
        segment_map: dict[str, list[RankingTrend]] = {}

        for trend in trends:
            # Brand segment
            brand_key = "brand" if trend.is_brand else "non_brand"
            if brand_key not in segment_map:
                segment_map[brand_key] = []
            segment_map[brand_key].append(trend)

            # Intent segment
            intent_key = f"intent_{trend.intent.lower()}" if trend.intent else "intent_informational"
            if intent_key not in segment_map:
                segment_map[intent_key] = []
            segment_map[intent_key].append(trend)

        reports: list[SegmentReport] = []
        for seg_name, seg_trends in sorted(segment_map.items()):
            if not seg_trends:
                continue

            active = [t for t in seg_trends if t.current_position > 0]
            avg_pos = sum(t.current_position for t in active) / len(active) if active else 0.0
            avg_change = sum(t.total_change for t in seg_trends) / len(seg_trends) if seg_trends else 0.0

            vis = self._calculate_visibility_score(seg_trends, use_start=False)

            improved = sum(1 for t in seg_trends if t.trend_direction == "improved")
            declined = sum(1 for t in seg_trends if t.trend_direction == "declined")
            stable = sum(1 for t in seg_trends if t.trend_direction == "stable")

            # Get top movers within segment
            seg_gainers, seg_losers = self.find_top_movers(seg_trends, limit=5)

            report = SegmentReport(
                segment_name=seg_name,
                total_keywords=len(seg_trends),
                avg_position=round(avg_pos, 1),
                avg_position_change=round(avg_change, 1),
                visibility_score=round(vis, 2),
                improved_count=improved,
                declined_count=declined,
                stable_count=stable,
                top_gainers=seg_gainers,
                top_losers=seg_losers,
            )
            reports.append(report)

        return reports

    async def compare_with_competitor(
        self,
        target: str,
        competitor: str,
        period_days: int = 30,
    ) -> CompetitorReport:
        """
        Period-over-period comparison with a competitor.

        Uses Ahrefs rank-tracker-competitors-stats for detailed comparison.
        """
        logger.info(f"Comparing {target} vs {competitor} over {period_days} days")

        comp_data = await self._call_competitors_stats(target, competitor)

        report = CompetitorReport(competitor=competitor)

        if comp_data:
            report.our_visibility = comp_data.get("target_visibility", 0.0)
            report.their_visibility = comp_data.get("competitor_visibility", 0.0)
            report.overlap_keywords = comp_data.get("overlap_keywords", 0)
            report.keywords_we_lead = comp_data.get("target_better", 0)
            report.keywords_they_lead = comp_data.get("competitor_better", 0)

            # Extract notable gaps
            gaps = comp_data.get("keyword_gaps", [])
            report.notable_gaps = [
                {
                    "keyword": g.get("keyword", ""),
                    "our_position": g.get("target_position", 0),
                    "their_position": g.get("competitor_position", 0),
                    "volume": g.get("volume", 0),
                }
                for g in gaps[:15]
            ]

        return report

    async def _call_competitors_stats(self, target: str, competitor: str) -> dict:
        """Call Ahrefs rank-tracker-competitors-stats MCP tool."""
        logger.info("Calling Ahrefs rank-tracker-competitors-stats...")
        try:
            import subprocess
            result = subprocess.run(
                ["mcp-cli", "call", "ahrefs/rank-tracker-competitors-stats",
                 json.dumps({"target": target, "competitor": competitor})],
                capture_output=True, text=True, timeout=60,
            )
            if result.returncode == 0:
                return json.loads(result.stdout)
        except (FileNotFoundError, subprocess.TimeoutExpired, json.JSONDecodeError):
            pass
        return {}

    async def generate_report(
        self,
        target: str,
        period_days: int = 30,
        competitors: Optional[list[str]] = None,
    ) -> RankingReport:
        """
        Orchestrate full ranking performance report generation.

        Args:
            target: Target website URL
            period_days: Reporting period in days
            competitors: List of competitor URLs to compare

        Returns:
            Complete RankingReport with trends, movers, segments, and comparisons
        """
        logger.info(f"Generating ranking report for: {target} ({period_days} days)")

        report = RankingReport(target=target, period_days=period_days)

        # Step 1: Fetch historical position data
        trends = await self.get_historical_positions(target, period_days)

        if not trends:
            logger.warning("No historical data retrieved. Check Ahrefs project configuration.")
            return report

        report.keyword_trends = trends
        report.total_keywords = len(trends)

        # Step 2: Calculate trend summary
        report.trend_summary = self.calculate_trends(trends)

        # Step 3: Calculate visibility scores (current vs period start)
        report.current_visibility = self._calculate_visibility_score(trends, use_start=False)
        report.previous_visibility = self._calculate_visibility_score(trends, use_start=True)
        report.visibility_change = report.current_visibility - report.previous_visibility

        # Step 4: Find top movers
        gainers, losers = self.find_top_movers(trends, limit=10)
        report.top_gainers = gainers
        report.top_losers = losers

        # Step 5: Generate segment reports
        report.segments = self.generate_segment_report(trends)

        # Step 6: Compare with competitors
        if competitors:
            for competitor in competitors:
                comp_report = await self.compare_with_competitor(
                    target, competitor, period_days,
                )
                report.competitors.append(comp_report)

        logger.info(f"Report complete. Keywords: {report.total_keywords}")
        logger.info(
            f"Visibility: {report.previous_visibility:.2f} -> "
            f"{report.current_visibility:.2f} ({report.visibility_change:+.2f})"
        )

        return report


# ---------------------------------------------------------------------------
|
||||||
|
# Output formatters
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
def format_text_report(report: RankingReport) -> str:
|
||||||
|
"""Format ranking report as human-readable text."""
|
||||||
|
lines = []
|
||||||
|
lines.append("=" * 60)
|
||||||
|
lines.append(f"Ranking Performance Report: {report.target}")
|
||||||
|
lines.append(f"Period: {report.period_start} ~ {report.period_end} ({report.period_days} days)")
|
||||||
|
lines.append(f"Generated: {report.timestamp}")
|
||||||
|
lines.append("=" * 60)
|
||||||
|
|
||||||
|
# Visibility trend
|
||||||
|
lines.append(f"\nVisibility Score:")
|
||||||
|
lines.append(f" Current: {report.current_visibility:.2f}")
|
||||||
|
lines.append(f" Previous: {report.previous_visibility:.2f}")
|
||||||
|
change_sign = "+" if report.visibility_change >= 0 else ""
|
||||||
|
lines.append(f" Change: {change_sign}{report.visibility_change:.2f}")
|
||||||
|
|
||||||
|
# Trend summary
|
||||||
|
ts = report.trend_summary
|
||||||
|
lines.append(f"\nKeyword Trends ({report.total_keywords} total):")
|
||||||
|
lines.append(f" Improved: {ts.get('improved', 0)}")
|
||||||
|
lines.append(f" Declined: {ts.get('declined', 0)}")
|
||||||
|
lines.append(f" Stable: {ts.get('stable', 0)}")
|
||||||
|
lines.append(f" New: {ts.get('new', 0)}")
|
||||||
|
lines.append(f" Lost: {ts.get('lost', 0)}")
|
||||||
|
|
||||||
|
# Top gainers
|
||||||
|
if report.top_gainers:
|
||||||
|
lines.append(f"\nTop Gainers:")
|
||||||
|
lines.append("-" * 60)
|
||||||
|
for m in report.top_gainers:
|
||||||
|
lines.append(
|
||||||
|
f" {m.keyword}: {m.previous_position} -> {m.current_position} "
|
||||||
|
f"(+{m.position_change}) | Vol: {m.volume} | Impact: {m.impact_score}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Top losers
|
||||||
|
if report.top_losers:
|
||||||
|
lines.append(f"\nTop Losers:")
|
||||||
|
lines.append("-" * 60)
|
||||||
|
for m in report.top_losers:
|
||||||
|
lines.append(
|
||||||
|
f" {m.keyword}: {m.previous_position} -> {m.current_position} "
|
||||||
|
f"({m.position_change}) | Vol: {m.volume} | Impact: {m.impact_score}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Segments
|
||||||
|
if report.segments:
|
||||||
|
lines.append(f"\nSegment Breakdown:")
|
||||||
|
lines.append("-" * 60)
|
||||||
|
for seg in report.segments:
|
||||||
|
lines.append(
|
||||||
|
f" {seg.segment_name}: {seg.total_keywords} kw, "
|
||||||
|
f"avg pos {seg.avg_position}, vis {seg.visibility_score}, "
|
||||||
|
f"improved {seg.improved_count} / declined {seg.declined_count}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Competitors
|
||||||
|
if report.competitors:
|
||||||
|
lines.append(f"\nCompetitor Comparison:")
|
||||||
|
lines.append("-" * 60)
|
||||||
|
for comp in report.competitors:
|
||||||
|
lines.append(f" vs {comp.competitor}:")
|
||||||
|
lines.append(f" Our visibility: {comp.our_visibility:.2f}")
|
||||||
|
lines.append(f" Their visibility: {comp.their_visibility:.2f}")
|
||||||
|
lines.append(f" Overlap: {comp.overlap_keywords} keywords")
|
||||||
|
lines.append(f" We lead: {comp.keywords_we_lead}")
|
||||||
|
lines.append(f" They lead: {comp.keywords_they_lead}")
|
||||||
|
if comp.notable_gaps:
|
||||||
|
lines.append(f" Notable gaps:")
|
||||||
|
for gap in comp.notable_gaps[:5]:
|
||||||
|
lines.append(
|
||||||
|
f" {gap['keyword']}: us #{gap['our_position']} "
|
||||||
|
f"vs them #{gap['their_position']} (vol: {gap['volume']})"
|
||||||
|
)
|
||||||
|
|
||||||
|
lines.append("\n" + "=" * 60)
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CLI
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
def parse_args() -> argparse.Namespace:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Ranking Reporter - Generate ranking performance reports with trends",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--target",
|
||||||
|
required=True,
|
||||||
|
help="Target website URL (e.g., https://example.com)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--period",
|
||||||
|
type=int,
|
||||||
|
default=30,
|
||||||
|
help="Reporting period in days (default: 30)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--competitor",
|
||||||
|
action="append",
|
||||||
|
dest="competitors",
|
||||||
|
default=[],
|
||||||
|
help="Competitor URL to compare (repeatable)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--json",
|
||||||
|
action="store_true",
|
||||||
|
dest="json_output",
|
||||||
|
help="Output in JSON format",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--output",
|
||||||
|
type=str,
|
||||||
|
default=None,
|
||||||
|
help="Save output to file path",
|
||||||
|
)
|
||||||
|
return parser.parse_args()
|
||||||
|
|
||||||
|
|
||||||
|
async def main():
|
||||||
|
args = parse_args()
|
||||||
|
|
||||||
|
reporter = RankingReporter()
|
||||||
|
|
||||||
|
report = await reporter.generate_report(
|
||||||
|
target=args.target,
|
||||||
|
period_days=args.period,
|
||||||
|
competitors=args.competitors,
|
||||||
|
)
|
||||||
|
|
||||||
|
if args.json_output:
|
||||||
|
output = json.dumps(report.to_dict(), ensure_ascii=False, indent=2)
|
||||||
|
else:
|
||||||
|
output = format_text_report(report)
|
||||||
|
|
||||||
|
if args.output:
|
||||||
|
with open(args.output, "w", encoding="utf-8") as f:
|
||||||
|
f.write(output)
|
||||||
|
logger.info(f"Output saved to: {args.output}")
|
||||||
|
else:
|
||||||
|
print(output)
|
||||||
|
|
||||||
|
reporter.print_stats()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
asyncio.run(main())
|
||||||
@@ -0,0 +1,8 @@
# 21-seo-position-tracking dependencies
requests>=2.31.0
aiohttp>=3.9.0
pandas>=2.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
107	custom-skills/21-seo-position-tracking/desktop/SKILL.md	Normal file
@@ -0,0 +1,107 @@
---
name: seo-position-tracking
description: |
  Keyword position tracking and ranking monitoring via Ahrefs Rank Tracker.
  Triggers: rank tracking, position monitoring, keyword rankings, visibility score, ranking report, 키워드 순위, 순위 추적.
---

# SEO Position Tracking

## Purpose

Monitor keyword ranking positions, detect significant changes, calculate visibility scores, and compare against competitors using Ahrefs Rank Tracker data. Provides actionable alerts for ranking drops and a segment-level performance breakdown.

## Core Capabilities

1. **Position Monitoring** - Retrieve current keyword ranking positions from Ahrefs Rank Tracker projects
2. **Change Detection** - Detect significant position changes with configurable threshold alerts (severity: critical/high/medium/low)
3. **Visibility Scoring** - Calculate weighted visibility scores using a CTR-curve model (position 1 = 30%, position 2 = 15%, etc.)
4. **Brand/Non-brand Segmentation** - Automatically classify keywords by brand relevance and search intent type
5. **Competitor Comparison** - Compare keyword overlap, position gaps, and visibility scores against competitors
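The CTR-curve weighting above can be sketched as follows. The shares for positions 1 and 2 come from the list; the rest of the curve is an assumed decay for illustration, not the values used by the real scripts:

```python
# Hypothetical CTR curve: click-through share by ranking position.
# Positions 1-2 match the figures quoted above; the tail is an assumed decay.
CTR_CURVE = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05,
             6: 0.04, 7: 0.03, 8: 0.025, 9: 0.02, 10: 0.015}


def visibility_score(keywords: list[dict]) -> float:
    """Volume-weighted visibility: sum(volume * CTR(position)) / sum(volume) * 100."""
    total_volume = sum(kw["volume"] for kw in keywords) or 1
    weighted = sum(
        kw["volume"] * CTR_CURVE.get(kw["position"], 0.0)
        for kw in keywords
        if kw["position"] is not None
    )
    return round(weighted / total_volume * 100, 2)


kws = [
    {"keyword": "seo audit", "position": 1, "volume": 1000},
    {"keyword": "rank tracker", "position": 11, "volume": 500},
]
print(visibility_score(kws))  # only the #1 keyword contributes: 1000*0.30/1500*100 = 20.0
```

Keywords ranking outside the modeled curve (here, beyond position 10) contribute zero, which is why visibility reacts sharply to top-10 dropouts.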
## MCP Tool Usage

### Ahrefs Rank Tracker Tools

```
mcp__ahrefs__rank-tracker-overview: Get rank tracking overview with current positions
mcp__ahrefs__rank-tracker-competitors-overview: Compare rankings against competitors
mcp__ahrefs__rank-tracker-competitors-pages: Competitor page-level ranking data
mcp__ahrefs__rank-tracker-competitors-stats: Detailed competitor ranking statistics
mcp__ahrefs__rank-tracker-serp-overview: SERP details for tracked keywords
mcp__ahrefs__management-projects: List available Ahrefs projects
mcp__ahrefs__management-project-keywords: Get tracked keywords for a project
```

### Notion for Report Storage

```
mcp__notion__notion-create-pages: Save tracking reports to SEO Audit Log
mcp__notion__notion-update-page: Update existing tracking entries
```
## Workflow

### Phase 1: Data Collection
1. Identify the Ahrefs project via `management-projects`
2. Retrieve tracked keywords via `management-project-keywords`
3. Fetch current positions via `rank-tracker-overview`
4. Fetch competitor data via `rank-tracker-competitors-overview` (if requested)

### Phase 2: Analysis
1. Detect position changes against the previous period
2. Generate alerts for changes exceeding the threshold
3. Calculate visibility score weighted by search volume and CTR curve
4. Segment keywords into brand/non-brand and by intent type
5. Compare positions against each competitor
### Phase 3: Reporting
1. Compile position distribution (top3/top10/top20/top50/top100)
2. Summarize changes (improved/declined/stable/new/lost)
3. List alerts sorted by severity and search volume
4. Generate segment-level breakdown
5. Save the report to the Notion SEO Audit Log database
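The position-distribution step (Phase 3, item 1) can be sketched as cumulative buckets, where `top3` is a subset of `top10`, and so on:

```python
def position_distribution(positions: list[int]) -> dict[str, int]:
    """Count keywords in cumulative ranking buckets (top3 ⊆ top10 ⊆ ... ⊆ top100)."""
    buckets = {"top3": 3, "top10": 10, "top20": 20, "top50": 50, "top100": 100}
    return {
        name: sum(1 for p in positions if p <= limit)
        for name, limit in buckets.items()
    }


print(position_distribution([1, 2, 5, 18, 44, 120]))
# {'top3': 2, 'top10': 3, 'top20': 4, 'top50': 5, 'top100': 5}
```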
## Output Format

```json
{
  "target": "https://example.com",
  "total_keywords": 250,
  "visibility_score": 68.5,
  "positions": {
    "top3": 15,
    "top10": 48,
    "top20": 92,
    "top50": 180,
    "top100": 230
  },
  "changes": {
    "improved": 45,
    "declined": 30,
    "stable": 155,
    "new": 12,
    "lost": 8
  },
  "alerts": [
    {
      "keyword": "example keyword",
      "old_position": 5,
      "new_position": 15,
      "change": -10,
      "volume": 5400,
      "severity": "high"
    }
  ],
  "segments": {
    "brand": {"keywords": 30, "avg_position": 2.1},
    "non_brand": {"keywords": 220, "avg_position": 24.5}
  }
}
```
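The `segments` block in the output can be produced by a simple brand/non-brand split. Here `brand_terms` is an illustrative parameter; the skill derives brand relevance automatically:

```python
def segment_keywords(keywords: list[dict], brand_terms: set[str]) -> dict[str, dict]:
    """Split keywords into brand/non-brand and report count plus average position."""
    segments: dict[str, list[int]] = {"brand": [], "non_brand": []}
    for kw in keywords:
        name = "brand" if any(t in kw["keyword"].lower() for t in brand_terms) else "non_brand"
        segments[name].append(kw["position"])
    return {
        name: {
            "keywords": len(positions),
            "avg_position": round(sum(positions) / len(positions), 1) if positions else 0.0,
        }
        for name, positions in segments.items()
    }


sample = [
    {"keyword": "ourdigital seo", "position": 2},
    {"keyword": "seo audit tool", "position": 14},
    {"keyword": "rank tracker", "position": 9},
]
print(segment_keywords(sample, {"ourdigital"}))
# {'brand': {'keywords': 1, 'avg_position': 2.0}, 'non_brand': {'keywords': 2, 'avg_position': 11.5}}
```

Brand keywords typically rank far better than non-brand ones (as in the sample output above), which is why the two are reported separately.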
## Notion Output (Required)

All tracking reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (Position Tracking), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: RANK-YYYYMMDD-NNN
@@ -0,0 +1,8 @@
name: seo-position-tracking
description: |
  Keyword position tracking and ranking monitoring. Triggers: rank tracking, position monitoring, keyword rankings, visibility score, ranking report.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
144	custom-skills/22-seo-link-building/code/CLAUDE.md	Normal file
@@ -0,0 +1,144 @@
# CLAUDE.md

## Overview

Link building diagnosis tool for backlink profile analysis, toxic link detection, competitor link gap identification, and link velocity tracking. Supports Korean platform link mapping (Naver Blog, Naver Cafe, Tistory, Brunch, Korean news sites).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Backlink profile audit
python scripts/backlink_auditor.py --url https://example.com --json

# Link gap analysis vs competitors
python scripts/link_gap_finder.py --target https://example.com --competitor https://competitor.com --json
```
## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `backlink_auditor.py` | Analyze backlink profile, detect toxic links | DR, referring domains, anchor distribution, toxic links |
| `link_gap_finder.py` | Find link gap opportunities vs competitors | Domains linking to competitors but not the target |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |
## Backlink Auditor

```bash
# Full backlink audit
python scripts/backlink_auditor.py --url https://example.com --json

# Check link velocity (new/lost over time)
python scripts/backlink_auditor.py --url https://example.com --velocity --json

# Find broken backlinks for recovery
python scripts/backlink_auditor.py --url https://example.com --broken --json

# Korean platform link analysis
python scripts/backlink_auditor.py --url https://example.com --korean-platforms --json
```

**Capabilities**:
- Domain Rating (DR) and backlink stats overview
- Referring domain analysis (count, DR distribution, country distribution)
- Anchor text distribution analysis (branded, exact-match, generic, naked URL)
- Toxic link detection (PBN patterns, spammy domains, link farms)
- Link velocity tracking (new/lost referring domains over time)
- Broken backlink recovery opportunities
- Korean platform mapping (Naver Blog, Naver Cafe, Tistory, Brunch, Korean news)
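The anchor-text buckets listed above can be sketched with a simple classifier. Here `brand_terms`, `target_keywords`, and the generic-anchor set are illustrative parameters, not values taken from `backlink_auditor.py`:

```python
def classify_anchor(anchor: str, brand_terms: set[str], target_keywords: set[str]) -> str:
    """Bucket an anchor text into the categories used in the distribution report."""
    text = anchor.strip().lower()
    if text.startswith(("http://", "https://", "www.")):
        return "naked_url"
    if any(term in text for term in brand_terms):
        return "branded"
    if text in target_keywords:
        return "exact_match"
    if any(kw in text for kw in target_keywords):
        return "partial_match"
    if text in {"click here", "here", "website", "read more", "link"}:
        return "generic"
    return "other"


print(classify_anchor("click here", {"ourdigital"}, {"seo audit"}))          # generic
print(classify_anchor("seo audit checklist", {"ourdigital"}, {"seo audit"}))  # partial_match
```

A skewed distribution (e.g. an unusually high exact-match share) is one of the signals the toxic-link detection looks for.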
## Link Gap Finder

```bash
# Gap vs one competitor
python scripts/link_gap_finder.py --target https://example.com --competitor https://comp1.com --json

# Multiple competitors
python scripts/link_gap_finder.py --target https://example.com --competitor https://comp1.com --competitor https://comp2.com --json

# Filter by minimum DR
python scripts/link_gap_finder.py --target https://example.com --competitor https://comp1.com --min-dr 30 --json
```

**Capabilities**:
- Find domains linking to competitors but not to the target
- Score link opportunities by DR, traffic, and relevance
- Categorize link sources (editorial, directory, forum, blog, news)
- Prioritize by feasibility and impact
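At its core, the gap computation is a set difference over referring domains; a minimal sketch, ranking candidates by how many competitors they already link to:

```python
def find_link_gaps(
    target_refdomains: set[str],
    competitor_refdomains: dict[str, set[str]],
) -> list[dict]:
    """Domains that link to at least one competitor but not to the target."""
    gaps: dict[str, list[str]] = {}
    for competitor, domains in competitor_refdomains.items():
        for domain in domains - target_refdomains:  # the actual gap: theirs minus ours
            gaps.setdefault(domain, []).append(competitor)
    return sorted(
        ({"domain": d, "linked_competitors": comps, "competitor_count": len(comps)}
         for d, comps in gaps.items()),
        key=lambda g: g["competitor_count"],
        reverse=True,
    )


gaps = find_link_gaps(
    {"a.com"},
    {"comp1.com": {"a.com", "b.com", "c.com"}, "comp2.com": {"b.com"}},
)
print([g["domain"] for g in gaps])  # ['b.com', 'c.com'] - b.com links to both competitors
```

The real script then layers DR, traffic, and category scoring on top of this raw set difference.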
## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-all-backlinks` | Get all backlinks for a target |
| `site-explorer-backlinks-stats` | Backlink statistics overview |
| `site-explorer-referring-domains` | List referring domains |
| `site-explorer-anchors` | Anchor text distribution |
| `site-explorer-broken-backlinks` | Find broken backlinks |
| `site-explorer-domain-rating` | Get Domain Rating |
| `site-explorer-domain-rating-history` | DR trend over time |
| `site-explorer-refdomains-history` | Referring domains trend |
| `site-explorer-linked-domains` | Domains linked from target |
## Output Format

```json
{
  "url": "https://example.com",
  "domain_rating": 45,
  "backlink_stats": {
    "total_backlinks": 12500,
    "referring_domains": 850,
    "dofollow_ratio": 0.72
  },
  "anchor_distribution": {
    "branded": 35,
    "exact_match": 12,
    "partial_match": 18,
    "generic": 20,
    "naked_url": 15
  },
  "toxic_links": [...],
  "korean_platforms": {
    "naver_blog": 45,
    "naver_cafe": 12,
    "tistory": 23,
    "brunch": 5
  },
  "link_velocity": {
    "new_last_30d": 120,
    "lost_last_30d": 35
  },
  "timestamp": "2025-01-01T00:00:00"
}
```
## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Link Building |
| Priority | Select | Based on toxic link count and gap size |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: LINK-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Domain Rating, Referring Domains, Backlinks)
- URLs and code remain unchanged
1079	custom-skills/22-seo-link-building/code/scripts/backlink_auditor.py	Normal file
File diff suppressed because it is too large
207	custom-skills/22-seo-link-building/code/scripts/base_client.py	Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using the token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,802 @@
|
|||||||
|
"""
|
||||||
|
Link Gap Finder - Competitor link gap analysis
|
||||||
|
===============================================
|
||||||
|
Purpose: Identify link building opportunities by finding domains that link
|
||||||
|
to competitors but not to the target site via Ahrefs MCP.
|
||||||
|
Python: 3.10+
|
||||||
|
Usage:
|
||||||
|
python link_gap_finder.py --target https://example.com --competitor https://comp1.com --json
|
||||||
|
python link_gap_finder.py --target https://example.com --competitor https://comp1.com --competitor https://comp2.com --min-dr 30 --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
import pandas as pd
|
||||||
|
from rich.console import Console
|
||||||
|
from rich.table import Table
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Logging
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
logger = logging.getLogger("link_gap_finder")
|
||||||
|
console = Console()
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Constants
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
AHREFS_BASE = "https://api.ahrefs.com/v3"
|
||||||
|
|
||||||
|
# Source category detection patterns
|
||||||
|
SOURCE_CATEGORY_PATTERNS: dict[str, list[str]] = {
|
||||||
|
"news": [
|
||||||
|
"news", "press", "media", "journal", "herald", "times", "post",
|
||||||
|
"gazette", "tribune", "daily", "chosun", "donga", "joongang",
|
||||||
|
"hani", "khan", "yna", "yonhap", "reuters", "bloomberg",
|
||||||
|
"techcrunch", "verge", "wired", "arstechnica", "bbc", "cnn",
|
||||||
|
],
|
||||||
|
"blog": [
|
||||||
|
"blog", "wordpress", "medium.com", "tistory.com", "brunch.co.kr",
|
||||||
|
"blog.naver.com", "tumblr", "blogger", "substack", "ghost.io",
|
||||||
|
"velog.io", "dev.to",
|
||||||
|
],
|
||||||
|
"forum": [
|
||||||
|
"forum", "community", "discuss", "reddit.com", "quora.com",
|
||||||
|
"stackexchange", "stackoverflow", "cafe.naver.com", "dcinside",
|
||||||
|
"fmkorea", "clien", "ppomppu", "theqoo", "ruliweb",
|
||||||
|
],
|
||||||
|
"directory": [
|
||||||
|
"directory", "listing", "yellowpages", "yelp", "bbb.org",
|
||||||
|
"clutch.co", "g2.com", "capterra", "trustpilot", "glassdoor",
|
||||||
|
"dmoz", "aboutus", "hotfrog", "manta", "superpages",
|
||||||
|
],
|
||||||
|
"edu_gov": [
|
||||||
|
".edu", ".gov", ".ac.kr", ".go.kr", ".or.kr",
|
||||||
|
],
|
||||||
|
"social": [
|
||||||
|
"facebook.com", "twitter.com", "x.com", "linkedin.com",
|
||||||
|
"instagram.com", "youtube.com", "pinterest.com", "tiktok.com",
|
||||||
|
],
|
||||||
|
"korean_platform": [
|
||||||
|
"naver.com", "daum.net", "kakao.com", "tistory.com",
|
||||||
|
"brunch.co.kr", "zum.com", "nate.com",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Dataclasses
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class LinkOpportunity:
|
||||||
|
"""A single link building opportunity from gap analysis."""
|
||||||
|
domain: str
|
||||||
|
dr: float = 0.0
|
||||||
|
traffic: int = 0
|
||||||
|
linked_competitors: list[str] = field(default_factory=list)
|
||||||
|
competitor_count: int = 0
|
||||||
|
not_linked_target: bool = True
|
||||||
|
category: str = "other"
|
||||||
|
feasibility_score: float = 0.0
|
||||||
|
impact_score: float = 0.0
|
||||||
|
overall_score: float = 0.0
|
||||||
|
backlinks_to_competitors: int = 0
|
||||||
|
country: str = ""
|
||||||
|
top_anchor: str = ""
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class GapSummary:
    """Summary statistics for the gap analysis."""

    total_opportunities: int = 0
    avg_dr: float = 0.0
    high_dr_count: int = 0
    category_breakdown: dict[str, int] = field(default_factory=dict)
    top_countries: list[dict[str, Any]] = field(default_factory=list)
    total_competitor_refdomains: dict[str, int] = field(default_factory=dict)
    target_refdomains_count: int = 0


@dataclass
class LinkGapResult:
    """Complete link gap analysis result."""

    target_url: str
    target_domain: str = ""
    competitor_urls: list[str] = field(default_factory=list)
    competitor_domains: list[str] = field(default_factory=list)
    target_dr: float = 0.0
    opportunities: list[LinkOpportunity] = field(default_factory=list)
    summary: GapSummary | None = None
    top_opportunities: list[LinkOpportunity] = field(default_factory=list)
    issues: list[dict[str, str]] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)
    timestamp: str = ""


# ---------------------------------------------------------------------------
# LinkGapFinder
# ---------------------------------------------------------------------------


class LinkGapFinder(BaseAsyncClient):
    """Find link building opportunities by analyzing competitor backlink gaps."""

    def __init__(self, **kwargs):
        super().__init__(max_concurrent=5, requests_per_second=2.0, **kwargs)
        self.session: aiohttp.ClientSession | None = None

    # -- Ahrefs MCP helper ---------------------------------------------------

    async def _call_ahrefs(
        self, endpoint: str, params: dict[str, Any]
    ) -> dict[str, Any]:
        """
        Call an Ahrefs API endpoint.

        In MCP context this calls mcp__ahrefs__<endpoint>.
        For standalone use, falls back to the REST API with a token.
        """
        # Only resolve the token when an HTTP session exists; in MCP context
        # there is no session and the REST branch is skipped entirely.
        api_token = config.get_required("AHREFS_API_TOKEN") if self.session else None

        if self.session and api_token:
            url = f"{AHREFS_BASE}/{endpoint}"
            headers = {"Authorization": f"Bearer {api_token}"}
            async with self.session.get(url, headers=headers, params=params) as resp:
                resp.raise_for_status()
                return await resp.json()

        logger.warning(
            f"Ahrefs call to '{endpoint}' - use MCP tool "
            f"mcp__ahrefs__{endpoint.replace('-', '_')} in Claude Desktop"
        )
        return {"endpoint": endpoint, "params": params, "data": [], "note": "mcp_stub"}

    # -- Core methods --------------------------------------------------------

    async def get_referring_domains(
        self, url: str, limit: int = 1000
    ) -> list[dict[str, Any]]:
        """Fetch referring domains for a given URL/domain."""
        target = urlparse(url).netloc or url
        result = await self._call_ahrefs(
            "site-explorer-referring-domains",
            {"target": target, "mode": "domain", "limit": limit, "order_by": "domain_rating:desc"},
        )
        domains = result.get("data", result.get("refdomains", []))
        if isinstance(domains, dict):
            domains = domains.get("refdomains", [])
        return domains if isinstance(domains, list) else []

    async def get_domain_rating(self, url: str) -> float:
        """Fetch Domain Rating for a URL."""
        target = urlparse(url).netloc or url
        result = await self._call_ahrefs(
            "site-explorer-domain-rating",
            {"target": target},
        )
        data = result.get("data", result) if isinstance(result, dict) else {}
        return float(data.get("domain_rating", 0.0))

    async def get_domain_metrics(self, url: str) -> dict[str, Any]:
        """Fetch comprehensive domain metrics."""
        target = urlparse(url).netloc or url
        result = await self._call_ahrefs(
            "site-explorer-backlinks-stats",
            {"target": target, "mode": "domain"},
        )
        data = result.get("data", result) if isinstance(result, dict) else {}
        return {
            "total_backlinks": data.get("live", 0),
            "referring_domains": data.get("live_refdomains", 0),
            "dofollow": data.get("live_dofollow", 0),
        }

    def find_gaps(
        self,
        target_domains: set[str],
        competitor_domain_maps: dict[str, set[str]],
    ) -> list[dict[str, Any]]:
        """
        Find domains linking to competitors but not to the target.

        Returns a list of gap domains with metadata about which
        competitors they link to.
        """
        # Collect all competitor referring domains
        all_competitor_domains: dict[str, list[str]] = {}

        for comp_name, comp_domains in competitor_domain_maps.items():
            for domain in comp_domains:
                domain_lower = domain.lower()
                if domain_lower not in all_competitor_domains:
                    all_competitor_domains[domain_lower] = []
                all_competitor_domains[domain_lower].append(comp_name)

        # Find gaps: in competitor set but not in target set
        target_set_lower = {d.lower() for d in target_domains}
        gaps = []

        for domain, linked_comps in all_competitor_domains.items():
            if domain not in target_set_lower:
                gaps.append({
                    "domain": domain,
                    "linked_competitors": linked_comps,
                    "competitor_count": len(set(linked_comps)),
                })

        # Sort by number of competitors linking (more = higher priority)
        gaps.sort(key=lambda g: g["competitor_count"], reverse=True)
        return gaps

    def score_opportunities(
        self,
        gaps: list[dict[str, Any]],
        refdomains_data: dict[str, list[dict[str, Any]]],
        total_competitors: int,
    ) -> list[LinkOpportunity]:
        """
        Score gap opportunities by DR, traffic, relevance, and feasibility.

        Scoring factors:
        - DR weight: Higher DR = more impactful link
        - Competitor overlap: More competitors linking = easier to acquire
        - Category bonus: Editorial/news links valued higher
        - Traffic bonus: Higher traffic domains valued more
        """
        # Build a lookup of domain metadata from competitor refdomains
        domain_metadata: dict[str, dict[str, Any]] = {}
        for comp_url, domains in refdomains_data.items():
            for rd in domains:
                d = rd.get("domain", rd.get("domain_from", "")).lower()
                if d and d not in domain_metadata:
                    domain_metadata[d] = {
                        "dr": rd.get("domain_rating", rd.get("dr", 0)),
                        "traffic": rd.get("organic_traffic", rd.get("traffic", 0)),
                        "backlinks": rd.get("backlinks", 0),
                        "country": rd.get("country", ""),
                    }

        opportunities = []

        for gap in gaps:
            domain = gap["domain"]
            meta = domain_metadata.get(domain, {})

            dr = meta.get("dr", 0)
            traffic = meta.get("traffic", 0)
            comp_count = gap["competitor_count"]

            # Category detection
            category = self._detect_category(domain)

            # Feasibility score (0-100)
            # Higher if: more competitors link (social proof), blog/forum (easier outreach)
            feasibility = min(100, (
                (comp_count / max(total_competitors, 1)) * 40  # Competitor overlap
                + (30 if category in ("blog", "forum", "directory") else 10)  # Category ease
                + (20 if dr < 60 else 5)  # Lower DR = easier to get a link from
                + (10 if traffic > 0 else 0)  # Active site bonus
            ))

            # Impact score (0-100)
            # Higher if: high DR, high traffic, editorial/news
            impact = min(100, (
                min(dr, 100) * 0.4  # DR weight (40%)
                + min(traffic / 1000, 30)  # Traffic weight (up to 30)
                + (20 if category in ("news", "edu_gov") else 5)  # Authority bonus
                + (comp_count / max(total_competitors, 1)) * 10  # Validation
            ))

            # Overall score = weighted average
            overall = round(feasibility * 0.4 + impact * 0.6, 1)

            opp = LinkOpportunity(
                domain=domain,
                dr=dr,
                traffic=traffic,
                linked_competitors=gap["linked_competitors"],
                competitor_count=comp_count,
                not_linked_target=True,
                category=category,
                feasibility_score=round(feasibility, 1),
                impact_score=round(impact, 1),
                overall_score=overall,
                backlinks_to_competitors=meta.get("backlinks", 0),
                country=meta.get("country", ""),
            )
            opportunities.append(opp)

        # Sort by overall score descending
        opportunities.sort(key=lambda o: o.overall_score, reverse=True)
        return opportunities

    def categorize_sources(
        self, opportunities: list[LinkOpportunity]
    ) -> dict[str, list[LinkOpportunity]]:
        """Group opportunities by source category."""
        categorized: dict[str, list[LinkOpportunity]] = {}
        for opp in opportunities:
            cat = opp.category
            if cat not in categorized:
                categorized[cat] = []
            categorized[cat].append(opp)
        return categorized

    # -- Orchestration -------------------------------------------------------

    async def analyze(
        self,
        target_url: str,
        competitor_urls: list[str],
        min_dr: float = 0,
        country_filter: str = "",
        limit: int = 1000,
    ) -> LinkGapResult:
        """Orchestrate the full link gap analysis."""
        target_domain = urlparse(target_url).netloc or target_url
        comp_domains = [urlparse(c).netloc or c for c in competitor_urls]

        logger.info(f"Starting link gap analysis: {target_domain} vs {comp_domains}")

        result = LinkGapResult(
            target_url=target_url,
            target_domain=target_domain,
            competitor_urls=competitor_urls,
            competitor_domains=comp_domains,
            timestamp=datetime.now().isoformat(),
        )

        # Phase 1: Fetch target DR and referring domains concurrently
        logger.info("Phase 1: Fetching target data...")
        target_dr, target_refdomains = await asyncio.gather(
            self.get_domain_rating(target_url),
            self.get_referring_domains(target_url, limit=limit),
            return_exceptions=True,
        )

        result.target_dr = target_dr if isinstance(target_dr, (int, float)) else 0
        target_rd_list = target_refdomains if isinstance(target_refdomains, list) else []
        target_domain_set = {
            rd.get("domain", rd.get("domain_from", "")).lower()
            for rd in target_rd_list
            if rd.get("domain", rd.get("domain_from", ""))
        }

        # Phase 2: Fetch competitor referring domains (parallel)
        comp_rd_results = await asyncio.gather(
            *(self.get_referring_domains(c, limit=limit) for c in competitor_urls),
            return_exceptions=True,
        )
        comp_results: dict[str, list] = {}
        for comp_url, comp_rd in zip(competitor_urls, comp_rd_results):
            if isinstance(comp_rd, Exception):
                logger.error(f"Failed to fetch refdomains for {comp_url}: {comp_rd}")
                comp_results[comp_url] = []
            else:
                comp_results[comp_url] = comp_rd if isinstance(comp_rd, list) else []

        # Build competitor domain maps
        competitor_domain_maps: dict[str, set[str]] = {}
        for comp_url, rd_list in comp_results.items():
            comp_domain = urlparse(comp_url).netloc or comp_url
            competitor_domain_maps[comp_domain] = {
                rd.get("domain", rd.get("domain_from", "")).lower()
                for rd in rd_list
                if rd.get("domain", rd.get("domain_from", ""))
            }

        # Phase 3: Find gaps
        logger.info("Phase 3: Finding link gaps...")
        raw_gaps = self.find_gaps(target_domain_set, competitor_domain_maps)
        logger.info(f"Found {len(raw_gaps)} gap domains")

        # Phase 4: Score opportunities
        logger.info("Phase 4: Scoring opportunities...")
        opportunities = self.score_opportunities(
            raw_gaps, comp_results, len(competitor_urls)
        )

        # Apply filters
        if min_dr > 0:
            opportunities = [o for o in opportunities if o.dr >= min_dr]

        if country_filter:
            country_lower = country_filter.lower()
            opportunities = [
                o for o in opportunities
                if o.country.lower() == country_lower or not o.country
            ]

        result.opportunities = opportunities
        result.top_opportunities = opportunities[:50]

        # Phase 5: Build summary
        logger.info("Phase 5: Building summary...")
        result.summary = self._build_summary(
            opportunities, comp_results, len(target_rd_list)
        )

        # Phase 6: Generate issues and recommendations
        self._generate_issues(result)
        self._generate_recommendations(result)

        logger.info(f"Link gap analysis complete: {len(opportunities)} opportunities found")
        return result

    # -- Helpers -------------------------------------------------------------

    @staticmethod
    def _detect_category(domain: str) -> str:
        """Detect the category of a domain based on known patterns."""
        domain_lower = domain.lower()

        for category, patterns in SOURCE_CATEGORY_PATTERNS.items():
            for pattern in patterns:
                if pattern in domain_lower:
                    return category

        # Fallback heuristics
        if domain_lower.endswith((".edu", ".ac.kr", ".gov", ".go.kr")):
            return "edu_gov"

        return "other"

    def _build_summary(
        self,
        opportunities: list[LinkOpportunity],
        comp_results: dict[str, list],
        target_rd_count: int,
    ) -> GapSummary:
        """Build summary statistics from opportunities."""
        summary = GapSummary()
        summary.total_opportunities = len(opportunities)
        summary.target_refdomains_count = target_rd_count

        if opportunities:
            dr_values = [o.dr for o in opportunities if o.dr > 0]
            summary.avg_dr = round(sum(dr_values) / max(len(dr_values), 1), 1)
            summary.high_dr_count = sum(1 for o in opportunities if o.dr >= 50)

            # Category and country breakdowns
            cat_counts: dict[str, int] = {}
            country_counts: dict[str, int] = {}
            for opp in opportunities:
                cat_counts[opp.category] = cat_counts.get(opp.category, 0) + 1
                if opp.country:
                    country_counts[opp.country] = country_counts.get(opp.country, 0) + 1

            summary.category_breakdown = dict(
                sorted(cat_counts.items(), key=lambda x: x[1], reverse=True)
            )
            summary.top_countries = sorted(
                [{"country": k, "count": v} for k, v in country_counts.items()],
                key=lambda x: x["count"], reverse=True,
            )[:10]

        # Competitor refdomain counts
        for comp_url, rd_list in comp_results.items():
            comp_domain = urlparse(comp_url).netloc or comp_url
            summary.total_competitor_refdomains[comp_domain] = len(rd_list)

        return summary

    def _generate_issues(self, result: LinkGapResult) -> None:
        """Generate issues based on the gap analysis."""
        issues = []

        if result.summary:
            # Large gap warning
            if result.summary.total_opportunities > 500:
                issues.append({
                    "type": "warning",
                    "category": "link_gap",
                    "message": (
                        f"Large link gap: {result.summary.total_opportunities} domains "
                        "link to competitors but not to you"
                    ),
                })

            # High-DR gap
            if result.summary.high_dr_count > 50:
                issues.append({
                    "type": "error",
                    "category": "authority_gap",
                    "message": (
                        f"{result.summary.high_dr_count} high-authority domains (DR 50+) "
                        "link to competitors but not to you"
                    ),
                })

            # Category-specific gaps
            news_gap = result.summary.category_breakdown.get("news", 0)
            if news_gap > 20:
                issues.append({
                    "type": "warning",
                    "category": "pr_gap",
                    "message": f"{news_gap} news/media domains link to competitors - consider digital PR",
                })

            edu_gap = result.summary.category_breakdown.get("edu_gov", 0)
            if edu_gap > 5:
                issues.append({
                    "type": "info",
                    "category": "edu_gov_gap",
                    "message": f"{edu_gap} .edu/.gov domains link to competitors - high-authority opportunity",
                })

        result.issues = issues

    def _generate_recommendations(self, result: LinkGapResult) -> None:
        """Generate actionable recommendations."""
        recs = []

        if not result.opportunities:
            recs.append("No significant link gaps found. Consider expanding the competitor list.")
            result.recommendations = recs
            return

        # Top opportunities by category
        categorized = self.categorize_sources(result.top_opportunities[:100])

        if "news" in categorized:
            news_count = len(categorized["news"])
            top_news = [o.domain for o in categorized["news"][:3]]
            recs.append(
                f"Pursue {news_count} news/media link opportunities. "
                f"Top targets: {', '.join(top_news)}. "
                "Strategy: create newsworthy content, press releases, expert commentary."
            )

        if "blog" in categorized:
            blog_count = len(categorized["blog"])
            recs.append(
                f"Target {blog_count} blog/content site opportunities via guest posting, "
                "collaborative content, and expert interviews."
            )

        if "directory" in categorized:
            dir_count = len(categorized["directory"])
            recs.append(
                f"Submit to {dir_count} relevant directories and listing sites. "
                "Low effort, moderate impact for local SEO signals."
            )

        if "forum" in categorized:
            forum_count = len(categorized["forum"])
            recs.append(
                f"Engage in {forum_count} forum/community sites with helpful answers "
                "and resource sharing. Build presence before linking."
            )

        if "korean_platform" in categorized:
            kr_count = len(categorized["korean_platform"])
            recs.append(
                f"Build presence on {kr_count} Korean platforms (Naver, Tistory, Brunch). "
                "Critical for Korean SERP visibility."
            )

        if "edu_gov" in categorized:
            eg_count = len(categorized["edu_gov"])
            recs.append(
                f"Target {eg_count} .edu/.gov link opportunities through scholarship "
                "programs, research partnerships, or government resource contributions."
            )

        # Multi-competitor overlap
        multi_comp = [o for o in result.top_opportunities if o.competitor_count >= 2]
        if multi_comp:
            recs.append(
                f"{len(multi_comp)} domains link to multiple competitors but not to you. "
                "These are high-priority targets as they validate industry relevance."
            )

        # Quick wins: high feasibility, moderate impact
        quick_wins = [
            o for o in result.opportunities[:100]
            if o.feasibility_score >= 60 and o.impact_score >= 30
        ]
        if quick_wins:
            recs.append(
                f"Prioritize {len(quick_wins)} quick-win opportunities with high "
                "feasibility and moderate impact for the fastest link acquisition."
            )

        result.recommendations = recs


# ---------------------------------------------------------------------------
# Output Formatting
# ---------------------------------------------------------------------------


def format_rich_output(result: LinkGapResult) -> None:
    """Display gap analysis results using Rich tables."""
    console.print(f"\n[bold cyan]Link Gap Analysis: {result.target_domain}[/bold cyan]")
    console.print(f"[dim]vs {', '.join(result.competitor_domains)}[/dim]")
    console.print(f"[dim]Timestamp: {result.timestamp}[/dim]\n")

    # Summary
    if result.summary:
        summary_table = Table(title="Summary", show_header=True, header_style="bold magenta")
        summary_table.add_column("Metric", style="cyan")
        summary_table.add_column("Value", style="green")
        summary_table.add_row("Target DR", str(result.target_dr))
        summary_table.add_row("Target Referring Domains", str(result.summary.target_refdomains_count))
        summary_table.add_row("Total Gap Opportunities", str(result.summary.total_opportunities))
        summary_table.add_row("Avg Opportunity DR", str(result.summary.avg_dr))
        summary_table.add_row("High-DR Opportunities (50+)", str(result.summary.high_dr_count))

        for comp, count in result.summary.total_competitor_refdomains.items():
            summary_table.add_row(f"  {comp} Refdomains", str(count))

        console.print(summary_table)

    # Category breakdown
    if result.summary and result.summary.category_breakdown:
        cat_table = Table(title="\nCategory Breakdown", show_header=True, header_style="bold magenta")
        cat_table.add_column("Category", style="cyan")
        cat_table.add_column("Count", style="green")
        for cat, count in result.summary.category_breakdown.items():
            cat_table.add_row(cat, str(count))
        console.print(cat_table)

    # Top opportunities
    if result.top_opportunities:
        opp_table = Table(
            title=f"\nTop Opportunities (showing {min(25, len(result.top_opportunities))})",
            show_header=True,
            header_style="bold magenta",
        )
        opp_table.add_column("Domain", style="cyan", max_width=35)
        opp_table.add_column("DR", style="green", justify="right")
        opp_table.add_column("Category", style="yellow")
        opp_table.add_column("Comps", justify="right")
        opp_table.add_column("Score", style="bold green", justify="right")
        opp_table.add_column("Feasibility", justify="right")
        opp_table.add_column("Impact", justify="right")

        for opp in result.top_opportunities[:25]:
            opp_table.add_row(
                opp.domain[:35],
                str(int(opp.dr)),
                opp.category,
                str(opp.competitor_count),
                f"{opp.overall_score:.1f}",
                f"{opp.feasibility_score:.0f}",
                f"{opp.impact_score:.0f}",
            )
        console.print(opp_table)

    # Issues
    if result.issues:
        console.print("\n[bold red]Issues:[/bold red]")
        for issue in result.issues:
            icon_map = {"error": "[red]ERROR[/red]", "warning": "[yellow]WARN[/yellow]", "info": "[blue]INFO[/blue]"}
            icon = icon_map.get(issue["type"], "[dim]INFO[/dim]")
            console.print(f"  {icon} [{issue['category']}] {issue['message']}")

    # Recommendations
    if result.recommendations:
        console.print("\n[bold green]Recommendations:[/bold green]")
        for i, rec in enumerate(result.recommendations, 1):
            console.print(f"  {i}. {rec}")

    console.print()


def result_to_dict(result: LinkGapResult) -> dict[str, Any]:
    """Convert a gap result to a JSON-serializable dict."""
    return {
        "target_url": result.target_url,
        "target_domain": result.target_domain,
        "target_dr": result.target_dr,
        "competitor_urls": result.competitor_urls,
        "competitor_domains": result.competitor_domains,
        "summary": asdict(result.summary) if result.summary else None,
        "opportunities": [asdict(o) for o in result.opportunities],
        "top_opportunities": [asdict(o) for o in result.top_opportunities],
        "issues": result.issues,
        "recommendations": result.recommendations,
        "timestamp": result.timestamp,
    }


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def parse_args() -> argparse.Namespace:
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Link Gap Finder - Identify link building opportunities vs competitors",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python link_gap_finder.py --target https://example.com --competitor https://comp1.com --json
  python link_gap_finder.py --target https://example.com --competitor https://comp1.com --competitor https://comp2.com --min-dr 30 --json
  python link_gap_finder.py --target https://example.com --competitor https://comp1.com --country kr --output gap_report.json
""",
    )
    parser.add_argument("--target", required=True, help="Target URL or domain")
    parser.add_argument(
        "--competitor", action="append", required=True,
        help="Competitor URL or domain (can be repeated)",
    )
    parser.add_argument(
        "--min-dr", type=float, default=0,
        help="Minimum DR filter for opportunities (default: 0)",
    )
    parser.add_argument(
        "--country", default="",
        help="Filter by country code (e.g., kr, us, jp)",
    )
    parser.add_argument(
        "--limit", type=int, default=1000,
        help="Max referring domains to fetch per site (default: 1000)",
    )
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", "-o", help="Save output to file")
    return parser.parse_args()


async def main() -> None:
    """Main entry point."""
    args = parse_args()

    finder = LinkGapFinder()

    try:
        result = await finder.analyze(
            target_url=args.target,
            competitor_urls=args.competitor,
            min_dr=args.min_dr,
            country_filter=args.country,
            limit=args.limit,
        )

        if args.json or args.output:
            output_data = result_to_dict(result)
            json_str = json.dumps(output_data, indent=2, ensure_ascii=False)

            if args.output:
                with open(args.output, "w", encoding="utf-8") as f:
                    f.write(json_str)
                logger.info(f"Report saved to {args.output}")

            if args.json:
                print(json_str)
        else:
            format_rich_output(result)

        finder.print_stats()

    except KeyboardInterrupt:
        logger.warning("Analysis interrupted by user")
        sys.exit(1)
    except Exception as e:
        logger.error(f"Analysis failed: {e}")
        if args.json:
            print(json.dumps({"error": str(e)}, indent=2))
        sys.exit(1)


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,8 @@
# 22-seo-link-building dependencies
requests>=2.31.0
aiohttp>=3.9.0
pandas>=2.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
116 custom-skills/22-seo-link-building/desktop/SKILL.md Normal file
@@ -0,0 +1,116 @@
---
name: seo-link-building
description: |
  Link building diagnosis and backlink analysis tool.
  Triggers: backlink audit, link building, referring domains, toxic links, link gap, broken backlinks, 백링크 분석, 링크빌딩.
---

# SEO Link Building Diagnosis

## Purpose

Analyze backlink profiles, detect toxic links, find competitor link gaps, track link velocity, and map Korean platform links. Provides actionable link building recommendations.

## Core Capabilities

1. **Backlink Profile Audit** - DR, referring domains, dofollow ratio, anchor distribution
2. **Toxic Link Detection** - PBN patterns, spam domains, link farm identification
3. **Competitor Link Gap Analysis** - Domains linking to competitors but not to the target
4. **Link Velocity Tracking** - New/lost referring domains over time
5. **Broken Backlink Recovery** - Find and reclaim broken high-DR backlinks
6. **Korean Platform Mapping** - Naver Blog, Cafe, Tistory, Brunch, Korean news

## MCP Tool Usage

### Ahrefs for Backlink Data
```
mcp__ahrefs__site-explorer-all-backlinks: Get all backlinks for a target
mcp__ahrefs__site-explorer-backlinks-stats: Backlink statistics overview
mcp__ahrefs__site-explorer-referring-domains: List referring domains
mcp__ahrefs__site-explorer-anchors: Anchor text distribution
mcp__ahrefs__site-explorer-broken-backlinks: Find broken backlinks
mcp__ahrefs__site-explorer-domain-rating: Get Domain Rating
mcp__ahrefs__site-explorer-domain-rating-history: DR trend over time
mcp__ahrefs__site-explorer-refdomains-history: Referring domains trend
mcp__ahrefs__site-explorer-linked-domains: Domains linked from target
```

### Notion for Report Storage
```
mcp__notion__notion-create-pages: Save audit report to SEO Audit Log
mcp__notion__notion-update-page: Update existing audit entries
```

## Workflow

### 1. Backlink Profile Audit

1. Fetch Domain Rating via `site-explorer-domain-rating`
2. Get backlink stats via `site-explorer-backlinks-stats`
3. Retrieve referring domains via `site-explorer-referring-domains`
4. Analyze anchor distribution via `site-explorer-anchors`
5. Detect toxic links (PBN patterns, spam keywords, suspicious TLDs)
6. Map Korean platform links from referring domains
7. Report with issues and recommendations
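Step 5 can be sketched as a rule-based scorer over each referring domain. The keyword and TLD lists, weights, and DR threshold below are illustrative assumptions; a real audit tunes them per niche:

```python
# Illustrative heuristics for toxic-link detection; lists and weights are
# assumptions, not fixed rules.
SPAM_KEYWORDS = ("casino", "viagra", "loan", "betting")
SUSPICIOUS_TLDS = (".xyz", ".top", ".click", ".loan")


def toxicity_score(domain: str, dr: int, dofollow_ratio: float) -> int:
    """Return a 0-100 risk score for a referring domain."""
    score = 0
    if any(k in domain for k in SPAM_KEYWORDS):
        score += 40
    if domain.endswith(SUSPICIOUS_TLDS):
        score += 30
    if dr < 5:                 # very weak domains often sit in link farms
        score += 20
    if dofollow_ratio > 0.95:  # unnaturally high dofollow share (PBN pattern)
        score += 10
    return min(score, 100)
```

Domains scoring above a chosen cutoff feed the "Toxic Links (Top 10)" table in the report, with each triggered rule recorded as the reason.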
### 2. Link Gap Analysis

1. Fetch target referring domains
2. Fetch competitor referring domains (parallel)
3. Compute set difference (competitor - target)
4. Score opportunities by DR, traffic, category
5. Categorize sources (news, blog, forum, directory, Korean platform)
6. Rank by feasibility and impact
7. Report top opportunities with recommendations
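Steps 2-4 reduce to a set difference plus a scoring pass. This is a minimal sketch assuming referring-domain maps keyed by domain with `(DR, monthly_traffic)` values; the `dr + traffic/1000` scoring formula is an illustrative assumption:

```python
# Illustrative sketch: domains linking to any competitor but not the target,
# scored and sorted. Input maps: {domain: (dr, monthly_traffic)}.
def link_gap(target: dict, competitors: list[dict]) -> list[dict]:
    """Return scored link-gap opportunities, best first."""
    gaps: dict[str, float] = {}
    for comp in competitors:
        for domain, (dr, traffic) in comp.items():
            if domain not in target:
                # Keep the best score seen across competitors.
                gaps[domain] = max(gaps.get(domain, 0.0), dr + traffic / 1000)
    return sorted(
        ({"domain": d, "score": round(s, 1)} for d, s in gaps.items()),
        key=lambda row: row["score"],
        reverse=True,
    )
```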
### 3. Link Velocity Check

1. Fetch refdomains-history for last 90 days
2. Calculate new/lost referring domains per period
3. Determine velocity trend (growing/stable/declining)
4. Flag declining velocity as issue
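Steps 2-3 can be sketched over the refdomains-history series. Assuming a date-ordered list of referring-domain counts (one per snapshot), with an illustrative 2% tolerance band for "stable":

```python
# Illustrative sketch: classify referring-domain velocity from a
# date-ordered list of counts (oldest first).
def velocity_trend(history: list[int], tolerance: float = 0.02) -> str:
    """Return 'growing', 'stable', or 'declining' for a count series."""
    if len(history) < 2:
        return "stable"
    first, last = history[0], history[-1]
    change = (last - first) / max(first, 1)
    if change > tolerance:
        return "growing"
    if change < -tolerance:
        return "declining"
    return "stable"
```

A `"declining"` result is what the workflow flags as an issue in step 4.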
### 4. Broken Backlink Recovery

1. Fetch broken backlinks via `site-explorer-broken-backlinks`
2. Sort by DR (highest value first)
3. Recommend 301 redirects or content recreation
## Output Format

```markdown
## Link Building Audit: [domain]

### Overview
- Domain Rating: [DR]
- Referring Domains: [count]
- Dofollow Ratio: [ratio]
- Toxic Links: [count] ([risk level])

### Anchor Distribution
| Type | Count | % |
|------|-------|---|
| Branded | [n] | [%] |
| Exact Match | [n] | [%] |
| Generic | [n] | [%] |
| Naked URL | [n] | [%] |

### Toxic Links (Top 10)
| Domain | Risk Score | Reason |
|--------|-----------|--------|

### Korean Platform Links
| Platform | Count |
|----------|-------|

### Link Velocity
| Period | New | Lost |
|--------|-----|------|

### Recommendations
1. [Priority actions]
```
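The anchor-distribution buckets in the template can be derived from `site-explorer-anchors` output with a small classifier. A minimal sketch, assuming the brand name and target-keyword set are supplied by the caller:

```python
import re


def classify_anchor(anchor: str, brand: str, target_keywords: set[str]) -> str:
    """Bucket an anchor text as naked_url, branded, exact_match, or generic."""
    text = anchor.strip().lower()
    if re.match(r"https?://|www\.", text):
        return "naked_url"
    if brand.lower() in text:
        return "branded"
    if text in target_keywords:
        return "exact_match"
    return "generic"
```

Counting each bucket over all anchors, weighted by referring-domain count, fills in the Count and % columns.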
## Notion Output (Required)

All audit reports MUST be saved to OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (Link Building), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: LINK-YYYYMMDD-NNN
custom-skills/22-seo-link-building/desktop/skill.yaml (new file, 8 lines)
@@ -0,0 +1,8 @@
name: seo-link-building
description: |
  Link building diagnosis and backlink analysis. Triggers: backlink audit, link building, referring domains, toxic links, link gap, broken backlinks.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
custom-skills/22-seo-link-building/desktop/tools/ahrefs.md (new file, 70 lines)
@@ -0,0 +1,70 @@
# Ahrefs

## Tools Used

### site-explorer-all-backlinks
- **Purpose**: Get all backlinks for a target domain
- **Parameters**: target, mode (domain/prefix/url), limit, order_by
- **Returns**: List of backlinks with source URL, domain, DR, anchor, dofollow status

### site-explorer-backlinks-stats
- **Purpose**: Backlink statistics overview
- **Parameters**: target, mode
- **Returns**: Total backlinks, referring domains, dofollow/nofollow counts

### site-explorer-referring-domains
- **Purpose**: List all referring domains
- **Parameters**: target, mode, limit, order_by
- **Returns**: Domains with DR, backlinks count, traffic, country

### site-explorer-anchors
- **Purpose**: Anchor text distribution
- **Parameters**: target, mode, limit, order_by
- **Returns**: Anchor texts with backlink and referring domain counts

### site-explorer-broken-backlinks
- **Purpose**: Find broken backlinks for recovery
- **Parameters**: target, mode, limit, order_by
- **Returns**: Broken links with source, target URL, HTTP code, DR

### site-explorer-domain-rating
- **Purpose**: Get Domain Rating for a target
- **Parameters**: target
- **Returns**: Domain Rating value and Ahrefs rank

### site-explorer-domain-rating-history
- **Purpose**: DR trend over time
- **Parameters**: target, date_from
- **Returns**: Historical DR data points

### site-explorer-refdomains-history
- **Purpose**: Referring domains trend over time
- **Parameters**: target, mode, date_from
- **Returns**: Historical referring domain counts

### site-explorer-linked-domains
- **Purpose**: Domains linked from the target
- **Parameters**: target, mode, limit
- **Returns**: Outgoing linked domains with counts

## Configuration

- Ahrefs MCP tools are available via the `mcp__ahrefs__*` prefix
- No API key needed when using MCP (handled by tool server)
- Rate limits: follow Ahrefs plan limits (typically 500 rows/request)

## Examples

```
# Get backlink stats
mcp__ahrefs__site-explorer-backlinks-stats(target="example.com", mode="domain")

# Get referring domains sorted by DR
mcp__ahrefs__site-explorer-referring-domains(target="example.com", mode="domain", limit=500, order_by="domain_rating:desc")

# Get anchor text distribution
mcp__ahrefs__site-explorer-anchors(target="example.com", mode="domain", limit=200)

# Find broken backlinks
mcp__ahrefs__site-explorer-broken-backlinks(target="example.com", mode="domain", limit=100)
```
custom-skills/22-seo-link-building/desktop/tools/notion.md (new file, 39 lines)
@@ -0,0 +1,39 @@
# Notion

## Tools Used

### notion-create-pages
- **Purpose**: Save link building audit reports to SEO Audit Log
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Required Properties**:
  - Issue (title): Report title in Korean with date
  - Site (url): Audited website URL
  - Category (select): "Link Building"
  - Priority (select): Critical / High / Medium / Low
  - Found Date (date): YYYY-MM-DD
  - Audit ID (rich_text): LINK-YYYYMMDD-NNN

### notion-update-page
- **Purpose**: Update existing audit entries with follow-up findings

## Configuration

- Notion MCP tools available via `mcp__notion__*` prefix
- Authentication handled by MCP tool server

## Examples

```
# Create a link building audit report
mcp__notion__notion-create-pages(
    parent={"database_id": "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"},
    properties={
        "Issue": {"title": [{"text": {"content": "백링크 프로필 분석 - example.com (2025-01-15)"}}]},
        "Site": {"url": "https://example.com"},
        "Category": {"select": {"name": "Link Building"}},
        "Priority": {"select": {"name": "High"}},
        "Found Date": {"date": {"start": "2025-01-15"}},
        "Audit ID": {"rich_text": [{"text": {"content": "LINK-20250115-001"}}]}
    }
)
```
@@ -0,0 +1,24 @@
# WebSearch

## Tools Used

### WebSearch
- **Purpose**: Research link building strategies, competitor insights, and industry best practices
- **Usage**: Supplement Ahrefs data with web research for context

### WebFetch
- **Purpose**: Fetch specific web pages for content analysis and link prospecting
- **Usage**: Verify link opportunities, check page content relevance

## Examples

```
# Research link building strategies for a niche
WebSearch("link building strategies for SaaS companies 2025")

# Research Korean link building opportunities
WebSearch("네이버 블로그 백링크 전략 2025")

# Check if a target page is relevant for outreach
WebFetch("https://example.com/resources", "What topics does this page cover?")
```
custom-skills/23-seo-content-strategy/code/CLAUDE.md (new file, 142 lines)
@@ -0,0 +1,142 @@
# CLAUDE.md

## Overview

Content strategy tool for SEO-driven content planning. Performs content inventory via sitemap crawl and Ahrefs top pages, scores content performance, detects content decay, analyzes topic gaps vs competitors, maps topic clusters, and generates content briefs. Supports Korean content patterns (Naver Blog format, review/후기 content).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Content audit
python scripts/content_auditor.py --url https://example.com --json

# Content gap analysis
python scripts/content_gap_analyzer.py --target https://example.com --competitor https://competitor.com --json

# Generate content brief
python scripts/content_brief_generator.py --keyword "치과 임플란트 비용" --url https://example.com --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `content_auditor.py` | Content inventory, performance scoring, decay detection | Content inventory with scores and decay flags |
| `content_gap_analyzer.py` | Topic gap analysis and cluster mapping vs competitors | Missing topics, cluster map, editorial calendar |
| `content_brief_generator.py` | Generate SEO content briefs with outlines | Brief with outline, keywords, word count targets |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Content Auditor

```bash
# Full content audit
python scripts/content_auditor.py --url https://example.com --json

# Detect decaying content
python scripts/content_auditor.py --url https://example.com --decay --json

# Filter by content type
python scripts/content_auditor.py --url https://example.com --type blog --json
```

**Capabilities**:
- Content inventory via sitemap crawl + Ahrefs top-pages
- Performance scoring (traffic, rankings, backlinks)
- Content decay detection (pages losing traffic over time)
- Content type classification (blog, product, service, landing, resource)
- Word count and freshness assessment
- Korean content format analysis (Naver Blog style, 후기/review content)
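Content decay detection reduces to comparing traffic snapshots over time. A minimal sketch, assuming monthly traffic counts ordered oldest-first; the -20% threshold is an illustrative default, not the script's actual cutoff:

```python
# Illustrative sketch of decay detection over monthly traffic snapshots
# (oldest first). The -20% threshold is an assumed default.
def decay_rate(monthly_traffic: list[int]) -> float:
    """Fractional traffic change from the first period to the last."""
    if len(monthly_traffic) < 2 or monthly_traffic[0] == 0:
        return 0.0
    return (monthly_traffic[-1] - monthly_traffic[0]) / monthly_traffic[0]


def is_decaying(monthly_traffic: list[int], threshold: float = -0.20) -> bool:
    """Flag a page whose traffic fell by more than the threshold."""
    return decay_rate(monthly_traffic) < threshold
```

Pages flagged this way populate the `decaying_content` list in the audit output, with `decay_rate` stored alongside.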
## Content Gap Analyzer

```bash
# Gap analysis vs competitor
python scripts/content_gap_analyzer.py --target https://example.com --competitor https://comp1.com --json

# With topic cluster mapping
python scripts/content_gap_analyzer.py --target https://example.com --competitor https://comp1.com --clusters --json
```

**Capabilities**:
- Topic gap identification vs competitors
- Topic cluster mapping (pillar + cluster pages)
- Content freshness comparison
- Content volume comparison
- Editorial calendar generation with priority scoring
- Korean content opportunity detection

## Content Brief Generator

```bash
# Generate brief for keyword
python scripts/content_brief_generator.py --keyword "치과 임플란트 비용" --url https://example.com --json

# With competitor analysis
python scripts/content_brief_generator.py --keyword "dental implant cost" --url https://example.com --competitors 5 --json
```

**Capabilities**:
- Content outline generation with H2/H3 structure
- Target keyword list (primary + secondary + LSI)
- Word count recommendation based on top-ranking pages
- Competitor content analysis (structure, word count, topics covered)
- Internal linking suggestions
- Korean content format recommendations
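The word-count recommendation can be sketched as a statistic over the top-ranking pages' word counts. Both the median-plus-10% target and the 1500-word fallback below are illustrative assumptions:

```python
import statistics


# Illustrative sketch: target word count from competitor page lengths.
# The median * 1.1 target and the 1500-word fallback are assumptions.
def recommend_word_count(competitor_word_counts: list[int]) -> int:
    """Suggest a target word count slightly above the competitor median."""
    if not competitor_word_counts:
        return 1500  # assumed default when no competitor data is available
    return round(statistics.median(competitor_word_counts) * 1.1)
```

The median resists outliers (one 10,000-word pillar page will not inflate the target the way a mean would).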
## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-top-pages` | Get top performing pages |
| `site-explorer-pages-by-traffic` | Pages ranked by organic traffic |
| `site-explorer-organic-keywords` | Keywords per page |
| `site-explorer-organic-competitors` | Find content competitors |
| `site-explorer-best-by-external-links` | Best content by links |

## Output Format

```json
{
  "url": "https://example.com",
  "content_inventory": {
    "total_pages": 150,
    "by_type": {"blog": 80, "product": 40, "service": 20, "other": 10},
    "avg_performance_score": 45
  },
  "decaying_content": [...],
  "top_performers": [...],
  "gaps": [...],
  "clusters": [...],
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Content Strategy |
| Priority | Select | Based on gap severity |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: CONTENT-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is
- URLs and code remain unchanged
@@ -0,0 +1,207 @@
"""
|
||||||
|
Base Client - Shared async client utilities
|
||||||
|
===========================================
|
||||||
|
Purpose: Rate-limited async operations for API clients
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
from asyncio import Semaphore
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any, Callable, TypeVar
|
||||||
|
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
from tenacity import (
|
||||||
|
retry,
|
||||||
|
stop_after_attempt,
|
||||||
|
wait_exponential,
|
||||||
|
retry_if_exception_type,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Load environment variables
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
# Logging setup
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
|
||||||
|
T = TypeVar("T")
|
||||||
|
|
||||||
|
|
||||||
|
class RateLimiter:
|
||||||
|
"""Rate limiter using token bucket algorithm."""
|
||||||
|
|
||||||
|
def __init__(self, rate: float, per: float = 1.0):
|
||||||
|
"""
|
||||||
|
Initialize rate limiter.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
rate: Number of requests allowed
|
||||||
|
per: Time period in seconds (default: 1 second)
|
||||||
|
"""
|
||||||
|
self.rate = rate
|
||||||
|
self.per = per
|
||||||
|
self.tokens = rate
|
||||||
|
self.last_update = datetime.now()
|
||||||
|
self._lock = asyncio.Lock()
|
||||||
|
|
||||||
|
async def acquire(self) -> None:
|
||||||
|
"""Acquire a token, waiting if necessary."""
|
||||||
|
async with self._lock:
|
||||||
|
now = datetime.now()
|
||||||
|
elapsed = (now - self.last_update).total_seconds()
|
||||||
|
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
|
||||||
|
self.last_update = now
|
||||||
|
|
||||||
|
if self.tokens < 1:
|
||||||
|
wait_time = (1 - self.tokens) * (self.per / self.rate)
|
||||||
|
await asyncio.sleep(wait_time)
|
||||||
|
self.tokens = 0
|
||||||
|
else:
|
||||||
|
self.tokens -= 1
|
||||||
|
|
||||||
|
|
||||||
|
class BaseAsyncClient:
|
||||||
|
"""Base class for async API clients with rate limiting."""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
max_concurrent: int = 5,
|
||||||
|
requests_per_second: float = 3.0,
|
||||||
|
logger: logging.Logger | None = None,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Initialize base client.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
max_concurrent: Maximum concurrent requests
|
||||||
|
requests_per_second: Rate limit
|
||||||
|
logger: Logger instance
|
||||||
|
"""
|
||||||
|
self.semaphore = Semaphore(max_concurrent)
|
||||||
|
self.rate_limiter = RateLimiter(requests_per_second)
|
||||||
|
self.logger = logger or logging.getLogger(self.__class__.__name__)
|
||||||
|
self.stats = {
|
||||||
|
"requests": 0,
|
||||||
|
"success": 0,
|
||||||
|
"errors": 0,
|
||||||
|
"retries": 0,
|
||||||
|
}
|
||||||
|
|
||||||
|
@retry(
|
||||||
|
stop=stop_after_attempt(3),
|
||||||
|
wait=wait_exponential(multiplier=1, min=2, max=10),
|
||||||
|
retry=retry_if_exception_type(Exception),
|
||||||
|
)
|
||||||
|
async def _rate_limited_request(
|
||||||
|
self,
|
||||||
|
coro: Callable[[], Any],
|
||||||
|
) -> Any:
|
||||||
|
"""Execute a request with rate limiting and retry."""
|
||||||
|
async with self.semaphore:
|
||||||
|
await self.rate_limiter.acquire()
|
||||||
|
self.stats["requests"] += 1
|
||||||
|
try:
|
||||||
|
result = await coro()
|
||||||
|
self.stats["success"] += 1
|
||||||
|
return result
|
||||||
|
except Exception as e:
|
||||||
|
self.stats["errors"] += 1
|
||||||
|
self.logger.error(f"Request failed: {e}")
|
||||||
|
raise
|
||||||
|
|
||||||
|
async def batch_requests(
|
||||||
|
self,
|
||||||
|
requests: list[Callable[[], Any]],
|
||||||
|
desc: str = "Processing",
|
||||||
|
) -> list[Any]:
|
||||||
|
"""Execute multiple requests concurrently."""
|
||||||
|
try:
|
||||||
|
from tqdm.asyncio import tqdm
|
||||||
|
has_tqdm = True
|
||||||
|
except ImportError:
|
||||||
|
has_tqdm = False
|
||||||
|
|
||||||
|
async def execute(req: Callable) -> Any:
|
||||||
|
try:
|
||||||
|
return await self._rate_limited_request(req)
|
||||||
|
except Exception as e:
|
||||||
|
return {"error": str(e)}
|
||||||
|
|
||||||
|
tasks = [execute(req) for req in requests]
|
||||||
|
|
||||||
|
if has_tqdm:
|
||||||
|
results = []
|
||||||
|
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
|
||||||
|
result = await coro
|
||||||
|
results.append(result)
|
||||||
|
return results
|
||||||
|
else:
|
||||||
|
return await asyncio.gather(*tasks, return_exceptions=True)
|
||||||
|
|
||||||
|
def print_stats(self) -> None:
|
||||||
|
"""Print request statistics."""
|
||||||
|
self.logger.info("=" * 40)
|
||||||
|
self.logger.info("Request Statistics:")
|
||||||
|
self.logger.info(f" Total Requests: {self.stats['requests']}")
|
||||||
|
self.logger.info(f" Successful: {self.stats['success']}")
|
||||||
|
self.logger.info(f" Errors: {self.stats['errors']}")
|
||||||
|
self.logger.info("=" * 40)
|
||||||
|
|
||||||
|
|
||||||
|
class ConfigManager:
|
||||||
|
"""Manage API configuration and credentials."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
@property
|
||||||
|
def google_credentials_path(self) -> str | None:
|
||||||
|
"""Get Google service account credentials path."""
|
||||||
|
# Prefer SEO-specific credentials, fallback to general credentials
|
||||||
|
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
|
||||||
|
if os.path.exists(seo_creds):
|
||||||
|
return seo_creds
|
||||||
|
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def pagespeed_api_key(self) -> str | None:
|
||||||
|
"""Get PageSpeed Insights API key."""
|
||||||
|
return os.getenv("PAGESPEED_API_KEY")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def custom_search_api_key(self) -> str | None:
|
||||||
|
"""Get Custom Search API key."""
|
||||||
|
return os.getenv("CUSTOM_SEARCH_API_KEY")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def custom_search_engine_id(self) -> str | None:
|
||||||
|
"""Get Custom Search Engine ID."""
|
||||||
|
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def notion_token(self) -> str | None:
|
||||||
|
"""Get Notion API token."""
|
||||||
|
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
|
||||||
|
|
||||||
|
def validate_google_credentials(self) -> bool:
|
||||||
|
"""Validate Google credentials are configured."""
|
||||||
|
creds_path = self.google_credentials_path
|
||||||
|
if not creds_path:
|
||||||
|
return False
|
||||||
|
return os.path.exists(creds_path)
|
||||||
|
|
||||||
|
def get_required(self, key: str) -> str:
|
||||||
|
"""Get required environment variable or raise error."""
|
||||||
|
value = os.getenv(key)
|
||||||
|
if not value:
|
||||||
|
raise ValueError(f"Missing required environment variable: {key}")
|
||||||
|
return value
|
||||||
|
|
||||||
|
|
||||||
|
# Singleton config instance
|
||||||
|
config = ConfigManager()
|
||||||
@@ -0,0 +1,716 @@
"""
|
||||||
|
Content Auditor - SEO Content Inventory & Performance Analysis
|
||||||
|
==============================================================
|
||||||
|
Purpose: Build content inventory, score performance, detect decay,
|
||||||
|
classify content types, and analyze Korean content patterns.
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from dataclasses import asdict, dataclass, field
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
from typing import Any
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
import requests
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Data classes
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ContentPage:
|
||||||
|
"""Single content page with performance metrics."""
|
||||||
|
url: str
|
||||||
|
title: str = ""
|
||||||
|
content_type: str = "other"
|
||||||
|
word_count: int = 0
|
||||||
|
traffic: int = 0
|
||||||
|
keywords_count: int = 0
|
||||||
|
backlinks: int = 0
|
||||||
|
performance_score: float = 0.0
|
||||||
|
last_modified: str = ""
|
||||||
|
is_decaying: bool = False
|
||||||
|
decay_rate: float = 0.0
|
||||||
|
korean_pattern: str = ""
|
||||||
|
topics: list[str] = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ContentInventory:
|
||||||
|
"""Aggregated content inventory summary."""
|
||||||
|
total_pages: int = 0
|
||||||
|
by_type: dict[str, int] = field(default_factory=dict)
|
||||||
|
avg_performance_score: float = 0.0
|
||||||
|
avg_word_count: float = 0.0
|
||||||
|
pages: list[ContentPage] = field(default_factory=list)
|
||||||
|
freshness_distribution: dict[str, int] = field(default_factory=dict)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ContentAuditResult:
|
||||||
|
"""Full content audit result."""
|
||||||
|
url: str
|
||||||
|
timestamp: str = ""
|
||||||
|
content_inventory: ContentInventory = field(default_factory=ContentInventory)
|
||||||
|
top_performers: list[ContentPage] = field(default_factory=list)
|
||||||
|
decaying_content: list[ContentPage] = field(default_factory=list)
|
||||||
|
korean_content_analysis: dict[str, Any] = field(default_factory=dict)
|
||||||
|
recommendations: list[str] = field(default_factory=list)
|
||||||
|
errors: list[str] = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# URL pattern rules for content type classification
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
CONTENT_TYPE_PATTERNS = {
|
||||||
|
"blog": [
|
||||||
|
r"/blog/", r"/post/", r"/posts/", r"/article/", r"/articles/",
|
||||||
|
r"/news/", r"/magazine/", r"/stories/", r"/insights/",
|
||||||
|
r"/블로그/", r"/소식/", r"/뉴스/",
|
||||||
|
],
|
||||||
|
"product": [
|
||||||
|
r"/product/", r"/products/", r"/shop/", r"/store/",
|
||||||
|
r"/item/", r"/goods/", r"/catalog/",
|
||||||
|
r"/제품/", r"/상품/", r"/쇼핑/",
|
||||||
|
],
|
||||||
|
"service": [
|
||||||
|
r"/service/", r"/services/", r"/solutions/", r"/offering/",
|
||||||
|
r"/진료/", r"/서비스/", r"/시술/", r"/치료/",
|
||||||
|
],
|
||||||
|
"landing": [
|
||||||
|
r"/lp/", r"/landing/", r"/campaign/", r"/promo/",
|
||||||
|
r"/event/", r"/이벤트/", r"/프로모션/",
|
||||||
|
],
|
||||||
|
"resource": [
|
||||||
|
r"/resource/", r"/resources/", r"/guide/", r"/guides/",
|
||||||
|
r"/whitepaper/", r"/ebook/", r"/download/", r"/faq/",
|
||||||
|
r"/help/", r"/support/", r"/가이드/", r"/자료/",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
KOREAN_CONTENT_PATTERNS = {
|
||||||
|
"naver_blog_style": [
|
||||||
|
r"후기", r"리뷰", r"체험", r"솔직후기", r"방문후기",
|
||||||
|
r"사용후기", r"이용후기",
|
||||||
|
],
|
||||||
|
"listicle": [
|
||||||
|
r"추천", r"베스트", r"TOP\s*\d+", r"\d+선", r"\d+가지",
|
||||||
|
r"모음", r"정리", r"비교",
|
||||||
|
],
|
||||||
|
"how_to": [
|
||||||
|
r"방법", r"하는\s*법", r"하는\s*방법", r"가이드",
|
||||||
|
r"따라하기", r"시작하기", r"알아보기",
|
||||||
|
],
|
||||||
|
"informational": [
|
||||||
|
r"이란", r"뜻", r"의미", r"차이", r"비교",
|
||||||
|
r"장단점", r"효과", r"부작용", r"비용", r"가격",
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# ContentAuditor
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
class ContentAuditor(BaseAsyncClient):
|
||||||
|
"""Content auditor using Ahrefs API and sitemap crawling."""
|
||||||
|
|
||||||
|
def __init__(self, max_concurrent: int = 5, requests_per_second: float = 2.0):
|
||||||
|
super().__init__(max_concurrent=max_concurrent, requests_per_second=requests_per_second)
|
||||||
|
self.session: aiohttp.ClientSession | None = None
|
||||||
|
|
||||||
|
async def _ensure_session(self) -> aiohttp.ClientSession:
|
||||||
|
if self.session is None or self.session.closed:
|
||||||
|
timeout = aiohttp.ClientTimeout(total=30)
|
||||||
|
self.session = aiohttp.ClientSession(timeout=timeout)
|
||||||
|
return self.session
|
||||||
|
|
||||||
|
async def close(self) -> None:
|
||||||
|
if self.session and not self.session.closed:
|
||||||
|
await self.session.close()
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Ahrefs data retrieval
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
async def get_top_pages(self, url: str, limit: int = 100) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Retrieve top pages via Ahrefs site-explorer-top-pages.
|
||||||
|
|
||||||
|
Returns list of dicts with keys: url, traffic, keywords, value, top_keyword.
|
||||||
|
"""
|
||||||
|
self.logger.info(f"Fetching top pages from Ahrefs for {url}")
|
||||||
|
target = urlparse(url).netloc or url
|
||||||
|
try:
|
||||||
|
# Ahrefs MCP call: site-explorer-top-pages
|
||||||
|
# In MCP context this would be called by the agent.
|
||||||
|
# Standalone fallback: use REST API if AHREFS_API_KEY is set.
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
self.logger.warning("AHREFS_API_KEY not set; returning empty top pages")
|
||||||
|
return []
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/top-pages",
|
||||||
|
params={"target": target, "limit": limit, "select": "url,traffic,keywords,value,top_keyword"},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
pages = data.get("pages", data.get("items", []))
|
||||||
|
self.logger.info(f"Retrieved {len(pages)} top pages")
|
||||||
|
return pages
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Ahrefs top-pages lookup failed: {exc}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def get_pages_by_traffic(self, url: str, limit: int = 100) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Retrieve pages sorted by organic traffic via Ahrefs site-explorer-pages-by-traffic.
|
||||||
|
|
||||||
|
Returns list of dicts with keys: url, traffic, keywords, top_keyword.
|
||||||
|
"""
|
||||||
|
self.logger.info(f"Fetching pages-by-traffic from Ahrefs for {url}")
|
||||||
|
target = urlparse(url).netloc or url
|
||||||
|
try:
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
self.logger.warning("AHREFS_API_KEY not set; returning empty traffic pages")
|
||||||
|
return []
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/pages-by-traffic",
|
||||||
|
params={"target": target, "limit": limit, "select": "url,traffic,keywords,top_keyword"},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
pages = data.get("pages", data.get("items", []))
|
||||||
|
self.logger.info(f"Retrieved {len(pages)} pages by traffic")
|
||||||
|
return pages
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Ahrefs pages-by-traffic lookup failed: {exc}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
    # ------------------------------------------------------------------
    # Sitemap crawling
    # ------------------------------------------------------------------

    async def crawl_sitemap(self, url: str) -> list[str]:
        """Discover URLs from sitemap.xml."""
        sitemap_urls_to_try = [
            f"{url.rstrip('/')}/sitemap.xml",
            f"{url.rstrip('/')}/sitemap_index.xml",
            f"{url.rstrip('/')}/post-sitemap.xml",
        ]
        discovered: list[str] = []
        session = await self._ensure_session()

        for sitemap_url in sitemap_urls_to_try:
            try:
                async with session.get(sitemap_url) as resp:
                    if resp.status != 200:
                        continue
                    text = await resp.text()
                soup = BeautifulSoup(text, "lxml-xml")

                # Sitemap index
                sitemaps = soup.find_all("sitemap")
                if sitemaps:
                    for sm in sitemaps:
                        loc = sm.find("loc")
                        if loc:
                            child_urls = await self._parse_sitemap(session, loc.text.strip())
                            discovered.extend(child_urls)
                else:
                    urls = soup.find_all("url")
                    for u in urls:
                        loc = u.find("loc")
                        if loc:
                            discovered.append(loc.text.strip())

                if discovered:
                    self.logger.info(f"Discovered {len(discovered)} URLs from {sitemap_url}")
                    break
            except Exception as exc:
                self.logger.debug(f"Failed to fetch {sitemap_url}: {exc}")

        return list(set(discovered))

    async def _parse_sitemap(self, session: aiohttp.ClientSession, sitemap_url: str) -> list[str]:
        """Parse a single sitemap XML and return URLs."""
        urls: list[str] = []
        try:
            async with session.get(sitemap_url) as resp:
                if resp.status != 200:
                    return urls
                text = await resp.text()
            soup = BeautifulSoup(text, "lxml-xml")
            for u in soup.find_all("url"):
                loc = u.find("loc")
                if loc:
                    urls.append(loc.text.strip())
        except Exception as exc:
            self.logger.debug(f"Failed to parse sitemap {sitemap_url}: {exc}")
        return urls

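Both sitemap helpers extract `<loc>` values with BeautifulSoup's `lxml-xml` parser. For readers without lxml installed, the same extraction can be sketched with the standard library; the sample XML and URLs below are illustrative only:

```python
import xml.etree.ElementTree as ET

SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
</urlset>"""

# Sitemap files are namespaced, so findall needs the namespace map.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(SITEMAP)
urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
print(urls)  # ['https://example.com/', 'https://example.com/blog/post-1']
```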
    # ------------------------------------------------------------------
    # Content type classification
    # ------------------------------------------------------------------

    @staticmethod
    def classify_content_type(url: str, title: str = "") -> str:
        """
        Classify content type based on URL path patterns and title.

        Returns one of: blog, product, service, landing, resource, other.
        """
        combined = f"{url.lower()} {title.lower()}"
        scores: dict[str, int] = {}

        for ctype, patterns in CONTENT_TYPE_PATTERNS.items():
            score = 0
            for pattern in patterns:
                if re.search(pattern, combined, re.IGNORECASE):
                    score += 1
            if score > 0:
                scores[ctype] = score

        if not scores:
            return "other"
        return max(scores, key=scores.get)

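`classify_content_type` depends on the module-level `CONTENT_TYPE_PATTERNS` table defined earlier in the file. A minimal standalone sketch of the same scoring loop, using a hypothetical two-type pattern table:

```python
import re

# Hypothetical patterns for illustration; the module's real table is CONTENT_TYPE_PATTERNS.
SAMPLE_PATTERNS = {
    "blog": [r"/blog/", r"/post/", r"guide"],
    "product": [r"/product/", r"/shop/", r"\bbuy\b"],
}

def classify(url: str, title: str = "") -> str:
    combined = f"{url.lower()} {title.lower()}"
    # Count how many patterns of each type match, keep only non-zero scores.
    scores = {
        ctype: sum(1 for p in patterns if re.search(p, combined, re.IGNORECASE))
        for ctype, patterns in SAMPLE_PATTERNS.items()
    }
    scores = {k: v for k, v in scores.items() if v > 0}
    return max(scores, key=scores.get) if scores else "other"

print(classify("https://example.com/blog/seo-guide"))  # blog
print(classify("https://example.com/about"))           # other
```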
    # ------------------------------------------------------------------
    # Performance scoring
    # ------------------------------------------------------------------

    @staticmethod
    def score_performance(page: ContentPage) -> float:
        """
        Compute composite performance score (0-100) from traffic, keywords, backlinks.

        Weights:
        - Traffic: 50% (log-scaled, 10k+ traffic = max)
        - Keywords count: 30% (log-scaled, 500+ = max)
        - Backlinks: 20% (log-scaled, 100+ = max)
        """
        import math

        traffic_score = min(100, (math.log10(max(page.traffic, 1)) / math.log10(10000)) * 100)
        keywords_score = min(100, (math.log10(max(page.keywords_count, 1)) / math.log10(500)) * 100)
        backlinks_score = min(100, (math.log10(max(page.backlinks, 1)) / math.log10(100)) * 100)

        composite = (traffic_score * 0.50) + (keywords_score * 0.30) + (backlinks_score * 0.20)
        return round(min(100, max(0, composite)), 1)

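A worked example of the log-scaled weighting in `score_performance`: 1,000 traffic against the 10k cap gives log10(1000)/log10(10000) = 0.75 of the maximum.

```python
import math

def log_scaled(value: int, cap: int) -> float:
    # Mirrors one term of score_performance: log-scaled, capped at 100.
    return min(100.0, (math.log10(max(value, 1)) / math.log10(cap)) * 100)

traffic_score = log_scaled(1000, 10_000)  # ~75.0
keywords_score = log_scaled(500, 500)     # 100.0 (exactly at the cap)
backlinks_score = log_scaled(1, 100)      # 0.0 (log10(1) = 0)

composite = traffic_score * 0.50 + keywords_score * 0.30 + backlinks_score * 0.20
print(round(composite, 1))  # 67.5
```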
    # ------------------------------------------------------------------
    # Content decay detection
    # ------------------------------------------------------------------

    @staticmethod
    def detect_decay(pages: list[ContentPage], threshold: float = -20.0) -> list[ContentPage]:
        """
        Flag pages with declining traffic trend.

        Uses a simple heuristic: pages with low performance score relative to
        their keyword count indicate potential decay. In production, historical
        traffic data from Ahrefs metrics-history would be used.

        Args:
            pages: List of content pages with metrics.
            threshold: Decay rate threshold (percentage decline).

        Returns:
            List of pages flagged as decaying.
        """
        decaying: list[ContentPage] = []
        for page in pages:
            # Heuristic: high keyword count but low traffic suggests decay
            if page.keywords_count > 10 and page.traffic < 50:
                page.is_decaying = True
                page.decay_rate = -50.0 if page.traffic == 0 else round(
                    -((page.keywords_count * 10 - page.traffic) / max(page.keywords_count * 10, 1)) * 100, 1
                )
                if page.decay_rate <= threshold:
                    decaying.append(page)
            elif page.performance_score < 20 and page.keywords_count > 5:
                page.is_decaying = True
                page.decay_rate = round(-max(30, 100 - page.performance_score * 2), 1)
                if page.decay_rate <= threshold:
                    decaying.append(page)

        decaying.sort(key=lambda p: p.decay_rate)
        return decaying

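The first branch of `detect_decay` treats roughly 10x the keyword count as the traffic a page "should" earn, and turns the shortfall into a negative decay rate. The formula in isolation:

```python
def decay_rate(keywords_count: int, traffic: int) -> float:
    # Same shape as detect_decay's first branch: shortfall vs ~10x keyword count.
    if traffic == 0:
        return -50.0
    expected = keywords_count * 10
    return round(-((expected - traffic) / max(expected, 1)) * 100, 1)

print(decay_rate(20, 10))  # -95.0  (expected 200, actual 10)
print(decay_rate(15, 0))   # -50.0  (zero-traffic floor)
```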
    # ------------------------------------------------------------------
    # Freshness assessment
    # ------------------------------------------------------------------

    @staticmethod
    def analyze_freshness(pages: list[ContentPage]) -> dict[str, int]:
        """
        Categorize pages by freshness based on last_modified dates.

        Returns distribution: fresh (< 3 months), aging (3-12 months),
        stale (> 12 months), unknown (no date).
        """
        now = datetime.now()
        distribution = {"fresh": 0, "aging": 0, "stale": 0, "unknown": 0}

        for page in pages:
            if not page.last_modified:
                distribution["unknown"] += 1
                continue
            try:
                # Normalize timezone suffix, then try common date formats
                raw = page.last_modified.replace("+00:00", "").replace("Z", "")
                for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
                    try:
                        modified = datetime.strptime(raw, fmt)
                        break
                    except ValueError:
                        continue
                else:
                    distribution["unknown"] += 1
                    continue

                age = now - modified
                if age < timedelta(days=90):
                    distribution["fresh"] += 1
                elif age < timedelta(days=365):
                    distribution["aging"] += 1
                else:
                    distribution["stale"] += 1
            except Exception:
                distribution["unknown"] += 1

        return distribution

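The bucketing in `analyze_freshness` reduces to a three-way age comparison. A minimal sketch with fixed, illustrative dates:

```python
from datetime import datetime, timedelta

def freshness_bucket(last_modified: datetime, now: datetime) -> str:
    # Same thresholds as analyze_freshness: 90 days, then 365 days.
    age = now - last_modified
    if age < timedelta(days=90):
        return "fresh"
    if age < timedelta(days=365):
        return "aging"
    return "stale"

now = datetime(2025, 6, 1)
print(freshness_bucket(datetime(2025, 5, 1), now))  # fresh
print(freshness_bucket(datetime(2024, 9, 1), now))  # aging
print(freshness_bucket(datetime(2023, 1, 1), now))  # stale
```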
    # ------------------------------------------------------------------
    # Korean content pattern identification
    # ------------------------------------------------------------------

    @staticmethod
    def identify_korean_patterns(pages: list[ContentPage]) -> dict[str, Any]:
        """
        Detect Korean content patterns across pages.

        Identifies Naver Blog style review content, listicles,
        how-to guides, and informational content patterns.

        Returns summary with counts and example URLs per pattern.
        """
        results: dict[str, Any] = {
            "total_korean_content": 0,
            "patterns": {},
        }

        korean_urls: set[str] = set()
        for pattern_name, keywords in KOREAN_CONTENT_PATTERNS.items():
            matches: list[dict[str, str]] = []
            for page in pages:
                combined = f"{page.url} {page.title}"
                for keyword in keywords:
                    if re.search(keyword, combined, re.IGNORECASE):
                        matches.append({"url": page.url, "title": page.title, "matched_keyword": keyword})
                        korean_urls.add(page.url)
                        break

            results["patterns"][pattern_name] = {
                "count": len(matches),
                "examples": matches[:5],
            }

        # Count all matched URLs, not just the truncated per-pattern examples
        results["total_korean_content"] = len(korean_urls)

        return results

    # ------------------------------------------------------------------
    # Orchestration
    # ------------------------------------------------------------------

    async def audit(
        self,
        url: str,
        detect_decay_flag: bool = False,
        content_type_filter: str | None = None,
        limit: int = 200,
    ) -> ContentAuditResult:
        """
        Run full content audit: inventory, scoring, decay, Korean patterns.

        Args:
            url: Target website URL.
            detect_decay_flag: Whether to run decay detection.
            content_type_filter: Filter by content type (blog, product, etc.).
            limit: Maximum pages to analyze.

        Returns:
            ContentAuditResult with inventory, top performers, decay, analysis.
        """
        result = ContentAuditResult(
            url=url,
            timestamp=datetime.now().isoformat(),
        )

        self.logger.info(f"Starting content audit for {url}")

        # 1. Gather pages from Ahrefs and sitemap
        top_pages_data, traffic_pages_data, sitemap_urls = await asyncio.gather(
            self.get_top_pages(url, limit=limit),
            self.get_pages_by_traffic(url, limit=limit),
            self.crawl_sitemap(url),
        )

        # 2. Merge and deduplicate pages
        page_map: dict[str, ContentPage] = {}

        for item in top_pages_data:
            page_url = item.get("url", "")
            if not page_url:
                continue
            page_map[page_url] = ContentPage(
                url=page_url,
                title=item.get("top_keyword", ""),
                traffic=int(item.get("traffic", 0)),
                keywords_count=int(item.get("keywords", 0)),
                backlinks=int(item.get("value", 0)),
            )

        for item in traffic_pages_data:
            page_url = item.get("url", "")
            if not page_url:
                continue
            if page_url in page_map:
                existing = page_map[page_url]
                existing.traffic = max(existing.traffic, int(item.get("traffic", 0)))
                existing.keywords_count = max(existing.keywords_count, int(item.get("keywords", 0)))
            else:
                page_map[page_url] = ContentPage(
                    url=page_url,
                    title=item.get("top_keyword", ""),
                    traffic=int(item.get("traffic", 0)),
                    keywords_count=int(item.get("keywords", 0)),
                )

        # Add sitemap URLs not already present
        for s_url in sitemap_urls:
            if s_url not in page_map:
                page_map[s_url] = ContentPage(url=s_url)

        # 3. Classify and score
        all_pages: list[ContentPage] = []
        for page in page_map.values():
            page.content_type = self.classify_content_type(page.url, page.title)
            page.performance_score = self.score_performance(page)
            all_pages.append(page)

        # 4. Filter by content type if requested
        if content_type_filter:
            all_pages = [p for p in all_pages if p.content_type == content_type_filter]

        # 5. Build inventory
        by_type: dict[str, int] = {}
        for page in all_pages:
            by_type[page.content_type] = by_type.get(page.content_type, 0) + 1

        avg_score = (
            sum(p.performance_score for p in all_pages) / len(all_pages)
            if all_pages else 0.0
        )
        avg_word_count = (
            sum(p.word_count for p in all_pages) / len(all_pages)
            if all_pages else 0.0
        )

        freshness = self.analyze_freshness(all_pages)

        result.content_inventory = ContentInventory(
            total_pages=len(all_pages),
            by_type=by_type,
            avg_performance_score=round(avg_score, 1),
            avg_word_count=round(avg_word_count, 1),
            pages=sorted(all_pages, key=lambda p: p.performance_score, reverse=True)[:limit],
            freshness_distribution=freshness,
        )

        # 6. Top performers
        result.top_performers = sorted(all_pages, key=lambda p: p.performance_score, reverse=True)[:20]

        # 7. Decay detection
        if detect_decay_flag:
            result.decaying_content = self.detect_decay(all_pages)

        # 8. Korean content analysis
        result.korean_content_analysis = self.identify_korean_patterns(all_pages)

        # 9. Recommendations
        result.recommendations = self._generate_recommendations(result)

        self.logger.info(
            f"Audit complete: {len(all_pages)} pages, "
            f"{len(result.top_performers)} top performers, "
            f"{len(result.decaying_content)} decaying"
        )

        return result

    @staticmethod
    def _generate_recommendations(result: ContentAuditResult) -> list[str]:
        """Generate actionable recommendations from audit data."""
        recs: list[str] = []
        inv = result.content_inventory

        # Low average score
        if inv.avg_performance_score < 30:
            recs.append(
                "전체 콘텐츠 평균 성과 점수가 낮습니다 ({:.0f}/100). "
                "상위 콘텐츠 패턴을 분석하여 저성과 페이지를 개선하세요.".format(inv.avg_performance_score)
            )

        # Stale content
        stale = inv.freshness_distribution.get("stale", 0)
        total = inv.total_pages or 1
        if stale / total > 0.3:
            recs.append(
                f"오래된 콘텐츠가 {stale}개 ({stale * 100 // total}%)입니다. "
                "콘텐츠 업데이트 또는 통합을 고려하세요."
            )

        # Decaying content
        if len(result.decaying_content) > 5:
            recs.append(
                f"트래픽이 감소하는 콘텐츠가 {len(result.decaying_content)}개 감지되었습니다. "
                "상위 감소 페이지부터 콘텐츠 리프레시를 진행하세요."
            )

        # Content type balance
        blog_count = inv.by_type.get("blog", 0)
        if blog_count == 0:
            recs.append(
                "블로그 콘텐츠가 없습니다. SEO 트래픽 확보를 위해 "
                "블로그 콘텐츠 전략을 수립하세요."
            )

        # Korean content opportunities
        korean = result.korean_content_analysis
        review_count = korean.get("patterns", {}).get("naver_blog_style", {}).get("count", 0)
        if review_count == 0:
            recs.append(
                "후기/리뷰 콘텐츠가 없습니다. 한국 시장에서 후기 콘텐츠는 "
                "전환율에 큰 영향을 미치므로 후기 콘텐츠 생성을 권장합니다."
            )

        if not recs:
            recs.append("현재 콘텐츠 전략이 양호합니다. 지속적인 모니터링을 권장합니다.")

        return recs


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="SEO Content Auditor - inventory, scoring, and decay detection",
    )
    parser.add_argument("--url", required=True, help="Target website URL")
    parser.add_argument("--decay", action="store_true", help="Enable content decay detection")
    parser.add_argument("--type", dest="content_type", help="Filter by content type (blog, product, service, landing, resource)")
    parser.add_argument("--limit", type=int, default=200, help="Maximum pages to analyze (default: 200)")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", help="Save output to file")
    return parser


def format_text_report(result: ContentAuditResult) -> str:
    """Format audit result as human-readable text."""
    lines: list[str] = []
    lines.append(f"## Content Audit: {result.url}")
    lines.append(f"**Date**: {result.timestamp[:10]}")
    lines.append("")

    inv = result.content_inventory
    lines.append("### Content Inventory")
    lines.append(f"- Total pages: {inv.total_pages}")
    lines.append(f"- Average performance score: {inv.avg_performance_score}/100")
    lines.append(f"- Content types: {json.dumps(inv.by_type, ensure_ascii=False)}")
    lines.append(f"- Freshness: {json.dumps(inv.freshness_distribution, ensure_ascii=False)}")
    lines.append("")

    lines.append("### Top Performers")
    for i, page in enumerate(result.top_performers[:10], 1):
        lines.append(f"  {i}. [{page.performance_score:.0f}] {page.url} (traffic: {page.traffic})")
    lines.append("")

    if result.decaying_content:
        lines.append("### Decaying Content")
        for i, page in enumerate(result.decaying_content[:10], 1):
            lines.append(f"  {i}. [{page.decay_rate:+.0f}%] {page.url} (traffic: {page.traffic})")
        lines.append("")

    if result.korean_content_analysis.get("patterns"):
        lines.append("### Korean Content Patterns")
        for pattern_name, data in result.korean_content_analysis["patterns"].items():
            lines.append(f"  - {pattern_name}: {data['count']} pages")
        lines.append("")

    lines.append("### Recommendations")
    for i, rec in enumerate(result.recommendations, 1):
        lines.append(f"  {i}. {rec}")

    return "\n".join(lines)


async def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    auditor = ContentAuditor()
    try:
        result = await auditor.audit(
            url=args.url,
            detect_decay_flag=args.decay,
            content_type_filter=args.content_type,
            limit=args.limit,
        )

        if args.json:
            output = json.dumps(asdict(result), ensure_ascii=False, indent=2, default=str)
        else:
            output = format_text_report(result)

        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            logger.info(f"Output saved to {args.output}")
        else:
            print(output)

    finally:
        await auditor.close()
        auditor.print_stats()


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,738 @@
"""
Content Brief Generator - SEO Content Brief Creation
=====================================================
Purpose: Generate detailed SEO content briefs with outlines,
keyword lists, word count targets, and internal linking suggestions.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import math
import re
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urlparse

import aiohttp
import requests
from bs4 import BeautifulSoup

from base_client import BaseAsyncClient, config

logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

@dataclass
class OutlineSection:
    """A single heading section in the content outline."""
    heading: str
    level: int = 2  # H2 or H3
    talking_points: list[str] = field(default_factory=list)
    target_words: int = 200
    keywords_to_include: list[str] = field(default_factory=list)


@dataclass
class CompetitorPageAnalysis:
    """Analysis of a single competitor page for the target keyword."""
    url: str
    title: str = ""
    word_count: int = 0
    headings: list[dict[str, str]] = field(default_factory=list)
    topics_covered: list[str] = field(default_factory=list)
    content_type: str = ""
    has_images: bool = False
    has_video: bool = False
    has_faq: bool = False
    has_table: bool = False


@dataclass
class ContentBrief:
    """Complete SEO content brief."""
    primary_keyword: str
    secondary_keywords: list[str] = field(default_factory=list)
    lsi_keywords: list[str] = field(default_factory=list)
    target_word_count: int = 1500
    word_count_range: tuple[int, int] = (1200, 1800)
    suggested_title: str = ""
    meta_description: str = ""
    outline: list[OutlineSection] = field(default_factory=list)
    competitor_analysis: list[CompetitorPageAnalysis] = field(default_factory=list)
    internal_links: list[dict[str, str]] = field(default_factory=list)
    content_format: str = "blog"
    korean_format_recommendations: list[str] = field(default_factory=list)
    search_intent: str = "informational"
    notes: list[str] = field(default_factory=list)
    timestamp: str = ""


# ---------------------------------------------------------------------------
# Search intent patterns
# ---------------------------------------------------------------------------

INTENT_PATTERNS = {
    "transactional": [
        r"buy", r"purchase", r"price", r"cost", r"order", r"shop",
        r"구매", r"주문", r"가격", r"비용", r"할인", r"쿠폰",
    ],
    "navigational": [
        r"login", r"sign in", r"official", r"website",
        r"로그인", r"공식", r"홈페이지",
    ],
    "commercial": [
        r"best", r"top", r"review", r"compare", r"vs",
        r"추천", r"비교", r"후기", r"리뷰", r"순위",
    ],
    "informational": [
        r"what", r"how", r"why", r"guide", r"tutorial",
        r"이란", r"방법", r"가이드", r"효과", r"원인",
    ],
}

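A table shaped like INTENT_PATTERNS can be applied as a first-match lookup; this sketch uses a reduced copy of the table, and the module's own resolver (defined later in the file) may weigh matches differently:

```python
import re

# Reduced, illustrative copy of INTENT_PATTERNS; first bucket with any match wins.
PATTERNS = {
    "transactional": [r"buy", r"price", r"구매", r"가격"],
    "commercial": [r"best", r"review", r"추천", r"후기"],
    "informational": [r"what", r"how", r"방법", r"가이드"],
}

def classify_intent(keyword: str) -> str:
    for intent, patterns in PATTERNS.items():
        if any(re.search(p, keyword, re.IGNORECASE) for p in patterns):
            return intent
    return "informational"  # default when nothing matches

print(classify_intent("best running shoes"))  # commercial
print(classify_intent("강남 피부과 가격"))  # transactional
```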
# ---------------------------------------------------------------------------
# Korean content format recommendations
# ---------------------------------------------------------------------------

KOREAN_FORMAT_TIPS = {
    "transactional": [
        "가격 비교표를 포함하세요 (경쟁사 가격 대비)",
        "실제 비용 사례를 3개 이상 제시하세요",
        "결제 방법 및 할인 정보를 명확히 안내하세요",
        "CTA(행동 유도) 버튼을 여러 위치에 배치하세요",
    ],
    "commercial": [
        "네이버 블로그 스타일의 솔직한 후기 톤을 사용하세요",
        "장단점을 균형 있게 비교하세요",
        "실제 사용 사진 또는 전후 비교 이미지를 포함하세요",
        "별점 또는 점수 평가 체계를 추가하세요",
        "FAQ 섹션을 포함하세요 (네이버 검색 노출에 유리)",
    ],
    "informational": [
        "핵심 정보를 글 상단에 요약하세요 (두괄식 구성)",
        "전문 용어는 쉬운 설명을 병기하세요",
        "인포그래픽 또는 도표를 활용하세요",
        "관련 콘텐츠 내부 링크를 3-5개 포함하세요",
        "전문가 인용 또는 출처를 명시하세요 (E-E-A-T 강화)",
    ],
    "navigational": [
        "공식 정보와 연락처를 최상단에 배치하세요",
        "지도 임베드를 포함하세요 (네이버 지도/구글 맵)",
        "영업시간, 주소, 전화번호를 명확히 표시하세요",
    ],
}


# ---------------------------------------------------------------------------
# ContentBriefGenerator
# ---------------------------------------------------------------------------

class ContentBriefGenerator(BaseAsyncClient):
    """Generate comprehensive SEO content briefs."""

    def __init__(self, max_concurrent: int = 5, requests_per_second: float = 2.0):
        super().__init__(max_concurrent=max_concurrent, requests_per_second=requests_per_second)
        self.session: aiohttp.ClientSession | None = None

    async def _ensure_session(self) -> aiohttp.ClientSession:
        if self.session is None or self.session.closed:
            timeout = aiohttp.ClientTimeout(total=30)
            headers = {
                "User-Agent": "Mozilla/5.0 (compatible; SEOContentBrief/1.0)",
            }
            self.session = aiohttp.ClientSession(timeout=timeout, headers=headers)
        return self.session

    async def close(self) -> None:
        if self.session and not self.session.closed:
            await self.session.close()

    # ------------------------------------------------------------------
    # Analyze top ranking results
    # ------------------------------------------------------------------

    async def analyze_top_results(
        self,
        keyword: str,
        site_url: str | None = None,
        num_competitors: int = 5,
    ) -> list[CompetitorPageAnalysis]:
        """
        Analyze top ranking pages for a keyword using Ahrefs SERP data.

        Each ranking URL is then fetched directly for on-page analysis;
        if Ahrefs data is unavailable, an empty list is returned.
        """
        self.logger.info(f"Analyzing top results for: {keyword}")
        results: list[CompetitorPageAnalysis] = []

        # Try Ahrefs organic keywords to find ranking pages
        try:
            api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
            if api_key:
                resp = requests.get(
                    "https://api.ahrefs.com/v3/serp-overview",
                    params={"keyword": keyword, "select": "url,title,position,traffic"},
                    headers={"Authorization": f"Bearer {api_key}"},
                    timeout=30,
                )
                if resp.status_code == 200:
                    data = resp.json()
                    serp_items = data.get("positions", data.get("items", []))[:num_competitors]
                    for item in serp_items:
                        analysis = CompetitorPageAnalysis(
                            url=item.get("url", ""),
                            title=item.get("title", ""),
                        )
                        results.append(analysis)
        except Exception as exc:
            self.logger.warning(f"Ahrefs SERP lookup failed: {exc}")

        # Fetch and analyze each page
        session = await self._ensure_session()
        for analysis in results[:num_competitors]:
            if not analysis.url:
                continue
            try:
                async with session.get(analysis.url) as resp:
                    if resp.status != 200:
                        continue
                    html = await resp.text()
                self._analyze_page_content(analysis, html)
            except Exception as exc:
                self.logger.debug(f"Failed to fetch {analysis.url}: {exc}")

        self.logger.info(f"Analyzed {len(results)} competitor pages")
        return results

    @staticmethod
    def _analyze_page_content(analysis: CompetitorPageAnalysis, html: str) -> None:
        """Parse HTML and extract content metrics."""
        soup = BeautifulSoup(html, "html.parser")

        # Title
        title_tag = soup.find("title")
        if title_tag and not analysis.title:
            analysis.title = title_tag.get_text(strip=True)

        # Word count (visible text only)
        for tag in soup(["script", "style", "nav", "header", "footer"]):
            tag.decompose()
        visible_text = soup.get_text(separator=" ", strip=True)
        analysis.word_count = len(visible_text.split())

        # Headings
        headings: list[dict[str, str]] = []
        for level in range(1, 7):
            for h in soup.find_all(f"h{level}"):
                text = h.get_text(strip=True)
                if text:
                    headings.append({"level": f"H{level}", "text": text})
        analysis.headings = headings

        # Content features
        analysis.has_images = len(soup.find_all("img")) > 2
        analysis.has_video = bool(soup.find("video") or soup.find("iframe", src=re.compile(r"youtube|vimeo")))
        analysis.has_faq = bool(
            soup.find(string=re.compile(r"FAQ|자주\s*묻는\s*질문|Q\s*&\s*A", re.IGNORECASE))
            or soup.find("script", type="application/ld+json", string=re.compile(r"FAQPage"))
        )
        analysis.has_table = bool(soup.find("table"))

        # Topics covered (from H2 headings)
        analysis.topics_covered = [
            h["text"] for h in headings if h["level"] == "H2"
        ][:15]

    # ------------------------------------------------------------------
    # Extract content outline
    # ------------------------------------------------------------------

    def extract_outline(
        self,
        keyword: str,
        top_results: list[CompetitorPageAnalysis],
    ) -> list[OutlineSection]:
        """
        Build recommended H2/H3 outline by aggregating competitor headings.

        Identifies common topics across top-ranking pages and structures
        them into a logical outline.
        """
        # Collect all H2 headings
        h2_topics: dict[str, int] = {}
        h3_by_h2: dict[str, list[str]] = {}

        for result in top_results:
            current_h2 = ""
            for heading in result.headings:
                text = heading["text"].strip()
                if heading["level"] == "H2":
                    current_h2 = text
                    h2_topics[text] = h2_topics.get(text, 0) + 1
                elif heading["level"] == "H3" and current_h2:
                    if current_h2 not in h3_by_h2:
                        h3_by_h2[current_h2] = []
                    h3_by_h2[current_h2].append(text)

        # Sort H2s by frequency (most common topics first)
        sorted_h2s = sorted(h2_topics.items(), key=lambda x: x[1], reverse=True)

        # Build outline
        outline: list[OutlineSection] = []
        target_word_count = self.calculate_word_count(top_results)
        words_per_section = target_word_count // max(len(sorted_h2s), 5)

        for h2_text, frequency in sorted_h2s[:8]:
            section = OutlineSection(
                heading=h2_text,
                level=2,
                target_words=words_per_section,
                talking_points=[],
            )

            # Add H3 subtopics as talking points
            if h2_text in h3_by_h2:
                unique_h3s = list(dict.fromkeys(h3_by_h2[h2_text]))[:5]
                for h3_text in unique_h3s:
                    section.talking_points.append(h3_text)

            outline.append(section)

        # Ensure FAQ section if common
        faq_count = sum(1 for r in top_results if r.has_faq)
        if faq_count >= 2 and not any("FAQ" in s.heading or "질문" in s.heading for s in outline):
            outline.append(OutlineSection(
                heading="자주 묻는 질문 (FAQ)",
                level=2,
                target_words=300,
                talking_points=[
                    f"{keyword} 관련 자주 묻는 질문 5-7개",
                    "Schema markup (FAQPage) 적용 권장",
                ],
            ))

        return outline

    # ------------------------------------------------------------------
    # Keyword suggestions
    # ------------------------------------------------------------------

    async def suggest_keywords(self, primary_keyword: str) -> dict[str, list[str]]:
        """
        Generate primary, secondary, and LSI keyword suggestions.

        Uses Ahrefs related keywords and matching terms data.
        """
        self.logger.info(f"Generating keyword suggestions for: {primary_keyword}")
        result: dict[str, list[str]] = {
            "primary": [primary_keyword],
            "secondary": [],
            "lsi": [],
        }

        try:
            api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
            if not api_key:
                self.logger.warning("AHREFS_API_KEY not set; returning basic keywords only")
                return result

            # Matching terms
            resp = requests.get(
                "https://api.ahrefs.com/v3/keywords-explorer/matching-terms",
                params={"keyword": primary_keyword, "limit": 20, "select": "keyword,volume,difficulty"},
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=30,
            )
            if resp.status_code == 200:
                data = resp.json()
                terms = data.get("keywords", data.get("items", []))
                for term in terms:
                    kw = term.get("keyword", "")
                    if kw and kw.lower() != primary_keyword.lower():
                        result["secondary"].append(kw)

            # Related terms (LSI)
            resp2 = requests.get(
                "https://api.ahrefs.com/v3/keywords-explorer/related-terms",
                params={"keyword": primary_keyword, "limit": 15, "select": "keyword,volume"},
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=30,
            )
            if resp2.status_code == 200:
                data2 = resp2.json()
                related = data2.get("keywords", data2.get("items", []))
                for term in related:
                    kw = term.get("keyword", "")
                    if kw and kw not in result["secondary"]:
                        result["lsi"].append(kw)

        except Exception as exc:
            self.logger.warning(f"Keyword suggestion lookup failed: {exc}")

        return result

    # ------------------------------------------------------------------
    # Word count calculation
    # ------------------------------------------------------------------

    @staticmethod
    def calculate_word_count(top_results: list[CompetitorPageAnalysis]) -> int:
        """
        Calculate target word count based on the top 5 ranking pages.

        Returns the average word count of the top 5, rounded to the
        nearest 100 and clamped to the 800-5000 range.
        """
        word_counts = [r.word_count for r in top_results[:5] if r.word_count > 100]

        if not word_counts:
            return 1500  # Default fallback

        avg = sum(word_counts) / len(word_counts)
        # Round to nearest 100
        target = round(avg / 100) * 100
        return max(800, min(5000, target))

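The averaging rule above can be exercised in isolation. This is a minimal sketch of the same round-to-nearest-100-and-clamp logic as a free function; the function name is chosen here for illustration and is not part of the script:

```python
def target_word_count(word_counts: list[int]) -> int:
    """Average the top 5 non-thin pages, round to the nearest 100, clamp to 800-5000."""
    counts = [c for c in word_counts[:5] if c > 100]
    if not counts:
        return 1500  # default when no usable competitor data
    avg = sum(counts) / len(counts)
    return max(800, min(5000, round(avg / 100) * 100))
```

For example, competitors at 1234, 2100, and 1800 words average to roughly 1711, which rounds to a 1700-word target; a single 20,000-word outlier is clamped to 5000.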
    # ------------------------------------------------------------------
    # Internal linking suggestions
    # ------------------------------------------------------------------

    async def suggest_internal_links(
        self,
        keyword: str,
        site_url: str,
    ) -> list[dict[str, str]]:
        """
        Find related existing pages on the site for internal linking.

        Uses Ahrefs organic keywords to find pages ranking for related terms.
        """
        self.logger.info(f"Finding internal link opportunities for {keyword} on {site_url}")
        links: list[dict[str, str]] = []
        target = urlparse(site_url).netloc or site_url

        try:
            api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
            if not api_key:
                return links

            resp = requests.get(
                "https://api.ahrefs.com/v3/site-explorer/organic-keywords",
                params={
                    "target": target,
                    "limit": 50,
                    "select": "keyword,url,position,traffic",
                },
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=30,
            )
            if resp.status_code != 200:
                return links

            data = resp.json()
            keywords_data = data.get("keywords", data.get("items", []))

            # Find pages ranking for related keywords
            keyword_lower = keyword.lower()
            keyword_words = set(keyword_lower.split())

            seen_urls: set[str] = set()
            for item in keywords_data:
                kw = item.get("keyword", "").lower()
                url = item.get("url", "")

                if not url or url in seen_urls:
                    continue

                # Check keyword relevance via word overlap
                kw_words = set(kw.split())
                overlap = keyword_words & kw_words
                if overlap and kw != keyword_lower:
                    links.append({
                        "url": url,
                        "anchor_text": kw,
                        "relevance": f"{len(overlap)}/{len(keyword_words)} word overlap",
                        "current_traffic": str(item.get("traffic", 0)),
                    })
                    seen_urls.add(url)

            links.sort(key=lambda l: int(l.get("current_traffic", "0")), reverse=True)

        except Exception as exc:
            self.logger.warning(f"Internal link suggestion failed: {exc}")

        return links[:10]

    # ------------------------------------------------------------------
    # Search intent detection
    # ------------------------------------------------------------------

    @staticmethod
    def detect_search_intent(keyword: str) -> str:
        """Classify keyword search intent."""
        keyword_lower = keyword.lower()
        scores: dict[str, int] = {}

        for intent, patterns in INTENT_PATTERNS.items():
            score = sum(1 for p in patterns if re.search(p, keyword_lower, re.IGNORECASE))
            if score > 0:
                scores[intent] = score

        if not scores:
            return "informational"
        return max(scores, key=scores.get)

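`detect_search_intent` relies on the module-level `INTENT_PATTERNS` table, which sits outside this hunk. A self-contained sketch of the same count-the-pattern-hits scheme, using an illustrative stand-in table (not the script's actual patterns):

```python
import re

# Illustrative stand-in for the module's INTENT_PATTERNS table.
INTENT_PATTERNS = {
    "informational": [r"\bhow\b", r"\bwhat\b", r"방법", r"가이드"],
    "commercial": [r"\bbest\b", r"\breview\b", r"추천", r"비교"],
    "transactional": [r"\bbuy\b", r"\bprice\b", r"구매", r"가격"],
}

def detect_search_intent(keyword: str) -> str:
    """Count pattern hits per intent; fall back to informational."""
    keyword_lower = keyword.lower()
    scores = {
        intent: sum(1 for p in patterns if re.search(p, keyword_lower, re.IGNORECASE))
        for intent, patterns in INTENT_PATTERNS.items()
    }
    scores = {intent: s for intent, s in scores.items() if s > 0}
    if not scores:
        return "informational"
    return max(scores, key=scores.get)
```

"best laptop review" matches two commercial patterns and nothing else, so it classifies as commercial; a keyword matching no pattern defaults to informational.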
    # ------------------------------------------------------------------
    # Orchestration
    # ------------------------------------------------------------------

    async def generate(
        self,
        keyword: str,
        site_url: str,
        num_competitors: int = 5,
    ) -> ContentBrief:
        """
        Generate a comprehensive SEO content brief.

        Args:
            keyword: Primary target keyword.
            site_url: Target website URL.
            num_competitors: Number of competitor pages to analyze.

        Returns:
            ContentBrief with outline, keywords, and recommendations.
        """
        self.logger.info(f"Generating content brief for: {keyword}")

        # Detect search intent
        intent = self.detect_search_intent(keyword)

        # Run analyses in parallel
        top_results_task = self.analyze_top_results(keyword, site_url, num_competitors)
        keywords_task = self.suggest_keywords(keyword)
        internal_links_task = self.suggest_internal_links(keyword, site_url)

        top_results, keyword_data, internal_links = await asyncio.gather(
            top_results_task, keywords_task, internal_links_task,
        )

        # Calculate word count target
        target_word_count = self.calculate_word_count(top_results)
        word_count_min = int(target_word_count * 0.8)
        word_count_max = int(target_word_count * 1.2)

        # Build outline
        outline = self.extract_outline(keyword, top_results)

        # Generate title and meta description suggestions
        suggested_title = self._generate_title(keyword, intent)
        meta_description = self._generate_meta_description(keyword, intent)

        # Korean format recommendations
        korean_tips = KOREAN_FORMAT_TIPS.get(intent, KOREAN_FORMAT_TIPS["informational"])

        brief = ContentBrief(
            primary_keyword=keyword,
            secondary_keywords=keyword_data.get("secondary", [])[:10],
            lsi_keywords=keyword_data.get("lsi", [])[:10],
            target_word_count=target_word_count,
            word_count_range=(word_count_min, word_count_max),
            suggested_title=suggested_title,
            meta_description=meta_description,
            outline=outline,
            competitor_analysis=top_results,
            internal_links=internal_links,
            content_format=self._suggest_format(intent, top_results),
            korean_format_recommendations=korean_tips,
            search_intent=intent,
            timestamp=datetime.now().isoformat(),
        )

        self.logger.info(
            f"Brief generated: {len(outline)} sections, "
            f"{target_word_count} target words, "
            f"{len(keyword_data.get('secondary', []))} secondary keywords"
        )

        return brief

    @staticmethod
    def _generate_title(keyword: str, intent: str) -> str:
        """Generate a suggested title based on keyword and intent."""
        templates = {
            "informational": "{keyword} - 완벽 가이드 (2025년 최신)",
            "commercial": "{keyword} 추천 TOP 10 비교 (전문가 리뷰)",
            "transactional": "{keyword} 가격 비교 및 구매 가이드",
            "navigational": "{keyword} - 공식 안내",
        }
        template = templates.get(intent, templates["informational"])
        return template.format(keyword=keyword)

    @staticmethod
    def _generate_meta_description(keyword: str, intent: str) -> str:
        """Generate a suggested meta description."""
        templates = {
            "informational": (
                f"{keyword}에 대해 알아야 할 모든 것을 정리했습니다. "
                "전문가가 알려주는 핵심 정보와 실용적인 가이드를 확인하세요."
            ),
            "commercial": (
                f"{keyword} 비교 분석! 장단점, 가격, 실제 후기를 "
                "한눈에 비교하고 최적의 선택을 하세요."
            ),
            "transactional": (
                f"{keyword} 최저가 비교 및 구매 방법을 안내합니다. "
                "합리적인 가격으로 구매하는 팁을 확인하세요."
            ),
            "navigational": (
                f"{keyword} 공식 정보 및 이용 안내. "
                "정확한 정보를 빠르게 확인하세요."
            ),
        }
        return templates.get(intent, templates["informational"])

    @staticmethod
    def _suggest_format(intent: str, results: list[CompetitorPageAnalysis]) -> str:
        """Suggest content format based on intent and competitor analysis."""
        if intent == "commercial":
            return "listicle"
        if intent == "informational":
            return "guide"
        if intent == "transactional":
            return "landing"

        # Check competitor patterns
        avg_word_count = (
            sum(r.word_count for r in results) / len(results) if results else 0
        )
        if avg_word_count > 3000:
            return "comprehensive_guide"
        return "blog"

# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="SEO Content Brief Generator",
    )
    parser.add_argument("--keyword", required=True, help="Primary target keyword")
    parser.add_argument("--url", required=True, help="Target website URL")
    parser.add_argument("--competitors", type=int, default=5, help="Number of competitor pages to analyze (default: 5)")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", help="Save output to file")
    return parser

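The CLI contract can be checked without hitting any API, since `parse_args` accepts an explicit argv list. A self-contained sketch mirroring the parser above:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the CLI defined above.
    parser = argparse.ArgumentParser(description="SEO Content Brief Generator")
    parser.add_argument("--keyword", required=True, help="Primary target keyword")
    parser.add_argument("--url", required=True, help="Target website URL")
    parser.add_argument("--competitors", type=int, default=5)
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--output")
    return parser

args = build_parser().parse_args(
    ["--keyword", "임플란트 비용", "--url", "https://example.com", "--json"]
)
```

With no `--competitors` flag the default of 5 applies, `--json` flips the store-true flag, and `--output` stays `None` so the report goes to stdout.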
def format_text_report(brief: ContentBrief) -> str:
    """Format content brief as human-readable text."""
    lines: list[str] = []
    lines.append(f"## Content Brief: {brief.primary_keyword}")
    lines.append(f"**Date**: {brief.timestamp[:10]}")
    lines.append(f"**Search Intent**: {brief.search_intent}")
    lines.append(f"**Content Format**: {brief.content_format}")
    lines.append("")

    lines.append("### Target Metrics")
    lines.append(f"- Word count: {brief.target_word_count} ({brief.word_count_range[0]}-{brief.word_count_range[1]})")
    lines.append(f"- Suggested title: {brief.suggested_title}")
    lines.append(f"- Meta description: {brief.meta_description}")
    lines.append("")

    lines.append("### Keywords")
    lines.append(f"- **Primary**: {brief.primary_keyword}")
    if brief.secondary_keywords:
        lines.append(f"- **Secondary**: {', '.join(brief.secondary_keywords[:8])}")
    if brief.lsi_keywords:
        lines.append(f"- **LSI**: {', '.join(brief.lsi_keywords[:8])}")
    lines.append("")

    lines.append("### Content Outline")
    for section in brief.outline:
        prefix = "##" if section.level == 2 else "###"
        lines.append(f"  {prefix} {section.heading} (~{section.target_words}w)")
        for point in section.talking_points:
            lines.append(f"    - {point}")
    lines.append("")

    if brief.competitor_analysis:
        lines.append(f"### Competitor Analysis ({len(brief.competitor_analysis)} pages)")
        for comp in brief.competitor_analysis:
            lines.append(f"  - **{comp.title or comp.url}**")
            lines.append(f"    Word count: {comp.word_count} | Headings: {len(comp.headings)}")
            features = []
            if comp.has_images:
                features.append("images")
            if comp.has_video:
                features.append("video")
            if comp.has_faq:
                features.append("FAQ")
            if comp.has_table:
                features.append("table")
            if features:
                lines.append(f"    Features: {', '.join(features)}")
        lines.append("")

    if brief.internal_links:
        lines.append(f"### Internal Linking Suggestions ({len(brief.internal_links)})")
        for link in brief.internal_links[:7]:
            lines.append(f"  - [{link['anchor_text']}]({link['url']})")
        lines.append("")

    if brief.korean_format_recommendations:
        lines.append("### Korean Content Format Recommendations")
        for tip in brief.korean_format_recommendations:
            lines.append(f"  - {tip}")

    return "\n".join(lines)

async def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    generator = ContentBriefGenerator()
    try:
        brief = await generator.generate(
            keyword=args.keyword,
            site_url=args.url,
            num_competitors=args.competitors,
        )

        if args.json:
            output = json.dumps(asdict(brief), ensure_ascii=False, indent=2, default=str)
        else:
            output = format_text_report(brief)

        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            logger.info(f"Output saved to {args.output}")
        else:
            print(output)

    finally:
        await generator.close()
        generator.print_stats()


if __name__ == "__main__":
    asyncio.run(main())

@@ -0,0 +1,694 @@
"""
Content Gap Analyzer - Topic Gap Detection & Cluster Mapping
=============================================================
Purpose: Identify content gaps vs competitors, build topic clusters,
         and generate prioritized editorial calendars.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import math
import re
import sys
from collections import defaultdict
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta
from typing import Any
from urllib.parse import urlparse

import requests
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

from base_client import BaseAsyncClient, config

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------


@dataclass
class TopicGap:
    """A topic present in competitors but missing from target."""
    topic: str
    competitor_urls: list[str] = field(default_factory=list)
    competitor_keywords: list[str] = field(default_factory=list)
    estimated_traffic: int = 0
    priority_score: float = 0.0
    difficulty: str = "medium"
    content_type_suggestion: str = "blog"


@dataclass
class TopicCluster:
    """Topic cluster with pillar and supporting cluster pages."""
    pillar_topic: str
    pillar_keyword: str = ""
    cluster_topics: list[str] = field(default_factory=list)
    cluster_keywords: list[str] = field(default_factory=list)
    total_volume: int = 0
    coverage_score: float = 0.0


@dataclass
class CalendarEntry:
    """Prioritized editorial calendar entry."""
    topic: str
    priority: str = "medium"
    target_date: str = ""
    content_type: str = "blog"
    target_word_count: int = 1500
    primary_keyword: str = ""
    estimated_traffic: int = 0
    cluster_name: str = ""
    notes: str = ""


@dataclass
class ContentGapResult:
    """Full content gap analysis result."""
    target_url: str
    competitor_urls: list[str] = field(default_factory=list)
    timestamp: str = ""
    target_topics_count: int = 0
    competitor_topics_count: int = 0
    gaps: list[TopicGap] = field(default_factory=list)
    clusters: list[TopicCluster] = field(default_factory=list)
    calendar: list[CalendarEntry] = field(default_factory=list)
    content_volume_comparison: dict[str, int] = field(default_factory=dict)
    korean_opportunities: list[dict[str, Any]] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

# ---------------------------------------------------------------------------
# Korean opportunity patterns
# ---------------------------------------------------------------------------

KOREAN_OPPORTUNITY_PATTERNS = [
    {"pattern": r"후기|리뷰", "label": "review_content", "description": "후기/리뷰 콘텐츠"},
    {"pattern": r"비용|가격|견적", "label": "pricing_content", "description": "비용/가격 정보 콘텐츠"},
    {"pattern": r"비교|차이", "label": "comparison_content", "description": "비교 콘텐츠"},
    {"pattern": r"추천|베스트|TOP", "label": "recommendation_content", "description": "추천/리스트 콘텐츠"},
    {"pattern": r"방법|하는\s*법|가이드", "label": "how_to_content", "description": "가이드/방법 콘텐츠"},
    {"pattern": r"부작용|주의|위험", "label": "safety_content", "description": "안전/부작용 정보"},
    {"pattern": r"효과|결과|전후", "label": "results_content", "description": "효과/결과 콘텐츠"},
]

|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# ContentGapAnalyzer
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
class ContentGapAnalyzer(BaseAsyncClient):
|
||||||
|
"""Analyze content gaps between target and competitor sites."""
|
||||||
|
|
||||||
|
def __init__(self, max_concurrent: int = 5, requests_per_second: float = 2.0):
|
||||||
|
super().__init__(max_concurrent=max_concurrent, requests_per_second=requests_per_second)
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# Ahrefs data retrieval
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
async def get_competitor_topics(self, competitor_url: str, limit: int = 100) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Get top pages and keywords for a competitor via Ahrefs.
|
||||||
|
|
||||||
|
Returns list of dicts: url, traffic, keywords, top_keyword, title.
|
||||||
|
"""
|
||||||
|
self.logger.info(f"Fetching competitor topics for {competitor_url}")
|
||||||
|
target = urlparse(competitor_url).netloc or competitor_url
|
||||||
|
|
||||||
|
try:
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
self.logger.warning("AHREFS_API_KEY not set; returning empty competitor topics")
|
||||||
|
return []
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/top-pages",
|
||||||
|
params={
|
||||||
|
"target": target,
|
||||||
|
"limit": limit,
|
||||||
|
"select": "url,traffic,keywords,value,top_keyword",
|
||||||
|
},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
pages = data.get("pages", data.get("items", []))
|
||||||
|
self.logger.info(f"Retrieved {len(pages)} competitor topics from {competitor_url}")
|
||||||
|
return pages
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Failed to get competitor topics for {competitor_url}: {exc}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def get_target_keywords(self, target_url: str, limit: int = 200) -> set[str]:
|
||||||
|
"""Get the set of keywords the target site already ranks for."""
|
||||||
|
self.logger.info(f"Fetching target keywords for {target_url}")
|
||||||
|
target = urlparse(target_url).netloc or target_url
|
||||||
|
|
||||||
|
try:
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
return set()
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/organic-keywords",
|
||||||
|
params={"target": target, "limit": limit, "select": "keyword,position,traffic"},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
keywords = data.get("keywords", data.get("items", []))
|
||||||
|
return {kw.get("keyword", "").lower() for kw in keywords if kw.get("keyword")}
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Failed to get target keywords: {exc}")
|
||||||
|
return set()
|
||||||
|
|
||||||
|
async def get_organic_competitors(self, target_url: str, limit: int = 10) -> list[str]:
|
||||||
|
"""Discover organic competitors via Ahrefs."""
|
||||||
|
self.logger.info(f"Discovering organic competitors for {target_url}")
|
||||||
|
target = urlparse(target_url).netloc or target_url
|
||||||
|
|
||||||
|
try:
|
||||||
|
api_key = config.get_required("AHREFS_API_KEY") if hasattr(config, "get_required") else None
|
||||||
|
if not api_key:
|
||||||
|
return []
|
||||||
|
|
||||||
|
resp = requests.get(
|
||||||
|
"https://api.ahrefs.com/v3/site-explorer/organic-competitors",
|
||||||
|
params={"target": target, "limit": limit},
|
||||||
|
headers={"Authorization": f"Bearer {api_key}"},
|
||||||
|
timeout=30,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
data = resp.json()
|
||||||
|
competitors = data.get("competitors", data.get("items", []))
|
||||||
|
return [c.get("domain", "") for c in competitors if c.get("domain")]
|
||||||
|
except Exception as exc:
|
||||||
|
self.logger.warning(f"Failed to discover competitors: {exc}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
    # ------------------------------------------------------------------
    # Gap analysis
    # ------------------------------------------------------------------

    async def find_topic_gaps(
        self,
        target_url: str,
        competitor_urls: list[str],
    ) -> tuple[list[TopicGap], set[str], dict[str, int]]:
        """
        Identify topics covered by competitors but missing from target.

        Returns:
            - List of TopicGap objects.
            - Set of target keywords (for reference).
            - Content volume comparison dict.
        """
        # Gather target keywords
        target_keywords = await self.get_target_keywords(target_url)

        # Gather competitor data in parallel
        competitor_tasks = [self.get_competitor_topics(c_url) for c_url in competitor_urls]
        competitor_results = await asyncio.gather(*competitor_tasks, return_exceptions=True)

        # Build competitor topic map
        competitor_topic_map: dict[str, TopicGap] = {}
        content_volume: dict[str, int] = {target_url: len(target_keywords)}

        for c_url, c_result in zip(competitor_urls, competitor_results):
            if isinstance(c_result, Exception):
                self.logger.warning(f"Error fetching {c_url}: {c_result}")
                continue

            pages = c_result if isinstance(c_result, list) else []
            content_volume[c_url] = len(pages)

            for page in pages:
                top_keyword = page.get("top_keyword", "").strip().lower()
                if not top_keyword:
                    continue

                # Skip if target already covers this keyword
                if top_keyword in target_keywords:
                    continue

                # Check for fuzzy matches (keyword contained in target set)
                is_covered = any(
                    top_keyword in tk or tk in top_keyword
                    for tk in target_keywords
                    if len(tk) > 3
                )
                if is_covered:
                    continue

                if top_keyword not in competitor_topic_map:
                    competitor_topic_map[top_keyword] = TopicGap(
                        topic=top_keyword,
                        estimated_traffic=int(page.get("traffic", 0)),
                    )

                gap = competitor_topic_map[top_keyword]
                gap.competitor_urls.append(page.get("url", c_url))
                gap.competitor_keywords.append(top_keyword)
                gap.estimated_traffic = max(gap.estimated_traffic, int(page.get("traffic", 0)))

        # Score gaps
        gaps = list(competitor_topic_map.values())
        for gap in gaps:
            competitor_count = len(set(gap.competitor_urls))
            traffic_score = min(100, math.log10(max(gap.estimated_traffic, 1)) / math.log10(10000) * 100)
            competition_score = (competitor_count / max(len(competitor_urls), 1)) * 100
            gap.priority_score = round((traffic_score * 0.6) + (competition_score * 0.4), 1)

            # Difficulty estimation
            if competitor_count >= 3:
                gap.difficulty = "high"
            elif competitor_count >= 2:
                gap.difficulty = "medium"
            else:
                gap.difficulty = "low"

            # Content type suggestion
            gap.content_type_suggestion = self._suggest_content_type(gap.topic)

        gaps.sort(key=lambda g: g.priority_score, reverse=True)
        return gaps, target_keywords, content_volume

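The gap score blends log-scaled traffic (saturating at 10,000 estimated monthly visits) with the share of competitors covering the topic, at a 60/40 weighting. Isolated as a pure function for illustration (the function and parameter names here are not from the script):

```python
import math

def gap_priority(traffic: int, covering: int, total_competitors: int) -> float:
    """0-100 score: 60% log-scaled traffic, 40% share of competitors covering the topic."""
    traffic_score = min(100, math.log10(max(traffic, 1)) / math.log10(10000) * 100)
    competition_score = (covering / max(total_competitors, 1)) * 100
    return round(traffic_score * 0.6 + competition_score * 0.4, 1)
```

A topic all three competitors cover with 10k+ traffic scores the maximum 100.0; a topic one of three competitors covers with 100 visits scores 50 × 0.6 + 33.3 × 0.4 ≈ 43.3.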
    @staticmethod
    def _suggest_content_type(topic: str) -> str:
        """Suggest content type based on topic keywords."""
        topic_lower = topic.lower()
        if any(w in topic_lower for w in ["how to", "guide", "tutorial", "방법", "가이드"]):
            return "guide"
        if any(w in topic_lower for w in ["best", "top", "review", "추천", "후기", "비교"]):
            return "listicle"
        if any(w in topic_lower for w in ["what is", "이란", "뜻", "의미"]):
            return "informational"
        if any(w in topic_lower for w in ["cost", "price", "비용", "가격"]):
            return "landing"
        return "blog"

    # ------------------------------------------------------------------
    # Topic cluster mapping
    # ------------------------------------------------------------------

    def build_topic_clusters(
        self,
        topics: list[str],
        n_clusters: int | None = None,
        min_cluster_size: int = 3,
    ) -> list[TopicCluster]:
        """
        Group topics into a pillar/cluster structure using TF-IDF + hierarchical clustering.

        Args:
            topics: List of topic strings.
            n_clusters: Number of clusters (auto-detected if None).
            min_cluster_size: Minimum topics per cluster.

        Returns:
            List of TopicCluster objects.
        """
        if len(topics) < min_cluster_size:
            self.logger.warning("Too few topics for clustering")
            return []

        # Vectorize topics
        vectorizer = TfidfVectorizer(
            max_features=500,
            stop_words="english",
            ngram_range=(1, 2),
        )

        try:
            tfidf_matrix = vectorizer.fit_transform(topics)
        except ValueError as exc:
            self.logger.warning(f"TF-IDF vectorization failed: {exc}")
            return []

        # Auto-detect cluster count
        if n_clusters is None:
            n_clusters = max(2, min(len(topics) // 5, 15))
        n_clusters = min(n_clusters, len(topics) - 1)

        # Hierarchical clustering with cosine distance
        clustering = AgglomerativeClustering(
            n_clusters=n_clusters,
            metric="cosine",
            linkage="average",
        )
        labels = clustering.fit_predict(tfidf_matrix.toarray())

        # Build cluster objects
        cluster_map: dict[int, list[str]] = defaultdict(list)
        for topic, label in zip(topics, labels):
            cluster_map[label].append(topic)

        clusters: list[TopicCluster] = []
        for label, cluster_topics in sorted(cluster_map.items()):
            if len(cluster_topics) < min_cluster_size:
                continue

            # Pick the longest topic as pillar (usually broader)
            pillar = max(cluster_topics, key=len)
            subtopics = [t for t in cluster_topics if t != pillar]

            cluster = TopicCluster(
                pillar_topic=pillar,
                pillar_keyword=pillar,
                cluster_topics=subtopics[:20],
                cluster_keywords=list(subtopics[:20]),
                total_volume=0,
                coverage_score=0.0,
            )
            clusters.append(cluster)

        clusters.sort(key=lambda c: len(c.cluster_topics), reverse=True)
        return clusters

    # ------------------------------------------------------------------
    # Editorial calendar generation
    # ------------------------------------------------------------------

    def generate_calendar(
        self,
        gaps: list[TopicGap],
        clusters: list[TopicCluster],
        weeks_ahead: int = 12,
        entries_per_week: int = 2,
    ) -> list[CalendarEntry]:
        """
        Generate prioritized editorial calendar from gaps and clusters.

        Args:
            gaps: List of topic gaps (sorted by priority).
            clusters: List of topic clusters.
            weeks_ahead: Number of weeks to plan.
            entries_per_week: Content pieces per week.

        Returns:
            List of CalendarEntry objects.
        """
        calendar: list[CalendarEntry] = []
        today = datetime.now()

        # Build cluster lookup
        topic_to_cluster: dict[str, str] = {}
        for cluster in clusters:
            for topic in cluster.cluster_topics:
                topic_to_cluster[topic] = cluster.pillar_topic
            topic_to_cluster[cluster.pillar_topic] = cluster.pillar_topic

        # Prioritize: pillar topics first, then by priority score
        pillar_topics = {c.pillar_topic for c in clusters}
        pillar_gaps = [g for g in gaps if g.topic in pillar_topics]
        other_gaps = [g for g in gaps if g.topic not in pillar_topics]
        ordered_gaps = pillar_gaps + other_gaps

        # Word count targets based on content type
        word_count_map = {
            "guide": 2500,
            "listicle": 2000,
            "informational": 1800,
            "landing": 1200,
            "blog": 1500,
        }

        max_entries = weeks_ahead * entries_per_week
        week_offset = 0
        slot_in_week = 0

        for gap in ordered_gaps[:max_entries]:
            target_date = today + timedelta(weeks=week_offset, days=slot_in_week * 3)

            # Determine priority label
            if gap.priority_score >= 70:
                priority = "high"
            elif gap.priority_score >= 40:
                priority = "medium"
            else:
                priority = "low"

            entry = CalendarEntry(
                topic=gap.topic,
                priority=priority,
                target_date=target_date.strftime("%Y-%m-%d"),
                content_type=gap.content_type_suggestion,
                target_word_count=word_count_map.get(gap.content_type_suggestion, 1500),
                primary_keyword=gap.topic,
                estimated_traffic=gap.estimated_traffic,
                cluster_name=topic_to_cluster.get(gap.topic, "uncategorized"),
            )
            calendar.append(entry)

            slot_in_week += 1
            if slot_in_week >= entries_per_week:
                slot_in_week = 0
                week_offset += 1

        return calendar

    # ------------------------------------------------------------------
    # Korean opportunity detection
    # ------------------------------------------------------------------

    @staticmethod
    def detect_korean_opportunities(gaps: list[TopicGap]) -> list[dict[str, Any]]:
        """Detect Korean-market content opportunities in gaps."""
        opportunities: list[dict[str, Any]] = []

        for gap in gaps:
            for pattern_info in KOREAN_OPPORTUNITY_PATTERNS:
                if re.search(pattern_info["pattern"], gap.topic, re.IGNORECASE):
                    opportunities.append({
                        "topic": gap.topic,
                        "pattern": pattern_info["label"],
                        "description": pattern_info["description"],
                        "estimated_traffic": gap.estimated_traffic,
                        "priority_score": gap.priority_score,
                    })
                    break

        opportunities.sort(key=lambda o: o["priority_score"], reverse=True)
        return opportunities

    # ------------------------------------------------------------------
    # Orchestration
    # ------------------------------------------------------------------

    async def analyze(
        self,
        target_url: str,
        competitor_urls: list[str],
        build_clusters: bool = False,
    ) -> ContentGapResult:
        """
        Run full content gap analysis.

        Args:
            target_url: Target website URL.
            competitor_urls: List of competitor URLs.
            build_clusters: Whether to build topic clusters.

        Returns:
            ContentGapResult with gaps, clusters, and calendar.
        """
        result = ContentGapResult(
            target_url=target_url,
            competitor_urls=competitor_urls,
            timestamp=datetime.now().isoformat(),
        )

        self.logger.info(
            f"Starting gap analysis: {target_url} vs {len(competitor_urls)} competitors"
        )

        # 1. Find topic gaps
        gaps, target_keywords, content_volume = await self.find_topic_gaps(
            target_url, competitor_urls
        )

        result.gaps = gaps
        result.target_topics_count = len(target_keywords)
        result.competitor_topics_count = sum(content_volume.get(c, 0) for c in competitor_urls)
        result.content_volume_comparison = content_volume

        # 2. Build topic clusters if requested
        if build_clusters and gaps:
            all_topics = [g.topic for g in gaps]
            result.clusters = self.build_topic_clusters(all_topics)

        # 3. Generate editorial calendar
        result.calendar = self.generate_calendar(gaps, result.clusters)

        # 4. Detect Korean opportunities
        result.korean_opportunities = self.detect_korean_opportunities(gaps)

        # 5. Recommendations
        result.recommendations = self._generate_recommendations(result)

        self.logger.info(
            f"Gap analysis complete: {len(gaps)} gaps, "
            f"{len(result.clusters)} clusters, "
            f"{len(result.calendar)} calendar entries"
        )

        return result

    @staticmethod
    def _generate_recommendations(result: ContentGapResult) -> list[str]:
        """Generate strategic recommendations from gap analysis."""
        recs: list[str] = []

        gap_count = len(result.gaps)
        if gap_count > 50:
            recs.append(
                f"경쟁사 대비 {gap_count}개의 콘텐츠 격차가 발견되었습니다. "
                "우선순위 상위 20개 주제부터 콘텐츠 생성을 시작하세요."
            )
        elif gap_count > 20:
            recs.append(
                f"{gap_count}개의 콘텐츠 격차가 있습니다. "
                "높은 트래픽 기회부터 순차적으로 콘텐츠를 생성하세요."
            )
        elif gap_count > 0:
            recs.append(
                f"{gap_count}개의 콘텐츠 격차가 발견되었습니다. "
                "비교적 적은 격차이므로 빠른 시일 내 모두 커버할 수 있습니다."
            )

        if result.clusters:
            recs.append(
                f"{len(result.clusters)}개의 토픽 클러스터를 구성했습니다. "
                "필러 콘텐츠부터 작성하여 내부 링크 구조를 강화하세요."
            )

        if result.korean_opportunities:
            recs.append(
                f"한국어 시장 기회가 {len(result.korean_opportunities)}개 발견되었습니다. "
                "후기, 비용, 비교 콘텐츠는 한국 검색 시장에서 높은 전환율을 보입니다."
            )

        high_priority = [g for g in result.gaps if g.priority_score >= 70]
        if high_priority:
            top_topics = ", ".join(g.topic for g in high_priority[:3])
            recs.append(
                f"최우선 주제: {top_topics}. "
                "이 주제들은 높은 트래픽 잠재력과 경쟁사 커버리지를 가지고 있습니다."
            )

        if not recs:
            recs.append("경쟁사 대비 콘텐츠 커버리지가 양호합니다. 기존 콘텐츠 최적화에 집중하세요.")

        return recs


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="SEO Content Gap Analyzer - topic gaps, clusters, calendar",
    )
    parser.add_argument("--target", required=True, help="Target website URL")
    parser.add_argument(
        "--competitor", action="append", dest="competitors", required=True,
        help="Competitor URL (can be repeated)",
    )
    parser.add_argument("--clusters", action="store_true", help="Build topic clusters")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", help="Save output to file")
    return parser


def format_text_report(result: ContentGapResult) -> str:
    """Format gap analysis result as human-readable text."""
    lines: list[str] = []
    lines.append(f"## Content Gap Analysis: {result.target_url}")
    lines.append(f"**Date**: {result.timestamp[:10]}")
    lines.append(f"**Competitors**: {', '.join(result.competitor_urls)}")
    lines.append("")

    lines.append("### Content Volume Comparison")
    for site, count in result.content_volume_comparison.items():
        lines.append(f"  - {site}: {count} topics")
    lines.append("")

    lines.append(f"### Topic Gaps ({len(result.gaps)} found)")
    for i, gap in enumerate(result.gaps[:20], 1):
        lines.append(
            f"  {i}. [{gap.priority_score:.0f}] {gap.topic} "
            f"(traffic: {gap.estimated_traffic}, difficulty: {gap.difficulty})"
        )
    lines.append("")

    if result.clusters:
        lines.append(f"### Topic Clusters ({len(result.clusters)})")
        for i, cluster in enumerate(result.clusters, 1):
            lines.append(f"  {i}. **{cluster.pillar_topic}** ({len(cluster.cluster_topics)} subtopics)")
            for sub in cluster.cluster_topics[:5]:
                lines.append(f"     - {sub}")
        lines.append("")

    if result.calendar:
        lines.append(f"### Editorial Calendar ({len(result.calendar)} entries)")
        for entry in result.calendar[:15]:
            lines.append(
                f"  - [{entry.target_date}] {entry.topic} "
                f"({entry.content_type}, {entry.target_word_count}w, priority: {entry.priority})"
            )
        lines.append("")

    if result.korean_opportunities:
        lines.append(f"### Korean Market Opportunities ({len(result.korean_opportunities)})")
        for opp in result.korean_opportunities[:10]:
            lines.append(f"  - {opp['topic']} ({opp['description']})")
        lines.append("")

    lines.append("### Recommendations")
    for i, rec in enumerate(result.recommendations, 1):
        lines.append(f"  {i}. {rec}")

    return "\n".join(lines)


async def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    analyzer = ContentGapAnalyzer()
    result = await analyzer.analyze(
        target_url=args.target,
        competitor_urls=args.competitors,
        build_clusters=args.clusters,
    )

    if args.json:
        output = json.dumps(asdict(result), ensure_ascii=False, indent=2, default=str)
    else:
        output = format_text_report(result)

    if args.output:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(output)
        logger.info(f"Output saved to {args.output}")
    else:
        print(output)

    analyzer.print_stats()


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,11 @@
# 23-seo-content-strategy dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
pandas>=2.1.0
scikit-learn>=1.3.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
138 custom-skills/23-seo-content-strategy/desktop/SKILL.md Normal file
@@ -0,0 +1,138 @@
---
name: seo-content-strategy
description: |
  Content strategy and planning for SEO. Triggers: content audit, content strategy, content gap, topic clusters, content brief, editorial calendar, content decay, 콘텐츠 전략, 콘텐츠 감사.
---

# SEO Content Strategy

## Purpose

Audit existing content performance, identify topic gaps vs competitors, map topic clusters, detect content decay, and generate SEO content briefs. Supports Korean content patterns (Naver Blog format, 후기/review content, 추천 listicles).

## Core Capabilities

1. **Content Audit** - Inventory, performance scoring, decay detection
2. **Content Gap Analysis** - Topic gaps vs competitors, cluster mapping
3. **Content Brief Generation** - Outlines, keywords, word count targets
4. **Editorial Calendar** - Prioritized content creation schedule
5. **Korean Content Patterns** - Naver Blog style, 후기, 추천 format analysis

## MCP Tool Usage

### Ahrefs for Content Data
```
site-explorer-top-pages: Get top performing pages
site-explorer-pages-by-traffic: Pages ranked by organic traffic
site-explorer-organic-keywords: Keywords per page
site-explorer-organic-competitors: Find content competitors
site-explorer-best-by-external-links: Best content by backlinks
keywords-explorer-matching-terms: Secondary keyword suggestions
keywords-explorer-related-terms: LSI keyword suggestions
serp-overview: Analyze top ranking results for a keyword
```

### WebSearch for Content Research
```
WebSearch: Research content topics and competitor strategies
WebFetch: Analyze competitor page content and structure
```

### Notion for Report Storage
```
notion-create-pages: Save audit reports to SEO Audit Log
```

## Workflow

### 1. Content Audit
1. Crawl sitemap to discover all content URLs
2. Fetch top pages data from Ahrefs (traffic, keywords, backlinks)
3. Classify content types (blog, product, service, landing, resource)
4. Score each page's performance (0-100 composite)
5. Detect decaying content (traffic decline patterns)
6. Analyze freshness distribution (fresh/aging/stale)
7. Identify Korean content patterns (후기, 추천, 방법 formats)
8. Generate recommendations
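
The freshness and decay steps above can be sketched as follows. This is a minimal illustration; the 180/365-day thresholds and the decay formula are assumptions for the sketch, not the audit script's exact values:

```python
from datetime import date

def classify_freshness(last_modified: date, today: date) -> str:
    """Bucket a page by age: fresh (<6 months), aging (<12 months), stale."""
    age_days = (today - last_modified).days
    if age_days <= 180:
        return "fresh"
    if age_days <= 365:
        return "aging"
    return "stale"

def decay_rate(traffic_then: int, traffic_now: int) -> float:
    """Fractional traffic decline; positive values indicate a decaying page."""
    if traffic_then == 0:
        return 0.0
    return (traffic_then - traffic_now) / traffic_then
```

Pages with a high positive `decay_rate` and an aging/stale freshness bucket are the refresh candidates surfaced in step 5.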

### 2. Content Gap Analysis
1. Gather target site keywords from Ahrefs
2. Gather competitor top pages and keywords
3. Identify topics present in competitors but missing from target
4. Score gaps by priority (traffic potential + competitor coverage)
5. Build topic clusters using TF-IDF + hierarchical clustering
6. Generate editorial calendar with priority and dates
7. Detect Korean market content opportunities
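
At its core, step 3 is a set difference over normalized topics, with gaps ranked by how many competitors cover them (a simplified sketch; the analyzer's real priority score also blends traffic estimates):

```python
def find_gaps(target_topics: set[str], competitor_topics: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Return (topic, competitor_coverage) pairs for topics the target does not cover."""
    gaps: dict[str, int] = {}
    for topics in competitor_topics.values():
        for topic in topics - target_topics:
            gaps[topic] = gaps.get(topic, 0) + 1
    # Topics covered by more competitors rank higher.
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)

target = {"seo audit", "link building"}
competitors = {
    "a.com": {"seo audit", "keyword research", "content brief"},
    "b.com": {"keyword research", "link building"},
}
# "keyword research" ranks first: missing from the target, covered by both competitors
print(find_gaps(target, competitors))
```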

### 3. Content Brief Generation
1. Analyze top 5-10 ranking pages for target keyword
2. Extract headings, word counts, content features (FAQ, images, video)
3. Build recommended H2/H3 outline from competitor patterns
4. Suggest primary, secondary, and LSI keywords
5. Calculate target word count (avg of top 5 +/- 20%)
6. Find internal linking opportunities on the target site
7. Detect search intent (informational, commercial, transactional, navigational)
8. Add Korean format recommendations based on intent
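
Step 5's word count target is a straightforward calculation and can be sketched directly (assuming `competitor_counts` is already ordered by ranking position):

```python
def word_count_target(competitor_counts: list[int], top_n: int = 5) -> tuple[int, int]:
    """Target range = mean word count of the top-N ranking pages, +/- 20%."""
    sample = competitor_counts[:top_n]
    avg = sum(sample) / len(sample)
    return round(avg * 0.8), round(avg * 1.2)

low, high = word_count_target([2400, 1800, 2100, 1500, 2200])
print(low, high)  # 1600 2400
```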

## Output Format

```markdown
## Content Audit: [domain]

### Content Inventory
- Total pages: [count]
- By type: blog [n], product [n], service [n], other [n]
- Average performance score: [score]/100

### Top Performers
1. [score] [url] (traffic: [n])
...

### Decaying Content
1. [decay rate] [url] (traffic: [n])
...

### Content Gaps vs Competitors
1. [priority] [topic] (est. traffic: [n], difficulty: [level])
...

### Topic Clusters
1. **[Pillar Topic]** ([n] subtopics)
   - [subtopic 1]
   - [subtopic 2]

### Editorial Calendar
- [date] [topic] ([type], [word count], priority: [level])
...

### Recommendations
1. [Priority actions]
```

## Common Issues

| Issue | Impact | Fix |
|-------|--------|-----|
| No blog content | High | Build blog content strategy with topic clusters |
| Content decay (traffic loss) | High | Refresh and update declining pages |
| Missing competitor topics | Medium | Create content for high-priority gaps |
| No 후기/review content | Medium | Add Korean review-style content for conversions |
| Stale content (>12 months) | Medium | Update or consolidate outdated pages |
| No topic clusters | Medium | Organize content into pillar/cluster structure |
| Missing FAQ sections | Low | Add FAQ schema for featured snippet opportunities |

## Limitations

- Ahrefs API required for traffic and keyword data
- Competitor analysis limited to publicly available content
- Content decay detection uses a heuristic without historical data in standalone mode
- Topic clustering requires a minimum of 3 topics per cluster
- Word count analysis requires accessible competitor pages (no JS rendering)

## Notion Output (Required)

All audit reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category, Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: CONTENT-YYYYMMDD-NNN
8 custom-skills/23-seo-content-strategy/desktop/skill.yaml Normal file
@@ -0,0 +1,8 @@
name: seo-content-strategy
description: |
  Content strategy and planning for SEO. Triggers: content audit, content strategy, content gap, topic clusters, content brief, editorial calendar, content decay.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
136 custom-skills/24-seo-ecommerce/code/CLAUDE.md Normal file
@@ -0,0 +1,136 @@
# CLAUDE.md

## Overview

E-commerce SEO audit tool for product page optimization, product schema validation, category taxonomy analysis, and marketplace presence checking. Supports Naver Smart Store optimization and Korean marketplace platforms (Coupang, Gmarket, 11번가).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# E-commerce SEO audit
python scripts/ecommerce_auditor.py --url https://example.com --json

# Product schema validation
python scripts/product_schema_checker.py --url https://example.com/product --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `ecommerce_auditor.py` | Full e-commerce SEO audit | Product page issues, category structure, marketplace presence |
| `product_schema_checker.py` | Validate product structured data | Schema completeness, errors, rich result eligibility |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## E-Commerce Auditor

```bash
# Full audit
python scripts/ecommerce_auditor.py --url https://example.com --json

# Product page audit only
python scripts/ecommerce_auditor.py --url https://example.com --scope products --json

# Category taxonomy analysis
python scripts/ecommerce_auditor.py --url https://example.com --scope categories --json

# Check Korean marketplace presence
python scripts/ecommerce_auditor.py --url https://example.com --korean-marketplaces --json
```

**Capabilities**:
- Product page SEO audit (titles, meta descriptions, image alt text, H1 structure)
- Category taxonomy analysis (depth, breadcrumb implementation, faceted navigation)
- Duplicate content detection (parameter URLs, product variants, pagination)
- Pagination/infinite scroll SEO validation (rel=prev/next, canonical tags)
- Internal linking structure for product discovery
- Naver Smart Store optimization checks
- Korean marketplace presence (Coupang, Gmarket, 11번가 product listing detection)
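
The duplicate content detection listed above boils down to canonicalizing URLs and grouping those that collapse to the same page. A minimal sketch; the parameter list here is a common-example assumption, not the auditor's actual configuration:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse
from collections import defaultdict

# Hypothetical set of tracking/facet parameters that do not change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid", "sort", "ref"}

def canonicalize(url: str) -> str:
    """Drop tracking/facet query parameters so variant URLs collapse together."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

def duplicate_groups(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by canonical form; groups with >1 member are duplicate candidates."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    return {canon: dupes for canon, dupes in groups.items() if len(dupes) > 1}
```

Each surviving group is a candidate for a shared `rel=canonical` target.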

## Product Schema Checker

```bash
# Validate single product page
python scripts/product_schema_checker.py --url https://example.com/product/123 --json

# Batch validate from sitemap
python scripts/product_schema_checker.py --sitemap https://example.com/product-sitemap.xml --sample 50 --json
```

**Capabilities**:
- Product schema validation (Product, Offer, AggregateRating, Review, BreadcrumbList)
- Required property completeness check (name, image, description, offers, price, availability)
- Optional property recommendations (brand, sku, gtin, mpn, review, aggregateRating)
- Rich result eligibility assessment
- Price and availability markup validation
- Merchant listing schema support
- Korean market: Naver Shopping structured data requirements
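
The completeness check above can be sketched as a walk over a parsed Product JSON-LD object. The required/recommended lists follow Google's Product structured data guidance in simplified form and are an assumption here, not the checker's exact rule set:

```python
REQUIRED = ["name", "image", "offers"]
RECOMMENDED = ["description", "brand", "sku", "aggregateRating", "review"]
OFFER_REQUIRED = ["price", "priceCurrency", "availability"]

def check_product_schema(schema: dict) -> dict[str, list[str]]:
    """Return missing required/recommended properties for a Product JSON-LD object."""
    missing = [p for p in REQUIRED if p not in schema]
    recommended = [p for p in RECOMMENDED if p not in schema]
    offer = schema.get("offers") or {}
    if isinstance(offer, dict):
        missing += [f"offers.{p}" for p in OFFER_REQUIRED if p not in offer]
    return {"missing_required": missing, "missing_recommended": recommended}
```

A page with an empty `missing_required` list is a candidate for rich result eligibility; missing recommended properties only reduce snippet quality.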

## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-pages-by-traffic` | Identify top product/category pages |
| `site-explorer-organic-keywords` | Product page keyword performance |

## Output Format

```json
{
  "url": "https://example.com",
  "product_pages_audited": 50,
  "issues": {
    "critical": [...],
    "high": [...],
    "medium": [...],
    "low": [...]
  },
  "category_structure": {
    "max_depth": 4,
    "breadcrumbs_present": true,
    "faceted_nav_issues": [...]
  },
  "schema_validation": {
    "pages_with_schema": 42,
    "pages_without_schema": 8,
    "common_errors": [...]
  },
  "korean_marketplaces": {
    "naver_smart_store": {"found": true, "url": "..."},
    "coupang": {"found": false},
    "gmarket": {"found": false}
  },
  "score": 65,
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | E-Commerce SEO |
| Priority | Select | Based on issue severity |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: ECOM-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Schema Markup, Product, Offer)
- URLs and code remain unchanged
207 custom-skills/24-seo-ecommerce/code/scripts/base_client.py Normal file
@@ -0,0 +1,207 @@
|
|||||||
|
"""
|
||||||
|
Base Client - Shared async client utilities
|
||||||
|
===========================================
|
||||||
|
Purpose: Rate-limited async operations for API clients
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
from asyncio import Semaphore
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any, Callable, TypeVar
|
||||||
|
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
from tenacity import (
|
||||||
|
retry,
|
||||||
|
stop_after_attempt,
|
||||||
|
wait_exponential,
|
||||||
|
retry_if_exception_type,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Load environment variables
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
# Logging setup
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
|
||||||
|
T = TypeVar("T")
|
||||||
|
|
||||||
|
|
||||||
|
class RateLimiter:
|
||||||
|
"""Rate limiter using token bucket algorithm."""
|
||||||
|
|
||||||
|
def __init__(self, rate: float, per: float = 1.0):
|
||||||
|
"""
|
||||||
|
Initialize rate limiter.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
rate: Number of requests allowed
|
||||||
|
per: Time period in seconds (default: 1 second)
|
||||||
|
"""
|
||||||
|
self.rate = rate
|
||||||
|
self.per = per
|
||||||
|
self.tokens = rate
|
||||||
|
self.last_update = datetime.now()
|
||||||
|
self._lock = asyncio.Lock()
|
||||||
|
|
||||||
|
async def acquire(self) -> None:
|
||||||
|
"""Acquire a token, waiting if necessary."""
|
||||||
|
async with self._lock:
|
||||||
|
now = datetime.now()
|
||||||
|
elapsed = (now - self.last_update).total_seconds()
|
||||||
|
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
|
||||||
|
self.last_update = now
|
||||||
|
|
||||||
|
if self.tokens < 1:
|
||||||
|
wait_time = (1 - self.tokens) * (self.per / self.rate)
|
||||||
|
await asyncio.sleep(wait_time)
|
||||||
|
self.tokens = 0
|
||||||
|
else:
|
||||||
|
self.tokens -= 1
|
||||||
|
|
||||||
|
|
||||||
|
class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)

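The essential pattern in `batch_requests` is bounded concurrency with errors captured as values rather than raised. A stdlib-only sketch of that pattern (the names `bounded_gather` and `_demo` are hypothetical; the real method adds rate limiting, retries, and optional tqdm progress):

```python
import asyncio


async def bounded_gather(coros, max_concurrent: int = 5) -> list:
    """Run coroutines concurrently, at most max_concurrent in flight;
    exceptions become {"error": ...} results so one failure cannot sink the batch."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run(c):
        async with sem:
            try:
                return await c
            except Exception as e:
                return {"error": str(e)}

    # gather preserves input order, so results line up with requests
    return await asyncio.gather(*(run(c) for c in coros))


async def _demo() -> list:
    async def ok(n: int) -> int:
        await asyncio.sleep(0)
        return n * 2

    async def bad():
        raise ValueError("boom")

    return await bounded_gather([ok(1), ok(2), bad()])


results = asyncio.run(_demo())
```

The error-as-value convention matches the inner `execute` helper above, which is what lets a partial batch still produce a usable report.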
class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
1046  custom-skills/24-seo-ecommerce/code/scripts/ecommerce_auditor.py  (Normal file)
File diff suppressed because it is too large
@@ -0,0 +1,805 @@
"""
Product Schema Checker
======================
Purpose: Validate Product structured data (JSON-LD, Microdata, RDFa)
for Google and Naver rich result eligibility.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import re
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse

import aiohttp
from bs4 import BeautifulSoup
from rich.console import Console
from rich.table import Table

from base_client import BaseAsyncClient, config

logger = logging.getLogger(__name__)
console = Console()

# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

@dataclass
class SchemaProperty:
    """Single property within a schema object."""
    name: str
    value: Any
    required: bool
    valid: bool
    error: str = ""


@dataclass
class ProductSchema:
    """Validation result for one product schema on a page."""
    url: str
    schema_type: str  # Product, Offer, AggregateRating, etc.
    properties: list[dict]  # list of SchemaProperty as dicts
    is_valid: bool = False
    rich_result_eligible: bool = False
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


@dataclass
class SchemaCheckResult:
    """Complete schema check result for one or more pages."""
    urls_checked: int = 0
    pages_with_schema: int = 0
    pages_without_schema: int = 0
    schemas: list[dict] = field(default_factory=list)
    common_errors: list[str] = field(default_factory=list)
    common_warnings: list[str] = field(default_factory=list)
    naver_shopping_issues: list[dict] = field(default_factory=list)
    score: int = 0
    timestamp: str = ""

    def calculate_score(self) -> int:
        """Score 0-100 based on schema completeness."""
        if self.urls_checked == 0:
            self.score = 0
            return 0
        coverage = self.pages_with_schema / self.urls_checked
        valid_schemas = sum(1 for s in self.schemas if s.get("is_valid"))
        validity_rate = valid_schemas / max(len(self.schemas), 1)
        eligible = sum(1 for s in self.schemas if s.get("rich_result_eligible"))
        eligibility_rate = eligible / max(len(self.schemas), 1)
        self.score = int(coverage * 40 + validity_rate * 35 + eligibility_rate * 25)
        return self.score

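The score is a weighted blend: schema coverage is worth 40 points, validity 35, and rich-result eligibility 25. Isolated as a pure function (hypothetical name `weighted_score`; the dataclass method computes the three rates itself):

```python
def weighted_score(coverage: float, validity: float, eligibility: float) -> int:
    """Same 40/35/25 weighting used by SchemaCheckResult.calculate_score,
    truncated to an int exactly as int() does."""
    return int(coverage * 40 + validity * 35 + eligibility * 25)


# Perfect site: every page has schema, every schema valid and eligible.
perfect = weighted_score(1.0, 1.0, 1.0)
# Half the pages covered, 80% of schemas valid, a quarter eligible.
partial = weighted_score(0.5, 0.8, 0.25)
```

Note that `int()` truncates rather than rounds, so a site scoring 54.25 reports 54; that matches the production code's behavior.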
# ---------------------------------------------------------------------------
# Schema requirements
# ---------------------------------------------------------------------------

PRODUCT_REQUIRED = {"name", "image", "description"}
PRODUCT_RECOMMENDED = {
    "brand", "sku", "gtin", "gtin8", "gtin13", "gtin14", "mpn",
    "offers", "review", "aggregateRating", "color", "material",
}

OFFER_REQUIRED = {"price", "priceCurrency", "availability"}
OFFER_RECOMMENDED = {
    "url", "priceValidUntil", "itemCondition", "seller",
    "shippingDetails", "hasMerchantReturnPolicy",
}

AGGREGATE_RATING_REQUIRED = {"ratingValue", "reviewCount"}
AGGREGATE_RATING_RECOMMENDED = {"bestRating", "worstRating", "ratingCount"}

REVIEW_REQUIRED = {"author", "reviewRating"}
REVIEW_RECOMMENDED = {"datePublished", "reviewBody", "name"}

BREADCRUMB_REQUIRED = {"itemListElement"}

AVAILABILITY_VALUES = {
    "https://schema.org/InStock",
    "https://schema.org/OutOfStock",
    "https://schema.org/PreOrder",
    "https://schema.org/BackOrder",
    "https://schema.org/Discontinued",
    "https://schema.org/InStoreOnly",
    "https://schema.org/OnlineOnly",
    "https://schema.org/LimitedAvailability",
    "https://schema.org/SoldOut",
    "http://schema.org/InStock",
    "http://schema.org/OutOfStock",
    "http://schema.org/PreOrder",
    "http://schema.org/BackOrder",
    "http://schema.org/Discontinued",
    "InStock", "OutOfStock", "PreOrder", "BackOrder", "Discontinued",
}

ITEM_CONDITION_VALUES = {
    "https://schema.org/NewCondition",
    "https://schema.org/UsedCondition",
    "https://schema.org/RefurbishedCondition",
    "https://schema.org/DamagedCondition",
    "http://schema.org/NewCondition",
    "http://schema.org/UsedCondition",
    "http://schema.org/RefurbishedCondition",
    "NewCondition", "UsedCondition", "RefurbishedCondition",
}


# ---------------------------------------------------------------------------
# Main checker
# ---------------------------------------------------------------------------

class ProductSchemaChecker(BaseAsyncClient):
    """Validate Product structured data on e-commerce pages."""

    def __init__(
        self,
        max_concurrent: int = 10,
        requests_per_second: float = 5.0,
        timeout: int = 30,
    ):
        super().__init__(max_concurrent=max_concurrent, requests_per_second=requests_per_second)
        self.timeout = aiohttp.ClientTimeout(total=timeout)
        self.headers = {
            "User-Agent": (
                "Mozilla/5.0 (compatible; ProductSchemaChecker/1.0; "
                "+https://ourdigital.org)"
            ),
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7",
        }

    # ------------------------------------------------------------------
    # Page fetching
    # ------------------------------------------------------------------

    async def _fetch_page(self, session: aiohttp.ClientSession, url: str) -> str:
        """Fetch page HTML."""
        try:
            async with session.get(url, headers=self.headers, timeout=self.timeout,
                                   allow_redirects=True, ssl=False) as resp:
                return await resp.text(errors="replace")
        except Exception as exc:
            self.logger.warning(f"Failed to fetch {url}: {exc}")
            return ""

    # ------------------------------------------------------------------
    # Schema extraction
    # ------------------------------------------------------------------

    def extract_schemas(self, html: str, page_url: str) -> list[dict]:
        """Extract all structured data from HTML (JSON-LD, Microdata, RDFa)."""
        schemas: list[dict] = []
        soup = BeautifulSoup(html, "lxml")

        # --- JSON-LD ---
        for script in soup.find_all("script", attrs={"type": "application/ld+json"}):
            try:
                text = script.string or script.get_text()
                if not text:
                    continue
                data = json.loads(text)
                if isinstance(data, list):
                    for item in data:
                        if isinstance(item, dict):
                            schemas.append(item)
                elif isinstance(data, dict):
                    # Handle @graph
                    if "@graph" in data:
                        for item in data["@graph"]:
                            if isinstance(item, dict):
                                schemas.append(item)
                    else:
                        schemas.append(data)
            except (json.JSONDecodeError, TypeError) as exc:
                self.logger.debug(f"JSON-LD parse error on {page_url}: {exc}")

        # --- Microdata ---
        for item_scope in soup.find_all(attrs={"itemscope": True}):
            item_type = item_scope.get("itemtype", "")
            if "Product" in item_type or "Offer" in item_type:
                microdata = self._parse_microdata(item_scope)
                if microdata:
                    schemas.append(microdata)

        return schemas

    def _parse_microdata(self, element) -> dict:
        """Parse microdata from an itemscope element."""
        result: dict[str, Any] = {}
        item_type = element.get("itemtype", "")
        if item_type:
            type_name = item_type.rstrip("/").split("/")[-1]
            result["@type"] = type_name

        for prop in element.find_all(attrs={"itemprop": True}, recursive=True):
            name = prop.get("itemprop", "")
            if not name:
                continue
            # Nested itemscope
            if prop.get("itemscope") is not None:
                result[name] = self._parse_microdata(prop)
            elif prop.name == "meta":
                result[name] = prop.get("content", "")
            elif prop.name == "link":
                result[name] = prop.get("href", "")
            elif prop.name == "img":
                result[name] = prop.get("src", "")
            elif prop.name == "time":
                result[name] = prop.get("datetime", prop.get_text(strip=True))
            else:
                result[name] = prop.get_text(strip=True)

        return result

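The JSON-LD branch has to handle three payload shapes: a single object, a top-level array, and an object wrapping its nodes in `@graph`. That normalization can be shown with stdlib `json` alone (the helper `flatten_jsonld` is hypothetical; `extract_schemas` additionally locates the `<script>` tags via BeautifulSoup):

```python
import json


def flatten_jsonld(raw: str) -> list[dict]:
    """Flatten a JSON-LD payload (object, array, or @graph wrapper) to a
    flat list of dict nodes, dropping non-dict entries."""
    data = json.loads(raw)
    if isinstance(data, list):
        return [d for d in data if isinstance(d, dict)]
    if isinstance(data, dict):
        if "@graph" in data:
            return [d for d in data["@graph"] if isinstance(d, dict)]
        return [data]
    return []


payload = (
    '{"@context": "https://schema.org", "@graph": ['
    '{"@type": "Product", "name": "Widget"}, '
    '{"@type": "BreadcrumbList"}]}'
)
nodes = flatten_jsonld(payload)
```

Flattening `@graph` early means every downstream validator can assume it receives one plain node dict at a time.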
    # ------------------------------------------------------------------
    # Validation methods
    # ------------------------------------------------------------------

    def validate_product_schema(self, schema_data: dict, page_url: str) -> ProductSchema:
        """Validate a Product schema object."""
        ps = ProductSchema(
            url=page_url,
            schema_type="Product",
            properties=[],
        )

        # Check required properties
        for prop_name in PRODUCT_REQUIRED:
            value = schema_data.get(prop_name)
            valid = bool(value)
            error = "" if valid else f"Missing required property: {prop_name}"
            sp = SchemaProperty(
                name=prop_name, value=value, required=True, valid=valid, error=error,
            )
            ps.properties.append(asdict(sp))
            if not valid:
                ps.errors.append(error)

        # Check recommended properties
        for prop_name in PRODUCT_RECOMMENDED:
            value = schema_data.get(prop_name)
            sp = SchemaProperty(
                name=prop_name, value=value if value else None,
                required=False, valid=bool(value),
                error="" if value else f"Missing recommended property: {prop_name}",
            )
            ps.properties.append(asdict(sp))
            if not value:
                ps.warnings.append(f"Missing recommended property: {prop_name}")

        # Validate offers
        offers = schema_data.get("offers")
        if offers:
            if isinstance(offers, list):
                for offer in offers:
                    offer_errors = self.validate_offer_schema(offer)
                    ps.errors.extend(offer_errors["errors"])
                    ps.warnings.extend(offer_errors["warnings"])
            elif isinstance(offers, dict):
                offer_errors = self.validate_offer_schema(offers)
                ps.errors.extend(offer_errors["errors"])
                ps.warnings.extend(offer_errors["warnings"])
        else:
            ps.errors.append("Missing 'offers' property (required for rich results)")

        # Validate aggregateRating
        agg_rating = schema_data.get("aggregateRating")
        if agg_rating and isinstance(agg_rating, dict):
            rating_result = self.validate_aggregate_rating(agg_rating)
            ps.errors.extend(rating_result["errors"])
            ps.warnings.extend(rating_result["warnings"])

        # Validate reviews
        review = schema_data.get("review")
        if review:
            reviews = review if isinstance(review, list) else [review]
            for r in reviews[:5]:  # Check up to 5 reviews
                if isinstance(r, dict):
                    review_result = self.validate_review_schema(r)
                    ps.errors.extend(review_result["errors"])
                    ps.warnings.extend(review_result["warnings"])

        ps.is_valid = len(ps.errors) == 0
        ps.rich_result_eligible = self.check_rich_result_eligibility(schema_data)

        return ps

    def validate_offer_schema(self, offer_data: dict) -> dict[str, list[str]]:
        """Validate an Offer schema object."""
        errors: list[str] = []
        warnings: list[str] = []

        for prop_name in OFFER_REQUIRED:
            value = offer_data.get(prop_name)
            if not value:
                errors.append(f"Offer missing required property: {prop_name}")

        # Validate price format
        price = offer_data.get("price")
        if price is not None:
            price_str = str(price).replace(",", "").strip()
            if not re.match(r"^\d+(\.\d+)?$", price_str):
                errors.append(f"Invalid price format: '{price}' (must be numeric)")
            elif float(price_str) <= 0:
                warnings.append(f"Price is zero or negative: {price}")

        # Validate priceCurrency
        currency = offer_data.get("priceCurrency", "")
        valid_currencies = {"KRW", "USD", "EUR", "JPY", "CNY", "GBP"}
        if currency and currency.upper() not in valid_currencies:
            warnings.append(f"Unusual currency code: {currency}")

        # Validate availability
        availability = offer_data.get("availability", "")
        if availability and availability not in AVAILABILITY_VALUES:
            errors.append(
                f"Invalid availability value: '{availability}'. "
                f"Use schema.org values like https://schema.org/InStock"
            )

        # Validate itemCondition
        condition = offer_data.get("itemCondition", "")
        if condition and condition not in ITEM_CONDITION_VALUES:
            warnings.append(f"Invalid itemCondition: '{condition}'")

        # Check recommended
        for prop_name in OFFER_RECOMMENDED:
            if not offer_data.get(prop_name):
                warnings.append(f"Offer missing recommended property: {prop_name}")

        return {"errors": errors, "warnings": warnings}

    def validate_aggregate_rating(self, rating_data: dict) -> dict[str, list[str]]:
        """Validate AggregateRating schema."""
        errors: list[str] = []
        warnings: list[str] = []

        for prop_name in AGGREGATE_RATING_REQUIRED:
            value = rating_data.get(prop_name)
            if value is None:
                errors.append(f"AggregateRating missing required: {prop_name}")

        # Validate ratingValue range
        rating_value = rating_data.get("ratingValue")
        best_rating = rating_data.get("bestRating", 5)
        worst_rating = rating_data.get("worstRating", 1)
        if rating_value is not None:
            try:
                rv = float(rating_value)
                br = float(best_rating)
                wr = float(worst_rating)
                if rv < wr or rv > br:
                    errors.append(
                        f"ratingValue ({rv}) outside range [{wr}, {br}]"
                    )
            except (ValueError, TypeError):
                errors.append(f"Invalid ratingValue format: {rating_value}")

        # Validate reviewCount
        review_count = rating_data.get("reviewCount")
        if review_count is not None:
            try:
                rc = int(review_count)
                if rc < 0:
                    errors.append(f"Negative reviewCount: {rc}")
            except (ValueError, TypeError):
                errors.append(f"Invalid reviewCount format: {review_count}")

        for prop_name in AGGREGATE_RATING_RECOMMENDED:
            if not rating_data.get(prop_name):
                warnings.append(f"AggregateRating missing recommended: {prop_name}")

        return {"errors": errors, "warnings": warnings}

    def validate_review_schema(self, review_data: dict) -> dict[str, list[str]]:
        """Validate Review schema."""
        errors: list[str] = []
        warnings: list[str] = []

        # Author validation
        author = review_data.get("author")
        if not author:
            errors.append("Review missing required: author")
        elif isinstance(author, dict):
            author_name = author.get("name", "")
            if not author_name:
                errors.append("Review author missing 'name' property")
        elif isinstance(author, str):
            if len(author.strip()) == 0:
                errors.append("Review author is empty string")

        # reviewRating validation
        review_rating = review_data.get("reviewRating")
        if not review_rating:
            errors.append("Review missing required: reviewRating")
        elif isinstance(review_rating, dict):
            rv = review_rating.get("ratingValue")
            if rv is None:
                errors.append("reviewRating missing ratingValue")

        for prop_name in REVIEW_RECOMMENDED:
            if not review_data.get(prop_name):
                warnings.append(f"Review missing recommended: {prop_name}")

        return {"errors": errors, "warnings": warnings}

    def validate_breadcrumb(self, schema_data: dict) -> dict[str, list[str]]:
        """Validate BreadcrumbList schema."""
        errors: list[str] = []
        warnings: list[str] = []

        items = schema_data.get("itemListElement")
        if not items:
            errors.append("BreadcrumbList missing itemListElement")
            return {"errors": errors, "warnings": warnings}

        if not isinstance(items, list):
            errors.append("itemListElement should be an array")
            return {"errors": errors, "warnings": warnings}

        for i, item in enumerate(items):
            if not isinstance(item, dict):
                errors.append(f"Breadcrumb item {i} is not an object")
                continue
            position = item.get("position")
            if position is None:
                errors.append(f"Breadcrumb item {i} missing 'position'")
            name = item.get("name") or (
                item.get("item", {}).get("name")
                if isinstance(item.get("item"), dict) else None
            )
            if not name:
                warnings.append(f"Breadcrumb item {i} missing 'name'")

        return {"errors": errors, "warnings": warnings}

    # ------------------------------------------------------------------
    # Rich result eligibility
    # ------------------------------------------------------------------

    def check_rich_result_eligibility(self, schema_data: dict) -> bool:
        """Assess Google rich result eligibility for Product schema."""
        # Must have name, image, and offers with price
        if not schema_data.get("name"):
            return False
        if not schema_data.get("image"):
            return False

        offers = schema_data.get("offers")
        if not offers:
            return False

        offer_list = offers if isinstance(offers, list) else [offers]
        for offer in offer_list:
            if not isinstance(offer, dict):
                continue
            if offer.get("price") and offer.get("priceCurrency") and offer.get("availability"):
                return True

        return False

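The eligibility gate is a pure predicate over the node: name, image, and at least one offer carrying price, currency, and availability. Restated as a standalone function (hypothetical name `eligible`; the sample product dicts below are illustrative, not from the source):

```python
def eligible(product: dict) -> bool:
    """Rich-result gate: name, image, and at least one complete offer."""
    if not product.get("name") or not product.get("image"):
        return False
    offers = product.get("offers")
    if not offers:
        return False
    offer_list = offers if isinstance(offers, list) else [offers]
    return any(
        isinstance(o, dict)
        and o.get("price") and o.get("priceCurrency") and o.get("availability")
        for o in offer_list
    )


complete = {
    "name": "Widget",
    "image": "https://example.com/w.jpg",
    "offers": {"price": "19900", "priceCurrency": "KRW",
               "availability": "https://schema.org/InStock"},
}
missing_offer = {"name": "Widget", "image": "https://example.com/w.jpg"}
```

Keeping this separate from `is_valid` is deliberate: a schema can pass validation (no hard errors) yet still miss the extra offer fields Google needs before it will render a rich snippet.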
    # ------------------------------------------------------------------
    # Naver Shopping requirements
    # ------------------------------------------------------------------

    def check_naver_shopping_requirements(self, schema_data: dict, page_url: str) -> list[dict]:
        """Check Naver Shopping specific schema requirements."""
        issues: list[dict] = []

        # Naver Shopping requires Product name in Korean for Korean market
        name = schema_data.get("name", "")
        korean_chars = len(re.findall(r"[\uac00-\ud7af]", str(name)))
        if korean_chars == 0 and name:
            issues.append({
                "url": page_url,
                "type": "naver_product_name",
                "severity": "medium",
                "message": "Product name has no Korean characters",
                "recommendation": "Include Korean product name for Naver Shopping visibility.",
            })

        # Naver prefers specific category mapping
        if not schema_data.get("category"):
            issues.append({
                "url": page_url,
                "type": "naver_category",
                "severity": "low",
                "message": "Missing 'category' property for Naver Shopping categorization",
                "recommendation": "Add category property matching Naver Shopping category taxonomy.",
            })

        # Naver requires image
        image = schema_data.get("image")
        if not image:
            issues.append({
                "url": page_url,
                "type": "naver_image",
                "severity": "high",
                "message": "Missing product image (required for Naver Shopping)",
                "recommendation": "Add at least one high-quality product image URL.",
            })
        elif isinstance(image, str):
            if not image.startswith("http"):
                issues.append({
                    "url": page_url,
                    "type": "naver_image_url",
                    "severity": "medium",
                    "message": "Product image URL is relative (should be absolute)",
                    "recommendation": "Use absolute URLs for product images.",
                })

        # Naver requires price in KRW
        offers = schema_data.get("offers")
        if offers:
            offer_list = offers if isinstance(offers, list) else [offers]
            for offer in offer_list:
                if isinstance(offer, dict):
                    currency = offer.get("priceCurrency", "")
                    if currency and currency.upper() != "KRW":
                        issues.append({
                            "url": page_url,
                            "type": "naver_currency",
                            "severity": "medium",
                            "message": f"Price currency is {currency}, not KRW",
                            "recommendation": "For Naver Shopping, provide price in KRW.",
                        })

        # Check brand/manufacturer
        if not schema_data.get("brand") and not schema_data.get("manufacturer"):
            issues.append({
                "url": page_url,
                "type": "naver_brand",
                "severity": "low",
                "message": "Missing brand/manufacturer (helpful for Naver Shopping filters)",
                "recommendation": "Add brand or manufacturer property.",
            })

        return issues

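The Korean-name heuristic works by counting characters in the Hangul Syllables Unicode block (U+AC00 to U+D7AF). Pulled out on its own (hypothetical helper `korean_char_count`; the example product names are illustrative):

```python
import re

# Hangul Syllables block -- composed Korean characters like 가..힣
HANGUL = re.compile(r"[\uac00-\ud7af]")


def korean_char_count(text: str) -> int:
    """Count Hangul syllables, the same range the Naver name check uses."""
    return len(HANGUL.findall(text))


mixed = korean_char_count("무선 이어폰 Pro")     # Korean words plus a Latin suffix
latin = korean_char_count("Wireless Earbuds")   # no Hangul at all
```

A limitation worth knowing: this range covers composed syllables only, so isolated jamo (U+1100 block) would not be counted, though product names virtually always use composed forms.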
    # ------------------------------------------------------------------
    # Orchestrator
    # ------------------------------------------------------------------

    async def check(
        self,
        urls: list[str] | None = None,
        sitemap_url: str | None = None,
        sample_size: int = 50,
    ) -> SchemaCheckResult:
        """Run schema validation on URLs or sitemap."""
        result = SchemaCheckResult(timestamp=datetime.now().isoformat())
        target_urls: list[str] = []

        async with aiohttp.ClientSession() as session:
            if sitemap_url:
                # Fetch URLs from sitemap
                target_urls = await self._urls_from_sitemap(session, sitemap_url, sample_size)
            if urls:
                target_urls.extend(urls)

            target_urls = list(set(target_urls))[:sample_size]
            result.urls_checked = len(target_urls)
            self.logger.info(f"Checking {len(target_urls)} URLs for Product schema")

            error_counter: dict[str, int] = {}
            warning_counter: dict[str, int] = {}

            for url in target_urls:
                html = await self._fetch_page(session, url)
                if not html:
                    result.pages_without_schema += 1
                    continue

                schemas = self.extract_schemas(html, url)
                product_schemas = [
                    s for s in schemas
                    if self._get_schema_type(s) in ("Product", "ProductGroup")
                ]
                breadcrumb_schemas = [
                    s for s in schemas
                    if self._get_schema_type(s) == "BreadcrumbList"
                ]

                if not product_schemas:
                    result.pages_without_schema += 1
                    continue

                result.pages_with_schema += 1

                for ps_data in product_schemas:
                    ps = self.validate_product_schema(ps_data, url)
                    result.schemas.append(asdict(ps))

                    for err in ps.errors:
                        error_counter[err] = error_counter.get(err, 0) + 1
                    for warn in ps.warnings:
                        warning_counter[warn] = warning_counter.get(warn, 0) + 1

                    # Naver Shopping checks
                    naver_issues = self.check_naver_shopping_requirements(ps_data, url)
                    result.naver_shopping_issues.extend(naver_issues)

                # Validate breadcrumbs
                for bc_data in breadcrumb_schemas:
                    bc_result = self.validate_breadcrumb(bc_data)
                    for err in bc_result["errors"]:
                        error_counter[err] = error_counter.get(err, 0) + 1

        # Aggregate common errors/warnings
        result.common_errors = sorted(
            error_counter.keys(),
            key=lambda k: error_counter[k],
            reverse=True,
        )[:20]
        result.common_warnings = sorted(
            warning_counter.keys(),
            key=lambda k: warning_counter[k],
            reverse=True,
        )[:20]

        result.calculate_score()
        return result

    async def _urls_from_sitemap(
        self,
        session: aiohttp.ClientSession,
        sitemap_url: str,
        limit: int,
    ) -> list[str]:
        """Fetch product URLs from sitemap."""
        urls: list[str] = []
        try:
            async with session.get(sitemap_url, headers=self.headers,
                                   timeout=self.timeout, ssl=False) as resp:
                if resp.status != 200:
                    return urls
                text = await resp.text(errors="replace")
                soup = BeautifulSoup(text, "lxml-xml")

                # Handle sitemap index
                sitemapindex = soup.find_all("sitemap")
                if sitemapindex:
                    for sm in sitemapindex[:3]:
                        loc = sm.find("loc")
                        if loc:
                            child_urls = await self._urls_from_sitemap(session, loc.text.strip(), limit)
                            urls.extend(child_urls)
                        if len(urls) >= limit:
                            break
                else:
                    for tag in soup.find_all("url"):
                        loc = tag.find("loc")
                        if loc:
                            urls.append(loc.text.strip())
                        if len(urls) >= limit:
                            break
        except Exception as exc:
            self.logger.warning(f"Sitemap parse failed: {exc}")

        return urls[:limit]

@staticmethod
|
||||||
|
def _get_schema_type(schema: dict) -> str:
|
||||||
|
"""Get the @type from a schema dict, handling various formats."""
|
||||||
|
schema_type = schema.get("@type", "")
|
||||||
|
if isinstance(schema_type, list):
|
||||||
|
return schema_type[0] if schema_type else ""
|
||||||
|
return str(schema_type)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CLI output helpers
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def print_rich_report(result: SchemaCheckResult) -> None:
|
||||||
|
"""Print a rich-formatted report to the console."""
|
||||||
|
console.print(f"\n[bold cyan]Product Schema Validation Report[/bold cyan]")
|
||||||
|
console.print(f"Timestamp: {result.timestamp}")
|
||||||
|
console.print(f"URLs checked: {result.urls_checked}")
|
||||||
|
|
||||||
|
# Coverage
|
||||||
|
coverage = (result.pages_with_schema / max(result.urls_checked, 1)) * 100
|
||||||
|
cov_color = "green" if coverage >= 90 else "yellow" if coverage >= 50 else "red"
|
||||||
|
console.print(f"Schema coverage: [{cov_color}]{coverage:.0f}%[/{cov_color}] "
|
||||||
|
f"({result.pages_with_schema}/{result.urls_checked})")
|
||||||
|
|
||||||
|
# Score
|
||||||
|
score_color = "green" if result.score >= 80 else "yellow" if result.score >= 50 else "red"
|
||||||
|
console.print(f"[bold {score_color}]Score: {result.score}/100[/bold {score_color}]")
|
||||||
|
|
||||||
|
# Validity summary
|
||||||
|
valid = sum(1 for s in result.schemas if s.get("is_valid"))
|
||||||
|
eligible = sum(1 for s in result.schemas if s.get("rich_result_eligible"))
|
||||||
|
total = len(result.schemas)
|
||||||
|
|
||||||
|
table = Table(title="Schema Summary")
|
||||||
|
table.add_column("Metric", style="bold")
|
||||||
|
table.add_column("Value", justify="right")
|
||||||
|
table.add_row("Total schemas found", str(total))
|
||||||
|
table.add_row("Valid schemas", str(valid))
|
||||||
|
table.add_row("Rich result eligible", str(eligible))
|
||||||
|
table.add_row("Pages without schema", str(result.pages_without_schema))
|
||||||
|
console.print(table)
|
||||||
|
|
||||||
|
# Common errors
|
||||||
|
if result.common_errors:
|
||||||
|
console.print(f"\n[bold red]Common Errors ({len(result.common_errors)}):[/bold red]")
|
||||||
|
for err in result.common_errors[:10]:
|
||||||
|
console.print(f" [red]-[/red] {err}")
|
||||||
|
|
||||||
|
# Common warnings
|
||||||
|
if result.common_warnings:
|
||||||
|
console.print(f"\n[bold yellow]Common Warnings ({len(result.common_warnings)}):[/bold yellow]")
|
||||||
|
for warn in result.common_warnings[:10]:
|
||||||
|
console.print(f" [yellow]-[/yellow] {warn}")
|
||||||
|
|
||||||
|
# Naver Shopping issues
|
||||||
|
if result.naver_shopping_issues:
|
||||||
|
console.print(f"\n[bold magenta]Naver Shopping Issues ({len(result.naver_shopping_issues)}):[/bold magenta]")
|
||||||
|
seen: set[str] = set()
|
||||||
|
for issue in result.naver_shopping_issues:
|
||||||
|
key = f"{issue['type']}:{issue['message']}"
|
||||||
|
if key not in seen:
|
||||||
|
seen.add(key)
|
||||||
|
console.print(f" [{issue.get('severity', 'medium')}] {issue['message']}")
|
||||||
|
console.print(f" [dim]{issue['recommendation']}[/dim]")
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Main
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Product Schema Checker - Validate e-commerce structured data",
|
||||||
|
)
|
||||||
|
group = parser.add_mutually_exclusive_group(required=True)
|
||||||
|
group.add_argument("--url", nargs="+", help="Product page URL(s) to validate")
|
||||||
|
group.add_argument("--sitemap", help="Sitemap URL to fetch product pages from")
|
||||||
|
parser.add_argument(
|
||||||
|
"--sample",
|
||||||
|
type=int,
|
||||||
|
default=50,
|
||||||
|
help="Max URLs to check from sitemap (default: 50)",
|
||||||
|
)
|
||||||
|
parser.add_argument("--json", action="store_true", help="Output as JSON")
|
||||||
|
parser.add_argument("--output", type=str, help="Save output to file")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
checker = ProductSchemaChecker()
|
||||||
|
result = asyncio.run(
|
||||||
|
checker.check(
|
||||||
|
urls=args.url,
|
||||||
|
sitemap_url=args.sitemap,
|
||||||
|
sample_size=args.sample,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
if args.json:
|
||||||
|
output = json.dumps(asdict(result), indent=2, ensure_ascii=False, default=str)
|
||||||
|
if args.output:
|
||||||
|
with open(args.output, "w", encoding="utf-8") as f:
|
||||||
|
f.write(output)
|
||||||
|
console.print(f"[green]Results saved to {args.output}[/green]")
|
||||||
|
else:
|
||||||
|
print(output)
|
||||||
|
else:
|
||||||
|
print_rich_report(result)
|
||||||
|
if args.output:
|
||||||
|
output = json.dumps(asdict(result), indent=2, ensure_ascii=False, default=str)
|
||||||
|
with open(args.output, "w", encoding="utf-8") as f:
|
||||||
|
f.write(output)
|
||||||
|
console.print(f"\n[green]JSON results also saved to {args.output}[/green]")
|
||||||
|
|
||||||
|
checker.print_stats()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
@@ -0,0 +1,9 @@
# 24-seo-ecommerce dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
custom-skills/24-seo-ecommerce/desktop/SKILL.md (new file, 156 lines)
@@ -0,0 +1,156 @@
---
name: seo-ecommerce
description: |
  E-commerce SEO audit and optimization for product pages, product schema, category taxonomy,
  and Korean marketplace presence.
  Triggers: product SEO, e-commerce audit, product schema, category SEO, Smart Store, marketplace SEO,
  상품 SEO, 이커머스 감사, 쇼핑몰 SEO.
---

# E-Commerce SEO Audit

## Purpose

Audit e-commerce sites for product page optimization, structured data validation, category taxonomy health, duplicate content issues, and Korean marketplace presence (Naver Smart Store, Coupang, Gmarket, 11번가).

## Core Capabilities

1. **Product Page SEO Audit** - Title, meta description, H1, image alt text, internal links, canonical tags
2. **Product Schema Validation** - Product, Offer, AggregateRating, Review, BreadcrumbList structured data
3. **Category Taxonomy Analysis** - Depth check, breadcrumbs, faceted navigation handling
4. **Duplicate Content Detection** - Parameter variants, product variants, pagination issues
5. **Korean Marketplace Presence** - Naver Smart Store, Coupang, Gmarket, 11번가

## MCP Tool Usage

### Ahrefs for Product Page Discovery

```
mcp__ahrefs__site-explorer-pages-by-traffic: Identify top product and category pages
mcp__ahrefs__site-explorer-organic-keywords: Product page keyword performance
```

### WebSearch for Marketplace Checks

```
WebSearch: Search for brand presence on Korean marketplaces
WebFetch: Fetch and analyze marketplace listing pages
```

### Notion for Report Storage

```
mcp__notion__notion-create-pages: Save audit report to SEO Audit Log database
```

## Workflow

### 1. Product Page Audit

1. Discover product pages via Ahrefs pages-by-traffic or sitemap
2. For each product page check:
   - Title tag: contains product name, under 60 chars
   - Meta description: includes price/feature info, under 155 chars
   - Single H1 with product name
   - All product images have descriptive alt text
   - Canonical tag present and correct
   - Sufficient internal links (related products, breadcrumbs)
   - Open Graph tags for social sharing
3. Score severity: critical/high/medium/low
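The per-page checks in step 2 translate directly into code. Below is a minimal sketch assuming a fetched HTML string parsed with BeautifulSoup; the `audit_product_page` helper and its issue dictionaries are illustrative, not this skill's actual script:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def audit_product_page(html: str) -> list[dict]:
    """Run a few of the on-page checks from the workflow above (sketch)."""
    soup = BeautifulSoup(html, "html.parser")
    issues = []

    # Title tag: present and under 60 chars
    title = soup.title.get_text(strip=True) if soup.title else ""
    if not title:
        issues.append({"severity": "critical", "issue": "missing title"})
    elif len(title) > 60:
        issues.append({"severity": "medium", "issue": "title over 60 chars"})

    # Exactly one H1
    h1s = soup.find_all("h1")
    if len(h1s) != 1:
        issues.append({"severity": "high", "issue": f"{len(h1s)} H1 tags (expected 1)"})

    # Canonical tag present
    if not soup.find("link", rel="canonical"):
        issues.append({"severity": "high", "issue": "missing canonical tag"})

    # All images carry alt text
    missing_alt = [img for img in soup.find_all("img") if not img.get("alt")]
    if missing_alt:
        issues.append({"severity": "high", "issue": f"{len(missing_alt)} images without alt text"})

    return issues

html = "<html><head><title>Widget</title></head><body><h1>Widget</h1><img src='w.jpg'></body></html>"
print(audit_product_page(html))
```

The sample page above would be flagged for a missing canonical tag and an image without alt text.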
### 2. Product Schema Validation

1. Extract JSON-LD and Microdata from product pages
2. Validate Product type: name, image, description (required)
3. Validate Offer: price, priceCurrency, availability (required)
4. Validate AggregateRating: ratingValue, reviewCount (required)
5. Validate Review: author, reviewRating (required)
6. Check BreadcrumbList implementation
7. Assess Google rich result eligibility
8. Check Naver Shopping specific requirements (Korean name, KRW price, absolute image URLs)
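Steps 2-5 boil down to checking required properties per schema type. A minimal sketch; the field lists mirror the requirements stated above, and the `missing_fields` helper is illustrative rather than the checker's real code:

```python
# Required properties per schema type (as listed in the validation steps above)
REQUIRED = {
    "Product": ["name", "image", "description"],
    "Offer": ["price", "priceCurrency", "availability"],
    "AggregateRating": ["ratingValue", "reviewCount"],
    "Review": ["author", "reviewRating"],
}

def missing_fields(schema: dict) -> list[str]:
    """Return required properties absent (or empty) in a JSON-LD schema dict."""
    stype = schema.get("@type", "")
    if isinstance(stype, list):  # @type may be a list in valid JSON-LD
        stype = stype[0] if stype else ""
    return [f for f in REQUIRED.get(stype, []) if not schema.get(f)]

product = {"@type": "Product", "name": "Widget", "image": "https://ex.com/w.jpg"}
print(missing_fields(product))  # -> ['description']
```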
### 3. Category Taxonomy Analysis

1. Crawl category pages from sitemap or homepage navigation
2. Measure taxonomy depth (warn if > 4 levels)
3. Check breadcrumb presence on every category page
4. Identify faceted navigation URLs that are indexable without proper canonicals
5. Count child category links for structure assessment
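The depth warning in step 2 can be approximated from URL structure alone. A sketch, assuming depth equals the number of non-empty path segments (sites with non-hierarchical URLs would need the breadcrumb trail instead):

```python
from urllib.parse import urlparse

def category_depth(url: str) -> int:
    """Depth = number of non-empty path segments in the category URL."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])

urls = [
    "https://shop.example.com/men/shoes/running/trail/waterproof",
    "https://shop.example.com/women/bags",
]
# Flag anything deeper than 4 levels, per step 2 above
too_deep = [u for u in urls if category_depth(u) > 4]
print(too_deep)  # -> ['https://shop.example.com/men/shoes/running/trail/waterproof']
```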
### 4. Duplicate Content Detection

1. Group URLs by base path (stripping query parameters)
2. Identify parameter variants (?color=, ?size=, ?sort=)
3. Detect product variant URL duplicates (e.g., /product-red vs /product-blue)
4. Flag paginated pages missing self-referencing canonicals
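Steps 1-2 (group by base path, surface parameter variants) can be sketched with the standard library; `group_variants` is an illustrative helper, not the skill's implementation:

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_variants(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by scheme://host/path; bases with >1 URL are duplicate candidates."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for url in urls:
        p = urlparse(url)
        # Strip query string and fragment by rebuilding from the parsed parts
        groups[f"{p.scheme}://{p.netloc}{p.path}"].append(url)
    return {base: variants for base, variants in groups.items() if len(variants) > 1}

urls = [
    "https://ex.com/shirt?color=red",
    "https://ex.com/shirt?color=blue",
    "https://ex.com/pants",
]
print(group_variants(urls))
```

Here only `https://ex.com/shirt` survives the filter, with its two color variants listed; each such group should then be checked for a shared canonical.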
### 5. Korean Marketplace Presence

1. Extract brand name from site (og:site_name or title)
2. Search each marketplace for brand products:
   - Naver Smart Store (smartstore.naver.com)
   - Coupang (coupang.com)
   - Gmarket (gmarket.co.kr)
   - 11번가 (11st.co.kr)
3. Check Naver Smart Store-specific SEO elements
4. Verify naver-site-verification meta tag
5. Check Korean content ratio for Naver visibility
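Step 4, the verification-tag check, is a one-tag lookup. A minimal sketch with BeautifulSoup on a hypothetical HTML snippet:

```python
from bs4 import BeautifulSoup

def has_naver_verification(html: str) -> bool:
    """Check for a non-empty naver-site-verification meta tag."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": "naver-site-verification"})
    return bool(tag and tag.get("content"))

html = '<head><meta name="naver-site-verification" content="abc123"></head>'
print(has_naver_verification(html))  # -> True
```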
## Output Format

```markdown
## E-Commerce SEO Audit: [domain]

### Score: [0-100]/100

### Product Page Issues
- **Critical**: [count] issues
- **High**: [count] issues
- **Medium**: [count] issues
- **Low**: [count] issues

#### Top Issues
1. [severity] [issue_type] - [message]
   Recommendation: [fix]

### Category Structure
- Categories found: [count]
- Max depth: [number]
- Breadcrumbs present: [count]
- Faceted navigation issues: [count]

### Schema Validation
- Pages with schema: [count]/[total]
- Valid schemas: [count]
- Rich result eligible: [count]
- Common errors: [list]

### Korean Marketplaces
- Naver Smart Store: [Found/Not Found]
- Coupang: [Found/Not Found]
- Gmarket: [Found/Not Found]
- 11번가: [Found/Not Found]

### Recommendations
1. [Priority fixes ordered by impact]
```

## Common Issues

| Issue | Impact | Fix |
|-------|--------|-----|
| Missing Product schema | High | Add JSON-LD Product with offers |
| No canonical on product variants | High | Add self-referencing canonical |
| Images without alt text | High | Add product name to alt text |
| Category depth > 4 levels | Medium | Flatten taxonomy |
| Missing breadcrumbs | Medium | Add BreadcrumbList schema and visible nav |
| Faceted nav creating duplicates | High | Use canonical or noindex on filtered pages |
| Missing Naver verification | Medium | Add naver-site-verification meta tag |
| Price not in KRW for Korean market | Medium | Add KRW pricing to schema |

## Limitations

- Cannot access logged-in areas (member-only products)
- Marketplace search results may vary by region/IP
- Large catalogs require sampling (default 50 pages)
- Cannot validate JavaScript-rendered product content without a headless browser

## Notion Output (Required)

All audit reports MUST be saved to the OurDigital SEO Audit Log:

- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (E-Commerce SEO), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: ECOM-YYYYMMDD-NNN
custom-skills/24-seo-ecommerce/desktop/skill.yaml (new file, 8 lines)
@@ -0,0 +1,8 @@
name: seo-ecommerce
description: |
  E-commerce SEO audit and optimization. Triggers: product SEO, e-commerce audit, product schema, category SEO, Smart Store, marketplace SEO.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
custom-skills/24-seo-ecommerce/desktop/tools/ahrefs.md (new file, 15 lines)
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
custom-skills/24-seo-ecommerce/desktop/tools/notion.md (new file, 15 lines)
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
custom-skills/24-seo-ecommerce/desktop/tools/websearch.md (new file, 15 lines)
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
custom-skills/25-seo-kpi-framework/code/CLAUDE.md (new file, 148 lines)
@@ -0,0 +1,148 @@
# CLAUDE.md

## Overview

SEO KPI and performance framework for unified metrics aggregation across all SEO dimensions. Establishes baselines, sets targets (30/60/90-day), generates executive summaries with health scores, provides tactical breakdowns, estimates ROI using Ahrefs traffic cost, and supports period-over-period comparison (MoM, QoQ).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Aggregate KPIs
python scripts/kpi_aggregator.py --url https://example.com --json

# Generate performance report
python scripts/performance_reporter.py --url https://example.com --period monthly --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `kpi_aggregator.py` | Aggregate KPIs across all SEO dimensions | Unified KPI dashboard, health score, baselines |
| `performance_reporter.py` | Generate period-over-period performance reports | Trend analysis, executive summary, tactical breakdown |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## KPI Aggregator

```bash
# Full KPI aggregation
python scripts/kpi_aggregator.py --url https://example.com --json

# Set baselines
python scripts/kpi_aggregator.py --url https://example.com --set-baseline --json

# Compare against baseline
python scripts/kpi_aggregator.py --url https://example.com --baseline baseline.json --json

# With ROI estimation
python scripts/kpi_aggregator.py --url https://example.com --roi --json
```

**Capabilities**:
- Unified KPI taxonomy across 7 dimensions:
  - Traffic KPIs (organic sessions, organic traffic value, traffic trend)
  - Ranking KPIs (visibility score, avg position, top10 keywords count)
  - Engagement KPIs (bounce rate, pages/session, avg session duration)
  - Technical KPIs (crawl errors, page speed score, mobile usability)
  - Content KPIs (indexed pages, content freshness score, thin content ratio)
  - Link KPIs (domain rating, referring domains, link velocity)
  - Local KPIs (GBP visibility, review score, citation accuracy)
- Multi-source data aggregation from Ahrefs and other skill outputs
- Baseline establishment and target setting (30/60/90-day)
- Overall health score (0-100) with weighted dimensions
- ROI estimation using Ahrefs organic traffic cost
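The weighted health score named in the capabilities list is, in essence, a weighted average of per-dimension scores. A sketch of that idea; the dimension weights here are hypothetical placeholders, not the values actually used by `kpi_aggregator.py`:

```python
# Hypothetical dimension weights (sum to 1.0); the real weights live in kpi_aggregator.py
WEIGHTS = {
    "traffic": 0.25, "rankings": 0.20, "links": 0.15, "technical": 0.15,
    "content": 0.10, "engagement": 0.10, "local": 0.05,
}

def health_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each 0-100)."""
    total = sum(WEIGHTS[d] * dimension_scores.get(d, 0.0) for d in WEIGHTS)
    return round(total, 1)

scores = {"traffic": 80, "rankings": 70, "links": 60, "technical": 85,
          "content": 70, "engagement": 65, "local": 80}
print(health_score(scores))
```

A missing dimension contributes zero, which deliberately drags the overall score down rather than silently renormalizing the weights.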

## Performance Reporter

```bash
# Monthly report
python scripts/performance_reporter.py --url https://example.com --period monthly --json

# Quarterly report
python scripts/performance_reporter.py --url https://example.com --period quarterly --json

# Custom date range
python scripts/performance_reporter.py --url https://example.com --from 2025-01-01 --to 2025-03-31 --json

# Executive summary only
python scripts/performance_reporter.py --url https://example.com --period monthly --executive --json
```

**Capabilities**:
- Executive summary generation (health score, trend arrows, key wins/concerns)
- Period-over-period comparison (MoM, QoQ, YoY)
- Trend direction indicators (up = improving, down = declining, stable)
- Top wins and concerns identification
- Tactical breakdown with actionable next steps
- Target vs. actual comparison with progress %
- Traffic value change (ROI proxy)

## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-metrics` | Current organic metrics |
| `site-explorer-metrics-history` | Historical metrics trends |
| `site-explorer-metrics-by-country` | Country-level breakdown |
| `site-explorer-domain-rating-history` | DR trend over time |
| `site-explorer-total-search-volume-history` | Total keyword volume trend |

## Output Format

```json
{
  "url": "https://example.com",
  "health_score": 72,
  "health_trend": "improving",
  "kpis": {
    "traffic": {"organic_traffic": 15000, "traffic_value_usd": 45000, "trend": "up"},
    "rankings": {"visibility_score": 68, "avg_position": 18.5, "top10_count": 48},
    "links": {"domain_rating": 45, "referring_domains": 850, "velocity": "+15/month"},
    "technical": {"health_score": 85, "crawl_errors": 12},
    "content": {"indexed_pages": 320, "freshness_score": 70},
    "engagement": {"bounce_rate": 45, "pages_per_session": 2.8},
    "local": {"gbp_visibility": 80, "review_score": 4.5}
  },
  "targets": {
    "30_day": {},
    "60_day": {},
    "90_day": {}
  },
  "executive_summary": {
    "top_wins": [],
    "top_concerns": [],
    "recommendations": []
  },
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | SEO KPI & Performance |
| Priority | Select | Based on health score trend |
| Found Date | Date | Report date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: KPI-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., KPI, ROI, Domain Rating, Visibility Score)
- URLs and code remain unchanged
custom-skills/25-seo-kpi-framework/code/scripts/base_client.py (new file, 207 lines)
@@ -0,0 +1,207 @@
"""
|
||||||
|
Base Client - Shared async client utilities
|
||||||
|
===========================================
|
||||||
|
Purpose: Rate-limited async operations for API clients
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
from asyncio import Semaphore
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any, Callable, TypeVar
|
||||||
|
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
from tenacity import (
|
||||||
|
retry,
|
||||||
|
stop_after_attempt,
|
||||||
|
wait_exponential,
|
||||||
|
retry_if_exception_type,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Load environment variables
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
# Logging setup
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
|
||||||
|
T = TypeVar("T")
|
||||||
|
|
||||||
|
|
||||||
|
class RateLimiter:
|
||||||
|
"""Rate limiter using token bucket algorithm."""
|
||||||
|
|
||||||
|
def __init__(self, rate: float, per: float = 1.0):
|
||||||
|
"""
|
||||||
|
Initialize rate limiter.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
rate: Number of requests allowed
|
||||||
|
per: Time period in seconds (default: 1 second)
|
||||||
|
"""
|
||||||
|
self.rate = rate
|
||||||
|
self.per = per
|
||||||
|
self.tokens = rate
|
||||||
|
self.last_update = datetime.now()
|
||||||
|
self._lock = asyncio.Lock()
|
||||||
|
|
||||||
|
async def acquire(self) -> None:
|
||||||
|
"""Acquire a token, waiting if necessary."""
|
||||||
|
async with self._lock:
|
||||||
|
now = datetime.now()
|
||||||
|
elapsed = (now - self.last_update).total_seconds()
|
||||||
|
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
|
||||||
|
self.last_update = now
|
||||||
|
|
||||||
|
if self.tokens < 1:
|
||||||
|
wait_time = (1 - self.tokens) * (self.per / self.rate)
|
||||||
|
await asyncio.sleep(wait_time)
|
||||||
|
self.tokens = 0
|
||||||
|
else:
|
||||||
|
self.tokens -= 1
|
||||||
|
|
||||||
|
|
||||||
|
class BaseAsyncClient:
|
||||||
|
"""Base class for async API clients with rate limiting."""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
max_concurrent: int = 5,
|
||||||
|
requests_per_second: float = 3.0,
|
||||||
|
logger: logging.Logger | None = None,
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Initialize base client.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
max_concurrent: Maximum concurrent requests
|
||||||
|
requests_per_second: Rate limit
|
||||||
|
logger: Logger instance
|
||||||
|
"""
|
||||||
|
self.semaphore = Semaphore(max_concurrent)
|
||||||
|
self.rate_limiter = RateLimiter(requests_per_second)
|
||||||
|
self.logger = logger or logging.getLogger(self.__class__.__name__)
|
||||||
|
self.stats = {
|
||||||
|
"requests": 0,
|
||||||
|
"success": 0,
|
||||||
|
"errors": 0,
|
||||||
|
"retries": 0,
|
||||||
|
}
|
||||||
|
|
||||||
|
@retry(
|
||||||
|
stop=stop_after_attempt(3),
|
||||||
|
wait=wait_exponential(multiplier=1, min=2, max=10),
|
||||||
|
retry=retry_if_exception_type(Exception),
|
||||||
|
)
|
||||||
|
async def _rate_limited_request(
|
||||||
|
self,
|
||||||
|
coro: Callable[[], Any],
|
||||||
|
) -> Any:
|
||||||
|
"""Execute a request with rate limiting and retry."""
|
||||||
|
async with self.semaphore:
|
||||||
|
await self.rate_limiter.acquire()
|
||||||
|
self.stats["requests"] += 1
|
||||||
|
try:
|
||||||
|
result = await coro()
|
||||||
|
self.stats["success"] += 1
|
||||||
|
return result
|
||||||
|
except Exception as e:
|
||||||
|
self.stats["errors"] += 1
|
||||||
|
self.logger.error(f"Request failed: {e}")
|
||||||
|
raise
|
||||||
|
|
||||||
|
async def batch_requests(
|
||||||
|
self,
|
||||||
|
requests: list[Callable[[], Any]],
|
||||||
|
desc: str = "Processing",
|
||||||
|
) -> list[Any]:
|
||||||
|
"""Execute multiple requests concurrently."""
|
||||||
|
try:
|
||||||
|
from tqdm.asyncio import tqdm
|
||||||
|
has_tqdm = True
|
||||||
|
except ImportError:
|
||||||
|
has_tqdm = False
|
||||||
|
|
||||||
|
async def execute(req: Callable) -> Any:
|
||||||
|
try:
|
||||||
|
return await self._rate_limited_request(req)
|
||||||
|
except Exception as e:
|
||||||
|
return {"error": str(e)}
|
||||||
|
|
||||||
|
tasks = [execute(req) for req in requests]
|
||||||
|
|
||||||
|
if has_tqdm:
|
||||||
|
results = []
|
||||||
|
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
|
||||||
|
result = await coro
|
||||||
|
results.append(result)
|
||||||
|
return results
|
||||||
|
else:
|
||||||
|
return await asyncio.gather(*tasks, return_exceptions=True)
|
||||||
|
|
||||||
|
def print_stats(self) -> None:
|
||||||
|
"""Print request statistics."""
|
||||||
|
self.logger.info("=" * 40)
|
||||||
|
self.logger.info("Request Statistics:")
|
||||||
|
self.logger.info(f" Total Requests: {self.stats['requests']}")
|
||||||
|
self.logger.info(f" Successful: {self.stats['success']}")
|
||||||
|
self.logger.info(f" Errors: {self.stats['errors']}")
|
||||||
|
self.logger.info("=" * 40)
|
||||||
|
|
||||||
|
|
||||||
|
class ConfigManager:
|
||||||
|
"""Manage API configuration and credentials."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
@property
|
||||||
|
def google_credentials_path(self) -> str | None:
|
||||||
|
"""Get Google service account credentials path."""
|
||||||
|
# Prefer SEO-specific credentials, fallback to general credentials
|
||||||
|
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
|
||||||
|
if os.path.exists(seo_creds):
|
||||||
|
return seo_creds
|
||||||
|
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def pagespeed_api_key(self) -> str | None:
|
||||||
|
"""Get PageSpeed Insights API key."""
|
||||||
|
return os.getenv("PAGESPEED_API_KEY")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def custom_search_api_key(self) -> str | None:
|
||||||
|
"""Get Custom Search API key."""
|
||||||
|
return os.getenv("CUSTOM_SEARCH_API_KEY")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def custom_search_engine_id(self) -> str | None:
|
||||||
|
"""Get Custom Search Engine ID."""
|
||||||
|
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def notion_token(self) -> str | None:
|
||||||
|
"""Get Notion API token."""
|
||||||
|
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
|
||||||
|
|
||||||
|
def validate_google_credentials(self) -> bool:
|
||||||
|
"""Validate Google credentials are configured."""
|
||||||
|
creds_path = self.google_credentials_path
|
||||||
|
if not creds_path:
|
||||||
|
return False
|
||||||
|
return os.path.exists(creds_path)
|
||||||
|
|
||||||
|
def get_required(self, key: str) -> str:
|
||||||
|
"""Get required environment variable or raise error."""
|
||||||
|
value = os.getenv(key)
|
||||||
|
if not value:
|
||||||
|
raise ValueError(f"Missing required environment variable: {key}")
|
||||||
|
return value
|
||||||
|
|
||||||
|
|
||||||
|
# Singleton config instance
|
||||||
|
config = ConfigManager()
|
||||||
@@ -0,0 +1,758 @@
|
|||||||
|
"""
|
||||||
|
KPI Aggregator - Unified SEO KPI aggregation across all dimensions
|
||||||
|
==================================================================
|
||||||
|
Purpose: Aggregate KPIs from Ahrefs and other sources into a unified
|
||||||
|
dashboard with health scores, baselines, targets, and ROI.
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Data classes
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class KpiMetric:
|
||||||
|
"""Single KPI metric with trend and target information."""
|
||||||
|
name: str
|
||||||
|
value: float
|
||||||
|
previous_value: float | None = None
|
||||||
|
change_pct: float | None = None
|
||||||
|
trend: str = "stable" # up, down, stable
|
||||||
|
target_30d: float | None = None
|
||||||
|
target_60d: float | None = None
|
||||||
|
target_90d: float | None = None
|
||||||
|
|
||||||
|
def compute_trend(self) -> None:
|
||||||
|
"""Compute trend direction and change percentage."""
|
||||||
|
if self.previous_value is not None and self.previous_value != 0:
|
||||||
|
self.change_pct = round(
|
||||||
|
((self.value - self.previous_value) / abs(self.previous_value)) * 100, 2
|
||||||
|
)
|
||||||
|
if self.change_pct > 2.0:
|
||||||
|
self.trend = "up"
|
||||||
|
elif self.change_pct < -2.0:
|
||||||
|
self.trend = "down"
|
||||||
|
else:
|
||||||
|
self.trend = "stable"
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class KpiDimension:
    """A dimension grouping multiple KPI metrics."""
    name: str
    metrics: list[KpiMetric] = field(default_factory=list)
    weight: float = 0.0
    score: float = 0.0

    def compute_score(self) -> float:
        """Compute dimension score (0-100) based on metrics health."""
        if not self.metrics:
            self.score = 0.0
            return self.score
        metric_scores = []
        for m in self.metrics:
            if m.trend == "up":
                metric_scores.append(80.0)
            elif m.trend == "stable":
                metric_scores.append(60.0)
            else:
                metric_scores.append(35.0)
            # Boost score if value is positive and non-zero
            if m.value and m.value > 0:
                metric_scores[-1] = min(100.0, metric_scores[-1] + 10.0)
        self.score = round(sum(metric_scores) / len(metric_scores), 1)
        return self.score


@dataclass
class HealthScore:
    """Overall SEO health score."""
    overall: float = 0.0
    dimensions: dict[str, float] = field(default_factory=dict)
    trend: str = "stable"


@dataclass
class RoiEstimate:
    """ROI estimation from Ahrefs traffic cost."""
    traffic_value_usd: float = 0.0
    traffic_value_change: float = 0.0
    estimated_monthly_value: float = 0.0


@dataclass
class KpiResult:
    """Complete KPI aggregation result."""
    url: str = ""
    health_score: float = 0.0
    health_trend: str = "stable"
    kpis: dict[str, Any] = field(default_factory=dict)
    targets: dict[str, Any] = field(default_factory=dict)
    roi: RoiEstimate | None = None
    baseline_comparison: dict[str, Any] | None = None
    executive_summary: dict[str, Any] = field(default_factory=dict)
    timestamp: str = ""
    errors: list[str] = field(default_factory=list)
# ---------------------------------------------------------------------------
# Dimension weights
# ---------------------------------------------------------------------------

DIMENSION_WEIGHTS = {
    "traffic": 0.25,
    "rankings": 0.20,
    "technical": 0.20,
    "content": 0.15,
    "links": 0.15,
    "local": 0.05,
}
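These weights drive the overall health score as a weighted average, normalized by whichever dimensions are present. A minimal sketch of that math (the helper name `weighted_health` is illustrative):

```python
# Sketch of the weighted health-score math used by calculate_health_score():
# each dimension score (0-100) is multiplied by its weight, then normalized
# by the total weight of the dimensions actually present.
DIMENSION_WEIGHTS = {
    "traffic": 0.25, "rankings": 0.20, "technical": 0.20,
    "content": 0.15, "links": 0.15, "local": 0.05,
}

def weighted_health(scores: dict[str, float]) -> float:
    weighted = sum(scores[d] * DIMENSION_WEIGHTS[d] for d in scores)
    total = sum(DIMENSION_WEIGHTS[d] for d in scores)
    return round(weighted / total, 1) if total else 0.0

print(weighted_health({"traffic": 80.0, "rankings": 60.0}))  # 71.1
```

Normalizing by the present weights means a partial run (say, traffic and rankings only) still yields a 0-100 score rather than one deflated by missing dimensions.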
# ---------------------------------------------------------------------------
# KPI Aggregator
# ---------------------------------------------------------------------------

class KpiAggregator(BaseAsyncClient):
    """Aggregate SEO KPIs across all dimensions from Ahrefs data."""

    AHREFS_BASE = "https://api.ahrefs.com/v3"

    def __init__(self, api_token: str | None = None):
        super().__init__(max_concurrent=3, requests_per_second=2.0)
        self.api_token = api_token or config.get_required("AHREFS_API_TOKEN")
        self.headers = {
            "Authorization": f"Bearer {self.api_token}",
            "Accept": "application/json",
        }

    # ----- Ahrefs API helpers -----

    async def _ahrefs_get(
        self, session: aiohttp.ClientSession, endpoint: str, params: dict
    ) -> dict:
        """Make an authenticated GET request to the Ahrefs API."""
        url = f"{self.AHREFS_BASE}/{endpoint}"
        async with session.get(url, headers=self.headers, params=params) as resp:
            if resp.status != 200:
                text = await resp.text()
                self.logger.warning(f"Ahrefs {endpoint} returned {resp.status}: {text}")
                return {"error": f"HTTP {resp.status}", "detail": text}
            return await resp.json()

    # ----- Dimension collectors -----
    async def get_traffic_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect traffic KPIs via site-explorer metrics."""
        dim = KpiDimension(name="traffic", weight=DIMENSION_WEIGHTS["traffic"])
        try:
            data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            if "error" not in data:
                metrics = data.get("metrics", data)
                organic = metrics.get("organic", {})
                organic_traffic = organic.get("traffic", 0)
                traffic_value_raw = organic.get("cost", 0)
                # Convert raw cost (cents) to USD
                traffic_value_usd = traffic_value_raw / 100.0 if traffic_value_raw else 0.0
                dim.metrics.append(
                    KpiMetric(name="organic_traffic", value=float(organic_traffic))
                )
                dim.metrics.append(
                    KpiMetric(name="traffic_value_usd", value=round(traffic_value_usd, 2))
                )
            else:
                dim.metrics.append(KpiMetric(name="organic_traffic", value=0.0))
                dim.metrics.append(KpiMetric(name="traffic_value_usd", value=0.0))
        except Exception as exc:
            self.logger.error(f"Traffic KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="organic_traffic", value=0.0))
        dim.compute_score()
        return dim
    async def get_ranking_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect ranking KPIs via site-explorer metrics."""
        dim = KpiDimension(name="rankings", weight=DIMENSION_WEIGHTS["rankings"])
        try:
            data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            if "error" not in data:
                metrics = data.get("metrics", data)
                organic = metrics.get("organic", {})
                keywords_total = organic.get("keywords", 0)
                # Estimate top-10 keywords as ~20% of total keywords
                top10_estimate = int(keywords_total * 0.20)
                # Visibility score heuristic: traffic relative to keyword count
                traffic = organic.get("traffic", 0)
                visibility = min(100.0, (traffic / max(keywords_total, 1)) * 10)
                dim.metrics.append(
                    KpiMetric(name="visibility_score", value=round(visibility, 1))
                )
                dim.metrics.append(
                    KpiMetric(name="top10_keywords", value=float(top10_estimate))
                )
                dim.metrics.append(
                    KpiMetric(name="total_keywords", value=float(keywords_total))
                )
            else:
                dim.metrics.append(KpiMetric(name="visibility_score", value=0.0))
                dim.metrics.append(KpiMetric(name="top10_keywords", value=0.0))
        except Exception as exc:
            self.logger.error(f"Ranking KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="visibility_score", value=0.0))
        dim.compute_score()
        return dim
    async def get_link_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect link KPIs via domain-rating and metrics."""
        dim = KpiDimension(name="links", weight=DIMENSION_WEIGHTS["links"])
        try:
            # Domain rating
            dr_data = await self._ahrefs_get(
                session,
                "site-explorer/domain-rating",
                {"target": url},
            )
            domain_rating = 0.0
            if "error" not in dr_data:
                domain_rating = float(
                    dr_data.get("domain_rating", dr_data.get("domainRating", 0))
                )
            dim.metrics.append(
                KpiMetric(name="domain_rating", value=round(domain_rating, 1))
            )

            # Referring domains from metrics
            metrics_data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            ref_domains = 0
            if "error" not in metrics_data:
                metrics = metrics_data.get("metrics", metrics_data)
                ref_domains = metrics.get("refdomains", 0)
            dim.metrics.append(
                KpiMetric(name="referring_domains", value=float(ref_domains))
            )
        except Exception as exc:
            self.logger.error(f"Link KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="domain_rating", value=0.0))
            dim.metrics.append(KpiMetric(name="referring_domains", value=0.0))
        dim.compute_score()
        return dim
    async def get_technical_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect technical KPIs (estimated from available data)."""
        dim = KpiDimension(name="technical", weight=DIMENSION_WEIGHTS["technical"])
        try:
            data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            if "error" not in data:
                metrics = data.get("metrics", data)
                organic = metrics.get("organic", {})
                pages_crawled = metrics.get("pages", organic.get("pages", 0))
                # Heuristic: technical health score from available data
                has_traffic = organic.get("traffic", 0) > 0
                has_pages = pages_crawled > 0
                tech_score = 50.0
                if has_traffic:
                    tech_score += 25.0
                if has_pages:
                    tech_score += 25.0
                dim.metrics.append(
                    KpiMetric(name="technical_health_score", value=round(tech_score, 1))
                )
                dim.metrics.append(
                    KpiMetric(name="pages_crawled", value=float(pages_crawled))
                )
            else:
                dim.metrics.append(KpiMetric(name="technical_health_score", value=50.0))
                dim.metrics.append(KpiMetric(name="pages_crawled", value=0.0))
        except Exception as exc:
            self.logger.error(f"Technical KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="technical_health_score", value=50.0))
        dim.compute_score()
        return dim
    async def get_content_kpis(
        self, session: aiohttp.ClientSession, url: str
    ) -> KpiDimension:
        """Collect content KPIs from available metrics."""
        dim = KpiDimension(name="content", weight=DIMENSION_WEIGHTS["content"])
        try:
            data = await self._ahrefs_get(
                session,
                "site-explorer/metrics",
                {"target": url, "mode": "domain"},
            )
            if "error" not in data:
                metrics = data.get("metrics", data)
                organic = metrics.get("organic", {})
                pages = metrics.get("pages", organic.get("pages", 0))
                keywords = organic.get("keywords", 0)
                # Content freshness heuristic
                freshness = min(100.0, (keywords / max(pages, 1)) * 5) if pages else 0.0
                dim.metrics.append(
                    KpiMetric(name="indexed_pages", value=float(pages))
                )
                dim.metrics.append(
                    KpiMetric(name="content_freshness_score", value=round(freshness, 1))
                )
                dim.metrics.append(
                    KpiMetric(name="keywords_per_page", value=round(keywords / max(pages, 1), 2))
                )
            else:
                dim.metrics.append(KpiMetric(name="indexed_pages", value=0.0))
                dim.metrics.append(KpiMetric(name="content_freshness_score", value=0.0))
        except Exception as exc:
            self.logger.error(f"Content KPI error: {exc}")
            dim.metrics.append(KpiMetric(name="indexed_pages", value=0.0))
        dim.compute_score()
        return dim
    async def get_local_kpis(self, url: str) -> KpiDimension:
        """Placeholder for local KPIs (requires external data)."""
        dim = KpiDimension(name="local", weight=DIMENSION_WEIGHTS["local"])
        dim.metrics.append(KpiMetric(name="gbp_visibility", value=0.0))
        dim.metrics.append(KpiMetric(name="review_score", value=0.0))
        dim.metrics.append(KpiMetric(name="citation_accuracy", value=0.0))
        dim.compute_score()
        return dim
    # ----- Health score -----

    def calculate_health_score(self, dimensions: list[KpiDimension]) -> HealthScore:
        """Calculate weighted health score across all dimensions."""
        health = HealthScore()
        total_weight = 0.0
        weighted_sum = 0.0

        for dim in dimensions:
            dim.compute_score()
            health.dimensions[dim.name] = dim.score
            weighted_sum += dim.score * dim.weight
            total_weight += dim.weight

        if total_weight > 0:
            health.overall = round(weighted_sum / total_weight, 1)
        else:
            health.overall = 0.0

        # Determine trend from metric trends across dimensions
        up_count = sum(
            1 for d in dimensions
            for m in d.metrics if m.trend == "up"
        )
        down_count = sum(
            1 for d in dimensions
            for m in d.metrics if m.trend == "down"
        )
        if up_count > down_count:
            health.trend = "improving"
        elif down_count > up_count:
            health.trend = "declining"
        else:
            health.trend = "stable"

        return health
    # ----- Targets -----

    def set_targets(self, dimensions: list[KpiDimension]) -> dict[str, Any]:
        """Calculate 30/60/90-day targets (5%/10%/20% improvement)."""
        targets = {"30_day": {}, "60_day": {}, "90_day": {}}
        growth_rates = {"30_day": 0.05, "60_day": 0.10, "90_day": 0.20}

        for dim in dimensions:
            for metric in dim.metrics:
                if metric.value and metric.value > 0:
                    for period, rate in growth_rates.items():
                        key = f"{dim.name}.{metric.name}"
                        # For metrics where lower is better (e.g. bounce rate),
                        # improvement means a decrease
                        if metric.name in ("bounce_rate", "crawl_errors", "thin_content_ratio"):
                            target_val = metric.value * (1 - rate)
                        else:
                            target_val = metric.value * (1 + rate)
                        targets[period][key] = round(target_val, 2)
                    metric.target_30d = targets["30_day"].get(f"{dim.name}.{metric.name}")
                    metric.target_60d = targets["60_day"].get(f"{dim.name}.{metric.name}")
                    metric.target_90d = targets["90_day"].get(f"{dim.name}.{metric.name}")
        return targets
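The target math above reduces to a single rule: grow each metric by 5%/10%/20%, but shrink "lower is better" metrics by the same rates. A standalone sketch (the helper name `targets_for` is illustrative):

```python
# Sketch of the 30/60/90-day target math in set_targets().
GROWTH = {"30_day": 0.05, "60_day": 0.10, "90_day": 0.20}
LOWER_IS_BETTER = {"bounce_rate", "crawl_errors", "thin_content_ratio"}

def targets_for(name: str, value: float) -> dict[str, float]:
    # Lower-is-better metrics improve by decreasing, so flip the sign
    sign = -1 if name in LOWER_IS_BETTER else 1
    return {p: round(value * (1 + sign * rate), 2) for p, rate in GROWTH.items()}

print(targets_for("organic_traffic", 1000.0))
print(targets_for("bounce_rate", 60.0))
```

A traffic metric of 1000 thus targets 1050/1100/1200, while a bounce rate of 60 targets 57/54/48.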
    # ----- ROI estimation -----

    def estimate_roi(self, traffic_dim: KpiDimension) -> RoiEstimate:
        """Estimate ROI from Ahrefs traffic cost data."""
        roi = RoiEstimate()
        for metric in traffic_dim.metrics:
            if metric.name == "traffic_value_usd":
                roi.traffic_value_usd = metric.value
                roi.estimated_monthly_value = metric.value
                if metric.previous_value is not None:
                    roi.traffic_value_change = round(
                        metric.value - metric.previous_value, 2
                    )
        return roi
    # ----- Baseline comparison -----

    def compare_baseline(
        self, current: list[KpiDimension], baseline: dict[str, Any]
    ) -> dict[str, Any]:
        """Compare current KPIs against a stored baseline."""
        comparison = {}
        baseline_kpis = baseline.get("kpis", {})

        for dim in current:
            dim_baseline = baseline_kpis.get(dim.name, {})
            dim_comparison = {}
            for metric in dim.metrics:
                baseline_val = None
                if isinstance(dim_baseline, dict):
                    baseline_val = dim_baseline.get(metric.name)
                if baseline_val is not None:
                    metric.previous_value = float(baseline_val)
                    metric.compute_trend()
                    dim_comparison[metric.name] = {
                        "current": metric.value,
                        "baseline": baseline_val,
                        "change_pct": metric.change_pct,
                        "trend": metric.trend,
                    }
                else:
                    dim_comparison[metric.name] = {
                        "current": metric.value,
                        "baseline": None,
                        "change_pct": None,
                        "trend": "no_baseline",
                    }
            comparison[dim.name] = dim_comparison
        return comparison
    # ----- Executive summary -----

    def generate_executive_summary(
        self, dimensions: list[KpiDimension], health: HealthScore
    ) -> dict[str, Any]:
        """Generate executive summary with wins, concerns, and recommendations."""
        wins = []
        concerns = []
        recommendations = []

        for dim in dimensions:
            for metric in dim.metrics:
                if metric.trend == "up" and metric.change_pct and metric.change_pct > 5:
                    wins.append(
                        f"{dim.name}/{metric.name}: +{metric.change_pct}% improvement"
                    )
                elif metric.trend == "down" and metric.change_pct and metric.change_pct < -5:
                    concerns.append(
                        f"{dim.name}/{metric.name}: {metric.change_pct}% decline"
                    )

        # Generate recommendations based on dimension scores
        for dim in dimensions:
            if dim.score < 50:
                recommendations.append(
                    f"Priority: Improve {dim.name} dimension (score: {dim.score}/100)"
                )
            elif dim.score < 70:
                recommendations.append(
                    f"Monitor: {dim.name} dimension needs attention (score: {dim.score}/100)"
                )

        if not wins:
            wins.append("No significant improvements detected in this period")
        if not concerns:
            concerns.append("No significant declines detected in this period")
        if not recommendations:
            recommendations.append("All dimensions performing well - maintain current strategy")

        return {
            "health_score": health.overall,
            "health_trend": health.trend,
            "top_wins": wins[:5],
            "top_concerns": concerns[:5],
            "recommendations": recommendations[:5],
        }
    # ----- Main orchestration -----

    async def aggregate(
        self,
        url: str,
        include_roi: bool = False,
        baseline_path: str | None = None,
        set_baseline: bool = False,
    ) -> KpiResult:
        """Orchestrate full KPI aggregation across all dimensions."""
        result = KpiResult(url=url, timestamp=datetime.now().isoformat())
        dimensions: list[KpiDimension] = []

        async with aiohttp.ClientSession() as session:
            # Collect all dimensions concurrently
            tasks = [
                self.get_traffic_kpis(session, url),
                self.get_ranking_kpis(session, url),
                self.get_link_kpis(session, url),
                self.get_technical_kpis(session, url),
                self.get_content_kpis(session, url),
            ]
            gathered = await asyncio.gather(*tasks, return_exceptions=True)

            for item in gathered:
                if isinstance(item, Exception):
                    result.errors.append(str(item))
                    self.logger.error(f"Dimension error: {item}")
                else:
                    dimensions.append(item)

        # Local KPIs (no API call needed)
        local_dim = await self.get_local_kpis(url)
        dimensions.append(local_dim)

        # Load baseline if provided
        if baseline_path:
            try:
                baseline_data = json.loads(Path(baseline_path).read_text())
                result.baseline_comparison = self.compare_baseline(dimensions, baseline_data)
            except Exception as exc:
                result.errors.append(f"Baseline load error: {exc}")

        # Calculate health score
        health = self.calculate_health_score(dimensions)
        result.health_score = health.overall
        result.health_trend = health.trend

        # Build KPI dictionary
        for dim in dimensions:
            result.kpis[dim.name] = {
                "score": dim.score,
                "weight": dim.weight,
                "metrics": {m.name: asdict(m) for m in dim.metrics},
            }

        # Set targets
        targets = self.set_targets(dimensions)
        result.targets = targets

        # ROI estimation
        if include_roi:
            traffic_dim = next((d for d in dimensions if d.name == "traffic"), None)
            if traffic_dim:
                roi = self.estimate_roi(traffic_dim)
                result.roi = roi

        # Executive summary
        result.executive_summary = self.generate_executive_summary(dimensions, health)

        # Save baseline if requested
        if set_baseline:
            baseline_out = {
                "url": url,
                "timestamp": result.timestamp,
                "kpis": {},
            }
            for dim in dimensions:
                baseline_out["kpis"][dim.name] = {
                    m.name: m.value for m in dim.metrics
                }
            baseline_file = f"baseline_{url.replace('https://', '').replace('/', '_')}.json"
            Path(baseline_file).write_text(json.dumps(baseline_out, indent=2))
            self.logger.info(f"Baseline saved to {baseline_file}")

        return result
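The collector fan-out in `aggregate` relies on `asyncio.gather(..., return_exceptions=True)`, so one failing dimension never aborts the others. A minimal sketch of that pattern with stand-in coroutines (`collect` and `run` are illustrative, not part of the module):

```python
# Sketch of the gather pattern in aggregate(): collectors run concurrently
# and exceptions come back as results instead of propagating.
import asyncio

async def collect(name: str, fail: bool = False) -> str:
    if fail:
        raise RuntimeError(f"{name} failed")
    return name

async def run() -> tuple[list[str], list[str]]:
    results = await asyncio.gather(
        collect("traffic"), collect("rankings", fail=True), return_exceptions=True
    )
    # Split successes from captured exceptions, as aggregate() does
    ok = [r for r in results if not isinstance(r, Exception)]
    errors = [str(r) for r in results if isinstance(r, Exception)]
    return ok, errors

print(asyncio.run(run()))  # (['traffic'], ['rankings failed'])
```

This is why `KpiResult.errors` can be non-empty while the health score is still computed from the dimensions that did succeed.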
# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------

def format_text_report(result: KpiResult) -> str:
    """Format a KPI result as a human-readable text report."""
    lines = []
    lines.append("=" * 60)
    lines.append(f"SEO KPI Dashboard: {result.url}")
    lines.append(f"Timestamp: {result.timestamp}")
    lines.append("=" * 60)
    lines.append("")

    # Health score
    lines.append(f"Overall Health Score: {result.health_score}/100 ({result.health_trend})")
    lines.append("-" * 40)

    # Dimensions
    for dim_name, dim_data in result.kpis.items():
        lines.append(f"\n[{dim_name.upper()}] Score: {dim_data['score']}/100 (weight: {dim_data['weight']})")
        metrics = dim_data.get("metrics", {})
        for m_name, m_data in metrics.items():
            trend_arrow = {"up": "^", "down": "v", "stable": "=", "no_baseline": "?"}.get(
                m_data.get("trend", "stable"), "="
            )
            val = m_data.get("value", 0)
            change = m_data.get("change_pct")
            change_str = f" ({change:+.1f}%)" if change is not None else ""
            lines.append(f"  {trend_arrow} {m_name}: {val}{change_str}")

    # Targets
    if result.targets:
        lines.append("\n" + "-" * 40)
        lines.append("TARGETS")
        for period, targets in result.targets.items():
            if targets:
                lines.append(f"\n  {period}:")
                for key, val in list(targets.items())[:10]:
                    lines.append(f"    {key}: {val}")

    # ROI
    if result.roi:
        lines.append("\n" + "-" * 40)
        lines.append("ROI ESTIMATE")
        lines.append(f"  Traffic Value (USD): ${result.roi.traffic_value_usd:,.2f}")
        lines.append(f"  Monthly Value: ${result.roi.estimated_monthly_value:,.2f}")
        lines.append(f"  Value Change: ${result.roi.traffic_value_change:,.2f}")

    # Executive summary
    if result.executive_summary:
        lines.append("\n" + "-" * 40)
        lines.append("EXECUTIVE SUMMARY")
        lines.append(f"  Health: {result.executive_summary.get('health_score', 0)}/100")
        lines.append(f"  Trend: {result.executive_summary.get('health_trend', 'stable')}")
        lines.append("\n  Top Wins:")
        for win in result.executive_summary.get("top_wins", []):
            lines.append(f"    + {win}")
        lines.append("\n  Top Concerns:")
        for concern in result.executive_summary.get("top_concerns", []):
            lines.append(f"    - {concern}")
        lines.append("\n  Recommendations:")
        for rec in result.executive_summary.get("recommendations", []):
            lines.append(f"    > {rec}")

    # Errors
    if result.errors:
        lines.append("\n" + "-" * 40)
        lines.append("ERRORS:")
        for err in result.errors:
            lines.append(f"  ! {err}")

    lines.append("\n" + "=" * 60)
    return "\n".join(lines)
def serialize_result(result: KpiResult) -> dict:
    """Serialize a KpiResult to a JSON-safe dictionary."""
    data = {
        "url": result.url,
        "health_score": result.health_score,
        "health_trend": result.health_trend,
        "kpis": result.kpis,
        "targets": result.targets,
        "executive_summary": result.executive_summary,
        "timestamp": result.timestamp,
        "errors": result.errors,
    }
    if result.roi:
        data["roi"] = asdict(result.roi)
    if result.baseline_comparison:
        data["baseline_comparison"] = result.baseline_comparison
    return data
# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------

def parse_args() -> argparse.Namespace:
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="SEO KPI Aggregator - Unified metrics dashboard"
    )
    parser.add_argument(
        "--url", required=True, help="Target URL or domain to analyze"
    )
    parser.add_argument(
        "--set-baseline", action="store_true",
        help="Save current KPIs as a baseline file"
    )
    parser.add_argument(
        "--baseline", type=str, default=None,
        help="Path to baseline JSON file for comparison"
    )
    parser.add_argument(
        "--roi", action="store_true",
        help="Include ROI estimation from traffic cost"
    )
    parser.add_argument(
        "--json", action="store_true",
        help="Output results as JSON"
    )
    parser.add_argument(
        "--output", type=str, default=None,
        help="Save output to file path"
    )
    return parser.parse_args()
async def main() -> None:
    """Main entry point."""
    args = parse_args()

    aggregator = KpiAggregator()
    result = await aggregator.aggregate(
        url=args.url,
        include_roi=args.roi,
        baseline_path=args.baseline,
        set_baseline=args.set_baseline,
    )

    if args.json:
        output = json.dumps(serialize_result(result), indent=2, ensure_ascii=False)
    else:
        output = format_text_report(result)

    if args.output:
        Path(args.output).write_text(output, encoding="utf-8")
        logger.info(f"Output saved to {args.output}")
    else:
        print(output)

    aggregator.print_stats()
if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,801 @@
"""
Performance Reporter - Period-over-period SEO performance reports
================================================================
Purpose: Generate executive summaries, trend analysis, tactical breakdowns,
and target-vs-actual comparison from Ahrefs historical data.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import sys
from dataclasses import dataclass, field, asdict
from datetime import datetime, timedelta
from pathlib import Path
from typing import Any

import aiohttp

from base_client import BaseAsyncClient, config

logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

@dataclass
class TrendData:
    """Single trend data point for a metric."""
    period: str
    value: float
    change_pct: float | None = None
    direction: str = "stable"  # up, down, stable


@dataclass
class WinConcern:
    """A notable win or concern from performance analysis."""
    category: str
    description: str
    impact: str = "medium"  # high, medium, low
    action: str = ""


@dataclass
class TargetProgress:
    """Target vs actual progress tracking."""
    kpi_name: str
    target: float
    actual: float
    progress_pct: float = 0.0

    def compute_progress(self) -> None:
        """Compute progress percentage toward target."""
        if self.target and self.target != 0:
            self.progress_pct = round((self.actual / self.target) * 100, 1)
        else:
            self.progress_pct = 0.0


@dataclass
class PerformanceReport:
    """Complete performance report."""
    url: str = ""
    period: str = "monthly"
    date_from: str = ""
    date_to: str = ""
    health_score: float = 0.0
    health_trend: str = "stable"
    trends: dict[str, list[TrendData]] = field(default_factory=dict)
    wins: list[WinConcern] = field(default_factory=list)
    concerns: list[WinConcern] = field(default_factory=list)
    executive_summary: dict[str, Any] = field(default_factory=dict)
    tactical_breakdown: dict[str, Any] = field(default_factory=dict)
    target_progress: list[TargetProgress] = field(default_factory=list)
    traffic_value_change: float = 0.0
    timestamp: str = ""
    errors: list[str] = field(default_factory=list)


# ---------------------------------------------------------------------------
# Period helpers
# ---------------------------------------------------------------------------

PERIOD_DAYS = {
    "monthly": 30,
    "quarterly": 90,
    "yearly": 365,
}
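`TargetProgress.compute_progress` is a straight actual-over-target percentage with a zero-target guard. A standalone sketch (the helper name `progress_pct` is illustrative):

```python
# Sketch of TargetProgress.compute_progress(): progress toward a target as a
# simple actual/target percentage, guarding against a zero target.
def progress_pct(actual: float, target: float) -> float:
    return round((actual / target) * 100, 1) if target else 0.0

print(progress_pct(850.0, 1000.0))  # 85.0
```

Note the result can exceed 100 when a target is beaten; the reporter leaves that as-is rather than capping it.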
def get_date_range(
    period: str, date_from: str | None = None, date_to: str | None = None
) -> tuple[str, str]:
    """Compute date range from period or explicit dates."""
    if date_from and date_to:
        return date_from, date_to
    end = datetime.now()
    days = PERIOD_DAYS.get(period, 30)
    start = end - timedelta(days=days)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


def get_previous_range(
    date_from: str, date_to: str
) -> tuple[str, str]:
    """Compute the previous period of equal length for comparison."""
    start = datetime.strptime(date_from, "%Y-%m-%d")
    end = datetime.strptime(date_to, "%Y-%m-%d")
    delta = end - start
    prev_end = start - timedelta(days=1)
    prev_start = prev_end - delta
    return prev_start.strftime("%Y-%m-%d"), prev_end.strftime("%Y-%m-%d")

# ---------------------------------------------------------------------------
# Performance Reporter
# ---------------------------------------------------------------------------

class PerformanceReporter(BaseAsyncClient):
    """Generate period-over-period SEO performance reports from Ahrefs."""

    AHREFS_BASE = "https://api.ahrefs.com/v3"

    def __init__(self, api_token: str | None = None):
        super().__init__(max_concurrent=3, requests_per_second=2.0)
        self.api_token = api_token or config.get_required("AHREFS_API_TOKEN")
        self.headers = {
            "Authorization": f"Bearer {self.api_token}",
            "Accept": "application/json",
        }

    async def _ahrefs_get(
        self, session: aiohttp.ClientSession, endpoint: str, params: dict
    ) -> dict:
        """Make an authenticated GET request to the Ahrefs API."""
        url = f"{self.AHREFS_BASE}/{endpoint}"
        async with session.get(url, headers=self.headers, params=params) as resp:
            if resp.status != 200:
                text = await resp.text()
                self.logger.warning(f"Ahrefs {endpoint} returned {resp.status}: {text}")
                return {"error": f"HTTP {resp.status}", "detail": text}
            return await resp.json()

    # ----- Data collectors -----

    async def get_metrics_history(
        self,
        session: aiohttp.ClientSession,
        url: str,
        date_from: str,
        date_to: str,
    ) -> list[dict]:
        """Fetch historical metrics via site-explorer-metrics-history."""
        data = await self._ahrefs_get(
            session,
            "site-explorer/metrics-history",
            {
                "target": url,
                "mode": "domain",
                "date_from": date_from,
                "date_to": date_to,
            },
        )
        if "error" in data:
            self.logger.warning(f"Metrics history error: {data}")
            return []
        return data.get("metrics", data.get("data", []))

    async def get_dr_history(
        self,
        session: aiohttp.ClientSession,
        url: str,
        date_from: str,
        date_to: str,
    ) -> list[dict]:
        """Fetch domain rating history."""
        data = await self._ahrefs_get(
            session,
            "site-explorer/domain-rating-history",
            {
                "target": url,
                "date_from": date_from,
                "date_to": date_to,
            },
        )
        if "error" in data:
            return []
        return data.get("domain_rating_history", data.get("data", []))

    async def get_current_metrics(
        self, session: aiohttp.ClientSession, url: str
    ) -> dict:
        """Fetch current snapshot metrics."""
        data = await self._ahrefs_get(
            session,
            "site-explorer/metrics",
            {"target": url, "mode": "domain"},
        )
        if "error" in data:
            return {}
        return data.get("metrics", data)

    async def get_volume_history(
        self,
        session: aiohttp.ClientSession,
        url: str,
        date_from: str,
        date_to: str,
    ) -> list[dict]:
        """Fetch total search volume history."""
        data = await self._ahrefs_get(
            session,
            "site-explorer/total-search-volume-history",
            {
                "target": url,
                "date_from": date_from,
                "date_to": date_to,
            },
        )
        if "error" in data:
            return []
        return data.get("total_search_volume_history", data.get("data", []))

    # ----- Analysis methods -----

    def calculate_period_comparison(
        self, current_data: list[dict], previous_data: list[dict], metric_key: str
    ) -> list[TrendData]:
        """Compare metric values between current and previous period."""
        trends = []

        def avg_metric(data_list: list[dict], key: str) -> float:
            vals = []
            for entry in data_list:
                val = entry.get(key)
                if val is None:
                    organic = entry.get("organic", {})
                    val = organic.get(key)
                if val is not None:
                    vals.append(float(val))
            return sum(vals) / len(vals) if vals else 0.0

        current_avg = avg_metric(current_data, metric_key)
        previous_avg = avg_metric(previous_data, metric_key)

        change_pct = None
        direction = "stable"
        if previous_avg and previous_avg != 0:
            change_pct = round(((current_avg - previous_avg) / abs(previous_avg)) * 100, 2)
            if change_pct > 2.0:
                direction = "up"
            elif change_pct < -2.0:
                direction = "down"

        trends.append(TrendData(
            period=metric_key,
            value=round(current_avg, 2),
            change_pct=change_pct,
            direction=direction,
        ))
        return trends

    def identify_wins(
        self, current: dict, previous: dict
    ) -> list[WinConcern]:
        """Identify significant positive changes between periods."""
        wins = []
        metric_labels = {
            "traffic": "Organic Traffic",
            "cost": "Traffic Value",
            "keywords": "Keyword Count",
            "refdomains": "Referring Domains",
        }

        for key, label in metric_labels.items():
            curr_val = self._extract_metric(current, key)
            prev_val = self._extract_metric(previous, key)
            if prev_val and prev_val > 0 and curr_val > prev_val:
                change_pct = ((curr_val - prev_val) / prev_val) * 100
                if change_pct >= 5.0:
                    impact = "high" if change_pct >= 20 else ("medium" if change_pct >= 10 else "low")
                    wins.append(WinConcern(
                        category=label,
                        description=f"{label} increased by {change_pct:+.1f}% ({prev_val:,.0f} -> {curr_val:,.0f})",
                        impact=impact,
                        action=f"Continue current {label.lower()} strategy",
                    ))
        return wins

    def identify_concerns(
        self, current: dict, previous: dict
    ) -> list[WinConcern]:
        """Identify significant negative changes between periods."""
        concerns = []
        metric_labels = {
            "traffic": "Organic Traffic",
            "cost": "Traffic Value",
            "keywords": "Keyword Count",
            "refdomains": "Referring Domains",
        }

        for key, label in metric_labels.items():
            curr_val = self._extract_metric(current, key)
            prev_val = self._extract_metric(previous, key)
            if prev_val and prev_val > 0 and curr_val < prev_val:
                change_pct = ((curr_val - prev_val) / prev_val) * 100
                if change_pct <= -5.0:
                    impact = "high" if change_pct <= -20 else ("medium" if change_pct <= -10 else "low")
                    actions = {
                        "Organic Traffic": "Investigate traffic sources and algorithm updates",
                        "Traffic Value": "Review keyword targeting and content quality",
                        "Keyword Count": "Expand content coverage and optimize existing pages",
                        "Referring Domains": "Strengthen link building outreach campaigns",
                    }
                    concerns.append(WinConcern(
                        category=label,
                        description=f"{label} decreased by {change_pct:.1f}% ({prev_val:,.0f} -> {curr_val:,.0f})",
                        impact=impact,
                        action=actions.get(label, f"Review {label.lower()} strategy"),
                    ))
        return concerns

    def _extract_metric(self, data: dict, key: str) -> float:
        """Extract a metric value from nested Ahrefs response."""
        if key in data:
            return float(data[key])
        organic = data.get("organic", {})
        if key in organic:
            return float(organic[key])
        return 0.0

    def generate_executive_summary(
        self,
        wins: list[WinConcern],
        concerns: list[WinConcern],
        health_score: float,
        health_trend: str,
        traffic_value_change: float,
    ) -> dict[str, Any]:
        """Generate high-level executive summary."""
        summary = {
            "health_score": health_score,
            "health_trend": health_trend,
            "traffic_value_change_usd": round(traffic_value_change, 2),
            "total_wins": len(wins),
            "total_concerns": len(concerns),
            "top_wins": [
                {"category": w.category, "description": w.description, "impact": w.impact}
                for w in sorted(wins, key=lambda x: {"high": 0, "medium": 1, "low": 2}.get(x.impact, 3))[:5]
            ],
            "top_concerns": [
                {"category": c.category, "description": c.description, "impact": c.impact}
                for c in sorted(concerns, key=lambda x: {"high": 0, "medium": 1, "low": 2}.get(x.impact, 3))[:5]
            ],
            "overall_assessment": "",
        }

        if health_score >= 75:
            summary["overall_assessment"] = "Strong performance - focus on maintaining momentum"
        elif health_score >= 50:
            summary["overall_assessment"] = "Moderate performance - targeted improvements needed"
        else:
            summary["overall_assessment"] = "Needs attention - prioritize fundamental improvements"

        return summary

    def generate_tactical_breakdown(
        self, current: dict, wins: list[WinConcern], concerns: list[WinConcern]
    ) -> dict[str, Any]:
        """Generate actionable next steps per dimension."""
        breakdown = {
            "traffic": {
                "status": "needs_review",
                "actions": [],
            },
            "rankings": {
                "status": "needs_review",
                "actions": [],
            },
            "links": {
                "status": "needs_review",
                "actions": [],
            },
            "content": {
                "status": "needs_review",
                "actions": [],
            },
            "technical": {
                "status": "needs_review",
                "actions": [],
            },
        }

        traffic = self._extract_metric(current, "traffic")
        keywords = self._extract_metric(current, "keywords")
        refdomains = self._extract_metric(current, "refdomains")

        # Traffic actions
        if traffic > 0:
            breakdown["traffic"]["status"] = "active"
            breakdown["traffic"]["actions"].append("Monitor top landing pages for traffic drops")
            breakdown["traffic"]["actions"].append("Identify new keyword opportunities in adjacent topics")
        else:
            breakdown["traffic"]["actions"].append("Establish organic traffic baseline with content strategy")

        # Rankings actions
        if keywords > 0:
            breakdown["rankings"]["status"] = "active"
            breakdown["rankings"]["actions"].append(
                f"Optimize pages for {int(keywords)} tracked keywords"
            )
            breakdown["rankings"]["actions"].append("Target featured snippets for top-performing queries")
        else:
            breakdown["rankings"]["actions"].append("Begin keyword research and content mapping")

        # Links actions
        if refdomains > 0:
            breakdown["links"]["status"] = "active"
            breakdown["links"]["actions"].append("Analyze top referring domains for partnership opportunities")
            breakdown["links"]["actions"].append("Monitor for lost backlinks and reclaim valuable links")
        else:
            breakdown["links"]["actions"].append("Develop link acquisition strategy with digital PR")

        # Content actions
        breakdown["content"]["actions"].append("Audit content freshness and update older pages")
        breakdown["content"]["actions"].append("Identify content gaps using competitor analysis")

        # Technical actions
        breakdown["technical"]["actions"].append("Run technical SEO audit for crawl issues")
        breakdown["technical"]["actions"].append("Verify Core Web Vitals pass thresholds")

        # Enrich with win/concern context
        for w in wins:
            cat_lower = w.category.lower()
            if "traffic" in cat_lower and breakdown.get("traffic"):
                breakdown["traffic"]["status"] = "improving"
            if "keyword" in cat_lower and breakdown.get("rankings"):
                breakdown["rankings"]["status"] = "improving"
            if "domain" in cat_lower or "link" in cat_lower:
                breakdown["links"]["status"] = "improving"

        for c in concerns:
            cat_lower = c.category.lower()
            if "traffic" in cat_lower and breakdown.get("traffic"):
                breakdown["traffic"]["status"] = "declining"
                breakdown["traffic"]["actions"].insert(0, c.action)
            if "keyword" in cat_lower and breakdown.get("rankings"):
                breakdown["rankings"]["status"] = "declining"
                breakdown["rankings"]["actions"].insert(0, c.action)

        return breakdown

    def compare_targets(
        self, current: dict, targets: dict
    ) -> list[TargetProgress]:
        """Compare current metrics against saved targets."""
        progress_list = []
        for key, target_val in targets.items():
            parts = key.split(".")
            metric_name = parts[-1] if len(parts) > 1 else key
            actual = self._extract_metric(current, metric_name)
            if actual == 0.0 and len(parts) > 1:
                # Try alternate key resolution
                actual = current.get(key, 0.0)
                if isinstance(actual, dict):
                    actual = 0.0
            tp = TargetProgress(
                kpi_name=key,
                target=float(target_val),
                actual=float(actual),
            )
            tp.compute_progress()
            progress_list.append(tp)
        return progress_list

    # ----- Main orchestration -----

    async def report(
        self,
        url: str,
        period: str = "monthly",
        date_from: str | None = None,
        date_to: str | None = None,
        executive_only: bool = False,
        targets_path: str | None = None,
    ) -> PerformanceReport:
        """Orchestrate full performance report generation."""
        report = PerformanceReport(
            url=url,
            period=period,
            timestamp=datetime.now().isoformat(),
        )

        # Determine date ranges
        report.date_from, report.date_to = get_date_range(period, date_from, date_to)
        prev_from, prev_to = get_previous_range(report.date_from, report.date_to)

        async with aiohttp.ClientSession() as session:
            # Fetch current and previous period data concurrently
            tasks = [
                self.get_metrics_history(session, url, report.date_from, report.date_to),
                self.get_metrics_history(session, url, prev_from, prev_to),
                self.get_current_metrics(session, url),
                self.get_dr_history(session, url, report.date_from, report.date_to),
                self.get_volume_history(session, url, report.date_from, report.date_to),
            ]
            results = await asyncio.gather(*tasks, return_exceptions=True)

        current_history = results[0] if not isinstance(results[0], Exception) else []
        previous_history = results[1] if not isinstance(results[1], Exception) else []
        current_snapshot = results[2] if not isinstance(results[2], Exception) else {}
        dr_history = results[3] if not isinstance(results[3], Exception) else []
        volume_history = results[4] if not isinstance(results[4], Exception) else []

        for i, r in enumerate(results):
            if isinstance(r, Exception):
                report.errors.append(f"Data fetch error [{i}]: {r}")

        # Calculate trends for key metrics
        for metric_key in ["traffic", "keywords", "cost", "refdomains"]:
            if current_history or previous_history:
                trend = self.calculate_period_comparison(
                    current_history if isinstance(current_history, list) else [],
                    previous_history if isinstance(previous_history, list) else [],
                    metric_key,
                )
                report.trends[metric_key] = [asdict(t) for t in trend]

        # Build previous snapshot for comparison
        previous_snapshot = {}
        if isinstance(previous_history, list) and previous_history:
            for entry in previous_history:
                for key in ("traffic", "cost", "keywords", "refdomains"):
                    val = entry.get(key)
                    if val is None:
                        organic = entry.get("organic", {})
                        val = organic.get(key)
                    if val is not None:
                        if key not in previous_snapshot:
                            previous_snapshot[key] = []
                        previous_snapshot[key].append(float(val))
            # Average the values
            previous_snapshot = {
                k: sum(v) / len(v) for k, v in previous_snapshot.items() if v
            }

        # Identify wins and concerns
        if isinstance(current_snapshot, dict):
            report.wins = self.identify_wins(current_snapshot, previous_snapshot)
            report.concerns = self.identify_concerns(current_snapshot, previous_snapshot)
        else:
            report.wins = []
            report.concerns = []

        # Calculate health score (simple heuristic)
        traffic = self._extract_metric(current_snapshot, "traffic") if isinstance(current_snapshot, dict) else 0
        keywords = self._extract_metric(current_snapshot, "keywords") if isinstance(current_snapshot, dict) else 0
        score_components = []
        if traffic > 0:
            score_components.append(min(100, traffic / 100))
        if keywords > 0:
            score_components.append(min(100, keywords / 50))
        if dr_history:
            latest_dr = dr_history[-1] if isinstance(dr_history, list) else {}
            dr_val = latest_dr.get("domain_rating", latest_dr.get("domainRating", 0))
            score_components.append(float(dr_val))
        report.health_score = round(
            sum(score_components) / max(len(score_components), 1), 1
        )

        # Health trend
        win_count = len(report.wins)
        concern_count = len(report.concerns)
        if win_count > concern_count:
            report.health_trend = "improving"
        elif concern_count > win_count:
            report.health_trend = "declining"
        else:
            report.health_trend = "stable"

        # Traffic value change (Ahrefs reports cost in US cents; divide by 100 for USD)
        curr_cost = self._extract_metric(current_snapshot, "cost") if isinstance(current_snapshot, dict) else 0
        prev_cost = previous_snapshot.get("cost", 0)
        report.traffic_value_change = round((curr_cost - prev_cost) / 100.0, 2)

        # Executive summary
        report.executive_summary = self.generate_executive_summary(
            report.wins, report.concerns,
            report.health_score, report.health_trend,
            report.traffic_value_change,
        )

        if not executive_only:
            # Tactical breakdown
            report.tactical_breakdown = self.generate_tactical_breakdown(
                current_snapshot if isinstance(current_snapshot, dict) else {},
                report.wins, report.concerns,
            )

            # Target comparison
            if targets_path:
                try:
                    targets_data = json.loads(Path(targets_path).read_text())
                    # Use 30-day targets by default
                    target_set = targets_data.get("30_day", targets_data)
                    report.target_progress = self.compare_targets(
                        current_snapshot if isinstance(current_snapshot, dict) else {},
                        target_set,
                    )
                except Exception as exc:
                    report.errors.append(f"Targets load error: {exc}")

        return report

# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------

def format_text_report(report: PerformanceReport) -> str:
    """Format performance report as human-readable text."""
    lines = []
    lines.append("=" * 60)
    lines.append(f"SEO Performance Report: {report.url}")
    lines.append(f"Period: {report.period} ({report.date_from} to {report.date_to})")
    lines.append(f"Generated: {report.timestamp}")
    lines.append("=" * 60)

    # Executive Summary
    lines.append("\nEXECUTIVE SUMMARY")
    lines.append("-" * 40)
    es = report.executive_summary
    lines.append(f"  Health Score: {es.get('health_score', 0)}/100")
    trend_arrow = {"improving": "^", "declining": "v", "stable": "="}.get(
        es.get("health_trend", "stable"), "="
    )
    lines.append(f"  Trend: {trend_arrow} {es.get('health_trend', 'stable')}")
    lines.append(f"  Traffic Value Change: ${es.get('traffic_value_change_usd', 0):,.2f}")
    lines.append(f"  Assessment: {es.get('overall_assessment', 'N/A')}")

    # Wins
    lines.append(f"\n  Top Wins ({es.get('total_wins', 0)} total):")
    for w in es.get("top_wins", []):
        impact_marker = {"high": "!!!", "medium": "!!", "low": "!"}.get(w.get("impact", "low"), "!")
        lines.append(f"    {impact_marker} [{w.get('category', '')}] {w.get('description', '')}")

    # Concerns
    lines.append(f"\n  Top Concerns ({es.get('total_concerns', 0)} total):")
    for c in es.get("top_concerns", []):
        impact_marker = {"high": "!!!", "medium": "!!", "low": "!"}.get(c.get("impact", "low"), "!")
        lines.append(f"    {impact_marker} [{c.get('category', '')}] {c.get('description', '')}")

    # Trends
    if report.trends:
        lines.append("\nTRENDS")
        lines.append("-" * 40)
        for metric_name, trend_list in report.trends.items():
            for t in trend_list:
                if isinstance(t, dict):
                    dir_arrow = {"up": "^", "down": "v", "stable": "="}.get(
                        t.get("direction", "stable"), "="
                    )
                    change_str = f" ({t.get('change_pct', 0):+.1f}%)" if t.get("change_pct") is not None else ""
                    lines.append(f"  {dir_arrow} {metric_name}: {t.get('value', 0):,.2f}{change_str}")

    # Tactical Breakdown
    if report.tactical_breakdown:
        lines.append("\nTACTICAL BREAKDOWN")
        lines.append("-" * 40)
        for dim_name, dim_data in report.tactical_breakdown.items():
            status = dim_data.get("status", "unknown")
            status_marker = {
                "improving": "^", "declining": "v", "active": "=", "needs_review": "?"
            }.get(status, "?")
            lines.append(f"\n  [{dim_name.upper()}] Status: {status_marker} {status}")
            for action in dim_data.get("actions", [])[:3]:
                lines.append(f"    > {action}")

    # Target Progress
    if report.target_progress:
        lines.append("\nTARGET PROGRESS")
        lines.append("-" * 40)
        for tp in report.target_progress:
            if isinstance(tp, TargetProgress):
                bar_filled = int(min(tp.progress_pct, 100) / 5)
                bar = "#" * bar_filled + "-" * (20 - bar_filled)
                lines.append(
                    f"  {tp.kpi_name}: [{bar}] {tp.progress_pct:.0f}% "
                    f"(actual: {tp.actual:,.0f} / target: {tp.target:,.0f})"
                )

    # Errors
    if report.errors:
        lines.append("\nERRORS")
        lines.append("-" * 40)
        for err in report.errors:
            lines.append(f"  ! {err}")

    lines.append("\n" + "=" * 60)
    return "\n".join(lines)


def serialize_report(report: PerformanceReport) -> dict:
    """Serialize PerformanceReport to JSON-safe dictionary."""
    data = {
        "url": report.url,
        "period": report.period,
        "date_from": report.date_from,
        "date_to": report.date_to,
        "health_score": report.health_score,
        "health_trend": report.health_trend,
        "trends": report.trends,
        "wins": [asdict(w) for w in report.wins],
        "concerns": [asdict(c) for c in report.concerns],
        "executive_summary": report.executive_summary,
        "tactical_breakdown": report.tactical_breakdown,
        "target_progress": [asdict(tp) for tp in report.target_progress],
        "traffic_value_change": report.traffic_value_change,
        "timestamp": report.timestamp,
        "errors": report.errors,
    }
    return data

# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------

def parse_args() -> argparse.Namespace:
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="SEO Performance Reporter - Period-over-period analysis"
    )
    parser.add_argument(
        "--url", required=True, help="Target URL or domain"
    )
    parser.add_argument(
        "--period", choices=["monthly", "quarterly", "yearly", "custom"],
        default="monthly", help="Report period (default: monthly)"
    )
    parser.add_argument(
        "--from", dest="date_from", type=str, default=None,
        help="Start date (YYYY-MM-DD) for custom period"
    )
    parser.add_argument(
        "--to", dest="date_to", type=str, default=None,
        help="End date (YYYY-MM-DD) for custom period"
    )
    parser.add_argument(
        "--executive", action="store_true",
        help="Generate executive summary only"
    )
    parser.add_argument(
        "--targets", type=str, default=None,
        help="Path to targets JSON file for progress comparison"
    )
    parser.add_argument(
        "--json", action="store_true",
        help="Output results as JSON"
    )
    parser.add_argument(
        "--output", type=str, default=None,
        help="Save output to file path"
    )
    return parser.parse_args()


async def main() -> None:
    """Main entry point."""
    args = parse_args()

    reporter = PerformanceReporter()
    result = await reporter.report(
        url=args.url,
        period=args.period,
        date_from=args.date_from,
        date_to=args.date_to,
        executive_only=args.executive,
        targets_path=args.targets,
    )

    if args.json:
        output = json.dumps(serialize_report(result), indent=2, ensure_ascii=False)
    else:
        output = format_text_report(result)

    if args.output:
        Path(args.output).write_text(output, encoding="utf-8")
        logger.info(f"Output saved to {args.output}")
    else:
        print(output)

    reporter.print_stats()


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,8 @@
# 25-seo-kpi-framework dependencies
requests>=2.31.0
aiohttp>=3.9.0
pandas>=2.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
107	custom-skills/25-seo-kpi-framework/desktop/SKILL.md (new file)
@@ -0,0 +1,107 @@
---
name: seo-kpi-framework
description: |
  SEO KPI and performance framework for unified metrics, health scores, ROI, and period-over-period reporting.
  Triggers: SEO KPI, performance report, health score, SEO metrics, ROI, baseline, targets.
---

# SEO KPI & Performance Framework

## Purpose

Aggregate SEO KPIs across all dimensions into a unified dashboard. Establish baselines, set 30/60/90-day targets, generate executive summaries with health scores, provide tactical breakdowns, estimate ROI using Ahrefs traffic cost, and support period-over-period comparison (MoM, QoQ, YoY).

## Core Capabilities

1. **KPI Aggregation** - Unified metrics across 7 dimensions (traffic, rankings, links, technical, content, engagement, local)
2. **Health Scoring** - Weighted 0-100 score with trend direction
3. **Baseline & Targets** - Establish baselines and set 30/60/90-day growth targets
4. **ROI Estimation** - Traffic value from Ahrefs organic cost
5. **Performance Reporting** - Period-over-period comparison with executive summary
6. **Tactical Breakdown** - Actionable next steps per dimension

## MCP Tool Usage

### Ahrefs for SEO Metrics
```
mcp__ahrefs__site-explorer-metrics: Current organic metrics snapshot
mcp__ahrefs__site-explorer-metrics-history: Historical trend data
mcp__ahrefs__site-explorer-metrics-by-country: Country-level breakdown
mcp__ahrefs__site-explorer-domain-rating-history: Domain rating trend
mcp__ahrefs__site-explorer-total-search-volume-history: Keyword volume trend
```

### Notion for Report Storage
```
mcp__notion__*: Save reports to SEO Audit Log database
```

## Workflow

### 1. KPI Aggregation
1. Fetch site-explorer-metrics for current organic data
2. Extract traffic, ranking, link, technical, content metrics
3. Calculate dimension scores with weights (traffic 25%, rankings 20%, technical 20%, content 15%, links 15%, local 5%)
4. Compute overall health score (0-100)
5. Set 30/60/90-day targets (5%/10%/20% improvement)
6. Estimate ROI from Ahrefs traffic cost (divide raw cost by 100 for USD)
### 2. Performance Reporting
|
||||||
|
1. Determine date range from period (monthly/quarterly/yearly/custom)
|
||||||
|
2. Fetch metrics-history for current and previous period
|
||||||
|
3. Calculate period-over-period changes
|
||||||
|
4. Identify wins (>5% improvement) and concerns (>5% decline)
|
||||||
|
5. Generate executive summary with trend arrows
|
||||||
|
6. Create tactical breakdown with actionable next steps
|
||||||
|
7. Compare against targets if provided
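Steps 3 and 4 can be sketched as a small classifier (a hypothetical helper, not from the actual script; the 5% threshold is the one stated above):

```python
def period_over_period(current: dict, previous: dict, threshold: float = 0.05):
    """Classify each metric as a win (>5% up) or a concern (>5% down).

    Metrics missing from the previous period (or zero) are skipped to
    avoid division by zero. Changes are returned as percentages.
    """
    wins, concerns = [], []
    for metric, cur in current.items():
        prev = previous.get(metric)
        if not prev:
            continue
        change = (cur - prev) / prev
        if change > threshold:
            wins.append((metric, round(change * 100, 1)))
        elif change < -threshold:
            concerns.append((metric, round(change * 100, 1)))
    return wins, concerns
```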

## Output Format

```markdown
## SEO KPI Dashboard: [domain]

### Health Score: [score]/100 ([trend])

### KPI Summary
| Dimension | Score | Key Metric | Trend |
|-----------|-------|------------|-------|
| Traffic | [score] | [organic_traffic] | [arrow] |
| Rankings | [score] | [visibility] | [arrow] |
| Links | [score] | [DR] | [arrow] |
| Technical | [score] | [health] | [arrow] |
| Content | [score] | [indexed_pages] | [arrow] |

### Executive Summary
- Top Wins: [list]
- Top Concerns: [list]
- Recommendations: [list]

### Targets (30/60/90 day)
[Target table with progress bars]
```

## Key Metrics

| Dimension | Metrics | Source |
|-----------|---------|--------|
| Traffic | Organic traffic, traffic value (USD) | site-explorer-metrics |
| Rankings | Visibility score, top10 keywords | site-explorer-metrics |
| Links | Domain rating, referring domains | domain-rating, metrics |
| Technical | Pages crawled, technical health | site-explorer-metrics |
| Content | Indexed pages, freshness score | site-explorer-metrics |
| Local | GBP visibility, review score | External data |

## Limitations

- Local KPIs require external GBP data (not available via Ahrefs)
- Engagement KPIs (bounce rate, session duration) require Google Analytics
- Technical health is estimated heuristically from available data
- ROI is estimated from Ahrefs traffic cost, not actual revenue

## Notion Output (Required)

All reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category, Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: KPI-YYYYMMDD-NNN
8
custom-skills/25-seo-kpi-framework/desktop/skill.yaml
Normal file
@@ -0,0 +1,8 @@
name: seo-kpi-framework
description: |
  SEO KPI and performance framework. Triggers: SEO KPI, performance report, health score, SEO metrics, ROI, baseline, targets.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
15
custom-skills/25-seo-kpi-framework/desktop/tools/ahrefs.md
Normal file
@@ -0,0 +1,15 @@
# Ahrefs

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
15
custom-skills/25-seo-kpi-framework/desktop/tools/notion.md
Normal file
@@ -0,0 +1,15 @@
# Notion

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
@@ -0,0 +1,15 @@
# WebSearch

> TODO: Document tool usage for this skill

## Available Commands

- [ ] List commands

## Configuration

- [ ] Add configuration details

## Examples

- [ ] Add usage examples
149
custom-skills/26-seo-international/code/CLAUDE.md
Normal file
@@ -0,0 +1,149 @@
# CLAUDE.md

## Overview

International SEO audit tool for multi-language and multi-region website optimization. Validates hreflang tags (bidirectional, self-referencing, x-default), analyzes URL structure patterns (ccTLD vs subdomain vs subdirectory), audits content parity across language versions, checks detected content language against the declared language, and analyzes international redirect logic. Supports Korean expansion patterns (ko→ja, ko→zh, ko→en).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Hreflang validation
python scripts/hreflang_validator.py --url https://example.com --json

# Full international SEO audit
python scripts/international_auditor.py --url https://example.com --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `hreflang_validator.py` | Validate hreflang tag implementation | Hreflang errors, missing bidirectional links, x-default issues |
| `international_auditor.py` | Full international SEO audit | URL structure, content parity, redirect logic, language detection |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Hreflang Validator

```bash
# Validate hreflang for the homepage
python scripts/hreflang_validator.py --url https://example.com --json

# Validate with sitemap-based discovery
python scripts/hreflang_validator.py --url https://example.com --sitemap https://example.com/sitemap.xml --json

# Check specific pages
python scripts/hreflang_validator.py --urls-file pages.txt --json
```

**Capabilities**:
- Hreflang tag extraction from HTML head, HTTP headers, and XML sitemap
- Bidirectional validation (if page A→B, then B→A must exist)
- Self-referencing check (each page should reference itself)
- x-default tag verification
- Language/region code validation (ISO 639-1 + ISO 3166-1)
- Conflicting hreflang detection
- Missing language version detection
- Return tag validation (confirmation links from alternate pages)
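The bidirectional check can be sketched as follows (a minimal illustration of the rule "if A→B, then B→A must exist"; the map shape and function name are assumptions, not the validator's actual API):

```python
def missing_bidirectional(hreflang_map: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Given {page_url: {lang: alternate_url}}, return (page, alternate)
    pairs where the alternate page does not link back to the page."""
    missing = []
    for page, alternates in hreflang_map.items():
        for lang, alt_url in alternates.items():
            if alt_url == page:          # self-reference, nothing to confirm
                continue
            back = hreflang_map.get(alt_url, {})
            if page not in back.values():
                missing.append((page, alt_url))
    return missing
```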

## International Auditor

```bash
# Full international audit
python scripts/international_auditor.py --url https://example.com --json

# URL structure analysis
python scripts/international_auditor.py --url https://example.com --scope structure --json

# Content parity check
python scripts/international_auditor.py --url https://example.com --scope parity --json

# Korean expansion focus
python scripts/international_auditor.py --url https://example.com --korean-expansion --json
```

**Capabilities**:
- URL structure analysis (ccTLD vs subdomain vs subdirectory)
  - Recommendation engine based on business context
- Content parity audit across language versions
  - Page count comparison per language
  - Key page availability check (home, about, contact, products)
  - Content freshness comparison across languages
- Language/locale detection vs declared language
  - HTML lang attribute check
  - Content-Language header check
  - Actual content language detection
- International redirect logic audit
  - IP-based redirect detection
  - Accept-Language redirect behavior
  - Geo-redirect best practices (suggest, don't force)
- Korean expansion patterns (ko→ja, ko→zh, ko→en)
  - Priority market recommendations for Korean businesses
  - CJK-specific URL encoding issues
  - Regional search engine considerations (Naver, Baidu, Yahoo Japan)

## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-metrics-by-country` | Country-level traffic distribution |
| `site-explorer-organic-keywords` | Keywords by country filter |

## Output Format

```json
{
  "url": "https://example.com",
  "url_structure": "subdirectory",
  "languages_detected": ["ko", "en", "ja"],
  "hreflang_validation": {
    "total_pages_checked": 50,
    "errors": [],
    "warnings": [],
    "missing_bidirectional": [],
    "missing_self_reference": [],
    "x_default_present": true
  },
  "content_parity": {
    "ko": {"pages": 150, "freshness_score": 90},
    "en": {"pages": 120, "freshness_score": 75},
    "ja": {"pages": 80, "freshness_score": 60}
  },
  "redirect_logic": {
    "ip_based_redirect": false,
    "language_based_redirect": true,
    "is_forced": false
  },
  "score": 68,
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | International SEO |
| Priority | Select | Based on hreflang error count |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: INTL-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., hreflang, x-default, ccTLD)
- URLs and code remain unchanged
207
custom-skills/26-seo-international/code/scripts/base_client.py
Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using the token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise an error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
File diff suppressed because it is too large
@@ -0,0 +1,10 @@
# 26-seo-international dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
langdetect>=1.0.9
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
124
custom-skills/26-seo-international/desktop/SKILL.md
Normal file
@@ -0,0 +1,124 @@
---
name: seo-international
description: |
  International SEO audit and hreflang validation for multi-language and multi-region websites.
  Triggers: hreflang, international SEO, multi-language, multi-region, content parity, x-default, ccTLD, 다국어 SEO.
---

# International SEO Audit

## Purpose

Audit international SEO implementation: hreflang tags, URL structure patterns, content parity across language versions, redirect logic, and Korean expansion strategies. Identify issues preventing proper multi-language indexing.

## Core Capabilities

1. **Hreflang Validation** - Bidirectional links, self-reference, x-default, language code validation
2. **URL Structure Analysis** - ccTLD vs subdomain vs subdirectory pattern detection
3. **Content Parity Audit** - Page count comparison, key page availability across languages
4. **Redirect Logic Audit** - IP-based and Accept-Language redirects, forced redirect detection
5. **Korean Expansion** - Priority markets (ja, zh, en), CJK URL issues, regional search engines

## MCP Tool Usage

### Ahrefs for Country Metrics
```
mcp__ahrefs__site-explorer-metrics-by-country: Country-level traffic distribution
mcp__ahrefs__site-explorer-organic-keywords: Keywords filtered by country
```

### Notion for Report Storage
```
mcp__notion__notion-create-pages: Save audit report to SEO Audit Log database
```

### WebSearch for Best Practices
```
WebSearch: Research hreflang implementation guides and regional search engine requirements
```

## Workflow

### 1. Hreflang Validation
1. Fetch the target URL and extract hreflang tags (HTML head, HTTP headers)
2. If a sitemap is provided, also extract xhtml:link hreflang entries from the XML sitemap
3. Validate language codes (ISO 639-1) and region codes (ISO 3166-1)
4. Check bidirectional links (if A references B, B must reference A)
5. Verify self-referencing tags on each page
6. Check x-default tag presence and validity
7. Detect conflicting hreflang entries for the same language-region
8. Report all errors with severity levels
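Step 3's code check can be sketched as a syntactic validator (a simplification for illustration: hreflang additionally permits ISO 15924 script subtags such as `zh-Hans`, which this pattern rejects):

```python
import re

# ISO 639-1 language, optional ISO 3166-1 alpha-2 region, or literal x-default.
_HREFLANG_RE = re.compile(r"^(?:x-default|[a-z]{2}(?:-[A-Z]{2})?)$")

def is_valid_hreflang(code: str) -> bool:
    """Shape check only; does not confirm the code is an assigned ISO value
    (that requires a lookup table of registered codes)."""
    return bool(_HREFLANG_RE.match(code))
```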

### 2. URL Structure Analysis
1. Crawl known language versions of the site
2. Classify the pattern: ccTLD (example.kr), subdomain (ko.example.com), subdirectory (example.com/ko/)
3. Check consistency across all language versions
4. Provide a recommendation based on business context
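The classification in step 2 can be sketched as a heuristic (an illustrative helper, not the auditor's real logic; the two-letter-TLD test is a rough proxy for "country-code TLD"):

```python
from urllib.parse import urlparse

def classify_structure(url: str, lang: str) -> str:
    """Classify how a language version is encoded in a URL:
    ccTLD (example.kr), subdomain (ko.example.com), or
    subdirectory (example.com/ko/)."""
    parsed = urlparse(url)
    host, path = parsed.hostname or "", parsed.path
    if host.split(".")[0] == lang:
        return "subdomain"
    if f"/{lang}/" in path or path.rstrip("/").endswith(f"/{lang}"):
        return "subdirectory"
    if len(host.rsplit(".", 1)[-1]) == 2:  # two-letter TLD like .kr, .jp
        return "ccTLD"
    return "unknown"
```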

### 3. Content Parity Audit
1. Discover all language versions from hreflang tags
2. Count pages per language version
3. Check availability of key pages (home, about, contact, products/services)
4. Compare content freshness (last-modified dates) across versions
5. Flag significant gaps in content availability

### 4. Redirect Logic Audit
1. Test the URL with different Accept-Language headers (ko, en, ja, zh)
2. Check whether redirects are forced (no way to override) or suggested (banner/popup)
3. Flag forced geo/language redirects as an anti-pattern
4. Recommend the proper implementation (suggest, do not force)
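Step 1 can be sketched as a probe that replays the same request with different Accept-Language headers (a hypothetical helper, not the auditor's actual code; the injectable `fetch` parameter exists only so the sketch can be exercised offline):

```python
def _default_fetch(url: str, lang: str):
    import requests  # listed in scripts/requirements.txt; deferred import
    return requests.get(url, headers={"Accept-Language": lang},
                        allow_redirects=False, timeout=10)

def probe_language_redirects(url, languages=("ko", "en", "ja", "zh"),
                             fetch=_default_fetch):
    """Record whether the server answers each language with a redirect
    (3xx) and to where. A forced redirect shows up as a bare 3xx; a
    'suggest' implementation returns 200 and offers the switch client-side."""
    results = {}
    for lang in languages:
        resp = fetch(url, lang)
        results[lang] = {
            "status": resp.status_code,
            "redirects": 300 <= resp.status_code < 400,
            "location": resp.headers.get("Location"),
        }
    return results
```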

### 5. Korean Expansion Analysis (Optional)
1. Analyze current traffic by country via Ahrefs
2. Recommend priority target markets for Korean businesses
3. Check for CJK-specific URL encoding issues
4. Advise on regional search engines (Naver, Baidu, Yahoo Japan)

## Output Format

```markdown
## 다국어 SEO 감사: [domain]

### Hreflang 검증
- 검사 페이지 수: [count]
- 오류: [count] (심각 [count], 경고 [count])
- 양방향 링크 누락: [list]
- 자기참조 누락: [list]
- x-default: [있음/없음]

### URL 구조
- 패턴: [ccTLD/subdomain/subdirectory]
- 일관성: [양호/비일관]
- 권장사항: [recommendation]

### 콘텐츠 동등성
| 언어 | 페이지 수 | 핵심 페이지 | 최신성 점수 |
|------|----------|------------|-----------|
| ko | 150 | 5/5 | 90 |
| en | 120 | 4/5 | 75 |

### 리다이렉트 로직
- IP 기반 리다이렉트: [있음/없음]
- 언어 기반 리다이렉트: [있음/없음]
- 강제 리다이렉트: [있음/없음] (없어야 정상)

### 종합 점수: [score]/100

### 권장 조치사항
1. [Priority fixes in Korean]
```

## Notion Output (Required)

All audit reports MUST be saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Properties**: Issue (title), Site (url), Category (International SEO), Priority, Found Date, Audit ID
- **Language**: Korean with English technical terms
- **Audit ID Format**: INTL-YYYYMMDD-NNN

## Limitations

- Cannot detect server-side IP-based redirects without proxy testing
- Content language detection requires sufficient text content
- Large sites (10,000+ pages) require a sampling approach
- Sitemap-based hreflang validation requires XML sitemap access
8
custom-skills/26-seo-international/desktop/skill.yaml
Normal file
@@ -0,0 +1,8 @@
name: seo-international
description: |
  International SEO audit and hreflang validation. Triggers: hreflang, international SEO, multi-language, multi-region, content parity, x-default, ccTLD.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
43
custom-skills/26-seo-international/desktop/tools/ahrefs.md
Normal file
@@ -0,0 +1,43 @@
# Ahrefs

## Tools Used

### site-explorer-metrics-by-country
- **Purpose**: Get country-level organic traffic distribution
- **Usage**: Analyze which countries drive traffic to identify international SEO opportunities
- **Parameters**: `target` (domain), `country` (optional filter)
- **Example**:
  ```
  mcp__ahrefs__site-explorer-metrics-by-country:
    target: example.com
  ```

### site-explorer-organic-keywords
- **Purpose**: Get organic keyword rankings filtered by country
- **Usage**: Analyze keyword performance in specific markets
- **Parameters**: `target` (domain), `country` (ISO country code)
- **Example**:
  ```
  mcp__ahrefs__site-explorer-organic-keywords:
    target: example.com
    country: kr
  ```

## Configuration

- Ahrefs MCP server must be connected in Claude Desktop
- API access requires an active Ahrefs subscription

## Common Patterns

### Country Traffic Analysis
1. Call `site-explorer-metrics-by-country` to get the traffic distribution
2. Identify top countries by organic traffic share
3. Compare with hreflang implementation coverage
4. Flag countries with traffic but no localized version

### Keyword Gap by Market
1. Call `site-explorer-organic-keywords` with a country filter
2. Compare keyword counts across target markets
3. Identify markets with low keyword coverage
4. Recommend content localization priorities
51
custom-skills/26-seo-international/desktop/tools/notion.md
Normal file
@@ -0,0 +1,51 @@
# Notion

## Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

## Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title in Korean with date |
| Site | URL | Audited website URL |
| Category | Select | "International SEO" |
| Priority | Select | Based on hreflang error severity |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: INTL-YYYYMMDD-NNN |

## Example: Create Audit Report

```
mcp__notion__notion-create-pages:
  pages:
    - parent_id: "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"
      parent_type: "database"
      title: "다국어 SEO 감사 - example.com (2025-01-15)"
      properties:
        Site:
          url: "https://example.com"
        Category:
          select:
            name: "International SEO"
        Priority:
          select:
            name: "High"
        Found Date:
          date:
            start: "2025-01-15"
        Audit ID:
          rich_text:
            - text:
                content: "INTL-20250115-001"
```

## Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (hreflang, x-default, ccTLD, subdomain)
- URLs and code remain unchanged
@@ -0,0 +1,40 @@
# WebSearch

## Purpose

Search the web for current international SEO best practices, hreflang implementation guides, and regional search engine requirements.

## Common Search Queries

### Hreflang Best Practices
```
WebSearch: "hreflang implementation best practices 2025"
WebSearch: "hreflang common errors fix"
WebSearch: "x-default hreflang when to use"
```

### Regional Search Engines
```
WebSearch: "Naver SEO requirements Korean websites"
WebSearch: "Baidu SEO China market entry"
WebSearch: "Yahoo Japan SEO vs Google Japan"
```

### International URL Structure
```
WebSearch: "ccTLD vs subdomain vs subdirectory international SEO"
WebSearch: "Google recommendations international targeting"
```

### Korean Market Expansion
```
WebSearch: "Korean business international SEO Japan market"
WebSearch: "CJK URL encoding SEO best practices"
```

## Usage Pattern

1. Search for domain-specific international SEO intelligence
2. Verify current Google documentation on hreflang
3. Research regional search engine requirements for target markets
4. Find competitor international SEO strategies
147
custom-skills/27-seo-ai-visibility/code/CLAUDE.md
Normal file
@@ -0,0 +1,147 @@
# CLAUDE.md

## Overview

AI search visibility and brand radar tool for tracking how a brand appears in AI-generated search answers. Monitors AI answer citations, tracks share of voice in AI search versus competitors, analyzes cited domains and pages, and tracks impressions/mentions history. Uses Ahrefs Brand Radar APIs for comprehensive AI visibility monitoring.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# AI visibility tracking
python scripts/ai_visibility_tracker.py --target example.com --json

# AI citation analysis
python scripts/ai_citation_analyzer.py --target example.com --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `ai_visibility_tracker.py` | Track brand visibility in AI search results | AI impressions, mentions, share of voice, trends |
| `ai_citation_analyzer.py` | Analyze AI answer citations and source pages | Cited domains, cited pages, AI response analysis |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## AI Visibility Tracker

```bash
# Current visibility overview
python scripts/ai_visibility_tracker.py --target example.com --json

# With competitor comparison
python scripts/ai_visibility_tracker.py --target example.com --competitor comp1.com --competitor comp2.com --json

# Historical trend (impressions/mentions)
python scripts/ai_visibility_tracker.py --target example.com --history --json

# Share of voice analysis
python scripts/ai_visibility_tracker.py --target example.com --sov --json
```

**Capabilities**:
- AI impressions overview (how often the brand appears in AI answers)
- AI mentions overview (brand mention frequency across AI engines)
- Share of Voice in AI search vs competitors
- Impressions history over time (trend tracking)
- Mentions history over time
- SOV history and trend analysis
- Competitor AI visibility comparison
|
||||||
|
|
||||||
|
## AI Citation Analyzer
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Analyze AI citations for brand
|
||||||
|
python scripts/ai_citation_analyzer.py --target example.com --json
|
||||||
|
|
||||||
|
# Cited domains analysis
|
||||||
|
python scripts/ai_citation_analyzer.py --target example.com --cited-domains --json
|
||||||
|
|
||||||
|
# Cited pages analysis
|
||||||
|
python scripts/ai_citation_analyzer.py --target example.com --cited-pages --json
|
||||||
|
|
||||||
|
# AI response content analysis
|
||||||
|
python scripts/ai_citation_analyzer.py --target example.com --responses --json
|
||||||
|
```
|
||||||
|
|
||||||
|
**Capabilities**:
|
||||||
|
- AI response analysis (how the brand appears in AI-generated answers)
|
||||||
|
- Cited domains analysis (which source domains AI engines reference)
|
||||||
|
- Cited pages analysis (which specific URLs get cited)
|
||||||
|
- Citation sentiment and context analysis
|
||||||
|
- Citation frequency ranking
|
||||||
|
- Competitor citation comparison
|
||||||
|
- Recommendation generation for improving AI visibility
|
||||||
|
|
||||||
|
## Ahrefs MCP Tools Used
|
||||||
|
|
||||||
|
| Tool | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `brand-radar-ai-responses` | Get AI-generated responses mentioning brand |
|
||||||
|
| `brand-radar-cited-domains` | Domains cited in AI answers |
|
||||||
|
| `brand-radar-cited-pages` | Specific pages cited in AI answers |
|
||||||
|
| `brand-radar-impressions-history` | Brand impression trend over time |
|
||||||
|
| `brand-radar-impressions-overview` | Current impression metrics |
|
||||||
|
| `brand-radar-mentions-history` | Brand mention trend over time |
|
||||||
|
| `brand-radar-mentions-overview` | Current mention metrics |
|
||||||
|
| `brand-radar-sov-history` | Share of voice trend |
|
||||||
|
| `brand-radar-sov-overview` | Current share of voice |
|
||||||
|
|
||||||
|
## Output Format
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"target": "example.com",
|
||||||
|
"impressions": {
|
||||||
|
"total": 15000,
|
||||||
|
"trend": "increasing",
|
||||||
|
"change_pct": 12.5
|
||||||
|
},
|
||||||
|
"mentions": {
|
||||||
|
"total": 850,
|
||||||
|
"trend": "stable",
|
||||||
|
"change_pct": 2.1
|
||||||
|
},
|
||||||
|
"share_of_voice": {
|
||||||
|
"brand_sov": 18.5,
|
||||||
|
"competitors": [
|
||||||
|
{"domain": "comp1.com", "sov": 25.3},
|
||||||
|
{"domain": "comp2.com", "sov": 15.8}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"cited_domains": [...],
|
||||||
|
"cited_pages": [...],
|
||||||
|
"ai_responses_sample": [...],
|
||||||
|
"recommendations": [...],
|
||||||
|
"timestamp": "2025-01-01T00:00:00"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
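As a minimal consumer sketch of this envelope (the helper name `sov_gap` and the inline sample are illustrative, not part of the scripts), the share-of-voice block can be reduced to a single gap metric against the leading competitor:

```python
import json

def sov_gap(report: dict) -> float:
    """Gap between the top competitor's share of voice and the brand's own SOV."""
    sov = report["share_of_voice"]
    top = max(c["sov"] for c in sov["competitors"])
    return round(top - sov["brand_sov"], 1)

# Sample payload matching the Output Format above
report = json.loads("""{
  "target": "example.com",
  "share_of_voice": {
    "brand_sov": 18.5,
    "competitors": [
      {"domain": "comp1.com", "sov": 25.3},
      {"domain": "comp2.com", "sov": 15.8}
    ]
  }
}""")

print(sov_gap(report))  # 6.8
```

A positive gap means at least one competitor out-ranks the brand in AI answers; feeding it into the Priority property of the audit report is one plausible use.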
## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Tracked website URL |
| Category | Select | AI Search Visibility |
| Priority | Select | Based on SOV trend |
| Found Date | Date | Report date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: AI-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., AI Search, Share of Voice, Brand Radar)
- URLs and code remain unchanged
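A minimal sketch of mapping the property table above onto a Notion page payload. The helper `build_audit_properties` is hypothetical (not part of the scripts), and the nested value shapes assume the standard Notion pages API property format:

```python
from datetime import date

DATABASE_ID = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"

def build_audit_properties(site_url: str, priority: str, seq: int, today: date) -> dict:
    """Build a Notion page `properties` dict matching the Required Properties table."""
    audit_id = f"AI-{today:%Y%m%d}-{seq:03d}"  # Format: AI-YYYYMMDD-NNN
    return {
        "Issue": {"title": [{"text": {"content": f"AI 검색 가시성 리포트 {today:%Y-%m-%d}"}}]},
        "Site": {"url": site_url},
        "Category": {"select": {"name": "AI Search Visibility"}},
        "Priority": {"select": {"name": priority}},
        "Found Date": {"date": {"start": today.isoformat()}},
        "Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
    }

props = build_audit_properties("https://example.com", "High", 1, date(2025, 1, 1))
print(props["Audit ID"]["rich_text"][0]["text"]["content"])  # AI-20250101-001
```

The payload would then be posted as `{"parent": {"database_id": DATABASE_ID}, "properties": props}`; verify the exact value shapes against the Notion API reference before use.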
@@ -0,0 +1,611 @@

"""
|
||||||
|
AI Citation Analyzer - Brand Radar Citation Analysis
|
||||||
|
=====================================================
|
||||||
|
Purpose: Analyze how a brand is cited in AI-generated search answers,
|
||||||
|
including cited domains, cited pages, and AI response content.
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python ai_citation_analyzer.py --target example.com --json
|
||||||
|
python ai_citation_analyzer.py --target example.com --cited-domains --json
|
||||||
|
python ai_citation_analyzer.py --target example.com --cited-pages --json
|
||||||
|
python ai_citation_analyzer.py --target example.com --responses --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
# Add parent to path for base_client import
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent))
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Data classes
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class AiResponse:
|
||||||
|
"""An AI-generated response that mentions the brand."""
|
||||||
|
query: str = ""
|
||||||
|
response_text: str = ""
|
||||||
|
brand_mentioned: bool = False
|
||||||
|
sentiment: str = "neutral" # positive, neutral, negative
|
||||||
|
source_engine: str = ""
|
||||||
|
date: str = ""
|
||||||
|
url: str = ""
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CitedDomain:
|
||||||
|
"""A domain cited in AI-generated answers."""
|
||||||
|
domain: str = ""
|
||||||
|
citation_count: int = 0
|
||||||
|
topics: list[str] = field(default_factory=list)
|
||||||
|
share_pct: float = 0.0
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CitedPage:
|
||||||
|
"""A specific page cited in AI-generated answers."""
|
||||||
|
url: str = ""
|
||||||
|
title: str = ""
|
||||||
|
citation_count: int = 0
|
||||||
|
context: str = ""
|
||||||
|
topics: list[str] = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CitationAnalysisResult:
|
||||||
|
"""Complete citation analysis result."""
|
||||||
|
target: str = ""
|
||||||
|
ai_responses: list[AiResponse] = field(default_factory=list)
|
||||||
|
cited_domains: list[CitedDomain] = field(default_factory=list)
|
||||||
|
cited_pages: list[CitedPage] = field(default_factory=list)
|
||||||
|
sentiment_summary: dict = field(default_factory=dict)
|
||||||
|
citation_ranking: list[dict] = field(default_factory=list)
|
||||||
|
competitor_citations: list[dict] = field(default_factory=list)
|
||||||
|
recommendations: list[str] = field(default_factory=list)
|
||||||
|
timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
"""Convert result to dictionary."""
|
||||||
|
return {
|
||||||
|
"target": self.target,
|
||||||
|
"ai_responses": [asdict(r) for r in self.ai_responses],
|
||||||
|
"cited_domains": [asdict(d) for d in self.cited_domains],
|
||||||
|
"cited_pages": [asdict(p) for p in self.cited_pages],
|
||||||
|
"sentiment_summary": self.sentiment_summary,
|
||||||
|
"citation_ranking": self.citation_ranking,
|
||||||
|
"competitor_citations": self.competitor_citations,
|
||||||
|
"recommendations": self.recommendations,
|
||||||
|
"timestamp": self.timestamp,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# MCP tool caller helper
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def call_mcp_tool(tool_name: str, params: dict) -> dict:
|
||||||
|
"""
|
||||||
|
Call an Ahrefs MCP tool and return the parsed JSON response.
|
||||||
|
|
||||||
|
In Claude Desktop / Claude Code environments the MCP tools are invoked
|
||||||
|
directly by the AI agent. This helper exists so that the script can also
|
||||||
|
be executed standalone via subprocess for testing purposes.
|
||||||
|
"""
|
||||||
|
logger.info(f"Calling MCP tool: {tool_name} with params: {params}")
|
||||||
|
try:
|
||||||
|
cmd = ["claude", "mcp", "call", "ahrefs", tool_name, json.dumps(params)]
|
||||||
|
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
|
||||||
|
if result.returncode == 0 and result.stdout.strip():
|
||||||
|
return json.loads(result.stdout.strip())
|
||||||
|
logger.warning(f"MCP tool {tool_name} returned non-zero or empty: {result.stderr}")
|
||||||
|
return {}
|
||||||
|
except (subprocess.TimeoutExpired, json.JSONDecodeError, FileNotFoundError) as exc:
|
||||||
|
logger.warning(f"MCP call failed ({exc}). Returning empty dict.")
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# AI Citation Analyzer
# ---------------------------------------------------------------------------

class AiCitationAnalyzer(BaseAsyncClient):
    """Analyze AI answer citations and source pages for a brand."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.logger = logging.getLogger(self.__class__.__name__)

    # ---- AI Responses ----

    async def get_ai_responses(self, target: str) -> list[AiResponse]:
        """Fetch AI-generated responses mentioning the brand via brand-radar-ai-responses."""
        self.logger.info(f"Fetching AI responses for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-ai-responses",
            {"target": target},
        )
        responses: list[AiResponse] = []
        if not data:
            return responses

        items = data if isinstance(data, list) else data.get("responses", data.get("data", []))
        for item in items:
            if isinstance(item, dict):
                responses.append(AiResponse(
                    query=item.get("query", item.get("keyword", "")),
                    response_text=item.get("response_text", item.get("answer", item.get("text", ""))),
                    brand_mentioned=item.get("brand_mentioned", True),
                    sentiment=item.get("sentiment", "neutral"),
                    source_engine=item.get("source_engine", item.get("engine", "")),
                    date=item.get("date", ""),
                    url=item.get("url", ""),
                ))
        return responses
    # ---- Cited Domains ----

    async def get_cited_domains(self, target: str) -> list[CitedDomain]:
        """Fetch domains cited in AI answers via brand-radar-cited-domains."""
        self.logger.info(f"Fetching cited domains for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-cited-domains",
            {"target": target},
        )
        domains: list[CitedDomain] = []
        if not data:
            return domains

        items = data if isinstance(data, list) else data.get("domains", data.get("data", []))
        for item in items:
            if isinstance(item, dict):
                domains.append(CitedDomain(
                    domain=item.get("domain", ""),
                    citation_count=item.get("citation_count", item.get("citations", item.get("count", 0))),
                    topics=item.get("topics", []),
                    share_pct=item.get("share_pct", item.get("share", 0.0)),
                ))
        return domains

    # ---- Cited Pages ----

    async def get_cited_pages(self, target: str) -> list[CitedPage]:
        """Fetch specific pages cited in AI answers via brand-radar-cited-pages."""
        self.logger.info(f"Fetching cited pages for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-cited-pages",
            {"target": target},
        )
        pages: list[CitedPage] = []
        if not data:
            return pages

        items = data if isinstance(data, list) else data.get("pages", data.get("data", []))
        for item in items:
            if isinstance(item, dict):
                pages.append(CitedPage(
                    url=item.get("url", ""),
                    title=item.get("title", ""),
                    citation_count=item.get("citation_count", item.get("citations", item.get("count", 0))),
                    context=item.get("context", item.get("snippet", "")),
                    topics=item.get("topics", []),
                ))
        return pages
    # ---- Sentiment Analysis ----

    @staticmethod
    def analyze_response_sentiment(responses: list[AiResponse]) -> dict:
        """
        Analyze the sentiment distribution of AI responses.

        Returns a summary with counts and percentages for each sentiment category.
        """
        if not responses:
            return {
                "total": 0,
                "positive": 0,
                "neutral": 0,
                "negative": 0,
                "positive_pct": 0.0,
                "neutral_pct": 0.0,
                "negative_pct": 0.0,
                "overall_sentiment": "unknown",
            }

        total = len(responses)
        positive = sum(1 for r in responses if r.sentiment == "positive")
        neutral = sum(1 for r in responses if r.sentiment == "neutral")
        negative = sum(1 for r in responses if r.sentiment == "negative")

        positive_pct = round((positive / total) * 100, 1)
        neutral_pct = round((neutral / total) * 100, 1)
        negative_pct = round((negative / total) * 100, 1)

        # Determine overall sentiment
        if positive_pct >= 60:
            overall = "positive"
        elif negative_pct >= 40:
            overall = "negative"
        elif positive_pct > negative_pct:
            overall = "leaning_positive"
        elif negative_pct > positive_pct:
            overall = "leaning_negative"
        else:
            overall = "neutral"

        return {
            "total": total,
            "positive": positive,
            "neutral": neutral,
            "negative": negative,
            "positive_pct": positive_pct,
            "neutral_pct": neutral_pct,
            "negative_pct": negative_pct,
            "overall_sentiment": overall,
        }
    # ---- Citation Ranking ----

    @staticmethod
    def rank_citations(items: list[CitedDomain] | list[CitedPage]) -> list[dict]:
        """Rank cited domains or pages by citation frequency."""
        if not items:
            return []

        ranked = sorted(items, key=lambda x: x.citation_count, reverse=True)
        total_citations = sum(item.citation_count for item in ranked)

        result = []
        for rank, item in enumerate(ranked, 1):
            entry = asdict(item)
            entry["rank"] = rank
            entry["share_of_citations"] = (
                round((item.citation_count / total_citations) * 100, 1)
                if total_citations > 0
                else 0.0
            )
            result.append(entry)

        return result
    # ---- Competitor Citation Comparison ----

    async def compare_competitor_citations(
        self, target: str, competitors: list[str]
    ) -> list[dict]:
        """Compare citation profiles between the target and competitors."""
        self.logger.info(f"Comparing citations for {target} vs {competitors}")
        results = []

        all_domains = [target] + competitors
        for domain in all_domains:
            cited_domains = await self.get_cited_domains(domain)
            cited_pages = await self.get_cited_pages(domain)

            total_domain_citations = sum(d.citation_count for d in cited_domains)
            total_page_citations = sum(p.citation_count for p in cited_pages)
            unique_domains = len(cited_domains)
            unique_pages = len(cited_pages)

            results.append({
                "domain": domain,
                "is_target": domain == target,
                "total_domain_citations": total_domain_citations,
                "total_page_citations": total_page_citations,
                "unique_cited_domains": unique_domains,
                "unique_cited_pages": unique_pages,
                "top_cited_domain": cited_domains[0].domain if cited_domains else "",
                "top_cited_page": cited_pages[0].url if cited_pages else "",
            })

        # Sort by total page citations, descending
        results.sort(key=lambda x: x["total_page_citations"], reverse=True)
        return results
    # ---- Recommendations ----

    @staticmethod
    def generate_recommendations(result: CitationAnalysisResult) -> list[str]:
        """Generate actionable recommendations for improving AI citations."""
        recs: list[str] = []

        # Based on citation count
        total_page_citations = sum(p.citation_count for p in result.cited_pages)
        if total_page_citations == 0:
            recs.append(
                "AI 검색 엔진에서 인용된 페이지가 없습니다. "
                "고품질 원본 콘텐츠(연구 데이터, 종합 가이드, 전문가 인사이트)를 "
                "발행하여 AI 엔진의 인용 대상이 되도록 하세요."
            )
        elif total_page_citations < 10:
            recs.append(
                f"인용된 페이지 수가 {total_page_citations}건으로 적습니다. "
                "FAQ, How-to, 비교 분석 등 AI가 참조하기 쉬운 "
                "구조화된 콘텐츠를 추가하세요."
            )

        # Based on domain diversity
        if result.cited_domains:
            target_domains = [d for d in result.cited_domains if d.domain == result.target]
            if not target_domains:
                recs.append(
                    "타깃 도메인이 AI 인용 도메인 목록에 포함되지 않았습니다. "
                    "도메인 권위(Domain Authority) 향상과 "
                    "Schema Markup(JSON-LD) 적용을 우선 추진하세요."
                )

        # Based on sentiment
        sentiment = result.sentiment_summary
        if sentiment.get("negative_pct", 0) > 30:
            recs.append(
                f"AI 응답 중 부정적 언급 비율이 {sentiment['negative_pct']}%입니다. "
                "브랜드 평판 관리와 긍정적 콘텐츠 확대가 필요합니다. "
                "고객 리뷰, 성공 사례, 수상 내역 등을 강화하세요."
            )
        elif sentiment.get("overall_sentiment") == "positive":
            recs.append(
                "AI 응답에서 브랜드 언급이 전반적으로 긍정적입니다. "
                "이 긍정적 이미지를 활용하여 더 많은 키워드에서 "
                "AI 인용을 확대하세요."
            )

        # Content strategy recommendations
        if result.cited_pages:
            top_pages = sorted(result.cited_pages, key=lambda p: p.citation_count, reverse=True)[:3]
            top_topics = set()
            for page in top_pages:
                top_topics.update(page.topics)
            if top_topics:
                topics_str = ", ".join(list(top_topics)[:5])
                recs.append(
                    f"가장 많이 인용되는 주제는 [{topics_str}]입니다. "
                    "이 주제들에 대한 심층 콘텐츠를 추가 제작하세요."
                )

        # E-E-A-T and structured data
        recs.append(
            "AI 인용률 향상을 위한 핵심 전략: "
            "(1) E-E-A-T 시그널 강화 - 저자 프로필, 전문가 인용, 실제 경험 콘텐츠, "
            "(2) 구조화된 데이터 적용 - FAQ, HowTo, Article Schema, "
            "(3) 콘텐츠 정확성 및 최신성 유지, "
            "(4) 원본 데이터와 독자적 연구 결과 발행."
        )

        # Competitor-based recommendations
        if result.competitor_citations:
            leader = result.competitor_citations[0]
            if not leader.get("is_target", False):
                recs.append(
                    f"인용 리더는 {leader['domain']}입니다 "
                    f"(페이지 인용 {leader['total_page_citations']}건). "
                    "해당 경쟁사의 인용된 페이지를 분석하여 "
                    "콘텐츠 갭을 파악하세요."
                )

        return recs
    # ---- Main Orchestrator ----

    async def analyze(
        self,
        target: str,
        competitors: list[str] | None = None,
        include_responses: bool = True,
        include_cited_domains: bool = True,
        include_cited_pages: bool = True,
    ) -> CitationAnalysisResult:
        """
        Orchestrate the full citation analysis.

        Args:
            target: Domain to analyze
            competitors: Optional competitor domains
            include_responses: Fetch AI response data
            include_cited_domains: Fetch cited domains
            include_cited_pages: Fetch cited pages
        """
        self.logger.info(f"Starting AI citation analysis for {target}")
        result = CitationAnalysisResult(target=target)

        # AI responses
        if include_responses:
            result.ai_responses = await self.get_ai_responses(target)
            result.sentiment_summary = self.analyze_response_sentiment(result.ai_responses)

        # Cited domains
        if include_cited_domains:
            result.cited_domains = await self.get_cited_domains(target)
            if result.cited_domains:
                result.citation_ranking = self.rank_citations(result.cited_domains)

        # Cited pages
        if include_cited_pages:
            result.cited_pages = await self.get_cited_pages(target)

        # Competitor comparison
        if competitors:
            result.competitor_citations = await self.compare_competitor_citations(
                target, competitors
            )

        # Recommendations
        result.recommendations = self.generate_recommendations(result)

        self.print_stats()
        return result
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def build_parser() -> argparse.ArgumentParser:
    """Build argument parser for CLI usage."""
    parser = argparse.ArgumentParser(
        description="AI Citation Analyzer - Analyze AI answer citations and source pages",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --target example.com --json
  %(prog)s --target example.com --cited-domains --json
  %(prog)s --target example.com --cited-pages --json
  %(prog)s --target example.com --responses --competitor comp1.com --json
  %(prog)s --target example.com --output citations.json
""",
    )
    parser.add_argument(
        "--target", required=True,
        help="Target domain to analyze (e.g., example.com)",
    )
    parser.add_argument(
        "--competitor", action="append", default=[],
        help="Competitor domain (repeatable), e.g., --competitor a.com --competitor b.com",
    )
    parser.add_argument(
        "--cited-domains", action="store_true",
        help="Include cited domains analysis",
    )
    parser.add_argument(
        "--cited-pages", action="store_true",
        help="Include cited pages analysis",
    )
    parser.add_argument(
        "--responses", action="store_true",
        help="Include AI response content analysis",
    )
    parser.add_argument(
        "--json", action="store_true",
        help="Output result as JSON to stdout",
    )
    parser.add_argument(
        "--output", type=str, default=None,
        help="Save JSON output to file path",
    )
    return parser
def print_summary(result: CitationAnalysisResult) -> None:
    """Print a human-readable summary of the citation analysis."""
    print("\n" + "=" * 60)
    print(f" AI Citation Analysis: {result.target}")
    print("=" * 60)

    # AI Responses
    if result.ai_responses:
        print(f"\n AI Responses: {len(result.ai_responses)}")
        for resp in result.ai_responses[:5]:
            engine_tag = f" [{resp.source_engine}]" if resp.source_engine else ""
            sentiment_tag = f" ({resp.sentiment})"
            print(f"   - Q: {resp.query[:60]}{engine_tag}{sentiment_tag}")
        if len(result.ai_responses) > 5:
            print(f"   ... and {len(result.ai_responses) - 5} more")

    # Sentiment Summary
    if result.sentiment_summary:
        s = result.sentiment_summary
        print(f"\n Sentiment: {s.get('overall_sentiment', 'unknown')}")
        print(f"   Positive: {s.get('positive', 0)} ({s.get('positive_pct', 0):.1f}%)")
        print(f"   Neutral:  {s.get('neutral', 0)} ({s.get('neutral_pct', 0):.1f}%)")
        print(f"   Negative: {s.get('negative', 0)} ({s.get('negative_pct', 0):.1f}%)")

    # Cited Domains
    if result.cited_domains:
        print(f"\n Cited Domains: {len(result.cited_domains)}")
        for domain in result.cited_domains[:10]:
            topics_str = ", ".join(domain.topics[:3]) if domain.topics else ""
            print(f"   {domain.domain}: {domain.citation_count} citations"
                  f"{f' [{topics_str}]' if topics_str else ''}")
        if len(result.cited_domains) > 10:
            print(f"   ... and {len(result.cited_domains) - 10} more")

    # Cited Pages
    if result.cited_pages:
        print(f"\n Cited Pages: {len(result.cited_pages)}")
        for page in result.cited_pages[:10]:
            title = page.title[:50] if page.title else page.url[:50]
            print(f"   {title}: {page.citation_count} citations")
        if len(result.cited_pages) > 10:
            print(f"   ... and {len(result.cited_pages) - 10} more")

    # Competitor Comparison
    if result.competitor_citations:
        print("\n Competitor Citation Comparison:")
        for comp in result.competitor_citations:
            marker = " <-- target" if comp.get("is_target") else ""
            print(f"   {comp['domain']}: "
                  f"domains={comp['unique_cited_domains']}, "
                  f"pages={comp['unique_cited_pages']}, "
                  f"page_citations={comp['total_page_citations']}{marker}")

    # Recommendations
    if result.recommendations:
        print("\n Recommendations:")
        for i, rec in enumerate(result.recommendations, 1):
            print(f"   {i}. {rec}")

    print("\n" + "=" * 60)
    print(f" Generated: {result.timestamp}")
    print("=" * 60 + "\n")
async def main() -> None:
    """CLI entry point."""
    parser = build_parser()
    args = parser.parse_args()

    # Determine which sections to include.
    # If no specific flags are given, include everything.
    any_specific = args.cited_domains or args.cited_pages or args.responses
    include_responses = args.responses or not any_specific
    include_cited_domains = args.cited_domains or not any_specific
    include_cited_pages = args.cited_pages or not any_specific

    analyzer = AiCitationAnalyzer(
        max_concurrent=5,
        requests_per_second=2.0,
    )

    result = await analyzer.analyze(
        target=args.target,
        competitors=args.competitor if args.competitor else None,
        include_responses=include_responses,
        include_cited_domains=include_cited_domains,
        include_cited_pages=include_cited_pages,
    )

    # Output
    if args.json or args.output:
        output_data = result.to_dict()
        json_str = json.dumps(output_data, ensure_ascii=False, indent=2)

        if args.json:
            print(json_str)

        if args.output:
            output_path = Path(args.output)
            output_path.parent.mkdir(parents=True, exist_ok=True)
            output_path.write_text(json_str, encoding="utf-8")
            logger.info(f"Report saved to {args.output}")
    else:
        print_summary(result)


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,594 @@

|
"""
|
||||||
|
AI Visibility Tracker - Brand Radar Monitoring
|
||||||
|
================================================
|
||||||
|
Purpose: Track brand visibility in AI-generated search answers
|
||||||
|
using Ahrefs Brand Radar APIs.
|
||||||
|
Python: 3.10+
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python ai_visibility_tracker.py --target example.com --json
|
||||||
|
python ai_visibility_tracker.py --target example.com --competitor comp1.com --json
|
||||||
|
python ai_visibility_tracker.py --target example.com --history --json
|
||||||
|
python ai_visibility_tracker.py --target example.com --sov --json
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from dataclasses import dataclass, field, asdict
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
# Add parent to path for base_client import
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent))
|
||||||
|
from base_client import BaseAsyncClient, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

@dataclass
class ImpressionMetrics:
    """AI search impression metrics for a brand."""
    total: int = 0
    trend: str = "stable"  # increasing, decreasing, stable
    change_pct: float = 0.0
    period: str = ""
    breakdown: dict = field(default_factory=dict)


@dataclass
class MentionMetrics:
    """AI search mention metrics for a brand."""
    total: int = 0
    trend: str = "stable"
    change_pct: float = 0.0
    period: str = ""
    breakdown: dict = field(default_factory=dict)


@dataclass
class SovMetric:
    """Share of Voice metric for a single domain."""
    domain: str = ""
    sov_pct: float = 0.0
    change_pct: float = 0.0


@dataclass
class HistoryPoint:
    """Single data point in a time series."""
    date: str = ""
    value: float = 0.0


@dataclass
class CompetitorVisibility:
    """Aggregated AI visibility metrics for a competitor domain."""
    domain: str = ""
    impressions: int = 0
    mentions: int = 0
    sov: float = 0.0


@dataclass
class AiVisibilityResult:
    """Complete AI visibility tracking result."""
    target: str = ""
    impressions: ImpressionMetrics = field(default_factory=ImpressionMetrics)
    mentions: MentionMetrics = field(default_factory=MentionMetrics)
    share_of_voice: dict = field(default_factory=dict)
    impressions_history: list[HistoryPoint] = field(default_factory=list)
    mentions_history: list[HistoryPoint] = field(default_factory=list)
    sov_history: list[HistoryPoint] = field(default_factory=list)
    competitors: list[CompetitorVisibility] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict:
        """Convert result to dictionary."""
        return {
            "target": self.target,
            "impressions": asdict(self.impressions),
            "mentions": asdict(self.mentions),
            "share_of_voice": self.share_of_voice,
            "impressions_history": [asdict(h) for h in self.impressions_history],
            "mentions_history": [asdict(h) for h in self.mentions_history],
            "sov_history": [asdict(h) for h in self.sov_history],
            "competitors": [asdict(c) for c in self.competitors],
            "recommendations": self.recommendations,
            "timestamp": self.timestamp,
        }

# ---------------------------------------------------------------------------
# MCP tool caller helper
# ---------------------------------------------------------------------------

def call_mcp_tool(tool_name: str, params: dict) -> dict:
    """
    Call an Ahrefs MCP tool and return the parsed JSON response.

    In Claude Desktop / Claude Code environments the MCP tools are invoked
    directly by the AI agent. This helper exists so that the script can also
    be executed standalone via subprocess for testing purposes.
    """
    logger.info(f"Calling MCP tool: {tool_name} with params: {params}")
    try:
        cmd = ["claude", "mcp", "call", "ahrefs", tool_name, json.dumps(params)]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
        if result.returncode == 0 and result.stdout.strip():
            return json.loads(result.stdout.strip())
        logger.warning(f"MCP tool {tool_name} returned non-zero or empty: {result.stderr}")
        return {}
    except (subprocess.TimeoutExpired, json.JSONDecodeError, FileNotFoundError) as exc:
        logger.warning(f"MCP call failed ({exc}). Returning empty dict.")
        return {}

# ---------------------------------------------------------------------------
# AI Visibility Tracker
# ---------------------------------------------------------------------------

class AiVisibilityTracker(BaseAsyncClient):
    """Track brand visibility across AI-generated search results."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.logger = logging.getLogger(self.__class__.__name__)

    # ---- Impressions ----

    async def get_impressions_overview(self, target: str) -> ImpressionMetrics:
        """Fetch current AI impression metrics via brand-radar-impressions-overview."""
        self.logger.info(f"Fetching impressions overview for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-impressions-overview",
            {"target": target},
        )
        metrics = ImpressionMetrics()
        if not data:
            return metrics

        metrics.total = data.get("total_impressions", data.get("impressions", 0))
        metrics.change_pct = data.get("change_pct", data.get("change", 0.0))
        metrics.period = data.get("period", "")
        metrics.breakdown = data.get("breakdown", {})

        if metrics.change_pct > 5:
            metrics.trend = "increasing"
        elif metrics.change_pct < -5:
            metrics.trend = "decreasing"
        else:
            metrics.trend = "stable"

        return metrics

    # ---- Mentions ----

    async def get_mentions_overview(self, target: str) -> MentionMetrics:
        """Fetch current AI mention metrics via brand-radar-mentions-overview."""
        self.logger.info(f"Fetching mentions overview for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-mentions-overview",
            {"target": target},
        )
        metrics = MentionMetrics()
        if not data:
            return metrics

        metrics.total = data.get("total_mentions", data.get("mentions", 0))
        metrics.change_pct = data.get("change_pct", data.get("change", 0.0))
        metrics.period = data.get("period", "")
        metrics.breakdown = data.get("breakdown", {})

        if metrics.change_pct > 5:
            metrics.trend = "increasing"
        elif metrics.change_pct < -5:
            metrics.trend = "decreasing"
        else:
            metrics.trend = "stable"

        return metrics

    # ---- Share of Voice ----

    async def get_sov_overview(self, target: str) -> dict:
        """Fetch Share of Voice overview via brand-radar-sov-overview."""
        self.logger.info(f"Fetching SOV overview for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-sov-overview",
            {"target": target},
        )
        if not data:
            return {"brand_sov": 0.0, "competitors": []}

        brand_sov = data.get("sov", data.get("share_of_voice", 0.0))
        competitors_raw = data.get("competitors", [])
        competitors = []
        for comp in competitors_raw:
            competitors.append(SovMetric(
                domain=comp.get("domain", ""),
                sov_pct=comp.get("sov", comp.get("share_of_voice", 0.0)),
                change_pct=comp.get("change_pct", 0.0),
            ))

        return {
            "brand_sov": brand_sov,
            "competitors": [asdict(c) for c in competitors],
        }

    # ---- History ----

    async def get_impressions_history(self, target: str) -> list[HistoryPoint]:
        """Fetch impressions history via brand-radar-impressions-history."""
        self.logger.info(f"Fetching impressions history for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-impressions-history",
            {"target": target},
        )
        return self._parse_history(data)

    async def get_mentions_history(self, target: str) -> list[HistoryPoint]:
        """Fetch mentions history via brand-radar-mentions-history."""
        self.logger.info(f"Fetching mentions history for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-mentions-history",
            {"target": target},
        )
        return self._parse_history(data)

    async def get_sov_history(self, target: str) -> list[HistoryPoint]:
        """Fetch SOV history via brand-radar-sov-history."""
        self.logger.info(f"Fetching SOV history for {target}")
        data = await asyncio.to_thread(
            call_mcp_tool,
            "brand-radar-sov-history",
            {"target": target},
        )
        return self._parse_history(data)

    def _parse_history(self, data: dict | list) -> list[HistoryPoint]:
        """Parse history data from MCP response into HistoryPoint list."""
        points: list[HistoryPoint] = []
        if not data:
            return points

        items = data if isinstance(data, list) else data.get("history", data.get("data", []))
        for item in items:
            if isinstance(item, dict):
                points.append(HistoryPoint(
                    date=item.get("date", item.get("period", "")),
                    value=item.get("value", item.get("impressions", item.get("mentions", item.get("sov", 0.0)))),
                ))
        return points

    # ---- Competitor Comparison ----

    async def compare_competitors(
        self, target: str, competitors: list[str]
    ) -> list[CompetitorVisibility]:
        """Aggregate AI visibility metrics for target and competitors."""
        self.logger.info(f"Comparing competitors: {competitors}")
        results: list[CompetitorVisibility] = []

        all_domains = [target] + competitors
        for domain in all_domains:
            imp = await self.get_impressions_overview(domain)
            men = await self.get_mentions_overview(domain)
            sov_data = await self.get_sov_overview(domain)

            results.append(CompetitorVisibility(
                domain=domain,
                impressions=imp.total,
                mentions=men.total,
                sov=sov_data.get("brand_sov", 0.0),
            ))

        # Sort by SOV descending
        results.sort(key=lambda x: x.sov, reverse=True)
        return results

    # ---- Trend Calculation ----

    @staticmethod
    def calculate_trends(history: list[HistoryPoint]) -> dict:
        """Determine trend direction and statistics from history data."""
        if not history or len(history) < 2:
            return {
                "direction": "insufficient_data",
                "avg_value": 0.0,
                "min_value": 0.0,
                "max_value": 0.0,
                "change_pct": 0.0,
                "data_points": len(history) if history else 0,
            }

        values = [h.value for h in history]
        first_value = values[0]
        last_value = values[-1]
        avg_value = sum(values) / len(values)
        min_value = min(values)
        max_value = max(values)

        if first_value > 0:
            change_pct = ((last_value - first_value) / first_value) * 100
        else:
            change_pct = 0.0

        if change_pct > 10:
            direction = "strongly_increasing"
        elif change_pct > 3:
            direction = "increasing"
        elif change_pct < -10:
            direction = "strongly_decreasing"
        elif change_pct < -3:
            direction = "decreasing"
        else:
            direction = "stable"

        return {
            "direction": direction,
            "avg_value": round(avg_value, 2),
            "min_value": round(min_value, 2),
            "max_value": round(max_value, 2),
            "change_pct": round(change_pct, 2),
            "data_points": len(values),
        }

    # ---- Recommendations ----

    @staticmethod
    def generate_recommendations(result: AiVisibilityResult) -> list[str]:
        """Generate actionable recommendations (in Korean, per skill spec) for improving AI visibility."""
        recs: list[str] = []

        # Impression-based recommendations
        if result.impressions.total == 0:
            recs.append(
                "AI 검색에서 브랜드 노출이 감지되지 않았습니다. "
                "E-E-A-T 시그널(경험, 전문성, 권위성, 신뢰성)을 강화하여 "
                "AI 엔진이 콘텐츠를 참조할 수 있도록 하세요."
            )
        elif result.impressions.trend == "decreasing":
            recs.append(
                "AI 검색 노출이 감소 추세입니다. 최신 콘텐츠 업데이트 및 "
                "구조화된 데이터(Schema Markup) 추가를 검토하세요."
            )
        elif result.impressions.trend == "increasing":
            recs.append(
                "AI 검색 노출이 증가 추세입니다. 현재 콘텐츠 전략을 "
                "유지하면서 추가 키워드 확장을 고려하세요."
            )

        # Mention-based recommendations
        if result.mentions.total == 0:
            recs.append(
                "AI 응답에서 브랜드 언급이 없습니다. "
                "브랜드명이 포함된 고품질 콘텐츠를 제작하고, "
                "외부 사이트에서의 브랜드 언급(Citations)을 늘리세요."
            )
        elif result.mentions.trend == "decreasing":
            recs.append(
                "AI 응답 내 브랜드 언급이 줄어들고 있습니다. "
                "콘텐츠 신선도(Freshness)와 업계 권위 신호를 점검하세요."
            )

        # SOV recommendations
        sov_value = result.share_of_voice.get("brand_sov", 0.0)
        if sov_value < 10:
            recs.append(
                f"AI 검색 Share of Voice가 {sov_value}%로 낮습니다. "
                "핵심 키워드에 대한 종합 가이드, FAQ 콘텐츠, "
                "원본 데이터/연구 자료를 발행하여 인용 가능성을 높이세요."
            )
        elif sov_value < 25:
            recs.append(
                f"AI 검색 Share of Voice가 {sov_value}%입니다. "
                "경쟁사 대비 차별화된 전문 콘텐츠와 "
                "독점 데이터 기반 인사이트를 강화하세요."
            )

        # Competitor-based recommendations
        if result.competitors:
            top_competitor = result.competitors[0]
            if top_competitor.domain != result.target and top_competitor.sov > sov_value:
                recs.append(
                    f"최대 경쟁사 {top_competitor.domain}의 SOV가 "
                    f"{top_competitor.sov}%로 앞서고 있습니다. "
                    "해당 경쟁사의 콘텐츠 전략과 인용 패턴을 분석하세요."
                )

        # General best practices
        recs.append(
            "AI 검색 최적화를 위해 다음 사항을 지속적으로 점검하세요: "
            "(1) 구조화된 데이터(JSON-LD) 적용, "
            "(2) FAQ 및 How-to 콘텐츠 발행, "
            "(3) 신뢰할 수 있는 외부 사이트에서의 백링크 확보, "
            "(4) 콘텐츠 정기 업데이트 및 정확성 검증."
        )

        return recs

    # ---- Main Orchestrator ----

    async def track(
        self,
        target: str,
        competitors: list[str] | None = None,
        include_history: bool = False,
        include_sov: bool = False,
    ) -> AiVisibilityResult:
        """
        Orchestrate full AI visibility tracking.

        Args:
            target: Domain to track
            competitors: Optional list of competitor domains
            include_history: Whether to fetch historical trends
            include_sov: Whether to include SOV analysis
        """
        self.logger.info(f"Starting AI visibility tracking for {target}")
        result = AiVisibilityResult(target=target)

        # Core metrics (always fetched)
        result.impressions = await self.get_impressions_overview(target)
        result.mentions = await self.get_mentions_overview(target)

        # Share of Voice
        if include_sov or competitors:
            result.share_of_voice = await self.get_sov_overview(target)

        # History
        if include_history:
            result.impressions_history = await self.get_impressions_history(target)
            result.mentions_history = await self.get_mentions_history(target)
            if include_sov:
                result.sov_history = await self.get_sov_history(target)

        # Competitor comparison
        if competitors:
            result.competitors = await self.compare_competitors(target, competitors)

        # Generate recommendations
        result.recommendations = self.generate_recommendations(result)

        self.print_stats()
        return result

# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def build_parser() -> argparse.ArgumentParser:
    """Build argument parser for CLI usage."""
    parser = argparse.ArgumentParser(
        description="AI Visibility Tracker - Monitor brand visibility in AI search results",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --target example.com --json
  %(prog)s --target example.com --competitor comp1.com --competitor comp2.com --json
  %(prog)s --target example.com --history --sov --json
  %(prog)s --target example.com --output report.json
""",
    )
    parser.add_argument(
        "--target", required=True,
        help="Target domain to track (e.g., example.com)",
    )
    parser.add_argument(
        "--competitor", action="append", default=[],
        help="Competitor domain (repeatable). e.g., --competitor a.com --competitor b.com",
    )
    parser.add_argument(
        "--history", action="store_true",
        help="Include historical trend data (impressions, mentions, SOV over time)",
    )
    parser.add_argument(
        "--sov", action="store_true",
        help="Include Share of Voice analysis",
    )
    parser.add_argument(
        "--json", action="store_true",
        help="Output result as JSON to stdout",
    )
    parser.add_argument(
        "--output", type=str, default=None,
        help="Save JSON output to file path",
    )
    return parser

def print_summary(result: AiVisibilityResult) -> None:
    """Print a human-readable summary of AI visibility results."""
    print("\n" + "=" * 60)
    print(f"  AI Visibility Report: {result.target}")
    print("=" * 60)

    print(f"\n  Impressions: {result.impressions.total:,}")
    print(f"    Trend: {result.impressions.trend} ({result.impressions.change_pct:+.1f}%)")

    print(f"\n  Mentions: {result.mentions.total:,}")
    print(f"    Trend: {result.mentions.trend} ({result.mentions.change_pct:+.1f}%)")

    if result.share_of_voice:
        sov = result.share_of_voice.get("brand_sov", 0.0)
        print(f"\n  Share of Voice: {sov:.1f}%")
        comp_list = result.share_of_voice.get("competitors", [])
        if comp_list:
            print("  Competitors:")
            for c in comp_list:
                print(f"    {c.get('domain', '?')}: {c.get('sov_pct', 0):.1f}%")

    if result.impressions_history:
        trend_info = AiVisibilityTracker.calculate_trends(result.impressions_history)
        print(f"\n  Impressions Trend: {trend_info['direction']}")
        print(f"    Range: {trend_info['min_value']:,.0f} - {trend_info['max_value']:,.0f}")
        print(f"    Change: {trend_info['change_pct']:+.1f}%")

    if result.competitors:
        print("\n  Competitor Comparison:")
        for cv in result.competitors:
            marker = "  <-- target" if cv.domain == result.target else ""
            print(f"    {cv.domain}: SOV={cv.sov:.1f}%, Imp={cv.impressions:,}, Men={cv.mentions:,}{marker}")

    if result.recommendations:
        print("\n  Recommendations:")
        for i, rec in enumerate(result.recommendations, 1):
            print(f"    {i}. {rec}")

    print("\n" + "=" * 60)
    print(f"  Generated: {result.timestamp}")
    print("=" * 60 + "\n")

async def main() -> None:
    """CLI entry point."""
    parser = build_parser()
    args = parser.parse_args()

    tracker = AiVisibilityTracker(
        max_concurrent=5,
        requests_per_second=2.0,
    )

    result = await tracker.track(
        target=args.target,
        competitors=args.competitor if args.competitor else None,
        include_history=args.history,
        include_sov=args.sov,
    )

    # Output
    if args.json or args.output:
        output_data = result.to_dict()
        json_str = json.dumps(output_data, ensure_ascii=False, indent=2)

        if args.json:
            print(json_str)

        if args.output:
            output_path = Path(args.output)
            output_path.parent.mkdir(parents=True, exist_ok=True)
            output_path.write_text(json_str, encoding="utf-8")
            logger.info(f"Report saved to {args.output}")
    else:
        print_summary(result)


if __name__ == "__main__":
    asyncio.run(main())
207 custom-skills/27-seo-ai-visibility/code/scripts/base_client.py Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")

class RateLimiter:
    """Rate limiter using a token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1

class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)

class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,8 @@
# 27-seo-ai-visibility dependencies
requests>=2.31.0
aiohttp>=3.9.0
pandas>=2.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0
66 custom-skills/27-seo-ai-visibility/desktop/SKILL.md Normal file
@@ -0,0 +1,66 @@
---
name: seo-ai-visibility
description: |
  AI search visibility and brand radar monitoring. Tracks how a brand appears
  in AI-generated search answers using Ahrefs Brand Radar APIs.
  Triggers: AI search, AI visibility, brand radar, AI citations,
  share of voice, AI answers, AI mentions.
---

# SEO AI Visibility & Brand Radar

Monitor and analyze brand visibility in AI-generated search results. This skill uses Ahrefs Brand Radar APIs to track impressions, mentions, share of voice, cited domains, cited pages, and AI response content.

## Capabilities

### AI Visibility Tracking
- **Impressions Overview** - How often the brand appears in AI answers
- **Mentions Overview** - Brand mention frequency across AI engines
- **Share of Voice (SOV)** - Brand's share vs competitors in AI search
- **Historical Trends** - Impressions, mentions, and SOV over time
- **Competitor Comparison** - Side-by-side AI visibility metrics
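Share of Voice here is a brand's slice of all tracked AI mentions across the monitored domains. Ahrefs computes this server-side, so the sketch below is illustrative only (the input mapping of domain to mention count is an assumption, not a Brand Radar payload):

```python
def share_of_voice(mentions_by_domain: dict[str, int]) -> dict[str, float]:
    """Return each domain's share of total AI mentions, as a percentage."""
    total = sum(mentions_by_domain.values())
    if total == 0:
        # No tracked mentions at all: every domain gets 0%
        return {domain: 0.0 for domain in mentions_by_domain}
    return {
        domain: round(count / total * 100, 1)
        for domain, count in mentions_by_domain.items()
    }

sov = share_of_voice({"example.com": 120, "comp1.com": 60, "comp2.com": 20})
# example.com holds 60.0% of the tracked mentions
```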
### AI Citation Analysis
- **AI Response Analysis** - Content and sentiment of AI mentions
- **Cited Domains** - Which source domains AI engines reference
- **Cited Pages** - Specific URLs that get cited in AI answers
- **Citation Ranking** - Frequency-based ranking of citations
- **Sentiment Analysis** - Positive/neutral/negative distribution

## Workflow

1. **Input**: User provides target domain and optional competitors
2. **Data Collection**: Fetch metrics from Ahrefs Brand Radar APIs
3. **Analysis**: Calculate trends, compare competitors, analyze sentiment
4. **Recommendations**: Generate actionable Korean-language recommendations
5. **Output**: JSON report and Notion database entry
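The analysis step classifies each historical series with simple percentage-change buckets; the thresholds below mirror the ones used by `calculate_trends` in `ai_visibility_tracker.py`:

```python
def classify_trend(values: list[float]) -> str:
    """Bucket a metric series by first-to-last percentage change."""
    if len(values) < 2:
        return "insufficient_data"
    if values[0] <= 0:
        # No meaningful baseline to compute a percentage change against
        return "stable"
    change_pct = (values[-1] - values[0]) / values[0] * 100
    if change_pct > 10:
        return "strongly_increasing"
    if change_pct > 3:
        return "increasing"
    if change_pct < -10:
        return "strongly_decreasing"
    if change_pct < -3:
        return "decreasing"
    return "stable"

print(classify_trend([100, 108, 115]))  # strongly_increasing (+15%)
```

Only the first and last points decide the direction; the averages, minima, and maxima the real method also reports are descriptive extras.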
## Ahrefs MCP Tools

| Tool | Purpose |
|------|---------|
| `brand-radar-ai-responses` | AI-generated responses mentioning the brand |
| `brand-radar-cited-domains` | Domains cited in AI answers |
| `brand-radar-cited-pages` | Specific pages cited in AI answers |
| `brand-radar-impressions-history` | Impression trend over time |
| `brand-radar-impressions-overview` | Current impression metrics |
| `brand-radar-mentions-history` | Mention trend over time |
| `brand-radar-mentions-overview` | Current mention metrics |
| `brand-radar-sov-history` | Share of voice trend |
| `brand-radar-sov-overview` | Current share of voice |

## Notion Output

All reports are saved to the OurDigital SEO Audit Log:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Category**: AI Search Visibility
- **Audit ID Format**: AI-YYYYMMDD-NNN
- **Language**: Korean (technical terms in English)
## Example Queries
|
||||||
|
|
||||||
|
- "example.com의 AI 검색 가시성을 분석해줘"
|
||||||
|
- "AI search visibility for example.com with competitors"
|
||||||
|
- "브랜드 레이더 분석: example.com vs competitor.com"
|
||||||
|
- "AI 인용 분석 - 어떤 페이지가 AI 답변에서 인용되나요?"
|
||||||
|
- "Share of Voice in AI search for our domain"
|
||||||
8
custom-skills/27-seo-ai-visibility/desktop/skill.yaml
Normal file
@@ -0,0 +1,8 @@
name: seo-ai-visibility
description: |
  AI search visibility and brand radar monitoring. Triggers: AI search, AI visibility, brand radar, AI citations, share of voice, AI answers, AI mentions.
allowed-tools:
  - mcp__ahrefs__*
  - mcp__notion__*
  - WebSearch
  - WebFetch
55
custom-skills/27-seo-ai-visibility/desktop/tools/ahrefs.md
Normal file
@@ -0,0 +1,55 @@
# Ahrefs Brand Radar MCP Tools

## brand-radar-impressions-overview
Get current AI search impression metrics for a target domain. Returns total impressions, change percentage, and breakdown by AI engine.

**Parameters:**
- `target` (required): Domain to analyze (e.g., "example.com")

## brand-radar-impressions-history
Get historical AI search impression data over time. Returns time-series data points with date and impression values.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-mentions-overview
Get current AI mention metrics for a target domain. Returns total mentions, change percentage, and breakdown.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-mentions-history
Get historical AI mention data over time. Returns time-series data points with date and mention values.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-sov-overview
Get Share of Voice overview in AI search for a target domain. Returns brand SOV percentage and competitor SOV data.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-sov-history
Get historical Share of Voice data over time. Returns time-series SOV data points.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-ai-responses
Get AI-generated responses that mention the brand. Returns query, response text, sentiment, and source engine for each response.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-cited-domains
Get domains cited in AI answers related to the brand. Returns domain name, citation count, topics, and share percentage.

**Parameters:**
- `target` (required): Domain to analyze

## brand-radar-cited-pages
Get specific pages cited in AI answers. Returns URL, title, citation count, context snippet, and topics.

**Parameters:**
- `target` (required): Domain to analyze
44
custom-skills/27-seo-ai-visibility/desktop/tools/notion.md
Normal file
@@ -0,0 +1,44 @@
# Notion MCP Tools

## Database: OurDigital SEO Audit Log

- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **URL**: https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef

## Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title in Korean + date |
| Site | URL | Tracked website URL |
| Category | Select | "AI Search Visibility" |
| Priority | Select | Based on SOV trend (Critical, High, Medium, Low) |
| Found Date | Date | Report date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: AI-YYYYMMDD-NNN |

## Usage

Use `notion-create-pages` to save audit results:

```json
{
  "parent": {"database_id": "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"},
  "properties": {
    "Issue": {"title": [{"text": {"content": "AI 검색 가시성 분석 - example.com (2025-01-15)"}}]},
    "Site": {"url": "https://example.com"},
    "Category": {"select": {"name": "AI Search Visibility"}},
    "Priority": {"select": {"name": "Medium"}},
    "Found Date": {"date": {"start": "2025-01-15"}},
    "Audit ID": {"rich_text": [{"text": {"content": "AI-20250115-001"}}]}
  }
}
```

## Priority Guidelines

| Condition | Priority |
|-----------|----------|
| SOV decreasing >10% | Critical |
| SOV decreasing 3-10% | High |
| SOV stable, low (<10%) | Medium |
| SOV increasing or high (>25%) | Low |
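
The table above maps directly to a small helper. This sketch assumes `change` is the SOV change in percentage points; conditions the table does not cover (a small decline with mid-range SOV) default to Medium:

```python
def sov_priority(sov: float, change: float) -> str:
    """Map SOV level and trend to a Notion Priority, per the guidelines above."""
    if change < -10:          # SOV decreasing >10%
        return "Critical"
    if change < -3:           # SOV decreasing 3-10%
        return "High"
    if change > 0 or sov > 25:  # SOV increasing or high (>25%)
        return "Low"
    return "Medium"           # stable and low, or anything uncovered
```

For example, `sov_priority(8.0, -12.0)` yields "Critical", while `sov_priority(30.0, 0.0)` yields "Low".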
@@ -0,0 +1,17 @@
# WebSearch & WebFetch Tools

## WebSearch

Use web search to supplement AI visibility analysis with additional context:
- Research competitor AI optimization strategies
- Find industry benchmarks for AI search visibility
- Look up the latest AI search engine algorithm updates
- Discover best practices for AI citation optimization

## WebFetch

Use web fetch to retrieve specific pages for deeper analysis:
- Fetch competitor pages that are frequently cited in AI answers
- Retrieve structured data (Schema Markup) from cited pages
- Analyze the content structure of top-cited URLs
- Check E-E-A-T signals on referenced pages
139
custom-skills/28-seo-knowledge-graph/code/CLAUDE.md
Normal file
@@ -0,0 +1,139 @@
# CLAUDE.md

## Overview

Knowledge Graph and Entity SEO tool for analyzing brand entity presence in Google Knowledge Graph, Knowledge Panels, People Also Ask (PAA), and FAQ rich results. Checks entity attribute completeness, Wikipedia/Wikidata presence, and Korean equivalents (Naver Knowledge iN, Naver Encyclopedia). Uses WebSearch and WebFetch for data collection and Ahrefs `serp-overview` for SERP feature detection.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Knowledge Graph analysis
python scripts/knowledge_graph_analyzer.py --entity "Samsung Electronics" --json

# Entity SEO audit
python scripts/entity_auditor.py --url https://example.com --entity "Brand Name" --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `knowledge_graph_analyzer.py` | Analyze Knowledge Panel and entity presence | KP detection, entity attributes, Wikipedia/Wikidata status |
| `entity_auditor.py` | Audit entity SEO signals and PAA/FAQ presence | PAA monitoring, FAQ schema tracking, entity completeness |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Knowledge Graph Analyzer

```bash
# Analyze entity in Knowledge Graph
python scripts/knowledge_graph_analyzer.py --entity "Samsung Electronics" --json

# Check with Korean name
python scripts/knowledge_graph_analyzer.py --entity "삼성전자" --language ko --json

# Include Wikipedia/Wikidata check
python scripts/knowledge_graph_analyzer.py --entity "Samsung" --wiki --json
```

**Capabilities**:
- Knowledge Panel detection via Google search
- Entity attribute extraction (name, description, logo, type, social profiles, website)
- Entity attribute completeness scoring
- Wikipedia article presence check
- Wikidata entity presence check (QID lookup)
- Naver Encyclopedia (네이버 백과사전) presence
- Naver Knowledge iN (지식iN) presence
- Knowledge Panel comparison with competitors
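
Completeness scoring can be sketched as the share of expected attributes that are present and non-empty. The attribute list below is an illustrative assumption, not the analyzer's actual rubric:

```python
# Hypothetical set of Knowledge Panel attributes the analyzer looks for
EXPECTED_KP_ATTRIBUTES = ["name", "type", "description", "logo", "website", "social_profiles"]


def completeness_score(attributes: dict) -> int:
    """Percentage of expected Knowledge Panel attributes that are present and truthy."""
    present = sum(1 for key in EXPECTED_KP_ATTRIBUTES if attributes.get(key))
    return round(present / len(EXPECTED_KP_ATTRIBUTES) * 100)
```

An entity with five of the six attributes filled in scores 83, matching the shape of `completeness_score` in the Output Format below.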

## Entity Auditor

```bash
# Full entity SEO audit
python scripts/entity_auditor.py --url https://example.com --entity "Brand Name" --json

# PAA monitoring for brand keywords
python scripts/entity_auditor.py --url https://example.com --entity "Brand Name" --paa --json

# FAQ rich result tracking
python scripts/entity_auditor.py --url https://example.com --entity "Brand Name" --faq --json
```

**Capabilities**:
- People Also Ask (PAA) monitoring for brand-related queries
- FAQ schema presence tracking (FAQPage schema -> SERP appearance)
- Entity markup audit (Organization, Person, LocalBusiness schema on the website)
- Social profile linking validation (sameAs in schema)
- Brand SERP analysis (what appears when you search the brand name)
- Entity consistency across web properties
- Korean entity optimization (Korean Knowledge Panel, Naver profiles)
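
The sameAs validation step can be sketched by checking which expected platforms appear among the schema's `sameAs` URLs. The platform list is an assumption for illustration:

```python
from urllib.parse import urlparse

# Hypothetical platforms the audit expects to find linked via sameAs
EXPECTED_PLATFORMS = ["facebook.com", "instagram.com", "linkedin.com", "x.com", "youtube.com"]


def check_same_as(same_as_links: list[str]) -> dict[str, bool]:
    """Report which expected social platforms are linked via sameAs URLs."""
    hosts = {urlparse(u).netloc.removeprefix("www.") for u in same_as_links}
    return {platform: platform in hosts for platform in EXPECTED_PLATFORMS}
```

Missing platforms in the returned map feed directly into the audit's recommendations.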

## Data Sources

| Source | Purpose |
|--------|---------|
| WebSearch | Search for entity/brand to detect Knowledge Panel |
| WebFetch | Fetch Wikipedia, Wikidata, Naver pages |
| Ahrefs `serp-overview` | SERP feature detection for entity keywords |

## Output Format

```json
{
  "entity": "Samsung Electronics",
  "knowledge_panel": {
    "detected": true,
    "attributes": {
      "name": "Samsung Electronics",
      "type": "Corporation",
      "description": "...",
      "logo": true,
      "website": true,
      "social_profiles": ["twitter", "facebook", "linkedin"]
    },
    "completeness_score": 85
  },
  "wikipedia": {"present": true, "url": "..."},
  "wikidata": {"present": true, "qid": "Q20710"},
  "naver_encyclopedia": {"present": true, "url": "..."},
  "naver_knowledge_in": {"present": true, "entries": 15},
  "paa_questions": [...],
  "faq_rich_results": [...],
  "entity_schema_on_site": {
    "organization": true,
    "same_as_links": 5,
    "completeness": 78
  },
  "score": 75,
  "timestamp": "2025-01-01T00:00:00"
}
```

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Entity website URL |
| Category | Select | Knowledge Graph & Entity SEO |
| Priority | Select | Based on entity completeness |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: KG-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Knowledge Panel, Knowledge Graph, PAA)
- URLs and code remain unchanged
207
custom-skills/28-seo-knowledge-graph/code/scripts/base_client.py
Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using the token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get a required environment variable or raise an error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,902 @@
"""
Entity Auditor
===============
Purpose: Audit entity SEO signals including PAA monitoring, FAQ schema tracking,
entity markup validation, and brand SERP analysis.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import re
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import quote, urljoin, urlparse

import aiohttp
from bs4 import BeautifulSoup
from rich.console import Console
from rich.table import Table

from base_client import BaseAsyncClient, ConfigManager, config

logger = logging.getLogger(__name__)
console = Console()

# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------


@dataclass
class PaaQuestion:
    """A People Also Ask question found in the SERP."""
    question: str = ""
    keyword: str = ""
    position: int = 0
    source_url: str | None = None


@dataclass
class FaqRichResult:
    """FAQ rich result tracking entry."""
    url: str = ""
    question_count: int = 0
    appearing_in_serp: bool = False
    questions: list[str] = field(default_factory=list)
    schema_valid: bool = False


@dataclass
class EntitySchema:
    """Entity structured data found on a website."""
    type: str = ""  # Organization, Person, LocalBusiness, etc.
    properties: dict[str, Any] = field(default_factory=dict)
    same_as_links: list[str] = field(default_factory=list)
    completeness: float = 0.0
    issues: list[str] = field(default_factory=list)


@dataclass
class BrandSerpResult:
    """What appears when searching for the brand name."""
    query: str = ""
    features: list[str] = field(default_factory=list)
    paa_count: int = 0
    faq_count: int = 0
    knowledge_panel: bool = False
    sitelinks: bool = False
    social_profiles: list[str] = field(default_factory=list)
    top_results: list[dict[str, str]] = field(default_factory=list)


@dataclass
class EntityAuditResult:
    """Full entity SEO audit result."""
    url: str = ""
    entity_name: str = ""
    paa_questions: list[PaaQuestion] = field(default_factory=list)
    faq_rich_results: list[FaqRichResult] = field(default_factory=list)
    entity_schemas: list[EntitySchema] = field(default_factory=list)
    brand_serp: BrandSerpResult = field(default_factory=BrandSerpResult)
    social_profile_status: dict[str, bool] = field(default_factory=dict)
    overall_score: float = 0.0
    recommendations: list[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


# ---------------------------------------------------------------------------
# Entity Auditor
# ---------------------------------------------------------------------------


class EntityAuditor(BaseAsyncClient):
    """Audit entity SEO signals and rich result presence."""

    GOOGLE_SEARCH_URL = "https://www.google.com/search"

    HEADERS = {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    }

    PAA_KEYWORD_TEMPLATES = [
        "{entity}",
        "{entity} reviews",
        "{entity} vs",
        "what is {entity}",
        "{entity} pricing",
        "{entity} alternatives",
        "is {entity} good",
        "{entity} benefits",
        "how to use {entity}",
        "{entity} complaints",
    ]

    EXPECTED_SCHEMA_PROPERTIES = {
        "Organization": [
            "name", "url", "logo", "description", "sameAs",
            "contactPoint", "address", "foundingDate", "founder",
            "numberOfEmployees", "email", "telephone",
        ],
        "Person": [
            "name", "url", "image", "description", "sameAs",
            "jobTitle", "worksFor", "alumniOf", "birthDate",
        ],
        "LocalBusiness": [
            "name", "url", "image", "description", "sameAs",
            "address", "telephone", "openingHours", "geo",
            "priceRange", "aggregateRating",
        ],
    }

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.config = config

    # ------------------------------------------------------------------
    # PAA monitoring
    # ------------------------------------------------------------------

    async def monitor_paa(
        self,
        entity_name: str,
        keywords: list[str] | None = None,
        session: aiohttp.ClientSession | None = None,
    ) -> list[PaaQuestion]:
        """Search brand keywords and extract People Also Ask questions."""
        if keywords is None:
            keywords = [t.format(entity=entity_name) for t in self.PAA_KEYWORD_TEMPLATES]

        paa_questions: list[PaaQuestion] = []

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            for keyword in keywords:
                params = {"q": keyword, "hl": "en", "gl": "us"}
                try:
                    async with session.get(
                        self.GOOGLE_SEARCH_URL, params=params, headers=self.HEADERS,
                        timeout=aiohttp.ClientTimeout(total=20),
                    ) as resp:
                        if resp.status != 200:
                            logger.warning("Search for '%s' returned status %d", keyword, resp.status)
                            continue

                        html = await resp.text()
                        soup = BeautifulSoup(html, "lxml")

                        # PAA box selectors
                        paa_selectors = [
                            "div[data-sgrd] div[data-q]",
                            "div.related-question-pair",
                            "div[jsname] div[data-q]",
                            "div.wQiwMc",
                        ]

                        position = 0
                        for selector in paa_selectors:
                            elements = soup.select(selector)
                            for el in elements:
                                question_text = el.get("data-q", "") or el.get_text(strip=True)
                                if question_text and len(question_text) > 5:
                                    position += 1
                                    paa_questions.append(PaaQuestion(
                                        question=question_text,
                                        keyword=keyword,
                                        position=position,
                                    ))

                        # Fallback: regex for PAA-like questions
                        if not paa_questions:
                            text = soup.get_text(separator="\n")
                            q_patterns = re.findall(
                                r"((?:What|How|Why|When|Where|Who|Is|Can|Does|Do|Which)\s+[^?\n]{10,80}\??)",
                                text,
                            )
                            for i, q in enumerate(q_patterns[:8]):
                                paa_questions.append(PaaQuestion(
                                    question=q.strip(),
                                    keyword=keyword,
                                    position=i + 1,
                                ))

                except Exception as exc:
                    logger.error("PAA search failed for '%s': %s", keyword, exc)
                    continue

                # Rate limit between searches
                await asyncio.sleep(1.5)
        finally:
            if own_session:
                await session.close()

        # Deduplicate questions
        seen = set()
        unique = []
        for q in paa_questions:
            key = q.question.lower().strip()
            if key not in seen:
                seen.add(key)
                unique.append(q)

        logger.info("Found %d unique PAA questions for '%s'", len(unique), entity_name)
        return unique
|
||||||
|
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
# FAQ rich result tracking
|
||||||
|
# ------------------------------------------------------------------
|
||||||
|
|
||||||
|
async def track_faq_rich_results(
|
||||||
|
self,
|
||||||
|
url: str,
|
||||||
|
session: aiohttp.ClientSession | None = None,
|
||||||
|
) -> list[FaqRichResult]:
|
||||||
|
"""Check pages for FAQPage schema and SERP appearance."""
|
||||||
|
faq_results: list[FaqRichResult] = []
|
||||||
|
domain = urlparse(url).netloc
|
||||||
|
|
||||||
|
own_session = session is None
|
||||||
|
if own_session:
|
||||||
|
session = aiohttp.ClientSession()
|
||||||
|
|
||||||
|
try:
|
||||||
|
            # Fetch the page and look for FAQ schema
            async with session.get(
                url, headers=self.HEADERS, timeout=aiohttp.ClientTimeout(total=20),
            ) as resp:
                if resp.status != 200:
                    logger.warning("Page %s returned status %d", url, resp.status)
                    return faq_results

                html = await resp.text()
                soup = BeautifulSoup(html, "lxml")

                # Find JSON-LD scripts with FAQPage
                scripts = soup.find_all("script", type="application/ld+json")
                for script in scripts:
                    try:
                        data = json.loads(script.string or "{}")
                        items = data if isinstance(data, list) else [data]

                        for item in items:
                            schema_type = item.get("@type", "")
                            if schema_type == "FAQPage" or (
                                isinstance(schema_type, list) and "FAQPage" in schema_type
                            ):
                                questions = item.get("mainEntity", [])
                                faq = FaqRichResult(
                                    url=url,
                                    question_count=len(questions),
                                    questions=[
                                        q.get("name", "") for q in questions if isinstance(q, dict)
                                    ],
                                    schema_valid=True,
                                )
                                faq_results.append(faq)

                            # Check for nested @graph
                            graph = item.get("@graph", [])
                            for g_item in graph:
                                if g_item.get("@type") == "FAQPage":
                                    questions = g_item.get("mainEntity", [])
                                    faq = FaqRichResult(
                                        url=url,
                                        question_count=len(questions),
                                        questions=[
                                            q.get("name", "") for q in questions if isinstance(q, dict)
                                        ],
                                        schema_valid=True,
                                    )
                                    faq_results.append(faq)

                    except json.JSONDecodeError:
                        continue

                # Also check for microdata FAQ markup
                faq_items = soup.select("[itemtype*='FAQPage'] [itemprop='mainEntity']")
                if faq_items and not faq_results:
                    questions = []
                    for item in faq_items:
                        q_el = item.select_one("[itemprop='name']")
                        if q_el:
                            questions.append(q_el.get_text(strip=True))
                    faq_results.append(FaqRichResult(
                        url=url,
                        question_count=len(questions),
                        questions=questions,
                        schema_valid=True,
                    ))

        except Exception as exc:
            logger.error("FAQ tracking failed for %s: %s", url, exc)
        finally:
            if own_session:
                await session.close()

        logger.info("Found %d FAQ schemas on %s", len(faq_results), url)
        return faq_results

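The JSON-LD branch above can be exercised in isolation with nothing but the standard library. A minimal sketch, assuming a fabricated FAQPage snippet (the markup and question text are made up for illustration):

```python
import json

# Hypothetical FAQPage JSON-LD, as it would appear inside a
# <script type="application/ld+json"> tag.
script_string = """
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {"@type": "Question", "name": "What is entity SEO?"},
    {"@type": "Question", "name": "Do I need a Knowledge Panel?"}
  ]
}
"""

data = json.loads(script_string or "{}")
items = data if isinstance(data, list) else [data]

# Same extraction shape as the loop above: pull question names
# from mainEntity, skipping anything that is not a dict.
questions = [
    q.get("name", "")
    for item in items
    if item.get("@type") == "FAQPage"
    for q in item.get("mainEntity", [])
    if isinstance(q, dict)
]
print(questions)
```

The `isinstance(q, dict)` guard matters in practice: real-world `mainEntity` arrays sometimes contain bare strings or nulls, which would otherwise raise on `.get()`.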
    # ------------------------------------------------------------------
    # Entity schema audit
    # ------------------------------------------------------------------

    async def audit_entity_schema(
        self,
        url: str,
        session: aiohttp.ClientSession | None = None,
    ) -> list[EntitySchema]:
        """Check Organization/Person/LocalBusiness schema on website."""
        schemas: list[EntitySchema] = []
        target_types = {"Organization", "Person", "LocalBusiness", "Corporation", "MedicalBusiness"}

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                url, headers=self.HEADERS, timeout=aiohttp.ClientTimeout(total=20),
            ) as resp:
                if resp.status != 200:
                    logger.warning("Page %s returned status %d", url, resp.status)
                    return schemas

                html = await resp.text()
                soup = BeautifulSoup(html, "lxml")

                scripts = soup.find_all("script", type="application/ld+json")
                for script in scripts:
                    try:
                        data = json.loads(script.string or "{}")
                        items = data if isinstance(data, list) else [data]

                        # Include @graph nested items
                        expanded = []
                        for item in items:
                            expanded.append(item)
                            if "@graph" in item:
                                expanded.extend(item["@graph"])

                        for item in expanded:
                            item_type = item.get("@type", "")
                            if isinstance(item_type, list):
                                matching = [t for t in item_type if t in target_types]
                                if not matching:
                                    continue
                                item_type = matching[0]
                            elif item_type not in target_types:
                                continue

                            same_as = item.get("sameAs", [])
                            if isinstance(same_as, str):
                                same_as = [same_as]

                            # Calculate completeness
                            base_type = item_type
                            if base_type == "Corporation":
                                base_type = "Organization"
                            elif base_type == "MedicalBusiness":
                                base_type = "LocalBusiness"

                            expected = self.EXPECTED_SCHEMA_PROPERTIES.get(base_type, [])
                            present = [k for k in expected if k in item and item[k]]
                            completeness = round((len(present) / len(expected)) * 100, 1) if expected else 0

                            # Check for issues
                            issues = []
                            if "name" not in item:
                                issues.append("Missing 'name' property")
                            if "url" not in item:
                                issues.append("Missing 'url' property")
                            if not same_as:
                                issues.append("No 'sameAs' links (social profiles)")
                            if "logo" not in item and base_type == "Organization":
                                issues.append("Missing 'logo' property")
                            if "description" not in item:
                                issues.append("Missing 'description' property")

                            schema = EntitySchema(
                                type=item_type,
                                properties={
                                    k: (str(v)[:100] if not isinstance(v, (list, dict)) else v)
                                    for k, v in item.items()
                                    if k != "@context"
                                },
                                same_as_links=same_as,
                                completeness=completeness,
                                issues=issues,
                            )
                            schemas.append(schema)

                    except json.JSONDecodeError:
                        continue

        except Exception as exc:
            logger.error("Entity schema audit failed for %s: %s", url, exc)
        finally:
            if own_session:
                await session.close()

        logger.info("Found %d entity schemas on %s", len(schemas), url)
        return schemas

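The completeness calculation above reduces to the ratio of expected properties that are present and truthy. A standalone sketch, using an illustrative property list (the real values live in `EXPECTED_SCHEMA_PROPERTIES` on the auditor class):

```python
# Illustrative expected-property list for an Organization node;
# the actual table is EXPECTED_SCHEMA_PROPERTIES in the auditor.
expected = ["name", "url", "logo", "sameAs", "description", "address"]

# A parsed Organization node with half the expected properties filled in.
item = {"name": "Acme", "url": "https://acme.example", "logo": "logo.png", "sameAs": []}

# Empty values (like the empty sameAs list) do not count as present,
# because the audit checks `item[k]` for truthiness, not just membership.
present = [k for k in expected if k in item and item[k]]
completeness = round((len(present) / len(expected)) * 100, 1) if expected else 0
print(completeness)
```

Note the `if expected else 0` guard: an unknown `base_type` yields an empty expected list, and the score degrades to 0 instead of dividing by zero.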
    # ------------------------------------------------------------------
    # Brand SERP analysis
    # ------------------------------------------------------------------

    async def analyze_brand_serp(
        self,
        entity_name: str,
        session: aiohttp.ClientSession | None = None,
    ) -> BrandSerpResult:
        """Analyze what appears in the SERP for a brand-name search."""
        result = BrandSerpResult(query=entity_name)

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            params = {"q": entity_name, "hl": "en", "gl": "us"}
            async with session.get(
                self.GOOGLE_SEARCH_URL, params=params, headers=self.HEADERS,
                timeout=aiohttp.ClientTimeout(total=20),
            ) as resp:
                if resp.status != 200:
                    return result

                html = await resp.text()
                soup = BeautifulSoup(html, "lxml")
                text = soup.get_text(separator=" ", strip=True).lower()

                # Detect SERP features
                feature_indicators = {
                    "knowledge_panel": ["kp-wholepage", "knowledge-panel", "kno-"],
                    "sitelinks": ["sitelinks", "site-links"],
                    "people_also_ask": ["related-question-pair", "data-q"],
                    "faq_rich_result": ["faqpage", "frequently asked"],
                    "featured_snippet": ["featured-snippet", "data-tts"],
                    "image_pack": ["image-result", "img-brk"],
                    "video_carousel": ["video-result", "vid-"],
                    "twitter_carousel": ["twitter-timeline", "g-scrolling-carousel"],
                    "reviews": ["star-rating", "aggregate-rating"],
                    "local_pack": ["local-pack", "local_pack"],
                }

                # Stringify the soup once instead of on every indicator check.
                page_source = str(soup).lower()
                for feature, indicators in feature_indicators.items():
                    for ind in indicators:
                        if ind in page_source:
                            result.features.append(feature)
                            break

                result.knowledge_panel = "knowledge_panel" in result.features
                result.sitelinks = "sitelinks" in result.features

                # Count PAA questions
                paa_elements = soup.select("div[data-q], div.related-question-pair")
                result.paa_count = len(paa_elements)
                if result.paa_count > 0 and "people_also_ask" not in result.features:
                    result.features.append("people_also_ask")

                # Detect social profiles in results
                social_domains = {
                    "twitter.com": "twitter", "x.com": "twitter",
                    "facebook.com": "facebook", "linkedin.com": "linkedin",
                    "youtube.com": "youtube", "instagram.com": "instagram",
                    "github.com": "github", "pinterest.com": "pinterest",
                }
                links = soup.find_all("a", href=True)
                for link in links:
                    href = link["href"]
                    for domain, name in social_domains.items():
                        if domain in href and name not in result.social_profiles:
                            result.social_profiles.append(name)

                # Extract top organic results
                result_divs = soup.select("div.g, div[data-sokoban-container]")[:10]
                for div in result_divs:
                    title_el = div.select_one("h3")
                    link_el = div.select_one("a[href]")
                    if title_el and link_el:
                        result.top_results.append({
                            "title": title_el.get_text(strip=True),
                            "url": link_el.get("href", ""),
                        })

        except Exception as exc:
            logger.error("Brand SERP analysis failed for '%s': %s", entity_name, exc)
        finally:
            if own_session:
                await session.close()

        return result

    # ------------------------------------------------------------------
    # Social profile link validation
    # ------------------------------------------------------------------

    async def check_social_profile_links(
        self,
        same_as_links: list[str],
        session: aiohttp.ClientSession | None = None,
    ) -> dict[str, bool]:
        """Validate sameAs URLs are accessible."""
        status: dict[str, bool] = {}
        if not same_as_links:
            return status

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            for link in same_as_links:
                try:
                    async with session.head(
                        link, headers=self.HEADERS, timeout=aiohttp.ClientTimeout(total=10),
                        allow_redirects=True,
                    ) as resp:
                        status[link] = resp.status < 400
                except Exception:
                    status[link] = False

                await asyncio.sleep(0.5)
        finally:
            if own_session:
                await session.close()

        accessible = sum(1 for v in status.values() if v)
        logger.info("Social profile links: %d/%d accessible", accessible, len(status))
        return status

    # ------------------------------------------------------------------
    # Recommendations
    # ------------------------------------------------------------------

    def generate_recommendations(self, result: EntityAuditResult) -> list[str]:
        """Generate actionable entity SEO improvement recommendations."""
        recs: list[str] = []

        # PAA recommendations
        if not result.paa_questions:
            recs.append(
                "브랜드 관련 People Also Ask(PAA) 질문이 감지되지 않았습니다. "
                "FAQ 콘텐츠를 작성하여 PAA 노출 기회를 확보하세요."
            )
        elif len(result.paa_questions) < 5:
            recs.append(
                f"PAA 질문이 {len(result.paa_questions)}개만 감지되었습니다. "
                "더 다양한 키워드에 대한 Q&A 콘텐츠를 강화하세요."
            )

        # FAQ schema recommendations
        if not result.faq_rich_results:
            recs.append(
                "FAQPage schema가 감지되지 않았습니다. "
                "FAQ 페이지에 FAQPage JSON-LD를 추가하여 Rich Result를 확보하세요."
            )
        else:
            invalid = [f for f in result.faq_rich_results if not f.schema_valid]
            if invalid:
                recs.append(
                    f"{len(invalid)}개의 FAQ schema에 유효성 문제가 있습니다. "
                    "Google Rich Results Test로 검증하세요."
                )

        # Entity schema recommendations
        if not result.entity_schemas:
            recs.append(
                "Organization/Person/LocalBusiness schema가 없습니다. "
                "홈페이지에 Organization schema JSON-LD를 추가하세요."
            )
        else:
            for schema in result.entity_schemas:
                if schema.completeness < 50:
                    recs.append(
                        f"{schema.type} schema 완성도가 {schema.completeness}%입니다. "
                        f"누락 항목: {', '.join(schema.issues[:3])}"
                    )
                if not schema.same_as_links:
                    recs.append(
                        f"{schema.type} schema에 sameAs 속성이 없습니다. "
                        "소셜 미디어 프로필 URL을 sameAs에 추가하세요."
                    )

        # Brand SERP recommendations
        serp = result.brand_serp
        if not serp.knowledge_panel:
            recs.append(
                "브랜드 검색 시 Knowledge Panel이 표시되지 않습니다. "
                "Wikipedia, Wikidata, 구조화된 데이터를 통해 엔티티 인식을 강화하세요."
            )
        if not serp.sitelinks:
            recs.append(
                "Sitelinks가 표시되지 않습니다. "
                "사이트 구조와 내부 링크를 개선하세요."
            )
        if len(serp.social_profiles) < 3:
            recs.append(
                f"SERP에 소셜 프로필이 {len(serp.social_profiles)}개만 표시됩니다. "
                "주요 소셜 미디어 프로필을 활성화하고 schema sameAs에 연결하세요."
            )

        # Social profile accessibility
        broken = [url for url, ok in result.social_profile_status.items() if not ok]
        if broken:
            recs.append(
                f"접근 불가한 소셜 프로필 링크 {len(broken)}개: "
                f"{', '.join(broken[:3])}. sameAs URL을 업데이트하세요."
            )

        if not recs:
            recs.append("Entity SEO 상태가 양호합니다. 현재 수준을 유지하세요.")

        return recs

    # ------------------------------------------------------------------
    # Scoring
    # ------------------------------------------------------------------

    def compute_score(self, result: EntityAuditResult) -> float:
        """Compute overall entity SEO score (0-100)."""
        score = 0.0

        # PAA presence (15 points)
        paa_count = len(result.paa_questions)
        if paa_count >= 10:
            score += 15
        elif paa_count >= 5:
            score += 10
        elif paa_count > 0:
            score += 5

        # FAQ schema (15 points)
        if result.faq_rich_results:
            valid_count = sum(1 for f in result.faq_rich_results if f.schema_valid)
            score += min(15, valid_count * 5)

        # Entity schema (25 points)
        if result.entity_schemas:
            best_completeness = max(s.completeness for s in result.entity_schemas)
            score += best_completeness * 0.25

        # Brand SERP features (25 points)
        serp = result.brand_serp
        if serp.knowledge_panel:
            score += 10
        if serp.sitelinks:
            score += 5
        score += min(10, len(serp.features) * 2)

        # Social profiles (10 points)
        if result.social_profile_status:
            accessible = sum(1 for v in result.social_profile_status.values() if v)
            total = len(result.social_profile_status)
            score += (accessible / total) * 10 if total > 0 else 0

        # sameAs links (10 points)
        total_same_as = sum(len(s.same_as_links) for s in result.entity_schemas)
        score += min(10, total_same_as * 2)

        return round(min(100, score), 1)

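As a sanity check on the rubric above: the per-category caps (15, 15, 25, 25, 10, 10) are intended to sum to exactly 100, which is why the final `min(100, score)` only clips rounding artifacts. A quick sketch:

```python
# Category caps taken from the compute_score comments above.
weights = {
    "paa_presence": 15,
    "faq_schema": 15,
    "entity_schema": 25,
    "brand_serp": 25,
    "social_profiles": 10,
    "same_as_links": 10,
}
total = sum(weights.values())
print(total)
```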
    # ------------------------------------------------------------------
    # Main orchestrator
    # ------------------------------------------------------------------

    async def audit(
        self,
        url: str,
        entity_name: str,
        include_paa: bool = True,
        include_faq: bool = True,
    ) -> EntityAuditResult:
        """Orchestrate full entity SEO audit."""
        result = EntityAuditResult(url=url, entity_name=entity_name)
        logger.info("Starting entity audit for '%s' at %s", entity_name, url)

        async with aiohttp.ClientSession() as session:
            # Parallel tasks: entity schema, brand SERP, FAQ
            tasks = [
                self.audit_entity_schema(url, session),
                self.analyze_brand_serp(entity_name, session),
            ]

            if include_faq:
                tasks.append(self.track_faq_rich_results(url, session))

            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Unpack results
            if not isinstance(results[0], Exception):
                result.entity_schemas = results[0]
            else:
                logger.error("Entity schema audit failed: %s", results[0])

            if not isinstance(results[1], Exception):
                result.brand_serp = results[1]
            else:
                logger.error("Brand SERP analysis failed: %s", results[1])

            if include_faq and len(results) > 2 and not isinstance(results[2], Exception):
                result.faq_rich_results = results[2]

            # PAA monitoring (sequential due to rate limits)
            if include_paa:
                result.paa_questions = await self.monitor_paa(entity_name, session=session)

            # Validate social profile links from schema
            all_same_as = []
            for schema in result.entity_schemas:
                all_same_as.extend(schema.same_as_links)
            if all_same_as:
                result.social_profile_status = await self.check_social_profile_links(
                    list(set(all_same_as)), session
                )

        # Compute score and recommendations
        result.overall_score = self.compute_score(result)
        result.recommendations = self.generate_recommendations(result)

        logger.info("Entity audit complete. Score: %.1f", result.overall_score)
        return result

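The unpacking logic above relies on `asyncio.gather(..., return_exceptions=True)` returning exceptions in place rather than raising, so one failed sub-audit cannot sink the others. A self-contained sketch of that pattern (the coroutine names are invented for illustration):

```python
import asyncio


async def schema_task() -> str:
    return "schemas"


async def serp_task() -> str:
    # Simulates a sub-audit that blows up (e.g. a blocked request).
    raise RuntimeError("blocked")


async def run() -> list:
    # With return_exceptions=True, the RuntimeError is returned as a
    # list element instead of propagating out of gather().
    return await asyncio.gather(schema_task(), serp_task(), return_exceptions=True)


results = asyncio.run(run())
print(results)
```

This is why each slot is checked with `isinstance(results[i], Exception)` before assignment: a position can hold either the sub-audit's return value or the exception it raised.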
# ---------------------------------------------------------------------------
# CLI display helpers
# ---------------------------------------------------------------------------


def display_result(result: EntityAuditResult) -> None:
    """Display audit result in rich tables."""
    console.print()
    console.print(f"[bold cyan]Entity SEO Audit: {result.entity_name}[/bold cyan]")
    console.print(f"URL: {result.url} | Score: {result.overall_score}/100")
    console.print()

    # Entity Schema table
    if result.entity_schemas:
        table = Table(title="Entity Schema Markup", show_header=True)
        table.add_column("Type", style="bold")
        table.add_column("Completeness")
        table.add_column("sameAs Links")
        table.add_column("Issues")

        for schema in result.entity_schemas:
            issues_text = "; ".join(schema.issues[:3]) if schema.issues else "None"
            table.add_row(
                schema.type,
                f"{schema.completeness}%",
                str(len(schema.same_as_links)),
                issues_text,
            )
        console.print(table)
    else:
        console.print("[red]No entity schema markup found on website![/red]")
    console.print()

    # Brand SERP table
    serp = result.brand_serp
    serp_table = Table(title="Brand SERP Analysis", show_header=True)
    serp_table.add_column("Feature", style="bold")
    serp_table.add_column("Status")

    serp_table.add_row("Knowledge Panel", "[green]Yes[/]" if serp.knowledge_panel else "[red]No[/]")
    serp_table.add_row("Sitelinks", "[green]Yes[/]" if serp.sitelinks else "[red]No[/]")
    serp_table.add_row("PAA Count", str(serp.paa_count))
    serp_table.add_row("SERP Features", ", ".join(serp.features) if serp.features else "None")
    serp_table.add_row("Social Profiles", ", ".join(serp.social_profiles) if serp.social_profiles else "None")

    console.print(serp_table)
    console.print()

    # PAA Questions
    if result.paa_questions:
        paa_table = Table(title=f"People Also Ask ({len(result.paa_questions)} questions)", show_header=True)
        paa_table.add_column("#", style="dim")
        paa_table.add_column("Question")
        paa_table.add_column("Keyword")

        for i, q in enumerate(result.paa_questions[:15], 1):
            paa_table.add_row(str(i), q.question, q.keyword)
        console.print(paa_table)
        console.print()

    # FAQ Rich Results
    if result.faq_rich_results:
        faq_table = Table(title="FAQ Rich Results", show_header=True)
        faq_table.add_column("URL")
        faq_table.add_column("Questions")
        faq_table.add_column("Valid")

        for faq in result.faq_rich_results:
            faq_table.add_row(
                faq.url[:60],
                str(faq.question_count),
                "[green]Yes[/]" if faq.schema_valid else "[red]No[/]",
            )
        console.print(faq_table)
        console.print()

    # Social Profile Status
    if result.social_profile_status:
        sp_table = Table(title="Social Profile Link Status", show_header=True)
        sp_table.add_column("URL")
        sp_table.add_column("Accessible")

        for link, accessible in result.social_profile_status.items():
            sp_table.add_row(
                link[:70],
                "[green]Yes[/]" if accessible else "[red]No[/]",
            )
        console.print(sp_table)
        console.print()

    # Recommendations
    console.print("[bold yellow]Recommendations:[/bold yellow]")
    for i, rec in enumerate(result.recommendations, 1):
        console.print(f"  {i}. {rec}")
    console.print()

# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Entity SEO Auditor",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("--url", required=True, help="Website URL to audit")
    parser.add_argument("--entity", required=True, help="Entity/brand name")
    parser.add_argument("--paa", action="store_true", default=True, help="Include PAA monitoring (default: True)")
    parser.add_argument("--no-paa", action="store_true", help="Skip PAA monitoring")
    parser.add_argument("--faq", action="store_true", default=True, help="Include FAQ tracking (default: True)")
    parser.add_argument("--no-faq", action="store_true", help="Skip FAQ tracking")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", type=str, help="Output file path")
    return parser.parse_args()


async def main() -> None:
    args = parse_args()

    auditor = EntityAuditor()
    result = await auditor.audit(
        url=args.url,
        entity_name=args.entity,
        include_paa=not args.no_paa,
        include_faq=not args.no_faq,
    )

    if args.json:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            console.print(f"[green]Output saved to {args.output}[/green]")
        else:
            print(output)
    else:
        display_result(result)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                json.dump(result.to_dict(), f, ensure_ascii=False, indent=2)
            console.print(f"[green]Output saved to {args.output}[/green]")


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,782 @@
"""
|
||||||
|
Knowledge Graph Analyzer
|
||||||
|
=========================
|
||||||
|
Purpose: Analyze entity presence in Google Knowledge Graph, Knowledge Panels,
|
||||||
|
Wikipedia, Wikidata, and Korean equivalents (Naver encyclopedia, 지식iN).
|
||||||
|
Python: 3.10+
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
from dataclasses import asdict, dataclass, field
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any
|
||||||
|
from urllib.parse import quote, urljoin
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
from rich.console import Console
|
||||||
|
from rich.table import Table
|
||||||
|
|
||||||
|
from base_client import BaseAsyncClient, ConfigManager, config
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
console = Console()
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

EXPECTED_ATTRIBUTES = [
    "name",
    "type",
    "description",
    "logo",
    "website",
    "founded",
    "ceo",
    "headquarters",
    "parent_organization",
    "subsidiaries",
    "social_twitter",
    "social_facebook",
    "social_linkedin",
    "social_youtube",
    "social_instagram",
    "stock_ticker",
    "industry",
    "employees",
    "revenue",
]


@dataclass
class KnowledgePanelAttribute:
    """Single attribute extracted from a Knowledge Panel."""
    name: str
    value: str | None = None
    present: bool = False


@dataclass
class KnowledgePanel:
    """Represents a detected Knowledge Panel."""
    detected: bool = False
    entity_type: str | None = None
    attributes: list[KnowledgePanelAttribute] = field(default_factory=list)
    completeness_score: float = 0.0
    raw_snippet: str | None = None


@dataclass
class WikiPresence:
    """Wikipedia or Wikidata presence record."""
    platform: str = ""  # "wikipedia" or "wikidata"
    present: bool = False
    url: str | None = None
    qid: str | None = None  # Wikidata QID (e.g. Q20710)
    language: str = "en"


@dataclass
class NaverPresence:
    """Naver encyclopedia and 지식iN presence."""
    encyclopedia_present: bool = False
    encyclopedia_url: str | None = None
    knowledge_in_present: bool = False
    knowledge_in_count: int = 0
    knowledge_in_url: str | None = None


@dataclass
class KnowledgeGraphResult:
    """Full Knowledge Graph analysis result."""
    entity: str = ""
    language: str = "en"
    knowledge_panel: KnowledgePanel = field(default_factory=KnowledgePanel)
    wikipedia: WikiPresence = field(default_factory=lambda: WikiPresence(platform="wikipedia"))
    wikidata: WikiPresence = field(default_factory=lambda: WikiPresence(platform="wikidata"))
    naver: NaverPresence = field(default_factory=NaverPresence)
    competitors: list[dict[str, Any]] = field(default_factory=list)
    overall_score: float = 0.0
    recommendations: list[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


# ---------------------------------------------------------------------------
# Knowledge Graph Analyzer
# ---------------------------------------------------------------------------


class KnowledgeGraphAnalyzer(BaseAsyncClient):
    """Analyze entity presence in Knowledge Graph and related platforms."""

    GOOGLE_SEARCH_URL = "https://www.google.com/search"
    WIKIPEDIA_API_URL = "https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}"
    WIKIDATA_API_URL = "https://www.wikidata.org/w/api.php"
    NAVER_SEARCH_URL = "https://search.naver.com/search.naver"
    NAVER_ENCYCLOPEDIA_URL = "https://terms.naver.com/search.naver"
    NAVER_KIN_URL = "https://kin.naver.com/search/list.naver"

    HEADERS = {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    }

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.config = config

    # ------------------------------------------------------------------
    # Google entity search
    # ------------------------------------------------------------------

    async def search_entity(
        self,
        entity_name: str,
        language: str = "en",
        session: aiohttp.ClientSession | None = None,
    ) -> dict[str, Any]:
        """Search Google for entity to detect Knowledge Panel signals."""
        params = {"q": entity_name, "hl": language, "gl": "us" if language == "en" else "kr"}
        headers = {**self.HEADERS}
        if language == "ko":
            headers["Accept-Language"] = "ko-KR,ko;q=0.9"
            params["gl"] = "kr"

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                self.GOOGLE_SEARCH_URL, params=params, headers=headers, timeout=aiohttp.ClientTimeout(total=20)
            ) as resp:
                if resp.status != 200:
                    logger.warning("Google search returned status %d", resp.status)
                    return {"html": "", "status": resp.status}
                html = await resp.text()
                return {"html": html, "status": resp.status}
        except Exception as exc:
            logger.error("Google search failed: %s", exc)
            return {"html": "", "status": 0, "error": str(exc)}
        finally:
            if own_session:
                await session.close()

    # ------------------------------------------------------------------
    # Knowledge Panel detection
    # ------------------------------------------------------------------

    def detect_knowledge_panel(self, search_data: dict[str, Any]) -> KnowledgePanel:
        """Parse search results HTML for Knowledge Panel indicators."""
        html = search_data.get("html", "")
        if not html:
            return KnowledgePanel(detected=False)

        soup = BeautifulSoup(html, "lxml")
        kp = KnowledgePanel()

        # Knowledge Panel is typically in a div with class 'kp-wholepage' or 'knowledge-panel'
        kp_selectors = [
            "div.kp-wholepage",
            "div.knowledge-panel",
            "div[data-attrid='title']",
            "div.kp-header",
            "div[class*='kno-']",
            "div.osrp-blk",
        ]

        kp_element = None
        for selector in kp_selectors:
            kp_element = soup.select_one(selector)
            if kp_element:
                break

        if kp_element:
            kp.detected = True
            kp.raw_snippet = kp_element.get_text(separator=" ", strip=True)[:500]
        else:
            # Fallback: check for common KP text patterns
            text = soup.get_text(separator=" ", strip=True).lower()
            kp_indicators = [
                "wikipedia", "description", "founded", "ceo",
                "headquarters", "subsidiaries", "parent organization",
            ]
            matches = sum(1 for ind in kp_indicators if ind in text)
            if matches >= 3:
                kp.detected = True
                kp.raw_snippet = text[:500]

        return kp

    # ------------------------------------------------------------------
    # Attribute extraction
    # ------------------------------------------------------------------

    def extract_attributes(self, kp: KnowledgePanel, html: str = "") -> list[KnowledgePanelAttribute]:
        """Extract entity attributes from Knowledge Panel data."""
        attributes: list[KnowledgePanelAttribute] = []

        # Parse HTML for structured attribute data
        soup = BeautifulSoup(html, "lxml") if html else None

        attribute_patterns = {
            "name": r"^(.+?)(?:\s+is\s+|\s*[-|]\s*)",
            "type": r"(?:is\s+(?:a|an)\s+)(\w[\w\s]+?)(?:\.|,|\s+based)",
            "description": r"(?:is\s+)(.{20,200}?)(?:\.\s)",
            "founded": r"(?:founded|established|incorporated)\s*(?:in|:)?\s*(\d{4})",
            "ceo": r"(?:ceo|chief executive|chairman)\s*(?::|is)?\s*([A-Z][\w\s.]+?)(?:,|\.|;|\s{2})",
            "headquarters": r"(?:headquarters?|hq|based in)\s*(?::|is|in)?\s*([A-Z][\w\s,]+?)(?:\.|;|\s{2})",
            "stock_ticker": r"(?:stock|ticker|symbol)\s*(?::|is)?\s*([A-Z]{1,5}(?:\s*:\s*[A-Z]{1,5})?)",
            "employees": r"(?:employees?|staff|workforce)\s*(?::|is)?\s*([\d,]+)",
            "revenue": r"(?:revenue|sales)\s*(?::|is)?\s*([\$\d,.]+\s*(?:billion|million|B|M)?)",
            "industry": r"(?:industry|sector)\s*(?::|is)?\s*([\w\s&]+?)(?:\.|,|;)",
        }

        social_patterns = {
            "social_twitter": r"(?:twitter\.com|x\.com)/(\w+)",
            "social_facebook": r"facebook\.com/([\w.]+)",
            "social_linkedin": r"linkedin\.com/(?:company|in)/([\w-]+)",
            "social_youtube": r"youtube\.com/(?:@|channel/|user/)([\w-]+)",
            "social_instagram": r"instagram\.com/([\w.]+)",
        }

        full_text = kp.raw_snippet or ""
        html_text = ""
        if soup:
            html_text = soup.get_text(separator=" ", strip=True)

        combined = f"{full_text} {html_text}"

        for attr_name, pattern in attribute_patterns.items():
            match = re.search(pattern, combined, re.IGNORECASE)
            present = match is not None
            value = match.group(1).strip() if match else None
            attributes.append(KnowledgePanelAttribute(name=attr_name, value=value, present=present))

        # Social profiles
        for attr_name, pattern in social_patterns.items():
            match = re.search(pattern, combined, re.IGNORECASE)
            present = match is not None
            value = match.group(1).strip() if match else None
            attributes.append(KnowledgePanelAttribute(name=attr_name, value=value, present=present))

        # Logo detection from HTML
        logo_present = False
        if soup:
            logo_img = soup.select_one("img[data-atf], g-img img, img.kno-fb-img, img[alt*='logo']")
            if logo_img:
                logo_present = True
        attributes.append(KnowledgePanelAttribute(name="logo", value=None, present=logo_present))

        # Website detection
        website_present = False
        website_value = None
        if soup:
            site_link = soup.select_one("a[data-attrid*='website'], a.ab_button[href*='http']")
            if site_link:
                website_present = True
                website_value = site_link.get("href", "")
        attributes.append(KnowledgePanelAttribute(name="website", value=website_value, present=website_present))

        return attributes
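For a quick sanity check, the attribute regexes can be exercised standalone. A minimal sketch (two patterns copied from `extract_attributes` above; the sample snippet is invented):

```python
import re

# Two of the attribute patterns used above, applied to a made-up snippet.
patterns = {
    "founded": r"(?:founded|established|incorporated)\s*(?:in|:)?\s*(\d{4})",
    "employees": r"(?:employees?|staff|workforce)\s*(?::|is)?\s*([\d,]+)",
}
snippet = "Acme Corp was founded in 1999 and has employees: 12,500 worldwide."
extracted = {
    name: m.group(1)
    for name, pattern in patterns.items()
    if (m := re.search(pattern, snippet, re.IGNORECASE))
}
print(extracted)  # → {'founded': '1999', 'employees': '12,500'}
```

Because the patterns run against the concatenated panel snippet and page text, loose matches like these are expected; the `present` flag only records that some match was found.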
    # ------------------------------------------------------------------
    # Completeness scoring
    # ------------------------------------------------------------------

    def score_completeness(self, attributes: list[KnowledgePanelAttribute]) -> float:
        """Score attribute completeness (0-100) based on filled vs expected."""
        if not attributes:
            return 0.0

        weights = {
            "name": 10, "type": 8, "description": 10, "logo": 8, "website": 10,
            "founded": 5, "ceo": 5, "headquarters": 5, "parent_organization": 3,
            "subsidiaries": 3, "social_twitter": 4, "social_facebook": 4,
            "social_linkedin": 4, "social_youtube": 3, "social_instagram": 3,
            "stock_ticker": 3, "industry": 5, "employees": 3, "revenue": 4,
        }

        total_weight = sum(weights.values())
        earned = 0.0

        attr_map = {a.name: a for a in attributes}
        for attr_name, weight in weights.items():
            attr = attr_map.get(attr_name)
            if attr and attr.present:
                earned += weight

        return round((earned / total_weight) * 100, 1) if total_weight > 0 else 0.0
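The completeness score is a plain weighted-coverage ratio. A self-contained sketch of the same arithmetic (weights copied from `score_completeness`; the set of detected attributes is hypothetical):

```python
# Weights copied verbatim from score_completeness; they sum to 100.
weights = {
    "name": 10, "type": 8, "description": 10, "logo": 8, "website": 10,
    "founded": 5, "ceo": 5, "headquarters": 5, "parent_organization": 3,
    "subsidiaries": 3, "social_twitter": 4, "social_facebook": 4,
    "social_linkedin": 4, "social_youtube": 3, "social_instagram": 3,
    "stock_ticker": 3, "industry": 5, "employees": 3, "revenue": 4,
}

detected = {"name", "description", "website", "logo", "social_twitter"}  # hypothetical detections
earned = sum(w for name, w in weights.items() if name in detected)
score = round(earned / sum(weights.values()) * 100, 1)
print(score)  # → 42.0
```

Note that `parent_organization` and `subsidiaries` carry weight but are never produced by `extract_attributes`, so the score is effectively capped slightly below 100 in practice.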
    # ------------------------------------------------------------------
    # Wikipedia check
    # ------------------------------------------------------------------

    async def check_wikipedia(
        self,
        entity_name: str,
        language: str = "en",
        session: aiohttp.ClientSession | None = None,
    ) -> WikiPresence:
        """Check Wikipedia article existence for entity."""
        wiki = WikiPresence(platform="wikipedia", language=language)
        title = entity_name.replace(" ", "_")
        url = self.WIKIPEDIA_API_URL.format(lang=language, title=quote(title))

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(url, headers=self.HEADERS, timeout=aiohttp.ClientTimeout(total=15)) as resp:
                if resp.status == 200:
                    data = await resp.json()
                    wiki.present = data.get("type") != "disambiguation"
                    wiki.url = data.get("content_urls", {}).get("desktop", {}).get("page", "")
                    if not wiki.url:
                        wiki.url = f"https://{language}.wikipedia.org/wiki/{quote(title)}"
                    logger.info("Wikipedia article found for '%s' (%s)", entity_name, language)
                elif resp.status == 404:
                    wiki.present = False
                    logger.info("No Wikipedia article for '%s' (%s)", entity_name, language)
                else:
                    logger.warning("Wikipedia API returned status %d", resp.status)
        except Exception as exc:
            logger.error("Wikipedia check failed: %s", exc)
        finally:
            if own_session:
                await session.close()

        return wiki
    # ------------------------------------------------------------------
    # Wikidata check
    # ------------------------------------------------------------------

    async def check_wikidata(
        self,
        entity_name: str,
        session: aiohttp.ClientSession | None = None,
    ) -> WikiPresence:
        """Check Wikidata QID existence for entity."""
        wiki = WikiPresence(platform="wikidata")
        params = {
            "action": "wbsearchentities",
            "search": entity_name,
            "language": "en",
            "format": "json",
            "limit": 5,
        }

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                self.WIKIDATA_API_URL, params=params, headers=self.HEADERS,
                timeout=aiohttp.ClientTimeout(total=15),
            ) as resp:
                if resp.status == 200:
                    data = await resp.json()
                    results = data.get("search", [])
                    if results:
                        top = results[0]
                        wiki.present = True
                        wiki.qid = top.get("id", "")
                        wiki.url = top.get("concepturi", f"https://www.wikidata.org/wiki/{wiki.qid}")
                        logger.info("Wikidata entity found: %s (%s)", wiki.qid, entity_name)
                    else:
                        wiki.present = False
                        logger.info("No Wikidata entity for '%s'", entity_name)
                else:
                    logger.warning("Wikidata API returned status %d", resp.status)
        except Exception as exc:
            logger.error("Wikidata check failed: %s", exc)
        finally:
            if own_session:
                await session.close()

        return wiki
    # ------------------------------------------------------------------
    # Naver encyclopedia
    # ------------------------------------------------------------------

    async def check_naver_encyclopedia(
        self,
        entity_name: str,
        session: aiohttp.ClientSession | None = None,
    ) -> dict[str, Any]:
        """Check Naver encyclopedia (네이버 백과사전) presence."""
        result = {"present": False, "url": None}
        params = {"query": entity_name, "searchType": 0}
        headers = {
            **self.HEADERS,
            "Accept-Language": "ko-KR,ko;q=0.9",
        }

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                self.NAVER_ENCYCLOPEDIA_URL, params=params, headers=headers,
                timeout=aiohttp.ClientTimeout(total=15),
            ) as resp:
                if resp.status == 200:
                    html = await resp.text()
                    soup = BeautifulSoup(html, "lxml")
                    # Look for search result entries
                    entries = soup.select("ul.content_list li, div.search_result a, a.title")
                    if entries:
                        result["present"] = True
                        first_link = entries[0].find("a")
                        if first_link and first_link.get("href"):
                            href = first_link["href"]
                            if not href.startswith("http"):
                                href = urljoin("https://terms.naver.com", href)
                            result["url"] = href
                        else:
                            result["url"] = f"https://terms.naver.com/search.naver?query={quote(entity_name)}"
                        logger.info("Naver encyclopedia entry found for '%s'", entity_name)
                    else:
                        # Fallback: check page text for result indicators
                        text = soup.get_text()
                        if entity_name in text and "검색결과가 없습니다" not in text:
                            result["present"] = True
                            result["url"] = f"https://terms.naver.com/search.naver?query={quote(entity_name)}"
                else:
                    logger.warning("Naver encyclopedia returned status %d", resp.status)
        except Exception as exc:
            logger.error("Naver encyclopedia check failed: %s", exc)
        finally:
            if own_session:
                await session.close()

        return result
    # ------------------------------------------------------------------
    # Naver knowledge iN
    # ------------------------------------------------------------------

    async def check_naver_knowledge_in(
        self,
        entity_name: str,
        session: aiohttp.ClientSession | None = None,
    ) -> dict[str, Any]:
        """Check Naver knowledge iN (지식iN) entries."""
        result = {"present": False, "count": 0, "url": None}
        params = {"query": entity_name}
        headers = {
            **self.HEADERS,
            "Accept-Language": "ko-KR,ko;q=0.9",
        }

        own_session = session is None
        if own_session:
            session = aiohttp.ClientSession()

        try:
            async with session.get(
                self.NAVER_KIN_URL, params=params, headers=headers,
                timeout=aiohttp.ClientTimeout(total=15),
            ) as resp:
                if resp.status == 200:
                    html = await resp.text()
                    soup = BeautifulSoup(html, "lxml")

                    # Extract total result count
                    count_el = soup.select_one("span.number, em.total_count, span.result_count")
                    count = 0
                    if count_el:
                        count_text = count_el.get_text(strip=True).replace(",", "")
                        count_match = re.search(r"(\d+)", count_text)
                        if count_match:
                            count = int(count_match.group(1))

                    # Also check for list items
                    entries = soup.select("ul.basic1 li, ul._list li, div.search_list li")
                    if count > 0 or entries:
                        result["present"] = True
                        result["count"] = count if count > 0 else len(entries)
                        result["url"] = f"https://kin.naver.com/search/list.naver?query={quote(entity_name)}"
                        logger.info("Naver 지식iN: %d entries for '%s'", result["count"], entity_name)
                    else:
                        logger.info("No Naver 지식iN entries for '%s'", entity_name)
                else:
                    logger.warning("Naver 지식iN returned status %d", resp.status)
        except Exception as exc:
            logger.error("Naver 지식iN check failed: %s", exc)
        finally:
            if own_session:
                await session.close()

        return result
    # ------------------------------------------------------------------
    # Recommendations
    # ------------------------------------------------------------------

    def generate_recommendations(self, result: KnowledgeGraphResult) -> list[str]:
        """Generate actionable recommendations based on analysis."""
        recs: list[str] = []

        kp = result.knowledge_panel
        if not kp.detected:
            recs.append(
                "Knowledge Panel이 감지되지 않았습니다. Google에 엔티티 등록을 위해 "
                "Wikipedia 페이지 생성, Wikidata 항목 추가, 구조화된 데이터(Organization schema) 구현을 권장합니다."
            )
        elif kp.completeness_score < 50:
            recs.append(
                f"Knowledge Panel 완성도가 {kp.completeness_score}%로 낮습니다. "
                "누락된 속성(소셜 프로필, 설명, 로고 등)을 보강하세요."
            )

        if not result.wikipedia.present:
            recs.append(
                "Wikipedia 문서가 없습니다. 주목할 만한 출처(reliable sources)를 확보한 후 "
                "Wikipedia 문서 생성을 고려하세요."
            )

        if not result.wikidata.present:
            recs.append(
                "Wikidata 항목이 없습니다. Wikidata에 엔티티를 등록하여 "
                "Knowledge Graph 인식을 강화하세요."
            )

        if not result.naver.encyclopedia_present:
            recs.append(
                "네이버 백과사전에 등록되어 있지 않습니다. 한국 시장 SEO를 위해 "
                "네이버 백과사전 등재를 검토하세요."
            )

        if result.naver.knowledge_in_count < 5:
            recs.append(
                "네이버 지식iN에 관련 콘텐츠가 부족합니다. Q&A 콘텐츠를 통해 "
                "브랜드 엔티티 인지도를 높이세요."
            )

        # Check social profile completeness
        attr_map = {a.name: a for a in kp.attributes}
        missing_social = []
        for soc in ["social_twitter", "social_facebook", "social_linkedin", "social_youtube"]:
            attr = attr_map.get(soc)
            if not attr or not attr.present:
                missing_social.append(soc.replace("social_", "").title())
        if missing_social:
            recs.append(
                f"소셜 프로필 연결 누락: {', '.join(missing_social)}. "
                "웹사이트 schema의 sameAs 속성에 소셜 프로필을 추가하세요."
            )

        if not recs:
            recs.append("Knowledge Graph 엔티티 상태가 양호합니다. 현재 수준을 유지하세요.")

        return recs
    # ------------------------------------------------------------------
    # Main orchestrator
    # ------------------------------------------------------------------

    async def analyze(
        self,
        entity_name: str,
        language: str = "en",
        include_wiki: bool = True,
        include_naver: bool = True,
    ) -> KnowledgeGraphResult:
        """Orchestrate full Knowledge Graph analysis."""
        result = KnowledgeGraphResult(entity=entity_name, language=language)
        logger.info("Starting Knowledge Graph analysis for '%s' (lang=%s)", entity_name, language)

        async with aiohttp.ClientSession() as session:
            # Step 1: Search entity on Google
            search_data = await self.search_entity(entity_name, language, session)

            # Step 2: Detect Knowledge Panel
            kp = self.detect_knowledge_panel(search_data)

            # Step 3: Extract attributes
            if kp.detected:
                kp.attributes = self.extract_attributes(kp, search_data.get("html", ""))
                kp.completeness_score = self.score_completeness(kp.attributes)

                # Detect entity type from attributes
                for attr in kp.attributes:
                    if attr.name == "type" and attr.present:
                        kp.entity_type = attr.value
                        break

            result.knowledge_panel = kp

            # Step 4: Wikipedia and Wikidata checks (parallel)
            if include_wiki:
                wiki_task = self.check_wikipedia(entity_name, language, session)
                wikidata_task = self.check_wikidata(entity_name, session)
                result.wikipedia, result.wikidata = await asyncio.gather(wiki_task, wikidata_task)

            # Step 5: Naver checks (parallel)
            if include_naver:
                enc_task = self.check_naver_encyclopedia(entity_name, session)
                kin_task = self.check_naver_knowledge_in(entity_name, session)
                enc_result, kin_result = await asyncio.gather(enc_task, kin_task)

                result.naver = NaverPresence(
                    encyclopedia_present=enc_result.get("present", False),
                    encyclopedia_url=enc_result.get("url"),
                    knowledge_in_present=kin_result.get("present", False),
                    knowledge_in_count=kin_result.get("count", 0),
                    knowledge_in_url=kin_result.get("url"),
                )

        # Step 6: Compute overall score
        scores = []
        if kp.detected:
            scores.append(kp.completeness_score * 0.35)
        else:
            scores.append(0)
        scores.append(20.0 if result.wikipedia.present else 0)
        scores.append(15.0 if result.wikidata.present else 0)
        scores.append(15.0 if result.naver.encyclopedia_present else 0)
        scores.append(15.0 if result.naver.knowledge_in_present else 0)
        result.overall_score = round(sum(scores), 1)

        # Step 7: Recommendations
        result.recommendations = self.generate_recommendations(result)

        logger.info("Analysis complete. Overall score: %.1f", result.overall_score)
        return result
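The overall-score weighting in `analyze` gives panel completeness up to 35 points and the four platform checks 20/15/15/15 points each. A standalone sketch of that arithmetic (function name and sample inputs are hypothetical):

```python
def overall_score(completeness: float, wikipedia: bool, wikidata: bool,
                  naver_enc: bool, naver_kin: bool) -> float:
    """Same weighting as analyze(): 35% panel + 20/15/15/15 platform points."""
    score = completeness * 0.35
    score += 20.0 if wikipedia else 0.0
    score += 15.0 if wikidata else 0.0
    score += 15.0 if naver_enc else 0.0
    score += 15.0 if naver_kin else 0.0
    return round(score, 1)

# 60% complete panel, Wikipedia + Wikidata present, no Naver presence
print(overall_score(60.0, True, True, False, False))  # → 56.0
```

A fully complete panel with all four platforms present reaches exactly 100.0, so the scale stays 0-100.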
# ---------------------------------------------------------------------------
# CLI display helpers
# ---------------------------------------------------------------------------


def display_result(result: KnowledgeGraphResult) -> None:
    """Display analysis result in a rich table."""
    console.print()
    console.print(f"[bold cyan]Knowledge Graph Analysis: {result.entity}[/bold cyan]")
    console.print(f"Language: {result.language} | Score: {result.overall_score}/100")
    console.print()

    # Knowledge Panel table
    kp = result.knowledge_panel
    table = Table(title="Knowledge Panel", show_header=True)
    table.add_column("Property", style="bold")
    table.add_column("Value")
    table.add_column("Status")

    table.add_row("Detected", str(kp.detected), "[green]OK[/]" if kp.detected else "[red]Missing[/]")
    table.add_row("Entity Type", kp.entity_type or "-", "[green]OK[/]" if kp.entity_type else "[yellow]Unknown[/]")
    table.add_row("Completeness", f"{kp.completeness_score}%", "[green]OK[/]" if kp.completeness_score >= 50 else "[red]Low[/]")

    for attr in kp.attributes:
        status = "[green]Present[/]" if attr.present else "[red]Missing[/]"
        table.add_row(f"  {attr.name}", attr.value or "-", status)

    console.print(table)
    console.print()

    # Platform presence table
    plat_table = Table(title="Platform Presence", show_header=True)
    plat_table.add_column("Platform", style="bold")
    plat_table.add_column("Present")
    plat_table.add_column("Details")

    plat_table.add_row(
        "Wikipedia",
        "[green]Yes[/]" if result.wikipedia.present else "[red]No[/]",
        result.wikipedia.url or "-",
    )
    plat_table.add_row(
        "Wikidata",
        "[green]Yes[/]" if result.wikidata.present else "[red]No[/]",
        result.wikidata.qid or "-",
    )
    plat_table.add_row(
        "Naver Encyclopedia",
        "[green]Yes[/]" if result.naver.encyclopedia_present else "[red]No[/]",
        result.naver.encyclopedia_url or "-",
    )
    plat_table.add_row(
        "Naver 지식iN",
        "[green]Yes[/]" if result.naver.knowledge_in_present else "[red]No[/]",
        f"{result.naver.knowledge_in_count} entries" if result.naver.knowledge_in_present else "-",
    )

    console.print(plat_table)
    console.print()

    # Recommendations
    console.print("[bold yellow]Recommendations:[/bold yellow]")
    for i, rec in enumerate(result.recommendations, 1):
        console.print(f"  {i}. {rec}")
    console.print()
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Knowledge Graph & Entity Analyzer",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("--entity", required=True, help="Entity name to analyze")
    parser.add_argument("--language", default="en", choices=["en", "ko", "ja", "zh"], help="Language (default: en)")
    parser.add_argument("--no-wiki", action="store_true", help="Skip Wikipedia/Wikidata check (included by default)")
    parser.add_argument("--no-naver", action="store_true", help="Skip Naver checks")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--output", type=str, help="Output file path")
    return parser.parse_args()
async def main() -> None:
    args = parse_args()

    analyzer = KnowledgeGraphAnalyzer()
    result = await analyzer.analyze(
        entity_name=args.entity,
        language=args.language,
        include_wiki=not args.no_wiki,
        include_naver=not args.no_naver,
    )

    if args.json:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            console.print(f"[green]Output saved to {args.output}[/green]")
        else:
            print(output)
    else:
        display_result(result)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                json.dump(result.to_dict(), f, ensure_ascii=False, indent=2)
            console.print(f"[green]Output saved to {args.output}[/green]")


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,9 @@
# 28-seo-knowledge-graph dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0