Add SEO skills 19-28, 31-32 with full Python implementations

12 new skills: Keyword Strategy, SERP Analysis, Position Tracking,
Link Building, Content Strategy, E-Commerce SEO, KPI Framework,
International SEO, AI Visibility, Knowledge Graph, Competitor Intel,
and Crawl Budget. ~20K lines of Python across 25 domain scripts.
Updated skill 11 pipeline table and repo CLAUDE.md.
Enhanced skill 18 local SEO workflow from jamie.clinic audit.

Note: Skill 26 hreflang_validator.py still pending (blocked by content filter).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 12:05:59 +09:00
parent 159f7ec3f7
commit a3ff965b87
125 changed files with 25948 additions and 173 deletions


@@ -0,0 +1,149 @@
# CLAUDE.md
## Overview
International SEO audit tool for multi-language and multi-region website optimization. Validates hreflang tags (bidirectional, self-referencing, x-default), analyzes URL structure patterns (ccTLD vs subdomain vs subdirectory), audits content parity across language versions, checks language detection vs declared language, and analyzes international redirect logic. Supports Korean expansion patterns (ko→ja, ko→zh, ko→en).
## Quick Start
```bash
pip install -r scripts/requirements.txt
# Hreflang validation
python scripts/hreflang_validator.py --url https://example.com --json
# Full international SEO audit
python scripts/international_auditor.py --url https://example.com --json
```
## Scripts
| Script | Purpose | Key Output |
|--------|---------|------------|
| `hreflang_validator.py` | Validate hreflang tag implementation | Hreflang errors, missing bidirectional links, x-default issues |
| `international_auditor.py` | Full international SEO audit | URL structure, content parity, redirect logic, language detection |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |
## Hreflang Validator
```bash
# Validate hreflang for homepage
python scripts/hreflang_validator.py --url https://example.com --json
# Validate with sitemap-based discovery
python scripts/hreflang_validator.py --url https://example.com --sitemap https://example.com/sitemap.xml --json
# Check specific pages
python scripts/hreflang_validator.py --urls-file pages.txt --json
```
**Capabilities**:
- Hreflang tag extraction from HTML head, HTTP headers, and XML sitemap
- Bidirectional validation (if page A→B, then B→A must exist)
- Self-referencing check (each page should reference itself)
- x-default tag verification
- Language/region code validation (ISO 639-1 + ISO 3166-1)
- Conflicting hreflang detection
- Missing language version detection
- Return tag validation (confirmation links from alternate pages)
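The bidirectional and return-tag checks can be sketched as follows. This is a minimal illustration, not the actual `hreflang_validator.py` logic; it assumes each page's hreflang annotations have already been extracted into a dict of `page URL -> {lang code: alternate URL}`:

```python
# Sketch of bidirectional hreflang validation (illustrative only).
def find_missing_bidirectional(hreflang_map: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (source, alternate) pairs where the alternate page does not
    link back to the source, i.e. the return tag is missing."""
    missing = []
    for page, alternates in hreflang_map.items():
        for lang, alt_url in alternates.items():
            if alt_url == page:
                continue  # self-reference, checked separately
            return_links = hreflang_map.get(alt_url, {})
            # If page A links to B, some hreflang annotation on B must point back to A
            if page not in return_links.values():
                missing.append((page, alt_url))
    return missing

pages = {
    "https://example.com/ko/": {"ko": "https://example.com/ko/",
                                "en": "https://example.com/en/"},
    "https://example.com/en/": {"en": "https://example.com/en/"},  # no ko return tag
}
print(find_missing_bidirectional(pages))
# -> [('https://example.com/ko/', 'https://example.com/en/')]
```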
## International Auditor
```bash
# Full international audit
python scripts/international_auditor.py --url https://example.com --json
# URL structure analysis
python scripts/international_auditor.py --url https://example.com --scope structure --json
# Content parity check
python scripts/international_auditor.py --url https://example.com --scope parity --json
# Korean expansion focus
python scripts/international_auditor.py --url https://example.com --korean-expansion --json
```
**Capabilities**:
- URL structure analysis (ccTLD vs subdomain vs subdirectory)
- Recommendation engine based on business context
- Content parity audit across language versions
- Page count comparison per language
- Key page availability check (home, about, contact, products)
- Content freshness comparison across languages
- Language/locale detection vs declared language
- HTML lang attribute check
- Content-Language header check
- Actual content language detection
- International redirect logic audit
- IP-based redirect detection
- Accept-Language redirect behavior
- Geo-redirect best practices (suggest a locale rather than force-redirecting)
- Korean expansion patterns (ko→ja, ko→zh, ko→en)
- Priority market recommendations for Korean businesses
- CJK-specific URL encoding issues
- Regional search engine considerations (Naver, Baidu, Yahoo Japan)
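The ccTLD / subdomain / subdirectory distinction the auditor draws can be illustrated with a rough classifier. This is a sketch, not the actual `international_auditor.py` logic; the ccTLD and language-code sets are abbreviated for illustration:

```python
# Sketch: classify a site's international URL structure (illustrative only).
from urllib.parse import urlparse

CCTLDS = {"kr", "jp", "cn", "de", "fr"}        # abbreviated list for illustration
LANG_CODES = {"ko", "en", "ja", "zh", "de", "fr"}

def classify_structure(url: str) -> str:
    parsed = urlparse(url)
    host_parts = parsed.hostname.split(".")
    tld = host_parts[-1]
    subdomain = host_parts[0] if len(host_parts) > 2 else ""
    path = parsed.path.strip("/")
    first_segment = path.split("/")[0] if path else ""
    if tld in CCTLDS:
        return "ccTLD"            # example.kr
    if subdomain in LANG_CODES:
        return "subdomain"        # ko.example.com
    if first_segment in LANG_CODES:
        return "subdirectory"     # example.com/ko/
    return "single-locale"

print(classify_structure("https://example.com/ko/about"))  # -> subdirectory
```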
## Ahrefs MCP Tools Used
| Tool | Purpose |
|------|---------|
| `site-explorer-metrics-by-country` | Country-level traffic distribution |
| `site-explorer-organic-keywords` | Keywords by country filter |
## Output Format
```json
{
"url": "https://example.com",
"url_structure": "subdirectory",
"languages_detected": ["ko", "en", "ja"],
"hreflang_validation": {
"total_pages_checked": 50,
"errors": [],
"warnings": [],
"missing_bidirectional": [],
"missing_self_reference": [],
"x_default_present": true
},
"content_parity": {
"ko": {"pages": 150, "freshness_score": 90},
"en": {"pages": 120, "freshness_score": 75},
"ja": {"pages": 80, "freshness_score": 60}
},
"redirect_logic": {
"ip_based_redirect": false,
"language_based_redirect": true,
"is_forced": false
},
"score": 68,
"timestamp": "2025-01-01T00:00:00"
}
```
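The declared-vs-actual language check can be sketched with a naive Unicode-script classifier. The real auditor can use a proper detector such as langdetect (listed in requirements.txt); this stand-in is only enough to separate ko/ja/zh from Latin-script content:

```python
# Sketch: naive declared-vs-detected language check (illustrative only).
def detect_script(text: str) -> str:
    counts = {"ko": 0, "ja": 0, "zh": 0, "latin": 0}
    for ch in text:
        cp = ord(ch)
        if 0xAC00 <= cp <= 0xD7A3:        # Hangul syllables
            counts["ko"] += 1
        elif 0x3040 <= cp <= 0x30FF:      # Hiragana / Katakana
            counts["ja"] += 1
        elif 0x4E00 <= cp <= 0x9FFF:      # CJK ideographs (zh; also used in ja)
            counts["zh"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    return max(counts, key=counts.get)

def language_mismatch(declared_lang: str, body_text: str) -> bool:
    """True when the declared lang (e.g. <html lang>) disagrees with content."""
    primary = declared_lang.split("-")[0].lower()  # 'en-US' -> 'en'
    detected = detect_script(body_text)
    if detected == "latin":
        # Can't distinguish Latin-script languages here; flag only CJK declarations
        return primary in {"ko", "ja", "zh"}
    return primary != detected

print(language_mismatch("ko", "This page is written in English."))  # -> True
print(language_mismatch("ko", "안녕하세요, 환영합니다."))  # -> False
```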
## Notion Output (Required)
**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.
### Database Configuration
| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |
### Required Properties
| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | International SEO |
| Priority | Select | Based on hreflang error count |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: INTL-YYYYMMDD-NNN |
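The required properties translate into a Notion create-page payload roughly like this. A sketch only: the property names mirror the table above and the value shapes follow the Notion API's standard property formats, but the helper itself (including the priority argument) is hypothetical:

```python
# Sketch: build Notion page properties for an audit log entry (illustrative).
from datetime import date

def build_audit_properties(site: str, priority: str, seq: int) -> dict:
    today = date.today()
    audit_id = f"INTL-{today:%Y%m%d}-{seq:03d}"  # Format: INTL-YYYYMMDD-NNN
    return {
        "Issue": {"title": [{"text": {"content": f"국제 SEO 감사 {today:%Y-%m-%d}"}}]},
        "Site": {"url": site},
        "Category": {"select": {"name": "International SEO"}},
        "Priority": {"select": {"name": priority}},  # derived from hreflang error count
        "Found Date": {"date": {"start": f"{today:%Y-%m-%d}"}},
        "Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
    }

props = build_audit_properties("https://example.com", "High", 1)
print(props["Audit ID"]["rich_text"][0]["text"]["content"])  # e.g. INTL-20250101-001
```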
### Language Guidelines
- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., hreflang, x-default, ccTLD)
- URLs and code remain unchanged


@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""
import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar
from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")
class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now
            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1
class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently, preserving input order."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                # Errors are returned in place so one failure does not abort the batch
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]
        if has_tqdm:
            # tqdm.gather shows progress and, unlike as_completed, keeps
            # results aligned with the order of `requests`
            return await tqdm.gather(*tasks, desc=desc)
        return await asyncio.gather(*tasks)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)
class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise an error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()

File diff suppressed because it is too large


@@ -0,0 +1,10 @@
# 26-seo-international dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
langdetect>=1.0.9
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0