Add SEO skills 19-28, 31-32 with full Python implementations
12 new skills: Keyword Strategy, SERP Analysis, Position Tracking, Link Building, Content Strategy, E-Commerce SEO, KPI Framework, International SEO, AI Visibility, Knowledge Graph, Competitor Intel, and Crawl Budget. ~20K lines of Python across 25 domain scripts.

Also updates the skill 11 pipeline table and the repo CLAUDE.md, and enhances the skill 18 local SEO workflow based on the jamie.clinic audit.

Note: skill 26 hreflang_validator.py is pending (content filter block).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
custom-skills/26-seo-international/code/CLAUDE.md (new file, 149 lines)
@@ -0,0 +1,149 @@
# CLAUDE.md

## Overview

International SEO audit tool for multi-language and multi-region website optimization. Validates hreflang tags (bidirectional, self-referencing, x-default), analyzes URL structure patterns (ccTLD vs subdomain vs subdirectory), audits content parity across language versions, checks detected content language against the declared language, and analyzes international redirect logic. Supports Korean expansion patterns (ko→ja, ko→zh, ko→en).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Hreflang validation
python scripts/hreflang_validator.py --url https://example.com --json

# Full international SEO audit
python scripts/international_auditor.py --url https://example.com --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `hreflang_validator.py` | Validate hreflang tag implementation | Hreflang errors, missing bidirectional links, x-default issues |
| `international_auditor.py` | Full international SEO audit | URL structure, content parity, redirect logic, language detection |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Hreflang Validator

```bash
# Validate hreflang for the homepage
python scripts/hreflang_validator.py --url https://example.com --json

# Validate with sitemap-based URL discovery
python scripts/hreflang_validator.py --url https://example.com --sitemap https://example.com/sitemap.xml --json

# Check specific pages
python scripts/hreflang_validator.py --urls-file pages.txt --json
```

**Capabilities**:
- Hreflang tag extraction from the HTML head, HTTP headers, and XML sitemaps
- Bidirectional validation (if page A links to B, then B must link back to A)
- Self-referencing check (each page should reference itself)
- x-default tag verification
- Language/region code validation (ISO 639-1 + ISO 3166-1)
- Conflicting hreflang detection
- Missing language version detection
- Return tag validation (confirmation links from alternate pages)
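The bidirectional, self-referencing, and code-format checks listed above can be sketched in a few lines. This is a minimal illustration that assumes per-page hreflang maps have already been extracted; the function and variable names are illustrative, not the validator's actual API.

```python
import re

# Accepts "x-default" plus ISO 639-1 language codes with an optional
# ISO 3166-1 region suffix (hreflang matching is case-insensitive in
# practice, so both letter cases are allowed for the region).
VALID_HREFLANG = re.compile(r"^(x-default|[a-z]{2}(-[A-Za-z]{2})?)$")


def validate_hreflang(pages: dict[str, dict[str, str]]) -> dict[str, list]:
    """pages maps each page URL to its {hreflang_code: target_url} entries."""
    errors: dict[str, list] = {
        "invalid_codes": [],
        "missing_self_reference": [],
        "missing_bidirectional": [],
    }
    for page, alternates in pages.items():
        # Code-format check.
        for code in alternates:
            if not VALID_HREFLANG.match(code):
                errors["invalid_codes"].append((page, code))
        # Self-reference: the page should appear among its own alternates.
        if page not in alternates.values():
            errors["missing_self_reference"].append(page)
        # Return tags: every alternate target must link back to this page.
        for code, target in alternates.items():
            if code == "x-default" or target == page:
                continue
            if page not in pages.get(target, {}).values():
                errors["missing_bidirectional"].append((page, code, target))
    return errors
```

The real validator also has to merge entries discovered from HTTP headers and sitemaps into `pages` before running these checks.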
## International Auditor

```bash
# Full international audit
python scripts/international_auditor.py --url https://example.com --json

# URL structure analysis
python scripts/international_auditor.py --url https://example.com --scope structure --json

# Content parity check
python scripts/international_auditor.py --url https://example.com --scope parity --json

# Korean expansion focus
python scripts/international_auditor.py --url https://example.com --korean-expansion --json
```
**Capabilities**:
- URL structure analysis (ccTLD vs subdomain vs subdirectory)
  - Recommendation engine based on business context
- Content parity audit across language versions
  - Page count comparison per language
  - Key page availability check (home, about, contact, products)
  - Content freshness comparison across languages
- Language/locale detection vs declared language
  - HTML lang attribute check
  - Content-Language header check
  - Actual content language detection
- International redirect logic audit
  - IP-based redirect detection
  - Accept-Language redirect behavior
  - Geo-redirect best practices (suggest, don't force)
- Korean expansion patterns (ko→ja, ko→zh, ko→en)
  - Priority market recommendations for Korean businesses
  - CJK-specific URL encoding issues
  - Regional search engine considerations (Naver, Baidu, Yahoo Japan)
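The ccTLD vs subdomain vs subdirectory classification can be sketched as a simple URL heuristic. The snippet below is illustrative only: the ccTLD set is a small assumed subset, and a production auditor would also need public-suffix handling (e.g. `.co.uk`), which this version skips.

```python
from urllib.parse import urlparse

# Illustrative subset; a real tool would use the full public-suffix list.
CCTLDS = {"kr", "jp", "cn", "de", "fr", "uk"}


def classify_url_structure(url: str, lang_codes=("ko", "en", "ja", "zh")) -> str:
    """Classify how a localized URL encodes its locale."""
    parsed = urlparse(url)
    host_labels = (parsed.hostname or "").split(".")
    # ccTLD: example.kr
    if host_labels[-1] in CCTLDS:
        return "ccTLD"
    # Subdomain: ko.example.com
    if host_labels[0] in lang_codes:
        return "subdomain"
    # Subdirectory: example.com/ko/...
    path = parsed.path.strip("/")
    if path and path.split("/")[0] in lang_codes:
        return "subdirectory"
    return "unknown"
```

The `url_structure` field in the output format below would hold the resulting label.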
## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-metrics-by-country` | Country-level traffic distribution |
| `site-explorer-organic-keywords` | Keywords by country filter |
## Output Format

```json
{
  "url": "https://example.com",
  "url_structure": "subdirectory",
  "languages_detected": ["ko", "en", "ja"],
  "hreflang_validation": {
    "total_pages_checked": 50,
    "errors": [],
    "warnings": [],
    "missing_bidirectional": [],
    "missing_self_reference": [],
    "x_default_present": true
  },
  "content_parity": {
    "ko": {"pages": 150, "freshness_score": 90},
    "en": {"pages": 120, "freshness_score": 75},
    "ja": {"pages": 80, "freshness_score": 60}
  },
  "redirect_logic": {
    "ip_based_redirect": false,
    "language_based_redirect": true,
    "is_forced": false
  },
  "score": 68,
  "timestamp": "2025-01-01T00:00:00"
}
```
## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | International SEO |
| Priority | Select | Based on hreflang error count |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: INTL-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., hreflang, x-default, ccTLD)
- URLs and code remain unchanged
custom-skills/26-seo-international/code/scripts/base_client.py (new file, 207 lines)
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")
class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1
class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm

            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)
class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get a required environment variable or raise an error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
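The refill arithmetic in `RateLimiter.acquire()` can be sanity-checked in isolation. The standalone mirror below duplicates just the token math so it can be exercised without async machinery; it is illustrative and not part of the committed module:

```python
class TokenBucketMirror:
    """Minimal mirror of RateLimiter's refill/consume arithmetic."""

    def __init__(self, rate: float, per: float = 1.0):
        self.rate = rate
        self.per = per
        self.tokens = rate

    def refill(self, elapsed: float) -> None:
        # Same formula as acquire(): refill proportional to elapsed time,
        # capped at the bucket capacity.
        self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))

    def wait_needed(self) -> float:
        # Same formula acquire() uses to compute its sleep when tokens < 1.
        return max(0.0, (1 - self.tokens) * (self.per / self.rate))
```

At 3 requests per second, an empty bucket refills 1.5 tokens after half a second, and a bucket holding 0.25 tokens implies a 0.25 s wait before the next request.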
File diff suppressed because it is too large
@@ -0,0 +1,10 @@
# 26-seo-international dependencies
requests>=2.31.0
aiohttp>=3.9.0
beautifulsoup4>=4.12.0
lxml>=5.1.0
langdetect>=1.0.9
tenacity>=8.2.0
tqdm>=4.66.0
python-dotenv>=1.0.0
rich>=13.7.0