Add SEO skills 19-28, 31-32 with full Python implementations
12 new skills: Keyword Strategy, SERP Analysis, Position Tracking, Link Building, Content Strategy, E-Commerce SEO, KPI Framework, International SEO, AI Visibility, Knowledge Graph, Competitor Intel, and Crawl Budget. ~20K lines of Python across 25 domain scripts. Updated skill 11 pipeline table and repo CLAUDE.md. Enhanced skill 18 local SEO workflow from jamie.clinic audit.

Note: Skill 26 hreflang_validator.py pending (content filter block).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
136
custom-skills/24-seo-ecommerce/code/CLAUDE.md
Normal file
@@ -0,0 +1,136 @@
# CLAUDE.md

## Overview

E-commerce SEO audit tool for product page optimization, product schema validation, category taxonomy analysis, and marketplace presence checking. Supports Naver Smart Store optimization and Korean marketplace platforms (Coupang, Gmarket, 11번가).

## Quick Start

```bash
pip install -r scripts/requirements.txt

# E-commerce SEO audit
python scripts/ecommerce_auditor.py --url https://example.com --json

# Product schema validation
python scripts/product_schema_checker.py --url https://example.com/product --json
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `ecommerce_auditor.py` | Full e-commerce SEO audit | Product page issues, category structure, marketplace presence |
| `product_schema_checker.py` | Validate product structured data | Schema completeness, errors, rich result eligibility |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## E-Commerce Auditor

```bash
# Full audit
python scripts/ecommerce_auditor.py --url https://example.com --json

# Product page audit only
python scripts/ecommerce_auditor.py --url https://example.com --scope products --json

# Category taxonomy analysis
python scripts/ecommerce_auditor.py --url https://example.com --scope categories --json

# Check Korean marketplace presence
python scripts/ecommerce_auditor.py --url https://example.com --korean-marketplaces --json
```

**Capabilities**:
- Product page SEO audit (titles, meta descriptions, image alt text, H1 structure)
- Category taxonomy analysis (depth, breadcrumb implementation, faceted navigation)
- Duplicate content detection (parameter URLs, product variants, pagination)
- Pagination/infinite scroll SEO validation (rel=prev/next, canonical tags)
- Internal linking structure for product discovery
- Naver Smart Store optimization checks
- Korean marketplace presence (Coupang, Gmarket, 11번가 product listing detection)
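The parameter-URL duplicate detection above boils down to normalizing product URLs before comparing them. The sketch below is illustrative only: the function name and the tracking-parameter list are assumptions, not the auditor's actual implementation.

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

# Tracking/session parameters that commonly spawn duplicate product URLs
# (illustrative list; the real auditor's list may differ).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid", "ref"}

def normalize_product_url(url: str) -> str:
    """Strip tracking parameters and sort the rest, so URL variants
    serving identical content collapse to one canonical key."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(sorted(query)), fragment=""))

# Two tracking variants of the same product page collapse to one key:
a = normalize_product_url("https://example.com/p/1?color=red&utm_source=naver")
b = normalize_product_url("https://example.com/p/1?utm_campaign=x&color=red")
assert a == b
```

Grouping crawled URLs by this key is enough to flag parameter-only duplicates before any content hashing.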

## Product Schema Checker

```bash
# Validate single product page
python scripts/product_schema_checker.py --url https://example.com/product/123 --json

# Batch validate from sitemap
python scripts/product_schema_checker.py --sitemap https://example.com/product-sitemap.xml --sample 50 --json
```

**Capabilities**:
- Product schema validation (Product, Offer, AggregateRating, Review, BreadcrumbList)
- Required property completeness check (name, image, description, offers, price, availability)
- Optional property recommendations (brand, sku, gtin, mpn, review, aggregateRating)
- Rich result eligibility assessment
- Price and availability markup validation
- Merchant listing schema support
- Korean market: Naver Shopping structured data requirements
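The required-property completeness check can be sketched minimally. The property sets are copied from `product_schema_checker.py` in this commit; the helper function itself is an illustrative reduction, not the script's full validator.

```python
# Required property sets, as defined in product_schema_checker.py.
PRODUCT_REQUIRED = {"name", "image", "description"}
OFFER_REQUIRED = {"price", "priceCurrency", "availability"}

def missing_required(schema: dict) -> list[str]:
    """Return required Product/Offer properties absent from a JSON-LD object."""
    missing = [p for p in PRODUCT_REQUIRED if not schema.get(p)]
    offers = schema.get("offers") or {}
    if isinstance(offers, dict):
        missing += [f"offers.{p}" for p in OFFER_REQUIRED if not offers.get(p)]
    return sorted(missing)

product = {
    "@type": "Product",
    "name": "무선 이어폰",
    "image": "https://example.com/img/earbuds.jpg",
    "offers": {"price": "39000", "priceCurrency": "KRW"},
}
assert missing_required(product) == ["description", "offers.availability"]
```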

## Ahrefs MCP Tools Used

| Tool | Purpose |
|------|---------|
| `site-explorer-pages-by-traffic` | Identify top product/category pages |
| `site-explorer-organic-keywords` | Product page keyword performance |

## Output Format

```json
{
  "url": "https://example.com",
  "product_pages_audited": 50,
  "issues": {
    "critical": [...],
    "high": [...],
    "medium": [...],
    "low": [...]
  },
  "category_structure": {
    "max_depth": 4,
    "breadcrumbs_present": true,
    "faceted_nav_issues": [...]
  },
  "schema_validation": {
    "pages_with_schema": 42,
    "pages_without_schema": 8,
    "common_errors": [...]
  },
  "korean_marketplaces": {
    "naver_smart_store": {"found": true, "url": "..."},
    "coupang": {"found": false},
    "gmarket": {"found": false}
  },
  "score": 65,
  "timestamp": "2025-01-01T00:00:00"
}
```
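Downstream tooling can consume this report directly; a minimal sketch using the field names from the sample above (the triage logic is an assumption, not part of the auditor):

```python
import json

# Sample report in the shape documented above (abridged).
report = json.loads("""{
  "url": "https://example.com",
  "product_pages_audited": 50,
  "issues": {"critical": ["missing offers"], "high": [], "medium": ["thin titles"], "low": []},
  "schema_validation": {"pages_with_schema": 42, "pages_without_schema": 8},
  "score": 65
}""")

# Flatten the severity buckets into (severity, issue) pairs for triage.
flat = [(sev, msg) for sev, msgs in report["issues"].items() for msg in msgs]

# Share of audited product pages carrying any structured data.
coverage = report["schema_validation"]["pages_with_schema"] / report["product_pages_audited"]

assert flat == [("critical", "missing offers"), ("medium", "thin titles")]
assert abs(coverage - 0.84) < 1e-9
```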

## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | E-Commerce SEO |
| Priority | Select | Based on issue severity |
| Found Date | Date | Audit date (YYYY-MM-DD) |
| Audit ID | Rich Text | Format: ECOM-YYYYMMDD-NNN |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., Schema Markup, Product, Offer)
- URLs and code remain unchanged
207
custom-skills/24-seo-ecommerce/code/scripts/base_client.py
Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                # Reset the clock after sleeping so the wait itself is not
                # re-credited as fresh tokens on the next acquire (which would
                # roughly double the effective rate under saturation).
                self.last_update = datetime.now()
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
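The token-bucket pacing in `RateLimiter` can be sanity-checked with a self-contained replica of the same maths (illustrative only; this snippet re-declares the logic rather than importing `base_client`, and updates the clock after sleeping so the wait is not re-credited as tokens):

```python
import asyncio
import time

# Self-contained replica of RateLimiter's token-bucket maths:
# tokens replenish at `rate` per second; a call waits when the bucket is empty.
async def paced_calls(n: int, rate: float) -> float:
    tokens, last = rate, time.monotonic()  # bucket starts full
    start = time.monotonic()
    for _ in range(n):
        now = time.monotonic()
        tokens = min(rate, tokens + (now - last) * rate)
        last = now
        if tokens < 1:
            await asyncio.sleep((1 - tokens) / rate)
            last = time.monotonic()  # don't re-credit the time spent sleeping
            tokens = 0.0
        else:
            tokens -= 1
    return time.monotonic() - start

# With a full bucket of 5 tokens, 10 calls at 5 req/s spend roughly
# 1 second waiting on the throttled second half.
elapsed = asyncio.run(paced_calls(10, 5.0))
assert 0.8 < elapsed < 2.0
```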
1046
custom-skills/24-seo-ecommerce/code/scripts/ecommerce_auditor.py
Normal file
File diff suppressed because it is too large
805
custom-skills/24-seo-ecommerce/code/scripts/product_schema_checker.py
Normal file
@@ -0,0 +1,805 @@
"""
Product Schema Checker
======================
Purpose: Validate Product structured data (JSON-LD, Microdata, RDFa)
for Google and Naver rich result eligibility.
Python: 3.10+
"""

import argparse
import asyncio
import json
import logging
import re
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse

import aiohttp
from bs4 import BeautifulSoup
from rich.console import Console
from rich.table import Table

from base_client import BaseAsyncClient, config

logger = logging.getLogger(__name__)
console = Console()

# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------

@dataclass
class SchemaProperty:
    """Single property within a schema object."""
    name: str
    value: Any
    required: bool
    valid: bool
    error: str = ""


@dataclass
class ProductSchema:
    """Validation result for one product schema on a page."""
    url: str
    schema_type: str  # Product, Offer, AggregateRating, etc.
    properties: list[dict]  # list of SchemaProperty as dicts
    is_valid: bool = False
    rich_result_eligible: bool = False
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


@dataclass
class SchemaCheckResult:
    """Complete schema check result for one or more pages."""
    urls_checked: int = 0
    pages_with_schema: int = 0
    pages_without_schema: int = 0
    schemas: list[dict] = field(default_factory=list)
    common_errors: list[str] = field(default_factory=list)
    common_warnings: list[str] = field(default_factory=list)
    naver_shopping_issues: list[dict] = field(default_factory=list)
    score: int = 0
    timestamp: str = ""

    def calculate_score(self) -> int:
        """Score 0-100 based on schema completeness."""
        if self.urls_checked == 0:
            self.score = 0
            return 0
        coverage = self.pages_with_schema / self.urls_checked
        valid_schemas = sum(1 for s in self.schemas if s.get("is_valid"))
        validity_rate = valid_schemas / max(len(self.schemas), 1)
        eligible = sum(1 for s in self.schemas if s.get("rich_result_eligible"))
        eligibility_rate = eligible / max(len(self.schemas), 1)
        self.score = int(coverage * 40 + validity_rate * 35 + eligibility_rate * 25)
        return self.score


# ---------------------------------------------------------------------------
# Schema requirements
# ---------------------------------------------------------------------------

PRODUCT_REQUIRED = {"name", "image", "description"}
PRODUCT_RECOMMENDED = {
    "brand", "sku", "gtin", "gtin8", "gtin13", "gtin14", "mpn",
    "offers", "review", "aggregateRating", "color", "material",
}

OFFER_REQUIRED = {"price", "priceCurrency", "availability"}
OFFER_RECOMMENDED = {
    "url", "priceValidUntil", "itemCondition", "seller",
    "shippingDetails", "hasMerchantReturnPolicy",
}

AGGREGATE_RATING_REQUIRED = {"ratingValue", "reviewCount"}
AGGREGATE_RATING_RECOMMENDED = {"bestRating", "worstRating", "ratingCount"}

REVIEW_REQUIRED = {"author", "reviewRating"}
REVIEW_RECOMMENDED = {"datePublished", "reviewBody", "name"}

BREADCRUMB_REQUIRED = {"itemListElement"}

AVAILABILITY_VALUES = {
    "https://schema.org/InStock",
    "https://schema.org/OutOfStock",
    "https://schema.org/PreOrder",
    "https://schema.org/BackOrder",
    "https://schema.org/Discontinued",
    "https://schema.org/InStoreOnly",
    "https://schema.org/OnlineOnly",
    "https://schema.org/LimitedAvailability",
    "https://schema.org/SoldOut",
    "http://schema.org/InStock",
    "http://schema.org/OutOfStock",
    "http://schema.org/PreOrder",
    "http://schema.org/BackOrder",
    "http://schema.org/Discontinued",
    "InStock", "OutOfStock", "PreOrder", "BackOrder", "Discontinued",
}

ITEM_CONDITION_VALUES = {
    "https://schema.org/NewCondition",
    "https://schema.org/UsedCondition",
    "https://schema.org/RefurbishedCondition",
    "https://schema.org/DamagedCondition",
    "http://schema.org/NewCondition",
    "http://schema.org/UsedCondition",
    "http://schema.org/RefurbishedCondition",
    "NewCondition", "UsedCondition", "RefurbishedCondition",
}


# ---------------------------------------------------------------------------
# Main checker
# ---------------------------------------------------------------------------

class ProductSchemaChecker(BaseAsyncClient):
    """Validate Product structured data on e-commerce pages."""

    def __init__(
        self,
        max_concurrent: int = 10,
        requests_per_second: float = 5.0,
        timeout: int = 30,
    ):
        super().__init__(max_concurrent=max_concurrent, requests_per_second=requests_per_second)
        self.timeout = aiohttp.ClientTimeout(total=timeout)
        self.headers = {
            "User-Agent": (
                "Mozilla/5.0 (compatible; ProductSchemaChecker/1.0; "
                "+https://ourdigital.org)"
            ),
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7",
        }

    # ------------------------------------------------------------------
    # Page fetching
    # ------------------------------------------------------------------

    async def _fetch_page(self, session: aiohttp.ClientSession, url: str) -> str:
        """Fetch page HTML."""
        try:
            async with session.get(url, headers=self.headers, timeout=self.timeout,
                                   allow_redirects=True, ssl=False) as resp:
                return await resp.text(errors="replace")
        except Exception as exc:
            self.logger.warning(f"Failed to fetch {url}: {exc}")
            return ""

    # ------------------------------------------------------------------
    # Schema extraction
    # ------------------------------------------------------------------

    def extract_schemas(self, html: str, page_url: str) -> list[dict]:
        """Extract all structured data from HTML (JSON-LD, Microdata, RDFa)."""
        schemas: list[dict] = []
        soup = BeautifulSoup(html, "lxml")

        # --- JSON-LD ---
        for script in soup.find_all("script", attrs={"type": "application/ld+json"}):
            try:
                text = script.string or script.get_text()
                if not text:
                    continue
                data = json.loads(text)
                if isinstance(data, list):
                    for item in data:
                        if isinstance(item, dict):
                            schemas.append(item)
                elif isinstance(data, dict):
                    # Handle @graph
                    if "@graph" in data:
                        for item in data["@graph"]:
                            if isinstance(item, dict):
                                schemas.append(item)
                    else:
                        schemas.append(data)
            except (json.JSONDecodeError, TypeError) as exc:
                self.logger.debug(f"JSON-LD parse error on {page_url}: {exc}")

        # --- Microdata ---
        for item_scope in soup.find_all(attrs={"itemscope": True}):
            item_type = item_scope.get("itemtype", "")
            if "Product" in item_type or "Offer" in item_type:
                microdata = self._parse_microdata(item_scope)
                if microdata:
                    schemas.append(microdata)

        return schemas

    def _parse_microdata(self, element) -> dict:
        """Parse microdata from an itemscope element."""
        result: dict[str, Any] = {}
        item_type = element.get("itemtype", "")
        if item_type:
            type_name = item_type.rstrip("/").split("/")[-1]
            result["@type"] = type_name

        for prop in element.find_all(attrs={"itemprop": True}, recursive=True):
            name = prop.get("itemprop", "")
            if not name:
                continue
            # Nested itemscope
            if prop.get("itemscope") is not None:
                result[name] = self._parse_microdata(prop)
            elif prop.name == "meta":
                result[name] = prop.get("content", "")
            elif prop.name == "link":
                result[name] = prop.get("href", "")
            elif prop.name == "img":
                result[name] = prop.get("src", "")
            elif prop.name == "time":
                result[name] = prop.get("datetime", prop.get_text(strip=True))
            else:
                result[name] = prop.get_text(strip=True)

        return result

    # ------------------------------------------------------------------
    # Validation methods
    # ------------------------------------------------------------------

    def validate_product_schema(self, schema_data: dict, page_url: str) -> ProductSchema:
        """Validate a Product schema object."""
        ps = ProductSchema(
            url=page_url,
            schema_type="Product",
            properties=[],
        )

        # Check required properties
        for prop_name in PRODUCT_REQUIRED:
            value = schema_data.get(prop_name)
            valid = bool(value)
            error = "" if valid else f"Missing required property: {prop_name}"
            sp = SchemaProperty(
                name=prop_name, value=value, required=True, valid=valid, error=error,
            )
            ps.properties.append(asdict(sp))
            if not valid:
                ps.errors.append(error)

        # Check recommended properties
        for prop_name in PRODUCT_RECOMMENDED:
            value = schema_data.get(prop_name)
            sp = SchemaProperty(
                name=prop_name, value=value if value else None,
                required=False, valid=bool(value),
                error="" if value else f"Missing recommended property: {prop_name}",
            )
            ps.properties.append(asdict(sp))
            if not value:
                ps.warnings.append(f"Missing recommended property: {prop_name}")

        # Validate offers
        offers = schema_data.get("offers")
        if offers:
            if isinstance(offers, list):
                for offer in offers:
                    offer_errors = self.validate_offer_schema(offer)
                    ps.errors.extend(offer_errors["errors"])
                    ps.warnings.extend(offer_errors["warnings"])
            elif isinstance(offers, dict):
                offer_errors = self.validate_offer_schema(offers)
                ps.errors.extend(offer_errors["errors"])
                ps.warnings.extend(offer_errors["warnings"])
        else:
            ps.errors.append("Missing 'offers' property (required for rich results)")

        # Validate aggregateRating
        agg_rating = schema_data.get("aggregateRating")
        if agg_rating and isinstance(agg_rating, dict):
            rating_result = self.validate_aggregate_rating(agg_rating)
            ps.errors.extend(rating_result["errors"])
            ps.warnings.extend(rating_result["warnings"])

        # Validate reviews
        review = schema_data.get("review")
        if review:
            reviews = review if isinstance(review, list) else [review]
            for r in reviews[:5]:  # Check up to 5 reviews
                if isinstance(r, dict):
                    review_result = self.validate_review_schema(r)
                    ps.errors.extend(review_result["errors"])
                    ps.warnings.extend(review_result["warnings"])

        ps.is_valid = len(ps.errors) == 0
        ps.rich_result_eligible = self.check_rich_result_eligibility(schema_data)

        return ps

    def validate_offer_schema(self, offer_data: dict) -> dict[str, list[str]]:
        """Validate an Offer schema object."""
        errors: list[str] = []
        warnings: list[str] = []

        for prop_name in OFFER_REQUIRED:
            value = offer_data.get(prop_name)
            if not value:
                errors.append(f"Offer missing required property: {prop_name}")

        # Validate price format
        price = offer_data.get("price")
        if price is not None:
            price_str = str(price).replace(",", "").strip()
            if not re.match(r"^\d+(\.\d+)?$", price_str):
                errors.append(f"Invalid price format: '{price}' (must be numeric)")
            elif float(price_str) <= 0:
                warnings.append(f"Price is zero or negative: {price}")

        # Validate priceCurrency
        currency = offer_data.get("priceCurrency", "")
        valid_currencies = {"KRW", "USD", "EUR", "JPY", "CNY", "GBP"}
        if currency and currency.upper() not in valid_currencies:
            warnings.append(f"Unusual currency code: {currency}")

        # Validate availability
        availability = offer_data.get("availability", "")
        if availability and availability not in AVAILABILITY_VALUES:
            errors.append(
                f"Invalid availability value: '{availability}'. "
                f"Use schema.org values like https://schema.org/InStock"
            )

        # Validate itemCondition
        condition = offer_data.get("itemCondition", "")
        if condition and condition not in ITEM_CONDITION_VALUES:
            warnings.append(f"Invalid itemCondition: '{condition}'")

        # Check recommended
        for prop_name in OFFER_RECOMMENDED:
            if not offer_data.get(prop_name):
                warnings.append(f"Offer missing recommended property: {prop_name}")

        return {"errors": errors, "warnings": warnings}

    def validate_aggregate_rating(self, rating_data: dict) -> dict[str, list[str]]:
        """Validate AggregateRating schema."""
        errors: list[str] = []
        warnings: list[str] = []

        for prop_name in AGGREGATE_RATING_REQUIRED:
            value = rating_data.get(prop_name)
            if value is None:
                errors.append(f"AggregateRating missing required: {prop_name}")

        # Validate ratingValue range
        rating_value = rating_data.get("ratingValue")
        best_rating = rating_data.get("bestRating", 5)
        worst_rating = rating_data.get("worstRating", 1)
        if rating_value is not None:
            try:
                rv = float(rating_value)
                br = float(best_rating)
                wr = float(worst_rating)
                if rv < wr or rv > br:
                    errors.append(
                        f"ratingValue ({rv}) outside range [{wr}, {br}]"
                    )
            except (ValueError, TypeError):
                errors.append(f"Invalid ratingValue format: {rating_value}")

        # Validate reviewCount
        review_count = rating_data.get("reviewCount")
        if review_count is not None:
            try:
                rc = int(review_count)
                if rc < 0:
                    errors.append(f"Negative reviewCount: {rc}")
            except (ValueError, TypeError):
                errors.append(f"Invalid reviewCount format: {review_count}")

        for prop_name in AGGREGATE_RATING_RECOMMENDED:
            if not rating_data.get(prop_name):
                warnings.append(f"AggregateRating missing recommended: {prop_name}")

        return {"errors": errors, "warnings": warnings}

    def validate_review_schema(self, review_data: dict) -> dict[str, list[str]]:
        """Validate Review schema."""
        errors: list[str] = []
        warnings: list[str] = []

        # Author validation
        author = review_data.get("author")
        if not author:
            errors.append("Review missing required: author")
        elif isinstance(author, dict):
            author_name = author.get("name", "")
            if not author_name:
                errors.append("Review author missing 'name' property")
        elif isinstance(author, str):
            if len(author.strip()) == 0:
                errors.append("Review author is empty string")

        # reviewRating validation
        review_rating = review_data.get("reviewRating")
        if not review_rating:
            errors.append("Review missing required: reviewRating")
        elif isinstance(review_rating, dict):
            rv = review_rating.get("ratingValue")
            if rv is None:
                errors.append("reviewRating missing ratingValue")

        for prop_name in REVIEW_RECOMMENDED:
            if not review_data.get(prop_name):
                warnings.append(f"Review missing recommended: {prop_name}")

        return {"errors": errors, "warnings": warnings}

    def validate_breadcrumb(self, schema_data: dict) -> dict[str, list[str]]:
        """Validate BreadcrumbList schema."""
        errors: list[str] = []
        warnings: list[str] = []

        items = schema_data.get("itemListElement")
        if not items:
            errors.append("BreadcrumbList missing itemListElement")
            return {"errors": errors, "warnings": warnings}

        if not isinstance(items, list):
            errors.append("itemListElement should be an array")
            return {"errors": errors, "warnings": warnings}

        for i, item in enumerate(items):
            if not isinstance(item, dict):
                errors.append(f"Breadcrumb item {i} is not an object")
                continue
            position = item.get("position")
            if position is None:
                errors.append(f"Breadcrumb item {i} missing 'position'")
            name = item.get("name") or (item.get("item", {}).get("name") if isinstance(item.get("item"), dict) else None)
            if not name:
                warnings.append(f"Breadcrumb item {i} missing 'name'")

        return {"errors": errors, "warnings": warnings}

    # ------------------------------------------------------------------
    # Rich result eligibility
    # ------------------------------------------------------------------

    def check_rich_result_eligibility(self, schema_data: dict) -> bool:
        """Assess Google rich result eligibility for Product schema."""
        # Must have name, image, and offers with price
        if not schema_data.get("name"):
            return False
        if not schema_data.get("image"):
            return False

        offers = schema_data.get("offers")
        if not offers:
            return False

        offer_list = offers if isinstance(offers, list) else [offers]
        for offer in offer_list:
            if not isinstance(offer, dict):
                continue
            if offer.get("price") and offer.get("priceCurrency") and offer.get("availability"):
                return True

        return False

    # ------------------------------------------------------------------
    # Naver Shopping requirements
    # ------------------------------------------------------------------

    def check_naver_shopping_requirements(self, schema_data: dict, page_url: str) -> list[dict]:
        """Check Naver Shopping specific schema requirements."""
        issues: list[dict] = []

        # Naver Shopping requires Product name in Korean for Korean market
        name = schema_data.get("name", "")
        korean_chars = len(re.findall(r"[\uac00-\ud7af]", str(name)))
        if korean_chars == 0 and name:
            issues.append({
                "url": page_url,
                "type": "naver_product_name",
                "severity": "medium",
                "message": "Product name has no Korean characters",
                "recommendation": "Include Korean product name for Naver Shopping visibility.",
            })

        # Naver prefers specific category mapping
        if not schema_data.get("category"):
            issues.append({
                "url": page_url,
                "type": "naver_category",
                "severity": "low",
                "message": "Missing 'category' property for Naver Shopping categorization",
                "recommendation": "Add category property matching Naver Shopping category taxonomy.",
            })

        # Naver requires image
        image = schema_data.get("image")
        if not image:
            issues.append({
                "url": page_url,
                "type": "naver_image",
                "severity": "high",
                "message": "Missing product image (required for Naver Shopping)",
                "recommendation": "Add at least one high-quality product image URL.",
            })
        elif isinstance(image, str):
            if not image.startswith("http"):
                issues.append({
                    "url": page_url,
                    "type": "naver_image_url",
                    "severity": "medium",
                    "message": "Product image URL is relative (should be absolute)",
                    "recommendation": "Use absolute URLs for product images.",
                })

        # Naver requires price in KRW
        offers = schema_data.get("offers")
        if offers:
            offer_list = offers if isinstance(offers, list) else [offers]
            for offer in offer_list:
                if isinstance(offer, dict):
                    currency = offer.get("priceCurrency", "")
                    if currency and currency.upper() != "KRW":
                        issues.append({
                            "url": page_url,
                            "type": "naver_currency",
                            "severity": "medium",
                            "message": f"Price currency is {currency}, not KRW",
                            "recommendation": "For Naver Shopping, provide price in KRW.",
                        })

        # Check brand/manufacturer
        if not schema_data.get("brand") and not schema_data.get("manufacturer"):
            issues.append({
                "url": page_url,
                "type": "naver_brand",
                "severity": "low",
                "message": "Missing brand/manufacturer (helpful for Naver Shopping filters)",
"recommendation": "Add brand or manufacturer property.",
|
||||
})
|
||||
|
||||
return issues
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Orchestrator
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
async def check(
|
||||
self,
|
||||
urls: list[str] | None = None,
|
||||
sitemap_url: str | None = None,
|
||||
sample_size: int = 50,
|
||||
) -> SchemaCheckResult:
|
||||
"""Run schema validation on URLs or sitemap."""
|
||||
result = SchemaCheckResult(timestamp=datetime.now().isoformat())
|
||||
target_urls: list[str] = []
|
||||
|
||||
async with aiohttp.ClientSession() as session:
|
||||
if sitemap_url:
|
||||
# Fetch URLs from sitemap
|
||||
target_urls = await self._urls_from_sitemap(session, sitemap_url, sample_size)
|
||||
if urls:
|
||||
target_urls.extend(urls)
|
||||
|
||||
target_urls = list(set(target_urls))[:sample_size]
|
||||
result.urls_checked = len(target_urls)
|
||||
self.logger.info(f"Checking {len(target_urls)} URLs for Product schema")
|
||||
|
||||
error_counter: dict[str, int] = {}
|
||||
warning_counter: dict[str, int] = {}
|
||||
|
||||
for url in target_urls:
|
||||
html = await self._fetch_page(session, url)
|
||||
if not html:
|
||||
result.pages_without_schema += 1
|
||||
continue
|
||||
|
||||
schemas = self.extract_schemas(html, url)
|
||||
product_schemas = [
|
||||
s for s in schemas
|
||||
if self._get_schema_type(s) in ("Product", "ProductGroup")
|
||||
]
|
||||
breadcrumb_schemas = [
|
||||
s for s in schemas
|
||||
if self._get_schema_type(s) == "BreadcrumbList"
|
||||
]
|
||||
|
||||
if not product_schemas:
|
||||
result.pages_without_schema += 1
|
||||
continue
|
||||
|
||||
result.pages_with_schema += 1
|
||||
|
||||
for ps_data in product_schemas:
|
||||
ps = self.validate_product_schema(ps_data, url)
|
||||
result.schemas.append(asdict(ps))
|
||||
|
||||
for err in ps.errors:
|
||||
error_counter[err] = error_counter.get(err, 0) + 1
|
||||
for warn in ps.warnings:
|
||||
warning_counter[warn] = warning_counter.get(warn, 0) + 1
|
||||
|
||||
# Naver Shopping checks
|
||||
naver_issues = self.check_naver_shopping_requirements(ps_data, url)
|
||||
result.naver_shopping_issues.extend(naver_issues)
|
||||
|
||||
# Validate breadcrumbs
|
||||
for bc_data in breadcrumb_schemas:
|
||||
bc_result = self.validate_breadcrumb(bc_data)
|
||||
for err in bc_result["errors"]:
|
||||
error_counter[err] = error_counter.get(err, 0) + 1
|
||||
|
||||
# Aggregate common errors/warnings
|
||||
result.common_errors = sorted(
|
||||
error_counter.keys(),
|
||||
key=lambda k: error_counter[k],
|
||||
reverse=True,
|
||||
)[:20]
|
||||
result.common_warnings = sorted(
|
||||
warning_counter.keys(),
|
||||
key=lambda k: warning_counter[k],
|
||||
reverse=True,
|
||||
)[:20]
|
||||
|
||||
result.calculate_score()
|
||||
return result
|
||||
|
||||
async def _urls_from_sitemap(
|
||||
self,
|
||||
session: aiohttp.ClientSession,
|
||||
sitemap_url: str,
|
||||
limit: int,
|
||||
) -> list[str]:
|
||||
"""Fetch product URLs from sitemap."""
|
||||
urls: list[str] = []
|
||||
try:
|
||||
async with session.get(sitemap_url, headers=self.headers,
|
||||
timeout=self.timeout, ssl=False) as resp:
|
||||
if resp.status != 200:
|
||||
return urls
|
||||
text = await resp.text(errors="replace")
|
||||
soup = BeautifulSoup(text, "lxml-xml")
|
||||
|
||||
# Handle sitemap index
|
||||
sitemapindex = soup.find_all("sitemap")
|
||||
if sitemapindex:
|
||||
for sm in sitemapindex[:3]:
|
||||
loc = sm.find("loc")
|
||||
if loc:
|
||||
child_urls = await self._urls_from_sitemap(session, loc.text.strip(), limit)
|
||||
urls.extend(child_urls)
|
||||
if len(urls) >= limit:
|
||||
break
|
||||
else:
|
||||
for tag in soup.find_all("url"):
|
||||
loc = tag.find("loc")
|
||||
if loc:
|
||||
urls.append(loc.text.strip())
|
||||
if len(urls) >= limit:
|
||||
break
|
||||
except Exception as exc:
|
||||
self.logger.warning(f"Sitemap parse failed: {exc}")
|
||||
|
||||
return urls[:limit]
|
||||
|
||||
@staticmethod
|
||||
def _get_schema_type(schema: dict) -> str:
|
||||
"""Get the @type from a schema dict, handling various formats."""
|
||||
schema_type = schema.get("@type", "")
|
||||
if isinstance(schema_type, list):
|
||||
return schema_type[0] if schema_type else ""
|
||||
return str(schema_type)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CLI output helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def print_rich_report(result: SchemaCheckResult) -> None:
|
||||
"""Print a rich-formatted report to the console."""
|
||||
console.print(f"\n[bold cyan]Product Schema Validation Report[/bold cyan]")
|
||||
console.print(f"Timestamp: {result.timestamp}")
|
||||
console.print(f"URLs checked: {result.urls_checked}")
|
||||
|
||||
# Coverage
|
||||
coverage = (result.pages_with_schema / max(result.urls_checked, 1)) * 100
|
||||
cov_color = "green" if coverage >= 90 else "yellow" if coverage >= 50 else "red"
|
||||
console.print(f"Schema coverage: [{cov_color}]{coverage:.0f}%[/{cov_color}] "
|
||||
f"({result.pages_with_schema}/{result.urls_checked})")
|
||||
|
||||
# Score
|
||||
score_color = "green" if result.score >= 80 else "yellow" if result.score >= 50 else "red"
|
||||
console.print(f"[bold {score_color}]Score: {result.score}/100[/bold {score_color}]")
|
||||
|
||||
# Validity summary
|
||||
valid = sum(1 for s in result.schemas if s.get("is_valid"))
|
||||
eligible = sum(1 for s in result.schemas if s.get("rich_result_eligible"))
|
||||
total = len(result.schemas)
|
||||
|
||||
table = Table(title="Schema Summary")
|
||||
table.add_column("Metric", style="bold")
|
||||
table.add_column("Value", justify="right")
|
||||
table.add_row("Total schemas found", str(total))
|
||||
table.add_row("Valid schemas", str(valid))
|
||||
table.add_row("Rich result eligible", str(eligible))
|
||||
table.add_row("Pages without schema", str(result.pages_without_schema))
|
||||
console.print(table)
|
||||
|
||||
# Common errors
|
||||
if result.common_errors:
|
||||
console.print(f"\n[bold red]Common Errors ({len(result.common_errors)}):[/bold red]")
|
||||
for err in result.common_errors[:10]:
|
||||
console.print(f" [red]-[/red] {err}")
|
||||
|
||||
# Common warnings
|
||||
if result.common_warnings:
|
||||
console.print(f"\n[bold yellow]Common Warnings ({len(result.common_warnings)}):[/bold yellow]")
|
||||
for warn in result.common_warnings[:10]:
|
||||
console.print(f" [yellow]-[/yellow] {warn}")
|
||||
|
||||
# Naver Shopping issues
|
||||
if result.naver_shopping_issues:
|
||||
console.print(f"\n[bold magenta]Naver Shopping Issues ({len(result.naver_shopping_issues)}):[/bold magenta]")
|
||||
seen: set[str] = set()
|
||||
for issue in result.naver_shopping_issues:
|
||||
key = f"{issue['type']}:{issue['message']}"
|
||||
if key not in seen:
|
||||
seen.add(key)
|
||||
console.print(f" [{issue.get('severity', 'medium')}] {issue['message']}")
|
||||
console.print(f" [dim]{issue['recommendation']}[/dim]")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Main
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def main() -> None:
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Product Schema Checker - Validate e-commerce structured data",
|
||||
)
|
||||
group = parser.add_mutually_exclusive_group(required=True)
|
||||
group.add_argument("--url", nargs="+", help="Product page URL(s) to validate")
|
||||
group.add_argument("--sitemap", help="Sitemap URL to fetch product pages from")
|
||||
parser.add_argument(
|
||||
"--sample",
|
||||
type=int,
|
||||
default=50,
|
||||
help="Max URLs to check from sitemap (default: 50)",
|
||||
)
|
||||
parser.add_argument("--json", action="store_true", help="Output as JSON")
|
||||
parser.add_argument("--output", type=str, help="Save output to file")
|
||||
args = parser.parse_args()
|
||||
|
||||
checker = ProductSchemaChecker()
|
||||
result = asyncio.run(
|
||||
checker.check(
|
||||
urls=args.url,
|
||||
sitemap_url=args.sitemap,
|
||||
sample_size=args.sample,
|
||||
)
|
||||
)
|
||||
|
||||
if args.json:
|
||||
output = json.dumps(asdict(result), indent=2, ensure_ascii=False, default=str)
|
||||
if args.output:
|
||||
with open(args.output, "w", encoding="utf-8") as f:
|
||||
f.write(output)
|
||||
console.print(f"[green]Results saved to {args.output}[/green]")
|
||||
else:
|
||||
print(output)
|
||||
else:
|
||||
print_rich_report(result)
|
||||
if args.output:
|
||||
output = json.dumps(asdict(result), indent=2, ensure_ascii=False, default=str)
|
||||
with open(args.output, "w", encoding="utf-8") as f:
|
||||
f.write(output)
|
||||
console.print(f"\n[green]JSON results also saved to {args.output}[/green]")
|
||||
|
||||
checker.print_stats()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,9 @@
|
||||
# 24-seo-ecommerce dependencies
|
||||
requests>=2.31.0
|
||||
aiohttp>=3.9.0
|
||||
beautifulsoup4>=4.12.0
|
||||
lxml>=5.1.0
|
||||
tenacity>=8.2.0
|
||||
tqdm>=4.66.0
|
||||
python-dotenv>=1.0.0
|
||||
rich>=13.7.0
|
||||
156
custom-skills/24-seo-ecommerce/desktop/SKILL.md
Normal file
156
custom-skills/24-seo-ecommerce/desktop/SKILL.md
Normal file
@@ -0,0 +1,156 @@
|
||||
---
|
||||
name: seo-ecommerce
|
||||
description: |
|
||||
E-commerce SEO audit and optimization for product pages, product schema, category taxonomy,
|
||||
and Korean marketplace presence.
|
||||
Triggers: product SEO, e-commerce audit, product schema, category SEO, Smart Store, marketplace SEO,
|
||||
상품 SEO, 이커머스 감사, 쇼핑몰 SEO.
|
||||
---
|
||||
|
||||
# E-Commerce SEO Audit
|
||||
|
||||
## Purpose
|
||||
|
||||
Audit e-commerce sites for product page optimization, structured data validation, category taxonomy health, duplicate content issues, and Korean marketplace presence (Naver Smart Store, Coupang, Gmarket, 11번가).
|
||||
|
||||
## Core Capabilities
|
||||
|
||||
1. **Product Page SEO Audit** - Title, meta description, H1, image alt text, internal links, canonical tags
|
||||
2. **Product Schema Validation** - Product, Offer, AggregateRating, Review, BreadcrumbList structured data
|
||||
3. **Category Taxonomy Analysis** - Depth check, breadcrumbs, faceted navigation handling
|
||||
4. **Duplicate Content Detection** - Parameter variants, product variants, pagination issues
|
||||
5. **Korean Marketplace Presence** - Naver Smart Store, Coupang, Gmarket, 11번가
|
||||
|
||||
## MCP Tool Usage
|
||||
|
||||
### Ahrefs for Product Page Discovery
|
||||
```
|
||||
mcp__ahrefs__site-explorer-pages-by-traffic: Identify top product and category pages
|
||||
mcp__ahrefs__site-explorer-organic-keywords: Product page keyword performance
|
||||
```
|
||||
|
||||
### WebSearch for Marketplace Checks
|
||||
```
|
||||
WebSearch: Search for brand presence on Korean marketplaces
|
||||
WebFetch: Fetch and analyze marketplace listing pages
|
||||
```
|
||||
|
||||
### Notion for Report Storage
|
||||
```
|
||||
mcp__notion__notion-create-pages: Save audit report to SEO Audit Log database
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Product Page Audit
|
||||
1. Discover product pages via Ahrefs pages-by-traffic or sitemap
|
||||
2. For each product page check:
|
||||
- Title tag: contains product name, under 60 chars
|
||||
- Meta description: includes price/feature info, under 155 chars
|
||||
- Single H1 with product name
|
||||
- All product images have descriptive alt text
|
||||
- Canonical tag present and correct
|
||||
- Sufficient internal links (related products, breadcrumbs)
|
||||
- Open Graph tags for social sharing
|
||||
3. Score severity: critical/high/medium/low
|
||||
|
||||
### 2. Product Schema Validation
|
||||
1. Extract JSON-LD and Microdata from product pages
|
||||
2. Validate Product type: name, image, description (required)
|
||||
3. Validate Offer: price, priceCurrency, availability (required)
|
||||
4. Validate AggregateRating: ratingValue, reviewCount (required)
|
||||
5. Validate Review: author, reviewRating (required)
|
||||
6. Check BreadcrumbList implementation
|
||||
7. Assess Google rich result eligibility
|
||||
8. Check Naver Shopping specific requirements (Korean name, KRW price, absolute image URLs)
|
||||
|
||||
### 3. Category Taxonomy Analysis
|
||||
1. Crawl category pages from sitemap or homepage navigation
|
||||
2. Measure taxonomy depth (warn if > 4 levels)
|
||||
3. Check breadcrumb presence on every category page
|
||||
4. Identify faceted navigation URLs that are indexable without proper canonicals
|
||||
5. Count child category links for structure assessment
|
||||
|
||||
### 4. Duplicate Content Detection
|
||||
1. Group URLs by base path (stripping query parameters)
|
||||
2. Identify parameter variants (?color=, ?size=, ?sort=)
|
||||
3. Detect product variant URL duplicates (e.g., /product-red vs /product-blue)
|
||||
4. Flag paginated pages missing self-referencing canonicals
|
||||
|
||||
### 5. Korean Marketplace Presence
|
||||
1. Extract brand name from site (og:site_name or title)
|
||||
2. Search each marketplace for brand products:
|
||||
- Naver Smart Store (smartstore.naver.com)
|
||||
- Coupang (coupang.com)
|
||||
- Gmarket (gmarket.co.kr)
|
||||
- 11번가 (11st.co.kr)
|
||||
3. Check Naver Smart Store-specific SEO elements
|
||||
4. Verify naver-site-verification meta tag
|
||||
5. Check Korean content ratio for Naver visibility
|
||||
|
||||
## Output Format
|
||||
|
||||
```markdown
|
||||
## E-Commerce SEO Audit: [domain]
|
||||
|
||||
### Score: [0-100]/100
|
||||
|
||||
### Product Page Issues
|
||||
- **Critical**: [count] issues
|
||||
- **High**: [count] issues
|
||||
- **Medium**: [count] issues
|
||||
- **Low**: [count] issues
|
||||
|
||||
#### Top Issues
|
||||
1. [severity] [issue_type] - [message]
|
||||
Recommendation: [fix]
|
||||
|
||||
### Category Structure
|
||||
- Categories found: [count]
|
||||
- Max depth: [number]
|
||||
- Breadcrumbs present: [count]
|
||||
- Faceted navigation issues: [count]
|
||||
|
||||
### Schema Validation
|
||||
- Pages with schema: [count]/[total]
|
||||
- Valid schemas: [count]
|
||||
- Rich result eligible: [count]
|
||||
- Common errors: [list]
|
||||
|
||||
### Korean Marketplaces
|
||||
- Naver Smart Store: [Found/Not Found]
|
||||
- Coupang: [Found/Not Found]
|
||||
- Gmarket: [Found/Not Found]
|
||||
- 11번가: [Found/Not Found]
|
||||
|
||||
### Recommendations
|
||||
1. [Priority fixes ordered by impact]
|
||||
```
|
||||
|
||||
## Common Issues
|
||||
|
||||
| Issue | Impact | Fix |
|
||||
|-------|--------|-----|
|
||||
| Missing Product schema | High | Add JSON-LD Product with offers |
|
||||
| No canonical on product variants | High | Add self-referencing canonical |
|
||||
| Images without alt text | High | Add product name to alt text |
|
||||
| Category depth > 4 levels | Medium | Flatten taxonomy |
|
||||
| Missing breadcrumbs | Medium | Add BreadcrumbList schema and visible nav |
|
||||
| Faceted nav creating duplicates | High | Use canonical or noindex on filtered pages |
|
||||
| Missing Naver verification | Medium | Add naver-site-verification meta tag |
|
||||
| Price not in KRW for Korean market | Medium | Add KRW pricing to schema |
|
||||
|
||||
## Limitations
|
||||
|
||||
- Cannot access logged-in areas (member-only products)
|
||||
- Marketplace search results may vary by region/IP
|
||||
- Large catalogs require sampling (default 50 pages)
|
||||
- Cannot validate JavaScript-rendered product content without headless browser
|
||||
|
||||
## Notion Output (Required)
|
||||
|
||||
All audit reports MUST be saved to OurDigital SEO Audit Log:
|
||||
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
|
||||
- **Properties**: Issue (title), Site (url), Category (E-Commerce SEO), Priority, Found Date, Audit ID
|
||||
- **Language**: Korean with English technical terms
|
||||
- **Audit ID Format**: ECOM-YYYYMMDD-NNN
|
||||
8
custom-skills/24-seo-ecommerce/desktop/skill.yaml
Normal file
8
custom-skills/24-seo-ecommerce/desktop/skill.yaml
Normal file
@@ -0,0 +1,8 @@
|
||||
name: seo-ecommerce
|
||||
description: |
|
||||
E-commerce SEO audit and optimization. Triggers: product SEO, e-commerce audit, product schema, category SEO, Smart Store, marketplace SEO.
|
||||
allowed-tools:
|
||||
- mcp__ahrefs__*
|
||||
- mcp__notion__*
|
||||
- WebSearch
|
||||
- WebFetch
|
||||
15
custom-skills/24-seo-ecommerce/desktop/tools/ahrefs.md
Normal file
15
custom-skills/24-seo-ecommerce/desktop/tools/ahrefs.md
Normal file
@@ -0,0 +1,15 @@
|
||||
# Ahrefs
|
||||
|
||||
> TODO: Document tool usage for this skill
|
||||
|
||||
## Available Commands
|
||||
|
||||
- [ ] List commands
|
||||
|
||||
## Configuration
|
||||
|
||||
- [ ] Add configuration details
|
||||
|
||||
## Examples
|
||||
|
||||
- [ ] Add usage examples
|
||||
15
custom-skills/24-seo-ecommerce/desktop/tools/notion.md
Normal file
15
custom-skills/24-seo-ecommerce/desktop/tools/notion.md
Normal file
@@ -0,0 +1,15 @@
|
||||
# Notion
|
||||
|
||||
> TODO: Document tool usage for this skill
|
||||
|
||||
## Available Commands
|
||||
|
||||
- [ ] List commands
|
||||
|
||||
## Configuration
|
||||
|
||||
- [ ] Add configuration details
|
||||
|
||||
## Examples
|
||||
|
||||
- [ ] Add usage examples
|
||||
15
custom-skills/24-seo-ecommerce/desktop/tools/websearch.md
Normal file
15
custom-skills/24-seo-ecommerce/desktop/tools/websearch.md
Normal file
@@ -0,0 +1,15 @@
|
||||
# WebSearch
|
||||
|
||||
> TODO: Document tool usage for this skill
|
||||
|
||||
## Available Commands
|
||||
|
||||
- [ ] List commands
|
||||
|
||||
## Configuration
|
||||
|
||||
- [ ] Add configuration details
|
||||
|
||||
## Examples
|
||||
|
||||
- [ ] Add usage examples
|
||||
Reference in New Issue
Block a user