directory changes and restructuring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 02:01:41 +09:00
parent eea49f9f8c
commit 236be6c580
598 changed files with 0 additions and 0 deletions


@@ -0,0 +1,89 @@
# CLAUDE.md
## Overview
Notion workspace management toolkit for database organization, schema migration, and bulk operations.
## Quick Start
```bash
pip install -r scripts/requirements.txt
# Schema migration
python scripts/schema_migrator.py --source-db [DB_ID] --target-db [DB_ID] --mapping mapping.json --dry-run
# Async bulk operations
python scripts/async_organizer.py --database-id [DB_ID] --operation audit
```
## Scripts
| Script | Purpose |
|--------|---------|
| `schema_migrator.py` | Migrate data between databases with property mapping |
| `async_organizer.py` | Async bulk operations (audit, bulk status update; template for cleanup/archive) |
## Schema Migrator
```bash
# Dry run (preview changes)
python scripts/schema_migrator.py \
  --source-db abc123 \
  --target-db def456 \
  --mapping mapping.json \
  --dry-run
# Execute migration
python scripts/schema_migrator.py \
  --source-db abc123 \
  --target-db def456 \
  --mapping mapping.json
```
### Mapping File Format
Each key is a source property name; each value configures its target property and optional value remapping (see `mapping_template.json`):
```json
{
  "OldName": {
    "target": "NewName",
    "value_mapping": null
  },
  "Status": {
    "target": "Status",
    "value_mapping": { "Done": "Completed" }
  }
}
```
## Async Organizer
```bash
# Audit database structure
python scripts/async_organizer.py --database-id [ID] --operation audit
# Bulk status update (preview first)
python scripts/async_organizer.py --database-id [ID] --operation status-update \
  --old-status "Pending Review" --new-status "In Review" --dry-run
```
## Rate Limits
| Limit | Value |
|-------|-------|
| Requests/second | 3 max |
| Items per request | 100 max |
| Retry on 429 | Exponential backoff |
## Configuration
Environment variables:
```bash
NOTION_TOKEN=secret_xxx
```
## Notes
- Always use `--dry-run` first for destructive operations
- Large operations (1000+ pages) use async with progress reporting
- Scripts implement automatic rate limiting


@@ -0,0 +1,250 @@
# Notion Organizer Reference
## Notion API Fundamentals
### Base Configuration
- **Base URL**: `https://api.notion.com`
- **Current Version**: `2022-06-28`
- **Authentication**: Bearer token in Authorization header
### Rate Limits
| Limit | Value | Strategy |
|-------|-------|----------|
| Requests/second | 3 (average) | Use throttling/semaphore |
| Burst allowed | Small bursts | Implement exponential backoff |
| Page size | 100 items max | Use pagination cursors |
| Payload size | 500KB max | Split large operations |
### Core Object Hierarchy
```
Workspace
└── Database (container)
└── Page (row)
└── Block (content)
```
### Property Types Reference
| Type | Use Case | Notes |
|------|----------|-------|
| `title` | Page name | Required, one per database |
| `rich_text` | Text content | Max 2,000 chars |
| `number` | Numeric values | Supports format options |
| `select` | Single choice | Define options array |
| `multi_select` | Multiple choices | Define options array |
| `status` | Workflow states | Groups: To-do, In progress, Complete |
| `date` | Dates/times | ISO 8601 format |
| `checkbox` | Boolean | true/false |
| `url` | Links | Max 2,000 chars |
| `email` | Email addresses | Validation applied |
| `phone_number` | Phone | String format |
| `relation` | Links to pages | Requires database_id |
| `rollup` | Aggregated data | Requires relation + function |
| `formula` | Computed values | Expression syntax |
| `files` | Attachments | External URLs or Notion hosted |
| `people` | User references | Notion user IDs |
| `created_time` | Auto timestamp | Read-only |
| `created_by` | Auto user | Read-only |
| `last_edited_time` | Auto timestamp | Read-only |
| `last_edited_by` | Auto user | Read-only |
### Size Limits
| Element | Limit |
|---------|-------|
| Rich text content | 2,000 characters |
| URL length | 2,000 characters |
| Array elements | 100 items |
| Page properties | 100 per page |
| Database properties | 100 per database |
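The rich text limit can be respected by chunking long strings into multiple rich_text items. A sketch with illustrative names; the resulting array must also stay under the 100-item limit:

```python
RICH_TEXT_MAX = 2000  # characters per rich_text item

def clamp_rich_text(text: str, limit: int = RICH_TEXT_MAX) -> list[dict]:
    """Split text into rich_text items that each respect the size limit."""
    return [
        {"text": {"content": text[i:i + limit]}}
        for i in range(0, len(text), limit)
    ] or [{"text": {"content": ""}}]  # empty input still yields one item
```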
### Error Codes
| Code | Status | Action |
|------|--------|--------|
| `rate_limited` | 429 | Wait the number of seconds in the `Retry-After` header, then retry |
| `validation_error` | 400 | Check request body format |
| `object_not_found` | 404 | Verify sharing/permissions |
| `unauthorized` | 401 | Check API token validity |
| `conflict_error` | 409 | Resource was modified, refetch |
| `internal_server_error` | 500 | Retry with backoff |
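A 429 maps to the `Retry-After` header; 5xx errors get exponential backoff; 4xx validation errors are not retried. A hedged sketch: the `.status` and `.headers` attributes on the raised exception are assumptions for illustration (check the exception type your client actually raises):

```python
import asyncio
import random

async def with_backoff(request_fn, max_attempts: int = 5):
    """Retry transient failures, honoring Retry-After when provided.

    request_fn: zero-arg callable returning a fresh coroutine per attempt.
    """
    for attempt in range(max_attempts):
        try:
            return await request_fn()
        except Exception as exc:
            status = getattr(exc, "status", None)
            if status == 429:
                # Server tells us exactly how long to wait
                delay = float(getattr(exc, "headers", {}).get("Retry-After", 1))
            elif status is not None and status >= 500:
                # Exponential backoff with jitter for server errors
                delay = min(2 ** attempt, 10) + random.random()
            else:
                raise  # 400/401/404/409: retrying will not help
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(delay)
```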
---
## Workflow Patterns
### Pattern 1: Database Audit
**Purpose**: Analyze database structure and recommend optimizations
**Steps**:
1. Fetch database schema via MCP or API
2. Analyze property types, naming conventions, usage
3. Identify issues:
- Unused properties
- Inconsistent naming
- Suboptimal property types
- Missing relations
4. Present recommendations with rationale
5. Execute approved changes incrementally
**Example Query**:
```
Audit my "Projects" database:
- Check for unused properties
- Identify naming inconsistencies
- Recommend schema optimizations
```
### Pattern 2: Bulk Reorganization
**Purpose**: Move/update many pages efficiently
**Decision Tree**:
- ≤ 50 operations → Use MCP tools with staged execution
- > 50 operations → Generate Python script
**Steps**:
1. Assess scope (count affected pages)
2. Estimate API calls and time
3. Choose execution method (MCP vs Python)
4. Execute with progress updates
5. Generate summary report
**Example Query**:
```
Move all pages with status "Archived" from "Active Projects"
to "Archive" database, preserving the Project Name and Date properties
```
### Pattern 3: Schema Migration
**Purpose**: Transfer data between databases with different schemas
**Steps**:
1. Fetch source database schema
2. Fetch target database schema
3. Create property mapping plan:
- Direct mappings (same type)
- Transformations needed (type conversion)
- Unmappable properties (manual handling)
4. Validate compatibility
5. Execute migration:
- MCP for small datasets
- Python for large datasets
6. Verify data integrity
**Property Mapping Template**:
```
Source Property → Target Property (Transformation)
─────────────────────────────────────────────────
Name (title) → Project Name (title) [Direct]
Status (select) → Stage (status) [Map values]
Due Date (date) → Deadline (date) [Direct]
Tags (multi) → Categories (multi) [Merge options]
Notes (text) → Description (text) [Direct]
Owner (text) → Assignee (people) [Manual]
```
### Pattern 4: Property Cleanup
**Purpose**: Standardize properties across databases
**Common Tasks**:
- Rename properties to consistent convention (camelCase, snake_case, Title Case)
- Consolidate duplicate select/multi-select options
- Remove unused properties
- Add missing required properties
**Naming Convention Guide**:
```
Recommended: Title Case with spaces
Examples: "Project Name", "Due Date", "Status", "Assigned To"
Alternative: camelCase (for technical databases)
Examples: "projectName", "dueDate", "status", "assignedTo"
```
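Renaming to a consistent convention can be automated with a small normalizer. A sketch with hypothetical helper names, not part of the scripts above:

```python
import re

def to_title_case(name: str) -> str:
    """'due_date' / 'dueDate' / 'Due  Date' -> 'Due Date'."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)  # split camelCase
    words = re.split(r"[_\s]+", spaced.strip())
    return " ".join(w.capitalize() for w in words if w)

def to_camel_case(name: str) -> str:
    """'Due Date' / 'due_date' -> 'dueDate'."""
    words = to_title_case(name).split(" ")
    return words[0].lower() + "".join(words[1:])
```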
### Pattern 5: Duplicate Detection
**Purpose**: Find and handle duplicate or similar content
**Detection Strategies**:
1. Exact title match
2. Fuzzy title similarity (Levenshtein distance)
3. Property combination match (e.g., same name + date)
4. Content hash comparison
**Resolution Options**:
- Merge: Combine properties from duplicates
- Archive: Move older duplicate to archive
- Delete: Remove with user confirmation
- Link: Create relation between related items
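Fuzzy title similarity can be approximated with the standard library's `difflib` as a stand-in for Levenshtein distance (a sketch; `rapidfuzz` would be faster on large databases):

```python
from difflib import SequenceMatcher

def similar_titles(
    titles: list[str], threshold: float = 0.85
) -> list[tuple[str, str, float]]:
    """Return title pairs whose similarity ratio meets the threshold.

    O(n^2) comparisons - fine for a few hundred pages, slow beyond that.
    """
    pairs = []
    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                pairs.append((a, b, round(ratio, 2)))
    return pairs
```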
---
## MCP Tool Usage Examples
### Search for Pages
```
Use mcp__notion__search to find:
- Query: "marketing campaign"
- Filter: database_id = "abc123"
```
### Query Database with Filters
```
Use mcp__notion__query-database:
- Database ID: "abc123"
- Filter: { "property": "Status", "select": { "equals": "Active" } }
- Sorts: [{ "property": "Created", "direction": "descending" }]
```
### Update Page Properties
```
Use mcp__notion__update-page:
- Page ID: "xyz789"
- Properties: {
"Status": { "select": { "name": "Completed" } },
"Completed Date": { "date": { "start": "2025-12-05" } }
}
```
### Create New Page
```
Use mcp__notion__create-page:
- Parent: { "database_id": "abc123" }
- Properties: {
"Name": { "title": [{ "text": { "content": "New Project" } }] },
"Status": { "select": { "name": "Planning" } }
}
```
---
## Best Practices
### 1. Always Fetch Before Modify
Never assume database structure. Always retrieve current schema first.
### 2. Batch Operations Wisely
- Group related updates
- Use pagination for queries
- Implement checkpoints for large operations
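A checkpoint for a large operation can be as simple as a JSON file of processed page IDs, so an interrupted run resumes where it stopped. A sketch; the `checkpoint.json` filename is an assumption:

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")

def load_done() -> set[str]:
    """Page IDs already processed in a previous run."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def mark_done(done: set[str], page_id: str) -> None:
    """Record one processed page; rewrite the checkpoint file."""
    done.add(page_id)
    CHECKPOINT.write_text(json.dumps(sorted(done)))
```

In the processing loop, skip any page whose ID is in `load_done()` and call `mark_done` after each successful update.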
### 3. Handle Relations Carefully
- Relations require both databases to be accessible
- Synced databases need special handling
- Rollups depend on relations - update order matters
### 4. Preserve Data Integrity
- Back up critical data before major changes
- Use transactions where possible
- Verify changes after execution
### 5. Respect User Permissions
- Check integration has access to target resources
- Request additional permissions when needed
- Document permission requirements


@@ -0,0 +1,331 @@
"""
Notion Async Organizer - Base Template
======================================
Purpose: Rate-limited async operations for Notion API
Python: 3.10+
Packages: notion-client, tenacity, tqdm, python-dotenv
Usage:
python async_organizer.py --database-id <id> [--dry-run]
"""
import asyncio
import argparse
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any
from dotenv import load_dotenv
from notion_client import AsyncClient
from tenacity import (
retry,
stop_after_attempt,
wait_exponential,
retry_if_exception_type,
)
from tqdm.asyncio import tqdm
# Load environment variables
load_dotenv()
# Configuration
NOTION_API_KEY = os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
MAX_CONCURRENT_REQUESTS = 3
REQUEST_DELAY = 0.35 # ~3 requests/second
# Logging setup
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
handlers=[
logging.StreamHandler(),
logging.FileHandler(f"notion_organizer_{datetime.now():%Y%m%d_%H%M%S}.log"),
],
)
logger = logging.getLogger(__name__)
class NotionAsyncOrganizer:
"""Async Notion operations with rate limiting and retry logic."""
def __init__(self, api_key: str, dry_run: bool = False):
self.client = AsyncClient(auth=api_key)
self.semaphore = Semaphore(MAX_CONCURRENT_REQUESTS)
self.dry_run = dry_run
self.stats = {"fetched": 0, "updated": 0, "created": 0, "errors": 0}
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10),
retry=retry_if_exception_type(Exception),
)
    async def _rate_limited_request(self, request):
        """Execute a request with rate limiting and retry.

        Accepts an awaitable or a zero-argument callable returning one.
        Pass a callable when retries matter: a bare coroutine can only
        be awaited once, so the retry decorator cannot re-run it.
        """
        async with self.semaphore:
            await asyncio.sleep(REQUEST_DELAY)
            return await (request() if callable(request) else request)
async def fetch_database_schema(self, database_id: str) -> dict:
"""Fetch database schema/properties."""
logger.info(f"Fetching database schema: {database_id}")
response = await self._rate_limited_request(
self.client.databases.retrieve(database_id=database_id)
)
self.stats["fetched"] += 1
return response
async def fetch_all_pages(
self,
database_id: str,
filter_obj: dict | None = None,
sorts: list | None = None,
) -> list[dict]:
"""Fetch all pages from a database with pagination."""
all_pages = []
has_more = True
start_cursor = None
logger.info(f"Fetching pages from database: {database_id}")
while has_more:
query_params = {
"database_id": database_id,
"page_size": 100,
}
if start_cursor:
query_params["start_cursor"] = start_cursor
if filter_obj:
query_params["filter"] = filter_obj
if sorts:
query_params["sorts"] = sorts
response = await self._rate_limited_request(
self.client.databases.query(**query_params)
)
all_pages.extend(response["results"])
has_more = response.get("has_more", False)
start_cursor = response.get("next_cursor")
self.stats["fetched"] += len(response["results"])
logger.info(f"Fetched {len(all_pages)} pages so far...")
return all_pages
async def update_page(self, page_id: str, properties: dict) -> dict | None:
"""Update a single page's properties."""
if self.dry_run:
logger.info(f"[DRY-RUN] Would update page {page_id}: {properties}")
return None
try:
result = await self._rate_limited_request(
self.client.pages.update(page_id=page_id, properties=properties)
)
self.stats["updated"] += 1
return result
except Exception as e:
self.stats["errors"] += 1
logger.error(f"Failed to update page {page_id}: {e}")
raise
async def batch_update_pages(
self, updates: list[dict], desc: str = "Updating pages"
) -> list[dict]:
"""Update multiple pages concurrently with progress bar."""
results = []
async def update_single(update: dict) -> dict:
try:
result = await self.update_page(
update["page_id"], update["properties"]
)
return {"page_id": update["page_id"], "success": True, "result": result}
except Exception as e:
return {"page_id": update["page_id"], "success": False, "error": str(e)}
tasks = [update_single(u) for u in updates]
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
result = await coro
results.append(result)
success_count = sum(1 for r in results if r["success"])
logger.info(f"Batch update complete: {success_count}/{len(updates)} succeeded")
return results
async def create_page(
self, parent: dict, properties: dict, children: list | None = None
) -> dict | None:
"""Create a new page."""
if self.dry_run:
logger.info(f"[DRY-RUN] Would create page: {properties}")
return None
try:
create_params = {"parent": parent, "properties": properties}
if children:
create_params["children"] = children
result = await self._rate_limited_request(
self.client.pages.create(**create_params)
)
self.stats["created"] += 1
return result
except Exception as e:
self.stats["errors"] += 1
logger.error(f"Failed to create page: {e}")
raise
async def search(
self, query: str, filter_type: str | None = None
) -> list[dict]:
"""Search Notion workspace."""
all_results = []
has_more = True
start_cursor = None
while has_more:
search_params = {"query": query, "page_size": 100}
if start_cursor:
search_params["start_cursor"] = start_cursor
if filter_type:
search_params["filter"] = {"property": "object", "value": filter_type}
response = await self._rate_limited_request(
self.client.search(**search_params)
)
all_results.extend(response["results"])
has_more = response.get("has_more", False)
start_cursor = response.get("next_cursor")
return all_results
def print_stats(self):
"""Print operation statistics."""
logger.info("=" * 50)
logger.info("Operation Statistics:")
logger.info(f" Fetched: {self.stats['fetched']}")
logger.info(f" Updated: {self.stats['updated']}")
logger.info(f" Created: {self.stats['created']}")
logger.info(f" Errors: {self.stats['errors']}")
logger.info("=" * 50)
async def close(self):
"""Close the client connection."""
await self.client.aclose()
# ============================================================
# Example Operations - Customize these for your specific task
# ============================================================
async def example_audit_database(organizer: NotionAsyncOrganizer, database_id: str):
"""Example: Audit a database and report on its structure."""
schema = await organizer.fetch_database_schema(database_id)
print("\n" + "=" * 60)
    print(f"Database: {(schema.get('title') or [{}])[0].get('plain_text', 'Untitled')}")
print("=" * 60)
properties = schema.get("properties", {})
print(f"\nTotal Properties: {len(properties)}\n")
for name, prop in properties.items():
prop_type = prop.get("type", "unknown")
print(f" - {name}: {prop_type}")
# Show select/multi_select options
if prop_type in ("select", "multi_select"):
options = prop.get(prop_type, {}).get("options", [])
if options:
option_names = [o["name"] for o in options[:5]]
suffix = f" (+{len(options) - 5} more)" if len(options) > 5 else ""
print(f" Options: {', '.join(option_names)}{suffix}")
return schema
async def example_bulk_status_update(
organizer: NotionAsyncOrganizer,
database_id: str,
old_status: str,
new_status: str,
):
"""Example: Update status for all pages matching a filter."""
# Fetch pages with old status
filter_obj = {"property": "Status", "select": {"equals": old_status}}
pages = await organizer.fetch_all_pages(database_id, filter_obj=filter_obj)
print(f"\nFound {len(pages)} pages with status '{old_status}'")
if not pages:
return
# Prepare updates
updates = [
{
"page_id": page["id"],
"properties": {"Status": {"select": {"name": new_status}}},
}
for page in pages
]
# Execute batch update
results = await organizer.batch_update_pages(
updates, desc=f"Updating status to '{new_status}'"
)
return results
async def main():
"""Main entry point."""
parser = argparse.ArgumentParser(description="Notion Async Organizer")
parser.add_argument("--database-id", "-d", required=True, help="Database ID")
parser.add_argument(
"--dry-run", action="store_true", help="Preview changes without executing"
)
parser.add_argument(
"--operation",
"-o",
choices=["audit", "status-update"],
default="audit",
help="Operation to perform",
)
parser.add_argument("--old-status", help="Old status value (for status-update)")
parser.add_argument("--new-status", help="New status value (for status-update)")
args = parser.parse_args()
if not NOTION_API_KEY:
logger.error("NOTION_TOKEN or NOTION_API_KEY environment variable not set")
return
organizer = NotionAsyncOrganizer(NOTION_API_KEY, dry_run=args.dry_run)
try:
if args.operation == "audit":
await example_audit_database(organizer, args.database_id)
elif args.operation == "status-update":
if not args.old_status or not args.new_status:
logger.error("--old-status and --new-status required for status-update")
return
await example_bulk_status_update(
organizer, args.database_id, args.old_status, args.new_status
)
organizer.print_stats()
finally:
await organizer.close()
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,26 @@
# Notion Organizer Scripts - Requirements
# Python 3.10+ required
# Notion API client
notion-client==2.2.1
# Async HTTP (optional: notion-client ships its own httpx-based transport;
# the bundled templates do not import aiohttp directly)
# aiohttp==3.9.1
# Rate limiting (optional: templates use asyncio.Semaphore + delay instead)
# asyncio-throttle==1.0.2
# Environment variables
python-dotenv==1.0.0
# Retry logic
tenacity==8.2.3
# Progress bars
tqdm==4.66.1
# Optional: Data analysis
# pandas==2.1.4
# Optional: Fuzzy matching for duplicates
# rapidfuzz==3.5.2


@@ -0,0 +1,367 @@
"""
Notion Schema Migrator
======================
Purpose: Migrate data between Notion databases with schema mapping
Python: 3.10+
Packages: notion-client, tenacity, tqdm, python-dotenv
Usage:
python schema_migrator.py \
--source-db <source_database_id> \
--target-db <target_database_id> \
--mapping mapping.json \
[--dry-run]
"""
import asyncio
import argparse
import json
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any
from dotenv import load_dotenv
from notion_client import AsyncClient
from tenacity import retry, stop_after_attempt, wait_exponential
from tqdm.asyncio import tqdm
load_dotenv()
NOTION_API_KEY = os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
MAX_CONCURRENT_REQUESTS = 3
REQUEST_DELAY = 0.35
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
class SchemaMigrator:
"""Migrate data between Notion databases with property mapping."""
def __init__(self, api_key: str, dry_run: bool = False):
self.client = AsyncClient(auth=api_key)
self.semaphore = Semaphore(MAX_CONCURRENT_REQUESTS)
self.dry_run = dry_run
self.stats = {
"pages_fetched": 0,
"pages_migrated": 0,
"pages_skipped": 0,
"errors": 0,
}
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10),
)
    async def _request(self, request):
        """Rate-limited request; pass a zero-arg callable so retries can
        re-create the coroutine (a bare coroutine is awaitable only once)."""
        async with self.semaphore:
            await asyncio.sleep(REQUEST_DELAY)
            return await (request() if callable(request) else request)
async def get_schema(self, database_id: str) -> dict:
"""Get database schema."""
return await self._request(
self.client.databases.retrieve(database_id=database_id)
)
async def fetch_all_pages(self, database_id: str) -> list[dict]:
"""Fetch all pages from source database."""
pages = []
has_more = True
cursor = None
while has_more:
params = {"database_id": database_id, "page_size": 100}
if cursor:
params["start_cursor"] = cursor
response = await self._request(self.client.databases.query(**params))
pages.extend(response["results"])
has_more = response.get("has_more", False)
cursor = response.get("next_cursor")
self.stats["pages_fetched"] = len(pages)
logger.info(f"Fetched {len(pages)} pages...")
return pages
def transform_property(
self,
value: dict,
source_type: str,
target_type: str,
value_mapping: dict | None = None,
) -> dict | None:
"""Transform a property value from source to target type."""
# Extract raw value based on source type
raw_value = None
if source_type == "title":
raw_value = "".join(
t.get("plain_text", "") for t in value.get("title", [])
)
elif source_type == "rich_text":
raw_value = "".join(
t.get("plain_text", "") for t in value.get("rich_text", [])
)
elif source_type == "number":
raw_value = value.get("number")
elif source_type == "select":
select_val = value.get("select")
raw_value = select_val.get("name") if select_val else None
elif source_type == "multi_select":
raw_value = [o.get("name") for o in value.get("multi_select", [])]
elif source_type == "status":
status_val = value.get("status")
raw_value = status_val.get("name") if status_val else None
elif source_type == "date":
raw_value = value.get("date")
elif source_type == "checkbox":
raw_value = value.get("checkbox")
elif source_type == "url":
raw_value = value.get("url")
elif source_type == "email":
raw_value = value.get("email")
elif source_type == "phone_number":
raw_value = value.get("phone_number")
if raw_value is None:
return None
# Apply value mapping if provided
if value_mapping and isinstance(raw_value, str):
raw_value = value_mapping.get(raw_value, raw_value)
elif value_mapping and isinstance(raw_value, list):
raw_value = [value_mapping.get(v, v) for v in raw_value]
# Convert to target type
if target_type == "title":
return {"title": [{"text": {"content": str(raw_value)}}]}
elif target_type == "rich_text":
return {"rich_text": [{"text": {"content": str(raw_value)}}]}
        elif target_type == "number":
            try:
                # raw_value is never None here; plain float() keeps 0 instead of dropping it
                return {"number": float(raw_value)}
            except (ValueError, TypeError):
                return None
elif target_type == "select":
return {"select": {"name": str(raw_value)}} if raw_value else None
elif target_type == "multi_select":
if isinstance(raw_value, list):
return {"multi_select": [{"name": v} for v in raw_value]}
return {"multi_select": [{"name": str(raw_value)}]}
elif target_type == "status":
return {"status": {"name": str(raw_value)}} if raw_value else None
elif target_type == "date":
return {"date": raw_value} if raw_value else None
elif target_type == "checkbox":
return {"checkbox": bool(raw_value)}
elif target_type == "url":
return {"url": str(raw_value)} if raw_value else None
elif target_type == "email":
return {"email": str(raw_value)} if raw_value else None
elif target_type == "phone_number":
return {"phone_number": str(raw_value)} if raw_value else None
return None
def map_page_properties(
self,
source_page: dict,
mapping: dict,
source_schema: dict,
target_schema: dict,
) -> dict:
"""Map source page properties to target schema."""
source_props = source_page.get("properties", {})
target_props = {}
for source_name, mapping_config in mapping.items():
if source_name not in source_props:
continue
target_name = mapping_config.get("target", source_name)
value_mapping = mapping_config.get("value_mapping")
source_type = source_schema["properties"].get(source_name, {}).get("type")
target_type = target_schema["properties"].get(target_name, {}).get("type")
if not source_type or not target_type:
logger.warning(
f"Skipping {source_name}: source_type={source_type}, target_type={target_type}"
)
continue
transformed = self.transform_property(
source_props[source_name],
source_type,
target_type,
value_mapping,
)
if transformed:
target_props[target_name] = transformed
return target_props
async def migrate_page(
self,
page: dict,
target_database_id: str,
mapping: dict,
source_schema: dict,
target_schema: dict,
) -> dict:
"""Migrate a single page to target database."""
page_id = page["id"]
try:
properties = self.map_page_properties(
page, mapping, source_schema, target_schema
)
if not properties:
self.stats["pages_skipped"] += 1
return {"page_id": page_id, "success": False, "reason": "no_properties"}
if self.dry_run:
logger.debug(f"[DRY-RUN] Would create: {properties}")
return {"page_id": page_id, "success": True, "dry_run": True}
result = await self._request(
self.client.pages.create(
parent={"database_id": target_database_id},
properties=properties,
)
)
self.stats["pages_migrated"] += 1
return {"page_id": page_id, "success": True, "new_page_id": result["id"]}
except Exception as e:
self.stats["errors"] += 1
logger.error(f"Failed to migrate page {page_id}: {e}")
return {"page_id": page_id, "success": False, "error": str(e)}
async def migrate(
self,
source_db: str,
target_db: str,
mapping: dict,
) -> list[dict]:
"""Execute full migration."""
logger.info("Fetching schemas...")
source_schema = await self.get_schema(source_db)
target_schema = await self.get_schema(target_db)
logger.info(f"Source: {len(source_schema['properties'])} properties")
logger.info(f"Target: {len(target_schema['properties'])} properties")
logger.info("Fetching source pages...")
pages = await self.fetch_all_pages(source_db)
logger.info(f"Found {len(pages)} pages to migrate")
results = []
for page in tqdm(pages, desc="Migrating"):
result = await self.migrate_page(
page, target_db, mapping, source_schema, target_schema
)
results.append(result)
return results
def print_stats(self):
logger.info("=" * 50)
logger.info("Migration Statistics:")
logger.info(f" Pages Fetched: {self.stats['pages_fetched']}")
logger.info(f" Pages Migrated: {self.stats['pages_migrated']}")
logger.info(f" Pages Skipped: {self.stats['pages_skipped']}")
logger.info(f" Errors: {self.stats['errors']}")
logger.info("=" * 50)
async def close(self):
await self.client.aclose()
def generate_mapping_template(source_schema: dict, target_schema: dict) -> dict:
"""Generate a mapping template for user to customize."""
mapping = {}
for prop_name, prop_config in source_schema.get("properties", {}).items():
source_type = prop_config.get("type")
# Try to find matching property in target
target_match = None
for t_name, t_config in target_schema.get("properties", {}).items():
if t_name.lower() == prop_name.lower():
target_match = t_name
break
mapping[prop_name] = {
"target": target_match or prop_name,
"source_type": source_type,
"value_mapping": None, # User can add {"old_value": "new_value"}
}
return mapping
async def main():
parser = argparse.ArgumentParser(description="Notion Schema Migrator")
parser.add_argument("--source-db", "-s", required=True, help="Source database ID")
parser.add_argument("--target-db", "-t", required=True, help="Target database ID")
parser.add_argument("--mapping", "-m", help="JSON mapping file path")
parser.add_argument(
"--generate-mapping",
action="store_true",
help="Generate mapping template",
)
parser.add_argument("--dry-run", action="store_true", help="Preview without executing")
parser.add_argument("--output", "-o", help="Output file for generated mapping")
args = parser.parse_args()
if not NOTION_API_KEY:
logger.error("NOTION_TOKEN or NOTION_API_KEY not set")
return
migrator = SchemaMigrator(NOTION_API_KEY, dry_run=args.dry_run)
try:
if args.generate_mapping:
source_schema = await migrator.get_schema(args.source_db)
target_schema = await migrator.get_schema(args.target_db)
mapping = generate_mapping_template(source_schema, target_schema)
output_file = args.output or "mapping_template.json"
with open(output_file, "w") as f:
json.dump(mapping, f, indent=2)
logger.info(f"Mapping template saved to {output_file}")
return
if not args.mapping:
logger.error("--mapping required for migration (or use --generate-mapping)")
return
with open(args.mapping) as f:
mapping = json.load(f)
results = await migrator.migrate(args.source_db, args.target_db, mapping)
migrator.print_stats()
# Save results
output_file = f"migration_results_{datetime.now():%Y%m%d_%H%M%S}.json"
with open(output_file, "w") as f:
json.dump(results, f, indent=2)
logger.info(f"Results saved to {output_file}")
finally:
await migrator.close()
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,56 @@
{
"_comment": "Property Mapping Template - Customize for your migration",
"_instructions": {
"target": "Name of property in target database",
"source_type": "Auto-detected, for reference only",
"value_mapping": "Optional: Map old values to new values"
},
"Name": {
"target": "Project Name",
"source_type": "title",
"value_mapping": null
},
"Status": {
"target": "Stage",
"source_type": "select",
"value_mapping": {
"Not Started": "Backlog",
"In Progress": "Active",
"Done": "Completed",
"On Hold": "Paused"
}
},
"Priority": {
"target": "Priority",
"source_type": "select",
"value_mapping": {
"P1": "High",
"P2": "Medium",
"P3": "Low"
}
},
"Due Date": {
"target": "Deadline",
"source_type": "date",
"value_mapping": null
},
"Tags": {
"target": "Categories",
"source_type": "multi_select",
"value_mapping": {
"marketing": "Marketing",
"dev": "Development",
"design": "Design"
}
},
"Description": {
"target": "Notes",
"source_type": "rich_text",
"value_mapping": null
},
"Completed": {
"target": "Is Done",
"source_type": "checkbox",
"value_mapping": null
}
}


@@ -0,0 +1,95 @@
---
name: notion-organizer
version: 1.0.0
description: Notion workspace management agent for organizing, restructuring, consolidating, and maintaining databases and pages. Use when user asks to search Notion, organize databases, clean up properties, migrate data, merge databases, audit schemas, or manage Notion content. Activates for keywords like Notion, database, knowledge base, wiki, workspace organization.
allowed-tools: mcp__notion__*, Read, Write, Edit, Bash(python:*), Bash(pip:*)
---
# Notion Organizer Skill
## Purpose
Specialized Notion workspace management capability for:
- Database schema analysis and optimization
- Property standardization and cleanup
- Content restructuring and hierarchy optimization
- Database merging and migration
- Bulk operations with rate-limit compliance
## Execution Strategy: Three-Tier Approach
Always follow this priority order:
### Tier 1: Notion MCP Tools (Primary)
Use built-in MCP tools first. Available tools:
| Tool | Purpose |
|------|---------|
| `mcp__notion__search` | Find pages/databases by keyword |
| `mcp__notion__get-page` | Retrieve page content |
| `mcp__notion__get-database` | Retrieve database schema |
| `mcp__notion__create-page` | Create new pages |
| `mcp__notion__update-page` | Modify page properties |
| `mcp__notion__query-database` | Query database with filters |
### Tier 2: Alternative Approaches (Fallback)
If the MCP tools are insufficient:
- Export/import via filesystem (user action required)
- Memory tools for tracking state across sessions
- Sequential thinking for complex planning
### Tier 3: Python Scripts (Advanced)
For bulk operations (50+ items):
- Generate async Python scripts
- Include rate limiting (3 req/sec max)
- Provide requirements.txt
- Always include dry-run option
See `scripts/` directory for templates.
## Operational Guidelines
### Before Any Modification
1. **Fetch first**: Always examine current structure before changes
2. **Confirm destructive actions**: Get user approval for deletes/major restructures
3. **Estimate impact**: For large operations, provide time/API call estimates
4. **Backup reminder**: Remind about Notion version history
### Rate Limits (Critical)
- Maximum: 3 requests/second average
- Use pagination (100 items max per request)
- Implement exponential backoff on 429 errors
### Communication
- Korean for explanations (한국어로 설명)
- English for code and technical terms
- Structured before/after summaries
## Quick Commands
### Database Audit
"Analyze [database name] structure and recommend optimizations"
### Property Cleanup
"Standardize property names in [database] to [convention]"
### Bulk Move
"Move all pages tagged [X] from [source] to [target]"
### Schema Migration
"Migrate data from [source database] to [target database]"
## Workflow Patterns
See `reference.md` for detailed workflow documentation.
See `scripts/` for Python templates.
## Limitations
- Cannot access unshared databases/pages
- Cannot modify workspace settings
- Cannot recover permanently deleted content
- Large operations (1000+ pages) require Python scripts


@@ -0,0 +1,299 @@
# Notion Organizer Examples
## Quick Reference: Common Tasks
### 1. Database Audit
**Request:**
```
Audit my "Client Projects" database and recommend optimizations
```
**Expected Actions:**
1. Fetch database schema using MCP
2. Analyze properties: types, naming, usage
3. Report findings in Korean
4. Recommend specific improvements
**Sample Output:**
```
## Database Audit: Client Projects
### 현재 구조 (Current Structure)
- Properties: 15개
- 페이지 수: 234개
### 발견된 문제점 (Issues Found)
1. **중복 속성**: "담당자"와 "Assignee" 동일한 용도
2. **미사용 속성**: "Legacy ID" - 모든 값이 비어있음
3. **명명 불일치**: "due_date" vs "Due Date" vs "마감일"
### 권장 조치 (Recommendations)
1. "담당자"와 "Assignee" 통합 → "담당자"로 유지
2. "Legacy ID" 속성 삭제
3. 명명 규칙 통일: Title Case 권장
```
---
### 2. Bulk Status Update
**Request:**
```
Change all pages with status "Pending Review" to "In Review"
in the Tasks database
```
**Tier 1 Approach (MCP):**
```
# Claude uses MCP tools:
1. mcp__notion__query-database with filter:
{"property": "Status", "select": {"equals": "Pending Review"}}
2. For each page, mcp__notion__update-page:
{"Status": {"select": {"name": "In Review"}}}
```
**Tier 3 Approach (Python - for 50+ pages):**
```bash
python scripts/async_organizer.py \
--database-id abc123 \
--operation status-update \
--old-status "Pending Review" \
--new-status "In Review" \
--dry-run # Test first!
```
---
### 3. Schema Migration
**Request:**
```
Migrate data from "Old Projects" to "New Projects" database.
Map Status→Stage, Due Date→Deadline, Tags→Categories
```
**Step 1: Generate Mapping Template**
```bash
python scripts/schema_migrator.py \
--source-db old_projects_id \
--target-db new_projects_id \
--generate-mapping \
--output my_mapping.json
```
**Step 2: Customize Mapping**
Edit `my_mapping.json`:
```json
{
"Status": {
"target": "Stage",
"value_mapping": {
"Todo": "Backlog",
"Doing": "In Progress",
"Done": "Complete"
}
}
}
```
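Conceptually, the migrator applies this file per page: rename each property to its `target` and translate values through `value_mapping`, falling back to the original value. A simplified sketch with plain string values (`apply_mapping` is illustrative, not the script's actual API):

```python
def apply_mapping(source_props: dict, mapping: dict) -> dict:
    """Rename properties and translate values according to a mapping file."""
    target_props = {}
    for name, value in source_props.items():
        rule = mapping.get(name)
        if rule is None:
            continue  # unmapped properties are dropped
        # Unlisted values pass through unchanged
        target_props[rule["target"]] = rule.get("value_mapping", {}).get(value, value)
    return target_props
```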
**Step 3: Execute Migration**
```bash
# Dry run first
python scripts/schema_migrator.py \
--source-db old_projects_id \
--target-db new_projects_id \
--mapping my_mapping.json \
--dry-run
# Execute
python scripts/schema_migrator.py \
--source-db old_projects_id \
--target-db new_projects_id \
--mapping my_mapping.json
```
---
### 4. Property Cleanup
**Request:**
```
Standardize all property names in "Marketing Campaigns"
to Title Case with spaces
```
**Before:**
```
- campaign_name → Campaign Name
- startDate → Start Date
- end-date → End Date
- STATUS → Status
- assigned_to → Assigned To
```
**MCP Approach:**
```
Use mcp__notion__update-database to rename properties:
{
"properties": {
"campaign_name": { "name": "Campaign Name" },
"startDate": { "name": "Start Date" },
"end-date": { "name": "End Date" },
"STATUS": { "name": "Status" }
}
}
```
---
### 5. Duplicate Detection
**Request:**
```
Find duplicate entries in "Contacts" database based on email
```
**Python Script Approach:**
```python
# Duplicate-detection sketch. Assumes the official notion-client SDK
# (`pip install notion-client`) and an email-type property named "Email";
# adjust the property access for your schema.
import os

from notion_client import Client

client = Client(auth=os.environ["NOTION_TOKEN"])

def fetch_all_pages(database_id: str) -> list[dict]:
    """Fetch every page in a database, following pagination cursors."""
    pages, cursor = [], None
    while True:
        kwargs = {"database_id": database_id}
        if cursor:
            kwargs["start_cursor"] = cursor
        resp = client.databases.query(**kwargs)
        pages.extend(resp["results"])
        if not resp.get("has_more"):
            return pages
        cursor = resp["next_cursor"]

def find_duplicates(pages: list[dict]) -> dict[str, list[dict]]:
    """Group pages by email; keep only emails that appear more than once."""
    groups: dict[str, list[dict]] = {}
    for page in pages:
        email = page["properties"].get("Email", {}).get("email")
        if email:
            groups.setdefault(email, []).append(page)
    return {email: group for email, group in groups.items() if len(group) > 1}

# Report
for email, dup_pages in find_duplicates(fetch_all_pages("DATABASE_ID")).items():
    print(f"Duplicate: {email}")
    for p in dup_pages:
        print(f"  - {p['id']} (created: {p['created_time']})")
```
---
### 6. Archive Old Content
**Request:**
```
Move all tasks completed more than 90 days ago to Archive database
```
**Filter:**
```json
{
"and": [
{
"property": "Status",
"status": { "equals": "Complete" }
},
{
"property": "Completed Date",
"date": {
"before": "2025-09-07"
}
}
]
}
```
**Process:**
1. Query with filter
2. For each page:
- Create copy in Archive database
- Update original with "Archived" status or delete
3. Report summary
---
### 7. Relation Audit
**Request:**
```
Find all pages in "Tasks" that have broken relations to "Projects"
```
**Approach:**
1. Fetch all Tasks pages
2. For each task, check Project relation
3. Verify referenced Project page exists
4. Report broken relations
**Sample Output:**
```
## Relation Audit: Tasks → Projects
총 작업: 150개
정상 연결: 142개
끊어진 연결: 8개
### 끊어진 연결 목록:
1. "Website Redesign Phase 2" → Project not found
2. "Q3 Marketing Review" → Project deleted
...
### 권장 조치:
- 삭제된 프로젝트 복원 또는
- 해당 작업들을 다른 프로젝트에 재할당
```
---
## Environment Setup
### Quick Start
```bash
# Navigate to scripts directory
cd ~/.claude/skills/notion-organizer/scripts
# Create virtual environment
python -m venv venv
source venv/bin/activate # macOS/Linux
# Install dependencies
pip install -r requirements.txt
# Set environment variable
export NOTION_TOKEN="your_token_here"
# Or create .env file with NOTION_TOKEN=your_token
```
### Verify Setup
```bash
# Test with audit (read-only)
python async_organizer.py --database-id YOUR_DB_ID --operation audit
```
---
## Troubleshooting
### Rate Limit Errors (429)
- Scripts automatically retry with exponential backoff
- If persistent, reduce `MAX_CONCURRENT_REQUESTS` to 2
### Permission Errors (404)
- Ensure database is shared with your integration
- Check integration has correct capabilities
### Property Type Mismatch
- Use `--generate-mapping` to see current types
- Some conversions require manual handling (e.g., people → text)
### Large Databases (1000+ pages)
- Always use Python scripts, not MCP
- Consider running in batches with checkpoints
- Monitor API usage in Notion settings


@@ -0,0 +1,127 @@
# CLAUDE.md
## Overview
Technical SEO auditor for crawlability fundamentals: robots.txt validation, XML sitemap analysis, and URL accessibility checking.
## Quick Start
```bash
# Install dependencies
pip install -r scripts/requirements.txt
# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com
# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml
# Async URL crawl (check sitemap URLs accessibility)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
```
## Scripts
| Script | Purpose | Key Output |
|--------|---------|------------|
| `robots_checker.py` | Parse and validate robots.txt | User-agent rules, disallow patterns, sitemap declarations |
| `sitemap_validator.py` | Validate XML sitemap structure | URL count, lastmod dates, size limits, syntax errors |
| `sitemap_crawler.py` | Async check URL accessibility | HTTP status codes, response times, broken links |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |
## Robots.txt Checker
```bash
# Basic analysis
python scripts/robots_checker.py --url https://example.com
# Test specific URL against rules
python scripts/robots_checker.py --url https://example.com --test-url /admin/
# Output JSON
python scripts/robots_checker.py --url https://example.com --json
```
**Checks performed**:
- Syntax validation
- User-agent rule parsing
- Disallow/Allow pattern analysis
- Sitemap declarations
- Critical resource access (CSS/JS/images)
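The rule testing behind `--test-url` can be reproduced with the standard library's `RobotFileParser`, shown here on inline rules rather than a fetched file. Note one caveat: the stdlib parser is first-match, unlike Google's longest-match semantics, so `Allow` lines must precede the broader `Disallow` they carve out:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/admin/"))         # blocked
print(rp.can_fetch("*", "https://example.com/admin/public/"))  # carved out by Allow
print(rp.can_fetch("Googlebot", "https://example.com/blog/"))  # no rule -> allowed
print(rp.site_maps())                                          # declared sitemaps
```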
## Sitemap Validator
```bash
# Validate sitemap
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml
# Include sitemap index parsing
python scripts/sitemap_validator.py --url https://example.com/sitemap_index.xml --follow-index
```
**Validation rules**:
- XML syntax correctness
- URL count limit (50,000 max per sitemap)
- File size limit (50MB max uncompressed)
- Lastmod date format validation
- Sitemap index structure
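The core structural checks reduce to a few lines with the standard library's XML parser; the sitemap namespace is the usual stumbling block. A minimal sketch (`validate_sitemap` is illustrative, not the validator's actual API):

```python
import xml.etree.ElementTree as ET
from datetime import date

# Every element in a sitemap lives in this namespace
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def validate_sitemap(xml_text: str, url_limit: int = 50_000) -> dict:
    """Return URL count, lastmod coverage, and limit violations."""
    root = ET.fromstring(xml_text)
    urls = root.findall("sm:url", NS)
    lastmods = [u.findtext("sm:lastmod", default="", namespaces=NS) for u in urls]
    issues = []
    if len(urls) > url_limit:
        issues.append(f"exceeds {url_limit} URL limit")
    for lm in lastmods:
        if lm:
            date.fromisoformat(lm[:10])  # raises ValueError on malformed dates
    return {
        "url_count": len(urls),
        "with_lastmod": sum(1 for lm in lastmods if lm),
        "issues": issues,
    }

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""
print(validate_sitemap(sample))
```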
## Sitemap Crawler
```bash
# Crawl all URLs in sitemap
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
# Limit concurrent requests
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --concurrency 10
# Sample mode (check subset)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --sample 100
```
**Output includes**:
- HTTP status codes per URL
- Response times
- Redirect chains
- Broken links (4xx, 5xx)
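The `--concurrency` flag maps onto the standard semaphore-bounded gather pattern. Sketched here with an injectable `fetch` so it runs without a network; the real script would pass an `aiohttp`-backed fetch instead, and `fake_fetch` is purely illustrative:

```python
import asyncio

async def check_urls(urls, fetch, concurrency: int = 10) -> dict[str, int]:
    """Fetch every URL with at most `concurrency` in flight; map URL -> status."""
    sem = asyncio.Semaphore(concurrency)

    async def one(url: str) -> tuple[str, int]:
        async with sem:  # limits how many fetches run at once
            return url, await fetch(url)

    return dict(await asyncio.gather(*(one(u) for u in urls)))

async def fake_fetch(url: str) -> int:
    # Stand-in for an HTTP GET; pretend /missing pages 404
    return 404 if url.endswith("/missing") else 200

results = asyncio.run(check_urls(
    ["https://example.com/", "https://example.com/missing"], fake_fetch, concurrency=2
))
print(results)
```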
## Output Format
All scripts support `--json` flag for structured output:
```json
{
"url": "https://example.com",
"status": "valid|invalid|warning",
"issues": [
{
"type": "error|warning|info",
"message": "Description",
"location": "Line or URL"
}
],
"summary": {}
}
```
## Common Issues Detected
| Category | Issue | Severity |
|----------|-------|----------|
| Robots.txt | Missing sitemap declaration | Medium |
| Robots.txt | Blocking CSS/JS resources | High |
| Robots.txt | Overly broad disallow rules | Medium |
| Sitemap | URLs returning 404 | High |
| Sitemap | Missing lastmod dates | Low |
| Sitemap | Exceeds 50,000 URL limit | High |
| Sitemap | Non-canonical URLs included | Medium |
## Configuration
Environment variables (optional):
```bash
# Rate limiting
CRAWL_DELAY=1.0 # Seconds between requests
MAX_CONCURRENT=20 # Async concurrency limit
REQUEST_TIMEOUT=30 # Request timeout seconds
```


@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""
import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar
from dotenv import load_dotenv
from tenacity import (
retry,
stop_after_attempt,
wait_exponential,
retry_if_exception_type,
)
# Load environment variables
load_dotenv()
# Logging setup
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
T = TypeVar("T")
class RateLimiter:
"""Rate limiter using token bucket algorithm."""
def __init__(self, rate: float, per: float = 1.0):
"""
Initialize rate limiter.
Args:
rate: Number of requests allowed
per: Time period in seconds (default: 1 second)
"""
self.rate = rate
self.per = per
self.tokens = rate
self.last_update = datetime.now()
self._lock = asyncio.Lock()
async def acquire(self) -> None:
"""Acquire a token, waiting if necessary."""
async with self._lock:
now = datetime.now()
elapsed = (now - self.last_update).total_seconds()
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
self.last_update = now
if self.tokens < 1:
wait_time = (1 - self.tokens) * (self.per / self.rate)
await asyncio.sleep(wait_time)
self.tokens = 0
else:
self.tokens -= 1
class BaseAsyncClient:
"""Base class for async API clients with rate limiting."""
def __init__(
self,
max_concurrent: int = 5,
requests_per_second: float = 3.0,
logger: logging.Logger | None = None,
):
"""
Initialize base client.
Args:
max_concurrent: Maximum concurrent requests
requests_per_second: Rate limit
logger: Logger instance
"""
self.semaphore = Semaphore(max_concurrent)
self.rate_limiter = RateLimiter(requests_per_second)
self.logger = logger or logging.getLogger(self.__class__.__name__)
self.stats = {
"requests": 0,
"success": 0,
"errors": 0,
"retries": 0,
}
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10),
retry=retry_if_exception_type(Exception),
)
async def _rate_limited_request(
self,
coro: Callable[[], Any],
) -> Any:
"""Execute a request with rate limiting and retry."""
async with self.semaphore:
await self.rate_limiter.acquire()
self.stats["requests"] += 1
try:
result = await coro()
self.stats["success"] += 1
return result
except Exception as e:
self.stats["errors"] += 1
self.logger.error(f"Request failed: {e}")
raise
async def batch_requests(
self,
requests: list[Callable[[], Any]],
desc: str = "Processing",
) -> list[Any]:
"""Execute multiple requests concurrently."""
try:
from tqdm.asyncio import tqdm
has_tqdm = True
except ImportError:
has_tqdm = False
async def execute(req: Callable) -> Any:
try:
return await self._rate_limited_request(req)
except Exception as e:
return {"error": str(e)}
tasks = [execute(req) for req in requests]
if has_tqdm:
results = []
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
result = await coro
results.append(result)
return results
else:
return await asyncio.gather(*tasks, return_exceptions=True)
def print_stats(self) -> None:
"""Print request statistics."""
self.logger.info("=" * 40)
self.logger.info("Request Statistics:")
self.logger.info(f" Total Requests: {self.stats['requests']}")
self.logger.info(f" Successful: {self.stats['success']}")
self.logger.info(f" Errors: {self.stats['errors']}")
self.logger.info("=" * 40)
class ConfigManager:
"""Manage API configuration and credentials."""
def __init__(self):
load_dotenv()
@property
def google_credentials_path(self) -> str | None:
"""Get Google service account credentials path."""
# Prefer SEO-specific credentials, fallback to general credentials
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
if os.path.exists(seo_creds):
return seo_creds
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
@property
def pagespeed_api_key(self) -> str | None:
"""Get PageSpeed Insights API key."""
return os.getenv("PAGESPEED_API_KEY")
@property
def custom_search_api_key(self) -> str | None:
"""Get Custom Search API key."""
return os.getenv("CUSTOM_SEARCH_API_KEY")
@property
def custom_search_engine_id(self) -> str | None:
"""Get Custom Search Engine ID."""
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
@property
def notion_token(self) -> str | None:
"""Get Notion API token."""
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
def validate_google_credentials(self) -> bool:
"""Validate Google credentials are configured."""
creds_path = self.google_credentials_path
if not creds_path:
return False
return os.path.exists(creds_path)
def get_required(self, key: str) -> str:
"""Get required environment variable or raise error."""
value = os.getenv(key)
if not value:
raise ValueError(f"Missing required environment variable: {key}")
return value
# Singleton config instance
config = ConfigManager()


@@ -0,0 +1,569 @@
"""
Page Analyzer - Extract SEO metadata from web pages
===================================================
Purpose: Comprehensive page-level SEO data extraction
Python: 3.10+
Usage:
from page_analyzer import PageAnalyzer, PageMetadata
analyzer = PageAnalyzer()
metadata = analyzer.analyze_url("https://example.com/page")
"""
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
@dataclass
class LinkData:
"""Represents a link found on a page."""
url: str
anchor_text: str
is_internal: bool
is_nofollow: bool = False
link_type: str = "body" # body, nav, footer, etc.
@dataclass
class HeadingData:
"""Represents a heading found on a page."""
level: int # 1-6
text: str
@dataclass
class SchemaData:
"""Represents schema.org structured data."""
schema_type: str
properties: dict
format: str = "json-ld" # json-ld, microdata, rdfa
@dataclass
class OpenGraphData:
"""Represents Open Graph metadata."""
og_title: str | None = None
og_description: str | None = None
og_image: str | None = None
og_url: str | None = None
og_type: str | None = None
og_site_name: str | None = None
og_locale: str | None = None
twitter_card: str | None = None
twitter_title: str | None = None
twitter_description: str | None = None
twitter_image: str | None = None
@dataclass
class PageMetadata:
"""Complete SEO metadata for a page."""
# Basic info
url: str
status_code: int = 0
content_type: str = ""
response_time_ms: float = 0
analyzed_at: datetime = field(default_factory=datetime.now)
# Meta tags
title: str | None = None
title_length: int = 0
meta_description: str | None = None
meta_description_length: int = 0
canonical_url: str | None = None
robots_meta: str | None = None
# Language
html_lang: str | None = None
hreflang_tags: list[dict] = field(default_factory=list) # [{"lang": "en", "url": "..."}]
# Headings
headings: list[HeadingData] = field(default_factory=list)
h1_count: int = 0
h1_text: str | None = None
# Open Graph & Social
open_graph: OpenGraphData = field(default_factory=OpenGraphData)
# Schema/Structured Data
schema_data: list[SchemaData] = field(default_factory=list)
schema_types_found: list[str] = field(default_factory=list)
# Links
internal_links: list[LinkData] = field(default_factory=list)
external_links: list[LinkData] = field(default_factory=list)
internal_link_count: int = 0
external_link_count: int = 0
# Images
images_total: int = 0
images_without_alt: int = 0
images_with_alt: int = 0
# Content metrics
word_count: int = 0
# Issues found
issues: list[str] = field(default_factory=list)
warnings: list[str] = field(default_factory=list)
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"url": self.url,
"status_code": self.status_code,
"content_type": self.content_type,
"response_time_ms": self.response_time_ms,
"analyzed_at": self.analyzed_at.isoformat(),
"title": self.title,
"title_length": self.title_length,
"meta_description": self.meta_description,
"meta_description_length": self.meta_description_length,
"canonical_url": self.canonical_url,
"robots_meta": self.robots_meta,
"html_lang": self.html_lang,
"hreflang_tags": self.hreflang_tags,
"h1_count": self.h1_count,
"h1_text": self.h1_text,
"headings_count": len(self.headings),
"schema_types_found": self.schema_types_found,
"internal_link_count": self.internal_link_count,
"external_link_count": self.external_link_count,
"images_total": self.images_total,
"images_without_alt": self.images_without_alt,
"word_count": self.word_count,
"issues": self.issues,
"warnings": self.warnings,
"open_graph": {
"og_title": self.open_graph.og_title,
"og_description": self.open_graph.og_description,
"og_image": self.open_graph.og_image,
"og_url": self.open_graph.og_url,
"og_type": self.open_graph.og_type,
},
}
def get_summary(self) -> str:
"""Get a brief summary of the page analysis."""
lines = [
f"URL: {self.url}",
f"Status: {self.status_code}",
f"Title: {self.title[:50] + '...' if self.title and len(self.title) > 50 else self.title}",
f"Description: {'✓' if self.meta_description else '✗ Missing'}",
f"Canonical: {'✓' if self.canonical_url else '✗ Missing'}",
f"H1: {self.h1_count} found",
f"Schema: {', '.join(self.schema_types_found) if self.schema_types_found else 'None'}",
f"Links: {self.internal_link_count} internal, {self.external_link_count} external",
f"Images: {self.images_total} total, {self.images_without_alt} without alt",
]
if self.issues:
lines.append(f"Issues: {len(self.issues)}")
return "\n".join(lines)
class PageAnalyzer:
"""Analyze web pages for SEO metadata."""
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; OurDigitalSEOBot/1.0; +https://ourdigital.org)"
def __init__(
self,
user_agent: str | None = None,
timeout: int = 30,
):
"""
Initialize page analyzer.
Args:
user_agent: Custom user agent string
timeout: Request timeout in seconds
"""
self.user_agent = user_agent or self.DEFAULT_USER_AGENT
self.timeout = timeout
self.session = requests.Session()
self.session.headers.update({
"User-Agent": self.user_agent,
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9,ko;q=0.8",
})
def analyze_url(self, url: str) -> PageMetadata:
"""
Analyze a URL and extract SEO metadata.
Args:
url: URL to analyze
Returns:
PageMetadata object with all extracted data
"""
metadata = PageMetadata(url=url)
try:
# Fetch page
start_time = datetime.now()
response = self.session.get(url, timeout=self.timeout, allow_redirects=True)
metadata.response_time_ms = (datetime.now() - start_time).total_seconds() * 1000
metadata.status_code = response.status_code
metadata.content_type = response.headers.get("Content-Type", "")
if response.status_code != 200:
metadata.issues.append(f"HTTP {response.status_code} status")
if response.status_code >= 400:
return metadata
# Parse HTML
soup = BeautifulSoup(response.text, "html.parser")
base_url = url
# Extract all metadata
self._extract_basic_meta(soup, metadata)
self._extract_canonical(soup, metadata, base_url)
self._extract_robots_meta(soup, metadata)
self._extract_hreflang(soup, metadata)
self._extract_headings(soup, metadata)
self._extract_open_graph(soup, metadata)
self._extract_schema(soup, metadata)
self._extract_links(soup, metadata, base_url)
self._extract_images(soup, metadata)
self._extract_content_metrics(soup, metadata)
# Run SEO checks
self._run_seo_checks(metadata)
except requests.RequestException as e:
metadata.issues.append(f"Request failed: {str(e)}")
logger.error(f"Failed to analyze {url}: {e}")
except Exception as e:
metadata.issues.append(f"Analysis error: {str(e)}")
logger.error(f"Error analyzing {url}: {e}")
return metadata
def _extract_basic_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract title and meta description."""
# Title
title_tag = soup.find("title")
if title_tag and title_tag.string:
metadata.title = title_tag.string.strip()
metadata.title_length = len(metadata.title)
# Meta description
desc_tag = soup.find("meta", attrs={"name": re.compile(r"^description$", re.I)})
if desc_tag and desc_tag.get("content"):
metadata.meta_description = desc_tag["content"].strip()
metadata.meta_description_length = len(metadata.meta_description)
# HTML lang
html_tag = soup.find("html")
if html_tag and html_tag.get("lang"):
metadata.html_lang = html_tag["lang"]
def _extract_canonical(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
"""Extract canonical URL."""
canonical = soup.find("link", rel="canonical")
if canonical and canonical.get("href"):
metadata.canonical_url = urljoin(base_url, canonical["href"])
def _extract_robots_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract robots meta tag."""
robots = soup.find("meta", attrs={"name": re.compile(r"^robots$", re.I)})
if robots and robots.get("content"):
metadata.robots_meta = robots["content"]
# Also check for googlebot-specific
googlebot = soup.find("meta", attrs={"name": re.compile(r"^googlebot$", re.I)})
if googlebot and googlebot.get("content"):
if metadata.robots_meta:
metadata.robots_meta += f" | googlebot: {googlebot['content']}"
else:
metadata.robots_meta = f"googlebot: {googlebot['content']}"
def _extract_hreflang(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract hreflang tags."""
hreflang_tags = soup.find_all("link", rel="alternate", hreflang=True)
for tag in hreflang_tags:
if tag.get("href") and tag.get("hreflang"):
metadata.hreflang_tags.append({
"lang": tag["hreflang"],
"url": tag["href"]
})
def _extract_headings(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract all headings."""
for level in range(1, 7):
for heading in soup.find_all(f"h{level}"):
text = heading.get_text(strip=True)
if text:
metadata.headings.append(HeadingData(level=level, text=text))
# Count H1s specifically
h1_tags = soup.find_all("h1")
metadata.h1_count = len(h1_tags)
if h1_tags:
metadata.h1_text = h1_tags[0].get_text(strip=True)
def _extract_open_graph(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract Open Graph and Twitter Card data."""
og = metadata.open_graph
# Open Graph tags
og_mappings = {
"og:title": "og_title",
"og:description": "og_description",
"og:image": "og_image",
"og:url": "og_url",
"og:type": "og_type",
"og:site_name": "og_site_name",
"og:locale": "og_locale",
}
for og_prop, attr_name in og_mappings.items():
tag = soup.find("meta", property=og_prop)
if tag and tag.get("content"):
setattr(og, attr_name, tag["content"])
# Twitter Card tags
twitter_mappings = {
"twitter:card": "twitter_card",
"twitter:title": "twitter_title",
"twitter:description": "twitter_description",
"twitter:image": "twitter_image",
}
for tw_name, attr_name in twitter_mappings.items():
tag = soup.find("meta", attrs={"name": tw_name})
if tag and tag.get("content"):
setattr(og, attr_name, tag["content"])
def _extract_schema(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract schema.org structured data."""
# JSON-LD
for script in soup.find_all("script", type="application/ld+json"):
try:
data = json.loads(script.string)
if isinstance(data, list):
for item in data:
self._process_schema_item(item, metadata, "json-ld")
else:
self._process_schema_item(data, metadata, "json-ld")
except (json.JSONDecodeError, TypeError):
continue
# Microdata (basic detection)
for item in soup.find_all(itemscope=True):
itemtype = item.get("itemtype", "")
if itemtype:
schema_type = itemtype.split("/")[-1]
if schema_type not in metadata.schema_types_found:
metadata.schema_types_found.append(schema_type)
metadata.schema_data.append(SchemaData(
schema_type=schema_type,
properties={},
format="microdata"
))
def _process_schema_item(self, data: dict, metadata: PageMetadata, format_type: str) -> None:
"""Process a single schema.org item."""
if not isinstance(data, dict):
return
schema_type = data.get("@type", "Unknown")
if isinstance(schema_type, list):
schema_type = schema_type[0] if schema_type else "Unknown"
if schema_type not in metadata.schema_types_found:
metadata.schema_types_found.append(schema_type)
metadata.schema_data.append(SchemaData(
schema_type=schema_type,
properties=data,
format=format_type
))
# Process nested @graph items
if "@graph" in data:
for item in data["@graph"]:
self._process_schema_item(item, metadata, format_type)
def _extract_links(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
"""Extract internal and external links."""
parsed_base = urlparse(base_url)
base_domain = parsed_base.netloc.lower()
for a_tag in soup.find_all("a", href=True):
href = a_tag["href"]
# Skip non-http links
if href.startswith(("#", "javascript:", "mailto:", "tel:")):
continue
# Resolve relative URLs
full_url = urljoin(base_url, href)
parsed_url = urlparse(full_url)
# Get anchor text
anchor_text = a_tag.get_text(strip=True)[:100] # Limit length
# Check if nofollow
rel = a_tag.get("rel", [])
if isinstance(rel, str):
rel = rel.split()
is_nofollow = "nofollow" in rel
# Determine if internal or external
link_domain = parsed_url.netloc.lower()
is_internal = (
link_domain == base_domain or
link_domain.endswith(f".{base_domain}") or
base_domain.endswith(f".{link_domain}")
)
link_data = LinkData(
url=full_url,
anchor_text=anchor_text,
is_internal=is_internal,
is_nofollow=is_nofollow,
)
if is_internal:
metadata.internal_links.append(link_data)
else:
metadata.external_links.append(link_data)
metadata.internal_link_count = len(metadata.internal_links)
metadata.external_link_count = len(metadata.external_links)
def _extract_images(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract image information."""
images = soup.find_all("img")
metadata.images_total = len(images)
for img in images:
alt = img.get("alt", "").strip()
if alt:
metadata.images_with_alt += 1
else:
metadata.images_without_alt += 1
def _extract_content_metrics(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract content metrics like word count."""
# Remove script and style elements
for element in soup(["script", "style", "noscript"]):
element.decompose()
# Get text content
text = soup.get_text(separator=" ", strip=True)
words = text.split()
metadata.word_count = len(words)
def _run_seo_checks(self, metadata: PageMetadata) -> None:
"""Run SEO checks and add issues/warnings."""
# Title checks
if not metadata.title:
metadata.issues.append("Missing title tag")
elif metadata.title_length < 30:
metadata.warnings.append(f"Title too short ({metadata.title_length} chars, recommend 50-60)")
elif metadata.title_length > 60:
metadata.warnings.append(f"Title too long ({metadata.title_length} chars, recommend 50-60)")
# Meta description checks
if not metadata.meta_description:
metadata.issues.append("Missing meta description")
elif metadata.meta_description_length < 120:
metadata.warnings.append(f"Meta description too short ({metadata.meta_description_length} chars)")
elif metadata.meta_description_length > 160:
metadata.warnings.append(f"Meta description too long ({metadata.meta_description_length} chars)")
# Canonical check
if not metadata.canonical_url:
metadata.warnings.append("Missing canonical tag")
elif metadata.canonical_url != metadata.url:
metadata.warnings.append(f"Canonical points to different URL: {metadata.canonical_url}")
# H1 checks
if metadata.h1_count == 0:
metadata.issues.append("Missing H1 tag")
elif metadata.h1_count > 1:
metadata.warnings.append(f"Multiple H1 tags ({metadata.h1_count})")
# Image alt check
if metadata.images_without_alt > 0:
metadata.warnings.append(f"{metadata.images_without_alt} images missing alt text")
# Schema check
if not metadata.schema_types_found:
metadata.warnings.append("No structured data found")
# Open Graph check
if not metadata.open_graph.og_title:
metadata.warnings.append("Missing Open Graph tags")
# Robots meta check
if metadata.robots_meta:
robots_lower = metadata.robots_meta.lower()
if "noindex" in robots_lower:
metadata.issues.append("Page is set to noindex")
if "nofollow" in robots_lower:
metadata.warnings.append("Page is set to nofollow")
def main():
"""CLI entry point for testing."""
import argparse
parser = argparse.ArgumentParser(description="Page SEO Analyzer")
parser.add_argument("url", help="URL to analyze")
parser.add_argument("--json", "-j", action="store_true", help="Output as JSON")
args = parser.parse_args()
analyzer = PageAnalyzer()
metadata = analyzer.analyze_url(args.url)
if args.json:
print(json.dumps(metadata.to_dict(), indent=2, ensure_ascii=False))
else:
print("=" * 60)
print("PAGE ANALYSIS REPORT")
print("=" * 60)
print(metadata.get_summary())
print()
if metadata.issues:
print("ISSUES:")
for issue in metadata.issues:
print(f"  ✗ {issue}")
if metadata.warnings:
print("\nWARNINGS:")
for warning in metadata.warnings:
print(f"  ⚠ {warning}")
if metadata.hreflang_tags:
print(f"\nHREFLANG TAGS ({len(metadata.hreflang_tags)}):")
for tag in metadata.hreflang_tags[:5]:
print(f" {tag['lang']}: {tag['url']}")
if metadata.schema_types_found:
print(f"\nSCHEMA TYPES:")
for schema_type in metadata.schema_types_found:
print(f" - {schema_type}")
if __name__ == "__main__":
main()


@@ -0,0 +1,17 @@
# 10-seo-technical-audit dependencies
# Install: pip install -r requirements.txt
# Web Scraping & Parsing
lxml>=5.1.0
beautifulsoup4>=4.12.0
requests>=2.31.0
aiohttp>=3.9.0
# Async & Retry
tenacity>=8.2.0
tqdm>=4.66.0
# Environment & CLI
python-dotenv>=1.0.0
rich>=13.7.0
typer>=0.9.0


@@ -0,0 +1,540 @@
"""
Robots.txt Checker - Analyze robots.txt configuration
=====================================================
Purpose: Parse and analyze robots.txt for SEO compliance
Python: 3.10+
Usage:
python robots_checker.py --url https://example.com/robots.txt
python robots_checker.py --url https://example.com --test-url /admin/
"""
import argparse
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser
import requests
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
@dataclass
class RobotsIssue:
"""Represents a robots.txt issue."""
severity: str # "error", "warning", "info"
message: str
line_number: int | None = None
directive: str | None = None
suggestion: str | None = None
@dataclass
class UserAgentRules:
"""Rules for a specific user-agent."""
user_agent: str
disallow: list[str] = field(default_factory=list)
allow: list[str] = field(default_factory=list)
crawl_delay: float | None = None
@dataclass
class RobotsResult:
"""Complete robots.txt analysis result."""
url: str
accessible: bool = True
content: str = ""
rules: list[UserAgentRules] = field(default_factory=list)
sitemaps: list[str] = field(default_factory=list)
issues: list[RobotsIssue] = field(default_factory=list)
stats: dict = field(default_factory=dict)
timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
def to_dict(self) -> dict:
"""Convert to dictionary for JSON output."""
return {
"url": self.url,
"accessible": self.accessible,
"sitemaps": self.sitemaps,
"rules": [
{
"user_agent": r.user_agent,
"disallow": r.disallow,
"allow": r.allow,
"crawl_delay": r.crawl_delay,
}
for r in self.rules
],
"issues": [
{
"severity": i.severity,
"message": i.message,
"line_number": i.line_number,
"directive": i.directive,
"suggestion": i.suggestion,
}
for i in self.issues
],
"stats": self.stats,
"timestamp": self.timestamp,
}
class RobotsChecker:
"""Analyze robots.txt configuration."""
# Common user agents
USER_AGENTS = {
"*": "All bots",
"Googlebot": "Google crawler",
"Googlebot-Image": "Google Image crawler",
"Googlebot-News": "Google News crawler",
"Googlebot-Video": "Google Video crawler",
"Bingbot": "Bing crawler",
"Slurp": "Yahoo crawler",
"DuckDuckBot": "DuckDuckGo crawler",
"Baiduspider": "Baidu crawler",
"Yandex": "Yandex crawler",
"Facebot": "Facebook crawler",
"Twitterbot": "Twitter crawler",
"LinkedInBot": "LinkedIn crawler",
}
# Paths that should generally not be blocked
IMPORTANT_PATHS = [
"/",
"/*.css",
"/*.js",
"/*.jpg",
"/*.jpeg",
"/*.png",
"/*.gif",
"/*.svg",
"/*.webp",
]
# Paths commonly blocked
COMMON_BLOCKED = [
"/admin",
"/wp-admin",
"/login",
"/private",
"/api",
"/cgi-bin",
"/tmp",
"/search",
]
def __init__(self):
self.session = requests.Session()
self.session.headers.update({
"User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
})
def fetch_robots(self, url: str) -> str | None:
"""Fetch robots.txt content."""
# Ensure we're fetching robots.txt
parsed = urlparse(url)
if not parsed.path.endswith("robots.txt"):
robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
else:
robots_url = url
try:
response = self.session.get(robots_url, timeout=10)
if response.status_code == 200:
return response.text
elif response.status_code == 404:
return None
else:
raise RuntimeError(f"HTTP {response.status_code}")
except requests.RequestException as e:
raise RuntimeError(f"Failed to fetch robots.txt: {e}") from e
def parse_robots(self, content: str) -> tuple[list[UserAgentRules], list[str]]:
"""Parse robots.txt content."""
rules = []
sitemaps = []
current_rules = None
for line_num, line in enumerate(content.split("\n"), 1):
line = line.strip()
# Skip empty lines and comments
if not line or line.startswith("#"):
continue
# Parse directive
if ":" not in line:
continue
directive, value = line.split(":", 1)
directive = directive.strip().lower()
value = value.strip()
if directive == "user-agent":
# Save previous user-agent rules
if current_rules:
rules.append(current_rules)
current_rules = UserAgentRules(user_agent=value)
elif directive == "disallow" and current_rules:
if value: # Empty disallow means allow all
current_rules.disallow.append(value)
elif directive == "allow" and current_rules:
if value:
current_rules.allow.append(value)
elif directive == "crawl-delay" and current_rules:
try:
current_rules.crawl_delay = float(value)
except ValueError:
pass
elif directive == "sitemap":
if value:
sitemaps.append(value)
# Don't forget last user-agent
if current_rules:
rules.append(current_rules)
return rules, sitemaps
def analyze(self, url: str) -> RobotsResult:
"""Analyze robots.txt."""
result = RobotsResult(url=url)
# Fetch robots.txt
try:
content = self.fetch_robots(url)
if content is None:
result.accessible = False
result.issues.append(RobotsIssue(
severity="info",
message="No robots.txt found (returns 404)",
suggestion="Consider creating a robots.txt file",
))
return result
except RuntimeError as e:
result.accessible = False
result.issues.append(RobotsIssue(
severity="error",
message=str(e),
))
return result
result.content = content
result.rules, result.sitemaps = self.parse_robots(content)
# Analyze content
self._analyze_syntax(result)
self._analyze_rules(result)
self._analyze_sitemaps(result)
# Calculate stats
result.stats = {
"user_agents_count": len(result.rules),
"user_agents": [r.user_agent for r in result.rules],
"total_disallow_rules": sum(len(r.disallow) for r in result.rules),
"total_allow_rules": sum(len(r.allow) for r in result.rules),
"sitemaps_count": len(result.sitemaps),
"has_crawl_delay": any(r.crawl_delay for r in result.rules),
"content_length": len(content),
}
return result
def _analyze_syntax(self, result: RobotsResult) -> None:
"""Check for syntax issues."""
lines = result.content.split("\n")
for line_num, line in enumerate(lines, 1):
line = line.strip()
# Skip empty lines and comments
if not line or line.startswith("#"):
continue
# Check for valid directive
if ":" not in line:
result.issues.append(RobotsIssue(
severity="warning",
message=f"Invalid line (missing colon): {line[:50]}",
line_number=line_num,
))
continue
directive, value = line.split(":", 1)
directive = directive.strip().lower()
valid_directives = {
"user-agent", "disallow", "allow",
"crawl-delay", "sitemap", "host",
}
if directive not in valid_directives:
result.issues.append(RobotsIssue(
severity="info",
message=f"Unknown directive: {directive}",
line_number=line_num,
directive=directive,
))
def _analyze_rules(self, result: RobotsResult) -> None:
"""Analyze blocking rules."""
# Check if there are any rules
if not result.rules:
result.issues.append(RobotsIssue(
severity="info",
message="No user-agent rules defined",
suggestion="Add User-agent: * rules to control crawling",
))
return
# Check for wildcard rule
has_wildcard = any(r.user_agent == "*" for r in result.rules)
if not has_wildcard:
result.issues.append(RobotsIssue(
severity="info",
message="No wildcard (*) user-agent defined",
suggestion="Consider adding User-agent: * as fallback",
))
# Check for blocking important resources
for rules in result.rules:
for disallow in rules.disallow:
# Check if blocking root
if disallow == "/":
result.issues.append(RobotsIssue(
severity="error",
message=f"Blocking entire site for {rules.user_agent}",
directive=f"Disallow: {disallow}",
suggestion="This will prevent indexing. Is this intentional?",
))
# Check if blocking CSS/JS
if any(ext in disallow.lower() for ext in [".css", ".js"]):
result.issues.append(RobotsIssue(
severity="warning",
message=f"Blocking CSS/JS files for {rules.user_agent}",
directive=f"Disallow: {disallow}",
suggestion="May affect rendering and SEO",
))
# Check for blocking images
if any(ext in disallow.lower() for ext in [".jpg", ".png", ".gif", ".webp"]):
result.issues.append(RobotsIssue(
severity="info",
message=f"Blocking image files for {rules.user_agent}",
directive=f"Disallow: {disallow}",
))
# Check crawl delay
if rules.crawl_delay:
if rules.crawl_delay > 10:
result.issues.append(RobotsIssue(
severity="warning",
message=f"High crawl-delay ({rules.crawl_delay}s) for {rules.user_agent}",
directive=f"Crawl-delay: {rules.crawl_delay}",
suggestion="May significantly slow indexing",
))
elif rules.crawl_delay > 0:
result.issues.append(RobotsIssue(
severity="info",
message=f"Crawl-delay set to {rules.crawl_delay}s for {rules.user_agent}",
))
def _analyze_sitemaps(self, result: RobotsResult) -> None:
"""Analyze sitemap declarations."""
if not result.sitemaps:
result.issues.append(RobotsIssue(
severity="warning",
message="No sitemap declared in robots.txt",
suggestion="Add Sitemap: directive to help crawlers find your sitemap",
))
else:
for sitemap in result.sitemaps:
if not sitemap.startswith("http"):
result.issues.append(RobotsIssue(
severity="warning",
message=f"Sitemap URL should be absolute: {sitemap}",
directive=f"Sitemap: {sitemap}",
))
def test_url(self, robots_url: str, test_path: str,
user_agent: str = "Googlebot") -> dict:
"""Test if a specific URL is allowed."""
# Use Python's built-in parser
rp = RobotFileParser()
# Ensure robots.txt URL
parsed = urlparse(robots_url)
if not parsed.path.endswith("robots.txt"):
robots_txt_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
else:
robots_txt_url = robots_url
rp.set_url(robots_txt_url)
try:
rp.read()
except Exception as e:
return {
"path": test_path,
"user_agent": user_agent,
"allowed": None,
"error": str(e),
}
# Build full URL for testing
base_url = f"{parsed.scheme}://{parsed.netloc}"
full_url = urljoin(base_url, test_path)
allowed = rp.can_fetch(user_agent, full_url)
return {
"path": test_path,
"user_agent": user_agent,
"allowed": allowed,
"full_url": full_url,
}
def generate_report(self, result: RobotsResult) -> str:
"""Generate human-readable analysis report."""
lines = [
"=" * 60,
"Robots.txt Analysis Report",
"=" * 60,
f"URL: {result.url}",
f"Accessible: {'Yes' if result.accessible else 'No'}",
f"Timestamp: {result.timestamp}",
"",
]
if result.accessible:
lines.append("Statistics:")
for key, value in result.stats.items():
if key == "user_agents":
lines.append(f" {key}: {', '.join(value) if value else 'None'}")
else:
lines.append(f" {key}: {value}")
lines.append("")
if result.sitemaps:
lines.append(f"Sitemaps ({len(result.sitemaps)}):")
for sitemap in result.sitemaps:
lines.append(f" - {sitemap}")
lines.append("")
if result.rules:
lines.append("Rules Summary:")
for rules in result.rules:
lines.append(f"\n User-agent: {rules.user_agent}")
if rules.disallow:
lines.append(f" Disallow: {len(rules.disallow)} rules")
for d in rules.disallow[:5]:
lines.append(f" - {d}")
if len(rules.disallow) > 5:
lines.append(f" ... and {len(rules.disallow) - 5} more")
if rules.allow:
lines.append(f" Allow: {len(rules.allow)} rules")
for a in rules.allow[:3]:
lines.append(f" - {a}")
if rules.crawl_delay:
lines.append(f" Crawl-delay: {rules.crawl_delay}s")
lines.append("")
if result.issues:
lines.append("Issues Found:")
errors = [i for i in result.issues if i.severity == "error"]
warnings = [i for i in result.issues if i.severity == "warning"]
infos = [i for i in result.issues if i.severity == "info"]
if errors:
lines.append(f"\n ERRORS ({len(errors)}):")
for issue in errors:
lines.append(f" - {issue.message}")
if issue.directive:
lines.append(f" Directive: {issue.directive}")
if issue.suggestion:
lines.append(f" Suggestion: {issue.suggestion}")
if warnings:
lines.append(f"\n WARNINGS ({len(warnings)}):")
for issue in warnings:
lines.append(f" - {issue.message}")
if issue.suggestion:
lines.append(f" Suggestion: {issue.suggestion}")
if infos:
lines.append(f"\n INFO ({len(infos)}):")
for issue in infos:
lines.append(f" - {issue.message}")
lines.append("")
lines.append("=" * 60)
return "\n".join(lines)
def main():
"""Main entry point for CLI usage."""
parser = argparse.ArgumentParser(
description="Analyze robots.txt configuration",
)
parser.add_argument("--url", "-u", required=True,
help="URL to robots.txt or domain")
parser.add_argument("--test-url", "-t",
help="Test if specific URL path is allowed")
parser.add_argument("--user-agent", "-a", default="Googlebot",
help="User agent for testing (default: Googlebot)")
parser.add_argument("--output", "-o", help="Output file for JSON report")
parser.add_argument("--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
checker = RobotsChecker()
if args.test_url:
# Test specific URL
test_result = checker.test_url(args.url, args.test_url, args.user_agent)
if args.json:
print(json.dumps(test_result, indent=2))
else:
status = "UNKNOWN" if test_result["allowed"] is None else ("ALLOWED" if test_result["allowed"] else "BLOCKED")
print(f"URL: {test_result['path']}")
print(f"User-Agent: {test_result['user_agent']}")
print(f"Status: {status}")
else:
# Full analysis
result = checker.analyze(args.url)
if args.json or args.output:
output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
if args.output:
with open(args.output, "w", encoding="utf-8") as f:
f.write(output)
logger.info(f"Report written to {args.output}")
else:
print(output)
else:
print(checker.generate_report(result))
if __name__ == "__main__":
main()
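The directive-parsing loop in `RobotsChecker.parse_robots` can be exercised without network access. The condensed re-implementation below mirrors its rules (skip comments, split on the first colon, group by user-agent, collect sitemaps); the function name and the simplified dict shape are this sketch's own, not the module's API:

```python
# Condensed illustration of the parse_robots() loop: same directive rules,
# simplified output shape (plain dicts instead of UserAgentRules).
def parse(content: str):
    rules, sitemaps, current = [], [], None
    for raw in content.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if ":" not in line:
            continue  # malformed line: no directive separator
        directive, value = (part.strip() for part in line.split(":", 1))
        directive = directive.lower()
        if directive == "user-agent":
            current = {"user_agent": value, "disallow": [], "allow": []}
            rules.append(current)
        elif directive == "disallow" and current and value:
            current["disallow"].append(value)  # empty Disallow means allow all
        elif directive == "allow" and current and value:
            current["allow"].append(value)
        elif directive == "sitemap" and value:
            sitemaps.append(value)
    return rules, sitemaps

sample = """\
# comment
User-agent: *
Disallow: /admin
Allow: /admin/public
Sitemap: https://example.com/sitemap.xml
"""
rules, sitemaps = parse(sample)
print(rules)     # [{'user_agent': '*', 'disallow': ['/admin'], 'allow': ['/admin/public']}]
print(sitemaps)  # ['https://example.com/sitemap.xml']
```

Note that `Sitemap:` values survive the single `split(":", 1)` because only the first colon is consumed, leaving the `https://…` URL intact.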


@@ -0,0 +1,969 @@
"""
Sitemap Crawler - Sequential page analysis from sitemap
=======================================================
Purpose: Crawl sitemap URLs one by one, analyze each page, save to Notion
Python: 3.10+
Usage:
from sitemap_crawler import SitemapCrawler
crawler = SitemapCrawler()
crawler.crawl_sitemap("https://example.com/sitemap.xml", delay=2.0)
"""
import json
import logging
import time
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Callable, Generator
from urllib.parse import urlparse
import requests
from notion_client import Client
from base_client import config
from page_analyzer import PageAnalyzer, PageMetadata
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
# Default database for page analysis data
DEFAULT_PAGES_DATABASE_ID = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"
# Default limits to prevent excessive resource usage
DEFAULT_MAX_PAGES = 500
DEFAULT_DELAY_SECONDS = 2.0
# Progress tracking directory
PROGRESS_DIR = Path.home() / ".claude" / "seo-audit-progress"
PROGRESS_DIR.mkdir(parents=True, exist_ok=True)
@dataclass
class CrawlProgress:
"""Track crawl progress."""
total_urls: int = 0
processed_urls: int = 0
successful_urls: int = 0
failed_urls: int = 0
skipped_urls: int = 0
start_time: datetime = field(default_factory=datetime.now)
current_url: str = ""
audit_id: str = ""
site: str = ""
status: str = "running" # running, completed, failed
error_message: str = ""
summary_page_id: str = ""
def get_progress_percent(self) -> float:
if self.total_urls == 0:
return 0.0
return (self.processed_urls / self.total_urls) * 100
def get_elapsed_time(self) -> str:
elapsed = datetime.now() - self.start_time
minutes = int(elapsed.total_seconds() // 60)
seconds = int(elapsed.total_seconds() % 60)
return f"{minutes}m {seconds}s"
def get_eta(self) -> str:
if self.processed_urls == 0:
return "calculating..."
elapsed = (datetime.now() - self.start_time).total_seconds()
avg_time_per_url = elapsed / self.processed_urls
remaining_urls = self.total_urls - self.processed_urls
eta_seconds = remaining_urls * avg_time_per_url
minutes = int(eta_seconds // 60)
seconds = int(eta_seconds % 60)
return f"{minutes}m {seconds}s"
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"audit_id": self.audit_id,
"site": self.site,
"status": self.status,
"total_urls": self.total_urls,
"processed_urls": self.processed_urls,
"successful_urls": self.successful_urls,
"failed_urls": self.failed_urls,
"progress_percent": round(self.get_progress_percent(), 1),
"elapsed_time": self.get_elapsed_time(),
"eta": self.get_eta(),
"current_url": self.current_url,
"start_time": self.start_time.isoformat(),
"error_message": self.error_message,
"summary_page_id": self.summary_page_id,
"updated_at": datetime.now().isoformat(),
}
def save_to_file(self, filepath: Path | None = None) -> Path:
"""Save progress to JSON file."""
if filepath is None:
filepath = PROGRESS_DIR / f"{self.audit_id}.json"
with open(filepath, "w") as f:
json.dump(self.to_dict(), f, indent=2)
return filepath
@classmethod
def load_from_file(cls, filepath: Path) -> "CrawlProgress":
"""Load progress from JSON file."""
with open(filepath, "r") as f:
data = json.load(f)
progress = cls()
progress.audit_id = data.get("audit_id", "")
progress.site = data.get("site", "")
progress.status = data.get("status", "unknown")
progress.total_urls = data.get("total_urls", 0)
progress.processed_urls = data.get("processed_urls", 0)
progress.successful_urls = data.get("successful_urls", 0)
progress.failed_urls = data.get("failed_urls", 0)
progress.current_url = data.get("current_url", "")
progress.error_message = data.get("error_message", "")
progress.summary_page_id = data.get("summary_page_id", "")
if data.get("start_time"):
progress.start_time = datetime.fromisoformat(data["start_time"])
return progress
def get_active_crawls() -> list[CrawlProgress]:
"""Get all active (running) crawl jobs."""
active = []
for filepath in PROGRESS_DIR.glob("*.json"):
try:
progress = CrawlProgress.load_from_file(filepath)
if progress.status == "running":
active.append(progress)
except Exception:
continue
return active
def get_all_crawls() -> list[CrawlProgress]:
"""Get all crawl jobs (active and completed)."""
crawls = []
for filepath in sorted(PROGRESS_DIR.glob("*.json"), reverse=True):
try:
progress = CrawlProgress.load_from_file(filepath)
crawls.append(progress)
except Exception:
continue
return crawls
def get_crawl_status(audit_id: str) -> CrawlProgress | None:
"""Get status of a specific crawl by audit ID."""
filepath = PROGRESS_DIR / f"{audit_id}.json"
if filepath.exists():
return CrawlProgress.load_from_file(filepath)
return None
@dataclass
class CrawlResult:
"""Result of a complete sitemap crawl."""
site: str
sitemap_url: str
audit_id: str
total_pages: int
successful_pages: int
failed_pages: int
start_time: datetime
end_time: datetime
pages_analyzed: list[PageMetadata] = field(default_factory=list)
notion_page_ids: list[str] = field(default_factory=list)
summary_page_id: str | None = None
def get_duration(self) -> str:
duration = self.end_time - self.start_time
minutes = int(duration.total_seconds() // 60)
seconds = int(duration.total_seconds() % 60)
return f"{minutes}m {seconds}s"
class SitemapCrawler:
"""Crawl sitemap URLs and analyze each page."""
def __init__(
self,
notion_token: str | None = None,
database_id: str | None = None,
):
"""
Initialize sitemap crawler.
Args:
notion_token: Notion API token
database_id: Notion database ID for storing results
"""
self.notion_token = notion_token or config.notion_token
self.database_id = database_id or DEFAULT_PAGES_DATABASE_ID
self.analyzer = PageAnalyzer()
if self.notion_token:
self.notion = Client(auth=self.notion_token)
else:
self.notion = None
logger.warning("Notion token not configured, results will not be saved")
def fetch_sitemap_urls(self, sitemap_url: str) -> list[str]:
"""
Fetch and parse URLs from a sitemap.
Args:
sitemap_url: URL of the sitemap
Returns:
List of URLs found in the sitemap
"""
try:
response = requests.get(sitemap_url, timeout=30)
response.raise_for_status()
# Parse XML
root = ET.fromstring(response.content)
# Handle namespace
namespaces = {
"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"
}
urls = []
# Check if this is a sitemap index
sitemap_tags = root.findall(".//sm:sitemap/sm:loc", namespaces)
if sitemap_tags:
# This is a sitemap index, recursively fetch child sitemaps
logger.info(f"Found sitemap index with {len(sitemap_tags)} child sitemaps")
for loc in sitemap_tags:
if loc.text:
child_urls = self.fetch_sitemap_urls(loc.text)
urls.extend(child_urls)
else:
# Regular sitemap, extract URLs
url_tags = root.findall(".//sm:url/sm:loc", namespaces)
if not url_tags:
# Try without namespace
url_tags = root.findall(".//url/loc")
for loc in url_tags:
if loc.text:
urls.append(loc.text)
# Remove duplicates while preserving order
seen = set()
unique_urls = []
for url in urls:
if url not in seen:
seen.add(url)
unique_urls.append(url)
logger.info(f"Found {len(unique_urls)} unique URLs in sitemap")
return unique_urls
except Exception as e:
logger.error(f"Failed to fetch sitemap: {e}")
raise
def crawl_sitemap(
self,
sitemap_url: str,
delay: float = DEFAULT_DELAY_SECONDS,
max_pages: int = DEFAULT_MAX_PAGES,
progress_callback: Callable[[CrawlProgress], None] | None = None,
save_to_notion: bool = True,
url_filter: Callable[[str], bool] | None = None,
) -> CrawlResult:
"""
Crawl all URLs in a sitemap sequentially.
Args:
sitemap_url: URL of the sitemap
delay: Seconds to wait between requests (default: 2.0s)
max_pages: Maximum number of pages to process (default: 500)
progress_callback: Function called with progress updates
save_to_notion: Whether to save results to Notion
url_filter: Optional function to filter URLs (return True to include)
Returns:
CrawlResult with all analyzed pages
"""
# Parse site info
parsed_sitemap = urlparse(sitemap_url)
site = f"{parsed_sitemap.scheme}://{parsed_sitemap.netloc}"
site_domain = parsed_sitemap.netloc
# Generate audit ID
audit_id = f"{site_domain}-pages-{datetime.now().strftime('%Y%m%d-%H%M%S')}"
logger.info(f"Starting sitemap crawl: {sitemap_url}")
logger.info(f"Audit ID: {audit_id}")
logger.info(f"Delay between requests: {delay}s")
# Initialize progress tracking
progress = CrawlProgress(
audit_id=audit_id,
site=site,
status="running",
)
# Fetch URLs
urls = self.fetch_sitemap_urls(sitemap_url)
# Apply URL filter if provided
if url_filter:
urls = [url for url in urls if url_filter(url)]
logger.info(f"After filtering: {len(urls)} URLs")
# Apply max pages limit (default: 500 to prevent excessive resource usage)
if len(urls) > max_pages:
logger.warning(f"Sitemap has {len(urls)} URLs, limiting to {max_pages} pages")
logger.warning("Use the max_pages parameter to adjust this limit")
urls = urls[:max_pages]
logger.info(f"Processing {len(urls)} pages (max: {max_pages})")
# Update progress with total URLs
progress.total_urls = len(urls)
progress.save_to_file()
# Initialize result
result = CrawlResult(
site=site,
sitemap_url=sitemap_url,
audit_id=audit_id,
total_pages=len(urls),
successful_pages=0,
failed_pages=0,
start_time=datetime.now(),
end_time=datetime.now(),
)
# Process each URL
try:
for i, url in enumerate(urls):
progress.current_url = url
progress.processed_urls = i
progress.save_to_file() # Save progress to file
if progress_callback:
progress_callback(progress)
logger.info(f"[{i+1}/{len(urls)}] Analyzing: {url}")
try:
# Analyze page
metadata = self.analyzer.analyze_url(url)
result.pages_analyzed.append(metadata)
if metadata.status_code == 200:
progress.successful_urls += 1
result.successful_pages += 1
# Save to Notion
if save_to_notion and self.notion:
page_id = self._save_page_to_notion(metadata, audit_id, site)
if page_id:
result.notion_page_ids.append(page_id)
else:
progress.failed_urls += 1
result.failed_pages += 1
except Exception as e:
logger.error(f"Failed to analyze {url}: {e}")
progress.failed_urls += 1
result.failed_pages += 1
# Wait before next request
if i < len(urls) - 1: # Don't wait after last URL
time.sleep(delay)
# Final progress update
progress.processed_urls = len(urls)
progress.status = "completed"
if progress_callback:
progress_callback(progress)
except Exception as e:
progress.status = "failed"
progress.error_message = str(e)
progress.save_to_file()
raise
# Update result
result.end_time = datetime.now()
# Create summary page
if save_to_notion and self.notion:
summary_id = self._create_crawl_summary_page(result)
result.summary_page_id = summary_id
progress.summary_page_id = summary_id
# Save final progress
progress.save_to_file()
logger.info(f"Crawl complete: {result.successful_pages}/{result.total_pages} pages analyzed")
logger.info(f"Duration: {result.get_duration()}")
return result
def _save_page_to_notion(
self,
metadata: PageMetadata,
audit_id: str,
site: str,
) -> str | None:
"""Save page metadata to Notion database."""
try:
# Build properties
properties = {
"Issue": {"title": [{"text": {"content": f"📄 {metadata.url}"}}]},
"Category": {"select": {"name": "On-page SEO"}},
"Priority": {"select": {"name": self._determine_priority(metadata)}},
"Site": {"url": site},
"URL": {"url": metadata.url},
"Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
"Found Date": {"date": {"start": datetime.now().strftime("%Y-%m-%d")}},
}
# Build page content
children = self._build_page_content(metadata)
response = self.notion.pages.create(
parent={"database_id": self.database_id},
properties=properties,
children=children,
)
return response["id"]
except Exception as e:
logger.error(f"Failed to save to Notion: {e}")
return None
def _determine_priority(self, metadata: PageMetadata) -> str:
"""Determine priority based on issues found."""
if len(metadata.issues) >= 3:
return "High"
elif len(metadata.issues) >= 1:
return "Medium"
elif len(metadata.warnings) >= 3:
return "Medium"
else:
return "Low"
def _build_page_content(self, metadata: PageMetadata) -> list[dict]:
"""Build Notion page content blocks from metadata."""
children = []
# Status summary callout
status_emoji = "✅" if not metadata.issues else "⚠️" if len(metadata.issues) < 3 else "❌"
children.append({
"object": "block",
"type": "callout",
"callout": {
"rich_text": [
{"type": "text", "text": {"content": f"Status: {metadata.status_code} | "}},
{"type": "text", "text": {"content": f"Response: {metadata.response_time_ms:.0f}ms | "}},
{"type": "text", "text": {"content": f"Issues: {len(metadata.issues)} | "}},
{"type": "text", "text": {"content": f"Warnings: {len(metadata.warnings)}"}},
],
"icon": {"type": "emoji", "emoji": status_emoji},
"color": "gray_background" if not metadata.issues else "yellow_background" if len(metadata.issues) < 3 else "red_background",
}
})
# Meta Tags Section
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Meta Tags"}}]}
})
# Meta tags table
meta_rows = [
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Tag"}, "annotations": {"bold": True}}],
[{"type": "text", "text": {"content": "Value"}, "annotations": {"bold": True}}],
[{"type": "text", "text": {"content": "Status"}, "annotations": {"bold": True}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Title"}}],
[{"type": "text", "text": {"content": (metadata.title or "")[:50]}}],
[{"type": "text", "text": {"content": f"{metadata.title_length} chars" if metadata.title else "✗ Missing"}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Description"}}],
[{"type": "text", "text": {"content": (metadata.meta_description or "")[:50]}}],
[{"type": "text", "text": {"content": f"{metadata.meta_description_length} chars" if metadata.meta_description else "✗ Missing"}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Canonical"}}],
[{"type": "text", "text": {"content": (metadata.canonical_url or "")[:50]}}],
[{"type": "text", "text": {"content": "✓" if metadata.canonical_url else "✗ Missing"}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Robots"}}],
[{"type": "text", "text": {"content": metadata.robots_meta or ""}}],
[{"type": "text", "text": {"content": "✓" if metadata.robots_meta else ""}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Lang"}}],
[{"type": "text", "text": {"content": metadata.html_lang or ""}}],
[{"type": "text", "text": {"content": "✓" if metadata.html_lang else ""}}],
]}},
]
children.append({
"object": "block",
"type": "table",
"table": {
"table_width": 3,
"has_column_header": True,
"has_row_header": False,
"children": meta_rows
}
})
# Headings Section
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Headings"}}]}
})
children.append({
"object": "block",
"type": "paragraph",
"paragraph": {"rich_text": [
{"type": "text", "text": {"content": f"H1: {metadata.h1_count} | "}},
{"type": "text", "text": {"content": f"Total headings: {len(metadata.headings)}"}},
]}
})
if metadata.h1_text:
children.append({
"object": "block",
"type": "quote",
"quote": {"rich_text": [{"type": "text", "text": {"content": metadata.h1_text[:200]}}]}
})
# Schema Data Section
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Structured Data"}}]}
})
if metadata.schema_types_found:
children.append({
"object": "block",
"type": "paragraph",
"paragraph": {"rich_text": [
{"type": "text", "text": {"content": "Schema types found: "}},
{"type": "text", "text": {"content": ", ".join(metadata.schema_types_found)}, "annotations": {"code": True}},
]}
})
else:
children.append({
"object": "block",
"type": "callout",
"callout": {
"rich_text": [{"type": "text", "text": {"content": "No structured data found on this page"}}],
"icon": {"type": "emoji", "emoji": "⚠️"},
"color": "yellow_background",
}
})
# Open Graph Section
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Open Graph"}}]}
})
og = metadata.open_graph
og_status = "✓ Configured" if og.og_title else "✗ Missing"
children.append({
"object": "block",
"type": "paragraph",
"paragraph": {"rich_text": [
{"type": "text", "text": {"content": f"Status: {og_status}\n"}},
{"type": "text", "text": {"content": f"og:title: {og.og_title or ''}\n"}},
{"type": "text", "text": {"content": f"og:type: {og.og_type or ''}"}},
]}
})
# Links Section
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Links"}}]}
})
children.append({
"object": "block",
"type": "paragraph",
"paragraph": {"rich_text": [
{"type": "text", "text": {"content": f"Internal links: {metadata.internal_link_count}\n"}},
{"type": "text", "text": {"content": f"External links: {metadata.external_link_count}"}},
]}
})
# Images Section
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Images"}}]}
})
children.append({
"object": "block",
"type": "paragraph",
"paragraph": {"rich_text": [
{"type": "text", "text": {"content": f"Total: {metadata.images_total} | "}},
{"type": "text", "text": {"content": f"With alt: {metadata.images_with_alt} | "}},
{"type": "text", "text": {"content": f"Without alt: {metadata.images_without_alt}"}},
]}
})
# Hreflang Section (if present)
if metadata.hreflang_tags:
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Hreflang Tags"}}]}
})
for tag in metadata.hreflang_tags[:10]:
children.append({
"object": "block",
"type": "bulleted_list_item",
"bulleted_list_item": {"rich_text": [
{"type": "text", "text": {"content": f"{tag['lang']}: "}},
{"type": "text", "text": {"content": tag['url'], "link": {"url": tag['url']}}},
]}
})
# Issues & Warnings Section
if metadata.issues or metadata.warnings:
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Issues & Warnings"}}]}
})
for issue in metadata.issues:
children.append({
"object": "block",
"type": "to_do",
"to_do": {
"rich_text": [
{"type": "text", "text": {"content": "❌ "}, "annotations": {"bold": True}},
{"type": "text", "text": {"content": issue}},
],
"checked": False,
}
})
for warning in metadata.warnings:
children.append({
"object": "block",
"type": "to_do",
"to_do": {
"rich_text": [
{"type": "text", "text": {"content": "⚠️ "}, "annotations": {"bold": True}},
{"type": "text", "text": {"content": warning}},
],
"checked": False,
}
})
return children
def _create_crawl_summary_page(self, result: CrawlResult) -> str | None:
"""Create a summary page for the crawl."""
try:
site_domain = urlparse(result.site).netloc
# Calculate statistics
total_issues = sum(len(p.issues) for p in result.pages_analyzed)
total_warnings = sum(len(p.warnings) for p in result.pages_analyzed)
pages_with_issues = sum(1 for p in result.pages_analyzed if p.issues)
pages_without_schema = sum(1 for p in result.pages_analyzed if not p.schema_types_found)
pages_without_description = sum(1 for p in result.pages_analyzed if not p.meta_description)
children = []
# Header callout
children.append({
"object": "block",
"type": "callout",
"callout": {
"rich_text": [
{"type": "text", "text": {"content": f"Sitemap Crawl Complete\n\n"}},
{"type": "text", "text": {"content": f"Audit ID: {result.audit_id}\n"}},
{"type": "text", "text": {"content": f"Duration: {result.get_duration()}\n"}},
{"type": "text", "text": {"content": f"Pages: {result.successful_pages}/{result.total_pages}"}},
],
"icon": {"type": "emoji", "emoji": "📊"},
"color": "blue_background",
}
})
# Statistics table
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Statistics"}}]}
})
stats_rows = [
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Metric"}, "annotations": {"bold": True}}],
[{"type": "text", "text": {"content": "Count"}, "annotations": {"bold": True}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Total Pages"}}],
[{"type": "text", "text": {"content": str(result.total_pages)}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Successfully Analyzed"}}],
[{"type": "text", "text": {"content": str(result.successful_pages)}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Pages with Issues"}}],
[{"type": "text", "text": {"content": str(pages_with_issues)}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Total Issues"}}],
[{"type": "text", "text": {"content": str(total_issues)}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Total Warnings"}}],
[{"type": "text", "text": {"content": str(total_warnings)}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Pages without Schema"}}],
[{"type": "text", "text": {"content": str(pages_without_schema)}}],
]}},
{"type": "table_row", "table_row": {"cells": [
[{"type": "text", "text": {"content": "Pages without Description"}}],
[{"type": "text", "text": {"content": str(pages_without_description)}}],
]}},
]
children.append({
"object": "block",
"type": "table",
"table": {
"table_width": 2,
"has_column_header": True,
"has_row_header": False,
"children": stats_rows
}
})
# Pages list
children.append({
"object": "block",
"type": "heading_2",
"heading_2": {"rich_text": [{"type": "text", "text": {"content": "Analyzed Pages"}}]}
})
children.append({
"object": "block",
"type": "paragraph",
"paragraph": {"rich_text": [
{"type": "text", "text": {"content": f"Filter by Audit ID in the database to see all {result.successful_pages} page entries."}}
]}
})
# Create the summary page
response = self.notion.pages.create(
parent={"database_id": self.database_id},
properties={
"Issue": {"title": [{"text": {"content": f"📊 Sitemap Crawl: {site_domain}"}}]},
"Category": {"select": {"name": "Technical SEO"}},
"Priority": {"select": {"name": "High"}},
"Site": {"url": result.site},
"Audit ID": {"rich_text": [{"text": {"content": result.audit_id}}]},
"Found Date": {"date": {"start": datetime.now().strftime("%Y-%m-%d")}},
},
children=children,
)
logger.info(f"Created crawl summary page: {response['id']}")
return response["id"]
except Exception as e:
logger.error(f"Failed to create summary page: {e}")
return None
def print_progress_status(progress: CrawlProgress) -> None:
"""Print formatted progress status."""
    status_emoji = {
        "running": "🔄",
        "completed": "✅",
        "failed": "❌",
    }.get(progress.status, "❓")
print(f"""
{'=' * 60}
{status_emoji} SEO Page Analysis - {progress.status.upper()}
{'=' * 60}
Audit ID: {progress.audit_id}
Site: {progress.site}
Status: {progress.status}
Progress: {progress.processed_urls}/{progress.total_urls} pages ({progress.get_progress_percent():.1f}%)
Successful: {progress.successful_urls}
Failed: {progress.failed_urls}
Elapsed: {progress.get_elapsed_time()}
ETA: {progress.get_eta() if progress.status == 'running' else 'N/A'}
Current URL: {progress.current_url[:60] + '...' if len(progress.current_url) > 60 else progress.current_url}
""")
if progress.summary_page_id:
print(f"Summary: https://www.notion.so/{progress.summary_page_id.replace('-', '')}")
if progress.error_message:
print(f"Error: {progress.error_message}")
print("=" * 60)
def main():
"""CLI entry point."""
import argparse
parser = argparse.ArgumentParser(description="Sitemap Crawler with Background Support")
subparsers = parser.add_subparsers(dest="command", help="Commands")
# Crawl command
crawl_parser = subparsers.add_parser("crawl", help="Start crawling a sitemap")
crawl_parser.add_argument("sitemap_url", help="URL of the sitemap to crawl")
crawl_parser.add_argument("--delay", "-d", type=float, default=DEFAULT_DELAY_SECONDS,
help=f"Delay between requests in seconds (default: {DEFAULT_DELAY_SECONDS})")
crawl_parser.add_argument("--max-pages", "-m", type=int, default=DEFAULT_MAX_PAGES,
help=f"Maximum pages to process (default: {DEFAULT_MAX_PAGES})")
crawl_parser.add_argument("--no-notion", action="store_true",
help="Don't save to Notion")
crawl_parser.add_argument("--no-limit", action="store_true",
help="Remove page limit (use with caution)")
# Status command
status_parser = subparsers.add_parser("status", help="Check crawl progress")
status_parser.add_argument("audit_id", nargs="?", help="Specific audit ID to check (optional)")
status_parser.add_argument("--all", "-a", action="store_true", help="Show all crawls (not just active)")
# List command
list_parser = subparsers.add_parser("list", help="List all crawl jobs")
args = parser.parse_args()
# Default to crawl if no command specified but URL provided
if args.command is None:
# Check if first positional arg looks like a URL
import sys
if len(sys.argv) > 1 and (sys.argv[1].startswith("http") or sys.argv[1].endswith(".xml")):
args.command = "crawl"
args.sitemap_url = sys.argv[1]
args.delay = DEFAULT_DELAY_SECONDS
args.max_pages = DEFAULT_MAX_PAGES
args.no_notion = False
args.no_limit = False
else:
parser.print_help()
return
if args.command == "status":
if args.audit_id:
# Show specific crawl status
progress = get_crawl_status(args.audit_id)
if progress:
print_progress_status(progress)
else:
print(f"No crawl found with audit ID: {args.audit_id}")
else:
# Show active crawls
if args.all:
crawls = get_all_crawls()
label = "All"
else:
crawls = get_active_crawls()
label = "Active"
if crawls:
print(f"\n{label} Crawl Jobs ({len(crawls)}):")
print("-" * 60)
for p in crawls:
status_emoji = {"running": "🔄", "completed": "", "failed": ""}.get(p.status, "")
print(f"{status_emoji} {p.audit_id}")
print(f" Site: {p.site}")
print(f" Progress: {p.processed_urls}/{p.total_urls} ({p.get_progress_percent():.1f}%)")
print()
else:
print(f"No {label.lower()} crawl jobs found.")
return
if args.command == "list":
crawls = get_all_crawls()
if crawls:
print(f"\nAll Crawl Jobs ({len(crawls)}):")
print("-" * 80)
print(f"{'Status':<10} {'Audit ID':<45} {'Progress':<15}")
print("-" * 80)
for p in crawls[:20]: # Show last 20
status_emoji = {"running": "🔄", "completed": "", "failed": ""}.get(p.status, "")
progress_str = f"{p.processed_urls}/{p.total_urls}"
print(f"{status_emoji} {p.status:<7} {p.audit_id:<45} {progress_str:<15}")
if len(crawls) > 20:
print(f"... and {len(crawls) - 20} more")
else:
print("No crawl jobs found.")
return
if args.command == "crawl":
# Handle --no-limit option
max_pages = args.max_pages
if args.no_limit:
max_pages = 999999 # Effectively unlimited
print("⚠️ WARNING: Page limit disabled. This may take a very long time!")
def progress_callback(progress: CrawlProgress):
pct = progress.get_progress_percent()
print(f"\r[{pct:5.1f}%] {progress.processed_urls}/{progress.total_urls} pages | "
f"Success: {progress.successful_urls} | Failed: {progress.failed_urls} | "
f"ETA: {progress.get_eta()}", end="", flush=True)
crawler = SitemapCrawler()
result = crawler.crawl_sitemap(
args.sitemap_url,
delay=args.delay,
max_pages=max_pages,
progress_callback=progress_callback,
save_to_notion=not args.no_notion,
)
print() # New line after progress
print()
print("=" * 60)
print("CRAWL COMPLETE")
print("=" * 60)
print(f"Audit ID: {result.audit_id}")
print(f"Total Pages: {result.total_pages}")
print(f"Successful: {result.successful_pages}")
print(f"Failed: {result.failed_pages}")
print(f"Duration: {result.get_duration()}")
if result.summary_page_id:
print(f"Summary Page: https://www.notion.so/{result.summary_page_id.replace('-', '')}")
if __name__ == "__main__":
main()


@@ -0,0 +1,467 @@
"""
Sitemap Validator - Validate XML sitemaps
==========================================
Purpose: Parse and validate XML sitemaps for SEO compliance
Python: 3.10+
Usage:
python sitemap_validator.py --url https://example.com/sitemap.xml
"""
import argparse
import asyncio
import gzip
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from urllib.parse import urljoin, urlparse
import aiohttp
import requests
from lxml import etree
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
@dataclass
class SitemapIssue:
"""Represents a sitemap validation issue."""
severity: str # "error", "warning", "info"
message: str
url: str | None = None
suggestion: str | None = None
@dataclass
class SitemapEntry:
"""Represents a single URL entry in sitemap."""
loc: str
lastmod: str | None = None
changefreq: str | None = None
priority: float | None = None
status_code: int | None = None
@dataclass
class SitemapResult:
"""Complete sitemap validation result."""
url: str
sitemap_type: str # "urlset" or "sitemapindex"
entries: list[SitemapEntry] = field(default_factory=list)
child_sitemaps: list[str] = field(default_factory=list)
issues: list[SitemapIssue] = field(default_factory=list)
valid: bool = True
stats: dict = field(default_factory=dict)
timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
def to_dict(self) -> dict:
"""Convert to dictionary for JSON output."""
return {
"url": self.url,
"sitemap_type": self.sitemap_type,
"valid": self.valid,
"stats": self.stats,
"issues": [
{
"severity": i.severity,
"message": i.message,
"url": i.url,
"suggestion": i.suggestion,
}
for i in self.issues
],
"entries_count": len(self.entries),
"child_sitemaps": self.child_sitemaps,
"timestamp": self.timestamp,
}
class SitemapValidator:
"""Validate XML sitemaps."""
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000
MAX_SIZE_BYTES = 50 * 1024 * 1024 # 50MB
VALID_CHANGEFREQ = {
"always", "hourly", "daily", "weekly",
"monthly", "yearly", "never"
}
def __init__(self, check_urls: bool = False, max_concurrent: int = 10):
self.check_urls = check_urls
self.max_concurrent = max_concurrent
self.session = requests.Session()
self.session.headers.update({
"User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
})
def fetch_sitemap(self, url: str) -> tuple[bytes, bool]:
"""Fetch sitemap content, handling gzip compression."""
try:
response = self.session.get(url, timeout=30)
response.raise_for_status()
content = response.content
is_gzipped = False
# Check if gzipped
if url.endswith(".gz") or response.headers.get(
"Content-Encoding"
) == "gzip":
try:
content = gzip.decompress(content)
is_gzipped = True
except gzip.BadGzipFile:
pass
return content, is_gzipped
except requests.RequestException as e:
            raise RuntimeError(f"Failed to fetch sitemap: {e}") from e
def parse_sitemap(self, content: bytes) -> tuple[str, list[dict]]:
"""Parse sitemap XML content."""
try:
root = etree.fromstring(content)
except etree.XMLSyntaxError as e:
            raise ValueError(f"Invalid XML: {e}") from e
# Remove namespace for easier parsing
nsmap = {"sm": self.SITEMAP_NS}
# Check if it's a sitemap index or urlset
if root.tag == f"{{{self.SITEMAP_NS}}}sitemapindex":
sitemap_type = "sitemapindex"
entries = []
for sitemap in root.findall("sm:sitemap", nsmap):
entry = {}
loc = sitemap.find("sm:loc", nsmap)
if loc is not None and loc.text:
entry["loc"] = loc.text.strip()
lastmod = sitemap.find("sm:lastmod", nsmap)
if lastmod is not None and lastmod.text:
entry["lastmod"] = lastmod.text.strip()
if entry.get("loc"):
entries.append(entry)
elif root.tag == f"{{{self.SITEMAP_NS}}}urlset":
sitemap_type = "urlset"
entries = []
for url in root.findall("sm:url", nsmap):
entry = {}
loc = url.find("sm:loc", nsmap)
if loc is not None and loc.text:
entry["loc"] = loc.text.strip()
lastmod = url.find("sm:lastmod", nsmap)
if lastmod is not None and lastmod.text:
entry["lastmod"] = lastmod.text.strip()
changefreq = url.find("sm:changefreq", nsmap)
if changefreq is not None and changefreq.text:
entry["changefreq"] = changefreq.text.strip().lower()
priority = url.find("sm:priority", nsmap)
if priority is not None and priority.text:
try:
entry["priority"] = float(priority.text.strip())
except ValueError:
entry["priority"] = None
if entry.get("loc"):
entries.append(entry)
else:
raise ValueError(f"Unknown sitemap type: {root.tag}")
return sitemap_type, entries
def validate(self, url: str) -> SitemapResult:
"""Validate a sitemap URL."""
result = SitemapResult(url=url, sitemap_type="unknown")
# Fetch sitemap
try:
content, is_gzipped = self.fetch_sitemap(url)
except RuntimeError as e:
result.issues.append(SitemapIssue(
severity="error",
message=str(e),
url=url,
))
result.valid = False
return result
# Check size
if len(content) > self.MAX_SIZE_BYTES:
result.issues.append(SitemapIssue(
severity="error",
message=f"Sitemap exceeds 50MB limit ({len(content) / 1024 / 1024:.2f}MB)",
url=url,
suggestion="Split sitemap into smaller files using sitemap index",
))
# Parse XML
try:
sitemap_type, entries = self.parse_sitemap(content)
except ValueError as e:
result.issues.append(SitemapIssue(
severity="error",
message=str(e),
url=url,
))
result.valid = False
return result
result.sitemap_type = sitemap_type
# Process entries
if sitemap_type == "sitemapindex":
result.child_sitemaps = [e["loc"] for e in entries]
result.stats = {
"child_sitemaps_count": len(entries),
}
else:
# Validate URL entries
url_count = len(entries)
result.stats["url_count"] = url_count
if url_count > self.MAX_URLS:
result.issues.append(SitemapIssue(
severity="error",
message=f"Sitemap exceeds 50,000 URL limit ({url_count} URLs)",
url=url,
suggestion="Split into multiple sitemaps with sitemap index",
))
if url_count == 0:
result.issues.append(SitemapIssue(
severity="warning",
message="Sitemap is empty (no URLs)",
url=url,
))
# Validate individual entries
seen_urls = set()
invalid_lastmod = 0
invalid_changefreq = 0
invalid_priority = 0
for entry in entries:
loc = entry.get("loc", "")
# Check for duplicates
if loc in seen_urls:
result.issues.append(SitemapIssue(
severity="warning",
message="Duplicate URL in sitemap",
url=loc,
))
seen_urls.add(loc)
# Validate lastmod format
lastmod = entry.get("lastmod")
if lastmod:
if not self._validate_date(lastmod):
invalid_lastmod += 1
# Validate changefreq
changefreq = entry.get("changefreq")
if changefreq and changefreq not in self.VALID_CHANGEFREQ:
invalid_changefreq += 1
# Validate priority
priority = entry.get("priority")
if priority is not None:
if not (0.0 <= priority <= 1.0):
invalid_priority += 1
# Create entry object
result.entries.append(SitemapEntry(
loc=loc,
lastmod=lastmod,
changefreq=changefreq,
priority=priority,
))
# Add summary issues
if invalid_lastmod > 0:
result.issues.append(SitemapIssue(
severity="warning",
message=f"{invalid_lastmod} URLs with invalid lastmod format",
suggestion="Use ISO 8601 format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+TZ)",
))
if invalid_changefreq > 0:
result.issues.append(SitemapIssue(
severity="info",
message=f"{invalid_changefreq} URLs with invalid changefreq",
suggestion="Use: always, hourly, daily, weekly, monthly, yearly, never",
))
if invalid_priority > 0:
result.issues.append(SitemapIssue(
severity="warning",
message=f"{invalid_priority} URLs with invalid priority (must be 0.0-1.0)",
))
result.stats.update({
"invalid_lastmod": invalid_lastmod,
"invalid_changefreq": invalid_changefreq,
"invalid_priority": invalid_priority,
"has_lastmod": sum(1 for e in result.entries if e.lastmod),
"has_changefreq": sum(1 for e in result.entries if e.changefreq),
"has_priority": sum(1 for e in result.entries if e.priority is not None),
})
# Check URLs if requested
if self.check_urls and result.entries:
asyncio.run(self._check_url_status(result))
# Determine validity
result.valid = not any(i.severity == "error" for i in result.issues)
return result
def _validate_date(self, date_str: str) -> bool:
"""Validate ISO 8601 date format."""
patterns = [
r"^\d{4}-\d{2}-\d{2}$",
r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",
]
return any(re.match(p, date_str) for p in patterns)
    async def _check_url_status(self, result: SitemapResult) -> None:
        """Check HTTP status of URLs in sitemap (first 100 entries only)."""
        semaphore = asyncio.Semaphore(self.max_concurrent)
        # Reuse a single session for all checks instead of opening one per URL
        async with aiohttp.ClientSession() as session:
            async def check_url(entry: SitemapEntry) -> None:
                async with semaphore:
                    try:
                        async with session.head(
                            entry.loc,
                            timeout=aiohttp.ClientTimeout(total=10),
                            allow_redirects=True,
                        ) as response:
                            entry.status_code = response.status
                    except Exception:
                        entry.status_code = 0  # DNS failure, timeout, etc.
            await asyncio.gather(*[check_url(e) for e in result.entries[:100]])
# Count status codes
status_counts = {}
for entry in result.entries:
if entry.status_code:
status_counts[entry.status_code] = (
status_counts.get(entry.status_code, 0) + 1
)
result.stats["url_status_codes"] = status_counts
# Add issues for non-200 URLs
error_count = sum(
1 for e in result.entries
if e.status_code and e.status_code >= 400
)
if error_count > 0:
result.issues.append(SitemapIssue(
severity="warning",
message=f"{error_count} URLs returning error status codes (4xx/5xx)",
suggestion="Remove or fix broken URLs in sitemap",
))
def generate_report(self, result: SitemapResult) -> str:
"""Generate human-readable validation report."""
lines = [
"=" * 60,
"Sitemap Validation Report",
"=" * 60,
f"URL: {result.url}",
f"Type: {result.sitemap_type}",
f"Valid: {'Yes' if result.valid else 'No'}",
f"Timestamp: {result.timestamp}",
"",
]
lines.append("Statistics:")
for key, value in result.stats.items():
lines.append(f" {key}: {value}")
lines.append("")
if result.child_sitemaps:
lines.append(f"Child Sitemaps ({len(result.child_sitemaps)}):")
for sitemap in result.child_sitemaps[:10]:
lines.append(f" - {sitemap}")
if len(result.child_sitemaps) > 10:
lines.append(f" ... and {len(result.child_sitemaps) - 10} more")
lines.append("")
if result.issues:
lines.append("Issues Found:")
errors = [i for i in result.issues if i.severity == "error"]
warnings = [i for i in result.issues if i.severity == "warning"]
infos = [i for i in result.issues if i.severity == "info"]
if errors:
lines.append(f"\n ERRORS ({len(errors)}):")
for issue in errors:
lines.append(f" - {issue.message}")
if issue.url:
lines.append(f" URL: {issue.url}")
if issue.suggestion:
lines.append(f" Suggestion: {issue.suggestion}")
if warnings:
lines.append(f"\n WARNINGS ({len(warnings)}):")
for issue in warnings:
lines.append(f" - {issue.message}")
if issue.suggestion:
lines.append(f" Suggestion: {issue.suggestion}")
if infos:
lines.append(f"\n INFO ({len(infos)}):")
for issue in infos:
lines.append(f" - {issue.message}")
lines.append("")
lines.append("=" * 60)
return "\n".join(lines)
def main():
"""Main entry point for CLI usage."""
parser = argparse.ArgumentParser(
description="Validate XML sitemaps",
)
parser.add_argument("--url", "-u", required=True, help="Sitemap URL to validate")
parser.add_argument("--check-urls", action="store_true",
help="Check HTTP status of URLs (slower)")
parser.add_argument("--output", "-o", help="Output file for JSON report")
parser.add_argument("--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
validator = SitemapValidator(check_urls=args.check_urls)
result = validator.validate(args.url)
if args.json or args.output:
output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
if args.output:
with open(args.output, "w", encoding="utf-8") as f:
f.write(output)
logger.info(f"Report written to {args.output}")
else:
print(output)
else:
print(validator.generate_report(result))
if __name__ == "__main__":
main()


@@ -0,0 +1,94 @@
---
name: seo-technical-audit
version: 1.0.0
description: Technical SEO auditor for crawlability fundamentals. Triggers: robots.txt, sitemap validation, crawlability, indexing check, technical SEO.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*, mcp__notion__*
---
# SEO Technical Audit
## Purpose
Analyze crawlability fundamentals: robots.txt rules, XML sitemap structure, and URL accessibility. Identify issues blocking search engine crawlers.
## Core Capabilities
1. **Robots.txt Analysis** - Parse rules, check blocked resources
2. **Sitemap Validation** - Verify XML structure, URL limits, dates
3. **URL Accessibility** - Check HTTP status, redirects, broken links
## MCP Tool Usage
### Firecrawl for Page Data
```
mcp__firecrawl__scrape: Fetch robots.txt and sitemap content
mcp__firecrawl__crawl: Check accessibility of multiple URLs
```
### Perplexity for Best Practices
```
mcp__perplexity__search: Research current SEO recommendations
```
## Workflow
### 1. Robots.txt Check
1. Fetch `[domain]/robots.txt` using Firecrawl
2. Parse User-agent rules and Disallow patterns
3. Identify blocked resources (CSS, JS, images)
4. Check for Sitemap declarations
5. Report critical issues
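The robots.txt steps above can be sketched with the standard library's `urllib.robotparser`; `parse_robots` and the sample paths are illustrative, not part of this skill's tooling:

```python
from urllib.robotparser import RobotFileParser

def parse_robots(robots_txt: str, base: str) -> dict:
    """Parse robots.txt content and report crawlability basics."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {
        # Can Googlebot fetch the homepage?
        "homepage_allowed": rp.can_fetch("Googlebot", base + "/"),
        # Is an example admin path blocked?
        "admin_blocked": not rp.can_fetch("Googlebot", base + "/admin/"),
        # Sitemap: directives declared in the file (None when absent)
        "sitemaps": rp.site_maps() or [],
    }

rules = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml"
print(parse_robots(rules, "https://example.com"))
```

`RobotFileParser.read()` can fetch the file directly when network access is available; parsing already-fetched content (e.g. from Firecrawl) keeps the check testable offline.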
### 2. Sitemap Validation
1. Locate sitemap (from robots.txt or `/sitemap.xml`)
2. Validate XML syntax
3. Check URL count (max 50,000)
4. Verify lastmod date formats
5. For sitemap index: parse child sitemaps
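Steps 2–4 can be sketched offline with the standard library; `quick_validate` is an illustrative helper, not a shipped script:

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def quick_validate(xml_bytes: bytes) -> dict:
    """Minimal urlset checks: XML syntax, URL count, lastmod format."""
    root = ET.fromstring(xml_bytes)  # raises ParseError on invalid XML
    urls = root.findall(f"{NS}url")
    bad_lastmod = [
        lm.text
        for u in urls
        if (lm := u.find(f"{NS}lastmod")) is not None
        and not re.match(r"^\d{4}-\d{2}-\d{2}", lm.text or "")
    ]
    return {
        "url_count": len(urls),
        "over_limit": len(urls) > 50_000,  # protocol cap per sitemap file
        "bad_lastmod": bad_lastmod,
    }
```

Sitemap index files use `<sitemap>` children instead of `<url>`, so step 5 would branch on the root tag before applying these checks to each child.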
### 3. URL Accessibility Sampling
1. Extract URLs from sitemap
2. Sample 50-100 URLs for large sites
3. Check HTTP status codes
4. Identify redirects and broken links
5. Report 4xx/5xx errors
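The sampling step can be sketched with the standard library; `sample_status` is an illustrative helper, and real runs should throttle requests:

```python
import random
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def sample_status(urls: list[str], sample_size: int = 50) -> dict[int, int]:
    """HEAD-check a random sample of URLs; return {status_code: count}."""
    counts: dict[int, int] = {}
    for url in random.sample(urls, min(sample_size, len(urls))):
        try:
            with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
                code = resp.status
        except HTTPError as e:
            code = e.code  # 4xx/5xx responses still carry a status code
        except URLError:
            code = 0  # DNS failure, timeout, connection refused
        counts[code] = counts.get(code, 0) + 1
    return counts
```

Some servers reject HEAD requests; falling back to GET on a 405 response is a common refinement.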
## Output Format
```markdown
## Technical SEO Audit: [domain]
### Robots.txt Analysis
- Status: [Valid/Invalid/Missing]
- Sitemap declared: [Yes/No]
- Critical blocks: [List]
### Sitemap Validation
- URLs found: [count]
- Syntax: [Valid/Errors]
- Issues: [List]
### URL Accessibility (sampled)
- Checked: [count] URLs
- Success (2xx): [count]
- Redirects (3xx): [count]
- Errors (4xx/5xx): [count]
### Recommendations
1. [Priority fixes]
```
## Common Issues
| Issue | Impact | Fix |
|-------|--------|-----|
| No sitemap in robots.txt | Medium | Add `Sitemap:` directive |
| Blocking CSS/JS | High | Allow Googlebot access |
| 404s in sitemap | High | Remove or fix URLs |
| Missing lastmod | Low | Add dates for freshness signals |
## Limitations
- Cannot access password-protected sitemaps
- Large sitemaps (10,000+ URLs) require sampling
- Does not check render-blocking issues (use Core Web Vitals skill)


@@ -0,0 +1,107 @@
# CLAUDE.md
## Overview
On-page SEO analyzer for single-page optimization: meta tags, headings, links, images, and Open Graph data.
## Quick Start
```bash
pip install -r scripts/requirements.txt
python scripts/page_analyzer.py --url https://example.com
```
## Scripts
| Script | Purpose |
|--------|---------|
| `page_analyzer.py` | Analyze on-page SEO elements |
| `base_client.py` | Shared utilities |
## Usage
```bash
# Full page analysis
python scripts/page_analyzer.py --url https://example.com
# JSON output
python scripts/page_analyzer.py --url https://example.com --json
# Analyze multiple pages
python scripts/page_analyzer.py --urls urls.txt
```
## Analysis Categories
### Meta Tags
- Title tag (length, keywords)
- Meta description (length, call-to-action)
- Canonical URL
- Robots meta tag
### Heading Structure
- H1 presence and count
- Heading hierarchy (H1→H6)
- Keyword placement in headings
### Links
- Internal link count
- External link count
- Broken links (4xx/5xx)
- Nofollow distribution
### Images
- Alt attribute presence
- Image file sizes
- Lazy loading implementation
### Open Graph / Social
- OG title, description, image
- Twitter Card tags
- Social sharing preview
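The meta-tag and Open Graph categories above reduce to attribute scraping. A minimal stdlib sketch of the idea (the shipped `page_analyzer.py` uses BeautifulSoup; `MetaExtractor` and `extract_meta` are illustrative names):

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect <title>, meta description, and Open Graph tags."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard tags use name=, Open Graph uses property=
            key = a.get("name") or a.get("property")
            if key and "content" in a:
                self.meta[key] = a["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_meta(html: str) -> dict:
    p = MetaExtractor()
    p.feed(html)
    return {"title": p.title, "title_length": len(p.title), **p.meta}
```

Length checks (title 50–60 chars, description ~150) then become simple comparisons on the returned dict.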
## Output
```json
{
"url": "https://example.com",
"meta": {
"title": "Page Title",
"title_length": 55,
"description": "...",
"description_length": 150,
"canonical": "https://example.com"
},
"headings": {
"h1_count": 1,
"h1_text": ["Main Heading"],
"hierarchy_valid": true
},
"links": {
"internal": 25,
"external": 5,
"broken": []
},
"issues": []
}
```
## Common Issues
| Issue | Severity | Recommendation |
|-------|----------|----------------|
| Missing H1 | High | Add single H1 tag |
| Title too long (>60) | Medium | Shorten to 50-60 chars |
| No meta description | High | Add compelling description |
| Images without alt | Medium | Add descriptive alt text |
| Multiple H1 tags | Medium | Use single H1 only |
## Dependencies
```
lxml>=5.1.0
beautifulsoup4>=4.12.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
```


@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""
import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar
from dotenv import load_dotenv
from tenacity import (
retry,
stop_after_attempt,
wait_exponential,
retry_if_exception_type,
)
# Load environment variables
load_dotenv()
# Logging setup
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
T = TypeVar("T")
class RateLimiter:
"""Rate limiter using token bucket algorithm."""
def __init__(self, rate: float, per: float = 1.0):
"""
Initialize rate limiter.
Args:
rate: Number of requests allowed
per: Time period in seconds (default: 1 second)
"""
self.rate = rate
self.per = per
self.tokens = rate
self.last_update = datetime.now()
self._lock = asyncio.Lock()
async def acquire(self) -> None:
"""Acquire a token, waiting if necessary."""
async with self._lock:
now = datetime.now()
elapsed = (now - self.last_update).total_seconds()
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
self.last_update = now
if self.tokens < 1:
wait_time = (1 - self.tokens) * (self.per / self.rate)
await asyncio.sleep(wait_time)
self.tokens = 0
else:
self.tokens -= 1
class BaseAsyncClient:
"""Base class for async API clients with rate limiting."""
def __init__(
self,
max_concurrent: int = 5,
requests_per_second: float = 3.0,
logger: logging.Logger | None = None,
):
"""
Initialize base client.
Args:
max_concurrent: Maximum concurrent requests
requests_per_second: Rate limit
logger: Logger instance
"""
self.semaphore = Semaphore(max_concurrent)
self.rate_limiter = RateLimiter(requests_per_second)
self.logger = logger or logging.getLogger(self.__class__.__name__)
self.stats = {
"requests": 0,
"success": 0,
"errors": 0,
"retries": 0,
}
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10),
retry=retry_if_exception_type(Exception),
)
async def _rate_limited_request(
self,
coro: Callable[[], Any],
) -> Any:
"""Execute a request with rate limiting and retry."""
async with self.semaphore:
await self.rate_limiter.acquire()
self.stats["requests"] += 1
try:
result = await coro()
self.stats["success"] += 1
return result
except Exception as e:
self.stats["errors"] += 1
self.logger.error(f"Request failed: {e}")
raise
async def batch_requests(
self,
requests: list[Callable[[], Any]],
desc: str = "Processing",
) -> list[Any]:
"""Execute multiple requests concurrently."""
try:
from tqdm.asyncio import tqdm
has_tqdm = True
except ImportError:
has_tqdm = False
async def execute(req: Callable) -> Any:
try:
return await self._rate_limited_request(req)
except Exception as e:
return {"error": str(e)}
tasks = [execute(req) for req in requests]
if has_tqdm:
results = []
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
result = await coro
results.append(result)
return results
else:
return await asyncio.gather(*tasks, return_exceptions=True)
def print_stats(self) -> None:
"""Print request statistics."""
self.logger.info("=" * 40)
self.logger.info("Request Statistics:")
self.logger.info(f" Total Requests: {self.stats['requests']}")
self.logger.info(f" Successful: {self.stats['success']}")
self.logger.info(f" Errors: {self.stats['errors']}")
self.logger.info("=" * 40)
class ConfigManager:
"""Manage API configuration and credentials."""
def __init__(self):
load_dotenv()
@property
def google_credentials_path(self) -> str | None:
"""Get Google service account credentials path."""
# Prefer SEO-specific credentials, fallback to general credentials
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
if os.path.exists(seo_creds):
return seo_creds
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
@property
def pagespeed_api_key(self) -> str | None:
"""Get PageSpeed Insights API key."""
return os.getenv("PAGESPEED_API_KEY")
@property
def custom_search_api_key(self) -> str | None:
"""Get Custom Search API key."""
return os.getenv("CUSTOM_SEARCH_API_KEY")
@property
def custom_search_engine_id(self) -> str | None:
"""Get Custom Search Engine ID."""
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
@property
def notion_token(self) -> str | None:
"""Get Notion API token."""
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
def validate_google_credentials(self) -> bool:
"""Validate Google credentials are configured."""
creds_path = self.google_credentials_path
if not creds_path:
return False
return os.path.exists(creds_path)
def get_required(self, key: str) -> str:
"""Get required environment variable or raise error."""
value = os.getenv(key)
if not value:
raise ValueError(f"Missing required environment variable: {key}")
return value
# Singleton config instance
config = ConfigManager()


@@ -0,0 +1,569 @@
"""
Page Analyzer - Extract SEO metadata from web pages
===================================================
Purpose: Comprehensive page-level SEO data extraction
Python: 3.10+
Usage:
from page_analyzer import PageAnalyzer, PageMetadata
analyzer = PageAnalyzer()
metadata = analyzer.analyze_url("https://example.com/page")
"""
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
@dataclass
class LinkData:
"""Represents a link found on a page."""
url: str
anchor_text: str
is_internal: bool
is_nofollow: bool = False
link_type: str = "body" # body, nav, footer, etc.
@dataclass
class HeadingData:
"""Represents a heading found on a page."""
level: int # 1-6
text: str
@dataclass
class SchemaData:
"""Represents schema.org structured data."""
schema_type: str
properties: dict
format: str = "json-ld" # json-ld, microdata, rdfa
@dataclass
class OpenGraphData:
"""Represents Open Graph metadata."""
og_title: str | None = None
og_description: str | None = None
og_image: str | None = None
og_url: str | None = None
og_type: str | None = None
og_site_name: str | None = None
og_locale: str | None = None
twitter_card: str | None = None
twitter_title: str | None = None
twitter_description: str | None = None
twitter_image: str | None = None
@dataclass
class PageMetadata:
"""Complete SEO metadata for a page."""
# Basic info
url: str
status_code: int = 0
content_type: str = ""
response_time_ms: float = 0
analyzed_at: datetime = field(default_factory=datetime.now)
# Meta tags
title: str | None = None
title_length: int = 0
meta_description: str | None = None
meta_description_length: int = 0
canonical_url: str | None = None
robots_meta: str | None = None
# Language
html_lang: str | None = None
hreflang_tags: list[dict] = field(default_factory=list) # [{"lang": "en", "url": "..."}]
# Headings
headings: list[HeadingData] = field(default_factory=list)
h1_count: int = 0
h1_text: str | None = None
# Open Graph & Social
open_graph: OpenGraphData = field(default_factory=OpenGraphData)
# Schema/Structured Data
schema_data: list[SchemaData] = field(default_factory=list)
schema_types_found: list[str] = field(default_factory=list)
# Links
internal_links: list[LinkData] = field(default_factory=list)
external_links: list[LinkData] = field(default_factory=list)
internal_link_count: int = 0
external_link_count: int = 0
# Images
images_total: int = 0
images_without_alt: int = 0
images_with_alt: int = 0
# Content metrics
word_count: int = 0
# Issues found
issues: list[str] = field(default_factory=list)
warnings: list[str] = field(default_factory=list)
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"url": self.url,
"status_code": self.status_code,
"content_type": self.content_type,
"response_time_ms": self.response_time_ms,
"analyzed_at": self.analyzed_at.isoformat(),
"title": self.title,
"title_length": self.title_length,
"meta_description": self.meta_description,
"meta_description_length": self.meta_description_length,
"canonical_url": self.canonical_url,
"robots_meta": self.robots_meta,
"html_lang": self.html_lang,
"hreflang_tags": self.hreflang_tags,
"h1_count": self.h1_count,
"h1_text": self.h1_text,
"headings_count": len(self.headings),
"schema_types_found": self.schema_types_found,
"internal_link_count": self.internal_link_count,
"external_link_count": self.external_link_count,
"images_total": self.images_total,
"images_without_alt": self.images_without_alt,
"word_count": self.word_count,
"issues": self.issues,
"warnings": self.warnings,
"open_graph": {
"og_title": self.open_graph.og_title,
"og_description": self.open_graph.og_description,
"og_image": self.open_graph.og_image,
"og_url": self.open_graph.og_url,
"og_type": self.open_graph.og_type,
},
}
def get_summary(self) -> str:
"""Get a brief summary of the page analysis."""
lines = [
f"URL: {self.url}",
f"Status: {self.status_code}",
f"Title: {self.title[:50] + '...' if self.title and len(self.title) > 50 else self.title}",
            f"Description: {'✓' if self.meta_description else '✗ Missing'}",
            f"Canonical: {'✓' if self.canonical_url else '✗ Missing'}",
f"H1: {self.h1_count} found",
f"Schema: {', '.join(self.schema_types_found) if self.schema_types_found else 'None'}",
f"Links: {self.internal_link_count} internal, {self.external_link_count} external",
f"Images: {self.images_total} total, {self.images_without_alt} without alt",
]
if self.issues:
lines.append(f"Issues: {len(self.issues)}")
return "\n".join(lines)
class PageAnalyzer:
"""Analyze web pages for SEO metadata."""
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; OurDigitalSEOBot/1.0; +https://ourdigital.org)"
def __init__(
self,
user_agent: str | None = None,
timeout: int = 30,
):
"""
Initialize page analyzer.
Args:
user_agent: Custom user agent string
timeout: Request timeout in seconds
"""
self.user_agent = user_agent or self.DEFAULT_USER_AGENT
self.timeout = timeout
self.session = requests.Session()
self.session.headers.update({
"User-Agent": self.user_agent,
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9,ko;q=0.8",
})
def analyze_url(self, url: str) -> PageMetadata:
"""
Analyze a URL and extract SEO metadata.
Args:
url: URL to analyze
Returns:
PageMetadata object with all extracted data
"""
metadata = PageMetadata(url=url)
try:
# Fetch page
start_time = datetime.now()
response = self.session.get(url, timeout=self.timeout, allow_redirects=True)
metadata.response_time_ms = (datetime.now() - start_time).total_seconds() * 1000
metadata.status_code = response.status_code
metadata.content_type = response.headers.get("Content-Type", "")
if response.status_code != 200:
metadata.issues.append(f"HTTP {response.status_code} status")
if response.status_code >= 400:
return metadata
# Parse HTML
soup = BeautifulSoup(response.text, "html.parser")
base_url = url
# Extract all metadata
self._extract_basic_meta(soup, metadata)
self._extract_canonical(soup, metadata, base_url)
self._extract_robots_meta(soup, metadata)
self._extract_hreflang(soup, metadata)
self._extract_headings(soup, metadata)
self._extract_open_graph(soup, metadata)
self._extract_schema(soup, metadata)
self._extract_links(soup, metadata, base_url)
self._extract_images(soup, metadata)
self._extract_content_metrics(soup, metadata)
# Run SEO checks
self._run_seo_checks(metadata)
except requests.RequestException as e:
metadata.issues.append(f"Request failed: {str(e)}")
logger.error(f"Failed to analyze {url}: {e}")
except Exception as e:
metadata.issues.append(f"Analysis error: {str(e)}")
logger.error(f"Error analyzing {url}: {e}")
return metadata
def _extract_basic_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract title and meta description."""
# Title
title_tag = soup.find("title")
if title_tag and title_tag.string:
metadata.title = title_tag.string.strip()
metadata.title_length = len(metadata.title)
# Meta description
desc_tag = soup.find("meta", attrs={"name": re.compile(r"^description$", re.I)})
if desc_tag and desc_tag.get("content"):
metadata.meta_description = desc_tag["content"].strip()
metadata.meta_description_length = len(metadata.meta_description)
# HTML lang
html_tag = soup.find("html")
if html_tag and html_tag.get("lang"):
metadata.html_lang = html_tag["lang"]
def _extract_canonical(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
"""Extract canonical URL."""
canonical = soup.find("link", rel="canonical")
if canonical and canonical.get("href"):
metadata.canonical_url = urljoin(base_url, canonical["href"])
def _extract_robots_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract robots meta tag."""
robots = soup.find("meta", attrs={"name": re.compile(r"^robots$", re.I)})
if robots and robots.get("content"):
metadata.robots_meta = robots["content"]
# Also check for googlebot-specific
googlebot = soup.find("meta", attrs={"name": re.compile(r"^googlebot$", re.I)})
if googlebot and googlebot.get("content"):
if metadata.robots_meta:
metadata.robots_meta += f" | googlebot: {googlebot['content']}"
else:
metadata.robots_meta = f"googlebot: {googlebot['content']}"
def _extract_hreflang(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract hreflang tags."""
hreflang_tags = soup.find_all("link", rel="alternate", hreflang=True)
for tag in hreflang_tags:
if tag.get("href") and tag.get("hreflang"):
metadata.hreflang_tags.append({
"lang": tag["hreflang"],
"url": tag["href"]
})
def _extract_headings(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract all headings."""
for level in range(1, 7):
for heading in soup.find_all(f"h{level}"):
text = heading.get_text(strip=True)
if text:
metadata.headings.append(HeadingData(level=level, text=text))
# Count H1s specifically
h1_tags = soup.find_all("h1")
metadata.h1_count = len(h1_tags)
if h1_tags:
metadata.h1_text = h1_tags[0].get_text(strip=True)
def _extract_open_graph(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract Open Graph and Twitter Card data."""
og = metadata.open_graph
# Open Graph tags
og_mappings = {
"og:title": "og_title",
"og:description": "og_description",
"og:image": "og_image",
"og:url": "og_url",
"og:type": "og_type",
"og:site_name": "og_site_name",
"og:locale": "og_locale",
}
for og_prop, attr_name in og_mappings.items():
tag = soup.find("meta", property=og_prop)
if tag and tag.get("content"):
setattr(og, attr_name, tag["content"])
# Twitter Card tags
twitter_mappings = {
"twitter:card": "twitter_card",
"twitter:title": "twitter_title",
"twitter:description": "twitter_description",
"twitter:image": "twitter_image",
}
for tw_name, attr_name in twitter_mappings.items():
tag = soup.find("meta", attrs={"name": tw_name})
if tag and tag.get("content"):
setattr(og, attr_name, tag["content"])
def _extract_schema(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract schema.org structured data."""
# JSON-LD
for script in soup.find_all("script", type="application/ld+json"):
try:
data = json.loads(script.string)
if isinstance(data, list):
for item in data:
self._process_schema_item(item, metadata, "json-ld")
else:
self._process_schema_item(data, metadata, "json-ld")
except (json.JSONDecodeError, TypeError):
continue
# Microdata (basic detection)
for item in soup.find_all(itemscope=True):
itemtype = item.get("itemtype", "")
if itemtype:
schema_type = itemtype.split("/")[-1]
if schema_type not in metadata.schema_types_found:
metadata.schema_types_found.append(schema_type)
metadata.schema_data.append(SchemaData(
schema_type=schema_type,
properties={},
format="microdata"
))
def _process_schema_item(self, data: dict, metadata: PageMetadata, format_type: str) -> None:
"""Process a single schema.org item."""
if not isinstance(data, dict):
return
schema_type = data.get("@type", "Unknown")
if isinstance(schema_type, list):
schema_type = schema_type[0] if schema_type else "Unknown"
if schema_type not in metadata.schema_types_found:
metadata.schema_types_found.append(schema_type)
metadata.schema_data.append(SchemaData(
schema_type=schema_type,
properties=data,
format=format_type
))
# Process nested @graph items
if "@graph" in data:
for item in data["@graph"]:
self._process_schema_item(item, metadata, format_type)
def _extract_links(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
"""Extract internal and external links."""
parsed_base = urlparse(base_url)
base_domain = parsed_base.netloc.lower()
for a_tag in soup.find_all("a", href=True):
href = a_tag["href"]
# Skip non-http links
if href.startswith(("#", "javascript:", "mailto:", "tel:")):
continue
# Resolve relative URLs
full_url = urljoin(base_url, href)
parsed_url = urlparse(full_url)
# Get anchor text
anchor_text = a_tag.get_text(strip=True)[:100] # Limit length
# Check if nofollow
rel = a_tag.get("rel", [])
if isinstance(rel, str):
rel = rel.split()
is_nofollow = "nofollow" in rel
# Determine if internal or external
link_domain = parsed_url.netloc.lower()
is_internal = (
link_domain == base_domain or
link_domain.endswith(f".{base_domain}") or
base_domain.endswith(f".{link_domain}")
)
link_data = LinkData(
url=full_url,
anchor_text=anchor_text,
is_internal=is_internal,
is_nofollow=is_nofollow,
)
if is_internal:
metadata.internal_links.append(link_data)
else:
metadata.external_links.append(link_data)
metadata.internal_link_count = len(metadata.internal_links)
metadata.external_link_count = len(metadata.external_links)
def _extract_images(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract image information."""
images = soup.find_all("img")
metadata.images_total = len(images)
for img in images:
alt = img.get("alt", "").strip()
if alt:
metadata.images_with_alt += 1
else:
metadata.images_without_alt += 1
def _extract_content_metrics(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
"""Extract content metrics like word count."""
# Remove script and style elements
for element in soup(["script", "style", "noscript"]):
element.decompose()
# Get text content
text = soup.get_text(separator=" ", strip=True)
words = text.split()
metadata.word_count = len(words)
def _run_seo_checks(self, metadata: PageMetadata) -> None:
"""Run SEO checks and add issues/warnings."""
# Title checks
if not metadata.title:
metadata.issues.append("Missing title tag")
elif metadata.title_length < 30:
metadata.warnings.append(f"Title too short ({metadata.title_length} chars, recommend 50-60)")
elif metadata.title_length > 60:
metadata.warnings.append(f"Title too long ({metadata.title_length} chars, recommend 50-60)")
# Meta description checks
if not metadata.meta_description:
metadata.issues.append("Missing meta description")
elif metadata.meta_description_length < 120:
metadata.warnings.append(f"Meta description too short ({metadata.meta_description_length} chars)")
elif metadata.meta_description_length > 160:
metadata.warnings.append(f"Meta description too long ({metadata.meta_description_length} chars)")
# Canonical check
if not metadata.canonical_url:
metadata.warnings.append("Missing canonical tag")
elif metadata.canonical_url != metadata.url:
metadata.warnings.append(f"Canonical points to different URL: {metadata.canonical_url}")
# H1 checks
if metadata.h1_count == 0:
metadata.issues.append("Missing H1 tag")
elif metadata.h1_count > 1:
metadata.warnings.append(f"Multiple H1 tags ({metadata.h1_count})")
# Image alt check
if metadata.images_without_alt > 0:
metadata.warnings.append(f"{metadata.images_without_alt} images missing alt text")
# Schema check
if not metadata.schema_types_found:
metadata.warnings.append("No structured data found")
# Open Graph check
if not metadata.open_graph.og_title:
metadata.warnings.append("Missing Open Graph tags")
# Robots meta check
if metadata.robots_meta:
robots_lower = metadata.robots_meta.lower()
if "noindex" in robots_lower:
metadata.issues.append("Page is set to noindex")
if "nofollow" in robots_lower:
metadata.warnings.append("Page is set to nofollow")
def main():
"""CLI entry point for testing."""
import argparse
parser = argparse.ArgumentParser(description="Page SEO Analyzer")
parser.add_argument("url", help="URL to analyze")
parser.add_argument("--json", "-j", action="store_true", help="Output as JSON")
args = parser.parse_args()
analyzer = PageAnalyzer()
metadata = analyzer.analyze_url(args.url)
if args.json:
print(json.dumps(metadata.to_dict(), indent=2, ensure_ascii=False))
else:
print("=" * 60)
print("PAGE ANALYSIS REPORT")
print("=" * 60)
print(metadata.get_summary())
print()
if metadata.issues:
print("ISSUES:")
for issue in metadata.issues:
                print(f"  ✗ {issue}")
if metadata.warnings:
print("\nWARNINGS:")
for warning in metadata.warnings:
                print(f"  ⚠ {warning}")
if metadata.hreflang_tags:
print(f"\nHREFLANG TAGS ({len(metadata.hreflang_tags)}):")
for tag in metadata.hreflang_tags[:5]:
print(f" {tag['lang']}: {tag['url']}")
if metadata.schema_types_found:
            print("\nSCHEMA TYPES:")
for schema_type in metadata.schema_types_found:
print(f" - {schema_type}")
if __name__ == "__main__":
main()
@@ -0,0 +1,6 @@
# 11-seo-on-page-audit dependencies
lxml>=5.1.0
beautifulsoup4>=4.12.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
@@ -0,0 +1,94 @@
---
name: seo-on-page-audit
version: 1.0.0
description: On-page SEO analyzer for meta tags, headings, links, images, and Open Graph. Triggers: on-page SEO, meta tags, title tag, heading structure, alt text.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*, mcp__notion__*
---
# SEO On-Page Audit
## Purpose
Analyze single-page SEO elements: meta tags, heading hierarchy, internal/external links, images, and social sharing tags.
## Core Capabilities
1. **Meta Tags** - Title, description, canonical, robots
2. **Headings** - H1-H6 structure and hierarchy
3. **Links** - Internal, external, broken detection
4. **Images** - Alt text, sizing, lazy loading
5. **Social** - Open Graph, Twitter Cards
## MCP Tool Usage
```
mcp__firecrawl__scrape: Extract page HTML and metadata
mcp__perplexity__search: Research SEO best practices
mcp__notion__create-page: Save audit findings
```
## Workflow
1. Scrape target URL with Firecrawl
2. Extract and analyze meta tags
3. Map heading hierarchy
4. Count and categorize links
5. Check image optimization
6. Validate Open Graph tags
7. Generate recommendations
## Checklist
### Meta Tags
- [ ] Title present (50-60 characters)
- [ ] Meta description present (150-160 characters)
- [ ] Canonical URL set
- [ ] Robots meta allows indexing
### Headings
- [ ] Single H1 tag
- [ ] Logical hierarchy (no skips)
- [ ] Keywords in H1
### Links
- [ ] No broken internal links
- [ ] External links use rel attributes
- [ ] Reasonable internal link count
### Images
- [ ] All images have alt text
- [ ] Images are appropriately sized
- [ ] Lazy loading implemented
### Open Graph
- [ ] og:title present
- [ ] og:description present
- [ ] og:image present (1200x630)
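The meta-tag and Open Graph items above can be sketched as a single check over already-fetched HTML. This is a minimal illustration using BeautifulSoup (the function name and length thresholds follow the checklist, not a fixed API):

```python
from bs4 import BeautifulSoup

def check_meta_tags(html: str) -> dict:
    """Check title/description/canonical/og:image presence per the checklist."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string.strip() if soup.title and soup.title.string else ""
    desc = soup.find("meta", attrs={"name": "description"})
    canonical = soup.find("link", rel="canonical")
    og_image = soup.find("meta", property="og:image")
    return {
        "title_ok": 50 <= len(title) <= 60,          # checklist: 50-60 chars
        "description_present": bool(desc and desc.get("content")),
        "canonical_present": bool(canonical and canonical.get("href")),
        "og_image_present": bool(og_image and og_image.get("content")),
    }
```

In the MCP workflow the HTML would come from `mcp__firecrawl__scrape` rather than a direct request.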
## Output Format
```markdown
## On-Page Audit: [URL]
### Meta Tags: X/5
| Element | Status | Value |
|---------|--------|-------|
### Headings: X/5
- H1: [text]
- Hierarchy: Valid/Invalid
### Links
- Internal: X
- External: X
- Broken: X
### Recommendations
1. [Priority fixes]
```
## Limitations
- Single page analysis only
- Cannot detect JavaScript-rendered content issues
- External link status requires additional crawl
@@ -0,0 +1,107 @@
# CLAUDE.md
## Overview
Local SEO auditor for businesses with physical locations: NAP consistency, Google Business Profile optimization, local citations, and LocalBusiness schema validation.
## Quick Start
This skill primarily uses MCP tools (Firecrawl, Perplexity) for data collection. Scripts are helpers for validation.
```bash
# NAP consistency check (manual data input)
python scripts/nap_checker.py --business "Business Name" --address "123 Main St" --phone "555-1234"
# LocalBusiness schema validation
python scripts/local_schema_validator.py --url https://example.com
```
## Audit Components
### 1. NAP Consistency
**Name, Address, Phone** consistency across:
- Website (header, footer, contact page)
- Google Business Profile
- Local directories (Yelp, Yellow Pages, etc.)
- Social media profiles
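Phone and address strings rarely match byte-for-byte across these sources, so a consistency check needs normalization first. A minimal sketch for phone numbers (the normalization rules here are illustrative assumptions, not a standard):

```python
import re

def normalize_phone(phone: str) -> str:
    """Reduce a phone string to bare digits, dropping a US "1" country code."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

def nap_phone_match(a: str, b: str) -> bool:
    """True if two directory listings share the same normalized phone."""
    return normalize_phone(a) == normalize_phone(b)
```

The same idea applies to addresses (case-folding, abbreviation expansion like "St" vs "Street"), though that normalization is messier.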
### 2. Google Business Profile (GBP)
Optimization checklist:
- [ ] Business name matches website
- [ ] Address is complete and accurate
- [ ] Phone number is local
- [ ] Business hours are current
- [ ] Categories are appropriate
- [ ] Photos uploaded (exterior, interior, products)
- [ ] Posts are recent (within 7 days)
- [ ] Reviews are responded to
### 3. Local Citations
Priority directories to check:
- Google Business Profile
- Apple Maps
- Bing Places
- Yelp
- Facebook Business
- Industry-specific directories
### 4. LocalBusiness Schema
Required properties:
- @type (LocalBusiness or subtype)
- name
- address (PostalAddress)
- telephone
- openingHours
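A sketch of valid LocalBusiness markup built in Python, with a check for the required properties above; the business details are placeholders:

```python
import json

local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Coffee",            # required
    "address": {                         # required (PostalAddress)
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
    },
    "telephone": "+1-555-123-4567",      # required
    "openingHours": "Mo-Fr 08:00-18:00", # required
}

REQUIRED = {"name", "address", "telephone", "openingHours"}

def missing_required(schema: dict) -> set[str]:
    """Required LocalBusiness properties absent from a parsed schema dict."""
    return REQUIRED - schema.keys()

# Serialize for embedding in a <script type="application/ld+json"> tag
markup = json.dumps(local_business, indent=2)
```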
## Workflow
```
1. Collect NAP from client
2. Scrape website for NAP mentions
3. Search citations using Perplexity
4. Check GBP data (manual or API)
5. Validate LocalBusiness schema
6. Generate consistency report
```
## Output Format
```markdown
## Local SEO Audit: [Business Name]
### NAP Consistency Score: X/10
| Source | Name | Address | Phone | Status |
|--------|------|---------|-------|--------|
| Website | ✓ | ✓ | ✓ | Match |
| GBP | ✓ | ✗ | ✓ | Mismatch |
### GBP Optimization: X/10
- [ ] Issue 1
- [x] Completed item
### Citation Audit
- Found: X citations
- Consistent: X
- Needs update: X
### Recommendations
1. Fix address mismatch on GBP
2. Add LocalBusiness schema
```
## Common Issues
| Issue | Impact | Fix |
|-------|--------|-----|
| NAP inconsistency | High | Update all directories |
| Missing GBP categories | Medium | Add relevant categories |
| No LocalBusiness schema | Medium | Add JSON-LD markup |
| Outdated business hours | Medium | Update GBP hours |
| No review responses | Low | Respond to all reviews |
## Notes
- GBP API requires enterprise approval (use manual audit)
- Citation discovery limited to public data
- Use schema generator skill (14) for creating LocalBusiness markup
@@ -0,0 +1,116 @@
---
name: seo-local-audit
version: 1.0.0
description: Local SEO auditor for NAP consistency, Google Business Profile, citations, and LocalBusiness schema. Triggers: local SEO, Google Business Profile, GBP, NAP, citations, local rankings.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*, mcp__notion__*
---
# SEO Local Audit
## Purpose
Audit local business SEO: NAP (Name, Address, Phone) consistency, Google Business Profile optimization, local citations, and LocalBusiness schema markup.
## Core Capabilities
1. **NAP Consistency** - Cross-platform verification
2. **GBP Optimization** - Profile completeness check
3. **Citation Audit** - Directory presence
4. **Schema Validation** - LocalBusiness markup
## MCP Tool Usage
```
mcp__firecrawl__scrape: Extract NAP from website
mcp__perplexity__search: Find citations and directories
mcp__notion__create-page: Save audit findings
```
## Workflow
### 1. Gather Business Info
Collect from client:
- Business name (exact)
- Full address
- Phone number (local preferred)
- Website URL
- GBP listing URL
### 2. Website NAP Check
Scrape website for NAP mentions:
- Header/footer
- Contact page
- About page
- Schema markup
### 3. Citation Discovery
Search for business mentions:
- "[Business Name] [City]"
- Phone number search
- Address search
### 4. GBP Review
Manual checklist:
- Profile completeness
- Category accuracy
- Photo presence
- Review responses
- Post recency
### 5. Schema Check
Validate LocalBusiness markup presence and accuracy.
## GBP Optimization Checklist
- [ ] Business name matches website
- [ ] Complete address with suite/unit
- [ ] Local phone number (not toll-free)
- [ ] Accurate business hours
- [ ] Primary + secondary categories set
- [ ] Business description complete
- [ ] 10+ photos uploaded
- [ ] Recent post (within 7 days)
- [ ] Reviews responded to
## Citation Priority
| Platform | Priority |
|----------|----------|
| Google Business Profile | Critical |
| Apple Maps | High |
| Bing Places | High |
| Yelp | High |
| Facebook | Medium |
| Industry directories | Medium |
## Output Format
```markdown
## Local SEO Audit: [Business]
### NAP Consistency: X/10
| Source | Name | Address | Phone |
|--------|------|---------|-------|
| Website | ✓/✗ | ✓/✗ | ✓/✗ |
| GBP | ✓/✗ | ✓/✗ | ✓/✗ |
### GBP Score: X/10
[Checklist results]
### Citations Found: X
- Consistent: X
- Inconsistent: X
### LocalBusiness Schema
- Present: Yes/No
- Valid: Yes/No
### Priority Actions
1. [Fix recommendations]
```
## Limitations
- GBP data requires manual access
- Citation discovery limited to searchable sources
- Cannot update external directories
@@ -0,0 +1,113 @@
# CLAUDE.md
## Overview
Structured data validator: extract, parse, and validate JSON-LD, Microdata, and RDFa markup against schema.org vocabulary.
## Quick Start
```bash
pip install -r scripts/requirements.txt
python scripts/schema_validator.py --url https://example.com
```
## Scripts
| Script | Purpose |
|--------|---------|
| `schema_validator.py` | Extract and validate structured data |
| `base_client.py` | Shared utilities |
## Usage
```bash
# Validate page schema
python scripts/schema_validator.py --url https://example.com
# JSON output
python scripts/schema_validator.py --url https://example.com --json
# Validate local file
python scripts/schema_validator.py --file schema.json
# Check Rich Results eligibility
python scripts/schema_validator.py --url https://example.com --rich-results
```
## Supported Formats
| Format | Detection |
|--------|-----------|
| JSON-LD | `<script type="application/ld+json">` |
| Microdata | `itemscope`, `itemtype`, `itemprop` |
| RDFa | `vocab`, `typeof`, `property` |
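The detection rules in this table reduce to three attribute/tag probes. A minimal sketch with BeautifulSoup (presence detection only; full extraction is what `extruct` does in the script):

```python
from bs4 import BeautifulSoup

def detect_formats(html: str) -> dict[str, bool]:
    """Report which structured-data syntaxes appear in a page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "json-ld": bool(soup.find("script", type="application/ld+json")),
        "microdata": bool(soup.find(itemscope=True)),
        "rdfa": bool(soup.find(attrs={"typeof": True})
                     or soup.find(attrs={"vocab": True})),
    }
```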
## Validation Levels
### 1. Syntax Validation
- Valid JSON structure
- Proper nesting
- No syntax errors
### 2. Schema.org Vocabulary
- Valid @type values
- Known properties
- Correct property types
### 3. Google Rich Results
- Required properties present
- Recommended properties
- Feature-specific requirements
## Schema Types Validated
| Type | Required Properties | Rich Result |
|------|---------------------|-------------|
| Article | headline, author, datePublished | Yes |
| Product | name, offers | Yes |
| LocalBusiness | name, address | Yes |
| FAQPage | mainEntity | Yes |
| Organization | name, url | Yes |
| BreadcrumbList | itemListElement | Yes |
| WebSite | name, url | Sitelinks |
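The required-property check behind this table reduces to a type-keyed lookup, mirroring the `REQUIRED_PROPERTIES` map in `scripts/schema_validator.py` (only a subset of types shown):

```python
# Subset of the table above: schema type -> required properties
REQUIRED = {
    "Article": ["headline", "author", "datePublished"],
    "Product": ["name", "offers"],
    "LocalBusiness": ["name", "address"],
    "FAQPage": ["mainEntity"],
}

def validate_required(schema: dict) -> list[str]:
    """Return required properties missing from a parsed JSON-LD object."""
    stype = schema.get("@type", "")
    return [p for p in REQUIRED.get(stype, []) if p not in schema]
```

Unknown types validate as empty rather than erroring, which matches the script's lenient handling of types outside its map.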
## Output
```json
{
"url": "https://example.com",
"schemas_found": 3,
"schemas": [
{
"@type": "Organization",
"valid": true,
"rich_results_eligible": true,
"issues": [],
"warnings": []
}
],
"summary": {
"valid": 3,
"invalid": 0,
"rich_results_eligible": 2
}
}
```
## Issue Severity
| Level | Description |
|-------|-------------|
| Error | Invalid schema, blocks rich results |
| Warning | Missing recommended property |
| Info | Optimization suggestion |
## Dependencies
```
extruct>=0.16.0
jsonschema>=4.21.0
rdflib>=7.0.0
lxml>=5.1.0
requests>=2.31.0
```
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""
import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar
from dotenv import load_dotenv
from tenacity import (
retry,
stop_after_attempt,
wait_exponential,
retry_if_exception_type,
)
# Load environment variables
load_dotenv()
# Logging setup
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
T = TypeVar("T")
class RateLimiter:
"""Rate limiter using token bucket algorithm."""
def __init__(self, rate: float, per: float = 1.0):
"""
Initialize rate limiter.
Args:
rate: Number of requests allowed
per: Time period in seconds (default: 1 second)
"""
self.rate = rate
self.per = per
self.tokens = rate
self.last_update = datetime.now()
self._lock = asyncio.Lock()
async def acquire(self) -> None:
"""Acquire a token, waiting if necessary."""
async with self._lock:
now = datetime.now()
elapsed = (now - self.last_update).total_seconds()
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
self.last_update = now
if self.tokens < 1:
wait_time = (1 - self.tokens) * (self.per / self.rate)
await asyncio.sleep(wait_time)
self.tokens = 0
else:
self.tokens -= 1
class BaseAsyncClient:
"""Base class for async API clients with rate limiting."""
def __init__(
self,
max_concurrent: int = 5,
requests_per_second: float = 3.0,
logger: logging.Logger | None = None,
):
"""
Initialize base client.
Args:
max_concurrent: Maximum concurrent requests
requests_per_second: Rate limit
logger: Logger instance
"""
self.semaphore = Semaphore(max_concurrent)
self.rate_limiter = RateLimiter(requests_per_second)
self.logger = logger or logging.getLogger(self.__class__.__name__)
self.stats = {
"requests": 0,
"success": 0,
"errors": 0,
"retries": 0,
}
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10),
retry=retry_if_exception_type(Exception),
)
async def _rate_limited_request(
self,
coro: Callable[[], Any],
) -> Any:
"""Execute a request with rate limiting and retry."""
async with self.semaphore:
await self.rate_limiter.acquire()
self.stats["requests"] += 1
try:
result = await coro()
self.stats["success"] += 1
return result
except Exception as e:
self.stats["errors"] += 1
self.logger.error(f"Request failed: {e}")
raise
async def batch_requests(
self,
requests: list[Callable[[], Any]],
desc: str = "Processing",
) -> list[Any]:
"""Execute multiple requests concurrently."""
try:
from tqdm.asyncio import tqdm
has_tqdm = True
except ImportError:
has_tqdm = False
async def execute(req: Callable) -> Any:
try:
return await self._rate_limited_request(req)
except Exception as e:
return {"error": str(e)}
tasks = [execute(req) for req in requests]
if has_tqdm:
results = []
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
result = await coro
results.append(result)
return results
else:
return await asyncio.gather(*tasks, return_exceptions=True)
def print_stats(self) -> None:
"""Print request statistics."""
self.logger.info("=" * 40)
self.logger.info("Request Statistics:")
self.logger.info(f" Total Requests: {self.stats['requests']}")
self.logger.info(f" Successful: {self.stats['success']}")
self.logger.info(f" Errors: {self.stats['errors']}")
self.logger.info("=" * 40)
class ConfigManager:
"""Manage API configuration and credentials."""
def __init__(self):
load_dotenv()
@property
def google_credentials_path(self) -> str | None:
"""Get Google service account credentials path."""
# Prefer SEO-specific credentials, fallback to general credentials
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
if os.path.exists(seo_creds):
return seo_creds
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
@property
def pagespeed_api_key(self) -> str | None:
"""Get PageSpeed Insights API key."""
return os.getenv("PAGESPEED_API_KEY")
@property
def custom_search_api_key(self) -> str | None:
"""Get Custom Search API key."""
return os.getenv("CUSTOM_SEARCH_API_KEY")
@property
def custom_search_engine_id(self) -> str | None:
"""Get Custom Search Engine ID."""
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
@property
def notion_token(self) -> str | None:
"""Get Notion API token."""
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
def validate_google_credentials(self) -> bool:
"""Validate Google credentials are configured."""
creds_path = self.google_credentials_path
if not creds_path:
return False
return os.path.exists(creds_path)
def get_required(self, key: str) -> str:
"""Get required environment variable or raise error."""
value = os.getenv(key)
if not value:
raise ValueError(f"Missing required environment variable: {key}")
return value
# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,9 @@
# 13-seo-schema-validator dependencies
extruct>=0.16.0
jsonschema>=4.21.0
rdflib>=7.0.0
lxml>=5.1.0
beautifulsoup4>=4.12.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
@@ -0,0 +1,498 @@
"""
Schema Validator - Validate JSON-LD structured data markup
==========================================================
Purpose: Extract and validate schema.org structured data from URLs or files
Python: 3.10+
Usage:
python schema_validator.py --url https://example.com
python schema_validator.py --file schema.json
"""
import argparse
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urlparse
import requests
from bs4 import BeautifulSoup
try:
import extruct
HAS_EXTRUCT = True
except ImportError:
HAS_EXTRUCT = False
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
@dataclass
class ValidationIssue:
"""Represents a validation issue found in schema."""
severity: str # "error", "warning", "info"
message: str
schema_type: str | None = None
property_name: str | None = None
suggestion: str | None = None
@dataclass
class ValidationResult:
"""Complete validation result for a schema."""
url: str | None = None
schemas_found: list[dict] = field(default_factory=list)
issues: list[ValidationIssue] = field(default_factory=list)
valid: bool = True
rich_results_eligible: dict = field(default_factory=dict)
timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
def to_dict(self) -> dict:
"""Convert to dictionary for JSON output."""
return {
"url": self.url,
"schemas_found": len(self.schemas_found),
"schema_types": [s.get("@type", "Unknown") for s in self.schemas_found],
"valid": self.valid,
"issues": [
{
"severity": i.severity,
"message": i.message,
"schema_type": i.schema_type,
"property": i.property_name,
"suggestion": i.suggestion,
}
for i in self.issues
],
"rich_results_eligible": self.rich_results_eligible,
"timestamp": self.timestamp,
}
class SchemaValidator:
"""Validate schema.org structured data."""
# Required properties for common schema types
REQUIRED_PROPERTIES = {
"Organization": ["name", "url"],
"LocalBusiness": ["name", "address"],
"Product": ["name"],
"Offer": ["price", "priceCurrency"],
"Article": ["headline", "author", "datePublished", "publisher"],
"BlogPosting": ["headline", "author", "datePublished", "publisher"],
"NewsArticle": ["headline", "author", "datePublished", "publisher"],
"FAQPage": ["mainEntity"],
"Question": ["name", "acceptedAnswer"],
"Answer": ["text"],
"BreadcrumbList": ["itemListElement"],
"ListItem": ["position", "name"],
"WebSite": ["name", "url"],
"WebPage": ["name"],
"Person": ["name"],
"Event": ["name", "startDate", "location"],
"Review": ["reviewRating", "author"],
"AggregateRating": ["ratingValue"],
"ImageObject": ["url"],
}
# Recommended (but not required) properties
RECOMMENDED_PROPERTIES = {
"Organization": ["logo", "description", "contactPoint", "sameAs"],
"LocalBusiness": ["telephone", "openingHoursSpecification", "geo", "image"],
"Product": ["description", "image", "brand", "offers", "aggregateRating"],
"Article": ["image", "dateModified", "description"],
"FAQPage": [],
"WebSite": ["potentialAction"],
"BreadcrumbList": [],
}
# Google Rich Results eligible types
RICH_RESULTS_TYPES = {
"Article", "BlogPosting", "NewsArticle",
"Product", "Review",
"FAQPage", "HowTo",
"LocalBusiness", "Restaurant",
"Event",
"Recipe",
"JobPosting",
"Course",
"BreadcrumbList",
"Organization",
"WebSite",
"VideoObject",
}
def __init__(self):
self.session = requests.Session()
self.session.headers.update({
"User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
})
def extract_from_url(self, url: str) -> list[dict]:
"""Extract all structured data from a URL."""
try:
response = self.session.get(url, timeout=30)
response.raise_for_status()
return self.extract_from_html(response.text, url)
except requests.RequestException as e:
logger.error(f"Failed to fetch URL: {e}")
return []
def extract_from_html(self, html: str, base_url: str | None = None) -> list[dict]:
"""Extract structured data from HTML content."""
schemas = []
# Method 1: Use extruct if available (handles JSON-LD, Microdata, RDFa)
if HAS_EXTRUCT:
try:
data = extruct.extract(html, base_url=base_url, uniform=True)
schemas.extend(data.get("json-ld", []))
schemas.extend(data.get("microdata", []))
schemas.extend(data.get("rdfa", []))
except Exception as e:
logger.warning(f"extruct extraction failed: {e}")
# Method 2: Manual JSON-LD extraction (fallback/additional)
soup = BeautifulSoup(html, "html.parser")
for script in soup.find_all("script", type="application/ld+json"):
try:
content = script.string
if content:
data = json.loads(content)
                    if isinstance(data, list):
                        schemas.extend(data)
                    elif isinstance(data, dict) and "@graph" in data:
                        # Unwrap @graph containers commonly emitted by CMS plugins
                        schemas.extend(data["@graph"])
                    else:
                        schemas.append(data)
except json.JSONDecodeError as e:
logger.warning(f"Invalid JSON-LD: {e}")
# Deduplicate schemas
seen = set()
unique_schemas = []
for schema in schemas:
schema_str = json.dumps(schema, sort_keys=True)
if schema_str not in seen:
seen.add(schema_str)
unique_schemas.append(schema)
return unique_schemas
def validate(self, url: str | None = None, html: str | None = None,
schema: dict | None = None) -> ValidationResult:
"""Validate schema from URL, HTML, or direct schema dict."""
result = ValidationResult(url=url)
# Extract schemas
if schema:
schemas = [schema]
elif html:
schemas = self.extract_from_html(html, url)
elif url:
schemas = self.extract_from_url(url)
else:
raise ValueError("Must provide url, html, or schema")
result.schemas_found = schemas
if not schemas:
result.issues.append(ValidationIssue(
severity="warning",
message="No structured data found",
suggestion="Add JSON-LD schema markup to improve SEO",
))
result.valid = False
return result
# Validate each schema
for schema in schemas:
self._validate_schema(schema, result)
# Check for errors (warnings don't affect validity)
result.valid = not any(i.severity == "error" for i in result.issues)
return result
def _validate_schema(self, schema: dict, result: ValidationResult,
parent_type: str | None = None) -> None:
"""Validate a single schema object."""
schema_type = schema.get("@type")
if not schema_type:
result.issues.append(ValidationIssue(
severity="error",
message="Missing @type property",
schema_type=parent_type,
))
return
# Handle array of types
if isinstance(schema_type, list):
schema_type = schema_type[0]
# Check required properties
required = self.REQUIRED_PROPERTIES.get(schema_type, [])
for prop in required:
if prop not in schema:
result.issues.append(ValidationIssue(
severity="error",
message=f"Missing required property: {prop}",
schema_type=schema_type,
property_name=prop,
suggestion=f"Add '{prop}' property to {schema_type} schema",
))
# Check recommended properties
recommended = self.RECOMMENDED_PROPERTIES.get(schema_type, [])
for prop in recommended:
if prop not in schema:
result.issues.append(ValidationIssue(
severity="info",
message=f"Missing recommended property: {prop}",
schema_type=schema_type,
property_name=prop,
suggestion=f"Consider adding '{prop}' for better rich results",
))
# Check Rich Results eligibility
if schema_type in self.RICH_RESULTS_TYPES:
result.rich_results_eligible[schema_type] = self._check_rich_results(
schema, schema_type
)
# Validate nested schemas
for key, value in schema.items():
if key.startswith("@"):
continue
if isinstance(value, dict) and "@type" in value:
self._validate_schema(value, result, schema_type)
elif isinstance(value, list):
for item in value:
if isinstance(item, dict) and "@type" in item:
self._validate_schema(item, result, schema_type)
# Type-specific validations
self._validate_type_specific(schema, schema_type, result)
def _validate_type_specific(self, schema: dict, schema_type: str,
result: ValidationResult) -> None:
"""Type-specific validation rules."""
if schema_type in ("Article", "BlogPosting", "NewsArticle"):
# Check image
if "image" not in schema:
result.issues.append(ValidationIssue(
severity="warning",
message="Article without image may not show in rich results",
schema_type=schema_type,
property_name="image",
suggestion="Add at least one image to the article",
))
# Check headline length
headline = schema.get("headline", "")
if len(headline) > 110:
result.issues.append(ValidationIssue(
severity="warning",
message=f"Headline too long ({len(headline)} chars, max 110)",
schema_type=schema_type,
property_name="headline",
))
elif schema_type == "Product":
offer = schema.get("offers", {})
if isinstance(offer, dict):
# Check price
price = offer.get("price")
if price is not None:
try:
float(price)
except (ValueError, TypeError):
result.issues.append(ValidationIssue(
severity="error",
message=f"Invalid price value: {price}",
schema_type="Offer",
property_name="price",
))
# Check availability
availability = offer.get("availability", "")
valid_availabilities = [
"InStock", "OutOfStock", "PreOrder", "Discontinued",
"https://schema.org/InStock", "https://schema.org/OutOfStock",
]
if availability and not any(
a in availability for a in valid_availabilities
):
result.issues.append(ValidationIssue(
severity="warning",
message=f"Unknown availability value: {availability}",
schema_type="Offer",
property_name="availability",
))
elif schema_type == "LocalBusiness":
# Check for geo coordinates
if "geo" not in schema:
result.issues.append(ValidationIssue(
severity="info",
message="Missing geo coordinates",
schema_type=schema_type,
property_name="geo",
suggestion="Add latitude/longitude for better local search",
))
elif schema_type == "FAQPage":
main_entity = schema.get("mainEntity", [])
if not main_entity:
result.issues.append(ValidationIssue(
severity="error",
message="FAQPage must have at least one question",
schema_type=schema_type,
property_name="mainEntity",
))
elif len(main_entity) < 2:
result.issues.append(ValidationIssue(
severity="info",
message="FAQPage has only one question",
schema_type=schema_type,
suggestion="Add more questions for better rich results",
))
def _check_rich_results(self, schema: dict, schema_type: str) -> dict:
"""Check if schema is eligible for Google Rich Results."""
result = {
"eligible": True,
"missing_for_rich_results": [],
}
if schema_type in ("Article", "BlogPosting", "NewsArticle"):
required_for_rich = ["headline", "image", "datePublished", "author"]
for prop in required_for_rich:
if prop not in schema:
result["eligible"] = False
result["missing_for_rich_results"].append(prop)
elif schema_type == "Product":
if "name" not in schema:
result["eligible"] = False
result["missing_for_rich_results"].append("name")
offer = schema.get("offers")
if not offer:
result["eligible"] = False
result["missing_for_rich_results"].append("offers")
elif schema_type == "FAQPage":
if not schema.get("mainEntity"):
result["eligible"] = False
result["missing_for_rich_results"].append("mainEntity")
return result
def generate_report(self, result: ValidationResult) -> str:
"""Generate human-readable validation report."""
lines = [
"=" * 60,
"Schema Validation Report",
"=" * 60,
f"URL: {result.url or 'N/A'}",
f"Timestamp: {result.timestamp}",
f"Valid: {'Yes' if result.valid else 'No'}",
f"Schemas Found: {len(result.schemas_found)}",
"",
]
if result.schemas_found:
lines.append("Schema Types:")
for schema in result.schemas_found:
schema_type = schema.get("@type", "Unknown")
lines.append(f" - {schema_type}")
lines.append("")
if result.rich_results_eligible:
lines.append("Rich Results Eligibility:")
for schema_type, status in result.rich_results_eligible.items():
eligible = "Yes" if status["eligible"] else "No"
lines.append(f" - {schema_type}: {eligible}")
if status["missing_for_rich_results"]:
missing = ", ".join(status["missing_for_rich_results"])
lines.append(f" Missing: {missing}")
lines.append("")
if result.issues:
lines.append("Issues Found:")
errors = [i for i in result.issues if i.severity == "error"]
warnings = [i for i in result.issues if i.severity == "warning"]
infos = [i for i in result.issues if i.severity == "info"]
            for label, items in (("ERRORS", errors), ("WARNINGS", warnings), ("INFO", infos)):
                if not items:
                    continue
                lines.append(f"\n  {label} ({len(items)}):")
                for issue in items:
                    lines.append(f"  - [{issue.schema_type or 'General'}] {issue.message}")
                    if issue.suggestion:
                        lines.append(f"    Suggestion: {issue.suggestion}")
lines.append("")
lines.append("=" * 60)
return "\n".join(lines)
def main():
"""Main entry point for CLI usage."""
parser = argparse.ArgumentParser(
description="Validate schema.org structured data",
)
parser.add_argument("--url", "-u", help="URL to validate")
parser.add_argument("--file", "-f", help="JSON-LD file to validate")
parser.add_argument("--output", "-o", help="Output file for JSON report")
parser.add_argument("--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
if not args.url and not args.file:
parser.error("Must provide --url or --file")
validator = SchemaValidator()
if args.file:
with open(args.file, "r", encoding="utf-8") as f:
schema = json.load(f)
result = validator.validate(schema=schema)
else:
result = validator.validate(url=args.url)
if args.json or args.output:
output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
if args.output:
with open(args.output, "w", encoding="utf-8") as f:
f.write(output)
logger.info(f"Report written to {args.output}")
else:
print(output)
else:
print(validator.generate_report(result))
if __name__ == "__main__":
main()


@@ -0,0 +1,110 @@
---
name: seo-schema-validator
version: 1.0.0
description: Structured data validator for JSON-LD, Microdata, and RDFa. Triggers: validate schema, structured data, JSON-LD, rich results, schema.org.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*
---
# SEO Schema Validator
## Purpose
Extract and validate structured data (JSON-LD, Microdata, RDFa) against schema.org vocabulary and Google Rich Results requirements.
## Core Capabilities
1. **Extract** - Find all structured data on page
2. **Parse** - JSON-LD, Microdata, RDFa formats
3. **Validate** - Schema.org compliance
4. **Rich Results** - Google eligibility check
## MCP Tool Usage
```
mcp__firecrawl__scrape: Extract page HTML with structured data
mcp__perplexity__search: Research schema requirements
```
## Workflow
1. Scrape target URL
2. Locate structured data blocks
3. Parse each format found
4. Validate against schema.org
5. Check Rich Results eligibility
6. Report issues and recommendations
## Supported Schema Types
| Type | Required Properties | Rich Result |
|------|---------------------|-------------|
| Article | headline, author, datePublished, image | Yes |
| Product | name, offers (price, availability) | Yes |
| LocalBusiness | name, address, telephone | Yes |
| FAQPage | mainEntity (questions) | Yes |
| Organization | name, url, logo | Sitelinks |
| BreadcrumbList | itemListElement | Yes |
| WebSite | name, url, potentialAction | Sitelinks |
| Review | itemReviewed, reviewRating | Yes |
| Event | name, startDate, location | Yes |
| Recipe | name, image, recipeIngredient | Yes |
## Validation Levels
### Level 1: Syntax
- Valid JSON structure
- Proper nesting
- No parsing errors
### Level 2: Vocabulary
- Valid @type values
- Known property names
- Correct value types
### Level 3: Rich Results
- Required properties present
- Recommended properties
- Google-specific requirements
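As a rough illustration, the three levels can be sketched in Python. This is a minimal sketch: the `REQUIRED` table here is an illustrative subset, not Google's full requirements.

```python
import json

# Illustrative required properties (subset; see schema.org for the full vocabulary)
REQUIRED = {"Article": ["headline", "author", "datePublished"]}

def validate_levels(raw: str) -> dict:
    """Run the three validation levels on a JSON-LD string."""
    report = {"syntax": False, "vocabulary": False, "rich_results": False, "issues": []}
    # Level 1: syntax - must parse as JSON
    try:
        schema = json.loads(raw)
    except json.JSONDecodeError as e:
        report["issues"].append(f"syntax: {e}")
        return report
    report["syntax"] = True
    # Level 2: vocabulary - @type must be present and known
    stype = schema.get("@type")
    if stype not in REQUIRED:
        report["issues"].append(f"vocabulary: unknown @type {stype!r}")
        return report
    report["vocabulary"] = True
    # Level 3: rich results - required properties must be present
    missing = [p for p in REQUIRED[stype] if p not in schema]
    if missing:
        report["issues"].append(f"rich_results: missing {missing}")
    else:
        report["rich_results"] = True
    return report

raw = json.dumps({
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example",
    "author": {"@type": "Person", "name": "A. Writer"},
    "datePublished": "2024-01-15",
})
print(validate_levels(raw))
```

Each level gates the next: a block that fails to parse is never checked for vocabulary, and an unknown `@type` is never checked for rich-results properties.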
## Output Format
```markdown
## Schema Validation: [URL]
### Schemas Found: X
#### Schema 1: [Type]
- Format: JSON-LD
- Valid: Yes/No
- Rich Results Eligible: Yes/No
**Issues:**
- [Error/Warning list]
**Properties:**
| Property | Present | Valid |
|----------|---------|-------|
### Summary
- Valid: X
- Invalid: X
- Rich Results Ready: X
### Recommendations
1. [Fixes needed]
```
## Common Issues
| Issue | Severity | Fix |
|-------|----------|-----|
| Missing required property | Error | Add property |
| Invalid date format | Error | Use ISO 8601 |
| Missing @context | Error | Add schema.org context |
| No image property | Warning | Add image URL |
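For the date-format fix, date properties such as `datePublished` should use ISO 8601. A quick conversion in Python (the input format `"%B %d, %Y"` is just an example; adjust to your source data):

```python
from datetime import datetime

def to_iso8601(raw: str, fmt: str = "%B %d, %Y") -> str:
    """Convert a human-readable date like 'January 15, 2024' to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw, fmt).date().isoformat()

print(to_iso8601("January 15, 2024"))  # → 2024-01-15
```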
## Limitations
- Cannot validate schema rendered client-side by JavaScript
- Validates against schema.org vocabulary, not every Google-specific feature
- Use Google's Rich Results Test for final verification


@@ -0,0 +1,121 @@
# CLAUDE.md
## Overview
Schema markup generator: creates JSON-LD structured data from templates for common content types.
## Quick Start
```bash
pip install -r scripts/requirements.txt
# Generate Organization schema
python scripts/schema_generator.py --type organization --url https://example.com
# Generate from template
python scripts/schema_generator.py --template templates/article.json --data article_data.json
```
## Scripts
| Script | Purpose |
|--------|---------|
| `schema_generator.py` | Generate schema markup |
| `base_client.py` | Shared utilities |
## Supported Schema Types
| Type | Template | Use Case |
|------|----------|----------|
| Organization | `organization.json` | Company/brand info |
| LocalBusiness | `local_business.json` | Physical locations |
| Article | `article.json` | Blog posts, news |
| Product | `product.json` | E-commerce items |
| FAQPage | `faq.json` | FAQ sections |
| BreadcrumbList | `breadcrumb.json` | Navigation path |
| WebSite | `website.json` | Site-level info |
## Usage Examples
### Organization
```bash
python scripts/schema_generator.py --type organization \
--name "Company Name" \
--url "https://example.com" \
--logo "https://example.com/logo.png"
```
### LocalBusiness
```bash
python scripts/schema_generator.py --type localbusiness \
--name "Restaurant Name" \
--address "123 Main St, City, State 12345" \
--phone "+1-555-123-4567" \
--hours "Mo-Fr 09:00-17:00"
```
### Article
```bash
python scripts/schema_generator.py --type article \
--headline "Article Title" \
--author "Author Name" \
--published "2024-01-15" \
--image "https://example.com/image.jpg"
```
### FAQPage
```bash
python scripts/schema_generator.py --type faq \
--questions questions.json
```
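The `questions.json` file is a JSON array of objects with `question` and `answer` keys — the shape `generate_faq` in `scripts/schema_generator.py` consumes. One way to produce it:

```python
import json

# Example FAQ entries in the shape generate_faq expects
faqs = [
    {"question": "What is JSON-LD?",
     "answer": "A JSON-based format for embedding linked data in pages."},
    {"question": "Where does the markup go?",
     "answer": "Inside a <script type=\"application/ld+json\"> tag."},
]

payload = json.dumps(faqs, ensure_ascii=False, indent=2)
with open("questions.json", "w", encoding="utf-8") as f:
    f.write(payload)
print(payload)
```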
## Output
Generated JSON-LD ready for insertion:
```html
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "Company Name",
"url": "https://example.com",
"logo": "https://example.com/logo.png"
}
</script>
```
## Template Customization
Templates in `templates/` can be modified. Required fields are marked:
```json
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "{{REQUIRED}}",
"author": {
"@type": "Person",
"name": "{{REQUIRED}}"
},
"datePublished": "{{REQUIRED}}",
"image": "{{RECOMMENDED}}"
}
```
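The generator fills these templates by string substitution of `{{key}}` placeholders, then drops any property whose placeholder was never filled. A simplified sketch of that approach (the real `fill_template` also prunes empty nested objects):

```python
import json

def fill(template: dict, data: dict) -> dict:
    """Substitute {{key}} placeholders, then drop unfilled top-level properties."""
    text = json.dumps(template, ensure_ascii=False)
    for key, value in data.items():
        if value is not None:
            text = text.replace(f"{{{{{key}}}}}", str(value))
    filled = json.loads(text)
    # Any value still starting with "{{" was never provided - remove it
    return {k: v for k, v in filled.items()
            if not (isinstance(v, str) and v.startswith("{{"))}

template = {"@type": "Article", "headline": "{{headline}}", "image": "{{image}}"}
print(fill(template, {"headline": "Hello"}))
```

Because missing optional fields simply disappear, one template can serve both minimal and fully-populated schemas.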
## Validation
Generated schemas are validated before output:
- Syntax correctness
- Required properties present
- Schema.org vocabulary compliance
Use skill 13 (schema-validator) for additional validation.
## Dependencies
```
jsonschema>=4.21.0
requests>=2.31.0
python-dotenv>=1.0.0
```


@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""
import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar
from dotenv import load_dotenv
from tenacity import (
retry,
stop_after_attempt,
wait_exponential,
retry_if_exception_type,
)
# Load environment variables
load_dotenv()
# Logging setup
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
T = TypeVar("T")
class RateLimiter:
"""Rate limiter using token bucket algorithm."""
def __init__(self, rate: float, per: float = 1.0):
"""
Initialize rate limiter.
Args:
rate: Number of requests allowed
per: Time period in seconds (default: 1 second)
"""
self.rate = rate
self.per = per
self.tokens = rate
self.last_update = datetime.now()
self._lock = asyncio.Lock()
async def acquire(self) -> None:
"""Acquire a token, waiting if necessary."""
async with self._lock:
now = datetime.now()
elapsed = (now - self.last_update).total_seconds()
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
self.last_update = now
if self.tokens < 1:
wait_time = (1 - self.tokens) * (self.per / self.rate)
await asyncio.sleep(wait_time)
self.tokens = 0
else:
self.tokens -= 1
class BaseAsyncClient:
"""Base class for async API clients with rate limiting."""
def __init__(
self,
max_concurrent: int = 5,
requests_per_second: float = 3.0,
logger: logging.Logger | None = None,
):
"""
Initialize base client.
Args:
max_concurrent: Maximum concurrent requests
requests_per_second: Rate limit
logger: Logger instance
"""
self.semaphore = Semaphore(max_concurrent)
self.rate_limiter = RateLimiter(requests_per_second)
self.logger = logger or logging.getLogger(self.__class__.__name__)
self.stats = {
"requests": 0,
"success": 0,
"errors": 0,
"retries": 0,
}
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10),
retry=retry_if_exception_type(Exception),
)
async def _rate_limited_request(
self,
coro: Callable[[], Any],
) -> Any:
"""Execute a request with rate limiting and retry."""
async with self.semaphore:
await self.rate_limiter.acquire()
self.stats["requests"] += 1
try:
result = await coro()
self.stats["success"] += 1
return result
except Exception as e:
self.stats["errors"] += 1
self.logger.error(f"Request failed: {e}")
raise
async def batch_requests(
self,
requests: list[Callable[[], Any]],
desc: str = "Processing",
) -> list[Any]:
"""Execute multiple requests concurrently."""
try:
from tqdm.asyncio import tqdm
has_tqdm = True
except ImportError:
has_tqdm = False
async def execute(req: Callable) -> Any:
try:
return await self._rate_limited_request(req)
except Exception as e:
return {"error": str(e)}
tasks = [execute(req) for req in requests]
if has_tqdm:
results = []
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
result = await coro
results.append(result)
return results
else:
return await asyncio.gather(*tasks, return_exceptions=True)
def print_stats(self) -> None:
"""Print request statistics."""
self.logger.info("=" * 40)
self.logger.info("Request Statistics:")
self.logger.info(f" Total Requests: {self.stats['requests']}")
self.logger.info(f" Successful: {self.stats['success']}")
self.logger.info(f" Errors: {self.stats['errors']}")
self.logger.info("=" * 40)
class ConfigManager:
"""Manage API configuration and credentials."""
def __init__(self):
load_dotenv()
@property
def google_credentials_path(self) -> str | None:
"""Get Google service account credentials path."""
# Prefer SEO-specific credentials, fallback to general credentials
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
if os.path.exists(seo_creds):
return seo_creds
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
@property
def pagespeed_api_key(self) -> str | None:
"""Get PageSpeed Insights API key."""
return os.getenv("PAGESPEED_API_KEY")
@property
def custom_search_api_key(self) -> str | None:
"""Get Custom Search API key."""
return os.getenv("CUSTOM_SEARCH_API_KEY")
@property
def custom_search_engine_id(self) -> str | None:
"""Get Custom Search Engine ID."""
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
@property
def notion_token(self) -> str | None:
"""Get Notion API token."""
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
def validate_google_credentials(self) -> bool:
"""Validate Google credentials are configured."""
creds_path = self.google_credentials_path
if not creds_path:
return False
return os.path.exists(creds_path)
def get_required(self, key: str) -> str:
"""Get required environment variable or raise error."""
value = os.getenv(key)
if not value:
raise ValueError(f"Missing required environment variable: {key}")
return value
# Singleton config instance
config = ConfigManager()


@@ -0,0 +1,6 @@
# 14-seo-schema-generator dependencies
jsonschema>=4.21.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
typer>=0.9.0


@@ -0,0 +1,490 @@
"""
Schema Generator - Generate JSON-LD structured data markup
==========================================================
Purpose: Generate schema.org structured data in JSON-LD format
Python: 3.10+
Usage:
python schema_generator.py --type organization --name "Company Name" --url "https://example.com"
"""
import argparse
import json
import logging
import os
import re
from datetime import datetime
from pathlib import Path
from typing import Any
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
# Template directory relative to this script
TEMPLATE_DIR = Path(__file__).parent.parent / "templates" / "schema_templates"
class SchemaGenerator:
"""Generate JSON-LD schema markup from templates."""
SCHEMA_TYPES = {
"organization": "organization.json",
"local_business": "local_business.json",
"product": "product.json",
"article": "article.json",
"faq": "faq.json",
"breadcrumb": "breadcrumb.json",
"website": "website.json",
}
# Business type mappings for LocalBusiness
BUSINESS_TYPES = {
"restaurant": "Restaurant",
"cafe": "CafeOrCoffeeShop",
"bar": "BarOrPub",
"hotel": "Hotel",
"store": "Store",
"medical": "MedicalBusiness",
"dental": "Dentist",
"legal": "LegalService",
"real_estate": "RealEstateAgent",
"auto": "AutoRepair",
"beauty": "BeautySalon",
"gym": "HealthClub",
"spa": "DaySpa",
}
# Article type mappings
ARTICLE_TYPES = {
"article": "Article",
"blog": "BlogPosting",
"news": "NewsArticle",
"tech": "TechArticle",
"scholarly": "ScholarlyArticle",
}
def __init__(self, template_dir: Path = TEMPLATE_DIR):
self.template_dir = template_dir
def load_template(self, schema_type: str) -> dict:
"""Load a schema template file."""
if schema_type not in self.SCHEMA_TYPES:
raise ValueError(f"Unknown schema type: {schema_type}. "
f"Available: {list(self.SCHEMA_TYPES.keys())}")
template_file = self.template_dir / self.SCHEMA_TYPES[schema_type]
if not template_file.exists():
raise FileNotFoundError(f"Template not found: {template_file}")
with open(template_file, "r", encoding="utf-8") as f:
return json.load(f)
def fill_template(self, template: dict, data: dict[str, Any]) -> dict:
"""Fill template placeholders with actual data."""
template_str = json.dumps(template, ensure_ascii=False)
# Replace placeholders {{key}} with values
for key, value in data.items():
placeholder = f"{{{{{key}}}}}"
            if value is not None:
                # JSON-escape the value so embedded quotes/newlines don't corrupt the template
                escaped = json.dumps(str(value), ensure_ascii=False)[1:-1]
                template_str = template_str.replace(placeholder, escaped)
# Remove unfilled placeholders and their parent objects if empty
result = json.loads(template_str)
return self._clean_empty_values(result)
def _clean_empty_values(self, obj: Any) -> Any:
"""Remove empty values and unfilled placeholders."""
if isinstance(obj, dict):
cleaned = {}
for key, value in obj.items():
cleaned_value = self._clean_empty_values(value)
# Skip if value is empty, None, or unfilled placeholder
if cleaned_value is None:
continue
if isinstance(cleaned_value, str) and cleaned_value.startswith("{{"):
continue
if isinstance(cleaned_value, (list, dict)) and not cleaned_value:
continue
cleaned[key] = cleaned_value
return cleaned if cleaned else None
elif isinstance(obj, list):
cleaned = []
for item in obj:
cleaned_item = self._clean_empty_values(item)
if cleaned_item is not None:
if isinstance(cleaned_item, str) and cleaned_item.startswith("{{"):
continue
cleaned.append(cleaned_item)
return cleaned if cleaned else None
elif isinstance(obj, str):
if obj.startswith("{{") and obj.endswith("}}"):
return None
return obj
return obj
def generate_organization(
self,
name: str,
url: str,
logo_url: str | None = None,
description: str | None = None,
founding_date: str | None = None,
phone: str | None = None,
address: dict | None = None,
social_links: list[str] | None = None,
) -> dict:
"""Generate Organization schema."""
template = self.load_template("organization")
data = {
"name": name,
"url": url,
"logo_url": logo_url,
"description": description,
"founding_date": founding_date,
"phone": phone,
}
if address:
data.update({
"street_address": address.get("street"),
"city": address.get("city"),
"region": address.get("region"),
"postal_code": address.get("postal_code"),
"country": address.get("country", "KR"),
})
        schema = self.fill_template(template, data)
        if social_links:
            # Lists can't pass through string templating; attach sameAs directly
            schema["sameAs"] = social_links
        return schema
def generate_local_business(
self,
name: str,
business_type: str,
address: dict,
phone: str | None = None,
url: str | None = None,
description: str | None = None,
hours: dict | None = None,
geo: dict | None = None,
price_range: str | None = None,
rating: float | None = None,
review_count: int | None = None,
) -> dict:
"""Generate LocalBusiness schema."""
template = self.load_template("local_business")
schema_business_type = self.BUSINESS_TYPES.get(
business_type.lower(), "LocalBusiness"
)
data = {
"business_type": schema_business_type,
"name": name,
"url": url,
"description": description,
"phone": phone,
"price_range": price_range,
"street_address": address.get("street"),
"city": address.get("city"),
"region": address.get("region"),
"postal_code": address.get("postal_code"),
"country": address.get("country", "KR"),
}
if geo:
data["latitude"] = geo.get("lat")
data["longitude"] = geo.get("lng")
if hours:
data.update({
"weekday_opens": hours.get("weekday_opens", "09:00"),
"weekday_closes": hours.get("weekday_closes", "18:00"),
"weekend_opens": hours.get("weekend_opens"),
"weekend_closes": hours.get("weekend_closes"),
})
if rating is not None:
data["rating"] = str(rating)
data["review_count"] = str(review_count or 0)
return self.fill_template(template, data)
def generate_product(
self,
name: str,
description: str,
price: float,
currency: str = "KRW",
brand: str | None = None,
sku: str | None = None,
images: list[str] | None = None,
availability: str = "InStock",
condition: str = "NewCondition",
rating: float | None = None,
review_count: int | None = None,
url: str | None = None,
seller: str | None = None,
) -> dict:
"""Generate Product schema."""
template = self.load_template("product")
data = {
"name": name,
"description": description,
"price": str(int(price)),
"currency": currency,
"brand_name": brand,
"sku": sku,
"product_url": url,
"availability": availability,
"condition": condition,
"seller_name": seller,
}
if images:
for i, img in enumerate(images[:3], 1):
data[f"image_url_{i}"] = img
if rating is not None:
data["rating"] = str(rating)
data["review_count"] = str(review_count or 0)
return self.fill_template(template, data)
def generate_article(
self,
headline: str,
description: str,
author_name: str,
date_published: str,
publisher_name: str,
article_type: str = "article",
date_modified: str | None = None,
images: list[str] | None = None,
page_url: str | None = None,
publisher_logo: str | None = None,
author_url: str | None = None,
section: str | None = None,
word_count: int | None = None,
keywords: str | None = None,
) -> dict:
"""Generate Article schema."""
template = self.load_template("article")
schema_article_type = self.ARTICLE_TYPES.get(
article_type.lower(), "Article"
)
data = {
"article_type": schema_article_type,
"headline": headline,
"description": description,
"author_name": author_name,
"author_url": author_url,
"date_published": date_published,
"date_modified": date_modified or date_published,
"publisher_name": publisher_name,
"publisher_logo_url": publisher_logo,
"page_url": page_url,
"section": section,
"word_count": str(word_count) if word_count else None,
"keywords": keywords,
}
if images:
for i, img in enumerate(images[:2], 1):
data[f"image_url_{i}"] = img
return self.fill_template(template, data)
def generate_faq(self, questions: list[dict[str, str]]) -> dict:
"""Generate FAQPage schema."""
schema = {
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [],
}
for qa in questions:
schema["mainEntity"].append({
"@type": "Question",
"name": qa["question"],
"acceptedAnswer": {
"@type": "Answer",
"text": qa["answer"],
},
})
return schema
def generate_breadcrumb(self, items: list[dict[str, str]]) -> dict:
"""Generate BreadcrumbList schema."""
schema = {
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [],
}
for i, item in enumerate(items, 1):
schema["itemListElement"].append({
"@type": "ListItem",
"position": i,
"name": item["name"],
"item": item["url"],
})
return schema
def generate_website(
self,
name: str,
url: str,
search_url_template: str | None = None,
description: str | None = None,
language: str = "ko-KR",
publisher_name: str | None = None,
logo_url: str | None = None,
alternate_name: str | None = None,
) -> dict:
"""Generate WebSite schema."""
template = self.load_template("website")
data = {
"site_name": name,
"url": url,
"description": description,
"language": language,
"search_url_template": search_url_template,
"publisher_name": publisher_name or name,
"logo_url": logo_url,
"alternate_name": alternate_name,
}
return self.fill_template(template, data)
def to_json_ld(self, schema: dict, pretty: bool = True) -> str:
"""Convert schema dict to JSON-LD string."""
indent = 2 if pretty else None
return json.dumps(schema, ensure_ascii=False, indent=indent)
def to_html_script(self, schema: dict) -> str:
"""Wrap schema in HTML script tag."""
json_ld = self.to_json_ld(schema)
return f'<script type="application/ld+json">\n{json_ld}\n</script>'
def main():
"""Main entry point for CLI usage."""
parser = argparse.ArgumentParser(
description="Generate JSON-LD schema markup",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Generate Organization schema
python schema_generator.py --type organization --name "My Company" --url "https://example.com"
# Generate Product schema
python schema_generator.py --type product --name "Widget" --price 29900 --currency KRW
# Generate Article schema
python schema_generator.py --type article --headline "Article Title" --author "John Doe"
""",
)
parser.add_argument(
"--type", "-t",
required=True,
choices=SchemaGenerator.SCHEMA_TYPES.keys(),
help="Schema type to generate",
)
parser.add_argument("--name", help="Name/title")
parser.add_argument("--url", help="URL")
parser.add_argument("--description", help="Description")
parser.add_argument("--price", type=float, help="Price (for product)")
parser.add_argument("--currency", default="KRW", help="Currency code")
parser.add_argument("--headline", help="Headline (for article)")
parser.add_argument("--author", help="Author name")
parser.add_argument("--output", "-o", help="Output file path")
parser.add_argument("--html", action="store_true", help="Output as HTML script tag")
args = parser.parse_args()
generator = SchemaGenerator()
try:
if args.type == "organization":
schema = generator.generate_organization(
name=args.name or "Organization Name",
url=args.url or "https://example.com",
description=args.description,
)
elif args.type == "product":
schema = generator.generate_product(
name=args.name or "Product Name",
description=args.description or "Product description",
price=args.price or 0,
currency=args.currency,
)
elif args.type == "article":
schema = generator.generate_article(
headline=args.headline or args.name or "Article Title",
description=args.description or "Article description",
author_name=args.author or "Author",
date_published=datetime.now().strftime("%Y-%m-%d"),
publisher_name="Publisher",
)
elif args.type == "website":
schema = generator.generate_website(
name=args.name or "Website Name",
url=args.url or "https://example.com",
description=args.description,
)
elif args.type == "faq":
# Example FAQ
schema = generator.generate_faq([
{"question": "Question 1?", "answer": "Answer 1"},
{"question": "Question 2?", "answer": "Answer 2"},
])
elif args.type == "breadcrumb":
# Example breadcrumb
schema = generator.generate_breadcrumb([
{"name": "Home", "url": "https://example.com/"},
{"name": "Category", "url": "https://example.com/category/"},
])
elif args.type == "local_business":
schema = generator.generate_local_business(
name=args.name or "Business Name",
business_type="store",
address={"street": "123 Main St", "city": "Seoul", "country": "KR"},
url=args.url,
description=args.description,
)
else:
raise ValueError(f"Unsupported type: {args.type}")
if args.html:
output = generator.to_html_script(schema)
else:
output = generator.to_json_ld(schema)
if args.output:
with open(args.output, "w", encoding="utf-8") as f:
f.write(output)
logger.info(f"Schema written to {args.output}")
else:
print(output)
except Exception as e:
logger.error(f"Error generating schema: {e}")
raise
if __name__ == "__main__":
main()


@@ -0,0 +1,32 @@
{
"@context": "https://schema.org",
"@type": "{{article_type}}",
"headline": "{{headline}}",
"description": "{{description}}",
"image": [
"{{image_url_1}}",
"{{image_url_2}}"
],
"datePublished": "{{date_published}}",
"dateModified": "{{date_modified}}",
"author": {
"@type": "Person",
"name": "{{author_name}}",
"url": "{{author_url}}"
},
"publisher": {
"@type": "Organization",
"name": "{{publisher_name}}",
"logo": {
"@type": "ImageObject",
"url": "{{publisher_logo_url}}"
}
},
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "{{page_url}}"
},
"articleSection": "{{section}}",
"wordCount": "{{word_count}}",
"keywords": "{{keywords}}"
}


@@ -0,0 +1,24 @@
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{
"@type": "ListItem",
"position": 1,
"name": "{{level_1_name}}",
"item": "{{level_1_url}}"
},
{
"@type": "ListItem",
"position": 2,
"name": "{{level_2_name}}",
"item": "{{level_2_url}}"
},
{
"@type": "ListItem",
"position": 3,
"name": "{{level_3_name}}",
"item": "{{level_3_url}}"
}
]
}


@@ -0,0 +1,30 @@
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "{{question_1}}",
"acceptedAnswer": {
"@type": "Answer",
"text": "{{answer_1}}"
}
},
{
"@type": "Question",
"name": "{{question_2}}",
"acceptedAnswer": {
"@type": "Answer",
"text": "{{answer_2}}"
}
},
{
"@type": "Question",
"name": "{{question_3}}",
"acceptedAnswer": {
"@type": "Answer",
"text": "{{answer_3}}"
}
}
]
}


@@ -0,0 +1,47 @@
{
"@context": "https://schema.org",
"@type": "{{business_type}}",
"name": "{{name}}",
"description": "{{description}}",
"url": "{{url}}",
"telephone": "{{phone}}",
"email": "{{email}}",
"image": "{{image_url}}",
"priceRange": "{{price_range}}",
"address": {
"@type": "PostalAddress",
"streetAddress": "{{street_address}}",
"addressLocality": "{{city}}",
"addressRegion": "{{region}}",
"postalCode": "{{postal_code}}",
"addressCountry": "{{country}}"
},
"geo": {
"@type": "GeoCoordinates",
"latitude": "{{latitude}}",
"longitude": "{{longitude}}"
},
"openingHoursSpecification": [
{
"@type": "OpeningHoursSpecification",
"dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
"opens": "{{weekday_opens}}",
"closes": "{{weekday_closes}}"
},
{
"@type": "OpeningHoursSpecification",
"dayOfWeek": ["Saturday", "Sunday"],
"opens": "{{weekend_opens}}",
"closes": "{{weekend_closes}}"
}
],
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "{{rating}}",
"reviewCount": "{{review_count}}"
},
"sameAs": [
"{{facebook_url}}",
"{{instagram_url}}"
]
}


@@ -0,0 +1,37 @@
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "{{name}}",
"url": "{{url}}",
"logo": "{{logo_url}}",
"description": "{{description}}",
"foundingDate": "{{founding_date}}",
"founders": [
{
"@type": "Person",
"name": "{{founder_name}}"
}
],
"address": {
"@type": "PostalAddress",
"streetAddress": "{{street_address}}",
"addressLocality": "{{city}}",
"addressRegion": "{{region}}",
"postalCode": "{{postal_code}}",
"addressCountry": "{{country}}"
},
"contactPoint": [
{
"@type": "ContactPoint",
"telephone": "{{phone}}",
"contactType": "customer service",
"availableLanguage": ["Korean", "English"]
}
],
"sameAs": [
"{{facebook_url}}",
"{{twitter_url}}",
"{{linkedin_url}}",
"{{instagram_url}}"
]
}


@@ -0,0 +1,76 @@
{
"@context": "https://schema.org",
"@type": "Product",
"name": "{{name}}",
"description": "{{description}}",
"image": [
"{{image_url_1}}",
"{{image_url_2}}",
"{{image_url_3}}"
],
"sku": "{{sku}}",
"mpn": "{{mpn}}",
"gtin13": "{{gtin13}}",
"brand": {
"@type": "Brand",
"name": "{{brand_name}}"
},
"offers": {
"@type": "Offer",
"url": "{{product_url}}",
"price": "{{price}}",
"priceCurrency": "{{currency}}",
"priceValidUntil": "{{price_valid_until}}",
"availability": "https://schema.org/{{availability}}",
"itemCondition": "https://schema.org/{{condition}}",
"seller": {
"@type": "Organization",
"name": "{{seller_name}}"
},
"shippingDetails": {
"@type": "OfferShippingDetails",
"shippingRate": {
"@type": "MonetaryAmount",
"value": "{{shipping_cost}}",
"currency": "{{currency}}"
},
"deliveryTime": {
"@type": "ShippingDeliveryTime",
"handlingTime": {
"@type": "QuantitativeValue",
"minValue": "{{handling_min_days}}",
"maxValue": "{{handling_max_days}}",
"unitCode": "DAY"
},
"transitTime": {
"@type": "QuantitativeValue",
"minValue": "{{transit_min_days}}",
"maxValue": "{{transit_max_days}}",
"unitCode": "DAY"
}
}
}
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "{{rating}}",
"reviewCount": "{{review_count}}",
"bestRating": "5",
"worstRating": "1"
},
"review": [
{
"@type": "Review",
"reviewRating": {
"@type": "Rating",
"ratingValue": "{{review_rating}}",
"bestRating": "5"
},
"author": {
"@type": "Person",
"name": "{{reviewer_name}}"
},
"reviewBody": "{{review_text}}"
}
]
}


@@ -0,0 +1,25 @@
{
"@context": "https://schema.org",
"@type": "WebSite",
"name": "{{site_name}}",
"alternateName": "{{alternate_name}}",
"url": "{{url}}",
"description": "{{description}}",
"inLanguage": "{{language}}",
"potentialAction": {
"@type": "SearchAction",
"target": {
"@type": "EntryPoint",
"urlTemplate": "{{search_url_template}}"
},
"query-input": "required name=search_term_string"
},
"publisher": {
"@type": "Organization",
"name": "{{publisher_name}}",
"logo": {
"@type": "ImageObject",
"url": "{{logo_url}}"
}
}
}


@@ -0,0 +1,146 @@
---
name: seo-schema-generator
version: 1.0.0
description: Schema markup generator for JSON-LD structured data. Triggers: generate schema, create JSON-LD, add structured data, schema markup.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*
---
# SEO Schema Generator
## Purpose
Generate JSON-LD structured data markup for various content types using templates.
## Core Capabilities
1. **Organization** - Company/brand information
2. **LocalBusiness** - Physical location businesses
3. **Article** - Blog posts and news articles
4. **Product** - E-commerce products
5. **FAQPage** - FAQ sections
6. **BreadcrumbList** - Navigation breadcrumbs
7. **WebSite** - Site-level with search action
## Workflow
1. Identify content type
2. Gather required information
3. Generate JSON-LD from template
4. Validate output
5. Provide implementation instructions
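The workflow's step 3 can be sketched as a simple placeholder substitution over the JSON templates below (a minimal illustration; `fill_template` and the inline example template are assumptions, not part of this skill):

```python
import json
import re

def fill_template(template: str, values: dict[str, str]) -> dict:
    """Replace {{placeholder}} tokens, then parse the result as JSON.

    Assumes the values contain no characters that need JSON escaping.
    """
    filled = re.sub(r"\{\{(\w+)\}\}", lambda m: values.get(m.group(1), ""), template)
    return json.loads(filled)

template = (
    '{"@context": "https://schema.org", "@type": "Organization", '
    '"name": "{{name}}", "url": "{{url}}"}'
)
schema = fill_template(template, {"name": "My Company", "url": "https://example.com"})
print(json.dumps(schema, indent=2))
```

Unfilled placeholders are replaced with empty strings here; a production version should instead reject missing keys so incomplete schemas never ship.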
## Schema Templates
### Organization
```json
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "[Company Name]",
"url": "[Website URL]",
"logo": "[Logo URL]",
"sameAs": [
"[Social Media URLs]"
]
}
```
### LocalBusiness
```json
{
"@context": "https://schema.org",
"@type": "LocalBusiness",
"name": "[Business Name]",
"address": {
"@type": "PostalAddress",
"streetAddress": "[Street]",
"addressLocality": "[City]",
"addressRegion": "[State]",
"postalCode": "[ZIP]",
"addressCountry": "[Country]"
},
"telephone": "[Phone]",
"openingHours": ["Mo-Fr 09:00-17:00"]
}
```
### Article
```json
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "[Title]",
"author": {
"@type": "Person",
"name": "[Author Name]"
},
"datePublished": "[YYYY-MM-DD]",
"dateModified": "[YYYY-MM-DD]",
"image": "[Image URL]",
"publisher": {
"@type": "Organization",
"name": "[Publisher]",
"logo": "[Logo URL]"
}
}
```
### FAQPage
```json
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "[Question]",
"acceptedAnswer": {
"@type": "Answer",
"text": "[Answer]"
}
}
]
}
```
### Product
```json
{
"@context": "https://schema.org",
"@type": "Product",
"name": "[Product Name]",
"image": "[Image URL]",
"description": "[Description]",
"offers": {
"@type": "Offer",
"price": "[Price]",
"priceCurrency": "[Currency]",
"availability": "https://schema.org/InStock"
}
}
```
## Implementation
Place generated JSON-LD in `<head>` section:
```html
<head>
<script type="application/ld+json">
[Generated Schema Here]
</script>
</head>
```
## Validation
After generating:
1. Use schema validator skill (13) to verify
2. Test with Google Rich Results Test
3. Monitor in Search Console
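Before the external validators above, a quick local sanity check can catch obvious mistakes (a sketch assuming a single top-level JSON-LD object; this is not the validator skill itself):

```python
import json

REQUIRED_KEYS = {"@context", "@type"}

def sanity_check(json_ld: str) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks OK."""
    try:
        data = json.loads(json_ld)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    problems = []
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    elif data.get("@context") != "https://schema.org":
        problems.append("unexpected @context")
    return problems

print(sanity_check('{"@context": "https://schema.org", "@type": "Organization"}'))  # → []
```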
## Limitations
- Templates cover common types only
- Complex nested schemas may need manual adjustment
- Some Rich Results require additional properties


@@ -0,0 +1,32 @@
{
"@context": "https://schema.org",
"@type": "{{article_type}}",
"headline": "{{headline}}",
"description": "{{description}}",
"image": [
"{{image_url_1}}",
"{{image_url_2}}"
],
"datePublished": "{{date_published}}",
"dateModified": "{{date_modified}}",
"author": {
"@type": "Person",
"name": "{{author_name}}",
"url": "{{author_url}}"
},
"publisher": {
"@type": "Organization",
"name": "{{publisher_name}}",
"logo": {
"@type": "ImageObject",
"url": "{{publisher_logo_url}}"
}
},
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "{{page_url}}"
},
"articleSection": "{{section}}",
"wordCount": "{{word_count}}",
"keywords": "{{keywords}}"
}


@@ -0,0 +1,24 @@
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{
"@type": "ListItem",
"position": 1,
"name": "{{level_1_name}}",
"item": "{{level_1_url}}"
},
{
"@type": "ListItem",
"position": 2,
"name": "{{level_2_name}}",
"item": "{{level_2_url}}"
},
{
"@type": "ListItem",
"position": 3,
"name": "{{level_3_name}}",
"item": "{{level_3_url}}"
}
]
}


@@ -0,0 +1,30 @@
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "{{question_1}}",
"acceptedAnswer": {
"@type": "Answer",
"text": "{{answer_1}}"
}
},
{
"@type": "Question",
"name": "{{question_2}}",
"acceptedAnswer": {
"@type": "Answer",
"text": "{{answer_2}}"
}
},
{
"@type": "Question",
"name": "{{question_3}}",
"acceptedAnswer": {
"@type": "Answer",
"text": "{{answer_3}}"
}
}
]
}


@@ -0,0 +1,47 @@
{
"@context": "https://schema.org",
"@type": "{{business_type}}",
"name": "{{name}}",
"description": "{{description}}",
"url": "{{url}}",
"telephone": "{{phone}}",
"email": "{{email}}",
"image": "{{image_url}}",
"priceRange": "{{price_range}}",
"address": {
"@type": "PostalAddress",
"streetAddress": "{{street_address}}",
"addressLocality": "{{city}}",
"addressRegion": "{{region}}",
"postalCode": "{{postal_code}}",
"addressCountry": "{{country}}"
},
"geo": {
"@type": "GeoCoordinates",
"latitude": "{{latitude}}",
"longitude": "{{longitude}}"
},
"openingHoursSpecification": [
{
"@type": "OpeningHoursSpecification",
"dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
"opens": "{{weekday_opens}}",
"closes": "{{weekday_closes}}"
},
{
"@type": "OpeningHoursSpecification",
"dayOfWeek": ["Saturday", "Sunday"],
"opens": "{{weekend_opens}}",
"closes": "{{weekend_closes}}"
}
],
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "{{rating}}",
"reviewCount": "{{review_count}}"
},
"sameAs": [
"{{facebook_url}}",
"{{instagram_url}}"
]
}


@@ -0,0 +1,37 @@
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "{{name}}",
"url": "{{url}}",
"logo": "{{logo_url}}",
"description": "{{description}}",
"foundingDate": "{{founding_date}}",
"founders": [
{
"@type": "Person",
"name": "{{founder_name}}"
}
],
"address": {
"@type": "PostalAddress",
"streetAddress": "{{street_address}}",
"addressLocality": "{{city}}",
"addressRegion": "{{region}}",
"postalCode": "{{postal_code}}",
"addressCountry": "{{country}}"
},
"contactPoint": [
{
"@type": "ContactPoint",
"telephone": "{{phone}}",
"contactType": "customer service",
"availableLanguage": ["Korean", "English"]
}
],
"sameAs": [
"{{facebook_url}}",
"{{twitter_url}}",
"{{linkedin_url}}",
"{{instagram_url}}"
]
}


@@ -0,0 +1,76 @@
{
"@context": "https://schema.org",
"@type": "Product",
"name": "{{name}}",
"description": "{{description}}",
"image": [
"{{image_url_1}}",
"{{image_url_2}}",
"{{image_url_3}}"
],
"sku": "{{sku}}",
"mpn": "{{mpn}}",
"gtin13": "{{gtin13}}",
"brand": {
"@type": "Brand",
"name": "{{brand_name}}"
},
"offers": {
"@type": "Offer",
"url": "{{product_url}}",
"price": "{{price}}",
"priceCurrency": "{{currency}}",
"priceValidUntil": "{{price_valid_until}}",
"availability": "https://schema.org/{{availability}}",
"itemCondition": "https://schema.org/{{condition}}",
"seller": {
"@type": "Organization",
"name": "{{seller_name}}"
},
"shippingDetails": {
"@type": "OfferShippingDetails",
"shippingRate": {
"@type": "MonetaryAmount",
"value": "{{shipping_cost}}",
"currency": "{{currency}}"
},
"deliveryTime": {
"@type": "ShippingDeliveryTime",
"handlingTime": {
"@type": "QuantitativeValue",
"minValue": "{{handling_min_days}}",
"maxValue": "{{handling_max_days}}",
"unitCode": "DAY"
},
"transitTime": {
"@type": "QuantitativeValue",
"minValue": "{{transit_min_days}}",
"maxValue": "{{transit_max_days}}",
"unitCode": "DAY"
}
}
}
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "{{rating}}",
"reviewCount": "{{review_count}}",
"bestRating": "5",
"worstRating": "1"
},
"review": [
{
"@type": "Review",
"reviewRating": {
"@type": "Rating",
"ratingValue": "{{review_rating}}",
"bestRating": "5"
},
"author": {
"@type": "Person",
"name": "{{reviewer_name}}"
},
"reviewBody": "{{review_text}}"
}
]
}


@@ -0,0 +1,25 @@
{
"@context": "https://schema.org",
"@type": "WebSite",
"name": "{{site_name}}",
"alternateName": "{{alternate_name}}",
"url": "{{url}}",
"description": "{{description}}",
"inLanguage": "{{language}}",
"potentialAction": {
"@type": "SearchAction",
"target": {
"@type": "EntryPoint",
"urlTemplate": "{{search_url_template}}"
},
"query-input": "required name=search_term_string"
},
"publisher": {
"@type": "Organization",
"name": "{{publisher_name}}",
"logo": {
"@type": "ImageObject",
"url": "{{logo_url}}"
}
}
}


@@ -0,0 +1,117 @@
# CLAUDE.md
## Overview
Core Web Vitals analyzer built on the Google PageSpeed Insights API. Measures LCP, FID, CLS, INP, TTFB, and FCP, and surfaces optimization recommendations.
## Quick Start
```bash
pip install -r scripts/requirements.txt
# API key is optional but recommended (raises the daily quota)
export PAGESPEED_API_KEY=your_api_key
python scripts/pagespeed_client.py --url https://example.com
```
## Scripts
| Script | Purpose |
|--------|---------|
| `pagespeed_client.py` | PageSpeed Insights API client |
| `base_client.py` | Shared utilities |
## Usage
```bash
# Mobile analysis (default)
python scripts/pagespeed_client.py --url https://example.com
# Desktop analysis
python scripts/pagespeed_client.py --url https://example.com --strategy desktop
# Both strategies
python scripts/pagespeed_client.py --url https://example.com --strategy both
# JSON output
python scripts/pagespeed_client.py --url https://example.com --json
# Batch analysis
python scripts/pagespeed_client.py --urls urls.txt --output results.json
```
## Core Web Vitals Metrics
| Metric | Good | Needs Improvement | Poor |
|--------|------|-------------------|------|
| LCP (Largest Contentful Paint) | ≤2.5s | 2.5s-4s | >4s |
| FID (First Input Delay, superseded by INP) | ≤100ms | 100ms-300ms | >300ms |
| CLS (Cumulative Layout Shift) | ≤0.1 | 0.1-0.25 | >0.25 |
| INP (Interaction to Next Paint) | ≤200ms | 200ms-500ms | >500ms |
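These thresholds can be applied mechanically. A standalone sketch (not the shipped `pagespeed_client.py` API, which uses the same boundaries internally):

```python
# Good/poor boundaries from the table above.
# LCP, FID, and INP are in milliseconds; CLS is unitless.
THRESHOLDS = {
    "lcp": (2500, 4000),
    "fid": (100, 300),
    "cls": (0.1, 0.25),
    "inp": (200, 500),
}

def rate_metric(metric: str, value: float) -> str:
    """Classify a measured value as good / needs improvement / poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"

print(rate_metric("lcp", 2100))  # → good (a 2.1s LCP)
```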
## Additional Metrics
| Metric | Description |
|--------|-------------|
| TTFB | Time to First Byte |
| FCP | First Contentful Paint |
| SI | Speed Index |
| TBT | Total Blocking Time |
## Output
```json
{
"url": "https://example.com",
"strategy": "mobile",
"score": 85,
"core_web_vitals": {
"lcp": {"value": 2.1, "rating": "good"},
"fid": {"value": 50, "rating": "good"},
"cls": {"value": 0.05, "rating": "good"},
"inp": {"value": 180, "rating": "good"}
},
"opportunities": [
{
"id": "render-blocking-resources",
"title": "Eliminate render-blocking resources",
"savings_ms": 1200
}
],
"diagnostics": []
}
```
## Configuration
Environment variables:
```bash
PAGESPEED_API_KEY=AIza... # Required for higher quotas
GOOGLE_API_KEY=AIza... # Alternative key name
```
## Rate Limits
| Tier | Limit |
|------|-------|
| No API key | 25 queries/day |
| With API key | 25,000 queries/day |
## Common Recommendations
| Issue | Fix |
|-------|-----|
| Large LCP | Optimize images, preload critical resources |
| High CLS | Set image dimensions, avoid injected content |
| Poor INP | Reduce JavaScript, optimize event handlers |
| Slow TTFB | Improve server response, use CDN |
## Dependencies
```
google-api-python-client>=2.100.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
```


@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""
import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar
from dotenv import load_dotenv
from tenacity import (
retry,
stop_after_attempt,
wait_exponential,
retry_if_exception_type,
)
# Load environment variables
load_dotenv()
# Logging setup
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
T = TypeVar("T")
class RateLimiter:
"""Rate limiter using token bucket algorithm."""
def __init__(self, rate: float, per: float = 1.0):
"""
Initialize rate limiter.
Args:
rate: Number of requests allowed
per: Time period in seconds (default: 1 second)
"""
self.rate = rate
self.per = per
self.tokens = rate
self.last_update = datetime.now()
self._lock = asyncio.Lock()
async def acquire(self) -> None:
"""Acquire a token, waiting if necessary."""
async with self._lock:
now = datetime.now()
elapsed = (now - self.last_update).total_seconds()
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
self.last_update = now
if self.tokens < 1:
wait_time = (1 - self.tokens) * (self.per / self.rate)
await asyncio.sleep(wait_time)
self.tokens = 0
else:
self.tokens -= 1
class BaseAsyncClient:
"""Base class for async API clients with rate limiting."""
def __init__(
self,
max_concurrent: int = 5,
requests_per_second: float = 3.0,
logger: logging.Logger | None = None,
):
"""
Initialize base client.
Args:
max_concurrent: Maximum concurrent requests
requests_per_second: Rate limit
logger: Logger instance
"""
self.semaphore = Semaphore(max_concurrent)
self.rate_limiter = RateLimiter(requests_per_second)
self.logger = logger or logging.getLogger(self.__class__.__name__)
self.stats = {
"requests": 0,
"success": 0,
"errors": 0,
"retries": 0,
}
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10),
retry=retry_if_exception_type(Exception),
)
async def _rate_limited_request(
self,
coro: Callable[[], Any],
) -> Any:
"""Execute a request with rate limiting and retry."""
async with self.semaphore:
await self.rate_limiter.acquire()
self.stats["requests"] += 1
try:
result = await coro()
self.stats["success"] += 1
return result
except Exception as e:
self.stats["errors"] += 1
self.logger.error(f"Request failed: {e}")
raise
async def batch_requests(
self,
requests: list[Callable[[], Any]],
desc: str = "Processing",
) -> list[Any]:
"""Execute multiple requests concurrently."""
try:
from tqdm.asyncio import tqdm
has_tqdm = True
except ImportError:
has_tqdm = False
async def execute(req: Callable) -> Any:
try:
return await self._rate_limited_request(req)
except Exception as e:
return {"error": str(e)}
tasks = [execute(req) for req in requests]
if has_tqdm:
results = []
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
result = await coro
results.append(result)
return results
else:
return await asyncio.gather(*tasks, return_exceptions=True)
def print_stats(self) -> None:
"""Print request statistics."""
self.logger.info("=" * 40)
self.logger.info("Request Statistics:")
self.logger.info(f" Total Requests: {self.stats['requests']}")
self.logger.info(f" Successful: {self.stats['success']}")
self.logger.info(f" Errors: {self.stats['errors']}")
self.logger.info("=" * 40)
class ConfigManager:
"""Manage API configuration and credentials."""
def __init__(self):
load_dotenv()
@property
def google_credentials_path(self) -> str | None:
"""Get Google service account credentials path."""
# Prefer SEO-specific credentials, fallback to general credentials
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
if os.path.exists(seo_creds):
return seo_creds
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
@property
def pagespeed_api_key(self) -> str | None:
"""Get PageSpeed Insights API key."""
return os.getenv("PAGESPEED_API_KEY")
@property
def custom_search_api_key(self) -> str | None:
"""Get Custom Search API key."""
return os.getenv("CUSTOM_SEARCH_API_KEY")
@property
def custom_search_engine_id(self) -> str | None:
"""Get Custom Search Engine ID."""
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
@property
def notion_token(self) -> str | None:
"""Get Notion API token."""
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
def validate_google_credentials(self) -> bool:
"""Validate Google credentials are configured."""
creds_path = self.google_credentials_path
if not creds_path:
return False
return os.path.exists(creds_path)
def get_required(self, key: str) -> str:
"""Get required environment variable or raise error."""
value = os.getenv(key)
if not value:
raise ValueError(f"Missing required environment variable: {key}")
return value
# Singleton config instance
config = ConfigManager()


@@ -0,0 +1,452 @@
"""
PageSpeed Insights Client
=========================
Purpose: Get Core Web Vitals and performance data from PageSpeed Insights API
Python: 3.10+
Usage:
from pagespeed_client import PageSpeedClient
client = PageSpeedClient()
result = client.analyze("https://example.com")
"""
import argparse
import json
import logging
from dataclasses import dataclass, field
from typing import Any
import requests
from base_client import config
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
@dataclass
class CoreWebVitals:
"""Core Web Vitals metrics."""
lcp: float | None = None # Largest Contentful Paint (ms)
fid: float | None = None # First Input Delay (ms)
cls: float | None = None # Cumulative Layout Shift
inp: float | None = None # Interaction to Next Paint (ms)
ttfb: float | None = None # Time to First Byte (ms)
fcp: float | None = None # First Contentful Paint (ms)
# Assessment (GOOD, NEEDS_IMPROVEMENT, POOR)
lcp_rating: str | None = None
fid_rating: str | None = None
cls_rating: str | None = None
inp_rating: str | None = None
def to_dict(self) -> dict:
return {
"lcp": {"value": self.lcp, "rating": self.lcp_rating},
"fid": {"value": self.fid, "rating": self.fid_rating},
"cls": {"value": self.cls, "rating": self.cls_rating},
"inp": {"value": self.inp, "rating": self.inp_rating},
"ttfb": {"value": self.ttfb},
"fcp": {"value": self.fcp},
}
@dataclass
class PageSpeedResult:
"""PageSpeed analysis result."""
url: str
strategy: str # mobile or desktop
performance_score: float | None = None
seo_score: float | None = None
accessibility_score: float | None = None
best_practices_score: float | None = None
core_web_vitals: CoreWebVitals = field(default_factory=CoreWebVitals)
opportunities: list[dict] = field(default_factory=list)
diagnostics: list[dict] = field(default_factory=list)
passed_audits: list[str] = field(default_factory=list)
raw_data: dict = field(default_factory=dict)
def to_dict(self) -> dict:
return {
"url": self.url,
"strategy": self.strategy,
"scores": {
"performance": self.performance_score,
"seo": self.seo_score,
"accessibility": self.accessibility_score,
"best_practices": self.best_practices_score,
},
"core_web_vitals": self.core_web_vitals.to_dict(),
"opportunities_count": len(self.opportunities),
"opportunities": self.opportunities[:10],
"diagnostics_count": len(self.diagnostics),
"passed_audits_count": len(self.passed_audits),
}
class PageSpeedClient:
"""Client for PageSpeed Insights API."""
BASE_URL = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
# Core Web Vitals thresholds
THRESHOLDS = {
"lcp": {"good": 2500, "poor": 4000},
"fid": {"good": 100, "poor": 300},
"cls": {"good": 0.1, "poor": 0.25},
"inp": {"good": 200, "poor": 500},
"ttfb": {"good": 800, "poor": 1800},
"fcp": {"good": 1800, "poor": 3000},
}
def __init__(self, api_key: str | None = None):
"""
Initialize PageSpeed client.
Args:
api_key: PageSpeed API key (optional but recommended for higher quotas)
"""
self.api_key = api_key or config.pagespeed_api_key
self.session = requests.Session()
def _rate_metric(self, metric: str, value: float | None) -> str | None:
"""Rate a metric against thresholds."""
if value is None:
return None
thresholds = self.THRESHOLDS.get(metric)
if not thresholds:
return None
if value <= thresholds["good"]:
return "GOOD"
elif value <= thresholds["poor"]:
return "NEEDS_IMPROVEMENT"
else:
return "POOR"
def analyze(
self,
url: str,
strategy: str = "mobile",
categories: list[str] | None = None,
) -> PageSpeedResult:
"""
Analyze a URL with PageSpeed Insights.
Args:
url: URL to analyze
strategy: "mobile" or "desktop"
categories: Categories to analyze (performance, seo, accessibility, best-practices)
Returns:
PageSpeedResult with scores and metrics
"""
if categories is None:
categories = ["performance", "seo", "accessibility", "best-practices"]
params = {
"url": url,
"strategy": strategy,
"category": categories,
}
if self.api_key:
params["key"] = self.api_key
try:
response = self.session.get(self.BASE_URL, params=params, timeout=60)
response.raise_for_status()
data = response.json()
except requests.RequestException as e:
logger.error(f"PageSpeed API request failed: {e}")
raise
result = PageSpeedResult(url=url, strategy=strategy, raw_data=data)
# Extract scores
lighthouse = data.get("lighthouseResult", {})
categories_data = lighthouse.get("categories", {})
        if "performance" in categories_data:
            score = categories_data["performance"].get("score")
            result.performance_score = score * 100 if score is not None else None
        if "seo" in categories_data:
            score = categories_data["seo"].get("score")
            result.seo_score = score * 100 if score is not None else None
        if "accessibility" in categories_data:
            score = categories_data["accessibility"].get("score")
            result.accessibility_score = score * 100 if score is not None else None
        if "best-practices" in categories_data:
            score = categories_data["best-practices"].get("score")
            result.best_practices_score = score * 100 if score is not None else None
# Extract Core Web Vitals
audits = lighthouse.get("audits", {})
# Lab data
cwv = result.core_web_vitals
if "largest-contentful-paint" in audits:
cwv.lcp = audits["largest-contentful-paint"].get("numericValue")
cwv.lcp_rating = self._rate_metric("lcp", cwv.lcp)
if "total-blocking-time" in audits:
# TBT is proxy for FID in lab data
cwv.fid = audits["total-blocking-time"].get("numericValue")
cwv.fid_rating = self._rate_metric("fid", cwv.fid)
if "cumulative-layout-shift" in audits:
cwv.cls = audits["cumulative-layout-shift"].get("numericValue")
cwv.cls_rating = self._rate_metric("cls", cwv.cls)
if "experimental-interaction-to-next-paint" in audits:
cwv.inp = audits["experimental-interaction-to-next-paint"].get("numericValue")
cwv.inp_rating = self._rate_metric("inp", cwv.inp)
if "server-response-time" in audits:
cwv.ttfb = audits["server-response-time"].get("numericValue")
if "first-contentful-paint" in audits:
cwv.fcp = audits["first-contentful-paint"].get("numericValue")
# Field data (real user data) if available
loading_exp = data.get("loadingExperience", {})
metrics = loading_exp.get("metrics", {})
if "LARGEST_CONTENTFUL_PAINT_MS" in metrics:
cwv.lcp = metrics["LARGEST_CONTENTFUL_PAINT_MS"].get("percentile")
cwv.lcp_rating = metrics["LARGEST_CONTENTFUL_PAINT_MS"].get("category")
if "FIRST_INPUT_DELAY_MS" in metrics:
cwv.fid = metrics["FIRST_INPUT_DELAY_MS"].get("percentile")
cwv.fid_rating = metrics["FIRST_INPUT_DELAY_MS"].get("category")
if "CUMULATIVE_LAYOUT_SHIFT_SCORE" in metrics:
cwv.cls = metrics["CUMULATIVE_LAYOUT_SHIFT_SCORE"].get("percentile") / 100
cwv.cls_rating = metrics["CUMULATIVE_LAYOUT_SHIFT_SCORE"].get("category")
if "INTERACTION_TO_NEXT_PAINT" in metrics:
cwv.inp = metrics["INTERACTION_TO_NEXT_PAINT"].get("percentile")
cwv.inp_rating = metrics["INTERACTION_TO_NEXT_PAINT"].get("category")
# Extract opportunities
for audit_id, audit in audits.items():
if audit.get("details", {}).get("type") == "opportunity":
savings = audit.get("details", {}).get("overallSavingsMs", 0)
if savings > 0:
result.opportunities.append({
"id": audit_id,
"title": audit.get("title", ""),
"description": audit.get("description", ""),
"savings_ms": savings,
"score": audit.get("score", 0),
})
# Sort opportunities by savings
result.opportunities.sort(key=lambda x: x["savings_ms"], reverse=True)
# Extract diagnostics
for audit_id, audit in audits.items():
score = audit.get("score")
if score is not None and score < 1 and audit.get("details"):
if audit.get("details", {}).get("type") not in ["opportunity", None]:
result.diagnostics.append({
"id": audit_id,
"title": audit.get("title", ""),
"description": audit.get("description", ""),
"score": score,
})
# Extract passed audits
for audit_id, audit in audits.items():
if audit.get("score") == 1:
result.passed_audits.append(audit.get("title", audit_id))
return result
def analyze_both_strategies(self, url: str) -> dict:
"""Analyze URL for both mobile and desktop."""
mobile = self.analyze(url, strategy="mobile")
desktop = self.analyze(url, strategy="desktop")
return {
"url": url,
"mobile": mobile.to_dict(),
"desktop": desktop.to_dict(),
"comparison": {
"performance_difference": (
(desktop.performance_score or 0) - (mobile.performance_score or 0)
),
"mobile_first_issues": self._identify_mobile_issues(mobile, desktop),
},
}
def _identify_mobile_issues(
self,
mobile: PageSpeedResult,
desktop: PageSpeedResult,
) -> list[str]:
"""Identify issues that affect mobile more than desktop."""
issues = []
if mobile.performance_score and desktop.performance_score:
if desktop.performance_score - mobile.performance_score > 20:
issues.append("Significant performance gap between mobile and desktop")
m_cwv = mobile.core_web_vitals
d_cwv = desktop.core_web_vitals
if m_cwv.lcp and d_cwv.lcp and m_cwv.lcp > d_cwv.lcp * 1.5:
issues.append("LCP significantly slower on mobile")
if m_cwv.cls and d_cwv.cls and m_cwv.cls > d_cwv.cls * 2:
issues.append("Layout shift issues more severe on mobile")
return issues
def get_cwv_summary(self, url: str) -> dict:
"""Get a summary focused on Core Web Vitals."""
result = self.analyze(url, strategy="mobile")
cwv = result.core_web_vitals
return {
"url": url,
"overall_cwv_status": self._overall_cwv_status(cwv),
"metrics": {
"lcp": {
"value": f"{cwv.lcp / 1000:.2f}s" if cwv.lcp else None,
"rating": cwv.lcp_rating,
"threshold": "≤ 2.5s good, > 4.0s poor",
},
"fid": {
"value": f"{cwv.fid:.0f}ms" if cwv.fid else None,
"rating": cwv.fid_rating,
"threshold": "≤ 100ms good, > 300ms poor",
},
"cls": {
"value": f"{cwv.cls:.3f}" if cwv.cls else None,
"rating": cwv.cls_rating,
"threshold": "≤ 0.1 good, > 0.25 poor",
},
"inp": {
"value": f"{cwv.inp:.0f}ms" if cwv.inp else None,
"rating": cwv.inp_rating,
"threshold": "≤ 200ms good, > 500ms poor",
},
},
"top_opportunities": result.opportunities[:5],
}
def _overall_cwv_status(self, cwv: CoreWebVitals) -> str:
"""Determine overall Core Web Vitals status."""
# Include INP, which replaced FID as a Core Web Vital in March 2024
ratings = [cwv.lcp_rating, cwv.inp_rating, cwv.fid_rating, cwv.cls_rating]
ratings = [r for r in ratings if r]
if not ratings:
return "UNKNOWN"
if any(r == "POOR" for r in ratings):
return "POOR"
if any(r == "NEEDS_IMPROVEMENT" for r in ratings):
return "NEEDS_IMPROVEMENT"
return "GOOD"
def generate_report(self, result: PageSpeedResult) -> str:
"""Generate human-readable performance report."""
lines = [
"=" * 60,
"PageSpeed Insights Report",
"=" * 60,
f"URL: {result.url}",
f"Strategy: {result.strategy}",
"",
"Scores:",
# `is not None` so a legitimate score of 0 still prints as 0/100
f"  Performance: {result.performance_score:.0f}/100" if result.performance_score is not None else "  Performance: N/A",
f"  SEO: {result.seo_score:.0f}/100" if result.seo_score is not None else "  SEO: N/A",
f"  Accessibility: {result.accessibility_score:.0f}/100" if result.accessibility_score is not None else "  Accessibility: N/A",
f"  Best Practices: {result.best_practices_score:.0f}/100" if result.best_practices_score is not None else "  Best Practices: N/A",
"",
"Core Web Vitals:",
]
cwv = result.core_web_vitals
def format_metric(name: str, value: Any, rating: str | None, unit: str) -> str:
if value is None:
return f" {name}: N/A"
rating_str = f" ({rating})" if rating else ""
return f" {name}: {value}{unit}{rating_str}"
lines.append(format_metric("LCP", f"{cwv.lcp / 1000:.2f}" if cwv.lcp else None, cwv.lcp_rating, "s"))
lines.append(format_metric("FID/TBT", f"{cwv.fid:.0f}" if cwv.fid else None, cwv.fid_rating, "ms"))
lines.append(format_metric("CLS", f"{cwv.cls:.3f}" if cwv.cls else None, cwv.cls_rating, ""))
lines.append(format_metric("INP", f"{cwv.inp:.0f}" if cwv.inp else None, cwv.inp_rating, "ms"))
lines.append(format_metric("TTFB", f"{cwv.ttfb:.0f}" if cwv.ttfb else None, None, "ms"))
lines.append(format_metric("FCP", f"{cwv.fcp / 1000:.2f}" if cwv.fcp else None, None, "s"))
if result.opportunities:
lines.extend([
"",
f"Top Opportunities ({len(result.opportunities)} total):",
])
for opp in result.opportunities[:5]:
savings = opp["savings_ms"]
lines.append(f" - {opp['title']}: -{savings / 1000:.1f}s potential savings")
lines.extend(["", "=" * 60])
return "\n".join(lines)
def main():
"""CLI entry point."""
parser = argparse.ArgumentParser(description="PageSpeed Insights Client")
parser.add_argument("--url", "-u", required=True, help="URL to analyze")
parser.add_argument("--strategy", "-s", default="mobile",
choices=["mobile", "desktop", "both"],
help="Analysis strategy")
parser.add_argument("--output", "-o", help="Output file for JSON")
parser.add_argument("--json", action="store_true", help="Output as JSON")
parser.add_argument("--cwv-only", action="store_true",
help="Show only Core Web Vitals summary")
args = parser.parse_args()
client = PageSpeedClient()
if args.cwv_only:
summary = client.get_cwv_summary(args.url)
print(json.dumps(summary, indent=2))
elif args.strategy == "both":
result = client.analyze_both_strategies(args.url)
output = json.dumps(result, indent=2)
if args.output:
with open(args.output, "w") as f:
f.write(output)
else:
print(output)
else:
result = client.analyze(args.url, strategy=args.strategy)
if args.json or args.output:
output = json.dumps(result.to_dict(), indent=2)
if args.output:
with open(args.output, "w") as f:
f.write(output)
else:
print(output)
else:
print(client.generate_report(result))
if __name__ == "__main__":
main()


@@ -0,0 +1,6 @@
# 15-seo-core-web-vitals dependencies
google-api-python-client>=2.100.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
typer>=0.9.0


@@ -0,0 +1,108 @@
---
name: seo-core-web-vitals
version: 1.0.0
description: Core Web Vitals analyzer for LCP, FID, CLS, INP performance metrics. Triggers: Core Web Vitals, page speed, LCP, CLS, FID, INP, performance.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*
---
# SEO Core Web Vitals
## Purpose
Analyze Core Web Vitals performance metrics and provide optimization recommendations.
## Core Capabilities
1. **LCP** - Largest Contentful Paint measurement
2. **FID/INP** - Interactivity metrics
3. **CLS** - Cumulative Layout Shift
4. **Recommendations** - Optimization guidance
## Metrics Thresholds
| Metric | Good | Needs Work | Poor |
|--------|------|------------|------|
| LCP | ≤2.5s | 2.5-4s | >4s |
| FID | ≤100ms | 100-300ms | >300ms |
| CLS | ≤0.1 | 0.1-0.25 | >0.25 |
| INP | ≤200ms | 200-500ms | >500ms |
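The threshold table maps directly onto a small rating helper; a minimal sketch (the `rate` function and threshold dictionary are illustrative, not part of any API):

```python
# Classify a Core Web Vitals value against the thresholds in the table above.
# Each entry: (good_max, poor_min); values between the two are "needs improvement".
THRESHOLDS = {
    "lcp": (2.5, 4.0),    # seconds
    "fid": (100, 300),    # milliseconds
    "cls": (0.1, 0.25),   # unitless score
    "inp": (200, 500),    # milliseconds
}

def rate(metric: str, value: float) -> str:
    """Return GOOD / NEEDS_IMPROVEMENT / POOR for a metric value."""
    good_max, poor_min = THRESHOLDS[metric]
    if value <= good_max:
        return "GOOD"
    if value > poor_min:
        return "POOR"
    return "NEEDS_IMPROVEMENT"

print(rate("lcp", 2.1))   # GOOD
print(rate("inp", 350))   # NEEDS_IMPROVEMENT
print(rate("cls", 0.3))   # POOR
```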
## Data Sources
### Option 1: PageSpeed Insights (Recommended)
Run the test with an external tool and provide the results:
- Visit: https://pagespeed.web.dev/
- Enter URL, run test
- Provide scores to skill
### Option 2: Research Best Practices
```
mcp__perplexity__search: "Core Web Vitals optimization [specific issue]"
```
## Workflow
1. Request PageSpeed Insights data from user
2. Analyze provided metrics
3. Identify failing metrics
4. Research optimization strategies
5. Provide prioritized recommendations
## Common LCP Issues
| Cause | Fix |
|-------|-----|
| Slow server response | Improve TTFB, use CDN |
| Render-blocking resources | Defer non-critical CSS/JS |
| Slow resource load | Preload LCP image |
| Client-side rendering | Use SSR/SSG |
## Common CLS Issues
| Cause | Fix |
|-------|-----|
| Images without dimensions | Add width/height attributes |
| Ads/embeds without space | Reserve space with CSS |
| Web fonts causing FOIT/FOUT | Use font-display: swap |
| Dynamic content injection | Reserve space, use transforms |
## Common INP Issues
| Cause | Fix |
|-------|-----|
| Long JavaScript tasks | Break up tasks, use web workers |
| Large DOM size | Reduce DOM nodes |
| Heavy event handlers | Debounce, optimize listeners |
| Third-party scripts | Defer, lazy load |
## Output Format
```markdown
## Core Web Vitals: [URL]
### Scores
| Metric | Mobile | Desktop | Status |
|--------|--------|---------|--------|
| LCP | Xs | Xs | Good/Poor |
| FID | Xms | Xms | Good/Poor |
| CLS | X.XX | X.XX | Good/Poor |
| INP | Xms | Xms | Good/Poor |
### Overall Score
- Mobile: X/100
- Desktop: X/100
### Priority Fixes
1. [Highest impact recommendation]
2. [Second priority]
### Detailed Recommendations
[Per-metric optimization steps]
```
## Limitations
- Requires external PageSpeed Insights data
- Lab data may differ from field data
- Some fixes require developer implementation
- Third-party scripts may be difficult to optimize


@@ -0,0 +1,122 @@
# CLAUDE.md
## Overview
Google Search Console data retriever: search analytics (rankings, CTR, impressions), sitemap status, and index coverage.
## Quick Start
```bash
pip install -r scripts/requirements.txt
# Requires service account credentials
# ~/.credential/ourdigital-seo-agent.json
python scripts/gsc_client.py --site sc-domain:example.com --action summary
```
## Scripts
| Script | Purpose |
|--------|---------|
| `gsc_client.py` | Search Console API client |
| `base_client.py` | Shared utilities |
## Configuration
Service account setup:
```bash
# Credentials file location
~/.credential/ourdigital-seo-agent.json
# Add service account email to GSC property as user
ourdigital-seo-agent@ourdigital-insights.iam.gserviceaccount.com
```
## Usage
```bash
# Performance summary (last 28 days)
python scripts/gsc_client.py --site sc-domain:example.com --action summary
# Query-level data
python scripts/gsc_client.py --site sc-domain:example.com --action queries --limit 100
# Page-level data
python scripts/gsc_client.py --site sc-domain:example.com --action pages
# Custom date range
python scripts/gsc_client.py --site sc-domain:example.com --action queries \
--start 2024-01-01 --end 2024-01-31
# Sitemap status
python scripts/gsc_client.py --site sc-domain:example.com --action sitemaps
# JSON output
python scripts/gsc_client.py --site sc-domain:example.com --action summary --json
```
## Actions
| Action | Description |
|--------|-------------|
| `summary` | Overview metrics (clicks, impressions, CTR, position) |
| `queries` | Top search queries |
| `pages` | Top pages by clicks |
| `sitemaps` | Sitemap submission status |
| `coverage` | Index coverage issues |
## Output: Summary
```json
{
"site": "sc-domain:example.com",
"date_range": "2024-01-01 to 2024-01-28",
"totals": {
"clicks": 15000,
"impressions": 500000,
"ctr": 3.0,
"position": 12.5
}
}
```
## Output: Queries
```json
{
"queries": [
{
"query": "keyword",
"clicks": 500,
"impressions": 10000,
"ctr": 5.0,
"position": 3.2
}
]
}
```
## Rate Limits
| Limit | Value |
|-------|-------|
| Queries per minute | 1,200 |
| Rows per request | 25,000 |
## Site Property Formats
| Format | Example |
|--------|---------|
| Domain property | `sc-domain:example.com` |
| URL prefix | `https://www.example.com/` |
## Dependencies
```
google-api-python-client>=2.100.0
google-auth>=2.23.0
python-dotenv>=1.0.0
rich>=13.7.0
pandas>=2.1.0
```


@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""
import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar
from dotenv import load_dotenv
from tenacity import (
retry,
stop_after_attempt,
wait_exponential,
retry_if_exception_type,
)
# Load environment variables
load_dotenv()
# Logging setup
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
T = TypeVar("T")
class RateLimiter:
"""Rate limiter using token bucket algorithm."""
def __init__(self, rate: float, per: float = 1.0):
"""
Initialize rate limiter.
Args:
rate: Number of requests allowed
per: Time period in seconds (default: 1 second)
"""
self.rate = rate
self.per = per
self.tokens = rate
self.last_update = datetime.now()
self._lock = asyncio.Lock()
async def acquire(self) -> None:
"""Acquire a token, waiting if necessary."""
async with self._lock:
now = datetime.now()
elapsed = (now - self.last_update).total_seconds()
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
self.last_update = now
if self.tokens < 1:
wait_time = (1 - self.tokens) * (self.per / self.rate)
await asyncio.sleep(wait_time)
self.tokens = 0
else:
self.tokens -= 1
class BaseAsyncClient:
"""Base class for async API clients with rate limiting."""
def __init__(
self,
max_concurrent: int = 5,
requests_per_second: float = 3.0,
logger: logging.Logger | None = None,
):
"""
Initialize base client.
Args:
max_concurrent: Maximum concurrent requests
requests_per_second: Rate limit
logger: Logger instance
"""
self.semaphore = Semaphore(max_concurrent)
self.rate_limiter = RateLimiter(requests_per_second)
self.logger = logger or logging.getLogger(self.__class__.__name__)
self.stats = {
"requests": 0,
"success": 0,
"errors": 0,
"retries": 0,
}
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10),
retry=retry_if_exception_type(Exception),
)
async def _rate_limited_request(
self,
coro: Callable[[], Any],
) -> Any:
"""Execute a request with rate limiting and retry."""
async with self.semaphore:
await self.rate_limiter.acquire()
self.stats["requests"] += 1
try:
result = await coro()
self.stats["success"] += 1
return result
except Exception as e:
self.stats["errors"] += 1
self.logger.error(f"Request failed: {e}")
raise
async def batch_requests(
self,
requests: list[Callable[[], Any]],
desc: str = "Processing",
) -> list[Any]:
"""Execute multiple requests concurrently."""
try:
from tqdm.asyncio import tqdm
has_tqdm = True
except ImportError:
has_tqdm = False
async def execute(req: Callable) -> Any:
try:
return await self._rate_limited_request(req)
except Exception as e:
return {"error": str(e)}
tasks = [execute(req) for req in requests]
if has_tqdm:
results = []
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
result = await coro
results.append(result)
return results
else:
return await asyncio.gather(*tasks, return_exceptions=True)
def print_stats(self) -> None:
"""Print request statistics."""
self.logger.info("=" * 40)
self.logger.info("Request Statistics:")
self.logger.info(f" Total Requests: {self.stats['requests']}")
self.logger.info(f" Successful: {self.stats['success']}")
self.logger.info(f" Errors: {self.stats['errors']}")
self.logger.info("=" * 40)
class ConfigManager:
"""Manage API configuration and credentials."""
def __init__(self):
load_dotenv()
@property
def google_credentials_path(self) -> str | None:
"""Get Google service account credentials path."""
# Prefer SEO-specific credentials, fallback to general credentials
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
if os.path.exists(seo_creds):
return seo_creds
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
@property
def pagespeed_api_key(self) -> str | None:
"""Get PageSpeed Insights API key."""
return os.getenv("PAGESPEED_API_KEY")
@property
def custom_search_api_key(self) -> str | None:
"""Get Custom Search API key."""
return os.getenv("CUSTOM_SEARCH_API_KEY")
@property
def custom_search_engine_id(self) -> str | None:
"""Get Custom Search Engine ID."""
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
@property
def notion_token(self) -> str | None:
"""Get Notion API token."""
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
def validate_google_credentials(self) -> bool:
"""Validate Google credentials are configured."""
creds_path = self.google_credentials_path
if not creds_path:
return False
return os.path.exists(creds_path)
def get_required(self, key: str) -> str:
"""Get required environment variable or raise error."""
value = os.getenv(key)
if not value:
raise ValueError(f"Missing required environment variable: {key}")
return value
# Singleton config instance
config = ConfigManager()


@@ -0,0 +1,409 @@
"""
Google Search Console Client
============================
Purpose: Interact with Google Search Console API for SEO data
Python: 3.10+
Usage:
from gsc_client import SearchConsoleClient
client = SearchConsoleClient()
data = client.get_search_analytics("sc-domain:example.com")
"""
import logging
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any
from google.oauth2 import service_account
from googleapiclient.discovery import build
from base_client import config
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
@dataclass
class SearchAnalyticsResult:
"""Search analytics query result."""
rows: list[dict] = field(default_factory=list)
total_clicks: int = 0
total_impressions: int = 0
average_ctr: float = 0.0
average_position: float = 0.0
@dataclass
class SitemapInfo:
"""Sitemap information from Search Console."""
path: str
last_submitted: str | None = None
last_downloaded: str | None = None
is_pending: bool = False
is_sitemaps_index: bool = False
warnings: int = 0
errors: int = 0
class SearchConsoleClient:
"""Client for Google Search Console API."""
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
def __init__(self, credentials_path: str | None = None):
"""
Initialize Search Console client.
Args:
credentials_path: Path to service account JSON key
"""
self.credentials_path = credentials_path or config.google_credentials_path
self._service = None
@property
def service(self):
"""Get or create Search Console service."""
if self._service is None:
if not self.credentials_path:
raise ValueError(
"Google credentials not configured. "
"Set GOOGLE_APPLICATION_CREDENTIALS environment variable."
)
credentials = service_account.Credentials.from_service_account_file(
self.credentials_path,
scopes=self.SCOPES,
)
self._service = build("searchconsole", "v1", credentials=credentials)
return self._service
def list_sites(self) -> list[dict]:
"""List all sites accessible to the service account."""
response = self.service.sites().list().execute()
return response.get("siteEntry", [])
def get_search_analytics(
self,
site_url: str,
start_date: str | None = None,
end_date: str | None = None,
dimensions: list[str] | None = None,
row_limit: int = 25000,
filters: list[dict] | None = None,
) -> SearchAnalyticsResult:
"""
Get search analytics data.
Args:
site_url: Site URL (e.g., "sc-domain:example.com" or "https://example.com/")
start_date: Start date (YYYY-MM-DD), defaults to 30 days ago
end_date: End date (YYYY-MM-DD), defaults to yesterday
dimensions: List of dimensions (query, page, country, device, date)
row_limit: Maximum rows to return
filters: Dimension filters
Returns:
SearchAnalyticsResult with rows and summary stats
"""
# Default date range: last 30 days
if not end_date:
end_date = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
if not start_date:
start_date = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")
# Default dimensions
if dimensions is None:
dimensions = ["query", "page"]
request_body = {
"startDate": start_date,
"endDate": end_date,
"dimensions": dimensions,
"rowLimit": row_limit,
}
if filters:
request_body["dimensionFilterGroups"] = [{"filters": filters}]
try:
response = self.service.searchanalytics().query(
siteUrl=site_url,
body=request_body,
).execute()
except Exception as e:
logger.error(f"Failed to query search analytics: {e}")
raise
rows = response.get("rows", [])
# Calculate totals
total_clicks = sum(row.get("clicks", 0) for row in rows)
total_impressions = sum(row.get("impressions", 0) for row in rows)
# Aggregate CTR is clicks / impressions, not the mean of per-row CTRs
avg_ctr = total_clicks / total_impressions if total_impressions else 0
# Weight position by impressions so high-volume rows dominate the average
weighted_position = sum(row.get("position", 0) * row.get("impressions", 0) for row in rows)
avg_position = weighted_position / total_impressions if total_impressions else 0
return SearchAnalyticsResult(
rows=rows,
total_clicks=total_clicks,
total_impressions=total_impressions,
average_ctr=avg_ctr,
average_position=avg_position,
)
def get_top_queries(
self,
site_url: str,
limit: int = 100,
start_date: str | None = None,
end_date: str | None = None,
) -> list[dict]:
"""Get top search queries by clicks."""
result = self.get_search_analytics(
site_url=site_url,
dimensions=["query"],
row_limit=limit,
start_date=start_date,
end_date=end_date,
)
# Sort by clicks
sorted_rows = sorted(
result.rows,
key=lambda x: x.get("clicks", 0),
reverse=True,
)
return [
{
"query": row["keys"][0],
"clicks": row.get("clicks", 0),
"impressions": row.get("impressions", 0),
"ctr": row.get("ctr", 0),
"position": row.get("position", 0),
}
for row in sorted_rows[:limit]
]
def get_top_pages(
self,
site_url: str,
limit: int = 100,
start_date: str | None = None,
end_date: str | None = None,
) -> list[dict]:
"""Get top pages by clicks."""
result = self.get_search_analytics(
site_url=site_url,
dimensions=["page"],
row_limit=limit,
start_date=start_date,
end_date=end_date,
)
sorted_rows = sorted(
result.rows,
key=lambda x: x.get("clicks", 0),
reverse=True,
)
return [
{
"page": row["keys"][0],
"clicks": row.get("clicks", 0),
"impressions": row.get("impressions", 0),
"ctr": row.get("ctr", 0),
"position": row.get("position", 0),
}
for row in sorted_rows[:limit]
]
def get_sitemaps(self, site_url: str) -> list[SitemapInfo]:
"""Get list of sitemaps for a site."""
try:
response = self.service.sitemaps().list(siteUrl=site_url).execute()
except Exception as e:
logger.error(f"Failed to get sitemaps: {e}")
raise
sitemaps = []
for sm in response.get("sitemap", []):
sitemaps.append(SitemapInfo(
path=sm.get("path", ""),
last_submitted=sm.get("lastSubmitted"),
last_downloaded=sm.get("lastDownloaded"),
is_pending=sm.get("isPending", False),
is_sitemaps_index=sm.get("isSitemapsIndex", False),
warnings=sm.get("warnings", 0),
errors=sm.get("errors", 0),
))
return sitemaps
def submit_sitemap(self, site_url: str, sitemap_url: str) -> bool:
"""Submit a sitemap for indexing."""
try:
self.service.sitemaps().submit(
siteUrl=site_url,
feedpath=sitemap_url,
).execute()
logger.info(f"Submitted sitemap: {sitemap_url}")
return True
except Exception as e:
logger.error(f"Failed to submit sitemap: {e}")
return False
def inspect_url(self, site_url: str, inspection_url: str) -> dict:
"""
Inspect a URL's indexing status.
Note: This uses the URL Inspection API which may have different quotas.
"""
try:
response = self.service.urlInspection().index().inspect(
body={
"inspectionUrl": inspection_url,
"siteUrl": site_url,
}
).execute()
result = response.get("inspectionResult", {})
return {
"url": inspection_url,
"indexing_state": result.get("indexStatusResult", {}).get(
"coverageState", "Unknown"
),
"last_crawl_time": result.get("indexStatusResult", {}).get(
"lastCrawlTime"
),
"crawled_as": result.get("indexStatusResult", {}).get("crawledAs"),
"robots_txt_state": result.get("indexStatusResult", {}).get(
"robotsTxtState"
),
"mobile_usability": result.get("mobileUsabilityResult", {}).get(
"verdict", "Unknown"
),
"rich_results": result.get("richResultsResult", {}).get(
"verdict", "Unknown"
),
}
except Exception as e:
logger.error(f"Failed to inspect URL: {e}")
raise
def get_performance_summary(
self,
site_url: str,
days: int = 30,
) -> dict:
"""Get a summary of search performance."""
end_date = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
start_date = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
# Get overall stats
overall = self.get_search_analytics(
site_url=site_url,
dimensions=[],
start_date=start_date,
end_date=end_date,
)
# Get top queries
top_queries = self.get_top_queries(
site_url=site_url,
limit=10,
start_date=start_date,
end_date=end_date,
)
# Get top pages
top_pages = self.get_top_pages(
site_url=site_url,
limit=10,
start_date=start_date,
end_date=end_date,
)
# Get by device
by_device = self.get_search_analytics(
site_url=site_url,
dimensions=["device"],
start_date=start_date,
end_date=end_date,
)
device_breakdown = {}
for row in by_device.rows:
device = row["keys"][0]
device_breakdown[device] = {
"clicks": row.get("clicks", 0),
"impressions": row.get("impressions", 0),
"ctr": row.get("ctr", 0),
"position": row.get("position", 0),
}
return {
"period": f"{start_date} to {end_date}",
"total_clicks": overall.total_clicks,
"total_impressions": overall.total_impressions,
"average_ctr": overall.average_ctr,
"average_position": overall.average_position,
"top_queries": top_queries,
"top_pages": top_pages,
"by_device": device_breakdown,
}
def main():
"""Test the Search Console client."""
import argparse
parser = argparse.ArgumentParser(description="Google Search Console Client")
parser.add_argument("--site", "-s", required=True, help="Site URL")
parser.add_argument("--action", "-a", default="summary",
choices=["summary", "queries", "pages", "sitemaps", "inspect"],
help="Action to perform")
parser.add_argument("--url", help="URL to inspect")
parser.add_argument("--days", type=int, default=30, help="Days of data")
args = parser.parse_args()
client = SearchConsoleClient()
if args.action == "summary":
summary = client.get_performance_summary(args.site, args.days)
import json
print(json.dumps(summary, indent=2, default=str))
elif args.action == "queries":
queries = client.get_top_queries(args.site)
for q in queries[:20]:
print(f"{q['query']}: {q['clicks']} clicks, pos {q['position']:.1f}")
elif args.action == "pages":
pages = client.get_top_pages(args.site)
for p in pages[:20]:
print(f"{p['page']}: {p['clicks']} clicks, pos {p['position']:.1f}")
elif args.action == "sitemaps":
sitemaps = client.get_sitemaps(args.site)
for sm in sitemaps:
print(f"{sm.path}: errors={sm.errors}, warnings={sm.warnings}")
elif args.action == "inspect" and args.url:
result = client.inspect_url(args.site, args.url)
import json
print(json.dumps(result, indent=2))
if __name__ == "__main__":
main()


@@ -0,0 +1,7 @@
# 16-seo-search-console dependencies
google-api-python-client>=2.100.0
google-auth>=2.23.0
pandas>=2.1.0
python-dotenv>=1.0.0
rich>=13.7.0
typer>=0.9.0


@@ -0,0 +1,117 @@
---
name: seo-search-console
version: 1.0.0
description: Google Search Console data analyzer for rankings, CTR, impressions, and index coverage. Triggers: Search Console, GSC, rankings, search performance, impressions, CTR.
allowed-tools: mcp__perplexity__*, mcp__notion__*
---
# SEO Search Console
## Purpose
Analyze Google Search Console data: search performance (queries, pages, CTR, position), sitemap status, and index coverage.
## Core Capabilities
1. **Performance Analysis** - Clicks, impressions, CTR, position
2. **Query Analysis** - Top search queries
3. **Page Performance** - Best/worst performing pages
4. **Index Coverage** - Crawl and index issues
5. **Sitemap Status** - Submission and processing
## Data Collection
### Option 1: User Provides Data
Request GSC export from user:
1. Go to Search Console > Performance
2. Export data (CSV or Google Sheets)
3. Share with assistant
### Option 2: User Describes Data
User verbally provides:
- Top queries and positions
- CTR trends
- Coverage issues
## Analysis Framework
### Performance Metrics
| Metric | What It Measures | Good Benchmark |
|--------|------------------|----------------|
| Clicks | User visits from search | Trending up |
| Impressions | Search appearances | High for target keywords |
| CTR | Click-through rate | 2-5% average |
| Position | Average ranking | <10 for key terms |
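The CTR benchmark also gives a quick way to size an opportunity; a minimal sketch (the numbers are illustrative, not from real GSC data):

```python
# Estimate extra monthly clicks from lifting a query's CTR, given its impressions.
def click_uplift(impressions: int, current_ctr: float, target_ctr: float) -> int:
    """Additional clicks if CTR moves from current to target (CTRs as fractions)."""
    return round(impressions * (target_ctr - current_ctr))

# A query with 10,000 impressions moving from 1.2% to 3% CTR:
print(click_uplift(10_000, 0.012, 0.03))  # 180 extra clicks
```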
### Query Analysis
Identify:
- **Winners** - High position, high CTR
- **Opportunities** - High impressions, low CTR
- **Quick wins** - Position 8-20, low effort to improve
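These buckets can be computed mechanically from a Performance export; a minimal sketch, where the cutoffs (position 8-20, 1,000+ impressions, CTR below 2%) are illustrative assumptions echoing the benchmarks above:

```python
# Bucket GSC query rows into the categories above.
# Each row: {"query", "clicks", "impressions", "ctr" (fraction), "position"}
def classify(row: dict) -> str:
    pos, ctr, imp = row["position"], row["ctr"], row["impressions"]
    if pos <= 7 and ctr >= 0.05:
        return "winner"       # high position, high CTR
    if 8 <= pos <= 20:
        return "quick_win"    # close to page 1, low effort to improve
    if imp >= 1000 and ctr < 0.02:
        return "opportunity"  # visible but under-clicked
    return "other"

rows = [
    {"query": "a", "clicks": 500, "impressions": 8000, "ctr": 0.06, "position": 3.2},
    {"query": "b", "clicks": 40, "impressions": 9000, "ctr": 0.004, "position": 12.0},
    {"query": "c", "clicks": 10, "impressions": 5000, "ctr": 0.002, "position": 35.0},
]
for r in rows:
    print(r["query"], classify(r))  # a winner / b quick_win / c opportunity
```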
### Page Analysis
Categorize:
- **Top performers** - High clicks, good CTR
- **Underperformers** - High impressions, low CTR
- **Declining** - Down vs previous period
## Workflow
1. Collect GSC data from user
2. Analyze performance trends
3. Identify top queries and pages
4. Find optimization opportunities
5. Check for coverage issues
6. Provide actionable recommendations
## Output Format
```markdown
## Search Console Analysis: [Site]
### Overview (Last 28 Days)
| Metric | Value | vs Previous |
|--------|-------|-------------|
| Clicks | X | +X% |
| Impressions | X | +X% |
| CTR | X% | +X% |
| Position | X | +X |
### Top Queries
| Query | Clicks | Position | Opportunity |
|-------|--------|----------|-------------|
### Top Pages
| Page | Clicks | CTR | Status |
|------|--------|-----|--------|
### Opportunities
1. [Query with high impressions, low CTR]
2. [Page ranking 8-20 that can improve]
### Issues
- [Coverage problems]
- [Sitemap issues]
### Recommendations
1. [Priority action]
```
## Common Issues
| Issue | Impact | Fix |
|-------|--------|-----|
| Low CTR on high-impression query | Lost traffic | Improve title/description |
| Declining positions | Traffic loss | Update content, build links |
| Not indexed pages | No visibility | Fix crawl issues |
| Sitemap errors | Discovery problems | Fix sitemap XML |
## Limitations
- Requires user to provide GSC data
- API access needs service account setup
- Data has 2-3 day delay
- Limited to verified properties


@@ -0,0 +1,235 @@
# SEO Gateway Page Strategist Skill
## Overview
A comprehensive Claude Skill for creating SEO-focused gateway page strategies for Korean medical and service websites, with specific optimization for both Naver and Google.
## Installation
1. Download and extract the skill package
2. In Claude Desktop, go to Settings > Skills
3. Click "Add Skill" and select the folder
4. The skill will be available in your conversations
## File Structure
```
seo-gateway-strategist/
├── SKILL.md # Main skill definition
├── README.md # This file
├── scripts/
│ ├── keyword_analyzer.py # Keyword research automation
│ └── requirements.txt # Python dependencies
├── templates/
│ ├── keyword-research-template.md
│ ├── content-architecture-template.md
│ └── seo-checklist-template.md
└── examples/
└── eye-surgery-strategy-example.md
```
## Quick Start
### Basic Usage
Simply ask Claude:
```
"Use the SEO Gateway Page Strategist skill to create a strategy for [your service]"
```
### Example Prompts
```
"Create an SEO gateway page strategy for 눈 성형"
"Use SEO Gateway Page Strategist skill for laser hair removal targeting Gangnam area"
"Apply the gateway strategist skill for 동안 성형 with competitor analysis"
```
## Features
### 1. Keyword Research & Analysis
- Primary keyword identification
- 10+ LSI keyword suggestions
- Long-tail keyword generation
- Question keywords for FAQs
- User intent distribution analysis
### 2. Content Architecture Planning
- Complete H1-H3 hierarchy
- Section-by-section word counts
- Keyword placement strategy
- Internal linking map
- Content depth recommendations
### 3. Technical SEO Specifications
- Meta tag templates
- Schema markup code
- Performance requirements
- Mobile optimization guidelines
- Core Web Vitals targets
### 4. Performance Tracking
- 30/60/90-day KPI targets
- Measurement methodologies
- Competitive benchmarking
- ROI tracking framework
## Python Scripts Usage
### keyword_analyzer.py
Automated keyword research and analysis tool.
#### Installation
```bash
cd scripts
pip install -r requirements.txt
```
#### Usage
```bash
python keyword_analyzer.py "눈 성형"
```
#### Output
- Keyword analysis report (console)
- JSON export file with all data
- Strategic recommendations
## Templates
### keyword-research-template.md
Comprehensive worksheet for keyword planning:
- Primary keyword metrics
- LSI keyword matrix
- Intent distribution
- Competitor gaps
- Seasonal trends
### content-architecture-template.md
Page structure blueprint:
- Complete heading hierarchy
- Content blocks with word counts
- Keyword density mapping
- Schema markup templates
### seo-checklist-template.md
Technical implementation checklist:
- Meta tags optimization
- Performance requirements
- Mobile optimization
- Naver-specific elements
- Tracking setup
## Best Practices
### For Medical Services
1. Always verify medical claims compliance
2. Include trust signals prominently
3. Focus on E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness)
4. Use medical schema markup
5. Include comprehensive FAQs
### For Local SEO
1. Include location modifiers in keywords
2. Optimize for "near me" searches
3. Add local business schema
4. Create location-specific content
5. Build local citations
### For Naver Optimization
1. Keep titles under 30 Korean characters
2. Optimize image-to-text ratio (40:60)
3. Minimize outbound links
4. Focus on brand searches
5. Integrate Naver Blog content
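The 30-character title guideline is easy to check programmatically; a minimal sketch (Python's `len` counts each Korean syllable as one character, which matches how the limit is usually stated):

```python
# Check a page title against the ~30 Korean character Naver guideline.
NAVER_TITLE_LIMIT = 30

def check_title(title: str) -> tuple[int, bool]:
    """Return (character count, whether the title fits the limit)."""
    n = len(title)
    return n, n <= NAVER_TITLE_LIMIT

print(check_title("강남 눈 성형 전문 클리닉"))  # (14, True)
```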
## Customization
### Industry Adaptation
Replace medical-specific terms with your industry:
- Medical → Your Industry
- 시술 → Service/Product
- 전문의 → Specialist/Expert
- 클리닉 → Business Name
### Language Settings
The skill supports:
- Korean (primary)
- English
- Bilingual optimization
### Market Specifics
Adjust for your market:
- Change location references
- Update competitor examples
- Modify pricing strategies
- Adapt cultural considerations
## Integration Points
### Analytics Tools
- Google Analytics 4
- Google Search Console
- Naver Analytics
- Naver Webmaster Tools
### CMS Platforms
- WordPress
- Ghost
- Strapi
- Custom CMS
### Marketing Tools
- SEMrush
- Ahrefs
- Screaming Frog
- PageSpeed Insights
## Troubleshooting
### Common Issues
**Q: The skill isn't appearing in Claude**
A: Ensure the SKILL.md file is in the root folder and restart Claude Desktop
**Q: Python script not working**
A: Check that Python 3.7+ is installed and the dependencies in `requirements.txt` are installed
**Q: Keywords seem generic**
A: Provide more specific context about your market and competitors
## Updates
### Version History
- v1.0.0 (2024-12-28): Initial release
  - Complete keyword research system
  - Content architecture planning
  - Technical SEO checklist
  - Korean market optimization
### Planned Features
- [ ] AI-powered competitor analysis
- [ ] Automated content generation
- [ ] Real-time SERP tracking
- [ ] Multi-language support
- [ ] API integrations
## Support
### Documentation
- Full examples in `/examples/` folder
- Templates in `/templates/` folder
- Skill documentation in `SKILL.md`
### Feedback
Please provide feedback to improve this skill:
- What features would you like added?
- What industries need specific templates?
- What integrations would be helpful?
## License
This skill is provided as-is for use with Claude Desktop. Feel free to modify and adapt for your specific needs.
---
**Created by**: OurDigital SEO Team
**Last Updated**: December 28, 2024
**Version**: 1.0.0


@@ -0,0 +1,65 @@
# CLAUDE.md
## Overview
SEO gateway page strategist for Korean medical/service websites. Creates keyword strategies, content architecture, and technical SEO plans.
## Quick Start
```bash
pip install -r scripts/requirements.txt
# Keyword analysis
python scripts/keyword_analyzer.py --topic "눈 성형" --market "강남"
```
## Scripts
| Script | Purpose |
|--------|---------|
| `keyword_analyzer.py` | Analyze keywords, search volume, competitor gaps |
## Keyword Analyzer
```bash
# Basic analysis
python scripts/keyword_analyzer.py --topic "눈 성형"
# With location targeting
python scripts/keyword_analyzer.py --topic "눈 성형" --market "강남" --output strategy.json
# Competitor analysis
python scripts/keyword_analyzer.py --topic "눈 성형" --competitors url1,url2
```
## Output
Generates strategic document with:
- Primary keyword + monthly search volume
- LSI keywords (7-10)
- User intent distribution
- Competitor gap analysis
- Content architecture (H1-H3 structure)
- Technical SEO checklist
## Templates
See `templates/` for:
- `keyword-research-template.md`
- `content-architecture-template.md`
- `seo-checklist-template.md`
## Workflow
1. Run keyword analyzer for target topic
2. Review search volume and intent data
3. Use output to plan content architecture
4. Hand off to `18-seo-gateway-builder` for content generation
## Configuration
```bash
# Optional: API keys for enhanced data
GOOGLE_API_KEY=xxx
NAVER_API_KEY=xxx
```
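Since both keys are optional, a script can probe for them before attempting any API-backed lookup. A minimal sketch (the `has_enhanced_data` helper is illustrative and not part of the current scripts):

```python
import os

def has_enhanced_data() -> bool:
    """True when at least one optional API key is set in the environment."""
    return bool(os.environ.get("GOOGLE_API_KEY") or os.environ.get("NAVER_API_KEY"))
```

The current `keyword_analyzer.py` ships simulated volumes either way; a check like this would gate real API calls in an enhanced build.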


@@ -0,0 +1,334 @@
#!/usr/bin/env python3
"""
Keyword Analyzer for SEO Gateway Pages
Analyzes keywords and generates SEO strategy recommendations
"""
import json
from typing import Dict, List, Tuple
from dataclasses import dataclass
from datetime import datetime


@dataclass
class KeywordData:
    """Data structure for keyword information"""
    keyword: str
    search_volume: int
    difficulty: float
    intent: str
    cpc: float = 0.0
    trend: str = "stable"


class KeywordAnalyzer:
    """Analyzes keywords for SEO gateway pages"""

    def __init__(self, primary_keyword: str):
        self.primary_keyword = primary_keyword
        self.results = {
            "primary": None,
            "lsi": [],
            "long_tail": [],
            "questions": [],
            "intent_distribution": {},
            "recommendations": []
        }

    def analyze_primary_keyword(self) -> KeywordData:
        """
        Analyzes the primary keyword
        In production, this would call actual keyword research APIs
        """
        # Simulated data - replace with actual API calls
        keyword_data = {
            "눈 성형": {"volume": 12000, "difficulty": 65, "intent": "informational", "cpc": 2500},
            "이마 성형": {"volume": 5500, "difficulty": 55, "intent": "informational", "cpc": 3000},
            "동안 성형": {"volume": 8000, "difficulty": 70, "intent": "comparative", "cpc": 2800},
        }
        data = keyword_data.get(self.primary_keyword, {
            "volume": 1000,
            "difficulty": 50,
            "intent": "informational",
            "cpc": 1000
        })
        self.results["primary"] = KeywordData(
            keyword=self.primary_keyword,
            search_volume=data["volume"],
            difficulty=data["difficulty"],
            intent=data["intent"],
            cpc=data["cpc"]
        )
        return self.results["primary"]

    def generate_lsi_keywords(self) -> List[KeywordData]:
        """Generates LSI (Latent Semantic Indexing) keywords"""
        lsi_patterns = {
            "눈 성형": [
                ("쌍꺼풀 수술", 8000, "transactional"),
                ("눈매교정", 5500, "informational"),
                ("앞트임", 4000, "informational"),
                ("뒤트임", 3500, "informational"),
                ("눈 성형 비용", 2000, "comparative"),
                ("눈 성형 부작용", 1500, "informational"),
                ("눈 성형 회복기간", 1800, "informational"),
                ("눈 성형 전후", 3000, "comparative"),
                ("남자 눈 성형", 2500, "informational"),
                ("눈 성형 잘하는곳", 2200, "comparative")
            ],
            "이마 성형": [
                ("이마거상술", 3000, "informational"),
                ("이마축소술", 2500, "informational"),
                ("헤어라인교정", 4000, "transactional"),
                ("이마 성형 비용", 1200, "comparative"),
                ("이마 보톡스", 6000, "transactional"),
                ("M자 탈모 수술", 5000, "informational"),
                ("이마 필러", 4500, "transactional"),
                ("이마 성형 부작용", 800, "informational"),
                ("이마 리프팅", 3500, "comparative"),
                ("이마 주름 제거", 2800, "transactional")
            ],
            "동안 성형": [
                ("안면 리프팅", 7000, "transactional"),
                ("실리프팅", 9000, "transactional"),
                ("보톡스 시술", 15000, "transactional"),
                ("필러 시술", 12000, "transactional"),
                ("동안 성형 비용", 2500, "comparative"),
                ("울쎄라", 8000, "comparative"),
                ("써마지", 6500, "comparative"),
                ("동안 시술 종류", 1800, "informational"),
                ("주름 제거 시술", 4000, "transactional"),
                ("동안 성형 추천", 2200, "comparative")
            ]
        }
        lsi_list = lsi_patterns.get(self.primary_keyword, [
            (f"{self.primary_keyword} 비용", 1000, "comparative"),
            (f"{self.primary_keyword} 부작용", 800, "informational"),
            (f"{self.primary_keyword} 후기", 1200, "comparative"),
        ])
        for keyword, volume, intent in lsi_list:
            self.results["lsi"].append(KeywordData(
                keyword=keyword,
                search_volume=volume,
                difficulty=45 + (volume / 1000),  # Simple difficulty calculation
                intent=intent
            ))
        return self.results["lsi"]

    def generate_long_tail_keywords(self) -> List[str]:
        """Generates long-tail keyword variations"""
        location_modifiers = ["강남", "신사", "청담", "압구정", "서울"]
        action_modifiers = ["잘하는곳", "추천", "유명한", "전문", "비용"]
        long_tails = []
        for location in location_modifiers:
            long_tails.append(f"{location} {self.primary_keyword}")
            for action in action_modifiers[:2]:  # Limit combinations
                long_tails.append(f"{location} {self.primary_keyword} {action}")
        self.results["long_tail"] = long_tails
        return long_tails

    def generate_question_keywords(self) -> List[str]:
        """Generates question-based keywords for featured snippets"""
        question_templates = [
            f"{self.primary_keyword} 비용은 얼마인가요?",
            f"{self.primary_keyword} 회복기간은 얼마나 걸리나요?",
            f"{self.primary_keyword} 부작용이 있나요?",
            f"{self.primary_keyword} 통증이 심한가요?",
            f"{self.primary_keyword} 효과는 얼마나 지속되나요?",
            f"{self.primary_keyword} 나이 제한이 있나요?",
            f"{self.primary_keyword} 후 주의사항은 무엇인가요?"
        ]
        self.results["questions"] = question_templates
        return question_templates

    def calculate_intent_distribution(self) -> Dict[str, float]:
        """Calculates user intent distribution across keywords"""
        intent_counts = {
            "informational": 0,
            "comparative": 0,
            "transactional": 0,
            "navigational": 0
        }
        # Count primary keyword intent
        if self.results["primary"]:
            intent_counts[self.results["primary"].intent] += self.results["primary"].search_volume
        # Count LSI keyword intents
        for kw in self.results["lsi"]:
            intent_counts[kw.intent] += kw.search_volume
        # Calculate percentages
        total_volume = sum(intent_counts.values())
        if total_volume > 0:
            self.results["intent_distribution"] = {
                intent: round((count / total_volume) * 100, 1)
                for intent, count in intent_counts.items()
                if count > 0
            }
        return self.results["intent_distribution"]

    def generate_recommendations(self) -> List[str]:
        """Generates SEO recommendations based on analysis"""
        recommendations = []
        # Based on search volume
        if self.results["primary"] and self.results["primary"].search_volume > 10000:
            recommendations.append("High search volume detected - prioritize this page for development")
        # Based on intent distribution
        intent_dist = self.results["intent_distribution"]
        if intent_dist.get("informational", 0) > 50:
            recommendations.append("Focus on educational content and comprehensive guides")
        if intent_dist.get("comparative", 0) > 30:
            recommendations.append("Include comparison tables and competitive differentiators")
        if intent_dist.get("transactional", 0) > 20:
            recommendations.append("Optimize conversion elements and CTAs above the fold")
        # Based on competition
        if self.results["primary"] and self.results["primary"].difficulty > 60:
            recommendations.append("High competition - invest in quality content and backlinks")
            recommendations.append("Target long-tail keywords for quicker wins")
        # Question keywords
        if len(self.results["questions"]) > 5:
            recommendations.append("Implement FAQ schema markup for featured snippets")
        self.results["recommendations"] = recommendations
        return recommendations

    def export_analysis(self, filename: str = None) -> str:
        """Exports the analysis results to JSON"""
        if not filename:
            filename = f"keyword_analysis_{self.primary_keyword.replace(' ', '_')}_{datetime.now().strftime('%Y%m%d')}.json"
        export_data = {
            "analysis_date": datetime.now().isoformat(),
            "primary_keyword": self.primary_keyword,
            "primary_data": {
                "keyword": self.results["primary"].keyword,
                "search_volume": self.results["primary"].search_volume,
                "difficulty": self.results["primary"].difficulty,
                "intent": self.results["primary"].intent
            } if self.results["primary"] else None,
            "lsi_keywords": [
                {
                    "keyword": kw.keyword,
                    "volume": kw.search_volume,
                    "intent": kw.intent
                } for kw in self.results["lsi"]
            ],
            "long_tail_keywords": self.results["long_tail"],
            "question_keywords": self.results["questions"],
            "intent_distribution": self.results["intent_distribution"],
            "recommendations": self.results["recommendations"]
        }
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(export_data, f, ensure_ascii=False, indent=2)
        return filename

    def generate_report(self) -> str:
        """Generates a formatted text report"""
        report = f"""
# Keyword Analysis Report
Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}
## Primary Keyword: {self.primary_keyword}
- Search Volume: {self.results['primary'].search_volume:,}
- Difficulty: {self.results['primary'].difficulty}/100
- Primary Intent: {self.results['primary'].intent.capitalize()}
## LSI Keywords (Top 10)
"""
        for i, kw in enumerate(self.results['lsi'][:10], 1):
            report += f"{i}. {kw.keyword} - Volume: {kw.search_volume:,} ({kw.intent})\n"
        report += "\n## User Intent Distribution\n"
        for intent, percentage in self.results['intent_distribution'].items():
            report += f"- {intent.capitalize()}: {percentage}%\n"
        report += "\n## Long-tail Opportunities\n"
        for keyword in self.results['long_tail'][:5]:
            report += f"- {keyword}\n"
        report += "\n## Question Keywords (FAQ Optimization)\n"
        for question in self.results['questions'][:5]:
            report += f"- {question}\n"
        report += "\n## Strategic Recommendations\n"
        for i, rec in enumerate(self.results['recommendations'], 1):
            report += f"{i}. {rec}\n"
        return report


def main():
    """Main execution function"""
    import argparse
    parser = argparse.ArgumentParser(
        description='Analyze keywords for SEO gateway page strategy',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog='''
Examples:
  python keyword_analyzer.py --topic "눈 성형"
  python keyword_analyzer.py --topic "이마 성형" --market "강남"
  python keyword_analyzer.py --topic "동안 성형" --output strategy.json
'''
    )
    parser.add_argument('--topic', '-t', required=True,
                        help='Primary keyword to analyze (e.g., "눈 성형")')
    parser.add_argument('--market', '-m', default=None,
                        help='Target market/location (e.g., "강남")')
    parser.add_argument('--output', '-o', default=None,
                        help='Output JSON file path')
    parser.add_argument('--competitors', '-c', default=None,
                        help='Comma-separated competitor URLs for analysis')
    args = parser.parse_args()
    keyword = args.topic
    if args.market:
        keyword = f"{args.market} {args.topic}"
    print(f"Analyzing keyword: {keyword}")
    print("-" * 50)
    analyzer = KeywordAnalyzer(keyword)
    # Run analysis
    analyzer.analyze_primary_keyword()
    analyzer.generate_lsi_keywords()
    analyzer.generate_long_tail_keywords()
    analyzer.generate_question_keywords()
    analyzer.calculate_intent_distribution()
    analyzer.generate_recommendations()
    # Generate and print report
    report = analyzer.generate_report()
    print(report)
    # Export to JSON
    filename = analyzer.export_analysis(args.output)
    print(f"\nAnalysis exported to: {filename}")


if __name__ == "__main__":
    main()


@@ -0,0 +1,25 @@
# Python dependencies for SEO Gateway Page Strategist scripts
# Install with: pip install -r requirements.txt
# Core dependencies
requests>=2.28.0
beautifulsoup4>=4.11.0
pandas>=1.5.0
numpy>=1.23.0
# For API integrations (optional)
google-api-python-client>=2.70.0
pytrends>=4.9.0
# For data visualization (optional)
matplotlib>=3.6.0
seaborn>=0.12.0
# For export formats
openpyxl>=3.0.0
jinja2>=3.1.0
# Development tools
pytest>=7.2.0
black>=22.0.0
pylint>=2.15.0


@@ -0,0 +1,160 @@
# Content Architecture Template
## Page Hierarchy Structure
```
[Page URL: /service-name]
├── H1: [Primary Keyword-Optimized Headline]
│ Example: "강남 눈 성형 전문의가 만드는 자연스러운 눈매"
│ Word Count Target: 15-25 characters
│ Keyword Placement: Primary keyword at beginning
├── Hero Section [Above Fold]
│ ├── Value Proposition (30-50 words)
│ │ └── Keywords: Primary + 1 LSI
│ ├── Trust Signals (3-5 items)
│ │ ├── Certification badges
│ │ ├── Years of experience
│ │ └── Success cases number
│ └── Primary CTA
│ └── Text: "무료 상담 신청하기"
├── H2: [Service Name] 이란? [Problem/Solution Framework]
│ Word Count: 200-300 words
│ Keywords: Primary (1x), LSI (2-3x)
│ ├── H3: 이런 고민이 있으신가요? [Pain Points]
│ │ ├── Pain point 1 (include LSI keyword)
│ │ ├── Pain point 2 (include LSI keyword)
│ │ └── Pain point 3 (include LSI keyword)
│ └── H3: [Clinic Name]의 솔루션 [Benefits]
│ ├── Benefit 1 (address pain point 1)
│ ├── Benefit 2 (address pain point 2)
│ └── Benefit 3 (address pain point 3)
├── H2: [Service Name] 종류 및 방법 [Service Categories]
│ Word Count: 400-500 words total
│ Keywords: Category-specific LSI keywords
│ ├── H3: [Sub-service 1] - [LSI Keyword Variation]
│ │ ├── Description (80-100 words)
│ │ ├── Best for (target audience)
│ │ ├── Duration & Recovery
│ │ └── CTA: "자세히 보기"
│ ├── H3: [Sub-service 2] - [LSI Keyword Variation]
│ │ └── [Same structure as above]
│ └── H3: [Sub-service 3] - [LSI Keyword Variation]
│ └── [Same structure as above]
├── H2: [Clinic Name] [Service Name]만의 차별점 [Trust & Authority]
│ Word Count: 300-400 words
│ Keywords: Brand + Primary keyword combinations
│ ├── H3: 전문 의료진 [Doctor Credentials]
│ │ ├── Doctor profile summary
│ │ ├── Specializations
│ │ └── Certifications
│ ├── H3: 검증된 시술 결과 [Success Metrics]
│ │ ├── Number statistics
│ │ ├── Success rate
│ │ └── Patient satisfaction
│ └── H3: 첨단 장비 및 시설 [Facilities]
│ ├── Equipment descriptions
│ └── Safety protocols
├── H2: [Service Name] 자주 묻는 질문 [FAQ Section]
│ Word Count: 500-700 words
│ Keywords: Long-tail question keywords
│ ├── Q1: [Long-tail keyword as question]?
│ │ └── A: [40-60 word answer, keyword in first sentence]
│ ├── Q2: [Price-related question]?
│ │ └── A: [Include "비용" LSI keyword]
│ ├── Q3: [Recovery-related question]?
│ │ └── A: [Include "회복기간" LSI keyword]
│ ├── Q4: [Side-effect question]?
│ │ └── A: [Include "부작용" LSI keyword]
│ ├── Q5: [Process question]?
│ │ └── A: [Include process-related LSI]
│ ├── Q6: [Candidacy question]?
│ │ └── A: [Include target audience keywords]
│ └── Q7: [Results duration question]?
│ └── A: [Include maintenance keywords]
├── H2: [Service Name] 시술 과정 [Process Guide]
│ Word Count: 300-400 words
│ Keywords: "과정", "단계", procedural LSI
│ ├── H3: 상담 및 검사 [Consultation]
│ ├── H3: 시술 당일 [Procedure Day]
│ ├── H3: 회복 과정 [Recovery]
│ └── H3: 사후 관리 [Aftercare]
├── H2: 실제 고객 후기 [Social Proof]
│ Word Count: 200-300 words
│ Keywords: "후기", "리뷰", satisfaction keywords
│ ├── Review snippet 1
│ ├── Review snippet 2
│ ├── Review snippet 3
│ └── Before/After gallery teaser
└── H2: 상담 예약 안내 [Conversion Section]
Word Count: 150-200 words
Keywords: CTA-related, location keywords
├── H3: 상담 예약 방법
├── H3: 오시는 길
└── H3: 문의 정보
```
## Keyword Density Map
| Section | Primary Keyword | LSI Keywords | Total Keywords |
|---------|----------------|--------------|----------------|
| Hero | 1 | 1-2 | 2-3 |
| Problem/Solution | 1 | 2-3 | 3-4 |
| Service Categories | 1-2 | 4-6 | 5-8 |
| Trust & Authority | 1 | 2-3 | 3-4 |
| FAQ | 2-3 | 5-7 | 7-10 |
| Process | 1 | 2-3 | 3-4 |
| Social Proof | 0-1 | 1-2 | 1-3 |
| Conversion | 1 | 1-2 | 2-3 |
| **Total** | **8-11** | **18-29** | **26-40** |
## Internal Linking Strategy
| From Section | To Page | Anchor Text | Purpose |
|-------------|---------|-------------|---------|
| Service Categories | Sub-service page | [Sub-service name] | Deep dive |
| FAQ | Price page | "비용 안내 페이지" | Conversion |
| Trust section | Doctor profile | "[Doctor name] 원장" | Authority |
| Process section | Consultation form | "상담 예약하기" | Conversion |
| Social proof | Gallery page | "더 많은 전후 사진" | Engagement |
## Content Length Guidelines
- **Total Page Length**: 2,000-2,500 words
- **Above Fold Content**: 100-150 words
- **Each H2 Section**: 200-500 words
- **Each H3 Subsection**: 80-150 words
- **Meta Description**: 150-160 characters
- **Image Alt Text**: 10-15 words each
## Schema Markup Requirements
```json
{
  "@context": "https://schema.org",
  "@type": "MedicalProcedure",
  "name": "[Service Name]",
  "description": "[Meta description]",
  "procedureType": "Cosmetic",
  "provider": {
    "@type": "MedicalOrganization",
    "name": "[Clinic Name]"
  }
}
```
## Mobile Content Adaptation
- Reduce hero text by 30%
- Show 3 FAQs initially (expand for more)
- Simplify navigation to single-column
- Increase CTA button size
- Compress trust signals to carousel


@@ -0,0 +1,95 @@
# Keyword Research Template
## Primary Keyword Analysis
| Metric | Value | Notes |
|--------|-------|-------|
| **Primary Keyword** | [KEYWORD] | Main target keyword |
| **Monthly Search Volume** | [VOLUME] | Average monthly searches |
| **Keyword Difficulty** | [0-100] | Competition score |
| **Current Ranking** | #[POSITION] | Current SERP position |
| **Search Trend** | ↑ ↓ → | Trending direction |
## LSI Keywords Matrix
| LSI Keyword | Search Volume | Intent Type | Priority |
|------------|--------------|-------------|----------|
| [keyword 1] | [volume] | Informational | High |
| [keyword 2] | [volume] | Transactional | Medium |
| [keyword 3] | [volume] | Comparative | High |
| [keyword 4] | [volume] | Informational | Medium |
| [keyword 5] | [volume] | Transactional | Low |
| [keyword 6] | [volume] | Comparative | High |
| [keyword 7] | [volume] | Informational | Medium |
| [keyword 8] | [volume] | Navigational | Low |
| [keyword 9] | [volume] | Transactional | High |
| [keyword 10] | [volume] | Informational | Medium |
## User Intent Distribution
```
Informational (Research Phase): ___%
- Common queries: "what is", "how to", "benefits of"
- Content needed: Educational guides, FAQs, process explanations
Comparative (Evaluation Phase): ___%
- Common queries: "best", "vs", "reviews", "비교"
- Content needed: Comparison tables, reviews, case studies
Transactional (Ready to Convert): ___%
- Common queries: "price", "book", "consultation", "예약"
- Content needed: CTAs, pricing, booking forms
```
## Long-tail Keyword Opportunities
### Question-based Keywords
- [질문 키워드 1]
- [질문 키워드 2]
- [질문 키워드 3]
### Location-based Keywords
- [지역] + [primary keyword]
- [지역] + [primary keyword] + 잘하는곳
- [지역] + [primary keyword] + 추천
### Modifier-based Keywords
- [primary keyword] + 비용
- [primary keyword] + 부작용
- [primary keyword] + 회복기간
- [primary keyword] + 전후
## Competitor Keyword Analysis
| Competitor | Target Keywords | Ranking Keywords | Gap Opportunities |
|------------|----------------|------------------|-------------------|
| Competitor 1 | [keywords] | [keywords] | [missing keywords] |
| Competitor 2 | [keywords] | [keywords] | [missing keywords] |
| Competitor 3 | [keywords] | [keywords] | [missing keywords] |
## Seasonal Trends
| Month | Search Volume | Events/Factors |
|-------|--------------|----------------|
| January | [volume] | New year resolutions |
| February | [volume] | [factor] |
| March | [volume] | [factor] |
| ... | ... | ... |
## Platform-Specific Keywords
### Naver-Optimized
- [네이버 specific keyword 1]
- [네이버 specific keyword 2]
### Google-Optimized
- [Google specific keyword 1]
- [Google specific keyword 2]
## Action Items
- [ ] Target primary keyword in H1 and title tag
- [ ] Include 3-5 LSI keywords naturally in content
- [ ] Create content matching user intent distribution
- [ ] Optimize for question-based featured snippets
- [ ] Add location modifiers for local SEO


@@ -0,0 +1,239 @@
# SEO Technical Checklist Template
## Meta Tags Optimization
### Title Tag
- [ ] Length: 50-60 characters
- [ ] Primary keyword at beginning
- [ ] Brand name at end
- [ ] Unique for each page
- [ ] Formula: `[Primary Keyword] - [Value Proposition] | [Brand]`
**Template**: `{primary_keyword} 전문 - {unique_value} | {clinic_name}`
**Example**: `눈 성형 전문 - 자연스러운 라인 | 제이미클리닉`
### Meta Description
- [ ] Length: 150-160 characters
- [ ] Include primary keyword
- [ ] Include 1-2 LSI keywords
- [ ] Clear CTA
- [ ] Unique for each page
**Template**: `{location} {primary_keyword} 전문의가 {benefit}. {credential}. 무료상담 ☎ {phone}`
**Example**: `강남 눈 성형 전문의가 자연스러운 눈매를 디자인합니다. 15년 경력, 10,000건 시술. 무료상담 ☎ 02-1234-5678`
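The length rules above can be verified mechanically before publishing. A sketch (function names are illustrative; the Hangul count reflects the ~30 Korean-character limit Naver favors, covered in the Naver section of this checklist):

```python
from typing import List

def check_title(title: str) -> List[str]:
    """Flag title-tag issues against the checklist rules."""
    issues = []
    if not 50 <= len(title) <= 60:
        issues.append(f"length {len(title)} outside 50-60 characters")
    # Naver favors short titles: keep under ~30 Korean characters.
    hangul = sum(1 for ch in title if "\uac00" <= ch <= "\ud7a3")
    if hangul > 30:
        issues.append(f"{hangul} Hangul characters exceeds Naver's ~30 limit")
    return issues

def check_meta_description(desc: str) -> List[str]:
    """Flag meta-description length issues (150-160 characters)."""
    if not 150 <= len(desc) <= 160:
        return [f"length {len(desc)} outside 150-160 characters"]
    return []
```

Running these over every page's metadata catches truncation and duplication issues before the pages are crawled.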
### Open Graph Tags
```html
<meta property="og:title" content="{page_title}">
<meta property="og:description" content="{meta_description}">
<meta property="og:image" content="{featured_image_url}">
<meta property="og:url" content="{page_url}">
<meta property="og:type" content="website">
<meta property="og:locale" content="ko_KR">
```
## Header Tags Structure
- [ ] Only one H1 per page
- [ ] H1 contains primary keyword
- [ ] H2 tags for main sections (5-7)
- [ ] H3 tags for subsections
- [ ] Logical hierarchy maintained
- [ ] Keywords distributed naturally
## Content Optimization
### Keyword Density
- [ ] Primary keyword: 2-3% (20-30 times per 1000 words)
- [ ] LSI keywords: 1-2% each
- [ ] Natural placement (no stuffing)
- [ ] Synonyms and variations used
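The density targets above can be spot-checked on draft copy with a simple word-based count. A sketch (spaced Korean phrases like "눈 성형" work with substring counting; morpheme-level analysis is out of scope here):

```python
def keyword_density(text: str, keyword: str) -> float:
    """Occurrences of keyword per 100 words of text."""
    words = text.split()
    if not words:
        return 0.0
    occurrences = text.lower().count(keyword.lower())
    return round(occurrences / len(words) * 100, 2)
```

At the 2-3% target, a 1,000-word draft should score roughly 2.0-3.0 for the primary keyword.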
### Content Structure
- [ ] First 100 words include primary keyword
- [ ] Short paragraphs (3-4 sentences)
- [ ] Bullet points and lists
- [ ] Bold important keywords (sparingly)
- [ ] Internal links: 5-10
- [ ] External links: 2-3 (authoritative)
## Schema Markup
### Medical Procedure Schema
```json
{
  "@context": "https://schema.org",
  "@type": "MedicalProcedure",
  "name": "{procedure_name}",
  "procedureType": "Cosmetic",
  "bodyLocation": "{body_part}",
  "outcome": "{expected_outcome}",
  "preparation": "{preparation_required}",
  "followup": "{followup_care}",
  "provider": {
    "@type": "MedicalOrganization",
    "name": "{clinic_name}",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "{street}",
      "addressLocality": "{city}",
      "addressCountry": "KR"
    }
  }
}
```
### FAQ Schema
```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "{question}",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "{answer}"
    }
  }]
}
}
```
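The FAQ block above can be generated from the question keywords that `keyword_analyzer.py` produces; a sketch (the `build_faq_schema` helper and the sample answer text are illustrative, not part of the current scripts):

```python
import json
from typing import Dict, List, Tuple

def build_faq_schema(qa_pairs: List[Tuple[str, str]]) -> Dict:
    """Assemble FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }

pairs = [("눈 성형 비용은 얼마인가요?", "시술 방법과 범위에 따라 달라지므로 상담 시 안내해 드립니다.")]
print(json.dumps(build_faq_schema(pairs), ensure_ascii=False, indent=2))
```

Embed the resulting JSON-LD in a `<script type="application/ld+json">` tag on the gateway page.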
## Image Optimization
- [ ] Descriptive file names: `eye-surgery-before-after-case1.jpg`
- [ ] Alt text with keywords: `눈 성형 전후 사진 - 30대 여성 사례`
- [ ] Compressed file size (< 200KB)
- [ ] WebP format with fallback
- [ ] Lazy loading implemented
- [ ] Image sitemap created
## Performance Optimization
### Page Speed
- [ ] Load time < 3 seconds
- [ ] First Contentful Paint < 1.8s
- [ ] Time to Interactive < 3.8s
- [ ] Total page size < 3MB
- [ ] Requests minimized (< 50)
### Core Web Vitals
- [ ] LCP (Largest Contentful Paint) < 2.5s
- [ ] FID (First Input Delay) < 100ms
- [ ] CLS (Cumulative Layout Shift) < 0.1
## Mobile Optimization
- [ ] Mobile-responsive design
- [ ] Viewport meta tag set
- [ ] Touch-friendly buttons (44x44px minimum)
- [ ] Readable font size (16px minimum)
- [ ] No horizontal scrolling
- [ ] Mobile page speed < 3s
## URL Structure
- [ ] SEO-friendly URL: `/eye-surgery` or `/눈-성형`
- [ ] No special characters
- [ ] Lowercase only
- [ ] Hyphens for word separation
- [ ] Under 60 characters
- [ ] Include primary keyword
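A sketch of validating a slug against the rules above (`is_seo_friendly_url` is illustrative; it also admits Hangul slugs such as `/눈-성형` per the first rule):

```python
import re

def is_seo_friendly_url(path: str) -> bool:
    """Check: lowercase, hyphen-separated, no special characters, under 60 chars."""
    if len(path) >= 60:
        return False
    # Segments of lowercase latin letters, digits, hyphens, or Hangul syllables.
    return re.fullmatch(r"(?:/[a-z0-9\uac00-\ud7a3-]+)+", path) is not None
```

The keyword-inclusion rule still needs a human (or a density check) since a slug can pass structurally while missing the primary keyword.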
## Internal Linking
| From Page | To Page | Anchor Text | Purpose |
|-----------|---------|-------------|---------|
| Gateway | Service Detail | {service_name} | Deep content |
| Gateway | Doctor Profile | {doctor_name} 원장 | Authority |
| Gateway | Pricing | 비용 안내 | Conversion |
| Gateway | Gallery | 시술 전후 사진 | Engagement |
| Gateway | Contact | 상담 예약 | Conversion |
## Naver-Specific Optimization
### Naver Webmaster Tools
- [ ] Site verification complete
- [ ] XML sitemap submitted
- [ ] Robots.txt configured
- [ ] Syndication feed active
- [ ] Site optimization report reviewed
### Naver SEO Elements
- [ ] Title under 30 Korean characters
- [ ] C-Rank tags implemented
- [ ] Image-to-text ratio optimized (40:60)
- [ ] Outbound links minimized
- [ ] Brand search optimization
## Tracking & Analytics
- [ ] Google Analytics 4 installed
- [ ] Google Search Console verified
- [ ] Naver Analytics installed
- [ ] Conversion tracking configured
- [ ] Event tracking for CTAs
- [ ] Heatmap tool installed
## Security & Technical
- [ ] SSL certificate active (HTTPS)
- [ ] WWW/non-WWW redirect configured
- [ ] 404 error page customized
- [ ] XML sitemap generated
- [ ] Robots.txt optimized
- [ ] Canonical URLs set
- [ ] Hreflang tags (if multi-language)
## Quality Checks
### Content Quality
- [ ] No spelling/grammar errors
- [ ] Medical information accurate
- [ ] Legal compliance verified
- [ ] Contact information correct
- [ ] CTAs working properly
### Cross-browser Testing
- [ ] Chrome (Desktop/Mobile)
- [ ] Safari (Desktop/Mobile)
- [ ] Firefox
- [ ] Samsung Internet
- [ ] Naver Whale
## Monthly Monitoring Tasks
- [ ] Keyword ranking check
- [ ] Organic traffic analysis
- [ ] Bounce rate monitoring
- [ ] Conversion rate tracking
- [ ] Competitor analysis
- [ ] Content freshness update
- [ ] Broken link check
- [ ] Page speed test
## Priority Levels
1. **Critical (Day 1)**
   - Title and meta tags
   - H1 optimization
   - Mobile responsiveness
   - Page speed < 4s
2. **High (Week 1)**
   - Schema markup
   - Internal linking
   - Image optimization
   - Content optimization
3. **Medium (Week 2-3)**
   - Naver optimization
   - FAQ implementation
   - Social proof elements
   - Analytics setup
4. **Low (Month 2)**
   - A/B testing
   - Advanced schema
   - Link building
   - Content expansion


@@ -0,0 +1,159 @@
---
name: seo-gateway-strategist
version: 1.0.0
description: Creates comprehensive SEO-focused gateway page strategies for Korean medical/service websites
license: OurDigital internal-use ONLY
---
# SEO Gateway Page Strategist
This skill helps you create comprehensive SEO-focused gateway page strategies for Korean medical/service websites, optimized for both Naver and Google.
## Core Competencies
1. **Keyword Research & Analysis**: Identifies primary and LSI keywords with search intent mapping
2. **Content Architecture**: Creates hierarchical page structure optimized for SEO
3. **Technical SEO Planning**: Defines specific technical requirements and meta optimizations
4. **Performance Targeting**: Sets measurable KPIs and tracking methodologies
5. **Competitor Analysis**: Analyzes top-ranking competitors for gap identification
## When to Use This Skill
Use this skill when:
- Planning a new gateway page for any service/procedure category
- Restructuring existing pages for better SEO performance
- Conducting keyword research for content planning
- Setting SEO performance targets and KPIs
- Analyzing competitor strategies
## Instructions
When using this skill, provide:
1. **Service/Procedure Name**: The main topic for the gateway page (e.g., "눈 성형", "이마 성형")
2. **Target Market**: Location and demographic information
3. **Current Performance** (optional): Existing rankings, traffic data if available
4. **Competitor URLs** (optional): Known competitors to analyze
## Process Workflow
### Step 1: Keyword & Intent Analysis
```
# The skill will generate:
- Primary keyword with monthly search volume
- 7-10 LSI (Latent Semantic Indexing) keywords
- User intent distribution (Informational/Comparative/Transactional)
- Top 3 competitor analysis
```
### Step 2: Content Architecture
The skill creates a complete H1-H3 structure with keyword placement strategy:
```
H1: [Primary keyword-optimized headline]
├── Hero Section
├── Problem/Solution Framework
├── Service Categories
├── Trust & Authority
├── FAQ Section
└── Consultation Guide
```
### Step 3: Technical SEO Requirements
Generates specific technical specifications:
- Meta tags formulas and character limits
- Schema markup recommendations
- Internal linking strategy
- Image optimization guidelines
- Core Web Vitals targets
### Step 4: Performance Metrics
Sets 30/60/90-day KPIs with tracking methodology
## Example Usage
### Basic Request:
```
"Create an SEO gateway page strategy for 눈 성형"
```
### Detailed Request:
```
"Create an SEO gateway page strategy for 눈 성형 targeting women aged 25-45 in Gangnam.
Current ranking: page 2 for main keyword.
Competitor: www.example-clinic.com/eye-surgery"
```
## Output Format
The skill delivers a structured report containing:
1. **Keyword Strategy Table**
   - Primary and LSI keywords with search volumes
   - User intent percentages
   - Competitor gap analysis
2. **Content Architecture Document**
   - Complete page hierarchy (H1-H3)
   - Word count targets per section
   - Keyword placement map
3. **Technical SEO Checklist**
   - Meta tag templates
   - Schema markup code
   - Performance requirements
4. **Performance Dashboard**
   - Current baseline metrics
   - Target KPIs with timeline
   - Tracking methodology
## Templates Included
- `keyword-research-template.md`: Keyword analysis worksheet
- `content-architecture-template.md`: Page structure template
- `seo-checklist-template.md`: Technical SEO requirements
- `performance-tracking-template.md`: KPI tracking sheet
## Scripts Included
- `keyword_analyzer.py`: Automates keyword research and intent analysis
- `competitor_analyzer.py`: Scrapes and analyzes competitor pages
- `seo_scorer.py`: Calculates SEO optimization score
## Best Practices
1. **Mobile-First Approach**: Always optimize for mobile (70%+ traffic in Korea)
2. **Naver vs Google**: Consider platform-specific optimization differences
3. **Local SEO**: Include location modifiers for local intent
4. **Medical Compliance**: Ensure content meets Korean medical advertising regulations
5. **User Intent Matching**: Align content with search intent distribution
## Common Patterns
### For Medical Services:
```
Primary: [시술명]
LSI: [시술명 비용], [시술명 부작용], [시술명 회복기간], [시술명 전후]
Intent: 60% Informational, 30% Comparative, 10% Transactional
```
### For Local Services:
```
Primary: [지역] [서비스명]
LSI: [지역] [서비스명] 추천, [지역] [서비스명] 잘하는곳, [지역] [서비스명] 가격
Intent: 40% Informational, 40% Comparative, 20% Transactional
```
## Integration Points
This skill integrates with:
- Google Search Console for current performance data
- Naver Webmaster Tools for Naver-specific metrics
- Analytics platforms for user behavior data
- Keyword research tools APIs
## Notes
- Always validate keyword search volumes with actual tools
- Consider seasonal trends in search behavior
- Update strategy based on algorithm changes
- Monitor competitor movements regularly


@@ -0,0 +1,284 @@
# SEO Gateway Page Strategy: 눈 성형
**Generated by SEO Gateway Page Strategist Skill**
**Date:** 2024-12-28
**Target Market:** Gangnam, Seoul - Women 25-45
---
## 1. Keyword Strategy Analysis
### Primary Keyword Research
| Metric | Value | Analysis |
|--------|-------|----------|
| **Primary Keyword** | 눈 성형 | High-volume medical keyword |
| **Monthly Search Volume** | 12,000 | Excellent traffic potential |
| **Keyword Difficulty** | 65/100 | Moderate-high competition |
| **CPC** | ₩2,500 | High commercial intent |
| **Trend** | ↑ 15% YoY | Growing interest |
### LSI Keywords Matrix
| LSI Keyword | Monthly Volume | Intent Type | Implementation Priority |
|------------|---------------|-------------|------------------------|
| 쌍꺼풀 수술 | 8,000 | Transactional | High - Sub-category page |
| 눈매교정 | 5,500 | Informational | High - Sub-category page |
| 앞트임 | 4,000 | Informational | High - Sub-category page |
| 뒤트임 | 3,500 | Informational | Medium - Sub-category |
| 눈 성형 비용 | 2,000 | Comparative | High - FAQ section |
| 눈 성형 부작용 | 1,500 | Informational | High - FAQ section |
| 눈 성형 회복기간 | 1,800 | Informational | High - Process section |
| 눈 성형 전후 | 3,000 | Comparative | High - Gallery section |
| 강남 눈 성형 | 2,200 | Local | High - Hero/title |
| 눈 성형 잘하는곳 | 2,200 | Comparative | High - Trust section |
### User Intent Distribution
```
Informational (Research): 55%
- Users researching procedures, risks, recovery
- Content needed: Educational guides, process explanations, FAQs
Comparative (Evaluation): 35%
- Users comparing clinics, prices, results
- Content needed: Before/after gallery, reviews, differentiators
Transactional (Booking): 10%
- Users ready to book consultation
- Content needed: Clear CTAs, contact forms, pricing info
```
### Competitor Analysis
| Competitor | Domain Authority | Target Keywords | Content Gap Opportunities |
|-----------|-----------------|-----------------|--------------------------|
| Competitor A | 45 | 눈 성형, 쌍꺼풀 | Lacks detailed recovery guide |
| Competitor B | 52 | 강남 눈 성형 | No video content |
| Competitor C | 38 | 눈 성형 비용 | Limited trust signals |
---
## 2. Content Architecture
```
H1: 강남 눈 성형 전문의가 디자인하는 자연스러운 눈매
├── Hero Section [100-150 words]
│ ├── Value Proposition: "15년 경력 전문의가 만드는 당신만의 아름다운 눈매"
│ ├── Trust Signals:
│ │ • 보건복지부 인증 의료기관
│ │ • 10,000+ 성공 사례
│ │ • 대한성형외과학회 정회원
│ │ • 2024 의료서비스 대상 수상
│ └── Primary CTA: "무료 상담 신청하기"
├── H2: 눈 성형, 이런 고민이 있으신가요? [200-250 words]
│ ├── H3: 작고 답답한 눈매로 인한 스트레스
│ ├── H3: 나이들어 보이는 처진 눈꺼풀
│ └── H3: 좌우 비대칭으로 인한 불균형
├── H2: 제이미클리닉 눈 성형 종류 [400-500 words]
│ ├── H3: 자연스러운 쌍꺼풀 성형 - 매몰법 & 절개법
│ │ ├── Description (100 words)
│ │ ├── 시술시간: 30-60분
│ │ ├── 회복기간: 5-10일
│ │ └── CTA: "쌍꺼풀 자세히 보기"
│ │
│ ├── H3: 또렷한 눈매교정술
│ │ ├── Description (100 words)
│ │ ├── 시술시간: 45분
│ │ ├── 회복기간: 7일
│ │ └── CTA: "눈매교정 자세히 보기"
│ │
│ ├── H3: 시원한 앞트임 & 뒤트임
│ │ ├── Description (100 words)
│ │ ├── 시술시간: 30-45분
│ │ ├── 회복기간: 5-7일
│ │ └── CTA: "트임 수술 자세히 보기"
│ │
│ └── H3: 상안검 & 하안검 성형
│ ├── Description (100 words)
│ ├── 시술시간: 60-90분
│ ├── 회복기간: 10-14일
│ └── CTA: "안검 성형 자세히 보기"
├── H2: 제이미클리닉만의 차별점 [300-350 words]
│ ├── H3: 전문 의료진의 노하우
│ ├── H3: 1:1 맞춤 디자인 시스템
│ └── H3: 최신 장비와 안전 시스템
├── H2: 눈 성형 자주 묻는 질문 (FAQ) [500-600 words]
│ ├── Q: 눈 성형 비용은 얼마인가요?
│ ├── Q: 눈 성형 회복기간은 얼마나 걸리나요?
│ ├── Q: 눈 성형 부작용은 없나요?
│ ├── Q: 쌍꺼풀 라인은 어떻게 결정하나요?
│ ├── Q: 재수술도 가능한가요?
│ ├── Q: 나이 제한이 있나요?
│ └── Q: 수술 후 일상생활은 언제부터 가능한가요?
├── H2: 눈 성형 시술 과정 [250-300 words]
│ ├── H3: STEP 1. 1:1 맞춤 상담
│ ├── H3: STEP 2. 정밀 검사 및 디자인
│ ├── H3: STEP 3. 안전한 수술 진행
│ └── H3: STEP 4. 체계적인 사후 관리
├── H2: 실제 고객 후기 및 전후 사진 [200-250 words]
│ ├── Review snippets (3-5)
│ ├── Before/After gallery teaser
│ └── CTA: "더 많은 후기 보기"
└── H2: 상담 예약 안내 [150-200 words]
├── H3: 온라인 상담 예약
├── H3: 오시는 길
└── H3: 문의: 02-1234-5678
```
---
## 3. Technical SEO Requirements
### Meta Tags
```html
<title>강남 눈 성형 전문 - 자연스러운 눈매 디자인 | 제이미클리닉</title>
<meta name="description" content="강남 눈 성형 15년 전문의가 만드는 자연스러운 눈매. 쌍꺼풀, 눈매교정, 트임 수술 10,000건 이상. 무료상담 ☎ 02-1234-5678">
<meta name="keywords" content="눈 성형, 강남 눈 성형, 쌍꺼풀 수술, 눈매교정, 앞트임, 뒤트임">
```
### Schema Markup
```json
{
"@context": "https://schema.org",
"@type": "MedicalProcedure",
"name": "눈 성형",
"procedureType": "Cosmetic",
"bodyLocation": "Eye",
"outcome": "Enhanced eye appearance with natural-looking results",
"preparation": "Consultation and design planning required",
"followup": "Regular check-ups for 3 months",
"provider": {
"@type": "MedicalOrganization",
"name": "제이미클리닉",
"address": {
"@type": "PostalAddress",
"streetAddress": "테헤란로 123",
"addressLocality": "강남구",
"addressRegion": "서울특별시",
"addressCountry": "KR"
},
"telephone": "+82-2-1234-5678"
}
}
```
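To deploy this markup, the JSON object is normally embedded in the page `<head>` as a JSON-LD script tag. A minimal Python sketch of that wrapping step (the clinic values are the placeholders from the block above, not real data):

```python
import json

def to_jsonld_script(schema: dict) -> str:
    """Wrap a schema.org dict in a JSON-LD <script> tag for the page <head>."""
    body = json.dumps(schema, ensure_ascii=False, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'

procedure = {
    "@context": "https://schema.org",
    "@type": "MedicalProcedure",
    "name": "눈 성형",
    "procedureType": "Cosmetic",
}

print(to_jsonld_script(procedure))
```

`ensure_ascii=False` keeps Korean strings readable in the rendered markup instead of escaping them to `\uXXXX` sequences.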
### Internal Linking Strategy
| From Section | To Page | Anchor Text | Priority |
|-------------|---------|-------------|----------|
| 쌍꺼풀 section | /double-eyelid | "쌍꺼풀 자세히 보기" | High |
| FAQ 비용 | /pricing | "자세한 비용 안내" | High |
| 의료진 section | /doctors | "전문의 프로필 보기" | Medium |
| 후기 section | /reviews | "더 많은 후기 보기" | Medium |
| 상담 section | /consultation | "온라인 상담 신청" | High |
### Performance Targets
- Page Load Speed: < 2.5 seconds
- Mobile Score: > 95/100
- Core Web Vitals: All green
- Image Optimization: WebP format, < 150KB each
---
## 4. Performance KPIs & Tracking
### 30-Day Targets
| Metric | Current | 30-Day Target | Measurement Tool |
|--------|---------|---------------|------------------|
| Primary Keyword Ranking | #8 | #5 | Search Console |
| Organic Traffic | 500/month | 750/month | GA4 |
| Bounce Rate | 65% | 55% | GA4 |
| Avg. Time on Page | 1:30 | 2:30 | GA4 |
| Conversion Rate | 1.5% | 2.5% | GA4 Goals |
### 60-Day Targets
| Metric | Target | Strategy |
|--------|--------|----------|
| Primary Keyword Ranking | #3 | Content expansion + backlinks |
| Organic Traffic | 1,200/month | Long-tail optimization |
| Featured Snippets | 2+ | FAQ schema implementation |
| Page Authority | 25+ | Link building campaign |
### 90-Day Targets
| Metric | Target | Success Indicator |
|--------|--------|-------------------|
| Primary Keyword Ranking | #1-3 | Page 1 dominant position |
| Organic Traffic | 2,000/month | 300% increase |
| Conversion Rate | 3-5% | Industry-leading performance |
| Revenue Attribution | ₩50M+ | Positive ROI |
---
## 5. Implementation Roadmap
### Week 1: Foundation
- [ ] Implement title and meta tags
- [ ] Create H1 and basic structure
- [ ] Optimize hero section
- [ ] Set up tracking codes
### Week 2: Content Development
- [ ] Write all H2 sections
- [ ] Create FAQ content
- [ ] Develop procedure descriptions
- [ ] Add trust signals
### Week 3: Technical Optimization
- [ ] Implement schema markup
- [ ] Optimize images
- [ ] Set up internal linking
- [ ] Mobile optimization
### Week 4: Enhancement
- [ ] Add reviews and testimonials
- [ ] Create before/after gallery
- [ ] Implement CTA variations
- [ ] A/B testing setup
### Month 2-3: Growth
- [ ] Content expansion based on data
- [ ] Link building outreach
- [ ] Naver optimization
- [ ] Conversion rate optimization
---
## 6. Strategic Recommendations
### High Priority Actions
1. **Content Depth**: Create comprehensive 2,500+ word content to outrank competitors
2. **E-E-A-T Signals**: Emphasize doctor credentials, first-hand experience, and certifications prominently
3. **Local SEO**: Optimize for "강남 눈 성형" with location-specific content
4. **Mobile UX**: Ensure thumb-friendly CTAs and fast mobile load times
### Quick Wins (Implement Immediately)
1. Add FAQ schema markup for featured snippets
2. Compress and optimize all images
3. Add trust badges above fold
4. Implement click-to-call for mobile
5. Create location-specific landing pages
### Long-term Strategy
1. Build topical authority with supporting content
2. Develop video content for higher engagement
3. Create interactive tools (virtual consultation)
4. Establish thought leadership through blog
5. Implement review generation system
---
## Notes for Implementation
- **Legal Compliance**: Ensure all medical claims are verified and compliant with MFDS (formerly KFDA) regulations
- **Image Rights**: Use only authorized before/after photos with patient consent
- **Pricing Display**: Follow Korean medical advertising laws regarding price disclosure
- **Mobile First**: With 70% mobile traffic, prioritize mobile experience in all decisions
- **Naver Specific**: Create separate content strategy for Naver Blog integration
---
**End of Strategy Document**
*Generated by SEO Gateway Page Strategist Skill v1.0*


@@ -0,0 +1,160 @@
# Content Architecture Template
## Page Hierarchy Structure
```
[Page URL: /service-name]
├── H1: [Primary Keyword-Optimized Headline]
│ Example: "강남 눈 성형 전문의가 만드는 자연스러운 눈매"
│ Word Count Target: 15-25 characters
│ Keyword Placement: Primary keyword at beginning
├── Hero Section [Above Fold]
│ ├── Value Proposition (30-50 words)
│ │ └── Keywords: Primary + 1 LSI
│ ├── Trust Signals (3-5 items)
│ │ ├── Certification badges
│ │ ├── Years of experience
│ │ └── Success cases number
│ └── Primary CTA
│ └── Text: "무료 상담 신청하기"
├── H2: [Service Name] 이란? [Problem/Solution Framework]
│ Word Count: 200-300 words
│ Keywords: Primary (1x), LSI (2-3x)
│ ├── H3: 이런 고민이 있으신가요? [Pain Points]
│ │ ├── Pain point 1 (include LSI keyword)
│ │ ├── Pain point 2 (include LSI keyword)
│ │ └── Pain point 3 (include LSI keyword)
│ └── H3: [Clinic Name]의 솔루션 [Benefits]
│ ├── Benefit 1 (address pain point 1)
│ ├── Benefit 2 (address pain point 2)
│ └── Benefit 3 (address pain point 3)
├── H2: [Service Name] 종류 및 방법 [Service Categories]
│ Word Count: 400-500 words total
│ Keywords: Category-specific LSI keywords
│ ├── H3: [Sub-service 1] - [LSI Keyword Variation]
│ │ ├── Description (80-100 words)
│ │ ├── Best for (target audience)
│ │ ├── Duration & Recovery
│ │ └── CTA: "자세히 보기"
│ ├── H3: [Sub-service 2] - [LSI Keyword Variation]
│ │ └── [Same structure as above]
│ └── H3: [Sub-service 3] - [LSI Keyword Variation]
│ └── [Same structure as above]
├── H2: [Clinic Name] [Service Name]만의 차별점 [Trust & Authority]
│ Word Count: 300-400 words
│ Keywords: Brand + Primary keyword combinations
│ ├── H3: 전문 의료진 [Doctor Credentials]
│ │ ├── Doctor profile summary
│ │ ├── Specializations
│ │ └── Certifications
│ ├── H3: 검증된 시술 결과 [Success Metrics]
│ │ ├── Number statistics
│ │ ├── Success rate
│ │ └── Patient satisfaction
│ └── H3: 첨단 장비 및 시설 [Facilities]
│ ├── Equipment descriptions
│ └── Safety protocols
├── H2: [Service Name] 자주 묻는 질문 [FAQ Section]
│ Word Count: 500-700 words
│ Keywords: Long-tail question keywords
│ ├── Q1: [Long-tail keyword as question]?
│ │ └── A: [40-60 word answer, keyword in first sentence]
│ ├── Q2: [Price-related question]?
│ │ └── A: [Include "비용" LSI keyword]
│ ├── Q3: [Recovery-related question]?
│ │ └── A: [Include "회복기간" LSI keyword]
│ ├── Q4: [Side-effect question]?
│ │ └── A: [Include "부작용" LSI keyword]
│ ├── Q5: [Process question]?
│ │ └── A: [Include process-related LSI]
│ ├── Q6: [Candidacy question]?
│ │ └── A: [Include target audience keywords]
│ └── Q7: [Results duration question]?
│ └── A: [Include maintenance keywords]
├── H2: [Service Name] 시술 과정 [Process Guide]
│ Word Count: 300-400 words
│ Keywords: "과정", "단계", procedural LSI
│ ├── H3: 상담 및 검사 [Consultation]
│ ├── H3: 시술 당일 [Procedure Day]
│ ├── H3: 회복 과정 [Recovery]
│ └── H3: 사후 관리 [Aftercare]
├── H2: 실제 고객 후기 [Social Proof]
│ Word Count: 200-300 words
│ Keywords: "후기", "리뷰", satisfaction keywords
│ ├── Review snippet 1
│ ├── Review snippet 2
│ ├── Review snippet 3
│ └── Before/After gallery teaser
└── H2: 상담 예약 안내 [Conversion Section]
Word Count: 150-200 words
Keywords: CTA-related, location keywords
├── H3: 상담 예약 방법
├── H3: 오시는 길
└── H3: 문의 정보
```
## Keyword Density Map
| Section | Primary Keyword | LSI Keywords | Total Keywords |
|---------|----------------|--------------|----------------|
| Hero | 1 | 1-2 | 2-3 |
| Problem/Solution | 1 | 2-3 | 3-4 |
| Service Categories | 1-2 | 4-6 | 5-8 |
| Trust & Authority | 1 | 2-3 | 3-4 |
| FAQ | 2-3 | 5-7 | 7-10 |
| Process | 1 | 2-3 | 3-4 |
| Social Proof | 0-1 | 1-2 | 1-3 |
| Conversion | 1 | 1-2 | 2-3 |
| **Total** | **8-11** | **18-29** | **26-40** |
## Internal Linking Strategy
| From Section | To Page | Anchor Text | Purpose |
|-------------|---------|-------------|---------|
| Service Categories | Sub-service page | [Sub-service name] | Deep dive |
| FAQ | Price page | "비용 안내 페이지" | Conversion |
| Trust section | Doctor profile | "[Doctor name] 원장" | Authority |
| Process section | Consultation form | "상담 예약하기" | Conversion |
| Social proof | Gallery page | "더 많은 전후 사진" | Engagement |
## Content Length Guidelines
- **Total Page Length**: 2,000-2,500 words
- **Above Fold Content**: 100-150 words
- **Each H2 Section**: 200-500 words
- **Each H3 Subsection**: 80-150 words
- **Meta Description**: 150-160 characters
- **Image Alt Text**: 10-15 words each
## Schema Markup Requirements
```json
{
"@context": "https://schema.org",
"@type": "MedicalProcedure",
"name": "[Service Name]",
"description": "[Meta description]",
"procedureType": "Cosmetic",
"provider": {
"@type": "MedicalOrganization",
"name": "[Clinic Name]"
}
}
```
## Mobile Content Adaptation
- Reduce hero text by 30%
- Show 3 FAQs initially (expand for more)
- Simplify navigation to single-column
- Increase CTA button size
- Compress trust signals to carousel


@@ -0,0 +1,95 @@
# Keyword Research Template
## Primary Keyword Analysis
| Metric | Value | Notes |
|--------|-------|-------|
| **Primary Keyword** | [KEYWORD] | Main target keyword |
| **Monthly Search Volume** | [VOLUME] | Average monthly searches |
| **Keyword Difficulty** | [0-100] | Competition score |
| **Current Ranking** | #[POSITION] | Current SERP position |
| **Search Trend** | ↑ ↓ → | Trending direction |
## LSI Keywords Matrix
| LSI Keyword | Search Volume | Intent Type | Priority |
|------------|--------------|-------------|----------|
| [keyword 1] | [volume] | Informational | High |
| [keyword 2] | [volume] | Transactional | Medium |
| [keyword 3] | [volume] | Comparative | High |
| [keyword 4] | [volume] | Informational | Medium |
| [keyword 5] | [volume] | Transactional | Low |
| [keyword 6] | [volume] | Comparative | High |
| [keyword 7] | [volume] | Informational | Medium |
| [keyword 8] | [volume] | Navigational | Low |
| [keyword 9] | [volume] | Transactional | High |
| [keyword 10] | [volume] | Informational | Medium |
## User Intent Distribution
```
Informational (Research Phase): ___%
- Common queries: "what is", "how to", "benefits of"
- Content needed: Educational guides, FAQs, process explanations
Comparative (Evaluation Phase): ___%
- Common queries: "best", "vs", "reviews", "비교"
- Content needed: Comparison tables, reviews, case studies
Transactional (Ready to Convert): ___%
- Common queries: "price", "book", "consultation", "예약"
- Content needed: CTAs, pricing, booking forms
```
## Long-tail Keyword Opportunities
### Question-based Keywords
- [질문 키워드 1]
- [질문 키워드 2]
- [질문 키워드 3]
### Location-based Keywords
- [지역] + [primary keyword]
- [지역] + [primary keyword] + 잘하는곳
- [지역] + [primary keyword] + 추천
### Modifier-based Keywords
- [primary keyword] + 비용
- [primary keyword] + 부작용
- [primary keyword] + 회복기간
- [primary keyword] + 전후
## Competitor Keyword Analysis
| Competitor | Target Keywords | Ranking Keywords | Gap Opportunities |
|------------|----------------|------------------|-------------------|
| Competitor 1 | [keywords] | [keywords] | [missing keywords] |
| Competitor 2 | [keywords] | [keywords] | [missing keywords] |
| Competitor 3 | [keywords] | [keywords] | [missing keywords] |
## Seasonal Trends
| Month | Search Volume | Events/Factors |
|-------|--------------|----------------|
| January | [volume] | New year resolutions |
| February | [volume] | [factor] |
| March | [volume] | [factor] |
| ... | ... | ... |
## Platform-Specific Keywords
### Naver-Optimized
- [네이버 specific keyword 1]
- [네이버 specific keyword 2]
### Google-Optimized
- [Google specific keyword 1]
- [Google specific keyword 2]
## Action Items
- [ ] Target primary keyword in H1 and title tag
- [ ] Include 3-5 LSI keywords naturally in content
- [ ] Create content matching user intent distribution
- [ ] Optimize for question-based featured snippets
- [ ] Add location modifiers for local SEO


@@ -0,0 +1,239 @@
# SEO Technical Checklist Template
## Meta Tags Optimization
### Title Tag
- [ ] Length: 50-60 characters
- [ ] Primary keyword at beginning
- [ ] Brand name at end
- [ ] Unique for each page
- [ ] Formula: `[Primary Keyword] - [Value Proposition] | [Brand]`
**Template**: `{primary_keyword} 전문 - {unique_value} | {clinic_name}`
**Example**: `눈 성형 전문 - 자연스러운 라인 | 제이미클리닉`
### Meta Description
- [ ] Length: 150-160 characters
- [ ] Include primary keyword
- [ ] Include 1-2 LSI keywords
- [ ] Clear CTA
- [ ] Unique for each page
**Template**: `{location} {primary_keyword} 전문의가 {benefit}. {credential}. 무료상담 ☎ {phone}`
**Example**: `강남 눈 성형 전문의가 자연스러운 눈매를 디자인합니다. 15년 경력, 10,000건 시술. 무료상담 ☎ 02-1234-5678`
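The title and description formulas above can be filled and length-checked programmatically. A small sketch (the budgets are the checklist values; note they assume Latin-character counts, so Korean text hits practical limits sooner):

```python
def render_meta(template: str, **fields) -> str:
    """Fill a {placeholder} template, e.g. the title/description formulas above."""
    return template.format(**fields)

def length_report(tag: str, text: str, lo: int, hi: int) -> str:
    """Report character count against the checklist budget for this tag."""
    status = "OK" if lo <= len(text) <= hi else "CHECK"
    return f"{tag}: {len(text)} chars [{status}]"

title = render_meta("{primary_keyword} 전문 - {unique_value} | {clinic_name}",
                    primary_keyword="눈 성형", unique_value="자연스러운 라인",
                    clinic_name="제이미클리닉")
# Checklist budgets: title 50-60 chars, description 150-160 chars.
print(length_report("title", title, 50, 60))
```

A "CHECK" result is not automatically a failure for Korean titles; treat the budgets as upper bounds and apply the 30-character Naver guidance from the Naver section separately.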
### Open Graph Tags
```html
<meta property="og:title" content="{page_title}">
<meta property="og:description" content="{meta_description}">
<meta property="og:image" content="{featured_image_url}">
<meta property="og:url" content="{page_url}">
<meta property="og:type" content="website">
<meta property="og:locale" content="ko_KR">
```
## Header Tags Structure
- [ ] Only one H1 per page
- [ ] H1 contains primary keyword
- [ ] H2 tags for main sections (5-7)
- [ ] H3 tags for subsections
- [ ] Logical hierarchy maintained
- [ ] Keywords distributed naturally
## Content Optimization
### Keyword Density
- [ ] Primary keyword: 2-3% (20-30 times per 1000 words)
- [ ] LSI keywords: 1-2% each
- [ ] Natural placement (no stuffing)
- [ ] Synonyms and variations used
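The density targets above can be spot-checked with a rough counter. This is a simplified sketch (substring counting, whitespace tokenization), not a linguistic analysis; multi-word Korean keywords such as "눈 성형" are matched as substrings:

```python
def keyword_density(text: str, keyword: str) -> float:
    """Rough density: keyword occurrences per 100 words of body text."""
    words = text.split()
    if not words:
        return 0.0
    return 100.0 * text.count(keyword) / len(words)

# Target from the checklist: primary keyword around 2-3%.
sample = "a b a b a c"
print(f"{keyword_density(sample, 'a'):.1f}%")
```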
### Content Structure
- [ ] First 100 words include primary keyword
- [ ] Short paragraphs (3-4 sentences)
- [ ] Bullet points and lists
- [ ] Bold important keywords (sparingly)
- [ ] Internal links: 5-10
- [ ] External links: 2-3 (authoritative)
## Schema Markup
### Medical Procedure Schema
```json
{
"@context": "https://schema.org",
"@type": "MedicalProcedure",
"name": "{procedure_name}",
"procedureType": "Cosmetic",
"bodyLocation": "{body_part}",
"outcome": "{expected_outcome}",
"preparation": "{preparation_required}",
"followup": "{followup_care}",
"provider": {
"@type": "MedicalOrganization",
"name": "{clinic_name}",
"address": {
"@type": "PostalAddress",
"streetAddress": "{street}",
"addressLocality": "{city}",
"addressCountry": "KR"
}
}
}
```
### FAQ Schema
```json
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [{
"@type": "Question",
"name": "{question}",
"acceptedAnswer": {
"@type": "Answer",
"text": "{answer}"
}
}]
}
```
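Since the FAQ section holds seven Q&A pairs, generating the `mainEntity` array from data avoids hand-editing JSON. A hedged sketch of that builder (the sample question/answer text is illustrative):

```python
import json

def faq_schema(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs, per the template above."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }

schema = faq_schema([
    ("눈 성형 비용은 얼마인가요?", "시술 범위에 따라 달라지며 상담 시 안내됩니다."),
])
print(json.dumps(schema, ensure_ascii=False))
```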
## Image Optimization
- [ ] Descriptive file names: `eye-surgery-before-after-case1.jpg`
- [ ] Alt text with keywords: `눈 성형 전후 사진 - 30대 여성 사례`
- [ ] Compressed file size (< 200KB)
- [ ] WebP format with fallback
- [ ] Lazy loading implemented
- [ ] Image sitemap created
## Performance Optimization
### Page Speed
- [ ] Load time < 3 seconds
- [ ] First Contentful Paint < 1.8s
- [ ] Time to Interactive < 3.8s
- [ ] Total page size < 3MB
- [ ] Requests minimized (< 50)
### Core Web Vitals
- [ ] LCP (Largest Contentful Paint) < 2.5s
- [ ] FID (First Input Delay) < 100ms
- [ ] CLS (Cumulative Layout Shift) < 0.1
## Mobile Optimization
- [ ] Mobile-responsive design
- [ ] Viewport meta tag set
- [ ] Touch-friendly buttons (44x44px minimum)
- [ ] Readable font size (16px minimum)
- [ ] No horizontal scrolling
- [ ] Mobile page speed < 3s
## URL Structure
- [ ] SEO-friendly URL: `/eye-surgery` or `/눈-성형`
- [ ] No special characters
- [ ] Lowercase only
- [ ] Hyphens for word separation
- [ ] Under 60 characters
- [ ] Include primary keyword
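The URL rules above (lowercase, hyphen-separated, no special characters, under 60 characters) can be enforced with a small slug helper. A sketch; the character class keeps Korean syllables since the checklist allows `/눈-성형` style paths:

```python
import re

def slugify(name: str, max_len: int = 60) -> str:
    """Apply the URL checklist: lowercase, hyphens for separators, no specials."""
    slug = re.sub(r"[^a-z0-9가-힣]+", "-", name.lower()).strip("-")
    return slug[:max_len].rstrip("-")

print(slugify("Laser Hair Removal"))  # laser-hair-removal
```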
## Internal Linking
| From Page | To Page | Anchor Text | Purpose |
|-----------|---------|-------------|---------|
| Gateway | Service Detail | {service_name} | Deep content |
| Gateway | Doctor Profile | {doctor_name} 원장 | Authority |
| Gateway | Pricing | 비용 안내 | Conversion |
| Gateway | Gallery | 시술 전후 사진 | Engagement |
| Gateway | Contact | 상담 예약 | Conversion |
## Naver-Specific Optimization
### Naver Webmaster Tools
- [ ] Site verification complete
- [ ] XML sitemap submitted
- [ ] Robots.txt configured
- [ ] Syndication feed active
- [ ] Site optimization report reviewed
### Naver SEO Elements
- [ ] Title under 30 Korean characters
- [ ] C-Rank tags implemented
- [ ] Image-to-text ratio optimized (40:60)
- [ ] Outbound links minimized
- [ ] Brand search optimization
## Tracking & Analytics
- [ ] Google Analytics 4 installed
- [ ] Google Search Console verified
- [ ] Naver Analytics installed
- [ ] Conversion tracking configured
- [ ] Event tracking for CTAs
- [ ] Heatmap tool installed
## Security & Technical
- [ ] SSL certificate active (HTTPS)
- [ ] WWW/non-WWW redirect configured
- [ ] 404 error page customized
- [ ] XML sitemap generated
- [ ] Robots.txt optimized
- [ ] Canonical URLs set
- [ ] Hreflang tags (if multi-language)
## Quality Checks
### Content Quality
- [ ] No spelling/grammar errors
- [ ] Medical information accurate
- [ ] Legal compliance verified
- [ ] Contact information correct
- [ ] CTAs working properly
### Cross-browser Testing
- [ ] Chrome (Desktop/Mobile)
- [ ] Safari (Desktop/Mobile)
- [ ] Firefox
- [ ] Samsung Internet
- [ ] Naver Whale
## Monthly Monitoring Tasks
- [ ] Keyword ranking check
- [ ] Organic traffic analysis
- [ ] Bounce rate monitoring
- [ ] Conversion rate tracking
- [ ] Competitor analysis
- [ ] Content freshness update
- [ ] Broken link check
- [ ] Page speed test
## Priority Levels
1. **Critical (Day 1)**
- Title and meta tags
- H1 optimization
- Mobile responsiveness
- Page speed < 4s
2. **High (Week 1)**
- Schema markup
- Internal linking
- Image optimization
- Content optimization
3. **Medium (Week 2-3)**
- Naver optimization
- FAQ implementation
- Social proof elements
- Analytics setup
4. **Low (Month 2)**
- A/B testing
- Advanced schema
- Link building
- Content expansion


@@ -0,0 +1,234 @@
# Gateway Page Content Builder
A comprehensive Claude Skill for generating SEO-optimized gateway pages for local services, medical practices, and businesses.
## 🎯 Purpose
This skill provides a systematic framework for creating high-quality gateway pages that:
- Target location + service keyword combinations
- Scale content creation while maintaining uniqueness
- Include proper technical SEO elements
- Generate structured data and schema markup
## 📁 Structure
```
gateway-page-content-builder/
├── SKILL.md # Main skill documentation (REQUIRED)
├── README.md # This file
├── templates/ # Page templates
│ ├── gateway-page-medical.md
│ ├── gateway-page-beauty.md
│ └── gateway-page-dental.md
├── scripts/ # Automation scripts
│ ├── generate_pages.py
│ └── keyword_research.py
├── config/ # Configuration files
│ ├── locations.json
│ ├── services.json
│ └── brand.json
└── examples/ # Example outputs
└── gangnam-laser-hair-removal.md
```
## 🚀 Quick Start
### 1. Basic Usage with Claude
Simply ask Claude to use this skill:
```
"Use the Gateway Page Content Builder skill to create a page for laser hair removal in Gangnam"
```
### 2. Batch Generation with Python
```python
from scripts.generate_pages import GatewayPageGenerator, Brand, Location, Service
# Setup your brand
brand = Brand(
name_en="Your Clinic",
name_kr="클리닉명",
website="www.yourclinic.com",
# ... other details
)
# Define locations and services
locations = [...] # Your location list
services = [...] # Your service list
# Generate pages
generator = GatewayPageGenerator(brand)
generator.generate_batch(services, locations)
```
### 3. Using Templates
Templates are in Markdown format with placeholders:
- `[Service]` - Service name
- `[Location]` - Location name
- `[Brand]` - Brand/clinic name
- Additional custom placeholders
## 📝 Content Strategy
### Keyword Formula
```
Primary: [Service] + [Location]
Secondary: [Location] + [Service] + "clinic/center"
Long-tail: "best" + [Service] + "in" + [Location]
```
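The formula above expands mechanically into concrete keyword strings per service/location pair. An illustrative sketch (this is not the actual `generate_pages.py` API, just the formula as code):

```python
def keyword_set(service: str, location: str) -> dict:
    """Expand the keyword formula into primary, secondary, and long-tail strings."""
    return {
        "primary": f"{service} {location}",
        "secondary": f"{location} {service} clinic",
        "long_tail": f"best {service} in {location}",
    }

print(keyword_set("laser hair removal", "Gangnam"))
```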
### Content Uniqueness
Each page should have:
- 30% minimum unique content
- Local landmarks and transportation
- Location-specific testimonials
- Regional statistics or demographics
### Page Length Guidelines
- Minimum: 800 words
- Optimal: 1,200-1,500 words
- Maximum: 2,000 words
## 🎨 Customization
### Adding New Templates
1. Create new template in `templates/` directory
2. Use consistent placeholder naming
3. Include schema markup section
4. Add to configuration
### Extending Services/Locations
Edit the JSON configuration files:
`config/services.json`:
```json
{
"services": [
{
"id": "new-service",
"name_en": "New Service",
"name_kr": "새로운 서비스",
"keywords": ["keyword1", "keyword2"]
}
]
}
```
`config/locations.json`:
```json
{
"locations": [
{
"id": "new-location",
"name_en": "New Location",
"name_kr": "새로운 지역",
"landmarks": ["Landmark 1", "Landmark 2"]
}
]
}
```
## 🔧 Technical Implementation
### URL Structure
```
/[location]/[service]/
Example: /gangnam/laser-hair-removal/
```
### Required Meta Tags
- Title (60 characters max)
- Description (155 characters max)
- Canonical URL
- Open Graph tags
- Schema markup
### Schema Types
- MedicalBusiness
- LocalBusiness
- Service
- Review
- FAQPage
## 📊 Performance Tracking
### Key Metrics
- Organic traffic by page
- Keyword rankings
- Conversion rate
- Bounce rate
- Time on page
### A/B Testing Elements
- Headlines
- CTA buttons
- Trust signals
- Content length
## 🔄 Content Refresh Schedule
- **Weekly**: Review top 10% pages
- **Monthly**: Update testimonials
- **Quarterly**: Refresh statistics
- **Annually**: Full content audit
## 🛠️ Troubleshooting
### Common Issues
1. **Duplicate Content Warnings**
- Ensure 30% unique content minimum
- Use content variation patterns
- Add local elements
2. **Low Rankings**
- Check keyword density (2-3%)
- Verify schema markup
- Add more local signals
3. **Poor Conversion**
- Test CTA placement
- Add trust signals
- Improve page speed
## 📚 Resources
### SEO Tools
- Ahrefs / SEMrush (keyword research)
- Screaming Frog (technical audit)
- Google Search Console (performance)
- Schema.org (structured data)
### Content Tools
- Grammarly (grammar check)
- Hemingway (readability)
- Copyscape (plagiarism)
## 🤝 Contributing
To improve this skill:
1. Test templates with real data
2. Share performance metrics
3. Suggest new features
4. Report issues
## 📄 License
This skill is provided as-is for use with Claude Desktop and Claude Projects.
## 🆘 Support
For questions or assistance:
- Review SKILL.md for detailed documentation
- Check examples/ directory for samples
- Test with the Python script first
---
**Version**: 1.0.0
**Last Updated**: November 2024
**Author**: OurDigital Dr.D


@@ -0,0 +1,82 @@
# CLAUDE.md
## Overview
Gateway page content generator for local services. Creates SEO-optimized pages from location/service configurations.
## Quick Start
```bash
# Generate pages from config
python scripts/generate_pages.py --config config/services.json --locations config/locations.json
```
## Scripts
| Script | Purpose |
|--------|---------|
| `generate_pages.py` | Generate gateway pages from templates |
## Page Generator
```bash
# Generate all combinations
python scripts/generate_pages.py \
--config config/services.json \
--locations config/locations.json \
--output ./pages
# Single service/location
python scripts/generate_pages.py \
--service "laser_hair_removal" \
--location "gangnam" \
--template templates/gateway-page-medical.md
```
## Configuration Files
### services.json
```json
{
"services": [
{
"id": "laser_hair_removal",
"korean": "레이저 제모",
"keywords": ["laser hair removal", "permanent hair removal"]
}
]
}
```
### locations.json
```json
{
"locations": [
{
"id": "gangnam",
"korean": "강남",
"full_address": "서울특별시 강남구"
}
]
}
```
## Templates
- `templates/gateway-page-medical.md` - Medical service template
- Supports variables: `{{service}}`, `{{location}}`, `{{brand}}`
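Substituting the `{{service}}`-style variables is a one-regex operation. A minimal sketch of how the template rendering could work (the real `generate_pages.py` may differ; unknown placeholders are left intact so they surface during review):

```python
import re

def render_template(text: str, variables: dict) -> str:
    """Replace {{name}} placeholders; unknown names are left intact for review."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        text,
    )

draft = render_template(
    "{{location}} {{service}} | {{brand}}",
    {"service": "레이저 제모", "location": "강남", "brand": "제이미클리닉"},
)
print(draft)  # 강남 레이저 제모 | 제이미클리닉
```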
## Output
Generates markdown files with:
- SEO-optimized title and meta
- Structured content sections
- Schema markup recommendations
- Internal linking suggestions
## Workflow
1. Get strategy from `17-seo-gateway-architect`
2. Configure services and locations
3. Run generator for content drafts
4. Review and customize output


@@ -0,0 +1,235 @@
{
"locations": [
{
"id": "gangnam",
"name_en": "Gangnam",
"name_kr": "강남",
"name_cn": "江南",
"district": "Gangnam-gu",
"city": "Seoul",
"full_address_kr": "서울특별시 강남구 테헤란로 123",
"full_address_en": "123 Teheran-ro, Gangnam-gu, Seoul",
"postal_code": "06234",
"latitude": 37.4979,
"longitude": 127.0276,
"landmarks": [
"COEX Mall",
"Gangnam Station",
"Samsung Station",
"Bongeunsa Temple"
],
"subway_stations": [
{
"name": "Gangnam Station",
"lines": ["Line 2", "Sinbundang Line"],
"exit": "Exit 11",
"walking_time": "3 minutes"
},
{
"name": "Sinnonhyeon Station",
"lines": ["Line 9"],
"exit": "Exit 6",
"walking_time": "5 minutes"
}
],
"bus_routes": ["146", "341", "360", "740"],
"demographics": "Young professionals, affluent residents, business district",
"population": "approximately 570,000",
"average_age": "38.5",
"key_industries": ["IT", "Finance", "Medical", "Education"],
"nearby_hospitals": ["Samsung Medical Center", "Gangnam Severance Hospital"],
"parking": {
"available": true,
"type": "Underground",
"capacity": 50,
"fee": "Free for patients"
}
},
{
"id": "myeongdong",
"name_en": "Myeongdong",
"name_kr": "명동",
"name_cn": "明洞",
"district": "Jung-gu",
"city": "Seoul",
"full_address_kr": "서울특별시 중구 명동길 456",
"full_address_en": "456 Myeongdong-gil, Jung-gu, Seoul",
"postal_code": "04537",
"latitude": 37.5636,
"longitude": 126.9869,
"landmarks": [
"Myeongdong Cathedral",
"Lotte Department Store",
"Shinsegae Department Store",
"N Seoul Tower"
],
"subway_stations": [
{
"name": "Myeongdong Station",
"lines": ["Line 4"],
"exit": "Exit 8",
"walking_time": "2 minutes"
},
{
"name": "Euljiro 1-ga Station",
"lines": ["Line 2"],
"exit": "Exit 5",
"walking_time": "7 minutes"
}
],
"bus_routes": ["104", "421", "463", "507"],
"demographics": "Tourists, young shoppers, office workers",
"population": "approximately 13,000",
"average_age": "42.3",
"key_industries": ["Retail", "Tourism", "Banking", "Hospitality"],
"nearby_hospitals": ["Seoul Medical Center", "Asan Medical Center"],
"parking": {
"available": true,
"type": "Building parking",
"capacity": 30,
"fee": "2,000 KRW per hour"
}
},
{
"id": "hongdae",
"name_en": "Hongdae",
"name_kr": "홍대",
"name_cn": "弘大",
"district": "Mapo-gu",
"city": "Seoul",
"full_address_kr": "서울특별시 마포구 와우산로 789",
"full_address_en": "789 Wausan-ro, Mapo-gu, Seoul",
"postal_code": "04104",
"latitude": 37.5563,
"longitude": 126.9220,
"landmarks": [
"Hongik University",
"Hongdae Shopping Street",
"Sangsang Madang",
"Hongdae Free Market"
],
"subway_stations": [
{
"name": "Hongik University Station",
"lines": ["Line 2", "Airport Railroad", "Gyeongui-Jungang Line"],
"exit": "Exit 9",
"walking_time": "1 minute"
},
{
"name": "Sangsu Station",
"lines": ["Line 6"],
"exit": "Exit 1",
"walking_time": "8 minutes"
}
],
"bus_routes": ["271", "602", "603", "760"],
"demographics": "University students, young adults, artists, indie culture",
"population": "approximately 385,000",
"average_age": "35.2",
"key_industries": ["Entertainment", "Arts", "Cafes", "Fashion"],
"nearby_hospitals": ["Severance Hospital", "Sinchon Severance Hospital"],
"parking": {
"available": false,
"type": "Street parking only",
"capacity": 0,
"fee": "Public parking nearby"
}
},
{
"id": "apgujeong",
"name_en": "Apgujeong",
"name_kr": "압구정",
"name_cn": "狎鸥亭",
"district": "Gangnam-gu",
"city": "Seoul",
"full_address_kr": "서울특별시 강남구 압구정로 321",
"full_address_en": "321 Apgujeong-ro, Gangnam-gu, Seoul",
"postal_code": "06001",
"latitude": 37.5311,
"longitude": 127.0403,
"landmarks": [
"Galleria Department Store",
"Apgujeong Rodeo Street",
"Dosan Park",
"K-Star Road"
],
"subway_stations": [
{
"name": "Apgujeong Station",
"lines": ["Line 3"],
"exit": "Exit 2",
"walking_time": "4 minutes"
},
{
"name": "Apgujeong Rodeo Station",
"lines": ["Suin-Bundang Line"],
"exit": "Exit 5",
"walking_time": "3 minutes"
}
],
"bus_routes": ["145", "301", "362", "472"],
"demographics": "High-income residents, luxury shoppers, K-beauty enthusiasts",
"population": "approximately 24,000",
"average_age": "41.7",
"key_industries": ["Luxury retail", "Beauty", "Plastic surgery", "Fashion"],
"nearby_hospitals": ["Chung-Ang University Hospital", "Seoul St. Mary's Hospital"],
"parking": {
"available": true,
"type": "Valet service",
"capacity": 40,
"fee": "Complimentary for VIP patients"
}
},
{
"id": "jamsil",
"name_en": "Jamsil",
"name_kr": "잠실",
"name_cn": "蚕室",
"district": "Songpa-gu",
"city": "Seoul",
"full_address_kr": "서울특별시 송파구 올림픽로 240",
"full_address_en": "240 Olympic-ro, Songpa-gu, Seoul",
"postal_code": "05554",
"latitude": 37.5132,
"longitude": 127.1001,
"landmarks": [
"Lotte World Tower",
"Lotte World",
"Seokchon Lake",
"Olympic Park"
],
"subway_stations": [
{
"name": "Jamsil Station",
"lines": ["Line 2", "Line 8"],
"exit": "Exit 4",
"walking_time": "5 minutes"
},
{
"name": "Jamsilsaenae Station",
"lines": ["Line 2"],
"exit": "Exit 9",
"walking_time": "10 minutes"
}
],
"bus_routes": ["2415", "3217", "3313", "3414"],
"demographics": "Families, middle-class residents, sports enthusiasts",
"population": "approximately 660,000",
"average_age": "40.1",
"key_industries": ["Retail", "Entertainment", "Sports", "Tourism"],
"nearby_hospitals": ["Asan Medical Center", "Seoul Paik Hospital"],
"parking": {
"available": true,
"type": "Multi-level parking",
"capacity": 100,
"fee": "First 2 hours free"
}
}
],
"metadata": {
"last_updated": "2024-11-19",
"total_locations": 5,
"coverage_area": "Seoul Metropolitan Area",
"coordinate_system": "WGS84"
}
}


@@ -0,0 +1,308 @@
{
"services": [
{
"id": "laser-hair-removal",
"name_en": "Laser Hair Removal",
"name_kr": "레이저 제모",
"name_cn": "激光脱毛",
"category": "Dermatology",
"subcategory": "Aesthetic Dermatology",
"description": "Permanent hair reduction using state-of-the-art laser technology",
"detailed_description": "Our laser hair removal treatment uses FDA-approved diode and Alexandrite lasers to target hair follicles, providing long-lasting hair reduction with minimal discomfort. Suitable for all skin types.",
"keywords": [
"permanent hair removal",
"laser hair removal",
"IPL hair removal",
"diode laser",
"hair reduction",
"unwanted hair",
"smooth skin"
],
"procedure_time": "15-60 minutes",
"sessions_required": "6-8 sessions",
"recovery_time": "No downtime",
"results_visible": "After 2-3 sessions",
"price_range": "₩₩₩",
"price_details": {
"small_area": "50,000-100,000 KRW",
"medium_area": "100,000-200,000 KRW",
"large_area": "200,000-400,000 KRW"
},
"target_areas": [
"Face",
"Underarms",
"Arms",
"Legs",
"Bikini line",
"Back",
"Chest"
],
"benefits": [
"Permanent hair reduction",
"Smoother skin",
"No more razor burns",
"Time-saving",
"Cost-effective long-term"
],
"side_effects": [
"Temporary redness",
"Mild swelling",
"Skin sensitivity"
],
"contraindications": [
"Pregnancy",
"Active skin infections",
"Recent sun exposure",
"Certain medications"
],
"seo_priority": "high",
"search_volume": 14800,
"competition": "medium"
},
{
"id": "botox",
"name_en": "Botox Treatment",
"name_kr": "보톡스",
"name_cn": "肉毒素",
"category": "Cosmetic Dermatology",
"subcategory": "Injectable Treatments",
"description": "FDA-approved botulinum toxin for wrinkle reduction and facial rejuvenation",
"detailed_description": "Botox injections temporarily relax facial muscles to reduce the appearance of fine lines and wrinkles. Our experienced practitioners use precise injection techniques for natural-looking results.",
"keywords": [
"botox",
"botulinum toxin",
"wrinkle treatment",
"anti-aging",
"facial rejuvenation",
"crow's feet",
"forehead lines",
"frown lines"
],
"procedure_time": "15-30 minutes",
"sessions_required": "1 session, repeat every 3-6 months",
"recovery_time": "No downtime",
"results_visible": "3-7 days",
"price_range": "₩₩₩₩",
"price_details": {
"per_unit": "10,000-15,000 KRW",
"full_face": "300,000-500,000 KRW",
"single_area": "150,000-250,000 KRW"
},
"target_areas": [
"Forehead lines",
"Crow's feet",
"Frown lines",
"Bunny lines",
"Lip lines",
"Chin dimpling",
"Jawline slimming"
],
"benefits": [
"Reduces wrinkles",
"Prevents new lines",
"Non-surgical",
"Quick procedure",
"Natural results"
],
"side_effects": [
"Temporary bruising",
"Mild headache",
"Drooping eyelid (rare)"
],
"contraindications": [
"Pregnancy/nursing",
"Neuromuscular disorders",
"Allergy to ingredients",
"Active infection at injection site"
],
"seo_priority": "high",
"search_volume": 22200,
"competition": "high"
},
{
"id": "dermal-filler",
"name_en": "Dermal Filler",
"name_kr": "필러",
"name_cn": "玻尿酸",
"category": "Cosmetic Dermatology",
"subcategory": "Injectable Treatments",
"description": "Hyaluronic acid fillers for volume restoration and facial contouring",
"detailed_description": "Our dermal filler treatments use premium hyaluronic acid products to restore volume, enhance facial contours, and smooth deep wrinkles. Results are immediate and can last 6-18 months.",
"keywords": [
"dermal filler",
"hyaluronic acid",
"facial volume",
"lip filler",
"cheek filler",
"nasolabial folds",
"facial contouring"
],
"procedure_time": "30-45 minutes",
"sessions_required": "1 session, touch-ups as needed",
"recovery_time": "24-48 hours",
"results_visible": "Immediate",
"price_range": "₩₩₩₩",
"price_details": {
"per_syringe": "300,000-600,000 KRW",
"lip_augmentation": "400,000-700,000 KRW",
"full_face": "1,500,000-3,000,000 KRW"
},
"target_areas": [
"Lips",
"Cheeks",
"Under eyes",
"Nasolabial folds",
"Marionette lines",
"Jawline",
"Temples"
],
"benefits": [
"Instant results",
"Restores volume",
"Smooths wrinkles",
"Enhances contours",
"Reversible if needed"
],
"side_effects": [
"Swelling",
"Bruising",
"Tenderness",
"Lumps (temporary)"
],
"contraindications": [
"Pregnancy",
"Blood clotting disorders",
"Active cold sores",
"Allergy to lidocaine"
],
"seo_priority": "high",
"search_volume": 18100,
"competition": "high"
},
{
"id": "chemical-peel",
"name_en": "Chemical Peel",
"name_kr": "케미컬 필링",
"name_cn": "化学换肤",
"category": "Medical Aesthetics",
"subcategory": "Skin Resurfacing",
"description": "Professional chemical exfoliation for skin renewal and rejuvenation",
"detailed_description": "Chemical peels use carefully selected acids to remove damaged skin layers, revealing fresher, younger-looking skin. We offer light, medium, and deep peels customized to your skin concerns.",
"keywords": [
"chemical peel",
"skin resurfacing",
"glycolic acid",
"TCA peel",
"skin renewal",
"acne scars",
"pigmentation treatment"
],
"procedure_time": "30-45 minutes",
"sessions_required": "3-6 sessions",
"recovery_time": "3-7 days",
"results_visible": "After peeling completes",
"price_range": "₩₩",
"price_details": {
"light_peel": "100,000-200,000 KRW",
"medium_peel": "300,000-500,000 KRW",
"deep_peel": "700,000-1,000,000 KRW"
},
"target_areas": [
"Full face",
"Neck",
"Décolletage",
"Hands",
"Back"
],
"benefits": [
"Improves texture",
"Reduces pigmentation",
"Minimizes pores",
"Treats acne",
"Stimulates collagen"
],
"side_effects": [
"Peeling",
"Redness",
"Sensitivity",
"Temporary darkening"
],
"contraindications": [
"Active herpes",
"Isotretinoin use",
"Pregnancy",
"Open wounds"
],
"seo_priority": "medium",
"search_volume": 8100,
"competition": "medium"
},
{
"id": "acne-treatment",
"name_en": "Acne Treatment",
"name_kr": "여드름 치료",
"name_cn": "痤疮治疗",
"category": "Medical Dermatology",
"subcategory": "Acne Management",
"description": "Comprehensive medical treatment for acne and acne scarring",
"detailed_description": "Our acne treatment program combines medical-grade therapies including laser treatments, chemical peels, and prescription medications to clear active acne and prevent future breakouts.",
"keywords": [
"acne treatment",
"acne scars",
"pimple treatment",
"blackheads",
"whiteheads",
"cystic acne",
"hormonal acne"
],
"procedure_time": "45-60 minutes",
"sessions_required": "4-8 sessions",
"recovery_time": "Minimal",
"results_visible": "4-6 weeks",
"price_range": "₩₩₩",
"price_details": {
"consultation": "50,000 KRW",
"basic_treatment": "150,000-250,000 KRW",
"laser_therapy": "300,000-500,000 KRW"
},
"target_areas": [
"Face",
"Back",
"Chest",
"Shoulders"
],
"benefits": [
"Clears acne",
"Prevents scarring",
"Reduces inflammation",
"Improves skin texture",
"Boosts confidence"
],
"side_effects": [
"Initial purging",
"Dryness",
"Mild irritation",
"Sun sensitivity"
],
"contraindications": [
"Pregnancy (certain medications)",
"Photosensitivity disorders",
"Active infections"
],
"seo_priority": "high",
"search_volume": 12100,
"competition": "medium"
}
],
"metadata": {
"last_updated": "2024-11-19",
"total_services": 5,
"categories": [
"Dermatology",
"Cosmetic Dermatology",
"Medical Aesthetics",
"Medical Dermatology"
],
"price_currency": "KRW"
}
}


@@ -0,0 +1,398 @@
#!/usr/bin/env python3
"""
Gateway Page Content Generator
Automates the creation of SEO-optimized gateway pages for local services
"""
import json
from datetime import datetime
from typing import Dict, List
from dataclasses import dataclass
from pathlib import Path
@dataclass
class Location:
"""Location data structure"""
id: str
name_en: str
name_kr: str
full_address: str
landmarks: List[str]
subway_stations: List[str]
demographics: str
latitude: float
longitude: float
@dataclass
class Service:
"""Service data structure"""
id: str
name_en: str
name_kr: str
category: str
description: str
keywords: List[str]
procedure_time: str
recovery_time: str
price_range: str
@dataclass
class Brand:
"""Brand/Clinic information"""
name_en: str
name_kr: str
website: str
phone: str
email: str
established_year: int
certifications: List[str]
unique_selling_points: List[str]
class GatewayPageGenerator:
"""Main class for generating gateway page content"""
def __init__(self, brand: Brand, template_path: str = None):
self.brand = brand
# Use script directory as base for template path
if template_path is None:
script_dir = Path(__file__).parent.parent
self.template_path = script_dir / "templates"
else:
self.template_path = Path(template_path)
self.generated_pages = []
def load_template(self, template_name: str) -> str:
"""Load a template file"""
template_file = self.template_path / template_name
if template_file.exists():
with open(template_file, 'r', encoding='utf-8') as f:
return f.read()
else:
raise FileNotFoundError(f"Template {template_name} not found")
def generate_meta_tags(self, service: Service, location: Location) -> Dict:
"""Generate SEO meta tags"""
return {
"title": f"{service.name_en} in {location.name_en} | Expert {service.category} | {self.brand.name_en}",
"description": f"Looking for {service.name_en.lower()} in {location.name_en}? "
f"{self.brand.name_en} offers professional {service.category.lower()} services. "
f"✓ Experienced team ✓ Latest technology ✓ {self.brand.unique_selling_points[0]}",
"keywords": ", ".join([
f"{service.name_en} {location.name_en}",
f"{location.name_en} {service.name_en}",
*service.keywords,
f"{service.category} {location.name_en}"
]),
"canonical": f"https://{self.brand.website}/{location.id}/{service.id}/",
"og:title": f"{service.name_en} in {location.name_en} - {self.brand.name_en}",
"og:description": f"Professional {service.name_en} services in {location.name_en}. "
f"Book your consultation today.",
"og:image": f"https://{self.brand.website}/images/{service.id}-{location.id}-og.jpg"
}
def generate_schema_markup(self, service: Service, location: Location) -> str:
"""Generate JSON-LD schema markup"""
schema = {
"@context": "https://schema.org",
"@type": "MedicalBusiness",
"name": f"{self.brand.name_en} - {location.name_en}",
"url": f"https://{self.brand.website}",
"telephone": self.brand.phone,
"email": self.brand.email,
"address": {
"@type": "PostalAddress",
"streetAddress": location.full_address,
"addressLocality": location.name_en,
"addressCountry": "KR"
},
"geo": {
"@type": "GeoCoordinates",
"latitude": location.latitude,
"longitude": location.longitude
},
"areaServed": {
"@type": "City",
"name": location.name_en
},
"availableService": {
"@type": "MedicalProcedure",
"name": service.name_en,
"description": service.description
},
"priceRange": service.price_range
}
return json.dumps(schema, indent=2, ensure_ascii=False)
def generate_content_variations(self, service: Service, location: Location) -> Dict[str, List[str]]:
"""Generate content variations for uniqueness"""
return {
"hero_headlines": [
f"Professional {service.name_en} in {location.name_en}",
f"{location.name_en}'s Premier {service.name_en} {service.category}",
f"Expert {service.name_en} Services for {location.name_en} Residents",
f"Transform Your Look with {service.name_en} in {location.name_en}"
],
"intro_paragraphs": [
f"Welcome to {self.brand.name_en}, where we specialize in providing exceptional "
f"{service.name_en} services to the {location.name_en} community. "
f"Our state-of-the-art facility, conveniently located near {location.landmarks[0]}, "
f"combines advanced technology with personalized care.",
f"Looking for trusted {service.name_en} in {location.name_en}? "
f"At {self.brand.name_en}, we've been serving the {location.demographics} "
f"for over {datetime.now().year - self.brand.established_year} years. "
f"Our expert team understands the unique needs of {location.name_en} residents.",
f"Discover why {location.name_en} residents choose {self.brand.name_en} "
f"for their {service.name_en} needs. Located just minutes from "
f"{', '.join(location.subway_stations[:2])}, we offer {service.category} "
f"services that deliver remarkable results."
],
"cta_buttons": [
f"Book Your {location.name_en} Consultation",
f"Schedule {service.name_en} Today",
f"Get Started in {location.name_en}",
f"Reserve Your Appointment"
],
"trust_signals": [
f"Trusted by {location.name_en} residents since {self.brand.established_year}",
f"Over 10,000 successful {service.category} treatments",
f"5-star rated {service.name_en} clinic in {location.name_en}",
f"Certified specialists serving {location.demographics}"
]
}
def localize_content(self, content: str, service: Service, location: Location) -> str:
"""Add local elements to content"""
local_elements = {
"transportation": f"Easily accessible via {', '.join(location.subway_stations)} stations",
"landmarks": f"Located near {' and '.join(location.landmarks[:2])}",
"community": f"Proud to serve the {location.name_en} community",
"convenience": f"Convenient for {location.demographics} in {location.name_en}",
"local_stats": f"Join thousands of satisfied patients from {location.name_en}"
}
# Add local elements naturally throughout content
for key, value in local_elements.items():
placeholder = f"[LOCAL_{key.upper()}]"
if placeholder in content:
content = content.replace(placeholder, value)
return content
def generate_page(self, service: Service, location: Location,
template_name: str = "gateway-page-medical.md") -> str:
"""Generate a complete gateway page"""
# Load template
template = self.load_template(template_name)
# Generate components
meta_tags = self.generate_meta_tags(service, location)
schema = self.generate_schema_markup(service, location)
variations = self.generate_content_variations(service, location)
# Replace placeholders in template
replacements = {
"[Medical Service]": service.name_en,
"[Location]": location.name_en,
"[location]": location.name_en.lower(),
"[Clinic Name]": self.brand.name_en,
"[service-slug]": service.id,
"[X years]": str(datetime.now().year - self.brand.established_year),
"[specific address near landmark]": f"{location.full_address}, near {location.landmarks[0]}",
"[nearby subway/bus stations]": ", ".join(location.subway_stations),
"[certification details]": ", ".join(self.brand.certifications[:2]),
"[equipment type]": f"{service.category} equipment",
"[duration]": service.procedure_time,
"[Medical Specialty]": service.category,
"[phone-number]": self.brand.phone,
"[website-url]": f"https://{self.brand.website}",
"[page-url]": f"https://{self.brand.website}/{location.id}/{service.id}/",
"[latitude]": str(location.latitude),
"[longitude]": str(location.longitude),
}
# Apply replacements
content = template
for placeholder, value in replacements.items():
content = content.replace(placeholder, value)
# Add localized content
content = self.localize_content(content, service, location)
# Add schema markup at the end if not already present
if '"@context": "https://schema.org"' not in content:
content += f"\n\n<!-- Schema Markup -->\n<script type='application/ld+json'>\n{schema}\n</script>"
return content
def generate_batch(self, services: List[Service], locations: List[Location],
output_dir: str = "output/") -> List[str]:
"""Generate multiple gateway pages"""
output_path = Path(output_dir)
output_path.mkdir(parents=True, exist_ok=True)
generated_files = []
for location in locations:
location_dir = output_path / location.id
location_dir.mkdir(exist_ok=True)
for service in services:
# Generate content
content = self.generate_page(service, location)
# Save to file
filename = f"{service.id}-{location.id}.md"
filepath = location_dir / filename
with open(filepath, 'w', encoding='utf-8') as f:
f.write(content)
generated_files.append(str(filepath))
print(f"✓ Generated: {filepath}")
# Generate index file
self.generate_index(services, locations, output_path)
return generated_files
def generate_index(self, services: List[Service], locations: List[Location],
output_path: Path):
"""Generate an index of all created pages"""
index_content = f"# Gateway Pages Index - {self.brand.name_en}\n\n"
index_content += f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}\n\n"
index_content += "## Pages by Location\n\n"
for location in locations:
index_content += f"### {location.name_en}\n"
for service in services:
url = f"/{location.id}/{service.id}/"
index_content += f"- [{service.name_en} in {location.name_en}]({url})\n"
index_content += "\n"
index_content += "## Pages by Service\n\n"
for service in services:
index_content += f"### {service.name_en}\n"
for location in locations:
url = f"/{location.id}/{service.id}/"
index_content += f"- [{location.name_en}]({url})\n"
index_content += "\n"
index_content += f"\n---\nTotal Pages Generated: {len(services) * len(locations)}\n"
with open(output_path / "index.md", 'w', encoding='utf-8') as f:
f.write(index_content)
def create_sample_data():
"""Create sample data for testing"""
# Sample brand
brand = Brand(
name_en="Jamie Clinic",
name_kr="제이미 클리닉",
website="www.jamieclinic.com",
phone="+82-2-1234-5678",
email="info@jamieclinic.com",
established_year=2010,
certifications=["ISO 9001", "KAHF Certified", "JCI Accredited"],
unique_selling_points=[
"Same-day appointments available",
"15+ years of experience",
"Latest medical technology"
]
)
# Sample locations
locations = [
Location(
id="gangnam",
name_en="Gangnam",
name_kr="강남",
full_address="123 Teheran-ro, Gangnam-gu, Seoul",
landmarks=["COEX", "Gangnam Station", "Samsung Station"],
subway_stations=["Gangnam Station (Line 2)", "Sinnonhyeon Station (Line 9)"],
demographics="Young professionals and affluent residents",
latitude=37.4979,
longitude=127.0276
),
Location(
id="myeongdong",
name_en="Myeongdong",
name_kr="명동",
full_address="456 Myeongdong-gil, Jung-gu, Seoul",
landmarks=["Myeongdong Cathedral", "Lotte Department Store"],
subway_stations=["Myeongdong Station (Line 4)", "Euljiro 1-ga Station (Line 2)"],
demographics="Tourists and young shoppers",
latitude=37.5636,
longitude=126.9869
)
]
# Sample services
services = [
Service(
id="laser-hair-removal",
name_en="Laser Hair Removal",
name_kr="레이저 제모",
category="Dermatology",
description="Advanced laser technology for permanent hair reduction",
keywords=["permanent hair removal", "IPL", "diode laser"],
procedure_time="30-60 minutes",
recovery_time="No downtime",
price_range="₩₩₩"
),
Service(
id="botox",
name_en="Botox Treatment",
name_kr="보톡스",
category="Cosmetic Dermatology",
description="FDA-approved botulinum toxin for wrinkle reduction",
keywords=["wrinkle treatment", "anti-aging", "facial rejuvenation"],
procedure_time="15-30 minutes",
recovery_time="No downtime",
price_range="₩₩₩₩"
)
]
return brand, locations, services
def main():
"""Main execution function"""
print("=" * 60)
print("Gateway Page Content Generator")
print("=" * 60)
# Get sample data
brand, locations, services = create_sample_data()
# Initialize generator
generator = GatewayPageGenerator(brand)
# Generate pages
print(f"\nGenerating {len(services) * len(locations)} gateway pages...")
print("-" * 40)
generated_files = generator.generate_batch(services, locations)
print("-" * 40)
print(f"\n✅ Successfully generated {len(generated_files)} pages!")
print(f"📁 Output directory: output/")
print(f"📋 Index file created: output/index.md")
# Generate report
print("\n" + "=" * 60)
print("GENERATION REPORT")
print("=" * 60)
print(f"Brand: {brand.name_en}")
print(f"Locations: {', '.join([loc.name_en for loc in locations])}")
print(f"Services: {', '.join([svc.name_en for svc in services])}")
print(f"Total Pages: {len(generated_files)}")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 60)
if __name__ == "__main__":
main()


@@ -0,0 +1,5 @@
# 18-seo-gateway-builder dependencies
jinja2>=3.1.0
pyyaml>=6.0.0
markdown>=3.5.0
python-dotenv>=1.0.0


@@ -0,0 +1,231 @@
# [Medical Service] in [Location] | [Clinic Name]
<!-- Meta Tags -->
<!--
Title: [Medical Service] in [Location] | Expert Care | [Clinic Name]
Description: Looking for professional [medical service] in [location]? [Clinic Name] offers state-of-the-art [service] with experienced doctors. ✓ Same-day appointments ✓ Insurance accepted ✓ [Unique benefit]
Canonical: https://example.com/[location]/[service-slug]/
-->
## Professional [Medical Service] Available in [Location]
Welcome to [Clinic Name], your trusted provider for [medical service] in [location]. Our medical team brings over [X years] of combined experience, utilizing the latest medical technology to ensure optimal results for our patients in the [location] area.
### Why Choose [Clinic Name] for [Medical Service] in [Location]?
Located conveniently at [specific address near landmark], our [location] clinic specializes in providing personalized [medical service] treatments tailored to each patient's unique needs. We understand the specific health concerns of [location] residents and have designed our services accordingly.
**Our [Location] Advantages:**
- 🏥 Modern facility equipped with latest [equipment type]
- 👨‍⚕️ Board-certified specialists with [certification details]
- 📍 Easy access from [nearby subway/bus stations]
- 🕐 Extended hours to accommodate busy [location] professionals
- 💳 Accept major insurance plans popular in [location]
## Understanding [Medical Service]
### What Is [Medical Service]?
[Detailed medical explanation of the service, including scientific background, FDA approvals if applicable, and medical benefits. This section should be educational while remaining accessible.]
### Who Can Benefit from [Medical Service]?
Our [medical service] treatment in [location] is ideal for patients experiencing:
- [Condition 1 with brief explanation]
- [Condition 2 with brief explanation]
- [Condition 3 with brief explanation]
- [Condition 4 with brief explanation]
## Our [Medical Service] Process in [Location]
### 1. Initial Consultation
Your journey begins with a comprehensive consultation at our [location] clinic. Our specialists will:
- Review your medical history
- Conduct necessary diagnostic tests
- Discuss your treatment goals
- Create a personalized treatment plan
### 2. Treatment Planning
Based on your consultation, we develop a customized approach that considers:
- Your specific medical condition
- Lifestyle factors common to [location] residents
- Insurance coverage options
- Optimal scheduling for your convenience
### 3. Treatment Sessions
Each [medical service] session at our [location] facility typically involves:
- Pre-treatment preparation
- The procedure itself (approximately [duration])
- Post-treatment monitoring
- Detailed aftercare instructions
### 4. Follow-up Care
We provide comprehensive follow-up support including:
- Scheduled check-ups
- 24/7 emergency hotline
- Ongoing treatment adjustments
- Long-term health monitoring
## Expected Results and Recovery
### What to Expect After [Medical Service]
Patients at our [location] clinic typically experience:
- **Immediate effects**: [Description]
- **Short-term (1-2 weeks)**: [Description]
- **Long-term (1-3 months)**: [Description]
- **Final results**: [Timeline and description]
### Recovery Timeline
- Day 1-3: [Recovery details]
- Week 1: [Recovery details]
- Week 2-4: [Recovery details]
- Month 2-3: [Recovery details]
## Safety and Credentials
### Our Medical Standards
[Clinic Name] in [location] maintains the highest medical standards:
- ✓ [Relevant medical certification]
- ✓ [Hospital affiliation if applicable]
- ✓ [Safety protocol certification]
- ✓ [Professional membership]
### Our Medical Team
**Dr. [Name], MD**
- [Medical school]
- [Residency/Fellowship]
- [Years of experience] specializing in [medical service]
- [Special recognition or research]
## Pricing and Insurance
### Insurance Coverage
We accept most major insurance plans used by [location] residents:
- [Insurance provider 1]
- [Insurance provider 2]
- [Insurance provider 3]
- [Insurance provider 4]
### Payment Options
For your convenience, we offer:
- Insurance direct billing
- Flexible payment plans
- Credit card payments
- HSA/FSA acceptance
### Transparent Pricing
Contact us for a detailed quote. Factors affecting cost include:
- Severity of condition
- Number of sessions required
- Insurance coverage level
- Additional treatments needed
## Patient Testimonials from [Location]
> "After struggling with [condition] for years, I finally found relief at [Clinic Name]. The team was professional, and the results exceeded my expectations."
> — [Patient initials], [Location] resident
> "The convenience of having such high-quality [medical service] right here in [location] made all the difference. I no longer have to travel to [other area] for treatment."
> — [Patient initials], [Nearby neighborhood]
> "Dr. [Name] took the time to explain everything thoroughly. I felt confident throughout the entire process."
> — [Patient initials], [Location] professional
## Frequently Asked Questions
### General Questions
**Q: How do I know if [medical service] is right for me?**
A: The best way to determine if you're a candidate is through a consultation at our [location] clinic. We'll evaluate your medical history, current condition, and treatment goals.
**Q: How long does [medical service] take?**
A: Treatment sessions typically last [duration], though your first visit including consultation may take [longer duration].
**Q: Is [medical service] painful?**
A: [Comfort level explanation with pain management options available]
### Location-Specific Questions
**Q: Where exactly is your [location] clinic located?**
A: We're located at [full address], just [distance] from [landmark/station]. [Parking/public transport information].
**Q: Do you have parking available?**
A: Yes, we offer [parking details specific to location].
**Q: What are your hours for the [location] clinic?**
A:
- Monday-Friday: [hours]
- Saturday: [hours]
- Sunday: [hours/closed]
### Insurance and Payment
**Q: Does insurance cover [medical service]?**
A: Coverage varies by plan. Our insurance specialists can verify your benefits before your appointment.
**Q: Do you offer payment plans?**
A: Yes, we offer flexible payment options including [specific plans available].
## Schedule Your [Medical Service] Consultation in [Location]
Ready to take the first step? Contact our [location] clinic today:
### Contact Information
📍 **Address**: [Full address]
📞 **Phone**: [Local phone number]
📧 **Email**: [location]@[clinicname].com
🌐 **Online Booking**: [URL]
### Office Hours
- **Monday-Friday**: [Hours]
- **Saturday**: [Hours]
- **Sunday**: [Hours/Closed]
- **Emergency**: [24/7 hotline if available]
### Getting Here
**By Subway**: [Detailed directions from nearest station]
**By Bus**: [Bus routes and stops]
**By Car**: [Driving directions and parking info]
---
<!-- Schema Markup -->
```json
{
"@context": "https://schema.org",
"@type": "MedicalClinic",
"name": "[Clinic Name] - [Location]",
"image": "[clinic-image-url]",
"@id": "[page-url]",
"url": "[website-url]",
"telephone": "[phone-number]",
"address": {
"@type": "PostalAddress",
"streetAddress": "[street]",
"addressLocality": "[city]",
"addressRegion": "[state/province]",
"postalCode": "[zip]",
"addressCountry": "KR"
},
"geo": {
"@type": "GeoCoordinates",
"latitude": [latitude],
"longitude": [longitude]
},
"openingHoursSpecification": {
"@type": "OpeningHoursSpecification",
"dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
"opens": "09:00",
"closes": "18:00"
},
"medicalSpecialty": "[Medical Specialty]",
"availableService": {
"@type": "MedicalProcedure",
"name": "[Medical Service]",
"description": "[Service Description]"
}
}
```
*Last updated: [Date] | [Clinic Name] - Professional [Medical Service] in [Location]*


@@ -0,0 +1,377 @@
---
name: gateway-page-content-builder
version: 1.0.0
description: Systematic content generation framework for SEO-optimized gateway pages with local service focus
license: OurDigital internal-use ONLY
---
# Gateway Page Content Builder
A comprehensive skill for building high-quality, SEO-optimized gateway page content for local services, medical practices, and business locations.
## Core Purpose
This skill provides a systematic framework for creating gateway pages that:
- Target specific location + service keyword combinations
- Follow SEO best practices for local search optimization
- Maintain content quality and uniqueness at scale
- Include structured data and technical SEO elements
## Content Generation Framework
### 1. Page Structure Template
Every gateway page should follow this optimized structure:
```markdown
# [Service Name] in [Location] - [Brand Name]
## Hero Section
- Primary headline with target keywords
- Value proposition statement
- Quick contact CTA
## Service Overview
- What is [service]?
- Why choose our [service] in [location]
- Key benefits for [location] residents
## Local Service Details
- Service availability in [location]
- Local team/facility information
- Location-specific offerings
## Process & Procedure
- Step-by-step service flow
- Duration and frequency
- What to expect
## Benefits & Results
- Evidence-based outcomes
- Patient/customer testimonials
- Before/after scenarios
## Pricing & Insurance
- Transparent pricing structure
- Insurance coverage details
- Payment options
## FAQ Section
- Location-specific questions
- Service-specific concerns
- Booking and preparation
## Contact & Booking
- Clear CTA sections
- Multiple contact methods
- Online booking integration
```
### 2. Content Variables System
Define reusable content variables for efficient scaling:
```yaml
variables:
service_types:
- name: "laser_hair_removal"
korean: "레이저 제모"
description: "Advanced laser technology for permanent hair reduction"
keywords: ["laser hair removal", "permanent hair removal", "IPL treatment"]
locations:
- name: "gangnam"
korean: "강남"
full_address: "서울특별시 강남구"
landmarks: ["COEX", "Samsung Station", "Gangnam Station"]
demographics: "Young professionals, high income"
brand_info:
name: "Your Clinic"
korean: "클리닉명"
usp: "15+ years of experience with latest technology"
```
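The variables above define a service × location matrix; each pair becomes one gateway page URL. A minimal sketch of the enumeration (inline dicts stand in for the YAML; in practice they would come from `yaml.safe_load`):

```python
from itertools import product

# Inline stand-ins for the YAML variables above
services = [{"name": "laser_hair_removal", "korean": "레이저 제모"}]
locations = [{"name": "gangnam", "korean": "강남"},
             {"name": "myeongdong", "korean": "명동"}]

# One gateway page per (service, location) pair
pages = [f"/{loc['name']}/{svc['name'].replace('_', '-')}/"
         for svc, loc in product(services, locations)]
print(pages)
# ['/gangnam/laser-hair-removal/', '/myeongdong/laser-hair-removal/']
```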
### 3. Content Generation Rules
#### Title Tag Formula
```
[Service] in [Location] | [Unique Modifier] | [Brand]
Examples:
- "Laser Hair Removal in Gangnam | Same-Day Appointments | Jamie Clinic"
- "강남 레이저 제모 | 당일 예약 가능 | 제이미 클리닉"
```
#### Meta Description Template
```
Looking for [service] in [location]? [Brand] offers [USP] with [benefit].
Book your consultation today. ✓ [Feature 1] ✓ [Feature 2] ✓ [Feature 3]
```
#### H1 Optimization
```
Primary: [Service] in [Location]
Alternative: [Location] [Service] - [Brand Modifier]
Korean: [지역] [서비스] 전문 [브랜드]
```
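The title and meta description formulas above can be sketched as plain formatting helpers (names and sample values here are illustrative, not part of the generator script):

```python
def build_title(service, location, modifier, brand):
    # [Service] in [Location] | [Unique Modifier] | [Brand]
    return f"{service} in {location} | {modifier} | {brand}"

def build_meta_description(service, location, brand, usp, features):
    # Looking for [service] in [location]? ... ✓ [Feature 1] ✓ [Feature 2] ...
    feats = " ".join(f"✓ {f}" for f in features[:3])
    return (f"Looking for {service.lower()} in {location}? {brand} offers {usp}. "
            f"Book your consultation today. {feats}")

title = build_title("Laser Hair Removal", "Gangnam",
                    "Same-Day Appointments", "Jamie Clinic")
desc = build_meta_description("Laser Hair Removal", "Gangnam", "Jamie Clinic",
                              "same-day appointments",
                              ["Experienced team", "Latest technology"])
print(title)
# Laser Hair Removal in Gangnam | Same-Day Appointments | Jamie Clinic
```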
### 4. Local SEO Elements
#### Schema Markup Requirements
```json
{
"@context": "https://schema.org",
"@type": "MedicalBusiness",
"name": "Clinic Name",
"address": {
"@type": "PostalAddress",
"streetAddress": "",
"addressLocality": "",
"addressRegion": "",
"postalCode": ""
},
"geo": {
"@type": "GeoCoordinates",
"latitude": "",
"longitude": ""
},
"areaServed": {
"@type": "City",
"name": "Location Name"
},
"medicalSpecialty": "Service Type",
"availableService": {
"@type": "MedicalProcedure",
"name": "Service Name",
"description": "Service Description"
}
}
```
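The required markup can be assembled programmatically and embedded in the page; a minimal sketch (fields and sample values are assumptions, mirroring the JSON above):

```python
import json

def build_business_schema(name, street, city, lat, lng, specialty):
    """Assemble the MedicalBusiness JSON-LD shown above (minimal sketch)."""
    return {
        "@context": "https://schema.org",
        "@type": "MedicalBusiness",
        "name": name,
        "address": {"@type": "PostalAddress",
                    "streetAddress": street, "addressLocality": city},
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lng},
        "areaServed": {"@type": "City", "name": city},
        "medicalSpecialty": specialty,
    }

schema = build_business_schema("Jamie Clinic - Gangnam", "123 Teheran-ro",
                               "Gangnam", 37.4979, 127.0276, "Dermatology")
# Embed as a script tag in the generated page
markup = (f'<script type="application/ld+json">'
          f'{json.dumps(schema, ensure_ascii=False)}</script>')
```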
### 5. Content Uniqueness Strategy
#### Localization Techniques
1. **Local landmarks**: "Just 5 minutes from [Landmark]"
2. **Transportation**: "Accessible via [Subway Line] at [Station]"
3. **Local statistics**: "Serving [X] residents in [Area] since [Year]"
4. **Community involvement**: "Proud partner of [Local Organization]"
5. **Regional preferences**: "Tailored to [Location] residents' needs"
#### Content Variation Patterns
```python
variations = {
"intro_patterns": [
"Discover professional [service] in [location]",
"[Location] residents trust us for [service]",
"Your local [service] experts in [location]",
"Premium [service] now available in [location]"
],
"cta_patterns": [
"Book your [location] appointment today",
"Schedule a consultation at our [location] clinic",
"Visit us in [location] for [service]",
"Get started with [service] in [location]"
]
}
```
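To keep sibling pages from sharing identical copy, variations can be chosen deterministically per page rather than at random. A sketch using a hash of the (location, service) pair, so each page always renders the same pattern across rebuilds:

```python
import hashlib

variations = {
    "intro_patterns": [
        "Discover professional {service} in {location}",
        "{location} residents trust us for {service}",
        "Your local {service} experts in {location}",
        "Premium {service} now available in {location}",
    ],
    "cta_patterns": [
        "Book your {location} appointment today",
        "Schedule a consultation at our {location} clinic",
        "Visit us in {location} for {service}",
        "Get started with {service} in {location}",
    ],
}

def pick_variation(kind, location, service):
    """Deterministically pick a pattern so a page's copy is stable across builds."""
    patterns = variations[kind]
    digest = hashlib.md5(f"{kind}:{location}:{service}".encode()).hexdigest()
    idx = int(digest, 16) % len(patterns)
    return patterns[idx].format(location=location, service=service)

intro = pick_variation("intro_patterns", "Gangnam", "laser hair removal")
```

Because the index is derived from the page's own identity, adjacent pages in the matrix naturally rotate through different intros and CTAs without any stored state.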
### 6. Content Quality Checklist
Before publishing any gateway page, verify:
- [ ] **Keyword optimization**: Target keyword appears in title, H1, first 100 words
- [ ] **Content length**: Minimum 800 words of unique content
- [ ] **Local signals**: At least 5 location mentions naturally integrated
- [ ] **Structured data**: Schema markup properly implemented
- [ ] **Internal linking**: Links to main service page and location page
- [ ] **Images**: Alt text includes location + service keywords
- [ ] **Mobile optimization**: Content readable on mobile devices
- [ ] **Load speed**: Page loads under 3 seconds
- [ ] **CTAs**: Clear calls-to-action above and below fold
- [ ] **Trust signals**: Reviews, certifications, testimonials included
### 7. Scaling Framework
#### Batch Generation Process
1. Create master template with variable placeholders
2. Define location and service matrices
3. Generate unique content blocks for each combination
4. Review and customize top 20% traffic potential pages
5. Implement progressive enhancement based on performance
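Steps 1-3 above can be sketched as a template fill over the location × service matrix. The master template, locations, and brand below are illustrative placeholders:

```python
MASTER_TEMPLATE = (
    "# {service} in {location} | {brand}\n\n"
    "Welcome to {brand}, your trusted provider for {service} "
    "in {location}. Just minutes from {landmark}."
)

locations = [
    {"name": "Gangnam", "landmark": "COEX"},
    {"name": "Sinsa", "landmark": "Garosu-gil"},
]
services = ["Laser Hair Removal", "Botox"]

def generate_pages(brand="Jamie Clinic"):
    """Yield one rendered page per (location, service) combination."""
    for loc in locations:
        for service in services:
            yield {
                "slug": f"/{loc['name'].lower()}/{service.lower().replace(' ', '-')}/",
                "body": MASTER_TEMPLATE.format(
                    service=service, location=loc["name"],
                    landmark=loc["landmark"], brand=brand),
            }

pages = list(generate_pages())  # 2 locations x 2 services = 4 pages
```

The generated drafts are the starting point for step 4: sort them by traffic potential and hand-polish the top 20% before publishing.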
#### Priority Matrix
```
High Priority (Manual Optimization):
- High search volume + High commercial intent
- Major city centers + Premium services
- Competitive keywords requiring unique angle
Medium Priority (Template + Customization):
- Moderate search volume + Standard services
- Secondary locations + Common procedures
Low Priority (Automated Generation):
- Long-tail keywords + Suburban areas
- Informational intent + Low competition
```
### 8. Performance Tracking
#### KPIs to Monitor
```yaml
metrics:
organic_traffic:
- Pageviews from organic search
- Unique visitors by location
- Average session duration
conversions:
- Form submissions by page
- Phone calls tracked
- Online bookings completed
engagement:
- Bounce rate below 40%
- Pages per session above 2.0
- Scroll depth above 75%
rankings:
- Position tracking for target keywords
- Local pack appearances
- Featured snippet captures
```
## Implementation Instructions
### Step 1: Keyword Research
```python
# Generate keyword combinations
locations = ["gangnam", "sinsa", "apgujeong"]
services = ["laser_hair_removal", "botox", "filler"]

keywords = []
for location in locations:
    for service in services:
        # Service slugs use underscores; convert to spaces for keyword text
        service_name = service.replace("_", " ")
        keywords.append({
            "primary": f"{service_name} {location}",
            "secondary": f"{location} {service_name} clinic",
            "long_tail": f"best {service_name} clinic in {location}"
        })
```
### Step 2: Content Creation
1. Use the template structure above
2. Fill in variables for location and service
3. Add unique local content (minimum 30% unique per page)
4. Include relevant images with local landmarks
5. Add schema markup and meta tags
### Step 3: Technical Implementation
1. Create URL structure: `/location/service/`
2. Implement breadcrumbs with proper schema
3. Add internal linking to related pages
4. Set up canonical tags to avoid duplication
5. Create XML sitemap for gateway pages
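Step 5 (the gateway-page sitemap) can be generated directly from the page slugs. A minimal sketch; the base URL is a placeholder:

```python
def build_sitemap(base_url, pages):
    """Render a minimal XML sitemap for the gateway pages."""
    entries = "\n".join(
        f"  <url><loc>{base_url}{p}</loc></url>" for p in pages
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

xml = build_sitemap("https://example.com",
                    ["/gangnam/laser-hair-removal/", "/sinsa/botox/"])
```

Regenerating the sitemap in the same batch job that renders the pages ensures the `/location/service/` URL structure and the sitemap never disagree.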
### Step 4: Quality Assurance
- Run content through plagiarism checker
- Verify all technical SEO elements
- Test page speed and mobile responsiveness
- Review content for local relevance
- Check all CTAs and contact information
## Advanced Techniques
### Dynamic Content Insertion
```javascript
// Example of dynamic content based on user location.
// getUserLocation() and findNearestClinic() are placeholder helpers:
// implement them with your own geolocation lookup and clinic directory.
const userLocation = getUserLocation();
const nearestClinic = findNearestClinic(userLocation);

// Update hero and distance copy once the location is known
document.querySelector('.hero-location').textContent =
  `Serving ${userLocation.district} and surrounding areas`;
document.querySelector('.distance-info').textContent =
  `Only ${nearestClinic.distance} from your location`;
```
### A/B Testing Framework
```yaml
test_variations:
headlines:
- control: "[Service] in [Location]"
- variant_a: "#1 [Service] Provider in [Location]"
- variant_b: "[Location]'s Trusted [Service] Clinic"
cta_buttons:
- control: "Book Now"
- variant_a: "Get Free Consultation"
- variant_b: "Check Availability"
```
### Content Refresh Strategy
- Monthly: Update testimonials and reviews
- Quarterly: Refresh statistics and data points
- Semi-annually: Add new FAQs based on search queries
- Annually: Complete content audit and refresh
## Prompts for Content Generation
### Initial Content Brief
```
Create gateway page content for [SERVICE] in [LOCATION]:
- Target keyword: [PRIMARY KEYWORD]
- Secondary keywords: [LIST]
- Local landmarks: [LIST]
- Unique selling points: [LIST]
- Competitor differentiation: [POINTS]
```
### Content Expansion
```
Expand the following gateway page section:
Current content: [PASTE]
Add: Local statistics, transportation info, 2 testimonials
Maintain: Professional tone, keyword density 2-3%
Length: 200-300 words
```
### FAQ Generation
```
Generate 8 FAQs for [SERVICE] in [LOCATION]:
- 3 service-specific questions
- 2 location/accessibility questions
- 2 pricing/insurance questions
- 1 preparation/aftercare question
Include question schema markup format
```
## Resources and Tools
### Recommended Tools
- **Keyword Research**: Ahrefs, SEMrush, Google Keyword Planner
- **Content Optimization**: Surfer SEO, Clearscope, MarketMuse
- **Schema Generation**: Schema.org, Google Rich Results Test
- **Performance Tracking**: Google Analytics, Search Console
- **A/B Testing**: Optimizely, VWO (Google Optimize was sunset in 2023)
### Templates Directory
- `templates/gateway-page-medical.md`
- `templates/gateway-page-beauty.md`
- `templates/gateway-page-dental.md`
- `templates/schema-medical-business.json`
- `templates/meta-tags-local.html`
## Version History
### v1.0.0 (Current)
- Initial framework for gateway page content generation
- Medical and beauty service focus
- Korean market optimization
- Local SEO best practices
- Content scaling methodology
---
*This skill is optimized for Korean medical and beauty service markets but can be adapted for any local service business requiring location-based gateway pages.*


@@ -0,0 +1,282 @@
# Laser Hair Removal in Gangnam | Expert Care | Jamie Clinic
<!-- Meta Tags -->
<!--
Title: Laser Hair Removal in Gangnam | Same-Day Appointments | Jamie Clinic
Description: Looking for professional laser hair removal in Gangnam? Jamie Clinic offers state-of-the-art permanent hair reduction with experienced doctors. ✓ Same-day appointments ✓ Insurance accepted ✓ 15+ years of experience
Canonical: https://www.jamieclinic.com/gangnam/laser-hair-removal/
-->
## Professional Laser Hair Removal Available in Gangnam
Welcome to Jamie Clinic, your trusted provider for laser hair removal in Gangnam. Our medical team brings over 15 years of combined experience, utilizing the latest medical technology to ensure optimal results for our patients in the Gangnam area.
### Why Choose Jamie Clinic for Laser Hair Removal in Gangnam?
Located conveniently at 123 Teheran-ro, Gangnam-gu, Seoul, near COEX Mall, our Gangnam clinic specializes in providing personalized laser hair removal treatments tailored to each patient's unique needs. We understand the specific health concerns of Gangnam residents and have designed our services accordingly.
**Our Gangnam Advantages:**
- 🏥 Modern facility equipped with latest diode and Alexandrite laser equipment
- 👨‍⚕️ Board-certified specialists at an ISO 9001 and KAHF certified facility
- 📍 Easy access from Gangnam Station (Line 2), Sinnonhyeon Station (Line 9)
- 🕐 Extended hours to accommodate busy Gangnam professionals
- 💳 Accept major insurance plans popular in Gangnam
## Understanding Laser Hair Removal
### What Is Laser Hair Removal?
Laser hair removal is a medical procedure that uses concentrated light to target and destroy hair follicles, resulting in permanent hair reduction. Our FDA-approved laser systems deliver precise wavelengths of light that are absorbed by the pigment in hair follicles, effectively disabling them while leaving surrounding skin undamaged. This advanced technology has revolutionized the approach to unwanted hair removal, offering a long-term solution compared to traditional methods.
### Who Can Benefit from Laser Hair Removal?
Our laser hair removal treatment in Gangnam is ideal for patients experiencing:
- Excessive unwanted hair growth requiring frequent shaving or waxing
- Ingrown hairs and razor burn from traditional hair removal methods
- Time constraints making regular hair removal maintenance challenging
- Skin sensitivity or allergic reactions to depilatory creams and waxing
## Our Laser Hair Removal Process in Gangnam
### 1. Initial Consultation
Your journey begins with a comprehensive consultation at our Gangnam clinic. Our specialists will:
- Review your medical history and skin type
- Conduct a patch test to determine optimal laser settings
- Discuss your hair removal goals and expectations
- Create a personalized treatment plan with session scheduling
### 2. Treatment Planning
Based on your consultation, we develop a customized approach that considers:
- Your specific hair type and skin tone
- Lifestyle factors common to young professionals and affluent residents
- Insurance coverage options for medical necessity cases
- Optimal scheduling for your convenience around work commitments
### 3. Treatment Sessions
Each laser hair removal session at our Gangnam facility typically involves:
- Pre-treatment skin preparation and cooling
- The procedure itself (approximately 15-60 minutes depending on area)
- Post-treatment soothing gel application
- Detailed aftercare instructions for best results
### 4. Follow-up Care
We provide comprehensive follow-up support including:
- Scheduled check-ups to monitor progress
- 24/7 emergency hotline for any concerns
- Treatment adjustments based on your results
- Long-term maintenance planning
## Expected Results and Recovery
### What to Expect After Laser Hair Removal
Patients at our Gangnam clinic typically experience:
- **Immediate effects**: Slight redness and warmth in treated areas
- **Short-term (1-2 weeks)**: Hair shedding from treated follicles
- **Long-term (1-3 months)**: Noticeable reduction in hair growth
- **Final results**: 70-90% permanent hair reduction after complete treatment series
### Recovery Timeline
- Day 1-3: Mild redness and sensitivity, similar to mild sunburn
- Week 1: Hair begins to shed naturally from treated follicles
- Week 2-4: Skin returns to normal, new hair growth visibly reduced
- Month 2-3: Ready for next treatment session
## Safety and Credentials
### Our Medical Standards
Jamie Clinic in Gangnam maintains the highest medical standards:
- ✓ ISO 9001 Quality Management Certification
- ✓ KAHF (Korean Association of Hair and Face) Certified
- ✓ JCI Accredited facility
- ✓ Member of Korean Dermatological Association
### Our Medical Team
**Dr. Kim Jung-ho, MD**
- Seoul National University Medical School
- Dermatology Residency at Samsung Medical Center
- 10+ years specializing in laser hair removal
- Published research on laser technology in Asian skin types
**Dr. Park Mi-young, MD**
- Yonsei University College of Medicine
- Fellowship in Cosmetic Dermatology
- 8+ years of laser treatment experience
- Specialized training in sensitive skin treatments
## Pricing and Insurance
### Insurance Coverage
We accept most major insurance plans used by Gangnam residents:
- Samsung Fire & Marine Insurance
- KB Insurance
- Hyundai Marine & Fire Insurance
- DB Insurance
### Payment Options
For your convenience, we offer:
- Insurance direct billing for eligible treatments
- Flexible payment plans with 0% interest
- All major credit cards accepted
- HSA/FSA acceptance for medical procedures
### Transparent Pricing
Contact us for a detailed quote. Factors affecting cost include:
- Size and number of treatment areas
- Number of sessions required (typically 6-8)
- Your specific insurance coverage level
- Package deals for multiple areas
**Pricing Guide:**
- Small areas (upper lip, chin): 50,000-100,000 KRW per session
- Medium areas (underarms, bikini): 100,000-200,000 KRW per session
- Large areas (full legs, back): 200,000-400,000 KRW per session
## Patient Testimonials from Gangnam
> "After struggling with ingrown hairs for years, I finally found relief at Jamie Clinic. The team was professional, and the results exceeded my expectations. Being so close to my office in COEX made it incredibly convenient."
> — J.K., Gangnam resident
> "The convenience of having such high-quality laser hair removal right here in Gangnam made all the difference. I no longer have to travel to other districts for treatment. The results have been life-changing!"
> — S.L., Samsung Station area
> "Dr. Kim took the time to explain everything thoroughly. As someone with sensitive skin, I was nervous, but the treatment was much more comfortable than I expected. Highly recommend to anyone in Gangnam!"
> — M.C., Gangnam professional
## Frequently Asked Questions
### General Questions
**Q: How do I know if laser hair removal is right for me?**
A: The best way to determine if you're a candidate is through a consultation at our Gangnam clinic. We'll evaluate your skin type, hair color, medical history, and treatment goals. Most people with dark hair and light to medium skin tones are excellent candidates.
**Q: How long does laser hair removal take?**
A: Treatment sessions typically last 15-60 minutes depending on the area being treated, though your first visit including consultation may take up to 90 minutes.
**Q: Is laser hair removal painful?**
A: Most patients describe the sensation as similar to a rubber band snapping against the skin. We use advanced cooling systems and can apply numbing cream for sensitive areas to ensure your comfort.
**Q: How many sessions will I need?**
A: Most patients require 6-8 sessions spaced 4-6 weeks apart for optimal results. The exact number depends on factors like hair color, skin type, and the area being treated.
### Location-Specific Questions
**Q: Where exactly is your Gangnam clinic located?**
A: We're located at 123 Teheran-ro, Gangnam-gu, Seoul, just a 3-minute walk from Gangnam Station Exit 11. We're in the medical building next to COEX Mall. Free underground parking is available for all patients.
**Q: Do you have parking available?**
A: Yes, we offer free underground parking for all patients. The parking entrance is on the left side of the building. Simply validate your parking ticket at reception.
**Q: What are your hours for the Gangnam clinic?**
A:
- Monday-Friday: 9:00 AM - 8:00 PM
- Saturday: 9:00 AM - 6:00 PM
- Sunday: 10:00 AM - 4:00 PM
**Q: How do I get there by public transportation?**
A: Take Line 2 to Gangnam Station and use Exit 11. Walk straight for about 3 minutes. We're in the tall glass medical building on your right. Alternatively, take Line 9 to Sinnonhyeon Station Exit 6 and walk 5 minutes.
### Insurance and Payment
**Q: Does insurance cover laser hair removal?**
A: Coverage varies by plan. Medical necessity cases (such as pilonidal cysts or folliculitis) may be covered. Our insurance specialists can verify your benefits before your appointment.
**Q: Do you offer payment plans?**
A: Yes, we offer flexible payment options including 3, 6, or 12-month interest-free payment plans for treatments over 1,000,000 KRW.
**Q: Can I use my FSA or HSA?**
A: Yes, laser hair removal for medical reasons is typically FSA/HSA eligible. We can provide the necessary documentation for reimbursement.
## Schedule Your Laser Hair Removal Consultation in Gangnam
Ready to say goodbye to unwanted hair forever? Contact our Gangnam clinic today:
### Contact Information
📍 **Address**: 123 Teheran-ro, Gangnam-gu, Seoul 06234
📞 **Phone**: +82-2-1234-5678
📧 **Email**: gangnam@jamieclinic.com
🌐 **Online Booking**: https://www.jamieclinic.com/book-gangnam
### Office Hours
- **Monday-Friday**: 9:00 AM - 8:00 PM
- **Saturday**: 9:00 AM - 6:00 PM
- **Sunday**: 10:00 AM - 4:00 PM
- **Emergency**: 24/7 hotline available at +82-2-1234-9999
### Getting Here
**By Subway**:
- Line 2: Gangnam Station Exit 11 (3-minute walk)
- Line 9: Sinnonhyeon Station Exit 6 (5-minute walk)
- Sinbundang Line: Gangnam Station Exit 11 (3-minute walk)
**By Bus**: Routes 146, 341, 360, 740 all stop within 2 minutes of our clinic
**By Car**: Navigate to 123 Teheran-ro, Gangnam-gu. Underground parking entrance on the left side of the building. Free parking for all patients.
### Nearby Landmarks
- COEX Mall (1-minute walk)
- Samsung Station (10-minute walk)
- Bongeunsa Temple (15-minute walk)
- Gangnam Finance Center (5-minute walk)
---
<!-- Schema Markup -->
<script type='application/ld+json'>
{
"@context": "https://schema.org",
"@type": "MedicalBusiness",
"name": "Jamie Clinic - Gangnam",
"url": "https://www.jamieclinic.com",
"telephone": "+82-2-1234-5678",
"email": "info@jamieclinic.com",
"address": {
"@type": "PostalAddress",
"streetAddress": "123 Teheran-ro, Gangnam-gu, Seoul",
"addressLocality": "Gangnam",
"addressCountry": "KR"
},
"geo": {
"@type": "GeoCoordinates",
"latitude": 37.4979,
"longitude": 127.0276
},
"areaServed": {
"@type": "City",
"name": "Gangnam"
},
"availableService": {
"@type": "MedicalProcedure",
"name": "Laser Hair Removal",
"description": "Permanent hair reduction using state-of-the-art laser technology"
},
"priceRange": "₩₩₩",
"openingHoursSpecification": [
{
"@type": "OpeningHoursSpecification",
"dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
"opens": "09:00",
"closes": "20:00"
},
{
"@type": "OpeningHoursSpecification",
"dayOfWeek": "Saturday",
"opens": "09:00",
"closes": "18:00"
},
{
"@type": "OpeningHoursSpecification",
"dayOfWeek": "Sunday",
"opens": "10:00",
"closes": "16:00"
}
],
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.8",
"reviewCount": "324"
}
}
</script>
*Last updated: November 19, 2024 | Jamie Clinic - Professional Laser Hair Removal in Gangnam*


@@ -0,0 +1,231 @@
# [Medical Service] in [Location] | [Clinic Name]
<!-- Meta Tags -->
<!--
Title: [Medical Service] in [Location] | Expert Care | [Clinic Name]
Description: Looking for professional [medical service] in [location]? [Clinic Name] offers state-of-the-art [service] with experienced doctors. ✓ Same-day appointments ✓ Insurance accepted ✓ [Unique benefit]
Canonical: https://example.com/[location]/[service-slug]/
-->
## Professional [Medical Service] Available in [Location]
Welcome to [Clinic Name], your trusted provider for [medical service] in [location]. Our medical team brings over [X years] of combined experience, utilizing the latest medical technology to ensure optimal results for our patients in the [location] area.
### Why Choose [Clinic Name] for [Medical Service] in [Location]?
Located conveniently at [specific address near landmark], our [location] clinic specializes in providing personalized [medical service] treatments tailored to each patient's unique needs. We understand the specific health concerns of [location] residents and have designed our services accordingly.
**Our [Location] Advantages:**
- 🏥 Modern facility equipped with latest [equipment type]
- 👨‍⚕️ Board-certified specialists with [certification details]
- 📍 Easy access from [nearby subway/bus stations]
- 🕐 Extended hours to accommodate busy [location] professionals
- 💳 Accept major insurance plans popular in [location]
## Understanding [Medical Service]
### What Is [Medical Service]?
[Detailed medical explanation of the service, including scientific background, FDA approvals if applicable, and medical benefits. This section should be educational while remaining accessible.]
### Who Can Benefit from [Medical Service]?
Our [medical service] treatment in [location] is ideal for patients experiencing:
- [Condition 1 with brief explanation]
- [Condition 2 with brief explanation]
- [Condition 3 with brief explanation]
- [Condition 4 with brief explanation]
## Our [Medical Service] Process in [Location]
### 1. Initial Consultation
Your journey begins with a comprehensive consultation at our [location] clinic. Our specialists will:
- Review your medical history
- Conduct necessary diagnostic tests
- Discuss your treatment goals
- Create a personalized treatment plan
### 2. Treatment Planning
Based on your consultation, we develop a customized approach that considers:
- Your specific medical condition
- Lifestyle factors common to [location] residents
- Insurance coverage options
- Optimal scheduling for your convenience
### 3. Treatment Sessions
Each [medical service] session at our [location] facility typically involves:
- Pre-treatment preparation
- The procedure itself (approximately [duration])
- Post-treatment monitoring
- Detailed aftercare instructions
### 4. Follow-up Care
We provide comprehensive follow-up support including:
- Scheduled check-ups
- 24/7 emergency hotline
- Ongoing treatment adjustments
- Long-term health monitoring
## Expected Results and Recovery
### What to Expect After [Medical Service]
Patients at our [location] clinic typically experience:
- **Immediate effects**: [Description]
- **Short-term (1-2 weeks)**: [Description]
- **Long-term (1-3 months)**: [Description]
- **Final results**: [Timeline and description]
### Recovery Timeline
- Day 1-3: [Recovery details]
- Week 1: [Recovery details]
- Week 2-4: [Recovery details]
- Month 2-3: [Recovery details]
## Safety and Credentials
### Our Medical Standards
[Clinic Name] in [location] maintains the highest medical standards:
- ✓ [Relevant medical certification]
- ✓ [Hospital affiliation if applicable]
- ✓ [Safety protocol certification]
- ✓ [Professional membership]
### Our Medical Team
**Dr. [Name], MD**
- [Medical school]
- [Residency/Fellowship]
- [Years of experience] specializing in [medical service]
- [Special recognition or research]
## Pricing and Insurance
### Insurance Coverage
We accept most major insurance plans used by [location] residents:
- [Insurance provider 1]
- [Insurance provider 2]
- [Insurance provider 3]
- [Insurance provider 4]
### Payment Options
For your convenience, we offer:
- Insurance direct billing
- Flexible payment plans
- Credit card payments
- HSA/FSA acceptance
### Transparent Pricing
Contact us for a detailed quote. Factors affecting cost include:
- Severity of condition
- Number of sessions required
- Insurance coverage level
- Additional treatments needed
## Patient Testimonials from [Location]
> "After struggling with [condition] for years, I finally found relief at [Clinic Name]. The team was professional, and the results exceeded my expectations."
> — [Patient initials], [Location] resident
> "The convenience of having such high-quality [medical service] right here in [location] made all the difference. I no longer have to travel to [other area] for treatment."
> — [Patient initials], [Nearby neighborhood]
> "Dr. [Name] took the time to explain everything thoroughly. I felt confident throughout the entire process."
> — [Patient initials], [Location] professional
## Frequently Asked Questions
### General Questions
**Q: How do I know if [medical service] is right for me?**
A: The best way to determine if you're a candidate is through a consultation at our [location] clinic. We'll evaluate your medical history, current condition, and treatment goals.
**Q: How long does [medical service] take?**
A: Treatment sessions typically last [duration], though your first visit including consultation may take [longer duration].
**Q: Is [medical service] painful?**
A: [Comfort level explanation with pain management options available]
### Location-Specific Questions
**Q: Where exactly is your [location] clinic located?**
A: We're located at [full address], just [distance] from [landmark/station]. [Parking/public transport information].
**Q: Do you have parking available?**
A: Yes, we offer [parking details specific to location].
**Q: What are your hours for the [location] clinic?**
A:
- Monday-Friday: [hours]
- Saturday: [hours]
- Sunday: [hours/closed]
### Insurance and Payment
**Q: Does insurance cover [medical service]?**
A: Coverage varies by plan. Our insurance specialists can verify your benefits before your appointment.
**Q: Do you offer payment plans?**
A: Yes, we offer flexible payment options including [specific plans available].
## Schedule Your [Medical Service] Consultation in [Location]
Ready to take the first step? Contact our [location] clinic today:
### Contact Information
📍 **Address**: [Full address]
📞 **Phone**: [Local phone number]
📧 **Email**: [location]@[clinicname].com
🌐 **Online Booking**: [URL]
### Office Hours
- **Monday-Friday**: [Hours]
- **Saturday**: [Hours]
- **Sunday**: [Hours/Closed]
- **Emergency**: [24/7 hotline if available]
### Getting Here
**By Subway**: [Detailed directions from nearest station]
**By Bus**: [Bus routes and stops]
**By Car**: [Driving directions and parking info]
---
<!-- Schema Markup -->
```json
{
"@context": "https://schema.org",
"@type": "MedicalClinic",
"name": "[Clinic Name] - [Location]",
"image": "[clinic-image-url]",
"@id": "[page-url]",
"url": "[website-url]",
"telephone": "[phone-number]",
"address": {
"@type": "PostalAddress",
"streetAddress": "[street]",
"addressLocality": "[city]",
"addressRegion": "[state/province]",
"postalCode": "[zip]",
"addressCountry": "KR"
},
"geo": {
"@type": "GeoCoordinates",
"latitude": [latitude],
"longitude": [longitude]
},
"openingHoursSpecification": {
"@type": "OpeningHoursSpecification",
"dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
"opens": "09:00",
"closes": "18:00"
},
"medicalSpecialty": "[Medical Specialty]",
"availableService": {
"@type": "MedicalProcedure",
"name": "[Medical Service]",
"description": "[Service Description]"
}
}
```
*Last updated: [Date] | [Clinic Name] - Professional [Medical Service] in [Location]*


@@ -0,0 +1,90 @@
# OurDigital GTM Audit
Lightweight Google Tag Manager audit toolkit powered by Playwright.
> **Note**: For comprehensive GTM management including dataLayer tag generation, see [14-ourdigital-gtm-manager](../14-ourdigital-gtm-manager/).
## Features
- **Container Detection**: Verify GTM installation, position, and duplicates
- **DataLayer Validation**: Event structure, types, sequence checking
- **Form Tracking**: Form discovery, field analysis, event verification
- **E-commerce Checkout**: Full funnel flow simulation and validation
- **Multi-Platform**: GA4, Meta Pixel, LinkedIn, Google Ads, Kakao, Naver
## Installation
```bash
# Clone or download
cd gtm-audit-claude-code
# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium
```
## Usage
```bash
# Full audit
python gtm_audit.py --url "https://yoursite.com" --journey full
# Specific container validation
python gtm_audit.py --url "https://yoursite.com" --container "GTM-XXXXXX"
# Form tracking only
python gtm_audit.py --url "https://yoursite.com/contact" --journey form
# E-commerce checkout
python gtm_audit.py --url "https://yoursite.com/cart" --journey checkout
```
## Options
| Option | Description | Default |
|--------|-------------|---------|
| `--url` | Target URL (required) | - |
| `--container` | Expected GTM container ID | None |
| `--journey` | Audit type: pageview, scroll, form, checkout, datalayer, full | full |
| `--output` | Output file path | gtm_audit_report.json |
| `--timeout` | Page load timeout (ms) | 30000 |
| `--headless` | Run browser headless | True |
## Output
Generates JSON report with:
- Container status
- DataLayer events and validation issues
- Form analysis and tracking readiness
- Checkout flow analysis
- Network requests by destination
- Recommendations and checklist
## Using with Claude Code
This project includes a `CLAUDE.md` file optimized for use with Claude Code.
```bash
# In your terminal
claude
# Then ask Claude to run audits
> Run a GTM audit on https://example.com
> Check the form tracking on https://example.com/contact
> Analyze the checkout flow issues in the latest report
```
## Documentation
See `docs/` folder for:
- GA4 event specifications
- E-commerce dataLayer schemas
- Form tracking patterns
- Checkout flow sequences
- Common issues and fixes
## License
MIT


@@ -0,0 +1,109 @@
# OurDigital GTM Audit
Lightweight Google Tag Manager audit toolkit using Playwright browser automation.
> For comprehensive GTM management including dataLayer tag generation, see `14-ourdigital-gtm-manager`.
## Project Overview
This tool audits GTM container installations, validates dataLayer events, tests form tracking, simulates e-commerce checkout flows, and generates comprehensive reports.
## Quick Commands
```bash
# Install dependencies
pip install playwright
playwright install chromium
# Run full audit
python gtm_audit.py --url "https://example.com" --journey full
# Form tracking audit
python gtm_audit.py --url "https://example.com/contact" --journey form
# E-commerce checkout flow
python gtm_audit.py --url "https://example.com/cart" --journey checkout
# DataLayer deep inspection
python gtm_audit.py --url "https://example.com" --journey datalayer
# With specific container validation
python gtm_audit.py --url "https://example.com" --container "GTM-XXXXXX"
```
## Journey Types
| Journey | Description |
|---------|-------------|
| `pageview` | Basic page load + scroll simulation |
| `scroll` | Scroll depth trigger testing (25%, 50%, 75%, 90%) |
| `form` | Form discovery, field analysis, interaction simulation |
| `checkout` | E-commerce flow: cart → checkout → shipping → payment → purchase |
| `datalayer` | Deep dataLayer validation and event sequence analysis |
| `full` | All of the above combined |
## Output
Generates `gtm_audit_report.json` with:
- Container status (installed, position, duplicates)
- DataLayer analysis (events, validation issues, sequence errors)
- Form analysis (forms found, tracking readiness, missing events)
- Checkout analysis (elements detected, flow issues)
- Network requests (GA4, Meta, LinkedIn, etc.)
- Recommendations and checklist
## Key Files
- `gtm_audit.py` - Main audit script
- `docs/ga4_events.md` - GA4 event specifications
- `docs/ecommerce_schema.md` - E-commerce dataLayer structures
- `docs/form_tracking.md` - Form event patterns
- `docs/checkout_flow.md` - Checkout funnel sequence
- `docs/datalayer_validation.md` - Validation rules
- `docs/common_issues.md` - Frequent problems and fixes
## Coding Guidelines
When modifying this tool:
1. **Tag Destinations**: Add new platforms to `TAG_DESTINATIONS` dict
2. **Event Validation**: Add requirements to `GA4_EVENT_REQUIREMENTS` dict
3. **Form Selectors**: Extend `FormAnalyzer.discover_forms()` for custom forms
4. **Checkout Elements**: Add selectors to `CheckoutFlowAnalyzer.detect_checkout_elements()`
## Korean Market Considerations
- Support Korean payment methods (카카오페이, 네이버페이, 토스)
- Handle KRW currency (no decimals)
- Include Kakao Pixel and Naver Analytics patterns
- Korean button text patterns (장바구니, 결제하기, 주문하기)
## Testing a New Site
1. Run with `--journey full` first to get complete picture
2. Check `gtm_audit_report.json` for issues
3. Focus on specific areas with targeted journey types
4. Use `--container GTM-XXXXXX` to validate specific container
## Common Tasks
### Add support for new tag platform
```python
# In TAG_DESTINATIONS dict
"NewPlatform": [
r"tracking\.newplatform\.com",
r"pixel\.newplatform\.com",
],
```
### Add custom form field detection
```python
# In FormAnalyzer.discover_forms()
# Add new field types or selectors
```
### Extend checkout flow for specific platform
```python
# In CheckoutFlowAnalyzer.detect_checkout_elements()
# Add platform-specific selectors
```


@@ -0,0 +1,237 @@
# E-commerce Checkout Flow Reference
## Complete Checkout Event Sequence
```
view_cart → begin_checkout → add_shipping_info → add_payment_info → purchase
```
Each step must fire in order with consistent item data.
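The ordering rule can be checked mechanically from the list of dataLayer event names a page fired. A Python sketch of such a validator (an illustration of the idea, not the actual `gtm_audit.py` implementation):

```python
CHECKOUT_SEQUENCE = [
    "view_cart", "begin_checkout",
    "add_shipping_info", "add_payment_info", "purchase",
]

def check_sequence(events):
    """Return a list of ordering issues for the checkout events seen on a page.

    `events` is the list of dataLayer event names in the order they fired.
    Missing steps are reported; out-of-order steps are flagged.
    """
    issues = []
    for step in CHECKOUT_SEQUENCE:
        if step not in events:
            issues.append(f"missing: {step}")
    positions = [CHECKOUT_SEQUENCE.index(e) for e in events
                 if e in CHECKOUT_SEQUENCE]
    if positions != sorted(positions):
        issues.append("out of order: checkout events fired out of sequence")
    return issues

# check_sequence(["view_cart", "purchase"]) reports the three missing middle steps
```

A fuller check would also compare the `items` arrays across steps, since the sequence is only valid when item data stays consistent from `view_cart` through `purchase`.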
## Event Details
### 1. view_cart
When user views cart page.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_cart",
ecommerce: {
currency: "KRW",
value: 125000,
items: [{
item_id: "SKU_001",
item_name: "Blue T-Shirt",
price: 45000,
quantity: 2,
item_brand: "Brand",
item_category: "Apparel"
}, {
item_id: "SKU_002",
item_name: "Black Jeans",
price: 35000,
quantity: 1
}]
}
});
```
### 2. begin_checkout
When user initiates checkout process.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "begin_checkout",
ecommerce: {
currency: "KRW",
value: 125000,
coupon: "SUMMER10",
items: [/* same items as view_cart */]
}
});
```
### 3. add_shipping_info
When user completes shipping step.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_shipping_info",
ecommerce: {
currency: "KRW",
value: 125000,
coupon: "SUMMER10",
shipping_tier: "Express", // Required
items: [/* same items */]
}
});
```
**shipping_tier values:**
- "Standard" / "일반배송"
- "Express" / "익일배송"
- "Same Day" / "당일배송"
- "Free" / "무료배송"
- "Store Pickup" / "매장픽업"
### 4. add_payment_info
When user enters payment details.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_payment_info",
ecommerce: {
currency: "KRW",
value: 125000,
coupon: "SUMMER10",
payment_type: "Credit Card", // Required
items: [/* same items */]
}
});
```
**payment_type values:**
- "Credit Card" / "신용카드"
- "Debit Card" / "체크카드"
- "Bank Transfer" / "계좌이체"
- "Virtual Account" / "가상계좌"
- "Mobile Payment" / "휴대폰결제"
- "Kakao Pay" / "카카오페이"
- "Naver Pay" / "네이버페이"
- "Toss" / "토스"
- "PayPal"
### 5. purchase
When transaction completes successfully.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "purchase",
ecommerce: {
transaction_id: "T_20250115_001234", // Required, unique
value: 130500, // Required (total)
tax: 11863,
shipping: 5000,
currency: "KRW", // Required
coupon: "SUMMER10",
items: [{
item_id: "SKU_001",
item_name: "Blue T-Shirt",
affiliation: "Online Store",
coupon: "SUMMER10",
discount: 4500,
price: 45000,
quantity: 2
}]
}
});
```
## Funnel Drop-off Analysis
### Tracking Drop-offs
Monitor completion rate at each step:
| Step | Event | Drop-off Indicator |
|------|-------|-------------------|
| Cart | view_cart | User leaves cart page |
| Checkout Start | begin_checkout | User doesn't proceed |
| Shipping | add_shipping_info | Address form abandoned |
| Payment | add_payment_info | Payment not completed |
| Complete | purchase | Transaction failed |
### Implementing Drop-off Tracking
```javascript
// Track checkout step viewed but not completed
let checkoutStep = 0;
function trackCheckoutProgress(step) {
  if (step > checkoutStep) {
    checkoutStep = step;
  }
}
// Call trackCheckoutProgress(1..5) as each funnel event fires
window.addEventListener('beforeunload', () => {
  if (checkoutStep > 0 && checkoutStep < 5) {
    dataLayer.push({
      event: 'checkout_abandon',
      last_step: checkoutStep,
      step_name: ['cart', 'checkout', 'shipping', 'payment', 'complete'][checkoutStep - 1]
    });
  }
});
```
## Value Consistency Check
Ensure `value` matches across events:
```
view_cart.value = sum(items.price * items.quantity)
begin_checkout.value = view_cart.value
add_shipping_info.value = begin_checkout.value
add_payment_info.value = add_shipping_info.value
purchase.value = add_payment_info.value + shipping + tax - discount
```
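The rules above can be sketched as a small runtime check. This is an illustrative helper (the function names and the floating-point tolerance are assumptions, not part of the audit tool):

```javascript
// Sum of items (price * quantity) for comparison against the event's value
function itemsTotal(items) {
  return items.reduce((sum, it) => sum + it.price * (it.quantity || 1), 0);
}

// True when the event's value matches its items, within a small
// tolerance for decimal currencies (KRW has no decimals)
function valueMatchesItems(ecommerce) {
  return Math.abs(ecommerce.value - itemsTotal(ecommerce.items)) < 0.005;
}
```

For `purchase`, compare against `itemsTotal(items) + shipping + tax - discount` instead of the bare item total.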
## Common Issues
### Duplicate Purchase Events
**Problem**: Same order tracked multiple times
**Solution**:
```javascript
// Check if already tracked
const txId = "T_12345";
if (!sessionStorage.getItem('purchase_' + txId)) {
dataLayer.push({ event: 'purchase', ... });
sessionStorage.setItem('purchase_' + txId, 'true');
}
```
### Missing Items in Later Steps
**Problem**: Items present in view_cart but missing in purchase
**Solution**: Store cart data in session and reuse
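One way to do this, sketched with hypothetical helper names; in the browser the `storage` argument would be `sessionStorage`:

```javascript
// Persist the cart's items once, then reuse the same array for
// begin_checkout, add_shipping_info, add_payment_info, and purchase
function saveCartItems(storage, items) {
  storage.setItem('checkout_items', JSON.stringify(items));
}

function loadCartItems(storage) {
  const raw = storage.getItem('checkout_items');
  return raw ? JSON.parse(raw) : [];
}

// Usage on the cart page (browser):
//   saveCartItems(sessionStorage, cartItems);
// Later, on the confirmation page:
//   dataLayer.push({ event: 'purchase',
//                    ecommerce: { items: loadCartItems(sessionStorage) } });
```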
### Inconsistent Currency
**Problem**: Some events use USD, others KRW
**Solution**: Standardize currency across all events
### Wrong Value Calculation
**Problem**: purchase.value doesn't include tax/shipping
**Solution**:
```
purchase.value = subtotal + tax + shipping - discount
```
## Korean E-commerce Platforms
### Cafe24
Custom dataLayer variable names - check documentation
### Shopify Korea
Standard GA4 format with `Shopify.checkout` object
### WooCommerce
Use official GA4 plugin or custom implementation
### Naver SmartStore
Separate Naver Analytics implementation required
## Checkout Flow Checklist
- [ ] view_cart fires on cart page load
- [ ] begin_checkout fires on checkout button click
- [ ] add_shipping_info includes shipping_tier
- [ ] add_payment_info includes payment_type
- [ ] purchase has unique transaction_id
- [ ] All events have consistent items array
- [ ] Currency is consistent across all events
- [ ] Value calculations are accurate
- [ ] ecommerce object cleared before each push
- [ ] Purchase event fires only once per order

View File

@@ -0,0 +1,211 @@
# Common GTM Issues & Fixes
## Container Issues
### GTM Not Firing
**Symptoms**: No GTM requests in network tab
**Causes**:
1. Script blocked by ad blocker
2. Script placed after closing body tag
3. JavaScript error before GTM loads
4. Consent management blocking GTM
**Fix**:
```html
<!-- Place immediately after opening <head> tag -->
<script>(function(w,d,s,l,i){...})(window,document,'script','dataLayer','GTM-XXXXXX');</script>
```
### Multiple Containers Conflict
**Symptoms**: Duplicate events, inconsistent data
**Causes**:
1. Legacy container not removed
2. Different teams installed separate containers
3. Theme/plugin auto-installed GTM
**Fix**:
1. Audit all containers in source
2. Consolidate to single container
3. Use GTM environments for staging/prod
### Container ID Mismatch
**Symptoms**: Tags not firing, wrong property receiving data
**Causes**:
1. Dev/staging container on production
2. Copy-paste error during installation
**Fix**: Verify container ID matches GTM account
---
## DataLayer Issues
### DataLayer Not Initialized
**Symptoms**: First push events lost
**Code Error**:
```html
<!-- Wrong - push runs before dataLayer is initialized -->
<script>dataLayer.push({...});</script>
<!-- GTM snippet here -->
```
**Fix**:
```html
<!-- Correct - Initialize dataLayer first -->
<script>window.dataLayer = window.dataLayer || [];</script>
<!-- GTM snippet here -->
```
### Case Sensitivity Issues
**Symptoms**: Triggers not matching
**Example**:
```javascript
// DataLayer pushes "AddToCart"
dataLayer.push({ event: "AddToCart" });
// But GTM trigger looks for "addToCart" - won't match!
```
**Fix**: Standardize event naming (recommend lowercase with underscores)
### Wrong Data Types
**Symptoms**: Calculations wrong in GA4, missing data
**Example**:
```javascript
// Wrong - price as string
dataLayer.push({ ecommerce: { value: "29.99" }});
// Correct - price as number
dataLayer.push({ ecommerce: { value: 29.99 }});
```
### Timing Issues
**Symptoms**: Events fire before data available
**Cause**: DataLayer push happens after tag fires
**Fix**: Use "Custom Event" trigger instead of "Page View"
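A sketch of the pattern: push a dedicated event only once the data exists, and trigger the tag on that Custom Event. The event name `product_data_ready` and the helper are illustrative, not part of this tool:

```javascript
// In a browser this would be window.dataLayer; globalThis keeps the
// sketch self-contained
var dataLayer = globalThis.dataLayer || (globalThis.dataLayer = []);

// Push only when the data is actually available
function pushWhenReady(product) {
  if (!product) return false;        // data not there yet - don't fire
  dataLayer.push({
    event: 'product_data_ready',
    ecommerce: { items: [product] }
  });
  return true;
}

pushWhenReady({ item_id: 'SKU_001', item_name: 'Product 1' });
```

In GTM, the tag's trigger is then Custom Event = `product_data_ready` instead of Page View.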
---
## Tag Issues
### Tag Not Firing
**Checklist**:
1. ✓ Trigger conditions met?
2. ✓ Trigger enabled?
3. ✓ Tag not paused?
4. ✓ No blocking triggers active?
5. ✓ Consent mode not blocking?
**Debug Steps**:
1. GTM Preview > Check Tags Fired
2. Verify trigger shows green check
3. Check Variables tab for expected values
### Duplicate Tag Firing
**Symptoms**: Events counted 2x in GA4
**Causes**:
1. Multiple triggers on same action
2. Page re-renders triggering again
3. SPA virtual pageviews firing multiple times
**Fix**:
1. Add "Once per event" tag firing option
2. Use trigger groups to control firing
3. Add conditions to prevent re-firing
### Wrong Parameters Sent
**Symptoms**: Data appears in wrong fields in GA4
**Debug**:
1. GTM Preview > Tags > Show fired tag
2. Check "Values" sent with tag
3. Compare with expected parameters
---
## E-commerce Issues
### Missing Transaction ID
**Symptoms**: Duplicate purchases counted
**Fix**: Ensure unique `transaction_id` generated server-side
### Items Array Empty
**Symptoms**: Revenue tracked but no products
**Check**: `ecommerce.items` array populated
### Value Mismatch
**Symptoms**: Revenue doesn't match actual
**Causes**:
1. Tax/shipping included inconsistently
2. Currency conversion issues
3. Discount applied incorrectly
### Purchase Event Fires Multiple Times
**Symptoms**: Same order tracked 2-3x
**Causes**:
1. Page refresh on confirmation
2. Browser back button
3. Email link revisit
**Fix**:
```javascript
// Check if already tracked
if (!sessionStorage.getItem('purchase_' + transaction_id)) {
dataLayer.push({ event: 'purchase', ... });
sessionStorage.setItem('purchase_' + transaction_id, 'true');
}
```
---
## Consent Mode Issues
### Tags Blocked by Consent
**Symptoms**: Tags show "Blocked by consent" in Preview
**Fix**:
1. Verify consent mode implementation
2. Check default consent state
3. Test with consent granted
### Consent Not Updating
**Symptoms**: Tags stay blocked after user accepts
**Fix**: Verify `gtag('consent', 'update', {...})` fires on accept
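A minimal sketch of the update call. The consent keys are from Google's consent mode API; the handler name is illustrative, and in a browser `globalThis` is `window`:

```javascript
// dataLayer may already exist on the page
var dataLayer = globalThis.dataLayer || (globalThis.dataLayer = []);
function gtag() { dataLayer.push(arguments); }

// Run this from the consent banner's "accept" handler
function onConsentAccepted() {
  gtag('consent', 'update', {
    analytics_storage: 'granted',
    ad_storage: 'granted'
  });
}

onConsentAccepted();
```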
---
## SPA (Single Page App) Issues
### Pageviews Not Tracking Navigation
**Symptoms**: Only initial pageview tracked
**Cause**: No page reload on route change
**Fix**: Implement History Change trigger or custom event:
```javascript
// On route change
dataLayer.push({
event: 'virtual_pageview',
page_path: newPath,
page_title: newTitle
});
```
### Events Fire on Old Page Data
**Symptoms**: Wrong page_path in events
**Fix**: Update page variables before event push
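One way to guarantee fresh values is to pass the page context in the same push as the event, instead of relying on a separately updated Page Path variable (the helper name is illustrative):

```javascript
var dataLayer = globalThis.dataLayer || (globalThis.dataLayer = []);

// Bundle the current route's context with every SPA event
function pushWithPageContext(eventName, path, title, params) {
  dataLayer.push(Object.assign({
    event: eventName,
    page_path: path,
    page_title: title
  }, params || {}));
}

// On route change:
pushWithPageContext('virtual_pageview', '/products/sku-001', 'Product 1');
```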
---
## Performance Issues
### Tags Slowing Page Load
**Symptoms**: High LCP, slow TTI
**Causes**:
1. Too many synchronous tags
2. Large third-party scripts
3. Tags in wrong firing sequence
**Fix**:
1. Use tag sequencing
2. Load non-critical tags on Window Loaded
3. Defer marketing tags

View File

@@ -0,0 +1,287 @@
# DataLayer Validation Reference
## DataLayer Structure Basics
### Proper Initialization
```javascript
// Must appear BEFORE GTM script
<script>
window.dataLayer = window.dataLayer || [];
</script>
<!-- GTM script here -->
```
### Push Syntax
```javascript
// Correct
dataLayer.push({ event: "page_view", page_title: "Home" });
// Wrong - direct assignment
dataLayer = [{ event: "page_view" }]; // ❌ Overwrites array
```
## Validation Rules
### Event Names
| Rule | Valid | Invalid |
|------|-------|---------|
| Alphanumeric + underscore | `add_to_cart` | `add-to-cart` |
| Max 40 characters | `purchase` | (too long names) |
| Case sensitive | `addToCart` ≠ `addtocart` | - |
| No spaces | `form_submit` | `form submit` |
| No special chars | `click_cta` | `click@cta` |
### Parameter Names
| Rule | Valid | Invalid |
|------|-------|---------|
| Max 40 characters | `item_category` | (too long) |
| Alphanumeric + underscore | `user_id` | `user-id` |
| Cannot start with `_` | `custom_param` | `_private` |
| Cannot start with number | `step_1` | `1_step` |
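The two tables above reduce to one check: a letter first (which also rules out a leading underscore or digit), then alphanumerics and underscores, at most 40 characters. A minimal sketch (the function name is an assumption):

```javascript
// Valid GA4 event or parameter name per the rules above
function isValidName(name) {
  return /^[A-Za-z][A-Za-z0-9_]{0,39}$/.test(name);
}
```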
### Data Types
| Parameter | Expected Type | Example |
|-----------|---------------|---------|
| value | number | `29.99` not `"29.99"` |
| currency | string (ISO 4217) | `"USD"`, `"KRW"` |
| transaction_id | string | `"T_12345"` |
| quantity | integer | `2` not `2.0` |
| price | number | `45000` |
| items | array | `[{...}, {...}]` |
### Type Validation Code
```javascript
function validateDataLayerPush(data) {
const issues = [];
// Check value is number
if (data.ecommerce?.value !== undefined) {
if (typeof data.ecommerce.value !== 'number') {
issues.push(`value should be number, got ${typeof data.ecommerce.value}`);
}
}
// Check currency format
if (data.ecommerce?.currency) {
if (!/^[A-Z]{3}$/.test(data.ecommerce.currency)) {
issues.push(`currency should be 3-letter ISO code`);
}
}
// Check items array
if (data.ecommerce?.items) {
if (!Array.isArray(data.ecommerce.items)) {
issues.push(`items should be array`);
} else {
data.ecommerce.items.forEach((item, i) => {
if (!item.item_id) issues.push(`items[${i}] missing item_id`);
if (!item.item_name) issues.push(`items[${i}] missing item_name`);
if (item.price && typeof item.price !== 'number') {
issues.push(`items[${i}].price should be number`);
}
if (item.quantity && !Number.isInteger(item.quantity)) {
issues.push(`items[${i}].quantity should be integer`);
}
});
}
}
return issues;
}
```
## E-commerce Object Clearing
### Why Clear?
GA4 may merge previous ecommerce data with new events.
### Correct Pattern
```javascript
// Clear first
dataLayer.push({ ecommerce: null });
// Then push new event
dataLayer.push({
event: "view_item",
ecommerce: { ... }
});
```
### Validation Check
```javascript
function checkEcommerceClear(dataLayerArray) {
let lastHadEcommerce = false;
const issues = [];
dataLayerArray.forEach((item, i) => {
const hasEcommerce = 'ecommerce' in item;
const isNull = item.ecommerce === null;
if (hasEcommerce && !isNull && lastHadEcommerce) {
issues.push({
index: i,
message: 'Missing ecommerce:null before this push'
});
}
lastHadEcommerce = hasEcommerce && !isNull;
});
return issues;
}
```
## Event Sequence Validation
### Expected Sequences
**E-commerce Purchase Flow:**
```
view_item_list? → view_item → add_to_cart → view_cart →
begin_checkout → add_shipping_info → add_payment_info → purchase
```
**Form Submission:**
```
form_start → form_submit → generate_lead?
```
**User Authentication:**
```
login | sign_up
```
### Sequence Validator
```javascript
function validateSequence(events, expectedOrder) {
const eventNames = events
.filter(e => e.event)
.map(e => e.event);
let lastIndex = -1;
const issues = [];
eventNames.forEach(event => {
const index = expectedOrder.indexOf(event);
if (index !== -1) {
if (index < lastIndex) {
issues.push(`${event} fired out of expected order`);
}
lastIndex = index;
}
});
return issues;
}
```
## Duplicate Event Detection
### Common Duplicates
- Multiple `page_view` on single page load
- `purchase` firing on page refresh
- Click events on bubbling elements
### Detection Code
```javascript
function findDuplicates(events) {
const seen = {};
const duplicates = [];
events.forEach((event, i) => {
if (!event.event) return;
const key = JSON.stringify(event);
if (seen[key]) {
duplicates.push({
event: event.event,
firstIndex: seen[key],
duplicateIndex: i
});
} else {
seen[key] = i;
}
});
return duplicates;
}
```
## Real-time Monitoring Setup
### Console Monitoring
```javascript
// Paste in browser console to monitor pushes
(function() {
const original = dataLayer.push;
dataLayer.push = function() {
console.group('📊 dataLayer.push');
console.log('Data:', arguments[0]);
console.log('Time:', new Date().toISOString());
console.groupEnd();
return original.apply(this, arguments);
};
console.log('✅ DataLayer monitoring active');
})();
```
### Export DataLayer
```javascript
// Copy full dataLayer to clipboard
copy(JSON.stringify(dataLayer, null, 2));
```
## Validation Checklist
### Structure
- [ ] dataLayer initialized before GTM
- [ ] Using push() not assignment
- [ ] Event names follow conventions
- [ ] Parameter names follow conventions
### Data Types
- [ ] value is number
- [ ] currency is 3-letter code
- [ ] quantity is integer
- [ ] items is array
- [ ] Required fields present
### E-commerce
- [ ] ecommerce:null before each push
- [ ] items array has item_id and item_name
- [ ] transaction_id is unique
- [ ] Consistent currency across events
### Sequence
- [ ] Events fire in logical order
- [ ] No duplicate events
- [ ] Purchase fires only once
## Debug Tools
### GTM Preview Mode
- Real-time event inspection
- Variable value checking
- Tag firing verification
### GA4 DebugView
- Live event stream
- Parameter validation
- User property tracking
### Browser Console
```javascript
// View current dataLayer
console.table(dataLayer);
// Filter by event
dataLayer.filter(d => d.event === 'purchase');
```

View File

@@ -0,0 +1,216 @@
# E-commerce DataLayer Schema Reference
## GA4 E-commerce Structure
### Items Array Schema
Every e-commerce event requires an `items` array:
```javascript
items: [{
// Required
item_id: "SKU_12345",
item_name: "Blue T-Shirt",
// Recommended
affiliation: "Store Name",
coupon: "SUMMER_SALE",
discount: 5.00,
index: 0,
item_brand: "Brand Name",
item_category: "Apparel",
item_category2: "Men",
item_category3: "Shirts",
item_category4: "T-Shirts",
item_category5: "Short Sleeve",
item_list_id: "related_products",
item_list_name: "Related Products",
item_variant: "Blue/Large",
location_id: "ChIJIQBpAG2ahYAR_6128GcTUEo",
price: 29.99,
quantity: 1
}]
```
### Clear Previous E-commerce Data
Always clear before new e-commerce event:
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_item",
ecommerce: {
// new data
}
});
```
## Complete Purchase Flow
### 1. Product List View
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_item_list",
ecommerce: {
item_list_id: "category_results",
item_list_name: "Category Results",
items: [
{ item_id: "SKU_001", item_name: "Product 1", index: 0, price: 29.99 },
{ item_id: "SKU_002", item_name: "Product 2", index: 1, price: 39.99 }
]
}
});
```
### 2. Product Click
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "select_item",
ecommerce: {
item_list_id: "category_results",
item_list_name: "Category Results",
items: [{
item_id: "SKU_001",
item_name: "Product 1",
price: 29.99
}]
}
});
```
### 3. Product Detail View
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_item",
ecommerce: {
currency: "USD",
value: 29.99,
items: [{
item_id: "SKU_001",
item_name: "Product 1",
item_brand: "Brand",
item_category: "Category",
price: 29.99,
quantity: 1
}]
}
});
```
### 4. Add to Cart
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_to_cart",
ecommerce: {
currency: "USD",
value: 29.99,
items: [{
item_id: "SKU_001",
item_name: "Product 1",
price: 29.99,
quantity: 1
}]
}
});
```
### 5. View Cart
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_cart",
ecommerce: {
currency: "USD",
value: 59.98,
items: [
{ item_id: "SKU_001", item_name: "Product 1", price: 29.99, quantity: 1 },
{ item_id: "SKU_002", item_name: "Product 2", price: 29.99, quantity: 1 }
]
}
});
```
### 6. Begin Checkout
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "begin_checkout",
ecommerce: {
currency: "USD",
value: 59.98,
coupon: "DISCOUNT10",
items: [...]
}
});
```
### 7. Add Shipping Info
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_shipping_info",
ecommerce: {
currency: "USD",
value: 59.98,
shipping_tier: "Standard",
items: [...]
}
});
```
### 8. Add Payment Info
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_payment_info",
ecommerce: {
currency: "USD",
value: 59.98,
payment_type: "Credit Card",
items: [...]
}
});
```
### 9. Purchase
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "purchase",
ecommerce: {
transaction_id: "T_12345",
value: 65.97,
tax: 4.99,
shipping: 5.99,
currency: "USD",
coupon: "DISCOUNT10",
items: [{
item_id: "SKU_001",
item_name: "Product 1",
affiliation: "Online Store",
coupon: "DISCOUNT10",
discount: 3.00,
item_brand: "Brand",
item_category: "Category",
price: 29.99,
quantity: 1
}]
}
});
```
## Korean E-commerce Considerations
### Currency
```javascript
currency: "KRW",
value: 35000 // No decimals for KRW
```
### Common Korean Platform Integrations
- Cafe24: Uses custom dataLayer structure
- Shopify Korea: Standard GA4 format
- Naver SmartStore: Custom pixel implementation

View File

@@ -0,0 +1,157 @@
# Form Tracking Reference
## GA4 Form Events
### form_start
Fires on first interaction with form field.
```javascript
dataLayer.push({
event: "form_start",
form_id: "contact-form",
form_name: "Contact Us",
form_destination: "/submit-contact"
});
```
### form_submit
Fires on successful form submission.
```javascript
dataLayer.push({
event: "form_submit",
form_id: "contact-form",
form_name: "Contact Us",
form_destination: "/submit-contact",
form_submit_text: "Send Message"
});
```
### generate_lead
Fires when form generates a qualified lead.
```javascript
dataLayer.push({
event: "generate_lead",
currency: "USD",
value: 100, // Estimated lead value
form_id: "quote-request"
});
```
## Form Field Events (Custom)
### field_focus
```javascript
dataLayer.push({
event: "field_focus",
form_id: "signup-form",
field_name: "email",
field_type: "email"
});
```
### field_complete
```javascript
dataLayer.push({
event: "field_complete",
form_id: "signup-form",
field_name: "email",
field_type: "email",
is_valid: true
});
```
### field_error
```javascript
dataLayer.push({
event: "field_error",
form_id: "signup-form",
field_name: "email",
error_message: "Invalid email format"
});
```
## Form Abandonment Tracking
### Detecting Abandonment
Track when user leaves form without submitting:
```javascript
// Track form start, last focused field, and completed field count
let formStarted = false;
let formSubmitted = false;
let lastFocusedField = null;
let completedFieldCount = 0;
document.querySelectorAll('form input, form select, form textarea')
  .forEach(field => {
    field.addEventListener('focus', function() {
      lastFocusedField = this.name || this.id;
      if (!formStarted) {
        formStarted = true;
        dataLayer.push({ event: 'form_start', form_id: this.form.id });
      }
    });
    field.addEventListener('change', function() {
      if (this.value) completedFieldCount++;
    });
  });
// Mark successful submission so it is not counted as abandonment
document.querySelectorAll('form').forEach(form => {
  form.addEventListener('submit', () => { formSubmitted = true; });
});
// Track abandonment on page leave
window.addEventListener('beforeunload', function() {
  if (formStarted && !formSubmitted) {
    dataLayer.push({
      event: 'form_abandon',
      form_id: 'contact-form',
      last_field: lastFocusedField,
      fields_completed: completedFieldCount
    });
  }
});
```
## GTM Trigger Configuration
### Form Submission Trigger
| Setting | Value |
|---------|-------|
| Trigger Type | Form Submission |
| Wait for Tags | Check (if AJAX form) |
| Check Validation | Check |
| Form ID | equals `contact-form` |
### Form Start Trigger (Custom Event)
| Setting | Value |
|---------|-------|
| Trigger Type | Custom Event |
| Event Name | form_start |
| Fire On | All Custom Events |
## Common Form Types & Tracking
### Contact Forms
Events: `form_start`, `form_submit`, `generate_lead`
### Newsletter Signup
Events: `form_start`, `form_submit`, `sign_up`
### Login Forms
Events: `form_start`, `login`
### Search Forms
Events: `search` (with search_term parameter)
### Multi-Step Forms
Track each step:
```javascript
dataLayer.push({
event: "form_step",
form_id: "checkout-form",
step_number: 2,
step_name: "Shipping Address"
});
```
## Validation Checklist
- [ ] Form has id or name attribute
- [ ] All required fields have names
- [ ] Submit button identifiable
- [ ] form_start fires on first interaction
- [ ] form_submit fires only on success
- [ ] generate_lead has value parameter
- [ ] Error events track validation failures
- [ ] Abandonment tracking implemented (optional)

View File

@@ -0,0 +1,177 @@
# GA4 Recommended Events Reference
## Automatically Collected Events
Events GA4 collects without configuration:
- `first_visit` - First time user visits
- `session_start` - Session begins
- `page_view` - Page loads (enhanced measurement)
- `scroll` - 90% scroll depth
- `click` - Outbound link clicks
- `file_download` - File download clicks
- `video_start`, `video_progress`, `video_complete` - YouTube embeds
## E-commerce Events (Required Parameters)
### view_item_list
```javascript
{
event: "view_item_list",
ecommerce: {
item_list_id: "related_products",
item_list_name: "Related Products",
items: [{
item_id: "SKU_12345", // required
item_name: "Product Name", // required
price: 29.99,
quantity: 1
}]
}
}
```
### view_item
```javascript
{
event: "view_item",
ecommerce: {
currency: "USD",
value: 29.99,
items: [{
item_id: "SKU_12345", // required
item_name: "Product Name", // required
price: 29.99,
quantity: 1
}]
}
}
```
### add_to_cart
```javascript
{
event: "add_to_cart",
ecommerce: {
currency: "USD",
value: 29.99,
items: [{
item_id: "SKU_12345", // required
item_name: "Product Name", // required
price: 29.99,
quantity: 1
}]
}
}
```
### begin_checkout
```javascript
{
event: "begin_checkout",
ecommerce: {
currency: "USD",
value: 99.99,
coupon: "SUMMER_SALE",
items: [...]
}
}
```
### add_payment_info
```javascript
{
event: "add_payment_info",
ecommerce: {
currency: "USD",
value: 99.99,
payment_type: "credit_card",
items: [...]
}
}
```
### purchase
```javascript
{
event: "purchase",
ecommerce: {
transaction_id: "T12345", // required, must be unique
value: 99.99, // required
currency: "USD", // required
tax: 4.99,
shipping: 5.99,
coupon: "SUMMER_SALE",
items: [{
item_id: "SKU_12345", // required
item_name: "Product Name",// required
price: 29.99,
quantity: 2
}]
}
}
```
## Lead Generation Events
### generate_lead
```javascript
{
event: "generate_lead",
currency: "USD",
value: 100 // estimated lead value
}
```
### sign_up
```javascript
{
event: "sign_up",
method: "email" // or "google", "facebook", etc.
}
```
### login
```javascript
{
event: "login",
method: "email"
}
```
## Engagement Events
### search
```javascript
{
event: "search",
search_term: "blue shoes"
}
```
### share
```javascript
{
event: "share",
method: "twitter",
content_type: "article",
item_id: "article_123"
}
```
## Parameter Validation Rules
| Parameter | Type | Max Length | Notes |
|-----------|------|------------|-------|
| event name | string | 40 chars | No spaces, alphanumeric + underscore |
| item_id | string | 100 chars | Required for e-commerce |
| item_name | string | 100 chars | Required for e-commerce |
| currency | string | 3 chars | ISO 4217 format (USD, KRW, etc.) |
| transaction_id | string | 100 chars | Must be unique per transaction |
| value | number | - | Numeric, no currency symbols |
## Common Validation Errors
1. **Missing required params**: `item_id` or `item_name` not in items array
2. **Wrong data type**: `value` as string instead of number
3. **Duplicate transaction_id**: Same ID used for multiple purchases
4. **Empty items array**: E-commerce event with no items
5. **Invalid currency**: Currency code not in ISO 4217 format
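The five errors above can be caught with a small pre-flight check before pushing. This is a sketch, not part of the audit tool; `validatePurchase` is a hypothetical helper, and duplicate-ID detection (error 3) would need persisted state beyond this function:

```javascript
// Returns a list of problems for a purchase push; empty list = valid
function validatePurchase(push) {
  const errors = [];
  const ec = push.ecommerce || {};
  if (!ec.transaction_id) errors.push('missing transaction_id');
  if (typeof ec.value !== 'number') errors.push('value must be a number');
  if (!/^[A-Z]{3}$/.test(ec.currency || '')) errors.push('invalid currency code');
  if (!Array.isArray(ec.items) || ec.items.length === 0) {
    errors.push('items array empty');
  } else {
    ec.items.forEach((it, i) => {
      if (!it.item_id) errors.push(`items[${i}] missing item_id`);
      if (!it.item_name) errors.push(`items[${i}] missing item_name`);
    });
  }
  return errors;
}
```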

View File

@@ -0,0 +1,115 @@
# GTM Audit Report Template
## Executive Summary
| Metric | Status |
|--------|--------|
| Container Installed | ✅ / ❌ |
| Container Valid | ✅ / ❌ |
| DataLayer Active | ✅ / ❌ |
| Tags Firing | X of Y |
| Critical Issues | X |
| Warnings | X |
## Container Status
**Container ID**: GTM-XXXXXX
**Installation Position**: head / body
**Multiple Containers**: Yes / No
**Noscript Fallback**: Present / Missing
### Issues Found
- [ ] Issue description
## DataLayer Analysis
### Events Captured
| Event Name | Count | Has Issues |
|------------|-------|------------|
| page_view | 1 | No |
| add_to_cart | 0 | - |
### DataLayer Quality
- [ ] Initialized before GTM
- [ ] Standard event naming
- [ ] Correct data types
- [ ] E-commerce structure valid
## Tag Firing Report
### Tags Fired ✅
| Destination | Events | Parameters |
|-------------|--------|------------|
| GA4 | page_view | page_location, page_title |
| Meta Pixel | PageView | - |
### Tags Not Detected ⚠️
| Expected Tag | Reason | Priority |
|--------------|--------|----------|
| GA4 purchase | Event not triggered | High |
## Network Request Analysis
Total requests captured: X
### By Destination
| Destination | Requests | Status |
|-------------|----------|--------|
| GA4 | X | ✅ |
| Meta | X | ✅ |
| Google Ads | 0 | ⚠️ |
## Issues & Recommendations
### Critical 🔴
1. **Issue Title**
- Description
- Impact
- Recommended Fix
### Warning 🟡
1. **Issue Title**
- Description
- Recommended Fix
### Info 🔵
1. **Issue Title**
- Description
## Action Items Checklist
### Immediate (Critical)
- [ ] Action item 1
- [ ] Action item 2
### Short-term (This Week)
- [ ] Action item 3
### Long-term (This Month)
- [ ] Action item 4
## Technical Details
### Environment
- URL Audited: https://example.com
- Audit Timestamp: YYYY-MM-DD HH:MM:SS
- Browser: Chromium (headless)
- Viewport: 1920x1080
### Raw Data
Full JSON report available at: `gtm_audit_report.json`
---
## Appendix: Tag Destination Reference
| Tag Type | Network Pattern |
|----------|-----------------|
| GA4 | google-analytics.com/g/collect |
| UA (Legacy) | google-analytics.com/collect |
| Google Ads | googleads.g.doubleclick.net |
| Meta Pixel | facebook.com/tr |
| LinkedIn | px.ads.linkedin.com |
| TikTok | analytics.tiktok.com |
| Kakao | pixel.kakao.com |
| Naver | wcs.naver.com |

File diff suppressed because it is too large

View File

@@ -0,0 +1 @@
playwright>=1.40.0

View File

@@ -0,0 +1,30 @@
#!/bin/bash
# GTM Audit Tool Setup Script
echo "🔧 Setting up GTM Audit Tool..."
# Check Python
if ! command -v python3 &> /dev/null; then
echo "❌ Python 3 is required but not installed."
exit 1
fi
echo "✅ Python 3 found"
# Install dependencies
echo "📦 Installing Python dependencies..."
python3 -m pip install -r requirements.txt
# Install Playwright browsers
echo "🌐 Installing Playwright browsers..."
python3 -m playwright install chromium
echo ""
echo "✅ Setup complete!"
echo ""
echo "Usage:"
echo " python gtm_audit.py --url 'https://example.com' --journey full"
echo ""
echo "For Claude Code:"
echo " claude"
echo " > Run a GTM audit on https://example.com"

Binary file not shown.

View File

@@ -0,0 +1,142 @@
# OurDigital GTM Manager
Comprehensive Google Tag Manager management toolkit powered by Playwright.
> **Note**: For lightweight audit-only functionality, see [13-ourdigital-gtm-audit](../13-ourdigital-gtm-audit/).
## Features
- **Audit Mode**: Validate GTM installations, dataLayer events, forms, and checkout flows
- **Inject Mode**: Generate custom HTML tags for dataLayer pushes
- **Container Detection**: Verify GTM installation, position, and duplicates
- **DataLayer Validation**: Event structure, types, sequence checking
- **Form Tracking**: Form discovery, field analysis, event verification
- **E-commerce Checkout**: Full funnel flow simulation and validation
- **Multi-Platform**: GA4, Meta Pixel, LinkedIn, Google Ads, Kakao, Naver
- **Notion Integration**: Export audit results to Notion database
## Installation
```bash
cd ourdigital-gtm-manager
# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium
```
## Audit Mode
Analyze existing GTM implementations:
```bash
# Full audit
python gtm_manager.py audit --url "https://yoursite.com" --journey full
# Specific container validation
python gtm_manager.py audit --url "https://yoursite.com" --container "GTM-XXXXXX"
# Form tracking only
python gtm_manager.py audit --url "https://yoursite.com/contact" --journey form
# Audit and generate missing tags
python gtm_manager.py audit --url "https://yoursite.com" --generate-tags
# Export to Notion
python gtm_manager.py audit --url "https://yoursite.com" --notion
```
### Audit Options
| Option | Description | Default |
|--------|-------------|---------|
| `--url` | Target URL (required) | - |
| `--container` | Expected GTM container ID | None |
| `--journey` | pageview, scroll, form, checkout, datalayer, full | full |
| `--output` | Output file path | gtm_audit_report.json |
| `--generate-tags` | Generate missing dataLayer tags | False |
| `--notion` | Export to Notion database | False |
## Inject Mode
Generate GTM custom HTML tags when you can't modify source code directly:
```bash
# List available event types
python gtm_manager.py inject --list-events
# Generate all ecommerce tags
python gtm_manager.py inject --preset ecommerce --output ./tags
# Generate specific events
python gtm_manager.py inject --event purchase --event add_to_cart
# Generate from audit report
python gtm_manager.py inject --from-audit gtm_audit_report.json
# Generate with DOM scraping code
python gtm_manager.py inject --event view_item --scrape
```
### Inject Options
| Option | Description | Default |
|--------|-------------|---------|
| `--event` | Event type(s) to generate | - |
| `--preset` | ecommerce, engagement, all | - |
| `--from-audit` | Generate from audit report | - |
| `--output` | Output directory | ./gtm_tags |
| `--format` | html or json | html |
| `--currency` | Currency code | KRW |
| `--scrape` | Generate DOM scraping code | False |
| `--list-events` | Show available events | - |
### Supported Events
**Ecommerce**: view_item, add_to_cart, remove_from_cart, view_cart, begin_checkout, add_shipping_info, add_payment_info, purchase
**Forms**: form_submit, form_start, generate_lead
**Engagement**: scroll, file_download, search, outbound_click, share
**Video**: video_start, video_progress, video_complete
**User**: login, sign_up, page_view
## Notion Integration
```bash
# Set Notion API token
export NOTION_TOKEN="secret_xxxxx"
# Export audit to Notion
python gtm_manager.py audit --url "https://yoursite.com" --notion --notion-detailed
```
## Using with Claude Code
This project includes a `CLAUDE.md` file optimized for Claude Code.
```bash
claude
# Then ask Claude:
> Run a GTM audit on https://example.com
> Generate ecommerce dataLayer tags for my site
> Check the checkout flow and create missing tags
```
## Documentation
See `docs/` folder for:
- GA4 event specifications
- E-commerce dataLayer schemas
- Form tracking patterns
- Checkout flow sequences
- Common issues and fixes
## License
MIT

# OurDigital GTM Manager
Comprehensive Google Tag Manager management toolkit - audit, analyze, and generate dataLayer implementations.
> **Note**: For lightweight audit-only functionality, see [13-ourdigital-gtm-audit](../13-ourdigital-gtm-audit/).
## Project Overview
This tool provides two main capabilities:
1. **Audit**: Validate GTM installations, analyze dataLayer events, test form/checkout tracking
2. **Inject**: Generate custom HTML tags for dataLayer pushes when direct code access is unavailable
## Quick Commands
```bash
# Install dependencies
pip install playwright
playwright install chromium
# AUDIT MODE
# Run full audit
python gtm_manager.py audit --url "https://example.com" --journey full
# Audit with Notion export
python gtm_manager.py audit --url "https://example.com" --notion
# Audit and generate missing tags
python gtm_manager.py audit --url "https://example.com" --generate-tags
# INJECT MODE
# List available event types
python gtm_manager.py inject --list-events
# Generate all ecommerce tags
python gtm_manager.py inject --preset ecommerce --output ./tags
# Generate specific event tags
python gtm_manager.py inject --event purchase --event add_to_cart
# Generate from audit report
python gtm_manager.py inject --from-audit gtm_audit_report.json
# Generate with DOM scraping
python gtm_manager.py inject --event view_item --scrape
```
## Audit Mode
### Journey Types
| Journey | Description |
|---------|-------------|
| `pageview` | Basic page load + scroll simulation |
| `scroll` | Scroll depth trigger testing (25%, 50%, 75%, 90%) |
| `form` | Form discovery, field analysis, interaction simulation |
| `checkout` | E-commerce flow: cart → checkout → shipping → payment → purchase |
| `datalayer` | Deep dataLayer validation and event sequence analysis |
| `full` | All of the above combined |
### Audit Output
Generates `gtm_audit_report.json` with:
- Container status (installed, position, duplicates)
- DataLayer analysis (events, validation issues, sequence errors)
- Form analysis (forms found, tracking readiness, missing events)
- Checkout analysis (elements detected, flow issues)
- Network requests (GA4, Meta, LinkedIn, etc.)
- Recommendations and checklist
## Inject Mode
Generate GTM custom HTML tags for dataLayer injection when you can't modify source code directly.
### Event Categories
**Ecommerce:**
- `view_item`, `add_to_cart`, `remove_from_cart`, `view_cart`
- `begin_checkout`, `add_shipping_info`, `add_payment_info`, `purchase`
**Forms & Leads:**
- `form_submit`, `form_start`, `generate_lead`
**Engagement:**
- `scroll`, `file_download`, `search`, `outbound_click`, `share`
**Video:**
- `video_start`, `video_progress`, `video_complete`
**User:**
- `login`, `sign_up`, `page_view`
### Presets
| Preset | Events Included |
|--------|-----------------|
| `ecommerce` | All 8 ecommerce events |
| `engagement` | Forms, scroll, downloads, video, search, sharing |
| `all` | Everything including page_view, login, sign_up |
### Generated Tag Features
- GA4-compliant dataLayer structure
- Ecommerce object clearing before pushes
- DOM scraping option for dynamic values
- Trigger recommendations and selectors
- Korean payment method support
## Notion Integration
Export audit results directly to Notion database for tracking and collaboration.
```bash
# Export to default Notion database (OurDigital GTM Audit Log)
python gtm_manager.py audit --url "https://example.com" --notion
# Export with detailed content
python gtm_manager.py audit --url "https://example.com" --notion --notion-detailed
```
### Environment Variables
```bash
export NOTION_TOKEN="secret_xxxxx"
```
### Default Database
Default Notion database: [OurDigital GTM Audit Log](https://www.notion.so/2cf581e58a1e8163997fccb387156a20)
## Key Files
- `gtm_manager.py` - Main script with audit and inject functionality
- `docs/ga4_events.md` - GA4 event specifications
- `docs/ecommerce_schema.md` - E-commerce dataLayer structures
- `docs/form_tracking.md` - Form event patterns
- `docs/checkout_flow.md` - Checkout funnel sequence
- `docs/datalayer_validation.md` - Validation rules
- `docs/common_issues.md` - Frequent problems and fixes
## Coding Guidelines
When modifying this tool:
1. **Tag Destinations**: Add new platforms to `TAG_DESTINATIONS` dict
2. **Event Validation**: Add requirements to `GA4_EVENT_REQUIREMENTS` dict
3. **Event Templates**: Add new events to `DataLayerInjector.EVENT_TEMPLATES`
4. **Form Selectors**: Extend `FormAnalyzer.discover_forms()` for custom forms
5. **Checkout Elements**: Add selectors to `CheckoutFlowAnalyzer.detect_checkout_elements()`
## Korean Market Considerations
- Support Korean payment methods (카카오페이, 네이버페이, 토스)
- Handle KRW currency (no decimals)
- Include Kakao Pixel and Naver Analytics patterns
- Korean button text patterns (장바구니, 결제하기, 주문하기)
## Common Tasks
### Add support for new tag platform
```python
# In TAG_DESTINATIONS dict
"NewPlatform": [
r"tracking\.newplatform\.com",
r"pixel\.newplatform\.com",
],
```
### Add new event type for injection
```python
# In DataLayerInjector.EVENT_TEMPLATES
"custom_event": {
"description": "Track custom action",
"params": {
"custom_param": "/* value */",
},
"trigger": "Custom Trigger",
},
```
### Extend checkout flow for specific platform
```python
# In CheckoutFlowAnalyzer.detect_checkout_elements()
# Add platform-specific selectors
```

# E-commerce Checkout Flow Reference
## Complete Checkout Event Sequence
```
view_cart → begin_checkout → add_shipping_info → add_payment_info → purchase
```
Each step must fire in order with consistent item data.
## Event Details
### 1. view_cart
When user views cart page.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_cart",
ecommerce: {
currency: "KRW",
value: 125000,
items: [{
item_id: "SKU_001",
item_name: "Blue T-Shirt",
price: 45000,
quantity: 2,
item_brand: "Brand",
item_category: "Apparel"
}, {
item_id: "SKU_002",
item_name: "Black Jeans",
price: 35000,
quantity: 1
}]
}
});
```
### 2. begin_checkout
When user initiates checkout process.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "begin_checkout",
ecommerce: {
currency: "KRW",
value: 125000,
coupon: "SUMMER10",
items: [/* same items as view_cart */]
}
});
```
### 3. add_shipping_info
When user completes shipping step.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_shipping_info",
ecommerce: {
currency: "KRW",
value: 125000,
coupon: "SUMMER10",
shipping_tier: "Express", // Required
items: [/* same items */]
}
});
```
**shipping_tier values:**
- "Standard" / "일반배송"
- "Express" / "익일배송"
- "Same Day" / "당일배송"
- "Free" / "무료배송"
- "Store Pickup" / "매장픽업"
### 4. add_payment_info
When user enters payment details.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_payment_info",
ecommerce: {
currency: "KRW",
value: 125000,
coupon: "SUMMER10",
payment_type: "Credit Card", // Required
items: [/* same items */]
}
});
```
**payment_type values:**
- "Credit Card" / "신용카드"
- "Debit Card" / "체크카드"
- "Bank Transfer" / "계좌이체"
- "Virtual Account" / "가상계좌"
- "Mobile Payment" / "휴대폰결제"
- "Kakao Pay" / "카카오페이"
- "Naver Pay" / "네이버페이"
- "Toss" / "토스"
- "PayPal"
### 5. purchase
When transaction completes successfully.
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "purchase",
ecommerce: {
transaction_id: "T_20250115_001234", // Required, unique
    value: 137363,          // Required: subtotal 125000 + shipping 5000 + tax 11863 - discount 4500
tax: 11863,
shipping: 5000,
currency: "KRW", // Required
coupon: "SUMMER10",
items: [{
item_id: "SKU_001",
item_name: "Blue T-Shirt",
affiliation: "Online Store",
coupon: "SUMMER10",
discount: 4500,
price: 45000,
quantity: 2
}]
}
});
```
## Funnel Drop-off Analysis
### Tracking Drop-offs
Monitor completion rate at each step:
| Step | Event | Drop-off Indicator |
|------|-------|-------------------|
| Cart | view_cart | User leaves cart page |
| Checkout Start | begin_checkout | User doesn't proceed |
| Shipping | add_shipping_info | Address form abandoned |
| Payment | add_payment_info | Payment not completed |
| Complete | purchase | Transaction failed |
### Implementing Drop-off Tracking
```javascript
// Track checkout step viewed but not completed
let checkoutStep = 0;
function trackCheckoutProgress(step) {
if (step > checkoutStep) {
checkoutStep = step;
}
}
window.addEventListener('beforeunload', () => {
if (checkoutStep > 0 && checkoutStep < 5) {
dataLayer.push({
event: 'checkout_abandon',
last_step: checkoutStep,
step_name: ['cart', 'checkout', 'shipping', 'payment', 'complete'][checkoutStep - 1]
});
}
});
```
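The step counts in the table above can be turned into step-to-step completion rates. A minimal sketch (the event counts would come from your GA4 reports; the function name is illustrative):

```javascript
// Compute step-to-step completion rates from raw event counts.
// counts: { eventName: number }, steps: ordered funnel event names.
function funnelRates(counts, steps) {
  const rates = {};
  for (let i = 1; i < steps.length; i++) {
    const prev = counts[steps[i - 1]] || 0;
    rates[steps[i]] = prev ? (counts[steps[i]] || 0) / prev : 0;
  }
  return rates;
}
```

A step whose rate drops sharply relative to its neighbors is the one worth instrumenting with an abandonment event.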
## Value Consistency Check
Ensure `value` matches across events:
```
view_cart.value = sum(items.price * items.quantity)
begin_checkout.value = view_cart.value
add_shipping_info.value = begin_checkout.value
add_payment_info.value = add_shipping_info.value
purchase.value = add_payment_info.value + shipping + tax - discount
```
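One way to automate this check, as a sketch (assumes the shape used in this document, `{ event, ecommerce: { value, items } }`, and that pre-purchase values equal the item subtotal as above):

```javascript
// Check that pre-purchase event values match their item subtotals.
// purchase is skipped because its value also includes shipping and tax.
function checkValueConsistency(events) {
  const issues = [];
  for (const e of events) {
    const ec = e.ecommerce || {};
    const subtotal = (ec.items || [])
      .reduce((sum, it) => sum + (it.price || 0) * (it.quantity || 1), 0);
    if (e.event !== 'purchase' && typeof ec.value === 'number'
        && Math.abs(ec.value - subtotal) > 0.01) {
      issues.push(`${e.event}: value ${ec.value} != item subtotal ${subtotal}`);
    }
  }
  return issues;
}
```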
## Common Issues
### Duplicate Purchase Events
**Problem**: Same order tracked multiple times
**Solution**:
```javascript
// Check if already tracked
const txId = "T_12345";
if (!sessionStorage.getItem('purchase_' + txId)) {
  dataLayer.push({ event: 'purchase', /* ...full ecommerce payload... */ });
sessionStorage.setItem('purchase_' + txId, 'true');
}
```
### Missing Items in Later Steps
**Problem**: Items present in view_cart but missing in purchase
**Solution**: Store cart data in session and reuse
### Inconsistent Currency
**Problem**: Some events use USD, others KRW
**Solution**: Standardize currency across all events
### Wrong Value Calculation
**Problem**: purchase.value doesn't include tax/shipping
**Solution**:
```
purchase.value = subtotal + tax + shipping - discount
```
## Korean E-commerce Platforms
### Cafe24
Uses custom dataLayer variable names; check the platform documentation before mapping events
### Shopify Korea
Standard GA4 format with `Shopify.checkout` object
### WooCommerce
Use official GA4 plugin or custom implementation
### Naver SmartStore
Separate Naver Analytics implementation required
## Checkout Flow Checklist
- [ ] view_cart fires on cart page load
- [ ] begin_checkout fires on checkout button click
- [ ] add_shipping_info includes shipping_tier
- [ ] add_payment_info includes payment_type
- [ ] purchase has unique transaction_id
- [ ] All events have consistent items array
- [ ] Currency is consistent across all events
- [ ] Value calculations are accurate
- [ ] ecommerce object cleared before each push
- [ ] Purchase event fires only once per order

# Common GTM Issues & Fixes
## Container Issues
### GTM Not Firing
**Symptoms**: No GTM requests in network tab
**Causes**:
1. Script blocked by ad blocker
2. Script placed after closing body tag
3. JavaScript error before GTM loads
4. Consent management blocking GTM
**Fix**:
```html
<!-- Place immediately after opening <head> tag -->
<script>(function(w,d,s,l,i){...})(window,document,'script','dataLayer','GTM-XXXXXX');</script>
```
### Multiple Containers Conflict
**Symptoms**: Duplicate events, inconsistent data
**Causes**:
1. Legacy container not removed
2. Different teams installed separate containers
3. Theme/plugin auto-installed GTM
**Fix**:
1. Audit all containers in source
2. Consolidate to single container
3. Use GTM environments for staging/prod
### Container ID Mismatch
**Symptoms**: Tags not firing, wrong property receiving data
**Causes**:
1. Dev/staging container on production
2. Copy-paste error during installation
**Fix**: Verify container ID matches GTM account
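A quick format check can also catch obvious paste errors, such as a GA4 measurement ID (`G-…`) used where a container ID belongs. Minimal sketch; the exact suffix length varies, so the range here is deliberately loose:

```javascript
// GTM container IDs are "GTM-" followed by uppercase letters/digits.
function looksLikeContainerId(id) {
  return /^GTM-[A-Z0-9]{4,10}$/.test(id);
}
```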
---
## DataLayer Issues
### DataLayer Not Initialized
**Symptoms**: First push events lost
**Code Error**:
```javascript
// Wrong - GTM loads before dataLayer exists
<script>GTM snippet</script>
dataLayer.push({...});
```
**Fix**:
```javascript
// Correct - Initialize dataLayer first
<script>window.dataLayer = window.dataLayer || [];</script>
<script>GTM snippet</script>
```
### Case Sensitivity Issues
**Symptoms**: Triggers not matching
**Example**:
```javascript
// DataLayer pushes "AddToCart"
dataLayer.push({ event: "AddToCart" });
// But GTM trigger looks for "addToCart" - won't match!
```
**Fix**: Standardize event naming (recommend lowercase with underscores)
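A small normalizer applied before every push keeps names on one convention; a sketch:

```javascript
// Normalize an event name to lowercase_with_underscores so that
// dataLayer pushes and GTM triggers agree on a single convention.
function normalizeEventName(name) {
  return name
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2') // camelCase -> camel_Case
    .replace(/[\s-]+/g, '_')                // spaces/hyphens -> underscore
    .toLowerCase();
}
```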
### Wrong Data Types
**Symptoms**: Calculations wrong in GA4, missing data
**Example**:
```javascript
// Wrong - price as string
dataLayer.push({ ecommerce: { value: "29.99" }});
// Correct - price as number
dataLayer.push({ ecommerce: { value: 29.99 }});
```
### Timing Issues
**Symptoms**: Events fire before data available
**Cause**: DataLayer push happens after tag fires
**Fix**: Use "Custom Event" trigger instead of "Page View"
---
## Tag Issues
### Tag Not Firing
**Checklist**:
1. ✓ Trigger conditions met?
2. ✓ Trigger enabled?
3. ✓ Tag not paused?
4. ✓ No blocking triggers active?
5. ✓ Consent mode not blocking?
**Debug Steps**:
1. GTM Preview > Check Tags Fired
2. Verify trigger shows green check
3. Check Variables tab for expected values
### Duplicate Tag Firing
**Symptoms**: Events counted 2x in GA4
**Causes**:
1. Multiple triggers on same action
2. Page re-renders triggering again
3. SPA virtual pageviews firing multiple times
**Fix**:
1. Add "Once per event" tag firing option
2. Use trigger groups to control firing
3. Add conditions to prevent re-firing
### Wrong Parameters Sent
**Symptoms**: Data appears in wrong fields in GA4
**Debug**:
1. GTM Preview > Tags > Show fired tag
2. Check "Values" sent with tag
3. Compare with expected parameters
---
## E-commerce Issues
### Missing Transaction ID
**Symptoms**: Duplicate purchases counted
**Fix**: Ensure unique `transaction_id` generated server-side
### Items Array Empty
**Symptoms**: Revenue tracked but no products
**Check**: `ecommerce.items` array populated
### Value Mismatch
**Symptoms**: Revenue doesn't match actual
**Causes**:
1. Tax/shipping included inconsistently
2. Currency conversion issues
3. Discount applied incorrectly
### Purchase Event Fires Multiple Times
**Symptoms**: Same order tracked 2-3x
**Causes**:
1. Page refresh on confirmation
2. Browser back button
3. Email link revisit
**Fix**:
```javascript
// Check if already tracked
if (!sessionStorage.getItem('purchase_' + transaction_id)) {
  dataLayer.push({ event: 'purchase', /* ...full ecommerce payload... */ });
sessionStorage.setItem('purchase_' + transaction_id, 'true');
}
```
---
## Consent Mode Issues
### Tags Blocked by Consent
**Symptoms**: Tags show "Blocked by consent" in Preview
**Fix**:
1. Verify consent mode implementation
2. Check default consent state
3. Test with consent granted
### Consent Not Updating
**Symptoms**: Tags stay blocked after user accepts
**Fix**: Verify `gtag('consent', 'update', {...})` fires on accept
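A minimal accept handler, using the standard `gtag` stub from the GTM bootstrap snippet (the handler name and the consent types granted are illustrative; grant only what the user actually consented to):

```javascript
// Standard gtag stub from the GTM bootstrap snippet
var dataLayer = dataLayer || [];
function gtag() { dataLayer.push(arguments); }

// Wire this to the consent banner's "Accept" button
function onConsentAccepted() {
  gtag('consent', 'update', {
    analytics_storage: 'granted',
    ad_storage: 'granted'
  });
}
```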
---
## SPA (Single Page App) Issues
### Pageviews Not Tracking Navigation
**Symptoms**: Only initial pageview tracked
**Cause**: No page reload on route change
**Fix**: Implement History Change trigger or custom event:
```javascript
// On route change
dataLayer.push({
event: 'virtual_pageview',
page_path: newPath,
page_title: newTitle
});
```
### Events Fire on Old Page Data
**Symptoms**: Wrong page_path in events
**Fix**: Update page variables before event push
---
## Performance Issues
### Tags Slowing Page Load
**Symptoms**: High LCP, slow TTI
**Causes**:
1. Too many synchronous tags
2. Large third-party scripts
3. Tags in wrong firing sequence
**Fix**:
1. Use tag sequencing
2. Load non-critical tags on Window Loaded
3. Defer marketing tags

# DataLayer Validation Reference
## DataLayer Structure Basics
### Proper Initialization
```javascript
// Must appear BEFORE GTM script
<script>
window.dataLayer = window.dataLayer || [];
</script>
<!-- GTM script here -->
```
### Push Syntax
```javascript
// Correct
dataLayer.push({ event: "page_view", page_title: "Home" });
// Wrong - direct assignment
dataLayer = [{ event: "page_view" }]; // ❌ Overwrites array
```
## Validation Rules
### Event Names
| Rule | Valid | Invalid |
|------|-------|---------|
| Alphanumeric + underscore | `add_to_cart` | `add-to-cart` |
| Max 40 characters | `purchase` | (too long names) |
| Case sensitive | `addToCart``addtocart` | - |
| No spaces | `form_submit` | `form submit` |
| No special chars | `click_cta` | `click@cta` |
### Parameter Names
| Rule | Valid | Invalid |
|------|-------|---------|
| Max 40 characters | `item_category` | (too long) |
| Alphanumeric + underscore | `user_id` | `user-id` |
| Cannot start with `_` | `custom_param` | `_private` |
| Cannot start with number | `step_1` | `1_step` |
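The two tables above can be encoded as one check. Sketch (covers only the rules listed here, not GA4's full reserved-name list):

```javascript
// Validate an event or parameter name against the rules above.
// Returns an array of issues; empty means the name passes.
function validateName(name, { isParam = false } = {}) {
  const issues = [];
  if (name.length > 40) issues.push('longer than 40 characters');
  if (!/^[A-Za-z0-9_]+$/.test(name)) issues.push('contains invalid characters');
  if (isParam && /^_/.test(name)) issues.push('parameter starts with underscore');
  if (isParam && /^[0-9]/.test(name)) issues.push('parameter starts with a number');
  return issues;
}
```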
### Data Types
| Parameter | Expected Type | Example |
|-----------|---------------|---------|
| value | number | `29.99` not `"29.99"` |
| currency | string (ISO 4217) | `"USD"`, `"KRW"` |
| transaction_id | string | `"T_12345"` |
| quantity | integer | `2` not `2.0` |
| price | number | `45000` |
| items | array | `[{...}, {...}]` |
### Type Validation Code
```javascript
function validateDataLayerPush(data) {
const issues = [];
// Check value is number
if (data.ecommerce?.value !== undefined) {
if (typeof data.ecommerce.value !== 'number') {
issues.push(`value should be number, got ${typeof data.ecommerce.value}`);
}
}
// Check currency format
if (data.ecommerce?.currency) {
if (!/^[A-Z]{3}$/.test(data.ecommerce.currency)) {
issues.push(`currency should be 3-letter ISO code`);
}
}
// Check items array
if (data.ecommerce?.items) {
if (!Array.isArray(data.ecommerce.items)) {
issues.push(`items should be array`);
} else {
data.ecommerce.items.forEach((item, i) => {
if (!item.item_id) issues.push(`items[${i}] missing item_id`);
if (!item.item_name) issues.push(`items[${i}] missing item_name`);
if (item.price && typeof item.price !== 'number') {
issues.push(`items[${i}].price should be number`);
}
if (item.quantity && !Number.isInteger(item.quantity)) {
issues.push(`items[${i}].quantity should be integer`);
}
});
}
}
return issues;
}
```
## E-commerce Object Clearing
### Why Clear?
The data layer merges object values recursively, so without clearing, stale `ecommerce` data from a previous push can attach to the next event.
### Correct Pattern
```javascript
// Clear first
dataLayer.push({ ecommerce: null });
// Then push new event
dataLayer.push({
event: "view_item",
ecommerce: { ... }
});
```
### Validation Check
```javascript
function checkEcommerceClear(dataLayerArray) {
let lastHadEcommerce = false;
const issues = [];
dataLayerArray.forEach((item, i) => {
const hasEcommerce = 'ecommerce' in item;
const isNull = item.ecommerce === null;
if (hasEcommerce && !isNull && lastHadEcommerce) {
issues.push({
index: i,
message: 'Missing ecommerce:null before this push'
});
}
lastHadEcommerce = hasEcommerce && !isNull;
});
return issues;
}
```
## Event Sequence Validation
### Expected Sequences
**E-commerce Purchase Flow:**
```
view_item_list? → view_item → add_to_cart → view_cart →
begin_checkout → add_shipping_info → add_payment_info → purchase
```
**Form Submission:**
```
form_start → form_submit → generate_lead?
```
**User Authentication:**
```
login | sign_up
```
### Sequence Validator
```javascript
function validateSequence(events, expectedOrder) {
const eventNames = events
.filter(e => e.event)
.map(e => e.event);
let lastIndex = -1;
const issues = [];
eventNames.forEach(event => {
const index = expectedOrder.indexOf(event);
if (index !== -1) {
if (index < lastIndex) {
issues.push(`${event} fired out of expected order`);
}
lastIndex = index;
}
});
return issues;
}
```
## Duplicate Event Detection
### Common Duplicates
- Multiple `page_view` on single page load
- `purchase` firing on page refresh
- Click events on bubbling elements
### Detection Code
```javascript
function findDuplicates(events) {
const seen = {};
const duplicates = [];
events.forEach((event, i) => {
if (!event.event) return;
const key = JSON.stringify(event);
if (seen[key]) {
duplicates.push({
event: event.event,
firstIndex: seen[key],
duplicateIndex: i
});
} else {
seen[key] = i;
}
});
return duplicates;
}
```
## Real-time Monitoring Setup
### Console Monitoring
```javascript
// Paste in browser console to monitor pushes
(function() {
const original = dataLayer.push;
dataLayer.push = function() {
console.group('📊 dataLayer.push');
console.log('Data:', arguments[0]);
console.log('Time:', new Date().toISOString());
console.groupEnd();
return original.apply(this, arguments);
};
console.log('✅ DataLayer monitoring active');
})();
```
### Export DataLayer
```javascript
// Copy full dataLayer to clipboard
copy(JSON.stringify(dataLayer, null, 2));
```
## Validation Checklist
### Structure
- [ ] dataLayer initialized before GTM
- [ ] Using push() not assignment
- [ ] Event names follow conventions
- [ ] Parameter names follow conventions
### Data Types
- [ ] value is number
- [ ] currency is 3-letter code
- [ ] quantity is integer
- [ ] items is array
- [ ] Required fields present
### E-commerce
- [ ] ecommerce:null before each push
- [ ] items array has item_id and item_name
- [ ] transaction_id is unique
- [ ] Consistent currency across events
### Sequence
- [ ] Events fire in logical order
- [ ] No duplicate events
- [ ] Purchase fires only once
## Debug Tools
### GTM Preview Mode
- Real-time event inspection
- Variable value checking
- Tag firing verification
### GA4 DebugView
- Live event stream
- Parameter validation
- User property tracking
### Browser Console
```javascript
// View current dataLayer
console.table(dataLayer);
// Filter by event
dataLayer.filter(d => d.event === 'purchase');
```

# E-commerce DataLayer Schema Reference
## GA4 E-commerce Structure
### Items Array Schema
Every e-commerce event requires an `items` array:
```javascript
items: [{
// Required
item_id: "SKU_12345",
item_name: "Blue T-Shirt",
// Recommended
affiliation: "Store Name",
coupon: "SUMMER_SALE",
discount: 5.00,
index: 0,
item_brand: "Brand Name",
item_category: "Apparel",
item_category2: "Men",
item_category3: "Shirts",
item_category4: "T-Shirts",
item_category5: "Short Sleeve",
item_list_id: "related_products",
item_list_name: "Related Products",
item_variant: "Blue/Large",
location_id: "ChIJIQBpAG2ahYAR_6128GcTUEo",
price: 29.99,
quantity: 1
}]
```
### Clear Previous E-commerce Data
Always clear before new e-commerce event:
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_item",
ecommerce: {
// new data
}
});
```
## Complete Purchase Flow
### 1. Product List View
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_item_list",
ecommerce: {
item_list_id: "category_results",
item_list_name: "Category Results",
items: [
{ item_id: "SKU_001", item_name: "Product 1", index: 0, price: 29.99 },
{ item_id: "SKU_002", item_name: "Product 2", index: 1, price: 39.99 }
]
}
});
```
### 2. Product Click
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "select_item",
ecommerce: {
item_list_id: "category_results",
item_list_name: "Category Results",
items: [{
item_id: "SKU_001",
item_name: "Product 1",
price: 29.99
}]
}
});
```
### 3. Product Detail View
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_item",
ecommerce: {
currency: "USD",
value: 29.99,
items: [{
item_id: "SKU_001",
item_name: "Product 1",
item_brand: "Brand",
item_category: "Category",
price: 29.99,
quantity: 1
}]
}
});
```
### 4. Add to Cart
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_to_cart",
ecommerce: {
currency: "USD",
value: 29.99,
items: [{
item_id: "SKU_001",
item_name: "Product 1",
price: 29.99,
quantity: 1
}]
}
});
```
### 5. View Cart
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "view_cart",
ecommerce: {
currency: "USD",
value: 59.98,
items: [
{ item_id: "SKU_001", item_name: "Product 1", price: 29.99, quantity: 1 },
{ item_id: "SKU_002", item_name: "Product 2", price: 29.99, quantity: 1 }
]
}
});
```
### 6. Begin Checkout
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "begin_checkout",
ecommerce: {
currency: "USD",
value: 59.98,
coupon: "DISCOUNT10",
items: [...]
}
});
```
### 7. Add Shipping Info
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_shipping_info",
ecommerce: {
currency: "USD",
value: 59.98,
shipping_tier: "Standard",
items: [...]
}
});
```
### 8. Add Payment Info
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "add_payment_info",
ecommerce: {
currency: "USD",
value: 59.98,
payment_type: "Credit Card",
items: [...]
}
});
```
### 9. Purchase
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
event: "purchase",
ecommerce: {
transaction_id: "T_12345",
value: 65.97,
tax: 4.99,
shipping: 5.99,
currency: "USD",
coupon: "DISCOUNT10",
items: [{
item_id: "SKU_001",
item_name: "Product 1",
affiliation: "Online Store",
coupon: "DISCOUNT10",
discount: 3.00,
item_brand: "Brand",
item_category: "Category",
price: 29.99,
quantity: 1
}]
}
});
```
## Korean E-commerce Considerations
### Currency
```javascript
currency: "KRW",
value: 35000 // No decimals for KRW
```
### Common Korean Platform Integrations
- Cafe24: Uses custom dataLayer structure
- Shopify Korea: Standard GA4 format
- Naver SmartStore: Custom pixel implementation

# Form Tracking Reference
## GA4 Form Events
### form_start
Fires on first interaction with form field.
```javascript
dataLayer.push({
event: "form_start",
form_id: "contact-form",
form_name: "Contact Us",
form_destination: "/submit-contact"
});
```
### form_submit
Fires on successful form submission.
```javascript
dataLayer.push({
event: "form_submit",
form_id: "contact-form",
form_name: "Contact Us",
form_destination: "/submit-contact",
form_submit_text: "Send Message"
});
```
### generate_lead
Fires when form generates a qualified lead.
```javascript
dataLayer.push({
event: "generate_lead",
currency: "USD",
value: 100, // Estimated lead value
form_id: "quote-request"
});
```
## Form Field Events (Custom)
### field_focus
```javascript
dataLayer.push({
event: "field_focus",
form_id: "signup-form",
field_name: "email",
field_type: "email"
});
```
### field_complete
```javascript
dataLayer.push({
event: "field_complete",
form_id: "signup-form",
field_name: "email",
field_type: "email",
is_valid: true
});
```
### field_error
```javascript
dataLayer.push({
event: "field_error",
form_id: "signup-form",
field_name: "email",
error_message: "Invalid email format"
});
```
## Form Abandonment Tracking
### Detecting Abandonment
Track when user leaves form without submitting:
```javascript
// Track form start and field progress
let formStarted = false;
let formSubmitted = false;
let lastFormId = null;
let lastFocusedField = null;
let completedFieldCount = 0;

document.querySelectorAll('form input, form select, form textarea')
  .forEach(field => {
    field.addEventListener('focus', function() {
      lastFormId = this.form ? this.form.id : null;
      lastFocusedField = this.name || this.id;
      if (!formStarted) {
        formStarted = true;
        dataLayer.push({ event: 'form_start', form_id: lastFormId });
      }
    });
    field.addEventListener('change', function() {
      if (this.value) completedFieldCount++;
    });
  });

document.querySelectorAll('form').forEach(form => {
  form.addEventListener('submit', () => { formSubmitted = true; });
});

// Track abandonment on page leave
window.addEventListener('beforeunload', function() {
  if (formStarted && !formSubmitted) {
    dataLayer.push({
      event: 'form_abandon',
      form_id: lastFormId,
      last_field: lastFocusedField,
      fields_completed: completedFieldCount
    });
  }
});
```
## GTM Trigger Configuration
### Form Submission Trigger
| Setting | Value |
|---------|-------|
| Trigger Type | Form Submission |
| Wait for Tags | Check (if AJAX form) |
| Check Validation | Check |
| Form ID | equals `contact-form` |
### Form Start Trigger (Custom Event)
| Setting | Value |
|---------|-------|
| Trigger Type | Custom Event |
| Event Name | form_start |
| Fire On | All Custom Events |
## Common Form Types & Tracking
### Contact Forms
Events: `form_start`, `form_submit`, `generate_lead`
### Newsletter Signup
Events: `form_start`, `form_submit`, `sign_up`
### Login Forms
Events: `form_start`, `login`
### Search Forms
Events: `search` (with search_term parameter)
### Multi-Step Forms
Track each step:
```javascript
dataLayer.push({
event: "form_step",
form_id: "checkout-form",
step_number: 2,
step_name: "Shipping Address"
});
```
## Validation Checklist
- [ ] Form has id or name attribute
- [ ] All required fields have names
- [ ] Submit button identifiable
- [ ] form_start fires on first interaction
- [ ] form_submit fires only on success
- [ ] generate_lead has value parameter
- [ ] Error events track validation failures
- [ ] Abandonment tracking implemented (optional)

# GA4 Recommended Events Reference
## Automatically Collected Events
Events GA4 collects without configuration:
- `first_visit` - First time user visits
- `session_start` - Session begins
- `page_view` - Page loads (enhanced measurement)
- `scroll` - 90% scroll depth
- `click` - Outbound link clicks
- `file_download` - File download clicks
- `video_start`, `video_progress`, `video_complete` - YouTube embeds
## E-commerce Events (Required Parameters)
### view_item_list
```javascript
{
event: "view_item_list",
ecommerce: {
item_list_id: "related_products",
item_list_name: "Related Products",
items: [{
item_id: "SKU_12345", // required
item_name: "Product Name", // required
price: 29.99,
quantity: 1
}]
}
}
```
### view_item
```javascript
{
event: "view_item",
ecommerce: {
currency: "USD",
value: 29.99,
items: [{
item_id: "SKU_12345", // required
item_name: "Product Name", // required
price: 29.99,
quantity: 1
}]
}
}
```
### add_to_cart
```javascript
{
event: "add_to_cart",
ecommerce: {
currency: "USD",
value: 29.99,
items: [{
item_id: "SKU_12345", // required
item_name: "Product Name", // required
price: 29.99,
quantity: 1
}]
}
}
```
### begin_checkout
```javascript
{
event: "begin_checkout",
ecommerce: {
currency: "USD",
value: 99.99,
coupon: "SUMMER_SALE",
items: [...]
}
}
```
### add_payment_info
```javascript
{
event: "add_payment_info",
ecommerce: {
currency: "USD",
value: 99.99,
payment_type: "credit_card",
items: [...]
}
}
```
### purchase
```javascript
{
event: "purchase",
ecommerce: {
transaction_id: "T12345", // required, must be unique
value: 99.99, // required
currency: "USD", // required
tax: 4.99,
shipping: 5.99,
coupon: "SUMMER_SALE",
items: [{
item_id: "SKU_12345", // required
      item_name: "Product Name", // required
price: 29.99,
quantity: 2
}]
}
}
```
## Lead Generation Events
### generate_lead
```javascript
{
event: "generate_lead",
currency: "USD",
value: 100 // estimated lead value
}
```
### sign_up
```javascript
{
event: "sign_up",
method: "email" // or "google", "facebook", etc.
}
```
### login
```javascript
{
event: "login",
method: "email"
}
```
## Engagement Events
### search
```javascript
{
event: "search",
search_term: "blue shoes"
}
```
### share
```javascript
{
event: "share",
method: "twitter",
content_type: "article",
item_id: "article_123"
}
```
## Parameter Validation Rules
| Parameter | Type | Max Length | Notes |
|-----------|------|------------|-------|
| event name | string | 40 chars | No spaces, alphanumeric + underscore |
| item_id | string | 100 chars | Required for e-commerce |
| item_name | string | 100 chars | Required for e-commerce |
| currency | string | 3 chars | ISO 4217 format (USD, KRW, etc.) |
| transaction_id | string | 100 chars | Must be unique per transaction |
| value | number | - | Numeric, no currency symbols |
## Common Validation Errors
1. **Missing required params**: `item_id` or `item_name` not in items array
2. **Wrong data type**: `value` as string instead of number
3. **Duplicate transaction_id**: Same ID used for multiple purchases
4. **Empty items array**: E-commerce event with no items
5. **Invalid currency**: Currency code not in ISO 4217 format
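A sketch that checks a single event object for errors 1, 2, 4, and 5 above (duplicate `transaction_id` detection needs state across events, so it is left out):

```javascript
// Check one ecommerce event object for the common errors listed above.
function findCommonErrors(evt) {
  const errors = [];
  const ec = evt.ecommerce || {};
  if (typeof ec.value === 'string') errors.push('value is a string, not a number');
  if (ec.currency && !/^[A-Z]{3}$/.test(ec.currency)) errors.push('currency not ISO 4217 format');
  if (Array.isArray(ec.items)) {
    if (ec.items.length === 0) errors.push('empty items array');
    ec.items.forEach((item, i) => {
      if (!item.item_id) errors.push(`items[${i}] missing item_id`);
      if (!item.item_name) errors.push(`items[${i}] missing item_name`);
    });
  }
  return errors;
}
```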
