directory changes and restructuring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

89  custom-skills/01-notion-organizer/code/CLAUDE.md  (new file)

@@ -0,0 +1,89 @@
# CLAUDE.md

## Overview

Notion workspace management toolkit for database organization, schema migration, and bulk operations.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Schema migration
python scripts/schema_migrator.py --source-db [DB_ID] --target-db [DB_ID] --mapping mapping.json --dry-run

# Async bulk operations
python scripts/async_organizer.py --database-id [DB_ID] --operation audit
```

## Scripts

| Script | Purpose |
|--------|---------|
| `schema_migrator.py` | Migrate data between databases with property mapping |
| `async_organizer.py` | Async bulk operations (database audit, bulk status updates; extensible template) |

## Schema Migrator

```bash
# Dry run (preview changes)
python scripts/schema_migrator.py \
    --source-db abc123 \
    --target-db def456 \
    --mapping mapping.json \
    --dry-run

# Execute migration
python scripts/schema_migrator.py \
    --source-db abc123 \
    --target-db def456 \
    --mapping mapping.json
```

### Mapping File Format

Each key is a source property name; `target` names the destination property, and `value_mapping` optionally rewrites individual values:

```json
{
  "OldName": { "target": "NewName", "value_mapping": null },
  "Status": { "target": "Status", "value_mapping": { "Done": "Complete" } }
}
```

## Async Organizer

```bash
# Audit database structure
python scripts/async_organizer.py --database-id [ID] --operation audit

# Bulk status update (dry run first)
python scripts/async_organizer.py --database-id [ID] --operation status-update \
    --old-status "Active" --new-status "Archived" --dry-run
```

## Rate Limits

| Limit | Value |
|-------|-------|
| Requests/second | 3 max |
| Items per request | 100 max |
| Retry on 429 | Exponential backoff |

## Configuration

Environment variables:

```bash
NOTION_TOKEN=secret_xxx
```

## Notes

- Always use `--dry-run` first for destructive operations
- Large operations (1000+ pages) run async with progress reporting
- Scripts implement automatic rate limiting

250  custom-skills/01-notion-organizer/code/references/reference.md  (new file)

@@ -0,0 +1,250 @@

# Notion Organizer Reference

## Notion API Fundamentals

### Base Configuration

- **Base URL**: `https://api.notion.com`
- **Current Version**: `2022-06-28`
- **Authentication**: Bearer token in the `Authorization` header

### Rate Limits

| Limit | Value | Strategy |
|-------|-------|----------|
| Requests/second | 3 (average) | Use throttling/semaphore |
| Burst allowed | Small bursts | Implement exponential backoff |
| Page size | 100 items max | Use pagination cursors |
| Payload size | 500 KB max | Split large operations |

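The throttling and backoff strategy above can be sketched as a small retry wrapper. This is a minimal synchronous illustration, not the project's actual client code: `RateLimitedError` stands in for an HTTP 429 response, and the delay schedule is illustrative.

```python
import time


class RateLimitedError(Exception):
    """Stand-in for an HTTP 429 response; carries the Retry-After value."""

    def __init__(self, retry_after: float):
        super().__init__("rate_limited")
        self.retry_after = retry_after


def call_with_backoff(fn, max_attempts: int = 4, base_delay: float = 1.0):
    """Call fn(); on rate limiting, wait max(Retry-After, exponential delay) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError as exc:
            if attempt == max_attempts - 1:
                raise
            # Honor the server's Retry-After, but never back off less than the
            # exponential schedule suggests.
            delay = max(exc.retry_after, base_delay * 2 ** attempt)
            time.sleep(delay)
```

The key point is that `Retry-After` is a floor, not the whole policy: repeated 429s should still widen the gap between attempts.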
### Core Object Hierarchy

```
Workspace
└── Database (container)
    └── Page (row)
        └── Block (content)
```

### Property Types Reference

| Type | Use Case | Notes |
|------|----------|-------|
| `title` | Page name | Required, one per database |
| `rich_text` | Text content | Max 2,000 chars |
| `number` | Numeric values | Supports format options |
| `select` | Single choice | Define options array |
| `multi_select` | Multiple choices | Define options array |
| `status` | Workflow states | Groups: To-do, In progress, Complete |
| `date` | Dates/times | ISO 8601 format |
| `checkbox` | Boolean | true/false |
| `url` | Links | Max 2,000 chars |
| `email` | Email addresses | Validation applied |
| `phone_number` | Phone | String format |
| `relation` | Links to pages | Requires database_id |
| `rollup` | Aggregated data | Requires relation + function |
| `formula` | Computed values | Expression syntax |
| `files` | Attachments | External URLs or Notion-hosted |
| `people` | User references | Notion user IDs |
| `created_time` | Auto timestamp | Read-only |
| `created_by` | Auto user | Read-only |
| `last_edited_time` | Auto timestamp | Read-only |
| `last_edited_by` | Auto user | Read-only |

### Size Limits

| Element | Limit |
|---------|-------|
| Rich text content | 2,000 characters |
| URL length | 2,000 characters |
| Array elements | 100 items |
| Page properties | 100 per page |
| Database properties | 100 per database |

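The 2,000-character rich-text limit above means long strings must be split into multiple rich-text objects before writing. A minimal sketch; the helper name is ours, and the object shape follows the `{"text": {"content": ...}}` form used elsewhere in this toolkit:

```python
def to_rich_text(content: str, chunk_size: int = 2000) -> list[dict]:
    """Split a long string into Notion rich_text objects of at most chunk_size chars."""
    return [
        {"text": {"content": content[i:i + chunk_size]}}
        for i in range(0, len(content), chunk_size)
    ]
```

Passing the resulting list as a `rich_text` property value keeps each element under the per-object limit while preserving the full text.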
### Error Codes

| Code | Status | Action |
|------|--------|--------|
| `rate_limited` | 429 | Wait the number of seconds in the `Retry-After` header |
| `validation_error` | 400 | Check request body format |
| `object_not_found` | 404 | Verify sharing/permissions |
| `unauthorized` | 401 | Check API token validity |
| `conflict_error` | 409 | Resource was modified; refetch and retry |
| `internal_server_error` | 500 | Retry with backoff |

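A small dispatch table keeps the handling consistent across scripts. The code strings match the table above; the retryable-versus-fatal split is our reading of the "Action" column, not something the API mandates:

```python
# Transient conditions worth retrying with backoff.
RETRYABLE = {"rate_limited", "conflict_error", "internal_server_error"}
# Caller-side problems: retrying the same request cannot help.
FATAL = {"validation_error", "object_not_found", "unauthorized"}


def should_retry(error_code: str) -> bool:
    """Return True for transient errors, False for caller bugs; reject unknown codes."""
    if error_code in RETRYABLE:
        return True
    if error_code in FATAL:
        return False
    raise ValueError(f"unknown Notion error code: {error_code}")
```

Failing loudly on unknown codes is deliberate: silently retrying an unrecognized error tends to hide real bugs.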
---

## Workflow Patterns

### Pattern 1: Database Audit

**Purpose**: Analyze database structure and recommend optimizations

**Steps**:
1. Fetch database schema via MCP or API
2. Analyze property types, naming conventions, usage
3. Identify issues:
   - Unused properties
   - Inconsistent naming
   - Suboptimal property types
   - Missing relations
4. Present recommendations with rationale
5. Execute approved changes incrementally

**Example Query**:
```
Audit my "Projects" database:
- Check for unused properties
- Identify naming inconsistencies
- Recommend schema optimizations
```

### Pattern 2: Bulk Reorganization

**Purpose**: Move/update many pages efficiently

**Decision Tree**:
- 50 operations or fewer → Use MCP tools with staged execution
- More than 50 operations → Generate a Python script

**Steps**:
1. Assess scope (count affected pages)
2. Estimate API calls and time
3. Choose execution method (MCP vs Python)
4. Execute with progress updates
5. Generate summary report

**Example Query**:
```
Move all pages with status "Archived" from "Active Projects"
to "Archive" database, preserving the Project Name and Date properties
```

### Pattern 3: Schema Migration

**Purpose**: Transfer data between databases with different schemas

**Steps**:
1. Fetch source database schema
2. Fetch target database schema
3. Create property mapping plan:
   - Direct mappings (same type)
   - Transformations needed (type conversion)
   - Unmappable properties (manual handling)
4. Validate compatibility
5. Execute migration:
   - MCP for small datasets
   - Python for large datasets
6. Verify data integrity

**Property Mapping Template**:
```
Source Property  → Target Property (Transformation)
───────────────────────────────────────────────────
Name (title)     → Project Name (title)   [Direct]
Status (select)  → Stage (status)         [Map values]
Due Date (date)  → Deadline (date)        [Direct]
Tags (multi)     → Categories (multi)     [Merge options]
Notes (text)     → Description (text)     [Direct]
Owner (text)     → Assignee (people)      [Manual]
```

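The template above can be written out in the `mapping.json` format that `schema_migrator.py` reads: one entry per source property, with a `target` name and an optional `value_mapping` that rewrites individual option values. The concrete status values here are illustrative:

```python
import json

# Mapping for the template above. value_mapping rewrites individual values;
# null/None means the value passes through unchanged.
mapping = {
    "Name":     {"target": "Project Name", "value_mapping": None},
    "Status":   {"target": "Stage", "value_mapping": {"Done": "Complete"}},
    "Due Date": {"target": "Deadline", "value_mapping": None},
    "Tags":     {"target": "Categories", "value_mapping": None},
    "Notes":    {"target": "Description", "value_mapping": None},
    # "Owner" (text) → "Assignee" (people) needs manual handling; omit it here.
}

with open("mapping.json", "w", encoding="utf-8") as f:
    json.dump(mapping, f, indent=2)
```

Properties that need manual handling (like the text-to-people conversion) are simply left out of the file and handled separately.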
### Pattern 4: Property Cleanup

**Purpose**: Standardize properties across databases

**Common Tasks**:
- Rename properties to a consistent convention (camelCase, snake_case, Title Case)
- Consolidate duplicate select/multi-select options
- Remove unused properties
- Add missing required properties

**Naming Convention Guide**:
```
Recommended: Title Case with spaces
Examples: "Project Name", "Due Date", "Status", "Assigned To"

Alternative: camelCase (for technical databases)
Examples: "projectName", "dueDate", "status", "assignedTo"
```

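A rename pass needs a deterministic way to produce the recommended Title Case form from whatever convention a property currently uses. A sketch; the splitting heuristics are ours:

```python
import re


def to_title_case(name: str) -> str:
    """Normalize camelCase / snake_case / kebab-case property names to Title Case."""
    # Insert a space before each interior capital, then split on _, -, and spaces.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    words = re.split(r"[_\-\s]+", spaced)
    return " ".join(w.capitalize() for w in words if w)
```

Running this over every property name and diffing against the originals yields the rename list for the cleanup.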
### Pattern 5: Duplicate Detection

**Purpose**: Find and handle duplicate or similar content

**Detection Strategies**:
1. Exact title match
2. Fuzzy title similarity (Levenshtein distance)
3. Property combination match (e.g., same name + date)
4. Content hash comparison

**Resolution Options**:
- Merge: Combine properties from duplicates
- Archive: Move the older duplicate to an archive
- Delete: Remove with user confirmation
- Link: Create a relation between related items

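Strategy 2 can be prototyped with the standard library before reaching for `rapidfuzz` (listed as an optional dependency in `requirements.txt`). Note that `difflib.SequenceMatcher` gives a similarity ratio rather than a true Levenshtein distance, and the 0.85 threshold is an assumption to tune:

```python
from difflib import SequenceMatcher


def find_title_duplicates(titles: list[str], threshold: float = 0.85) -> list[tuple[str, str]]:
    """Return pairs of titles whose case-insensitive similarity meets the threshold."""
    pairs = []
    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                pairs.append((a, b))
    return pairs
```

The O(n²) pairwise comparison is fine for a few hundred pages; larger databases would want blocking (e.g., compare only within the same property-combination bucket from strategy 3).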
---

## MCP Tool Usage Examples

### Search for Pages
```
Use mcp__notion__search to find:
- Query: "marketing campaign"
- Filter: database_id = "abc123"
```

### Query Database with Filters
```
Use mcp__notion__query-database:
- Database ID: "abc123"
- Filter: { "property": "Status", "select": { "equals": "Active" } }
- Sorts: [{ "property": "Created", "direction": "descending" }]
```

### Update Page Properties
```
Use mcp__notion__update-page:
- Page ID: "xyz789"
- Properties: {
    "Status": { "select": { "name": "Completed" } },
    "Completed Date": { "date": { "start": "2025-12-05" } }
  }
```

### Create New Page
```
Use mcp__notion__create-page:
- Parent: { "database_id": "abc123" }
- Properties: {
    "Name": { "title": [{ "text": { "content": "New Project" } }] },
    "Status": { "select": { "name": "Planning" } }
  }
```

---

## Best Practices

### 1. Always Fetch Before Modify
Never assume database structure. Always retrieve the current schema first.

### 2. Batch Operations Wisely
- Group related updates
- Use pagination for queries
- Implement checkpoints for large operations

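Checkpointing for large operations can be as simple as persisting the IDs already processed, so an interrupted run resumes where it stopped. The file name and format here are illustrative:

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")


def load_done() -> set[str]:
    """Load the set of already-processed page IDs, or an empty set on first run."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()


def mark_done(done: set[str], page_id: str) -> None:
    """Record one more processed page and flush the checkpoint to disk."""
    done.add(page_id)
    CHECKPOINT.write_text(json.dumps(sorted(done)))
```

The batch loop then skips any page whose ID is already in `load_done()` before issuing API calls.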
### 3. Handle Relations Carefully
- Relations require both databases to be accessible
- Synced databases need special handling
- Rollups depend on relations, so update order matters

### 4. Preserve Data Integrity
- Back up critical data before major changes
- Use transactions where possible
- Verify changes after execution

### 5. Respect User Permissions
- Check that the integration has access to target resources
- Request additional permissions when needed
- Document permission requirements

custom-skills/01-notion-organizer/code/scripts/async_organizer.py  (new file)

@@ -0,0 +1,331 @@
"""
Notion Async Organizer - Base Template
======================================
Purpose: Rate-limited async operations for Notion API
Python: 3.10+
Packages: notion-client, tenacity, tqdm, python-dotenv

Usage:
    python async_organizer.py --database-id <id> [--dry-run]
"""

import asyncio
import argparse
import logging
import os
from asyncio import Semaphore
from datetime import datetime

from dotenv import load_dotenv
from notion_client import AsyncClient
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)
from tqdm.asyncio import tqdm

# Load environment variables
load_dotenv()

# Configuration
NOTION_API_KEY = os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
MAX_CONCURRENT_REQUESTS = 3
REQUEST_DELAY = 0.35  # ~3 requests/second

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler(f"notion_organizer_{datetime.now():%Y%m%d_%H%M%S}.log"),
    ],
)
logger = logging.getLogger(__name__)


class NotionAsyncOrganizer:
    """Async Notion operations with rate limiting and retry logic."""

    def __init__(self, api_key: str, dry_run: bool = False):
        self.client = AsyncClient(auth=api_key)
        self.semaphore = Semaphore(MAX_CONCURRENT_REQUESTS)
        self.dry_run = dry_run
        self.stats = {"fetched": 0, "updated": 0, "created": 0, "errors": 0}

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(self, request_fn):
        """Execute a request with rate limiting and retry.

        Takes a zero-argument callable that returns a fresh coroutine, because
        a coroutine object cannot be awaited twice when tenacity retries.
        """
        async with self.semaphore:
            await asyncio.sleep(REQUEST_DELAY)
            return await request_fn()

    async def fetch_database_schema(self, database_id: str) -> dict:
        """Fetch database schema/properties."""
        logger.info(f"Fetching database schema: {database_id}")
        response = await self._rate_limited_request(
            lambda: self.client.databases.retrieve(database_id=database_id)
        )
        self.stats["fetched"] += 1
        return response

    async def fetch_all_pages(
        self,
        database_id: str,
        filter_obj: dict | None = None,
        sorts: list | None = None,
    ) -> list[dict]:
        """Fetch all pages from a database with pagination."""
        all_pages = []
        has_more = True
        start_cursor = None

        logger.info(f"Fetching pages from database: {database_id}")

        while has_more:
            query_params = {
                "database_id": database_id,
                "page_size": 100,
            }
            if start_cursor:
                query_params["start_cursor"] = start_cursor
            if filter_obj:
                query_params["filter"] = filter_obj
            if sorts:
                query_params["sorts"] = sorts

            response = await self._rate_limited_request(
                lambda: self.client.databases.query(**query_params)
            )

            all_pages.extend(response["results"])
            has_more = response.get("has_more", False)
            start_cursor = response.get("next_cursor")
            self.stats["fetched"] += len(response["results"])

            logger.info(f"Fetched {len(all_pages)} pages so far...")

        return all_pages

    async def update_page(self, page_id: str, properties: dict) -> dict | None:
        """Update a single page's properties."""
        if self.dry_run:
            logger.info(f"[DRY-RUN] Would update page {page_id}: {properties}")
            return None

        try:
            result = await self._rate_limited_request(
                lambda: self.client.pages.update(page_id=page_id, properties=properties)
            )
            self.stats["updated"] += 1
            return result
        except Exception as e:
            self.stats["errors"] += 1
            logger.error(f"Failed to update page {page_id}: {e}")
            raise

    async def batch_update_pages(
        self, updates: list[dict], desc: str = "Updating pages"
    ) -> list[dict]:
        """Update multiple pages concurrently with progress bar."""
        results = []

        async def update_single(update: dict) -> dict:
            try:
                result = await self.update_page(
                    update["page_id"], update["properties"]
                )
                return {"page_id": update["page_id"], "success": True, "result": result}
            except Exception as e:
                return {"page_id": update["page_id"], "success": False, "error": str(e)}

        tasks = [update_single(u) for u in updates]

        for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
            result = await coro
            results.append(result)

        success_count = sum(1 for r in results if r["success"])
        logger.info(f"Batch update complete: {success_count}/{len(updates)} succeeded")

        return results

    async def create_page(
        self, parent: dict, properties: dict, children: list | None = None
    ) -> dict | None:
        """Create a new page."""
        if self.dry_run:
            logger.info(f"[DRY-RUN] Would create page: {properties}")
            return None

        try:
            create_params = {"parent": parent, "properties": properties}
            if children:
                create_params["children"] = children

            result = await self._rate_limited_request(
                lambda: self.client.pages.create(**create_params)
            )
            self.stats["created"] += 1
            return result
        except Exception as e:
            self.stats["errors"] += 1
            logger.error(f"Failed to create page: {e}")
            raise

    async def search(
        self, query: str, filter_type: str | None = None
    ) -> list[dict]:
        """Search Notion workspace."""
        all_results = []
        has_more = True
        start_cursor = None

        while has_more:
            search_params = {"query": query, "page_size": 100}
            if start_cursor:
                search_params["start_cursor"] = start_cursor
            if filter_type:
                search_params["filter"] = {"property": "object", "value": filter_type}

            response = await self._rate_limited_request(
                lambda: self.client.search(**search_params)
            )

            all_results.extend(response["results"])
            has_more = response.get("has_more", False)
            start_cursor = response.get("next_cursor")

        return all_results

    def print_stats(self):
        """Print operation statistics."""
        logger.info("=" * 50)
        logger.info("Operation Statistics:")
        logger.info(f"  Fetched: {self.stats['fetched']}")
        logger.info(f"  Updated: {self.stats['updated']}")
        logger.info(f"  Created: {self.stats['created']}")
        logger.info(f"  Errors:  {self.stats['errors']}")
        logger.info("=" * 50)

    async def close(self):
        """Close the client connection."""
        await self.client.aclose()


# ============================================================
# Example Operations - Customize these for your specific task
# ============================================================


async def example_audit_database(organizer: NotionAsyncOrganizer, database_id: str):
    """Example: Audit a database and report on its structure."""
    schema = await organizer.fetch_database_schema(database_id)

    title_parts = schema.get("title") or []
    title = title_parts[0].get("plain_text", "Untitled") if title_parts else "Untitled"

    print("\n" + "=" * 60)
    print(f"Database: {title}")
    print("=" * 60)

    properties = schema.get("properties", {})
    print(f"\nTotal Properties: {len(properties)}\n")

    for name, prop in properties.items():
        prop_type = prop.get("type", "unknown")
        print(f"  - {name}: {prop_type}")

        # Show select/multi_select options
        if prop_type in ("select", "multi_select"):
            options = prop.get(prop_type, {}).get("options", [])
            if options:
                option_names = [o["name"] for o in options[:5]]
                suffix = f" (+{len(options) - 5} more)" if len(options) > 5 else ""
                print(f"    Options: {', '.join(option_names)}{suffix}")

    return schema


async def example_bulk_status_update(
    organizer: NotionAsyncOrganizer,
    database_id: str,
    old_status: str,
    new_status: str,
):
    """Example: Update status for all pages matching a filter."""
    # Fetch pages with old status
    filter_obj = {"property": "Status", "select": {"equals": old_status}}
    pages = await organizer.fetch_all_pages(database_id, filter_obj=filter_obj)

    print(f"\nFound {len(pages)} pages with status '{old_status}'")

    if not pages:
        return

    # Prepare updates
    updates = [
        {
            "page_id": page["id"],
            "properties": {"Status": {"select": {"name": new_status}}},
        }
        for page in pages
    ]

    # Execute batch update
    results = await organizer.batch_update_pages(
        updates, desc=f"Updating status to '{new_status}'"
    )

    return results


async def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(description="Notion Async Organizer")
    parser.add_argument("--database-id", "-d", required=True, help="Database ID")
    parser.add_argument(
        "--dry-run", action="store_true", help="Preview changes without executing"
    )
    parser.add_argument(
        "--operation",
        "-o",
        choices=["audit", "status-update"],
        default="audit",
        help="Operation to perform",
    )
    parser.add_argument("--old-status", help="Old status value (for status-update)")
    parser.add_argument("--new-status", help="New status value (for status-update)")

    args = parser.parse_args()

    if not NOTION_API_KEY:
        logger.error("NOTION_TOKEN or NOTION_API_KEY environment variable not set")
        return

    organizer = NotionAsyncOrganizer(NOTION_API_KEY, dry_run=args.dry_run)

    try:
        if args.operation == "audit":
            await example_audit_database(organizer, args.database_id)

        elif args.operation == "status-update":
            if not args.old_status or not args.new_status:
                logger.error("--old-status and --new-status required for status-update")
                return
            await example_bulk_status_update(
                organizer, args.database_id, args.old_status, args.new_status
            )

        organizer.print_stats()

    finally:
        await organizer.close()


if __name__ == "__main__":
    asyncio.run(main())

custom-skills/01-notion-organizer/code/scripts/requirements.txt  (new file)

@@ -0,0 +1,26 @@
# Notion Organizer Scripts - Requirements
# Python 3.10+ required

# Notion API client
notion-client==2.2.1

# Async HTTP
aiohttp==3.9.1

# Rate limiting
asyncio-throttle==1.0.2

# Environment variables
python-dotenv==1.0.0

# Retry logic
tenacity==8.2.3

# Progress bars
tqdm==4.66.1

# Optional: Data analysis
# pandas==2.1.4

# Optional: Fuzzy matching for duplicates
# rapidfuzz==3.5.2

custom-skills/01-notion-organizer/code/scripts/schema_migrator.py  (new file)

@@ -0,0 +1,367 @@
"""
Notion Schema Migrator
======================
Purpose: Migrate data between Notion databases with schema mapping
Python: 3.10+
Packages: notion-client, tenacity, tqdm, python-dotenv

Usage:
    python schema_migrator.py \
        --source-db <source_database_id> \
        --target-db <target_database_id> \
        --mapping mapping.json \
        [--dry-run]
"""

import asyncio
import argparse
import json
import logging
import os
from asyncio import Semaphore

from dotenv import load_dotenv
from notion_client import AsyncClient
from tenacity import retry, stop_after_attempt, wait_exponential
from tqdm.asyncio import tqdm

load_dotenv()

NOTION_API_KEY = os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
MAX_CONCURRENT_REQUESTS = 3
REQUEST_DELAY = 0.35

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


class SchemaMigrator:
    """Migrate data between Notion databases with property mapping."""

    def __init__(self, api_key: str, dry_run: bool = False):
        self.client = AsyncClient(auth=api_key)
        self.semaphore = Semaphore(MAX_CONCURRENT_REQUESTS)
        self.dry_run = dry_run
        self.stats = {
            "pages_fetched": 0,
            "pages_migrated": 0,
            "pages_skipped": 0,
            "errors": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
    )
    async def _request(self, request_fn):
        # Takes a zero-argument callable that returns a fresh coroutine, so the
        # request can be re-issued when tenacity retries.
        async with self.semaphore:
            await asyncio.sleep(REQUEST_DELAY)
            return await request_fn()

    async def get_schema(self, database_id: str) -> dict:
        """Get database schema."""
        return await self._request(
            lambda: self.client.databases.retrieve(database_id=database_id)
        )

    async def fetch_all_pages(self, database_id: str) -> list[dict]:
        """Fetch all pages from source database."""
        pages = []
        has_more = True
        cursor = None

        while has_more:
            params = {"database_id": database_id, "page_size": 100}
            if cursor:
                params["start_cursor"] = cursor

            response = await self._request(
                lambda: self.client.databases.query(**params)
            )
            pages.extend(response["results"])
            has_more = response.get("has_more", False)
            cursor = response.get("next_cursor")
            self.stats["pages_fetched"] = len(pages)
            logger.info(f"Fetched {len(pages)} pages...")

        return pages

    def transform_property(
        self,
        value: dict,
        source_type: str,
        target_type: str,
        value_mapping: dict | None = None,
    ) -> dict | None:
        """Transform a property value from source to target type."""

        # Extract raw value based on source type
        raw_value = None

        if source_type == "title":
            raw_value = "".join(
                t.get("plain_text", "") for t in value.get("title", [])
            )
        elif source_type == "rich_text":
            raw_value = "".join(
                t.get("plain_text", "") for t in value.get("rich_text", [])
            )
        elif source_type == "number":
            raw_value = value.get("number")
        elif source_type == "select":
            select_val = value.get("select")
            raw_value = select_val.get("name") if select_val else None
        elif source_type == "multi_select":
            raw_value = [o.get("name") for o in value.get("multi_select", [])]
        elif source_type == "status":
            status_val = value.get("status")
            raw_value = status_val.get("name") if status_val else None
        elif source_type == "date":
            raw_value = value.get("date")
        elif source_type == "checkbox":
            raw_value = value.get("checkbox")
        elif source_type == "url":
            raw_value = value.get("url")
        elif source_type == "email":
            raw_value = value.get("email")
        elif source_type == "phone_number":
            raw_value = value.get("phone_number")

        if raw_value is None:
            return None

        # Apply value mapping if provided
        if value_mapping and isinstance(raw_value, str):
            raw_value = value_mapping.get(raw_value, raw_value)
        elif value_mapping and isinstance(raw_value, list):
            raw_value = [value_mapping.get(v, v) for v in raw_value]

        # Convert to target type
        if target_type == "title":
            return {"title": [{"text": {"content": str(raw_value)}}]}
        elif target_type == "rich_text":
            return {"rich_text": [{"text": {"content": str(raw_value)}}]}
        elif target_type == "number":
            try:
                # `is not None` so a legitimate 0 is preserved
                return {"number": float(raw_value) if raw_value is not None else None}
            except (ValueError, TypeError):
                return None
        elif target_type == "select":
            return {"select": {"name": str(raw_value)}} if raw_value else None
        elif target_type == "multi_select":
            if isinstance(raw_value, list):
                return {"multi_select": [{"name": v} for v in raw_value]}
            return {"multi_select": [{"name": str(raw_value)}]}
        elif target_type == "status":
            return {"status": {"name": str(raw_value)}} if raw_value else None
        elif target_type == "date":
            return {"date": raw_value} if raw_value else None
        elif target_type == "checkbox":
            return {"checkbox": bool(raw_value)}
        elif target_type == "url":
            return {"url": str(raw_value)} if raw_value else None
        elif target_type == "email":
            return {"email": str(raw_value)} if raw_value else None
        elif target_type == "phone_number":
            return {"phone_number": str(raw_value)} if raw_value else None

        return None

    def map_page_properties(
        self,
        source_page: dict,
        mapping: dict,
        source_schema: dict,
        target_schema: dict,
    ) -> dict:
        """Map source page properties to target schema."""
        source_props = source_page.get("properties", {})
        target_props = {}

        for source_name, mapping_config in mapping.items():
            if source_name not in source_props:
                continue

            target_name = mapping_config.get("target", source_name)
            value_mapping = mapping_config.get("value_mapping")

            source_type = source_schema["properties"].get(source_name, {}).get("type")
            target_type = target_schema["properties"].get(target_name, {}).get("type")

            if not source_type or not target_type:
                logger.warning(
                    f"Skipping {source_name}: source_type={source_type}, target_type={target_type}"
                )
                continue

            transformed = self.transform_property(
                source_props[source_name],
                source_type,
                target_type,
                value_mapping,
            )

            if transformed:
                target_props[target_name] = transformed

        return target_props

    async def migrate_page(
        self,
        page: dict,
        target_database_id: str,
        mapping: dict,
        source_schema: dict,
        target_schema: dict,
    ) -> dict:
        """Migrate a single page to target database."""
        page_id = page["id"]

        try:
            properties = self.map_page_properties(
                page, mapping, source_schema, target_schema
            )

            if not properties:
                self.stats["pages_skipped"] += 1
                return {"page_id": page_id, "success": False, "reason": "no_properties"}

            if self.dry_run:
                logger.debug(f"[DRY-RUN] Would create: {properties}")
                return {"page_id": page_id, "success": True, "dry_run": True}

            result = await self._request(
                lambda: self.client.pages.create(
                    parent={"database_id": target_database_id},
                    properties=properties,
                )
            )
            self.stats["pages_migrated"] += 1
            return {"page_id": page_id, "success": True, "new_page_id": result["id"]}

        except Exception as e:
            self.stats["errors"] += 1
            logger.error(f"Failed to migrate page {page_id}: {e}")
            return {"page_id": page_id, "success": False, "error": str(e)}

    async def migrate(
        self,
        source_db: str,
        target_db: str,
        mapping: dict,
    ) -> list[dict]:
        """Execute full migration."""
        logger.info("Fetching schemas...")
        source_schema = await self.get_schema(source_db)
        target_schema = await self.get_schema(target_db)

        logger.info(f"Source: {len(source_schema['properties'])} properties")
        logger.info(f"Target: {len(target_schema['properties'])} properties")

        logger.info("Fetching source pages...")
        pages = await self.fetch_all_pages(source_db)
        logger.info(f"Found {len(pages)} pages to migrate")

        results = []
        for page in tqdm(pages, desc="Migrating"):
            result = await self.migrate_page(
                page, target_db, mapping, source_schema, target_schema
            )
            results.append(result)

        return results

    def print_stats(self):
        logger.info("=" * 50)
        logger.info("Migration Statistics:")
        logger.info(f"  Pages Fetched:  {self.stats['pages_fetched']}")
        logger.info(f"  Pages Migrated: {self.stats['pages_migrated']}")
        logger.info(f"  Pages Skipped:  {self.stats['pages_skipped']}")
        logger.info(f"  Errors:         {self.stats['errors']}")
        logger.info("=" * 50)

    async def close(self):
        await self.client.aclose()


def generate_mapping_template(source_schema: dict, target_schema: dict) -> dict:
    """Generate a mapping template for user to customize."""
    mapping = {}

    for prop_name, prop_config in source_schema.get("properties", {}).items():
        source_type = prop_config.get("type")

        # Try to find matching property in target
        target_match = None
        for t_name, t_config in target_schema.get("properties", {}).items():
            if t_name.lower() == prop_name.lower():
                target_match = t_name
                break

        mapping[prop_name] = {
            "target": target_match or prop_name,
            "source_type": source_type,
            "value_mapping": None,  # User can add {"old_value": "new_value"}
        }

    return mapping


async def main():
    parser = argparse.ArgumentParser(description="Notion Schema Migrator")
    parser.add_argument("--source-db", "-s", required=True, help="Source database ID")
|
||||
parser.add_argument("--target-db", "-t", required=True, help="Target database ID")
|
||||
parser.add_argument("--mapping", "-m", help="JSON mapping file path")
|
||||
parser.add_argument(
|
||||
"--generate-mapping",
|
||||
action="store_true",
|
||||
help="Generate mapping template",
|
||||
)
|
||||
parser.add_argument("--dry-run", action="store_true", help="Preview without executing")
|
||||
parser.add_argument("--output", "-o", help="Output file for generated mapping")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if not NOTION_API_KEY:
|
||||
logger.error("NOTION_TOKEN or NOTION_API_KEY not set")
|
||||
return
|
||||
|
||||
migrator = SchemaMigrator(NOTION_API_KEY, dry_run=args.dry_run)
|
||||
|
||||
try:
|
||||
if args.generate_mapping:
|
||||
source_schema = await migrator.get_schema(args.source_db)
|
||||
target_schema = await migrator.get_schema(args.target_db)
|
||||
mapping = generate_mapping_template(source_schema, target_schema)
|
||||
|
||||
output_file = args.output or "mapping_template.json"
|
||||
with open(output_file, "w") as f:
|
||||
json.dump(mapping, f, indent=2)
|
||||
logger.info(f"Mapping template saved to {output_file}")
|
||||
return
|
||||
|
||||
if not args.mapping:
|
||||
logger.error("--mapping required for migration (or use --generate-mapping)")
|
||||
return
|
||||
|
||||
with open(args.mapping) as f:
|
||||
mapping = json.load(f)
|
||||
|
||||
results = await migrator.migrate(args.source_db, args.target_db, mapping)
|
||||
migrator.print_stats()
|
||||
|
||||
# Save results
|
||||
output_file = f"migration_results_{datetime.now():%Y%m%d_%H%M%S}.json"
|
||||
with open(output_file, "w") as f:
|
||||
json.dump(results, f, indent=2)
|
||||
logger.info(f"Results saved to {output_file}")
|
||||
|
||||
finally:
|
||||
await migrator.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@@ -0,0 +1,56 @@
{
  "_comment": "Property Mapping Template - Customize for your migration",
  "_instructions": {
    "target": "Name of property in target database",
    "source_type": "Auto-detected, for reference only",
    "value_mapping": "Optional: Map old values to new values"
  },
  "Name": {
    "target": "Project Name",
    "source_type": "title",
    "value_mapping": null
  },
  "Status": {
    "target": "Stage",
    "source_type": "select",
    "value_mapping": {
      "Not Started": "Backlog",
      "In Progress": "Active",
      "Done": "Completed",
      "On Hold": "Paused"
    }
  },
  "Priority": {
    "target": "Priority",
    "source_type": "select",
    "value_mapping": {
      "P1": "High",
      "P2": "Medium",
      "P3": "Low"
    }
  },
  "Due Date": {
    "target": "Deadline",
    "source_type": "date",
    "value_mapping": null
  },
  "Tags": {
    "target": "Categories",
    "source_type": "multi_select",
    "value_mapping": {
      "marketing": "Marketing",
      "dev": "Development",
      "design": "Design"
    }
  },
  "Description": {
    "target": "Notes",
    "source_type": "rich_text",
    "value_mapping": null
  },
  "Completed": {
    "target": "Is Done",
    "source_type": "checkbox",
    "value_mapping": null
  }
}
95
custom-skills/01-notion-organizer/desktop/SKILL.md
Normal file
@@ -0,0 +1,95 @@
---
name: notion-organizer
version: 1.0.0
description: Notion workspace management agent for organizing, restructuring, consolidating, and maintaining databases and pages. Use when the user asks to search Notion, organize databases, clean up properties, migrate data, merge databases, audit schemas, or manage Notion content. Activates for keywords like Notion, database, knowledge base, wiki, workspace organization.
allowed-tools: mcp__notion__*, Read, Write, Edit, Bash(python:*), Bash(pip:*)
---

# Notion Organizer Skill

## Purpose

Specialized Notion workspace management capability for:
- Database schema analysis and optimization
- Property standardization and cleanup
- Content restructuring and hierarchy optimization
- Database merging and migration
- Bulk operations with rate-limit compliance

## Execution Strategy: Three-Tier Approach

Always follow this priority order:

### Tier 1: Notion MCP Tools (Primary)

Use the built-in MCP tools first. Available tools:

| Tool | Purpose |
|------|---------|
| `mcp__notion__search` | Find pages/databases by keyword |
| `mcp__notion__get-page` | Retrieve page content |
| `mcp__notion__get-database` | Retrieve database schema |
| `mcp__notion__create-page` | Create new pages |
| `mcp__notion__update-page` | Modify page properties |
| `mcp__notion__query-database` | Query a database with filters |

### Tier 2: Alternative Approaches (Fallback)

If the MCP tools are insufficient:
- Export/import via the filesystem (user action required)
- Memory tools for tracking state across sessions
- Sequential thinking for complex planning

### Tier 3: Python Scripts (Advanced)

For bulk operations (50+ items):
- Generate async Python scripts
- Include rate limiting (3 req/sec max)
- Provide a requirements.txt
- Always include a dry-run option

See the `scripts/` directory for templates.

## Operational Guidelines

### Before Any Modification
1. **Fetch first**: Always examine the current structure before changes
2. **Confirm destructive actions**: Get user approval for deletes/major restructures
3. **Estimate impact**: For large operations, provide time/API call estimates
4. **Backup reminder**: Remind the user about Notion version history

### Rate Limits (Critical)
- Maximum: 3 requests/second average
- Use pagination (100 items max per request)
- Implement exponential backoff on 429 errors
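The backoff requirement can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: `RateLimitError` stands in for whatever exception your client raises on HTTP 429, and the delay is parameterized so callers can tune it.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from the API."""


async def with_backoff(request, max_retries=5, base_delay=1.0):
    """Retry `request` with exponential backoff when the API rate-limits us."""
    for attempt in range(max_retries):
        try:
            return await request()
        except RateLimitError:
            # 1s, 2s, 4s, ... plus jitter so concurrent workers desynchronize
            delay = base_delay * 2 ** attempt + random.uniform(0, 0.5)
            await asyncio.sleep(delay)
    raise RuntimeError("rate limit retries exhausted")
```

A caller wraps each API call in a zero-argument coroutine function and passes it to `with_backoff`.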

### Communication
- Korean for explanations
- English for code and technical terms
- Structured before/after summaries

## Quick Commands

### Database Audit
"Analyze [database name] structure and recommend optimizations"

### Property Cleanup
"Standardize property names in [database] to [convention]"

### Bulk Move
"Move all pages tagged [X] from [source] to [target]"

### Schema Migration
"Migrate data from [source database] to [target database]"

## Workflow Patterns

See `reference.md` for detailed workflow documentation.
See `scripts/` for Python templates.

## Limitations

- Cannot access unshared databases/pages
- Cannot modify workspace settings
- Cannot recover permanently deleted content
- Large operations (1000+ pages) require Python scripts
299
custom-skills/01-notion-organizer/desktop/examples/examples.md
Normal file
@@ -0,0 +1,299 @@
# Notion Organizer Examples

## Quick Reference: Common Tasks

### 1. Database Audit

**Request:**
```
Audit my "Client Projects" database and recommend optimizations
```

**Expected Actions:**
1. Fetch the database schema using MCP
2. Analyze properties: types, naming, usage
3. Report findings in Korean
4. Recommend specific improvements

**Sample Output:**
```
## Database Audit: Client Projects

### Current Structure
- Properties: 15
- Pages: 234

### Issues Found

1. **Duplicate properties**: "담당자" and "Assignee" serve the same purpose
2. **Unused property**: "Legacy ID" - all values are empty
3. **Naming inconsistency**: "due_date" vs "Due Date" vs "마감일"

### Recommendations

1. Merge "담당자" and "Assignee" → keep "담당자"
2. Delete the "Legacy ID" property
3. Unify the naming convention: Title Case recommended
```

---

### 2. Bulk Status Update

**Request:**
```
Change all pages with status "Pending Review" to "In Review"
in the Tasks database
```

**Tier 1 Approach (MCP):**
```
# Claude uses MCP tools:
1. mcp__notion__query-database with filter:
   {"property": "Status", "select": {"equals": "Pending Review"}}

2. For each page, mcp__notion__update-page:
   {"Status": {"select": {"name": "In Review"}}}
```

**Tier 3 Approach (Python - for 50+ pages):**
```bash
python scripts/async_organizer.py \
  --database-id abc123 \
  --operation status-update \
  --old-status "Pending Review" \
  --new-status "In Review" \
  --dry-run  # Test first!
```

---

### 3. Schema Migration

**Request:**
```
Migrate data from "Old Projects" to "New Projects" database.
Map Status→Stage, Due Date→Deadline, Tags→Categories
```

**Step 1: Generate Mapping Template**
```bash
python scripts/schema_migrator.py \
  --source-db old_projects_id \
  --target-db new_projects_id \
  --generate-mapping \
  --output my_mapping.json
```

**Step 2: Customize Mapping**
Edit `my_mapping.json`:
```json
{
  "Status": {
    "target": "Stage",
    "value_mapping": {
      "Todo": "Backlog",
      "Doing": "In Progress",
      "Done": "Complete"
    }
  }
}
```

**Step 3: Execute Migration**
```bash
# Dry run first
python scripts/schema_migrator.py \
  --source-db old_projects_id \
  --target-db new_projects_id \
  --mapping my_mapping.json \
  --dry-run

# Execute
python scripts/schema_migrator.py \
  --source-db old_projects_id \
  --target-db new_projects_id \
  --mapping my_mapping.json
```

---

### 4. Property Cleanup

**Request:**
```
Standardize all property names in "Marketing Campaigns"
to Title Case with spaces
```

**Before:**
```
- campaign_name → Campaign Name
- startDate → Start Date
- end-date → End Date
- STATUS → Status
- assigned_to → Assigned To
```

**MCP Approach:**
```
Use mcp__notion__update-database to rename properties:
{
  "properties": {
    "campaign_name": { "name": "Campaign Name" },
    "startDate": { "name": "Start Date" },
    "end-date": { "name": "End Date" },
    "STATUS": { "name": "Status" },
    "assigned_to": { "name": "Assigned To" }
  }
}
```

---

### 5. Duplicate Detection

**Request:**
```
Find duplicate entries in "Contacts" database based on email
```

**Python Script Approach:**
```python
# Pseudocode for duplicate detection
pages = fetch_all_pages(database_id)

# Group by email
email_groups = {}
for page in pages:
    email = get_property(page, "Email")
    if email:
        email_groups.setdefault(email, []).append(page)

# Find duplicates
duplicates = {
    email: pages
    for email, pages in email_groups.items()
    if len(pages) > 1
}

# Report
for email, dup_pages in duplicates.items():
    print(f"Duplicate: {email}")
    for p in dup_pages:
        print(f"  - {get_title(p)} (created: {p['created_time']})")
```

---

### 6. Archive Old Content

**Request:**
```
Move all tasks completed more than 90 days ago to Archive database
```

**Filter:**
```json
{
  "and": [
    {
      "property": "Status",
      "status": { "equals": "Complete" }
    },
    {
      "property": "Completed Date",
      "date": {
        "before": "2025-09-07"
      }
    }
  ]
}
```

**Process:**
1. Query with the filter
2. For each page:
   - Create a copy in the Archive database
   - Update the original with "Archived" status or delete it
3. Report a summary

---

### 7. Relation Audit

**Request:**
```
Find all pages in "Tasks" that have broken relations to "Projects"
```

**Approach:**
1. Fetch all Tasks pages
2. For each task, check the Project relation
3. Verify the referenced Project page exists
4. Report broken relations
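Step 3 reduces to a set-membership check once both databases have been fetched. A minimal sketch; the property shape follows the Notion API's relation format (`{"relation": [{"id": ...}]}`), and the property name "Project" is this example's assumption:

```python
def find_broken_relations(tasks: list[dict], project_ids: set[str]) -> list[dict]:
    """Return tasks whose Project relation references a page that no longer exists."""
    broken = []
    for task in tasks:
        # Notion relation properties hold a list of {"id": ...} references
        relation = task["properties"].get("Project", {}).get("relation", [])
        if any(ref["id"] not in project_ids for ref in relation):
            broken.append(task)
    return broken
```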

**Sample Output:**
```
## Relation Audit: Tasks → Projects

Total tasks: 150
Valid relations: 142
Broken relations: 8

### Broken relations:
1. "Website Redesign Phase 2" → Project not found
2. "Q3 Marketing Review" → Project deleted
...

### Recommended actions:
- Restore the deleted projects, or
- Reassign the affected tasks to other projects
```

---

## Environment Setup

### Quick Start
```bash
# Navigate to scripts directory
cd ~/.claude/skills/notion-organizer/scripts

# Create virtual environment
python -m venv venv
source venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Set environment variable
export NOTION_TOKEN="your_token_here"
# Or create .env file with NOTION_TOKEN=your_token
```

### Verify Setup
```bash
# Test with audit (read-only)
python async_organizer.py --database-id YOUR_DB_ID --operation audit
```

---

## Troubleshooting

### Rate Limit Errors (429)
- Scripts automatically retry with exponential backoff
- If errors persist, reduce `MAX_CONCURRENT_REQUESTS` to 2

### Permission Errors (404)
- Ensure the database is shared with your integration
- Check that the integration has the correct capabilities

### Property Type Mismatch
- Use `--generate-mapping` to see current types
- Some conversions require manual handling (e.g., people → text)

### Large Databases (1000+ pages)
- Always use Python scripts, not MCP
- Consider running in batches with checkpoints
- Monitor API usage in Notion settings
127
custom-skills/10-seo-technical-audit/code/CLAUDE.md
Normal file
@@ -0,0 +1,127 @@
# CLAUDE.md

## Overview

Technical SEO auditor for crawlability fundamentals: robots.txt validation, XML sitemap analysis, and URL accessibility checking.

## Quick Start

```bash
# Install dependencies
pip install -r scripts/requirements.txt

# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com

# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Async URL crawl (check sitemap URL accessibility)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `robots_checker.py` | Parse and validate robots.txt | User-agent rules, disallow patterns, sitemap declarations |
| `sitemap_validator.py` | Validate XML sitemap structure | URL count, lastmod dates, size limits, syntax errors |
| `sitemap_crawler.py` | Async check of URL accessibility | HTTP status codes, response times, broken links |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Robots.txt Checker

```bash
# Basic analysis
python scripts/robots_checker.py --url https://example.com

# Test specific URL against rules
python scripts/robots_checker.py --url https://example.com --test-url /admin/

# Output JSON
python scripts/robots_checker.py --url https://example.com --json
```

**Checks performed**:
- Syntax validation
- User-agent rule parsing
- Disallow/Allow pattern analysis
- Sitemap declarations
- Critical resource access (CSS/JS/images)
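The same kind of rule checks can be reproduced locally with the standard library's parser. This is a sketch, not the script itself; note that `urllib.robotparser` matches rules in file order rather than by longest match, so results can differ from Google's documented semantics:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/admin/"))             # False (blocked)
print(rp.can_fetch("*", "https://example.com/admin/public/page"))  # True (allowed)
print(rp.site_maps())  # declared sitemap URLs
```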

## Sitemap Validator

```bash
# Validate sitemap
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Include sitemap index parsing
python scripts/sitemap_validator.py --url https://example.com/sitemap_index.xml --follow-index
```

**Validation rules**:
- XML syntax correctness
- URL count limit (50,000 max per sitemap)
- File size limit (50MB max uncompressed)
- Lastmod date format validation
- Sitemap index structure
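The two hard limits from the sitemap protocol can be checked in a few lines of standard-library code. A sketch of those checks only, assuming a plain `urlset` sitemap; it is not the full validator:

```python
import xml.etree.ElementTree as ET

MAX_URLS = 50_000                 # protocol limit per sitemap file
MAX_BYTES = 50 * 1024 * 1024      # 50MB uncompressed
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def check_limits(xml_bytes: bytes) -> list[str]:
    """Return any size/count limit violations for a urlset sitemap."""
    issues = []
    if len(xml_bytes) > MAX_BYTES:
        issues.append("exceeds 50MB uncompressed limit")
    urls = ET.fromstring(xml_bytes).findall("sm:url", NS)
    if len(urls) > MAX_URLS:
        issues.append("exceeds 50,000 URL limit")
    return issues
```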

## Sitemap Crawler

```bash
# Crawl all URLs in sitemap
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml

# Limit concurrent requests
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --concurrency 10

# Sample mode (check a subset)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --sample 100
```

**Output includes**:
- HTTP status codes per URL
- Response times
- Redirect chains
- Broken links (4xx, 5xx)

## Output Format

All scripts support a `--json` flag for structured output:

```json
{
  "url": "https://example.com",
  "status": "valid|invalid|warning",
  "issues": [
    {
      "type": "error|warning|info",
      "message": "Description",
      "location": "Line or URL"
    }
  ],
  "summary": {}
}
```

## Common Issues Detected

| Category | Issue | Severity |
|----------|-------|----------|
| Robots.txt | Missing sitemap declaration | Medium |
| Robots.txt | Blocking CSS/JS resources | High |
| Robots.txt | Overly broad disallow rules | Medium |
| Sitemap | URLs returning 404 | High |
| Sitemap | Missing lastmod dates | Low |
| Sitemap | Exceeds 50,000 URL limit | High |
| Sitemap | Non-canonical URLs included | Medium |

## Configuration

Environment variables (optional):
```bash
# Rate limiting
CRAWL_DELAY=1.0        # Seconds between requests
MAX_CONCURRENT=20      # Async concurrency limit
REQUEST_TIMEOUT=30     # Request timeout seconds
```
207
custom-skills/10-seo-technical-audit/code/scripts/base_client.py
Normal file
@@ -0,0 +1,207 @@
"""
|
||||
Base Client - Shared async client utilities
|
||||
===========================================
|
||||
Purpose: Rate-limited async operations for API clients
|
||||
Python: 3.10+
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
from asyncio import Semaphore
|
||||
from datetime import datetime
|
||||
from typing import Any, Callable, TypeVar
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from tenacity import (
|
||||
retry,
|
||||
stop_after_attempt,
|
||||
wait_exponential,
|
||||
retry_if_exception_type,
|
||||
)
|
||||
|
||||
# Load environment variables
|
||||
load_dotenv()
|
||||
|
||||
# Logging setup
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||
)
|
||||
|
||||
T = TypeVar("T")
|
||||
|
||||
|
||||
class RateLimiter:
|
||||
"""Rate limiter using token bucket algorithm."""
|
||||
|
||||
def __init__(self, rate: float, per: float = 1.0):
|
||||
"""
|
||||
Initialize rate limiter.
|
||||
|
||||
Args:
|
||||
rate: Number of requests allowed
|
||||
per: Time period in seconds (default: 1 second)
|
||||
"""
|
||||
self.rate = rate
|
||||
self.per = per
|
||||
self.tokens = rate
|
||||
self.last_update = datetime.now()
|
||||
self._lock = asyncio.Lock()
|
||||
|
||||
async def acquire(self) -> None:
|
||||
"""Acquire a token, waiting if necessary."""
|
||||
async with self._lock:
|
||||
now = datetime.now()
|
||||
elapsed = (now - self.last_update).total_seconds()
|
||||
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
|
||||
self.last_update = now
|
||||
|
||||
if self.tokens < 1:
|
||||
wait_time = (1 - self.tokens) * (self.per / self.rate)
|
||||
await asyncio.sleep(wait_time)
|
||||
self.tokens = 0
|
||||
else:
|
||||
self.tokens -= 1
|
||||
|
||||
|
||||
class BaseAsyncClient:
|
||||
"""Base class for async API clients with rate limiting."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
max_concurrent: int = 5,
|
||||
requests_per_second: float = 3.0,
|
||||
logger: logging.Logger | None = None,
|
||||
):
|
||||
"""
|
||||
Initialize base client.
|
||||
|
||||
Args:
|
||||
max_concurrent: Maximum concurrent requests
|
||||
requests_per_second: Rate limit
|
||||
logger: Logger instance
|
||||
"""
|
||||
self.semaphore = Semaphore(max_concurrent)
|
||||
self.rate_limiter = RateLimiter(requests_per_second)
|
||||
self.logger = logger or logging.getLogger(self.__class__.__name__)
|
||||
self.stats = {
|
||||
"requests": 0,
|
||||
"success": 0,
|
||||
"errors": 0,
|
||||
"retries": 0,
|
||||
}
|
||||
|
||||
@retry(
|
||||
stop=stop_after_attempt(3),
|
||||
wait=wait_exponential(multiplier=1, min=2, max=10),
|
||||
retry=retry_if_exception_type(Exception),
|
||||
)
|
||||
async def _rate_limited_request(
|
||||
self,
|
||||
coro: Callable[[], Any],
|
||||
) -> Any:
|
||||
"""Execute a request with rate limiting and retry."""
|
||||
async with self.semaphore:
|
||||
await self.rate_limiter.acquire()
|
||||
self.stats["requests"] += 1
|
||||
try:
|
||||
result = await coro()
|
||||
self.stats["success"] += 1
|
||||
return result
|
||||
except Exception as e:
|
||||
self.stats["errors"] += 1
|
||||
self.logger.error(f"Request failed: {e}")
|
||||
raise
|
||||
|
||||
async def batch_requests(
|
||||
self,
|
||||
requests: list[Callable[[], Any]],
|
||||
desc: str = "Processing",
|
||||
) -> list[Any]:
|
||||
"""Execute multiple requests concurrently."""
|
||||
try:
|
||||
from tqdm.asyncio import tqdm
|
||||
has_tqdm = True
|
||||
except ImportError:
|
||||
has_tqdm = False
|
||||
|
||||
async def execute(req: Callable) -> Any:
|
||||
try:
|
||||
return await self._rate_limited_request(req)
|
||||
except Exception as e:
|
||||
return {"error": str(e)}
|
||||
|
||||
tasks = [execute(req) for req in requests]
|
||||
|
||||
if has_tqdm:
|
||||
results = []
|
||||
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
|
||||
result = await coro
|
||||
results.append(result)
|
||||
return results
|
||||
else:
|
||||
return await asyncio.gather(*tasks, return_exceptions=True)
|
||||
|
||||
def print_stats(self) -> None:
|
||||
"""Print request statistics."""
|
||||
self.logger.info("=" * 40)
|
||||
self.logger.info("Request Statistics:")
|
||||
self.logger.info(f" Total Requests: {self.stats['requests']}")
|
||||
self.logger.info(f" Successful: {self.stats['success']}")
|
||||
self.logger.info(f" Errors: {self.stats['errors']}")
|
||||
self.logger.info("=" * 40)
|
||||
|
||||
|
||||
class ConfigManager:
|
||||
"""Manage API configuration and credentials."""
|
||||
|
||||
def __init__(self):
|
||||
load_dotenv()
|
||||
|
||||
@property
|
||||
def google_credentials_path(self) -> str | None:
|
||||
"""Get Google service account credentials path."""
|
||||
# Prefer SEO-specific credentials, fallback to general credentials
|
||||
seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
|
||||
if os.path.exists(seo_creds):
|
||||
return seo_creds
|
||||
return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
|
||||
|
||||
@property
|
||||
def pagespeed_api_key(self) -> str | None:
|
||||
"""Get PageSpeed Insights API key."""
|
||||
return os.getenv("PAGESPEED_API_KEY")
|
||||
|
||||
@property
|
||||
def custom_search_api_key(self) -> str | None:
|
||||
"""Get Custom Search API key."""
|
||||
return os.getenv("CUSTOM_SEARCH_API_KEY")
|
||||
|
||||
@property
|
||||
def custom_search_engine_id(self) -> str | None:
|
||||
"""Get Custom Search Engine ID."""
|
||||
return os.getenv("CUSTOM_SEARCH_ENGINE_ID")
|
||||
|
||||
@property
|
||||
def notion_token(self) -> str | None:
|
||||
"""Get Notion API token."""
|
||||
return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")
|
||||
|
||||
def validate_google_credentials(self) -> bool:
|
||||
"""Validate Google credentials are configured."""
|
||||
creds_path = self.google_credentials_path
|
||||
if not creds_path:
|
||||
return False
|
||||
return os.path.exists(creds_path)
|
||||
|
||||
def get_required(self, key: str) -> str:
|
||||
"""Get required environment variable or raise error."""
|
||||
value = os.getenv(key)
|
||||
if not value:
|
||||
raise ValueError(f"Missing required environment variable: {key}")
|
||||
return value
|
||||
|
||||
|
||||
# Singleton config instance
|
||||
config = ConfigManager()
|
||||
@@ -0,0 +1,569 @@
|
||||
"""
|
||||
Page Analyzer - Extract SEO metadata from web pages
|
||||
===================================================
|
||||
Purpose: Comprehensive page-level SEO data extraction
|
||||
Python: 3.10+
|
||||
Usage:
|
||||
from page_analyzer import PageAnalyzer, PageMetadata
|
||||
analyzer = PageAnalyzer()
|
||||
metadata = analyzer.analyze_url("https://example.com/page")
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import re
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from typing import Any
|
||||
from urllib.parse import urljoin, urlparse
|
||||
|
||||
import requests
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||
)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
class LinkData:
|
||||
"""Represents a link found on a page."""
|
||||
url: str
|
||||
anchor_text: str
|
||||
is_internal: bool
|
||||
is_nofollow: bool = False
|
||||
link_type: str = "body" # body, nav, footer, etc.
|
||||
|
||||
|
||||
@dataclass
|
||||
class HeadingData:
|
||||
"""Represents a heading found on a page."""
|
||||
level: int # 1-6
|
||||
text: str
|
||||
|
||||
|
||||
@dataclass
|
||||
class SchemaData:
|
||||
"""Represents schema.org structured data."""
|
||||
schema_type: str
|
||||
properties: dict
|
||||
format: str = "json-ld" # json-ld, microdata, rdfa
|
||||
|
||||
|
||||
@dataclass
|
||||
class OpenGraphData:
|
||||
"""Represents Open Graph metadata."""
|
||||
og_title: str | None = None
|
||||
og_description: str | None = None
|
||||
og_image: str | None = None
|
||||
og_url: str | None = None
|
||||
og_type: str | None = None
|
||||
og_site_name: str | None = None
|
||||
og_locale: str | None = None
|
||||
twitter_card: str | None = None
|
||||
twitter_title: str | None = None
|
||||
twitter_description: str | None = None
|
||||
twitter_image: str | None = None
|
||||
|
||||
|
@dataclass
class PageMetadata:
    """Complete SEO metadata for a page."""

    # Basic info
    url: str
    status_code: int = 0
    content_type: str = ""
    response_time_ms: float = 0
    analyzed_at: datetime = field(default_factory=datetime.now)

    # Meta tags
    title: str | None = None
    title_length: int = 0
    meta_description: str | None = None
    meta_description_length: int = 0
    canonical_url: str | None = None
    robots_meta: str | None = None

    # Language
    html_lang: str | None = None
    hreflang_tags: list[dict] = field(default_factory=list)  # [{"lang": "en", "url": "..."}]

    # Headings
    headings: list[HeadingData] = field(default_factory=list)
    h1_count: int = 0
    h1_text: str | None = None

    # Open Graph & Social
    open_graph: OpenGraphData = field(default_factory=OpenGraphData)

    # Schema/Structured Data
    schema_data: list[SchemaData] = field(default_factory=list)
    schema_types_found: list[str] = field(default_factory=list)

    # Links
    internal_links: list[LinkData] = field(default_factory=list)
    external_links: list[LinkData] = field(default_factory=list)
    internal_link_count: int = 0
    external_link_count: int = 0

    # Images
    images_total: int = 0
    images_without_alt: int = 0
    images_with_alt: int = 0

    # Content metrics
    word_count: int = 0

    # Issues found
    issues: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON serialization."""
        return {
            "url": self.url,
            "status_code": self.status_code,
            "content_type": self.content_type,
            "response_time_ms": self.response_time_ms,
            "analyzed_at": self.analyzed_at.isoformat(),
            "title": self.title,
            "title_length": self.title_length,
            "meta_description": self.meta_description,
            "meta_description_length": self.meta_description_length,
            "canonical_url": self.canonical_url,
            "robots_meta": self.robots_meta,
            "html_lang": self.html_lang,
            "hreflang_tags": self.hreflang_tags,
            "h1_count": self.h1_count,
            "h1_text": self.h1_text,
            "headings_count": len(self.headings),
            "schema_types_found": self.schema_types_found,
            "internal_link_count": self.internal_link_count,
            "external_link_count": self.external_link_count,
            "images_total": self.images_total,
            "images_without_alt": self.images_without_alt,
            "word_count": self.word_count,
            "issues": self.issues,
            "warnings": self.warnings,
            "open_graph": {
                "og_title": self.open_graph.og_title,
                "og_description": self.open_graph.og_description,
                "og_image": self.open_graph.og_image,
                "og_url": self.open_graph.og_url,
                "og_type": self.open_graph.og_type,
            },
        }

    def get_summary(self) -> str:
        """Get a brief summary of the page analysis."""
        lines = [
            f"URL: {self.url}",
            f"Status: {self.status_code}",
            f"Title: {self.title[:50] + '...' if self.title and len(self.title) > 50 else self.title}",
            f"Description: {'✓' if self.meta_description else '✗ Missing'}",
            f"Canonical: {'✓' if self.canonical_url else '✗ Missing'}",
            f"H1: {self.h1_count} found",
            f"Schema: {', '.join(self.schema_types_found) if self.schema_types_found else 'None'}",
            f"Links: {self.internal_link_count} internal, {self.external_link_count} external",
            f"Images: {self.images_total} total, {self.images_without_alt} without alt",
        ]
        if self.issues:
            lines.append(f"Issues: {len(self.issues)}")
        return "\n".join(lines)

class PageAnalyzer:
    """Analyze web pages for SEO metadata."""

    DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; OurDigitalSEOBot/1.0; +https://ourdigital.org)"

    def __init__(
        self,
        user_agent: str | None = None,
        timeout: int = 30,
    ):
        """
        Initialize page analyzer.

        Args:
            user_agent: Custom user agent string
            timeout: Request timeout in seconds
        """
        self.user_agent = user_agent or self.DEFAULT_USER_AGENT
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": self.user_agent,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9,ko;q=0.8",
        })

    def analyze_url(self, url: str) -> PageMetadata:
        """
        Analyze a URL and extract SEO metadata.

        Args:
            url: URL to analyze

        Returns:
            PageMetadata object with all extracted data
        """
        metadata = PageMetadata(url=url)

        try:
            # Fetch page
            start_time = datetime.now()
            response = self.session.get(url, timeout=self.timeout, allow_redirects=True)
            metadata.response_time_ms = (datetime.now() - start_time).total_seconds() * 1000
            metadata.status_code = response.status_code
            metadata.content_type = response.headers.get("Content-Type", "")

            if response.status_code != 200:
                metadata.issues.append(f"HTTP {response.status_code} status")
                if response.status_code >= 400:
                    return metadata

            # Parse HTML
            soup = BeautifulSoup(response.text, "html.parser")
            # Resolve relative URLs against the final URL after any redirects
            base_url = response.url

            # Extract all metadata
            self._extract_basic_meta(soup, metadata)
            self._extract_canonical(soup, metadata, base_url)
            self._extract_robots_meta(soup, metadata)
            self._extract_hreflang(soup, metadata)
            self._extract_headings(soup, metadata)
            self._extract_open_graph(soup, metadata)
            self._extract_schema(soup, metadata)
            self._extract_links(soup, metadata, base_url)
            self._extract_images(soup, metadata)
            self._extract_content_metrics(soup, metadata)

            # Run SEO checks
            self._run_seo_checks(metadata)

        except requests.RequestException as e:
            metadata.issues.append(f"Request failed: {e}")
            logger.error(f"Failed to analyze {url}: {e}")
        except Exception as e:
            metadata.issues.append(f"Analysis error: {e}")
            logger.error(f"Error analyzing {url}: {e}")

        return metadata

    def _extract_basic_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract title and meta description."""
        # Title
        title_tag = soup.find("title")
        if title_tag and title_tag.string:
            metadata.title = title_tag.string.strip()
            metadata.title_length = len(metadata.title)

        # Meta description
        desc_tag = soup.find("meta", attrs={"name": re.compile(r"^description$", re.I)})
        if desc_tag and desc_tag.get("content"):
            metadata.meta_description = desc_tag["content"].strip()
            metadata.meta_description_length = len(metadata.meta_description)

        # HTML lang
        html_tag = soup.find("html")
        if html_tag and html_tag.get("lang"):
            metadata.html_lang = html_tag["lang"]

    def _extract_canonical(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
        """Extract canonical URL."""
        canonical = soup.find("link", rel="canonical")
        if canonical and canonical.get("href"):
            metadata.canonical_url = urljoin(base_url, canonical["href"])

    def _extract_robots_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract robots meta tag."""
        robots = soup.find("meta", attrs={"name": re.compile(r"^robots$", re.I)})
        if robots and robots.get("content"):
            metadata.robots_meta = robots["content"]

        # Also check for googlebot-specific
        googlebot = soup.find("meta", attrs={"name": re.compile(r"^googlebot$", re.I)})
        if googlebot and googlebot.get("content"):
            if metadata.robots_meta:
                metadata.robots_meta += f" | googlebot: {googlebot['content']}"
            else:
                metadata.robots_meta = f"googlebot: {googlebot['content']}"

    def _extract_hreflang(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract hreflang tags."""
        hreflang_tags = soup.find_all("link", rel="alternate", hreflang=True)
        for tag in hreflang_tags:
            if tag.get("href") and tag.get("hreflang"):
                metadata.hreflang_tags.append({
                    "lang": tag["hreflang"],
                    "url": tag["href"]
                })

    def _extract_headings(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract all headings."""
        for level in range(1, 7):
            for heading in soup.find_all(f"h{level}"):
                text = heading.get_text(strip=True)
                if text:
                    metadata.headings.append(HeadingData(level=level, text=text))

        # Count H1s specifically
        h1_tags = soup.find_all("h1")
        metadata.h1_count = len(h1_tags)
        if h1_tags:
            metadata.h1_text = h1_tags[0].get_text(strip=True)

    def _extract_open_graph(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract Open Graph and Twitter Card data."""
        og = metadata.open_graph

        # Open Graph tags
        og_mappings = {
            "og:title": "og_title",
            "og:description": "og_description",
            "og:image": "og_image",
            "og:url": "og_url",
            "og:type": "og_type",
            "og:site_name": "og_site_name",
            "og:locale": "og_locale",
        }

        for og_prop, attr_name in og_mappings.items():
            tag = soup.find("meta", property=og_prop)
            if tag and tag.get("content"):
                setattr(og, attr_name, tag["content"])

        # Twitter Card tags
        twitter_mappings = {
            "twitter:card": "twitter_card",
            "twitter:title": "twitter_title",
            "twitter:description": "twitter_description",
            "twitter:image": "twitter_image",
        }

        for tw_name, attr_name in twitter_mappings.items():
            tag = soup.find("meta", attrs={"name": tw_name})
            if tag and tag.get("content"):
                setattr(og, attr_name, tag["content"])
    def _extract_schema(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract schema.org structured data."""
        # JSON-LD
        for script in soup.find_all("script", type="application/ld+json"):
            try:
                data = json.loads(script.string)
                if isinstance(data, list):
                    for item in data:
                        self._process_schema_item(item, metadata, "json-ld")
                else:
                    self._process_schema_item(data, metadata, "json-ld")
            except (json.JSONDecodeError, TypeError):
                continue

        # Microdata (basic detection)
        for item in soup.find_all(itemscope=True):
            itemtype = item.get("itemtype", "")
            if itemtype:
                schema_type = itemtype.split("/")[-1]
                if schema_type not in metadata.schema_types_found:
                    metadata.schema_types_found.append(schema_type)
                    metadata.schema_data.append(SchemaData(
                        schema_type=schema_type,
                        properties={},
                        format="microdata"
                    ))

    def _process_schema_item(self, data: dict, metadata: PageMetadata, format_type: str) -> None:
        """Process a single schema.org item."""
        if not isinstance(data, dict):
            return

        schema_type = data.get("@type", "Unknown")
        if isinstance(schema_type, list):
            schema_type = schema_type[0] if schema_type else "Unknown"

        if schema_type not in metadata.schema_types_found:
            metadata.schema_types_found.append(schema_type)

        metadata.schema_data.append(SchemaData(
            schema_type=schema_type,
            properties=data,
            format=format_type
        ))

        # Process nested @graph items
        if "@graph" in data:
            for item in data["@graph"]:
                self._process_schema_item(item, metadata, format_type)

    def _extract_links(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
        """Extract internal and external links."""
        parsed_base = urlparse(base_url)
        base_domain = parsed_base.netloc.lower()

        for a_tag in soup.find_all("a", href=True):
            href = a_tag["href"]

            # Skip non-http links
            if href.startswith(("#", "javascript:", "mailto:", "tel:")):
                continue

            # Resolve relative URLs
            full_url = urljoin(base_url, href)
            parsed_url = urlparse(full_url)

            # Get anchor text
            anchor_text = a_tag.get_text(strip=True)[:100]  # Limit length

            # Check if nofollow
            rel = a_tag.get("rel", [])
            if isinstance(rel, str):
                rel = rel.split()
            is_nofollow = "nofollow" in rel

            # Determine if internal or external
            link_domain = parsed_url.netloc.lower()
            is_internal = (
                link_domain == base_domain or
                link_domain.endswith(f".{base_domain}") or
                base_domain.endswith(f".{link_domain}")
            )

            link_data = LinkData(
                url=full_url,
                anchor_text=anchor_text,
                is_internal=is_internal,
                is_nofollow=is_nofollow,
            )

            if is_internal:
                metadata.internal_links.append(link_data)
            else:
                metadata.external_links.append(link_data)

        metadata.internal_link_count = len(metadata.internal_links)
        metadata.external_link_count = len(metadata.external_links)

    def _extract_images(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract image information."""
        images = soup.find_all("img")
        metadata.images_total = len(images)

        for img in images:
            alt = img.get("alt", "").strip()
            if alt:
                metadata.images_with_alt += 1
            else:
                metadata.images_without_alt += 1

    def _extract_content_metrics(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract content metrics like word count.

        Note: this mutates the soup (script/style elements are removed),
        so it must run after the other extractors.
        """
        # Remove script and style elements
        for element in soup(["script", "style", "noscript"]):
            element.decompose()

        # Get text content
        text = soup.get_text(separator=" ", strip=True)
        words = text.split()
        metadata.word_count = len(words)
    def _run_seo_checks(self, metadata: PageMetadata) -> None:
        """Run SEO checks and add issues/warnings."""
        # Title checks
        if not metadata.title:
            metadata.issues.append("Missing title tag")
        elif metadata.title_length < 30:
            metadata.warnings.append(f"Title too short ({metadata.title_length} chars, recommend 50-60)")
        elif metadata.title_length > 60:
            metadata.warnings.append(f"Title too long ({metadata.title_length} chars, recommend 50-60)")

        # Meta description checks
        if not metadata.meta_description:
            metadata.issues.append("Missing meta description")
        elif metadata.meta_description_length < 120:
            metadata.warnings.append(f"Meta description too short ({metadata.meta_description_length} chars)")
        elif metadata.meta_description_length > 160:
            metadata.warnings.append(f"Meta description too long ({metadata.meta_description_length} chars)")

        # Canonical check
        if not metadata.canonical_url:
            metadata.warnings.append("Missing canonical tag")
        elif metadata.canonical_url != metadata.url:
            metadata.warnings.append(f"Canonical points to different URL: {metadata.canonical_url}")

        # H1 checks
        if metadata.h1_count == 0:
            metadata.issues.append("Missing H1 tag")
        elif metadata.h1_count > 1:
            metadata.warnings.append(f"Multiple H1 tags ({metadata.h1_count})")

        # Image alt check
        if metadata.images_without_alt > 0:
            metadata.warnings.append(f"{metadata.images_without_alt} images missing alt text")

        # Schema check
        if not metadata.schema_types_found:
            metadata.warnings.append("No structured data found")

        # Open Graph check
        if not metadata.open_graph.og_title:
            metadata.warnings.append("Missing Open Graph tags")

        # Robots meta check
        if metadata.robots_meta:
            robots_lower = metadata.robots_meta.lower()
            if "noindex" in robots_lower:
                metadata.issues.append("Page is set to noindex")
            if "nofollow" in robots_lower:
                metadata.warnings.append("Page is set to nofollow")


def main():
    """CLI entry point for testing."""
    import argparse

    parser = argparse.ArgumentParser(description="Page SEO Analyzer")
    parser.add_argument("url", help="URL to analyze")
    parser.add_argument("--json", "-j", action="store_true", help="Output as JSON")

    args = parser.parse_args()

    analyzer = PageAnalyzer()
    metadata = analyzer.analyze_url(args.url)

    if args.json:
        print(json.dumps(metadata.to_dict(), indent=2, ensure_ascii=False))
    else:
        print("=" * 60)
        print("PAGE ANALYSIS REPORT")
        print("=" * 60)
        print(metadata.get_summary())
        print()

        if metadata.issues:
            print("ISSUES:")
            for issue in metadata.issues:
                print(f"  ✗ {issue}")

        if metadata.warnings:
            print("\nWARNINGS:")
            for warning in metadata.warnings:
                print(f"  ⚠ {warning}")

        if metadata.hreflang_tags:
            print(f"\nHREFLANG TAGS ({len(metadata.hreflang_tags)}):")
            for tag in metadata.hreflang_tags[:5]:
                print(f"  {tag['lang']}: {tag['url']}")

        if metadata.schema_types_found:
            print("\nSCHEMA TYPES:")
            for schema_type in metadata.schema_types_found:
                print(f"  - {schema_type}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,17 @@
# 10-seo-technical-audit dependencies
# Install: pip install -r requirements.txt

# Web Scraping & Parsing
lxml>=5.1.0
beautifulsoup4>=4.12.0
requests>=2.31.0
aiohttp>=3.9.0

# Async & Retry
tenacity>=8.2.0
tqdm>=4.66.0

# Environment & CLI
python-dotenv>=1.0.0
rich>=13.7.0
typer>=0.9.0
@@ -0,0 +1,540 @@
"""
Robots.txt Checker - Analyze robots.txt configuration
=====================================================
Purpose: Parse and analyze robots.txt for SEO compliance
Python: 3.10+
Usage:
    python robots_checker.py --url https://example.com/robots.txt
    python robots_checker.py --url https://example.com --test-url /admin/
"""

import argparse
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

import requests

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

@dataclass
class RobotsIssue:
    """Represents a robots.txt issue."""

    severity: str  # "error", "warning", "info"
    message: str
    line_number: int | None = None
    directive: str | None = None
    suggestion: str | None = None


@dataclass
class UserAgentRules:
    """Rules for a specific user-agent."""

    user_agent: str
    disallow: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)
    crawl_delay: float | None = None


@dataclass
class RobotsResult:
    """Complete robots.txt analysis result."""

    url: str
    accessible: bool = True
    content: str = ""
    rules: list[UserAgentRules] = field(default_factory=list)
    sitemaps: list[str] = field(default_factory=list)
    issues: list[RobotsIssue] = field(default_factory=list)
    stats: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON output."""
        return {
            "url": self.url,
            "accessible": self.accessible,
            "sitemaps": self.sitemaps,
            "rules": [
                {
                    "user_agent": r.user_agent,
                    "disallow": r.disallow,
                    "allow": r.allow,
                    "crawl_delay": r.crawl_delay,
                }
                for r in self.rules
            ],
            "issues": [
                {
                    "severity": i.severity,
                    "message": i.message,
                    "line_number": i.line_number,
                    "directive": i.directive,
                    "suggestion": i.suggestion,
                }
                for i in self.issues
            ],
            "stats": self.stats,
            "timestamp": self.timestamp,
        }

class RobotsChecker:
    """Analyze robots.txt configuration."""

    # Common user agents
    USER_AGENTS = {
        "*": "All bots",
        "Googlebot": "Google crawler",
        "Googlebot-Image": "Google Image crawler",
        "Googlebot-News": "Google News crawler",
        "Googlebot-Video": "Google Video crawler",
        "Bingbot": "Bing crawler",
        "Slurp": "Yahoo crawler",
        "DuckDuckBot": "DuckDuckGo crawler",
        "Baiduspider": "Baidu crawler",
        "Yandex": "Yandex crawler",
        "facebot": "Facebook crawler",
        "Twitterbot": "Twitter crawler",
        "LinkedInBot": "LinkedIn crawler",
    }

    # Paths that should generally not be blocked
    IMPORTANT_PATHS = [
        "/",
        "/*.css",
        "/*.js",
        "/*.jpg",
        "/*.jpeg",
        "/*.png",
        "/*.gif",
        "/*.svg",
        "/*.webp",
    ]

    # Paths commonly blocked
    COMMON_BLOCKED = [
        "/admin",
        "/wp-admin",
        "/login",
        "/private",
        "/api",
        "/cgi-bin",
        "/tmp",
        "/search",
    ]

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
        })

    def fetch_robots(self, url: str) -> str | None:
        """Fetch robots.txt content."""
        # Ensure we're fetching robots.txt
        parsed = urlparse(url)
        if not parsed.path.endswith("robots.txt"):
            robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
        else:
            robots_url = url

        try:
            response = self.session.get(robots_url, timeout=10)
            if response.status_code == 200:
                return response.text
            elif response.status_code == 404:
                return None
            else:
                raise RuntimeError(f"HTTP {response.status_code}")
        except requests.RequestException as e:
            raise RuntimeError(f"Failed to fetch robots.txt: {e}") from e

    def parse_robots(self, content: str) -> tuple[list[UserAgentRules], list[str]]:
        """Parse robots.txt content."""
        rules = []
        sitemaps = []
        current_rules = None

        for line in content.split("\n"):
            line = line.strip()

            # Skip empty lines and comments
            if not line or line.startswith("#"):
                continue

            # Parse directive
            if ":" not in line:
                continue

            directive, value = line.split(":", 1)
            directive = directive.strip().lower()
            value = value.strip()

            if directive == "user-agent":
                # Save previous user-agent rules
                if current_rules:
                    rules.append(current_rules)
                current_rules = UserAgentRules(user_agent=value)

            elif directive == "disallow" and current_rules:
                if value:  # Empty disallow means allow all
                    current_rules.disallow.append(value)

            elif directive == "allow" and current_rules:
                if value:
                    current_rules.allow.append(value)

            elif directive == "crawl-delay" and current_rules:
                try:
                    current_rules.crawl_delay = float(value)
                except ValueError:
                    pass

            elif directive == "sitemap":
                if value:
                    sitemaps.append(value)

        # Don't forget the last user-agent block
        if current_rules:
            rules.append(current_rules)

        return rules, sitemaps

    def analyze(self, url: str) -> RobotsResult:
        """Analyze robots.txt."""
        result = RobotsResult(url=url)

        # Fetch robots.txt
        try:
            content = self.fetch_robots(url)
            if content is None:
                result.accessible = False
                result.issues.append(RobotsIssue(
                    severity="info",
                    message="No robots.txt found (returns 404)",
                    suggestion="Consider creating a robots.txt file",
                ))
                return result
        except RuntimeError as e:
            result.accessible = False
            result.issues.append(RobotsIssue(
                severity="error",
                message=str(e),
            ))
            return result

        result.content = content
        result.rules, result.sitemaps = self.parse_robots(content)

        # Analyze content
        self._analyze_syntax(result)
        self._analyze_rules(result)
        self._analyze_sitemaps(result)

        # Calculate stats
        result.stats = {
            "user_agents_count": len(result.rules),
            "user_agents": [r.user_agent for r in result.rules],
            "total_disallow_rules": sum(len(r.disallow) for r in result.rules),
            "total_allow_rules": sum(len(r.allow) for r in result.rules),
            "sitemaps_count": len(result.sitemaps),
            "has_crawl_delay": any(r.crawl_delay for r in result.rules),
            "content_length": len(content),
        }

        return result

    def _analyze_syntax(self, result: RobotsResult) -> None:
        """Check for syntax issues."""
        lines = result.content.split("\n")

        for line_num, line in enumerate(lines, 1):
            line = line.strip()

            # Skip empty lines and comments
            if not line or line.startswith("#"):
                continue

            # Check for valid directive
            if ":" not in line:
                result.issues.append(RobotsIssue(
                    severity="warning",
                    message=f"Invalid line (missing colon): {line[:50]}",
                    line_number=line_num,
                ))
                continue

            directive, value = line.split(":", 1)
            directive = directive.strip().lower()

            valid_directives = {
                "user-agent", "disallow", "allow",
                "crawl-delay", "sitemap", "host",
            }

            if directive not in valid_directives:
                result.issues.append(RobotsIssue(
                    severity="info",
                    message=f"Unknown directive: {directive}",
                    line_number=line_num,
                    directive=directive,
                ))

    def _analyze_rules(self, result: RobotsResult) -> None:
        """Analyze blocking rules."""
        # Check if there are any rules
        if not result.rules:
            result.issues.append(RobotsIssue(
                severity="info",
                message="No user-agent rules defined",
                suggestion="Add User-agent: * rules to control crawling",
            ))
            return

        # Check for wildcard rule
        has_wildcard = any(r.user_agent == "*" for r in result.rules)
        if not has_wildcard:
            result.issues.append(RobotsIssue(
                severity="info",
                message="No wildcard (*) user-agent defined",
                suggestion="Consider adding User-agent: * as fallback",
            ))

        # Check for blocking important resources
        for rules in result.rules:
            for disallow in rules.disallow:
                # Check if blocking root
                if disallow == "/":
                    result.issues.append(RobotsIssue(
                        severity="error",
                        message=f"Blocking entire site for {rules.user_agent}",
                        directive=f"Disallow: {disallow}",
                        suggestion="This will prevent indexing. Is this intentional?",
                    ))

                # Check if blocking CSS/JS
                if any(ext in disallow.lower() for ext in [".css", ".js"]):
                    result.issues.append(RobotsIssue(
                        severity="warning",
                        message=f"Blocking CSS/JS files for {rules.user_agent}",
                        directive=f"Disallow: {disallow}",
                        suggestion="May affect rendering and SEO",
                    ))

                # Check for blocking images
                if any(ext in disallow.lower() for ext in [".jpg", ".png", ".gif", ".webp"]):
                    result.issues.append(RobotsIssue(
                        severity="info",
                        message=f"Blocking image files for {rules.user_agent}",
                        directive=f"Disallow: {disallow}",
                    ))

            # Check crawl delay
            if rules.crawl_delay:
                if rules.crawl_delay > 10:
                    result.issues.append(RobotsIssue(
                        severity="warning",
                        message=f"High crawl-delay ({rules.crawl_delay}s) for {rules.user_agent}",
                        directive=f"Crawl-delay: {rules.crawl_delay}",
                        suggestion="May significantly slow indexing",
                    ))
                elif rules.crawl_delay > 0:
                    result.issues.append(RobotsIssue(
                        severity="info",
                        message=f"Crawl-delay set to {rules.crawl_delay}s for {rules.user_agent}",
                    ))

    def _analyze_sitemaps(self, result: RobotsResult) -> None:
        """Analyze sitemap declarations."""
        if not result.sitemaps:
            result.issues.append(RobotsIssue(
                severity="warning",
                message="No sitemap declared in robots.txt",
                suggestion="Add Sitemap: directive to help crawlers find your sitemap",
            ))
        else:
            for sitemap in result.sitemaps:
                if not sitemap.startswith("http"):
                    result.issues.append(RobotsIssue(
                        severity="warning",
                        message=f"Sitemap URL should be absolute: {sitemap}",
                        directive=f"Sitemap: {sitemap}",
                    ))

    def test_url(self, robots_url: str, test_path: str,
                 user_agent: str = "Googlebot") -> dict:
        """Test if a specific URL is allowed."""
        # Use Python's built-in parser
        rp = RobotFileParser()

        # Ensure robots.txt URL
        parsed = urlparse(robots_url)
        if not parsed.path.endswith("robots.txt"):
            robots_txt_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
        else:
            robots_txt_url = robots_url

        rp.set_url(robots_txt_url)
        try:
            rp.read()
        except Exception as e:
            return {
                "path": test_path,
                "user_agent": user_agent,
                "allowed": None,
                "error": str(e),
            }

        # Build full URL for testing
        base_url = f"{parsed.scheme}://{parsed.netloc}"
        full_url = urljoin(base_url, test_path)

        allowed = rp.can_fetch(user_agent, full_url)

        return {
            "path": test_path,
            "user_agent": user_agent,
            "allowed": allowed,
            "full_url": full_url,
        }

    def generate_report(self, result: RobotsResult) -> str:
        """Generate human-readable analysis report."""
        lines = [
            "=" * 60,
            "Robots.txt Analysis Report",
            "=" * 60,
            f"URL: {result.url}",
            f"Accessible: {'Yes' if result.accessible else 'No'}",
            f"Timestamp: {result.timestamp}",
            "",
        ]

        if result.accessible:
            lines.append("Statistics:")
            for key, value in result.stats.items():
                if key == "user_agents":
                    lines.append(f"  {key}: {', '.join(value) if value else 'None'}")
                else:
                    lines.append(f"  {key}: {value}")
            lines.append("")

        if result.sitemaps:
            lines.append(f"Sitemaps ({len(result.sitemaps)}):")
            for sitemap in result.sitemaps:
                lines.append(f"  - {sitemap}")
            lines.append("")

        if result.rules:
            lines.append("Rules Summary:")
            for rules in result.rules:
                lines.append(f"\n  User-agent: {rules.user_agent}")
                if rules.disallow:
                    lines.append(f"    Disallow: {len(rules.disallow)} rules")
                    for d in rules.disallow[:5]:
                        lines.append(f"      - {d}")
                    if len(rules.disallow) > 5:
                        lines.append(f"      ... and {len(rules.disallow) - 5} more")
                if rules.allow:
                    lines.append(f"    Allow: {len(rules.allow)} rules")
                    for a in rules.allow[:3]:
                        lines.append(f"      - {a}")
                if rules.crawl_delay:
                    lines.append(f"    Crawl-delay: {rules.crawl_delay}s")
            lines.append("")

        if result.issues:
            lines.append("Issues Found:")
            errors = [i for i in result.issues if i.severity == "error"]
            warnings = [i for i in result.issues if i.severity == "warning"]
            infos = [i for i in result.issues if i.severity == "info"]

            if errors:
                lines.append(f"\n  ERRORS ({len(errors)}):")
                for issue in errors:
                    lines.append(f"    - {issue.message}")
                    if issue.directive:
                        lines.append(f"      Directive: {issue.directive}")
                    if issue.suggestion:
                        lines.append(f"      Suggestion: {issue.suggestion}")

            if warnings:
                lines.append(f"\n  WARNINGS ({len(warnings)}):")
                for issue in warnings:
                    lines.append(f"    - {issue.message}")
                    if issue.suggestion:
                        lines.append(f"      Suggestion: {issue.suggestion}")

            if infos:
                lines.append(f"\n  INFO ({len(infos)}):")
                for issue in infos:
                    lines.append(f"    - {issue.message}")

        lines.append("")
        lines.append("=" * 60)

        return "\n".join(lines)

def main():
|
||||
"""Main entry point for CLI usage."""
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Analyze robots.txt configuration",
|
||||
)
|
||||
parser.add_argument("--url", "-u", required=True,
|
||||
help="URL to robots.txt or domain")
|
||||
parser.add_argument("--test-url", "-t",
|
||||
help="Test if specific URL path is allowed")
|
||||
parser.add_argument("--user-agent", "-a", default="Googlebot",
|
||||
help="User agent for testing (default: Googlebot)")
|
||||
parser.add_argument("--output", "-o", help="Output file for JSON report")
|
||||
parser.add_argument("--json", action="store_true", help="Output as JSON")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
checker = RobotsChecker()
|
||||
|
||||
if args.test_url:
|
||||
# Test specific URL
|
||||
test_result = checker.test_url(args.url, args.test_url, args.user_agent)
|
||||
if args.json:
|
||||
print(json.dumps(test_result, indent=2))
|
||||
else:
|
||||
status = "ALLOWED" if test_result["allowed"] else "BLOCKED"
|
||||
print(f"URL: {test_result['path']}")
|
||||
print(f"User-Agent: {test_result['user_agent']}")
|
||||
print(f"Status: {status}")
|
||||
else:
|
||||
# Full analysis
|
||||
result = checker.analyze(args.url)
|
||||
|
||||
if args.json or args.output:
|
||||
output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
|
||||
if args.output:
|
||||
with open(args.output, "w", encoding="utf-8") as f:
|
||||
f.write(output)
|
||||
logger.info(f"Report written to {args.output}")
|
||||
else:
|
||||
print(output)
|
||||
else:
|
||||
print(checker.generate_report(result))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
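The report above partitions issues by severity with plain list comprehensions before printing each bucket. A minimal standalone sketch of that partition (the `Issue` dataclass and the sample messages here are illustrative stand-ins for the checker's own issue type):

```python
from dataclasses import dataclass


@dataclass
class Issue:
    severity: str
    message: str


issues = [
    Issue("error", "Disallow directive outside a User-agent group"),
    Issue("warning", "No Sitemap directive declared"),
    Issue("error", "Empty Disallow value"),
]

errors = [i for i in issues if i.severity == "error"]
warnings = [i for i in issues if i.severity == "warning"]
print(f"ERRORS ({len(errors)}), WARNINGS ({len(warnings)})")  # ERRORS (2), WARNINGS (1)
```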
@@ -0,0 +1,969 @@
"""
Sitemap Crawler - Sequential page analysis from sitemap
=======================================================
Purpose: Crawl sitemap URLs one by one, analyze each page, save to Notion
Python: 3.10+
Usage:
    from sitemap_crawler import SitemapCrawler
    crawler = SitemapCrawler()
    crawler.crawl_sitemap("https://example.com/sitemap.xml", delay=2.0)
"""

import json
import logging
import time
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Callable, Generator
from urllib.parse import urlparse

import requests
from notion_client import Client

from base_client import config
from page_analyzer import PageAnalyzer, PageMetadata

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

# Default database for page analysis data
DEFAULT_PAGES_DATABASE_ID = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"

# Default limits to prevent excessive resource usage
DEFAULT_MAX_PAGES = 500
DEFAULT_DELAY_SECONDS = 2.0

# Progress tracking directory
PROGRESS_DIR = Path.home() / ".claude" / "seo-audit-progress"
PROGRESS_DIR.mkdir(parents=True, exist_ok=True)


@dataclass
class CrawlProgress:
    """Track crawl progress."""
    total_urls: int = 0
    processed_urls: int = 0
    successful_urls: int = 0
    failed_urls: int = 0
    skipped_urls: int = 0
    start_time: datetime = field(default_factory=datetime.now)
    current_url: str = ""
    audit_id: str = ""
    site: str = ""
    status: str = "running"  # running, completed, failed
    error_message: str = ""
    summary_page_id: str = ""

    def get_progress_percent(self) -> float:
        if self.total_urls == 0:
            return 0.0
        return (self.processed_urls / self.total_urls) * 100

    def get_elapsed_time(self) -> str:
        elapsed = datetime.now() - self.start_time
        minutes = int(elapsed.total_seconds() // 60)
        seconds = int(elapsed.total_seconds() % 60)
        return f"{minutes}m {seconds}s"

    def get_eta(self) -> str:
        if self.processed_urls == 0:
            return "calculating..."
        elapsed = (datetime.now() - self.start_time).total_seconds()
        avg_time_per_url = elapsed / self.processed_urls
        remaining_urls = self.total_urls - self.processed_urls
        eta_seconds = remaining_urls * avg_time_per_url
        minutes = int(eta_seconds // 60)
        seconds = int(eta_seconds % 60)
        return f"{minutes}m {seconds}s"

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON serialization."""
        return {
            "audit_id": self.audit_id,
            "site": self.site,
            "status": self.status,
            "total_urls": self.total_urls,
            "processed_urls": self.processed_urls,
            "successful_urls": self.successful_urls,
            "failed_urls": self.failed_urls,
            "progress_percent": round(self.get_progress_percent(), 1),
            "elapsed_time": self.get_elapsed_time(),
            "eta": self.get_eta(),
            "current_url": self.current_url,
            "start_time": self.start_time.isoformat(),
            "error_message": self.error_message,
            "summary_page_id": self.summary_page_id,
            "updated_at": datetime.now().isoformat(),
        }

    def save_to_file(self, filepath: Path | None = None) -> Path:
        """Save progress to JSON file."""
        if filepath is None:
            filepath = PROGRESS_DIR / f"{self.audit_id}.json"
        with open(filepath, "w") as f:
            json.dump(self.to_dict(), f, indent=2)
        return filepath

    @classmethod
    def load_from_file(cls, filepath: Path) -> "CrawlProgress":
        """Load progress from JSON file."""
        with open(filepath, "r") as f:
            data = json.load(f)
        progress = cls()
        progress.audit_id = data.get("audit_id", "")
        progress.site = data.get("site", "")
        progress.status = data.get("status", "unknown")
        progress.total_urls = data.get("total_urls", 0)
        progress.processed_urls = data.get("processed_urls", 0)
        progress.successful_urls = data.get("successful_urls", 0)
        progress.failed_urls = data.get("failed_urls", 0)
        progress.current_url = data.get("current_url", "")
        progress.error_message = data.get("error_message", "")
        progress.summary_page_id = data.get("summary_page_id", "")
        if data.get("start_time"):
            progress.start_time = datetime.fromisoformat(data["start_time"])
        return progress
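`get_eta()` above is a simple linear extrapolation: average seconds per processed URL, multiplied by the URLs still to go. A standalone sketch of that arithmetic with hypothetical figures (40 of 100 URLs done after 120 seconds):

```python
# Hypothetical crawl state; these numbers are illustrative, not from a real run.
elapsed_seconds = 120.0
processed, total = 40, 100

avg_per_url = elapsed_seconds / processed          # 3.0s per URL so far
eta_seconds = (total - processed) * avg_per_url    # 60 URLs left -> 180.0s

print(f"{int(eta_seconds // 60)}m {int(eta_seconds % 60)}s")  # 3m 0s
```

The estimate includes the per-request `delay`, since that delay is part of the elapsed time already observed.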


def get_active_crawls() -> list[CrawlProgress]:
    """Get all active (running) crawl jobs."""
    active = []
    for filepath in PROGRESS_DIR.glob("*.json"):
        try:
            progress = CrawlProgress.load_from_file(filepath)
            if progress.status == "running":
                active.append(progress)
        except Exception:
            continue
    return active


def get_all_crawls() -> list[CrawlProgress]:
    """Get all crawl jobs (active and completed)."""
    crawls = []
    for filepath in sorted(PROGRESS_DIR.glob("*.json"), reverse=True):
        try:
            progress = CrawlProgress.load_from_file(filepath)
            crawls.append(progress)
        except Exception:
            continue
    return crawls


def get_crawl_status(audit_id: str) -> CrawlProgress | None:
    """Get status of a specific crawl by audit ID."""
    filepath = PROGRESS_DIR / f"{audit_id}.json"
    if filepath.exists():
        return CrawlProgress.load_from_file(filepath)
    return None


@dataclass
class CrawlResult:
    """Result of a complete sitemap crawl."""
    site: str
    sitemap_url: str
    audit_id: str
    total_pages: int
    successful_pages: int
    failed_pages: int
    start_time: datetime
    end_time: datetime
    pages_analyzed: list[PageMetadata] = field(default_factory=list)
    notion_page_ids: list[str] = field(default_factory=list)
    summary_page_id: str | None = None

    def get_duration(self) -> str:
        duration = self.end_time - self.start_time
        minutes = int(duration.total_seconds() // 60)
        seconds = int(duration.total_seconds() % 60)
        return f"{minutes}m {seconds}s"


class SitemapCrawler:
    """Crawl sitemap URLs and analyze each page."""

    def __init__(
        self,
        notion_token: str | None = None,
        database_id: str | None = None,
    ):
        """
        Initialize sitemap crawler.

        Args:
            notion_token: Notion API token
            database_id: Notion database ID for storing results
        """
        self.notion_token = notion_token or config.notion_token
        self.database_id = database_id or DEFAULT_PAGES_DATABASE_ID
        self.analyzer = PageAnalyzer()

        if self.notion_token:
            self.notion = Client(auth=self.notion_token)
        else:
            self.notion = None
            logger.warning("Notion token not configured, results will not be saved")

    def fetch_sitemap_urls(self, sitemap_url: str) -> list[str]:
        """
        Fetch and parse URLs from a sitemap.

        Args:
            sitemap_url: URL of the sitemap

        Returns:
            List of URLs found in the sitemap
        """
        try:
            response = requests.get(sitemap_url, timeout=30)
            response.raise_for_status()

            # Parse XML
            root = ET.fromstring(response.content)

            # Handle namespace
            namespaces = {
                "sm": "http://www.sitemaps.org/schemas/sitemap/0.9"
            }

            urls = []

            # Check if this is a sitemap index
            sitemap_tags = root.findall(".//sm:sitemap/sm:loc", namespaces)
            if sitemap_tags:
                # This is a sitemap index, recursively fetch child sitemaps
                logger.info(f"Found sitemap index with {len(sitemap_tags)} child sitemaps")
                for loc in sitemap_tags:
                    if loc.text:
                        child_urls = self.fetch_sitemap_urls(loc.text)
                        urls.extend(child_urls)
            else:
                # Regular sitemap, extract URLs
                url_tags = root.findall(".//sm:url/sm:loc", namespaces)
                if not url_tags:
                    # Try without namespace
                    url_tags = root.findall(".//url/loc")

                for loc in url_tags:
                    if loc.text:
                        urls.append(loc.text)

            # Remove duplicates while preserving order
            seen = set()
            unique_urls = []
            for url in urls:
                if url not in seen:
                    seen.add(url)
                    unique_urls.append(url)

            logger.info(f"Found {len(unique_urls)} unique URLs in sitemap")
            return unique_urls

        except Exception as e:
            logger.error(f"Failed to fetch sitemap: {e}")
            raise
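The namespace handling and order-preserving dedupe in `fetch_sitemap_urls` can be exercised standalone against an inline sitemap document (the `example.com` URLs are illustrative; `dict.fromkeys` is used here as a compact equivalent of the seen-set loop):

```python
import xml.etree.ElementTree as ET

SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/</loc></url>
</urlset>"""

root = ET.fromstring(SITEMAP_XML)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# <urlset> declares a default namespace, so bare ".//url/loc" finds nothing;
# the "sm:" prefix mapped above is required (the fallback in the method covers
# sitemaps served without a namespace).
urls = [loc.text for loc in root.findall(".//sm:url/sm:loc", ns) if loc.text]
unique = list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order
print(unique)  # ['https://example.com/', 'https://example.com/about']
```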

    def crawl_sitemap(
        self,
        sitemap_url: str,
        delay: float = DEFAULT_DELAY_SECONDS,
        max_pages: int = DEFAULT_MAX_PAGES,
        progress_callback: Callable[[CrawlProgress], None] | None = None,
        save_to_notion: bool = True,
        url_filter: Callable[[str], bool] | None = None,
    ) -> CrawlResult:
        """
        Crawl all URLs in a sitemap sequentially.

        Args:
            sitemap_url: URL of the sitemap
            delay: Seconds to wait between requests (default: 2.0s)
            max_pages: Maximum number of pages to process (default: 500)
            progress_callback: Function called with progress updates
            save_to_notion: Whether to save results to Notion
            url_filter: Optional function to filter URLs (return True to include)

        Returns:
            CrawlResult with all analyzed pages
        """
        # Parse site info
        parsed_sitemap = urlparse(sitemap_url)
        site = f"{parsed_sitemap.scheme}://{parsed_sitemap.netloc}"
        site_domain = parsed_sitemap.netloc

        # Generate audit ID
        audit_id = f"{site_domain}-pages-{datetime.now().strftime('%Y%m%d-%H%M%S')}"

        logger.info(f"Starting sitemap crawl: {sitemap_url}")
        logger.info(f"Audit ID: {audit_id}")
        logger.info(f"Delay between requests: {delay}s")

        # Initialize progress tracking
        progress = CrawlProgress(
            audit_id=audit_id,
            site=site,
            status="running",
        )

        # Fetch URLs
        urls = self.fetch_sitemap_urls(sitemap_url)

        # Apply URL filter if provided
        if url_filter:
            urls = [url for url in urls if url_filter(url)]
            logger.info(f"After filtering: {len(urls)} URLs")

        # Apply max pages limit (default: 500 to prevent excessive resource usage)
        if len(urls) > max_pages:
            logger.warning(f"Sitemap has {len(urls)} URLs, limiting to {max_pages} pages")
            logger.warning("Use the max_pages parameter to adjust this limit")
            urls = urls[:max_pages]
        logger.info(f"Processing {len(urls)} pages (max: {max_pages})")

        # Update progress with total URLs
        progress.total_urls = len(urls)
        progress.save_to_file()

        # Initialize result
        result = CrawlResult(
            site=site,
            sitemap_url=sitemap_url,
            audit_id=audit_id,
            total_pages=len(urls),
            successful_pages=0,
            failed_pages=0,
            start_time=datetime.now(),
            end_time=datetime.now(),
        )

        # Process each URL
        try:
            for i, url in enumerate(urls):
                progress.current_url = url
                progress.processed_urls = i
                progress.save_to_file()  # Save progress to file

                if progress_callback:
                    progress_callback(progress)

                logger.info(f"[{i + 1}/{len(urls)}] Analyzing: {url}")

                try:
                    # Analyze page
                    metadata = self.analyzer.analyze_url(url)
                    result.pages_analyzed.append(metadata)

                    if metadata.status_code == 200:
                        progress.successful_urls += 1
                        result.successful_pages += 1

                        # Save to Notion
                        if save_to_notion and self.notion:
                            page_id = self._save_page_to_notion(metadata, audit_id, site)
                            if page_id:
                                result.notion_page_ids.append(page_id)
                    else:
                        progress.failed_urls += 1
                        result.failed_pages += 1

                except Exception as e:
                    logger.error(f"Failed to analyze {url}: {e}")
                    progress.failed_urls += 1
                    result.failed_pages += 1

                # Wait before next request
                if i < len(urls) - 1:  # Don't wait after last URL
                    time.sleep(delay)

            # Final progress update
            progress.processed_urls = len(urls)
            progress.status = "completed"
            if progress_callback:
                progress_callback(progress)

        except Exception as e:
            progress.status = "failed"
            progress.error_message = str(e)
            progress.save_to_file()
            raise

        # Update result
        result.end_time = datetime.now()

        # Create summary page
        if save_to_notion and self.notion:
            summary_id = self._create_crawl_summary_page(result)
            result.summary_page_id = summary_id
            progress.summary_page_id = summary_id

        # Save final progress
        progress.save_to_file()

        logger.info(f"Crawl complete: {result.successful_pages}/{result.total_pages} pages analyzed")
        logger.info(f"Duration: {result.get_duration()}")

        return result

    def _save_page_to_notion(
        self,
        metadata: PageMetadata,
        audit_id: str,
        site: str,
    ) -> str | None:
        """Save page metadata to Notion database."""
        try:
            # Build properties
            properties = {
                "Issue": {"title": [{"text": {"content": f"📄 {metadata.url}"}}]},
                "Category": {"select": {"name": "On-page SEO"}},
                "Priority": {"select": {"name": self._determine_priority(metadata)}},
                "Site": {"url": site},
                "URL": {"url": metadata.url},
                "Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
                "Found Date": {"date": {"start": datetime.now().strftime("%Y-%m-%d")}},
            }

            # Build page content
            children = self._build_page_content(metadata)

            response = self.notion.pages.create(
                parent={"database_id": self.database_id},
                properties=properties,
                children=children,
            )

            return response["id"]

        except Exception as e:
            logger.error(f"Failed to save to Notion: {e}")
            return None

    def _determine_priority(self, metadata: PageMetadata) -> str:
        """Determine priority based on issues found."""
        if len(metadata.issues) >= 3:
            return "High"
        elif len(metadata.issues) >= 1:
            return "Medium"
        elif len(metadata.warnings) >= 3:
            return "Medium"
        else:
            return "Low"
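`_determine_priority` condenses to: three or more issues is High; any issue, or three or more warnings, is Medium; everything else is Low. A standalone restatement of the same thresholds (the free function and its argument names are illustrative, not part of the module):

```python
def determine_priority(issue_count: int, warning_count: int) -> str:
    # Same thresholds as SitemapCrawler._determine_priority above.
    if issue_count >= 3:
        return "High"
    if issue_count >= 1 or warning_count >= 3:
        return "Medium"
    return "Low"


print(determine_priority(4, 0), determine_priority(1, 0), determine_priority(0, 1))
# High Medium Low
```

Note that warnings alone never raise a page above Medium; only issues can make it High.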

    def _build_page_content(self, metadata: PageMetadata) -> list[dict]:
        """Build Notion page content blocks from metadata."""
        children = []

        # Status summary callout
        status_emoji = "✅" if not metadata.issues else "⚠️" if len(metadata.issues) < 3 else "❌"
        children.append({
            "object": "block",
            "type": "callout",
            "callout": {
                "rich_text": [
                    {"type": "text", "text": {"content": f"Status: {metadata.status_code} | "}},
                    {"type": "text", "text": {"content": f"Response: {metadata.response_time_ms:.0f}ms | "}},
                    {"type": "text", "text": {"content": f"Issues: {len(metadata.issues)} | "}},
                    {"type": "text", "text": {"content": f"Warnings: {len(metadata.warnings)}"}},
                ],
                "icon": {"type": "emoji", "emoji": status_emoji},
                "color": "gray_background" if not metadata.issues else "yellow_background" if len(metadata.issues) < 3 else "red_background",
            }
        })

        # Meta Tags Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Meta Tags"}}]}
        })

        # Meta tags table
        meta_rows = [
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Tag"}, "annotations": {"bold": True}}],
                [{"type": "text", "text": {"content": "Value"}, "annotations": {"bold": True}}],
                [{"type": "text", "text": {"content": "Status"}, "annotations": {"bold": True}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Title"}}],
                [{"type": "text", "text": {"content": (metadata.title or "—")[:50]}}],
                [{"type": "text", "text": {"content": f"✓ {metadata.title_length} chars" if metadata.title else "✗ Missing"}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Description"}}],
                [{"type": "text", "text": {"content": (metadata.meta_description or "—")[:50]}}],
                [{"type": "text", "text": {"content": f"✓ {metadata.meta_description_length} chars" if metadata.meta_description else "✗ Missing"}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Canonical"}}],
                [{"type": "text", "text": {"content": (metadata.canonical_url or "—")[:50]}}],
                [{"type": "text", "text": {"content": "✓" if metadata.canonical_url else "✗ Missing"}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Robots"}}],
                [{"type": "text", "text": {"content": metadata.robots_meta or "—"}}],
                [{"type": "text", "text": {"content": "✓" if metadata.robots_meta else "—"}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Lang"}}],
                [{"type": "text", "text": {"content": metadata.html_lang or "—"}}],
                [{"type": "text", "text": {"content": "✓" if metadata.html_lang else "—"}}],
            ]}},
        ]

        children.append({
            "object": "block",
            "type": "table",
            "table": {
                "table_width": 3,
                "has_column_header": True,
                "has_row_header": False,
                "children": meta_rows
            }
        })

        # Headings Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Headings"}}]}
        })

        children.append({
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [
                {"type": "text", "text": {"content": f"H1: {metadata.h1_count} | "}},
                {"type": "text", "text": {"content": f"Total headings: {len(metadata.headings)}"}},
            ]}
        })

        if metadata.h1_text:
            children.append({
                "object": "block",
                "type": "quote",
                "quote": {"rich_text": [{"type": "text", "text": {"content": metadata.h1_text[:200]}}]}
            })

        # Schema Data Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Structured Data"}}]}
        })

        if metadata.schema_types_found:
            children.append({
                "object": "block",
                "type": "paragraph",
                "paragraph": {"rich_text": [
                    {"type": "text", "text": {"content": "Schema types found: "}},
                    {"type": "text", "text": {"content": ", ".join(metadata.schema_types_found)}, "annotations": {"code": True}},
                ]}
            })
        else:
            children.append({
                "object": "block",
                "type": "callout",
                "callout": {
                    "rich_text": [{"type": "text", "text": {"content": "No structured data found on this page"}}],
                    "icon": {"type": "emoji", "emoji": "⚠️"},
                    "color": "yellow_background",
                }
            })

        # Open Graph Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Open Graph"}}]}
        })

        og = metadata.open_graph
        og_status = "✓ Configured" if og.og_title else "✗ Missing"
        children.append({
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [
                {"type": "text", "text": {"content": f"Status: {og_status}\n"}},
                {"type": "text", "text": {"content": f"og:title: {og.og_title or '—'}\n"}},
                {"type": "text", "text": {"content": f"og:type: {og.og_type or '—'}"}},
            ]}
        })

        # Links Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Links"}}]}
        })

        children.append({
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [
                {"type": "text", "text": {"content": f"Internal links: {metadata.internal_link_count}\n"}},
                {"type": "text", "text": {"content": f"External links: {metadata.external_link_count}"}},
            ]}
        })

        # Images Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Images"}}]}
        })

        children.append({
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [
                {"type": "text", "text": {"content": f"Total: {metadata.images_total} | "}},
                {"type": "text", "text": {"content": f"With alt: {metadata.images_with_alt} | "}},
                {"type": "text", "text": {"content": f"Without alt: {metadata.images_without_alt}"}},
            ]}
        })

        # Hreflang Section (if present)
        if metadata.hreflang_tags:
            children.append({
                "object": "block",
                "type": "heading_2",
                "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Hreflang Tags"}}]}
            })

            for tag in metadata.hreflang_tags[:10]:
                children.append({
                    "object": "block",
                    "type": "bulleted_list_item",
                    "bulleted_list_item": {"rich_text": [
                        {"type": "text", "text": {"content": f"{tag['lang']}: "}},
                        {"type": "text", "text": {"content": tag['url'], "link": {"url": tag['url']}}},
                    ]}
                })

        # Issues & Warnings Section
        if metadata.issues or metadata.warnings:
            children.append({
                "object": "block",
                "type": "heading_2",
                "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Issues & Warnings"}}]}
            })

            for issue in metadata.issues:
                children.append({
                    "object": "block",
                    "type": "to_do",
                    "to_do": {
                        "rich_text": [
                            {"type": "text", "text": {"content": "❌ "}, "annotations": {"bold": True}},
                            {"type": "text", "text": {"content": issue}},
                        ],
                        "checked": False,
                    }
                })

            for warning in metadata.warnings:
                children.append({
                    "object": "block",
                    "type": "to_do",
                    "to_do": {
                        "rich_text": [
                            {"type": "text", "text": {"content": "⚠️ "}, "annotations": {"bold": True}},
                            {"type": "text", "text": {"content": warning}},
                        ],
                        "checked": False,
                    }
                })

        return children

    def _create_crawl_summary_page(self, result: CrawlResult) -> str | None:
        """Create a summary page for the crawl."""
        try:
            site_domain = urlparse(result.site).netloc

            # Calculate statistics
            total_issues = sum(len(p.issues) for p in result.pages_analyzed)
            total_warnings = sum(len(p.warnings) for p in result.pages_analyzed)
            pages_with_issues = sum(1 for p in result.pages_analyzed if p.issues)
            pages_without_schema = sum(1 for p in result.pages_analyzed if not p.schema_types_found)
            pages_without_description = sum(1 for p in result.pages_analyzed if not p.meta_description)

            children = []

            # Header callout
            children.append({
                "object": "block",
                "type": "callout",
                "callout": {
                    "rich_text": [
                        {"type": "text", "text": {"content": "Sitemap Crawl Complete\n\n"}},
                        {"type": "text", "text": {"content": f"Audit ID: {result.audit_id}\n"}},
                        {"type": "text", "text": {"content": f"Duration: {result.get_duration()}\n"}},
                        {"type": "text", "text": {"content": f"Pages: {result.successful_pages}/{result.total_pages}"}},
                    ],
                    "icon": {"type": "emoji", "emoji": "📊"},
                    "color": "blue_background",
                }
            })

            # Statistics table
            children.append({
                "object": "block",
                "type": "heading_2",
                "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Statistics"}}]}
            })

            stats_rows = [
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Metric"}, "annotations": {"bold": True}}],
                    [{"type": "text", "text": {"content": "Count"}, "annotations": {"bold": True}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Total Pages"}}],
                    [{"type": "text", "text": {"content": str(result.total_pages)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Successfully Analyzed"}}],
                    [{"type": "text", "text": {"content": str(result.successful_pages)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Pages with Issues"}}],
                    [{"type": "text", "text": {"content": str(pages_with_issues)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Total Issues"}}],
                    [{"type": "text", "text": {"content": str(total_issues)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Total Warnings"}}],
                    [{"type": "text", "text": {"content": str(total_warnings)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Pages without Schema"}}],
                    [{"type": "text", "text": {"content": str(pages_without_schema)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Pages without Description"}}],
                    [{"type": "text", "text": {"content": str(pages_without_description)}}],
                ]}},
            ]

            children.append({
                "object": "block",
                "type": "table",
                "table": {
                    "table_width": 2,
                    "has_column_header": True,
                    "has_row_header": False,
                    "children": stats_rows
                }
            })

            # Pages list
            children.append({
                "object": "block",
                "type": "heading_2",
                "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Analyzed Pages"}}]}
            })

            children.append({
                "object": "block",
                "type": "paragraph",
                "paragraph": {"rich_text": [
                    {"type": "text", "text": {"content": f"Filter by Audit ID in the database to see all {result.successful_pages} page entries."}}
                ]}
            })

            # Create the summary page
            response = self.notion.pages.create(
                parent={"database_id": self.database_id},
                properties={
                    "Issue": {"title": [{"text": {"content": f"📊 Sitemap Crawl: {site_domain}"}}]},
                    "Category": {"select": {"name": "Technical SEO"}},
                    "Priority": {"select": {"name": "High"}},
                    "Site": {"url": result.site},
                    "Audit ID": {"rich_text": [{"text": {"content": result.audit_id}}]},
                    "Found Date": {"date": {"start": datetime.now().strftime("%Y-%m-%d")}},
                },
                children=children,
            )

            logger.info(f"Created crawl summary page: {response['id']}")
            return response["id"]

        except Exception as e:
            logger.error(f"Failed to create summary page: {e}")
            return None


def print_progress_status(progress: CrawlProgress) -> None:
    """Print formatted progress status."""
    status_emoji = {
        "running": "🔄",
        "completed": "✅",
        "failed": "❌",
    }.get(progress.status, "❓")

    print(f"""
{'=' * 60}
{status_emoji} SEO Page Analysis - {progress.status.upper()}
{'=' * 60}
Audit ID: {progress.audit_id}
Site: {progress.site}
Status: {progress.status}

Progress: {progress.processed_urls}/{progress.total_urls} pages ({progress.get_progress_percent():.1f}%)
Successful: {progress.successful_urls}
Failed: {progress.failed_urls}
Elapsed: {progress.get_elapsed_time()}
ETA: {progress.get_eta() if progress.status == 'running' else 'N/A'}

Current URL: {progress.current_url[:60] + '...' if len(progress.current_url) > 60 else progress.current_url}
""")

    if progress.summary_page_id:
        print(f"Summary: https://www.notion.so/{progress.summary_page_id.replace('-', '')}")
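The summary link printed above relies on the convention that a Notion page can be opened at `https://www.notion.so/` followed by its page ID with the dashes stripped. A standalone sketch, reusing the module's default database ID as a sample UUID:

```python
# Sample ID taken from DEFAULT_PAGES_DATABASE_ID above.
page_id = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"
url = f"https://www.notion.so/{page_id.replace('-', '')}"
print(url)  # https://www.notion.so/2c8581e58a1e8035880be38cefc2f3ef
```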

    if progress.error_message:
        print(f"Error: {progress.error_message}")

    print("=" * 60)


def main():
    """CLI entry point."""
    import argparse

    parser = argparse.ArgumentParser(description="Sitemap Crawler with Background Support")
    subparsers = parser.add_subparsers(dest="command", help="Commands")

    # Crawl command
    crawl_parser = subparsers.add_parser("crawl", help="Start crawling a sitemap")
    crawl_parser.add_argument("sitemap_url", help="URL of the sitemap to crawl")
    crawl_parser.add_argument("--delay", "-d", type=float, default=DEFAULT_DELAY_SECONDS,
                              help=f"Delay between requests in seconds (default: {DEFAULT_DELAY_SECONDS})")
    crawl_parser.add_argument("--max-pages", "-m", type=int, default=DEFAULT_MAX_PAGES,
                              help=f"Maximum pages to process (default: {DEFAULT_MAX_PAGES})")
    crawl_parser.add_argument("--no-notion", action="store_true",
                              help="Don't save to Notion")
    crawl_parser.add_argument("--no-limit", action="store_true",
                              help="Remove page limit (use with caution)")

    # Status command
    status_parser = subparsers.add_parser("status", help="Check crawl progress")
    status_parser.add_argument("audit_id", nargs="?", help="Specific audit ID to check (optional)")
    status_parser.add_argument("--all", "-a", action="store_true", help="Show all crawls (not just active)")

    # List command
    list_parser = subparsers.add_parser("list", help="List all crawl jobs")

    args = parser.parse_args()

    # Default to crawl if no command specified but URL provided
    if args.command is None:
        # Check if first positional arg looks like a URL
        import sys
        if len(sys.argv) > 1 and (sys.argv[1].startswith("http") or sys.argv[1].endswith(".xml")):
            args.command = "crawl"
            args.sitemap_url = sys.argv[1]
            args.delay = DEFAULT_DELAY_SECONDS
            args.max_pages = DEFAULT_MAX_PAGES
            args.no_notion = False
            args.no_limit = False
        else:
            parser.print_help()
            return

    if args.command == "status":
        if args.audit_id:
            # Show specific crawl status
            progress = get_crawl_status(args.audit_id)
            if progress:
                print_progress_status(progress)
            else:
                print(f"No crawl found with audit ID: {args.audit_id}")
        else:
            # Show active crawls
            if args.all:
                crawls = get_all_crawls()
                label = "All"
            else:
                crawls = get_active_crawls()
                label = "Active"

            if crawls:
print(f"\n{label} Crawl Jobs ({len(crawls)}):")
|
||||
print("-" * 60)
|
||||
for p in crawls:
|
||||
status_emoji = {"running": "🔄", "completed": "✅", "failed": "❌"}.get(p.status, "❓")
|
||||
print(f"{status_emoji} {p.audit_id}")
|
||||
print(f" Site: {p.site}")
|
||||
print(f" Progress: {p.processed_urls}/{p.total_urls} ({p.get_progress_percent():.1f}%)")
|
||||
print()
|
||||
else:
|
||||
print(f"No {label.lower()} crawl jobs found.")
|
||||
return
|
||||
|
||||
if args.command == "list":
|
||||
crawls = get_all_crawls()
|
||||
if crawls:
|
||||
print(f"\nAll Crawl Jobs ({len(crawls)}):")
|
||||
print("-" * 80)
|
||||
print(f"{'Status':<10} {'Audit ID':<45} {'Progress':<15}")
|
||||
print("-" * 80)
|
||||
for p in crawls[:20]: # Show last 20
|
||||
status_emoji = {"running": "🔄", "completed": "✅", "failed": "❌"}.get(p.status, "❓")
|
||||
progress_str = f"{p.processed_urls}/{p.total_urls}"
|
||||
print(f"{status_emoji} {p.status:<7} {p.audit_id:<45} {progress_str:<15}")
|
||||
if len(crawls) > 20:
|
||||
print(f"... and {len(crawls) - 20} more")
|
||||
else:
|
||||
print("No crawl jobs found.")
|
||||
return
|
||||
|
||||
if args.command == "crawl":
|
||||
# Handle --no-limit option
|
||||
max_pages = args.max_pages
|
||||
if args.no_limit:
|
||||
max_pages = 999999 # Effectively unlimited
|
||||
print("⚠️ WARNING: Page limit disabled. This may take a very long time!")
|
||||
|
||||
def progress_callback(progress: CrawlProgress):
|
||||
pct = progress.get_progress_percent()
|
||||
print(f"\r[{pct:5.1f}%] {progress.processed_urls}/{progress.total_urls} pages | "
|
||||
f"Success: {progress.successful_urls} | Failed: {progress.failed_urls} | "
|
||||
f"ETA: {progress.get_eta()}", end="", flush=True)
|
||||
|
||||
crawler = SitemapCrawler()
|
||||
result = crawler.crawl_sitemap(
|
||||
args.sitemap_url,
|
||||
delay=args.delay,
|
||||
max_pages=max_pages,
|
||||
progress_callback=progress_callback,
|
||||
save_to_notion=not args.no_notion,
|
||||
)
|
||||
|
||||
print() # New line after progress
|
||||
print()
|
||||
print("=" * 60)
|
||||
print("CRAWL COMPLETE")
|
||||
print("=" * 60)
|
||||
print(f"Audit ID: {result.audit_id}")
|
||||
print(f"Total Pages: {result.total_pages}")
|
||||
print(f"Successful: {result.successful_pages}")
|
||||
print(f"Failed: {result.failed_pages}")
|
||||
print(f"Duration: {result.get_duration()}")
|
||||
if result.summary_page_id:
|
||||
print(f"Summary Page: https://www.notion.so/{result.summary_page_id.replace('-', '')}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,467 @@
"""
Sitemap Validator - Validate XML sitemaps
==========================================
Purpose: Parse and validate XML sitemaps for SEO compliance
Python: 3.10+
Usage:
    python sitemap_validator.py --url https://example.com/sitemap.xml
"""

import argparse
import asyncio
import gzip
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime

import aiohttp
import requests
from lxml import etree

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


@dataclass
class SitemapIssue:
    """Represents a sitemap validation issue."""

    severity: str  # "error", "warning", "info"
    message: str
    url: str | None = None
    suggestion: str | None = None


@dataclass
class SitemapEntry:
    """Represents a single URL entry in a sitemap."""

    loc: str
    lastmod: str | None = None
    changefreq: str | None = None
    priority: float | None = None
    status_code: int | None = None


@dataclass
class SitemapResult:
    """Complete sitemap validation result."""

    url: str
    sitemap_type: str  # "urlset" or "sitemapindex"
    entries: list[SitemapEntry] = field(default_factory=list)
    child_sitemaps: list[str] = field(default_factory=list)
    issues: list[SitemapIssue] = field(default_factory=list)
    valid: bool = True
    stats: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON output."""
        return {
            "url": self.url,
            "sitemap_type": self.sitemap_type,
            "valid": self.valid,
            "stats": self.stats,
            "issues": [
                {
                    "severity": i.severity,
                    "message": i.message,
                    "url": i.url,
                    "suggestion": i.suggestion,
                }
                for i in self.issues
            ],
            "entries_count": len(self.entries),
            "child_sitemaps": self.child_sitemaps,
            "timestamp": self.timestamp,
        }


class SitemapValidator:
    """Validate XML sitemaps."""

    SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
    MAX_URLS = 50000
    MAX_SIZE_BYTES = 50 * 1024 * 1024  # 50MB

    VALID_CHANGEFREQ = {
        "always", "hourly", "daily", "weekly",
        "monthly", "yearly", "never",
    }

    def __init__(self, check_urls: bool = False, max_concurrent: int = 10):
        self.check_urls = check_urls
        self.max_concurrent = max_concurrent
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
        })

    def fetch_sitemap(self, url: str) -> tuple[bytes, bool]:
        """Fetch sitemap content, handling gzip compression."""
        try:
            response = self.session.get(url, timeout=30)
            response.raise_for_status()

            content = response.content
            is_gzipped = False

            # Check if gzipped
            if url.endswith(".gz") or response.headers.get(
                "Content-Encoding"
            ) == "gzip":
                try:
                    content = gzip.decompress(content)
                    is_gzipped = True
                except gzip.BadGzipFile:
                    pass

            return content, is_gzipped
        except requests.RequestException as e:
            raise RuntimeError(f"Failed to fetch sitemap: {e}") from e

    def parse_sitemap(self, content: bytes) -> tuple[str, list[dict]]:
        """Parse sitemap XML content."""
        try:
            root = etree.fromstring(content)
        except etree.XMLSyntaxError as e:
            raise ValueError(f"Invalid XML: {e}") from e

        # Namespace map for element lookups
        nsmap = {"sm": self.SITEMAP_NS}

        # Check whether it's a sitemap index or a urlset
        if root.tag == f"{{{self.SITEMAP_NS}}}sitemapindex":
            sitemap_type = "sitemapindex"
            entries = []
            for sitemap in root.findall("sm:sitemap", nsmap):
                entry = {}
                loc = sitemap.find("sm:loc", nsmap)
                if loc is not None and loc.text:
                    entry["loc"] = loc.text.strip()
                lastmod = sitemap.find("sm:lastmod", nsmap)
                if lastmod is not None and lastmod.text:
                    entry["lastmod"] = lastmod.text.strip()
                if entry.get("loc"):
                    entries.append(entry)
        elif root.tag == f"{{{self.SITEMAP_NS}}}urlset":
            sitemap_type = "urlset"
            entries = []
            for url in root.findall("sm:url", nsmap):
                entry = {}
                loc = url.find("sm:loc", nsmap)
                if loc is not None and loc.text:
                    entry["loc"] = loc.text.strip()
                lastmod = url.find("sm:lastmod", nsmap)
                if lastmod is not None and lastmod.text:
                    entry["lastmod"] = lastmod.text.strip()
                changefreq = url.find("sm:changefreq", nsmap)
                if changefreq is not None and changefreq.text:
                    entry["changefreq"] = changefreq.text.strip().lower()
                priority = url.find("sm:priority", nsmap)
                if priority is not None and priority.text:
                    try:
                        entry["priority"] = float(priority.text.strip())
                    except ValueError:
                        entry["priority"] = None
                if entry.get("loc"):
                    entries.append(entry)
        else:
            raise ValueError(f"Unknown sitemap type: {root.tag}")

        return sitemap_type, entries

    def validate(self, url: str) -> SitemapResult:
        """Validate a sitemap URL."""
        result = SitemapResult(url=url, sitemap_type="unknown")

        # Fetch sitemap
        try:
            content, is_gzipped = self.fetch_sitemap(url)
        except RuntimeError as e:
            result.issues.append(SitemapIssue(
                severity="error",
                message=str(e),
                url=url,
            ))
            result.valid = False
            return result

        # Check size
        if len(content) > self.MAX_SIZE_BYTES:
            result.issues.append(SitemapIssue(
                severity="error",
                message=f"Sitemap exceeds 50MB limit ({len(content) / 1024 / 1024:.2f}MB)",
                url=url,
                suggestion="Split sitemap into smaller files using a sitemap index",
            ))

        # Parse XML
        try:
            sitemap_type, entries = self.parse_sitemap(content)
        except ValueError as e:
            result.issues.append(SitemapIssue(
                severity="error",
                message=str(e),
                url=url,
            ))
            result.valid = False
            return result

        result.sitemap_type = sitemap_type

        # Process entries
        if sitemap_type == "sitemapindex":
            result.child_sitemaps = [e["loc"] for e in entries]
            result.stats = {
                "child_sitemaps_count": len(entries),
            }
        else:
            # Validate URL entries
            url_count = len(entries)
            result.stats["url_count"] = url_count

            if url_count > self.MAX_URLS:
                result.issues.append(SitemapIssue(
                    severity="error",
                    message=f"Sitemap exceeds 50,000 URL limit ({url_count} URLs)",
                    url=url,
                    suggestion="Split into multiple sitemaps with a sitemap index",
                ))

            if url_count == 0:
                result.issues.append(SitemapIssue(
                    severity="warning",
                    message="Sitemap is empty (no URLs)",
                    url=url,
                ))

            # Validate individual entries
            seen_urls = set()
            invalid_lastmod = 0
            invalid_changefreq = 0
            invalid_priority = 0

            for entry in entries:
                loc = entry.get("loc", "")

                # Check for duplicates
                if loc in seen_urls:
                    result.issues.append(SitemapIssue(
                        severity="warning",
                        message="Duplicate URL in sitemap",
                        url=loc,
                    ))
                seen_urls.add(loc)

                # Validate lastmod format
                lastmod = entry.get("lastmod")
                if lastmod:
                    if not self._validate_date(lastmod):
                        invalid_lastmod += 1

                # Validate changefreq
                changefreq = entry.get("changefreq")
                if changefreq and changefreq not in self.VALID_CHANGEFREQ:
                    invalid_changefreq += 1

                # Validate priority
                priority = entry.get("priority")
                if priority is not None:
                    if not (0.0 <= priority <= 1.0):
                        invalid_priority += 1

                # Create entry object
                result.entries.append(SitemapEntry(
                    loc=loc,
                    lastmod=lastmod,
                    changefreq=changefreq,
                    priority=priority,
                ))

            # Add summary issues
            if invalid_lastmod > 0:
                result.issues.append(SitemapIssue(
                    severity="warning",
                    message=f"{invalid_lastmod} URLs with invalid lastmod format",
                    suggestion="Use ISO 8601 format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+TZ)",
                ))

            if invalid_changefreq > 0:
                result.issues.append(SitemapIssue(
                    severity="info",
                    message=f"{invalid_changefreq} URLs with invalid changefreq",
                    suggestion="Use: always, hourly, daily, weekly, monthly, yearly, never",
                ))

            if invalid_priority > 0:
                result.issues.append(SitemapIssue(
                    severity="warning",
                    message=f"{invalid_priority} URLs with invalid priority (must be 0.0-1.0)",
                ))

            result.stats.update({
                "invalid_lastmod": invalid_lastmod,
                "invalid_changefreq": invalid_changefreq,
                "invalid_priority": invalid_priority,
                "has_lastmod": sum(1 for e in result.entries if e.lastmod),
                "has_changefreq": sum(1 for e in result.entries if e.changefreq),
                "has_priority": sum(1 for e in result.entries if e.priority is not None),
            })

        # Check URLs if requested
        if self.check_urls and result.entries:
            asyncio.run(self._check_url_status(result))

        # Determine validity
        result.valid = not any(i.severity == "error" for i in result.issues)

        return result

    def _validate_date(self, date_str: str) -> bool:
        """Validate ISO 8601 date format."""
        patterns = [
            r"^\d{4}-\d{2}-\d{2}$",
            r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",
        ]
        return any(re.match(p, date_str) for p in patterns)

    async def _check_url_status(self, result: SitemapResult) -> None:
        """Check HTTP status of URLs in the sitemap."""
        semaphore = asyncio.Semaphore(self.max_concurrent)

        async def check_url(entry: SitemapEntry) -> None:
            async with semaphore:
                try:
                    async with aiohttp.ClientSession() as session:
                        async with session.head(
                            entry.loc,
                            timeout=aiohttp.ClientTimeout(total=10),
                            allow_redirects=True,
                        ) as response:
                            entry.status_code = response.status
                except Exception:
                    entry.status_code = 0

        await asyncio.gather(*[check_url(e) for e in result.entries[:100]])

        # Count status codes
        status_counts = {}
        for entry in result.entries:
            if entry.status_code:
                status_counts[entry.status_code] = (
                    status_counts.get(entry.status_code, 0) + 1
                )

        result.stats["url_status_codes"] = status_counts

        # Add issues for non-200 URLs
        error_count = sum(
            1 for e in result.entries
            if e.status_code and e.status_code >= 400
        )
        if error_count > 0:
            result.issues.append(SitemapIssue(
                severity="warning",
                message=f"{error_count} URLs returning error status codes (4xx/5xx)",
                suggestion="Remove or fix broken URLs in the sitemap",
            ))

    def generate_report(self, result: SitemapResult) -> str:
        """Generate a human-readable validation report."""
        lines = [
            "=" * 60,
            "Sitemap Validation Report",
            "=" * 60,
            f"URL: {result.url}",
            f"Type: {result.sitemap_type}",
            f"Valid: {'Yes' if result.valid else 'No'}",
            f"Timestamp: {result.timestamp}",
            "",
        ]

        lines.append("Statistics:")
        for key, value in result.stats.items():
            lines.append(f"  {key}: {value}")
        lines.append("")

        if result.child_sitemaps:
            lines.append(f"Child Sitemaps ({len(result.child_sitemaps)}):")
            for sitemap in result.child_sitemaps[:10]:
                lines.append(f"  - {sitemap}")
            if len(result.child_sitemaps) > 10:
                lines.append(f"  ... and {len(result.child_sitemaps) - 10} more")
            lines.append("")

        if result.issues:
            lines.append("Issues Found:")
            errors = [i for i in result.issues if i.severity == "error"]
            warnings = [i for i in result.issues if i.severity == "warning"]
            infos = [i for i in result.issues if i.severity == "info"]

            if errors:
                lines.append(f"\n  ERRORS ({len(errors)}):")
                for issue in errors:
                    lines.append(f"  - {issue.message}")
                    if issue.url:
                        lines.append(f"    URL: {issue.url}")
                    if issue.suggestion:
                        lines.append(f"    Suggestion: {issue.suggestion}")

            if warnings:
                lines.append(f"\n  WARNINGS ({len(warnings)}):")
                for issue in warnings:
                    lines.append(f"  - {issue.message}")
                    if issue.suggestion:
                        lines.append(f"    Suggestion: {issue.suggestion}")

            if infos:
                lines.append(f"\n  INFO ({len(infos)}):")
                for issue in infos:
                    lines.append(f"  - {issue.message}")

        lines.append("")
        lines.append("=" * 60)

        return "\n".join(lines)


def main():
    """Main entry point for CLI usage."""
    parser = argparse.ArgumentParser(
        description="Validate XML sitemaps",
    )
    parser.add_argument("--url", "-u", required=True, help="Sitemap URL to validate")
    parser.add_argument("--check-urls", action="store_true",
                        help="Check HTTP status of URLs (slower)")
    parser.add_argument("--output", "-o", help="Output file for JSON report")
    parser.add_argument("--json", action="store_true", help="Output as JSON")

    args = parser.parse_args()

    validator = SitemapValidator(check_urls=args.check_urls)
    result = validator.validate(args.url)

    if args.json or args.output:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            logger.info(f"Report written to {args.output}")
        else:
            print(output)
    else:
        print(validator.generate_report(result))


if __name__ == "__main__":
    main()
94
custom-skills/10-seo-technical-audit/desktop/SKILL.md
Normal file
@@ -0,0 +1,94 @@
---
name: seo-technical-audit
version: 1.0.0
description: Technical SEO auditor for crawlability fundamentals. Triggers: robots.txt, sitemap validation, crawlability, indexing check, technical SEO.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*, mcp__notion__*
---

# SEO Technical Audit

## Purpose

Analyze crawlability fundamentals: robots.txt rules, XML sitemap structure, and URL accessibility. Identify issues that block search engine crawlers.

## Core Capabilities

1. **Robots.txt Analysis** - Parse rules, check blocked resources
2. **Sitemap Validation** - Verify XML structure, URL limits, dates
3. **URL Accessibility** - Check HTTP status, redirects, broken links

## MCP Tool Usage

### Firecrawl for Page Data
```
mcp__firecrawl__scrape: Fetch robots.txt and sitemap content
mcp__firecrawl__crawl: Check accessibility of multiple URLs
```

### Perplexity for Best Practices
```
mcp__perplexity__search: Research current SEO recommendations
```

## Workflow

### 1. Robots.txt Check
1. Fetch `[domain]/robots.txt` using Firecrawl
2. Parse User-agent rules and Disallow patterns
3. Identify blocked resources (CSS, JS, images)
4. Check for Sitemap declarations
5. Report critical issues
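The robots.txt steps above can be sketched with the standard library's `urllib.robotparser`; the robots.txt body, domain, and probe paths below are hypothetical stand-ins for what the Firecrawl fetch would return:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; in the skill this text would come from
# fetching [domain]/robots.txt with Firecrawl.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/js/

Sitemap: https://example.com/sitemap.xml
"""

def audit_robots(robots_txt: str, domain: str) -> dict:
    """Flag blocked rendering resources and collect Sitemap declarations."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Paths a crawler needs for rendering; blocked CSS/JS is a critical issue
    probe_paths = ["/assets/js/app.js", "/assets/css/site.css", "/"]
    blocked = [p for p in probe_paths
               if not parser.can_fetch("Googlebot", f"{domain}{p}")]

    sitemaps = [line.split(":", 1)[1].strip()
                for line in robots_txt.splitlines()
                if line.lower().startswith("sitemap:")]
    return {"blocked_resources": blocked, "sitemaps": sitemaps}

report = audit_robots(ROBOTS_TXT, "https://example.com")
```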

### 2. Sitemap Validation
1. Locate sitemap (from robots.txt or `/sitemap.xml`)
2. Validate XML syntax
3. Check URL count (max 50,000)
4. Verify lastmod date formats
5. For sitemap index: parse child sitemaps
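These validation steps can be sketched with the standard library's `xml.etree.ElementTree`; the inline sitemap and the 50,000-URL limit are illustrative (in the skill, the XML would come from the located sitemap URL):

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Illustrative inline sitemap with one valid and one malformed lastmod
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/about</loc><lastmod>yesterday</lastmod></url>
</urlset>"""

def check_sitemap(xml_text: str, max_urls: int = 50_000) -> list[str]:
    """Return human-readable issues for URL-count and lastmod-format checks."""
    issues: list[str] = []
    root = ET.fromstring(xml_text)  # raises ParseError on invalid XML

    if root.tag == f"{NS}sitemapindex":
        # Sitemap index: report children instead of validating URL entries
        children = root.findall(f"{NS}sitemap")
        return [f"sitemap index with {len(children)} children"]

    urls = root.findall(f"{NS}url")
    if len(urls) > max_urls:
        issues.append(f"exceeds {max_urls} URL limit ({len(urls)} URLs)")

    iso_date = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?")
    for url in urls:
        lastmod = url.find(f"{NS}lastmod")
        if lastmod is not None and not iso_date.match(lastmod.text or ""):
            issues.append(f"invalid lastmod: {lastmod.text}")
    return issues

issues = check_sitemap(SAMPLE)
```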

### 3. URL Accessibility Sampling
1. Extract URLs from sitemap
2. Sample 50-100 URLs for large sites
3. Check HTTP status codes
4. Identify redirects and broken links
5. Report 4xx/5xx errors
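The sampling and status-bucketing parts of this step can be sketched as below; the network fetch itself is omitted, so the status codes shown are hypothetical results of HEAD-requesting the sampled URLs:

```python
import random

def sample_urls(urls: list[str], limit: int = 50, seed: int = 42) -> list[str]:
    """Deterministically sample up to `limit` URLs from a large sitemap."""
    if len(urls) <= limit:
        return list(urls)
    return random.Random(seed).sample(urls, limit)

def classify(status_by_url: dict[str, int]) -> dict[str, int]:
    """Bucket HTTP status codes into the report's categories."""
    buckets = {"2xx": 0, "3xx": 0, "4xx/5xx": 0}
    for code in status_by_url.values():
        if 200 <= code < 300:
            buckets["2xx"] += 1
        elif 300 <= code < 400:
            buckets["3xx"] += 1
        else:
            buckets["4xx/5xx"] += 1
    return buckets

# Hypothetical statuses from checking the sampled URLs
statuses = {"/": 200, "/old": 301, "/gone": 404, "/err": 500}
summary = classify(statuses)
```

Seeding the sampler keeps repeated audits comparable across runs.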

## Output Format

```markdown
## Technical SEO Audit: [domain]

### Robots.txt Analysis
- Status: [Valid/Invalid/Missing]
- Sitemap declared: [Yes/No]
- Critical blocks: [List]

### Sitemap Validation
- URLs found: [count]
- Syntax: [Valid/Errors]
- Issues: [List]

### URL Accessibility (sampled)
- Checked: [count] URLs
- Success (2xx): [count]
- Redirects (3xx): [count]
- Errors (4xx/5xx): [count]

### Recommendations
1. [Priority fixes]
```

## Common Issues

| Issue | Impact | Fix |
|-------|--------|-----|
| No sitemap in robots.txt | Medium | Add `Sitemap:` directive |
| Blocking CSS/JS | High | Allow Googlebot access |
| 404s in sitemap | High | Remove or fix URLs |
| Missing lastmod | Low | Add dates for freshness signals |

## Limitations

- Cannot access password-protected sitemaps
- Large sitemaps (10,000+ URLs) require sampling
- Does not check render-blocking issues (use the Core Web Vitals skill)
107
custom-skills/11-seo-on-page-audit/code/CLAUDE.md
Normal file
@@ -0,0 +1,107 @@
# CLAUDE.md

## Overview

On-page SEO analyzer for single-page optimization: meta tags, headings, links, images, and Open Graph data.

## Quick Start

```bash
pip install -r scripts/requirements.txt
python scripts/page_analyzer.py --url https://example.com
```

## Scripts

| Script | Purpose |
|--------|---------|
| `page_analyzer.py` | Analyze on-page SEO elements |
| `base_client.py` | Shared utilities |

## Usage

```bash
# Full page analysis
python scripts/page_analyzer.py --url https://example.com

# JSON output
python scripts/page_analyzer.py --url https://example.com --json

# Analyze multiple pages
python scripts/page_analyzer.py --urls urls.txt
```

## Analysis Categories

### Meta Tags
- Title tag (length, keywords)
- Meta description (length, call-to-action)
- Canonical URL
- Robots meta tag

### Heading Structure
- H1 presence and count
- Heading hierarchy (H1→H6)
- Keyword placement in headings

### Links
- Internal link count
- External link count
- Broken links (4xx/5xx)
- Nofollow distribution

### Images
- Alt attribute presence
- Image file sizes
- Lazy loading implementation

### Open Graph / Social
- OG title, description, image
- Twitter Card tags
- Social sharing preview
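Extracting these social tags can be sketched with the standard library's `html.parser` (the actual analyzer uses BeautifulSoup per the Dependencies section); the HTML snippet is hypothetical:

```python
from html.parser import HTMLParser

class SocialTagParser(HTMLParser):
    """Collect Open Graph and Twitter Card meta tags from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag != "meta":
            return
        attr = dict(attrs)
        # OG tags use property=..., Twitter Card tags use name=...
        key = attr.get("property") or attr.get("name") or ""
        if key.startswith(("og:", "twitter:")):
            self.tags[key] = attr.get("content", "")

# Hypothetical page head with one required OG tag missing
HTML = """<head>
<meta property="og:title" content="Page Title">
<meta property="og:image" content="https://example.com/hero.png">
<meta name="twitter:card" content="summary_large_image">
</head>"""

parser = SocialTagParser()
parser.feed(HTML)
missing = [k for k in ("og:title", "og:description", "og:image")
           if k not in parser.tags]
```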

## Output

```json
{
  "url": "https://example.com",
  "meta": {
    "title": "Page Title",
    "title_length": 55,
    "description": "...",
    "description_length": 150,
    "canonical": "https://example.com"
  },
  "headings": {
    "h1_count": 1,
    "h1_text": ["Main Heading"],
    "hierarchy_valid": true
  },
  "links": {
    "internal": 25,
    "external": 5,
    "broken": []
  },
  "issues": []
}
```

## Common Issues

| Issue | Severity | Recommendation |
|-------|----------|----------------|
| Missing H1 | High | Add single H1 tag |
| Title too long (>60) | Medium | Shorten to 50-60 chars |
| No meta description | High | Add compelling description |
| Images without alt | Medium | Add descriptive alt text |
| Multiple H1 tags | Medium | Use single H1 only |
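The table's rules can be sketched as a small checker over the extracted page data; the field names follow the JSON output shown in this file, but the function itself is an illustration, not the analyzer's actual implementation:

```python
def find_issues(meta: dict, headings: dict) -> list[dict]:
    """Apply the Common Issues rules to extracted page data."""
    issues = []
    h1_count = headings.get("h1_count", 0)
    if h1_count == 0:
        issues.append({"issue": "Missing H1", "severity": "High"})
    elif h1_count > 1:
        issues.append({"issue": "Multiple H1 tags", "severity": "Medium"})
    if meta.get("title_length", 0) > 60:
        issues.append({"issue": "Title too long (>60)", "severity": "Medium"})
    if not meta.get("description"):
        issues.append({"issue": "No meta description", "severity": "High"})
    return issues

# Hypothetical extracted data in the shape of the analyzer's JSON output
issues = find_issues(
    meta={"title_length": 72, "description": ""},
    headings={"h1_count": 2},
)
```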
|
||||
|
||||
## Dependencies
|
||||
|
||||
```
|
||||
lxml>=5.1.0
|
||||
beautifulsoup4>=4.12.0
|
||||
requests>=2.31.0
|
||||
python-dotenv>=1.0.0
|
||||
rich>=13.7.0
|
||||
```
|
||||
207
custom-skills/11-seo-on-page-audit/code/scripts/base_client.py
Normal file
207
custom-skills/11-seo-on-page-audit/code/scripts/base_client.py
Normal file
@@ -0,0 +1,207 @@
|
||||
"""
|
||||
Base Client - Shared async client utilities
|
||||
===========================================
|
||||
Purpose: Rate-limited async operations for API clients
|
||||
Python: 3.10+
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
from asyncio import Semaphore
|
||||
from datetime import datetime
|
||||
from typing import Any, Callable, TypeVar
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from tenacity import (
|
||||
retry,
|
||||
stop_after_attempt,
|
||||
wait_exponential,
|
||||
retry_if_exception_type,
|
||||
)
|
||||
|
||||
# Load environment variables
|
||||
load_dotenv()
|
||||
|
||||
# Logging setup
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||
)
|
||||
|
||||
T = TypeVar("T")
|
||||
|
||||
|
||||
class RateLimiter:
|
||||
"""Rate limiter using token bucket algorithm."""
|
||||
|
||||
def __init__(self, rate: float, per: float = 1.0):
|
||||
"""
|
||||
Initialize rate limiter.
|
||||
|
||||
Args:
|
||||
rate: Number of requests allowed
|
||||
per: Time period in seconds (default: 1 second)
|
||||
"""
|
||||
self.rate = rate
|
||||
self.per = per
|
||||
self.tokens = rate
|
||||
self.last_update = datetime.now()
|
||||
self._lock = asyncio.Lock()
|
||||
|
||||
async def acquire(self) -> None:
|
||||
"""Acquire a token, waiting if necessary."""
|
||||
async with self._lock:
|
||||
now = datetime.now()
|
||||
elapsed = (now - self.last_update).total_seconds()
|
||||
self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
|
||||
self.last_update = now
|
||||
|
||||
if self.tokens < 1:
|
||||
wait_time = (1 - self.tokens) * (self.per / self.rate)
|
||||
await asyncio.sleep(wait_time)
|
||||
self.tokens = 0
|
||||
else:
|
||||
self.tokens -= 1
|
||||
|
||||
|
||||
class BaseAsyncClient:
|
||||
"""Base class for async API clients with rate limiting."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
max_concurrent: int = 5,
|
||||
requests_per_second: float = 3.0,
|
||||
logger: logging.Logger | None = None,
|
||||
):
|
||||
"""
|
||||
Initialize base client.
|
||||
|
||||
Args:
|
||||
max_concurrent: Maximum concurrent requests
|
||||
requests_per_second: Rate limit
|
||||
logger: Logger instance
|
||||
"""
|
||||
self.semaphore = Semaphore(max_concurrent)
|
||||
self.rate_limiter = RateLimiter(requests_per_second)
|
||||
self.logger = logger or logging.getLogger(self.__class__.__name__)
|
||||
self.stats = {
|
||||
"requests": 0,
|
||||
"success": 0,
|
||||
"errors": 0,
|
||||
"retries": 0,
|
||||
}
|
||||
|
||||
@retry(
|
||||
stop=stop_after_attempt(3),
|
||||
wait=wait_exponential(multiplier=1, min=2, max=10),
|
||||
retry=retry_if_exception_type(Exception),
|
||||
)
|
||||
async def _rate_limited_request(
|
||||
self,
|
||||
coro: Callable[[], Any],
|
||||
) -> Any:
|
||||
"""Execute a request with rate limiting and retry."""
|
||||
async with self.semaphore:
|
||||
await self.rate_limiter.acquire()
|
||||
self.stats["requests"] += 1
|
||||
try:
|
||||
result = await coro()
|
||||
self.stats["success"] += 1
|
||||
return result
|
||||
except Exception as e:
|
||||
self.stats["errors"] += 1
|
||||
self.logger.error(f"Request failed: {e}")
|
||||
raise
|
||||
|
||||
async def batch_requests(
|
||||
self,
|
||||
requests: list[Callable[[], Any]],
|
||||
desc: str = "Processing",
|
||||
) -> list[Any]:
|
||||
"""Execute multiple requests concurrently."""
|
||||
try:
|
||||
from tqdm.asyncio import tqdm
|
||||
has_tqdm = True
|
||||
except ImportError:
|
||||
has_tqdm = False
|
||||
|
||||
async def execute(req: Callable) -> Any:
|
||||
try:
|
||||
return await self._rate_limited_request(req)
|
||||
except Exception as e:
|
||||
return {"error": str(e)}
|
||||
|
||||
tasks = [execute(req) for req in requests]
|
||||
|
||||
if has_tqdm:
|
||||
results = []
|
||||
for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
|
||||
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured and the file exists."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get a required environment variable or raise an error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
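The `get_required` pattern above fails fast on missing configuration instead of letting a `None` propagate. A minimal standalone sketch of the same idea (the env-var names here are illustrative, and the value is set inline as a stand-in for what `load_dotenv()` would load):

```python
import os

def get_required(key: str) -> str:
    """Return the environment variable's value, or raise a descriptive error."""
    value = os.getenv(key)
    if not value:
        raise ValueError(f"Missing required environment variable: {key}")
    return value

# Stand-in for a value normally loaded from .env by load_dotenv()
os.environ["NOTION_TOKEN"] = "secret-token"

print(get_required("NOTION_TOKEN"))  # → secret-token
try:
    get_required("SOME_UNSET_KEY")
except ValueError as e:
    print(e)  # → Missing required environment variable: SOME_UNSET_KEY
```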
569 custom-skills/11-seo-on-page-audit/code/scripts/page_analyzer.py Normal file
@@ -0,0 +1,569 @@
"""
Page Analyzer - Extract SEO metadata from web pages
===================================================
Purpose: Comprehensive page-level SEO data extraction
Python: 3.10+
Usage:
    from page_analyzer import PageAnalyzer, PageMetadata
    analyzer = PageAnalyzer()
    metadata = analyzer.analyze_url("https://example.com/page")
"""

import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
@dataclass
class LinkData:
    """Represents a link found on a page."""
    url: str
    anchor_text: str
    is_internal: bool
    is_nofollow: bool = False
    link_type: str = "body"  # body, nav, footer, etc.


@dataclass
class HeadingData:
    """Represents a heading found on a page."""
    level: int  # 1-6
    text: str


@dataclass
class SchemaData:
    """Represents schema.org structured data."""
    schema_type: str
    properties: dict
    format: str = "json-ld"  # json-ld, microdata, rdfa


@dataclass
class OpenGraphData:
    """Represents Open Graph metadata."""
    og_title: str | None = None
    og_description: str | None = None
    og_image: str | None = None
    og_url: str | None = None
    og_type: str | None = None
    og_site_name: str | None = None
    og_locale: str | None = None
    twitter_card: str | None = None
    twitter_title: str | None = None
    twitter_description: str | None = None
    twitter_image: str | None = None
@dataclass
class PageMetadata:
    """Complete SEO metadata for a page."""

    # Basic info
    url: str
    status_code: int = 0
    content_type: str = ""
    response_time_ms: float = 0
    analyzed_at: datetime = field(default_factory=datetime.now)

    # Meta tags
    title: str | None = None
    title_length: int = 0
    meta_description: str | None = None
    meta_description_length: int = 0
    canonical_url: str | None = None
    robots_meta: str | None = None

    # Language
    html_lang: str | None = None
    hreflang_tags: list[dict] = field(default_factory=list)  # [{"lang": "en", "url": "..."}]

    # Headings
    headings: list[HeadingData] = field(default_factory=list)
    h1_count: int = 0
    h1_text: str | None = None

    # Open Graph & Social
    open_graph: OpenGraphData = field(default_factory=OpenGraphData)

    # Schema/Structured Data
    schema_data: list[SchemaData] = field(default_factory=list)
    schema_types_found: list[str] = field(default_factory=list)

    # Links
    internal_links: list[LinkData] = field(default_factory=list)
    external_links: list[LinkData] = field(default_factory=list)
    internal_link_count: int = 0
    external_link_count: int = 0

    # Images
    images_total: int = 0
    images_without_alt: int = 0
    images_with_alt: int = 0

    # Content metrics
    word_count: int = 0

    # Issues found
    issues: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    def to_dict(self) -> dict:
        """Convert to dictionary for JSON serialization."""
        return {
            "url": self.url,
            "status_code": self.status_code,
            "content_type": self.content_type,
            "response_time_ms": self.response_time_ms,
            "analyzed_at": self.analyzed_at.isoformat(),
            "title": self.title,
            "title_length": self.title_length,
            "meta_description": self.meta_description,
            "meta_description_length": self.meta_description_length,
            "canonical_url": self.canonical_url,
            "robots_meta": self.robots_meta,
            "html_lang": self.html_lang,
            "hreflang_tags": self.hreflang_tags,
            "h1_count": self.h1_count,
            "h1_text": self.h1_text,
            "headings_count": len(self.headings),
            "schema_types_found": self.schema_types_found,
            "internal_link_count": self.internal_link_count,
            "external_link_count": self.external_link_count,
            "images_total": self.images_total,
            "images_without_alt": self.images_without_alt,
            "word_count": self.word_count,
            "issues": self.issues,
            "warnings": self.warnings,
            "open_graph": {
                "og_title": self.open_graph.og_title,
                "og_description": self.open_graph.og_description,
                "og_image": self.open_graph.og_image,
                "og_url": self.open_graph.og_url,
                "og_type": self.open_graph.og_type,
            },
        }

    def get_summary(self) -> str:
        """Get a brief summary of the page analysis."""
        lines = [
            f"URL: {self.url}",
            f"Status: {self.status_code}",
            f"Title: {self.title[:50] + '...' if self.title and len(self.title) > 50 else self.title}",
            f"Description: {'✓' if self.meta_description else '✗ Missing'}",
            f"Canonical: {'✓' if self.canonical_url else '✗ Missing'}",
            f"H1: {self.h1_count} found",
            f"Schema: {', '.join(self.schema_types_found) if self.schema_types_found else 'None'}",
            f"Links: {self.internal_link_count} internal, {self.external_link_count} external",
            f"Images: {self.images_total} total, {self.images_without_alt} without alt",
        ]
        if self.issues:
            lines.append(f"Issues: {len(self.issues)}")
        return "\n".join(lines)
class PageAnalyzer:
    """Analyze web pages for SEO metadata."""

    DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; OurDigitalSEOBot/1.0; +https://ourdigital.org)"

    def __init__(
        self,
        user_agent: str | None = None,
        timeout: int = 30,
    ):
        """
        Initialize page analyzer.

        Args:
            user_agent: Custom user agent string
            timeout: Request timeout in seconds
        """
        self.user_agent = user_agent or self.DEFAULT_USER_AGENT
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": self.user_agent,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9,ko;q=0.8",
        })
    def analyze_url(self, url: str) -> PageMetadata:
        """
        Analyze a URL and extract SEO metadata.

        Args:
            url: URL to analyze

        Returns:
            PageMetadata object with all extracted data
        """
        metadata = PageMetadata(url=url)

        try:
            # Fetch page
            start_time = datetime.now()
            response = self.session.get(url, timeout=self.timeout, allow_redirects=True)
            metadata.response_time_ms = (datetime.now() - start_time).total_seconds() * 1000
            metadata.status_code = response.status_code
            metadata.content_type = response.headers.get("Content-Type", "")

            if response.status_code != 200:
                metadata.issues.append(f"HTTP {response.status_code} status")
                if response.status_code >= 400:
                    return metadata

            # Parse HTML
            soup = BeautifulSoup(response.text, "html.parser")
            base_url = url

            # Extract all metadata
            self._extract_basic_meta(soup, metadata)
            self._extract_canonical(soup, metadata, base_url)
            self._extract_robots_meta(soup, metadata)
            self._extract_hreflang(soup, metadata)
            self._extract_headings(soup, metadata)
            self._extract_open_graph(soup, metadata)
            self._extract_schema(soup, metadata)
            self._extract_links(soup, metadata, base_url)
            self._extract_images(soup, metadata)
            self._extract_content_metrics(soup, metadata)

            # Run SEO checks
            self._run_seo_checks(metadata)

        except requests.RequestException as e:
            metadata.issues.append(f"Request failed: {str(e)}")
            logger.error(f"Failed to analyze {url}: {e}")
        except Exception as e:
            metadata.issues.append(f"Analysis error: {str(e)}")
            logger.error(f"Error analyzing {url}: {e}")

        return metadata
    def _extract_basic_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract title and meta description."""
        # Title
        title_tag = soup.find("title")
        if title_tag and title_tag.string:
            metadata.title = title_tag.string.strip()
            metadata.title_length = len(metadata.title)

        # Meta description
        desc_tag = soup.find("meta", attrs={"name": re.compile(r"^description$", re.I)})
        if desc_tag and desc_tag.get("content"):
            metadata.meta_description = desc_tag["content"].strip()
            metadata.meta_description_length = len(metadata.meta_description)

        # HTML lang
        html_tag = soup.find("html")
        if html_tag and html_tag.get("lang"):
            metadata.html_lang = html_tag["lang"]

    def _extract_canonical(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
        """Extract canonical URL."""
        canonical = soup.find("link", rel="canonical")
        if canonical and canonical.get("href"):
            metadata.canonical_url = urljoin(base_url, canonical["href"])

    def _extract_robots_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract robots meta tag."""
        robots = soup.find("meta", attrs={"name": re.compile(r"^robots$", re.I)})
        if robots and robots.get("content"):
            metadata.robots_meta = robots["content"]

        # Also check for googlebot-specific directives
        googlebot = soup.find("meta", attrs={"name": re.compile(r"^googlebot$", re.I)})
        if googlebot and googlebot.get("content"):
            if metadata.robots_meta:
                metadata.robots_meta += f" | googlebot: {googlebot['content']}"
            else:
                metadata.robots_meta = f"googlebot: {googlebot['content']}"

    def _extract_hreflang(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract hreflang tags."""
        hreflang_tags = soup.find_all("link", rel="alternate", hreflang=True)
        for tag in hreflang_tags:
            if tag.get("href") and tag.get("hreflang"):
                metadata.hreflang_tags.append({
                    "lang": tag["hreflang"],
                    "url": tag["href"]
                })
    def _extract_headings(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract all headings."""
        for level in range(1, 7):
            for heading in soup.find_all(f"h{level}"):
                text = heading.get_text(strip=True)
                if text:
                    metadata.headings.append(HeadingData(level=level, text=text))

        # Count H1s specifically
        h1_tags = soup.find_all("h1")
        metadata.h1_count = len(h1_tags)
        if h1_tags:
            metadata.h1_text = h1_tags[0].get_text(strip=True)

    def _extract_open_graph(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract Open Graph and Twitter Card data."""
        og = metadata.open_graph

        # Open Graph tags
        og_mappings = {
            "og:title": "og_title",
            "og:description": "og_description",
            "og:image": "og_image",
            "og:url": "og_url",
            "og:type": "og_type",
            "og:site_name": "og_site_name",
            "og:locale": "og_locale",
        }

        for og_prop, attr_name in og_mappings.items():
            tag = soup.find("meta", property=og_prop)
            if tag and tag.get("content"):
                setattr(og, attr_name, tag["content"])

        # Twitter Card tags
        twitter_mappings = {
            "twitter:card": "twitter_card",
            "twitter:title": "twitter_title",
            "twitter:description": "twitter_description",
            "twitter:image": "twitter_image",
        }

        for tw_name, attr_name in twitter_mappings.items():
            tag = soup.find("meta", attrs={"name": tw_name})
            if tag and tag.get("content"):
                setattr(og, attr_name, tag["content"])
    def _extract_schema(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract schema.org structured data."""
        # JSON-LD
        for script in soup.find_all("script", type="application/ld+json"):
            try:
                data = json.loads(script.string)
                if isinstance(data, list):
                    for item in data:
                        self._process_schema_item(item, metadata, "json-ld")
                else:
                    self._process_schema_item(data, metadata, "json-ld")
            except (json.JSONDecodeError, TypeError):
                continue

        # Microdata (basic detection)
        for item in soup.find_all(itemscope=True):
            itemtype = item.get("itemtype", "")
            if itemtype:
                schema_type = itemtype.split("/")[-1]
                if schema_type not in metadata.schema_types_found:
                    metadata.schema_types_found.append(schema_type)
                    metadata.schema_data.append(SchemaData(
                        schema_type=schema_type,
                        properties={},
                        format="microdata"
                    ))

    def _process_schema_item(self, data: dict, metadata: PageMetadata, format_type: str) -> None:
        """Process a single schema.org item."""
        if not isinstance(data, dict):
            return

        schema_type = data.get("@type", "Unknown")
        if isinstance(schema_type, list):
            schema_type = schema_type[0] if schema_type else "Unknown"

        if schema_type not in metadata.schema_types_found:
            metadata.schema_types_found.append(schema_type)

        metadata.schema_data.append(SchemaData(
            schema_type=schema_type,
            properties=data,
            format=format_type
        ))

        # Process nested @graph items
        if "@graph" in data:
            for item in data["@graph"]:
                self._process_schema_item(item, metadata, format_type)
    def _extract_links(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
        """Extract internal and external links."""
        parsed_base = urlparse(base_url)
        base_domain = parsed_base.netloc.lower()

        for a_tag in soup.find_all("a", href=True):
            href = a_tag["href"]

            # Skip non-HTTP links
            if href.startswith(("#", "javascript:", "mailto:", "tel:")):
                continue

            # Resolve relative URLs
            full_url = urljoin(base_url, href)
            parsed_url = urlparse(full_url)

            # Get anchor text
            anchor_text = a_tag.get_text(strip=True)[:100]  # Limit length

            # Check if nofollow
            rel = a_tag.get("rel", [])
            if isinstance(rel, str):
                rel = rel.split()
            is_nofollow = "nofollow" in rel

            # Determine if internal or external
            link_domain = parsed_url.netloc.lower()
            is_internal = (
                link_domain == base_domain or
                link_domain.endswith(f".{base_domain}") or
                base_domain.endswith(f".{link_domain}")
            )

            link_data = LinkData(
                url=full_url,
                anchor_text=anchor_text,
                is_internal=is_internal,
                is_nofollow=is_nofollow,
            )

            if is_internal:
                metadata.internal_links.append(link_data)
            else:
                metadata.external_links.append(link_data)

        metadata.internal_link_count = len(metadata.internal_links)
        metadata.external_link_count = len(metadata.external_links)
    def _extract_images(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract image information."""
        images = soup.find_all("img")
        metadata.images_total = len(images)

        for img in images:
            alt = img.get("alt", "").strip()
            if alt:
                metadata.images_with_alt += 1
            else:
                metadata.images_without_alt += 1

    def _extract_content_metrics(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract content metrics like word count."""
        # Remove script and style elements
        for element in soup(["script", "style", "noscript"]):
            element.decompose()

        # Get text content
        text = soup.get_text(separator=" ", strip=True)
        words = text.split()
        metadata.word_count = len(words)
    def _run_seo_checks(self, metadata: PageMetadata) -> None:
        """Run SEO checks and add issues/warnings."""
        # Title checks
        if not metadata.title:
            metadata.issues.append("Missing title tag")
        elif metadata.title_length < 30:
            metadata.warnings.append(f"Title too short ({metadata.title_length} chars, recommend 50-60)")
        elif metadata.title_length > 60:
            metadata.warnings.append(f"Title too long ({metadata.title_length} chars, recommend 50-60)")

        # Meta description checks
        if not metadata.meta_description:
            metadata.issues.append("Missing meta description")
        elif metadata.meta_description_length < 120:
            metadata.warnings.append(f"Meta description too short ({metadata.meta_description_length} chars)")
        elif metadata.meta_description_length > 160:
            metadata.warnings.append(f"Meta description too long ({metadata.meta_description_length} chars)")

        # Canonical check
        if not metadata.canonical_url:
            metadata.warnings.append("Missing canonical tag")
        elif metadata.canonical_url != metadata.url:
            metadata.warnings.append(f"Canonical points to different URL: {metadata.canonical_url}")

        # H1 checks
        if metadata.h1_count == 0:
            metadata.issues.append("Missing H1 tag")
        elif metadata.h1_count > 1:
            metadata.warnings.append(f"Multiple H1 tags ({metadata.h1_count})")

        # Image alt check
        if metadata.images_without_alt > 0:
            metadata.warnings.append(f"{metadata.images_without_alt} images missing alt text")

        # Schema check
        if not metadata.schema_types_found:
            metadata.warnings.append("No structured data found")

        # Open Graph check
        if not metadata.open_graph.og_title:
            metadata.warnings.append("Missing Open Graph tags")

        # Robots meta check
        if metadata.robots_meta:
            robots_lower = metadata.robots_meta.lower()
            if "noindex" in robots_lower:
                metadata.issues.append("Page is set to noindex")
            if "nofollow" in robots_lower:
                metadata.warnings.append("Page is set to nofollow")
def main():
    """CLI entry point for testing."""
    import argparse

    parser = argparse.ArgumentParser(description="Page SEO Analyzer")
    parser.add_argument("url", help="URL to analyze")
    parser.add_argument("--json", "-j", action="store_true", help="Output as JSON")

    args = parser.parse_args()

    analyzer = PageAnalyzer()
    metadata = analyzer.analyze_url(args.url)

    if args.json:
        print(json.dumps(metadata.to_dict(), indent=2, ensure_ascii=False))
    else:
        print("=" * 60)
        print("PAGE ANALYSIS REPORT")
        print("=" * 60)
        print(metadata.get_summary())
        print()

        if metadata.issues:
            print("ISSUES:")
            for issue in metadata.issues:
                print(f"  ✗ {issue}")

        if metadata.warnings:
            print("\nWARNINGS:")
            for warning in metadata.warnings:
                print(f"  ⚠ {warning}")

        if metadata.hreflang_tags:
            print(f"\nHREFLANG TAGS ({len(metadata.hreflang_tags)}):")
            for tag in metadata.hreflang_tags[:5]:
                print(f"  {tag['lang']}: {tag['url']}")

        if metadata.schema_types_found:
            print("\nSCHEMA TYPES:")
            for schema_type in metadata.schema_types_found:
                print(f"  - {schema_type}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,6 @@
# 11-seo-on-page-audit dependencies
lxml>=5.1.0
beautifulsoup4>=4.12.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
94 custom-skills/11-seo-on-page-audit/desktop/SKILL.md Normal file
@@ -0,0 +1,94 @@
---
name: seo-on-page-audit
version: 1.0.0
description: On-page SEO analyzer for meta tags, headings, links, images, and Open Graph. Triggers: on-page SEO, meta tags, title tag, heading structure, alt text.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*, mcp__notion__*
---

# SEO On-Page Audit

## Purpose

Analyze single-page SEO elements: meta tags, heading hierarchy, internal/external links, images, and social sharing tags.

## Core Capabilities

1. **Meta Tags** - Title, description, canonical, robots
2. **Headings** - H1-H6 structure and hierarchy
3. **Links** - Internal, external, broken detection
4. **Images** - Alt text, sizing, lazy loading
5. **Social** - Open Graph, Twitter Cards

## MCP Tool Usage

```
mcp__firecrawl__scrape: Extract page HTML and metadata
mcp__perplexity__search: Research SEO best practices
mcp__notion__create-page: Save audit findings
```

## Workflow

1. Scrape target URL with Firecrawl
2. Extract and analyze meta tags
3. Map heading hierarchy
4. Count and categorize links
5. Check image optimization
6. Validate Open Graph tags
7. Generate recommendations

## Checklist

### Meta Tags
- [ ] Title present (50-60 characters)
- [ ] Meta description present (150-160 characters)
- [ ] Canonical URL set
- [ ] Robots meta allows indexing

### Headings
- [ ] Single H1 tag
- [ ] Logical hierarchy (no skips)
- [ ] Keywords in H1

### Links
- [ ] No broken internal links
- [ ] External links use rel attributes
- [ ] Reasonable internal link count

### Images
- [ ] All images have alt text
- [ ] Images are appropriately sized
- [ ] Lazy loading implemented

### Open Graph
- [ ] og:title present
- [ ] og:description present
- [ ] og:image present (1200x630)
## Output Format

```markdown
## On-Page Audit: [URL]

### Meta Tags: X/5
| Element | Status | Value |
|---------|--------|-------|

### Headings: X/5
- H1: [text]
- Hierarchy: Valid/Invalid

### Links
- Internal: X
- External: X
- Broken: X

### Recommendations
1. [Priority fixes]
```

## Limitations

- Single page analysis only
- Cannot detect JavaScript-rendered content issues
- External link status requires additional crawl
107 custom-skills/12-seo-local-audit/code/CLAUDE.md Normal file
@@ -0,0 +1,107 @@
# CLAUDE.md

## Overview

Local SEO auditor for businesses with physical locations: NAP consistency, Google Business Profile optimization, local citations, and LocalBusiness schema validation.

## Quick Start

This skill primarily uses MCP tools (Firecrawl, Perplexity) for data collection. Scripts are helpers for validation.

```bash
# NAP consistency check (manual data input)
python scripts/nap_checker.py --business "Business Name" --address "123 Main St" --phone "555-1234"

# LocalBusiness schema validation
python scripts/local_schema_validator.py --url https://example.com
```

## Audit Components

### 1. NAP Consistency
**Name, Address, Phone** consistency across:
- Website (header, footer, contact page)
- Google Business Profile
- Local directories (Yelp, Yellow Pages, etc.)
- Social media profiles
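Cross-platform NAP comparison is easiest after normalization, since punctuation and case differences are not real inconsistencies. A minimal sketch of that idea — an illustration of the approach, not the actual `nap_checker.py` implementation:

```python
import re

def normalize_phone(phone: str) -> str:
    """Keep digits only, so '(555) 123-4567' and '555.123.4567' compare equal."""
    return re.sub(r"\D", "", phone)

def normalize_text(value: str) -> str:
    """Lowercase and strip punctuation for name/address comparison."""
    return re.sub(r"[^a-z0-9 ]", "", value.lower()).strip()

# Same phone and address, different formatting: both normalize to a match
assert normalize_phone("(555) 123-4567") == normalize_phone("555.123.4567")
assert normalize_text("123 Main St.") == normalize_text("123 main st")
```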
### 2. Google Business Profile (GBP)
Optimization checklist:
- [ ] Business name matches website
- [ ] Address is complete and accurate
- [ ] Phone number is local
- [ ] Business hours are current
- [ ] Categories are appropriate
- [ ] Photos uploaded (exterior, interior, products)
- [ ] Posts are recent (within 7 days)
- [ ] Reviews are responded to

### 3. Local Citations
Priority directories to check:
- Google Business Profile
- Apple Maps
- Bing Places
- Yelp
- Facebook Business
- Industry-specific directories

### 4. LocalBusiness Schema
Required properties:
- @type (LocalBusiness or subtype)
- name
- address (PostalAddress)
- telephone
- openingHours
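A minimal JSON-LD example carrying the required properties listed above (all business details are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Business Name",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Main St",
    "addressLocality": "Springfield",
    "addressRegion": "CA",
    "postalCode": "90000"
  },
  "telephone": "+1-555-123-4567",
  "openingHours": "Mo-Fr 09:00-18:00"
}
```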
## Workflow

```
1. Collect NAP from client
2. Scrape website for NAP mentions
3. Search citations using Perplexity
4. Check GBP data (manual or API)
5. Validate LocalBusiness schema
6. Generate consistency report
```

## Output Format

```markdown
## Local SEO Audit: [Business Name]

### NAP Consistency Score: X/10

| Source | Name | Address | Phone | Status |
|--------|------|---------|-------|--------|
| Website | ✓ | ✓ | ✓ | Match |
| GBP | ✓ | ✗ | ✓ | Mismatch |

### GBP Optimization: X/10
- [ ] Issue 1
- [x] Completed item

### Citation Audit
- Found: X citations
- Consistent: X
- Needs update: X

### Recommendations
1. Fix address mismatch on GBP
2. Add LocalBusiness schema
```

## Common Issues

| Issue | Impact | Fix |
|-------|--------|-----|
| NAP inconsistency | High | Update all directories |
| Missing GBP categories | Medium | Add relevant categories |
| No LocalBusiness schema | Medium | Add JSON-LD markup |
| Outdated business hours | Medium | Update GBP hours |
| No review responses | Low | Respond to all reviews |

## Notes

- GBP API requires enterprise approval (use manual audit)
- Citation discovery limited to public data
- Use schema generator skill (14) for creating LocalBusiness markup
116 custom-skills/12-seo-local-audit/desktop/SKILL.md Normal file
@@ -0,0 +1,116 @@
---
name: seo-local-audit
version: 1.0.0
description: Local SEO auditor for NAP consistency, Google Business Profile, citations, and LocalBusiness schema. Triggers: local SEO, Google Business Profile, GBP, NAP, citations, local rankings.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*, mcp__notion__*
---

# SEO Local Audit

## Purpose

Audit local business SEO: NAP (Name, Address, Phone) consistency, Google Business Profile optimization, local citations, and LocalBusiness schema markup.

## Core Capabilities

1. **NAP Consistency** - Cross-platform verification
2. **GBP Optimization** - Profile completeness check
3. **Citation Audit** - Directory presence
4. **Schema Validation** - LocalBusiness markup

## MCP Tool Usage

```
mcp__firecrawl__scrape: Extract NAP from website
mcp__perplexity__search: Find citations and directories
mcp__notion__create-page: Save audit findings
```

## Workflow

### 1. Gather Business Info
Collect from client:
- Business name (exact)
- Full address
- Phone number (local preferred)
- Website URL
- GBP listing URL

### 2. Website NAP Check
Scrape website for NAP mentions:
- Header/footer
- Contact page
- About page
- Schema markup

### 3. Citation Discovery
Search for business mentions:
- "[Business Name] [City]"
- Phone number search
- Address search

### 4. GBP Review
Manual checklist:
- Profile completeness
- Category accuracy
- Photo presence
- Review responses
- Post recency

### 5. Schema Check
Validate LocalBusiness markup presence and accuracy.

## GBP Optimization Checklist

- [ ] Business name matches website
- [ ] Complete address with suite/unit
- [ ] Local phone number (not toll-free)
- [ ] Accurate business hours
- [ ] Primary + secondary categories set
- [ ] Business description complete
- [ ] 10+ photos uploaded
- [ ] Recent post (within 7 days)
- [ ] Reviews responded to

## Citation Priority

| Platform | Priority |
|----------|----------|
| Google Business Profile | Critical |
| Apple Maps | High |
| Bing Places | High |
| Yelp | High |
| Facebook | Medium |
| Industry directories | Medium |

## Output Format

```markdown
## Local SEO Audit: [Business]

### NAP Consistency: X/10
| Source | Name | Address | Phone |
|--------|------|---------|-------|
| Website | ✓/✗ | ✓/✗ | ✓/✗ |
| GBP | ✓/✗ | ✓/✗ | ✓/✗ |

### GBP Score: X/10
[Checklist results]

### Citations Found: X
- Consistent: X
- Inconsistent: X

### LocalBusiness Schema
- Present: Yes/No
- Valid: Yes/No

### Priority Actions
1. [Fix recommendations]
```

## Limitations

- GBP data requires manual access
- Citation discovery limited to searchable sources
- Cannot update external directories
113 custom-skills/13-seo-schema-validator/code/CLAUDE.md Normal file
@@ -0,0 +1,113 @@
|
||||
# CLAUDE.md
|
||||
|
||||
## Overview
|
||||
|
||||
Structured data validator: extract, parse, and validate JSON-LD, Microdata, and RDFa markup against schema.org vocabulary.
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
pip install -r scripts/requirements.txt
|
||||
python scripts/schema_validator.py --url https://example.com
|
||||
```
|
||||
|
||||
## Scripts

| Script | Purpose |
|--------|---------|
| `schema_validator.py` | Extract and validate structured data |
| `base_client.py` | Shared utilities |

## Usage

```bash
# Validate page schema (Rich Results eligibility is checked on every run)
python scripts/schema_validator.py --url https://example.com

# JSON output
python scripts/schema_validator.py --url https://example.com --json

# Validate local file
python scripts/schema_validator.py --file schema.json

# Write JSON report to file
python scripts/schema_validator.py --url https://example.com --output report.json
```
## Supported Formats

| Format | Detection |
|--------|-----------|
| JSON-LD | `<script type="application/ld+json">` |
| Microdata | `itemscope`, `itemtype`, `itemprop` |
| RDFa | `vocab`, `typeof`, `property` |
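The JSON-LD row above can be illustrated with a minimal regex-based sketch. The shipped scripts use extruct and BeautifulSoup instead; `extract_json_ld` and the sample `HTML` here are illustrative, not part of the toolkit's API:

```python
import json
import re

# A page fragment with one embedded JSON-LD block.
HTML = '<script type="application/ld+json">{"@type": "Organization", "name": "Acme"}</script>'

def extract_json_ld(html: str) -> list[dict]:
    """Collect parseable JSON-LD blocks from raw HTML."""
    pattern = re.compile(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    schemas: list[dict] = []
    for block in pattern.findall(html):
        try:
            data = json.loads(block)
            schemas.extend(data if isinstance(data, list) else [data])
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the page
    return schemas

print(extract_json_ld(HTML))  # → [{'@type': 'Organization', 'name': 'Acme'}]
```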
## Validation Levels

### 1. Syntax Validation
- Valid JSON structure
- Proper nesting
- No syntax errors

### 2. Schema.org Vocabulary
- Valid @type values
- Known properties
- Correct property types

### 3. Google Rich Results
- Required properties present
- Recommended properties
- Feature-specific requirements
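The level-2 vocabulary check reduces to a lookup of required properties per `@type`. A minimal sketch (the `REQUIRED` table and `missing_required` helper are illustrative; the validator's real tables live in `schema_validator.py`):

```python
# Required schema.org properties per type (abbreviated for the sketch).
REQUIRED = {
    "Article": ["headline", "author", "datePublished", "publisher"],
    "Organization": ["name", "url"],
}

def missing_required(schema: dict) -> list[str]:
    """Return required properties absent from a parsed schema object."""
    required = REQUIRED.get(schema.get("@type", ""), [])
    return [prop for prop in required if prop not in schema]

issues = missing_required({"@type": "Article", "headline": "Hello"})
print(issues)  # → ['author', 'datePublished', 'publisher']
```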
## Schema Types Validated

| Type | Required Properties | Rich Result |
|------|---------------------|-------------|
| Article | headline, author, datePublished | Yes |
| Product | name, offers | Yes |
| LocalBusiness | name, address | Yes |
| FAQPage | mainEntity | Yes |
| Organization | name, url | Yes |
| BreadcrumbList | itemListElement | Yes |
| WebSite | name, url | Sitelinks |
## Output

```json
{
  "url": "https://example.com",
  "schemas_found": 3,
  "schemas": [
    {
      "@type": "Organization",
      "valid": true,
      "rich_results_eligible": true,
      "issues": [],
      "warnings": []
    }
  ],
  "summary": {
    "valid": 3,
    "invalid": 0,
    "rich_results_eligible": 2
  }
}
```
## Issue Severity

| Level | Description |
|-------|-------------|
| Error | Invalid schema, blocks rich results |
| Warning | Missing recommended property |
| Info | Optimization suggestion |
## Dependencies

```
extruct>=0.16.0
jsonschema>=4.21.0
rdflib>=7.0.0
lxml>=5.1.0
requests>=2.31.0
```
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,9 @@
# 13-seo-schema-validator dependencies
extruct>=0.16.0
jsonschema>=4.21.0
rdflib>=7.0.0
lxml>=5.1.0
beautifulsoup4>=4.12.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
@@ -0,0 +1,498 @@
"""
Schema Validator - Validate JSON-LD structured data markup
==========================================================
Purpose: Extract and validate schema.org structured data from URLs or files
Python: 3.10+
Usage:
    python schema_validator.py --url https://example.com
    python schema_validator.py --file schema.json
"""

import argparse
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

try:
    import extruct
    HAS_EXTRUCT = True
except ImportError:
    HAS_EXTRUCT = False

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


@dataclass
class ValidationIssue:
    """Represents a validation issue found in schema."""

    severity: str  # "error", "warning", "info"
    message: str
    schema_type: str | None = None
    property_name: str | None = None
    suggestion: str | None = None


@dataclass
class ValidationResult:
    """Complete validation result for a schema."""

    url: str | None = None
    schemas_found: list[dict] = field(default_factory=list)
    issues: list[ValidationIssue] = field(default_factory=list)
    valid: bool = True
    rich_results_eligible: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON output."""
        return {
            "url": self.url,
            "schemas_found": len(self.schemas_found),
            "schema_types": [s.get("@type", "Unknown") for s in self.schemas_found],
            "valid": self.valid,
            "issues": [
                {
                    "severity": i.severity,
                    "message": i.message,
                    "schema_type": i.schema_type,
                    "property": i.property_name,
                    "suggestion": i.suggestion,
                }
                for i in self.issues
            ],
            "rich_results_eligible": self.rich_results_eligible,
            "timestamp": self.timestamp,
        }


class SchemaValidator:
    """Validate schema.org structured data."""

    # Required properties for common schema types
    REQUIRED_PROPERTIES = {
        "Organization": ["name", "url"],
        "LocalBusiness": ["name", "address"],
        "Product": ["name"],
        "Offer": ["price", "priceCurrency"],
        "Article": ["headline", "author", "datePublished", "publisher"],
        "BlogPosting": ["headline", "author", "datePublished", "publisher"],
        "NewsArticle": ["headline", "author", "datePublished", "publisher"],
        "FAQPage": ["mainEntity"],
        "Question": ["name", "acceptedAnswer"],
        "Answer": ["text"],
        "BreadcrumbList": ["itemListElement"],
        "ListItem": ["position", "name"],
        "WebSite": ["name", "url"],
        "WebPage": ["name"],
        "Person": ["name"],
        "Event": ["name", "startDate", "location"],
        "Review": ["reviewRating", "author"],
        "AggregateRating": ["ratingValue"],
        "ImageObject": ["url"],
    }

    # Recommended (but not required) properties
    RECOMMENDED_PROPERTIES = {
        "Organization": ["logo", "description", "contactPoint", "sameAs"],
        "LocalBusiness": ["telephone", "openingHoursSpecification", "geo", "image"],
        "Product": ["description", "image", "brand", "offers", "aggregateRating"],
        "Article": ["image", "dateModified", "description"],
        "FAQPage": [],
        "WebSite": ["potentialAction"],
        "BreadcrumbList": [],
    }

    # Google Rich Results eligible types
    RICH_RESULTS_TYPES = {
        "Article", "BlogPosting", "NewsArticle",
        "Product", "Review",
        "FAQPage", "HowTo",
        "LocalBusiness", "Restaurant",
        "Event",
        "Recipe",
        "JobPosting",
        "Course",
        "BreadcrumbList",
        "Organization",
        "WebSite",
        "VideoObject",
    }

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
        })

    def extract_from_url(self, url: str) -> list[dict]:
        """Extract all structured data from a URL."""
        try:
            response = self.session.get(url, timeout=30)
            response.raise_for_status()
            return self.extract_from_html(response.text, url)
        except requests.RequestException as e:
            logger.error(f"Failed to fetch URL: {e}")
            return []

    def extract_from_html(self, html: str, base_url: str | None = None) -> list[dict]:
        """Extract structured data from HTML content."""
        schemas = []

        # Method 1: Use extruct if available (handles JSON-LD, Microdata, RDFa)
        if HAS_EXTRUCT:
            try:
                data = extruct.extract(html, base_url=base_url, uniform=True)
                schemas.extend(data.get("json-ld", []))
                schemas.extend(data.get("microdata", []))
                schemas.extend(data.get("rdfa", []))
            except Exception as e:
                logger.warning(f"extruct extraction failed: {e}")

        # Method 2: Manual JSON-LD extraction (fallback/additional)
        soup = BeautifulSoup(html, "html.parser")
        for script in soup.find_all("script", type="application/ld+json"):
            try:
                content = script.string
                if content:
                    data = json.loads(content)
                    if isinstance(data, list):
                        schemas.extend(data)
                    else:
                        schemas.append(data)
            except json.JSONDecodeError as e:
                logger.warning(f"Invalid JSON-LD: {e}")

        # Deduplicate schemas
        seen = set()
        unique_schemas = []
        for schema in schemas:
            schema_str = json.dumps(schema, sort_keys=True)
            if schema_str not in seen:
                seen.add(schema_str)
                unique_schemas.append(schema)

        return unique_schemas

    def validate(self, url: str | None = None, html: str | None = None,
                 schema: dict | None = None) -> ValidationResult:
        """Validate schema from URL, HTML, or direct schema dict."""
        result = ValidationResult(url=url)

        # Extract schemas
        if schema:
            schemas = [schema]
        elif html:
            schemas = self.extract_from_html(html, url)
        elif url:
            schemas = self.extract_from_url(url)
        else:
            raise ValueError("Must provide url, html, or schema")

        result.schemas_found = schemas

        if not schemas:
            result.issues.append(ValidationIssue(
                severity="warning",
                message="No structured data found",
                suggestion="Add JSON-LD schema markup to improve SEO",
            ))
            result.valid = False
            return result

        # Validate each schema
        for schema in schemas:
            self._validate_schema(schema, result)

        # Check for errors (warnings don't affect validity)
        result.valid = not any(i.severity == "error" for i in result.issues)

        return result

    def _validate_schema(self, schema: dict, result: ValidationResult,
                         parent_type: str | None = None) -> None:
        """Validate a single schema object."""
        schema_type = schema.get("@type")

        if not schema_type:
            result.issues.append(ValidationIssue(
                severity="error",
                message="Missing @type property",
                schema_type=parent_type,
            ))
            return

        # Handle array of types
        if isinstance(schema_type, list):
            schema_type = schema_type[0]

        # Check required properties
        required = self.REQUIRED_PROPERTIES.get(schema_type, [])
        for prop in required:
            if prop not in schema:
                result.issues.append(ValidationIssue(
                    severity="error",
                    message=f"Missing required property: {prop}",
                    schema_type=schema_type,
                    property_name=prop,
                    suggestion=f"Add '{prop}' property to {schema_type} schema",
                ))

        # Check recommended properties
        recommended = self.RECOMMENDED_PROPERTIES.get(schema_type, [])
        for prop in recommended:
            if prop not in schema:
                result.issues.append(ValidationIssue(
                    severity="info",
                    message=f"Missing recommended property: {prop}",
                    schema_type=schema_type,
                    property_name=prop,
                    suggestion=f"Consider adding '{prop}' for better rich results",
                ))

        # Check Rich Results eligibility
        if schema_type in self.RICH_RESULTS_TYPES:
            result.rich_results_eligible[schema_type] = self._check_rich_results(
                schema, schema_type
            )

        # Validate nested schemas
        for key, value in schema.items():
            if key.startswith("@"):
                continue
            if isinstance(value, dict) and "@type" in value:
                self._validate_schema(value, result, schema_type)
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, dict) and "@type" in item:
                        self._validate_schema(item, result, schema_type)

        # Type-specific validations
        self._validate_type_specific(schema, schema_type, result)

    def _validate_type_specific(self, schema: dict, schema_type: str,
                                result: ValidationResult) -> None:
        """Type-specific validation rules."""
        if schema_type in ("Article", "BlogPosting", "NewsArticle"):
            # Check image
            if "image" not in schema:
                result.issues.append(ValidationIssue(
                    severity="warning",
                    message="Article without image may not show in rich results",
                    schema_type=schema_type,
                    property_name="image",
                    suggestion="Add at least one image to the article",
                ))

            # Check headline length
            headline = schema.get("headline", "")
            if len(headline) > 110:
                result.issues.append(ValidationIssue(
                    severity="warning",
                    message=f"Headline too long ({len(headline)} chars, max 110)",
                    schema_type=schema_type,
                    property_name="headline",
                ))

        elif schema_type == "Product":
            offer = schema.get("offers", {})
            if isinstance(offer, dict):
                # Check price
                price = offer.get("price")
                if price is not None:
                    try:
                        float(price)
                    except (ValueError, TypeError):
                        result.issues.append(ValidationIssue(
                            severity="error",
                            message=f"Invalid price value: {price}",
                            schema_type="Offer",
                            property_name="price",
                        ))

                # Check availability
                availability = offer.get("availability", "")
                valid_availabilities = [
                    "InStock", "OutOfStock", "PreOrder", "Discontinued",
                    "https://schema.org/InStock", "https://schema.org/OutOfStock",
                ]
                if availability and not any(
                    a in availability for a in valid_availabilities
                ):
                    result.issues.append(ValidationIssue(
                        severity="warning",
                        message=f"Unknown availability value: {availability}",
                        schema_type="Offer",
                        property_name="availability",
                    ))

        elif schema_type == "LocalBusiness":
            # Check for geo coordinates
            if "geo" not in schema:
                result.issues.append(ValidationIssue(
                    severity="info",
                    message="Missing geo coordinates",
                    schema_type=schema_type,
                    property_name="geo",
                    suggestion="Add latitude/longitude for better local search",
                ))

        elif schema_type == "FAQPage":
            main_entity = schema.get("mainEntity", [])
            if not main_entity:
                result.issues.append(ValidationIssue(
                    severity="error",
                    message="FAQPage must have at least one question",
                    schema_type=schema_type,
                    property_name="mainEntity",
                ))
            elif len(main_entity) < 2:
                result.issues.append(ValidationIssue(
                    severity="info",
                    message="FAQPage has only one question",
                    schema_type=schema_type,
                    suggestion="Add more questions for better rich results",
                ))

    def _check_rich_results(self, schema: dict, schema_type: str) -> dict:
        """Check if schema is eligible for Google Rich Results."""
        result = {
            "eligible": True,
            "missing_for_rich_results": [],
        }

        if schema_type in ("Article", "BlogPosting", "NewsArticle"):
            required_for_rich = ["headline", "image", "datePublished", "author"]
            for prop in required_for_rich:
                if prop not in schema:
                    result["eligible"] = False
                    result["missing_for_rich_results"].append(prop)

        elif schema_type == "Product":
            if "name" not in schema:
                result["eligible"] = False
                result["missing_for_rich_results"].append("name")
            offer = schema.get("offers")
            if not offer:
                result["eligible"] = False
                result["missing_for_rich_results"].append("offers")

        elif schema_type == "FAQPage":
            if not schema.get("mainEntity"):
                result["eligible"] = False
                result["missing_for_rich_results"].append("mainEntity")

        return result

    def generate_report(self, result: ValidationResult) -> str:
        """Generate human-readable validation report."""
        lines = [
            "=" * 60,
            "Schema Validation Report",
            "=" * 60,
            f"URL: {result.url or 'N/A'}",
            f"Timestamp: {result.timestamp}",
            f"Valid: {'Yes' if result.valid else 'No'}",
            f"Schemas Found: {len(result.schemas_found)}",
            "",
        ]

        if result.schemas_found:
            lines.append("Schema Types:")
            for schema in result.schemas_found:
                schema_type = schema.get("@type", "Unknown")
                lines.append(f"  - {schema_type}")
            lines.append("")

        if result.rich_results_eligible:
            lines.append("Rich Results Eligibility:")
            for schema_type, status in result.rich_results_eligible.items():
                eligible = "Yes" if status["eligible"] else "No"
                lines.append(f"  - {schema_type}: {eligible}")
                if status["missing_for_rich_results"]:
                    missing = ", ".join(status["missing_for_rich_results"])
                    lines.append(f"    Missing: {missing}")
            lines.append("")

        if result.issues:
            lines.append("Issues Found:")
            errors = [i for i in result.issues if i.severity == "error"]
            warnings = [i for i in result.issues if i.severity == "warning"]
            infos = [i for i in result.issues if i.severity == "info"]

            if errors:
                lines.append(f"\n  ERRORS ({len(errors)}):")
                for issue in errors:
                    lines.append(f"  - [{issue.schema_type}] {issue.message}")
                    if issue.suggestion:
                        lines.append(f"    Suggestion: {issue.suggestion}")

            if warnings:
                lines.append(f"\n  WARNINGS ({len(warnings)}):")
                for issue in warnings:
                    lines.append(f"  - [{issue.schema_type}] {issue.message}")
                    if issue.suggestion:
                        lines.append(f"    Suggestion: {issue.suggestion}")

            if infos:
                lines.append(f"\n  INFO ({len(infos)}):")
                for issue in infos:
                    lines.append(f"  - [{issue.schema_type}] {issue.message}")
                    if issue.suggestion:
                        lines.append(f"    Suggestion: {issue.suggestion}")

        lines.append("")
        lines.append("=" * 60)

        return "\n".join(lines)


def main():
    """Main entry point for CLI usage."""
    parser = argparse.ArgumentParser(
        description="Validate schema.org structured data",
    )
    parser.add_argument("--url", "-u", help="URL to validate")
    parser.add_argument("--file", "-f", help="JSON-LD file to validate")
    parser.add_argument("--output", "-o", help="Output file for JSON report")
    parser.add_argument("--json", action="store_true", help="Output as JSON")

    args = parser.parse_args()

    if not args.url and not args.file:
        parser.error("Must provide --url or --file")

    validator = SchemaValidator()

    if args.file:
        with open(args.file, "r", encoding="utf-8") as f:
            schema = json.load(f)
        result = validator.validate(schema=schema)
    else:
        result = validator.validate(url=args.url)

    if args.json or args.output:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            logger.info(f"Report written to {args.output}")
        else:
            print(output)
    else:
        print(validator.generate_report(result))


if __name__ == "__main__":
    main()
110
custom-skills/13-seo-schema-validator/desktop/SKILL.md
Normal file
@@ -0,0 +1,110 @@
---
name: seo-schema-validator
version: 1.0.0
description: Structured data validator for JSON-LD, Microdata, and RDFa. Triggers: validate schema, structured data, JSON-LD, rich results, schema.org.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*
---

# SEO Schema Validator

## Purpose

Extract and validate structured data (JSON-LD, Microdata, RDFa) against schema.org vocabulary and Google Rich Results requirements.

## Core Capabilities

1. **Extract** - Find all structured data on page
2. **Parse** - JSON-LD, Microdata, RDFa formats
3. **Validate** - Schema.org compliance
4. **Rich Results** - Google eligibility check

## MCP Tool Usage

```
mcp__firecrawl__scrape: Extract page HTML with structured data
mcp__perplexity__search: Research schema requirements
```

## Workflow

1. Scrape target URL
2. Locate structured data blocks
3. Parse each format found
4. Validate against schema.org
5. Check Rich Results eligibility
6. Report issues and recommendations

## Supported Schema Types

| Type | Required Properties | Rich Result |
|------|---------------------|-------------|
| Article | headline, author, datePublished, image | Yes |
| Product | name, offers (price, availability) | Yes |
| LocalBusiness | name, address, telephone | Yes |
| FAQPage | mainEntity (questions) | Yes |
| Organization | name, url, logo | Sitelinks |
| BreadcrumbList | itemListElement | Yes |
| WebSite | name, url, potentialAction | Sitelinks |
| Review | itemReviewed, reviewRating | Yes |
| Event | name, startDate, location | Yes |
| Recipe | name, image, ingredients | Yes |

## Validation Levels

### Level 1: Syntax
- Valid JSON structure
- Proper nesting
- No parsing errors

### Level 2: Vocabulary
- Valid @type values
- Known property names
- Correct value types

### Level 3: Rich Results
- Required properties present
- Recommended properties
- Google-specific requirements

## Output Format

```markdown
## Schema Validation: [URL]

### Schemas Found: X

#### Schema 1: [Type]
- Format: JSON-LD
- Valid: Yes/No
- Rich Results Eligible: Yes/No

**Issues:**
- [Error/Warning list]

**Properties:**
| Property | Present | Valid |
|----------|---------|-------|

### Summary
- Valid: X
- Invalid: X
- Rich Results Ready: X

### Recommendations
1. [Fixes needed]
```

## Common Issues

| Issue | Severity | Fix |
|-------|----------|-----|
| Missing required property | Error | Add property |
| Invalid date format | Error | Use ISO 8601 |
| Missing @context | Error | Add schema.org context |
| No image property | Warning | Add image URL |

## Limitations

- Cannot test rendered schema (JavaScript)
- Validation against schema.org, not all Google features
- Use Google Rich Results Test for final verification
121
custom-skills/14-seo-schema-generator/code/CLAUDE.md
Normal file
@@ -0,0 +1,121 @@
# CLAUDE.md

## Overview

Schema markup generator: create JSON-LD structured data from templates for various content types.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Generate Organization schema
python scripts/schema_generator.py --type organization --url https://example.com

# Generate from template
python scripts/schema_generator.py --template templates/article.json --data article_data.json
```

## Scripts

| Script | Purpose |
|--------|---------|
| `schema_generator.py` | Generate schema markup |
| `base_client.py` | Shared utilities |

## Supported Schema Types

| Type | Template | Use Case |
|------|----------|----------|
| Organization | `organization.json` | Company/brand info |
| LocalBusiness | `local_business.json` | Physical locations |
| Article | `article.json` | Blog posts, news |
| Product | `product.json` | E-commerce items |
| FAQPage | `faq.json` | FAQ sections |
| BreadcrumbList | `breadcrumb.json` | Navigation path |
| WebSite | `website.json` | Site-level info |

## Usage Examples

### Organization
```bash
python scripts/schema_generator.py --type organization \
  --name "Company Name" \
  --url "https://example.com" \
  --logo "https://example.com/logo.png"
```

### LocalBusiness
```bash
python scripts/schema_generator.py --type localbusiness \
  --name "Restaurant Name" \
  --address "123 Main St, City, State 12345" \
  --phone "+1-555-123-4567" \
  --hours "Mo-Fr 09:00-17:00"
```

### Article
```bash
python scripts/schema_generator.py --type article \
  --headline "Article Title" \
  --author "Author Name" \
  --published "2024-01-15" \
  --image "https://example.com/image.jpg"
```

### FAQPage
```bash
python scripts/schema_generator.py --type faq \
  --questions questions.json
```

## Output

Generated JSON-LD ready for insertion:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Company Name",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png"
}
</script>
```
## Template Customization
|
||||
|
||||
Templates in `templates/` can be modified. Required fields are marked:
|
||||
|
||||
```json
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "Article",
|
||||
"headline": "{{REQUIRED}}",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "{{REQUIRED}}"
|
||||
},
|
||||
"datePublished": "{{REQUIRED}}",
|
||||
"image": "{{RECOMMENDED}}"
|
||||
}
|
||||
```
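The `{{key}}` placeholder convention above takes only a few lines to reproduce. This is a minimal sketch of the substitution idea, assuming top-level string fields only (the real `fill_template` in `scripts/schema_generator.py` also prunes nested objects); `fill_top_level` is a hypothetical helper name:

```python
import json

def fill_top_level(template: dict, data: dict) -> dict:
    """Substitute {{key}} placeholders, then drop fields left unfilled."""
    text = json.dumps(template, ensure_ascii=False)
    for key, value in data.items():
        if value is not None:
            text = text.replace("{{" + key + "}}", str(value))
    filled = json.loads(text)
    # A value that still looks like "{{...}}" was never provided
    return {k: v for k, v in filled.items()
            if not (isinstance(v, str) and v.startswith("{{"))}

template = {"@type": "Article", "headline": "{{headline}}", "image": "{{image}}"}
result = fill_top_level(template, {"headline": "Hello World"})
print(result)  # {'@type': 'Article', 'headline': 'Hello World'}
```

Unfilled optional fields like `image` simply disappear from the output rather than leaking placeholder text into the page.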
## Validation

Generated schemas are validated before output:

- Syntax correctness
- Required properties present
- Schema.org vocabulary compliance

Use skill 13 (schema-validator) for additional validation.
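The required-property check described above can be sketched in plain Python. The property lists here are illustrative examples, not the validator's actual rules:

```python
# Illustrative required-property lists (not the validator's actual rules)
REQUIRED_PROPS = {
    "Article": ["headline", "author", "datePublished"],
    "Organization": ["name", "url"],
    "Product": ["name", "offers"],
}

def missing_required(schema: dict) -> list[str]:
    """Return required schema.org properties absent from a generated schema."""
    schema_type = schema.get("@type", "")
    return [p for p in REQUIRED_PROPS.get(schema_type, []) if p not in schema]

article = {"@context": "https://schema.org", "@type": "Article", "headline": "Hi"}
print(missing_required(article))  # ['author', 'datePublished']
```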
## Dependencies

```
jsonschema>=4.21.0
requests>=2.31.0
python-dotenv>=1.0.0
tenacity>=8.0.0
```

@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using the token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fall back to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate that Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get a required environment variable or raise an error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()

@@ -0,0 +1,6 @@
# 14-seo-schema-generator dependencies
jsonschema>=4.21.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
typer>=0.9.0
tenacity>=8.0.0

@@ -0,0 +1,490 @@
"""
|
||||
Schema Generator - Generate JSON-LD structured data markup
|
||||
==========================================================
|
||||
Purpose: Generate schema.org structured data in JSON-LD format
|
||||
Python: 3.10+
|
||||
Usage:
|
||||
python schema_generator.py --type organization --name "Company Name" --url "https://example.com"
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||
)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Template directory relative to this script
|
||||
TEMPLATE_DIR = Path(__file__).parent.parent / "templates" / "schema_templates"
|
||||
|
||||
|
||||
class SchemaGenerator:
|
||||
"""Generate JSON-LD schema markup from templates."""
|
||||
|
||||
SCHEMA_TYPES = {
|
||||
"organization": "organization.json",
|
||||
"local_business": "local_business.json",
|
||||
"product": "product.json",
|
||||
"article": "article.json",
|
||||
"faq": "faq.json",
|
||||
"breadcrumb": "breadcrumb.json",
|
||||
"website": "website.json",
|
||||
}
|
||||
|
||||
# Business type mappings for LocalBusiness
|
||||
BUSINESS_TYPES = {
|
||||
"restaurant": "Restaurant",
|
||||
"cafe": "CafeOrCoffeeShop",
|
||||
"bar": "BarOrPub",
|
||||
"hotel": "Hotel",
|
||||
"store": "Store",
|
||||
"medical": "MedicalBusiness",
|
||||
"dental": "Dentist",
|
||||
"legal": "LegalService",
|
||||
"real_estate": "RealEstateAgent",
|
||||
"auto": "AutoRepair",
|
||||
"beauty": "BeautySalon",
|
||||
"gym": "HealthClub",
|
||||
"spa": "DaySpa",
|
||||
}
|
||||
|
||||
# Article type mappings
|
||||
ARTICLE_TYPES = {
|
||||
"article": "Article",
|
||||
"blog": "BlogPosting",
|
||||
"news": "NewsArticle",
|
||||
"tech": "TechArticle",
|
||||
"scholarly": "ScholarlyArticle",
|
||||
}
|
||||
|
||||
def __init__(self, template_dir: Path = TEMPLATE_DIR):
|
||||
self.template_dir = template_dir
|
||||
|
||||
def load_template(self, schema_type: str) -> dict:
|
||||
"""Load a schema template file."""
|
||||
if schema_type not in self.SCHEMA_TYPES:
|
||||
raise ValueError(f"Unknown schema type: {schema_type}. "
|
||||
f"Available: {list(self.SCHEMA_TYPES.keys())}")
|
||||
|
||||
template_file = self.template_dir / self.SCHEMA_TYPES[schema_type]
|
||||
if not template_file.exists():
|
||||
raise FileNotFoundError(f"Template not found: {template_file}")
|
||||
|
||||
with open(template_file, "r", encoding="utf-8") as f:
|
||||
return json.load(f)
|
||||
|
||||
def fill_template(self, template: dict, data: dict[str, Any]) -> dict:
|
||||
"""Fill template placeholders with actual data."""
|
||||
template_str = json.dumps(template, ensure_ascii=False)
|
||||
|
||||
# Replace placeholders {{key}} with values
|
||||
for key, value in data.items():
|
||||
placeholder = f"{{{{{key}}}}}"
|
||||
if value is not None:
|
||||
template_str = template_str.replace(placeholder, str(value))
|
||||
|
||||
# Remove unfilled placeholders and their parent objects if empty
|
||||
result = json.loads(template_str)
|
||||
return self._clean_empty_values(result)
|
||||
|
||||
def _clean_empty_values(self, obj: Any) -> Any:
|
||||
"""Remove empty values and unfilled placeholders."""
|
||||
if isinstance(obj, dict):
|
||||
cleaned = {}
|
||||
for key, value in obj.items():
|
||||
cleaned_value = self._clean_empty_values(value)
|
||||
# Skip if value is empty, None, or unfilled placeholder
|
||||
if cleaned_value is None:
|
||||
continue
|
||||
if isinstance(cleaned_value, str) and cleaned_value.startswith("{{"):
|
||||
continue
|
||||
if isinstance(cleaned_value, (list, dict)) and not cleaned_value:
|
||||
continue
|
||||
cleaned[key] = cleaned_value
|
||||
return cleaned if cleaned else None
|
||||
elif isinstance(obj, list):
|
||||
cleaned = []
|
||||
for item in obj:
|
||||
cleaned_item = self._clean_empty_values(item)
|
||||
if cleaned_item is not None:
|
||||
if isinstance(cleaned_item, str) and cleaned_item.startswith("{{"):
|
||||
continue
|
||||
cleaned.append(cleaned_item)
|
||||
return cleaned if cleaned else None
|
||||
elif isinstance(obj, str):
|
||||
if obj.startswith("{{") and obj.endswith("}}"):
|
||||
return None
|
||||
return obj
|
||||
return obj
|
||||
|
||||
def generate_organization(
|
||||
self,
|
||||
name: str,
|
||||
url: str,
|
||||
logo_url: str | None = None,
|
||||
description: str | None = None,
|
||||
founding_date: str | None = None,
|
||||
phone: str | None = None,
|
||||
address: dict | None = None,
|
||||
social_links: list[str] | None = None,
|
||||
) -> dict:
|
||||
"""Generate Organization schema."""
|
||||
template = self.load_template("organization")
|
||||
|
||||
data = {
|
||||
"name": name,
|
||||
"url": url,
|
||||
"logo_url": logo_url,
|
||||
"description": description,
|
||||
"founding_date": founding_date,
|
||||
"phone": phone,
|
||||
}
|
||||
|
||||
if address:
|
||||
data.update({
|
||||
"street_address": address.get("street"),
|
||||
"city": address.get("city"),
|
||||
"region": address.get("region"),
|
||||
"postal_code": address.get("postal_code"),
|
||||
"country": address.get("country", "KR"),
|
||||
})
|
||||
|
||||
if social_links:
|
||||
# Handle social links specially
|
||||
pass
|
||||
|
||||
return self.fill_template(template, data)
|
||||
|
||||
def generate_local_business(
|
||||
self,
|
||||
name: str,
|
||||
business_type: str,
|
||||
address: dict,
|
||||
phone: str | None = None,
|
||||
url: str | None = None,
|
||||
description: str | None = None,
|
||||
hours: dict | None = None,
|
||||
geo: dict | None = None,
|
||||
price_range: str | None = None,
|
||||
rating: float | None = None,
|
||||
review_count: int | None = None,
|
||||
) -> dict:
|
||||
"""Generate LocalBusiness schema."""
|
||||
template = self.load_template("local_business")
|
||||
|
||||
schema_business_type = self.BUSINESS_TYPES.get(
|
||||
business_type.lower(), "LocalBusiness"
|
||||
)
|
||||
|
||||
data = {
|
||||
"business_type": schema_business_type,
|
||||
"name": name,
|
||||
"url": url,
|
||||
"description": description,
|
||||
"phone": phone,
|
||||
"price_range": price_range,
|
||||
"street_address": address.get("street"),
|
||||
"city": address.get("city"),
|
||||
"region": address.get("region"),
|
||||
"postal_code": address.get("postal_code"),
|
||||
"country": address.get("country", "KR"),
|
||||
}
|
||||
|
||||
if geo:
|
||||
data["latitude"] = geo.get("lat")
|
||||
data["longitude"] = geo.get("lng")
|
||||
|
||||
if hours:
|
||||
data.update({
|
||||
"weekday_opens": hours.get("weekday_opens", "09:00"),
|
||||
"weekday_closes": hours.get("weekday_closes", "18:00"),
|
||||
"weekend_opens": hours.get("weekend_opens"),
|
||||
"weekend_closes": hours.get("weekend_closes"),
|
||||
})
|
||||
|
||||
if rating is not None:
|
||||
data["rating"] = str(rating)
|
||||
data["review_count"] = str(review_count or 0)
|
||||
|
||||
return self.fill_template(template, data)
|
||||
|
||||
def generate_product(
|
||||
self,
|
||||
name: str,
|
||||
description: str,
|
||||
price: float,
|
||||
currency: str = "KRW",
|
||||
brand: str | None = None,
|
||||
sku: str | None = None,
|
||||
images: list[str] | None = None,
|
||||
availability: str = "InStock",
|
||||
condition: str = "NewCondition",
|
||||
rating: float | None = None,
|
||||
review_count: int | None = None,
|
||||
url: str | None = None,
|
||||
seller: str | None = None,
|
||||
) -> dict:
|
||||
"""Generate Product schema."""
|
||||
template = self.load_template("product")
|
||||
|
||||
data = {
|
||||
"name": name,
|
||||
"description": description,
|
||||
"price": str(int(price)),
|
||||
"currency": currency,
|
||||
"brand_name": brand,
|
||||
"sku": sku,
|
||||
"product_url": url,
|
||||
"availability": availability,
|
||||
"condition": condition,
|
||||
"seller_name": seller,
|
||||
}
|
||||
|
||||
if images:
|
||||
for i, img in enumerate(images[:3], 1):
|
||||
data[f"image_url_{i}"] = img
|
||||
|
||||
if rating is not None:
|
||||
data["rating"] = str(rating)
|
||||
data["review_count"] = str(review_count or 0)
|
||||
|
||||
return self.fill_template(template, data)
|
||||
|
||||
def generate_article(
|
||||
self,
|
||||
headline: str,
|
||||
description: str,
|
||||
author_name: str,
|
||||
date_published: str,
|
||||
publisher_name: str,
|
||||
article_type: str = "article",
|
||||
date_modified: str | None = None,
|
||||
images: list[str] | None = None,
|
||||
page_url: str | None = None,
|
||||
publisher_logo: str | None = None,
|
||||
author_url: str | None = None,
|
||||
section: str | None = None,
|
||||
word_count: int | None = None,
|
||||
keywords: str | None = None,
|
||||
) -> dict:
|
||||
"""Generate Article schema."""
|
||||
template = self.load_template("article")
|
||||
|
||||
schema_article_type = self.ARTICLE_TYPES.get(
|
||||
article_type.lower(), "Article"
|
||||
)
|
||||
|
||||
data = {
|
||||
"article_type": schema_article_type,
|
||||
"headline": headline,
|
||||
"description": description,
|
||||
"author_name": author_name,
|
||||
"author_url": author_url,
|
||||
"date_published": date_published,
|
||||
"date_modified": date_modified or date_published,
|
||||
"publisher_name": publisher_name,
|
||||
"publisher_logo_url": publisher_logo,
|
||||
"page_url": page_url,
|
||||
"section": section,
|
||||
"word_count": str(word_count) if word_count else None,
|
||||
"keywords": keywords,
|
||||
}
|
||||
|
||||
if images:
|
||||
for i, img in enumerate(images[:2], 1):
|
||||
data[f"image_url_{i}"] = img
|
||||
|
||||
return self.fill_template(template, data)
|
||||
|
||||
def generate_faq(self, questions: list[dict[str, str]]) -> dict:
|
||||
"""Generate FAQPage schema."""
|
||||
schema = {
|
||||
"@context": "https://schema.org",
|
||||
"@type": "FAQPage",
|
||||
"mainEntity": [],
|
||||
}
|
||||
|
||||
for qa in questions:
|
||||
schema["mainEntity"].append({
|
||||
"@type": "Question",
|
||||
"name": qa["question"],
|
||||
"acceptedAnswer": {
|
||||
"@type": "Answer",
|
||||
"text": qa["answer"],
|
||||
},
|
||||
})
|
||||
|
||||
return schema
|
||||
|
||||
def generate_breadcrumb(self, items: list[dict[str, str]]) -> dict:
|
||||
"""Generate BreadcrumbList schema."""
|
||||
schema = {
|
||||
"@context": "https://schema.org",
|
||||
"@type": "BreadcrumbList",
|
||||
"itemListElement": [],
|
||||
}
|
||||
|
||||
for i, item in enumerate(items, 1):
|
||||
schema["itemListElement"].append({
|
||||
"@type": "ListItem",
|
||||
"position": i,
|
||||
"name": item["name"],
|
||||
"item": item["url"],
|
||||
})
|
||||
|
||||
return schema
|
||||
|
||||
def generate_website(
|
||||
self,
|
||||
name: str,
|
||||
url: str,
|
||||
search_url_template: str | None = None,
|
||||
description: str | None = None,
|
||||
language: str = "ko-KR",
|
||||
publisher_name: str | None = None,
|
||||
logo_url: str | None = None,
|
||||
alternate_name: str | None = None,
|
||||
) -> dict:
|
||||
"""Generate WebSite schema."""
|
||||
template = self.load_template("website")
|
||||
|
||||
data = {
|
||||
"site_name": name,
|
||||
"url": url,
|
||||
"description": description,
|
||||
"language": language,
|
||||
"search_url_template": search_url_template,
|
||||
"publisher_name": publisher_name or name,
|
||||
"logo_url": logo_url,
|
||||
"alternate_name": alternate_name,
|
||||
}
|
||||
|
||||
return self.fill_template(template, data)
|
||||
|
||||
def to_json_ld(self, schema: dict, pretty: bool = True) -> str:
|
||||
"""Convert schema dict to JSON-LD string."""
|
||||
indent = 2 if pretty else None
|
||||
return json.dumps(schema, ensure_ascii=False, indent=indent)
|
||||
|
||||
def to_html_script(self, schema: dict) -> str:
|
||||
"""Wrap schema in HTML script tag."""
|
||||
json_ld = self.to_json_ld(schema)
|
||||
return f'<script type="application/ld+json">\n{json_ld}\n</script>'
|
||||
|
||||
|
||||
def main():
|
||||
"""Main entry point for CLI usage."""
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Generate JSON-LD schema markup",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
# Generate Organization schema
|
||||
python schema_generator.py --type organization --name "My Company" --url "https://example.com"
|
||||
|
||||
# Generate Product schema
|
||||
python schema_generator.py --type product --name "Widget" --price 29900 --currency KRW
|
||||
|
||||
# Generate Article schema
|
||||
python schema_generator.py --type article --headline "Article Title" --author "John Doe"
|
||||
""",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--type", "-t",
|
||||
required=True,
|
||||
choices=SchemaGenerator.SCHEMA_TYPES.keys(),
|
||||
help="Schema type to generate",
|
||||
)
|
||||
parser.add_argument("--name", help="Name/title")
|
||||
parser.add_argument("--url", help="URL")
|
||||
parser.add_argument("--description", help="Description")
|
||||
parser.add_argument("--price", type=float, help="Price (for product)")
|
||||
parser.add_argument("--currency", default="KRW", help="Currency code")
|
||||
parser.add_argument("--headline", help="Headline (for article)")
|
||||
parser.add_argument("--author", help="Author name")
|
||||
parser.add_argument("--output", "-o", help="Output file path")
|
||||
parser.add_argument("--html", action="store_true", help="Output as HTML script tag")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
generator = SchemaGenerator()
|
||||
|
||||
try:
|
||||
if args.type == "organization":
|
||||
schema = generator.generate_organization(
|
||||
name=args.name or "Organization Name",
|
||||
url=args.url or "https://example.com",
|
||||
description=args.description,
|
||||
)
|
||||
elif args.type == "product":
|
||||
schema = generator.generate_product(
|
||||
name=args.name or "Product Name",
|
||||
description=args.description or "Product description",
|
||||
price=args.price or 0,
|
||||
currency=args.currency,
|
||||
)
|
||||
elif args.type == "article":
|
||||
schema = generator.generate_article(
|
||||
headline=args.headline or args.name or "Article Title",
|
||||
description=args.description or "Article description",
|
||||
author_name=args.author or "Author",
|
||||
date_published=datetime.now().strftime("%Y-%m-%d"),
|
||||
publisher_name="Publisher",
|
||||
)
|
||||
elif args.type == "website":
|
||||
schema = generator.generate_website(
|
||||
name=args.name or "Website Name",
|
||||
url=args.url or "https://example.com",
|
||||
description=args.description,
|
||||
)
|
||||
elif args.type == "faq":
|
||||
# Example FAQ
|
||||
schema = generator.generate_faq([
|
||||
{"question": "Question 1?", "answer": "Answer 1"},
|
||||
{"question": "Question 2?", "answer": "Answer 2"},
|
||||
])
|
||||
elif args.type == "breadcrumb":
|
||||
# Example breadcrumb
|
||||
schema = generator.generate_breadcrumb([
|
||||
{"name": "Home", "url": "https://example.com/"},
|
||||
{"name": "Category", "url": "https://example.com/category/"},
|
||||
])
|
||||
elif args.type == "local_business":
|
||||
schema = generator.generate_local_business(
|
||||
name=args.name or "Business Name",
|
||||
business_type="store",
|
||||
address={"street": "123 Main St", "city": "Seoul", "country": "KR"},
|
||||
url=args.url,
|
||||
description=args.description,
|
||||
)
|
||||
else:
|
||||
raise ValueError(f"Unsupported type: {args.type}")
|
||||
|
||||
if args.html:
|
||||
output = generator.to_html_script(schema)
|
||||
else:
|
||||
output = generator.to_json_ld(schema)
|
||||
|
||||
if args.output:
|
||||
with open(args.output, "w", encoding="utf-8") as f:
|
||||
f.write(output)
|
||||
logger.info(f"Schema written to {args.output}")
|
||||
else:
|
||||
print(output)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error generating schema: {e}")
|
||||
raise
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,32 @@
{
  "@context": "https://schema.org",
  "@type": "{{article_type}}",
  "headline": "{{headline}}",
  "description": "{{description}}",
  "image": [
    "{{image_url_1}}",
    "{{image_url_2}}"
  ],
  "datePublished": "{{date_published}}",
  "dateModified": "{{date_modified}}",
  "author": {
    "@type": "Person",
    "name": "{{author_name}}",
    "url": "{{author_url}}"
  },
  "publisher": {
    "@type": "Organization",
    "name": "{{publisher_name}}",
    "logo": {
      "@type": "ImageObject",
      "url": "{{publisher_logo_url}}"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "{{page_url}}"
  },
  "articleSection": "{{section}}",
  "wordCount": "{{word_count}}",
  "keywords": "{{keywords}}"
}
@@ -0,0 +1,24 @@
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "{{level_1_name}}",
      "item": "{{level_1_url}}"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "{{level_2_name}}",
      "item": "{{level_2_url}}"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "{{level_3_name}}",
      "item": "{{level_3_url}}"
    }
  ]
}
@@ -0,0 +1,30 @@
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "{{question_1}}",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "{{answer_1}}"
      }
    },
    {
      "@type": "Question",
      "name": "{{question_2}}",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "{{answer_2}}"
      }
    },
    {
      "@type": "Question",
      "name": "{{question_3}}",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "{{answer_3}}"
      }
    }
  ]
}
@@ -0,0 +1,47 @@
{
  "@context": "https://schema.org",
  "@type": "{{business_type}}",
  "name": "{{name}}",
  "description": "{{description}}",
  "url": "{{url}}",
  "telephone": "{{phone}}",
  "email": "{{email}}",
  "image": "{{image_url}}",
  "priceRange": "{{price_range}}",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "{{street_address}}",
    "addressLocality": "{{city}}",
    "addressRegion": "{{region}}",
    "postalCode": "{{postal_code}}",
    "addressCountry": "{{country}}"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": "{{latitude}}",
    "longitude": "{{longitude}}"
  },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "{{weekday_opens}}",
      "closes": "{{weekday_closes}}"
    },
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Saturday", "Sunday"],
      "opens": "{{weekend_opens}}",
      "closes": "{{weekend_closes}}"
    }
  ],
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "{{rating}}",
    "reviewCount": "{{review_count}}"
  },
  "sameAs": [
    "{{facebook_url}}",
    "{{instagram_url}}"
  ]
}
@@ -0,0 +1,37 @@
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "{{name}}",
  "url": "{{url}}",
  "logo": "{{logo_url}}",
  "description": "{{description}}",
  "foundingDate": "{{founding_date}}",
  "founders": [
    {
      "@type": "Person",
      "name": "{{founder_name}}"
    }
  ],
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "{{street_address}}",
    "addressLocality": "{{city}}",
    "addressRegion": "{{region}}",
    "postalCode": "{{postal_code}}",
    "addressCountry": "{{country}}"
  },
  "contactPoint": [
    {
      "@type": "ContactPoint",
      "telephone": "{{phone}}",
      "contactType": "customer service",
      "availableLanguage": ["Korean", "English"]
    }
  ],
  "sameAs": [
    "{{facebook_url}}",
    "{{twitter_url}}",
    "{{linkedin_url}}",
    "{{instagram_url}}"
  ]
}
@@ -0,0 +1,76 @@
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "{{name}}",
  "description": "{{description}}",
  "image": [
    "{{image_url_1}}",
    "{{image_url_2}}",
    "{{image_url_3}}"
  ],
  "sku": "{{sku}}",
  "mpn": "{{mpn}}",
  "gtin13": "{{gtin13}}",
  "brand": {
    "@type": "Brand",
    "name": "{{brand_name}}"
  },
  "offers": {
    "@type": "Offer",
    "url": "{{product_url}}",
    "price": "{{price}}",
    "priceCurrency": "{{currency}}",
    "priceValidUntil": "{{price_valid_until}}",
    "availability": "https://schema.org/{{availability}}",
    "itemCondition": "https://schema.org/{{condition}}",
    "seller": {
      "@type": "Organization",
      "name": "{{seller_name}}"
    },
    "shippingDetails": {
      "@type": "OfferShippingDetails",
      "shippingRate": {
        "@type": "MonetaryAmount",
        "value": "{{shipping_cost}}",
        "currency": "{{currency}}"
      },
      "deliveryTime": {
        "@type": "ShippingDeliveryTime",
        "handlingTime": {
          "@type": "QuantitativeValue",
          "minValue": "{{handling_min_days}}",
          "maxValue": "{{handling_max_days}}",
          "unitCode": "DAY"
        },
        "transitTime": {
          "@type": "QuantitativeValue",
          "minValue": "{{transit_min_days}}",
          "maxValue": "{{transit_max_days}}",
          "unitCode": "DAY"
        }
      }
    }
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "{{rating}}",
    "reviewCount": "{{review_count}}",
    "bestRating": "5",
    "worstRating": "1"
  },
  "review": [
    {
      "@type": "Review",
      "reviewRating": {
        "@type": "Rating",
        "ratingValue": "{{review_rating}}",
        "bestRating": "5"
      },
      "author": {
        "@type": "Person",
        "name": "{{reviewer_name}}"
      },
      "reviewBody": "{{review_text}}"
    }
  ]
}
@@ -0,0 +1,25 @@
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "{{site_name}}",
  "alternateName": "{{alternate_name}}",
  "url": "{{url}}",
  "description": "{{description}}",
  "inLanguage": "{{language}}",
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "{{search_url_template}}"
    },
    "query-input": "required name=search_term_string"
  },
  "publisher": {
    "@type": "Organization",
    "name": "{{publisher_name}}",
    "logo": {
      "@type": "ImageObject",
      "url": "{{logo_url}}"
    }
  }
}
146 custom-skills/14-seo-schema-generator/desktop/SKILL.md Normal file
@@ -0,0 +1,146 @@
---
name: seo-schema-generator
version: 1.0.0
description: Schema markup generator for JSON-LD structured data. Triggers: generate schema, create JSON-LD, add structured data, schema markup.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*
---

# SEO Schema Generator

## Purpose

Generate JSON-LD structured data markup for various content types using templates.

## Core Capabilities

1. **Organization** - Company/brand information
2. **LocalBusiness** - Physical location businesses
3. **Article** - Blog posts and news articles
4. **Product** - E-commerce products
5. **FAQPage** - FAQ sections
6. **BreadcrumbList** - Navigation breadcrumbs
7. **WebSite** - Site-level with search action

## Workflow

1. Identify content type
2. Gather required information
3. Generate JSON-LD from template
4. Validate output
5. Provide implementation instructions
|
||||
|
||||
## Schema Templates
|
||||
|
||||
### Organization
|
||||
```json
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "Organization",
|
||||
"name": "[Company Name]",
|
||||
"url": "[Website URL]",
|
||||
"logo": "[Logo URL]",
|
||||
"sameAs": [
|
||||
"[Social Media URLs]"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### LocalBusiness
|
||||
```json
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "LocalBusiness",
|
||||
"name": "[Business Name]",
|
||||
"address": {
|
||||
"@type": "PostalAddress",
|
||||
"streetAddress": "[Street]",
|
||||
"addressLocality": "[City]",
|
||||
"addressRegion": "[State]",
|
||||
"postalCode": "[ZIP]",
|
||||
"addressCountry": "[Country]"
|
||||
},
|
||||
"telephone": "[Phone]",
|
||||
"openingHours": ["Mo-Fr 09:00-17:00"]
|
||||
}
|
||||
```
|
||||
|
||||
### Article
|
||||
```json
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "Article",
|
||||
"headline": "[Title]",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "[Author Name]"
|
||||
},
|
||||
"datePublished": "[YYYY-MM-DD]",
|
||||
"dateModified": "[YYYY-MM-DD]",
|
||||
"image": "[Image URL]",
|
||||
"publisher": {
|
||||
"@type": "Organization",
|
||||
"name": "[Publisher]",
|
||||
"logo": "[Logo URL]"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### FAQPage
|
||||
```json
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "FAQPage",
|
||||
"mainEntity": [
|
||||
{
|
||||
"@type": "Question",
|
||||
"name": "[Question]",
|
||||
"acceptedAnswer": {
|
||||
"@type": "Answer",
|
||||
"text": "[Answer]"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Product
|
||||
```json
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "Product",
|
||||
"name": "[Product Name]",
|
||||
"image": "[Image URL]",
|
||||
"description": "[Description]",
|
||||
"offers": {
|
||||
"@type": "Offer",
|
||||
"price": "[Price]",
|
||||
"priceCurrency": "[Currency]",
|
||||
"availability": "https://schema.org/InStock"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Implementation
|
||||
|
||||
Place generated JSON-LD in `<head>` section:
|
||||
|
||||
```html
|
||||
<head>
|
||||
<script type="application/ld+json">
|
||||
[Generated Schema Here]
|
||||
</script>
|
||||
</head>
|
||||
```
|
||||
|
||||
## Validation
|
||||
|
||||
After generating:
|
||||
1. Use schema validator skill (13) to verify
|
||||
2. Test with Google Rich Results Test
|
||||
3. Monitor in Search Console
|
||||
|
||||
## Limitations
|
||||
|
||||
- Templates cover common types only
|
||||
- Complex nested schemas may need manual adjustment
|
||||
- Some Rich Results require additional properties
|
||||
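Steps 3-4 of the workflow above (generate from a template, validate the output) can be sketched in Python. This is a minimal illustration, not part of the skill's shipped scripts; `fill_template` is a hypothetical helper that substitutes `{{placeholder}}` tokens and parses the result to confirm it is valid JSON:

```python
import json
import re

def fill_template(template: str, values: dict[str, str]) -> dict:
    """Substitute {{placeholder}} tokens, then parse to verify the JSON is valid."""
    rendered = re.sub(r"\{\{(\w+)\}\}", lambda m: values.get(m.group(1), ""), template)
    return json.loads(rendered)  # raises json.JSONDecodeError if the markup is malformed

faq_template = """{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "{{question_1}}",
      "acceptedAnswer": {"@type": "Answer", "text": "{{answer_1}}"}
    }
  ]
}"""

schema = fill_template(
    faq_template,
    {"question_1": "What is JSON-LD?", "answer_1": "A JSON syntax for linked data."},
)
print(schema["@type"])  # FAQPage
```

Note that values containing quotes or newlines would need JSON escaping (e.g. via `json.dumps`) before substitution.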
@@ -0,0 +1,32 @@
{
  "@context": "https://schema.org",
  "@type": "{{article_type}}",
  "headline": "{{headline}}",
  "description": "{{description}}",
  "image": [
    "{{image_url_1}}",
    "{{image_url_2}}"
  ],
  "datePublished": "{{date_published}}",
  "dateModified": "{{date_modified}}",
  "author": {
    "@type": "Person",
    "name": "{{author_name}}",
    "url": "{{author_url}}"
  },
  "publisher": {
    "@type": "Organization",
    "name": "{{publisher_name}}",
    "logo": {
      "@type": "ImageObject",
      "url": "{{publisher_logo_url}}"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "{{page_url}}"
  },
  "articleSection": "{{section}}",
  "wordCount": "{{word_count}}",
  "keywords": "{{keywords}}"
}
@@ -0,0 +1,24 @@
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "{{level_1_name}}",
      "item": "{{level_1_url}}"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "{{level_2_name}}",
      "item": "{{level_2_url}}"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "{{level_3_name}}",
      "item": "{{level_3_url}}"
    }
  ]
}
@@ -0,0 +1,30 @@
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "{{question_1}}",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "{{answer_1}}"
      }
    },
    {
      "@type": "Question",
      "name": "{{question_2}}",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "{{answer_2}}"
      }
    },
    {
      "@type": "Question",
      "name": "{{question_3}}",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "{{answer_3}}"
      }
    }
  ]
}
@@ -0,0 +1,47 @@
{
  "@context": "https://schema.org",
  "@type": "{{business_type}}",
  "name": "{{name}}",
  "description": "{{description}}",
  "url": "{{url}}",
  "telephone": "{{phone}}",
  "email": "{{email}}",
  "image": "{{image_url}}",
  "priceRange": "{{price_range}}",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "{{street_address}}",
    "addressLocality": "{{city}}",
    "addressRegion": "{{region}}",
    "postalCode": "{{postal_code}}",
    "addressCountry": "{{country}}"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": "{{latitude}}",
    "longitude": "{{longitude}}"
  },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "{{weekday_opens}}",
      "closes": "{{weekday_closes}}"
    },
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Saturday", "Sunday"],
      "opens": "{{weekend_opens}}",
      "closes": "{{weekend_closes}}"
    }
  ],
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "{{rating}}",
    "reviewCount": "{{review_count}}"
  },
  "sameAs": [
    "{{facebook_url}}",
    "{{instagram_url}}"
  ]
}
@@ -0,0 +1,37 @@
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "{{name}}",
  "url": "{{url}}",
  "logo": "{{logo_url}}",
  "description": "{{description}}",
  "foundingDate": "{{founding_date}}",
  "founders": [
    {
      "@type": "Person",
      "name": "{{founder_name}}"
    }
  ],
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "{{street_address}}",
    "addressLocality": "{{city}}",
    "addressRegion": "{{region}}",
    "postalCode": "{{postal_code}}",
    "addressCountry": "{{country}}"
  },
  "contactPoint": [
    {
      "@type": "ContactPoint",
      "telephone": "{{phone}}",
      "contactType": "customer service",
      "availableLanguage": ["Korean", "English"]
    }
  ],
  "sameAs": [
    "{{facebook_url}}",
    "{{twitter_url}}",
    "{{linkedin_url}}",
    "{{instagram_url}}"
  ]
}
@@ -0,0 +1,76 @@
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "{{name}}",
  "description": "{{description}}",
  "image": [
    "{{image_url_1}}",
    "{{image_url_2}}",
    "{{image_url_3}}"
  ],
  "sku": "{{sku}}",
  "mpn": "{{mpn}}",
  "gtin13": "{{gtin13}}",
  "brand": {
    "@type": "Brand",
    "name": "{{brand_name}}"
  },
  "offers": {
    "@type": "Offer",
    "url": "{{product_url}}",
    "price": "{{price}}",
    "priceCurrency": "{{currency}}",
    "priceValidUntil": "{{price_valid_until}}",
    "availability": "https://schema.org/{{availability}}",
    "itemCondition": "https://schema.org/{{condition}}",
    "seller": {
      "@type": "Organization",
      "name": "{{seller_name}}"
    },
    "shippingDetails": {
      "@type": "OfferShippingDetails",
      "shippingRate": {
        "@type": "MonetaryAmount",
        "value": "{{shipping_cost}}",
        "currency": "{{currency}}"
      },
      "deliveryTime": {
        "@type": "ShippingDeliveryTime",
        "handlingTime": {
          "@type": "QuantitativeValue",
          "minValue": "{{handling_min_days}}",
          "maxValue": "{{handling_max_days}}",
          "unitCode": "DAY"
        },
        "transitTime": {
          "@type": "QuantitativeValue",
          "minValue": "{{transit_min_days}}",
          "maxValue": "{{transit_max_days}}",
          "unitCode": "DAY"
        }
      }
    }
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "{{rating}}",
    "reviewCount": "{{review_count}}",
    "bestRating": "5",
    "worstRating": "1"
  },
  "review": [
    {
      "@type": "Review",
      "reviewRating": {
        "@type": "Rating",
        "ratingValue": "{{review_rating}}",
        "bestRating": "5"
      },
      "author": {
        "@type": "Person",
        "name": "{{reviewer_name}}"
      },
      "reviewBody": "{{review_text}}"
    }
  ]
}
@@ -0,0 +1,25 @@
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "{{site_name}}",
  "alternateName": "{{alternate_name}}",
  "url": "{{url}}",
  "description": "{{description}}",
  "inLanguage": "{{language}}",
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "{{search_url_template}}"
    },
    "query-input": "required name=search_term_string"
  },
  "publisher": {
    "@type": "Organization",
    "name": "{{publisher_name}}",
    "logo": {
      "@type": "ImageObject",
      "url": "{{logo_url}}"
    }
  }
}
117
custom-skills/15-seo-core-web-vitals/code/CLAUDE.md
Normal file
@@ -0,0 +1,117 @@
# CLAUDE.md

## Overview

Core Web Vitals analyzer using the Google PageSpeed Insights API: LCP, FID, CLS, INP, TTFB, FCP measurement and recommendations.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Requires API key
export PAGESPEED_API_KEY=your_api_key

python scripts/pagespeed_client.py --url https://example.com
```

## Scripts

| Script | Purpose |
|--------|---------|
| `pagespeed_client.py` | PageSpeed Insights API client |
| `base_client.py` | Shared utilities |

## Usage

```bash
# Mobile analysis (default)
python scripts/pagespeed_client.py --url https://example.com

# Desktop analysis
python scripts/pagespeed_client.py --url https://example.com --strategy desktop

# Both strategies
python scripts/pagespeed_client.py --url https://example.com --strategy both

# JSON output
python scripts/pagespeed_client.py --url https://example.com --json

# Batch analysis
python scripts/pagespeed_client.py --urls urls.txt --output results.json
```

## Core Web Vitals Metrics

| Metric | Good | Needs Improvement | Poor |
|--------|------|-------------------|------|
| LCP (Largest Contentful Paint) | ≤2.5s | 2.5s-4s | >4s |
| FID (First Input Delay) | ≤100ms | 100ms-300ms | >300ms |
| CLS (Cumulative Layout Shift) | ≤0.1 | 0.1-0.25 | >0.25 |
| INP (Interaction to Next Paint) | ≤200ms | 200ms-500ms | >500ms |

## Additional Metrics

| Metric | Description |
|--------|-------------|
| TTFB | Time to First Byte |
| FCP | First Contentful Paint |
| SI | Speed Index |
| TBT | Total Blocking Time |

## Output

```json
{
  "url": "https://example.com",
  "strategy": "mobile",
  "score": 85,
  "core_web_vitals": {
    "lcp": {"value": 2.1, "rating": "good"},
    "fid": {"value": 50, "rating": "good"},
    "cls": {"value": 0.05, "rating": "good"},
    "inp": {"value": 180, "rating": "good"}
  },
  "opportunities": [
    {
      "id": "render-blocking-resources",
      "title": "Eliminate render-blocking resources",
      "savings_ms": 1200
    }
  ],
  "diagnostics": []
}
```

## Configuration

Environment variables:
```bash
PAGESPEED_API_KEY=AIza...  # Required for higher quotas
GOOGLE_API_KEY=AIza...     # Alternative key name
```

## Rate Limits

| Tier | Limit |
|------|-------|
| No API key | 25 queries/day |
| With API key | 25,000 queries/day |

## Common Recommendations

| Issue | Fix |
|-------|-----|
| Large LCP | Optimize images, preload critical resources |
| High CLS | Set image dimensions, avoid injected content |
| Poor INP | Reduce JavaScript, optimize event handlers |
| Slow TTFB | Improve server response, use CDN |

## Dependencies

```
google-api-python-client>=2.100.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
```
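The Good / Needs Improvement / Poor bands in the metrics table above reduce to a two-threshold check. A minimal sketch (mirroring the `_rate_metric` logic in `pagespeed_client.py`; threshold values taken from the table, in ms except the unitless CLS):

```python
# (good_max, poor_min) per metric; values between the two are "needs improvement".
THRESHOLDS = {
    "lcp": (2500, 4000),  # ms
    "fid": (100, 300),    # ms
    "cls": (0.1, 0.25),   # unitless
    "inp": (200, 500),    # ms
}

def rate(metric: str, value: float) -> str:
    """Classify a metric value into GOOD / NEEDS_IMPROVEMENT / POOR."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "GOOD"
    if value <= poor:
        return "NEEDS_IMPROVEMENT"
    return "POOR"

print(rate("lcp", 2100))  # GOOD
print(rate("cls", 0.3))   # POOR
```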
207
custom-skills/15-seo-core-web-vitals/code/scripts/base_client.py
Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
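The token-bucket behaviour of `RateLimiter` can be exercised in isolation. The class below is copied from `base_client.py` so the demo is self-contained; note the bucket starts full, so an initial burst of `rate` acquisitions passes immediately before throttling kicks in:

```python
import asyncio
import time
from datetime import datetime

class RateLimiter:
    """Token-bucket rate limiter (copied from base_client.py for a standalone demo)."""

    def __init__(self, rate: float, per: float = 1.0):
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            # Refill proportionally to elapsed time, capped at the bucket size.
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now
            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1

async def main() -> float:
    # 2 requests/second: the first two acquisitions are immediate (full bucket),
    # subsequent ones wait for tokens to refill.
    limiter = RateLimiter(rate=2, per=1.0)
    start = time.monotonic()
    for _ in range(6):
        await limiter.acquire()
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"6 acquisitions at 2/s took {elapsed:.2f}s")
```

The initial burst is a deliberate property of token buckets: short spikes are absorbed while the sustained rate stays bounded.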
@@ -0,0 +1,452 @@
"""
PageSpeed Insights Client
=========================
Purpose: Get Core Web Vitals and performance data from PageSpeed Insights API
Python: 3.10+
Usage:
    from pagespeed_client import PageSpeedClient
    client = PageSpeedClient()
    result = client.analyze("https://example.com")
"""

import argparse
import json
import logging
from dataclasses import dataclass, field
from typing import Any

import requests

from base_client import config

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


@dataclass
class CoreWebVitals:
    """Core Web Vitals metrics."""

    lcp: float | None = None  # Largest Contentful Paint (ms)
    fid: float | None = None  # First Input Delay (ms)
    cls: float | None = None  # Cumulative Layout Shift
    inp: float | None = None  # Interaction to Next Paint (ms)
    ttfb: float | None = None  # Time to First Byte (ms)
    fcp: float | None = None  # First Contentful Paint (ms)

    # Assessment (GOOD, NEEDS_IMPROVEMENT, POOR)
    lcp_rating: str | None = None
    fid_rating: str | None = None
    cls_rating: str | None = None
    inp_rating: str | None = None

    def to_dict(self) -> dict:
        return {
            "lcp": {"value": self.lcp, "rating": self.lcp_rating},
            "fid": {"value": self.fid, "rating": self.fid_rating},
            "cls": {"value": self.cls, "rating": self.cls_rating},
            "inp": {"value": self.inp, "rating": self.inp_rating},
            "ttfb": {"value": self.ttfb},
            "fcp": {"value": self.fcp},
        }


@dataclass
class PageSpeedResult:
    """PageSpeed analysis result."""

    url: str
    strategy: str  # mobile or desktop
    performance_score: float | None = None
    seo_score: float | None = None
    accessibility_score: float | None = None
    best_practices_score: float | None = None
    core_web_vitals: CoreWebVitals = field(default_factory=CoreWebVitals)
    opportunities: list[dict] = field(default_factory=list)
    diagnostics: list[dict] = field(default_factory=list)
    passed_audits: list[str] = field(default_factory=list)
    raw_data: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {
            "url": self.url,
            "strategy": self.strategy,
            "scores": {
                "performance": self.performance_score,
                "seo": self.seo_score,
                "accessibility": self.accessibility_score,
                "best_practices": self.best_practices_score,
            },
            "core_web_vitals": self.core_web_vitals.to_dict(),
            "opportunities_count": len(self.opportunities),
            "opportunities": self.opportunities[:10],
            "diagnostics_count": len(self.diagnostics),
            "passed_audits_count": len(self.passed_audits),
        }


class PageSpeedClient:
    """Client for PageSpeed Insights API."""

    BASE_URL = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

    # Core Web Vitals thresholds
    THRESHOLDS = {
        "lcp": {"good": 2500, "poor": 4000},
        "fid": {"good": 100, "poor": 300},
        "cls": {"good": 0.1, "poor": 0.25},
        "inp": {"good": 200, "poor": 500},
        "ttfb": {"good": 800, "poor": 1800},
        "fcp": {"good": 1800, "poor": 3000},
    }

    def __init__(self, api_key: str | None = None):
        """
        Initialize PageSpeed client.

        Args:
            api_key: PageSpeed API key (optional but recommended for higher quotas)
        """
        self.api_key = api_key or config.pagespeed_api_key
        self.session = requests.Session()

    def _rate_metric(self, metric: str, value: float | None) -> str | None:
        """Rate a metric against thresholds."""
        if value is None:
            return None

        thresholds = self.THRESHOLDS.get(metric)
        if not thresholds:
            return None

        if value <= thresholds["good"]:
            return "GOOD"
        elif value <= thresholds["poor"]:
            return "NEEDS_IMPROVEMENT"
        else:
            return "POOR"

    def analyze(
        self,
        url: str,
        strategy: str = "mobile",
        categories: list[str] | None = None,
    ) -> PageSpeedResult:
        """
        Analyze a URL with PageSpeed Insights.

        Args:
            url: URL to analyze
            strategy: "mobile" or "desktop"
            categories: Categories to analyze (performance, seo, accessibility, best-practices)

        Returns:
            PageSpeedResult with scores and metrics
        """
        if categories is None:
            categories = ["performance", "seo", "accessibility", "best-practices"]

        params = {
            "url": url,
            "strategy": strategy,
            "category": categories,
        }

        if self.api_key:
            params["key"] = self.api_key

        try:
            response = self.session.get(self.BASE_URL, params=params, timeout=60)
            response.raise_for_status()
            data = response.json()
        except requests.RequestException as e:
            logger.error(f"PageSpeed API request failed: {e}")
            raise

        result = PageSpeedResult(url=url, strategy=strategy, raw_data=data)

        # Extract scores ("is not None" so a legitimate score of 0 is kept)
        lighthouse = data.get("lighthouseResult", {})
        categories_data = lighthouse.get("categories", {})

        if "performance" in categories_data:
            score = categories_data["performance"].get("score")
            result.performance_score = score * 100 if score is not None else None

        if "seo" in categories_data:
            score = categories_data["seo"].get("score")
            result.seo_score = score * 100 if score is not None else None

        if "accessibility" in categories_data:
            score = categories_data["accessibility"].get("score")
            result.accessibility_score = score * 100 if score is not None else None

        if "best-practices" in categories_data:
            score = categories_data["best-practices"].get("score")
            result.best_practices_score = score * 100 if score is not None else None

        # Extract Core Web Vitals
        audits = lighthouse.get("audits", {})

        # Lab data
        cwv = result.core_web_vitals

        if "largest-contentful-paint" in audits:
            cwv.lcp = audits["largest-contentful-paint"].get("numericValue")
            cwv.lcp_rating = self._rate_metric("lcp", cwv.lcp)

        if "total-blocking-time" in audits:
            # TBT is a proxy for FID in lab data
            cwv.fid = audits["total-blocking-time"].get("numericValue")
            cwv.fid_rating = self._rate_metric("fid", cwv.fid)

        if "cumulative-layout-shift" in audits:
            cwv.cls = audits["cumulative-layout-shift"].get("numericValue")
            cwv.cls_rating = self._rate_metric("cls", cwv.cls)

        if "experimental-interaction-to-next-paint" in audits:
            cwv.inp = audits["experimental-interaction-to-next-paint"].get("numericValue")
            cwv.inp_rating = self._rate_metric("inp", cwv.inp)

        if "server-response-time" in audits:
            cwv.ttfb = audits["server-response-time"].get("numericValue")

        if "first-contentful-paint" in audits:
            cwv.fcp = audits["first-contentful-paint"].get("numericValue")

        # Field data (real user data) if available; overrides lab data
        loading_exp = data.get("loadingExperience", {})
        metrics = loading_exp.get("metrics", {})

        if "LARGEST_CONTENTFUL_PAINT_MS" in metrics:
            cwv.lcp = metrics["LARGEST_CONTENTFUL_PAINT_MS"].get("percentile")
            cwv.lcp_rating = metrics["LARGEST_CONTENTFUL_PAINT_MS"].get("category")

        if "FIRST_INPUT_DELAY_MS" in metrics:
            cwv.fid = metrics["FIRST_INPUT_DELAY_MS"].get("percentile")
            cwv.fid_rating = metrics["FIRST_INPUT_DELAY_MS"].get("category")

        if "CUMULATIVE_LAYOUT_SHIFT_SCORE" in metrics:
            cwv.cls = metrics["CUMULATIVE_LAYOUT_SHIFT_SCORE"].get("percentile") / 100
            cwv.cls_rating = metrics["CUMULATIVE_LAYOUT_SHIFT_SCORE"].get("category")

        if "INTERACTION_TO_NEXT_PAINT" in metrics:
            cwv.inp = metrics["INTERACTION_TO_NEXT_PAINT"].get("percentile")
            cwv.inp_rating = metrics["INTERACTION_TO_NEXT_PAINT"].get("category")

        # Extract opportunities
        for audit_id, audit in audits.items():
            if audit.get("details", {}).get("type") == "opportunity":
                savings = audit.get("details", {}).get("overallSavingsMs", 0)
                if savings > 0:
                    result.opportunities.append({
                        "id": audit_id,
                        "title": audit.get("title", ""),
                        "description": audit.get("description", ""),
                        "savings_ms": savings,
                        "score": audit.get("score", 0),
                    })

        # Sort opportunities by savings
        result.opportunities.sort(key=lambda x: x["savings_ms"], reverse=True)

        # Extract diagnostics
        for audit_id, audit in audits.items():
            score = audit.get("score")
            if score is not None and score < 1 and audit.get("details"):
                if audit.get("details", {}).get("type") not in ["opportunity", None]:
                    result.diagnostics.append({
                        "id": audit_id,
                        "title": audit.get("title", ""),
                        "description": audit.get("description", ""),
                        "score": score,
                    })

        # Extract passed audits
        for audit_id, audit in audits.items():
            if audit.get("score") == 1:
                result.passed_audits.append(audit.get("title", audit_id))

        return result

    def analyze_both_strategies(self, url: str) -> dict:
        """Analyze URL for both mobile and desktop."""
        mobile = self.analyze(url, strategy="mobile")
        desktop = self.analyze(url, strategy="desktop")

        return {
            "url": url,
            "mobile": mobile.to_dict(),
            "desktop": desktop.to_dict(),
            "comparison": {
                "performance_difference": (
                    (desktop.performance_score or 0) - (mobile.performance_score or 0)
                ),
                "mobile_first_issues": self._identify_mobile_issues(mobile, desktop),
            },
        }

    def _identify_mobile_issues(
        self,
        mobile: PageSpeedResult,
        desktop: PageSpeedResult,
    ) -> list[str]:
        """Identify issues that affect mobile more than desktop."""
        issues = []

        if mobile.performance_score and desktop.performance_score:
            if desktop.performance_score - mobile.performance_score > 20:
                issues.append("Significant performance gap between mobile and desktop")

        m_cwv = mobile.core_web_vitals
        d_cwv = desktop.core_web_vitals

        if m_cwv.lcp and d_cwv.lcp and m_cwv.lcp > d_cwv.lcp * 1.5:
            issues.append("LCP significantly slower on mobile")

        if m_cwv.cls and d_cwv.cls and m_cwv.cls > d_cwv.cls * 2:
            issues.append("Layout shift issues more severe on mobile")

        return issues

    def get_cwv_summary(self, url: str) -> dict:
        """Get a summary focused on Core Web Vitals."""
        result = self.analyze(url, strategy="mobile")

        cwv = result.core_web_vitals

        return {
            "url": url,
            "overall_cwv_status": self._overall_cwv_status(cwv),
            "metrics": {
                "lcp": {
                    "value": f"{cwv.lcp / 1000:.2f}s" if cwv.lcp else None,
                    "rating": cwv.lcp_rating,
                    "threshold": "≤ 2.5s good, > 4.0s poor",
                },
                "fid": {
                    "value": f"{cwv.fid:.0f}ms" if cwv.fid else None,
                    "rating": cwv.fid_rating,
                    "threshold": "≤ 100ms good, > 300ms poor",
                },
                "cls": {
                    "value": f"{cwv.cls:.3f}" if cwv.cls else None,
                    "rating": cwv.cls_rating,
                    "threshold": "≤ 0.1 good, > 0.25 poor",
                },
                "inp": {
                    "value": f"{cwv.inp:.0f}ms" if cwv.inp else None,
                    "rating": cwv.inp_rating,
                    "threshold": "≤ 200ms good, > 500ms poor",
                },
            },
            "top_opportunities": result.opportunities[:5],
        }

    def _overall_cwv_status(self, cwv: CoreWebVitals) -> str:
        """Determine overall Core Web Vitals status."""
        ratings = [cwv.lcp_rating, cwv.fid_rating, cwv.cls_rating]
        ratings = [r for r in ratings if r]

        if not ratings:
            return "UNKNOWN"

        if any(r == "POOR" for r in ratings):
            return "POOR"
        if any(r == "NEEDS_IMPROVEMENT" for r in ratings):
            return "NEEDS_IMPROVEMENT"
        return "GOOD"

    def generate_report(self, result: PageSpeedResult) -> str:
        """Generate human-readable performance report."""
        lines = [
            "=" * 60,
            "PageSpeed Insights Report",
            "=" * 60,
            f"URL: {result.url}",
            f"Strategy: {result.strategy}",
            "",
            "Scores:",
            f"  Performance: {result.performance_score:.0f}/100" if result.performance_score is not None else "  Performance: N/A",
            f"  SEO: {result.seo_score:.0f}/100" if result.seo_score is not None else "  SEO: N/A",
            f"  Accessibility: {result.accessibility_score:.0f}/100" if result.accessibility_score is not None else "  Accessibility: N/A",
            f"  Best Practices: {result.best_practices_score:.0f}/100" if result.best_practices_score is not None else "  Best Practices: N/A",
            "",
            "Core Web Vitals:",
        ]

        cwv = result.core_web_vitals

        def format_metric(name: str, value: Any, rating: str | None, unit: str) -> str:
|
||||
if value is None:
|
||||
return f" {name}: N/A"
|
||||
rating_str = f" ({rating})" if rating else ""
|
||||
return f" {name}: {value}{unit}{rating_str}"
|
||||
|
||||
lines.append(format_metric("LCP", f"{cwv.lcp / 1000:.2f}" if cwv.lcp else None, cwv.lcp_rating, "s"))
|
||||
lines.append(format_metric("FID/TBT", f"{cwv.fid:.0f}" if cwv.fid else None, cwv.fid_rating, "ms"))
|
||||
lines.append(format_metric("CLS", f"{cwv.cls:.3f}" if cwv.cls else None, cwv.cls_rating, ""))
|
||||
lines.append(format_metric("INP", f"{cwv.inp:.0f}" if cwv.inp else None, cwv.inp_rating, "ms"))
|
||||
lines.append(format_metric("TTFB", f"{cwv.ttfb:.0f}" if cwv.ttfb else None, None, "ms"))
|
||||
lines.append(format_metric("FCP", f"{cwv.fcp / 1000:.2f}" if cwv.fcp else None, None, "s"))
|
||||
|
||||
if result.opportunities:
|
||||
lines.extend([
|
||||
"",
|
||||
f"Top Opportunities ({len(result.opportunities)} total):",
|
||||
])
|
||||
for opp in result.opportunities[:5]:
|
||||
savings = opp["savings_ms"]
|
||||
lines.append(f" - {opp['title']}: -{savings / 1000:.1f}s potential savings")
|
||||
|
||||
lines.extend(["", "=" * 60])
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def main():
|
||||
"""CLI entry point."""
|
||||
parser = argparse.ArgumentParser(description="PageSpeed Insights Client")
|
||||
parser.add_argument("--url", "-u", required=True, help="URL to analyze")
|
||||
parser.add_argument("--strategy", "-s", default="mobile",
|
||||
choices=["mobile", "desktop", "both"],
|
||||
help="Analysis strategy")
|
||||
parser.add_argument("--output", "-o", help="Output file for JSON")
|
||||
parser.add_argument("--json", action="store_true", help="Output as JSON")
|
||||
parser.add_argument("--cwv-only", action="store_true",
|
||||
help="Show only Core Web Vitals summary")
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
client = PageSpeedClient()
|
||||
|
||||
if args.cwv_only:
|
||||
summary = client.get_cwv_summary(args.url)
|
||||
print(json.dumps(summary, indent=2))
|
||||
elif args.strategy == "both":
|
||||
result = client.analyze_both_strategies(args.url)
|
||||
output = json.dumps(result, indent=2)
|
||||
if args.output:
|
||||
with open(args.output, "w") as f:
|
||||
f.write(output)
|
||||
else:
|
||||
print(output)
|
||||
else:
|
||||
result = client.analyze(args.url, strategy=args.strategy)
|
||||
|
||||
if args.json or args.output:
|
||||
output = json.dumps(result.to_dict(), indent=2)
|
||||
if args.output:
|
||||
with open(args.output, "w") as f:
|
||||
f.write(output)
|
||||
else:
|
||||
print(output)
|
||||
else:
|
||||
print(client.generate_report(result))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -0,0 +1,6 @@
# 15-seo-core-web-vitals dependencies
google-api-python-client>=2.100.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
typer>=0.9.0
108
custom-skills/15-seo-core-web-vitals/desktop/SKILL.md
Normal file
@@ -0,0 +1,108 @@
---
name: seo-core-web-vitals
version: 1.0.0
description: Core Web Vitals analyzer for LCP, FID, CLS, INP performance metrics. Triggers: Core Web Vitals, page speed, LCP, CLS, FID, INP, performance.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*
---

# SEO Core Web Vitals

## Purpose

Analyze Core Web Vitals performance metrics and provide optimization recommendations.

## Core Capabilities

1. **LCP** - Largest Contentful Paint measurement
2. **FID/INP** - Interactivity metrics
3. **CLS** - Cumulative Layout Shift
4. **Recommendations** - Optimization guidance

## Metrics Thresholds

| Metric | Good | Needs Work | Poor |
|--------|------|------------|------|
| LCP | ≤2.5s | 2.5-4s | >4s |
| FID | ≤100ms | 100-300ms | >300ms |
| CLS | ≤0.1 | 0.1-0.25 | >0.25 |
| INP | ≤200ms | 200-500ms | >500ms |
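The table above maps directly onto a small classifier. A minimal sketch, assuming LCP/FID/INP values in milliseconds and a hypothetical `rate_metric` helper (not part of this skill's scripts):

```python
# Cut-offs from the thresholds table above; LCP/FID/INP in ms, CLS unitless.
# `rate_metric` is an illustrative helper, not part of the skill's scripts.
THRESHOLDS = {
    "lcp": (2500, 4000),
    "fid": (100, 300),
    "cls": (0.1, 0.25),
    "inp": (200, 500),
}

def rate_metric(metric: str, value: float) -> str:
    """Classify a metric value as GOOD, NEEDS_IMPROVEMENT, or POOR."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "GOOD"
    if value > poor:
        return "POOR"
    return "NEEDS_IMPROVEMENT"
```

For example, a 2.0 s LCP rates GOOD, while a 0.3 CLS rates POOR.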

## Data Sources

### Option 1: PageSpeed Insights (Recommended)

Use external tool and input results:

- Visit: https://pagespeed.web.dev/
- Enter URL, run test
- Provide scores to skill

### Option 2: Research Best Practices

```
mcp__perplexity__search: "Core Web Vitals optimization [specific issue]"
```

## Workflow

1. Request PageSpeed Insights data from user
2. Analyze provided metrics
3. Identify failing metrics
4. Research optimization strategies
5. Provide prioritized recommendations

## Common LCP Issues

| Cause | Fix |
|-------|-----|
| Slow server response | Improve TTFB, use CDN |
| Render-blocking resources | Defer non-critical CSS/JS |
| Slow resource load | Preload LCP image |
| Client-side rendering | Use SSR/SSG |

## Common CLS Issues

| Cause | Fix |
|-------|-----|
| Images without dimensions | Add width/height attributes |
| Ads/embeds without space | Reserve space with CSS |
| Web fonts causing FOIT/FOUT | Use font-display: swap |
| Dynamic content injection | Reserve space, use transforms |

## Common INP Issues

| Cause | Fix |
|-------|-----|
| Long JavaScript tasks | Break up tasks, use web workers |
| Large DOM size | Reduce DOM nodes |
| Heavy event handlers | Debounce, optimize listeners |
| Third-party scripts | Defer, lazy load |

## Output Format

```markdown
## Core Web Vitals: [URL]

### Scores
| Metric | Mobile | Desktop | Status |
|--------|--------|---------|--------|
| LCP | Xs | Xs | Good/Poor |
| FID | Xms | Xms | Good/Poor |
| CLS | X.XX | X.XX | Good/Poor |
| INP | Xms | Xms | Good/Poor |

### Overall Score
- Mobile: X/100
- Desktop: X/100

### Priority Fixes
1. [Highest impact recommendation]
2. [Second priority]

### Detailed Recommendations
[Per-metric optimization steps]
```

## Limitations

- Requires external PageSpeed Insights data
- Lab data may differ from field data
- Some fixes require developer implementation
- Third-party scripts may be difficult to optimize
122
custom-skills/16-seo-search-console/code/CLAUDE.md
Normal file
@@ -0,0 +1,122 @@
# CLAUDE.md

## Overview

Google Search Console data retriever: search analytics (rankings, CTR, impressions), sitemap status, and index coverage.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Requires service account credentials
# ~/.credential/ourdigital-seo-agent.json

python scripts/gsc_client.py --site sc-domain:example.com --action summary
```

## Scripts

| Script | Purpose |
|--------|---------|
| `gsc_client.py` | Search Console API client |
| `base_client.py` | Shared utilities |

## Configuration

Service account setup:
```bash
# Credentials file location
~/.credential/ourdigital-seo-agent.json

# Add service account email to GSC property as user
ourdigital-seo-agent@ourdigital-insights.iam.gserviceaccount.com
```

## Usage

```bash
# Performance summary (last 30 days by default)
python scripts/gsc_client.py --site sc-domain:example.com --action summary

# Query-level data
python scripts/gsc_client.py --site sc-domain:example.com --action queries --limit 100

# Page-level data
python scripts/gsc_client.py --site sc-domain:example.com --action pages

# Custom date range
python scripts/gsc_client.py --site sc-domain:example.com --action queries \
  --start 2024-01-01 --end 2024-01-31

# Sitemap status
python scripts/gsc_client.py --site sc-domain:example.com --action sitemaps

# JSON output
python scripts/gsc_client.py --site sc-domain:example.com --action summary --json
```

## Actions

| Action | Description |
|--------|-------------|
| `summary` | Overview metrics (clicks, impressions, CTR, position) |
| `queries` | Top search queries |
| `pages` | Top pages by clicks |
| `sitemaps` | Sitemap submission status |
| `coverage` | Index coverage issues |

## Output: Summary

```json
{
  "site": "sc-domain:example.com",
  "date_range": "2024-01-01 to 2024-01-28",
  "totals": {
    "clicks": 15000,
    "impressions": 500000,
    "ctr": 3.0,
    "position": 12.5
  }
}
```
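As a sanity check on the summary payload, `ctr` is clicks over impressions expressed as a percentage (15000 / 500000 gives the 3.0 above). A minimal sketch using the sample values:

```python
# Recompute CTR from the sample totals shown in the JSON above.
totals = {"clicks": 15000, "impressions": 500000}

ctr_percent = round(100 * totals["clicks"] / totals["impressions"], 1)
print(ctr_percent)  # 3.0
```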

## Output: Queries

```json
{
  "queries": [
    {
      "query": "keyword",
      "clicks": 500,
      "impressions": 10000,
      "ctr": 5.0,
      "position": 3.2
    }
  ]
}
```

## Rate Limits

| Limit | Value |
|-------|-------|
| Queries per minute | 1,200 |
| Rows per request | 25,000 |
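`base_client.py` enforces a cap like the 1,200 queries/minute limit above client-side with an async token bucket. A minimal synchronous sketch of the same idea; the class name and the explicitly injected clock are illustrative, not the project's API:

```python
class TokenBucket:
    """Allow `rate` requests per `per` seconds, refilling continuously."""

    def __init__(self, rate: float, per: float = 60.0):
        self.rate, self.per = rate, per
        self.tokens = rate   # start with a full bucket
        self.last = 0.0      # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at the bucket size.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate / self.per)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(rate=1200, per=60.0)`, a denied call can simply sleep and retry, so the per-minute quota is never exceeded.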

## Site Property Formats

| Format | Example |
|--------|---------|
| Domain property | `sc-domain:example.com` |
| URL prefix | `https://www.example.com/` |

## Dependencies

```
google-api-python-client>=2.100.0
google-auth>=2.23.0
python-dotenv>=1.0.0
rich>=13.7.0
pandas>=2.1.0
```
207
custom-skills/16-seo-search-console/code/scripts/base_client.py
Normal file
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
409
custom-skills/16-seo-search-console/code/scripts/gsc_client.py
Normal file
@@ -0,0 +1,409 @@
"""
Google Search Console Client
============================
Purpose: Interact with Google Search Console API for SEO data
Python: 3.10+
Usage:
    from gsc_client import SearchConsoleClient
    client = SearchConsoleClient()
    data = client.get_search_analytics("sc-domain:example.com")
"""

import logging
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any

from google.oauth2 import service_account
from googleapiclient.discovery import build

from base_client import config

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


@dataclass
class SearchAnalyticsResult:
    """Search analytics query result."""

    rows: list[dict] = field(default_factory=list)
    total_clicks: int = 0
    total_impressions: int = 0
    average_ctr: float = 0.0
    average_position: float = 0.0


@dataclass
class SitemapInfo:
    """Sitemap information from Search Console."""

    path: str
    last_submitted: str | None = None
    last_downloaded: str | None = None
    is_pending: bool = False
    is_sitemaps_index: bool = False
    warnings: int = 0
    errors: int = 0


class SearchConsoleClient:
    """Client for Google Search Console API."""

    SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]

    def __init__(self, credentials_path: str | None = None):
        """
        Initialize Search Console client.

        Args:
            credentials_path: Path to service account JSON key
        """
        self.credentials_path = credentials_path or config.google_credentials_path
        self._service = None

    @property
    def service(self):
        """Get or create Search Console service."""
        if self._service is None:
            if not self.credentials_path:
                raise ValueError(
                    "Google credentials not configured. "
                    "Set GOOGLE_APPLICATION_CREDENTIALS environment variable."
                )

            credentials = service_account.Credentials.from_service_account_file(
                self.credentials_path,
                scopes=self.SCOPES,
            )
            self._service = build("searchconsole", "v1", credentials=credentials)

        return self._service

    def list_sites(self) -> list[dict]:
        """List all sites accessible to the service account."""
        response = self.service.sites().list().execute()
        return response.get("siteEntry", [])

    def get_search_analytics(
        self,
        site_url: str,
        start_date: str | None = None,
        end_date: str | None = None,
        dimensions: list[str] | None = None,
        row_limit: int = 25000,
        filters: list[dict] | None = None,
    ) -> SearchAnalyticsResult:
        """
        Get search analytics data.

        Args:
            site_url: Site URL (e.g., "sc-domain:example.com" or "https://example.com/")
            start_date: Start date (YYYY-MM-DD), defaults to 30 days ago
            end_date: End date (YYYY-MM-DD), defaults to yesterday
            dimensions: List of dimensions (query, page, country, device, date)
            row_limit: Maximum rows to return
            filters: Dimension filters

        Returns:
            SearchAnalyticsResult with rows and summary stats
        """
        # Default date range: last 30 days
        if not end_date:
            end_date = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
        if not start_date:
            start_date = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")

        # Default dimensions
        if dimensions is None:
            dimensions = ["query", "page"]

        request_body = {
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": dimensions,
            "rowLimit": row_limit,
        }

        if filters:
            request_body["dimensionFilterGroups"] = [{"filters": filters}]

        try:
            response = self.service.searchanalytics().query(
                siteUrl=site_url,
                body=request_body,
            ).execute()
        except Exception as e:
            logger.error(f"Failed to query search analytics: {e}")
            raise

        rows = response.get("rows", [])

        # Calculate totals; note CTR and position are unweighted means across rows
        total_clicks = sum(row.get("clicks", 0) for row in rows)
        total_impressions = sum(row.get("impressions", 0) for row in rows)
        total_ctr = sum(row.get("ctr", 0) for row in rows)
        total_position = sum(row.get("position", 0) for row in rows)

        avg_ctr = total_ctr / len(rows) if rows else 0
        avg_position = total_position / len(rows) if rows else 0

        return SearchAnalyticsResult(
            rows=rows,
            total_clicks=total_clicks,
            total_impressions=total_impressions,
            average_ctr=avg_ctr,
            average_position=avg_position,
        )

    def get_top_queries(
        self,
        site_url: str,
        limit: int = 100,
        start_date: str | None = None,
        end_date: str | None = None,
    ) -> list[dict]:
        """Get top search queries by clicks."""
        result = self.get_search_analytics(
            site_url=site_url,
            dimensions=["query"],
            row_limit=limit,
            start_date=start_date,
            end_date=end_date,
        )

        # Sort by clicks
        sorted_rows = sorted(
            result.rows,
            key=lambda x: x.get("clicks", 0),
            reverse=True,
        )

        return [
            {
                "query": row["keys"][0],
                "clicks": row.get("clicks", 0),
                "impressions": row.get("impressions", 0),
                "ctr": row.get("ctr", 0),
                "position": row.get("position", 0),
            }
            for row in sorted_rows[:limit]
        ]

    def get_top_pages(
        self,
        site_url: str,
        limit: int = 100,
        start_date: str | None = None,
        end_date: str | None = None,
    ) -> list[dict]:
        """Get top pages by clicks."""
        result = self.get_search_analytics(
            site_url=site_url,
            dimensions=["page"],
            row_limit=limit,
            start_date=start_date,
            end_date=end_date,
        )

        sorted_rows = sorted(
            result.rows,
            key=lambda x: x.get("clicks", 0),
            reverse=True,
        )

        return [
            {
                "page": row["keys"][0],
                "clicks": row.get("clicks", 0),
                "impressions": row.get("impressions", 0),
                "ctr": row.get("ctr", 0),
                "position": row.get("position", 0),
            }
            for row in sorted_rows[:limit]
        ]

    def get_sitemaps(self, site_url: str) -> list[SitemapInfo]:
        """Get list of sitemaps for a site."""
        try:
            response = self.service.sitemaps().list(siteUrl=site_url).execute()
        except Exception as e:
            logger.error(f"Failed to get sitemaps: {e}")
            raise

        sitemaps = []
        for sm in response.get("sitemap", []):
            sitemaps.append(SitemapInfo(
                path=sm.get("path", ""),
                last_submitted=sm.get("lastSubmitted"),
                last_downloaded=sm.get("lastDownloaded"),
                is_pending=sm.get("isPending", False),
                is_sitemaps_index=sm.get("isSitemapsIndex", False),
                warnings=sm.get("warnings", 0),
                errors=sm.get("errors", 0),
            ))

        return sitemaps

    def submit_sitemap(self, site_url: str, sitemap_url: str) -> bool:
        """Submit a sitemap for indexing."""
        try:
            self.service.sitemaps().submit(
                siteUrl=site_url,
                feedpath=sitemap_url,
            ).execute()
            logger.info(f"Submitted sitemap: {sitemap_url}")
            return True
        except Exception as e:
            logger.error(f"Failed to submit sitemap: {e}")
            return False

    def inspect_url(self, site_url: str, inspection_url: str) -> dict:
        """
        Inspect a URL's indexing status.

        Note: This uses the URL Inspection API which may have different quotas.
        """
        try:
            response = self.service.urlInspection().index().inspect(
                body={
                    "inspectionUrl": inspection_url,
                    "siteUrl": site_url,
                }
            ).execute()

            result = response.get("inspectionResult", {})

            return {
                "url": inspection_url,
                "indexing_state": result.get("indexStatusResult", {}).get(
                    "coverageState", "Unknown"
                ),
                "last_crawl_time": result.get("indexStatusResult", {}).get(
                    "lastCrawlTime"
                ),
                "crawled_as": result.get("indexStatusResult", {}).get("crawledAs"),
                "robots_txt_state": result.get("indexStatusResult", {}).get(
                    "robotsTxtState"
                ),
                "mobile_usability": result.get("mobileUsabilityResult", {}).get(
                    "verdict", "Unknown"
                ),
                "rich_results": result.get("richResultsResult", {}).get(
                    "verdict", "Unknown"
                ),
            }
        except Exception as e:
            logger.error(f"Failed to inspect URL: {e}")
            raise

    def get_performance_summary(
        self,
        site_url: str,
        days: int = 30,
    ) -> dict:
        """Get a summary of search performance."""
        end_date = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
        start_date = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")

        # Get overall stats
        overall = self.get_search_analytics(
            site_url=site_url,
            dimensions=[],
            start_date=start_date,
            end_date=end_date,
        )

        # Get top queries
        top_queries = self.get_top_queries(
            site_url=site_url,
            limit=10,
            start_date=start_date,
            end_date=end_date,
        )

        # Get top pages
        top_pages = self.get_top_pages(
            site_url=site_url,
            limit=10,
            start_date=start_date,
            end_date=end_date,
        )

        # Get by device
        by_device = self.get_search_analytics(
            site_url=site_url,
            dimensions=["device"],
            start_date=start_date,
            end_date=end_date,
        )

        device_breakdown = {}
        for row in by_device.rows:
            device = row["keys"][0]
            device_breakdown[device] = {
                "clicks": row.get("clicks", 0),
                "impressions": row.get("impressions", 0),
                "ctr": row.get("ctr", 0),
                "position": row.get("position", 0),
            }

        return {
            "period": f"{start_date} to {end_date}",
            "total_clicks": overall.total_clicks,
            "total_impressions": overall.total_impressions,
            "average_ctr": overall.average_ctr,
            "average_position": overall.average_position,
            "top_queries": top_queries,
            "top_pages": top_pages,
            "by_device": device_breakdown,
        }


def main():
    """Test the Search Console client."""
    import argparse
    import json

    parser = argparse.ArgumentParser(description="Google Search Console Client")
    parser.add_argument("--site", "-s", required=True, help="Site URL")
    parser.add_argument("--action", "-a", default="summary",
                        choices=["summary", "queries", "pages", "sitemaps", "inspect"],
                        help="Action to perform")
    parser.add_argument("--url", help="URL to inspect")
    parser.add_argument("--days", type=int, default=30, help="Days of data")
    parser.add_argument("--limit", type=int, default=100,
                        help="Maximum queries/pages to fetch")

    args = parser.parse_args()

    client = SearchConsoleClient()

    if args.action == "summary":
        summary = client.get_performance_summary(args.site, args.days)
        print(json.dumps(summary, indent=2, default=str))

    elif args.action == "queries":
        queries = client.get_top_queries(args.site, limit=args.limit)
        for q in queries[:20]:
            print(f"{q['query']}: {q['clicks']} clicks, pos {q['position']:.1f}")

    elif args.action == "pages":
        pages = client.get_top_pages(args.site, limit=args.limit)
        for p in pages[:20]:
            print(f"{p['page']}: {p['clicks']} clicks, pos {p['position']:.1f}")

    elif args.action == "sitemaps":
        sitemaps = client.get_sitemaps(args.site)
        for sm in sitemaps:
            print(f"{sm.path}: errors={sm.errors}, warnings={sm.warnings}")

    elif args.action == "inspect" and args.url:
        result = client.inspect_url(args.site, args.url)
        print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
@@ -0,0 +1,7 @@
# 16-seo-search-console dependencies
google-api-python-client>=2.100.0
google-auth>=2.23.0
pandas>=2.1.0
python-dotenv>=1.0.0
rich>=13.7.0
typer>=0.9.0
117
custom-skills/16-seo-search-console/desktop/SKILL.md
Normal file
@@ -0,0 +1,117 @@
---
name: seo-search-console
version: 1.0.0
description: Google Search Console data analyzer for rankings, CTR, impressions, and index coverage. Triggers: Search Console, GSC, rankings, search performance, impressions, CTR.
allowed-tools: mcp__perplexity__*, mcp__notion__*
---

# SEO Search Console

## Purpose

Analyze Google Search Console data: search performance (queries, pages, CTR, position), sitemap status, and index coverage.

## Core Capabilities

1. **Performance Analysis** - Clicks, impressions, CTR, position
2. **Query Analysis** - Top search queries
3. **Page Performance** - Best- and worst-performing pages
4. **Index Coverage** - Crawl and index issues
5. **Sitemap Status** - Submission and processing

## Data Collection

### Option 1: User Provides Data
Request a GSC export from the user:
1. Go to Search Console > Performance
2. Export the data (CSV or Google Sheets)
3. Share it with the assistant

### Option 2: User Describes Data
The user verbally provides:
- Top queries and positions
- CTR trends
- Coverage issues

## Analysis Framework

### Performance Metrics

| Metric | What It Measures | Good Benchmark |
|--------|------------------|----------------|
| Clicks | User visits from search | Trending up |
| Impressions | Search appearances | High for target keywords |
| CTR | Click-through rate | 2-5% average |
| Position | Average ranking | <10 for key terms |
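The benchmarks above translate directly into a triage rule over raw export rows. A minimal sketch (the row dicts and the 2% CTR / 5,000-impression thresholds are illustrative, derived from the table above, not fixed by the skill):

```python
# Flag "opportunity" queries: many impressions but CTR below the ~2% benchmark.
# Row shape mirrors a typical GSC Performance export (query, clicks, impressions).
rows = [
    {"query": "눈 성형", "clicks": 120, "impressions": 8000},
    {"query": "쌍꺼풀 수술", "clicks": 300, "impressions": 6000},
]

def ctr(row):
    """Click-through rate; guard against zero impressions."""
    return row["clicks"] / row["impressions"] if row["impressions"] else 0.0

opportunities = [r["query"] for r in rows
                 if r["impressions"] >= 5000 and ctr(r) < 0.02]
print(opportunities)  # → ['눈 성형']
```

Queries that land in `opportunities` are the ones where a title/description rewrite has the most leverage.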
### Query Analysis

Identify:
- **Winners** - High position, high CTR
- **Opportunities** - High impressions, low CTR
- **Quick wins** - Position 8-20, low effort to improve

### Page Analysis

Categorize:
- **Top performers** - High clicks, good CTR
- **Underperformers** - High impressions, low CTR
- **Declining** - Down vs previous period

## Workflow

1. Collect GSC data from the user
2. Analyze performance trends
3. Identify top queries and pages
4. Find optimization opportunities
5. Check for coverage issues
6. Provide actionable recommendations

## Output Format

```markdown
## Search Console Analysis: [Site]

### Overview (Last 28 Days)
| Metric | Value | vs Previous |
|--------|-------|-------------|
| Clicks | X | +X% |
| Impressions | X | +X% |
| CTR | X% | +X% |
| Position | X | +X |

### Top Queries
| Query | Clicks | Position | Opportunity |
|-------|--------|----------|-------------|

### Top Pages
| Page | Clicks | CTR | Status |
|------|--------|-----|--------|

### Opportunities
1. [Query with high impressions, low CTR]
2. [Page ranking 8-20 that can improve]

### Issues
- [Coverage problems]
- [Sitemap issues]

### Recommendations
1. [Priority action]
```

## Common Issues

| Issue | Impact | Fix |
|-------|--------|-----|
| Low CTR on high-impression query | Lost traffic | Improve title/description |
| Declining positions | Traffic loss | Update content, build links |
| Pages not indexed | No visibility | Fix crawl issues |
| Sitemap errors | Discovery problems | Fix sitemap XML |

## Limitations

- Requires the user to provide GSC data
- API access needs service account setup
- Data has a 2-3 day delay
- Limited to verified properties
235
custom-skills/17-seo-gateway-architect/README.md
Normal file
@@ -0,0 +1,235 @@
# SEO Gateway Page Strategist Skill

## Overview
A comprehensive Claude Skill for creating SEO-focused gateway page strategies for Korean medical and service websites, optimized for both Naver and Google.

## Installation

1. Download and extract the skill package
2. In Claude Desktop, go to Settings > Skills
3. Click "Add Skill" and select the folder
4. The skill will be available in your conversations

## File Structure
```
seo-gateway-strategist/
├── SKILL.md                  # Main skill definition
├── README.md                 # This file
├── scripts/
│   ├── keyword_analyzer.py   # Keyword research automation
│   └── requirements.txt      # Python dependencies
├── templates/
│   ├── keyword-research-template.md
│   ├── content-architecture-template.md
│   └── seo-checklist-template.md
└── examples/
    └── eye-surgery-strategy-example.md
```

## Quick Start

### Basic Usage
Simply ask Claude:
```
"Use the SEO Gateway Page Strategist skill to create a strategy for [your service]"
```

### Example Prompts
```
"Create an SEO gateway page strategy for 눈 성형"

"Use the SEO Gateway Page Strategist skill for laser hair removal targeting the Gangnam area"

"Apply the gateway strategist skill for 동안 성형 with competitor analysis"
```

## Features

### 1. Keyword Research & Analysis
- Primary keyword identification
- 10+ LSI keyword suggestions
- Long-tail keyword generation
- Question keywords for FAQs
- User intent distribution analysis

### 2. Content Architecture Planning
- Complete H1-H3 hierarchy
- Section-by-section word counts
- Keyword placement strategy
- Internal linking map
- Content depth recommendations

### 3. Technical SEO Specifications
- Meta tag templates
- Schema markup code
- Performance requirements
- Mobile optimization guidelines
- Core Web Vitals targets

### 4. Performance Tracking
- 30/60/90-day KPI targets
- Measurement methodologies
- Competitive benchmarking
- ROI tracking framework

## Python Scripts Usage

### keyword_analyzer.py
Automated keyword research and analysis tool.

#### Installation
```bash
cd scripts
pip install -r requirements.txt
```

#### Usage
```bash
python keyword_analyzer.py "눈 성형"
```

#### Output
- Keyword analysis report (console)
- JSON export file with all data
- Strategic recommendations

## Templates

### keyword-research-template.md
Comprehensive worksheet for keyword planning:
- Primary keyword metrics
- LSI keyword matrix
- Intent distribution
- Competitor gaps
- Seasonal trends

### content-architecture-template.md
Page structure blueprint:
- Complete heading hierarchy
- Content blocks with word counts
- Keyword density mapping
- Schema markup templates

### seo-checklist-template.md
Technical implementation checklist:
- Meta tag optimization
- Performance requirements
- Mobile optimization
- Naver-specific elements
- Tracking setup

## Best Practices

### For Medical Services
1. Always verify medical-claims compliance
2. Include trust signals prominently
3. Focus on E-A-T (Expertise, Authoritativeness, Trustworthiness)
4. Use medical schema markup
5. Include comprehensive FAQs

### For Local SEO
1. Include location modifiers in keywords
2. Optimize for "near me" searches
3. Add local business schema
4. Create location-specific content
5. Build local citations

### For Naver Optimization
1. Keep titles under 30 Korean characters
2. Optimize the image-to-text ratio (40:60)
3. Minimize outbound links
4. Focus on brand searches
5. Integrate Naver Blog content
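The 30-character rule in item 1 above is easy to enforce mechanically. A minimal sketch, assuming (as this doc does) that Naver counts each Hangul syllable as one character, which matches Python's `len`:

```python
def fits_naver_title(title: str, limit: int = 30) -> bool:
    """Check a page title against the assumed 30-character Naver title budget."""
    return len(title.strip()) <= limit

# 25 characters including spaces -> fits the budget
print(fits_naver_title("강남 눈 성형 전문의가 만드는 자연스러운 눈매"))  # → True
```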
## Customization

### Industry Adaptation
Replace medical-specific terms with your industry's:
- Medical → Your industry
- 시술 (procedure) → Service/Product
- 전문의 (specialist physician) → Specialist/Expert
- 클리닉 (clinic) → Business name

### Language Settings
The skill supports:
- Korean (primary)
- English
- Bilingual optimization

### Market Specifics
Adjust for your market:
- Change location references
- Update competitor examples
- Modify pricing strategies
- Adapt cultural considerations

## Integration Points

### Analytics Tools
- Google Analytics 4
- Google Search Console
- Naver Analytics
- Naver Webmaster Tools

### CMS Platforms
- WordPress
- Ghost
- Strapi
- Custom CMS

### Marketing Tools
- SEMrush
- Ahrefs
- Screaming Frog
- PageSpeed Insights

## Troubleshooting

### Common Issues

**Q: The skill isn't appearing in Claude**
A: Ensure the SKILL.md file is in the root folder and restart Claude Desktop.

**Q: The Python script isn't working**
A: Check that Python 3.7+ is installed and the dependencies are installed.

**Q: Keywords seem generic**
A: Provide more specific context about your market and competitors.

## Updates

### Version History
- v1.0.0 (2024-12-28): Initial release
  - Complete keyword research system
  - Content architecture planning
  - Technical SEO checklist
  - Korean market optimization

### Planned Features
- [ ] AI-powered competitor analysis
- [ ] Automated content generation
- [ ] Real-time SERP tracking
- [ ] Multi-language support
- [ ] API integrations

## Support

### Documentation
- Full examples in the `/examples/` folder
- Templates in the `/templates/` folder
- Skill documentation in `SKILL.md`

### Feedback
Please provide feedback to improve this skill:
- What features would you like added?
- What industries need specific templates?
- What integrations would be helpful?

## License
This skill is provided as-is for use with Claude Desktop. Feel free to modify and adapt it for your specific needs.

---

**Created by**: OurDigital SEO Team
**Last Updated**: December 28, 2024
**Version**: 1.0.0
65
custom-skills/17-seo-gateway-architect/code/CLAUDE.md
Normal file
@@ -0,0 +1,65 @@
# CLAUDE.md

## Overview

SEO gateway page strategist for Korean medical/service websites. Creates keyword strategies, content architecture, and technical SEO plans.

## Quick Start

```bash
pip install -r scripts/requirements.txt

# Keyword analysis
python scripts/keyword_analyzer.py --topic "눈 성형" --market "강남"
```

## Scripts

| Script | Purpose |
|--------|---------|
| `keyword_analyzer.py` | Analyze keywords, search volume, competitor gaps |

## Keyword Analyzer

```bash
# Basic analysis
python scripts/keyword_analyzer.py --topic "눈 성형"

# With location targeting
python scripts/keyword_analyzer.py --topic "눈 성형" --market "강남" --output strategy.json

# Competitor analysis
python scripts/keyword_analyzer.py --topic "눈 성형" --competitors url1,url2
```

## Output

Generates a strategy document with:
- Primary keyword + monthly search volume
- LSI keywords (7-10)
- User intent distribution
- Competitor gap analysis
- Content architecture (H1-H3 structure)
- Technical SEO checklist

## Templates

See `templates/` for:
- `keyword-research-template.md`
- `content-architecture-template.md`
- `seo-checklist-template.md`

## Workflow

1. Run the keyword analyzer for the target topic
2. Review search volume and intent data
3. Use the output to plan the content architecture
4. Hand off to `18-seo-gateway-builder` for content generation

## Configuration

```bash
# Optional: API keys for enhanced data
GOOGLE_API_KEY=xxx
NAVER_API_KEY=xxx
```
@@ -0,0 +1,334 @@
#!/usr/bin/env python3
"""
Keyword Analyzer for SEO Gateway Pages
Analyzes keywords and generates SEO strategy recommendations
"""

import json
from typing import Dict, List
from dataclasses import dataclass
from datetime import datetime


@dataclass
class KeywordData:
    """Data structure for keyword information"""
    keyword: str
    search_volume: int
    difficulty: float
    intent: str
    cpc: float = 0.0
    trend: str = "stable"


class KeywordAnalyzer:
    """Analyzes keywords for SEO gateway pages"""

    def __init__(self, primary_keyword: str):
        self.primary_keyword = primary_keyword
        self.results = {
            "primary": None,
            "lsi": [],
            "long_tail": [],
            "questions": [],
            "intent_distribution": {},
            "recommendations": []
        }

    def analyze_primary_keyword(self) -> KeywordData:
        """
        Analyzes the primary keyword.
        In production, this would call actual keyword research APIs.
        """
        # Simulated data - replace with actual API calls
        keyword_data = {
            "눈 성형": {"volume": 12000, "difficulty": 65, "intent": "informational", "cpc": 2500},
            "이마 성형": {"volume": 5500, "difficulty": 55, "intent": "informational", "cpc": 3000},
            "동안 성형": {"volume": 8000, "difficulty": 70, "intent": "comparative", "cpc": 2800},
        }

        data = keyword_data.get(self.primary_keyword, {
            "volume": 1000,
            "difficulty": 50,
            "intent": "informational",
            "cpc": 1000
        })

        self.results["primary"] = KeywordData(
            keyword=self.primary_keyword,
            search_volume=data["volume"],
            difficulty=data["difficulty"],
            intent=data["intent"],
            cpc=data["cpc"]
        )

        return self.results["primary"]

    def generate_lsi_keywords(self) -> List[KeywordData]:
        """Generates LSI (Latent Semantic Indexing) keywords"""

        lsi_patterns = {
            "눈 성형": [
                ("쌍꺼풀 수술", 8000, "transactional"),
                ("눈매교정", 5500, "informational"),
                ("앞트임", 4000, "informational"),
                ("뒤트임", 3500, "informational"),
                ("눈 성형 비용", 2000, "comparative"),
                ("눈 성형 부작용", 1500, "informational"),
                ("눈 성형 회복기간", 1800, "informational"),
                ("눈 성형 전후", 3000, "comparative"),
                ("남자 눈 성형", 2500, "informational"),
                ("눈 성형 잘하는곳", 2200, "comparative")
            ],
            "이마 성형": [
                ("이마거상술", 3000, "informational"),
                ("이마축소술", 2500, "informational"),
                ("헤어라인교정", 4000, "transactional"),
                ("이마 성형 비용", 1200, "comparative"),
                ("이마 보톡스", 6000, "transactional"),
                ("M자 탈모 수술", 5000, "informational"),
                ("이마 필러", 4500, "transactional"),
                ("이마 성형 부작용", 800, "informational"),
                ("이마 리프팅", 3500, "comparative"),
                ("이마 주름 제거", 2800, "transactional")
            ],
            "동안 성형": [
                ("안면 리프팅", 7000, "transactional"),
                ("실리프팅", 9000, "transactional"),
                ("보톡스 시술", 15000, "transactional"),
                ("필러 시술", 12000, "transactional"),
                ("동안 성형 비용", 2500, "comparative"),
                ("울쎄라", 8000, "comparative"),
                ("써마지", 6500, "comparative"),
                ("동안 시술 종류", 1800, "informational"),
                ("주름 제거 시술", 4000, "transactional"),
                ("동안 성형 추천", 2200, "comparative")
            ]
        }

        lsi_list = lsi_patterns.get(self.primary_keyword, [
            (f"{self.primary_keyword} 비용", 1000, "comparative"),
            (f"{self.primary_keyword} 부작용", 800, "informational"),
            (f"{self.primary_keyword} 후기", 1200, "comparative"),
        ])

        for keyword, volume, intent in lsi_list:
            self.results["lsi"].append(KeywordData(
                keyword=keyword,
                search_volume=volume,
                difficulty=45 + (volume / 1000),  # Simple difficulty heuristic
                intent=intent
            ))

        return self.results["lsi"]

    def generate_long_tail_keywords(self) -> List[str]:
        """Generates long-tail keyword variations"""

        location_modifiers = ["강남", "신사", "청담", "압구정", "서울"]
        action_modifiers = ["잘하는곳", "추천", "유명한", "전문", "비용"]

        long_tails = []
        for location in location_modifiers:
            long_tails.append(f"{location} {self.primary_keyword}")
            for action in action_modifiers[:2]:  # Limit combinations
                long_tails.append(f"{location} {self.primary_keyword} {action}")

        self.results["long_tail"] = long_tails
        return long_tails

    def generate_question_keywords(self) -> List[str]:
        """Generates question-based keywords for featured snippets"""

        question_templates = [
            f"{self.primary_keyword} 비용은 얼마인가요?",
            f"{self.primary_keyword} 회복기간은 얼마나 걸리나요?",
            f"{self.primary_keyword} 부작용이 있나요?",
            f"{self.primary_keyword} 통증이 심한가요?",
            f"{self.primary_keyword} 효과는 얼마나 지속되나요?",
            f"{self.primary_keyword} 나이 제한이 있나요?",
            f"{self.primary_keyword} 후 주의사항은 무엇인가요?"
        ]

        self.results["questions"] = question_templates
        return question_templates

    def calculate_intent_distribution(self) -> Dict[str, float]:
        """Calculates user intent distribution across keywords"""

        intent_counts = {
            "informational": 0,
            "comparative": 0,
            "transactional": 0,
            "navigational": 0
        }

        # Count primary keyword intent
        if self.results["primary"]:
            intent_counts[self.results["primary"].intent] += self.results["primary"].search_volume

        # Count LSI keyword intents
        for kw in self.results["lsi"]:
            intent_counts[kw.intent] += kw.search_volume

        # Calculate percentages
        total_volume = sum(intent_counts.values())
        if total_volume > 0:
            self.results["intent_distribution"] = {
                intent: round((count / total_volume) * 100, 1)
                for intent, count in intent_counts.items()
                if count > 0
            }

        return self.results["intent_distribution"]

    def generate_recommendations(self) -> List[str]:
        """Generates SEO recommendations based on the analysis"""

        recommendations = []

        # Based on search volume
        if self.results["primary"] and self.results["primary"].search_volume > 10000:
            recommendations.append("High search volume detected - prioritize this page for development")

        # Based on intent distribution
        intent_dist = self.results["intent_distribution"]
        if intent_dist.get("informational", 0) > 50:
            recommendations.append("Focus on educational content and comprehensive guides")
        if intent_dist.get("comparative", 0) > 30:
            recommendations.append("Include comparison tables and competitive differentiators")
        if intent_dist.get("transactional", 0) > 20:
            recommendations.append("Optimize conversion elements and CTAs above the fold")

        # Based on competition
        if self.results["primary"] and self.results["primary"].difficulty > 60:
            recommendations.append("High competition - invest in quality content and backlinks")
            recommendations.append("Target long-tail keywords for quicker wins")

        # Question keywords
        if len(self.results["questions"]) > 5:
            recommendations.append("Implement FAQ schema markup for featured snippets")

        self.results["recommendations"] = recommendations
        return recommendations

    def export_analysis(self, filename: str = None) -> str:
        """Exports the analysis results to JSON"""

        if not filename:
            filename = f"keyword_analysis_{self.primary_keyword.replace(' ', '_')}_{datetime.now().strftime('%Y%m%d')}.json"

        export_data = {
            "analysis_date": datetime.now().isoformat(),
            "primary_keyword": self.primary_keyword,
            "primary_data": {
                "keyword": self.results["primary"].keyword,
                "search_volume": self.results["primary"].search_volume,
                "difficulty": self.results["primary"].difficulty,
                "intent": self.results["primary"].intent
            } if self.results["primary"] else None,
            "lsi_keywords": [
                {
                    "keyword": kw.keyword,
                    "volume": kw.search_volume,
                    "intent": kw.intent
                } for kw in self.results["lsi"]
            ],
            "long_tail_keywords": self.results["long_tail"],
            "question_keywords": self.results["questions"],
            "intent_distribution": self.results["intent_distribution"],
            "recommendations": self.results["recommendations"]
        }

        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(export_data, f, ensure_ascii=False, indent=2)

        return filename

    def generate_report(self) -> str:
        """Generates a formatted text report"""

        report = f"""
# Keyword Analysis Report
Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}

## Primary Keyword: {self.primary_keyword}
- Search Volume: {self.results['primary'].search_volume:,}
- Difficulty: {self.results['primary'].difficulty}/100
- Primary Intent: {self.results['primary'].intent.capitalize()}

## LSI Keywords (Top 10)
"""
        for i, kw in enumerate(self.results['lsi'][:10], 1):
            report += f"{i}. {kw.keyword} - Volume: {kw.search_volume:,} ({kw.intent})\n"

        report += "\n## User Intent Distribution\n"
        for intent, percentage in self.results['intent_distribution'].items():
            report += f"- {intent.capitalize()}: {percentage}%\n"

        report += "\n## Long-tail Opportunities\n"
        for keyword in self.results['long_tail'][:5]:
            report += f"- {keyword}\n"

        report += "\n## Question Keywords (FAQ Optimization)\n"
        for question in self.results['questions'][:5]:
            report += f"- {question}\n"

        report += "\n## Strategic Recommendations\n"
        for i, rec in enumerate(self.results['recommendations'], 1):
            report += f"{i}. {rec}\n"

        return report


def main():
    """Main execution function"""
    import argparse

    parser = argparse.ArgumentParser(
        description='Analyze keywords for SEO gateway page strategy',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog='''
Examples:
  python keyword_analyzer.py --topic "눈 성형"
  python keyword_analyzer.py --topic "이마 성형" --market "강남"
  python keyword_analyzer.py --topic "동안 성형" --output strategy.json
'''
    )
    parser.add_argument('--topic', '-t', required=True,
                        help='Primary keyword to analyze (e.g., "눈 성형")')
    parser.add_argument('--market', '-m', default=None,
                        help='Target market/location (e.g., "강남")')
    parser.add_argument('--output', '-o', default=None,
                        help='Output JSON file path')
    parser.add_argument('--competitors', '-c', default=None,
                        help='Comma-separated competitor URLs for analysis')

    args = parser.parse_args()

    keyword = args.topic
    if args.market:
        keyword = f"{args.market} {args.topic}"

    print(f"Analyzing keyword: {keyword}")
    print("-" * 50)

    analyzer = KeywordAnalyzer(keyword)

    # Run analysis
    analyzer.analyze_primary_keyword()
    analyzer.generate_lsi_keywords()
    analyzer.generate_long_tail_keywords()
    analyzer.generate_question_keywords()
    analyzer.calculate_intent_distribution()
    analyzer.generate_recommendations()

    # Generate and print report
    report = analyzer.generate_report()
    print(report)

    # Export to JSON
    filename = analyzer.export_analysis(args.output)
    print(f"\nAnalysis exported to: {filename}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,25 @@
# Python dependencies for SEO Gateway Page Strategist scripts
# Install with: pip install -r requirements.txt

# Core dependencies
requests>=2.28.0
beautifulsoup4>=4.11.0
pandas>=1.5.0
numpy>=1.23.0

# For API integrations (optional)
google-api-python-client>=2.70.0
pytrends>=4.9.0

# For data visualization (optional)
matplotlib>=3.6.0
seaborn>=0.12.0

# For export formats
openpyxl>=3.0.0
jinja2>=3.1.0

# Development tools
pytest>=7.2.0
black>=22.0.0
pylint>=2.15.0
@@ -0,0 +1,160 @@
# Content Architecture Template

## Page Hierarchy Structure

```
[Page URL: /service-name]
│
├── H1: [Primary Keyword-Optimized Headline]
│   Example: "강남 눈 성형 전문의가 만드는 자연스러운 눈매"
│   Length Target: 15-25 characters
│   Keyword Placement: Primary keyword at the beginning
│
├── Hero Section [Above Fold]
│   ├── Value Proposition (30-50 words)
│   │   └── Keywords: Primary + 1 LSI
│   ├── Trust Signals (3-5 items)
│   │   ├── Certification badges
│   │   ├── Years of experience
│   │   └── Number of success cases
│   └── Primary CTA
│       └── Text: "무료 상담 신청하기"
│
├── H2: [Service Name] 이란? [Problem/Solution Framework]
│   Word Count: 200-300 words
│   Keywords: Primary (1x), LSI (2-3x)
│   ├── H3: 이런 고민이 있으신가요? [Pain Points]
│   │   ├── Pain point 1 (include LSI keyword)
│   │   ├── Pain point 2 (include LSI keyword)
│   │   └── Pain point 3 (include LSI keyword)
│   └── H3: [Clinic Name]의 솔루션 [Benefits]
│       ├── Benefit 1 (address pain point 1)
│       ├── Benefit 2 (address pain point 2)
│       └── Benefit 3 (address pain point 3)
│
├── H2: [Service Name] 종류 및 방법 [Service Categories]
│   Word Count: 400-500 words total
│   Keywords: Category-specific LSI keywords
│   ├── H3: [Sub-service 1] - [LSI Keyword Variation]
│   │   ├── Description (80-100 words)
│   │   ├── Best for (target audience)
│   │   ├── Duration & Recovery
│   │   └── CTA: "자세히 보기"
│   ├── H3: [Sub-service 2] - [LSI Keyword Variation]
│   │   └── [Same structure as above]
│   └── H3: [Sub-service 3] - [LSI Keyword Variation]
│       └── [Same structure as above]
│
├── H2: [Clinic Name] [Service Name]만의 차별점 [Trust & Authority]
│   Word Count: 300-400 words
│   Keywords: Brand + primary keyword combinations
│   ├── H3: 전문 의료진 [Doctor Credentials]
│   │   ├── Doctor profile summary
│   │   ├── Specializations
│   │   └── Certifications
│   ├── H3: 검증된 시술 결과 [Success Metrics]
│   │   ├── Key statistics
│   │   ├── Success rate
│   │   └── Patient satisfaction
│   └── H3: 첨단 장비 및 시설 [Facilities]
│       ├── Equipment descriptions
│       └── Safety protocols
│
├── H2: [Service Name] 자주 묻는 질문 [FAQ Section]
│   Word Count: 500-700 words
│   Keywords: Long-tail question keywords
│   ├── Q1: [Long-tail keyword as question]?
│   │   └── A: [40-60 word answer, keyword in the first sentence]
│   ├── Q2: [Price-related question]?
│   │   └── A: [Include the "비용" LSI keyword]
│   ├── Q3: [Recovery-related question]?
│   │   └── A: [Include the "회복기간" LSI keyword]
│   ├── Q4: [Side-effect question]?
│   │   └── A: [Include the "부작용" LSI keyword]
│   ├── Q5: [Process question]?
│   │   └── A: [Include process-related LSI]
│   ├── Q6: [Candidacy question]?
│   │   └── A: [Include target-audience keywords]
│   └── Q7: [Results duration question]?
│       └── A: [Include maintenance keywords]
│
├── H2: [Service Name] 시술 과정 [Process Guide]
│   Word Count: 300-400 words
│   Keywords: "과정", "단계", procedural LSI
│   ├── H3: 상담 및 검사 [Consultation]
│   ├── H3: 시술 당일 [Procedure Day]
│   ├── H3: 회복 과정 [Recovery]
│   └── H3: 사후 관리 [Aftercare]
│
├── H2: 실제 고객 후기 [Social Proof]
│   Word Count: 200-300 words
│   Keywords: "후기", "리뷰", satisfaction keywords
│   ├── Review snippet 1
│   ├── Review snippet 2
│   ├── Review snippet 3
│   └── Before/After gallery teaser
│
└── H2: 상담 예약 안내 [Conversion Section]
    Word Count: 150-200 words
    Keywords: CTA-related, location keywords
    ├── H3: 상담 예약 방법
    ├── H3: 오시는 길
    └── H3: 문의 정보
```

## Keyword Density Map

| Section | Primary Keyword | LSI Keywords | Total Keywords |
|---------|----------------|--------------|----------------|
| Hero | 1 | 1-2 | 2-3 |
| Problem/Solution | 1 | 2-3 | 3-4 |
| Service Categories | 1-2 | 4-6 | 5-8 |
| Trust & Authority | 1 | 2-3 | 3-4 |
| FAQ | 2-3 | 5-7 | 7-10 |
| Process | 1 | 2-3 | 3-4 |
| Social Proof | 0-1 | 1-2 | 1-3 |
| Conversion | 1 | 1-2 | 2-3 |
| **Total** | **8-11** | **18-29** | **26-40** |
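A drafted section can be checked against the per-section targets with a naive substring count (a rough sketch; real checks should normalize Korean spacing and particles before counting, and the example hero text is illustrative):

```python
def keyword_counts(section_text: str, keywords: list) -> dict:
    """Naive per-keyword occurrence count; no morphological normalization."""
    return {kw: section_text.count(kw) for kw in keywords}

hero = "강남 눈 성형 전문의가 만드는 자연스러운 눈매. 쌍꺼풀 수술 상담 환영."
print(keyword_counts(hero, ["눈 성형", "쌍꺼풀 수술"]))  # → {'눈 성형': 1, '쌍꺼풀 수술': 1}
```

Compare the counts per section with the density map row for that section (e.g. Hero: 1 primary, 1-2 LSI).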
## Internal Linking Strategy
|
||||
|
||||
| From Section | To Page | Anchor Text | Purpose |
|
||||
|-------------|---------|-------------|---------|
|
||||
| Service Categories | Sub-service page | [Sub-service name] | Deep dive |
|
||||
| FAQ | Price page | "비용 안내 페이지" | Conversion |
|
||||
| Trust section | Doctor profile | "[Doctor name] 원장" | Authority |
|
||||
| Process section | Consultation form | "상담 예약하기" | Conversion |
|
||||
| Social proof | Gallery page | "더 많은 전후 사진" | Engagement |
|
||||
|
||||
## Content Length Guidelines
|
||||
|
||||
- **Total Page Length**: 2,000-2,500 words
|
||||
- **Above Fold Content**: 100-150 words
|
||||
- **Each H2 Section**: 200-500 words
|
||||
- **Each H3 Subsection**: 80-150 words
|
||||
- **Meta Description**: 150-160 characters
|
||||
- **Image Alt Text**: 10-15 words each
|
||||
|
||||
## Schema Markup Requirements
|
||||
|
||||
```json
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "MedicalProcedure",
|
||||
"name": "[Service Name]",
|
||||
"description": "[Meta description]",
|
||||
"procedureType": "Cosmetic",
|
||||
"provider": {
|
||||
"@type": "MedicalOrganization",
|
||||
"name": "[Clinic Name]"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Mobile Content Adaptation
|
||||
|
||||
- Reduce hero text by 30%
|
||||
- Show 3 FAQs initially (expand for more)
|
||||
- Simplify navigation to single-column
|
||||
- Increase CTA button size
|
||||
- Compress trust signals to carousel
|
||||
@@ -0,0 +1,95 @@
|
||||
# Keyword Research Template
|
||||
|
||||
## Primary Keyword Analysis
|
||||
|
||||
| Metric | Value | Notes |
|
||||
|--------|-------|-------|
|
||||
| **Primary Keyword** | [KEYWORD] | Main target keyword |
|
||||
| **Monthly Search Volume** | [VOLUME] | Average monthly searches |
|
||||
| **Keyword Difficulty** | [0-100] | Competition score |
|
||||
| **Current Ranking** | #[POSITION] | Current SERP position |
|
||||
| **Search Trend** | ↑ ↓ → | Trending direction |
|
||||
|
||||
## LSI Keywords Matrix

| LSI Keyword | Search Volume | Intent Type | Priority |
|------------|--------------|-------------|----------|
| [keyword 1] | [volume] | Informational | High |
| [keyword 2] | [volume] | Transactional | Medium |
| [keyword 3] | [volume] | Comparative | High |
| [keyword 4] | [volume] | Informational | Medium |
| [keyword 5] | [volume] | Transactional | Low |
| [keyword 6] | [volume] | Comparative | High |
| [keyword 7] | [volume] | Informational | Medium |
| [keyword 8] | [volume] | Navigational | Low |
| [keyword 9] | [volume] | Transactional | High |
| [keyword 10] | [volume] | Informational | Medium |

## User Intent Distribution

```
Informational (Research Phase): ___%
- Common queries: "what is", "how to", "benefits of"
- Content needed: Educational guides, FAQs, process explanations

Comparative (Evaluation Phase): ___%
- Common queries: "best", "vs", "reviews", "비교"
- Content needed: Comparison tables, reviews, case studies

Transactional (Ready to Convert): ___%
- Common queries: "price", "book", "consultation", "예약"
- Content needed: CTAs, pricing, booking forms
```

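The blanks above can be estimated from a query sample by bucketing each query on the modifier patterns listed. A rough sketch; the token lists are illustrative and should be tuned per service:

```python
from collections import Counter

# Modifier patterns from the distribution above (illustrative, extend per service).
TRANSACTIONAL = ("price", "book", "consultation", "비용", "가격", "예약")
COMPARATIVE = ("best", "vs", "review", "비교", "추천", "잘하는곳", "전후")

def classify_intent(query: str) -> str:
    """Bucket a query by modifier patterns; default to Informational."""
    q = query.lower()
    if any(tok in q for tok in TRANSACTIONAL):
        return "Transactional"
    if any(tok in q for tok in COMPARATIVE):
        return "Comparative"
    return "Informational"

def intent_distribution(queries: list[str]) -> dict[str, float]:
    """Percentage share of each intent bucket across a query sample."""
    counts = Counter(classify_intent(q) for q in queries)
    return {intent: round(100 * n / len(queries), 1) for intent, n in counts.items()}
```

Running this over a month of Search Console or Naver query exports gives the three percentages to fill into the worksheet.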
## Long-tail Keyword Opportunities

### Question-based Keywords
- [질문 키워드 1]
- [질문 키워드 2]
- [질문 키워드 3]

### Location-based Keywords
- [지역] + [primary keyword]
- [지역] + [primary keyword] + 잘하는곳
- [지역] + [primary keyword] + 추천

### Modifier-based Keywords
- [primary keyword] + 비용
- [primary keyword] + 부작용
- [primary keyword] + 회복기간
- [primary keyword] + 전후

## Competitor Keyword Analysis

| Competitor | Target Keywords | Ranking Keywords | Gap Opportunities |
|------------|----------------|------------------|-------------------|
| Competitor 1 | [keywords] | [keywords] | [missing keywords] |
| Competitor 2 | [keywords] | [keywords] | [missing keywords] |
| Competitor 3 | [keywords] | [keywords] | [missing keywords] |

## Seasonal Trends

| Month | Search Volume | Events/Factors |
|-------|--------------|----------------|
| January | [volume] | New year resolutions |
| February | [volume] | [factor] |
| March | [volume] | [factor] |
| ... | ... | ... |

## Platform-Specific Keywords

### Naver-Optimized
- [네이버 specific keyword 1]
- [네이버 specific keyword 2]

### Google-Optimized
- [Google specific keyword 1]
- [Google specific keyword 2]

## Action Items

- [ ] Target primary keyword in H1 and title tag
- [ ] Include 3-5 LSI keywords naturally in content
- [ ] Create content matching user intent distribution
- [ ] Optimize for question-based featured snippets
- [ ] Add location modifiers for local SEO

@@ -0,0 +1,239 @@

# SEO Technical Checklist Template

## Meta Tags Optimization

### Title Tag
- [ ] Length: 50-60 characters
- [ ] Primary keyword at beginning
- [ ] Brand name at end
- [ ] Unique for each page
- [ ] Formula: `[Primary Keyword] - [Value Proposition] | [Brand]`

**Template**: `{primary_keyword} 전문 - {unique_value} | {clinic_name}`
**Example**: `눈 성형 전문 - 자연스러운 라인 | 제이미클리닉`

### Meta Description
- [ ] Length: 150-160 characters
- [ ] Include primary keyword
- [ ] Include 1-2 LSI keywords
- [ ] Clear CTA
- [ ] Unique for each page

**Template**: `{location} {primary_keyword} 전문의가 {benefit}. {credential}. 무료상담 ☎ {phone}`
**Example**: `강남 눈 성형 전문의가 자연스러운 눈매를 디자인합니다. 15년 경력, 10,000건 시술. 무료상담 ☎ 02-1234-5678`

### Open Graph Tags
```html
<meta property="og:title" content="{page_title}">
<meta property="og:description" content="{meta_description}">
<meta property="og:image" content="{featured_image_url}">
<meta property="og:url" content="{page_url}">
<meta property="og:type" content="website">
<meta property="og:locale" content="ko_KR">
```

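The `{...}` formulas above can be filled and length-checked in one pass. A minimal sketch, assuming a Python build step; note the 50-60 / 150-160 windows are the checklist's character targets (the Naver section below caps Korean titles lower, at 30 characters):

```python
def build_meta_tags(primary_keyword, unique_value, clinic_name,
                    location, benefit, credential, phone):
    """Fill the title/description formulas from this section and flag
    character-count violations against the checklist targets."""
    title = f"{primary_keyword} 전문 - {unique_value} | {clinic_name}"
    description = (f"{location} {primary_keyword} 전문의가 {benefit}. "
                   f"{credential}. 무료상담 ☎ {phone}")
    issues = []
    if not 50 <= len(title) <= 60:
        issues.append(f"title is {len(title)} chars (target 50-60)")
    if not 150 <= len(description) <= 160:
        issues.append(f"description is {len(description)} chars (target 150-160)")
    return title, description, issues
```

Short Korean-only titles will trip the 50-60 check by design; treat the returned issues as warnings to review, not hard failures.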
## Header Tags Structure

- [ ] Only one H1 per page
- [ ] H1 contains primary keyword
- [ ] H2 tags for main sections (5-7)
- [ ] H3 tags for subsections
- [ ] Logical hierarchy maintained
- [ ] Keywords distributed naturally

## Content Optimization

### Keyword Density
- [ ] Primary keyword: 2-3% (20-30 times per 1000 words)
- [ ] LSI keywords: 1-2% each
- [ ] Natural placement (no stuffing)
- [ ] Synonyms and variations used

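The density targets above are easy to verify mechanically before publishing. A minimal sketch; the helper is illustrative, not one of the repo scripts:

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Keyword occurrences as a percentage of total words.
    2-3% for the primary keyword ≈ 20-30 uses per 1,000 words."""
    words = re.findall(r"\w+", text.lower())  # \w also matches Hangul under Unicode
    if not words:
        return 0.0
    occurrences = text.lower().count(keyword.lower())
    return 100.0 * occurrences / len(words)
```

Counting the keyword as a substring (rather than token-by-token) lets multi-word keywords like "눈 성형" be measured the same way.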
### Content Structure
- [ ] First 100 words include primary keyword
- [ ] Short paragraphs (3-4 sentences)
- [ ] Bullet points and lists
- [ ] Bold important keywords (sparingly)
- [ ] Internal links: 5-10
- [ ] External links: 2-3 (authoritative)

## Schema Markup

### Medical Procedure Schema
```json
{
  "@context": "https://schema.org",
  "@type": "MedicalProcedure",
  "name": "{procedure_name}",
  "procedureType": "Cosmetic",
  "bodyLocation": "{body_part}",
  "outcome": "{expected_outcome}",
  "preparation": "{preparation_required}",
  "followup": "{followup_care}",
  "provider": {
    "@type": "MedicalOrganization",
    "name": "{clinic_name}",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "{street}",
      "addressLocality": "{city}",
      "addressCountry": "KR"
    }
  }
}
```

### FAQ Schema
```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "{question}",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "{answer}"
    }
  }]
}
```

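The FAQ template above shows a single entry; real pages typically carry 5-7 questions. A minimal sketch for expanding it to any Q/A list (helper name and sample values are illustrative):

```python
import json

def faq_jsonld(qa_pairs):
    """Expand the FAQPage template to a list of (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }

# Ready to embed in a <script type="application/ld+json"> tag.
snippet = json.dumps(faq_jsonld([
    ("눈 성형 비용은 얼마인가요?", "시술 방법에 따라 달라지며, 상담 시 안내해 드립니다."),
]), ensure_ascii=False, indent=2)
```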
## Image Optimization

- [ ] Descriptive file names: `eye-surgery-before-after-case1.jpg`
- [ ] Alt text with keywords: `눈 성형 전후 사진 - 30대 여성 사례`
- [ ] Compressed file size (< 200KB)
- [ ] WebP format with fallback
- [ ] Lazy loading implemented
- [ ] Image sitemap created

## Performance Optimization

### Page Speed
- [ ] Load time < 3 seconds
- [ ] First Contentful Paint < 1.8s
- [ ] Time to Interactive < 3.8s
- [ ] Total page size < 3MB
- [ ] Requests minimized (< 50)

### Core Web Vitals
- [ ] LCP (Largest Contentful Paint) < 2.5s
- [ ] FID (First Input Delay) < 100ms
- [ ] CLS (Cumulative Layout Shift) < 0.1

## Mobile Optimization

- [ ] Mobile-responsive design
- [ ] Viewport meta tag set
- [ ] Touch-friendly buttons (44x44px minimum)
- [ ] Readable font size (16px minimum)
- [ ] No horizontal scrolling
- [ ] Mobile page speed < 3s

## URL Structure

- [ ] SEO-friendly URL: `/eye-surgery` or `/눈-성형`
- [ ] No special characters
- [ ] Lowercase only
- [ ] Hyphens for word separation
- [ ] Under 60 characters
- [ ] Include primary keyword

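The URL rules above can be enforced with a small validator. A sketch, assuming slugs may be Latin or Hangul (matching the `/눈-성형` example); the helper is illustrative:

```python
import re

def slug_issues(slug: str) -> list[str]:
    """Check one path segment against the URL rules in this checklist."""
    issues = []
    if len(slug) > 60:
        issues.append("longer than 60 characters")
    if slug != slug.lower():
        issues.append("not lowercase")
    if " " in slug or "_" in slug:
        issues.append("use hyphens to separate words")
    # Allow lowercase Latin, digits, Hangul syllables, and hyphens.
    if re.search(r"[^a-z0-9가-힣\-]", slug):
        issues.append("contains special characters")
    return issues
```

An empty list means the slug passes all of the checks above.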
## Internal Linking

| From Page | To Page | Anchor Text | Purpose |
|-----------|---------|-------------|---------|
| Gateway | Service Detail | {service_name} | Deep content |
| Gateway | Doctor Profile | {doctor_name} 원장 | Authority |
| Gateway | Pricing | 비용 안내 | Conversion |
| Gateway | Gallery | 시술 전후 사진 | Engagement |
| Gateway | Contact | 상담 예약 | Conversion |

## Naver-Specific Optimization

### Naver Webmaster Tools
- [ ] Site verification complete
- [ ] XML sitemap submitted
- [ ] Robots.txt configured
- [ ] Syndication feed active
- [ ] Site optimization report reviewed

### Naver SEO Elements
- [ ] Title under 30 Korean characters
- [ ] C-Rank tags implemented
- [ ] Image-to-text ratio optimized (40:60)
- [ ] Outbound links minimized
- [ ] Brand search optimization

## Tracking & Analytics

- [ ] Google Analytics 4 installed
- [ ] Google Search Console verified
- [ ] Naver Analytics installed
- [ ] Conversion tracking configured
- [ ] Event tracking for CTAs
- [ ] Heatmap tool installed

## Security & Technical

- [ ] SSL certificate active (HTTPS)
- [ ] WWW/non-WWW redirect configured
- [ ] 404 error page customized
- [ ] XML sitemap generated
- [ ] Robots.txt optimized
- [ ] Canonical URLs set
- [ ] Hreflang tags (if multi-language)

## Quality Checks

### Content Quality
- [ ] No spelling/grammar errors
- [ ] Medical information accurate
- [ ] Legal compliance verified
- [ ] Contact information correct
- [ ] CTAs working properly

### Cross-browser Testing
- [ ] Chrome (Desktop/Mobile)
- [ ] Safari (Desktop/Mobile)
- [ ] Firefox
- [ ] Samsung Internet
- [ ] Naver Whale

## Monthly Monitoring Tasks

- [ ] Keyword ranking check
- [ ] Organic traffic analysis
- [ ] Bounce rate monitoring
- [ ] Conversion rate tracking
- [ ] Competitor analysis
- [ ] Content freshness update
- [ ] Broken link check
- [ ] Page speed test

## Priority Levels

1. **Critical (Day 1)**
   - Title and meta tags
   - H1 optimization
   - Mobile responsiveness
   - Page speed < 4s

2. **High (Week 1)**
   - Schema markup
   - Internal linking
   - Image optimization
   - Content optimization

3. **Medium (Week 2-3)**
   - Naver optimization
   - FAQ implementation
   - Social proof elements
   - Analytics setup

4. **Low (Month 2)**
   - A/B testing
   - Advanced schema
   - Link building
   - Content expansion

159
custom-skills/17-seo-gateway-architect/desktop/SKILL.md
Normal file
@@ -0,0 +1,159 @@
---
name: seo-gateway-strategist
version: 1.0.0
description: Creates comprehensive SEO-focused gateway page strategies for Korean medical/service websites
license: OurDigital internal-use ONLY
---

# SEO Gateway Page Strategist

This skill helps you create comprehensive SEO-focused gateway page strategies for Korean medical/service websites, optimized for both Naver and Google.

## Core Competencies

1. **Keyword Research & Analysis**: Identifies primary and LSI keywords with search intent mapping
2. **Content Architecture**: Creates hierarchical page structure optimized for SEO
3. **Technical SEO Planning**: Defines specific technical requirements and meta optimizations
4. **Performance Targeting**: Sets measurable KPIs and tracking methodologies
5. **Competitor Analysis**: Analyzes top-ranking competitors for gap identification

## When to Use This Skill

Use this skill when:
- Planning a new gateway page for any service/procedure category
- Restructuring existing pages for better SEO performance
- Conducting keyword research for content planning
- Setting SEO performance targets and KPIs
- Analyzing competitor strategies

## Instructions

When using this skill, provide:
1. **Service/Procedure Name**: The main topic for the gateway page (e.g., "눈 성형", "이마 성형")
2. **Target Market**: Location and demographic information
3. **Current Performance** (optional): Existing rankings and traffic data, if available
4. **Competitor URLs** (optional): Known competitors to analyze

## Process Workflow

### Step 1: Keyword & Intent Analysis

The skill generates:

- Primary keyword with monthly search volume
- 7-10 LSI (Latent Semantic Indexing) keywords
- User intent distribution (Informational / Comparative / Transactional)
- Top-3 competitor analysis

### Step 2: Content Architecture
The skill creates a complete H1-H3 structure with a keyword placement strategy:
```
H1: [Primary keyword-optimized headline]
├── Hero Section
├── Problem/Solution Framework
├── Service Categories
├── Trust & Authority
├── FAQ Section
└── Consultation Guide
```

### Step 3: Technical SEO Requirements
Generates specific technical specifications:
- Meta tag formulas and character limits
- Schema markup recommendations
- Internal linking strategy
- Image optimization guidelines
- Core Web Vitals targets

### Step 4: Performance Metrics
Sets 30/60/90-day KPIs with a tracking methodology.

## Example Usage

### Basic Request:
```
"Create an SEO gateway page strategy for 눈 성형"
```

### Detailed Request:
```
"Create an SEO gateway page strategy for 눈 성형 targeting women aged 25-45 in Gangnam.
Current ranking: page 2 for main keyword.
Competitor: www.example-clinic.com/eye-surgery"
```

## Output Format

The skill delivers a structured report containing:

1. **Keyword Strategy Table**
   - Primary and LSI keywords with search volumes
   - User intent percentages
   - Competitor gap analysis

2. **Content Architecture Document**
   - Complete page hierarchy (H1-H3)
   - Word count targets per section
   - Keyword placement map

3. **Technical SEO Checklist**
   - Meta tag templates
   - Schema markup code
   - Performance requirements

4. **Performance Dashboard**
   - Current baseline metrics
   - Target KPIs with timeline
   - Tracking methodology

## Templates Included

- `keyword-research-template.md`: Keyword analysis worksheet
- `content-architecture-template.md`: Page structure template
- `seo-checklist-template.md`: Technical SEO requirements
- `performance-tracking-template.md`: KPI tracking sheet

## Scripts Included

- `keyword_analyzer.py`: Automates keyword research and intent analysis
- `competitor_analyzer.py`: Scrapes and analyzes competitor pages
- `seo_scorer.py`: Calculates an SEO optimization score

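The scoring idea behind `seo_scorer.py` (the script itself is not shown in this commit chunk) can be sketched as an equal-weight pass over the checklist targets defined in `seo-checklist-template.md`; field names here are assumptions:

```python
def seo_score(page: dict) -> int:
    """Toy scoring pass: each satisfied checklist item earns equal weight.
    `page` is a dict of facts extracted from the rendered page."""
    checks = [
        page.get("h1_count") == 1,                             # single H1
        50 <= page.get("title_length", 0) <= 60,               # title window
        150 <= page.get("meta_description_length", 0) <= 160,  # description window
        page.get("word_count", 0) >= 2000,                     # content depth target
        bool(page.get("has_schema_markup")),                   # structured data present
    ]
    return round(100 * sum(checks) / len(checks))
```

A real scorer would weight critical items (title, H1, mobile) more heavily, per the Priority Levels in the checklist template.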
## Best Practices

1. **Mobile-First Approach**: Always optimize for mobile (70%+ of traffic in Korea)
2. **Naver vs Google**: Consider platform-specific optimization differences
3. **Local SEO**: Include location modifiers for local intent
4. **Medical Compliance**: Ensure content meets Korean medical advertising regulations
5. **User Intent Matching**: Align content with search intent distribution

## Common Patterns

### For Medical Services:
```
Primary: [시술명]
LSI: [시술명 비용], [시술명 부작용], [시술명 회복기간], [시술명 전후]
Intent: 60% Informational, 30% Comparative, 10% Transactional
```

### For Local Services:
```
Primary: [지역] [서비스명]
LSI: [지역] [서비스명] 추천, [지역] [서비스명] 잘하는곳, [지역] [서비스명] 가격
Intent: 40% Informational, 40% Comparative, 20% Transactional
```

## Integration Points

This skill integrates with:
- Google Search Console for current performance data
- Naver Webmaster Tools for Naver-specific metrics
- Analytics platforms for user behavior data
- Keyword research tool APIs

## Notes

- Always validate keyword search volumes with actual tools
- Consider seasonal trends in search behavior
- Update the strategy based on algorithm changes
- Monitor competitor movements regularly

@@ -0,0 +1,284 @@

# SEO Gateway Page Strategy: 눈 성형
**Generated by SEO Gateway Page Strategist Skill**
**Date:** 2024-12-28
**Target Market:** Gangnam, Seoul - Women 25-45

---

## 1. Keyword Strategy Analysis

### Primary Keyword Research
| Metric | Value | Analysis |
|--------|-------|----------|
| **Primary Keyword** | 눈 성형 | High-volume medical keyword |
| **Monthly Search Volume** | 12,000 | Excellent traffic potential |
| **Keyword Difficulty** | 65/100 | Moderate-high competition |
| **CPC** | ₩2,500 | High commercial intent |
| **Trend** | ↑ 15% YoY | Growing interest |

### LSI Keywords Matrix
| LSI Keyword | Monthly Volume | Intent Type | Implementation Priority |
|------------|---------------|-------------|------------------------|
| 쌍꺼풀 수술 | 8,000 | Transactional | High - Sub-category page |
| 눈매교정 | 5,500 | Informational | High - Sub-category page |
| 앞트임 | 4,000 | Informational | High - Sub-category page |
| 뒤트임 | 3,500 | Informational | Medium - Sub-category |
| 눈 성형 비용 | 2,000 | Comparative | High - FAQ section |
| 눈 성형 부작용 | 1,500 | Informational | High - FAQ section |
| 눈 성형 회복기간 | 1,800 | Informational | High - Process section |
| 눈 성형 전후 | 3,000 | Comparative | High - Gallery section |
| 강남 눈 성형 | 2,200 | Local | High - Hero/title |
| 눈 성형 잘하는곳 | 2,200 | Comparative | High - Trust section |

### User Intent Distribution
```
Informational (Research): 55%
- Users researching procedures, risks, recovery
- Content needed: Educational guides, process explanations, FAQs

Comparative (Evaluation): 35%
- Users comparing clinics, prices, results
- Content needed: Before/after gallery, reviews, differentiators

Transactional (Booking): 10%
- Users ready to book consultation
- Content needed: Clear CTAs, contact forms, pricing info
```

### Competitor Analysis
| Competitor | Domain Authority | Target Keywords | Content Gap Opportunities |
|-----------|-----------------|-----------------|--------------------------|
| Competitor A | 45 | 눈 성형, 쌍꺼풀 | Lacks detailed recovery guide |
| Competitor B | 52 | 강남 눈 성형 | No video content |
| Competitor C | 38 | 눈 성형 비용 | Limited trust signals |

---

## 2. Content Architecture

```
H1: 강남 눈 성형 전문의가 디자인하는 자연스러운 눈매
│
├── Hero Section [100-150 words]
│   ├── Value Proposition: "15년 경력 전문의가 만드는 당신만의 아름다운 눈매"
│   ├── Trust Signals:
│   │   • 보건복지부 인증 의료기관
│   │   • 10,000+ 성공 사례
│   │   • 대한성형외과학회 정회원
│   │   • 2024 의료서비스 대상 수상
│   └── Primary CTA: "무료 상담 신청하기"
│
├── H2: 눈 성형, 이런 고민이 있으신가요? [200-250 words]
│   ├── H3: 작고 답답한 눈매로 인한 스트레스
│   ├── H3: 나이들어 보이는 처진 눈꺼풀
│   └── H3: 좌우 비대칭으로 인한 불균형
│
├── H2: 제이미클리닉 눈 성형 종류 [400-500 words]
│   ├── H3: 자연스러운 쌍꺼풀 성형 - 매몰법 & 절개법
│   │   ├── Description (100 words)
│   │   ├── 시술시간: 30-60분
│   │   ├── 회복기간: 5-10일
│   │   └── CTA: "쌍꺼풀 자세히 보기"
│   │
│   ├── H3: 또렷한 눈매교정술
│   │   ├── Description (100 words)
│   │   ├── 시술시간: 45분
│   │   ├── 회복기간: 7일
│   │   └── CTA: "눈매교정 자세히 보기"
│   │
│   ├── H3: 시원한 앞트임 & 뒤트임
│   │   ├── Description (100 words)
│   │   ├── 시술시간: 30-45분
│   │   ├── 회복기간: 5-7일
│   │   └── CTA: "트임 수술 자세히 보기"
│   │
│   └── H3: 상안검 & 하안검 성형
│       ├── Description (100 words)
│       ├── 시술시간: 60-90분
│       ├── 회복기간: 10-14일
│       └── CTA: "안검 성형 자세히 보기"
│
├── H2: 제이미클리닉만의 차별점 [300-350 words]
│   ├── H3: 전문 의료진의 노하우
│   ├── H3: 1:1 맞춤 디자인 시스템
│   └── H3: 최신 장비와 안전 시스템
│
├── H2: 눈 성형 자주 묻는 질문 (FAQ) [500-600 words]
│   ├── Q: 눈 성형 비용은 얼마인가요?
│   ├── Q: 눈 성형 회복기간은 얼마나 걸리나요?
│   ├── Q: 눈 성형 부작용은 없나요?
│   ├── Q: 쌍꺼풀 라인은 어떻게 결정하나요?
│   ├── Q: 재수술도 가능한가요?
│   ├── Q: 나이 제한이 있나요?
│   └── Q: 수술 후 일상생활은 언제부터 가능한가요?
│
├── H2: 눈 성형 시술 과정 [250-300 words]
│   ├── H3: STEP 1. 1:1 맞춤 상담
│   ├── H3: STEP 2. 정밀 검사 및 디자인
│   ├── H3: STEP 3. 안전한 수술 진행
│   └── H3: STEP 4. 체계적인 사후 관리
│
├── H2: 실제 고객 후기 및 전후 사진 [200-250 words]
│   ├── Review snippets (3-5)
│   ├── Before/After gallery teaser
│   └── CTA: "더 많은 후기 보기"
│
└── H2: 상담 예약 안내 [150-200 words]
    ├── H3: 온라인 상담 예약
    ├── H3: 오시는 길
    └── H3: 문의: 02-1234-5678
```

---

## 3. Technical SEO Requirements

### Meta Tags
```html
<title>강남 눈 성형 전문 - 자연스러운 눈매 디자인 | 제이미클리닉</title>
<meta name="description" content="강남 눈 성형 15년 전문의가 만드는 자연스러운 눈매. 쌍꺼풀, 눈매교정, 트임 수술 10,000건 이상. 무료상담 ☎ 02-1234-5678">
<meta name="keywords" content="눈 성형, 강남 눈 성형, 쌍꺼풀 수술, 눈매교정, 앞트임, 뒤트임">
```

### Schema Markup
```json
{
  "@context": "https://schema.org",
  "@type": "MedicalProcedure",
  "name": "눈 성형",
  "procedureType": "Cosmetic",
  "bodyLocation": "Eye",
  "outcome": "Enhanced eye appearance with natural-looking results",
  "preparation": "Consultation and design planning required",
  "followup": "Regular check-ups for 3 months",
  "provider": {
    "@type": "MedicalOrganization",
    "name": "제이미클리닉",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "테헤란로 123",
      "addressLocality": "강남구",
      "addressRegion": "서울특별시",
      "addressCountry": "KR"
    },
    "telephone": "+82-2-1234-5678"
  }
}
```

### Internal Linking Strategy
| From Section | To Page | Anchor Text | Priority |
|-------------|---------|-------------|----------|
| 쌍꺼풀 section | /double-eyelid | "쌍꺼풀 자세히 보기" | High |
| FAQ 비용 | /pricing | "자세한 비용 안내" | High |
| 의료진 section | /doctors | "전문의 프로필 보기" | Medium |
| 후기 section | /reviews | "더 많은 후기 보기" | Medium |
| 상담 section | /consultation | "온라인 상담 신청" | High |

### Performance Targets
- Page Load Speed: < 2.5 seconds
- Mobile Score: > 95/100
- Core Web Vitals: All green
- Image Optimization: WebP format, < 150KB each

---

## 4. Performance KPIs & Tracking

### 30-Day Targets
| Metric | Current | 30-Day Target | Measurement Tool |
|--------|---------|---------------|------------------|
| Primary Keyword Ranking | #8 | #5 | Search Console |
| Organic Traffic | 500/month | 750/month | GA4 |
| Bounce Rate | 65% | 55% | GA4 |
| Avg. Time on Page | 1:30 | 2:30 | GA4 |
| Conversion Rate | 1.5% | 2.5% | GA4 Goals |

### 60-Day Targets
| Metric | Target | Strategy |
|--------|--------|----------|
| Primary Keyword Ranking | #3 | Content expansion + backlinks |
| Organic Traffic | 1,200/month | Long-tail optimization |
| Featured Snippets | 2+ | FAQ schema implementation |
| Page Authority | 25+ | Link building campaign |

### 90-Day Targets
| Metric | Target | Success Indicator |
|--------|--------|-------------------|
| Primary Keyword Ranking | #1-3 | Page 1 dominant position |
| Organic Traffic | 2,000/month | 300% increase |
| Conversion Rate | 3-5% | Industry-leading performance |
| Revenue Attribution | ₩50M+ | Positive ROI |

---

## 5. Implementation Roadmap

### Week 1: Foundation
- [ ] Implement title and meta tags
- [ ] Create H1 and basic structure
- [ ] Optimize hero section
- [ ] Set up tracking codes

### Week 2: Content Development
- [ ] Write all H2 sections
- [ ] Create FAQ content
- [ ] Develop procedure descriptions
- [ ] Add trust signals

### Week 3: Technical Optimization
- [ ] Implement schema markup
- [ ] Optimize images
- [ ] Set up internal linking
- [ ] Mobile optimization

### Week 4: Enhancement
- [ ] Add reviews and testimonials
- [ ] Create before/after gallery
- [ ] Implement CTA variations
- [ ] A/B testing setup

### Month 2-3: Growth
- [ ] Content expansion based on data
- [ ] Link building outreach
- [ ] Naver optimization
- [ ] Conversion rate optimization

---

## 6. Strategic Recommendations

### High Priority Actions
1. **Content Depth**: Create comprehensive 2,500+ word content to outrank competitors
2. **E-A-T Signals**: Emphasize doctor credentials and certifications prominently
3. **Local SEO**: Optimize for "강남 눈 성형" with location-specific content
4. **Mobile UX**: Ensure thumb-friendly CTAs and fast mobile load times

### Quick Wins (Implement Immediately)
1. Add FAQ schema markup for featured snippets
2. Compress and optimize all images
3. Add trust badges above the fold
4. Implement click-to-call for mobile
5. Create location-specific landing pages

### Long-term Strategy
1. Build topical authority with supporting content
2. Develop video content for higher engagement
3. Create interactive tools (virtual consultation)
4. Establish thought leadership through a blog
5. Implement a review generation system

---

## Notes for Implementation

- **Legal Compliance**: Ensure all medical claims are verified and compliant with KFDA regulations
- **Image Rights**: Use only authorized before/after photos with patient consent
- **Pricing Display**: Follow Korean medical advertising laws regarding price disclosure
- **Mobile First**: With 70% of traffic on mobile, prioritize the mobile experience in all decisions
- **Naver Specific**: Create a separate content strategy for Naver Blog integration

---

**End of Strategy Document**
*Generated by SEO Gateway Page Strategist Skill v1.0*

@@ -0,0 +1,160 @@

# Content Architecture Template

## Page Hierarchy Structure

```
[Page URL: /service-name]
│
├── H1: [Primary Keyword-Optimized Headline]
│   Example: "강남 눈 성형 전문의가 만드는 자연스러운 눈매"
│   Word Count Target: 15-25 characters
│   Keyword Placement: Primary keyword at beginning
│
├── Hero Section [Above Fold]
│   ├── Value Proposition (30-50 words)
│   │   └── Keywords: Primary + 1 LSI
│   ├── Trust Signals (3-5 items)
│   │   ├── Certification badges
│   │   ├── Years of experience
│   │   └── Success case count
│   └── Primary CTA
│       └── Text: "무료 상담 신청하기"
│
├── H2: [Service Name] 이란? [Problem/Solution Framework]
│   Word Count: 200-300 words
│   Keywords: Primary (1x), LSI (2-3x)
│   ├── H3: 이런 고민이 있으신가요? [Pain Points]
│   │   ├── Pain point 1 (include LSI keyword)
│   │   ├── Pain point 2 (include LSI keyword)
│   │   └── Pain point 3 (include LSI keyword)
│   └── H3: [Clinic Name]의 솔루션 [Benefits]
│       ├── Benefit 1 (address pain point 1)
│       ├── Benefit 2 (address pain point 2)
│       └── Benefit 3 (address pain point 3)
│
├── H2: [Service Name] 종류 및 방법 [Service Categories]
│   Word Count: 400-500 words total
│   Keywords: Category-specific LSI keywords
│   ├── H3: [Sub-service 1] - [LSI Keyword Variation]
│   │   ├── Description (80-100 words)
│   │   ├── Best for (target audience)
│   │   ├── Duration & Recovery
│   │   └── CTA: "자세히 보기"
│   ├── H3: [Sub-service 2] - [LSI Keyword Variation]
│   │   └── [Same structure as above]
│   └── H3: [Sub-service 3] - [LSI Keyword Variation]
│       └── [Same structure as above]
│
├── H2: [Clinic Name] [Service Name]만의 차별점 [Trust & Authority]
│   Word Count: 300-400 words
│   Keywords: Brand + Primary keyword combinations
│   ├── H3: 전문 의료진 [Doctor Credentials]
│   │   ├── Doctor profile summary
│   │   ├── Specializations
│   │   └── Certifications
│   ├── H3: 검증된 시술 결과 [Success Metrics]
│   │   ├── Headline statistics
│   │   ├── Success rate
│   │   └── Patient satisfaction
│   └── H3: 첨단 장비 및 시설 [Facilities]
│       ├── Equipment descriptions
│       └── Safety protocols
│
├── H2: [Service Name] 자주 묻는 질문 [FAQ Section]
│   Word Count: 500-700 words
│   Keywords: Long-tail question keywords
│   ├── Q1: [Long-tail keyword as question]?
│   │   └── A: [40-60 word answer, keyword in first sentence]
│   ├── Q2: [Price-related question]?
│   │   └── A: [Include "비용" LSI keyword]
│   ├── Q3: [Recovery-related question]?
│   │   └── A: [Include "회복기간" LSI keyword]
│   ├── Q4: [Side-effect question]?
│   │   └── A: [Include "부작용" LSI keyword]
│   ├── Q5: [Process question]?
│   │   └── A: [Include process-related LSI]
│   ├── Q6: [Candidacy question]?
│   │   └── A: [Include target audience keywords]
│   └── Q7: [Results duration question]?
│       └── A: [Include maintenance keywords]
│
├── H2: [Service Name] 시술 과정 [Process Guide]
│   Word Count: 300-400 words
│   Keywords: "과정", "단계", procedural LSI
│   ├── H3: 상담 및 검사 [Consultation]
│   ├── H3: 시술 당일 [Procedure Day]
│   ├── H3: 회복 과정 [Recovery]
│   └── H3: 사후 관리 [Aftercare]
│
├── H2: 실제 고객 후기 [Social Proof]
│   Word Count: 200-300 words
│   Keywords: "후기", "리뷰", satisfaction keywords
│   ├── Review snippet 1
│   ├── Review snippet 2
│   ├── Review snippet 3
│   └── Before/After gallery teaser
│
└── H2: 상담 예약 안내 [Conversion Section]
    Word Count: 150-200 words
    Keywords: CTA-related, location keywords
    ├── H3: 상담 예약 방법
    ├── H3: 오시는 길
    └── H3: 문의 정보
```

## Keyword Density Map

| Section | Primary Keyword | LSI Keywords | Total Keywords |
|---------|----------------|--------------|----------------|
| Hero | 1 | 1-2 | 2-3 |
| Problem/Solution | 1 | 2-3 | 3-4 |
| Service Categories | 1-2 | 4-6 | 5-8 |
| Trust & Authority | 1 | 2-3 | 3-4 |
| FAQ | 2-3 | 5-7 | 7-10 |
| Process | 1 | 2-3 | 3-4 |
| Social Proof | 0-1 | 1-2 | 1-3 |
| Conversion | 1 | 1-2 | 2-3 |
| **Total** | **8-11** | **18-29** | **26-40** |

## Internal Linking Strategy

| From Section | To Page | Anchor Text | Purpose |
|-------------|---------|-------------|---------|
| Service Categories | Sub-service page | [Sub-service name] | Deep dive |
| FAQ | Price page | "비용 안내 페이지" | Conversion |
| Trust section | Doctor profile | "[Doctor name] 원장" | Authority |
| Process section | Consultation form | "상담 예약하기" | Conversion |
| Social proof | Gallery page | "더 많은 전후 사진" | Engagement |

## Content Length Guidelines
|
||||
|
||||
- **Total Page Length**: 2,000-2,500 words
|
||||
- **Above Fold Content**: 100-150 words
|
||||
- **Each H2 Section**: 200-500 words
|
||||
- **Each H3 Subsection**: 80-150 words
|
||||
- **Meta Description**: 150-160 characters
|
||||
- **Image Alt Text**: 10-15 words each
|
||||
|
||||
## Schema Markup Requirements
|
||||
|
||||
```json
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "MedicalProcedure",
|
||||
"name": "[Service Name]",
|
||||
"description": "[Meta description]",
|
||||
"procedureType": "Cosmetic",
|
||||
"provider": {
|
||||
"@type": "MedicalOrganization",
|
||||
"name": "[Clinic Name]"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Mobile Content Adaptation
|
||||
|
||||
- Reduce hero text by 30%
|
||||
- Show 3 FAQs initially (expand for more)
|
||||
- Simplify navigation to single-column
|
||||
- Increase CTA button size
|
||||
- Compress trust signals to carousel
|
||||
@@ -0,0 +1,95 @@
|
||||
# Keyword Research Template

## Primary Keyword Analysis

| Metric | Value | Notes |
|--------|-------|-------|
| **Primary Keyword** | [KEYWORD] | Main target keyword |
| **Monthly Search Volume** | [VOLUME] | Average monthly searches |
| **Keyword Difficulty** | [0-100] | Competition score |
| **Current Ranking** | #[POSITION] | Current SERP position |
| **Search Trend** | ↑ ↓ → | Trending direction |

## LSI Keywords Matrix

| LSI Keyword | Search Volume | Intent Type | Priority |
|------------|--------------|-------------|----------|
| [keyword 1] | [volume] | Informational | High |
| [keyword 2] | [volume] | Transactional | Medium |
| [keyword 3] | [volume] | Comparative | High |
| [keyword 4] | [volume] | Informational | Medium |
| [keyword 5] | [volume] | Transactional | Low |
| [keyword 6] | [volume] | Comparative | High |
| [keyword 7] | [volume] | Informational | Medium |
| [keyword 8] | [volume] | Navigational | Low |
| [keyword 9] | [volume] | Transactional | High |
| [keyword 10] | [volume] | Informational | Medium |

## User Intent Distribution

```
Informational (Research Phase): ___%
- Common queries: "what is", "how to", "benefits of"
- Content needed: Educational guides, FAQs, process explanations

Comparative (Evaluation Phase): ___%
- Common queries: "best", "vs", "reviews", "비교" (comparison)
- Content needed: Comparison tables, reviews, case studies

Transactional (Ready to Convert): ___%
- Common queries: "price", "book", "consultation", "예약" (booking)
- Content needed: CTAs, pricing, booking forms
```

## Long-tail Keyword Opportunities

### Question-based Keywords
- [질문 키워드 1] (question keyword 1)
- [질문 키워드 2] (question keyword 2)
- [질문 키워드 3] (question keyword 3)

### Location-based Keywords
- [지역] (region) + [primary keyword]
- [지역] + [primary keyword] + 잘하는곳 (best place for)
- [지역] + [primary keyword] + 추천 (recommended)

### Modifier-based Keywords
- [primary keyword] + 비용 (cost)
- [primary keyword] + 부작용 (side effects)
- [primary keyword] + 회복기간 (recovery period)
- [primary keyword] + 전후 (before/after)

## Competitor Keyword Analysis

| Competitor | Target Keywords | Ranking Keywords | Gap Opportunities |
|------------|----------------|------------------|-------------------|
| Competitor 1 | [keywords] | [keywords] | [missing keywords] |
| Competitor 2 | [keywords] | [keywords] | [missing keywords] |
| Competitor 3 | [keywords] | [keywords] | [missing keywords] |

## Seasonal Trends

| Month | Search Volume | Events/Factors |
|-------|--------------|----------------|
| January | [volume] | New year resolutions |
| February | [volume] | [factor] |
| March | [volume] | [factor] |
| ... | ... | ... |

## Platform-Specific Keywords

### Naver-Optimized
- [Naver-specific keyword 1]
- [Naver-specific keyword 2]

### Google-Optimized
- [Google-specific keyword 1]
- [Google-specific keyword 2]

## Action Items

- [ ] Target primary keyword in H1 and title tag
- [ ] Include 3-5 LSI keywords naturally in content
- [ ] Create content matching the user intent distribution
- [ ] Optimize for question-based featured snippets
- [ ] Add location modifiers for local SEO
@@ -0,0 +1,239 @@
# SEO Technical Checklist Template

## Meta Tags Optimization

### Title Tag
- [ ] Length: 50-60 characters
- [ ] Primary keyword at the beginning
- [ ] Brand name at the end
- [ ] Unique for each page
- [ ] Formula: `[Primary Keyword] - [Value Proposition] | [Brand]`

**Template**: `{primary_keyword} 전문 - {unique_value} | {clinic_name}`
**Example**: `눈 성형 전문 - 자연스러운 라인 | 제이미클리닉`

### Meta Description
- [ ] Length: 150-160 characters
- [ ] Include primary keyword
- [ ] Include 1-2 LSI keywords
- [ ] Clear CTA
- [ ] Unique for each page

**Template**: `{location} {primary_keyword} 전문의가 {benefit}. {credential}. 무료상담 ☎ {phone}`
**Example**: `강남 눈 성형 전문의가 자연스러운 눈매를 디자인합니다. 15년 경력, 10,000건 시술. 무료상담 ☎ 02-1234-5678`
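These templates can be filled and length-checked in one step. A minimal sketch, assuming a hypothetical `build_meta` helper and field names that are not part of the actual toolkit:

```python
# Sketch: fill the title/meta templates above and flag length violations.
# build_meta and its field names are illustrative assumptions, not real APIs.

def build_meta(fields: dict) -> dict:
    title = "{primary_keyword} 전문 - {unique_value} | {clinic_name}".format(**fields)
    description = (
        "{location} {primary_keyword} 전문의가 {benefit}. "
        "{credential}. 무료상담 ☎ {phone}"
    ).format(**fields)
    warnings = []
    if not 50 <= len(title) <= 60:
        warnings.append(f"title is {len(title)} chars (target 50-60)")
    if not 150 <= len(description) <= 160:
        warnings.append(f"description is {len(description)} chars (target 150-160)")
    return {"title": title, "description": description, "warnings": warnings}

meta = build_meta({
    "primary_keyword": "눈 성형", "unique_value": "자연스러운 라인",
    "clinic_name": "제이미클리닉", "location": "강남",
    "benefit": "자연스러운 눈매를 디자인합니다",
    "credential": "15년 경력, 10,000건 시술", "phone": "02-1234-5678",
})
```

Note that the 50-60 character target assumes Latin text; a Korean title is far more compact, and the separate Naver guideline of under 30 Korean characters applies, so any automated length check should be script-aware.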

### Open Graph Tags
```html
<meta property="og:title" content="{page_title}">
<meta property="og:description" content="{meta_description}">
<meta property="og:image" content="{featured_image_url}">
<meta property="og:url" content="{page_url}">
<meta property="og:type" content="website">
<meta property="og:locale" content="ko_KR">
```

## Header Tags Structure

- [ ] Only one H1 per page
- [ ] H1 contains primary keyword
- [ ] H2 tags for main sections (5-7)
- [ ] H3 tags for subsections
- [ ] Logical hierarchy maintained
- [ ] Keywords distributed naturally

## Content Optimization

### Keyword Density
- [ ] Primary keyword: 2-3% (20-30 times per 1,000 words)
- [ ] LSI keywords: 1-2% each
- [ ] Natural placement (no stuffing)
- [ ] Synonyms and variations used
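The density targets above can be spot-checked with a rough word-count pass. A sketch under the assumption that whitespace tokenization is good enough for a draft check (Korean copy would need a proper morphological tokenizer):

```python
# Sketch: approximate keyword density = keyword occurrences / total words.
# Whitespace splitting is an assumption; it undercounts in unspaced text.

def keyword_density(text: str, keyword: str) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    hits = text.lower().count(keyword.lower())
    return 100.0 * hits / len(words)

draft = "laser hair removal in gangnam " * 50  # 250 words, keyword 50 times
density = keyword_density(draft, "laser hair removal")
# 50 hits over 250 words is 20% -- far above the 2-3% target, so this
# draft would be flagged as keyword stuffing.
```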

### Content Structure
- [ ] First 100 words include primary keyword
- [ ] Short paragraphs (3-4 sentences)
- [ ] Bullet points and lists
- [ ] Bold important keywords (sparingly)
- [ ] Internal links: 5-10
- [ ] External links: 2-3 (authoritative)

## Schema Markup

### Medical Procedure Schema
```json
{
  "@context": "https://schema.org",
  "@type": "MedicalProcedure",
  "name": "{procedure_name}",
  "procedureType": "Cosmetic",
  "bodyLocation": "{body_part}",
  "outcome": "{expected_outcome}",
  "preparation": "{preparation_required}",
  "followup": "{followup_care}",
  "provider": {
    "@type": "MedicalOrganization",
    "name": "{clinic_name}",
    "address": {
      "@type": "PostalAddress",
      "streetAddress": "{street}",
      "addressLocality": "{city}",
      "addressCountry": "KR"
    }
  }
}
```

### FAQ Schema
```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "{question}",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "{answer}"
    }
  }]
}
```
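For pages with many FAQs, the `mainEntity` array can be generated from question/answer pairs rather than written by hand. A minimal sketch (the `faq_schema` helper is hypothetical, not part of the toolkit):

```python
import json

# Sketch: build a schema.org FAQPage object from (question, answer) pairs
# and wrap it in the JSON-LD script tag that goes in the page head.
def faq_schema(pairs):
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

schema = faq_schema([
    ("시술 시간은 얼마나 걸리나요?", "부위에 따라 15-60분 정도 소요됩니다."),
    ("회복 기간이 필요한가요?", "별도의 회복 기간 없이 일상 생활이 가능합니다."),
])
markup = ('<script type="application/ld+json">'
          + json.dumps(schema, ensure_ascii=False)
          + "</script>")
```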

## Image Optimization

- [ ] Descriptive file names: `eye-surgery-before-after-case1.jpg`
- [ ] Alt text with keywords: `눈 성형 전후 사진 - 30대 여성 사례` (eye surgery before/after photo, woman in her 30s)
- [ ] Compressed file size (< 200KB)
- [ ] WebP format with fallback
- [ ] Lazy loading implemented
- [ ] Image sitemap created

## Performance Optimization

### Page Speed
- [ ] Load time < 3 seconds
- [ ] First Contentful Paint < 1.8s
- [ ] Time to Interactive < 3.8s
- [ ] Total page size < 3MB
- [ ] Requests minimized (< 50)

### Core Web Vitals
- [ ] LCP (Largest Contentful Paint) < 2.5s
- [ ] FID (First Input Delay) < 100ms
- [ ] CLS (Cumulative Layout Shift) < 0.1

## Mobile Optimization

- [ ] Mobile-responsive design
- [ ] Viewport meta tag set
- [ ] Touch-friendly buttons (44x44px minimum)
- [ ] Readable font size (16px minimum)
- [ ] No horizontal scrolling
- [ ] Mobile page speed < 3s

## URL Structure

- [ ] SEO-friendly URL: `/eye-surgery` or `/눈-성형`
- [ ] No special characters
- [ ] Lowercase only
- [ ] Hyphens for word separation
- [ ] Under 60 characters
- [ ] Include primary keyword
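The URL rules above are essentially a slugify step. A minimal sketch, assuming Korean slugs such as `/눈-성형` are kept unromanized:

```python
import re

# Sketch: derive an SEO-friendly slug per the checklist --
# lowercase, hyphen-separated, no special characters, under 60 chars.
def slugify(title: str, max_len: int = 60) -> str:
    slug = title.lower().strip()
    slug = re.sub(r"[^\w\s-]", "", slug)   # drop special characters
    slug = re.sub(r"[\s_]+", "-", slug)    # whitespace/underscores -> hyphens
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    return slug[:max_len].rstrip("-")

print(slugify("Eye Surgery: Before & After!"))  # eye-surgery-before-after
print(slugify("눈 성형"))                        # 눈-성형
```

Because `\w` matches Hangul in Python 3, Korean titles pass through intact; a romanization step could be swapped in if ASCII-only URLs are preferred.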

## Internal Linking

| From Page | To Page | Anchor Text | Purpose |
|-----------|---------|-------------|---------|
| Gateway | Service Detail | {service_name} | Deep content |
| Gateway | Doctor Profile | {doctor_name} 원장 (Dr. {doctor_name}) | Authority |
| Gateway | Pricing | 비용 안내 (pricing guide) | Conversion |
| Gateway | Gallery | 시술 전후 사진 (before/after photos) | Engagement |
| Gateway | Contact | 상담 예약 (book a consultation) | Conversion |

## Naver-Specific Optimization

### Naver Webmaster Tools
- [ ] Site verification complete
- [ ] XML sitemap submitted
- [ ] Robots.txt configured
- [ ] Syndication feed active
- [ ] Site optimization report reviewed

### Naver SEO Elements
- [ ] Title under 30 Korean characters
- [ ] C-Rank tags implemented
- [ ] Image-to-text ratio optimized (40:60)
- [ ] Outbound links minimized
- [ ] Brand search optimization

## Tracking & Analytics

- [ ] Google Analytics 4 installed
- [ ] Google Search Console verified
- [ ] Naver Analytics installed
- [ ] Conversion tracking configured
- [ ] Event tracking for CTAs
- [ ] Heatmap tool installed

## Security & Technical

- [ ] SSL certificate active (HTTPS)
- [ ] WWW/non-WWW redirect configured
- [ ] 404 error page customized
- [ ] XML sitemap generated
- [ ] Robots.txt optimized
- [ ] Canonical URLs set
- [ ] Hreflang tags (if multi-language)

## Quality Checks

### Content Quality
- [ ] No spelling/grammar errors
- [ ] Medical information accurate
- [ ] Legal compliance verified
- [ ] Contact information correct
- [ ] CTAs working properly

### Cross-browser Testing
- [ ] Chrome (Desktop/Mobile)
- [ ] Safari (Desktop/Mobile)
- [ ] Firefox
- [ ] Samsung Internet
- [ ] Naver Whale

## Monthly Monitoring Tasks

- [ ] Keyword ranking check
- [ ] Organic traffic analysis
- [ ] Bounce rate monitoring
- [ ] Conversion rate tracking
- [ ] Competitor analysis
- [ ] Content freshness update
- [ ] Broken link check
- [ ] Page speed test

## Priority Levels

1. **Critical (Day 1)**
   - Title and meta tags
   - H1 optimization
   - Mobile responsiveness
   - Page speed < 4s

2. **High (Week 1)**
   - Schema markup
   - Internal linking
   - Image optimization
   - Content optimization

3. **Medium (Week 2-3)**
   - Naver optimization
   - FAQ implementation
   - Social proof elements
   - Analytics setup

4. **Low (Month 2)**
   - A/B testing
   - Advanced schema
   - Link building
   - Content expansion
234
custom-skills/18-seo-gateway-builder/README.md
Normal file
@@ -0,0 +1,234 @@

# Gateway Page Content Builder

A comprehensive Claude Skill for generating SEO-optimized gateway pages for local services, medical practices, and businesses.

## 🎯 Purpose

This skill provides a systematic framework for creating high-quality gateway pages that:
- Target location + service keyword combinations
- Scale content creation while maintaining uniqueness
- Include proper technical SEO elements
- Generate structured data and schema markup

## 📁 Structure

```
gateway-page-content-builder/
├── SKILL.md                  # Main skill documentation (REQUIRED)
├── README.md                 # This file
├── templates/                # Page templates
│   ├── gateway-page-medical.md
│   ├── gateway-page-beauty.md
│   └── gateway-page-dental.md
├── scripts/                  # Automation scripts
│   ├── generate_pages.py
│   └── keyword_research.py
├── config/                   # Configuration files
│   ├── locations.json
│   ├── services.json
│   └── brand.json
└── examples/                 # Example outputs
    └── gangnam-laser-hair-removal.md
```

## 🚀 Quick Start

### 1. Basic Usage with Claude

Simply ask Claude to use this skill:
```
"Use the Gateway Page Content Builder skill to create a page for laser hair removal in Gangnam"
```

### 2. Batch Generation with Python

```python
from scripts.generate_pages import GatewayPageGenerator, Brand, Location, Service

# Set up your brand
brand = Brand(
    name_en="Your Clinic",
    name_kr="클리닉명",
    website="www.yourclinic.com",
    # ... other details
)

# Define locations and services
locations = [...]  # Your location list
services = [...]   # Your service list

# Generate pages
generator = GatewayPageGenerator(brand)
generator.generate_batch(services, locations)
```

### 3. Using Templates

Templates are in Markdown format with placeholders:
- `[Service]` - Service name
- `[Location]` - Location name
- `[Brand]` - Brand/clinic name
- Additional custom placeholders

## 📝 Content Strategy

### Keyword Formula
```
Primary: [Service] + [Location]
Secondary: [Location] + [Service] + "clinic/center"
Long-tail: "best" + [Service] + "in" + [Location]
```

### Content Uniqueness
Each page should have:
- 30% minimum unique content
- Local landmarks and transportation
- Location-specific testimonials
- Regional statistics or demographics

### Page Length Guidelines
- Minimum: 800 words
- Optimal: 1,200-1,500 words
- Maximum: 2,000 words

## 🎨 Customization

### Adding New Templates

1. Create a new template in the `templates/` directory
2. Use consistent placeholder naming
3. Include a schema markup section
4. Add it to the configuration

### Extending Services/Locations

Edit the JSON configuration files:

`config/services.json`:
```json
{
  "services": [
    {
      "id": "new-service",
      "name_en": "New Service",
      "name_kr": "새로운 서비스",
      "keywords": ["keyword1", "keyword2"]
    }
  ]
}
```

`config/locations.json`:
```json
{
  "locations": [
    {
      "id": "new-location",
      "name_en": "New Location",
      "name_kr": "새로운 지역",
      "landmarks": ["Landmark 1", "Landmark 2"]
    }
  ]
}
```

## 🔧 Technical Implementation

### URL Structure
```
/[location]/[service]/
Example: /gangnam/laser-hair-removal/
```

### Required Meta Tags
- Title (60 characters max)
- Description (155 characters max)
- Canonical URL
- Open Graph tags
- Schema markup

### Schema Types
- MedicalBusiness
- LocalBusiness
- Service
- Review
- FAQPage

## 📊 Performance Tracking

### Key Metrics
- Organic traffic by page
- Keyword rankings
- Conversion rate
- Bounce rate
- Time on page

### A/B Testing Elements
- Headlines
- CTA buttons
- Trust signals
- Content length

## 🔄 Content Refresh Schedule

- **Weekly**: Review the top 10% of pages
- **Monthly**: Update testimonials
- **Quarterly**: Refresh statistics
- **Annually**: Full content audit

## 🛠️ Troubleshooting

### Common Issues

1. **Duplicate Content Warnings**
   - Ensure the 30% unique-content minimum
   - Use content variation patterns
   - Add local elements

2. **Low Rankings**
   - Check keyword density (2-3%)
   - Verify schema markup
   - Add more local signals

3. **Poor Conversion**
   - Test CTA placement
   - Add trust signals
   - Improve page speed

## 📚 Resources

### SEO Tools
- Ahrefs / SEMrush (keyword research)
- Screaming Frog (technical audit)
- Google Search Console (performance)
- Schema.org (structured data)

### Content Tools
- Grammarly (grammar check)
- Hemingway (readability)
- Copyscape (plagiarism)

## 🤝 Contributing

To improve this skill:
1. Test templates with real data
2. Share performance metrics
3. Suggest new features
4. Report issues

## 📄 License

This skill is provided as-is for use with Claude Desktop and Claude Projects.

## 🆘 Support

For questions or assistance:
- Review SKILL.md for detailed documentation
- Check the examples/ directory for samples
- Test with the Python script first

---

**Version**: 1.0.0
**Last Updated**: November 2024
**Author**: OurDigital Dr.D
82
custom-skills/18-seo-gateway-builder/code/CLAUDE.md
Normal file
@@ -0,0 +1,82 @@

# CLAUDE.md

## Overview

Gateway page content generator for local services. Creates SEO-optimized pages from location/service configurations.

## Quick Start

```bash
# Generate pages from config
python scripts/generate_pages.py --config config/services.json --locations config/locations.json
```

## Scripts

| Script | Purpose |
|--------|---------|
| `generate_pages.py` | Generate gateway pages from templates |

## Page Generator

```bash
# Generate all combinations
python scripts/generate_pages.py \
    --config config/services.json \
    --locations config/locations.json \
    --output ./pages

# Single service/location
python scripts/generate_pages.py \
    --service "laser_hair_removal" \
    --location "gangnam" \
    --template templates/gateway-page-medical.md
```

## Configuration Files

### services.json
```json
{
  "services": [
    {
      "id": "laser_hair_removal",
      "korean": "레이저 제모",
      "keywords": ["laser hair removal", "permanent hair removal"]
    }
  ]
}
```

### locations.json
```json
{
  "locations": [
    {
      "id": "gangnam",
      "korean": "강남",
      "full_address": "서울특별시 강남구"
    }
  ]
}
```

## Templates

- `templates/gateway-page-medical.md` - Medical service template
- Supports variables: `{{service}}`, `{{location}}`, `{{brand}}`
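Rendering these `{{...}}` placeholders is a simple substitution pass. A sketch of the idea (how `generate_pages.py` actually renders templates is an assumption here):

```python
import re

# Sketch: substitute {{variable}} placeholders in a template string,
# failing loudly on any placeholder without a supplied value.
# This is an illustrative stand-in, not the real generate_pages.py logic.
def render_template(template: str, variables: dict) -> str:
    def replace(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"template variable not provided: {key}")
        return variables[key]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)

page = render_template(
    "# {{service}} in {{location}} | {{brand}}",
    {"service": "Laser Hair Removal", "location": "Gangnam", "brand": "Your Clinic"},
)
print(page)  # # Laser Hair Removal in Gangnam | Your Clinic
```

Raising on a missing variable (rather than leaving the placeholder in place) makes half-rendered pages fail fast instead of shipping with visible `{{...}}` artifacts.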

## Output

Generates markdown files with:
- SEO-optimized title and meta
- Structured content sections
- Schema markup recommendations
- Internal linking suggestions

## Workflow

1. Get the strategy from `17-seo-gateway-architect`
2. Configure services and locations
3. Run the generator for content drafts
4. Review and customize the output
235
custom-skills/18-seo-gateway-builder/code/config/locations.json
Normal file
@@ -0,0 +1,235 @@

{
  "locations": [
    {
      "id": "gangnam",
      "name_en": "Gangnam",
      "name_kr": "강남",
      "name_cn": "江南",
      "district": "Gangnam-gu",
      "city": "Seoul",
      "full_address_kr": "서울특별시 강남구 테헤란로 123",
      "full_address_en": "123 Teheran-ro, Gangnam-gu, Seoul",
      "postal_code": "06234",
      "latitude": 37.4979,
      "longitude": 127.0276,
      "landmarks": [
        "COEX Mall",
        "Gangnam Station",
        "Samsung Station",
        "Bongeunsa Temple"
      ],
      "subway_stations": [
        {
          "name": "Gangnam Station",
          "lines": ["Line 2", "Sinbundang Line"],
          "exit": "Exit 11",
          "walking_time": "3 minutes"
        },
        {
          "name": "Sinnonhyeon Station",
          "lines": ["Line 9"],
          "exit": "Exit 6",
          "walking_time": "5 minutes"
        }
      ],
      "bus_routes": ["146", "341", "360", "740"],
      "demographics": "Young professionals, affluent residents, business district",
      "population": "approximately 570,000",
      "average_age": "38.5",
      "key_industries": ["IT", "Finance", "Medical", "Education"],
      "nearby_hospitals": ["Samsung Medical Center", "Gangnam Severance Hospital"],
      "parking": {
        "available": true,
        "type": "Underground",
        "capacity": 50,
        "fee": "Free for patients"
      }
    },
    {
      "id": "myeongdong",
      "name_en": "Myeongdong",
      "name_kr": "명동",
      "name_cn": "明洞",
      "district": "Jung-gu",
      "city": "Seoul",
      "full_address_kr": "서울특별시 중구 명동길 456",
      "full_address_en": "456 Myeongdong-gil, Jung-gu, Seoul",
      "postal_code": "04537",
      "latitude": 37.5636,
      "longitude": 126.9869,
      "landmarks": [
        "Myeongdong Cathedral",
        "Lotte Department Store",
        "Shinsegae Department Store",
        "N Seoul Tower"
      ],
      "subway_stations": [
        {
          "name": "Myeongdong Station",
          "lines": ["Line 4"],
          "exit": "Exit 8",
          "walking_time": "2 minutes"
        },
        {
          "name": "Euljiro 1-ga Station",
          "lines": ["Line 2"],
          "exit": "Exit 5",
          "walking_time": "7 minutes"
        }
      ],
      "bus_routes": ["104", "421", "463", "507"],
      "demographics": "Tourists, young shoppers, office workers",
      "population": "approximately 13,000",
      "average_age": "42.3",
      "key_industries": ["Retail", "Tourism", "Banking", "Hospitality"],
      "nearby_hospitals": ["Seoul Medical Center", "Asan Medical Center"],
      "parking": {
        "available": true,
        "type": "Building parking",
        "capacity": 30,
        "fee": "2,000 KRW per hour"
      }
    },
    {
      "id": "hongdae",
      "name_en": "Hongdae",
      "name_kr": "홍대",
      "name_cn": "弘大",
      "district": "Mapo-gu",
      "city": "Seoul",
      "full_address_kr": "서울특별시 마포구 와우산로 789",
      "full_address_en": "789 Wausan-ro, Mapo-gu, Seoul",
      "postal_code": "04104",
      "latitude": 37.5563,
      "longitude": 126.9220,
      "landmarks": [
        "Hongik University",
        "Hongdae Shopping Street",
        "Sangsang Madang",
        "Hongdae Free Market"
      ],
      "subway_stations": [
        {
          "name": "Hongik University Station",
          "lines": ["Line 2", "Airport Railroad", "Gyeongui-Jungang Line"],
          "exit": "Exit 9",
          "walking_time": "1 minute"
        },
        {
          "name": "Sangsu Station",
          "lines": ["Line 6"],
          "exit": "Exit 1",
          "walking_time": "8 minutes"
        }
      ],
      "bus_routes": ["271", "602", "603", "760"],
      "demographics": "University students, young adults, artists, indie culture",
      "population": "approximately 385,000",
      "average_age": "35.2",
      "key_industries": ["Entertainment", "Arts", "Cafes", "Fashion"],
      "nearby_hospitals": ["Severance Hospital", "Sinchon Severance Hospital"],
      "parking": {
        "available": false,
        "type": "Street parking only",
        "capacity": 0,
        "fee": "Public parking nearby"
      }
    },
    {
      "id": "apgujeong",
      "name_en": "Apgujeong",
      "name_kr": "압구정",
      "name_cn": "狎鸥亭",
      "district": "Gangnam-gu",
      "city": "Seoul",
      "full_address_kr": "서울특별시 강남구 압구정로 321",
      "full_address_en": "321 Apgujeong-ro, Gangnam-gu, Seoul",
      "postal_code": "06001",
      "latitude": 37.5311,
      "longitude": 127.0403,
      "landmarks": [
        "Galleria Department Store",
        "Apgujeong Rodeo Street",
        "Dosan Park",
        "K-Star Road"
      ],
      "subway_stations": [
        {
          "name": "Apgujeong Station",
          "lines": ["Line 3"],
          "exit": "Exit 2",
          "walking_time": "4 minutes"
        },
        {
          "name": "Apgujeong Rodeo Station",
          "lines": ["Suin-Bundang Line"],
          "exit": "Exit 5",
          "walking_time": "3 minutes"
        }
      ],
      "bus_routes": ["145", "301", "362", "472"],
      "demographics": "High-income residents, luxury shoppers, K-beauty enthusiasts",
      "population": "approximately 24,000",
      "average_age": "41.7",
      "key_industries": ["Luxury retail", "Beauty", "Plastic surgery", "Fashion"],
      "nearby_hospitals": ["Chung-Ang University Hospital", "Seoul St. Mary's Hospital"],
      "parking": {
        "available": true,
        "type": "Valet service",
        "capacity": 40,
        "fee": "Complimentary for VIP patients"
      }
    },
    {
      "id": "jamsil",
      "name_en": "Jamsil",
      "name_kr": "잠실",
      "name_cn": "蚕室",
      "district": "Songpa-gu",
      "city": "Seoul",
      "full_address_kr": "서울특별시 송파구 올림픽로 240",
      "full_address_en": "240 Olympic-ro, Songpa-gu, Seoul",
      "postal_code": "05554",
      "latitude": 37.5132,
      "longitude": 127.1001,
      "landmarks": [
        "Lotte World Tower",
        "Lotte World",
        "Seokchon Lake",
        "Olympic Park"
      ],
      "subway_stations": [
        {
          "name": "Jamsil Station",
          "lines": ["Line 2", "Line 8"],
          "exit": "Exit 4",
          "walking_time": "5 minutes"
        },
        {
          "name": "Jamsilsaenae Station",
          "lines": ["Line 2"],
          "exit": "Exit 9",
          "walking_time": "10 minutes"
        }
      ],
      "bus_routes": ["2415", "3217", "3313", "3414"],
      "demographics": "Families, middle-class residents, sports enthusiasts",
      "population": "approximately 660,000",
      "average_age": "40.1",
      "key_industries": ["Retail", "Entertainment", "Sports", "Tourism"],
      "nearby_hospitals": ["Asan Medical Center", "Seoul Paik Hospital"],
      "parking": {
        "available": true,
        "type": "Multi-level parking",
        "capacity": 100,
        "fee": "First 2 hours free"
      }
    }
  ],
  "metadata": {
    "last_updated": "2024-11-19",
    "total_locations": 5,
    "coverage_area": "Seoul Metropolitan Area",
    "coordinate_system": "WGS84"
  }
}
308
custom-skills/18-seo-gateway-builder/code/config/services.json
Normal file
@@ -0,0 +1,308 @@

{
  "services": [
    {
      "id": "laser-hair-removal",
      "name_en": "Laser Hair Removal",
      "name_kr": "레이저 제모",
      "name_cn": "激光脱毛",
      "category": "Dermatology",
      "subcategory": "Aesthetic Dermatology",
      "description": "Permanent hair reduction using state-of-the-art laser technology",
      "detailed_description": "Our laser hair removal treatment uses FDA-approved diode and Alexandrite lasers to target hair follicles, providing long-lasting hair reduction with minimal discomfort. Suitable for all skin types.",
      "keywords": [
        "permanent hair removal",
        "laser hair removal",
        "IPL hair removal",
        "diode laser",
        "hair reduction",
        "unwanted hair",
        "smooth skin"
      ],
      "procedure_time": "15-60 minutes",
      "sessions_required": "6-8 sessions",
      "recovery_time": "No downtime",
      "results_visible": "After 2-3 sessions",
      "price_range": "₩₩₩",
      "price_details": {
        "small_area": "50,000-100,000 KRW",
        "medium_area": "100,000-200,000 KRW",
        "large_area": "200,000-400,000 KRW"
      },
      "target_areas": [
        "Face",
        "Underarms",
        "Arms",
        "Legs",
        "Bikini line",
        "Back",
        "Chest"
      ],
      "benefits": [
        "Permanent hair reduction",
        "Smoother skin",
        "No more razor burns",
        "Time-saving",
        "Cost-effective long-term"
      ],
      "side_effects": [
        "Temporary redness",
        "Mild swelling",
        "Skin sensitivity"
      ],
      "contraindications": [
        "Pregnancy",
        "Active skin infections",
        "Recent sun exposure",
        "Certain medications"
      ],
      "seo_priority": "high",
      "search_volume": 14800,
      "competition": "medium"
    },
    {
      "id": "botox",
      "name_en": "Botox Treatment",
      "name_kr": "보톡스",
      "name_cn": "肉毒素",
      "category": "Cosmetic Dermatology",
      "subcategory": "Injectable Treatments",
      "description": "FDA-approved botulinum toxin for wrinkle reduction and facial rejuvenation",
      "detailed_description": "Botox injections temporarily relax facial muscles to reduce the appearance of fine lines and wrinkles. Our experienced practitioners use precise injection techniques for natural-looking results.",
      "keywords": [
        "botox",
        "botulinum toxin",
        "wrinkle treatment",
        "anti-aging",
        "facial rejuvenation",
        "crow's feet",
        "forehead lines",
        "frown lines"
      ],
      "procedure_time": "15-30 minutes",
      "sessions_required": "1 session, repeat every 3-6 months",
      "recovery_time": "No downtime",
      "results_visible": "3-7 days",
      "price_range": "₩₩₩₩",
      "price_details": {
        "per_unit": "10,000-15,000 KRW",
        "full_face": "300,000-500,000 KRW",
        "single_area": "150,000-250,000 KRW"
      },
      "target_areas": [
        "Forehead lines",
        "Crow's feet",
        "Frown lines",
        "Bunny lines",
        "Lip lines",
        "Chin dimpling",
        "Jawline slimming"
      ],
      "benefits": [
        "Reduces wrinkles",
        "Prevents new lines",
        "Non-surgical",
        "Quick procedure",
        "Natural results"
      ],
      "side_effects": [
        "Temporary bruising",
        "Mild headache",
        "Drooping eyelid (rare)"
      ],
      "contraindications": [
        "Pregnancy/nursing",
        "Neuromuscular disorders",
        "Allergy to ingredients",
        "Active infection at injection site"
      ],
      "seo_priority": "high",
      "search_volume": 22200,
      "competition": "high"
    },
    {
      "id": "dermal-filler",
      "name_en": "Dermal Filler",
      "name_kr": "필러",
      "name_cn": "玻尿酸",
      "category": "Cosmetic Dermatology",
      "subcategory": "Injectable Treatments",
      "description": "Hyaluronic acid fillers for volume restoration and facial contouring",
      "detailed_description": "Our dermal filler treatments use premium hyaluronic acid products to restore volume, enhance facial contours, and smooth deep wrinkles. Results are immediate and can last 6-18 months.",
      "keywords": [
        "dermal filler",
        "hyaluronic acid",
        "facial volume",
        "lip filler",
        "cheek filler",
        "nasolabial folds",
        "facial contouring"
      ],
      "procedure_time": "30-45 minutes",
      "sessions_required": "1 session, touch-ups as needed",
      "recovery_time": "24-48 hours",
      "results_visible": "Immediate",
      "price_range": "₩₩₩₩",
      "price_details": {
        "per_syringe": "300,000-600,000 KRW",
        "lip_augmentation": "400,000-700,000 KRW",
        "full_face": "1,500,000-3,000,000 KRW"
      },
      "target_areas": [
        "Lips",
        "Cheeks",
        "Under eyes",
        "Nasolabial folds",
        "Marionette lines",
        "Jawline",
        "Temples"
      ],
      "benefits": [
        "Instant results",
        "Restores volume",
        "Smooths wrinkles",
        "Enhances contours",
        "Reversible if needed"
      ],
      "side_effects": [
        "Swelling",
        "Bruising",
        "Tenderness",
        "Lumps (temporary)"
      ],
      "contraindications": [
        "Pregnancy",
        "Blood clotting disorders",
        "Active cold sores",
        "Allergy to lidocaine"
      ],
      "seo_priority": "high",
      "search_volume": 18100,
      "competition": "high"
    },
    {
      "id": "chemical-peel",
      "name_en": "Chemical Peel",
      "name_kr": "케미컬 필링"
|
||||
"name_cn": "化学换肤",
|
||||
"category": "Medical Aesthetics",
|
||||
"subcategory": "Skin Resurfacing",
|
||||
"description": "Professional chemical exfoliation for skin renewal and rejuvenation",
|
||||
"detailed_description": "Chemical peels use carefully selected acids to remove damaged skin layers, revealing fresher, younger-looking skin. We offer light, medium, and deep peels customized to your skin concerns.",
|
||||
"keywords": [
|
||||
"chemical peel",
|
||||
"skin resurfacing",
|
||||
"glycolic acid",
|
||||
"TCA peel",
|
||||
"skin renewal",
|
||||
"acne scars",
|
||||
"pigmentation treatment"
|
||||
],
|
||||
"procedure_time": "30-45 minutes",
|
||||
"sessions_required": "3-6 sessions",
|
||||
"recovery_time": "3-7 days",
|
||||
"results_visible": "After peeling completes",
|
||||
"price_range": "₩₩",
|
||||
"price_details": {
|
||||
"light_peel": "100,000-200,000 KRW",
|
||||
"medium_peel": "300,000-500,000 KRW",
|
||||
"deep_peel": "700,000-1,000,000 KRW"
|
||||
},
|
||||
"target_areas": [
|
||||
"Full face",
|
||||
"Neck",
|
||||
"Décolletage",
|
||||
"Hands",
|
||||
"Back"
|
||||
],
|
||||
"benefits": [
|
||||
"Improves texture",
|
||||
"Reduces pigmentation",
|
||||
"Minimizes pores",
|
||||
"Treats acne",
|
||||
"Stimulates collagen"
|
||||
],
|
||||
"side_effects": [
|
||||
"Peeling",
|
||||
"Redness",
|
||||
"Sensitivity",
|
||||
"Temporary darkening"
|
||||
],
|
||||
"contraindications": [
|
||||
"Active herpes",
|
||||
"Isotretinoin use",
|
||||
"Pregnancy",
|
||||
"Open wounds"
|
||||
],
|
||||
"seo_priority": "medium",
|
||||
"search_volume": 8100,
|
||||
"competition": "medium"
|
||||
},
|
||||
{
|
||||
"id": "acne-treatment",
|
||||
"name_en": "Acne Treatment",
|
||||
"name_kr": "여드름 치료",
|
||||
"name_cn": "痤疮治疗",
|
||||
"category": "Medical Dermatology",
|
||||
"subcategory": "Acne Management",
|
||||
"description": "Comprehensive medical treatment for acne and acne scarring",
|
||||
"detailed_description": "Our acne treatment program combines medical-grade therapies including laser treatments, chemical peels, and prescription medications to clear active acne and prevent future breakouts.",
|
||||
"keywords": [
|
||||
"acne treatment",
|
||||
"acne scars",
|
||||
"pimple treatment",
|
||||
"blackheads",
|
||||
"whiteheads",
|
||||
"cystic acne",
|
||||
"hormonal acne"
|
||||
],
|
||||
"procedure_time": "45-60 minutes",
|
||||
"sessions_required": "4-8 sessions",
|
||||
"recovery_time": "Minimal",
|
||||
"results_visible": "4-6 weeks",
|
||||
"price_range": "₩₩₩",
|
||||
"price_details": {
|
||||
"consultation": "50,000 KRW",
|
||||
"basic_treatment": "150,000-250,000 KRW",
|
||||
"laser_therapy": "300,000-500,000 KRW"
|
||||
},
|
||||
"target_areas": [
|
||||
"Face",
|
||||
"Back",
|
||||
"Chest",
|
||||
"Shoulders"
|
||||
],
|
||||
"benefits": [
|
||||
"Clears acne",
|
||||
"Prevents scarring",
|
||||
"Reduces inflammation",
|
||||
"Improves skin texture",
|
||||
"Boosts confidence"
|
||||
],
|
||||
"side_effects": [
|
||||
"Initial purging",
|
||||
"Dryness",
|
||||
"Mild irritation",
|
||||
"Sun sensitivity"
|
||||
],
|
||||
"contraindications": [
|
||||
"Pregnancy (certain medications)",
|
||||
"Photosensitivity disorders",
|
||||
"Active infections"
|
||||
],
|
||||
"seo_priority": "high",
|
||||
"search_volume": 12100,
|
||||
"competition": "medium"
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"last_updated": "2024-11-19",
|
||||
"total_services": 5,
|
||||
"categories": [
|
||||
"Dermatology",
|
||||
"Cosmetic Dermatology",
|
||||
"Medical Aesthetics",
|
||||
"Medical Dermatology"
|
||||
],
|
||||
"price_currency": "KRW"
|
||||
}
|
||||
}
|
||||
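The service records above are plain JSON and lend themselves to programmatic use. A minimal sketch of one such use — picking the high-priority services for page generation, busiest first — with a miniature copy of the data inlined for illustration (the data file's name and loading path are not shown in this diff, so none are assumed):

```python
import json

# A miniature version of the services database above. The structure is
# assumed from the fragment: a top-level "services" list plus "metadata".
SERVICES_JSON = """
{
  "services": [
    {"id": "botox", "seo_priority": "high", "search_volume": 22200},
    {"id": "chemical-peel", "seo_priority": "medium", "search_volume": 8100},
    {"id": "acne-treatment", "seo_priority": "high", "search_volume": 12100}
  ],
  "metadata": {"total_services": 3, "price_currency": "KRW"}
}
"""


def high_priority_services(raw: str) -> list:
    """Return ids of services with seo_priority == "high", by search volume."""
    data = json.loads(raw)
    hits = [s for s in data["services"] if s["seo_priority"] == "high"]
    hits.sort(key=lambda s: s["search_volume"], reverse=True)
    return [s["id"] for s in hits]


print(high_priority_services(SERVICES_JSON))  # ['botox', 'acne-treatment']
```

The same filter works unchanged on the full five-service file, since only the `seo_priority` and `search_volume` keys are touched.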
@@ -0,0 +1,398 @@
#!/usr/bin/env python3
"""
Gateway Page Content Generator

Automates the creation of SEO-optimized gateway pages for local services.
"""

import json
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class Location:
    """Location data structure"""
    id: str
    name_en: str
    name_kr: str
    full_address: str
    landmarks: List[str]
    subway_stations: List[str]
    demographics: str
    latitude: float
    longitude: float


@dataclass
class Service:
    """Service data structure"""
    id: str
    name_en: str
    name_kr: str
    category: str
    description: str
    keywords: List[str]
    procedure_time: str
    recovery_time: str
    price_range: str


@dataclass
class Brand:
    """Brand/clinic information"""
    name_en: str
    name_kr: str
    website: str
    phone: str
    email: str
    established_year: int
    certifications: List[str]
    unique_selling_points: List[str]


class GatewayPageGenerator:
    """Main class for generating gateway page content"""

    def __init__(self, brand: Brand, template_path: Optional[str] = None):
        self.brand = brand
        # Default to the templates/ directory next to this script's parent
        if template_path is None:
            script_dir = Path(__file__).parent.parent
            self.template_path = script_dir / "templates"
        else:
            self.template_path = Path(template_path)
        self.generated_pages = []

    def load_template(self, template_name: str) -> str:
        """Load a template file"""
        template_file = self.template_path / template_name
        if template_file.exists():
            with open(template_file, 'r', encoding='utf-8') as f:
                return f.read()
        raise FileNotFoundError(f"Template {template_name} not found")

    def generate_meta_tags(self, service: Service, location: Location) -> Dict:
        """Generate SEO meta tags"""
        return {
            "title": f"{service.name_en} in {location.name_en} | Expert {service.category} | {self.brand.name_en}",
            "description": f"Looking for {service.name_en.lower()} in {location.name_en}? "
                           f"{self.brand.name_en} offers professional {service.category.lower()} services. "
                           f"✓ Experienced team ✓ Latest technology ✓ {self.brand.unique_selling_points[0]}",
            "keywords": ", ".join([
                f"{service.name_en} {location.name_en}",
                f"{location.name_en} {service.name_en}",
                *service.keywords,
                f"{service.category} {location.name_en}"
            ]),
            "canonical": f"https://{self.brand.website}/{location.id}/{service.id}/",
            "og:title": f"{service.name_en} in {location.name_en} - {self.brand.name_en}",
            "og:description": f"Professional {service.name_en} services in {location.name_en}. "
                              f"Book your consultation today.",
            "og:image": f"https://{self.brand.website}/images/{service.id}-{location.id}-og.jpg"
        }

    def generate_schema_markup(self, service: Service, location: Location) -> str:
        """Generate JSON-LD schema markup"""
        schema = {
            "@context": "https://schema.org",
            "@type": "MedicalBusiness",
            "name": f"{self.brand.name_en} - {location.name_en}",
            "url": f"https://{self.brand.website}",
            "telephone": self.brand.phone,
            "email": self.brand.email,
            "address": {
                "@type": "PostalAddress",
                "streetAddress": location.full_address,
                "addressLocality": location.name_en,
                "addressCountry": "KR"
            },
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": location.latitude,
                "longitude": location.longitude
            },
            "areaServed": {
                "@type": "City",
                "name": location.name_en
            },
            "availableService": {
                "@type": "MedicalProcedure",
                "name": service.name_en,
                "description": service.description
            },
            "priceRange": service.price_range
        }
        return json.dumps(schema, indent=2, ensure_ascii=False)

    def generate_content_variations(self, service: Service, location: Location) -> Dict[str, List[str]]:
        """Generate content variations for uniqueness"""
        return {
            "hero_headlines": [
                f"Professional {service.name_en} in {location.name_en}",
                f"{location.name_en}'s Premier {service.name_en} {service.category}",
                f"Expert {service.name_en} Services for {location.name_en} Residents",
                f"Transform Your Look with {service.name_en} in {location.name_en}"
            ],
            "intro_paragraphs": [
                f"Welcome to {self.brand.name_en}, where we specialize in providing exceptional "
                f"{service.name_en} services to the {location.name_en} community. "
                f"Our state-of-the-art facility, conveniently located near {location.landmarks[0]}, "
                f"combines advanced technology with personalized care.",

                f"Looking for trusted {service.name_en} in {location.name_en}? "
                f"At {self.brand.name_en}, we've been serving the {location.demographics} "
                f"for over {datetime.now().year - self.brand.established_year} years. "
                f"Our expert team understands the unique needs of {location.name_en} residents.",

                f"Discover why {location.name_en} residents choose {self.brand.name_en} "
                f"for their {service.name_en} needs. Located just minutes from "
                f"{', '.join(location.subway_stations[:2])}, we offer {service.category} "
                f"services that deliver remarkable results."
            ],
            "cta_buttons": [
                f"Book Your {location.name_en} Consultation",
                f"Schedule {service.name_en} Today",
                f"Get Started in {location.name_en}",
                "Reserve Your Appointment"
            ],
            "trust_signals": [
                f"Trusted by {location.name_en} residents since {self.brand.established_year}",
                f"Over 10,000 successful {service.category} treatments",
                f"5-star rated {service.name_en} clinic in {location.name_en}",
                f"Certified specialists serving {location.demographics}"
            ]
        }

    def localize_content(self, content: str, service: Service, location: Location) -> str:
        """Add local elements to content"""
        local_elements = {
            "transportation": f"Easily accessible via {', '.join(location.subway_stations)} stations",
            "landmarks": f"Located near {' and '.join(location.landmarks[:2])}",
            "community": f"Proud to serve the {location.name_en} community",
            "convenience": f"Convenient for {location.demographics} in {location.name_en}",
            "local_stats": f"Join thousands of satisfied patients from {location.name_en}"
        }

        # Replace [LOCAL_*] placeholders with location-specific copy
        for key, value in local_elements.items():
            placeholder = f"[LOCAL_{key.upper()}]"
            if placeholder in content:
                content = content.replace(placeholder, value)

        return content

    def generate_page(self, service: Service, location: Location,
                      template_name: str = "gateway-page-medical.md") -> str:
        """Generate a complete gateway page"""

        # Load template
        template = self.load_template(template_name)

        # Generate components
        meta_tags = self.generate_meta_tags(service, location)
        schema = self.generate_schema_markup(service, location)
        variations = self.generate_content_variations(service, location)

        # Replace placeholders in template
        replacements = {
            "[Medical Service]": service.name_en,
            "[Location]": location.name_en,
            "[location]": location.name_en.lower(),
            "[Clinic Name]": self.brand.name_en,
            "[service-slug]": service.id,
            "[X years]": str(datetime.now().year - self.brand.established_year),
            "[specific address near landmark]": f"{location.full_address}, near {location.landmarks[0]}",
            "[nearby subway/bus stations]": ", ".join(location.subway_stations),
            "[certification details]": ", ".join(self.brand.certifications[:2]),
            "[equipment type]": f"{service.category} equipment",
            "[duration]": service.procedure_time,
            "[Medical Specialty]": service.category,
            "[phone-number]": self.brand.phone,
            "[website-url]": f"https://{self.brand.website}",
            "[page-url]": f"https://{self.brand.website}/{location.id}/{service.id}/",
            "[latitude]": str(location.latitude),
            "[longitude]": str(location.longitude),
        }

        # Apply replacements
        content = template
        for placeholder, value in replacements.items():
            content = content.replace(placeholder, value)

        # Add localized content
        content = self.localize_content(content, service, location)

        # Append schema markup if the template did not already include it
        if '"@context": "https://schema.org"' not in content:
            content += f"\n\n<!-- Schema Markup -->\n<script type='application/ld+json'>\n{schema}\n</script>"

        return content

    def generate_batch(self, services: List[Service], locations: List[Location],
                       output_dir: str = "output/") -> List[str]:
        """Generate multiple gateway pages"""
        output_path = Path(output_dir)
        output_path.mkdir(parents=True, exist_ok=True)

        generated_files = []

        for location in locations:
            location_dir = output_path / location.id
            location_dir.mkdir(exist_ok=True)

            for service in services:
                # Generate content
                content = self.generate_page(service, location)

                # Save to file
                filename = f"{service.id}-{location.id}.md"
                filepath = location_dir / filename

                with open(filepath, 'w', encoding='utf-8') as f:
                    f.write(content)

                generated_files.append(str(filepath))
                print(f"✓ Generated: {filepath}")

        # Generate index file
        self.generate_index(services, locations, output_path)

        return generated_files

    def generate_index(self, services: List[Service], locations: List[Location],
                       output_path: Path):
        """Generate an index of all created pages"""
        index_content = f"# Gateway Pages Index - {self.brand.name_en}\n\n"
        index_content += f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}\n\n"
        index_content += "## Pages by Location\n\n"

        for location in locations:
            index_content += f"### {location.name_en}\n"
            for service in services:
                url = f"/{location.id}/{service.id}/"
                index_content += f"- [{service.name_en} in {location.name_en}]({url})\n"
            index_content += "\n"

        index_content += "## Pages by Service\n\n"
        for service in services:
            index_content += f"### {service.name_en}\n"
            for location in locations:
                url = f"/{location.id}/{service.id}/"
                index_content += f"- [{location.name_en}]({url})\n"
            index_content += "\n"

        index_content += f"\n---\nTotal Pages Generated: {len(services) * len(locations)}\n"

        with open(output_path / "index.md", 'w', encoding='utf-8') as f:
            f.write(index_content)


def create_sample_data():
    """Create sample data for testing"""

    # Sample brand
    brand = Brand(
        name_en="Jamie Clinic",
        name_kr="제이미 클리닉",
        website="www.jamieclinic.com",
        phone="+82-2-1234-5678",
        email="info@jamieclinic.com",
        established_year=2010,
        certifications=["ISO 9001", "KAHF Certified", "JCI Accredited"],
        unique_selling_points=[
            "Same-day appointments available",
            "15+ years of experience",
            "Latest medical technology"
        ]
    )

    # Sample locations
    locations = [
        Location(
            id="gangnam",
            name_en="Gangnam",
            name_kr="강남",
            full_address="123 Teheran-ro, Gangnam-gu, Seoul",
            landmarks=["COEX", "Gangnam Station", "Samsung Station"],
            subway_stations=["Gangnam Station (Line 2)", "Sinnonhyeon Station (Line 9)"],
            demographics="Young professionals and affluent residents",
            latitude=37.4979,
            longitude=127.0276
        ),
        Location(
            id="myeongdong",
            name_en="Myeongdong",
            name_kr="명동",
            full_address="456 Myeongdong-gil, Jung-gu, Seoul",
            landmarks=["Myeongdong Cathedral", "Lotte Department Store"],
            subway_stations=["Myeongdong Station (Line 4)", "Euljiro 1-ga Station (Line 2)"],
            demographics="Tourists and young shoppers",
            latitude=37.5636,
            longitude=126.9869
        )
    ]

    # Sample services
    services = [
        Service(
            id="laser-hair-removal",
            name_en="Laser Hair Removal",
            name_kr="레이저 제모",
            category="Dermatology",
            description="Advanced laser technology for permanent hair reduction",
            keywords=["permanent hair removal", "IPL", "diode laser"],
            procedure_time="30-60 minutes",
            recovery_time="No downtime",
            price_range="₩₩₩"
        ),
        Service(
            id="botox",
            name_en="Botox Treatment",
            name_kr="보톡스",
            category="Cosmetic Dermatology",
            description="FDA-approved botulinum toxin for wrinkle reduction",
            keywords=["wrinkle treatment", "anti-aging", "facial rejuvenation"],
            procedure_time="15-30 minutes",
            recovery_time="No downtime",
            price_range="₩₩₩₩"
        )
    ]

    return brand, locations, services


def main():
    """Main execution function"""
    print("=" * 60)
    print("Gateway Page Content Generator")
    print("=" * 60)

    # Get sample data
    brand, locations, services = create_sample_data()

    # Initialize generator
    generator = GatewayPageGenerator(brand)

    # Generate pages
    print(f"\nGenerating {len(services) * len(locations)} gateway pages...")
    print("-" * 40)

    generated_files = generator.generate_batch(services, locations)

    print("-" * 40)
    print(f"\n✅ Successfully generated {len(generated_files)} pages!")
    print("📁 Output directory: output/")
    print("📋 Index file created: output/index.md")

    # Generate report
    print("\n" + "=" * 60)
    print("GENERATION REPORT")
    print("=" * 60)
    print(f"Brand: {brand.name_en}")
    print(f"Locations: {', '.join(loc.name_en for loc in locations)}")
    print(f"Services: {', '.join(svc.name_en for svc in services)}")
    print(f"Total Pages: {len(generated_files)}")
    print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("=" * 60)


if __name__ == "__main__":
    main()
@@ -0,0 +1,5 @@
# 18-seo-gateway-builder dependencies
jinja2>=3.1.0
pyyaml>=6.0.0
markdown>=3.5.0
python-dotenv>=1.0.0
@@ -0,0 +1,231 @@
# [Medical Service] in [Location] | [Clinic Name]

<!-- Meta Tags -->
<!--
Title: [Medical Service] in [Location] | Expert Care | [Clinic Name]
Description: Looking for professional [medical service] in [location]? [Clinic Name] offers state-of-the-art [service] with experienced doctors. ✓ Same-day appointments ✓ Insurance accepted ✓ [Unique benefit]
Canonical: https://example.com/[location]/[service-slug]/
-->

## Professional [Medical Service] Available in [Location]

Welcome to [Clinic Name], your trusted provider for [medical service] in [location]. Our medical team brings over [X years] of combined experience, utilizing the latest medical technology to ensure optimal results for our patients in the [location] area.

### Why Choose [Clinic Name] for [Medical Service] in [Location]?

Located conveniently at [specific address near landmark], our [location] clinic specializes in providing personalized [medical service] treatments tailored to each patient's unique needs. We understand the specific health concerns of [location] residents and have designed our services accordingly.

**Our [Location] Advantages:**
- 🏥 Modern facility equipped with the latest [equipment type]
- 👨‍⚕️ Board-certified specialists with [certification details]
- 📍 Easy access from [nearby subway/bus stations]
- 🕐 Extended hours to accommodate busy [location] professionals
- 💳 Accept major insurance plans popular in [location]

## Understanding [Medical Service]

### What Is [Medical Service]?

[Detailed medical explanation of the service, including scientific background, FDA approvals if applicable, and medical benefits. This section should be educational while remaining accessible.]

### Who Can Benefit from [Medical Service]?

Our [medical service] treatment in [location] is ideal for patients experiencing:
- [Condition 1 with brief explanation]
- [Condition 2 with brief explanation]
- [Condition 3 with brief explanation]
- [Condition 4 with brief explanation]

## Our [Medical Service] Process in [Location]

### 1. Initial Consultation
Your journey begins with a comprehensive consultation at our [location] clinic. Our specialists will:
- Review your medical history
- Conduct necessary diagnostic tests
- Discuss your treatment goals
- Create a personalized treatment plan

### 2. Treatment Planning
Based on your consultation, we develop a customized approach that considers:
- Your specific medical condition
- Lifestyle factors common to [location] residents
- Insurance coverage options
- Optimal scheduling for your convenience

### 3. Treatment Sessions
Each [medical service] session at our [location] facility typically involves:
- Pre-treatment preparation
- The procedure itself (approximately [duration])
- Post-treatment monitoring
- Detailed aftercare instructions

### 4. Follow-up Care
We provide comprehensive follow-up support including:
- Scheduled check-ups
- 24/7 emergency hotline
- Ongoing treatment adjustments
- Long-term health monitoring

## Expected Results and Recovery

### What to Expect After [Medical Service]

Patients at our [location] clinic typically experience:
- **Immediate effects**: [Description]
- **Short-term (1-2 weeks)**: [Description]
- **Long-term (1-3 months)**: [Description]
- **Final results**: [Timeline and description]

### Recovery Timeline
- Day 1-3: [Recovery details]
- Week 1: [Recovery details]
- Week 2-4: [Recovery details]
- Month 2-3: [Recovery details]

## Safety and Credentials

### Our Medical Standards
[Clinic Name] in [location] maintains the highest medical standards:
- ✓ [Relevant medical certification]
- ✓ [Hospital affiliation if applicable]
- ✓ [Safety protocol certification]
- ✓ [Professional membership]

### Our Medical Team
**Dr. [Name], MD**
- [Medical school]
- [Residency/Fellowship]
- [Years of experience] specializing in [medical service]
- [Special recognition or research]

## Pricing and Insurance

### Insurance Coverage
We accept most major insurance plans used by [location] residents:
- [Insurance provider 1]
- [Insurance provider 2]
- [Insurance provider 3]
- [Insurance provider 4]

### Payment Options
For your convenience, we offer:
- Insurance direct billing
- Flexible payment plans
- Credit card payments
- HSA/FSA acceptance

### Transparent Pricing
Contact us for a detailed quote. Factors affecting cost include:
- Severity of condition
- Number of sessions required
- Insurance coverage level
- Additional treatments needed

## Patient Testimonials from [Location]

> "After struggling with [condition] for years, I finally found relief at [Clinic Name]. The team was professional, and the results exceeded my expectations."
> — [Patient initials], [Location] resident

> "The convenience of having such high-quality [medical service] right here in [location] made all the difference. I no longer have to travel to [other area] for treatment."
> — [Patient initials], [Nearby neighborhood]

> "Dr. [Name] took the time to explain everything thoroughly. I felt confident throughout the entire process."
> — [Patient initials], [Location] professional

## Frequently Asked Questions

### General Questions

**Q: How do I know if [medical service] is right for me?**
A: The best way to determine if you're a candidate is through a consultation at our [location] clinic. We'll evaluate your medical history, current condition, and treatment goals.

**Q: How long does [medical service] take?**
A: Treatment sessions typically last [duration], though your first visit including consultation may take [longer duration].

**Q: Is [medical service] painful?**
A: [Comfort level explanation with pain management options available]

### Location-Specific Questions

**Q: Where exactly is your [location] clinic located?**
A: We're located at [full address], just [distance] from [landmark/station]. [Parking/public transport information].

**Q: Do you have parking available?**
A: Yes, we offer [parking details specific to location].

**Q: What are your hours for the [location] clinic?**
A:
- Monday-Friday: [hours]
- Saturday: [hours]
- Sunday: [hours/closed]

### Insurance and Payment

**Q: Does insurance cover [medical service]?**
A: Coverage varies by plan. Our insurance specialists can verify your benefits before your appointment.

**Q: Do you offer payment plans?**
A: Yes, we offer flexible payment options including [specific plans available].

## Schedule Your [Medical Service] Consultation in [Location]

Ready to take the first step? Contact our [location] clinic today:

### Contact Information
📍 **Address**: [Full address]
📞 **Phone**: [Local phone number]
📧 **Email**: [location]@[clinicname].com
🌐 **Online Booking**: [URL]

### Office Hours
- **Monday-Friday**: [Hours]
- **Saturday**: [Hours]
- **Sunday**: [Hours/Closed]
- **Emergency**: [24/7 hotline if available]

### Getting Here
**By Subway**: [Detailed directions from nearest station]
**By Bus**: [Bus routes and stops]
**By Car**: [Driving directions and parking info]

---

<!-- Schema Markup -->
```json
{
  "@context": "https://schema.org",
  "@type": "MedicalClinic",
  "name": "[Clinic Name] - [Location]",
  "image": "[clinic-image-url]",
  "@id": "[page-url]",
  "url": "[website-url]",
  "telephone": "[phone-number]",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "[street]",
    "addressLocality": "[city]",
    "addressRegion": "[state/province]",
    "postalCode": "[zip]",
    "addressCountry": "KR"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": [latitude],
    "longitude": [longitude]
  },
  "openingHoursSpecification": {
    "@type": "OpeningHoursSpecification",
    "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "opens": "09:00",
    "closes": "18:00"
  },
  "medicalSpecialty": "[Medical Specialty]",
  "availableService": {
    "@type": "MedicalProcedure",
    "name": "[Medical Service]",
    "description": "[Service Description]"
  }
}
```

*Last updated: [Date] | [Clinic Name] - Professional [Medical Service] in [Location]*
377
custom-skills/18-seo-gateway-builder/desktop/SKILL.md
Normal file
@@ -0,0 +1,377 @@
---
name: gateway-page-content-builder
version: 1.0.0
description: Systematic content generation framework for SEO-optimized gateway pages with local service focus
license: OurDigital internal-use ONLY
---

# Gateway Page Content Builder

A comprehensive skill for building high-quality, SEO-optimized gateway page content for local services, medical practices, and business locations.

## Core Purpose

This skill provides a systematic framework for creating gateway pages that:
- Target specific location + service keyword combinations
- Follow SEO best practices for local search optimization
- Maintain content quality and uniqueness at scale
- Include structured data and technical SEO elements

## Content Generation Framework

### 1. Page Structure Template

Every gateway page should follow this optimized structure:

```markdown
# [Service Name] in [Location] - [Brand Name]

## Hero Section
- Primary headline with target keywords
- Value proposition statement
- Quick contact CTA

## Service Overview
- What is [service]?
- Why choose our [service] in [location]
- Key benefits for [location] residents

## Local Service Details
- Service availability in [location]
- Local team/facility information
- Location-specific offerings

## Process & Procedure
- Step-by-step service flow
- Duration and frequency
- What to expect

## Benefits & Results
- Evidence-based outcomes
- Patient/customer testimonials
- Before/after scenarios

## Pricing & Insurance
- Transparent pricing structure
- Insurance coverage details
- Payment options

## FAQ Section
- Location-specific questions
- Service-specific concerns
- Booking and preparation

## Contact & Booking
- Clear CTA sections
- Multiple contact methods
- Online booking integration
```

### 2. Content Variables System

Define reusable content variables for efficient scaling:

```yaml
variables:
  service_types:
    - name: "laser_hair_removal"
      korean: "레이저 제모"
      description: "Advanced laser technology for permanent hair reduction"
      keywords: ["laser hair removal", "permanent hair removal", "IPL treatment"]

  locations:
    - name: "gangnam"
      korean: "강남"
      full_address: "서울특별시 강남구"
      landmarks: ["COEX", "Samsung Station", "Gangnam Station"]
      demographics: "Young professionals, high income"

  brand_info:
    name: "Your Clinic"
    korean: "클리닉명"
    usp: "15+ years of experience with latest technology"
```

### 3. Content Generation Rules

#### Title Tag Formula
```
[Service] in [Location] | [Unique Modifier] | [Brand]

Examples:
- "Laser Hair Removal in Gangnam | Same-Day Appointments | Jamie Clinic"
- "강남 레이저 제모 | 당일 예약 가능 | 제이미 클리닉"
```
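Applied programmatically, the formula above benefits from a length guard, since search results typically truncate titles past roughly 60 characters. A minimal sketch (the `build_title` helper and its fallback rule are assumptions for illustration, not part of this skill):

```python
# Build a title tag from the formula and keep it within the ~60-character
# window that search results typically display.
def build_title(service: str, location: str, modifier: str, brand: str,
                max_len: int = 60) -> str:
    title = f"{service} in {location} | {modifier} | {brand}"
    if len(title) > max_len:
        # Drop the unique modifier first; it is the most expendable element.
        title = f"{service} in {location} | {brand}"
    return title

title = build_title("Laser Hair Removal", "Gangnam",
                    "Same-Day Appointments", "Jamie Clinic")
```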

#### Meta Description Template
```
Looking for [service] in [location]? [Brand] offers [USP] with [benefit].
Book your consultation today. ✓ [Feature 1] ✓ [Feature 2] ✓ [Feature 3]
```

#### H1 Optimization
```
Primary: [Service] in [Location]
Alternative: [Location] [Service] - [Brand Modifier]
Korean: [지역] [서비스] 전문 [브랜드]
```

### 4. Local SEO Elements

#### Schema Markup Requirements
```json
{
  "@context": "https://schema.org",
  "@type": "MedicalBusiness",
  "name": "Clinic Name",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "",
    "addressLocality": "",
    "addressRegion": "",
    "postalCode": ""
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": "",
    "longitude": ""
  },
  "areaServed": {
    "@type": "City",
    "name": "Location Name"
  },
  "medicalSpecialty": "Service Type",
  "availableService": {
    "@type": "MedicalProcedure",
    "name": "Service Name",
    "description": "Service Description"
  }
}
```
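The empty fields in this template can be filled from the variables defined in section 2 and serialized as JSON-LD. A minimal sketch, assuming a simplified clinic record (the field names here are illustrative):

```python
import json

# Illustrative clinic record; in practice these values would come from the
# variables file defined in section 2.
clinic = {"name": "Your Clinic", "city": "Gangnam", "service": "Laser Hair Removal"}

def build_schema(clinic: dict) -> str:
    """Fill the MedicalBusiness schema template and serialize it as JSON-LD."""
    schema = {
        "@context": "https://schema.org",
        "@type": "MedicalBusiness",
        "name": clinic["name"],
        "areaServed": {"@type": "City", "name": clinic["city"]},
        "availableService": {
            "@type": "MedicalProcedure",
            "name": clinic["service"],
        },
    }
    return json.dumps(schema, ensure_ascii=False, indent=2)

jsonld = build_schema(clinic)
```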

### 5. Content Uniqueness Strategy

#### Localization Techniques
1. **Local landmarks**: "Just 5 minutes from [Landmark]"
2. **Transportation**: "Accessible via [Subway Line] at [Station]"
3. **Local statistics**: "Serving [X] residents in [Area] since [Year]"
4. **Community involvement**: "Proud partner of [Local Organization]"
5. **Regional preferences**: "Tailored to [Location] residents' needs"

#### Content Variation Patterns
```python
variations = {
    "intro_patterns": [
        "Discover professional [service] in [location]",
        "[Location] residents trust us for [service]",
        "Your local [service] experts in [location]",
        "Premium [service] now available in [location]"
    ],
    "cta_patterns": [
        "Book your [location] appointment today",
        "Schedule a consultation at our [location] clinic",
        "Visit us in [location] for [service]",
        "Get started with [service] in [location]"
    ]
}
```
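The bracketed placeholders in these patterns can be resolved with plain string substitution. A sketch (the `render` helper is an assumption for illustration):

```python
variations = {
    "intro_patterns": [
        "Discover professional [service] in [location]",
        "[Location] residents trust us for [service]",
    ],
}

def render(pattern: str, service: str, location: str) -> str:
    """Substitute bracketed placeholders, respecting their capitalization."""
    return (pattern
            .replace("[service]", service)
            .replace("[location]", location)
            .replace("[Service]", service.title())
            .replace("[Location]", location.title()))

intro = render(variations["intro_patterns"][1], "laser hair removal", "gangnam")
```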

### 6. Content Quality Checklist

Before publishing any gateway page, verify:

- [ ] **Keyword optimization**: Target keyword appears in title, H1, first 100 words
- [ ] **Content length**: Minimum 800 words of unique content
- [ ] **Local signals**: At least 5 location mentions naturally integrated
- [ ] **Structured data**: Schema markup properly implemented
- [ ] **Internal linking**: Links to main service page and location page
- [ ] **Images**: Alt text includes location + service keywords
- [ ] **Mobile optimization**: Content readable on mobile devices
- [ ] **Load speed**: Page loads under 3 seconds
- [ ] **CTAs**: Clear calls-to-action above and below fold
- [ ] **Trust signals**: Reviews, certifications, testimonials included
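The first two checklist items are mechanical enough to automate in a pre-publish check. A rough sketch, assuming the title, H1, and body text are available as plain strings:

```python
def check_page(title: str, h1: str, body: str, keyword: str) -> dict:
    """Automate the keyword-placement and length items of the checklist."""
    kw = keyword.lower()
    first_100_words = " ".join(body.split()[:100]).lower()
    return {
        "keyword_in_title": kw in title.lower(),
        "keyword_in_h1": kw in h1.lower(),
        "keyword_in_intro": kw in first_100_words,
        "length_ok": len(body.split()) >= 800,  # minimum 800 words
    }
```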

### 7. Scaling Framework

#### Batch Generation Process
1. Create master template with variable placeholders
2. Define location and service matrices
3. Generate unique content blocks for each combination
4. Review and customize the top 20% of pages by traffic potential
5. Implement progressive enhancement based on performance
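Steps 1-3 can be sketched with Python's `string.Template`; the placeholder names and slug scheme below are assumptions, not part of this skill:

```python
from string import Template

# Step 1: master template with variable placeholders (content abbreviated).
master = Template("# $service in $location - $brand\n\n"
                  "Discover professional $service in $location.")

# Step 2: location and service matrices.
locations = ["Gangnam", "Sinsa"]
services = ["Laser Hair Removal", "Botox"]

# Step 3: one content block per location x service combination.
pages = {
    f"{loc.lower()}/{svc.lower().replace(' ', '-')}":
        master.substitute(service=svc, location=loc, brand="Jamie Clinic")
    for loc in locations
    for svc in services
}
```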

#### Priority Matrix
```
High Priority (Manual Optimization):
- High search volume + High commercial intent
- Major city centers + Premium services
- Competitive keywords requiring unique angle

Medium Priority (Template + Customization):
- Moderate search volume + Standard services
- Secondary locations + Common procedures

Low Priority (Automated Generation):
- Long-tail keywords + Suburban areas
- Informational intent + Low competition
```

### 8. Performance Tracking

#### KPIs to Monitor
```yaml
metrics:
  organic_traffic:
    - Pageviews from organic search
    - Unique visitors by location
    - Average session duration

  conversions:
    - Form submissions by page
    - Phone calls tracked
    - Online bookings completed

  engagement:
    - Bounce rate below 40%
    - Pages per session above 2.0
    - Scroll depth above 75%

  rankings:
    - Position tracking for target keywords
    - Local pack appearances
    - Featured snippet captures
```

## Implementation Instructions

### Step 1: Keyword Research
```python
# Generate keyword combinations
locations = ["gangnam", "sinsa", "apgujeong"]
services = ["laser_hair_removal", "botox", "filler"]

keywords = []
for location in locations:
    for service in services:
        keywords.append({
            "primary": f"{service} {location}",
            "secondary": f"{location} {service} clinic",
            "long_tail": f"best {service} clinic in {location}"
        })
```

### Step 2: Content Creation
1. Use the template structure above
2. Fill in variables for location and service
3. Add unique local content (minimum 30% unique per page)
4. Include relevant images with local landmarks
5. Add schema markup and meta tags

### Step 3: Technical Implementation
1. Create URL structure: `/location/service/`
2. Implement breadcrumbs with proper schema
3. Add internal linking to related pages
4. Set up canonical tags to avoid duplication
5. Create XML sitemap for gateway pages
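The sitemap in step 5 can be generated straight from the URL structure defined in step 1. A minimal sketch with an assumed domain and slug list:

```python
import xml.etree.ElementTree as ET

# Assumed domain and generated gateway-page slugs.
base = "https://www.example.com"
slugs = ["gangnam/laser-hair-removal", "sinsa/botox"]

def build_sitemap(base: str, slugs: list) -> str:
    """Emit a minimal XML sitemap covering the gateway pages."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for slug in slugs:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base}/{slug}/"
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap(base, slugs)
```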

### Step 4: Quality Assurance
- Run content through plagiarism checker
- Verify all technical SEO elements
- Test page speed and mobile responsiveness
- Review content for local relevance
- Check all CTAs and contact information

## Advanced Techniques

### Dynamic Content Insertion
```javascript
// Example of dynamic content based on user location.
// getUserLocation() and findNearestClinic() are site-specific helpers.
const userLocation = getUserLocation();
const nearestClinic = findNearestClinic(userLocation);

// Update content dynamically
document.querySelector('.hero-location').textContent =
  `Serving ${userLocation.district} and surrounding areas`;

document.querySelector('.distance-info').textContent =
  `Only ${nearestClinic.distance} from your location`;
```

### A/B Testing Framework
```yaml
test_variations:
  headlines:
    - control: "[Service] in [Location]"
    - variant_a: "#1 [Service] Provider in [Location]"
    - variant_b: "[Location]'s Trusted [Service] Clinic"

  cta_buttons:
    - control: "Book Now"
    - variant_a: "Get Free Consultation"
    - variant_b: "Check Availability"
```
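To serve these variants consistently, a common approach (an assumption here, not prescribed by the framework) is hash-based bucketing, which keeps each visitor's assignment sticky without storing any state:

```python
import hashlib

headline_variants = ["control", "variant_a", "variant_b"]

def assign_variant(visitor_id: str, test_name: str, variants: list) -> str:
    # Hashing test name + visitor ID keeps assignment stable per visitor
    # and independent across concurrent tests.
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

variant = assign_variant("visitor-123", "headlines", headline_variants)
```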

### Content Refresh Strategy
- Monthly: Update testimonials and reviews
- Quarterly: Refresh statistics and data points
- Semi-annually: Add new FAQs based on search queries
- Annually: Complete content audit and refresh

## Prompts for Content Generation

### Initial Content Brief
```
Create gateway page content for [SERVICE] in [LOCATION]:
- Target keyword: [PRIMARY KEYWORD]
- Secondary keywords: [LIST]
- Local landmarks: [LIST]
- Unique selling points: [LIST]
- Competitor differentiation: [POINTS]
```

### Content Expansion
```
Expand the following gateway page section:
Current content: [PASTE]
Add: Local statistics, transportation info, 2 testimonials
Maintain: Professional tone, keyword density 2-3%
Length: 200-300 words
```

### FAQ Generation
```
Generate 8 FAQs for [SERVICE] in [LOCATION]:
- 3 service-specific questions
- 2 location/accessibility questions
- 2 pricing/insurance questions
- 1 preparation/aftercare question
Include question schema markup format
```

## Resources and Tools

### Recommended Tools
- **Keyword Research**: Ahrefs, SEMrush, Google Keyword Planner
- **Content Optimization**: Surfer SEO, Clearscope, MarketMuse
- **Schema Generation**: Schema.org, Google's Structured Data Tool
- **Performance Tracking**: Google Analytics, Search Console
- **A/B Testing**: Google Optimize, Optimizely

### Templates Directory
- `templates/gateway-page-medical.md`
- `templates/gateway-page-beauty.md`
- `templates/gateway-page-dental.md`
- `templates/schema-medical-business.json`
- `templates/meta-tags-local.html`

## Version History

### v1.0.0 (Current)
- Initial framework for gateway page content generation
- Medical and beauty service focus
- Korean market optimization
- Local SEO best practices
- Content scaling methodology

---

*This skill is optimized for Korean medical and beauty service markets but can be adapted for any local service business requiring location-based gateway pages.*

@@ -0,0 +1,282 @@
# Laser Hair Removal in Gangnam | Expert Care | Jamie Clinic

<!-- Meta Tags -->
<!--
Title: Laser Hair Removal in Gangnam | Same-Day Appointments | Jamie Clinic
Description: Looking for professional laser hair removal in Gangnam? Jamie Clinic offers state-of-the-art permanent hair reduction with experienced doctors. ✓ Same-day appointments ✓ Insurance accepted ✓ 15+ years of experience
Canonical: https://www.jamieclinic.com/gangnam/laser-hair-removal/
-->

## Professional Laser Hair Removal Available in Gangnam

Welcome to Jamie Clinic, your trusted provider for laser hair removal in Gangnam. Our medical team brings over 15 years of combined experience, utilizing the latest medical technology to ensure optimal results for our patients in the Gangnam area.

### Why Choose Jamie Clinic for Laser Hair Removal in Gangnam?

Located conveniently at 123 Teheran-ro, Gangnam-gu, Seoul, near COEX Mall, our Gangnam clinic specializes in providing personalized laser hair removal treatments tailored to each patient's unique needs. We understand the specific health concerns of Gangnam residents and have designed our services accordingly.

**Our Gangnam Advantages:**
- 🏥 Modern facility equipped with the latest diode and Alexandrite laser equipment
- 👨‍⚕️ Board-certified specialists with ISO 9001 and KAHF certification
- 📍 Easy access from Gangnam Station (Line 2) and Sinnonhyeon Station (Line 9)
- 🕐 Extended hours to accommodate busy Gangnam professionals
- 💳 Accept major insurance plans popular in Gangnam

## Understanding Laser Hair Removal

### What Is Laser Hair Removal?

Laser hair removal is a medical procedure that uses concentrated light to target and destroy hair follicles, resulting in permanent hair reduction. Our FDA-approved laser systems deliver precise wavelengths of light that are absorbed by the pigment in hair follicles, effectively disabling them while leaving surrounding skin undamaged. This advanced technology has revolutionized the approach to unwanted hair removal, offering a long-term solution compared to traditional methods.

### Who Can Benefit from Laser Hair Removal?

Our laser hair removal treatment in Gangnam is ideal for patients experiencing:
- Excessive unwanted hair growth requiring frequent shaving or waxing
- Ingrown hairs and razor burn from traditional hair removal methods
- Time constraints making regular hair removal maintenance challenging
- Skin sensitivity or allergic reactions to depilatory creams and waxing

## Our Laser Hair Removal Process in Gangnam

### 1. Initial Consultation
Your journey begins with a comprehensive consultation at our Gangnam clinic. Our specialists will:
- Review your medical history and skin type
- Conduct a patch test to determine optimal laser settings
- Discuss your hair removal goals and expectations
- Create a personalized treatment plan with session scheduling

### 2. Treatment Planning
Based on your consultation, we develop a customized approach that considers:
- Your specific hair type and skin tone
- Lifestyle factors common to young professionals and affluent residents
- Insurance coverage options for medical necessity cases
- Optimal scheduling for your convenience around work commitments

### 3. Treatment Sessions
Each laser hair removal session at our Gangnam facility typically involves:
- Pre-treatment skin preparation and cooling
- The procedure itself (approximately 15-60 minutes depending on area)
- Post-treatment soothing gel application
- Detailed aftercare instructions for best results

### 4. Follow-up Care
We provide comprehensive follow-up support including:
- Scheduled check-ups to monitor progress
- 24/7 emergency hotline for any concerns
- Treatment adjustments based on your results
- Long-term maintenance planning

## Expected Results and Recovery

### What to Expect After Laser Hair Removal

Patients at our Gangnam clinic typically experience:
- **Immediate effects**: Slight redness and warmth in treated areas
- **Short-term (1-2 weeks)**: Hair shedding from treated follicles
- **Long-term (1-3 months)**: Noticeable reduction in hair growth
- **Final results**: 70-90% permanent hair reduction after the complete treatment series

### Recovery Timeline
- Day 1-3: Mild redness and sensitivity, similar to mild sunburn
- Week 1: Hair begins to shed naturally from treated follicles
- Week 2-4: Skin returns to normal, new hair growth visibly reduced
- Month 2-3: Ready for next treatment session

## Safety and Credentials

### Our Medical Standards
Jamie Clinic in Gangnam maintains the highest medical standards:
- ✓ ISO 9001 Quality Management Certification
- ✓ KAHF (Korean Association of Hair and Face) Certified
- ✓ JCI Accredited facility
- ✓ Member of Korean Dermatological Association

### Our Medical Team
**Dr. Kim Jung-ho, MD**
- Seoul National University Medical School
- Dermatology Residency at Samsung Medical Center
- 10+ years specializing in laser hair removal
- Published research on laser technology in Asian skin types

**Dr. Park Mi-young, MD**
- Yonsei University College of Medicine
- Fellowship in Cosmetic Dermatology
- 8+ years of laser treatment experience
- Specialized training in sensitive skin treatments

## Pricing and Insurance

### Insurance Coverage
We accept most major insurance plans used by Gangnam residents:
- Samsung Fire & Marine Insurance
- KB Insurance
- Hyundai Marine & Fire Insurance
- DB Insurance

### Payment Options
For your convenience, we offer:
- Insurance direct billing for eligible treatments
- Flexible payment plans with 0% interest
- All major credit cards accepted
- HSA/FSA acceptance for medical procedures

### Transparent Pricing
Contact us for a detailed quote. Factors affecting cost include:
- Size and number of treatment areas
- Number of sessions required (typically 6-8)
- Your specific insurance coverage level
- Package deals for multiple areas

**Pricing Guide:**
- Small areas (upper lip, chin): 50,000-100,000 KRW per session
- Medium areas (underarms, bikini): 100,000-200,000 KRW per session
- Large areas (full legs, back): 200,000-400,000 KRW per session

## Patient Testimonials from Gangnam

> "After struggling with ingrown hairs for years, I finally found relief at Jamie Clinic. The team was professional, and the results exceeded my expectations. Being so close to my office in COEX made it incredibly convenient."
> — J.K., Gangnam resident

> "The convenience of having such high-quality laser hair removal right here in Gangnam made all the difference. I no longer have to travel to other districts for treatment. The results have been life-changing!"
> — S.L., Samsung Station area

> "Dr. Kim took the time to explain everything thoroughly. As someone with sensitive skin, I was nervous, but the treatment was much more comfortable than I expected. Highly recommend to anyone in Gangnam!"
> — M.C., Gangnam professional

## Frequently Asked Questions

### General Questions

**Q: How do I know if laser hair removal is right for me?**
A: The best way to determine if you're a candidate is through a consultation at our Gangnam clinic. We'll evaluate your skin type, hair color, medical history, and treatment goals. Most people with dark hair and light to medium skin tones are excellent candidates.

**Q: How long does laser hair removal take?**
A: Treatment sessions typically last 15-60 minutes depending on the area being treated, though your first visit including consultation may take up to 90 minutes.

**Q: Is laser hair removal painful?**
A: Most patients describe the sensation as similar to a rubber band snapping against the skin. We use advanced cooling systems and can apply numbing cream for sensitive areas to ensure your comfort.

**Q: How many sessions will I need?**
A: Most patients require 6-8 sessions spaced 4-6 weeks apart for optimal results. The exact number depends on factors like hair color, skin type, and the area being treated.

### Location-Specific Questions

**Q: Where exactly is your Gangnam clinic located?**
A: We're located at 123 Teheran-ro, Gangnam-gu, Seoul, just a 3-minute walk from Gangnam Station Exit 11, in the medical building next to COEX Mall. Free underground parking is available for all patients.

**Q: Do you have parking available?**
A: Yes, we offer free underground parking for all patients. The parking entrance is on the left side of the building. Simply validate your parking ticket at reception.

**Q: What are your hours for the Gangnam clinic?**
A:
- Monday-Friday: 9:00 AM - 8:00 PM
- Saturday: 9:00 AM - 6:00 PM
- Sunday: 10:00 AM - 4:00 PM

**Q: How do I get there by public transportation?**
A: Take Line 2 to Gangnam Station and use Exit 11. Walk straight for about 3 minutes. We're in the tall glass medical building on your right. Alternatively, take Line 9 to Sinnonhyeon Station Exit 6 and walk 5 minutes.

### Insurance and Payment

**Q: Does insurance cover laser hair removal?**
A: Coverage varies by plan. Medical necessity cases (such as pilonidal cysts or folliculitis) may be covered. Our insurance specialists can verify your benefits before your appointment.

**Q: Do you offer payment plans?**
A: Yes, we offer flexible payment options including 3, 6, or 12-month interest-free payment plans for treatments over 1,000,000 KRW.

**Q: Can I use my FSA or HSA?**
A: Yes, laser hair removal for medical reasons is typically FSA/HSA eligible. We can provide the necessary documentation for reimbursement.

## Schedule Your Laser Hair Removal Consultation in Gangnam

Ready to say goodbye to unwanted hair forever? Contact our Gangnam clinic today:

### Contact Information
📍 **Address**: 123 Teheran-ro, Gangnam-gu, Seoul 06234
📞 **Phone**: +82-2-1234-5678
📧 **Email**: gangnam@jamieclinic.com
🌐 **Online Booking**: https://www.jamieclinic.com/book-gangnam

### Office Hours
- **Monday-Friday**: 9:00 AM - 8:00 PM
- **Saturday**: 9:00 AM - 6:00 PM
- **Sunday**: 10:00 AM - 4:00 PM
- **Emergency**: 24/7 hotline available at +82-2-1234-9999

### Getting Here
**By Subway**:
- Line 2: Gangnam Station Exit 11 (3-minute walk)
- Line 9: Sinnonhyeon Station Exit 6 (5-minute walk)
- Sinbundang Line: Gangnam Station Exit 11 (3-minute walk)

**By Bus**: Routes 146, 341, 360, 740 all stop within 2 minutes of our clinic

**By Car**: Navigate to 123 Teheran-ro, Gangnam-gu. The underground parking entrance is on the left side of the building. Free parking for all patients.

### Nearby Landmarks
- COEX Mall (1-minute walk)
- Samsung Station (10-minute walk)
- Bongeunsa Temple (15-minute walk)
- Gangnam Finance Center (5-minute walk)

---

<!-- Schema Markup -->
<script type='application/ld+json'>
{
  "@context": "https://schema.org",
  "@type": "MedicalBusiness",
  "name": "Jamie Clinic - Gangnam",
  "url": "https://www.jamieclinic.com",
  "telephone": "+82-2-1234-5678",
  "email": "info@jamieclinic.com",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Teheran-ro, Gangnam-gu, Seoul",
    "addressLocality": "Gangnam",
    "addressCountry": "KR"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 37.4979,
    "longitude": 127.0276
  },
  "areaServed": {
    "@type": "City",
    "name": "Gangnam"
  },
  "availableService": {
    "@type": "MedicalProcedure",
    "name": "Laser Hair Removal",
    "description": "Permanent hair reduction using state-of-the-art laser technology"
  },
  "priceRange": "₩₩₩",
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "09:00",
      "closes": "20:00"
    },
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": "Saturday",
      "opens": "09:00",
      "closes": "18:00"
    },
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": "Sunday",
      "opens": "10:00",
      "closes": "16:00"
    }
  ],
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "324"
  }
}
</script>

*Last updated: November 19, 2024 | Jamie Clinic - Professional Laser Hair Removal in Gangnam*

@@ -0,0 +1,231 @@
# [Medical Service] in [Location] | [Clinic Name]

<!-- Meta Tags -->
<!--
Title: [Medical Service] in [Location] | Expert Care | [Clinic Name]
Description: Looking for professional [medical service] in [location]? [Clinic Name] offers state-of-the-art [service] with experienced doctors. ✓ Same-day appointments ✓ Insurance accepted ✓ [Unique benefit]
Canonical: https://example.com/[location]/[service-slug]/
-->

## Professional [Medical Service] Available in [Location]

Welcome to [Clinic Name], your trusted provider for [medical service] in [location]. Our medical team brings over [X years] of combined experience, utilizing the latest medical technology to ensure optimal results for our patients in the [location] area.

### Why Choose [Clinic Name] for [Medical Service] in [Location]?

Located conveniently at [specific address near landmark], our [location] clinic specializes in providing personalized [medical service] treatments tailored to each patient's unique needs. We understand the specific health concerns of [location] residents and have designed our services accordingly.

**Our [Location] Advantages:**
- 🏥 Modern facility equipped with latest [equipment type]
- 👨‍⚕️ Board-certified specialists with [certification details]
- 📍 Easy access from [nearby subway/bus stations]
- 🕐 Extended hours to accommodate busy [location] professionals
- 💳 Accept major insurance plans popular in [location]

## Understanding [Medical Service]

### What Is [Medical Service]?

[Detailed medical explanation of the service, including scientific background, FDA approvals if applicable, and medical benefits. This section should be educational while remaining accessible.]

### Who Can Benefit from [Medical Service]?

Our [medical service] treatment in [location] is ideal for patients experiencing:
- [Condition 1 with brief explanation]
- [Condition 2 with brief explanation]
- [Condition 3 with brief explanation]
- [Condition 4 with brief explanation]

## Our [Medical Service] Process in [Location]

### 1. Initial Consultation
Your journey begins with a comprehensive consultation at our [location] clinic. Our specialists will:
- Review your medical history
- Conduct necessary diagnostic tests
- Discuss your treatment goals
- Create a personalized treatment plan

### 2. Treatment Planning
Based on your consultation, we develop a customized approach that considers:
- Your specific medical condition
- Lifestyle factors common to [location] residents
- Insurance coverage options
- Optimal scheduling for your convenience

### 3. Treatment Sessions
Each [medical service] session at our [location] facility typically involves:
- Pre-treatment preparation
- The procedure itself (approximately [duration])
- Post-treatment monitoring
- Detailed aftercare instructions

### 4. Follow-up Care
We provide comprehensive follow-up support including:
- Scheduled check-ups
- 24/7 emergency hotline
- Ongoing treatment adjustments
- Long-term health monitoring

## Expected Results and Recovery

### What to Expect After [Medical Service]

Patients at our [location] clinic typically experience:
- **Immediate effects**: [Description]
- **Short-term (1-2 weeks)**: [Description]
- **Long-term (1-3 months)**: [Description]
- **Final results**: [Timeline and description]

### Recovery Timeline
- Day 1-3: [Recovery details]
- Week 1: [Recovery details]
- Week 2-4: [Recovery details]
- Month 2-3: [Recovery details]

## Safety and Credentials

### Our Medical Standards
[Clinic Name] in [location] maintains the highest medical standards:
- ✓ [Relevant medical certification]
- ✓ [Hospital affiliation if applicable]
- ✓ [Safety protocol certification]
- ✓ [Professional membership]

### Our Medical Team
**Dr. [Name], MD**
- [Medical school]
- [Residency/Fellowship]
- [Years of experience] specializing in [medical service]
- [Special recognition or research]

## Pricing and Insurance

### Insurance Coverage
We accept most major insurance plans used by [location] residents:
- [Insurance provider 1]
- [Insurance provider 2]
- [Insurance provider 3]
- [Insurance provider 4]

### Payment Options
For your convenience, we offer:
- Insurance direct billing
- Flexible payment plans
- Credit card payments
- HSA/FSA acceptance

### Transparent Pricing
Contact us for a detailed quote. Factors affecting cost include:
- Severity of condition
- Number of sessions required
- Insurance coverage level
- Additional treatments needed

## Patient Testimonials from [Location]

> "After struggling with [condition] for years, I finally found relief at [Clinic Name]. The team was professional, and the results exceeded my expectations."
> — [Patient initials], [Location] resident

> "The convenience of having such high-quality [medical service] right here in [location] made all the difference. I no longer have to travel to [other area] for treatment."
> — [Patient initials], [Nearby neighborhood]

> "Dr. [Name] took the time to explain everything thoroughly. I felt confident throughout the entire process."
> — [Patient initials], [Location] professional

## Frequently Asked Questions

### General Questions

**Q: How do I know if [medical service] is right for me?**
A: The best way to determine if you're a candidate is through a consultation at our [location] clinic. We'll evaluate your medical history, current condition, and treatment goals.

**Q: How long does [medical service] take?**
A: Treatment sessions typically last [duration], though your first visit including consultation may take [longer duration].

**Q: Is [medical service] painful?**
A: [Comfort level explanation with pain management options available]

### Location-Specific Questions

**Q: Where exactly is your [location] clinic located?**
A: We're located at [full address], just [distance] from [landmark/station]. [Parking/public transport information].

**Q: Do you have parking available?**
A: Yes, we offer [parking details specific to location].

**Q: What are your hours for the [location] clinic?**
A:
- Monday-Friday: [hours]
- Saturday: [hours]
- Sunday: [hours/closed]

### Insurance and Payment

**Q: Does insurance cover [medical service]?**
A: Coverage varies by plan. Our insurance specialists can verify your benefits before your appointment.

**Q: Do you offer payment plans?**
A: Yes, we offer flexible payment options including [specific plans available].

## Schedule Your [Medical Service] Consultation in [Location]
|
||||
|
||||
Ready to take the first step? Contact our [location] clinic today:
|
||||
|
||||
### Contact Information
|
||||
📍 **Address**: [Full address]
|
||||
📞 **Phone**: [Local phone number]
|
||||
📧 **Email**: [location]@[clinicname].com
|
||||
🌐 **Online Booking**: [URL]
|
||||
|
||||
### Office Hours
|
||||
- **Monday-Friday**: [Hours]
|
||||
- **Saturday**: [Hours]
|
||||
- **Sunday**: [Hours/Closed]
|
||||
- **Emergency**: [24/7 hotline if available]
|
||||
|
||||
### Getting Here
|
||||
**By Subway**: [Detailed directions from nearest station]
|
||||
**By Bus**: [Bus routes and stops]
|
||||
**By Car**: [Driving directions and parking info]
|
||||
|
||||
---
|
||||
|
||||
<!-- Schema Markup -->
|
||||
```json
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "MedicalClinic",
|
||||
"name": "[Clinic Name] - [Location]",
|
||||
"image": "[clinic-image-url]",
|
||||
"@id": "[page-url]",
|
||||
"url": "[website-url]",
|
||||
"telephone": "[phone-number]",
|
||||
"address": {
|
||||
"@type": "PostalAddress",
|
||||
"streetAddress": "[street]",
|
||||
"addressLocality": "[city]",
|
||||
"addressRegion": "[state/province]",
|
||||
"postalCode": "[zip]",
|
||||
"addressCountry": "KR"
|
||||
},
|
||||
"geo": {
|
||||
"@type": "GeoCoordinates",
|
||||
"latitude": [latitude],
|
||||
"longitude": [longitude]
|
||||
},
|
||||
"openingHoursSpecification": {
|
||||
"@type": "OpeningHoursSpecification",
|
||||
"dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
|
||||
"opens": "09:00",
|
||||
"closes": "18:00"
|
||||
},
|
||||
"medicalSpecialty": "[Medical Specialty]",
|
||||
"availableService": {
|
||||
"@type": "MedicalProcedure",
|
||||
"name": "[Medical Service]",
|
||||
"description": "[Service Description]"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
*Last updated: [Date] | [Clinic Name] - Professional [Medical Service] in [Location]*
|
||||
90
custom-skills/20-gtm-audit/README.md
Normal file
@@ -0,0 +1,90 @@
# OurDigital GTM Audit

Lightweight Google Tag Manager audit toolkit powered by Playwright.

> **Note**: For comprehensive GTM management including dataLayer tag generation, see [14-ourdigital-gtm-manager](../14-ourdigital-gtm-manager/).

## Features

- **Container Detection**: Verify GTM installation, position, and duplicates
- **DataLayer Validation**: Event structure, types, sequence checking
- **Form Tracking**: Form discovery, field analysis, event verification
- **E-commerce Checkout**: Full funnel flow simulation and validation
- **Multi-Platform**: GA4, Meta Pixel, LinkedIn, Google Ads, Kakao, Naver

## Installation

```bash
# Clone or download
cd gtm-audit-claude-code

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium
```

## Usage

```bash
# Full audit
python gtm_audit.py --url "https://yoursite.com" --journey full

# Specific container validation
python gtm_audit.py --url "https://yoursite.com" --container "GTM-XXXXXX"

# Form tracking only
python gtm_audit.py --url "https://yoursite.com/contact" --journey form

# E-commerce checkout
python gtm_audit.py --url "https://yoursite.com/cart" --journey checkout
```

## Options

| Option | Description | Default |
|--------|-------------|---------|
| `--url` | Target URL (required) | - |
| `--container` | Expected GTM container ID | None |
| `--journey` | Audit type: pageview, scroll, form, checkout, datalayer, full | full |
| `--output` | Output file path | gtm_audit_report.json |
| `--timeout` | Page load timeout (ms) | 30000 |
| `--headless` | Run browser headless | True |

## Output

Generates a JSON report with:
- Container status
- DataLayer events and validation issues
- Form analysis and tracking readiness
- Checkout flow analysis
- Network requests by destination
- Recommendations and checklist

## Using with Claude Code

This project includes a `CLAUDE.md` file optimized for use with Claude Code.

```bash
# In your terminal
claude

# Then ask Claude to run audits
> Run a GTM audit on https://example.com
> Check the form tracking on https://example.com/contact
> Analyze the checkout flow issues in the latest report
```

## Documentation

See the `docs/` folder for:
- GA4 event specifications
- E-commerce dataLayer schemas
- Form tracking patterns
- Checkout flow sequences
- Common issues and fixes

## License

MIT
109
custom-skills/20-gtm-audit/code/CLAUDE.md
Normal file
@@ -0,0 +1,109 @@
# OurDigital GTM Audit

Lightweight Google Tag Manager audit toolkit using Playwright browser automation.

> For comprehensive GTM management including dataLayer tag generation, see `14-ourdigital-gtm-manager`.

## Project Overview

This tool audits GTM container installations, validates dataLayer events, tests form tracking, simulates e-commerce checkout flows, and generates comprehensive reports.

## Quick Commands

```bash
# Install dependencies
pip install playwright
playwright install chromium

# Run full audit
python gtm_audit.py --url "https://example.com" --journey full

# Form tracking audit
python gtm_audit.py --url "https://example.com/contact" --journey form

# E-commerce checkout flow
python gtm_audit.py --url "https://example.com/cart" --journey checkout

# DataLayer deep inspection
python gtm_audit.py --url "https://example.com" --journey datalayer

# With specific container validation
python gtm_audit.py --url "https://example.com" --container "GTM-XXXXXX"
```

## Journey Types

| Journey | Description |
|---------|-------------|
| `pageview` | Basic page load + scroll simulation |
| `scroll` | Scroll depth trigger testing (25%, 50%, 75%, 90%) |
| `form` | Form discovery, field analysis, interaction simulation |
| `checkout` | E-commerce flow: cart → checkout → shipping → payment → purchase |
| `datalayer` | Deep dataLayer validation and event sequence analysis |
| `full` | All of the above combined |

## Output

Generates `gtm_audit_report.json` with:
- Container status (installed, position, duplicates)
- DataLayer analysis (events, validation issues, sequence errors)
- Form analysis (forms found, tracking readiness, missing events)
- Checkout analysis (elements detected, flow issues)
- Network requests (GA4, Meta, LinkedIn, etc.)
- Recommendations and checklist

## Key Files

- `gtm_audit.py` - Main audit script
- `docs/ga4_events.md` - GA4 event specifications
- `docs/ecommerce_schema.md` - E-commerce dataLayer structures
- `docs/form_tracking.md` - Form event patterns
- `docs/checkout_flow.md` - Checkout funnel sequence
- `docs/datalayer_validation.md` - Validation rules
- `docs/common_issues.md` - Frequent problems and fixes

## Coding Guidelines

When modifying this tool:

1. **Tag Destinations**: Add new platforms to `TAG_DESTINATIONS` dict
2. **Event Validation**: Add requirements to `GA4_EVENT_REQUIREMENTS` dict
3. **Form Selectors**: Extend `FormAnalyzer.discover_forms()` for custom forms
4. **Checkout Elements**: Add selectors to `CheckoutFlowAnalyzer.detect_checkout_elements()`

## Korean Market Considerations

- Support Korean payment methods (카카오페이, 네이버페이, 토스)
- Handle KRW currency (no decimals)
- Include Kakao Pixel and Naver Analytics patterns
- Korean button text patterns (장바구니, 결제하기, 주문하기)

## Testing a New Site

1. Run with `--journey full` first to get a complete picture
2. Check `gtm_audit_report.json` for issues
3. Focus on specific areas with targeted journey types
4. Use `--container GTM-XXXXXX` to validate a specific container

## Common Tasks

### Add support for a new tag platform
```python
# In TAG_DESTINATIONS dict
"NewPlatform": [
    r"tracking\.newplatform\.com",
    r"pixel\.newplatform\.com",
],
```

### Add custom form field detection
```python
# In FormAnalyzer.discover_forms()
# Add new field types or selectors
```

### Extend checkout flow for a specific platform
```python
# In CheckoutFlowAnalyzer.detect_checkout_elements()
# Add platform-specific selectors
```
237
custom-skills/20-gtm-audit/code/docs/checkout_flow.md
Normal file
@@ -0,0 +1,237 @@
# E-commerce Checkout Flow Reference

## Complete Checkout Event Sequence

```
view_cart → begin_checkout → add_shipping_info → add_payment_info → purchase
```

Each step must fire in order with consistent item data.
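One quick way to assert this ordering over a captured event stream is to map each funnel event to its position and check the positions are non-decreasing. A sketch (the helper is illustrative, not part of the audit tool):

```javascript
// Funnel steps in the order they must fire.
const FUNNEL = ['view_cart', 'begin_checkout', 'add_shipping_info',
                'add_payment_info', 'purchase'];

// Returns true when every funnel event in `events` appears in funnel order.
function funnelInOrder(events) {
  const positions = events
    .map(e => FUNNEL.indexOf(e.event))
    .filter(i => i !== -1);             // ignore non-funnel events
  return positions.every((p, i) => i === 0 || positions[i - 1] <= p);
}
```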
## Event Details

### 1. view_cart
When user views cart page.

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "view_cart",
  ecommerce: {
    currency: "KRW",
    value: 125000,
    items: [{
      item_id: "SKU_001",
      item_name: "Blue T-Shirt",
      price: 45000,
      quantity: 2,
      item_brand: "Brand",
      item_category: "Apparel"
    }, {
      item_id: "SKU_002",
      item_name: "Black Jeans",
      price: 35000,
      quantity: 1
    }]
  }
});
```

### 2. begin_checkout
When user initiates checkout process.

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "begin_checkout",
  ecommerce: {
    currency: "KRW",
    value: 125000,
    coupon: "SUMMER10",
    items: [/* same items as view_cart */]
  }
});
```

### 3. add_shipping_info
When user completes shipping step.

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_shipping_info",
  ecommerce: {
    currency: "KRW",
    value: 125000,
    coupon: "SUMMER10",
    shipping_tier: "Express", // Required
    items: [/* same items */]
  }
});
```

**shipping_tier values:**
- "Standard" / "일반배송"
- "Express" / "익일배송"
- "Same Day" / "당일배송"
- "Free" / "무료배송"
- "Store Pickup" / "매장픽업"

### 4. add_payment_info
When user enters payment details.

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_payment_info",
  ecommerce: {
    currency: "KRW",
    value: 125000,
    coupon: "SUMMER10",
    payment_type: "Credit Card", // Required
    items: [/* same items */]
  }
});
```

**payment_type values:**
- "Credit Card" / "신용카드"
- "Debit Card" / "체크카드"
- "Bank Transfer" / "계좌이체"
- "Virtual Account" / "가상계좌"
- "Mobile Payment" / "휴대폰결제"
- "Kakao Pay" / "카카오페이"
- "Naver Pay" / "네이버페이"
- "Toss" / "토스"
- "PayPal"

### 5. purchase
When transaction completes successfully.

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "purchase",
  ecommerce: {
    transaction_id: "T_20250115_001234", // Required, unique
    value: 130500, // Required (total)
    tax: 11863,
    shipping: 5000,
    currency: "KRW", // Required
    coupon: "SUMMER10",
    items: [{
      item_id: "SKU_001",
      item_name: "Blue T-Shirt",
      affiliation: "Online Store",
      coupon: "SUMMER10",
      discount: 4500,
      price: 45000,
      quantity: 2
    }]
  }
});
```

## Funnel Drop-off Analysis

### Tracking Drop-offs
Monitor completion rate at each step:

| Step | Event | Drop-off Indicator |
|------|-------|-------------------|
| Cart | view_cart | User leaves cart page |
| Checkout Start | begin_checkout | User doesn't proceed |
| Shipping | add_shipping_info | Address form abandoned |
| Payment | add_payment_info | Payment not completed |
| Complete | purchase | Transaction failed |

### Implementing Drop-off Tracking

```javascript
// Track checkout step viewed but not completed
let checkoutStep = 0;

function trackCheckoutProgress(step) {
  if (step > checkoutStep) {
    checkoutStep = step;
  }
}

window.addEventListener('beforeunload', () => {
  if (checkoutStep > 0 && checkoutStep < 5) {
    dataLayer.push({
      event: 'checkout_abandon',
      last_step: checkoutStep,
      step_name: ['cart', 'checkout', 'shipping', 'payment', 'complete'][checkoutStep - 1]
    });
  }
});
```

## Value Consistency Check

Ensure `value` matches across events:

```
view_cart.value = sum(items.price * items.quantity)
begin_checkout.value = view_cart.value
add_shipping_info.value = begin_checkout.value
add_payment_info.value = add_shipping_info.value
purchase.value = add_payment_info.value + shipping + tax - discount
```
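These consistency rules can be checked mechanically over a captured event stream. A minimal sketch (event and field names are GA4's; the helper itself is illustrative):

```javascript
// Checks the value-consistency rules above against captured dataLayer events.
function checkValueConsistency(events) {
  const byName = {};
  events.forEach(e => { if (e.event) byName[e.event] = e; });
  const issues = [];

  // Rule 1: view_cart.value must equal the item total.
  const cart = byName.view_cart;
  if (cart && cart.ecommerce && Array.isArray(cart.ecommerce.items)) {
    const itemTotal = cart.ecommerce.items.reduce(
      (sum, it) => sum + it.price * (it.quantity || 1), 0);
    if (itemTotal !== cart.ecommerce.value) {
      issues.push(`view_cart.value ${cart.ecommerce.value} != item total ${itemTotal}`);
    }
  }

  // Rule 2: the middle steps must carry view_cart's value unchanged.
  ['begin_checkout', 'add_shipping_info', 'add_payment_info'].forEach(name => {
    const ev = byName[name];
    if (cart && ev && ev.ecommerce && ev.ecommerce.value !== cart.ecommerce.value) {
      issues.push(`${name}.value ${ev.ecommerce.value} != view_cart.value ${cart.ecommerce.value}`);
    }
  });

  return issues;
}
```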
## Common Issues

### Duplicate Purchase Events
**Problem**: Same order tracked multiple times
**Solution**:
```javascript
// Check if already tracked
const txId = "T_12345";
if (!sessionStorage.getItem('purchase_' + txId)) {
  dataLayer.push({ event: 'purchase', ... });
  sessionStorage.setItem('purchase_' + txId, 'true');
}
```

### Missing Items in Later Steps
**Problem**: Items present in view_cart but missing in purchase
**Solution**: Store cart data in session and reuse

### Inconsistent Currency
**Problem**: Some events use USD, others KRW
**Solution**: Standardize currency across all events

### Wrong Value Calculation
**Problem**: purchase.value doesn't include tax/shipping
**Solution**:
```
purchase.value = subtotal + tax + shipping - discount
```
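Spelled out as code, with the numbers from the purchase example above (the helper name is illustrative):

```javascript
// Computes purchase.value from its components so tax and shipping are never dropped.
function purchaseValue({ subtotal, tax = 0, shipping = 0, discount = 0 }) {
  return subtotal + tax + shipping - discount;
}

purchaseValue({ subtotal: 125000, tax: 11863, shipping: 5000, discount: 4500 }); // → 137363
```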
## Korean E-commerce Platforms

### Cafe24
Custom dataLayer variable names - check documentation

### Shopify Korea
Standard GA4 format with `Shopify.checkout` object

### WooCommerce
Use official GA4 plugin or custom implementation

### Naver SmartStore
Separate Naver Analytics implementation required

## Checkout Flow Checklist

- [ ] view_cart fires on cart page load
- [ ] begin_checkout fires on checkout button click
- [ ] add_shipping_info includes shipping_tier
- [ ] add_payment_info includes payment_type
- [ ] purchase has unique transaction_id
- [ ] All events have consistent items array
- [ ] Currency is consistent across all events
- [ ] Value calculations are accurate
- [ ] ecommerce object cleared before each push
- [ ] Purchase event fires only once per order
211
custom-skills/20-gtm-audit/code/docs/common_issues.md
Normal file
@@ -0,0 +1,211 @@
# Common GTM Issues & Fixes

## Container Issues

### GTM Not Firing
**Symptoms**: No GTM requests in network tab
**Causes**:
1. Script blocked by ad blocker
2. Script placed after closing body tag
3. JavaScript error before GTM loads
4. Consent management blocking GTM

**Fix**:
```html
<!-- Place immediately after opening <head> tag -->
<script>(function(w,d,s,l,i){...})(window,document,'script','dataLayer','GTM-XXXXXX');</script>
```

### Multiple Containers Conflict
**Symptoms**: Duplicate events, inconsistent data
**Causes**:
1. Legacy container not removed
2. Different teams installed separate containers
3. Theme/plugin auto-installed GTM

**Fix**:
1. Audit all containers in source
2. Consolidate to single container
3. Use GTM environments for staging/prod
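A quick way to do step 1 is to scan every script on the page for container IDs; more than one distinct ID suggests a conflict. A sketch (the helper name is illustrative):

```javascript
// Collects distinct GTM container IDs from a list of script sources/contents.
function findContainerIds(scriptTexts) {
  const ids = new Set();
  scriptTexts.forEach(t => {
    (t.match(/GTM-[A-Z0-9]+/g) || []).forEach(id => ids.add(id));
  });
  return [...ids];
}
```

In a browser console: `findContainerIds([...document.scripts].map(s => s.src + s.textContent))`.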
### Container ID Mismatch
**Symptoms**: Tags not firing, wrong property receiving data
**Causes**:
1. Dev/staging container on production
2. Copy-paste error during installation

**Fix**: Verify container ID matches GTM account

---

## DataLayer Issues

### DataLayer Not Initialized
**Symptoms**: First push events lost
**Code Error**:
```html
<!-- Wrong - pushes happen before the dataLayer array exists -->
<script>dataLayer.push({...});</script>
<script>GTM snippet</script>
```

**Fix**:
```html
<!-- Correct - Initialize dataLayer first -->
<script>window.dataLayer = window.dataLayer || [];</script>
<script>GTM snippet</script>
```

### Case Sensitivity Issues
**Symptoms**: Triggers not matching
**Example**:
```javascript
// DataLayer pushes "AddToCart"
dataLayer.push({ event: "AddToCart" });

// But GTM trigger looks for "addToCart" - won't match!
```

**Fix**: Standardize event naming (recommend lowercase with underscores)
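A normalization helper can enforce that convention at push time; a sketch (the helper name is illustrative):

```javascript
// Normalizes an event name to lowercase_with_underscores before pushing,
// so trigger matching never depends on case or separators.
function normalizeEventName(name) {
  return name
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2') // split camelCase boundaries
    .replace(/[\s-]+/g, '_')                // spaces and hyphens → underscore
    .toLowerCase();
}

normalizeEventName('AddToCart'); // → 'add_to_cart'
```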
### Wrong Data Types
**Symptoms**: Calculations wrong in GA4, missing data
**Example**:
```javascript
// Wrong - price as string
dataLayer.push({ ecommerce: { value: "29.99" }});

// Correct - price as number
dataLayer.push({ ecommerce: { value: 29.99 }});
```

### Timing Issues
**Symptoms**: Events fire before data available
**Cause**: DataLayer push happens after tag fires

**Fix**: Use "Custom Event" trigger instead of "Page View"

---

## Tag Issues

### Tag Not Firing

**Checklist**:
1. ✓ Trigger conditions met?
2. ✓ Trigger enabled?
3. ✓ Tag not paused?
4. ✓ No blocking triggers active?
5. ✓ Consent mode not blocking?

**Debug Steps**:
1. GTM Preview > Check Tags Fired
2. Verify trigger shows green check
3. Check Variables tab for expected values

### Duplicate Tag Firing
**Symptoms**: Events counted 2x in GA4
**Causes**:
1. Multiple triggers on same action
2. Page re-renders triggering again
3. SPA virtual pageviews firing multiple times

**Fix**:
1. Add "Once per event" tag firing option
2. Use trigger groups to control firing
3. Add conditions to prevent re-firing

### Wrong Parameters Sent
**Symptoms**: Data appears in wrong fields in GA4
**Debug**:
1. GTM Preview > Tags > Show fired tag
2. Check "Values" sent with tag
3. Compare with expected parameters

---

## E-commerce Issues

### Missing Transaction ID
**Symptoms**: Duplicate purchases counted
**Fix**: Ensure unique `transaction_id` generated server-side

### Items Array Empty
**Symptoms**: Revenue tracked but no products
**Check**: `ecommerce.items` array populated

### Value Mismatch
**Symptoms**: Revenue doesn't match actual
**Causes**:
1. Tax/shipping included inconsistently
2. Currency conversion issues
3. Discount applied incorrectly

### Purchase Event Fires Multiple Times
**Symptoms**: Same order tracked 2-3x
**Causes**:
1. Page refresh on confirmation
2. Browser back button
3. Email link revisit

**Fix**:
```javascript
// Check if already tracked
if (!sessionStorage.getItem('purchase_' + transaction_id)) {
  dataLayer.push({ event: 'purchase', ... });
  sessionStorage.setItem('purchase_' + transaction_id, 'true');
}
```

---

## Consent Mode Issues

### Tags Blocked by Consent
**Symptoms**: Tags show "Blocked by consent" in Preview
**Fix**:
1. Verify consent mode implementation
2. Check default consent state
3. Test with consent granted

### Consent Not Updating
**Symptoms**: Tags stay blocked after user accepts
**Fix**: Verify `gtag('consent', 'update', {...})` fires on accept
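A minimal sketch of that wiring, using the standard `gtag` stub (the two storage keys shown are the common pair; your banner's accept callback will differ):

```javascript
// Standard gtag stub - in production this ships with the gtag.js/GTM snippet.
var dataLayer = [];
function gtag() { dataLayer.push(arguments); }

// Restrictive defaults, set before GTM loads:
gtag('consent', 'default', {
  ad_storage: 'denied',
  analytics_storage: 'denied'
});

// Later, inside the cookie banner's accept handler:
gtag('consent', 'update', {
  ad_storage: 'granted',
  analytics_storage: 'granted'
});
```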
---

## SPA (Single Page App) Issues

### Pageviews Not Tracking Navigation
**Symptoms**: Only initial pageview tracked
**Cause**: No page reload on route change

**Fix**: Implement History Change trigger or custom event:
```javascript
// On route change
dataLayer.push({
  event: 'virtual_pageview',
  page_path: newPath,
  page_title: newTitle
});
```

### Events Fire on Old Page Data
**Symptoms**: Wrong page_path in events
**Fix**: Update page variables before event push
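One way to enforce this is to update the page context first and bundle it into the same push that announces the route change; a sketch with illustrative names:

```javascript
var dataLayer = [];
var currentPage = { page_path: '/', page_title: 'Home' };

// Update page variables BEFORE pushing, so no event sees stale page_path.
function onRouteChange(path, title) {
  currentPage = { page_path: path, page_title: title };
  dataLayer.push(Object.assign({ event: 'virtual_pageview' }, currentPage));
}

onRouteChange('/pricing', 'Pricing');
```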
---

## Performance Issues

### Tags Slowing Page Load
**Symptoms**: High LCP, slow TTI
**Causes**:
1. Too many synchronous tags
2. Large third-party scripts
3. Tags in wrong firing sequence

**Fix**:
1. Use tag sequencing
2. Load non-critical tags on Window Loaded
3. Defer marketing tags
287
custom-skills/20-gtm-audit/code/docs/datalayer_validation.md
Normal file
@@ -0,0 +1,287 @@
# DataLayer Validation Reference

## DataLayer Structure Basics

### Proper Initialization
```html
<!-- Must appear BEFORE the GTM script -->
<script>
window.dataLayer = window.dataLayer || [];
</script>
<!-- GTM script here -->
```

### Push Syntax
```javascript
// Correct
dataLayer.push({ event: "page_view", page_title: "Home" });

// Wrong - direct assignment
dataLayer = [{ event: "page_view" }]; // ❌ Overwrites array
```

## Validation Rules

### Event Names

| Rule | Valid | Invalid |
|------|-------|---------|
| Alphanumeric + underscore | `add_to_cart` | `add-to-cart` |
| Max 40 characters | `purchase` | (too long names) |
| Case sensitive | `addToCart` ≠ `addtocart` | - |
| No spaces | `form_submit` | `form submit` |
| No special chars | `click_cta` | `click@cta` |
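The rules in this table collapse into one check; a sketch (illustrative helper, not part of the audit tool):

```javascript
// True when `name` satisfies the event-name rules above:
// non-empty, at most 40 characters, letters/digits/underscore only.
function isValidEventName(name) {
  return typeof name === 'string'
    && name.length > 0 && name.length <= 40
    && /^[A-Za-z0-9_]+$/.test(name);
}
```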
### Parameter Names

| Rule | Valid | Invalid |
|------|-------|---------|
| Max 40 characters | `item_category` | (too long) |
| Alphanumeric + underscore | `user_id` | `user-id` |
| Cannot start with `_` | `custom_param` | `_private` |
| Cannot start with number | `step_1` | `1_step` |

### Data Types

| Parameter | Expected Type | Example |
|-----------|---------------|---------|
| value | number | `29.99` not `"29.99"` |
| currency | string (ISO 4217) | `"USD"`, `"KRW"` |
| transaction_id | string | `"T_12345"` |
| quantity | integer | `2` not `2.0` |
| price | number | `45000` |
| items | array | `[{...}, {...}]` |

### Type Validation Code

```javascript
function validateDataLayerPush(data) {
  const issues = [];

  // Check value is number
  if (data.ecommerce?.value !== undefined) {
    if (typeof data.ecommerce.value !== 'number') {
      issues.push(`value should be number, got ${typeof data.ecommerce.value}`);
    }
  }

  // Check currency format
  if (data.ecommerce?.currency) {
    if (!/^[A-Z]{3}$/.test(data.ecommerce.currency)) {
      issues.push(`currency should be 3-letter ISO code`);
    }
  }

  // Check items array
  if (data.ecommerce?.items) {
    if (!Array.isArray(data.ecommerce.items)) {
      issues.push(`items should be array`);
    } else {
      data.ecommerce.items.forEach((item, i) => {
        if (!item.item_id) issues.push(`items[${i}] missing item_id`);
        if (!item.item_name) issues.push(`items[${i}] missing item_name`);
        if (item.price && typeof item.price !== 'number') {
          issues.push(`items[${i}].price should be number`);
        }
        if (item.quantity && !Number.isInteger(item.quantity)) {
          issues.push(`items[${i}].quantity should be integer`);
        }
      });
    }
  }

  return issues;
}
```

## E-commerce Object Clearing

### Why Clear?
GA4 may merge previous ecommerce data with new events.

### Correct Pattern
```javascript
// Clear first
dataLayer.push({ ecommerce: null });

// Then push new event
dataLayer.push({
  event: "view_item",
  ecommerce: { ... }
});
```

### Validation Check
```javascript
function checkEcommerceClear(dataLayerArray) {
  let lastHadEcommerce = false;
  const issues = [];

  dataLayerArray.forEach((item, i) => {
    const hasEcommerce = 'ecommerce' in item;
    const isNull = item.ecommerce === null;

    if (hasEcommerce && !isNull && lastHadEcommerce) {
      issues.push({
        index: i,
        message: 'Missing ecommerce:null before this push'
      });
    }

    lastHadEcommerce = hasEcommerce && !isNull;
  });

  return issues;
}
```

## Event Sequence Validation

### Expected Sequences

**E-commerce Purchase Flow:**
```
view_item_list? → view_item → add_to_cart → view_cart →
begin_checkout → add_shipping_info → add_payment_info → purchase
```

**Form Submission:**
```
form_start → form_submit → generate_lead?
```

**User Authentication:**
```
login | sign_up
```

### Sequence Validator

```javascript
function validateSequence(events, expectedOrder) {
  const eventNames = events
    .filter(e => e.event)
    .map(e => e.event);

  let lastIndex = -1;
  const issues = [];

  eventNames.forEach(event => {
    const index = expectedOrder.indexOf(event);
    if (index !== -1) {
      if (index < lastIndex) {
        issues.push(`${event} fired out of expected order`);
      }
      lastIndex = index;
    }
  });

  return issues;
}
```

## Duplicate Event Detection

### Common Duplicates
- Multiple `page_view` on single page load
- `purchase` firing on page refresh
- Click events on bubbling elements

### Detection Code

```javascript
function findDuplicates(events) {
  const seen = {};
  const duplicates = [];

  events.forEach((event, i) => {
    if (!event.event) return;

    const key = JSON.stringify(event);
    if (seen[key] !== undefined) { // compare against undefined: index 0 is falsy
      duplicates.push({
        event: event.event,
        firstIndex: seen[key],
        duplicateIndex: i
      });
    } else {
      seen[key] = i;
    }
  });

  return duplicates;
}
```

## Real-time Monitoring Setup

### Console Monitoring

```javascript
// Paste in browser console to monitor pushes
(function() {
  const original = dataLayer.push;
  dataLayer.push = function() {
    console.group('📊 dataLayer.push');
    console.log('Data:', arguments[0]);
    console.log('Time:', new Date().toISOString());
    console.groupEnd();
    return original.apply(this, arguments);
  };
  console.log('✅ DataLayer monitoring active');
})();
```

### Export DataLayer

```javascript
// Copy full dataLayer to clipboard (DevTools console)
copy(JSON.stringify(dataLayer, null, 2));
```

## Validation Checklist

### Structure
- [ ] dataLayer initialized before GTM
- [ ] Using push() not assignment
- [ ] Event names follow conventions
- [ ] Parameter names follow conventions

### Data Types
- [ ] value is number
- [ ] currency is 3-letter code
- [ ] quantity is integer
- [ ] items is array
- [ ] Required fields present

### E-commerce
- [ ] ecommerce:null before each push
- [ ] items array has item_id and item_name
- [ ] transaction_id is unique
- [ ] Consistent currency across events

### Sequence
- [ ] Events fire in logical order
- [ ] No duplicate events
- [ ] Purchase fires only once

## Debug Tools

### GTM Preview Mode
- Real-time event inspection
- Variable value checking
- Tag firing verification

### GA4 DebugView
- Live event stream
- Parameter validation
- User property tracking

### Browser Console
```javascript
// View current dataLayer
console.table(dataLayer);

// Filter by event
dataLayer.filter(d => d.event === 'purchase');
```
216
custom-skills/20-gtm-audit/code/docs/ecommerce_schema.md
Normal file
@@ -0,0 +1,216 @@
# E-commerce DataLayer Schema Reference

## GA4 E-commerce Structure

### Items Array Schema

Every e-commerce event requires an `items` array:

```javascript
items: [{
  // Required
  item_id: "SKU_12345",
  item_name: "Blue T-Shirt",

  // Recommended
  affiliation: "Store Name",
  coupon: "SUMMER_SALE",
  discount: 5.00,
  index: 0,
  item_brand: "Brand Name",
  item_category: "Apparel",
  item_category2: "Men",
  item_category3: "Shirts",
  item_category4: "T-Shirts",
  item_category5: "Short Sleeve",
  item_list_id: "related_products",
  item_list_name: "Related Products",
  item_variant: "Blue/Large",
  location_id: "ChIJIQBpAG2ahYAR_6128GcTUEo",
  price: 29.99,
  quantity: 1
}]
```

### Clear Previous E-commerce Data

Always clear the `ecommerce` object before pushing a new e-commerce event:

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "view_item",
  ecommerce: {
    // new data
  }
});
```

## Complete Purchase Flow

### 1. Product List View

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "view_item_list",
  ecommerce: {
    item_list_id: "category_results",
    item_list_name: "Category Results",
    items: [
      { item_id: "SKU_001", item_name: "Product 1", index: 0, price: 29.99 },
      { item_id: "SKU_002", item_name: "Product 2", index: 1, price: 39.99 }
    ]
  }
});
```

### 2. Product Click

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "select_item",
  ecommerce: {
    item_list_id: "category_results",
    item_list_name: "Category Results",
    items: [{
      item_id: "SKU_001",
      item_name: "Product 1",
      price: 29.99
    }]
  }
});
```

### 3. Product Detail View

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "view_item",
  ecommerce: {
    currency: "USD",
    value: 29.99,
    items: [{
      item_id: "SKU_001",
      item_name: "Product 1",
      item_brand: "Brand",
      item_category: "Category",
      price: 29.99,
      quantity: 1
    }]
  }
});
```

### 4. Add to Cart

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_to_cart",
  ecommerce: {
    currency: "USD",
    value: 29.99,
    items: [{
      item_id: "SKU_001",
      item_name: "Product 1",
      price: 29.99,
      quantity: 1
    }]
  }
});
```

### 5. View Cart

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "view_cart",
  ecommerce: {
    currency: "USD",
    value: 59.98,
    items: [
      { item_id: "SKU_001", item_name: "Product 1", price: 29.99, quantity: 1 },
      { item_id: "SKU_002", item_name: "Product 2", price: 29.99, quantity: 1 }
    ]
  }
});
```

### 6. Begin Checkout

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "begin_checkout",
  ecommerce: {
    currency: "USD",
    value: 59.98,
    coupon: "DISCOUNT10",
    items: [...]
  }
});
```

### 7. Add Shipping Info

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_shipping_info",
  ecommerce: {
    currency: "USD",
    value: 59.98,
    shipping_tier: "Standard",
    items: [...]
  }
});
```

### 8. Add Payment Info

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_payment_info",
  ecommerce: {
    currency: "USD",
    value: 59.98,
    payment_type: "Credit Card",
    items: [...]
  }
});
```

### 9. Purchase

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "purchase",
  ecommerce: {
    transaction_id: "T_12345",
    value: 65.97,
    tax: 4.99,
    shipping: 5.99,
    currency: "USD",
    coupon: "DISCOUNT10",
    items: [{
      item_id: "SKU_001",
      item_name: "Product 1",
      affiliation: "Online Store",
      coupon: "DISCOUNT10",
      discount: 3.00,
      item_brand: "Brand",
      item_category: "Category",
      price: 29.99,
      quantity: 1
    }]
  }
});
```

## Korean E-commerce Considerations

### Currency

```javascript
currency: "KRW",
value: 35000 // No decimals for KRW
```

### Common Korean Platform Integrations
- Cafe24: Uses custom dataLayer structure
- Shopify Korea: Standard GA4 format
- Naver SmartStore: Custom pixel implementation
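The zero-decimal rule can be enforced before each push. A minimal sketch, assuming values arrive as floats (`normalizeValue` and the currency set are illustrative, not a GA4 API):

```javascript
// ISO 4217 zero-decimal currencies must be sent as whole numbers.
const ZERO_DECIMAL = new Set(["KRW", "JPY", "VND"]);

function normalizeValue(value, currency) {
  // Whole units for zero-decimal currencies, two decimals otherwise.
  return ZERO_DECIMAL.has(currency)
    ? Math.round(value)
    : Math.round(value * 100) / 100;
}

console.log(normalizeValue(35000.4, "KRW")); // 35000
console.log(normalizeValue(29.989, "USD")); // 29.99
```

Running this once over `value`, `price`, and `discount` before the push avoids the most common KRW rejection in GA4 validation.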
157
custom-skills/20-gtm-audit/code/docs/form_tracking.md
Normal file
@@ -0,0 +1,157 @@
# Form Tracking Reference

## GA4 Form Events

### form_start
Fires on the first interaction with a form field.

```javascript
dataLayer.push({
  event: "form_start",
  form_id: "contact-form",
  form_name: "Contact Us",
  form_destination: "/submit-contact"
});
```

### form_submit
Fires on successful form submission.

```javascript
dataLayer.push({
  event: "form_submit",
  form_id: "contact-form",
  form_name: "Contact Us",
  form_destination: "/submit-contact",
  form_submit_text: "Send Message"
});
```

### generate_lead
Fires when a form generates a qualified lead.

```javascript
dataLayer.push({
  event: "generate_lead",
  currency: "USD",
  value: 100, // Estimated lead value
  form_id: "quote-request"
});
```

## Form Field Events (Custom)

### field_focus

```javascript
dataLayer.push({
  event: "field_focus",
  form_id: "signup-form",
  field_name: "email",
  field_type: "email"
});
```

### field_complete

```javascript
dataLayer.push({
  event: "field_complete",
  form_id: "signup-form",
  field_name: "email",
  field_type: "email",
  is_valid: true
});
```

### field_error

```javascript
dataLayer.push({
  event: "field_error",
  form_id: "signup-form",
  field_name: "email",
  error_message: "Invalid email format"
});
```

## Form Abandonment Tracking

### Detecting Abandonment
Track when a user leaves a form without submitting:

```javascript
// Track form state
let formStarted = false;
let formSubmitted = false;
let lastFocusedField = null;
const completedFields = new Set();

document.querySelectorAll('form input, form select, form textarea')
  .forEach(field => {
    field.addEventListener('focus', function () {
      lastFocusedField = this.name || this.id;
      if (!formStarted) {
        formStarted = true;
        dataLayer.push({ event: 'form_start', form_id: this.form.id });
      }
    });
    // Count a field as completed once it holds a value
    field.addEventListener('change', function () {
      if (this.value) completedFields.add(this.name || this.id);
    });
  });

// Mark successful submits so they are not reported as abandonment
document.querySelectorAll('form').forEach(form => {
  form.addEventListener('submit', () => { formSubmitted = true; });
});

// Track abandonment on page leave
window.addEventListener('beforeunload', function () {
  if (formStarted && !formSubmitted) {
    dataLayer.push({
      event: 'form_abandon',
      form_id: 'contact-form',
      last_field: lastFocusedField,
      fields_completed: completedFields.size
    });
  }
});
```

## GTM Trigger Configuration

### Form Submission Trigger

| Setting | Value |
|---------|-------|
| Trigger Type | Form Submission |
| Wait for Tags | Check (if AJAX form) |
| Check Validation | Check |
| Form ID | equals `contact-form` |

### Form Start Trigger (Custom Event)

| Setting | Value |
|---------|-------|
| Trigger Type | Custom Event |
| Event Name | form_start |
| Fire On | All Custom Events |

## Common Form Types & Tracking

### Contact Forms
Events: `form_start`, `form_submit`, `generate_lead`

### Newsletter Signup
Events: `form_start`, `form_submit`, `sign_up`

### Login Forms
Events: `form_start`, `login`

### Search Forms
Events: `search` (with search_term parameter)

### Multi-Step Forms
Track each step:

```javascript
dataLayer.push({
  event: "form_step",
  form_id: "checkout-form",
  step_number: 2,
  step_name: "Shipping Address"
});
```

## Validation Checklist

- [ ] Form has id or name attribute
- [ ] All required fields have names
- [ ] Submit button identifiable
- [ ] form_start fires on first interaction
- [ ] form_submit fires only on success
- [ ] generate_lead has value parameter
- [ ] Error events track validation failures
- [ ] Abandonment tracking implemented (optional)
177
custom-skills/20-gtm-audit/code/docs/ga4_events.md
Normal file
@@ -0,0 +1,177 @@
# GA4 Recommended Events Reference

## Automatically Collected Events
Events GA4 collects without configuration:
- `first_visit` - First time user visits
- `session_start` - Session begins
- `page_view` - Page loads (enhanced measurement)
- `scroll` - 90% scroll depth
- `click` - Outbound link clicks
- `file_download` - File download clicks
- `video_start`, `video_progress`, `video_complete` - YouTube embeds

## E-commerce Events (Required Parameters)

### view_item_list

```javascript
{
  event: "view_item_list",
  ecommerce: {
    item_list_id: "related_products",
    item_list_name: "Related Products",
    items: [{
      item_id: "SKU_12345",      // required
      item_name: "Product Name", // required
      price: 29.99,
      quantity: 1
    }]
  }
}
```

### view_item

```javascript
{
  event: "view_item",
  ecommerce: {
    currency: "USD",
    value: 29.99,
    items: [{
      item_id: "SKU_12345",      // required
      item_name: "Product Name", // required
      price: 29.99,
      quantity: 1
    }]
  }
}
```

### add_to_cart

```javascript
{
  event: "add_to_cart",
  ecommerce: {
    currency: "USD",
    value: 29.99,
    items: [{
      item_id: "SKU_12345",      // required
      item_name: "Product Name", // required
      price: 29.99,
      quantity: 1
    }]
  }
}
```

### begin_checkout

```javascript
{
  event: "begin_checkout",
  ecommerce: {
    currency: "USD",
    value: 99.99,
    coupon: "SUMMER_SALE",
    items: [...]
  }
}
```

### add_payment_info

```javascript
{
  event: "add_payment_info",
  ecommerce: {
    currency: "USD",
    value: 99.99,
    payment_type: "credit_card",
    items: [...]
  }
}
```

### purchase

```javascript
{
  event: "purchase",
  ecommerce: {
    transaction_id: "T12345", // required, must be unique
    value: 99.99,             // required
    currency: "USD",          // required
    tax: 4.99,
    shipping: 5.99,
    coupon: "SUMMER_SALE",
    items: [{
      item_id: "SKU_12345",      // required
      item_name: "Product Name", // required
      price: 29.99,
      quantity: 2
    }]
  }
}
```

## Lead Generation Events

### generate_lead

```javascript
{
  event: "generate_lead",
  currency: "USD",
  value: 100 // estimated lead value
}
```

### sign_up

```javascript
{
  event: "sign_up",
  method: "email" // or "google", "facebook", etc.
}
```

### login

```javascript
{
  event: "login",
  method: "email"
}
```

## Engagement Events

### search

```javascript
{
  event: "search",
  search_term: "blue shoes"
}
```

### share

```javascript
{
  event: "share",
  method: "twitter",
  content_type: "article",
  item_id: "article_123"
}
```

## Parameter Validation Rules

| Parameter | Type | Max Length | Notes |
|-----------|------|------------|-------|
| event name | string | 40 chars | No spaces, alphanumeric + underscore |
| item_id | string | 100 chars | Required for e-commerce |
| item_name | string | 100 chars | Required for e-commerce |
| currency | string | 3 chars | ISO 4217 format (USD, KRW, etc.) |
| transaction_id | string | 100 chars | Must be unique per transaction |
| value | number | - | Numeric, no currency symbols |

## Common Validation Errors

1. **Missing required params**: `item_id` or `item_name` not in items array
2. **Wrong data type**: `value` as string instead of number
3. **Duplicate transaction_id**: Same ID used for multiple purchases
4. **Empty items array**: E-commerce event with no items
5. **Invalid currency**: Currency code not in ISO 4217 format
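The rules and errors above can be folded into a small pre-push validator. This is a sketch only — `validateEvent` and its messages are illustrative, not part of GA4 or GTM:

```javascript
// Validate a dataLayer payload against the rules in the table above.
function validateEvent(payload) {
  const errors = [];
  // Event name: alphanumeric + underscore, max 40 chars
  if (!/^[a-z][a-z0-9_]{0,39}$/i.test(payload.event || "")) {
    errors.push("event name: alphanumeric + underscore, max 40 chars");
  }
  const ecom = payload.ecommerce;
  if (ecom) {
    if (typeof ecom.value !== "undefined" && typeof ecom.value !== "number") {
      errors.push("value must be a number");
    }
    if (ecom.currency && !/^[A-Z]{3}$/.test(ecom.currency)) {
      errors.push("currency must be a 3-letter ISO 4217 code");
    }
    if (!Array.isArray(ecom.items) || ecom.items.length === 0) {
      errors.push("items must be a non-empty array");
    } else {
      ecom.items.forEach((item, i) => {
        if (!item.item_id && !item.item_name) {
          errors.push(`items[${i}]: item_id or item_name required`);
        }
      });
    }
  }
  return errors;
}

// Each common error above produces one message:
console.log(validateEvent({
  event: "purchase",
  ecommerce: { value: "99.99", currency: "usd", items: [] }
}));
```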
115
custom-skills/20-gtm-audit/code/docs/report_template.md
Normal file
@@ -0,0 +1,115 @@
# GTM Audit Report Template

## Executive Summary

| Metric | Status |
|--------|--------|
| Container Installed | ✅ / ❌ |
| Container Valid | ✅ / ❌ |
| DataLayer Active | ✅ / ❌ |
| Tags Firing | X of Y |
| Critical Issues | X |
| Warnings | X |

## Container Status

**Container ID**: GTM-XXXXXX
**Installation Position**: head / body
**Multiple Containers**: Yes / No
**Noscript Fallback**: Present / Missing

### Issues Found
- [ ] Issue description

## DataLayer Analysis

### Events Captured

| Event Name | Count | Has Issues |
|------------|-------|------------|
| page_view | 1 | No |
| add_to_cart | 0 | - |

### DataLayer Quality
- [ ] Initialized before GTM
- [ ] Standard event naming
- [ ] Correct data types
- [ ] E-commerce structure valid

## Tag Firing Report

### Tags Fired ✅

| Destination | Events | Parameters |
|-------------|--------|------------|
| GA4 | page_view | page_location, page_title |
| Meta Pixel | PageView | - |

### Tags Not Detected ⚠️

| Expected Tag | Reason | Priority |
|--------------|--------|----------|
| GA4 purchase | Event not triggered | High |

## Network Request Analysis

Total requests captured: X

### By Destination

| Destination | Requests | Status |
|-------------|----------|--------|
| GA4 | X | ✅ |
| Meta | X | ✅ |
| Google Ads | 0 | ⚠️ |

## Issues & Recommendations

### Critical 🔴
1. **Issue Title**
   - Description
   - Impact
   - Recommended Fix

### Warning 🟡
1. **Issue Title**
   - Description
   - Recommended Fix

### Info 🔵
1. **Issue Title**
   - Description

## Action Items Checklist

### Immediate (Critical)
- [ ] Action item 1
- [ ] Action item 2

### Short-term (This Week)
- [ ] Action item 3

### Long-term (This Month)
- [ ] Action item 4

## Technical Details

### Environment
- URL Audited: https://example.com
- Audit Timestamp: YYYY-MM-DD HH:MM:SS
- Browser: Chromium (headless)
- Viewport: 1920x1080

### Raw Data
Full JSON report available at: `gtm_audit_report.json`

---

## Appendix: Tag Destination Reference

| Tag Type | Network Pattern |
|----------|-----------------|
| GA4 | google-analytics.com/g/collect |
| UA (Legacy) | google-analytics.com/collect |
| Google Ads | googleads.g.doubleclick.net |
| Meta Pixel | facebook.com/tr |
| LinkedIn | px.ads.linkedin.com |
| TikTok | analytics.tiktok.com |
| Kakao | pixel.kakao.com |
| Naver | wcs.naver.com |
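The pattern table can be applied mechanically when classifying captured requests. A sketch, assuming request URLs as plain strings; the `TAG_DESTINATIONS` object below mirrors the table, and `classifyRequest` is a hypothetical helper, not the audit script's actual code:

```javascript
// Map a request URL to a tag destination using the appendix patterns.
// GA4 is listed before UA so /g/collect is never misread as legacy UA.
const TAG_DESTINATIONS = {
  "GA4": /google-analytics\.com\/g\/collect/,
  "UA (Legacy)": /google-analytics\.com\/collect/,
  "Google Ads": /googleads\.g\.doubleclick\.net/,
  "Meta Pixel": /facebook\.com\/tr/,
  "LinkedIn": /px\.ads\.linkedin\.com/,
  "TikTok": /analytics\.tiktok\.com/,
  "Kakao": /pixel\.kakao\.com/,
  "Naver": /wcs\.naver\.com/,
};

function classifyRequest(url) {
  for (const [name, pattern] of Object.entries(TAG_DESTINATIONS)) {
    if (pattern.test(url)) return name;
  }
  return null; // unrecognized destination
}

console.log(classifyRequest("https://www.google-analytics.com/g/collect?v=2")); // "GA4"
```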
1113
custom-skills/20-gtm-audit/code/scripts/gtm_audit.py
Normal file
File diff suppressed because it is too large
1
custom-skills/20-gtm-audit/code/scripts/requirements.txt
Normal file
@@ -0,0 +1 @@
playwright>=1.40.0
30
custom-skills/20-gtm-audit/code/scripts/setup.sh
Normal file
@@ -0,0 +1,30 @@
#!/bin/bash
# GTM Audit Tool Setup Script

echo "🔧 Setting up GTM Audit Tool..."

# Check Python
if ! command -v python3 &> /dev/null; then
    echo "❌ Python 3 is required but not installed."
    exit 1
fi

echo "✅ Python 3 found"

# Install dependencies
echo "📦 Installing Python dependencies..."
pip install -r requirements.txt

# Install Playwright browsers
echo "🌐 Installing Playwright browsers..."
playwright install chromium

echo ""
echo "✅ Setup complete!"
echo ""
echo "Usage:"
echo "  python gtm_audit.py --url 'https://example.com' --journey full"
echo ""
echo "For Claude Code:"
echo "  claude"
echo "  > Run a GTM audit on https://example.com"
BIN
custom-skills/20-gtm-audit/desktop/SKILL.md
Normal file
Binary file not shown.
142
custom-skills/21-gtm-manager/README.md
Normal file
@@ -0,0 +1,142 @@
# OurDigital GTM Manager

Comprehensive Google Tag Manager management toolkit powered by Playwright.

> **Note**: For lightweight audit-only functionality, see [13-ourdigital-gtm-audit](../13-ourdigital-gtm-audit/).

## Features

- **Audit Mode**: Validate GTM installations, dataLayer events, forms, and checkout flows
- **Inject Mode**: Generate custom HTML tags for dataLayer pushes
- **Container Detection**: Verify GTM installation, position, and duplicates
- **DataLayer Validation**: Event structure, types, sequence checking
- **Form Tracking**: Form discovery, field analysis, event verification
- **E-commerce Checkout**: Full funnel flow simulation and validation
- **Multi-Platform**: GA4, Meta Pixel, LinkedIn, Google Ads, Kakao, Naver
- **Notion Integration**: Export audit results to Notion database

## Installation

```bash
cd ourdigital-gtm-manager

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium
```

## Audit Mode

Analyze existing GTM implementations:

```bash
# Full audit
python gtm_manager.py audit --url "https://yoursite.com" --journey full

# Specific container validation
python gtm_manager.py audit --url "https://yoursite.com" --container "GTM-XXXXXX"

# Form tracking only
python gtm_manager.py audit --url "https://yoursite.com/contact" --journey form

# Audit and generate missing tags
python gtm_manager.py audit --url "https://yoursite.com" --generate-tags

# Export to Notion
python gtm_manager.py audit --url "https://yoursite.com" --notion
```

### Audit Options

| Option | Description | Default |
|--------|-------------|---------|
| `--url` | Target URL (required) | - |
| `--container` | Expected GTM container ID | None |
| `--journey` | pageview, scroll, form, checkout, datalayer, full | full |
| `--output` | Output file path | gtm_audit_report.json |
| `--generate-tags` | Generate missing dataLayer tags | False |
| `--notion` | Export to Notion database | False |

## Inject Mode

Generate GTM custom HTML tags when you can't modify source code directly:

```bash
# List available event types
python gtm_manager.py inject --list-events

# Generate all ecommerce tags
python gtm_manager.py inject --preset ecommerce --output ./tags

# Generate specific events
python gtm_manager.py inject --event purchase --event add_to_cart

# Generate from audit report
python gtm_manager.py inject --from-audit gtm_audit_report.json

# Generate with DOM scraping code
python gtm_manager.py inject --event view_item --scrape
```

### Inject Options

| Option | Description | Default |
|--------|-------------|---------|
| `--event` | Event type(s) to generate | - |
| `--preset` | ecommerce, engagement, all | - |
| `--from-audit` | Generate from audit report | - |
| `--output` | Output directory | ./gtm_tags |
| `--format` | html or json | html |
| `--currency` | Currency code | KRW |
| `--scrape` | Generate DOM scraping code | False |
| `--list-events` | Show available events | - |

### Supported Events

**Ecommerce**: view_item, add_to_cart, remove_from_cart, view_cart, begin_checkout, add_shipping_info, add_payment_info, purchase

**Forms**: form_submit, form_start, generate_lead

**Engagement**: scroll, file_download, search, outbound_click, share

**Video**: video_start, video_progress, video_complete

**User**: login, sign_up, page_view

## Notion Integration

```bash
# Set Notion API token
export NOTION_TOKEN="secret_xxxxx"

# Export audit to Notion
python gtm_manager.py audit --url "https://yoursite.com" --notion --notion-detailed
```

## Using with Claude Code

This project includes a `CLAUDE.md` file optimized for Claude Code.

```bash
claude

# Then ask Claude:
> Run a GTM audit on https://example.com
> Generate ecommerce dataLayer tags for my site
> Check the checkout flow and create missing tags
```

## Documentation

See the `docs/` folder for:
- GA4 event specifications
- E-commerce dataLayer schemas
- Form tracking patterns
- Checkout flow sequences
- Common issues and fixes

## License

MIT
184
custom-skills/21-gtm-manager/code/CLAUDE.md
Normal file
@@ -0,0 +1,184 @@
# OurDigital GTM Manager

Comprehensive Google Tag Manager management toolkit - audit, analyze, and generate dataLayer implementations.

> **Note**: For lightweight audit-only functionality, see [13-ourdigital-gtm-audit](../13-ourdigital-gtm-audit/).

## Project Overview

This tool provides two main capabilities:
1. **Audit**: Validate GTM installations, analyze dataLayer events, test form/checkout tracking
2. **Inject**: Generate custom HTML tags for dataLayer pushes when direct code access is unavailable

## Quick Commands

```bash
# Install dependencies
pip install playwright
playwright install chromium

# AUDIT MODE
# Run full audit
python gtm_manager.py audit --url "https://example.com" --journey full

# Audit with Notion export
python gtm_manager.py audit --url "https://example.com" --notion

# Audit and generate missing tags
python gtm_manager.py audit --url "https://example.com" --generate-tags

# INJECT MODE
# List available event types
python gtm_manager.py inject --list-events

# Generate all ecommerce tags
python gtm_manager.py inject --preset ecommerce --output ./tags

# Generate specific event tags
python gtm_manager.py inject --event purchase --event add_to_cart

# Generate from audit report
python gtm_manager.py inject --from-audit gtm_audit_report.json

# Generate with DOM scraping
python gtm_manager.py inject --event view_item --scrape
```

## Audit Mode

### Journey Types

| Journey | Description |
|---------|-------------|
| `pageview` | Basic page load + scroll simulation |
| `scroll` | Scroll depth trigger testing (25%, 50%, 75%, 90%) |
| `form` | Form discovery, field analysis, interaction simulation |
| `checkout` | E-commerce flow: cart → checkout → shipping → payment → purchase |
| `datalayer` | Deep dataLayer validation and event sequence analysis |
| `full` | All of the above combined |

### Audit Output

Generates `gtm_audit_report.json` with:
- Container status (installed, position, duplicates)
- DataLayer analysis (events, validation issues, sequence errors)
- Form analysis (forms found, tracking readiness, missing events)
- Checkout analysis (elements detected, flow issues)
- Network requests (GA4, Meta, LinkedIn, etc.)
- Recommendations and checklist

## Inject Mode

Generate GTM custom HTML tags for dataLayer injection when you can't modify source code directly.

### Event Categories

**Ecommerce:**
- `view_item`, `add_to_cart`, `remove_from_cart`, `view_cart`
- `begin_checkout`, `add_shipping_info`, `add_payment_info`, `purchase`

**Forms & Leads:**
- `form_submit`, `form_start`, `generate_lead`

**Engagement:**
- `scroll`, `file_download`, `search`, `outbound_click`, `share`

**Video:**
- `video_start`, `video_progress`, `video_complete`

**User:**
- `login`, `sign_up`, `page_view`

### Presets

| Preset | Events Included |
|--------|-----------------|
| `ecommerce` | All 8 ecommerce events |
| `engagement` | Forms, scroll, downloads, video, search, sharing |
| `all` | Everything including page_view, login, sign_up |

### Generated Tag Features

- GA4-compliant dataLayer structure
- Ecommerce object clearing before pushes
- DOM scraping option for dynamic values
- Trigger recommendations and selectors
- Korean payment method support

## Notion Integration

Export audit results directly to a Notion database for tracking and collaboration.

```bash
# Export to default Notion database (OurDigital GTM Audit Log)
python gtm_manager.py audit --url "https://example.com" --notion

# Export with detailed content
python gtm_manager.py audit --url "https://example.com" --notion --notion-detailed
```

### Environment Variables

```bash
export NOTION_TOKEN="secret_xxxxx"
```

### Default Database

Default Notion database: [OurDigital GTM Audit Log](https://www.notion.so/2cf581e58a1e8163997fccb387156a20)

## Key Files

- `gtm_manager.py` - Main script with audit and inject functionality
- `docs/ga4_events.md` - GA4 event specifications
- `docs/ecommerce_schema.md` - E-commerce dataLayer structures
- `docs/form_tracking.md` - Form event patterns
- `docs/checkout_flow.md` - Checkout funnel sequence
- `docs/datalayer_validation.md` - Validation rules
- `docs/common_issues.md` - Frequent problems and fixes

## Coding Guidelines

When modifying this tool:

1. **Tag Destinations**: Add new platforms to the `TAG_DESTINATIONS` dict
2. **Event Validation**: Add requirements to the `GA4_EVENT_REQUIREMENTS` dict
3. **Event Templates**: Add new events to `DataLayerInjector.EVENT_TEMPLATES`
4. **Form Selectors**: Extend `FormAnalyzer.discover_forms()` for custom forms
5. **Checkout Elements**: Add selectors to `CheckoutFlowAnalyzer.detect_checkout_elements()`

## Korean Market Considerations

- Support Korean payment methods (Kakao Pay, Naver Pay, Toss)
- Handle KRW currency (no decimals)
- Include Kakao Pixel and Naver Analytics patterns
- Match Korean button text patterns (장바구니 "cart", 결제하기 "pay", 주문하기 "order")

## Common Tasks

### Add support for a new tag platform

```python
# In the TAG_DESTINATIONS dict
"NewPlatform": [
    r"tracking\.newplatform\.com",
    r"pixel\.newplatform\.com",
],
```

### Add a new event type for injection

```python
# In DataLayerInjector.EVENT_TEMPLATES
"custom_event": {
    "description": "Track custom action",
    "params": {
        "custom_param": "/* value */",
    },
    "trigger": "Custom Trigger",
},
```

### Extend checkout flow for a specific platform

```python
# In CheckoutFlowAnalyzer.detect_checkout_elements()
# Add platform-specific selectors
```
237
custom-skills/21-gtm-manager/code/docs/checkout_flow.md
Normal file
@@ -0,0 +1,237 @@
# E-commerce Checkout Flow Reference

## Complete Checkout Event Sequence

```
view_cart → begin_checkout → add_shipping_info → add_payment_info → purchase
```

Each step must fire in order with consistent item data.
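The ordering requirement can be checked against a captured dataLayer. A minimal sketch (`checkCheckoutOrder` is illustrative, not part of this toolkit); it flags any funnel event that appears earlier in the sequence than one already seen:

```javascript
// Expected funnel order, first to last.
const CHECKOUT_ORDER = [
  "view_cart", "begin_checkout",
  "add_shipping_info", "add_payment_info", "purchase"
];

function checkCheckoutOrder(layer) {
  const seen = layer
    .map(entry => entry.event)
    .filter(name => CHECKOUT_ORDER.includes(name));
  // Each observed event's funnel position must be non-decreasing.
  let last = -1;
  for (const name of seen) {
    const pos = CHECKOUT_ORDER.indexOf(name);
    if (pos < last) return { ok: false, outOfOrder: name };
    last = pos;
  }
  return { ok: true };
}
```

Run it over the full captured dataLayer after a test checkout; a `purchase` logged before `begin_checkout` is reported immediately.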
||||
|
||||
## Event Details
|
||||
|
||||
### 1. view_cart
|
||||
When user views cart page.
|
||||
|
||||
```javascript
|
||||
dataLayer.push({ ecommerce: null });
|
||||
dataLayer.push({
|
||||
event: "view_cart",
|
||||
ecommerce: {
|
||||
currency: "KRW",
|
||||
value: 125000,
|
||||
items: [{
|
||||
item_id: "SKU_001",
|
||||
item_name: "Blue T-Shirt",
|
||||
price: 45000,
|
||||
quantity: 2,
|
||||
item_brand: "Brand",
|
||||
item_category: "Apparel"
|
||||
}, {
|
||||
item_id: "SKU_002",
|
||||
item_name: "Black Jeans",
|
||||
price: 35000,
|
||||
quantity: 1
|
||||
}]
|
||||
}
|
||||
});
|
||||
```
|
### 2. begin_checkout
Fires when the user initiates the checkout process.

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "begin_checkout",
  ecommerce: {
    currency: "KRW",
    value: 125000,
    coupon: "SUMMER10",
    items: [/* same items as view_cart */]
  }
});
```

### 3. add_shipping_info
Fires when the user completes the shipping step.

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_shipping_info",
  ecommerce: {
    currency: "KRW",
    value: 125000,
    coupon: "SUMMER10",
    shipping_tier: "Express", // Required
    items: [/* same items */]
  }
});
```

**shipping_tier values:**
- "Standard" / "일반배송"
- "Express" / "익일배송"
- "Same Day" / "당일배송"
- "Free" / "무료배송"
- "Store Pickup" / "매장픽업"
### 4. add_payment_info
Fires when the user enters payment details.

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_payment_info",
  ecommerce: {
    currency: "KRW",
    value: 125000,
    coupon: "SUMMER10",
    payment_type: "Credit Card", // Required
    items: [/* same items */]
  }
});
```

**payment_type values:**
- "Credit Card" / "신용카드"
- "Debit Card" / "체크카드"
- "Bank Transfer" / "계좌이체"
- "Virtual Account" / "가상계좌"
- "Mobile Payment" / "휴대폰결제"
- "Kakao Pay" / "카카오페이"
- "Naver Pay" / "네이버페이"
- "Toss" / "토스"
- "PayPal"

### 5. purchase
Fires when the transaction completes successfully.

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "purchase",
  ecommerce: {
    transaction_id: "T_20250115_001234", // Required, unique
    value: 130500, // Required (total)
    tax: 11863,
    shipping: 5000,
    currency: "KRW", // Required
    coupon: "SUMMER10",
    items: [{
      item_id: "SKU_001",
      item_name: "Blue T-Shirt",
      affiliation: "Online Store",
      coupon: "SUMMER10",
      discount: 4500,
      price: 45000,
      quantity: 2
    }]
  }
});
```
## Funnel Drop-off Analysis

### Tracking Drop-offs
Monitor the completion rate at each step:

| Step | Event | Drop-off Indicator |
|------|-------|-------------------|
| Cart | view_cart | User leaves cart page |
| Checkout Start | begin_checkout | User doesn't proceed |
| Shipping | add_shipping_info | Address form abandoned |
| Payment | add_payment_info | Payment not completed |
| Complete | purchase | Transaction failed |

### Implementing Drop-off Tracking

```javascript
// Track checkout step viewed but not completed
let checkoutStep = 0;

function trackCheckoutProgress(step) {
  if (step > checkoutStep) {
    checkoutStep = step;
  }
}

window.addEventListener('beforeunload', () => {
  if (checkoutStep > 0 && checkoutStep < 5) {
    dataLayer.push({
      event: 'checkout_abandon',
      last_step: checkoutStep,
      step_name: ['cart', 'checkout', 'shipping', 'payment', 'complete'][checkoutStep - 1]
    });
  }
});
```

## Value Consistency Check

Ensure `value` matches across events:

```
view_cart.value = sum(items.price * items.quantity)
begin_checkout.value = view_cart.value
add_shipping_info.value = begin_checkout.value
add_payment_info.value = add_shipping_info.value
purchase.value = add_payment_info.value + shipping + tax - discount
```
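These rules can be checked mechanically against a captured dataLayer. A minimal sketch (the `checkCheckoutValues` helper is illustrative, not part of the toolkit): it recomputes the cart subtotal from `items` and verifies each pre-purchase event carries the same `value`.

```javascript
// Sketch: verify `value` consistency across captured checkout events.
// `events` is an array of dataLayer pushes (e.g. a copy of window.dataLayer).
function checkCheckoutValues(events) {
  const prePurchase = ["view_cart", "begin_checkout", "add_shipping_info", "add_payment_info"];
  const issues = [];
  let cartValue = null;

  events.forEach(e => {
    const ec = e.ecommerce;
    if (!e.event || !ec) return;

    if (prePurchase.includes(e.event)) {
      // Recompute subtotal from the items array
      const subtotal = (ec.items || [])
        .reduce((sum, it) => sum + (it.price || 0) * (it.quantity || 0), 0);
      if (ec.value !== subtotal) {
        issues.push(`${e.event}: value ${ec.value} != items subtotal ${subtotal}`);
      }
      if (cartValue !== null && ec.value !== cartValue) {
        issues.push(`${e.event}: value ${ec.value} != earlier value ${cartValue}`);
      }
      cartValue = ec.value;
    }
  });
  return issues;
}
```

Running it over a session recording returns an empty array when the funnel is consistent, and one message per mismatch otherwise.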
## Common Issues

### Duplicate Purchase Events
**Problem**: Same order tracked multiple times

**Solution**:
```javascript
// Check if already tracked
const txId = "T_12345";
if (!sessionStorage.getItem('purchase_' + txId)) {
  dataLayer.push({ event: 'purchase', /* ecommerce payload */ });
  sessionStorage.setItem('purchase_' + txId, 'true');
}
```

### Missing Items in Later Steps
**Problem**: Items present in view_cart but missing in purchase

**Solution**: Store cart data in session and reuse
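One way to implement that reuse (a sketch; `saveCartItems`/`loadCartItems` are illustrative names, and the storage object is injected so `window.sessionStorage` can be passed in the browser):

```javascript
// Sketch: persist the items array once, then reuse it for every later step.
// Any object with getItem/setItem works; pass sessionStorage in the browser.
function saveCartItems(storage, items) {
  storage.setItem('checkout_items', JSON.stringify(items));
}

function loadCartItems(storage) {
  const raw = storage.getItem('checkout_items');
  return raw ? JSON.parse(raw) : [];
}
```

Save once when `view_cart` fires, then build every later push with `items: loadCartItems(sessionStorage)` so the array is identical through `purchase`.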
### Inconsistent Currency
**Problem**: Some events use USD, others KRW

**Solution**: Standardize currency across all events

### Wrong Value Calculation
**Problem**: purchase.value doesn't include tax/shipping

**Solution**:
```
purchase.value = subtotal + tax + shipping - discount
```

## Korean E-commerce Platforms

### Cafe24
Custom dataLayer variable names - check documentation

### Shopify Korea
Standard GA4 format with `Shopify.checkout` object

### WooCommerce
Use official GA4 plugin or custom implementation

### Naver SmartStore
Separate Naver Analytics implementation required

## Checkout Flow Checklist

- [ ] view_cart fires on cart page load
- [ ] begin_checkout fires on checkout button click
- [ ] add_shipping_info includes shipping_tier
- [ ] add_payment_info includes payment_type
- [ ] purchase has unique transaction_id
- [ ] All events have consistent items array
- [ ] Currency is consistent across all events
- [ ] Value calculations are accurate
- [ ] ecommerce object cleared before each push
- [ ] Purchase event fires only once per order
211 custom-skills/21-gtm-manager/code/docs/common_issues.md Normal file
@@ -0,0 +1,211 @@
# Common GTM Issues & Fixes

## Container Issues

### GTM Not Firing
**Symptoms**: No GTM requests in network tab
**Causes**:
1. Script blocked by ad blocker
2. Script placed after closing body tag
3. JavaScript error before GTM loads
4. Consent management blocking GTM

**Fix**:
```html
<!-- Place immediately after opening <head> tag -->
<script>(function(w,d,s,l,i){...})(window,document,'script','dataLayer','GTM-XXXXXX');</script>
```

### Multiple Containers Conflict
**Symptoms**: Duplicate events, inconsistent data
**Causes**:
1. Legacy container not removed
2. Different teams installed separate containers
3. Theme/plugin auto-installed GTM

**Fix**:
1. Audit all containers in source
2. Consolidate to single container
3. Use GTM environments for staging/prod

### Container ID Mismatch
**Symptoms**: Tags not firing, wrong property receiving data
**Causes**:
1. Dev/staging container on production
2. Copy-paste error during installation

**Fix**: Verify container ID matches GTM account

---
## DataLayer Issues

### DataLayer Not Initialized
**Symptoms**: First push events lost
**Code Error**:
```javascript
// Wrong - GTM loads before dataLayer exists
<script>GTM snippet</script>
dataLayer.push({...});
```

**Fix**:
```javascript
// Correct - Initialize dataLayer first
<script>window.dataLayer = window.dataLayer || [];</script>
<script>GTM snippet</script>
```

### Case Sensitivity Issues
**Symptoms**: Triggers not matching
**Example**:
```javascript
// DataLayer pushes "AddToCart"
dataLayer.push({ event: "AddToCart" });

// But GTM trigger looks for "addToCart" - won't match!
```

**Fix**: Standardize event naming (recommend lowercase with underscores)
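To enforce that convention at push time, event names can be normalized before they reach the dataLayer. A sketch (the `normalizeEventName` helper is illustrative, not a GTM API):

```javascript
// Sketch: convert any event name to lowercase_with_underscores.
function normalizeEventName(name) {
  return name
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2') // split camelCase boundaries
    .replace(/[\s-]+/g, '_')                // spaces/hyphens -> underscore
    .toLowerCase();
}
```

For example, `dataLayer.push({ event: normalizeEventName("AddToCart") })` pushes `add_to_cart`, so triggers only ever need to match the lowercase form.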
### Wrong Data Types
**Symptoms**: Calculations wrong in GA4, missing data
**Example**:
```javascript
// Wrong - price as string
dataLayer.push({ ecommerce: { value: "29.99" }});

// Correct - price as number
dataLayer.push({ ecommerce: { value: 29.99 }});
```

### Timing Issues
**Symptoms**: Events fire before data available
**Cause**: DataLayer push happens after tag fires

**Fix**: Use "Custom Event" trigger instead of "Page View"

---
## Tag Issues

### Tag Not Firing

**Checklist**:
1. ✓ Trigger conditions met?
2. ✓ Trigger enabled?
3. ✓ Tag not paused?
4. ✓ No blocking triggers active?
5. ✓ Consent mode not blocking?

**Debug Steps**:
1. GTM Preview > Check Tags Fired
2. Verify trigger shows green check
3. Check Variables tab for expected values

### Duplicate Tag Firing
**Symptoms**: Events counted 2x in GA4
**Causes**:
1. Multiple triggers on same action
2. Page re-renders triggering again
3. SPA virtual pageviews firing multiple times

**Fix**:
1. Add "Once per event" tag firing option
2. Use trigger groups to control firing
3. Add conditions to prevent re-firing

### Wrong Parameters Sent
**Symptoms**: Data appears in wrong fields in GA4
**Debug**:
1. GTM Preview > Tags > Show fired tag
2. Check "Values" sent with tag
3. Compare with expected parameters

---
## E-commerce Issues

### Missing Transaction ID
**Symptoms**: Duplicate purchases counted
**Fix**: Ensure unique `transaction_id` generated server-side

### Items Array Empty
**Symptoms**: Revenue tracked but no products
**Check**: `ecommerce.items` array populated

### Value Mismatch
**Symptoms**: Revenue doesn't match actual
**Causes**:
1. Tax/shipping included inconsistently
2. Currency conversion issues
3. Discount applied incorrectly
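A quick way to catch the first and third causes is to recompute the expected total from its parts before pushing the purchase event. A sketch (`expectedPurchaseValue` is an illustrative helper, assuming the total = subtotal + tax + shipping - discount convention used elsewhere in these docs):

```javascript
// Sketch: expected purchase total = item subtotal + tax + shipping - discount.
function expectedPurchaseValue(items, tax, shipping, discount) {
  const subtotal = items.reduce(
    (sum, it) => sum + (it.price || 0) * (it.quantity || 0), 0);
  return subtotal + tax + shipping - discount;
}
```

Compare the result against `ecommerce.value` at push time and log a warning when they differ, rather than sending the mismatched number to GA4.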
### Purchase Event Fires Multiple Times
**Symptoms**: Same order tracked 2-3x
**Causes**:
1. Page refresh on confirmation
2. Browser back button
3. Email link revisit

**Fix**:
```javascript
// Check if already tracked
if (!sessionStorage.getItem('purchase_' + transaction_id)) {
  dataLayer.push({ event: 'purchase', /* ecommerce payload */ });
  sessionStorage.setItem('purchase_' + transaction_id, 'true');
}
```

---

## Consent Mode Issues

### Tags Blocked by Consent
**Symptoms**: Tags show "Blocked by consent" in Preview
**Fix**:
1. Verify consent mode implementation
2. Check default consent state
3. Test with consent granted

### Consent Not Updating
**Symptoms**: Tags stay blocked after user accepts
**Fix**: Verify `gtag('consent', 'update', {...})` fires on accept

---

## SPA (Single Page App) Issues

### Pageviews Not Tracking Navigation
**Symptoms**: Only initial pageview tracked
**Cause**: No page reload on route change

**Fix**: Implement History Change trigger or custom event:
```javascript
// On route change
dataLayer.push({
  event: 'virtual_pageview',
  page_path: newPath,
  page_title: newTitle
});
```

### Events Fire on Old Page Data
**Symptoms**: Wrong page_path in events
**Fix**: Update page variables before event push

---
## Performance Issues

### Tags Slowing Page Load
**Symptoms**: High LCP, slow TTI
**Causes**:
1. Too many synchronous tags
2. Large third-party scripts
3. Tags in wrong firing sequence

**Fix**:
1. Use tag sequencing
2. Load non-critical tags on Window Loaded
3. Defer marketing tags
287 custom-skills/21-gtm-manager/code/docs/datalayer_validation.md Normal file
@@ -0,0 +1,287 @@
# DataLayer Validation Reference

## DataLayer Structure Basics

### Proper Initialization
```javascript
// Must appear BEFORE GTM script
<script>
window.dataLayer = window.dataLayer || [];
</script>
<!-- GTM script here -->
```

### Push Syntax
```javascript
// Correct
dataLayer.push({ event: "page_view", page_title: "Home" });

// Wrong - direct assignment
dataLayer = [{ event: "page_view" }]; // ❌ Overwrites array
```

## Validation Rules

### Event Names

| Rule | Valid | Invalid |
|------|-------|---------|
| Alphanumeric + underscore | `add_to_cart` | `add-to-cart` |
| Max 40 characters | `purchase` | (too long names) |
| Case sensitive | `addToCart` ≠ `addtocart` | - |
| No spaces | `form_submit` | `form submit` |
| No special chars | `click_cta` | `click@cta` |

### Parameter Names

| Rule | Valid | Invalid |
|------|-------|---------|
| Max 40 characters | `item_category` | (too long) |
| Alphanumeric + underscore | `user_id` | `user-id` |
| Cannot start with `_` | `custom_param` | `_private` |
| Cannot start with number | `step_1` | `1_step` |
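Both tables above can be encoded as a single check. A sketch (the `isValidGa4Name` helper is illustrative; requiring a leading letter covers both the "no leading underscore" and "no leading digit" rules):

```javascript
// Sketch: validate an event or parameter name against the rules above.
// Letters, digits and underscore only; must start with a letter; max 40 chars.
function isValidGa4Name(name) {
  return /^[A-Za-z][A-Za-z0-9_]{0,39}$/.test(name);
}
```

Running it during development (or inside a dataLayer monitor) flags bad names before they reach GA4, where they would silently fail to match triggers.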
### Data Types

| Parameter | Expected Type | Example |
|-----------|---------------|---------|
| value | number | `29.99` not `"29.99"` |
| currency | string (ISO 4217) | `"USD"`, `"KRW"` |
| transaction_id | string | `"T_12345"` |
| quantity | integer | `2` not `2.0` |
| price | number | `45000` |
| items | array | `[{...}, {...}]` |

### Type Validation Code

```javascript
function validateDataLayerPush(data) {
  const issues = [];

  // Check value is number
  if (data.ecommerce?.value !== undefined) {
    if (typeof data.ecommerce.value !== 'number') {
      issues.push(`value should be number, got ${typeof data.ecommerce.value}`);
    }
  }

  // Check currency format
  if (data.ecommerce?.currency) {
    if (!/^[A-Z]{3}$/.test(data.ecommerce.currency)) {
      issues.push(`currency should be 3-letter ISO code`);
    }
  }

  // Check items array
  if (data.ecommerce?.items) {
    if (!Array.isArray(data.ecommerce.items)) {
      issues.push(`items should be array`);
    } else {
      data.ecommerce.items.forEach((item, i) => {
        if (!item.item_id) issues.push(`items[${i}] missing item_id`);
        if (!item.item_name) issues.push(`items[${i}] missing item_name`);
        if (item.price && typeof item.price !== 'number') {
          issues.push(`items[${i}].price should be number`);
        }
        if (item.quantity && !Number.isInteger(item.quantity)) {
          issues.push(`items[${i}].quantity should be integer`);
        }
      });
    }
  }

  return issues;
}
```
## E-commerce Object Clearing

### Why Clear?
GA4 may merge previous ecommerce data with new events.

### Correct Pattern
```javascript
// Clear first
dataLayer.push({ ecommerce: null });

// Then push new event
dataLayer.push({
  event: "view_item",
  ecommerce: { ... }
});
```

### Validation Check
```javascript
function checkEcommerceClear(dataLayerArray) {
  let lastHadEcommerce = false;
  const issues = [];

  dataLayerArray.forEach((item, i) => {
    const hasEcommerce = 'ecommerce' in item;
    const isNull = item.ecommerce === null;

    if (hasEcommerce && !isNull && lastHadEcommerce) {
      issues.push({
        index: i,
        message: 'Missing ecommerce:null before this push'
      });
    }

    lastHadEcommerce = hasEcommerce && !isNull;
  });

  return issues;
}
```

## Event Sequence Validation

### Expected Sequences

**E-commerce Purchase Flow:**
```
view_item_list? → view_item → add_to_cart → view_cart →
begin_checkout → add_shipping_info → add_payment_info → purchase
```

**Form Submission:**
```
form_start → form_submit → generate_lead?
```

**User Authentication:**
```
login | sign_up
```

### Sequence Validator

```javascript
function validateSequence(events, expectedOrder) {
  const eventNames = events
    .filter(e => e.event)
    .map(e => e.event);

  let lastIndex = -1;
  const issues = [];

  eventNames.forEach(event => {
    const index = expectedOrder.indexOf(event);
    if (index !== -1) {
      if (index < lastIndex) {
        issues.push(`${event} fired out of expected order`);
      }
      lastIndex = index;
    }
  });

  return issues;
}
```
## Duplicate Event Detection

### Common Duplicates
- Multiple `page_view` on single page load
- `purchase` firing on page refresh
- Click events on bubbling elements

### Detection Code

```javascript
function findDuplicates(events) {
  const seen = {};
  const duplicates = [];

  events.forEach((event, i) => {
    if (!event.event) return;

    const key = JSON.stringify(event);
    if (key in seen) { // use `in`, since a firstIndex of 0 is falsy
      duplicates.push({
        event: event.event,
        firstIndex: seen[key],
        duplicateIndex: i
      });
    } else {
      seen[key] = i;
    }
  });

  return duplicates;
}
```

## Real-time Monitoring Setup

### Console Monitoring

```javascript
// Paste in browser console to monitor pushes
(function() {
  const original = dataLayer.push;
  dataLayer.push = function() {
    console.group('📊 dataLayer.push');
    console.log('Data:', arguments[0]);
    console.log('Time:', new Date().toISOString());
    console.groupEnd();
    return original.apply(this, arguments);
  };
  console.log('✅ DataLayer monitoring active');
})();
```

### Export DataLayer

```javascript
// Copy full dataLayer to clipboard
copy(JSON.stringify(dataLayer, null, 2));
```

## Validation Checklist

### Structure
- [ ] dataLayer initialized before GTM
- [ ] Using push() not assignment
- [ ] Event names follow conventions
- [ ] Parameter names follow conventions

### Data Types
- [ ] value is number
- [ ] currency is 3-letter code
- [ ] quantity is integer
- [ ] items is array
- [ ] Required fields present

### E-commerce
- [ ] ecommerce:null before each push
- [ ] items array has item_id and item_name
- [ ] transaction_id is unique
- [ ] Consistent currency across events

### Sequence
- [ ] Events fire in logical order
- [ ] No duplicate events
- [ ] Purchase fires only once

## Debug Tools

### GTM Preview Mode
- Real-time event inspection
- Variable value checking
- Tag firing verification

### GA4 DebugView
- Live event stream
- Parameter validation
- User property tracking

### Browser Console
```javascript
// View current dataLayer
console.table(dataLayer);

// Filter by event
dataLayer.filter(d => d.event === 'purchase');
```
216 custom-skills/21-gtm-manager/code/docs/ecommerce_schema.md Normal file
@@ -0,0 +1,216 @@
# E-commerce DataLayer Schema Reference

## GA4 E-commerce Structure

### Items Array Schema
Every e-commerce event requires an `items` array:

```javascript
items: [{
  // Required
  item_id: "SKU_12345",
  item_name: "Blue T-Shirt",

  // Recommended
  affiliation: "Store Name",
  coupon: "SUMMER_SALE",
  discount: 5.00,
  index: 0,
  item_brand: "Brand Name",
  item_category: "Apparel",
  item_category2: "Men",
  item_category3: "Shirts",
  item_category4: "T-Shirts",
  item_category5: "Short Sleeve",
  item_list_id: "related_products",
  item_list_name: "Related Products",
  item_variant: "Blue/Large",
  location_id: "ChIJIQBpAG2ahYAR_6128GcTUEo",
  price: 29.99,
  quantity: 1
}]
```

### Clear Previous E-commerce Data
Always clear before a new e-commerce event:

```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "view_item",
  ecommerce: {
    // new data
  }
});
```
## Complete Purchase Flow

### 1. Product List View
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "view_item_list",
  ecommerce: {
    item_list_id: "category_results",
    item_list_name: "Category Results",
    items: [
      { item_id: "SKU_001", item_name: "Product 1", index: 0, price: 29.99 },
      { item_id: "SKU_002", item_name: "Product 2", index: 1, price: 39.99 }
    ]
  }
});
```

### 2. Product Click
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "select_item",
  ecommerce: {
    item_list_id: "category_results",
    item_list_name: "Category Results",
    items: [{
      item_id: "SKU_001",
      item_name: "Product 1",
      price: 29.99
    }]
  }
});
```

### 3. Product Detail View
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "view_item",
  ecommerce: {
    currency: "USD",
    value: 29.99,
    items: [{
      item_id: "SKU_001",
      item_name: "Product 1",
      item_brand: "Brand",
      item_category: "Category",
      price: 29.99,
      quantity: 1
    }]
  }
});
```

### 4. Add to Cart
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_to_cart",
  ecommerce: {
    currency: "USD",
    value: 29.99,
    items: [{
      item_id: "SKU_001",
      item_name: "Product 1",
      price: 29.99,
      quantity: 1
    }]
  }
});
```
### 5. View Cart
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "view_cart",
  ecommerce: {
    currency: "USD",
    value: 59.98,
    items: [
      { item_id: "SKU_001", item_name: "Product 1", price: 29.99, quantity: 1 },
      { item_id: "SKU_002", item_name: "Product 2", price: 29.99, quantity: 1 }
    ]
  }
});
```

### 6. Begin Checkout
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "begin_checkout",
  ecommerce: {
    currency: "USD",
    value: 59.98,
    coupon: "DISCOUNT10",
    items: [/* same items as cart */]
  }
});
```

### 7. Add Shipping Info
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_shipping_info",
  ecommerce: {
    currency: "USD",
    value: 59.98,
    shipping_tier: "Standard",
    items: [/* same items as cart */]
  }
});
```

### 8. Add Payment Info
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "add_payment_info",
  ecommerce: {
    currency: "USD",
    value: 59.98,
    payment_type: "Credit Card",
    items: [/* same items as cart */]
  }
});
```

### 9. Purchase
```javascript
dataLayer.push({ ecommerce: null });
dataLayer.push({
  event: "purchase",
  ecommerce: {
    transaction_id: "T_12345",
    value: 65.97,
    tax: 4.99,
    shipping: 5.99,
    currency: "USD",
    coupon: "DISCOUNT10",
    items: [{
      item_id: "SKU_001",
      item_name: "Product 1",
      affiliation: "Online Store",
      coupon: "DISCOUNT10",
      discount: 3.00,
      item_brand: "Brand",
      item_category: "Category",
      price: 29.99,
      quantity: 1
    }]
  }
});
```
## Korean E-commerce Considerations

### Currency
```javascript
currency: "KRW",
value: 35000 // No decimals for KRW
```
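KRW is a zero-decimal currency under ISO 4217, so computed totals should be rounded to whole numbers before pushing. A sketch (the `normalizeValue` helper is illustrative):

```javascript
// Sketch: round a value to the correct number of decimals per currency.
// Zero-decimal currencies (KRW, JPY, VND) must be whole numbers.
const ZERO_DECIMAL = new Set(["KRW", "JPY", "VND"]);

function normalizeValue(amount, currency) {
  return ZERO_DECIMAL.has(currency)
    ? Math.round(amount)
    : Math.round(amount * 100) / 100;
}
```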
### Common Korean Platform Integrations
- Cafe24: Uses custom dataLayer structure
- Shopify Korea: Standard GA4 format
- Naver SmartStore: Custom pixel implementation
157 custom-skills/21-gtm-manager/code/docs/form_tracking.md Normal file
@@ -0,0 +1,157 @@
# Form Tracking Reference

## GA4 Form Events

### form_start
Fires on first interaction with a form field.

```javascript
dataLayer.push({
  event: "form_start",
  form_id: "contact-form",
  form_name: "Contact Us",
  form_destination: "/submit-contact"
});
```

### form_submit
Fires on successful form submission.

```javascript
dataLayer.push({
  event: "form_submit",
  form_id: "contact-form",
  form_name: "Contact Us",
  form_destination: "/submit-contact",
  form_submit_text: "Send Message"
});
```

### generate_lead
Fires when a form generates a qualified lead.

```javascript
dataLayer.push({
  event: "generate_lead",
  currency: "USD",
  value: 100, // Estimated lead value
  form_id: "quote-request"
});
```
## Form Field Events (Custom)

### field_focus
```javascript
dataLayer.push({
  event: "field_focus",
  form_id: "signup-form",
  field_name: "email",
  field_type: "email"
});
```

### field_complete
```javascript
dataLayer.push({
  event: "field_complete",
  form_id: "signup-form",
  field_name: "email",
  field_type: "email",
  is_valid: true
});
```

### field_error
```javascript
dataLayer.push({
  event: "field_error",
  form_id: "signup-form",
  field_name: "email",
  error_message: "Invalid email format"
});
```

## Form Abandonment Tracking

### Detecting Abandonment
Track when the user leaves the form without submitting:

```javascript
// Track form start and field progress
let formStarted = false;
let formSubmitted = false;
let lastFocusedField = null;
let completedFieldCount = 0;

document.querySelectorAll('form input, form select, form textarea')
  .forEach(field => {
    field.addEventListener('focus', function() {
      lastFocusedField = this.name;
      if (!formStarted) {
        formStarted = true;
        dataLayer.push({ event: 'form_start', form_id: this.form.id });
      }
    });
    field.addEventListener('change', function() {
      if (this.value) completedFieldCount++;
    });
  });

document.querySelectorAll('form').forEach(form => {
  form.addEventListener('submit', () => { formSubmitted = true; });
});

// Track abandonment on page leave
window.addEventListener('beforeunload', function() {
  if (formStarted && !formSubmitted) {
    dataLayer.push({
      event: 'form_abandon',
      form_id: 'contact-form',
      last_field: lastFocusedField,
      fields_completed: completedFieldCount
    });
  }
});
```
## GTM Trigger Configuration

### Form Submission Trigger
| Setting | Value |
|---------|-------|
| Trigger Type | Form Submission |
| Wait for Tags | Check (if AJAX form) |
| Check Validation | Check |
| Form ID | equals `contact-form` |

### Form Start Trigger (Custom Event)
| Setting | Value |
|---------|-------|
| Trigger Type | Custom Event |
| Event Name | form_start |
| Fire On | All Custom Events |

## Common Form Types & Tracking

### Contact Forms
Events: `form_start`, `form_submit`, `generate_lead`

### Newsletter Signup
Events: `form_start`, `form_submit`, `sign_up`

### Login Forms
Events: `form_start`, `login`

### Search Forms
Events: `search` (with search_term parameter)

### Multi-Step Forms
Track each step:
```javascript
dataLayer.push({
  event: "form_step",
  form_id: "checkout-form",
  step_number: 2,
  step_name: "Shipping Address"
});
```

## Validation Checklist

- [ ] Form has id or name attribute
- [ ] All required fields have names
- [ ] Submit button identifiable
- [ ] form_start fires on first interaction
- [ ] form_submit fires only on success
- [ ] generate_lead has value parameter
- [ ] Error events track validation failures
- [ ] Abandonment tracking implemented (optional)
177 custom-skills/21-gtm-manager/code/docs/ga4_events.md Normal file
@@ -0,0 +1,177 @@
# GA4 Recommended Events Reference

## Automatically Collected Events
Events GA4 collects without configuration:
- `first_visit` - First time user visits
- `session_start` - Session begins
- `page_view` - Page loads (enhanced measurement)
- `scroll` - 90% scroll depth
- `click` - Outbound link clicks
- `file_download` - File download clicks
- `video_start`, `video_progress`, `video_complete` - YouTube embeds

## E-commerce Events (Required Parameters)

### view_item_list
```javascript
{
  event: "view_item_list",
  ecommerce: {
    item_list_id: "related_products",
    item_list_name: "Related Products",
    items: [{
      item_id: "SKU_12345",       // required
      item_name: "Product Name",  // required
      price: 29.99,
      quantity: 1
    }]
  }
}
```

### view_item
```javascript
{
  event: "view_item",
  ecommerce: {
    currency: "USD",
    value: 29.99,
    items: [{
      item_id: "SKU_12345",       // required
      item_name: "Product Name",  // required
      price: 29.99,
      quantity: 1
    }]
  }
}
```

### add_to_cart
```javascript
{
  event: "add_to_cart",
  ecommerce: {
    currency: "USD",
    value: 29.99,
    items: [{
      item_id: "SKU_12345",       // required
      item_name: "Product Name",  // required
      price: 29.99,
      quantity: 1
    }]
  }
}
```

### begin_checkout
```javascript
{
  event: "begin_checkout",
  ecommerce: {
    currency: "USD",
    value: 99.99,
    coupon: "SUMMER_SALE",
    items: [/* items */]
  }
}
```
||||
|
||||
### add_payment_info

```javascript
{
  event: "add_payment_info",
  ecommerce: {
    currency: "USD",
    value: 99.99,
    payment_type: "credit_card",
    items: [...]
  }
}
```

### purchase

```javascript
{
  event: "purchase",
  ecommerce: {
    transaction_id: "T12345",    // required, must be unique
    value: 99.99,                // required
    currency: "USD",             // required
    tax: 4.99,
    shipping: 5.99,
    coupon: "SUMMER_SALE",
    items: [{
      item_id: "SKU_12345",      // required
      item_name: "Product Name", // required
      price: 29.99,
      quantity: 2
    }]
  }
}
```

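When sending the e-commerce events above through Google Tag Manager's data layer, Google's GA4 guidance recommends clearing the previous `ecommerce` object first so items from an earlier event are not merged into the next push. A minimal sketch (the guard on `dataLayer` stands in for the GTM snippet that normally defines it):

```javascript
// In the browser, the GTM snippet defines dataLayer; guard for safety.
var dataLayer = dataLayer || [];

// Clear the previous ecommerce object before each e-commerce push,
// so stale items are not recursively merged into this event.
dataLayer.push({ ecommerce: null });

dataLayer.push({
  event: "purchase",
  ecommerce: {
    transaction_id: "T12345", // required, must be unique
    value: 99.99,             // required
    currency: "USD",          // required
    items: [{
      item_id: "SKU_12345",
      item_name: "Product Name",
      price: 29.99,
      quantity: 2
    }]
  }
});
```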
## Lead Generation Events

### generate_lead

```javascript
{
  event: "generate_lead",
  currency: "USD",
  value: 100 // estimated lead value
}
```

### sign_up

```javascript
{
  event: "sign_up",
  method: "email" // or "google", "facebook", etc.
}
```

### login

```javascript
{
  event: "login",
  method: "email"
}
```

## Engagement Events

### search

```javascript
{
  event: "search",
  search_term: "blue shoes"
}
```

### share

```javascript
{
  event: "share",
  method: "twitter",
  content_type: "article",
  item_id: "article_123"
}
```

## Parameter Validation Rules

| Parameter | Type | Max Length | Notes |
|-----------|------|------------|-------|
| event name | string | 40 chars | No spaces, alphanumeric + underscore |
| item_id | string | 100 chars | Required for e-commerce |
| item_name | string | 100 chars | Required for e-commerce |
| currency | string | 3 chars | ISO 4217 format (USD, KRW, etc.) |
| transaction_id | string | 100 chars | Must be unique per transaction |
| value | number | - | Numeric, no currency symbols |

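The table's limits can be checked before an event is pushed. The helper below is a hypothetical illustration of those rules, not a GA4 API; GA4 itself enforces them server-side and silently drops non-conforming data:

```javascript
// Hypothetical validator for the limits in the table above.
// Returns an array of human-readable problems (empty = valid).
function validateGa4Event(eventName, params) {
  const errors = [];
  // Event name: max 40 chars, alphanumeric + underscore, no spaces,
  // starting with a letter.
  if (!/^[A-Za-z][A-Za-z0-9_]{0,39}$/.test(eventName)) {
    errors.push("event name: max 40 chars, alphanumeric + underscore, no spaces");
  }
  const p = params || {};
  if (p.currency !== undefined && !/^[A-Z]{3}$/.test(p.currency)) {
    errors.push("currency: must be a 3-letter ISO 4217 code");
  }
  if (p.value !== undefined && typeof p.value !== "number") {
    errors.push("value: must be a number, no currency symbols");
  }
  (p.items || []).forEach((item, i) => {
    if (typeof item.item_id !== "string") {
      errors.push(`items[${i}]: item_id is required`);
    }
    if (typeof item.item_name !== "string") {
      errors.push(`items[${i}]: item_name is required`);
    }
  });
  return errors;
}
```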
## Common Validation Errors

1. **Missing required params**: `item_id` or `item_name` not in items array
2. **Wrong data type**: `value` as string instead of number
3. **Duplicate transaction_id**: Same ID used for multiple purchases
4. **Empty items array**: E-commerce event with no items
5. **Invalid currency**: Currency code not in ISO 4217 format
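Error 3 can be guarded against on the client. The sketch below is one possible approach, not part of GA4: remember the transaction IDs already pushed and skip repeats (a real implementation would persist the IDs, e.g. in `localStorage`, so a reload of the receipt page does not re-fire the purchase):

```javascript
// Guard against duplicate transaction_id pushes within a session.
// In-memory Set for illustration; persist it in real code.
const sentTransactions = new Set();

function pushPurchaseOnce(dataLayer, purchaseEvent) {
  const txId = purchaseEvent.ecommerce &&
    purchaseEvent.ecommerce.transaction_id;
  if (!txId || sentTransactions.has(txId)) {
    return false; // missing or already-sent transaction_id: skip
  }
  sentTransactions.add(txId);
  dataLayer.push(purchaseEvent);
  return true;
}
```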