refactor(skills): Restructure skills to dual-platform architecture
Major refactoring of ourdigital-custom-skills with new numbering system:

## Structure Changes
- Each skill now has code/ (Claude Code) and desktop/ (Claude Desktop) versions
- New progressive numbering: 01-09 General, 10-19 SEO, 20-29 GTM, 30-39 OurDigital, 40-49 Jamie

## Skill Reorganization
- 01-notion-organizer (from 02)
- 10-18: SEO tools split into focused skills (technical, on-page, local, schema, vitals, gsc, gateway)
- 20-21: GTM audit and manager
- 30-32: OurDigital designer, research, presentation
- 40-41: Jamie brand editor and audit

## New Files
- .claude/commands/: Slash command definitions for all skills
- CLAUDE.md: Updated with new skill structure documentation
- REFACTORING_PLAN.md: Migration documentation
- COMPATIBILITY_REPORT.md, SKILLS_COMPARISON.md: Analysis docs

## Removed
- Old skill directories (02-05, 10-14, 20-21 old numbering)
- Consolidated into new structure with _archive/ for reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ourdigital-custom-skills/11-seo-on-page-audit/code/CLAUDE.md (new file, 107 lines)
@@ -0,0 +1,107 @@
# CLAUDE.md

## Overview

On-page SEO analyzer for single-page optimization: meta tags, headings, links, images, and Open Graph data.

## Quick Start

```bash
pip install -r scripts/requirements.txt
python scripts/page_analyzer.py https://example.com
```

## Scripts

| Script | Purpose |
|--------|---------|
| `page_analyzer.py` | Analyze on-page SEO elements |
| `base_client.py` | Shared utilities |

## Usage

```bash
# Full page analysis
python scripts/page_analyzer.py https://example.com

# JSON output
python scripts/page_analyzer.py https://example.com --json

# Analyze multiple pages (one URL per line in urls.txt)
xargs -n1 python scripts/page_analyzer.py < urls.txt
```

## Analysis Categories

### Meta Tags
- Title tag (length, keywords)
- Meta description (length, call-to-action)
- Canonical URL
- Robots meta tag

### Heading Structure
- H1 presence and count
- Heading hierarchy (H1→H6)
- Keyword placement in headings

### Links
- Internal link count
- External link count
- Broken links (4xx/5xx)
- Nofollow distribution

### Images
- Alt attribute presence
- Image file sizes
- Lazy loading implementation

### Open Graph / Social
- OG title, description, image
- Twitter Card tags
- Social sharing preview

## Output

Keys follow `PageMetadata.to_dict()` (abridged):

```json
{
  "url": "https://example.com",
  "status_code": 200,
  "title": "Page Title",
  "title_length": 55,
  "meta_description": "...",
  "meta_description_length": 150,
  "canonical_url": "https://example.com",
  "h1_count": 1,
  "h1_text": "Main Heading",
  "internal_link_count": 25,
  "external_link_count": 5,
  "issues": [],
  "warnings": []
}
```

## Common Issues

| Issue | Severity | Recommendation |
|-------|----------|----------------|
| Missing H1 | High | Add single H1 tag |
| Title too long (>60) | Medium | Shorten to 50-60 chars |
| No meta description | High | Add compelling description |
| Images without alt | Medium | Add descriptive alt text |
| Multiple H1 tags | Medium | Use single H1 only |
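A report's `issues` and `warnings` arrays can be mapped back onto severity buckets like the table above. A minimal sketch of consuming the JSON output (the `SEVERITY` map here is illustrative, not something the skill defines):

```python
import json

# Illustrative severity map mirroring the table above (not part of the analyzer).
SEVERITY = {
    "Missing title tag": "High",
    "Missing meta description": "High",
    "Missing H1 tag": "High",
}


def triage(report_json: str) -> dict[str, list[str]]:
    """Group a page_analyzer JSON report into severity buckets."""
    report = json.loads(report_json)
    grouped: dict[str, list[str]] = {"High": [], "Medium": []}
    for issue in report.get("issues", []):
        grouped.setdefault(SEVERITY.get(issue, "Medium"), []).append(issue)
    # The analyzer emits softer findings as `warnings`; treat them as Medium here.
    grouped["Medium"].extend(report.get("warnings", []))
    return grouped


sample = '{"url": "https://example.com", "issues": ["Missing H1 tag"], "warnings": ["Title too long (65 chars, recommend 50-60)"]}'
print(triage(sample))
```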
## Dependencies

```
lxml>=5.1.0
beautifulsoup4>=4.12.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0
tenacity>=8.2.0
```

Note: `tenacity` is imported unconditionally by `base_client.py`; `tqdm` is optional (progress bars).
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
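The `RateLimiter` above is a token bucket: the bucket holds at most `rate` tokens, refills at `rate / per` tokens per second, and each request spends one token. The refill arithmetic can be exercised synchronously with a self-contained sketch (a standalone reimplementation for illustration, not an import from this module):

```python
import time


class TokenBucket:
    """Minimal synchronous token bucket mirroring RateLimiter's refill logic."""

    def __init__(self, rate: float, per: float = 1.0):
        self.rate = rate           # tokens added per `per` seconds
        self.per = per
        self.tokens = rate         # start with a full bucket
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        """Spend one token if available, refilling based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * (self.rate / self.per))
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=3.0)  # 3 requests per second
allowed = [bucket.try_acquire() for _ in range(5)]
print(allowed)  # the first 3 calls succeed; later ones must wait for refill
```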
@@ -0,0 +1,569 @@
"""
Page Analyzer - Extract SEO metadata from web pages
===================================================
Purpose: Comprehensive page-level SEO data extraction
Python: 3.10+
Usage:
    from page_analyzer import PageAnalyzer, PageMetadata
    analyzer = PageAnalyzer()
    metadata = analyzer.analyze_url("https://example.com/page")
"""

import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


@dataclass
class LinkData:
    """Represents a link found on a page."""
    url: str
    anchor_text: str
    is_internal: bool
    is_nofollow: bool = False
    link_type: str = "body"  # body, nav, footer, etc.


@dataclass
class HeadingData:
    """Represents a heading found on a page."""
    level: int  # 1-6
    text: str


@dataclass
class SchemaData:
    """Represents schema.org structured data."""
    schema_type: str
    properties: dict
    format: str = "json-ld"  # json-ld, microdata, rdfa


@dataclass
class OpenGraphData:
    """Represents Open Graph metadata."""
    og_title: str | None = None
    og_description: str | None = None
    og_image: str | None = None
    og_url: str | None = None
    og_type: str | None = None
    og_site_name: str | None = None
    og_locale: str | None = None
    twitter_card: str | None = None
    twitter_title: str | None = None
    twitter_description: str | None = None
    twitter_image: str | None = None


@dataclass
class PageMetadata:
    """Complete SEO metadata for a page."""

    # Basic info
    url: str
    status_code: int = 0
    content_type: str = ""
    response_time_ms: float = 0
    analyzed_at: datetime = field(default_factory=datetime.now)

    # Meta tags
    title: str | None = None
    title_length: int = 0
    meta_description: str | None = None
    meta_description_length: int = 0
    canonical_url: str | None = None
    robots_meta: str | None = None

    # Language
    html_lang: str | None = None
    hreflang_tags: list[dict] = field(default_factory=list)  # [{"lang": "en", "url": "..."}]

    # Headings
    headings: list[HeadingData] = field(default_factory=list)
    h1_count: int = 0
    h1_text: str | None = None

    # Open Graph & Social
    open_graph: OpenGraphData = field(default_factory=OpenGraphData)

    # Schema/Structured Data
    schema_data: list[SchemaData] = field(default_factory=list)
    schema_types_found: list[str] = field(default_factory=list)

    # Links
    internal_links: list[LinkData] = field(default_factory=list)
    external_links: list[LinkData] = field(default_factory=list)
    internal_link_count: int = 0
    external_link_count: int = 0

    # Images
    images_total: int = 0
    images_without_alt: int = 0
    images_with_alt: int = 0

    # Content metrics
    word_count: int = 0

    # Issues found
    issues: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON serialization."""
        return {
            "url": self.url,
            "status_code": self.status_code,
            "content_type": self.content_type,
            "response_time_ms": self.response_time_ms,
            "analyzed_at": self.analyzed_at.isoformat(),
            "title": self.title,
            "title_length": self.title_length,
            "meta_description": self.meta_description,
            "meta_description_length": self.meta_description_length,
            "canonical_url": self.canonical_url,
            "robots_meta": self.robots_meta,
            "html_lang": self.html_lang,
            "hreflang_tags": self.hreflang_tags,
            "h1_count": self.h1_count,
            "h1_text": self.h1_text,
            "headings_count": len(self.headings),
            "schema_types_found": self.schema_types_found,
            "internal_link_count": self.internal_link_count,
            "external_link_count": self.external_link_count,
            "images_total": self.images_total,
            "images_without_alt": self.images_without_alt,
            "word_count": self.word_count,
            "issues": self.issues,
            "warnings": self.warnings,
            "open_graph": {
                "og_title": self.open_graph.og_title,
                "og_description": self.open_graph.og_description,
                "og_image": self.open_graph.og_image,
                "og_url": self.open_graph.og_url,
                "og_type": self.open_graph.og_type,
            },
        }

    def get_summary(self) -> str:
        """Get a brief summary of the page analysis."""
        lines = [
            f"URL: {self.url}",
            f"Status: {self.status_code}",
            f"Title: {self.title[:50] + '...' if self.title and len(self.title) > 50 else self.title}",
            f"Description: {'✓' if self.meta_description else '✗ Missing'}",
            f"Canonical: {'✓' if self.canonical_url else '✗ Missing'}",
            f"H1: {self.h1_count} found",
            f"Schema: {', '.join(self.schema_types_found) if self.schema_types_found else 'None'}",
            f"Links: {self.internal_link_count} internal, {self.external_link_count} external",
            f"Images: {self.images_total} total, {self.images_without_alt} without alt",
        ]
        if self.issues:
            lines.append(f"Issues: {len(self.issues)}")
        return "\n".join(lines)


class PageAnalyzer:
    """Analyze web pages for SEO metadata."""

    DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; OurDigitalSEOBot/1.0; +https://ourdigital.org)"

    def __init__(
        self,
        user_agent: str | None = None,
        timeout: int = 30,
    ):
        """
        Initialize page analyzer.

        Args:
            user_agent: Custom user agent string
            timeout: Request timeout in seconds
        """
        self.user_agent = user_agent or self.DEFAULT_USER_AGENT
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": self.user_agent,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9,ko;q=0.8",
        })

    def analyze_url(self, url: str) -> PageMetadata:
        """
        Analyze a URL and extract SEO metadata.

        Args:
            url: URL to analyze

        Returns:
            PageMetadata object with all extracted data
        """
        metadata = PageMetadata(url=url)

        try:
            # Fetch page
            start_time = datetime.now()
            response = self.session.get(url, timeout=self.timeout, allow_redirects=True)
            metadata.response_time_ms = (datetime.now() - start_time).total_seconds() * 1000
            metadata.status_code = response.status_code
            metadata.content_type = response.headers.get("Content-Type", "")

            if response.status_code != 200:
                metadata.issues.append(f"HTTP {response.status_code} status")
                if response.status_code >= 400:
                    return metadata

            # Parse HTML
            soup = BeautifulSoup(response.text, "html.parser")
            base_url = url

            # Extract all metadata
            self._extract_basic_meta(soup, metadata)
            self._extract_canonical(soup, metadata, base_url)
            self._extract_robots_meta(soup, metadata)
            self._extract_hreflang(soup, metadata)
            self._extract_headings(soup, metadata)
            self._extract_open_graph(soup, metadata)
            self._extract_schema(soup, metadata)
            self._extract_links(soup, metadata, base_url)
            self._extract_images(soup, metadata)
            self._extract_content_metrics(soup, metadata)

            # Run SEO checks
            self._run_seo_checks(metadata)

        except requests.RequestException as e:
            metadata.issues.append(f"Request failed: {str(e)}")
            logger.error(f"Failed to analyze {url}: {e}")
        except Exception as e:
            metadata.issues.append(f"Analysis error: {str(e)}")
            logger.error(f"Error analyzing {url}: {e}")

        return metadata

    def _extract_basic_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract title and meta description."""
        # Title
        title_tag = soup.find("title")
        if title_tag and title_tag.string:
            metadata.title = title_tag.string.strip()
            metadata.title_length = len(metadata.title)

        # Meta description
        desc_tag = soup.find("meta", attrs={"name": re.compile(r"^description$", re.I)})
        if desc_tag and desc_tag.get("content"):
            metadata.meta_description = desc_tag["content"].strip()
            metadata.meta_description_length = len(metadata.meta_description)

        # HTML lang
        html_tag = soup.find("html")
        if html_tag and html_tag.get("lang"):
            metadata.html_lang = html_tag["lang"]

    def _extract_canonical(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
        """Extract canonical URL."""
        canonical = soup.find("link", rel="canonical")
        if canonical and canonical.get("href"):
            metadata.canonical_url = urljoin(base_url, canonical["href"])

    def _extract_robots_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract robots meta tag."""
        robots = soup.find("meta", attrs={"name": re.compile(r"^robots$", re.I)})
        if robots and robots.get("content"):
            metadata.robots_meta = robots["content"]

        # Also check for googlebot-specific
        googlebot = soup.find("meta", attrs={"name": re.compile(r"^googlebot$", re.I)})
        if googlebot and googlebot.get("content"):
            if metadata.robots_meta:
                metadata.robots_meta += f" | googlebot: {googlebot['content']}"
            else:
                metadata.robots_meta = f"googlebot: {googlebot['content']}"

    def _extract_hreflang(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract hreflang tags."""
        hreflang_tags = soup.find_all("link", rel="alternate", hreflang=True)
        for tag in hreflang_tags:
            if tag.get("href") and tag.get("hreflang"):
                metadata.hreflang_tags.append({
                    "lang": tag["hreflang"],
                    "url": tag["href"]
                })

    def _extract_headings(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract all headings."""
        for level in range(1, 7):
            for heading in soup.find_all(f"h{level}"):
                text = heading.get_text(strip=True)
                if text:
                    metadata.headings.append(HeadingData(level=level, text=text))

        # Count H1s specifically
        h1_tags = soup.find_all("h1")
        metadata.h1_count = len(h1_tags)
        if h1_tags:
            metadata.h1_text = h1_tags[0].get_text(strip=True)

    def _extract_open_graph(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract Open Graph and Twitter Card data."""
        og = metadata.open_graph

        # Open Graph tags
        og_mappings = {
            "og:title": "og_title",
            "og:description": "og_description",
            "og:image": "og_image",
            "og:url": "og_url",
            "og:type": "og_type",
            "og:site_name": "og_site_name",
            "og:locale": "og_locale",
        }

        for og_prop, attr_name in og_mappings.items():
            tag = soup.find("meta", property=og_prop)
            if tag and tag.get("content"):
                setattr(og, attr_name, tag["content"])

        # Twitter Card tags
        twitter_mappings = {
            "twitter:card": "twitter_card",
            "twitter:title": "twitter_title",
            "twitter:description": "twitter_description",
            "twitter:image": "twitter_image",
        }

        for tw_name, attr_name in twitter_mappings.items():
            tag = soup.find("meta", attrs={"name": tw_name})
            if tag and tag.get("content"):
                setattr(og, attr_name, tag["content"])

    def _extract_schema(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract schema.org structured data."""
        # JSON-LD
        for script in soup.find_all("script", type="application/ld+json"):
            try:
                data = json.loads(script.string)
                if isinstance(data, list):
                    for item in data:
                        self._process_schema_item(item, metadata, "json-ld")
                else:
                    self._process_schema_item(data, metadata, "json-ld")
            except (json.JSONDecodeError, TypeError):
                continue

        # Microdata (basic detection)
        for item in soup.find_all(itemscope=True):
            itemtype = item.get("itemtype", "")
            if itemtype:
                schema_type = itemtype.split("/")[-1]
                if schema_type not in metadata.schema_types_found:
                    metadata.schema_types_found.append(schema_type)
                    metadata.schema_data.append(SchemaData(
                        schema_type=schema_type,
                        properties={},
                        format="microdata"
                    ))

    def _process_schema_item(self, data: dict, metadata: PageMetadata, format_type: str) -> None:
        """Process a single schema.org item."""
        if not isinstance(data, dict):
            return

        schema_type = data.get("@type", "Unknown")
        if isinstance(schema_type, list):
            schema_type = schema_type[0] if schema_type else "Unknown"

        if schema_type not in metadata.schema_types_found:
            metadata.schema_types_found.append(schema_type)

        metadata.schema_data.append(SchemaData(
            schema_type=schema_type,
            properties=data,
            format=format_type
        ))

        # Process nested @graph items
        if "@graph" in data:
            for item in data["@graph"]:
                self._process_schema_item(item, metadata, format_type)

    def _extract_links(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
        """Extract internal and external links."""
        parsed_base = urlparse(base_url)
        base_domain = parsed_base.netloc.lower()

        for a_tag in soup.find_all("a", href=True):
            href = a_tag["href"]

            # Skip non-http links
            if href.startswith(("#", "javascript:", "mailto:", "tel:")):
                continue

            # Resolve relative URLs
            full_url = urljoin(base_url, href)
            parsed_url = urlparse(full_url)

            # Get anchor text
            anchor_text = a_tag.get_text(strip=True)[:100]  # Limit length

            # Check if nofollow
            rel = a_tag.get("rel", [])
            if isinstance(rel, str):
                rel = rel.split()
            is_nofollow = "nofollow" in rel

            # Determine if internal or external
            link_domain = parsed_url.netloc.lower()
            is_internal = (
                link_domain == base_domain or
                link_domain.endswith(f".{base_domain}") or
                base_domain.endswith(f".{link_domain}")
            )

            link_data = LinkData(
                url=full_url,
                anchor_text=anchor_text,
                is_internal=is_internal,
                is_nofollow=is_nofollow,
            )

            if is_internal:
                metadata.internal_links.append(link_data)
            else:
                metadata.external_links.append(link_data)

        metadata.internal_link_count = len(metadata.internal_links)
        metadata.external_link_count = len(metadata.external_links)

    def _extract_images(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract image information."""
        images = soup.find_all("img")
        metadata.images_total = len(images)

        for img in images:
            alt = img.get("alt", "").strip()
            if alt:
                metadata.images_with_alt += 1
            else:
                metadata.images_without_alt += 1

    def _extract_content_metrics(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract content metrics like word count."""
        # Remove script and style elements
        for element in soup(["script", "style", "noscript"]):
            element.decompose()

        # Get text content
        text = soup.get_text(separator=" ", strip=True)
        words = text.split()
        metadata.word_count = len(words)

    def _run_seo_checks(self, metadata: PageMetadata) -> None:
        """Run SEO checks and add issues/warnings."""
        # Title checks
        if not metadata.title:
            metadata.issues.append("Missing title tag")
        elif metadata.title_length < 30:
            metadata.warnings.append(f"Title too short ({metadata.title_length} chars, recommend 50-60)")
        elif metadata.title_length > 60:
            metadata.warnings.append(f"Title too long ({metadata.title_length} chars, recommend 50-60)")

        # Meta description checks
        if not metadata.meta_description:
            metadata.issues.append("Missing meta description")
        elif metadata.meta_description_length < 120:
            metadata.warnings.append(f"Meta description too short ({metadata.meta_description_length} chars)")
        elif metadata.meta_description_length > 160:
            metadata.warnings.append(f"Meta description too long ({metadata.meta_description_length} chars)")

        # Canonical check
        if not metadata.canonical_url:
            metadata.warnings.append("Missing canonical tag")
        elif metadata.canonical_url != metadata.url:
            metadata.warnings.append(f"Canonical points to different URL: {metadata.canonical_url}")

        # H1 checks
        if metadata.h1_count == 0:
            metadata.issues.append("Missing H1 tag")
        elif metadata.h1_count > 1:
            metadata.warnings.append(f"Multiple H1 tags ({metadata.h1_count})")

        # Image alt check
        if metadata.images_without_alt > 0:
            metadata.warnings.append(f"{metadata.images_without_alt} images missing alt text")

        # Schema check
        if not metadata.schema_types_found:
            metadata.warnings.append("No structured data found")

        # Open Graph check
        if not metadata.open_graph.og_title:
            metadata.warnings.append("Missing Open Graph tags")

        # Robots meta check
        if metadata.robots_meta:
            robots_lower = metadata.robots_meta.lower()
            if "noindex" in robots_lower:
                metadata.issues.append("Page is set to noindex")
            if "nofollow" in robots_lower:
                metadata.warnings.append("Page is set to nofollow")


def main():
    """CLI entry point for testing."""
    import argparse

    parser = argparse.ArgumentParser(description="Page SEO Analyzer")
    parser.add_argument("url", help="URL to analyze")
    parser.add_argument("--json", "-j", action="store_true", help="Output as JSON")

    args = parser.parse_args()

    analyzer = PageAnalyzer()
    metadata = analyzer.analyze_url(args.url)

    if args.json:
        print(json.dumps(metadata.to_dict(), indent=2, ensure_ascii=False))
    else:
        print("=" * 60)
        print("PAGE ANALYSIS REPORT")
        print("=" * 60)
        print(metadata.get_summary())
        print()

        if metadata.issues:
            print("ISSUES:")
            for issue in metadata.issues:
                print(f"  ✗ {issue}")

        if metadata.warnings:
            print("\nWARNINGS:")
            for warning in metadata.warnings:
                print(f"  ⚠ {warning}")

        if metadata.hreflang_tags:
            print(f"\nHREFLANG TAGS ({len(metadata.hreflang_tags)}):")
            for tag in metadata.hreflang_tags[:5]:
                print(f"  {tag['lang']}: {tag['url']}")

        if metadata.schema_types_found:
            print("\nSCHEMA TYPES:")
            for schema_type in metadata.schema_types_found:
                print(f"  - {schema_type}")


if __name__ == "__main__":
    main()
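The internal/external classification in `_extract_links` reduces to a domain-suffix comparison, where subdomains in either direction count as internal. A self-contained sketch of that check:

```python
from urllib.parse import urlparse


def is_internal(link: str, base: str) -> bool:
    """Mirror _extract_links: internal when domains match or one is a subdomain of the other."""
    link_domain = urlparse(link).netloc.lower()
    base_domain = urlparse(base).netloc.lower()
    return (
        link_domain == base_domain
        or link_domain.endswith(f".{base_domain}")
        or base_domain.endswith(f".{link_domain}")
    )


print(is_internal("https://blog.example.com/post", "https://example.com"))  # True
print(is_internal("https://other.org", "https://example.com"))              # False
```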
@@ -0,0 +1,6 @@
# 11-seo-on-page-audit dependencies
lxml>=5.1.0
beautifulsoup4>=4.12.0
requests>=2.31.0
python-dotenv>=1.0.0
rich>=13.7.0