feat(seo-audit): Add comprehensive SEO audit skill
Add ourdigital-seo-audit skill with:

- Full site audit orchestrator (full_audit.py)
- Google Search Console and PageSpeed API clients
- Schema.org JSON-LD validation and generation
- XML sitemap and robots.txt validation
- Notion database integration for findings export
- Core Web Vitals measurement and analysis
- 7 schema templates (article, faq, product, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
151	ourdigital-custom-skills/12-ourdigital-seo-audit/CLAUDE.md	Normal file
@@ -0,0 +1,151 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Skill Overview

**ourdigital-seo-audit** is a comprehensive SEO audit skill that performs technical SEO analysis, schema validation, sitemap/robots.txt checks, and Core Web Vitals measurement. Results are exported to a Notion database.

## Architecture

```
12-ourdigital-seo-audit/
├── SKILL.md                 # Skill definition with YAML frontmatter
├── scripts/                 # Python automation scripts
│   ├── base_client.py       # Shared utilities: RateLimiter, ConfigManager
│   ├── full_audit.py        # Main orchestrator (SEOAuditor class)
│   ├── gsc_client.py        # Google Search Console API
│   ├── pagespeed_client.py  # PageSpeed Insights API
│   ├── schema_validator.py  # JSON-LD/Microdata extraction & validation
│   ├── schema_generator.py  # Generate schema markup from templates
│   ├── sitemap_validator.py # XML sitemap validation
│   ├── sitemap_crawler.py   # Async sitemap URL crawler
│   ├── robots_checker.py    # Robots.txt parser & rule tester
│   ├── page_analyzer.py     # On-page SEO analysis
│   └── notion_reporter.py   # Notion database integration
├── templates/
│   ├── schema_templates/    # JSON-LD templates (article, faq, product, etc.)
│   └── notion_database_schema.json
├── reference.md             # API documentation
└── USER_GUIDE.md            # End-user documentation
```

## Script Relationships

```
full_audit.py (orchestrator)
├── robots_checker.py → RobotsChecker.analyze()
├── sitemap_validator.py → SitemapValidator.validate()
├── schema_validator.py → SchemaValidator.validate()
├── pagespeed_client.py → PageSpeedClient.analyze()
└── notion_reporter.py → NotionReporter.create_audit_report()

All scripts use:
└── base_client.py → ConfigManager (credentials), RateLimiter, BaseAsyncClient
```
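The `base_client.py` internals are not shown in this commit, so the following is only an illustrative sketch of what a shared `RateLimiter` could look like: a sliding-window limiter that blocks when the per-period call budget is exhausted. The class shape and the `acquire()` method are assumptions, not the skill's actual API.

```python
import time

class RateLimiter:
    """Sliding-window rate limiter (illustrative sketch, not the
    skill's actual base_client.py implementation)."""

    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period          # window length in seconds
        self._calls: list[float] = []  # timestamps of recent calls

    def acquire(self) -> float:
        """Block until a call slot is free; return the wait time imposed."""
        now = time.monotonic()
        # Drop timestamps that fell out of the sliding window
        self._calls = [t for t in self._calls if now - t < self.period]
        wait = 0.0
        if len(self._calls) >= self.max_calls:
            wait = self.period - (now - self._calls[0])
            time.sleep(max(wait, 0.0))
        self._calls.append(time.monotonic())
        return wait

# e.g. Notion's 3 requests/second limit from the table below:
notion_limiter = RateLimiter(max_calls=3, period=1.0)
```

Each client would call `acquire()` before every outbound request, so the quota logic lives in one place rather than in every script.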

## Common Commands

### Install Dependencies

```bash
pip install -r scripts/requirements.txt
```

### Run Full SEO Audit

```bash
python scripts/full_audit.py --url https://example.com --output console
python scripts/full_audit.py --url https://example.com --output notion
python scripts/full_audit.py --url https://example.com --json
```

### Individual Script Usage

```bash
# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com

# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Schema validation
python scripts/schema_validator.py --url https://example.com

# Schema generation
python scripts/schema_generator.py --type organization --url https://example.com

# Core Web Vitals
python scripts/pagespeed_client.py --url https://example.com --strategy mobile

# Search Console data
python scripts/gsc_client.py --site sc-domain:example.com --action summary
```

## Key Classes and Data Flow

### AuditResult (full_audit.py)

Central dataclass holding all audit findings:

- `robots`, `sitemap`, `schema`, `performance` - Raw results from each checker
- `findings: list[SEOFinding]` - Normalized issues for Notion export
- `summary` - Aggregated statistics

### SEOFinding (notion_reporter.py)

Standard format for all audit issues:

```python
@dataclass
class SEOFinding:
    issue: str            # Issue title
    category: str         # Technical SEO, Performance, Schema, etc.
    priority: str         # Critical, High, Medium, Low
    url: str | None       # Affected URL
    recommendation: str   # How to fix
    audit_id: str         # Groups findings from same session
```

### NotionReporter

Creates findings in Notion with two modes:

1. Individual pages per finding in the default database
2. Summary page with checklist table via `create_audit_report()`

Default database: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`

## Google API Configuration

**Service Account**: `~/.credential/ourdigital-seo-agent.json`

| API | Authentication | Usage |
|-----|----------------|-------|
| Search Console | Service account | `gsc_client.py` |
| PageSpeed Insights | API key (`PAGESPEED_API_KEY`) | `pagespeed_client.py` |
| GA4 Analytics | Service account | Traffic data |

Environment variables are loaded from `~/Workspaces/claude-workspace/.env`.

## MCP Tool Integration

The skill uses MCP tools as primary data sources (Tier 1):

- `mcp__firecrawl__scrape/crawl` - Web page content extraction
- `mcp__perplexity__search` - Competitor research
- `mcp__notion__*` - Database operations

Python scripts are Tier 2 for Google API data collection.

## Extending the Skill

### Adding a New Schema Type

1. Add a JSON template to `templates/schema_templates/`
2. Update `REQUIRED_PROPERTIES` and `RECOMMENDED_PROPERTIES` in `schema_validator.py`
3. Add type-specific validation in `_validate_type_specific()`
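A sketch of what step 2 could look like. The actual contents of `REQUIRED_PROPERTIES` and `RECOMMENDED_PROPERTIES` in `schema_validator.py` are not shown in this commit, so the dict entries and the `missing_required` helper below are illustrative assumptions (the property names do follow schema.org).

```python
# Hypothetical shape of the property registries in schema_validator.py;
# the real dict contents in the skill may differ.
REQUIRED_PROPERTIES = {
    "Article": ["headline", "datePublished", "author"],
    "FAQPage": ["mainEntity"],
    "Recipe": ["name", "image"],  # the new type being added
}
RECOMMENDED_PROPERTIES = {
    "Article": ["image", "dateModified", "publisher"],
    "Recipe": ["recipeIngredient", "recipeInstructions"],
}

def missing_required(schema_type: str, data: dict) -> list[str]:
    """Return the required properties absent from a parsed JSON-LD object."""
    return [p for p in REQUIRED_PROPERTIES.get(schema_type, []) if p not in data]
```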

### Adding a New Audit Check

1. Create a checker class following the pattern in existing scripts
2. Return a dataclass with a `to_dict()` method and an `issues` list
3. Add a processing method in `SEOAuditor` (`_process_*_findings`)
4. Wire into `run_audit()` in `full_audit.py`
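The checker pattern from steps 1-2 can be sketched as follows. The class, its fields, and the hreflang example are hypothetical illustrations of the convention, not code from the skill.

```python
from dataclasses import asdict, dataclass, field

@dataclass
class HreflangResult:  # hypothetical checker result
    url: str
    issues: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        return asdict(self)

class HreflangChecker:
    """Illustrative checker: flags pages that declare no hreflang tags."""

    def analyze(self, url: str, hreflang_tags: list[str]) -> HreflangResult:
        result = HreflangResult(url=url)
        if not hreflang_tags:
            result.issues.append("No hreflang tags found")
        return result
```

The orchestrator would then turn each entry in `issues` into an `SEOFinding` inside its `_process_*_findings` method.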

## Rate Limits

| Service | Limit | Handled By |
|---------|-------|------------|
| Firecrawl | Per plan | MCP |
| PageSpeed | 25,000/day | `base_client.py` RateLimiter |
| Search Console | 1,200/min | Manual delays |
| Notion | 3 req/sec | Semaphore in reporter |
330	ourdigital-custom-skills/12-ourdigital-seo-audit/SKILL.md	Normal file
@@ -0,0 +1,330 @@
---
name: ourdigital-seo-audit
description: Comprehensive SEO audit skill for technical SEO, on-page optimization, content analysis, local SEO, Core Web Vitals assessment, schema markup generation/validation, sitemap validation, and robots.txt analysis. Use when the user asks for an SEO audit, website analysis, search performance review, schema markup, structured data, sitemap check, robots.txt analysis, or optimization recommendations. Activates for keywords like SEO, audit, search console, rankings, crawlability, indexing, meta tags, Core Web Vitals, local SEO, schema, structured data, sitemap, robots.txt.
allowed-tools: mcp__firecrawl__*, mcp__perplexity__*, mcp__notion__*, mcp__google-drive__*, mcp__memory__*, Read, Write, Edit, Bash(python:*), Bash(pip:*)
---

# OurDigital SEO Audit Skill

## Purpose

Comprehensive SEO audit capability for:

- Technical SEO analysis (crawlability, indexing, site structure)
- On-page SEO optimization (meta tags, headings, content)
- Content quality assessment
- Local SEO evaluation
- Core Web Vitals performance
- Schema markup generation and validation
- XML sitemap validation
- Robots.txt analysis
## Execution Strategy: Three-Tier Approach

Always follow this priority order:

### Tier 1: MCP Tools (Primary)

Use built-in MCP tools first for real-time analysis:

| Tool | Purpose |
|------|---------|
| `mcp__firecrawl__scrape` | Scrape page content and structure |
| `mcp__firecrawl__crawl` | Crawl an entire website |
| `mcp__firecrawl__extract` | Extract structured data |
| `mcp__perplexity__search` | Research competitors and best practices |
| `mcp__notion__create-database` | Create findings database |
| `mcp__notion__create-page` | Add audit findings |
| `mcp__google-drive__search` | Access Sheets for output |
| `mcp__memory__create_entities` | Track audit state |
### Tier 2: Python Scripts (Data Collection)

For Google API data and specialized analysis:

- `gsc_client.py` - Search Console performance data
- `pagespeed_client.py` - Core Web Vitals metrics
- `ga4_client.py` - Traffic and user behavior
- `schema_validator.py` - Validate structured data
- `sitemap_validator.py` - Validate XML sitemaps
- `robots_checker.py` - Analyze robots.txt

### Tier 3: Manual Fallback

For data requiring special access:

- Export data for offline analysis
- Manual Google Business Profile data entry (the API requires enterprise approval)
- Third-party tool integration
## Google API Configuration

### Service Account Credentials

The skill uses the `ourdigital-seo-agent` service account for authenticated APIs:

```
Credentials: ~/.credential/ourdigital-seo-agent.json
Service Account: ourdigital-seo-agent@ourdigital-insights.iam.gserviceaccount.com
Project: ourdigital-insights
```

### API Status & Configuration

| API | Status | Authentication | Notes |
|-----|--------|----------------|-------|
| Search Console | **WORKING** | Service account | Domain: sc-domain:ourdigital.org |
| PageSpeed Insights | **WORKING** | API key | Higher quotas with key |
| Analytics Data (GA4) | **WORKING** | Service account | Properties: Lab, Journal, Blog |
| Google Trends | **WORKING** | None (pytrends) | No auth required |
| Custom Search JSON | **WORKING** | API key | cx: e5f27994f2bab4bf2 |
| Knowledge Graph | **WORKING** | API key | Entity search |
| Google Sheets | **WORKING** | Service account | Share sheet with service account |

### Environment Variables (Configured)

Located in `~/Workspaces/claude-workspace/.env`:

```bash
# Google Service Account (auto-detected)
# ~/.credential/ourdigital-seo-agent.json

# Google API Key (PageSpeed, Custom Search, Knowledge Graph)
GOOGLE_API_KEY=AIzaSyBdfnL3-CVl-ZAKYrLMuaHFR6MASa9ZH1Q
PAGESPEED_API_KEY=AIzaSyBdfnL3-CVl-ZAKYrLMuaHFR6MASa9ZH1Q
CUSTOM_SEARCH_API_KEY=AIzaSyBdfnL3-CVl-ZAKYrLMuaHFR6MASa9ZH1Q
CUSTOM_SEARCH_ENGINE_ID=e5f27994f2bab4bf2
```

### Enabled APIs in Google Cloud Console (ourdigital-insights)

- Search Console API
- PageSpeed Insights API
- Google Analytics Admin API
- Google Analytics Data API
- Custom Search API
- Knowledge Graph Search API
## Audit Categories

### 1. Technical SEO

- HTTPS/SSL implementation
- Canonical URL setup
- Redirect chains/loops
- 404 error pages
- Server response times
- Mobile-friendliness
- Crawlability assessment
- Hreflang tags

### 2. On-page SEO

- Title tags (length, uniqueness, keywords)
- Meta descriptions
- Heading hierarchy (H1-H6)
- Image alt attributes
- Internal linking structure
- URL structure
- Open Graph / Twitter Card tags

### 3. Content SEO

- Content quality assessment
- Thin content identification
- Duplicate content detection
- Keyword relevance
- Content freshness
- E-E-A-T signals

### 4. Local SEO

- Google Business Profile optimization
- NAP consistency
- Local citations
- Review management
- LocalBusiness schema markup

### 5. Core Web Vitals

- Largest Contentful Paint (LCP) < 2.5s
- First Input Delay (FID) < 100ms (deprecated in favor of INP)
- Cumulative Layout Shift (CLS) < 0.1
- Interaction to Next Paint (INP) < 200ms
- Time to First Byte (TTFB)
- First Contentful Paint (FCP)
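The thresholds above can be expressed as a small rating helper. The "good" cutoffs come from the list above; the "poor" cutoffs (4.0s LCP, 0.25 CLS, 500ms INP, 300ms FID) follow Google's published good/needs-improvement/poor bands, and the function itself is an illustrative sketch rather than code from `pagespeed_client.py`.

```python
# (good, poor) cutoffs per metric; values between them rate
# "needs improvement", per Google's published bands.
THRESHOLDS = {
    "lcp": (2.5, 4.0),   # seconds
    "cls": (0.1, 0.25),  # unitless
    "inp": (200, 500),   # milliseconds
    "fid": (100, 300),   # milliseconds (deprecated metric)
}

def rate(metric: str, value: float) -> str:
    """Classify a Core Web Vitals measurement."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"
```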

### 6. Schema/Structured Data

- Extract existing schema (JSON-LD, Microdata, RDFa)
- Validate against the schema.org vocabulary
- Check Google Rich Results compatibility
- Generate missing schema markup
- Support: Organization, LocalBusiness, Product, Article, FAQ, Breadcrumb, WebSite
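The extraction step can be sketched with the standard library alone; `schema_validator.py` may well use a full HTML parser instead of the regex below, so treat this as an assumption about the approach, not the skill's implementation.

```python
import json
import re

# Matches <script type="application/ld+json"> blocks (illustrative).
SCRIPT_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def extract_json_ld(html: str) -> list[dict]:
    """Return every parseable JSON-LD object embedded in the page."""
    blocks: list[dict] = []
    for raw in SCRIPT_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a malformed block is itself an audit finding
        blocks.extend(data if isinstance(data, list) else [data])
    return blocks
```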

### 7. Sitemap Validation

- XML syntax validation
- URL accessibility (HTTP status)
- URL count limits (50,000 max)
- File size limits (50 MB max)
- Lastmod date validity
- Index sitemap structure
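The syntax and limit checks above map directly onto the sitemap protocol's 50,000-URL and 50 MB (uncompressed) caps. A stdlib-only sketch for a plain (non-index) sitemap, assuming `sitemap_validator.py` works along similar lines:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, uncompressed

def check_sitemap(xml_bytes: bytes) -> list[str]:
    """Validate a plain urlset sitemap against protocol limits (sketch)."""
    issues: list[str] = []
    if len(xml_bytes) > MAX_BYTES:
        issues.append("Sitemap exceeds 50 MB size limit")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return issues + [f"XML syntax error: {exc}"]
    locs = root.findall(f"{NS}url/{NS}loc")
    if len(locs) > MAX_URLS:
        issues.append("Sitemap exceeds 50,000 URL limit")
    return issues
```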

### 8. Robots.txt Analysis

- Syntax validation
- User-agent rules review
- Disallow/Allow patterns
- Sitemap declarations
- Critical resource access
- URL testing against rules

## Report Output

### Default Notion Database

All SEO audit findings are stored in the centralized **OurDigital SEO Audit Log**:

- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **URL**: https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef

### Notion Database Schema

**Database Properties (Metadata)**

| Property | Type | Values | Description |
|----------|------|--------|-------------|
| Issue | Title | Issue description | Primary identifier |
| Site | URL | Website URL | Audited site (e.g., https://blog.ourdigital.org) |
| Category | Select | Technical SEO, On-page SEO, Content, Local SEO, Performance, Schema/Structured Data, Sitemap, Robots.txt | Issue classification |
| Priority | Select | Critical, High, Medium, Low | Fix priority |
| Status | Status | Not started, In progress, Done | Tracking status |
| URL | URL | Affected URL | Specific page with issue |
| Found Date | Date | Discovery date | When issue was found |
| Audit ID | Rich Text | Audit identifier | Groups findings from same audit session |

**Page Content Template**

Each finding page contains structured content blocks:

```
## Description
[Detailed explanation of the issue]

## Impact
⚠️ [Business/ranking impact callout]

## Recommendation
💡 [Actionable solution callout]
```

### Report Categories

1. **Required Actions** (Critical/High Priority)
   - Security issues, indexing blocks, major errors

2. **Quick Wins** (Easy fixes with high impact)
   - Missing meta tags, schema markup, image optimization

3. **Further Investigation**
   - Complex issues needing deeper analysis

4. **Items to Monitor**
   - Performance metrics, ranking changes, crawl stats

## Operational Guidelines

### Before Any Audit

1. **Gather context**: Ask for target URL, business type, priorities
2. **Check access**: Verify MCP tools are available
3. **Set scope**: Full site vs. specific pages

### During Audit

1. Use Firecrawl for initial site analysis
2. Run Python scripts for Google API data
3. Validate schema, sitemap, robots.txt
4. Document findings in Notion

### Rate Limits

| Service | Limit | Strategy |
|---------|-------|----------|
| Firecrawl | Per plan | Use crawl for site-wide analysis |
| PageSpeed | 25,000/day | Batch critical pages |
| Search Console | 1,200/min | Use async with delays |
| Notion | 3 req/sec | Implement semaphore |

## Quick Commands

### Full Site Audit

```
Perform a comprehensive SEO audit for [URL]
```

### Technical SEO Check

```
Check technical SEO for [URL] including crawlability and indexing
```

### Schema Generation

```
Generate [type] schema markup for [URL/content]
```

### Schema Validation

```
Validate existing schema markup on [URL]
```

### Sitemap Check

```
Validate the sitemap at [sitemap URL]
```

### Robots.txt Analysis

```
Analyze robots.txt for [domain]
```

### Core Web Vitals

```
Check Core Web Vitals for [URL]
```

### Local SEO Assessment

```
Perform local SEO audit for [business name] in [location]
```

## Script Usage

### Schema Generator

```bash
python scripts/schema_generator.py --type organization --url https://example.com
```

### Schema Validator

```bash
python scripts/schema_validator.py --url https://example.com
```

### Sitemap Validator

```bash
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml
```

### Robots.txt Checker

```bash
python scripts/robots_checker.py --url https://example.com/robots.txt
```

### Full Audit

```bash
python scripts/full_audit.py --url https://example.com --output notion
```

## Limitations

- The Google Business Profile API requires enterprise approval
- Some competitive analysis is limited to public data
- Large sites (10,000+ pages) require extended crawl time
- Real-time ranking data requires third-party tools

## Related Resources

- `reference.md` - Detailed API documentation
- `examples.md` - Usage examples
- `templates/` - Schema and report templates
- `scripts/` - Python automation scripts
494	ourdigital-custom-skills/12-ourdigital-seo-audit/USER_GUIDE.md	Normal file
@@ -0,0 +1,494 @@
# OurDigital SEO Audit - User Guide

## Table of Contents

1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Quick Start](#quick-start)
4. [Features](#features)
5. [Usage Examples](#usage-examples)
6. [API Reference](#api-reference)
7. [Notion Integration](#notion-integration)
8. [Troubleshooting](#troubleshooting)

---

## Overview

The **ourdigital-seo-audit** skill is a comprehensive SEO audit tool for Claude Code that:

- Performs technical SEO analysis
- Validates schema markup (JSON-LD, Microdata, RDFa)
- Checks sitemap and robots.txt configuration
- Measures Core Web Vitals performance
- Integrates with Google APIs (Search Console, Analytics, PageSpeed)
- Exports findings to a Notion database

### Supported Audit Categories

| Category | Description |
|----------|-------------|
| Technical SEO | HTTPS, canonicals, redirects, crawlability |
| On-page SEO | Meta tags, headings, images, internal links |
| Content | Quality, duplication, freshness, E-E-A-T |
| Local SEO | NAP consistency, citations, LocalBusiness schema |
| Performance | Core Web Vitals (LCP, CLS, TBT, FCP) |
| Schema/Structured Data | JSON-LD validation, rich results eligibility |
| Sitemap | XML validation, URL accessibility |
| Robots.txt | Directive analysis, blocking issues |

---
## Prerequisites

### 1. Python Environment

```bash
# Install required packages
cd ~/.claude/skills/ourdigital-seo-audit/scripts
pip install -r requirements.txt
```

### 2. Google Cloud Service Account

The skill uses a service account for authenticated APIs:

```
File: ~/.credential/ourdigital-seo-agent.json
Service Account: ourdigital-seo-agent@ourdigital-insights.iam.gserviceaccount.com
Project: ourdigital-insights
```

**Required permissions:**

- Search Console: Add the service account email as a user in [Search Console](https://search.google.com/search-console)
- GA4: Add the service account as a Viewer in [Google Analytics](https://analytics.google.com)

### 3. API Keys

Located in `~/Workspaces/claude-workspace/.env`:

```bash
# Google API Key (for PageSpeed, Custom Search, Knowledge Graph)
GOOGLE_API_KEY=your-api-key
PAGESPEED_API_KEY=your-api-key
CUSTOM_SEARCH_API_KEY=your-api-key
CUSTOM_SEARCH_ENGINE_ID=your-cx-id
```

### 4. Enabled Google Cloud APIs

Enable these APIs in the [Google Cloud Console](https://console.cloud.google.com/apis/library):

- Search Console API
- PageSpeed Insights API
- Google Analytics Admin API
- Google Analytics Data API
- Custom Search API
- Knowledge Graph Search API
---

## Quick Start

### Run a Full SEO Audit

```
Perform a comprehensive SEO audit for https://example.com
```

### Check Core Web Vitals

```
Check Core Web Vitals for https://example.com
```

### Validate Schema Markup

```
Validate schema markup on https://example.com
```

### Check Sitemap

```
Validate the sitemap at https://example.com/sitemap.xml
```

### Analyze Robots.txt

```
Analyze robots.txt for example.com
```

---
## Features

### 1. Technical SEO Analysis

Checks for:

- HTTPS/SSL implementation
- Canonical URL configuration
- Redirect chains and loops
- 404 error pages
- Mobile-friendliness
- Hreflang tags (international sites)

**Command:**

```
Check technical SEO for https://example.com
```

### 2. Schema Markup Validation

Extracts and validates structured data:

- JSON-LD (recommended)
- Microdata
- RDFa

Supported schema types:

- Organization / LocalBusiness
- Product / Offer
- Article / BlogPosting
- FAQPage / HowTo
- BreadcrumbList
- WebSite / WebPage

**Command:**

```
Validate existing schema markup on https://example.com
```

### 3. Schema Generation

Generate JSON-LD markup for your pages:

**Command:**

```
Generate Organization schema for https://example.com
```

**Available types:**

- `organization` - Company/organization info
- `local_business` - Physical business location
- `product` - E-commerce products
- `article` - Blog posts and news
- `faq` - FAQ pages
- `breadcrumb` - Navigation breadcrumbs
- `website` - Site-level schema with search

### 4. Sitemap Validation

Validates XML sitemaps for:

- XML syntax errors
- URL accessibility (HTTP status)
- URL count limits (50,000 max)
- File size limits (50 MB max)
- Lastmod date validity
- Index sitemap structure

**Command:**

```
Validate the sitemap at https://example.com/sitemap.xml
```

### 5. Robots.txt Analysis

Analyzes robots.txt for:

- Syntax validation
- User-agent rules
- Disallow/Allow patterns
- Sitemap declarations
- Critical resource blocking
- URL testing against rules

**Command:**

```
Analyze robots.txt for example.com
```

### 6. Core Web Vitals

Measures performance metrics:

| Metric | Good | Needs Improvement | Poor |
|--------|------|-------------------|------|
| LCP (Largest Contentful Paint) | < 2.5s | 2.5s - 4.0s | > 4.0s |
| CLS (Cumulative Layout Shift) | < 0.1 | 0.1 - 0.25 | > 0.25 |
| TBT (Total Blocking Time) | < 200ms | 200ms - 600ms | > 600ms |
| FCP (First Contentful Paint) | < 1.8s | 1.8s - 3.0s | > 3.0s |

**Command:**

```
Check Core Web Vitals for https://example.com
```

### 7. Search Console Integration

Access search performance data:

- Top queries and pages
- Click-through rates
- Average positions
- Indexing status

**Command:**

```
Get Search Console data for sc-domain:example.com
```

### 8. GA4 Analytics Integration

Access traffic and behavior data:

- Page views
- User sessions
- Traffic sources
- Engagement metrics

**Available properties:**

- OurDigital Lab (218477407)
- OurDigital Journal (413643875)
- OurDigital Blog (489750460)

**Command:**

```
Get GA4 traffic data for OurDigital Blog
```

---

## Usage Examples

### Example 1: Full Site Audit with Notion Export

```
Perform a comprehensive SEO audit for https://blog.ourdigital.org and export findings to Notion
```

This will:

1. Check robots.txt configuration
2. Validate the sitemap
3. Analyze schema markup
4. Run PageSpeed analysis
5. Export all findings to the OurDigital SEO Audit Log database

### Example 2: Schema Audit Only

```
Check schema markup on https://blog.ourdigital.org and identify any issues
```

### Example 3: Performance Deep Dive

```
Analyze Core Web Vitals for https://blog.ourdigital.org and provide optimization recommendations
```

### Example 4: Competitive Analysis

```
Compare SEO performance between https://blog.ourdigital.org and https://competitor.com
```

### Example 5: Local SEO Audit

```
Perform local SEO audit for OurDigital in Seoul, Korea
```

---
## API Reference

### Python Scripts

All scripts are located in `~/.claude/skills/ourdigital-seo-audit/scripts/`.

#### robots_checker.py

```bash
python robots_checker.py --url https://example.com
```

Options:

- `--url` - Base URL to check
- `--test-url` - Specific URL to test against rules
- `--user-agent` - User agent to test (default: Googlebot)

#### sitemap_validator.py

```bash
python sitemap_validator.py --url https://example.com/sitemap.xml
```

Options:

- `--url` - Sitemap URL
- `--check-urls` - Verify URL accessibility (slower)
- `--limit` - Max URLs to check

#### schema_validator.py

```bash
python schema_validator.py --url https://example.com
```

Options:

- `--url` - Page URL to validate
- `--check-rich-results` - Check Google Rich Results eligibility

#### schema_generator.py

```bash
python schema_generator.py --type organization --url https://example.com
```

Options:

- `--type` - Schema type (organization, local_business, product, article, faq, breadcrumb, website)
- `--url` - Target URL
- `--output` - Output file (default: stdout)

#### pagespeed_client.py

```bash
python pagespeed_client.py --url https://example.com --strategy mobile
```

Options:

- `--url` - URL to analyze
- `--strategy` - mobile, desktop, or both
- `--json` - Output as JSON
- `--cwv-only` - Show only Core Web Vitals

#### gsc_client.py

```bash
python gsc_client.py --site sc-domain:example.com --action summary
```

Options:

- `--site` - Site URL (sc-domain: or https://)
- `--action` - summary, queries, pages, sitemaps, inspect
- `--days` - Days of data (default: 30)

#### full_audit.py

```bash
python full_audit.py --url https://example.com --output notion
```

Options:

- `--url` - URL to audit
- `--output` - console, notion, or json
- `--no-robots` - Skip robots.txt check
- `--no-sitemap` - Skip sitemap validation
- `--no-schema` - Skip schema validation
- `--no-performance` - Skip PageSpeed analysis

---
## Notion Integration

### Default Database

All findings are stored in the **OurDigital SEO Audit Log**:

- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **URL**: https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef

### Database Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Issue title |
| Site | URL | Audited site URL |
| Category | Select | Issue category |
| Priority | Select | Critical / High / Medium / Low |
| Status | Status | Not started / In progress / Done |
| URL | URL | Specific page with issue |
| Found Date | Date | Discovery date |
| Audit ID | Text | Groups findings by session |

### Page Content Template

Each finding page contains:

```
## Description
[Detailed explanation of the issue]

## Impact
⚠️ [Business/ranking impact]

## Recommendation
💡 [Actionable solution]
```

### Filtering Findings

Use Notion filters to view:
- **By Site**: Filter by Site property
- **By Category**: Filter by Category (Schema, Performance, etc.)
- **By Priority**: Filter by Priority (Critical first)
- **By Audit**: Filter by Audit ID to see all findings from one session
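For reference, a sketch of the request body a finding would need for Notion's create-page endpoint (`POST https://api.notion.com/v1/pages`). The property shapes follow the public Notion API, but the skill's actual export code may build them differently; the sample values are hypothetical.

```python
# Build a Notion create-page payload for one audit finding.
DATABASE_ID = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"

def finding_payload(issue, site, category, priority, page_url, found, audit_id):
    """Map one finding onto the database properties listed above."""
    return {
        "parent": {"database_id": DATABASE_ID},
        "properties": {
            "Issue": {"title": [{"text": {"content": issue}}]},
            "Site": {"url": site},
            "Category": {"select": {"name": category}},
            "Priority": {"select": {"name": priority}},
            "URL": {"url": page_url},
            "Found Date": {"date": {"start": found}},
            "Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
        },
    }

payload = finding_payload(
    "Missing meta description", "https://example.com", "On-page SEO",
    "High", "https://example.com/about", "2024-12-01", "audit-2024-12-01",
)
```

The payload is then POSTed with an `Authorization: Bearer <NOTION_API_KEY>` header and a `Notion-Version` header.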
---

## Troubleshooting

### API Authentication Errors

**Error: "Invalid JWT Signature"**
- Check that the service account key file exists
- Verify the file path in `~/.credential/ourdigital-seo-agent.json`
- Regenerate the key in Google Cloud Console if corrupted

**Error: "Requests from referer are blocked"**
- Go to [API Credentials](https://console.cloud.google.com/apis/credentials)
- Click on your API key
- Set "Application restrictions" to "None"
- Save and wait 1-2 minutes

**Error: "API has not been enabled"**
- Enable the required API in [Google Cloud Console](https://console.cloud.google.com/apis/library)
- Wait a few minutes for propagation

### Search Console Issues

**Error: "Site not found"**
- Use domain property format: `sc-domain:example.com`
- Or URL prefix format: `https://www.example.com/`
- Add service account email as user in Search Console

### PageSpeed Rate Limiting

**Error: "429 Too Many Requests"**
- Wait a few minutes before retrying
- Use an API key for higher quotas
- Batch requests with delays

### Notion Integration Issues

**Error: "Failed to create page"**
- Verify NOTION_API_KEY is set
- Check that the integration has access to the database
- Ensure database properties match expected schema

### Python Import Errors

**Error: "ModuleNotFoundError"**
```bash
cd ~/.claude/skills/ourdigital-seo-audit/scripts
pip install -r requirements.txt
```

---
## Support

For issues or feature requests:
1. Check this guide first
2. Review the [SKILL.md](SKILL.md) for technical details
3. Check [examples.md](examples.md) for more usage examples

---

*Last updated: December 2024*
@@ -0,0 +1,113 @@
# SEO Audit Quick Reference Card

## Instant Commands

### Site Audits
```
Perform full SEO audit for https://example.com
Check technical SEO for https://example.com
Analyze robots.txt for example.com
Validate sitemap at https://example.com/sitemap.xml
```

### Schema Operations
```
Validate schema markup on https://example.com
Generate Organization schema for [Company Name], URL: [URL]
Generate LocalBusiness schema for [Business Name], Address: [Address], Hours: [Hours]
Generate Article schema for [Title], Author: [Name], Published: [Date]
Generate FAQPage schema with these Q&As: [Questions and Answers]
```

### Performance
```
Check Core Web Vitals for https://example.com
Analyze page speed issues on https://example.com
```

### Local SEO
```
Local SEO audit for [Business Name] in [City]
Check NAP consistency for [Business Name]
```

### Competitive Analysis
```
Compare SEO between https://site1.com and https://site2.com
Analyze top 10 results for "[keyword]"
```

### Export & Reporting
```
Export findings to Notion
Create SEO audit report for [URL]
Summarize audit findings with priorities
```
## Schema Validation Checklist

### Organization
- [ ] name (required)
- [ ] url (required)
- [ ] logo (recommended)
- [ ] sameAs (recommended)
- [ ] contactPoint (recommended)

### LocalBusiness
- [ ] name (required)
- [ ] address (required)
- [ ] telephone (recommended)
- [ ] openingHours (recommended)
- [ ] geo (recommended)

### Article
- [ ] headline (required)
- [ ] author (required)
- [ ] datePublished (required)
- [ ] image (recommended)
- [ ] publisher (recommended)

### Product
- [ ] name (required)
- [ ] image (recommended)
- [ ] description (recommended)
- [ ] offers (recommended)

### FAQPage
- [ ] mainEntity (required)
- [ ] Question with name (required)
- [ ] acceptedAnswer (required)
## Core Web Vitals Targets

| Metric | Good | Needs Improvement | Poor |
|--------|------|-------------------|------|
| LCP | <2.5s | 2.5-4s | >4s |
| CLS | <0.1 | 0.1-0.25 | >0.25 |
| INP | <200ms | 200-500ms | >500ms |
| FCP | <1.8s | 1.8-3s | >3s |
| TTFB | <800ms | 800ms-1.8s | >1.8s |
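The targets above can be turned into a simple bucketing rule; this is a sketch mirroring the table (seconds for LCP/FCP, unitless for CLS, milliseconds for INP/TTFB), not code from the skill itself.

```python
# Classify a measured value as good / needs improvement / poor.
THRESHOLDS = {
    "LCP": (2.5, 4.0),     # seconds
    "CLS": (0.1, 0.25),    # unitless
    "INP": (200, 500),     # milliseconds
    "FCP": (1.8, 3.0),     # seconds
    "TTFB": (800, 1800),   # milliseconds
}

def rate(metric: str, value: float) -> str:
    good, poor = THRESHOLDS[metric]
    if value < good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"

print(rate("LCP", 2.1), rate("CLS", 0.3), rate("INP", 350))
# good poor needs improvement
```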
## Priority Levels

| Priority | Examples |
|----------|----------|
| Critical | Site blocked, 5xx errors, no HTTPS |
| High | Missing titles, schema errors, broken links |
| Medium | Missing alt text, thin content, no OG tags |
| Low | URL structure, minor schema issues |

## Notion Database ID

**OurDigital SEO Audit Log**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`

## MCP Tools Reference

| Tool | Usage |
|------|-------|
| firecrawl_scrape | Single page analysis |
| firecrawl_map | Site structure discovery |
| firecrawl_crawl | Full site crawl |
| perplexity search | Research & competitive analysis |
| notion API-post-page | Create findings |
| fetch | Get robots.txt, sitemap |
@@ -0,0 +1,386 @@
# OurDigital SEO Audit - Claude Desktop Project Knowledge

## Overview

This project knowledge file enables Claude Desktop to perform comprehensive SEO audits using MCP tools. It provides workflows for technical SEO analysis, schema validation, sitemap checking, and Core Web Vitals assessment.

## Available MCP Tools

### Primary Tools

| Tool | Purpose |
|------|---------|
| `firecrawl` | Website crawling, scraping, structured data extraction |
| `perplexity` | AI-powered research, competitive analysis |
| `notion` | Store audit findings in database |
| `fetch` | Fetch web pages and resources |
| `sequential-thinking` | Complex multi-step analysis |

### Tool Usage Patterns

#### Firecrawl - Web Scraping
```
firecrawl_scrape: Scrape single page content
firecrawl_crawl: Crawl entire website
firecrawl_extract: Extract structured data
firecrawl_map: Get site structure
```

#### Notion - Output Storage
```
API-post-search: Find existing databases
API-post-database-query: Query database
API-post-page: Create finding pages
API-patch-page: Update findings
```
## SEO Audit Workflows

### 1. Full Site Audit

**User prompt:** "Perform SEO audit for https://example.com"

**Workflow:**
1. Use `firecrawl_scrape` to get homepage content
2. Use `firecrawl_map` to discover site structure
3. Check robots.txt at /robots.txt
4. Check sitemap at /sitemap.xml
5. Extract and validate schema markup
6. Use `perplexity` for competitive insights
7. Store findings in Notion database
### 2. Schema Markup Validation

**User prompt:** "Validate schema on https://example.com"

**Workflow:**
1. Use `firecrawl_scrape` with extractSchema option
2. Look for JSON-LD in `<script type="application/ld+json">`
3. Check for Microdata (`itemscope`, `itemtype`, `itemprop`)
4. Validate against schema.org requirements
5. Check Rich Results eligibility

**Schema Types to Check:**
- Organization / LocalBusiness
- Product / Offer
- Article / BlogPosting
- FAQPage / HowTo
- BreadcrumbList
- WebSite / WebPage

**Required Properties by Type:**

| Type | Required | Recommended |
|------|----------|-------------|
| Organization | name, url | logo, sameAs, contactPoint |
| LocalBusiness | name, address | telephone, openingHours, geo |
| Product | name | image, description, offers, brand |
| Article | headline, author, datePublished | image, dateModified, publisher |
| FAQPage | mainEntity (Question + Answer) | - |
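The required-column of the table above amounts to a presence check, sketched below; real validators (for example Google's Rich Results Test) apply many more rules than this.

```python
# Check a JSON-LD object for required schema.org properties by @type.
REQUIRED = {
    "Organization": ["name", "url"],
    "LocalBusiness": ["name", "address"],
    "Product": ["name"],
    "Article": ["headline", "author", "datePublished"],
    "FAQPage": ["mainEntity"],
}

def missing_required(schema: dict) -> list[str]:
    """Return required properties absent from a JSON-LD object."""
    required = REQUIRED.get(schema.get("@type", ""), [])
    return [prop for prop in required if prop not in schema]

org = {"@context": "https://schema.org", "@type": "Organization", "name": "OurDigital"}
print(missing_required(org))  # ['url']
```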
### 3. Robots.txt Analysis

**User prompt:** "Check robots.txt for example.com"

**Workflow:**
1. Fetch https://example.com/robots.txt
2. Parse directives:
   - User-agent rules
   - Disallow patterns
   - Allow patterns
   - Crawl-delay
   - Sitemap declarations
3. Check for issues:
   - Blocking CSS/JS resources
   - Missing sitemap reference
   - Overly restrictive rules

**Sample Analysis Output:**
```
Robots.txt Analysis
==================
User-agents: 3 defined (*, Googlebot, Bingbot)

Directives:
- Disallow: /admin/, /private/, /tmp/
- Allow: /public/, /blog/
- Sitemap: https://example.com/sitemap.xml

Issues Found:
- WARNING: CSS/JS files may be blocked (/assets/)
- OK: Sitemap is declared
- INFO: Crawl-delay set to 10s
```
### 4. Sitemap Validation

**User prompt:** "Validate sitemap at https://example.com/sitemap.xml"

**Workflow:**
1. Fetch sitemap XML
2. Parse and validate structure
3. Check:
   - XML syntax validity
   - URL count (max 50,000)
   - Lastmod date formats
   - URL accessibility (sample)
4. For sitemap index, check child sitemaps

**Validation Criteria:**
- Valid XML syntax
- `<urlset>` or `<sitemapindex>` root element
- Each `<url>` has `<loc>` element
- `<lastmod>` in W3C datetime format
- File size under 50MB uncompressed
### 5. Core Web Vitals Check

**User prompt:** "Check Core Web Vitals for https://example.com"

**Workflow:**
1. Use PageSpeed Insights (if API available)
2. Or analyze page with firecrawl for common issues
3. Check for:
   - Large images without optimization
   - Render-blocking resources
   - Layout shift causes
   - JavaScript execution time

**Metrics & Thresholds:**

| Metric | Good | Needs Improvement | Poor |
|--------|------|-------------------|------|
| LCP | < 2.5s | 2.5s - 4.0s | > 4.0s |
| CLS | < 0.1 | 0.1 - 0.25 | > 0.25 |
| FID/INP | < 100ms/200ms | 100-300ms/200-500ms | > 300ms/500ms |
### 6. Technical SEO Check

**User prompt:** "Check technical SEO for https://example.com"

**Workflow:**
1. Check HTTPS implementation
2. Verify canonical tags
3. Check meta robots tags
4. Analyze heading structure (H1-H6)
5. Check image alt attributes
6. Verify Open Graph / Twitter Cards
7. Check mobile-friendliness indicators

**Checklist:**
- [ ] HTTPS enabled
- [ ] Single canonical URL per page
- [ ] Proper robots meta tags
- [ ] One H1 per page
- [ ] All images have alt text
- [ ] OG tags present (og:title, og:description, og:image)
- [ ] Twitter Card tags present
- [ ] Viewport meta tag for mobile
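Two of the checklist items (one H1, images with alt text) can be sketched with the standard library's `html.parser`; production audits typically use a full DOM parser instead.

```python
# Count H1 tags and images missing alt text in an HTML document.
from html.parser import HTMLParser

class SEOChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not dict(attrs).get("alt"):
            self.images_missing_alt += 1

html = """<html><body>
<h1>Title</h1><h1>Second title</h1>
<img src="a.png" alt="diagram"><img src="b.png">
</body></html>"""

checker = SEOChecker()
checker.feed(html)
print(checker.h1_count, checker.images_missing_alt)  # 2 1
```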
### 7. Local SEO Audit

**User prompt:** "Local SEO audit for [Business Name] in [Location]"

**Workflow:**
1. Search for business citations with `perplexity`
2. Check for LocalBusiness schema
3. Verify NAP (Name, Address, Phone) consistency
4. Look for review signals
5. Check Google Business Profile (manual)
## Notion Database Integration

### Default Database
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
- **Name**: OurDigital SEO Audit Log

### Database Properties

| Property | Type | Values |
|----------|------|--------|
| Issue | Title | Finding title |
| Site | URL | Audited site URL |
| Category | Select | Technical SEO, On-page SEO, Content, Local SEO, Performance, Schema/Structured Data, Sitemap, Robots.txt |
| Priority | Select | Critical, High, Medium, Low |
| Status | Status | Not started, In progress, Done |
| URL | URL | Specific page with issue |
| Found Date | Date | Discovery date |
| Audit ID | Rich Text | Groups findings from same audit |

### Page Content Template
Each finding page should contain:

```markdown
## Description
[Detailed explanation of the issue]

## Impact
[Business/ranking impact with warning callout]

## Recommendation
[Actionable solution with lightbulb callout]
```

### Creating Findings
Use Notion MCP to create pages:
1. Query database to check for existing entries
2. Create new page with properties
3. Add content blocks (Description, Impact, Recommendation)
## Schema Markup Templates

### Organization Schema
```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "[Company Name]",
  "url": "[Website URL]",
  "logo": "[Logo URL]",
  "sameAs": [
    "[Social Media URLs]"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "telephone": "[Phone]",
    "contactType": "customer service"
  }
}
```

### LocalBusiness Schema
```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "[Business Name]",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "[Street]",
    "addressLocality": "[City]",
    "addressCountry": "[Country Code]"
  },
  "telephone": "[Phone]",
  "openingHoursSpecification": [{
    "@type": "OpeningHoursSpecification",
    "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "opens": "09:00",
    "closes": "18:00"
  }]
}
```

### Article Schema
```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "[Article Title]",
  "author": {
    "@type": "Person",
    "name": "[Author Name]"
  },
  "datePublished": "[ISO Date]",
  "dateModified": "[ISO Date]",
  "publisher": {
    "@type": "Organization",
    "name": "[Publisher Name]",
    "logo": {
      "@type": "ImageObject",
      "url": "[Logo URL]"
    }
  }
}
```

### FAQPage Schema
```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "[Question Text]",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "[Answer Text]"
    }
  }]
}
```

### BreadcrumbList Schema
```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [{
    "@type": "ListItem",
    "position": 1,
    "name": "Home",
    "item": "[Homepage URL]"
  }, {
    "@type": "ListItem",
    "position": 2,
    "name": "[Category]",
    "item": "[Category URL]"
  }]
}
```
## Common SEO Issues Reference

### Critical Priority
- Site not accessible (5xx errors)
- Robots.txt blocking entire site
- No HTTPS implementation
- Duplicate content across domain
- Sitemap returning errors

### High Priority
- Missing or duplicate title tags
- No meta descriptions
- Schema markup errors
- Broken internal links
- Missing canonical tags
- Core Web Vitals failing

### Medium Priority
- Missing alt text on images
- Thin content pages
- Missing Open Graph tags
- Suboptimal heading structure
- Missing breadcrumb schema

### Low Priority
- Missing Twitter Card tags
- Suboptimal URL structure
- Missing FAQ schema
- Review schema not implemented
## Quick Commands Reference

| Task | Prompt |
|------|--------|
| Full audit | "Perform SEO audit for [URL]" |
| Schema check | "Validate schema on [URL]" |
| Sitemap check | "Validate sitemap at [URL]" |
| Robots.txt | "Analyze robots.txt for [domain]" |
| Performance | "Check Core Web Vitals for [URL]" |
| Generate schema | "Generate [type] schema for [details]" |
| Export to Notion | "Export findings to Notion" |
| Local SEO | "Local SEO audit for [business] in [location]" |
| Competitive | "Compare SEO of [URL1] vs [URL2]" |

## Tips for Best Results

1. **Be specific** - Provide full URLs including https://
2. **One site at a time** - Audit one domain per session for clarity
3. **Check Notion** - Review existing findings before creating duplicates
4. **Prioritize fixes** - Focus on Critical/High issues first
5. **Validate changes** - Re-audit after implementing fixes

## Limitations

- No direct Python script execution (use MCP tools instead)
- PageSpeed API requires separate configuration
- Google Search Console data requires authenticated access
- GA4 data requires service account setup
- Large sites may require multiple sessions
@@ -0,0 +1,230 @@
# Claude Desktop SEO Audit - Setup Guide

## Prerequisites

### 1. Claude Desktop Application
- Download from https://claude.ai/download
- Sign in with your Anthropic account
- Pro subscription recommended for extended usage

### 2. Required MCP Servers

Configure these MCP servers in Claude Desktop settings:

#### Firecrawl (Web Scraping)
```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "your-firecrawl-api-key"
      }
    }
  }
}
```

Get API key from: https://firecrawl.dev

#### Notion (Database Storage)
```json
{
  "mcpServers": {
    "notion": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-notion"],
      "env": {
        "NOTION_API_KEY": "your-notion-api-key"
      }
    }
  }
}
```

Get API key from: https://www.notion.so/my-integrations

#### Perplexity (Research)
```json
{
  "mcpServers": {
    "perplexity": {
      "command": "npx",
      "args": ["-y", "perplexity-mcp"],
      "env": {
        "PERPLEXITY_API_KEY": "your-perplexity-api-key"
      }
    }
  }
}
```

Get API key from: https://www.perplexity.ai/settings/api

### 3. Notion Database Setup

#### Option A: Use Existing Database
The OurDigital SEO Audit Log database is already configured:
- **Database ID**: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`

Ensure your Notion integration has access to this database.

#### Option B: Create New Database
Create a database with these properties:

| Property | Type | Options |
|----------|------|---------|
| Issue | Title | - |
| Site | URL | - |
| Category | Select | Technical SEO, On-page SEO, Content, Local SEO, Performance, Schema/Structured Data, Sitemap, Robots.txt |
| Priority | Select | Critical, High, Medium, Low |
| Status | Status | Not started, In progress, Done |
| URL | URL | - |
| Found Date | Date | - |
| Audit ID | Rich Text | - |

## Configuration File Location

### macOS
```
~/Library/Application Support/Claude/claude_desktop_config.json
```

### Windows
```
%APPDATA%\Claude\claude_desktop_config.json
```

### Linux
```
~/.config/Claude/claude_desktop_config.json
```

## Complete Configuration Example

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-your-key-here"
      }
    },
    "notion": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-notion"],
      "env": {
        "NOTION_API_KEY": "ntn_your-key-here"
      }
    },
    "perplexity": {
      "command": "npx",
      "args": ["-y", "perplexity-mcp"],
      "env": {
        "PERPLEXITY_API_KEY": "pplx-your-key-here"
      }
    },
    "fetch": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-fetch"]
    }
  }
}
```
## Adding Project Knowledge

### Step 1: Create a Project
1. Open Claude Desktop
2. Click on the project selector (top left)
3. Click "New Project"
4. Name it "SEO Audit"

### Step 2: Add Knowledge Files
1. In your project, click the paperclip icon or "Add content"
2. Select "Add files"
3. Add these files from `~/.claude/desktop-projects/seo-audit/`:
   - `SEO_AUDIT_KNOWLEDGE.md` (main knowledge file)
   - `QUICK_REFERENCE.md` (quick commands)

### Step 3: Verify Setup
Start a new conversation and ask:
```
What SEO audit capabilities do you have?
```

Claude should describe the available audit features.
## Testing the Setup

### Test 1: Firecrawl
```
Scrape https://example.com and show me the page structure
```

### Test 2: Notion
```
Search for "SEO Audit" in Notion
```

### Test 3: Perplexity
```
Research current SEO best practices for 2024
```

### Test 4: Full Audit
```
Perform a quick SEO audit for https://blog.ourdigital.org
```
## Troubleshooting

### MCP Server Not Connecting
1. Restart Claude Desktop
2. Check config file JSON syntax
3. Verify API keys are correct
4. Check Node.js is installed (`node --version`)

### Notion Permission Error
1. Go to Notion integration settings
2. Add the database to integration access
3. Ensure integration has read/write permissions

### Firecrawl Rate Limit
1. Wait a few minutes between requests
2. Consider upgrading Firecrawl plan
3. Use `firecrawl_map` for discovery, then targeted scrapes

### Knowledge Files Not Loading
1. Ensure files are in supported formats (.md, .txt)
2. Keep file sizes under 10MB
3. Restart the project conversation

## Usage Tips

1. **Start with Quick Reference** - Use the commands from QUICK_REFERENCE.md
2. **One site per conversation** - Keep context focused
3. **Export regularly** - Save findings to Notion frequently
4. **Check existing findings** - Query Notion before creating duplicates
5. **Prioritize Critical issues** - Fix showstoppers first

## Differences from Claude Code Version

| Feature | Claude Code | Claude Desktop |
|---------|-------------|----------------|
| Python Scripts | Direct execution | Not available |
| Google APIs | Service account auth | Manual or via MCP |
| File System | Full access | Limited to uploads |
| Automation | Bash commands | MCP tools only |
| Scheduling | Possible via cron | Manual only |

## Support

For issues with:
- **Claude Desktop**: https://support.anthropic.com
- **Firecrawl**: https://docs.firecrawl.dev
- **Notion API**: https://developers.notion.com
- **MCP Protocol**: https://modelcontextprotocol.io
521
ourdigital-custom-skills/12-ourdigital-seo-audit/examples.md
Normal file
@@ -0,0 +1,521 @@
# OurDigital SEO Audit - Usage Examples

## Quick Reference

| Task | Command |
|------|---------|
| Full audit | `Perform SEO audit for [URL]` |
| Schema check | `Validate schema on [URL]` |
| PageSpeed | `Check Core Web Vitals for [URL]` |
| Sitemap | `Validate sitemap at [URL]` |
| Robots.txt | `Analyze robots.txt for [domain]` |
| Export to Notion | `Export findings to Notion` |

---

## Full Site Audit

### Basic Full Audit

**User prompt:**
> Perform a comprehensive SEO audit for https://example.com

**Agent workflow:**
1. Use Firecrawl to crawl the site and gather page data
2. Run PageSpeed Insights for Core Web Vitals
3. Validate schema markup, sitemap, and robots.txt
4. Query Search Console for performance data (if access granted)
5. Create Notion database with findings
6. Generate prioritized recommendations

---
## Technical SEO Check

### Crawlability Analysis

**User prompt:**
> Check technical SEO for https://example.com including crawlability and indexing

**Agent workflow:**
1. Fetch robots.txt and analyze directives
2. Validate sitemap.xml structure
3. Check HTTP response codes for key pages
4. Verify canonical tags and redirects
5. Test mobile-friendliness

---
## Schema Markup

### Generate Organization Schema

**User prompt:**
> Generate Organization schema markup for my company: OurDigital Corp, website https://ourdigital.com, founded 2020

**Agent response:**
```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "OurDigital Corp",
  "url": "https://ourdigital.com",
  "foundingDate": "2020",
  "sameAs": []
}
```

### Generate LocalBusiness Schema

**User prompt:**
> Generate LocalBusiness schema for a restaurant: Seoul Cafe, 123 Gangnam-daero, Seoul, open Mon-Fri 9am-6pm

**Agent response:**
```json
{
  "@context": "https://schema.org",
  "@type": "Restaurant",
  "name": "Seoul Cafe",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Gangnam-daero",
    "addressLocality": "Seoul",
    "addressCountry": "KR"
  },
  "openingHoursSpecification": [{
    "@type": "OpeningHoursSpecification",
    "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "opens": "09:00",
    "closes": "18:00"
  }]
}
```

### Validate Existing Schema

**User prompt:**
> Validate existing schema markup on https://example.com

**Agent workflow:**
1. Scrape page with Firecrawl
2. Extract JSON-LD, Microdata, and RDFa
3. Validate against schema.org vocabulary
4. Check for required properties
5. Test Rich Results eligibility
6. Report issues and recommendations

---
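Steps 2 and 4 of the validation workflow can be sketched with the standard library alone. The required-property rules below are illustrative assumptions for two common types, not the skill's actual validator:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            text = "".join(self._buffer).strip()
            self._buffer = []
            if text:
                self.blocks.append(json.loads(text))

# Illustrative required-property rules (assumption, not the full rule set)
REQUIRED = {"Organization": ["name", "url"], "WebPage": ["name"]}

def check_required(schema: dict) -> list[str]:
    """Return 'Type missing prop' findings for one JSON-LD object."""
    missing = []
    for prop in REQUIRED.get(schema.get("@type", ""), []):
        if prop not in schema:
            missing.append(f"{schema['@type']} missing '{prop}'")
    return missing

html = '<script type="application/ld+json">{"@type": "Organization", "name": "OurDigital"}</script>'
parser = JsonLdExtractor()
parser.feed(html)
findings = [f for block in parser.blocks for f in check_required(block)]
```

In the real workflow the HTML would come from the Firecrawl scrape; the extractor and rule check are independent of how the page was fetched.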
## Sitemap Validation

### Check Sitemap

**User prompt:**
> Validate the sitemap at https://example.com/sitemap.xml

**Agent workflow:**
1. Fetch and parse XML sitemap
2. Validate XML syntax
3. Check URL count (max 50,000)
4. Verify lastmod dates
5. Test sample URLs for accessibility
6. Report issues found

**Sample output:**
```
Sitemap Validation Report
=========================
URL: https://example.com/sitemap.xml
Total URLs: 1,234
Valid URLs: 1,200
Issues Found:
- 34 URLs returning 404
- 12 URLs with invalid lastmod format
- Missing sitemap index (recommended for 1000+ URLs)
```

---
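Steps 1-3 of this workflow reduce to a small amount of `xml.etree.ElementTree` code. This is a minimal sketch, not the full `sitemap_validator.py` (lastmod checks and URL fetching are omitted):

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000  # sitemap protocol limit

def parse_sitemap(xml_bytes: bytes) -> dict:
    """Parse a urlset sitemap; return the URL list and structural findings."""
    root = ET.fromstring(xml_bytes)  # raises ParseError on invalid XML
    urls = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]
    findings = []
    if len(urls) > MAX_URLS:
        findings.append(f"Too many URLs: {len(urls)} > {MAX_URLS}")
    elif len(urls) >= 1000:
        findings.append("Consider a sitemap index (recommended for 1000+ URLs)")
    return {"urls": urls, "findings": findings}

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page</loc><lastmod>2024-01-15</lastmod></url>
</urlset>"""
report = parse_sitemap(sitemap.encode("utf-8"))
```

Note the sitemap is passed as bytes: `ET.fromstring` rejects a `str` that carries an XML encoding declaration.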
## Robots.txt Analysis

### Analyze Robots.txt

**User prompt:**
> Analyze robots.txt for example.com

**Agent workflow:**
1. Fetch /robots.txt
2. Parse all directives
3. Check for blocking issues
4. Verify sitemap declaration
5. Test specific URLs
6. Compare user-agent rules

**Sample output:**
```
Robots.txt Analysis
==================
URL: https://example.com/robots.txt
User-agents defined: 3 (*, Googlebot, Bingbot)

Issues:
- WARNING: CSS/JS files blocked (/assets/)
- INFO: Crawl-delay set to 10 seconds (may slow indexing)
- OK: Sitemap declared

Rules Summary:
- Disallowed: /admin/, /private/, /tmp/
- Allowed: /public/, /blog/
```

---
## Core Web Vitals

### Performance Analysis

**User prompt:**
> Check Core Web Vitals for https://example.com

**Agent workflow:**
1. Run PageSpeed Insights API (mobile + desktop)
2. Extract Core Web Vitals metrics
3. Compare against thresholds
4. Identify optimization opportunities
5. Prioritize recommendations

**Sample output:**
```
Core Web Vitals Report
=====================
URL: https://example.com
Strategy: Mobile

Metrics:
- LCP: 3.2s (NEEDS IMPROVEMENT - target <2.5s)
- FID: 45ms (GOOD - target <100ms)
- CLS: 0.15 (NEEDS IMPROVEMENT - target <0.1)
- INP: 180ms (GOOD - target <200ms)

Top Opportunities:
1. Serve images in next-gen formats (-1.5s)
2. Eliminate render-blocking resources (-0.8s)
3. Reduce unused CSS (-0.3s)
```

---
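Step 3 (compare against thresholds) is a straightforward lookup. A minimal classifier matching the labels in the sample output, using the standard Good / Needs Improvement boundaries:

```python
# (good, poor) boundaries per metric; units: LCP seconds, CLS unitless, INP ms
THRESHOLDS = {
    "LCP": (2.5, 4.0),
    "CLS": (0.1, 0.25),
    "INP": (200, 500),
}

def classify(metric: str, value: float) -> str:
    """Map a measured value to GOOD / NEEDS IMPROVEMENT / POOR."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "GOOD"
    if value <= poor:
        return "NEEDS IMPROVEMENT"
    return "POOR"
```

Applied to the report above, `classify("LCP", 3.2)` yields `NEEDS IMPROVEMENT` and `classify("INP", 180)` yields `GOOD`.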
## Local SEO Assessment

### Local SEO Audit

**User prompt:**
> Perform local SEO audit for Seoul Dental Clinic in Gangnam

**Agent workflow:**
1. Search for existing citations (Perplexity)
2. Check for LocalBusiness schema
3. Analyze NAP consistency
4. Review Google Business Profile (manual check)
5. Identify missing citations
6. Recommend improvements

---
## Keyword Research

### Trend Analysis

**User prompt:**
> Research keyword trends for "digital marketing" in Korea over the past year

**Agent workflow:**
1. Query Google Trends (pytrends)
2. Get related queries
3. Identify seasonal patterns
4. Compare with related terms
5. Generate insights

---
## Competitive Analysis

### SERP Analysis

**User prompt:**
> Analyze top 10 search results for "best coffee shops Seoul"

**Agent workflow:**
1. Use Custom Search API
2. Extract title, description, URL
3. Analyze common patterns
4. Check for schema markup
5. Identify content gaps

---
## CLI Script Usage

### Schema Generator
```bash
# Generate Organization schema
python scripts/schema_generator.py \
    --type organization \
    --name "OurDigital Corp" \
    --url "https://ourdigital.com"

# Generate Product schema
python scripts/schema_generator.py \
    --type product \
    --name "SEO Tool" \
    --price 29900 \
    --currency KRW
```

### Schema Validator
```bash
# Validate schema on a URL
python scripts/schema_validator.py \
    --url https://example.com \
    --output report.json

# Validate local JSON-LD file
python scripts/schema_validator.py \
    --file schema.json
```

### Sitemap Validator
```bash
# Validate sitemap
python scripts/sitemap_validator.py \
    --url https://example.com/sitemap.xml \
    --check-urls \
    --output sitemap_report.json
```

### Robots.txt Checker
```bash
# Analyze robots.txt
python scripts/robots_checker.py \
    --url https://example.com/robots.txt

# Test specific URL
python scripts/robots_checker.py \
    --url https://example.com/robots.txt \
    --test-url /admin/dashboard \
    --user-agent Googlebot
```

### Full Audit
```bash
# Run complete audit
python scripts/full_audit.py \
    --url https://example.com \
    --output notion \
    --notion-page-id abc123

# Export to Google Sheets
python scripts/full_audit.py \
    --url https://example.com \
    --output sheets \
    --spreadsheet-id xyz789
```

---
## Output to Notion

### Create Findings Database

**User prompt:**
> Create an SEO audit findings database in Notion for example.com

**Agent workflow:**
1. Search for existing SEO audit pages
2. Create new database with schema
3. Add initial findings from audit
4. Set up views (by priority, by category)
5. Share database link with user

---
## Batch Operations

### Audit Multiple Pages

**User prompt:**
> Check schema markup on these URLs: url1.com, url2.com, url3.com

**Agent workflow:**
1. Queue URLs for processing
2. Validate each URL sequentially
3. Aggregate findings
4. Generate comparison report

---
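The batch workflow is a sequential loop with aggregation. A minimal driver sketch; `validate_url` is a hypothetical stand-in for the real per-URL check (scrape plus schema validation):

```python
def validate_url(url: str) -> dict:
    """Hypothetical stand-in for a per-URL schema check; the real workflow
    scrapes the page and runs the schema validator. Returns no issues here."""
    return {"url": url, "issues": []}

def batch_validate(urls: list[str]) -> dict:
    """Validate each URL sequentially and aggregate findings for a report."""
    results = [validate_url(u) for u in urls]
    total = sum(len(r["issues"]) for r in results)
    return {"results": results, "total_issues": total}

report = batch_validate(["https://url1.com", "https://url2.com", "https://url3.com"])
```

Sequential processing keeps request rates polite; the per-URL results feed the comparison report in step 4.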
## Integration with Search Console

### Performance Report

**User prompt:**
> Get Search Console performance data for the last 30 days

**Agent workflow:**
1. Verify Search Console access
2. Query search analytics API
3. Get top queries and pages
4. Calculate CTR and position changes
5. Identify opportunities

**Sample output:**
```
Search Console Performance (Last 30 Days)
========================================
Total Clicks: 12,345
Total Impressions: 456,789
Average CTR: 2.7%
Average Position: 15.3

Top Queries:
1. "example product" - 1,234 clicks, position 3.2
2. "example service" - 987 clicks, position 5.1
3. "example review" - 654 clicks, position 8.4

Pages with Opportunities:
- /product-page: High impressions, low CTR (improve title)
- /service-page: Good CTR, position 11 (push to page 1)
```

---
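Step 5 (identify opportunities) can be sketched over rows in the Search Console response shape. The CTR and position thresholds here are illustrative assumptions, not the skill's tuned values:

```python
# Each row mirrors the search analytics response: keys, clicks, impressions, position
rows = [
    {"keys": ["/product-page"], "clicks": 50, "impressions": 10000, "position": 6.0},
    {"keys": ["/service-page"], "clicks": 120, "impressions": 2000, "position": 11.0},
]

def find_opportunities(rows: list[dict], min_impressions: int = 1000) -> list[str]:
    """Flag low-CTR pages and near-page-1 rankings (thresholds illustrative)."""
    opportunities = []
    for row in rows:
        ctr = row["clicks"] / row["impressions"]
        page = row["keys"][0]
        if row["impressions"] >= min_impressions and ctr < 0.01:
            opportunities.append(f"{page}: high impressions, low CTR (improve title)")
        elif 11 <= row["position"] <= 20:
            opportunities.append(f"{page}: position {row['position']:.0f} (push to page 1)")
    return opportunities

opps = find_opportunities(rows)
```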
## Real-World Examples (OurDigital)

### Example: Audit blog.ourdigital.org

**User prompt:**
> Perform SEO audit for https://blog.ourdigital.org and export to Notion

**Actual Results:**

```
=== SEO Audit: blog.ourdigital.org ===

Robots.txt: ✓ Valid
- 6 disallow rules
- Sitemap declared

Sitemap: ✓ Valid
- 126 posts indexed
- All URLs accessible

Schema Markup: ⚠ Issues Found
- Organization missing 'url' property (High)
- WebPage missing 'name' property (High)
- Missing SearchAction on WebSite (Medium)
- Missing sameAs on Organization (Medium)

Core Web Vitals (Mobile):
- Performance: 53/100
- SEO: 100/100
- LCP: 5.91s ✗ Poor
- CLS: 0.085 ✓ Good
- TBT: 651ms ✗ Poor

Findings exported to Notion: 6 issues
```

### Example: GA4 Traffic Analysis

**User prompt:**
> Get traffic data for OurDigital Blog from GA4

**Actual Results:**

```
GA4 Property: OurDigital Blog (489750460)
Period: Last 30 days

Top Pages by Views:
1. / (Homepage): 86 views
2. /google-business-profile-ownership-authentication: 59 views
3. /information-overload/: 37 views
4. /social-media-vs-sns/: 23 views
5. /reputation-in-connected-world/: 19 views
```

### Example: Search Console Performance

**User prompt:**
> Get Search Console data for ourdigital.org

**Actual Results:**

```
Property: sc-domain:ourdigital.org
Period: Last 30 days

Top Pages by Clicks:
1. ourdigital.org/information-overload - 27 clicks, pos 4.2
2. ourdigital.org/google-business-profile-ownership - 18 clicks, pos 5.9
3. ourdigital.org/social-media-vs-sns - 13 clicks, pos 9.5
4. ourdigital.org/website-migration-redirect - 12 clicks, pos 17.9
5. ourdigital.org/google-brand-lift-measurement - 7 clicks, pos 5.7
```

---
## Notion Database Structure

### Finding Entry Example

**Issue:** Organization schema missing 'url' property

**Properties:**

| Field | Value |
|-------|-------|
| Category | Schema/Structured Data |
| Priority | High |
| Site | https://blog.ourdigital.org |
| URL | https://blog.ourdigital.org/posts/example/ |
| Found Date | 2024-12-14 |
| Audit ID | blog.ourdigital.org-20241214-123456 |

**Page Content:**

```markdown
## Description
The Organization schema on the blog post is missing the required
'url' property that identifies the organization's website.

## Impact
⚠️ May affect rich result eligibility and knowledge panel display
in search results. Google uses the url property to verify and
connect your organization across web properties.

## Recommendation
💡 Add 'url': 'https://ourdigital.org' to the Organization schema
markup in your site's JSON-LD structured data.
```

---
## API Configuration Reference

### Available Properties

| API | Property/Domain | ID |
|-----|-----------------|-----|
| Search Console | sc-domain:ourdigital.org | - |
| GA4 | OurDigital Lab | 218477407 |
| GA4 | OurDigital Journal | 413643875 |
| GA4 | OurDigital Blog | 489750460 |
| Custom Search | - | e5f27994f2bab4bf2 |

### Service Account

```
Email: ourdigital-seo-agent@ourdigital-insights.iam.gserviceaccount.com
File: ~/.credential/ourdigital-seo-agent.json
```
606
ourdigital-custom-skills/12-ourdigital-seo-audit/reference.md
Normal file
@@ -0,0 +1,606 @@
# OurDigital SEO Audit - API Reference

## Google Search Console API

### Authentication
```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly']
credentials = service_account.Credentials.from_service_account_file(
    'service-account-key.json', scopes=SCOPES
)
service = build('searchconsole', 'v1', credentials=credentials)
```

### Endpoints

#### Search Analytics
```python
# Get search performance data
request = {
    'startDate': '2024-01-01',
    'endDate': '2024-12-31',
    'dimensions': ['query', 'page', 'country', 'device'],
    'rowLimit': 25000,
    'dimensionFilterGroups': [{
        'filters': [{
            'dimension': 'country',
            'expression': 'kor'
        }]
    }]
}
response = service.searchanalytics().query(
    siteUrl='sc-domain:example.com',
    body=request
).execute()
```

#### URL Inspection
```python
request = {
    'inspectionUrl': 'https://example.com/page',
    'siteUrl': 'sc-domain:example.com'
}
response = service.urlInspection().index().inspect(body=request).execute()
```

#### Sitemaps
```python
# List sitemaps
sitemaps = service.sitemaps().list(siteUrl='sc-domain:example.com').execute()

# Submit sitemap
service.sitemaps().submit(
    siteUrl='sc-domain:example.com',
    feedpath='https://example.com/sitemap.xml'
).execute()
```

### Rate Limits
- 1,200 queries per minute per project
- 25,000 rows max per request

---
## PageSpeed Insights API

### Authentication
```python
import requests

API_KEY = 'your-api-key'
BASE_URL = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'
```

### Request Parameters
```python
params = {
    'url': 'https://example.com',
    'key': API_KEY,
    'strategy': 'mobile',  # or 'desktop'
    'category': ['performance', 'accessibility', 'best-practices', 'seo']
}
response = requests.get(BASE_URL, params=params)
```

### Response Structure
```json
{
  "lighthouseResult": {
    "categories": {
      "performance": { "score": 0.85 },
      "seo": { "score": 0.92 }
    },
    "audits": {
      "largest-contentful-paint": {
        "numericValue": 2500,
        "displayValue": "2.5 s"
      },
      "cumulative-layout-shift": {
        "numericValue": 0.05
      },
      "total-blocking-time": {
        "numericValue": 150
      }
    }
  },
  "loadingExperience": {
    "metrics": {
      "LARGEST_CONTENTFUL_PAINT_MS": {
        "percentile": 2500,
        "category": "AVERAGE"
      }
    }
  }
}
```

### Core Web Vitals Thresholds

| Metric | Good | Needs Improvement | Poor |
|--------|------|-------------------|------|
| LCP | ≤ 2.5s | 2.5s - 4.0s | > 4.0s |
| FID | ≤ 100ms | 100ms - 300ms | > 300ms |
| CLS | ≤ 0.1 | 0.1 - 0.25 | > 0.25 |
| INP | ≤ 200ms | 200ms - 500ms | > 500ms |
| TTFB | ≤ 800ms | 800ms - 1800ms | > 1800ms |
| FCP | ≤ 1.8s | 1.8s - 3.0s | > 3.0s |

### Rate Limits
- 25,000 queries per day (free tier)
- No per-minute limit

---
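A small helper for pulling scores and audit values out of a parsed response is often useful; this sketch follows the response structure shown above (key names per the API, function name is ours):

```python
def extract_metrics(psi: dict) -> dict:
    """Pull category scores (0-100) and key audit values from a PageSpeed response."""
    lh = psi["lighthouseResult"]
    return {
        "performance": round(lh["categories"]["performance"]["score"] * 100),
        "seo": round(lh["categories"]["seo"]["score"] * 100),
        "lcp_ms": lh["audits"]["largest-contentful-paint"]["numericValue"],
        "cls": lh["audits"]["cumulative-layout-shift"]["numericValue"],
        "tbt_ms": lh["audits"]["total-blocking-time"]["numericValue"],
    }

# Minimal sample mirroring the response structure above
sample = {
    "lighthouseResult": {
        "categories": {"performance": {"score": 0.85}, "seo": {"score": 0.92}},
        "audits": {
            "largest-contentful-paint": {"numericValue": 2500},
            "cumulative-layout-shift": {"numericValue": 0.05},
            "total-blocking-time": {"numericValue": 150},
        },
    }
}
metrics = extract_metrics(sample)
```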
## Google Analytics 4 Data API

### Authentication
```python
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import RunReportRequest

client = BetaAnalyticsDataClient()
property_id = '123456789'
```

### Common Reports

#### Traffic Overview
```python
request = RunReportRequest(
    property=f'properties/{property_id}',
    dimensions=[
        {'name': 'date'},
        {'name': 'sessionDefaultChannelGroup'}
    ],
    metrics=[
        {'name': 'sessions'},
        {'name': 'totalUsers'},
        {'name': 'screenPageViews'},
        {'name': 'bounceRate'}
    ],
    date_ranges=[{'start_date': '30daysAgo', 'end_date': 'today'}]
)
response = client.run_report(request)
```

#### Landing Pages
```python
request = RunReportRequest(
    property=f'properties/{property_id}',
    dimensions=[{'name': 'landingPage'}],
    metrics=[
        {'name': 'sessions'},
        {'name': 'engagementRate'},
        {'name': 'conversions'}
    ],
    date_ranges=[{'start_date': '30daysAgo', 'end_date': 'today'}],
    order_bys=[{'metric': {'metric_name': 'sessions'}, 'desc': True}],
    limit=100
)
```

### Useful Dimensions
- `date`, `dateHour`
- `sessionDefaultChannelGroup`
- `landingPage`, `pagePath`
- `deviceCategory`, `operatingSystem`
- `country`, `city`
- `sessionSource`, `sessionMedium`

### Useful Metrics
- `sessions`, `totalUsers`, `newUsers`
- `screenPageViews`, `engagementRate`
- `averageSessionDuration`
- `bounceRate`, `conversions`

---
## Google Trends API (pytrends)

### Installation
```bash
pip install pytrends
```

### Usage
```python
from pytrends.request import TrendReq

pytrends = TrendReq(hl='ko-KR', tz=540)

# Interest over time
pytrends.build_payload(['keyword1', 'keyword2'], timeframe='today 12-m', geo='KR')
interest_df = pytrends.interest_over_time()

# Related queries
related = pytrends.related_queries()

# Trending searches
trending = pytrends.trending_searches(pn='south_korea')

# Suggestions
suggestions = pytrends.suggestions('seo')
```

### Rate Limits
- No official limits, but implement delays (1-2 seconds between requests)
- May trigger CAPTCHA with heavy usage

---
## Custom Search JSON API

### Authentication
```python
import requests

API_KEY = 'your-api-key'
CX = 'your-search-engine-id'  # Programmable Search Engine ID
BASE_URL = 'https://www.googleapis.com/customsearch/v1'
```

### Request
```python
params = {
    'key': API_KEY,
    'cx': CX,
    'q': 'search query',
    'num': 10,    # 1-10
    'start': 1,   # Pagination
    'gl': 'kr',   # Country
    'hl': 'ko'    # Language
}
response = requests.get(BASE_URL, params=params)
```

### Response Structure
```json
{
  "searchInformation": {
    "totalResults": "12345",
    "searchTime": 0.5
  },
  "items": [
    {
      "title": "Page Title",
      "link": "https://example.com",
      "snippet": "Description...",
      "pagemap": {
        "metatags": [...],
        "cse_image": [...]
      }
    }
  ]
}
```

### Rate Limits
- 100 queries per day (free)
- 10,000 queries per day ($5 per 1,000)

---
## Knowledge Graph Search API

### Request
```python
import requests

API_KEY = 'your-api-key'
BASE_URL = 'https://kgsearch.googleapis.com/v1/entities:search'

params = {
    'key': API_KEY,
    'query': 'entity name',
    'types': 'Organization',
    'languages': 'ko',
    'limit': 10
}
response = requests.get(BASE_URL, params=params)
```

### Response
```json
{
  "itemListElement": [
    {
      "result": {
        "@type": "EntitySearchResult",
        "name": "Entity Name",
        "description": "Description...",
        "@id": "kg:/m/entity_id",
        "detailedDescription": {
          "articleBody": "..."
        }
      },
      "resultScore": 1234.56
    }
  ]
}
```

---
## Schema.org Reference

### JSON-LD Format
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Company Name",
  "url": "https://example.com"
}
</script>
```

### Common Schema Types

#### Organization
```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Company Name",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://facebook.com/company",
    "https://twitter.com/company"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "telephone": "+82-2-1234-5678",
    "contactType": "customer service"
  }
}
```

#### LocalBusiness
```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Business Name",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Street",
    "addressLocality": "Seoul",
    "addressRegion": "Seoul",
    "postalCode": "12345",
    "addressCountry": "KR"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 37.5665,
    "longitude": 126.9780
  },
  "openingHoursSpecification": [{
    "@type": "OpeningHoursSpecification",
    "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "opens": "09:00",
    "closes": "18:00"
  }]
}
```

#### Article/BlogPosting
```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Article Title",
  "author": {
    "@type": "Person",
    "name": "Author Name"
  },
  "datePublished": "2024-01-01",
  "dateModified": "2024-01-15",
  "image": "https://example.com/image.jpg",
  "publisher": {
    "@type": "Organization",
    "name": "Publisher Name",
    "logo": {
      "@type": "ImageObject",
      "url": "https://example.com/logo.png"
    }
  }
}
```

#### Product
```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Product Name",
  "image": "https://example.com/product.jpg",
  "description": "Product description",
  "brand": {
    "@type": "Brand",
    "name": "Brand Name"
  },
  "offers": {
    "@type": "Offer",
    "price": "29900",
    "priceCurrency": "KRW",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.5",
    "reviewCount": "100"
  }
}
```

#### FAQPage
```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Question text?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Answer text."
    }
  }]
}
```

#### BreadcrumbList
```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [{
    "@type": "ListItem",
    "position": 1,
    "name": "Home",
    "item": "https://example.com/"
  }, {
    "@type": "ListItem",
    "position": 2,
    "name": "Category",
    "item": "https://example.com/category/"
  }]
}
```

#### WebSite (with SearchAction)
```json
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Site Name",
  "url": "https://example.com",
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "https://example.com/search?q={search_term_string}"
    },
    "query-input": "required name=search_term_string"
  }
}
```

---
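Every JSON-LD snippet above is ultimately embedded in the `<script type="application/ld+json">` wrapper shown at the top of this section. A minimal helper that renders that wrapper from a Python dict (a sketch of what a generator like `schema_generator.py` might emit; the function name is ours):

```python
import json

def to_jsonld_tag(schema: dict) -> str:
    """Wrap a schema.org dict in an application/ld+json script tag."""
    body = json.dumps({"@context": "https://schema.org", **schema}, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'

tag = to_jsonld_tag({
    "@type": "Organization",
    "name": "Company Name",
    "url": "https://example.com",
})
```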
## XML Sitemap Specification

### Format
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

### Index Sitemap
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
</sitemapindex>
```

### Limits
- 50,000 URLs max per sitemap
- 50MB uncompressed max
- Use index for larger sites

### Best Practices
- Use absolute URLs
- Include only canonical URLs
- Keep lastmod accurate
- Exclude noindex pages

---
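Generating a conforming urlset from a list of pages takes only the standard library; a minimal sketch (input format is an assumption: dicts with `loc` and optional `lastmod`):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries: list[dict]) -> bytes:
    """Build a urlset sitemap per the format above from {'loc', 'lastmod'} dicts."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        if "lastmod" in entry:
            ET.SubElement(url, "lastmod").text = entry["lastmod"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

xml_bytes = build_sitemap([{"loc": "https://example.com/page", "lastmod": "2024-01-15"}])
```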
## Robots.txt Reference
|
||||||
|
|
||||||
|
### Directives
|
||||||
|
|
||||||
|
```txt
# Comments start with #
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

User-agent: Googlebot
Disallow: /no-google/
# Note: Googlebot ignores Crawl-delay; other crawlers may honor it
Crawl-delay: 1

Sitemap: https://example.com/sitemap.xml
```

### Common User-agents

- `*` - All bots
- `Googlebot` - Google crawler
- `Googlebot-Image` - Google Image crawler
- `Bingbot` - Bing crawler
- `Yandex` - Yandex crawler
- `Baiduspider` - Baidu crawler

### Pattern Matching

- `*` - Wildcard (matches any sequence of characters)
- `$` - End of URL
- `/path/` - Directory prefix
- `/*.pdf$` - All PDF files

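These rules can be translated into regular expressions. An illustrative sketch (not the robots_checker.py implementation):

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: '*' matches any sequence,
    a trailing '$' anchors the end of the URL path."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile(regex)

rule = robots_pattern_to_regex("/*.pdf$")
# rule.match("/docs/report.pdf") matches; "/docs/report.pdf?x=1" does not
```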
### Testing

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Check if URL is allowed
can_fetch = rp.can_fetch("Googlebot", "https://example.com/page")
```

---

## Error Handling

### HTTP Status Codes

| Code | Meaning | Action |
|------|---------|--------|
| 200 | OK | Process response |
| 301/302 | Redirect | Follow or flag |
| 400 | Bad Request | Check parameters |
| 401 | Unauthorized | Check credentials |
| 403 | Forbidden | Check permissions |
| 404 | Not Found | Flag missing resource |
| 429 | Rate Limited | Implement backoff |
| 500 | Server Error | Retry with backoff |
| 503 | Service Unavailable | Retry later |

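The table above can be collapsed into a small dispatch helper. An illustrative sketch; the return labels are assumptions, not the skill's API:

```python
def classify_response(status: int) -> str:
    """Decide how a crawler should treat an HTTP status code."""
    if status == 200:
        return "process"
    if status in (301, 302):
        return "follow_redirect"
    if status in (429, 500, 503):
        return "retry_with_backoff"
    if 400 <= status < 500:
        return "flag_client_error"
    return "flag"
```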
### Retry Strategy

```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def make_request(url):
    # Request logic
    pass
```
@@ -0,0 +1,207 @@
"""
Base Client - Shared async client utilities
===========================================
Purpose: Rate-limited async operations for API clients
Python: 3.10+
"""

import asyncio
import logging
import os
from asyncio import Semaphore
from datetime import datetime
from typing import Any, Callable, TypeVar

from dotenv import load_dotenv
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Load environment variables
load_dotenv()

# Logging setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

T = TypeVar("T")


class RateLimiter:
    """Rate limiter using token bucket algorithm."""

    def __init__(self, rate: float, per: float = 1.0):
        """
        Initialize rate limiter.

        Args:
            rate: Number of requests allowed
            per: Time period in seconds (default: 1 second)
        """
        self.rate = rate
        self.per = per
        self.tokens = rate
        self.last_update = datetime.now()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Acquire a token, waiting if necessary."""
        async with self._lock:
            now = datetime.now()
            elapsed = (now - self.last_update).total_seconds()
            self.tokens = min(self.rate, self.tokens + elapsed * (self.rate / self.per))
            self.last_update = now

            if self.tokens < 1:
                wait_time = (1 - self.tokens) * (self.per / self.rate)
                await asyncio.sleep(wait_time)
                self.tokens = 0
            else:
                self.tokens -= 1


class BaseAsyncClient:
    """Base class for async API clients with rate limiting."""

    def __init__(
        self,
        max_concurrent: int = 5,
        requests_per_second: float = 3.0,
        logger: logging.Logger | None = None,
    ):
        """
        Initialize base client.

        Args:
            max_concurrent: Maximum concurrent requests
            requests_per_second: Rate limit
            logger: Logger instance
        """
        self.semaphore = Semaphore(max_concurrent)
        self.rate_limiter = RateLimiter(requests_per_second)
        self.logger = logger or logging.getLogger(self.__class__.__name__)
        self.stats = {
            "requests": 0,
            "success": 0,
            "errors": 0,
            "retries": 0,
        }

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(Exception),
    )
    async def _rate_limited_request(
        self,
        coro: Callable[[], Any],
    ) -> Any:
        """Execute a request with rate limiting and retry."""
        async with self.semaphore:
            await self.rate_limiter.acquire()
            self.stats["requests"] += 1
            try:
                result = await coro()
                self.stats["success"] += 1
                return result
            except Exception as e:
                self.stats["errors"] += 1
                self.logger.error(f"Request failed: {e}")
                raise

    async def batch_requests(
        self,
        requests: list[Callable[[], Any]],
        desc: str = "Processing",
    ) -> list[Any]:
        """Execute multiple requests concurrently."""
        try:
            from tqdm.asyncio import tqdm
            has_tqdm = True
        except ImportError:
            has_tqdm = False

        async def execute(req: Callable) -> Any:
            try:
                return await self._rate_limited_request(req)
            except Exception as e:
                return {"error": str(e)}

        tasks = [execute(req) for req in requests]

        if has_tqdm:
            results = []
            for coro in tqdm.as_completed(tasks, total=len(tasks), desc=desc):
                result = await coro
                results.append(result)
            return results
        else:
            return await asyncio.gather(*tasks, return_exceptions=True)

    def print_stats(self) -> None:
        """Print request statistics."""
        self.logger.info("=" * 40)
        self.logger.info("Request Statistics:")
        self.logger.info(f"  Total Requests: {self.stats['requests']}")
        self.logger.info(f"  Successful: {self.stats['success']}")
        self.logger.info(f"  Errors: {self.stats['errors']}")
        self.logger.info("=" * 40)


class ConfigManager:
    """Manage API configuration and credentials."""

    def __init__(self):
        load_dotenv()

    @property
    def google_credentials_path(self) -> str | None:
        """Get Google service account credentials path."""
        # Prefer SEO-specific credentials, fallback to general credentials
        seo_creds = os.path.expanduser("~/.credential/ourdigital-seo-agent.json")
        if os.path.exists(seo_creds):
            return seo_creds
        return os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

    @property
    def pagespeed_api_key(self) -> str | None:
        """Get PageSpeed Insights API key."""
        return os.getenv("PAGESPEED_API_KEY")

    @property
    def custom_search_api_key(self) -> str | None:
        """Get Custom Search API key."""
        return os.getenv("CUSTOM_SEARCH_API_KEY")

    @property
    def custom_search_engine_id(self) -> str | None:
        """Get Custom Search Engine ID."""
        return os.getenv("CUSTOM_SEARCH_ENGINE_ID")

    @property
    def notion_token(self) -> str | None:
        """Get Notion API token."""
        return os.getenv("NOTION_TOKEN") or os.getenv("NOTION_API_KEY")

    def validate_google_credentials(self) -> bool:
        """Validate Google credentials are configured."""
        creds_path = self.google_credentials_path
        if not creds_path:
            return False
        return os.path.exists(creds_path)

    def get_required(self, key: str) -> str:
        """Get required environment variable or raise error."""
        value = os.getenv(key)
        if not value:
            raise ValueError(f"Missing required environment variable: {key}")
        return value


# Singleton config instance
config = ConfigManager()
@@ -0,0 +1,497 @@
"""
Full SEO Audit - Orchestration Script
=====================================
Purpose: Run comprehensive SEO audit combining all tools
Python: 3.10+
Usage:
    python full_audit.py --url https://example.com --output notion --notion-page-id abc123
"""

import argparse
import json
import logging
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urlparse

from robots_checker import RobotsChecker
from schema_validator import SchemaValidator
from sitemap_validator import SitemapValidator
from pagespeed_client import PageSpeedClient
from notion_reporter import NotionReporter, SEOFinding

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


@dataclass
class AuditResult:
    """Complete SEO audit result."""

    url: str
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
    robots: dict = field(default_factory=dict)
    sitemap: dict = field(default_factory=dict)
    schema: dict = field(default_factory=dict)
    performance: dict = field(default_factory=dict)
    findings: list[SEOFinding] = field(default_factory=list)
    summary: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {
            "url": self.url,
            "timestamp": self.timestamp,
            "robots": self.robots,
            "sitemap": self.sitemap,
            "schema": self.schema,
            "performance": self.performance,
            "summary": self.summary,
            "findings_count": len(self.findings),
        }


class SEOAuditor:
    """Orchestrate comprehensive SEO audit."""

    def __init__(self):
        self.robots_checker = RobotsChecker()
        self.sitemap_validator = SitemapValidator()
        self.schema_validator = SchemaValidator()
        self.pagespeed_client = PageSpeedClient()

    def run_audit(
        self,
        url: str,
        include_robots: bool = True,
        include_sitemap: bool = True,
        include_schema: bool = True,
        include_performance: bool = True,
    ) -> AuditResult:
        """
        Run comprehensive SEO audit.

        Args:
            url: URL to audit
            include_robots: Check robots.txt
            include_sitemap: Validate sitemap
            include_schema: Validate schema markup
            include_performance: Run PageSpeed analysis
        """
        result = AuditResult(url=url)
        parsed_url = urlparse(url)
        base_url = f"{parsed_url.scheme}://{parsed_url.netloc}"

        logger.info(f"Starting SEO audit for {url}")

        # 1. Robots.txt analysis
        if include_robots:
            logger.info("Analyzing robots.txt...")
            try:
                robots_result = self.robots_checker.analyze(base_url)
                result.robots = robots_result.to_dict()
                self._process_robots_findings(robots_result, result)
            except Exception as e:
                logger.error(f"Robots.txt analysis failed: {e}")
                result.robots = {"error": str(e)}

        # 2. Sitemap validation
        if include_sitemap:
            logger.info("Validating sitemap...")
            sitemap_url = f"{base_url}/sitemap.xml"
            # Try to get sitemap URL from robots.txt
            if result.robots.get("sitemaps"):
                sitemap_url = result.robots["sitemaps"][0]
            try:
                sitemap_result = self.sitemap_validator.validate(sitemap_url)
                result.sitemap = sitemap_result.to_dict()
                self._process_sitemap_findings(sitemap_result, result)
            except Exception as e:
                logger.error(f"Sitemap validation failed: {e}")
                result.sitemap = {"error": str(e)}

        # 3. Schema validation
        if include_schema:
            logger.info("Validating schema markup...")
            try:
                schema_result = self.schema_validator.validate(url=url)
                result.schema = schema_result.to_dict()
                self._process_schema_findings(schema_result, result)
            except Exception as e:
                logger.error(f"Schema validation failed: {e}")
                result.schema = {"error": str(e)}

        # 4. PageSpeed analysis
        if include_performance:
            logger.info("Running PageSpeed analysis...")
            try:
                perf_result = self.pagespeed_client.analyze(url, strategy="mobile")
                result.performance = perf_result.to_dict()
                self._process_performance_findings(perf_result, result)
            except Exception as e:
                logger.error(f"PageSpeed analysis failed: {e}")
                result.performance = {"error": str(e)}

        # Generate summary
        result.summary = self._generate_summary(result)

        logger.info(f"Audit complete. Found {len(result.findings)} issues.")
        return result

    def _process_robots_findings(self, robots_result, audit_result: AuditResult):
        """Convert robots.txt issues to findings."""
        for issue in robots_result.issues:
            priority = "Medium"
            if issue.severity == "error":
                priority = "Critical"
            elif issue.severity == "warning":
                priority = "High"

            audit_result.findings.append(SEOFinding(
                issue=issue.message,
                category="Robots.txt",
                priority=priority,
                description=issue.directive or "",
                recommendation=issue.suggestion or "",
            ))

    def _process_sitemap_findings(self, sitemap_result, audit_result: AuditResult):
        """Convert sitemap issues to findings."""
        for issue in sitemap_result.issues:
            priority = "Medium"
            if issue.severity == "error":
                priority = "High"
            elif issue.severity == "warning":
                priority = "Medium"

            audit_result.findings.append(SEOFinding(
                issue=issue.message,
                category="Sitemap",
                priority=priority,
                url=issue.url,
                recommendation=issue.suggestion or "",
            ))

    def _process_schema_findings(self, schema_result, audit_result: AuditResult):
        """Convert schema issues to findings."""
        for issue in schema_result.issues:
            priority = "Low"
            if issue.severity == "error":
                priority = "High"
            elif issue.severity == "warning":
                priority = "Medium"

            audit_result.findings.append(SEOFinding(
                issue=issue.message,
                category="Schema/Structured Data",
                priority=priority,
                description=f"Schema type: {issue.schema_type}" if issue.schema_type else "",
                recommendation=issue.suggestion or "",
            ))

    def _process_performance_findings(self, perf_result, audit_result: AuditResult):
        """Convert performance issues to findings."""
        cwv = perf_result.core_web_vitals

        # Check Core Web Vitals
        if cwv.lcp_rating == "POOR":
            audit_result.findings.append(SEOFinding(
                issue=f"Poor LCP: {cwv.lcp / 1000:.2f}s (should be < 2.5s)",
                category="Performance",
                priority="Critical",
                impact="Users experience slow page loads, affecting bounce rate and rankings",
                recommendation="Optimize images, reduce server response time, use CDN",
            ))
        elif cwv.lcp_rating == "NEEDS_IMPROVEMENT":
            audit_result.findings.append(SEOFinding(
                issue=f"LCP needs improvement: {cwv.lcp / 1000:.2f}s (target < 2.5s)",
                category="Performance",
                priority="High",
                recommendation="Optimize largest content element loading",
            ))

        if cwv.cls_rating == "POOR":
            audit_result.findings.append(SEOFinding(
                issue=f"Poor CLS: {cwv.cls:.3f} (should be < 0.1)",
                category="Performance",
                priority="High",
                impact="Layout shifts frustrate users",
                recommendation="Set dimensions for images/embeds, avoid inserting content above existing content",
            ))

        if cwv.fid_rating == "POOR":
            audit_result.findings.append(SEOFinding(
                issue=f"Poor FID/TBT: {cwv.fid:.0f}ms (should be < 100ms)",
                category="Performance",
                priority="High",
                impact="Slow interactivity affects user experience",
                recommendation="Reduce JavaScript execution time, break up long tasks",
            ))

        # Check performance score
        if perf_result.performance_score and perf_result.performance_score < 50:
            audit_result.findings.append(SEOFinding(
                issue=f"Low performance score: {perf_result.performance_score:.0f}/100",
                category="Performance",
                priority="High",
                impact="Poor performance affects user experience and SEO",
                recommendation="Address top opportunities from PageSpeed Insights",
            ))

        # Add top opportunities as findings
        for opp in perf_result.opportunities[:3]:
            if opp["savings_ms"] > 500:  # Only significant savings
                audit_result.findings.append(SEOFinding(
                    issue=opp["title"],
                    category="Performance",
                    priority="Medium",
                    description=opp.get("description", ""),
                    impact=f"Potential savings: {opp['savings_ms'] / 1000:.1f}s",
                    recommendation="See PageSpeed Insights for details",
                ))

    def _generate_summary(self, result: AuditResult) -> dict:
        """Generate audit summary."""
        findings_by_priority = {}
        findings_by_category = {}

        for finding in result.findings:
            # Count by priority
            findings_by_priority[finding.priority] = (
                findings_by_priority.get(finding.priority, 0) + 1
            )
            # Count by category
            findings_by_category[finding.category] = (
                findings_by_category.get(finding.category, 0) + 1
            )

        return {
            "total_findings": len(result.findings),
            "findings_by_priority": findings_by_priority,
            "findings_by_category": findings_by_category,
            "robots_accessible": result.robots.get("accessible", False),
            "sitemap_valid": result.sitemap.get("valid", False),
            "schema_valid": result.schema.get("valid", False),
            "performance_score": result.performance.get("scores", {}).get("performance"),
            "quick_wins": [
                f.issue for f in result.findings
                if f.priority in ("Medium", "Low")
            ][:5],
            "critical_issues": [
                f.issue for f in result.findings
                if f.priority == "Critical"
            ],
        }

    def export_to_notion(
        self,
        result: AuditResult,
        parent_page_id: str | None = None,
        use_default_db: bool = True,
    ) -> dict:
        """
        Export audit results to Notion.

        Args:
            result: AuditResult object
            parent_page_id: Parent page ID (for creating new database)
            use_default_db: If True, use OurDigital SEO Audit Log database

        Returns:
            Dict with database_id, summary_page_id, findings_created
        """
        reporter = NotionReporter()
        audit_id = f"{urlparse(result.url).netloc}-{datetime.now().strftime('%Y%m%d-%H%M%S')}"

        # Add site and audit_id to all findings
        for finding in result.findings:
            finding.site = result.url
            finding.audit_id = audit_id

        if use_default_db:
            # Use the default OurDigital SEO Audit Log database
            page_ids = reporter.add_findings_batch(result.findings)
            return {
                "database_id": getattr(reporter, "DEFAULT_DATABASE_ID", "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"),
                "audit_id": audit_id,
                "findings_created": len(page_ids),
            }
        else:
            # Create new database under parent page
            if not parent_page_id:
                raise ValueError("parent_page_id required when not using default database")

            db_title = f"SEO Audit - {urlparse(result.url).netloc} - {datetime.now().strftime('%Y-%m-%d')}"
            database_id = reporter.create_findings_database(parent_page_id, db_title)
            page_ids = reporter.add_findings_batch(result.findings, database_id)

            # Create summary page
            summary_page_id = reporter.create_audit_summary_page(
                parent_page_id,
                result.url,
                result.summary,
            )

            return {
                "database_id": database_id,
                "summary_page_id": summary_page_id,
                "audit_id": audit_id,
                "findings_created": len(page_ids),
            }

    def generate_report(self, result: AuditResult) -> str:
        """Generate human-readable report."""
        lines = [
            "=" * 70,
            "SEO AUDIT REPORT",
            "=" * 70,
            f"URL: {result.url}",
            f"Date: {result.timestamp}",
            "",
            "-" * 70,
            "SUMMARY",
            "-" * 70,
            f"Total Issues Found: {result.summary.get('total_findings', 0)}",
            "",
        ]

        # Priority breakdown
        lines.append("Issues by Priority:")
        for priority in ["Critical", "High", "Medium", "Low"]:
            count = result.summary.get("findings_by_priority", {}).get(priority, 0)
            if count:
                lines.append(f"  {priority}: {count}")

        lines.append("")

        # Category breakdown
        lines.append("Issues by Category:")
        for category, count in result.summary.get("findings_by_category", {}).items():
            lines.append(f"  {category}: {count}")

        lines.append("")
        lines.append("-" * 70)
        lines.append("STATUS OVERVIEW")
        lines.append("-" * 70)

        # Status checks
        lines.append(f"Robots.txt: {'✓ Accessible' if result.robots.get('accessible') else '✗ Not accessible'}")
        lines.append(f"Sitemap: {'✓ Valid' if result.sitemap.get('valid') else '✗ Issues found'}")
        lines.append(f"Schema: {'✓ Valid' if result.schema.get('valid') else '✗ Issues found'}")

        perf_score = result.performance.get("scores", {}).get("performance")
        if perf_score:
            status = "✓ Good" if perf_score >= 90 else "⚠ Needs work" if perf_score >= 50 else "✗ Poor"
            lines.append(f"Performance: {status} ({perf_score:.0f}/100)")

        # Critical issues
        critical = result.summary.get("critical_issues", [])
        if critical:
            lines.extend([
                "",
                "-" * 70,
                "CRITICAL ISSUES (Fix Immediately)",
                "-" * 70,
            ])
            for issue in critical:
                lines.append(f"  • {issue}")

        # Quick wins
        quick_wins = result.summary.get("quick_wins", [])
        if quick_wins:
            lines.extend([
                "",
                "-" * 70,
                "QUICK WINS",
                "-" * 70,
            ])
            for issue in quick_wins[:5]:
                lines.append(f"  • {issue}")

        # All findings
        if result.findings:
            lines.extend([
                "",
                "-" * 70,
                "ALL FINDINGS",
                "-" * 70,
            ])

            current_category = None
            # Sort by severity rank, not alphabetically, within each category
            priority_rank = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
            for finding in sorted(result.findings, key=lambda x: (x.category, priority_rank.get(x.priority, 4))):
                if finding.category != current_category:
                    current_category = finding.category
                    lines.append(f"\n[{current_category}]")

                lines.append(f"  [{finding.priority}] {finding.issue}")
                if finding.recommendation:
                    lines.append(f"    → {finding.recommendation}")

        lines.extend(["", "=" * 70])

        return "\n".join(lines)


def main():
    """CLI entry point."""
    parser = argparse.ArgumentParser(
        description="Run comprehensive SEO audit",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Run full audit and output to console
  python full_audit.py --url https://example.com

  # Export to Notion
  python full_audit.py --url https://example.com --output notion --notion-page-id abc123

  # Output as JSON
  python full_audit.py --url https://example.com --json
""",
    )

    parser.add_argument("--url", "-u", required=True, help="URL to audit")
    parser.add_argument("--output", "-o", choices=["console", "notion", "json"],
                        default="console", help="Output format")
    parser.add_argument("--notion-page-id", help="Notion parent page ID (required for notion output)")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--no-robots", action="store_true", help="Skip robots.txt check")
    parser.add_argument("--no-sitemap", action="store_true", help="Skip sitemap validation")
    parser.add_argument("--no-schema", action="store_true", help="Skip schema validation")
    parser.add_argument("--no-performance", action="store_true", help="Skip PageSpeed analysis")

    args = parser.parse_args()

    auditor = SEOAuditor()

    # Run audit
    result = auditor.run_audit(
        args.url,
        include_robots=not args.no_robots,
        include_sitemap=not args.no_sitemap,
        include_schema=not args.no_schema,
        include_performance=not args.no_performance,
    )

    # Output results
    if args.json or args.output == "json":
        print(json.dumps(result.to_dict(), indent=2, default=str))

    elif args.output == "notion":
        if not args.notion_page_id:
            parser.error("--notion-page-id required for notion output")
        notion_result = auditor.export_to_notion(result, args.notion_page_id)
        print("Exported to Notion:")
        print(f"  Database ID: {notion_result['database_id']}")
        # The default-database path returns no summary page, so don't assume the key
        print(f"  Summary Page: {notion_result.get('summary_page_id', 'n/a')}")
        print(f"  Findings Created: {notion_result['findings_created']}")

    else:
        print(auditor.generate_report(result))


if __name__ == "__main__":
    main()
@@ -0,0 +1,409 @@
"""
Google Search Console Client
============================
Purpose: Interact with Google Search Console API for SEO data
Python: 3.10+
Usage:
    from gsc_client import SearchConsoleClient
    client = SearchConsoleClient()
    data = client.get_search_analytics("sc-domain:example.com")
"""

import logging
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any

from google.oauth2 import service_account
from googleapiclient.discovery import build

from base_client import config

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


@dataclass
class SearchAnalyticsResult:
    """Search analytics query result."""

    rows: list[dict] = field(default_factory=list)
    total_clicks: int = 0
    total_impressions: int = 0
    average_ctr: float = 0.0
    average_position: float = 0.0


@dataclass
class SitemapInfo:
    """Sitemap information from Search Console."""

    path: str
    last_submitted: str | None = None
    last_downloaded: str | None = None
    is_pending: bool = False
    is_sitemaps_index: bool = False
    warnings: int = 0
    errors: int = 0


class SearchConsoleClient:
    """Client for Google Search Console API."""

    SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]

    def __init__(self, credentials_path: str | None = None):
        """
        Initialize Search Console client.

        Args:
            credentials_path: Path to service account JSON key
        """
        self.credentials_path = credentials_path or config.google_credentials_path
        self._service = None

    @property
    def service(self):
        """Get or create Search Console service."""
        if self._service is None:
            if not self.credentials_path:
                raise ValueError(
                    "Google credentials not configured. "
                    "Set GOOGLE_APPLICATION_CREDENTIALS environment variable."
                )

            credentials = service_account.Credentials.from_service_account_file(
                self.credentials_path,
                scopes=self.SCOPES,
            )
            self._service = build("searchconsole", "v1", credentials=credentials)

        return self._service

    def list_sites(self) -> list[dict]:
        """List all sites accessible to the service account."""
        response = self.service.sites().list().execute()
        return response.get("siteEntry", [])

    def get_search_analytics(
        self,
        site_url: str,
        start_date: str | None = None,
        end_date: str | None = None,
        dimensions: list[str] | None = None,
        row_limit: int = 25000,
        filters: list[dict] | None = None,
    ) -> SearchAnalyticsResult:
        """
        Get search analytics data.

        Args:
            site_url: Site URL (e.g., "sc-domain:example.com" or "https://example.com/")
            start_date: Start date (YYYY-MM-DD), defaults to 30 days ago
            end_date: End date (YYYY-MM-DD), defaults to yesterday
            dimensions: List of dimensions (query, page, country, device, date)
            row_limit: Maximum rows to return
            filters: Dimension filters

        Returns:
            SearchAnalyticsResult with rows and summary stats
        """
        # Default date range: last 30 days
        if not end_date:
            end_date = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
        if not start_date:
            start_date = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")

        # Default dimensions
        if dimensions is None:
            dimensions = ["query", "page"]

        request_body = {
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": dimensions,
            "rowLimit": row_limit,
        }

        if filters:
            request_body["dimensionFilterGroups"] = [{"filters": filters}]

        try:
            response = self.service.searchanalytics().query(
                siteUrl=site_url,
                body=request_body,
            ).execute()
        except Exception as e:
            logger.error(f"Failed to query search analytics: {e}")
            raise

        rows = response.get("rows", [])

        # Calculate totals
        total_clicks = sum(row.get("clicks", 0) for row in rows)
        total_impressions = sum(row.get("impressions", 0) for row in rows)
        total_ctr = sum(row.get("ctr", 0) for row in rows)
        total_position = sum(row.get("position", 0) for row in rows)

        avg_ctr = total_ctr / len(rows) if rows else 0
        avg_position = total_position / len(rows) if rows else 0

        return SearchAnalyticsResult(
            rows=rows,
            total_clicks=total_clicks,
            total_impressions=total_impressions,
            average_ctr=avg_ctr,
            average_position=avg_position,
        )

    def get_top_queries(
        self,
        site_url: str,
        limit: int = 100,
        start_date: str | None = None,
        end_date: str | None = None,
    ) -> list[dict]:
        """Get top search queries by clicks."""
        result = self.get_search_analytics(
            site_url=site_url,
            dimensions=["query"],
            row_limit=limit,
            start_date=start_date,
            end_date=end_date,
        )

        # Sort by clicks
        sorted_rows = sorted(
            result.rows,
            key=lambda x: x.get("clicks", 0),
            reverse=True,
        )

        return [
            {
                "query": row["keys"][0],
                "clicks": row.get("clicks", 0),
                "impressions": row.get("impressions", 0),
                "ctr": row.get("ctr", 0),
                "position": row.get("position", 0),
            }
            for row in sorted_rows[:limit]
        ]

    def get_top_pages(
        self,
        site_url: str,
        limit: int = 100,
        start_date: str | None = None,
        end_date: str | None = None,
    ) -> list[dict]:
        """Get top pages by clicks."""
        result = self.get_search_analytics(
            site_url=site_url,
            dimensions=["page"],
            row_limit=limit,
            start_date=start_date,
            end_date=end_date,
        )

        sorted_rows = sorted(
            result.rows,
            key=lambda x: x.get("clicks", 0),
            reverse=True,
        )

        return [
            {
                "page": row["keys"][0],
                "clicks": row.get("clicks", 0),
                "impressions": row.get("impressions", 0),
                "ctr": row.get("ctr", 0),
                "position": row.get("position", 0),
            }
            for row in sorted_rows[:limit]
        ]

    def get_sitemaps(self, site_url: str) -> list[SitemapInfo]:
        """Get list of sitemaps for a site."""
        try:
            response = self.service.sitemaps().list(siteUrl=site_url).execute()
        except Exception as e:
            logger.error(f"Failed to get sitemaps: {e}")
            raise

        sitemaps = []
        for sm in response.get("sitemap", []):
            sitemaps.append(SitemapInfo(
                path=sm.get("path", ""),
                last_submitted=sm.get("lastSubmitted"),
                last_downloaded=sm.get("lastDownloaded"),
                is_pending=sm.get("isPending", False),
                is_sitemaps_index=sm.get("isSitemapsIndex", False),
                warnings=sm.get("warnings", 0),
                errors=sm.get("errors", 0),
            ))

        return sitemaps

    def submit_sitemap(self, site_url: str, sitemap_url: str) -> bool:
        """Submit a sitemap for indexing."""
        try:
            self.service.sitemaps().submit(
                siteUrl=site_url,
                feedpath=sitemap_url,
            ).execute()
            logger.info(f"Submitted sitemap: {sitemap_url}")
            return True
        except Exception as e:
            logger.error(f"Failed to submit sitemap: {e}")
            return False

    def inspect_url(self, site_url: str, inspection_url: str) -> dict:
        """
        Inspect a URL's indexing status.

        Note: This uses the URL Inspection API which may have different quotas.
        """
        try:
            response = self.service.urlInspection().index().inspect(
                body={
                    "inspectionUrl": inspection_url,
                    "siteUrl": site_url,
                }
            ).execute()

            result = response.get("inspectionResult", {})

            return {
                "url": inspection_url,
                "indexing_state": result.get("indexStatusResult", {}).get(
                    "coverageState", "Unknown"
                ),
                "last_crawl_time": result.get("indexStatusResult", {}).get(
                    "lastCrawlTime"
                ),
                "crawled_as": result.get("indexStatusResult", {}).get("crawledAs"),
                "robots_txt_state": result.get("indexStatusResult", {}).get(
                    "robotsTxtState"
                ),
                "mobile_usability": result.get("mobileUsabilityResult", {}).get(
                    "verdict", "Unknown"
                ),
                "rich_results": result.get("richResultsResult", {}).get(
                    "verdict", "Unknown"
                ),
            }
        except Exception as e:
            logger.error(f"Failed to inspect URL: {e}")
            raise

    def get_performance_summary(
        self,
        site_url: str,
        days: int = 30,
    ) -> dict:
        """Get a summary of search performance."""
        end_date = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
        start_date = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")

        # Get overall stats
        overall = self.get_search_analytics(
            site_url=site_url,
            dimensions=[],
            start_date=start_date,
            end_date=end_date,
        )

        # Get top queries
        top_queries = self.get_top_queries(
            site_url=site_url,
            limit=10,
            start_date=start_date,
            end_date=end_date,
        )

        # Get top pages
        top_pages = self.get_top_pages(
            site_url=site_url,
            limit=10,
            start_date=start_date,
            end_date=end_date,
        )

        # Get by device
        by_device = self.get_search_analytics(
            site_url=site_url,
            dimensions=["device"],
            start_date=start_date,
            end_date=end_date,
        )

        device_breakdown = {}
        for row in by_device.rows:
            device = row["keys"][0]
            device_breakdown[device] = {
                "clicks": row.get("clicks", 0),
                "impressions": row.get("impressions", 0),
                "ctr": row.get("ctr", 0),
                "position": row.get("position", 0),
            }

        return {
            "period": f"{start_date} to {end_date}",
            "total_clicks": overall.total_clicks,
            "total_impressions": overall.total_impressions,
            "average_ctr": overall.average_ctr,
            "average_position": overall.average_position,
            "top_queries": top_queries,
            "top_pages": top_pages,
            "by_device": device_breakdown,
        }

def main():
    """Test the Search Console client."""
    import argparse

    parser = argparse.ArgumentParser(description="Google Search Console Client")
    parser.add_argument("--site", "-s", required=True, help="Site URL")
    parser.add_argument("--action", "-a", default="summary",
                        choices=["summary", "queries", "pages", "sitemaps", "inspect"],
                        help="Action to perform")
    parser.add_argument("--url", help="URL to inspect")
    parser.add_argument("--days", type=int, default=30, help="Days of data")

    args = parser.parse_args()

    client = SearchConsoleClient()

    if args.action == "summary":
        summary = client.get_performance_summary(args.site, args.days)
        import json
        print(json.dumps(summary, indent=2, default=str))

    elif args.action == "queries":
        queries = client.get_top_queries(args.site)
        for q in queries[:20]:
            print(f"{q['query']}: {q['clicks']} clicks, pos {q['position']:.1f}")

    elif args.action == "pages":
        pages = client.get_top_pages(args.site)
        for p in pages[:20]:
            print(f"{p['page']}: {p['clicks']} clicks, pos {p['position']:.1f}")

    elif args.action == "sitemaps":
        sitemaps = client.get_sitemaps(args.site)
        for sm in sitemaps:
            print(f"{sm.path}: errors={sm.errors}, warnings={sm.warnings}")

    elif args.action == "inspect" and args.url:
        result = client.inspect_url(args.site, args.url)
        import json
        print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
@@ -0,0 +1,951 @@
"""
Notion Reporter - Create SEO audit findings in Notion
=====================================================
Purpose: Output SEO audit findings to Notion databases
Python: 3.10+
Usage:
    from notion_reporter import NotionReporter, SEOFinding, AuditReport
    reporter = NotionReporter()

    # Create audit report with checklist table
    report = AuditReport(site="https://example.com")
    report.add_finding(SEOFinding(...))
    reporter.create_audit_report(report)
"""

import json
import logging
import os
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any

from notion_client import Client

from base_client import config

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

# Template directory
TEMPLATE_DIR = Path(__file__).parent.parent / "templates"

# Default OurDigital SEO Audit Log database
DEFAULT_DATABASE_ID = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"

# Default parent page for audit reports (OurDigital SEO Audit Log)
DEFAULT_AUDIT_REPORTS_PAGE_ID = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"

@dataclass
class SEOFinding:
    """Represents an SEO audit finding."""

    issue: str
    category: str
    priority: str
    status: str = "To Fix"
    url: str | None = None
    description: str | None = None
    impact: str | None = None
    recommendation: str | None = None
    site: str | None = None  # The audited site URL
    audit_id: str | None = None  # Groups findings from same audit session
    affected_urls: list[str] = field(default_factory=list)  # List of all affected URLs

@dataclass
class AuditReport:
    """Represents a complete SEO audit report with checklist."""

    site: str
    audit_id: str = field(default_factory=lambda: datetime.now().strftime("%Y%m%d-%H%M%S"))
    audit_date: datetime = field(default_factory=datetime.now)
    findings: list[SEOFinding] = field(default_factory=list)

    # Audit check results
    robots_txt_status: str = "Not checked"
    sitemap_status: str = "Not checked"
    schema_status: str = "Not checked"
    performance_status: str = "Not checked"

    # Summary statistics
    total_urls_checked: int = 0
    total_issues: int = 0

    def add_finding(self, finding: SEOFinding) -> None:
        """Add a finding to the report."""
        finding.site = self.site
        finding.audit_id = f"{self.site.replace('https://', '').replace('http://', '').split('/')[0]}-{self.audit_id}"
        self.findings.append(finding)
        self.total_issues = len(self.findings)

    def get_findings_by_priority(self) -> dict[str, list[SEOFinding]]:
        """Group findings by priority."""
        result = {"Critical": [], "High": [], "Medium": [], "Low": []}
        for f in self.findings:
            if f.priority in result:
                result[f.priority].append(f)
        return result

    def get_findings_by_category(self) -> dict[str, list[SEOFinding]]:
        """Group findings by category."""
        result = {}
        for f in self.findings:
            if f.category not in result:
                result[f.category] = []
            result[f.category].append(f)
        return result

class NotionReporter:
    """Create and manage SEO audit findings in Notion."""

    CATEGORIES = [
        "Technical SEO",
        "On-page SEO",
        "Content",
        "Local SEO",
        "Performance",
        "Schema/Structured Data",
        "Sitemap",
        "Robots.txt",
    ]

    PRIORITIES = ["Critical", "High", "Medium", "Low"]

    STATUSES = ["To Fix", "In Progress", "Fixed", "Monitoring"]

    CATEGORY_COLORS = {
        "Technical SEO": "blue",
        "On-page SEO": "green",
        "Content": "purple",
        "Local SEO": "orange",
        "Performance": "red",
        "Schema/Structured Data": "yellow",
        "Sitemap": "pink",
        "Robots.txt": "gray",
    }

    PRIORITY_COLORS = {
        "Critical": "red",
        "High": "orange",
        "Medium": "yellow",
        "Low": "gray",
    }

    def __init__(self, token: str | None = None):
        """
        Initialize Notion reporter.

        Args:
            token: Notion API token
        """
        self.token = token or config.notion_token
        if not self.token:
            raise ValueError(
                "Notion token not configured. "
                "Set NOTION_TOKEN or NOTION_API_KEY environment variable."
            )
        self.client = Client(auth=self.token)

    def create_findings_database(
        self,
        parent_page_id: str,
        title: str = "SEO Audit Findings",
    ) -> str:
        """
        Create a new SEO findings database.

        Args:
            parent_page_id: Parent page ID for the database
            title: Database title

        Returns:
            Database ID
        """
        # Build database schema
        properties = {
            "Issue": {"title": {}},
            "Category": {
                "select": {
                    "options": [
                        {"name": cat, "color": self.CATEGORY_COLORS.get(cat, "default")}
                        for cat in self.CATEGORIES
                    ]
                }
            },
            "Priority": {
                "select": {
                    "options": [
                        {"name": pri, "color": self.PRIORITY_COLORS.get(pri, "default")}
                        for pri in self.PRIORITIES
                    ]
                }
            },
            "Status": {
                "status": {
                    "options": [
                        {"name": "To Fix", "color": "red"},
                        {"name": "In Progress", "color": "yellow"},
                        {"name": "Fixed", "color": "green"},
                        {"name": "Monitoring", "color": "blue"},
                    ],
                    "groups": [
                        {"name": "To-do", "option_ids": [], "color": "gray"},
                        {"name": "In progress", "option_ids": [], "color": "blue"},
                        {"name": "Complete", "option_ids": [], "color": "green"},
                    ],
                }
            },
            "URL": {"url": {}},
            "Description": {"rich_text": {}},
            "Impact": {"rich_text": {}},
            "Recommendation": {"rich_text": {}},
            "Found Date": {"date": {}},
        }

        try:
            response = self.client.databases.create(
                parent={"page_id": parent_page_id},
                title=[{"type": "text", "text": {"content": title}}],
                properties=properties,
            )
            database_id = response["id"]
            logger.info(f"Created database: {database_id}")
            return database_id
        except Exception as e:
            logger.error(f"Failed to create database: {e}")
            raise

    def add_finding(
        self,
        finding: SEOFinding,
        database_id: str | None = None,
    ) -> str:
        """
        Add a finding to the database with page content.

        Args:
            finding: SEOFinding object
            database_id: Target database ID (defaults to OurDigital SEO Audit Log)

        Returns:
            Page ID of created entry
        """
        db_id = database_id or DEFAULT_DATABASE_ID

        # Database properties (metadata)
        properties = {
            "Issue": {"title": [{"text": {"content": finding.issue}}]},
            "Category": {"select": {"name": finding.category}},
            "Priority": {"select": {"name": finding.priority}},
            "Found Date": {"date": {"start": datetime.now().strftime("%Y-%m-%d")}},
        }

        if finding.url:
            properties["URL"] = {"url": finding.url}

        if finding.site:
            properties["Site"] = {"url": finding.site}

        if finding.audit_id:
            properties["Audit ID"] = {
                "rich_text": [{"text": {"content": finding.audit_id}}]
            }

        # Page content blocks (Description, Impact, Recommendation)
        children = []

        if finding.description:
            children.extend([
                {
                    "object": "block",
                    "type": "heading_2",
                    "heading_2": {
                        "rich_text": [{"type": "text", "text": {"content": "Description"}}]
                    }
                },
                {
                    "object": "block",
                    "type": "paragraph",
                    "paragraph": {
                        "rich_text": [{"type": "text", "text": {"content": finding.description}}]
                    }
                }
            ])

        if finding.impact:
            children.extend([
                {
                    "object": "block",
                    "type": "heading_2",
                    "heading_2": {
                        "rich_text": [{"type": "text", "text": {"content": "Impact"}}]
                    }
                },
                {
                    "object": "block",
                    "type": "callout",
                    "callout": {
                        "rich_text": [{"type": "text", "text": {"content": finding.impact}}],
                        "icon": {"type": "emoji", "emoji": "⚠️"}
                    }
                }
            ])

        if finding.recommendation:
            children.extend([
                {
                    "object": "block",
                    "type": "heading_2",
                    "heading_2": {
                        "rich_text": [{"type": "text", "text": {"content": "Recommendation"}}]
                    }
                },
                {
                    "object": "block",
                    "type": "callout",
                    "callout": {
                        "rich_text": [{"type": "text", "text": {"content": finding.recommendation}}],
                        "icon": {"type": "emoji", "emoji": "💡"}
                    }
                }
            ])

        try:
            response = self.client.pages.create(
                parent={"database_id": db_id},
                properties=properties,
                children=children if children else None,
            )
            return response["id"]
        except Exception as e:
            logger.error(f"Failed to add finding: {e}")
            raise

    def add_findings_batch(
        self,
        findings: list[SEOFinding],
        database_id: str | None = None,
    ) -> list[str]:
        """
        Add multiple findings to the database.

        Args:
            findings: List of SEOFinding objects
            database_id: Target database ID (defaults to OurDigital SEO Audit Log)

        Returns:
            List of created page IDs
        """
        page_ids = []
        for finding in findings:
            try:
                page_id = self.add_finding(finding, database_id)
                page_ids.append(page_id)
            except Exception as e:
                logger.error(f"Failed to add finding '{finding.issue}': {e}")
        return page_ids

    def create_audit_summary_page(
        self,
        parent_page_id: str,
        url: str,
        summary: dict,
    ) -> str:
        """
        Create a summary page for the audit.

        Args:
            parent_page_id: Parent page ID
            url: Audited URL
            summary: Audit summary data

        Returns:
            Page ID
        """
        # Build page content
        children = [
            {
                "object": "block",
                "type": "heading_1",
                "heading_1": {
                    "rich_text": [{"type": "text", "text": {"content": f"SEO Audit: {url}"}}]
                },
            },
            {
                "object": "block",
                "type": "paragraph",
                "paragraph": {
                    "rich_text": [
                        {
                            "type": "text",
                            "text": {"content": f"Audit Date: {datetime.now().strftime('%Y-%m-%d %H:%M')}"},
                        }
                    ]
                },
            },
            {
                "object": "block",
                "type": "divider",
                "divider": {},
            },
            {
                "object": "block",
                "type": "heading_2",
                "heading_2": {
                    "rich_text": [{"type": "text", "text": {"content": "Summary"}}]
                },
            },
        ]

        # Add summary statistics
        if "stats" in summary:
            stats = summary["stats"]
            stats_text = "\n".join([f"• {k}: {v}" for k, v in stats.items()])
            children.append({
                "object": "block",
                "type": "paragraph",
                "paragraph": {
                    "rich_text": [{"type": "text", "text": {"content": stats_text}}]
                },
            })

        # Add findings by priority
        if "findings_by_priority" in summary:
            children.append({
                "object": "block",
                "type": "heading_2",
                "heading_2": {
                    "rich_text": [{"type": "text", "text": {"content": "Findings by Priority"}}]
                },
            })

            for priority, count in summary["findings_by_priority"].items():
                children.append({
                    "object": "block",
                    "type": "bulleted_list_item",
                    "bulleted_list_item": {
                        "rich_text": [{"type": "text", "text": {"content": f"{priority}: {count}"}}]
                    },
                })

        try:
            response = self.client.pages.create(
                parent={"page_id": parent_page_id},
                properties={
                    "title": {"title": [{"text": {"content": f"SEO Audit - {url}"}}]}
                },
                children=children,
            )
            return response["id"]
        except Exception as e:
            logger.error(f"Failed to create summary page: {e}")
            raise

    def query_findings(
        self,
        database_id: str,
        category: str | None = None,
        priority: str | None = None,
        status: str | None = None,
    ) -> list[dict]:
        """
        Query findings from database.

        Args:
            database_id: Database ID
            category: Filter by category
            priority: Filter by priority
            status: Filter by status

        Returns:
            List of finding records
        """
        filters = []

        if category:
            filters.append({
                "property": "Category",
                "select": {"equals": category},
            })

        if priority:
            filters.append({
                "property": "Priority",
                "select": {"equals": priority},
            })

        if status:
            filters.append({
                "property": "Status",
                "status": {"equals": status},
            })

        query_params = {"database_id": database_id}
        if filters:
            if len(filters) == 1:
                query_params["filter"] = filters[0]
            else:
                query_params["filter"] = {"and": filters}

        try:
            response = self.client.databases.query(**query_params)
            return response.get("results", [])
        except Exception as e:
            logger.error(f"Failed to query findings: {e}")
            raise

    def update_finding_status(
        self,
        page_id: str,
        status: str,
    ) -> None:
        """Update the status of a finding."""
        if status not in self.STATUSES:
            raise ValueError(f"Invalid status: {status}")

        try:
            self.client.pages.update(
                page_id=page_id,
                properties={"Status": {"status": {"name": status}}},
            )
            logger.info(f"Updated finding {page_id} to {status}")
        except Exception as e:
            logger.error(f"Failed to update status: {e}")
            raise

    def create_audit_report(
        self,
        report: "AuditReport",
        database_id: str | None = None,
    ) -> dict:
        """
        Create a comprehensive audit report page with checklist table.

        This creates:
        1. Individual finding pages in the database
        2. A summary page with all findings in table format for checklist tracking

        Args:
            report: AuditReport object with all findings
            database_id: Target database ID (defaults to OurDigital SEO Audit Log)

        Returns:
            Dict with summary_page_id and finding_page_ids
        """
        db_id = database_id or DEFAULT_DATABASE_ID

        # Generate full audit ID
        site_domain = report.site.replace('https://', '').replace('http://', '').split('/')[0]
        full_audit_id = f"{site_domain}-{report.audit_id}"

        result = {
            "audit_id": full_audit_id,
            "site": report.site,
            "summary_page_id": None,
            "finding_page_ids": [],
        }

        # 1. Create individual finding pages in database
        logger.info(f"Creating {len(report.findings)} finding pages...")
        for finding in report.findings:
            finding.audit_id = full_audit_id
            finding.site = report.site
            try:
                page_id = self.add_finding(finding, db_id)
                result["finding_page_ids"].append(page_id)
            except Exception as e:
                logger.error(f"Failed to add finding '{finding.issue}': {e}")

        # 2. Create summary page with checklist table
        logger.info("Creating audit summary page with checklist...")
        summary_page_id = self._create_audit_summary_with_table(report, full_audit_id, db_id)
        result["summary_page_id"] = summary_page_id

        logger.info(f"Audit report created: {full_audit_id}")
        return result

    def _create_audit_summary_with_table(
        self,
        report: "AuditReport",
        audit_id: str,
        database_id: str,
    ) -> str:
        """
        Create audit summary page with checklist table format.

        Args:
            report: AuditReport object
            audit_id: Full audit ID
            database_id: Parent database ID

        Returns:
            Summary page ID
        """
        site_domain = report.site.replace('https://', '').replace('http://', '').split('/')[0]

        # Build page content blocks
        children = []

        # Header with audit info
        children.append({
            "object": "block",
            "type": "callout",
            "callout": {
                "rich_text": [
                    {"type": "text", "text": {"content": f"Audit ID: {audit_id}\n"}},
                    {"type": "text", "text": {"content": f"Date: {report.audit_date.strftime('%Y-%m-%d %H:%M')}\n"}},
                    {"type": "text", "text": {"content": f"Total Issues: {report.total_issues}"}},
                ],
                "icon": {"type": "emoji", "emoji": "📋"},
                "color": "blue_background",
            }
        })

        # Audit Status Summary
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {
                "rich_text": [{"type": "text", "text": {"content": "Audit Status"}}]
            }
        })

        # Status table
        status_table = {
            "object": "block",
            "type": "table",
            "table": {
                "table_width": 2,
                "has_column_header": True,
                "has_row_header": False,
                "children": [
                    {
                        "type": "table_row",
                        "table_row": {
                            "cells": [
                                [{"type": "text", "text": {"content": "Check"}}],
                                [{"type": "text", "text": {"content": "Status"}}],
                            ]
                        }
                    },
                    {
                        "type": "table_row",
                        "table_row": {
                            "cells": [
                                [{"type": "text", "text": {"content": "Robots.txt"}}],
                                [{"type": "text", "text": {"content": report.robots_txt_status}}],
                            ]
                        }
                    },
                    {
                        "type": "table_row",
                        "table_row": {
                            "cells": [
                                [{"type": "text", "text": {"content": "Sitemap"}}],
                                [{"type": "text", "text": {"content": report.sitemap_status}}],
                            ]
                        }
                    },
                    {
                        "type": "table_row",
                        "table_row": {
                            "cells": [
                                [{"type": "text", "text": {"content": "Schema Markup"}}],
                                [{"type": "text", "text": {"content": report.schema_status}}],
                            ]
                        }
                    },
                    {
                        "type": "table_row",
                        "table_row": {
                            "cells": [
                                [{"type": "text", "text": {"content": "Performance"}}],
                                [{"type": "text", "text": {"content": report.performance_status}}],
                            ]
                        }
                    },
                ]
            }
        }
        children.append(status_table)

        # Divider
        children.append({"object": "block", "type": "divider", "divider": {}})

        # Findings Checklist Header
|
||||||
|
children.append({
|
||||||
|
"object": "block",
|
||||||
|
"type": "heading_2",
|
||||||
|
"heading_2": {
|
||||||
|
"rich_text": [{"type": "text", "text": {"content": "Findings Checklist"}}]
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
children.append({
|
||||||
|
"object": "block",
|
||||||
|
"type": "paragraph",
|
||||||
|
"paragraph": {
|
||||||
|
"rich_text": [{"type": "text", "text": {"content": "Use this checklist to track fixes. Check off items as you complete them."}}]
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
# Create findings table with checklist format
|
||||||
|
if report.findings:
|
||||||
|
# Build table rows - Header row
|
||||||
|
table_rows = [
|
||||||
|
{
|
||||||
|
"type": "table_row",
|
||||||
|
"table_row": {
|
||||||
|
"cells": [
|
||||||
|
[{"type": "text", "text": {"content": "#"}, "annotations": {"bold": True}}],
|
||||||
|
[{"type": "text", "text": {"content": "Priority"}, "annotations": {"bold": True}}],
|
||||||
|
[{"type": "text", "text": {"content": "Category"}, "annotations": {"bold": True}}],
|
||||||
|
[{"type": "text", "text": {"content": "Issue"}, "annotations": {"bold": True}}],
|
||||||
|
[{"type": "text", "text": {"content": "URL"}, "annotations": {"bold": True}}],
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
|
||||||
|
# Add finding rows
|
||||||
|
for idx, finding in enumerate(report.findings, 1):
|
||||||
|
# Truncate long text for table cells
|
||||||
|
issue_text = finding.issue[:50] + "..." if len(finding.issue) > 50 else finding.issue
|
||||||
|
url_text = finding.url[:40] + "..." if finding.url and len(finding.url) > 40 else (finding.url or "-")
|
||||||
|
|
||||||
|
table_rows.append({
|
||||||
|
"type": "table_row",
|
||||||
|
"table_row": {
|
||||||
|
"cells": [
|
||||||
|
[{"type": "text", "text": {"content": str(idx)}}],
|
||||||
|
[{"type": "text", "text": {"content": finding.priority}}],
|
||||||
|
[{"type": "text", "text": {"content": finding.category}}],
|
||||||
|
[{"type": "text", "text": {"content": issue_text}}],
|
||||||
|
[{"type": "text", "text": {"content": url_text}}],
|
||||||
|
]
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
findings_table = {
|
||||||
|
"object": "block",
|
||||||
|
"type": "table",
|
||||||
|
"table": {
|
||||||
|
"table_width": 5,
|
||||||
|
"has_column_header": True,
|
||||||
|
"has_row_header": False,
|
||||||
|
"children": table_rows
|
||||||
|
}
|
||||||
|
}
|
||||||
|
children.append(findings_table)
|
||||||
|
|
||||||
|
# Divider
|
||||||
|
children.append({"object": "block", "type": "divider", "divider": {}})
|
||||||
|
|
||||||
|
# Detailed Findings with To-Do checkboxes
|
||||||
|
children.append({
|
||||||
|
"object": "block",
|
||||||
|
"type": "heading_2",
|
||||||
|
"heading_2": {
|
||||||
|
"rich_text": [{"type": "text", "text": {"content": "Detailed Findings & Actions"}}]
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
# Group findings by priority and add as to-do items
|
||||||
|
for priority in ["Critical", "High", "Medium", "Low"]:
|
||||||
|
priority_findings = [f for f in report.findings if f.priority == priority]
|
||||||
|
if not priority_findings:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Priority header with emoji
|
||||||
|
priority_emoji = {"Critical": "🔴", "High": "🟠", "Medium": "🟡", "Low": "⚪"}
|
||||||
|
children.append({
|
||||||
|
"object": "block",
|
||||||
|
"type": "heading_3",
|
||||||
|
"heading_3": {
|
||||||
|
"rich_text": [{"type": "text", "text": {"content": f"{priority_emoji.get(priority, '')} {priority} Priority ({len(priority_findings)})"}}]
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
# Add each finding as a to-do item with details
|
||||||
|
for finding in priority_findings:
|
||||||
|
# Main to-do item
|
||||||
|
children.append({
|
||||||
|
"object": "block",
|
||||||
|
"type": "to_do",
|
||||||
|
"to_do": {
|
||||||
|
"rich_text": [
|
||||||
|
{"type": "text", "text": {"content": f"[{finding.category}] "}, "annotations": {"bold": True}},
|
||||||
|
{"type": "text", "text": {"content": finding.issue}},
|
||||||
|
],
|
||||||
|
"checked": False,
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
# URL if available
|
||||||
|
if finding.url:
|
||||||
|
children.append({
|
||||||
|
"object": "block",
|
||||||
|
"type": "bulleted_list_item",
|
||||||
|
"bulleted_list_item": {
|
||||||
|
"rich_text": [
|
||||||
|
{"type": "text", "text": {"content": "URL: "}},
|
||||||
|
{"type": "text", "text": {"content": finding.url, "link": {"url": finding.url}}},
|
||||||
|
]
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
# Affected URLs list if available
|
||||||
|
if finding.affected_urls:
|
||||||
|
children.append({
|
||||||
|
"object": "block",
|
||||||
|
"type": "toggle",
|
||||||
|
"toggle": {
|
||||||
|
"rich_text": [{"type": "text", "text": {"content": f"Affected URLs ({len(finding.affected_urls)})"}}],
|
||||||
|
"children": [
|
||||||
|
{
|
||||||
|
"object": "block",
|
||||||
|
"type": "bulleted_list_item",
|
||||||
|
"bulleted_list_item": {
|
||||||
|
"rich_text": [{"type": "text", "text": {"content": url, "link": {"url": url} if url.startswith("http") else None}}]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for url in finding.affected_urls[:20] # Limit to 20 URLs
|
||||||
|
] + ([{
|
||||||
|
"object": "block",
|
||||||
|
"type": "paragraph",
|
||||||
|
"paragraph": {
|
||||||
|
"rich_text": [{"type": "text", "text": {"content": f"... and {len(finding.affected_urls) - 20} more URLs"}}]
|
||||||
|
}
|
||||||
|
}] if len(finding.affected_urls) > 20 else [])
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
# Recommendation as sub-item
|
||||||
|
if finding.recommendation:
|
||||||
|
children.append({
|
||||||
|
"object": "block",
|
||||||
|
"type": "bulleted_list_item",
|
||||||
|
"bulleted_list_item": {
|
||||||
|
"rich_text": [
|
||||||
|
{"type": "text", "text": {"content": "💡 "}, "annotations": {"bold": True}},
|
||||||
|
{"type": "text", "text": {"content": finding.recommendation}},
|
||||||
|
]
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
# Create the summary page
|
||||||
|
try:
|
||||||
|
response = self.client.pages.create(
|
||||||
|
parent={"database_id": database_id},
|
||||||
|
properties={
|
||||||
|
"Issue": {"title": [{"text": {"content": f"📊 Audit Report: {site_domain}"}}]},
|
||||||
|
"Category": {"select": {"name": "Technical SEO"}},
|
||||||
|
"Priority": {"select": {"name": "High"}},
|
||||||
|
"Site": {"url": report.site},
|
||||||
|
"Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
|
||||||
|
"Found Date": {"date": {"start": report.audit_date.strftime("%Y-%m-%d")}},
|
||||||
|
},
|
||||||
|
children=children,
|
||||||
|
)
|
||||||
|
logger.info(f"Created audit summary page: {response['id']}")
|
||||||
|
return response["id"]
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to create audit summary page: {e}")
|
||||||
|
raise
|
||||||
|
|
||||||
|
def create_quick_audit_report(
|
||||||
|
self,
|
||||||
|
site: str,
|
||||||
|
findings: list[SEOFinding],
|
||||||
|
robots_status: str = "Not checked",
|
||||||
|
sitemap_status: str = "Not checked",
|
||||||
|
schema_status: str = "Not checked",
|
||||||
|
performance_status: str = "Not checked",
|
||||||
|
database_id: str | None = None,
|
||||||
|
) -> dict:
|
||||||
|
"""
|
||||||
|
Quick method to create audit report from a list of findings.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
site: Site URL
|
||||||
|
findings: List of SEOFinding objects
|
||||||
|
robots_status: Robots.txt check result
|
||||||
|
sitemap_status: Sitemap check result
|
||||||
|
schema_status: Schema check result
|
||||||
|
performance_status: Performance check result
|
||||||
|
database_id: Target database ID
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with audit results
|
||||||
|
"""
|
||||||
|
report = AuditReport(site=site)
|
||||||
|
report.robots_txt_status = robots_status
|
||||||
|
report.sitemap_status = sitemap_status
|
||||||
|
report.schema_status = schema_status
|
||||||
|
report.performance_status = performance_status
|
||||||
|
|
||||||
|
for finding in findings:
|
||||||
|
report.add_finding(finding)
|
||||||
|
|
||||||
|
return self.create_audit_report(report, database_id)
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""CLI entry point for testing."""
|
||||||
|
import argparse
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser(description="Notion SEO Reporter")
|
||||||
|
parser.add_argument("--action", "-a", required=True,
|
||||||
|
choices=["create-db", "add-finding", "query"],
|
||||||
|
help="Action to perform")
|
||||||
|
parser.add_argument("--parent-id", "-p", help="Parent page ID")
|
||||||
|
parser.add_argument("--database-id", "-d", help="Database ID")
|
||||||
|
parser.add_argument("--title", "-t", default="SEO Audit Findings",
|
||||||
|
help="Database title")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
reporter = NotionReporter()
|
||||||
|
|
||||||
|
if args.action == "create-db":
|
||||||
|
if not args.parent_id:
|
||||||
|
parser.error("--parent-id required for create-db")
|
||||||
|
db_id = reporter.create_findings_database(args.parent_id, args.title)
|
||||||
|
print(f"Created database: {db_id}")
|
||||||
|
|
||||||
|
elif args.action == "add-finding":
|
||||||
|
if not args.database_id:
|
||||||
|
parser.error("--database-id required for add-finding")
|
||||||
|
# Example finding
|
||||||
|
finding = SEOFinding(
|
||||||
|
issue="Missing meta description",
|
||||||
|
category="On-page SEO",
|
||||||
|
priority="Medium",
|
||||||
|
url="https://example.com/page",
|
||||||
|
description="Page is missing meta description tag",
|
||||||
|
impact="May affect CTR in search results",
|
||||||
|
recommendation="Add unique meta description under 160 characters",
|
||||||
|
)
|
||||||
|
page_id = reporter.add_finding(args.database_id, finding)
|
||||||
|
print(f"Created finding: {page_id}")
|
||||||
|
|
||||||
|
elif args.action == "query":
|
||||||
|
if not args.database_id:
|
||||||
|
parser.error("--database-id required for query")
|
||||||
|
findings = reporter.query_findings(args.database_id)
|
||||||
|
print(f"Found {len(findings)} findings")
|
||||||
|
for f in findings[:5]:
|
||||||
|
title = f["properties"]["Issue"]["title"]
|
||||||
|
if title:
|
||||||
|
print(f" - {title[0]['plain_text']}")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
@@ -0,0 +1,569 @@
"""
Page Analyzer - Extract SEO metadata from web pages
===================================================
Purpose: Comprehensive page-level SEO data extraction
Python: 3.10+
Usage:
    from page_analyzer import PageAnalyzer, PageMetadata
    analyzer = PageAnalyzer()
    metadata = analyzer.analyze_url("https://example.com/page")
"""

import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


@dataclass
class LinkData:
    """Represents a link found on a page."""
    url: str
    anchor_text: str
    is_internal: bool
    is_nofollow: bool = False
    link_type: str = "body"  # body, nav, footer, etc.


@dataclass
class HeadingData:
    """Represents a heading found on a page."""
    level: int  # 1-6
    text: str


@dataclass
class SchemaData:
    """Represents schema.org structured data."""
    schema_type: str
    properties: dict
    format: str = "json-ld"  # json-ld, microdata, rdfa


@dataclass
class OpenGraphData:
    """Represents Open Graph metadata."""
    og_title: str | None = None
    og_description: str | None = None
    og_image: str | None = None
    og_url: str | None = None
    og_type: str | None = None
    og_site_name: str | None = None
    og_locale: str | None = None
    twitter_card: str | None = None
    twitter_title: str | None = None
    twitter_description: str | None = None
    twitter_image: str | None = None


@dataclass
class PageMetadata:
    """Complete SEO metadata for a page."""

    # Basic info
    url: str
    status_code: int = 0
    content_type: str = ""
    response_time_ms: float = 0
    analyzed_at: datetime = field(default_factory=datetime.now)

    # Meta tags
    title: str | None = None
    title_length: int = 0
    meta_description: str | None = None
    meta_description_length: int = 0
    canonical_url: str | None = None
    robots_meta: str | None = None

    # Language
    html_lang: str | None = None
    hreflang_tags: list[dict] = field(default_factory=list)  # [{"lang": "en", "url": "..."}]

    # Headings
    headings: list[HeadingData] = field(default_factory=list)
    h1_count: int = 0
    h1_text: str | None = None

    # Open Graph & Social
    open_graph: OpenGraphData = field(default_factory=OpenGraphData)

    # Schema/Structured Data
    schema_data: list[SchemaData] = field(default_factory=list)
    schema_types_found: list[str] = field(default_factory=list)

    # Links
    internal_links: list[LinkData] = field(default_factory=list)
    external_links: list[LinkData] = field(default_factory=list)
    internal_link_count: int = 0
    external_link_count: int = 0

    # Images
    images_total: int = 0
    images_without_alt: int = 0
    images_with_alt: int = 0

    # Content metrics
    word_count: int = 0

    # Issues found
    issues: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON serialization."""
        return {
            "url": self.url,
            "status_code": self.status_code,
            "content_type": self.content_type,
            "response_time_ms": self.response_time_ms,
            "analyzed_at": self.analyzed_at.isoformat(),
            "title": self.title,
            "title_length": self.title_length,
            "meta_description": self.meta_description,
            "meta_description_length": self.meta_description_length,
            "canonical_url": self.canonical_url,
            "robots_meta": self.robots_meta,
            "html_lang": self.html_lang,
            "hreflang_tags": self.hreflang_tags,
            "h1_count": self.h1_count,
            "h1_text": self.h1_text,
            "headings_count": len(self.headings),
            "schema_types_found": self.schema_types_found,
            "internal_link_count": self.internal_link_count,
            "external_link_count": self.external_link_count,
            "images_total": self.images_total,
            "images_without_alt": self.images_without_alt,
            "word_count": self.word_count,
            "issues": self.issues,
            "warnings": self.warnings,
            "open_graph": {
                "og_title": self.open_graph.og_title,
                "og_description": self.open_graph.og_description,
                "og_image": self.open_graph.og_image,
                "og_url": self.open_graph.og_url,
                "og_type": self.open_graph.og_type,
            },
        }

    def get_summary(self) -> str:
        """Get a brief summary of the page analysis."""
        lines = [
            f"URL: {self.url}",
            f"Status: {self.status_code}",
            f"Title: {self.title[:50] + '...' if self.title and len(self.title) > 50 else self.title}",
            f"Description: {'✓' if self.meta_description else '✗ Missing'}",
            f"Canonical: {'✓' if self.canonical_url else '✗ Missing'}",
            f"H1: {self.h1_count} found",
            f"Schema: {', '.join(self.schema_types_found) if self.schema_types_found else 'None'}",
            f"Links: {self.internal_link_count} internal, {self.external_link_count} external",
            f"Images: {self.images_total} total, {self.images_without_alt} without alt",
        ]
        if self.issues:
            lines.append(f"Issues: {len(self.issues)}")
        return "\n".join(lines)


class PageAnalyzer:
    """Analyze web pages for SEO metadata."""

    DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; OurDigitalSEOBot/1.0; +https://ourdigital.org)"

    def __init__(
        self,
        user_agent: str | None = None,
        timeout: int = 30,
    ):
        """
        Initialize page analyzer.

        Args:
            user_agent: Custom user agent string
            timeout: Request timeout in seconds
        """
        self.user_agent = user_agent or self.DEFAULT_USER_AGENT
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": self.user_agent,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9,ko;q=0.8",
        })

    def analyze_url(self, url: str) -> PageMetadata:
        """
        Analyze a URL and extract SEO metadata.

        Args:
            url: URL to analyze

        Returns:
            PageMetadata object with all extracted data
        """
        metadata = PageMetadata(url=url)

        try:
            # Fetch page
            start_time = datetime.now()
            response = self.session.get(url, timeout=self.timeout, allow_redirects=True)
            metadata.response_time_ms = (datetime.now() - start_time).total_seconds() * 1000
            metadata.status_code = response.status_code
            metadata.content_type = response.headers.get("Content-Type", "")

            if response.status_code != 200:
                metadata.issues.append(f"HTTP {response.status_code} status")
                if response.status_code >= 400:
                    return metadata

            # Parse HTML
            soup = BeautifulSoup(response.text, "html.parser")
            base_url = url

            # Extract all metadata
            self._extract_basic_meta(soup, metadata)
            self._extract_canonical(soup, metadata, base_url)
            self._extract_robots_meta(soup, metadata)
            self._extract_hreflang(soup, metadata)
            self._extract_headings(soup, metadata)
            self._extract_open_graph(soup, metadata)
            self._extract_schema(soup, metadata)
            self._extract_links(soup, metadata, base_url)
            self._extract_images(soup, metadata)
            self._extract_content_metrics(soup, metadata)

            # Run SEO checks
            self._run_seo_checks(metadata)

        except requests.RequestException as e:
            metadata.issues.append(f"Request failed: {str(e)}")
            logger.error(f"Failed to analyze {url}: {e}")
        except Exception as e:
            metadata.issues.append(f"Analysis error: {str(e)}")
            logger.error(f"Error analyzing {url}: {e}")

        return metadata

    def _extract_basic_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract title and meta description."""
        # Title
        title_tag = soup.find("title")
        if title_tag and title_tag.string:
            metadata.title = title_tag.string.strip()
            metadata.title_length = len(metadata.title)

        # Meta description
        desc_tag = soup.find("meta", attrs={"name": re.compile(r"^description$", re.I)})
        if desc_tag and desc_tag.get("content"):
            metadata.meta_description = desc_tag["content"].strip()
            metadata.meta_description_length = len(metadata.meta_description)

        # HTML lang
        html_tag = soup.find("html")
        if html_tag and html_tag.get("lang"):
            metadata.html_lang = html_tag["lang"]

    def _extract_canonical(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
        """Extract canonical URL."""
        canonical = soup.find("link", rel="canonical")
        if canonical and canonical.get("href"):
            metadata.canonical_url = urljoin(base_url, canonical["href"])

    def _extract_robots_meta(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract robots meta tag."""
        robots = soup.find("meta", attrs={"name": re.compile(r"^robots$", re.I)})
        if robots and robots.get("content"):
            metadata.robots_meta = robots["content"]

        # Also check for googlebot-specific
        googlebot = soup.find("meta", attrs={"name": re.compile(r"^googlebot$", re.I)})
        if googlebot and googlebot.get("content"):
            if metadata.robots_meta:
                metadata.robots_meta += f" | googlebot: {googlebot['content']}"
            else:
                metadata.robots_meta = f"googlebot: {googlebot['content']}"

    def _extract_hreflang(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract hreflang tags."""
        hreflang_tags = soup.find_all("link", rel="alternate", hreflang=True)
        for tag in hreflang_tags:
            if tag.get("href") and tag.get("hreflang"):
                metadata.hreflang_tags.append({
                    "lang": tag["hreflang"],
                    "url": tag["href"]
                })

    def _extract_headings(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract all headings."""
        for level in range(1, 7):
            for heading in soup.find_all(f"h{level}"):
                text = heading.get_text(strip=True)
                if text:
                    metadata.headings.append(HeadingData(level=level, text=text))

        # Count H1s specifically
        h1_tags = soup.find_all("h1")
        metadata.h1_count = len(h1_tags)
        if h1_tags:
            metadata.h1_text = h1_tags[0].get_text(strip=True)

    def _extract_open_graph(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract Open Graph and Twitter Card data."""
        og = metadata.open_graph

        # Open Graph tags
        og_mappings = {
            "og:title": "og_title",
            "og:description": "og_description",
            "og:image": "og_image",
            "og:url": "og_url",
            "og:type": "og_type",
            "og:site_name": "og_site_name",
            "og:locale": "og_locale",
        }

        for og_prop, attr_name in og_mappings.items():
            tag = soup.find("meta", property=og_prop)
            if tag and tag.get("content"):
                setattr(og, attr_name, tag["content"])

        # Twitter Card tags
        twitter_mappings = {
            "twitter:card": "twitter_card",
            "twitter:title": "twitter_title",
            "twitter:description": "twitter_description",
            "twitter:image": "twitter_image",
        }

        for tw_name, attr_name in twitter_mappings.items():
            tag = soup.find("meta", attrs={"name": tw_name})
            if tag and tag.get("content"):
                setattr(og, attr_name, tag["content"])

    def _extract_schema(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract schema.org structured data."""
        # JSON-LD
        for script in soup.find_all("script", type="application/ld+json"):
            try:
                data = json.loads(script.string)
                if isinstance(data, list):
                    for item in data:
                        self._process_schema_item(item, metadata, "json-ld")
                else:
                    self._process_schema_item(data, metadata, "json-ld")
            except (json.JSONDecodeError, TypeError):
                continue

        # Microdata (basic detection)
        for item in soup.find_all(itemscope=True):
            itemtype = item.get("itemtype", "")
            if itemtype:
                schema_type = itemtype.split("/")[-1]
                if schema_type not in metadata.schema_types_found:
                    metadata.schema_types_found.append(schema_type)
                    metadata.schema_data.append(SchemaData(
                        schema_type=schema_type,
                        properties={},
                        format="microdata"
                    ))

    def _process_schema_item(self, data: dict, metadata: PageMetadata, format_type: str) -> None:
        """Process a single schema.org item."""
        if not isinstance(data, dict):
            return

        schema_type = data.get("@type", "Unknown")
        if isinstance(schema_type, list):
            schema_type = schema_type[0] if schema_type else "Unknown"

        if schema_type not in metadata.schema_types_found:
            metadata.schema_types_found.append(schema_type)

        metadata.schema_data.append(SchemaData(
            schema_type=schema_type,
            properties=data,
            format=format_type
        ))

        # Process nested @graph items
        if "@graph" in data:
            for item in data["@graph"]:
                self._process_schema_item(item, metadata, format_type)

    def _extract_links(self, soup: BeautifulSoup, metadata: PageMetadata, base_url: str) -> None:
        """Extract internal and external links."""
        parsed_base = urlparse(base_url)
        base_domain = parsed_base.netloc.lower()

        for a_tag in soup.find_all("a", href=True):
            href = a_tag["href"]

            # Skip non-http links
            if href.startswith(("#", "javascript:", "mailto:", "tel:")):
                continue

            # Resolve relative URLs
            full_url = urljoin(base_url, href)
            parsed_url = urlparse(full_url)

            # Get anchor text
            anchor_text = a_tag.get_text(strip=True)[:100]  # Limit length

            # Check if nofollow
            rel = a_tag.get("rel", [])
            if isinstance(rel, str):
                rel = rel.split()
            is_nofollow = "nofollow" in rel

            # Determine if internal or external
            link_domain = parsed_url.netloc.lower()
            is_internal = (
                link_domain == base_domain or
                link_domain.endswith(f".{base_domain}") or
                base_domain.endswith(f".{link_domain}")
            )

            link_data = LinkData(
                url=full_url,
                anchor_text=anchor_text,
                is_internal=is_internal,
                is_nofollow=is_nofollow,
            )

            if is_internal:
                metadata.internal_links.append(link_data)
            else:
                metadata.external_links.append(link_data)

        metadata.internal_link_count = len(metadata.internal_links)
        metadata.external_link_count = len(metadata.external_links)

    def _extract_images(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract image information."""
        images = soup.find_all("img")
        metadata.images_total = len(images)

        for img in images:
            alt = img.get("alt", "").strip()
            if alt:
                metadata.images_with_alt += 1
            else:
                metadata.images_without_alt += 1

    def _extract_content_metrics(self, soup: BeautifulSoup, metadata: PageMetadata) -> None:
        """Extract content metrics like word count."""
        # Remove script and style elements
        for element in soup(["script", "style", "noscript"]):
            element.decompose()

        # Get text content
        text = soup.get_text(separator=" ", strip=True)
        words = text.split()
        metadata.word_count = len(words)

    def _run_seo_checks(self, metadata: PageMetadata) -> None:
        """Run SEO checks and add issues/warnings."""
        # Title checks
        if not metadata.title:
            metadata.issues.append("Missing title tag")
        elif metadata.title_length < 30:
            metadata.warnings.append(f"Title too short ({metadata.title_length} chars, recommend 50-60)")
        elif metadata.title_length > 60:
            metadata.warnings.append(f"Title too long ({metadata.title_length} chars, recommend 50-60)")

        # Meta description checks
        if not metadata.meta_description:
            metadata.issues.append("Missing meta description")
        elif metadata.meta_description_length < 120:
            metadata.warnings.append(f"Meta description too short ({metadata.meta_description_length} chars)")
        elif metadata.meta_description_length > 160:
            metadata.warnings.append(f"Meta description too long ({metadata.meta_description_length} chars)")

        # Canonical check
        if not metadata.canonical_url:
            metadata.warnings.append("Missing canonical tag")
        elif metadata.canonical_url != metadata.url:
            metadata.warnings.append(f"Canonical points to different URL: {metadata.canonical_url}")

        # H1 checks
        if metadata.h1_count == 0:
            metadata.issues.append("Missing H1 tag")
        elif metadata.h1_count > 1:
            metadata.warnings.append(f"Multiple H1 tags ({metadata.h1_count})")

        # Image alt check
        if metadata.images_without_alt > 0:
            metadata.warnings.append(f"{metadata.images_without_alt} images missing alt text")

        # Schema check
        if not metadata.schema_types_found:
            metadata.warnings.append("No structured data found")

        # Open Graph check
        if not metadata.open_graph.og_title:
            metadata.warnings.append("Missing Open Graph tags")

        # Robots meta check
        if metadata.robots_meta:
            robots_lower = metadata.robots_meta.lower()
            if "noindex" in robots_lower:
                metadata.issues.append("Page is set to noindex")
            if "nofollow" in robots_lower:
                metadata.warnings.append("Page is set to nofollow")


def main():
    """CLI entry point for testing."""
    import argparse

    parser = argparse.ArgumentParser(description="Page SEO Analyzer")
    parser.add_argument("url", help="URL to analyze")
    parser.add_argument("--json", "-j", action="store_true", help="Output as JSON")

    args = parser.parse_args()

    analyzer = PageAnalyzer()
    metadata = analyzer.analyze_url(args.url)

    if args.json:
        print(json.dumps(metadata.to_dict(), indent=2, ensure_ascii=False))
    else:
|
||||||
|
print("=" * 60)
|
||||||
|
print("PAGE ANALYSIS REPORT")
|
||||||
|
print("=" * 60)
|
||||||
|
print(metadata.get_summary())
|
||||||
|
print()
|
||||||
|
|
||||||
|
if metadata.issues:
|
||||||
|
print("ISSUES:")
|
||||||
|
for issue in metadata.issues:
|
||||||
|
print(f" ✗ {issue}")
|
||||||
|
|
||||||
|
if metadata.warnings:
|
||||||
|
print("\nWARNINGS:")
|
||||||
|
for warning in metadata.warnings:
|
||||||
|
print(f" ⚠ {warning}")
|
||||||
|
|
||||||
|
if metadata.hreflang_tags:
|
||||||
|
print(f"\nHREFLANG TAGS ({len(metadata.hreflang_tags)}):")
|
||||||
|
for tag in metadata.hreflang_tags[:5]:
|
||||||
|
print(f" {tag['lang']}: {tag['url']}")
|
||||||
|
|
||||||
|
if metadata.schema_types_found:
|
||||||
|
print(f"\nSCHEMA TYPES:")
|
||||||
|
for schema_type in metadata.schema_types_found:
|
||||||
|
print(f" - {schema_type}")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
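The title and meta-description checks above are simple length-band rules. A minimal standalone sketch (not part of the committed files; `check_title` is a hypothetical helper mirroring the thresholds in `_run_seo_checks`):

```python
# Standalone sketch of the title-length check used by _run_seo_checks.
# Thresholds mirror the source: < 30 chars is too short, > 60 too long.
def check_title(title: str) -> list[str]:
    """Return issue/warning strings for a <title> value."""
    findings = []
    if not title:
        findings.append("Missing title tag")
    elif len(title) < 30:
        findings.append(f"Title too short ({len(title)} chars, recommend 50-60)")
    elif len(title) > 60:
        findings.append(f"Title too long ({len(title)} chars, recommend 50-60)")
    return findings

print(check_title(""))             # missing title
print(check_title("Short title"))  # 11 chars, below the 30-char floor
print(check_title("A" * 70))       # above the 60-char ceiling
```

The same band pattern (missing / too short / too long) applies to the meta-description checks with the 120–160 character range.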
@@ -0,0 +1,452 @@
"""
PageSpeed Insights Client
=========================

Purpose: Get Core Web Vitals and performance data from the PageSpeed Insights API
Python: 3.10+

Usage:
    from pagespeed_client import PageSpeedClient
    client = PageSpeedClient()
    result = client.analyze("https://example.com")
"""

import argparse
import json
import logging
from dataclasses import dataclass, field
from typing import Any

import requests

from base_client import config

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


@dataclass
class CoreWebVitals:
    """Core Web Vitals metrics."""

    lcp: float | None = None   # Largest Contentful Paint (ms)
    fid: float | None = None   # First Input Delay (ms)
    cls: float | None = None   # Cumulative Layout Shift
    inp: float | None = None   # Interaction to Next Paint (ms)
    ttfb: float | None = None  # Time to First Byte (ms)
    fcp: float | None = None   # First Contentful Paint (ms)

    # Assessment (GOOD, NEEDS_IMPROVEMENT, POOR)
    lcp_rating: str | None = None
    fid_rating: str | None = None
    cls_rating: str | None = None
    inp_rating: str | None = None

    def to_dict(self) -> dict:
        return {
            "lcp": {"value": self.lcp, "rating": self.lcp_rating},
            "fid": {"value": self.fid, "rating": self.fid_rating},
            "cls": {"value": self.cls, "rating": self.cls_rating},
            "inp": {"value": self.inp, "rating": self.inp_rating},
            "ttfb": {"value": self.ttfb},
            "fcp": {"value": self.fcp},
        }


@dataclass
class PageSpeedResult:
    """PageSpeed analysis result."""

    url: str
    strategy: str  # mobile or desktop
    performance_score: float | None = None
    seo_score: float | None = None
    accessibility_score: float | None = None
    best_practices_score: float | None = None
    core_web_vitals: CoreWebVitals = field(default_factory=CoreWebVitals)
    opportunities: list[dict] = field(default_factory=list)
    diagnostics: list[dict] = field(default_factory=list)
    passed_audits: list[str] = field(default_factory=list)
    raw_data: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {
            "url": self.url,
            "strategy": self.strategy,
            "scores": {
                "performance": self.performance_score,
                "seo": self.seo_score,
                "accessibility": self.accessibility_score,
                "best_practices": self.best_practices_score,
            },
            "core_web_vitals": self.core_web_vitals.to_dict(),
            "opportunities_count": len(self.opportunities),
            "opportunities": self.opportunities[:10],
            "diagnostics_count": len(self.diagnostics),
            "passed_audits_count": len(self.passed_audits),
        }


class PageSpeedClient:
    """Client for the PageSpeed Insights API."""

    BASE_URL = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

    # Core Web Vitals thresholds
    THRESHOLDS = {
        "lcp": {"good": 2500, "poor": 4000},
        "fid": {"good": 100, "poor": 300},
        "cls": {"good": 0.1, "poor": 0.25},
        "inp": {"good": 200, "poor": 500},
        "ttfb": {"good": 800, "poor": 1800},
        "fcp": {"good": 1800, "poor": 3000},
    }

    def __init__(self, api_key: str | None = None):
        """
        Initialize the PageSpeed client.

        Args:
            api_key: PageSpeed API key (optional but recommended for higher quotas)
        """
        self.api_key = api_key or config.pagespeed_api_key
        self.session = requests.Session()

    def _rate_metric(self, metric: str, value: float | None) -> str | None:
        """Rate a metric against thresholds."""
        if value is None:
            return None

        thresholds = self.THRESHOLDS.get(metric)
        if not thresholds:
            return None

        if value <= thresholds["good"]:
            return "GOOD"
        elif value <= thresholds["poor"]:
            return "NEEDS_IMPROVEMENT"
        else:
            return "POOR"

    def analyze(
        self,
        url: str,
        strategy: str = "mobile",
        categories: list[str] | None = None,
    ) -> PageSpeedResult:
        """
        Analyze a URL with PageSpeed Insights.

        Args:
            url: URL to analyze
            strategy: "mobile" or "desktop"
            categories: Categories to analyze (performance, seo, accessibility, best-practices)

        Returns:
            PageSpeedResult with scores and metrics
        """
        if categories is None:
            categories = ["performance", "seo", "accessibility", "best-practices"]

        params = {
            "url": url,
            "strategy": strategy,
            "category": categories,
        }

        if self.api_key:
            params["key"] = self.api_key

        try:
            response = self.session.get(self.BASE_URL, params=params, timeout=60)
            response.raise_for_status()
            data = response.json()
        except requests.RequestException as e:
            logger.error(f"PageSpeed API request failed: {e}")
            raise

        result = PageSpeedResult(url=url, strategy=strategy, raw_data=data)

        # Extract scores (a score of 0 is valid, so compare against None)
        lighthouse = data.get("lighthouseResult", {})
        categories_data = lighthouse.get("categories", {})

        if "performance" in categories_data:
            score = categories_data["performance"].get("score")
            result.performance_score = score * 100 if score is not None else None

        if "seo" in categories_data:
            score = categories_data["seo"].get("score")
            result.seo_score = score * 100 if score is not None else None

        if "accessibility" in categories_data:
            score = categories_data["accessibility"].get("score")
            result.accessibility_score = score * 100 if score is not None else None

        if "best-practices" in categories_data:
            score = categories_data["best-practices"].get("score")
            result.best_practices_score = score * 100 if score is not None else None

        # Extract Core Web Vitals
        audits = lighthouse.get("audits", {})

        # Lab data
        cwv = result.core_web_vitals

        if "largest-contentful-paint" in audits:
            cwv.lcp = audits["largest-contentful-paint"].get("numericValue")
            cwv.lcp_rating = self._rate_metric("lcp", cwv.lcp)

        if "total-blocking-time" in audits:
            # TBT is a proxy for FID in lab data
            cwv.fid = audits["total-blocking-time"].get("numericValue")
            cwv.fid_rating = self._rate_metric("fid", cwv.fid)

        if "cumulative-layout-shift" in audits:
            cwv.cls = audits["cumulative-layout-shift"].get("numericValue")
            cwv.cls_rating = self._rate_metric("cls", cwv.cls)

        if "experimental-interaction-to-next-paint" in audits:
            cwv.inp = audits["experimental-interaction-to-next-paint"].get("numericValue")
            cwv.inp_rating = self._rate_metric("inp", cwv.inp)

        if "server-response-time" in audits:
            cwv.ttfb = audits["server-response-time"].get("numericValue")

        if "first-contentful-paint" in audits:
            cwv.fcp = audits["first-contentful-paint"].get("numericValue")

        # Field data (real user data) overrides lab data when available
        loading_exp = data.get("loadingExperience", {})
        metrics = loading_exp.get("metrics", {})

        if "LARGEST_CONTENTFUL_PAINT_MS" in metrics:
            cwv.lcp = metrics["LARGEST_CONTENTFUL_PAINT_MS"].get("percentile")
            cwv.lcp_rating = metrics["LARGEST_CONTENTFUL_PAINT_MS"].get("category")

        if "FIRST_INPUT_DELAY_MS" in metrics:
            cwv.fid = metrics["FIRST_INPUT_DELAY_MS"].get("percentile")
            cwv.fid_rating = metrics["FIRST_INPUT_DELAY_MS"].get("category")

        if "CUMULATIVE_LAYOUT_SHIFT_SCORE" in metrics:
            cwv.cls = metrics["CUMULATIVE_LAYOUT_SHIFT_SCORE"].get("percentile") / 100
            cwv.cls_rating = metrics["CUMULATIVE_LAYOUT_SHIFT_SCORE"].get("category")

        if "INTERACTION_TO_NEXT_PAINT" in metrics:
            cwv.inp = metrics["INTERACTION_TO_NEXT_PAINT"].get("percentile")
            cwv.inp_rating = metrics["INTERACTION_TO_NEXT_PAINT"].get("category")

        # Extract opportunities
        for audit_id, audit in audits.items():
            if audit.get("details", {}).get("type") == "opportunity":
                savings = audit.get("details", {}).get("overallSavingsMs", 0)
                if savings > 0:
                    result.opportunities.append({
                        "id": audit_id,
                        "title": audit.get("title", ""),
                        "description": audit.get("description", ""),
                        "savings_ms": savings,
                        "score": audit.get("score", 0),
                    })

        # Sort opportunities by savings
        result.opportunities.sort(key=lambda x: x["savings_ms"], reverse=True)

        # Extract diagnostics
        for audit_id, audit in audits.items():
            score = audit.get("score")
            if score is not None and score < 1 and audit.get("details"):
                if audit.get("details", {}).get("type") not in ["opportunity", None]:
                    result.diagnostics.append({
                        "id": audit_id,
                        "title": audit.get("title", ""),
                        "description": audit.get("description", ""),
                        "score": score,
                    })

        # Extract passed audits
        for audit_id, audit in audits.items():
            if audit.get("score") == 1:
                result.passed_audits.append(audit.get("title", audit_id))

        return result

    def analyze_both_strategies(self, url: str) -> dict:
        """Analyze a URL for both mobile and desktop."""
        mobile = self.analyze(url, strategy="mobile")
        desktop = self.analyze(url, strategy="desktop")

        return {
            "url": url,
            "mobile": mobile.to_dict(),
            "desktop": desktop.to_dict(),
            "comparison": {
                "performance_difference": (
                    (desktop.performance_score or 0) - (mobile.performance_score or 0)
                ),
                "mobile_first_issues": self._identify_mobile_issues(mobile, desktop),
            },
        }

    def _identify_mobile_issues(
        self,
        mobile: PageSpeedResult,
        desktop: PageSpeedResult,
    ) -> list[str]:
        """Identify issues that affect mobile more than desktop."""
        issues = []

        if mobile.performance_score and desktop.performance_score:
            if desktop.performance_score - mobile.performance_score > 20:
                issues.append("Significant performance gap between mobile and desktop")

        m_cwv = mobile.core_web_vitals
        d_cwv = desktop.core_web_vitals

        if m_cwv.lcp and d_cwv.lcp and m_cwv.lcp > d_cwv.lcp * 1.5:
            issues.append("LCP significantly slower on mobile")

        if m_cwv.cls and d_cwv.cls and m_cwv.cls > d_cwv.cls * 2:
            issues.append("Layout shift issues more severe on mobile")

        return issues

    def get_cwv_summary(self, url: str) -> dict:
        """Get a summary focused on Core Web Vitals."""
        result = self.analyze(url, strategy="mobile")

        cwv = result.core_web_vitals

        return {
            "url": url,
            "overall_cwv_status": self._overall_cwv_status(cwv),
            "metrics": {
                "lcp": {
                    "value": f"{cwv.lcp / 1000:.2f}s" if cwv.lcp is not None else None,
                    "rating": cwv.lcp_rating,
                    "threshold": "≤ 2.5s good, > 4.0s poor",
                },
                "fid": {
                    "value": f"{cwv.fid:.0f}ms" if cwv.fid is not None else None,
                    "rating": cwv.fid_rating,
                    "threshold": "≤ 100ms good, > 300ms poor",
                },
                "cls": {
                    "value": f"{cwv.cls:.3f}" if cwv.cls is not None else None,
                    "rating": cwv.cls_rating,
                    "threshold": "≤ 0.1 good, > 0.25 poor",
                },
                "inp": {
                    "value": f"{cwv.inp:.0f}ms" if cwv.inp is not None else None,
                    "rating": cwv.inp_rating,
                    "threshold": "≤ 200ms good, > 500ms poor",
                },
            },
            "top_opportunities": result.opportunities[:5],
        }

    def _overall_cwv_status(self, cwv: CoreWebVitals) -> str:
        """Determine overall Core Web Vitals status."""
        ratings = [cwv.lcp_rating, cwv.fid_rating, cwv.cls_rating]
        ratings = [r for r in ratings if r]

        if not ratings:
            return "UNKNOWN"

        if any(r == "POOR" for r in ratings):
            return "POOR"
        if any(r == "NEEDS_IMPROVEMENT" for r in ratings):
            return "NEEDS_IMPROVEMENT"
        return "GOOD"

    def generate_report(self, result: PageSpeedResult) -> str:
        """Generate a human-readable performance report."""
        lines = [
            "=" * 60,
            "PageSpeed Insights Report",
            "=" * 60,
            f"URL: {result.url}",
            f"Strategy: {result.strategy}",
            "",
            "Scores:",
            f"  Performance: {result.performance_score:.0f}/100" if result.performance_score is not None else "  Performance: N/A",
            f"  SEO: {result.seo_score:.0f}/100" if result.seo_score is not None else "  SEO: N/A",
            f"  Accessibility: {result.accessibility_score:.0f}/100" if result.accessibility_score is not None else "  Accessibility: N/A",
            f"  Best Practices: {result.best_practices_score:.0f}/100" if result.best_practices_score is not None else "  Best Practices: N/A",
            "",
            "Core Web Vitals:",
        ]

        cwv = result.core_web_vitals

        def format_metric(name: str, value: Any, rating: str | None, unit: str) -> str:
            if value is None:
                return f"  {name}: N/A"
            rating_str = f" ({rating})" if rating else ""
            return f"  {name}: {value}{unit}{rating_str}"

        lines.append(format_metric("LCP", f"{cwv.lcp / 1000:.2f}" if cwv.lcp is not None else None, cwv.lcp_rating, "s"))
        lines.append(format_metric("FID/TBT", f"{cwv.fid:.0f}" if cwv.fid is not None else None, cwv.fid_rating, "ms"))
        lines.append(format_metric("CLS", f"{cwv.cls:.3f}" if cwv.cls is not None else None, cwv.cls_rating, ""))
        lines.append(format_metric("INP", f"{cwv.inp:.0f}" if cwv.inp is not None else None, cwv.inp_rating, "ms"))
        lines.append(format_metric("TTFB", f"{cwv.ttfb:.0f}" if cwv.ttfb is not None else None, None, "ms"))
        lines.append(format_metric("FCP", f"{cwv.fcp / 1000:.2f}" if cwv.fcp is not None else None, None, "s"))

        if result.opportunities:
            lines.extend([
                "",
                f"Top Opportunities ({len(result.opportunities)} total):",
            ])
            for opp in result.opportunities[:5]:
                savings = opp["savings_ms"]
                lines.append(f"  - {opp['title']}: -{savings / 1000:.1f}s potential savings")

        lines.extend(["", "=" * 60])

        return "\n".join(lines)


def main():
    """CLI entry point."""
    parser = argparse.ArgumentParser(description="PageSpeed Insights Client")
    parser.add_argument("--url", "-u", required=True, help="URL to analyze")
    parser.add_argument("--strategy", "-s", default="mobile",
                        choices=["mobile", "desktop", "both"],
                        help="Analysis strategy")
    parser.add_argument("--output", "-o", help="Output file for JSON")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    parser.add_argument("--cwv-only", action="store_true",
                        help="Show only Core Web Vitals summary")

    args = parser.parse_args()

    client = PageSpeedClient()

    if args.cwv_only:
        summary = client.get_cwv_summary(args.url)
        print(json.dumps(summary, indent=2))
    elif args.strategy == "both":
        result = client.analyze_both_strategies(args.url)
        output = json.dumps(result, indent=2)
        if args.output:
            with open(args.output, "w") as f:
                f.write(output)
        else:
            print(output)
    else:
        result = client.analyze(args.url, strategy=args.strategy)

        if args.json or args.output:
            output = json.dumps(result.to_dict(), indent=2)
            if args.output:
                with open(args.output, "w") as f:
                    f.write(output)
            else:
                print(output)
        else:
            print(client.generate_report(result))


if __name__ == "__main__":
    main()
@@ -0,0 +1,40 @@
# OurDigital SEO Audit - Python Dependencies
# Install with: pip install -r requirements.txt

# Google APIs
google-api-python-client>=2.100.0
google-auth>=2.23.0
google-auth-oauthlib>=1.1.0
google-auth-httplib2>=0.1.1
google-analytics-data>=0.18.0

# Notion API
notion-client>=2.0.0

# Web Scraping & Parsing
lxml>=5.1.0
beautifulsoup4>=4.12.0
extruct>=0.16.0
requests>=2.31.0
aiohttp>=3.9.0

# Schema Validation
jsonschema>=4.21.0
rdflib>=7.0.0

# Google Trends
pytrends>=4.9.2

# Data Processing
pandas>=2.1.0

# Async & Retry
tenacity>=8.2.0
tqdm>=4.66.0

# Environment
python-dotenv>=1.0.0

# Logging & CLI
rich>=13.7.0
typer>=0.9.0
@@ -0,0 +1,540 @@
|
|||||||
|
"""
|
||||||
|
Robots.txt Checker - Analyze robots.txt configuration
|
||||||
|
=====================================================
|
||||||
|
Purpose: Parse and analyze robots.txt for SEO compliance
|
||||||
|
Python: 3.10+
|
||||||
|
Usage:
|
||||||
|
python robots_checker.py --url https://example.com/robots.txt
|
||||||
|
python robots_checker.py --url https://example.com --test-url /admin/
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Any
|
||||||
|
from urllib.parse import urljoin, urlparse
|
||||||
|
from urllib.robotparser import RobotFileParser
|
||||||
|
|
||||||
|
import requests
|
||||||
|
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class RobotsIssue:
|
||||||
|
"""Represents a robots.txt issue."""
|
||||||
|
|
||||||
|
severity: str # "error", "warning", "info"
|
||||||
|
message: str
|
||||||
|
line_number: int | None = None
|
||||||
|
directive: str | None = None
|
||||||
|
suggestion: str | None = None
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class UserAgentRules:
|
||||||
|
"""Rules for a specific user-agent."""
|
||||||
|
|
||||||
|
user_agent: str
|
||||||
|
disallow: list[str] = field(default_factory=list)
|
||||||
|
allow: list[str] = field(default_factory=list)
|
||||||
|
crawl_delay: float | None = None
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class RobotsResult:
|
||||||
|
"""Complete robots.txt analysis result."""
|
||||||
|
|
||||||
|
url: str
|
||||||
|
accessible: bool = True
|
||||||
|
content: str = ""
|
||||||
|
rules: list[UserAgentRules] = field(default_factory=list)
|
||||||
|
sitemaps: list[str] = field(default_factory=list)
|
||||||
|
issues: list[RobotsIssue] = field(default_factory=list)
|
||||||
|
stats: dict = field(default_factory=dict)
|
||||||
|
timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
"""Convert to dictionary for JSON output."""
|
||||||
|
return {
|
||||||
|
"url": self.url,
|
||||||
|
"accessible": self.accessible,
|
||||||
|
"sitemaps": self.sitemaps,
|
||||||
|
"rules": [
|
||||||
|
{
|
||||||
|
"user_agent": r.user_agent,
|
||||||
|
"disallow": r.disallow,
|
||||||
|
"allow": r.allow,
|
||||||
|
"crawl_delay": r.crawl_delay,
|
||||||
|
}
|
||||||
|
for r in self.rules
|
||||||
|
],
|
||||||
|
"issues": [
|
||||||
|
{
|
||||||
|
"severity": i.severity,
|
||||||
|
"message": i.message,
|
||||||
|
"line_number": i.line_number,
|
||||||
|
"directive": i.directive,
|
||||||
|
"suggestion": i.suggestion,
|
||||||
|
}
|
||||||
|
for i in self.issues
|
||||||
|
],
|
||||||
|
"stats": self.stats,
|
||||||
|
"timestamp": self.timestamp,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class RobotsChecker:
|
||||||
|
"""Analyze robots.txt configuration."""
|
||||||
|
|
||||||
|
# Common user agents
|
||||||
|
USER_AGENTS = {
|
||||||
|
"*": "All bots",
|
||||||
|
"Googlebot": "Google crawler",
|
||||||
|
"Googlebot-Image": "Google Image crawler",
|
||||||
|
"Googlebot-News": "Google News crawler",
|
||||||
|
"Googlebot-Video": "Google Video crawler",
|
||||||
|
"Bingbot": "Bing crawler",
|
||||||
|
"Slurp": "Yahoo crawler",
|
||||||
|
"DuckDuckBot": "DuckDuckGo crawler",
|
||||||
|
"Baiduspider": "Baidu crawler",
|
||||||
|
"Yandex": "Yandex crawler",
|
||||||
|
"facebot": "Facebook crawler",
|
||||||
|
"Twitterbot": "Twitter crawler",
|
||||||
|
"LinkedInBot": "LinkedIn crawler",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Paths that should generally not be blocked
|
||||||
|
IMPORTANT_PATHS = [
|
||||||
|
"/",
|
||||||
|
"/*.css",
|
||||||
|
"/*.js",
|
||||||
|
"/*.jpg",
|
||||||
|
"/*.jpeg",
|
||||||
|
"/*.png",
|
||||||
|
"/*.gif",
|
||||||
|
"/*.svg",
|
||||||
|
"/*.webp",
|
||||||
|
]
|
||||||
|
|
||||||
|
# Paths commonly blocked
|
||||||
|
COMMON_BLOCKED = [
|
||||||
|
"/admin",
|
||||||
|
"/wp-admin",
|
||||||
|
"/login",
|
||||||
|
"/private",
|
||||||
|
"/api",
|
||||||
|
"/cgi-bin",
|
||||||
|
"/tmp",
|
||||||
|
"/search",
|
||||||
|
]
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
self.session = requests.Session()
|
||||||
|
self.session.headers.update({
|
||||||
|
"User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
|
||||||
|
})
|
||||||
|
|
||||||
|
def fetch_robots(self, url: str) -> str | None:
|
||||||
|
"""Fetch robots.txt content."""
|
||||||
|
# Ensure we're fetching robots.txt
|
||||||
|
parsed = urlparse(url)
|
||||||
|
if not parsed.path.endswith("robots.txt"):
|
||||||
|
robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
|
||||||
|
else:
|
||||||
|
robots_url = url
|
||||||
|
|
||||||
|
try:
|
||||||
|
response = self.session.get(robots_url, timeout=10)
|
||||||
|
if response.status_code == 200:
|
||||||
|
return response.text
|
||||||
|
elif response.status_code == 404:
|
||||||
|
return None
|
||||||
|
else:
|
||||||
|
raise RuntimeError(f"HTTP {response.status_code}")
|
||||||
|
except requests.RequestException as e:
|
||||||
|
raise RuntimeError(f"Failed to fetch robots.txt: {e}")
|
||||||
|
|
||||||
|
def parse_robots(self, content: str) -> tuple[list[UserAgentRules], list[str]]:
|
||||||
|
"""Parse robots.txt content."""
|
||||||
|
rules = []
|
||||||
|
sitemaps = []
|
||||||
|
current_ua = None
|
||||||
|
current_rules = None
|
||||||
|
|
||||||
|
for line_num, line in enumerate(content.split("\n"), 1):
|
||||||
|
line = line.strip()
|
||||||
|
|
||||||
|
# Skip empty lines and comments
|
||||||
|
if not line or line.startswith("#"):
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Parse directive
|
||||||
|
if ":" not in line:
|
||||||
|
continue
|
||||||
|
|
||||||
|
directive, value = line.split(":", 1)
|
||||||
|
directive = directive.strip().lower()
|
||||||
|
value = value.strip()
|
||||||
|
|
||||||
|
if directive == "user-agent":
|
||||||
|
# Save previous user-agent rules
|
||||||
|
if current_rules:
|
||||||
|
rules.append(current_rules)
|
||||||
|
|
||||||
|
current_ua = value
|
||||||
|
current_rules = UserAgentRules(user_agent=value)
|
||||||
|
|
||||||
|
elif directive == "disallow" and current_rules:
|
||||||
|
if value: # Empty disallow means allow all
|
||||||
|
current_rules.disallow.append(value)
|
||||||
|
|
||||||
|
elif directive == "allow" and current_rules:
|
||||||
|
if value:
|
||||||
|
current_rules.allow.append(value)
|
||||||
|
|
||||||
|
elif directive == "crawl-delay" and current_rules:
|
||||||
|
try:
|
||||||
|
current_rules.crawl_delay = float(value)
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
elif directive == "sitemap":
|
||||||
|
if value:
|
||||||
|
sitemaps.append(value)
|
||||||
|
|
||||||
|
# Don't forget last user-agent
|
||||||
|
if current_rules:
|
||||||
|
rules.append(current_rules)
|
||||||
|
|
||||||
|
return rules, sitemaps
|
||||||
|
|
||||||
|
def analyze(self, url: str) -> RobotsResult:
|
||||||
|
"""Analyze robots.txt."""
|
||||||
|
result = RobotsResult(url=url)
|
||||||
|
|
||||||
|
# Fetch robots.txt
|
||||||
|
try:
|
||||||
|
content = self.fetch_robots(url)
|
||||||
|
if content is None:
|
||||||
|
result.accessible = False
|
||||||
|
result.issues.append(RobotsIssue(
|
||||||
|
severity="info",
|
||||||
|
message="No robots.txt found (returns 404)",
|
||||||
|
suggestion="Consider creating a robots.txt file",
|
||||||
|
))
|
||||||
|
return result
|
||||||
|
except RuntimeError as e:
|
||||||
|
result.accessible = False
|
||||||
|
result.issues.append(RobotsIssue(
|
||||||
|
severity="error",
|
||||||
|
message=str(e),
|
||||||
|
))
|
||||||
|
return result
|
||||||
|
|
||||||
|
result.content = content
|
||||||
|
result.rules, result.sitemaps = self.parse_robots(content)
|
||||||
|
|
||||||
|
# Analyze content
|
||||||
|
self._analyze_syntax(result)
|
||||||
|
self._analyze_rules(result)
|
||||||
|
self._analyze_sitemaps(result)
|
||||||
|
|
||||||
|
# Calculate stats
|
||||||
|
result.stats = {
|
||||||
|
"user_agents_count": len(result.rules),
|
||||||
|
"user_agents": [r.user_agent for r in result.rules],
|
||||||
|
"total_disallow_rules": sum(len(r.disallow) for r in result.rules),
|
||||||
|
"total_allow_rules": sum(len(r.allow) for r in result.rules),
|
||||||
|
"sitemaps_count": len(result.sitemaps),
|
||||||
|
"has_crawl_delay": any(r.crawl_delay for r in result.rules),
|
||||||
|
"content_length": len(content),
|
||||||
|
}
|
||||||
|
|
||||||
|
return result

    def _analyze_syntax(self, result: RobotsResult) -> None:
        """Check for syntax issues."""
        lines = result.content.split("\n")

        for line_num, line in enumerate(lines, 1):
            line = line.strip()

            # Skip empty lines and comments
            if not line or line.startswith("#"):
                continue

            # Check for valid directive
            if ":" not in line:
                result.issues.append(RobotsIssue(
                    severity="warning",
                    message=f"Invalid line (missing colon): {line[:50]}",
                    line_number=line_num,
                ))
                continue

            directive, value = line.split(":", 1)
            directive = directive.strip().lower()

            valid_directives = {
                "user-agent", "disallow", "allow",
                "crawl-delay", "sitemap", "host",
            }

            if directive not in valid_directives:
                result.issues.append(RobotsIssue(
                    severity="info",
                    message=f"Unknown directive: {directive}",
                    line_number=line_num,
                    directive=directive,
                ))
    def _analyze_rules(self, result: RobotsResult) -> None:
        """Analyze blocking rules."""
        # Check if there are any rules
        if not result.rules:
            result.issues.append(RobotsIssue(
                severity="info",
                message="No user-agent rules defined",
                suggestion="Add User-agent: * rules to control crawling",
            ))
            return

        # Check for wildcard rule
        has_wildcard = any(r.user_agent == "*" for r in result.rules)
        if not has_wildcard:
            result.issues.append(RobotsIssue(
                severity="info",
                message="No wildcard (*) user-agent defined",
                suggestion="Consider adding User-agent: * as fallback",
            ))

        # Check for blocking important resources
        for rules in result.rules:
            for disallow in rules.disallow:
                # Check if blocking root
                if disallow == "/":
                    result.issues.append(RobotsIssue(
                        severity="error",
                        message=f"Blocking entire site for {rules.user_agent}",
                        directive=f"Disallow: {disallow}",
                        suggestion="This will prevent indexing. Is this intentional?",
                    ))

                # Check if blocking CSS/JS
                if any(ext in disallow.lower() for ext in [".css", ".js"]):
                    result.issues.append(RobotsIssue(
                        severity="warning",
                        message=f"Blocking CSS/JS files for {rules.user_agent}",
                        directive=f"Disallow: {disallow}",
                        suggestion="May affect rendering and SEO",
                    ))

                # Check for blocking images
                if any(ext in disallow.lower() for ext in [".jpg", ".png", ".gif", ".webp"]):
                    result.issues.append(RobotsIssue(
                        severity="info",
                        message=f"Blocking image files for {rules.user_agent}",
                        directive=f"Disallow: {disallow}",
                    ))

            # Check crawl delay
            if rules.crawl_delay:
                if rules.crawl_delay > 10:
                    result.issues.append(RobotsIssue(
                        severity="warning",
                        message=f"High crawl-delay ({rules.crawl_delay}s) for {rules.user_agent}",
                        directive=f"Crawl-delay: {rules.crawl_delay}",
                        suggestion="May significantly slow indexing",
                    ))
                elif rules.crawl_delay > 0:
                    result.issues.append(RobotsIssue(
                        severity="info",
                        message=f"Crawl-delay set to {rules.crawl_delay}s for {rules.user_agent}",
                    ))
    def _analyze_sitemaps(self, result: RobotsResult) -> None:
        """Analyze sitemap declarations."""
        if not result.sitemaps:
            result.issues.append(RobotsIssue(
                severity="warning",
                message="No sitemap declared in robots.txt",
                suggestion="Add Sitemap: directive to help crawlers find your sitemap",
            ))
        else:
            for sitemap in result.sitemaps:
                if not sitemap.startswith("http"):
                    result.issues.append(RobotsIssue(
                        severity="warning",
                        message=f"Sitemap URL should be absolute: {sitemap}",
                        directive=f"Sitemap: {sitemap}",
                    ))
    def test_url(self, robots_url: str, test_path: str,
                 user_agent: str = "Googlebot") -> dict:
        """Test if a specific URL is allowed."""
        # Use Python's built-in parser
        rp = RobotFileParser()

        # Ensure robots.txt URL
        parsed = urlparse(robots_url)
        if not parsed.path.endswith("robots.txt"):
            robots_txt_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
        else:
            robots_txt_url = robots_url

        rp.set_url(robots_txt_url)
        try:
            rp.read()
        except Exception as e:
            return {
                "path": test_path,
                "user_agent": user_agent,
                "allowed": None,
                "error": str(e),
            }

        # Build full URL for testing
        base_url = f"{parsed.scheme}://{parsed.netloc}"
        full_url = urljoin(base_url, test_path)

        allowed = rp.can_fetch(user_agent, full_url)

        return {
            "path": test_path,
            "user_agent": user_agent,
            "allowed": allowed,
            "full_url": full_url,
        }
    def generate_report(self, result: RobotsResult) -> str:
        """Generate human-readable analysis report."""
        lines = [
            "=" * 60,
            "Robots.txt Analysis Report",
            "=" * 60,
            f"URL: {result.url}",
            f"Accessible: {'Yes' if result.accessible else 'No'}",
            f"Timestamp: {result.timestamp}",
            "",
        ]

        if result.accessible:
            lines.append("Statistics:")
            for key, value in result.stats.items():
                if key == "user_agents":
                    lines.append(f"  {key}: {', '.join(value) if value else 'None'}")
                else:
                    lines.append(f"  {key}: {value}")
            lines.append("")

        if result.sitemaps:
            lines.append(f"Sitemaps ({len(result.sitemaps)}):")
            for sitemap in result.sitemaps:
                lines.append(f"  - {sitemap}")
            lines.append("")

        if result.rules:
            lines.append("Rules Summary:")
            for rules in result.rules:
                lines.append(f"\n  User-agent: {rules.user_agent}")
                if rules.disallow:
                    lines.append(f"    Disallow: {len(rules.disallow)} rules")
                    for d in rules.disallow[:5]:
                        lines.append(f"      - {d}")
                    if len(rules.disallow) > 5:
                        lines.append(f"      ... and {len(rules.disallow) - 5} more")
                if rules.allow:
                    lines.append(f"    Allow: {len(rules.allow)} rules")
                    for a in rules.allow[:3]:
                        lines.append(f"      - {a}")
                if rules.crawl_delay:
                    lines.append(f"    Crawl-delay: {rules.crawl_delay}s")
            lines.append("")

        if result.issues:
            lines.append("Issues Found:")
            errors = [i for i in result.issues if i.severity == "error"]
            warnings = [i for i in result.issues if i.severity == "warning"]
            infos = [i for i in result.issues if i.severity == "info"]

            if errors:
                lines.append(f"\n  ERRORS ({len(errors)}):")
                for issue in errors:
                    lines.append(f"    - {issue.message}")
                    if issue.directive:
                        lines.append(f"      Directive: {issue.directive}")
                    if issue.suggestion:
                        lines.append(f"      Suggestion: {issue.suggestion}")

            if warnings:
                lines.append(f"\n  WARNINGS ({len(warnings)}):")
                for issue in warnings:
                    lines.append(f"    - {issue.message}")
                    if issue.suggestion:
                        lines.append(f"      Suggestion: {issue.suggestion}")

            if infos:
                lines.append(f"\n  INFO ({len(infos)}):")
                for issue in infos:
                    lines.append(f"    - {issue.message}")

        lines.append("")
        lines.append("=" * 60)

        return "\n".join(lines)
def main():
    """Main entry point for CLI usage."""
    parser = argparse.ArgumentParser(
        description="Analyze robots.txt configuration",
    )
    parser.add_argument("--url", "-u", required=True,
                        help="URL to robots.txt or domain")
    parser.add_argument("--test-url", "-t",
                        help="Test if specific URL path is allowed")
    parser.add_argument("--user-agent", "-a", default="Googlebot",
                        help="User agent for testing (default: Googlebot)")
    parser.add_argument("--output", "-o", help="Output file for JSON report")
    parser.add_argument("--json", action="store_true", help="Output as JSON")

    args = parser.parse_args()

    checker = RobotsChecker()

    if args.test_url:
        # Test specific URL
        test_result = checker.test_url(args.url, args.test_url, args.user_agent)
        if args.json:
            print(json.dumps(test_result, indent=2))
        elif test_result.get("error"):
            # Fetch failed: "allowed" is None, so report the error instead of BLOCKED
            print(f"Error: {test_result['error']}")
        else:
            status = "ALLOWED" if test_result["allowed"] else "BLOCKED"
            print(f"URL: {test_result['path']}")
            print(f"User-Agent: {test_result['user_agent']}")
            print(f"Status: {status}")
    else:
        # Full analysis
        result = checker.analyze(args.url)

        if args.json or args.output:
            output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
            if args.output:
                with open(args.output, "w", encoding="utf-8") as f:
                    f.write(output)
                logger.info(f"Report written to {args.output}")
            else:
                print(output)
        else:
            print(checker.generate_report(result))


if __name__ == "__main__":
    main()

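`test_url` above delegates the allow/block decision to Python's built-in `urllib.robotparser`. A minimal offline sketch of the same check, using hypothetical rules fed through `parse()` instead of fetching over HTTP with `read()`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed directly instead of fetched
rules = """User-agent: *
Disallow: /admin/
Allow: /admin/public/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch(user_agent, url) mirrors the allowed/blocked decision in test_url
print(rp.can_fetch("Googlebot", "https://example.com/admin/secret"))  # False (blocked)
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))     # True (allowed)
```

Because `parse()` takes raw lines, this pattern is also handy for unit-testing rule sets without network access.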
@@ -0,0 +1,490 @@
"""
Schema Generator - Generate JSON-LD structured data markup
==========================================================
Purpose: Generate schema.org structured data in JSON-LD format
Python: 3.10+
Usage:
    python schema_generator.py --type organization --name "Company Name" --url "https://example.com"
"""

import argparse
import json
import logging
import os
import re
from datetime import datetime
from pathlib import Path
from typing import Any

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

# Template directory relative to this script
TEMPLATE_DIR = Path(__file__).parent.parent / "templates" / "schema_templates"

class SchemaGenerator:
    """Generate JSON-LD schema markup from templates."""

    SCHEMA_TYPES = {
        "organization": "organization.json",
        "local_business": "local_business.json",
        "product": "product.json",
        "article": "article.json",
        "faq": "faq.json",
        "breadcrumb": "breadcrumb.json",
        "website": "website.json",
    }

    # Business type mappings for LocalBusiness
    BUSINESS_TYPES = {
        "restaurant": "Restaurant",
        "cafe": "CafeOrCoffeeShop",
        "bar": "BarOrPub",
        "hotel": "Hotel",
        "store": "Store",
        "medical": "MedicalBusiness",
        "dental": "Dentist",
        "legal": "LegalService",
        "real_estate": "RealEstateAgent",
        "auto": "AutoRepair",
        "beauty": "BeautySalon",
        "gym": "HealthClub",
        "spa": "DaySpa",
    }

    # Article type mappings
    ARTICLE_TYPES = {
        "article": "Article",
        "blog": "BlogPosting",
        "news": "NewsArticle",
        "tech": "TechArticle",
        "scholarly": "ScholarlyArticle",
    }
    def __init__(self, template_dir: Path = TEMPLATE_DIR):
        self.template_dir = template_dir

    def load_template(self, schema_type: str) -> dict:
        """Load a schema template file."""
        if schema_type not in self.SCHEMA_TYPES:
            raise ValueError(f"Unknown schema type: {schema_type}. "
                             f"Available: {list(self.SCHEMA_TYPES.keys())}")

        template_file = self.template_dir / self.SCHEMA_TYPES[schema_type]
        if not template_file.exists():
            raise FileNotFoundError(f"Template not found: {template_file}")

        with open(template_file, "r", encoding="utf-8") as f:
            return json.load(f)

    def fill_template(self, template: dict, data: dict[str, Any]) -> dict:
        """Fill template placeholders with actual data."""
        template_str = json.dumps(template, ensure_ascii=False)

        # Replace placeholders {{key}} with values
        for key, value in data.items():
            placeholder = f"{{{{{key}}}}}"
            if value is not None:
                template_str = template_str.replace(placeholder, str(value))

        # Remove unfilled placeholders and their parent objects if empty
        result = json.loads(template_str)
        return self._clean_empty_values(result)

    def _clean_empty_values(self, obj: Any) -> Any:
        """Remove empty values and unfilled placeholders."""
        if isinstance(obj, dict):
            cleaned = {}
            for key, value in obj.items():
                cleaned_value = self._clean_empty_values(value)
                # Skip if value is empty, None, or unfilled placeholder
                if cleaned_value is None:
                    continue
                if isinstance(cleaned_value, str) and cleaned_value.startswith("{{"):
                    continue
                if isinstance(cleaned_value, (list, dict)) and not cleaned_value:
                    continue
                cleaned[key] = cleaned_value
            return cleaned if cleaned else None
        elif isinstance(obj, list):
            cleaned = []
            for item in obj:
                cleaned_item = self._clean_empty_values(item)
                if cleaned_item is not None:
                    if isinstance(cleaned_item, str) and cleaned_item.startswith("{{"):
                        continue
                    cleaned.append(cleaned_item)
            return cleaned if cleaned else None
        elif isinstance(obj, str):
            if obj.startswith("{{") and obj.endswith("}}"):
                return None
            return obj
        return obj
    def generate_organization(
        self,
        name: str,
        url: str,
        logo_url: str | None = None,
        description: str | None = None,
        founding_date: str | None = None,
        phone: str | None = None,
        address: dict | None = None,
        social_links: list[str] | None = None,
    ) -> dict:
        """Generate Organization schema."""
        template = self.load_template("organization")

        data = {
            "name": name,
            "url": url,
            "logo_url": logo_url,
            "description": description,
            "founding_date": founding_date,
            "phone": phone,
        }

        if address:
            data.update({
                "street_address": address.get("street"),
                "city": address.get("city"),
                "region": address.get("region"),
                "postal_code": address.get("postal_code"),
                "country": address.get("country", "KR"),
            })

        if social_links:
            # TODO: social links are accepted but not yet mapped into the template
            pass

        return self.fill_template(template, data)
    def generate_local_business(
        self,
        name: str,
        business_type: str,
        address: dict,
        phone: str | None = None,
        url: str | None = None,
        description: str | None = None,
        hours: dict | None = None,
        geo: dict | None = None,
        price_range: str | None = None,
        rating: float | None = None,
        review_count: int | None = None,
    ) -> dict:
        """Generate LocalBusiness schema."""
        template = self.load_template("local_business")

        schema_business_type = self.BUSINESS_TYPES.get(
            business_type.lower(), "LocalBusiness"
        )

        data = {
            "business_type": schema_business_type,
            "name": name,
            "url": url,
            "description": description,
            "phone": phone,
            "price_range": price_range,
            "street_address": address.get("street"),
            "city": address.get("city"),
            "region": address.get("region"),
            "postal_code": address.get("postal_code"),
            "country": address.get("country", "KR"),
        }

        if geo:
            data["latitude"] = geo.get("lat")
            data["longitude"] = geo.get("lng")

        if hours:
            data.update({
                "weekday_opens": hours.get("weekday_opens", "09:00"),
                "weekday_closes": hours.get("weekday_closes", "18:00"),
                "weekend_opens": hours.get("weekend_opens"),
                "weekend_closes": hours.get("weekend_closes"),
            })

        if rating is not None:
            data["rating"] = str(rating)
            data["review_count"] = str(review_count or 0)

        return self.fill_template(template, data)
    def generate_product(
        self,
        name: str,
        description: str,
        price: float,
        currency: str = "KRW",
        brand: str | None = None,
        sku: str | None = None,
        images: list[str] | None = None,
        availability: str = "InStock",
        condition: str = "NewCondition",
        rating: float | None = None,
        review_count: int | None = None,
        url: str | None = None,
        seller: str | None = None,
    ) -> dict:
        """Generate Product schema."""
        template = self.load_template("product")

        data = {
            "name": name,
            "description": description,
            # Keep decimal prices intact; whole-number prices (e.g. KRW) are emitted as integers
            "price": str(int(price)) if price == int(price) else str(price),
            "currency": currency,
            "brand_name": brand,
            "sku": sku,
            "product_url": url,
            "availability": availability,
            "condition": condition,
            "seller_name": seller,
        }

        if images:
            for i, img in enumerate(images[:3], 1):
                data[f"image_url_{i}"] = img

        if rating is not None:
            data["rating"] = str(rating)
            data["review_count"] = str(review_count or 0)

        return self.fill_template(template, data)
    def generate_article(
        self,
        headline: str,
        description: str,
        author_name: str,
        date_published: str,
        publisher_name: str,
        article_type: str = "article",
        date_modified: str | None = None,
        images: list[str] | None = None,
        page_url: str | None = None,
        publisher_logo: str | None = None,
        author_url: str | None = None,
        section: str | None = None,
        word_count: int | None = None,
        keywords: str | None = None,
    ) -> dict:
        """Generate Article schema."""
        template = self.load_template("article")

        schema_article_type = self.ARTICLE_TYPES.get(
            article_type.lower(), "Article"
        )

        data = {
            "article_type": schema_article_type,
            "headline": headline,
            "description": description,
            "author_name": author_name,
            "author_url": author_url,
            "date_published": date_published,
            "date_modified": date_modified or date_published,
            "publisher_name": publisher_name,
            "publisher_logo_url": publisher_logo,
            "page_url": page_url,
            "section": section,
            "word_count": str(word_count) if word_count else None,
            "keywords": keywords,
        }

        if images:
            for i, img in enumerate(images[:2], 1):
                data[f"image_url_{i}"] = img

        return self.fill_template(template, data)
    def generate_faq(self, questions: list[dict[str, str]]) -> dict:
        """Generate FAQPage schema."""
        schema = {
            "@context": "https://schema.org",
            "@type": "FAQPage",
            "mainEntity": [],
        }

        for qa in questions:
            schema["mainEntity"].append({
                "@type": "Question",
                "name": qa["question"],
                "acceptedAnswer": {
                    "@type": "Answer",
                    "text": qa["answer"],
                },
            })

        return schema

    def generate_breadcrumb(self, items: list[dict[str, str]]) -> dict:
        """Generate BreadcrumbList schema."""
        schema = {
            "@context": "https://schema.org",
            "@type": "BreadcrumbList",
            "itemListElement": [],
        }

        for i, item in enumerate(items, 1):
            schema["itemListElement"].append({
                "@type": "ListItem",
                "position": i,
                "name": item["name"],
                "item": item["url"],
            })

        return schema
    def generate_website(
        self,
        name: str,
        url: str,
        search_url_template: str | None = None,
        description: str | None = None,
        language: str = "ko-KR",
        publisher_name: str | None = None,
        logo_url: str | None = None,
        alternate_name: str | None = None,
    ) -> dict:
        """Generate WebSite schema."""
        template = self.load_template("website")

        data = {
            "site_name": name,
            "url": url,
            "description": description,
            "language": language,
            "search_url_template": search_url_template,
            "publisher_name": publisher_name or name,
            "logo_url": logo_url,
            "alternate_name": alternate_name,
        }

        return self.fill_template(template, data)

    def to_json_ld(self, schema: dict, pretty: bool = True) -> str:
        """Convert schema dict to JSON-LD string."""
        indent = 2 if pretty else None
        return json.dumps(schema, ensure_ascii=False, indent=indent)

    def to_html_script(self, schema: dict) -> str:
        """Wrap schema in HTML script tag."""
        json_ld = self.to_json_ld(schema)
        return f'<script type="application/ld+json">\n{json_ld}\n</script>'
def main():
    """Main entry point for CLI usage."""
    parser = argparse.ArgumentParser(
        description="Generate JSON-LD schema markup",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Generate Organization schema
  python schema_generator.py --type organization --name "My Company" --url "https://example.com"

  # Generate Product schema
  python schema_generator.py --type product --name "Widget" --price 29900 --currency KRW

  # Generate Article schema
  python schema_generator.py --type article --headline "Article Title" --author "John Doe"
""",
    )

    parser.add_argument(
        "--type", "-t",
        required=True,
        choices=SchemaGenerator.SCHEMA_TYPES.keys(),
        help="Schema type to generate",
    )
    parser.add_argument("--name", help="Name/title")
    parser.add_argument("--url", help="URL")
    parser.add_argument("--description", help="Description")
    parser.add_argument("--price", type=float, help="Price (for product)")
    parser.add_argument("--currency", default="KRW", help="Currency code")
    parser.add_argument("--headline", help="Headline (for article)")
    parser.add_argument("--author", help="Author name")
    parser.add_argument("--output", "-o", help="Output file path")
    parser.add_argument("--html", action="store_true", help="Output as HTML script tag")

    args = parser.parse_args()

    generator = SchemaGenerator()

    try:
        if args.type == "organization":
            schema = generator.generate_organization(
                name=args.name or "Organization Name",
                url=args.url or "https://example.com",
                description=args.description,
            )
        elif args.type == "product":
            schema = generator.generate_product(
                name=args.name or "Product Name",
                description=args.description or "Product description",
                price=args.price or 0,
                currency=args.currency,
            )
        elif args.type == "article":
            schema = generator.generate_article(
                headline=args.headline or args.name or "Article Title",
                description=args.description or "Article description",
                author_name=args.author or "Author",
                date_published=datetime.now().strftime("%Y-%m-%d"),
                publisher_name="Publisher",
            )
        elif args.type == "website":
            schema = generator.generate_website(
                name=args.name or "Website Name",
                url=args.url or "https://example.com",
                description=args.description,
            )
        elif args.type == "faq":
            # Example FAQ
            schema = generator.generate_faq([
                {"question": "Question 1?", "answer": "Answer 1"},
                {"question": "Question 2?", "answer": "Answer 2"},
            ])
        elif args.type == "breadcrumb":
            # Example breadcrumb
            schema = generator.generate_breadcrumb([
                {"name": "Home", "url": "https://example.com/"},
                {"name": "Category", "url": "https://example.com/category/"},
            ])
        elif args.type == "local_business":
            schema = generator.generate_local_business(
                name=args.name or "Business Name",
                business_type="store",
                address={"street": "123 Main St", "city": "Seoul", "country": "KR"},
                url=args.url,
                description=args.description,
            )
        else:
            raise ValueError(f"Unsupported type: {args.type}")

        if args.html:
            output = generator.to_html_script(schema)
        else:
            output = generator.to_json_ld(schema)

        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            logger.info(f"Schema written to {args.output}")
        else:
            print(output)

    except Exception as e:
        logger.error(f"Error generating schema: {e}")
        raise


if __name__ == "__main__":
    main()

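`fill_template` above is plain string substitution over the serialized JSON: each `{{key}}` placeholder is replaced, then any value still starting with `{{` is pruned, as `_clean_empty_values` does. A self-contained sketch of that round-trip, using an inline template rather than one of the shipped template files:

```python
import json

# Hypothetical inline template with one placeholder left unfilled
template = {"@type": "Organization", "name": "{{name}}", "url": "{{url}}", "logo": "{{logo_url}}"}
data = {"name": "ACME", "url": "https://example.com"}  # logo_url intentionally missing

s = json.dumps(template)
for key, value in data.items():
    s = s.replace(f"{{{{{key}}}}}", str(value))  # "{{name}}" -> "ACME", etc.

filled = json.loads(s)
# Prune values that are still unfilled placeholders
cleaned = {k: v for k, v in filled.items() if not (isinstance(v, str) and v.startswith("{{"))}
print(cleaned)
# {'@type': 'Organization', 'name': 'ACME', 'url': 'https://example.com'}
```

Substituting into the serialized string rather than walking the dict keeps the template logic trivial, at the cost of requiring every substituted value to be JSON-safe.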
@@ -0,0 +1,498 @@
"""
Schema Validator - Validate JSON-LD structured data markup
==========================================================
Purpose: Extract and validate schema.org structured data from URLs or files
Python: 3.10+
Usage:
    python schema_validator.py --url https://example.com
    python schema_validator.py --file schema.json
"""

import argparse
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

try:
    import extruct
    HAS_EXTRUCT = True
except ImportError:
    HAS_EXTRUCT = False

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

@dataclass
class ValidationIssue:
    """Represents a validation issue found in schema."""

    severity: str  # "error", "warning", "info"
    message: str
    schema_type: str | None = None
    property_name: str | None = None
    suggestion: str | None = None


@dataclass
class ValidationResult:
    """Complete validation result for a schema."""

    url: str | None = None
    schemas_found: list[dict] = field(default_factory=list)
    issues: list[ValidationIssue] = field(default_factory=list)
    valid: bool = True
    rich_results_eligible: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON output."""
        return {
            "url": self.url,
            "schemas_found": len(self.schemas_found),
            "schema_types": [s.get("@type", "Unknown") for s in self.schemas_found],
            "valid": self.valid,
            "issues": [
                {
                    "severity": i.severity,
                    "message": i.message,
                    "schema_type": i.schema_type,
                    "property": i.property_name,
                    "suggestion": i.suggestion,
                }
                for i in self.issues
            ],
            "rich_results_eligible": self.rich_results_eligible,
            "timestamp": self.timestamp,
        }

class SchemaValidator:
|
||||||
|
"""Validate schema.org structured data."""
|
||||||
|
|
||||||
|
# Required properties for common schema types
|
||||||
|
REQUIRED_PROPERTIES = {
|
||||||
|
"Organization": ["name", "url"],
|
||||||
|
"LocalBusiness": ["name", "address"],
|
||||||
|
"Product": ["name"],
|
||||||
|
"Offer": ["price", "priceCurrency"],
|
||||||
|
"Article": ["headline", "author", "datePublished", "publisher"],
|
||||||
|
"BlogPosting": ["headline", "author", "datePublished", "publisher"],
|
||||||
|
"NewsArticle": ["headline", "author", "datePublished", "publisher"],
|
||||||
|
"FAQPage": ["mainEntity"],
|
||||||
|
"Question": ["name", "acceptedAnswer"],
|
||||||
|
"Answer": ["text"],
|
||||||
|
"BreadcrumbList": ["itemListElement"],
|
||||||
|
"ListItem": ["position", "name"],
|
||||||
|
"WebSite": ["name", "url"],
|
||||||
|
"WebPage": ["name"],
|
||||||
|
"Person": ["name"],
|
||||||
|
"Event": ["name", "startDate", "location"],
|
||||||
|
"Review": ["reviewRating", "author"],
|
||||||
|
"AggregateRating": ["ratingValue"],
|
||||||
|
"ImageObject": ["url"],
|
||||||
|
}
|
||||||
|
|
||||||
|
# Recommended (but not required) properties
|
||||||
|
RECOMMENDED_PROPERTIES = {
|
||||||
|
"Organization": ["logo", "description", "contactPoint", "sameAs"],
|
||||||
|
"LocalBusiness": ["telephone", "openingHoursSpecification", "geo", "image"],
|
||||||
|
"Product": ["description", "image", "brand", "offers", "aggregateRating"],
|
||||||
|
"Article": ["image", "dateModified", "description"],
|
||||||
|
"FAQPage": [],
|
||||||
|
"WebSite": ["potentialAction"],
|
||||||
|
"BreadcrumbList": [],
|
||||||
|
}
|
||||||
|
|
||||||
|
# Google Rich Results eligible types
|
||||||
|
RICH_RESULTS_TYPES = {
|
||||||
|
"Article", "BlogPosting", "NewsArticle",
|
||||||
|
"Product", "Review",
|
||||||
|
"FAQPage", "HowTo",
|
||||||
|
"LocalBusiness", "Restaurant",
|
||||||
|
"Event",
|
||||||
|
"Recipe",
|
||||||
|
"JobPosting",
|
||||||
|
"Course",
|
||||||
|
"BreadcrumbList",
|
||||||
|
"Organization",
|
||||||
|
"WebSite",
|
||||||
|
"VideoObject",
|
||||||
|
}
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
self.session = requests.Session()
|
||||||
|
self.session.headers.update({
|
||||||
|
"User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
|
||||||
|
})
|
||||||
|
|
||||||
|
    def extract_from_url(self, url: str) -> list[dict]:
        """Extract all structured data from a URL."""
        try:
            response = self.session.get(url, timeout=30)
            response.raise_for_status()
            return self.extract_from_html(response.text, url)
        except requests.RequestException as e:
            logger.error(f"Failed to fetch URL: {e}")
            return []

    def extract_from_html(self, html: str, base_url: str | None = None) -> list[dict]:
        """Extract structured data from HTML content."""
        schemas = []

        # Method 1: Use extruct if available (handles JSON-LD, Microdata, RDFa)
        if HAS_EXTRUCT:
            try:
                data = extruct.extract(html, base_url=base_url, uniform=True)
                schemas.extend(data.get("json-ld", []))
                schemas.extend(data.get("microdata", []))
                schemas.extend(data.get("rdfa", []))
            except Exception as e:
                logger.warning(f"extruct extraction failed: {e}")

        # Method 2: Manual JSON-LD extraction (fallback/additional)
        soup = BeautifulSoup(html, "html.parser")
        for script in soup.find_all("script", type="application/ld+json"):
            try:
                content = script.string
                if content:
                    data = json.loads(content)
                    if isinstance(data, list):
                        schemas.extend(data)
                    else:
                        schemas.append(data)
            except json.JSONDecodeError as e:
                logger.warning(f"Invalid JSON-LD: {e}")

        # Deduplicate schemas
        seen = set()
        unique_schemas = []
        for schema in schemas:
            schema_str = json.dumps(schema, sort_keys=True)
            if schema_str not in seen:
                seen.add(schema_str)
                unique_schemas.append(schema)

        return unique_schemas

    def validate(self, url: str | None = None, html: str | None = None,
                 schema: dict | None = None) -> ValidationResult:
        """Validate schema from URL, HTML, or direct schema dict."""
        result = ValidationResult(url=url)

        # Extract schemas
        if schema:
            schemas = [schema]
        elif html:
            schemas = self.extract_from_html(html, url)
        elif url:
            schemas = self.extract_from_url(url)
        else:
            raise ValueError("Must provide url, html, or schema")

        result.schemas_found = schemas

        if not schemas:
            result.issues.append(ValidationIssue(
                severity="warning",
                message="No structured data found",
                suggestion="Add JSON-LD schema markup to improve SEO",
            ))
            result.valid = False
            return result

        # Validate each schema
        for schema in schemas:
            self._validate_schema(schema, result)

        # Check for errors (warnings don't affect validity)
        result.valid = not any(i.severity == "error" for i in result.issues)

        return result

    def _validate_schema(self, schema: dict, result: ValidationResult,
                         parent_type: str | None = None) -> None:
        """Validate a single schema object."""
        schema_type = schema.get("@type")

        if not schema_type:
            result.issues.append(ValidationIssue(
                severity="error",
                message="Missing @type property",
                schema_type=parent_type,
            ))
            return

        # Handle array of types
        if isinstance(schema_type, list):
            schema_type = schema_type[0]

        # Check required properties
        required = self.REQUIRED_PROPERTIES.get(schema_type, [])
        for prop in required:
            if prop not in schema:
                result.issues.append(ValidationIssue(
                    severity="error",
                    message=f"Missing required property: {prop}",
                    schema_type=schema_type,
                    property_name=prop,
                    suggestion=f"Add '{prop}' property to {schema_type} schema",
                ))

        # Check recommended properties
        recommended = self.RECOMMENDED_PROPERTIES.get(schema_type, [])
        for prop in recommended:
            if prop not in schema:
                result.issues.append(ValidationIssue(
                    severity="info",
                    message=f"Missing recommended property: {prop}",
                    schema_type=schema_type,
                    property_name=prop,
                    suggestion=f"Consider adding '{prop}' for better rich results",
                ))

        # Check Rich Results eligibility
        if schema_type in self.RICH_RESULTS_TYPES:
            result.rich_results_eligible[schema_type] = self._check_rich_results(
                schema, schema_type
            )

        # Validate nested schemas
        for key, value in schema.items():
            if key.startswith("@"):
                continue
            if isinstance(value, dict) and "@type" in value:
                self._validate_schema(value, result, schema_type)
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, dict) and "@type" in item:
                        self._validate_schema(item, result, schema_type)

        # Type-specific validations
        self._validate_type_specific(schema, schema_type, result)

    def _validate_type_specific(self, schema: dict, schema_type: str,
                                result: ValidationResult) -> None:
        """Type-specific validation rules."""
        if schema_type in ("Article", "BlogPosting", "NewsArticle"):
            # Check image
            if "image" not in schema:
                result.issues.append(ValidationIssue(
                    severity="warning",
                    message="Article without image may not show in rich results",
                    schema_type=schema_type,
                    property_name="image",
                    suggestion="Add at least one image to the article",
                ))

            # Check headline length
            headline = schema.get("headline", "")
            if len(headline) > 110:
                result.issues.append(ValidationIssue(
                    severity="warning",
                    message=f"Headline too long ({len(headline)} chars, max 110)",
                    schema_type=schema_type,
                    property_name="headline",
                ))

        elif schema_type == "Product":
            offer = schema.get("offers", {})
            if isinstance(offer, dict):
                # Check price
                price = offer.get("price")
                if price is not None:
                    try:
                        float(price)
                    except (ValueError, TypeError):
                        result.issues.append(ValidationIssue(
                            severity="error",
                            message=f"Invalid price value: {price}",
                            schema_type="Offer",
                            property_name="price",
                        ))

                # Check availability
                availability = offer.get("availability", "")
                valid_availabilities = [
                    "InStock", "OutOfStock", "PreOrder", "Discontinued",
                    "https://schema.org/InStock", "https://schema.org/OutOfStock",
                ]
                if availability and not any(
                    a in availability for a in valid_availabilities
                ):
                    result.issues.append(ValidationIssue(
                        severity="warning",
                        message=f"Unknown availability value: {availability}",
                        schema_type="Offer",
                        property_name="availability",
                    ))

        elif schema_type == "LocalBusiness":
            # Check for geo coordinates
            if "geo" not in schema:
                result.issues.append(ValidationIssue(
                    severity="info",
                    message="Missing geo coordinates",
                    schema_type=schema_type,
                    property_name="geo",
                    suggestion="Add latitude/longitude for better local search",
                ))

        elif schema_type == "FAQPage":
            main_entity = schema.get("mainEntity", [])
            if not main_entity:
                result.issues.append(ValidationIssue(
                    severity="error",
                    message="FAQPage must have at least one question",
                    schema_type=schema_type,
                    property_name="mainEntity",
                ))
            elif len(main_entity) < 2:
                result.issues.append(ValidationIssue(
                    severity="info",
                    message="FAQPage has only one question",
                    schema_type=schema_type,
                    suggestion="Add more questions for better rich results",
                ))

    def _check_rich_results(self, schema: dict, schema_type: str) -> dict:
        """Check if schema is eligible for Google Rich Results."""
        result = {
            "eligible": True,
            "missing_for_rich_results": [],
        }

        if schema_type in ("Article", "BlogPosting", "NewsArticle"):
            required_for_rich = ["headline", "image", "datePublished", "author"]
            for prop in required_for_rich:
                if prop not in schema:
                    result["eligible"] = False
                    result["missing_for_rich_results"].append(prop)

        elif schema_type == "Product":
            if "name" not in schema:
                result["eligible"] = False
                result["missing_for_rich_results"].append("name")
            offer = schema.get("offers")
            if not offer:
                result["eligible"] = False
                result["missing_for_rich_results"].append("offers")

        elif schema_type == "FAQPage":
            if not schema.get("mainEntity"):
                result["eligible"] = False
                result["missing_for_rich_results"].append("mainEntity")

        return result

    def generate_report(self, result: ValidationResult) -> str:
        """Generate human-readable validation report."""
        lines = [
            "=" * 60,
            "Schema Validation Report",
            "=" * 60,
            f"URL: {result.url or 'N/A'}",
            f"Timestamp: {result.timestamp}",
            f"Valid: {'Yes' if result.valid else 'No'}",
            f"Schemas Found: {len(result.schemas_found)}",
            "",
        ]

        if result.schemas_found:
            lines.append("Schema Types:")
            for schema in result.schemas_found:
                schema_type = schema.get("@type", "Unknown")
                lines.append(f"  - {schema_type}")
            lines.append("")

        if result.rich_results_eligible:
            lines.append("Rich Results Eligibility:")
            for schema_type, status in result.rich_results_eligible.items():
                eligible = "Yes" if status["eligible"] else "No"
                lines.append(f"  - {schema_type}: {eligible}")
                if status["missing_for_rich_results"]:
                    missing = ", ".join(status["missing_for_rich_results"])
                    lines.append(f"    Missing: {missing}")
            lines.append("")

        if result.issues:
            lines.append("Issues Found:")
            errors = [i for i in result.issues if i.severity == "error"]
            warnings = [i for i in result.issues if i.severity == "warning"]
            infos = [i for i in result.issues if i.severity == "info"]

            if errors:
                lines.append(f"\n  ERRORS ({len(errors)}):")
                for issue in errors:
                    lines.append(f"  - [{issue.schema_type}] {issue.message}")
                    if issue.suggestion:
                        lines.append(f"    Suggestion: {issue.suggestion}")

            if warnings:
                lines.append(f"\n  WARNINGS ({len(warnings)}):")
                for issue in warnings:
                    lines.append(f"  - [{issue.schema_type}] {issue.message}")
                    if issue.suggestion:
                        lines.append(f"    Suggestion: {issue.suggestion}")

            if infos:
                lines.append(f"\n  INFO ({len(infos)}):")
                for issue in infos:
                    lines.append(f"  - [{issue.schema_type}] {issue.message}")
                    if issue.suggestion:
                        lines.append(f"    Suggestion: {issue.suggestion}")

        lines.append("")
        lines.append("=" * 60)

        return "\n".join(lines)


def main():
    """Main entry point for CLI usage."""
    parser = argparse.ArgumentParser(
        description="Validate schema.org structured data",
    )
    parser.add_argument("--url", "-u", help="URL to validate")
    parser.add_argument("--file", "-f", help="JSON-LD file to validate")
    parser.add_argument("--output", "-o", help="Output file for JSON report")
    parser.add_argument("--json", action="store_true", help="Output as JSON")

    args = parser.parse_args()

    if not args.url and not args.file:
        parser.error("Must provide --url or --file")

    validator = SchemaValidator()

    if args.file:
        with open(args.file, "r", encoding="utf-8") as f:
            schema = json.load(f)
        result = validator.validate(schema=schema)
    else:
        result = validator.validate(url=args.url)

    if args.json or args.output:
        output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
        if args.output:
            with open(args.output, "w", encoding="utf-8") as f:
                f.write(output)
            logger.info(f"Report written to {args.output}")
        else:
            print(output)
    else:
        print(validator.generate_report(result))


if __name__ == "__main__":
    main()
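The required-property check at the heart of the validator above can be sketched standalone. This is a hypothetical, trimmed illustration (not part of the committed code): `REQUIRED` copies two entries of the `REQUIRED_PROPERTIES` table, and `missing_required` mirrors the lookup and `@type`-array handling in `_validate_schema`.

```python
# Standalone sketch of SchemaValidator's required-property check.
# REQUIRED is a trimmed copy of the REQUIRED_PROPERTIES table above.
REQUIRED = {
    "Article": ["headline", "author", "datePublished", "publisher"],
    "FAQPage": ["mainEntity"],
}


def missing_required(schema: dict) -> list[str]:
    schema_type = schema.get("@type")
    if isinstance(schema_type, list):  # arrays of types use the first entry
        schema_type = schema_type[0]
    return [p for p in REQUIRED.get(schema_type, []) if p not in schema]


article = {"@type": "Article", "headline": "Hello", "author": "Jane"}
print(missing_required(article))  # → ['datePublished', 'publisher']
```

Each missing property becomes an `error`-severity `ValidationIssue` in the real class, which is why `result.valid` flips to `False` for this article.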
@@ -0,0 +1,969 @@
"""
|
||||||
|
Sitemap Crawler - Sequential page analysis from sitemap
|
||||||
|
=======================================================
|
||||||
|
Purpose: Crawl sitemap URLs one by one, analyze each page, save to Notion
|
||||||
|
Python: 3.10+
|
||||||
|
Usage:
|
||||||
|
from sitemap_crawler import SitemapCrawler
|
||||||
|
crawler = SitemapCrawler()
|
||||||
|
crawler.crawl_sitemap("https://example.com/sitemap.xml", delay=2.0)
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import time
|
||||||
|
import xml.etree.ElementTree as ET
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Callable, Generator
|
||||||
|
from urllib.parse import urlparse
|
||||||
|
|
||||||
|
import requests
|
||||||
|
from notion_client import Client
|
||||||
|
|
||||||
|
from base_client import config
|
||||||
|
from page_analyzer import PageAnalyzer, PageMetadata
|
||||||
|
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Default database for page analysis data
|
||||||
|
DEFAULT_PAGES_DATABASE_ID = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"
|
||||||
|
|
||||||
|
# Default limits to prevent excessive resource usage
|
||||||
|
DEFAULT_MAX_PAGES = 500
|
||||||
|
DEFAULT_DELAY_SECONDS = 2.0
|
||||||
|
|
||||||
|
# Progress tracking directory
|
||||||
|
PROGRESS_DIR = Path.home() / ".claude" / "seo-audit-progress"
|
||||||
|
PROGRESS_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
class CrawlProgress:
    """Track crawl progress."""
    total_urls: int = 0
    processed_urls: int = 0
    successful_urls: int = 0
    failed_urls: int = 0
    skipped_urls: int = 0
    start_time: datetime = field(default_factory=datetime.now)
    current_url: str = ""
    audit_id: str = ""
    site: str = ""
    status: str = "running"  # running, completed, failed
    error_message: str = ""
    summary_page_id: str = ""

    def get_progress_percent(self) -> float:
        if self.total_urls == 0:
            return 0.0
        return (self.processed_urls / self.total_urls) * 100

    def get_elapsed_time(self) -> str:
        elapsed = datetime.now() - self.start_time
        minutes = int(elapsed.total_seconds() // 60)
        seconds = int(elapsed.total_seconds() % 60)
        return f"{minutes}m {seconds}s"

    def get_eta(self) -> str:
        if self.processed_urls == 0:
            return "calculating..."
        elapsed = (datetime.now() - self.start_time).total_seconds()
        avg_time_per_url = elapsed / self.processed_urls
        remaining_urls = self.total_urls - self.processed_urls
        eta_seconds = remaining_urls * avg_time_per_url
        minutes = int(eta_seconds // 60)
        seconds = int(eta_seconds % 60)
        return f"{minutes}m {seconds}s"

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON serialization."""
        return {
            "audit_id": self.audit_id,
            "site": self.site,
            "status": self.status,
            "total_urls": self.total_urls,
            "processed_urls": self.processed_urls,
            "successful_urls": self.successful_urls,
            "failed_urls": self.failed_urls,
            "progress_percent": round(self.get_progress_percent(), 1),
            "elapsed_time": self.get_elapsed_time(),
            "eta": self.get_eta(),
            "current_url": self.current_url,
            "start_time": self.start_time.isoformat(),
            "error_message": self.error_message,
            "summary_page_id": self.summary_page_id,
            "updated_at": datetime.now().isoformat(),
        }

    def save_to_file(self, filepath: Path | None = None) -> Path:
        """Save progress to JSON file."""
        if filepath is None:
            filepath = PROGRESS_DIR / f"{self.audit_id}.json"
        with open(filepath, "w") as f:
            json.dump(self.to_dict(), f, indent=2)
        return filepath

    @classmethod
    def load_from_file(cls, filepath: Path) -> "CrawlProgress":
        """Load progress from JSON file."""
        with open(filepath, "r") as f:
            data = json.load(f)
        progress = cls()
        progress.audit_id = data.get("audit_id", "")
        progress.site = data.get("site", "")
        progress.status = data.get("status", "unknown")
        progress.total_urls = data.get("total_urls", 0)
        progress.processed_urls = data.get("processed_urls", 0)
        progress.successful_urls = data.get("successful_urls", 0)
        progress.failed_urls = data.get("failed_urls", 0)
        progress.current_url = data.get("current_url", "")
        progress.error_message = data.get("error_message", "")
        progress.summary_page_id = data.get("summary_page_id", "")
        if data.get("start_time"):
            progress.start_time = datetime.fromisoformat(data["start_time"])
        return progress


def get_active_crawls() -> list[CrawlProgress]:
    """Get all active (running) crawl jobs."""
    active = []
    for filepath in PROGRESS_DIR.glob("*.json"):
        try:
            progress = CrawlProgress.load_from_file(filepath)
            if progress.status == "running":
                active.append(progress)
        except Exception:
            continue
    return active


def get_all_crawls() -> list[CrawlProgress]:
    """Get all crawl jobs (active and completed)."""
    crawls = []
    for filepath in sorted(PROGRESS_DIR.glob("*.json"), reverse=True):
        try:
            progress = CrawlProgress.load_from_file(filepath)
            crawls.append(progress)
        except Exception:
            continue
    return crawls


def get_crawl_status(audit_id: str) -> CrawlProgress | None:
    """Get status of a specific crawl by audit ID."""
    filepath = PROGRESS_DIR / f"{audit_id}.json"
    if filepath.exists():
        return CrawlProgress.load_from_file(filepath)
    return None


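The progress-file pattern behind `CrawlProgress.save_to_file` and `get_active_crawls` can be sketched standalone: each crawl writes an `<audit_id>.json` snapshot into a progress directory, and the status helpers scan that directory. This is an illustrative sketch, not part of the committed code; the audit ID is hypothetical and a temporary directory stands in for `PROGRESS_DIR`.

```python
# Standalone sketch of the one-JSON-file-per-crawl progress pattern.
import json
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    progress_dir = Path(tmp)  # stand-in for PROGRESS_DIR
    snapshot = {
        "audit_id": "example.com-pages-20240101-000000",  # hypothetical
        "status": "running",
        "processed_urls": 3,
        "total_urls": 10,
    }
    (progress_dir / f"{snapshot['audit_id']}.json").write_text(json.dumps(snapshot))

    # Same scan that get_active_crawls performs over PROGRESS_DIR:
    running = [
        json.loads(p.read_text())
        for p in progress_dir.glob("*.json")
        if json.loads(p.read_text()).get("status") == "running"
    ]
    print(len(running))  # → 1
```

Because the snapshot lives on disk rather than in memory, a separate process (for example a status CLI) can monitor a long-running crawl without sharing state with it.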
@dataclass
class CrawlResult:
    """Result of a complete sitemap crawl."""
    site: str
    sitemap_url: str
    audit_id: str
    total_pages: int
    successful_pages: int
    failed_pages: int
    start_time: datetime
    end_time: datetime
    pages_analyzed: list[PageMetadata] = field(default_factory=list)
    notion_page_ids: list[str] = field(default_factory=list)
    summary_page_id: str | None = None

    def get_duration(self) -> str:
        duration = self.end_time - self.start_time
        minutes = int(duration.total_seconds() // 60)
        seconds = int(duration.total_seconds() % 60)
        return f"{minutes}m {seconds}s"


class SitemapCrawler:
    """Crawl sitemap URLs and analyze each page."""

    def __init__(
        self,
        notion_token: str | None = None,
        database_id: str | None = None,
    ):
        """
        Initialize sitemap crawler.

        Args:
            notion_token: Notion API token
            database_id: Notion database ID for storing results
        """
        self.notion_token = notion_token or config.notion_token
        self.database_id = database_id or DEFAULT_PAGES_DATABASE_ID
        self.analyzer = PageAnalyzer()

        if self.notion_token:
            self.notion = Client(auth=self.notion_token)
        else:
            self.notion = None
            logger.warning("Notion token not configured, results will not be saved")

    def fetch_sitemap_urls(self, sitemap_url: str) -> list[str]:
        """
        Fetch and parse URLs from a sitemap.

        Args:
            sitemap_url: URL of the sitemap

        Returns:
            List of URLs found in the sitemap
        """
        try:
            response = requests.get(sitemap_url, timeout=30)
            response.raise_for_status()

            # Parse XML
            root = ET.fromstring(response.content)

            # Handle namespace
            namespaces = {
                "sm": "http://www.sitemaps.org/schemas/sitemap/0.9"
            }

            urls = []

            # Check if this is a sitemap index
            sitemap_tags = root.findall(".//sm:sitemap/sm:loc", namespaces)
            if sitemap_tags:
                # This is a sitemap index, recursively fetch child sitemaps
                logger.info(f"Found sitemap index with {len(sitemap_tags)} child sitemaps")
                for loc in sitemap_tags:
                    if loc.text:
                        child_urls = self.fetch_sitemap_urls(loc.text)
                        urls.extend(child_urls)
            else:
                # Regular sitemap, extract URLs
                url_tags = root.findall(".//sm:url/sm:loc", namespaces)
                if not url_tags:
                    # Try without namespace
                    url_tags = root.findall(".//url/loc")

                for loc in url_tags:
                    if loc.text:
                        urls.append(loc.text)

            # Remove duplicates while preserving order
            seen = set()
            unique_urls = []
            for url in urls:
                if url not in seen:
                    seen.add(url)
                    unique_urls.append(url)

            logger.info(f"Found {len(unique_urls)} unique URLs in sitemap")
            return unique_urls

        except Exception as e:
            logger.error(f"Failed to fetch sitemap: {e}")
            raise

    def crawl_sitemap(
        self,
        sitemap_url: str,
        delay: float = DEFAULT_DELAY_SECONDS,
        max_pages: int = DEFAULT_MAX_PAGES,
        progress_callback: Callable[[CrawlProgress], None] | None = None,
        save_to_notion: bool = True,
        url_filter: Callable[[str], bool] | None = None,
    ) -> CrawlResult:
        """
        Crawl all URLs in a sitemap sequentially.

        Args:
            sitemap_url: URL of the sitemap
            delay: Seconds to wait between requests (default: 2.0s)
            max_pages: Maximum number of pages to process (default: 500)
            progress_callback: Function called with progress updates
            save_to_notion: Whether to save results to Notion
            url_filter: Optional function to filter URLs (return True to include)

        Returns:
            CrawlResult with all analyzed pages
        """
        # Parse site info
        parsed_sitemap = urlparse(sitemap_url)
        site = f"{parsed_sitemap.scheme}://{parsed_sitemap.netloc}"
        site_domain = parsed_sitemap.netloc

        # Generate audit ID
        audit_id = f"{site_domain}-pages-{datetime.now().strftime('%Y%m%d-%H%M%S')}"

        logger.info(f"Starting sitemap crawl: {sitemap_url}")
        logger.info(f"Audit ID: {audit_id}")
        logger.info(f"Delay between requests: {delay}s")

        # Initialize progress tracking
        progress = CrawlProgress(
            audit_id=audit_id,
            site=site,
            status="running",
        )

        # Fetch URLs
        urls = self.fetch_sitemap_urls(sitemap_url)

        # Apply URL filter if provided
        if url_filter:
            urls = [url for url in urls if url_filter(url)]
            logger.info(f"After filtering: {len(urls)} URLs")

        # Apply max pages limit (default: 500 to prevent excessive resource usage)
        if len(urls) > max_pages:
            logger.warning(f"Sitemap has {len(urls)} URLs, limiting to {max_pages} pages")
            logger.warning("Use max_pages parameter to adjust this limit")
            urls = urls[:max_pages]
            logger.info(f"Processing {len(urls)} pages (max: {max_pages})")

        # Update progress with total URLs
        progress.total_urls = len(urls)
        progress.save_to_file()

        # Initialize result
        result = CrawlResult(
            site=site,
            sitemap_url=sitemap_url,
            audit_id=audit_id,
            total_pages=len(urls),
            successful_pages=0,
            failed_pages=0,
            start_time=datetime.now(),
            end_time=datetime.now(),
        )

        # Process each URL
        try:
            for i, url in enumerate(urls):
                progress.current_url = url
                progress.processed_urls = i
                progress.save_to_file()  # Save progress to file

                if progress_callback:
                    progress_callback(progress)

                logger.info(f"[{i+1}/{len(urls)}] Analyzing: {url}")

                try:
                    # Analyze page
                    metadata = self.analyzer.analyze_url(url)
                    result.pages_analyzed.append(metadata)

                    if metadata.status_code == 200:
                        progress.successful_urls += 1
                        result.successful_pages += 1

                        # Save to Notion
                        if save_to_notion and self.notion:
                            page_id = self._save_page_to_notion(metadata, audit_id, site)
                            if page_id:
                                result.notion_page_ids.append(page_id)
                    else:
                        progress.failed_urls += 1
                        result.failed_pages += 1

                except Exception as e:
                    logger.error(f"Failed to analyze {url}: {e}")
                    progress.failed_urls += 1
                    result.failed_pages += 1

                # Wait before next request
                if i < len(urls) - 1:  # Don't wait after last URL
                    time.sleep(delay)

            # Final progress update
            progress.processed_urls = len(urls)
            progress.status = "completed"
            if progress_callback:
                progress_callback(progress)

        except Exception as e:
            progress.status = "failed"
            progress.error_message = str(e)
            progress.save_to_file()
            raise

        # Update result
        result.end_time = datetime.now()

        # Create summary page
        if save_to_notion and self.notion:
            summary_id = self._create_crawl_summary_page(result)
            result.summary_page_id = summary_id
            progress.summary_page_id = summary_id

        # Save final progress
        progress.save_to_file()

        logger.info(f"Crawl complete: {result.successful_pages}/{result.total_pages} pages analyzed")
        logger.info(f"Duration: {result.get_duration()}")

        return result

    def _save_page_to_notion(
        self,
        metadata: PageMetadata,
        audit_id: str,
        site: str,
    ) -> str | None:
        """Save page metadata to Notion database."""
        try:
            # Build properties
            properties = {
                "Issue": {"title": [{"text": {"content": f"📄 {metadata.url}"}}]},
                "Category": {"select": {"name": "On-page SEO"}},
                "Priority": {"select": {"name": self._determine_priority(metadata)}},
                "Site": {"url": site},
                "URL": {"url": metadata.url},
                "Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
                "Found Date": {"date": {"start": datetime.now().strftime("%Y-%m-%d")}},
            }

            # Build page content
            children = self._build_page_content(metadata)

            response = self.notion.pages.create(
                parent={"database_id": self.database_id},
                properties=properties,
                children=children,
            )

            return response["id"]

        except Exception as e:
            logger.error(f"Failed to save to Notion: {e}")
            return None

    def _determine_priority(self, metadata: PageMetadata) -> str:
        """Determine priority based on issues found."""
        if len(metadata.issues) >= 3:
            return "High"
        elif len(metadata.issues) >= 1:
            return "Medium"
        elif len(metadata.warnings) >= 3:
            return "Medium"
        else:
            return "Low"

    def _build_page_content(self, metadata: PageMetadata) -> list[dict]:
        """Build Notion page content blocks from metadata."""
        children = []

        # Status summary callout
        status_emoji = "✅" if not metadata.issues else "⚠️" if len(metadata.issues) < 3 else "❌"
        children.append({
            "object": "block",
            "type": "callout",
            "callout": {
                "rich_text": [
                    {"type": "text", "text": {"content": f"Status: {metadata.status_code} | "}},
                    {"type": "text", "text": {"content": f"Response: {metadata.response_time_ms:.0f}ms | "}},
                    {"type": "text", "text": {"content": f"Issues: {len(metadata.issues)} | "}},
                    {"type": "text", "text": {"content": f"Warnings: {len(metadata.warnings)}"}},
                ],
                "icon": {"type": "emoji", "emoji": status_emoji},
                "color": "gray_background" if not metadata.issues else "yellow_background" if len(metadata.issues) < 3 else "red_background",
            }
        })

        # Meta Tags Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Meta Tags"}}]}
        })

        # Meta tags table
        meta_rows = [
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Tag"}, "annotations": {"bold": True}}],
                [{"type": "text", "text": {"content": "Value"}, "annotations": {"bold": True}}],
                [{"type": "text", "text": {"content": "Status"}, "annotations": {"bold": True}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Title"}}],
                [{"type": "text", "text": {"content": (metadata.title or "—")[:50]}}],
                [{"type": "text", "text": {"content": f"✓ {metadata.title_length} chars" if metadata.title else "✗ Missing"}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Description"}}],
                [{"type": "text", "text": {"content": (metadata.meta_description or "—")[:50]}}],
                [{"type": "text", "text": {"content": f"✓ {metadata.meta_description_length} chars" if metadata.meta_description else "✗ Missing"}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Canonical"}}],
                [{"type": "text", "text": {"content": (metadata.canonical_url or "—")[:50]}}],
                [{"type": "text", "text": {"content": "✓" if metadata.canonical_url else "✗ Missing"}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Robots"}}],
                [{"type": "text", "text": {"content": metadata.robots_meta or "—"}}],
                [{"type": "text", "text": {"content": "✓" if metadata.robots_meta else "—"}}],
            ]}},
            {"type": "table_row", "table_row": {"cells": [
                [{"type": "text", "text": {"content": "Lang"}}],
                [{"type": "text", "text": {"content": metadata.html_lang or "—"}}],
                [{"type": "text", "text": {"content": "✓" if metadata.html_lang else "—"}}],
            ]}},
        ]

        children.append({
            "object": "block",
            "type": "table",
            "table": {
                "table_width": 3,
                "has_column_header": True,
                "has_row_header": False,
                "children": meta_rows
            }
        })

        # Headings Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Headings"}}]}
        })

        children.append({
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [
                {"type": "text", "text": {"content": f"H1: {metadata.h1_count} | "}},
                {"type": "text", "text": {"content": f"Total headings: {len(metadata.headings)}"}},
            ]}
        })

        if metadata.h1_text:
            children.append({
                "object": "block",
                "type": "quote",
                "quote": {"rich_text": [{"type": "text", "text": {"content": metadata.h1_text[:200]}}]}
            })

        # Schema Data Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Structured Data"}}]}
        })

        if metadata.schema_types_found:
            children.append({
                "object": "block",
                "type": "paragraph",
                "paragraph": {"rich_text": [
                    {"type": "text", "text": {"content": "Schema types found: "}},
                    {"type": "text", "text": {"content": ", ".join(metadata.schema_types_found)}, "annotations": {"code": True}},
                ]}
            })
        else:
            children.append({
                "object": "block",
                "type": "callout",
                "callout": {
                    "rich_text": [{"type": "text", "text": {"content": "No structured data found on this page"}}],
                    "icon": {"type": "emoji", "emoji": "⚠️"},
                    "color": "yellow_background",
                }
            })

        # Open Graph Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Open Graph"}}]}
        })

        og = metadata.open_graph
        og_status = "✓ Configured" if og.og_title else "✗ Missing"
        children.append({
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [
                {"type": "text", "text": {"content": f"Status: {og_status}\n"}},
                {"type": "text", "text": {"content": f"og:title: {og.og_title or '—'}\n"}},
                {"type": "text", "text": {"content": f"og:type: {og.og_type or '—'}"}},
            ]}
        })

        # Links Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Links"}}]}
        })

        children.append({
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [
                {"type": "text", "text": {"content": f"Internal links: {metadata.internal_link_count}\n"}},
                {"type": "text", "text": {"content": f"External links: {metadata.external_link_count}"}},
            ]}
        })

        # Images Section
        children.append({
            "object": "block",
            "type": "heading_2",
            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Images"}}]}
        })

        children.append({
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [
                {"type": "text", "text": {"content": f"Total: {metadata.images_total} | "}},
                {"type": "text", "text": {"content": f"With alt: {metadata.images_with_alt} | "}},
                {"type": "text", "text": {"content": f"Without alt: {metadata.images_without_alt}"}},
            ]}
        })

        # Hreflang Section (if present)
        if metadata.hreflang_tags:
            children.append({
                "object": "block",
                "type": "heading_2",
                "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Hreflang Tags"}}]}
            })

            for tag in metadata.hreflang_tags[:10]:
                children.append({
                    "object": "block",
                    "type": "bulleted_list_item",
                    "bulleted_list_item": {"rich_text": [
                        {"type": "text", "text": {"content": f"{tag['lang']}: "}},
                        {"type": "text", "text": {"content": tag['url'], "link": {"url": tag['url']}}},
                    ]}
                })

        # Issues & Warnings Section
        if metadata.issues or metadata.warnings:
            children.append({
                "object": "block",
                "type": "heading_2",
                "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Issues & Warnings"}}]}
            })

            for issue in metadata.issues:
                children.append({
                    "object": "block",
                    "type": "to_do",
                    "to_do": {
                        "rich_text": [
                            {"type": "text", "text": {"content": "❌ "}, "annotations": {"bold": True}},
                            {"type": "text", "text": {"content": issue}},
                        ],
                        "checked": False,
                    }
                })

            for warning in metadata.warnings:
                children.append({
                    "object": "block",
                    "type": "to_do",
                    "to_do": {
                        "rich_text": [
                            {"type": "text", "text": {"content": "⚠️ "}, "annotations": {"bold": True}},
                            {"type": "text", "text": {"content": warning}},
                        ],
                        "checked": False,
                    }
                })

        return children

    def _create_crawl_summary_page(self, result: CrawlResult) -> str | None:
        """Create a summary page for the crawl."""
        try:
            site_domain = urlparse(result.site).netloc

            # Calculate statistics
            total_issues = sum(len(p.issues) for p in result.pages_analyzed)
            total_warnings = sum(len(p.warnings) for p in result.pages_analyzed)
            pages_with_issues = sum(1 for p in result.pages_analyzed if p.issues)
            pages_without_schema = sum(1 for p in result.pages_analyzed if not p.schema_types_found)
            pages_without_description = sum(1 for p in result.pages_analyzed if not p.meta_description)

            children = []

            # Header callout
            children.append({
                "object": "block",
                "type": "callout",
                "callout": {
                    "rich_text": [
                        {"type": "text", "text": {"content": "Sitemap Crawl Complete\n\n"}},
                        {"type": "text", "text": {"content": f"Audit ID: {result.audit_id}\n"}},
                        {"type": "text", "text": {"content": f"Duration: {result.get_duration()}\n"}},
                        {"type": "text", "text": {"content": f"Pages: {result.successful_pages}/{result.total_pages}"}},
                    ],
                    "icon": {"type": "emoji", "emoji": "📊"},
                    "color": "blue_background",
                }
            })

            # Statistics table
            children.append({
                "object": "block",
                "type": "heading_2",
                "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Statistics"}}]}
            })

            stats_rows = [
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Metric"}, "annotations": {"bold": True}}],
                    [{"type": "text", "text": {"content": "Count"}, "annotations": {"bold": True}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Total Pages"}}],
                    [{"type": "text", "text": {"content": str(result.total_pages)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Successfully Analyzed"}}],
                    [{"type": "text", "text": {"content": str(result.successful_pages)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Pages with Issues"}}],
                    [{"type": "text", "text": {"content": str(pages_with_issues)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Total Issues"}}],
                    [{"type": "text", "text": {"content": str(total_issues)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Total Warnings"}}],
                    [{"type": "text", "text": {"content": str(total_warnings)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Pages without Schema"}}],
                    [{"type": "text", "text": {"content": str(pages_without_schema)}}],
                ]}},
                {"type": "table_row", "table_row": {"cells": [
                    [{"type": "text", "text": {"content": "Pages without Description"}}],
                    [{"type": "text", "text": {"content": str(pages_without_description)}}],
                ]}},
            ]

            children.append({
                "object": "block",
                "type": "table",
                "table": {
                    "table_width": 2,
                    "has_column_header": True,
                    "has_row_header": False,
                    "children": stats_rows
                }
            })

            # Pages list
            children.append({
                "object": "block",
                "type": "heading_2",
                "heading_2": {"rich_text": [{"type": "text", "text": {"content": "Analyzed Pages"}}]}
            })

            children.append({
                "object": "block",
                "type": "paragraph",
                "paragraph": {"rich_text": [
                    {"type": "text", "text": {"content": f"Filter by Audit ID in the database to see all {result.successful_pages} page entries."}}
                ]}
            })

            # Create the summary page
            response = self.notion.pages.create(
                parent={"database_id": self.database_id},
                properties={
                    "Issue": {"title": [{"text": {"content": f"📊 Sitemap Crawl: {site_domain}"}}]},
                    "Category": {"select": {"name": "Technical SEO"}},
                    "Priority": {"select": {"name": "High"}},
                    "Site": {"url": result.site},
                    "Audit ID": {"rich_text": [{"text": {"content": result.audit_id}}]},
                    "Found Date": {"date": {"start": datetime.now().strftime("%Y-%m-%d")}},
                },
                children=children,
            )

            logger.info(f"Created crawl summary page: {response['id']}")
            return response["id"]

        except Exception as e:
            logger.error(f"Failed to create summary page: {e}")
            return None

def print_progress_status(progress: CrawlProgress) -> None:
    """Print formatted progress status."""
    status_emoji = {
        "running": "🔄",
        "completed": "✅",
        "failed": "❌",
    }.get(progress.status, "❓")

    print(f"""
{'=' * 60}
{status_emoji} SEO Page Analysis - {progress.status.upper()}
{'=' * 60}
Audit ID: {progress.audit_id}
Site: {progress.site}
Status: {progress.status}

Progress: {progress.processed_urls}/{progress.total_urls} pages ({progress.get_progress_percent():.1f}%)
Successful: {progress.successful_urls}
Failed: {progress.failed_urls}
Elapsed: {progress.get_elapsed_time()}
ETA: {progress.get_eta() if progress.status == 'running' else 'N/A'}

Current URL: {progress.current_url[:60] + '...' if len(progress.current_url) > 60 else progress.current_url}
""")

    if progress.summary_page_id:
        print(f"Summary: https://www.notion.so/{progress.summary_page_id.replace('-', '')}")

    if progress.error_message:
        print(f"Error: {progress.error_message}")

    print("=" * 60)

def main():
    """CLI entry point."""
    import argparse

    parser = argparse.ArgumentParser(description="Sitemap Crawler with Background Support")
    subparsers = parser.add_subparsers(dest="command", help="Commands")

    # Crawl command
    crawl_parser = subparsers.add_parser("crawl", help="Start crawling a sitemap")
    crawl_parser.add_argument("sitemap_url", help="URL of the sitemap to crawl")
    crawl_parser.add_argument("--delay", "-d", type=float, default=DEFAULT_DELAY_SECONDS,
                              help=f"Delay between requests in seconds (default: {DEFAULT_DELAY_SECONDS})")
    crawl_parser.add_argument("--max-pages", "-m", type=int, default=DEFAULT_MAX_PAGES,
                              help=f"Maximum pages to process (default: {DEFAULT_MAX_PAGES})")
    crawl_parser.add_argument("--no-notion", action="store_true",
                              help="Don't save to Notion")
    crawl_parser.add_argument("--no-limit", action="store_true",
                              help="Remove page limit (use with caution)")

    # Status command
    status_parser = subparsers.add_parser("status", help="Check crawl progress")
    status_parser.add_argument("audit_id", nargs="?", help="Specific audit ID to check (optional)")
    status_parser.add_argument("--all", "-a", action="store_true", help="Show all crawls (not just active)")

    # List command
    list_parser = subparsers.add_parser("list", help="List all crawl jobs")

    args = parser.parse_args()

    # Default to crawl if no command specified but URL provided
    if args.command is None:
        # Check if first positional arg looks like a URL
        import sys
        if len(sys.argv) > 1 and (sys.argv[1].startswith("http") or sys.argv[1].endswith(".xml")):
            args.command = "crawl"
            args.sitemap_url = sys.argv[1]
            args.delay = DEFAULT_DELAY_SECONDS
            args.max_pages = DEFAULT_MAX_PAGES
            args.no_notion = False
            args.no_limit = False
        else:
            parser.print_help()
            return

    if args.command == "status":
        if args.audit_id:
            # Show specific crawl status
            progress = get_crawl_status(args.audit_id)
            if progress:
                print_progress_status(progress)
            else:
                print(f"No crawl found with audit ID: {args.audit_id}")
        else:
            # Show active crawls
            if args.all:
                crawls = get_all_crawls()
                label = "All"
            else:
                crawls = get_active_crawls()
                label = "Active"

            if crawls:
                print(f"\n{label} Crawl Jobs ({len(crawls)}):")
                print("-" * 60)
                for p in crawls:
                    status_emoji = {"running": "🔄", "completed": "✅", "failed": "❌"}.get(p.status, "❓")
                    print(f"{status_emoji} {p.audit_id}")
                    print(f"  Site: {p.site}")
                    print(f"  Progress: {p.processed_urls}/{p.total_urls} ({p.get_progress_percent():.1f}%)")
                    print()
            else:
                print(f"No {label.lower()} crawl jobs found.")
        return

    if args.command == "list":
        crawls = get_all_crawls()
        if crawls:
            print(f"\nAll Crawl Jobs ({len(crawls)}):")
            print("-" * 80)
            print(f"{'Status':<10} {'Audit ID':<45} {'Progress':<15}")
            print("-" * 80)
            for p in crawls[:20]:  # Show last 20
                status_emoji = {"running": "🔄", "completed": "✅", "failed": "❌"}.get(p.status, "❓")
                progress_str = f"{p.processed_urls}/{p.total_urls}"
                print(f"{status_emoji} {p.status:<7} {p.audit_id:<45} {progress_str:<15}")
            if len(crawls) > 20:
                print(f"... and {len(crawls) - 20} more")
        else:
            print("No crawl jobs found.")
        return

    if args.command == "crawl":
        # Handle --no-limit option
        max_pages = args.max_pages
        if args.no_limit:
            max_pages = 999999  # Effectively unlimited
            print("⚠️ WARNING: Page limit disabled. This may take a very long time!")

        def progress_callback(progress: CrawlProgress):
            pct = progress.get_progress_percent()
            print(f"\r[{pct:5.1f}%] {progress.processed_urls}/{progress.total_urls} pages | "
                  f"Success: {progress.successful_urls} | Failed: {progress.failed_urls} | "
                  f"ETA: {progress.get_eta()}", end="", flush=True)

        crawler = SitemapCrawler()
        result = crawler.crawl_sitemap(
            args.sitemap_url,
            delay=args.delay,
            max_pages=max_pages,
            progress_callback=progress_callback,
            save_to_notion=not args.no_notion,
        )

        print()  # New line after progress
        print()
        print("=" * 60)
        print("CRAWL COMPLETE")
        print("=" * 60)
        print(f"Audit ID: {result.audit_id}")
        print(f"Total Pages: {result.total_pages}")
        print(f"Successful: {result.successful_pages}")
        print(f"Failed: {result.failed_pages}")
        print(f"Duration: {result.get_duration()}")
        if result.summary_page_id:
            print(f"Summary Page: https://www.notion.so/{result.summary_page_id.replace('-', '')}")


if __name__ == "__main__":
    main()

@@ -0,0 +1,467 @@
"""
Sitemap Validator - Validate XML sitemaps
==========================================
Purpose: Parse and validate XML sitemaps for SEO compliance
Python: 3.10+
Usage:
    python sitemap_validator.py --url https://example.com/sitemap.xml
"""

import argparse
import asyncio
import gzip
import json
import logging
import re
from dataclasses import dataclass, field
from datetime import datetime
from io import BytesIO
from typing import Any
from urllib.parse import urljoin, urlparse

import aiohttp
import requests
from lxml import etree

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

@dataclass
class SitemapIssue:
    """Represents a sitemap validation issue."""

    severity: str  # "error", "warning", "info"
    message: str
    url: str | None = None
    suggestion: str | None = None


@dataclass
class SitemapEntry:
    """Represents a single URL entry in sitemap."""

    loc: str
    lastmod: str | None = None
    changefreq: str | None = None
    priority: float | None = None
    status_code: int | None = None


@dataclass
class SitemapResult:
    """Complete sitemap validation result."""

    url: str
    sitemap_type: str  # "urlset" or "sitemapindex"
    entries: list[SitemapEntry] = field(default_factory=list)
    child_sitemaps: list[str] = field(default_factory=list)
    issues: list[SitemapIssue] = field(default_factory=list)
    valid: bool = True
    stats: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON output."""
        return {
            "url": self.url,
            "sitemap_type": self.sitemap_type,
            "valid": self.valid,
            "stats": self.stats,
            "issues": [
                {
                    "severity": i.severity,
                    "message": i.message,
                    "url": i.url,
                    "suggestion": i.suggestion,
                }
                for i in self.issues
            ],
            "entries_count": len(self.entries),
            "child_sitemaps": self.child_sitemaps,
            "timestamp": self.timestamp,
        }

class SitemapValidator:
    """Validate XML sitemaps."""

    SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
    MAX_URLS = 50000
    MAX_SIZE_BYTES = 50 * 1024 * 1024  # 50MB

    VALID_CHANGEFREQ = {
        "always", "hourly", "daily", "weekly",
        "monthly", "yearly", "never"
    }

    def __init__(self, check_urls: bool = False, max_concurrent: int = 10):
        self.check_urls = check_urls
        self.max_concurrent = max_concurrent
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
        })

    def fetch_sitemap(self, url: str) -> tuple[bytes, bool]:
        """Fetch sitemap content, handling gzip compression."""
        try:
            response = self.session.get(url, timeout=30)
            response.raise_for_status()

            content = response.content
            is_gzipped = False

            # Check if gzipped
            if url.endswith(".gz") or response.headers.get(
                "Content-Encoding"
            ) == "gzip":
                try:
                    content = gzip.decompress(content)
                    is_gzipped = True
                except gzip.BadGzipFile:
                    pass

            return content, is_gzipped
        except requests.RequestException as e:
            raise RuntimeError(f"Failed to fetch sitemap: {e}")

    def parse_sitemap(self, content: bytes) -> tuple[str, list[dict]]:
        """Parse sitemap XML content."""
        try:
            root = etree.fromstring(content)
        except etree.XMLSyntaxError as e:
            raise ValueError(f"Invalid XML: {e}")

        # Namespace map for element lookups
        nsmap = {"sm": self.SITEMAP_NS}

        # Check if it's a sitemap index or urlset
        if root.tag == f"{{{self.SITEMAP_NS}}}sitemapindex":
            sitemap_type = "sitemapindex"
            entries = []
            for sitemap in root.findall("sm:sitemap", nsmap):
                entry = {}
                loc = sitemap.find("sm:loc", nsmap)
                if loc is not None and loc.text:
                    entry["loc"] = loc.text.strip()
                lastmod = sitemap.find("sm:lastmod", nsmap)
                if lastmod is not None and lastmod.text:
                    entry["lastmod"] = lastmod.text.strip()
                if entry.get("loc"):
                    entries.append(entry)
        elif root.tag == f"{{{self.SITEMAP_NS}}}urlset":
            sitemap_type = "urlset"
            entries = []
            for url in root.findall("sm:url", nsmap):
                entry = {}
                loc = url.find("sm:loc", nsmap)
                if loc is not None and loc.text:
                    entry["loc"] = loc.text.strip()
                lastmod = url.find("sm:lastmod", nsmap)
                if lastmod is not None and lastmod.text:
                    entry["lastmod"] = lastmod.text.strip()
                changefreq = url.find("sm:changefreq", nsmap)
                if changefreq is not None and changefreq.text:
                    entry["changefreq"] = changefreq.text.strip().lower()
                priority = url.find("sm:priority", nsmap)
                if priority is not None and priority.text:
                    try:
                        entry["priority"] = float(priority.text.strip())
                    except ValueError:
                        entry["priority"] = None
                if entry.get("loc"):
                    entries.append(entry)
        else:
            raise ValueError(f"Unknown sitemap type: {root.tag}")

        return sitemap_type, entries

    def validate(self, url: str) -> SitemapResult:
        """Validate a sitemap URL."""
        result = SitemapResult(url=url, sitemap_type="unknown")

        # Fetch sitemap
        try:
            content, is_gzipped = self.fetch_sitemap(url)
        except RuntimeError as e:
            result.issues.append(SitemapIssue(
                severity="error",
                message=str(e),
                url=url,
            ))
            result.valid = False
            return result

        # Check size
        if len(content) > self.MAX_SIZE_BYTES:
            result.issues.append(SitemapIssue(
                severity="error",
                message=f"Sitemap exceeds 50MB limit ({len(content) / 1024 / 1024:.2f}MB)",
                url=url,
                suggestion="Split sitemap into smaller files using sitemap index",
            ))

        # Parse XML
        try:
            sitemap_type, entries = self.parse_sitemap(content)
        except ValueError as e:
            result.issues.append(SitemapIssue(
                severity="error",
                message=str(e),
                url=url,
            ))
            result.valid = False
            return result

        result.sitemap_type = sitemap_type

        # Process entries
        if sitemap_type == "sitemapindex":
            result.child_sitemaps = [e["loc"] for e in entries]
            result.stats = {
                "child_sitemaps_count": len(entries),
            }
        else:
            # Validate URL entries
            url_count = len(entries)
            result.stats["url_count"] = url_count

            if url_count > self.MAX_URLS:
                result.issues.append(SitemapIssue(
                    severity="error",
                    message=f"Sitemap exceeds 50,000 URL limit ({url_count} URLs)",
                    url=url,
                    suggestion="Split into multiple sitemaps with sitemap index",
                ))

            if url_count == 0:
                result.issues.append(SitemapIssue(
                    severity="warning",
                    message="Sitemap is empty (no URLs)",
                    url=url,
                ))

            # Validate individual entries
            seen_urls = set()
            invalid_lastmod = 0
            invalid_changefreq = 0
            invalid_priority = 0

            for entry in entries:
                loc = entry.get("loc", "")

                # Check for duplicates
                if loc in seen_urls:
                    result.issues.append(SitemapIssue(
                        severity="warning",
                        message="Duplicate URL in sitemap",
                        url=loc,
                    ))
                seen_urls.add(loc)

                # Validate lastmod format
                lastmod = entry.get("lastmod")
                if lastmod:
                    if not self._validate_date(lastmod):
                        invalid_lastmod += 1

                # Validate changefreq
                changefreq = entry.get("changefreq")
                if changefreq and changefreq not in self.VALID_CHANGEFREQ:
                    invalid_changefreq += 1

                # Validate priority
                priority = entry.get("priority")
                if priority is not None:
                    if not (0.0 <= priority <= 1.0):
                        invalid_priority += 1

                # Create entry object
                result.entries.append(SitemapEntry(
                    loc=loc,
                    lastmod=lastmod,
                    changefreq=changefreq,
                    priority=priority,
                ))

            # Add summary issues
            if invalid_lastmod > 0:
                result.issues.append(SitemapIssue(
                    severity="warning",
                    message=f"{invalid_lastmod} URLs with invalid lastmod format",
                    suggestion="Use ISO 8601 format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+TZ)",
                ))

            if invalid_changefreq > 0:
                result.issues.append(SitemapIssue(
                    severity="info",
                    message=f"{invalid_changefreq} URLs with invalid changefreq",
                    suggestion="Use: always, hourly, daily, weekly, monthly, yearly, never",
                ))

            if invalid_priority > 0:
                result.issues.append(SitemapIssue(
                    severity="warning",
                    message=f"{invalid_priority} URLs with invalid priority (must be 0.0-1.0)",
                ))

            result.stats.update({
                "invalid_lastmod": invalid_lastmod,
                "invalid_changefreq": invalid_changefreq,
                "invalid_priority": invalid_priority,
                "has_lastmod": sum(1 for e in result.entries if e.lastmod),
                "has_changefreq": sum(1 for e in result.entries if e.changefreq),
                "has_priority": sum(1 for e in result.entries if e.priority is not None),
            })

        # Check URLs if requested
        if self.check_urls and result.entries:
            asyncio.run(self._check_url_status(result))

        # Determine validity
        result.valid = not any(i.severity == "error" for i in result.issues)

        return result

def _validate_date(self, date_str: str) -> bool:
|
||||||
|
"""Validate ISO 8601 date format."""
|
||||||
|
patterns = [
|
||||||
|
r"^\d{4}-\d{2}-\d{2}$",
|
||||||
|
r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",
|
||||||
|
]
|
||||||
|
return any(re.match(p, date_str) for p in patterns)
|
||||||
|
|
||||||
|
async def _check_url_status(self, result: SitemapResult) -> None:
|
||||||
|
"""Check HTTP status of URLs in sitemap."""
|
||||||
|
semaphore = asyncio.Semaphore(self.max_concurrent)
|
||||||
|
|
||||||
|
async def check_url(entry: SitemapEntry) -> None:
|
||||||
|
async with semaphore:
|
||||||
|
try:
|
||||||
|
async with aiohttp.ClientSession() as session:
|
||||||
|
async with session.head(
|
||||||
|
entry.loc,
|
||||||
|
timeout=aiohttp.ClientTimeout(total=10),
|
||||||
|
allow_redirects=True,
|
||||||
|
) as response:
|
||||||
|
entry.status_code = response.status
|
||||||
|
except Exception:
|
||||||
|
entry.status_code = 0
|
||||||
|
|
||||||
|
await asyncio.gather(*[check_url(e) for e in result.entries[:100]])
|
||||||
|
|
||||||
|
# Count status codes
|
||||||
|
status_counts = {}
|
||||||
|
for entry in result.entries:
|
||||||
|
if entry.status_code:
|
||||||
|
status_counts[entry.status_code] = (
|
||||||
|
status_counts.get(entry.status_code, 0) + 1
|
||||||
|
)
|
||||||
|
|
||||||
|
result.stats["url_status_codes"] = status_counts
|
||||||
|
|
||||||
|
# Add issues for non-200 URLs
|
||||||
|
error_count = sum(
|
||||||
|
1 for e in result.entries
|
||||||
|
if e.status_code and e.status_code >= 400
|
||||||
|
)
|
||||||
|
if error_count > 0:
|
||||||
|
result.issues.append(SitemapIssue(
|
||||||
|
severity="warning",
|
||||||
|
message=f"{error_count} URLs returning error status codes (4xx/5xx)",
|
||||||
|
suggestion="Remove or fix broken URLs in sitemap",
|
||||||
|
))
|
||||||
|
|
||||||
|
def generate_report(self, result: SitemapResult) -> str:
|
||||||
|
"""Generate human-readable validation report."""
|
||||||
|
lines = [
|
||||||
|
"=" * 60,
|
||||||
|
"Sitemap Validation Report",
|
||||||
|
"=" * 60,
|
||||||
|
f"URL: {result.url}",
|
||||||
|
f"Type: {result.sitemap_type}",
|
||||||
|
f"Valid: {'Yes' if result.valid else 'No'}",
|
||||||
|
f"Timestamp: {result.timestamp}",
|
||||||
|
"",
|
||||||
|
]
|
||||||
|
|
||||||
|
lines.append("Statistics:")
|
||||||
|
for key, value in result.stats.items():
|
||||||
|
lines.append(f" {key}: {value}")
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
if result.child_sitemaps:
|
||||||
|
lines.append(f"Child Sitemaps ({len(result.child_sitemaps)}):")
|
||||||
|
for sitemap in result.child_sitemaps[:10]:
|
||||||
|
lines.append(f" - {sitemap}")
|
||||||
|
if len(result.child_sitemaps) > 10:
|
||||||
|
lines.append(f" ... and {len(result.child_sitemaps) - 10} more")
|
||||||
|
lines.append("")
|
||||||
|
|
||||||
|
if result.issues:
|
||||||
|
lines.append("Issues Found:")
|
||||||
|
errors = [i for i in result.issues if i.severity == "error"]
|
||||||
|
warnings = [i for i in result.issues if i.severity == "warning"]
|
||||||
|
infos = [i for i in result.issues if i.severity == "info"]
|
||||||
|
|
||||||
|
if errors:
|
||||||
|
lines.append(f"\n ERRORS ({len(errors)}):")
|
||||||
|
for issue in errors:
|
||||||
|
lines.append(f" - {issue.message}")
|
||||||
|
if issue.url:
|
||||||
|
lines.append(f" URL: {issue.url}")
|
||||||
|
if issue.suggestion:
|
||||||
|
lines.append(f" Suggestion: {issue.suggestion}")
|
||||||
|
|
||||||
|
if warnings:
|
||||||
|
lines.append(f"\n WARNINGS ({len(warnings)}):")
|
||||||
|
for issue in warnings:
|
||||||
|
lines.append(f" - {issue.message}")
|
||||||
|
if issue.suggestion:
|
||||||
|
lines.append(f" Suggestion: {issue.suggestion}")
|
||||||
|
|
||||||
|
if infos:
|
||||||
|
lines.append(f"\n INFO ({len(infos)}):")
|
||||||
|
for issue in infos:
|
||||||
|
lines.append(f" - {issue.message}")
|
||||||
|
|
||||||
|
lines.append("")
|
||||||
|
lines.append("=" * 60)
|
||||||
|
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Main entry point for CLI usage."""
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Validate XML sitemaps",
|
||||||
|
)
|
||||||
|
parser.add_argument("--url", "-u", required=True, help="Sitemap URL to validate")
|
||||||
|
parser.add_argument("--check-urls", action="store_true",
|
||||||
|
help="Check HTTP status of URLs (slower)")
|
||||||
|
parser.add_argument("--output", "-o", help="Output file for JSON report")
|
||||||
|
parser.add_argument("--json", action="store_true", help="Output as JSON")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
validator = SitemapValidator(check_urls=args.check_urls)
|
||||||
|
result = validator.validate(args.url)
|
||||||
|
|
||||||
|
if args.json or args.output:
|
||||||
|
output = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
|
||||||
|
if args.output:
|
||||||
|
with open(args.output, "w", encoding="utf-8") as f:
|
||||||
|
f.write(output)
|
||||||
|
logger.info(f"Report written to {args.output}")
|
||||||
|
else:
|
||||||
|
print(output)
|
||||||
|
else:
|
||||||
|
print(validator.generate_report(result))
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
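The `_validate_date` check above can be exercised in isolation. Below is a minimal standalone sketch of the same pattern-matching logic, re-declared here so it runs without importing the skill's modules; the function name `validate_lastmod` is illustrative, not part of the skill's API:

```python
import re

# Standalone sketch of the lastmod check used above: a plain date or a
# full timestamp is accepted, anything else (e.g. US-style dates) is not.
ISO_PATTERNS = [
    r"^\d{4}-\d{2}-\d{2}$",                    # YYYY-MM-DD
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",   # YYYY-MM-DDTHH:MM:SS(+TZ)
]

def validate_lastmod(date_str: str) -> bool:
    return any(re.match(p, date_str) for p in ISO_PATTERNS)

print(validate_lastmod("2024-06-01"))                 # True
print(validate_lastmod("2024-06-01T12:30:00+09:00"))  # True
print(validate_lastmod("06/01/2024"))                 # False
```

Note that the second pattern is deliberately unanchored at the end, so any timezone suffix after the seconds field passes.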
@@ -0,0 +1,88 @@
{
  "_comment": "Default OurDigital SEO Audit Log Database",
  "database_id": "2c8581e5-8a1e-8035-880b-e38cefc2f3ef",
  "url": "https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef",

  "properties": {
    "Issue": {
      "type": "title",
      "description": "Primary identifier - issue title"
    },
    "Site": {
      "type": "url",
      "description": "The audited site URL (e.g., https://blog.ourdigital.org)"
    },
    "Category": {
      "type": "select",
      "options": [
        { "name": "Technical SEO", "color": "blue" },
        { "name": "On-page SEO", "color": "green" },
        { "name": "Content", "color": "purple" },
        { "name": "Local SEO", "color": "orange" },
        { "name": "Performance", "color": "red" },
        { "name": "Schema/Structured Data", "color": "yellow" },
        { "name": "Sitemap", "color": "pink" },
        { "name": "Robots.txt", "color": "gray" }
      ]
    },
    "Priority": {
      "type": "select",
      "options": [
        { "name": "Critical", "color": "red" },
        { "name": "High", "color": "orange" },
        { "name": "Medium", "color": "yellow" },
        { "name": "Low", "color": "gray" }
      ]
    },
    "Status": {
      "type": "status",
      "description": "Managed by Notion - default options: Not started, In progress, Done"
    },
    "URL": {
      "type": "url",
      "description": "Specific page with the issue"
    },
    "Found Date": {
      "type": "date",
      "description": "When issue was discovered"
    },
    "Audit ID": {
      "type": "rich_text",
      "description": "Groups findings from same audit session (format: domain-YYYYMMDD-HHMMSS)"
    }
  },

  "page_content_template": {
    "_comment": "Each finding page contains the following content blocks",
    "blocks": [
      {
        "type": "heading_2",
        "content": "Description"
      },
      {
        "type": "paragraph",
        "content": "{{description}}"
      },
      {
        "type": "heading_2",
        "content": "Impact"
      },
      {
        "type": "callout",
        "icon": "⚠️",
        "content": "{{impact}}"
      },
      {
        "type": "heading_2",
        "content": "Recommendation"
      },
      {
        "type": "callout",
        "icon": "💡",
        "content": "{{recommendation}}"
      }
    ]
  },

  "description": "Centralized SEO audit findings with categorized issues, priorities, and tracking status. Generated by ourdigital-seo-audit skill."
}
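To land in this database, each audit finding has to be shaped into the Notion API's property-value format. The sketch below shows one plausible mapping from a finding dict to a page-creation payload matching the schema above; the helper name, the finding dict layout, and the omitted HTTP call are illustrative assumptions, not the skill's actual exporter code:

```python
# Sketch: shape one audit finding into a Notion "create page" payload.
# Property names and types mirror the database config above; sending the
# payload (via notion-client or raw HTTP) is intentionally left out.
def finding_to_notion_page(database_id: str, finding: dict) -> dict:
    return {
        "parent": {"database_id": database_id},
        "properties": {
            "Issue": {"title": [{"text": {"content": finding["issue"]}}]},
            "Site": {"url": finding["site"]},
            "Category": {"select": {"name": finding["category"]}},
            "Priority": {"select": {"name": finding["priority"]}},
            "URL": {"url": finding.get("url")},
            "Found Date": {"date": {"start": finding["found_date"]}},
            "Audit ID": {"rich_text": [{"text": {"content": finding["audit_id"]}}]},
        },
    }

payload = finding_to_notion_page(
    "2c8581e5-8a1e-8035-880b-e38cefc2f3ef",
    {
        "issue": "Missing canonical tag",
        "site": "https://blog.ourdigital.org",
        "category": "Technical SEO",
        "priority": "High",
        "url": "https://blog.ourdigital.org/post/1",
        "found_date": "2024-06-01",
        "audit_id": "blog.ourdigital.org-20240601-120000",
    },
)
```

The "Status" property is omitted on purpose: Notion-managed status properties default to "Not started" when a page is created.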
@@ -0,0 +1,32 @@
{
  "@context": "https://schema.org",
  "@type": "{{article_type}}",
  "headline": "{{headline}}",
  "description": "{{description}}",
  "image": [
    "{{image_url_1}}",
    "{{image_url_2}}"
  ],
  "datePublished": "{{date_published}}",
  "dateModified": "{{date_modified}}",
  "author": {
    "@type": "Person",
    "name": "{{author_name}}",
    "url": "{{author_url}}"
  },
  "publisher": {
    "@type": "Organization",
    "name": "{{publisher_name}}",
    "logo": {
      "@type": "ImageObject",
      "url": "{{publisher_logo_url}}"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "{{page_url}}"
  },
  "articleSection": "{{section}}",
  "wordCount": "{{word_count}}",
  "keywords": "{{keywords}}"
}
@@ -0,0 +1,24 @@
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "{{level_1_name}}",
      "item": "{{level_1_url}}"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "{{level_2_name}}",
      "item": "{{level_2_url}}"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "{{level_3_name}}",
      "item": "{{level_3_url}}"
    }
  ]
}
@@ -0,0 +1,30 @@
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "{{question_1}}",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "{{answer_1}}"
      }
    },
    {
      "@type": "Question",
      "name": "{{question_2}}",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "{{answer_2}}"
      }
    },
    {
      "@type": "Question",
      "name": "{{question_3}}",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "{{answer_3}}"
      }
    }
  ]
}
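Templates like the one above carry `{{placeholder}}` tokens that are substituted at generation time. A minimal sketch of that substitution, assuming a simple regex pass over the raw template text (the actual schema_generator.py implementation may differ):

```python
import re

# Sketch: fill {{placeholder}} tokens in a schema template string.
# Unknown placeholders are left untouched so missing values are visible.
def render_template(template: str, values: dict) -> str:
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: values.get(m.group(1), m.group(0)),
        template,
    )

faq_fragment = '{"@type": "Question", "name": "{{question_1}}"}'
print(render_template(faq_fragment, {"question_1": "What is SEO?"}))
# {"@type": "Question", "name": "What is SEO?"}
```

Substituting into raw JSON like this assumes the values contain no unescaped quotes; a stricter generator would load the template with `json.loads` and substitute into the parsed structure instead.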
@@ -0,0 +1,47 @@
{
  "@context": "https://schema.org",
  "@type": "{{business_type}}",
  "name": "{{name}}",
  "description": "{{description}}",
  "url": "{{url}}",
  "telephone": "{{phone}}",
  "email": "{{email}}",
  "image": "{{image_url}}",
  "priceRange": "{{price_range}}",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "{{street_address}}",
    "addressLocality": "{{city}}",
    "addressRegion": "{{region}}",
    "postalCode": "{{postal_code}}",
    "addressCountry": "{{country}}"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": "{{latitude}}",
    "longitude": "{{longitude}}"
  },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "{{weekday_opens}}",
      "closes": "{{weekday_closes}}"
    },
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Saturday", "Sunday"],
      "opens": "{{weekend_opens}}",
      "closes": "{{weekend_closes}}"
    }
  ],
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "{{rating}}",
    "reviewCount": "{{review_count}}"
  },
  "sameAs": [
    "{{facebook_url}}",
    "{{instagram_url}}"
  ]
}
@@ -0,0 +1,37 @@
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "{{name}}",
  "url": "{{url}}",
  "logo": "{{logo_url}}",
  "description": "{{description}}",
  "foundingDate": "{{founding_date}}",
  "founders": [
    {
      "@type": "Person",
      "name": "{{founder_name}}"
    }
  ],
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "{{street_address}}",
    "addressLocality": "{{city}}",
    "addressRegion": "{{region}}",
    "postalCode": "{{postal_code}}",
    "addressCountry": "{{country}}"
  },
  "contactPoint": [
    {
      "@type": "ContactPoint",
      "telephone": "{{phone}}",
      "contactType": "customer service",
      "availableLanguage": ["Korean", "English"]
    }
  ],
  "sameAs": [
    "{{facebook_url}}",
    "{{twitter_url}}",
    "{{linkedin_url}}",
    "{{instagram_url}}"
  ]
}
@@ -0,0 +1,76 @@
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "{{name}}",
  "description": "{{description}}",
  "image": [
    "{{image_url_1}}",
    "{{image_url_2}}",
    "{{image_url_3}}"
  ],
  "sku": "{{sku}}",
  "mpn": "{{mpn}}",
  "gtin13": "{{gtin13}}",
  "brand": {
    "@type": "Brand",
    "name": "{{brand_name}}"
  },
  "offers": {
    "@type": "Offer",
    "url": "{{product_url}}",
    "price": "{{price}}",
    "priceCurrency": "{{currency}}",
    "priceValidUntil": "{{price_valid_until}}",
    "availability": "https://schema.org/{{availability}}",
    "itemCondition": "https://schema.org/{{condition}}",
    "seller": {
      "@type": "Organization",
      "name": "{{seller_name}}"
    },
    "shippingDetails": {
      "@type": "OfferShippingDetails",
      "shippingRate": {
        "@type": "MonetaryAmount",
        "value": "{{shipping_cost}}",
        "currency": "{{currency}}"
      },
      "deliveryTime": {
        "@type": "ShippingDeliveryTime",
        "handlingTime": {
          "@type": "QuantitativeValue",
          "minValue": "{{handling_min_days}}",
          "maxValue": "{{handling_max_days}}",
          "unitCode": "DAY"
        },
        "transitTime": {
          "@type": "QuantitativeValue",
          "minValue": "{{transit_min_days}}",
          "maxValue": "{{transit_max_days}}",
          "unitCode": "DAY"
        }
      }
    }
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "{{rating}}",
    "reviewCount": "{{review_count}}",
    "bestRating": "5",
    "worstRating": "1"
  },
  "review": [
    {
      "@type": "Review",
      "reviewRating": {
        "@type": "Rating",
        "ratingValue": "{{review_rating}}",
        "bestRating": "5"
      },
      "author": {
        "@type": "Person",
        "name": "{{reviewer_name}}"
      },
      "reviewBody": "{{review_text}}"
    }
  ]
}
@@ -0,0 +1,25 @@
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "{{site_name}}",
  "alternateName": "{{alternate_name}}",
  "url": "{{url}}",
  "description": "{{description}}",
  "inLanguage": "{{language}}",
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "{{search_url_template}}"
    },
    "query-input": "required name=search_term_string"
  },
  "publisher": {
    "@type": "Organization",
    "name": "{{publisher_name}}",
    "logo": {
      "@type": "ImageObject",
      "url": "{{logo_url}}"
    }
  }
}