# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Skill Overview

**ourdigital-seo-audit** is a comprehensive SEO audit skill that performs technical SEO analysis, schema validation, sitemap/robots.txt checks, and Core Web Vitals measurement. Results are exported to a Notion database.

## Architecture

```
12-ourdigital-seo-audit/
├── SKILL.md                  # Skill definition with YAML frontmatter
├── scripts/                  # Python automation scripts
│   ├── base_client.py        # Shared utilities: RateLimiter, ConfigManager
│   ├── full_audit.py         # Main orchestrator (SEOAuditor class)
│   ├── gsc_client.py         # Google Search Console API
│   ├── pagespeed_client.py   # PageSpeed Insights API
│   ├── schema_validator.py   # JSON-LD/Microdata extraction & validation
│   ├── schema_generator.py   # Generate schema markup from templates
│   ├── sitemap_validator.py  # XML sitemap validation
│   ├── sitemap_crawler.py    # Async sitemap URL crawler
│   ├── robots_checker.py     # Robots.txt parser & rule tester
│   ├── page_analyzer.py      # On-page SEO analysis
│   └── notion_reporter.py    # Notion database integration
├── templates/
│   ├── schema_templates/     # JSON-LD templates (article, faq, product, etc.)
│   └── notion_database_schema.json
├── reference.md              # API documentation
└── USER_GUIDE.md             # End-user documentation
```

## Script Relationships

```
full_audit.py (orchestrator)
├── robots_checker.py    → RobotsChecker.analyze()
├── sitemap_validator.py → SitemapValidator.validate()
├── schema_validator.py  → SchemaValidator.validate()
├── pagespeed_client.py  → PageSpeedClient.analyze()
└── notion_reporter.py   → NotionReporter.create_audit_report()

All scripts use:
└── base_client.py → ConfigManager (credentials), RateLimiter, BaseAsyncClient
```

## Common Commands

### Install Dependencies

```bash
pip install -r scripts/requirements.txt
```

### Run Full SEO Audit

```bash
python scripts/full_audit.py --url https://example.com --output console
python scripts/full_audit.py --url https://example.com --output notion
python scripts/full_audit.py --url https://example.com --json
```

### Individual Script Usage

```bash
# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com

# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Schema validation
python scripts/schema_validator.py --url https://example.com

# Schema generation
python scripts/schema_generator.py --type organization --url https://example.com

# Core Web Vitals
python scripts/pagespeed_client.py --url https://example.com --strategy mobile

# Search Console data
python scripts/gsc_client.py --site sc-domain:example.com --action summary
```

## Key Classes and Data Flow

### AuditResult (full_audit.py)

Central dataclass holding all audit findings:

- `robots`, `sitemap`, `schema`, `performance` - Raw results from each checker
- `findings: list[SEOFinding]` - Normalized issues for Notion export
- `summary` - Aggregated statistics

### SEOFinding (notion_reporter.py)

Standard format for all audit issues:

```python
@dataclass
class SEOFinding:
    issue: str           # Issue title
    category: str        # Technical SEO, Performance, Schema, etc.
    priority: str        # Critical, High, Medium, Low
    url: str | None      # Affected URL
    recommendation: str  # How to fix
    audit_id: str        # Groups findings from same session
```

### NotionReporter

Creates findings in Notion with two modes:

1. Individual pages per finding in default database
2. Summary page with checklist table via `create_audit_report()`

Default database: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`

## Google API Configuration

**Service Account**: `~/.credential/ourdigital-seo-agent.json`

| API | Authentication | Usage |
|-----|----------------|-------|
| Search Console | Service account | `gsc_client.py` |
| PageSpeed Insights | API key (`PAGESPEED_API_KEY`) | `pagespeed_client.py` |
| GA4 Analytics | Service account | Traffic data |

Environment variables are loaded from `~/Workspaces/claude-workspace/.env`.

## MCP Tool Integration

The skill uses MCP tools as primary data sources (Tier 1):

- `mcp__firecrawl__scrape/crawl` - Web page content extraction
- `mcp__perplexity__search` - Competitor research
- `mcp__notion__*` - Database operations

Python scripts are Tier 2, used for Google API data collection.

## Extending the Skill

### Adding a New Schema Type

1. Add JSON template to `templates/schema_templates/`
2. Update `REQUIRED_PROPERTIES` and `RECOMMENDED_PROPERTIES` in `schema_validator.py`
3. Add type-specific validation in `_validate_type_specific()`

### Adding a New Audit Check

1. Create checker class following pattern in existing scripts
2. Return dataclass with `to_dict()` method and `issues` list
3. Add processing method in `SEOAuditor` (`_process_*_findings`)
4. Wire into `run_audit()` in `full_audit.py`

## Rate Limits

| Service | Limit | Handled By |
|---------|-------|------------|
| Firecrawl | Per plan | MCP |
| PageSpeed | 25,000/day | `base_client.py` RateLimiter |
| Search Console | 1,200/min | Manual delays |
| Notion | 3 req/sec | Semaphore in reporter |
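The property tables referenced in "Adding a New Schema Type" might look roughly like this. The dict contents and the `validate_required` helper are illustrative assumptions, not the actual code in `schema_validator.py`:

```python
# Hypothetical shape of the property tables in schema_validator.py;
# the real dict contents and validation entry points may differ.
REQUIRED_PROPERTIES = {
    "Article": ["headline", "author", "datePublished"],
    "FAQPage": ["mainEntity"],
}

RECOMMENDED_PROPERTIES = {
    "Article": ["image", "dateModified"],
    "FAQPage": [],
}


def validate_required(schema_type: str, data: dict) -> list[str]:
    """Return the required properties missing from a JSON-LD object."""
    return [
        prop
        for prop in REQUIRED_PROPERTIES.get(schema_type, [])
        if prop not in data
    ]
```

Adding a new type is then just a matter of adding an entry to each dict (plus any type-specific checks in `_validate_type_specific()`).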
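The four steps under "Adding a New Audit Check" can be sketched as follows. `HreflangChecker`, its result fields, and `process_hreflang_findings` are hypothetical names invented for illustration; the real scripts define their own classes and the processing step lives as a `_process_*_findings` method on `SEOAuditor`:

```python
# Minimal sketch of the checker pattern (steps 1-4), with invented names.
from dataclasses import dataclass, field


@dataclass
class HreflangResult:
    """Step 2: result dataclass with a to_dict() method and an issues list."""
    url: str
    issues: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {"url": self.url, "issues": self.issues}


class HreflangChecker:
    """Step 1: checker class following the pattern in existing scripts."""

    def analyze(self, url: str, hreflang_tags: list[str]) -> HreflangResult:
        result = HreflangResult(url=url)
        if not hreflang_tags:
            result.issues.append("No hreflang tags found")
        elif "x-default" not in hreflang_tags:
            result.issues.append("Missing x-default hreflang")
        return result


def process_hreflang_findings(result: HreflangResult) -> list[dict]:
    """Step 3, shown as a plain function: normalize issues into findings.
    Step 4 would call the checker and this processor from run_audit()."""
    return [
        {"issue": issue, "category": "Technical SEO", "url": result.url}
        for issue in result.issues
    ]
```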
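As a rough illustration of the per-second throttling in the table above (e.g. Notion's 3 req/sec), a fixed-window limiter could look like the sketch below. This is not the actual `RateLimiter` in `base_client.py` or the semaphore in `notion_reporter.py`, just one simple way to implement the idea:

```python
# Fixed-window rate limiter sketch: at most `rate` calls per 1-second window.
import asyncio
import time


class SimpleRateLimiter:
    def __init__(self, rate: int = 3):
        self.rate = rate
        self.window_start = 0.0
        self.calls_in_window = 0
        self._lock = asyncio.Lock()  # serialize acquire() across tasks

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            if now - self.window_start >= 1.0:
                # New window: reset the counter.
                self.window_start = now
                self.calls_in_window = 0
            if self.calls_in_window >= self.rate:
                # Window is full: sleep until it expires, then reset.
                await asyncio.sleep(1.0 - (now - self.window_start))
                self.window_start = time.monotonic()
                self.calls_in_window = 0
            self.calls_in_window += 1


async def demo() -> float:
    """6 calls at 3/sec forces at least one ~1s wait."""
    limiter = SimpleRateLimiter(rate=3)
    start = time.monotonic()
    for _ in range(6):
        await limiter.acquire()
    return time.monotonic() - start
```

Callers would `await limiter.acquire()` before each API request; anything beyond the per-second budget simply waits for the next window.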