our-claude-skills/custom-skills/99_archive/seo-audit-agent/CLAUDE.md
Andrew Yim b69e4b6f3a refactor: Reorganize skill numbering and update documentation
Skill Numbering Changes:
- 01-03: OurDigital core (was 30-32)
- 31-32: Notion tools (was 01-02)
- 99_archive: Renamed from _archive for sorting

New Files:
- AGENTS.md: Claude Code agent routing guide
- requirements.txt for 00-claude-code-setting, 32-notion-writer, 43-jamie-youtube-manager

Documentation Updates:
- CLAUDE.md: Updated skill inventory (23 skills)
- AUDIT_REPORT.md: Current completion status (91%)
- Archived REFACTORING_PLAN.md (most tasks complete)

Removed:
- ga-agent-skills/ (moved to separate repo ~/Project/dintel-ga4-agent)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 18:42:39 +07:00


# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Skill Overview
**ourdigital-seo-audit** is a comprehensive SEO audit skill that performs technical SEO analysis, schema validation, sitemap/robots.txt checks, and Core Web Vitals measurement. Results are exported to a Notion database.
## Architecture
```
12-ourdigital-seo-audit/
├── SKILL.md                      # Skill definition with YAML frontmatter
├── scripts/                      # Python automation scripts
│   ├── base_client.py            # Shared utilities: RateLimiter, ConfigManager
│   ├── full_audit.py             # Main orchestrator (SEOAuditor class)
│   ├── gsc_client.py             # Google Search Console API
│   ├── pagespeed_client.py       # PageSpeed Insights API
│   ├── schema_validator.py       # JSON-LD/Microdata extraction & validation
│   ├── schema_generator.py       # Generate schema markup from templates
│   ├── sitemap_validator.py      # XML sitemap validation
│   ├── sitemap_crawler.py        # Async sitemap URL crawler
│   ├── robots_checker.py         # Robots.txt parser & rule tester
│   ├── page_analyzer.py          # On-page SEO analysis
│   └── notion_reporter.py        # Notion database integration
├── templates/
│   ├── schema_templates/         # JSON-LD templates (article, faq, product, etc.)
│   └── notion_database_schema.json
├── reference.md                  # API documentation
└── USER_GUIDE.md                 # End-user documentation
```
## Script Relationships
```
full_audit.py (orchestrator)
├── robots_checker.py → RobotsChecker.analyze()
├── sitemap_validator.py → SitemapValidator.validate()
├── schema_validator.py → SchemaValidator.validate()
├── pagespeed_client.py → PageSpeedClient.analyze()
└── notion_reporter.py → NotionReporter.create_audit_report()
All scripts use:
└── base_client.py → ConfigManager (credentials), RateLimiter, BaseAsyncClient
```
## Common Commands
### Install Dependencies
```bash
pip install -r scripts/requirements.txt
```
### Run Full SEO Audit
```bash
python scripts/full_audit.py --url https://example.com --output console  # print report to terminal
python scripts/full_audit.py --url https://example.com --output notion   # export findings to Notion
python scripts/full_audit.py --url https://example.com --json            # emit machine-readable JSON
```
### Individual Script Usage
```bash
# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com
# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml
# Schema validation
python scripts/schema_validator.py --url https://example.com
# Schema generation
python scripts/schema_generator.py --type organization --url https://example.com
# Core Web Vitals
python scripts/pagespeed_client.py --url https://example.com --strategy mobile
# Search Console data
python scripts/gsc_client.py --site sc-domain:example.com --action summary
```
## Key Classes and Data Flow
### AuditResult (full_audit.py)
Central dataclass holding all audit findings:
- `robots`, `sitemap`, `schema`, `performance` - Raw results from each checker
- `findings: list[SEOFinding]` - Normalized issues for Notion export
- `summary` - Aggregated statistics
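A minimal sketch of that shape (field names come from this guide; the types and comments are assumptions, not the actual `full_audit.py` source):
```python
# Sketch of AuditResult; types are inferred from this document, not the real code.
from dataclasses import dataclass, field

@dataclass
class AuditResult:
    robots: dict = field(default_factory=dict)       # RobotsChecker output
    sitemap: dict = field(default_factory=dict)      # SitemapValidator output
    schema: dict = field(default_factory=dict)       # SchemaValidator output
    performance: dict = field(default_factory=dict)  # PageSpeedClient output
    findings: list = field(default_factory=list)     # normalized SEOFinding items
    summary: dict = field(default_factory=dict)      # aggregated statistics

result = AuditResult(robots={"issues": ["Disallow blocks /blog"]})
result.summary = {"total_findings": len(result.findings)}
```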
### SEOFinding (notion_reporter.py)
Standard format for all audit issues:
```python
@dataclass
class SEOFinding:
    issue: str           # Issue title
    category: str        # Technical SEO, Performance, Schema, etc.
    priority: str        # Critical, High, Medium, Low
    url: str | None      # Affected URL
    recommendation: str  # How to fix
    audit_id: str        # Groups findings from same session
```
### NotionReporter
Exports findings to Notion in one of two modes:
1. Individual pages per finding in default database
2. Summary page with checklist table via `create_audit_report()`
Default database: `2c8581e5-8a1e-8035-880b-e38cefc2f3ef`
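The two modes can be sketched as follows; the method bodies and return values are illustrative stand-ins, not the real `notion_reporter.py` implementation:
```python
# Illustrative sketch of the two export modes; the actual Notion API
# calls in notion_reporter.py are replaced with placeholder returns.
class NotionReporter:
    def __init__(self, database_id: str):
        self.database_id = database_id

    def export_findings(self, findings: list[dict]) -> list[str]:
        # Mode 1: one Notion page per finding in the default database.
        return [f"page:{f['issue']}" for f in findings]

    def create_audit_report(self, findings: list[dict]) -> str:
        # Mode 2: a single summary page with a checklist table.
        return f"report:{len(findings)} findings"

reporter = NotionReporter("2c8581e5-8a1e-8035-880b-e38cefc2f3ef")
pages = reporter.export_findings([{"issue": "Missing H1"}])
report = reporter.create_audit_report([{"issue": "Missing H1"}])
```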
## Google API Configuration
**Service Account**: `~/.credential/ourdigital-seo-agent.json`
| API | Authentication | Usage |
|-----|---------------|-------|
| Search Console | Service account | `gsc_client.py` |
| PageSpeed Insights | API key (`PAGESPEED_API_KEY`) | `pagespeed_client.py` |
| GA4 Analytics | Service account | Traffic data |
Environment variables are loaded from `~/Workspaces/claude-workspace/.env`.
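A hypothetical sketch of how those paths could be resolved; the real `ConfigManager` in `base_client.py` may work differently, and `load_env` is an illustrative helper, not an actual function in the repo:
```python
# Hypothetical credential resolution; load_env is an illustrative helper.
import os
from pathlib import Path

SERVICE_ACCOUNT = Path.home() / ".credential" / "ourdigital-seo-agent.json"
ENV_FILE = Path.home() / "Workspaces" / "claude-workspace" / ".env"

def load_env(path: Path) -> dict:
    """Parse simple KEY=VALUE lines from a .env file, skipping comments."""
    env = {}
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

env = load_env(ENV_FILE)
api_key = env.get("PAGESPEED_API_KEY") or os.environ.get("PAGESPEED_API_KEY")
```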
## MCP Tool Integration
The skill uses MCP tools as primary data sources (Tier 1):
- `mcp__firecrawl__scrape/crawl` - Web page content extraction
- `mcp__perplexity__search` - Competitor research
- `mcp__notion__*` - Database operations
The Python scripts are the secondary tier (Tier 2), used to collect data from the Google APIs.
## Extending the Skill
### Adding a New Schema Type
1. Add JSON template to `templates/schema_templates/`
2. Update `REQUIRED_PROPERTIES` and `RECOMMENDED_PROPERTIES` in `schema_validator.py`
3. Add type-specific validation in `_validate_type_specific()`
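As a sketch, the property tables in step 2 might look like this; the dict layout and the `missing_required` helper are assumptions about `schema_validator.py`'s internals, with a hypothetical `Recipe` type added:
```python
# Illustrative shape for the property tables; the real module may differ.
REQUIRED_PROPERTIES = {
    "Article": ["headline", "datePublished", "author"],
    "FAQPage": ["mainEntity"],
    # New type added per step 2 (hypothetical example):
    "Recipe": ["name", "recipeIngredient", "recipeInstructions"],
}
RECOMMENDED_PROPERTIES = {
    "Recipe": ["cookTime", "nutrition", "aggregateRating"],
}

def missing_required(schema_type: str, data: dict) -> list[str]:
    """Return required properties absent from a parsed JSON-LD object."""
    return [p for p in REQUIRED_PROPERTIES.get(schema_type, []) if p not in data]

issues = missing_required("Recipe", {"name": "Pad Thai", "recipeIngredient": []})
```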
### Adding a New Audit Check
1. Create checker class following pattern in existing scripts
2. Return dataclass with `to_dict()` method and `issues` list
3. Add processing method in `SEOAuditor` (`_process_*_findings`)
4. Wire into `run_audit()` in `full_audit.py`
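Steps 1-2 can be sketched as below; `HreflangChecker` is a hypothetical new check, and the names mirror the pattern described above rather than any existing script:
```python
# Sketch of the checker pattern; HreflangChecker is a hypothetical example.
from dataclasses import dataclass, field, asdict

@dataclass
class HreflangResult:
    url: str
    issues: list = field(default_factory=list)

    def to_dict(self) -> dict:
        return asdict(self)

class HreflangChecker:
    def analyze(self, url: str) -> HreflangResult:
        result = HreflangResult(url=url)
        # Real logic would fetch the page and inspect <link rel="alternate"> tags.
        result.issues.append({"issue": "Missing hreflang tags", "priority": "Medium"})
        return result

result = HreflangChecker().analyze("https://example.com")
```
The `SEOAuditor` side (steps 3-4) would then map `result.issues` into `SEOFinding` objects in a `_process_*_findings` method.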
## Rate Limits
| Service | Limit | Handled By |
|---------|-------|------------|
| Firecrawl | Per plan | MCP |
| PageSpeed | 25,000/day | `base_client.py` RateLimiter |
| Search Console | 1,200/min | Manual delays |
| Notion | 3 req/sec | Semaphore in reporter |
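A minimal sliding-window sketch of the `RateLimiter` idea; `base_client.py`'s actual implementation may differ:
```python
# Minimal async rate limiter sketch; not the real base_client.py code.
import asyncio
import time

class RateLimiter:
    """Allow at most `rate` calls per `per` seconds."""
    def __init__(self, rate: int, per: float = 1.0):
        self.rate, self.per = rate, per
        self.calls: list[float] = []

    async def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have fallen out of the window.
        self.calls = [t for t in self.calls if now - t < self.per]
        if len(self.calls) >= self.rate:
            await asyncio.sleep(self.per - (now - self.calls[0]))
        self.calls.append(time.monotonic())

async def demo() -> int:
    limiter = RateLimiter(rate=3, per=1.0)  # e.g. Notion's 3 req/sec
    for _ in range(3):
        await limiter.acquire()
    return len(limiter.calls)

count = asyncio.run(demo())
```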