Complete implementation of OurDigital skills with dual-platform support (Claude Desktop + Claude Code) following standardized structure.

Skills created:
- 01-ourdigital-brand-guide: Brand reference & style guidelines
- 02-ourdigital-blog: Korean blog drafts (blog.ourdigital.org)
- 03-ourdigital-journal: English essays (journal.ourdigital.org)
- 04-ourdigital-research: Research prompts & workflows
- 05-ourdigital-document: Notion-to-presentation pipeline
- 06-ourdigital-designer: Visual/image prompt generation
- 07-ourdigital-ad-manager: Ad copywriting & keyword research
- 08-ourdigital-trainer: Training materials & workshop planning
- 09-ourdigital-backoffice: Quotes, proposals, cost analysis
- 10-ourdigital-skill-creator: Meta skill for creating new skills

Features:
- YAML frontmatter with "ourdigital" or "our" prefix triggers
- Standardized directory structure (code/, desktop/, shared/, docs/)
- Shared environment setup (_ourdigital-shared/)
- Comprehensive reference documentation
- Cross-skill integration support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

# CLAUDE.md

## Overview

Technical SEO auditor for crawlability fundamentals: robots.txt validation, XML sitemap analysis, and URL accessibility checking.

## Quick Start

```bash
# Install dependencies
pip install -r scripts/requirements.txt

# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com

# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Async URL crawl (check accessibility of sitemap URLs)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `robots_checker.py` | Parse and validate robots.txt | User-agent rules, disallow patterns, sitemap declarations |
| `sitemap_validator.py` | Validate XML sitemap structure | URL count, lastmod dates, size limits, syntax errors |
| `sitemap_crawler.py` | Check URL accessibility asynchronously | HTTP status codes, response times, broken links |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |

## Robots.txt Checker

```bash
# Basic analysis
python scripts/robots_checker.py --url https://example.com

# Test specific URL against rules
python scripts/robots_checker.py --url https://example.com --test-url /admin/

# Output JSON
python scripts/robots_checker.py --url https://example.com --json
```

**Checks performed**:
- Syntax validation
- User-agent rule parsing
- Disallow/Allow pattern analysis
- Sitemap declarations
- Critical resource access (CSS/JS/images)

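
To make the checks above more concrete, here is a minimal sketch of rule parsing, sitemap discovery, and per-path access testing using only the standard library's `urllib.robotparser`. It is not the code in `robots_checker.py`: the `analyze_robots` function and the sample robots.txt body are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real run would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /assets/
Sitemap: https://example.com/sitemap.xml
"""


def analyze_robots(robots_txt: str, site: str, test_paths: list[str]) -> dict:
    """Parse robots.txt text and report declared sitemaps plus per-path access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        # site_maps() returns None when no Sitemap line is present.
        "sitemaps": parser.site_maps() or [],
        "access": {path: parser.can_fetch("*", site + path) for path in test_paths},
    }


report = analyze_robots(
    ROBOTS_TXT,
    "https://example.com",
    ["/admin/", "/assets/site.css", "/blog/"],
)
print(report)
```

Note that `urllib.robotparser` applies rules in the order they appear and does not expand wildcards, so its verdicts can differ from those of crawlers that use longest-match resolution. An empty `sitemaps` list would correspond to the "Missing sitemap declaration" finding.
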
## Sitemap Validator

```bash
# Validate sitemap
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Include sitemap index parsing
python scripts/sitemap_validator.py --url https://example.com/sitemap_index.xml --follow-index
```

**Validation rules**:
- XML syntax correctness
- URL count limit (50,000 max per sitemap)
- File size limit (50 MB max uncompressed)
- Lastmod date format validation
- Sitemap index structure

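
The validation rules above can be sketched roughly as follows, assuming the sitemap has already been downloaded into memory. This is an illustration rather than the logic in `sitemap_validator.py`: `validate_sitemap` and `is_valid_lastmod` are hypothetical names, the `lastmod` check is a simplified ISO 8601 test rather than a full W3C Datetime validator, and sitemap index handling is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, uncompressed


def is_valid_lastmod(value: str) -> bool:
    """Accept YYYY-MM-DD or a full ISO 8601 timestamp (simplified check)."""
    try:
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def validate_sitemap(xml_bytes: bytes) -> list[dict]:
    """Return a list of issues found in a single (non-index) sitemap."""
    issues = []
    if len(xml_bytes) > MAX_BYTES:
        issues.append({"type": "error", "message": "File exceeds 50 MB", "location": "file"})
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        issues.append({"type": "error", "message": f"XML syntax error: {exc}", "location": "file"})
        return issues
    entries = root.findall(f"{NS}url")
    if len(entries) > MAX_URLS:
        issues.append({"type": "error", "message": "More than 50,000 URLs", "location": "file"})
    for entry in entries:
        loc = (entry.findtext(f"{NS}loc") or "").strip()
        lastmod = entry.findtext(f"{NS}lastmod")
        if lastmod is None:
            issues.append({"type": "info", "message": "Missing lastmod", "location": loc})
        elif not is_valid_lastmod(lastmod.strip()):
            issues.append({"type": "warning", "message": f"Invalid lastmod: {lastmod.strip()}", "location": loc})
    return issues


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/about</loc><lastmod>15/01/2024</lastmod></url>
  <url><loc>https://example.com/contact</loc></url>
</urlset>"""

issues = validate_sitemap(SAMPLE)
for issue in issues:
    print(issue["type"], issue["location"], issue["message"])
```

The severity labels chosen here (`info` for a missing `lastmod`, `warning` for a malformed one) are assumptions, not the script's documented behavior.
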
## Sitemap Crawler

```bash
# Crawl all URLs in sitemap
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml

# Limit concurrent requests
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --concurrency 10

# Sample mode (check subset)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --sample 100
```

**Output includes**:
- HTTP status codes per URL
- Response times
- Redirect chains
- Broken links (4xx, 5xx)

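
One common way to bound concurrency in an async crawl, as the `--concurrency` option suggests, is an `asyncio.Semaphore`. The sketch below shows that pattern under stated assumptions: `check_urls` is a hypothetical function, and `fake_fetch` stands in for a real HTTP client so the example runs offline. The HTTP library used by `sitemap_crawler.py` is not stated in this document, and redirect chains are not modeled here.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Any coroutine that takes a URL and returns an HTTP status code.
Fetch = Callable[[str], Awaitable[int]]


async def check_urls(urls: list[str], fetch: Fetch, concurrency: int = 10) -> list[dict]:
    """Check URLs with at most `concurrency` requests in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def check_one(url: str) -> dict:
        async with semaphore:
            started = time.perf_counter()
            try:
                status = await fetch(url)
            except Exception as exc:  # record failures instead of aborting the crawl
                return {"url": url, "status": None, "error": str(exc)}
            elapsed = time.perf_counter() - started
        return {
            "url": url,
            "status": status,
            "seconds": round(elapsed, 3),
            "broken": status >= 400,  # 4xx and 5xx count as broken
        }

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(check_one(url) for url in urls))


async def fake_fetch(url: str) -> int:
    """Stand-in for a real HTTP request: 404 for URLs containing 'missing'."""
    await asyncio.sleep(0.01)
    return 404 if "missing" in url else 200


results = asyncio.run(
    check_urls(
        [
            "https://example.com/",
            "https://example.com/missing",
            "https://example.com/about",
        ],
        fake_fetch,
        concurrency=2,
    )
)
broken = [r["url"] for r in results if r.get("broken")]
print(broken)
```

Passing the fetch coroutine as a parameter keeps the concurrency logic testable without network access; a real client would replace `fake_fetch`.
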
## Output Format

All scripts support a `--json` flag for structured output:

```json
{
  "url": "https://example.com",
  "status": "valid|invalid|warning",
  "issues": [
    {
      "type": "error|warning|info",
      "message": "Description",
      "location": "Line or URL"
    }
  ],
  "summary": {}
}
```

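
As an illustration of how a report in this shape might be assembled, the sketch below uses dataclasses and `json.dumps`. The `Issue` and `Report` classes are hypothetical, and the rule that derives `status` from the most severe issue is an assumption; the scripts may compute it differently.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Issue:
    type: str      # "error", "warning", or "info"
    message: str
    location: str  # line number or URL


@dataclass
class Report:
    url: str
    issues: list[Issue] = field(default_factory=list)
    summary: dict = field(default_factory=dict)

    @property
    def status(self) -> str:
        """Assumed rule: any error means invalid, else any warning means warning."""
        types = {issue.type for issue in self.issues}
        if "error" in types:
            return "invalid"
        return "warning" if "warning" in types else "valid"

    def to_json(self) -> str:
        payload = {
            "url": self.url,
            "status": self.status,
            "issues": [asdict(issue) for issue in self.issues],
            "summary": self.summary,
        }
        return json.dumps(payload, indent=2, ensure_ascii=False)


report = Report(
    url="https://example.com",
    issues=[Issue("warning", "Missing sitemap declaration", "robots.txt")],
    summary={"issue_count": 1},
)
print(report.to_json())
```
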
## Common Issues Detected

| Category | Issue | Severity |
|----------|-------|----------|
| Robots.txt | Missing sitemap declaration | Medium |
| Robots.txt | Blocking CSS/JS resources | High |
| Robots.txt | Overly broad disallow rules | Medium |
| Sitemap | URLs returning 404 | High |
| Sitemap | Missing lastmod dates | Low |
| Sitemap | Exceeds 50,000 URL limit | High |
| Sitemap | Non-canonical URLs included | Medium |

## Configuration

Environment variables (optional):

```bash
# Rate limiting
CRAWL_DELAY=1.0       # Seconds between requests
MAX_CONCURRENT=20     # Async concurrency limit
REQUEST_TIMEOUT=30    # Request timeout seconds
```

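
A minimal sketch of reading these variables in Python follows. It assumes the values shown above are also the fallback defaults, which this document does not state explicitly, and the `CrawlConfig` and `load_config` names are hypothetical rather than the `ConfigManager` in `base_client.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CrawlConfig:
    crawl_delay: float    # seconds between requests
    max_concurrent: int   # async concurrency limit
    request_timeout: int  # request timeout in seconds


def load_config(env: Mapping[str, str] = os.environ) -> CrawlConfig:
    """Read crawl settings from the environment, falling back to assumed defaults."""
    return CrawlConfig(
        crawl_delay=float(env.get("CRAWL_DELAY", "1.0")),
        max_concurrent=int(env.get("MAX_CONCURRENT", "20")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "30")),
    )


# Passing a plain dict makes the override behavior easy to see.
config = load_config({"MAX_CONCURRENT": "5"})
print(config)
```
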
## Notion Output (Required)

**IMPORTANT**: All audit reports MUST be saved to the OurDigital SEO Audit Log database.

### Database Configuration

| Field | Value |
|-------|-------|
| Database ID | `2c8581e5-8a1e-8035-880b-e38cefc2f3ef` |
| URL | https://www.notion.so/dintelligence/2c8581e58a1e8035880be38cefc2f3ef |

### Required Properties

| Property | Type | Description |
|----------|------|-------------|
| Issue | Title | Report title (Korean + date) |
| Site | URL | Audited website URL |
| Category | Select | Technical SEO, On-page SEO, Performance, Schema/Structured Data, Sitemap, Robots.txt, Content, Local SEO |
| Priority | Select | Critical, High, Medium, Low |
| Found Date | Date | Audit date (`YYYY-MM-DD`) |
| Audit ID | Rich Text | Format: `[TYPE]-YYYYMMDD-NNN` |

### Language Guidelines

- Report content in Korean (한국어)
- Keep technical English terms as-is (e.g., SEO Audit, Core Web Vitals, Schema Markup)
- URLs and code remain unchanged

### Example MCP Call

```bash
mcp-cli call notion/API-post-page '{"parent": {"database_id": "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"}, "properties": {...}}'
```

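
The `properties` object in the call above is elided. As a hedged illustration, the sketch below builds one possible payload from the required properties, using the property value shapes of Notion's public API (title, url, select, date, rich_text). The helper names, the `ROBOTS` type code, the sequence number, and the sample title are all hypothetical, and how `mcp-cli` expects the JSON to be passed has not been verified here.

```python
import json
from datetime import date

DATABASE_ID = "2c8581e5-8a1e-8035-880b-e38cefc2f3ef"


def build_audit_id(audit_type: str, found: date, sequence: int) -> str:
    """Format an Audit ID as TYPE-YYYYMMDD-NNN (TYPE code is an assumed convention)."""
    return f"{audit_type.upper()}-{found:%Y%m%d}-{sequence:03d}"


def build_page_payload(
    title: str, site: str, category: str, priority: str, found: date, audit_id: str
) -> dict:
    """Build a page-creation payload matching the required database properties."""
    return {
        "parent": {"database_id": DATABASE_ID},
        "properties": {
            "Issue": {"title": [{"text": {"content": title}}]},
            "Site": {"url": site},
            "Category": {"select": {"name": category}},
            "Priority": {"select": {"name": priority}},
            "Found Date": {"date": {"start": found.isoformat()}},
            "Audit ID": {"rich_text": [{"text": {"content": audit_id}}]},
        },
    }


found = date(2025, 1, 15)
audit_id = build_audit_id("robots", found, 7)
payload = build_page_payload(
    title="robots.txt 점검 보고서 2025-01-15",  # Korean title plus date, per the guidelines
    site="https://example.com",
    category="Robots.txt",
    priority="High",
    found=found,
    audit_id=audit_id,
)
# ensure_ascii=False keeps the Korean title readable in the serialized JSON.
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The serialized output could then be supplied as the JSON argument of the call shown above.
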