# CLAUDE.md

## Overview

Technical SEO auditor for crawlability fundamentals: robots.txt validation, XML sitemap analysis, and URL accessibility checking.

## Quick Start

```bash
# Install dependencies
pip install -r scripts/requirements.txt

# Robots.txt analysis
python scripts/robots_checker.py --url https://example.com

# Sitemap validation
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Async URL crawl (check that sitemap URLs are accessible)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml
```

## Scripts

| Script | Purpose | Key Output |
|--------|---------|------------|
| `robots_checker.py` | Parse and validate robots.txt | User-agent rules, disallow patterns, sitemap declarations |
| `sitemap_validator.py` | Validate XML sitemap structure | URL count, lastmod dates, size limits, syntax errors |
| `sitemap_crawler.py` | Asynchronously check URL accessibility | HTTP status codes, response times, broken links |
| `base_client.py` | Shared utilities | RateLimiter, ConfigManager, BaseAsyncClient |
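The table lists a `RateLimiter` among the shared utilities in `base_client.py`. Its actual interface is not shown here, but a minimal async rate limiter that enforces a delay between requests might be sketched as:

```python
import asyncio
import time


class RateLimiter:
    """Sketch of an async rate limiter: at most one acquire() per `delay` seconds."""

    def __init__(self, delay: float = 1.0):
        self.delay = delay
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        # Serialize acquirers, then sleep off any remaining delay.
        async with self._lock:
            wait = self.delay - (time.monotonic() - self._last)
            if wait > 0:
                await asyncio.sleep(wait)
            self._last = time.monotonic()
```

Callers would `await limiter.acquire()` before each request; this is a sketch under assumed semantics, not the class's documented API.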
## Robots.txt Checker

```bash
# Basic analysis
python scripts/robots_checker.py --url https://example.com

# Test specific URL against rules
python scripts/robots_checker.py --url https://example.com --test-url /admin/

# Output JSON
python scripts/robots_checker.py --url https://example.com --json
```

**Checks performed**:

- Syntax validation
- User-agent rule parsing
- Disallow/Allow pattern analysis
- Sitemap declarations
- Critical resource access (CSS/JS/images)
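The kind of check `--test-url` performs can be illustrated with the standard library's `urllib.robotparser` (a sketch of the semantics only; the script's own parser may differ — note the stdlib parser applies rules in file order, so an `Allow` must precede an overlapping `Disallow` to win):

```python
from urllib import robotparser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
"""


def check_url(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if `user_agent` may fetch `path` under the given rules."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(check_url(ROBOTS_TXT, "*", "/admin/"))             # False
print(check_url(ROBOTS_TXT, "*", "/admin/public/x.css")) # True
```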
## Sitemap Validator

```bash
# Validate sitemap
python scripts/sitemap_validator.py --url https://example.com/sitemap.xml

# Include sitemap index parsing
python scripts/sitemap_validator.py --url https://example.com/sitemap_index.xml --follow-index
```

**Validation rules**:

- XML syntax correctness
- URL count limit (50,000 max per sitemap)
- File size limit (50MB max uncompressed)
- Lastmod date format validation
- Sitemap index structure
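The syntax, URL-count, and lastmod rules above can be sketched with the standard library's XML parser (an illustration of the checks, not the validator's actual implementation; the 50,000-URL limit comes from the sitemap protocol):

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000


def validate_sitemap(xml_text: str) -> list[str]:
    """Return a list of issue messages for a sitemap document."""
    issues = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"XML syntax error: {exc}"]

    urls = root.findall(f"{NS}url")
    if len(urls) > MAX_URLS:
        issues.append(f"{len(urls)} URLs exceeds the 50,000-per-sitemap limit")

    for url in urls:
        lastmod = url.find(f"{NS}lastmod")
        if lastmod is None:
            issues.append("missing lastmod")
        else:
            try:
                datetime.fromisoformat(lastmod.text)
            except (TypeError, ValueError):
                issues.append(f"bad lastmod: {lastmod.text}")
    return issues
```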
## Sitemap Crawler

```bash
# Crawl all URLs in sitemap
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml

# Limit concurrent requests
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --concurrency 10

# Sample mode (check subset)
python scripts/sitemap_crawler.py --sitemap https://example.com/sitemap.xml --sample 100
```

**Output includes**:

- HTTP status codes per URL
- Response times
- Redirect chains
- Broken links (4xx, 5xx)
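The `--concurrency` bound maps naturally onto an `asyncio.Semaphore`. A transport-agnostic sketch (the real crawler presumably wraps an HTTP client in `fetch`; here it is injected so the concurrency logic stands alone):

```python
import asyncio


async def crawl(urls, fetch, concurrency: int = 20) -> dict:
    """Fetch every URL with at most `concurrency` in flight; return {url: status}."""
    sem = asyncio.Semaphore(concurrency)

    async def one(url):
        async with sem:  # blocks when `concurrency` fetches are already running
            return url, await fetch(url)

    results = await asyncio.gather(*(one(u) for u in urls))
    return dict(results)


async def demo_fetch(url):  # stand-in for a real HTTP request
    await asyncio.sleep(0)
    return 200


print(asyncio.run(crawl(["https://example.com/a", "https://example.com/b"], demo_fetch)))
# {'https://example.com/a': 200, 'https://example.com/b': 200}
```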
## Output Format

All scripts support a `--json` flag for structured output:

```json
{
  "url": "https://example.com",
  "status": "valid|invalid|warning",
  "issues": [
    {
      "type": "error|warning|info",
      "message": "Description",
      "location": "Line or URL"
    }
  ],
  "summary": {}
}
```
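A downstream consumer can aggregate the `issues` array by type; the report values below are hypothetical, but the shape follows the schema above:

```python
import json
from collections import Counter

# Hypothetical --json output conforming to the schema above.
report = json.loads("""{
  "url": "https://example.com",
  "status": "warning",
  "issues": [
    {"type": "error", "message": "URL returns 404", "location": "https://example.com/old"},
    {"type": "warning", "message": "Missing lastmod", "location": "Line 12"}
  ],
  "summary": {}
}""")

counts = Counter(issue["type"] for issue in report["issues"])
print(counts)  # Counter({'error': 1, 'warning': 1})
```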
## Common Issues Detected

| Category | Issue | Severity |
|----------|-------|----------|
| Robots.txt | Missing sitemap declaration | Medium |
| Robots.txt | Blocking CSS/JS resources | High |
| Robots.txt | Overly broad disallow rules | Medium |
| Sitemap | URLs returning 404 | High |
| Sitemap | Missing lastmod dates | Low |
| Sitemap | Exceeds 50,000 URL limit | High |
| Sitemap | Non-canonical URLs included | Medium |
## Configuration

Environment variables (optional):

```bash
# Rate limiting
CRAWL_DELAY=1.0      # Seconds between requests
MAX_CONCURRENT=20    # Async concurrency limit
REQUEST_TIMEOUT=30   # Request timeout in seconds
```
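One way the scripts could pick these up (possibly via the `ConfigManager` in `base_client.py` — an assumption, since its interface is not shown) is plain `os.getenv` with typed defaults:

```python
import os


def env_float(name: str, default: float) -> float:
    """Read a float-valued setting from the environment, falling back to a default."""
    raw = os.getenv(name)
    return float(raw) if raw is not None else default


# Hypothetical names matching the variables documented above.
CRAWL_DELAY = env_float("CRAWL_DELAY", 1.0)
MAX_CONCURRENT = int(env_float("MAX_CONCURRENT", 20))
REQUEST_TIMEOUT = env_float("REQUEST_TIMEOUT", 30)
```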