feat(reference-curator): Add Claude.ai Projects export format
Add claude-project/ folder with skill files formatted for upload to Claude.ai Projects (web interface):
- reference-curator-complete.md: All 6 skills consolidated
- INDEX.md: Overview and workflow documentation
- Individual skill files (01-06) without YAML frontmatter

Add --claude-ai option to install.sh:
- Lists available files for upload
- Optionally copies to a custom destination directory
- Provides upload instructions for Claude.ai

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -60,6 +60,7 @@ cd our-claude-skills/custom-skills/90-reference-curator
| **Full** | `./install.sh` | Interactive setup with MySQL and crawlers |
| **Minimal** | `./install.sh --minimal` | Firecrawl MCP only, no database |
| **Check** | `./install.sh --check` | Verify installation status |
| **Claude.ai** | `./install.sh --claude-ai` | Export skills for Claude.ai Projects |
| **Uninstall** | `./install.sh --uninstall` | Remove installation (preserves data) |

### What Gets Installed
@@ -94,6 +95,38 @@ export CRAWLER_PROJECT_PATH="" # Path to local crawlers (optional)
---

## Claude.ai Projects Installation

To use these skills in Claude.ai (web interface), export the skill files for upload:

```bash
./install.sh --claude-ai
```

This displays the available files in `claude-project/` and optionally copies them to a convenient location.

### Files for Upload

| File | Description |
|------|-------------|
| `reference-curator-complete.md` | All 6 skills combined (recommended) |
| `INDEX.md` | Overview and workflow documentation |
| `01-reference-discovery.md` | Source discovery skill |
| `02-web-crawler.md` | Crawling orchestration skill |
| `03-content-repository.md` | Database storage skill |
| `04-content-distiller.md` | Content summarization skill |
| `05-quality-reviewer.md` | QA review skill |
| `06-markdown-exporter.md` | Export skill |

### Upload Instructions

1. Go to [claude.ai](https://claude.ai)
2. Create a new Project or open an existing one
3. Click "Add to project knowledge"
4. Upload `reference-curator-complete.md` (or individual skills as needed)

---

## Architecture

```
@@ -386,6 +419,16 @@ mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library < shar
├── CHANGELOG.md                       # Version history
├── install.sh                         # Portable installation script
│
├── claude-project/                    # Files for Claude.ai Projects
│   ├── INDEX.md                       # Overview
│   ├── reference-curator-complete.md  # All skills combined
│   ├── 01-reference-discovery.md
│   ├── 02-web-crawler.md
│   ├── 03-content-repository.md
│   ├── 04-content-distiller.md
│   ├── 05-quality-reviewer.md
│   └── 06-markdown-exporter.md
│
├── commands/                          # Claude Code commands (tracked in git)
│   ├── reference-discovery.md
│   ├── web-crawler.md
@@ -0,0 +1,184 @@
# Reference Discovery

Searches for authoritative sources, validates credibility, and produces curated URL lists for crawling.

## Source Priority Hierarchy

| Tier | Source Type | Examples |
|------|-------------|----------|
| **Tier 1** | Official documentation | docs.anthropic.com, docs.claude.com, platform.openai.com/docs |
| **Tier 1** | Engineering blogs (official) | anthropic.com/news, openai.com/blog |
| **Tier 1** | Official GitHub repos | github.com/anthropics/*, github.com/openai/* |
| **Tier 2** | Research papers | arxiv.org, papers with citations |
| **Tier 2** | Verified community guides | Cookbook examples, official tutorials |
| **Tier 3** | Community content | Blog posts, tutorials, Stack Overflow |

## Discovery Workflow

### Step 1: Define Search Scope

```python
search_config = {
    "topic": "prompt engineering",
    "vendors": ["anthropic", "openai", "google"],
    "source_types": ["official_docs", "engineering_blog", "github_repo"],
    "freshness": "past_year",  # past_week, past_month, past_year, any
    "max_results_per_query": 20
}
```
### Step 2: Generate Search Queries

For a given topic, generate targeted queries:

```python
def generate_queries(topic, vendors):
    queries = []

    # Official documentation queries
    for vendor in vendors:
        queries.append(f"site:docs.{vendor}.com {topic}")
        queries.append(f"site:{vendor}.com/docs {topic}")

    # Engineering blog queries
    for vendor in vendors:
        queries.append(f"site:{vendor}.com/blog {topic}")
        queries.append(f"site:{vendor}.com/news {topic}")

    # GitHub queries
    for vendor in vendors:
        queries.append(f"site:github.com/{vendor} {topic}")

    # Research queries
    queries.append(f"site:arxiv.org {topic}")

    return queries
```
### Step 3: Execute Search

Use the web search tool for each query:

```python
def execute_discovery(queries):
    results = []
    for query in queries:
        search_results = web_search(query)
        for result in search_results:
            results.append({
                "url": result.url,
                "title": result.title,
                "snippet": result.snippet,
                "query_used": query
            })
    return deduplicate_by_url(results)
```
### Step 4: Validate and Score Sources

```python
def score_source(url, title):
    score = 0.0

    # Domain credibility
    if any(d in url for d in ['docs.anthropic.com', 'docs.claude.com', 'docs.openai.com']):
        score += 0.40  # Tier 1 official docs
    elif any(d in url for d in ['anthropic.com', 'openai.com', 'google.dev']):
        score += 0.30  # Tier 1 official blog/news
    elif 'github.com' in url and any(v in url for v in ['anthropics', 'openai', 'google']):
        score += 0.30  # Tier 1 official repos
    elif 'arxiv.org' in url:
        score += 0.20  # Tier 2 research
    else:
        score += 0.10  # Tier 3 community

    # Freshness signals (from title/snippet)
    if any(year in title for year in ['2025', '2024']):
        score += 0.20
    elif any(year in title for year in ['2023']):
        score += 0.10

    # Relevance signals
    if any(kw in title.lower() for kw in ['guide', 'documentation', 'tutorial', 'best practices']):
        score += 0.15

    return min(score, 1.0)


def assign_credibility_tier(score):
    if score >= 0.60:
        return 'tier1_official'
    elif score >= 0.40:
        return 'tier2_verified'
    else:
        return 'tier3_community'
```
### Step 5: Output URL Manifest

```python
from datetime import datetime

def create_manifest(scored_results, topic):
    manifest = {
        "discovery_date": datetime.now().isoformat(),
        "topic": topic,
        "total_urls": len(scored_results),
        "urls": []
    }

    for result in sorted(scored_results, key=lambda x: x['score'], reverse=True):
        manifest["urls"].append({
            "url": result["url"],
            "title": result["title"],
            "credibility_tier": result["tier"],
            "credibility_score": result["score"],
            "source_type": infer_source_type(result["url"]),
            "vendor": infer_vendor(result["url"])
        })

    return manifest
```
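The `infer_source_type` and `infer_vendor` helpers referenced by `create_manifest` are not defined in this skill; a minimal sketch, assuming the tier-1 domains listed in the hierarchy above (the exact mappings are assumptions):

```python
def infer_vendor(url):
    """Guess the vendor from the URL's domain (hypothetical helper)."""
    needles_by_vendor = {
        "anthropic": ["anthropic.com", "claude.com", "github.com/anthropics"],
        "openai": ["openai.com", "github.com/openai"],
        "google": ["google.dev", "blog.google", "github.com/google"],
    }
    for vendor, needles in needles_by_vendor.items():
        if any(n in url for n in needles):
            return vendor
    return "unknown"


def infer_source_type(url):
    """Classify the URL into the manifest's source_type values (hypothetical helper)."""
    if "github.com" in url:
        return "github_repo"
    if "arxiv.org" in url:
        return "research_paper"
    if "docs." in url or "/docs" in url:
        return "official_docs"
    if "/blog" in url or "/news" in url:
        return "engineering_blog"
    return "community"
```

The check order matters: GitHub and arXiv URLs are matched before the generic `docs.` substring so repository docs pages classify as repos.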
## Output Format

Discovery produces a JSON manifest for the crawler:

```json
{
  "discovery_date": "2025-01-28T10:30:00",
  "topic": "prompt engineering",
  "total_urls": 15,
  "urls": [
    {
      "url": "https://docs.anthropic.com/en/docs/prompt-engineering",
      "title": "Prompt Engineering Guide",
      "credibility_tier": "tier1_official",
      "credibility_score": 0.85,
      "source_type": "official_docs",
      "vendor": "anthropic"
    }
  ]
}
```
## Known Authoritative Sources

Pre-validated sources for common topics:

| Vendor | Documentation | Blog/News | GitHub |
|--------|--------------|-----------|--------|
| Anthropic | docs.anthropic.com, docs.claude.com | anthropic.com/news | github.com/anthropics |
| OpenAI | platform.openai.com/docs | openai.com/blog | github.com/openai |
| Google | ai.google.dev/docs | blog.google/technology/ai | github.com/google |

## Integration

**Output:** URL manifest JSON → `web-crawler-orchestrator`

**Database:** Register new sources in `sources` table via `content-repository`
## Deduplication

Before outputting, deduplicate URLs:

- Normalize URLs (remove trailing slashes, query params)
- Check against existing `documents` table via `content-repository`
- Merge duplicate entries, keeping the highest credibility score
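The normalization and merge steps above can be sketched as follows; a minimal example, assuming query strings and fragments carry no meaning for these sources:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Normalize a URL for deduplication: lowercase host, drop query/fragment and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def deduplicate_by_url(results):
    """Keep one entry per normalized URL, preferring the highest credibility score."""
    best = {}
    for r in results:
        key = normalize_url(r["url"])
        if key not in best or r.get("score", 0) > best[key].get("score", 0):
            best[key] = r
    return list(best.values())
```

Checking candidates against the `documents` table would happen after this in-memory pass, via `content-repository`.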
@@ -0,0 +1,230 @@
# Web Crawler Orchestrator

Manages crawling operations using Firecrawl MCP with rate limiting and format handling.

## Prerequisites

- Firecrawl MCP server connected
- Config file at `~/.config/reference-curator/crawl_config.yaml`
- Storage directory exists: `~/reference-library/raw/`
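The local prerequisites can be verified with a quick preflight sketch (the paths are the ones named above; checking the MCP connection is left to the runtime):

```python
from pathlib import Path

def preflight():
    """Return a list of missing local prerequisites; empty means ready to crawl."""
    config = Path.home() / ".config/reference-curator/crawl_config.yaml"
    raw_dir = Path.home() / "reference-library/raw"
    missing = []
    if not config.is_file():
        missing.append(str(config))
    if not raw_dir.is_dir():
        missing.append(str(raw_dir))
    return missing
```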
## Crawl Configuration
```yaml
# ~/.config/reference-curator/crawl_config.yaml
firecrawl:
  rate_limit:
    requests_per_minute: 20
    concurrent_requests: 3
  default_options:
    timeout: 30000
    only_main_content: true
    include_html: false

processing:
  max_content_size_mb: 50
  raw_content_dir: ~/reference-library/raw/
```
## Crawl Workflow

### Step 1: Load URL Manifest

Receive the manifest from `reference-discovery`:

```python
import json

def load_manifest(manifest_path):
    with open(manifest_path) as f:
        manifest = json.load(f)
    return manifest["urls"]
```
### Step 2: Determine Crawl Strategy

```python
def select_strategy(url):
    """Select optimal crawl strategy based on URL characteristics."""

    if url.endswith('.pdf'):
        return 'pdf_extract'
    elif 'github.com' in url and '/blob/' in url:
        return 'raw_content'  # Get raw file content
    elif 'github.com' in url:
        return 'scrape'  # Repository pages
    elif any(d in url for d in ['docs.', 'documentation']):
        return 'scrape'  # Documentation sites
    else:
        return 'scrape'  # Default
```
### Step 3: Execute Firecrawl

Use Firecrawl MCP for crawling:

```python
# Single page scrape
firecrawl_scrape(
    url="https://docs.anthropic.com/en/docs/prompt-engineering",
    formats=["markdown"],  # markdown | html | screenshot
    only_main_content=True,
    timeout=30000
)

# Multi-page crawl (documentation sites)
firecrawl_crawl(
    url="https://docs.anthropic.com/en/docs/",
    max_depth=2,
    limit=50,
    formats=["markdown"],
    only_main_content=True
)
```
### Step 4: Rate Limiting

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, requests_per_minute=20):
        self.rpm = requests_per_minute
        self.request_times = deque()

    def wait_if_needed(self):
        now = time.time()
        # Remove requests older than 1 minute
        while self.request_times and now - self.request_times[0] > 60:
            self.request_times.popleft()

        if len(self.request_times) >= self.rpm:
            wait_time = 60 - (now - self.request_times[0])
            if wait_time > 0:
                time.sleep(wait_time)

        self.request_times.append(time.time())
```
### Step 5: Save Raw Content

```python
import hashlib
from datetime import datetime
from pathlib import Path

def save_content(url, content, content_type='markdown'):
    """Save crawled content to raw storage."""

    # Generate filename from URL hash
    url_hash = hashlib.sha256(url.encode()).hexdigest()[:16]

    # Determine extension
    ext_map = {'markdown': '.md', 'html': '.html', 'pdf': '.pdf'}
    ext = ext_map.get(content_type, '.txt')

    # Create dated subdirectory
    date_dir = datetime.now().strftime('%Y/%m')
    output_dir = Path.home() / 'reference-library/raw' / date_dir
    output_dir.mkdir(parents=True, exist_ok=True)

    # Save file
    filepath = output_dir / f"{url_hash}{ext}"
    if content_type == 'pdf':
        filepath.write_bytes(content)
    else:
        filepath.write_text(content, encoding='utf-8')

    return str(filepath)
```
### Step 6: Generate Crawl Manifest

```python
from datetime import datetime

def create_crawl_manifest(results):
    manifest = {
        "crawl_date": datetime.now().isoformat(),
        "total_crawled": len([r for r in results if r["status"] == "success"]),
        "total_failed": len([r for r in results if r["status"] == "failed"]),
        "documents": []
    }

    for result in results:
        manifest["documents"].append({
            "url": result["url"],
            "status": result["status"],
            "raw_content_path": result.get("filepath"),
            "content_size": result.get("size"),
            "crawl_method": "firecrawl",
            "error": result.get("error")
        })

    return manifest
```
## Error Handling

| Error | Action |
|-------|--------|
| Timeout | Retry once with 2x timeout |
| Rate limit (429) | Exponential backoff, max 3 retries |
| Not found (404) | Log and skip |
| Access denied (403) | Log, mark as `failed` |
| Connection error | Retry with backoff |
```python
def crawl_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = firecrawl_scrape(url)
            return {"status": "success", "content": result}
        except RateLimitError:
            wait = 2 ** attempt * 10  # 10, 20, 40 seconds
            time.sleep(wait)
        except TimeoutError:
            # Retry once with doubled timeout
            try:
                result = firecrawl_scrape(url, timeout=60000)
                return {"status": "success", "content": result}
            except Exception as e:
                return {"status": "failed", "error": str(e)}
        except NotFoundError:
            return {"status": "failed", "error": "404 Not Found"}
        except Exception as e:
            if attempt == max_retries - 1:
                return {"status": "failed", "error": str(e)}

    return {"status": "failed", "error": "Max retries exceeded"}
```
## Firecrawl MCP Reference

**scrape** - Single page:
```
firecrawl_scrape(url, formats, only_main_content, timeout)
```

**crawl** - Multi-page:
```
firecrawl_crawl(url, max_depth, limit, formats, only_main_content)
```

**map** - Discover URLs:
```
firecrawl_map(url, limit)  # Returns list of URLs on site
```
## Integration

| From | Input | To |
|------|-------|-----|
| reference-discovery | URL manifest | web-crawler-orchestrator |
| web-crawler-orchestrator | Crawl manifest + raw files | content-repository |
| quality-reviewer (deep_research) | Additional queries | reference-discovery → here |
## Output Structure

```
~/reference-library/raw/
└── 2025/01/
    ├── a1b2c3d4e5f60718.md   # Markdown content
    ├── b2c3d4e5f6071829.md
    └── c3d4e5f60718293a.pdf  # PDF documents
```
@@ -0,0 +1,158 @@
# Content Repository

Manages MySQL storage for the reference library system. Handles document storage, version control, deduplication, and retrieval.

## Prerequisites

- MySQL 8.0+ with utf8mb4 charset
- Config file at `~/.config/reference-curator/db_config.yaml`
- Database `reference_library` initialized with schema

## Quick Reference

### Connection Setup
```python
import yaml
import os
from pathlib import Path

def get_db_config():
    config_path = Path.home() / ".config/reference-curator/db_config.yaml"
    with open(config_path) as f:
        config = yaml.safe_load(f)

    # Resolve environment variables
    mysql = config['mysql']
    return {
        'host': mysql['host'],
        'port': mysql['port'],
        'database': mysql['database'],
        'user': os.environ.get('MYSQL_USER', mysql.get('user', '')),
        'password': os.environ.get('MYSQL_PASSWORD', mysql.get('password', '')),
        'charset': mysql['charset']
    }
```
### Core Operations

**Store New Document:**
```python
def store_document(cursor, source_id, title, url, doc_type, raw_content_path):
    sql = """
        INSERT INTO documents (source_id, title, url, doc_type, crawl_date, crawl_status, raw_content_path)
        VALUES (%s, %s, %s, %s, NOW(), 'completed', %s)
        ON DUPLICATE KEY UPDATE
            version = version + 1,
            previous_version_id = doc_id,
            crawl_date = NOW(),
            raw_content_path = VALUES(raw_content_path)
    """
    cursor.execute(sql, (source_id, title, url, doc_type, raw_content_path))
    return cursor.lastrowid
```
**Check Duplicate:**
```python
def is_duplicate(cursor, url):
    cursor.execute("SELECT doc_id FROM documents WHERE url_hash = SHA2(%s, 256)", (url,))
    return cursor.fetchone() is not None
```
**Get Document by Topic:**
```python
def get_docs_by_topic(cursor, topic_slug, min_quality=0.80):
    sql = """
        SELECT d.doc_id, d.title, d.url, dc.structured_content, dc.quality_score
        FROM documents d
        JOIN document_topics dt ON d.doc_id = dt.doc_id
        JOIN topics t ON dt.topic_id = t.topic_id
        LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
        WHERE t.topic_slug = %s
          AND (dc.review_status = 'approved' OR dc.review_status IS NULL)
          AND (dc.quality_score >= %s OR dc.quality_score IS NULL)
        ORDER BY dt.relevance_score DESC
    """
    cursor.execute(sql, (topic_slug, min_quality))
    return cursor.fetchall()
```
## Table Quick Reference

| Table | Purpose | Key Fields |
|-------|---------|------------|
| `sources` | Authorized content sources | source_type, credibility_tier, vendor |
| `documents` | Crawled document metadata | url_hash (dedup), version, crawl_status |
| `distilled_content` | Processed summaries | review_status, compression_ratio |
| `review_logs` | QA decisions | quality_score, decision, refactor_instructions |
| `topics` | Taxonomy | topic_slug, parent_topic_id |
| `document_topics` | Many-to-many linking | relevance_score |
| `export_jobs` | Export tracking | export_type, output_format, status |
## Status Values

**crawl_status:** `pending` → `completed` | `failed` | `stale`

**review_status:** `pending` → `in_review` → `approved` | `needs_refactor` | `rejected`

**decision (review):** `approve` | `refactor` | `deep_research` | `reject`
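A sketch of enforcing these transitions before issuing an UPDATE; the map mirrors the arrows above, and the `needs_refactor → pending` edge is an assumption (refactored content re-enters the review queue):

```python
REVIEW_TRANSITIONS = {
    "pending": {"in_review"},
    "in_review": {"approved", "needs_refactor", "rejected"},
    "needs_refactor": {"pending"},  # Assumption: refactored content is re-reviewed
    "approved": set(),
    "rejected": set(),
}

def can_transition(current, new):
    """Return True if review_status may move from current to new."""
    return new in REVIEW_TRANSITIONS.get(current, set())
```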
## Common Queries

### Find Stale Documents (needs re-crawl)
```sql
SELECT d.doc_id, d.title, d.url, d.crawl_date
FROM documents d
JOIN crawl_schedule cs ON d.source_id = cs.source_id
WHERE d.crawl_date < DATE_SUB(NOW(), INTERVAL
    CASE cs.frequency
        WHEN 'daily' THEN 1
        WHEN 'weekly' THEN 7
        WHEN 'biweekly' THEN 14
        WHEN 'monthly' THEN 30
    END DAY)
AND cs.is_enabled = TRUE;
```
### Get Pending Reviews
```sql
SELECT dc.distill_id, d.title, d.url, dc.token_count_distilled
FROM distilled_content dc
JOIN documents d ON dc.doc_id = d.doc_id
WHERE dc.review_status = 'pending'
ORDER BY dc.distill_date ASC;
```
### Export-Ready Content
```sql
SELECT d.title, d.url, dc.structured_content, t.topic_slug
FROM documents d
JOIN distilled_content dc ON d.doc_id = dc.doc_id
JOIN document_topics dt ON d.doc_id = dt.doc_id
JOIN topics t ON dt.topic_id = t.topic_id
JOIN review_logs rl ON dc.distill_id = rl.distill_id
WHERE rl.decision = 'approve'
AND rl.quality_score >= 0.85
ORDER BY t.topic_slug, dt.relevance_score DESC;
```
## Workflow Integration

1. **From crawler-orchestrator:** Receive URL + raw content path → `store_document()`
2. **To content-distiller:** Query pending documents → send for processing
3. **From quality-reviewer:** Update `review_status` based on decision
4. **To markdown-exporter:** Query approved content by topic
## Error Handling

- **Duplicate URL:** Silent update (version increment) via `ON DUPLICATE KEY UPDATE`
- **Missing source_id:** Validate against `sources` table before insert
- **Connection failure:** Implement retry with exponential backoff
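The connection-failure policy can be sketched as a generic retry wrapper; a minimal sketch, deliberately driver-agnostic (pass a callable that opens the connection or runs the query):

```python
import time

def with_retries(operation, max_attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Run operation(), retrying with exponential backoff on the given error types."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Wrapping the connect call, e.g. `with_retries(lambda: connector.connect(**get_db_config()))`, keeps the backoff policy in one place.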
## Full Schema Reference

See `references/schema.sql` for complete table definitions including indexes and constraints.

## Config File Template

See `references/db_config_template.yaml` for the connection configuration template.
@@ -0,0 +1,234 @@
# Content Distiller

Transforms raw crawled content into structured, high-quality reference materials.

## Distillation Goals

1. **Compress** - Reduce token count while preserving essential information
2. **Structure** - Organize content for easy retrieval and reference
3. **Extract** - Pull out code snippets, key concepts, and actionable patterns
4. **Annotate** - Add metadata for searchability and categorization

## Distillation Workflow

### Step 1: Load Raw Content
```python
def load_for_distillation(cursor):
    """Get documents ready for distillation."""
    sql = """
        SELECT d.doc_id, d.title, d.url, d.raw_content_path,
               d.doc_type, s.source_type, s.credibility_tier
        FROM documents d
        JOIN sources s ON d.source_id = s.source_id
        LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
        WHERE d.crawl_status = 'completed'
          AND dc.distill_id IS NULL
        ORDER BY s.credibility_tier ASC
    """
    cursor.execute(sql)
    return cursor.fetchall()
```
### Step 2: Analyze Content Structure

Identify the content type and select an appropriate distillation strategy:

```python
import re

def analyze_structure(content, doc_type):
    """Analyze document structure for distillation."""
    analysis = {
        "has_code_blocks": bool(re.findall(r'```[\s\S]*?```', content)),
        "has_headers": bool(re.findall(r'^#+\s', content, re.MULTILINE)),
        "has_lists": bool(re.findall(r'^\s*[-*]\s', content, re.MULTILINE)),
        "has_tables": bool(re.findall(r'\|.*\|', content)),
        "estimated_tokens": len(content.split()) * 1.3,  # Rough estimate
        "section_count": len(re.findall(r'^#+\s', content, re.MULTILINE))
    }
    return analysis
```
### Step 3: Extract Key Components

**Extract Code Snippets:**
```python
import re

def extract_code_snippets(content):
    """Extract all code blocks with language tags."""
    pattern = r'```(\w*)\n([\s\S]*?)```'
    snippets = []
    for match in re.finditer(pattern, content):
        snippets.append({
            "language": match.group(1) or "text",
            "code": match.group(2).strip(),
            "context": get_surrounding_text(content, match.start(), 200)
        })
    return snippets
```
**Extract Key Concepts:**
```python
def extract_key_concepts(content, title):
    """Use Claude to extract key concepts and definitions."""
    excerpt = content[:8000]  # Limit for context window
    prompt = f"""
Analyze this document and extract key concepts:

Title: {title}
Content: {excerpt}

Return JSON with:
- concepts: [{{"term": "...", "definition": "...", "importance": "high|medium|low"}}]
- techniques: [{{"name": "...", "description": "...", "use_case": "..."}}]
- best_practices: ["..."]
"""
    # Use Claude API to process
    return claude_extract(prompt)
```
### Step 4: Create Structured Summary

**Summary Template:**
```markdown
# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}

## Executive Summary
{2-3 sentence overview}

## Key Concepts
{bulleted list of core concepts with brief definitions}

## Techniques & Patterns
{extracted techniques with use cases}

## Code Examples
{relevant code snippets with context}

## Best Practices
{actionable recommendations}

## Related Topics
{links to related content in library}
```
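Filling the template header can be as simple as `str.format`; a minimal sketch, assuming the document record carries the field names used as placeholders above:

```python
SUMMARY_HEADER = """# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}

## Executive Summary
{executive_summary}
"""

def render_summary_header(doc, executive_summary):
    """Render the header portion of the structured summary from a document record."""
    return SUMMARY_HEADER.format(
        title=doc["title"],
        url=doc["url"],
        source_type=doc["source_type"],
        credibility_tier=doc["credibility_tier"],
        date=doc["date"],
        executive_summary=executive_summary,
    )
```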
### Step 5: Optimize for Tokens

```python
def optimize_content(structured_content, target_ratio=0.30):
    """
    Compress content to the target ratio while preserving quality.
    Target: 30% of the original token count.
    """
    original_tokens = count_tokens(structured_content)
    target_tokens = int(original_tokens * target_ratio)

    # Prioritized compression strategies
    strategies = [
        remove_redundant_explanations,
        condense_examples,
        merge_similar_sections,
        trim_verbose_descriptions
    ]

    optimized = structured_content
    for strategy in strategies:
        if count_tokens(optimized) > target_tokens:
            optimized = strategy(optimized)

    return optimized
```
### Step 6: Store Distilled Content

```python
import json

def store_distilled(cursor, doc_id, summary, key_concepts,
                    code_snippets, structured_content,
                    original_tokens, distilled_tokens):
    sql = """
        INSERT INTO distilled_content
            (doc_id, summary, key_concepts, code_snippets, structured_content,
             token_count_original, token_count_distilled, distill_model, review_status)
        VALUES (%s, %s, %s, %s, %s, %s, %s, 'claude-opus-4-5', 'pending')
    """
    cursor.execute(sql, (
        doc_id, summary,
        json.dumps(key_concepts),
        json.dumps(code_snippets),
        structured_content,
        original_tokens,
        distilled_tokens
    ))
    return cursor.lastrowid
```
## Distillation Prompts
|
||||||
|
|
||||||
|
**For Prompt Engineering Content:**
|
||||||
|
```
|
||||||
|
Focus on:
|
||||||
|
1. Specific techniques with before/after examples
|
||||||
|
2. Why techniques work (not just what)
|
||||||
|
3. Common pitfalls and how to avoid them
|
||||||
|
4. Actionable patterns that can be directly applied
|
||||||
|
```
|
||||||
|
|
||||||
|
**For API Documentation:**
|
||||||
|
```
|
||||||
|
Focus on:
|
||||||
|
1. Endpoint specifications and parameters
|
||||||
|
2. Request/response examples
|
||||||
|
3. Error codes and handling
|
||||||
|
4. Rate limits and best practices
|
||||||
|
```
|
||||||
|
|
||||||
|
**For Research Papers:**
|
||||||
|
```
|
||||||
|
Focus on:
|
||||||
|
1. Key findings and conclusions
|
||||||
|
2. Novel techniques introduced
|
||||||
|
3. Practical applications
|
||||||
|
4. Limitations and caveats
|
||||||
|
```
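
The focus templates above can be assembled into a full distillation prompt before calling the model. A minimal sketch, assuming illustrative names (`FOCUS_TEMPLATES`, `build_distillation_prompt` are not part of the skill's defined API):

```python
# Illustrative assembly of a distillation prompt from a content-type template.
FOCUS_TEMPLATES = {
    "prompt_engineering": [
        "Specific techniques with before/after examples",
        "Why techniques work (not just what)",
        "Common pitfalls and how to avoid them",
        "Actionable patterns that can be directly applied",
    ],
    "api_documentation": [
        "Endpoint specifications and parameters",
        "Request/response examples",
        "Error codes and handling",
        "Rate limits and best practices",
    ],
}

def build_distillation_prompt(content_type: str, raw_content: str,
                              target_ratio: float = 0.30) -> str:
    # Number the focus lines and wrap the raw document in explicit tags
    focus = "\n".join(f"{i}. {line}"
                      for i, line in enumerate(FOCUS_TEMPLATES[content_type], 1))
    return (
        f"Distill the following document to roughly {target_ratio:.0%} of its length.\n"
        f"Focus on:\n{focus}\n\n"
        f"<document>\n{raw_content}\n</document>"
    )
```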

## Quality Metrics

Track compression efficiency:

| Metric | Target |
|--------|--------|
| Compression Ratio | 25-35% of original |
| Key Concept Coverage | ≥90% of important terms |
| Code Snippet Retention | 100% of relevant examples |
| Readability | Clear, scannable structure |
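
The compression-ratio target can be checked directly from the token counts already stored with each distillation. A small sketch (the helper name is illustrative):

```python
def check_compression_metrics(original_tokens: int, distilled_tokens: int) -> dict:
    # Compression ratio = distilled size as a fraction of the original;
    # the 0.25-0.35 band mirrors the target in the table above.
    ratio = distilled_tokens / original_tokens
    return {
        "compression_ratio": round(ratio, 3),
        "within_target": 0.25 <= ratio <= 0.35,
    }
```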

## Handling Refactor Requests

When `quality-reviewer` returns a `refactor` decision:

```python
def handle_refactor(distill_id, instructions):
    """Re-distill based on reviewer feedback."""
    # Load original content and existing distillation
    original = load_raw_content(distill_id)
    existing = load_distilled_content(distill_id)

    # Apply specific improvements based on instructions
    improved = apply_improvements(existing, instructions)

    # Update distilled_content
    update_distilled(distill_id, improved)

    # Reset review status
    set_review_status(distill_id, 'pending')
```

## Integration

| From | Input | To |
|------|-------|-----|
| content-repository | Raw document records | content-distiller |
| content-distiller | Distilled content | quality-reviewer |
| quality-reviewer | Refactor instructions | content-distiller (loop) |

@@ -0,0 +1,223 @@

# Quality Reviewer

Evaluates distilled content for quality, routes decisions, and triggers refactoring or additional research when needed.

## Review Workflow

```
[Distilled Content]
        │
        ▼
┌─────────────────┐
│ Score Criteria  │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
        │
        ▼
┌─────────────────┐
│ Calculate Total │ → weighted average
└─────────────────┘
        │
        ├── ≥ 0.85    → APPROVE → markdown-exporter
        ├── 0.60-0.84 → REFACTOR → content-distiller (with instructions)
        ├── 0.40-0.59 → DEEP_RESEARCH → web-crawler-orchestrator (with queries)
        └── < 0.40    → REJECT → archive with reason
```

## Scoring Criteria

| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date info, proper attribution |
| **Completeness** | 0.20 | Covers key concepts, includes examples, addresses edge cases |
| **Clarity** | 0.20 | Clear structure, concise language, logical flow |
| **PE Quality** | 0.25 | Demonstrates techniques, before/after examples, explains why |
| **Usability** | 0.10 | Easy to reference, searchable keywords, appropriate length |

## Decision Thresholds

| Score Range | Decision | Action |
|-------------|----------|--------|
| ≥ 0.85 | `approve` | Proceed to export |
| 0.60 - 0.84 | `refactor` | Return to distiller with feedback |
| 0.40 - 0.59 | `deep_research` | Gather more sources, then re-distill |
| < 0.40 | `reject` | Archive, log reason |

## Review Process

### Step 1: Load Content for Review

```python
def get_pending_reviews(cursor):
    sql = """
        SELECT dc.distill_id, dc.doc_id, d.title, d.url,
               dc.summary, dc.key_concepts, dc.structured_content,
               dc.token_count_original, dc.token_count_distilled,
               s.credibility_tier
        FROM distilled_content dc
        JOIN documents d ON dc.doc_id = d.doc_id
        JOIN sources s ON d.source_id = s.source_id
        WHERE dc.review_status = 'pending'
        ORDER BY s.credibility_tier ASC, dc.distill_date ASC
    """
    cursor.execute(sql)
    return cursor.fetchall()
```

### Step 2: Score Each Criterion

Evaluate content against each criterion using this assessment template:

```python
assessment_template = {
    "accuracy": {
        "score": 0.0,  # 0.00 - 1.00
        "notes": "",
        "issues": []   # Specific factual errors if any
    },
    "completeness": {
        "score": 0.0,
        "notes": "",
        "missing_topics": []  # Concepts that should be covered
    },
    "clarity": {
        "score": 0.0,
        "notes": "",
        "confusing_sections": []  # Sections needing rewrite
    },
    "prompt_engineering_quality": {
        "score": 0.0,
        "notes": "",
        "improvements": []  # Specific PE technique gaps
    },
    "usability": {
        "score": 0.0,
        "notes": "",
        "suggestions": []
    }
}
```

### Step 3: Calculate Final Score

```python
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10
}

def calculate_quality_score(assessment):
    return sum(
        assessment[criterion]["score"] * weight
        for criterion, weight in WEIGHTS.items()
    )
```

### Step 4: Route Decision

```python
def determine_decision(score, assessment):
    if score >= 0.85:
        return "approve", None, None
    elif score >= 0.60:
        instructions = generate_refactor_instructions(assessment)
        return "refactor", instructions, None
    elif score >= 0.40:
        queries = generate_research_queries(assessment)
        return "deep_research", None, queries
    else:
        return "reject", f"Quality score {score:.2f} below minimum threshold", None

def generate_refactor_instructions(assessment):
    """Extract actionable feedback from low-scoring criteria."""
    instructions = []
    for criterion, data in assessment.items():
        if data["score"] < 0.80:
            if data.get("issues"):
                instructions.extend(data["issues"])
            if data.get("missing_topics"):
                instructions.append(f"Add coverage for: {', '.join(data['missing_topics'])}")
            if data.get("improvements"):
                instructions.extend(data["improvements"])
    return "\n".join(instructions)

def generate_research_queries(assessment):
    """Generate search queries for content gaps."""
    queries = []
    if assessment["completeness"]["missing_topics"]:
        for topic in assessment["completeness"]["missing_topics"]:
            queries.append(f"{topic} documentation guide")
    if assessment["accuracy"]["issues"]:
        queries.append("latest official documentation verification")
    return queries
```

### Step 5: Log Review Decision

```python
def log_review(cursor, distill_id, assessment, score, decision, instructions=None, queries=None):
    # Get current round number
    cursor.execute(
        "SELECT COALESCE(MAX(review_round), 0) + 1 FROM review_logs WHERE distill_id = %s",
        (distill_id,)
    )
    review_round = cursor.fetchone()[0]

    sql = """
        INSERT INTO review_logs
            (distill_id, review_round, reviewer_type, quality_score, assessment,
             decision, refactor_instructions, research_queries)
        VALUES (%s, %s, 'claude_review', %s, %s, %s, %s, %s)
    """
    cursor.execute(sql, (
        distill_id, review_round, score,
        json.dumps(assessment), decision, instructions,
        json.dumps(queries) if queries else None
    ))

    # Update distilled_content status
    status_map = {
        "approve": "approved",
        "refactor": "needs_refactor",
        "deep_research": "needs_refactor",
        "reject": "rejected"
    }
    cursor.execute(
        "UPDATE distilled_content SET review_status = %s WHERE distill_id = %s",
        (status_map[decision], distill_id)
    )
```

## Prompt Engineering Quality Checklist

When scoring `prompt_engineering_quality`, verify:

- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work, not just *what*
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources

## Auto-Approve Rules

Tier 1 (official) sources with score ≥ 0.80 may auto-approve without human review if configured:

```yaml
# In export_config.yaml
quality:
  auto_approve_tier1_sources: true
  auto_approve_min_score: 0.80
```
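
A minimal sketch of how that rule could be evaluated in code. The config keys match `export_config.yaml` above and the tier value matches the discovery skill's `tier1_official`; the function name itself is illustrative:

```python
def may_auto_approve(credibility_tier: str, score: float, config: dict) -> bool:
    # Auto-approve only Tier 1 sources, only when the feature is enabled,
    # and only at or above the configured minimum score.
    quality = config.get("quality", {})
    return (
        quality.get("auto_approve_tier1_sources", False)
        and credibility_tier == "tier1_official"
        and score >= quality.get("auto_approve_min_score", 0.80)
    )
```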

## Integration Points

| From | Action | To |
|------|--------|-----|
| content-distiller | Sends distilled content | quality-reviewer |
| quality-reviewer | APPROVE | markdown-exporter |
| quality-reviewer | REFACTOR + instructions | content-distiller |
| quality-reviewer | DEEP_RESEARCH + queries | web-crawler-orchestrator |

@@ -0,0 +1,290 @@

# Markdown Exporter

Exports approved content as structured markdown files for Claude Projects or fine-tuning.

## Export Configuration

```yaml
# ~/.config/reference-curator/export_config.yaml
output:
  base_path: ~/reference-library/exports/

  project_files:
    structure: nested_by_topic  # flat | nested_by_topic | nested_by_source
    index_file: INDEX.md
    include_metadata: true

  fine_tuning:
    format: jsonl
    max_tokens_per_sample: 4096
    include_system_prompt: true

quality:
  min_score_for_export: 0.80
```

## Export Workflow

### Step 1: Query Approved Content

```python
def get_exportable_content(cursor, min_score=0.80, topic_filter=None):
    """Get all approved content meeting quality threshold."""
    sql = """
        SELECT d.doc_id, d.title, d.url,
               dc.summary, dc.key_concepts, dc.code_snippets, dc.structured_content,
               t.topic_slug, t.topic_name,
               rl.quality_score, s.credibility_tier, s.vendor
        FROM documents d
        JOIN distilled_content dc ON d.doc_id = dc.doc_id
        JOIN document_topics dt ON d.doc_id = dt.doc_id
        JOIN topics t ON dt.topic_id = t.topic_id
        JOIN review_logs rl ON dc.distill_id = rl.distill_id
        JOIN sources s ON d.source_id = s.source_id
        WHERE rl.decision = 'approve'
          AND rl.quality_score >= %s
          AND rl.review_id = (
              SELECT MAX(review_id) FROM review_logs
              WHERE distill_id = dc.distill_id
          )
    """
    params = [min_score]

    if topic_filter:
        sql += " AND t.topic_slug IN (%s)" % ','.join(['%s'] * len(topic_filter))
        params.extend(topic_filter)

    sql += " ORDER BY t.topic_slug, rl.quality_score DESC"
    cursor.execute(sql, params)
    return cursor.fetchall()
```

### Step 2: Organize by Structure

**Nested by Topic (recommended):**
```
exports/
├── INDEX.md
├── prompt-engineering/
│   ├── _index.md
│   ├── 01-chain-of-thought.md
│   ├── 02-few-shot-prompting.md
│   └── 03-system-prompts.md
├── claude-models/
│   ├── _index.md
│   ├── 01-model-comparison.md
│   └── 02-context-windows.md
└── agent-building/
    ├── _index.md
    └── 01-tool-use.md
```

**Flat Structure:**
```
exports/
├── INDEX.md
├── prompt-engineering-chain-of-thought.md
├── prompt-engineering-few-shot.md
└── claude-models-comparison.md
```
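
Both layouts build their filenames from document titles via `slugify`, which the file-writing code below calls but this skill does not define. A minimal sketch of one reasonable implementation:

```python
import re

def slugify(title: str) -> str:
    # Lowercase, collapse every run of non-alphanumerics to a single hyphen,
    # and trim stray hyphens from the ends.
    slug = title.lower()
    slug = re.sub(r'[^a-z0-9]+', '-', slug)
    return slug.strip('-')
```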

### Step 3: Generate Files

**Document File Template:**
```python
def generate_document_file(doc, include_metadata=True):
    content = []

    if include_metadata:
        content.append("---")
        content.append(f"title: {doc['title']}")
        content.append(f"source: {doc['url']}")
        content.append(f"vendor: {doc['vendor']}")
        content.append(f"tier: {doc['credibility_tier']}")
        content.append(f"quality_score: {doc['quality_score']:.2f}")
        content.append(f"exported: {datetime.now().isoformat()}")
        content.append("---")
        content.append("")

    content.append(doc['structured_content'])

    return "\n".join(content)
```

**Topic Index Template:**
```python
def generate_topic_index(topic_slug, topic_name, documents):
    content = [
        f"# {topic_name}",
        "",
        f"This section contains {len(documents)} reference documents.",
        "",
        "## Contents",
        ""
    ]

    for i, doc in enumerate(documents, 1):
        filename = generate_filename(doc['title'])
        content.append(f"{i}. [{doc['title']}]({filename})")

    return "\n".join(content)
```

**Root INDEX Template:**
```python
def generate_root_index(topics_with_counts, export_date):
    content = [
        "# Reference Library",
        "",
        f"Exported: {export_date}",
        "",
        "## Topics",
        ""
    ]

    for topic in topics_with_counts:
        content.append(f"- [{topic['name']}]({topic['slug']}/) ({topic['count']} documents)")

    content.extend([
        "",
        "## Quality Standards",
        "",
        "All documents in this library have:",
        "- Passed quality review (score ≥ 0.80)",
        "- Been distilled for conciseness",
        "- Verified source attribution"
    ])

    return "\n".join(content)
```

### Step 4: Write Files

```python
def export_project_files(content_list, config):
    base_path = Path(config['output']['base_path'])
    structure = config['output']['project_files']['structure']

    # Group by topic
    by_topic = defaultdict(list)
    for doc in content_list:
        by_topic[doc['topic_slug']].append(doc)

    # Create directories and files
    for topic_slug, docs in by_topic.items():
        if structure == 'nested_by_topic':
            topic_dir = base_path / topic_slug
            topic_dir.mkdir(parents=True, exist_ok=True)

            # Write topic index
            topic_index = generate_topic_index(topic_slug, docs[0]['topic_name'], docs)
            (topic_dir / '_index.md').write_text(topic_index)

            # Write document files
            for i, doc in enumerate(docs, 1):
                filename = f"{i:02d}-{slugify(doc['title'])}.md"
                file_content = generate_document_file(doc)
                (topic_dir / filename).write_text(file_content)

    # Write root INDEX
    topics_summary = [
        {"slug": slug, "name": docs[0]['topic_name'], "count": len(docs)}
        for slug, docs in by_topic.items()
    ]
    root_index = generate_root_index(topics_summary, datetime.now().isoformat())
    (base_path / 'INDEX.md').write_text(root_index)
```

### Step 5: Fine-tuning Export (Optional)

```python
def export_fine_tuning_dataset(content_list, config):
    """Export as JSONL for fine-tuning."""
    output_path = Path(config['output']['base_path']) / 'fine_tuning.jsonl'
    max_tokens = config['output']['fine_tuning']['max_tokens_per_sample']

    with open(output_path, 'w') as f:
        for doc in content_list:
            sample = {
                "messages": [
                    {
                        "role": "system",
                        "content": "You are an expert on AI and prompt engineering."
                    },
                    {
                        "role": "user",
                        "content": f"Explain {doc['title']}"
                    },
                    {
                        "role": "assistant",
                        "content": truncate_to_tokens(doc['structured_content'], max_tokens)
                    }
                ],
                "metadata": {
                    "source": doc['url'],
                    "topic": doc['topic_slug'],
                    "quality_score": doc['quality_score']
                }
            }
            f.write(json.dumps(sample) + '\n')
```
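
`truncate_to_tokens` is used above but not defined in this skill. A crude whitespace-token sketch; a production implementation would count tokens with the target model's tokenizer instead:

```python
def truncate_to_tokens(text: str, max_tokens: int) -> str:
    # Approximate tokens as whitespace-separated words and cut at the limit.
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[:max_tokens])
```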

### Step 6: Log Export Job

```python
def log_export_job(cursor, export_name, export_type, output_path,
                   topic_filter, total_docs, total_tokens):
    sql = """
        INSERT INTO export_jobs
            (export_name, export_type, output_format, topic_filter, output_path,
             total_documents, total_tokens, status, started_at, completed_at)
        VALUES (%s, %s, 'markdown', %s, %s, %s, %s, 'completed', NOW(), NOW())
    """
    cursor.execute(sql, (
        export_name, export_type,
        json.dumps(topic_filter) if topic_filter else None,
        str(output_path), total_docs, total_tokens
    ))
```

## Cross-Reference Generation

Link related documents:

```python
def add_cross_references(doc, all_docs):
    """Find and link related documents."""
    related = []
    doc_concepts = set(c['term'].lower() for c in doc['key_concepts'])

    for other in all_docs:
        if other['doc_id'] == doc['doc_id']:
            continue
        other_concepts = set(c['term'].lower() for c in other['key_concepts'])
        overlap = len(doc_concepts & other_concepts)
        if overlap >= 2:
            related.append({
                "title": other['title'],
                "path": generate_relative_path(doc, other),
                "overlap": overlap
            })

    return sorted(related, key=lambda x: x['overlap'], reverse=True)[:5]
```

## Output Verification

After export, verify:

- [ ] All files readable and valid markdown
- [ ] INDEX.md links resolve correctly
- [ ] No broken cross-references
- [ ] Total token count matches expectation
- [ ] No duplicate content
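
The "links resolve" check can be automated. A sketch under the assumption that links are standard markdown `[text](target)` pairs; `verify_index_links` is an illustrative helper, not part of the exporter's defined API:

```python
import re
from pathlib import Path

def verify_index_links(index_path: Path) -> list:
    """Return relative markdown link targets in a file that do not exist on disk."""
    text = index_path.read_text()
    broken = []
    for target in re.findall(r'\]\(([^)]+)\)', text):
        # Only local paths are checked; web links and anchors are skipped.
        if target.startswith(('http://', 'https://', '#')):
            continue
        if not (index_path.parent / target).exists():
            broken.append(target)
    return broken
```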

## Integration

| From | Input | To |
|------|-------|-----|
| quality-reviewer | Approved content IDs | markdown-exporter |
| markdown-exporter | Structured files | Project knowledge / Fine-tuning |

custom-skills/90-reference-curator/claude-project/INDEX.md
@@ -0,0 +1,89 @@

# Reference Curator - Claude.ai Project Knowledge

This project knowledge enables Claude to curate, process, and export reference documentation through 6 modular skills.

## Skills Overview

| Skill | Purpose | Trigger Phrases |
|-------|---------|-----------------|
| **reference-discovery** | Search & validate authoritative sources | "find references", "search documentation", "discover sources" |
| **web-crawler** | Multi-backend crawling orchestration | "crawl URL", "fetch documents", "scrape pages" |
| **content-repository** | MySQL storage management | "store content", "save to database", "check duplicates" |
| **content-distiller** | Summarize & extract key concepts | "distill content", "summarize document", "extract key concepts" |
| **quality-reviewer** | QA scoring & routing decisions | "review content", "quality check", "assess distilled content" |
| **markdown-exporter** | Export to markdown/JSONL | "export references", "generate project files", "create markdown output" |

## Workflow

```
[Topic Input]
      │
      ▼
┌─────────────────────┐
│ reference-discovery │ → Search & validate sources
└─────────────────────┘
      │
      ▼
┌─────────────────────┐
│ web-crawler         │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└─────────────────────┘
      │
      ▼
┌─────────────────────┐
│ content-repository  │ → Store in MySQL
└─────────────────────┘
      │
      ▼
┌─────────────────────┐
│ content-distiller   │ → Summarize & extract
└─────────────────────┘
      │
      ▼
┌─────────────────────┐
│ quality-reviewer    │ → QA loop
└─────────────────────┘
      │
      ├── REFACTOR → content-distiller
      ├── DEEP_RESEARCH → web-crawler
      │
      ▼ APPROVE
┌─────────────────────┐
│ markdown-exporter   │ → Project files / Fine-tuning
└─────────────────────┘
```

## Quality Scoring Thresholds

| Score | Decision | Action |
|-------|----------|--------|
| ≥ 0.85 | **Approve** | Ready for export |
| 0.60-0.84 | **Refactor** | Re-distill with feedback |
| 0.40-0.59 | **Deep Research** | Gather more sources |
| < 0.40 | **Reject** | Archive (low quality) |

## Source Credibility Tiers

| Tier | Source Type | Examples |
|------|-------------|----------|
| **Tier 1** | Official documentation | docs.anthropic.com, platform.openai.com/docs |
| **Tier 1** | Official engineering blogs | anthropic.com/news, openai.com/blog |
| **Tier 2** | Research papers | arxiv.org papers with citations |
| **Tier 2** | Verified community guides | Official cookbooks, tutorials |
| **Tier 3** | Community content | Blog posts, Stack Overflow |

## Files in This Project

- `INDEX.md` - This overview file
- `reference-curator-complete.md` - All 6 skills in one file
- `01-reference-discovery.md` - Source discovery skill
- `02-web-crawler.md` - Crawling orchestration skill
- `03-content-repository.md` - Database storage skill
- `04-content-distiller.md` - Content summarization skill
- `05-quality-reviewer.md` - QA review skill
- `06-markdown-exporter.md` - Export skill

## Usage

Upload all files to a Claude.ai Project, or upload only the skills you need.

For the complete experience, upload `reference-curator-complete.md`, which contains all skills in one file.

@@ -0,0 +1,473 @@

# Reference Curator - Complete Skill Set

This document contains all 6 skills for curating, processing, and exporting reference documentation.

---

# 1. Reference Discovery

Searches for authoritative sources, validates credibility, and produces curated URL lists for crawling.

## Source Priority Hierarchy

| Tier | Source Type | Examples |
|------|-------------|----------|
| **Tier 1** | Official documentation | docs.anthropic.com, docs.claude.com, platform.openai.com/docs |
| **Tier 1** | Engineering blogs (official) | anthropic.com/news, openai.com/blog |
| **Tier 1** | Official GitHub repos | github.com/anthropics/*, github.com/openai/* |
| **Tier 2** | Research papers | arxiv.org, papers with citations |
| **Tier 2** | Verified community guides | Cookbook examples, official tutorials |
| **Tier 3** | Community content | Blog posts, tutorials, Stack Overflow |

## Discovery Workflow

### Step 1: Define Search Scope

```python
search_config = {
    "topic": "prompt engineering",
    "vendors": ["anthropic", "openai", "google"],
    "source_types": ["official_docs", "engineering_blog", "github_repo"],
    "freshness": "past_year",
    "max_results_per_query": 20
}
```

### Step 2: Generate Search Queries

```python
def generate_queries(topic, vendors):
    queries = []
    for vendor in vendors:
        queries.append(f"site:docs.{vendor}.com {topic}")
        queries.append(f"site:{vendor}.com/docs {topic}")
        queries.append(f"site:{vendor}.com/blog {topic}")
        queries.append(f"site:github.com/{vendor} {topic}")
    queries.append(f"site:arxiv.org {topic}")
    return queries
```

### Step 3: Validate and Score Sources

```python
def score_source(url, title):
    score = 0.0
    if any(d in url for d in ['docs.anthropic.com', 'docs.claude.com', 'docs.openai.com']):
        score += 0.40  # Tier 1 official docs
    elif any(d in url for d in ['anthropic.com', 'openai.com', 'google.dev']):
        score += 0.30  # Tier 1 official blog/news
    elif 'github.com' in url and any(v in url for v in ['anthropics', 'openai', 'google']):
        score += 0.30  # Tier 1 official repos
    elif 'arxiv.org' in url:
        score += 0.20  # Tier 2 research
    else:
        score += 0.10  # Tier 3 community
    return min(score, 1.0)

def assign_credibility_tier(score):
    if score >= 0.60:
        return 'tier1_official'
    elif score >= 0.40:
        return 'tier2_verified'
    else:
        return 'tier3_community'
```

## Output Format

```json
{
  "discovery_date": "2025-01-28T10:30:00",
  "topic": "prompt engineering",
  "total_urls": 15,
  "urls": [
    {
      "url": "https://docs.anthropic.com/en/docs/prompt-engineering",
      "title": "Prompt Engineering Guide",
      "credibility_tier": "tier1_official",
      "credibility_score": 0.85,
      "source_type": "official_docs",
      "vendor": "anthropic"
    }
  ]
}
```

---

# 2. Web Crawler Orchestrator

Manages crawling operations using Firecrawl MCP with rate limiting and format handling.

## Crawl Configuration

```yaml
firecrawl:
  rate_limit:
    requests_per_minute: 20
    concurrent_requests: 3
  default_options:
    timeout: 30000
    only_main_content: true
```

## Crawl Workflow

### Determine Crawl Strategy

```python
def select_strategy(url):
    if url.endswith('.pdf'):
        return 'pdf_extract'
    elif 'github.com' in url and '/blob/' in url:
        return 'raw_content'
    elif any(d in url for d in ['docs.', 'documentation']):
        return 'scrape'
    else:
        return 'scrape'
```

### Execute Firecrawl

```python
# Single page scrape
firecrawl_scrape(
    url="https://docs.anthropic.com/en/docs/prompt-engineering",
    formats=["markdown"],
    only_main_content=True,
    timeout=30000
)

# Multi-page crawl
firecrawl_crawl(
    url="https://docs.anthropic.com/en/docs/",
    max_depth=2,
    limit=50,
    formats=["markdown"]
)
```

### Rate Limiting

```python
class RateLimiter:
    def __init__(self, requests_per_minute=20):
        self.rpm = requests_per_minute
        self.request_times = deque()

    def wait_if_needed(self):
        now = time.time()
        while self.request_times and now - self.request_times[0] > 60:
            self.request_times.popleft()
        if len(self.request_times) >= self.rpm:
            wait_time = 60 - (now - self.request_times[0])
            if wait_time > 0:
                time.sleep(wait_time)
        self.request_times.append(time.time())
```

## Error Handling

| Error | Action |
|-------|--------|
| Timeout | Retry once with 2x timeout |
| Rate limit (429) | Exponential backoff, max 3 retries |
| Not found (404) | Log and skip |
| Access denied (403) | Log, mark as `failed` |
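
The policy in the table can be sketched as a retry wrapper. Everything here is illustrative: `fetch` stands in for the actual crawl backend, and the exception classes are hypothetical stand-ins for the backend's 429/404/403 errors:

```python
import time

class RateLimitError(Exception): pass     # stand-in for HTTP 429
class NotFoundError(Exception): pass      # stand-in for HTTP 404
class AccessDeniedError(Exception): pass  # stand-in for HTTP 403

def crawl_with_retries(fetch, url, timeout=30, max_retries=3):
    for attempt in range(max_retries):
        try:
            return fetch(url, timeout=timeout)
        except TimeoutError:
            if attempt == 0:
                timeout *= 2              # retry once with doubled timeout
            else:
                return None
        except RateLimitError:
            time.sleep(2 ** attempt)      # exponential backoff, max 3 attempts
        except NotFoundError:
            return None                   # 404: log and skip
        except AccessDeniedError:
            return None                   # 403: log, mark as failed
    return None
```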
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
# 3. Content Repository
|
||||||
|
|
||||||
|
Manages MySQL storage for the reference library. Handles document storage, version control, deduplication, and retrieval.
|
||||||
|
|
||||||
|
## Core Operations
|
||||||
|
|
||||||
|
**Store New Document:**
|
||||||
|
```python
|
||||||
|
def store_document(cursor, source_id, title, url, doc_type, raw_content_path):
|
||||||
|
sql = """
|
||||||
|
INSERT INTO documents (source_id, title, url, doc_type, crawl_date, crawl_status, raw_content_path)
|
||||||
|
VALUES (%s, %s, %s, %s, NOW(), 'completed', %s)
|
||||||
|
ON DUPLICATE KEY UPDATE
|
||||||
|
version = version + 1,
|
||||||
|
crawl_date = NOW(),
|
||||||
|
raw_content_path = VALUES(raw_content_path)
|
||||||
|
"""
|
||||||
|
cursor.execute(sql, (source_id, title, url, doc_type, raw_content_path))
|
||||||
|
return cursor.lastrowid
|
||||||
|
```
|
||||||
|
|
||||||
|
**Check Duplicate:**
```python
def is_duplicate(cursor, url):
    cursor.execute("SELECT doc_id FROM documents WHERE url_hash = SHA2(%s, 256)", (url,))
    return cursor.fetchone() is not None
```

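Since deduplication keys on `SHA2(url, 256)`, the same hash can be computed client-side to pre-filter a batch of URLs before touching the database. A small convenience sketch, not part of the schema; `filter_new_urls` is a hypothetical helper:

```python
import hashlib

def url_hash(url):
    # Matches MySQL's SHA2(url, 256): lowercase hex SHA-256 of the UTF-8 URL
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

def filter_new_urls(urls, known_hashes):
    # known_hashes: set of url_hash values already present in `documents`
    return [u for u in urls if url_hash(u) not in known_hashes]
```
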
## Table Quick Reference

| Table | Purpose | Key Fields |
|-------|---------|------------|
| `sources` | Authorized content sources | source_type, credibility_tier, vendor |
| `documents` | Crawled document metadata | url_hash (dedup), version, crawl_status |
| `distilled_content` | Processed summaries | review_status, compression_ratio |
| `review_logs` | QA decisions | quality_score, decision |
| `topics` | Taxonomy | topic_slug, parent_topic_id |

## Status Values

- **crawl_status:** `pending` → `completed` | `failed` | `stale`
- **review_status:** `pending` → `in_review` → `approved` | `needs_refactor` | `rejected`
- **decision:** `approve` | `refactor` | `deep_research` | `reject`

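A transition guard can be encoded directly from the arrows above. This sketch assumes the listed transitions are the only legal ones (the lists do not state whether, e.g., `needs_refactor` may return to `in_review`):

```python
# Legal state transitions, taken literally from the lists above
TRANSITIONS = {
    "crawl_status": {"pending": {"completed", "failed", "stale"}},
    "review_status": {
        "pending": {"in_review"},
        "in_review": {"approved", "needs_refactor", "rejected"},
    },
}

def can_transition(field, current, new):
    return new in TRANSITIONS.get(field, {}).get(current, set())
```
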
---

# 4. Content Distiller

Transforms raw crawled content into structured, high-quality reference materials.

## Distillation Goals

1. **Compress** - Reduce token count while preserving essential information
2. **Structure** - Organize content for easy retrieval and reference
3. **Extract** - Pull out code snippets, key concepts, and actionable patterns
4. **Annotate** - Add metadata for searchability and categorization

## Extract Key Components

**Extract Code Snippets:**

```python
import re

def extract_code_snippets(content):
    pattern = r'```(\w*)\n([\s\S]*?)```'
    snippets = []
    for match in re.finditer(pattern, content):
        snippets.append({
            "language": match.group(1) or "text",
            "code": match.group(2).strip(),
            # get_surrounding_text is a helper defined elsewhere in this skill
            "context": get_surrounding_text(content, match.start(), 200)
        })
    return snippets
```

**Extract Key Concepts:**

```python
def extract_key_concepts(content, title):
    prompt = f"""
    Analyze this document and extract key concepts:

    Title: {title}
    Content: {content[:8000]}

    Return JSON with:
    - concepts: [{{"term": "...", "definition": "...", "importance": "high|medium|low"}}]
    - techniques: [{{"name": "...", "description": "...", "use_case": "..."}}]
    - best_practices: ["..."]
    """
    return claude_extract(prompt)
```

## Summary Template

```markdown
# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}

## Executive Summary
{2-3 sentence overview}

## Key Concepts
{bulleted list of core concepts}

## Techniques & Patterns
{extracted techniques with use cases}

## Code Examples
{relevant code snippets}

## Best Practices
{actionable recommendations}
```

## Quality Metrics

| Metric | Target |
|--------|--------|
| Compression Ratio | 25-35% of original |
| Key Concept Coverage | ≥90% of important terms |
| Code Snippet Retention | 100% of relevant examples |

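These targets can be checked mechanically after each distillation pass. A simple sketch, using character counts as a proxy for tokens (an assumption; a real check would count tokens):

```python
def check_distillation_metrics(original, distilled, snippets_before, snippets_after):
    ratio = len(distilled) / len(original)
    return {
        "compression_ratio": ratio,
        "compression_ok": 0.25 <= ratio <= 0.35,     # 25-35% of original
        "snippet_retention": snippets_after / snippets_before if snippets_before else 1.0,
    }
```
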
---

# 5. Quality Reviewer

Evaluates distilled content, routes decisions, and triggers refactoring or additional research.

## Review Workflow

```
[Distilled Content]
        │
        ▼
┌─────────────────┐
│ Score Criteria  │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
        │
        ├── ≥ 0.85     → APPROVE       → markdown-exporter
        ├── 0.60-0.84  → REFACTOR      → content-distiller (with instructions)
        ├── 0.40-0.59  → DEEP_RESEARCH → web-crawler (with queries)
        └── < 0.40     → REJECT        → archive with reason
```

## Scoring Criteria

| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date info, proper attribution |
| **Completeness** | 0.20 | Covers key concepts, includes examples, addresses edge cases |
| **Clarity** | 0.20 | Clear structure, concise language, logical flow |
| **PE Quality** | 0.25 | Demonstrates techniques, before/after examples, explains why |
| **Usability** | 0.10 | Easy to reference, searchable keywords, appropriate length |

## Calculate Final Score

```python
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10
}

def calculate_quality_score(assessment):
    return sum(
        assessment[criterion]["score"] * weight
        for criterion, weight in WEIGHTS.items()
    )
```

## Route Decision

```python
def determine_decision(score, assessment):
    if score >= 0.85:
        return "approve", None, None
    elif score >= 0.60:
        instructions = generate_refactor_instructions(assessment)
        return "refactor", instructions, None
    elif score >= 0.40:
        queries = generate_research_queries(assessment)
        return "deep_research", None, queries
    else:
        return "reject", f"Quality score {score:.2f} below minimum", None
```

## Prompt Engineering Quality Checklist

- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work, not just *what*
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources

---

# 6. Markdown Exporter

Exports approved content as structured markdown files for Claude Projects or fine-tuning.

## Export Structure

**Nested by Topic (recommended):**
```
exports/
├── INDEX.md
├── prompt-engineering/
│   ├── _index.md
│   ├── 01-chain-of-thought.md
│   └── 02-few-shot-prompting.md
├── claude-models/
│   ├── _index.md
│   └── 01-model-comparison.md
└── agent-building/
    └── 01-tool-use.md
```

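The nested layout can be produced with a small path helper. A sketch; the topic slug, sequence number, and title slug are assumed to be precomputed upstream:

```python
from pathlib import Path

def export_path(root, topic_slug, seq, title_slug):
    # e.g. exports/prompt-engineering/01-chain-of-thought.md
    return Path(root) / topic_slug / f"{seq:02d}-{title_slug}.md"
```
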
## Document File Template

```python
def generate_document_file(doc, include_metadata=True):
    content = []
    if include_metadata:
        content.append("---")
        content.append(f"title: {doc['title']}")
        content.append(f"source: {doc['url']}")
        content.append(f"vendor: {doc['vendor']}")
        content.append(f"tier: {doc['credibility_tier']}")
        content.append(f"quality_score: {doc['quality_score']:.2f}")
        content.append("---")
        content.append("")
    content.append(doc['structured_content'])
    return "\n".join(content)
```

## Fine-tuning Export (JSONL)

```python
import json

def export_fine_tuning_dataset(content_list, config):
    with open('fine_tuning.jsonl', 'w') as f:
        for doc in content_list:
            sample = {
                "messages": [
                    {"role": "system", "content": "You are an expert on AI and prompt engineering."},
                    {"role": "user", "content": f"Explain {doc['title']}"},
                    {"role": "assistant", "content": doc['structured_content']}
                ],
                "metadata": {
                    "source": doc['url'],
                    "topic": doc['topic_slug'],
                    "quality_score": doc['quality_score']
                }
            }
            f.write(json.dumps(sample) + '\n')
```

## Cross-Reference Generation

```python
def add_cross_references(doc, all_docs):
    related = []
    doc_concepts = set(c['term'].lower() for c in doc['key_concepts'])

    for other in all_docs:
        if other['doc_id'] == doc['doc_id']:
            continue
        other_concepts = set(c['term'].lower() for c in other['key_concepts'])
        overlap = len(doc_concepts & other_concepts)
        if overlap >= 2:
            related.append({
                "title": other['title'],
                "path": generate_relative_path(doc, other),
                "overlap": overlap
            })

    return sorted(related, key=lambda x: x['overlap'], reverse=True)[:5]
```

---

# Integration Flow

| From | Output | To |
|------|--------|-----|
| **reference-discovery** | URL manifest | web-crawler |
| **web-crawler** | Raw content + manifest | content-repository |
| **content-repository** | Document records | content-distiller |
| **content-distiller** | Distilled content | quality-reviewer |
| **quality-reviewer** (approve) | Approved IDs | markdown-exporter |
| **quality-reviewer** (refactor) | Instructions | content-distiller |
| **quality-reviewer** (deep_research) | Queries | web-crawler |

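The handoffs in the table above can be wired into a single driver loop. A hedged sketch: the callables stand in for the skills, and the two-tuple `(decision, payload)` review result is an assumption for illustration.

```python
def run_pipeline(url, crawl, distill, review, export, max_passes=3):
    raw = crawl(url)
    instructions = None
    for _ in range(max_passes):
        distilled = distill(raw, instructions)
        decision, payload = review(distilled)
        if decision == "approve":
            return export(distilled)
        if decision == "refactor":
            instructions = payload        # reviewer instructions back to the distiller
        elif decision == "deep_research":
            raw = crawl(payload)          # payload carries the research queries
            instructions = None
        else:                             # reject: archive with reason
            return None
    return None
```
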
@@ -717,6 +717,65 @@ EOF
post_install
}

# ============================================================================
# Export for Claude.ai Projects
# ============================================================================

export_claude_ai() {
    print_header
    echo -e "${BOLD}Export for Claude.ai Projects${NC}"
    echo ""

    local project_dir="$SCRIPT_DIR/claude-project"

    if [[ ! -d "$project_dir" ]]; then
        print_error "claude-project directory not found"
        echo "Expected: $project_dir"
        exit 1
    fi

    echo "Available files for Claude.ai Projects:"
    echo ""
    echo -e "  ${CYAN}Consolidated (single file):${NC}"
    echo "    reference-curator-complete.md - All 6 skills in one file"
    echo ""
    echo -e "  ${CYAN}Individual skills:${NC}"
    ls -1 "$project_dir"/*.md 2>/dev/null | while read -r file; do
        local filename=$(basename "$file")
        local size=$(du -h "$file" | cut -f1)
        if [[ "$filename" != "INDEX.md" && "$filename" != "reference-curator-complete.md" ]]; then
            echo "    $filename ($size)"
        fi
    done
    echo ""

    echo -e "${BOLD}Upload to Claude.ai:${NC}"
    echo ""
    echo "  1. Go to https://claude.ai"
    echo "  2. Create a new Project or open an existing one"
    echo "  3. Click 'Add to project knowledge'"
    echo "  4. Upload files from:"
    echo -e "     ${CYAN}$project_dir${NC}"
    echo ""
    echo "  Recommended: Upload 'reference-curator-complete.md' for the full skill set"
    echo ""

    if prompt_yes_no "Copy files to a different location?" "n"; then
        prompt_with_default "Destination directory" "$HOME/Desktop/reference-curator-claude-ai" "DEST_DIR"

        mkdir -p "$DEST_DIR"
        cp "$project_dir"/*.md "$DEST_DIR/"

        print_success "Files copied to $DEST_DIR"
        echo ""
        echo "Files ready for upload:"
        ls -la "$DEST_DIR"/*.md
    fi

    echo ""
    echo -e "${GREEN}Done!${NC} Upload the files to your Claude.ai Project."
}

# ============================================================================
# Entry Point
# ============================================================================
@@ -731,6 +790,9 @@ case "${1:-}" in

    --minimal)
        install_minimal
        ;;
    --claude-ai)
        export_claude_ai
        ;;
    --help|-h)
        echo "Reference Curator - Portable Installation Script"
        echo ""
@@ -738,6 +800,7 @@ case "${1:-}" in

        echo "  ./install.sh              Interactive installation"
        echo "  ./install.sh --check      Check installation status"
        echo "  ./install.sh --minimal    Firecrawl-only mode (no MySQL)"
        echo "  ./install.sh --claude-ai  Export skills for Claude.ai Projects"
        echo "  ./install.sh --uninstall  Remove installation"
        echo "  ./install.sh --help       Show this help"
        ;;
|
||||||
|
|||||||
Reference in New Issue
Block a user