# Reference Curator - User Guide

A modular skill suite for curating reference documentation with Claude Code.

## Quick Start

### One-Shot Full Pipeline (Recommended)

```bash
# Research a topic and generate reference library
/reference-curator-pipeline "Claude Code best practices" --auto-approve

# From specific URLs
/reference-curator-pipeline https://docs.anthropic.com/en/docs/agents

# With options
/reference-curator-pipeline "MCP servers" --max-sources 10 --export-format fine_tuning
```

### Run as Background Agent

Ask Claude Code:

> "Run reference-curator as a background agent for topic X"

This launches autonomous processing while you continue other work.

---

## Individual Skills

Use these when you need granular control over each stage.

### 1. Reference Discovery

**Purpose:** Search and discover authoritative sources

```bash
/reference-discovery "prompt engineering techniques"
```

**Output:** Discovery manifest (JSON) with scored URLs

| Tier | Score | Sources |
|------|-------|---------|
| tier1_official | ≥0.60 | Official docs, vendor blogs |
| tier2_verified | 0.40-0.59 | Research papers, cookbooks |
| tier3_community | <0.40 | Tutorials, blog posts |

---

### 2. Web Crawler

**Purpose:** Crawl URLs and save raw content

```bash
# From discovery manifest
/web-crawler /path/to/manifest.json

# Single URL
/web-crawler https://docs.example.com/guide
```

**Output:** Markdown files in `~/Documents/05_AI Agent/10_Reference Library/raw/YYYY/MM/`

**Crawler Selection:**

| Crawler | Best For |
|---------|----------|
| Firecrawl (default) | SPAs, JS-rendered sites |
| WebFetch | Simple documentation pages |
| Node.js | Small static sites (≤50 pages) |

---

### 3. Content Repository

**Purpose:** Store documents in MySQL with versioning

```bash
# Store crawled content
/content-repository store --topic "my-topic"

# Query existing documents
/content-repository query --topic "my-topic"
```

**Prerequisites:** MySQL database `reference_library` with schema applied.

---

### 4. Content Distiller

**Purpose:** Summarize and extract key concepts

```bash
/content-distiller --topic "my-topic"
```

**Output per document:**

- Executive summary (2-3 sentences)
- Key concepts (JSON)
- Code snippets (JSON)
- Structured content (Markdown)
- ~15-20% compression ratio

---

### 5. Quality Reviewer

**Purpose:** Score and approve content

```bash
/quality-reviewer --topic "my-topic"
```

**Scoring Criteria:**

| Criterion | Weight |
|-----------|--------|
| Accuracy | 25% |
| Completeness | 20% |
| Clarity | 20% |
| PE Quality | 25% |
| Usability | 10% |

**Decisions:**

| Score | Decision | Action |
|-------|----------|--------|
| ≥0.85 | APPROVE | Ready for export |
| 0.60-0.84 | REFACTOR | Re-distill with feedback |
| 0.40-0.59 | DEEP_RESEARCH | Crawl more sources |
| <0.40 | REJECT | Archive |

---

### 6. Markdown Exporter

**Purpose:** Generate final export files

```bash
# Project files (for Claude Projects)
/markdown-exporter --topic "my-topic" --format project_files

# Fine-tuning dataset (JSONL)
/markdown-exporter --topic "my-topic" --format fine_tuning
```

**Output Structure:**

```
~/Documents/05_AI Agent/10_Reference Library/exports/
├── INDEX.md
└── my-topic/
    ├── _index.md
    ├── 01-document-one.md
    ├── 02-document-two.md
    └── ...
```

---

## Common Workflows

### Workflow 1: Quick Reference on a Topic

```
You: "Create a reference library on n8n self-hosting"
Claude: [Runs full pipeline automatically]
```

### Workflow 2: Curate Specific URLs

```
You: "Add these URLs to my reference library:
     - https://docs.example.com/guide1
     - https://docs.example.com/guide2"
Claude: [Skips discovery, crawls URLs directly]
```

### Workflow 3: Update Existing Topic

```
You: "Update the MCP developer manual with latest SDK docs"
Claude: [Discovers new sources, crawls, merges with existing]
```

### Workflow 4: Export for Fine-Tuning

```
You: "Export all approved content as JSONL for fine-tuning"
Claude: /markdown-exporter --format fine_tuning
```

---

## Configuration

### Storage Paths

| Type | Location |
|------|----------|
| Raw content | `~/Documents/05_AI Agent/10_Reference Library/raw/` |
| Exports | `~/Documents/05_AI Agent/10_Reference Library/exports/` |
| Config | `~/.config/reference-curator/` |

### MySQL Setup

Database credentials via `~/.my.cnf`:

```ini
[client]
user = root
password = your_password
socket = /tmp/mysql.sock
```

### Config Files

```
~/.config/reference-curator/
├── db_config.yaml      # Database connection
├── crawl_config.yaml   # Crawler settings
└── export_config.yaml  # Export preferences
```

---

## Tips

### For Best Results

1. **Be specific with topics** - "Claude Code MCP integration" > "MCP"
2. **Use official sources first** - tier1 sources get higher quality scores
3. **Review before export** - Check quality scores, refine if needed
4. **Use background agents** - For large topics, run as background task

### Troubleshooting

| Issue | Solution |
|-------|----------|
| MySQL auth failed | Check `~/.my.cnf` permissions (chmod 600) |
| Missing table | Run schema.sql from `shared/schema.sql` |
| Low quality scores | Add more tier1 sources, re-distill |
| Crawl timeout | Try different crawler or reduce max_pages |
| Export empty | Check documents are approved in quality review |

### Checking Pipeline Status

```sql
-- View pending documents
SELECT title, review_status
FROM distilled_content
JOIN documents USING (doc_id)
WHERE review_status = 'pending';

-- View export history
SELECT export_name, total_documents, status
FROM export_jobs;
```

---

## Example Session

```
You: Create a reference library on "Anthropic Claude API best practices"

Claude: Starting reference-curator pipeline...

[1/6] Reference Discovery
  - Searching docs.anthropic.com, github.com/anthropics...
  - Found 12 authoritative sources (8 tier1, 3 tier2, 1 tier3)

[2/6] Web Crawling
  - Crawling 12 URLs with WebFetch...
  - Saved 12 markdown files

[3/6] Content Repository
  - Stored 12 documents in MySQL
  - Created topic: claude-api-best-practices

[4/6] Content Distillation
  - Distilled 12 documents (~18% avg compression)
  - Extracted 45 key concepts

[5/6] Quality Review
  - Scored all documents
  - 11 APPROVED (≥0.85), 1 REFACTOR (0.78)
  - Re-distilling 1 document... APPROVED (0.86)

[6/6] Markdown Export
  - Generated 14 files (12 docs + 2 indexes)
  - Output: ~/Documents/.../exports/claude-api-best-practices/

Pipeline complete! Reference library ready.
```

---

## Need Help?

- **Skill source code:** `custom-skills/90-reference-curator/`
- **MySQL schema:** `shared/schema.sql`
- **Config templates:** `shared/config/`
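
---

## Appendix: Scoring Sketch

To make the Quality Reviewer tables in this guide more concrete, here is a minimal Python sketch of how the criterion weights and decision thresholds could combine. It is an illustration only: the names `WEIGHTS`, `weighted_score`, and `decide` are hypothetical and do not come from the skill's source, and the guide does not say how scores that fall between the listed bands (for example 0.845) are handled, so the sketch assumes plain lower-bound comparisons.

```python
# Hypothetical sketch of the Quality Reviewer scoring tables.
# None of these names are taken from the reference-curator source code.

# Criterion weights from the "Scoring Criteria" table (they sum to 100%).
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "pe_quality": 0.25,
    "usability": 0.10,
}


def weighted_score(scores):
    """Combine per-criterion scores (each in 0.0-1.0) into one weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def decide(score):
    """Map a total score to a decision using the "Decisions" table.

    Assumption: each band is checked by its lower bound only, so a score
    such as 0.845 is treated as REFACTOR.
    """
    if score >= 0.85:
        return "APPROVE"
    if score >= 0.60:
        return "REFACTOR"
    if score >= 0.40:
        return "DEEP_RESEARCH"
    return "REJECT"


# Example: made-up per-criterion scores for a single document.
example = {
    "accuracy": 0.9,
    "completeness": 0.8,
    "clarity": 0.7,
    "pe_quality": 0.8,
    "usability": 0.6,
}
total = weighted_score(example)
print(f"{total:.3f} -> {decide(total)}")
```

Because Accuracy and PE Quality together carry half of the weight, a document that is clear and usable but weak on those two criteria would, under this sketch, tend to land in the REFACTOR band or lower.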