# Reference Curator Skills

Modular Claude Skills for curating, processing, and exporting reference documentation.

## Quick Start

```bash
# Clone and install
git clone https://github.com/ourdigital/our-claude-skills.git
cd our-claude-skills/custom-skills/90-reference-curator
./install.sh

# Or minimal install (Firecrawl only, no MySQL)
./install.sh --minimal

# Check installation status
./install.sh --check

# Uninstall
./install.sh --uninstall
```

---

## Installing from GitHub

### For New Machines

1. **Clone the repository:**

   ```bash
   git clone https://github.com/ourdigital/our-claude-skills.git
   cd our-claude-skills/custom-skills/90-reference-curator
   ```

2. **Run the installer:**

   ```bash
   ./install.sh
   ```

3. **Follow the interactive prompts:**
   - Set storage directory path
   - Configure MySQL credentials (optional)
   - Choose crawler backend (Firecrawl MCP recommended)

4. **Add to shell profile:**

   ```bash
   echo 'source ~/.reference-curator.env' >> ~/.zshrc
   source ~/.reference-curator.env
   ```

5. **Verify installation:**

   ```bash
   ./install.sh --check
   ```

### Installation Modes

| Mode | Command | Description |
|------|---------|-------------|
| **Full** | `./install.sh` | Interactive setup with MySQL and crawlers |
| **Minimal** | `./install.sh --minimal` | Firecrawl MCP only, no database |
| **Check** | `./install.sh --check` | Verify installation status |
| **Uninstall** | `./install.sh --uninstall` | Remove installation (preserves data) |

### What Gets Installed

| Component | Location | Purpose |
|-----------|----------|---------|
| Environment config | `~/.reference-curator.env` | Credentials and paths |
| Config files | `~/.config/reference-curator/` | YAML configuration |
| Storage directories | `~/reference-library/` | Raw/processed/exports |
| Claude Code commands | `~/.claude/commands/` | Slash commands |
| Claude Desktop skills | `~/.claude/skills/` | Skill symlinks |
| MySQL database | `reference_library` | Document storage |

### Environment Variables

The installer creates `~/.reference-curator.env` with:

```bash
# Storage paths
export REFERENCE_LIBRARY_PATH="~/reference-library"

# MySQL configuration (if enabled)
export MYSQL_HOST="localhost"
export MYSQL_PORT="3306"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Crawler configuration
export DEFAULT_CRAWLER="firecrawl"   # or "nodejs"
export CRAWLER_PROJECT_PATH=""       # Path to local crawlers (optional)
```

---

## Architecture

```
[Topic Input]
      │
      ▼
┌─────────────────────┐
│ reference-discovery │ → Search & validate sources
└─────────────────────┘
      │
      ▼
┌──────────────────────────┐
│ web-crawler-orchestrator │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└──────────────────────────┘
      │
      ▼
┌────────────────────┐
│ content-repository │ → Store in MySQL
└────────────────────┘
      │
      ▼
┌───────────────────┐
│ content-distiller │ → Summarize & extract
└───────────────────┘
      │
      ▼
┌──────────────────┐
│ quality-reviewer │ → QA loop
└──────────────────┘
      │
      ├── REFACTOR → content-distiller
      ├── DEEP_RESEARCH → web-crawler-orchestrator
      │
      ▼ APPROVE
┌───────────────────┐
│ markdown-exporter │ → Project files / Fine-tuning
└───────────────────┘
```

---

## User Guide

### Basic Workflow

**Step 1: Discover References**

```
/reference-discovery Claude's system prompt best practices
```

Claude searches the web, validates sources, and creates a manifest of URLs to crawl.

**Step 2: Crawl Content**

```
/web-crawler https://docs.anthropic.com --max-pages 50
```

The crawler automatically selects the best backend based on site characteristics.

**Step 3: Store in Repository**

```
/content-repository store
```

Documents are saved to MySQL with deduplication and version tracking.

**Step 4: Distill Content**

```
/content-distiller all-pending
```

Claude summarizes, extracts key concepts, and creates structured content.

**Step 5: Quality Review**

```
/quality-reviewer all-pending --auto-approve
```

Automated QA scoring determines the outcome: approve, refactor, deep research, or reject.
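The routing applied in Step 5 can be sketched as a small shell function. This is purely illustrative and not part of the installed tooling; `route_review` is a hypothetical helper, and the thresholds mirror the Quality Review Decisions table later in this guide.

```shell
# Hypothetical sketch of quality-review routing (not shipped with the skills).
# awk is used for floating-point comparison, which plain shell lacks.
route_review() {
  score=$1  # QA score between 0.00 and 1.00
  if   awk "BEGIN { exit !($score >= 0.85) }"; then echo "approve"
  elif awk "BEGIN { exit !($score >= 0.60) }"; then echo "refactor"
  elif awk "BEGIN { exit !($score >= 0.40) }"; then echo "deep_research"
  else echo "reject"
  fi
}

route_review 0.90   # prints "approve"
route_review 0.72   # prints "refactor"
```

Documents routed to `refactor` or `deep_research` loop back through the pipeline, as shown in the Architecture diagram.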
**Step 6: Export**

```
/markdown-exporter project_files --topic prompt-engineering
```

Generates markdown files organized by topic with cross-references.

### Example Prompts

| Task | Command |
|------|---------|
| **Discover sources** | `/reference-discovery MCP server development` |
| **Crawl URL** | `/web-crawler https://docs.anthropic.com` |
| **Check repository** | `/content-repository stats` |
| **Distill document** | `/content-distiller 42` |
| **Review quality** | `/quality-reviewer all-pending` |
| **Export files** | `/markdown-exporter project_files` |

### Crawler Selection

The system automatically selects the optimal crawler:

| Site Type | Crawler | When Auto-Selected |
|-----------|---------|-------------------|
| SPAs/Dynamic | **Firecrawl MCP** | React, Vue, Angular sites (default) |
| Small docs sites | **Node.js** | ≤50 pages, static HTML |
| Technical docs | **Python aiohttp** | ≤200 pages, needs SEO data |
| Enterprise sites | **Scrapy** | >200 pages, multi-domain |

Override with an explicit request:

```
/web-crawler https://example.com --crawler firecrawl
```

### Quality Review Decisions

| Score | Decision | What Happens |
|-------|----------|--------------|
| ≥ 0.85 | **Approve** | Ready for export |
| 0.60-0.84 | **Refactor** | Re-distill with feedback |
| 0.40-0.59 | **Deep Research** | Gather more sources |
| < 0.40 | **Reject** | Archive (low quality) |

### Database Queries

Check your reference library status:

```bash
# Source credentials
source ~/.reference-curator.env

# Count documents by status
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT crawl_status, COUNT(*) AS count
  FROM documents
  GROUP BY crawl_status;"

# View pending reviews
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_pending_reviews;"

# View export-ready documents
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_export_ready;"
```

### Output Formats

**For Claude Projects:**

```
~/reference-library/exports/
├── INDEX.md                  # Master index with all topics
└── prompt-engineering/       # Topic folder
    ├── _index.md             # Topic overview
    ├── system-prompts.md     # Individual document
    └── chain-of-thought.md
```

**For Fine-tuning (JSONL):**

```json
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

---

## Skills & Commands Reference

| # | Skill | Command | Purpose |
|---|-------|---------|---------|
| 01 | reference-discovery | `/reference-discovery` | Search authoritative sources |
| 02 | web-crawler-orchestrator | `/web-crawler` | Multi-backend crawling |
| 03 | content-repository | `/content-repository` | MySQL storage management |
| 04 | content-distiller | `/content-distiller` | Summarize & extract |
| 05 | quality-reviewer | `/quality-reviewer` | QA scoring & routing |
| 06 | markdown-exporter | `/markdown-exporter` | Export to markdown/JSONL |

---

## Configuration

### Environment (`~/.reference-curator.env`)

```bash
# Required for MySQL features
export MYSQL_HOST="localhost"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Storage location
export REFERENCE_LIBRARY_PATH="~/reference-library"

# Crawler selection
export DEFAULT_CRAWLER="firecrawl"   # firecrawl, nodejs, aiohttp, scrapy
```

### Database (`~/.config/reference-curator/db_config.yaml`)

```yaml
mysql:
  host: ${MYSQL_HOST:-localhost}
  port: ${MYSQL_PORT:-3306}
  database: reference_library
  user: ${MYSQL_USER}
  password: ${MYSQL_PASSWORD}
```

### Crawlers (`~/.config/reference-curator/crawl_config.yaml`)

```yaml
default_crawler: ${DEFAULT_CRAWLER:-firecrawl}

rate_limit:
  requests_per_minute: 20
  concurrent_requests: 3

default_options:
  timeout: 30000
  max_depth: 3
  max_pages: 100
```

### Export (`~/.config/reference-curator/export_config.yaml`)

```yaml
output:
  base_path: ${REFERENCE_LIBRARY_PATH:-~/reference-library}/exports/

quality:
  min_score_for_export: 0.80
  auto_approve_tier1_sources: true
```

---

## Troubleshooting

### MySQL Connection Failed

```bash
# Check MySQL is running
brew services list | grep mysql    # macOS
systemctl status mysql             # Linux

# Start MySQL
brew services start mysql          # macOS
sudo systemctl start mysql         # Linux

# Verify credentials
source ~/.reference-curator.env
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" -e "SELECT 1"
```

### Commands Not Found

```bash
# Check commands are registered
ls -la ~/.claude/commands/

# Re-run installer to fix
./install.sh
```

### Crawler Timeout

For slow sites, increase the timeout in `~/.config/reference-curator/crawl_config.yaml`:

```yaml
default_options:
  timeout: 60000  # 60 seconds
```

### Skills Not Loading

```bash
# Check symlinks exist
ls -la ~/.claude/skills/

# Re-run installer
./install.sh --uninstall
./install.sh
```

### Database Schema Outdated

```bash
# Re-apply schema (preserves data with IF NOT EXISTS)
source ~/.reference-curator.env
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library < shared/schema.sql
```

---

## Directory Structure

```
90-reference-curator/
├── README.md                      # This file
├── CHANGELOG.md                   # Version history
├── install.sh                     # Portable installation script
│
├── commands/                      # Claude Code commands (tracked in git)
│   ├── reference-discovery.md
│   ├── web-crawler.md
│   ├── content-repository.md
│   ├── content-distiller.md
│   ├── quality-reviewer.md
│   └── markdown-exporter.md
│
├── 01-reference-discovery/
│   ├── code/CLAUDE.md             # Claude Code directive
│   └── desktop/SKILL.md           # Claude Desktop directive
├── 02-web-crawler-orchestrator/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 03-content-repository/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 04-content-distiller/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 05-quality-reviewer/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 06-markdown-exporter/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
│
└── shared/
    ├── schema.sql                 # MySQL schema
    └── config/                    # Config templates
        ├── db_config.yaml
        ├── crawl_config.yaml
        └── export_config.yaml
```

---

## Platform Differences

| Aspect | `code/` (Claude Code) | `desktop/` (Claude Desktop) |
|--------|----------------------|----------------------------|
| Directive | CLAUDE.md | SKILL.md (YAML frontmatter) |
| Commands | `~/.claude/commands/` | Not used |
| Skills | Reference only | `~/.claude/skills/` symlinks |
| Execution | Direct Bash/Python | MCP tools only |
| Best for | Automation, CI/CD | Interactive use |

---

## Prerequisites

### Required

- macOS or Linux
- Claude Code or Claude Desktop

### Optional (for full features)

- MySQL 8.0+ (for database storage)
- Firecrawl MCP server configured
- Node.js 18+ (for Node.js crawler)
- Python 3.12+ (for aiohttp/Scrapy crawlers)
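A quick way to see which optional prerequisites are present on a new machine before running the installer. `check_tool` is a hypothetical helper sketched here for convenience, not part of `install.sh`:

```shell
# Hypothetical helper: report whether an optional prerequisite is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not installed (optional)"
  fi
}

check_tool mysql     # MySQL 8.0+ for database storage
check_tool node      # Node.js 18+ for the Node.js crawler
check_tool python3   # Python 3.12+ for aiohttp/Scrapy crawlers
```

Anything reported missing only disables the corresponding optional feature; the minimal (Firecrawl-only) install works without all three.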