feat(reference-curator): Add portable skill suite for reference documentation curation
6 modular skills for curating, processing, and exporting reference docs:

- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:

- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New file: custom-skills/90-reference-curator/README.md (448 lines)
# Reference Curator Skills

Modular Claude Skills for curating, processing, and exporting reference documentation.

## Quick Start

```bash
# Clone and install
git clone https://github.com/ourdigital/our-claude-skills.git
cd our-claude-skills/custom-skills/90-reference-curator
./install.sh

# Or minimal install (Firecrawl only, no MySQL)
./install.sh --minimal

# Check installation status
./install.sh --check

# Uninstall
./install.sh --uninstall
```

---

## Installing from GitHub

### For New Machines

1. **Clone the repository:**
   ```bash
   git clone https://github.com/ourdigital/our-claude-skills.git
   cd our-claude-skills/custom-skills/90-reference-curator
   ```

2. **Run the installer:**
   ```bash
   ./install.sh
   ```

3. **Follow the interactive prompts:**
   - Set the storage directory path
   - Configure MySQL credentials (optional)
   - Choose a crawler backend (Firecrawl MCP recommended)

4. **Add to your shell profile:**
   ```bash
   echo 'source ~/.reference-curator.env' >> ~/.zshrc
   source ~/.reference-curator.env
   ```

5. **Verify the installation:**
   ```bash
   ./install.sh --check
   ```

### Installation Modes

| Mode | Command | Description |
|------|---------|-------------|
| **Full** | `./install.sh` | Interactive setup with MySQL and crawlers |
| **Minimal** | `./install.sh --minimal` | Firecrawl MCP only, no database |
| **Check** | `./install.sh --check` | Verify installation status |
| **Uninstall** | `./install.sh --uninstall` | Remove installation (preserves data) |

### What Gets Installed

| Component | Location | Purpose |
|-----------|----------|---------|
| Environment config | `~/.reference-curator.env` | Credentials and paths |
| Config files | `~/.config/reference-curator/` | YAML configuration |
| Storage directories | `~/reference-library/` | Raw/processed/exports |
| Claude Code commands | `~/.claude/commands/` | Slash commands |
| Claude Desktop skills | `~/.claude/skills/` | Skill symlinks |
| MySQL database | `reference_library` | Document storage |

### Environment Variables

The installer creates `~/.reference-curator.env` with:

```bash
# Storage paths (use $HOME, not a quoted ~, so the shell expands it)
export REFERENCE_LIBRARY_PATH="$HOME/reference-library"

# MySQL configuration (if enabled)
export MYSQL_HOST="localhost"
export MYSQL_PORT="3306"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Crawler configuration
export DEFAULT_CRAWLER="firecrawl"  # or "nodejs", "aiohttp", "scrapy"
export CRAWLER_PROJECT_PATH=""      # Path to local crawlers (optional)
```

---

## Architecture

```
[Topic Input]
      │
      ▼
┌─────────────────────┐
│ reference-discovery │ → Search & validate sources
└─────────────────────┘
      │
      ▼
┌──────────────────────────┐
│ web-crawler-orchestrator │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└──────────────────────────┘
      │
      ▼
┌────────────────────┐
│ content-repository │ → Store in MySQL
└────────────────────┘
      │
      ▼
┌───────────────────┐
│ content-distiller │ → Summarize & extract
└───────────────────┘
      │
      ▼
┌──────────────────┐
│ quality-reviewer │ → QA loop
└──────────────────┘
      │
      ├── REFACTOR → content-distiller
      ├── DEEP_RESEARCH → web-crawler-orchestrator
      │
      ▼ APPROVE
┌───────────────────┐
│ markdown-exporter │ → Project files / Fine-tuning
└───────────────────┘
```

---

## User Guide

### Basic Workflow

**Step 1: Discover References**
```
/reference-discovery Claude's system prompt best practices
```
Claude searches the web, validates sources, and creates a manifest of URLs to crawl.

**Step 2: Crawl Content**
```
/web-crawler https://docs.anthropic.com --max-pages 50
```
The crawler automatically selects the best backend based on site characteristics.

**Step 3: Store in Repository**
```
/content-repository store
```
Documents are saved to MySQL with deduplication and version tracking.

**Step 4: Distill Content**
```
/content-distiller all-pending
```
Claude summarizes, extracts key concepts, and creates structured content.

**Step 5: Quality Review**
```
/quality-reviewer all-pending --auto-approve
```
Automated QA scoring determines: approve, refactor, deep research, or reject.

**Step 6: Export**
```
/markdown-exporter project_files --topic prompt-engineering
```
Generates markdown files organized by topic with cross-references.

### Example Prompts

| Task | Command |
|------|---------|
| **Discover sources** | `/reference-discovery MCP server development` |
| **Crawl URL** | `/web-crawler https://docs.anthropic.com` |
| **Check repository** | `/content-repository stats` |
| **Distill document** | `/content-distiller 42` |
| **Review quality** | `/quality-reviewer all-pending` |
| **Export files** | `/markdown-exporter project_files` |

### Crawler Selection

The system automatically selects the optimal crawler:

| Site Type | Crawler | When Auto-Selected |
|-----------|---------|-------------------|
| SPAs/Dynamic | **Firecrawl MCP** | React, Vue, Angular sites (default) |
| Small docs sites | **Node.js** | ≤50 pages, static HTML |
| Technical docs | **Python aiohttp** | ≤200 pages, needs SEO data |
| Enterprise sites | **Scrapy** | >200 pages, multi-domain |
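
The selection rules above can be read as a small decision function. This is an illustrative rendering of the table, not the orchestrator's actual code; the real heuristics (e.g. SPA detection) are more involved:

```python
def pick_crawler(pages: int, is_spa: bool = False, multi_domain: bool = False,
                 needs_seo: bool = False) -> str:
    """Choose a crawler backend per the selection table (sketch only)."""
    if is_spa:
        return "firecrawl"  # dynamic sites need JS rendering
    if multi_domain or pages > 200:
        return "scrapy"     # enterprise-scale crawls
    if needs_seo or pages > 50:
        return "aiohttp"    # mid-size technical docs needing SEO data
    return "nodejs"         # small static docs sites
```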

Override with an explicit request:
```
/web-crawler https://example.com --crawler firecrawl
```

### Quality Review Decisions

| Score | Decision | What Happens |
|-------|----------|--------------|
| ≥ 0.85 | **Approve** | Ready for export |
| 0.60-0.84 | **Refactor** | Re-distill with feedback |
| 0.40-0.59 | **Deep Research** | Gather more sources |
| < 0.40 | **Reject** | Archive (low quality) |

### Database Queries

Check your reference library status:

```bash
# Source credentials
source ~/.reference-curator.env

# Count documents by status
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT crawl_status, COUNT(*) AS count
  FROM documents GROUP BY crawl_status;"

# View pending reviews
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_pending_reviews;"

# View export-ready documents
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_export_ready;"
```

### Output Formats

**For Claude Projects:**
```
~/reference-library/exports/
├── INDEX.md                  # Master index with all topics
└── prompt-engineering/       # Topic folder
    ├── _index.md             # Topic overview
    ├── system-prompts.md     # Individual document
    └── chain-of-thought.md
```

**For Fine-tuning (JSONL):**
```json
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

---

## Skills & Commands Reference

| # | Skill | Command | Purpose |
|---|-------|---------|---------|
| 01 | reference-discovery | `/reference-discovery` | Search authoritative sources |
| 02 | web-crawler-orchestrator | `/web-crawler` | Multi-backend crawling |
| 03 | content-repository | `/content-repository` | MySQL storage management |
| 04 | content-distiller | `/content-distiller` | Summarize & extract |
| 05 | quality-reviewer | `/quality-reviewer` | QA scoring & routing |
| 06 | markdown-exporter | `/markdown-exporter` | Export to markdown/JSONL |

---

## Configuration

### Environment (`~/.reference-curator.env`)

```bash
# Required for MySQL features
export MYSQL_HOST="localhost"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Storage location (use $HOME, not a quoted ~, so the shell expands it)
export REFERENCE_LIBRARY_PATH="$HOME/reference-library"

# Crawler selection
export DEFAULT_CRAWLER="firecrawl"  # firecrawl, nodejs, aiohttp, scrapy
```

### Database (`~/.config/reference-curator/db_config.yaml`)

```yaml
mysql:
  host: ${MYSQL_HOST:-localhost}
  port: ${MYSQL_PORT:-3306}
  database: reference_library
  user: ${MYSQL_USER}
  password: ${MYSQL_PASSWORD}
```

### Crawlers (`~/.config/reference-curator/crawl_config.yaml`)

```yaml
default_crawler: ${DEFAULT_CRAWLER:-firecrawl}

rate_limit:
  requests_per_minute: 20
  concurrent_requests: 3

default_options:
  timeout: 30000
  max_depth: 3
  max_pages: 100
```

### Export (`~/.config/reference-curator/export_config.yaml`)

```yaml
output:
  base_path: ${REFERENCE_LIBRARY_PATH:-~/reference-library}/exports/

quality:
  min_score_for_export: 0.80
  auto_approve_tier1_sources: true
```

---

## Troubleshooting

### MySQL Connection Failed

```bash
# Check MySQL is running
brew services list | grep mysql    # macOS
systemctl status mysql             # Linux

# Start MySQL
brew services start mysql          # macOS
sudo systemctl start mysql         # Linux

# Verify credentials
source ~/.reference-curator.env
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" -e "SELECT 1"
```

### Commands Not Found

```bash
# Check commands are registered
ls -la ~/.claude/commands/

# Re-run installer to fix
./install.sh
```

### Crawler Timeout

For slow sites, increase the timeout in `~/.config/reference-curator/crawl_config.yaml`:
```yaml
default_options:
  timeout: 60000  # 60 seconds
```

### Skills Not Loading

```bash
# Check symlinks exist
ls -la ~/.claude/skills/

# Re-run installer
./install.sh --uninstall
./install.sh
```

### Database Schema Outdated

```bash
# Re-apply schema (preserves data via IF NOT EXISTS)
source ~/.reference-curator.env
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library < shared/schema.sql
```

---

## Directory Structure

```
90-reference-curator/
├── README.md                      # This file
├── CHANGELOG.md                   # Version history
├── install.sh                     # Portable installation script
│
├── commands/                      # Claude Code commands (tracked in git)
│   ├── reference-discovery.md
│   ├── web-crawler.md
│   ├── content-repository.md
│   ├── content-distiller.md
│   ├── quality-reviewer.md
│   └── markdown-exporter.md
│
├── 01-reference-discovery/
│   ├── code/CLAUDE.md             # Claude Code directive
│   └── desktop/SKILL.md           # Claude Desktop directive
├── 02-web-crawler-orchestrator/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 03-content-repository/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 04-content-distiller/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 05-quality-reviewer/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 06-markdown-exporter/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
│
└── shared/
    ├── schema.sql                 # MySQL schema
    └── config/                    # Config templates
        ├── db_config.yaml
        ├── crawl_config.yaml
        └── export_config.yaml
```

---

## Platform Differences

| Aspect | `code/` (Claude Code) | `desktop/` (Claude Desktop) |
|--------|----------------------|----------------------------|
| Directive | CLAUDE.md | SKILL.md (YAML frontmatter) |
| Commands | `~/.claude/commands/` | Not used |
| Skills | Reference only | `~/.claude/skills/` symlinks |
| Execution | Direct Bash/Python | MCP tools only |
| Best for | Automation, CI/CD | Interactive use |

---

## Prerequisites

### Required

- macOS or Linux
- Claude Code or Claude Desktop

### Optional (for full features)

- MySQL 8.0+ (for database storage)
- Firecrawl MCP server configured
- Node.js 18+ (for the Node.js crawler)
- Python 3.12+ (for the aiohttp/Scrapy crawlers)