# Reference Curator Skills

Modular Claude Skills for curating, processing, and exporting reference documentation.
## Quick Start

```bash
# Clone and install
git clone https://github.com/ourdigital/our-claude-skills.git
cd our-claude-skills/custom-skills/90-reference-curator
./install.sh

# Or minimal install (Firecrawl only, no MySQL)
./install.sh --minimal

# Check installation status
./install.sh --check

# Uninstall
./install.sh --uninstall
```
## Installing from GitHub

### For New Machines

1. Clone the repository:

   ```bash
   git clone https://github.com/ourdigital/our-claude-skills.git
   cd our-claude-skills/custom-skills/90-reference-curator
   ```

2. Run the installer:

   ```bash
   ./install.sh
   ```

3. Follow the interactive prompts:
   - Set the storage directory path
   - Configure MySQL credentials (optional)
   - Choose a crawler backend (Firecrawl MCP recommended)

4. Add to your shell profile:

   ```bash
   echo 'source ~/.reference-curator.env' >> ~/.zshrc
   source ~/.reference-curator.env
   ```

5. Verify the installation:

   ```bash
   ./install.sh --check
   ```
## Installation Modes

| Mode | Command | Description |
|---|---|---|
| Full | `./install.sh` | Interactive setup with MySQL and crawlers |
| Minimal | `./install.sh --minimal` | Firecrawl MCP only, no database |
| Check | `./install.sh --check` | Verify installation status |
| Claude.ai | `./install.sh --claude-ai` | Export skills for Claude.ai Projects |
| Uninstall | `./install.sh --uninstall` | Remove installation (preserves data) |
## What Gets Installed

| Component | Location | Purpose |
|---|---|---|
| Environment config | `~/.reference-curator.env` | Credentials and paths |
| Config files | `~/.config/reference-curator/` | YAML configuration |
| Storage directories | `~/reference-library/` | Raw/processed/exports |
| Claude Code commands | `~/.claude/commands/` | Slash commands |
| Claude Desktop skills | `~/.claude/skills/` | Skill symlinks |
| MySQL database | `reference_library` | Document storage |
## Environment Variables

The installer creates `~/.reference-curator.env` with:

```bash
# Storage paths
export REFERENCE_LIBRARY_PATH="~/reference-library"

# MySQL configuration (if enabled)
export MYSQL_HOST="localhost"
export MYSQL_PORT="3306"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Crawler configuration
export DEFAULT_CRAWLER="firecrawl"  # or "nodejs"
export CRAWLER_PROJECT_PATH=""      # Path to local crawlers (optional)
```
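A quick way to confirm the file defines everything the MySQL-backed skills rely on is to check each variable after sourcing it. The helper below is a hypothetical sketch, not part of `install.sh`:

```shell
# Report any required variable that is unset or empty.
# (Hypothetical helper; run after `source ~/.reference-curator.env`.)
check_reference_env() {
  missing=""
  for var in MYSQL_HOST MYSQL_USER MYSQL_PASSWORD REFERENCE_LIBRARY_PATH; do
    eval "val=\${$var:-}"
    [ -n "$val" ] || missing="$missing $var"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "Environment OK"
}
```

Running `source ~/.reference-curator.env && check_reference_env` before any MySQL-backed command makes configuration gaps fail fast instead of surfacing as connection errors.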
## Claude.ai Projects Installation

To use these skills in Claude.ai (web interface), export the skill files for upload:

```bash
./install.sh --claude-ai
```

This lists the available files in `claude-project/` and optionally copies them to a convenient location.
### Files for Upload

| File | Description |
|---|---|
| `reference-curator-complete.md` | All 6 skills combined (recommended) |
| `INDEX.md` | Overview and workflow documentation |
| `01-reference-discovery.md` | Source discovery skill |
| `02-web-crawler.md` | Crawling orchestration skill |
| `03-content-repository.md` | Database storage skill |
| `04-content-distiller.md` | Content summarization skill |
| `05-quality-reviewer.md` | QA review skill |
| `06-markdown-exporter.md` | Export skill |
### Upload Instructions

1. Go to [claude.ai](https://claude.ai)
2. Create a new Project or open an existing one
3. Click "Add to project knowledge"
4. Upload `reference-curator-complete.md` (or individual skill files as needed)
## Architecture

```
[Topic Input]
       │
       ▼
┌─────────────────────┐
│ reference-discovery │ → Search & validate sources
└─────────────────────┘
       │
       ▼
┌──────────────────────────┐
│ web-crawler-orchestrator │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└──────────────────────────┘
       │
       ▼
┌────────────────────┐
│ content-repository │ → Store in MySQL
└────────────────────┘
       │
       ▼
┌───────────────────┐
│ content-distiller │ → Summarize & extract
└───────────────────┘
       │
       ▼
┌──────────────────┐
│ quality-reviewer │ → QA loop
└──────────────────┘
       │
       ├── REFACTOR → content-distiller
       ├── DEEP_RESEARCH → web-crawler-orchestrator
       │
       ▼ APPROVE
┌───────────────────┐
│ markdown-exporter │ → Project files / Fine-tuning
└───────────────────┘
```
## User Guide

### Basic Workflow

**Step 1: Discover References**

```
/reference-discovery Claude's system prompt best practices
```

Claude searches the web, validates sources, and creates a manifest of URLs to crawl.

**Step 2: Crawl Content**

```
/web-crawler https://docs.anthropic.com --max-pages 50
```

The crawler automatically selects the best backend based on site characteristics.

**Step 3: Store in Repository**

```
/content-repository store
```

Documents are saved to MySQL with deduplication and version tracking.
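The deduplication idea is that identical content maps to an identical hash key, so a re-crawl of an unchanged page is detected before a second row is written. A minimal sketch of that idea (`cksum` stands in for the stronger hash a real pipeline would use; the actual schema lives in `shared/schema.sql`):

```shell
# Identical content -> identical key, so a re-crawled, unchanged page
# is recognized as a duplicate. (Illustrative; cksum stands in for a
# cryptographic hash such as SHA-256.)
key_for() { printf '%s' "$1" | cksum | cut -d' ' -f1; }

a=$(key_for "same page content")
b=$(key_for "same page content")
[ "$a" = "$b" ] && echo "duplicate detected"
```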
**Step 4: Distill Content**

```
/content-distiller all-pending
```

Claude summarizes, extracts key concepts, and creates structured content.

**Step 5: Quality Review**

```
/quality-reviewer all-pending --auto-approve
```

Automated QA scoring determines the outcome: approve, refactor, deep research, or reject.

**Step 6: Export**

```
/markdown-exporter project_files --topic prompt-engineering
```

Generates markdown files organized by topic with cross-references.
### Example Prompts

| Task | Command |
|---|---|
| Discover sources | `/reference-discovery MCP server development` |
| Crawl URL | `/web-crawler https://docs.anthropic.com` |
| Check repository | `/content-repository stats` |
| Distill document | `/content-distiller 42` |
| Review quality | `/quality-reviewer all-pending` |
| Export files | `/markdown-exporter project_files` |
## Crawler Selection

The system automatically selects the optimal crawler:

| Site Type | Crawler | When Auto-Selected |
|---|---|---|
| SPAs/Dynamic | Firecrawl MCP | React, Vue, Angular sites (default) |
| Small docs sites | Node.js | ≤50 pages, static HTML |
| Technical docs | Python aiohttp | ≤200 pages, needs SEO data |
| Enterprise sites | Scrapy | >200 pages, multi-domain |

Override with an explicit request:

```
/web-crawler https://example.com --crawler firecrawl
```
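The selection rules can be sketched as a small decision function. This is an illustration of the heuristic only; the real orchestrator inspects live site characteristics rather than taking them as arguments:

```shell
# Decision sketch of the auto-selection rules (illustrative only).
# $1: estimated page count; $2: "yes" if the site is JS-rendered (SPA).
select_crawler() {
  if [ "$2" = "yes" ]; then
    echo "firecrawl"          # SPAs/dynamic sites
  elif [ "$1" -le 50 ]; then
    echo "nodejs"             # small static docs sites
  elif [ "$1" -le 200 ]; then
    echo "aiohttp"            # mid-size technical docs
  else
    echo "scrapy"             # large / multi-domain crawls
  fi
}

select_crawler 30 no    # nodejs
select_crawler 500 no   # scrapy
```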
## Quality Review Decisions

| Score | Decision | What Happens |
|---|---|---|
| ≥ 0.85 | Approve | Ready for export |
| 0.60-0.84 | Refactor | Re-distill with feedback |
| 0.40-0.59 | Deep Research | Gather more sources |
| < 0.40 | Reject | Archive (low quality) |
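The thresholds route each document to exactly one decision; a sketch of that routing (illustrative, not the quality-reviewer's actual code):

```shell
# Map a quality score (0-1) to the decision thresholds above.
# (Illustrative sketch, not the quality-reviewer's implementation.)
review_decision() {
  awk -v s="$1" 'BEGIN {
    if      (s >= 0.85) print "APPROVE"
    else if (s >= 0.60) print "REFACTOR"
    else if (s >= 0.40) print "DEEP_RESEARCH"
    else                print "REJECT"
  }'
}

review_decision 0.90   # APPROVE
review_decision 0.72   # REFACTOR
```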
## Database Queries

Check your reference library status:

```bash
# Source credentials
source ~/.reference-curator.env

# Count documents by status
mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT crawl_status, COUNT(*) AS count
  FROM documents GROUP BY crawl_status;"

# View pending reviews
mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_pending_reviews;"

# View export-ready documents
mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_export_ready;"
```
## Output Formats

**For Claude Projects:**

```
~/reference-library/exports/
├── INDEX.md                  # Master index with all topics
└── prompt-engineering/       # Topic folder
    ├── _index.md             # Topic overview
    ├── system-prompts.md     # Individual document
    └── chain-of-thought.md
```

**For fine-tuning (JSONL):**

```json
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```
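Each line of a JSONL export must be a standalone JSON object carrying a `messages` array. A quick sanity check of that invariant (the filename `sample.jsonl` is illustrative, not a file the exporter creates):

```shell
# Write a one-line sample in the fine-tuning format, then verify each
# line parses as JSON and carries a "messages" list. (Illustrative check.)
printf '%s\n' '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "ok"}]}' > sample.jsonl

python3 - <<'EOF'
import json

with open("sample.jsonl") as f:
    for line in f:
        record = json.loads(line)            # raises on malformed JSON
        assert isinstance(record["messages"], list)
print("JSONL OK")
EOF
```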
## Skills & Commands Reference

| # | Skill | Command | Purpose |
|---|---|---|---|
| 01 | reference-discovery | `/reference-discovery` | Search authoritative sources |
| 02 | web-crawler-orchestrator | `/web-crawler` | Multi-backend crawling |
| 03 | content-repository | `/content-repository` | MySQL storage management |
| 04 | content-distiller | `/content-distiller` | Summarize & extract |
| 05 | quality-reviewer | `/quality-reviewer` | QA scoring & routing |
| 06 | markdown-exporter | `/markdown-exporter` | Export to markdown/JSONL |
## Configuration

### Environment (`~/.reference-curator.env`)

```bash
# Required for MySQL features
export MYSQL_HOST="localhost"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Storage location
export REFERENCE_LIBRARY_PATH="~/reference-library"

# Crawler selection
export DEFAULT_CRAWLER="firecrawl"  # firecrawl, nodejs, aiohttp, scrapy
```
### Database (`~/.config/reference-curator/db_config.yaml`)

```yaml
mysql:
  host: ${MYSQL_HOST:-localhost}
  port: ${MYSQL_PORT:-3306}
  database: reference_library
  user: ${MYSQL_USER}
  password: ${MYSQL_PASSWORD}
```
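The `${VAR:-default}` placeholders use shell parameter-expansion syntax: an unset or empty variable falls back to the value after `:-`. A quick illustration of that expansion rule (how the skills actually expand these config templates is not specified in this README):

```shell
# Unset variable -> the default after ":-" is used.
unset MYSQL_HOST
echo "host: ${MYSQL_HOST:-localhost}"    # host: localhost

# Set variable -> its value wins over the default.
MYSQL_HOST="db.internal"
echo "host: ${MYSQL_HOST:-localhost}"    # host: db.internal
```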
### Crawlers (`~/.config/reference-curator/crawl_config.yaml`)

```yaml
default_crawler: ${DEFAULT_CRAWLER:-firecrawl}

rate_limit:
  requests_per_minute: 20
  concurrent_requests: 3

default_options:
  timeout: 30000
  max_depth: 3
  max_pages: 100
```
### Export (`~/.config/reference-curator/export_config.yaml`)

```yaml
output:
  base_path: ${REFERENCE_LIBRARY_PATH:-~/reference-library}/exports/

quality:
  min_score_for_export: 0.80
  auto_approve_tier1_sources: true
```
## Troubleshooting

### MySQL Connection Failed

```bash
# Check MySQL is running
brew services list | grep mysql   # macOS
systemctl status mysql            # Linux

# Start MySQL
brew services start mysql         # macOS
sudo systemctl start mysql        # Linux

# Verify credentials
source ~/.reference-curator.env
mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" -e "SELECT 1"
```
### Commands Not Found

```bash
# Check commands are registered
ls -la ~/.claude/commands/

# Re-run the installer to fix
./install.sh
```
### Crawler Timeout

For slow sites, increase the timeout in `~/.config/reference-curator/crawl_config.yaml`:

```yaml
default_options:
  timeout: 60000  # 60 seconds
```
### Skills Not Loading

```bash
# Check symlinks exist
ls -la ~/.claude/skills/

# Re-run the installer
./install.sh --uninstall
./install.sh
```
### Database Schema Outdated

```bash
# Re-apply the schema (preserves data via IF NOT EXISTS)
source ~/.reference-curator.env
mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library < shared/schema.sql
```
## Directory Structure

```
90-reference-curator/
├── README.md                          # This file
├── CHANGELOG.md                       # Version history
├── install.sh                         # Portable installation script
│
├── claude-project/                    # Files for Claude.ai Projects
│   ├── INDEX.md                       # Overview
│   ├── reference-curator-complete.md  # All skills combined
│   ├── 01-reference-discovery.md
│   ├── 02-web-crawler.md
│   ├── 03-content-repository.md
│   ├── 04-content-distiller.md
│   ├── 05-quality-reviewer.md
│   └── 06-markdown-exporter.md
│
├── commands/                          # Claude Code commands (tracked in git)
│   ├── reference-discovery.md
│   ├── web-crawler.md
│   ├── content-repository.md
│   ├── content-distiller.md
│   ├── quality-reviewer.md
│   └── markdown-exporter.md
│
├── 01-reference-discovery/
│   ├── code/CLAUDE.md                 # Claude Code directive
│   └── desktop/SKILL.md               # Claude Desktop directive
├── 02-web-crawler-orchestrator/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 03-content-repository/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 04-content-distiller/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 05-quality-reviewer/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 06-markdown-exporter/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
│
└── shared/
    ├── schema.sql                     # MySQL schema
    └── config/                        # Config templates
        ├── db_config.yaml
        ├── crawl_config.yaml
        └── export_config.yaml
```
## Platform Differences

| Aspect | `code/` (Claude Code) | `desktop/` (Claude Desktop) |
|---|---|---|
| Directive | CLAUDE.md | SKILL.md (YAML frontmatter) |
| Commands | `~/.claude/commands/` | Not used |
| Skills | Reference only | `~/.claude/skills/` symlinks |
| Execution | Direct Bash/Python | MCP tools only |
| Best for | Automation, CI/CD | Interactive use |
## Prerequisites

### Required

- macOS or Linux
- Claude Code or Claude Desktop

### Optional (for full features)

- MySQL 8.0+ (for database storage)
- Firecrawl MCP server configured
- Node.js 18+ (for the Node.js crawler)
- Python 3.12+ (for the aiohttp/Scrapy crawlers)