
Reference Curator Skills

Modular Claude Skills for curating, processing, and exporting reference documentation.

Quick Start

# Clone and install
git clone https://github.com/ourdigital/our-claude-skills.git
cd our-claude-skills/custom-skills/90-reference-curator
./install.sh

# Or minimal install (Firecrawl only, no MySQL)
./install.sh --minimal

# Check installation status
./install.sh --check

# Uninstall
./install.sh --uninstall

Installing from GitHub

For New Machines

  1. Clone the repository:

    git clone https://github.com/ourdigital/our-claude-skills.git
    cd our-claude-skills/custom-skills/90-reference-curator
    
  2. Run the installer:

    ./install.sh
    
  3. Follow the interactive prompts:

    • Set storage directory path
    • Configure MySQL credentials (optional)
    • Choose crawler backend (Firecrawl MCP recommended)
  4. Add to shell profile:

    echo 'source ~/.reference-curator.env' >> ~/.zshrc
    source ~/.reference-curator.env
    
  5. Verify installation:

    ./install.sh --check
    

Installation Modes

| Mode | Command | Description |
|------|---------|-------------|
| Full | ./install.sh | Interactive setup with MySQL and crawlers |
| Minimal | ./install.sh --minimal | Firecrawl MCP only, no database |
| Check | ./install.sh --check | Verify installation status |
| Claude.ai | ./install.sh --claude-ai | Export skills for Claude.ai Projects |
| Uninstall | ./install.sh --uninstall | Remove installation (preserves data) |

What Gets Installed

| Component | Location | Purpose |
|-----------|----------|---------|
| Environment config | ~/.reference-curator.env | Credentials and paths |
| Config files | ~/.config/reference-curator/ | YAML configuration |
| Storage directories | ~/reference-library/ | Raw/processed/exports |
| Claude Code commands | ~/.claude/commands/ | Slash commands |
| Claude Desktop skills | ~/.claude/skills/ | Skill symlinks |
| MySQL database | reference_library | Document storage |

Environment Variables

The installer creates ~/.reference-curator.env with:

# Storage paths
export REFERENCE_LIBRARY_PATH="$HOME/reference-library"  # tilde does not expand inside quotes

# MySQL configuration (if enabled)
export MYSQL_HOST="localhost"
export MYSQL_PORT="3306"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Crawler configuration
export DEFAULT_CRAWLER="firecrawl"  # or "nodejs"
export CRAWLER_PROJECT_PATH=""       # Path to local crawlers (optional)

Claude.ai Projects Installation

To use these skills in Claude.ai (web interface), export the skill files for upload:

./install.sh --claude-ai

This displays available files in claude-project/ and optionally copies them to a convenient location.

Files for Upload

| File | Description |
|------|-------------|
| reference-curator-complete.md | All 6 skills combined (recommended) |
| INDEX.md | Overview and workflow documentation |
| 01-reference-discovery.md | Source discovery skill |
| 02-web-crawler.md | Crawling orchestration skill |
| 03-content-repository.md | Database storage skill |
| 04-content-distiller.md | Content summarization skill |
| 05-quality-reviewer.md | QA review skill |
| 06-markdown-exporter.md | Export skill |

Upload Instructions

  1. Go to claude.ai
  2. Create a new Project or open an existing one
  3. Click "Add to project knowledge"
  4. Upload reference-curator-complete.md (or individual skills as needed)

Architecture

                    ┌──────────────────────────────┐
                    │ reference-curator-pipeline   │  (Orchestrator)
                    │ /reference-curator-pipeline  │
                    └──────────────────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          ▼                       ▼                       ▼
     [Topic Input]          [URL Input]          [Manifest Input]
          │                       │                       │
          ▼                       │                       │
┌─────────────────────┐           │                       │
│ reference-discovery │ ◄─────────┴───────────────────────┘
└─────────────────────┘                            (skip if URLs/manifest)
          │
          ▼
┌──────────────────────────┐
│ web-crawler-orchestrator │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└──────────────────────────┘
          │
          ▼
┌────────────────────┐
│ content-repository │ → Store in MySQL
└────────────────────┘
          │
          ▼
┌───────────────────┐
│ content-distiller │ → Summarize & extract  ◄─────┐
└───────────────────┘                              │
          │                                        │
          ▼                                        │
┌──────────────────┐                               │
│ quality-reviewer │ → QA loop                     │
└──────────────────┘                               │
          │                                        │
          ├── REFACTOR (max 3) ────────────────────┤
          ├── DEEP_RESEARCH (max 2) → crawler ─────┘
          │
          ▼ APPROVE
┌───────────────────┐
│ markdown-exporter │ → Project files / Fine-tuning
└───────────────────┘

User Guide

Run the complete curation workflow with a single command:

# From topic - runs all 6 stages automatically
/reference-curator-pipeline "Claude Code best practices" --max-sources 5

# From URLs - skip discovery, start at crawler
/reference-curator-pipeline https://docs.anthropic.com/en/docs/prompt-caching

# Resume from manifest file
/reference-curator-pipeline ./manifest.json --auto-approve

# Fine-tuning dataset output
/reference-curator-pipeline "MCP servers" --export-format fine_tuning

Pipeline Options:

  • --max-sources 10 - Max sources to discover (topic mode)
  • --max-pages 50 - Max pages per source to crawl
  • --auto-approve - Auto-approve scores above threshold
  • --threshold 0.85 - Approval threshold
  • --max-iterations 3 - Max QA loop iterations per document
  • --export-format project_files - Output format (project_files, fine_tuning, jsonl)

Manual Workflow (Step-by-Step)

Step 1: Discover References

/reference-discovery Claude's system prompt best practices

Claude searches the web, validates sources, and creates a manifest of URLs to crawl.

Step 2: Crawl Content

/web-crawler https://docs.anthropic.com --max-pages 50

The crawler automatically selects the best backend based on site characteristics.

Step 3: Store in Repository

/content-repository store

Documents are saved to MySQL with deduplication and version tracking.
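Deduplication is typically keyed on a fingerprint of the document body. The sketch below illustrates that idea in Python; content_hash and should_store are hypothetical names for illustration, and the actual logic lives in the content-repository skill and shared/schema.sql:

```python
import hashlib

def content_hash(markdown: str) -> str:
    """Stable fingerprint of a crawled document's normalized body."""
    normalized = "\n".join(line.rstrip() for line in markdown.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def should_store(new_doc: str, existing_hashes: set[str]) -> bool:
    """Skip inserts whose normalized body matches an already-stored version."""
    return content_hash(new_doc) not in existing_hashes
```

Hashing a whitespace-normalized body means a re-crawl that only shifts trailing whitespace does not create a new version.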

Step 4: Distill Content

/content-distiller all-pending

Claude summarizes, extracts key concepts, and creates structured content.

Step 5: Quality Review

/quality-reviewer all-pending --auto-approve

Automated QA scoring routes each document to one outcome: approve, refactor, deep research, or reject.

Step 6: Export

/markdown-exporter project_files --topic prompt-engineering

Generates markdown files organized by topic with cross-references.

Example Prompts

| Task | Command |
|------|---------|
| Discover sources | /reference-discovery MCP server development |
| Crawl URL | /web-crawler https://docs.anthropic.com |
| Check repository | /content-repository stats |
| Distill document | /content-distiller 42 |
| Review quality | /quality-reviewer all-pending |
| Export files | /markdown-exporter project_files |

Crawler Selection

The system automatically selects the optimal crawler:

| Site Type | Crawler | When Auto-Selected |
|-----------|---------|--------------------|
| SPAs/Dynamic | Firecrawl MCP | React, Vue, Angular sites (default) |
| Small docs sites | Node.js | ≤50 pages, static HTML |
| Technical docs | Python aiohttp | ≤200 pages, needs SEO data |
| Enterprise sites | Scrapy | >200 pages, multi-domain |

Override with explicit request:

/web-crawler https://example.com --crawler firecrawl
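The auto-selection rules can be sketched as a small heuristic. This Python fragment is illustrative only; pick_crawler and its parameters are hypothetical, and the orchestrator's actual inputs and thresholds may differ:

```python
def pick_crawler(page_count: int, is_spa: bool,
                 needs_seo_data: bool = False,
                 multi_domain: bool = False) -> str:
    """Mirror the selection table: SPA -> Firecrawl, small static -> Node.js,
    mid-size docs needing SEO data -> aiohttp, large/multi-domain -> Scrapy."""
    if is_spa:
        return "firecrawl"
    if page_count > 200 or multi_domain:
        return "scrapy"
    if page_count <= 50 and not needs_seo_data:
        return "nodejs"
    return "aiohttp"
```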

Quality Review Decisions

| Score | Decision | What Happens |
|-------|----------|--------------|
| ≥ 0.85 | Approve | Ready for export |
| 0.60-0.84 | Refactor | Re-distill with feedback |
| 0.40-0.59 | Deep Research | Gather more sources |
| < 0.40 | Reject | Archive (low quality) |
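The score bands map to decisions mechanically; a minimal Python sketch of that routing (route is a hypothetical helper, not part of the shipped skill):

```python
def route(score: float) -> str:
    """Map a QA score to the pipeline decision for that document."""
    if score >= 0.85:
        return "approve"       # ready for export
    if score >= 0.60:
        return "refactor"      # re-distill with reviewer feedback
    if score >= 0.40:
        return "deep_research" # gather more sources, then re-distill
    return "reject"            # archive as low quality
```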

Database Queries

Check your reference library status:

# Source credentials
source ~/.reference-curator.env

# Count documents by status
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT crawl_status, COUNT(*) as count
  FROM documents GROUP BY crawl_status;"

# View pending reviews
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_pending_reviews;"

# View export-ready documents
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_export_ready;"

Output Formats

For Claude Projects:

~/reference-library/exports/
├── INDEX.md                 # Master index with all topics
└── prompt-engineering/      # Topic folder
    ├── _index.md            # Topic overview
    ├── system-prompts.md    # Individual document
    └── chain-of-thought.md

For Fine-tuning (JSONL):

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}

Skills & Commands Reference

| # | Skill | Command | Purpose |
|---|-------|---------|---------|
| 01 | reference-discovery | /reference-discovery | Search authoritative sources |
| 02 | web-crawler-orchestrator | /web-crawler | Multi-backend crawling |
| 03 | content-repository | /content-repository | MySQL storage management |
| 04 | content-distiller | /content-distiller | Summarize & extract |
| 05 | quality-reviewer | /quality-reviewer | QA scoring & routing |
| 06 | markdown-exporter | /markdown-exporter | Export to markdown/JSONL |
| 07 | pipeline-orchestrator | /reference-curator-pipeline | Full pipeline orchestration |

Configuration

Environment (~/.reference-curator.env)

# Required for MySQL features
export MYSQL_HOST="localhost"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Storage location
export REFERENCE_LIBRARY_PATH="$HOME/reference-library"  # tilde does not expand inside quotes

# Crawler selection
export DEFAULT_CRAWLER="firecrawl"  # firecrawl, nodejs, aiohttp, scrapy

Database (~/.config/reference-curator/db_config.yaml)

mysql:
  host: ${MYSQL_HOST:-localhost}
  port: ${MYSQL_PORT:-3306}
  database: reference_library
  user: ${MYSQL_USER}
  password: ${MYSQL_PASSWORD}
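The ${VAR:-default} placeholders use shell-style default syntax, which standard YAML parsers do not expand on their own; whatever loads these files has to substitute values itself. A minimal Python sketch of that expansion (illustrative only, not the tool's actual loader):

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def expand_env(text: str) -> str:
    """Replace shell-style placeholders with environment values or defaults."""
    def sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        return os.environ.get(name, default if default is not None else "")
    return _PLACEHOLDER.sub(sub, text)
```

Running the config text through such a function before parsing it as YAML gives the behavior the templates imply.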

Crawlers (~/.config/reference-curator/crawl_config.yaml)

default_crawler: ${DEFAULT_CRAWLER:-firecrawl}

rate_limit:
  requests_per_minute: 20
  concurrent_requests: 3

default_options:
  timeout: 30000
  max_depth: 3
  max_pages: 100
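The requests_per_minute budget implies client-side throttling. A minimal sliding-window limiter in Python (illustrative; the shipped crawlers may throttle differently):

```python
import time
from collections import deque

class RateLimiter:
    """Block until a request slot is free within the per-minute budget."""
    def __init__(self, requests_per_minute: int = 20):
        self.budget = requests_per_minute
        self.sent = deque()  # monotonic timestamps of recent requests

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.budget:
            # Sleep until the oldest request leaves the window.
            time.sleep(60 - (now - self.sent[0]))
        self.sent.append(time.monotonic())
```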

Export (~/.config/reference-curator/export_config.yaml)

output:
  base_path: ${REFERENCE_LIBRARY_PATH:-~/reference-library}/exports/

quality:
  min_score_for_export: 0.80
  auto_approve_tier1_sources: true

Troubleshooting

MySQL Connection Failed

# Check MySQL is running
brew services list | grep mysql   # macOS
systemctl status mysql            # Linux

# Start MySQL
brew services start mysql         # macOS
sudo systemctl start mysql        # Linux

# Verify credentials
source ~/.reference-curator.env
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" -e "SELECT 1"

Commands Not Found

# Check commands are registered
ls -la ~/.claude/commands/

# Re-run installer to fix
./install.sh

Crawler Timeout

For slow sites, increase timeout in ~/.config/reference-curator/crawl_config.yaml:

default_options:
  timeout: 60000  # 60 seconds

Skills Not Loading

# Check symlinks exist
ls -la ~/.claude/skills/

# Re-run installer
./install.sh --uninstall
./install.sh

Database Schema Outdated

# Re-apply schema (preserves data with IF NOT EXISTS)
source ~/.reference-curator.env
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library < shared/schema.sql

Directory Structure

90-reference-curator/
├── README.md                     # This file
├── CHANGELOG.md                  # Version history
├── install.sh                    # Portable installation script
│
├── claude-project/               # Files for Claude.ai Projects
│   ├── INDEX.md                  # Overview
│   ├── reference-curator-complete.md  # All skills combined
│   ├── 01-reference-discovery.md
│   ├── 02-web-crawler.md
│   ├── 03-content-repository.md
│   ├── 04-content-distiller.md
│   ├── 05-quality-reviewer.md
│   └── 06-markdown-exporter.md
│
├── commands/                     # Claude Code commands (tracked in git)
│   ├── reference-discovery.md
│   ├── web-crawler.md
│   ├── content-repository.md
│   ├── content-distiller.md
│   ├── quality-reviewer.md
│   ├── markdown-exporter.md
│   └── reference-curator-pipeline.md
│
├── 01-reference-discovery/
│   ├── code/CLAUDE.md            # Claude Code directive
│   └── desktop/SKILL.md          # Claude Desktop directive
├── 02-web-crawler-orchestrator/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 03-content-repository/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 04-content-distiller/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 05-quality-reviewer/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 06-markdown-exporter/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 07-pipeline-orchestrator/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
│
└── shared/
    ├── schema.sql                # MySQL schema
    └── config/                   # Config templates
        ├── db_config.yaml
        ├── crawl_config.yaml
        └── export_config.yaml

Platform Differences

| Aspect | code/ (Claude Code) | desktop/ (Claude Desktop) |
|--------|---------------------|---------------------------|
| Directive | CLAUDE.md | SKILL.md (YAML frontmatter) |
| Commands | ~/.claude/commands/ | Not used |
| Skills | Reference only | ~/.claude/skills/ symlinks |
| Execution | Direct Bash/Python | MCP tools only |
| Best for | Automation, CI/CD | Interactive use |

Prerequisites

Required

  • macOS or Linux
  • Claude Code or Claude Desktop

Optional (for full features)

  • MySQL 8.0+ (for database storage)
  • Firecrawl MCP server configured
  • Node.js 18+ (for Node.js crawler)
  • Python 3.12+ (for aiohttp/Scrapy crawlers)