
Reference Curator Skills

Modular Claude Skills for curating, processing, and exporting reference documentation.

Quick Start

# Clone and install
git clone https://github.com/ourdigital/our-claude-skills.git
cd our-claude-skills/custom-skills/90-reference-curator
./install.sh

# Or minimal install (Firecrawl only, no MySQL)
./install.sh --minimal

# Check installation status
./install.sh --check

# Uninstall
./install.sh --uninstall

Installing from GitHub

For New Machines

  1. Clone the repository:

    git clone https://github.com/ourdigital/our-claude-skills.git
    cd our-claude-skills/custom-skills/90-reference-curator
    
  2. Run the installer:

    ./install.sh
    
  3. Follow the interactive prompts:

    • Set storage directory path
    • Configure MySQL credentials (optional)
    • Choose crawler backend (Firecrawl MCP recommended)
  4. Add to shell profile:

    echo 'source ~/.reference-curator.env' >> ~/.zshrc
    source ~/.reference-curator.env
    
  5. Verify installation:

    ./install.sh --check
    

Installation Modes

| Mode | Command | Description |
|------|---------|-------------|
| Full | ./install.sh | Interactive setup with MySQL and crawlers |
| Minimal | ./install.sh --minimal | Firecrawl MCP only, no database |
| Check | ./install.sh --check | Verify installation status |
| Claude.ai | ./install.sh --claude-ai | Export skills for Claude.ai Projects |
| Uninstall | ./install.sh --uninstall | Remove installation (preserves data) |

What Gets Installed

| Component | Location | Purpose |
|-----------|----------|---------|
| Environment config | ~/.reference-curator.env | Credentials and paths |
| Config files | ~/.config/reference-curator/ | YAML configuration |
| Storage directories | ~/reference-library/ | Raw/processed/exports |
| Claude Code commands | ~/.claude/commands/ | Slash commands |
| Claude Desktop skills | ~/.claude/skills/ | Skill symlinks |
| MySQL database | reference_library | Document storage |

Environment Variables

The installer creates ~/.reference-curator.env with:

# Storage paths
export REFERENCE_LIBRARY_PATH="$HOME/reference-library"  # tilde does not expand inside quotes

# MySQL configuration (if enabled)
export MYSQL_HOST="localhost"
export MYSQL_PORT="3306"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Crawler configuration
export DEFAULT_CRAWLER="firecrawl"  # or "nodejs"
export CRAWLER_PROJECT_PATH=""       # Path to local crawlers (optional)

Claude.ai Projects Installation

To use these skills in Claude.ai (web interface), export the skill files for upload:

./install.sh --claude-ai

This displays available files in claude-project/ and optionally copies them to a convenient location.

Files for Upload

| File | Description |
|------|-------------|
| reference-curator-complete.md | All 6 skills combined (recommended) |
| INDEX.md | Overview and workflow documentation |
| 01-reference-discovery.md | Source discovery skill |
| 02-web-crawler.md | Crawling orchestration skill |
| 03-content-repository.md | Database storage skill |
| 04-content-distiller.md | Content summarization skill |
| 05-quality-reviewer.md | QA review skill |
| 06-markdown-exporter.md | Export skill |

Upload Instructions

  1. Go to claude.ai
  2. Create a new Project or open an existing one
  3. Click "Add to project knowledge"
  4. Upload reference-curator-complete.md (or individual skills as needed)

Architecture

                    ┌──────────────────────────────┐
                    │ reference-curator-pipeline   │  (Orchestrator)
                    │ /reference-curator-pipeline  │
                    └──────────────────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          ▼                       ▼                       ▼
     [Topic Input]          [URL Input]          [Manifest Input]
          │                       │                       │
          ▼                       │                       │
┌─────────────────────┐           │                       │
│ reference-discovery │ ◄─────────┴───────────────────────┘
└─────────────────────┘                            (skip if URLs/manifest)
          │
          ▼
┌──────────────────────────┐
│ web-crawler-orchestrator │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└──────────────────────────┘
          │
          ▼
┌────────────────────┐
│ content-repository │ → Store in MySQL
└────────────────────┘
          │
          ▼
┌───────────────────┐
│ content-distiller │ → Summarize & extract  ◄─────┐
└───────────────────┘                              │
          │                                        │
          ▼                                        │
┌──────────────────┐                               │
│ quality-reviewer │ → QA loop                     │
└──────────────────┘                               │
          │                                        │
          ├── REFACTOR (max 3) ────────────────────┤
          ├── DEEP_RESEARCH (max 2) → crawler ─────┘
          │
          ▼ APPROVE
┌───────────────────┐
│ markdown-exporter │ → Project files / Fine-tuning
└───────────────────┘

User Guide

Run the complete curation workflow with a single command:

# From topic - runs all 6 stages automatically
/reference-curator-pipeline "Claude Code best practices" --max-sources 5

# From URLs - skip discovery, start at crawler
/reference-curator-pipeline https://docs.anthropic.com/en/docs/prompt-caching

# Resume from manifest file
/reference-curator-pipeline ./manifest.json --auto-approve

# Fine-tuning dataset output
/reference-curator-pipeline "MCP servers" --export-format fine_tuning

Pipeline Options:

  • --max-sources 10 - Max sources to discover (topic mode)
  • --max-pages 50 - Max pages per source to crawl
  • --auto-approve - Auto-approve scores above threshold
  • --threshold 0.85 - Approval threshold
  • --max-iterations 3 - Max QA loop iterations per document
  • --export-format project_files - Output format (project_files, fine_tuning, jsonl)

Manual Workflow (Step-by-Step)

Step 1: Discover References

/reference-discovery Claude's system prompt best practices

Claude searches the web, validates sources, and creates a manifest of URLs to crawl.

Step 2: Crawl Content

/web-crawler https://docs.anthropic.com --max-pages 50

The crawler automatically selects the best backend based on site characteristics.

Step 3: Store in Repository

/content-repository store

Documents are saved to MySQL with deduplication and version tracking.
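Deduplication is typically keyed on a fingerprint of the document body. The sketch below illustrates that idea in Python; content_hash and should_store are hypothetical names for illustration, and the actual logic lives in the content-repository skill and shared/schema.sql:

```python
import hashlib

def content_hash(markdown: str) -> str:
    """Stable fingerprint of a crawled document's normalized body."""
    normalized = "\n".join(line.rstrip() for line in markdown.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def should_store(new_doc: str, existing_hashes: set[str]) -> bool:
    """Skip inserts whose normalized body matches an already-stored version."""
    return content_hash(new_doc) not in existing_hashes
```

Hashing a whitespace-normalized body means a re-crawl that only shifts trailing whitespace does not create a new version.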

Step 4: Distill Content

/content-distiller all-pending

Claude summarizes, extracts key concepts, and creates structured content.

Step 5: Quality Review

/quality-reviewer all-pending --auto-approve

Automated QA scoring routes each document to one outcome: approve, refactor, deep research, or reject.

Step 6: Export

/markdown-exporter project_files --topic prompt-engineering

Generates markdown files organized by topic with cross-references.

Example Prompts

| Task | Command |
|------|---------|
| Discover sources | /reference-discovery MCP server development |
| Crawl URL | /web-crawler https://docs.anthropic.com |
| Check repository | /content-repository stats |
| Distill document | /content-distiller 42 |
| Review quality | /quality-reviewer all-pending |
| Export files | /markdown-exporter project_files |

Crawler Selection

The system automatically selects the optimal crawler:

| Site Type | Crawler | When Auto-Selected |
|-----------|---------|--------------------|
| SPAs/Dynamic | Firecrawl MCP | React, Vue, Angular sites (default) |
| Small docs sites | Node.js | ≤50 pages, static HTML |
| Technical docs | Python aiohttp | ≤200 pages, needs SEO data |
| Enterprise sites | Scrapy | >200 pages, multi-domain |

Override with explicit request:

/web-crawler https://example.com --crawler firecrawl
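The auto-selection rules can be sketched as a small heuristic. This Python fragment is illustrative only; pick_crawler and its parameters are hypothetical, and the orchestrator's actual inputs and thresholds may differ:

```python
def pick_crawler(page_count: int, is_spa: bool,
                 needs_seo_data: bool = False,
                 multi_domain: bool = False) -> str:
    """Mirror the selection table: SPA -> Firecrawl, small static -> Node.js,
    mid-size docs needing SEO data -> aiohttp, large/multi-domain -> Scrapy."""
    if is_spa:
        return "firecrawl"
    if page_count > 200 or multi_domain:
        return "scrapy"
    if page_count <= 50 and not needs_seo_data:
        return "nodejs"
    return "aiohttp"
```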

Quality Review Decisions

| Score | Decision | What Happens |
|-------|----------|--------------|
| ≥ 0.85 | Approve | Ready for export |
| 0.60-0.84 | Refactor | Re-distill with feedback |
| 0.40-0.59 | Deep Research | Gather more sources |
| < 0.40 | Reject | Archive (low quality) |
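The score bands map to decisions mechanically; a minimal Python sketch of that routing (route is a hypothetical helper, not part of the shipped skill):

```python
def route(score: float) -> str:
    """Map a QA score to the pipeline decision for that document."""
    if score >= 0.85:
        return "approve"       # ready for export
    if score >= 0.60:
        return "refactor"      # re-distill with reviewer feedback
    if score >= 0.40:
        return "deep_research" # gather more sources, then re-distill
    return "reject"            # archive as low quality
```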

Database Queries

Check your reference library status:

# Source credentials
source ~/.reference-curator.env

# Count documents by status
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT crawl_status, COUNT(*) as count
  FROM documents GROUP BY crawl_status;"

# View pending reviews
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_pending_reviews;"

# View export-ready documents
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
  SELECT * FROM v_export_ready;"

Output Formats

For Claude Projects:

~/reference-library/exports/
├── INDEX.md                 # Master index with all topics
└── prompt-engineering/      # Topic folder
    ├── _index.md            # Topic overview
    ├── system-prompts.md    # Individual document
    └── chain-of-thought.md

For Fine-tuning (JSONL):

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}

Skills & Commands Reference

| # | Skill | Command | Purpose |
|---|-------|---------|---------|
| 01 | reference-discovery | /reference-discovery | Search authoritative sources |
| 02 | web-crawler-orchestrator | /web-crawler | Multi-backend crawling |
| 03 | content-repository | /content-repository | MySQL storage management |
| 04 | content-distiller | /content-distiller | Summarize & extract |
| 05 | quality-reviewer | /quality-reviewer | QA scoring & routing |
| 06 | markdown-exporter | /markdown-exporter | Export to markdown/JSONL |
| 07 | pipeline-orchestrator | /reference-curator-pipeline | Full pipeline orchestration |

Configuration

Environment (~/.reference-curator.env)

# Required for MySQL features
export MYSQL_HOST="localhost"
export MYSQL_USER="youruser"
export MYSQL_PASSWORD="yourpassword"

# Storage location
export REFERENCE_LIBRARY_PATH="$HOME/reference-library"  # tilde does not expand inside quotes

# Crawler selection
export DEFAULT_CRAWLER="firecrawl"  # firecrawl, nodejs, aiohttp, scrapy

Database (~/.config/reference-curator/db_config.yaml)

mysql:
  host: ${MYSQL_HOST:-localhost}
  port: ${MYSQL_PORT:-3306}
  database: reference_library
  user: ${MYSQL_USER}
  password: ${MYSQL_PASSWORD}
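The ${VAR:-default} placeholders use shell-style default syntax, which standard YAML parsers do not expand on their own; whatever loads these files has to substitute values itself. A minimal Python sketch of that expansion (illustrative only, not the tool's actual loader):

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def expand_env(text: str) -> str:
    """Replace shell-style placeholders with environment values or defaults."""
    def sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        return os.environ.get(name, default if default is not None else "")
    return _PLACEHOLDER.sub(sub, text)
```

Running the config text through such a function before parsing it as YAML gives the behavior the templates imply.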

Crawlers (~/.config/reference-curator/crawl_config.yaml)

default_crawler: ${DEFAULT_CRAWLER:-firecrawl}

rate_limit:
  requests_per_minute: 20
  concurrent_requests: 3

default_options:
  timeout: 30000
  max_depth: 3
  max_pages: 100
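The requests_per_minute budget implies client-side throttling. A minimal sliding-window limiter in Python (illustrative; the shipped crawlers may throttle differently):

```python
import time
from collections import deque

class RateLimiter:
    """Block until a request slot is free within the per-minute budget."""
    def __init__(self, requests_per_minute: int = 20):
        self.budget = requests_per_minute
        self.sent = deque()  # monotonic timestamps of recent requests

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.budget:
            # Sleep until the oldest request leaves the window.
            time.sleep(60 - (now - self.sent[0]))
        self.sent.append(time.monotonic())
```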

Export (~/.config/reference-curator/export_config.yaml)

output:
  base_path: ${REFERENCE_LIBRARY_PATH:-~/reference-library}/exports/

quality:
  min_score_for_export: 0.80
  auto_approve_tier1_sources: true

Troubleshooting

MySQL Connection Failed

# Check MySQL is running
brew services list | grep mysql   # macOS
systemctl status mysql            # Linux

# Start MySQL
brew services start mysql         # macOS
sudo systemctl start mysql        # Linux

# Verify credentials
source ~/.reference-curator.env
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" -e "SELECT 1"

Commands Not Found

# Check commands are registered
ls -la ~/.claude/commands/

# Re-run installer to fix
./install.sh

Crawler Timeout

For slow sites, increase timeout in ~/.config/reference-curator/crawl_config.yaml:

default_options:
  timeout: 60000  # 60 seconds

Skills Not Loading

# Check symlinks exist
ls -la ~/.claude/skills/

# Re-run installer
./install.sh --uninstall
./install.sh

Database Schema Outdated

# Re-apply schema (preserves data with IF NOT EXISTS)
source ~/.reference-curator.env
mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library < shared/schema.sql

Directory Structure

90-reference-curator/
├── README.md                     # This file
├── CHANGELOG.md                  # Version history
├── install.sh                    # Portable installation script
│
├── claude-project/               # Files for Claude.ai Projects
│   ├── INDEX.md                  # Overview
│   ├── reference-curator-complete.md  # All skills combined
│   ├── 01-reference-discovery.md
│   ├── 02-web-crawler.md
│   ├── 03-content-repository.md
│   ├── 04-content-distiller.md
│   ├── 05-quality-reviewer.md
│   └── 06-markdown-exporter.md
│
├── commands/                     # Claude Code commands (tracked in git)
│   ├── reference-discovery.md
│   ├── web-crawler.md
│   ├── content-repository.md
│   ├── content-distiller.md
│   ├── quality-reviewer.md
│   ├── markdown-exporter.md
│   └── reference-curator-pipeline.md
│
├── 01-reference-discovery/
│   ├── code/CLAUDE.md            # Claude Code directive
│   └── desktop/SKILL.md          # Claude Desktop directive
├── 02-web-crawler-orchestrator/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 03-content-repository/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 04-content-distiller/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 05-quality-reviewer/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 06-markdown-exporter/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
├── 07-pipeline-orchestrator/
│   ├── code/CLAUDE.md
│   └── desktop/SKILL.md
│
└── shared/
    ├── schema.sql                # MySQL schema
    └── config/                   # Config templates
        ├── db_config.yaml
        ├── crawl_config.yaml
        └── export_config.yaml

Platform Differences

| Aspect | code/ (Claude Code) | desktop/ (Claude Desktop) |
|--------|---------------------|---------------------------|
| Directive | CLAUDE.md | SKILL.md (YAML frontmatter) |
| Commands | ~/.claude/commands/ | Not used |
| Skills | Reference only | ~/.claude/skills/ symlinks |
| Execution | Direct Bash/Python | MCP tools only |
| Best for | Automation, CI/CD | Interactive use |

Prerequisites

Required

  • macOS or Linux
  • Claude Code or Claude Desktop

Optional (for full features)

  • MySQL 8.0+ (for database storage)
  • Firecrawl MCP server configured
  • Node.js 18+ (for Node.js crawler)
  • Python 3.12+ (for aiohttp/Scrapy crawlers)