Andrew Yim d1cd1298a8 feat(reference-curator): Add pipeline orchestrator and refactor skill format

Pipeline Orchestrator:
- Add 07-pipeline-orchestrator skill with code/CLAUDE.md and desktop/SKILL.md
- Add /reference-curator-pipeline slash command for full workflow automation
- Add pipeline_runs and pipeline_iteration_tracker tables to schema.sql
- Add v_pipeline_status and v_pipeline_iterations views
- Add pipeline_config.yaml configuration template
- Update AGENTS.md with Reference Curator Skills section
- Update claude-project files with pipeline documentation

Skill Format Refactoring:
- Extract YAML frontmatter from SKILL.md files to separate skill.yaml
- Add tools/ directories with MCP tool documentation
- Update SKILL-FORMAT-REQUIREMENTS.md with new structure
- Add migrate-skill-structure.py script for format conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-29 01:01:02 +07:00

3.4 KiB

description	argument-hint	allowed-tools
Orchestrates full reference curation pipeline as background task. Runs discovery → crawl → store → distill → review → export with QA loop handling.	<topic|urls|manifest> [--max-sources 10] [--max-pages 50] [--auto-approve] [--threshold 0.85] [--max-iterations 3] [--export-format project_files]	WebSearch, WebFetch, Read, Write, Bash, Grep, Glob, Task

Reference Curator Pipeline

Full-stack orchestration of the 6-skill reference curation workflow.

Input Modes

Mode	Input Example	Pipeline Start
Topic	`"Claude system prompts"`	reference-discovery
URLs	`https://docs.anthropic.com/...`	web-crawler (skip discovery)
Manifest	`./manifest.json`	web-crawler (resume after discovery)
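
The mode selection in the table above can be sketched in Python. This is only an illustration: the function name `detect_input_mode` and the rule "a path ending in `.json` is a manifest" are assumptions, not something the command documents. Stage names follow the table.

```python
from urllib.parse import urlparse


def detect_input_mode(raw_input: str) -> tuple[str, str]:
    """Return (mode, first_stage) for a pipeline input string.

    Hypothetical helper: the detection rules are assumptions made
    for illustration, not the command's documented behaviour.
    """
    parsed = urlparse(raw_input)
    # An http(s) URL skips discovery and goes straight to the crawler.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "urls", "web-crawler"
    # Assumption: a manifest is recognised by its .json extension.
    if raw_input.lower().endswith(".json"):
        return "manifest", "web-crawler"
    # Anything else is treated as a free-text topic.
    return "topic", "reference-discovery"


print(detect_input_mode("Claude system prompts"))
print(detect_input_mode("./manifest.json"))
```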

Arguments

<input>: Required. Topic string, URL(s), or manifest file path
--max-sources: Maximum sources to discover (topic mode, default: 10)
--max-pages: Maximum pages per source to crawl (default: 50)
--auto-approve: Automatically approve documents whose quality score is above the threshold
--threshold: Approval threshold (default: 0.85)
--max-iterations: Max QA loop iterations per document (default: 3)
--export-format: Output format: project_files, fine_tuning, jsonl (default: project_files)
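
For readers who want to script against the same options, the argument list maps naturally onto `argparse`. The slash command itself is interpreted by Claude rather than by a Python parser, so treat this as a sketch of the documented flags and defaults only; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags and defaults (illustrative only)."""
    parser = argparse.ArgumentParser(prog="reference-curator-pipeline")
    # One topic string, one manifest path, or one or more URLs.
    parser.add_argument("input", nargs="+",
                        help="Topic string, URL(s), or manifest file path")
    parser.add_argument("--max-sources", type=int, default=10)
    parser.add_argument("--max-pages", type=int, default=50)
    parser.add_argument("--auto-approve", action="store_true")
    parser.add_argument("--threshold", type=float, default=0.85)
    parser.add_argument("--max-iterations", type=int, default=3)
    parser.add_argument("--export-format",
                        choices=["project_files", "fine_tuning", "jsonl"],
                        default="project_files")
    return parser


args = build_parser().parse_args(
    ["MCP servers", "--export-format", "fine_tuning", "--auto-approve"]
)
print(args)
```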

Pipeline Stages

1. reference-discovery  (topic mode only)
2. web-crawler-orchestrator
3. content-repository
4. content-distiller    ◄────────┐
5. quality-reviewer              │
   ├── APPROVE → export          │
   ├── REFACTOR ─────────────────┤
   ├── DEEP_RESEARCH → crawler ──┘
   └── REJECT → archive
6. markdown-exporter

QA Loop Handling

Decision	Action	Max Iterations
REFACTOR	Re-distill with feedback	3
DEEP_RESEARCH	Crawl more sources, re-distill	2
Combined	Total loops per document	5

After the maximum number of iterations, the document is marked as needs_manual_review.
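
The loop limits can be expressed as a small routing function. The sketch below hard-codes the limits from the table; how they interact with `--max-iterations` is not stated here, so that flag is left out. `LoopCounter` and `next_action` are hypothetical names, and the return values reuse the stage names from the Pipeline Stages diagram.

```python
from dataclasses import dataclass

# Limits taken from the QA Loop Handling table.
MAX_REFACTOR = 3
MAX_DEEP_RESEARCH = 2
MAX_TOTAL = 5


@dataclass
class LoopCounter:
    """Per-document count of QA loops taken so far."""
    refactor: int = 0
    deep_research: int = 0

    @property
    def total(self) -> int:
        return self.refactor + self.deep_research


def next_action(decision: str, counter: LoopCounter) -> str:
    """Map a reviewer decision to the next step, enforcing loop limits."""
    if decision == "APPROVE":
        return "export"
    if decision == "REJECT":
        return "archive"
    if decision == "REFACTOR":
        if counter.refactor >= MAX_REFACTOR or counter.total >= MAX_TOTAL:
            return "needs_manual_review"
        counter.refactor += 1
        return "content-distiller"
    if decision == "DEEP_RESEARCH":
        if counter.deep_research >= MAX_DEEP_RESEARCH or counter.total >= MAX_TOTAL:
            return "needs_manual_review"
        counter.deep_research += 1
        return "web-crawler-orchestrator"
    raise ValueError(f"unknown decision: {decision}")
```

A fourth REFACTOR or a third DEEP_RESEARCH for the same document falls through to needs_manual_review instead of looping again.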

Example Usage

# Full pipeline from topic
/reference-curator-pipeline "Claude Code best practices" --max-sources 5

# Pipeline from specific URLs (skip discovery)
/reference-curator-pipeline https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

# Resume from existing manifest
/reference-curator-pipeline ./manifest.json --auto-approve

# Fine-tuning dataset output
/reference-curator-pipeline "MCP servers" --export-format fine_tuning --auto-approve

State Management

Pipeline state is saved after each stage so that an interrupted run can be resumed:

With MySQL:

SELECT * FROM pipeline_runs WHERE run_id = 123;

File-based fallback:

~/reference-library/pipeline_state/run_XXX/state.json

Output

The pipeline returns a summary on completion:

{
  "run_id": 123,
  "status": "completed",
  "stats": {
    "sources_discovered": 5,
    "pages_crawled": 45,
    "documents_stored": 45,
    "approved": 40,
    "refactored": 8,
    "deep_researched": 2,
    "rejected": 3,
    "needs_manual_review": 2
  },
  "exports": {
    "format": "project_files",
    "path": "~/reference-library/exports/"
  }
}
