Files
our-claude-skills/custom-skills/90-reference-curator/commands/reference-curator-pipeline.md
Andrew Yim d1cd1298a8 feat(reference-curator): Add pipeline orchestrator and refactor skill format
Pipeline Orchestrator:
- Add 07-pipeline-orchestrator skill with code/CLAUDE.md and desktop/SKILL.md
- Add /reference-curator-pipeline slash command for full workflow automation
- Add pipeline_runs and pipeline_iteration_tracker tables to schema.sql
- Add v_pipeline_status and v_pipeline_iterations views
- Add pipeline_config.yaml configuration template
- Update AGENTS.md with Reference Curator Skills section
- Update claude-project files with pipeline documentation

Skill Format Refactoring:
- Extract YAML frontmatter from SKILL.md files to separate skill.yaml
- Add tools/ directories with MCP tool documentation
- Update SKILL-FORMAT-REQUIREMENTS.md with new structure
- Add migrate-skill-structure.py script for format conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:01:02 +07:00

3.4 KiB

description, argument-hint, allowed-tools
description argument-hint allowed-tools
Orchestrates full reference curation pipeline as background task. Runs discovery → crawl → store → distill → review → export with QA loop handling. <topic|urls|manifest> [--max-sources 10] [--max-pages 50] [--auto-approve] [--threshold 0.85] [--max-iterations 3] [--export-format project_files] WebSearch, WebFetch, Read, Write, Bash, Grep, Glob, Task

Reference Curator Pipeline

Full-stack orchestration of the 6-skill reference curation workflow.

Input Modes

Mode Input Example Pipeline Start
Topic "Claude system prompts" reference-discovery
URLs https://docs.anthropic.com/... web-crawler (skip discovery)
Manifest ./manifest.json web-crawler (resume from discovery)

Arguments

  • <input>: Required. Topic string, URL(s), or manifest file path
  • --max-sources: Maximum sources to discover (topic mode, default: 10)
  • --max-pages: Maximum pages per source to crawl (default: 50)
  • --auto-approve: Auto-approve scores above threshold
  • --threshold: Approval threshold (default: 0.85)
  • --max-iterations: Max QA loop iterations per document (default: 3)
  • --export-format: Output format: project_files, fine_tuning, jsonl (default: project_files)

Pipeline Stages

1. reference-discovery  (topic mode only)
2. web-crawler-orchestrator
3. content-repository
4. content-distiller    ◄────────┐
5. quality-reviewer              │
   ├── APPROVE → export          │
   ├── REFACTOR ─────────────────┤
   ├── DEEP_RESEARCH → crawler ──┘
   └── REJECT → archive
6. markdown-exporter

QA Loop Handling

Decision Action Max Iterations
REFACTOR Re-distill with feedback 3
DEEP_RESEARCH Crawl more sources, re-distill 2
Combined Total loops per document 5

After max iterations, document marked as needs_manual_review.

Example Usage

# Full pipeline from topic
/reference-curator-pipeline "Claude Code best practices" --max-sources 5

# Pipeline from specific URLs (skip discovery)
/reference-curator-pipeline https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

# Resume from existing manifest
/reference-curator-pipeline ./manifest.json --auto-approve

# Fine-tuning dataset output
/reference-curator-pipeline "MCP servers" --export-format fine_tuning --auto-approve

State Management

Pipeline state is saved after each stage to allow resume:

With MySQL:

SELECT * FROM pipeline_runs WHERE run_id = 123;

File-based fallback:

~/reference-library/pipeline_state/run_XXX/state.json

Output

Pipeline returns summary on completion:

{
  "run_id": 123,
  "status": "completed",
  "stats": {
    "sources_discovered": 5,
    "pages_crawled": 45,
    "documents_stored": 45,
    "approved": 40,
    "refactored": 8,
    "deep_researched": 2,
    "rejected": 3,
    "needs_manual_review": 2
  },
  "exports": {
    "format": "project_files",
    "path": "~/reference-library/exports/"
  }
}

See Also

  • /reference-discovery - Run discovery stage only
  • /web-crawler - Run crawler stage only
  • /content-repository - Manage stored content
  • /quality-reviewer - Run QA review only
  • /markdown-exporter - Run export only