| description | argument-hint | allowed-tools |
|---|---|---|
| Orchestrates full reference curation pipeline as background task. Runs discovery → crawl → store → distill → review → export with QA loop handling. | `<topic\|urls\|manifest>` [--max-sources 10] [--max-pages 50] [--auto-approve] [--threshold 0.85] [--max-iterations 3] [--export-format project_files] | WebSearch, WebFetch, Read, Write, Bash, Grep, Glob, Task |
# Reference Curator Pipeline
Full-stack orchestration of the 6-skill reference curation workflow.
## Input Modes
| Mode | Input Example | Pipeline Start |
|---|---|---|
| Topic | "Claude system prompts" | reference-discovery |
| URLs | https://docs.anthropic.com/... | web-crawler (skip discovery) |
| Manifest | ./manifest.json | web-crawler (resume from discovery) |
## Arguments
- `<input>`: Required. Topic string, URL(s), or manifest file path
- `--max-sources`: Maximum sources to discover (topic mode, default: 10)
- `--max-pages`: Maximum pages per source to crawl (default: 50)
- `--auto-approve`: Auto-approve scores above threshold
- `--threshold`: Approval threshold (default: 0.85)
- `--max-iterations`: Max QA loop iterations per document (default: 3)
- `--export-format`: Output format: `project_files`, `fine_tuning`, `jsonl` (default: `project_files`)
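The flags above can be sketched as a parser whose defaults mirror the documented values. This is an illustrative Python sketch, not the command's actual implementation; the single positional `input` is a simplification of the topic/URLs/manifest modes.

```python
import argparse

def parse_pipeline_args(argv):
    """Parse the documented pipeline flags; defaults follow the Arguments list."""
    parser = argparse.ArgumentParser(prog="/reference-curator-pipeline")
    parser.add_argument("input", help="topic string, URL(s), or manifest file path")
    parser.add_argument("--max-sources", type=int, default=10)
    parser.add_argument("--max-pages", type=int, default=50)
    parser.add_argument("--auto-approve", action="store_true")
    parser.add_argument("--threshold", type=float, default=0.85)
    parser.add_argument("--max-iterations", type=int, default=3)
    parser.add_argument("--export-format", default="project_files",
                        choices=["project_files", "fine_tuning", "jsonl"])
    return parser.parse_args(argv)

args = parse_pipeline_args(["MCP servers", "--auto-approve"])
assert args.auto_approve and args.threshold == 0.85
```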
## Pipeline Stages
```
1. reference-discovery (topic mode only)
2. web-crawler-orchestrator
3. content-repository
4. content-distiller ◄────────┐
5. quality-reviewer           │
   ├── APPROVE → export       │
   ├── REFACTOR ──────────────┤
   ├── DEEP_RESEARCH → crawler┘
   └── REJECT → archive
6. markdown-exporter
```
## QA Loop Handling
| Decision | Action | Max Iterations |
|---|---|---|
| REFACTOR | Re-distill with feedback | 3 |
| DEEP_RESEARCH | Crawl more sources, re-distill | 2 |
| Combined | Total loops per document | 5 |
After max iterations, the document is marked as `needs_manual_review`.
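The caps in the table above can be sketched as a small dispatch function. This is a hedged sketch of the described behavior, assuming per-decision counters; the function and action names are illustrative, not the orchestrator's real API.

```python
# Iteration caps from the QA Loop Handling table.
REFACTOR_MAX = 3
DEEP_RESEARCH_MAX = 2
COMBINED_MAX = 5

def next_action(decision, counts):
    """Return the next pipeline action for a review decision, honoring loop caps.

    counts tracks prior loops per document, e.g. {"REFACTOR": 0, "DEEP_RESEARCH": 0}.
    """
    if decision == "APPROVE":
        return "export"
    if decision == "REJECT":
        return "archive"
    per_decision_max = {"REFACTOR": REFACTOR_MAX, "DEEP_RESEARCH": DEEP_RESEARCH_MAX}
    total = counts["REFACTOR"] + counts["DEEP_RESEARCH"]
    if total >= COMBINED_MAX or counts[decision] >= per_decision_max[decision]:
        return "needs_manual_review"
    counts[decision] += 1
    # REFACTOR re-distills with feedback; DEEP_RESEARCH crawls more sources first.
    return "re-distill" if decision == "REFACTOR" else "crawl-more"
```

For example, a fourth consecutive REFACTOR verdict on the same document yields `needs_manual_review` rather than another distillation pass.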
## Example Usage
```shell
# Full pipeline from topic
/reference-curator-pipeline "Claude Code best practices" --max-sources 5

# Pipeline from specific URLs (skip discovery)
/reference-curator-pipeline https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

# Resume from existing manifest
/reference-curator-pipeline ./manifest.json --auto-approve

# Fine-tuning dataset output
/reference-curator-pipeline "MCP servers" --export-format fine_tuning --auto-approve
```
## State Management
Pipeline state is saved after each stage to allow resume:
With MySQL:

```sql
SELECT * FROM pipeline_runs WHERE run_id = 123;
```

File-based fallback:

```
~/reference-library/pipeline_state/run_XXX/state.json
```
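A minimal sketch of the file-based fallback, assuming a zero-padded `run_XXX` directory naming scheme and an unspecified JSON state layout; neither detail is confirmed by this document.

```python
import json
from pathlib import Path

def save_state(base_dir, run_id, state):
    """Write state.json for a run, creating the run directory if needed."""
    run_dir = Path(base_dir) / f"run_{run_id:03d}"  # assumed zero-padded naming
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "state.json").write_text(json.dumps(state, indent=2))

def load_state(base_dir, run_id):
    """Return the saved state dict, or None if the run has no state yet."""
    path = Path(base_dir) / f"run_{run_id:03d}" / "state.json"
    return json.loads(path.read_text()) if path.exists() else None
```

Resuming then amounts to loading the state, reading the last completed stage, and re-entering the pipeline at the next one.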
## Output

The pipeline returns a summary on completion:
```json
{
  "run_id": 123,
  "status": "completed",
  "stats": {
    "sources_discovered": 5,
    "pages_crawled": 45,
    "documents_stored": 45,
    "approved": 40,
    "refactored": 8,
    "deep_researched": 2,
    "rejected": 3,
    "needs_manual_review": 2
  },
  "exports": {
    "format": "project_files",
    "path": "~/reference-library/exports/"
  }
}
```
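One way to sanity-check a summary like the one above: the terminal states (approved, rejected, needs_manual_review) should account for every stored document, while `refactored` and `deep_researched` count loop passes, not final outcomes. This is a sketch based on the example values, not a guarantee about the real schema.

```python
import json

summary = json.loads("""
{"stats": {"documents_stored": 45, "approved": 40,
           "rejected": 3, "needs_manual_review": 2,
           "refactored": 8, "deep_researched": 2}}
""")
s = summary["stats"]
# Terminal states partition the stored documents.
terminal = s["approved"] + s["rejected"] + s["needs_manual_review"]
assert terminal == s["documents_stored"]
```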
## See Also
- `/reference-discovery` - Run discovery stage only
- `/web-crawler` - Run crawler stage only
- `/content-repository` - Manage stored content
- `/quality-reviewer` - Run QA review only
- `/markdown-exporter` - Run export only