---
description: Orchestrates the full reference curation pipeline as a background task. Runs discovery → crawl → store → distill → review → export with QA loop handling.
argument-hint: [--max-sources 10] [--max-pages 50] [--auto-approve] [--threshold 0.85] [--max-iterations 3] [--export-format project_files]
allowed-tools: WebSearch, WebFetch, Read, Write, Bash, Grep, Glob, Task
---

# Reference Curator Pipeline

Full-stack orchestration of the 6-skill reference curation workflow.

## Input Modes

| Mode | Input Example | Pipeline Start |
|------|---------------|----------------|
| **Topic** | `"Claude system prompts"` | reference-discovery |
| **URLs** | `https://docs.anthropic.com/...` | web-crawler (skip discovery) |
| **Manifest** | `./manifest.json` | web-crawler (resume from a prior discovery run) |

## Arguments

- `<topic | urls | manifest>`: Required. Topic string, URL(s), or manifest file path
- `--max-sources`: Maximum sources to discover (topic mode only, default: 10)
- `--max-pages`: Maximum pages crawled per source (default: 50)
- `--auto-approve`: Auto-approve documents whose review score meets the threshold
- `--threshold`: Approval threshold (default: 0.85)
- `--max-iterations`: Maximum QA loop iterations per document (default: 3)
- `--export-format`: Output format: `project_files`, `fine_tuning`, or `jsonl` (default: `project_files`)

## Pipeline Stages

```
1. reference-discovery (topic mode only)
2. web-crawler-orchestrator
3. content-repository
4. content-distiller ◄────────────┐
5. quality-reviewer               │
   ├── APPROVE → export           │
   ├── REFACTOR ──────────────────┤
   ├── DEEP_RESEARCH → crawler ───┘
   └── REJECT → archive
6. markdown-exporter
```

## QA Loop Handling

| Decision | Action | Max Iterations |
|----------|--------|----------------|
| REFACTOR | Re-distill with reviewer feedback | 3 |
| DEEP_RESEARCH | Crawl additional sources, then re-distill | 2 |
| Combined | Total loops per document | 5 |

Once a document exhausts its iteration budget, it is marked `needs_manual_review`.
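The iteration caps above can be sketched as a simple loop. This is a minimal illustration, not the real skill interface: `distill`, `review`, and `crawl_more` are hypothetical stand-ins for the content-distiller, quality-reviewer, and crawler stages.

```python
# Sketch of the QA loop caps. distill(), review(), and crawl_more()
# are hypothetical placeholders for the pipeline's skill invocations.

REFACTOR_MAX = 3       # re-distill passes per document
DEEP_RESEARCH_MAX = 2  # crawl-and-re-distill passes per document
TOTAL_MAX = 5          # combined loop cap per document


def qa_loop(doc, distill, review, crawl_more):
    refactors = deep = total = 0
    draft = distill(doc, feedback=None)
    while True:
        decision, feedback = review(draft)
        if decision == "APPROVE":
            return "export", draft
        if decision == "REJECT":
            return "archive", draft
        total += 1
        if total > TOTAL_MAX:
            return "needs_manual_review", draft
        if decision == "REFACTOR":
            refactors += 1
            if refactors > REFACTOR_MAX:
                return "needs_manual_review", draft
            draft = distill(doc, feedback=feedback)
        elif decision == "DEEP_RESEARCH":
            deep += 1
            if deep > DEEP_RESEARCH_MAX:
                return "needs_manual_review", draft
            doc = crawl_more(doc, feedback)
            draft = distill(doc, feedback=feedback)
```

A document that keeps receiving REFACTOR decisions exits with `needs_manual_review` after its third re-distill pass, matching the table.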
## Example Usage

```
# Full pipeline from topic
/reference-curator-pipeline "Claude Code best practices" --max-sources 5

# Pipeline from specific URLs (skip discovery)
/reference-curator-pipeline https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

# Resume from existing manifest
/reference-curator-pipeline ./manifest.json --auto-approve

# Fine-tuning dataset output
/reference-curator-pipeline "MCP servers" --export-format fine_tuning --auto-approve
```

## State Management

Pipeline state is saved after each stage so that interrupted runs can resume.

**With MySQL:**

```sql
SELECT * FROM pipeline_runs WHERE run_id = 123;
```

**File-based fallback:**

```
~/reference-library/pipeline_state/run_XXX/state.json
```

## Output

The pipeline returns a summary on completion:

```json
{
  "run_id": 123,
  "status": "completed",
  "stats": {
    "sources_discovered": 5,
    "pages_crawled": 45,
    "documents_stored": 45,
    "approved": 40,
    "refactored": 8,
    "deep_researched": 2,
    "rejected": 3,
    "needs_manual_review": 2
  },
  "exports": {
    "format": "project_files",
    "path": "~/reference-library/exports/"
  }
}
```

## See Also

- `/reference-discovery` - Run the discovery stage only
- `/web-crawler` - Run the crawler stage only
- `/content-repository` - Manage stored content
- `/quality-reviewer` - Run QA review only
- `/markdown-exporter` - Run export only