# Pipeline Orchestrator

Coordinates the full 6-skill reference curation workflow with automated QA loop handling.

## Trigger Phrases

- "curate references on [topic]"
- "run full curation pipeline"
- "automate reference curation"
- "curate these URLs: [url1, url2]"

## Input Modes

| Mode | Example | Pipeline Start |
|------|---------|----------------|
| **Topic** | "curate references on Claude system prompts" | Stage 1 (discovery) |
| **URLs** | "curate these URLs: https://docs.anthropic.com/..." | Stage 2 (crawler) |
| **Manifest** | "resume curation from manifest.json" | Stage 2 (crawler) |

## Pipeline Stages

```
1. reference-discovery (topic mode only)
       │
       ▼
2. web-crawler-orchestrator
       │
       ▼
3. content-repository
       │
       ▼
4. content-distiller ◄─────────────┐
       │                           │
       ▼                           │
5. quality-reviewer                │
       │                           │
       ├── APPROVE → Stage 6       │
       ├── REFACTOR ───────────────┤
       ├── DEEP_RESEARCH → Stage 2 ┘
       └── REJECT → Archive
       │
       ▼
6. markdown-exporter
```

## Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| max_sources | 10 | Maximum sources to discover (topic mode) |
| max_pages | 50 | Maximum pages per source to crawl |
| auto_approve | false | Auto-approve scores above threshold |
| threshold | 0.85 | Quality score threshold for approval |
| max_iterations | 3 | Maximum QA loop iterations per document |
| export_format | project_files | Output format (project_files, fine_tuning, jsonl) |

## QA Loop Handling

The orchestrator automatically handles QA decisions:

| Decision | Action | Iteration Limit |
|----------|--------|-----------------|
| **APPROVE** | Proceed to export | - |
| **REFACTOR** | Re-distill with feedback | 3 iterations |
| **DEEP_RESEARCH** | Crawl more sources, re-distill | 2 iterations |
| **REJECT** | Archive with reason | - |

After reaching iteration limits, documents are marked `needs_manual_review`.

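The decision table above can be read as a small routing function. The following is a minimal sketch of that reading, not the orchestrator's actual implementation: the `DocState` class, the `next_action` function, and the returned step names are hypothetical, and it assumes "3 iterations" means three re-distill attempts are allowed before the document is flagged.

```python
from dataclasses import dataclass, field

# Iteration limits copied from the QA Loop Handling table.
ITERATION_LIMITS = {"REFACTOR": 3, "DEEP_RESEARCH": 2}


@dataclass
class DocState:
    """Hypothetical per-document bookkeeping for the QA loop."""
    doc_id: str
    counts: dict = field(
        default_factory=lambda: {"REFACTOR": 0, "DEEP_RESEARCH": 0}
    )
    status: str = "in_review"


def next_action(doc: DocState, decision: str) -> str:
    """Map a reviewer decision to the next step for one document."""
    if decision == "APPROVE":
        doc.status = "approved"
        return "export"  # Stage 6
    if decision == "REJECT":
        doc.status = "rejected"
        return "archive"
    if decision not in ITERATION_LIMITS:
        raise ValueError(f"unknown decision: {decision}")
    # Assumption: once the limit is used up, the next loop request
    # marks the document for manual review instead of looping again.
    if doc.counts[decision] >= ITERATION_LIMITS[decision]:
        doc.status = "needs_manual_review"
        return "manual_review"
    doc.counts[decision] += 1
    # REFACTOR returns to Stage 4; DEEP_RESEARCH returns to Stage 2.
    return "distill" if decision == "REFACTOR" else "crawl"
```

Keeping a separate counter per decision type lets a document use its full REFACTOR and DEEP_RESEARCH allowances independently, which matches the two distinct limits in the table.
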
## State Management

### With Database

Pipeline state is tracked in the `pipeline_runs` table:

- Run ID, input type, current stage
- Statistics (crawled, distilled, approved, etc.)
- Error handling and resume capability

### File-Based Fallback

State is saved to `~/reference-library/pipeline_state/run_XXX/`:

- `state.json` - Current stage and statistics
- `manifest.json` - Discovered sources
- `review_log.json` - QA decisions

## Progress Tracking

The orchestrator reports progress at each stage:

```
[Pipeline] Stage 1/6: Discovery - Found 8 sources
[Pipeline] Stage 2/6: Crawling - 45/50 pages complete
[Pipeline] Stage 3/6: Storing - 45 documents saved
[Pipeline] Stage 4/6: Distilling - 45 documents processed
[Pipeline] Stage 5/6: Reviewing - 40 approved, 3 refactored, 2 rejected
[Pipeline] Stage 6/6: Exporting - 40 documents exported
[Pipeline] Complete! See ~/reference-library/exports/
```

## Output Summary

On completion, the orchestrator returns a detailed summary:

```
Pipeline Complete:
- Sources discovered: 5
- Pages crawled: 45
- Documents stored: 45
- Approved: 40
- Refactored: 8
- Deep researched: 2
- Rejected: 3
- Needs manual review: 2

Exports saved to: ~/reference-library/exports/
Format: project_files
```

## Error Handling

If a stage fails:

1. State is checkpointed
2. Error is logged with details
3. Pipeline can be resumed from last successful stage

## Resume Pipeline

To resume a paused or failed pipeline:

- Provide the manifest.json or run state file
- Pipeline continues from last checkpoint

## Example Usage

### Full Pipeline from Topic

```
User: Curate references on Claude Code best practices with 5 sources

Claude: I'll run the full curation pipeline for "Claude Code best practices":

[Pipeline] Stage 1/6: Discovering sources...
Found 5 authoritative sources (3 official, 2 community)

[Pipeline] Stage 2/6: Crawling 5 sources...
Crawled 45 pages total

[Pipeline] Stage 3/6: Storing documents...
45 documents saved to repository

[Pipeline] Stage 4/6: Distilling content...
45 documents processed

[Pipeline] Stage 5/6: Quality review...
- 38 approved
- 5 refactored (re-distilling...)
- 2 rejected (low quality)

[Pipeline] Stage 6/6: Exporting...
40 documents exported to ~/reference-library/exports/

Pipeline complete! 40 reference documents ready for use.
```

### From Specific URLs

```
User: Curate these URLs with auto-approve:
- https://docs.anthropic.com/en/docs/prompt-caching
- https://docs.anthropic.com/en/docs/build-with-claude/tool-use

Claude: Running pipeline from URLs (skipping discovery)...

[Pipeline] Stage 2/6: Crawling 2 sources...
[...continues with remaining stages...]
```
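
### Checkpoint and Resume Sketch

The checkpoint-and-resume behaviour described under File-Based Fallback, Error Handling, and Resume Pipeline could be sketched roughly as follows. This is an illustration under stated assumptions rather than the orchestrator's real code: the function names and the `last_completed_stage` and `stats` keys are hypothetical, since the document does not specify the schema of `state.json`.

```python
import json
from pathlib import Path

# Stage names in pipeline order, taken from the Pipeline Stages diagram.
STAGES = [
    "reference-discovery",
    "web-crawler-orchestrator",
    "content-repository",
    "content-distiller",
    "quality-reviewer",
    "markdown-exporter",
]


def save_checkpoint(run_dir: Path, completed_stage: int, stats: dict) -> None:
    """Write state.json after a stage succeeds (schema is an assumption)."""
    run_dir.mkdir(parents=True, exist_ok=True)
    state = {"last_completed_stage": completed_stage, "stats": stats}
    # Write to a temporary file first, then rename, so a crash mid-write
    # does not leave a truncated state.json behind.
    tmp = run_dir / "state.json.tmp"
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(run_dir / "state.json")


def resume_stage(run_dir: Path, start_stage: int = 1) -> int:
    """Return the 1-based stage number to run next.

    Falls back to start_stage when no checkpoint exists, e.g. 1 for
    topic mode or 2 for URL and manifest modes. A result greater than
    len(STAGES) means the run had already finished.
    """
    state_file = run_dir / "state.json"
    if not state_file.exists():
        return start_stage
    state = json.loads(state_file.read_text(encoding="utf-8"))
    return state["last_completed_stage"] + 1
```

For example, after `save_checkpoint(run_dir, 3, {"crawled": 45})`, a later call to `resume_stage(run_dir)` returns 4, so the run would pick up at `content-distiller`.
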