# Pipeline Orchestrator

Coordinates the full 6-skill reference curation workflow with automated QA loop handling.

## Trigger Phrases

- "curate references on [topic]"
- "run full curation pipeline"
- "automate reference curation"
- "curate these URLs: [url1, url2]"

## Input Modes

| Mode | Example | Pipeline Start |
|------|---------|----------------|
| **Topic** | "curate references on Claude system prompts" | Stage 1 (discovery) |
| **URLs** | "curate these URLs: https://docs.anthropic.com/..." | Stage 2 (crawler) |
| **Manifest** | "resume curation from manifest.json" | Stage 2 (crawler) |

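The mode table above can be sketched as a small classifier. This is only an illustration: `InputMode`, `START_STAGE`, and `detect_input_mode` are hypothetical names, and the matching rules (a `manifest.json` mention, then any `http(s)` URL, otherwise topic) are an assumption about how a request might be classified, not the skill's actual logic.

```python
import re
from enum import Enum


class InputMode(Enum):
    TOPIC = "topic"
    URLS = "urls"
    MANIFEST = "manifest"


# Starting stage per mode, taken from the table above.
START_STAGE = {InputMode.TOPIC: 1, InputMode.URLS: 2, InputMode.MANIFEST: 2}

# Assumed heuristic: anything that looks like an http(s) URL.
URL_PATTERN = re.compile(r"https?://\S+")


def detect_input_mode(request: str) -> InputMode:
    """Classify a user request into one of the three input modes."""
    if "manifest.json" in request.lower():
        return InputMode.MANIFEST
    if URL_PATTERN.search(request):
        return InputMode.URLS
    # No manifest and no URLs: treat the request as a topic.
    return InputMode.TOPIC
```

Checking for a manifest before URLs is a design choice in this sketch, so that a resume request wins even if it also mentions a link.
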
## Pipeline Stages

```
1. reference-discovery (topic mode only)
       │
       ▼
2. web-crawler-orchestrator
       │
       ▼
3. content-repository
       │
       ▼
4. content-distiller ◄─────────────┐
       │                           │
       ▼                           │
5. quality-reviewer                │
       │                           │
       ├── APPROVE → Stage 6       │
       ├── REFACTOR ───────────────┤
       ├── DEEP_RESEARCH → Stage 2 ┘
       └── REJECT → Archive
       │
       ▼
6. markdown-exporter
```

## Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| max_sources | 10 | Maximum sources to discover (topic mode) |
| max_pages | 50 | Maximum pages per source to crawl |
| auto_approve | false | Auto-approve scores above threshold |
| threshold | 0.85 | Quality score threshold for approval |
| max_iterations | 3 | Maximum QA loop iterations per document |
| export_format | project_files | Output format (project_files, fine_tuning, jsonl) |

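As a rough illustration, the options and defaults in the table could be held in a typed configuration object. The `PipelineConfig` class and `EXPORT_FORMATS` constant below are hypothetical, and the assumption that `threshold` lies between 0 and 1 is inferred from the default of 0.85 rather than stated by the skill.

```python
from dataclasses import dataclass

# Allowed values listed in the table above.
EXPORT_FORMATS = ("project_files", "fine_tuning", "jsonl")


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container mirroring the documented defaults."""

    max_sources: int = 10
    max_pages: int = 50
    auto_approve: bool = False
    threshold: float = 0.85
    max_iterations: int = 3
    export_format: str = "project_files"

    def __post_init__(self) -> None:
        # Assumption: quality scores are normalised to the range [0, 1].
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError(f"threshold must be in [0, 1], got {self.threshold}")
        if self.export_format not in EXPORT_FORMATS:
            raise ValueError(f"unknown export_format: {self.export_format!r}")
```

For example, `PipelineConfig(max_sources=5, auto_approve=True)` overrides two options and keeps the remaining defaults.
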
## QA Loop Handling

The orchestrator automatically handles QA decisions:

| Decision | Action | Iteration Limit |
|----------|--------|-----------------|
| **APPROVE** | Proceed to export | - |
| **REFACTOR** | Re-distill with feedback | 3 iterations |
| **DEEP_RESEARCH** | Crawl more sources, re-distill | 2 iterations |
| **REJECT** | Archive with reason | - |

When a document reaches an iteration limit, it is marked `needs_manual_review`.

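The decision table can be read as a small routing function. The sketch below is one possible interpretation: `next_action`, the action strings, and the choice to count iterations separately per decision type for each document are all assumptions, since the skill does not spell out how the counters are kept.

```python
# Iteration limits from the table above; APPROVE and REJECT have none.
ITERATION_LIMITS = {"REFACTOR": 3, "DEEP_RESEARCH": 2}


def next_action(decision: str, iterations: dict[str, int]) -> str:
    """Map a QA decision to the next pipeline action for one document.

    `iterations` holds the per-decision counts for that document and is
    updated in place whenever another loop iteration is granted.
    """
    if decision == "APPROVE":
        return "export"
    if decision == "REJECT":
        return "archive"
    if decision not in ITERATION_LIMITS:
        raise ValueError(f"unknown QA decision: {decision!r}")

    used = iterations.get(decision, 0)
    if used >= ITERATION_LIMITS[decision]:
        # Limit reached: stop looping and hand the document to a human.
        return "needs_manual_review"

    iterations[decision] = used + 1
    return "redistill" if decision == "REFACTOR" else "crawl_more"
```

With a fresh counter, three consecutive `REFACTOR` decisions each return `"redistill"`, and a fourth returns `"needs_manual_review"`.
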
## State Management

### With Database

Pipeline state is tracked in the `pipeline_runs` table:
- Run ID, input type, current stage
- Statistics (crawled, distilled, approved, etc.)
- Error handling and resume capability

### File-Based Fallback

State is saved to `~/reference-library/pipeline_state/run_XXX/`:
- `state.json` - Current stage and statistics
- `manifest.json` - Discovered sources
- `review_log.json` - QA decisions

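A minimal sketch of the file-based fallback, assuming `state.json` is a flat JSON object. The keys `current_stage` and `stats` and the helper names `save_state` and `load_state` are hypothetical; the skill only states that the file holds the current stage and statistics.

```python
import json
from pathlib import Path


def save_state(run_dir: Path, stage: int, stats: dict) -> Path:
    """Write state.json for one run, replacing any previous copy."""
    run_dir.mkdir(parents=True, exist_ok=True)
    state_path = run_dir / "state.json"
    tmp_path = run_dir / "state.json.tmp"
    # Write to a temporary file first so a crash mid-write cannot
    # leave a truncated state.json behind.
    tmp_path.write_text(json.dumps({"current_stage": stage, "stats": stats}, indent=2))
    tmp_path.replace(state_path)
    return state_path


def load_state(run_dir: Path) -> dict:
    """Read back the state saved by save_state."""
    return json.loads((run_dir / "state.json").read_text())
```

A caller would pass the run directory, for example `Path.home() / "reference-library" / "pipeline_state" / "run_XXX"` with the real run ID substituted.
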
## Progress Tracking

The orchestrator reports progress at each stage:

```
[Pipeline] Stage 1/6: Discovery - Found 8 sources
[Pipeline] Stage 2/6: Crawling - 45/50 pages complete
[Pipeline] Stage 3/6: Storing - 45 documents saved
[Pipeline] Stage 4/6: Distilling - 45 documents processed
[Pipeline] Stage 5/6: Reviewing - 40 approved, 3 refactored, 2 rejected
[Pipeline] Stage 6/6: Exporting - 40 documents exported
[Pipeline] Complete! See ~/reference-library/exports/
```

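The per-stage lines above follow a single pattern, which could be produced by a small helper. `format_progress` and `TOTAL_STAGES` are hypothetical names used only for this sketch.

```python
TOTAL_STAGES = 6


def format_progress(stage: int, label: str, detail: str) -> str:
    """Build one progress line in the format shown above."""
    if not 1 <= stage <= TOTAL_STAGES:
        raise ValueError(f"stage must be between 1 and {TOTAL_STAGES}")
    return f"[Pipeline] Stage {stage}/{TOTAL_STAGES}: {label} - {detail}"
```
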
## Output Summary

On completion, the orchestrator returns a detailed summary:

```
Pipeline Complete:
- Sources discovered: 5
- Pages crawled: 45
- Documents stored: 45
- Approved: 40
- Refactored: 8
- Deep researched: 2
- Rejected: 3
- Needs manual review: 2

Exports saved to: ~/reference-library/exports/
Format: project_files
```

## Error Handling

If a stage fails:
1. State is checkpointed
2. The error is logged with details
3. The pipeline can be resumed from the last successful stage

## Resume Pipeline

To resume a paused or failed pipeline:
- Provide the `manifest.json` or run state file
- The pipeline continues from the last checkpoint

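The checkpoint-and-resume behaviour described above might look roughly like the following. This is a sketch under stated assumptions: `run_pipeline`, the `handlers` mapping, and the `last_completed` and `error` state keys are hypothetical, and real stages would receive inputs and persist the state between runs.

```python
from typing import Callable

# Stage names in pipeline order, as listed under Pipeline Stages.
STAGES = [
    "reference-discovery",
    "web-crawler-orchestrator",
    "content-repository",
    "content-distiller",
    "quality-reviewer",
    "markdown-exporter",
]


def run_pipeline(handlers: dict[str, Callable[[], None]], state: dict) -> dict:
    """Run stages in order, skipping those already checkpointed in `state`."""
    last_completed = state.get("last_completed", 0)
    for number, name in enumerate(STAGES, start=1):
        if number <= last_completed:
            continue  # finished in an earlier run
        try:
            handlers[name]()
        except Exception as exc:  # broad on purpose: any failure pauses the run
            state["error"] = f"{name}: {exc}"
            return state
        # Checkpoint after every successful stage.
        state["last_completed"] = number
        state.pop("error", None)
    return state
```

Calling `run_pipeline` again with the returned state picks up at the stage that failed, without repeating the stages before it.
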
## Example Usage

### Full Pipeline from Topic

```
User: Curate references on Claude Code best practices with 5 sources

Claude: I'll run the full curation pipeline for "Claude Code best practices":

[Pipeline] Stage 1/6: Discovering sources...
Found 5 authoritative sources (3 official, 2 community)

[Pipeline] Stage 2/6: Crawling 5 sources...
Crawled 45 pages total

[Pipeline] Stage 3/6: Storing documents...
45 documents saved to repository

[Pipeline] Stage 4/6: Distilling content...
45 documents processed

[Pipeline] Stage 5/6: Quality review...
- 38 approved
- 5 refactored (re-distilling...)
- 2 rejected (low quality)

[Pipeline] Stage 6/6: Exporting...
40 documents exported to ~/reference-library/exports/

Pipeline complete! 40 reference documents ready for use.
```

### From Specific URLs

```
User: Curate these URLs with auto-approve:
- https://docs.anthropic.com/en/docs/prompt-caching
- https://docs.anthropic.com/en/docs/build-with-claude/tool-use

Claude: Running pipeline from URLs (skipping discovery)...

[Pipeline] Stage 2/6: Crawling 2 sources...
[...continues with remaining stages...]
```