our-claude-skills/custom-skills/90-reference-curator/claude-project/07-pipeline-orchestrator.md
Andrew Yim d1cd1298a8 feat(reference-curator): Add pipeline orchestrator and refactor skill format
Pipeline Orchestrator:
- Add 07-pipeline-orchestrator skill with code/CLAUDE.md and desktop/SKILL.md
- Add /reference-curator-pipeline slash command for full workflow automation
- Add pipeline_runs and pipeline_iteration_tracker tables to schema.sql
- Add v_pipeline_status and v_pipeline_iterations views
- Add pipeline_config.yaml configuration template
- Update AGENTS.md with Reference Curator Skills section
- Update claude-project files with pipeline documentation

Skill Format Refactoring:
- Extract YAML frontmatter from SKILL.md files to separate skill.yaml
- Add tools/ directories with MCP tool documentation
- Update SKILL-FORMAT-REQUIREMENTS.md with new structure
- Add migrate-skill-structure.py script for format conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:01:02 +07:00


# Pipeline Orchestrator
Coordinates the full 6-skill reference curation workflow with automated QA loop handling.
## Trigger Phrases
- "curate references on [topic]"
- "run full curation pipeline"
- "automate reference curation"
- "curate these URLs: [url1, url2]"
## Input Modes
| Mode | Example | Pipeline Start |
|------|---------|----------------|
| **Topic** | "curate references on Claude system prompts" | Stage 1 (discovery) |
| **URLs** | "curate these URLs: https://docs.anthropic.com/..." | Stage 2 (crawler) |
| **Manifest** | "resume curation from manifest.json" | Stage 2 (crawler) |
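The mode table above can be sketched as a small classifier that maps a request to its starting stage. This is a hypothetical helper, not the skill's actual implementation; the function name and return shape are assumptions.

```python
import re

def detect_input_mode(user_input: str) -> tuple[str, int]:
    """Classify a curation request into (mode, starting_stage).

    URL lists and manifest references skip discovery and start at
    Stage 2 (crawler); plain topics start at Stage 1 (discovery).
    """
    text = user_input.strip()
    if re.search(r"https?://", text):
        return ("urls", 2)
    if "manifest.json" in text:
        return ("manifest", 2)
    return ("topic", 1)
```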
## Pipeline Stages
```
1. reference-discovery (topic mode only)
2. web-crawler-orchestrator
3. content-repository
4. content-distiller ◄──────────────┐
   │                                │
   ▼                                │
5. quality-reviewer                 │
   ├── APPROVE → Stage 6            │
   ├── REFACTOR ────────────────────┘
   ├── DEEP_RESEARCH → Stage 2
   └── REJECT → Archive
6. markdown-exporter
```
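The linear part of the pipeline can be sketched as an ordered list of skills, with the starting stage chosen by input mode. The skill names come from the diagram; the `stage_plan` helper and its API are assumptions for illustration.

```python
# Skill names mirror the pipeline diagram; the dispatch API is assumed.
STAGES = [
    "reference-discovery",       # Stage 1: topic mode only
    "web-crawler-orchestrator",  # Stage 2
    "content-repository",        # Stage 3
    "content-distiller",         # Stage 4
    "quality-reviewer",          # Stage 5
    "markdown-exporter",         # Stage 6
]

def stage_plan(start_stage: int) -> list[str]:
    """Return the ordered skills to run, starting from a 1-indexed stage."""
    return STAGES[start_stage - 1:]
```

URL and manifest inputs would call `stage_plan(2)`, skipping discovery; the QA loop-backs are handled separately per document.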
## Configuration Options
| Option | Default | Description |
|--------|---------|-------------|
| max_sources | 10 | Maximum sources to discover (topic mode) |
| max_pages | 50 | Maximum pages per source to crawl |
| auto_approve | false | Auto-approve scores above threshold |
| threshold | 0.85 | Quality score threshold for approval |
| max_iterations | 3 | Maximum QA loop iterations per document |
| export_format | project_files | Output format (project_files, fine_tuning, jsonl) |
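The defaults above can be merged with user overrides from `pipeline_config.yaml`. A minimal sketch, assuming overrides arrive as a plain dict and unknown keys should fail fast; the `load_config` name is hypothetical.

```python
DEFAULTS = {
    "max_sources": 10,
    "max_pages": 50,
    "auto_approve": False,
    "threshold": 0.85,
    "max_iterations": 3,
    "export_format": "project_files",
}

def load_config(overrides: dict) -> dict:
    """Merge user overrides over the defaults, rejecting unknown options."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```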
## QA Loop Handling
The orchestrator automatically handles QA decisions:
| Decision | Action | Iteration Limit |
|----------|--------|-----------------|
| **APPROVE** | Proceed to export | - |
| **REFACTOR** | Re-distill with feedback | 3 iterations |
| **DEEP_RESEARCH** | Crawl more sources, re-distill | 2 iterations |
| **REJECT** | Archive with reason | - |
After reaching iteration limits, documents are marked `needs_manual_review`.
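The decision table and iteration limits can be sketched as a single dispatch function. The limits (3 REFACTOR, 2 DEEP_RESEARCH) come from the table above; the function and action names are assumptions.

```python
def next_action(decision: str, refactor_iters: int, research_iters: int) -> str:
    """Map a quality-reviewer decision to the orchestrator's next action.

    REFACTOR loops at most 3 times and DEEP_RESEARCH at most 2; past
    the limit the document is marked needs_manual_review.
    """
    if decision == "APPROVE":
        return "export"    # proceed to Stage 6
    if decision == "REJECT":
        return "archive"
    if decision == "REFACTOR":
        return "re_distill" if refactor_iters < 3 else "needs_manual_review"
    if decision == "DEEP_RESEARCH":
        return "re_crawl" if research_iters < 2 else "needs_manual_review"
    raise ValueError(f"unknown QA decision: {decision}")
```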
## State Management
### With Database
Pipeline state is tracked in `pipeline_runs` table:
- Run ID, input type, current stage
- Statistics (crawled, distilled, approved, etc.)
- Error handling and resume capability
### File-Based Fallback
State saved to `~/reference-library/pipeline_state/run_XXX/`:
- `state.json` - Current stage and statistics
- `manifest.json` - Discovered sources
- `review_log.json` - QA decisions
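The file-based fallback can be sketched as a small checkpoint writer. The directory layout follows the list above; the exact JSON keys and the `checkpoint` helper are assumptions.

```python
import json
from pathlib import Path

def checkpoint(run_id: str, stage: int, stats: dict,
               root: Path = Path.home() / "reference-library" / "pipeline_state") -> Path:
    """Write state.json for a run so a failed pipeline can resume."""
    run_dir = root / f"run_{run_id}"
    run_dir.mkdir(parents=True, exist_ok=True)
    state = {"run_id": run_id, "current_stage": stage, "stats": stats}
    path = run_dir / "state.json"
    path.write_text(json.dumps(state, indent=2))
    return path
```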
## Progress Tracking
The orchestrator reports progress at each stage:
```
[Pipeline] Stage 1/6: Discovery - Found 8 sources
[Pipeline] Stage 2/6: Crawling - 45/50 pages complete
[Pipeline] Stage 3/6: Storing - 45 documents saved
[Pipeline] Stage 4/6: Distilling - 45 documents processed
[Pipeline] Stage 5/6: Reviewing - 40 approved, 3 refactored, 2 rejected
[Pipeline] Stage 6/6: Exporting - 40 documents exported
[Pipeline] Complete! See ~/reference-library/exports/
```
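The progress lines above share one format, which could be produced by a tiny helper like this (a hypothetical sketch; the orchestrator's actual reporting mechanism is not specified here).

```python
def progress_line(stage: int, name: str, detail: str) -> str:
    """Format one pipeline progress line in the style shown above."""
    return f"[Pipeline] Stage {stage}/6: {name} - {detail}"
```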
## Output Summary
On completion, the orchestrator returns a detailed summary:
```
Pipeline Complete:
- Sources discovered: 5
- Pages crawled: 45
- Documents stored: 45
- Approved: 40
- Refactored: 8
- Deep researched: 2
- Rejected: 3
- Needs manual review: 2
Exports saved to: ~/reference-library/exports/
Format: project_files
```
## Error Handling
If a stage fails:
1. State is checkpointed
2. Error is logged with details
3. Pipeline can be resumed from last successful stage
## Resume Pipeline
To resume a paused or failed pipeline:
- Provide the manifest.json or run state file
- Pipeline continues from last checkpoint
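Resuming can be sketched as reading the saved state file and computing the next stage to run. A minimal sketch, assuming `state.json` holds a `current_stage` key recording the last successful stage; the `resume_from` name is hypothetical.

```python
import json
from pathlib import Path

def resume_from(state_path: str) -> int:
    """Read a saved state.json and return the next stage to run."""
    state = json.loads(Path(state_path).read_text())
    return state["current_stage"] + 1
```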
## Example Usage
### Full Pipeline from Topic
```
User: Curate references on Claude Code best practices with 5 sources
Claude: I'll run the full curation pipeline for "Claude Code best practices":
[Pipeline] Stage 1/6: Discovering sources...
Found 5 authoritative sources (3 official, 2 community)
[Pipeline] Stage 2/6: Crawling 5 sources...
Crawled 45 pages total
[Pipeline] Stage 3/6: Storing documents...
45 documents saved to repository
[Pipeline] Stage 4/6: Distilling content...
45 documents processed
[Pipeline] Stage 5/6: Quality review...
- 38 approved
- 5 refactored (re-distilling...)
- 2 rejected (low quality)
[Pipeline] Stage 6/6: Exporting...
40 documents exported to ~/reference-library/exports/
Pipeline complete! 40 reference documents ready for use.
```
### From Specific URLs
```
User: Curate these URLs with auto-approve:
- https://docs.anthropic.com/en/docs/prompt-caching
- https://docs.anthropic.com/en/docs/build-with-claude/tool-use
Claude: Running pipeline from URLs (skipping discovery)...
[Pipeline] Stage 2/6: Crawling 2 sources...
[...continues with remaining stages...]
```