---
description: Orchestrates the full reference curation pipeline as a background task. Runs discovery → crawl → store → distill → review → export with QA loop handling.
argument-hint: [--max-sources 10] [--max-pages 50] [--auto-approve] [--threshold 0.85] [--max-iterations 3] [--export-format project_files]
allowed-tools: WebSearch, WebFetch, Read, Write, Bash, Grep, Glob, Task
---

# Reference Curator Pipeline

Full-stack orchestration of the 6-skill reference curation workflow.

## Input Modes

| Mode | Input Example | Pipeline Start |
|------|---------------|----------------|
| **Topic** | `"Claude system prompts"` | reference-discovery |
| **URLs** | `https://docs.anthropic.com/...` | web-crawler (skip discovery) |
| **Manifest** | `./manifest.json` | web-crawler (resume from a prior discovery run) |

## Arguments

- `<topic | urls | manifest>`: Required. Topic string, URL(s), or manifest file path
- `--max-sources`: Maximum sources to discover (topic mode only, default: 10)
- `--max-pages`: Maximum pages crawled per source (default: 50)
- `--auto-approve`: Auto-approve documents whose review score meets the threshold
- `--threshold`: Approval threshold (default: 0.85)
- `--max-iterations`: Maximum QA loop iterations per document (default: 3)
- `--export-format`: Output format: `project_files`, `fine_tuning`, or `jsonl` (default: `project_files`)

## Pipeline Stages

```
1. reference-discovery (topic mode only)
2. web-crawler-orchestrator
3. content-repository
4. content-distiller ◄────────────┐
5. quality-reviewer               │
   ├── APPROVE → export           │
   ├── REFACTOR ──────────────────┤
   ├── DEEP_RESEARCH → crawler ───┘
   └── REJECT → archive
6. markdown-exporter
```

## QA Loop Handling

| Decision | Action | Max Iterations |
|----------|--------|----------------|
| REFACTOR | Re-distill with reviewer feedback | 3 |
| DEEP_RESEARCH | Crawl additional sources, then re-distill | 2 |
| Combined | Total loops per document | 5 |

Once a document exhausts its iteration budget, it is marked `needs_manual_review`.
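The iteration caps above can be sketched as a simple loop. This is a minimal illustration, not the real skill interface: `distill`, `review`, and `crawl_more` are hypothetical stand-ins for the content-distiller, quality-reviewer, and crawler stages.

```python
# Sketch of the QA loop caps. distill(), review(), and crawl_more()
# are hypothetical placeholders for the pipeline's skill invocations.

REFACTOR_MAX = 3       # re-distill passes per document
DEEP_RESEARCH_MAX = 2  # crawl-and-re-distill passes per document
TOTAL_MAX = 5          # combined loop cap per document


def qa_loop(doc, distill, review, crawl_more):
    refactors = deep = total = 0
    draft = distill(doc, feedback=None)
    while True:
        decision, feedback = review(draft)
        if decision == "APPROVE":
            return "export", draft
        if decision == "REJECT":
            return "archive", draft
        total += 1
        if total > TOTAL_MAX:
            return "needs_manual_review", draft
        if decision == "REFACTOR":
            refactors += 1
            if refactors > REFACTOR_MAX:
                return "needs_manual_review", draft
            draft = distill(doc, feedback=feedback)
        elif decision == "DEEP_RESEARCH":
            deep += 1
            if deep > DEEP_RESEARCH_MAX:
                return "needs_manual_review", draft
            doc = crawl_more(doc, feedback)
            draft = distill(doc, feedback=feedback)
```

A document that keeps receiving REFACTOR decisions exits with `needs_manual_review` after its third re-distill pass, matching the table.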
## Example Usage

```
# Full pipeline from topic
/reference-curator-pipeline "Claude Code best practices" --max-sources 5

# Pipeline from specific URLs (skip discovery)
/reference-curator-pipeline https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

# Resume from existing manifest
/reference-curator-pipeline ./manifest.json --auto-approve

# Fine-tuning dataset output
/reference-curator-pipeline "MCP servers" --export-format fine_tuning --auto-approve
```

## State Management

Pipeline state is saved after each stage so that interrupted runs can resume.

**With MySQL:**

```sql
SELECT * FROM pipeline_runs WHERE run_id = 123;
```

**File-based fallback:**

```
~/reference-library/pipeline_state/run_XXX/state.json
```

## Output

The pipeline returns a summary on completion:

```json
{
  "run_id": 123,
  "status": "completed",
  "stats": {
    "sources_discovered": 5,
    "pages_crawled": 45,
    "documents_stored": 45,
    "approved": 40,
    "refactored": 8,
    "deep_researched": 2,
    "rejected": 3,
    "needs_manual_review": 2
  },
  "exports": {
    "format": "project_files",
    "path": "~/reference-library/exports/"
  }
}
```

## See Also

- `/reference-discovery` - Run the discovery stage only
- `/web-crawler` - Run the crawler stage only
- `/content-repository` - Manage stored content
- `/quality-reviewer` - Run QA review only
- `/markdown-exporter` - Run export only