feat(reference-curator): Add pipeline orchestrator and refactor skill format

Pipeline Orchestrator:
- Add 07-pipeline-orchestrator skill with code/CLAUDE.md and desktop/SKILL.md
- Add /reference-curator-pipeline slash command for full workflow automation
- Add pipeline_runs and pipeline_iteration_tracker tables to schema.sql
- Add v_pipeline_status and v_pipeline_iterations views
- Add pipeline_config.yaml configuration template
- Update AGENTS.md with Reference Curator Skills section
- Update claude-project files with pipeline documentation

Skill Format Refactoring:
- Extract YAML frontmatter from SKILL.md files to separate skill.yaml
- Add tools/ directories with MCP tool documentation
- Update SKILL-FORMAT-REQUIREMENTS.md with new structure
- Add migrate-skill-structure.py script for format conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:01:02 +07:00
parent 243b9d851c
commit d1cd1298a8
91 changed files with 2475 additions and 281 deletions
--- a/custom-skills/90-reference-curator/commands/reference-curator-pipeline.md
+++ b/custom-skills/90-reference-curator/commands/reference-curator-pipeline.md
@@ -0,0 +1,115 @@
+---
+description: Orchestrates full reference curation pipeline as background task. Runs discovery → crawl → store → distill → review → export with QA loop handling.
+argument-hint: <topic|urls|manifest> [--max-sources 10] [--max-pages 50] [--auto-approve] [--threshold 0.85] [--max-iterations 3] [--export-format project_files]
+allowed-tools: WebSearch, WebFetch, Read, Write, Bash, Grep, Glob, Task
+---
+
+# Reference Curator Pipeline
+
+Full-stack orchestration of the 6-skill reference curation workflow.
+
+## Input Modes
+
+| Mode | Input Example | Pipeline Start |
+|------|---------------|----------------|
+| **Topic** | `"Claude system prompts"` | reference-discovery |
+| **URLs** | `https://docs.anthropic.com/...` | web-crawler (skip discovery) |
+| **Manifest** | `./manifest.json` | web-crawler (resume after discovery) |
+
+## Arguments
+
+- `<input>`: Required. Topic string, URL(s), or manifest file path
+- `--max-sources`: Maximum sources to discover (topic mode, default: 10)
+- `--max-pages`: Maximum pages per source to crawl (default: 50)
+- `--auto-approve`: Auto-approve documents that score above the threshold
+- `--threshold`: Approval threshold (default: 0.85)
+- `--max-iterations`: Max QA loop iterations per document (default: 3)
+- `--export-format`: Output format: `project_files`, `fine_tuning`, `jsonl` (default: project_files)
+
+## Pipeline Stages
+
+```
+1. reference-discovery  (topic mode only)
+2. web-crawler-orchestrator
+3. content-repository
+4. content-distiller    ◄────────┐
+5. quality-reviewer              │
+   ├── APPROVE → export          │
+   ├── REFACTOR ─────────────────┤
+   ├── DEEP_RESEARCH → crawler ──┘
+   └── REJECT → archive
+6. markdown-exporter
+```
+
+## QA Loop Handling
+
+| Decision | Action | Max Iterations |
+|----------|--------|----------------|
+| REFACTOR | Re-distill with feedback | 3 |
+| DEEP_RESEARCH | Crawl more sources, re-distill | 2 |
+| Combined | Total loops per document | 5 |
+
+After max iterations, the document is marked as `needs_manual_review`.
+
+## Example Usage
+
+```
+# Full pipeline from topic
+/reference-curator-pipeline "Claude Code best practices" --max-sources 5
+
+# Pipeline from specific URLs (skip discovery)
+/reference-curator-pipeline https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
+
+# Resume from existing manifest
+/reference-curator-pipeline ./manifest.json --auto-approve
+
+# Fine-tuning dataset output
+/reference-curator-pipeline "MCP servers" --export-format fine_tuning --auto-approve
+```
+
+## State Management
+
+Pipeline state is saved after each stage to allow resume:
+
+**With MySQL:**
+```sql
+SELECT * FROM pipeline_runs WHERE run_id = 123;
+```
+
+**File-based fallback:**
+```
+~/reference-library/pipeline_state/run_XXX/state.json
+```
+
+## Output
+
+The pipeline returns a summary on completion:
+
+```json
+{
+  "run_id": 123,
+  "status": "completed",
+  "stats": {
+    "sources_discovered": 5,
+    "pages_crawled": 45,
+    "documents_stored": 45,
+    "approved": 40,
+    "refactored": 8,
+    "deep_researched": 2,
+    "rejected": 3,
+    "needs_manual_review": 2
+  },
+  "exports": {
+    "format": "project_files",
+    "path": "~/reference-library/exports/"
+  }
+}
+```
+
+## See Also
+
+- `/reference-discovery` - Run discovery stage only
+- `/web-crawler` - Run crawler stage only
+- `/content-repository` - Manage stored content
+- `/quality-reviewer` - Run QA review only
+- `/markdown-exporter` - Run export only
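
The QA loop limits in the command doc above could be enforced roughly as follows. This is a minimal sketch, not the orchestrator's actual code: `IterationTracker` and `next_action` are hypothetical names, and the returned step labels (`distill`, `crawl`, `export`, `archive`) are illustrative. Only the decision names, the `needs_manual_review` status, and the 3/2/5 limits come from the document.

```python
from dataclasses import dataclass

# Limits taken from the "QA Loop Handling" table.
MAX_REFACTOR = 3
MAX_DEEP_RESEARCH = 2
MAX_TOTAL = 5


@dataclass
class IterationTracker:
    """Hypothetical per-document counter (not the real pipeline_iteration_tracker schema)."""
    refactor: int = 0
    deep_research: int = 0

    @property
    def total(self) -> int:
        return self.refactor + self.deep_research


def next_action(decision: str, tracker: IterationTracker) -> str:
    """Map a reviewer decision to the next step, honoring the loop limits."""
    if decision == "APPROVE":
        return "export"
    if decision == "REJECT":
        return "archive"
    if decision == "REFACTOR":
        # Stop looping once the per-decision or combined budget is used up.
        if tracker.refactor >= MAX_REFACTOR or tracker.total >= MAX_TOTAL:
            return "needs_manual_review"
        tracker.refactor += 1
        return "distill"
    if decision == "DEEP_RESEARCH":
        if tracker.deep_research >= MAX_DEEP_RESEARCH or tracker.total >= MAX_TOTAL:
            return "needs_manual_review"
        tracker.deep_research += 1
        return "crawl"
    raise ValueError(f"unknown decision: {decision!r}")
```

With a fresh tracker, the first three `REFACTOR` decisions send the document back to distillation, and a fourth escalates it to `needs_manual_review`. The combined check is redundant with the current limits (3 + 2 = 5) but keeps the budget enforced if the per-decision limits are later raised.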