feat(reference-curator): Add pipeline orchestrator and refactor skill format

Pipeline Orchestrator:
- Add 07-pipeline-orchestrator skill with code/CLAUDE.md and desktop/SKILL.md
- Add /reference-curator-pipeline slash command for full workflow automation
- Add pipeline_runs and pipeline_iteration_tracker tables to schema.sql
- Add v_pipeline_status and v_pipeline_iterations views
- Add pipeline_config.yaml configuration template
- Update AGENTS.md with Reference Curator Skills section
- Update claude-project files with pipeline documentation

Skill Format Refactoring:
- Extract YAML frontmatter from SKILL.md files to separate skill.yaml
- Add tools/ directories with MCP tool documentation
- Update SKILL-FORMAT-REQUIREMENTS.md with new structure
- Add migrate-skill-structure.py script for format conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:01:02 +07:00
parent 243b9d851c
commit d1cd1298a8
91 changed files with 2475 additions and 281 deletions


@@ -130,37 +130,44 @@ This displays available files in `claude-project/` and optionally copies them to
## Architecture
```
[Topic Input]
┌─────────────────────┐
│ reference-discovery │ → Search & validate sources
└─────────────────────┘
┌──────────────────────────────┐
│ reference-curator-pipeline │ (Orchestrator)
│ /reference-curator-pipeline │
└──────────────────────────────┘
┌───────────────────────┼───────────────────────┐
▼ ▼ ▼
[Topic Input] [URL Input] [Manifest Input]
│ │ │
▼ │ │
┌─────────────────────┐ │ │
│ reference-discovery │ ◄─────────┴───────────────────────┘
└─────────────────────┘ (skip if URLs/manifest)
┌──────────────────────────┐
│ web-crawler-orchestrator │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└──────────────────────────┘
┌────────────────────┐
│ content-repository │ → Store in MySQL
└────────────────────┘
┌───────────────────┐
│ content-distiller │ → Summarize & extract
└───────────────────┘
┌──────────────────┐
│ quality-reviewer │ → QA loop
└──────────────────┘
├── REFACTOR → content-distiller
├── DEEP_RESEARCH → web-crawler-orchestrator
▼ APPROVE
│ content-distiller │ → Summarize & extract ◄─────┐
└───────────────────┘
▼ │
┌──────────────────┐
│ quality-reviewer │ → QA loop
└──────────────────┘
├── REFACTOR (max 3) ────────────────────┤
├── DEEP_RESEARCH (max 2) → crawler ─────┘
▼ APPROVE
┌───────────────────┐
│ markdown-exporter │ → Project files / Fine-tuning
└───────────────────┘
@@ -170,7 +177,35 @@ This displays available files in `claude-project/` and optionally copies them to
## User Guide
### Basic Workflow
### Full Pipeline (Recommended)
Run the complete curation workflow with a single command:
```
# From topic - runs all 6 stages automatically
/reference-curator-pipeline "Claude Code best practices" --max-sources 5
# From URLs - skip discovery, start at crawler
/reference-curator-pipeline https://docs.anthropic.com/en/docs/prompt-caching
# Resume from manifest file
/reference-curator-pipeline ./manifest.json --auto-approve
# Fine-tuning dataset output
/reference-curator-pipeline "MCP servers" --export-format fine_tuning
```
**Pipeline Options:**
- `--max-sources 10` - Max sources to discover (topic mode)
- `--max-pages 50` - Max pages per source to crawl
- `--auto-approve` - Auto-approve scores above threshold
- `--threshold 0.85` - Approval threshold
- `--max-iterations 3` - Max QA loop iterations per document
- `--export-format project_files` - Output format (project_files, fine_tuning, jsonl)
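
The options above can be modelled as an ordinary argument parser. The following is only a sketch: the slash command is not necessarily implemented this way. The helper names `build_parser` and `detect_input_mode` are hypothetical, and treating the values shown in the list as defaults is an assumption.

```python
import argparse

# Export formats named in the option list above.
EXPORT_FORMATS = ["project_files", "fine_tuning", "jsonl"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented pipeline options."""
    parser = argparse.ArgumentParser(prog="reference-curator-pipeline")
    parser.add_argument("input", help="Topic string, URL, or manifest path")
    # Assumption: the example values in the option list are the defaults.
    parser.add_argument("--max-sources", type=int, default=10)
    parser.add_argument("--max-pages", type=int, default=50)
    parser.add_argument("--auto-approve", action="store_true")
    parser.add_argument("--threshold", type=float, default=0.85)
    parser.add_argument("--max-iterations", type=int, default=3)
    parser.add_argument("--export-format", choices=EXPORT_FORMATS,
                        default="project_files")
    return parser


def detect_input_mode(value: str) -> str:
    """Guess the entry point: URLs and manifests skip the discovery stage."""
    if value.startswith(("http://", "https://")):
        return "urls"
    if value.endswith(".json"):
        return "manifest"
    return "topic"


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["Claude Code best practices", "--max-sources", "5"]
    )
    print(detect_input_mode(args.input), args.max_sources, args.threshold)
```

Detecting the input mode from the argument itself matches the three entry points in the architecture diagram (topic, URL, manifest), though the real command may distinguish them differently.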
---
### Manual Workflow (Step-by-Step)
**Step 1: Discover References**
```
@@ -295,6 +330,7 @@ mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library -e "
| 04 | content-distiller | `/content-distiller` | Summarize & extract |
| 05 | quality-reviewer | `/quality-reviewer` | QA scoring & routing |
| 06 | markdown-exporter | `/markdown-exporter` | Export to markdown/JSONL |
| 07 | pipeline-orchestrator | `/reference-curator-pipeline` | Full pipeline orchestration |
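
The routing that the orchestrator applies after `quality-reviewer` (REFACTOR at most 3 times, DEEP_RESEARCH at most 2 times, otherwise APPROVE) might look roughly like the sketch below. `IterationTracker`, `next_stage`, and the `needs-manual-review` fallback are hypothetical names; the source does not say what happens once a limit is exhausted.

```python
from dataclasses import dataclass

MAX_REFACTOR = 3        # "REFACTOR (max 3)" in the architecture diagram
MAX_DEEP_RESEARCH = 2   # "DEEP_RESEARCH (max 2)" in the architecture diagram


@dataclass
class IterationTracker:
    """Per-document loop counters (hypothetical in-memory stand-in)."""
    refactor_count: int = 0
    deep_research_count: int = 0


def next_stage(decision: str, tracker: IterationTracker) -> str:
    """Map a quality-reviewer decision to the next pipeline stage."""
    if decision == "APPROVE":
        return "markdown-exporter"
    if decision == "REFACTOR":
        if tracker.refactor_count >= MAX_REFACTOR:
            # Assumption: exhausted loops are escalated rather than retried.
            return "needs-manual-review"
        tracker.refactor_count += 1
        return "content-distiller"
    if decision == "DEEP_RESEARCH":
        if tracker.deep_research_count >= MAX_DEEP_RESEARCH:
            return "needs-manual-review"
        tracker.deep_research_count += 1
        return "web-crawler-orchestrator"
    raise ValueError(f"unknown decision: {decision!r}")


if __name__ == "__main__":
    tracker = IterationTracker()
    for decision in ["REFACTOR", "DEEP_RESEARCH", "APPROVE"]:
        print(decision, "->", next_stage(decision, tracker))
```

Keeping the counters per document is what bounds the QA loop; in the actual pipeline this state presumably lives in the `pipeline_iteration_tracker` table rather than in memory.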
---
@@ -435,7 +471,8 @@ mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library < shar
│ ├── content-repository.md
│ ├── content-distiller.md
│ ├── quality-reviewer.md
│ └── markdown-exporter.md
│ ├── markdown-exporter.md
│ └── reference-curator-pipeline.md
├── 01-reference-discovery/
│ ├── code/CLAUDE.md # Claude Code directive
@@ -455,6 +492,9 @@ mysql -h $MYSQL_HOST -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library < shar
├── 06-markdown-exporter/
│ ├── code/CLAUDE.md
│ └── desktop/SKILL.md
├── 07-pipeline-orchestrator/
│ ├── code/CLAUDE.md
│ └── desktop/SKILL.md
└── shared/
├── schema.sql # MySQL schema