feat(reference-curator): Add pipeline orchestrator and refactor skill format

Pipeline Orchestrator:
- Add 07-pipeline-orchestrator skill with code/CLAUDE.md and desktop/SKILL.md
- Add /reference-curator-pipeline slash command for full workflow automation
- Add pipeline_runs and pipeline_iteration_tracker tables to schema.sql
- Add v_pipeline_status and v_pipeline_iterations views
- Add pipeline_config.yaml configuration template
- Update AGENTS.md with Reference Curator Skills section
- Update claude-project files with pipeline documentation

Skill Format Refactoring:
- Extract YAML frontmatter from SKILL.md files to separate skill.yaml
- Add tools/ directories with MCP tool documentation
- Update SKILL-FORMAT-REQUIREMENTS.md with new structure
- Add migrate-skill-structure.py script for format conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 01:01:02 +07:00
parent 243b9d851c
commit d1cd1298a8
91 changed files with 2475 additions and 281 deletions


@@ -1,6 +1,87 @@
# Reference Curator - Complete Skill Set
This document contains all 7 skills for curating, processing, and exporting reference documentation.
---
# Pipeline Orchestrator (Recommended Entry Point)
Coordinates the other six skills as a single workflow, with automated handling of the QA loop.
## Quick Start
```
# Full pipeline from topic
curate references on "Claude Code best practices"

# From URLs (skip discovery)
curate these URLs: https://docs.anthropic.com/en/docs/prompt-caching

# With auto-approve
curate references on "MCP servers" with auto-approve and fine-tuning output
```
## Configuration Options
| Option | Default | Description |
|--------|---------|-------------|
| max_sources | 10 | Maximum sources to discover |
| max_pages | 50 | Maximum pages per source |
| auto_approve | false | Auto-approve above threshold |
| threshold | 0.85 | Approval threshold |
| max_iterations | 3 | Max QA loop iterations |
| export_format | project_files | Output format |
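The defaults above might be expressed in `pipeline_config.yaml` roughly as follows. This is a sketch based on the table; the actual key names and layout in the shipped template may differ.

```
# pipeline_config.yaml — sketch of the defaults listed above
max_sources: 10        # maximum sources to discover
max_pages: 50          # maximum pages per source
auto_approve: false    # auto-approve documents scoring above threshold
threshold: 0.85        # approval threshold (0.0–1.0)
max_iterations: 3      # max QA loop iterations per document
export_format: project_files
```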
## Pipeline Flow
```
[Input: Topic | URLs | Manifest]
        │
        ▼
1. reference-discovery   (skipped if URLs/manifest provided)
        │
        ▼
2. web-crawler ◄────────────────────────────┐
        │                                   │
        ▼                                   │
3. content-repository                       │
        │                                   │
        ▼                                   │
4. content-distiller ◄───────────────┐      │
        │                            │      │
        ▼                            │      │
5. quality-reviewer                  │      │
        │                            │      │
        ├── APPROVE → export         │      │
        ├── REFACTOR (max 3) ────────┘      │
        ├── DEEP_RESEARCH (max 2) ──────────┘
        └── REJECT → archive
        │ (approved documents)
        ▼
6. markdown-exporter
```
## QA Loop Handling
| Decision | Action | Max Iterations |
|----------|--------|----------------|
| APPROVE | Proceed to export | - |
| REFACTOR | Re-distill with feedback | 3 |
| DEEP_RESEARCH | Crawl more sources | 2 |
| REJECT | Archive with reason | - |
Documents exceeding iteration limits are marked `needs_manual_review`.
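The decision table and iteration limits above can be sketched as a small dispatch function. This is an illustrative sketch only: the function and action names are hypothetical, and the real orchestrator persists counts in `pipeline_iteration_tracker` rather than an in-memory dict.

```python
# Hypothetical sketch of the QA loop limits described above.
MAX_REFACTOR = 3
MAX_DEEP_RESEARCH = 2


def handle_decision(decision, counts):
    """Return the next action for a document given its iteration counts.

    counts: dict with 'refactor' and 'deep_research' tallies for the document.
    """
    if decision == "APPROVE":
        return "export"
    if decision == "REJECT":
        return "archive"
    if decision == "REFACTOR":
        if counts["refactor"] >= MAX_REFACTOR:
            return "needs_manual_review"  # iteration limit exceeded
        counts["refactor"] += 1
        return "re_distill"
    if decision == "DEEP_RESEARCH":
        if counts["deep_research"] >= MAX_DEEP_RESEARCH:
            return "needs_manual_review"
        counts["deep_research"] += 1
        return "crawl_more"
    raise ValueError(f"unknown decision: {decision}")
```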
## Output Summary
```
Pipeline Complete:
- Sources discovered: 5
- Pages crawled: 45
- Approved: 40
- Needs manual review: 2
- Exports: ~/reference-library/exports/
```
---
@@ -464,6 +545,7 @@ def add_cross_references(doc, all_docs):
| From | Output | To |
|------|--------|-----|
| **pipeline-orchestrator** | Coordinates all stages | All skills below |
| **reference-discovery** | URL manifest | web-crawler |
| **web-crawler** | Raw content + manifest | content-repository |
| **content-repository** | Document records | content-distiller |
@@ -471,3 +553,25 @@ def add_cross_references(doc, all_docs):
| **quality-reviewer** (approve) | Approved IDs | markdown-exporter |
| **quality-reviewer** (refactor) | Instructions | content-distiller |
| **quality-reviewer** (deep_research) | Queries | web-crawler |
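The happy-path handoffs in the table above can be modeled as a simple stage graph. The stage and artifact names below mirror the table but are illustrative; they are not the orchestrator's real internal identifiers.

```python
# Hypothetical stage graph mirroring the dataflow table above.
# Maps each stage to (output artifact, next stage) on the happy path;
# quality-reviewer's refactor/deep_research/reject branches are omitted.
STAGE_FLOW = {
    "reference-discovery": ("url_manifest", "web-crawler"),
    "web-crawler": ("raw_content", "content-repository"),
    "content-repository": ("document_records", "content-distiller"),
    "content-distiller": ("distilled_docs", "quality-reviewer"),
    "quality-reviewer": ("approved_ids", "markdown-exporter"),
}


def next_stage(stage):
    """Return (output artifact, next stage), or (None, None) at the end."""
    return STAGE_FLOW.get(stage, (None, None))
```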
## State Management
The pipeline orchestrator tracks state for resume capability:
**With Database:**
- `pipeline_runs` table tracks run status, current stage, statistics
- `pipeline_iteration_tracker` tracks QA loop iterations per document
**File-Based Fallback:**
```
~/reference-library/pipeline_state/run_XXX/
├── state.json # Current stage and stats
├── manifest.json # Discovered sources
└── review_log.json # QA decisions
```
## Resume Pipeline
To resume a paused or failed pipeline:
1. Provide the `run_id` or the path to its state file
2. The pipeline continues from the last successful checkpoint
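For the file-based fallback, resuming amounts to reading `state.json` from the run directory. A minimal sketch, assuming `state.json` carries `stage` and `stats` fields (these field names are an assumption, not the documented schema):

```python
import json
from pathlib import Path


def load_checkpoint(run_dir):
    """Load state.json for a pipeline run and report where to resume.

    Returns (stage, stats); stage is None if not recorded.
    """
    state_file = Path(run_dir) / "state.json"
    state = json.loads(state_file.read_text())
    return state.get("stage"), state.get("stats", {})
```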