feat(reference-curator): Add pipeline orchestrator and refactor skill format

Pipeline Orchestrator:
- Add 07-pipeline-orchestrator skill with code/CLAUDE.md and desktop/SKILL.md
- Add /reference-curator-pipeline slash command for full workflow automation
- Add pipeline_runs and pipeline_iteration_tracker tables to schema.sql
- Add v_pipeline_status and v_pipeline_iterations views
- Add pipeline_config.yaml configuration template
- Update AGENTS.md with Reference Curator Skills section
- Update claude-project files with pipeline documentation

Skill Format Refactoring:
- Extract YAML frontmatter from SKILL.md files to separate skill.yaml
- Add tools/ directories with MCP tool documentation
- Update SKILL-FORMAT-REQUIREMENTS.md with new structure
- Add migrate-skill-structure.py script for format conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,175 @@
# Pipeline Orchestrator

Coordinates the full 6-skill reference curation workflow with automated QA loop handling.

## Trigger Phrases

- "curate references on [topic]"
- "run full curation pipeline"
- "automate reference curation"
- "curate these URLs: [url1, url2]"

## Input Modes

| Mode | Example | Pipeline Start |
|------|---------|----------------|
| **Topic** | "curate references on Claude system prompts" | Stage 1 (discovery) |
| **URLs** | "curate these URLs: https://docs.anthropic.com/..." | Stage 2 (crawler) |
| **Manifest** | "resume curation from manifest.json" | Stage 2 (crawler) |
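
The mapping from input mode to starting stage can be sketched roughly as below. This is only an illustration under assumptions: the regular expressions, the constant names, and `detect_input_mode` are hypothetical and not part of the skill, which interprets requests in natural language.

```python
import re

# Hypothetical patterns for illustration; the real skill is prompt-driven.
URL_PATTERN = re.compile(r"https?://\S+")
MANIFEST_PATTERN = re.compile(r"\S+\.json\b")

DISCOVERY_STAGE = 1  # reference-discovery
CRAWLER_STAGE = 2    # web-crawler-orchestrator


def detect_input_mode(request: str) -> tuple[str, int]:
    """Return (mode, first pipeline stage) for a curation request."""
    # URLs are checked first so a URL ending in .json is not taken for a manifest.
    if URL_PATTERN.search(request):
        return ("urls", CRAWLER_STAGE)
    if MANIFEST_PATTERN.search(request):
        return ("manifest", CRAWLER_STAGE)
    # Anything else is treated as a topic and starts at discovery.
    return ("topic", DISCOVERY_STAGE)
```

Only topic mode runs discovery; the other two modes already carry their sources and start at the crawler.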

## Pipeline Stages

```
1. reference-discovery (topic mode only)
       │
       ▼
2. web-crawler-orchestrator
       │
       ▼
3. content-repository
       │
       ▼
4. content-distiller ◄─────────────┐
       │                           │
       ▼                           │
5. quality-reviewer                │
       │                           │
       ├── APPROVE → Stage 6       │
       ├── REFACTOR ───────────────┤
       ├── DEEP_RESEARCH → Stage 2 ┘
       └── REJECT → Archive
       │
       ▼
6. markdown-exporter
```

## Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| max_sources | 10 | Maximum sources to discover (topic mode) |
| max_pages | 50 | Maximum pages per source to crawl |
| auto_approve | false | Auto-approve scores above threshold |
| threshold | 0.85 | Quality score threshold for approval |
| max_iterations | 3 | Maximum QA loop iterations per document |
| export_format | project_files | Output format (project_files, fine_tuning, jsonl) |
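
As a rough sketch, the options above could be modelled as a small dataclass. The class name `PipelineConfig`, the validation rules, and the `should_auto_approve` helper are assumptions made for illustration; only the option names, defaults, and export formats come from the table. The 0 to 1 score range and the strict "above" comparison are also assumptions.

```python
from dataclasses import dataclass

EXPORT_FORMATS = ("project_files", "fine_tuning", "jsonl")


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container for the documented options and defaults."""
    max_sources: int = 10
    max_pages: int = 50
    auto_approve: bool = False
    threshold: float = 0.85
    max_iterations: int = 3
    export_format: str = "project_files"

    def __post_init__(self) -> None:
        if self.export_format not in EXPORT_FORMATS:
            raise ValueError(f"unknown export_format: {self.export_format!r}")
        # Assumption: quality scores live in the range 0..1.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")

    def should_auto_approve(self, score: float) -> bool:
        # Assumption: "above threshold" means strictly greater than.
        return self.auto_approve and score > self.threshold
```

With the defaults, `auto_approve` is off, so no score is approved automatically until the option is enabled.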

## QA Loop Handling

The orchestrator automatically handles QA decisions:

| Decision | Action | Iteration Limit |
|----------|--------|-----------------|
| **APPROVE** | Proceed to export | - |
| **REFACTOR** | Re-distill with feedback | 3 iterations |
| **DEEP_RESEARCH** | Crawl more sources, re-distill | 2 iterations |
| **REJECT** | Archive with reason | - |

After reaching iteration limits, documents are marked `needs_manual_review`.
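
The decision table can be read as a small routing function. The sketch below is one possible interpretation, not the orchestrator's actual logic: `next_action`, the action labels, and the choice to count iterations separately per decision type are all assumptions.

```python
# Limits taken from the table above; APPROVE and REJECT have no limit.
ITERATION_LIMITS = {"REFACTOR": 3, "DEEP_RESEARCH": 2}

# Hypothetical labels for what the orchestrator does next.
ACTIONS = {"REFACTOR": "redistill", "DEEP_RESEARCH": "crawl_more"}


def next_action(decision: str, iterations: dict[str, int]) -> str:
    """Route one QA decision for a document and update its loop counters."""
    if decision == "APPROVE":
        return "export"
    if decision == "REJECT":
        return "archive"
    if decision not in ITERATION_LIMITS:
        raise ValueError(f"unknown QA decision: {decision!r}")
    used = iterations.get(decision, 0)
    if used >= ITERATION_LIMITS[decision]:
        # The loop budget is spent, so a human has to look at the document.
        return "needs_manual_review"
    iterations[decision] = used + 1
    return ACTIONS[decision]
```

Under this reading, a document may be refactored three times; a fourth REFACTOR decision marks it `needs_manual_review` instead of looping again.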

## State Management

### With Database

Pipeline state is tracked in the `pipeline_runs` table:
- Run ID, input type, current stage
- Statistics (crawled, distilled, approved, etc.)
- Error handling and resume capability

### File-Based Fallback

State is saved to `~/reference-library/pipeline_state/run_XXX/`:
- `state.json` - Current stage and statistics
- `manifest.json` - Discovered sources
- `review_log.json` - QA decisions
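
A minimal sketch of the file-based fallback, assuming `state.json` holds a run ID, the current stage, and a statistics dictionary. The field names and the `save_run_state` / `load_run_state` helpers are hypothetical; the directory layout follows the list above, with the library root passed in so the example does not touch the real `~/reference-library`.

```python
import json
from pathlib import Path


def save_run_state(library_root: Path, run_id: str, stage: int, stats: dict) -> Path:
    """Write state.json for a run; field names are illustrative assumptions."""
    run_dir = library_root / "pipeline_state" / f"run_{run_id}"
    run_dir.mkdir(parents=True, exist_ok=True)
    state_path = run_dir / "state.json"
    tmp_path = run_dir / "state.json.tmp"
    payload = {"run_id": run_id, "current_stage": stage, "stats": stats}
    # Write to a temporary file first, then rename, so a crash mid-write
    # does not leave a truncated state.json behind.
    tmp_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    tmp_path.replace(state_path)
    return state_path


def load_run_state(library_root: Path, run_id: str) -> dict:
    """Read state.json back for a run that is being resumed."""
    state_path = library_root / "pipeline_state" / f"run_{run_id}" / "state.json"
    return json.loads(state_path.read_text(encoding="utf-8"))
```

Writing through a temporary file is a design choice of this sketch, not something the skill documents.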

## Progress Tracking

The orchestrator reports progress at each stage:

```
[Pipeline] Stage 1/6: Discovery - Found 8 sources
[Pipeline] Stage 2/6: Crawling - 45/50 pages complete
[Pipeline] Stage 3/6: Storing - 45 documents saved
[Pipeline] Stage 4/6: Distilling - 45 documents processed
[Pipeline] Stage 5/6: Reviewing - 40 approved, 3 refactored, 2 rejected
[Pipeline] Stage 6/6: Exporting - 40 documents exported
[Pipeline] Complete! See ~/reference-library/exports/
```

## Output Summary

On completion, the orchestrator returns a detailed summary:

```
Pipeline Complete:
- Sources discovered: 5
- Pages crawled: 45
- Documents stored: 45
- Approved: 40
- Refactored: 8
- Deep researched: 2
- Rejected: 3
- Needs manual review: 2

Exports saved to: ~/reference-library/exports/
Format: project_files
```

## Error Handling

If a stage fails:
1. State is checkpointed
2. Error is logged with details
3. Pipeline can be resumed from last successful stage

## Resume Pipeline

To resume a paused or failed pipeline:
- Provide the `manifest.json` or the run state file
- The pipeline continues from the last checkpoint
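
The checkpoint-and-resume behaviour described in the two sections above might look like the following sketch. It assumes the state is a plain dictionary with a `last_completed` stage index; `run_pipeline`, the state keys, and the stage callables are hypothetical, and only the stage names come from this document.

```python
from typing import Callable

STAGES = [
    "reference-discovery",
    "web-crawler-orchestrator",
    "content-repository",
    "content-distiller",
    "quality-reviewer",
    "markdown-exporter",
]


def run_pipeline(stage_funcs: dict[str, Callable[[], None]], state: dict) -> dict:
    """Run stages in order, skipping those already completed in `state`."""
    for index, name in enumerate(STAGES, start=1):
        if index <= state.get("last_completed", 0):
            continue  # finished in an earlier run, do not repeat
        try:
            stage_funcs[name]()
        except Exception as exc:
            # Record the failure; last_completed still points at the
            # last successful stage, so a later call resumes from there.
            state["status"] = "failed"
            state["error"] = f"{name}: {exc}"
            return state
        state["last_completed"] = index  # checkpoint after each stage
    state["status"] = "complete"
    state.pop("error", None)
    return state
```

Calling `run_pipeline` again with the returned state picks up at the failed stage rather than starting over.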

## Example Usage

### Full Pipeline from Topic

```
User: Curate references on Claude Code best practices with 5 sources

Claude: I'll run the full curation pipeline for "Claude Code best practices":

[Pipeline] Stage 1/6: Discovering sources...
Found 5 authoritative sources (3 official, 2 community)

[Pipeline] Stage 2/6: Crawling 5 sources...
Crawled 45 pages total

[Pipeline] Stage 3/6: Storing documents...
45 documents saved to repository

[Pipeline] Stage 4/6: Distilling content...
45 documents processed

[Pipeline] Stage 5/6: Quality review...
- 38 approved
- 5 refactored (re-distilling...)
- 2 rejected (low quality)

[Pipeline] Stage 6/6: Exporting...
40 documents exported to ~/reference-library/exports/

Pipeline complete! 40 reference documents ready for use.
```

### From Specific URLs

```
User: Curate these URLs with auto-approve:
- https://docs.anthropic.com/en/docs/prompt-caching
- https://docs.anthropic.com/en/docs/build-with-claude/tool-use

Claude: Running pipeline from URLs (skipping discovery)...

[Pipeline] Stage 2/6: Crawling 2 sources...
[...continues with remaining stages...]
```

@@ -1,11 +1,27 @@
 # Reference Curator - Claude.ai Project Knowledge

-This project knowledge enables Claude to curate, process, and export reference documentation through 6 modular skills.
+This project knowledge enables Claude to curate, process, and export reference documentation through 7 modular skills.
+
+## Quick Start - Pipeline Orchestrator
+
+Run the full curation workflow with a single command:
+
+```
+# Full pipeline from topic
+curate references on "Claude Code best practices"
+
+# From URLs (skip discovery)
+curate these URLs: https://docs.anthropic.com/en/docs/prompt-caching
+
+# With auto-approve
+curate references on "MCP servers" with auto-approve
+```

 ## Skills Overview

 | Skill | Purpose | Trigger Phrases |
 |-------|---------|-----------------|
+| **pipeline-orchestrator** | Full 6-skill workflow with QA loops | "curate references", "run full pipeline", "automate curation" |
 | **reference-discovery** | Search & validate authoritative sources | "find references", "search documentation", "discover sources" |
 | **web-crawler** | Multi-backend crawling orchestration | "crawl URL", "fetch documents", "scrape pages" |
 | **content-repository** | MySQL storage management | "store content", "save to database", "check duplicates" |
@@ -16,37 +32,43 @@ This project knowledge enables Claude to curate, process, and export reference d
 ## Workflow

 ```
-    [Topic Input]
-          │
-          ▼
-┌─────────────────────┐
-│ reference-discovery │ → Search & validate sources
-└─────────────────────┘
-          │
-          ▼
+              ┌───────────────────────────┐
+              │ pipeline-orchestrator     │ (Coordinates all stages)
+              └───────────────────────────┘
+                            │
+        ┌───────────────────┼───────────────────┐
+        ▼                   ▼                   ▼
+  [Topic Input]        [URL Input]       [Manifest Input]
+        │                   │                   │
+        ▼                   │                   │
+┌─────────────────────┐     │                   │
+│ reference-discovery │ ◄───┴───────────────────┘
+└─────────────────────┘ (skip if URLs/manifest)
+        │
+        ▼
 ┌─────────────────────┐
 │ web-crawler         │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
 └─────────────────────┘
-          │
-          ▼
+        │
+        ▼
 ┌─────────────────────┐
 │ content-repository  │ → Store in MySQL
 └─────────────────────┘
-          │
-          ▼
+        │
+        ▼
 ┌─────────────────────┐
-│ content-distiller   │ → Summarize & extract
-└─────────────────────┘
-          │
-          ▼
-┌─────────────────────┐
-│ quality-reviewer    │ → QA loop
-└─────────────────────┘
-          │
-          ├── REFACTOR → content-distiller
-          ├── DEEP_RESEARCH → web-crawler
-          │
-          ▼ APPROVE
+│ content-distiller   │ → Summarize & extract  ◄────┐
+└─────────────────────┘                             │
+        │                                           │
+        ▼                                           │
+┌─────────────────────┐                             │
+│ quality-reviewer    │ → QA loop                   │
+└─────────────────────┘                             │
+        │                                           │
+        ├── REFACTOR (max 3) ───────────────────────┤
+        ├── DEEP_RESEARCH (max 2) → crawler ────────┘
+        │
+        ▼ APPROVE
 ┌─────────────────────┐
 │ markdown-exporter   │ → Project files / Fine-tuning
 └─────────────────────┘
@@ -74,16 +96,28 @@ This project knowledge enables Claude to curate, process, and export reference d
 ## Files in This Project

 - `INDEX.md` - This overview file
-- `reference-curator-complete.md` - All 6 skills in one file
+- `reference-curator-complete.md` - All 7 skills in one file (recommended)
 - `01-reference-discovery.md` - Source discovery skill
 - `02-web-crawler.md` - Crawling orchestration skill
 - `03-content-repository.md` - Database storage skill
 - `04-content-distiller.md` - Content summarization skill
 - `05-quality-reviewer.md` - QA review skill
 - `06-markdown-exporter.md` - Export skill
+- `07-pipeline-orchestrator.md` - Full pipeline orchestration

 ## Usage

 Upload all files to a Claude.ai Project, or upload only the skills you need.

 For the complete experience, upload `reference-curator-complete.md` which contains all skills in one file.
+
+## Pipeline Orchestrator Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| max_sources | 10 | Max sources to discover |
+| max_pages | 50 | Max pages per source |
+| auto_approve | false | Auto-approve above threshold |
+| threshold | 0.85 | Approval threshold |
+| max_iterations | 3 | Max QA loop iterations |
+| export_format | project_files | Output format |

@@ -1,6 +1,87 @@
 # Reference Curator - Complete Skill Set

-This document contains all 6 skills for curating, processing, and exporting reference documentation.
+This document contains all 7 skills for curating, processing, and exporting reference documentation.

 ---

+# Pipeline Orchestrator (Recommended Entry Point)
+
+Coordinates the full 6-skill workflow with automated QA loop handling.
+
+## Quick Start
+
+```
+# Full pipeline from topic
+curate references on "Claude Code best practices"
+
+# From URLs (skip discovery)
+curate these URLs: https://docs.anthropic.com/en/docs/prompt-caching
+
+# With auto-approve
+curate references on "MCP servers" with auto-approve and fine-tuning output
+```
+
+## Configuration Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| max_sources | 10 | Maximum sources to discover |
+| max_pages | 50 | Maximum pages per source |
+| auto_approve | false | Auto-approve above threshold |
+| threshold | 0.85 | Approval threshold |
+| max_iterations | 3 | Max QA loop iterations |
+| export_format | project_files | Output format |
+
+## Pipeline Flow
+
+```
+[Input: Topic | URLs | Manifest]
+       │
+       ▼
+1. reference-discovery (skip if URLs/manifest)
+       │
+       ▼
+2. web-crawler
+       │
+       ▼
+3. content-repository
+       │
+       ▼
+4. content-distiller ◄─────────────┐
+       │                           │
+       ▼                           │
+5. quality-reviewer                │
+       │                           │
+       ├── APPROVE → export        │
+       ├── REFACTOR (max 3) ─────┤
+       ├── DEEP_RESEARCH (max 2) → crawler
+       └── REJECT → archive
+       │
+       ▼
+6. markdown-exporter
+```
+
+## QA Loop Handling
+
+| Decision | Action | Max Iterations |
+|----------|--------|----------------|
+| APPROVE | Proceed to export | - |
+| REFACTOR | Re-distill with feedback | 3 |
+| DEEP_RESEARCH | Crawl more sources | 2 |
+| REJECT | Archive with reason | - |
+
+Documents exceeding iteration limits are marked `needs_manual_review`.
+
+## Output Summary
+
+```
+Pipeline Complete:
+- Sources discovered: 5
+- Pages crawled: 45
+- Approved: 40
+- Needs manual review: 2
+- Exports: ~/reference-library/exports/
+```
+
+---
+
@@ -464,6 +545,7 @@ def add_cross_references(doc, all_docs):

 | From | Output | To |
 |------|--------|-----|
+| **pipeline-orchestrator** | Coordinates all stages | All skills below |
 | **reference-discovery** | URL manifest | web-crawler |
 | **web-crawler** | Raw content + manifest | content-repository |
 | **content-repository** | Document records | content-distiller |
@@ -471,3 +553,25 @@ def add_cross_references(doc, all_docs):
 | **quality-reviewer** (approve) | Approved IDs | markdown-exporter |
 | **quality-reviewer** (refactor) | Instructions | content-distiller |
 | **quality-reviewer** (deep_research) | Queries | web-crawler |
+
+## State Management
+
+The pipeline orchestrator tracks state for resume capability:
+
+**With Database:**
+- `pipeline_runs` table tracks run status, current stage, statistics
+- `pipeline_iteration_tracker` tracks QA loop iterations per document
+
+**File-Based Fallback:**
+```
+~/reference-library/pipeline_state/run_XXX/
+├── state.json       # Current stage and stats
+├── manifest.json    # Discovered sources
+└── review_log.json  # QA decisions
+```
+
+## Resume Pipeline
+
+To resume a paused or failed pipeline:
+1. Provide the run_id or state file path
+2. Pipeline continues from last successful checkpoint