
Andrew Yim d1cd1298a8 feat(reference-curator): Add pipeline orchestrator and refactor skill format

Pipeline Orchestrator:
- Add 07-pipeline-orchestrator skill with code/CLAUDE.md and desktop/SKILL.md
- Add /reference-curator-pipeline slash command for full workflow automation
- Add pipeline_runs and pipeline_iteration_tracker tables to schema.sql
- Add v_pipeline_status and v_pipeline_iterations views
- Add pipeline_config.yaml configuration template
- Update AGENTS.md with Reference Curator Skills section
- Update claude-project files with pipeline documentation

Skill Format Refactoring:
- Extract YAML frontmatter from SKILL.md files to separate skill.yaml
- Add tools/ directories with MCP tool documentation
- Update SKILL-FORMAT-REQUIREMENTS.md with new structure
- Add migrate-skill-structure.py script for format conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-29 01:01:02 +07:00

4.7 KiB


Pipeline Orchestrator

Coordinates the full 6-skill reference curation workflow with automated QA loop handling.

Trigger Phrases

"curate references on [topic]"
"run full curation pipeline"
"automate reference curation"
"curate these URLs: [url1, url2]"

Input Modes

Mode	Example	Pipeline Start
Topic	"curate references on Claude system prompts"	Stage 1 (discovery)
URLs	"curate these URLs: https://docs.anthropic.com/..."	Stage 2 (crawler)
Manifest	"resume curation from manifest.json"	Stage 2 (crawler)
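The mode can be inferred from the wording of the request. The sketch below is a heuristic under stated assumptions: a request that mentions a manifest or a `.json` file is treated as a resume, any request containing a URL is treated as URL mode, and everything else is a topic. The function name `detect_input_mode` and the matching rules are hypothetical, not the skill's actual parser.

```python
import re

# Starting stage per mode, taken from the Input Modes table.
START_STAGE = {"topic": 1, "urls": 2, "manifest": 2}

URL_RE = re.compile(r"https?://\S+")
MANIFEST_RE = re.compile(r"\bmanifest\b|\S+\.json\b", re.IGNORECASE)

def detect_input_mode(request: str) -> tuple[str, int]:
    """Classify a curation request and return (mode, first_stage).

    Hypothetical heuristic: a manifest reference wins over URLs,
    and anything without a manifest or URL is treated as a topic.
    """
    if MANIFEST_RE.search(request):
        mode = "manifest"
    elif URL_RE.search(request):
        mode = "urls"
    else:
        mode = "topic"
    return mode, START_STAGE[mode]
```

Checking the manifest pattern first means a resume request that also happens to contain a URL still resumes instead of starting a fresh crawl.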

Pipeline Stages

1. reference-discovery  (topic mode only)
       │
       ▼
2. web-crawler-orchestrator
       │
       ▼
3. content-repository
       │
       ▼
4. content-distiller ◄─────────────┐
       │                           │
       ▼                           │
5. quality-reviewer                │
       │                           │
       ├── REFACTOR ───────────────┘
       ├── DEEP_RESEARCH → Stage 2 (crawl more sources, then re-distill)
       ├── REJECT → Archive
       └── APPROVE
           │
           ▼
6. markdown-exporter
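The transitions in the diagram can be written as a small routing function. This is a sketch, not the orchestrator's code: the constant names and the `"archive"` marker are hypothetical. Stages 1 to 4 advance linearly, and stage 5 routes on the reviewer's decision.

```python
from typing import Optional, Union

# Stage numbers follow the Pipeline Stages diagram.
DISCOVERY, CRAWLER, REPOSITORY, DISTILLER, REVIEWER, EXPORTER = range(1, 7)
ARCHIVE = "archive"  # hypothetical terminal marker for rejected documents

# Where each quality-reviewer decision sends a document.
QA_ROUTES = {
    "APPROVE": EXPORTER,
    "REFACTOR": DISTILLER,
    "DEEP_RESEARCH": CRAWLER,
    "REJECT": ARCHIVE,
}

def next_stage(current: int, decision: Optional[str] = None) -> Union[int, str, None]:
    """Return what follows `current`; None means the pipeline is finished."""
    if current == REVIEWER:
        if decision not in QA_ROUTES:
            raise ValueError(f"unknown QA decision: {decision!r}")
        return QA_ROUTES[decision]
    if current == EXPORTER:
        return None
    if not DISCOVERY <= current < REVIEWER:
        raise ValueError(f"unknown stage: {current!r}")
    return current + 1
```

Because DEEP_RESEARCH returns to the crawler, a document on that path passes through storage and distillation again before its next review.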

Configuration Options

Option	Default	Description
max_sources	10	Maximum sources to discover (topic mode)
max_pages	50	Maximum pages per source to crawl
auto_approve	false	Auto-approve scores above threshold
threshold	0.85	Quality score threshold for approval
max_iterations	3	Maximum QA loop iterations per document
export_format	project_files	Output format (project_files, fine_tuning, jsonl)
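One way to hold these options in code is a dataclass that carries the documented defaults and rejects obviously invalid values. This is a sketch: the class name `PipelineConfig`, the validation rules, and the inclusive `>=` threshold comparison are assumptions. Reading the values from the `pipeline_config.yaml` template is not shown because its key layout is not described here.

```python
from dataclasses import dataclass

EXPORT_FORMATS = {"project_files", "fine_tuning", "jsonl"}

@dataclass
class PipelineConfig:
    max_sources: int = 10        # topic mode only
    max_pages: int = 50          # per source
    auto_approve: bool = False
    threshold: float = 0.85      # quality score needed for approval
    max_iterations: int = 3      # QA loop iterations per document
    export_format: str = "project_files"

    def __post_init__(self) -> None:
        # Hypothetical sanity checks; the skill may validate differently.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        if self.export_format not in EXPORT_FORMATS:
            raise ValueError(f"unsupported export_format: {self.export_format}")
        for name in ("max_sources", "max_pages", "max_iterations"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")

    def auto_approves(self, score: float) -> bool:
        """True when auto_approve is on and the score clears the threshold.

        Treating a score equal to the threshold as passing is an assumption.
        """
        return self.auto_approve and score >= self.threshold
```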

QA Loop Handling

The orchestrator automatically handles QA decisions:

Decision	Action	Iteration Limit
APPROVE	Proceed to export	-
REFACTOR	Re-distill with feedback	3 iterations
DEEP_RESEARCH	Crawl more sources, re-distill	2 iterations
REJECT	Archive with reason	-

When a document hits its iteration limit without being approved or rejected, it is marked needs_manual_review.
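The limit handling above can be sketched as a per-document counter. Two details are assumptions: REFACTOR and DEEP_RESEARCH loops are counted separately, and a limit of 3 allows three loops before the next verdict of the same kind triggers manual review. The names `apply_decision` and the returned action strings are hypothetical.

```python
from collections import Counter

# Loop limits from the QA Loop Handling table; APPROVE and REJECT never loop.
LOOP_LIMITS = {"REFACTOR": 3, "DEEP_RESEARCH": 2}
LOOP_ACTIONS = {"REFACTOR": "redistill", "DEEP_RESEARCH": "deep_research"}

def apply_decision(doc_id: str, decision: str, loops: Counter) -> str:
    """Map one QA decision to an action, enforcing per-document loop limits.

    `loops` counts completed loops per (doc_id, decision) pair across reviews.
    """
    if decision == "APPROVE":
        return "export"
    if decision == "REJECT":
        return "archive"
    if decision not in LOOP_LIMITS:
        raise ValueError(f"unknown QA decision: {decision!r}")
    key = (doc_id, decision)
    if loops[key] >= LOOP_LIMITS[decision]:
        return "needs_manual_review"  # limit reached: stop looping
    loops[key] += 1
    return LOOP_ACTIONS[decision]
```

Keeping the counter outside the function lets it be persisted with the rest of the run state, so limits still hold after a resume.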

State Management

With Database

Pipeline state is tracked in the pipeline_runs table:

Run ID, input type, current stage
Statistics (crawled, distilled, approved, etc.)
Error handling and resume capability
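The sketch below illustrates the kind of bookkeeping this table supports, using Python's built-in sqlite3 as a stand-in database. The column names and the helper functions `start_run`, `checkpoint`, and `record_error` are hypothetical; the authoritative pipeline_runs definition is in schema.sql.

```python
import json
import sqlite3

# Hypothetical columns for illustration only; see schema.sql for the real table.
DDL = """
CREATE TABLE IF NOT EXISTS pipeline_runs (
    run_id        INTEGER PRIMARY KEY,
    input_type    TEXT NOT NULL CHECK (input_type IN ('topic', 'urls', 'manifest')),
    current_stage INTEGER NOT NULL,
    stats         TEXT NOT NULL DEFAULT '{}',  -- JSON object of counters
    last_error    TEXT                         -- NULL unless a stage failed
)
"""

def start_run(conn: sqlite3.Connection, input_type: str, first_stage: int) -> int:
    """Insert a new run row and return its run ID."""
    cur = conn.execute(
        "INSERT INTO pipeline_runs (input_type, current_stage) VALUES (?, ?)",
        (input_type, first_stage),
    )
    conn.commit()
    return cur.lastrowid

def checkpoint(conn: sqlite3.Connection, run_id: int, stage: int, stats: dict) -> None:
    """Record the stage reached and the latest statistics; clears any old error."""
    conn.execute(
        "UPDATE pipeline_runs SET current_stage = ?, stats = ?, last_error = NULL"
        " WHERE run_id = ?",
        (stage, json.dumps(stats), run_id),
    )
    conn.commit()

def record_error(conn: sqlite3.Connection, run_id: int, message: str) -> None:
    """Store the failure but leave current_stage as-is so the run can resume."""
    conn.execute(
        "UPDATE pipeline_runs SET last_error = ? WHERE run_id = ?",
        (message, run_id),
    )
    conn.commit()
```

Usage: `conn = sqlite3.connect(":memory:")`, then `conn.execute(DDL)` before calling the helpers.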

File-Based Fallback

State is saved to ~/reference-library/pipeline_state/run_XXX/:

state.json - Current stage and statistics
manifest.json - Discovered sources
review_log.json - QA decisions
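For the file-based fallback, a checkpoint is only useful if a crash cannot leave a half-written state.json. A minimal sketch, assuming the write goes through a temporary file that is then renamed over the target; the helper names `save_state` and `load_state` and the `run_001` directory name are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

# Root directory from the text above; names like "run_001" are an assumption
# (the text only shows the placeholder run_XXX).
STATE_ROOT = Path.home() / "reference-library" / "pipeline_state"

def save_state(run_dir: Path, state: dict) -> Path:
    """Write state.json via a temp file and rename, so readers never see a partial file."""
    run_dir.mkdir(parents=True, exist_ok=True)
    target = run_dir / "state.json"
    fd, tmp = tempfile.mkstemp(dir=run_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2)
        os.replace(tmp, target)  # atomic on POSIX when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return target

def load_state(run_dir: Path) -> dict:
    """Read back the most recent checkpoint for a run."""
    with open(run_dir / "state.json", encoding="utf-8") as fh:
        return json.load(fh)
```

In use this would look like `save_state(STATE_ROOT / "run_001", {"current_stage": 4})`; creating the temp file in the run directory itself keeps the rename on one filesystem.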

Progress Tracking

The orchestrator reports progress at each stage:

[Pipeline] Stage 1/6: Discovery - Found 8 sources
[Pipeline] Stage 2/6: Crawling - 45/50 pages complete
[Pipeline] Stage 3/6: Storing - 45 documents saved
[Pipeline] Stage 4/6: Distilling - 45 documents processed
[Pipeline] Stage 5/6: Reviewing - 40 approved, 3 refactored, 2 rejected
[Pipeline] Stage 6/6: Exporting - 40 documents exported
[Pipeline] Complete! See ~/reference-library/exports/
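The progress lines follow a fixed pattern that a small formatter can reproduce. The function names below are hypothetical; only the output format is taken from the sample above.

```python
TOTAL_STAGES = 6

def progress(stage: int, label: str, detail: str) -> str:
    """Format one progress line in the style shown above."""
    if not 1 <= stage <= TOTAL_STAGES:
        raise ValueError(f"stage must be between 1 and {TOTAL_STAGES}")
    return f"[Pipeline] Stage {stage}/{TOTAL_STAGES}: {label} - {detail}"

def review_detail(approved: int, refactored: int, rejected: int) -> str:
    """Detail text for the quality-review stage."""
    return f"{approved} approved, {refactored} refactored, {rejected} rejected"

if __name__ == "__main__":
    print(progress(2, "Crawling", "45/50 pages complete"))
    print(progress(5, "Reviewing", review_detail(40, 3, 2)))
```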

Output Summary

On completion, the orchestrator returns a detailed summary:

Pipeline Complete:
- Sources discovered: 5
- Pages crawled: 45
- Documents stored: 45
- Approved: 40
- Refactored: 8
- Deep researched: 2
- Rejected: 3
- Needs manual review: 2

Exports saved to: ~/reference-library/exports/
Format: project_files
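A summary in this shape can be rendered from a plain counters object. This is a sketch: the `RunStats` dataclass and `render_summary` are hypothetical, and the labels are derived from the field names so they match the sample output.

```python
from dataclasses import dataclass

@dataclass
class RunStats:
    sources_discovered: int = 0
    pages_crawled: int = 0
    documents_stored: int = 0
    approved: int = 0
    refactored: int = 0
    deep_researched: int = 0
    rejected: int = 0
    needs_manual_review: int = 0

def render_summary(stats: RunStats, export_dir: str, fmt: str) -> str:
    """Render the completion summary; field order follows the dataclass."""
    lines = ["Pipeline Complete:"]
    for field, value in vars(stats).items():
        label = field.replace("_", " ").capitalize()  # "deep_researched" -> "Deep researched"
        lines.append(f"- {label}: {value}")
    lines += ["", f"Exports saved to: {export_dir}", f"Format: {fmt}"]
    return "\n".join(lines)
```

In the sample, Approved + Rejected + Needs manual review (40 + 3 + 2) equals the 45 stored documents, which suggests Refactored and Deep researched count loop passes rather than final outcomes. That reading is inferred from the numbers, not documented.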

Error Handling

If a stage fails:

State is checkpointed
Error is logged with details
The pipeline can be resumed from the last successful stage

Resume Pipeline

To resume a paused or failed pipeline:

Provide the manifest.json or the run state file (state.json)
The pipeline continues from the last checkpoint
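Choosing where to resume reduces to finding the first stage after the last completed one, never earlier than the run's starting stage. A sketch, assuming the checkpoint stores hypothetical `last_completed_stage` and `start_stage` keys as 1-based stage numbers; the real state.json layout may differ.

```python
from typing import Optional

# Skill names in pipeline order, from the Pipeline Stages diagram.
STAGES = [
    "reference-discovery",
    "web-crawler-orchestrator",
    "content-repository",
    "content-distiller",
    "quality-reviewer",
    "markdown-exporter",
]

def resume_stage(state: dict) -> Optional[str]:
    """Return the skill to run next, or None if the run already finished.

    Hypothetical keys: last_completed_stage (0 or missing = nothing done)
    and start_stage (2 for URL or manifest runs, which skip discovery).
    """
    done = int(state.get("last_completed_stage", 0))
    start = int(state.get("start_stage", 1))
    nxt = max(done + 1, start)
    return STAGES[nxt - 1] if nxt <= len(STAGES) else None
```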

Example Usage

Full Pipeline from Topic

User: Curate references on Claude Code best practices with 5 sources

Claude: I'll run the full curation pipeline for "Claude Code best practices":

[Pipeline] Stage 1/6: Discovering sources...
Found 5 authoritative sources (3 official, 2 community)

[Pipeline] Stage 2/6: Crawling 5 sources...
Crawled 45 pages total

[Pipeline] Stage 3/6: Storing documents...
45 documents saved to repository

[Pipeline] Stage 4/6: Distilling content...
45 documents processed

[Pipeline] Stage 5/6: Quality review...
- 38 approved
- 5 refactored (re-distilling...)
- 2 rejected (low quality)

[Pipeline] Stage 6/6: Exporting...
40 documents exported to ~/reference-library/exports/

Pipeline complete! 40 reference documents ready for use.

From Specific URLs

User: Curate these URLs with auto-approve:
- https://docs.anthropic.com/en/docs/prompt-caching
- https://docs.anthropic.com/en/docs/build-with-claude/tool-use

Claude: Running pipeline from URLs (skipping discovery)...

[Pipeline] Stage 2/6: Crawling 2 sources...
[...continues with remaining stages...]
