
Andrew Yim d1cd1298a8 feat(reference-curator): Add pipeline orchestrator and refactor skill format

Pipeline Orchestrator:
- Add 07-pipeline-orchestrator skill with code/CLAUDE.md and desktop/SKILL.md
- Add /reference-curator-pipeline slash command for full workflow automation
- Add pipeline_runs and pipeline_iteration_tracker tables to schema.sql
- Add v_pipeline_status and v_pipeline_iterations views
- Add pipeline_config.yaml configuration template
- Update AGENTS.md with Reference Curator Skills section
- Update claude-project files with pipeline documentation

Skill Format Refactoring:
- Extract YAML frontmatter from SKILL.md files to separate skill.yaml
- Add tools/ directories with MCP tool documentation
- Update SKILL-FORMAT-REQUIREMENTS.md with new structure
- Add migrate-skill-structure.py script for format conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-29 01:01:02 +07:00


Reference Curator - Claude.ai Project Knowledge

This project knowledge enables Claude to curate, process, and export reference documentation through 7 modular skills.

Quick Start - Pipeline Orchestrator

Run the full curation workflow with a single command:

# Full pipeline from topic
curate references on "Claude Code best practices"

# From URLs (skip discovery)
curate these URLs: https://docs.anthropic.com/en/docs/prompt-caching

# With auto-approve
curate references on "MCP servers" with auto-approve

Skills Overview

Skill	Purpose	Trigger Phrases
pipeline-orchestrator	Full 6-skill workflow with QA loops	"curate references", "run full pipeline", "automate curation"
reference-discovery	Search & validate authoritative sources	"find references", "search documentation", "discover sources"
web-crawler	Multi-backend crawling orchestration	"crawl URL", "fetch documents", "scrape pages"
content-repository	MySQL storage management	"store content", "save to database", "check duplicates"
content-distiller	Summarize & extract key concepts	"distill content", "summarize document", "extract key concepts"
quality-reviewer	QA scoring & routing decisions	"review content", "quality check", "assess distilled content"
markdown-exporter	Export to markdown/JSONL	"export references", "generate project files", "create markdown output"

Workflow

                ┌───────────────────────────┐
                │   pipeline-orchestrator   │  (Coordinates all stages)
                └───────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
   [Topic Input]      [URL Input]        [Manifest Input]
        │                   │                   │
        ▼                   │                   │
┌─────────────────────┐     │                   │
│ reference-discovery │ ◄───┴───────────────────┘
└─────────────────────┘                  (skip if URLs/manifest)
        │
        ▼
┌─────────────────────┐
│ web-crawler         │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│ content-repository  │ → Store in MySQL
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│ content-distiller   │ → Summarize & extract  ◄────┐
└─────────────────────┘                             │
        │                                           │
        ▼                                           │
┌─────────────────────┐                             │
│ quality-reviewer    │ → QA loop                   │
└─────────────────────┘                             │
        │                                           │
        ├── REFACTOR (max 3) ───────────────────────┤
        ├── DEEP_RESEARCH (max 2) → crawler ────────┘
        │
        ▼ APPROVE
┌─────────────────────┐
│ markdown-exporter   │ → Project files / Fine-tuning
└─────────────────────┘
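The QA loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the orchestrator's actual code: `distill`, `review`, and `deep_research` are hypothetical callables standing in for the content-distiller, quality-reviewer, and web-crawler skills, and the behaviour when a retry limit is exhausted (stop without approving) is an assumption, since the diagram does not state it.

```python
MAX_REFACTOR = 3        # "REFACTOR (max 3)" in the diagram
MAX_DEEP_RESEARCH = 2   # "DEEP_RESEARCH (max 2)" in the diagram


def run_qa_loop(content, distill, review, deep_research):
    """Distill and review content until it is approved or retries run out.

    distill(content, feedback) -> distilled text            (hypothetical)
    review(distilled) -> (decision, feedback)                (hypothetical)
    deep_research(content) -> content with extra sources     (hypothetical)
    """
    refactors = 0
    researches = 0
    feedback = None
    while True:
        distilled = distill(content, feedback)
        decision, feedback = review(distilled)
        if decision == "APPROVE":
            return "approved", distilled
        if decision == "REFACTOR" and refactors < MAX_REFACTOR:
            refactors += 1
            continue  # re-distill the same content with reviewer feedback
        if decision == "DEEP_RESEARCH" and researches < MAX_DEEP_RESEARCH:
            researches += 1
            content = deep_research(content)  # gather more sources, then re-distill
            continue
        # Assumption: a reject decision or an exhausted limit ends the loop.
        return "stopped", distilled
```

Keeping separate counters for the two branches mirrors the separate limits shown in the diagram; a single shared counter would be a different design.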

Quality Scoring Thresholds

Score	Decision	Action
≥ 0.85	Approve	Ready for export
0.60 to < 0.85	Refactor	Re-distill with reviewer feedback
0.40 to < 0.60	Deep Research	Gather more sources
< 0.40	Reject	Archive (low quality)
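The threshold table maps directly onto a small routing function. The sketch below assumes scores are floats in the range 0 to 1 and that a score falling between two listed bands (for example 0.845) belongs to the lower band; the function name and return labels are illustrative, not taken from the skill files.

```python
def route_decision(score: float) -> str:
    """Map a quality score to a routing decision using the table's thresholds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score >= 0.85:
        return "approve"        # ready for export
    if score >= 0.60:
        return "refactor"       # re-distill with feedback
    if score >= 0.40:
        return "deep_research"  # gather more sources
    return "reject"             # archive (low quality)
```

For example, `route_decision(0.72)` returns `"refactor"`, and `route_decision(0.85)` returns `"approve"` because the top band is inclusive.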

Source Credibility Tiers

Tier	Source Type	Examples
Tier 1	Official documentation	docs.anthropic.com, platform.openai.com/docs
Tier 1	Official engineering blogs	anthropic.com/news, openai.com/blog
Tier 2	Research papers	arxiv.org papers with citations
Tier 2	Verified community guides	Official cookbooks, tutorials
Tier 3	Community content	Blog posts, Stack Overflow
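One way to approximate the tier table in code is a URL-prefix lookup. This is only a sketch under stated assumptions: the prefix map contains just the example sites from the table, anything unmatched falls back to Tier 3, and the "with citations" condition for arXiv papers and the "verified community guides" row are not modelled. The reference-discovery skill may assign tiers quite differently.

```python
from urllib.parse import urlparse

# Assumption: prefixes are the example sites from the table above, nothing more.
TIER_PREFIXES = {
    "docs.anthropic.com": 1,
    "platform.openai.com/docs": 1,
    "anthropic.com/news": 1,
    "openai.com/blog": 1,
    "arxiv.org": 2,
}


def credibility_tier(url: str) -> int:
    """Return 1, 2, or 3 for a URL; unmatched URLs default to Tier 3."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").removeprefix("www.")  # Python 3.9+
    location = host + parsed.path
    for prefix, tier in TIER_PREFIXES.items():
        # Match the prefix exactly or as a leading path segment.
        if location == prefix or location.startswith(prefix + "/"):
            return tier
    return 3
```

Matching on host plus path, rather than host alone, keeps `anthropic.com/news` in Tier 1 without promoting every other page on the same domain.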

Files in This Project

INDEX.md - This overview file
reference-curator-complete.md - All 7 skills in one file (recommended)
01-reference-discovery.md - Source discovery skill
02-web-crawler.md - Crawling orchestration skill
03-content-repository.md - Database storage skill
04-content-distiller.md - Content summarization skill
05-quality-reviewer.md - QA review skill
06-markdown-exporter.md - Export skill
07-pipeline-orchestrator.md - Full pipeline orchestration

Usage

Upload all files to a Claude.ai Project, or upload only the skills you need.

For the complete experience, upload reference-curator-complete.md, which contains all seven skills in one file.

Pipeline Orchestrator Options

Option	Default	Description
max_sources	10	Max sources to discover
max_pages	50	Max pages per source
auto_approve	false	Auto-approve above threshold
threshold	0.85	Approval threshold
max_iterations	3	Max QA loop iterations
export_format	project_files	Output format
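The options and defaults above could be represented as a small typed structure. The class below is a hypothetical sketch: the field names and defaults come from the table, but the class itself, its validation rules, and any relationship to the layout of pipeline_config.yaml are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineOptions:
    """Orchestrator options with the defaults listed in the table."""

    max_sources: int = 10          # max sources to discover
    max_pages: int = 50            # max pages per source
    auto_approve: bool = False     # auto-approve above threshold
    threshold: float = 0.85        # approval threshold
    max_iterations: int = 3        # max QA loop iterations
    export_format: str = "project_files"

    def __post_init__(self):
        # Assumed sanity checks; the real pipeline may validate differently.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        if min(self.max_sources, self.max_pages, self.max_iterations) < 1:
            raise ValueError("limits must be at least 1")
```

Usage might look like `PipelineOptions(auto_approve=True, max_sources=5)`, which leaves every other option at its default.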
