# Reference Curator - Claude.ai Project Knowledge

This project knowledge enables Claude to curate, process, and export reference documentation through 7 modular skills.

## Quick Start - Pipeline Orchestrator

Run the full curation workflow with a single command:

```
# Full pipeline from topic
curate references on "Claude Code best practices"

# From URLs (skip discovery)
curate these URLs: https://docs.anthropic.com/en/docs/prompt-caching

# With auto-approve
curate references on "MCP servers" with auto-approve
```

## Skills Overview

| Skill | Purpose | Trigger Phrases |
|-------|---------|-----------------|
| **pipeline-orchestrator** | Runs the other six skills as one workflow, with QA loops | "curate references", "run full pipeline", "automate curation" |
| **reference-discovery** | Search & validate authoritative sources | "find references", "search documentation", "discover sources" |
| **web-crawler** | Multi-backend crawling orchestration | "crawl URL", "fetch documents", "scrape pages" |
| **content-repository** | MySQL storage management | "store content", "save to database", "check duplicates" |
| **content-distiller** | Summarize & extract key concepts | "distill content", "summarize document", "extract key concepts" |
| **quality-reviewer** | QA scoring & routing decisions | "review content", "quality check", "assess distilled content" |
| **markdown-exporter** | Export to markdown/JSONL | "export references", "generate project files", "create markdown output" |

## Workflow

```
                ┌───────────────────────────┐
                │   pipeline-orchestrator   │  (Coordinates all stages)
                └───────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
   [Topic Input]      [URL Input]        [Manifest Input]
        │                   │                   │
        ▼                   │                   │
┌─────────────────────┐     │                   │
│ reference-discovery │ ◄───┴───────────────────┘
└─────────────────────┘                  (skip if URLs/manifest)
        │
        ▼
┌─────────────────────┐
│ web-crawler         │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│ content-repository  │ → Store in MySQL
└─────────────────────┘
        │
        ▼
┌─────────────────────┐
│ content-distiller   │ → Summarize & extract  ◄────┐
└─────────────────────┘                             │
        │                                           │
        ▼                                           │
┌─────────────────────┐                             │
│ quality-reviewer    │ → QA loop                   │
└─────────────────────┘                             │
        │                                           │
        ├── REFACTOR (max 3) ───────────────────────┤
        ├── DEEP_RESEARCH (max 2) → crawler ────────┘
        │
        ▼ APPROVE
┌─────────────────────┐
│ markdown-exporter   │ → Project files / Fine-tuning
└─────────────────────┘
```
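The QA loop in the diagram can be sketched in Python as follows. This is a minimal illustration, not the orchestrator's actual implementation: the function `run_qa_loop`, its three callbacks, and the returned status strings are hypothetical names. The diagram also does not say what happens once a retry limit is exhausted, so stopping with `"limit_reached"` is an assumption.

```python
from typing import Callable, Optional, Tuple

MAX_REFACTOR = 3        # "REFACTOR (max 3)" in the diagram
MAX_DEEP_RESEARCH = 2   # "DEEP_RESEARCH (max 2)" in the diagram


def run_qa_loop(
    distill: Callable[[Optional[str]], str],
    review: Callable[[str], Tuple[str, Optional[str]]],
    crawl_more: Callable[[], None],
) -> str:
    """Distill, review, and retry until approval, rejection, or a limit.

    All three callbacks are hypothetical stand-ins for the
    content-distiller, quality-reviewer, and web-crawler skills.
    """
    refactors = 0
    deep_researches = 0
    feedback: Optional[str] = None

    while True:
        draft = distill(feedback)
        decision, feedback = review(draft)

        if decision == "approve":
            return "approved"  # hand off to markdown-exporter
        if decision == "reject":
            return "rejected"
        if decision == "refactor" and refactors < MAX_REFACTOR:
            refactors += 1
            continue  # re-distill with the reviewer's feedback
        if decision == "deep_research" and deep_researches < MAX_DEEP_RESEARCH:
            deep_researches += 1
            crawl_more()  # gather more sources, then re-distill
            continue
        # Assumption: an exhausted limit (or unknown decision) stops the loop.
        return "limit_reached"
```

With a reviewer that always answers `refactor`, this sketch distills four times (the first pass plus three retries) before giving up.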

## Quality Scoring Thresholds

| Score | Decision | Action |
|-------|----------|--------|
| ≥ 0.85 | **Approve** | Ready for export |
| 0.60 to < 0.85 | **Refactor** | Re-distill with feedback |
| 0.40 to < 0.60 | **Deep Research** | Gather more sources |
| < 0.40 | **Reject** | Archive (low quality) |
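The table maps directly onto a small routing function. The sketch below is illustrative only: `route_by_score` and the lowercase decision strings are hypothetical names, and it assumes scores lie in [0, 1] and that any score below 0.85 but at least 0.60 (including values such as 0.845) counts as Refactor.

```python
def route_by_score(score: float) -> str:
    """Map a quality score in [0, 1] to a routing decision."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= 0.85:
        return "approve"        # ready for export
    if score >= 0.60:
        return "refactor"       # re-distill with feedback
    if score >= 0.40:
        return "deep_research"  # gather more sources
    return "reject"             # archive as low quality
```

Checking the thresholds from highest to lowest keeps each branch to a single comparison and leaves no gap between bands.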

## Source Credibility Tiers

| Tier | Source Type | Examples |
|------|-------------|----------|
| **Tier 1** | Official documentation | docs.anthropic.com, platform.openai.com/docs |
| **Tier 1** | Official engineering blogs | anthropic.com/news, openai.com/blog |
| **Tier 2** | Research papers | arxiv.org papers with citations |
| **Tier 2** | Verified community guides | Official cookbooks, tutorials |
| **Tier 3** | Community content | Blog posts, Stack Overflow |
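One way to apply these tiers is a simple lookup by host and path prefix. The sketch below is an assumption-laden illustration, not the reference-discovery skill's actual logic: `TIER_BY_PREFIX` and `credibility_tier` are hypothetical names, the lookup contains only the example sites from the table, and anything unmatched defaults to Tier 3. It cannot check criteria such as whether an arXiv paper has citations or whether a community guide is verified.

```python
from urllib.parse import urlparse

# Hypothetical lookup built only from the examples in the table above.
TIER_BY_PREFIX = {
    "docs.anthropic.com": 1,
    "platform.openai.com/docs": 1,
    "anthropic.com/news": 1,
    "openai.com/blog": 1,
    "arxiv.org": 2,
}
DEFAULT_TIER = 3  # assumption: unknown sources are treated as community content


def credibility_tier(url: str) -> int:
    """Return the credibility tier (1 to 3) for a URL."""
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")  # Python 3.9+
    location = host + parsed.path
    for prefix, tier in TIER_BY_PREFIX.items():
        # Match the prefix exactly or at a path boundary, so that
        # "anthropic.com/newsroom" does not match "anthropic.com/news".
        if location == prefix or location.startswith(prefix + "/"):
            return tier
    return DEFAULT_TIER
```

For example, `credibility_tier("https://docs.anthropic.com/en/docs/prompt-caching")` falls under the first entry and is treated as Tier 1.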

## Files in This Project

- `INDEX.md` - This overview file
- `reference-curator-complete.md` - All 7 skills in one file (recommended)
- `01-reference-discovery.md` - Source discovery skill
- `02-web-crawler.md` - Crawling orchestration skill
- `03-content-repository.md` - Database storage skill
- `04-content-distiller.md` - Content summarization skill
- `05-quality-reviewer.md` - QA review skill
- `06-markdown-exporter.md` - Export skill
- `07-pipeline-orchestrator.md` - Full pipeline orchestration

## Usage

Upload all files to a Claude.ai Project, or upload only the skills you need.

For the complete experience, upload `reference-curator-complete.md`, which contains all seven skills in one file.

## Pipeline Orchestrator Options

| Option | Default | Description |
|--------|---------|-------------|
| max_sources | 10 | Max sources to discover |
| max_pages | 50 | Max pages per source |
| auto_approve | false | Automatically approve content scoring at or above `threshold` |
| threshold | 0.85 | Approval threshold |
| max_iterations | 3 | Max QA loop iterations |
| export_format | project_files | Output format |
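For readers who think in code, the options and defaults above can be modeled as a small configuration object. This is a sketch under stated assumptions: the `PipelineOptions` class is hypothetical, the field types are inferred from the default values, and the validation rules are not documented behavior of the orchestrator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineOptions:
    """Hypothetical container mirroring the options table."""

    max_sources: int = 10                 # max sources to discover
    max_pages: int = 50                   # max pages per source
    auto_approve: bool = False            # auto-approve above threshold
    threshold: float = 0.85               # approval threshold
    max_iterations: int = 3               # max QA loop iterations
    export_format: str = "project_files"  # output format

    def __post_init__(self) -> None:
        # Assumed sanity checks; the source does not specify any validation.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        for name in ("max_sources", "max_pages", "max_iterations"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")
```

Overriding a single option, for example `PipelineOptions(auto_approve=True)`, leaves every other field at its documented default.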