our-claude-skills/custom-skills/90-reference-curator/USER-GUIDE.md

# Reference Curator - User Guide

A modular skill suite for curating reference documentation with Claude Code.

## Quick Start

### One-Shot Full Pipeline (Recommended)

```bash
# Research a topic and generate reference library
/reference-curator-pipeline "Claude Code best practices" --auto-approve

# From specific URLs
/reference-curator-pipeline https://docs.anthropic.com/en/docs/agents

# With options
/reference-curator-pipeline "MCP servers" --max-sources 10 --export-format fine_tuning
```

### Run as Background Agent

Ask Claude Code:
> "Run reference-curator as a background agent for topic X"

This launches autonomous processing while you continue other work.

---

## Individual Skills

Use these when you need granular control over each stage.

### 1. Reference Discovery

**Purpose:** Search and discover authoritative sources

```bash
/reference-discovery "prompt engineering techniques"
```

**Output:** Discovery manifest (JSON) with scored URLs

| Tier | Score | Sources |
|------|-------|---------|
| tier1_official | ≥0.60 | Official docs, vendor blogs |
| tier2_verified | 0.40-0.59 | Research papers, cookbooks |
| tier3_community | <0.40 | Tutorials, blog posts |

---

### 2. Web Crawler

**Purpose:** Crawl URLs and save raw content

```bash
# From discovery manifest
/web-crawler /path/to/manifest.json

# Single URL
/web-crawler https://docs.example.com/guide
```

**Output:** Markdown files in `~/Documents/05_AI Agent/10_Reference Library/raw/YYYY/MM/`

**Crawler Selection:**
| Crawler | Best For |
|---------|----------|
| Firecrawl (default) | SPAs, JS-rendered sites |
| WebFetch | Simple documentation pages |
| Node.js | Small static sites (≤50 pages) |

---

### 3. Content Repository

**Purpose:** Store documents in MySQL with versioning

```bash
# Store crawled content
/content-repository store --topic "my-topic"

# Query existing documents
/content-repository query --topic "my-topic"
```

**Prerequisites:** MySQL database `reference_library` with schema applied.

---

### 4. Content Distiller

**Purpose:** Summarize and extract key concepts

```bash
/content-distiller --topic "my-topic"
```

**Output per document:**
- Executive summary (2-3 sentences)
- Key concepts (JSON)
- Code snippets (JSON)
- Structured content (Markdown)
- ~15-20% compression ratio

---

### 5. Quality Reviewer

**Purpose:** Score and approve content

```bash
/quality-reviewer --topic "my-topic"
```

**Scoring Criteria:**
| Criterion | Weight |
|-----------|--------|
| Accuracy | 25% |
| Completeness | 20% |
| Clarity | 20% |
| PE Quality | 25% |
| Usability | 10% |

**Decisions:**
| Score | Decision | Action |
|-------|----------|--------|
| ≥0.85 | APPROVE | Ready for export |
| 0.60-0.84 | REFACTOR | Re-distill with feedback |
| 0.40-0.59 | DEEP_RESEARCH | Crawl more sources |
| <0.40 | REJECT | Archive |

---

### 6. Markdown Exporter

**Purpose:** Generate final export files

```bash
# Project files (for Claude Projects)
/markdown-exporter --topic "my-topic" --format project_files

# Fine-tuning dataset (JSONL)
/markdown-exporter --topic "my-topic" --format fine_tuning
```

**Output Structure:**
```
~/Documents/05_AI Agent/10_Reference Library/exports/
├── INDEX.md
└── my-topic/
    ├── _index.md
    ├── 01-document-one.md
    ├── 02-document-two.md
    └── ...
```

---

## Common Workflows

### Workflow 1: Quick Reference on a Topic

```
You: "Create a reference library on n8n self-hosting"
Claude: [Runs full pipeline automatically]
```

### Workflow 2: Curate Specific URLs

```
You: "Add these URLs to my reference library:
      - https://docs.example.com/guide1
      - https://docs.example.com/guide2"
Claude: [Skips discovery, crawls URLs directly]
```

### Workflow 3: Update Existing Topic

```
You: "Update the MCP developer manual with latest SDK docs"
Claude: [Discovers new sources, crawls, merges with existing]
```

### Workflow 4: Export for Fine-Tuning

```
You: "Export all approved content as JSONL for fine-tuning"
Claude: /markdown-exporter --format fine_tuning
```

---

## Configuration

### Storage Paths

| Type | Location |
|------|----------|
| Raw content | `~/Documents/05_AI Agent/10_Reference Library/raw/` |
| Exports | `~/Documents/05_AI Agent/10_Reference Library/exports/` |
| Config | `~/.config/reference-curator/` |

### MySQL Setup

Database credentials via `~/.my.cnf`:
```ini
[client]
user = root
password = your_password
socket = /tmp/mysql.sock
```

### Config Files

```
~/.config/reference-curator/
├── db_config.yaml      # Database connection
├── crawl_config.yaml   # Crawler settings
└── export_config.yaml  # Export preferences
```

---

## Tips

### For Best Results

1. **Be specific with topics** - "Claude Code MCP integration" > "MCP"
2. **Use official sources first** - tier1 sources get higher quality scores
3. **Review before export** - Check quality scores, refine if needed
4. **Use background agents** - For large topics, run as background task

### Troubleshooting

| Issue | Solution |
|-------|----------|
| MySQL auth failed | Check `~/.my.cnf` permissions (chmod 600) |
| Missing table | Run schema.sql from `shared/schema.sql` |
| Low quality scores | Add more tier1 sources, re-distill |
| Crawl timeout | Try different crawler or reduce max_pages |
| Export empty | Check documents are approved in quality review |

### Checking Pipeline Status

```sql
-- View pending documents
SELECT title, review_status FROM distilled_content
JOIN documents USING (doc_id) WHERE review_status = 'pending';

-- View export history
SELECT export_name, total_documents, status FROM export_jobs;
```

---

## Example Session

```
You: Create a reference library on "Anthropic Claude API best practices"

Claude: Starting reference-curator pipeline...

[1/6] Reference Discovery
- Searching docs.anthropic.com, github.com/anthropics...
- Found 12 authoritative sources (8 tier1, 3 tier2, 1 tier3)

[2/6] Web Crawling
- Crawling 12 URLs with WebFetch...
- Saved 12 markdown files

[3/6] Content Repository
- Stored 12 documents in MySQL
- Created topic: claude-api-best-practices

[4/6] Content Distillation
- Distilled 12 documents (~18% avg compression)
- Extracted 45 key concepts

[5/6] Quality Review
- Scored all documents
- 11 APPROVED (≥0.85), 1 REFACTOR (0.78)
- Re-distilling 1 document... APPROVED (0.86)

[6/6] Markdown Export
- Generated 14 files (12 docs + 2 indexes)
- Output: ~/Documents/.../exports/claude-api-best-practices/

Pipeline complete! Reference library ready.
```

---

## Need Help?

- **Skill source code:** `custom-skills/90-reference-curator/`
- **MySQL schema:** `shared/schema.sql`
- **Config templates:** `shared/config/`