feat(reference-curator): Add portable skill suite for reference documentation curation

6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:20:27 +07:00
parent e80056ae8a
commit 6d7a6d7a88
26 changed files with 4486 additions and 1 deletion


@@ -0,0 +1,103 @@
# Quality Reviewer
QA loop for reference library content. Scores distilled materials, routes decisions, and provides actionable feedback.
## Trigger Keywords
"review content", "quality check", "QA review", "assess distilled content", "check reference quality"
## Decision Flow
```
[Distilled Content]
┌─────────────────┐
│ Score Criteria │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
├── ≥ 0.85 → APPROVE → markdown-exporter
├── 0.60-0.84 → REFACTOR → content-distiller
├── 0.40-0.59 → DEEP_RESEARCH → web-crawler
└── < 0.40 → REJECT → archive
```
## Scoring Criteria
| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date, attribution |
| **Completeness** | 0.20 | Key concepts, examples, edge cases |
| **Clarity** | 0.20 | Structure, concise language, logical flow |
| **PE Quality** | 0.25 | Techniques, before/after, explains why |
| **Usability** | 0.10 | Easy reference, searchable, appropriate length |
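The table's weights combine into a single score via a weighted average, which then falls into one of the decision bands shown in the flow above. A minimal sketch (the helper names `weighted_score` and `route` are illustrative, not the bundled scripts):

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "pe_quality": 0.25,
    "usability": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Weighted average of per-criterion scores (each 0.00-1.00)."""
    return sum(scores[c] * w for c, w in WEIGHTS.items())

def route(score: float) -> str:
    """Map a final score onto the decision bands above."""
    if score >= 0.85:
        return "approve"
    if score >= 0.60:
        return "refactor"
    if score >= 0.40:
        return "deep_research"
    return "reject"

scores = {"accuracy": 0.90, "completeness": 0.80, "clarity": 0.85,
          "pe_quality": 0.75, "usability": 0.90}
final = weighted_score(scores)  # ≈ 0.8325, which lands in the refactor band
```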
## Workflow
### Step 1: Load Pending Reviews
```bash
python scripts/load_pending_reviews.py --output pending.json
```
### Step 2: Score Content
```bash
python scripts/score_content.py --distill-id 123 --output assessment.json
```
### Step 3: Calculate Final Score
```bash
python scripts/calculate_score.py --assessment assessment.json
```
### Step 4: Route Decision
```bash
python scripts/route_decision.py --distill-id 123 --score 0.78
```
Outputs:
- `approve` → Ready for export
- `refactor` → Return to distiller with instructions
- `deep_research` → Need more sources (queries generated)
- `reject` → Archive with reason
### Step 5: Log Review
```bash
python scripts/log_review.py --distill-id 123 --decision refactor --instructions "Add more examples"
```
## PE Quality Checklist
When scoring `prompt_engineering_quality`:
- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources
## Auto-Approve Rules
Tier 1 sources with score ≥ 0.80 may auto-approve:
```yaml
# In config
quality:
auto_approve_tier1_sources: true
auto_approve_min_score: 0.80
```
## Scripts
- `scripts/load_pending_reviews.py` - Get pending reviews
- `scripts/score_content.py` - Multi-criteria scoring
- `scripts/calculate_score.py` - Weighted average calculation
- `scripts/route_decision.py` - Decision routing logic
- `scripts/log_review.py` - Log review to database
- `scripts/generate_feedback.py` - Generate refactor instructions
## Integration
| From | Action | To |
|------|--------|-----|
| content-distiller | Distilled content | quality-reviewer |
| quality-reviewer | APPROVE | markdown-exporter |
| quality-reviewer | REFACTOR + instructions | content-distiller |
| quality-reviewer | DEEP_RESEARCH + queries | web-crawler-orchestrator |


@@ -0,0 +1,227 @@
---
name: quality-reviewer
description: QA loop for reference library content. Scores distilled materials against prompt engineering quality criteria, routes decisions (approve/refactor/deep_research/reject), and provides actionable feedback. Triggers on "review content", "quality check", "QA review", "assess distilled content", "check reference quality", "refactoring needed".
---
# Quality Reviewer
Evaluates distilled content for quality, routes decisions, and triggers refactoring or additional research when needed.
## Review Workflow
```
[Distilled Content]
┌─────────────────┐
│ Score Criteria │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
┌─────────────────┐
│ Calculate Total │ → weighted average
└─────────────────┘
├── ≥ 0.85 → APPROVE → markdown-exporter
├── 0.60-0.84 → REFACTOR → content-distiller (with instructions)
├── 0.40-0.59 → DEEP_RESEARCH → web-crawler-orchestrator (with queries)
└── < 0.40 → REJECT → archive with reason
```
## Scoring Criteria
| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date info, proper attribution |
| **Completeness** | 0.20 | Covers key concepts, includes examples, addresses edge cases |
| **Clarity** | 0.20 | Clear structure, concise language, logical flow |
| **PE Quality** | 0.25 | Demonstrates techniques, before/after examples, explains why |
| **Usability** | 0.10 | Easy to reference, searchable keywords, appropriate length |
## Decision Thresholds
| Score Range | Decision | Action |
|-------------|----------|--------|
| ≥ 0.85 | `approve` | Proceed to export |
| 0.60 - 0.84 | `refactor` | Return to distiller with feedback |
| 0.40 - 0.59 | `deep_research` | Gather more sources, then re-distill |
| < 0.40 | `reject` | Archive, log reason |
## Review Process
### Step 1: Load Content for Review
```python
def get_pending_reviews(cursor):
sql = """
SELECT dc.distill_id, dc.doc_id, d.title, d.url,
dc.summary, dc.key_concepts, dc.structured_content,
dc.token_count_original, dc.token_count_distilled,
s.credibility_tier
FROM distilled_content dc
JOIN documents d ON dc.doc_id = d.doc_id
JOIN sources s ON d.source_id = s.source_id
WHERE dc.review_status = 'pending'
ORDER BY s.credibility_tier ASC, dc.distill_date ASC
"""
cursor.execute(sql)
return cursor.fetchall()
```
### Step 2: Score Each Criterion
Evaluate content against each criterion using this assessment template:
```python
assessment_template = {
"accuracy": {
"score": 0.0, # 0.00 - 1.00
"notes": "",
"issues": [] # Specific factual errors if any
},
"completeness": {
"score": 0.0,
"notes": "",
"missing_topics": [] # Concepts that should be covered
},
"clarity": {
"score": 0.0,
"notes": "",
"confusing_sections": [] # Sections needing rewrite
},
"prompt_engineering_quality": {
"score": 0.0,
"notes": "",
"improvements": [] # Specific PE technique gaps
},
"usability": {
"score": 0.0,
"notes": "",
"suggestions": []
}
}
```
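Before scoring, it can help to sanity-check a filled-in assessment against this template. The `validate_assessment` helper below is a hypothetical addition, not one of the bundled scripts:

```python
# Criteria every assessment must cover, matching the template above.
REQUIRED_CRITERIA = {"accuracy", "completeness", "clarity",
                     "prompt_engineering_quality", "usability"}

def validate_assessment(assessment: dict) -> list:
    """Return a list of problems; an empty list means the assessment is usable."""
    problems = []
    missing = REQUIRED_CRITERIA - assessment.keys()
    if missing:
        problems.append(f"missing criteria: {sorted(missing)}")
    for criterion in REQUIRED_CRITERIA & assessment.keys():
        score = assessment[criterion].get("score")
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            problems.append(f"{criterion}: score must be 0.00-1.00, got {score!r}")
    return problems
```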
### Step 3: Calculate Final Score
```python
WEIGHTS = {
"accuracy": 0.25,
"completeness": 0.20,
"clarity": 0.20,
"prompt_engineering_quality": 0.25,
"usability": 0.10
}

def calculate_quality_score(assessment):
return sum(
assessment[criterion]["score"] * weight
for criterion, weight in WEIGHTS.items()
)
```
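Applied to a hypothetical assessment, the weighted average works out as follows (`WEIGHTS` and `calculate_quality_score` are repeated so the sketch runs standalone):

```python
# Mirrors the WEIGHTS and calculate_quality_score defined above.
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10,
}

def calculate_quality_score(assessment):
    return sum(assessment[criterion]["score"] * weight
               for criterion, weight in WEIGHTS.items())

assessment = {
    "accuracy": {"score": 0.95},
    "completeness": {"score": 0.60},  # low: missing_topics would be populated
    "clarity": {"score": 0.80},
    "prompt_engineering_quality": {"score": 0.70},
    "usability": {"score": 0.90},
}
score = calculate_quality_score(assessment)  # ≈ 0.78 → refactor band (0.60-0.84)
```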
### Step 4: Route Decision
```python
def determine_decision(score, assessment):
if score >= 0.85:
return "approve", None, None
elif score >= 0.60:
instructions = generate_refactor_instructions(assessment)
return "refactor", instructions, None
elif score >= 0.40:
queries = generate_research_queries(assessment)
return "deep_research", None, queries
else:
        return "reject", f"Quality score {score:.2f} below minimum threshold", None

def generate_refactor_instructions(assessment):
"""Extract actionable feedback from low-scoring criteria."""
instructions = []
for criterion, data in assessment.items():
if data["score"] < 0.80:
if data.get("issues"):
instructions.extend(data["issues"])
if data.get("missing_topics"):
instructions.append(f"Add coverage for: {', '.join(data['missing_topics'])}")
if data.get("improvements"):
instructions.extend(data["improvements"])
    return "\n".join(instructions)

def generate_research_queries(assessment):
"""Generate search queries for content gaps."""
queries = []
if assessment["completeness"]["missing_topics"]:
for topic in assessment["completeness"]["missing_topics"]:
queries.append(f"{topic} documentation guide")
if assessment["accuracy"]["issues"]:
queries.append("latest official documentation verification")
return queries
```
### Step 5: Log Review Decision
```python
import json

def log_review(cursor, distill_id, assessment, score, decision, instructions=None, queries=None):
# Get current round number
cursor.execute(
"SELECT COALESCE(MAX(review_round), 0) + 1 FROM review_logs WHERE distill_id = %s",
(distill_id,)
)
review_round = cursor.fetchone()[0]
sql = """
INSERT INTO review_logs
(distill_id, review_round, reviewer_type, quality_score, assessment,
decision, refactor_instructions, research_queries)
VALUES (%s, %s, 'claude_review', %s, %s, %s, %s, %s)
"""
cursor.execute(sql, (
distill_id, review_round, score,
json.dumps(assessment), decision, instructions,
json.dumps(queries) if queries else None
))
# Update distilled_content status
status_map = {
"approve": "approved",
"refactor": "needs_refactor",
"deep_research": "needs_refactor",
"reject": "rejected"
}
cursor.execute(
"UPDATE distilled_content SET review_status = %s WHERE distill_id = %s",
(status_map[decision], distill_id)
)
```
## Prompt Engineering Quality Checklist
When scoring `prompt_engineering_quality`, verify:
- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work, not just *what*
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources
## Auto-Approve Rules
Tier 1 (official) sources with score ≥ 0.80 may auto-approve without human review if configured:
```yaml
# In export_config.yaml
quality:
auto_approve_tier1_sources: true
auto_approve_min_score: 0.80
```
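Applying that rule in code might look like the sketch below. `should_auto_approve` is a hypothetical helper; the `config` dict mirrors the YAML above, with parsing left to e.g. `yaml.safe_load`:

```python
def should_auto_approve(score: float, credibility_tier: int, config: dict) -> bool:
    """True when the config permits tier-1 auto-approval and the score clears the bar."""
    quality = config.get("quality", {})
    return (
        quality.get("auto_approve_tier1_sources", False)
        and credibility_tier == 1
        and score >= quality.get("auto_approve_min_score", 0.80)
    )

# Equivalent of the YAML snippet above after parsing.
config = {"quality": {"auto_approve_tier1_sources": True,
                      "auto_approve_min_score": 0.80}}
```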
## Integration Points
| From | Action | To |
|------|--------|-----|
| content-distiller | Sends distilled content | quality-reviewer |
| quality-reviewer | APPROVE | markdown-exporter |
| quality-reviewer | REFACTOR + instructions | content-distiller |
| quality-reviewer | DEEP_RESEARCH + queries | web-crawler-orchestrator |