feat(reference-curator): Add portable skill suite for reference documentation curation

6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:20:27 +07:00
parent e80056ae8a
commit 6d7a6d7a88
26 changed files with 4486 additions and 1 deletion


@@ -0,0 +1,103 @@
# Quality Reviewer
QA loop for reference library content. Scores distilled materials, routes decisions, and provides actionable feedback.
## Trigger Keywords
"review content", "quality check", "QA review", "assess distilled content", "check reference quality"
## Decision Flow
```
[Distilled Content]
┌─────────────────┐
│ Score Criteria │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
├── ≥ 0.85 → APPROVE → markdown-exporter
├── 0.60-0.84 → REFACTOR → content-distiller
├── 0.40-0.59 → DEEP_RESEARCH → web-crawler
└── < 0.40 → REJECT → archive
```
## Scoring Criteria
| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date, attribution |
| **Completeness** | 0.20 | Key concepts, examples, edge cases |
| **Clarity** | 0.20 | Structure, concise language, logical flow |
| **PE Quality** | 0.25 | Techniques, before/after, explains why |
| **Usability** | 0.10 | Easy reference, searchable, appropriate length |
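The table's weights combine into a single score via a weighted average, which then falls into one of the decision bands shown in the flow above. A minimal sketch (the helper names `weighted_score` and `route` are illustrative, not the bundled scripts):

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "pe_quality": 0.25,
    "usability": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Weighted average of per-criterion scores (each 0.00-1.00)."""
    return sum(scores[c] * w for c, w in WEIGHTS.items())

def route(score: float) -> str:
    """Map a final score onto the decision bands above."""
    if score >= 0.85:
        return "approve"
    if score >= 0.60:
        return "refactor"
    if score >= 0.40:
        return "deep_research"
    return "reject"

scores = {"accuracy": 0.90, "completeness": 0.80, "clarity": 0.85,
          "pe_quality": 0.75, "usability": 0.90}
final = weighted_score(scores)  # ≈ 0.8325, which lands in the refactor band
```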
## Workflow
### Step 1: Load Pending Reviews
```bash
python scripts/load_pending_reviews.py --output pending.json
```
### Step 2: Score Content
```bash
python scripts/score_content.py --distill-id 123 --output assessment.json
```
### Step 3: Calculate Final Score
```bash
python scripts/calculate_score.py --assessment assessment.json
```
### Step 4: Route Decision
```bash
python scripts/route_decision.py --distill-id 123 --score 0.78
```
Outputs:
- `approve` → Ready for export
- `refactor` → Return to distiller with instructions
- `deep_research` → Need more sources (queries generated)
- `reject` → Archive with reason
### Step 5: Log Review
```bash
python scripts/log_review.py --distill-id 123 --decision refactor --instructions "Add more examples"
```
## PE Quality Checklist
When scoring `prompt_engineering_quality`:
- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources
## Auto-Approve Rules
Tier 1 sources with score ≥ 0.80 may auto-approve:
```yaml
# In config
quality:
auto_approve_tier1_sources: true
auto_approve_min_score: 0.80
```
## Scripts
- `scripts/load_pending_reviews.py` - Get pending reviews
- `scripts/score_content.py` - Multi-criteria scoring
- `scripts/calculate_score.py` - Weighted average calculation
- `scripts/route_decision.py` - Decision routing logic
- `scripts/log_review.py` - Log review to database
- `scripts/generate_feedback.py` - Generate refactor instructions
## Integration
| From | Action | To |
|------|--------|-----|
| content-distiller | Distilled content | quality-reviewer |
| quality-reviewer | APPROVE | markdown-exporter |
| quality-reviewer | REFACTOR + instructions | content-distiller |
| quality-reviewer | DEEP_RESEARCH + queries | web-crawler-orchestrator |


@@ -0,0 +1,227 @@
---
name: quality-reviewer
description: QA loop for reference library content. Scores distilled materials against prompt engineering quality criteria, routes decisions (approve/refactor/deep_research/reject), and provides actionable feedback. Triggers on "review content", "quality check", "QA review", "assess distilled content", "check reference quality", "refactoring needed".
---
# Quality Reviewer
Evaluates distilled content for quality, routes decisions, and triggers refactoring or additional research when needed.
## Review Workflow
```
[Distilled Content]
┌─────────────────┐
│ Score Criteria │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
┌─────────────────┐
│ Calculate Total │ → weighted average
└─────────────────┘
├── ≥ 0.85 → APPROVE → markdown-exporter
├── 0.60-0.84 → REFACTOR → content-distiller (with instructions)
├── 0.40-0.59 → DEEP_RESEARCH → web-crawler-orchestrator (with queries)
└── < 0.40 → REJECT → archive with reason
```
## Scoring Criteria
| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date info, proper attribution |
| **Completeness** | 0.20 | Covers key concepts, includes examples, addresses edge cases |
| **Clarity** | 0.20 | Clear structure, concise language, logical flow |
| **PE Quality** | 0.25 | Demonstrates techniques, before/after examples, explains why |
| **Usability** | 0.10 | Easy to reference, searchable keywords, appropriate length |
## Decision Thresholds
| Score Range | Decision | Action |
|-------------|----------|--------|
| ≥ 0.85 | `approve` | Proceed to export |
| 0.60 - 0.84 | `refactor` | Return to distiller with feedback |
| 0.40 - 0.59 | `deep_research` | Gather more sources, then re-distill |
| < 0.40 | `reject` | Archive, log reason |
## Review Process
### Step 1: Load Content for Review
```python
def get_pending_reviews(cursor):
sql = """
SELECT dc.distill_id, dc.doc_id, d.title, d.url,
dc.summary, dc.key_concepts, dc.structured_content,
dc.token_count_original, dc.token_count_distilled,
s.credibility_tier
FROM distilled_content dc
JOIN documents d ON dc.doc_id = d.doc_id
JOIN sources s ON d.source_id = s.source_id
WHERE dc.review_status = 'pending'
ORDER BY s.credibility_tier ASC, dc.distill_date ASC
"""
cursor.execute(sql)
return cursor.fetchall()
```
### Step 2: Score Each Criterion
Evaluate content against each criterion using this assessment template:
```python
assessment_template = {
"accuracy": {
"score": 0.0, # 0.00 - 1.00
"notes": "",
"issues": [] # Specific factual errors if any
},
"completeness": {
"score": 0.0,
"notes": "",
"missing_topics": [] # Concepts that should be covered
},
"clarity": {
"score": 0.0,
"notes": "",
"confusing_sections": [] # Sections needing rewrite
},
"prompt_engineering_quality": {
"score": 0.0,
"notes": "",
"improvements": [] # Specific PE technique gaps
},
"usability": {
"score": 0.0,
"notes": "",
"suggestions": []
}
}
```
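Before scoring, it can help to sanity-check a filled-in assessment against this template. The `validate_assessment` helper below is a hypothetical addition, not one of the bundled scripts:

```python
# Criteria every assessment must cover, matching the template above.
REQUIRED_CRITERIA = {"accuracy", "completeness", "clarity",
                     "prompt_engineering_quality", "usability"}

def validate_assessment(assessment: dict) -> list:
    """Return a list of problems; an empty list means the assessment is usable."""
    problems = []
    missing = REQUIRED_CRITERIA - assessment.keys()
    if missing:
        problems.append(f"missing criteria: {sorted(missing)}")
    for criterion in REQUIRED_CRITERIA & assessment.keys():
        score = assessment[criterion].get("score")
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            problems.append(f"{criterion}: score must be 0.00-1.00, got {score!r}")
    return problems
```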
### Step 3: Calculate Final Score
```python
WEIGHTS = {
"accuracy": 0.25,
"completeness": 0.20,
"clarity": 0.20,
"prompt_engineering_quality": 0.25,
"usability": 0.10
}

def calculate_quality_score(assessment):
return sum(
assessment[criterion]["score"] * weight
for criterion, weight in WEIGHTS.items()
)
```
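Applied to a hypothetical assessment, the weighted average works out as follows (`WEIGHTS` and `calculate_quality_score` are repeated so the sketch runs standalone):

```python
# Mirrors the WEIGHTS and calculate_quality_score defined above.
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10,
}

def calculate_quality_score(assessment):
    return sum(assessment[criterion]["score"] * weight
               for criterion, weight in WEIGHTS.items())

assessment = {
    "accuracy": {"score": 0.95},
    "completeness": {"score": 0.60},  # low: missing_topics would be populated
    "clarity": {"score": 0.80},
    "prompt_engineering_quality": {"score": 0.70},
    "usability": {"score": 0.90},
}
score = calculate_quality_score(assessment)  # ≈ 0.78 → refactor band (0.60-0.84)
```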
### Step 4: Route Decision
```python
def determine_decision(score, assessment):
if score >= 0.85:
return "approve", None, None
elif score >= 0.60:
instructions = generate_refactor_instructions(assessment)
return "refactor", instructions, None
elif score >= 0.40:
queries = generate_research_queries(assessment)
return "deep_research", None, queries
else:
        return "reject", f"Quality score {score:.2f} below minimum threshold", None

def generate_refactor_instructions(assessment):
"""Extract actionable feedback from low-scoring criteria."""
instructions = []
for criterion, data in assessment.items():
if data["score"] < 0.80:
if data.get("issues"):
instructions.extend(data["issues"])
if data.get("missing_topics"):
instructions.append(f"Add coverage for: {', '.join(data['missing_topics'])}")
if data.get("improvements"):
instructions.extend(data["improvements"])
    return "\n".join(instructions)

def generate_research_queries(assessment):
"""Generate search queries for content gaps."""
queries = []
if assessment["completeness"]["missing_topics"]:
for topic in assessment["completeness"]["missing_topics"]:
queries.append(f"{topic} documentation guide")
if assessment["accuracy"]["issues"]:
queries.append("latest official documentation verification")
return queries
```
### Step 5: Log Review Decision
```python
import json

def log_review(cursor, distill_id, assessment, score, decision, instructions=None, queries=None):
# Get current round number
cursor.execute(
"SELECT COALESCE(MAX(review_round), 0) + 1 FROM review_logs WHERE distill_id = %s",
(distill_id,)
)
review_round = cursor.fetchone()[0]
sql = """
INSERT INTO review_logs
(distill_id, review_round, reviewer_type, quality_score, assessment,
decision, refactor_instructions, research_queries)
VALUES (%s, %s, 'claude_review', %s, %s, %s, %s, %s)
"""
cursor.execute(sql, (
distill_id, review_round, score,
json.dumps(assessment), decision, instructions,
json.dumps(queries) if queries else None
))
# Update distilled_content status
status_map = {
"approve": "approved",
"refactor": "needs_refactor",
"deep_research": "needs_refactor",
"reject": "rejected"
}
cursor.execute(
"UPDATE distilled_content SET review_status = %s WHERE distill_id = %s",
(status_map[decision], distill_id)
)
```
## Prompt Engineering Quality Checklist
When scoring `prompt_engineering_quality`, verify:
- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work, not just *what*
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources
## Auto-Approve Rules
Tier 1 (official) sources with score ≥ 0.80 may auto-approve without human review if configured:
```yaml
# In export_config.yaml
quality:
auto_approve_tier1_sources: true
auto_approve_min_score: 0.80
```
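Applying that rule in code might look like the sketch below. `should_auto_approve` is a hypothetical helper; the `config` dict mirrors the YAML above, with parsing left to e.g. `yaml.safe_load`:

```python
def should_auto_approve(score: float, credibility_tier: int, config: dict) -> bool:
    """True when the config permits tier-1 auto-approval and the score clears the bar."""
    quality = config.get("quality", {})
    return (
        quality.get("auto_approve_tier1_sources", False)
        and credibility_tier == 1
        and score >= quality.get("auto_approve_min_score", 0.80)
    )

# Equivalent of the YAML snippet above after parsing.
config = {"quality": {"auto_approve_tier1_sources": True,
                      "auto_approve_min_score": 0.80}}
```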
## Integration Points
| From | Action | To |
|------|--------|-----|
| content-distiller | Sends distilled content | quality-reviewer |
| quality-reviewer | APPROVE | markdown-exporter |
| quality-reviewer | REFACTOR + instructions | content-distiller |
| quality-reviewer | DEEP_RESEARCH + queries | web-crawler-orchestrator |