
---
name: quality-reviewer
description: QA loop for reference library content. Scores distilled materials against prompt engineering quality criteria, routes decisions (approve/refactor/deep_research/reject), and provides actionable feedback. Triggers on "review content", "quality check", "QA review", "assess distilled content", "check reference quality", "refactoring needed".
---

# Quality Reviewer

Evaluates distilled content for quality, routes decisions, and triggers refactoring or additional research when needed.

## Review Workflow

```
[Distilled Content]
       │
       ▼
┌─────────────────┐
│ Score Criteria  │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
       │
       ▼
┌─────────────────┐
│ Calculate Total │ → weighted average
└─────────────────┘
       │
       ├── ≥ 0.85 → APPROVE → markdown-exporter
       ├── 0.60-0.84 → REFACTOR → content-distiller (with instructions)
       ├── 0.40-0.59 → DEEP_RESEARCH → web-crawler-orchestrator (with queries)
       └── < 0.40 → REJECT → archive with reason
```

## Scoring Criteria

| Criterion | Weight | Checks |
|-----------|--------|--------|
| Accuracy | 0.25 | Factual correctness, up-to-date info, proper attribution |
| Completeness | 0.20 | Covers key concepts, includes examples, addresses edge cases |
| Clarity | 0.20 | Clear structure, concise language, logical flow |
| PE Quality | 0.25 | Demonstrates techniques, before/after examples, explains why |
| Usability | 0.10 | Easy to reference, searchable keywords, appropriate length |

## Decision Thresholds

| Score Range | Decision | Action |
|-------------|----------|--------|
| ≥ 0.85 | approve | Proceed to export |
| 0.60 - 0.84 | refactor | Return to distiller with feedback |
| 0.40 - 0.59 | deep_research | Gather more sources, then re-distill |
| < 0.40 | reject | Archive, log reason |

## Review Process

### Step 1: Load Content for Review

```python
def get_pending_reviews(cursor):
    """Fetch distilled content awaiting review, most credible sources first."""
    sql = """
    SELECT dc.distill_id, dc.doc_id, d.title, d.url,
           dc.summary, dc.key_concepts, dc.structured_content,
           dc.token_count_original, dc.token_count_distilled,
           s.credibility_tier
    FROM distilled_content dc
    JOIN documents d ON dc.doc_id = d.doc_id
    JOIN sources s ON d.source_id = s.source_id
    WHERE dc.review_status = 'pending'
    ORDER BY s.credibility_tier ASC, dc.distill_date ASC
    """
    cursor.execute(sql)
    return cursor.fetchall()
```
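
A minimal invocation sketch, assuming `mysql-connector-python` and placeholder credentials (in practice these would come from `~/.reference-curator.env`):

```python
# Sketch only: connection details below are placeholders, not the skill's real config.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="curator",
    password="change-me", database="reference_curator",
)
cursor = conn.cursor()
for distill_id, doc_id, title, *rest in get_pending_reviews(cursor):
    print(f"Pending review: {title} (distill_id={distill_id})")
```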

### Step 2: Score Each Criterion

Evaluate content against each criterion using this assessment template:

```python
assessment_template = {
    "accuracy": {
        "score": 0.0,  # 0.00 - 1.00
        "notes": "",
        "issues": []   # Specific factual errors if any
    },
    "completeness": {
        "score": 0.0,
        "notes": "",
        "missing_topics": []  # Concepts that should be covered
    },
    "clarity": {
        "score": 0.0,
        "notes": "",
        "confusing_sections": []  # Sections needing rewrite
    },
    "prompt_engineering_quality": {
        "score": 0.0,
        "notes": "",
        "improvements": []  # Specific PE technique gaps
    },
    "usability": {
        "score": 0.0,
        "notes": "",
        "suggestions": []
    }
}
```

### Step 3: Calculate Final Score

```python
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10
}

def calculate_quality_score(assessment):
    """Weighted average of the per-criterion scores."""
    return sum(
        assessment[criterion]["score"] * weight
        for criterion, weight in WEIGHTS.items()
    )
```
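
For example, a hypothetical assessment scoring 0.90 accuracy, 0.70 completeness, 0.80 clarity, 0.85 PE quality, and 0.90 usability works out to a weighted total of 0.8275, which lands in the refactor band:

```python
# Illustrative scores only — not output from a real review.
sample = {
    "accuracy": {"score": 0.90},
    "completeness": {"score": 0.70},
    "clarity": {"score": 0.80},
    "prompt_engineering_quality": {"score": 0.85},
    "usability": {"score": 0.90},
}
total = calculate_quality_score(sample)
# 0.90*0.25 + 0.70*0.20 + 0.80*0.20 + 0.85*0.25 + 0.90*0.10 ≈ 0.8275
print(f"{total:.4f}")  # 0.8275 → 0.60-0.84 band → refactor
```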

### Step 4: Route Decision

```python
def determine_decision(score, assessment):
    """Map a weighted score to (decision, refactor_instructions, research_queries)."""
    if score >= 0.85:
        return "approve", None, None
    elif score >= 0.60:
        instructions = generate_refactor_instructions(assessment)
        return "refactor", instructions, None
    elif score >= 0.40:
        queries = generate_research_queries(assessment)
        return "deep_research", None, queries
    else:
        return "reject", f"Quality score {score:.2f} below minimum threshold", None

def generate_refactor_instructions(assessment):
    """Extract actionable feedback from low-scoring criteria."""
    instructions = []
    for criterion, data in assessment.items():
        if data["score"] < 0.80:
            if data.get("issues"):
                instructions.extend(data["issues"])
            if data.get("missing_topics"):
                instructions.append(f"Add coverage for: {', '.join(data['missing_topics'])}")
            if data.get("improvements"):
                instructions.extend(data["improvements"])
    return "\n".join(instructions)

def generate_research_queries(assessment):
    """Generate search queries for content gaps."""
    queries = []
    if assessment["completeness"]["missing_topics"]:
        for topic in assessment["completeness"]["missing_topics"]:
            queries.append(f"{topic} documentation guide")
    if assessment["accuracy"]["issues"]:
        queries.append("latest official documentation verification")
    return queries
```

### Step 5: Log Review Decision

```python
import json

def log_review(cursor, distill_id, assessment, score, decision, instructions=None, queries=None):
    # Determine the next review round number for this distillation
    cursor.execute(
        "SELECT COALESCE(MAX(review_round), 0) + 1 FROM review_logs WHERE distill_id = %s",
        (distill_id,)
    )
    review_round = cursor.fetchone()[0]

    sql = """
    INSERT INTO review_logs
    (distill_id, review_round, reviewer_type, quality_score, assessment,
     decision, refactor_instructions, research_queries)
    VALUES (%s, %s, 'claude_review', %s, %s, %s, %s, %s)
    """
    cursor.execute(sql, (
        distill_id, review_round, score,
        json.dumps(assessment), decision, instructions,
        json.dumps(queries) if queries else None
    ))

    # Update distilled_content status
    status_map = {
        "approve": "approved",
        "refactor": "needs_refactor",
        "deep_research": "needs_refactor",
        "reject": "rejected"
    }
    cursor.execute(
        "UPDATE distilled_content SET review_status = %s WHERE distill_id = %s",
        (status_map[decision], distill_id)
    )
```
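
Putting Steps 3-5 together, one review pass might look like the sketch below; the connection object and commit handling are assumptions about the caller's setup, not something this skill defines.

```python
# Sketch of a single review pass; `conn`, `cursor`, `distill_id`, and
# `assessment` are assumed to exist from the earlier steps.
score = calculate_quality_score(assessment)
decision, instructions, queries = determine_decision(score, assessment)
log_review(cursor, distill_id, assessment, score, decision, instructions, queries)
conn.commit()  # persist the review log and the status update together
```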

## Prompt Engineering Quality Checklist

When scoring `prompt_engineering_quality`, verify:

- Demonstrates specific techniques (CoT, few-shot, etc.)
- Shows before/after examples
- Explains why techniques work, not just what
- Provides actionable patterns
- Includes edge cases and failure modes
- References authoritative sources
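
One illustrative way to reduce this checklist to a numeric score is the fraction of items satisfied; the helper below is a heuristic sketch, not behavior the skill prescribes.

```python
# Hypothetical helper: checklist items as flags, score = fraction passed.
PE_CHECKS = frozenset({
    "demonstrates_specific_techniques",
    "shows_before_after_examples",
    "explains_why_not_just_what",
    "provides_actionable_patterns",
    "covers_edge_cases_and_failure_modes",
    "references_authoritative_sources",
})

def score_pe_quality(passed):
    """Return the fraction of PE checklist items the content satisfies."""
    return len(PE_CHECKS & set(passed)) / len(PE_CHECKS)
```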

## Auto-Approve Rules

When configured, Tier 1 (official) sources with a score ≥ 0.80 may be auto-approved without human review:

```yaml
# In export_config.yaml
quality:
  auto_approve_tier1_sources: true
  auto_approve_min_score: 0.80
```
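
A minimal sketch of how these flags might gate the decision, assuming the config is read with PyYAML and that `credibility_tier == 1` denotes Tier 1 (official) sources:

```python
import yaml

def should_auto_approve(score, credibility_tier, path="export_config.yaml"):
    # Hypothetical gate: the loader and the tier encoding are assumptions.
    with open(path) as f:
        quality = yaml.safe_load(f)["quality"]
    return (
        quality.get("auto_approve_tier1_sources", False)
        and credibility_tier == 1
        and score >= quality.get("auto_approve_min_score", 0.80)
    )
```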

## Integration Points

| From | Action | To |
|------|--------|-----|
| content-distiller | Sends distilled content | quality-reviewer |
| quality-reviewer | APPROVE | markdown-exporter |
| quality-reviewer | REFACTOR + instructions | content-distiller |
| quality-reviewer | DEEP_RESEARCH + queries | web-crawler-orchestrator |
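
In code, this routing could be a simple dispatch. The handler names below are placeholders; each downstream skill defines its own actual entry point.

```python
# Placeholder handlers standing in for the downstream skills' interfaces.
def export_markdown(payload): ...        # markdown-exporter
def redistill(payload, notes): ...       # content-distiller
def crawl_sources(queries): ...          # web-crawler-orchestrator
def archive_with_reason(payload): ...    # reject path

def route(decision, payload):
    if decision == "approve":
        export_markdown(payload)
    elif decision == "refactor":
        redistill(payload, payload["refactor_instructions"])
    elif decision == "deep_research":
        crawl_sources(payload["research_queries"])
    else:
        archive_with_reason(payload)
```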