# Quality Reviewer

Evaluates distilled content for quality, routes decisions, and triggers refactoring or additional research when needed.

## Review Workflow

```
[Distilled Content]
        │
        ▼
┌─────────────────┐
│ Score Criteria  │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
        │
        ▼
┌─────────────────┐
│ Calculate Total │ → weighted average
└─────────────────┘
        │
        ├── ≥ 0.85    → APPROVE       → markdown-exporter
        ├── 0.60-0.84 → REFACTOR      → content-distiller (with instructions)
        ├── 0.40-0.59 → DEEP_RESEARCH → web-crawler-orchestrator (with queries)
        └── < 0.40    → REJECT        → archive with reason
```

## Scoring Criteria

| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date info, proper attribution |
| **Completeness** | 0.20 | Covers key concepts, includes examples, addresses edge cases |
| **Clarity** | 0.20 | Clear structure, concise language, logical flow |
| **PE Quality** | 0.25 | Demonstrates techniques, before/after examples, explains why |
| **Usability** | 0.10 | Easy to reference, searchable keywords, appropriate length |

## Decision Thresholds

| Score Range | Decision | Action |
|-------------|----------|--------|
| ≥ 0.85 | `approve` | Proceed to export |
| 0.60 - 0.84 | `refactor` | Return to distiller with feedback |
| 0.40 - 0.59 | `deep_research` | Gather more sources, then re-distill |
| < 0.40 | `reject` | Archive, log reason |

## Review Process

### Step 1: Load Content for Review

```python
def get_pending_reviews(cursor):
    sql = """
        SELECT dc.distill_id, dc.doc_id, d.title, d.url,
               dc.summary, dc.key_concepts, dc.structured_content,
               dc.token_count_original, dc.token_count_distilled,
               s.credibility_tier
        FROM distilled_content dc
        JOIN documents d ON dc.doc_id = d.doc_id
        JOIN sources s ON d.source_id = s.source_id
        WHERE dc.review_status = 'pending'
        ORDER BY s.credibility_tier ASC, dc.distill_date ASC
    """
    cursor.execute(sql)
    return cursor.fetchall()
```

### Step 2: Score Each Criterion
Evaluate content against each criterion using this assessment template:

```python
assessment_template = {
    "accuracy": {
        "score": 0.0,  # 0.00 - 1.00
        "notes": "",
        "issues": []  # Specific factual errors if any
    },
    "completeness": {
        "score": 0.0,
        "notes": "",
        "missing_topics": []  # Concepts that should be covered
    },
    "clarity": {
        "score": 0.0,
        "notes": "",
        "confusing_sections": []  # Sections needing rewrite
    },
    "prompt_engineering_quality": {
        "score": 0.0,
        "notes": "",
        "improvements": []  # Specific PE technique gaps
    },
    "usability": {
        "score": 0.0,
        "notes": "",
        "suggestions": []
    }
}
```

### Step 3: Calculate Final Score

```python
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10
}

def calculate_quality_score(assessment):
    return sum(
        assessment[criterion]["score"] * weight
        for criterion, weight in WEIGHTS.items()
    )
```

### Step 4: Route Decision

```python
def determine_decision(score, assessment):
    if score >= 0.85:
        return "approve", None, None
    elif score >= 0.60:
        instructions = generate_refactor_instructions(assessment)
        return "refactor", instructions, None
    elif score >= 0.40:
        queries = generate_research_queries(assessment)
        return "deep_research", None, queries
    else:
        return "reject", f"Quality score {score:.2f} below minimum threshold", None


def generate_refactor_instructions(assessment):
    """Extract actionable feedback from low-scoring criteria."""
    instructions = []
    for criterion, data in assessment.items():
        if data["score"] < 0.80:
            if data.get("issues"):
                instructions.extend(data["issues"])
            if data.get("missing_topics"):
                instructions.append(f"Add coverage for: {', '.join(data['missing_topics'])}")
            if data.get("improvements"):
                instructions.extend(data["improvements"])
    return "\n".join(instructions)


def generate_research_queries(assessment):
    """Generate search queries for content gaps."""
    queries = []
    if assessment["completeness"]["missing_topics"]:
        for topic in assessment["completeness"]["missing_topics"]:
            queries.append(f"{topic} documentation guide")
    if assessment["accuracy"]["issues"]:
        queries.append("latest official documentation verification")
    return queries
```

### Step 5: Log Review Decision

```python
import json

def log_review(cursor, distill_id, assessment, score, decision,
               instructions=None, queries=None):
    # Get current round number
    cursor.execute(
        "SELECT COALESCE(MAX(review_round), 0) + 1 FROM review_logs WHERE distill_id = %s",
        (distill_id,)
    )
    review_round = cursor.fetchone()[0]

    sql = """
        INSERT INTO review_logs
            (distill_id, review_round, reviewer_type, quality_score,
             assessment, decision, refactor_instructions, research_queries)
        VALUES (%s, %s, 'claude_review', %s, %s, %s, %s, %s)
    """
    cursor.execute(sql, (
        distill_id, review_round, score, json.dumps(assessment),
        decision, instructions,
        json.dumps(queries) if queries else None
    ))

    # Update distilled_content status
    status_map = {
        "approve": "approved",
        "refactor": "needs_refactor",
        "deep_research": "needs_refactor",
        "reject": "rejected"
    }
    cursor.execute(
        "UPDATE distilled_content SET review_status = %s WHERE distill_id = %s",
        (status_map[decision], distill_id)
    )
```

## Prompt Engineering Quality Checklist

When scoring `prompt_engineering_quality`, verify:

- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work, not just *what*
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources

## Auto-Approve Rules

Tier 1 (official) sources with score ≥ 0.80 may auto-approve without human review if configured:

```yaml
# In export_config.yaml
quality:
  auto_approve_tier1_sources: true
  auto_approve_min_score: 0.80
```

## Integration Points

| From | Action | To |
|------|--------|-----|
| content-distiller | Sends distilled content | quality-reviewer |
| quality-reviewer | APPROVE | markdown-exporter |
| quality-reviewer | REFACTOR + instructions | content-distiller |
| quality-reviewer | DEEP_RESEARCH + queries | web-crawler-orchestrator |
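The scoring, routing, and auto-approve rules above can be tied together in a small driver. The following is a minimal sketch, not the pipeline's actual entry point: it reuses the documented `WEIGHTS` and `calculate_quality_score`, while `route`, `may_auto_approve`, and the sample assessment values are illustrative assumptions.

```python
# Illustrative driver combining the scoring weights, decision
# thresholds, and tier-1 auto-approve rule from this document.
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10,
}

def calculate_quality_score(assessment):
    # Weighted average of per-criterion scores (Step 3)
    return sum(assessment[c]["score"] * w for c, w in WEIGHTS.items())

def route(score):
    # Decision thresholds table, top to bottom (Step 4)
    if score >= 0.85:
        return "approve"
    if score >= 0.60:
        return "refactor"
    if score >= 0.40:
        return "deep_research"
    return "reject"

def may_auto_approve(score, credibility_tier, config):
    # Auto-approve rule: tier-1 source plus configured minimum score
    return (
        config.get("auto_approve_tier1_sources", False)
        and credibility_tier == 1
        and score >= config.get("auto_approve_min_score", 0.80)
    )

# Hypothetical assessment scores for one distilled document
assessment = {
    "accuracy": {"score": 0.90},
    "completeness": {"score": 0.70},
    "clarity": {"score": 0.80},
    "prompt_engineering_quality": {"score": 0.75},
    "usability": {"score": 0.85},
}
config = {"auto_approve_tier1_sources": True, "auto_approve_min_score": 0.80}

score = calculate_quality_score(assessment)   # ≈ 0.7975
decision = route(score)                       # "refactor": 0.60 ≤ score < 0.85
auto = may_auto_approve(score, 1, config)     # False: 0.7975 < 0.80
```

Note that a document can fall just short of auto-approve (as here, 0.7975 vs. 0.80) yet still route to `refactor` rather than `reject`, so the auto-approve threshold only short-circuits human review, never the decision routing itself.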