---
name: quality-reviewer
description: |
  Content quality evaluator with multi-criteria scoring and decision routing.
  Triggers: review quality, score content, QA review, approve refactor reject.
---

# Quality Reviewer

Evaluates distilled content for quality, routes decisions, and triggers refactoring or additional research when needed.

## Review Workflow

```
[Distilled Content]
        │
        ▼
┌─────────────────┐
│ Score Criteria  │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
        │
        ▼
┌─────────────────┐
│ Calculate Total │ → weighted average
└─────────────────┘
        │
        ├── ≥ 0.85    → APPROVE       → markdown-exporter
        ├── 0.60-0.84 → REFACTOR      → content-distiller (with instructions)
        ├── 0.40-0.59 → DEEP_RESEARCH → web-crawler-orchestrator (with queries)
        └── < 0.40    → REJECT        → archive with reason
```

## Scoring Criteria

| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date info, proper attribution |
| **Completeness** | 0.20 | Covers key concepts, includes examples, addresses edge cases |
| **Clarity** | 0.20 | Clear structure, concise language, logical flow |
| **PE Quality** | 0.25 | Demonstrates techniques, before/after examples, explains why |
| **Usability** | 0.10 | Easy to reference, searchable keywords, appropriate length |

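As a worked example of the weighted average (the scores are illustrative), a document rated 0.90 for accuracy, 0.80 for completeness, 0.85 for clarity, 0.90 for PE quality, and 0.70 for usability lands exactly on the approve threshold:

```python
scores = {"accuracy": 0.90, "completeness": 0.80, "clarity": 0.85,
          "prompt_engineering_quality": 0.90, "usability": 0.70}
weights = {"accuracy": 0.25, "completeness": 0.20, "clarity": 0.20,
           "prompt_engineering_quality": 0.25, "usability": 0.10}

# 0.90*0.25 + 0.80*0.20 + 0.85*0.20 + 0.90*0.25 + 0.70*0.10 = 0.85
total = sum(scores[c] * weights[c] for c in weights)
```

A total of 0.85 routes to `approve`; anything from 0.60 to 0.84 would go back for refactoring.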
## Decision Thresholds

| Score Range | Decision | Action |
|-------------|----------|--------|
| ≥ 0.85 | `approve` | Proceed to export |
| 0.60 - 0.84 | `refactor` | Return to distiller with feedback |
| 0.40 - 0.59 | `deep_research` | Gather more sources, then re-distill |
| < 0.40 | `reject` | Archive, log reason |

## Review Process

### Step 1: Load Content for Review

```python
def get_pending_reviews(cursor):
    """Fetch distilled content awaiting review, highest-credibility sources first."""
    sql = """
        SELECT dc.distill_id, dc.doc_id, d.title, d.url,
               dc.summary, dc.key_concepts, dc.structured_content,
               dc.token_count_original, dc.token_count_distilled,
               s.credibility_tier
        FROM distilled_content dc
        JOIN documents d ON dc.doc_id = d.doc_id
        JOIN sources s ON d.source_id = s.source_id
        WHERE dc.review_status = 'pending'
        ORDER BY s.credibility_tier ASC, dc.distill_date ASC
    """
    cursor.execute(sql)
    return cursor.fetchall()
```

### Step 2: Score Each Criterion

Evaluate content against each criterion using this assessment template:

```python
assessment_template = {
    "accuracy": {
        "score": 0.0,  # 0.00 - 1.00
        "notes": "",
        "issues": []  # Specific factual errors, if any
    },
    "completeness": {
        "score": 0.0,
        "notes": "",
        "missing_topics": []  # Concepts that should be covered
    },
    "clarity": {
        "score": 0.0,
        "notes": "",
        "confusing_sections": []  # Sections needing rewrite
    },
    "prompt_engineering_quality": {
        "score": 0.0,
        "notes": "",
        "improvements": []  # Specific PE technique gaps
    },
    "usability": {
        "score": 0.0,
        "notes": "",
        "suggestions": []
    }
}
```

### Step 3: Calculate Final Score

```python
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10
}

def calculate_quality_score(assessment):
    return sum(
        assessment[criterion]["score"] * weight
        for criterion, weight in WEIGHTS.items()
    )
```

### Step 4: Route Decision

```python
def determine_decision(score, assessment):
    if score >= 0.85:
        return "approve", None, None
    elif score >= 0.60:
        instructions = generate_refactor_instructions(assessment)
        return "refactor", instructions, None
    elif score >= 0.40:
        queries = generate_research_queries(assessment)
        return "deep_research", None, queries
    else:
        return "reject", f"Quality score {score:.2f} below minimum threshold", None

def generate_refactor_instructions(assessment):
    """Extract actionable feedback from low-scoring criteria."""
    instructions = []
    for criterion, data in assessment.items():
        if data["score"] < 0.80:
            if data.get("issues"):
                instructions.extend(data["issues"])
            if data.get("missing_topics"):
                instructions.append(f"Add coverage for: {', '.join(data['missing_topics'])}")
            if data.get("improvements"):
                instructions.extend(data["improvements"])
    return "\n".join(instructions)

def generate_research_queries(assessment):
    """Generate search queries for content gaps."""
    queries = []
    if assessment["completeness"]["missing_topics"]:
        for topic in assessment["completeness"]["missing_topics"]:
            queries.append(f"{topic} documentation guide")
    if assessment["accuracy"]["issues"]:
        queries.append("latest official documentation verification")
    return queries
```

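Putting Steps 3 and 4 together on a hypothetical mid-quality assessment (the helpers are repeated inline, simplified to return only the decision, so the sketch is self-contained; the scores are illustrative):

```python
WEIGHTS = {"accuracy": 0.25, "completeness": 0.20, "clarity": 0.20,
           "prompt_engineering_quality": 0.25, "usability": 0.10}

def calculate_quality_score(assessment):
    return sum(assessment[c]["score"] * w for c, w in WEIGHTS.items())

def decide(score):
    # Same thresholds as determine_decision, without the feedback plumbing
    if score >= 0.85:
        return "approve"
    elif score >= 0.60:
        return "refactor"
    elif score >= 0.40:
        return "deep_research"
    return "reject"

assessment = {
    "accuracy": {"score": 0.75},
    "completeness": {"score": 0.60},
    "clarity": {"score": 0.80},
    "prompt_engineering_quality": {"score": 0.70},
    "usability": {"score": 0.75},
}
score = calculate_quality_score(assessment)  # 0.7175
decision = decide(score)                     # "refactor"
```

The content scores 0.7175, so it goes back to content-distiller with the instructions built from its flagged issues.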
### Step 5: Log Review Decision

```python
import json

def log_review(cursor, distill_id, assessment, score, decision, instructions=None, queries=None):
    # Get the next review round number for this distill_id
    cursor.execute(
        "SELECT COALESCE(MAX(review_round), 0) + 1 FROM review_logs WHERE distill_id = %s",
        (distill_id,)
    )
    review_round = cursor.fetchone()[0]

    sql = """
        INSERT INTO review_logs
            (distill_id, review_round, reviewer_type, quality_score, assessment,
             decision, refactor_instructions, research_queries)
        VALUES (%s, %s, 'claude_review', %s, %s, %s, %s, %s)
    """
    cursor.execute(sql, (
        distill_id, review_round, score,
        json.dumps(assessment), decision, instructions,
        json.dumps(queries) if queries else None
    ))

    # Update distilled_content status
    status_map = {
        "approve": "approved",
        "refactor": "needs_refactor",
        "deep_research": "needs_refactor",
        "reject": "rejected"
    }
    cursor.execute(
        "UPDATE distilled_content SET review_status = %s WHERE distill_id = %s",
        (status_map[decision], distill_id)
    )
```

## Prompt Engineering Quality Checklist

When scoring `prompt_engineering_quality`, verify:

- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work, not just *what*
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources

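One way to turn the checklist into a numeric `prompt_engineering_quality` score is to weight each item equally; this rubric is an assumption for illustration, not part of the skill spec:

```python
# Item keys are shorthand for the checklist entries above (assumed naming)
PE_CHECKLIST = [
    "demonstrates_techniques",
    "before_after_examples",
    "explains_why",
    "actionable_patterns",
    "edge_cases_and_failures",
    "authoritative_sources",
]

def pe_quality_score(checks):
    """Fraction of checklist items satisfied, equally weighted (assumed rubric)."""
    return sum(bool(checks.get(item)) for item in PE_CHECKLIST) / len(PE_CHECKLIST)

pe_quality_score({"demonstrates_techniques": True, "explains_why": True,
                  "actionable_patterns": True})  # → 0.5
```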
## Auto-Approve Rules

Tier 1 (official) sources with score ≥ 0.80 may auto-approve without human review if configured:

```yaml
# In export_config.yaml
quality:
  auto_approve_tier1_sources: true
  auto_approve_min_score: 0.80
```

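A sketch of applying this rule in code; the function name and the already-parsed config dict are illustrative, but the field names match the YAML above:

```python
def should_auto_approve(config, credibility_tier, score):
    """Apply the auto-approve rule from export_config.yaml (already parsed to a dict)."""
    quality = config.get("quality", {})
    return (quality.get("auto_approve_tier1_sources", False)
            and credibility_tier == 1
            and score >= quality.get("auto_approve_min_score", 0.80))

config = {"quality": {"auto_approve_tier1_sources": True,
                      "auto_approve_min_score": 0.80}}
should_auto_approve(config, credibility_tier=1, score=0.82)  # → True
should_auto_approve(config, credibility_tier=2, score=0.90)  # → False
```

Note the defaults: with the flag absent or false, everything still goes through normal review.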
## Integration Points

| From | Action | To |
|------|--------|-----|
| content-distiller | Sends distilled content | quality-reviewer |
| quality-reviewer | APPROVE | markdown-exporter |
| quality-reviewer | REFACTOR + instructions | content-distiller |
| quality-reviewer | DEEP_RESEARCH + queries | web-crawler-orchestrator |