# Content Distiller

Analyzes and distills raw crawled content into concise reference materials. Extracts key concepts, code snippets, and creates structured summaries.

## Trigger Keywords

"distill content", "summarize document", "extract key concepts", "process raw content", "create reference summary"

## Goals

1. **Compress** - Reduce token count while preserving essential information
2. **Structure** - Organize content for easy retrieval
3. **Extract** - Pull out code snippets, key concepts, patterns
4. **Annotate** - Add metadata for searchability

## Workflow

### Step 1: Load Raw Content

```bash
python scripts/load_pending.py --output pending_docs.json
```

### Step 2: Analyze Content Structure

Identify document characteristics:

- Has code blocks?
- Has headers?
- Has tables?
- Estimated tokens?

### Step 3: Extract Key Components

```bash
python scripts/extract_components.py --doc-id 123 --output components.json
```

Extracts:

- Code snippets with language tags
- Key concepts and definitions
- Best practices
- Techniques and patterns

### Step 4: Create Structured Summary

Output template:

```markdown
# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}

## Executive Summary
{2-3 sentence overview}

## Key Concepts
{bulleted list with definitions}

## Techniques & Patterns
{extracted techniques with use cases}

## Code Examples
{relevant code snippets}

## Best Practices
{actionable recommendations}
```

### Step 5: Optimize for Tokens

Target: 25-35% of original token count

```bash
python scripts/optimize_content.py --doc-id 123 --target-ratio 0.30
```

### Step 6: Store Distilled Content

```bash
python scripts/store_distilled.py --doc-id 123 --content distilled.md
```

## Quality Metrics

| Metric | Target |
|--------|--------|
| Compression Ratio | 25-35% of original |
| Key Concept Coverage | ≥90% of important terms |
| Code Snippet Retention | 100% of relevant examples |
| Readability | Clear, scannable structure |

## Handling Refactor Requests

When `quality-reviewer` returns `refactor`:

```bash
python scripts/refactor_content.py --distill-id 456 --instructions "Add more examples"
```

## Scripts

- `scripts/load_pending.py` - Load documents pending distillation
- `scripts/extract_components.py` - Extract code, concepts, patterns
- `scripts/optimize_content.py` - Token optimization
- `scripts/store_distilled.py` - Save to database
- `scripts/refactor_content.py` - Handle refactor requests

## Integration

| Direction | Component | Data |
|-----------|-----------|------|
| From | content-repository | Raw document records |
| To | quality-reviewer | Distilled content |
| From | quality-reviewer | Refactor instructions (loop back) |
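
## Example: Checking the Compression Target

The 25-35% compression target from Quality Metrics can be sanity-checked before handing distilled content to `quality-reviewer`. The sketch below is illustrative only and is not part of the listed scripts: the function names are hypothetical, and the token count is approximated by whitespace-separated words, which is an assumption and not what a real tokenizer would report.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: count whitespace-separated words.

    Assumption: this is a stand-in for a real tokenizer, good enough
    for a coarse ratio check.
    """
    return len(text.split())


def compression_ratio(original: str, distilled: str) -> float:
    """Return distilled size as a fraction of the original size."""
    original_tokens = estimate_tokens(original)
    if original_tokens == 0:
        raise ValueError("original document is empty")
    return estimate_tokens(distilled) / original_tokens


def within_target(ratio: float, low: float = 0.25, high: float = 0.35) -> bool:
    """Check the ratio against the 25-35% target band."""
    return low <= ratio <= high


if __name__ == "__main__":
    # Hypothetical documents: 1000 words distilled down to 300.
    original = "word " * 1000
    distilled = "word " * 300
    ratio = compression_ratio(original, distilled)
    print(f"ratio={ratio:.2f} within_target={within_target(ratio)}")
```

A ratio below the band suggests essential content may have been dropped, while a ratio above it suggests the summary could be tightened further.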