our-claude-skills/custom-skills/90-reference-curator/04-content-distiller/code/CLAUDE.md
Andrew Yim 6d7a6d7a88 feat(reference-curator): Add portable skill suite for reference documentation curation
6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:20:27 +07:00


Content Distiller

Analyzes raw crawled content and distills it into concise reference material: extracts key concepts and code snippets, then builds a structured summary.

Trigger Keywords

"distill content", "summarize document", "extract key concepts", "process raw content", "create reference summary"

Goals

  1. Compress - Reduce token count while preserving essential information
  2. Structure - Organize content for easy retrieval
  3. Extract - Pull out code snippets, key concepts, patterns
  4. Annotate - Add metadata for searchability

Workflow

Step 1: Load Raw Content

python scripts/load_pending.py --output pending_docs.json

Step 2: Analyze Content Structure

Identify document characteristics:

  • Has code blocks?
  • Has headers?
  • Has tables?
  • Estimated tokens?
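
The checks above can be approximated with a few line-based heuristics. The following is a minimal sketch, not the skill's actual implementation: `DocProfile`, `analyze_structure`, and the four-characters-per-token estimate are assumptions introduced for illustration, and the sketch assumes the raw content is Markdown.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker
HEADER_RE = re.compile(r"^#{1,6}\s+\S")
# Separator row of a pipe table, e.g. |---|:---:|
TABLE_SEP_RE = re.compile(r"^\s*\|?(\s*:?-{3,}:?\s*\|)+\s*:?-{3,}:?\s*\|?\s*$")


@dataclass
class DocProfile:
    has_code_blocks: bool
    has_headers: bool
    has_tables: bool
    estimated_tokens: int


def analyze_structure(markdown: str) -> DocProfile:
    """Profile a Markdown document with simple line-based heuristics."""
    has_code = has_headers = has_tables = False
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            has_code = True
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # skip '#' comments and pipes inside code blocks
        if HEADER_RE.match(line):
            has_headers = True
        elif TABLE_SEP_RE.match(line):
            has_tables = True
    # Assumption: roughly four characters per token for English text.
    return DocProfile(has_code, has_headers, has_tables, len(markdown) // 4)
```

A real tokenizer would give a more accurate count; the character-based estimate is only meant for deciding how aggressively a document needs to be compressed.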

Step 3: Extract Key Components

python scripts/extract_components.py --doc-id 123 --output components.json

Extracts:

  • Code snippets with language tags
  • Key concepts and definitions
  • Best practices
  • Techniques and patterns
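
As an illustration of the first item, fenced code snippets and their language tags can be pulled out of Markdown with a regular expression. This is a hedged sketch under the assumption that the raw content is Markdown; `extract_snippets` and the `"text"` fallback tag are hypothetical and may not match what `scripts/extract_components.py` actually does.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
SNIPPET_RE = re.compile(
    r"^" + re.escape(FENCE) + r"([\w+-]*)[ \t]*\n"  # opening fence and optional language tag
    r"(.*?)"                                        # snippet body, matched lazily
    r"^" + re.escape(FENCE) + r"[ \t]*$",           # closing fence on its own line
    re.DOTALL | re.MULTILINE,
)


def extract_snippets(markdown: str) -> list[dict]:
    """Return each fenced code block with its language tag."""
    snippets = []
    for match in SNIPPET_RE.finditer(markdown):
        snippets.append({
            "language": match.group(1) or "text",  # untagged fences fall back to "text"
            "code": match.group(2).rstrip("\n"),
        })
    return snippets
```

The resulting list of dictionaries serializes directly with `json.dumps`, which would suit an output file such as `components.json`.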

Step 4: Create Structured Summary

Output template:

# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}

## Executive Summary
{2-3 sentence overview}

## Key Concepts
{bulleted list with definitions}

## Techniques & Patterns
{extracted techniques with use cases}

## Code Examples
{relevant code snippets}

## Best Practices
{actionable recommendations}
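
One way to fill the template is plain string formatting. The sketch below is illustrative only: the descriptive placeholders are replaced with hypothetical field names (`summary`, `key_concepts`, `techniques`, `code_examples`, `best_practices`), and the shape of the `doc` dictionary is an assumption.

```python
from datetime import date

TEMPLATE = """# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}

## Executive Summary
{summary}

## Key Concepts
{key_concepts}

## Techniques & Patterns
{techniques}

## Code Examples
{code_examples}

## Best Practices
{best_practices}
"""


def bullet_list(items) -> str:
    """Render an iterable of strings as a Markdown bullet list."""
    return "\n".join(f"- {item}" for item in items)


def render_summary(doc: dict, distilled_on: date) -> str:
    """Fill the summary template from a dictionary of extracted components."""
    return TEMPLATE.format(
        title=doc["title"],
        url=doc["url"],
        source_type=doc["source_type"],
        credibility_tier=doc["credibility_tier"],
        date=distilled_on.isoformat(),
        summary=doc["summary"],
        key_concepts=bullet_list(
            f"{term}: {definition}" for term, definition in doc["key_concepts"].items()
        ),
        techniques=bullet_list(doc["techniques"]),
        code_examples=doc["code_examples"],
        best_practices=bullet_list(doc["best_practices"]),
    )
```

Because `str.format` does not re-parse substituted values, code examples that contain braces pass through unchanged.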

Step 5: Optimize for Tokens

Target: 25-35% of original token count

python scripts/optimize_content.py --doc-id 123 --target-ratio 0.30
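
The target is a ratio of distilled tokens to original tokens, so a 12,000-token source should distill to roughly 3,000 to 4,200 tokens. A minimal sketch of that arithmetic follows; the helper names and the four-characters-per-token estimate are assumptions, not the behavior of `optimize_content.py`.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): about four characters per token."""
    return len(text) // 4


def token_budget(original_tokens: int, target_ratio: float = 0.30) -> int:
    """Token count the distilled document should aim for."""
    return round(original_tokens * target_ratio)


def compression_ratio(original: str, distilled: str) -> float:
    """Distilled size as a fraction of the original size."""
    original_tokens = estimate_tokens(original)
    if original_tokens == 0:
        raise ValueError("original document is empty")
    return estimate_tokens(distilled) / original_tokens


def within_target(ratio: float, low: float = 0.25, high: float = 0.35) -> bool:
    """Check a ratio against the 25-35% target band."""
    return low <= ratio <= high
```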

Step 6: Store Distilled Content

python scripts/store_distilled.py --doc-id 123 --content distilled.md

Quality Metrics

| Metric | Target |
|--------|--------|
| Compression Ratio | 25-35% of original |
| Key Concept Coverage | ≥90% of important terms |
| Code Snippet Retention | 100% of relevant examples |
| Readability | Clear, scannable structure |
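
Key concept coverage can be estimated by checking how many important terms survive in the distilled text. The sketch below assumes the list of important terms is supplied by the caller and uses case-insensitive substring matching, which is a simplification; `concept_coverage` is a hypothetical helper.

```python
def concept_coverage(important_terms: list[str], distilled: str) -> float:
    """Fraction of important terms that appear in the distilled text."""
    terms = {term.lower() for term in important_terms}
    if not terms:
        return 1.0  # nothing to cover
    text = distilled.lower()
    found = {term for term in terms if term in text}
    return len(found) / len(terms)


def meets_coverage_target(coverage: float, minimum: float = 0.90) -> bool:
    """Check coverage against the 90% target."""
    return coverage >= minimum
```

Substring matching can overcount, since a short term may match inside a longer word, so a stricter check might match on word boundaries instead.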

Handling Refactor Requests

When quality-reviewer returns a refactor decision, pass its instructions to the refactor script:

python scripts/refactor_content.py --distill-id 456 --instructions "Add more examples"

Scripts

  • scripts/load_pending.py - Load documents pending distillation
  • scripts/extract_components.py - Extract code, concepts, patterns
  • scripts/optimize_content.py - Token optimization
  • scripts/store_distilled.py - Save to database
  • scripts/refactor_content.py - Handle refactor requests

Integration

| Direction | Skill | Data |
|-----------|-------|------|
| From | content-repository | Raw document records |
| To | quality-reviewer | Distilled content |
| From | quality-reviewer | Refactor instructions (loop back) |