our-claude-skills/custom-skills/90-reference-curator/commands/content-distiller.md
Andrew Yim 6d7a6d7a88 feat(reference-curator): Add portable skill suite for reference documentation curation
6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:20:27 +07:00

description: Analyze and summarize stored documents. Extracts key concepts, code snippets, and creates structured content.
argument-hint: <doc-id|all-pending> [--focus keywords] [--max-tokens 2000]
allowed-tools: Read, Write, Bash, Glob, Grep

Content Distiller

Analyze, summarize, and extract key information from stored documents.

Arguments

  • <doc-id|all-pending>: A specific document ID, or all-pending to process every pending document
  • --focus: Keywords to emphasize in distillation
  • --max-tokens: Target token count for distilled output (default: 2000)
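
The argument list above can be sketched as a small parser. This is a minimal illustration, not the command's actual implementation: the helper names build_parser and parse_target are hypothetical, and treating document IDs as non-negative integers is an assumption based on the usage examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="content-distiller")
    parser.add_argument("target", help="document ID, or the literal 'all-pending'")
    parser.add_argument("--focus", default=None,
                        help="keywords to emphasize in distillation")
    parser.add_argument("--max-tokens", type=int, default=2000,
                        help="target token count for distilled output")
    return parser


def parse_target(target: str):
    """Return None for 'all-pending', otherwise the document ID as an int.

    Assumption: document IDs are non-negative integers.
    """
    if target == "all-pending":
        return None
    if not target.isdecimal():
        raise ValueError(f"expected a numeric doc ID or 'all-pending', got {target!r}")
    return int(target)


if __name__ == "__main__":
    args = build_parser().parse_args(["all-pending", "--focus", "system prompts"])
    print(parse_target(args.target), args.focus, args.max_tokens)
```

Validating the target up front also keeps a non-numeric value from ever reaching the SQL lookup in the next step.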

Distillation Process

1. Load Raw Content

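# Load MYSQL_USER and MYSQL_PASSWORD into the environment
source ~/.envrc
# Get document path. DOC_ID is interpolated directly into the SQL,
# so validate it as a numeric ID before running this.
mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library -N -e \
  "SELECT raw_content_path FROM documents WHERE doc_id = $DOC_ID"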

2. Analyze Content

Extract:

  • Summary: 2-3 sentence executive summary
  • Key Concepts: Important terms with definitions
  • Code Snippets: Relevant code examples
  • Structured Content: Full distilled markdown

3. Output Format

{
  "summary": "Executive summary of the document...",
  "key_concepts": [
    {"term": "System Prompt", "definition": "..."},
    {"term": "Context Window", "definition": "..."}
  ],
  "code_snippets": [
    {"language": "python", "description": "...", "code": "..."}
  ],
  "structured_content": "# Title\n\n## Overview\n..."
}
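
Before storing a result, it can help to check that it has the shape shown above. The following is a minimal sketch of such a check; validate_distilled is a hypothetical helper, and treating all four top-level fields as required is an assumption drawn from the example, not a documented rule.

```python
import json

# Top-level fields and their expected JSON types, taken from the example above.
REQUIRED_FIELDS = {
    "summary": str,
    "key_concepts": list,
    "code_snippets": list,
    "structured_content": str,
}
# Keys expected on each list item, also taken from the example.
ITEM_KEYS = {
    "key_concepts": {"term", "definition"},
    "code_snippets": {"language", "description", "code"},
}


def validate_distilled(payload: str) -> list:
    """Return a list of problems; an empty list means the shape matches."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"{field} must be of type {expected.__name__}")

    for name, keys in ITEM_KEYS.items():
        items = data.get(name)
        if not isinstance(items, list):
            continue  # already reported above
        for i, item in enumerate(items):
            missing = keys - set(item) if isinstance(item, dict) else keys
            if missing:
                problems.append(f"{name}[{i}] missing keys: {sorted(missing)}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report everything that is wrong with a distillation in a single pass.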

4. Store Distilled Content

INSERT INTO distilled_content
  (doc_id, summary, key_concepts, code_snippets, structured_content,
   token_count_original, token_count_distilled, distill_model, review_status)
VALUES
  (?, ?, ?, ?, ?, ?, ?, 'claude-opus-4', 'pending');
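
The parameter binding in the statement above can be sketched as follows. This example uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server. The table definition, the store_distilled helper, and the choice to serialize key_concepts and code_snippets as JSON text are all assumptions for illustration, not the repository's actual schema. A MySQL driver may also expect a different placeholder style (for example %s instead of ?).

```python
import json
import sqlite3

# Same statement as above; the seven ? placeholders are bound at execution time.
INSERT_SQL = """
INSERT INTO distilled_content
  (doc_id, summary, key_concepts, code_snippets, structured_content,
   token_count_original, token_count_distilled, distill_model, review_status)
VALUES
  (?, ?, ?, ?, ?, ?, ?, 'claude-opus-4', 'pending')
"""


def store_distilled(conn, doc_id, distilled, tokens_original, tokens_distilled):
    """Insert one distilled document (hypothetical helper)."""
    conn.execute(INSERT_SQL, (
        doc_id,
        distilled["summary"],
        json.dumps(distilled["key_concepts"]),   # assumption: stored as JSON text
        json.dumps(distilled["code_snippets"]),  # assumption: stored as JSON text
        distilled["structured_content"],
        tokens_original,
        tokens_distilled,
    ))
    conn.commit()


# In-memory stand-in table; the column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE distilled_content (
  doc_id INTEGER, summary TEXT, key_concepts TEXT, code_snippets TEXT,
  structured_content TEXT, token_count_original INTEGER,
  token_count_distilled INTEGER, distill_model TEXT, review_status TEXT
)
""")

store_distilled(conn, 42, {
    "summary": "Executive summary of the document...",
    "key_concepts": [{"term": "System Prompt", "definition": "..."}],
    "code_snippets": [],
    "structured_content": "# Title\n\n## Overview\n...",
}, tokens_original=8000, tokens_distilled=2000)
```

Binding values through placeholders, rather than formatting them into the SQL string, keeps quotes or other special characters in summaries and code snippets from breaking the statement.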

5. Calculate Metrics

  • token_count_original: Tokens in raw content
  • token_count_distilled: Tokens in output
  • compression_ratio: Auto-calculated as a percentage (token_count_distilled / token_count_original * 100)
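
The metrics above can be worked through in a few lines. This is a sketch only: count_tokens uses whitespace splitting as a rough stand-in for a real tokenizer, which is an assumption, and both function names are hypothetical. The document states that the ratio is calculated automatically, so the second function simply reproduces the stated formula for illustration.

```python
def count_tokens(text: str) -> int:
    """Rough token estimate: number of whitespace-separated words.

    Assumption: a real pipeline would use the model's own tokenizer.
    """
    return len(text.split())


def compression_ratio(token_count_original: int, token_count_distilled: int) -> float:
    """Distilled size as a percentage of the original size."""
    if token_count_original <= 0:
        raise ValueError("token_count_original must be positive")
    return token_count_distilled / token_count_original * 100


if __name__ == "__main__":
    # An 8000-token document distilled to 2000 tokens is 25% of its original size.
    print(compression_ratio(8000, 2000))
```

A lower percentage means stronger compression; a value above 100 would indicate that the distilled output is longer than the source.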

Distillation Guidelines

For Prompt Engineering Content:

  • Emphasize techniques and patterns
  • Include before/after examples
  • Extract actionable best practices
  • Note model-specific behaviors

For API Documentation:

  • Focus on endpoint signatures
  • Include request/response examples
  • Note rate limits and constraints
  • Extract error handling patterns

For Code Repositories:

  • Summarize architecture
  • Extract key functions/classes
  • Note dependencies
  • Include usage examples

Example Usage

/content-distiller 42
/content-distiller all-pending --focus "system prompts"
/content-distiller 15 --max-tokens 3000