| description | argument-hint | allowed-tools |
|---|---|---|
| Analyze and summarize stored documents. Extracts key concepts and code snippets, and produces structured content. | `<doc-id\|all-pending> [--focus keywords] [--max-tokens 2000]` | Read, Write, Bash, Glob, Grep |
Content Distiller
Analyze, summarize, and extract key information from stored documents.
Arguments
- `<doc-id|all-pending>`: a specific document ID, or `all-pending` to process every pending document
- `--focus`: keywords to emphasize in the distillation
- `--max-tokens`: target token count for the distilled output (default: 2000)
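As a rough illustration of the argument contract above, here is a minimal Bash sketch of how the three arguments could be parsed and validated. It is hypothetical: Claude Code hands the arguments to the command itself, and the function `parse_distiller_args` and the variables `TARGET`, `FOCUS` and `MAX_TOKENS` are names invented for this example.

```bash
#!/usr/bin/env bash
# Hypothetical parser for: <doc-id|all-pending> [--focus keywords] [--max-tokens N]
# Sets TARGET, FOCUS and MAX_TOKENS; returns 1 on invalid input.
parse_distiller_args() {
  TARGET=""
  FOCUS=""
  MAX_TOKENS=2000                      # documented default

  while [ "$#" -gt 0 ]; do
    case "$1" in
      --focus)
        [ "$#" -ge 2 ] || { echo "--focus needs a value" >&2; return 1; }
        FOCUS="$2"
        shift 2
        ;;
      --max-tokens)
        [ "$#" -ge 2 ] || { echo "--max-tokens needs a value" >&2; return 1; }
        case "$2" in
          ''|*[!0-9]*) echo "--max-tokens must be a whole number" >&2; return 1 ;;
        esac
        MAX_TOKENS="$2"
        shift 2
        ;;
      *)
        # First positional argument is the target; reject any extras
        [ -z "$TARGET" ] || { echo "unexpected argument: $1" >&2; return 1; }
        TARGET="$1"
        shift
        ;;
    esac
  done

  # The target must be a numeric doc_id or the literal "all-pending"
  case "$TARGET" in
    all-pending) ;;
    ''|*[!0-9]*) echo "expected <doc-id|all-pending>" >&2; return 1 ;;
  esac
}
```

For instance, `parse_distiller_args all-pending --focus "system prompts"` leaves `MAX_TOKENS` at the documented default of 2000, while a non-numeric target such as `abc` is rejected.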
Distillation Process
1. Load Raw Content
```bash
source ~/.envrc

# Get the raw content path for the document
mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library -N -e \
  "SELECT raw_content_path FROM documents WHERE doc_id = $DOC_ID"
```
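Because the query interpolates `$DOC_ID` straight into the SQL string, it is worth rejecting anything that is not a plain integer before running it. The sketch below assumes the `MYSQL_USER` and `MYSQL_PASSWORD` variables from `~/.envrc` and the `documents.raw_content_path` column used above; the helper names `require_numeric_doc_id` and `load_raw_content` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical guard: accept only a numeric doc_id, since the query above
# interpolates it into the SQL string without any escaping.
require_numeric_doc_id() {
  case "$1" in
    ''|*[!0-9]*) echo "invalid doc_id: '$1'" >&2; return 1 ;;
  esac
}

# Hypothetical loader: look up raw_content_path and print the file's contents.
# Assumes MYSQL_USER / MYSQL_PASSWORD were exported by ~/.envrc.
load_raw_content() {
  local doc_id="$1" raw_path
  require_numeric_doc_id "$doc_id" || return 1
  raw_path=$(mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library -N -e \
    "SELECT raw_content_path FROM documents WHERE doc_id = $doc_id") || return 1
  if [ -z "$raw_path" ] || [ ! -r "$raw_path" ]; then
    echo "no readable raw content for doc_id $doc_id" >&2
    return 1
  fi
  cat -- "$raw_path"
}
```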
2. Analyze Content
Extract:
- Summary: 2-3 sentence executive summary
- Key Concepts: Important terms with definitions
- Code Snippets: Relevant code examples
- Structured Content: Full distilled markdown
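Choosing relevant code snippets is mostly a judgment made during analysis, but a mechanical pre-pass can collect candidate snippets from Markdown sources first. The following is a hypothetical helper (`extract_code_blocks` is not part of the command); it only recognizes backtick fences that start a line and does not handle tilde fences or nested fences.

```bash
#!/usr/bin/env bash
# Hypothetical pre-pass: print every backtick-fenced code block in a Markdown
# file, each preceded by a "--- <language>" line (language may be empty).
extract_code_blocks() {
  awk '
    BEGIN { fence = sprintf("%c%c%c", 96, 96, 96) }  # three backticks
    index($0, fence) == 1 {
      if (in_block) {
        in_block = 0                                  # closing fence
      } else {
        in_block = 1                                  # opening fence
        lang = substr($0, 4)
        sub(/^[ \t]+/, "", lang)                      # trim leading blanks
        sub(/[ \t\r].*$/, "", lang)                   # keep the first word only
        print "--- " lang
      }
      next
    }
    in_block { print }
  ' "$1"
}
```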
3. Output Format
```json
{
  "summary": "Executive summary of the document...",
  "key_concepts": [
    {"term": "System Prompt", "definition": "..."},
    {"term": "Context Window", "definition": "..."}
  ],
  "code_snippets": [
    {"language": "python", "description": "...", "code": "..."}
  ],
  "structured_content": "# Title\n\n## Overview\n..."
}
```
4. Store Distilled Content
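```sql
-- The ? placeholders are bound by the caller; distill_model and
-- review_status are set to fixed values for new rows.
INSERT INTO distilled_content
  (doc_id, summary, key_concepts, code_snippets, structured_content,
   token_count_original, token_count_distilled, distill_model, review_status)
VALUES
  (?, ?, ?, ?, ?, ?, ?, 'claude-opus-4', 'pending');
```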
5. Calculate Metrics
- `token_count_original`: tokens in the raw content
- `token_count_distilled`: tokens in the distilled output
- `compression_ratio`: auto-calculated as `token_count_distilled / token_count_original * 100`
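To sanity-check these metrics outside the database, the sketch below estimates token counts and computes the ratio in Bash. The four-bytes-per-token estimate is an assumed rule of thumb for English text, not Claude's tokenizer, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Rough token estimate from a file's byte count, assuming ~4 bytes per token.
# This is a heuristic, not Claude's tokenizer; use a real token counter when
# accuracy matters.
estimate_tokens() {
  local chars
  chars=$(wc -c < "$1") || return 1
  echo $(( (chars + 3) / 4 ))          # round up
}

# compression_ratio DISTILLED ORIGINAL -> distilled / original * 100, two decimals
compression_ratio() {
  awk -v d="$1" -v o="$2" 'BEGIN {
    if (o <= 0) { print "original token count must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", d / o * 100
  }'
}
```

A 2000-token source distilled to 500 tokens gives `compression_ratio 500 2000` = `25.00`, so lower values mean stronger compression.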
Distillation Guidelines
For Prompt Engineering Content:
- Emphasize techniques and patterns
- Include before/after examples
- Extract actionable best practices
- Note model-specific behaviors
For API Documentation:
- Focus on endpoint signatures
- Include request/response examples
- Note rate limits and constraints
- Extract error handling patterns
For Code Repositories:
- Summarize architecture
- Extract key functions/classes
- Note dependencies
- Include usage examples
Example Usage
```
/content-distiller 42
/content-distiller all-pending --focus "system prompts"
/content-distiller 15 --max-tokens 3000
```