feat(reference-curator): Add portable skill suite for reference documentation curation

6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:20:27 +07:00
parent e80056ae8a
commit 6d7a6d7a88
26 changed files with 4486 additions and 1 deletions
--- a/custom-skills/90-reference-curator/commands/content-distiller.md
+++ b/custom-skills/90-reference-curator/commands/content-distiller.md
@@ -0,0 +1,92 @@
+---
+description: Analyze and summarize stored documents. Extracts key concepts and code snippets, and produces structured content.
+argument-hint: <doc-id|all-pending> [--focus keywords] [--max-tokens 2000]
+allowed-tools: Read, Write, Bash, Glob, Grep
+---
+
+# Content Distiller
+
+Analyze, summarize, and extract key information from stored documents.
+
+## Arguments
+- `<doc-id|all-pending>`: A specific document ID, or `all-pending` to process every pending document
+- `--focus`: Keywords to emphasize in distillation
+- `--max-tokens`: Target token count for distilled output (default: 2000)
+
+## Distillation Process
+
+### 1. Load Raw Content
+```bash
+source ~/.reference-curator.env
+# Look up the stored path of the raw content for this document
+mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library -N -e \
+  "SELECT raw_content_path FROM documents WHERE doc_id = ${DOC_ID}"
+```
+
+### 2. Analyze Content
+
+Extract:
+- **Summary**: 2-3 sentence executive summary
+- **Key Concepts**: Important terms with definitions
+- **Code Snippets**: Relevant code examples
+- **Structured Content**: Full distilled markdown
+
+### 3. Output Format
+
+```json
+{
+  "summary": "Executive summary of the document...",
+  "key_concepts": [
+    {"term": "System Prompt", "definition": "..."},
+    {"term": "Context Window", "definition": "..."}
+  ],
+  "code_snippets": [
+    {"language": "python", "description": "...", "code": "..."}
+  ],
+  "structured_content": "# Title\n\n## Overview\n..."
+}
+```
+
+### 4. Store Distilled Content
+
+```sql
+INSERT INTO distilled_content
+  (doc_id, summary, key_concepts, code_snippets, structured_content,
+   token_count_original, token_count_distilled, distill_model, review_status)
+VALUES
+  (?, ?, ?, ?, ?, ?, ?, 'claude-opus-4', 'pending');
+```
+
+### 5. Calculate Metrics
+
+- `token_count_original`: Tokens in raw content
+- `token_count_distilled`: Tokens in output
+- `compression_ratio`: Auto-calculated as `token_count_distilled / token_count_original * 100`
+
+## Distillation Guidelines
+
+**For Prompt Engineering Content:**
+- Emphasize techniques and patterns
+- Include before/after examples
+- Extract actionable best practices
+- Note model-specific behaviors
+
+**For API Documentation:**
+- Focus on endpoint signatures
+- Include request/response examples
+- Note rate limits and constraints
+- Extract error handling patterns
+
+**For Code Repositories:**
+- Summarize architecture
+- Extract key functions/classes
+- Note dependencies
+- Include usage examples
+
+## Example Usage
+
+```
+/content-distiller 42
+/content-distiller all-pending --focus "system prompts"
+/content-distiller 15 --max-tokens 3000
+```
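
Review note on `content-distiller.md`: the output format in step 3 and the metrics in step 5 could be checked before the `INSERT` in step 4 runs. The following is only a minimal sketch under stated assumptions, not part of this commit. The names `REQUIRED_KEYS`, `validate_distilled`, and `compression_ratio` are hypothetical, and the token counts are taken as plain integers because the command file does not say which tokenizer produces them.

```python
import json

# Top-level keys taken from the JSON example in "3. Output Format".
REQUIRED_KEYS = {"summary", "key_concepts", "code_snippets", "structured_content"}


def validate_distilled(record: dict) -> list:
    """Return a list of problems; an empty list means the shape matches."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(record))]
    for i, concept in enumerate(record.get("key_concepts", [])):
        if not {"term", "definition"} <= set(concept):
            problems.append(f"key_concepts[{i}] needs term and definition")
    for i, snippet in enumerate(record.get("code_snippets", [])):
        if not {"language", "description", "code"} <= set(snippet):
            problems.append(f"code_snippets[{i}] needs language, description, code")
    return problems


def compression_ratio(token_count_original: int, token_count_distilled: int):
    """Compute distilled / original * 100; return None for an empty original."""
    if token_count_original <= 0:
        return None  # assumption: avoid dividing by zero rather than raising
    return round(token_count_distilled / token_count_original * 100, 2)


if __name__ == "__main__":
    record = json.loads(
        '{"summary": "s", "key_concepts": [{"term": "t", "definition": "d"}],'
        ' "code_snippets": [], "structured_content": "# Title"}'
    )
    print(validate_distilled(record))
    print(compression_ratio(8000, 2000))
```

Returning a list of problems rather than raising lets an `all-pending` run report every malformed record in one pass. Whether the stored `compression_ratio` treats an empty original as `NULL` depends on the schema, which is not shown in this file.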