feat(reference-curator): Add portable skill suite for reference documentation curation
6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,92 @@
---
description: Analyze and summarize stored documents. Extracts key concepts and code snippets and creates structured content.
argument-hint: <doc-id|all-pending> [--focus keywords] [--max-tokens 2000]
allowed-tools: Read, Write, Bash, Glob, Grep
---

# Content Distiller

Analyze, summarize, and extract key information from stored documents.

## Arguments
- `<doc-id|all-pending>`: A specific document ID, or `all-pending` to process all pending documents
- `--focus`: Keywords to emphasize in distillation
- `--max-tokens`: Target token count for distilled output (default: 2000)

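
A minimal sketch of parsing these arguments in bash, in case the command is ever wrapped in a script. The function `parse_distiller_args` and the variables `TARGET`, `FOCUS`, and `MAX_TOKENS` are hypothetical names; only the argument syntax and the 2000-token default come from this command.

```bash
# Hypothetical parser for: <doc-id|all-pending> [--focus keywords] [--max-tokens N]
# Sets the globals TARGET, FOCUS, and MAX_TOKENS (default 2000); returns 1 on bad input.
parse_distiller_args() {
  TARGET="" FOCUS="" MAX_TOKENS=2000
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --focus)
        [[ $# -ge 2 ]] || { echo "missing value for --focus" >&2; return 1; }
        FOCUS="$2"; shift 2 ;;
      --max-tokens)
        [[ "${2:-}" =~ ^[0-9]+$ ]] || { echo "missing or non-numeric value for --max-tokens" >&2; return 1; }
        MAX_TOKENS="$2"; shift 2 ;;
      *)
        # Exactly one positional argument is allowed
        [[ -z "$TARGET" ]] || { echo "unexpected argument: $1" >&2; return 1; }
        TARGET="$1"; shift ;;
    esac
  done
  # The target must be a numeric doc ID or the literal "all-pending"
  [[ "$TARGET" == "all-pending" || "$TARGET" =~ ^[0-9]+$ ]] || {
    echo "expected <doc-id|all-pending>, got: '$TARGET'" >&2; return 1; }
}
```

For example, `parse_distiller_args all-pending --focus "system prompts"` leaves `TARGET=all-pending`, `FOCUS="system prompts"`, and `MAX_TOKENS=2000`. The function sets globals rather than locals so the caller can read the results.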
## Distillation Process

### 1. Load Raw Content
```bash
source ~/.reference-curator.env

# Get the document's raw content path (DOC_ID must be a numeric ID)
[[ "$DOC_ID" =~ ^[0-9]+$ ]] || { echo "Invalid doc ID: $DOC_ID" >&2; exit 1; }
RAW_PATH=$(mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library -N -e \
  "SELECT raw_content_path FROM documents WHERE doc_id = $DOC_ID")
```

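
One way to handle `all-pending` is sketched below. It assumes that "pending" means a document with no row in `distilled_content` yet; the command does not define the term, so adjust the query if the schema tracks this another way. `list_pending_doc_ids` and `resolve_doc_ids` are hypothetical helpers.

```bash
# Hypothetical helper: IDs of documents that have no distilled_content row yet.
# ASSUMPTION: "pending" means "not yet distilled"; change the WHERE clause if
# the schema records this differently (e.g., a status column on documents).
list_pending_doc_ids() {
  mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library -N -e \
    "SELECT d.doc_id
       FROM documents d
       LEFT JOIN distilled_content dc ON dc.doc_id = d.doc_id
      WHERE dc.doc_id IS NULL
      ORDER BY d.doc_id"
}

# Print the doc IDs to process for <doc-id|all-pending>, one per line.
resolve_doc_ids() {
  case "$1" in
    all-pending) list_pending_doc_ids ;;
    *[!0-9]* | "") echo "invalid target: $1" >&2; return 1 ;;
    *) printf '%s\n' "$1" ;;
  esac
}
```

Each resolved ID can then go through step 1 and the remaining steps, for example with `resolve_doc_ids all-pending | while read -r DOC_ID; do ...; done`.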
### 2. Analyze Content

Extract:
- **Summary**: A 2-3 sentence executive summary
- **Key Concepts**: Important terms, each with a definition
- **Code Snippets**: Relevant code examples, each with its language and a short description
- **Structured Content**: The full distilled document in Markdown

### 3. Output Format

```json
{
  "summary": "Executive summary of the document...",
  "key_concepts": [
    {"term": "System Prompt", "definition": "..."},
    {"term": "Context Window", "definition": "..."}
  ],
  "code_snippets": [
    {"language": "python", "description": "...", "code": "..."}
  ],
  "structured_content": "# Title\n\n## Overview\n..."
}
```

### 4. Store Distilled Content

```sql
-- Bind the seven ? placeholders in column order (doc_id through token_count_distilled)
INSERT INTO distilled_content
  (doc_id, summary, key_concepts, code_snippets, structured_content,
   token_count_original, token_count_distilled, distill_model, review_status)
VALUES
  (?, ?, ?, ?, ?, ?, ?, 'claude-opus-4', 'pending');
```

### 5. Calculate Metrics

- `token_count_original`: Tokens in the raw content
- `token_count_distilled`: Tokens in the distilled output
- `compression_ratio`: Auto-calculated as `token_count_distilled / token_count_original * 100`

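
The metrics above could be approximated in bash as follows, for example to check a result before storing it. The four-characters-per-token estimate is an assumption for illustration only; accurate counts need the tokenizer of the model doing the distillation. `estimate_tokens`, `compression_ratio`, and `within_budget` are hypothetical helper names, while the ratio formula and the 2000-token default come from this command.

```bash
# Rough token estimate for a file: ~4 characters per token, rounded up.
# ASSUMPTION: heuristic only, not a real tokenizer.
estimate_tokens() {
  local chars
  chars=$(wc -m < "$1")
  echo $(( (chars + 3) / 4 ))
}

# compression_ratio = token_count_distilled / token_count_original * 100
# Usage: compression_ratio <distilled> <original>; fails if original is 0.
compression_ratio() {
  awk -v d="$1" -v o="$2" 'BEGIN { if (o == 0) exit 1; printf "%.2f\n", d / o * 100 }'
}

# Succeeds if a distilled token count is within the --max-tokens target (default 2000).
within_budget() {
  [ "$1" -le "${2:-2000}" ]
}
```

For instance, a 2000-token source distilled to 500 tokens gives `compression_ratio 500 2000` of `25.00`.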
## Distillation Guidelines

**For Prompt Engineering Content:**
- Emphasize techniques and patterns
- Include before/after examples
- Extract actionable best practices
- Note model-specific behaviors

**For API Documentation:**
- Focus on endpoint signatures
- Include request/response examples
- Note rate limits and constraints
- Extract error handling patterns

**For Code Repositories:**
- Summarize architecture
- Extract key functions/classes
- Note dependencies
- Include usage examples

## Example Usage

```
/content-distiller 42
/content-distiller all-pending --focus "system prompts"
/content-distiller 15 --max-tokens 3000
```