our-claude-skills/custom-skills/90-reference-curator/04-content-distiller/desktop/SKILL.md
Andrew Yim b6a478e1df feat: Add installation tool, Claude.ai export, and skill standardization (#1)
## Summary

- Add portable installation tool (`install.sh`) for cross-machine setup
- Add Claude.ai export files with proper YAML frontmatter
- Add multi-agent-guide v2.0 with consolidated framework template
- Rename `00-claude-code-setting` → `00-our-settings-audit` (avoid reserved word)
- Add YAML frontmatter to 25+ SKILL.md files for Claude Desktop compatibility

## Commits Included

- `93f604a` feat: Add portable installation tool for cross-machine setup
- `9b84104` feat: Add Claude.ai export for portable skill installation
- `f7ab973` fix: Add YAML frontmatter to Claude.ai export files
- `3fed49a` feat(multi-agent-guide): Add v2.0 with consolidated framework
- `3be26ef` refactor: Rename settings-audit skill and add YAML frontmatter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:48:06 +07:00


---
name: content-distiller
description: |
  Raw content summarizer extracting key concepts, code snippets, and structured output.
  Triggers: distill content, summarize document, extract key concepts, compress content.
---
# Content Distiller
Transforms raw crawled content into structured, high-quality reference materials.
## Distillation Goals
1. **Compress** - Reduce token count while preserving essential information
2. **Structure** - Organize content for easy retrieval and reference
3. **Extract** - Pull out code snippets, key concepts, and actionable patterns
4. **Annotate** - Add metadata for searchability and categorization
## Distillation Workflow
### Step 1: Load Raw Content
```python
def load_for_distillation(cursor):
    """Get documents ready for distillation."""
    sql = """
        SELECT d.doc_id, d.title, d.url, d.raw_content_path,
               d.doc_type, s.source_type, s.credibility_tier
        FROM documents d
        JOIN sources s ON d.source_id = s.source_id
        LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
        WHERE d.crawl_status = 'completed'
          AND dc.distill_id IS NULL
        ORDER BY s.credibility_tier ASC
    """
    cursor.execute(sql)
    return cursor.fetchall()
```
### Step 2: Analyze Content Structure
Identify the content type and select an appropriate distillation strategy:
```python
import re

def analyze_structure(content, doc_type):
    """Analyze document structure for distillation."""
    analysis = {
        "has_code_blocks": bool(re.search(r'```[\s\S]*?```', content)),
        "has_headers": bool(re.search(r'^#+\s', content, re.MULTILINE)),
        "has_lists": bool(re.search(r'^\s*[-*]\s', content, re.MULTILINE)),
        "has_tables": bool(re.search(r'\|.*\|', content)),
        "estimated_tokens": len(content.split()) * 1.3,  # Rough estimate
        "section_count": len(re.findall(r'^#+\s', content, re.MULTILINE)),
    }
    return analysis
```
### Step 3: Extract Key Components
**Extract Code Snippets:**
```python
import re

def extract_code_snippets(content):
    """Extract all code blocks with language tags."""
    pattern = r'```(\w*)\n([\s\S]*?)```'
    snippets = []
    for match in re.finditer(pattern, content):
        snippets.append({
            "language": match.group(1) or "text",
            "code": match.group(2).strip(),
            "context": get_surrounding_text(content, match.start(), 200),
        })
    return snippets
```
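The `get_surrounding_text` helper used above is not defined in this skill; a minimal sketch, assuming character-based context around the match position:

```python
def get_surrounding_text(content, position, radius):
    """Return up to `radius` characters of text on either side of `position`.

    Minimal sketch of the helper assumed by extract_code_snippets; a real
    implementation might snap to word or line boundaries instead.
    """
    start = max(0, position - radius)
    end = min(len(content), position + radius)
    return content[start:end]
```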
**Extract Key Concepts:**
```python
def extract_key_concepts(content, title):
    """Use Claude to extract key concepts and definitions."""
    # Truncate content to keep the prompt within the context limit.
    prompt = f"""
Analyze this document and extract key concepts:

Title: {title}
Content: {content[:8000]}

Return JSON with:
- concepts: [{{"term": "...", "definition": "...", "importance": "high|medium|low"}}]
- techniques: [{{"name": "...", "description": "...", "use_case": "..."}}]
- best_practices: ["..."]
"""
    # Use Claude API to process
    return claude_extract(prompt)
```
### Step 4: Create Structured Summary
**Summary Template:**
```markdown
# {title}
**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}
## Executive Summary
{2-3 sentence overview}
## Key Concepts
{bulleted list of core concepts with brief definitions}
## Techniques & Patterns
{extracted techniques with use cases}
## Code Examples
{relevant code snippets with context}
## Best Practices
{actionable recommendations}
## Related Topics
{links to related content in library}
```
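The template above can be filled with `str.format`; a sketch covering the header fields (the `doc` dict keys are assumptions mirroring the Step 1 query columns, and only the first sections are shown):

```python
from datetime import date

# Header portion of the summary template; remaining sections follow the same pattern.
SUMMARY_TEMPLATE = """# {title}
**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {distilled}

## Executive Summary
{executive_summary}
"""

def render_summary(doc, executive_summary):
    """Fill the summary template from a document record (dict)."""
    return SUMMARY_TEMPLATE.format(
        title=doc["title"],
        url=doc["url"],
        source_type=doc["source_type"],
        credibility_tier=doc["credibility_tier"],
        distilled=date.today().isoformat(),
        executive_summary=executive_summary,
    )
```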
### Step 5: Optimize for Tokens
```python
def optimize_content(structured_content, target_ratio=0.30):
    """
    Compress content to target ratio while preserving quality.
    Target: 30% of original token count.
    """
    original_tokens = count_tokens(structured_content)
    target_tokens = int(original_tokens * target_ratio)

    # Prioritized compression strategies
    strategies = [
        remove_redundant_explanations,
        condense_examples,
        merge_similar_sections,
        trim_verbose_descriptions,
    ]

    optimized = structured_content
    for strategy in strategies:
        if count_tokens(optimized) > target_tokens:
            optimized = strategy(optimized)
    return optimized
```
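`count_tokens` is assumed above but not defined; a rough stand-in consistent with the word-count heuristic in Step 2 (a production version would use the model's actual tokenizer, e.g. tiktoken):

```python
def count_tokens(text):
    """Approximate token count from word count.

    Rough stand-in using the same 1.3-tokens-per-word estimate as
    analyze_structure; swap in a real tokenizer for accurate budgeting.
    """
    return int(len(text.split()) * 1.3)
```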
### Step 6: Store Distilled Content
```python
import json

def store_distilled(cursor, doc_id, summary, key_concepts,
                    code_snippets, structured_content,
                    original_tokens, distilled_tokens):
    sql = """
        INSERT INTO distilled_content
            (doc_id, summary, key_concepts, code_snippets, structured_content,
             token_count_original, token_count_distilled, distill_model, review_status)
        VALUES (%s, %s, %s, %s, %s, %s, %s, 'claude-opus-4-5', 'pending')
    """
    cursor.execute(sql, (
        doc_id, summary,
        json.dumps(key_concepts),
        json.dumps(code_snippets),
        structured_content,
        original_tokens,
        distilled_tokens,
    ))
    return cursor.lastrowid
```
## Distillation Prompts
**For Prompt Engineering Content:**
```
Focus on:
1. Specific techniques with before/after examples
2. Why techniques work (not just what)
3. Common pitfalls and how to avoid them
4. Actionable patterns that can be directly applied
```
**For API Documentation:**
```
Focus on:
1. Endpoint specifications and parameters
2. Request/response examples
3. Error codes and handling
4. Rate limits and best practices
```
**For Research Papers:**
```
Focus on:
1. Key findings and conclusions
2. Novel techniques introduced
3. Practical applications
4. Limitations and caveats
```
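These focus blocks can be keyed on the document's `doc_type` from the Step 1 query; a sketch (the type keys and the generic fallback are assumptions, not part of the schema):

```python
# Condensed focus prompts keyed by assumed doc_type values.
FOCUS_PROMPTS = {
    "prompt_engineering": (
        "Focus on: specific techniques with before/after examples, why "
        "techniques work, common pitfalls, directly applicable patterns."
    ),
    "api_documentation": (
        "Focus on: endpoint specifications and parameters, request/response "
        "examples, error codes and handling, rate limits and best practices."
    ),
    "research_paper": (
        "Focus on: key findings and conclusions, novel techniques, "
        "practical applications, limitations and caveats."
    ),
}
DEFAULT_FOCUS = "Focus on: key concepts, code examples, and best practices."

def select_focus_prompt(doc_type):
    """Pick the focus block for a document type, falling back to a generic one."""
    return FOCUS_PROMPTS.get(doc_type, DEFAULT_FOCUS)
```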
## Quality Metrics
Track compression efficiency:
| Metric | Target |
|--------|--------|
| Compression Ratio | 25-35% of original |
| Key Concept Coverage | ≥90% of important terms |
| Code Snippet Retention | 100% of relevant examples |
| Readability | Clear, scannable structure |
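The compression target can be checked directly from the token counts stored in Step 6; a minimal sketch:

```python
def compression_ratio(original_tokens, distilled_tokens):
    """Ratio of distilled to original tokens; the target band is 0.25-0.35."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return distilled_tokens / original_tokens

def within_target(ratio, low=0.25, high=0.35):
    """True when the compression ratio falls inside the target band."""
    return low <= ratio <= high
```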
## Handling Refactor Requests
When `quality-reviewer` returns `refactor` decision:
```python
def handle_refactor(distill_id, instructions):
    """Re-distill based on reviewer feedback."""
    # Load original content and existing distillation
    original = load_raw_content(distill_id)
    existing = load_distilled_content(distill_id)

    # Apply specific improvements based on instructions
    improved = apply_improvements(existing, instructions)

    # Update distilled_content
    update_distilled(distill_id, improved)

    # Reset review status
    set_review_status(distill_id, 'pending')
```
## Integration
| From | Input | To |
|------|-------|-----|
| content-repository | Raw document records | content-distiller |
| content-distiller | Distilled content | quality-reviewer |
| quality-reviewer | Refactor instructions | content-distiller (loop) |
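The loop in the table above can be sketched end to end; `distill` and `review` stand in for the content-distiller and quality-reviewer steps and are assumed callables, as is the bounded retry count:

```python
def run_distillation_pass(documents, distill, review, max_rounds=3):
    """One pass over the pipeline: distill each document, hand the result to
    the reviewer, and loop back through the distiller on a 'refactor'
    decision, up to max_rounds re-distillations per document."""
    results = []
    for doc in documents:
        distilled = distill(doc, instructions=None)
        decision, instructions = review(distilled)
        rounds = 0
        while decision == "refactor" and rounds < max_rounds:
            distilled = distill(doc, instructions=instructions)
            decision, instructions = review(distilled)
            rounds += 1
        results.append((doc, distilled, decision))
    return results
```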