feat(reference-curator): Add Claude.ai Projects export format
Add claude-project/ folder with skill files formatted for upload to
Claude.ai Projects (web interface):
- reference-curator-complete.md: All 6 skills consolidated
- INDEX.md: Overview and workflow documentation
- Individual skill files (01-06) without YAML frontmatter

Add --claude-ai option to install.sh:
- Lists available files for upload
- Optionally copies to custom destination directory
- Provides upload instructions for Claude.ai

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,234 @@
# Content Distiller

Transforms raw crawled content into structured, high-quality reference materials.

## Distillation Goals

1. **Compress** - Reduce token count while preserving essential information
2. **Structure** - Organize content for easy retrieval and reference
3. **Extract** - Pull out code snippets, key concepts, and actionable patterns
4. **Annotate** - Add metadata for searchability and categorization

## Distillation Workflow

### Step 1: Load Raw Content

```python
def load_for_distillation(cursor):
    """Get documents ready for distillation."""
    sql = """
        SELECT d.doc_id, d.title, d.url, d.raw_content_path,
               d.doc_type, s.source_type, s.credibility_tier
        FROM documents d
        JOIN sources s ON d.source_id = s.source_id
        LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
        WHERE d.crawl_status = 'completed'
          AND dc.distill_id IS NULL
        ORDER BY s.credibility_tier ASC
    """
    cursor.execute(sql)
    return cursor.fetchall()
```

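The query above is a LEFT JOIN anti-join: it keeps completed documents that have no matching `distilled_content` row yet, lowest `credibility_tier` value first. A minimal sketch of that selection against an in-memory SQLite database follows. The trimmed-down schema, the integer tier values, and the sample rows are assumptions made for illustration, not the project's real schema.

```python
import sqlite3

# Hypothetical schema: only the columns the selection logic touches.
SCHEMA = """
CREATE TABLE sources (source_id INTEGER PRIMARY KEY, source_type TEXT,
                      credibility_tier INTEGER);
CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, source_id INTEGER,
                        title TEXT, crawl_status TEXT);
CREATE TABLE distilled_content (distill_id INTEGER PRIMARY KEY, doc_id INTEGER);
"""

PENDING_SQL = """
SELECT d.doc_id, d.title, s.credibility_tier
FROM documents d
JOIN sources s ON d.source_id = s.source_id
LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
WHERE d.crawl_status = 'completed'
  AND dc.distill_id IS NULL
ORDER BY s.credibility_tier ASC
"""


def pending_documents(conn):
    """Return completed documents that have no distilled_content row yet."""
    return conn.execute(PENDING_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO sources VALUES (?, ?, ?)",
                 [(1, "official_docs", 1), (2, "blog", 3)])
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)",
                 [(10, 2, "Blog post", "completed"),
                  (11, 1, "API guide", "completed"),
                  (12, 1, "Already distilled", "completed"),
                  (13, 1, "Still crawling", "pending")])
conn.execute("INSERT INTO distilled_content VALUES (1, 12)")

# Document 12 is skipped (already distilled) and 13 is skipped (crawl not finished).
rows = pending_documents(conn)
```

Here `rows` contains documents 11 and 10, in that order. The sketch uses SQLite's `?` placeholders, while the skill's own code uses `%s` placeholders.
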
### Step 2: Analyze Content Structure

Identify content type and select appropriate distillation strategy:

```python
import re


def analyze_structure(content, doc_type):
    """Analyze document structure for distillation."""
    analysis = {
        "has_code_blocks": bool(re.findall(r'```[\s\S]*?```', content)),
        "has_headers": bool(re.findall(r'^#+\s', content, re.MULTILINE)),
        "has_lists": bool(re.findall(r'^\s*[-*]\s', content, re.MULTILINE)),
        "has_tables": bool(re.findall(r'\|.*\|', content)),
        "estimated_tokens": int(len(content.split()) * 1.3),  # Rough estimate
        "section_count": len(re.findall(r'^#+\s', content, re.MULTILINE))
    }
    return analysis
```

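`analyze_structure` only reports what a document contains; the strategy selection mentioned above is not shown in this file. One way it could look is sketched below. The strategy names, the `doc_type` value `"api_docs"`, and the thresholds are all hypothetical.

```python
def choose_strategy(analysis, doc_type):
    """Pick a distillation strategy name from a structure analysis.

    The rules and names here are illustrative assumptions, not part of the skill.
    """
    if analysis["has_code_blocks"] and doc_type == "api_docs":
        return "api_reference"       # keep signatures and examples intact
    if analysis["has_code_blocks"]:
        return "code_first"          # extract snippets before summarizing prose
    if analysis["section_count"] >= 5 or analysis["estimated_tokens"] > 4000:
        return "section_by_section"  # long documents are summarized per section
    return "single_pass"


example = {"has_code_blocks": False, "section_count": 6, "estimated_tokens": 500}
strategy = choose_strategy(example, "article")
```

Keeping the choice in a separate function means the analysis dictionary can be stored or logged independently of whatever strategy rules are eventually adopted.
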
### Step 3: Extract Key Components

**Extract Code Snippets:**
```python
import re


def extract_code_snippets(content):
    """Extract all code blocks with language tags."""
    pattern = r'```(\w*)\n([\s\S]*?)```'
    snippets = []
    for match in re.finditer(pattern, content):
        snippets.append({
            "language": match.group(1) or "text",
            "code": match.group(2).strip(),
            # get_surrounding_text is a helper that is not shown in this file
            "context": get_surrounding_text(content, match.start(), 200)
        })
    return snippets
```

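The snippet extractor calls `get_surrounding_text`, which this file never defines. A minimal sketch, assuming the helper should return the text immediately before the code block (usually the sentence that introduces it); the real helper may also look at the text after the block.

```python
def get_surrounding_text(content, position, window=200):
    """Return up to `window` characters that precede `position`, stripped.

    Assumption: "surrounding" means the lead-in text before the code block.
    """
    start = max(0, position - window)
    return content[start:position].strip()


sample = "Intro text.\n\nprint('hi')"
context = get_surrounding_text(sample, sample.index("print"), 200)
```

Clamping `start` at zero keeps the slice valid when a code block appears within the first `window` characters of the document.
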
**Extract Key Concepts:**
```python
def extract_key_concepts(content, title):
    """Use Claude to extract key concepts and definitions."""
    # Limit the excerpt for context. The original kept this note as a
    # "# ..." comment inside the f-string, which sent it to the model as text.
    excerpt = content[:8000]
    prompt = f"""
    Analyze this document and extract key concepts:

    Title: {title}
    Content: {excerpt}

    Return JSON with:
    - concepts: [{{"term": "...", "definition": "...", "importance": "high|medium|low"}}]
    - techniques: [{{"name": "...", "description": "...", "use_case": "..."}}]
    - best_practices: ["..."]
    """
    # Use Claude API to process (claude_extract is a helper not shown in this file)
    return claude_extract(prompt)
```

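The prompt asks for JSON, but a model reply may wrap the object in extra prose. The sketch below shows one way the reply could be parsed defensively before it is stored. `parse_concepts_response` is a hypothetical helper, and it assumes the reply contains a single top-level JSON object.

```python
import json
import re


def parse_concepts_response(text):
    """Extract the JSON object from a model reply and fill in missing keys."""
    # Take the span from the first "{" to the last "}" so that any prose
    # before or after the object is ignored.
    match = re.search(r'\{[\s\S]*\}', text)
    if match is None:
        raise ValueError("no JSON object found in response")
    data = json.loads(match.group(0))
    # Guarantee the three keys requested in the prompt are always present.
    for key in ("concepts", "techniques", "best_practices"):
        data.setdefault(key, [])
    return data


reply = ('Sure, here is the result: {"concepts": [{"term": "few-shot", '
         '"definition": "examples in the prompt", "importance": "high"}]} '
         'Hope that helps.')
parsed = parse_concepts_response(reply)
```

Defaulting the missing keys to empty lists lets downstream code iterate over `parsed["techniques"]` without checking for its presence first.
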
### Step 4: Create Structured Summary

**Summary Template:**
```markdown
# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}

## Executive Summary
{2-3 sentence overview}

## Key Concepts
{bulleted list of core concepts with brief definitions}

## Techniques & Patterns
{extracted techniques with use cases}

## Code Examples
{relevant code snippets with context}

## Best Practices
{actionable recommendations}

## Related Topics
{links to related content in library}
```

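The template can be filled programmatically. The following is a rough sketch rather than the skill's own renderer: `render_summary` and the shape of its `doc` and `sections` arguments are assumptions, and skipping empty sections is a design choice made here to save tokens.

```python
from datetime import date

# Section order taken from the summary template.
SECTION_ORDER = ["Executive Summary", "Key Concepts", "Techniques & Patterns",
                 "Code Examples", "Best Practices", "Related Topics"]


def render_summary(doc, sections, distilled_on=None):
    """Render a distilled document as markdown in the template's section order."""
    distilled_on = distilled_on or date.today()
    lines = [
        f"# {doc['title']}",
        "",
        f"**Source:** {doc['url']}",
        f"**Type:** {doc['source_type']} | **Tier:** {doc['credibility_tier']}",
        f"**Distilled:** {distilled_on.isoformat()}",
    ]
    for heading in SECTION_ORDER:
        body = sections.get(heading, "").strip()
        if body:  # omit sections that have no content
            lines += ["", f"## {heading}", body]
    return "\n".join(lines) + "\n"


doc = {"title": "Prompt Basics", "url": "https://example.com/guide",
       "source_type": "blog", "credibility_tier": 2}
summary = render_summary(doc, {"Executive Summary": "Short overview."},
                         distilled_on=date(2025, 1, 1))
```
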
### Step 5: Optimize for Tokens

```python
def optimize_content(structured_content, target_ratio=0.30):
    """
    Compress content to target ratio while preserving quality.
    Target: 30% of original token count.
    """
    original_tokens = count_tokens(structured_content)
    target_tokens = int(original_tokens * target_ratio)

    # Prioritized compression strategies
    strategies = [
        remove_redundant_explanations,
        condense_examples,
        merge_similar_sections,
        trim_verbose_descriptions
    ]

    optimized = structured_content
    for strategy in strategies:
        if count_tokens(optimized) > target_tokens:
            optimized = strategy(optimized)

    return optimized
```

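`optimize_content` depends on `count_tokens` and four strategy functions that this file does not define. As a rough illustration, here is a word-based `count_tokens` (the same 1.3 tokens-per-word estimate used in Step 2) and a deliberately simple stand-in for `remove_redundant_explanations` that drops repeated paragraphs. Both bodies are assumptions; real strategies would likely be model-assisted.

```python
def count_tokens(text):
    """Rough token estimate: whitespace-separated words times 1.3."""
    return int(len(text.split()) * 1.3)


def remove_redundant_explanations(text):
    """Drop paragraphs that repeat an earlier paragraph (ignoring case and spacing)."""
    seen = set()
    kept = []
    for paragraph in text.split("\n\n"):
        key = " ".join(paragraph.lower().split())
        if key and key in seen:
            continue  # exact repeat of an earlier paragraph
        seen.add(key)
        kept.append(paragraph)
    return "\n\n".join(kept)


draft = ("Use few-shot examples.\n\n"
         "Use few-shot examples.\n\n"
         "Keep prompts short.")
deduplicated = remove_redundant_explanations(draft)
```

Each strategy takes text and returns text, which is what lets `optimize_content` apply them in priority order and stop once the target is reached.
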
### Step 6: Store Distilled Content

```python
import json


def store_distilled(cursor, doc_id, summary, key_concepts,
                    code_snippets, structured_content,
                    original_tokens, distilled_tokens):
    """Insert a distilled document and return the new row's id."""
    sql = """
        INSERT INTO distilled_content
        (doc_id, summary, key_concepts, code_snippets, structured_content,
         token_count_original, token_count_distilled, distill_model, review_status)
        VALUES (%s, %s, %s, %s, %s, %s, %s, 'claude-opus-4-5', 'pending')
    """
    cursor.execute(sql, (
        doc_id, summary,
        json.dumps(key_concepts),   # stored as JSON text
        json.dumps(code_snippets),  # stored as JSON text
        structured_content,
        original_tokens,
        distilled_tokens
    ))
    return cursor.lastrowid
```

## Distillation Prompts

**For Prompt Engineering Content:**
```
Focus on:
1. Specific techniques with before/after examples
2. Why techniques work (not just what)
3. Common pitfalls and how to avoid them
4. Actionable patterns that can be directly applied
```

**For API Documentation:**
```
Focus on:
1. Endpoint specifications and parameters
2. Request/response examples
3. Error codes and handling
4. Rate limits and best practices
```

**For Research Papers:**
```
Focus on:
1. Key findings and conclusions
2. Novel techniques introduced
3. Practical applications
4. Limitations and caveats
```

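The three focus lists above could be kept in a lookup table so the right one is appended to a distillation prompt automatically. This is a sketch only: the dictionary keys and the `build_focus_block` helper are hypothetical names, while the list items are copied from the prompts above.

```python
# Keys are hypothetical labels for the three content kinds.
FOCUS_PROMPTS = {
    "prompt_engineering": [
        "Specific techniques with before/after examples",
        "Why techniques work (not just what)",
        "Common pitfalls and how to avoid them",
        "Actionable patterns that can be directly applied",
    ],
    "api_documentation": [
        "Endpoint specifications and parameters",
        "Request/response examples",
        "Error codes and handling",
        "Rate limits and best practices",
    ],
    "research_paper": [
        "Key findings and conclusions",
        "Novel techniques introduced",
        "Practical applications",
        "Limitations and caveats",
    ],
}


def build_focus_block(content_kind):
    """Return the numbered 'Focus on:' block for a content kind."""
    items = FOCUS_PROMPTS[content_kind]  # raises KeyError for unknown kinds
    numbered = [f"{i}. {item}" for i, item in enumerate(items, start=1)]
    return "Focus on:\n" + "\n".join(numbered)


focus = build_focus_block("research_paper")
```
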
## Quality Metrics

Track compression efficiency:

| Metric | Target |
|--------|--------|
| Compression Ratio | 25-35% of original |
| Key Concept Coverage | ≥90% of important terms |
| Code Snippet Retention | 100% of relevant examples |
| Readability | Clear, scannable structure |
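
The two numeric targets in the table can be checked mechanically. A minimal sketch, assuming token counts are already available and that concept coverage is approximated by case-insensitive substring matching; the function names are hypothetical, and the matching rule is a simplification.

```python
def compression_ratio(original_tokens, distilled_tokens):
    """Distilled size as a fraction of the original (0.30 means 30%)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return distilled_tokens / original_tokens


def concept_coverage(important_terms, distilled_text):
    """Fraction of important terms that appear in the distilled text."""
    if not important_terms:
        return 1.0
    text = distilled_text.lower()
    found = sum(1 for term in important_terms if term.lower() in text)
    return found / len(important_terms)


def meets_targets(ratio, coverage):
    """Check the table's numeric targets: 25-35% ratio and at least 90% coverage."""
    return 0.25 <= ratio <= 0.35 and coverage >= 0.90


ratio = compression_ratio(1000, 300)
coverage = concept_coverage(["few-shot", "chain-of-thought"],
                            "Few-shot prompting gives the model examples.")
```

In this example the ratio is within range, but only one of the two terms is covered, so `meets_targets(ratio, coverage)` is false. Code snippet retention and readability are left to the reviewer because they are not simple numeric checks.
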

## Handling Refactor Requests

When `quality-reviewer` returns a `refactor` decision:

```python
def handle_refactor(distill_id, instructions):
    """Re-distill based on reviewer feedback."""
    # Load original content and existing distillation
    original = load_raw_content(distill_id)
    existing = load_distilled_content(distill_id)

    # Apply specific improvements based on instructions
    improved = apply_improvements(existing, instructions)

    # Update distilled_content
    update_distilled(distill_id, improved)

    # Reset review status
    set_review_status(distill_id, 'pending')
```

## Integration

| From | Input | To |
|------|-------|-----|
| content-repository | Raw document records | content-distiller |
| content-distiller | Distilled content | quality-reviewer |
| quality-reviewer | Refactor instructions | content-distiller (loop) |