feat(reference-curator): Add Claude.ai Projects export format

Add claude-project/ folder with skill files formatted for upload to Claude.ai Projects (web interface):
- reference-curator-complete.md: All 6 skills consolidated
- INDEX.md: Overview and workflow documentation
- Individual skill files (01-06) without YAML frontmatter

Add --claude-ai option to install.sh:
- Lists available files for upload
- Optionally copies to custom destination directory
- Provides upload instructions for Claude.ai

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:33:06 +07:00
parent 8762f68e6e
commit 243b9d851c
10 changed files with 1987 additions and 0 deletions
--- a/custom-skills/90-reference-curator/claude-project/04-content-distiller.md
+++ b/custom-skills/90-reference-curator/claude-project/04-content-distiller.md
@@ -0,0 +1,234 @@
+
+# Content Distiller
+
+Transforms raw crawled content into structured, high-quality reference materials.
+
+## Distillation Goals
+
+1. **Compress** - Reduce token count while preserving essential information
+2. **Structure** - Organize content for easy retrieval and reference
+3. **Extract** - Pull out code snippets, key concepts, and actionable patterns
+4. **Annotate** - Add metadata for searchability and categorization
+
+## Distillation Workflow
+
+### Step 1: Load Raw Content
+
+```python
+def load_for_distillation(cursor):
+    """Get documents ready for distillation."""
+    sql = """
+    SELECT d.doc_id, d.title, d.url, d.raw_content_path, 
+           d.doc_type, s.source_type, s.credibility_tier
+    FROM documents d
+    JOIN sources s ON d.source_id = s.source_id
+    LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
+    WHERE d.crawl_status = 'completed'
+    AND dc.distill_id IS NULL
+    ORDER BY s.credibility_tier ASC
+    """
+    cursor.execute(sql)
+    return cursor.fetchall()
+```
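A minimal sketch (an annotation, not a line from this diff) of how the `LEFT JOIN ... IS NULL` filter above selects only documents with no row in `distilled_content`. It uses an in-memory SQLite database so it runs standalone. The table layouts are assumptions reduced to the columns the query touches, `credibility_tier` is assumed to be an integer where lower means more credible, and the real schema and database driver may differ.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sources (
            source_id INTEGER PRIMARY KEY, source_type TEXT, credibility_tier INTEGER);
        CREATE TABLE documents (
            doc_id INTEGER PRIMARY KEY, source_id INTEGER, title TEXT, crawl_status TEXT);
        CREATE TABLE distilled_content (
            distill_id INTEGER PRIMARY KEY, doc_id INTEGER);
        INSERT INTO sources VALUES (1, 'official_docs', 1), (2, 'blog', 3);
        INSERT INTO documents VALUES
            (10, 2, 'Blog post', 'completed'),
            (11, 1, 'API guide', 'completed'),
            (12, 1, 'Already distilled', 'completed'),
            (13, 1, 'Still crawling', 'pending');
        INSERT INTO distilled_content VALUES (100, 12);
    """)
    return conn

def pending_doc_ids(conn):
    """Return doc_ids that are crawled but not yet distilled, best tier first."""
    rows = conn.execute("""
        SELECT d.doc_id
        FROM documents d
        JOIN sources s ON d.source_id = s.source_id
        LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
        WHERE d.crawl_status = 'completed'
          AND dc.distill_id IS NULL
        ORDER BY s.credibility_tier ASC
    """).fetchall()
    return [row[0] for row in rows]

print(pending_doc_ids(build_demo_db()))  # [11, 10]
```

Document 12 is skipped because it already has a distilled row, and document 13 because its crawl has not completed.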
+
+### Step 2: Analyze Content Structure
+
+Profile the document's structure; the result informs which distillation strategy to apply:
+
+```python
+import re
+
+def analyze_structure(content, doc_type):
+    """Analyze document structure for distillation."""
+    # doc_type is accepted for strategy selection but not used by these checks.
+    headers = re.findall(r'^#+\s', content, re.MULTILINE)
+    analysis = {
+        "has_code_blocks": bool(re.search(r'```[\s\S]*?```', content)),
+        "has_headers": bool(headers),
+        "has_lists": bool(re.search(r'^\s*[-*]\s', content, re.MULTILINE)),
+        "has_tables": bool(re.search(r'\|.*\|', content)),
+        # Rough estimate: about 1.3 tokens per whitespace-separated word
+        "estimated_tokens": int(len(content.split()) * 1.3),
+        "section_count": len(headers)
+    }
+    return analysis
+```
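One way the header count could feed a strategy is to distill long documents section by section. The sketch below is an annotation rather than part of the commit, and `split_sections` is a hypothetical helper. Like the header check above, it does not skip `#` lines inside fenced code blocks, so those would be treated as headings.

```python
import re

# A header is one or more '#' characters, spaces or tabs, then the heading text.
HEADER_RE = re.compile(r'^(#+)[ \t]+(.*)$', re.MULTILINE)

def split_sections(content):
    """Split Markdown into (level, heading, body) tuples, one per header line."""
    matches = list(HEADER_RE.finditer(content))
    sections = []
    for i, match in enumerate(matches):
        # A section's body runs until the next header or the end of the document.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(content)
        body = content[match.end():end].strip()
        sections.append((len(match.group(1)), match.group(2).strip(), body))
    return sections

doc = "# Title\nintro\n\n## Usage\nrun it\n"
for level, heading, body in split_sections(doc):
    print(level, heading, repr(body))
```

Text that appears before the first header is not returned, so a caller that needs it would have to handle that prefix separately.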
+
+### Step 3: Extract Key Components
+
+**Extract Code Snippets:**
+```python
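+import re
+
+def extract_code_snippets(content):
+    """Extract all fenced code blocks with their language tags."""
+    # [\w+#-]* also matches tags such as "c++", "c#", and "objective-c";
+    # [^\n]* skips any extra info-string text after the tag.
+    pattern = r'```([\w+#-]*)[^\n]*\n([\s\S]*?)```'
+    snippets = []
+    for match in re.finditer(pattern, content):
+        snippets.append({
+            "language": match.group(1) or "text",
+            "code": match.group(2).strip(),
+            # get_surrounding_text is a helper that is not shown here
+            "context": get_surrounding_text(content, match.start(), 200)
+        })
+    return snippets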
+```
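The snippet above calls `get_surrounding_text`, which this file does not define. A minimal sketch of what it might look like follows, assuming "surrounding" means the text immediately before the code block, since the text after `match.start()` is the block itself. The signature is inferred from the call site and is not confirmed by the commit.

```python
def get_surrounding_text(content, position, window=200):
    """Return up to `window` characters that precede `position`, trimmed."""
    start = max(0, position - window)  # Clamp so slices near the start stay valid.
    return content[start:position].strip()

text = "Install the package first:\nCODE"
print(get_surrounding_text(text, text.index("CODE")))
```

A character window can cut a sentence in half; snapping `start` to the previous paragraph break would be a reasonable refinement.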
+
+**Extract Key Concepts:**
+```python
+def extract_key_concepts(content, title):
+    """Use Claude to extract key concepts and definitions."""
+    # Send only the first 8,000 characters to keep the prompt within context limits
+    excerpt = content[:8000]
+    prompt = f"""
+    Analyze this document and extract key concepts:
+
+    Title: {title}
+    Content: {excerpt}
+
+    Return JSON with:
+    - concepts: [{{"term": "...", "definition": "...", "importance": "high|medium|low"}}]
+    - techniques: [{{"name": "...", "description": "...", "use_case": "..."}}]
+    - best_practices: ["..."]
+    """
+    # claude_extract is a helper (not shown) that calls the Claude API
+    return claude_extract(prompt)
+```
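The prompt asks for JSON, but a model reply may wrap it in extra prose. The following sketch, which is an annotation and not part of the commit, shows one defensive way to parse such a reply into the three fields the prompt requests. `parse_extraction` is a hypothetical name, and falling back to empty lists on bad input is an assumed policy.

```python
import json

EXPECTED_KEYS = ("concepts", "techniques", "best_practices")

def parse_extraction(response_text):
    """Parse a model reply into a dict holding the three expected list fields."""
    empty = {key: [] for key in EXPECTED_KEYS}
    # Take the outermost braces so leading or trailing prose is ignored.
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return empty
    # Keep only the expected keys, and only when they hold lists.
    return {
        key: data[key] if isinstance(data.get(key), list) else []
        for key in EXPECTED_KEYS
    }

reply = 'Here you go: {"concepts": [{"term": "few-shot"}], "best_practices": ["be specific"]}'
print(parse_extraction(reply))
```

Returning empty lists keeps the pipeline moving, but it also hides failures; logging or raising may suit a stricter setup better.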
+
+### Step 4: Create Structured Summary
+
+**Summary Template:**
+```markdown
+# {title}
+
+**Source:** {url}
+**Type:** {source_type} | **Tier:** {credibility_tier}
+**Distilled:** {date}
+
+## Executive Summary
+{2-3 sentence overview}
+
+## Key Concepts
+{bulleted list of core concepts with brief definitions}
+
+## Techniques & Patterns
+{extracted techniques with use cases}
+
+## Code Examples
+{relevant code snippets with context}
+
+## Best Practices
+{actionable recommendations}
+
+## Related Topics
+{links to related content in library}
+```
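The template's braces are descriptive placeholders rather than valid format fields, so it cannot be passed to `str.format` directly. Below is a hedged sketch of rendering it from already-written section bodies. `render_summary`, the ISO date format, and the choice to skip empty sections are all assumptions, not taken from the commit.

```python
from datetime import date

SECTION_ORDER = [
    "Executive Summary", "Key Concepts", "Techniques & Patterns",
    "Code Examples", "Best Practices", "Related Topics",
]

def render_summary(title, url, source_type, credibility_tier, sections, distilled_on=None):
    """Render the summary; `sections` maps a heading to its Markdown body."""
    distilled_on = distilled_on or date.today()
    lines = [
        f"# {title}",
        "",
        f"**Source:** {url}",
        f"**Type:** {source_type} | **Tier:** {credibility_tier}",
        f"**Distilled:** {distilled_on.isoformat()}",
    ]
    for heading in SECTION_ORDER:
        body = sections.get(heading)
        if body:  # Skip empty sections instead of emitting bare headings.
            lines += ["", f"## {heading}", body]
    return "\n".join(lines) + "\n"

print(render_summary(
    "Prompt Caching", "https://example.com/doc", "blog", 2,
    {"Key Concepts": "- cache key: what identifies a reusable prefix"},
    distilled_on=date(2026, 1, 29),
))
```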
+
+### Step 5: Optimize for Tokens
+
+```python
+def optimize_content(structured_content, target_ratio=0.30):
+    """
+    Compress content toward a target ratio while preserving quality.
+    Default target: 30% of the input's token count. Strategies run in
+    priority order and are skipped once the target is reached.
+    """
+    original_tokens = count_tokens(structured_content)
+    target_tokens = int(original_tokens * target_ratio)
+    
+    # Prioritized compression strategies
+    strategies = [
+        remove_redundant_explanations,
+        condense_examples,
+        merge_similar_sections,
+        trim_verbose_descriptions
+    ]
+    
+    optimized = structured_content
+    for strategy in strategies:
+        if count_tokens(optimized) > target_tokens:
+            optimized = strategy(optimized)
+    
+    return optimized
+```
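`count_tokens` and the four strategy functions are not defined in this file. The sketch below, an annotation rather than part of the commit, fills in a word-count heuristic for `count_tokens` (the same 1.3 multiplier used in Step 2) and one toy strategy, so the loop can be run end to end. It also reports the ratio actually achieved, because running every strategy once does not guarantee the target is met. All names other than `count_tokens` are hypothetical.

```python
def count_tokens(text):
    """Rough token estimate: whitespace-separated words times 1.3."""
    return int(len(text.split()) * 1.3)

def drop_duplicate_lines(text):
    """Toy strategy: keep only the first occurrence of each non-empty line."""
    seen, kept = set(), []
    for line in text.splitlines():
        key = line.strip()
        if key and key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)

def compress(text, strategies, target_ratio=0.30):
    """Apply strategies in order until the estimate reaches the target."""
    original = count_tokens(text)
    target = int(original * target_ratio)
    for strategy in strategies:
        if count_tokens(text) <= target:
            break  # Already small enough; remaining strategies are not needed.
        text = strategy(text)
    achieved = count_tokens(text) / original if original else 0.0
    return text, achieved

sample = "alpha beta\nalpha beta\ngamma delta\nalpha beta"
result, ratio = compress(sample, [drop_duplicate_lines])
print(repr(result), ratio)
```

Here the single strategy only gets the sample down to half its estimated size, which a caller can detect from the returned ratio. A real tokenizer would give more accurate counts than the word heuristic.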
+
+### Step 6: Store Distilled Content
+
+```python
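+import json
+
+def store_distilled(cursor, doc_id, summary, key_concepts,
+                    code_snippets, structured_content,
+                    original_tokens, distilled_tokens):
+    """Insert a distilled record with 'pending' review status and return its row ID."""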
+    sql = """
+    INSERT INTO distilled_content 
+    (doc_id, summary, key_concepts, code_snippets, structured_content,
+     token_count_original, token_count_distilled, distill_model, review_status)
+    VALUES (%s, %s, %s, %s, %s, %s, %s, 'claude-opus-4-5', 'pending')
+    """
+    cursor.execute(sql, (
+        doc_id, summary, 
+        json.dumps(key_concepts), 
+        json.dumps(code_snippets),
+        structured_content,
+        original_tokens, 
+        distilled_tokens
+    ))
+    return cursor.lastrowid
+```
+
+## Distillation Prompts
+
+**For Prompt Engineering Content:**
+```
+Focus on:
+1. Specific techniques with before/after examples
+2. Why techniques work (not just what)
+3. Common pitfalls and how to avoid them
+4. Actionable patterns that can be directly applied
+```
+
+**For API Documentation:**
+```
+Focus on:
+1. Endpoint specifications and parameters
+2. Request/response examples
+3. Error codes and handling
+4. Rate limits and best practices
+```
+
+**For Research Papers:**
+```
+Focus on:
+1. Key findings and conclusions
+2. Novel techniques introduced
+3. Practical applications
+4. Limitations and caveats
+```
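The three focus lists above could be selected programmatically per document. A minimal sketch follows, not part of the commit. The dictionary keys are hypothetical content-type labels, since the values the pipeline actually stores in `doc_type` are not shown here.

```python
# Keys are illustrative labels; the focus points are copied from the prompts above.
FOCUS_PROMPTS = {
    "prompt_engineering": [
        "Specific techniques with before/after examples",
        "Why techniques work (not just what)",
        "Common pitfalls and how to avoid them",
        "Actionable patterns that can be directly applied",
    ],
    "api_documentation": [
        "Endpoint specifications and parameters",
        "Request/response examples",
        "Error codes and handling",
        "Rate limits and best practices",
    ],
    "research_paper": [
        "Key findings and conclusions",
        "Novel techniques introduced",
        "Practical applications",
        "Limitations and caveats",
    ],
}

def build_focus_prompt(content_type):
    """Return the numbered focus list for a content type; reject unknown types."""
    try:
        points = FOCUS_PROMPTS[content_type]
    except KeyError:
        raise ValueError(f"No distillation prompt for content type: {content_type!r}") from None
    numbered = "\n".join(f"{i}. {point}" for i, point in enumerate(points, start=1))
    return f"Focus on:\n{numbered}"

print(build_focus_prompt("research_paper"))
```

Raising on an unknown type surfaces missing prompts early; falling back to a generic prompt would be an equally plausible design.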
+
+## Quality Metrics
+
+Track distillation quality against these targets:
+
+| Metric | Target |
+|--------|--------|
+| Compression Ratio | 25-35% of original |
+| Key Concept Coverage | ≥90% of important terms |
+| Code Snippet Retention | 100% of relevant examples |
+| Readability | Clear, scannable structure |
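The two numeric targets in the table can be checked mechanically. The sketch below is an annotation, not part of the commit. It assumes the list of important terms is supplied by the caller and that a case-insensitive substring match counts as coverage; neither is specified in the source.

```python
def compression_ratio(original_tokens, distilled_tokens):
    """Distilled size as a fraction of the original (0.30 means 30%)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return distilled_tokens / original_tokens

def concept_coverage(important_terms, distilled_text):
    """Fraction of important terms found (case-insensitively) in the text."""
    if not important_terms:
        return 1.0  # Nothing to cover counts as full coverage.
    text = distilled_text.lower()
    found = sum(1 for term in important_terms if term.lower() in text)
    return found / len(important_terms)

def meets_targets(ratio, coverage):
    """Check the table's numeric targets: 25-35% ratio and at least 90% coverage."""
    return 0.25 <= ratio <= 0.35 and coverage >= 0.90

ratio = compression_ratio(1000, 300)
coverage = concept_coverage(["few-shot", "chain of thought"], "Use few-shot examples.")
print(ratio, coverage, meets_targets(ratio, coverage))
```

Code snippet retention and readability are not covered here because they call for judgment about relevance and structure rather than a simple count.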
+
+## Handling Refactor Requests
+
+When `quality-reviewer` returns a `refactor` decision:
+
+```python
+def handle_refactor(distill_id, instructions):
+    """Re-distill based on reviewer feedback."""
+    # Load original content and existing distillation
+    original = load_raw_content(distill_id)
+    existing = load_distilled_content(distill_id)
+    
+    # Apply the reviewer's instructions, passing the original as the source
+    # for any detail the existing distillation dropped
+    improved = apply_improvements(existing, instructions, source=original)
+    
+    # Update distilled_content
+    update_distilled(distill_id, improved)
+    
+    # Reset review status
+    set_review_status(distill_id, 'pending')
+```
+
+## Integration
+
+| From | Input | To |
+|------|-------|-----|
+| content-repository | Raw document records | content-distiller |
+| content-distiller | Distilled content | quality-reviewer |
+| quality-reviewer | Refactor instructions | content-distiller (loop) |