feat(reference-curator): Add portable skill suite for reference documentation curation

6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,106 @@
# Content Distiller

Analyzes and distills raw crawled content into concise reference materials. Extracts key concepts and code snippets, and creates structured summaries.

## Trigger Keywords
"distill content", "summarize document", "extract key concepts", "process raw content", "create reference summary"

## Goals

1. **Compress** - Reduce token count while preserving essential information
2. **Structure** - Organize content for easy retrieval
3. **Extract** - Pull out code snippets, key concepts, patterns
4. **Annotate** - Add metadata for searchability

## Workflow

### Step 1: Load Raw Content
```bash
python scripts/load_pending.py --output pending_docs.json
```

### Step 2: Analyze Content Structure
Identify document characteristics:
- Has code blocks?
- Has headers?
- Has tables?
- Estimated tokens?
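
The checks above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `analyze_structure`, the returned keys, and the 4-characters-per-token heuristic are all assumptions introduced here.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def analyze_structure(text: str) -> dict:
    """Report basic structural traits of a Markdown document (hypothetical helper)."""
    lines = text.splitlines()
    return {
        # A line that starts with a fence marker opens or closes a code block.
        "has_code_blocks": any(line.lstrip().startswith(FENCE) for line in lines),
        # ATX headers: one to six '#' characters followed by whitespace.
        # Note: this simple check does not skip '#' comments inside code blocks.
        "has_headers": any(re.match(r"#{1,6}\s", line) for line in lines),
        # Pipe tables: a line that starts and ends with '|'.
        "has_tables": any(re.match(r"\s*\|.*\|\s*$", line) for line in lines),
        # Assumed heuristic: roughly 4 characters per token for English text.
        "estimated_tokens": len(text) // 4,
    }
```

A real tokenizer would give a more accurate count; the character-based estimate is only meant for quick triage.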

### Step 3: Extract Key Components
```bash
python scripts/extract_components.py --doc-id 123 --output components.json
```

Extracts:
- Code snippets with language tags
- Key concepts and definitions
- Best practices
- Techniques and patterns
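
As one illustration of the first item, fenced code blocks and their language tags can be pulled out of Markdown with a regular expression. The sketch below is an assumption about how such extraction might work, not the contents of `extract_components.py`; the name `extract_code_snippets` and the `"text"` fallback tag are hypothetical.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence

# Opening fence, optional language tag, block body, closing fence.
CODE_BLOCK = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)\n?" + FENCE, re.DOTALL)


def extract_code_snippets(text: str) -> list:
    """Return each fenced block as {"language": ..., "code": ...} (hypothetical helper)."""
    return [
        {"language": lang or "text", "code": code}  # untagged blocks fall back to "text"
        for lang, code in CODE_BLOCK.findall(text)
    ]
```

This handles ordinary triple-backtick fences only; nested or tilde fences would need a proper Markdown parser.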

### Step 4: Create Structured Summary
Output template:
```markdown
# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}

## Executive Summary
{2-3 sentence overview}

## Key Concepts
{bulleted list with definitions}

## Techniques & Patterns
{extracted techniques with use cases}

## Code Examples
{relevant code snippets}

## Best Practices
{actionable recommendations}
```
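
Filling the template could look roughly like the sketch below. The field names `summary`, `concepts`, `techniques`, `code_examples`, and `best_practices`, and the helper names `bullets` and `render_summary`, are placeholders invented for this example; the template's descriptive placeholders (such as `{2-3 sentence overview}`) are replaced with valid format keys.

```python
from datetime import date

# Same layout as the output template, with format-friendly (assumed) field names.
TEMPLATE = """\
# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}

## Executive Summary
{summary}

## Key Concepts
{concepts}

## Techniques & Patterns
{techniques}

## Code Examples
{code_examples}

## Best Practices
{best_practices}
"""


def bullets(items) -> str:
    """Render a list of strings as a Markdown bullet list."""
    return "\n".join(f"- {item}" for item in items)


def render_summary(doc: dict) -> str:
    """Fill the template from a dict of extracted components (hypothetical shape)."""
    return TEMPLATE.format(
        title=doc["title"],
        url=doc["url"],
        source_type=doc["source_type"],
        credibility_tier=doc["credibility_tier"],
        # Default to today's date when none is supplied.
        date=doc.get("date") or date.today().isoformat(),
        summary=doc["summary"],
        concepts=bullets(doc["concepts"]),
        techniques=bullets(doc["techniques"]),
        code_examples="\n\n".join(doc["code_examples"]),
        best_practices=bullets(doc["best_practices"]),
    )
```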

### Step 5: Optimize for Tokens
Target: 25-35% of original token count
```bash
python scripts/optimize_content.py --doc-id 123 --target-ratio 0.30
```
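
To make the target concrete, the ratio check can be sketched as below. The helper names and the 4-characters-per-token estimate are assumptions for illustration; `optimize_content.py` may measure tokens differently.

```python
def estimate_tokens(text: str) -> int:
    """Assumed heuristic: about 4 characters per token, never less than 1."""
    return max(1, len(text) // 4)


def compression_ratio(original: str, distilled: str) -> float:
    """Distilled token count as a fraction of the original token count."""
    return estimate_tokens(distilled) / estimate_tokens(original)


def within_target(ratio: float, low: float = 0.25, high: float = 0.35) -> bool:
    """True when the ratio falls inside the 25-35% target band."""
    return low <= ratio <= high
```

For example, a 4,000-character source distilled to 1,200 characters gives an estimated ratio of 0.30, which sits inside the band; a 2,000-character result (0.50) would need another optimization pass.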

### Step 6: Store Distilled Content
```bash
python scripts/store_distilled.py --doc-id 123 --content distilled.md
```

## Quality Metrics

| Metric | Target |
|--------|--------|
| Compression Ratio | 25-35% of original |
| Key Concept Coverage | ≥90% of important terms |
| Code Snippet Retention | 100% of relevant examples |
| Readability | Clear, scannable structure |
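
Key concept coverage is the least obvious metric to compute, so here is one possible reading of it: the fraction of a supplied list of important terms that still appear in the distilled text. The function name `concept_coverage` and the case-insensitive substring match are assumptions; how the "important terms" list is produced is not specified in this document.

```python
def concept_coverage(important_terms, distilled: str) -> float:
    """Fraction of important terms found in the distilled text (hypothetical metric)."""
    terms = [term.lower() for term in important_terms]
    if not terms:
        return 1.0  # nothing to cover counts as full coverage
    text = distilled.lower()
    found = sum(1 for term in terms if term in text)
    return found / len(terms)
```

With four important terms of which three survive distillation, coverage is 0.75, which would fall short of the ≥90% target.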

## Handling Refactor Requests

When `quality-reviewer` returns `refactor`:
```bash
python scripts/refactor_content.py --distill-id 456 --instructions "Add more examples"
```
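
If the reviewer's decision arrives as structured data, the dispatch might be sketched like this. The review dict shape (`decision`, `distill_id`, `instructions`) and both function names are hypothetical; only the script path and its `--distill-id` and `--instructions` flags come from the command above.

```python
def build_refactor_command(distill_id: int, instructions: str) -> list:
    """Build the argument list for the refactor script shown above."""
    return [
        "python", "scripts/refactor_content.py",
        "--distill-id", str(distill_id),
        "--instructions", instructions,
    ]


def next_action(review: dict):
    """Return a command to run for a 'refactor' decision, else None (assumed dict shape)."""
    if review.get("decision") == "refactor":
        return build_refactor_command(review["distill_id"], review["instructions"])
    return None  # other decisions are not handled by this skill
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell-quoting problems when the instructions contain spaces or quotes.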

## Scripts

- `scripts/load_pending.py` - Load documents pending distillation
- `scripts/extract_components.py` - Extract code, concepts, patterns
- `scripts/optimize_content.py` - Token optimization
- `scripts/store_distilled.py` - Save to database
- `scripts/refactor_content.py` - Handle refactor requests

## Integration

| Direction | Skill | Data |
|-----------|-------|------|
| Input | content-repository | Raw document records |
| Output | quality-reviewer | Distilled content |
| Input | quality-reviewer | Refactor instructions (loop back) |