feat(reference-curator): Add Claude.ai Projects export format
Add claude-project/ folder with skill files formatted for upload to
Claude.ai Projects (web interface):
- reference-curator-complete.md: All 6 skills consolidated
- INDEX.md: Overview and workflow documentation
- Individual skill files (01-06) without YAML frontmatter

Add --claude-ai option to install.sh:
- Lists available files for upload
- Optionally copies to custom destination directory
- Provides upload instructions for Claude.ai

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,184 @@
# Reference Discovery

Searches for authoritative sources, validates credibility, and produces curated URL lists for crawling.

## Source Priority Hierarchy

| Tier | Source Type | Examples |
|------|-------------|----------|
| **Tier 1** | Official documentation | docs.anthropic.com, docs.claude.com, platform.openai.com/docs |
| **Tier 1** | Engineering blogs (official) | anthropic.com/news, openai.com/blog |
| **Tier 1** | Official GitHub repos | github.com/anthropics/*, github.com/openai/* |
| **Tier 2** | Research papers | arxiv.org, papers with citations |
| **Tier 2** | Verified community guides | Cookbook examples, official tutorials |
| **Tier 3** | Community content | Blog posts, tutorials, Stack Overflow |

## Discovery Workflow

### Step 1: Define Search Scope

```python
search_config = {
    "topic": "prompt engineering",
    "vendors": ["anthropic", "openai", "google"],
    "source_types": ["official_docs", "engineering_blog", "github_repo"],
    "freshness": "past_year",  # past_week, past_month, past_year, any
    "max_results_per_query": 20
}
```

### Step 2: Generate Search Queries

For a given topic, generate targeted queries:

```python
# GitHub organization names that differ from the vendor name
GITHUB_ORGS = {"anthropic": "anthropics"}

def generate_queries(topic, vendors):
    queries = []

    # Official documentation queries
    for vendor in vendors:
        queries.append(f"site:docs.{vendor}.com {topic}")
        queries.append(f"site:{vendor}.com/docs {topic}")

    # Engineering blog queries
    for vendor in vendors:
        queries.append(f"site:{vendor}.com/blog {topic}")
        queries.append(f"site:{vendor}.com/news {topic}")

    # GitHub queries
    for vendor in vendors:
        org = GITHUB_ORGS.get(vendor, vendor)
        queries.append(f"site:github.com/{org} {topic}")

    # Research queries
    queries.append(f"site:arxiv.org {topic}")

    return queries
```

### Step 3: Execute Search

Use the web search tool for each query:

```python
def execute_discovery(queries):
    results = []
    for query in queries:
        search_results = web_search(query)
        for result in search_results:
            results.append({
                "url": result.url,
                "title": result.title,
                "snippet": result.snippet,
                "query_used": query
            })
    return deduplicate_by_url(results)
```

### Step 4: Validate and Score Sources

```python
from urllib.parse import urlparse

OFFICIAL_DOC_HOSTS = ('docs.anthropic.com', 'docs.claude.com', 'platform.openai.com')
OFFICIAL_DOMAINS = ('anthropic.com', 'openai.com', 'google.dev')
OFFICIAL_GITHUB_ORGS = ('anthropics', 'openai', 'google')

def score_source(url, title):
    score = 0.0
    parsed = urlparse(url)
    host = (parsed.hostname or '').lower()
    first_path_segment = parsed.path.strip('/').split('/')[0].lower()

    # Domain credibility (match on the hostname, not on a substring of the URL)
    if host in OFFICIAL_DOC_HOSTS:
        score += 0.40  # Tier 1 official docs
    elif any(host == d or host.endswith('.' + d) for d in OFFICIAL_DOMAINS):
        score += 0.30  # Tier 1 official blog/news
    elif host == 'github.com' and first_path_segment in OFFICIAL_GITHUB_ORGS:
        score += 0.30  # Tier 1 official repos
    elif host == 'arxiv.org' or host.endswith('.arxiv.org'):
        score += 0.20  # Tier 2 research
    else:
        score += 0.10  # Tier 3 community

    # Freshness signals (from title)
    if any(year in title for year in ['2025', '2024']):
        score += 0.20
    elif any(year in title for year in ['2023']):
        score += 0.10

    # Relevance signals
    if any(kw in title.lower() for kw in ['guide', 'documentation', 'tutorial', 'best practices']):
        score += 0.15

    return min(score, 1.0)  # Highest reachable value is 0.40 + 0.20 + 0.15 = 0.75

def assign_credibility_tier(score):
    if score >= 0.60:
        return 'tier1_official'
    elif score >= 0.40:
        return 'tier2_verified'
    else:
        return 'tier3_community'
```

### Step 5: Output URL Manifest

```python
from datetime import datetime

def create_manifest(scored_results, topic):
    manifest = {
        "discovery_date": datetime.now().isoformat(),
        "topic": topic,
        "total_urls": len(scored_results),
        "urls": []
    }

    # Highest-scoring sources first
    for result in sorted(scored_results, key=lambda x: x['score'], reverse=True):
        manifest["urls"].append({
            "url": result["url"],
            "title": result["title"],
            "credibility_tier": result["tier"],
            "credibility_score": result["score"],
            "source_type": infer_source_type(result["url"]),
            "vendor": infer_vendor(result["url"])
        })

    return manifest
```
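`create_manifest` calls `infer_source_type` and `infer_vendor`, which are not defined in this file. The following is only a sketch of what they might look like, assuming classification by hostname and path. The domain lists are taken from the "Known Authoritative Sources" table, while the `"research_paper"`, `"community"` and `"unknown"` labels and the helper names prefixed with `_` are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical lookup tables, derived from the Known Authoritative Sources table.
VENDOR_DOMAINS = {
    "anthropic": ("anthropic.com", "claude.com"),
    "openai": ("openai.com",),
    "google": ("google.dev", "blog.google"),
}
GITHUB_ORGS = {"anthropics": "anthropic", "openai": "openai", "google": "google"}


def _host_and_path(url):
    parsed = urlparse(url)
    return (parsed.hostname or "").lower(), parsed.path


def _matches(host, domain):
    # True for the domain itself and for any of its subdomains.
    return host == domain or host.endswith("." + domain)


def infer_vendor(url):
    host, path = _host_and_path(url)
    if _matches(host, "github.com"):
        org = path.strip("/").split("/")[0].lower()
        return GITHUB_ORGS.get(org, "unknown")
    for vendor, domains in VENDOR_DOMAINS.items():
        if any(_matches(host, d) for d in domains):
            return vendor
    return "unknown"


def infer_source_type(url):
    host, path = _host_and_path(url)
    if _matches(host, "github.com"):
        return "github_repo"
    if _matches(host, "arxiv.org"):
        return "research_paper"
    if host.startswith("docs.") or path.startswith("/docs"):
        return "official_docs"
    if host.startswith("blog.") or path.startswith(("/blog", "/news")):
        return "engineering_blog"
    return "community"
```

Note that `infer_source_type` only looks at the shape of the URL, so a documentation-style URL on an unrecognized domain is still labelled `official_docs`; combining it with `infer_vendor(url) != "unknown"` would be one way to tighten that.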

## Output Format

Discovery produces a JSON manifest for the crawler:

```json
{
  "discovery_date": "2025-01-28T10:30:00",
  "topic": "prompt engineering",
  "total_urls": 15,
  "urls": [
    {
      "url": "https://docs.anthropic.com/en/docs/prompt-engineering",
      "title": "Prompt Engineering Guide",
      "credibility_tier": "tier1_official",
      "credibility_score": 0.85,
      "source_type": "official_docs",
      "vendor": "anthropic"
    }
  ]
}
```

## Known Authoritative Sources

Pre-validated sources for common topics:

| Vendor | Documentation | Blog/News | GitHub |
|--------|--------------|-----------|--------|
| Anthropic | docs.anthropic.com, docs.claude.com | anthropic.com/news | github.com/anthropics |
| OpenAI | platform.openai.com/docs | openai.com/blog | github.com/openai |
| Google | ai.google.dev/docs | blog.google/technology/ai | github.com/google |

## Integration

**Output:** URL manifest JSON → `web-crawler-orchestrator`

**Database:** Register new sources in `sources` table via `content-repository`

## Deduplication

Before outputting, deduplicate URLs:
- Normalize URLs (remove trailing slashes, query params)
- Check against existing `documents` table via `content-repository`
- Merge duplicate entries, keeping highest credibility score
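
The normalization and merge steps above could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `normalize_url` is a hypothetical helper, entries are assumed to be dicts with a `url` key and an optional `score` key, and the lookup against the `documents` table is left out.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host, drop query string and fragment, strip trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def deduplicate_by_url(results):
    """Keep one entry per normalized URL, preferring the highest score."""
    best = {}
    for result in results:
        key = normalize_url(result["url"])
        current = best.get(key)
        # Entries without a score are treated as 0.0, so the first one seen wins ties.
        if current is None or result.get("score", 0.0) > current.get("score", 0.0):
            best[key] = {**result, "url": key}
    return list(best.values())
```

Dropping every query parameter follows the rule as written, but it can merge pages that are only distinguished by a parameter (for example `?page=2`), so a real implementation may want an allowlist.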
@@ -0,0 +1,230 @@
# Web Crawler Orchestrator

Manages crawling operations using Firecrawl MCP with rate limiting and format handling.

## Prerequisites

- Firecrawl MCP server connected
- Config file at `~/.config/reference-curator/crawl_config.yaml`
- Storage directory exists: `~/reference-library/raw/`

## Crawl Configuration

```yaml
# ~/.config/reference-curator/crawl_config.yaml
firecrawl:
  rate_limit:
    requests_per_minute: 20
    concurrent_requests: 3
  default_options:
    timeout: 30000
    only_main_content: true
    include_html: false

processing:
  max_content_size_mb: 50
  raw_content_dir: ~/reference-library/raw/
```

## Crawl Workflow

### Step 1: Load URL Manifest

Receive manifest from `reference-discovery`:

```python
import json

def load_manifest(manifest_path):
    with open(manifest_path) as f:
        manifest = json.load(f)
    return manifest["urls"]
```

### Step 2: Determine Crawl Strategy

```python
def select_strategy(url):
    """Select optimal crawl strategy based on URL characteristics."""

    if url.endswith('.pdf'):
        return 'pdf_extract'
    elif 'github.com' in url and '/blob/' in url:
        return 'raw_content'  # Get raw file content
    elif 'github.com' in url:
        return 'scrape'  # Repository pages
    elif any(d in url for d in ['docs.', 'documentation']):
        return 'scrape'  # Documentation sites
    else:
        return 'scrape'  # Default
```

### Step 3: Execute Firecrawl

Use Firecrawl MCP for crawling:

```python
# Single page scrape
firecrawl_scrape(
    url="https://docs.anthropic.com/en/docs/prompt-engineering",
    formats=["markdown"],  # markdown | html | screenshot
    only_main_content=True,
    timeout=30000
)

# Multi-page crawl (documentation sites)
firecrawl_crawl(
    url="https://docs.anthropic.com/en/docs/",
    max_depth=2,
    limit=50,
    formats=["markdown"],
    only_main_content=True
)
```

### Step 4: Rate Limiting

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, requests_per_minute=20):
        self.rpm = requests_per_minute
        self.request_times = deque()

    def wait_if_needed(self):
        now = time.time()
        # Remove requests older than 1 minute
        while self.request_times and now - self.request_times[0] > 60:
            self.request_times.popleft()

        if len(self.request_times) >= self.rpm:
            wait_time = 60 - (now - self.request_times[0])
            if wait_time > 0:
                time.sleep(wait_time)

        self.request_times.append(time.time())
```

### Step 5: Save Raw Content

```python
import hashlib
from datetime import datetime
from pathlib import Path

def save_content(url, content, content_type='markdown'):
    """Save crawled content to raw storage."""

    # Generate filename from URL hash
    url_hash = hashlib.sha256(url.encode()).hexdigest()[:16]

    # Determine extension
    ext_map = {'markdown': '.md', 'html': '.html', 'pdf': '.pdf'}
    ext = ext_map.get(content_type, '.txt')

    # Create dated subdirectory
    date_dir = datetime.now().strftime('%Y/%m')
    output_dir = Path.home() / 'reference-library/raw' / date_dir
    output_dir.mkdir(parents=True, exist_ok=True)

    # Save file (PDF content is bytes, everything else is text)
    filepath = output_dir / f"{url_hash}{ext}"
    if content_type == 'pdf':
        filepath.write_bytes(content)
    else:
        filepath.write_text(content, encoding='utf-8')

    return str(filepath)
```

### Step 6: Generate Crawl Manifest

```python
from datetime import datetime

def create_crawl_manifest(results):
    manifest = {
        "crawl_date": datetime.now().isoformat(),
        "total_crawled": len([r for r in results if r["status"] == "success"]),
        "total_failed": len([r for r in results if r["status"] == "failed"]),
        "documents": []
    }

    for result in results:
        manifest["documents"].append({
            "url": result["url"],
            "status": result["status"],
            "raw_content_path": result.get("filepath"),
            "content_size": result.get("size"),
            "crawl_method": "firecrawl",
            "error": result.get("error")
        })

    return manifest
```

## Error Handling

| Error | Action |
|-------|--------|
| Timeout | Retry once with 2x timeout |
| Rate limit (429) | Exponential backoff, max 3 retries |
| Not found (404) | Log and skip |
| Access denied (403) | Log, mark as `failed` |
| Connection error | Retry with backoff |

```python
import time

# RateLimitError and NotFoundError stand for whatever exceptions the
# Firecrawl client raises for HTTP 429 and 404 responses.
def crawl_with_retry(url, max_retries=3, timeout=30000):
    timed_out = False
    for attempt in range(max_retries):
        try:
            result = firecrawl_scrape(url, timeout=timeout)
            return {"status": "success", "content": result}
        except RateLimitError:
            wait = 2 ** attempt * 10  # 10, 20, 40 seconds
            time.sleep(wait)
        except TimeoutError:
            if timed_out:
                # The doubled timeout was not enough either
                return {"status": "failed", "error": "Timeout"}
            timed_out = True
            timeout *= 2  # Retry once with doubled timeout
        except NotFoundError:
            return {"status": "failed", "error": "404 Not Found"}
        except Exception as e:
            if attempt == max_retries - 1:
                return {"status": "failed", "error": str(e)}
            time.sleep(2 ** attempt)  # Back off before the next attempt

    return {"status": "failed", "error": "Max retries exceeded"}
```

## Firecrawl MCP Reference

**scrape** - Single page:
```
firecrawl_scrape(url, formats, only_main_content, timeout)
```

**crawl** - Multi-page:
```
firecrawl_crawl(url, max_depth, limit, formats, only_main_content)
```

**map** - Discover URLs:
```
firecrawl_map(url, limit)  # Returns list of URLs on site
```

## Integration

| From | Input | To |
|------|-------|-----|
| reference-discovery | URL manifest | web-crawler-orchestrator |
| web-crawler-orchestrator | Crawl manifest + raw files | content-repository |
| quality-reviewer (deep_research) | Additional queries | reference-discovery → here |

## Output Structure

```
~/reference-library/raw/
└── 2025/01/
    ├── a1b2c3d4e5f6g7h8.md   # Markdown content
    ├── b2c3d4e5f6g7h8i9.md
    └── c3d4e5f6g7h8i9j0.pdf  # PDF documents
```
@@ -0,0 +1,158 @@
# Content Repository

Manages MySQL storage for the reference library system. Handles document storage, version control, deduplication, and retrieval.

## Prerequisites

- MySQL 8.0+ with utf8mb4 charset
- Config file at `~/.config/reference-curator/db_config.yaml`
- Database `reference_library` initialized with schema

## Quick Reference

### Connection Setup

```python
import yaml
import os
from pathlib import Path

def get_db_config():
    config_path = Path.home() / ".config/reference-curator/db_config.yaml"
    with open(config_path) as f:
        config = yaml.safe_load(f)

    # Resolve environment variables
    mysql = config['mysql']
    return {
        'host': mysql['host'],
        'port': mysql['port'],
        'database': mysql['database'],
        'user': os.environ.get('MYSQL_USER', mysql.get('user', '')),
        'password': os.environ.get('MYSQL_PASSWORD', mysql.get('password', '')),
        'charset': mysql['charset']
    }
```

### Core Operations

**Store New Document:**
```python
def store_document(cursor, source_id, title, url, doc_type, raw_content_path):
    sql = """
        INSERT INTO documents (source_id, title, url, doc_type, crawl_date, crawl_status, raw_content_path)
        VALUES (%s, %s, %s, %s, NOW(), 'completed', %s)
        ON DUPLICATE KEY UPDATE
            version = version + 1,
            previous_version_id = doc_id,
            crawl_date = NOW(),
            raw_content_path = VALUES(raw_content_path)
    """
    cursor.execute(sql, (source_id, title, url, doc_type, raw_content_path))
    return cursor.lastrowid
```

**Check Duplicate:**
```python
def is_duplicate(cursor, url):
    cursor.execute("SELECT doc_id FROM documents WHERE url_hash = SHA2(%s, 256)", (url,))
    return cursor.fetchone() is not None
```
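
`is_duplicate` lets MySQL compute `SHA2(%s, 256)`, which yields a 64-character hex string. When the same value is needed on the client (for logging, or to pre-filter a batch before querying), it can be sketched in Python as below. The helper name `url_hash` is hypothetical, and the result should only be expected to match the stored column if the URL reaches MySQL as UTF-8, which the utf8mb4 prerequisite suggests but the schema is not shown here to confirm.

```python
import hashlib


def url_hash(url):
    """Hex SHA-256 digest of the UTF-8 encoded URL (64 lowercase hex characters)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


print(url_hash("https://docs.anthropic.com/en/docs/prompt-engineering")[:16])
```

The crawler's `save_content` uses the first 16 characters of the same digest as the raw file name, so the two identifiers share a prefix for a given URL string.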

**Get Documents by Topic:**
```python
def get_docs_by_topic(cursor, topic_slug, min_quality=0.80):
    sql = """
        SELECT d.doc_id, d.title, d.url, dc.structured_content, dc.quality_score
        FROM documents d
        JOIN document_topics dt ON d.doc_id = dt.doc_id
        JOIN topics t ON dt.topic_id = t.topic_id
        LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
        WHERE t.topic_slug = %s
          AND (dc.review_status = 'approved' OR dc.review_status IS NULL)
          AND (dc.quality_score >= %s OR dc.quality_score IS NULL)
        ORDER BY dt.relevance_score DESC
    """
    # min_quality filters distilled rows; documents not yet distilled still pass
    cursor.execute(sql, (topic_slug, min_quality))
    return cursor.fetchall()
```

## Table Quick Reference

| Table | Purpose | Key Fields |
|-------|---------|------------|
| `sources` | Authorized content sources | source_type, credibility_tier, vendor |
| `documents` | Crawled document metadata | url_hash (dedup), version, crawl_status |
| `distilled_content` | Processed summaries | review_status, compression_ratio |
| `review_logs` | QA decisions | quality_score, decision, refactor_instructions |
| `topics` | Taxonomy | topic_slug, parent_topic_id |
| `document_topics` | Many-to-many linking | relevance_score |
| `export_jobs` | Export tracking | export_type, output_format, status |

## Status Values

**crawl_status:** `pending` → `completed` | `failed` | `stale`

**review_status:** `pending` → `in_review` → `approved` | `needs_refactor` | `rejected`

**decision (review):** `approve` | `refactor` | `deep_research` | `reject`

## Common Queries

### Find Stale Documents (needs re-crawl)
```sql
SELECT d.doc_id, d.title, d.url, d.crawl_date
FROM documents d
JOIN crawl_schedule cs ON d.source_id = cs.source_id
WHERE d.crawl_date < DATE_SUB(NOW(), INTERVAL
    CASE cs.frequency
        WHEN 'daily' THEN 1
        WHEN 'weekly' THEN 7
        WHEN 'biweekly' THEN 14
        WHEN 'monthly' THEN 30
    END DAY)
AND cs.is_enabled = TRUE;
```

### Get Pending Reviews
```sql
SELECT dc.distill_id, d.title, d.url, dc.token_count_distilled
FROM distilled_content dc
JOIN documents d ON dc.doc_id = d.doc_id
WHERE dc.review_status = 'pending'
ORDER BY dc.distill_date ASC;
```

### Export-Ready Content
```sql
SELECT d.title, d.url, dc.structured_content, t.topic_slug
FROM documents d
JOIN distilled_content dc ON d.doc_id = dc.doc_id
JOIN document_topics dt ON d.doc_id = dt.doc_id
JOIN topics t ON dt.topic_id = t.topic_id
JOIN review_logs rl ON dc.distill_id = rl.distill_id
WHERE rl.decision = 'approve'
AND rl.quality_score >= 0.85
ORDER BY t.topic_slug, dt.relevance_score DESC;
```

## Workflow Integration

1. **From web-crawler-orchestrator:** Receive URL + raw content path → `store_document()`
2. **To content-distiller:** Query pending documents → send for processing
3. **From quality-reviewer:** Update `review_status` based on decision
4. **To markdown-exporter:** Query approved content by topic

## Error Handling

- **Duplicate URL:** Silent update (version increment) via `ON DUPLICATE KEY UPDATE`
- **Missing source_id:** Validate against `sources` table before insert
- **Connection failure:** Implement retry with exponential backoff
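
For the connection-failure case above, a retry wrapper with exponential backoff might look like the sketch below. It is driver-agnostic on purpose: `with_retry`, its parameters, and the use of the built-in `ConnectionError` are assumptions for illustration, and with a real MySQL driver the exception classes passed in `retry_on` would be that driver's own connection errors.

```python
import time


def with_retry(operation, max_retries=3, base_delay=1.0,
               retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_retries):
        try:
            return operation()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults


# Usage sketch with a stand-in operation that fails twice before succeeding.
attempts = []

def flaky_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("database unavailable")
    return "connected"

print(with_retry(flaky_connect, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays.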

## Full Schema Reference

See `references/schema.sql` for complete table definitions including indexes and constraints.

## Config File Template

See `references/db_config_template.yaml` for connection configuration template.
@@ -0,0 +1,234 @@
# Content Distiller

Transforms raw crawled content into structured, high-quality reference materials.

## Distillation Goals

1. **Compress** - Reduce token count while preserving essential information
2. **Structure** - Organize content for easy retrieval and reference
3. **Extract** - Pull out code snippets, key concepts, and actionable patterns
4. **Annotate** - Add metadata for searchability and categorization

## Distillation Workflow

### Step 1: Load Raw Content

```python
def load_for_distillation(cursor):
    """Get documents ready for distillation."""
    sql = """
        SELECT d.doc_id, d.title, d.url, d.raw_content_path,
               d.doc_type, s.source_type, s.credibility_tier
        FROM documents d
        JOIN sources s ON d.source_id = s.source_id
        LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
        WHERE d.crawl_status = 'completed'
          AND dc.distill_id IS NULL
        ORDER BY s.credibility_tier ASC
    """
    cursor.execute(sql)
    return cursor.fetchall()
```

### Step 2: Analyze Content Structure

Identify content type and select appropriate distillation strategy:

```python
import re

def analyze_structure(content, doc_type):
    """Analyze document structure for distillation."""
    analysis = {
        "has_code_blocks": bool(re.findall(r'```[\s\S]*?```', content)),
        "has_headers": bool(re.findall(r'^#+\s', content, re.MULTILINE)),
        "has_lists": bool(re.findall(r'^\s*[-*]\s', content, re.MULTILINE)),
        "has_tables": bool(re.findall(r'\|.*\|', content)),
        "estimated_tokens": len(content.split()) * 1.3,  # Rough estimate
        "section_count": len(re.findall(r'^#+\s', content, re.MULTILINE))
    }
    return analysis
```

### Step 3: Extract Key Components

**Extract Code Snippets:**
```python
import re

def extract_code_snippets(content):
    """Extract all code blocks with language tags."""
    pattern = r'```(\w*)\n([\s\S]*?)```'
    snippets = []
    for match in re.finditer(pattern, content):
        snippets.append({
            "language": match.group(1) or "text",
            "code": match.group(2).strip(),
            "context": get_surrounding_text(content, match.start(), 200)
        })
    return snippets
```
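
`extract_code_snippets` depends on `get_surrounding_text`, which is not shown in this file. One possible reading, assumed here, is that the context is the text immediately preceding the code block, capped at the given number of characters; the signature below simply mirrors the call site.

```python
def get_surrounding_text(content, position, window=200):
    """Return up to `window` characters of text immediately before `position`."""
    start = max(0, position - window)
    return content[start:position].strip()


# Usage sketch: context for the first fenced block in a small document.
fence = "`" * 3
doc = "Use few-shot examples.\n\n" + fence + "python\nprint('hi')\n" + fence
print(get_surrounding_text(doc, doc.index(fence)))
```

A window that also includes text after the block would be an equally valid interpretation; only the preceding text is used here because introductory sentences usually come before the code they describe.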

**Extract Key Concepts:**
```python
def extract_key_concepts(content, title):
    """Use Claude to extract key concepts and definitions."""
    excerpt = content[:8000]  # Limit for context
    prompt = f"""
    Analyze this document and extract key concepts:

    Title: {title}
    Content: {excerpt}

    Return JSON with:
    - concepts: [{{"term": "...", "definition": "...", "importance": "high|medium|low"}}]
    - techniques: [{{"name": "...", "description": "...", "use_case": "..."}}]
    - best_practices: ["..."]
    """
    # Use Claude API to process
    return claude_extract(prompt)
```

### Step 4: Create Structured Summary

**Summary Template:**
```markdown
# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}
**Distilled:** {date}

## Executive Summary
{2-3 sentence overview}

## Key Concepts
{bulleted list of core concepts with brief definitions}

## Techniques & Patterns
{extracted techniques with use cases}

## Code Examples
{relevant code snippets with context}

## Best Practices
{actionable recommendations}

## Related Topics
{links to related content in library}
```

### Step 5: Optimize for Tokens

```python
def optimize_content(structured_content, target_ratio=0.30):
    """
    Compress content to target ratio while preserving quality.
    Target: 30% of original token count.
    """
    original_tokens = count_tokens(structured_content)
    target_tokens = int(original_tokens * target_ratio)

    # Prioritized compression strategies
    strategies = [
        remove_redundant_explanations,
        condense_examples,
        merge_similar_sections,
        trim_verbose_descriptions
    ]

    optimized = structured_content
    for strategy in strategies:
        if count_tokens(optimized) > target_tokens:
            optimized = strategy(optimized)

    return optimized
```

### Step 6: Store Distilled Content

```python
import json

def store_distilled(cursor, doc_id, summary, key_concepts,
                    code_snippets, structured_content,
                    original_tokens, distilled_tokens):
    sql = """
        INSERT INTO distilled_content
        (doc_id, summary, key_concepts, code_snippets, structured_content,
         token_count_original, token_count_distilled, distill_model, review_status)
        VALUES (%s, %s, %s, %s, %s, %s, %s, 'claude-opus-4-5', 'pending')
    """
    cursor.execute(sql, (
        doc_id, summary,
        json.dumps(key_concepts),
        json.dumps(code_snippets),
        structured_content,
        original_tokens,
        distilled_tokens
    ))
    return cursor.lastrowid
```

## Distillation Prompts

**For Prompt Engineering Content:**
```
Focus on:
1. Specific techniques with before/after examples
2. Why techniques work (not just what)
3. Common pitfalls and how to avoid them
4. Actionable patterns that can be directly applied
```

**For API Documentation:**
```
Focus on:
1. Endpoint specifications and parameters
2. Request/response examples
3. Error codes and handling
4. Rate limits and best practices
```

**For Research Papers:**
```
Focus on:
1. Key findings and conclusions
2. Novel techniques introduced
3. Practical applications
4. Limitations and caveats
```

## Quality Metrics

Track compression efficiency:

| Metric | Target |
|--------|--------|
| Compression Ratio | 25-35% of original |
| Key Concept Coverage | ≥90% of important terms |
| Code Snippet Retention | 100% of relevant examples |
| Readability | Clear, scannable structure |
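
The compression-ratio target can be checked mechanically from the two token counts that `store_distilled` already records. The sketch below assumes the ratio is defined as distilled tokens divided by original tokens; the function names are hypothetical and are not taken from the schema.

```python
def compression_ratio(original_tokens, distilled_tokens):
    """Fraction of the original token count that remains after distillation."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return distilled_tokens / original_tokens


def within_target(ratio, low=0.25, high=0.35):
    """True when the ratio falls inside the 25-35% target band."""
    return low <= ratio <= high


ratio = compression_ratio(original_tokens=12000, distilled_tokens=3600)
print(f"{ratio:.0%} of original, within target: {within_target(ratio)}")
```

A ratio above the band suggests the distillation is still too verbose, while one below it may indicate that key concepts or code examples were dropped.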

## Handling Refactor Requests

When `quality-reviewer` returns a `refactor` decision:

```python
def handle_refactor(distill_id, instructions):
    """Re-distill based on reviewer feedback."""
    # Load original content and existing distillation
    original = load_raw_content(distill_id)
    existing = load_distilled_content(distill_id)

    # Apply specific improvements based on instructions
    improved = apply_improvements(existing, instructions)

    # Update distilled_content
    update_distilled(distill_id, improved)

    # Reset review status
    set_review_status(distill_id, 'pending')
```

## Integration

| From | Input | To |
|------|-------|-----|
| content-repository | Raw document records | content-distiller |
| content-distiller | Distilled content | quality-reviewer |
| quality-reviewer | Refactor instructions | content-distiller (loop) |
@@ -0,0 +1,223 @@
# Quality Reviewer

Evaluates distilled content for quality, routes decisions, and triggers refactoring or additional research when needed.

## Review Workflow

```
[Distilled Content]
       │
       ▼
┌─────────────────┐
│ Score Criteria  │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
       │
       ▼
┌─────────────────┐
│ Calculate Total │ → weighted average
└─────────────────┘
       │
       ├── ≥ 0.85 → APPROVE → markdown-exporter
       ├── 0.60-0.84 → REFACTOR → content-distiller (with instructions)
       ├── 0.40-0.59 → DEEP_RESEARCH → web-crawler-orchestrator (with queries)
       └── < 0.40 → REJECT → archive with reason
```

## Scoring Criteria

| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date info, proper attribution |
| **Completeness** | 0.20 | Covers key concepts, includes examples, addresses edge cases |
| **Clarity** | 0.20 | Clear structure, concise language, logical flow |
| **PE Quality** | 0.25 | Demonstrates techniques, before/after examples, explains why |
| **Usability** | 0.10 | Easy to reference, searchable keywords, appropriate length |

## Decision Thresholds

| Score Range | Decision | Action |
|-------------|----------|--------|
| ≥ 0.85 | `approve` | Proceed to export |
| 0.60 - 0.84 | `refactor` | Return to distiller with feedback |
| 0.40 - 0.59 | `deep_research` | Gather more sources, then re-distill |
| < 0.40 | `reject` | Archive, log reason |

## Review Process

### Step 1: Load Content for Review

```python
def get_pending_reviews(cursor):
    sql = """
        SELECT dc.distill_id, dc.doc_id, d.title, d.url,
               dc.summary, dc.key_concepts, dc.structured_content,
               dc.token_count_original, dc.token_count_distilled,
               s.credibility_tier
        FROM distilled_content dc
        JOIN documents d ON dc.doc_id = d.doc_id
        JOIN sources s ON d.source_id = s.source_id
        WHERE dc.review_status = 'pending'
        ORDER BY s.credibility_tier ASC, dc.distill_date ASC
    """
    cursor.execute(sql)
    return cursor.fetchall()
```

### Step 2: Score Each Criterion

Evaluate content against each criterion using this assessment template:

```python
assessment_template = {
    "accuracy": {
        "score": 0.0,  # 0.00 - 1.00
        "notes": "",
        "issues": []  # Specific factual errors if any
    },
    "completeness": {
        "score": 0.0,
        "notes": "",
        "missing_topics": []  # Concepts that should be covered
    },
    "clarity": {
        "score": 0.0,
        "notes": "",
        "confusing_sections": []  # Sections needing rewrite
    },
    "prompt_engineering_quality": {
        "score": 0.0,
        "notes": "",
        "improvements": []  # Specific PE technique gaps
    },
    "usability": {
        "score": 0.0,
        "notes": "",
        "suggestions": []
    }
}
```

### Step 3: Calculate Final Score

```python
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10
}

def calculate_quality_score(assessment):
    return sum(
        assessment[criterion]["score"] * weight
        for criterion, weight in WEIGHTS.items()
    )
```
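
As a worked example of the weighted average, the sketch below repeats the weights and the scoring function so that it runs on its own, then applies them to a set of made-up per-criterion scores. The scores are hypothetical and chosen only to show a document that lands in the `refactor` band.

```python
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10,
}


def calculate_quality_score(assessment):
    return sum(
        assessment[criterion]["score"] * weight
        for criterion, weight in WEIGHTS.items()
    )


# Hypothetical per-criterion scores for one distilled document.
example = {
    "accuracy": {"score": 0.90},
    "completeness": {"score": 0.80},
    "clarity": {"score": 0.85},
    "prompt_engineering_quality": {"score": 0.70},
    "usability": {"score": 0.90},
}

# 0.90*0.25 + 0.80*0.20 + 0.85*0.20 + 0.70*0.25 + 0.90*0.10
# = 0.225 + 0.160 + 0.170 + 0.175 + 0.090 = 0.82
total = calculate_quality_score(example)
print(f"{total:.2f}")
```

Because the weights sum to 1.0, the total stays in the same 0.00 to 1.00 range as the individual scores; here the weak `prompt_engineering_quality` score is enough to keep an otherwise strong document below the 0.85 approval threshold.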

### Step 4: Route Decision

```python
def determine_decision(score, assessment):
    if score >= 0.85:
        return "approve", None, None
    elif score >= 0.60:
        instructions = generate_refactor_instructions(assessment)
        return "refactor", instructions, None
    elif score >= 0.40:
        queries = generate_research_queries(assessment)
        return "deep_research", None, queries
    else:
        return "reject", f"Quality score {score:.2f} below minimum threshold", None

def generate_refactor_instructions(assessment):
    """Extract actionable feedback from low-scoring criteria."""
    instructions = []
    for criterion, data in assessment.items():
        if data["score"] < 0.80:
            if data.get("issues"):
                instructions.extend(data["issues"])
            if data.get("missing_topics"):
                instructions.append(f"Add coverage for: {', '.join(data['missing_topics'])}")
            if data.get("confusing_sections"):
                instructions.append(f"Rewrite for clarity: {', '.join(data['confusing_sections'])}")
            if data.get("improvements"):
                instructions.extend(data["improvements"])
            if data.get("suggestions"):
                instructions.extend(data["suggestions"])
    return "\n".join(instructions)

def generate_research_queries(assessment):
    """Generate search queries for content gaps."""
    queries = []
    if assessment["completeness"]["missing_topics"]:
        for topic in assessment["completeness"]["missing_topics"]:
            queries.append(f"{topic} documentation guide")
    if assessment["accuracy"]["issues"]:
        queries.append("latest official documentation verification")
    return queries
```
|
||||
|
||||
### Step 5: Log Review Decision

```python
import json

def log_review(cursor, distill_id, assessment, score, decision, instructions=None, queries=None):
    # Get current round number
    cursor.execute(
        "SELECT COALESCE(MAX(review_round), 0) + 1 FROM review_logs WHERE distill_id = %s",
        (distill_id,)
    )
    review_round = cursor.fetchone()[0]

    sql = """
        INSERT INTO review_logs
        (distill_id, review_round, reviewer_type, quality_score, assessment,
         decision, refactor_instructions, research_queries)
        VALUES (%s, %s, 'claude_review', %s, %s, %s, %s, %s)
    """
    cursor.execute(sql, (
        distill_id, review_round, score,
        json.dumps(assessment), decision, instructions,
        json.dumps(queries) if queries else None
    ))

    # Update distilled_content status
    status_map = {
        "approve": "approved",
        "refactor": "needs_refactor",
        "deep_research": "needs_refactor",
        "reject": "rejected"
    }
    cursor.execute(
        "UPDATE distilled_content SET review_status = %s WHERE distill_id = %s",
        (status_map[decision], distill_id)
    )
```

## Prompt Engineering Quality Checklist

When scoring `prompt_engineering_quality`, verify:

- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work, not just *what*
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources

## Auto-Approve Rules

Content from Tier 1 (official) sources that scores ≥ 0.80 can be approved automatically, without human review, when the following is configured:

```yaml
# In export_config.yaml
quality:
  auto_approve_tier1_sources: true
  auto_approve_min_score: 0.80
```

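The rule above could be checked along these lines. This is a minimal sketch: the function name `should_auto_approve` is hypothetical, and it assumes the `quality` section of the config has already been loaded into a dict and that Tier 1 sources are labeled `tier1_official`.

```python
def should_auto_approve(credibility_tier, score, quality_config):
    """Return True when the auto-approve rule applies (illustrative sketch)."""
    # The feature is opt-in: without the flag, nothing is auto-approved
    if not quality_config.get("auto_approve_tier1_sources", False):
        return False
    min_score = quality_config.get("auto_approve_min_score", 0.80)
    return credibility_tier == "tier1_official" and score >= min_score

# Mirrors the YAML fragment above
quality_config = {
    "auto_approve_tier1_sources": True,
    "auto_approve_min_score": 0.80,
}
print(should_auto_approve("tier1_official", 0.82, quality_config))
```
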
## Integration Points

| From | Action | To |
|------|--------|-----|
| content-distiller | Sends distilled content | quality-reviewer |
| quality-reviewer | APPROVE | markdown-exporter |
| quality-reviewer | REFACTOR + instructions | content-distiller |
| quality-reviewer | DEEP_RESEARCH + queries | web-crawler-orchestrator |
@@ -0,0 +1,290 @@
# Markdown Exporter

Exports approved content as structured markdown files for Claude Projects or fine-tuning.

## Export Configuration

```yaml
# ~/.config/reference-curator/export_config.yaml
output:
  base_path: ~/reference-library/exports/

  project_files:
    structure: nested_by_topic  # flat | nested_by_topic | nested_by_source
    index_file: INDEX.md
    include_metadata: true

  fine_tuning:
    format: jsonl
    max_tokens_per_sample: 4096
    include_system_prompt: true

quality:
  min_score_for_export: 0.80
```

## Export Workflow

### Step 1: Query Approved Content

```python
def get_exportable_content(cursor, min_score=0.80, topic_filter=None):
    """Get all approved content meeting quality threshold."""
    # Later steps index rows by column name (doc['title']), so use a dictionary cursor
    sql = """
        SELECT d.doc_id, d.title, d.url,
               dc.summary, dc.key_concepts, dc.code_snippets, dc.structured_content,
               t.topic_slug, t.topic_name,
               rl.quality_score, s.credibility_tier, s.vendor
        FROM documents d
        JOIN distilled_content dc ON d.doc_id = dc.doc_id
        JOIN document_topics dt ON d.doc_id = dt.doc_id
        JOIN topics t ON dt.topic_id = t.topic_id
        JOIN review_logs rl ON dc.distill_id = rl.distill_id
        JOIN sources s ON d.source_id = s.source_id
        WHERE rl.decision = 'approve'
          AND rl.quality_score >= %s
          AND rl.review_id = (
              SELECT MAX(review_id) FROM review_logs
              WHERE distill_id = dc.distill_id
          )
    """
    params = [min_score]

    if topic_filter:
        # One placeholder per topic slug keeps the query parameterized
        sql += " AND t.topic_slug IN (%s)" % ','.join(['%s'] * len(topic_filter))
        params.extend(topic_filter)

    sql += " ORDER BY t.topic_slug, rl.quality_score DESC"
    cursor.execute(sql, params)
    return cursor.fetchall()
```

### Step 2: Organize by Structure

**Nested by Topic (recommended):**
```
exports/
├── INDEX.md
├── prompt-engineering/
│   ├── _index.md
│   ├── 01-chain-of-thought.md
│   ├── 02-few-shot-prompting.md
│   └── 03-system-prompts.md
├── claude-models/
│   ├── _index.md
│   ├── 01-model-comparison.md
│   └── 02-context-windows.md
└── agent-building/
    ├── _index.md
    └── 01-tool-use.md
```

**Flat Structure:**
```
exports/
├── INDEX.md
├── prompt-engineering-chain-of-thought.md
├── prompt-engineering-few-shot.md
└── claude-models-comparison.md
```

### Step 3: Generate Files

**Document File Template:**
```python
from datetime import datetime

def generate_document_file(doc, include_metadata=True):
    content = []

    if include_metadata:
        # YAML frontmatter with provenance and quality metadata
        content.append("---")
        content.append(f"title: {doc['title']}")
        content.append(f"source: {doc['url']}")
        content.append(f"vendor: {doc['vendor']}")
        content.append(f"tier: {doc['credibility_tier']}")
        content.append(f"quality_score: {doc['quality_score']:.2f}")
        content.append(f"exported: {datetime.now().isoformat()}")
        content.append("---")
        content.append("")

    content.append(doc['structured_content'])

    return "\n".join(content)
```

**Topic Index Template:**
```python
def generate_topic_index(topic_slug, topic_name, documents):
    content = [
        f"# {topic_name}",
        "",
        f"This section contains {len(documents)} reference documents.",
        "",
        "## Contents",
        ""
    ]

    for i, doc in enumerate(documents, 1):
        # Must match the numbered file names written in Step 4
        filename = f"{i:02d}-{slugify(doc['title'])}.md"
        content.append(f"{i}. [{doc['title']}]({filename})")

    return "\n".join(content)
```

**Root INDEX Template:**
```python
def generate_root_index(topics_with_counts, export_date):
    content = [
        "# Reference Library",
        "",
        f"Exported: {export_date}",
        "",
        "## Topics",
        ""
    ]

    for topic in topics_with_counts:
        content.append(f"- [{topic['name']}]({topic['slug']}/) ({topic['count']} documents)")

    content.extend([
        "",
        "## Quality Standards",
        "",
        "All documents in this library have:",
        "- Passed quality review (score ≥ 0.80)",
        "- Been distilled for conciseness",
        "- Verified source attribution"
    ])

    return "\n".join(content)
```

### Step 4: Write Files

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path

def export_project_files(content_list, config):
    # expanduser() resolves the "~" in the configured base_path
    base_path = Path(config['output']['base_path']).expanduser()
    base_path.mkdir(parents=True, exist_ok=True)
    structure = config['output']['project_files']['structure']
    include_metadata = config['output']['project_files']['include_metadata']

    # Group by topic
    by_topic = defaultdict(list)
    for doc in content_list:
        by_topic[doc['topic_slug']].append(doc)

    # Create directories and files
    for topic_slug, docs in by_topic.items():
        if structure == 'nested_by_topic':
            topic_dir = base_path / topic_slug
            topic_dir.mkdir(parents=True, exist_ok=True)

            # Write topic index
            topic_index = generate_topic_index(topic_slug, docs[0]['topic_name'], docs)
            (topic_dir / '_index.md').write_text(topic_index)

            # Write document files
            for i, doc in enumerate(docs, 1):
                filename = f"{i:02d}-{slugify(doc['title'])}.md"
                file_content = generate_document_file(doc, include_metadata)
                (topic_dir / filename).write_text(file_content)

    # Write root INDEX
    topics_summary = [
        {"slug": slug, "name": docs[0]['topic_name'], "count": len(docs)}
        for slug, docs in by_topic.items()
    ]
    root_index = generate_root_index(topics_summary, datetime.now().isoformat())
    (base_path / 'INDEX.md').write_text(root_index)
```

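Step 4 calls `slugify`, which is not defined in this document. One possible implementation is sketched below; it is an assumption about the intended behavior (lowercase, ASCII-only, hyphen-separated names like `01-chain-of-thought.md`), not the project's actual helper.

```python
import re
import unicodedata

def slugify(title):
    """Hypothetical helper: lowercase, ASCII-only, hyphen-separated slug."""
    # Decompose accented characters, then drop anything that is not ASCII
    normalized = unicodedata.normalize("NFKD", title)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug or "untitled"

print(slugify("Chain of Thought"))
```

Falling back to `untitled` avoids writing a file named just `01-.md` when a title has no usable characters.
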
### Step 5: Fine-tuning Export (Optional)

```python
import json
from pathlib import Path

def export_fine_tuning_dataset(content_list, config):
    """Export as JSONL for fine-tuning."""
    # expanduser() resolves the "~" in the configured base_path
    output_path = Path(config['output']['base_path']).expanduser() / 'fine_tuning.jsonl'
    max_tokens = config['output']['fine_tuning']['max_tokens_per_sample']

    with open(output_path, 'w') as f:
        for doc in content_list:
            sample = {
                "messages": [
                    {
                        "role": "system",
                        "content": "You are an expert on AI and prompt engineering."
                    },
                    {
                        "role": "user",
                        "content": f"Explain {doc['title']}"
                    },
                    {
                        "role": "assistant",
                        "content": truncate_to_tokens(doc['structured_content'], max_tokens)
                    }
                ],
                "metadata": {
                    "source": doc['url'],
                    "topic": doc['topic_slug'],
                    "quality_score": doc['quality_score']
                }
            }
            # One JSON object per line (JSONL)
            f.write(json.dumps(sample) + '\n')
```

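`truncate_to_tokens` is used in Step 5 but not defined here. The sketch below is a rough stand-in that treats whitespace-separated words as tokens. That is an assumption made for illustration: a real tokenizer will count differently, so treat the limit as approximate.

```python
def truncate_to_tokens(text, max_tokens):
    """Hypothetical helper: approximate tokens as whitespace-separated words."""
    words = text.split()
    if len(words) <= max_tokens:
        return text  # already within budget; keep original formatting
    # Note: joining with single spaces discards the original line breaks
    return " ".join(words[:max_tokens])

print(truncate_to_tokens("one two three four", 2))
```
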
### Step 6: Log Export Job

```python
import json

def log_export_job(cursor, export_name, export_type, output_path,
                   topic_filter, total_docs, total_tokens):
    sql = """
        INSERT INTO export_jobs
        (export_name, export_type, output_format, topic_filter, output_path,
         total_documents, total_tokens, status, started_at, completed_at)
        VALUES (%s, %s, 'markdown', %s, %s, %s, %s, 'completed', NOW(), NOW())
    """
    cursor.execute(sql, (
        export_name, export_type,
        json.dumps(topic_filter) if topic_filter else None,
        str(output_path), total_docs, total_tokens
    ))
```

## Cross-Reference Generation

Link related documents:

```python
def add_cross_references(doc, all_docs):
    """Find and link related documents."""
    related = []
    doc_concepts = set(c['term'].lower() for c in doc['key_concepts'])

    for other in all_docs:
        if other['doc_id'] == doc['doc_id']:
            continue
        other_concepts = set(c['term'].lower() for c in other['key_concepts'])
        overlap = len(doc_concepts & other_concepts)
        if overlap >= 2:
            related.append({
                "title": other['title'],
                "path": generate_relative_path(doc, other),
                "overlap": overlap
            })

    return sorted(related, key=lambda x: x['overlap'], reverse=True)[:5]
```

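`generate_relative_path` is referenced above without a definition. A minimal sketch follows, assuming each document dict carries an `export_path` key holding its file path relative to the export root. That key is hypothetical and not part of the query in Step 1.

```python
import posixpath

def generate_relative_path(doc, other):
    """Hypothetical helper: markdown link target from doc's file to other's file."""
    # Links are resolved relative to the directory containing the linking file
    start_dir = posixpath.dirname(doc["export_path"]) or "."
    return posixpath.relpath(other["export_path"], start=start_dir)

doc = {"export_path": "prompt-engineering/01-chain-of-thought.md"}
other = {"export_path": "claude-models/01-model-comparison.md"}
print(generate_relative_path(doc, other))
```

`posixpath` is used instead of `os.path` so that the links always use forward slashes, regardless of the platform running the export.
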
## Output Verification

After export, verify:
- [ ] All files readable and valid markdown
- [ ] INDEX.md links resolve correctly
- [ ] No broken cross-references
- [ ] Total token count matches expectation
- [ ] No duplicate content

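The link checks in the list above can be partly automated. The sketch below is one possible approach, not part of the exporter itself: `find_broken_links` is a hypothetical name, and the regular expression only handles plain inline markdown links.

```python
import re
from pathlib import Path

# Captures the target of [text](target), stopping at ")", "#", or whitespace
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")

def find_broken_links(index_path):
    """Return relative link targets in a markdown file that do not exist on disk."""
    index_path = Path(index_path)
    text = index_path.read_text(encoding="utf-8")
    broken = []
    for target in LINK_PATTERN.findall(text):
        if "://" in target:
            continue  # external URLs are out of scope for this check
        if not (index_path.parent / target).exists():
            broken.append(target)
    return broken
```

Running it against `INDEX.md` and each `_index.md` after an export gives a list of targets to fix; an empty list means every relative link resolved.
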
## Integration

| From | Input | To |
|------|-------|-----|
| quality-reviewer | Approved content IDs | markdown-exporter |
| markdown-exporter | Structured files | Project knowledge / Fine-tuning |
89
custom-skills/90-reference-curator/claude-project/INDEX.md
Normal file
@@ -0,0 +1,89 @@
# Reference Curator - Claude.ai Project Knowledge

This project knowledge enables Claude to curate, process, and export reference documentation through 6 modular skills.

## Skills Overview

| Skill | Purpose | Trigger Phrases |
|-------|---------|-----------------|
| **reference-discovery** | Search & validate authoritative sources | "find references", "search documentation", "discover sources" |
| **web-crawler** | Multi-backend crawling orchestration | "crawl URL", "fetch documents", "scrape pages" |
| **content-repository** | MySQL storage management | "store content", "save to database", "check duplicates" |
| **content-distiller** | Summarize & extract key concepts | "distill content", "summarize document", "extract key concepts" |
| **quality-reviewer** | QA scoring & routing decisions | "review content", "quality check", "assess distilled content" |
| **markdown-exporter** | Export to markdown/JSONL | "export references", "generate project files", "create markdown output" |

## Workflow

```
[Topic Input]
    │
    ▼
┌─────────────────────┐
│ reference-discovery │ → Search & validate sources
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ web-crawler         │ → Crawl (Firecrawl/Node.js/aiohttp/Scrapy)
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ content-repository  │ → Store in MySQL
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ content-distiller   │ → Summarize & extract
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ quality-reviewer    │ → QA loop
└─────────────────────┘
    │
    ├── REFACTOR → content-distiller
    ├── DEEP_RESEARCH → web-crawler
    │
    ▼ APPROVE
┌─────────────────────┐
│ markdown-exporter   │ → Project files / Fine-tuning
└─────────────────────┘
```

## Quality Scoring Thresholds

| Score | Decision | Action |
|-------|----------|--------|
| ≥ 0.85 | **Approve** | Ready for export |
| 0.60-0.84 | **Refactor** | Re-distill with feedback |
| 0.40-0.59 | **Deep Research** | Gather more sources |
| < 0.40 | **Reject** | Archive (low quality) |

## Source Credibility Tiers

| Tier | Source Type | Examples |
|------|-------------|----------|
| **Tier 1** | Official documentation | docs.anthropic.com, platform.openai.com/docs |
| **Tier 1** | Official engineering blogs | anthropic.com/news, openai.com/blog |
| **Tier 2** | Research papers | arxiv.org papers with citations |
| **Tier 2** | Verified community guides | Official cookbooks, tutorials |
| **Tier 3** | Community content | Blog posts, Stack Overflow |

## Files in This Project

- `INDEX.md` - This overview file
- `reference-curator-complete.md` - All 6 skills in one file
- `01-reference-discovery.md` - Source discovery skill
- `02-web-crawler.md` - Crawling orchestration skill
- `03-content-repository.md` - Database storage skill
- `04-content-distiller.md` - Content summarization skill
- `05-quality-reviewer.md` - QA review skill
- `06-markdown-exporter.md` - Export skill

## Usage

Upload all files to a Claude.ai Project, or upload only the skills you need.

For the complete experience, upload `reference-curator-complete.md`, which contains all 6 skills in one file.
@@ -0,0 +1,473 @@
# Reference Curator - Complete Skill Set

This document contains all 6 skills for curating, processing, and exporting reference documentation.

---

# 1. Reference Discovery

Searches for authoritative sources, validates credibility, and produces curated URL lists for crawling.

## Source Priority Hierarchy

| Tier | Source Type | Examples |
|------|-------------|----------|
| **Tier 1** | Official documentation | docs.anthropic.com, docs.claude.com, platform.openai.com/docs |
| **Tier 1** | Engineering blogs (official) | anthropic.com/news, openai.com/blog |
| **Tier 1** | Official GitHub repos | github.com/anthropics/*, github.com/openai/* |
| **Tier 2** | Research papers | arxiv.org, papers with citations |
| **Tier 2** | Verified community guides | Cookbook examples, official tutorials |
| **Tier 3** | Community content | Blog posts, tutorials, Stack Overflow |

## Discovery Workflow

### Step 1: Define Search Scope

```python
search_config = {
    "topic": "prompt engineering",
    "vendors": ["anthropic", "openai", "google"],
    "source_types": ["official_docs", "engineering_blog", "github_repo"],
    "freshness": "past_year",
    "max_results_per_query": 20
}
```

### Step 2: Generate Search Queries

```python
def generate_queries(topic, vendors):
    queries = []
    for vendor in vendors:
        queries.append(f"site:docs.{vendor}.com {topic}")
        queries.append(f"site:{vendor}.com/docs {topic}")
        queries.append(f"site:{vendor}.com/blog {topic}")
        queries.append(f"site:github.com/{vendor} {topic}")
    queries.append(f"site:arxiv.org {topic}")
    return queries
```

### Step 3: Validate and Score Sources

```python
def score_source(url, title):
    # Only the domain signal is shown here. On its own it tops out at 0.40,
    # which is below the 0.60 tier-1 cutoff in assign_credibility_tier.
    score = 0.0
    if any(d in url for d in ['docs.anthropic.com', 'docs.claude.com', 'docs.openai.com']):
        score += 0.40  # Tier 1 official docs
    elif any(d in url for d in ['anthropic.com', 'openai.com', 'google.dev']):
        score += 0.30  # Tier 1 official blog/news
    elif 'github.com' in url and any(v in url for v in ['anthropics', 'openai', 'google']):
        score += 0.30  # Tier 1 official repos
    elif 'arxiv.org' in url:
        score += 0.20  # Tier 2 research
    else:
        score += 0.10  # Tier 3 community
    return min(score, 1.0)

def assign_credibility_tier(score):
    if score >= 0.60:
        return 'tier1_official'
    elif score >= 0.40:
        return 'tier2_verified'
    else:
        return 'tier3_community'
```

## Output Format

```json
{
  "discovery_date": "2025-01-28T10:30:00",
  "topic": "prompt engineering",
  "total_urls": 15,
  "urls": [
    {
      "url": "https://docs.anthropic.com/en/docs/prompt-engineering",
      "title": "Prompt Engineering Guide",
      "credibility_tier": "tier1_official",
      "credibility_score": 0.85,
      "source_type": "official_docs",
      "vendor": "anthropic"
    }
  ]
}
```

---

# 2. Web Crawler Orchestrator

Manages crawling operations using Firecrawl MCP with rate limiting and format handling.

## Crawl Configuration

```yaml
firecrawl:
  rate_limit:
    requests_per_minute: 20
    concurrent_requests: 3
  default_options:
    timeout: 30000
    only_main_content: true
```

## Crawl Workflow

### Determine Crawl Strategy

```python
def select_strategy(url):
    if url.endswith('.pdf'):
        return 'pdf_extract'
    elif 'github.com' in url and '/blob/' in url:
        return 'raw_content'
    elif any(d in url for d in ['docs.', 'documentation']):
        return 'scrape'
    else:
        return 'scrape'
```

### Execute Firecrawl

```python
# Single page scrape
firecrawl_scrape(
    url="https://docs.anthropic.com/en/docs/prompt-engineering",
    formats=["markdown"],
    only_main_content=True,
    timeout=30000
)

# Multi-page crawl
firecrawl_crawl(
    url="https://docs.anthropic.com/en/docs/",
    max_depth=2,
    limit=50,
    formats=["markdown"]
)
```

### Rate Limiting

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, requests_per_minute=20):
        self.rpm = requests_per_minute
        self.request_times = deque()

    def wait_if_needed(self):
        now = time.time()
        # Drop timestamps that fell out of the 60-second window
        while self.request_times and now - self.request_times[0] > 60:
            self.request_times.popleft()
        if len(self.request_times) >= self.rpm:
            # Sleep until the oldest request in the window expires
            wait_time = 60 - (now - self.request_times[0])
            if wait_time > 0:
                time.sleep(wait_time)
        self.request_times.append(time.time())
```

## Error Handling

| Error | Action |
|-------|--------|
| Timeout | Retry once with 2x timeout |
| Rate limit (429) | Exponential backoff, max 3 retries |
| Not found (404) | Log and skip |
| Access denied (403) | Log, mark as `failed` |

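The rate-limit row in the table above could be implemented roughly as follows. This is a sketch under stated assumptions: `RateLimitError` and `fetch_with_backoff` are hypothetical names, and the `fetch` callable stands in for whatever crawl call is actually used.

```python
import time

class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""

def fetch_with_backoff(fetch, url, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on RateLimitError with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted; let the caller log the failure
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the function testable without real waiting.
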
---

# 3. Content Repository

Manages MySQL storage for the reference library. Handles document storage, version control, deduplication, and retrieval.

## Core Operations

**Store New Document:**
```python
def store_document(cursor, source_id, title, url, doc_type, raw_content_path):
    sql = """
        INSERT INTO documents (source_id, title, url, doc_type, crawl_date, crawl_status, raw_content_path)
        VALUES (%s, %s, %s, %s, NOW(), 'completed', %s)
        ON DUPLICATE KEY UPDATE
            version = version + 1,
            crawl_date = NOW(),
            raw_content_path = VALUES(raw_content_path)
    """
    cursor.execute(sql, (source_id, title, url, doc_type, raw_content_path))
    return cursor.lastrowid
```

**Check Duplicate:**
```python
def is_duplicate(cursor, url):
    cursor.execute("SELECT doc_id FROM documents WHERE url_hash = SHA2(%s, 256)", (url,))
    return cursor.fetchone() is not None
```

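The duplicate check above hashes the URL inside MySQL. If the hash is needed on the Python side (for example, to deduplicate a batch before touching the database), an equivalent can be sketched as below. It assumes URLs are sent to MySQL as UTF-8 and that `url_hash` stores the lowercase hex digest that `SHA2(..., 256)` produces; the function name is hypothetical.

```python
import hashlib

def url_hash(url):
    """Hex SHA-256 digest of the URL, encoded as UTF-8."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

print(len(url_hash("https://docs.anthropic.com/en/docs/prompt-engineering")))
```
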
## Table Quick Reference

| Table | Purpose | Key Fields |
|-------|---------|------------|
| `sources` | Authorized content sources | source_type, credibility_tier, vendor |
| `documents` | Crawled document metadata | url_hash (dedup), version, crawl_status |
| `distilled_content` | Processed summaries | review_status, compression_ratio |
| `review_logs` | QA decisions | quality_score, decision |
| `topics` | Taxonomy | topic_slug, parent_topic_id |

## Status Values

- **crawl_status:** `pending` → `completed` | `failed` | `stale`
- **review_status:** `pending` → `in_review` → `approved` | `needs_refactor` | `rejected`
- **decision:** `approve` | `refactor` | `deep_research` | `reject`

---

# 4. Content Distiller

Transforms raw crawled content into structured, high-quality reference materials.

## Distillation Goals

1. **Compress** - Reduce token count while preserving essential information
2. **Structure** - Organize content for easy retrieval and reference
3. **Extract** - Pull out code snippets, key concepts, and actionable patterns
4. **Annotate** - Add metadata for searchability and categorization

## Extract Key Components

**Extract Code Snippets:**
```python
import re

def extract_code_snippets(content):
    # Matches fenced blocks: optional language tag, then the body up to the closing fence
    pattern = r'```(\w*)\n([\s\S]*?)```'
    snippets = []
    for match in re.finditer(pattern, content):
        snippets.append({
            "language": match.group(1) or "text",
            "code": match.group(2).strip(),
            "context": get_surrounding_text(content, match.start(), 200)
        })
    return snippets
```

**Extract Key Concepts:**
```python
def extract_key_concepts(content, title):
    prompt = f"""
    Analyze this document and extract key concepts:

    Title: {title}
    Content: {content[:8000]}

    Return JSON with:
    - concepts: [{{"term": "...", "definition": "...", "importance": "high|medium|low"}}]
    - techniques: [{{"name": "...", "description": "...", "use_case": "..."}}]
    - best_practices: ["..."]
    """
    return claude_extract(prompt)
```

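`get_surrounding_text` is called by the snippet extractor but not defined in this document. A minimal sketch, assuming the intent is to capture the prose that introduces a code block (the text immediately before it):

```python
def get_surrounding_text(content, position, window=200):
    """Hypothetical helper: the text just before `position`, up to `window` characters."""
    start = max(0, position - window)
    return content[start:position].strip()

sample = "Intro text.\n```python\nprint('hi')\n```"
print(get_surrounding_text(sample, sample.index("```"), 200))
```
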
## Summary Template

```markdown
# {title}

**Source:** {url}
**Type:** {source_type} | **Tier:** {credibility_tier}

## Executive Summary
{2-3 sentence overview}

## Key Concepts
{bulleted list of core concepts}

## Techniques & Patterns
{extracted techniques with use cases}

## Code Examples
{relevant code snippets}

## Best Practices
{actionable recommendations}
```

## Quality Metrics

| Metric | Target |
|--------|--------|
| Compression Ratio | 25-35% of original |
| Key Concept Coverage | ≥90% of important terms |
| Code Snippet Retention | 100% of relevant examples |

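The compression target in the table above can be checked with a couple of lines. This sketch assumes token counts for the original and distilled text are already available; both function names are hypothetical.

```python
def compression_ratio(original_tokens, distilled_tokens):
    """Distilled size as a fraction of the original size."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return distilled_tokens / original_tokens

def within_target(ratio, low=0.25, high=0.35):
    """True when the ratio falls inside the 25-35% target band."""
    return low <= ratio <= high

ratio = compression_ratio(1000, 300)
print(ratio, within_target(ratio))
```
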
---

# 5. Quality Reviewer

Evaluates distilled content, routes decisions, and triggers refactoring or additional research.

## Review Workflow

```
[Distilled Content]
       │
       ▼
┌─────────────────┐
│ Score Criteria  │ → accuracy, completeness, clarity, PE quality, usability
└─────────────────┘
       │
       ├── ≥ 0.85    → APPROVE → markdown-exporter
       ├── 0.60-0.84 → REFACTOR → content-distiller (with instructions)
       ├── 0.40-0.59 → DEEP_RESEARCH → web-crawler (with queries)
       └── < 0.40    → REJECT → archive with reason
```

## Scoring Criteria

| Criterion | Weight | Checks |
|-----------|--------|--------|
| **Accuracy** | 0.25 | Factual correctness, up-to-date info, proper attribution |
| **Completeness** | 0.20 | Covers key concepts, includes examples, addresses edge cases |
| **Clarity** | 0.20 | Clear structure, concise language, logical flow |
| **PE Quality** | 0.25 | Demonstrates techniques, before/after examples, explains why |
| **Usability** | 0.10 | Easy to reference, searchable keywords, appropriate length |

## Calculate Final Score

```python
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.20,
    "clarity": 0.20,
    "prompt_engineering_quality": 0.25,
    "usability": 0.10
}

def calculate_quality_score(assessment):
    return sum(
        assessment[criterion]["score"] * weight
        for criterion, weight in WEIGHTS.items()
    )
```

## Route Decision

```python
def determine_decision(score, assessment):
    if score >= 0.85:
        return "approve", None, None
    elif score >= 0.60:
        instructions = generate_refactor_instructions(assessment)
        return "refactor", instructions, None
    elif score >= 0.40:
        queries = generate_research_queries(assessment)
        return "deep_research", None, queries
    else:
        return "reject", f"Quality score {score:.2f} below minimum", None
```

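To make the threshold boundaries concrete, the sketch below reduces the routing to a lookup that returns only the decision label. `decision_for` and `THRESHOLDS` are hypothetical names introduced for illustration; the cutoffs are the ones used in `determine_decision`.

```python
# (minimum score, decision), checked from highest to lowest
THRESHOLDS = [(0.85, "approve"), (0.60, "refactor"), (0.40, "deep_research")]

def decision_for(score):
    """Return the routing label for a quality score in the 0-1 range."""
    for minimum, decision in THRESHOLDS:
        if score >= minimum:
            return decision
    return "reject"

for s in (0.85, 0.84, 0.60, 0.59, 0.40, 0.39):
    print(s, decision_for(s))
```

Each cutoff is inclusive, so a score of exactly 0.85 is approved while 0.84 goes back for refactoring.
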
## Prompt Engineering Quality Checklist

- [ ] Demonstrates specific techniques (CoT, few-shot, etc.)
- [ ] Shows before/after examples
- [ ] Explains *why* techniques work, not just *what*
- [ ] Provides actionable patterns
- [ ] Includes edge cases and failure modes
- [ ] References authoritative sources

---

# 6. Markdown Exporter

Exports approved content as structured markdown files for Claude Projects or fine-tuning.

## Export Structure

**Nested by Topic (recommended):**
```
exports/
├── INDEX.md
├── prompt-engineering/
│   ├── _index.md
│   ├── 01-chain-of-thought.md
│   └── 02-few-shot-prompting.md
├── claude-models/
│   ├── _index.md
│   └── 01-model-comparison.md
└── agent-building/
    └── 01-tool-use.md
```

## Document File Template

```python
def generate_document_file(doc, include_metadata=True):
    content = []
    if include_metadata:
        content.append("---")
        content.append(f"title: {doc['title']}")
        content.append(f"source: {doc['url']}")
        content.append(f"vendor: {doc['vendor']}")
        content.append(f"tier: {doc['credibility_tier']}")
        content.append(f"quality_score: {doc['quality_score']:.2f}")
        content.append("---")
        content.append("")
    content.append(doc['structured_content'])
    return "\n".join(content)
```

## Fine-tuning Export (JSONL)

```python
import json

def export_fine_tuning_dataset(content_list, config):
    with open('fine_tuning.jsonl', 'w') as f:
        for doc in content_list:
            sample = {
                "messages": [
                    {"role": "system", "content": "You are an expert on AI and prompt engineering."},
                    {"role": "user", "content": f"Explain {doc['title']}"},
                    {"role": "assistant", "content": doc['structured_content']}
                ],
                "metadata": {
                    "source": doc['url'],
                    "topic": doc['topic_slug'],
                    "quality_score": doc['quality_score']
                }
            }
            # One JSON object per line (JSONL)
            f.write(json.dumps(sample) + '\n')
```

## Cross-Reference Generation

```python
def add_cross_references(doc, all_docs):
    related = []
    doc_concepts = set(c['term'].lower() for c in doc['key_concepts'])

    for other in all_docs:
        if other['doc_id'] == doc['doc_id']:
            continue
        other_concepts = set(c['term'].lower() for c in other['key_concepts'])
        overlap = len(doc_concepts & other_concepts)
        if overlap >= 2:
            related.append({
                "title": other['title'],
                "path": generate_relative_path(doc, other),
                "overlap": overlap
            })

    return sorted(related, key=lambda x: x['overlap'], reverse=True)[:5]
```

---

# Integration Flow

| From | Output | To |
|------|--------|-----|
| **reference-discovery** | URL manifest | web-crawler |
| **web-crawler** | Raw content + manifest | content-repository |
| **content-repository** | Document records | content-distiller |
| **content-distiller** | Distilled content | quality-reviewer |
| **quality-reviewer** (approve) | Approved IDs | markdown-exporter |
| **quality-reviewer** (refactor) | Instructions | content-distiller |
| **quality-reviewer** (deep_research) | Queries | web-crawler |