feat(reference-curator): Add portable skill suite for reference documentation curation

6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-01-29 00:20:27 +07:00
parent e80056ae8a
commit 6d7a6d7a88
26 changed files with 4486 additions and 1 deletions

View File

@@ -0,0 +1,138 @@
---
description: Export approved content to markdown files or JSONL for fine-tuning. Generates structured output with cross-references.
argument-hint: <format> [--topic slug] [--min-score 0.80]
allowed-tools: Read, Write, Bash, Glob, Grep
---
# Markdown Exporter
Export approved content to structured formats.
## Arguments
- `<format>`: project_files | fine_tuning | knowledge_base
- `--topic`: Filter by topic slug (e.g., "prompt-engineering")
- `--min-score`: Minimum quality score (default: 0.80)
## Export Formats
### project_files (Claude Projects)
Output structure:
```
~/reference-library/exports/
├── INDEX.md # Master index
└── {topic-slug}/
├── _index.md # Topic overview
├── {document-1}.md
└── {document-2}.md
```
**INDEX.md format:**
```markdown
# Reference Library Index
Generated: {timestamp}
Total Documents: {count}
## Topics
### [Prompt Engineering](./prompt-engineering/)
{count} documents | Last updated: {date}
### [Claude Models](./claude-models/)
{count} documents | Last updated: {date}
```
**Document format:**
```markdown
---
source: {source_name}
url: {original_url}
credibility: {tier}
quality_score: {score}
exported: {timestamp}
---
# {title}
{structured_content}
## Related Documents
- [Related Doc 1](./related-1.md)
- [Related Doc 2](./related-2.md)
```
### fine_tuning (JSONL)
Output: `~/reference-library/exports/fine_tuning_{timestamp}.jsonl`
```json
{"messages": [
{"role": "system", "content": "You are an expert on AI and prompt engineering."},
{"role": "user", "content": "Explain {topic}"},
{"role": "assistant", "content": "{structured_content}"}
]}
```
### knowledge_base (Flat)
Single consolidated file with table of contents.
## Export Process
### 1. Query Approved Content
```sql
SELECT dc.*, d.title, d.url, s.source_name, s.credibility_tier, t.topic_slug
FROM distilled_content dc
JOIN documents d ON dc.doc_id = d.doc_id
JOIN sources s ON d.source_id = s.source_id
LEFT JOIN document_topics dt ON d.doc_id = dt.doc_id
LEFT JOIN topics t ON dt.topic_id = t.topic_id
WHERE dc.review_status = 'approved'
AND (SELECT MAX(quality_score) FROM review_logs WHERE distill_id = dc.distill_id) >= ?;
```
### 2. Generate Cross-References
Find related documents by:
- Shared topics
- Overlapping key concepts
- Same source
### 3. Write Files
```bash
mkdir -p ~/reference-library/exports/{topic-slug}
```
### 4. Log Export Job
```sql
INSERT INTO export_jobs
(export_name, export_type, output_format, topic_filter,
min_quality_score, output_path, total_documents, status)
VALUES
(?, ?, ?, ?, ?, ?, ?, 'completed');
```
## Configuration
From `~/.config/reference-curator/export_config.yaml`:
```yaml
output:
base_path: ~/reference-library/exports/
project_files:
structure: nested_by_topic
include_metadata: true
quality:
min_score_for_export: 0.80
auto_approve_tier1_sources: true
```
## Example Usage
```
/markdown-exporter project_files
/markdown-exporter fine_tuning --topic prompt-engineering
/markdown-exporter project_files --min-score 0.90
```