feat(reference-curator): Add portable skill suite for reference documentation curation

6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:20:27 +07:00
parent e80056ae8a
commit 6d7a6d7a88
26 changed files with 4486 additions and 1 deletions
--- a/custom-skills/90-reference-curator/06-markdown-exporter/code/CLAUDE.md
+++ b/custom-skills/90-reference-curator/06-markdown-exporter/code/CLAUDE.md
@@ -0,0 +1,136 @@
+# Markdown Exporter
+
+Exports approved reference content as structured markdown files for project knowledge or fine-tuning datasets.
+
+## Trigger Keywords
+"export references", "generate project files", "create markdown output", "export for fine-tuning", "build knowledge base"
+
+## Export Types
+
+| Type | Format | Use Case |
+|------|--------|----------|
+| `project_files` | Nested markdown | Claude Projects knowledge |
+| `fine_tuning` | JSONL | Model fine-tuning dataset |
+| `knowledge_base` | Flat markdown | Documentation |
+
+## Workflow
+
+### Step 1: Query Approved Content
+```bash
+python scripts/query_approved.py --min-score 0.80 --output approved.json
+```
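
The diff does not show the body of `scripts/query_approved.py`, so the following is only a minimal sketch of the selection rule that `--min-score 0.80` suggests. The record fields (`status`, `quality_score`) and the inclusive threshold are assumptions made for illustration, not the script's confirmed behavior.

```python
import json


def select_approved(records, min_score=0.80):
    """Keep records marked approved whose score is at or above min_score.

    Field names and the inclusive comparison are hypothetical.
    """
    return [
        r
        for r in records
        if r.get("status") == "approved"
        and r.get("quality_score", 0.0) >= min_score
    ]


# Hypothetical sample rows standing in for the database query result.
records = [
    {"title": "Chain of Thought", "status": "approved", "quality_score": 0.92},
    {"title": "Few-Shot Prompting", "status": "approved", "quality_score": 0.75},
    {"title": "Model Comparison", "status": "refactor", "quality_score": 0.88},
]

approved = select_approved(records)
print(json.dumps(approved, indent=2))
```

Only the first row survives: the second falls below the threshold and the third is not approved.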
+
+### Step 2: Organize by Structure
+
+**Nested by Topic (default):**
+```
+exports/
+├── INDEX.md
+├── prompt-engineering/
+│   ├── _index.md
+│   ├── 01-chain-of-thought.md
+│   └── 02-few-shot-prompting.md
+└── claude-models/
+    ├── _index.md
+    └── 01-model-comparison.md
+```
+
+**Flat Structure:**
+```
+exports/
+├── INDEX.md
+├── prompt-engineering-chain-of-thought.md
+└── claude-models-comparison.md
+```
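
One way to picture the difference between the two layouts is as a mapping from a topic slug and a document slug to an output path. The sketch below is an assumption-laden illustration: the function name, the `flat` structure label, and the two-digit ordering prefix are inferred from the trees above rather than taken from `export_project.py`.

```python
from pathlib import PurePosixPath


def export_path(topic_slug, doc_slug, position, structure="nested_by_topic"):
    """Return a hypothetical output path for one document.

    nested_by_topic: exports/<topic>/<NN>-<doc>.md
    flat:            exports/<topic>-<doc>.md
    """
    root = PurePosixPath("exports")
    if structure == "nested_by_topic":
        return root / topic_slug / f"{position:02d}-{doc_slug}.md"
    if structure == "flat":
        return root / f"{topic_slug}-{doc_slug}.md"
    raise ValueError(f"unknown structure: {structure}")


nested = export_path("prompt-engineering", "chain-of-thought", 1)
flat = export_path("prompt-engineering", "chain-of-thought", 1, structure="flat")
print(nested)
print(flat)
```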
+
+### Step 3: Generate Files
+```bash
+python scripts/export_project.py \
+  --structure nested_by_topic \
+  --output ~/reference-library/exports/ \
+  --include-metadata
+```
+
+### Step 4: Generate INDEX
+```bash
+python scripts/generate_index.py --output ~/reference-library/exports/INDEX.md
+```
+
+### Step 5: Fine-tuning Export (Optional)
+```bash
+python scripts/export_finetuning.py \
+  --output ~/reference-library/exports/fine_tuning.jsonl \
+  --max-tokens 4096
+```
+
+JSONL format:
+```json
+{
+  "messages": [
+    {"role": "system", "content": "You are an expert on AI and prompt engineering."},
+    {"role": "user", "content": "Explain {title}"},
+    {"role": "assistant", "content": "{structured_content}"}
+  ],
+  "metadata": {"source": "{url}", "topic": "{topic_slug}", "quality_score": 0.92}
+}
+```
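
The template above is pretty-printed for readability; in a JSONL file each record occupies exactly one line. A minimal sketch of filling the template and serializing it follows. The input field names (`title`, `structured_content`, `url`, `topic_slug`, `quality_score`) mirror the placeholders in the template, while the function names and the sample document are hypothetical.

```python
import json

SYSTEM_PROMPT = "You are an expert on AI and prompt engineering."


def to_training_record(doc):
    """Fill the message template from one approved document (a plain dict)."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Explain {doc['title']}"},
            {"role": "assistant", "content": doc["structured_content"]},
        ],
        "metadata": {
            "source": doc["url"],
            "topic": doc["topic_slug"],
            "quality_score": doc["quality_score"],
        },
    }


def to_jsonl(docs):
    """Serialize each record on its own line; json.dumps escapes newlines."""
    return "\n".join(
        json.dumps(to_training_record(d), ensure_ascii=False) for d in docs
    )


# Hypothetical sample document.
doc = {
    "title": "Chain of Thought",
    "structured_content": "## Overview\nStep-by-step reasoning...",
    "url": "https://example.com/chain-of-thought",
    "topic_slug": "prompt-engineering",
    "quality_score": 0.92,
}

line = to_jsonl([doc])
print(line)
```

Because `json.dumps` escapes the newline inside `structured_content`, multi-line markdown still yields a single physical line per record.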
+
+### Step 6: Log Export Job
+```bash
+python scripts/log_export.py --name "January 2025 Export" --type project_files --docs 45
+```
+
+## Cross-Reference Generation
+```bash
+python scripts/add_crossrefs.py --input ~/reference-library/exports/
+```
+
+Links related documents based on overlapping key concepts.
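
How `add_crossrefs.py` measures overlap is not shown in this diff. As a rough sketch, assuming each document carries a set of key concepts, two documents could be linked when they share at least a given number of concepts; the function name and the `min_shared` threshold are hypothetical.

```python
from itertools import combinations


def related_pairs(concepts_by_doc, min_shared=2):
    """Return (doc_a, doc_b, shared_concepts) for sufficiently overlapping pairs."""
    pairs = []
    for (a, ca), (b, cb) in combinations(sorted(concepts_by_doc.items()), 2):
        shared = set(ca) & set(cb)
        if len(shared) >= min_shared:
            pairs.append((a, b, sorted(shared)))
    return pairs


# Hypothetical key concepts per exported file.
concepts_by_doc = {
    "01-chain-of-thought.md": {"reasoning", "prompting", "examples"},
    "02-few-shot-prompting.md": {"prompting", "examples", "context"},
    "01-model-comparison.md": {"latency", "context", "pricing"},
}

links = related_pairs(concepts_by_doc)
print(links)
```

With these sample sets, only the two prompt-engineering documents share two concepts, so they are the only pair linked.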
+
+## Output Verification
+
+After export, verify:
+- [ ] All files readable and valid markdown
+- [ ] INDEX.md links resolve correctly
+- [ ] No broken cross-references
+- [ ] Total token count matches expectation
+- [ ] No duplicate content
+
+```bash
+python scripts/verify_export.py --path ~/reference-library/exports/
+```
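
As an illustration of the "INDEX.md links resolve correctly" check, the sketch below scans an index file for inline markdown links and reports relative targets that do not exist on disk. It is not the contents of `verify_export.py`; the function name and the simple link pattern (inline links only, no reference-style links) are assumptions.

```python
import re
import tempfile
from pathlib import Path

# Matches inline markdown links: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def broken_links(index_path):
    """Return relative link targets in the index that are missing on disk."""
    index_path = Path(index_path)
    text = index_path.read_text(encoding="utf-8")
    missing = []
    for target in LINK_RE.findall(text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # skip external URLs and in-page anchors
        relative = target.split("#", 1)[0]
        if not (index_path.parent / relative).exists():
            missing.append(target)
    return missing


# Build a throwaway export tree with one valid and one dangling link.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "prompt-engineering").mkdir()
    (root / "prompt-engineering" / "01-chain-of-thought.md").write_text(
        "# Chain of Thought\n", encoding="utf-8"
    )
    (root / "INDEX.md").write_text(
        "- [Chain of Thought](prompt-engineering/01-chain-of-thought.md)\n"
        "- [Model Comparison](claude-models/01-model-comparison.md)\n",
        encoding="utf-8",
    )
    missing = broken_links(root / "INDEX.md")

print(missing)
```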
+
+## Scripts
+
+- `scripts/query_approved.py` - Get approved content from DB
+- `scripts/export_project.py` - Main export for project files
+- `scripts/export_finetuning.py` - JSONL export for fine-tuning
+- `scripts/generate_index.py` - Generate INDEX.md
+- `scripts/add_crossrefs.py` - Add cross-references
+- `scripts/log_export.py` - Log export job to DB
+- `scripts/verify_export.py` - Verify export integrity
+
+## Configuration
+
+```yaml
+# ~/.config/reference-curator/export_config.yaml
+output:
+  base_path: ~/reference-library/exports/
+  project_files:
+    structure: nested_by_topic
+    index_file: INDEX.md
+    include_metadata: true
+  fine_tuning:
+    format: jsonl
+    max_tokens_per_sample: 4096
+
+quality:
+  min_score_for_export: 0.80
+```
+
+## Integration
+
+| From | To |
+|------|-----|
+| quality-reviewer (approved) | markdown-exporter |
+| markdown-exporter | Project knowledge / Fine-tuning dataset |