feat(reference-curator): Add portable skill suite for reference documentation curation

6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@@ -0,0 +1,136 @@
# Markdown Exporter

Exports approved reference content as structured markdown files for project knowledge or fine-tuning datasets.

## Trigger Keywords

"export references", "generate project files", "create markdown output", "export for fine-tuning", "build knowledge base"

## Export Types

| Type | Format | Use Case |
|------|--------|----------|
| `project_files` | Nested markdown | Claude Projects knowledge |
| `fine_tuning` | JSONL | Model fine-tuning dataset |
| `knowledge_base` | Flat markdown | Documentation |

## Workflow

### Step 1: Query Approved Content

```bash
python scripts/query_approved.py --min-score 0.80 --output approved.json
```

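The selection in Step 1 can be pictured as a simple filter. The sketch below is illustrative only and is not the contents of `scripts/query_approved.py`: it assumes records are already loaded as dictionaries, and the `status` field and the `select_approved` helper are hypothetical names.

```python
def select_approved(records, min_score=0.80):
    """Keep records that are approved and meet the minimum quality score.

    Assumes each record is a dict with hypothetical `status` and
    `quality_score` keys; the real schema may differ.
    """
    return [
        record
        for record in records
        if record.get("status") == "approved"
        and record.get("quality_score", 0.0) >= min_score
    ]
```

The default of `0.80` mirrors the `--min-score` value used in the command above.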
### Step 2: Organize by Structure

**Nested by Topic (default):**
```
exports/
├── INDEX.md
├── prompt-engineering/
│   ├── _index.md
│   ├── 01-chain-of-thought.md
│   └── 02-few-shot-prompting.md
└── claude-models/
    ├── _index.md
    └── 01-model-comparison.md
```

**Flat Structure:**
```
exports/
├── INDEX.md
├── prompt-engineering-chain-of-thought.md
└── claude-models-comparison.md
```

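The two layouts differ only in how a document's topic and slug are combined into a path. A minimal sketch of that mapping follows. The function name `export_path` and the `"flat"` structure value are assumptions made for illustration; only `nested_by_topic` appears in this skill's commands and config.

```python
from pathlib import PurePosixPath


def export_path(topic_slug, doc_slug, position, structure="nested_by_topic"):
    """Return the export path of one document, relative to exports/."""
    if structure == "nested_by_topic":
        # Numbered file inside a per-topic directory.
        return PurePosixPath(topic_slug) / f"{position:02d}-{doc_slug}.md"
    if structure == "flat":  # hypothetical value for the flat layout
        # Topic becomes a filename prefix; no subdirectories.
        return PurePosixPath(f"{topic_slug}-{doc_slug}.md")
    raise ValueError(f"unknown structure: {structure!r}")
```

For example, `export_path("prompt-engineering", "chain-of-thought", 1)` gives `prompt-engineering/01-chain-of-thought.md`, matching the nested tree above.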
### Step 3: Generate Files

```bash
python scripts/export_project.py \
  --structure nested_by_topic \
  --output ~/reference-library/exports/ \
  --include-metadata
```

### Step 4: Generate INDEX

```bash
python scripts/generate_index.py --output ~/reference-library/exports/INDEX.md
```

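As a rough picture of what an index might contain, the sketch below groups documents by topic and emits one markdown link per document. It is not the actual `scripts/generate_index.py`; the `build_index` helper, the tuple input shape, and the heading layout are all assumptions.

```python
from collections import defaultdict


def build_index(docs):
    """Build INDEX.md text from (topic, title, relative_path) tuples."""
    by_topic = defaultdict(list)
    for topic, title, path in docs:
        by_topic[topic].append((title, path))

    lines = ["# Index", ""]
    for topic in sorted(by_topic):
        lines.append(f"## {topic}")
        lines.append("")
        # Sort by path so numbered files (01-, 02-, ...) keep their order.
        for title, path in sorted(by_topic[topic], key=lambda item: item[1]):
            lines.append(f"- [{title}]({path})")
        lines.append("")
    return "\n".join(lines)
```

Links are written relative to the directory that holds `INDEX.md`, so they keep resolving if the whole `exports/` tree is moved.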
### Step 5: Fine-tuning Export (Optional)

```bash
python scripts/export_finetuning.py \
  --output ~/reference-library/exports/fine_tuning.jsonl \
  --max-tokens 4096
```

JSONL format (each record occupies a single line in the output file; shown expanded here for readability):
```json
{
  "messages": [
    {"role": "system", "content": "You are an expert on AI and prompt engineering."},
    {"role": "user", "content": "Explain {title}"},
    {"role": "assistant", "content": "{structured_content}"}
  ],
  "metadata": {"source": "{url}", "topic": "{topic_slug}", "quality_score": 0.92}
}
```

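Filling the template above for one document could look like the following sketch. The helper name `to_jsonl_line` and its parameters are hypothetical, and the sketch does not enforce the `--max-tokens` limit, which would need a tokenizer.

```python
import json


def to_jsonl_line(title, structured_content, url, topic_slug, quality_score):
    """Serialize one document as a single JSONL line in the format above."""
    record = {
        "messages": [
            {"role": "system", "content": "You are an expert on AI and prompt engineering."},
            {"role": "user", "content": f"Explain {title}"},
            {"role": "assistant", "content": structured_content},
        ],
        "metadata": {"source": url, "topic": topic_slug, "quality_score": quality_score},
    }
    # json.dumps escapes newlines inside strings, so the record stays on one line.
    return json.dumps(record, ensure_ascii=False)
```

Writing one such line per document, each followed by `"\n"`, produces a valid JSONL file even when the markdown content itself spans many lines.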
### Step 6: Log Export Job

```bash
python scripts/log_export.py --name "January 2025 Export" --type project_files --docs 45
```

## Cross-Reference Generation

```bash
python scripts/add_crossrefs.py --input ~/reference-library/exports/
```

Links related documents based on overlapping key concepts.

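One plausible reading of "overlapping key concepts" is a set intersection between the concept lists of two documents. The sketch below illustrates that idea only; the `related_pairs` helper and the `min_shared` threshold are assumptions and may not match how `scripts/add_crossrefs.py` decides relatedness.

```python
from itertools import combinations


def related_pairs(concepts_by_doc, min_shared=2):
    """Return (doc_a, doc_b, shared_concepts) for documents that overlap.

    concepts_by_doc maps a document path to an iterable of key concepts.
    min_shared is a hypothetical threshold for how many concepts must match.
    """
    pairs = []
    # Sort by document path so the output order is deterministic.
    for (doc_a, a), (doc_b, b) in combinations(sorted(concepts_by_doc.items()), 2):
        shared = set(a) & set(b)
        if len(shared) >= min_shared:
            pairs.append((doc_a, doc_b, sorted(shared)))
    return pairs
```

Each returned pair could then be rendered as a "Related" link in both documents. Comparing every pair is quadratic in the number of documents, which is fine for an export of a few dozen files.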
## Output Verification

After export, verify:
- [ ] All files readable and valid markdown
- [ ] INDEX.md links resolve correctly
- [ ] No broken cross-references
- [ ] Total token count matches expectation
- [ ] No duplicate content

```bash
python scripts/verify_export.py --path ~/reference-library/exports/
```

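The link check from the list above can be sketched as follows. This is an illustrative helper, not the contents of `scripts/verify_export.py`: `broken_links` is a hypothetical name, and the regex only handles plain inline links of the form `[text](target)`.

```python
import re
from pathlib import Path

# Matches inline markdown links and captures the target up to ")" or whitespace.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(index_path):
    """Return relative link targets in a markdown file that do not exist on disk."""
    index_path = Path(index_path)
    broken = []
    for target in LINK_PATTERN.findall(index_path.read_text(encoding="utf-8")):
        if "://" in target or target.startswith("#"):
            continue  # skip external URLs and in-page anchors
        target_file = target.split("#", 1)[0]  # drop any fragment
        if not (index_path.parent / target_file).is_file():
            broken.append(target)
    return broken
```

An empty list from `broken_links(Path("~/reference-library/exports/INDEX.md").expanduser())` would satisfy the "INDEX.md links resolve correctly" item. The same function can be run over each exported file to catch broken cross-references.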
## Scripts

- `scripts/query_approved.py` - Get approved content from DB
- `scripts/export_project.py` - Main export for project files
- `scripts/export_finetuning.py` - JSONL export for fine-tuning
- `scripts/generate_index.py` - Generate INDEX.md
- `scripts/add_crossrefs.py` - Add cross-references
- `scripts/log_export.py` - Log export job to DB
- `scripts/verify_export.py` - Verify export integrity

## Configuration

```yaml
# ~/.config/reference-curator/export_config.yaml
output:
  base_path: ~/reference-library/exports/
  project_files:
    structure: nested_by_topic
    index_file: INDEX.md
    include_metadata: true
  fine_tuning:
    format: jsonl
    max_tokens_per_sample: 4096

quality:
  min_score_for_export: 0.80
```

## Integration

| From | To |
|------|-----|
| quality-reviewer (approved) | markdown-exporter |
| markdown-exporter | Project knowledge / Fine-tuning dataset |