6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Markdown Exporter
Exports approved reference content as structured markdown files for project knowledge, or as JSONL datasets for fine-tuning.
Trigger Keywords
"export references", "generate project files", "create markdown output", "export for fine-tuning", "build knowledge base"
Export Types
| Type | Format | Use Case |
|---|---|---|
| `project_files` | Nested markdown | Claude Projects knowledge |
| `fine_tuning` | JSONL | Model fine-tuning dataset |
| `knowledge_base` | Flat markdown | Documentation |
Workflow
Step 1: Query Approved Content
python scripts/query_approved.py --min-score 0.80 --output approved.json
Step 2: Organize by Structure
Nested by Topic (default):
exports/
├── INDEX.md
├── prompt-engineering/
│   ├── _index.md
│   ├── 01-chain-of-thought.md
│   └── 02-few-shot-prompting.md
└── claude-models/
    ├── _index.md
    └── 01-model-comparison.md
Flat Structure:
exports/
├── INDEX.md
├── prompt-engineering-chain-of-thought.md
└── claude-models-comparison.md
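The two layouts differ only in how a topic and a title are combined into a file path. The following is a minimal sketch of that mapping, not the logic of `scripts/export_project.py`: the helper names `slugify` and `export_path` and the structure name `flat` are assumptions made for illustration (only `nested_by_topic` appears in this document).

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def export_path(topic: str, title: str, position: int,
                structure: str = "nested_by_topic") -> str:
    """Return a document's path relative to exports/ (hypothetical helper)."""
    topic_slug, title_slug = slugify(topic), slugify(title)
    if structure == "nested_by_topic":
        # One directory per topic, files numbered within the topic.
        return f"{topic_slug}/{position:02d}-{title_slug}.md"
    if structure == "flat":  # assumed name for the flat layout
        # Topic and title joined into a single file name.
        return f"{topic_slug}-{title_slug}.md"
    raise ValueError(f"unknown structure: {structure}")


print(export_path("Prompt Engineering", "Chain of Thought", 1))
# prompt-engineering/01-chain-of-thought.md
print(export_path("Prompt Engineering", "Chain of Thought", 1, structure="flat"))
# prompt-engineering-chain-of-thought.md
```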
Step 3: Generate Files
python scripts/export_project.py \
  --structure nested_by_topic \
  --output ~/reference-library/exports/ \
  --include-metadata
Step 4: Generate INDEX
python scripts/generate_index.py --output ~/reference-library/exports/INDEX.md
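As an illustration of what an index over the exported tree could contain, the sketch below walks an export directory and emits one link per document, grouped by topic directory. It is an assumption about the output shape, not the behavior of `scripts/generate_index.py`; the function name `build_index` and the heading layout are hypothetical.

```python
from pathlib import Path

SKIP = {"INDEX.md", "_index.md"}  # index files are not listed as entries


def build_index(export_root: Path) -> str:
    """Return index text linking every exported markdown file, grouped by directory."""
    files = [p.relative_to(export_root) for p in export_root.rglob("*.md")]
    files = [p for p in files if p.name not in SKIP]
    # Sort by directory first so each topic forms one contiguous group.
    files.sort(key=lambda p: (p.parent.as_posix(), p.name))

    lines = ["# Index"]
    current_topic = None
    for rel in files:
        topic = rel.parent.as_posix()
        if topic != current_topic:
            heading = "Top level" if topic == "." else topic
            lines += ["", f"## {heading}", ""]
            current_topic = topic
        lines.append(f"- [{rel.stem}]({rel.as_posix()})")
    return "\n".join(lines) + "\n"
```

Calling `build_index(Path("~/reference-library/exports/").expanduser())` and writing the result to `INDEX.md` would produce relative links, which stay valid if the whole export directory is moved.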
Step 5: Fine-tuning Export (Optional)
python scripts/export_finetuning.py \
  --output ~/reference-library/exports/fine_tuning.jsonl \
  --max-tokens 4096
JSONL format (each record occupies one line in the output file; expanded here for readability):
{
  "messages": [
    {"role": "system", "content": "You are an expert on AI and prompt engineering."},
    {"role": "user", "content": "Explain {title}"},
    {"role": "assistant", "content": "{structured_content}"}
  ],
  "metadata": {"source": "{url}", "topic": "{topic_slug}", "quality_score": 0.92}
}
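To make the template concrete, here is a minimal sketch of turning one approved document into a single JSONL line. The input dictionary keys (`title`, `structured_content`, `url`, `topic_slug`, `quality_score`) mirror the placeholders in the template and are assumptions about the record shape; `to_jsonl_line` is a hypothetical helper, not a function from `scripts/export_finetuning.py`.

```python
import json

SYSTEM_PROMPT = "You are an expert on AI and prompt engineering."


def to_jsonl_line(doc: dict) -> str:
    """Serialize one document as a single-line JSON record (hypothetical helper)."""
    record = {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Explain {doc['title']}"},
            {"role": "assistant", "content": doc["structured_content"]},
        ],
        "metadata": {
            "source": doc["url"],
            "topic": doc["topic_slug"],
            "quality_score": doc["quality_score"],
        },
    }
    # Without an indent argument, json.dumps emits no literal newlines;
    # newlines inside the content are escaped as \n, so one record is one line.
    return json.dumps(record, ensure_ascii=False)


example = {
    "title": "Chain of Thought",
    "structured_content": "## Overview\nStep-by-step reasoning...",
    "url": "https://example.com/cot",  # placeholder URL
    "topic_slug": "prompt-engineering",
    "quality_score": 0.92,
}
print(to_jsonl_line(example))
```

Writing each returned string followed by `"\n"` to the output file yields a valid JSONL dataset.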
Step 6: Log Export Job
python scripts/log_export.py --name "January 2025 Export" --type project_files --docs 45
Cross-Reference Generation
python scripts/add_crossrefs.py --input ~/reference-library/exports/
Links related documents based on overlapping key concepts.
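The document does not say how overlap is measured. One plausible approach, sketched here purely as an assumption, is Jaccard similarity between each pair of documents' key-concept sets, linking pairs above a threshold. The function name `related_pairs`, the `min_overlap` parameter, and the sample concept sets are all hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def related_pairs(concepts_by_doc: dict[str, set[str]],
                  min_overlap: float = 0.3) -> list[tuple[str, str, float]]:
    """Return (doc_a, doc_b, score) for pairs whose Jaccard overlap meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(concepts_by_doc), 2):
        union = concepts_by_doc[a] | concepts_by_doc[b]
        if not union:
            continue  # two empty concept sets carry no signal
        score = len(concepts_by_doc[a] & concepts_by_doc[b]) / len(union)
        if score >= min_overlap:
            pairs.append((a, b, round(score, 2)))
    return pairs


docs = {
    "chain-of-thought.md": {"reasoning", "prompting", "few-shot"},
    "few-shot-prompting.md": {"prompting", "few-shot", "examples"},
    "model-comparison.md": {"context window", "pricing"},
}
print(related_pairs(docs))
# [('chain-of-thought.md', 'few-shot-prompting.md', 0.5)]
```

A set-based score like this is cheap and symmetric, so each related pair can be linked in both directions.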
Output Verification
After export, verify:
- All files readable and valid markdown
- INDEX.md links resolve correctly
- No broken cross-references
- Total token count matches expectation
- No duplicate content
python scripts/verify_export.py --path ~/reference-library/exports/
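One of the checks above, that INDEX.md links resolve, can be sketched as follows. This is a simplified illustration rather than the logic of `scripts/verify_export.py`: the helper `broken_links` is hypothetical, and the regular expression only handles plain inline links of the form `[text](target)`.

```python
from __future__ import annotations

import re
from pathlib import Path

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")    # inline links: [text](target)
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*:", re.I)  # e.g. https:, mailto:


def broken_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that do not exist on disk."""
    text = markdown_file.read_text(encoding="utf-8")
    missing = []
    for target in LINK_RE.findall(text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # external URL or in-page anchor: not checked here
        local = target.split("#", 1)[0]  # drop any fragment before checking
        if not (markdown_file.parent / local).exists():
            missing.append(target)
    return missing
```

An empty list from `broken_links(Path("~/reference-library/exports/INDEX.md").expanduser())` would indicate that every relative link in the index points at an existing file.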
Scripts
- `scripts/query_approved.py` - Get approved content from DB
- `scripts/export_project.py` - Main export for project files
- `scripts/export_finetuning.py` - JSONL export for fine-tuning
- `scripts/generate_index.py` - Generate INDEX.md
- `scripts/add_crossrefs.py` - Add cross-references
- `scripts/log_export.py` - Log export job to DB
- `scripts/verify_export.py` - Verify export integrity
Configuration
# ~/.config/reference-curator/export_config.yaml
output:
  base_path: ~/reference-library/exports/
  project_files:
    structure: nested_by_topic
    index_file: INDEX.md
    include_metadata: true
  fine_tuning:
    format: jsonl
    max_tokens_per_sample: 4096
quality:
  min_score_for_export: 0.80
Integration
| From | To |
|---|---|
| quality-reviewer (approved) | → markdown-exporter |
| markdown-exporter → | Project knowledge / Fine-tuning dataset |