feat(reference-curator): Add portable skill suite for reference documentation curation

6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,97 @@
# Content Repository

MySQL storage management for the reference library. Handles document storage, version control, deduplication, and retrieval.

## Trigger Keywords

"store content", "save to database", "check duplicates", "version tracking", "document retrieval", "reference library DB"

## Prerequisites

- MySQL 8.0+ with utf8mb4 charset
- Config file at `~/.config/reference-curator/db_config.yaml`
- Database `reference_library` initialized

## Database Setup

```bash
# Initialize database
mysql -u root -p < references/schema.sql

# Verify tables
mysql -u root -p reference_library -e "SHOW TABLES;"
```

## Core Scripts

### Store Document

```bash
python scripts/store_document.py \
  --source-id 1 \
  --title "Prompt Engineering Guide" \
  --url "https://docs.anthropic.com/..." \
  --doc-type webpage \
  --raw-path ~/reference-library/raw/2025/01/abc123.md
```

### Check Duplicate

```bash
python scripts/check_duplicate.py --url "https://docs.anthropic.com/..."
```
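
The `documents` table lists `url_hash` as its deduplication key, but the hashing scheme is not shown in this file. The following is a minimal sketch of how such a check could work, assuming a SHA-256 digest over a lightly normalized URL. The normalization rules and the names `normalize_url`, `url_hash`, and `is_duplicate` are illustrative assumptions, not taken from `check_duplicate.py`.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash.

    These rules are an assumption made for illustration only.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def url_hash(url: str) -> str:
    """Return the 64-character hex SHA-256 digest of the normalized URL."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


def is_duplicate(url: str, known_hashes: set[str]) -> bool:
    """Check a candidate URL against hashes already stored (hypothetical helper)."""
    return url_hash(url) in known_hashes
```

Hashing a normalized form rather than the raw string means that two URLs differing only in case, fragment, or trailing slash map to the same key. Whether the real script normalizes at all is unknown here.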

### Query by Topic

```bash
python scripts/query_topic.py --topic-slug prompt-engineering --min-quality 0.80
```
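
To make the topic filter concrete, here is a rough sketch of the kind of join such a query might perform. It runs against an in-memory SQLite database as a stand-in for MySQL. The id and foreign-key columns (`doc_id`, `topic_id`), the link from `review_logs` directly to documents, and the use of the highest `quality_score` per document are all assumptions; only `topic_slug`, `relevance_score`, `quality_score`, and `decision` come from the table reference in this file.

```python
import sqlite3

# Hypothetical, trimmed-down schema: the real one lives in references/schema.sql.
SCHEMA = """
CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, topic_slug TEXT);
CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE document_topics (doc_id INTEGER, topic_id INTEGER, relevance_score REAL);
CREATE TABLE review_logs (doc_id INTEGER, quality_score REAL, decision TEXT);
"""

# Parameterized query: documents linked to a topic whose best review score
# meets the threshold. SQLite uses "?" placeholders; MySQL drivers use "%s".
QUERY = """
SELECT d.title, MAX(r.quality_score) AS best_score
FROM documents d
JOIN document_topics dt ON dt.doc_id = d.doc_id
JOIN topics t ON t.topic_id = dt.topic_id
JOIN review_logs r ON r.doc_id = d.doc_id
WHERE t.topic_slug = ?
GROUP BY d.doc_id, d.title
HAVING MAX(r.quality_score) >= ?
ORDER BY best_score DESC
"""


def query_topic(conn, topic_slug, min_quality):
    """Return (title, best_score) rows for one topic above a quality floor."""
    return conn.execute(QUERY, (topic_slug, min_quality)).fetchall()


def demo():
    """Populate two sample documents and run the query against them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO topics VALUES (1, 'prompt-engineering')")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)", [(1, "Guide A"), (2, "Guide B")]
    )
    conn.executemany(
        "INSERT INTO document_topics VALUES (?, ?, ?)", [(1, 1, 0.9), (2, 1, 0.7)]
    )
    conn.executemany(
        "INSERT INTO review_logs VALUES (?, ?, ?)",
        [(1, 0.92, "approve"), (2, 0.60, "refactor")],
    )
    conn.commit()
    return query_topic(conn, "prompt-engineering", 0.80)
```

Calling `demo()` returns only the document whose score clears the 0.80 threshold. Passing the slug and threshold as bound parameters, rather than formatting them into the SQL string, avoids injection through the `--topic-slug` argument.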

## Table Quick Reference

| Table | Purpose | Key Fields |
|-------|---------|------------|
| `sources` | Authorized sources | source_type, credibility_tier, vendor |
| `documents` | Document metadata | url_hash (dedup), version, crawl_status |
| `distilled_content` | Processed summaries | review_status, compression_ratio |
| `review_logs` | QA decisions | quality_score, decision |
| `topics` | Taxonomy | topic_slug, parent_topic_id |
| `document_topics` | Many-to-many links | relevance_score |
| `export_jobs` | Export tracking | export_type, status |

## Status Values

**crawl_status:** `pending` → `completed` | `failed` | `stale`

**review_status:** `pending` → `in_review` → `approved` | `needs_refactor` | `rejected`
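
The two status flows above can be read as small state machines. The sketch below encodes them as transition maps and rejects any move not listed. It treats `approved`, `needs_refactor`, and `rejected` as terminal because the notation shows no outgoing arrows from them; the real workflow may well loop `needs_refactor` back into review. The names `REVIEW_TRANSITIONS`, `CRAWL_TRANSITIONS`, `can_transition`, and `advance` are hypothetical.

```python
# Allowed next states, transcribed from the arrows in the status notation.
# States without an entry are treated as terminal (an assumption).
REVIEW_TRANSITIONS = {
    "pending": {"in_review"},
    "in_review": {"approved", "needs_refactor", "rejected"},
}

CRAWL_TRANSITIONS = {
    "pending": {"completed", "failed", "stale"},
}


def can_transition(transitions, current, new):
    """Return True if moving from `current` to `new` is listed as allowed."""
    return new in transitions.get(current, set())


def advance(transitions, current, new):
    """Return the new status, or raise ValueError for an unlisted move."""
    if not can_transition(transitions, current, new):
        raise ValueError(f"illegal status change: {current!r} -> {new!r}")
    return new
```

A guard like this would typically run before any `UPDATE` that writes `review_status` or `crawl_status`, so that a document cannot jump from `pending` straight to `approved`.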

## Common Queries

### Find Stale Documents

```bash
python scripts/find_stale.py --output stale_docs.json
```

### Get Pending Reviews

```bash
python scripts/pending_reviews.py --output pending.json
```

### Export-Ready Content

```bash
python scripts/export_ready.py --min-score 0.85 --output ready.json
```
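
This file does not define what makes content "export-ready". One plausible reading, sketched below, is content whose `review_status` is `approved` and whose `quality_score` meets the `--min-score` threshold. Both that rule and the row shape (dicts with `title`, `review_status`, and `quality_score` keys) are assumptions for illustration, as are the function names.

```python
import json
from pathlib import Path


def select_export_ready(rows, min_score):
    """Keep approved rows whose quality score is at or above the threshold."""
    return [
        row
        for row in rows
        if row["review_status"] == "approved" and row["quality_score"] >= min_score
    ]


def write_json(rows, output_path):
    """Write the selected rows to a JSON file, mirroring the --output flag."""
    Path(output_path).write_text(json.dumps(rows, indent=2), encoding="utf-8")


# Sample rows standing in for a database result set (hypothetical data).
SAMPLE_ROWS = [
    {"title": "Guide A", "review_status": "approved", "quality_score": 0.91},
    {"title": "Guide B", "review_status": "approved", "quality_score": 0.80},
    {"title": "Guide C", "review_status": "needs_refactor", "quality_score": 0.95},
]
```

With a threshold of 0.85, `select_export_ready(SAMPLE_ROWS, 0.85)` keeps only "Guide A": "Guide B" is approved but scores too low, and "Guide C" scores well but is not approved.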

## Scripts

- `scripts/store_document.py` - Store new document
- `scripts/check_duplicate.py` - URL deduplication
- `scripts/query_topic.py` - Query by topic
- `scripts/find_stale.py` - Find stale documents
- `scripts/pending_reviews.py` - Get pending reviews
- `scripts/export_ready.py` - Get export-ready content
- `scripts/db_utils.py` - Database connection utilities

## Integration

| From | Action | To |
|------|--------|-----|
| web-crawler-orchestrator | Store crawled content | content-repository |
| content-repository | Query pending docs | content-distiller |
| quality-reviewer | Update review_status | content-repository |
| content-repository | Query approved content | markdown-exporter |