# Content Repository

MySQL storage management for the reference library. Handles document storage, version control, deduplication, and retrieval.

## Trigger Keywords

"store content", "save to database", "check duplicates", "version tracking", "document retrieval", "reference library DB"

## Prerequisites

- MySQL 8.0+ with utf8mb4 charset
- Config file at `~/.config/reference-curator/db_config.yaml`
- Database `reference_library` initialized

## Database Setup

```bash
# Initialize database
mysql -u root -p < references/schema.sql

# Verify tables
mysql -u root -p reference_library -e "SHOW TABLES;"
```

## Core Scripts

### Store Document

```bash
python scripts/store_document.py \
  --source-id 1 \
  --title "Prompt Engineering Guide" \
  --url "https://docs.anthropic.com/..." \
  --doc-type webpage \
  --raw-path ~/reference-library/raw/2025/01/abc123.md
```

### Check Duplicate

```bash
python scripts/check_duplicate.py --url "https://docs.anthropic.com/..."
```

### Query by Topic

```bash
python scripts/query_topic.py --topic-slug prompt-engineering --min-quality 0.80
```

## Table Quick Reference

| Table | Purpose | Key Fields |
|-------|---------|------------|
| `sources` | Authorized sources | source_type, credibility_tier, vendor |
| `documents` | Document metadata | url_hash (dedup), version, crawl_status |
| `distilled_content` | Processed summaries | review_status, compression_ratio |
| `review_logs` | QA decisions | quality_score, decision |
| `topics` | Taxonomy | topic_slug, parent_topic_id |
| `document_topics` | Many-to-many links | relevance_score |
| `export_jobs` | Export tracking | export_type, status |

## Status Values

**crawl_status:** `pending` → `completed` | `failed` | `stale`

**review_status:** `pending` → `in_review` → `approved` | `needs_refactor` | `rejected`

## Common Queries

### Find Stale Documents

```bash
python scripts/find_stale.py --output stale_docs.json
```

### Get Pending Reviews

```bash
python scripts/pending_reviews.py --output pending.json
```

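The `review_status` flow listed under Status Values suggests that only some status changes are valid. A minimal sketch of that check follows, assuming `pending` may only move to `in_review`, and `in_review` may only move to one of the three outcomes. The names `REVIEW_TRANSITIONS` and `can_transition` are hypothetical and not part of the scripts above; whether `needs_refactor` can re-enter review is not stated here, so the sketch treats all three outcomes as final.

```python
# Hypothetical transition map derived from the Status Values line.
# Assumption: the three outcomes are terminal in this sketch.
REVIEW_TRANSITIONS = {
    "pending": {"in_review"},
    "in_review": {"approved", "needs_refactor", "rejected"},
    "approved": set(),
    "needs_refactor": set(),
    "rejected": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` is allowed by the map."""
    return new in REVIEW_TRANSITIONS.get(current, set())


if __name__ == "__main__":
    print(can_transition("pending", "in_review"))
    print(can_transition("pending", "approved"))
```

A guard like this could run before any `review_status` update so that a document cannot skip the `in_review` step.
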
### Export-Ready Content

```bash
python scripts/export_ready.py --min-score 0.85 --output ready.json
```

## Scripts

- `scripts/store_document.py` - Store new document
- `scripts/check_duplicate.py` - URL deduplication
- `scripts/query_topic.py` - Query by topic
- `scripts/find_stale.py` - Find stale documents
- `scripts/pending_reviews.py` - Get pending reviews
- `scripts/export_ready.py` - Get export-ready content
- `scripts/db_utils.py` - Database connection utilities

## Integration

In the table below, `→` stands for this content repository.

| From | Action | To |
|------|--------|-----|
| crawler-orchestrator | Store crawled content | → |
| → | Query pending docs | content-distiller |
| quality-reviewer | Update review_status | → |
| → | Query approved content | markdown-exporter |
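
## URL Hash Sketch

The `documents` table uses `url_hash` for deduplication, but how that hash is computed is not specified here. The sketch below is one plausible approach, assuming SHA-256 over a lightly normalized URL (lowercased scheme and host, fragment dropped, trailing slash removed). `normalize_url` and `url_hash` are hypothetical helpers, not functions taken from `scripts/check_duplicate.py`, and the example URLs are placeholders.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Normalize a URL so trivially different spellings compare equal.

    Assumed rules (not from the schema): lowercase scheme and host,
    drop the fragment, strip a trailing slash from the path.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    # Keep the query string, since it can select a different page.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def url_hash(url: str) -> str:
    """Return a hex SHA-256 digest of the normalized URL."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = url_hash("HTTPS://Docs.Example.com/guide/#intro")
    b = url_hash("https://docs.example.com/guide")
    print(a == b)
```

A fixed-length digest fits a fixed-width indexed column (64 hex characters for SHA-256), which keeps the duplicate lookup independent of URL length.
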