Content Repository
Manages MySQL storage for the reference library system. Handles document storage, version control, deduplication, and retrieval.
Prerequisites
- MySQL 8.0+ with utf8mb4 charset
- Config file at ~/.config/reference-curator/db_config.yaml
- Database reference_library initialized with schema
Quick Reference
Connection Setup
import os
from pathlib import Path

import yaml

def get_db_config():
    config_path = Path.home() / ".config/reference-curator/db_config.yaml"
    with open(config_path) as f:
        config = yaml.safe_load(f)
    # Resolve environment variables
    mysql = config['mysql']
    return {
        'host': mysql['host'],
        'port': mysql['port'],
        'database': mysql['database'],
        'user': os.environ.get('MYSQL_USER', mysql.get('user', '')),
        'password': os.environ.get('MYSQL_PASSWORD', mysql.get('password', '')),
        'charset': mysql['charset']
    }
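The credential fallback in get_db_config() can be exercised without a database or a config file. The following is a minimal sketch, assuming the YAML has already been parsed into a dict; resolve_credentials is a hypothetical helper introduced here for illustration and is not part of the skill.

```python
import os

def resolve_credentials(mysql_cfg, env=None):
    """Return (user, password), preferring environment variables over config values."""
    env = os.environ if env is None else env
    user = env.get("MYSQL_USER", mysql_cfg.get("user", ""))
    password = env.get("MYSQL_PASSWORD", mysql_cfg.get("password", ""))
    return user, password

# An environment variable wins when set; the config value is the fallback.
cfg = {"user": "file_user", "password": "file_pw"}
from_env = resolve_credentials(cfg, env={"MYSQL_USER": "env_user"})
from_file = resolve_credentials(cfg, env={})
```

Keeping MYSQL_USER and MYSQL_PASSWORD in the environment means the config file can be shared without credentials in it.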
Core Operations
Store New Document:
def store_document(cursor, source_id, title, url, doc_type, raw_content_path):
    sql = """
        INSERT INTO documents (source_id, title, url, doc_type, crawl_date, crawl_status, raw_content_path)
        VALUES (%s, %s, %s, %s, NOW(), 'completed', %s)
        ON DUPLICATE KEY UPDATE
            version = version + 1,
            previous_version_id = doc_id,
            crawl_date = NOW(),
            raw_content_path = VALUES(raw_content_path)
    """
    cursor.execute(sql, (source_id, title, url, doc_type, raw_content_path))
    # Caution: when the statement updates an existing row instead of inserting,
    # lastrowid is not guaranteed to be that row's doc_id.
    return cursor.lastrowid
Check Duplicate:
def is_duplicate(cursor, url):
    cursor.execute("SELECT doc_id FROM documents WHERE url_hash = SHA2(%s, 256)", (url,))
    return cursor.fetchone() is not None
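The same hash can be computed on the client, which is handy for checking a batch of URLs before touching the database. This is a minimal sketch, assuming url_hash stores the lowercase hex digest that SHA2(url, 256) produces and that the connection uses a UTF-8 charset; the url_hash function name is chosen here for illustration.

```python
import hashlib

def url_hash(url: str) -> str:
    """SHA-256 of the UTF-8 encoded URL as a 64-character lowercase hex string."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

digest = url_hash("https://example.com/docs/intro")
```

The URL is hashed exactly as given, so `https://example.com/a` and `https://example.com/a/` produce different hashes and would not be detected as duplicates of each other.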
Get Document by Topic:
def get_docs_by_topic(cursor, topic_slug, min_quality=0.80):
    sql = """
        SELECT d.doc_id, d.title, d.url, dc.structured_content, dc.quality_score
        FROM documents d
        JOIN document_topics dt ON d.doc_id = dt.doc_id
        JOIN topics t ON dt.topic_id = t.topic_id
        LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
        WHERE t.topic_slug = %s
          AND (dc.review_status = 'approved' OR dc.review_status IS NULL)
          -- Apply min_quality; rows without distilled content (NULL score) are kept
          AND (dc.quality_score IS NULL OR dc.quality_score >= %s)
        ORDER BY dt.relevance_score DESC
    """
    cursor.execute(sql, (topic_slug, min_quality))
    return cursor.fetchall()
Table Quick Reference
| Table | Purpose | Key Fields |
|---|---|---|
| sources | Authorized content sources | source_type, credibility_tier, vendor |
| documents | Crawled document metadata | url_hash (dedup), version, crawl_status |
| distilled_content | Processed summaries | review_status, compression_ratio |
| review_logs | QA decisions | quality_score, decision, refactor_instructions |
| topics | Taxonomy | topic_slug, parent_topic_id |
| document_topics | Many-to-many linking | relevance_score |
| export_jobs | Export tracking | export_type, output_format, status |
Status Values
crawl_status: pending → completed | failed | stale
review_status: pending → in_review → approved | needs_refactor | rejected
decision (review): approve | refactor | deep_research | reject
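The arrows above describe small state machines, and a lookup table is one way to guard status updates in application code. The sketch below encodes only the transitions the arrows spell out. Whether needs_refactor or stale can return to pending is not stated, so those states are treated as having no outgoing transitions here, which is an assumption. The names REVIEW_TRANSITIONS, CRAWL_TRANSITIONS, and can_transition are illustrative.

```python
# Allowed transitions, read directly from the status lists above.
REVIEW_TRANSITIONS = {
    "pending": {"in_review"},
    "in_review": {"approved", "needs_refactor", "rejected"},
}
CRAWL_TRANSITIONS = {
    "pending": {"completed", "failed", "stale"},
}

def can_transition(table, current, new):
    """True if moving from current to new is listed in the transition table."""
    return new in table.get(current, set())
```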
Common Queries
Find Stale Documents (needs re-crawl)
SELECT d.doc_id, d.title, d.url, d.crawl_date
FROM documents d
JOIN crawl_schedule cs ON d.source_id = cs.source_id
WHERE d.crawl_date < DATE_SUB(NOW(), INTERVAL
        CASE cs.frequency
            WHEN 'daily' THEN 1
            WHEN 'weekly' THEN 7
            WHEN 'biweekly' THEN 14
            WHEN 'monthly' THEN 30
        END DAY)
  AND cs.is_enabled = TRUE;
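The staleness rule in this query can also be expressed in Python, for example to unit-test the thresholds without a database. This is a minimal sketch that mirrors the CASE expression; FREQUENCY_DAYS and is_stale are hypothetical names, and the is_enabled check is left out.

```python
from datetime import datetime, timedelta

# Same frequency-to-days mapping as the CASE expression in the query.
FREQUENCY_DAYS = {"daily": 1, "weekly": 7, "biweekly": 14, "monthly": 30}

def is_stale(crawl_date, frequency, now=None):
    """True if crawl_date is older than the re-crawl interval for frequency."""
    days = FREQUENCY_DAYS.get(frequency)
    if days is None:
        # Mirrors the SQL: an unmatched CASE yields NULL, so the row is not selected.
        return False
    now = now or datetime.now()
    return crawl_date < now - timedelta(days=days)
```

As in the SQL, the comparison is strict, so a document crawled exactly seven days ago on a weekly schedule is not yet stale.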
Get Pending Reviews
SELECT dc.distill_id, d.title, d.url, dc.token_count_distilled
FROM distilled_content dc
JOIN documents d ON dc.doc_id = d.doc_id
WHERE dc.review_status = 'pending'
ORDER BY dc.distill_date ASC;
Export-Ready Content
SELECT d.title, d.url, dc.structured_content, t.topic_slug
FROM documents d
JOIN distilled_content dc ON d.doc_id = dc.doc_id
JOIN document_topics dt ON d.doc_id = dt.doc_id
JOIN topics t ON dt.topic_id = t.topic_id
JOIN review_logs rl ON dc.distill_id = rl.distill_id
WHERE rl.decision = 'approve'
AND rl.quality_score >= 0.85
ORDER BY t.topic_slug, dt.relevance_score DESC;
Workflow Integration
- From crawler-orchestrator: Receive URL + raw content path → store_document()
- To content-distiller: Query pending documents → send for processing
- From quality-reviewer: Update review_status based on decision
- To markdown-exporter: Query approved content by topic
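The quality-reviewer step could be sketched as follows. The decision-to-status mapping is inferred from the names in Status Values and is an assumption, not something the schema confirms. No review_status value obviously corresponds to deep_research, so this sketch rejects it rather than guessing. apply_review_decision is a hypothetical helper, and the cursor is assumed to be a DB-API cursor using %s placeholders, as in the functions above.

```python
# Assumed mapping from a review decision to the resulting review_status.
DECISION_TO_STATUS = {
    "approve": "approved",
    "refactor": "needs_refactor",
    "reject": "rejected",
}

def apply_review_decision(cursor, distill_id, decision):
    """Set review_status on one distilled_content row and return the new status."""
    status = DECISION_TO_STATUS.get(decision)
    if status is None:
        # Covers deep_research and any unknown value.
        raise ValueError(f"no review_status mapping for decision: {decision!r}")
    cursor.execute(
        "UPDATE distilled_content SET review_status = %s WHERE distill_id = %s",
        (status, distill_id),
    )
    return status
```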
Error Handling
- Duplicate URL: Silent update (version increment) via ON DUPLICATE KEY UPDATE
- Missing source_id: Validate against sources table before insert
- Connection failure: Implement retry with exponential backoff
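Retry with exponential backoff can be sketched as a small wrapper. The defaults (4 attempts, 0.5 s base delay) and the with_retry name are illustrative assumptions. MySQL drivers raise their own connection error classes, so in practice the driver's exception type would be passed as retry_on.

```python
import time

def with_retry(operation, retries=4, base_delay=0.5, sleep=time.sleep,
               retry_on=(ConnectionError,)):
    """Call operation(); after a failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries):
        try:
            return operation()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

With the defaults, the waits between attempts are 0.5 s, 1 s, and 2 s. The sleep parameter is injectable so the backoff schedule can be tested without actually waiting.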
Full Schema Reference
See references/schema.sql for complete table definitions including indexes and constraints.
Config File Template
See references/db_config_template.yaml for connection configuration template.