# Content Repository

Manages MySQL storage for the reference library system. Handles document storage, version control, deduplication, and retrieval.

## Prerequisites

- MySQL 8.0+ with utf8mb4 charset
- Config file at `~/.config/reference-curator/db_config.yaml`
- Database `reference_library` initialized with schema
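
A minimal `db_config.yaml` sketch with the keys the connection loader below reads (values here are placeholders; see `references/db_config_template.yaml` for the canonical template):

```yaml
mysql:
  host: localhost
  port: 3306
  database: reference_library
  charset: utf8mb4
  # user/password may be omitted here and supplied via the
  # MYSQL_USER / MYSQL_PASSWORD environment variables instead
  user: curator
  password: ""
```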

## Quick Reference

### Connection Setup

```python
import os
import yaml
from pathlib import Path


def get_db_config():
    config_path = Path.home() / ".config/reference-curator/db_config.yaml"
    with open(config_path) as f:
        config = yaml.safe_load(f)

    # Environment variables take precedence over credentials in the file
    mysql = config['mysql']
    return {
        'host': mysql['host'],
        'port': mysql['port'],
        'database': mysql['database'],
        'user': os.environ.get('MYSQL_USER', mysql.get('user', '')),
        'password': os.environ.get('MYSQL_PASSWORD', mysql.get('password', '')),
        'charset': mysql['charset'],
    }
```

### Core Operations

**Store New Document:**
```python
def store_document(cursor, source_id, title, url, doc_type, raw_content_path):
    sql = """
        INSERT INTO documents (source_id, title, url, doc_type, crawl_date, crawl_status, raw_content_path)
        VALUES (%s, %s, %s, %s, NOW(), 'completed', %s)
        ON DUPLICATE KEY UPDATE
            version = version + 1,
            previous_version_id = doc_id,
            crawl_date = NOW(),
            raw_content_path = VALUES(raw_content_path)
    """
    cursor.execute(sql, (source_id, title, url, doc_type, raw_content_path))
    return cursor.lastrowid
```

**Check Duplicate:**
```python
def is_duplicate(cursor, url):
    cursor.execute("SELECT doc_id FROM documents WHERE url_hash = SHA2(%s, 256)", (url,))
    return cursor.fetchone() is not None
```
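
The duplicate check compares against a stored `url_hash` column populated with `SHA2(url, 256)`. The same digest can be computed client-side, e.g. to pre-filter a crawl batch before hitting the database — a sketch, assuming `url_hash` holds the lowercase hex SHA-256 of the exact URL string:

```python
import hashlib

def url_hash(url: str) -> str:
    # Equivalent of MySQL's SHA2(url, 256): lowercase hex digest
    # of the UTF-8-encoded URL string
    return hashlib.sha256(url.encode("utf-8")).hexdigest()
```

Note that trailing slashes, query-parameter order, and scheme differences all yield different hashes, so URLs should be normalized consistently before hashing.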

**Get Document by Topic:**
```python
def get_docs_by_topic(cursor, topic_slug, min_quality=0.80):
    sql = """
        SELECT d.doc_id, d.title, d.url, dc.structured_content, dc.quality_score
        FROM documents d
        JOIN document_topics dt ON d.doc_id = dt.doc_id
        JOIN topics t ON dt.topic_id = t.topic_id
        LEFT JOIN distilled_content dc ON d.doc_id = dc.doc_id
        WHERE t.topic_slug = %s
          AND (dc.review_status = 'approved' OR dc.review_status IS NULL)
          AND (dc.quality_score >= %s OR dc.quality_score IS NULL)
        ORDER BY dt.relevance_score DESC
    """
    cursor.execute(sql, (topic_slug, min_quality))
    return cursor.fetchall()
```

## Table Quick Reference

| Table | Purpose | Key Fields |
|-------|---------|------------|
| `sources` | Authorized content sources | source_type, credibility_tier, vendor |
| `documents` | Crawled document metadata | url_hash (dedup), version, crawl_status |
| `distilled_content` | Processed summaries | review_status, compression_ratio |
| `review_logs` | QA decisions | quality_score, decision, refactor_instructions |
| `topics` | Taxonomy | topic_slug, parent_topic_id |
| `document_topics` | Many-to-many linking | relevance_score |
| `export_jobs` | Export tracking | export_type, output_format, status |

## Status Values

**crawl_status:** `pending` → `completed` | `failed` | `stale`

**review_status:** `pending` → `in_review` → `approved` | `needs_refactor` | `rejected`

**decision (review):** `approve` | `refactor` | `deep_research` | `reject`

## Common Queries

### Find Stale Documents (needs re-crawl)
```sql
SELECT d.doc_id, d.title, d.url, d.crawl_date
FROM documents d
JOIN crawl_schedule cs ON d.source_id = cs.source_id
WHERE d.crawl_date < DATE_SUB(NOW(), INTERVAL
        CASE cs.frequency
            WHEN 'daily' THEN 1
            WHEN 'weekly' THEN 7
            WHEN 'biweekly' THEN 14
            WHEN 'monthly' THEN 30
        END DAY)
  AND cs.is_enabled = TRUE;
```
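
The same frequency-to-days mapping can be mirrored client-side when deciding whether to enqueue a re-crawl. `is_stale` here is a hypothetical helper for illustration, not part of the repository API:

```python
from datetime import datetime, timedelta

# Mirrors the CASE expression in the SQL query above
FREQUENCY_DAYS = {"daily": 1, "weekly": 7, "biweekly": 14, "monthly": 30}

def is_stale(crawl_date, frequency, now=None):
    # A document is stale once its crawl_date is older than the
    # source's re-crawl interval
    now = now or datetime.now()
    return crawl_date < now - timedelta(days=FREQUENCY_DAYS[frequency])
```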

### Get Pending Reviews
```sql
SELECT dc.distill_id, d.title, d.url, dc.token_count_distilled
FROM distilled_content dc
JOIN documents d ON dc.doc_id = d.doc_id
WHERE dc.review_status = 'pending'
ORDER BY dc.distill_date ASC;
```

### Export-Ready Content
```sql
SELECT d.title, d.url, dc.structured_content, t.topic_slug
FROM documents d
JOIN distilled_content dc ON d.doc_id = dc.doc_id
JOIN document_topics dt ON d.doc_id = dt.doc_id
JOIN topics t ON dt.topic_id = t.topic_id
JOIN review_logs rl ON dc.distill_id = rl.distill_id
WHERE rl.decision = 'approve'
  AND rl.quality_score >= 0.85
ORDER BY t.topic_slug, dt.relevance_score DESC;
```

## Workflow Integration

1. **From crawler-orchestrator:** Receive URL + raw content path → `store_document()`
2. **To content-distiller:** Query pending documents → send for processing
3. **From quality-reviewer:** Update `review_status` based on decision
4. **To markdown-exporter:** Query approved content by topic
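
Step 3 amounts to mapping the reviewer's decision onto a `review_status` value. A sketch, assuming `approve`/`refactor`/`reject` map to the statuses listed above; the `deep_research` → `in_review` mapping is a guess, not documented here:

```python
# Assumed decision → review_status mapping. The first three follow the
# Status Values section; deep_research → in_review is an assumption.
DECISION_TO_STATUS = {
    "approve": "approved",
    "refactor": "needs_refactor",
    "reject": "rejected",
    "deep_research": "in_review",
}

def apply_review_decision(cursor, distill_id, decision):
    cursor.execute(
        "UPDATE distilled_content SET review_status = %s WHERE distill_id = %s",
        (DECISION_TO_STATUS[decision], distill_id),
    )
```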

## Error Handling

- **Duplicate URL:** Silent update (version increment) via `ON DUPLICATE KEY UPDATE`
- **Missing source_id:** Validate against `sources` table before insert
- **Connection failure:** Implement retry with exponential backoff
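
One way to sketch the connection-retry policy (the backoff parameters are illustrative, and `ConnectionError` stands in for whatever operational error the driver raises):

```python
import random
import time

def with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    # Retry `connect` with exponential backoff plus a little jitter:
    # roughly 0.5s, 1s, 2s, 4s ... between attempts
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

The `sleep` parameter is injectable so the policy can be unit-tested without actually waiting.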

## Full Schema Reference

See `references/schema.sql` for complete table definitions including indexes and constraints.

## Config File Template

See `references/db_config_template.yaml` for a connection configuration template.