feat(reference-curator): Add portable skill suite for reference documentation curation

6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:20:27 +07:00
parent e80056ae8a
commit 6d7a6d7a88
26 changed files with 4486 additions and 1 deletions


---
description: Store and manage crawled content in MySQL. Handles versioning, deduplication, and document metadata.
argument-hint: <action> [--doc-id N] [--source-id N]
allowed-tools: Bash, Read, Write, Glob, Grep
---
# Content Repository
Manage crawled content in the `reference_library` MySQL database.
## Arguments
- `<action>`: store | list | get | update | delete | stats
- `--doc-id`: Specific document ID
- `--source-id`: Filter by source ID
## Database Connection
```bash
source ~/.reference-curator.env
mysql -u $MYSQL_USER -p"$MYSQL_PASSWORD" reference_library
```
## Actions
### store
Store new documents from crawl output:
```bash
# Read crawl manifest
cat ~/reference-library/raw/YYYY/MM/crawl_manifest.json

# Insert into documents table
mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library <<'SQL'
INSERT INTO documents (source_id, title, url, doc_type, raw_content_path, crawl_date, crawl_status)
VALUES (...);
SQL
```
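The manifest-to-INSERT step can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the manifest field names (`pages`, `raw_path`, etc.) are assumptions, since the real manifest schema is not shown here.

```python
# Hypothetical crawl manifest -- field names are illustrative assumptions,
# not a documented contract of the crawler output.
SAMPLE_MANIFEST = {
    "source_id": 3,
    "pages": [
        {"title": "API Overview", "url": "https://example.com/docs/api",
         "doc_type": "reference", "raw_path": "raw/2026/01/api-overview.md"},
    ],
}

# Parameterized form of the INSERT shown above, for use with a MySQL driver.
INSERT_SQL = (
    "INSERT INTO documents "
    "(source_id, title, url, doc_type, raw_content_path, crawl_date, crawl_status) "
    "VALUES (%s, %s, %s, %s, %s, CURDATE(), 'completed')"
)

def manifest_to_rows(manifest: dict) -> list[tuple]:
    """Turn a crawl manifest into parameter tuples matching INSERT_SQL."""
    return [
        (manifest["source_id"], page["title"], page["url"],
         page["doc_type"], page["raw_path"])
        for page in manifest["pages"]
    ]

rows = manifest_to_rows(SAMPLE_MANIFEST)
# Each tuple pairs with INSERT_SQL, e.g. cursor.executemany(INSERT_SQL, rows)
```

Using parameter tuples rather than string-built SQL avoids quoting bugs when titles or URLs contain special characters.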
### list
List documents with filters:
```sql
SELECT doc_id, title, crawl_status, created_at
FROM documents
WHERE source_id = ? AND crawl_status = 'completed'
ORDER BY created_at DESC;
```
### get
Retrieve specific document:
```sql
SELECT d.*, s.source_name, s.credibility_tier
FROM documents d
JOIN sources s ON d.source_id = s.source_id
WHERE d.doc_id = ?;
```
### stats
Show repository statistics:
```sql
SELECT
COUNT(*) as total_docs,
SUM(CASE WHEN crawl_status = 'completed' THEN 1 ELSE 0 END) as completed,
SUM(CASE WHEN crawl_status = 'pending' THEN 1 ELSE 0 END) as pending
FROM documents;
```
## Deduplication
Documents are deduplicated by URL hash:
```sql
-- url_hash is auto-generated: SHA2(url, 256)
SELECT * FROM documents WHERE url_hash = SHA2('https://...', 256);
```
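MySQL's `SHA2(url, 256)` is the lowercase hex SHA-256 of the string, so the same hash can be computed client-side to check for duplicates before re-crawling. A small sketch (the pre-crawl check itself is an assumption, not part of the skill as written):

```python
import hashlib

def url_hash(url: str) -> str:
    # Matches MySQL SHA2(url, 256): lowercase hex SHA-256 of the UTF-8 bytes.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

h = url_hash("https://example.com/docs/api")
# Look up duplicates with: SELECT doc_id FROM documents WHERE url_hash = %s
```

Computing the hash in the client keeps the lookup an exact index match on `url_hash` instead of calling `SHA2()` per query.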
## Version Tracking
When a crawled document's content changes, insert a new version row and mark the old one stale:
```sql
-- Create new version
INSERT INTO documents (..., version, previous_version_id)
SELECT ..., version + 1, doc_id FROM documents WHERE doc_id = ?;
-- Mark old as superseded
UPDATE documents SET crawl_status = 'stale' WHERE doc_id = ?;
```
## Schema Reference
Key tables:
- `sources` - Authoritative source registry
- `documents` - Crawled document storage
- `distilled_content` - Processed summaries
- `review_logs` - QA decisions
Views:
- `v_pending_reviews` - Documents awaiting review
- `v_export_ready` - Approved for export