feat(reference-curator): Add portable skill suite for reference documentation curation
6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
description: Store and manage crawled content in MySQL. Handles versioning, deduplication, and document metadata.
argument-hint: <action> [--doc-id N] [--source-id N]
allowed-tools: Bash, Read, Write, Glob, Grep
---

# Content Repository

Manage crawled content in the MySQL database.

## Arguments

- `<action>`: store | list | get | update | delete | stats
- `--doc-id`: Specific document ID
- `--source-id`: Filter by source ID

## Database Connection

```bash
source ~/.envrc
mysql -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" reference_library
```

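The connection above assumes `~/.envrc` exports the MySQL credentials. A minimal sketch of that file (the variable names match the command above; the values are placeholders):

```bash
# ~/.envrc: assumed shape, not the canonical file; adjust to your setup
export MYSQL_USER="curator"
export MYSQL_PASSWORD="change-me"
```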
## Actions

### store

Store new documents from crawl output:

```bash
# Read the crawl manifest
cat ~/reference-library/raw/YYYY/MM/crawl_manifest.json
```

```sql
-- Insert each document into the documents table
INSERT INTO documents (source_id, title, url, doc_type, raw_content_path, crawl_date, crawl_status)
VALUES (...);
```

### list

List documents with filters:

```sql
SELECT doc_id, title, crawl_status, created_at
FROM documents
WHERE source_id = ? AND crawl_status = 'completed'
ORDER BY created_at DESC;
```

### get

Retrieve a specific document:

```sql
SELECT d.*, s.source_name, s.credibility_tier
FROM documents d
JOIN sources s ON d.source_id = s.source_id
WHERE d.doc_id = ?;
```

### stats

Show repository statistics:

```sql
SELECT
  COUNT(*) AS total_docs,
  SUM(CASE WHEN crawl_status = 'completed' THEN 1 ELSE 0 END) AS completed,
  SUM(CASE WHEN crawl_status = 'pending' THEN 1 ELSE 0 END) AS pending
FROM documents;
```

## Deduplication

Documents are deduplicated by URL hash:

```sql
-- url_hash is auto-generated: SHA2(url, 256)
SELECT * FROM documents WHERE url_hash = SHA2('https://...', 256);
```

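MySQL's `SHA2(url, 256)` returns the lowercase hex SHA-256 digest of the URL, so the same key can be computed outside the database when pre-checking for duplicates before a crawl. A sketch using coreutils (the example URL is arbitrary):

```bash
# Compute the same digest MySQL's SHA2(url, 256) produces: lowercase hex SHA-256
url='https://example.com/docs'
url_hash=$(printf '%s' "$url" | sha256sum | awk '{print $1}')
echo "$url_hash"
```

Note that `printf '%s'` is used instead of `echo` so no trailing newline is hashed, which would change the digest.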
## Version Tracking

When content changes:

```sql
-- Create a new version
INSERT INTO documents (..., version, previous_version_id)
SELECT ..., version + 1, doc_id FROM documents WHERE doc_id = ?;

-- Mark the old version as superseded
UPDATE documents SET crawl_status = 'stale' WHERE doc_id = ?;
```

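The two statements above should succeed or fail together; otherwise a crash between them can leave a document with two live versions. A sketch wrapping them in a transaction (the explicit column list is an assumption based on the `store` example, and `42` is a placeholder document ID):

```sql
START TRANSACTION;
-- Copy the current row as version + 1 (column list assumed from the store example)
INSERT INTO documents (source_id, title, url, doc_type, raw_content_path, crawl_date, crawl_status, version, previous_version_id)
SELECT source_id, title, url, doc_type, raw_content_path, NOW(), 'pending', version + 1, doc_id
FROM documents
WHERE doc_id = 42;
-- Supersede the old version only if the copy succeeded
UPDATE documents SET crawl_status = 'stale' WHERE doc_id = 42;
COMMIT;
```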
## Schema Reference

Key tables:

- `sources` - Authoritative source registry
- `documents` - Crawled document storage
- `distilled_content` - Processed summaries
- `review_logs` - QA decisions

Views:

- `v_pending_reviews` - Documents awaiting review
- `v_export_ready` - Approved for export