# Reference Curator Skills - Refactoring Log

**Date**: 2025-01-28
**Version**: 2.0
**Author**: Claude Code (Opus 4.5)

---

## Summary

Complete restructuring of the Reference Curator skill suite from a flat structure into a dual-platform format, where each skill has a `code/CLAUDE.md` directive for Claude Code alongside a `desktop/SKILL.md`. Installation is automated by a new `install.sh` script.

---

## Changes Made

### 1. Directory Restructuring

**Before:**
```
90-reference-curator/
├── SKILL.md  (Skill 01)
└── mnt/user-data/outputs/reference-curator-skills/
    ├── 02-web-crawler/SKILL.md
    ├── 03-content-repository/SKILL.md
    ├── 04-content-distiller/SKILL.md
    ├── 05-quality-reviewer/SKILL.md
    └── 06-markdown-exporter/SKILL.md
```

**After:**
```
90-reference-curator/
├── README.md
├── install.sh                      # NEW: Installation script
├── 01-reference-discovery/
│   ├── code/CLAUDE.md              # NEW: Claude Code directive
│   └── desktop/SKILL.md
├── 02-web-crawler-orchestrator/
│   ├── code/CLAUDE.md              # NEW
│   └── desktop/SKILL.md
├── 03-content-repository/
│   ├── code/CLAUDE.md              # NEW
│   └── desktop/SKILL.md
├── 04-content-distiller/
│   ├── code/CLAUDE.md              # NEW
│   └── desktop/SKILL.md
├── 05-quality-reviewer/
│   ├── code/CLAUDE.md              # NEW
│   └── desktop/SKILL.md
├── 06-markdown-exporter/
│   ├── code/CLAUDE.md              # NEW
│   └── desktop/SKILL.md
└── shared/
    ├── schema.sql                  # NEW: MySQL schema
    └── config/
        ├── db_config.yaml          # NEW
        ├── crawl_config.yaml       # NEW
        └── export_config.yaml      # NEW
```

### 2. New Files Created

| File | Purpose |
|------|---------|
| `install.sh` | Interactive installation script |
| `shared/schema.sql` | MySQL schema (9 tables, 2 views) |
| `shared/config/db_config.yaml` | Database connection config |
| `shared/config/crawl_config.yaml` | Crawler routing config |
| `shared/config/export_config.yaml` | Export format config |
| `*/code/CLAUDE.md` | 6 Claude Code directives |

### 3. Crawler Configuration

Defined routing rules that pick a crawler backend by site size and type (configured in `shared/config/crawl_config.yaml`):

| Crawler | Condition | Use Case |
|---------|-----------|----------|
| **Node.js** (default) | ≤50 pages, static | Small documentation sites |
| **Python aiohttp** | ≤200 pages, SEO needed | Technical docs |
| **Scrapy** | >200 pages, multi-domain | Enterprise crawls |
| **Firecrawl MCP** | SPA, JS-rendered | Dynamic sites |

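
The routing table above could be expressed as a small selection function. The following is a minimal sketch, not the project's `select_crawler.py` (which is still listed under Next Steps). The `CrawlTarget` fields, the returned backend names, and the precedence between rules are all assumptions, since the table does not say which rule wins when several conditions apply or how a 51 to 200 page site without SEO needs is handled.

```python
from dataclasses import dataclass


@dataclass
class CrawlTarget:
    """Hypothetical description of a site to crawl (all fields are assumptions)."""
    estimated_pages: int
    js_rendered: bool = False    # SPA or otherwise JavaScript-rendered
    needs_seo: bool = False      # SEO metadata extraction required
    multi_domain: bool = False   # crawl spans several domains


def select_crawler(target: CrawlTarget) -> str:
    """Return a backend name following the routing table (assumed precedence)."""
    if target.js_rendered:
        return "firecrawl"   # SPA / JS-rendered sites
    if target.estimated_pages > 200 or target.multi_domain:
        return "scrapy"      # large or multi-domain crawls
    if target.needs_seo or target.estimated_pages > 50:
        return "aiohttp"     # up to 200 pages (assumed fallback for 51-200)
    return "nodejs"          # default: up to 50 static pages
```

Checking JavaScript rendering first is a design choice of this sketch: a static-site crawler cannot handle a rendered page regardless of site size, so that condition is treated as overriding the page-count thresholds.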
### 4. Installation Script Features

```bash
./install.sh           # Interactive installation
./install.sh --check   # Verify status
./install.sh --reset   # Reset (preserves data)
```

**Handles:**
- Config file deployment to `~/.config/reference-curator/`
- Storage directory creation at `~/reference-library/`
- MySQL credentials setup in `~/.envrc`
- Database creation and schema application
- Skill symlink registration in `~/.claude/skills/`

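
The storage and symlink steps in the list above could be sketched as follows. This is an illustration in Python only, not the contents of `install.sh`. The `install` function, its `repo` and `home` parameters, and the constant names are hypothetical; the subdirectory names and link targets follow the Environment Setup section of this log. Config deployment and the MySQL steps are left out.

```python
from pathlib import Path

# Skill name -> repository folder, as listed under Environment Setup.
SKILLS = {
    "reference-discovery": "01-reference-discovery",
    "web-crawler-orchestrator": "02-web-crawler-orchestrator",
    "content-repository": "03-content-repository",
    "content-distiller": "04-content-distiller",
    "quality-reviewer": "05-quality-reviewer",
    "markdown-exporter": "06-markdown-exporter",
}
STORAGE_SUBDIRS = ("raw", "processed", "exports")


def install(repo: Path, home: Path) -> None:
    """Create storage directories and register skill symlinks (hypothetical)."""
    for sub in STORAGE_SUBDIRS:
        (home / "reference-library" / sub).mkdir(parents=True, exist_ok=True)

    skills_dir = home / ".claude" / "skills"
    skills_dir.mkdir(parents=True, exist_ok=True)
    for name, folder in SKILLS.items():
        link = skills_dir / name
        if link.is_symlink():
            link.unlink()  # replace a stale link left by an earlier install
        # Raises FileExistsError if a real file or directory is in the way.
        link.symlink_to(repo / folder / "desktop", target_is_directory=True)
```

Calling `install(Path("/path/to/90-reference-curator"), Path.home())` twice gives the same result as calling it once, which is the property a re-runnable installer needs.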
### 5. MySQL Schema

**Tables (9):**
- `sources` - Authoritative source registry
- `documents` - Crawled document storage
- `distilled_content` - Processed summaries
- `review_logs` - QA decision history
- `topics` - Topic taxonomy
- `document_topics` - Document-topic mapping
- `export_jobs` - Export task tracking
- `crawl_schedule` - Scheduled crawl jobs
- `change_detection` - Content change tracking

**Views (2):**
- `v_pending_reviews` - Documents awaiting review
- `v_export_ready` - Approved documents ready for export

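
To illustrate how the two views might relate to the `documents` and `review_logs` tables, here is a rough sketch. It is not taken from `shared/schema.sql`: every column name and the `'approve'` decision value are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    -- Hypothetical minimal columns; the real schema is not reproduced here.
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE review_logs (
        id INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL REFERENCES documents(id),
        decision TEXT NOT NULL
    );
    -- Documents that have no review decision yet.
    CREATE VIEW v_pending_reviews AS
        SELECT d.id, d.title
        FROM documents d
        LEFT JOIN review_logs r ON r.document_id = d.id
        WHERE r.id IS NULL;
    -- Documents with at least one 'approve' decision.
    CREATE VIEW v_export_ready AS
        SELECT DISTINCT d.id, d.title
        FROM documents d
        JOIN review_logs r ON r.document_id = d.id
        WHERE r.decision = 'approve';
""")
conn.executemany(
    "INSERT INTO documents (id, title) VALUES (?, ?)",
    [(1, "Doc A"), (2, "Doc B")],
)
conn.execute("INSERT INTO review_logs (document_id, decision) VALUES (1, 'approve')")

pending = conn.execute("SELECT id FROM v_pending_reviews").fetchall()
ready = conn.execute("SELECT id FROM v_export_ready").fetchall()
print(pending, ready)
```

In this sketch the reviewed document appears only in `v_export_ready` and the unreviewed one only in `v_pending_reviews`.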
### 6. Environment Setup

**Files installed:**
```
~/.config/reference-curator/
├── db_config.yaml
├── crawl_config.yaml
└── export_config.yaml

~/reference-library/
├── raw/
├── processed/
└── exports/

~/.claude/skills/
├── reference-discovery -> .../01-reference-discovery/desktop
├── web-crawler-orchestrator -> .../02-web-crawler-orchestrator/desktop
├── content-repository -> .../03-content-repository/desktop
├── content-distiller -> .../04-content-distiller/desktop
├── quality-reviewer -> .../05-quality-reviewer/desktop
└── markdown-exporter -> .../06-markdown-exporter/desktop
```

---

## Deleted

- `mnt/user-data/outputs/` directory (contents moved into the numbered skill directories)
- Root-level `SKILL.md` (moved to `01-reference-discovery/desktop/`)

---

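## Verification

The `11 tables` figure in the output is consistent with the 9 tables plus 2 views in the schema, presumably because views are counted as tables (as MySQL's `SHOW TABLES` does).

```bash
$ ./install.sh --check

✓ Configuration files (3/3)
✓ Storage directories (3/3)
✓ MySQL database (11 tables)
✓ Skill registrations (6/6)

All components installed correctly.
```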
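
The file-system part of this status check could be approximated as below. This is a sketch under stated assumptions, not the logic of `install.sh --check`: the `check_install` and `report` functions and the constant names are hypothetical, the expected paths are taken from the Environment Setup section, and the MySQL check is omitted because it needs a live database connection.

```python
from pathlib import Path

CONFIG_FILES = ("db_config.yaml", "crawl_config.yaml", "export_config.yaml")
STORAGE_DIRS = ("raw", "processed", "exports")
SKILL_NAMES = (
    "reference-discovery", "web-crawler-orchestrator", "content-repository",
    "content-distiller", "quality-reviewer", "markdown-exporter",
)


def check_install(home: Path) -> dict[str, tuple[int, int]]:
    """Return {component: (found, expected)} for the file-system checks."""
    config = home / ".config" / "reference-curator"
    library = home / "reference-library"
    skills = home / ".claude" / "skills"
    return {
        "Configuration files": (
            sum((config / f).is_file() for f in CONFIG_FILES), len(CONFIG_FILES)),
        "Storage directories": (
            sum((library / d).is_dir() for d in STORAGE_DIRS), len(STORAGE_DIRS)),
        "Skill registrations": (
            sum((skills / s).is_symlink() for s in SKILL_NAMES), len(SKILL_NAMES)),
    }


def report(results: dict[str, tuple[int, int]]) -> bool:
    """Print one line per component; return True if everything was found."""
    ok = True
    for name, (found, expected) in results.items():
        mark = "✓" if found == expected else "✗"
        print(f"{mark} {name} ({found}/{expected})")
        ok = ok and found == expected
    return ok

# Example: report(check_install(Path.home()))
```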

---

## Next Steps

1. Add Python scripts to `*/code/scripts/` folders for automation
2. Implement `select_crawler.py` for intelligent routing logic
3. Add unit tests for database operations
4. Create example workflows in `*/desktop/examples/`