our-claude-skills/custom-skills/90-reference-curator/01-reference-discovery/code/CLAUDE.md
Andrew Yim 6d7a6d7a88 feat(reference-curator): Add portable skill suite for reference documentation curation
6 modular skills for curating, processing, and exporting reference docs:
- reference-discovery: Search and validate authoritative sources
- web-crawler-orchestrator: Multi-backend crawling (Firecrawl/Node/aiohttp/Scrapy)
- content-repository: MySQL storage with version tracking
- content-distiller: Summarization and key concept extraction
- quality-reviewer: QA loop with approve/refactor/research routing
- markdown-exporter: Structured output for Claude Projects or fine-tuning

Cross-machine installation support:
- Environment-based config (~/.reference-curator.env)
- Commands tracked in repo, symlinked during install
- install.sh with --minimal, --check, --uninstall modes
- Firecrawl MCP as default (always available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:20:27 +07:00


Reference Discovery

Search for and identify authoritative sources for reference materials. This skill validates source credibility, prioritizes sources by relevance, and outputs a curated URL list with metadata.

Trigger Keywords

"find references", "search documentation", "discover sources", "find authoritative materials", "research topic sources"

Source Priority Hierarchy

| Tier | Source Type | Examples |
|------|-------------|----------|
| Tier 1 | Official documentation | docs.anthropic.com, docs.claude.com, platform.openai.com/docs |
| Tier 1 | Engineering blogs (official) | anthropic.com/news, openai.com/blog |
| Tier 1 | Official GitHub repos | github.com/anthropics/, github.com/openai/ |
| Tier 2 | Research papers | arxiv.org, papers with citations |
| Tier 2 | Verified community guides | Cookbook examples, official tutorials |
| Tier 3 | Community content | Blog posts, tutorials, Stack Overflow |
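The tier hierarchy above can be sketched as a domain lookup. This is a hypothetical helper (the tier lists and function name are illustrative assumptions, not the skill's actual implementation):

```python
# Sketch: map a URL's domain to a credibility tier.
# Tier lists here are assumptions drawn from the table above.
from urllib.parse import urlparse

TIER1_DOMAINS = {"docs.anthropic.com", "docs.claude.com",
                 "platform.openai.com", "anthropic.com", "openai.com"}
TIER1_GITHUB_ORGS = {"anthropics", "openai"}
TIER2_DOMAINS = {"arxiv.org"}

def classify_tier(url: str) -> str:
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host == "github.com":
        # GitHub is Tier 1 only for official org repos.
        org = parts.path.strip("/").split("/")[0] if parts.path.strip("/") else ""
        return "tier1_official" if org in TIER1_GITHUB_ORGS else "tier3_community"
    if host in TIER1_DOMAINS:
        return "tier1_official"
    if host in TIER2_DOMAINS:
        return "tier2_research"
    return "tier3_community"
```

Tier labels here match the `credibility_tier` values shown in the manifest example below.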

Workflow

Step 1: Define Search Scope

Gather topic, target vendors, and freshness requirements from user input.

Step 2: Execute Searches

Use the WebSearch tool with targeted queries:

site:docs.anthropic.com {topic}
site:github.com/anthropics {topic}
site:arxiv.org {topic}
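Building those site-scoped queries is mechanical; a minimal sketch (the function name and default site list are assumptions):

```python
# Sketch: expand a topic into site-scoped search queries
# like the examples above.
def build_queries(topic, sites=("docs.anthropic.com",
                                "github.com/anthropics",
                                "arxiv.org")):
    return [f"site:{site} {topic}" for site in sites]
```

Each query string is then passed to the WebSearch tool in turn.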

Step 3: Score and Validate Sources

Apply credibility scoring:

  • Domain credibility (0.10 - 0.40)
  • Freshness signals (0.10 - 0.20)
  • Relevance signals (0.15)
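One plausible way to combine the components above is a simple sum, bounded by the listed weight ranges. The exact formula the skill uses isn't shown, so treat this as an assumption:

```python
# Sketch: sum component scores into one credibility score.
# Weight ranges follow the bullets above; the combination
# rule (plain addition) is an assumption.
def credibility_score(domain: float, freshness: float, relevance: float) -> float:
    assert 0.10 <= domain <= 0.40, "domain credibility out of range"
    assert 0.10 <= freshness <= 0.20, "freshness signal out of range"
    assert 0.0 <= relevance <= 0.15, "relevance signal out of range"
    return round(domain + freshness + relevance, 2)
```

Under this rule the maximum score is 0.75; other signals would have to contribute for a score like the 0.85 in the manifest example below.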

Step 4: Output URL Manifest

Generate JSON manifest for the crawler skill:

{
  "discovery_date": "2025-01-28T10:30:00",
  "topic": "prompt engineering",
  "total_urls": 15,
  "urls": [
    {
      "url": "https://docs.anthropic.com/en/docs/prompt-engineering",
      "title": "Prompt Engineering Guide",
      "credibility_tier": "tier1_official",
      "credibility_score": 0.85,
      "source_type": "official_docs",
      "vendor": "anthropic"
    }
  ]
}
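Emitting that manifest is straightforward; a sketch whose field names mirror the example above (the helper name is hypothetical):

```python
# Sketch: write a discovery manifest in the shape shown above.
import json
from datetime import datetime

def write_manifest(topic, urls, path="manifest.json"):
    manifest = {
        "discovery_date": datetime.now().isoformat(timespec="seconds"),
        "topic": topic,
        "total_urls": len(urls),
        "urls": urls,  # list of dicts: url, title, credibility_tier, ...
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```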

Scripts

discover_sources.py

Main discovery script. Usage:

python scripts/discover_sources.py --topic "prompt engineering" --vendors anthropic,openai --output manifest.json
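The script's actual argument handling isn't shown here; a plausible argparse sketch matching the usage line above:

```python
# Sketch: CLI parsing consistent with the usage example.
# Flag names come from the usage line; defaults are assumptions.
import argparse

def parse_args(argv=None):
    p = argparse.ArgumentParser(description="Discover reference sources")
    p.add_argument("--topic", required=True, help="Topic to research")
    p.add_argument("--vendors", default="",
                   help="Comma-separated vendor list, e.g. anthropic,openai")
    p.add_argument("--output", default="manifest.json",
                   help="Path for the URL manifest")
    args = p.parse_args(argv)
    args.vendors = [v for v in args.vendors.split(",") if v]
    return args
```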

Output

  • manifest.json → Handoff to 02-web-crawler-orchestrator
  • Register new sources in sources table via 03-content-repository

Deduplication

Before outputting:

  • Normalize URLs (remove trailing slashes, query params)
  • Check against existing documents table
  • Merge duplicates, keeping highest credibility score
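The three dedup steps above can be sketched as follows (helper names are illustrative; checking against the existing documents table would replace the in-memory dict here):

```python
# Sketch: normalize URLs, then merge duplicates keeping the
# entry with the highest credibility score.
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    # Drop trailing slash, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/"), "", ""))

def dedupe(entries):
    best = {}
    for e in entries:
        key = normalize_url(e["url"])
        if key not in best or e["credibility_score"] > best[key]["credibility_score"]:
            best[key] = e
    return list(best.values())
```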