# Reference Discovery

Searches for and identifies authoritative sources for reference materials. Validates source credibility, prioritizes by relevance, and outputs curated URL lists with metadata.

## Trigger Keywords

"find references", "search documentation", "discover sources", "find authoritative materials", "research topic sources"

## Source Priority Hierarchy

| Tier | Source Type | Examples |
|------|-------------|----------|
| **Tier 1** | Official documentation | docs.anthropic.com, docs.claude.com, platform.openai.com/docs |
| **Tier 1** | Engineering blogs (official) | anthropic.com/news, openai.com/blog |
| **Tier 1** | Official GitHub repos | github.com/anthropics/*, github.com/openai/* |
| **Tier 2** | Research papers | arxiv.org, papers with citations |
| **Tier 2** | Verified community guides | Cookbook examples, official tutorials |
| **Tier 3** | Community content | Blog posts, tutorials, Stack Overflow |

## Workflow

### Step 1: Define Search Scope

Gather the topic, target vendors, and freshness requirements from user input.

### Step 2: Execute Web Search

Use the WebSearch tool with targeted queries:

```
site:docs.anthropic.com {topic}
site:github.com/anthropics {topic}
site:arxiv.org {topic}
```

### Step 3: Score and Validate Sources

Apply credibility scoring:

- Domain credibility (0.10 - 0.40)
- Freshness signals (0.10 - 0.20)
- Relevance signals (0.15)

### Step 4: Output URL Manifest

Generate a JSON manifest for the crawler skill:

```json
{
  "discovery_date": "2025-01-28T10:30:00",
  "topic": "prompt engineering",
  "total_urls": 15,
  "urls": [
    {
      "url": "https://docs.anthropic.com/en/docs/prompt-engineering",
      "title": "Prompt Engineering Guide",
      "credibility_tier": "tier1_official",
      "credibility_score": 0.85,
      "source_type": "official_docs",
      "vendor": "anthropic"
    }
  ]
}
```

## Scripts

### `discover_sources.py`

Main discovery script.

Usage:

```bash
python scripts/discover_sources.py --topic "prompt engineering" --vendors anthropic,openai --output manifest.json
```

## Output

- `manifest.json` → Handoff to `02-web-crawler-orchestrator`
- Register new sources in `sources` table via `03-content-repository`

## Deduplication

Before outputting:

- Normalize URLs (remove trailing slashes, query params)
- Check against existing `documents` table
- Merge duplicates, keeping highest credibility score
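
The deduplication steps above could be sketched roughly as follows. This is a minimal illustration, not the actual logic of `discover_sources.py`: the function names `normalize_url` and `dedupe_entries` are hypothetical, `known_urls` is an assumed stand-in for a lookup against the `documents` table, and lowercasing the scheme and host and dropping URL fragments are extra assumptions beyond the rules listed.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Strip query params and trailing slashes (hypothetical helper).

    Assumption: scheme and host are also lowercased and fragments dropped.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedupe_entries(entries, known_urls=()):
    """Merge manifest entries that share a normalized URL.

    Keeps the entry with the highest credibility_score and skips any URL
    found in known_urls, which stands in for the `documents` table check.
    """
    known = {normalize_url(u) for u in known_urls}
    best = {}
    for entry in entries:
        key = normalize_url(entry["url"])
        if key in known:
            continue  # already stored, so do not emit it again
        current = best.get(key)
        if current is None or entry["credibility_score"] > current["credibility_score"]:
            best[key] = {**entry, "url": key}
    return list(best.values())
```

Calling `dedupe_entries(manifest["urls"])` before writing the manifest would collapse variants such as a trailing-slash URL and a tracking-parameter URL into one entry. Dropping every query parameter follows the rule as written, but it would also merge pages that are distinguished only by their query string, so a real implementation may want an allowlist.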