Andrew Yim b6a478e1df feat: Add installation tool, Claude.ai export, and skill standardization (#1)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 16:48:06 +07:00

---
name: reference-discovery
description: |
  Search and discover authoritative reference sources with credibility validation.
  Triggers: find sources, search documentation, discover references, source validation.
---
# Reference Discovery
Searches for authoritative sources, validates credibility, and produces curated URL lists for crawling.
## Source Priority Hierarchy
| Tier | Source Type | Examples |
|------|-------------|----------|
| **Tier 1** | Official documentation | docs.anthropic.com, docs.claude.com, platform.openai.com/docs |
| **Tier 1** | Engineering blogs (official) | anthropic.com/news, openai.com/blog |
| **Tier 1** | Official GitHub repos | github.com/anthropics/*, github.com/openai/* |
| **Tier 2** | Research papers | arxiv.org, papers with citations |
| **Tier 2** | Verified community guides | Cookbook examples, official tutorials |
| **Tier 3** | Community content | Blog posts, tutorials, Stack Overflow |
## Discovery Workflow
### Step 1: Define Search Scope
```python
search_config = {
    "topic": "prompt engineering",
    "vendors": ["anthropic", "openai", "google"],
    "source_types": ["official_docs", "engineering_blog", "github_repo"],
    "freshness": "past_year",  # past_week, past_month, past_year, any
    "max_results_per_query": 20
}
```
### Step 2: Generate Search Queries
For a given topic, generate targeted queries:
```python
def generate_queries(topic, vendors):
    queries = []
    # Official documentation queries
    for vendor in vendors:
        queries.append(f"site:docs.{vendor}.com {topic}")
        queries.append(f"site:{vendor}.com/docs {topic}")
    # Engineering blog queries
    for vendor in vendors:
        queries.append(f"site:{vendor}.com/blog {topic}")
        queries.append(f"site:{vendor}.com/news {topic}")
    # GitHub queries
    for vendor in vendors:
        queries.append(f"site:github.com/{vendor} {topic}")
    # Research queries
    queries.append(f"site:arxiv.org {topic}")
    return queries
```
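The vendor-name patterns above do not reach every host listed under Known Authoritative Sources (for example `platform.openai.com/docs`, `ai.google.dev/docs`, and the `anthropics` GitHub organization). A minimal sketch of a table-driven variant follows; the `VENDOR_SITES` mapping and the name `generate_queries_from_table` are illustrative assumptions built from that table, not part of the skill.

```python
# Hypothetical lookup table copied from "Known Authoritative Sources".
VENDOR_SITES = {
    "anthropic": {
        "docs": ["docs.anthropic.com", "docs.claude.com"],
        "blog": ["anthropic.com/news"],
        "github": ["github.com/anthropics"],
    },
    "openai": {
        "docs": ["platform.openai.com/docs"],
        "blog": ["openai.com/blog"],
        "github": ["github.com/openai"],
    },
    "google": {
        "docs": ["ai.google.dev/docs"],
        "blog": ["blog.google/technology/ai"],
        "github": ["github.com/google"],
    },
}


def generate_queries_from_table(topic, vendors, sites=VENDOR_SITES):
    """Build one site-restricted query per known host, then add arXiv."""
    queries = []
    for vendor in vendors:
        # Unknown vendors are skipped rather than guessed at.
        for hosts in sites.get(vendor, {}).values():
            for host in hosts:
                queries.append(f"site:{host} {topic}")
    queries.append(f"site:arxiv.org {topic}")
    return queries


queries = generate_queries_from_table("prompt engineering", ["anthropic", "openai"])
```

Keeping the hosts in data rather than in f-string patterns means a new vendor only needs a new table entry.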
### Step 3: Execute Search
Run the web search tool for each query:
```python
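def execute_discovery(queries):
    # web_search is the agent's web search tool; each result is expected
    # to expose .url, .title, and .snippet.
    results = []
    for query in queries:
        search_results = web_search(query)
        for result in search_results:
            results.append({
                "url": result.url,
                "title": result.title,
                "snippet": result.snippet,
                "query_used": query
            })
    # deduplicate_by_url is a helper that is not defined in this skill.
    return deduplicate_by_url(results)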
```
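`search_config` sets `max_results_per_query`, but the loop above does not apply it. One way to honor the cap is sketched below, assuming the search tool can be passed in as a callable that returns objects with `url`, `title`, and `snippet` attributes. `execute_discovery_capped`, `fake_search`, and `SearchResult` are hypothetical names used only for illustration.

```python
from collections import namedtuple
from itertools import islice

# Stand-in for whatever result type the real search tool returns (assumption).
SearchResult = namedtuple("SearchResult", ["url", "title", "snippet"])


def execute_discovery_capped(queries, search, max_results_per_query=20):
    """Run each query through `search`, keeping at most N results per query."""
    results = []
    for query in queries:
        for result in islice(search(query), max_results_per_query):
            results.append({
                "url": result.url,
                "title": result.title,
                "snippet": result.snippet,
                "query_used": query,
            })
    return results


def fake_search(query):
    """Offline stub that returns three made-up results for any query."""
    return [
        SearchResult(f"https://example.com/{i}", f"Result {i}", query)
        for i in range(3)
    ]


capped = execute_discovery_capped(["q1", "q2"], fake_search, max_results_per_query=2)
```

Passing the search function as an argument also lets the step run offline with a stub, as `fake_search` does here.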
### Step 4: Validate and Score Sources
```python
def score_source(url, title):
    score = 0.0
    # Domain credibility
    if any(d in url for d in ['docs.anthropic.com', 'docs.claude.com', 'platform.openai.com/docs']):
        score += 0.40  # Tier 1 official docs
    elif any(d in url for d in ['anthropic.com', 'openai.com', 'google.dev']):
        score += 0.30  # Tier 1 official blog/news
    elif 'github.com' in url and any(v in url for v in ['anthropics', 'openai', 'google']):
        score += 0.30  # Tier 1 official repos
    elif 'arxiv.org' in url:
        score += 0.20  # Tier 2 research
    else:
        score += 0.10  # Tier 3 community
    # Freshness signals (publication year mentioned in the title)
    if any(year in title for year in ['2025', '2024']):
        score += 0.20
    elif '2023' in title:
        score += 0.10
    # Relevance signals
    if any(kw in title.lower() for kw in ['guide', 'documentation', 'tutorial', 'best practices']):
        score += 0.15
    return min(score, 1.0)


def assign_credibility_tier(score):
    if score >= 0.60:
        return 'tier1_official'
    elif score >= 0.40:
        return 'tier2_verified'
    else:
        return 'tier3_community'
```
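The substring checks in `score_source` also match look-alike URLs such as `https://anthropic.com.example.org/`. A stricter variant, sketched here as an assumption rather than the skill's actual behavior, parses the hostname first. `host_matches` is a hypothetical helper, and it expects URLs that include a scheme.

```python
from urllib.parse import urlparse


def host_matches(url, domain):
    """True if the URL's hostname is `domain` or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


print(host_matches("https://docs.anthropic.com/en/docs", "anthropic.com"))  # True
print(host_matches("https://anthropic.com.example.org/", "anthropic.com"))  # False
```

Each `d in url` test in the domain credibility branch could be swapped for a call like this without changing the score weights.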
### Step 5: Output URL Manifest
```python
from datetime import datetime


def create_manifest(scored_results, topic):
    # Each scored result needs "url", "title", "score", and "tier" keys.
    # infer_source_type and infer_vendor are helpers not defined in this skill.
    manifest = {
        "discovery_date": datetime.now().isoformat(),
        "topic": topic,
        "total_urls": len(scored_results),
        "urls": []
    }
    for result in sorted(scored_results, key=lambda x: x['score'], reverse=True):
        manifest["urls"].append({
            "url": result["url"],
            "title": result["title"],
            "credibility_tier": result["tier"],
            "credibility_score": result["score"],
            "source_type": infer_source_type(result["url"]),
            "vendor": infer_vendor(result["url"])
        })
    return manifest
```
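`create_manifest` relies on `infer_source_type` and `infer_vendor`, which this skill does not define. The sketch below is one plausible implementation based on the Known Authoritative Sources table. The rule order, the `research` and `community` labels, and the `None` fallback for unknown vendors are all assumptions.

```python
from urllib.parse import urlparse

# (hostname suffix, vendor) pairs taken from the table above (assumed mapping).
_VENDOR_HOSTS = [
    ("anthropic.com", "anthropic"),
    ("claude.com", "anthropic"),
    ("openai.com", "openai"),
    ("google.dev", "google"),
    ("blog.google", "google"),
]
_GITHUB_ORGS = {"anthropics": "anthropic", "openai": "openai", "google": "google"}


def _host_and_path(url):
    parsed = urlparse(url)
    return (parsed.hostname or "").lower(), parsed.path


def infer_vendor(url):
    """Return the vendor name, or None when the URL is not a known vendor host."""
    host, path = _host_and_path(url)
    if host == "github.com":
        org = path.strip("/").split("/")[0].lower()
        return _GITHUB_ORGS.get(org)
    for suffix, vendor in _VENDOR_HOSTS:
        if host == suffix or host.endswith("." + suffix):
            return vendor
    return None


def infer_source_type(url):
    """Classify a URL using the source types named in search_config."""
    host, path = _host_and_path(url)
    if host == "arxiv.org" or host.endswith(".arxiv.org"):
        return "research"  # label is an assumption
    if infer_vendor(url) is None:
        return "community"  # label is an assumption
    if host == "github.com":
        return "github_repo"
    if host.startswith("docs.") or path.startswith("/docs"):
        return "official_docs"
    # Any other page on a vendor host is treated as blog/news content.
    return "engineering_blog"
```

For the manifest example in Output Format, `https://docs.anthropic.com/en/docs/prompt-engineering` maps to vendor `anthropic` and source type `official_docs` under these rules.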
## Output Format
Discovery produces a JSON manifest for the crawler:
```json
{
  "discovery_date": "2025-01-28T10:30:00",
  "topic": "prompt engineering",
  "total_urls": 15,
  "urls": [
    {
      "url": "https://docs.anthropic.com/en/docs/prompt-engineering",
      "title": "Prompt Engineering Guide",
      "credibility_tier": "tier1_official",
      "credibility_score": 0.85,
      "source_type": "official_docs",
      "vendor": "anthropic"
    }
  ]
}
```
## Known Authoritative Sources
Pre-validated sources for common topics:
| Vendor | Documentation | Blog/News | GitHub |
|--------|--------------|-----------|--------|
| Anthropic | docs.anthropic.com, docs.claude.com | anthropic.com/news | github.com/anthropics |
| OpenAI | platform.openai.com/docs | openai.com/blog | github.com/openai |
| Google | ai.google.dev/docs | blog.google/technology/ai | github.com/google |
## Integration
**Output:** URL manifest JSON → `web-crawler-orchestrator`
**Database:** Register new sources in `sources` table via `content-repository`
## Deduplication
Before outputting, deduplicate URLs:
- Normalize URLs (remove trailing slashes, query params)
- Check against existing `documents` table via `content-repository`
- Merge duplicate entries, keeping highest credibility score
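The normalization and merge rules above can be sketched as follows. This is a minimal illustration, assuming each result is a dict with a `url` key and an optional `score` key. It drops the query string and fragment, strips trailing slashes, and lowercases the scheme and host. The check against the `documents` table is left out because the `content-repository` interface is not described here.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host, drop query and fragment, strip trailing slashes."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def deduplicate_by_url(results):
    """Keep one entry per normalized URL, preferring the highest score."""
    best = {}
    for result in results:
        key = normalize_url(result["url"])
        current = best.get(key)
        # Unscored results default to 0.0, so the first one seen is kept.
        if current is None or result.get("score", 0.0) > current.get("score", 0.0):
            best[key] = {**result, "url": key}
    return list(best.values())


merged = deduplicate_by_url([
    {"url": "https://docs.anthropic.com/en/docs/?utm=x", "score": 0.55},
    {"url": "https://docs.anthropic.com/en/docs", "score": 0.75},
])
```

Here `merged` holds a single entry for `https://docs.anthropic.com/en/docs` with score 0.75. Dropping all query parameters follows the rule as written, though it would also merge pages that are distinguished only by a query string.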