feat(reference-curator): Add Claude.ai Projects export format

Add claude-project/ folder with skill files formatted for upload to
Claude.ai Projects (web interface):
- reference-curator-complete.md: All 6 skills consolidated
- INDEX.md: Overview and workflow documentation
- Individual skill files (01-06) without YAML frontmatter

Add --claude-ai option to install.sh:
- Lists available files for upload
- Optionally copies to custom destination directory
- Provides upload instructions for Claude.ai

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 00:33:06 +07:00
parent 8762f68e6e
commit 243b9d851c
10 changed files with 1987 additions and 0 deletions
--- a/custom-skills/90-reference-curator/claude-project/01-reference-discovery.md
+++ b/custom-skills/90-reference-curator/claude-project/01-reference-discovery.md
@@ -0,0 +1,184 @@
+
+# Reference Discovery
+
+Searches for authoritative sources, validates credibility, and produces curated URL lists for crawling.
+
+## Source Priority Hierarchy
+
+| Tier | Source Type | Examples |
+|------|-------------|----------|
+| **Tier 1** | Official documentation | docs.anthropic.com, docs.claude.com, platform.openai.com/docs |
+| **Tier 1** | Engineering blogs (official) | anthropic.com/news, openai.com/blog |
+| **Tier 1** | Official GitHub repos | github.com/anthropics/*, github.com/openai/* |
+| **Tier 2** | Research papers | arxiv.org, papers with citations |
+| **Tier 2** | Verified community guides | Cookbook examples, official tutorials |
+| **Tier 3** | Community content | Blog posts, tutorials, Stack Overflow |
+
+## Discovery Workflow
+
+### Step 1: Define Search Scope
+
+```python
+search_config = {
+    "topic": "prompt engineering",
+    "vendors": ["anthropic", "openai", "google"],
+    "source_types": ["official_docs", "engineering_blog", "github_repo"],
+    "freshness": "past_year",  # past_week, past_month, past_year, any
+    "max_results_per_query": 20
+}
+```
+
+### Step 2: Generate Search Queries
+
+For a given topic, generate targeted queries:
+
+```python
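+# GitHub organization names that differ from the vendor key
+GITHUB_ORGS = {"anthropic": "anthropics"}
+
+def generate_queries(topic, vendors):
+    """Build site-restricted search queries for each vendor, plus arXiv.
+
+    The docs/blog patterns are generic guesses. Vendors whose docs live
+    elsewhere (platform.openai.com/docs, ai.google.dev/docs) need explicit
+    queries taken from "Known Authoritative Sources".
+    """
+    queries = []
+
+    # Official documentation queries
+    for vendor in vendors:
+        queries.append(f"site:docs.{vendor}.com {topic}")
+        queries.append(f"site:{vendor}.com/docs {topic}")
+
+    # Engineering blog queries
+    for vendor in vendors:
+        queries.append(f"site:{vendor}.com/blog {topic}")
+        queries.append(f"site:{vendor}.com/news {topic}")
+
+    # GitHub queries (use the real organization name, e.g. "anthropics")
+    for vendor in vendors:
+        org = GITHUB_ORGS.get(vendor, vendor)
+        queries.append(f"site:github.com/{org} {topic}")
+
+    # Research queries
+    queries.append(f"site:arxiv.org {topic}")
+
+    return queries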
+```
+
+### Step 3: Execute Search
+
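+Run each query through the web search tool:
+
+```python
+def execute_discovery(queries):
+    """Run every query and collect results, tagged with the query that found them."""
+    results = []
+    for query in queries:
+        # web_search is the search tool available to the skill, not a Python import
+        search_results = web_search(query)
+        for result in search_results:
+            results.append({
+                "url": result.url,
+                "title": result.title,
+                "snippet": result.snippet,
+                "query_used": query
+            })
+    # deduplicate_by_url applies the rules in the "Deduplication" section
+    return deduplicate_by_url(results)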
+```
+
+### Step 4: Validate and Score Sources
+
+```python
+def score_source(url, title):
+    """Score a search result's credibility from its URL and title.
+
+    With the weights below, the highest attainable score is 0.75
+    (0.40 domain + 0.20 freshness + 0.15 relevance).
+    """
+    score = 0.0
+
+    # Domain credibility (substring match on the URL)
+    official_docs = ['docs.anthropic.com', 'docs.claude.com', 'docs.openai.com',
+                     'platform.openai.com/docs', 'ai.google.dev/docs']
+    if any(d in url for d in official_docs):
+        score += 0.40  # Tier 1 official docs
+    elif any(d in url for d in ['anthropic.com', 'openai.com', 'google.dev']):
+        score += 0.30  # Tier 1 official blog/news
+    elif 'github.com' in url and any(v in url for v in ['anthropics', 'openai', 'google']):
+        score += 0.30  # Tier 1 official repos
+    elif 'arxiv.org' in url:
+        score += 0.20  # Tier 2 research
+    else:
+        score += 0.10  # Tier 3 community
+
+    # Freshness signals (title only; the snippet is not passed in).
+    # The years are hard-coded and need updating over time.
+    if any(year in title for year in ['2025', '2024']):
+        score += 0.20
+    elif '2023' in title:
+        score += 0.10
+
+    # Relevance signals
+    if any(kw in title.lower() for kw in ['guide', 'documentation', 'tutorial', 'best practices']):
+        score += 0.15
+
+    return min(score, 1.0)
+
+def assign_credibility_tier(score):
+    """Map a score to a tier label using fixed thresholds."""
+    if score >= 0.60:
+        return 'tier1_official'
+    elif score >= 0.40:
+        return 'tier2_verified'
+    else:
+        return 'tier3_community'
+```
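+
+Step 5 reads `score` and `tier` keys from each result, but the steps above do not attach them. A minimal sketch of that glue, assuming the scoring functions are passed in as callables; `annotate_results` is a hypothetical name, not part of the skill:
+
+```python
+def annotate_results(results, score_fn, tier_fn):
+    """Return copies of the results with 'score' and 'tier' keys added."""
+    annotated = []
+    for result in results:
+        score = score_fn(result["url"], result["title"])
+        # Copy the dict so the raw search results stay unchanged
+        annotated.append({**result, "score": score, "tier": tier_fn(score)})
+    return annotated
+
+
+# Usage with stand-in scoring functions (hypothetical; in the workflow these
+# would be score_source and assign_credibility_tier from Step 4)
+sample = [{"url": "https://docs.anthropic.com/x", "title": "Guide"}]
+scored = annotate_results(
+    sample,
+    score_fn=lambda url, title: 0.55,
+    tier_fn=lambda s: "tier2_verified" if s >= 0.40 else "tier3_community",
+)
+```
+
+Passing the functions in keeps the sketch self-contained; calling `annotate_results(results, score_source, assign_credibility_tier)` would wire it to Step 4.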
+
+### Step 5: Output URL Manifest
+
+```python
+from datetime import datetime
+
+def create_manifest(scored_results, topic):
+    """Build the URL manifest, highest-scoring sources first.
+
+    Each result needs "url", "title", "score", and "tier" keys.
+    infer_source_type and infer_vendor are helpers not shown here.
+    """
+    manifest = {
+        "discovery_date": datetime.now().isoformat(),
+        "topic": topic,
+        "total_urls": len(scored_results),
+        "urls": []
+    }
+
+    for result in sorted(scored_results, key=lambda x: x['score'], reverse=True):
+        manifest["urls"].append({
+            "url": result["url"],
+            "title": result["title"],
+            "credibility_tier": result["tier"],
+            "credibility_score": result["score"],
+            "source_type": infer_source_type(result["url"]),
+            "vendor": infer_vendor(result["url"])
+        })
+
+    return manifest
+```
+
+## Output Format
+
+Discovery produces a JSON manifest for the crawler. The example below is illustrative and abbreviated to a single URL entry:
+
+```json
+{
+  "discovery_date": "2025-01-28T10:30:00",
+  "topic": "prompt engineering",
+  "total_urls": 15,
+  "urls": [
+    {
+      "url": "https://docs.anthropic.com/en/docs/prompt-engineering",
+      "title": "Prompt Engineering Guide",
+      "credibility_tier": "tier1_official",
+      "credibility_score": 0.85,
+      "source_type": "official_docs",
+      "vendor": "anthropic"
+    }
+  ]
+}
+```
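+
+Before handing a manifest to the crawler, its shape can be checked. The sketch below is one possible check, assuming every entry must carry the six keys shown above and that `total_urls` must equal the number of entries; `validate_manifest` and `REQUIRED_URL_KEYS` are hypothetical names:
+
+```python
+import json
+
+# Keys every entry in "urls" is assumed to need (taken from the example above)
+REQUIRED_URL_KEYS = {
+    "url", "title", "credibility_tier",
+    "credibility_score", "source_type", "vendor",
+}
+
+
+def validate_manifest(manifest):
+    """Return a list of problems; an empty list means the shape is OK."""
+    problems = []
+    urls = manifest.get("urls", [])
+    if manifest.get("total_urls") != len(urls):
+        problems.append("total_urls does not match the number of url entries")
+    for i, entry in enumerate(urls):
+        missing = REQUIRED_URL_KEYS - entry.keys()
+        if missing:
+            problems.append(f"urls[{i}] is missing {sorted(missing)}")
+    return problems
+
+
+manifest = json.loads("""
+{
+  "discovery_date": "2025-01-28T10:30:00",
+  "topic": "prompt engineering",
+  "total_urls": 1,
+  "urls": [{"url": "https://docs.anthropic.com/en/docs/prompt-engineering",
+            "title": "Prompt Engineering Guide",
+            "credibility_tier": "tier1_official",
+            "credibility_score": 0.85,
+            "source_type": "official_docs",
+            "vendor": "anthropic"}]
+}
+""")
+print(validate_manifest(manifest))  # prints []
+```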
+
+## Known Authoritative Sources
+
+Pre-validated sources for common topics:
+
+| Vendor | Documentation | Blog/News | GitHub |
+|--------|--------------|-----------|--------|
+| Anthropic | docs.anthropic.com, docs.claude.com | anthropic.com/news | github.com/anthropics |
+| OpenAI | platform.openai.com/docs | openai.com/blog | github.com/openai |
+| Google | ai.google.dev/docs | blog.google/technology/ai | github.com/google |
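+
+Step 5 calls `infer_source_type` and `infer_vendor` without defining them. One way to sketch them from the table above, assuming host-suffix matching is good enough; the `research_paper`, `community`, and `unknown` labels are assumptions, not values defined by the skill:
+
+```python
+from urllib.parse import urlparse
+
+# Host suffix -> vendor, taken from the table above
+VENDOR_HOSTS = {
+    "anthropic.com": "anthropic",
+    "claude.com": "anthropic",
+    "openai.com": "openai",
+    "google.dev": "google",
+    "blog.google": "google",
+}
+# GitHub organization -> vendor
+GITHUB_ORG_VENDORS = {"anthropics": "anthropic", "openai": "openai", "google": "google"}
+
+
+def _host_and_path(url):
+    parsed = urlparse(url)
+    return (parsed.hostname or "").lower(), parsed.path
+
+
+def infer_vendor(url):
+    """Return the vendor key for a URL, or 'unknown' (assumed fallback)."""
+    host, path = _host_and_path(url)
+    if host == "github.com":
+        org = path.strip("/").split("/")[0].lower()
+        return GITHUB_ORG_VENDORS.get(org, "unknown")
+    for suffix, vendor in VENDOR_HOSTS.items():
+        if host == suffix or host.endswith("." + suffix):
+            return vendor
+    return "unknown"
+
+
+def infer_source_type(url):
+    """Classify a URL; the first three labels come from Step 1's source_types."""
+    host, path = _host_and_path(url)
+    if host == "github.com":
+        return "github_repo"
+    if host == "arxiv.org" or host.endswith(".arxiv.org"):
+        return "research_paper"  # assumed label
+    if host.startswith("docs.") or path.startswith("/docs") or "/docs/" in path:
+        return "official_docs"
+    if host.startswith("blog.") or path.startswith(("/blog", "/news")):
+        return "engineering_blog"
+    return "community"  # assumed label
+```
+
+Matching on the parsed hostname rather than on a substring of the whole URL avoids false positives such as a vendor name appearing in a query string.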
+
+## Integration
+
+**Output:** URL manifest JSON → `web-crawler-orchestrator`
+
+**Database:** Register new sources in `sources` table via `content-repository`
+
+## Deduplication
+
+Before outputting, deduplicate URLs:
+- Normalize URLs (remove trailing slashes, query params)
+- Check against existing `documents` table via `content-repository`
+- Merge duplicate entries, keeping highest credibility score
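+
+The rules above can be sketched as follows. This is an illustration, not the skill's actual `deduplicate_by_url`: the `known_urls` set is a hypothetical stand-in for the lookup against the `documents` table, and dropping URL fragments is an added assumption.
+
+```python
+from urllib.parse import urlsplit, urlunsplit
+
+
+def normalize_url(url):
+    """Drop the query string, fragment, and trailing slash from a URL."""
+    parts = urlsplit(url)
+    path = parts.path.rstrip("/")
+    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
+
+
+def deduplicate_by_url(results, known_urls=frozenset()):
+    """Keep one entry per normalized URL, preferring the highest score.
+
+    known_urls holds already-normalized URLs that are stored elsewhere
+    (a stand-in for the `documents` table check).
+    """
+    best = {}
+    for result in results:
+        key = normalize_url(result["url"])
+        if key in known_urls:
+            continue  # already stored, skip it
+        current = best.get(key)
+        # Unscored results default to 0.0, so the first one seen wins
+        if current is None or result.get("score", 0.0) > current.get("score", 0.0):
+            best[key] = {**result, "url": key}
+    return list(best.values())
+
+
+results = [
+    {"url": "https://a.com/x/", "score": 0.3},
+    {"url": "https://a.com/x?ref=1", "score": 0.7},
+    {"url": "https://b.com/y", "score": 0.5},
+]
+unique = deduplicate_by_url(results)
+```
+
+Here `unique` holds two entries: `https://a.com/x` with score 0.7 and `https://b.com/y` with score 0.5.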