Andrew Yim 236be6c580 directory changes and restructuring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-22 02:01:41 +09:00

Files: conversation-to-wiki.json, decision-record.json, README.md

Knowledge Capture Skill Evaluations

Evaluation scenarios for testing the Knowledge Capture skill across different Claude models.

Purpose

These evaluations ensure the Knowledge Capture skill:

Correctly identifies content types (how-to guides, FAQs, decision records, wikis)
Extracts relevant information from conversations
Structures content appropriately for each type
Searches and places content in the right Notion location
Works consistently across Haiku, Sonnet, and Opus
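As an illustration of the cross-model consistency goal above, the following minimal sketch flags models whose identified content type differs from the majority. The function name `inconsistent_models` and the shape of the input (a mapping from model name to the content type it reported) are assumptions made for this example, not part of the skill or its evaluation files. The sample values are invented inputs, not measured results.

```python
from collections import Counter


def inconsistent_models(results):
    """Return the models whose content type differs from the majority answer.

    `results` is a hypothetical mapping of model name -> identified content type.
    """
    if not results:
        return []
    # most_common(1) yields the most frequent content type; on a tie,
    # the type that was seen first wins.
    majority_type, _ = Counter(results.values()).most_common(1)[0]
    return sorted(model for model, kind in results.items() if kind != majority_type)


# Invented example input: one model disagrees with the other two.
sample = {"Haiku": "FAQ", "Sonnet": "How-To Guide", "Opus": "How-To Guide"}
print(inconsistent_models(sample))  # ['Haiku']
```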

conversation-to-wiki.json: tests capturing conversation content as a how-to guide for the team wiki.

Scenario: Save deployment discussion to wiki
Key Behaviors:

Extracts steps, gotchas, and best practices from conversation
Identifies content as How-To Guide
Structures with proper sections (Overview, Prerequisites, Steps, Troubleshooting)
Searches for team wiki location
Preserves technical details (commands, configs)
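The section-structure behavior above could be checked mechanically. This is a minimal sketch under stated assumptions: the helper name `missing_sections` is hypothetical, and it assumes each section title appears on its own line in the captured page, optionally prefixed with markdown `#` markers. It does not reflect how the evaluation JSON files are actually scored.

```python
# Section titles taken from the key behaviors listed above.
HOW_TO_SECTIONS = ["Overview", "Prerequisites", "Steps", "Troubleshooting"]


def missing_sections(page_text, required=HOW_TO_SECTIONS):
    """Return the required section titles that do not appear as their own line."""
    # Normalize every non-empty line: drop leading '#' markers and
    # surrounding whitespace, then lowercase for comparison.
    normalized_lines = {
        line.strip().lstrip("#").strip().lower()
        for line in page_text.splitlines()
        if line.strip()
    }
    return [title for title in required if title.lower() not in normalized_lines]


# Invented sample page that lacks a Troubleshooting section.
sample_page = """# Deploying the service
## Overview
Short summary.
## Prerequisites
Access to the cluster.
## Steps
1. Run the deploy command.
"""

print(missing_sections(sample_page))  # ['Troubleshooting']
```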

decision-record.json: tests capturing architectural or technical decisions with full context.

Scenario: Document database migration decision
Key Behaviors:

Extracts decision context, alternatives, and rationale
Follows decision record structure (Context, Decision, Alternatives, Consequences)
Captures both selected and rejected options with reasoning
Places in decision log or ADR database
Links to related technical documentation
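A check for the decision record behaviors above might look like the following sketch. The `Option` and `DecisionRecord` classes and the `check_decision_record` function are hypothetical names introduced for illustration. They model only the structure named above (Context, Decision, Alternatives, Consequences) and the requirement that selected and rejected options both carry reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    name: str
    reasoning: str = ""
    selected: bool = False


@dataclass
class DecisionRecord:
    context: str
    decision: str
    consequences: str
    options: list = field(default_factory=list)  # selected option plus alternatives


def check_decision_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for section in ("context", "decision", "consequences"):
        if not getattr(record, section).strip():
            problems.append(f"empty section: {section}")
    selected = [o for o in record.options if o.selected]
    rejected = [o for o in record.options if not o.selected]
    if len(selected) != 1:
        problems.append("expected exactly one selected option")
    if not rejected:
        problems.append("no rejected alternatives recorded")
    for option in record.options:
        if not option.reasoning.strip():
            problems.append(f"option without reasoning: {option.name}")
    return problems


# Invented example: the rejected alternative has no recorded reasoning.
record = DecisionRecord(
    context="The current database cannot handle the expected load.",
    decision="Migrate to Option A.",
    consequences="Requires a migration window.",
    options=[Option("Option A", "Meets the load requirement", True), Option("Option B")],
)
print(check_decision_record(record))  # ['option without reasoning: Option B']
```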

Knowledge Capture evaluations should verify:

Correctly identifies appropriate content type (how-to, FAQ, decision record, wiki page)
Uses matching structure from reference documentation
Applies proper Notion markdown formatting

When adding Knowledge Capture evaluations:

Use realistic conversation content - Include actual technical details, decisions, or processes
Test different content types - How-to guides, FAQs, decision records, meeting notes, learnings
Vary complexity - Simple captures vs. complex technical discussions
Test discovery - Finding the right wiki section or database
Include edge cases - Unclear content types, minimal context, overlapping categories
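One way to plan coverage along the dimensions listed above is to enumerate every combination of content type, complexity, and model. The sketch below is illustrative only: the function name `build_test_matrix` and the dictionary keys are assumptions, and the two complexity labels are a simplification of "simple captures vs. complex technical discussions".

```python
from itertools import product

# Values taken from the guidelines above and the models named in Purpose.
CONTENT_TYPES = ["how-to guide", "FAQ", "decision record", "meeting notes", "learnings"]
COMPLEXITY_LEVELS = ["simple", "complex"]
MODELS = ["Haiku", "Sonnet", "Opus"]


def build_test_matrix():
    """Return one planned test case per (content type, complexity, model) combination."""
    return [
        {"content_type": content_type, "complexity": complexity, "model": model}
        for content_type, complexity, model in product(
            CONTENT_TYPES, COMPLEXITY_LEVELS, MODELS
        )
    ]


matrix = build_test_matrix()
print(len(matrix))  # 30
print(matrix[0])    # {'content_type': 'how-to guide', 'complexity': 'simple', 'model': 'Haiku'}
```

A full matrix is rarely worth writing out by hand. It is more useful as a checklist for spotting combinations that have no scenario yet.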

Good (specific, testable):

Bad (vague, untestable):