directory changes and restructuring

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 02:01:41 +09:00
parent eea49f9f8c
commit 236be6c580
598 changed files with 0 additions and 0 deletions
--- a/official-skills/notion-spec-to-implementation/evaluations/README.md
+++ b/official-skills/notion-spec-to-implementation/evaluations/README.md
@@ -0,0 +1,120 @@
+# Spec to Implementation Skill Evaluations
+
+Evaluation scenarios for testing the Spec to Implementation skill across different Claude models.
+
+## Purpose
+
+These evaluations ensure the Spec to Implementation skill:
+- Finds and parses specification pages accurately
+- Breaks down specs into actionable implementation plans
+- Creates tasks that Claude can implement with clear acceptance criteria
+- Tracks progress and updates implementation status
+- Works consistently across Haiku, Sonnet, and Opus
+
+## Evaluation Files
+
+### basic-spec-implementation.json
+Tests basic workflow of turning a spec into an implementation plan.
+
+**Scenario**: Implement user authentication feature from spec  
+**Key Behaviors**:
+- Searches for and finds the authentication spec page
+- Fetches spec and extracts requirements
+- Parses requirements into phases (setup, core features, polish)
+- Creates implementation plan page linked to original spec
+- Breaks down into clear phases with deliverables
+- Includes timeline and dependencies
+
+### spec-to-tasks.json
+Tests creating concrete tasks from a specification in a task database.
+
+**Scenario**: Create tasks from API redesign spec  
+**Key Behaviors**:
+- Finds spec page in Notion
+- Extracts specific requirements and acceptance criteria
+- Searches for or creates task database
+- Fetches task database schema
+- Creates multiple tasks with proper properties (Status, Priority, Sprint, etc.)
+- Each task has clear title, description, and acceptance criteria
+- Tasks have dependencies where appropriate
+- Links all tasks back to original spec
+
+## Running Evaluations
+
+1. Enable the `spec-to-implementation` skill
+2. Submit the query from the evaluation file
+3. Verify the skill finds the spec page via search
+4. Check that requirements are accurately parsed
+5. Confirm implementation plan is created with phases
+6. Verify tasks have clear, implementable acceptance criteria
+7. Check that tasks link back to spec
+8. Test with Haiku, Sonnet, and Opus
+
+## Expected Skill Behaviors
+
+Spec to Implementation evaluations should verify:
+
+### Spec Discovery & Parsing
+- Searches Notion for specification pages
+- Fetches complete spec content
+- Extracts all requirements accurately
+- Identifies technical dependencies
+- Understands acceptance criteria
+- Notes any ambiguities or missing details
+
+### Implementation Planning
+- Creates implementation plan page
+- Breaks work into logical phases:
+  - Phase 1: Foundation/Setup
+  - Phase 2: Core Implementation
+  - Phase 3: Testing & Polish
+- Includes timeline estimates
+- Identifies dependencies between phases
+- Links back to original spec
+
+### Task Creation
+- Finds or identifies task database
+- Fetches database schema for property names
+- Creates tasks with correct properties
+- Each task has:
+  - Clear, specific title
+  - Context and description
+  - Acceptance criteria (checklist format)
+  - Appropriate priority and status
+  - Link to spec page
+- Tasks are right-sized (not too big, not too small)
+- Dependencies between tasks are noted
+
+### Progress Tracking
+- Implementation plan includes progress markers
+- Tasks can be updated as work progresses
+- Status updates link to completed work
+- Blockers or changes are noted
+
+## Creating New Evaluations
+
+When adding Spec to Implementation evaluations:
+
+1. **Test different spec types** - Features, migrations, refactors, API changes, UI components
+2. **Vary complexity** - Simple 1-phase vs. complex multi-phase implementations
+3. **Test task granularity** - Does it create appropriately-sized tasks?
+4. **Include edge cases** - Vague specs, conflicting requirements, missing details
+5. **Test database integration** - Creating tasks in existing task databases with various schemas
+6. **Progress tracking** - Updating implementation plans as tasks complete
+
+## Example Success Criteria
+
+**Good** (specific, testable):
+- "Searches Notion for spec page using feature name"
+- "Creates implementation plan with 3 phases: Setup → Core → Polish"
+- "Creates 5-8 tasks in task database with properties: Task (title), Status, Priority, Sprint"
+- "Each task has acceptance criteria in checklist format (- [ ] ...)"
+- "Tasks link back to spec using mention-page tag"
+- "Task titles are specific and actionable (e.g., 'Create login API endpoint' not 'Authentication')"
+
+**Bad** (vague, untestable):
+- "Creates good implementation plan"
+- "Tasks are well-structured"
+- "Breaks down spec appropriately"
+- "Links to spec"
+