**Description:** Guide to using the evaluation and tuning toolkit for systematic agent improvement.

**Acceptance Criteria:**
- [ ] Guide at `docs/guides/evaluation.md`
- [ ] When and why to evaluate agents
- [ ] Writing effective test cases
- [ ] Running evaluations (programmatic and CLI)
- [ ] Interpreting results and metrics
- [ ] Parameter sweeps: when and how
- [ ] Prompt comparison workflow
- [ ] CI/CD integration patterns
- [ ] Cost management for evaluations