Evaluate agent skill quality. Find the weakest link. Fix it. Prove it worked.
Updated Apr 23, 2026 - JavaScript
Open-source self-hosted web tool for evaluating Agent Skills with rubric scores, Deep Review, and improvement suggestions.
Evaluation framework for LLM knowledge inputs — prompts, RAG corpora, skills, agent workflows. Fix the model, vary the artifact. Built-in statistical rigor: bootstrap CI, Krippendorff α, length-debias, saturation curves.
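One of the statistical tools this description names, the bootstrap confidence interval, can be sketched in a few lines. This is a minimal illustration of a percentile bootstrap over rubric scores, not the framework's actual API; the function name, seed, and score values are all invented for the example.

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a list of rubric scores.
    Resample with replacement, collect the resample means, and take the
    alpha/2 and 1 - alpha/2 percentiles as the interval endpoints."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical per-run rubric scores for one artifact variant.
scores = [0.7, 0.8, 0.6, 0.9, 0.75, 0.85, 0.65, 0.8]
lo, hi = bootstrap_ci(scores)
```

The interval brackets the observed mean and widens as scores get noisier, which is what makes "fix the model, vary the artifact" comparisons statistically meaningful.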
Triage-trainer: build a customized medical triage Skill for your personal assistant from scratch, giving it the ability to accurately recommend the right hospital department for a visit.
A skill that reviews whether skills found online are safe for developers without a technical background to install.
AI-powered mock interview platform with automated scoring, role-based questions, modern React UI, FastAPI backend, and a fully implemented freemium SaaS architecture.
Evaluate agent SKILL.md files for structure, security, quality, and domain correctness.
Claim-first repository evaluation framework. In: owner/repo + repo_type → out: eval scaffold + reliability bucket (unusable/usable/reusable/recommendable).
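The in/out contract above can be sketched as a simple mapping from claim-verification results to the four reliability buckets the description lists. The bucket names come from the description; the verification-ratio thresholds below are illustrative assumptions, not the framework's actual criteria.

```python
def reliability_bucket(verified: int, total: int) -> str:
    """Map how many of a repo's claims were verified to a reliability bucket.
    Thresholds are hypothetical; only the bucket names are from the source."""
    ratio = verified / total if total else 0.0
    if ratio >= 0.9:
        return "recommendable"
    if ratio >= 0.7:
        return "reusable"
    if ratio >= 0.4:
        return "usable"
    return "unusable"
```

For example, under these assumed thresholds a repo with 9 of 10 claims verified would land in "recommendable", while one with no verified claims is "unusable".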
Binary-criteria evaluation harness for Claude skills with planned extension to plugins, agents, and MCP servers. Score every change yes/no across 7 layers — package integrity, trigger quality, functional quality, regression protection, baseline value, model variance, rollout safety. Never gradients.
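The "yes/no across layers, never gradients" idea can be illustrated with a tiny sketch: each layer is a set of boolean checks, and a change passes a layer only if every check holds. The layer names echo two of the seven listed above; the specific checks and the `evaluate` function are invented for illustration and are not the harness's API.

```python
# Hypothetical binary-criteria rubric: every check answers yes/no, and a
# layer passes only if all of its checks pass. No graded scores anywhere.
LAYERS = {
    "package_integrity": [
        lambda skill: "name" in skill,
        lambda skill: "description" in skill,
    ],
    "trigger_quality": [
        lambda skill: len(skill.get("description", "")) >= 20,
    ],
}

def evaluate(skill: dict) -> dict:
    """Return a per-layer pass/fail verdict for one skill definition."""
    return {
        layer: all(check(skill) for check in checks)
        for layer, checks in LAYERS.items()
    }

skill = {"name": "triage", "description": "Recommends the right clinic for symptoms."}
result = evaluate(skill)
```

Because each criterion is binary, two evaluators (or two runs) either agree exactly or disagree on a specific named check, which is what makes regression comparisons between skill versions unambiguous.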
Detect malicious code and security risks in AI skill files before installation to protect AI agents from hidden threats and obfuscation techniques.
🧬 Agent self-evolution system - a data-driven platform for improving AI Agent capabilities | ✨ task monitoring / skill evaluation / intelligent scheduling / automatic evolution | 📊 95%+ test coverage, <20 ms latency