Evaluate agent skill quality. Find the weakest link. Fix it. Prove it worked.
Updated Apr 23, 2026 - JavaScript
Open-source self-hosted web tool for evaluating Agent Skills with rubric scores, Deep Review, and improvement suggestions.
Evaluation framework for LLM knowledge inputs — prompts, RAG corpora, skills, agent workflows. Fix the model, vary the artifact. Built-in statistical rigor: bootstrap CI, Krippendorff α, length-debias, saturation curves.
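One of the statistical tools this description names, the bootstrap confidence interval, can be sketched in a few lines. This is a minimal illustration of a percentile bootstrap over rubric scores, not the framework's actual API; the function name, seed, and score values are all invented for the example.

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a list of rubric scores.
    Resample with replacement, collect the resample means, and take the
    alpha/2 and 1 - alpha/2 percentiles as the interval endpoints."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical per-run rubric scores for one artifact variant.
scores = [0.7, 0.8, 0.6, 0.9, 0.75, 0.85, 0.65, 0.8]
lo, hi = bootstrap_ci(scores)
```

The interval brackets the observed mean and widens as scores get noisier, which is what makes "fix the model, vary the artifact" comparisons statistically meaningful.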
Triage-trainer: build a customized medical triage Skill for your personal assistant from scratch, giving it the ability to accurately recommend the right hospital department for a visit.
A skill that reviews whether skills found online are safe for developers without a technical background to install.
AI-powered mock interview platform with automated scoring, role-based questions, modern React UI, FastAPI backend, and a fully implemented freemium SaaS architecture.
Evaluate agent SKILL.md files for structure, security, quality, and domain correctness.
Claim-first repository evaluation framework. In: owner/repo + repo_type → out: eval scaffold + reliability bucket (unusable/usable/reusable/recommendable).
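The in/out contract above can be sketched as a simple mapping from claim-verification results to the four reliability buckets the description lists. The bucket names come from the description; the verification-ratio thresholds below are illustrative assumptions, not the framework's actual criteria.

```python
def reliability_bucket(verified: int, total: int) -> str:
    """Map how many of a repo's claims were verified to a reliability bucket.
    Thresholds are hypothetical; only the bucket names are from the source."""
    ratio = verified / total if total else 0.0
    if ratio >= 0.9:
        return "recommendable"
    if ratio >= 0.7:
        return "reusable"
    if ratio >= 0.4:
        return "usable"
    return "unusable"
```

For example, under these assumed thresholds a repo with 9 of 10 claims verified would land in "recommendable", while one with no verified claims is "unusable".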
Binary-criteria evaluation harness for Claude skills with planned extension to plugins, agents, and MCP servers. Score every change yes/no across 7 layers — package integrity, trigger quality, functional quality, regression protection, baseline value, model variance, rollout safety. Never gradients.
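The "yes/no across layers, never gradients" idea can be illustrated with a tiny sketch: each layer is a set of boolean checks, and a change passes a layer only if every check holds. The layer names echo two of the seven listed above; the specific checks and the `evaluate` function are invented for illustration and are not the harness's API.

```python
# Hypothetical binary-criteria rubric: every check answers yes/no, and a
# layer passes only if all of its checks pass. No graded scores anywhere.
LAYERS = {
    "package_integrity": [
        lambda skill: "name" in skill,
        lambda skill: "description" in skill,
    ],
    "trigger_quality": [
        lambda skill: len(skill.get("description", "")) >= 20,
    ],
}

def evaluate(skill: dict) -> dict:
    """Return a per-layer pass/fail verdict for one skill definition."""
    return {
        layer: all(check(skill) for check in checks)
        for layer, checks in LAYERS.items()
    }

skill = {"name": "triage", "description": "Recommends the right clinic for symptoms."}
result = evaluate(skill)
```

Because each criterion is binary, two evaluators (or two runs) either agree exactly or disagree on a specific named check, which is what makes regression comparisons between skill versions unambiguous.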
Detect malicious code and security risks in AI skill files before installation to protect AI agents from hidden threats and obfuscation techniques.
🧬 Agent self-evolution system - a data-driven platform for improving AI Agent capabilities | ✨ task monitoring / skill evaluation / intelligent scheduling / automatic evolution | 📊 95%+ test coverage, <20 ms latency