Popular repositories

  1. skillsbench Public

    SkillsBench evaluates how well skills work and how effective agents are at using them

    PDDL · 465 stars · 187 forks

  2. benchflow Public

    AI benchmark runtime framework that allows you to integrate and evaluate AI tasks using Docker-based benchmarks.

    Python · 182 stars · 15 forks

  3. pokemon-gym Public

    Python · 90 stars · 7 forks

  4. jfkarena Public

    TypeScript · 7 stars

  5. llm-builds-linux Public

    Python · 6 stars · 1 fork

  6. paperbench Public

    Python · 5 stars · 1 fork

Repositories

Showing 10 of 11 repositories
  • skillsbench Public

    SkillsBench evaluates how well skills work and how effective agents are at using them

    PDDL · 465 stars · Apache-2.0 · 187 forks · 1 issue · 210 pull requests · Updated Feb 18, 2026
  • terminal-bench-3 Public (forked from harbor-framework/terminal-bench-3)

    🚧 Accepting Task Submissions 🚧

    Python · 0 stars · 37 forks · 0 issues · 0 pull requests · Updated Feb 17, 2026
  • harbor Public (forked from laude-institute/harbor)

    Harbor is a framework for running agent evaluations and creating and using RL environments.

    Python · 0 stars · Apache-2.0 · 535 forks · 0 issues · 0 pull requests · Updated Feb 12, 2026
  • harbor-datasets
    Python · 0 stars · 69 forks · 0 issues · 0 pull requests · Updated Feb 12, 2026
  • skillsbench-trajectories
    Python · 1 star · 0 forks · 0 issues · 0 pull requests · Updated Feb 11, 2026
  • llm-builds-linux
    Python · 6 stars · 1 fork · 0 issues · 8 pull requests · Updated Dec 20, 2025
  • benchflow Public

    AI benchmark runtime framework that allows you to integrate and evaluate AI tasks using Docker-based benchmarks.

    Python · 182 stars · MIT · 15 forks · 0 issues · 0 pull requests · Updated Dec 19, 2025
  • pokemon-gym Public
    Python · 90 stars · 7 forks · 0 issues · 0 pull requests · Updated Jun 29, 2025
  • paperbench Public
    Python · 5 stars · MIT · 1 fork · 0 issues · 0 pull requests · Updated Apr 14, 2025
  • jfkarena Public
    TypeScript · 7 stars · 0 forks · 0 issues · 0 pull requests · Updated Apr 1, 2025
