    Repositories list

    • Python
      0300 Updated Apr 18, 2026
    • harbor

      Public
      Harbor is a framework for running agent evaluations and creating and using RL environments.
      Python
      Apache License 2.0
      923100 Updated Apr 8, 2026
    • anvil

      Public
      Python
      10810 Updated Mar 28, 2026
    • AfterQuery's MLE Reasoning Harness
      Python
      1000 Updated Feb 2, 2026
    • IDE-Bench

      Public
      Comprehensive framework for evaluating AI IDE agents on real-world, cross-stack SWE tasks
      Python
      9400 Updated Feb 1, 2026
    • RL-Take-Home

      Public
      RL Intern Take-Home Assignment
      Python
      1000 Updated Dec 30, 2025
    • public documentation for appbench.ai
      1000 Updated Dec 10, 2025
    • Python
      7000 Updated Oct 27, 2025
    • vader

      Public
      Java
      31010 Updated May 24, 2025
    • FinanceQA

      Public
      FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities in Large Language Models
      1600 Updated Feb 2, 2025