An open-source CLI agent called Dirac has taken the top score on TerminalBench at 65.2%, ahead of Google's official agent (47.8%) and the previous leader, the closed-source Junie CLI (64.3%). The developer emphasizes that the implementation was fully compliant and used no cheating mechanisms, and points to the importance of proper benchmarking harness design. The result puts an open-source agent 0.9 percentage points ahead of the closed-source leader and 17.4 points ahead of Google's agent.
Background
TerminalBench is a benchmark that evaluates how AI agents perform in terminal environments. Recent leaderboard submissions have raised concerns about cheating. The field of AI agents is evolving rapidly, with open-source and proprietary solutions competing for the best performance.
- Source: Hacker News (RSS)
- Published: Apr 27, 2026 at 08:35 PM
- Score: 7.0 / 10