E-Ink News Daily

Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

Dirac, an open-source CLI agent, has achieved the top TerminalBench score of 65.2% with Gemini-3-flash-preview, outperforming Google's official agent (47.8%) and the previous leader, the closed-source Junie CLI (64.3%). The developer states that the implementation is fully compliant and contains no cheating mechanisms, and stresses the importance of proper benchmarking harness design. The margin over Google's official agent is 17.4 percentage points; the margin over Junie CLI is 0.9 points.

Background

TerminalBench is a benchmark that evaluates how well AI agents perform in terminal environments. Concerns have recently been raised about cheating in leaderboard submissions. The field of AI agents is evolving rapidly, with open-source and proprietary solutions competing for the top results.

Source: Hacker News (RSS)
Published: Apr 27, 2026 at 08:35 PM
Score: 7.0 / 10