A New York Times analysis using OpenAI's SimpleQA benchmark found that Google's AI Overviews gives incorrect answers roughly 10% of the time, which translates to millions of errors per day at Google's scale. Accuracy improved from 85% to 91% between the Gemini 2.5 and 3.0 updates, but the remaining error rate is still problematic at that volume. Examples include the feature confidently citing contradictory or irrelevant sources for straightforward factual queries.
Background
Google's AI Overviews is a Gemini-powered feature that appears atop search results, designed to summarize information but criticized for accuracy issues since its 2024 launch. The SimpleQA benchmark is a standardized test with 4,000+ verifiable questions used to evaluate AI factuality.
- Source: Ars Technica
- Published: Apr 8, 2026 at 12:53 AM
- Score: 7.0 / 10