Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

Ars Technica

Ryan Whitwam

May 6, 2026 at 11:44 PM | Score: 7.0/10

Google has introduced Multi-Token Prediction (MTP) drafters for its Gemma 4 open AI models. The drafters enable speculative decoding, in which a drafter proposes several future tokens and the main model verifies them together, for up to 3x faster generation. The models are designed to run locally on consumer hardware, and the largest version can run at full precision on a single high-power AI accelerator. Google has also moved Gemma 4 to the more permissive Apache 2.0 license, making it more accessible for developers.
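To make the idea concrete, here is a minimal sketch of generic greedy speculative decoding. It is not Google's MTP implementation: the `target` and `draft` functions are hypothetical toy stand-ins for the main model and the drafter, and the verification loop only simulates what a real system would do in one batched forward pass of the main model.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # context -> greedy next token


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], num_new: int,
                       k: int = 4) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, verification passes)."""
    out = list(prompt)
    goal = len(prompt) + num_new
    passes = 0
    while len(out) < goal:
        # 1. The cheap drafter proposes k tokens, one after another.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The main model checks every proposed position. Counted as one
        #    pass because a real system would batch these checks together.
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token confirmed
            else:
                accepted.append(expected)  # replace the first mismatch
                break                      # discard the rest of the draft
        else:
            # All k drafts accepted: the same pass yields one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], passes


# Hypothetical toy "models": the target counts upward modulo 10, and the
# drafter agrees with it except right after a 6, where it guesses wrong.
def target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 6 else (ctx[-1] + 1) % 10


def plain_decode(model: NextToken, prompt: List[Token], num_new: int) -> List[Token]:
    """Baseline: one main-model call per generated token."""
    out = list(prompt)
    for _ in range(num_new):
        out.append(model(out))
    return out


tokens, passes = speculative_decode(target, draft, [0], num_new=12, k=4)
baseline = plain_decode(target, [0], num_new=12)
```

In this sketch the output always matches what the main model would have produced on its own, because every draft token is checked before it is kept. Any speedup comes from needing fewer main-model passes than generated tokens, and it shrinks when the drafter guesses wrong more often.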

Background

Google's Gemma models are open AI models designed to run locally on consumer hardware, providing an alternative to cloud-based AI systems. They are built on the same technology as Google's Gemini AI but are optimized for edge computing.

