Google DeepMind releases DiffusionGemma, a model that runs local AI 4x faster

Ars Technica

Ryan Whitwam

Jun 11, 2026 at 03:29 AM | Score: 7.0/10

Google DeepMind has released DiffusionGemma, a new open AI model that generates text in parallel blocks rather than one token at a time, reaching speeds up to 4x faster than traditional autoregressive models. The 26-billion-parameter Mixture of Experts model activates only 3.8 billion parameters during inference, making it suitable for local hardware such as gaming GPUs. It can exceed 1,000 tokens per second on an H100 accelerator, with potential applications in text editing and molecular modeling.
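To put the parameter counts in perspective, the short sketch below works out what share of the model is active per token and roughly how much memory the weights would need. The 26 billion and 3.8 billion figures come from the summary above; the precisions (16-bit and 4-bit) and the helper names are illustrative assumptions, not details from the release.

```python
# Figures quoted in the article.
TOTAL_PARAMS = 26e9    # all experts combined
ACTIVE_PARAMS = 3.8e9  # parameters used per token during inference


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that participate in one forward pass."""
    return active / total


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: ignores activations, caches, and runtime overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"Active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    # Assumed precisions, chosen only for illustration.
    for bits in (16, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

Note that a Mixture of Experts model reduces compute per token, not storage: all 26 billion parameters still have to be held somewhere, so the memory estimate uses the total count rather than the active one.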

Background

Traditional AI language models such as GPT use autoregressive generation: they produce text one token at a time, and each token depends on all the tokens before it, which makes generation slow and hard to parallelize. Google's new approach adapts diffusion techniques from image generation to text, so that a block of tokens can be generated in parallel.
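The difference between the two approaches can be illustrated by counting model forward passes. The sketch below is a toy cost model, not DiffusionGemma's actual algorithm: the function names, the block size, and the number of denoising steps are all hypothetical values chosen for illustration, and the pass count is not a measure of wall-clock speed.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Autoregressive decoding: one forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(num_tokens: int, block_size: int, denoise_steps: int) -> int:
    """Block-parallel decoding (toy model).

    Tokens are generated in blocks; every token in a block is refined
    together over a fixed number of denoising steps, so the cost per block
    is `denoise_steps` passes regardless of how many tokens it holds.
    """
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * denoise_steps


if __name__ == "__main__":
    tokens = 1024
    # Assumed settings, not taken from the article.
    block_size, denoise_steps = 32, 10
    print("autoregressive:", autoregressive_passes(tokens))
    print("block diffusion:", block_diffusion_passes(tokens, block_size, denoise_steps))
```

In this toy model the parallel approach needs fewer passes whenever the number of denoising steps is smaller than the block size. Each pass processes a whole block, though, so the real speedup depends on how well the hardware handles that extra parallel work.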

