Google DeepMind has released DiffusionGemma, an open AI model that generates text in parallel blocks rather than one token at a time, reaching speeds up to 4x faster than traditional autoregressive models. The 26-billion-parameter Mixture of Experts model activates only 3.8 billion parameters during inference, making it suitable for local hardware such as gaming GPUs. It can generate more than 1,000 tokens per second on an H100 accelerator, with potential applications in text editing and molecular modeling.
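To put the parameter counts in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough back-of-the-envelope illustration, not a figure from Google: the function names are hypothetical, the bit widths are assumed quantization levels, and the memory estimate covers weights only, ignoring activations, caches, and runtime overhead.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per inference step (both counts in billions)."""
    return active_b / total_b


def weight_memory_gb(total_b: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB: parameters (billions) x bytes per parameter.

    Assumption: all experts stay resident in memory, so the total
    parameter count, not the active count, sets the footprint.
    """
    return total_b * bits_per_param / 8


TOTAL_B = 26.0   # total parameters in billions, from the article
ACTIVE_B = 3.8   # active parameters in billions, from the article

print(f"active share: {active_fraction(TOTAL_B, ACTIVE_B):.1%}")
for bits in (16, 8, 4):  # assumed precisions, for illustration only
    print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_B, bits):.0f} GB")
```

Under these assumptions, roughly one parameter in seven is active at a time, which lowers compute per step, while the storage needed for the weights still scales with the full 26 billion.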
Background
Traditional AI language models such as GPT use autoregressive generation: they produce text one token at a time, and each token depends on the ones before it, so the steps cannot run in parallel and generation is slow. Google's new approach adapts diffusion techniques from image generation to text, enabling parallel processing for faster output.
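The difference between the two approaches can be illustrated by counting sequential model passes. The sketch below is a toy model, not DiffusionGemma's actual algorithm: the block size and number of refinement passes are hypothetical values chosen for illustration, and the function names are made up for this example.

```python
import math


def autoregressive_steps(n_tokens: int) -> int:
    """One sequential model pass per generated token."""
    return n_tokens


def block_parallel_steps(n_tokens: int, block_size: int, refine_passes: int) -> int:
    """Blocks are generated one after another, but every token inside a
    block is refined simultaneously over a fixed number of passes."""
    n_blocks = math.ceil(n_tokens / block_size)
    return n_blocks * refine_passes


N_TOKENS = 1024      # length of the output to generate
BLOCK_SIZE = 64      # hypothetical tokens per parallel block
REFINE_PASSES = 8    # hypothetical denoising passes per block

ar = autoregressive_steps(N_TOKENS)
bp = block_parallel_steps(N_TOKENS, BLOCK_SIZE, REFINE_PASSES)
print(f"autoregressive: {ar} sequential passes")
print(f"block-parallel: {bp} sequential passes")
print(f"ratio: {ar / bp:.1f}x fewer passes")
```

Fewer sequential passes do not translate one-to-one into wall-clock speedup, since each parallel pass processes a whole block and costs more than a single-token step; the real gain depends on the hardware and the model's actual settings.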
- Source: Ars Technica
- Published: Jun 11, 2026 at 03:29 AM
- Score: 7.0 / 10