Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

TechCrunch

Rebecca Bellan

May 20, 2026 at 01:45 AM | Score: 8.0/10

Google has unveiled Gemini Omni, a multimodal AI model that generates and edits video through conversational inputs spanning text, images, audio, and video. The technology launches first with Omni Flash and extends what AI models can do in understanding and creating complex multimedia content. It could change content creation and video editing workflows.

Background

Multimodal AI models, which process and generate several types of media, have been a major focus of AI research, though earlier models were typically limited to one or two modalities. Google's Gemini series has been at the forefront of this work, competing with other major model families such as OpenAI's GPT series.

