Google I/O 2025: Imagen 4 and Veo 3 AI Image and Video Generators Launched, With Automatic Audio
Google launched its most advanced AI tools for image and video generation, Imagen 4 and Veo 3, at Google I/O 2025. These new generative AI models support text prompts, audio generation and better realism.

In the Google I/O 2025 keynote, Google unveiled its most advanced AI image and video generation tools: Imagen 4 for image generation and Veo 3 for video generation. These new AI models will allow users to generate short video clips and pictures based on text or image prompts. Beyond video, Veo 3 will also be able to generate automatic, relevant audio for its clips.
Google says that Veo 3 is the latest version of its video generation model, capable of creating short video clips from text or image prompts. The company notes that the Veo 3 model offers better motion, environmental interaction and visual consistency, and delivers greater video realism.
According to the company, Veo 3 is now available to Google AI Ultra plan subscribers in the US through the Gemini app and through Flow, Google's AI filmmaking platform, which is in beta and was also revealed at I/O. Enterprise access is being provided through Vertex AI.
Updates to Veo 2
Meanwhile, Google has also updated the previous-generation model, Veo 2. The updates include:
Reference input: Users can now upload images of people, objects or styles to maintain consistency across scenes.
Camera controls: The model now also offers options such as pan, zoom and rotate, which can be defined in the prompt.
Outpainting: The video can now be extended beyond the original frame, which is useful for adapting it to different formats.
Add and remove objects: The Veo 2 model allows users to add or remove objects from the frame. The model adjusts lighting and shadows for consistency.
Meet Imagen 4
Alongside the video AI model, Google also has a new AI image generator, Imagen 4. The new model supports up to 2K resolution, with better handling of detail in areas such as fabric texture, reflections and fur. The model also works in various styles, including photorealistic and illustrative prompts.
One of the main attractions of Imagen 4 is its ability to handle text within images, in other words accurate spelling, which makes the model more useful for making posters, slides or cards that include custom typography.
Google is now integrating Imagen 4 into products such as Gemini, Vertex AI and Whisk, and into Workspace apps such as Docs, Slides and Vids. Google has also announced that a fast version, 10 times faster than Imagen 3, will be released soon for faster prototyping.