Google has launched Gemma 3n, a multimodal open-source AI model that runs on just 2GB of RAM without internet
Google has launched Gemma 3n, a powerful AI model that runs directly on devices without internet access and supports advanced multimodal tasks using only 2GB of memory.

In short
- Gemma 3n supports text, audio, image and video input
- Runs offline on phones with as little as 2GB of memory
- Brings real-time AI features to edge devices like Pixel phones
Google has announced the full launch of its latest on-device AI model, Gemma 3n, which was first previewed in May 2025. The model brings advanced multimodal abilities, including audio, image, video and text processing, to devices with limited memory and no internet connection. With this release, developers can deploy AI features that once required powerful cloud infrastructure directly on phones and other low-power devices.
At the heart of Gemma 3n is a new architecture called MatFormer, short for Matryoshka Transformer. Google explains that, like Matryoshka nesting dolls, the model contains smaller, fully functional sub-models inside larger ones. This design makes it easy for developers to scale performance to the hardware available. For example, Gemma 3n comes in two versions: E2B, which runs in as little as 2GB of memory, and E4B, which requires roughly 3GB.
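As a rough illustration of how the two variants are picked up in practice, here is a minimal sketch using Hugging Face Transformers. The model IDs follow the names published on the Hugging Face Hub (access may require accepting the Gemma license there first), and the prompt is just a placeholder.

```python
# Minimal sketch: loading a Gemma 3n variant via Hugging Face Transformers.
# Model IDs follow the Hugging Face Hub names; gated access may apply.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "google/gemma-3n-E2B-it"  # or "google/gemma-3n-E4B-it" for the ~3GB variant

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Summarise the MatFormer idea in one sentence."}],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=64)
new_tokens = generated[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```

Switching between E2B and E4B is just a matter of changing the model ID, which is the scaling choice the MatFormer design is meant to enable.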
Despite having 5 and 8 billion raw parameters respectively, both models behave like much smaller models in terms of resource usage. This efficiency comes from innovations such as per-layer embeddings (PLE), which shift part of the workload from the phone's graphics processor to its central processor, freeing up valuable memory.
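The following is a conceptual toy sketch, not Google's implementation: it only illustrates the general idea behind PLE, that per-layer parameters can live in ordinary CPU RAM and only the small slices needed for the current tokens are moved to the accelerator. All names and sizes here are made up for illustration.

```python
# Conceptual sketch only (NOT Google's PLE implementation): per-layer
# embedding tables stay in CPU RAM, and only the rows needed for the
# current batch are moved to the accelerator, keeping its memory free.
import torch

NUM_LAYERS, VOCAB, DIM = 4, 1000, 64  # toy sizes, chosen for illustration
ple_tables = [torch.randn(VOCAB, DIM) for _ in range(NUM_LAYERS)]  # kept on CPU

def fetch_ple(layer: int, token_ids: torch.Tensor, device: str) -> torch.Tensor:
    # Look up only the rows this batch needs, then move that small slice
    # to the target device; the full table never occupies GPU memory.
    return ple_tables[layer][token_ids].to(device)

token_ids = torch.tensor([1, 42, 7])
ple_vec = fetch_ple(0, token_ids, "cpu")  # use "cuda" on a GPU machine
print(ple_vec.shape)  # torch.Size([3, 64])
```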
Gemma 3n also introduces KV cache sharing, which considerably speeds up how quickly the model processes long audio and video inputs. Google says it improves the time to first response by two times, making real-time applications such as voice assistants and video analysis far faster and more practical on mobile devices.
For speech-based features, Gemma 3n includes a built-in audio encoder adapted from Google's Universal Speech Model. This allows it to perform tasks such as speech-to-text and language translation directly on the phone. Early tests have shown particularly strong results when translating between English and European languages such as Spanish, French, Italian and Portuguese.
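A hedged sketch of what on-device transcription looks like through the same Transformers setup as above: the "audio" content entry follows the multimodal chat-template convention in recent Transformers releases (the exact key names may differ between versions), and "clip.wav" is a placeholder path.

```python
# Hedged sketch: speech-to-text with Gemma 3n, loaded as in the earlier
# sketch. The {"type": "audio", ...} entry follows the multimodal
# chat-template convention; "clip.wav" is a placeholder audio file.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_ID = "google/gemma-3n-E2B-it"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "audio", "audio": "clip.wav"},  # local recording (placeholder)
        {"type": "text", "text": "Transcribe this audio, then translate it to Spanish."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```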
The visual side of Gemma 3n is powered by MobileNet-V5, Google's new lightweight vision encoder. It can handle video streams at up to 60 frames per second on devices such as the Google Pixel, enabling smooth real-time video analysis. Despite being smaller and faster, it beats the previous vision model on both speed and accuracy.
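Image input goes through the same chat interface; a minimal sketch using the high-level pipeline API is below. The "image-text-to-text" task name matches how recent Transformers releases expose this model family, and the image URL is a placeholder.

```python
# Hedged sketch: single-image description via the high-level pipeline API.
# The image URL is a placeholder; swap in a local file path or real URL.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-E2B-it",
    torch_dtype=torch.bfloat16,
)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/frame.jpg"},  # placeholder
        {"type": "text", "text": "Describe what is happening in this frame."},
    ],
}]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])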
Developers can use Gemma 3n through popular tools such as Hugging Face Transformers, Ollama, MLX, llama.cpp and others. Google has also launched the "Gemma 3n Impact Challenge", inviting developers to build applications that use the model's offline capabilities. Winners will share a $150,000 prize pool.
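For a quick local test outside of Transformers, here is a sketch using the ollama Python client. It assumes the Ollama server is running and the model has been pulled; the tag "gemma3n:e2b" follows Ollama's library naming at the time of writing.

```python
# Hedged sketch: chatting with Gemma 3n locally via the ollama Python client.
# Assumes the Ollama server is running and the model was pulled beforehand,
# e.g. with `ollama pull gemma3n:e2b`.
import ollama

response = ollama.chat(
    model="gemma3n:e2b",
    messages=[{"role": "user", "content": "What does running fully offline mean for privacy?"}],
)
print(response["message"]["content"])
```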
Importantly, the model can operate completely offline, meaning it needs no internet connection to work. That opens the door to AI-powered apps in remote areas and in privacy-sensitive settings where cloud-based models are not viable. With text support for over 140 languages and the ability to understand multimodal content in 35, Gemma 3n sets a new standard for efficient, accessible on-device AI.