Meta creates mini versions of Llama AI models that can fit in your pocket, run on phones and tablets
Meta has collaborated with MediaTek and Qualcomm to run quantized Llama models on Arm CPUs in smartphones and tablets.

In contrast to Apple’s strategy of “not the first, but the best” for bringing Apple Intelligence to the iPhone – a rollout that has been significantly delayed and is expected sometime next week, even then in a limited capacity – Meta has outpaced its rivals. Beating both Apple and Google to the punch, Meta has announced quantized Llama models: it has miniaturized its Llama AI models so they offer “increased speed and reduced memory footprint” and can run on smartphones and tablets. These models will be available on Qualcomm and MediaTek Arm CPUs, including flagship phones from Samsung, Xiaomi, OnePlus, Vivo, Google Pixel and other brands.
These are Meta’s first “lightweight” AI models, built using a technique called quantization, which has been applied to two models: Llama 3.2 1B and 3B. The shrunken models have “the same quality and security requirements” as the larger models while “achieving 2-4x speeds,” says Meta. “We also achieve an average reduction of 56 percent in model size and a 41 percent average reduction in memory usage compared to the original BF16 format,” Meta says.
In essence, these smaller models are instruction-tuned while maintaining the same quality and safety standards as the original 1B and 3B models in the Llama range. Despite that, they run 2 to 4 times faster, are approximately 56 percent smaller, and use 41 percent less memory than the originals, figures Meta measured in testing on the OnePlus 12.
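To put that 56 percent figure in perspective, here is a quick back-of-the-envelope calculation in Python. The parameter count is an assumption (roughly what a “1B”-class model contains); only the 56 percent reduction comes from Meta.

```python
# Back-of-the-envelope check of Meta's stated 56 percent size reduction,
# assuming a hypothetical ~1.2-billion-parameter model stored in BF16.
params = 1_200_000_000                       # assumed parameter count for a "1B" model
bf16_bytes = params * 2                      # BF16 stores each weight in 2 bytes
quantized_bytes = bf16_bytes * (1 - 0.56)    # Meta's reported 56% average reduction

print(f"BF16 size:      {bf16_bytes / 1e9:.2f} GB")   # ~2.40 GB
print(f"Quantized size: {quantized_bytes / 1e9:.2f} GB")  # ~1.06 GB
```

A saving on that order is what makes the difference between a model that strains a phone’s storage and memory and one that fits comfortably alongside everyday apps.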
Meta says it used two methods to shrink the Llama 3.2 1B and 3B models. The first is quantization-aware training with LoRA adapters, which prioritizes keeping the model accurate. The second is SpinQuant, a newer technique that compresses models after training, making them easier to deploy across different devices.
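To make the first of those methods concrete, here is a minimal, illustrative sketch of quantization-aware training paired with a LoRA adapter, written in plain PyTorch. Nothing here is Meta’s actual recipe: the toy layer, the 4-bit fake-quantization scheme, and the random training data are all assumptions chosen to show the core idea, namely that frozen base weights are “fake-quantized” in the forward pass while a small full-precision adapter is trained to recover the accuracy lost to rounding.

```python
import torch
import torch.nn as nn

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Simulate low-bit weights in the forward pass (straight-through estimator)."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit signed
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through trick: forward uses the quantized weights,
    # backward treats the rounding as identity.
    return w + (q - w).detach()

class QATLoRALinear(nn.Module):
    """Linear layer with frozen, fake-quantized base weights plus a small
    full-precision LoRA adapter (B @ A) trained to recover accuracy."""
    def __init__(self, in_f: int, out_f: int, rank: int = 8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02,
                                   requires_grad=False)   # base weights frozen
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))  # zero-init: no effect at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = fake_quantize(self.weight)       # quantized base path
        lora = self.lora_b @ self.lora_a       # full-precision correction
        return x @ (w_q + lora).T

# Tiny regression task on random data, purely to show the training mechanics.
torch.manual_seed(0)
layer = QATLoRALinear(64, 64)
opt = torch.optim.AdamW([layer.lora_a, layer.lora_b], lr=1e-2)
x = torch.randn(256, 64)
target = x @ (torch.randn(64, 64) * 0.02).T   # arbitrary linear target
for step in range(200):
    loss = nn.functional.mse_loss(layer(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.6f}")
```

The straight-through estimator is what lets gradients flow through the non-differentiable rounding step; in a real quantization-aware training pipeline this happens inside every quantized layer of the network, with only the lightweight adapters being updated.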
The performance of the quantized Llama models was also tested on the OnePlus 12, Samsung Galaxy S-series phones, and an unnamed iOS device. According to Meta, data-processing speed improved by 2.5 times, and initial response times improved by an average of 4.2 times.
Meta’s announcement marks a significant shift in bringing advanced AI capabilities directly to smartphones and tablets. By taking advantage of quantization, these models become lighter and more efficient, enabling them to run on the device itself rather than relying on cloud processing. This approach enhances user privacy, reduces latency, and allows seamless, real-time AI experiences without the need for constant Internet connectivity.
The move is particularly impactful for expanding AI’s reach in areas where network infrastructure may be less robust, making it possible for more users to experience AI-powered features seamlessly. It also opens the door for developers to integrate these models into various applications, potentially expanding the AI ecosystem on mobile platforms. Meta’s focus on MediaTek and Qualcomm chips – which are common in many Android devices – signals a strategic push to improve performance and user experience across a wide range of smartphones, further democratizing AI.
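As a hypothetical illustration of the developer side of this, the sketch below loads a small Llama-family checkpoint with the Hugging Face transformers library and generates a reply locally. The model ID is an assumption for illustration (the repository is gated and requires access approval), and this desktop-style path is not the same as Meta’s mobile Arm deployment, but it shows roughly what integrating such a model into an application looks like.

```python
# Hypothetical sketch: running a small Llama-family model locally with the
# Hugging Face transformers library. The model ID below is an assumption for
# illustration; the on-device path Meta targets (Arm CPUs on phones) uses a
# different, mobile-specific runtime.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"   # gated repo; requires approval
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain model quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```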