Meta shrinks Llama models for faster on-device AI

Community-Team · October 28, 2024, 9:24pm

Meta released quantized versions of its Llama 3.2 1B and 3B language models, optimized for mobile devices. Compared with the original, non-quantized versions, the new models run two to four times faster, are 56 percent smaller, and use 41 percent less memory, while Meta says they maintain high quality and safety. These mobile versions of Llama 3.2 let developers build AI experiences that run entirely on-device, giving users faster responses and better privacy. (Meta)
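The post does not say how the weights are quantized. As a rough illustration of why quantization shrinks a model, here is a minimal sketch of generic symmetric group-wise weight quantization. The 4-bit width, the group size of 32, the 16-bit per-group scale, and the function names are all assumptions chosen for illustration, not Meta's published recipe.

```python
"""Illustrative sketch: symmetric group-wise weight quantization.

Assumptions (not from Meta's post): 4-bit signed integers, groups of
32 weights, one 16-bit float scale stored per group.
"""


def quantize_groupwise(weights, group_size=32, bits=4):
    """Map floats to signed ints; each group of weights shares one scale."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Scale so the largest-magnitude weight in the group maps to +/-qmax.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the representable range.
        q.extend(max(qmin, min(qmax, round(w / scale))) for w in group)
    return q, scales


def dequantize_groupwise(q, scales, group_size=32):
    """Recover approximate floats by multiplying each int by its group's scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


def bits_per_weight(bits=4, group_size=32, scale_bits=16):
    """Average storage cost per weight, including the shared per-group scale."""
    return bits + scale_bits / group_size


# Usage on a toy tensor, with groups of 4 so there are two groups.
weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.25, -0.88]
q, scales = quantize_groupwise(weights, group_size=4)
restored = dequantize_groupwise(q, scales, group_size=4)
print(q)
print([round(r, 3) for r in restored])
bpw = bits_per_weight()
print(f"{bpw} bits/weight vs 16: {1 - bpw / 16:.0%} smaller")
```

With these assumed settings each weight costs 4 + 16/32 = 4.5 bits instead of 16, a per-tensor saving of about 72 percent. That figure is not directly comparable to Meta's whole-model 56 percent reduction, which depends on which tensors are quantized and at what precision.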
