Meta’s Llama 3.2 goes multimodal


Meta released Llama 3.2, a family of vision-capable large language models and lightweight text models for edge devices. The new lineup includes 11 billion and 90 billion parameter multimodal models that can reason about images, outperforming Claude 3 Haiku and GPT-4o mini on visual understanding tasks. Meta also launched 1 billion and 3 billion parameter models optimized for on-device use with 128K-token context lengths; the 3B model outperforms Gemma 2 2.6B and Phi 3.5-mini on tasks such as instruction following and summarization. The company also introduced Llama Stack distributions to simplify deployment across environments and updated its safety tools, including Llama Guard 3 for vision tasks. (Meta AI)
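For readers who want to try the vision models, below is a minimal sketch of asking the 11B model a question about an image. It assumes the checkpoint is available on Hugging Face under an ID like meta-llama/Llama-3.2-11B-Vision-Instruct and that your installed transformers version includes the Mllama classes; adjust the model ID and prompt format to match the official release documentation.

```python
# Minimal sketch: querying Llama 3.2 11B Vision about a local image.
# Assumes a recent transformers release with Mllama support and access to the
# (assumed) Hugging Face checkpoint "meta-llama/Llama-3.2-11B-Vision-Instruct".
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model ID

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chart.png")  # any local image file

# Build a chat-style prompt that interleaves the image with a question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What trend does this chart show?"},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

The smaller 1B and 3B text models follow the usual text-only chat workflow, so the image and processor steps above would simply drop out.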