Nvidia has unveiled its new Llama-3.1 Nemotron Ultra 253B, a large language model (LLM) designed to outperform competitors like Meta’s Llama 4 and DeepSeek R1. The model, derived from Meta’s Llama-3.1-405B-Instruct, is optimized for advanced reasoning, instruction following, and AI assistant workflows.
At 253 billion parameters, it is compact enough to run on a single 8x H100 GPU node, making it more efficient to deploy than many models in its performance class. It can also toggle between “reasoning on” and “reasoning off” modes, so users can pay for deliberate step-by-step reasoning on complex tasks and skip it for simple ones.
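Nvidia’s Nemotron model cards describe the toggle as a system-prompt switch (“detailed thinking on” / “detailed thinking off”). As a rough sketch of how an application might wire this up, the helper below (`build_messages` is a hypothetical name, not part of any Nvidia API) builds a chat message list with the toggle in the system turn; consult the official model card for the exact prompt wording before relying on it.

```python
# Sketch of the "reasoning on/off" toggle, assuming the system-prompt
# mechanism described in Nvidia's Nemotron model cards. The exact phrases
# ("detailed thinking on/off") should be verified against the model card.

def build_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    """Build a chat message list with the reasoning toggle in the system turn."""
    system_prompt = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Complex task: enable the step-by-step reasoning trace.
msgs_on = build_messages("Prove that the sum of two odd integers is even.", reasoning=True)

# Simple task: disable reasoning for lower latency and cost.
msgs_off = build_messages("What is the capital of France?", reasoning=False)
```

The resulting message list can then be passed to any chat-completion interface that accepts OpenAI-style role/content messages.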
The model’s architecture was derived through a Neural Architecture Search (NAS) process aimed at efficient inference, introducing structural changes such as skipped attention layers and fused feedforward networks that reduce memory use while maintaining output quality. It runs on Nvidia’s Hopper GPUs as well as the Blackwell-based B100, making it cost-effective for data center deployment.
In post-training, Nvidia refined the model with supervised fine-tuning and reinforcement learning, improving performance in math, coding, chat, and tool use. The model posted substantial gains on benchmarks such as MATH500, AIME25, and LiveCodeBench, with reasoning mode significantly boosting results.
Compared to DeepSeek R1, Llama-3.1 Nemotron Ultra came out ahead on general reasoning and instruction-following tasks, though DeepSeek R1 still excels on the most math-heavy benchmarks.
Llama-3.1 Nemotron Ultra supports multilingual applications and is compatible with the Hugging Face Transformers library. It’s suitable for a variety of use cases, including chatbot development, AI workflows, and code generation.
Released under the Nvidia Open Model License, it is available for commercial use, with open weights and post-training data. Nvidia emphasizes responsible AI development and encourages users to assess the model’s alignment and safety for their specific applications.