Chinese AI firm DeepSeek has released DeepSeek V3, a cutting-edge open AI model that is drawing wide attention in the tech industry. Released under a permissive license, the model can be downloaded and modified by developers for a wide range of applications, including commercial ones. DeepSeek V3 handles a variety of text-based tasks, such as coding, translating, and generating written content from prompts.
DeepSeek claims the model outperforms both openly available and proprietary AI systems. According to the company's own tests, which include coding competitions and other benchmarks, it beat notable models such as Meta’s Llama 3.1 and OpenAI’s GPT-4o.
The model was trained on a dataset of 14.8 trillion tokens and has 671 billion parameters. Despite that scale, DeepSeek says it was trained for a modest budget of about $5.5 million on Nvidia H800 GPUs, a sign of the company’s efficiency in model training.
One drawback of DeepSeek V3’s sheer size is that running it at reasonable speeds requires a bank of high-end GPUs. Its achievements are nonetheless remarkable: DeepSeek says it trained the model in roughly two months despite hardware restrictions imposed by U.S. export regulations.
Its technical capabilities come with political limits, however. The model avoids topics considered sensitive by Chinese authorities, in line with Chinese regulations requiring AI models to embody “core socialist values.”
DeepSeek is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions. The firm has invested heavily in AI infrastructure, and DeepSeek aims to build “superintelligent” AI. DeepSeek’s progress signals a shift in the AI landscape, with open models like V3 posing a significant challenge to established proprietary systems.