In our previous note on the top 10 developments of the holiday season, we committed a near-cardinal error. Our most important announcement concerned Chinese models and their cost efficiencies; while hailing DeepSeek V3 as the most significant innovation, we inadvertently linked it to another notable development, Alibaba's Qwen model and its reduced inference costs. The phrasing was akin to saying "Google's ChatGPT," a mix-up that raised eyebrows. Correcting the mistake, however, led us down a fascinating rabbit hole, uncovering the truly revolutionary changes brought by DeepSeek V3. The note that follows should (hopefully!) be understandable without much technical background, because it is also important for the investment community to see where things could be headed in the coming quarters as a result of some of the changes successfully implemented in this model.
A Revolution Praised by All—Even in the West
DeepSeek V3 appears genuinely revolutionary, not just on the strength of its claims but because of its remarkable reception among the Valley's most respected minds. Unlike the typical skepticism surrounding Chinese innovations, DeepSeek V3 has drawn almost universal acclaim. Experts like Andrej Karpathy and Nvidia's Jim Fan have lauded it as a "dark horse" in the AI race. This near-unanimous praise defies historical patterns, where discussions about Chinese tech are often as polarized as Fox News debating liberal proposals or CNN evaluating Trump-era policies. DeepSeek's achievements, however, seem to have bridged this divide for multiple reasons.
Small Theoretical Changes Can Lead to Massive Innovations
Before we discuss the potential outcomes of DeepSeek V3, let's examine how seemingly minor theoretical changes can lead to significant shifts in the landscape of model development. This phenomenon has occurred multiple times, with the clearest, most impactful, and most comprehensible example being the "Chain-of-Thought" approach introduced in 2022. The idea was elegantly straightforward: rather than endlessly expanding models to enhance quality, why not train them to think step by step, as humans do? This approach greatly improved performance on math problems, programming, and scientific research by promoting logical reasoning and encouraging models to break tasks into smaller steps.
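The step-by-step idea is easiest to see in a pair of prompts. Both prompts below are invented for illustration (they are not from any model's actual training data); the only difference is that the second one spells out the intermediate reasoning rather than jumping to the answer:

```python
# Illustrative only: a direct prompt vs. a chain-of-thought prompt.
# The shop problem and its worked answer are made-up examples.

direct_prompt = (
    "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A: $8"
)

cot_prompt = (
    "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A: Let's think step by step. 12 pens is 12 / 3 = 4 groups of 3 pens. "
    "Each group costs $2, so the total is 4 * 2 = $8."
)

print(cot_prompt)
```

Training or prompting models on answers of the second form, rather than the first, is what produced the large gains on math and logic benchmarks described above.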
Initially theoretical, this concept has become foundational in the development of reasoning models, which are seen as a crucial step toward achieving Artificial General Intelligence (AGI). DeepSeek V3’s innovations build upon similar advancements—making AI larger, smarter, and more efficient.
For those uninterested in details, DeepSeek's four potentially groundbreaking implementations are:
1. Efficient training methods that reduce training costs by over 90% compared to similar models.
2. Improved strategies for addressing queries, allowing it to solve simple problems with fewer resources.
3. Claims of enabling models to learn and update continuously.
4. Employing non-model methodologies to tackle problems that do not require neural networks.
For instance, consider how many "r"s are in the word "Strawberry," a query that continues to challenge many models. The enhancements made by DeepSeek are equivalent to:
a. Creating a model that costs barely USD 6 million to train, yet can address this issue.
b. Using fewer resources for such questions, freeing up more for inquiries about quantum-physics riddles.
c. Enabling the model to quickly incorporate corrections when it is shown a mistake, rather than waiting for the next upgrade.
d. Questioning whether neural networks are needed for such simple queries at all, and delegating those problems to programs that handle them more effectively.
Now, let's discuss each of these implementations in a little more detail:
Cutting Training Costs by 95%
One of DeepSeek V3’s most remarkable achievements is its ability to train a massive model for less than $6 million, a feat that would cost over $100 million using traditional methods. How does it achieve this?
- Sparse Mixture-of-Experts (MoE) Architecture: Unlike conventional models that activate all their parameters for every task, DeepSeek V3 activates only a small subset of its 671 billion parameters (37 billion per token). This reduces computational overhead while maintaining performance.
- Optimized Training: DeepSeek has streamlined its training process, focusing on high-quality data and efficient algorithms. This allows the model to perform comparably to larger models with significantly fewer computational resources and less training time.
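The sparse-routing idea behind MoE can be sketched in a few lines. This is a toy illustration, not DeepSeek's actual architecture: the expert count, the "expert" functions, and the scoring formula are all made up. The point is simply that a router picks the top-k scoring experts per input, so most parameters stay idle:

```python
# Toy sketch of sparse Mixture-of-Experts routing (illustrative only).
NUM_EXPERTS = 8   # a real model like DeepSeek V3 has far more
TOP_K = 2         # experts activated per input

def expert(i, x):
    # Stand-in for a full feed-forward network: each expert applies
    # its own (here, fake) transformation to the input.
    return (i + 1) * x

def gate(x):
    # Stand-in for a learned router: scores every expert for input x
    # and keeps only the TOP_K highest-scoring ones.
    scores = [(i * 2654435761 + int(x * 1000)) % 97 for i in range(NUM_EXPERTS)]
    ranked = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)
    return ranked[:TOP_K]

def moe_forward(x):
    # Only TOP_K of NUM_EXPERTS do any work: this is the source of
    # the compute savings described above.
    active = gate(x)
    return sum(expert(i, x) for i in active) / TOP_K, active

output, active_experts = moe_forward(0.5)
print(f"activated {len(active_experts)}/{NUM_EXPERTS} experts -> {output}")
```

Even in this toy version, 6 of the 8 experts never run for a given input; at DeepSeek's scale, the same principle means roughly 37 of 671 billion parameters are exercised per token.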
Dynamic Memory Allocation
DeepSeek V3 introduces dynamic memory allocation, a system where the model adjusts its computational resources based on task complexity:
- For simple tasks, fewer parameters are activated, saving compute power.
- For complex tasks, additional resources are dynamically allocated, ensuring high-quality outputs.
This approach contrasts with traditional models, which uniformly use the same resources regardless of task difficulty. Dynamic allocation improves efficiency and paves the way for adaptive systems capable of scaling intelligently in real-time. This approach saves energy and makes AI development more accessible to organizations with limited resources. It’s like having a car that automatically adjusts its engine power based on whether you’re cruising on a highway or climbing a steep hill.
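The "engine power" analogy can be made concrete with a small sketch. Everything here is hypothetical: the complexity heuristic, the keyword list, and the expert budgets are invented for illustration, and a real system would use a learned router rather than hand-written rules. The shape of the idea, though, is the same: a cheap estimate of difficulty decides how much compute a query receives:

```python
# Hypothetical sketch of complexity-based compute allocation
# (not DeepSeek's actual mechanism).

def estimate_complexity(query: str) -> float:
    # Invented heuristic: longer, math/science-flavoured queries score
    # higher, capped at 1.0. A real router would be learned.
    keywords = ("quantum", "prove", "derive", "optimize")
    score = min(len(query) / 200, 1.0)
    score += 0.5 * sum(k in query.lower() for k in keywords)
    return min(score, 1.0)

def allocate_experts(query: str, min_experts=2, max_experts=16) -> int:
    # Scale the compute budget between a floor and a ceiling.
    c = estimate_complexity(query)
    return min_experts + round(c * (max_experts - min_experts))

easy = allocate_experts('How many "r"s are in "strawberry"?')
hard = allocate_experts(
    "Derive the ground-state energy of a quantum harmonic "
    "oscillator and optimize the variational ansatz."
)
print(easy, hard)  # the strawberry query gets far fewer experts
```

The design choice mirrors the car analogy: the simple letter-counting query gets a small budget, while the physics derivation gets the maximum, so capacity is spent where it matters.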
Continuous Learning: A Model That Evolves
Unlike traditional LLMs, which require retraining to incorporate new knowledge, DeepSeek V3 embraces continuous learning:
- Self-Improving Loops: The model refines its training data and objectives based on user interactions, learning from feedback without requiring full-scale retraining.
- Real-Time Adaptation: It can adjust its behavior during conversations, retaining corrections and avoiding repeated errors.
While no model can entirely eliminate static limitations, DeepSeek's continuous learning approach reduces reliance on costly retraining cycles, helping it stay current and relevant over time. In the Strawberry problem, if the model makes a mistake and is corrected, it is unlikely to repeat the wrong answer, not only within the same session (a problem most LLMs had even a few months ago) but across interactions (persistent memory) and, after a while, even for other users asking the same question (continuous training). If true even to a degree, this is a hugely significant development.
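The correction-retention idea can be sketched as follows. This is a minimal illustration of the concept, assuming a persistent correction store consulted before the model answers; it is not DeepSeek's implementation, and the `model_guess` stub (complete with the famous Strawberry mistake) is invented:

```python
# Minimal sketch of correction retention (assumed design, not DeepSeek's code).

class CorrectionMemory:
    """Persistent store of user corrections, surviving across sessions."""

    def __init__(self):
        self._corrections = {}  # normalized question -> corrected answer

    def record(self, question: str, corrected_answer: str):
        # Called when a user corrects the model.
        self._corrections[question.lower().strip()] = corrected_answer

    def lookup(self, question: str):
        return self._corrections.get(question.lower().strip())

def model_guess(question: str) -> str:
    # Stand-in for the base model, complete with the famous mistake.
    return "2" if "strawberry" in question.lower() else "I don't know"

def answer(question: str, memory: CorrectionMemory) -> str:
    remembered = memory.lookup(question)
    if remembered is not None:
        return remembered          # a recorded correction wins
    return model_guess(question)   # otherwise, fall back to the model

memory = CorrectionMemory()
q = 'How many "r"s are in "strawberry"?'
before = answer(q, memory)   # wrong answer from the base model
memory.record(q, "3")        # the user corrects it once
after = answer(q, memory)    # the correction now persists
print(before, "->", after)
```

If such a store is shared across users and eventually folded back into training, one user's correction benefits everyone, which is the "continuous training" leap described above.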
Symbolic AI: Seeking Solutions Beyond Neural Networks
We wrote a note a few weeks ago on why one should not need an LLM to fix a nail. It is evident to everyone that for questions like "what is 2 + 2" or how many "r"s there are in the word "Strawberries," we already have far better methods. DeepSeek V3 combines neural networks with symbolic reasoning, enabling it to:
- Perform precise calculations and logical tasks by "calling out" to external symbolic engines (e.g., solving equations or executing code).
- Integrate these results back into its neural workflows for a seamless user experience.
This hybrid approach bridges the gap between rule-based systems and modern AI, enhancing performance on tasks requiring flexibility and precision. While not entirely new, DeepSeek’s implementation represents a significant leap in efficiency and scalability.
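A toy dispatcher makes the hybrid idea concrete. The routing rules below are invented for illustration (a real system would detect such queries far more robustly), but they show the essential move: queries with an exact, rule-based answer are handled by ordinary deterministic code, and only the remainder goes to the neural model:

```python
import re

# Toy sketch of neural/symbolic routing (assumed design, not DeepSeek's code).

def count_letter(word: str, letter: str) -> int:
    # Deterministic and always correct: no neural network needed.
    return word.lower().count(letter.lower())

def route(query: str) -> str:
    # Letter-counting questions like: How many "r"s ... "strawberry"
    m = re.search(r'how many "?(\w)"?s .*"(\w+)"', query, re.IGNORECASE)
    if m:
        return str(count_letter(m.group(2), m.group(1)))
    # Simple arithmetic like: What is 2 + 2?
    arith = re.fullmatch(r"what is (\d+) \+ (\d+)\??", query, re.IGNORECASE)
    if arith:
        return str(int(arith.group(1)) + int(arith.group(2)))
    return "[send to neural model]"  # everything else goes to the LLM

print(route('How many "r"s are there in the word "strawberry"'))  # 3
print(route("What is 2 + 2?"))                                    # 4
```

The symbolic path answers instantly, costs effectively nothing, and never gets the Strawberry question wrong, while the expensive neural machinery is reserved for queries that actually need it.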
Beyond Big Tech: A New Breed of Innovators
DeepSeek V3's emergence from a quantitative hedge fund rather than a traditional tech giant underscores the broadening landscape of AI innovation. It highlights the potential for diverse organizations and research groups to contribute significantly to the advancement of AI. This development serves as a reminder that groundbreaking ideas can come from unexpected sources, challenging the conventional notion that AI progress is solely driven by large tech companies or can only come from certain types of teams.
The Future of AI: Efficiency, Adaptability, and Beyond
DeepSeek is likely not the pioneer of any of the concepts above, but it is undoubtedly a league ahead in implementation. In our era of "instant copiability" and weak patent protection, the others will likely learn and catch up in no time. Still, its innovations offer a glimpse into the future of AI, where efficiency, adaptability, and continuous learning will be paramount. The most significant changes in AI do not always come from new versions like GPT-5 or Gemini 2 but from innovations in methodology and architecture. While new versions and larger models will undoubtedly emerge, a lot is going on all the time to make AI more accessible, sustainable, and impactful across a wide range of applications.