DeepSeek’s Efficiency Leap: Could Lightning Strike Twice?
Nilesh Jasani · February 19, 2025

Last Christmas, DeepSeek made a bombshell announcement that shook the AI world, albeit with a four-week delay. For a while, a handful of us kept pondering the implications with our mouths agape, while markets stayed indifferent, until suddenly everyone became a DeepSeek expert. In the last 24 hours, DeepSeek has released another paper, and once again, no one in financial markets is looking. We must be careful not to overstate the implications just because the previous episode played out a particular way. But the paper is certainly more important than the buzz around Grok 3’s release from xAI. Even if the latest announcements are not as earth-shattering as last time, they deserve more attention, at the least for what they imply about the rest of the year, and likely more for the short-term impact as well.

For those interested in conclusions without the technical details: the new paper claims an up-to-11x improvement in the speed at which its newer methods can process long inputs. Overstatements and exaggerations in that headline figure aside (details below), if the last announcement highlighted potential improvements in training models, this paper shows that the computation required for usage, aka inference, can also be improved meaningfully, and the search is on.

Our key argument, upfront for the time-strapped: pay close attention to AI efficiency breakthroughs, especially those emerging from China and beyond the US. These developments are snowballing and are more critical for global tech investments than splashy hardware reveals or the next big model announcement from OpenAI, Anthropic (both likely soon), or xAI (yesterday). This is not just about DeepSeek or its overnight announcement. Coupled with Kimi.ai's parallel efficiency reveal within hours of the DeepSeek paper, a clear trend emerges: the efficiency hunt is accelerating fast. Prepare for a year packed with disruptive efficiency-focused announcements.

Sparse Attention: Less Work, Same Smarts

The rest of the document may appear more technical, but we feel serious investors need to spend time understanding the nuances, as this knowledge will be needed repeatedly in the coming quarters. In our effort to make the message more straightforward, we may have missed subtleties or oversimplified, for which we apologize in advance.

DeepSeek’s new trick, Native Sparse Attention (NSA), is about doing more with less. When AI models read data, whether a long document or a video, they usually chew through every word or frame, a process called full attention. These standard attention mechanisms, crucial for both training and inference, require every token to interact with every other, which becomes computationally expensive and enormously power-hungry as sequence lengths grow. NSA flips that: it skims the fluff and zooms in on what counts, slashing the effort needed.
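To make the contrast concrete, here is a minimal, illustrative sketch in Python/NumPy. It is not DeepSeek's actual NSA kernel (the paper combines compressed, selected, and sliding-window attention in hardware-aligned GPU code); it only shows the core idea of replacing full pairwise attention with attention over a handful of "important" key blocks. The function names and parameters below are our own illustrative choices.

```python
import numpy as np

def full_attention(q, k, v):
    # Full attention: every query scores every key, O(n^2) work in sequence length.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def block_sparse_attention(q, k, v, block=64, top_blocks=4):
    # Toy sparse variant (not NSA itself): each query first scores a coarse
    # summary of every key block, then attends only within the best few blocks,
    # so per-query work scales with top_blocks * block instead of the full length.
    n, d = k.shape
    n_blocks = n // block
    block_keys = k[: n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
    out = np.zeros_like(q)
    for i, qi in enumerate(q):
        coarse = qi @ block_keys.T                       # one score per block
        keep = np.argsort(coarse)[-top_blocks:]          # keep the top-scoring blocks
        idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep])
        scores = qi @ k[idx].T / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ v[idx]
    return out

# Toy usage: for a 1,024-token sequence, each query touches only
# top_blocks * block = 256 keys instead of all 1,024.
rng = np.random.default_rng(0)
q = rng.standard_normal((1024, 64))
k = rng.standard_normal((1024, 64))
v = rng.standard_normal((1024, 64))
sparse_out = block_sparse_attention(q, k, v)
```

The savings in this toy version grow with sequence length, since the sparse path's cost per query stays roughly fixed while full attention's cost keeps rising; the real NSA adds learned compression and GPU-friendly block layouts on top of this basic idea.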

The paper claims this speeds up processing of a hefty 64,000-token chunk (think a novella-sized input) by up to 11 times compared with old-school full attention. The headline figure comes from DeepSeek's own paper (https://arxiv.org/pdf/2502.11089). It is likely an overstatement, since model makers already use efficiency-boosting methods in their attention mechanisms. But even against modern tricks like FlashAttention-2, DeepSeek's new methods could be 3-4 times faster, which is still a solid win for long tasks. For short stuff, like a quick question summarising this paper, the boost will likely be smaller, perhaps 10-20%, because there is less to skim.

Why does this matter? Long contexts are exploding. Videos, deep research, and chain-of-thought reasoning models (more on this below) demand far more tokens than a simple chat. NSA could make these jobs quicker and cheaper, especially in areas where compute costs are piling up fastest.
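A rough back-of-envelope calculation (ours, not from the paper) shows why long contexts hurt so much: full attention's work grows with the square of the input length.

```python
# Illustrative arithmetic only: full attention scores every token pair,
# so per-layer work grows quadratically with context length.
for tokens in (4_000, 64_000):
    pairs = tokens * tokens
    print(f"{tokens:,} tokens -> {pairs:,} pairwise scores")
# 4,000 tokens  ->        16,000,000 pairwise scores
# 64,000 tokens ->     4,096,000,000 pairwise scores (256x the work for 16x the length)
```

Anything that prunes most of those pairwise scores, as sparse attention aims to, attacks the part of the compute bill that grows fastest.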

Reasoning and Reality: Why Long Context Matters

You might wonder, "Do we really need AI to read 64,000 tokens at once?" The answer is no for many everyday tasks, such as quick questions or document summaries. While frequent, these "short context" tasks aren't the biggest computational drain individually. However, the compute required changes dramatically, not just for videos and images but also with reasoning models, one of AI's hottest trends since the middle of last year.

AI isn’t just parroting answers anymore; it’s thinking harder. Over the past six months, big players have shifted to reasoning models that dig deep, like OpenAI’s o1 or DeepSeek’s R1. These beasts don’t just skim; they ponder, chaining thoughts to crack challenging problems. Sam Altman has even hinted that GPT-4.5 may be OpenAI’s last non-reasoning model. That’s a sign of where things are headed.

By their nature, these reasoning models, particularly those conducting “Deep Research,” the latest catchphrase, process vast amounts of information internally to generate insightful outputs, which dramatically increases the compute needed for each interaction. Efficiency breakthroughs like DeepSeek's NSA become crucial for making these powerful reasoning models practical and affordable for widespread use. And NSA isn't just about faster responses: DeepSeek also states that the efficiency gains extend to the pre-training phase, the most computationally expensive part of developing AI models.

Old and Older GPUs, New Tricks

DeepSeek’s boldest claim in the paper is that their latest R1 model could be trained not just on NVIDIA’s H100 GPUs, as was announced before, but on even older ones. The details in the paper are sketchy, but with this new method, it might be possible to use hardware released years ago to develop today’s cutting-edge models. 

While rigorous verification is crucial, the implications are enormous. This could mean smaller teams, or firms anywhere with tighter wallets, can build serious models without bleeding cash on GPU farms, which will only accelerate the efficiency search. In fact, yesterday DeepSeek was not alone. Within hours, Chinese AI company Kimi.ai released its own paper, also focused on sparse attention and a "mixture of blocks" method, with results that appear competitive with, and in some cases may exceed, DeepSeek's claims in its latest paper.

The Bigger Picture: Efficiency is the New Frontier

It’s easy to dismiss individual AI research papers: hundreds are published every week, and a vast majority claim material improvements. Given our recent experience, we should pay more attention to one coming from DeepSeek, while staying alert to the risk of overstating its importance.

However, this is not about a single paper. The global focus has turned to making models efficient. It is happening in China and in open-source research communities, and announcements could suddenly emerge from other countries as well. If xAI could produce cutting-edge models, as shown yesterday, from a standing start in less than two years, the journey for others starting now may not be too long, notwithstanding xAI's remarkable funding abilities.

For investors, policymakers, and anyone following AI, the lesson is clear: hardware advancements and flashy model launches are no longer the sole indicators of progress. The real game-changer is how efficiently these models can be trained and run. DeepSeek’s NSA might not be the defining breakthrough, but it is a clear signpost: the AI efficiency race is here, and it is accelerating.
