What if businesses don’t have to invest nearly as much as they think in creating AI models?
Given the increased focus on DeepSeek, a Chinese AI app that has become the most downloaded app in the U.S. App Store, investors are asking themselves that question on Monday. According to reports, the company was able to create a model that performs comparably to OpenAI’s ChatGPT without spending nearly as much money.
Wall Street is concerned about the implications of DeepSeek’s success for firms such as Nvidia Corp. (NVDA), Broadcom Inc. (AVGO), Marvell Technology Inc. (MRVL), and others whose stocks have surged on the belief that their companies would reap the benefits of ambitious, AI-driven capital-expenditure plans in the years to come.
In Monday’s premarket trade, those stocks had all dropped more than 10%, and Nasdaq futures were down 3.9%.
In a note to clients over the weekend, Srini Pajjuri, an analyst at Raymond James, stated, “If DeepSeek’s innovations are adopted broadly, an argument can be made that model training costs could come down significantly even at U.S. hyperscalers, potentially raising questions about the need for 1-million XPU/GPU clusters as projected by some.”
In a piece titled “The Short Case for Nvidia Stock,” Jeffrey Emanuel, a Web3 entrepreneur and former quant investor, claimed that DeepSeek’s success “suggests the entire industry has been massively over-provisioning compute resources.”
He stated that “markets eventually find a way around artificial bottlenecks that generate super-normal profits,” which means Nvidia could face “a much rockier path to maintaining its current growth trajectory and margins than its valuation implies.”
The figures alarming Wall Street are also worth examining. In particular, much of the uproar centers on a report claiming that DeepSeek’s developer spent just $5.6 million to develop the model. Large U.S. technology companies, by contrast, are spending tens of billions of dollars annually on capital projects, with a significant portion of that money going toward AI infrastructure.
Bernstein analyst Stacy Rasgon called the $5 million figure highly misleading. Did DeepSeek really “build OpenAI for $5M?” “Of course not,” he wrote in a note to clients over the weekend.
That number corresponds to DeepSeek-V3, a “mixture-of-experts” model that, Rasgon wrote, “through a number of optimizations and clever techniques can provide similar or better performance vs other large foundational models but requires a small fraction of the compute resources to train.”
However, the $5 million amount “does not include all the other costs associated with prior research and experiments on architectures, algorithms, or data,” he added. Additionally, this kind of model is made “to significantly reduce cost to train and run, given that only a portion of the parameter set is active at any one time.”
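The cost advantage Rasgon describes comes from that last point: in a mixture-of-experts model, a router picks a few expert sub-networks per token, so most parameters sit idle on any given step. The toy sketch below illustrates the idea; all sizes and names are invented for demonstration and are not DeepSeek-V3’s actual architecture.

```python
import numpy as np

# Toy mixture-of-experts (MoE) layer: all expert parameters exist, but only
# a top-k subset is activated for each token. Sizes are illustrative only.
rng = np.random.default_rng(0)

N_EXPERTS = 8   # total expert sub-networks in the layer
TOP_K = 2       # experts actually activated per token
D_MODEL = 16    # hidden dimension of the toy model

# Each "expert" is a single weight matrix in this simplified example.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS))  # scores experts per token

def moe_layer(x):
    """Route one token vector x through only its top-k experts."""
    scores = x @ router                   # one routing score per expert
    top_k = np.argsort(scores)[-TOP_K:]   # indices of the best-scoring experts
    # Softmax over the selected scores gives the mixing weights.
    w = np.exp(scores[top_k] - scores[top_k].max())
    w /= w.sum()
    # Only TOP_K of N_EXPERTS weight matrices are touched for this token.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top_k))

token = rng.standard_normal(D_MODEL)
out = moe_layer(token)
print(out.shape)  # (16,)
print(f"active experts per token: {TOP_K} of {N_EXPERTS}")
```

Here only 2 of 8 expert matrices are multiplied per token, which is why compute to train and run such a model scales with the active fraction rather than the total parameter count.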
Rasgon claims that DeepSeek’s R1 model “seems to be causing most of the angst” because of its resemblance to OpenAI’s o1 model. “DeepSeek’s R1 paper did not quantify the additional resources that were required to develop the R1 model (presumably they were substantial as well),” noted Rasgon.
Having said that, he believes it is “absolutely true that DeepSeek’s pricing blows away anything from the competition, with the company pricing their models anywhere from 20-40x cheaper than equivalent models from OpenAI.”
He does not, however, believe that the semiconductor industry is in a “doomsday” scenario, saying, “We are still going to need, and get, a lot of chips.”
There was also a bright side for C.J. Muse, an analyst at Cantor Fitzgerald. “Innovation is driving down cost of adoption and making AI ubiquitous,” he stated. “We see this progress as positive in the need for more and more compute over time (not less).”
Pajjuri of Raymond James makes a similar point. “A more logical implication is that DeepSeek will drive even more urgency among U.S. hyperscalers to leverage their key advantage (access to GPUs) to distance themselves from cheaper alternatives,” he stated.
Furthermore, he believes investors should consider inferencing, even though the DeepSeek debate is primarily focused on training costs. Training is the process of exposing a model to data so that it learns to draw conclusions; inferencing is the process of running the trained model on new data to produce results.
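The training-versus-inference distinction can be made concrete with a toy example: fitting a line to data is the (one-time, expensive) training step, and evaluating the fitted line on fresh inputs is the (repeated, cheap) inference step. Everything below is an illustrative sketch, not specific to DeepSeek or OpenAI.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Training: expose the model to data so it learns a relationship. ---
x_train = rng.uniform(0, 10, size=100)
y_train = 3.0 * x_train + 2.0 + rng.normal(0, 0.1, size=100)  # noisy y = 3x + 2

# Fit slope and intercept by least squares: these are the learned parameters.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# --- Inference: apply the trained model to data it has never seen. ---
x_new = np.array([4.0, 7.5])
y_pred = slope * x_new + intercept

print(f"learned parameters: slope≈{slope:.2f}, intercept≈{intercept:.2f}")
print(f"predictions for {x_new}: {y_pred}")
```

Training happens once per model; inference happens every time a user sends a query, which is why Pajjuri argues that cheaper training could expand total inference demand.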
According to Pajjuri, “as training costs decline, more AI use cases could emerge, driving significant growth in inferencing,” especially for models such as the R1 from DeepSeek and the o1 from OpenAI.
However, according to Emanuel, DeepSeek is “nearly 50x more compute efficient” than well-known American models for training, and possibly even more so for inference.