As social media platforms and stock markets fuel the popularity of the new AI company DeepSeek, a report by Bernstein states that DeepSeek's models look fantastic, but are not a miracle and were not built for USD 5 million.
The report addressed the discussion around DeepSeek's models, especially the claim that the company created something comparable to OpenAI's for only USD 5 million. According to the report, this claim is misleading and does not reflect the full picture.
The report said, "We believe that DeepSeek did not 'build OpenAI for USD 5M'; the models look fantastic, but we don't think they are miracles; and the resulting nervousness on Twitter over the weekend seems overblown."
Bernstein's report states that DeepSeek has developed two main families of AI models: 'DeepSeek-V3' and 'DeepSeek R1'. The V3 model is a large language model that uses a mixture-of-experts (MoE) architecture.
This approach combines several smaller expert models that work together, delivering high performance while using fewer computing resources than other large models. The V3 model has a total of 671 billion parameters, with 37 billion active at any given time.
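To illustrate the idea, here is a minimal sketch of a mixture-of-experts layer in PyTorch, in which a router sends each token to only a few experts so most parameters stay inactive; the layer sizes and expert count are illustrative and are not DeepSeek-V3's actual configuration.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal mixture-of-experts layer with sparse top-k routing (illustrative sizes)."""
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                                # x: (num_tokens, d_model)
        scores = self.router(x)                          # router decides which experts see each token
        weights, expert_idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # only the top-k experts run per token,
            for e, expert in enumerate(self.experts):    # so most parameters stay inactive
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)            # 16 tokens with 64-dimensional features
print(TinyMoE()(tokens).shape)          # torch.Size([16, 64])
```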
The V3 model also incorporates newer techniques such as multi-head latent attention (MHLA), which reduces memory use, and FP8 mixed-precision computation, which improves efficiency.
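As a rough conceptual sketch of where the memory saving in latent attention comes from (not DeepSeek's actual implementation, and with illustrative dimensions): a small per-token latent vector is cached instead of the full keys and values, and is expanded back only when attention is computed.

```python
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 64, 16, 4, 16

down_kv = nn.Linear(d_model, d_latent)        # compress each token into a small latent for the cache
up_k = nn.Linear(d_latent, n_heads * d_head)  # expand the latent back to keys at attention time
up_v = nn.Linear(d_latent, n_heads * d_head)  # expand the latent back to values
q_proj = nn.Linear(d_model, n_heads * d_head)

x = torch.randn(1, 10, d_model)               # (batch, sequence, d_model)
latent_cache = down_kv(x)                     # only this small tensor needs to be cached
q = q_proj(x).view(1, 10, n_heads, d_head).transpose(1, 2)
k = up_k(latent_cache).view(1, 10, n_heads, d_head).transpose(1, 2)
v = up_v(latent_cache).view(1, 10, n_heads, d_head).transpose(1, 2)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print(latent_cache.shape, out.shape)          # the cached latent is far smaller than full keys and values
```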
To train the V3 model, DeepSeek used a cluster of 2,048 Nvidia H800 GPUs for about two months, amounting to roughly 2.7 million GPU hours for pre-training and about 2.8 million GPU hours in total once post-training is included.
While some have estimated the cost of this training at about USD 5 million based on hourly GPU rental rates, the report notes that this figure does not include the broader costs of research, experimentation and other expenses involved in developing the models.
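A back-of-the-envelope calculation shows where a figure in that range comes from; the roughly USD 2 per GPU-hour rental rate used below is an assumed illustrative value, not a number taken from the report.

```python
# Back-of-the-envelope check of the widely quoted "USD 5 million" figure.
gpu_hours_total = 2_800_000              # roughly 2.7M pre-training plus post-training
rental_rate_usd_per_gpu_hour = 2.0       # assumed H800 market rental rate, illustrative only
print(gpu_hours_total * rental_rate_usd_per_gpu_hour)   # 5600000.0 -- compute rental alone, in USD
```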
The second model, 'DeepSeek R1', builds on the V3 foundation but uses reinforcement learning (RL) and other techniques to greatly improve reasoning capabilities. The R1 model has been particularly impressive, performing competitively against OpenAI's models on reasoning tasks.
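As a toy illustration of the outcome-based reward idea commonly used in this kind of RL post-training (not DeepSeek's actual pipeline), sampled answers that match a verifiable reference receive a reward, and higher-reward samples are reinforced in a subsequent policy update.

```python
# Toy illustration of an outcome-based reward: answers matching a verifiable
# reference get reward 1.0, others get 0.0. A real pipeline would feed these
# rewards into a policy-gradient update of the model.
def outcome_reward(sampled_answer: str, reference_answer: str) -> float:
    return 1.0 if sampled_answer.strip() == reference_answer.strip() else 0.0

samples = ["42", "41", "42"]
rewards = [outcome_reward(s, "42") for s in samples]
print(rewards)   # [1.0, 0.0, 1.0] -- the higher-reward samples are reinforced
```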
However, the report also notes that the additional resources required to develop R1 were likely substantial, even though they were not quantified in the company's research paper.
Despite the hype, the report maintained that DeepSeek's models are genuinely impressive. For example, the V3 model outperforms other, larger models on language, coding and mathematics benchmarks while using only a fraction of the computing resources.
Pre-training V3 required approximately 2.7 million GPU hours, which is only about 9 percent of the compute needed to train some other leading models.
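Reading that 9 percent figure the other way gives a sense of scale; the comparison budget below is implied by the percentage rather than stated in the article.

```python
# If ~2.7 million GPU hours is about 9 percent of a comparison model's budget,
# that comparison budget works out to roughly 30 million GPU hours.
v3_pretraining_gpu_hours = 2_700_000
fraction_of_comparison = 0.09            # the 9 percent figure cited in the report
print(v3_pretraining_gpu_hours / fraction_of_comparison)   # 30000000.0 GPU hours (implied, not stated)
```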
Finally, the report concludes that while DeepSeek's achievements are notable, the panic and the exaggerated claim that it built an OpenAI competitor for USD 5 million are overblown.
(Except for the headline, this story has not been edited by NDTV staff and is published from a syndicated feed.)