DeepSeek: The AI Upstart Challenging the Giants

It’s fascinating how quickly the landscape of artificial intelligence can shift, isn't it? Just when we think we've got a handle on who's leading the pack, a new player emerges, shaking things up with impressive speed and efficiency. That’s precisely the story unfolding with DeepSeek.

DeepSeek, or more formally Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., is a Chinese AI company that’s making significant waves in the world of large language models (LLMs). Founded in July 2023, it’s backed by High-Flyer, a Chinese hedge fund, and helmed by Liang Wenfeng, who also co-founded High-Flyer. What’s particularly striking is their rapid ascent and the way they’re achieving it.

Their flagship model, DeepSeek-R1, released under the permissive MIT License, has shown performance comparable to models from established giants, such as OpenAI's GPT-4. But here’s where it gets really interesting: the cost. DeepSeek claims it trained its V3 model for a mere US$6 million, a stark contrast to the estimated US$100 million spent on GPT-4 in 2023. The company also reports using about a tenth of the computing power that Meta’s comparable Llama 3.1 model required. This combination of low cost and strong performance has led observers to describe DeepSeek's success as "upending AI."

One notable part of DeepSeek's strategy is releasing "open weight" models, meaning the trained parameters are shared publicly, though under specific usage conditions. Its hiring stands out too: the company recruits AI talent from top Chinese universities and also brings in people from outside traditional computer science to broaden what its models know. Together, these choices appear to be a winning formula.

Techniques like mixture of experts (MoE) layers, which activate only a subset of a model's parameters for each token, have played a crucial role in reducing training expenses. What’s even more remarkable is that DeepSeek achieved these breakthroughs while navigating trade restrictions on AI chip exports to China. The company reportedly managed this by using weaker AI chips intended for export and by using fewer of them overall. This ingenuity has sent ripples through the industry, with some calling it a "Sputnik moment" for the US in AI, especially given the openly released, cost-effective, and high-performing nature of DeepSeek's models.
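
To make the mixture-of-experts idea a little more concrete, here is a minimal sketch in plain Python of top-k expert routing. It is an illustration under stated assumptions, not DeepSeek's actual architecture: the function names (`make_expert`, `moe_forward`), the sizes, the random weights, and the choice of 2 active experts out of 8 are all hypothetical. The only point is that each token passes through a few experts rather than all of them, which is where the compute savings come from.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, dim):
    """Hypothetical expert: a single random dim x dim linear map."""
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

    return expert


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route one token vector x through only the top_k highest-scoring experts."""
    # The router produces one score (logit) per expert.
    logits = [sum(g * xi for g, xi in zip(row, x)) for row in gate_weights]
    # Keep only the top_k experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the scores of the chosen experts into mixing weights.
    weights = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen


# Illustrative sizes only (assumptions, not real model dimensions).
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
rng = random.Random(0)
experts = [make_expert(rng, DIM) for _ in range(NUM_EXPERTS)]
gate_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(token, gate_weights, experts, TOP_K)
print(f"experts evaluated: {sorted(chosen)} ({TOP_K} of {NUM_EXPERTS})")
```

In this toy setup, only 2 of the 8 expert computations run for the token, so the per-token work in the expert layer is roughly a quarter of what running every expert would cost, even though all 8 experts' parameters exist in the model.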

The shock reached even the industry's biggest players. Nvidia, the leading maker of AI hardware, saw its share price drop sharply, wiping out a staggering US$600 billion in market value, the largest single-company decline in US stock market history at the time.

Looking back, the roots of DeepSeek can be traced to High-Flyer, co-founded in 2016 by Liang Wenfeng. Initially focused on AI-driven stock trading, High-Flyer developed sophisticated deep learning models and invested heavily in computing infrastructure. By 2021, it was acquiring large quantities of Nvidia GPUs, before US restrictions took hold. Its computing clusters, Fire-Flyer and Fire-Flyer 2, were substantial investments that gave the firm the capacity for serious AI research and development.

In April 2023, High-Flyer announced a dedicated AGI research lab, separate from its financial operations. Just a few months later, in July 2023, this lab spun off to become the independent entity we now know as DeepSeek, with High-Flyer as its primary supporter. It’s a testament to how a focused vision, coupled with innovative engineering and strategic resourcefulness, can create a formidable force in a rapidly evolving technological frontier.
