It feels like just yesterday we were marveling at the latest advancements in AI, and already, DeepSeek is back with what looks like another significant leap forward. You might recall the buzz around DeepSeek last Spring Festival – they certainly know how to make an impact. Now, whispers are circulating about a new model structure being tested, one that could support an astonishing 1 million tokens of context. Imagine the possibilities!
While the public-facing API for DeepSeek V3.2 remains at 128K context, this internal testing hints at something even more ambitious on the horizon. It’s reminiscent of their January paper, "Conditional Memory via Scalable Lookup," which tackled a key limitation in current large language models: their memory. This research, a collaboration between Peking University and DeepSeek, laid the groundwork for what many believe will be the V4 model, potentially arriving around this year's Spring Festival.
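I won't pretend to reproduce the paper's actual mechanism here, but the core intuition the title suggests is retrieving a small set of stored memories per query instead of attending over everything at once. Below is a minimal sketch of that general idea in plain NumPy; every name in it (MemoryBank, top_k, the softmax blend) is my own illustration, not the paper's design.

```python
# Illustrative sketch only: a conditional key-value memory lookup, assuming
# the paper's idea resembles retrieving top-k stored entries per query.
import numpy as np

class MemoryBank:
    def __init__(self, keys: np.ndarray, values: np.ndarray):
        self.keys = keys      # (num_entries, dim) stored key vectors
        self.values = values  # (num_entries, dim) stored value vectors

    def lookup(self, query: np.ndarray, top_k: int = 4) -> np.ndarray:
        # Score every stored key against the query, keep only the top-k,
        # and return a softmax-weighted blend of their values.
        scores = self.keys @ query                      # (num_entries,)
        idx = np.argpartition(scores, -top_k)[-top_k:]  # indices of the top-k scores
        weights = np.exp(scores[idx] - scores[idx].max())
        weights /= weights.sum()
        return weights @ self.values[idx]               # (dim,) blended memory readout

rng = np.random.default_rng(0)
bank = MemoryBank(rng.normal(size=(1024, 64)), rng.normal(size=(1024, 64)))
print(bank.lookup(rng.normal(size=64)).shape)  # (64,)
```

The appeal of a scheme like this is that the memory can grow far beyond the attention window, since each query only ever touches a handful of entries.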
Let's talk about V3.2 itself, which officially launched late last year alongside its 'Speciale' counterpart. The goal with V3.2 was a delicate balancing act: powerful reasoning capabilities married with manageable output lengths, making it a solid choice for everyday tasks like Q&A and general agent applications. In benchmarks, it's been holding its own, reaching GPT-5 levels and only slightly trailing Gemini 3.0 Pro. Compared to models like Kimi-K2-Thinking, V3.2 offers a significant reduction in output length, which translates to faster processing and less waiting time for users – a practical win.
DeepSeek has become a real bellwether in the AI space, and its every move is watched closely. It's no surprise then that 'deepseek' was named the 2025 Word of the Year by Youdao Dictionary, racking up millions of searches. The dictionary's team noted a distinct surge in interest, with each major development driving up search volumes, starting from their early breakthroughs in overcoming computational barriers.
There's also the experimental V3.2-Exp version, which served as a stepping stone towards new architectures. It introduced DeepSeek Sparse Attention (DSA), which optimizes training and inference efficiency on long texts. This experimental version was rolled out across their app, web, and mini-programs, accompanied by a significant price reduction for API access.
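DeepSeek's implementation reportedly pairs a lightweight learned indexer with fine-grained top-k token selection; the sketch below simplifies that to selecting each query's top-k raw attention scores, which captures the shape of the idea if not the exact mechanism. Treat it as illustrative pseudocode in runnable form.

```python
# A simplified top-k sparse attention sketch. The real DSA selects tokens
# with a separate learned indexer; here we reuse the raw scores instead.
import numpy as np

def topk_sparse_attention(q, k, v, keep):
    """Single-head sketch: each query row attends only to its `keep`
    highest-scoring keys; everything else is masked out before softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])              # (T, T) full score matrix
    if keep < scores.shape[-1]:
        # Per-row threshold: the keep-th largest score. Ties may let a
        # few extra tokens through, which is fine for a sketch.
        thresh = np.partition(scores, -keep, axis=-1)[:, [-keep]]
        scores = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                   # (T, d) attention output

rng = np.random.default_rng(0)
T, d = 8, 16
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
print(topk_sparse_attention(q, k, v, keep=3).shape)  # (8, 16)
```

The practical payoff is that compute per query scales with `keep` rather than with the full sequence length, which is exactly where long-context costs come from.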
Looking at the technical details, V3.2, released in September 2025, reflects an unusually broad rollout. It is backed by national policy support, integrated into the national computing internet, and included in data element development pilots. The model ships with dual CUDA/TileLang GPU kernel support and handles up to 160K tokens of context. Crucially, it achieved day-zero compatibility with domestic chip platforms such as Cambricon and Huawei's Ascend, and is available on platforms like Huawei Cloud and Modao.
Two innovations stand out: the TileLang programming language for hardware scheduling, and the proprietary UE8M0 FP8 data format. The experimental V3.2-Exp, launched in late September 2025, saw API prices slashed by over 50%, with input costs dropping to 0.2 yuan/million tokens (cache hit) and 2 yuan/million tokens (cache miss), and output at 3 yuan/million tokens. By December 1st, the official V3.2 and the temporary API for Speciale were fully updated. Technical reports indicated that V3.2-Speciale outperformed GPT-5 and Google Gemini 3 Pro in evaluations.
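UE8M0 is worth a moment: the name reads as unsigned, 8 exponent bits, 0 mantissa bits, meaning every representable value is simply a power of two, which makes the format a natural fit for block scale factors. Here's a tiny encoder/decoder sketch under that reading; the bias of 127 follows the OCP microscaling convention, and DeepSeek's exact usage may differ.

```python
# Sketch of UE8M0 under the "unsigned, 8 exponent bits, 0 mantissa" reading.
import math

def encode_ue8m0(scale: float) -> int:
    """Map a positive scale to the nearest power of two and store its
    exponent as an 8-bit code (bias 127, per the OCP microscaling spec)."""
    assert scale > 0, "UE8M0 is unsigned: only positive scales are representable"
    exp = round(math.log2(scale))          # nearest power-of-two exponent
    return max(0, min(255, exp + 127))     # clamp into the 8-bit code space

def decode_ue8m0(code: int) -> float:
    return 2.0 ** (code - 127)

for s in (0.37, 1.0, 300.0):
    print(s, "->", decode_ue8m0(encode_ue8m0(s)))
# 0.37 -> 0.5, 1.0 -> 1.0, 300.0 -> 256.0
```

Storing only an exponent keeps the scale representation to a single byte and makes rescaling a bit-shift rather than a multiply, which is presumably the point for low-precision training.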
Building on this architecture, the open-source math-specific model DeepSeekMath-V2 was released in November 2025, achieving gold-medal-level performance on International Mathematical Olympiad problems. December saw the model integrated into Tencent's ecosystem, powering AI applications like Yuanbao and ima alongside Tencent Hunyuan 2.0. By December 15th, Tianyi Cloud's Xirang had also incorporated DeepSeek V3.2.
In the competitive landscape, Mistral AI's Devstral 2 series, released in December 2025, showed strong performance in code generation benchmarks, surpassing DeepSeek V3.2's baseline. However, DeepSeek's focus on accessibility and efficiency continues to be a driving force.
Even in practical applications like Huawei Cloud's CodeArts code agent, which entered public beta before the 2026 Spring Festival, DeepSeek V3.2 is a supported model, alongside GLM-4.7. The agent lets users generate code directly from their requirements, with copyright in the generated code belonging to the user.
Similarly, Huawei Cloud's OpenClaw experience plan, launched in March 2026, supports various mainstream large models, including DeepSeek V3.2, GLM-5, and Kimi-K2. This plan offers a cost-effective way to deploy AI assistants, with introductory pricing and token vouchers that can significantly reduce usage costs for DeepSeek V3.2 API calls.
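To get a feel for what those rates mean in practice, here's a back-of-envelope calculator using the V3.2-Exp prices quoted earlier; the voucher discount is a made-up parameter for illustration, not a published rate.

```python
# Rough cost estimate from the V3.2-Exp API prices quoted above
# (0.2 yuan/M input on cache hit, 2 yuan/M on cache miss, 3 yuan/M output).
PRICES = {"input_hit": 0.2, "input_miss": 2.0, "output": 3.0}  # yuan per million tokens

def monthly_cost(hit_tokens, miss_tokens, output_tokens, voucher_discount=0.0):
    raw = (hit_tokens * PRICES["input_hit"]
           + miss_tokens * PRICES["input_miss"]
           + output_tokens * PRICES["output"]) / 1_000_000
    return raw * (1 - voucher_discount)   # voucher_discount is illustrative only

# 50M cached input, 10M fresh input, 20M output, 30% covered by vouchers
print(f"{monthly_cost(50e6, 10e6, 20e6, 0.30):.2f} yuan")  # 63.00 yuan
```

The striking thing in this arithmetic is how much the cache-hit rate dominates: cached input is a tenth the price of fresh input, so prompt reuse matters more than raw volume.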
As of December 2026, DeepSeek's official channels continue to highlight the V3.2 release, emphasizing its enhanced agent capabilities, integrated reasoning, and availability across web, app, and API. The platform offers free dialogue with V3.2 and API access for quick integration. The company's research portfolio spans a range of models, reflecting a continuous drive for innovation.
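For that quick integration, DeepSeek exposes an OpenAI-compatible endpoint, so the standard OpenAI Python SDK works with a swapped base URL. The base URL and model name below follow DeepSeek's published documentation, but verify them against the current docs before relying on this.

```python
# Minimal sketch of calling the DeepSeek API via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",        # issued from the DeepSeek platform
    base_url="https://api.deepseek.com",    # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                  # served by the current V3.2 line
    messages=[{"role": "user", "content": "Summarize DeepSeek V3.2 in one line."}],
)
print(response.choices[0].message.content)
```

That compatibility is a quiet but important accessibility play: any tool already built against the OpenAI SDK can switch over with two configuration changes.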
Ultimately, DeepSeek V3.2 represents a significant step in the evolution of open large language models. Its modular design, optimized training strategies, and focus on accessibility are setting new benchmarks for the industry, making advanced AI more attainable for developers and users alike.
