DeepSeek’s new AI model debuts with support for China-native chips and CANN, a replacement for Nvidia’s CUDA — Chinese chipmakers Huawei, Cambricon, and Hygon get first-class support
Chinese AI firm DeepSeek has released its latest large language model, DeepSeek-V3.2-Exp, with first-day optimizations for Huawei’s Ascend hardware and CANN software stack. The launch marks a shift in priorities to ensure leading-edge models run on domestic accelerators rather than relying on Nvidia’s CUDA ecosystem.
DeepSeek announced the model on September 29, posting code and checkpoints to Hugging Face alongside a technical report. The company describes V3.2-Exp as an “intermediate step toward our next-generation architecture,” designed to cut costs on long-context inference. It features a sparse attention mechanism that trims memory and compute requirements while maintaining output quality.
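The announcement does not spell out the mechanism's internals, but the general idea behind sparse attention — letting each query attend to only a small, high-scoring subset of keys instead of the full context — can be sketched. The toy function below is an illustrative top-k variant in NumPy, not DeepSeek's actual kernel; the function name and shapes are hypothetical.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Illustrative top-k sparse attention for a single head.

    Each query attends only to its top_k highest-scoring keys, so the
    softmax and weighted sum cover top_k entries per query instead of
    the full key length. A toy sketch, not DeepSeek's implementation.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (Lq, Lk) logits
    # Keep the top_k keys per query; push the rest to -inf before softmax.
    idx = np.argpartition(-scores, top_k - 1, axis=-1)[:, :top_k]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    masked = scores + mask
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # zeros stay zero
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))      # 4 queries, head dim 8
k = rng.normal(size=(16, 8))     # 16 keys
v = rng.normal(size=(16, 8))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (4, 8)
```

For long contexts the savings come from skipping the masked entries entirely in a fused kernel rather than materializing the full score matrix as this sketch does; the masking here only shows the selection logic.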
Huawei’s Ascend team and the wider vLLM-Ascend community moved swiftly to integrate DeepSeek-V3.2-Exp. In the vLLM-Ascend repo, a new issue outlines custom operator installation steps and kernel packaging for Ascend NPUs to support V3.2-Exp. The CANN team also published an inference recipe, positioning the model for immediate deployment across Huawei hardware.
"Increased collaboration between DeepSeek and the Ascend/CANN team in supporting V3.2-Exp, with GitCode updates to CANN as well as GitHub updates into vLLM and SGLang, plus TileLang support. Also, Cambricon had updates into vLLM (vLLM-MLU) to support its inference. DS is really dealing with the reality of… pic.twitter.com/CBgk7pVZrx" — September 29, 2025
Meanwhile, SGLang confirmed V3.2-Exp support across multiple back ends, including Ascend, while DeepSeek’s GitHub notes suggest parity with vLLM at launch. DeepSeek itself publicly references both TileLang and CUDA kernels in its announcements, urging researchers to use TileLang for prototyping. Practically, that means the same model artifact can be deployed across Nvidia and Chinese accelerators with only minimal graph changes.
The sheer speed of adoption illustrates how China's AI ecosystem is preparing for a future in which access to Nvidia hardware cannot be taken for granted. Nvidia's CUDA remains dominant for both training and inference, but DeepSeek's latest release appears to be one of the first from a major Chinese company to arrive optimized for non-CUDA stacks on day one.
The coordinated effort across Ascend, Cambricon, and Hygon is the clearest sign to date that Chinese firms are taking Beijing’s demands for AI sovereignty seriously, not just making their hardware compatible after the fact, but positioning domestic platforms as first-class targets.