In a move that could reshape how businesses pay for AI, Huawei Cloud announced a token‑based billing system at a press briefing in Shanghai on 12 June 2026. The new framework lets customers in the Asia‑Pacific region pay for compute resources by the token rather than by the hour or by the request, giving developers granular control over costs for generative‑AI workloads.

The token service follows Huawei Cloud’s earlier launch of an AI compute platform that enabled iFLYTEK to set up a large‑language‑model training pipeline in just two weeks. The platform ran continuously for 60 days without interruption, and the company reported zero faults during major version releases. Huawei said the token‑pricing approach is designed to match the economics of LLM inference, where token volumes can vary wildly between requests.
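To make the billing difference concrete, the sketch below compares a flat per‑request fee with per‑token billing across three requests of very different sizes. Every rate, name and token count in it is a hypothetical placeholder chosen for illustration; Huawei Cloud's actual prices and billing API are not given in the announcement.

```python
from dataclasses import dataclass

# Hypothetical rates for illustration only; these are NOT Huawei Cloud prices.
PRICE_PER_1K_TOKENS = 0.002  # assumed currency units per 1,000 tokens
PRICE_PER_REQUEST = 0.01     # assumed flat fee per request


@dataclass
class Request:
    """One inference call, measured in tokens read and tokens generated."""
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def token_cost(requests, price_per_1k=PRICE_PER_1K_TOKENS):
    """Bill in proportion to the tokens the requests actually consumed."""
    return sum(r.total_tokens for r in requests) / 1000 * price_per_1k


def per_request_cost(requests, price_per_request=PRICE_PER_REQUEST):
    """Bill a flat fee per request, regardless of its size."""
    return len(requests) * price_per_request


# Three made-up requests: a short chat turn, a long document job, a mid-size query.
requests = [Request(50, 150), Request(2000, 6000), Request(100, 400)]

print(f"per-token billing:   {token_cost(requests):.4f}")
print(f"per-request billing: {per_request_cost(requests):.4f}")
```

Under per‑token billing the long document job accounts for most of the charge, while the flat fee prices all three requests identically. Which model comes out cheaper depends entirely on the assumed rates.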

Huawei’s push extends beyond pricing. The company unveiled the Ascend 950PR inference chip, a 1.56‑petaflop device that delivers 2.8 times the FP4 performance of Nvidia’s H20. The chip is aimed at high‑throughput inference workloads, including those that process large volumes of tokens. A three‑year roadmap for Ascend chips (the 950 series in 2026, the 960 in 2027 and the 970 in 2028) targets a doubling of compute capacity with each release.
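The roadmap's doubling target compounds quickly. As a rough sketch, the snippet below expresses each generation's compute relative to the 950 series, assuming the target is met exactly at every release. The function name and the exact factor of 2.0 are illustrative assumptions, and the results are ratios, not projected specifications for the 960 or 970.

```python
# Generation names and years as stated in the roadmap.
ROADMAP = [("950", 2026), ("960", 2027), ("970", 2028)]


def relative_compute(roadmap, factor=2.0):
    """Return compute per generation as a multiple of the first generation,
    assuming each release multiplies capacity by `factor` (an assumption)."""
    return {name: factor ** i for i, (name, _year) in enumerate(roadmap)}


for name, multiple in relative_compute(ROADMAP).items():
    print(f"Ascend {name}: {multiple:g}x the 950 baseline")
```

If the target holds, the 970 would offer four times the compute of the 950 series within two years.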

In the same briefing, Huawei highlighted its expanding AI partner ecosystem. The launch of the Huawei Cloud AI Services APAC Partner Ecosystem comes with awards for partners that have achieved significant milestones in cloud technology. The ecosystem is intended to accelerate the adoption of Huawei’s AI tools across enterprises in the region.

Industry analysts note that Huawei’s token‑pricing model could lower entry barriers for startups and SMEs that want to experiment with generative AI but are wary of unpredictable compute costs. Because payment is tied directly to token usage, a customer’s bill scales with the work actually done rather than with the hours or requests purchased.

Huawei’s strategy also dovetails with its broader hardware initiatives. HiSilicon’s domestically fabricated Kirin processors and the company’s domestic AI chip supply chain (including packaging suppliers and EDA partners) provide a foundation for high‑performance Ascend chips. This integrated approach positions Huawei to compete with established cloud providers such as Amazon Web Services, Microsoft Azure, and Google Cloud.

The announcement comes amid growing scrutiny of AI infrastructure providers. While Huawei has faced restrictions in some markets, it continues to expand its presence in Asia, Europe, and Africa. The focus on token pricing reflects a shift toward more granular billing models that could appeal to customers seeking transparency and cost predictability.

Looking ahead, Huawei plans to roll out additional AI services, including a new suite of developer tools and APIs that integrate with its token service. The company also intends to expand Ascend chip production capacity to meet rising demand for high‑performance inference. As the AI‑cloud market matures, Huawei’s token‑pricing approach may become a key differentiator for customers prioritizing cost efficiency and scalability.

In summary, Huawei Cloud’s new token‑pricing framework signals a strategic pivot toward more flexible billing for AI workloads. Together with the Ascend chip roadmap and the expanded partner ecosystem, it positions the company as a viable alternative to Western cloud providers in generative AI.