As enterprises deploy AI, they are discovering the high cost of actually running an AI model: the “inference” cost. Uber spent its way through its 2026 AI budget in four months, with its COO Andrew Macdonald saying: “If you’re not actually able to draw a direct line to how [many] useful features and functionality you’re shipping to your users, that trade becomes harder to justify”. He is not alone. Salesforce CEO Marc Benioff admitted that his company would spend $300 million on Anthropic tokens in 2026. ServiceNow also burned through its AI budget in the first few months of 2026. A Goldman Sachs report found that at one software company, inference costs are approaching 10% of total headcount costs. One company reported a 50-fold increase in its AI costs when it put an AI application into production.
Fortunately for CFOs, there is an emerging area that may help them with these runaway inference costs. The examples mentioned so far are all associated with frontier models, specifically Anthropic’s Claude, which in the last year has dramatically improved its ability to write software. Claude has now overtaken ChatGPT in enterprise market share. However, Claude is not the only LLM that can write code. Models like Claude and ChatGPT are “closed” models, i.e. their weights are known only to the vendors that own and develop them. There is another class of open-source and open-weight models, where the weights of the models are public and the software is free. These include DeepSeek, Qwen (from Alibaba), GLM (from Z.AI, formerly Zhipu AI), Llama (from Meta) and Gemma (from Google).
These models do not come with a hefty licence fee, and some of them are also very efficient to execute, so they have low inference costs. DeepSeek V4 was reported to be fifty times cheaper to run than Claude Opus 4.6 in a May 2026 comparison report (Gemini 3.1 Pro had the fastest throughput). DeepSeek V4 has a million-token context window, the same as Claude Opus 4.6 and ChatGPT 5.4. It is not the only high-performing Chinese AI model.
A June 2026 benchmark ranked GLM 5.2 above ChatGPT 5.5, amongst others, and indeed top overall on software design. This was the first time that a Chinese open-source model had beaten Western frontier models in such a benchmark. The Chinese model that has made the largest market impact is Qwen. Qwen has perhaps a 25-30% market share of open-source models, ahead of Llama and DeepSeek at 20-25%. It has been estimated that 80% of Silicon Valley AI start-ups use Qwen.
At present, open-source AI models account for around 11-20% of deployed enterprise production usage, though they account for 20% of token usage. However, their market share is growing, and it is estimated that 75% of all enterprises now have at least one open-source AI model in their software stack. The Chinese companies make no money directly from the AI models themselves but can charge for hosted cloud deployments, customisation, support and adjacent products like developer tools or vertical applications in specific industries.
Clearly, this is a rapidly changing landscape, but the economics are compelling. Anthropic and OpenAI may have the edge in terms of highest-performance models, but only a limited proportion of companies need to be on the cutting edge of performance. With open-source models perhaps six to ten times cheaper to run in terms of inference (and with free software licences as a bonus), it is hard to imagine that hard-pressed CFOs will ignore the opportunity to nudge their companies towards these much cheaper solutions, at least for the bulk of applications. In addition, open-source models can be deployed in-house, a major advantage for companies concerned about security and privacy. While actual costs depend heavily on deployment architecture and optimisation, open-weight models give enterprises far more control over the cost-performance trade-off.
The factors keeping the closed models in front at present are switching costs, perceived brand trust and perhaps, in some cases, concerns about potential regulatory issues around using Chinese models. However, as more and more stories of AI inference cost overruns come out, it is likely that economics will win out over brand reputation, and open-weight market share will rise. The market will probably split into a premium end, with the latest closed models used for the most demanding competitive-edge applications, and the rest, where open-weight models will dominate due to cost and control of deployment.







