One mantra regarding large language models (LLMs) is that bigger is better. Parameters are the learned weights of a model, while tokens represent the pieces of text used to train it.
The more training tokens an LLM sees, the more fluent its answers become. This assumption has driven the frenzied activity in data centre construction and propelled NVIDIA to a valuation of $5 trillion in October 2025. GPT-3 was trained on 300 billion tokens and had 175 billion parameters; GPT-4 was much larger, reportedly trained on 13 trillion tokens with 1.8 trillion parameters. GPT-5, whose exact numbers are not public at this point, is rumoured to have been trained on as many as 70 trillion tokens and to have 10 trillion parameters. Each succeeding model was more sophisticated than the last and did better on benchmarks, though benchmarks can be slippery things.
You can measure the efficiency of an LLM by its token costs, its throughput (tokens or queries processed per second), its memory and compute usage, its output quality, and the latency of its responses. Various industry benchmarks have been devised to combine these criteria and compare LLMs against them. Benchmarks are discussed in more detail in my separate blog on this subject here.
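To make these metrics concrete, here is a minimal sketch of how cost per query and decode throughput can be computed. All figures are illustrative assumptions, not any vendor's actual prices or speeds; real per-million-token rates differ between input and output tokens and between providers.

```python
# Hypothetical efficiency metrics for a single LLM query.
# All prices and latencies below are made-up examples.

def query_cost_usd(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Cost of one query, given per-million-token input/output prices in USD."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

def throughput_tokens_per_s(output_tokens, latency_s):
    """Decode throughput: tokens generated per second of response time."""
    return output_tokens / latency_s

# Example: a 1,000-token prompt producing a 500-token answer,
# priced at $1.25 / $10.00 per million input/output tokens (assumed).
cost = query_cost_usd(1000, 500, price_in_per_m=1.25, price_out_per_m=10.0)
tps = throughput_tokens_per_s(500, latency_s=6.2)
print(f"cost per query: ${cost:.5f}")
print(f"throughput: {tps:.1f} tokens/s")
```

Multiplying the per-query cost by expected query volume is the quickest way to compare two models whose benchmark scores are close but whose token prices are not.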
Producing ever bigger models is the OpenAI approach, but it comes at very high cost, and there is another way. The Chinese company DeepSeek, founded in May 2023 and based in Hangzhou, made efficiency rather than scale its goal, forced to do so by its lack of access to the latest NVIDIA GPUs. When its R1 model was released in January 2025, it stunned Western observers. The open-source model performed very well on LLM benchmarks, yet was reportedly trained for around $6 million, compared to a rumoured $100 million for GPT-4. Its per-token pricing was one-third of ChatGPT's, and at the time of writing, the per-token cost of its latest model, V3.1 Terminus, is a quarter that of GPT-5. The company's other models include DeepSeek Coder and DeepSeek-V3.1, the latter released in August 2025.
DeepSeek has continued on this path, but it is not alone. There are Kimi and Kimi K2 from Moonshot AI in Beijing, Qwen from Alibaba Cloud in Hangzhou, Wu Dao 3.0 from the Beijing Academy of Artificial Intelligence (BAAI), and ChatGLM from Z.ai in Beijing. These products are often open source and do well on LLM benchmarks. The world of LLM benchmarks is itself a moving landscape, but there is little doubt that these emerging Chinese LLMs are at least competitive with the leading Western ones. One sign of this is that many US AI start-ups (estimates run as high as 80% of them) are now basing their technology on Chinese LLMs, primarily for cost reasons.
Another difference between many Western and Chinese models is that many of the Chinese ones are open source, but what does that really mean? An open-source model publishes its weights, architecture, and training code. Western examples include Mistral, from the French company Mistral AI; Llama from Meta (open in its weights, but not fully open source); Falcon from the Technology Innovation Institute in Abu Dhabi; and Phi-3 from Microsoft. Chinese open-source models include DeepSeek-V3, Qwen, Kimi K2 (from Moonshot AI), Yi (from 01.ai), GLM-4.5 from Z.ai, and HunyuanWorld from Tencent.
For many companies, whether start-ups or corporates, having the latest, shiniest model is not necessarily critical. LLMs are expensive to train for the AI vendors and also costly to operate for corporate end users. A model that scores marginally lower on benchmarks but has far lower running costs is a trade-off that many companies will be happy to make. The cost of licensing the big-name Western LLMs is not trivial.
An enterprise licence for ChatGPT is an individual negotiation, but a cost of $60-$100 per user per month is not uncommon. If you have thousands of staff whom you want to have access to AI, that quickly adds up: a company with a thousand users may pay in the region of $0.7-1.2 million annually for ChatGPT licensing. By contrast, an open-source model is free in licence terms, though you still incur hardware costs and the staff to support it. Also important are the inference costs per query, which are usually lower for the more efficient open-source LLMs.
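The arithmetic behind that comparison is worth sketching out. The figures below are illustrative assumptions only (an assumed mid-range seat price, an assumed query volume and token price), and the self-hosting alternative would add hardware and staffing costs that this back-of-the-envelope sketch does not model.

```python
# Back-of-the-envelope annual cost comparison: per-seat licensing vs
# pay-per-token API usage. All inputs are illustrative assumptions.

def annual_seat_cost(users, seat_price_per_month):
    """Annual cost of per-seat enterprise licensing."""
    return users * seat_price_per_month * 12

def annual_token_cost(users, queries_per_user_per_day, tokens_per_query,
                      price_per_m_tokens, workdays=250):
    """Annual cost if the same usage were billed per million tokens."""
    total_tokens = users * queries_per_user_per_day * tokens_per_query * workdays
    return total_tokens / 1_000_000 * price_per_m_tokens

# Assumptions: 1,000 users at $85/seat/month, vs 20 queries per user per
# working day at 1,500 tokens each, priced at $2 per million tokens.
seats = annual_seat_cost(1000, 85.0)
tokens = annual_token_cost(1000, 20, 1500, 2.0)
print(f"seat licensing: ${seats:,.0f}/yr")
print(f"pay-per-token:  ${tokens:,.0f}/yr")
```

Even if the per-token price or query volume in this sketch is off by an order of magnitude, the gap between seat licensing and metered token costs is wide enough to explain why inference pricing features so heavily in model selection.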
As companies get a better grip on what they really want from LLMs, running costs and licence costs come to the fore. Corporate AI pilot projects have generally struggled to show a positive return on investment; indeed, according to MIT research, 95% of them show no measurable financial return at all. One way to improve a project's return on investment is to spend less for the same result, and open-source, lower-cost LLMs may well be perfectly adequate for most tasks. As corporations evolve from early adopters of AI into moving pilot projects into production, these low-cost, efficient LLMs have a bright future.







