Amid the buzz around generative artificial intelligence (AI) chatbots like ChatGPT and Claude, it is easy to forget that the large language models (LLMs) which power generative AI are just one subset of AI. There are many forms of AI, from old-fashioned expert systems to machine learning, reinforcement learning and newer approaches still mostly at the research stage. This matters, because it helps explain the horrendous failure rate of current AI projects: 95% as of mid-2025, according to MIT, and other estimates are barely better, with RAND putting the failure rate at over 80% and Gartner at over 85%. When I asked McKinsey’s own new AI chatbot, called Insights, just today:
“What percentage of generative AI projects succeed?” it answered:
“For now, I can only answer questions related to Gen AI/AI, Tech, Media, and Telecom.”
Very insightful, and an answer that gives you some indication of how successful that particular chatbot project has been so far (to be fair, it is new).
One reason for the high failure rate, I believe, is that many people assume that generative AI is the answer to all problems, when in fact it is not. When all you have is a hammer, everything looks like a nail, and at present the generative AI hammer is being used for all kinds of tasks that it is ill-suited for. LLMs are probabilistic in nature. They are creative and can conduct a conversation fluently, but they have real limitations. They excel with text and images but can struggle with numbers, which is why they are fairly poor at arithmetic despite being able to solve some complex mathematical problems, especially ones heavily represented in their training data. A key limitation is hallucination, which is a major problem in many business situations. An LLM will not necessarily give you the same answer to the same question if asked repeatedly: its internal consistency rate may be 90% or more, but not 100%. Anything requiring consistency of response and accuracy is not something an LLM is likely to excel at.
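To make that concrete, here is a minimal sketch in Python of why sampling produces varying answers. The token probabilities are invented purely for illustration; no real model is involved.

```python
# Minimal sketch (not a real LLM): shows why sampling from a probability
# distribution over tokens can yield different answers to the same prompt.
import random

# Toy next-token distribution a model might assign after the prompt
# "The capital of Australia is"; all values are made up for this demo.
next_token_probs = {
    "Canberra":  0.90,  # correct, most likely
    "Sydney":    0.07,  # plausible-sounding error
    "Melbourne": 0.03,  # another plausible error
}

def sample_answer(probs):
    """Pick one token in proportion to its probability."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Ask the "same question" ten times: mostly consistent, but not always.
answers = [sample_answer(next_token_probs) for _ in range(10)]
print(answers)
print(f"Consistency this run: {answers.count('Canberra') / len(answers):.0%}")
```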
Contrast this with machine learning, a well-established branch of AI. Here data is prepared as input to learning algorithms such as linear regression, decision trees or neural networks. Machine learning works very well with structured data like databases, logs and sensor streams. It is excellent for classifying data and making recommendations based on historical patterns. This makes machine learning very well suited to situations like fraud detection, predictive maintenance, or highlighting customers likely to churn based on past behaviour. Classical machine learning models, such as regressions and decision trees, are transparent and explainable, which is ideal where an audit trail is needed. That makes them suitable for regulated industries like finance and pharmaceuticals, where people need to understand the basis of the model's decisions. Machine learning algorithms are also usually fast and efficient compared to some other forms of AI.
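As a toy illustration, here is a sketch of a churn classifier built with scikit-learn's decision tree. The customer data and feature names are invented, but it shows the key point: the learned rules can be printed and audited, something an LLM cannot offer.

```python
# Sketch: a churn classifier on a tiny synthetic dataset, using a
# decision tree so the resulting rules can be printed and audited.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: [months_as_customer, support_tickets_last_90d, monthly_spend]
X = [
    [36, 0, 80], [48, 1, 120], [2, 5, 20], [4, 4, 25],
    [60, 0, 200], [3, 6, 15], [24, 1, 90], [1, 7, 10],
]
y = [0, 0, 1, 1, 0, 1, 0, 1]  # 1 = churned, 0 = stayed

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Unlike an LLM, the learned decision rules are fully inspectable:
print(export_text(
    model,
    feature_names=["months_as_customer", "support_tickets", "monthly_spend"],
))

# Score a new customer: 5 months' tenure, 3 recent tickets, spends 30.
print("Churn risk:", model.predict_proba([[5, 3, 30]])[0][1])
```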
By contrast, LLMs are creative, and great for generating new content such as text, images, video or software code. They are optimised for unstructured data like documents, emails and audio or image files. It is important to understand that LLMs are black boxes. Despite considerable research effort, it is essentially impossible to truly explain the internal reasoning of an LLM, which is built from dozens of stacked neural-network layers, each making statistical decisions based on weights learned from its training data. LLM wrappers and products that claim to explain internal reasoning offer what are essentially post hoc fictions: they may generate "chain of thought" statements that sound plausible, but these bear little or no resemblance to what is actually happening inside the model. Hence, LLMs are poorly suited to situations where an audit trail of reasoning is required. On the other hand, LLMs are generalists, able to answer questions on a wide range of subjects, at least within the scope of their extensive training data.
These characteristics can help us choose our AI tools wisely. Here are some example tasks, and which of machine learning or an LLM is the better fit:
| Task | Better-suited tool |
| --- | --- |
| Generate new marketing material | LLM |
| Predictive maintenance | Machine learning |
| Image synthesis | LLM |
| Spotting fraud patterns | Machine learning |
| Translating text | LLM |
| Customer churn prediction | Machine learning |
| Debugging code | LLM |
| Regulatory reporting | Machine learning |
| Synthetic data creation | LLM |
| Interpreting MRI images | Machine learning |
| Personalised chatbots | LLM |
| Spam detection | Machine learning |
| Prototyping | LLM |
| Credit scoring | Machine learning |
| Summarising documents | LLM |
| Sentiment analysis | Machine learning |
In some situations, a combined approach may be optimal. In drug discovery, LLMs are good at proposing novel molecular structures, while machine learning is very good at predicting patterns of toxicity. In driverless cars, machine learning is used for real-time decision-making and object detection, but LLMs can be useful for generating rare edge cases and unusual driving situations to test. In fraud detection, machine learning is excellent at spotting patterns, while LLMs can potentially generate novel fraud scenarios to test against. In marketing, machine learning can suggest new products to customers based on their preferences or past behaviour, while LLMs are good at creating personalised content or product messaging.
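As a rough sketch of the marketing hybrid, the snippet below uses a nearest-neighbour model from scikit-learn to pick a product recommendation, then hands off to an LLM for the personalised copy. The purchase data is invented, and `call_llm` is a hypothetical placeholder for whichever LLM API you actually use.

```python
# Sketch of a hybrid flow: a machine-learning model picks the product
# to recommend; an LLM then drafts the personalised message.
from sklearn.neighbors import NearestNeighbors

# Invented purchase-history vectors (rows = customers, cols = categories).
history = [
    [5, 0, 1],  # customer 0: mostly category A
    [4, 1, 0],  # customer 1: similar tastes to customer 0
    [0, 6, 2],  # customer 2: mostly category B
]
products_by_category = ["hiking boots", "espresso machine", "desk lamp"]

# ML step: find the most similar customer and borrow their top category.
knn = NearestNeighbors(n_neighbors=2).fit(history)
_, idx = knn.kneighbors([history[0]])
neighbour = history[idx[0][1]]           # index 0 is the customer itself
product = products_by_category[neighbour.index(max(neighbour))]

# LLM step: generative model writes the copy (hypothetical stub).
def call_llm(prompt: str) -> str:
    return f"[LLM draft for prompt: {prompt!r}]"  # stand-in for a real API

print(call_llm(f"Write a friendly two-line email recommending {product}."))
```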
These are not the only AI tools in the toolkit. Reinforcement learning learns by interacting with an environment, choosing actions so as to maximise cumulative reward, and does not need historical labelled data. It works well when feedback is delayed and long-term.
This makes it well suited to adaptive control and planning, with applications in robotics, drones, game playing, traffic routing and energy distribution. Chess provides a good illustration. DeepMind's AlphaZero comprehensively defeated the strongest specialist classical chess program of the time, Stockfish, which relied on hand-crafted rules and deep chess-specific knowledge. AlphaZero started with just the basic rules of chess, played itself millions of times, and improved and adapted as it went. In their hundred-game match, AlphaZero won 28 games and drew the remaining 72, losing not one. By contrast, LLMs are currently terrible at chess, barely able to play legal moves consistently once the opening moves (where plenty of training data is available) have unfolded. They hallucinate extra pieces, play illegal moves, and even the better ones lose quickly to club-level players like myself. Engines like Stockfish and AlphaZero, meanwhile, would trounce even the world champion. Indeed, Stockfish recently beat the world's second-highest-rated player, Hikaru Nakamura, across several games even when handicapped by starting two pawns down, a large disadvantage.
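For readers who want to see the mechanics, here is a minimal sketch of tabular Q-learning, one of the simplest reinforcement learning algorithms, on a toy five-cell corridor invented for the demo. The agent is rewarded only on reaching the final cell, so feedback is delayed, and it learns purely by trial and error with no labelled historical data.

```python
# Sketch: tabular Q-learning on a 5-cell corridor. The agent starts at
# cell 0 and is rewarded only on reaching cell 4 (delayed feedback).
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                      # step left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

# Q-table: estimated future reward for each (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(state):
    """Best-known action in this state, breaking ties randomly."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(500):                    # 500 episodes of trial and error
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit, occasionally explore.
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update: nudge the estimate toward the reward plus
        # the discounted value of the best action from the next state.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# The learned greedy policy should step right (+1) from every cell.
print([greedy(s) for s in range(N_STATES - 1)])
```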
The key lesson here is that we need to choose the right tool for the job. LLMs are great at some things and bad at others, and the same is true of machine learning and reinforcement learning. We need better education in the corporate world about the pros and cons of these AI techniques and other technology approaches, so that we can improve the odds of project success by using the specific AI technique (or indeed non-AI technique) that works best for the particular project or use case. That would at least be a starting point for improving the currently dismal success rate of AI projects.