The English poet William Cowper wrote that “God moves in a mysterious way, his wonders to perform” in his 1773 poem “Light Shining out of Darkness”. The same could be said of large language models (LLMs), the technology underpinning generative AI. LLMs are capable of extremely fluent conversation and answer questions with authority and confidence, but can we peer inside an LLM to understand its reasoning? The short answer is “no”. LLMs are a black box, where the input (your prompt to the LLM) and output (its answer) are known, but the internal mechanism is opaque.
This fundamental truth runs contrary to some advice you will find on the internet, and it is simply a function of how artificial neural networks work. Explaining how a traditional computer program reaches a decision is usually not so tricky, and indeed is important in many areas. If a bank uses an application to assess whether to grant a loan, the decision is normally based on a set of rules. The application might look at the applicant’s credit rating, the collateral they can put up, their income, their age and so on. The assessment is “deterministic”, meaning that the decision will always be the same given the same input data. Moreover, the weightings that the application gives to the various factors are known, so it is easy to go back and check why a loan was granted or rejected. LLMs, by contrast, are probabilistic creatures: they are not completely consistent in their answers. Studies have shown that an LLM may score as high as 95% for internal consistency, but the figure can be much lower, depending on the model in question. This matters, because in many real-life situations a computer decision must be auditable and repeatable. Government regulations in some industries, such as pharmaceuticals, require that systems can be audited and that their reasoning can be traced and made transparent.
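The difference is easy to see in code. Below is a minimal, hypothetical sketch of such a rule-based loan assessor; the factor names, weights and thresholds are illustrative assumptions, not any real bank’s model. Every factor’s contribution is recorded, and the same input always produces the same decision:

```python
# Hypothetical sketch of a deterministic, auditable loan-decision rule set.
# Factor names, weights and thresholds are illustrative assumptions only.

def assess_loan(credit_score, collateral, income, age):
    """Return (decision, audit_trail) for a loan application."""
    weights = {"credit": 0.5, "collateral": 0.2, "income": 0.2, "age": 0.1}
    factors = {
        "credit": credit_score / 850,                 # normalise to 0..1
        "collateral": min(collateral / 50_000, 1.0),  # cap at 1.0
        "income": min(income / 100_000, 1.0),
        "age": 1.0 if 21 <= age <= 70 else 0.0,
    }
    score = sum(weights[k] * factors[k] for k in weights)
    decision = "approve" if score >= 0.6 else "reject"
    # Each factor's weighted contribution is recorded, so an auditor can
    # later see exactly why the loan was granted or rejected.
    audit_trail = {k: round(weights[k] * factors[k], 3) for k in weights}
    return decision, audit_trail

# Identical input always yields the identical decision and audit trail.
d1, t1 = assess_loan(720, 30_000, 55_000, 40)
d2, t2 = assess_loan(720, 30_000, 55_000, 40)
assert (d1, t1) == (d2, t2)
```

Run twice with the same applicant, the function produces byte-for-byte identical output; that repeatability is precisely what an LLM cannot offer.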
There have been various approaches to explainability in traditional machine learning models, such as Local Interpretable Model-Agnostic Explanations (LIME). This technique has been shown to be difficult to apply to LLMs due to their complex, non-linear nature.
Another approach is SHAP (SHapley Additive exPlanations). This technique, which comes from game theory, examines the features of a given model, such as one assessing bank loans, and assigns scores to the factors that drive the model’s answer. It works well for classical machine learning models and simpler neural networks, but it has limitations for the complex, multi-layered networks used within modern LLMs. SHAP was designed for tabular models, where you can turn features on and off and recalculate predictions. LLMs do not work that way: they operate on contextual embeddings, where removing or adding a word may change the meaning of the whole sentence. For this reason, the basic SHAP approach cannot be used with LLMs. A number of additional mathematical techniques can extend SHAP in various ways, but each has limitations. Even with the best available techniques, an LLM’s answer cannot be readily explained, owing to the sheer complexity of these models. Partial explanations can be derived using techniques like “attention maps”, “behavioural probing” and “neuron/feature analysis”, but none of these provides a reliable and complete account of how an LLM arrives at its output.
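To make the feature-toggling idea concrete, here is a from-scratch sketch of the exact Shapley-value calculation that underpins SHAP, applied to a toy linear loan model (the model, feature names and weights are all assumptions for illustration; the real SHAP library adds many approximations and optimisations). It works precisely because each feature can be flipped between the instance value and a baseline value and the prediction recalculated, which is exactly what contextual embeddings do not allow:

```python
# Minimal from-scratch Shapley-value sketch for a tiny tabular model.
# The model and its weights are toy assumptions for illustration.
from itertools import combinations
from math import factorial

def model(features):
    # Toy linear "loan" model over three normalised features.
    w = {"credit": 0.5, "income": 0.3, "collateral": 0.2}
    return sum(w[f] * v for f, v in features.items())

def shapley(instance, baseline):
    """Exact Shapley values: each feature's average marginal contribution,
    computed by toggling features between the instance and a baseline and
    recalculating the model's prediction."""
    names = list(instance)
    n = len(names)

    def value(subset):
        feats = {f: (instance[f] if f in subset else baseline[f]) for f in names}
        return model(feats)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            for sub in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(sub) | {f}) - value(set(sub)))
        phi[f] = total
    return phi

phi = shapley({"credit": 0.9, "income": 0.6, "collateral": 0.2},
              {"credit": 0.0, "income": 0.0, "collateral": 0.0})
# For a linear model, each feature's Shapley value is simply weight * value,
# and the values sum to the gap between the instance and baseline predictions.
```

Note that the whole procedure hinges on `value(subset)`: substituting a baseline for a missing feature is trivial in a table of numbers, but there is no analogous way to “remove” one token from an LLM prompt without altering the meaning of everything around it.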
You will sometimes see a “chain of reasoning” explanation from an LLM, which gives a step-by-step explanation of how it tackled a question. In fact, these are post-hoc justifications; they are not how LLMs actually come up with answers. In some ways, this kind of after-the-fact, fake explanation is worse than none at all, as it gives a pretence of auditability when none exists. These “unfaithful” explanations are discussed in more depth in this research paper. They may make us feel better, but they have nothing to do with the internal reasoning process of LLMs.
This state of affairs is unlikely to change, and it means that corporations implementing LLM technology need to be very careful about exactly where they use it. In areas where auditability and transparency are important, LLMs are simply unsuitable. We have already seen many examples of this in the legal field, where LLMs have made up fake case references in court documents. I spoke today to an NHS practitioner who initially used ChatGPT to look up medical references for papers that she wrote, and stopped doing so once she realised that many of the references provided by the LLM simply didn’t exist. Companies need to do more to train their staff in AI literacy, so that they are aware of the limitations of LLMs when it comes to reliability, consistency and auditability. A troubling number of people are blissfully unaware that these issues exist, and this lack of understanding will inevitably lead to disappointment, or worse.