AI Legacy Code – The New COBOL - The Information Difference

One area where large language models (LLMs) such as Anthropic’s Claude have made substantial progress is code generation. Just as LLMs can produce output in English or Spanish, they can do so in computer languages like Python, Java, Ruby and more. This is all very cutting edge, and the true effect on software productivity is still emerging and much debated. However, for the sake of argument, suppose that writing code with LLMs completely replaces the old paradigm of humans writing new code. What then?

Think back to previous technological change. We moved from machine code to assembly language, from assembly language to 3GLs like FORTRAN (introduced in 1956) and COBOL (1960), and from there into a plethora of language approaches, from 4GLs to code generators to object-oriented languages like Smalltalk, then languages like C++ (1985), Python (1991), Ruby (1995), Java and JavaScript (1995), through to Rust (2015) and Mojo (2023). Some languages flower briefly and then retreat into obscurity, while others endure for decades. It is estimated that over 800 billion lines of COBOL are still running operational systems in enterprises and governments today, 66 years after its release.

Large language models produce code, but they are also connected to retrieval systems, datasets, workflows, security controls and processes. The AI implementation is a potentially fragile stack of prompts, APIs, vector databases, middleware, orchestration frameworks, and monitoring tools. The final code may be in a language like Python, but it was written by an LLM rather than a human. Over time, organisations may accumulate systems whose behaviour is only partially understood, because the original prompts, workflows, model assumptions, and generated code have evolved faster than institutional knowledge. LLM-generated code turns out to be harder to maintain than human-written code: a CodeRabbit study found 1.7 times more major issues overall, and an April 2026 Chinese study of over 300,000 AI-authored commits to GitHub found that AI code creates significantly more technical debt.

Unlike traditional software, AI systems are probabilistic rather than deterministic. Their outputs can change over time, and model behaviour may drift after updates. Training data quality can degrade, and regulatory requirements evolve. Infrastructure costs, such as the cost of inference, can fluctuate dramatically. As organisations layer AI systems on top of one another, complexity compounds rapidly. These systems can perform well in pilots and controlled workflows, yet become brittle once exposed to changing data, policies, prompts, and downstream dependencies. AI-generated code also tends to be verbose, which in turn causes maintenance issues.

Leading models such as Claude and ChatGPT may be dominant now, but history suggests that today’s leading technology does not always translate into lasting hegemony. When I was a programmer, I wrote code in PL/1 (designed by IBM), a 4GL called Nomad, and an IBM programming language called ADF. These are not exactly household names today, though PL/1 still exists on mainframe computers. The LLMs that are household names now may not survive forever, or even an economic shock, especially since OpenAI and Anthropic are burning cash at an impressive rate. No one can be sure which of the current crop of LLMs will be dominant, or even around, in a decade.

Government regulation tends to reinforce technological inertia. Once governments establish compliance standards around AI systems, organisations may become reluctant to upgrade models frequently because of validation costs and legal risk. Healthcare, banking, insurance, and defence already operate under heavy regulatory constraints, and AI systems deployed in these sectors may require extensive auditing, explainability controls, documentation, and risk assessments. Organisations may respond by freezing AI environments for long periods, creating “approved” enterprise AI stacks that persist for years despite rapid advances elsewhere. The picture may be quite different in faster-moving industries, or in those selling products to consumers rather than businesses.

This means that switching costs for an enterprise may be high. Sure, you can in theory swap out Claude for DeepSeek or Qwen or whatever, but what would this look like in practice?

AI systems are often embedded directly into knowledge work, so replacing one LLM with another may disrupt not only infrastructure but also workflows and the “institutional memory” of an organisation. For now, companies are mostly investing in AI to build new applications, while those mountains of COBOL code are still there, booking airline flights and collecting your taxes. Around two thirds of enterprise IT budgets go on support and maintenance, and this will remain true as today’s shiny new AI-coded systems become tomorrow’s operational systems. A large market is likely to develop to handle the maintenance, security, optimisation and integration of ageing AI systems, with AI-related jobs in governance, observability, compliance, infrastructure optimisation, synthetic data remediation, hallucination containment, and related maintenance work.

Companies today need to insulate themselves as far as possible. Rather than embedding a specific model directly into workflows, they should build a layer of abstraction that isolates LLM specifics behind stable interfaces, so that a new LLM can be swapped in with less disruption. A tool like LiteLLM can help here, since it normalises the APIs of many providers. Proprietary behaviours of a particular model should be avoided unless absolutely necessary. Prompts should be versioned, stored centrally and parametrised. Reasoning should be separated from code generation as far as possible. Fine-tuning should similarly be avoided where possible, as it increases vendor lock-in. Business logic should be kept outside the LLM, just as you should avoid hard-coding as far as possible in old-fashioned human code. Enterprises should mandate policies that insulate their applications as much as possible, and ensure that these rules and policies are followed.
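To make this insulation advice more concrete, here is a minimal Python sketch, assuming a simple chat-style role/content message format. Every name in it (`ChatModel`, `EchoModel`, `PromptRegistry`, `refund_allowed`, the vendor keys and the refund rule itself) is hypothetical and invented for illustration. `EchoModel` is a stand-in that simply echoes the prompt; a real adapter would wrap a vendor SDK or a normalising library such as LiteLLM. The sketch shows three of the ideas above: application code depends only on a stable interface, prompts live in one central store with explicit versions and parameters, and the business rule sits in ordinary code rather than inside a prompt.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable, Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Message:
    """Provider-neutral chat message (assumed role/content shape)."""
    role: str
    content: str


class ChatModel(Protocol):
    """The stable interface that application code depends on."""

    def complete(self, messages: List[Message]) -> str: ...


class EchoModel:
    """Hypothetical stand-in adapter. A real adapter would translate Message
    objects into one vendor's API (or a normalising library such as LiteLLM)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, messages: List[Message]) -> str:
        # Echo the last message so the sketch runs without any network calls.
        return f"[{self.name}] {messages[-1].content}"


# Model choice is configuration, not code: swapping vendors means adding an
# adapter and changing a key, not rewriting every workflow.
MODEL_FACTORIES: Dict[str, Callable[[], ChatModel]] = {
    "vendor-a": lambda: EchoModel("vendor-a"),
    "vendor-b": lambda: EchoModel("vendor-b"),
}


def get_model(name: str) -> ChatModel:
    return MODEL_FACTORIES[name]()


class PromptRegistry:
    """Central, in-memory store of versioned, parametrised prompts."""

    def __init__(self) -> None:
        self._prompts: Dict[Tuple[str, str], Template] = {}

    def register(self, name: str, version: str, text: str) -> None:
        self._prompts[(name, version)] = Template(text)

    def render(self, name: str, version: str, **params: str) -> str:
        # substitute() raises KeyError if a required parameter is missing.
        return self._prompts[(name, version)].substitute(**params)


def refund_allowed(amount: float, days_since_purchase: int) -> bool:
    """Hypothetical business rule, kept in ordinary code rather than in a prompt."""
    return amount <= 500 and days_since_purchase <= 30


def draft_refund_reply(model: ChatModel, registry: PromptRegistry,
                       customer: str, amount: float, days: int) -> str:
    # The decision is made deterministically; the LLM only drafts the wording.
    decision = "approved" if refund_allowed(amount, days) else "declined"
    prompt = registry.render("refund_reply", "v2",
                             customer=customer, decision=decision)
    return model.complete([Message(role="user", content=prompt)])


if __name__ == "__main__":
    registry = PromptRegistry()
    registry.register("refund_reply", "v2",
                      "Tell $customer politely that their refund was $decision.")
    for key in ("vendor-a", "vendor-b"):
        print(draft_refund_reply(get_model(key), registry, "Ana", 120.0, 10))
```

Under these assumptions, moving to another provider means writing one new adapter and changing a configuration key, while a revised prompt can be registered as a new version alongside the old one and rolled out deliberately.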

In years to come we will have to deal with decades of accumulated AI dependency. The future may be less of a race to create new AI models, and more a challenge of living with the legacy of AI. The next great software legacy problem may not be COBOL, but the accumulated dependencies, governance rules, and hidden assumptions of AI systems built today.
