A sign that most of us barely understand how LLMs work is the trend for “prompt engineering”, where merely rewording a prompt to a large language model (LLM) can produce a better answer. There is lots of advice on this subject. Be as detailed as possible in your prompt, give context or background to the LLM, and ask for structure in the answers. You might also be specific about what to avoid (e.g. “no analogies”) and explain the goal of the prompt, the intended audience, the desired length, etc. You may ask the LLM to play a role, e.g. “you are a senior data architect…”.
So, for example, instead of a prompt “summarise this report”, you might prompt “Summarise this report in 3 bullet points for a non-technical executive. Each bullet should state one key finding and one implication for business strategy, using plain language.” The latter should get you a better result. This all sounds a little like advice on how to talk to a young child, but it works. Several studies have examined this area and concluded that these techniques can be quite effective.
All well and good, but it seems that we have all been missing a very basic form of prompt engineering that takes advantage of the fact that LLMs generate one token at a time and process your input prompt causally, left to right. Early tokens are processed before the model has seen the full question and any constraints. So, if the prompt lists options first and the question later, the model may struggle to understand what is going on when it first reads those options. The trick is simply to repeat the entire prompt. Whatever your prompt was: “x”, just copy and paste it to be “x x”. This gives the model a chance to process the query again after it has read the first copy. Any tokens that were confusing in the original query can be represented better in the second pass, now that the model knows what matters. Does this sound absurd?
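The mechanics could hardly be simpler to script. Here is a minimal sketch; the function name and the example query are my own illustration, and the `ask` call is a stand-in for whichever LLM client you actually use:

```python
def repeat_prompt(prompt: str, times: int = 2) -> str:
    """Duplicate the prompt so the model re-reads the query
    after it has already seen the whole thing once."""
    return " ".join([prompt.strip()] * times)

# Example: a retrieval-style query that lists options before the question,
# the kind of layout that can confuse a causal, left-to-right model.
query = (
    "Options: (a) Ada Lovelace (b) Alan Turing (c) Grace Hopper. "
    "Question: who wrote the first published computer algorithm?"
)

doubled = repeat_prompt(query)
# 'doubled' is simply the query twice, separated by a space;
# you would send it to your LLM client in place of the original, e.g.
#   answer = ask(doubled)   # hypothetical client call
```

Nothing about the query changes except that the model gets a second, fully-informed pass over it.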
It turns out not to be. In February 2026, Google researchers published a paper that reveals something about how LLMs work: the simple technique of repeating the prompt significantly improves the quality of results, especially for retrieval-style queries.
The researchers tested seven different models across seven different benchmarks, with 67% of queries yielding better answers and none yielding worse. One model (Gemini Flash-Lite) went from 21% accuracy to 97% accuracy on a name-retrieval task. Basically, queries of the form “question context” perform better than “context question”. Modern transformers do use self-attention over the whole input sequence, but training and decoding are still causal (left to right), so ordering and first impressions matter. In their tests, layouts that would normally confuse a model (showing the options before the question) were dramatically stabilised: with the prompt repeated, even the weaker “context question” ordering could perform on a par with “question context”.
It turns out that reasoning models actually already effectively do this, so they don’t get much or any improvement from this technique. Yet reasoning models typically take ten times the token processing of a regular LLM. There are some limitations: prompt repetition does not reliably improve things like creative writing quality, step-by-step math solutions or general conversational quality. There are, however, major gains in structured, non-reasoning, classification-style tasks, and small gains in reasoning tasks.
The most famous paper in AI is the 2017 “Attention Is All You Need” paper by several Google researchers, which described the transformer architecture that led to the development of modern LLMs such as GPT (the generative pre-trained transformer) and ChatGPT, the tool built on it. Simple repetition of a prompt may not be quite as transformative as that paper, but it is very useful nonetheless.