A conceptual illustration of the fragility of AI reasoning. A glowing digital brain labeled 'LLM' rests precariously on top of a stack of stone blocks labeled 'Benchmarks', 'MMLU-Pro', 'Math Olympiad', and 'Coding'. The stack is unstable and crumbling. Small feathers labeled 'Rewording', 'Context Change', and 'Phrasing' gently touch the stack, causing it to crack and wobble, illustrating how tiny changes cause failure. On the right, a person with a confused expression looks at a tablet that reads 'FAILURE'.