Measuring Up: AI Benchmarks
Imagine giving a robot a multiple-choice exam, then realising it already knows all the answers. When a major vendor such as OpenAI, Anthropic, or Google releases a new version of an AI model, the release usually comes with a series of claims about how well the model performs on benchmark tests. These benchmarks may…

