Confidently Wrong: The Fragile World of AI
Large language models (LLMs) are known for their fluent, confident answers, and they perform increasingly well on a wide range of tests and benchmarks. For example, LLMs score highly on MMLU-Pro, a benchmark specifically designed to challenge them with around 12,000 questions spanning a broad range of subjects. These days, LLMs also score well on…






