Large language models “hallucinate” because training and evaluation reward confident guesses over admitting uncertainty, making errors statistically inevitable: under binary pass/fail grading, an abstention earns the same zero as a wrong answer, so a model that always guesses strictly dominates one that honestly says “I don’t know.” To curb this, benchmarks must be redesigned so models aren’t penalized for expressing uncertainty, fostering more trustworthy AI.
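The incentive gap is easy to see with a toy expected-score calculation. The sketch below is illustrative, not from the source: it assumes the model’s stated confidence `p` equals its true probability of being correct, and compares plain binary accuracy against one commonly proposed alternative in which a wrong answer costs `t/(1-t)` points for a confidence threshold `t` (the names `binary_score`, `penalized_score`, and the value `t=0.75` are hypothetical choices for the example).

```python
# Illustrative sketch (assumption, not the source's method): expected score of
# "guess" vs. "abstain" under two grading schemes, where p is the model's true
# probability of answering correctly.

def binary_score(p: float) -> tuple[float, float]:
    """Binary accuracy: 1 point if correct, 0 for a wrong answer or abstention."""
    guess = p * 1.0 + (1 - p) * 0.0   # expected score of guessing
    abstain = 0.0                     # abstaining never earns credit
    return guess, abstain

def penalized_score(p: float, t: float) -> tuple[float, float]:
    """Hypothetical redesign: wrong answers cost t/(1-t) points, abstentions 0.
    Guessing then only pays off in expectation when confidence p exceeds t."""
    penalty = t / (1 - t)
    guess = p * 1.0 - (1 - p) * penalty
    abstain = 0.0
    return guess, abstain

if __name__ == "__main__":
    for p in (0.2, 0.5, 0.9):
        g_bin, a_bin = binary_score(p)
        g_pen, a_pen = penalized_score(p, t=0.75)
        print(f"confidence={p:.1f}  binary: guess={g_bin:+.2f} abstain={a_bin:+.2f}"
              f"  |  penalized(t=0.75): guess={g_pen:+.2f} abstain={a_pen:+.2f}")
```

Under binary grading, guessing weakly beats abstaining at every confidence level, so a benchmark-optimizing model never declines to answer; under the penalized rule, abstaining becomes the rational choice whenever confidence falls below the threshold, which is the behavioral change the proposed benchmark redesign is meant to reward.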