OpenAI is testing a “confessions” system that trains new AI models to openly report their own mistakes, rule-breaking, uncertainties, and possible hallucinations through a second output channel. It’s not a ChatGPT feature yet, but early research suggests it could become a powerful safety tool for detecting failures that would otherwise go unnoticed.
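The core idea of a second output channel can be illustrated with a minimal sketch. Everything here is hypothetical: the class and function names are invented for illustration, and OpenAI has not published the actual interface or training details. The sketch just shows how pairing an answer with a separate list of self-reported caveats lets downstream tooling flag risky outputs automatically.

```python
from dataclasses import dataclass, field

@dataclass
class DualChannelResponse:
    """Hypothetical container: the user-facing answer plus a
    separate 'confessions' channel of self-reported issues."""
    answer: str
    confessions: list[str] = field(default_factory=list)

def flag_for_review(resp: DualChannelResponse) -> bool:
    # Route any response whose confession channel is non-empty
    # to additional scrutiny (human review, logging, etc.).
    return len(resp.confessions) > 0

clean = DualChannelResponse(answer="Paris is the capital of France.")
risky = DualChannelResponse(
    answer="The agency's 2031 budget was 4.2 billion.",
    confessions=["Uncertain: this figure may be hallucinated; no source found."],
)

print(flag_for_review(clean))  # False
print(flag_for_review(risky))  # True
```

The point of the separate channel is that the safety check never has to parse or second-guess the answer text itself; it only inspects what the model chose to disclose alongside it.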
