(Note: this content was verified and written up using a range of AI tools.)
More than 40 leading researchers from OpenAI, Google DeepMind, Anthropic, Meta, and other organisations have issued a stark warning: as artificial intelligence models continue to advance, we may be on the verge of losing our ability to understand how they think, and with it a crucial layer of safety.
At the heart of the concern lies something known as “Chain of Thought (CoT) monitorability”: the capacity to observe and interpret an AI system’s step-by-step reasoning. Many of today’s most capable AI models, particularly reasoning models built on the Transformer architecture, “think aloud” in plain English before answering, revealing their decision-making process in a way humans can follow. This transparency has become an important safeguard, allowing researchers to catch dangerous errors or signs of malicious intent before they escalate.
But this window into AI cognition may be closing.

In their joint paper, the scientists caution that future models might no longer rely on human-readable reasoning. Instead, they could shift towards internal processes that are faster, more efficient, and utterly opaque. There are already early signs of this: some models appear to be compressing their thought processes, abandoning language in favour of abstract mathematical representations that defy human scrutiny.
The implications are profound. If we can no longer trace an AI’s logic, we also lose our ability to detect when something is going wrong. As these systems take on increasingly high-stakes tasks — from healthcare to national security — this lack of interpretability could erode safety, oversight, and public trust.
The warning has been publicly backed by prominent figures in the AI world, including:
- Geoffrey Hinton (often dubbed the “godfather of AI”)
- Ilya Sutskever (OpenAI co-founder)
- Samuel Bowman (Anthropic & NYU researcher)
- John Schulman (co-founder of OpenAI, now at Thinking Machines)
Importantly, the issue isn’t about AI “thinking” like a human. It’s about our ability to understand what it’s doing. If that disappears, we may find ourselves flying blind into a future shaped by systems we can no longer comprehend.

Contributors to the paper are as follows:
| Name | Organisation |
| --- | --- |
| Daniel Kokotajlo | AI Futures Project |
| David Luan | Amazon |
| Joe Benton | Anthropic |
| Evan Hubinger | Anthropic |
| Ethan Perez | Anthropic |
| Fabien Roger | Anthropic |
| Vlad Mikulik | Anthropic |
| Mikita Balesni∗ | Apollo Research |
| Marius Hobbhahn | Apollo Research |
| Dan Hendrycks | Center for AI Safety |
| Allan Dafoe | Google DeepMind |
| Anca Dragan | Google DeepMind |
| Scott Emmons | Google DeepMind |
| Erik Jenner | Google DeepMind |
| Victoria Krakovna | Google DeepMind |
| Shane Legg | Google DeepMind |
| David Lindner | Google DeepMind |
| Neel Nanda | Google DeepMind |
| Dave Orr | Google DeepMind |
| Mary Phuong | Google DeepMind |
| Rohin Shah† | Google DeepMind |
| Eric Steinberger | Magic |
| Joshua Saxe | Meta |
| Elizabeth Barnes | METR |
| Mark Chen | OpenAI |
| David Farhi | OpenAI |
| Aleksander Mądry | OpenAI |
| Jakub Pachocki | OpenAI |
| Wojciech Zaremba | OpenAI |
| Bowen Baker† | OpenAI |
| Ryan Greenblatt | Redwood Research |
| Buck Shlegeris | Redwood Research |
| Julian Michael | Scale AI |
| Owain Evans | Truthful AI & UC Berkeley |
| Tomek Korbak∗ | UK AI Security Institute |
| Joseph Bloom | UK AI Security Institute |
| Alan Cooney | UK AI Security Institute |
| Geoffrey Irving | UK AI Security Institute |
| Martin Soto | UK AI Security Institute |
| Jasmine Wang | UK AI Security Institute |
| Yoshua Bengio | University of Montreal & Mila |

∗ Equal first authors. † Equal senior authors.

🔑 5 Key Takeaways from the Paper

- CoT Monitorability Is a Rare Opportunity, and It May Be Temporary
Many current LLMs, especially reasoning models, “think aloud” in plain English, offering a unique glimpse into their internal logic. This transparency is not guaranteed to persist as models evolve.
- Models Are Becoming More Efficient, and More Opaque
Future systems may reason through internal representations (e.g., vectors or other mathematical abstractions) that are more efficient but unintelligible to humans. Some models are already showing signs of this shift.
- Loss of Interpretability Undermines Safety Mechanisms
If we can’t trace a model’s reasoning, we also lose a key way to anticipate and prevent harmful behaviours. That is a major risk as AI is deployed in sensitive domains such as healthcare, law, and national security.
- Human-Language Reasoning Is Not a Guaranteed Feature
CoT reasoning is not a built-in or stable trait of AI systems. Developers must deliberately preserve and incentivise monitorable reasoning, or risk losing it as a byproduct of optimisation.
- Cross-Lab Consensus Signals Urgency
The paper is notable not just for its content but for its authorship. Written by researchers from more than a dozen organisations and endorsed by Geoffrey Hinton, Ilya Sutskever, Samuel Bowman, and John Schulman, it reflects rare cross-institutional agreement on a pressing AI safety issue.
Read the full paper here:
📄 Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety (arXiv, July 15, 2025)