Popular science article in Quanta Magazine on “Reasoning in Latent Space”

An intriguing idea. I’m not sure how this works, but there it is. We read:

When large language models (LLMs) process information, they do so in mathematical spaces, far from the world of words. That’s because LLMs are built using deep neural networks, which essentially transform one sequence of numbers into another — they’re effectively complicated math functions. Researchers call the numerical universe in which these calculations take place a latent space.

But these models must often leave the latent space for the much more constrained one of individual words. This can be expensive, since it requires extra computational resources to convert the neural network’s latent representations of various concepts into words. This reliance on filtering concepts through the sieve of language can also result in a loss of information, just as digitizing a photograph inevitably means losing some of the definition in the original. “A lot of researchers are curious,” said Mike Knoop, co-creator of one of the leading benchmarks for testing abstract reasoning in AI models. “Can you do reasoning purely in latent space?”
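The “sieve of language” point is easy to see with a toy calculation. The sketch below is my own illustration, not anything from the article or from a real model: the dimensions are made up, and the weights are random. It just shows how decoding collapses a continuous hidden vector into a single discrete token id before anything can be fed back into the model.

```python
# Toy sketch (not any specific model): decoding a latent state into a word
# collapses a rich continuous vector into a single discrete token id.
import numpy as np

rng = np.random.default_rng(0)

d_model, vocab_size = 8, 50            # assumed toy dimensions
h = rng.normal(size=d_model)           # latent (hidden) state: d_model real numbers
W_unembed = rng.normal(size=(d_model, vocab_size))  # projection to vocabulary logits
E_embed = rng.normal(size=(vocab_size, d_model))    # token embedding table

logits = h @ W_unembed                 # latent state -> scores over words
token_id = int(np.argmax(logits))      # the "sieve of language": keep only one word
h_next_input = E_embed[token_id]       # the model's next input is just that word's embedding

print("latent state carries", d_model, "real numbers;")
print("the chosen token carries only about", round(np.log2(vocab_size), 1), "bits")
```

Whatever nuance was in `h` beyond the single winning token is gone, which is the information loss the article compares to digitizing a photograph.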

Two recent papers suggest that the answer may be yes. In them, researchers introduce deep neural networks that allow language models to continue thinking in mathematical spaces before producing any text. While still fairly basic, these models are more efficient and reason better than their standard alternatives.

This article refers to the following papers:

2024-12: Training Large Language Models to Reason in a Continuous Latent Space

This is the “Coconut” model, for “chain of continuous thought” (a rough sketch of the idea follows this list).

2025-02: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
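As I understand the Coconut idea, the trick is to skip the decode step during intermediate reasoning and feed the last hidden state straight back in as the next input embedding, only producing text at the end. Below is a minimal toy loop under my own assumptions (random weights, a tanh layer standing in for a transformer block); it is not the papers’ actual architecture, just the shape of the idea.

```python
# Toy sketch of "continuous thought" (my reading of Coconut, not the papers'
# code): the last hidden state is re-injected as the next input embedding,
# so intermediate reasoning steps never pass through a discrete token.
import numpy as np

rng = np.random.default_rng(1)
d_model = 8
W = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)  # stand-in for a transformer block

def step(x):
    """One forward pass producing a new hidden state (tanh keeps the toy stable)."""
    return np.tanh(x @ W)

h = rng.normal(size=d_model)   # hidden state after reading the prompt (assumed)

# Latent reasoning: loop in the continuous space for k "thought" steps,
# feeding the hidden state back in instead of a word embedding each time.
k_thought_steps = 4
for _ in range(k_thought_steps):
    h = step(h)                # no argmax, no token: reasoning stays in latent space

# Only after this latent loop would the model decode h into actual text.
print("final latent thought vector:", np.round(h, 3))
```

The second paper’s “recurrent depth” approach is, as I read it, a related move: spend more forward passes looping in latent space at test time instead of emitting more tokens.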

We also read:

Despite these positive results, Hao believes it may take more time and research for latent reasoning models to become mainstream. Leading companies, such as OpenAI and Anthropic, are already heavily invested in existing LLM architectures. Redoing them to incorporate latent space reasoning would require heavy reengineering, so it’s unlikely they’ll adopt such techniques anytime soon.

Zettlemoyer also cautions that latent space reasoning may have its own shortcomings. Ultimately, the data that LLMs train on is based on text, and the traditional approach has been extremely successful at finding patterns in it. LLMs can learn any kind of reasoning pattern, as long as it exists in texts — ensuring that the models reason in ways that humans do. Letting LLMs reason without using words could mean they’ll work in ways that aren’t amenable to human thinking. “Moving into a continuous space could allow for all kinds of possibilities that aren’t actually going to be helpful,” Zettlemoyer said.

P.S.

There is a whole article at SEP on the philosophy of the “Language of Thought”. I should read this…