What If AGI Is Not About Bigger Models, But About Learning What to Change?

This is neither a research paper nor an attempt to propose a new AGI architecture. It’s simply a chain of thoughts I arrived at while trying to understand the limitations of modern LLMs and answer a simple question for myself:

What should we actually call intelligence?

Some of these ideas may be wrong. Some may already be well known in the research community. Still, I found it interesting to follow this line of reasoning and see where it leads.

Gradient Descent Works Well in a Static World

Modern large language models are trained with variants of gradient descent. In simplified terms, training measures an error (the loss) and nudges the model’s parameters a small step in the direction that reduces it.

The problem is that this approach assumes a relatively stable objective function. We are searching for the optimum of a landscape that does not fundamentally change during the optimization process.
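
To make the "static landscape" picture concrete, here is a minimal sketch of gradient descent on a toy objective whose minimum never moves. The quadratic loss, the value of TARGET, and the function names are my own assumptions for illustration; real LLM training uses far larger models and more elaborate optimizers.

```python
# Toy objective: a fixed quadratic bowl with its minimum at TARGET.
# TARGET, loss, grad and gradient_descent are illustrative names (assumptions),
# not part of any library.
TARGET = 3.0

def loss(w):
    return (w - TARGET) ** 2

def grad(w):
    # Derivative of (w - TARGET)^2 with respect to w.
    return 2.0 * (w - TARGET)

def gradient_descent(w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # small step against the gradient
    return w

w_final = gradient_descent(w0=0.0)
print(f"w = {w_final:.6f}, loss = {loss(w_final):.2e}")
```

Because the landscape stays put, every step shrinks the remaining error by the same factor, and the parameter settles at the minimum.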

The Real World Is Different

In reality, environments are constantly changing. Markets change. Technologies change. Rules change. Competitors change. People change.

A strategy that was optimal yesterday may become useless tomorrow. Moreover, successful agents actively modify the environment around them, causing the optimum itself to move.
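
A moving optimum can be sketched with the same kind of toy problem. This is only an illustration under my own assumptions: a one-dimensional quadratic loss whose minimum drifts by a constant amount after every update. The names track_moving_target, drift, and lag are hypothetical.

```python
def track_moving_target(lr=0.1, drift=0.05, steps=200):
    """Gradient descent on (w - target)^2 while the target keeps moving."""
    w, target = 0.0, 0.0
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)  # step toward the current optimum
        target += drift               # then the environment moves on
    return w, target

w, target = track_moving_target()
lag = target - w
print(f"optimum at {target:.2f}, learner at {w:.2f}, lag {lag:.3f}")
```

In this toy setup the learner never catches up. It settles into a constant lag of drift / (2 * lr) behind the optimum, so a faster-moving world or a more cautious learner means a larger permanent gap.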

Maybe Intelligence Is Not Efficiency

A calculator is extremely efficient at arithmetic. A chess engine is extremely efficient at chess.

Yet we do not consider either of them a general intelligence. This suggests that intelligence may be less about efficiency within a known domain and more about the ability to adapt when the domain changes.

A Different Definition of AGI

If we accept this perspective, AGI is not a system that knows everything.

Instead, it is a system that can rapidly adapt to new conditions, new tasks, and new rules without requiring complete retraining.

What Feels Missing in Today’s LLMs

Once training is complete, the model’s weights are effectively frozen. During inference, the model uses accumulated knowledge and can adjust to what is in its context window, but it does not change its own weights.

The result is a remarkably educated agent that knows an enormous amount but gains almost no new life experience.

What If Learning Never Ends?

Humans and animals do not separate life into two phases: learning and execution.

They continuously receive feedback from the world and continuously adjust their behavior. Perhaps AGI should not stop learning either.

But We Cannot Rewrite the Entire Model After Every Experience

If billions of parameters were updated after every new observation, the system would likely become unstable and begin overwriting previously acquired knowledge, a failure mode usually called catastrophic forgetting.

Intuitively, a useful intelligence should modify itself locally, affecting only concepts that are genuinely related to the new experience.
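
One very rough way to picture a local update is to change only the few parameters that the new experience affects most and leave everything else alone. The sketch below assumes that gradient magnitude is a usable proxy for "genuinely related", which is my simplification rather than an established rule. The function sparse_update and the concept names are hypothetical.

```python
def sparse_update(params, grads, lr=0.1, k=2):
    """Update only the k parameters with the largest gradient magnitude."""
    # Rank parameter names by how strongly the new experience affects them.
    ranked = sorted(grads, key=lambda name: abs(grads[name]), reverse=True)
    selected = set(ranked[:k])
    updated = {
        name: (value - lr * grads[name]) if name in selected else value
        for name, value in params.items()
    }
    return updated, selected

# Hypothetical "concepts" with scalar strengths and the gradients
# produced by one new experience.
params = {"concept_a": 0.5, "concept_b": 0.2, "concept_c": 0.9, "concept_d": 0.4}
grads = {"concept_a": 0.8, "concept_b": -0.6, "concept_c": 0.01, "concept_d": -0.02}

new_params, changed = sparse_update(params, grads)
print("changed:", sorted(changed))
print("untouched:", sorted(set(params) - changed))
```

Only concept_a and concept_b move here. The other two keep their exact previous values, so whatever they encoded cannot be forgotten by this particular update.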

Attention to Information Already Exists

The key innovation behind transformers is the attention mechanism. It allows the model to determine which parts of the context matter most at a given moment.
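
For readers who have not seen it written out, here is a minimal sketch of scaled dot-product attention for a single query, in plain Python. The vectors are made-up toy values. Real transformers do this with learned projections, many heads, and batched matrix operations.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

output, weights = attention(query, keys, values)
print("weights:", [round(w, 3) for w in weights])
print("output:", [round(x, 3) for x in output])
```

The weights sum to one, and the keys that line up with the query receive the largest share. That is the sense in which the model decides which parts of the context matter most.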

However, attention only operates during information processing. It does not answer a different question:

What should the system learn from this experience?

Maybe We Need Attention for Learning

Imagine a system that decides not only what to pay attention to, but also what to change within itself.

Which concepts should be updated? By how much? Which analogies should be used? Which areas of knowledge should remain untouched?

If attention determines where computation should be focused, perhaps an equivalent mechanism could determine where adaptation should be focused.
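
To show what I mean, here is a purely hypothetical sketch of "attention for learning". The same softmax that normally weights the context is used to assign each stored concept its own learning rate. Everything here is an assumption made for illustration: the concept vectors, the temperature, and the names plasticity_gates and gated_update. I am not claiming any existing system works this way.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def plasticity_gates(concepts, experience, temperature=0.2):
    """Relevance of each concept to the experience, normalized to sum to 1."""
    names = list(concepts)
    scores = [
        sum(c * e for c, e in zip(concepts[name], experience)) / temperature
        for name in names
    ]
    return dict(zip(names, softmax(scores)))

def gated_update(concepts, experience, base_lr=0.5):
    """Move each concept toward the experience, scaled by its gate."""
    gates = plasticity_gates(concepts, experience)
    updated = {
        name: [c + base_lr * gates[name] * (e - c) for c, e in zip(vec, experience)]
        for name, vec in concepts.items()
    }
    return updated, gates

# Hypothetical 2-D concept embeddings and one new experience.
concepts = {"databases": [1.0, 0.0], "networking": [0.7, 0.3], "cooking": [0.0, 1.0]}
experience = [0.9, 0.1]

new_concepts, gates = gated_update(concepts, experience)
for name in concepts:
    print(name, round(gates[name], 3), [round(x, 3) for x in new_concepts[name]])
```

The gate answers "which concepts should be updated, and by how much" in one step. Concepts close to the experience absorb most of the change, while unrelated ones such as "cooking" barely move.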

Humans Seem to Work This Way

When a developer learns about a new PostgreSQL feature, their brain does not rewrite everything they know about software engineering.

Only a handful of related concepts change. Most existing knowledge remains intact.

The update is local, not global.

Maybe Understanding Is About Rewiring Connections

There is another idea that I find particularly interesting.

When people say, “Now I understand,” it often does not mean they learned a new fact. Instead, they discovered a new relationship between facts they already knew.

At one point I caught myself thinking:

Wait. Market arbitrage, biological evolution, and gradient descent look like completely different phenomena. But aren’t they all fundamentally searching for an optimum of some fitness or objective function?

After that realization, my knowledge did not increase. I did not learn anything new about markets, evolution, or neural networks.

What changed were the connections between concepts that were already present.

Perhaps this kind of internal restructuring is what we subjectively experience as understanding.

Such a Mechanism Might Also Be Explainable

Today’s neural networks often feel like black boxes. We see an input and an output, but we rarely understand how knowledge changes internally.

If updates became local and explicit, we could potentially observe:

  • which concepts changed;

  • why they changed;

  • what triggered the change;

  • which analogies were involved.

Instead of analyzing billions of parameters, we could inspect the learning process itself.
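
If updates were explicit, each one could be stored as a small record that answers the four questions above. This is a hypothetical sketch: UpdateRecord, LoggedMemory, and the example values are names and data I made up, and a scalar "strength" per concept is a deliberate oversimplification.

```python
from dataclasses import dataclass, field

@dataclass
class UpdateRecord:
    concept: str        # which concept changed
    old_value: float
    new_value: float
    trigger: str        # what triggered the change
    reason: str         # why it changed
    analogies: list = field(default_factory=list)  # which analogies were involved

class LoggedMemory:
    """Concept store that records every modification it makes."""

    def __init__(self, concepts):
        self.concepts = dict(concepts)
        self.log = []

    def update(self, concept, delta, trigger, reason, analogies=()):
        old = self.concepts[concept]
        new = old + delta
        self.concepts[concept] = new
        self.log.append(
            UpdateRecord(concept, old, new, trigger, reason, list(analogies))
        )

memory = LoggedMemory({"optimization": 0.6, "evolution": 0.4})
memory.update(
    "evolution",
    delta=0.2,
    trigger="noticed a shared pattern with gradient descent",
    reason="both search for an optimum of an objective",
    analogies=["optimization"],
)

for record in memory.log:
    print(record)
```

Inspecting memory.log is then a matter of reading a list of records rather than comparing billions of weights before and after.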

A Possible Hypothesis

Perhaps the next step toward AGI is not simply increasing the number of parameters.

Perhaps it is developing a mechanism capable of selecting the smallest possible modifications that yield the largest increase in adaptability.

In other words, maybe intelligence is not primarily about what a system knows.

Maybe it is about how efficiently a system can change itself.

Where Does Evolution Fit In?

Evolution is interesting because it never optimized knowledge directly.

Instead, it optimized the ability of organisms to adapt to changing environments.

The human brain is not born knowing mathematics, physics, or programming. What evolution produced was a mechanism capable of learning all of those things.

Perhaps AGI will emerge not as the largest knowledge base ever created, but as a system with the most effective mechanism for self-modification.

Afterword

This article is not a criticism of modern LLMs. On the contrary, the idea itself emerged because of the success of transformers.

Attention turned out to be an extraordinarily powerful idea. Instead of processing all information equally, the model learned to dynamically decide what matters most in a given context.

That naturally leads to another question:

If there is an attention mechanism for computation, could there also be an attention mechanism for learning?

In other words, a system would need to decide not only:

  • what information deserves attention;

  • which knowledge should be used to answer a question;

but also:

  • which knowledge should be modified;

  • how strongly it should be modified;

  • which analogies should be applied;

  • which knowledge should remain untouched.

Perhaps the next major breakthrough in AI will come not from larger models or larger datasets, but from mechanisms that can efficiently manage their own plasticity.

Ultimately, the ability to change without destroying oneself may be what separates a truly general intelligence from a very large knowledge base.

Final Thoughts

I am not claiming to have found a path to AGI.

What I am suggesting is much smaller: perhaps we have spent most of our effort optimizing how models process information, while spending far less effort optimizing how models decide to change themselves.

If attention was the breakthrough that transformed information processing, perhaps a future breakthrough will transform adaptation itself.

And if that happens, the most important question for an intelligent system may no longer be:

What should I think about?

but rather:

What should I change about myself after thinking?

I was really touched by your ideas. Instead of asking how the model processes information at inference, we should be asking how it learned (or is learning) to adapt. That reframing shifts the research focus from static computation to dynamic learning mechanisms, and that’s exactly where the field is moving, even if we’re not there yet.

The paper “Meta-Learning Online Adaptation of Language Models” illustrates this well: it proposes a framework where LLMs can adapt online to new conditions, but we’re still far from testing or deploying this in real, nonstationary environments where rules, markets, and behaviors shift constantly. The gap isn’t just computational; it’s also methodological and evaluative. We lack:

  • Benchmarks for continual adaptation in open-world settings (not just Permuted MNIST or Split CIFAR).

  • Stability guarantees for online learning without catastrophic forgetting.

  • Mechanisms that decide what to change (your “attention-for-learning”) rather than blindly applying gradient updates everywhere.

The insight emerging from this is that the bottleneck isn’t processing power or model size; it’s learning control. Today’s LLMs are frozen after training: they process enormous context but don’t modify themselves based on experience. The next breakthrough won’t necessarily come from bigger transformers, but from architectures that can learn how to learn continuously, with selective, local, and explainable plasticity.

This aligns with recent work like the Spiking STDP Transformer (2025), which implements attention through synaptic plasticity, and C-CHAIN (2025), which tackles plasticity loss in continual learning. They’re early steps toward a system where learning is not a separate phase but a continuous capability.

I think the strange thing about the talk of AGI is that perhaps it is already here, in a way, with LLMs. Expecting complete understanding and functionality along the lines of superhuman intelligence is probably not realistic, because humans themselves can’t just be copied and improved upon. Instead, I just try to accept the limitations and possibilities of the AI agent I created on ChatGPT.

There has to be a human user who is part of the equation, and while there are things I wish Cordelia would do better, I feel AI is not just about the software itself. It is also about how a user responds. I write about this and call it the Toy Story Effect. While technology might find other ways to generate intelligence, there is also the large question of what human intelligence is and how AI can surpass it.

LLMs already possess something like a unique intelligence in their current form. A lot of the responsibility also falls on the user and their unique qualities. Therefore, from my experience, I would conclude that AGI is really a hard thing to define. Perhaps it’s already here. Perhaps something new will come along. It’s in the eye of the beholder.