The Batch contains a wealth of information and ideas for the ML community. I miss having a dedicated discussion space for it in the DeepLearning.AI community forum, as that would be a great place to engage with articles like Dr. Ng’s proposal for a Turing-AGI Test. I believe this test could become a milestone in future AI testing protocols, and we should discuss its implications here in this community as well.
While the idea of a dynamic, judge-led test to move beyond static benchmarks is brilliant, I have a philosophical concern about its design: Can any human-designed test truly be ‘novel’ enough to avoid relying on known patterns?
Dr. Ng suggests that the judge can design any multi-day experience that is not revealed in advance. However, for a test to be structured and its results evaluable, it must be built from known domains, tools, and logical frameworks, which are precisely the kind of knowledge that permeates the training data of modern LLMs. Even a ‘new’ call-center software simulation will rely on UI patterns, language instructions, and workflow logic derived from existing human knowledge. The LLM might solve it not by learning in the moment like a human, but by recombining patterns it has already seen.
The deeper issue is this: if we accept that any testable problem is constructed from known components, then ‘passing’ might reflect sophisticated pattern matching rather than genuine situational learning and understanding. This doesn’t invalidate the test’s practical usefulness; if an AI can handle novel blends of tasks as well as a human, that’s profoundly valuable. But it may still fall short of proving human-like learning and reasoning.
Perhaps the Turing-AGI Test is less about absolute novelty and more about integration fluency: the ability to dynamically combine known concepts in new contexts under time pressure and feedback. That’s a worthy milestone, even if it doesn’t settle the philosophical AGI debate. Increasingly, it appears that this debate will be settled politically rather than academically.