Hi everybody,
I’ve just stumbled upon this hugely interesting paper, [2505.03335] Absolute Zero: Reinforced Self-play Reasoning with Zero Data, about “a system that self-evolves its training curriculum and reasoning ability by using a code executor to both validate proposed code reasoning tasks and verify answers” (as per the authors) that achieves SOTA results. This seems huge, or at least very, very promising as a way to break the plateau that genAI has been hitting recently, and beyond that. To my surprise, apart from a few YouTube summaries (search “absolute zero ai”), this news does not seem to have reached the community yet. Hence this post, which is an attempt to draw attention and spark discussion. What are your thoughts? Thank you in advance for sharing them.