OpenAI’s involvement in math test development raises questions about AI benchmarking

OpenAI’s early report on its o3 model included a high score on FrontierMath, a challenging AI math test developed by Epoch AI with funding from OpenAI, a relationship that was disclosed only later. The revelation that OpenAI may have had prior access to the test problems and solutions raised concerns about the benchmark’s fairness and independence. The controversy highlights how difficult it is to evaluate AI models and raises the question of whether evolving AI benchmarks can be truly unbiased. (TechCrunch and meemi’s Shortform)