Updated as of February 9, 2025
AlphaGeometry2 (AG2), part of the system that achieved silver-medal standard at IMO 2024 last July, has now surpassed the average gold medalist in solving Olympiad geometry problems. It solves 84% of all IMO geometry problems from the last 25 years, compared to 54% previously.
February 7, 2025
OpenAI's Deep Research now scores 26% on Humanity’s Last Exam, a dataset of 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. Other state-of-the-art AIs get <10% accuracy and are highly overconfident.
February 2, 2025
A new randomized controlled trial of students using GPT-4 as a tutor in Nigeria found that 6 weeks of after-school AI tutoring produced the equivalent of 2 years of typical learning gains, outperforming 80% of other educational interventions. It helped all students, especially girls who were initially behind.
January 16, 2025
OpenAI introduced the o3 model, which demonstrated groundbreaking reasoning capabilities. It scored 88% on the ARC-AGI benchmark and 25% on Epoch AI's FrontierMath benchmark, on which no other AI model had scored more than 2%.
December 21, 2024
o1-preview is far superior to doctors on reasoning tasks, and it's not even close, according to OpenAI's latest paper. The AI reaches ~80% versus ~30% for doctors on the 143 hard NEJM CPC diagnoses. It's dangerous now to trust your doctor and NOT consult an AI model.
December 17, 2024
Tesla has officially started rolling out FSD (Supervised) version 13.2 to owners (non-employees), featuring a 5-6x improvement in miles between necessary human interventions vs FSD v12.5.4.
December 01, 2024