Updated as of August 21, 2025
Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof; it's correct.
August 21, 2025

I'm thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world's top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants!
August 12, 2025

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory.
July 22, 2025

I'm excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most prestigious math competition, the International Math Olympiad (IMO).
July 19, 2025

GPT-4.5 (when prompted to adopt a humanlike persona) was judged to be the human 73% of the time, suggesting it passes the Turing test.
April 1, 2025

Google's AI co-scientist did in hours what a team of PhDs took years to find. For the first time, AI generated a novel scientific hypothesis, one that took human researchers years to figure out, and it got it right.
February 19, 2025

AlphaGeometry2 (AG2), part of the system that achieved silver-medal standard at IMO 2024 last July, has now surpassed the average gold medalist in solving Olympiad geometry problems, achieving a solving rate of 84% for all IMO geometry problems over the last 25 years, compared to 54% previously!
February 7, 2025

OpenAI's Deep Research now scores 26% on Humanity's Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. Other state-of-the-art AIs get <10% accuracy and are highly overconfident.
February 2, 2025

New randomized, controlled trial of students using GPT-4 as a tutor in Nigeria. 6 weeks of after-school AI tutoring = 2 years of typical learning gains, outperforming 80% of other educational interventions. And it helped all students, especially girls who were initially behind.
January 16, 2025

OpenAI introduced the o3 model, which demonstrated groundbreaking reasoning capabilities. It achieved an 88% score on the ARC-AGI benchmark and scored 25% on EpochAI's Frontier Math benchmark, on which no other AI model scored more than 2%.
December 21, 2024

o1-preview is far superior to doctors on reasoning tasks and it's not even close, according to OpenAI's latest paper. AI does ~80% vs ~30% on the 143 hard NEJM CPC diagnoses. It's dangerous now to trust your doctor and NOT consult an AI model.
December 17, 2024

Tesla has officially started rolling out FSD (Supervised) version 13.2 to owners (non-employees), featuring a 5-6x improvement in miles between necessary human interventions vs FSD v12.5.4.
December 1, 2024