The Memo - Special edition: OpenAI o1

OpenAI releases o1 reasoning model in September 2024

Sep 12, 2024

To:      US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From:    Dr Alan D. Thompson <LifeArchitect.ai>
Sent:    12/Sep/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI:     76% ➜ 81%

The BIG Stuff

OpenAI releases o1

Once again, we have this out to The Memo readers within just a few hours of model release.

Key points:

This is an extended reasoning model that ‘thinks’ before responding. The ‘thinking’ sometimes takes 20-30 seconds.
New highest MMLU score (o1=92.3 vs Claude 3.5S=88.7).
New highest GPQA score (o1=78.3 vs Claude 3.5S=67.2).
My initial testing shows this model outperforms all other models, and hits benchmark ceilings.

OpenAI o1 (reasoning model) consistently scores 100% on all ALPrompts. These are hardened prompts designed for frontier models. I hadn't expected the 2024 H2 version to be solved for a long time (before o1, no LLM available as of Sep/2024 had scored more than 2/5 on this prompt). I will be re-evaluating my life's work...

The model also hits the ‘uncontroversially correct’ ceilings on major benchmarks (GPQA Extended ceiling is 74%, MMLU ceiling is about 90%).

GPQA Diamond=78.3.
MMLU=90.8 for o1-preview, 92.3 for the final o1 model.

Here’s a visualization of the distance between o1 and other models on major benchmarks. Note that there is nowhere left to go at the top; AI has now hit the human-comprehensible ceiling across standardized testing for 'smarts':

Source: https://lifearchitect.ai/mapping/

Read the announcement: https://openai.com/index/introducing-openai-o1-preview/

Read the system card (no arch details): https://openai.com/index/openai-o1-system-card/
