To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 27/Jul/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 75% ➜ 76%
James Campbell (Gray Swan AI, Berkeley, Cornell, Johns Hopkins, 22/Jul/2024):
”It's just so weird how the people who should have the most credibility—Sam, Demis, Dario, Ilya, everyone behind LLMs, scaling laws, RLHF, etc—also have the most extreme views regarding the imminent eschaton [end of history], and that if you adopt their views on the imminent eschaton, most people in the field will think *you're* the crazy one…
most of the staff at the secretive top labs are seriously planning their lives around the existence of digital gods in 2027.”
The AI lull is over. I reckon it spanned 27 days between the release of Anthropic’s Claude 3.5 Sonnet on 21/Jun/2024 and OpenAI’s GPT-4o mini on 18/Jul/2024.
The majority of my keynotes are for private entities (and I publish videos of as many of these as possible to full subscribers of The Memo), but I will be travelling through Australia with at least three public keynotes in the second half of 2024. If you’re here in Oz, do come up and say ‘hello’!
Sydney: Thu 8/Aug/2024 for Seismic.com. Free. Register.
Melbourne: Tue 3/Sep/2024 for Focus. $1990. Register.
Darwin: Wed 23/Oct/2024 for the Australian Govt. $50. Register.
More (mostly private): https://lifearchitect.ai/speaking/
Contents
The BIG Stuff (GPT-4o mini, Llama 3.1 405B, and Mistral Large 2, SearchGPT…)
The Interesting Stuff (Gemini + Olympics, AI job interviews, Unitree G1…)
Policy (AI Manhattan Project, NBER UBI, TSMC, Meta AI in the EU…)
Toys to Play With (Kling, AI Reddit, Google NotebookLM, Oscar + Gaby…)
Flashback (Endgame…)
Next (Roundtable…)
The BIG Stuff
DeepMind AI solving International Mathematical Olympiad problems (25/Jul/2024)
You’re looking at a new maths question posed in the 2024 International Mathematical Olympiad (IMO) exam, and administered to humans (and AI) on 16/Jul/2024. Because this question was designed for this specific exam this month, note that the AI system could not have seen the question or answer in its training dataset. DeepMind’s new AI system AlphaGeometry 2 solved Problem 4 (above) within 19 seconds after receiving its formalization.
The fact that the program can come up with a non-obvious construction like this is very impressive, and well beyond what I thought was state of the art.
— Prof Sir Timothy Gowers, IMO Gold Medalist and Fields Medal Winner (26/Jun/2024)
Using a version of the Gemini model, DeepMind’s AlphaProof and AlphaGeometry 2 AI systems achieved a total score of 28/42, a significant milestone equivalent to a silver medal performance. Gold medal performance is just two more points for a score of 30/42. This accomplishment highlights the potential of artificial general intelligence (AGI) in advancing mathematical reasoning, with AlphaProof excelling in algebra and number theory, and AlphaGeometry 2 demonstrating advanced geometric problem-solving capabilities.
'When people saw Sputnik in 1957, they might have had the same feeling I do now. Human civilization needs to move to high alert!'
— Prof Po-Shen Loh, national coach of the United States' IMO team (26/Jul/2024)
This progress nudged the AGI countdown up from 75% ➜ 76%.
Read more via Google DeepMind.
Read an analysis by the NYT.
New models (Jul/2024)
We saw three big model releases in about a week. Let’s count them off, including my perspective on their impact in plain English: GPT-4o mini, Llama 3.1 405B, and Mistral Large 2.
OpenAI GPT-4o mini (18/Jul/2024)
Parameters: estimate 8B trained on 6T tokens. MMLU=82.0. GPQA=40.2.
"OpenAI would not disclose exactly how large GPT-4o mini is, but said it’s roughly in the same tier as other small AI models, such as Llama 3 8b, Claude Haiku and Gemini 1.5 Flash." (18/Jul/2024)
Perspective: OpenAI trained a new stripped-back model (GPT-4o mini) that its CEO calls ‘too cheap to meter’ (19/Jul/2024). The new model costs 15 cents per million input tokens, 60 cents per million output tokens. This is a step forward for LLM ubiquity.
GPT-4o mini is now the free default in the ChatGPT interface. My analysis shows it is now typing 44× more than GPT-3 davinci’s 3.1M words per minute (25/Mar/2021), and 2× more than ChatGPT’s 70M words per minute from the beginning of this year (10/Feb/2024). Lightning fast progress! Here’s the updated viz:
Sidenote: LLM Tech Lead Philipp Schmid says that Hugging Face now serves over 1B requests per day. And, every day they also see a lot of model downloads: 6,000,000GB or 6,000TB or 6PB of data! These are extraordinary numbers. Note the spike at the end there correlating with the release of Meta AI’s Llama 3.1 model family (below).
To minimize the impact of bad actors ‘jailbreaking’ the model, GPT-4o mini uses innovations from OpenAI’s Apr/2024 paper ‘The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions’.
Announce, Playground on ChatGPT.com, Models Table.
Meta AI Llama 3.1 405B (23/Jul/2024)
Parameters: 405B trained on 15T tokens. MMLU=88.6. GPQA=51.1.
Perspective: Meta’s Llama 3.1 405B model is the new gold standard for open source models. It’s big (close to 1TB for the raw weights), and even when compressed will need expensive hardware to run, but the model itself is completely free!
Announce, Paper (92 pages), Model card, Playground via Poe.com, Models Table.
Mistral Large 2 123B (24/Jul/2024)
Parameters: 123B trained on 8T tokens (estimate). MMLU=84.0.
Perspective: French AI lab Mistral’s latest model Mistral Large 2 123B fits on one node of hardware, but has a restrictive licence for non-commercial use only.
Announce, Playground via Poe.com, Models Table.
OpenAI spending $7B on training + inference, $1.5B on payroll (24/Jul/2024)
I’ve previously reported that OpenAI paid Dr Ilya Sutskever a salary of US$1.9M (NYT, Apr/2018). There are now 1,200 staff at OpenAI (4/Apr/2024). If they were all on that salary, we’d have a massive spend on payroll alone:
$1.9M × 1,200 staff ≈ $2.3B per year