The Memo - 1/Oct/2024
15 major new model releases in September 2024, o1 Mensa member, inspecting OpenAI datasets, and much more!
To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 1/Oct/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 81%
Contents
The BIG Stuff (15 new models, o1 Mensa, AlphaChip + 16 Alpha systems…)
The Interesting Stuff (Inspect OpenAI data, Three Mile Island, Orion, robotaxis…)
Policy (EU AI Act, Korinek on AGI, UN governance report, Mark Robinson AI ad…)
Toys to Play With (AVM, Rabbit, Ive, TI-84, first AI comedy TV show…)
Flashback (Microsoft…)
Next (Roundtable…)
A recording of my latest keynote delivered to ~700 staff at an Australian government agency is now available, alongside most of my books and all the other stuff for full subscribers here:
The BIG Stuff
Exclusive: 15 new model releases in September 2024 (Sep/2024)
We’re keeping a steady rhythm of a major new large language model release about every 48 hours (a quick arithmetic check follows the list below). Here’s my exclusive list of major new models for September…
Allen AI OLMoE-1B-7B (6.9B on 5.9T tokens)
Open Language Mixture-of-Experts model. Achieves strong performance with efficient training. More...

01-ai Yi-Coder (9B on 6.2T tokens)
Specialized coding model built on Yi-34B base. Trained on code data. More...

DeepSeek-AI DeepSeek-V2.5 (236B on 10.2T tokens)
Large MoE model combining chat and coding capabilities. More...

Mistral Pixtral-12b-240910 (12B on 6T tokens)
Multimodal model designed as a drop-in replacement for Mistral NeMo 12B. More...

Jina AI Reader-LM (1.54B on 2.5T tokens)
Specialized small model for HTML-to-Markdown conversion. More...

OpenAI o1 (200B estimated)
Latest model with superhuman reasoning performance. More...

Google DeepMind DataGemma (27B on 13T tokens)
RAG/RIG model fine-tuned for Data Commons queries and statistics. More...

Microsoft GRIN MoE (60B on 4T tokens)
Gradient-Informed Mixture-of-Experts model with efficient parameter activation. More...

Alibaba Qwen2.5 (72B on 18T tokens)
Large dense model with strong performance across various tasks. More...

Google DeepMind Gemini-1.5-Pro-002 (1.5T on 30T tokens)
Large sparse MoE model with a 2M-token context window. Ridiculous name. More...

Allen AI Molmo (72B on 7T tokens)
Multimodal open language model using a ViT architecture. More...

Meta AI Llama 3.2 3B (3.21B on 9T tokens)
Efficient text model distilled from larger Llama 3 variants. More...

Meta AI Llama 3.2 90B (90B on 9T tokens)
Large multimodal model with vision capabilities. More...

AMD-Llama-135m (135M on 670B tokens)
Small language model trained on AMD hardware. More...

Salesforce SFR-LLaMA-3.1-70B-Judge (70B on 15T tokens)
Judge model for evaluating and fine-tuning other LLMs, built with Meta Llama 3 and Mistral NeMo. More...
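As promised above, here’s the quick sanity check on the ‘every 48 hours’ cadence, as a few lines of Python. The only assumption is counting September as 30 days and only the 15 major releases listed:

```python
# 15 major model releases across the 30 days of September 2024.
releases = 15
days = 30

hours_per_release = days * 24 / releases
print(f"One major release every {hours_per_release:.0f} hours")  # -> 48 hours
```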
See all 428+ models at: https://lifearchitect.ai/models-table/
The above count doesn't include the roughly 10,000 derivative models (a significant percentage of which are Llama-based) created every month, illustrated in my latest viz:
See more (with PDF, sources): https://lifearchitect.ai/models/#count
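If you want to sanity-check that ‘10,000 per month’ order of magnitude yourself, here is a minimal Python sketch against the Hugging Face Hub. It assumes the huggingface_hub client accepts the "createdAt" sort key and populates created_at on the returned ModelInfo objects; both vary by library version, so treat this as a sketch rather than a recipe:

```python
from datetime import datetime, timedelta, timezone

from huggingface_hub import HfApi  # pip install huggingface_hub

api = HfApi()
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

count = 0
# Newest first; the "createdAt" sort key and the created_at attribute
# are assumptions about the current client version -- check the docs.
for model in api.list_models(sort="createdAt", direction=-1):
    if model.created_at is None or model.created_at < cutoff:
        break  # iterated past the 30-day window; stop paging
    count += 1

print(f"Models created on the Hub in the last 30 days: {count:,}")
```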
o1 passed Mensa admissions test (at launch, 12/Sep/2024)
I didn’t call this one out as loudly as I should have earlier (and it was only recognized by Metaculus on 18/Sep/2024), so here it is laid out in plain English.
Although several earlier models achieved verbal-linguistic IQ test results far above the 98th percentile (though not full-scale IQ), o1 is the first model that would officially pass Mensa admission, based on its 2024 LSAT score: o1 scored in the 95.6th percentile, while the Mensa minimum on this test for admission is the 95th percentile.
A discussion on Metaculus notes the surprising result, arriving nearly two decades ahead of their crowd-sourced predictions (18/Sep/2024).
In Sep/2020 (after GPT-3 175B), the crowd predicted this achievement would be 22 years away in 2042. In Sep/2021 (after Google LaMDA 137B), the crowd predicted this achievement would be 10 years away in 2031. (Crowds ain’t that smart…)
Additionally, around 20/Sep/2024, o1 completed a new Dutch high school maths exam in 10 minutes, scoring 100% (paper).
“[o1] scored [76 out of] 76 points. For context, only 24 out of 16,414 students in the Netherlands achieved a perfect score. By comparison, the GPT-4o model scored 66 and 61 out of 76, well above the Dutch average of 40.63 points. The o1-preview model completed the exam in around 10 minutes…”
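To put those raw scores on one scale, here’s the percentage arithmetic in Python, using only the figures from the quote above (the quote doesn’t say which GPT-4o runs produced the two scores, so they’re labelled generically):

```python
MAX_POINTS = 76  # maximum score on the Dutch exam

scores = {
    "o1-preview": 76,
    "GPT-4o (score 1)": 66,
    "GPT-4o (score 2)": 61,
    "Dutch student average": 40.63,
}

for name, points in scores.items():
    print(f"{name}: {points}/{MAX_POINTS} = {points / MAX_POINTS:.1%}")

# Rarity of a perfect score among students:
perfect, total = 24, 16_414
print(f"Perfect scores: {perfect}/{total:,} = {perfect / total:.2%} of students")
```

That puts o1 at 100%, the two GPT-4o scores at 86.8% and 80.3%, the student average at 53.5%, and a perfect score at about 0.15% of students.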
See my table of LLM achievements.
See my o1 page: https://lifearchitect.ai/o1/
There is also a new interview and related article published by FinancialSense.com: https://www.financialsense.com/blog/21037/openais-o1-model-quantum-leap-ai-intelligence-insights-dr-alan-d-thompson
Listen (link):
How AlphaChip transformed computer chip design (26/Sep/2024)
AlphaChip by Google DeepMind has transformed computer chip design by using reinforcement learning to accelerate and optimize chip layouts, significantly reducing the time and effort required for this complex task. The method, first published in 2020, has been instrumental in designing superhuman chip layouts for Google's Tensor Processing Units (TPUs), the hardware behind AI models like Gemini. AlphaChip’s application extends beyond Google, influencing chip design across the industry and promising faster, cheaper, and more efficient chips in the future.
Read more via Google DeepMind.
Read the addendum in Nature: https://www.nature.com/articles/s41586-024-08032-5
This brings us to a current total of 16 DeepMind Alpha systems, four of them announced in Q3 2024:
AlphaGo (Oct/2015), AlphaGo Zero (Oct/2017), AlphaZero (Dec/2017), AlphaFold 1 (Dec/2018), AlphaStar (Jan/2019), AlphaFold 2 (Nov/2020), AlphaCode (Feb/2022), AlphaTensor (Oct/2022), AlphaDev (Jun/2023), AlphaCode 2 (Dec/2023), AlphaGeometry (Jan/2024), AlphaFold 3 (May/2024), AlphaProof (Jul/2024), AlphaGeometry 2 (Jul/2024), AlphaProteo (Sep/2024), AlphaChip (Sep/2024).
See them in a table: https://lifearchitect.ai/gemini-report/#alpha
The Interesting Stuff
Exclusive: OpenAI training data now inspectable (24/Sep/2024)
For the first time, OpenAI is set to provide access to its training data as part of a legal inspection related to copyright infringement claims by authors including Sarah Silverman. The authors allege their works were used without consent to train AI models like ChatGPT. The review will take place in a secure setting at OpenAI’s San Francisco office, with strict controls on access and note-taking. This case may establish crucial guidelines for the use of copyrighted material in AI development, with OpenAI likely defending its practices under fair use.
To view the OpenAI training data as part of the inspection related to the copyright case, the protocol outlines the following steps:
Location: Access to the data will be provided at OpenAI’s San Francisco office.
Security measures: The data will be viewed on a secured ‘air-gapped’ computer with no internet or network access.
Non-disclosure agreement: Anyone reviewing the data must sign a non-disclosure agreement.
Visitor protocol: Visitors must sign a visitor’s log and provide identification upon entry.
Restrictions on technology: No recording devices, including computers, cell phones, or cameras, are allowed in the inspection room.
Note-taking: A computer may be provided for limited note-taking. Handwritten notes and electronic notes in scratch files are allowed, but copying any training data into notes is prohibited.
Supervised note transfer: Lawyers for the authors will copy notes under the supervision of OpenAI representatives at the end of each day.
Read some related reporting via The Hollywood Reporter.
Read the complete filing ‘Training data inspection protocol’ (24/Sep/2024, PDF, 12 pages):
I’ve updated my ‘What’s in my AI?’ dataset analysis paper to note that OpenAI data is now inspectable: https://lifearchitect.ai/whats-in-my-ai/
Even more OpenAI drama (27/Sep/2024)
Here’s a decent summary of the recent OpenAI drama. For a visualization, I really liked this AI-inpainted image (probably via OpenAI DALL-E 2!) from Dr Yuchen Jin: