To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 16/Dec/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 83 ➜ 84%
ASI: 0/50 (no expected movement until post-AGI)
The early winner of The Who Moved My Cheese? AI Awards! for Dec/2024 is PlayStation CEO Hermen Hulst (‘AI… will never replace the “human touch” of games made by people’).
A lot of my analysis and visualizations turn up in interesting places, from universities to government situation rooms. I was fast asleep (yes, I do sleep!) as machine learning pioneer Prof Sepp Hochreiter (wiki) presented my old ‘Journey to GPT-4’ viz at the prestigious NeurIPS conference (wiki). Thanks to The Memo reader Roland for sending this photo through from Vancouver.
Contents
The BIG Stuff (Microsoft Phi-4, Google Gemini 2.0, Tesla Optimus, Genie 2…)
The Interesting Stuff (Amazon AGI SF Lab, OpenAI Switzerland, OpenAI 12 days…)
Policy (ElevenLabs, US AI czar, ChatGPT filters, TSMC 2nm, OpenAI + Anduril…)
Toys to Play With (Ilya, Devin, DynaSaur, Maisa KPU, OpenAI emails, X Aurora…)
Flashback (Elon Musk’s gifted school and AI’s impact on language…)
Next (Roundtable…)
The BIG Stuff
Microsoft Phi-4 14B (13/Dec/2024)
Phi-4 is 14B parameters on 10T tokens of synthetic data (715:1), and achieves incredibly high scores across benchmarks (MMLU=84.8, MMLU-Pro=70.4, GPQA=56.1).
While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation.
We’re entering the twilight zone: small models are becoming far smarter than the large models used to train them. (Sidenote: Listen to Leta AI responding to the ‘stochastic parrot’ accusation back in 2021! https://youtu.be/G57DMAJgomg)
Phi-4 represents possibly the most impressive local large language model advance this year. I will be downloading the weights and running this model locally (using Jan.ai) as soon as the model is available on HF later this month.
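If you want to try it via code rather than Jan.ai, here is a minimal sketch using Hugging Face transformers. Note the repo id microsoft/phi-4 is my assumption until the weights actually land on HF, and a 14B model in bf16 needs roughly 30GB of memory (quantised builds are far lighter):
```python
# Minimal sketch: running Phi-4 locally with Hugging Face transformers.
# Assumptions: the weights appear under a repo id like "microsoft/phi-4"
# (not yet confirmed), and you have enough GPU memory for the bf16 weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Why can synthetic data beat web-scraped data?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```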
Microsoft announce, paper, Models Table.
Watch my May/2024 demo of phi’s synthetic data: https://youtu.be/kgqDtfC_pRY
Google Gemini 2.0 Flash (11/Dec/2024)
Google has unearthed a new secret sauce that allows it to pretrain smaller models with significantly higher performance, and all without resorting to extended inference-time compute like OpenAI’s o1 reasoning model. I don’t think it’s just data quality, though synthetic data, as with phi-4 (above), is absolutely having a big impact here.
I estimate that Gemini 2.0 Flash is around 30B parameters on 30T tokens (1,000:1). It achieves high scores across benchmarks (MMLU=87, MMLU-Pro=76.4, GPQA=62.1).
Gemini 2.0 Flash also goes way beyond text:
…multimodal inputs like images, video and audio, 2.0 Flash now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search, code execution as well as third-party user-defined functions.
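For developers, here is a minimal sketch of calling 2.0 Flash with Google’s new google-genai Python SDK, including the native Google Search tool mentioned above. The experimental model name gemini-2.0-flash-exp and the exact SDK surface are assumptions that may change after the preview period:
```python
# Minimal sketch using Google's google-genai SDK (pip install google-genai).
# Assumptions: experimental model name "gemini-2.0-flash-exp" and an API key
# from Google AI Studio; names may change after the preview period.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarise today's AI news in two sentences.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # native Google Search tool
    ),
)
print(response.text)
```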
As outlined in The Memo edition 1/Nov/2024, all Gemini outputs are watermarked and can be tracked by Google.
Google announce, model card, Models Table.
Watch a video on Gemini 2.0’s image generation capabilities (link):
Genie 2: A large-scale foundation world model (4/Dec/2024)
Genie 2 is a groundbreaking foundation world model from Google DeepMind that generates interactive 3D environments for training AI agents. Using autoregressive latent diffusion, it can simulate complex worlds from a single prompt image and supports long-horizon memory, object interactions, physics, and character animations. This tool enables rapid prototyping of virtual environments, helping accelerate research on embodied AI agents and generalist systems like SIMA (13/Mar/2024) while addressing structural limitations in agent training.
Genie 2 is the path to solving a structural problem of training embodied agents safely while achieving the breadth and generality required to progress towards AGI.
Sidenote: This world model bumped up my AGI counter from 83 ➜ 84%. DeepMind has not demonstrated specific real-world use cases, but expect applications across diverse sectors to ‘instantly’ train humanoids in agriculture, mining, manufacturing and assembly, construction, supply chain operations and transport networks, healthcare assistance, home duties including making coffee, and anywhere else you can put a robot…
Read more via Google DeepMind.
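As a purely conceptual illustration (not Genie 2’s actual architecture, and skipping the diffusion decoder entirely), here is a toy sketch of the autoregressive latent rollout idea: encode a prompt frame into latents, then repeatedly predict the next latent state conditioned on an action and decode it back to pixels. Every module below is a hypothetical placeholder:
```python
# Toy illustration of an autoregressive latent world-model rollout, NOT Genie 2 itself.
# All modules are hypothetical placeholders (simple Linear layers on flattened frames).
import torch
import torch.nn as nn

FRAME, LATENT, ACTION = 64 * 64 * 3, 256, 8

encoder = nn.Linear(FRAME, LATENT)             # frame -> latent
dynamics = nn.Linear(LATENT + ACTION, LATENT)  # (latent, action) -> next latent
decoder = nn.Linear(LATENT, FRAME)             # latent -> rendered frame

frame = torch.rand(1, FRAME)                   # the single prompt image
z = encoder(frame)
rollout = []
for _ in range(16):                            # long-horizon rollout, one step at a time
    action = torch.zeros(1, ACTION)            # e.g. a controller input from the agent
    z = dynamics(torch.cat([z, action], dim=-1))
    rollout.append(decoder(z))                 # frame for the agent to observe next
print(len(rollout), rollout[0].shape)
```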
Tesla Optimus humanoid walking on uneven ground (9/Dec/2024)
Tesla showcased a new milestone for its humanoid robot, Optimus, which can now walk autonomously on uneven, mulch-covered terrain using neural networks to control each limb. Tesla’s Vice President of Optimus Engineering, Milan Kovac, noted (10/Dec/2024):
Tesla is where real-world AI is happening. These runs are on mulched ground, where I’ve myself slipped before. What’s really crazy here is that for these, Optimus is actually blind! Keeping its balance without video (yet), only other on-board sensors consumed by a neural net running in ~2-3ms on its embedded computer.
Watch the video (link):
The Interesting Stuff
Amazon Nova (formerly Olympus) (3/Dec/2024)
Amazon Nova is a multimodal AI model family supporting text, images, documents, and video as input, with text output. It features multiple configurations, including Nova Pro (estimated 90B parameters on 10T tokens, 112:1) and the upcoming Nova Premier (470B parameters, due 2025). Nova Pro was trained using multilingual and multimodal datasets, including synthetic data, across over 200 languages.
Nova Pro (along with 16 other models) outperforms the current default ChatGPT model, GPT-4o-2024-11-20, on major benchmarks; Nova Pro scores MMLU=85.9 and GPQA=46.9.
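If you’d like to poke at Nova Pro yourself, here is a minimal sketch using Amazon Bedrock’s Converse API via boto3. The model ID amazon.nova-pro-v1:0, the region, and pre-configured AWS credentials are assumptions:
```python
# Minimal sketch: calling Nova Pro through Amazon Bedrock's Converse API.
# Assumptions: model ID "amazon.nova-pro-v1:0", a region where Nova is available,
# and AWS credentials already configured in your environment.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")
response = client.converse(
    modelId="amazon.nova-pro-v1:0",  # assumed Bedrock model ID
    messages=[{"role": "user", "content": [{"text": "Describe this model family in one sentence."}]}],
    inferenceConfig={"maxTokens": 200},
)
print(response["output"]["message"]["content"][0]["text"])
```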
Read my Amazon Nova page: https://lifearchitect.ai/olympus/
Amazon announce, technical report, Models Table.
Meta Llama 3.3 70B (6/Dec/2024)
Meta’s Llama 3.3 70B model was trained on 15T+ tokens (215:1), and delivers performance nearly equivalent to the older Llama 3.1 405B model while being less than one-fifth the size, marking a shift from parameter-heavy scaling to quality-focused optimization. This defies older notions of scaling laws, improving reasoning, math, and instruction-following without altering the model’s fundamental architecture. Developers now benefit from faster, cost-effective, high-quality AI for tasks like coding, tool use, and error handling.
Read the Llama 3.3 70B model card.
Try it on Poe.com: https://poe.com/Llama-3.3-70B
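For local or self-hosted use, here is a minimal sketch with Hugging Face transformers; it assumes you have accepted Meta’s licence for the gated meta-llama/Llama-3.3-70B-Instruct repo and have the hardware for it (roughly 140GB of GPU memory in bf16, far less with 4-bit quantised builds):
```python
# Minimal sketch: running Llama 3.3 70B Instruct with Hugging Face transformers.
# Assumptions: access to the gated "meta-llama/Llama-3.3-70B-Instruct" repo and
# enough GPU memory (bf16 needs ~140GB; 4-bit quantised builds are far smaller).
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.3-70B-Instruct",
    device_map="auto",
    torch_dtype="auto",
)
messages = [{"role": "user", "content": "Write a Python function that retries a flaky API call."}]
result = chat(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```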
Did you know? The Memo features in Apple’s recent AI paper, has been discussed on Joe Rogan’s podcast, and a trusted source says it is used by top brass at the White House. Across over 100 editions, The Memo continues to be the #1 AI advisory, informing 10,000+ full subscribers including Microsoft, Google, and Meta AI.
My end of year ‘The sky is…’ AI report will be sent to full subscribers soon…
Uber stock falls as Waymo announces expansion to Miami (5/Dec/2024)