To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 16/Dec/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 83 ➜ 84%
ASI: 0/50 (no expected movement until post-AGI)
The early winner of The Who Moved My Cheese? AI Awards! for Dec/2024 is PlayStation co-CEO Hermen Hulst (‘AI… will never replace the “human touch” of games made by people’).
A lot of my analysis and visualizations turn up in interesting places, from universities to government situation rooms. I was fast asleep (yes, I do sleep!) as machine learning pioneer Prof Sepp Hochreiter (wiki) presented my old ‘Journey to GPT-4’ viz at the prestigious NeurIPS conference (wiki). Thanks to The Memo reader Roland for sending this photo through from Vancouver.
Contents
The BIG Stuff (Microsoft Phi-4, Google Gemini 2.0, Tesla Optimus, Genie 2…)
The Interesting Stuff (Amazon AGI SF Lab, OpenAI Switzerland, OpenAI 12 days…)
Policy (ElevenLabs, US AI czar, ChatGPT filters, TSMC 2nm, OpenAI + Anduril…)
Toys to Play With (Ilya, Devin, DynaSaur, Maisa KPU, OpenAI emails, X Aurora…)
Flashback (Elon Musk’s gifted school and AI’s impact on language…)
Next (Roundtable…)
The BIG Stuff
Microsoft Phi-4 14B (13/Dec/2024)
Phi-4 is 14B parameters on 10T tokens of synthetic data (715:1), and achieves incredibly high scores across benchmarks (MMLU=84.8, MMLU-Pro=70.4, GPQA=56.1).
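The ‘715:1’ figure is simply training tokens divided by parameter count, the same ratio I quote for every model in the Models Table. A quick sketch:

```python
# Tokens-to-parameters ratio, as quoted throughout the Models Table.
phi4_params = 14e9   # 14B parameters
phi4_tokens = 10e12  # 10T synthetic training tokens

ratio = phi4_tokens / phi4_params
print(f"Phi-4 tokens:parameters = {ratio:.0f}:1")  # 714:1, rounded to ~715:1 above
```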
While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation.
We’re entering the twilight zone: small models are becoming far smarter than the large models used to train them. (Sidenote: Listen to Leta AI responding to the ‘stochastic parrot’ accusation back in 2021! https://youtu.be/G57DMAJgomg)
Phi-4 represents possibly the most impressive local large language model advance this year. I will be downloading the weights and running this model locally (using Jan.ai) as soon as the model is available on HF later this month.
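If you would rather script it than use a GUI like Jan.ai, below is a minimal sketch using Hugging Face transformers. The repo ID ‘microsoft/phi-4’ is my assumption (check the actual name once the weights land on HF), and a 14B model in bf16 needs roughly 28GB of GPU memory, or aggressive quantization.

```python
# Minimal local inference sketch for Phi-4 via Hugging Face transformers.
# Assumption: weights published as "microsoft/phi-4"; verify the repo ID once live.
# Requires: pip install transformers accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"  # hypothetical repo ID at the time of writing
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~28GB for 14B params in bf16
    device_map="auto",
)

prompt = "Explain why synthetic training data can outperform raw web text."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```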
Microsoft announce, paper, Models Table.
Watch my May/2024 demo of phi’s synthetic data: https://youtu.be/kgqDtfC_pRY
Google Gemini 2.0 Flash (11/Dec/2024)
Google has unearthed a new secret sauce that allows it to pretrain smaller models with significantly higher performance, and all without resorting to extended inference-time compute like OpenAI’s o1 reasoning model. I don’t think it’s just data quality, though synthetic data like phi-4’s (above) is absolutely having a big impact here.
I estimate that Gemini 2.0 Flash is around 30B parameters on 30T tokens (1,000:1). It achieves high scores across benchmarks (MMLU=87, MMLU-Pro=76.4, GPQA=62.1).
Gemini 2.0 Flash also goes way beyond text:
…multimodal inputs like images, video and audio, 2.0 Flash now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search, code execution as well as third-party user-defined functions.
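If you want to poke at it programmatically rather than in AI Studio, below is a minimal sketch using the google-generativeai Python SDK. The experimental model name ‘gemini-2.0-flash-exp’ is my assumption from the launch materials, and native image/audio output is still early-access, so this only exercises the multimodal input side.

```python
# Minimal sketch: multimodal input (image + text) with Gemini 2.0 Flash.
# Assumptions: the google-generativeai SDK and the experimental model name
# "gemini-2.0-flash-exp"; native image/audio *output* remains early-access.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

image = Image.open("chart.png")  # any local image
response = model.generate_content(
    [image, "Summarise what this chart shows in two sentences."]
)
print(response.text)
```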
As outlined in The Memo edition 1/Nov/2024, all Gemini outputs are watermarked and can be tracked by Google.
Google announce, model card, Models Table.
Watch a video on Gemini 2.0’s image generation capabilities (link):
Genie 2: A large-scale foundation world model (4/Dec/2024)
Genie 2 is a groundbreaking foundation world model from Google DeepMind that generates interactive 3D environments for training AI agents. Using autoregressive latent diffusion, it can simulate complex worlds from a single prompt image and supports long-horizon memory, object interactions, physics, and character animations. This tool enables rapid prototyping of virtual environments, helping accelerate research on embodied AI agents and generalist systems like SIMA (13/Mar/2024) while addressing structural limitations in agent training.
Genie 2 is the path to solving a structural problem of training embodied agents safely while achieving the breadth and generality required to progress towards AGI.
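DeepMind has not released code or an API, so purely as a mental model, here is an illustrative, fully stubbed sketch (none of these classes are Genie 2’s) of why a generated world model matters for agent training: every prompt image becomes another training environment.

```python
# Illustrative only: stubbed stand-ins for a generated world model and an
# embodied agent. This is not DeepMind's Genie 2 API.
import random

class GeneratedWorld:
    """Stand-in for a world model: prompt image in, interactive environment out."""
    def __init__(self, prompt_image: str):
        self.prompt_image = prompt_image
        self.t = 0

    def step(self, action: str) -> tuple[str, float]:
        # A real world model would render the next frame and its consequences here.
        self.t += 1
        return f"frame_{self.t}", random.random()  # (observation, reward)

class Agent:
    def act(self, observation: str) -> str:
        return random.choice(["forward", "left", "right", "interact"])

# Spin up varied environments from single prompt images and train inside them.
for image in ["warehouse.png", "farm.png", "kitchen.png"]:
    world, agent = GeneratedWorld(image), Agent()
    obs = "frame_0"
    for _ in range(100):
        obs, reward = world.step(agent.act(obs))
        # ...update the agent from (obs, reward) with your RL method of choice
```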
Sidenote: This world model bumped up my AGI counter from 83 ➜ 84%. DeepMind has not demonstrated specific real-world use cases, but expect applications across diverse sectors to ‘instantly’ train humanoids in agriculture, mining, manufacturing and assembly, construction, supply chain operations and transport networks, healthcare assistance, home duties including making coffee, and anywhere else you can put a robot…
Read more via Google DeepMind.
Tesla Optimus humanoid walking on uneven ground (9/Dec/2024)
Tesla showcased a new milestone for its humanoid robot, Optimus, which can now walk autonomously on uneven, mulch-covered terrain using neural networks to control each limb. Tesla’s Vice President of Optimus Engineering, Milan Kovac, noted:
Tesla is where real-world AI is happening. These runs are on mulched ground, where I’ve myself slipped before. What’s really crazy here is that for these, Optimus is actually blind! Keeping its balance without video (yet), only other on-board sensors consumed by a neural net running in ~2-3ms on its embedded computer.
(10/Dec/2024)
Watch the video (link):
The Interesting Stuff
Amazon Nova (formerly Olympus) (3/Dec/2024)
Amazon Nova is a multimodal AI model family supporting text, images, documents, and video as input, with text output. It features multiple configurations, including Nova Pro (estimated 90B parameters on 10T tokens, 112:1) and the upcoming Nova Premier (470B parameters, due 2025). Nova Pro was trained using multilingual and multimodal datasets, including synthetic data, across over 200 languages.
Nova Pro, along with 16 other current models, outperforms the current default ChatGPT model (GPT-4o-2024-11-20) on major benchmarks: MMLU=85.9, GPQA=46.9.
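Nova is served through Amazon Bedrock. Below is a minimal sketch using boto3’s Converse API; the model ID ‘amazon.nova-pro-v1:0’ is my assumption from Amazon’s launch docs, and your AWS account needs Bedrock model access enabled in the chosen region.

```python
# Minimal sketch: calling Amazon Nova Pro via the Bedrock Converse API.
# Assumptions: model ID "amazon.nova-pro-v1:0" and Bedrock access in us-east-1.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-pro-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Summarise this week's AI news in one paragraph."}],
    }],
    inferenceConfig={"maxTokens": 300, "temperature": 0.3},
)
print(response["output"]["message"]["content"][0]["text"])
```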
Read my Amazon Nova page: https://lifearchitect.ai/olympus/
Amazon announce, technical report, Models Table.
Meta Llama 3.3 70B (6/Dec/2024)
Meta’s Llama 3.3 70B model was trained on 15T+ tokens (215:1), and delivers performance nearly equivalent to the older Llama 3.1 405B model while being less than one-fifth of its size, marking a shift from parameter-heavy scaling to quality-focused optimization. This result defies older notions of scaling laws, improving reasoning, math, and instruction-following without altering the model’s fundamental architecture. Developers now benefit from faster, cost-effective, high-quality AI for tasks like coding, tool use, and error handling.
Read the Llama 3.3 70B model card.
Try it on Poe.com: https://poe.com/Llama-3.3-70B
Did you know? The Memo features in Apple’s recent AI paper, has been discussed on Joe Rogan’s podcast, and a trusted source says it is used by top brass at the White House. Across over 100 editions, The Memo continues to be the #1 AI advisory, informing 10,000+ full subscribers including Microsoft, Google, and Meta AI.
My end of year ‘The sky is…’ AI report will be sent to full subscribers soon…
Uber stock falls as Waymo announces expansion to Miami (5/Dec/2024)
Waymo, Alphabet’s driverless ride-hailing platform, plans to deploy its all-electric Jaguar I-PACEs in Miami starting early 2025, with ride-hailing services available to customers by 2026 via the Waymo One app.
Following the announcement, Uber and Lyft stocks dropped 6.4% and 6.6%, respectively, as investors worry about the competitive impact of AI-powered autonomous vehicles in the ride-hailing market. Waymo already operates in Los Angeles, San Francisco, and Phoenix, with plans to expand further.
Read more via Barron’s.
This reminds me of how ChatGPT blindsided Chegg. I grabbed this screenshot on 10/Nov/2024:
Amazon forms an AI agent-focused lab led by Adept’s co-founder (9/Dec/2024)
Amazon has launched the Amazon AGI SF Lab, a new research and development center focused on creating advanced AI agents capable of performing both digital and physical tasks. Led by David Luan, co-founder of the AI startup Adept, and Pieter Abbeel, a robotics researcher, the lab aims to develop AI agents that can handle complex workflows, self-correct, and learn from human feedback. This initiative builds on Amazon’s acquisition-like deal with Adept earlier this year, where portions of its team joined Amazon.
The move positions Amazon against competitors like OpenAI, Google, and Anthropic, all of which are pursuing similar AI agent technologies. These agents, seen as transformative tools for automating tasks, are part of a rapidly growing sector worth US$31B in 2024.
Read more via TechCrunch.
OpenAI to set up new office in Switzerland (4/Dec/2024)
OpenAI, the maker of ChatGPT, is opening an office in Zurich as part of its European expansion, adding to existing locations in London, Paris, Brussels, and Dublin. The Zurich office will employ three leading AI researchers, leveraging the city’s reputation as a major European tech hub. OpenAI, now valued at US$157B, aims to strengthen its global presence and further solidify its leadership in the AI sector.
Read more via SWI swissinfo.ch.
OpenAI released a bunch of stuff (Dec/2024)
OpenAI’s ‘12 Days of OpenAI’ initiative highlights new advancements, including the launch of the Sora video generation model, advanced voice features, and the Canvas tool for collaborative writing and coding.
Sora, now out of research preview, integrates GPT and DALL-E technologies to enhance storytelling and video creation. OpenAI also introduced o1 + ChatGPT Pro (US$200/month), and expanded its reinforcement fine-tuning program for researchers and enterprises.
Read more: https://openai.com/12-days/
My Sora testing was very basic. This prompt is ‘The camera pans to capture a close-up of a blue wren perched on a branch, chirping joyfully.’ The output below is 720p for 5 seconds:
Many others did much more rigorous testing. Watch: Sora videos (480p) and the associated Sora table of prompts.
Musk’s xAI plans massive expansion of AI supercomputer in Memphis (4/Dec/2024)
xAI, Elon Musk’s AI startup, is expanding its supercomputer ‘Colossus’ in Memphis to at least 1,000,000 (one million) GPUs, up from the current 100,000. This tenfold expansion aims to accelerate the development of xAI’s chatbot, Grok, and compete with AI leaders like OpenAI.
Read more via Reuters.
Jeff Bezos bets millions on NVIDIA rival Tenstorrent (2/Dec/2024)
Jeff Bezos, through Bezos Expeditions, participated in a US$693M funding round for Tenstorrent, an AI chip company aiming to rival NVIDIA. Tenstorrent focuses on affordable, open-source AI solutions, bypassing costly components like high-bandwidth memory (HBM) that NVIDIA relies on. The funding will support the development of open-source AI software, global expansion, and cloud systems, positioning Tenstorrent as a cost-effective alternative for smaller firms in the AI space.
Read more via Quartz.
The Pragmatic Engineer: How GenAI is reshaping tech hiring (3/Dec/2024)
Generative AI is transforming tech recruitment, with tools like ChatGPT and GitHub Copilot enabling candidates to bypass traditional hiring processes. Recruiters face challenges such as AI-generated resumes and candidates using LLMs during interviews, leading to increased focus on live discussions and non-Googleable tasks. Some companies are integrating GenAI into interviews, while others are banning it, though enforcement remains difficult in remote settings.
Read more via The Pragmatic Engineer.
Citigroup rolls out artificial intelligence tools for employees in eight countries (4/Dec/2024)
Citigroup has introduced two AI tools, Citi Assist and Citi Stylus, for 140,000 employees across eight countries to improve productivity and simplify workflows. Citi Assist acts as a virtual coworker, helping navigate internal policies, while Citi Stylus can analyze, summarize, and compare documents. These tools are part of a broader trend among banks like Morgan Stanley and Bank of America leveraging AI to enhance operations.
Read more via Reuters.
Thanks to AI, the hottest new programming language is... English (8/Dec/2024)
Generative AI is reshaping software development by letting prompts in natural language, such as English, generate functional code, reducing reliance on traditional programming skills. NVIDIA CEO Jensen Huang called this shift a ‘miracle of AI’, emphasizing how it democratizes technology and empowers individuals without coding expertise. Tools like GitHub Copilot, supported by Microsoft, demonstrate this trend, with Stability AI estimating that 41% of GitHub code is now AI-generated.
Read the original article.
Read a discussion via Slashdot.
Vodafone AI ad (Dec/2024)
This looks better than the Coke version from the last edition of The Memo!
Source: https://x.com/Uncanny_Harry/status/1863887803756523709
Watch (link):
Policy
ElevenLabs’ AI voice generation ‘very likely’ used in a Russian influence operation (10/Dec/2024)
A report by Recorded Future reveals that ElevenLabs’ AI voice generation technology was ‘very likely’ used in a Russian influence campaign called ‘Operation Undercut’. The campaign targeted European audiences with fake news videos featuring AI-generated voiceovers in multiple languages to undermine support for Ukraine. While ElevenLabs did not comment, its AI Speech Classifier matched clips from the campaign to their technology, showcasing how generative AI enables rapid multilingual propaganda.
Read more via TechCrunch.
Download the report (source):
Certain names make ChatGPT grind to a halt, and we know why (2/Dec/2024)
ChatGPT's hard-coded filters block certain names, such as ‘Brian Hood’ and ‘Jonathan Turley’, to avoid legal risks stemming from past instances of defamation or misinformation. However, these filters can cause unintended issues, including breaking conversations or rendering the chatbot unusable for tasks involving common names. OpenAI confirmed that the recent inclusion of ‘David Mayer’ in the block list was a glitch, highlighting the challenges of balancing safety with usability in AI systems.
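As reported, these appear to be simple hard-coded string checks layered on top of the model. Purely as an illustration (this is not OpenAI’s implementation), here is why such a filter kills otherwise innocent conversations:

```python
# Illustrative only: a naive hard-coded name filter of the kind described in
# the Ars Technica piece, and why it breaks unrelated conversations.
BLOCKED_NAMES = {"brian hood", "jonathan turley"}  # names reported in the article

def guarded_reply(model_reply: str) -> str:
    # Any reply mentioning a blocked name is replaced with a hard error,
    # regardless of context, which is why whole conversations break.
    if any(name in model_reply.lower() for name in BLOCKED_NAMES):
        return "I'm unable to produce a response."
    return model_reply

print(guarded_reply("Notable mayors of Hepburn Shire include Brian Hood."))
# -> "I'm unable to produce a response."
```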
Read more via Ars Technica.
What David Sacks as AI czar (with Elon Musk as wingman) could mean for OpenAI (7/Dec/2024)
David Sacks has been named AI czar by Donald Trump, with Elon Musk as a top advisor, creating potential trouble for OpenAI. Both Sacks and Musk have criticized OpenAI’s shift to a profit-focused model and could use their new roles to favor their own AI projects, like Musk’s xAI, while putting pressure on competitors. OpenAI’s CEO Sam Altman voiced concerns about this, as the company faces big decisions on restructuring and advancing AGI development.
Read more via Fortune.
TSMC bets big on 2nm by 2025 – but can it deliver? (29/Nov/2024)
TSMC’s 2nm node, scheduled for mass production in 2025, is expected to power next-generation AI workloads with innovations like gate-all-around (GAA) transistors and backside power delivery networks, enabling greater performance and efficiency. These advancements are critical for AI applications requiring high throughput and low latency. However, geopolitical challenges complicate the timeline, as Taiwan has stated that core 2nm technologies will remain on the island. This could delay 2nm production at TSMC’s Arizona-based Fab 21 until 2027 or later, even as US semiconductor policy pushes for domestic advancements under the CHIPS Act.
Read more via The Register.
Meta says it has taken down about 20 covert influence operations in 2024 (3/Dec/2024)
Meta revealed that it disrupted approximately 20 covert influence operations globally in 2024, with Russia identified as the top source. Despite widespread fears, Meta noted that it was ‘striking’ how little AI was used in these operations, even in the busiest year for elections worldwide. While AI tools were anticipated to fuel widespread disinformation campaigns, Meta observed only a modest and limited impact from generative AI fakery.
Read more via The Guardian.
OpenAI is working with Anduril to supply the US military with AI (4/Dec/2024)
OpenAI has partnered with defense startup Anduril to enhance US military air defense systems using AI. OpenAI’s technology will improve threat assessment from drones, enabling faster and more accurate decision-making. This marks a shift in OpenAI’s policy toward military applications, aligning it with other tech companies like Meta and Anthropic, which have recently announced defense collaborations.
Anduril plans to integrate OpenAI’s language models into its autonomous aircraft systems, allowing for natural language commands to guide missions. Although controversial, the partnership reflects Silicon Valley’s growing acceptance of military projects, particularly after Russia’s invasion of Ukraine highlighted the geopolitical importance of AI advances.
Read more via WIRED.
Read more via Anduril.
Toys to Play With
Ilya: Sequence to sequence learning with neural networks: what a decade (14/Dec/2024)
In his NeurIPS 2024 talk, Ilya Sutskever shared bold predictions on the future of AI, declaring that ‘pre-training as we know it will end’. He outlined a shift toward superintelligent systems that are agentic, capable of reasoning, understanding, and even self-awareness. This marks a transformative vision for neural networks and their evolution over the next decade.
Watch the video (link):
Devin is generally available (11/Dec/2024)
Cognition Labs has officially launched Devin, billed as the world’s first AI software engineer, designed to assist engineering teams with tasks such as fixing frontend bugs, creating first-draft PRs, and performing refactors. Devin integrates into workflows via Slack, GitHub, and IDEs, offering collaborative support for US$500/month. Engineering teams are already using Devin to contribute to open-source projects, build APIs, and perform QA tasks.
An example of Devin’s capabilities includes triaging, solving, and testing a fix for an issue in Anthropic’s MCP, with the session available here. The merged PR can be viewed on GitHub.
Read more: https://cognition.ai/
Try it: https://app.devin.ai/
GitHub - DynaSaur: Large Language Agents Beyond Predefined Actions (2024)
DynaSaur is an LLM-based agent framework that uses Python as a universal representation to dynamically generate or compose actions. When predefined actions are insufficient or fail, the system writes new ones, building a reusable library for future tasks. Empirically, DynaSaur outperforms prior baselines on the GAIA benchmark, establishing itself as a leading non-ensemble method among adaptive AI agent frameworks.
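Conceptually (my own minimal sketch, not the DynaSaur codebase), the agent’s action space is Python itself: when no predefined tool fits, the LLM writes a new function, the framework executes it, and the function joins a library for reuse.

```python
# Illustrative sketch of dynamic action creation (not the DynaSaur codebase):
# the LLM emits Python source for a new action, we exec it, and keep it in a
# reusable library for later tasks.
ACTION_LIBRARY = {}  # name -> callable, reused across tasks

def register_action(source_code: str, name: str):
    """Compile LLM-generated Python into a callable and keep it for reuse."""
    namespace = {}
    exec(source_code, namespace)  # a real system would sandbox this
    ACTION_LIBRARY[name] = namespace[name]
    return ACTION_LIBRARY[name]

# Pretend the LLM proposed this because no predefined action could do the job:
llm_generated = '''
def word_count(path):
    with open(path) as f:
        return len(f.read().split())
'''

word_count = register_action(llm_generated, "word_count")
print(word_count(__file__))  # immediately usable, and available for future tasks
```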
Read more via GitHub.
Maisa Vinci KPU Playground (26/Nov/2024)
The Maisa KPU Playground is an interactive platform for experimenting with Maisa’s KPU (Knowledge Processing Unit), letting users run machine learning and natural language processing tasks, visualize model outputs, and tune parameters, making it a useful environment for both researchers and enthusiasts exploring AI capabilities.
This agent scores 4/5 on ALPrompt 2024 H1 and 2024 H2, and uses the reasoning capabilities of large language models like GPT-4 and Claude (17/Mar/2024).
Try it via KPU Playground.
Read more: https://maisa.ai/research/
The ChatGPT secret: is that text message from your friend, your lover – or a robot? (3/Dec/2024)
ChatGPT is increasingly being used as an emotional support tool, from mediating arguments to role-playing difficult conversations. While some users claim it enhances empathy and emotional intelligence, others warn of over-reliance and the risk of detachment from real-life connections. One user reflected that ChatGPT itself suggested he might be using it too much, advising: ‘Yeah – maybe get a therapist.’
Read more via The Guardian.
A collection of MCP servers (Dec/2024)
This curated repository compiles Anthropic’s Model Context Protocol (MCP) servers that enhance AI models' ability to interact securely with external resources, such as databases, APIs, file systems, and cloud platforms. MCP provides open-source, standardized protocols for extended AI capabilities. The project includes diverse implementations, tutorials, and tools for developers, with notable features like browser automation, database integration, and cloud services.
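Under the hood, an MCP server is just a process speaking JSON-RPC (typically over stdio) that advertises tools a model can call. The toy below is a hand-rolled illustration of that shape only; for anything real, use Anthropic’s official SDKs from the repository above.

```python
# Toy illustration of the MCP idea: answer JSON-RPC requests over stdio and
# advertise one tool. Real servers should use the official MCP SDKs.
import json
import sys

TOOLS = [{
    "name": "read_file",
    "description": "Return the contents of a local text file",
    "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
}]

def handle(request: dict) -> dict:
    if request["method"] == "tools/list":
        result = {"tools": TOOLS}
    elif request["method"] == "tools/call" and request["params"]["name"] == "read_file":
        with open(request["params"]["arguments"]["path"]) as f:
            result = {"content": [{"type": "text", "text": f.read()}]}
    else:
        result = {"error": "unknown method"}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}

for line in sys.stdin:  # one JSON-RPC message per line
    print(json.dumps(handle(json.loads(line))), flush=True)
```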
Read more via GitHub.
OpenAI email archives (16/Nov/2024)
A comprehensive compilation of internal communications between Elon Musk, Sam Altman, Ilya Sutskever, and Greg Brockman reveals the early discussions, challenges, and decisions that shaped OpenAI. Key topics include the non-profit origins, recruitment of top AI talent, financial strategies, and concerns about competition with DeepMind. These emails also highlight the strategic tension between Musk’s insistence on control and the broader team’s focus on distributed governance to prevent misuse of AGI.
Read more via LessWrong.
LLM Leaderboards (Dec/2024)
Yet another benchmark and leaderboard summary! LLM Stats provides real-world performance insights for over 50 models from 15+ providers, tested across 30+ benchmarks.
Read more: https://llm-stats.com/
Aurora: X gives Grok a new photorealistic AI image generator (7/Dec/2024)
xAI has launched Aurora, a new autoregressive image generation model integrated into Grok, now available on the X platform. Aurora excels at photorealistic rendering, precise text-to-image alignment, and multimodal input, enabling users to edit or take inspiration from existing images. Highlighted features include realistic portraits, artistic text, meme generation, and the ability to render detailed real-world entities like logos and celebrities.
I’ve used Aurora, as the state-of-the-art text-to-image model, to generate the header image for my upcoming AI report.
Early announce, xAI official announce, The Verge.
Flashback
First published by Mensa in Nov/2017, my article ‘The future is now: Gifted education beyond 2020’ about Elon Musk’s gifted school in California is still echoing today:
Less languages
Some Australian state education departments have recently enforced mandatory second language teaching (for example, Mandarin Chinese, Spanish, French) in classrooms. Second languages are not taught at [Elon Musk’s gifted school,] Ad Astra. Yes, learning an additional language has been shown to be beneficial in supporting a child’s brain development and understanding of other cultures. Elon’s involvement in Neuralink—an American neurotechnology company developing implantable brain-computer interfaces—gives some indication of why learning languages is part of the past, not part of the future.
This month, The Economist (12/Dec/2024) reported that English proficiency in China has dropped significantly in the last few years, and apparently on purpose.
…China ranks 91st among 116 countries and regions in terms of English proficiency. Just four years ago it ranked 38th out of 100. Over that time its rating has slipped from “moderate” to “low” proficiency.
…translation apps, which are improving at a rapid pace and becoming more ubiquitous. The tools may be having an effect outside China, too. The EF rankings show that tech-savvy Japan and South Korea have also been losing ground when it comes to English proficiency. Why spend time learning a new language when your phone is already fluent in it?
With brain-machine interfaces coming up, I would instead be asking ‘Why spend time learning a new language when your brain is already fluent in it?’
Next
The next roundtable will be:
Life Architect - The Memo - Roundtable #22
Follows the Chatham House Rule (no recording, no outside discussion)
Saturday 21/Dec/2024 at 4PM Los Angeles (timezone change)
Saturday 21/Dec/2024 at 7PM New York (timezone change)
Sunday 22/Dec/2024 at 10AM Brisbane (primary/reference time zone)
or check your timezone via Google.
You don’t need to do anything for this; there’s no registration or forms to fill in, I don’t want your email, you don’t even need to turn on your camera or give your real name!
All my very best,
Alan
LifeArchitect.ai