The Memo - 17/Sep/2024

o1 analysis, GameGen-O, NotebookLM demos, and much more!

Sep 16, 2024

To:      US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From:    Dr Alan D. Thompson <LifeArchitect.ai>
Sent:    17/Sep/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI:     81%

Robin Sloan, author (Mar/2024)
‘I’m trying hard to imagine an average day [post ASI].
I can’t even come up with a half-decent scene.
Will people live in buildings? Will they wear clothes?
My imagination is almost physically straining.
It’s easier to write the defeat than the victory, isn’t it?
Easier to write the failure than the success.
For some reason, the success seems like it might be … boring.
…Turns out, no, it’s not boring at all. Plot gallops on, even at the outer limits of matter and energy. Even at the far reaches of freedom, the stories are only just beginning.’

Contents

The BIG Stuff (o1 analysis, humanoid robots value, Last Exam, OpenAI profit…)
The Interesting Stuff (Neuralink update from Noland, AlphaProteo, Tao, Oracle…)
Policy (AI in judgment, TSMC US, China H100s, datacenter=critical infrastructure…)
Toys to Play With (AI quiz, clone webpage via Claude, NotebookLM + demos…)
Flashback (Google DeepMind Gemini…)
Next (Roundtable…)

The Memo continues to be a primary source of AI analysis for all humanity, including major government and enterprise. The Memo features in foundational papers across organizations like Accenture, Brookings, NYU, Weights & Biases, and more (all paper links at the end of this edition).

Apple referenced The Memo for its new model paper and visualizations this month, leaning on analysis of model sizes and parameter counts provided in this publication.

Thanks to ‘LiliTheAdventurer’ for this AI-generated track via the new Suno v3.5, ‘Gone’. It’s challenging for me to remember that this has nothing to do with 1980s MIDI control signals, but is actually a complete studio-performance-style song conceptualized from scratch. With vocals in Japanese…

The BIG Stuff

Exclusive: o1: Smarter than we think (17/Sep/2024)

Building on the special edition from last week (12/Sep/2024), I’ve released a new page highlighting the major advancements of OpenAI o1. The comprehensive analysis includes size estimates for the o1 model.

Summary
Major points
- Ready for new discoveries and new inventions
- Reasoning via reinforcement learning and increased test-time compute
- Hidden CoT (chain of thought)
- Increased self-awareness
- The most dangerous model ever released (so far...)
Smarts
Size estimate
- Pricing
- Observations
- Conclusions
Dataset
Use cases
- Solar system model (HTML animation)
- Coding competitions (Codeforces)
- Mathematics (Prof Terry Tao)
Timeline

Read more via ARK Invest.

It’s worth remembering that robots can work 24×7, probably without air conditioning, and in the dark…

I still use ARK’s Jun/2022 AI figures in my keynotes:

1X NEO: I Lived With a Humanoid Robot for 48 Hours by S3 (8/Sep/2024)

In last week's S3, 1X shared their plans to quickly deploy Neo, their new robot, in homes. So I asked if they'd put Neo in my home.

This is worth watching in full, and I added an ‘info’ note to my AGI countdown due to how close it comes to fulfilling Woz’s AGI definition (strange house, make a cup of coffee).

Watch the video: https://youtu.be/Sb6LMPXRdVc

Reuters: AI experts ready ‘Humanity’s Last Exam’ to stump powerful tech (17/Sep/2024)

Scale AI has launched a global initiative called ‘Humanity’s Last Exam’ to challenge AI systems with difficult questions that extend beyond current benchmark tests.

"[o1] destroyed the most popular reasoning benchmarks… They're now crushed."
— Dr Dan Hendrycks, creator of MMLU and MATH (Reuters, 17/Sep/2024)

This project, organized by the Center for AI Safety and Scale AI, aims to evaluate AI's true capabilities, particularly in abstract reasoning, by crowdsourcing over 1,000 challenging questions.

Read more via Reuters, and source via Scale AI.

You can submit a question to win US$5,000. Scale AI suggests that items should only be designed by PhD experts(!) and:

Questions should not be simple trick questions. Questions should not be straightforward calculation/computation questions. The questions for this dataset can be so challenging that they are impractical for real-world human exams or problem sets. The answers should not be easily found through a Google search.
It should impress you if an AI correctly answers your question. We found questions written by undergraduates tend to be too easy for the models. As a rule of thumb, if a randomly selected undergraduate can understand what is being asked, it is likely too easy for the frontier LLMs of today and tomorrow.

Submit an item: https://agi.safe.ai/submit

This is another comprehensive edition of The Memo, coming in at nearly 5,000 words. And that’s after filtering out ~95% of the AI ‘news’ that comes across my desk. Let’s jump in…
