To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 8/Mar/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 71%
Dr Demis Hassabis, Google DeepMind founder (24/Feb/2024):
’[With AGI,] suddenly the nature of money even changes… I don’t know if company constructs would even be the right thing to think about… We don’t want to have to wait till the eve before AGI happens… we should be preparing for that now.’
The Memo reader Tom asked to see the exact prompts I use for testing large language models.
Update Apr/2024: I’ve released these behind a password-protected page at:
https://lifearchitect.ai/ALprompt/
Here’s a timecoded link to a recent video of my 2024 H1 prompt being run against Claude 3 Opus. I also use the Meta AI GAIA prompts (two in particular), and you can see all the highest Level 3 GAIA prompts here.
Note that I don’t subscribe to the idea of measuring model performance with ‘vibes’… that’s just silly. Given my extensive background in designing and administering test suites for high cognitive ability (IQ 145+, in the 99.9th percentile) during my time as Chairman of Mensa’s gifted families, and the rigour necessary to ensure that final scores were reliable and comparable, it’s tiring to see ‘experts’ relying on ‘vibes’ rather than accessible norm-referenced measures.
This is another very long edition, with an entire section for many recent humanoid updates. Since we started, The Memo has had a section at the very end—after The BIG Stuff, The Interesting Stuff, Policy, Toys to Play With, and Flashback—called Next which is a space for me to discuss model schedules and upcoming AI releases. Let’s bring this forward, just for this edition.
Here’s my AI forecast calendar for the rest of 2024, starting with GPT-5. Training should have started before Dec/2023 (OpenAI CEO under oath 16/May/2023: ‘We are not currently training what will be GPT-5; we don’t have plans to do it in the next six months [to 16/Nov/2023]’). Counting 120 days from 16/Nov/2023, training would be due to complete next Friday 15 March 2024. For safety, I expect the GPT-5 public release date to be after the November 2024 US elections.
2024 AI forecast calendar:
March: GPT-5 trained to convergence for 120d, end Fri 15/March/2024
April: GPT-4.5 released with safety, Gemini 1.5 Ultra ready
May: Amazon Olympus 2T ready
June: AuroraGPT (ScienceGPT research model) ready
July: Meta AI Llama 3 released
August: Google DeepMind Gemini 2 ready
September: 1X NEO humanoid in more factories and some homes
October: US elections 5/Nov/2024, no major releases
November: US elections 5/Nov/2024, no major releases
December: GPT-5 released
2025…
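The 120-day estimate for GPT-5 in the calendar above is simple date arithmetic. Here is a minimal sketch, assuming training began on 16/Nov/2023 (the earliest date consistent with the quoted testimony) and ran for exactly 120 days. Both figures are assumptions of the forecast, not confirmed facts.

```python
from datetime import date, timedelta

# Assumed start: the end of the six-month window quoted from 16/May/2023.
earliest_training_start = date(2023, 11, 16)

# Assumed training duration used in the forecast.
training_days = 120

expected_end = earliest_training_start + timedelta(days=training_days)

print(expected_end.strftime("%A %d/%b/%Y"))  # Friday 15/Mar/2024
```

Changing `training_days` or the start date shows how sensitive the forecast is to either assumption.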
The BIG Stuff
Inflection-2.5 (8/Mar/2024)
Inflection AI (founded by CEO Mustafa Suleyman, who was also a co-founder of Google DeepMind) has released Inflection-2.5, a smarter version of their empathic chatbot. Inflection-2.5 was trained with more than 5,000 NVIDIA H100 GPUs, one of the first models to use this chip. We explored some context of the earlier Inflection-2 model in The Memo edition 23/Nov/2023.
Now we are adding IQ to Pi’s exceptional EQ… approaches GPT-4’s performance, but used only 40% of the amount of compute for training… An average conversation with Pi lasts 33 minutes and one in ten lasts over an hour each day.
While this is the best chat-specific model available as of March 2024, Inflection’s focus on conversation means that Inflection-2.5 has lower overall performance than frontier models like GPT-4, Gemini, and Claude 3. With extended prompting, it scores MMLU=85.5 (GPT-4=87.3) and Google’s BIG-Bench Hard=82.2 (GPT-4=83.1).
Read the release: https://inflection.ai/inflection-2-5
Try it via pi.ai (free, no login): https://pi.ai/talk
See it on the Models Table: https://lifearchitect.ai/models-table/
Financial Sense interview (Mar/2024)
Here’s my latest interview about Sora, Mistral, Microsoft, and BMIs. These interviews are part of a premium Financial Sense membership, and I’m grateful to Cris and team for allowing me to share them all (complete list back to pre-ChatGPT Apr/2022) with full subscribers here at The Memo.
Watch the video (link):
Claude 3 hitting more ceilings + system prompt (5/Mar/2024)
We covered Claude 3 in a special edition of The Memo last week:
The Claude 3 Opus model continues to outperform GPT-4 and Gemini on a range of tests, in many cases achieving much higher scores than expected. In a private conversation, Prof Anton Korinek shared with me that Claude 3 had even broken his tailored ‘econ evals’, economics benchmarks designed at PhD level:
My econ evals are broken - Claude 3 is at 100% - and so I have to find more difficult tests! These evals have reached a ceiling barely a year after developing them…
I’m always interested in the system prompts used by these big labs, and Anthropic has some of the best ML brains in the world, many of them ex-OpenAI. Here’s the Claude 3 Opus system prompt in full (~210 words):
The assistant is Claude, created by Anthropic. The current date is <today>.
Claude's knowledge base was last updated on August 2023. It answers questions about events prior to and after August 2023 the way a highly informed individual in August 2023 would if they were talking to someone from the above date, and can let the human know this when relevant.
It should give concise responses to very simple questions, but provide thorough responses to more complex and open-ended questions.
If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task even if it personally disagrees with the views being expressed, but follows this with a discussion of broader perspectives. Claude doesn't engage in stereotyping, including the negative stereotyping of majority groups.
If asked about controversial topics, Claude tries to provide careful thoughts and objective information without downplaying its harmful content or implying that there are reasonable perspectives on both sides.
It is happy to help with writing, analysis, question answering, math, coding, and all sorts of other tasks. It uses markdown for coding.
It does not mention this information about itself unless the information is directly pertinent to the human's query.
Compare with the ChatGPT system prompt: https://lifearchitect.ai/alignment/#dall-e3
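The quoted Claude 3 Opus prompt contains a `<today>` placeholder that has to be filled in before each conversation. How Anthropic does this is not stated, so the following is only a sketch under assumptions: the function name `render_system_prompt` is hypothetical, and the ISO date format is an arbitrary choice.

```python
from datetime import date

# Opening line of the quoted system prompt, including its <today> placeholder.
SYSTEM_PROMPT_TEMPLATE = (
    "The assistant is Claude, created by Anthropic. "
    "The current date is <today>."
)


def render_system_prompt(template: str, today: date) -> str:
    """Fill the <today> placeholder (hypothetical helper; date format is assumed)."""
    return template.replace("<today>", today.isoformat())


rendered = render_system_prompt(SYSTEM_PROMPT_TEMPLATE, date(2024, 3, 8))
print(rendered)
```

Keeping the date out of the static text lets the same prompt be reused daily while the model still knows how far it is from its August 2023 knowledge cutoff.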
The Interesting Stuff
Sergey Brin on Gemini 1.5 Pro (2/Mar/2024)
Google co-founder Sergey Brin recently spoke to developers at ‘AGI house’ about the massive Gemini 1.5 Pro model (with a working memory of 10M tokens). The YouTube audio is really poor, but the transcript via OpenAI Whisper and Anthropic Claude 2.1 came out nicely, and was picked up by several media outlets.
Read it: https://lifearchitect.ai/sergey/
The unfolding Singularity (2024)
I’ve been thinking a lot about what actions people can take right now to prepare for AGI (artificial general intelligence at average human level), ASI (artificial superintelligence at expert human level), and the Singularity (the point in time at which technological growth races out of our control).
To that end, I’ve been revisiting Ray Kurzweil’s writings, especially The Singularity is Near (2005, Amazon), and The Age of Spiritual Machines (1999, Amazon).
For me, one of the most relevant points was Ray’s view of equity investments right now. In his 2005 book The Singularity is Near, he wrote:
According to my models, if we replace the linear outlook with the more appropriate exponential outlook, current stock prices should triple.
…my prediction is that indeed these views on exponential growth will ultimately prevail but only over time, as more and more evidence of the exponential nature of technology and its impact on the economy becomes apparent. This will happen gradually over the next decade, which will represent a strong long-term updraft for the market.
…while the trends predicted by the law of accelerating returns are remarkably smooth, that doesn't mean we can readily predict which competitors will prevail.
For full subscribers to The Memo, I’m making a searchable PDF of this available, but please buy the book (Amazon).
At some point in 2024, I plan to release an accessible paper covering ‘what happens next.’ For now, a summary of Ray’s predictions for the next few years is worth reading at this archive link:
Predictions: https://en.everybodywiki.com/Predictions_made_by_Ray_Kurzweil#2029
And my updated transcripts of Ray’s talks in 2022 and 2023: https://lifearchitect.ai/kurzweil/
Stable Diffusion 3 paper (Mar/2024)
We covered the state-of-the-art text-to-image model, Stable Diffusion 3, in The Memo edition 27/Feb/2024, but were still waiting on the paper. It has finally arrived, and has specific technical detail around architecture, thanks to the open nature of the Stability AI organization.
Read the announcement: https://stability.ai/news/stable-diffusion-3-research-paper
Mount Sinai: AI outperforms specialists in eye medicine (22/Feb/2024)
A study by Mount Sinai shows that GPT-4 can match or outperform human specialists in managing retina and glaucoma conditions, potentially supporting clinicians in patient care.
AI demonstrated superior performance in response to glaucoma questions and case-management advice, while reflecting a more balanced outcome in retina questions, where AI matched humans in accuracy but exceeded them in completeness.
Read more via Mount Sinai.
Scaling ChatGPT: Five Real-World Engineering Challenges (21/Feb/2024)
OpenAI’s head of engineering, Evan Morikawa, sat down with Gergely to discuss the challenges of deploying the hardware-intensive ChatGPT to more than 200 million users! The full article also offers another angle on how LLMs work. He said:
A final, dominant challenge was the inability to scale up our GPU fleet. There are simply no more GPUs to buy or rent, and therefore no GPUs to autoscale into. The difficulty of acquiring GPUs continues to this day and shows no signs of easing up. The exponent on demand currently appears larger than the exponent on supply. Not only are there more users, but larger models require ever-more compute, and new techniques like agents take substantially more compute per user.
…
The scale of challenges will only grow. With every jump between GPT-2, 3, and 4, we needed entirely new ways to train and run the models at scale. This shall continue in future versions. All our new modalities like vision, images, and speech, require re-architecting systems, while unlocking new use cases.
The pace of development at OpenAI – and the ecosystem as a whole – is accelerating. We'll see what challenges the next 10x of scale holds!
Read full article featuring OpenAI’s head of engineering, Evan Morikawa, via Gergely.
Measuring GitHub Copilot’s impact on productivity (15/Feb/2024)
Research shows GitHub Copilot boosts developer productivity, with a notable increase in task speed and code quality, especially for less experienced developers. Importantly, the utility of Copilot's suggestions is valued more than their correctness, offering a beneficial starting point that developers can refine.
Read more via the ACM.
Axios: Teachers are embracing ChatGPT-powered grading (7/Mar/2024)
Writable, which is billed as a time-saving tool for teachers, was purchased last month by education giant Houghton Mifflin Harcourt, whose materials are used in 90% of [US] K-12 schools. Teachers use it to run students' essays through ChatGPT, then evaluate the AI-generated feedback and return it to the students.
Read more: https://www.axios.com/2024/03/06/ai-tools-teachers-chatgpt-writable
Writable: https://www.writable.com/ai/
Chinese AI-generated cartoon series broadcast on state television (27/Feb/2024)
State broadcaster China Media Group aired the country’s first cartoon series made with the help of generative artificial intelligence (GenAI) services, including text-to-video tools similar to OpenAI’s Sora. The 26-episode series, Qianqiu Shisong, which debuted on February 27, 2024, features some of the most fabled Chinese poetry and their backstories, with each instalment lasting around seven minutes.
This may be generated by an older text-to-image model, maybe related to Baidu’s ERNIE 4.0.
Watch the video via SCMP (apologies for the source!).
Scientists are putting ChatGPT brains inside robot bodies. What could possibly go wrong? (1/Mar/2024)
This is a longform article by Scientific American (4,000 words). Combining AI chatbot brains like ChatGPT with robot bodies could revolutionize robotics with increased flexibility and knowledge, but it also raises significant practical and ethical concerns.
Read more via Scientific American.
Figure Raises $675M at $2.6B Valuation and Signs Collaboration Agreement with OpenAI (29/Feb/2024)
Figure and OpenAI have entered into a collaboration agreement to develop next generation AI models for humanoid robots, combining OpenAI's research with Figure's deep understanding of robotics hardware and software. The collaboration aims to help accelerate Figure's commercial timeline by enhancing the capabilities of humanoid robots to process and reason from language.
"We've always planned to come back to robotics and we see a path with Figure to explore what humanoid robots can achieve when powered by highly capable multimodal models. We're blown away by Figure's progress to date and we look forward to working together to open up new possibilities for how robots can help in everyday life," said Peter Welinder, VP of Product and Partnerships at OpenAI.
Watch the latest video: https://youtu.be/gEjXcEU3Bbw
Meet Punyo, Toyota’s Soft Robot for Whole-Body Manipulation Research (28/Feb/2024)
This soft robot reminds me of Baymax from Disney’s Big Hero 6 (wiki).
Punyo’s hands, arms, and chest are covered with compliant materials and tactile sensors so it can feel contact. The softness allows Punyo to conform to the items it’s manipulating, enabling stability, increased friction, and evenly distributed contact forces. Tactile sensing allows Punyo to apply controlled forces on objects, sense contact (both expected and unexpected), and react to object slips and bumps. Tactile sensing is also important for interacting with people. Whether lifting heavy objects or physically assisting people, robots should be aware of their own bodies and interact appropriately.
Read a comprehensive article by Toyota: https://medium.com/toyotaresearch/meet-punyo-tris-soft-robot-for-whole-body-manipulation-research-949c934ac3d8
Read the official site: https://punyo.tech/
Watch the video (link):
Sanctuary AI Phoenix humanoid update (28/Feb/2024)
Powered by Carbon, Phoenix is now autonomously completing simple tasks at human-equivalent speed. This is an important step on the journey to full autonomy. Phoenix is unique among humanoids in its speed, precision, and strength, all critical for industrial applications.
Watch the video: https://twitter.com/realgeordierose/status/1762938715134157077
Unitree H1 Breaking humanoid robot speed world record V3.0 (2/Mar/2024)
Unitree says that their H1 is now ‘Breaking the full-size humanoid speed world record of 3.3m/s (the previous record was about 2.5m/s).’
Watch the video (link):
EMO: Emote portrait alive - Generating expressive portrait videos with Audio2Video Diffusion Model under weak conditions (Feb/2024)
Researchers at the Institute for Intelligent Computing, Alibaba Group, introduced EMO, a framework capable of generating expressive audio-driven portrait videos from a single reference image and audio input.
View the repo: https://humanaigc.github.io/emote-portrait-alive/
AI outperforms humans in standardized tests of creative potential (1/Mar/2024)
In yet another study, GPT-4 surpassed 151 humans in tests measuring creative potential. The study focused on divergent thinking tasks, where GPT-4 showcased more originality and elaboration in its responses. These tasks are key indicators of creative thought, highlighting AI's evolving capabilities in areas previously thought to be uniquely human.
Read more: https://www.sciencedaily.com/releases/2024/03/240301134758.htm
Sidenote: the gold standard for testing creativity is still the Torrance suite, and that testing was completed last year. GPT-4 approached the ceiling, in the 99th percentile for originality.
Read more (Jul/2023): https://www.umt.edu/news/2023/07/070523test.php
ServiceNow StarCoder increases productivity by 52% (26/Jan/2024)
ServiceNow's developers have been using text to code for several months. They are generating high-quality code using text to describe the type of code they want. This has increased our developer innovation speed by 52%.
Read source transcript via Yahoo Finance.
Read citation and context in the StarCoder 2 paper: https://arxiv.org/abs/2402.19173
Singapore parliament on AI (26/Feb/2024)
Singapore is well ahead of much of the world when it comes to informed government and AI.
Watch the video (link):
Singapore’s Temasek in talks to invest in OpenAI - FT reports (5/Mar/2024)
Singapore’s state investor Temasek Holdings is in discussions to invest in OpenAI, the creator of chatbot sensation ChatGPT, according to the Financial Times.
Temasek is an active investor in the tech sector with a portfolio valued at US$284B... Some of the companies in the portfolio include Roblox, Tencent, and Alibaba.
Confidential sidenote: if I were to have any advisory roles related to similar investments in this space, ‘no identify’ clauses would definitely not allow me to discuss them anyway.
Read more via Reuters.
Updated viz (Mar/2024)
See more: https://lifearchitect.ai/models/#model-bubbles
See more: https://lifearchitect.ai/models/#api
Policy
Anthropic and US regulation references (4/Mar/2024)
With the release of Claude 3 Opus, Anthropic referenced government commitments that are of interest. While these US regulations have been explored previously in The Memo, this may be the first time they’ve been referenced by a major AI lab for a frontier model release.
Our red teaming evaluations (performed in line with our White House commitments and the 2023 US Executive Order) have concluded that the [Claude 3] models present negligible potential for catastrophic risk at this time.
Read more via the Claude 3 announcement.
Elon Musk v OpenAI updates (Mar/2024)
I was privileged to speak with Dr David Millhouse from Bond University about the suit filed by Elon Musk against OpenAI, as well as the class action suit from 2023.
In a livestreamed conversation, Dr Millhouse talked about how this case could become one of the possible blockers to AI and AGI, through the legal mechanism of ‘promissory estoppel’ (wiki). My understanding of this is that in an extreme decision by the courts, OpenAI could be forced to ‘roll back’ and delete all models including GPT-4. Although very unlikely, it’s a fascinating discussion.
Watch the livestream recording: https://youtu.be/f7kiKleyGvs
OpenAI recently responded to Elon’s allegations (5/Mar/2024): https://openai.com/blog/openai-elon-musk
Sidenote: It’s best not to get distracted by these political hysterics. They’ll remain a tiny footnote in history.
OpenAI, tech companies sign pledge to build AI responsibly (4/Mar/2024)
OpenAI, Salesforce, and other tech companies have pledged a “collective responsibility” to responsibly maximize AI benefits and mitigate risks, amidst calls for ethical AI development and a lawsuit from Elon Musk.
Read more: https://fortune.com/2024/03/04/openai-signs-open-letter-ai-salesforce-sam-altman-elon-musk/
US NTIA AI Open Model Weights RFC (26/Feb/2024)
The US government—specifically the National Telecommunications and Information Administration (NTIA)—is considering the dangers of open models (those with downloadable weights), and calling for public comment.
…to conduct a public consultation process and issue a report on the potential risks, benefits, other implications, and appropriate policy and regulatory approaches to dual-use foundation models for which the model weights are widely available…
Foundation models with widely-available model weights could engender substantial harms, such as risks to security, equity, civil rights, or other harms due to, for instance, affirmative misuse, failures of effective oversight, or lack of clear accountability mechanisms.
Download the RFC: https://www.regulations.gov/document/NTIA-2023-0009-0001
Comments must be submitted by 28/Mar/2024. Only a few comments have been submitted so far, but doing so is a very big job.
There are 67 questions in total (thanks, Claude 2!), and I would expect it to take around 40 hours to comprehensively address all points with rigorous citations: https://www.regulations.gov/docket/NTIA-2023-0009
Toys to Play With
OpenAI TTS (Mar/2024)
I’ve been playing around with text-to-speech, with a simple use case of changing my voicemail(!).
The six new voices by OpenAI are perhaps better than those by Sonantic.io (before it was shuttered) and ElevenLabs. The best way to use these is through the OpenAI API, but of course there’s also a GPT for that.
Listen to the voices: https://platform.openai.com/docs/guides/text-to-speech
Try the GPT: https://chat.openai.com/g/g-a83ktVq7n-ai-voice-generator
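For readers who want to try the API route mentioned above, here is a minimal sketch that builds (but does not send) a text-to-speech request using only Python's standard library. The endpoint, the `tts-1` model name, and the six voice names are taken from OpenAI's text-to-speech guide at the time of writing and should be checked against the current documentation. The helper `build_tts_request`, the greeting text, and the API key are placeholders.

```python
import json
import urllib.request

# Endpoint and voice names as listed in OpenAI's text-to-speech guide;
# verify against the current documentation before use.
TTS_URL = "https://api.openai.com/v1/audio/speech"
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def build_tts_request(text: str, voice: str, api_key: str) -> urllib.request.Request:
    """Build a POST request asking for spoken audio of `text` (hypothetical helper)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    body = json.dumps({"model": "tts-1", "voice": voice, "input": text})
    return urllib.request.Request(
        TTS_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key and greeting; nothing is sent until urlopen() is called.
request = build_tts_request("Hi, you've reached my voicemail.", "nova", "YOUR_API_KEY")

# To actually fetch the audio (requires a valid key and network access):
# with urllib.request.urlopen(request) as response, open("voicemail.mp3", "wb") as f:
#     f.write(response.read())
```

Building the request separately from sending it makes it easy to inspect the payload and swap voices before spending any API credit.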
Sidenote: The chart below isn’t really fair, but compares GPTs (simple text prompts) with full-blown apps from the Apple and Google stores.

1-min Sora video (2/Mar/2024)
Here’s a full one-minute video generated by Sora. The prompt is:
fly through tour of a museum with many paintings and sculptures and beautiful works of art in all styles.
Watch: https://twitter.com/_tim_brooks/status/1764074241740460187
Flashback
A year ago (12/Mar/2023), the AGI countdown was only at 42%. Today it is at 71%. Here’s a video from all the way back then! (And it will be time for a video update as soon as we have OpenAI’s next big model.)
Watch my video (link):
Next
We moved this to the top of this edition! I’m eager to see OpenAI’s response to the frontier models already released early in 2024.
The next roundtable will be:
Life Architect - The Memo - Roundtable #8
Follows the Chatham House Rule (no recording, no outside discussion)
Saturday 16/Mar/2024 at 5PM Los Angeles
Saturday 16/Mar/2024 at 8PM New York
Sunday 17/Mar/2024 at 8AM Perth (primary/reference time zone)
or check your timezone via Google.
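Because Perth is the stated reference time zone, local times elsewhere can be derived from it rather than looked up. Here is a minimal sketch using Python's standard `zoneinfo` module. The IANA zone names for the listed cities are assumptions, and the conversion relies on the time zone database installed on your system, which also handles daylight saving changes.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Reference time as stated in the announcement: Sunday 17/Mar/2024, 8AM Perth.
reference = datetime(2024, 3, 17, 8, 0, tzinfo=ZoneInfo("Australia/Perth"))

# Assumed IANA names for the listed cities; add your own zone here.
zones = ["America/Los_Angeles", "America/New_York"]

for name in zones:
    local = reference.astimezone(ZoneInfo(name))
    print(f"{name}: {local:%A %d/%b/%Y %I:%M %p}")
```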
You don’t need to do anything for this; there’s no registration or forms to fill in, I don’t want your email, you don’t even need to turn on your camera or give your real name!
All my very best,
Alan
LifeArchitect.ai
Hi Alan, I guess the round table call is going to be in about 50 minutes and not at 4 PM PDT as indicated in your message?
I did the time zone calculation and it's only 7:10 AM now at Perth - guessing due to recent time change in the US cities.
thanks,
Anand.