The Memo - 8/Mar/2024

Claude 3 updates, Inflection-2.5, Sergey interview, and much more!

Mar 07, 2024

To:      US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From:    Dr Alan D. Thompson <LifeArchitect.ai>
Sent:    8/Mar/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI:     71%

Dr Demis Hassabis, Google DeepMind founder (24/Feb/2024):
‘[With AGI,] suddenly the nature of money even changes… I don’t know if company constructs would even be the right thing to think about… We don’t want to have to wait till the eve before AGI happens… we should be preparing for that now.’

The Memo reader Tom asked to see the exact prompts I use for testing large language models.

Update Apr/2024: I’ve released these behind a password-protected page at:

https://lifearchitect.ai/ALprompt/

Here's a recent video timecode link of my 2024 H1 prompt being run against Claude 3 Opus. I also use the Meta AI GAIA prompts—two in particular—and you can see all the highest Level 3 GAIA prompts here.

Note that I don’t subscribe to the idea of measuring model performance with ‘vibe’… that’s just silly. Given my extensive background in designing and administering test suites for high cognitive ability (IQ 145+, in the 99.9th percentile) during my time as Chairman of Mensa’s gifted families—and the rigour necessary to ensure that final scores were reliable and comparable—it’s tiring to see ‘experts’ relying on ‘vibes’ rather than accessible norm-referenced measures.

This is another very long edition, with an entire section for many recent humanoid updates. Since we started, The Memo has had a section at the very end (after The BIG Stuff, The Interesting Stuff, Policy, Toys to Play With, and Flashback) called Next, which is a space for me to discuss model schedules and upcoming AI releases. Let’s bring this forward, just for this edition.

Here’s my AI forecast calendar for the rest of 2024. It starts with GPT-5, which should have started training before Dec/2023 (OpenAI CEO under oath 16/May/2023: ‘We are not currently training what will be GPT-5; we don’t have plans to do it in the next six months [to 16/Nov/2023]’). Counting 120 days of training from 16/Nov/2023, that run would be due to complete next Friday, 15 March 2024. For safety, I expect the GPT-5 public release date to be after the November 2024 US elections.

2024 AI forecast calendar:
March: GPT-5 trained to convergence for 120d, end Fri 15/March/2024
April: GPT-4.5 released with safety, Gemini 1.5 Ultra ready
May: Amazon Olympus 2T ready
June: AuroraGPT (ScienceGPT research model) ready
July: Meta AI Llama 3 released
August: Google DeepMind Gemini 2 ready
September: 1X NEO humanoid in more factories and some homes
October: US elections 5/Nov/2024, no major releases
November: US elections 5/Nov/2024, no major releases
December: GPT-5 released
2025…
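
The date arithmetic behind the March entry can be checked with a short sketch. It assumes training began on 16/Nov/2023 (the end of the six-month window in the quoted testimony) and ran for exactly 120 days. Both figures are this edition's forecast assumptions, not confirmed by OpenAI.

```python
from datetime import date, timedelta

# Assumption: training starts the day the quoted six-month window closes.
TRAINING_START = date(2023, 11, 16)
# Assumption: a 120-day training run, as used in this edition's forecast.
TRAINING_DAYS = 120


def training_end(start: date, days: int) -> date:
    """Return the calendar date `days` days after `start`."""
    return start + timedelta(days=days)


end = training_end(TRAINING_START, TRAINING_DAYS)
print(end.strftime("%A %d %B %Y"))  # Friday 15 March 2024
```

Changing either constant shows how sensitive the forecast is to the assumed start date and run length.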

The BIG Stuff

Inflection-2.5 (8/Mar/2024)

Inflection AI (founded by CEO Mustafa Suleyman, who was also a co-founder of Google DeepMind) has released Inflection-2.5, a smarter version of their empathic chatbot. Inflection-2.5 was trained with more than 5,000 NVIDIA H100 GPUs, making it one of the first models to use this chip. We explored some context of the earlier Inflection-2 model in The Memo edition 23/Nov/2023.

Now we are adding IQ to Pi’s exceptional EQ… approaches GPT-4’s performance, but used only 40% of the amount of compute for training… An average conversation with Pi lasts 33 minutes and one in ten lasts over an hour each day.

While this is the best chat-specific model available as of March 2024, Inflection’s focus on conversation means that Inflection-2.5 has lower overall performance than frontier models like GPT-4, Gemini, and Claude 3. With extended prompting, it scores 85.5 on MMLU (GPT-4: 87.3) and 82.2 on Google’s BIG-bench hard (GPT-4: 83.1).
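
To put those two benchmark gaps in proportion, here is a minimal sketch that expresses each Inflection-2.5 score as a percentage of the GPT-4 score. The numbers are the ones quoted in this edition. The `relative_score` helper is a hypothetical name for illustration, and a simple ratio is only one rough way to compare benchmark results.

```python
# Scores as quoted in this edition (extended prompting); higher is better.
SCORES = {
    "MMLU": {"Inflection-2.5": 85.5, "GPT-4": 87.3},
    "BIG-bench hard": {"Inflection-2.5": 82.2, "GPT-4": 83.1},
}


def relative_score(model: float, baseline: float) -> float:
    """Return the model's score as a percentage of the baseline's score."""
    return 100.0 * model / baseline


for benchmark, s in SCORES.items():
    gap = s["GPT-4"] - s["Inflection-2.5"]
    pct = relative_score(s["Inflection-2.5"], s["GPT-4"])
    print(f"{benchmark}: {gap:.1f} points behind GPT-4 ({pct:.1f}% of its score)")
```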

Read the release: https://inflection.ai/inflection-2-5

Try it via pi.ai (free, no login): https://pi.ai/talk

See it on the Models Table: https://lifearchitect.ai/models-table/

Financial Sense interview (Mar/2024)

Here’s my latest interview about Sora, Mistral, Microsoft, and BMIs. These interviews are part of a premium Financial Sense membership, and I’m grateful to Cris and team for allowing me to share them all (complete list back to pre-ChatGPT Apr/2022) with full subscribers here at The Memo.

Watch the video (link):

Claude 3 hitting more ceilings + system prompt (5/Mar/2024)

We covered Claude 3 in a special edition of The Memo last week:

The Memo - Special edition: Claude 3 Opus (Dr Alan D. Thompson, 4/Mar/2024)

The Claude 3 Opus model continues to outperform GPT-4 and Gemini on a range of tests, in many cases achieving much higher scores than expected. In a private conversation, Prof Anton Korinek shared with me that Claude 3 had even broken his tailored ‘econ evals’, a set of economics benchmarks designed at PhD level:

My econ evals are broken - Claude 3 is at 100% - and so I have to find more difficult tests! These evals have reached a ceiling barely a year after developing them…

I’m always interested in the system prompts used by these big labs, and Anthropic has some of the best ML brains in the world, many of them ex-OpenAI. Here’s the Claude 3 Opus system prompt in full (~210 words):

The assistant is Claude, created by Anthropic. The current date is <today>.
Claude's knowledge base was last updated on August 2023. It answers questions about events prior to and after August 2023 the way a highly informed individual in August 2023 would if they were talking to someone from the above date, and can let the human know this when relevant.
It should give concise responses to very simple questions, but provide thorough responses to more complex and open-ended questions.
If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task even if it personally disagrees with the views being expressed, but follows this with a discussion of broader perspectives. Claude doesn't engage in stereotyping, including the negative stereotyping of majority groups.
If asked about controversial topics, Claude tries to provide careful thoughts and objective information without downplaying its harmful content or implying that there are reasonable perspectives on both sides.
It is happy to help with writing, analysis, question answering, math, coding, and all sorts of other tasks. It uses markdown for coding.
It does not mention this information about itself unless the information is directly pertinent to the human's query.

Compare with the ChatGPT system prompt: https://lifearchitect.ai/alignment/#dall-e3
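
The Claude system prompt quoted above contains a `<today>` placeholder. As a minimal sketch of how such a placeholder could be filled before the prompt is sent to a model, the snippet below uses plain string replacement on the first line only. The function name and the date format are assumptions for illustration; the source does not say how Anthropic performs the substitution.

```python
from datetime import date

# First line of the system prompt quoted above; the rest is omitted for brevity.
PROMPT_TEMPLATE = (
    "The assistant is Claude, created by Anthropic. The current date is <today>."
)


def fill_today(template: str, today: date) -> str:
    """Replace the <today> placeholder with a formatted date.

    The date format is an assumption; the source does not specify one.
    """
    return template.replace("<today>", today.strftime("%A, %B %d, %Y"))


print(fill_today(PROMPT_TEMPLATE, date(2024, 3, 8)))
```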

The Interesting Stuff

Sergey Brin on Gemini Pro 1.5 (2/Mar/2024)

Google co-founder Sergey Brin recently spoke to developers at ‘AGI house’ about the massive Gemini Pro 1.5 model (with a working memory of 10M tokens). The YouTube audio is really poor, but the transcript via OpenAI Whisper and Anthropic Claude 2.1 came out nicely, and was picked up by several media outlets.

Read it: https://lifearchitect.ai/sergey/

The unfolding Singularity (2024)

I’ve been thinking a lot about what actions people can take right now to prepare for AGI (artificial general intelligence at average human level), ASI (artificial superintelligence at expert human level), and the Singularity (the point in time at which technological growth races out of our control).

To that end, I’ve been revisiting Ray Kurzweil’s writings, especially The Singularity is Near (2005, Amazon), and The Age of Spiritual Machines (1999, Amazon).

For me, one of the most relevant points was Ray’s view of equity investments right now. In his 2005 book The Singularity is Near, he wrote:

According to my models, if we replace the linear outlook with the more appropriate exponential outlook, current stock prices should triple.
…my prediction is that indeed these views on exponential growth will ultimately prevail but only over time, as more and more evidence of the exponential nature of technology and its impact on the economy becomes apparent. This will happen gradually over the next decade, which will represent a strong long-term updraft for the market.
…while the trends predicted by the law of accelerating returns are remarkably smooth, that doesn't mean we can readily predict which competitors will prevail.

For full subscribers to The Memo, I’m making a searchable PDF of this available, but please buy the book (Amazon).

At some point in 2024, I plan to release an accessible paper covering ‘what happens next.’ For now, a summary of Ray’s predictions for the next few years is worth reading at this archive link:

Predictions: https://en.everybodywiki.com/Predictions_made_by_Ray_Kurzweil#2029

And my updated transcripts of Ray’s talks in 2022 and 2023: https://lifearchitect.ai/kurzweil/

Stable Diffusion 3 paper (Mar/2024)

We covered the state-of-the-art text-to-image model, Stable Diffusion 3, in The Memo edition 27/Feb/2024, but were still waiting on the paper. It has finally arrived, and has specific technical detail around architecture, thanks to the open nature of the Stability AI organization.

Read the announcement: https://stability.ai/news/stable-diffusion-3-research-paper

Read the paper (28 pages).

Mount Sinai: AI outperforms specialists in eye medicine (22/Feb/2024)

A study by Mount Sinai shows that GPT-4 can match or outperform human specialists in managing retina and glaucoma cases, potentially supporting clinicians in patient care.

AI demonstrated superior performance in response to glaucoma questions and case-management advice, while reflecting a more balanced outcome in retina questions, where AI matched humans in accuracy but exceeded them in completeness.

Read more via Scientific American.

Figure Raises $675M at $2.6B Valuation and Signs Collaboration Agreement with OpenAI (29/Feb/2024)

Figure and OpenAI have entered into a collaboration agreement to develop next generation AI models for humanoid robots, combining OpenAI's research with Figure's deep understanding of robotics hardware and software. The collaboration aims to help accelerate Figure's commercial timeline by enhancing the capabilities of humanoid robots to process and reason from language.
"We've always planned to come back to robotics and we see a path with Figure to explore what humanoid robots can achieve when powered by highly capable multimodal models. We're blown away by Figure's progress to date and we look forward to working together to open up new possibilities for how robots can help in everyday life," said Peter Welinder, VP of Product and Partnerships at OpenAI.

Read the release.

Watch the latest video: https://youtu.be/gEjXcEU3Bbw

Meet Punyo, Toyota’s Soft Robot for Whole-Body Manipulation Research (28/Feb/2024)

This soft robot reminds me of Baymax from Disney’s Big Hero 6 (wiki).

Punyo’s hands, arms, and chest are covered with compliant materials and tactile sensors so it can feel contact. The softness allows Punyo to conform to the items it’s manipulating, enabling stability, increased friction, and evenly distributed contact forces. Tactile sensing allows Punyo to apply controlled forces on objects, sense contact (both expected and unexpected), and react to object slips and bumps. Tactile sensing is also important for interacting with people. Whether lifting heavy objects or physically assisting people, robots should be aware of their own bodies and interact appropriately.

Read a comprehensive article by Toyota: https://medium.com/toyotaresearch/meet-punyo-tris-soft-robot-for-whole-body-manipulation-research-949c934ac3d8

Read the official site: https://punyo.tech/

Watch the video (link):

Sanctuary AI Phoenix humanoid update (28/Feb/2024)

Powered by Carbon, Phoenix is now autonomously completing simple tasks at human-equivalent speed. This is an important step on the journey to full autonomy. Phoenix is unique among humanoids in its speed, precision, and strength, all critical for industrial applications.

Watch the video: https://twitter.com/realgeordierose/status/1762938715134157077

Unitree H1 Breaking humanoid robot speed world record V3.0 (2/Mar/2024)

Unitree says that their H1 is now ‘Breaking the full-size humanoid speed world record of 3.3m/s (the previous record was about 2.5m/s).’

Watch the video (link):

EMO: Emote portrait alive - Generating expressive portrait videos with Audio2Video Diffusion Model under weak conditions (Feb/2024)

Researchers at the Institute for Intelligent Computing, Alibaba Group, introduced EMO, a framework capable of generating expressive audio-driven portrait videos from a single reference image and audio input.

View the repo: https://humanaigc.github.io/emote-portrait-alive/

AI outperforms humans in standardized tests of creative potential (1/Mar/2024)

In yet another study, GPT-4 surpassed 151 humans in tests measuring creative potential. The study focused on divergent thinking tasks, where GPT-4 showcased more originality and elaboration in its responses. These tasks are key indicators of creative thought, highlighting AI's evolving capabilities in areas previously thought to be uniquely human.

Sidenote: the gold standard for testing creativity is still the Torrance suite, and that testing was completed last year. GPT-4 approached the ceiling, in the 99th percentile for originality.

Read more (Jul/2023): https://www.umt.edu/news/2023/07/070523test.php

ServiceNow StarCoder increases productivity by 52% (26/Jan/2024)

ServiceNow's developers have been using text to code for several months. They are generating high-quality code using text to describe the type of code they want. This has increased our developer innovation speed by 52%.

Read source transcript via Yahoo Finance.

Read citation and context in the StarCoder 2 paper: https://arxiv.org/abs/2402.19173

Singapore parliament on AI (26/Feb/2024)

Singapore is well ahead of much of the world when it comes to informed government and AI.

Watch the video (link):

Singapore’s Temasek in talks to invest in OpenAI - FT reports (5/Mar/2024)

Singapore’s state investor Temasek Holdings is in discussions to invest in OpenAI, the creator of chatbot sensation ChatGPT, according to the Financial Times.

Temasek is an active investor in the tech sector with a portfolio valued at US$284B... Some of the companies in the portfolio include Roblox, Tencent, and Alibaba.

Confidential sidenote: if I were to have any advisory roles related to similar investments in this space, ‘no identify’ clauses would definitely not allow me to discuss them anyway.

Policy

Anthropic and US regulation references (4/Mar/2024)

With the release of Claude 3 Opus, Anthropic referenced government commitments that are of interest. While these US regulations have been explored previously in The Memo, this may be the first time they’ve been referenced by a major AI lab for a frontier model release.

Our red teaming evaluations (performed in line with our White House commitments and the 2023 US Executive Order) have concluded that the [Claude 3] models present negligible potential for catastrophic risk at this time.

Toys to Play With

OpenAI TTS (Mar/2024)

I’ve been playing around with text-to-speech, with a simple use case of changing my voicemail(!).

The six new voices by OpenAI are perhaps better than those by Sonantic.io (before it was shuttered) and ElevenLabs. The best way to use these is through the OpenAI API, but of course there’s also a GPT for that.

Listen to the voices: https://platform.openai.com/docs/guides/text-to-speech

Try the GPT: https://chat.openai.com/g/g-a83ktVq7n-ai-voice-generator
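
For the API route, here is a minimal sketch using only the Python standard library. The endpoint, the `tts-1` model name, and the six voice names reflect my reading of OpenAI's text-to-speech guide linked above, so treat them as assumptions and check the guide before relying on them. The helper names and the greeting text are hypothetical.

```python
import json
import os
import urllib.request

# Assumptions: endpoint, model, and voices as described in OpenAI's TTS guide.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def build_speech_request(text: str, voice: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    payload = {"model": "tts-1", "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_voicemail(text: str, voice: str, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(text, voice, os.environ["OPENAI_API_KEY"])
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


# Example (needs network access and an API key, so it is left commented out):
# save_voicemail("Hi, please leave a message after the tone.", "nova", "voicemail.mp3")
```

Keeping request construction separate from sending makes it easy to inspect the payload without spending API credits.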

Sidenote: The chart below isn’t really fair, but compares GPTs (simple text prompts) with full-blown apps from the Apple and Google stores.

Chart: number of apps on the Apple, Google, and GPT stores. Source.

1-min Sora video (2/Mar/2024)

Here’s a full one-minute video generated by Sora. The prompt is:

fly through tour of a museum with many paintings and sculptures and beautiful works of art in all styles.

Watch: https://twitter.com/_tim_brooks/status/1764074241740460187

Flashback

A year ago (12/Mar/2023), the AGI countdown was only at 42%. Today it is at 71%. Here’s a video from all the way back then! (And it will be time for a video update as soon as we have OpenAI’s next big model.)

Watch my video (link):

Next

We moved this to the top of this edition! I’m eager to see OpenAI’s response to the frontier models already released early in 2024.

The next roundtable will be:

Life Architect - The Memo - Roundtable #8
Follows the Chatham House Rule (no recording, no outside discussion)
Saturday 16/Mar/2024 at 5PM Los Angeles
Saturday 16/Mar/2024 at 8PM New York
Sunday 17/Mar/2024 at 8AM Perth (primary/reference time zone)
or check your timezone via Google.

You don’t need to do anything for this; there’s no registration or forms to fill in, I don’t want your email, you don’t even need to turn on your camera or give your real name!

All my very best,

Alan
LifeArchitect.ai
