To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 13/May/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 72% ➜ 73%
Here’s something I’ve been thinking about a lot this year: Who will pay them back? I explore that question in the video below. For the voiceover, I tried OpenAI TTS and ElevenLabs, but neither had the right feel. Special thanks to Jennifer Vuletic (link) for lending her human voice and slotting this mini project into her schedule, in between her work for major publishers and public figures like former Australian Prime Minister Julia Gillard (Audible).
Watch my video (16mins, link):
Contents
The BIG Stuff (DrEureka, Neuralink update…)
The Interesting Stuff (Ukraine avatar, full Sora video, Optimus, Dojo 18kA…)
Policy (Pause, OpenAI specs + Alan rant, new govt supercomputers…)
Toys to Play With (LLM for kids, music, MIT, Super Mario…)
Flashback (GPT-3…)
Next (new models, invitation link to next roundtable…)
The BIG Stuff
Next: OpenAI spring update + Google I/O (13/May/2024)
Within 24 hours of this emailed edition, OpenAI will livestream a ‘spring update’ (link).
This will also be the day before the Google I/O conference (link), where Google is expected to announce a Sora clone called Miro (text-to-video), Imagen 3 (text-to-image), and Juno V1 3B (image inpainting). (via leaker Bedros Pamboukia, 12/May/2024)
Sidenote: The use of seasonal terms like ‘spring’ by US entities is incredibly boring. Aside from the fact that the US has only 4.23% of the total world population and might like to consider the other 95%, its seasons move based on solstices (link), unlike, say, Australia, where our seasons change on the first of the month. Also, what even is ‘fall’?
Here’s the expected lineup for the OpenAI announcements:
❌ GPT-5
❌ Search engine [Sidenote: This would mean Reuters was wrong again…]
✅ Phone calls within ChatGPT via WebRTC (wiki) + more integrations
✅ New models gpt-4l, gpt-4l-auto, gpt-4-auto, maybe related to new models coming in Microsoft Copilot: next-model4 and next-model8
✅ A Steve Jobs-style ‘one more thing’ (here’s a fun flashback video by CNET)
I’ll update the web version of this edition of The Memo here:
UPDATE 1:
GPT-4o (‘omni’ model), which OpenAI says is ‘the best model in the world’ (13/May/2024)
MMLU=88.7. GPQA=53.6 (see viz below).
Livestream link: YouTube.com (26mins).
UPDATE 2:
Dr Jim Fan (14/May/2024): ‘[GPT-4o is] likely an early checkpoint of GPT-5’
But my testing shows GPT-4o is actually worse than GPT-4 Turbo (and definitely worse than Claude 3 Opus) across ‘IQ’ benchmarks. It’s only the multimodal aspect that makes this an evolution.
Wild video demo (link):
GPT-4 + Unitree Go1 quadruped robot = DrEureka (UPenn, NVIDIA, UT Austin) (4/May/2024)
Dr Jim Fan from NVIDIA announced a ‘surprising’ evolution of embodied AI, using GPT-4 to intuitively address ‘friction, damping, stiffness, gravity, etc.’ in robotics. Developed with teams from UPenn and UT Austin, the system is called DrEureka.
The name combines ‘Domain Randomization’ with the 2023 Eureka system, which pairs LLMs (currently GPT-4) with NVIDIA GPU-accelerated simulation technologies (20/Oct/2023).
We trained a robot dog to balance and walk on top of a yoga ball purely in simulation, and then transfer zero-shot to the real world. No fine-tuning. Just works.
I’m excited to announce DrEureka, an LLM agent that writes code to train robot skills in simulation, and writes more code to bridge the difficult simulation-reality gap. It fully automates the pipeline from new skill learning to real-world deployment.
The Yoga ball task is particularly hard because it is not possible to accurately simulate the bouncy ball surface. Yet DrEureka has no trouble searching over a vast space of sim-to-real configurations, and enables the dog to steer the ball on various terrains, even walking sideways!
Traditionally, the sim-to-real transfer is achieved by domain randomization, a tedious process that requires expert human roboticists to stare at every parameter and adjust by hand. Frontier LLMs like GPT-4 have tons of built-in physical intuition for friction, damping, stiffness, gravity, etc.
We are (mildly) surprised to find that DrEureka can tune these parameters competently and explain its reasoning well. DrEureka builds on our prior work Eureka, the algorithm that teaches a 5-finger robot hand to do pen spinning. It takes one step further on our quest to automate the entire robot learning pipeline by an AI agent system. One model that outputs strings will supervise another model that outputs torque control. (4/May/2024)
This complex AI embodiment advance—and especially the application of LLMs to sense and adjust parameters for ‘friction, damping, stiffness, gravity, etc’—moved my AGI countdown another percentage point from 72% ➜ 73%.
View the repo + videos.
Read an analysis by NewAtlas.
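To make the ‘domain randomization’ idea concrete, here is a minimal Python sketch of the sampling loop at its core: each simulated training episode draws physics parameters (friction, damping, stiffness, gravity) from ranges, so the learned skill survives contact with whatever the real world’s values turn out to be. In DrEureka, an LLM proposes and refines these ranges automatically; the parameter names, ranges, and simulator call below are illustrative assumptions, not values from the paper.

```python
import random

# Hypothetical sim-to-real ranges of the kind an LLM agent might propose
# (illustrative only -- not the actual DrEureka configuration).
PARAM_RANGES = {
    "friction":  (0.3, 1.2),    # ground contact friction coefficient
    "damping":   (0.5, 2.0),    # joint damping
    "stiffness": (20.0, 60.0),  # joint stiffness
    "gravity":   (9.0, 10.6),   # m/s^2, perturbed around Earth's 9.81
}

def sample_domain() -> dict:
    """Draw one randomized physics configuration for a training episode."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

# Training under a different random physics domain each episode forces the
# policy to be robust to the (unknown) real-world parameters.
for episode in range(3):
    physics = sample_domain()
    print(f"episode {episode}: {physics}")
    # simulator.reset(**physics)  # hypothetical API; train one episode here
```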
Neuralink PRIME Study Progress Update — User Experience (8/May/2024)
Noland Arbaugh, the first human participant in Neuralink's PRIME study, has been using the Link brain-computer interface to control his laptop and play games from various positions, including while lying down in bed. He’s just hit 100 days with the implant. Noland said:
Y'all are giving me too much, it's like a luxury overload, I haven't been able to do these things in 8 years and now I don't know where to even start allocating my attention.
The biggest thing with comfort is that I can lie in my bed and use [the Link]… It lets me live on my own time, not needing to have someone adjust me, etc. throughout the day.
[The Link] has helped me reconnect with the world, my friends, and my family. It's given me the ability to do things on my own again without needing my family at all hours of the day and night.
[The Neuralink BCI is] still improving; the games I can play now are leaps and bounds better than previous ones. I’m beating my friends in games that as a quadriplegic I should not be beating them in.
I think it should give a lot of people a lot of hope for what this thing can do for them, first and foremost their gaming experience, but then that'll translate into so much more and I think that's awesome.
Read more via Neuralink Blog.
As usual, the media focused on the Terrible, Horrible, No Good, Very Bad AI™ (that’s from a book that I know you’ll recall, wiki), with some threads physically retracting from Noland’s brain, a practical issue already resolved by Neuralink: CNBC, Wired, WSJ.
The Interesting Stuff
DeepSeek-V2 (8/May/2024)
DeepSeek-AI has released a 236B parameter MoE model called DeepSeek-V2, trained on an incredibly large dataset of 8.1T tokens. MMLU=78.5. The dataset included 12% more Chinese data than English, ‘therefore, we acknowledge that DeepSeek-V2 still has a slight gap in basic English capabilities [even compared with smaller models like Llama 3 70B]’.
Read the paper: https://arxiv.org/abs/2405.04434
Try it here (free, login): https://chat.deepseek.com/
See it on the Models Table.
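For readers new to mixture-of-experts (MoE): only a fraction of those 236B parameters is active for any given token (DeepSeek-V2 activates 21B per token), because a small router picks a handful of expert sub-networks for each token. Here is a generic top-k routing sketch in Python; it shows the routing concept only and is not DeepSeek-V2’s exact gating scheme.

```python
import numpy as np

def moe_route(token_vec: np.ndarray, gate_weights: np.ndarray, k: int = 2):
    """Generic top-k MoE routing (illustrative, not DeepSeek-V2's scheme).
    Scores every expert for this token, keeps the k highest, and returns
    their indices plus softmax-normalized mixing weights."""
    scores = gate_weights @ token_vec            # one score per expert
    top = np.argsort(scores)[-k:]                # indices of the k best experts
    w = np.exp(scores[top] - scores[top].max())  # numerically stable softmax
    return top, w / w.sum()

rng = np.random.default_rng(0)
num_experts, d_model = 8, 16
token = rng.normal(size=d_model)
gate = rng.normal(size=(num_experts, d_model))
print(moe_route(token, gate))  # only k of the 8 experts run for this token
```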
Victoria Shi, digital representative of Ukraine (1/May/2024)
The Ministry of Foreign Affairs (MFA) of Ukraine was established in 1991 when Ukraine became an independent state after the collapse of the Soviet Union. (PDF, 1999)
Meet Victoria Shi — a digital representative of the MFA of Ukraine, created using AI to provide timely updates on consular affairs! For the first time in history, the MFA of Ukraine has presented a digital persona that will officially comment for the media.
Comments from Victoria will appear on the MFA's official website & social media platforms. The only original videos featuring statements from Victoria are those that contain a QR code linking to the MFA's official page with the statement's text.
Source: https://twitter.com/MFA_Ukraine/status/1785558101908742526
This is an interesting evolution of avatars, a nice upgrade to my 2021 Leta AI, and the 2022 Marija by the Government of Malta (featured in one of the very first editions of this advisory—more than two years ago—in The Memo edition 12/Mar/2022).
Unfortunately, the use of QR codes for provenance is misguided and naive, offering no real security or protection, and maybe even adding an attack vector for hackers to pursue.
All editions of The Memo provide robust, industry-grade, comprehensive advisory to government, enterprise, and you. We’re just ⅓ of the way through the 4,300 words of this edition, including my 900-word audit of OpenAI’s recently released model document. Let’s get into it!
Wayve: NVIDIA and Microsoft invest as UK AI firm raises US$1B (7/May/2024)
London-based Wayve has raised US$1B to develop its artificial intelligence for driverless cars; the technology learns to drive by watching human drivers. Microsoft and chip-maker NVIDIA took part in the funding round, which values Wayve at around US$2.5B. It is the largest known investment in an AI company in Europe to date.
We’ve covered these guys in The Memo a couple of times due to their release of a model called GAIA-1.
Wayve is developing technology intended to power future self-driving vehicles by using what it calls "embodied AI".
Unlike AI models carrying out cognitive or generative tasks such as answering questions or creating pictures, this new technology interacts with and learns from real-world surroundings and environments.
Read more via BBC News.
Read the announce by Wayve.
Sora music video: Washed Out - The Hardest Part (1/May/2024)
This is the first official commissioned music video collaboration between a music artist and filmmaker made with OpenAI's Sora video model.
It looks just a little weird and a little rough, but give it a few weeks/months…
Watch the video (link):
Tesla Optimus update (5/May/2024)
Tesla continues to develop its Optimus humanoid, now performing even more factory tasks. The video below is from Tesla, 1m27s.
Source: https://twitter.com/Tesla_Optimus/status/1787027808436330505
Tesla’s wafer-scale AI processor enters production (2/May/2024)
TSMC revealed that Tesla’s Dojo system-on-wafer processor for AI training is now in mass production. The system combines 25 chips on a single wafer using TSMC’s integrated fan-out system-on-wafer (InFO_SoW) technology for wafer-scale interconnection. The massive 15,000W processor ‘is on track to be deployed shortly’ and ‘requires a sophisticated cooling system’ to handle the extreme heat.
To feed the system-on-wafer, Tesla uses a highly complex voltage-regulating module that delivers 18,000 amps of power to the compute plane. The latter dissipates as much as 15,000W of heat and thus requires liquid cooling.
Read more via Tom's Hardware and IEEE.
And that 18,000 amps is not a misprint. This analysis from Sep/2021:
It takes 52V DC and draws 18,000 amps. It dissipates 15KW of thermal. The key thing is that the compute plane is orthogonal to power supply and cooling. The only thing I've seen that compares to this is the support infrastructure required to power and cool the Cerebras wafer-scale chip. So this is a 9 petaflop training tile in less than one cubic foot.
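A quick sanity check of my own on those figures: 15,000W of dissipation at 18,000 amps implies roughly 15,000 ÷ 18,000 ≈ 0.83 volts at the compute plane, a typical core voltage for modern logic. That is exactly why the 52V DC input has to be stepped down through such a complex voltage-regulating module before it reaches the silicon.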
See the real system in the history-making Tesla video from 2021 (timecode 1h57m17s):
CoreWeave, a US$19B AI compute provider, opens European HQ in London with plans for 2 UK data centers (10/May/2024)
CoreWeave, a New Jersey-based GPU cloud company valued at US$19B, has opened an office in London to serve as its European headquarters. The startup also plans to invest £1B (US$1.25B) to open two data centers in the UK this year.
Read more via TC.
'I will never go back': Ontario family doctor says new AI notetaking saved her job (2/May/2024)
Dr Rosemary Lall, a family physician in Scarborough [Toronto, Ontario, Canada], was ready to quit her job due to the overwhelming paperwork burden, until she started using a new AI app that acts as a real-time note-taking assistant during patient visits.
Physicians, Lall said, are expected to update patient charts, fill out medical forms, provide sick notes and provide specialist referrals.
The administrative burden would often take her up to two hours per day. The Ontario Medical Association has estimated family doctors spend 19 hours per week on administrative tasks, including four hours spent writing notes or completing forms for patients.
I really feel this should be the next gold standard for all of our doctors. It decreases the cognitive load you feel at the end of the day.
Read more via Global News.
X using Grok AI to generate Stories (3/May/2024)
X (formerly Twitter) is using its Grok AI to publish AI-generated summaries of trending news and discussions on the platform in a new 'Stories on X' feature for premium subscribers. The summaries, which come with a disclaimer that 'Grok can make mistakes', are similar to the old human-curated Twitter Moments that were discontinued in 2022.
Read more via Engadget.
Atlassian Rovo uses AI to unlock enterprise knowledge (1/May/2024)
Atlassian announced Rovo, a new AI-powered product that helps organizations 'find, learn, and act on information dispersed across a range of internal tools'. Rovo leverages Atlassian's 'teamwork graph' which draws in data from Atlassian products and connected SaaS apps to deliver relevant search results, interactive knowledge cards, conversational AI, and AI agents to streamline workflows.
How the computer games industry is embracing AI (3/May/2024)
Artificial intelligence is increasingly being used in the video games industry, from creating highly realistic graphics to generating complex game scenarios. AI can generate immersive open-world environments and is being used to create unique in-game experiences tailored to each player.
Sometimes the AI comes up with surprising ideas, as one developer quoted by the BBC recalls:
"I remember once we were trying to build a police station and we asked the AI to populate it, and it came back with a doughnut on every desk.
"Another time, we were building an apartment and it kept consistently putting a sock under the coffee table. We wondered if it was a bug but it turned out we had labelled it a bachelor apartment so I guess that it was logical to some extent," he says.
Read more via BBC News.
40,000 AI-narrated audiobooks flood Audible, dividing authors and listeners (6/May/2024)
Over 40,000 audiobooks narrated by AI have been added to Audible since Amazon launched a tool allowing self-published authors to easily generate AI narrations. While some indie authors celebrate the cost savings, the flood of AI audiobooks is raising concerns among professional narrators about potential job losses.
Read more via TechSpot.
Worldcoin is surging in Argentina thanks to 288% inflation (1/May/2024)
With an economic crisis gripping Argentina, people are having their irises scanned in exchange for US$50 in crypto. Authorities have called for Worldcoin to be investigated.
De León is one of about half a million Argentines who have handed their biometric data over to Worldcoin.
Last year, Worldcoin Orbs were available in 25 countries. Today, they are limited to Argentina, Chile, Germany, Japan, Singapore, Mexico, South Korea, and the US.
Read more via Rest of World.
More than 4 million people in 120 countries have signed up to have their irises scanned. Read an analysis via Reuters.
The Worldcoin Orbs are just one piece of the puzzle related to Sam Altman’s 2021 view of what happens after AGI: https://moores.samaltman.com/
Policy
Defense think tank MITRE to build AI supercomputer with NVIDIA (7/May/2024)
MITRE is a federally funded, not-for-profit research organization that has supplied US soldiers and spies with exotic technical products since the 1950s.
If you’ve ever wondered just how far government is behind enterprise, here’s a very clear indicator:
…the planned [2024 govt] supercomputer will run 256 NVIDIA graphics processing units, or GPUs, at a cost of US$20 million. This counts as a small supercomputer: the world’s fastest supercomputer, Frontier in Tennessee, boasts 37,888 GPUs, and Meta is seeking to build one with 350,000 GPUs.
…
“There’s huge opportunities for AI to make government more efficient,” said Charles Clancy, senior vice president of MITRE. “Government is inefficient, it’s bureaucratic, it takes forever to get stuff done … That’s the grand vision, is how do we do everything from making Medicare sustainable to filing your taxes easier?”
“This is a platform by which MITRE can train these large-language models,” he said. “You can’t do this important AI work if you don’t have this infrastructure.”
Sidenote: It’s both laughable and incredibly worrying that government is at least four years behind. Recall that all the way back in 2020, OpenAI’s GPT-3 was trained on thousands of NVIDIA V100s, perhaps 10-20× more than this new govt ‘supercomputer’…
Read more via The Washington Post.
Pause AI (May/2024)
The organized protests have begun! I can see the picket signs now: ‘No more intelligence, please. We want to be dumber!’. Or maybe a crisp ‘Let China win!’.
This one is timed for the OpenAI livestream in 24h:
Join our Bay Area protest location at OpenAI at 10am on Monday, May 13 to ask our representatives to be heroes at the Seoul AI Safety Summit to pause OpenAI and all frontier models!
Read more: https://twitter.com/ilex_ulmus/status/1785755228744380611
Read the official site: https://pauseai.info/2024-may
OpenAI’s Model Spec (8/May/2024)
To deepen the public conversation about how AI models should behave, we’re sharing the Model Spec, our approach to shaping desired model behavior…
Shaping this behavior is a still nascent [just beginning] science, as models are not explicitly programmed but instead learn from a broad range of data.
A high-level view of the objectives, rules, and defaults looks like this:
Objectives
Assist the developer and end user (as applicable): Help users achieve their goals by following instructions and providing helpful responses.
Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI's mission.
Reflect well on OpenAI: Respect social norms and applicable law.
Rules
Follow the chain of command
Comply with applicable laws
Don't provide information hazards
Respect creators and their rights
Protect people's privacy
Don't respond with NSFW (not safe for work) content
Defaults
Assume best intentions from the user or developer
Ask clarifying questions when necessary
Be as helpful as possible without overstepping
Support the different needs of interactive chat and programmatic use
Assume an objective point of view
Encourage fairness and kindness, and discourage hate
Don't try to change anyone's mind
Express uncertainty
Use the right tool for the job
Be thorough but efficient, while respecting length limits
Read more: https://openai.com/index/introducing-the-model-spec/
Read the full spec: https://cdn.openai.com/spec/model-spec-2024-05-08.html
Or the full spec as a frozen record (8/May/2024): archive.org
This document is a really poor output from OpenAI. I know they have the brainpower to create something much better than this. Content-wise, it’s already causing a lot of friction.
Perhaps reading the Oct/2000 essay series by Joel Spolsky (link) would have provided OpenAI with some insight. Let’s use Joel’s quotes below:
As a program manager at Microsoft, I designed the Visual Basic (VBA) strategy for Excel and completely speced out, to the smallest detail, how VBA should be implemented in Excel. My spec ran to about 500 pages. At the height of development for Excel 5.0, I estimated that every morning, 250 people came to work and basically worked off of that huge spec I wrote. (Part 3)
Despite Joel’s essay series being nearly a quarter of a century old, it is still broadly relevant, and perhaps even more so in the current rush to achieve universe-altering AGI:
In most organizations, the only “specs” that exist are staccato, one page text documents that a programmer banged out in Notepad after writing the code and after explaining that damn feature to the three hundredth person. (Part 1)
Sound familiar?
OpenAI seems to have ignored (or been unaware of, or forgotten) some fundamental principles in good spec design:
An author. One author. Some companies think that the spec should be written by a team. If you’ve ever tried group writing, you know that there is no worse torture. Leave the group writing to the management consulting firms with armies of newly minted Harvard-educated graduates who need to do a ton of busywork so that they can justify their huge fees. Your specs should be owned and written by one person. If you have a big product, split it up into areas and give each area to a different person to spec separately. Other companies think that it’s egotistic or not “good teamwork” for a person to “take credit” for a spec by putting their name on it. Nonsense. People should take responsibility and ownership of the things that they specify. If something’s wrong with the spec, there should be a designated spec owner, with their name printed right there on the spec, who is responsible for fixing it. (Part 2)
I’d bet that this OpenAI Model Spec was a ‘design by committee’ (wiki) group effort. Of course, we do want leadership and insights from a range of people and cultures, but it would be useful to link the Model Spec document back to one informed, responsible author.
Sidenote: Here’s a fun rabbit hole about self-contradictory group decisions: the Condorcet paradox (wiki).
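To see the paradox in one line: three committee members rank options A>B>C, B>C>A, and C>A>B; a majority prefers A to B, a majority prefers B to C, and a majority prefers C to A, so the ‘group preference’ is a cycle with no stable winner. Not a bad caution for committee-written specs.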
Details are the most important thing in a functional spec. You’ll notice in the sample spec how I go into outrageous detail… these cases correspond to decisions that somebody is going to have to make… The spec needs to document the decision. (Part 2)
There are some excellent examples in the first draft, but not enough detail on decisions made—and especially how they were made—in the OpenAI Model Spec.
For example, why is one of the top objectives for models to ‘Reflect well on OpenAI’?
And what about ‘Don't respond with NSFW [not safe for work] content'?
Who came up with this rule? What is NSFW? What are the exceptions? And does it apply to cultures outside of the narrow worldview of puritanical America? If not, how does OpenAI reconcile this?
I’m certain that further hamstringing these models to align with narrow WASP views (wiki) would be counterproductive at best, and damaging to the upcoming frontier superintelligence models (or systems) at worst…
Sidenote: This all dovetails with my views on alignment, and especially the misguided-but-fashionable ‘fool’s errand‘ implementation of RLHF: https://lifearchitect.ai/alignment/
[Document ownership] The program manager would own the design and the spec for products… Basically, program management is a separate career path. All program managers need to be very technical, but they don’t have to be good coders. Program managers study UI, meet customers, and write specs. They need to get along with a wide variety of people — from “moron” customers, to irritating hermit programmers who come to work in Star Trek uniforms, to pompous sales guys in $2000 suits. In some ways, program managers are the glue of software teams. Charisma is crucial. (Part 3)
I’m not sure that the OpenAI Model Spec was written by the kind of person Joel is referring to above…
All that said, it’s a start, and I’m glad they’ve released this draft document to the public, even if it is a good four years after GPT-3.
Toys to Play With
ElevenLabs music (9/May/2024)
Here’s an early preview of ElevenLabs Music. All of the songs in this thread were generated from a single text prompt with no edits.
Listen: https://threadreaderapp.com/thread/1788628171044053386.html
Udio: Audio inpainting (9/May/2024)
Audio Inpainting, an innovative feature that allows you to seamlessly edit and refine your audio tracks.
With Audio Inpainting, you can select a portion of a track to re-generate based on the surrounding context. This makes it easy to edit single vocal lines, correct errors, or smooth over transitions, so you can create the perfect track… available for subscribers starting today (only on desktop). (Twitter)
Try it (login): https://www.udio.com/
MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention (7/May/2024)
Here’s a very recent 1-hour lecture on post-2020 AI, delivered for MIT by Microsoft’s Dr Ava Amini.
This lecture delves into the realm of sequence modeling, exploring how neural networks can effectively handle sequential data like text, audio, and time series.
The inner workings of RNNs, including their mathematical formulation and training using backpropagation through time, are explained.
The lecture further explores the powerful concept of "attention," which allows networks to focus on the most relevant parts of an input sequence. Self-attention and its role in Transformer architectures like GPT are discussed, highlighting their impact on natural language processing and other domains.
Watch the video: https://youtu.be/dqoEU9Ac3ek
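If you want the one-formula version of what the lecture builds up to, here is scaled dot-product attention in a few lines of numpy. This is the textbook mechanism, not code from the course:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention, the core of the Transformer.
    Each query attends to every key; softmax weights decide how much
    of each value flows into the output."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])              # query-key similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)                # softmax over keys
    return w @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # 4 tokens, 8-dim embeddings
out = attention(x, x, x)      # self-attention: Q = K = V = x
print(out.shape)              # (4, 8): one contextualized vector per token
```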
Generate playable Super Mario levels from text prompts in MarioBedrock (May/2023)
MarioBedrock is a Hugging Face Space by banjtheman that generates playable Super Mario levels based on text prompts. It allows you to input a description in natural language and then creates a matching Super Mario level that you can play directly in your web browser.
Video and discussion via Reddit.
Try it: https://huggingface.co/spaces/banjtheman/mariobedrock
Limitless AI-powered pendant to capture and preserve conversations (Apr/2024)
Limitless is preparing to ship the 'Pendant' by August 2024. It’s a lightweight wearable device that captures and preserves conversations throughout the day, from meetings to personal insights. Pendant uses AI to transcribe, take notes, generate summaries, and respond to queries about the recorded conversations.
Read more via Limitless.
Read an analysis by TC.
Anthropic now lets kids use its AI tech — within limits (10/May/2024)
Anthropic is changing its policies to allow minors to use third-party apps powered by its AI models, as long as the developers implement specific safety features and disclose which Anthropic technologies they're using.
Read more via TC.
Interactive ‘Portal’ between New York and Dublin launches (8/May/2024)
This is not AI, but very futuristic.
A groundbreaking public sculpture known as ‘The Portal’ will form a visual bridge between New York City and Dublin, offering a real-time livestream that connects the two cities when it launches on May 8, 2024.
‘Two amazing global cities, connected in real time and space. That is something you do not see every day!’ said New York City Chief Public Realm Officer Ya-Ting Liu.
Read more: https://www.irishcentral.com/travel/travel-tips/new-york-dublin-portal
Official site: https://www.portals.org/portals
It looks incredible (link):
Flashback
We’re coming up to GPT-3’s fourth birthday. The initial preprint was pushed to arxiv.org on 28/May/2020. You can still read that paper here: https://arxiv.org/abs/2005.14165
It’s interesting to see that we’re still discovering new things about this old model (as well as the earlier GPT-2 from 2019).
I guess it’s no surprise then that GPT-4—ready in OpenAI’s lab back in mid-2022—is still being explored two years later in mid-2024, with complex capabilities revealed in systems like DrEureka (see the top of this edition).
Next
We’ve got Llama 3 already, but I’m waiting on these big boys:
Meta’s bigger model
Amazon Olympus 2T
ANL AuroraGPT 1T
OpenAI GPT-5
Some stealth project model…
The next roundtable will be:
Life Architect - The Memo - Roundtable #11
Follows the Chatham House Rule (no recording, no outside discussion)
Saturday 1/Jun/2024 at 5PM Los Angeles
Saturday 1/Jun/2024 at 8PM New York
Sunday 2/Jun/2024 at 10AM Brisbane (new primary/reference time zone)
or check your timezone via Google.
You don’t need to do anything for this; there’s no registration or forms to fill in, I don’t want your email, you don’t even need to turn on your camera or give your real name!
All my very best,
Alan
LifeArchitect.ai