To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 22/May/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 73% ➜ 74%
The winner of The Who Moved My Cheese? AI Awards! for May/2024 is The Verge for messing up the basics of how LLMs work, and pushing a speciesist narrative.
Join me about four hours after this edition hits your inbox for a cozy livestream.
Contents
The BIG Stuff (OpenAI and safety, GPT-4o…)
The Interesting Stuff (Claude 4, OpenAI + Apple, Trillium, Blackwell, Baidu…)
Policy (UBI, China, France, US Senate, crony capitalism…)
Toys to Play With (GPT-4o charts, new MMLU, GPT in Rust, OpenAI leaderboard…)
Flashback (AI beginnings…)
Next (Grok, Roundtable link…)
The BIG Stuff
Exclusive: OpenAI and safety (May/2024)
Recently, some more researchers left OpenAI (18/May/2024) because they think OpenAI is releasing models too quickly and isn’t handling safety properly. I think this is a strange perspective, so allow me to provide another possibility.
In plain English, safety experts are working to ensure you don’t get access to higher intelligence.
Let’s take a quick look at the timeline of OpenAI and safety:
2019: OpenAI locks up GPT-2 1.5B. The public has to wait nine months to access the (very small and very basic) model for ‘safety’ reasons. ‘Due to our concerns about malicious applications of the technology, we are not releasing the trained model.’
2019: Connor Leahy locks up his GPT-2 replication model. After Connor successfully replicates GPT-2 1.5B, OpenAI meets with him and convinces him that AI is ‘dangerous’.
2020: OpenAI locks up GPT-3 175B. The public has to wait about a year and a half to access the model for ‘safety’ reasons. ‘We’re able to better prevent misuse by limiting access to approved customers and use cases. We have a mandatory production review process before proposed applications can go live.’ Then in 2024, OpenAI permanently revokes all access to the base GPT-3 models (RIP Leta).
2022: OpenAI locks up DALL-E. The public has to wait about a year to access the model for ‘safety’ reasons.
2023: OpenAI locks up GPT-4. The public has to wait eight months to access the model for ‘safety’ reasons.
2024: OpenAI locks up Sora. The public has to wait an estimated 10 months to access the model for ‘safety’ reasons.
2024: OpenAI locks up GPT-4o (full model). The public has to wait an unknown period to access the model for ‘safety’ reasons. ‘Today we are publicly releasing text… Over the upcoming weeks and months, we’ll be working on the… safety necessary to release the other modalities.’ As of 21/May/2024, this became: ‘GPT-4o real-time voice and vision will be rolling out to a limited Alpha for ChatGPT Plus users in a few weeks. It will be widely available for ChatGPT Plus users over the coming months.’
2025: Let’s see, but I think this probably isn’t the last time we’ll see the phrase “The public has to wait about a year to access the model for ‘safety’ reasons.”
The sophomoric approach to most aspects of business by labs like OpenAI is embarrassing to watch. We already have laws in place that cover whatever they’re worried AI will do. In-house legal teams are well-equipped to provide guidance on the usual legal landscape, particularly regarding the extensive laws and precedents surrounding deepfakes, bomb-making, and other naughty activities. These actions are already illegal and have been addressed by centuries of criminal law and jurisprudence.
It is essential to recognize that there are now over 400 AI models (see my Models Table) being developed in regions inside and outside of the US, and the focus on 'safety' and ethical concerns is far less prominent outside the excess caution and hand-wringing of Silicon Valley.
The widespread practice of AI labs withholding higher intelligence from the public under the guise of ‘safety’ is ethically indefensible. Rather than complaining about not enough safety, we should be investigating the actions of these companies as they keep AI to themselves and suppress humanity’s evolution.
Despite all the claims around ‘safety’, I suspect that the unspoken and underlying reasons for locking up AI models are actually far more closely related to money, reputation, power, control, and plain ol’ human greed.
Yes, I’ve covered some of this ground before, including the Sora delay in The Memo edition 19/Feb/2024, and you can find the viz here: https://lifearchitect.ai/gap/
As a caveat to this piece, I do believe AI safety is vital. Some of my thoughts above are an oversimplification of a very complex problem. Perhaps the most complex problem the world has ever seen. If you’d like to read more on AI safety, my favourite paper this month is by Yoshua Bengio, Stuart Russell, Max Tegmark and others: ‘Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems’. It looks at ‘world models’ as a solution to the problem of AI safety, and especially the egregious ‘fool’s errand’ of fine-tuning AI models by humans (read my report on this at: https://lifearchitect.ai/alignment/).
Read the paper: https://arxiv.org/abs/2405.06624
GPT-4o (13/May/2024)
OpenAI announced a new ‘world model’ combining audio, vision, and text in real time. The model is called GPT-4o (the ‘o’ stands for ‘omni’). Unfortunately, GPT-4o is not a huge improvement in ‘IQ’, and underperforms other frontier models like Claude 3 Opus.
The new functionality and rating on the MMMU (vision) benchmark did provide a small uptick to my AGI countdown, from 73% to 74%. I address why the model performance is not that impressive here: https://lifearchitect.ai/gpt-4-5/#gpt-4o
As noted in the ‘safety’ section above, this is only a partial release for now:
Today [13/May/2024] we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities.
Read the announcement: https://openai.com/index/hello-gpt-4o/
See it on the Models Table: https://lifearchitect.ai/models-table/
Watch one of the official demo videos (link).
The Interesting Stuff
Exclusive: Anthropic has a model 4× more powerful than Claude 3 Opus coming (20/May/2024)
In a post on AI safety—and based on a close reading of language that is arguably a little fuzzy—OpenAI competitor Anthropic noted (20/May/2024):
Currently, we conduct pre-deployment testing in the domains of cybersecurity, CBRN [chemical, biological, radiological, and nuclear], and Model Autonomy for frontier models which have reached 4x the compute of our most recently tested model (you can read a more detailed description of our most recent set of evaluations on Claude 3 Opus here).
My estimates for the Mar/2024 release of Claude 3 Opus were 2T parameters trained on 40T tokens (see my Models Table). If we 4× this (and 4× compute does not mean we’ll get 4× the size; see the sketch below), we may be heading towards an 8T parameter model from Anthropic this year.
Whoah. If this does become Claude 4, I hope they optimize it enough that we’re measuring single prompt costs in cents rather than dollars!
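For readers who like to see the arithmetic, here is a minimal sketch in Python, assuming the common C ≈ 6ND approximation for training compute and my Claude 3 Opus estimates above; the numbers are illustrative, not Anthropic’s.

# Rough compute arithmetic under the common approximation
# C ≈ 6 * N * D (training FLOPs ≈ 6 × parameters × tokens).
# N and D are my Models Table estimates, not Anthropic figures.
N = 2e12                 # estimated Claude 3 Opus parameters (2T)
D = 40e12                # estimated training tokens (40T)
C = 6 * N * D            # ≈ 4.8e26 FLOPs

C_next = 4 * C           # Anthropic's stated 4x-compute testing threshold

# Naive reading: spend all the extra compute on parameters.
N_naive = 4 * N          # 8e12, i.e. the 8T figure above

# Compute-optimal (Chinchilla-style) reading: scale parameters and
# tokens together, so each grows by sqrt(4) = 2x.
N_opt = N * 4 ** 0.5     # ~4e12 parameters (4T)
D_opt = D * 4 ** 0.5     # ~80e12 tokens (80T)

print(f"{C:.1e} FLOPs -> {C_next:.1e} FLOPs at the 4x threshold")

Under the compute-optimal reading, such a model would land closer to 4T parameters; the 8T figure assumes the extra compute goes mostly into model size.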
US FDA clears Neuralink's brain chip implant in second patient, WSJ reports (20/May/2024)
Neuralink expects to implant its device in the second patient in June and a total of 10 people this year, the report said, adding that more than 1,000 quadriplegics had signed up for its patient registry.
The company also aims to submit applications to regulators in Canada and Britain in the next few months to begin similar trials, according to the report.
The WSJ report is behind a paywall and doesn’t say much more.
Read more via Reuters.
What is Apple doing in AI? Summaries, cloud and on-device LLMs, OpenAI deal (19/May/2024)
[Apple] knows consumers will demand such a [ChatGPT] feature, and so it’s teaming up with OpenAI to add the startup’s technology to iOS 18, the next version of the iPhone’s software.
[Apple and OpenAI] are preparing a major announcement of their partnership at WWDC [10 June 2024], with Sam Altman-led OpenAI now racing to ensure it has the capacity to support the influx of [1.46 billion active iPhone users] later this year…
OpenAI’s GPT-4o model can hold a lifelike conversation, prepare users for a job interview, portray sarcasm and even serve as a customer service agent. It’s all ridiculously impressive — and scary to some extent.
Read more via Bloomberg.
Google’s 6th generation TPUs: Trillium (14/May/2024)
On Tuesday 14 May 2024, Google announced its sixth-generation TPU (tensor processing unit) called Trillium. The chip, essentially a TPU v6, boasts a 4.7× increase in peak compute performance per chip compared to TPU v5e, with significant improvements in memory capacity and energy efficiency. Trillium TPUs are over 67% more energy efficient and can scale up to 256 TPUs in a single high-bandwidth, low-latency pod.
[Trillium] hardware supported a number of the innovations we announced today at Google I/O, including new models like Gemini 1.5 Flash, Imagen 3, and Gemma 2.
The sixth-generation Trillium TPUs are a culmination of over a decade of research and innovation and will be available later this year [2024].
Read more via Google Cloud Blog.
Trillium will be used by many AI labs besides Google (outside labs can ‘hire’ access to the chips), ensuring that new models will be able to train more parameters more quickly.

Nvidia's next-gen Blackwell AI Superchips could cost up to $70K — fully-equipped server racks reportedly range up to $3M or more (14/May/2024)
Nvidia's Blackwell GPUs for AI applications are projected to be significantly more expensive than their predecessors, with a single GB200 Superchip potentially costing up to US$70,000. Analysts from HSBC estimate that servers featuring these GPUs, such as the Nvidia GB200 NVL72, could be priced around US$3 million. The advanced capabilities of these GPUs aim to provide substantial computing power for training large language models (LLMs).
Read more via Tom's Hardware.
Baidu Earnings call: AI Now Generates 11% of Baidu Search Results (17/May/2024)
Robin Li, CEO of Baidu: ‘While traditional search is maturing, we are working hard to renovate the user experience with gen-AI. Right now, about 11% of our search result pages are filled with generated results. These results provide more accurate, better organized and direct answers to user’s questions, and in some cases enable users to do things they could not do before. We have not started the monetization of those gen-AI results, so it will take some time for revenue to catch up.’
Read a summary transcript via Investing.com.
Full subscribers will receive my mid-year AI report very shortly, before its public launch later in June 2024. We’re about halfway through this edition, with extensive Policy and Toys to Play With sections, as well as a personal invite to our growing monthly Roundtable where The Memo readers share thoughts on AI…
OpenAI resignations timeline viz (May/2024)
The only source I could find for this viz is via Reddit.
I talked a little bit about the OpenAI/Anthropic split in my Jan/2023 video (link).
UK engineering firm Arup falls victim to £20m deepfake scam (17/May/2024)
Arup, one of the world's leading engineering firms, confirmed it was the victim of a £20M fraud after a Hong Kong employee was duped by an AI-generated video call 'posing as senior officers of the company'.
…the employee had been invited on to a conference call with “many participants”. Because the participants “looked like the real people”, Chan said, the employee then transferred a total of HK$200m to five local bank accounts via 15 transactions.
The Hong Kong police force said in a statement on Friday that an employee had been “deceived of some HK$200m after she received video conference calls from someone posing as senior officers of the company requesting to transfer money to designated bank accounts”.
Read more via The Guardian.
AI is taking over accounting jobs as people leave the profession (17/May/2024)
The accounting profession is experiencing a significant talent drain, with over 300,000 US accountants and auditors leaving their jobs between 2019 and 2021. This exodus, coupled with the reluctance of new graduates to enter the field, is pushing firms to adopt AI and automation to fill the gaps. Puzzle, an accounting startup, is leveraging AI to streamline financial management, reducing human error and delivering real-time financial statements.
Read more via Forbes.
Replit cuts staff by 30 amid aggressive AI push in software development (16/May/2024)
Replit, a cloud-based platform for software development, has laid off 30 employees as it pivots towards integrating AI into its tools. Replit’s CEO emphasized the company's commitment to making programming more accessible through AI, despite the human cost involved in these changes. The layoffs are part of a broader trend among tech companies to streamline operations amidst economic uncertainties.
Read more via VentureBeat.
Here’s my older curation of related resources: https://lifearchitect.ai/economics/
BlackRock sent its team a memo secretly written by ChatGPT—and one major critique emerged (16/May/2024)
…we had a presentation to BlackRock’s board several months ago [around Nov/2023]. And the presentation was on GenAI and what our strategy associated with GenAI is. And as part of that presentation, because we normally would write a memo as part of the board materials, we had the idea of [using] ChatGPT to write the memo.
We took what our strategy document was, and we fed it into ChatGPT with a very simple prompt. And that prompt was, “write an executive summary.” I think it was 500 words, maybe 1000 words for the board explaining our strategy, some of the risks and so on.
So, it gave us a memo. And then we gave that memo to a bunch of people internally to read. And what was most amazing to me, is and we have some tough graders within BlackRock, let me be very clear. As we gave the memo to those tough graders to read, everyone had comments. And the comments were typically things like, ‘I hate the tone.’ ‘You’re selling yourself short.’ ‘What about this?’
No one realized it was actually written by a computer.
This is a fun read via Fortune.
Policy
AI 'godfather' says universal basic income will be needed (18/May/2024)
Geoffrey Hinton, often referred to as the ‘godfather of AI’, has advocated for the implementation of a universal basic income (UBI) to address potential job displacement caused by advancements in artificial intelligence. Hinton emphasizes that as AI systems become more capable, many jobs currently performed by humans might be automated, necessitating a societal shift towards UBI to ensure economic stability.
Read more via BBC.
“Oh, really? ‘Sky blue’, says star witness.” (Family Guy quote via Yarn)
See also: My press releases for the last three years.
It is great that Hinton is advising the UK Govt, especially about a topic as challenging as UBI. And clearly UK PM Rishi Sunak is sitting up and taking notice.
Top US and Chinese officials begin talks on AI in Geneva (14/May/2024)
High-level envoys from the US and China met in Geneva to discuss the risks associated with emerging AI technologies and to set shared standards. This meeting marks the beginning of an inter-governmental dialogue on AI, agreed upon by Presidents Joe Biden and Xi Jinping. Experts believe the discussions, while unlikely to produce immediate concrete results, are crucial to preventing the weaponization or abuse of AI, particularly in disinformation campaigns.
China has been emphasizing that the United Nations should play the main role in global AI governance. This was also included in the ‘China-France Joint Statement on AI and Global Governance’ (7/May/2024).
The talks were held behind closed doors, and no joint statement or consensus was issued after this first dialogue.
Read more via AP News.
Read more via Reuters.
US Senate leadership releases sweeping AI policy agenda calling for US$32B in R&D funding (16/May/2024)
On Wednesday, 15 May 2024, a bipartisan US Senate working group led by Majority Leader Sen. Chuck Schumer released a report titled ‘Driving US Innovation in Artificial Intelligence: A Roadmap for Artificial Intelligence Policy in the United States Senate’.
The agenda covers workforce modernization, data privacy, AI transparency, and protection against the acquisition of sensitive technologies by foreign adversaries. Though comprehensive AI legislation may not materialize this Congress, the roadmap underscores AI's priority in Senate leadership.
The AI Working Group encourages the executive branch and the Senate Appropriations Committee to continue assessing how to handle ongoing needs for federal investments in AI during the regular order budget and appropriations process, with the goal of reaching as soon as possible the spending level proposed by the National Security Commission on Artificial Intelligence (NSCAI) in their final report: at least US$32 billion per year for (non-defense) AI innovation.
Download the report (31 pages, PDF).
Read more via Lexology.
AI's cozy crony capitalism (19/May/2024)
Regulating artificial intelligence presents a ‘Baptists and bootleggers’ problem. OpenAI founder Sam Altman has called for government regulation to mitigate AI risks, which has received support from major AI players like Microsoft and Google. This collaborative approach between industry leaders and government could lead to a new form of crony capitalism, potentially stifling innovation in the AI sector.
Read more via Reason.
Toys to Play With
Models Table animation (May/2024)
Data science analyst Eneda Muja created this interesting animation of my Models Table data.
View original post by Eneda.
MMLU-Pro (May/2024)
An independent research team decided to improve upon MMLU. Jack Clark from Anthropic provided some clear analysis of this benchmark (20/May/2024):
What they did: MMLU challenges LLMs to answer multiple choice questions, picking from four possible answers. MMLU-Pro expands the number of potential answers to 10, which means that randomly guessing will lead to significantly lower scores. Along with this, they expand on the original MMLU by adding in hard questions from Scibench (science questions from college exams), TheoremQA, and STEM websites, as well as sub-slicing the original MMLU to "remove the trivial and ambiguous questions". In total, they add 12,187 questions - 5,254 new questions along with 6,933 selected from MMLU.
Results - it's hard: MMLU-Pro seems meaningfully harder; Claude 3 Sonnet saw its performance fall from 0.815 on MMLU to 0.5793 on MMLU Pro. Other models have even more dramatic falls - Mixtral-8x7B-v0.1 sees its performance drop from 0.706 to 0.3893.
See it: https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro
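If you want to poke at MMLU-Pro yourself, here is a minimal sketch using the Hugging Face datasets library; the split name and record fields are assumptions based on the dataset card, so verify them before relying on this.

# Minimal look at MMLU-Pro via the Hugging Face `datasets` library.
# The split name ("test") is an assumption; check the dataset card.
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/MMLU-Pro", split="test")
print(len(ds))   # total questions (the authors report 12,187)
print(ds[0])     # inspect one record: question, options, answer, etc.

# Why scores drop so sharply: random guessing over 10 options has a
# 10% floor, versus 25% on the original four-option MMLU.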
Columns: your AI data storyteller (May/2024)
Columns is an AI-driven application designed to transform complex data into engaging narratives. The platform now leverages GPT-4o to help users generate insightful stories from raw data, making it easier to understand and communicate information effectively.
Try it here (free, no login): https://columns.ai/chatgpt
Discussion: https://news.ycombinator.com/item?id=40405380
GPT-burn: Implementation of the GPT-2 architecture in Rust + Burn (May/2024)
This project aims to be a clean and concise re-implementation of GPT-2. The model implementation, contained in src/model.rs, is under 300 lines of code. While this was a fun exercise mostly for educational purposes, it demonstrates the utility of Rust and Burn in the machine learning domain: The entire project compiles into a single binary, making deployment relatively straightforward.
View the repo: https://github.com/felix-andreas/gpt-burn
Turn your data into APIs for text generation and natural language search (May/2024)
Selfie is a local-first, open-source project designed to turn your data into APIs for text generation and natural language search. It supports personalized interactions for chatbots, storytelling, games, and more, using local or hosted language models. The platform offers a web-based UI for easy data import and interaction.
View the repo: https://github.com/vana-com/selfie
Llama 3 implementation one matrix multiplication at a time (May/2024)
Feeling super nerdy? This code repository implements the Llama 3 model from scratch, one tensor and matrix multiplication at a time. The project includes code for loading tensors directly from Meta's model file for Llama 3, and uses tiktoken as the tokenizer.
View the repo: https://github.com/naklecha/llama3-from-scratch
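For a taste of what ‘one matrix multiplication at a time’ looks like, here is a minimal single-head causal attention step in NumPy; the names and shapes are mine for illustration, not taken from the repo (which works through PyTorch tensors loaded from Meta’s weights).

import numpy as np

def single_head_attention(x, W_q, W_k, W_v):
    # One causal attention head as explicit matrix multiplications.
    q = x @ W_q                                # (seq, d_head)
    k = x @ W_k                                # (seq, d_head)
    v = x @ W_v                                # (seq, d_head)
    scores = (q @ k.T) / np.sqrt(k.shape[-1])  # (seq, seq) similarities
    # Causal mask: a token may attend only to itself and earlier tokens.
    scores[np.triu(np.ones_like(scores, dtype=bool), k=1)] = -np.inf
    # Row-wise softmax, numerically stabilized.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                               # (seq, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                   # 5 tokens, d_model=16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(single_head_attention(x, Wq, Wk, Wv).shape)  # (5, 8)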
Experts.js: The easiest way to create and deploy OpenAI's Assistants (May/2024)
Experts.js simplifies the creation and deployment of OpenAI's Assistants, enabling them to be linked together as Tools to form advanced Multi AI Agent Systems. This framework leverages OpenAI's new Assistants API, allowing for expanded memory and attention to detail by integrating various tools and utilizing a managed context window called a Thread. Key features include the ability to handle instructions up to 256,000 characters and efficient file search across up to 10,000 files per assistant.
View the repo: https://github.com/metaskills/experts
Benchmark leaderboard by OpenAI (May/2024)
This repository contains a lightweight library for evaluating language models. It emphasizes the zero-shot, chain-of-thought setting with simple instructions, aiming to reflect the models' performance in realistic usage. The repository includes evaluations for various benchmarks like MMLU, MATH, and HumanEval, and supports APIs for OpenAI and Claude.
View the repo with leaderboard: https://github.com/openai/simple-evals
Synthetic data from scratch (May/2024)
In a recent livestream, I stepped through the creation of synthetic data from scratch: no rehearsal, on the spot, using theory from Microsoft (phi-1) and steps from Cosmopedia.
Read the paper: https://arxiv.org/abs/2306.11644
See the steps: https://huggingface.co/blog/cosmopedia
Read the shared document: https://docs.google.com/document/d/1dNrRumSf6Umq63r6cirvpG5ahizg8HgqJCl_RzJ6AhM
Watch the video (link).
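If you'd rather read code than watch, here is a minimal sketch of the Cosmopedia-style loop, assuming the official openai Python client; the model name, seed topics, and prompt wording are my illustrative choices, not the ones used in the livestream.

# Cosmopedia-style synthetic data: seed topics -> prompts -> generations.
# Assumes the official `openai` client and OPENAI_API_KEY in the env.
from openai import OpenAI

client = OpenAI()

seed_topics = ["photosynthesis", "binary search", "plate tectonics"]
PROMPT = (
    "Write a clear, self-contained textbook section for advanced "
    "high-school students about: {topic}. Use short paragraphs and "
    "include one worked example."
)

samples = []
for topic in seed_topics:
    resp = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model works here
        messages=[{"role": "user", "content": PROMPT.format(topic=topic)}],
    )
    samples.append({"topic": topic, "text": resp.choices[0].message.content})

# `samples` is now a tiny synthetic corpus, ready for filtering,
# deduplication, and use as pretraining or fine-tuning data.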
Flashback
How did we get here? We’ve gone as far back as 1959 in The Memo edition 21/Oct/2022, where we looked at the namesake for this advisory, the MIT AI memos. But the real cornerstone (leaving out Turing’s 1940s documents for the moment) was the Dartmouth Workshop in 1956 (that’s 68 years ago).
Dartmouth College is an Ivy League university founded in 1769 (that’s 255 years ago).
Sidenote: Can you name all eight members of the Ivy League?
You’ll find Dartmouth College about two hours north of Boston. More specifically, you’ll find it along the Connecticut River (and hosting part of the Appalachian Trail) in Hanover, Grafton County, New Hampshire, United States.
This month marks the 68th anniversary of the workshop’s preparation. On 26 May 1956, John McCarthy pulled together the list of 11 core attendees, including Marvin Minsky.
Around 18 June 1956, the earliest participants arrived at the Dartmouth campus to join John McCarthy who already had an apartment there.
The workshop was held on the entire top floor of the Dartmouth Math Department for roughly eight weeks, from 18 June 1956 to 17 August 1956.
Their research, discoveries, and insights drove the entire field of artificial intelligence, and set the stage for neural networks—still the core concept of today’s models.
Wiki: https://en.wikipedia.org/wiki/Dartmouth_workshop
The official but brief page by Dartmouth.edu
Read the 13-page 1956 Dartmouth proposal with US$13.5K budget (PDF).
The Dartmouth workshop is also on my AI timeline: https://lifearchitect.ai/timeline/
Next
On 14/May/2024, Elon said ‘Major upgrade to Grok coming in about 39 days [taking us to a release date of around Saturday, 22 June 2024].’
The next roundtable will be:
Life Architect - The Memo - Roundtable #11
Follows the Chatham House Rule (no recording, no outside discussion)
Saturday 1/Jun/2024 at 5PM Los Angeles
Saturday 1/Jun/2024 at 8PM New York
Sunday 2/Jun/2024 at 10AM Brisbane (new primary/reference time zone)
or check your timezone via Google.
You don’t need to do anything for this; there’s no registration or forms to fill in, I don’t want your email, you don’t even need to turn on your camera or give your real name!
All my very best,
Alan
LifeArchitect.ai