To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 21/Jun/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 74 ➜ 75%
Microsoft VP Prof Sébastien Bubeck on Phi-3 (23/Apr/2024):
If you have a very, very high stakes application, let’s say in a healthcare scenario, then I definitely think that you should go with the frontier model—the best, most capable, most reliable. For other uses, other factors matter more, including speed and cost.
That’s where you want to go with Phi-3.
Contents
The BIG Stuff (Claude 3.5 Sonnet, Nemotron-4-340B, Claude 3 as judge…)
The Interesting Stuff (mid-year report, Ilya, Waymo 3.5x, Runway Gen-3, V2A…)
Policy ($100M model laws, OpenAI + NSA, Trump uses ChatGPT for speech…)
Toys to Play With (AuroraGPT, social AI + human, Llama 3 training, Teleport…)
Flashback (IQ testing AI…)
Next (Roundtable…)
The BIG Stuff
Anthropic Claude 3.5 Sonnet (21/Jun/2024)
Anthropic released Claude 3.5 Sonnet a few hours ago. Although smaller than the previous flagship Claude 3 Opus, it is the new state-of-the-art model. MMLU=90.4 (5-shot CoT). GPQA=67.2 (5-shot CoT, maj@32). It scores 5/5 on ALPrompt 2024H1.
This model bumped the AGI countdown from 74% ➜ 75%: https://lifearchitect.ai/agi/
For the first time, a large language model has breached the 65% mark on GPQA, a benchmark designed to be at the level of our smartest PhDs. ‘Regular’ PhDs score 34%, while in-domain specialized PhDs score 65%. Claude 3.5 Sonnet scored 67.2% (5-shot CoT, maj@32).
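For readers wondering what the ‘maj@32’ qualifier means: the model is sampled 32 times per question and its most common answer is taken as final. Here’s a minimal sketch of that idea using the Anthropic Python SDK; the claude-3-5-sonnet-20240620 model ID is Anthropic’s, but the prompt wording and helper names are my own illustration, not the official benchmark harness.

```python
# Minimal sketch of maj@32 (majority-vote) scoring, not the official GPQA harness.
from collections import Counter
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

def ask_once(question: str) -> str:
    """Sample one answer letter from Claude 3.5 Sonnet at non-zero temperature."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=5,
        temperature=1.0,  # diversity across samples is what makes voting useful
        messages=[{
            "role": "user",
            "content": f"{question}\n\nAnswer with a single letter: A, B, C, or D.",
        }],
    )
    return response.content[0].text.strip()[:1]

def maj_at_k(question: str, k: int = 32) -> str:
    """maj@k: sample k answers and return the most frequent one."""
    votes = Counter(ask_once(question) for _ in range(k))
    return votes.most_common(1)[0][0]
```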
Anthropic gave one hour’s notice before release with this fun substitution cipher, decoded here.
Read the announce: https://www.anthropic.com/news/claude-3-5-sonnet
Try it here (free, login): https://poe.com/Claude-3.5-Sonnet
See an example app output: https://x.com/skirano/status/1803809495811858807
See it on the Models Table: https://lifearchitect.ai/models-table/
NVIDIA Nemotron-4-340B (15/Jun/2024)
A successor to Megatron (my link), Nemotron-4-340B was trained on 9T tokens (a roughly 27:1 token-to-parameter ratio) using 6,144 H100s between December 2023 and May 2024. MMLU=81.1. The dataset is made up of web documents, news articles, scientific papers, books, and more (Feb/2024).
This is the largest open-source model released to date.
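For context, the 27:1 figure is just tokens divided by parameters; a quick sanity check (the ~20:1 comparison is the widely cited ‘Chinchilla’ guideline, not anything from NVIDIA’s paper):

```python
# Back-of-the-envelope check of Nemotron-4-340B's token-to-parameter ratio.
tokens = 9e12        # 9T training tokens
parameters = 340e9   # 340B parameters

ratio = tokens / parameters
print(f"{ratio:.1f} tokens per parameter")  # ~26.5, i.e. roughly 27:1
# Compare the ~20 tokens-per-parameter 'Chinchilla' guideline for compute-optimal training.
```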
See it on the Models Table: https://lifearchitect.ai/models-table/
Dell to provide H100 racks for xAI supercomputer (20/Jun/2024)
Dell and Supermicro (SMC) are set to provide the servers for Elon Musk’s xAI supercomputer. This project, described as ‘the world’s largest and most powerful supercomputer’, will be located in Memphis, Tennessee, and represents the largest multi-billion-dollar investment in the city's history. The supercomputer is expected to be operational by the fall [US fall is September to November] of 2025 and will power the next version of xAI’s Grok chatbot.
I’m including the images below as I think they are important. These racks will power the next frontier model—approaching superintelligence—and we will interact directly with models that come out of these servers (after a full six months of training!).
Read more via DCD.
Adam Unikowsky: In AI we trust, part II (16/Jun/2024)
Adam Unikowsky has won eight Supreme Court cases as lead counsel, is a former law clerk to Justice Antonin Scalia, and is currently a biglaw partner. Using Claude 3 Opus, he explores the potential of AI in adjudicating Supreme Court cases.
Claude is fully capable of acting as a Supreme Court Justice right now. When used as a law clerk, Claude is easily as insightful and accurate as human clerks, while towering over humans in efficiency…
These outputs did not require any prompt engineering or hand-holding. I simply uploaded the merits briefs into Claude, which takes about 10 seconds, and asked Claude to decide the case. It took less than a minute for Claude to spit out these decisions…
Of the 37 merits cases decided so far this Term, Claude decided 27 in the same way the Supreme Court did. In the other 10 (such as Campos-Chaves), I frequently was more persuaded by Claude’s analysis than the Supreme Court’s…
I know what you’re thinking. “Claude is just picking a merits brief and summarizing it. It can’t actually come up with new ideas.” No. Saying “AI is really good at summarizing briefs” is like saying “iPhones are really good at being calculators.” They are really good at being calculators. But there’s more there…
With no priming whatsoever, Claude is proposing a completely novel legal standard that is clearer and more administrable than anything proposed by the parties or the Court. Claude then offers a trenchant explanation of the arguments for and against this standard. There’s no way a human could come up with this within seconds like Claude did…
Not only is Claude able to make sensible recommendations and draft judicial opinions, but Claude effortlessly does things like generate novel legal standards and spot methodological errors in expert testimony…
Claude works at least 5,000 times faster than humans do, while producing work of similar or better quality…
Also, recall that Claude hasn’t been fine-tuned or trained on any case law. It’s a general-purpose AI. If we taught Claude the entire corpus of American case law, which could be done easily, its legal ability would improve significantly.
Read more via Adam Unikowsky.
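For anyone wanting to try a scaled-down version of Unikowsky’s workflow: he used the claude.ai interface, but the same experiment can be sketched against the Messages API. The file name and prompt wording below are my own assumptions, not his.

```python
# Hypothetical sketch: feed the text of a merits brief to Claude and ask it to decide the case.
# Unikowsky used the claude.ai web interface; this uses the API with Claude 3 Opus, the model he cites.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

with open("merits_brief.txt") as f:   # assumes the brief has already been extracted to plain text
    brief_text = f.read()

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": (
            "You are acting as an appellate judge. Read the merits brief below, "
            "decide the case, and draft a short opinion explaining your reasoning.\n\n"
            + brief_text
        ),
    }],
)
print(response.content[0].text)
```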
I think this piece is so historically significant that I’ve provided a PDF archive for download (35 pages, 21MB):
The Interesting Stuff
Integrated AI: The sky is quickening (mid-2024 AI retrospective, Jun/2024)
In 2024, the capabilities of cutting-edge AI systems have surpassed what even the brightest PhDs—those who’ve achieved a level of academic mastery after two decades in the education system—can fully comprehend or match. This is an incredible milestone with incredible benefits. AI will soon begin independently tackling major challenges facing humanity such as education, health, economics, and scientific mysteries that have stumped our greatest minds.
Read the report: https://lifearchitect.ai/the-sky-is-quickening/
Watch the video (link):
The GPT-4 model family (20/Jun/2024)
It’s challenging to interpret just what OpenAI are doing with their model names. Following on from my complex GPT-3 model family viz released last year, I’ve now created a GPT-4 model family viz.
Take a look: https://lifearchitect.ai/gpt-4/#family
It’s been a huge month for AI releases, with the most powerful frontier model and the largest open-source model arriving within days of each other. We’re not even halfway through this edition, which covers massive progress across labs like Meta and DeepSeek, new policies affecting you, and toys to play with, including a new Facebook clone for humans and AI…
Microsoft AI CEO Mustafa Suleyman audits OpenAI’s code (14/Jun/2024)
Microsoft's AI chief, Mustafa Suleyman, has been examining OpenAI's algorithms, highlighting the intertwined yet competitive relationship between Microsoft and OpenAI. While Microsoft has intellectual property rights to OpenAI’s software due to its substantial investment, the presence of Suleyman has added a layer of complexity. Despite collaboration, there are signs Microsoft might be preparing to develop its own large-scale AI models independently.
Read more via Semafor.
OpenAI selects Oracle Cloud Infrastructure to extend Microsoft Azure AI platform (11/Jun/2024)
Oracle announced that OpenAI has chosen Oracle Cloud Infrastructure (OCI) to supplement its Microsoft Azure AI platform. This collaboration aims to enhance the performance and scalability of OpenAI’s models by leveraging OCI's robust cloud solutions. The move underscores the growing trend of integrating diverse cloud infrastructures to optimize AI workloads.
Read more via Oracle.
Ilya Sutskever Has a New Plan for Safe Superintelligence (19/Jun/2024)
Ilya Sutskever, the co-founder of OpenAI, has announced his new venture called Safe Superintelligence Inc. This research lab aims to create a safe, powerful artificial intelligence system without the immediate intention of selling AI products or services. The initiative seeks to avoid the competitive pressures faced by rivals like OpenAI, Google, and Anthropic, focusing solely on developing safe superintelligence.
Sidenote: I don’t expect to have anything new to report on this for maybe a full year. This is going to be a long wait!
Read more via Bloomberg.
Waymo's latest milestone in autonomous driving (19/Jun/2024)
Waymo celebrates a significant milestone in autonomous driving: its self-driving cars have completed over 10 million miles of real-world testing. This achievement underscores the advances in AI and machine learning pushing the boundaries of what autonomous vehicles can accomplish in safety, reliability, and efficiency.
Read more via X.
Compare with Cruise as shown in The Memo edition 19/May/2023.
Compare with Tesla as shown in The Memo edition 13/Nov/2023.
DeepSeek-Coder-V2 236B MoE (Jun/2024)
DeepSeek-Coder-V2 is a 236B-parameter Mixture-of-Experts (MoE) model trained on 10.2T tokens, with an MMLU score of 79.2. Developed by DeepSeek-AI, it shows significant improvements in generating accurate and efficient code, making it a valuable tool for developers. The paper details the architecture, training methodology, and benchmark comparisons with other state-of-the-art models.
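If you’d rather call it programmatically than use the chat page below, DeepSeek exposes an OpenAI-compatible API. A minimal sketch; the base URL is DeepSeek’s documented endpoint, and I’m assuming the ‘deepseek-coder’ model name points at this release.

```python
# Minimal sketch of calling DeepSeek-Coder-V2 via DeepSeek's OpenAI-compatible API.
# Assumes a DEEPSEEK_API_KEY and that the "deepseek-coder" model name maps to this release.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-coder",
    messages=[{"role": "user",
               "content": "Write a Python function that merges two sorted lists."}],
    temperature=0.0,
)
print(response.choices[0].message.content)
```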
Try it (free, login): https://chat.deepseek.com/coder
See it on the Models Table: https://lifearchitect.ai/models-table/
Google DeepMind shifts from research lab to AI product factory (17/Jun/2024)
Hassabis has spent his career primarily focused on research, but his new job will necessarily center more on commercialization… There are even rumblings within the company that the group may eventually ship products directly…
While no one is getting as much computing power as they want, the supply is tighter for teams engaged in pure research, say the former employee and others familiar with the lab…
Hassabis has also made the case that researchers working on foundational science should embrace commercial development as an asset. He says that he agreed to assume his new role partly because AI models are becoming more versatile and that developing commercial products yields techniques that are useful for pushing research forward. Google’s ubiquitous consumer products, he says, provide a unique testing ground for its science experiments. “We get feedback from millions of users,” Hassabis says. “And that can be incredibly useful, obviously, to improve the product, but also to improve your research.”
Read more via Bloomberg.
Runway Gen-3 Alpha examples (Jun/2024)
Gen-3 Alpha is the first of an upcoming series of models trained by Runway on a new infrastructure built for large-scale multimodal training. It is a major improvement in fidelity, consistency, and motion over Gen-2, and a step towards building General World Models.
Announce with examples: https://runwayml.com/blog/introducing-gen-3-alpha/
See another example via Reddit.
DeepMind V2A: Generating audio for video (17/Jun/2024)
DeepMind's new video-to-audio (V2A) technology combines video pixels with natural language text prompts to generate synchronized, realistic soundtracks for silent videos. V2A can create a variety of soundscapes, from cinematic scores to sound effects and dialogue, enhancing creative control for users. This technology uses a diffusion-based approach to refine audio from random noise, guided by visual input and text prompts.
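DeepMind hasn’t released code, but ‘refine audio from random noise, guided by visual input and text prompts’ is describing conditional diffusion sampling in general terms. The toy loop below illustrates that general idea only; the denoiser is a stand-in, and nothing here is DeepMind’s actual V2A system.

```python
# Toy illustration of conditional diffusion sampling (NOT DeepMind's V2A model).
# A denoiser predicts the noise in the current sample given video/text conditioning;
# the loop gradually refines pure noise into a structured signal.
import numpy as np

rng = np.random.default_rng(0)
T = 50                                 # denoising steps
betas = np.linspace(1e-4, 0.02, T)     # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x, t, video_emb, text_emb):
    """Stand-in for a learned noise-prediction network conditioned on video + text."""
    return 0.1 * x + 0.01 * (video_emb + text_emb)  # dummy prediction

video_emb = rng.normal(size=16000)     # pretend per-frame visual features
text_emb = rng.normal(size=16000)      # pretend text-prompt embedding
x = rng.normal(size=16000)             # start from pure noise (1 second at 16 kHz)

for t in reversed(range(T)):
    eps = denoiser(x, t, video_emb, text_emb)
    # DDPM-style update: subtract the predicted noise, then re-inject a little noise.
    x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.normal(size=x.shape)

audio = x  # a real system would decode this latent into a waveform or spectrogram
```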
Read more via Google DeepMind.
I showed some examples in the first few minutes of my most recent livestream (link):
If Ray Kurzweil is right (again), you’ll meet his immortal soul in the cloud (13/Jun/2024)
Ray Kurzweil remains highly optimistic about the future, believing that humans can merge with machines, become hyperintelligent, and live indefinitely. Despite skepticism in the past, many of his predictions, such as computers achieving human-level intelligence by 2029, now seem conservative. Kurzweil's new book, titled ‘The Singularity Is Nearer’, explores these ideas further.
Read more via WIRED.
New benchmark scores viz by Anthropic’s Sam McAllister (Jun/2024)
Source: https://x.com/sammcallister/status/1803791750856634873/photo/1
General Intelligence 2024 (3/Jun/2024)
This essay is making the rounds, predicting AGI in 3-5 years. I think we’re a lot closer than that, with AGI set to hit a lab near you within months (though of course, it will take many more years for AGI to reach you and me).
Folks in the field of AI like to make predictions for AGI. I have thoughts, and I’ve always wanted to write them down. Let’s do that. Since this isn’t something I’ve touched on in the past, I’ll start by doing my best to define what I mean by ‘general intelligence’. A generally intelligent entity is one that achieves a special synthesis of three things: a way of interacting with and observing a complex environment (embodiment), a robust world model covering the environment (intuition or fast thinking), and a mechanism for performing deep introspection on arbitrary topics (reasoning or slow thinking).
Read more via nonint.com.
Apple explains iPhone 15 Pro requirement for Apple Intelligence (19/Jun/2024)
With iOS 18, iPadOS 18, and macOS Sequoia, Apple is introducing a new personalized AI experience called Apple Intelligence that uses on-device generative large language models to enhance the user experience across iPhone, iPad, and Mac. These new AI features require Apple's latest iPhone 15 Pro and iPhone 15 Pro Max models to work, while only Macs and iPads with M1 or later chips will support Apple Intelligence.
Apple’s AI/machine learning head John Giannandrea said:
These models, when you run them at run times, it's called inference, and the inference of large language models is incredibly computationally expensive. And so it's a combination of bandwidth in the device, it's the size of the Apple Neural Engine, it's the oomph in the device to actually do these models fast enough to be useful. You could, in theory, run these models on a very old device, but it would be so slow that it would not be useful.
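Giannandrea’s point about bandwidth can be made concrete with a back-of-the-envelope calculation: autoregressive decoding is roughly memory-bandwidth-bound, so an upper bound on tokens per second is memory bandwidth divided by the bytes the weights occupy. The parameter count, quantization, and bandwidth figures below are illustrative assumptions, not Apple’s specifications.

```python
# Back-of-the-envelope decode speed for an on-device LLM (illustrative numbers only).
# Rule of thumb: each generated token reads all weights once, so
#   tokens/sec ≈ memory bandwidth / model size in bytes.
params = 3e9                   # assume a ~3B-parameter on-device model
bytes_per_param = 0.5          # assume ~4-bit quantized weights
model_bytes = params * bytes_per_param   # ≈ 1.5 GB of weights

for bandwidth_gb_s in (25, 50, 100):     # hypothetical device memory bandwidths
    toks_per_sec = (bandwidth_gb_s * 1e9) / model_bytes
    print(f"{bandwidth_gb_s} GB/s -> ~{toks_per_sec:.0f} tokens/sec upper bound")
```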
Read more via MacRumors.
AI took their jobs. Now they get paid to make it sound human (12/Jun/2024)
If you're worried about how AI will affect your job, the world of copywriters may offer a glimpse of the future. Writer Benjamin Miller was thriving in early 2023 but saw his team replaced by AI, leaving him to edit AI-generated articles. This shift reflects a broader trend where AI produces work once done by humans, with people now adding a human touch to AI's output.
Read more via BBC.
Policy
California’s new AI bill: Why Big Tech is worried about liability (14/Jun/2024)
California's SB 1047 has sparked significant concern among tech leaders, as it would mandate safety testing for companies spending over US$100M on AI ‘frontier models’. Critics like Meta’s chief AI scientist, Yann LeCun, argue that such regulations could stifle innovation and harm California's tech industry. Proponents, however, emphasize the importance of accountability, especially for AI systems with the potential to cause mass casualty events.
Read more via Vox.
OpenAI adds former NSA chief to its board (13/Jun/2024)
OpenAI has announced that Paul M. Nakasone, a retired U.S. Army general and former director of the National Security Agency, will join its board. Nakasone is expected to contribute significantly to OpenAI's understanding of AI's role in strengthening cybersecurity. He will also join the company's Safety and Security Committee, which is currently evaluating OpenAI's processes and safeguards.
Read more via CNBC.
Edward Snowden was quite concerned:
They've gone full mask-off: do not ever trust OpenAI or its products (ChatGPT etc). There is only one reason for appointing an NSA Director to your board. This is a willful, calculated betrayal of the rights of every person on Earth. You have been warned.
Source: https://x.com/Snowden/status/1801610725229498403
Trump says he used AI to rewrite one of his speeches (14/Jun/2024)
Former President Donald Trump revealed that he used ChatGPT to rewrite one of his speeches. He stated that this move was part of his effort to embrace new technologies and improve communication effectiveness. Trump's use of AI highlights the increasing influence of AI tools in various fields, including politics.
Read more via The Gazette.