The Memo - 5/Jul/2024

DCLM-Pool 240T, Baidu ERNIE 4.0 Turbo, ChatGPT 'delves', and much more!

Jul 05, 2024

To:      US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From:    Dr Alan D. Thompson <LifeArchitect.ai>
Sent:    5/Jul/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI:     75%

OpenAI CEO (27/Jun/2024):
“There’s tonnes of wonderful things… What are our lives going to be like when it's not just that the computer understands us, gets to know us, and helps us do these things? We can say, 'Hey computer, discover all of physics,' and it can go off and do that. What does it mean when we can say, 'Hey, start and run a great company,' and it can go off and do that? That's a big change.”

July 2024! We’re in the second half of the year already. The first half—as presented in my mid-year report—was spectacular.

When you're a hammer, everything looks like a nail. The New Yorker. Apr/2019.

There’s been a bit of talk about an ‘AI downturn’. If the media can’t see the immense ‘economic benefits’ already apparent, perhaps it is to be expected from that industry. Some are saying we are in the ‘trough of disillusionment’ for AI, and while I like Gartner, their incessant need to map their ‘hype cycle’ graphic (wiki) to anything and everything is… justified by their business model.

There is no AI hype cycle. Like humanity, AI is much more than a tool, a productivity enhancement, an automation, or even an industry. We can’t map ‘evolution’ or ‘imagination’ to a hype cycle. Gartner and Goldman Sachs may want to stick to the knitting.

20240625 Goldman Spend (PDF file, 2.53MB)

Contents

The BIG Stuff (Sonnet details, humor…)
The Interesting Stuff (first major app written with AI, acquisitions, dataset…)
Policy (ChatGPT publishing papers, IATSE…)
Toys to Play With (Free chat, running 1.5B/70B/405B models locally, UBI show…)
Flashback (Roadmap…)
Next (Roundtable…)

The BIG Stuff

Claude 3.5 Sonnet details (Jun/2024)

The Claude 3.5 Sonnet release has been significant, and new emergent properties are still being discovered. We first covered this model within a few hours of launch in The Memo edition 21/Jun/2024.

There is still no official information on Claude 3.5 Sonnet: no paper, no technical note, and no model card. There is a blog post with some benchmarks and pretty pictures, and that’s it.

In 2020, AI labs were proud to release hundred-page academic papers about their models. By 2023, this had shrunk to releasing short ‘technical notes’ or single-page ‘model cards’. Now—apparently—we have to trawl through mainstream media pieces to get just a glimpse of how models were trained.

Michael Gerstenhaber, head of product at Anthropic, gave a little more detail on Claude 3.5 Sonnet in interviews with two outlets. For Wired:

Claude 3.5 Sonnet model is larger than its predecessor but draws much of its new competence from innovations in training. For example, the model was given feedback designed to improve its logical reasoning skills. — 20/Jun/2024

For TechCrunch:

The improvements are the result of architectural tweaks and new training data, including AI-generated data. Which data specifically? Gerstenhaber wouldn’t disclose, but he implied that Claude 3.5 Sonnet draws much of its strength from these training sets. — 20/Jun/2024

I’ve been using the incredible Claude 3.5 Sonnet Artifacts component, and it really is amazing to see in real time (as we explored in the recent roundtable). Take a look at my video (link) and generated web page at https://lifearchitect.ai/distractions

I’ve also taken the time to highlight the full 2,800-word system prompt for Claude 3.5 Sonnet Artifacts. The biggest innovation here is Anthropic’s ‘<antThinking>’ mechanism which allows Claude 3.5 to privately think and reason step-by-step, also known as chain-of-thought (CoT) reasoning.

Anthropic has documented the older <thinking> tag here: https://docs.anthropic.com/en/docs/build-with-claude/tool-use#chain-of-thought

In early July 2024, other researchers flagged the new antThinking hidden mechanism here and here.
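
As a rough illustration of how a hidden reasoning tag can be separated from the visible reply, here is a minimal Python sketch. The tag name antThinking comes from the system prompt discussed above; the sample reply text, the function name split_thinking, and the regex approach are assumptions made for illustration, not Anthropic's implementation.

```python
import re

# Matches <antThinking>...</antThinking> blocks, including multi-line ones.
THINKING_TAG = re.compile(r"<antThinking>(.*?)</antThinking>", re.DOTALL)


def split_thinking(reply: str) -> tuple[list[str], str]:
    """Return (hidden reasoning snippets, text left for the user)."""
    thoughts = [t.strip() for t in THINKING_TAG.findall(reply)]
    visible = THINKING_TAG.sub("", reply).strip()
    return thoughts, visible


# Hypothetical reply text, invented for this example.
sample_reply = (
    "<antThinking>A full web page is substantial and reusable, "
    "so an artifact fits.</antThinking>\n"
    "Here is the page you asked for."
)

thoughts, visible = split_thinking(sample_reply)
print(thoughts)
print(visible)
```

The point of the design is that the model still gets to reason step-by-step, while the interface shows only the text outside the tags.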

Take a look at the Claude 3.5 Sonnet Artifacts system prompt.

How funny is ChatGPT? A comparison of human- and A.I.-produced jokes (3/Jul/2024)

Last year I had a difference of opinion with Prof Jeremy Howard during a private discussion where I told ABC that AI would outperform humans in standup comedy (joke telling). Turns out I was qualitatively and quantitatively correct. A new study systematically tested ChatGPT 3.5’s humor production abilities against human participants. Results showed that ChatGPT 3.5-produced jokes were rated as funny as, or funnier than, human-produced jokes, regardless of the comedic task.

ChatGPT outperformed the majority of our human humor producers on each task. ChatGPT 3.5 performed above 73% of human producers on the acronym task, 63% of human producers on the fill-in-the-blank task, and 87% of human producers on the roast joke task.

It is unfortunate that researchers keep using the smaller and lower quality gpt-3.5 20B model versus the much larger GPT-4 Classic 1.76T model. They are effectively testing something that is 88 times smaller (and perhaps 88 times worse) than the current state-of-the-art model.

Source: Microsoft CodeFusion paper Oct/2023, original paper withdrawn by Microsoft, my PDF backup.

In my time as a human intelligence researcher working alongside Mensa International, the Davidson Academy, and many education systems, humor was widely accepted to be a strong indicator of exceptional intelligence (listen to my tribute to Prof Miraca Gross for GE, where she talks about this). I await similar testing on GPT-4 (or the current SOTA, Claude 3.5!).

Read the new ChatGPT humor paper via PLOS ONE.

See it on my GPT Achievements Table.

This is another mammoth edition: around 4,000 words, featuring more than 10 new AI toys to play with…

The Interesting Stuff

Baidu ERNIE 4.0 Turbo (28/Jun/2024)

Baidu announced a new model called ERNIE 4.0 Turbo, with no detail besides the name. The company seems to be following OpenAI’s naming scheme and model design quite closely:

May/2020: OpenAI GPT-3 175B
(1½ year gap…)
Dec/2021: Baidu ERNIE 3.0 260B

Mar/2023: OpenAI GPT-4 1.76T
(7 month gap…)
Oct/2023: Baidu ERNIE 4.0 1T

Nov/2023: OpenAI GPT-4 Turbo
(7 month gap…)
Jun/2024: Baidu ERNIE 4.0 Turbo
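
The gaps in the timeline above can be checked with a few lines of Python. This is a minimal sketch: the month-level dates are the ones listed above, while the day of month (set to the 1st) and the helper name months_between are assumptions for illustration.

```python
from datetime import date

# (OpenAI model, release, Baidu model, release), month-level dates as listed above.
PAIRS = [
    ("GPT-3 175B", date(2020, 5, 1), "ERNIE 3.0 260B", date(2021, 12, 1)),
    ("GPT-4 1.76T", date(2023, 3, 1), "ERNIE 4.0 1T", date(2023, 10, 1)),
    ("GPT-4 Turbo", date(2023, 11, 1), "ERNIE 4.0 Turbo", date(2024, 6, 1)),
]


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


gaps = [months_between(o, b) for _, o, _, b in PAIRS]
for (openai_model, _, baidu_model, _), gap in zip(PAIRS, gaps):
    print(f"{openai_model} -> {baidu_model}: {gap} months")
```

The first gap works out to 19 months (the ‘1½ years’ above), and the next two to 7 months each.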

Baidu also revealed that ERNIE Bot has 300 million users, which would be about twice as many as OpenAI’s ChatGPT.

Read (not very much) more via Reuters.

The ERNIE Bot model playground is available to Chinese citizens with a Chinese mobile phone number here: https://yiyan.baidu.com/

New datasets: DCLM-Pool and DCLM-Baseline (20/Jun/2024)

Source: DCLM paper: https://arxiv.org/abs/2406.11794

A team of researchers from 23 labs (including University of Washington, Apple, and Toyota Research) have released the world’s largest dataset, built from Common Crawl web data.

The final dataset is 240 trillion tokens in 1PB (1,000TB or 1,000,000GB) uncompressed.

DCLM-Pool is the largest dataset to date, 8× larger than the previous Oct/2023 SOTA of RedPajama-Data-v2 with 30 trillion tokens in 125TB.

The full DCLM-Pool dataset is nearly useless though, as shown in the graphic above, and has to be filtered down to under 2% of its size (about 1.7% of the tokens, or 1.3% of the bytes) to be useful for model training right now. The resulting dataset is called DCLM-Baseline, and is 4 trillion tokens in about 13,000GB uncompressed.
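
For anyone who wants to check the dataset arithmetic, here is a small Python sketch. All token counts and sizes are the figures quoted above; the decimal unit convention (1TB = 1e12 bytes) and the variable names are assumptions for illustration.

```python
# Figures as quoted in the text above (decimal units: 1TB = 1e12 bytes).
DATASETS = {
    "DCLM-Pool": {"tokens": 240e12, "bytes": 1_000e12},
    "RedPajama-Data-v2": {"tokens": 30e12, "bytes": 125e12},
    "DCLM-Baseline": {"tokens": 4e12, "bytes": 13e12},
}

pool = DATASETS["DCLM-Pool"]
baseline = DATASETS["DCLM-Baseline"]
redpajama = DATASETS["RedPajama-Data-v2"]

# How much bigger the pool is than the previous record holder.
size_ratio = pool["tokens"] / redpajama["tokens"]

# How much of the pool survives filtering, by tokens and by bytes.
kept_tokens_pct = 100 * baseline["tokens"] / pool["tokens"]
kept_bytes_pct = 100 * baseline["bytes"] / pool["bytes"]

print(f"DCLM-Pool vs RedPajama-Data-v2: {size_ratio:.0f}x the tokens")
print(f"DCLM-Baseline keeps {kept_tokens_pct:.1f}% of tokens, {kept_bytes_pct:.1f}% of bytes")
for name, d in DATASETS.items():
    print(f"{name}: {d['bytes'] / d['tokens']:.2f} bytes per token")
```

The bytes-per-token line is a quick sanity check: both raw datasets come out at a little over 4 bytes per token, which is roughly what you would expect for English-heavy web text.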

Interestingly, the initial web rip is pretty similar to what we achieved in 2020. GPT-3’s initial CC download was 45TB, versus the DCLM-Pool dataset at 370TB compressed. (From the GPT-3 paper: "The CommonCrawl data was downloaded from 41 shards of monthly CommonCrawl covering 2016 to 2019, constituting 45TB of compressed plaintext before filtering".)

Read my paper: ‘What’s in my AI?’: https://lifearchitect.ai/whats-in-my-ai/

Read the DCLM paper: https://arxiv.org/abs/2406.11794

See the project page: https://www.datacomp.ai/dclm/

See DCLM-Pool and DCLM-Baseline on my updated Datasets Table: https://lifearchitect.ai/datasets-table/

LetterDrop: First major app developed entirely with AI (Jun/2024)

Developer Dawei Ma has used GPT-4o to generate a newsletter app linked to Cloudflare:

I used the GPT-4o model to generate the code for LetterDrop. That means the code is generated by the AI model, and I only need to provide the prompts to the model. This approach is very efficient and can save a lot of time. I've also recorded a video to show how to create the LetterDrop project using the GPT-4o model.
That also means you can easily customize the code by changing the prompts. You can find the prompts in the CDDR file.

Take a look: https://github.com/i365dev/LetterDrop

OpenAI acquisition #1: Enterprise data startup ‘Rockset’ (21/Jun/2024)

OpenAI has made its first acquisition by purchasing Rockset, an enterprise analytics startup, to enhance its retrieval infrastructure across various products. The terms of the acquisition were not disclosed, but Rockset has raised $105 million in funding to date. The integration will see some members of the Rockset team joining OpenAI, as the company gradually transitions its current customers off the platform.

Official announcement: https://openai.com/index/openai-acquires-rockset/

Policy

The word ‘delve’ and how cheap, outsourced labour in Africa is shaping AI English (Apr/2024 and Jun/2024)

For many years now I’ve spoken about how fine-tuning on human preferences is a fool’s errand. Known as RLHF—reinforcement learning from human feedback—using humans to do AI’s work results in some horrible issues. You can read my thoughts here:

https://lifearchitect.ai/alignment/

Back in April, the Guardian jumped on a finding by Aussie researcher Prof Jeremy Nguyen (Tweet) who asked:

Are medical studies being written with ChatGPT? Well, we all know ChatGPT overuses the word "delve". Look below at how often the word 'delve' is used in papers on PubMed (2023 was the first full year of ChatGPT).

The Guardian investigated exploitation of African workers who are paid minimal wages to assist in the creation of chatbots, resulting in their language patterns being mirrored by AI systems. This has led to the emergence of ‘AI-ese,’ a distinct writing style used by AI assistants.

[The word] “delve” was overused by ChatGPT compared to the internet at large. But there’s one part of the internet where “delve” is a much more common word: the African web. In Nigeria, “delve” is much more frequently used in business English than it is in England or the US. So the workers training their systems provided examples of input and output that used the same language, eventually ending up with an AI system that writes slightly like an African.
And that’s the final indignity. If AI-ese sounds like African English, then African English sounds like AI-ese… how much worse will it get when a significant chunk of humanity sounds like the AI systems they were paid to train?

Toys to Play With

From bare metal to a 70B model: infrastructure set-up and scripts (25/Jun/2024)

In a few months, a small team of researchers and engineers from Imbue trained a 70B parameter model on their own infrastructure, outperforming zero-shot GPT-4 on reasoning tasks. This end-to-end guide details the challenges and solutions in setting up the infrastructure, from initial cluster setup to error recovery. The team also released several infrastructure scripts to assist other teams in stabilising their model training environments.

Flashback

Two years ago, I published a paper called ‘Roadmap: AI’s next big steps in the world (AI that matters, as it happens, in plain English)’. It still seems relevant, though we’re further along the timeline of progress.

You are here.
You live in 2022.
You have a front row seat to the most exciting period in human history.
And perhaps most importantly, for some reason, you have unparalleled access to the inner workings of what’s going on. AI labs are extraordinarily and astonishingly open about their progress. You can read the AI papers at no charge, as they are released. You can play with many of the AI models, often for free. You are living in the future. And AI’s next big steps in the world are going to be groundbreaking.

Read it: https://lifearchitect.ai/roadmap/

Watch my video (link):

Next

There are many models in the pipeline that have finished training in the last few weeks but have not been fully released:

Grok 2 (text generation)
GPT-4o (image generation and voice components)
Imagen 3 (image generation)
Claude 3.5 Opus (text generation)
Sora (video generation)
GPT-5 (text generation)
and many more…

The next roundtable will be:

Life Architect - The Memo - Roundtable #14
Follows the Chatham House Rule (no recording, no outside discussion)
Saturday 13/Jul/2024 at 5PM Los Angeles
Saturday 13/Jul/2024 at 8PM New York
Sunday 14/Jul/2024 at 10AM Brisbane (new primary/reference time zone)
or check your timezone via Google.

You don’t need to do anything for this; there’s no registration or forms to fill in, I don’t want your email, you don’t even need to turn on your camera or give your real name!

All my very best,

Alan
LifeArchitect.ai
