FOR IMMEDIATE RELEASE: 30/Aug/2023
Welcome back to The Memo.
You’re reading alongside policy writers and decision makers within governments, agencies, and intergovernmental organisations including the ███, ██████, the ████, the ███, the █████ Government, the ███, the Government of █████, the trillion-dollar foreign reserves management company behind the Government of █████████, and more…
The winner of the Who Moved My Cheese? AI Awards! for August 2023 is theoretical physicist Prof Michio Kaku, who (very wrongly) refers to LLMs as ‘glorified tape recorders’:
It takes snippets of what’s on the web created by a human, splices them together and passes it off as if it created these things… And people are saying, ‘Oh my God, it’s a human, it’s humanlike.’
An author once said: ‘Better to remain silent and be thought a fool than to speak and to remove all doubt.’
This is another very long edition, and if you’re interested in reading and/or listening, there are several hours of updates here. In the Toys to play with section, we look at the latest way to run Llama 2 on Mac, a new gold-standard LLM platform for academic research and writing, Harvey Castro MD’s new LLM one-pagers, a new free DALL-E 2 experiment, and a rehearsal version of my latest keynote for a major government.
I will be ramping up the livestreams over the next few weeks. The best way to be notified about those is to click some buttons (Subscribe ➜ Notify) on the YouTube channel:
https://www.youtube.com/c/DrAlanDThompson
The next roundtable for full subscribers will be:
Life Architect - The Memo - Roundtable #3 with Harvey Castro
Follows the Chatham House Rule (no recording, no outside discussion)
Saturday 23/Sep/2023 at 5PM Los Angeles
Saturday 23/Sep/2023 at 8PM New York
Sunday 24/Sep/2023 at 8AM Perth (primary/reference time zone)
or check your timezone via Google.
Details at the end of this edition.
The BIG Stuff
AI vs Human: The Creativity Experiment (29/Aug/2023)
I recently appeared on ABC Catalyst in Australia. The documentary is called AI vs Human: The Creativity Experiment, and also features my friend and colleague Prof Jeremy Howard (yes, the Aussie who invented large language model training and fine-tuning as we know it!).
In Australia, you can stream on ABC iview with some alternate viewing times on ABC TV.
Preview clip:
ChatGPT Enterprise (28/Aug/2023)
It’s the most common concern I hear during my keynotes: What about my data? While Microsoft Azure has offered a solution since the first half of 2023, OpenAI now has a direct offering: ChatGPT Enterprise.
You own and control your business data in ChatGPT Enterprise. We do not train on your business data or conversations, and our models don’t learn from your usage. ChatGPT Enterprise is also SOC 2 compliant and all conversations are encrypted in transit and at rest. Our new admin console lets you manage team members easily and offers domain verification, SSO, and usage insights, allowing for large-scale deployment into enterprise.
Notably, ChatGPT Enterprise offers ‘unlimited’ GPT-4 access at 2x the standard speed.
Read more: https://openai.com/blog/introducing-chatgpt-enterprise
See OpenAI’s pen test and compliance docs: https://trust.openai.com/
Increasing LLM training budgets (25/Aug/2023)
AnthropicAI’s CEO Dario Amodei says:
Right now the most expensive model [in Aug/2023] costs +/- $100m. Next year we will have $1B+ models. By 2025, we may have a $10B model. (24/Aug/2023, Twitter)
Here are my draft numbers for training spend on an LLM (in USD) over the years:
2018: BERT 330M (4 days x 64 TPU v2): $7K
2019: GPT-2 1.5B (7 days x 256 TPU v3): $43K
2020: GPT-3 175B (1 month x 1,024 V100s): $4.6M
2021: MT-NLG 530B (3 months x 2,000 A100s): $25M
2022: PaLM 540B (64 days x 6,144 TPU v4): $23.1M
2022: BLOOM 176B (176 days x 384 A100s): $5M
2023: GPT-4/PaLM 2/Gemini 2T (Gemini on TPU v5): $100M
2024: GPT-5 (4 months x 25,000 H100s): $1B
2025: GPT-6 (hardware not yet announced): $10B
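These figures are rough, but the arithmetic behind them is simple; here’s a minimal sketch (the hourly chip rate is my illustrative assumption, not a quoted cloud price):

```python
# Back-of-envelope LLM training cost: accelerators x days x 24h x hourly rate.
# The $6/chip-hour figure below is an illustrative assumption, not a quoted price.
def training_cost_usd(accelerators: int, days: float, usd_per_chip_hour: float) -> float:
    return accelerators * days * 24 * usd_per_chip_hour

# GPT-3 175B: ~1,024 V100s for ~1 month at an assumed ~$6/chip-hour
print(f"${training_cost_usd(1024, 30, 6.0) / 1e6:.1f}M")  # ~$4.4M, close to the $4.6M above
```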
GPT-4 (in my experience, more like GPT-4.5 when using ‘Advanced Data Analysis’, the new name for ‘Code Interpreter’ as of 28/Aug/2023) kindly generated this chart within seconds, based on my rough working above:
And here’s a paper about model training costs up to 2021: https://epochai.org/blog/trends-in-the-dollar-training-cost-of-machine-learning-systems
For a little bit of history, see an article from all the way back in 2019, ‘The Staggering Cost of Training SOTA AI Models’ (Jun/2019). As pointed out there, ‘the cost of the compute used to train models is also expected to become significantly cheaper with the continuing advance of algorithms, computing devices, and engineering efforts.’
In plain English, as we move along the timeline, AI labs continue to increase their training spend, just as the capability of training hardware increases (and/or hardware cost decreases). A perfect storm for the evolution of humanity.
Llama-3 and Llama-4 rumours (26/Aug/2023)
Overheard at a Meta GenAI social: "We have compute to train Llama 3 and 4. The plan is for Llama-3 to be as good as GPT-4." "Wow, if Llama-3 is as good as GPT-4, will you guys still open source it?" "Yeah we will. Sorry alignment people."
From: https://twitter.com/agikoala/status/1695125016764157988
I’d like to see Llama-3 hit at least 340B parameters (in line with PaLM 2, the largest dense model available today), trained on 7T tokens. Llama-4 may be too far ahead to guess!
Here’s my take on alignment (and its damage): https://lifearchitect.ai/alignment/#rlhf
Read these horrifying examples of alignment documented by the Ollama team: https://ollama.ai/blog/run-llama2-uncensored-locally
The Interesting Stuff
Allen AI introduces Dolma dataset, OLMo 70B coming soon (18/Aug/2023)
As usual, Allen AI provides a rigorous and detailed overview of its newest 3T-token dataset, Dolma (“Data to feed OLMo’s Appetite”). Congrats to Jesse and the team.
This dataset is available to all, and will be used to train their upcoming language model, OLMo 70B, expected in Q1 2024. Dolma is said to be the ‘largest open dataset to date’, as the next biggest (TTI’s RefinedWeb) released only a portion (600B tokens) of its full 5T-token dataset.
Read my What’s in My AI report from Mar/2022.
See also my Datasets Table.
Read the Dolma release on the Allen AI blog.
Download Dolma dataset (5.3TB zip): https://huggingface.co/datasets/allenai/dolma
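If you’d like to poke at Dolma without pulling down the full 5.3TB, here’s a hedged sketch using the Hugging Face datasets library in streaming mode (whether this repo exposes a standard loader, plus the field name and any licence gating, are assumptions on my part):

```python
# Hedged sketch: stream a few Dolma documents rather than downloading ~5.3TB.
from datasets import load_dataset

ds = load_dataset("allenai/dolma", split="train", streaming=True)  # assumed loader
for i, doc in enumerate(ds):
    print(doc["text"][:200].replace("\n", " "))  # 'text' field name is an assumption
    if i == 2:
        break
```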
This could be a competitor to Llama-2, but with the pace of change, it is likely that OLMo will be surpassed before its release. Keep in mind just how rapidly this space is evolving:
Feb/2023: Llama-1 65B (on 1.4T tokens) released.
Jul/2023: Llama-2 70B (on 2T tokens) released.
Aug/2023: Rumours that Llama-3 is being prepared.
That’s a tiny five-month gap between the first major releases. Note my new draft viz on exponential advances!
Semianalysis: Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors (28/Aug/2023)
Google has woken up, and they are iterating on a pace that will smash GPT-4 total pre-training FLOPS by 5x before the end of the year. The path is clear to 100x by the end of next year given their current infrastructure buildout [of TPUv5 Viperfish chips].
The GPU poor are still mostly using dense models because that’s what Meta graciously dropped on their lap with the LLAMA series of models. Without Zuck’s good grace, most open source projects would be even worse off. If they were actually concerned with efficiency, especially on the client side, they’d be running sparse model architectures like MoE, training on these larger datasets, and implementing speculative decoding like the Frontier LLM Labs (OpenAI, Anthropic, Google Deepmind). (28/Aug/2023, Semianalysis)
While this analysis is far from conclusive, and confuses future (Dec/2023) with present (Aug/2023), it’s an interesting read.
My analysis asserts that Google DeepMind’s Gemini had finished training as of August 2023, and will be released shortly after fine-tuning and ‘alignment’. I also stand by my prediction that Gemini’s language portion will be around 2-3T parameters via MoE, whereas Semianalysis’ calculation of 5x GPT-4’s compute (not available until Dec/2023) would make it 8.8T parameters via MoE (1.76T x 5).
Gemini is also not just a language model. It will be multimodal; likely a Visual-Language-Action (VLA) model with access to other AI like AlphaGo, AlphaZero, and AlphaFold.
Read more: https://lifearchitect.ai/gemini/
UC Berkeley conference on LLMs (20/Aug/2023)
Large Language Models and Transformers
Location: UC Berkeley: Calvin Lab auditorium and livestream
Date: Monday, Aug. 14 – Friday, Aug. 18, 2023
We’re living in the future: entire conferences are available in a neat format almost immediately! And what a conference, each presentation more powerful than the last (except Ilya, who talked about… nothing). If you have a lot of time, watch every video, especially Scott Aaronson’s detailed talk on watermarking LLMs (video link).
Watch all videos: Large language models and Transformers at Berkeley, August, 2023.
If you don’t have a lot of time, the summary paragraphs are excellent.
Read: ‘Report on the large language model meeting’ at UC Berkeley, August, 2023.
New IQ viz (Aug/2023)
This viz is still in draft, but I’m attaching it here for full subscribers.
While I’ve used NZ Professor David Rozado’s GPT testing results here, it’s worth noting that in Mar/2023, clinical psychologist Eka Roivainen used the widely accepted WAIS-III instrument (Wechsler Adult Intelligence Scale, Third Edition) to assess the model via the ChatGPT interface (assuming GPT-4). The result was nearly the same: an IQ of 155, which puts it in the 99.987th percentile of the human population.
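As a quick check on that percentile (a minimal sketch; WAIS scores are standardised to a mean of 100 and a standard deviation of 15):

```python
# IQ 155 on a scale with mean 100 and SD 15 sits ~3.67 SDs above the mean.
from scipy.stats import norm

z = (155 - 100) / 15
print(f"{norm.cdf(z) * 100:.3f}th percentile")  # ~99.988, in line with the figure above
```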
Read more about WAIS testing GPT via Scientific American.
New robot viz (Aug/2023)
Here’s my latest viz on the coming LLM-backed humanoid robots, some of which are ready for order or pre-order.
My favourite is still the OpenAI-backed 1X and their NEO robot: https://www.1x.tech/neo
Latest BMI (brain-machine interface) updates (Aug/2023)
Stanford gave Pat Bennett a new brain implant which converts brain signals to words on a screen.
Read the paper: https://www.nature.com/articles/s41586-023-06377-x
UC Berkeley gave Ann Johnson a new brain implant which converts brain signals to words via an avatar.
Read more via NYT: https://archive.md/Flin9
Read the paper: https://www.nature.com/articles/s41586-023-06443-4
Watch Ann speak using her avatar (link):
The World Isn’t Ready for the Next Decade of AI: Mustafa Suleyman (16/Aug/2023)
I’m not a huge fan of those who point to problems without offering a solution, and even less a fan of those who run around with their hair on fire screaming that the sky is falling. Suleyman is a co-founder of DeepMind, and now a founder of Inflection AI (creators of Pi), valued at $4B.
…at Inflection, we are developing an AI called Pi, which stands for Personal Intelligence, and it is more narrowly focused on being a personal AI. Quite different to an AI that learns any challenging professional skill. A personal AI, in our view, is one that is much closer to a personal assistant; it's like a chief of staff, it's a friend, a confidant, a support, and it will call on the right resource at the right time depending on the task that you give it.
So it will be able to use APIs, but that doesn't mean that you can prompt it in the way that you prompt another language model, because this is not a language model, Pi is an AI. GPT is a language model. They're very, very different things.
The first stage of putting together an AI is that you train a large language model. The second step is what's called fine-tuning, where you try to align or restrict the capabilities of your very broadly capable pretrained model to get it to do a specific task.
With the wave of AI, the scale of these models has grown by an order of magnitude that is 10X every single year for the last 10 years. And we're on a trajectory over the next five years to increase by 10X every year going forward, and that's very, very predictable and very likely to happen.
Read more via Wired: https://archive.md/SHtig
Transformer: All 8 researchers have now left Google (Jul/2023)
The paper’s eight authors had created the Transformer, a system that made it possible for machines to generate humanlike text, images, DNA sequences and many other kinds of data more efficiently than ever before. Their paper would eventually be cited more than 80,000 times by other researchers, and the AI architecture they designed would underpin OpenAI’s ChatGPT (the “T” stands for Transformer), image-generating tools like Midjourney and more.
There was nothing unusual about Google sharing this discovery with the world. Tech companies often open source new techniques to get feedback, attract talent and build a community of supporters. But Google itself didn’t use the new technology straight away. The system stayed in relative hibernation for years as the company grappled more broadly with turning its cutting-edge research into usable services. Meanwhile, OpenAI exploited Google’s own invention to launch the most serious threat to the search giant in years…
The last of the eight authors to remain at Google, Llion Jones, confirmed this week that he was leaving to start his own company.
San Francisco launches driverless bus service following robotaxi expansion (19/Aug/2023)
These things are on a fixed loop and include a human chaperone (for now).
The shuttles are operated by Beep, an Orlando, Florida-based company that has run similar pilot programs in more than a dozen U.S. communities, including service at the Miami Zoo, Mayo Clinic and Yellowstone National Park…
“The autonomous vehicle will have a better reaction time than a human and it will offer a more reliable service because they won’t be distracted.”
RLHF vs RLAIF for language model alignment (22/Aug/2023)

I considered writing this kind of piece, and someone else has done it for me! It’s an excellent look at Constitutional AI using AI feedback, as demonstrated with Anthropic’s Claude model.
Read it: https://www.assemblyai.com/blog/rlhf-vs-rlaif-for-language-model-alignment/
Meta AI: CodeLlama-34B (25/Aug/2023)
Code Llama is a code-specialized version of Llama 2 that was created by further training Llama 2 on its code-specific datasets, sampling more data from that same dataset for longer. Essentially, Code Llama features enhanced coding capabilities. It can generate code and natural language about code, from both code and natural language prompts (e.g., “Write me a function that outputs the fibonacci sequence”). It can also be used for code completion and debugging. It supports many of the most popular programming languages used today, including Python, C++, Java, PHP, Typescript (Javascript), C#, Bash and more.
We are releasing three sizes of Code Llama with 7B, 13B and 34B parameters respectively. Each of these models is trained with 500B tokens of code and code-related data.
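As a hedged sketch of what prompting one of these checkpoints looks like via Hugging Face transformers (the checkpoint name and generation settings are my assumptions; the 34B weights need far more GPU memory, so this uses the 7B variant):

```python
# Hedged sketch: code completion with a Code Llama checkpoint (assumed name).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed Hugging Face checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "# Write me a function that outputs the fibonacci sequence\ndef fib("
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```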
Read more: https://about.fb.com/news/2023/08/code-llama-ai-for-coding/
Phind: Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B (26/Aug/2023)
We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset that achieved 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67% according to their official technical report in March. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset.
The CodeLlama models released yesterday demonstrate impressive performance on HumanEval.
CodeLlama-34B achieved 48.8% pass@1 on HumanEval
CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval
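For context, pass@1 is the standard HumanEval metric; here’s a minimal sketch of the unbiased pass@k estimator from OpenAI’s Codex paper (Chen et al., 2021):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples per problem, c of which pass the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(10, 7, 1))  # 0.7 -- with k=1 this reduces to c/n
```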
Read more: https://www.phind.com/blog/code-llama-beats-gpt4
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness (22/Aug/2023)
From 19 authors including Turing Award winner Prof Yoshua Bengio, this paper looks at criteria for measuring consciousness.
To be included, a theory had to be based on neuroscience and supported by empirical evidence, such as data from brain scans during tests that manipulate consciousness using perceptual tricks. It also had to allow for the possibility that consciousness can arise regardless of whether computations are performed by biological neurons or silicon chips…
Google’s PaLM-E, which receives inputs from various robotic sensors, met the criterion “agency and embodiment.” And, “If you squint there’s something like a workspace,” Elmoznino adds…
The problem for all such projects, Razi says, is that current theories are based on our understanding of human consciousness. Yet consciousness may take other forms, even in our fellow mammals. “We really have no idea what it’s like to be a bat,” he says. “It’s a limitation we cannot get rid of.”
Read an analysis by Science.org: https://www.science.org/content/article/if-ai-becomes-conscious-how-will-we-know
Read the paper: https://arxiv.org/abs/2308.08708
DeepMind SynthID: Watermarking for Imagen (29/Aug/2023)
SynthID is a tool for watermarking and identifying AI-generated images. This technology embeds a digital watermark directly into the pixels of an image, making it imperceptible to the human eye, but detectable for identification.
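DeepMind hasn’t published SynthID’s actual technique. As a toy illustration of the general idea of pixel-level watermarking only, here’s classic least-significant-bit embedding; note it is far more fragile than SynthID, which is designed to survive crops, filters, and re-compression:

```python
import numpy as np

# Toy least-significant-bit (LSB) watermark -- NOT SynthID's method.
def embed_bits(image: np.ndarray, bits: np.ndarray) -> np.ndarray:
    flat = image.flatten().copy()
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits  # overwrite each lowest bit
    return flat.reshape(image.shape)

def extract_bits(image: np.ndarray, n: int) -> np.ndarray:
    return image.flatten()[:n] & 1

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in 'image'
watermark = np.random.randint(0, 2, 128, dtype=np.uint8)    # 128 payload bits
assert (extract_bits(embed_bits(img, watermark), 128) == watermark).all()
```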
Read more: https://www.deepmind.com/blog/identifying-ai-generated-images-with-synthid
AI + housing + communities (Aug/2023)
I don’t reckon you’d find this kind of thought experiment in any other AI updates, outside of these editions of The Memo…
Consider how broadly and deeply AI is already affecting our daily lives, from healthcare to writing to leisure. One area that has been of interest to me lately is housing and community. AI will help here too, and is already being proven in some new cities.
If you enjoy reading, here are a couple of pieces to ponder…
Let’s start with a new city being readied in Solano County, California. The land that has been purchased is 52,000 acres (210 km²)—an empire that is nearly double the size of the city of San Francisco.
The practical need for more space has at times morphed into lofty visions of building entire cities from scratch… Take an arid patch of brown hills cut by a two-lane highway between suburbs and rural land, and convert it into a community with tens of thousands of residents, clean energy, public transportation and dense urban life.
…The company’s investors, whose identities have not been previously reported, comprise a who’s who of Silicon Valley, according to three people who were not authorized to speak publicly about the plans. They include Moritz; Reid Hoffman, the LinkedIn co-founder…; Marc Andreessen and Chris Dixon, investors at the Andreessen Horowitz venture capital firm; Patrick and John Collison, the sibling co-founders of payments company Stripe; Laurene Powell Jobs; and Nat Friedman and Daniel Gross, entrepreneurs-turned-investors. (26/Aug/2023, Marin Independent Journal)
This would be a ‘planned city,’ probably using AI. Did you know that cities like Adelaide (Aus), Joondalup (Aus, near Perth), Washington DC (USA), Penang (Malaysia), and many more (wiki) were all deliberately planned rather than allowed to spread naturally?
A planned community, planned city, planned town, or planned settlement is any community that was carefully planned from its inception and is typically constructed on previously undeveloped land. This contrasts with settlements that evolve in a more organic fashion. (wiki)
Further afield, Saudi Arabia is famously working on the half-trillion dollar project The Line (wiki), which has been called ‘The world’s first cognitive city’ (Aug/2023, Wired), with the first ‘module’ ready very soon.
‘I can't think of anybody that wouldn't want to be part of this project, it's going to be, without a question, the single most extraordinary piece of work that begins in the first quarter of the 21st century.’
‘I drive a Tesla, and I see that now that's transitional. That's the last car before there's no cars.’ (18/Jul/2023, Dezeen)
At 9 million residents, this thing will have a bigger population than most modern cities:
London: 8.9M residents
NYC: 8.4M residents
Sydney: 5.3M residents
I expect that this project—and probably all ‘cities’ and ‘housing’ in the future—will be created by or co-created with AI, and will look vastly different to the cities and housing of the 20th century:
The creators of The Line said that their whole community will become “cognitive” and will be based on AI that “will continue to explore ways to predict, to make life easier for residents”. (link)
Policy
Spain Just Created the First European AI Supervision Agency (24/Aug/2023)
Spain has created the Spanish Agency for the Supervision of Artificial Intelligence (AESIA). The agency, announced in a royal decree on Tuesday, will be the first AI regulatory body of its kind in the European Union.
The body, which will work to develop an “inclusive, sustainable and citizen-centered” AI, is in line with the country’s National Strategy on Artificial Intelligence.
Read more: https://decrypt.co/153482/spain-just-created-the-first-european-ai-supervision-agency
Read the press release in Spanish.
Toys to Play With
Ollama (Aug/2023)
Get up and running with large language models, locally. Run Llama 2 and other models on macOS. Customize and create your own.
Try it: https://ollama.ai/
Put it in your menu bar: https://github.com/JerrySievert/Dumbar
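If you’d rather script it than click around, here’s a minimal sketch against Ollama’s local REST API (the endpoint and payload shape follow its docs at the time of writing; treat the details as assumptions):

```python
import json
import requests

# Minimal sketch: query a locally running Ollama server (default port 11434).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Explain LLMs in one sentence."},
    stream=True,  # Ollama streams newline-delimited JSON chunks
)
for line in resp.iter_lines():
    if line:
        print(json.loads(line).get("response", ""), end="", flush=True)
```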
GPT-4 creates prompts for DALL-E about space (Aug/2023)

A new image is generated every 30 minutes. What a great project!
Take a look: https://cosmictrip.space/
Consensus: Evidence-Based Answers, Faster (Aug/2023)
Consensus is a search engine that uses AI to find insights in research papers.
Tech stack includes OpenAI (GPT) and AI21 (Jurassic-2).
Try it (free search, no login): https://consensus.app/
‘Healthcare GPT Cheat Sheet’ by Harvey Castro MD (Aug/2023)
Thanks to Harvey!
Download Healthcare GPT Cheat Sheet (PDF).
Download Healthcare GPT Cheat Sheet - Advanced (PDF).
Flashback
Two flashbacks this week… The first is by Wired, from Sep/2021, titled ‘The Exponential Age Will Transform Economics Forever’.
The most basic cause of the exponential gap is simple: we are bad at maths.
…Human cognitive machinery does not naturally process such rapid change. The calculations bewilder us. Take the case of an atypical London rainstorm. Wembley Stadium is England’s national soccer venue… Imagine sitting at the highest row of level three, the furthest above the pitch you can be – some 40m or so above the ground.
Rain starts to fall, but you are sheltered by the partial roof above you. Yet this is no ordinary rain. This is exponential rain. The raindrops are going to gradually increase in frequency, doubling with each passing minute. One drop, then a minute later two drops, then a minute later four drops. By the fourth minute, eight drops. If it takes 30 minutes to get out of your seats and out of the stadium, how soon should you get moving to avoid being drenched?
To be safe, you should start moving by no later than minute 17 – to give yourself 30 minutes to be clear of the stadium. By the 47th minute, the exponential rain will be falling at a rate of 141 trillion drops per minute. Assuming a raindrop is about four cubic millimetres, by the 47th minute the deluge would be 600 million litres of water. Of course, the rain in the 48th minute will be twice as large, so you are likely to get soaked in the car park. And if you make it to the car, the deluge in the fiftieth minute will comprise five billion litres of water. It would weigh five million tonnes. Frankly, if exponential rain is forecast, you're best off staying at home.
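A quick sanity check on Wired’s numbers (a minimal sketch, assuming one drop at minute zero and 4 mm³ per drop):

```python
# Exponential rain: drops double every minute, starting from a single drop.
minute = 47
drops = 2 ** minute                # ~1.41e14, i.e. ~141 trillion drops
litres = drops * 4 / 1_000_000     # 4 mm^3 per drop; 1,000,000 mm^3 per litre
print(f"minute {minute}: {drops:.3g} drops, {litres:.3g} litres")
# minute 47: 1.41e+14 drops, 5.63e+08 litres (~600 million litres)
print(f"minute 50: {2**50 * 4 / 1e6:.3g} litres")  # ~4.5e9, i.e. ~5 billion litres
```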
Read more: https://www.wired.co.uk/article/exponential-age-azeem-azhar
AI in plain English using Jell-O crystals! (May/2022)
This video is doing the rounds again. We used jelly (Jell-O) crystals in different colours to symbolise different datasets, and then put the whole thing in a black box. It’s not a perfect analogy, but it shows LLMs in a better way than ‘it’s a tape recorder’!
Watch my video (link):
Next
The next roundtable will be:
Life Architect - The Memo - Roundtable #3 with Harvey Castro
Follows the Chatham House Rule (no recording, no outside discussion)
Saturday 23/Sep/2023 at 5PM Los Angeles
Saturday 23/Sep/2023 at 8PM New York
Sunday 24/Sep/2023 at 8AM Perth (primary/reference time zone)
or check your timezone via Google.
You don’t need to do anything for this; there’s no registration or forms to fill in, I don’t want your email, you don’t even need to turn on your camera or give your real name!
Here’s a look at a rehearsal version of my latest keynote, for your interest. These keynotes are all closed-door private events for enterprise or government, and the ticket prices are outrageous, usually in the 4 or 5 figures.
Victorian Government (Aug/2023, link)
All my very best,
Alan
LifeArchitect.ai