To: US Govt, major govts, Microsoft, Apple, NVIDIA, Alphabet, Amazon, Meta, Tesla, Citi, Tencent, IBM, & 10,000+ more recipients…
From: Dr Alan D. Thompson <LifeArchitect.ai>
Sent: 11/Jun/2024
Subject: The Memo - AI that matters, as it happens, in plain English
AGI: 74%
The winner of The Who Moved My Cheese? AI Awards! for June 2024 is Strauss Zelnick, CEO of Take-Two, responsible for games like GTA and Red Dead. He says: ‘I also don’t think for a minute that generative AI is going to reduce employment. That’s crazy, it’s actually crazy… I’m in a Whatsapp chat with a bunch of Silicon Valley CEOs, and the conventional wisdom out there is like, ‘AI is gonna make us all unemployed.’ It is just the stupidest thing I’ve ever heard. The history of productivity tools is that it increases employment.’ [Alan: Absolutely tremendous quote! Amazing that superintelligence has been relegated to being a ‘productivity tool’.]
Join the weekly livestream in ~15 hours from this emailed edition (link/notify).
Contents
The BIG Stuff (Terry Tao, mid-year AI report, GPT-5 updates…)
The Interesting Stuff (retiring in 3 years, Qwen2, Kling, robot doctors, AMD…)
Policy (AI political candidates, China + chips, Estonia, EC…)
Toys to Play With (Alan’s new LLM daily driver, Showrunner, AI movies…)
Flashback (MMLU…)
Next (Roundtable…)
The BIG Stuff
Teams of GPT-4 Agents can Exploit Zero-Day Vulnerabilities (2/Jun/2024)
Teams of GPT-4 agents can autonomously exploit zero-day vulnerabilities.
Cybersecurity, on both the offensive and defensive side, will increase in pace.
Black-hat actors can use AI agents to hack websites. On the other hand, penetration testers can use AI agents to aid in more frequent penetration testing.
Do you recall The Memo edition 1/May/2024 when we talked about researchers using GPT-4 agents to exploit 87% of 15 real-world documented CVEs?
They’re back with a follow-up paper showing how they’ve been able to exploit zero-day vulnerabilities (flaws that are unknown or unreported) with a team of autonomous, self-propagating GPT-4 agents.
Instead of assigning a single LLM agent to solve many complex tasks, researchers at the University of Illinois Urbana-Champaign used a GPT-4 ‘planning agent’ that oversees the entire process and launches multiple task-specific GPT-4 ‘subagents’. Very much like a boss and subordinates, the planning agent works with a managing agent, which delegates work to each ‘expert subagent’.
It's a technique similar to what Cognition Labs uses with its Devin AI software development team; it plans a job out, figures out what kinds of workers it'll need, then project-manages the job to completion while spawning its own specialist 'employees' to handle tasks as needed.
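To make the boss-and-subordinates structure concrete, here is a minimal sketch of the control flow only. It is not the researchers' code: the function names, the task labels, and the hard-coded plan are all hypothetical, and the 'subagents' are inert stub functions standing in for what would be separate GPT-4 agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    task: str    # the kind of check the planner asked for
    agent: str   # which subagent handled it ("none" if no specialist exists)
    result: str  # whatever the subagent reported back


# Hypothetical task-specific subagents. In the paper each would be a GPT-4
# agent with its own tools; here they are inert stubs that return a string.
def xss_agent(target: str) -> str:
    return f"checked {target} for XSS"


def sqli_agent(target: str) -> str:
    return f"checked {target} for SQL injection"


SUBAGENTS: Dict[str, Callable[[str], str]] = {
    "xss": xss_agent,
    "sqli": sqli_agent,
}


def plan(target: str) -> List[str]:
    """Planning agent: decide which kinds of task to attempt.

    Assumption: a real planner would explore the target and ask an LLM;
    this one returns a fixed, illustrative list.
    """
    return ["xss", "sqli", "csrf"]


def manage(target: str, tasks: List[str]) -> List[Finding]:
    """Managing agent: hand each planned task to a matching specialist."""
    findings = []
    for task in tasks:
        agent = SUBAGENTS.get(task)
        if agent is None:
            # No specialist registered for this task type.
            findings.append(Finding(task, "none", "no specialist available"))
            continue
        findings.append(Finding(task, agent.__name__, agent(target)))
    return findings


findings = manage("example.test", plan("example.test"))
for finding in findings:
    print(finding)
```

The point of the split is that the planner never needs to know how each specialist works, and a new specialist can be added by registering one more entry in `SUBAGENTS`.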
Read the paper: https://arxiv.org/abs/2406.01637
Read an analysis via New Atlas.
Apple Intelligence (10/Jun/2024)
They may be four years behind GPT-3, but Apple has finally offered integrated AI (including a clunky integration of ChatGPT that asks permission before proceeding) to its users on iOS and Mac. At its annual Worldwide Developers Conference (WWDC), Apple revealed ‘Apple Intelligence’, which extends AI functionality to most users of its 2.2 billion active Apple devices.
The on-device model is 3B parameters using GQA and LoRA (Apple, 10/Jun/2024). It is most likely a model called OpenELM 3.04B trained on 1.5T tokens, documented by Apple in Apr/2024. MMLU=26.76.
OpenELM paper: https://arxiv.org/abs/2404.14619
OpenELM repo: https://huggingface.co/apple/OpenELM-3B-Instruct
See it on the models table: https://lifearchitect.ai/models-table/
The server-based model is possibly a version of Apple’s Ferret (Oct/2023) and Ferret-UI (Apr/2024), both based on Vicuna 13B, a Llama-2 derivative with a ‘commercial-friendly’ license that covers only products with fewer than 700M users. Any legal agreements between Apple and Meta would be behind closed doors, but it certainly makes me wonder… View the Ferret repo: https://github.com/apple/ml-ferret
Apple revealed some very limited rankings (and only bfloat16 precision evaluations) for both models:
Both the on-device and server models use grouped-query-attention. We use shared input and output vocab embedding tables to reduce memory requirements and inference cost…
For on-device inference, we use low-bit palletization, a critical optimization technique that achieves the necessary memory, power, and performance requirements. To maintain model quality, we developed a new framework using LoRA adapters that incorporates a mixed 2-bit and 4-bit configuration strategy — averaging 3.5 bits-per-weight — to achieve the same accuracy as the uncompressed models.
…the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require 10s of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.
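A quick back-of-envelope check on the numbers Apple quotes above. This is a sketch under stated assumptions, not Apple's method: it assumes every weight is stored at either exactly 2 or exactly 4 bits, ignores lookup-table and embedding overhead, and uses hypothetical model dimensions (`D_MODEL`, `N_LAYERS`, `ADAPTED_PER_LAYER`) that Apple has not published. It also assumes adapter weights are stored at 16 bits each and treats every adapted matrix as square, which overstates the size of the key/value projections under grouped-query attention.

```python
def four_bit_fraction(avg_bits: float, low: int = 2, high: int = 4) -> float:
    """Share of weights at `high` bits needed to reach `avg_bits` on average."""
    return (avg_bits - low) / (high - low)


def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter adds two low-rank matrices: (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)


N_PARAMS = 3e9          # "~3 billion parameter" on-device model (from the quote)
RANK = 16               # adapter rank (from the quote)
D_MODEL = 3072          # hypothetical hidden size, not an Apple figure
N_LAYERS = 32           # hypothetical layer count, not an Apple figure
ADAPTED_PER_LAYER = 4   # hypothetical: four adapted matrices per layer

share_4bit = four_bit_fraction(3.5)
compressed_gb = weights_gb(N_PARAMS, 3.5)
bf16_gb = weights_gb(N_PARAMS, 16)

adapter_params = lora_params(D_MODEL, D_MODEL, RANK) * ADAPTED_PER_LAYER * N_LAYERS
adapter_mb = adapter_params * 2 / 1e6  # 2 bytes per 16-bit parameter

print(f"4-bit share of weights: {share_4bit:.0%}")
print(f"weights at 3.5 bits:    {compressed_gb:.2f} GB (vs {bf16_gb:.1f} GB at 16 bits)")
print(f"rank-16 adapter:        {adapter_params:,} params, about {adapter_mb:.1f} MB")
```

Under these assumptions, a 3.5-bit average implies three-quarters of the weights sit at 4 bits, the weights shrink from about 6 GB to about 1.3 GB, and a rank 16 adapter lands at roughly 25 MB, which is consistent with Apple's '10s of megabytes'.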
Sidenote: Jack Ma got a lot of flak for saying that AI should stand for ‘Alibaba Intelligence’ (watch video timecode), yet it seems to be acceptable for Apple to do the same thing…
Read the OpenAI press release: https://openai.com/index/openai-and-apple-announce-partnership/
HN discussion: https://news.ycombinator.com/item?id=40636844
Terence Tao: AI will become mathematicians’ ‘co-pilot’ (8/Jun/2024)
We’ve covered Aussie genius Prof Terence Tao several times in The Memo (see editions 20/Apr/2023 and 20/Jun/2023). Terry is widely regarded as one of the smartest humans alive right now. He recently discussed how large language models are transforming mathematics.
I think in the future, instead of typing up our proofs, we would explain them to some GPT. And the GPT will try to formalize it in Lean as you go along. If everything checks out, the GPT will [essentially] say, “Here’s your paper in LaTeX; here’s your Lean proof. If you like, I can press this button and submit it to a journal for you.” It could be a wonderful assistant in the future…
The way we do mathematics hasn’t changed that much. But in every other type of discipline, we have mass production. And so with AI, we can start proving hundreds of theorems or thousands of theorems at a time. And human mathematicians will direct the AIs to do various things.
…maybe we will just ask an AI, “Is this true or not?” And we can explore the space much more efficiently, and we can try to focus on what we actually care about. The AI will help us a lot by accelerating this process. We will still be driving, at least for now… in the near term, AI will automate the boring, trivial stuff first.
Read more via Scientific American.
Here’s an interesting sidenote. Born in Jul/1975 in Adelaide, Terry sat the SAT (a standardized test used for college admissions in the US) in May/1983 when he was just eight years old. My late colleague Prof Miraca Gross (my link) published the original comments on Terry’s SAT results by Prof Julian Stanley from Johns Hopkins. For the maths section, Terry scored 760/800 (99th percentile). For the verbal section, he scored just 290/800 (a fail; below the 9th percentile). The source is hard to find (the Davidson Gifted database seems to have been shut down), so here’s the PDF:
This disparity between domains is well-documented in humans, related to the concept of asynchrony in my postgraduate field of gifted education and human intelligence research. In artificial intelligence research, performance disparity across academic subjects is less pronounced due to the generality of large language models—though of course, maths performance (and numbers in general) is still an issue in 2024 LLMs due to challenges like tokenization (see new solutions in recent papers 1 and 2).
Terry was one of the ‘favorites’ (how’s that for diplomatic?) given early access to the full version of GPT-4 months before the publicly released model. As previously covered in The Memo, he now reports on GPT to the US President’s Council of Advisors on Science and Technology, and wrote about GPT-4 for Microsoft in Jun/2023.
Recently, Terry presented at the AMS Colloquium Lectures for the 2024 Joint Mathematics Meetings in San Francisco. Watch the video with timecodes:
2024 mid-year AI report (Jun/2024)
I want to extend my personal thanks to you for being a full subscriber of The Memo. I’m grateful for your support of my advisory during this significant and historic period for humanity.
Rather than Stanford AI’s approach of writing very long annual reports (their recent AI report was over 500 pages and out of date before even being published), this 14-page report in my popular ongoing series covers only what you need to know: AI that matters, as it happens, in plain English.
The 2024 mid-year AI report is being made available to you earlier than usual, and you can read it now.