The Memo - 10/May/2023
AI hitting the 100% ceiling in tests like ToM, plus Inflection Pi, NVIDIA GPT-2B-001, and much more!
FOR IMMEDIATE RELEASE: 10/May/2023
Welcome back to The Memo.
In terms of AI releases, March and April 2023 were absurd. I counted 11 major language models announced in March, and 6 major models in April. I think we’re entering a more balanced cadence now, despite already having 4 major models announced in the first week of May, and with Google set to release PaLM 2 within the next 24 hours (watch live here).
In the Policy section, we look at news from the US, and a report on ChatGPT in education.
In the Toys to play with section, we look at Transformify.ai’s fantastic Zapier-like automation (including a code for The Memo subscribers), a GPT-4 escape room, new LLM integrations from Box and Slack, prompt crafting lessons by Microsoft, and advanced use cases for GPT-4 by power users…
The BIG Stuff
Exclusive: Microsoft’s Chief Economist suggests pause on AI regulation (3/May/2023)
Speaking to the World Economic Forum in Geneva, Microsoft’s Corporate Vice President and Chief Economist Dr Michael A. Schwarz (wiki) said:
What should be our philosophy about regulating AI? Clearly, we have to regulate it, and I think my philosophy there is very simple. We should regulate AI in a way where we don’t throw away the baby with the bathwater.
So, I think that regulation should be based not on abstract principles. As an Economist, I like efficiency, so first, we shouldn’t regulate AI until we see some meaningful harm that is actually happening — not imaginary scenarios.
The first time we started requiring driver’s licenses, it was after many dozens of people died in car accidents, right, and that was the right thing. If we had required driver’s licenses when there were the first two cars on the road, that would have been a big mistake. We would have completely screwed up that regulation.
There has to be at least a little bit of harm, so that we see what is the real problem. What is the real problem? Did anybody suffer at least a thousand dollars because of that?
Should we jump to regulate something on a planet of eight billion people where there is not even a thousand dollars of damage? Of course not!
So, once we see real harm, then we have to ask ourselves a simple question, ‘could we regulate it in a way where the good things that will be prevented by this regulation are less important and less valuable than the harm that we prevent?’
You don’t put regulation in place to prevent a thousand dollars worth of harm where the same regulation prevents a million dollars worth of benefit to people around the world.
I agree with most of Dr Schwarz’s points here, and he brings some much-needed perspective to AI regulation, in contrast to the headless chickens running around panicking about our new discovery of fire (someone might burn us), or electricity (we might get electrocuted), or the Internet (it’s new and scary).
Hear Dr Schwarz’s words in the WEF video at timecode 45m55s:
Inflection Pi chatbot (3/May/2023)
I’ve been waiting for Inflection (founded by former members of DeepMind) to release their model for many months, and the chatbot version is finally here! Based on my testing, I estimate that it has between 60B and 100B parameters (Chinchilla scale), making it bigger than GPT-3 but not as big as GPT-4. It is designed to emulate a conversational companion, so it is not useful for task-based work in the way ChatGPT is. The platform includes text-to-speech, and you can choose from four different voices.
“There’s lots of things Pi cannot do. It doesn’t do lists, or coding, it doesn’t do travel plans, it won’t write your marketing strategy, or your essay for school,” [Mustafa Suleyman, one of the three co-founders of DeepMind] said in an interview with the Financial Times. “It’s purely designed for relaxed, supportive, informative conversation.”
…To keep up with its well-funded rivals, Inflection has hired AI experts from several competitors, including OpenAI, DeepMind and Google, who have previously helped build some of the world’s most powerful language models. Earlier this year, the company was in discussions to raise up to $675mn from investors. (-via The Fin)
Inflection will offer Pi for free for now, with no token restrictions. (Asked how it will charge users, and when, the company declined to comment.) Built on one of Inflection’s in-house large language models, Pi doesn’t use the company’s most advanced ones, which remain unreleased, according to Suleyman; Inflection already runs one of the world’s largest and best-performing models, he added, without providing specifics. Like OpenAI, Inflection uses Microsoft Azure for its cloud infrastructure. (-via Forbes)
Try it here (free, no login): https://heypi.com/talk
GPT-4 reaches 100% in Theory-of-mind (ToM) testing (26/Apr/2023)
GPT-4 has achieved 100% in ‘Theory of mind’ tasks (wiki). In my work with gifted children, when a student hits 100% on a test, this is a ‘very bad thing’. It means the test was designed poorly, and it was perhaps a waste of time testing that student with that instrument, as they may have been able to score (the equivalent of) 101% or 10,000%… but we’d never know because the test wasn’t comprehensive enough.
We are now seeing large language models outperforming peak human abilities in some areas. For the first time, we are also reaching the ceilings on some benchmarks. This is concerning given that (eventually) we may not be ‘smart enough’ to continue creating new tests for AI. We’re already hitting the ceiling of our human capabilities!
Note that ‘CoT’ means ‘chain-of-thought’ (like ‘show your working’ in maths), and ‘SS thinking’ means instructing the model to use ‘step-by-step’ thinking.
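As a rough illustration of what ‘step-by-step’ prompting looks like in practice, here is a minimal sketch using the OpenAI Python library (2023-era v0.x API); the model name, question, and phrasing are my own illustrative choices, not the paper’s exact protocol:

```python
# Minimal sketch of step-by-step (chain-of-thought) prompting.
# Illustrative only: the paper's prompts and models differ.
# Requires: pip install openai, with OPENAI_API_KEY set in the env.
import openai

# A classic Sally-Anne style theory-of-mind question.
question = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is away, Anne moves the ball into a box. "
    "Where will Sally look for her ball when she returns?"
)

# Ask plainly, then ask again with a step-by-step instruction.
for prompt in (question, question + " Let's think step by step."):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response["choices"][0]["message"]["content"])
    print("---")
```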
Read the paper via Johns Hopkins: https://arxiv.org/abs/2304.11490
Watch my video:
The Interesting Stuff
GPT-4 IQ = 152 (15/Mar/2023)
Dr David Rozado is an Associate Professor at Otago Polytechnic, and formerly at CSIRO and Max Planck. His testing of GPT models using verbal-linguistic IQ tests is useful in giving us a real indication of artificial intelligence progress.
IQ
- GPT-4 = 152 (99.97th percentile)
- ChatGPT = 147 (99.91st percentile)
- Mensa minimum = 130 (98th percentile)
- Human average = 100 (50th percentile)
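These percentiles follow directly from the standard IQ scale (a normal curve with mean 100 and SD 15). A quick sketch of the conversion:

```python
# Map IQ scores to percentiles on the standard IQ scale
# (normal distribution: mean 100, standard deviation 15).
import math

def iq_percentile(iq: float, mean: float = 100.0, sd: float = 15.0) -> float:
    z = (iq - mean) / sd
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

for label, iq in [("GPT-4", 152), ("ChatGPT", 147), ("Mensa minimum", 130)]:
    print(f"{label}: IQ {iq} = {iq_percentile(iq):.2f}th percentile")
```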
Read more about IQ on my ‘visualising brightness’ page.
See the latest benchmarks on my AI + IQ testing (human vs AI) page.
Google’s Luke Sernau and the ‘We have no moat’ document (Apr/2023)
In early April 2023, Google engineer Luke Sernau published a document on Google’s internal system. The document was recently ‘leaked’, though please note that some think this was done intentionally by Google PR to redirect attention.
Highlights:
[Google isn’t] positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch. I’m talking, of course, about open source. Plainly put, they are lapping us.
Giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. [Alan: This is a misdirection at best. Giant models will continue to run the giant/corporate world.]
At the beginning of March [2023] the open source community got their hands on their first really capable foundation model, as Meta’s LLaMA was leaked to the public. It had no instruction or conversation tuning, and no RLHF. Nonetheless, the community immediately understood the significance of what they had been given.
…the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet's worth of free labor.
The more tightly we control our models, the more attractive we make open alternatives. Google and OpenAI have both gravitated defensively toward release patterns… Anyone seeking to use LLMs for unsanctioned purposes can simply take their pick of the freely available models.
…in the end, OpenAI doesn’t matter. They are making the same mistakes we are in their posture relative to open source, and their ability to maintain an edge is necessarily in question. Open source alternatives can and will eventually eclipse them unless they change their stance.
Read the full thesis on GitHub.
Less publishing, less openness: Google follows DeepMind and OpenAI in restricting publishing (4/May/2023)
I’ve previously expressed my disappointment in the decision by several AI labs to stop publishing their research. The 2017-2022 openness of information in the AI field was a real boon for everyone. Now in 2023, this has come to an end, with Google joining DeepMind and OpenAI in restricting publication of research papers.
"We're not in the business of just publishing everything anymore," one Google Brain staffer described as the message from upper management.
Read more via BusinessInsider.
Datasets: Reddit bans Pushshift from data scraping (1/May/2023)
Pushshift is a data project started and maintained by Jason Baumgartner. It powers the collection of data (mainly web links and comments) from Reddit, which is used as a proxy for ‘popular’ web content in datasets like WebText and its successors (used for GPT-2, GPT-3, The Pile, and many other datasets). You can read more about this in my What’s in my AI? paper.
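For a sense of how researchers used it, here is a sketch of a typical query against the public Pushshift endpoint (endpoint and parameters per Pushshift’s public documentation; it may no longer respond now that Reddit has banned the service):

```python
# Sketch of a typical Pushshift query for Reddit submissions.
# Endpoint/parameters per Pushshift's public docs; may no longer
# work following Reddit's ban.
import requests

resp = requests.get(
    "https://api.pushshift.io/reddit/search/submission/",
    params={"subreddit": "MachineLearning", "q": "GPT-4", "size": 5},
    timeout=30,
)
for post in resp.json().get("data", []):
    print(post.get("created_utc"), post.get("title"))
```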
Reddit recently changed its terms and conditions, and has now banned Pushshift. The author noted:
The Reddit Dataset paper [May/2020] that I co-authored has been cited a whopping 630 times and it constantly grows. I don't think Reddit fully understands just how much Pushshift is used in research and the academic world -- but when we speak to the admins sometime this week, we'll try and make a strong case to keep as much functionality as we can in the API.
Datasets are a significant part of the AI race, and this is an interesting move by Reddit. It echoes Elon Musk’s belligerent blocking of OpenAI’s access to the Twitter API in Dec/2022:
[Musk] had learned of a relationship between OpenAI, the start-up behind the popular chatbot ChatGPT, and Twitter, which he had bought in October for $44 billion. OpenAI was licensing Twitter’s data — a feed of every tweet — for about $2 million a year to help build ChatGPT, two people with knowledge of the matter said. Mr. Musk believed the A.I. start-up wasn’t paying Twitter enough, they said.
So Mr. Musk cut OpenAI off from Twitter’s data, they said. (-via NYT)
NVIDIA GPT-2B-001 (May/2023)
NVIDIA should probably have been on my AI race viz, but at the time of publication I had decided that they were more about tooling and less about model-building:
They’re still in the race, and are currently experimenting with training smaller models on enormous amounts of data. Their latest model is the successor to their Megatron-GPT 20B model (2022).
GPT-2B-001 (not related to GPT-2 or OpenAI) is only 2 billion parameters, but was trained on 1.1 trillion tokens. This makes it one of the most ‘data optimal’ models around (and potentially overtrained), though this is exactly what they’re testing. On my Chinchilla viz (Nov/2022), it would sit on the far right at 550:1 tokens to parameters.
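The 550:1 figure is simple division; here is a quick comparison against two reference models (parameter and token counts as reported in their papers):

```python
# Tokens-per-parameter ratios. Chinchilla's compute-optimal
# finding (Hoffmann et al., 2022) works out to roughly 20:1.
models = {
    # name: (parameters, training tokens)
    "GPT-3":      (175e9, 300e9),
    "Chinchilla": (70e9,  1.4e12),
    "GPT-2B-001": (2e9,   1.1e12),
}

for name, (params, tokens) in models.items():
    print(f"{name}: {tokens / params:,.1f} tokens per parameter")
```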
It will be interesting to see this progress, as it looks like we are heading towards models that can fit on one GPU for inference, but were trained on more than 3x the amount of data used for GPT-3.
View the model on Hugging Face: https://huggingface.co/nvidia/GPT-2B-001
View tokens:parameters data on my table, showing comparisons between major models.
MosaicML MPT-7B-Instruct (5/May/2023)
Introducing MPT-7B, the latest entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k… we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens!
65,000 tokens is about 48,750 words (1 token ≈ 0.75 words). Remember that the largest version of GPT-4 caps out at ‘only’ 32,000 tokens, or 24,000 words.
So, MPT-7B allows you to generate pretty close to a standard 50,000-word book, OR feed it a 50,000-word book and ask for a one-page summary.
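The conversion is easy to sanity-check:

```python
# Rough context-window sizes in words (1 token ≈ 0.75 English words).
for name, tokens in [("MPT-7B-StoryWriter-65k+", 65_000), ("GPT-4 32k", 32_000)]:
    print(f"{name}: {tokens:,} tokens ≈ {int(tokens * 0.75):,} words")
```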
Imagine the possibilities of instantly having a book turned into a screenplay ‘in the style of Aaron Sorkin’ or ‘by Trey Parker and Matt Stone’!
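If you want to experiment with these models yourself, here is a minimal loading sketch, assuming the standard Hugging Face transformers path documented by MosaicML (the prompt and generation settings are my own illustrative choices):

```python
# Sketch: load MPT-7B-Instruct from the Hugging Face Hub.
# trust_remote_code is needed because MPT ships custom model code;
# MPT reuses the EleutherAI GPT-NeoX-20B tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer(
    "Summarise the following chapter in one page:\n...", return_tensors="pt"
)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```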