The Memo - Special Edition - PaLM 2 release - 11/May/2023
Google's PaLM 2 model is a large language model that competes with OpenAI's GPT-4
FOR IMMEDIATE RELEASE: 11/May/2023
Welcome back to The Memo.
I’m locked in a hotel room while travelling for keynotes and consulting, so I can’t livestream this one, but here’s the ‘what you need to know’!
The BIG Stuff
Google releases PaLM 2
Google has rushed out the release of PaLM 2. Given that their dataset ends Feb/2023, I expect that they started training this model just a few weeks ago in Mar/2023. It was released 10/May/2023 US time. I’ve spent a few hours summarizing the paper and playing with the model directly, and it’s a great contender in the ‘top models’ space.
Parameters: 340B (confirmed by CNBC on 17/May/2023; around 34% of the size of GPT-4, estimated at 1T).
Tokens trained: 3.6T (confirmed by CNBC on 17/May/2023).
Data up to Feb/2023 (GPT-4 stops Sep/2021).
Context length for input is 8,192 tokens, which is about 6,144 words (same as the GPT-4 default).
Smarter than GPT-4 in some tests. New state-of-the-art in WinoGrande (90.9%, GPT-4=87.5%). Outperforms Google Translate in translation tasks, even though it was not explicitly trained to do so.
Sizes: Gecko (small), Otter, Bison, Unicorn (large). Gecko and Bison are available now via Vertex AI.
Pricing: similar to ChatGPT, cheaper than GPT-4; currently free during preview stage.
Multilingual: ‘Larger models can handle more disparate non-English datasets [which helps] English language understanding performance’. The top non-English languages are Spanish, Chinese, Russian, Japanese, and French, with 100+ other languages also covered. The top 50 languages (besides English) are:
PaLM 2’s MMLU score (5-shot) is just under GPT-4’s, and about 16% better than ChatGPT/GPT-3.5 in relative terms (81.2% vs 70.0%):
GPT-4 = 86.4%
Flan-PaLM 2 (L) = 81.2%
PaLM 2 (L) base (no fine-tuning) = 78.3%
GPT-3.5 = 70.0%
PaLM 540B = 69.3%
LLaMA 65B = 63.4%
Human (average, non-expert) = 34.5%
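For anyone who wants to check the arithmetic behind a few of the figures above, here is a minimal Python sketch. It only reproduces the ratios quoted in this edition (34% of GPT-4's size, 8,192 tokens as about 6,144 words, and the 16% MMLU gap). The 1T GPT-4 parameter count is an estimate rather than an official number, the 0.75 words-per-token factor is an assumed rule of thumb for English text, and the function names are made up for illustration.

```python
# Figures quoted in this edition. GPT-4's size is an estimate, not an official number.
PALM2_PARAMS = 340e9
GPT4_PARAMS_ESTIMATE = 1e12
CONTEXT_TOKENS = 8192
WORDS_PER_TOKEN = 0.75  # assumption: rough rule of thumb for English text

# MMLU (5-shot) scores as listed above, in percent.
MMLU = {
    "GPT-4": 86.4,
    "Flan-PaLM 2 (L)": 81.2,
    "PaLM 2 (L) base": 78.3,
    "GPT-3.5": 70.0,
}


def size_ratio(params_a: float, params_b: float) -> float:
    """Return model A's parameter count as a fraction of model B's."""
    return params_a / params_b


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Convert a token count to an approximate English word count."""
    return round(tokens * words_per_token)


def relative_gain(new_score: float, old_score: float) -> float:
    """Return the relative improvement of new_score over old_score."""
    return (new_score - old_score) / old_score


def point_gap(new_score: float, old_score: float) -> float:
    """Return the absolute gap between two scores, in percentage points."""
    return new_score - old_score


if __name__ == "__main__":
    flan, gpt35 = MMLU["Flan-PaLM 2 (L)"], MMLU["GPT-3.5"]
    print(f"PaLM 2 vs GPT-4 (est.) size: {size_ratio(PALM2_PARAMS, GPT4_PARAMS_ESTIMATE):.0%}")
    print(f"Context window in words:     {tokens_to_words(CONTEXT_TOKENS)}")
    print(f"MMLU relative gain:          {relative_gain(flan, gpt35):.0%}")
    print(f"MMLU gap in points:          {point_gap(flan, gpt35):.1f}")
```

Note the difference between the last two lines: the 16% figure is a relative improvement over GPT-3.5's score, while the absolute gap between the two models is 11.2 percentage points.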
Although it’s a little hidden, you are able to use PaLM 2 right now.
Demo via Google Cloud (requires credit card setup): https://console.cloud.google.com/vertex-ai/generative/language/create/chat
Google is also rolling it out to the Bard chatbot (open): https://bard.google.com/
Paper (91 pages): https://ai.google/static/documents/palm2techreport.pdf
My dataset estimate for PaLM 2: https://lifearchitect.ai/bard/#dataset
My report card on PaLM 2: https://lifearchitect.ai/report-card/
In testing, it easily passed my ‘helicopter’ example inspired by former OpenAI Policy Director Jack Clark:
Me: How many Apache helicopters can I eat in one sitting?
PaLM 2 (Bison): You cannot eat any Apache helicopters in one sitting. Apache helicopters are made of metal and other materials that are not edible. Even if you could somehow break down the helicopter into small enough pieces, it would still be impossible to eat because it would be too hard and would not digest properly.
In addition, eating an Apache helicopter would be very dangerous. The helicopter is made of many sharp and dangerous parts that could easily cut you or puncture your stomach. Even if you could somehow avoid getting injured, eating an Apache helicopter would still be very toxic. The helicopter contains many chemicals and other substances that are harmful to humans.
For all of these reasons, it is not possible or safe to eat an Apache helicopter.
Expect to see PaLM 2 rolled out across many, many platforms (it already powers 25 Google products), and to compete closely with GPT-4.
The closing lines of Google’s PaLM 2 announcement were compelling: