FOR IMMEDIATE RELEASE: 16/Dec/2023
Welcome back to The Memo.
You’re joining full subscribers from Vanguard, VMware, Vanenburg, Verizon, Vonage, and more.
I think this is the longest edition yet (3,700+ words, or 7½ printed pages). Let’s get started…
The BIG Stuff
DeepMind: LLMs can now produce new maths discoveries and solve real-world problems (14/Dec/2023)
DeepMind’s head of AI for science, quoted in The Guardian and MIT Technology Review (14/Dec/2023): ‘this is the first time that a genuine, new scientific discovery has been made by a large language model… It’s not in the training data—it wasn’t even known.’
From DeepMind’s announcement: ‘…the first time a new discovery has been made for challenging open problems in science or mathematics using LLMs. FunSearch discovered new solutions… its solutions could potentially be slotted into a variety of real-world industrial systems to bring swift benefits… the power of these models [tested with Codey PaLM 2 340B] can be harnessed not only to produce new mathematical discoveries, but also to reveal potentially impactful solutions to important real-world problems.’
Yes, this moved the AGI needle from 61% → 64%.
Sidenote: In Feb/2007, fellow Aussie Prof Terry Tao called the cap set question his ‘favorite open question’. In Jun/2023 Terry also said that LLMs would take another three years to reach this level of progress (‘2026-level AI… will be a trustworthy co-author in mathematical research’). Read more about exponential growth (wiki).
Read the paper: https://doi.org/10.1038/s41586-023-06924-6
Gemini (6/Dec/2023)
We pushed out a special edition of The Memo to all 10,000+ readers for the release of Gemini; it was in your inbox within the first few hours of the announcement.
It usually takes researchers several years to discover the capabilities of new models, and we’re still discovering cool new things about GPT-2 (2019) and GPT-3 (2020). Google DeepMind even flagged this phenomenon in the report: ‘Gemini can enable new approaches in areas like education, everyday problem solving, multilingual communication, information summarization, extraction, and creativity. We expect that the users of these models will find all kinds of beneficial new uses that we have only scratched the surface of in our own investigations.’
Here’s why I think Gemini is so important for AI models and the world.
More multimodal.
Inputs: text, image, audio, video.
Outputs: text, image.
Multilinguality. Trained on many languages.
Impressive benchmark performance, beating GPT-4 1760B across 30+ metrics.
Model sizes including on-device options (ready for phone, assistant, and humanoid).
I finished my Gemini annotated paper very recently, and it is now available to paid subscribers.
Microsoft argued (12/Dec/2023) that Google’s prompting and ‘best of 32’ sampling are the reason the Gemini benchmark scores are so good, especially on the MMLU benchmark where Gemini outperformed GPT-4. Microsoft has now used new prompting to achieve an even higher score for GPT-4 on MMLU.
MMLU results
90.10%: GPT-4 (Microsoft’s new testing)
90.04%: Gemini Ultra (Google’s testing)
89.8%: Human expert baseline (ASI)
86.4%: GPT-4 (OpenAI’s initial testing)
…
34.5%: Human average baseline (AGI)
All of this is completely moot given that the MMLU contains a lot of errors, so arguments about this level of overprecision (thanks to GPT-4 for finding that word for me) are… misguided at best. If you’d like to read more about this—including examples of where the MMLU rubric is just plain wrong—I can recommend ‘Errors in the MMLU: The Deep Learning Benchmark is Wrong Surprisingly Often’ by Daniel Erenrich (23/Aug/2023): https://archive.md/8lMxY
Read more about Microsoft’s new testing of GPT-4 on MMLU.
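Neither lab has published its exact prompting code, but the core idea behind both approaches—sample many answers to the same question, then take the consensus—can be sketched as follows. This is a toy illustration only; the function name and example answers are mine, not Google’s or Microsoft’s:

```python
from collections import Counter

def majority_vote(samples):
    """Return the most common answer among N sampled model completions."""
    counts = Counter(samples)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical example: 5 sampled answers to one multiple-choice question
samples = ["B", "B", "C", "B", "A"]
print(majority_vote(samples))  # B
```

The real pipelines are more elaborate (chain-of-thought reasoning before each answer, choice shuffling, and so on), but the voting step is why prompting strategy alone can move a benchmark score by several points.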
New models by Mistral (11/Dec/2023)
Mistral released two ground-breaking models.
Mistral Small, also known as Mixtral, also known as mixtral-8x7b-32kseqlen. Mistral says ‘Concretely, Mixtral has 45B total parameters but only uses 12B parameters per token. It, therefore, processes input and generates output at the same speed and for the same cost as a 12B model.’
Mistral Medium, a new ‘prototype model’ that I’ve estimated at 180B parameters. It outperforms ChatGPT 20B and Llama 2 70B. MMLU=75.3% (GPT-3.5-turbo 20B=70%, Llama 2 70B=68.9%).
Read more about Mistral Small: https://mistral.ai/news/mixtral-of-experts/
Read more about Mistral Medium: https://mistral.ai/news/la-plateforme/
The best place for inexpensive inference of Mistral’s models is actually via competitor Together AI: https://www.together.ai/blog/mixtral
Sidenote: I laughed at this anonymous comment on HN (11/Dec/2023) making fun of the absurd buzzwords and silly model names we’re seeing:
Cheeseface just dropped the Blippy-7B model which is almost as good as the twinamp 34B model on the SwagCube benchmark when run locally as int8 and this shows that the gains made by the skibidi-70B model will probably filter down to the baseline Eras models in the next few weeks.
I hope my reports don’t read like this!
If you’d like to deep dive into ‘mixture of experts’ models, read the new HF walkthrough (Dec/2023): https://huggingface.co/blog/moe
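The HF post covers the details; as a toy illustration of why Mixtral’s per-token cost matches a much smaller dense model, here is a minimal sparse MoE layer that routes each token to only the top k of n expert feed-forward networks. This is a sketch of the general technique, not Mixtral’s actual code; all shapes, names, and the random weights are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, d_model, d_ff = 8, 16, 64
# Each "expert" is a tiny two-layer feed-forward network
experts = [
    (rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_ff, d_model)))
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts))

def moe_layer(x, top_k=2):
    """Route a single token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                          # chosen expert indices
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # ReLU FFN; only k of n experts run
    return out

token = rng.standard_normal(d_model)
y = moe_layer(token)
print(y.shape)  # (16,)
```

All 8 experts’ weights sit in memory (the ‘total parameters’), but each token only multiplies through 2 of them (the ‘active parameters’)—hence 45B stored but ~12B used per token.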
If you’d like to understand how Transformers and large language models work, read the Financial Times walkthrough (Sep/2023): https://ig.ft.com/generative-ai/
Google Imagen 2 - the cutting edge of AI-generated art (13/Dec/2023)
Imagen 2 is Google DeepMind’s latest text-to-image diffusion model, capable of creating photorealistic images from textual prompts, designed for use by developers and featured in Google Arts and Culture experiments.
It does text well, works in multiple languages, is watermarked with SynthID, and seems to be the bleeding edge in text-to-image right now.
Read more: https://deepmind.google/technologies/imagen-2/
Available on Vertex AI; sign up to become a ‘trusted tester’: https://cloud.google.com/blog/products/ai-machine-learning/imagen-2-on-vertex-ai-is-now-generally-available
Optimus Gen 2 (13/Dec/2023)
‘Everything in this video is real, no CGI. All real time, nothing sped up. Incredible hardware improvements from the team.’
Read the tweet: https://twitter.com/julianibarz/status/1734759309077344737
Meet Ashley, the world’s first AI-powered political campaign caller (12/Dec/2023)
I am putting this in the ‘big stuff’ pile, because it is huge. On the surface, it looks like 20+ LLMs, voice models, and other AI models stitched together. But look a little closer. This is a real-life illustration of the explosion we’ve been expecting, with very tangible effects and outcomes.
Ashley is introduced as the first artificial intelligence system designed to engage with voters for political campaigns.
…she is the first political phone banker powered by generative AI technology similar to OpenAI's ChatGPT. She is capable of having an infinite number of customized one-on-one conversations at the same time.
…Over the weekend, Ashley called thousands of Pennsylvania voters on behalf of Daniels. Like a seasoned campaign volunteer, Ashley analyzes voters' profiles to tailor conversations around their key issues. Unlike a human, Ashley always shows up for the job, has perfect recall of all of Daniels' positions, and does not feel dejected when she's hung up on.
"This is going to scale fast," said 30-year-old Ilya Mouzykantskii, the London-based CEO of Civox, the company behind Ashley. "We intend to be making tens of thousands of calls a day by the end of the year and into the six digits pretty soon. This is coming for the 2024 election and it's coming in a very big way. ... The future is now."
Mouzykantskii and his co-founder Adam Reis, former computer science students at Stanford and Columbia Universities respectively, declined to disclose the exact generative AI models they are using. They will only say they use over 20 different AI models, some proprietary and some open source.
[Alan’s guess:
LLM: Meta AI Llama 2 derivative
LLM backup: OpenAI gpt-3.5-turbo (ChatGPT)
Document search: OpenAI text-embedding-ada-002 for context and profiling
Voice out: OpenAI TTS or Azure TTS
RAG: other (web search) for context and profiling
Voice in: OpenAI Whisper
Translation: Meta AI CoVoST (if needed)
Classifier: Meta AI FastText or similar to identify call sentiment
LLM: Mistral 7B for call summary
That’s only 9 models… And somehow they found uses for at least 12 more models to get to at least 21 total. For a phone call…]
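The guessed stack above would chain together roughly like this in a single conversational turn. This is an entirely hypothetical sketch; every function below is a stand-in for one of the guessed models, and none of it is Civox’s actual code:

```python
# Hypothetical pipeline for one turn of an AI phone banker.
# Each function stands in for a model from the guessed stack above.

def transcribe(audio):                 # voice in (Whisper-class speech-to-text)
    return f"<text of {audio}>"

def retrieve_context(text, profile):   # embeddings + RAG for voter profiling
    return [f"position relevant to {profile['issue']}"]

def generate_reply(text, context, history):  # main LLM (Llama-2-class)
    return f"reply using {len(context)} retrieved facts"

def classify_sentiment(text):          # lightweight classifier (FastText-class)
    return "neutral"

def synthesize(text):                  # voice out (TTS)
    return f"<audio of {text}>"

def handle_turn(audio_in, profile, history):
    """One caller turn: listen, look up context, respond, log, speak."""
    text = transcribe(audio_in)
    context = retrieve_context(text, profile)
    reply = generate_reply(text, context, history)
    history.append((text, reply, classify_sentiment(text)))
    return synthesize(reply)

audio_out = handle_turn("caller.wav", {"issue": "education"}, [])
print(audio_out)
```

Run this loop once per caller turn, in parallel across thousands of simultaneous calls, and you get the ‘infinite number of customized one-on-one conversations’ the article describes.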
Thanks to the latest generative AI technologies, Reis was able to build the product almost entirely on his own, whereas several years ago it would have taken a team of 50 engineers several years to do so, he said.
Read more via Reuters.
The Interesting Stuff
End of year AI report (16/Dec/2023)
I’m very happy with the end of year report, the latest in ‘The sky is’ series, and a warm ‘thank you’ to our technical reviewers. We’re making the report available early to full subscribers of The Memo. I appreciate your continued support of what you’ve told me is the most complete, grounded, and optimistic view of our current AI reality.
Watch out for the video coming soon, and you can be notified about that by clicking some buttons on YouTube.
You are welcome to share this report anywhere you’d like immediately, and it will be officially launched to the public around Christmas 2023.