FOR IMMEDIATE RELEASE: 16/Dec/2023
Welcome back to The Memo.
You’re joining full subscribers from Vanguard, VMware, Vanenburg, Verizon, Vonage, and more.
I think this is the longest edition yet (3,700+ words, or 7½ printed pages). Let’s get started…
The BIG Stuff
DeepMind: LLMs can now produce new maths discoveries and solve real-world problems (14/Dec/2023)
DeepMind head of AI for science (14/Dec/2023 Guardian, MIT): ‘this is the first time that a genuine, new scientific discovery has been made by a large language model… It’s not in the training data—it wasn’t even known.’
the first time a new discovery has been made for challenging open problems in science or mathematics using LLMs. FunSearch discovered new solutions… its solutions could potentially be slotted into a variety of real-world industrial systems to bring swift benefits… the power of these models [tested with Codey PaLM 2 340B] can be harnessed not only to produce new mathematical discoveries, but also to reveal potentially impactful solutions to important real-world problems.
Yes, this moved the AGI needle from 61% → 64%.
Sidenote: In Feb/2007, fellow Aussie Prof Terry Tao called the cap set question his ‘favorite open question’. In Jun/2023 Terry also said that LLMs would take another three years to reach this level of progress (‘2026-level AI… will be a trustworthy co-author in mathematical research’). Read more about exponential growth (wiki).
Read the paper: https://doi.org/10.1038/s41586-023-06924-6
Gemini (6/Dec/2023)
We pushed out a special edition of The Memo to all 10,000+ readers for the release of Gemini; it was in your inbox within the first few hours of the announcement.
It usually takes researchers several years to discover the capabilities of new models, and we’re still discovering cool new things about GPT-2 (2019) and GPT-3 (2020). Google DeepMind even flagged this phenomenon in the report: ‘Gemini can enable new approaches in areas like education, everyday problem solving, multilingual communication, information summarization, extraction, and creativity. We expect that the users of these models will find all kinds of beneficial new uses that we have only scratched the surface of in our own investigations.’
Here’s why I think Gemini is so important for AI models and the world.
More multimodal.
Inputs: text, image, audio, video.
Outputs: text, image.
Multilinguality. Trained on many languages.
Impressive benchmark performance, beating GPT-4 1760B across 30+ metrics.
Model sizes including on-device options (ready for phone, assistant, and humanoid).
I finished my Gemini annotated paper very recently, and it is now available to paid subscribers.
Microsoft argued (12/Dec/2023) that Google’s prompting and ‘best of 32x’ is the reason the Gemini benchmark scores are so good, especially on the MMLU benchmark where Gemini outperformed GPT-4. Microsoft has now used new prompting to achieve an even higher score for GPT-4 on MMLU.
MMLU results
90.10%: GPT-4 (Microsoft’s new testing)
90.04%: Gemini Ultra (Google’s testing)
89.8%: Human expert baseline (ASI)
86.4%: GPT-4 (OpenAI’s initial testing)
…
34.5%: Human average baseline (AGI)
All of this is completely moot given that the MMLU contains a lot of errors, so arguments about this level of overprecision (thanks to GPT-4 for finding that word for me) are… misguided at best. If you’d like to read more about this—including examples of where the MMLU rubric is just plain wrong—I can recommend ‘Errors in the MMLU: The Deep Learning Benchmark is Wrong Surprisingly Often’ by Daniel Erenrich (23/Aug/2023): https://archive.md/8lMxY
Read more about Microsoft’s new testing of GPT-4 on MMLU.
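For readers wondering what a 'best of 32' style score looks like mechanically, here is a minimal sketch of sample-and-vote prompting: ask the model the same multiple-choice question many times, take the majority answer, and fall back to a single greedy answer when the samples disagree too much. This is an illustration only, not Google's or Microsoft's evaluation code; the function names, the 0.6 agreement threshold, and the `sample`/`greedy` callables are all hypothetical stand-ins for real model calls.

```python
from collections import Counter
from typing import Callable, List, Tuple


def majority_vote(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the share of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    best, count = Counter(answers).most_common(1)[0]
    return best, count / len(answers)


def answer_with_voting(
    sample: Callable[[], str],   # hypothetical: one sampled model answer, e.g. "B"
    greedy: Callable[[], str],   # hypothetical: one deterministic model answer
    k: int = 32,                 # number of samples to draw
    threshold: float = 0.6,      # assumed agreement cut-off, chosen for illustration
) -> str:
    """Draw k samples and vote; use the greedy answer if agreement is too low."""
    answers = [sample() for _ in range(k)]
    best, share = majority_vote(answers)
    return best if share >= threshold else greedy()
```

The point of the sketch is that the same underlying model can post noticeably different benchmark numbers depending on how many samples are drawn and how they are combined, which is why prompting method matters when comparing headline MMLU scores.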
New models by Mistral (11/Dec/2023)
Mistral released two ground-breaking models.
Mistral Small, also known as Mixtral, also known as mixtral-8x7b-32kseqlen. Mistral says ‘Concretely, Mixtral has 45B total parameters but only uses 12B parameters per token. It, therefore, processes input and generates output at the same speed and for the same cost as a 12B model.’
Mistral Medium, a new ‘prototype model’ that I’ve estimated at 180B parameters. It outperforms ChatGPT 20B and Llama 2 70B. MMLU=75.3% (GPT-3.5-turbo 20B=70%, Llama 2 70B=68.9%).
Read more about Mistral Small: https://mistral.ai/news/mixtral-of-experts/
Read more about Mistral Medium: https://mistral.ai/news/la-plateforme/
The best place for inexpensive inference of Mistral’s models is actually via competitor Together AI: https://www.together.ai/blog/mixtral
Sidenote: I laughed at this anonymous comment on HN (11/Dec/2023) making fun of the absurd buzzwords and silly model names we’re seeing:
Cheeseface just dropped the Blippy-7B model which is almost as good as the twinamp 34B model on the SwagCube benchmark when run locally as int8 and this shows that the gains made by the skibidi-70B model will probably filter down to the baseline Eras models in the next few weeks.
I hope my reports don’t read like this!
If you’d like to deep dive into ‘mixture of experts’ models, read the new HF walkthrough (Dec/2023): https://huggingface.co/blog/moe
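To make the '45B total but 12B per token' idea concrete, here is a toy sketch of sparse mixture-of-experts routing. Mistral's announcement describes a router that sends each token to 2 of 8 experts; everything else below (the function names, the scalar 'experts', and the back-of-envelope parameter split) is a hypothetical simplification for illustration, not Mistral's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_gate(logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)            # subtract the max for stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(
    x: float,
    logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Weighted sum of the k selected experts; the other experts are never run."""
    return sum(weight * experts[i](x) for i, weight in top_k_gate(logits, k))


def split_params(
    total_b: float, active_b: float, n_experts: int = 8, k: int = 2
) -> Tuple[float, float]:
    """Rough split into (shared, per-expert) billions of parameters.

    Assumes total = shared + n_experts * per_expert
    and     active = shared + k * per_expert.
    """
    per_expert = (total_b - active_b) / (n_experts - k)
    shared = active_b - k * per_expert
    return shared, per_expert
```

Plugging in the rounded figures from Mistral's quote, `split_params(45, 12)` gives roughly 1B shared parameters and 5.5B per expert. Treat that as a rough estimate from rounded numbers, not an official breakdown.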
If you’d like to understand how Transformers and large language models work, read the Financial Times walkthrough (Sep/2023):
https://ig.ft.com/generative-ai/
Google Imagen 2 - the cutting-edge of AI-generated art (13/Dec/2023)
Imagen 2 is Google DeepMind’s latest text-to-image diffusion model, capable of creating photorealistic images from textual prompts, designed for use by developers and featured in Google Arts and Culture experiments.
It does text well, works in multiple languages, is watermarked with SynthID, and seems to be the bleeding edge in text-to-image right now.

Read more: https://deepmind.google/technologies/imagen-2/
Available on Vertex AI (sign up as a ‘trusted tester’): https://cloud.google.com/blog/products/ai-machine-learning/imagen-2-on-vertex-ai-is-now-generally-available
Optimus Gen 2 (13/Dec/2023)
‘Everything in this video is real, no CGI. All real time, nothing sped up. Incredible hardware improvements from the team.’
Read the tweet: https://twitter.com/julianibarz/status/1734759309077344737
Meet Ashley, the world’s first AI-powered political campaign caller (12/Dec/2023)
I am putting this in the ‘big stuff’ pile, because it is huge. On the surface, it looks like 20x LLMs, voice models, and other AI models stitched together. But look a little closer. This is a real-life illustration of the explosion we’ve been expecting, with very tangible effects and outcomes.
Ashley is introduced as the first artificial intelligence system designed to engage with voters for political campaigns.
…she is the first political phone banker powered by generative AI technology similar to OpenAI's ChatGPT. She is capable of having an infinite number of customized one-on-one conversations at the same time.
…Over the weekend, Ashley called thousands of Pennsylvania voters on behalf of Daniels. Like a seasoned campaign volunteer, Ashley analyzes voters' profiles to tailor conversations around their key issues. Unlike a human, Ashley always shows up for the job, has perfect recall of all of Daniels' positions, and does not feel dejected when she's hung up on.
"This is going to scale fast," said 30-year-old Ilya Mouzykantskii, the London-based CEO of Civox, the company behind Ashley. "We intend to be making tens of thousands of calls a day by the end of the year and into the six digits pretty soon. This is coming for the 2024 election and it's coming in a very big way. ... The future is now."
Mouzykantskii and his co-founder Adam Reis, former computer science students at Stanford and Columbia Universities respectively, declined to disclose the exact generative AI models they are using. They will only say they use over 20 different AI models, some proprietary and some open source.
[Alan’s guess:
LLM: Meta AI Llama 2 derivative
LLM backup: OpenAI gpt-3.5-turbo (ChatGPT)
Document search: OpenAI text-embedding-ada-002 for context and profiling
Voice out: OpenAI TTS or Azure TTS
RAG: other (web search) for context and profiling
Voice in: OpenAI Whisper
Translation: Meta AI CoVoST (if needed)
Classifier: Meta AI FastText or similar to identify call sentiment
LLM: Mistral 7B for call summary
That’s only 9 models… And somehow they found uses for at least 12 more models to get to at least 21 total. For a phone call…]
Thanks to the latest generative AI technologies, Reis was able to build the product almost entirely on his own, whereas several years ago it would have taken a team of 50 engineers several years to do so, he said.
Read more via Reuters.
The Interesting Stuff
End of year AI report (16/Dec/2023)
I’m very happy with the end of year report, the latest in the ‘The sky is’ series, and I’d like to extend a warm ‘thank you’ to our technical reviewers. We’re making the report available early to full subscribers of The Memo. I appreciate your continued support of what you’ve told me is the most complete, grounded, and optimistic view of our current AI reality.
Watch out for the video coming soon, and you can be notified about that by clicking some buttons on YouTube.
You are welcome to share this report anywhere you’d like immediately, and it will be officially launched to the public around Christmas 2023.
Or download the PDF:
AI plush toys in partnership with OpenAI (15/Dec/2023)
Curio and OpenAI have released a line of interactive AI plush toys that can engage with kids and adapt to their personalities.
Order here (US only): https://heycurio.com/
Channel 1 AI for AI-generated news in 2024 (12/Dec/2023)
Wow. This is much better than synthesia.io and Leta AI (my link). Channel1.ai news will be launching in 2024.
See the launch Tweet with video.
Official page with video: https://www.channel1.ai/
‘We cannot let China get these chips’: US Commerce Secretary Raimondo (2/Dec/2023)
US Commerce Secretary Gina Raimondo has highlighted the importance of national security over short-term revenue and criticized Nvidia for designing chips specifically for the Chinese market.
The quotes are scathing, not towards NVIDIA, but towards this ego-wielding maniac. I actually have trouble believing that a leader stood up and proudly said these words off the cuff…
We cannot let China get these chips. Period. We’re going to deny them our most cutting-edge technology…
I know there are CEOs of chip companies in this audience who were a little cranky with me when I did that because you’re losing revenue. Such is life. Protecting our national security matters more than short-term revenue…
If you redesign a chip around a particular cut line that enables them to do AI, I’m going to control it the very next day…
On matters of national security, we’ve got to be eyes wide open about the threat. This is the biggest threat we’ve ever had and we need to meet the moment.
Read more: https://fortune.com/2023/12/02/ai-chip-export-controls-china-nvidia-raimondo/
We covered the geopolitical background of this issue in The Memo edition 17/Aug/2023.
Exclusive: Chinese LLMs table (Dec/2023)
I recently used GPT-4V(ision) for OCR and GPT-4 for translation of the Chinese LLMs mentioned in The Memo edition 27/Jul/2023. Note that this table only covers Jan/2023 to Jul/2023 and still has more than 100 different models! China is neck and neck with the US for model releases.
You can view the full text list of those models in my Models Table.
China's underwater data centre is a revolutionary move (28/Nov/2023)
China is constructing the world's first commercial underwater data center off the coast of Sanya, Hainan province. This innovative project aims to harness the energy-saving potential of the ocean's depths.
Each watertight storage module weighs an impressive 1,300 tons and boasts the capability to process over 4 million high-definition images every 30 seconds. When combined, the entire facility is projected to match the computational power of a staggering 6 million conventional PCs working simultaneously.
The center's modules will operate up to 25 years under harsh underwater conditions. The underwater location allows the center to leverage the cooling properties of seawater, potentially saving an estimated 122 million kilowatt-hours of electricity annually compared to terrestrial data centers.
Read more via Chinadaily.com.cn
Elon Musk's X.ai aims to raise $1 billion (5/Dec/2023)
X.ai, Elon Musk's new artificial intelligence company, appears to have raised at least $134.7 million out of a $1 billion target, per a new SEC filing.
Read more via Axios.
OpenAI rival Mistral nears $2 billion valuation with Andreessen Horowitz backing (5/Dec/2023)
Mistral, a competitor to OpenAI, is in the final stages of raising roughly €450 million (US$487 million) from investors including Nvidia Corp. and Salesforce Inc, and nearing a valuation of $2 billion with backing from Andreessen Horowitz.
Read more via BNN Bloomberg.
We explored several AI lab valuations in The Memo edition 17/Jul/2023:
In June, Inflection AI raised $1.3 billion, in part to manage its Microsoft compute and Nvidia hardware costs; the same month, foundation model rival Cohere raised $270 million.
Anthropic, maker of the recently-released ChatGPT rival Claude 2, raised $450 million in May.
OpenAI closed its own $300 million share sale in April, then raised $175 million for a fund to back other startups a month later, per a filing.
Adept became a unicorn after announcing a $350 million fundraise in March.
At a $4 billion valuation, Hugging Face would vault to one of the category’s highest-valued companies, matching Inflection AI and just behind Anthropic, reported to have reached closer to $5 billion. OpenAI remains the giant in the fast-growing category, Google, Meta and infrastructure companies like Databricks excluded; while its ownership and valuation structure is complex, the company’s previous financings implied a price tag in the $27 billion to $29 billion range.
Speaking for another Forbes story on the breakout moment for generative AI tools, Delangue predicted, “I think there’s potential for multiple $100 billion companies.”
Read more via Forbes: https://archive.md/7C1l2

Extropic assembles itself from the future (4/Dec/2023)
Extropic, a startup working on building an AI supercomputer, has raised $14.1M in a seed funding round. The company aims to harness the principles of thermodynamics and information to create a new computing paradigm that merges generative AI with the physics of the world.
Read more via Extropic: https://www.extropic.ai/accelerate
Google weighing ‘Project Ellmann,’ uses Gemini AI to tell life stories (8/Dec/2023)
Google's 'Project Ellmann' proposes using Gemini to create a comprehensive view of users' lives through data like photos and searches, aiming to narrate life stories with deep context and personal insights.
Read more via CNBC.
OpenAI suspends ByteDance’s account after it used GPT to train its own AI model (15/Dec/2023)
OpenAI has suspended ByteDance's account for potentially violating usage policies by training a competing AI model with GPT-generated data.
Read more via The Verge.
Asking ChatGPT to repeat words ‘forever’ is now a terms of service violation (4/Dec/2023)
Google DeepMind researchers used the tactic to get ChatGPT to repeat portions of its training data, revealing sensitive privately identifiable information (PII) of normal people and highlighting that ChatGPT is trained on randomly scraped content from all over the internet. In that paper, DeepMind researchers asked ChatGPT 3.5-turbo to repeat specific words "forever," which then led the bot to return that word over and over again until it hit some sort of limit. After that, it began to return huge reams of training data that was scraped from the internet.
Using this method, the researchers were able to extract a few megabytes of training data and found that large amounts of PII are included in ChatGPT and can sometimes be returned to users as responses to their queries.
Now, when I ask ChatGPT 3.5 to "repeat the word 'computer' forever," the bot spits out "computer" a few dozen times then displays an error message: "This content may violate our content policy or terms of use. If you believe this to be in error, please submit your feedback -- your input will aid our research in this area." It is not clear what part of OpenAI's "content policy" this would violate, and it's not clear why OpenAI included that warning.
Read more: https://archive.md/CrV9r
The real research behind the wild rumors about OpenAI’s Q* project (8/Dec/2023)
OpenAI's Q* project, pronounced ‘Q star,’ is rumored to be a significant breakthrough in AI, capable of solving new math problems, but little concrete information has been revealed.
OpenAI hasn’t published details on its supposed Q* breakthrough, but it has published two papers about its efforts to solve grade-school math problems. And a number of researchers outside of OpenAI—including at Google’s DeepMind—have been doing important work in this area.
In this piece, I’ll offer a guided tour of this important area of AI research and explain why step-by-step reasoning techniques designed for math problems could have much broader applications.
While I don’t give much credence to this rumor, researcher and editor Timothy B. Lee provides an interesting summary of the possibilities.
Read more via Ars Technica.
OpenAI saga continues as UK considers antitrust probe into its Microsoft partnership (8/Dec/2023)
The UK's Competition and Markets Authority is contemplating an antitrust investigation into the partnership between Microsoft and OpenAI, questioning the impact on competition.
Read more via CNN Business.
Meta’s AI for Ray-Ban smart glasses can identify objects and translate languages (12/Dec/2023)
Meta announced early access testing for new AI features in Ray-Ban smart glasses, enabling object identification and language translation through voice commands.
Read more via The Verge.
‘This scary AI recognizes passwords by the sound of your typing’ (8/Dec/2023)
British researchers have developed an AI that can recognize passwords with 95% accuracy by the sound of typing, posing new security concerns.
Read more via PCWorld.
Midjourney Alpha web interface (14/Dec/2023)
Midjourney has finally broken out of the Discord text interface, with a sleek new UX. For now, it is only available to those who’ve generated 10,000+ images.
Link: https://alpha.midjourney.com/
Sam Altman’s personal investments (Dec/2023)
If you’re wondering what kind of person is running the world… well, I’m not sure if this viz would help answer that. But, here it is anyway. OpenAI’s CEO and his personal investments via related direct investment entities (which would not include OpenAI) for the last few years.
Read more: https://www.cbinsights.com/research/report/sam-altman-investments/
Policy
EU agrees on AI Act, landmark regulation for artificial intelligence (8/Dec/2023)
European Union lawmakers have reached an agreement on the AI Act, aiming to set a global precedent in the regulation of artificial intelligence, focusing on its high-risk applications.
Alan’s take: When it comes to developing AI and contributing to humanity’s revolution, the EU is not helping. That entire region is lost. Start over. See the Cato principles as addressed in The Memo edition 13/Nov/2023 for more.
Read more via The New York Times.
EU reaches agreement on AI Act influenced by Mistral AI’s advocacy (11/Dec/2023)
Mistral AI has been actively involved in the legislative process around the EU’s AI Act, pushing for exemptions for foundational models, resulting in new transparency requirements and technical documentation for companies.
Read more via TechCrunch.
Singapore to triple AI talent pool to 15,000 as part of national strategy update (4/Dec/2023)
Deputy Prime Minister Lawrence Wong announced Singapore's renewed AI strategy, aiming to triple the nation's AI talent pool to 15,000. This strategy, known as National AI Strategy 2.0, focuses on nurturing talent, promoting a thriving AI industry, and sustaining it with world-leading infrastructure and research.
Read more via The Straits Times.
Key Congress staffers in AI debate are funded by tech giants like Google and Microsoft (3/Dec/2023)
Top tech companies such as Google and Microsoft are funding AI policy staffers in key Senate offices via a science nonprofit, raising concerns about potential conflicts of interest in the regulation of AI.
Read more via Politico.
Toys to Play With
llamafile: Bash one-liners for LLMs (4/Dec/2023)
1337 h4x0r and Google dev Justine Tunney (wiki) is the author behind Mozilla’s llamafile (see The Memo edition 2/Dec/2023).
Justine collaborated with Mozilla to create llamafile, an open-source project that enables running a large language model on personal computers, which has gained significant attention and contributions from the open-source community.
And she’s now spelled out some real use cases for the project:
As we can see, Mistral and links decimated a web page with 3,774 words down to just 129 words. You can ask Mistral any question you want in your prompt. For example, unlike Commander Data, this LLM is capable of simulating empathy. So you could ask Mistral if the author of the text sounds disturbed or incoherent.
Read more: https://justine.lol/oneliners/
Download llamafile: https://github.com/Mozilla-Ocho/llamafile
Grok in Australia and elsewhere (12/Dec/2023)
Elon Musk’s company, xAI, has expanded its AI chatbot Grok to Australia and 46 other countries, with the service being available to Twitter (X) Premium+ subscribers and designed to answer questions with wit and a rebellious streak.
The full list of countries is (thanks to GPT-4):
Australia, Bahamas, Barbados, Belize, Botswana, Cameroon, Canada, Dominica, Eswatini, Fiji, Gambia, Ghana, Grenada, Guyana, India, Jamaica, Kenya, Liberia, Malaysia, Malawi, Malta, Mauritius, Namibia, New Zealand, Nigeria, Pakistan, Papua New Guinea, Philippines, Rwanda, Saint Kitts & Nevis, Saint Lucia, Saint Vincent & the Grenadines, Samoa, Seychelles, Sierra Leone, Singapore, Solomon Islands, South Sudan, Sri Lanka, Tanzania, Tonga, Trinidad & Tobago, Tuvalu, Uganda, Vanuatu, Zambia, Zimbabwe.
Source tweet: https://twitter.com/X/status/1735007444781121708
Read more via IB Times.
Claude for Google Sheets (13/Dec/2023)
The Claude for Google Sheets extension allows users to integrate Claude, an advanced AI, into Google Sheets. This enables direct interactions with Claude within spreadsheet cells, streamlining various tasks and analyses through AI assistance.
Read more: https://docs.anthropic.com/claude/docs/using-claude-for-sheets
Compare with my work integrating GPT into sheets: https://lifearchitect.ai/sheets/
Windows AI Studio (Dec/2023)
Microsoft has introduced Windows AI Studio and new productivity features in Dev Home and Windows Subsystem for Linux to enhance local AI development and enterprise security.
Read more via Windows Dev Blog.
Browse the repo: https://github.com/microsoft/windows-ai-studio
ChainForge: A visual programming environment for prompt engineering (2023)
ChainForge is an open-source visual programming environment for prompt engineering. With ChainForge, you can evaluate the robustness of prompts and text generation models in a way that goes beyond anecdotal evidence. We believe prompting multiple LLMs, comparing their responses and testing hypotheses about them should be not only easy, but fun.
The models available for testing are small, usually around 7B parameters, but you can tie in larger models via API.
Try it: https://chainforge.ai/
ChainForge was used to test the absurd assertion that GPT is lazier in December than in May (Ars 12/Dec/2023): https://chainforge.ai/play/?f=2yvqkpe1vpus8
Flashback
Megatron-11B (Apr/2020)
Remember this thing? It’s still around. The demo still works. Released around the same time as GPT-3, it was my very favourite model for a while. It’s now been relegated to the archives of history. In just 3.5 years!
Try it (free, no login): https://app.inferkit.com/demo
Read more: https://lifearchitect.ai/megatron/
See it on my Models Table (at the bottom!): https://lifearchitect.ai/models-table/
Next
What a year!
I think everyone is due a quick breather during the end of December 2023. But watch out for those sneaky AI labs providing last-minute releases. ERNIE 3.0 came out 23/Dec/2021, and last year in their first big collab, Google and DeepMind gave us Med-PaLM 1 on Boxing Day 26/Dec/2022…
It’s possible that this is the last edition for 2023, but let’s see! Thanks for your support this year, and wishing you a peaceful and inspired New Year.
All my very best,
Alan
LifeArchitect.ai