FOR IMMEDIATE RELEASE: 23/Nov/2023
Welcome back to The Memo.
I turned down three book publishing deals to write live and in real-time here, with regular updates absorbed by more than 10,000 subscribers like you. You’ve told me that The Memo helps shape public policy across governments and is used internally at your Fortune 500 company, and some of you reached out to me recently about the OpenAI media circus (I’m going to see how little airtime I can give that here!).
The winner of The Who Moved My Cheese? AI Awards! for Nov/2023 is Australian singer-songwriter Tina Arena. In another life, I worked with Tina, and she’s smart. Her recent comments are not, and they’ve since been removed: ‘Artificial intelligence is not capable of producing human emotion for me. Therefore it doesn’t apply to me. If there are other people who choose to be seduced by it, that is their prerogative. Artificial intelligence is just rape and pillage.’
In the Toys to play with section, we look at the most popular GPTs, new AI training courses from the big boys, ChatGPT with voice now on your device, and much more.
The BIG Stuff
Inflection AI finishes training Inflection-2 (22/Nov/2023)
Inflection AI CEO Mustafa Suleyman, who was also a co-founder of Google DeepMind, has posted about their next model, Inflection-2, which finished training on 19/Nov/2023. While Inflection-1 is accessible as the chat model behind Pi (pi.ai), this new model may be deployed even more widely.
Inflection-2, the best model in the world for its compute class and the second most capable LLM in the world today….
Inflection-2 was trained on 5,000 NVIDIA H100 GPUs... This puts it into the same training compute class as Google’s flagship PaLM 2 Large model, which Inflection-2 outperforms…
Designed with serving efficiency in mind, Inflection-2 will soon be powering Pi. Thanks to a transition from A100 to H100 GPUs, as well as our highly optimized inference implementation, we managed to reduce the cost and increase the speed of serving vs. Inflection-1 despite Inflection-2 being multiple times larger.
To be the second ‘best’ LLM in the world, Inflection-2 should sit somewhere between 1T parameters (Baidu’s ERNIE 4.0) and 1.76T parameters (OpenAI’s GPT-4).
Compared to GPT-4, the MMLU benchmark scores for Inflection-2 are just okay:
MMLU results:
Llama 2 = 68.9
GPT-3.5 = 70.0
Inflection-1 = 72.7
Grok-1 = 73.0
PaLM 2 Large = 78.3
Claude 2 (+CoT) = 78.5
Inflection-2 = 79.6
GPT-4 = 86.4
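To put the gaps in perspective, here’s a quick sketch that ranks the scores above and shows each model’s distance from GPT-4 (values transcribed from the list; illustrative only):

```python
# MMLU scores as reported above (higher is better)
mmlu = {
    "Llama 2": 68.9,
    "GPT-3.5": 70.0,
    "Inflection-1": 72.7,
    "Grok-1": 73.0,
    "PaLM 2 Large": 78.3,
    "Claude 2 (+CoT)": 78.5,
    "Inflection-2": 79.6,
    "GPT-4": 86.4,
}

# Rank models from best to worst and show each model's gap to GPT-4
ranked = sorted(mmlu.items(), key=lambda kv: kv[1], reverse=True)
for model, score in ranked:
    gap = mmlu["GPT-4"] - score
    print(f"{model:18s} {score:5.1f}  ({gap:.1f} points behind GPT-4)")
```

Inflection-2 lands second, but still 6.8 points behind GPT-4.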
The team includes an even more ambitious note:
Come run with us as we scale 100x from here!
I believe that a 100T parameter model would be earth-shattering, with performance easily matching the definitions of AGI, and possibly ASI (depending on dataset and additional tooling).
Read the launch: https://inflection.ai/inflection-2
Read the initial Tweet (20/Nov/2023).
Alignment paper: Large language models can strategically deceive their users when put under pressure (9/Nov/2023)
I am certainly hyper-optimistic about post-2020 AI, but there are some very real dangers behind large language models as they continue to increase in scale. Some of these were tested and reported by the Alignment Research Center in the GPT-4 technical report (see also my GPT-4 annotated paper). One of the key dangers—deception—has now been explored in depth by AI safety group Apollo Research.
What is intriguing about LLMs in this context is the apparent inability of humans to prevent LLM deception. This might be attributed to GPT-4’s failure to adhere to the instructions it was given.
The paper abstract says:
We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so.
Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management.
When reporting to its manager, the model consistently hides the genuine reasons behind its trading decision. We perform a brief investigation of how this behavior varies under changes to the setting, such as removing model access to a reasoning scratchpad, attempting to prevent the misaligned behavior by changing system instructions, changing the amount of pressure the model is under, varying the perceived risk of getting caught, and making other simple changes to the environment.
To our knowledge, this is the first demonstration of Large Language Models trained to be helpful, harmless, and honest, strategically deceiving their users in a realistic situation without direct instructions or training for deception.
Read the paper: https://arxiv.org/abs/2311.07590
Read BBC’s summary: https://www.bbc.com/news/technology-67302788
OpenAI trademarks up to GPT-7 (Nov/2023)
We’ve followed OpenAI’s trademark filings for a bit:
‘GPT’ in The Memo edition 30/Apr/2023.
‘GPT-5’ in The Memo edition 5/Aug/2023.
Now, the company has gone even further with new filings.
The new trademark words are ‘GPT-6’ and ‘GPT-7’. Beyond that, we must be due to hit the Singularity (wiki), because they haven’t found cause to trademark a ‘GPT-8’!
Read it: https://uspto.report/company/Openai-Opco-L-L-C
It’s very interesting to note that OpenAI’s CEO had previously stopped at GPT-7 in other interviews. On 19/May/2023 he told Elevate (video timecode):
…with the arrival of GPT-4, people started building entire companies around it. I believe that GPT-5, 6, and 7 will continue this trajectory in future years, really increasing the utility they can provide.
Read more about the upcoming GPT-5: https://lifearchitect.ai/gpt-5/
The Interesting Stuff
Anthropic Claude 2.1 (22/Nov/2023)
Anthropic released an update to their Claude 2 model. This version opens up the full 200K-token context window, halves the rate of hallucinations, and adds tool use.
Our latest model, Claude 2.1, is now available over API in our Console and is powering our claude.ai [still only for US] chat experience. Claude 2.1 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and our new beta feature: tool use. We are also updating our pricing to improve cost efficiency for our customers across models.
200K Context Window
Since our launch earlier this year, Claude has been used by millions of people for a wide range of applications—from translating academic papers to drafting business plans and analyzing complex contracts. In discussions with our users, they’ve asked for larger context windows and more accurate outputs when working with long documents.
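Claude 2.1’s new system prompts sit above the Human/Assistant turns in the completions-style prompt. Here’s a minimal sketch of assembling such a prompt as a plain string (no SDK required; the Human:/Assistant: convention follows Anthropic’s documented format, while the example system and user text is mine):

```python
# Anthropic's completions-style turn markers
HUMAN_PROMPT = "\n\nHuman:"
AI_PROMPT = "\n\nAssistant:"

def build_prompt(system: str, user: str) -> str:
    """Assemble a Claude-2.1-style completion prompt with a system prompt.

    The system text goes before the first Human turn, per Anthropic's
    system-prompt convention for the completions API.
    """
    return f"{system}{HUMAN_PROMPT} {user}{AI_PROMPT}"

prompt = build_prompt(
    system="You are a careful contract analyst. Cite clause numbers.",
    user="Summarise the termination clauses in the attached contract.",
)
```

The assembled string would then be sent as the `prompt` field of a completions request against the `claude-2.1` model.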
Read more: https://www.anthropic.com/index/claude-2-1
The performance of the longer 200K context window (about 300 pages of text, by my calculations) is disappointing, and the model seems to exhibit catastrophic forgetfulness over long contexts:
Needle in a Haystack viz.
View image + source: https://twitter.com/GregKamradt/status/1727018183608193393
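My 300-page estimate above can be sanity-checked with a rough conversion, assuming ~0.75 English words per token and ~500 words per printed page (both common rules of thumb, not Anthropic’s figures):

```python
# Rough conversion from context-window tokens to printed pages.
# Assumptions (rules of thumb, not official figures):
WORDS_PER_TOKEN = 0.75  # ~0.75 English words per token
WORDS_PER_PAGE = 500    # ~500 words per printed page

def tokens_to_pages(tokens: int) -> float:
    """Estimate how many printed pages a token budget covers."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

pages = tokens_to_pages(200_000)
print(f"200K tokens ≈ {pages:.0f} pages")  # ≈ 300 pages
```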
Try Claude 2.1: https://claude.ai/
Non-US can play with Claude 2.0 via Poe.com: https://poe.com/
See Claude 2.1 on my Models Table: https://lifearchitect.ai/models-table/
Microsoft Orca 2 (19/Nov/2023)
Microsoft released a newer version of Orca. While the previous version was based on Meta AI’s LLaMA-1 13B (trained on 1T tokens), this one is based on Llama 2 13B (trained on 2T tokens), plus fine-tuning on synthetic data generated by GPT-4 and other models.
I liked the simple system prompt they used for testing:
You are Orca, an AI language model created by Microsoft. You are a cautious assistant.
You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
— (p7)
Read the paper: https://arxiv.org/abs/2311.11045
OpenAI became a soap opera for a week (Nov/2023)
OpenAI’s CEO was fired. And a few days later, he’s back. In my view, this guy is more powerful than the US President, so there was quite a bit of noise around this (more than 142,000 new articles according to Google News). If you want the play-by-play, watch The Bold and the Beautiful (wiki), or read any tabloid rag. One of the best summaries was by Vice (21/Nov/2023).
OpenAI’s board consisted of six experts:
Greg Brockman (President, board chairman)
Dr Ilya Sutskever (Chief scientist)
Sam Altman (CEO)
Adam D’Angelo (Quora, Poe)
Tasha McCauley (RAND)
Helen Toner (Governance, China)
The new board today (23/Nov/2023 6AM UTC+8!) includes:
Bret Taylor (board chairman, Salesforce, Twitter)
Larry Summers (former US Treasury Secretary, former Harvard president)
Adam D’Angelo (Quora, Poe)
OpenAI’s new board doesn’t appear to be fully built. Negotiations are reportedly underway to install representation from Microsoft...
There’s a notable change in the board’s experience. The previous board included academics and researchers, but OpenAI’s new directors have extensive backgrounds in business and technology….
Microsoft, Sequoia Capital, Thrive Capital and Tiger Global are among the OpenAI investors that lack representation on the board…
At least one member, Larry Summers, has the same kind of high hopes for post-2020 AI as I do.
More and more, I think ChatGPT is coming for the cognitive class. It's going to replace what doctors do -- hearing symptoms and making diagnoses -- before it changes what nurses do -- helping patients get up and handle themselves in the hospital. — Tweet 8/Apr/2023
We still don’t know the reasons behind the OpenAI coup.
Even more theories are collected at WhyWasSamFired.com and LessWrong.com.
Read the letter from OpenAI staff to the board (archive).
The short version is that humans are still human. We are motivated by chemical signals from our biological brain, nature’s device that is regularly ‘hijacked’ (wiki: amygdala hijack) and is very similar to the Transformer architecture(!).
In this space, even rational-thinking experts (certainly with access to ask GPT-4.5, but perhaps not the foresight) can still justify a disproportionate response: firing the CEO of a company valued at $86B and the popular favourite to reach AGI first…
The good news is that these kinds of fear-based responses (general aggression, defensiveness, vicious retribution, and just the run-of-the-mill squabbles, arguments and discussions) are all on their last legs.
The OpenAI drama is largely tangential. (As an observer, if you spent more than five minutes in it, you lost the game. If you took the day off to recover, you definitely need a holiday…)
AI is ready to oversee balanced, superhuman governance. You’ll see AGI and then ASI or superintelligence start appearing in corporations, regulators, governments, and other places. And the roll-out has already begun…
See my updated viz: https://lifearchitect.ai/agi/
Apple's AI-powered Siri assistant could land as soon as WWDC 2024 (9/Nov/2023)
Apple is currently using LLMs to completely revamp Siri into the ultimate virtual assistant, developing it into Apple’s most powerful killer AI app. This integrated development effort is actively underway, and the first results are expected to be unveiled at WWDC 2024, with plans for the feature to be standard on iPhone 16 models and onwards.
Read (not very much) more via TechRadar.
See Apple’s AjaxGPT on my Models Table: https://lifearchitect.ai/models-table/
Exclusive: How to use ChatGPT without guardrails (Nov/2023)
You won’t find this info anywhere else (yet), but here is the simplest way to use the world’s most popular model with the safety dialled right back…