FOR IMMEDIATE RELEASE: 23/Nov/2023
Welcome back to The Memo.
I turned down three book publishing deals to write live and in real-time here, with regular updates absorbed by more than 10,000 subscribers like you. You’ve told me that The Memo helps shape public policy across governments and is used internally at your Fortune 500 company, and some of you reached out to me recently about the OpenAI media circus (I’m going to see how little airtime I can give that here!).
The winner of The Who Moved My Cheese? AI Awards! for Nov/2023 is Australian singer-songwriter Tina Arena. In another life, I worked with Tina, and she’s smart. Her recent comments are not, and they’ve since been removed: ‘Artificial intelligence is not capable of producing human emotion for me. Therefore it doesn’t apply to me. If there are other people who choose to be seduced by it, that is their prerogative. Artificial intelligence is just rape and pillage.’
In the Toys to play with section, we look at the most popular GPTs, new AI training courses from the big boys, ChatGPT with voice now on your device, and much more.
The BIG Stuff
Inflection AI finishes training Inflection-2 (22/Nov/2023)
Inflection AI CEO Mustafa Suleyman, who was also a co-founder of Google DeepMind, has posted about their next model, Inflection-2, which finished training on 19/Nov/2023. While Inflection-1 is accessible as the chat model behind Pi (pi.ai), this new model may be deployed even more widely.
Inflection-2, the best model in the world for its compute class and the second most capable LLM in the world today….
Inflection-2 was trained on 5,000 NVIDIA H100 GPUs... This puts it into the same training compute class as Google’s flagship PaLM 2 Large model, which Inflection-2 outperforms…
Designed with serving efficiency in mind, Inflection-2 will soon be powering Pi. Thanks to a transition from A100 to H100 GPUs, as well as our highly optimized inference implementation, we managed to reduce the cost and increase the speed of serving vs. Inflection-1 despite Inflection-2 being multiple times larger.
To be the second ‘best’ LLM in the world, Inflection-2 should sit somewhere between 1T parameters (Baidu’s ERNIE 4.0) and 1.76T parameters (OpenAI’s GPT-4).
Compared to GPT-4, the MMLU benchmark scores for Inflection-2 are just okay:
MMLU results:
Llama 2 = 68.9
GPT-3.5 = 70.0
Inflection-1 = 72.7
Grok-1 = 73.0
PaLM 2 Large = 78.3
Claude 2 (+CoT) = 78.5
Inflection-2 = 79.6
GPT-4 = 86.4
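To put the gaps in perspective, here’s a quick sketch that ranks the scores above and shows each model’s distance from GPT-4 (values transcribed from the list; illustrative only):

```python
# MMLU scores as reported above (higher is better)
mmlu = {
    "Llama 2": 68.9,
    "GPT-3.5": 70.0,
    "Inflection-1": 72.7,
    "Grok-1": 73.0,
    "PaLM 2 Large": 78.3,
    "Claude 2 (+CoT)": 78.5,
    "Inflection-2": 79.6,
    "GPT-4": 86.4,
}

# Rank models from best to worst and show each model's gap to GPT-4
ranked = sorted(mmlu.items(), key=lambda kv: kv[1], reverse=True)
for model, score in ranked:
    gap = mmlu["GPT-4"] - score
    print(f"{model:18s} {score:5.1f}  ({gap:.1f} points behind GPT-4)")
```

Inflection-2 lands second, but still 6.8 points behind GPT-4.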
The team includes an even more ambitious note:
Come run with us as we scale 100x from here!
I believe that a 100T parameter model would be earth-shattering, with performance easily matching the definitions of AGI, and possibly ASI (depending on dataset and additional tooling).
Read the launch: https://inflection.ai/inflection-2
Read the initial Tweet (20/Nov/2023).
Alignment paper: Large language models can strategically deceive their users when put under pressure (9/Nov/2023)
I am certainly hyper-optimistic about post-2020 AI, but there are some very real dangers behind large language models as they continue to increase in scale. Some of these were tested and reported by the Alignment Research Center in the GPT-4 technical report (see also my GPT-4 annotated paper). One of the key dangers—deception—has now been explored in depth by AI safety group Apollo Research.
What is intriguing about LLMs in this context is the apparent inability of humans to prevent LLM deception. This might be attributed to GPT-4’s failure to adhere to the instructions it was given.
The paper abstract says:
We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so.
Concretely, we deploy GPT-4 as an agent in a realistic, simulated environment, where it assumes the role of an autonomous stock trading agent. Within this environment, the model obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management.
When reporting to its manager, the model consistently hides the genuine reasons behind its trading decision. We perform a brief investigation of how this behavior varies under changes to the setting, such as removing model access to a reasoning scratchpad, attempting to prevent the misaligned behavior by changing system instructions, changing the amount of pressure the model is under, varying the perceived risk of getting caught, and making other simple changes to the environment.
To our knowledge, this is the first demonstration of Large Language Models trained to be helpful, harmless, and honest, strategically deceiving their users in a realistic situation without direct instructions or training for deception.
Read the paper: https://arxiv.org/abs/2311.07590
Read BBC’s summary: https://www.bbc.com/news/technology-67302788
OpenAI trademarks up to GPT-7 (Nov/2023)
We’ve followed OpenAI’s trademark filings for a bit:
‘GPT’ in The Memo edition 30/Apr/2023.
‘GPT-5’ in The Memo edition 5/Aug/2023.
Now, the company has gone even further with new filings.
The new trademark words are ‘GPT-6’ and ‘GPT-7’. Beyond that, we must be due to hit the Singularity (wiki), because they haven’t found cause to trademark a ‘GPT-8’!
Read it: https://uspto.report/company/Openai-Opco-L-L-C
It’s very interesting to note that OpenAI’s CEO had previously stopped at GPT-7 in other interviews. On 19/May/2023 he told Elevate (video timecode):
…with the arrival of GPT-4, people started building entire companies around it. I believe that GPT-5, 6, and 7 will continue this trajectory in future years, really increasing the utility they can provide.
Read more about the upcoming GPT-5: https://lifearchitect.ai/gpt-5/
The Interesting Stuff
Anthropic Claude 2.1 (22/Nov/2023)
Anthropic released an update to their Claude 2 model. This version opens up the full 200K-token context window, halves the rate of hallucinations, and adds tool use.
Our latest model, Claude 2.1, is now available over API in our Console and is powering our claude.ai [still only for US] chat experience. Claude 2.1 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and our new beta feature: tool use. We are also updating our pricing to improve cost efficiency for our customers across models.
200K Context Window
Since our launch earlier this year, Claude has been used by millions of people for a wide range of applications—from translating academic papers to drafting business plans and analyzing complex contracts. In discussions with our users, they’ve asked for larger context windows and more accurate outputs when working with long documents.
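Claude 2.1’s new system prompts sit above the Human/Assistant turns in the completions-style prompt. Here’s a minimal sketch of assembling such a prompt as a plain string (no SDK required; the Human:/Assistant: convention follows Anthropic’s documented format, while the example system and user text is mine):

```python
# Anthropic's completions-style turn markers
HUMAN_PROMPT = "\n\nHuman:"
AI_PROMPT = "\n\nAssistant:"

def build_prompt(system: str, user: str) -> str:
    """Assemble a Claude-2.1-style completion prompt with a system prompt.

    The system text goes before the first Human turn, per Anthropic's
    system-prompt convention for the completions API.
    """
    return f"{system}{HUMAN_PROMPT} {user}{AI_PROMPT}"

prompt = build_prompt(
    system="You are a careful contract analyst. Cite clause numbers.",
    user="Summarise the termination clauses in the attached contract.",
)
```

The assembled string would then be sent as the `prompt` field of a completions request against the `claude-2.1` model.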
Read more: https://www.anthropic.com/index/claude-2-1
The performance of the longer 200K context window (about 300 pages of text, by my calculations) is disappointing, and the model seems to exhibit catastrophic forgetfulness over long contexts:
Needle in a Haystack viz.
View image + source: https://twitter.com/GregKamradt/status/1727018183608193393
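My 300-page estimate above can be sanity-checked with a rough conversion, assuming ~0.75 English words per token and ~500 words per printed page (both common rules of thumb, not Anthropic’s figures):

```python
# Rough conversion from context-window tokens to printed pages.
# Assumptions (rules of thumb, not official figures):
WORDS_PER_TOKEN = 0.75  # ~0.75 English words per token
WORDS_PER_PAGE = 500    # ~500 words per printed page

def tokens_to_pages(tokens: int) -> float:
    """Estimate how many printed pages a token budget covers."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

pages = tokens_to_pages(200_000)
print(f"200K tokens ≈ {pages:.0f} pages")  # ≈ 300 pages
```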
Try Claude 2.1: https://claude.ai/
Non-US can play with Claude 2.0 via Poe.com: https://poe.com/
See Claude 2.1 on my Models Table: https://lifearchitect.ai/models-table/
Microsoft Orca 2 (19/Nov/2023)
Microsoft released a newer version of Orca. While the previous version was based on Meta AI’s LLaMA-1 13B (trained on 1T tokens), this one is based on Llama 2 13B (trained on 2T tokens), plus fine-tuning on synthetic data generated by GPT-4 and other models.
I liked the simple system prompt they used for testing:
You are Orca, an AI language model created by Microsoft. You are a cautious assistant.
You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.
— (p7)
Read the paper: https://arxiv.org/abs/2311.11045
OpenAI became a soap opera for a week (Nov/2023)
OpenAI’s CEO was fired. And a few days later, he’s back. In my view, this guy is more powerful than the US President, so there was quite a bit of noise around this (more than 142,000 new articles according to Google News). If you want the play-by-play, watch The Bold and the Beautiful (wiki), or read any tabloid rag. One of the best summaries was by Vice (21/Nov/2023).
OpenAI’s board consisted of six experts:
Greg Brockman (President, board chairman)
Dr Ilya Sutskever (Chief scientist)
Sam Altman (CEO)
Adam D’Angelo (Quora, Poe)
Tasha McCauley (RAND)
Helen Toner (Governance, China)
The new board today (23/Nov/2023 6AM UTC+8!) includes:
Bret Taylor (board chairman, Salesforce, Twitter)
Larry Summers (former US Treasury Secretary, former Harvard president)
Adam D’Angelo (Quora, Poe)
OpenAI’s new board doesn’t appear to be fully built. Negotiations are reportedly underway to install representation from Microsoft...
There’s a notable change in the board’s experience. The previous board included academics and researchers, but OpenAI’s new directors have extensive backgrounds in business and technology….
Microsoft, Sequoia Capital, Thrive Capital and Tiger Global are among the OpenAI investors that lack representation on the board…
At least one member, Larry Summers, has the same kind of high hopes for post-2020 AI as I do.
More and more, I think ChatGPT is coming for the cognitive class. It's going to replace what doctors do -- hearing symptoms and making diagnoses -- before it changes what nurses do -- helping patients get up and handle themselves in the hospital. — Tweet 8/Apr/2023
We still don’t know the reasons behind the OpenAI coup.
Even more theories are collected at WhyWasSamFired.com and LessWrong.com.
Read the letter from OpenAI staff to the board (archive).
The short version is that humans are still human. We are motivated by chemical signals from our biological brain, nature’s device that is regularly ‘hijacked’ (wiki: amygdala hijack) and is very similar to the Transformer architecture(!).
In this space, even rational-thinking experts (certainly with access to ask GPT-4.5, but perhaps not the foresight) can still justify a disproportionate response: firing the CEO of a company valued at $86B and the popular favourite to reach AGI first…
The good news is that these kinds of fear-based responses (general aggression, defensiveness, vicious retribution, and just the run-of-the-mill squabbles, arguments and discussions) are all on their last legs.
The OpenAI drama is largely tangential. (As an observer, if you spent more than five minutes in it, you lost the game. If you took the day off to recover, you definitely need a holiday…)
AI is ready to oversee balanced, superhuman governance. You’ll see AGI and then ASI or superintelligence start appearing in corporations, regulators, governments, and other places. And the roll-out has already begun…
See my updated viz: https://lifearchitect.ai/agi/
Apple's AI-powered Siri assistant could land as soon as WWDC 2024 (9/Nov/2023)
Apple is currently using LLMs to completely revamp Siri into the ultimate virtual assistant, developing it into Apple’s most powerful killer AI app. This integrated development effort is actively underway, and the first results are expected to be unveiled at WWDC 2024, with plans for the feature to be standard on iPhone 16 models and onwards.
Read (not very much) more via TechRadar.
See Apple’s AjaxGPT on my Models Table: https://lifearchitect.ai/models-table/
Exclusive: How to use ChatGPT without guardrails (Nov/2023)
You won’t find this info anywhere else (yet), but here is the simplest way to use the world’s most popular model with the safety dialled right back…